changeset 79:0099d4269428

trying 15 down to 8 at a time
author Henry Thompson <ht@markup.co.uk>
date Wed, 19 Mar 2025 21:48:44 -0400
parents 45b45810aa12
children 02fb801ac3c1
files lurid3/notes.txt
diffstat 1 files changed, 17 insertions(+), 0 deletions(-)
--- a/lurid3/notes.txt	Wed Mar 19 14:42:41 2025 -0400
+++ b/lurid3/notes.txt	Wed Mar 19 21:48:44 2025 -0400
@@ -1835,6 +1835,23 @@
   all.cdx:20230922044250 "http://www.dryelf.com/downloads.asp?Sort=0&Cat=3",
   all.cdx:20230922053834 "http://www.dryelf.com/downloads.asp?Sort=2&Cat=0",
 
+OK, rerun in the right place:
+  >: cd ~/results/CC-MAIN-2023-40/warc_lmhx
+  >: time python3 -c 'import sys,warc2cdb; sys.exit(warc2cdb.main(*sys.argv[1:]))'  2023-40 15 . 2> 15/w2c_errs &
+
+  real    210m54.183s
+  user    152m34.172s
+  sys     12m25.122s
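Redoing the arithmetic on that `time` output: user + sys is about 165 minutes of CPU against about 211 minutes of wall clock, roughly 78%, so this single run spent around a fifth of its time off-CPU. The notes don't say why; waiting on I/O is one plausible cause, but that is an assumption. A small awk sketch of the calculation, where `to_secs` is a hypothetical helper for the `NNNmSS.SSSs` format that bash's `time` prints:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a bash `time` field such as "152m34.172s" to seconds.
to_secs() {
  awk -v t="$1" 'BEGIN { split(t, a, "m"); sub(/s$/, "", a[2]); printf "%.3f\n", a[1] * 60 + a[2] }'
}

# Figures copied from the `time` output of the segment-15 run above.
real=$(to_secs 210m54.183s)
user=$(to_secs 152m34.172s)
sys=$(to_secs 12m25.122s)

# Share of wall-clock time this run spent on-CPU (user + sys).
awk -v r="$real" -v u="$user" -v s="$sys" 'BEGIN { printf "CPU/wall: %.1f%%\n", 100 * (u + s) / r }'
```

It prints `CPU/wall: 78.2%`.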
+
+Ran a test of 5 jobs on a single compute node; it did 78 files in 18
+minutes.  Try 15, for grins:
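As a rate, 78 files in 18 minutes is about 260 files per hour. The note doesn't say whether 78 is per job or the total over the 5 jobs; if it is the total, that is about 52 per job per hour, which matters when comparing against per-process progress later. A minimal awk sketch of the conversion (`per_hour` is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: files completed in some minutes -> files per hour, rounded.
#   $1 = files done, $2 = elapsed minutes,
#   $3 = number of jobs sharing that count (default 1)
per_hour() {
  awk -v files="$1" -v mins="$2" -v jobs="${3:-1}" 'BEGIN { printf "%.0f\n", 60 * files / mins / jobs }'
}

per_hour 78 18      # → 260 (rate as stated)
per_hour 78 18 5    # → 52 (per job, if 78 was the total over 5 jobs)
```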
+  >: export PYTHONPATH=/work/dc007/dc007/hst/lib/python/cc:/work/dc007/dc007/hst/lib/python/cc/lmh
+  >: cd $W/results/CC-MAIN-2023-40/warc_lmhx
+  >: { date ; seq 0 14 | parallel "python3 -c 'import sys,warc2cdb;
+sys.exit(warc2cdb.main(*sys.argv[1:]))'  2023-40 '{}' . 2> '{}'/w2c_errs" ; date ; } &
+  Thu Mar 20 12:05:50 AM GMT 2025
+After an hour, we're only at about 160 per process, and the job only
+has 3:30 to run, so I'm going to kill 8-14.
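One way to stop only workers 8-14, sketched under assumptions: each worker launched by the `parallel` command above is a `python3 -c ...` process whose command line ends in `2023-40 <n> .`, so `pkill -f` (which matches an extended regex against the full command line) can target them one at a time. `seg_pattern` and `kill_segments` are hypothetical helper names; preview with `pgrep -af` before killing anything. GNU parallel does not retry a killed job unless `--retries` is given.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ERE matching a warc2cdb worker started as
#   python3 -c '...warc2cdb.main(...)' 2023-40 <n> .
# The spaces around <n> keep the pattern for 1 from matching 14; the "\.$"
# anchor skips any wrapper shell whose command line carries on past the "."
# (e.g. with the 2> redirect).
seg_pattern() {
  printf 'warc2cdb\\.main.* 2023-40 %s \\.$' "$1"
}

# Hypothetical helper: kill the worker for each number given.
kill_segments() {
  local n
  for n in "$@"; do
    pkill -f "$(seg_pattern "$n")" || echo "no worker matched for $n" >&2
  done
}

# Preview first, then kill 8..14:
#   for n in $(seq 8 14); do pgrep -af "$(seg_pattern "$n")"; done
#   kill_segments $(seq 8 14)
```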
 ================
 
 Try it with the existing _per segment_ index we have for 2019-35