Mercurial > hg > cc > work
changeset 79:0099d4269428
trying 15 down to 8 at a time
author | Henry Thompson <ht@markup.co.uk> |
---|---|
date | Wed, 19 Mar 2025 21:48:44 -0400 |
parents | 45b45810aa12 |
children | 02fb801ac3c1 |
files | lurid3/notes.txt |
diffstat | 1 files changed, 17 insertions(+), 0 deletions(-) |
--- a/lurid3/notes.txt Wed Mar 19 14:42:41 2025 -0400
+++ b/lurid3/notes.txt Wed Mar 19 21:48:44 2025 -0400
@@ -1835,6 +1835,23 @@
 all.cdx:20230922044250 "http://www.dryelf.com/downloads.asp?Sort=0&Cat=3",
 all.cdx:20230922053834 "http://www.dryelf.com/downloads.asp?Sort=2&Cat=0",
+OK, rerun in the right place:
+ >: cd ~/results/CC-MAIN-2023-40/warc_lmhx
+ >: time python3 -c 'import sys,warc2cdb; sys.exit(warc2cdb.main(*sys.argv[1:]))' 2023-40 15 . 2> 15/w2c_errs &/work/dc007/dc007/hst/lib/python/cc:/work/dc007/dc007/hst/lib/python/cc/lmh
+
+ real 210m54.183s
+ user 152m34.172s
+ sys 12m25.122s
+
+Ran a test of 5 jobs on a single compute node, did 78 files in 18
+minutes. Try 15, for grins
+ >: export PYTHONPATH=/work/dc007/dc007/hst/lib/python/cc:/work/dc007/dc007/hst/lib/python/cc/lmh
+ >: cd $W/results/CC-MAIN-2023-40/warc_lmhx
+ >: { date ; seq 0 14 | parallel "python3 -c 'import sys,warc2cdb;
+sys.exit(warc2cdb.main(*sys.argv[1:]))' 2023-40 '{}' . 2> '{}'/w2c_errs" ; date ; } &
+ Thu Mar 20 12:05:50 AM GMT 2025
+After an hour, we're only at about 160 per process, and the job only
+has 3:30 to run, so I'm going to kill 8-14.
 ================
 Try it with the existing _per segment_ index we have for 2019-35
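
The `seq 0 14 | parallel ...` run recorded in this changeset, and the later decision to cut back from 15 jobs to 8, could also be driven from Python with an explicit cap on concurrent jobs. The following is only a minimal sketch under stated assumptions, not the method used in the notes: `build_command`, `run_segment`, `run_all`, `max_jobs` and `make_argv` are hypothetical names introduced here, and the only thing taken from the notes is the `python3 -c '...' 2023-40 N .` invocation with stderr sent to `N/w2c_errs`. It assumes `PYTHONPATH` is already exported so that `warc2cdb` is importable by the child processes.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# The same one-liner the notes pass to `python3 -c`.
ENTRY = "import sys,warc2cdb; sys.exit(warc2cdb.main(*sys.argv[1:]))"


def build_command(crawl, segment, python=sys.executable):
    """Return the argv for one segment: python -c ENTRY <crawl> <segment> ."""
    return [python, "-c", ENTRY, crawl, str(segment), "."]


def run_segment(crawl, segment, workdir, make_argv):
    """Run one segment, writing stderr to <workdir>/<segment>/w2c_errs.

    Returns the child's exit code.
    """
    seg_dir = Path(workdir) / str(segment)
    # Assumption: creating the directory is harmless; the shell version in
    # the notes relies on it already existing.
    seg_dir.mkdir(parents=True, exist_ok=True)
    with open(seg_dir / "w2c_errs", "wb") as errs:
        proc = subprocess.run(make_argv(crawl, segment), cwd=workdir, stderr=errs)
    return proc.returncode


def run_all(crawl, segments, max_jobs, workdir=".", make_argv=build_command):
    """Run every segment with at most max_jobs child processes at a time.

    Returns a dict mapping segment -> exit code.
    """
    # Threads are enough here: each one just blocks waiting on its child.
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        futures = {
            seg: pool.submit(run_segment, crawl, seg, workdir, make_argv)
            for seg in segments
        }
        return {seg: fut.result() for seg, fut in futures.items()}
```

Under these assumptions, `run_all("2023-40", range(15), max_jobs=8)` would cover segments 0 to 14 with no more than 8 running at once, which plays the same role as passing `-j 8` to GNU parallel. The `make_argv` hook exists only so the runner can be exercised with a stand-in command instead of `warc2cdb`.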