changeset 50:5556c04c7597
all of 49?
author | Henry S. Thompson <ht@inf.ed.ac.uk> |
---|---|
date | Wed, 09 Oct 2024 09:43:07 +0100 |
parents | deeac8a0a682 |
children | dc24bb6e524f |
files | lurid3/notes.txt |
diffstat | 1 files changed, 22 insertions(+), 0 deletions(-) |
--- a/lurid3/notes.txt Fri Oct 04 21:41:53 2024 +0100
+++ b/lurid3/notes.txt Wed Oct 09 09:43:07 2024 +0100
@@ -852,3 +852,25 @@
 if list of matches is empty, redo setting of lowest
 Resort the result by actual key
+
+Meanwhile, get a whole test set:
+sbatch --output=slurm_aug_cdx_49_10-599-out --time=01:00:00 --ntasks=10 -c 36 --exclusive $HOME/bin/runme.sh -m 00 59 $PWD -t 18 -b 'export resdir=CC-MAIN-2019-35/aug_cdx/49
+export DEC=$xarg' "export PYTHONPATH=./lib/python/cc:$PYTHONPATH
+seq 0 9 | parallel -j 10 \"~/lib/python/cc/cdx_extras.py /beegfs/common_crawl/CC-MAIN-2019-35/1566027314638.49/orig/warc/CC-MAIN-*-*-00\${DEC}'{}'.warc.gz > \$resdir/00\${DEC}'{}'.tsv\""
+
+Actually finished 360 in the hour.
+
+Leaving
+
+sbatch --output=slurm_aug_cdx_49_360-599-out --time=01:00:00 --ntasks=10 -c 36 --exclusive $HOME/bin/runme.sh -m 36 59 $PWD -t 18 -b 'export resdir=CC-MAIN-2019-35/aug_cdx/49
+export DEC=$xarg' "export PYTHONPATH=./lib/python/cc:$PYTHONPATH
+seq 0 9 | parallel -j 10 \"~/lib/python/cc/cdx_extras.py /beegfs/common_crawl/CC-MAIN-2019-35/1566027314638.49/orig/warc/CC-MAIN-*-*-00\${DEC}'{}'.warc.gz > \$resdir/00\${DEC}'{}'.tsv\""
+
+But something is wrong, the number of jobs is all wrong:
+
+ 5>: fgrep -c parallel slurm_aug_cdx_49_0-359-out
+ 741
+ sing<4046>: ls -lt CC-MAIN-2019-35/aug_cdx/49/|wc -l
+ 372
+
+Every file is being produced twice.
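
A possible way to check which outputs exist before resubmitting the remainder, offered only as a sketch. It assumes the output names follow the 00${DEC}{}.tsv pattern from the sbatch commands above, with DEC a two-digit value and {} a single digit; the function names missing_outputs and count_entries are hypothetical and not part of runme.sh or cdx_extras.py. Note also that ls -l prints a leading "total" line for a directory, so ls -lt ... | wc -l reports one more than the number of entries.

```shell
# Hypothetical helper: print the expected output names that are absent from a
# result directory. Assumes names of the form 00<DEC><digit>.tsv, where DEC is
# zero-padded to two digits. Pass the DEC range as plain integers (0 59, not 00 59).
missing_outputs() {
  dir=$1
  dec=$2
  hi=$3
  while [ "$dec" -le "$hi" ]; do
    for i in 0 1 2 3 4 5 6 7 8 9; do
      name=$(printf '00%02d%s.tsv' "$dec" "$i")
      # Report the name only if no such file exists yet.
      [ -e "$dir/$name" ] || echo "$name"
    done
    dec=$((dec + 1))
  done
}

# Hypothetical helper: count directory entries without the "total" line
# that ls -l adds to its output.
count_entries() {
  ls -A "$1" | wc -l
}
```

For example, missing_outputs CC-MAIN-2019-35/aug_cdx/49 0 59 | wc -l would give the number of outputs still to be produced under those naming assumptions, and count_entries CC-MAIN-2019-35/aug_cdx/49 gives a file count that can be compared directly with the number of jobs logged.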