comparison lurid3/notes.txt @ 50:5556c04c7597
all of 49?
author | Henry S. Thompson <ht@inf.ed.ac.uk> |
---|---|
date | Wed, 09 Oct 2024 09:43:07 +0100 |
parents | deeac8a0a682 |
children | dc24bb6e524f |
49:deeac8a0a682 | 50:5556c04c7597 |
---|---|
850 remove fileno from list of matches | 850 remove fileno from list of matches |
851 read and store a new line for fileno [handle EOF] | 851 read and store a new line for fileno [handle EOF] |
852 if list of matches is empty, redo setting of lowest | 852 if list of matches is empty, redo setting of lowest |
853 | 853 |
854 Re-sort the result by actual key | 854 Re-sort the result by actual key |
855 | |
856 Meanwhile, get a whole test set: | |
857 sbatch --output=slurm_aug_cdx_49_10-599-out --time=01:00:00 --ntasks=10 -c 36 --exclusive $HOME/bin/runme.sh -m 00 59 $PWD -t 18 -b 'export resdir=CC-MAIN-2019-35/aug_cdx/49 | |
858 export DEC=$xarg' "export PYTHONPATH=./lib/python/cc:$PYTHONPATH | |
859 seq 0 9 | parallel -j 10 \"~/lib/python/cc/cdx_extras.py /beegfs/common_crawl/CC-MAIN-2019-35/1566027314638.49/orig/warc/CC-MAIN-*-*-00\${DEC}'{}'.warc.gz > \$resdir/00\${DEC}'{}'.tsv\"" | |
860 | |
861 Actually finished 360 in the hour. | |
862 | |
863 Leaving 360-599 still to run: | |
864 | |
865 sbatch --output=slurm_aug_cdx_49_360-599-out --time=01:00:00 --ntasks=10 -c 36 --exclusive $HOME/bin/runme.sh -m 36 59 $PWD -t 18 -b 'export resdir=CC-MAIN-2019-35/aug_cdx/49 | |
866 export DEC=$xarg' "export PYTHONPATH=./lib/python/cc:$PYTHONPATH | |
867 seq 0 9 | parallel -j 10 \"~/lib/python/cc/cdx_extras.py /beegfs/common_crawl/CC-MAIN-2019-35/1566027314638.49/orig/warc/CC-MAIN-*-*-00\${DEC}'{}'.warc.gz > \$resdir/00\${DEC}'{}'.tsv\"" | |
868 | |
869 But something is wrong. The number of jobs does not match the number of output files: | |
870 | |
871 5>: fgrep -c parallel slurm_aug_cdx_49_0-359-out | |
872 741 | |
873 sing<4046>: ls -lt CC-MAIN-2019-35/aug_cdx/49/|wc -l | |
874 372 | |
875 | |
876 Every file is being produced twice. |
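
The check above (fgrep -c on the log against ls | wc -l on the result directory) can be repeated as a small script. This is only a sketch, not part of the original notes: the function names are made up for illustration, and it assumes, as the notes implicitly do, that each log line containing "parallel" corresponds to one output file being started.

```python
from pathlib import Path


def count_matching_lines(log_path, needle="parallel"):
    """Count lines containing a fixed string, like `fgrep -c needle log_path`."""
    with open(log_path, errors="replace") as log:
        return sum(1 for line in log if needle in line)


def count_visible_entries(directory):
    """Count directory entries the way plain `ls` lists them.

    Dotfiles are skipped. Unlike `ls -l | wc -l`, there is no extra
    "total" line, so this is one less than that pipeline reports.
    """
    return sum(1 for p in Path(directory).iterdir() if not p.name.startswith("."))


def launches_per_file(n_log_lines, n_files):
    """Ratio of log lines to output files; about 1.0 if nothing is duplicated."""
    if n_files == 0:
        raise ValueError("no output files found")
    return n_log_lines / n_files


if __name__ == "__main__":
    # Paths as they appear in the notes; adjust to the run being checked.
    started = count_matching_lines("slurm_aug_cdx_49_0-359-out")
    produced = count_visible_entries("CC-MAIN-2019-35/aug_cdx/49")
    print(started, produced, launches_per_file(started, produced))
```

With the figures above, 372 lines from ls -lt means 371 entries once the "total" line is dropped, and 741 / 371 is just under 2. That agrees with every file being produced twice, provided the one-line-per-file assumption holds.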