log

date author description
2021-03-14 Henry S. Thompson environment improvements
2021-03-03 Henry S. Thompson trying to move to slurm
2020-05-09 Henry S. Thompson improved F handling/logging
2020-05-08 Henry S. Thompson keep separate antecedents separate, buggy?
2020-05-07 Henry S. Thompson track redirects, need to use full crawldiagnostics.warc.gz for "location:" and "Uri:"
2020-05-07 Henry S. Thompson refactor, change summary print (problem?)
2020-05-06 Henry S. Thompson bare framework working
2020-05-06 Henry S. Thompson starting on tool to assemble as complete as we have info wrt a seed URI
2020-05-06 Henry S. Thompson use local .m2/repository for Hadoop 3.4.0
2020-05-06 Henry S. Thompson works for big files with Hadoop 3.4.0
2020-05-06 Henry S. Thompson x
2020-04-28 Henry S. Thompson log truncations
2020-04-28 Henry S. Thompson impose some limits
2020-04-28 Henry S. Thompson x
2020-04-24 Henry S. Thompson x
2020-04-24 Henry S. Thompson mostly from Sebastian
2020-04-24 Henry S. Thompson misc
2020-04-24 Henry S. Thompson misc
2020-04-24 Henry S. Thompson fix from Sebastian
2020-04-24 Henry S. Thompson misc
2020-04-24 Henry S. Thompson misc
2020-04-24 Henry S. Thompson several (hopefully) efficiency tweaks
2020-04-23 Henry S. Thompson x
2020-04-23 Henry S. Thompson switch for use on login server, invoke by hand with 0/1 as only cmd line arg
2020-04-22 Henry S. Thompson java stuff
2020-04-22 Henry S. Thompson try nutch fetch for big pdfs
2020-04-15 Henry S. Thompson final, most general version
2020-04-14 Henry S. Thompson too big for /dev/shm, split in half
2020-04-14 Henry S. Thompson one-off to convert big extracts.tar into lots of smaller ones
2020-04-13 Henry S. Thompson as used successfully for 3rd run
2020-04-13 Henry S. Thompson ready to try another pass with robust diff checking
2020-04-13 Henry S. Thompson working towards more robust diff checking
2020-04-11 Henry S. Thompson a few tweaks after 2nd parallel run
2020-04-10 Henry S. Thompson another few log fixes
2020-04-10 Henry S. Thompson as running, modulo 1 log output wrong
2020-04-10 Henry S. Thompson log more, work around more glitches
2020-04-10 Henry S. Thompson x
2020-04-08 Henry S. Thompson start trying to work around failures
2020-04-08 Henry S. Thompson parallelised version of reExtract.sh
2020-04-07 Henry S. Thompson complete change of array var construction, used it for log file names too; tar update enabled, so maybe complete, but without any parallelism
2020-04-04 Henry S. Thompson added computation of required additions to tar file, but not actually added
2020-04-03 Henry S. Thompson refactored, not tested
2020-04-03 Henry S. Thompson done through re-extraction, fixing tars still to come
2020-04-02 Henry S. Thompson sketching more
2020-04-02 Henry S. Thompson towards re-running extraction in part
2020-04-02 Henry S. Thompson up the time limit
2020-04-02 Henry S. Thompson clean up after ourselves
2020-03-26 Henry S. Thompson fixed scope problem in tar step
2020-03-26 Henry S. Thompson sync up filenames and log names
2020-03-26 Henry S. Thompson pass through extract args
2020-03-24 Henry S. Thompson towards sub-division of resulting tar files
2020-03-24 Henry S. Thompson not relevant
2020-03-19 Henry S. Thompson x
2020-03-19 Henry S. Thompson better quoting
2020-03-18 Henry S. Thompson try to fix multi-line lossage
2020-03-18 Henry S. Thompson fix missing use of $t