log

age author description
2021-04-14 Henry S. Thompson working with one input
2021-04-14 Henry S. Thompson -w and -h working
2021-04-13 Henry S. Thompson working on flags
2021-04-13 Henry S. Thompson new
2021-03-16 Henry S. Thompson working with locking and copying
2021-03-15 Henry S. Thompson working for -t 2 -c 2
2021-03-15 Henry S. Thompson minor
2021-03-14 Henry S. Thompson prepare for real parallel distribution
2021-03-14 Henry S. Thompson environment improvements
2021-03-03 Henry S. Thompson trying to move to slurm
2020-05-09 Henry S. Thompson improved F handling/logging
2020-05-08 Henry S. Thompson keep separate antecedents separate, buggy?
2020-05-07 Henry S. Thompson track redirects, need to use full crawldiagnostics.warc.gz for "location:" and "Uri:"
2020-05-07 Henry S. Thompson refactor, change summary print (problem?)
2020-05-06 Henry S. Thompson bare framework working
2020-05-06 Henry S. Thompson starting on tool to assemble as complete as we have info wrt a seed URI
2020-05-06 Henry S. Thompson use local .m2/repository for Hadoop 3.4.0
2020-05-06 Henry S. Thompson works for big files with Hadoop 3.4.0
2020-05-06 Henry S. Thompson x
2020-04-28 Henry S. Thompson log truncations
2020-04-28 Henry S. Thompson impose some limits
2020-04-28 Henry S. Thompson x
2020-04-24 Henry S. Thompson x
2020-04-24 Henry S. Thompson mostly from Sebastian
2020-04-24 Henry S. Thompson misc
2020-04-24 Henry S. Thompson misc
2020-04-24 Henry S. Thompson fix from Sebastian
2020-04-24 Henry S. Thompson misc
2020-04-24 Henry S. Thompson misc
2020-04-24 Henry S. Thompson several efficiency (hopefully) tweaks
2020-04-23 Henry S. Thompson x
2020-04-23 Henry S. Thompson switch for use on login server, invoke by hand with 0/1 as only cmd line arg
2020-04-22 Henry S. Thompson java stuff
2020-04-22 Henry S. Thompson try nutch fetch for big pdfs
2020-04-15 Henry S. Thompson final most general version
2020-04-14 Henry S. Thompson too big for /dev/shm, split in half
2020-04-14 Henry S. Thompson one-off to convert big extracts.tar into lots of smaller ones
2020-04-13 Henry S. Thompson as used successfully for 3rd run
2020-04-13 Henry S. Thompson ready to try another pass with robust diff checking
2020-04-13 Henry S. Thompson working towards more robust diff checking
2020-04-11 Henry S. Thompson a few tweaks after 2nd parallel run
2020-04-10 Henry S. Thompson another few log fixes
2020-04-10 Henry S. Thompson as running, modulo 1 log output wrong
2020-04-10 Henry S. Thompson log more, work around more glitches
2020-04-10 Henry S. Thompson x
2020-04-08 Henry S. Thompson start trying to work around failures
2020-04-08 Henry S. Thompson parallelised version of reExtract.sh
2020-04-07 Henry S. Thompson complete change of array var construction, used it for log file names too, tar update enabled, so maybe complete but w/o any parallel