log

date author description
2021-04-26 Henry S. Thompson warc and headers parts working
2021-04-22 Henry S. Thompson back to IGzipFile
2021-04-22 Henry S. Thompson approved Popen version using .communicate
2021-04-22 Henry S. Thompson using Popen to run igzip (also not great)
2021-04-20 Henry S. Thompson added support for copying to/using /dev/shm or /tmp
2021-04-20 Henry S. Thompson working with -x and rich directory structure
2021-04-20 Henry S. Thompson convert to rich directory structure per 2019-35
2021-04-19 Henry S. Thompson -x barely working
2021-04-19 Henry S. Thompson never should have added
2021-04-19 Henry S. Thompson better dd error handling
2021-04-19 Henry S. Thompson (none)
2021-04-18 Henry S. Thompson bare minimum working
2021-04-16 Henry S. Thompson triple args checked, filename opened
2021-04-16 Henry S. Thompson help format hacking done
2021-04-16 Henry S. Thompson basic help format hacking works
2021-04-16 Henry S. Thompson (none)
2021-04-16 Henry S. Thompson (none)
2021-04-15 Henry S. Thompson just struggling with argparse
2021-04-15 Henry S. Thompson support a command to receive each result,
2021-04-14 Henry S. Thompson accepts index lines, less line-at-a-time
2021-04-14 Henry S. Thompson working with one input
2021-04-14 Henry S. Thompson -w and -h working
2021-04-13 Henry S. Thompson working on flags
2021-04-13 Henry S. Thompson new
2021-03-16 Henry S. Thompson working with locking and copying
2021-03-15 Henry S. Thompson working for -t 2 -c 2
2021-03-15 Henry S. Thompson minor
2021-03-14 Henry S. Thompson prepare for real parallel distribution
2021-03-14 Henry S. Thompson environment improvements
2021-03-03 Henry S. Thompson trying to move to slurm
2020-05-09 Henry S. Thompson improved F handling/logging
2020-05-08 Henry S. Thompson keep separate antecedents separate, buggy?
2020-05-07 Henry S. Thompson track redirects, need to use full crawldiagnostics.warc.gz for "location:" and "Uri:"
2020-05-07 Henry S. Thompson refactor, change summary print (problem?)
2020-05-06 Henry S. Thompson bare framework working
2020-05-06 Henry S. Thompson starting on tool to assemble as complete as we have info wrt a seed URI
2020-05-06 Henry S. Thompson use local .m2/repository for Hadoop 3.4.0
2020-05-06 Henry S. Thompson works for big files with Hadoop 3.4.0
2020-05-06 Henry S. Thompson x
2020-04-28 Henry S. Thompson log truncations
2020-04-28 Henry S. Thompson impose some limits
2020-04-28 Henry S. Thompson x
2020-04-24 Henry S. Thompson x
2020-04-24 Henry S. Thompson mostly from Sebastian
2020-04-24 Henry S. Thompson misc
2020-04-24 Henry S. Thompson misc
2020-04-24 Henry S. Thompson fix from Sebastian
2020-04-24 Henry S. Thompson misc
2020-04-24 Henry S. Thompson misc
2020-04-24 Henry S. Thompson several (hopefully) efficiency tweaks
2020-04-23 Henry S. Thompson x
2020-04-23 Henry S. Thompson switch for use on login server, invoke by hand with 0/1 as only cmd line arg
2020-04-22 Henry S. Thompson java stuff
2020-04-22 Henry S. Thompson try nutch fetch for big pdfs
2020-04-15 Henry S. Thompson final, most general version
2020-04-14 Henry S. Thompson too big for /dev/shm, split in half
2020-04-14 Henry S. Thompson one-off to convert big extracts.tar into lots of smaller ones
2020-04-13 Henry S. Thompson as used successfully for 3rd run
2020-04-13 Henry S. Thompson ready to try another pass with robust diff checking
2020-04-13 Henry S. Thompson working towards more robust diff checking
2020-04-11 Henry S. Thompson a few tweaks after 2nd parallel run
2020-04-10 Henry S. Thompson another few log fixes
2020-04-10 Henry S. Thompson as running, modulo 1 log output wrong
2020-04-10 Henry S. Thompson log more, work around more glitches