log

age author description
2022-07-01 Henry Thompson for 2022 exercise
2021-11-17 Henry S. Thompson instead of csv
2021-11-01 Henry S. Thompson add -c switch to btot
2021-10-28 Henry S. Thompson use sqlite3 just to tabulate
2021-10-26 Henry S. Thompson fixed
2021-10-26 Henry S. Thompson working, with compound driver files
2021-10-25 Henry S. Thompson better comments
2021-10-25 Henry S. Thompson do the work for cdx2sql
2021-10-25 Henry S. Thompson change test to use Master
2021-10-22 Henry S. Thompson works for 0--9
2021-10-21 Henry S. Thompson replace too-complex invocation of cdx2tsv
2021-10-20 Henry S. Thompson basic, works
2021-10-20 Henry S. Thompson too clever by half, keys won't work in parallel for e.g. media types
2021-10-19 Henry S. Thompson working, w. pickle
2021-10-19 Henry S. Thompson mail-lib
2021-10-19 Henry S. Thompson move to ec164.guest
2021-07-23 Henry S. Thompson fixed bug(s) wrt large payload files
2021-07-23 Henry S. Thompson just barely working
2021-07-21 Henry S. Thompson add cl arg --fpath replacing FPAT, which is now default value
2021-07-21 Henry S. Thompson more paths
2021-07-14 Henry S. Thompson add usage/help info
2021-07-14 Henry S. Thompson add usage/help info
2021-07-14 Henry S. Thompson parameterise the temp file and move it to /dev/shm
2021-07-14 Henry S. Thompson sic
2021-07-09 Henry S. Thompson use printf safely
2021-07-09 Henry S. Thompson handle multiple L-M lines :-(
2021-07-09 Henry S. Thompson improve error handling
2021-07-09 Henry S. Thompson more focussed, better SLURM_... vars
2021-06-29 Henry S. Thompson bits and pieces
2021-06-29 Henry S. Thompson better btot
2021-06-28 Henry S. Thompson extract Last Modified via cdx
2021-06-28 Henry S. Thompson fix path to qpdf
2021-06-28 Henry S. Thompson silently skip robotstxt
2021-06-28 Henry S. Thompson workaround histcontrol
2021-06-28 Henry S. Thompson support field edit
2021-06-28 Henry S. Thompson for use in processing CC index files
2021-06-16 Henry S. Thompson implement --cmd
2021-06-16 Henry S. Thompson qpdf needs LD_LIB_PATH
2021-06-15 Henry S. Thompson refactor final processing loop,
2021-06-15 Henry S. Thompson frame size
2021-06-15 Henry S. Thompson include sh-script
2021-04-26 Henry S. Thompson all parts working, idempotency achieved
2021-04-26 Henry S. Thompson debugging
2021-04-26 Henry S. Thompson (none)
2021-04-26 Henry S. Thompson warc and headers parts working
2021-04-22 Henry S. Thompson back to IGzipFile
2021-04-22 Henry S. Thompson approved Popen version using .communicate
2021-04-22 Henry S. Thompson using Popen to run igzip (also not great)
2021-04-20 Henry S. Thompson added support for copying to/using /dev/shm or /tmp
2021-04-20 Henry S. Thompson working with -x and rich directory structure
2021-04-20 Henry S. Thompson convert to rich directory structure per 2019-35
2021-04-19 Henry S. Thompson -x barely working
2021-04-19 Henry S. Thompson never should have added
2021-04-19 Henry S. Thompson better dd error handling
2021-04-19 Henry S. Thompson (none)
2021-04-18 Henry S. Thompson bare minimum working
2021-04-16 Henry S. Thompson triple args checked, filename opened
2021-04-16 Henry S. Thompson help format hacking done
2021-04-16 Henry S. Thompson basic help format hacking works
2021-04-16 Henry S. Thompson (none)
2021-04-16 Henry S. Thompson (none)
2021-04-15 Henry S. Thompson just struggling with argparse
2021-04-15 Henry S. Thompson support a command to receive each result,
2021-04-14 Henry S. Thompson accepts index lines, less line-at-a-time
2021-04-14 Henry S. Thompson working with one input
2021-04-14 Henry S. Thompson -w and -h working
2021-04-13 Henry S. Thompson working on flags
2021-04-13 Henry S. Thompson new
2021-03-16 Henry S. Thompson working with locking and copying
2021-03-15 Henry S. Thompson working for -t 2 -c 2
2021-03-15 Henry S. Thompson minor
2021-03-14 Henry S. Thompson prepare for real parallel distribution
2021-03-14 Henry S. Thompson environment improvements
2021-03-03 Henry S. Thompson trying to move to slurm
2020-05-09 Henry S. Thompson improved F handling/logging
2020-05-08 Henry S. Thompson keep separate antecedents separate, buggy?
2020-05-07 Henry S. Thompson track redirects, need to use full crawldiagnostics.warc.gz for "location:" and "Uri:"
2020-05-07 Henry S. Thompson refactor, change summary print (problem?)
2020-05-06 Henry S. Thompson bare framework working
2020-05-06 Henry S. Thompson starting on tool to assemble as complete as we have info wrt a seed URI
2020-05-06 Henry S. Thompson use local .m2/repository for Hadoop 3.4.0
2020-05-06 Henry S. Thompson works for big files with Hadoop 3.4.0
2020-05-06 Henry S. Thompson x
2020-04-28 Henry S. Thompson log truncations
2020-04-28 Henry S. Thompson impose some limits
2020-04-28 Henry S. Thompson x
2020-04-24 Henry S. Thompson x
2020-04-24 Henry S. Thompson mostly from Sebastian
2020-04-24 Henry S. Thompson misc
2020-04-24 Henry S. Thompson misc
2020-04-24 Henry S. Thompson fix from Sebastian
2020-04-24 Henry S. Thompson misc
2020-04-24 Henry S. Thompson misc
2020-04-24 Henry S. Thompson several efficiency (hopefully) tweaks
2020-04-23 Henry S. Thompson x
2020-04-23 Henry S. Thompson switch for use on login server, invoke by hand with 0/1 as only cmd line arg