2021-07-14 |
Henry S. Thompson |
add usage/help info
|
2021-07-14 |
Henry S. Thompson |
parameterise the temp file and move it to /dev/shm
|
2021-07-14 |
Henry S. Thompson |
sic
|
2021-07-09 |
Henry S. Thompson |
use printf safely
|
2021-07-09 |
Henry S. Thompson |
handle multiple L-M lines :-(
|
2021-07-09 |
Henry S. Thompson |
improve error handling
|
2021-07-09 |
Henry S. Thompson |
more focussed, better SLURM_... vars
|
2021-06-29 |
Henry S. Thompson |
bits and pieces
|
2021-06-29 |
Henry S. Thompson |
better btot
|
2021-06-28 |
Henry S. Thompson |
extract Last Modified via cdx
|
2021-06-28 |
Henry S. Thompson |
fix path to qpdf
|
2021-06-28 |
Henry S. Thompson |
silently skip robotstxt
|
2021-06-28 |
Henry S. Thompson |
workaround histcontrol
|
2021-06-28 |
Henry S. Thompson |
support field edit
|
2021-06-28 |
Henry S. Thompson |
for use in processing CC index files
|
2021-06-16 |
Henry S. Thompson |
implement --cmd
|
2021-06-16 |
Henry S. Thompson |
qpdf needs LD_LIB_PATH
|
2021-06-15 |
Henry S. Thompson |
refactor final processing loop,
|
2021-06-15 |
Henry S. Thompson |
frame size
|
2021-06-15 |
Henry S. Thompson |
include sh-script
|
2021-04-26 |
Henry S. Thompson |
all parts working, idempotency achieved
|
2021-04-26 |
Henry S. Thompson |
debugging
|
2021-04-26 |
Henry S. Thompson |
(none)
|
2021-04-26 |
Henry S. Thompson |
warc and headers parts working
|
2021-04-22 |
Henry S. Thompson |
back to IGzipFile
|
2021-04-22 |
Henry S. Thompson |
approved Popen version using .communicate
|
2021-04-22 |
Henry S. Thompson |
using Popen to run igzip (also not great)
|
2021-04-20 |
Henry S. Thompson |
added support for copying to/using /dev/shm or /tmp
|
2021-04-20 |
Henry S. Thompson |
working with -x and rich directory structure
|
2021-04-20 |
Henry S. Thompson |
convert to rich directory structure per 2019-35
|
2021-04-19 |
Henry S. Thompson |
-x barely working
|
2021-04-19 |
Henry S. Thompson |
never should have added
|
2021-04-19 |
Henry S. Thompson |
better dd error handling
|
2021-04-19 |
Henry S. Thompson |
(none)
|
2021-04-18 |
Henry S. Thompson |
bare minimum working
|
2021-04-16 |
Henry S. Thompson |
triple args checked, filename opened
|
2021-04-16 |
Henry S. Thompson |
help format hacking done
|
2021-04-16 |
Henry S. Thompson |
basic help format hacking works
|
2021-04-16 |
Henry S. Thompson |
(none)
|
2021-04-16 |
Henry S. Thompson |
(none)
|
2021-04-15 |
Henry S. Thompson |
just struggling with argparse
|
2021-04-15 |
Henry S. Thompson |
support a command to receive each result,
|
2021-04-14 |
Henry S. Thompson |
accepts index lines, less line-at-a-time
|
2021-04-14 |
Henry S. Thompson |
working with one input
|
2021-04-14 |
Henry S. Thompson |
-w and -h working
|
2021-04-13 |
Henry S. Thompson |
working on flags
|
2021-04-13 |
Henry S. Thompson |
new
|
2021-03-16 |
Henry S. Thompson |
working with locking and copying
|
2021-03-15 |
Henry S. Thompson |
working for -t 2 -c 2
|
2021-03-15 |
Henry S. Thompson |
minor
|
2021-03-14 |
Henry S. Thompson |
prepare for real parallel distribution
|
2021-03-14 |
Henry S. Thompson |
environment improvements
|
2021-03-03 |
Henry S. Thompson |
trying to move to slurm
|
2020-05-09 |
Henry S. Thompson |
improved F handling/logging
|
2020-05-08 |
Henry S. Thompson |
keep separate antecedents separate, buggy?
|
2020-05-07 |
Henry S. Thompson |
track redirects, need to use full crawldiagnostics.warc.gz for "location:" and "Uri:"
|
2020-05-07 |
Henry S. Thompson |
refactor, change summary print (problem?)
|
2020-05-06 |
Henry S. Thompson |
bare framework working
|
2020-05-06 |
Henry S. Thompson |
starting on tool to assemble as complete as we have info wrt a seed URI
|
2020-05-06 |
Henry S. Thompson |
use local .m2/repository for Hadoop 3.4.0
|