2022-07-01 |
Henry Thompson |
for 2022 exercise
|
2021-11-17 |
Henry S. Thompson |
instead of csv
|
2021-11-01 |
Henry S. Thompson |
add -c switch to btot
|
2021-10-28 |
Henry S. Thompson |
use sqlite3 just to tabulate
|
2021-10-26 |
Henry S. Thompson |
fixed
|
2021-10-26 |
Henry S. Thompson |
working, with compound driver files
|
2021-10-25 |
Henry S. Thompson |
better comments
|
2021-10-25 |
Henry S. Thompson |
do the work for cdx2sql
|
2021-10-25 |
Henry S. Thompson |
change test to use Master
|
2021-10-22 |
Henry S. Thompson |
works for 0--9
|
2021-10-21 |
Henry S. Thompson |
replace too-complex invocation of cdx2tsv
|
2021-10-20 |
Henry S. Thompson |
basic, works
|
2021-10-20 |
Henry S. Thompson |
too clever by half, keys won't work in parallel for e.g. media types
|
2021-10-19 |
Henry S. Thompson |
working, w. pickle
|
2021-10-19 |
Henry S. Thompson |
mail-lib
|
2021-10-19 |
Henry S. Thompson |
move to ec164.guest
|
2021-07-23 |
Henry S. Thompson |
fixed bug(s) wrt large payload files
|
2021-07-23 |
Henry S. Thompson |
just barely working
|
2021-07-21 |
Henry S. Thompson |
add cl arg --fpath replacing FPAT, which is now default value
|
2021-07-21 |
Henry S. Thompson |
more paths
|
2021-07-14 |
Henry S. Thompson |
add usage/help info
|
2021-07-14 |
Henry S. Thompson |
add usage/help info
|
2021-07-14 |
Henry S. Thompson |
parameterise the temp file and move it to /dev/shm
|
2021-07-14 |
Henry S. Thompson |
sic
|
2021-07-09 |
Henry S. Thompson |
use printf safely
|
2021-07-09 |
Henry S. Thompson |
handle multiple L-M lines :-(
|
2021-07-09 |
Henry S. Thompson |
improve error handling
|
2021-07-09 |
Henry S. Thompson |
more focussed, better SLURM_... vars
|
2021-06-29 |
Henry S. Thompson |
bits and pieces
|
2021-06-29 |
Henry S. Thompson |
better btot
|
2021-06-28 |
Henry S. Thompson |
extract Last Modified via cdx
|
2021-06-28 |
Henry S. Thompson |
fix path to qpdf
|
2021-06-28 |
Henry S. Thompson |
silently skip robotstxt
|
2021-06-28 |
Henry S. Thompson |
workaround histcontrol
|
2021-06-28 |
Henry S. Thompson |
support field edit
|
2021-06-28 |
Henry S. Thompson |
for use in processing CC index files
|
2021-06-16 |
Henry S. Thompson |
implement --cmd
|
2021-06-16 |
Henry S. Thompson |
qpdf needs LD_LIB_PATH
|
2021-06-15 |
Henry S. Thompson |
refactor final processing loop,
|
2021-06-15 |
Henry S. Thompson |
frame size
|
2021-06-15 |
Henry S. Thompson |
include sh-script
|
2021-04-26 |
Henry S. Thompson |
all parts working, idempotency achieved
|
2021-04-26 |
Henry S. Thompson |
debugging
|
2021-04-26 |
Henry S. Thompson |
(none)
|
2021-04-26 |
Henry S. Thompson |
warc and headers parts working
|
2021-04-22 |
Henry S. Thompson |
back to IGzipFile
|
2021-04-22 |
Henry S. Thompson |
approved Popen version using .communicate
|
2021-04-22 |
Henry S. Thompson |
using Popen to run igzip (also not great)
|
2021-04-20 |
Henry S. Thompson |
added support for copying to/using /dev/shm or /tmp
|
2021-04-20 |
Henry S. Thompson |
working with -x and rich directory structure
|
2021-04-20 |
Henry S. Thompson |
convert to rich directory structure per 2019-35
|
2021-04-19 |
Henry S. Thompson |
-x barely working
|
2021-04-19 |
Henry S. Thompson |
never should have added
|
2021-04-19 |
Henry S. Thompson |
better dd error handling
|
2021-04-19 |
Henry S. Thompson |
(none)
|
2021-04-18 |
Henry S. Thompson |
bare minimum working
|
2021-04-16 |
Henry S. Thompson |
triple args checked, filename opened
|
2021-04-16 |
Henry S. Thompson |
help format hacking done
|
2021-04-16 |
Henry S. Thompson |
basic help format hacking works
|
2021-04-16 |
Henry S. Thompson |
(none)
|
2021-04-16 |
Henry S. Thompson |
(none)
|
2021-04-15 |
Henry S. Thompson |
just struggling with argparse
|
2021-04-15 |
Henry S. Thompson |
support a command to receive each result,
|
2021-04-14 |
Henry S. Thompson |
accepts index lines, less line-at-a-time
|
2021-04-14 |
Henry S. Thompson |
working with one input
|
2021-04-14 |
Henry S. Thompson |
-w and -h working
|
2021-04-13 |
Henry S. Thompson |
working on flags
|
2021-04-13 |
Henry S. Thompson |
new
|
2021-03-16 |
Henry S. Thompson |
working with locking and copying
|
2021-03-15 |
Henry S. Thompson |
working for -t 2 -c 2
|
2021-03-15 |
Henry S. Thompson |
minor
|
2021-03-14 |
Henry S. Thompson |
prepare for real parallel distribution
|
2021-03-14 |
Henry S. Thompson |
environment improvements
|
2021-03-03 |
Henry S. Thompson |
trying to move to slurm
|
2020-05-09 |
Henry S. Thompson |
improved F handling/logging
|
2020-05-08 |
Henry S. Thompson |
keep separate antecedents separate, buggy?
|
2020-05-07 |
Henry S. Thompson |
track redirects, need to use full crawldiagnostics.warc.gz for "location:" and "Uri:"
|
2020-05-07 |
Henry S. Thompson |
refactor, change summary print (problem?)
|
2020-05-06 |
Henry S. Thompson |
bare framework working
|
2020-05-06 |
Henry S. Thompson |
starting on tool to assemble as complete info as we have wrt a seed URI
|
2020-05-06 |
Henry S. Thompson |
use local .m2/repository for Hadoop 3.4.0
|
2020-05-06 |
Henry S. Thompson |
works for big files with Hadoop 3.4.0
|
2020-05-06 |
Henry S. Thompson |
x
|
2020-04-28 |
Henry S. Thompson |
log truncations
|
2020-04-28 |
Henry S. Thompson |
impose some limits
|
2020-04-28 |
Henry S. Thompson |
x
|
2020-04-24 |
Henry S. Thompson |
x
|
2020-04-24 |
Henry S. Thompson |
mostly from Sebastian
|
2020-04-24 |
Henry S. Thompson |
misc
|
2020-04-24 |
Henry S. Thompson |
misc
|
2020-04-24 |
Henry S. Thompson |
fix from Sebastian
|
2020-04-24 |
Henry S. Thompson |
misc
|
2020-04-24 |
Henry S. Thompson |
misc
|
2020-04-24 |
Henry S. Thompson |
several efficiency (hopefully) tweaks
|
2020-04-23 |
Henry S. Thompson |
x
|
2020-04-23 |
Henry S. Thompson |
switch for use on login server, invoke by hand with 0/1 as only cmd line arg
|