log

age author description
5 months ago Henry S. Thompson add lastmod to cdx lines,
12 months ago Henry S. Thompson csing-related tweaks
15 months ago Henry S. Thompson too many overdue updates to break down
18 months ago Henry S. Thompson use csing, and _runme_c.sh to get it initialised
18 months ago Henry S. Thompson MANPATH (?)
18 months ago Henry S. Thompson tab completion fix
19 months ago Henry S. Thompson add support for multiple calls to srun with a counter
20 months ago Henry S. Thompson add private work bin dir to PATH
20 months ago Henry S. Thompson tweak UI: copy/paste and title bar
20 months ago Henry S. Thompson ec184 now, run w. unbuffered output
20 months ago Henry S. Thompson moved to work tree
20 months ago Henry S. Thompson working, about to move to work tree
20 months ago Henry S. Thompson working on implementing types and parts:
2023-01-10 Henry S. Thompson change account back
2022-07-28 Henry S. Thompson x
2022-07-28 Henry S. Thompson generalised sbatch front-end to cdx2tsv.py
2022-07-28 Henry S. Thompson x
2022-07-20 Henry S. Thompson add $W
2022-07-20 Henry S. Thompson new-style log notice
2022-07-20 Henry S. Thompson x
2022-07-18 Henry S. Thompson new style batch jobs, see cirrus_work repo for _xxx.sh
2022-07-18 Henry S. Thompson old style
2022-07-18 Henry S. Thompson symlink to dir doesn't work
2022-07-18 Henry S. Thompson work-path bin dir
2022-07-18 Henry S. Thompson previous approach to lang/field extraction
2022-07-18 Henry S. Thompson moved to shared/bin
2022-07-18 Henry S. Thompson x
2022-07-18 Henry S. Thompson x
2022-07-06 Henry S. Thompson demo of slurm usage using cdx2tsv.py
2022-07-06 Henry S. Thompson do whole line
2022-07-04 Henry S. Thompson no more gentoo,
2022-07-04 Henry S. Thompson allow use of global stash
2022-07-01 Henry Thompson for 2022 exercise
2021-11-17 Henry S. Thompson instead of csv
2021-11-01 Henry S. Thompson add -c switch to btot
2021-10-28 Henry S. Thompson use sqlite3 just to tabulate
2021-10-26 Henry S. Thompson fixed
2021-10-26 Henry S. Thompson working, with compound driver files
2021-10-25 Henry S. Thompson better comments
2021-10-25 Henry S. Thompson do the work for cdx2sql
2021-10-25 Henry S. Thompson change test to use Master
2021-10-22 Henry S. Thompson works for 0--9
2021-10-21 Henry S. Thompson replace too-complex invocation of cdx2tsv
2021-10-20 Henry S. Thompson basic, works
2021-10-20 Henry S. Thompson too clever by half, keys won't work in parallel for e.g. media types
2021-10-19 Henry S. Thompson working, w. pickle
2021-10-19 Henry S. Thompson mail-lib
2021-10-19 Henry S. Thompson move to ec164.guest
2021-07-23 Henry S. Thompson fixed bug(s) wrt large payload files
2021-07-23 Henry S. Thompson just barely working
2021-07-21 Henry S. Thompson add cl arg --fpath replacing FPAT, which is now default value
2021-07-21 Henry S. Thompson more paths
2021-07-14 Henry S. Thompson add usage/help info
2021-07-14 Henry S. Thompson add usage/help info
2021-07-14 Henry S. Thompson parameterise the temp file and move it to /dev/shm
2021-07-14 Henry S. Thompson sic
2021-07-09 Henry S. Thompson use printf safely
2021-07-09 Henry S. Thompson handle multiple L-M lines :-(
2021-07-09 Henry S. Thompson improve error handling
2021-07-09 Henry S. Thompson more focussed, better SLURM_... vars
2021-06-29 Henry S. Thompson bits and pieces
2021-06-29 Henry S. Thompson better btot
2021-06-28 Henry S. Thompson extract Last Modified via cdx
2021-06-28 Henry S. Thompson fix path to qpdf
2021-06-28 Henry S. Thompson silently skip robotstxt
2021-06-28 Henry S. Thompson workaround histcontrol
2021-06-28 Henry S. Thompson support field edit
2021-06-28 Henry S. Thompson for use in processing CC index files
2021-06-16 Henry S. Thompson implement --cmd
2021-06-16 Henry S. Thompson qpdf needs LD_LIB_PATH
2021-06-15 Henry S. Thompson refactor final processing loop,
2021-06-15 Henry S. Thompson frame size
2021-06-15 Henry S. Thompson include sh-script
2021-04-26 Henry S. Thompson all parts working, idempotency achieved
2021-04-26 Henry S. Thompson debugging
2021-04-26 Henry S. Thompson (none)
2021-04-26 Henry S. Thompson warc and headers parts working
2021-04-22 Henry S. Thompson back to IGzipFile
2021-04-22 Henry S. Thompson approved Popen version using .communicate
2021-04-22 Henry S. Thompson using Popen to run igzip (also not great)
2021-04-20 Henry S. Thompson added support for copying to/using /dev/shm or /tmp
2021-04-20 Henry S. Thompson working with -x and rich directory structure
2021-04-20 Henry S. Thompson convert to rich directory structure per 2019-35
2021-04-19 Henry S. Thompson -x barely working
2021-04-19 Henry S. Thompson never should have added
2021-04-19 Henry S. Thompson better dd error handling
2021-04-19 Henry S. Thompson (none)
2021-04-18 Henry S. Thompson bare minimum working
2021-04-16 Henry S. Thompson triple args checked, filename opened
2021-04-16 Henry S. Thompson help format hacking done
2021-04-16 Henry S. Thompson basic help format hacking works
2021-04-16 Henry S. Thompson (none)
2021-04-16 Henry S. Thompson (none)
2021-04-15 Henry S. Thompson just struggling with argparse
2021-04-15 Henry S. Thompson support a command to receive each result,
2021-04-14 Henry S. Thompson accepts index lines, less line-at-a-time
2021-04-14 Henry S. Thompson working with one input
2021-04-14 Henry S. Thompson -w and -h working
2021-04-13 Henry S. Thompson working on flags
2021-04-13 Henry S. Thompson new
2021-03-16 Henry S. Thompson working with locking and copying
2021-03-15 Henry S. Thompson working for -t 2 -c 2
2021-03-15 Henry S. Thompson minor
2021-03-14 Henry S. Thompson prepare for real parallel distribution
2021-03-14 Henry S. Thompson environment improvements
2021-03-03 Henry S. Thompson trying to move to slurm
2020-05-09 Henry S. Thompson improved F handling/logging
2020-05-08 Henry S. Thompson keep separate antecedents separate, buggy?
2020-05-07 Henry S. Thompson track redirects, need to use full crawldiagnostics.warc.gz for "location:" and "Uri:"
2020-05-07 Henry S. Thompson refactor, change summary print (problem?)
2020-05-06 Henry S. Thompson bare framework working
2020-05-06 Henry S. Thompson starting on tool to assemble as complete info as we have wrt a seed URI
2020-05-06 Henry S. Thompson use local .m2/repository for Hadoop 3.4.0
2020-05-06 Henry S. Thompson works for big files with Hadoop 3.4.0
2020-05-06 Henry S. Thompson x
2020-04-28 Henry S. Thompson log truncations
2020-04-28 Henry S. Thompson impose some limits
2020-04-28 Henry S. Thompson x
2020-04-24 Henry S. Thompson x
2020-04-24 Henry S. Thompson mostly from Sebastian
2020-04-24 Henry S. Thompson misc
2020-04-24 Henry S. Thompson misc
2020-04-24 Henry S. Thompson fix from Sebastian
2020-04-24 Henry S. Thompson misc
2020-04-24 Henry S. Thompson misc
2020-04-24 Henry S. Thompson several efficiency (hopefully) tweaks
2020-04-23 Henry S. Thompson x
2020-04-23 Henry S. Thompson switch for use on login server, invoke by hand with 0/1 as only cmd line arg