diff bin/00README @ 132:128b18459f9e
sic
author | Henry S. Thompson <ht@inf.ed.ac.uk> |
---|---|
date | Wed, 14 Jul 2021 15:30:29 +0000 |
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/bin/00README	Wed Jul 14 15:30:29 2021 +0000
@@ -0,0 +1,39 @@
+Various tools and bash function sources.
+
+All the tools will give useful output if run with a --help argument
+
+functions.sh    Source this in your .bashrc to get useful functions
+                including ux, lss and btot
+
+cdx2tsv.py      Extract fields and subparts from fields of a CDX-format
+                index file
+
+clm.sh          Intended for use as a sub-command to ix.py: Given an
+                HTML response header, appends to a given file the Last-Modified value
+                if there is one, otherwise a blank line.
+
+ix.py           Efficiently extract some or all of response data contents of
+                Common Crawl WARC-format files
+
+qpdf            Wrapper for locally compiled version.
+
+                Qpdf as supplied only works with a named file, but this
+                wrapper supports streamed input.
+                _If_ it's invoked as
+                    qpdf [args...] -
+                it takes input from stdin, saves it as /dev/shm/$USER/xxx.pdf
+                and runs
+                    qpdf args... /dev/shm/$USER/xxx.pdf
+
+                Qpdf is the best available PDF validator
+                as far as I know. See
+                  http://qpdf.sourceforge.net/files/qpdf-manual.html
+                for documentation.
+
+qpdf_check      Runs qpdf with all the arguments needed to
+                make it run as a validator: no corrections are applied,
+                no warnings are output,
+                fails iff there are any errors in the input file.
+
+                Uses the above qpdf wrapper, so supports input either
+                from stdin or a named file