view bin/00README @ 174:bfe9085a1d39

change account back
author Henry S. Thompson <ht@inf.ed.ac.uk>
date Tue, 10 Jan 2023 17:48:26 +0000
parents 128b18459f9e

Various tools and bash function sources.

All the tools print usage information when run with a --help argument.

functions.sh  Source this in your .bashrc to get useful functions
	      including ux, lss and btot

cdx2tsv.py    Extract fields and subparts from fields of a CDX-format
	      index file

clm.sh	      Intended for use as a sub-command to ix.py:  Given an
	      HTTP response header, appends the Last-Modified value to a
	      given file if there is one, otherwise appends a blank line.
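
              The behaviour described for clm.sh can be sketched roughly as
              follows.  This is an illustrative guess, not the actual script:
              the function name append_last_modified and the choice to read
              the header from stdin are assumptions.

```shell
# Hypothetical sketch, not the real clm.sh.
# Reads an HTTP response header on stdin and appends one line to file $1:
# the Last-Modified value if present, otherwise a blank line.
append_last_modified() {
  local out value
  out=$1
  # Header names are case-insensitive: take the first match, strip the
  # carriage return left over from CRLF line endings, then drop the
  # field name and any blanks that follow the colon.
  value=$(grep -i -m 1 '^Last-Modified:' | tr -d '\r' | sed 's/^[^:]*:[[:space:]]*//')
  # With no match, value is empty and this writes a blank line, so the
  # output file keeps exactly one line per response.
  printf '%s\n' "$value" >> "$out"
}
```

              Writing a blank line for missing values keeps the output file
              aligned line-for-line with the sequence of responses.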

ix.py	      Efficiently extract some or all of the response data from
	      Common Crawl WARC-format files

qpdf	      Wrapper for a locally compiled version of qpdf.

              Qpdf as supplied only works with a named file, but this
	      wrapper also supports streamed input.
	      _If_ it's invoked as
                  qpdf [args...] -
              it takes input from stdin, saves it as /dev/shm/$USER/xxx.pdf
	      and runs
                  qpdf [args...] /dev/shm/$USER/xxx.pdf

	      As far as I know, qpdf is the best available PDF validator.
	      See
	      http://qpdf.sourceforge.net/files/qpdf-manual.html
	      for documentation.
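
              The stdin convention described for the qpdf wrapper might look
              something like the sketch below.  It is not the actual wrapper:
              the function name qpdf_stdin_wrapper and the variables REAL_QPDF
              (the underlying binary) and QPDF_TMPDIR (the scratch directory,
              defaulting to /dev/shm/$USER) are assumptions made for
              illustration.

```shell
# Hypothetical sketch of the "trailing -" convention, not the real wrapper.
qpdf_stdin_wrapper() {
  local real dir last arg n i
  real=${REAL_QPDF:-qpdf}             # assumed: command for the real qpdf
  dir=${QPDF_TMPDIR:-/dev/shm/$USER}  # assumed override of the scratch dir
  last=
  for arg in "$@"; do last=$arg; done # remember the final argument
  if [ "$last" = "-" ]; then
    mkdir -p "$dir"
    cat > "$dir/xxx.pdf"              # save stdin under the fixed name
    # Rebuild the argument list without the trailing "-".  The loop's
    # word list is expanded once, before "set --" changes "$@".
    n=$#
    i=1
    for arg in "$@"; do
      [ "$i" -eq 1 ] && set --
      [ "$i" -lt "$n" ] && set -- "$@" "$arg"
      i=$((i + 1))
    done
    set -- "$@" "$dir/xxx.pdf"
  fi
  "$real" "$@"
}
```

              Because the saved file always has the same name, two concurrent
              invocations by one user would overwrite each other's input.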

qpdf_check    Runs qpdf with all the arguments needed to
	      make it run as a validator: no corrections are applied,
	      no warnings are output, and it
	      fails iff there are any errors in the input file.

              Uses the above qpdf wrapper, so supports input either
	      from stdin or from a named file.