diff bin/00README @ 132:128b18459f9e

sic
author Henry S. Thompson <ht@inf.ed.ac.uk>
date Wed, 14 Jul 2021 15:30:29 +0000
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/bin/00README	Wed Jul 14 15:30:29 2021 +0000
@@ -0,0 +1,39 @@
+Various tools and bash function sources.
+
+All the tools give useful output when run with a --help argument.
+
+functions.sh  Source this in your .bashrc to get useful functions
+	      including ux, lss and btot
+
+cdx2tsv.py    Extract fields and subparts from fields of a CDX-format
+	      index file
+
+clm.sh	      Intended for use as a sub-command to ix.py:  Given an
+	      HTTP response header, appends to a given file the Last-Modified
+	      value if there is one, otherwise a blank line.
+
+ix.py	      Efficiently extract some or all of the response data from
+	      Common Crawl WARC-format files
+
+qpdf	      Wrapper for locally compiled version.
+
+              Qpdf as supplied only works with a named file, but this
+	      wrapper supports streamed input.
+	      _If_ it's invoked as
+                  qpdf [args...] -
+              it takes input from stdin, saves it as /dev/shm/$USER/xxx.pdf
+	      and runs
+                  qpdf args... /dev/shm/$USER/xxx.pdf
+
+	      Qpdf is the best available PDF validator
+	      as far as I know.  See
+	      http://qpdf.sourceforge.net/files/qpdf-manual.html 
+	      for documentation.
+
+qpdf_check    Runs qpdf with all the arguments needed to
+	      make it run as a validator: no corrections are applied,
+	      no warnings are output, and it
+	      fails iff there are any errors in the input file.
+
+              Uses the above qpdf wrapper, so supports input either
+	      from stdin or a named file.
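
The stdin handling described for the qpdf wrapper can be sketched in Python. This is a minimal illustration, not the wrapper's actual source, which this diff does not show. The names resolve_stdin_arg, spool_dir and main, and the REAL_QPDF path, are hypothetical. Only the trailing "-" convention and the /dev/shm/$USER/xxx.pdf location come from the README text.

```python
import os
import sys

# Hypothetical location of the locally compiled binary (an assumption,
# not taken from the README).
REAL_QPDF = "/usr/local/bin/qpdf"


def resolve_stdin_arg(args, stream, spool_dir):
    """Return a copy of args with a trailing '-' replaced by a spooled file.

    If the last argument is '-', everything readable from the binary
    stream is saved as spool_dir/xxx.pdf and that path takes the place
    of the '-'.  Otherwise the arguments are returned unchanged.
    """
    if not args or args[-1] != "-":
        return list(args)
    os.makedirs(spool_dir, exist_ok=True)
    path = os.path.join(spool_dir, "xxx.pdf")
    with open(path, "wb") as out:
        out.write(stream.read())
    return list(args[:-1]) + [path]


def main():
    # Spool under /dev/shm/$USER, as the README describes.
    user = os.environ.get("USER", "unknown")
    spool_dir = os.path.join("/dev/shm", user)
    args = resolve_stdin_arg(sys.argv[1:], sys.stdin.buffer, spool_dir)
    # Replace this process with the real qpdf, passing the rewritten args.
    os.execv(REAL_QPDF, [REAL_QPDF] + args)
```

Calling main() from a script placed ahead of the real binary on PATH would give "qpdf [args...] -" the behaviour described above, while invocations that name a file pass through untouched.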