annotate bin/00README @ 174:bfe9085a1d39

change account back
author Henry S. Thompson <ht@inf.ed.ac.uk>
date Tue, 10 Jan 2023 17:48:26 +0000
parents 128b18459f9e
children
Various tools and bash function sources.

All the tools will give useful output if run with a --help argument

functions.sh  Source this in your .bashrc to get useful functions
              including ux, lss and btot
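
A minimal sketch of the .bashrc lines this entry asks for. The checkout location ($HOME/bin) and the CC_TOOLS_BIN variable are assumptions made for illustration; neither is defined by these tools.

```shell
# Hypothetical checkout location; CC_TOOLS_BIN is a made-up override variable.
funcs="${CC_TOOLS_BIN:-$HOME/bin}/functions.sh"

# Source the file only if it is readable, so that a missing checkout
# does not produce an error in every new shell.
if [ -r "$funcs" ]; then
  . "$funcs"
fi
```

Guarding on readability keeps the .bashrc usable on machines where the tools are not installed.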

cdx2tsv.py    Extract fields and subparts from fields of a CDX-format
              index file

clm.sh        Intended for use as a sub-command to ix.py: Given an
              HTTP response header, appends to a given file the Last-Modified
              value if there is one, otherwise a blank line.
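
The behaviour described for clm.sh can be sketched as below. This is not the script itself: the function name append_last_modified is hypothetical, and it is an assumption that the header arrives on stdin and that the output file is the first argument.

```shell
# append_last_modified OUTFILE   (hypothetical name, for illustration only)
# Reads a response header block from stdin and appends exactly one line to
# OUTFILE: the Last-Modified value if present, otherwise an empty line.
append_last_modified() {
  out=$1
  # Match the header name case-insensitively, strip the name and any
  # trailing carriage return, and stop at the first match.
  value=$(awk 'tolower($0) ~ /^last-modified:/ {
                 sub(/^[^:]*:[ \t]*/, "")
                 sub(/\r$/, "")
                 print
                 exit
               }')
  # With no match, value is empty and a blank line is appended.
  printf '%s\n' "$value" >> "$out"
}

# Example use (lm.txt is an arbitrary file name):
#   printf 'HTTP/1.1 200 OK\r\nLast-Modified: Tue, 10 Jan 2023 17:48:26 GMT\r\n\r\n' |
#     append_last_modified lm.txt
```

Writing one line per record, blank or not, keeps the output file aligned with the sequence of records processed.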

ix.py         Efficiently extract some or all of response data contents of
              Common Crawl WARC-format files

qpdf          Wrapper for locally compiled version.

              Qpdf as supplied only works with a named file, but this
              wrapper supports streamed input.
              _If_ it's invoked as
                 qpdf [args...] -
              it takes input from stdin, saves it as /dev/shm/$USER/xxx.pdf
              and runs
                 qpdf args... /dev/shm/$USER/xxx.pdf
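
The streamed-input handling described above might look roughly like the following. It is a sketch, not the wrapper's actual source: the function name qpdf_stream and the REAL_QPDF and QPDF_TMPDIR variables are hypothetical, added so the logic can be tried without the locally compiled binary.

```shell
# Sketch only: REAL_QPDF and QPDF_TMPDIR are hypothetical override variables.
qpdf_stream() {
  real=${REAL_QPDF:-qpdf}
  dir=${QPDF_TMPDIR:-/dev/shm/$USER}

  # Find the last argument.
  last=
  for arg in "$@"; do last=$arg; done

  if [ "$last" != "-" ]; then
    # Named-file case: pass everything through untouched.
    "$real" "$@"
    return
  fi

  # Streamed case: save stdin under the per-user directory.
  mkdir -p "$dir" || return
  cat > "$dir/xxx.pdf" || return

  # Rebuild the argument list without the trailing "-".
  # The loop's word list is expanded once, before "set --" changes it.
  n=$#
  i=1
  for arg in "$@"; do
    if [ "$i" -eq 1 ]; then set --; fi
    if [ "$i" -lt "$n" ]; then set -- "$@" "$arg"; fi
    i=$((i + 1))
  done

  "$real" "$@" "$dir/xxx.pdf"
}
```

The fixed file name xxx.pdf is taken literally from the description above, so in this sketch two invocations by the same user at the same time would overwrite each other's input.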

              Qpdf is the best available PDF validator
              as far as I know. See
                 http://qpdf.sourceforge.net/files/qpdf-manual.html
              for documentation.

qpdf_check    Runs qpdf with all the arguments needed to
              make it run as a validator: no corrections are applied,
              no warnings are output,
              fails iff there are any errors in the input file.

              Uses the above qpdf wrapper, so supports input either
              from stdin or a named file
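
One way such a validator call could be put together is sketched below. The arguments the real qpdf_check passes are not listed here, so the choice of options is an assumption, as are the function name qpdf_check_sketch and the QPDF_CMD variable. The options themselves (--check, --suppress-recovery, --no-warn) and the exit status 3 for "warnings but no errors" are taken from the qpdf manual.

```shell
# Sketch only: QPDF_CMD is a hypothetical variable naming the wrapper to call.
qpdf_check_sketch() {
  status=0
  # --check               check the file's structure
  # --suppress-recovery   do not try to recover a damaged file
  # --no-warn             do not print warnings
  "${QPDF_CMD:-qpdf}" --check --suppress-recovery --no-warn "$@" || status=$?

  # qpdf exits with status 3 for "warnings but no errors"; treat that as
  # success so that only real errors make the check fail.
  if [ "$status" -eq 3 ]; then
    return 0
  fi
  return "$status"
}
```

Because the command is looked up through QPDF_CMD, pointing it at a stdin-aware wrapper would let a trailing - argument work here as well.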