annotate workers/bin/_timedWhich.sh @ 68:1f04bce6ead7 default tip

use basefile instead of transferfile, and remove cleanup: belt and braces wrt lossage of sac_schemes.py in 15% of 1000_k3, this as used in a_2
author Henry S. Thompson <ht@markup.co.uk>
date Thu, 04 Jun 2020 20:44:44 +0000
parents d4f186655bcc
children
rev   line source
17
2a2c1fb03c54 first cut at http/https real trial, with month and year last-modified info too
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
1 #!/bin/bash
18
9631fca89cc6 F2-related stuff, and new experiment
Henry S. Thompson <ht@markup.co.uk>
parents: 17
diff changeset
2 egrep -o '("WARC-Target-URI":"https?:|"Last-Modified":"[^"]*")'|\
17
2a2c1fb03c54 first cut at http/https real trial, with month and year last-modified info too
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
3 egrep -o '(https?:|:".*"$)' |\
4 tr '\012' \# | sed 's/:#:/ /g'|tr \# '\012' | tr -d \"|\
19
d4f186655bcc lots of tweaking, reached the 80/20 point
Henry S. Thompson <ht@markup.co.uk>
parents: 18
diff changeset
5 sed ';s/gmt//ig;s/ [[:digit:]][[:digit:]]\?:[[:digit:]][[:digit:]]:[[:digit:]][[:digit:]]\(\.[[:digit:]]*\)\?\b//;s/^\(https\? \)\(: \)/\1/;s/ [MTWFSa-z]..\.\?, \?/ /;s/\( [[:upper:]][[:alnum:]]\{1,3\}\)\{1,2\}$//;s/ [-+][[:digit:]]\{4\}\b//;s/ [[:digit:]]\{1,2\} / /;s/ [[:upper:]][[:alnum:]]*\/[[:upper:]][[:alnum:]]*$//;s/ \+$//'|\
17
2a2c1fb03c54 first cut at http/https real trial, with month and year last-modified info too
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
6 awk '{c[$0]+=1} END {for (k in c) {print k, c[k]}}'
7
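
The join on source line 4 and the count on source line 6 are the least obvious steps of the pipeline, so here is a minimal sketch of both run on made-up input. The function names join_pairs and count_lines and the sample lines are assumptions for illustration, not part of the script. The sample skips the sed clean-up on source line 5 and feeds dates already reduced to month and year. The trailing sort is also an addition, since awk's for-in order is unspecified.

```shell
#!/bin/bash
# Illustrative sketch only: function names and sample data are hypothetical.

# As on source line 4: turn newlines into '#', replace each ':#:' seam
# between a scheme line and the date line after it with a space,
# restore the remaining newlines, then drop the double quotes.
# Assumes the data itself contains no '#' characters.
join_pairs() {
  tr '\012' '#' | sed 's/:#:/ /g' | tr '#' '\012' | tr -d '"'
}

# As on source line 6: count identical lines with an awk associative array.
# The sort is added here only to make the output order stable.
count_lines() {
  awk '{c[$0]+=1} END {for (k in c) {print k, c[k]}}' | LC_ALL=C sort
}

# Hypothetical input shaped like the output of the second egrep:
# a scheme line followed by a quoted value line.
printf '%s\n' \
  'https:' ':"Jun 2020"' \
  'http:' ':"May 2019"' \
  'https:' ':"Jun 2020"' |
  join_pairs | count_lines
# prints:
# http May 2019 1
# https Jun 2020 2
```

Each output line is a scheme, the date fields that survive, and the number of records that share them.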