diff workers/bin/_timedWhich.sh @ 17:2a2c1fb03c54
first cut at http/https real trial, with month and year last-modified info too
author | Henry S. Thompson <ht@markup.co.uk> |
---|---|
date | Fri, 19 Oct 2018 11:36:31 +0000 |
parents | |
children | 9631fca89cc6 |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/workers/bin/_timedWhich.sh Fri Oct 19 11:36:31 2018 +0000
@@ -0,0 +1,7 @@
+#!/bin/bash
+egrep -o '("WARC-Target-URI":"https?:|"Last-Modified":"[^"]*")'|\
+ egrep -o '(https?:|:".*"$)' |\
+ tr '\012' \# | sed 's/:#:/ /g'|tr \# '\012' | tr -d \"|\
+ sed 's/ [[:digit:]][[:digit:]]\?:[[:digit:]][[:digit:]]:[[:digit:]][[:digit:]] / /;s/\(https\? \)\(: \)\?[MTWFSa-z]..\.\?, \?/\1/;s/ \([-+][[:digit:]]\{4\}\|[[:upper:]]\{2,3\}\)$//;s/ [[:digit:]]\{1,2\} / /;s/\/[[:digit:]]\{1,2\}\/\([[:digit:]]\{4\}\)$/ \1/'|\
+awk '{c[$0]+=1} END {for (k in c) {print k, c[k]}}'
+
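To make the stages of the pipeline easier to follow, here is a minimal sketch that wraps the same commands in a function and runs them over three made-up records. The function name `timed_which`, the `sample` variable and the one-record-per-line layout are assumptions for illustration only; the script itself simply reads standard input. GNU grep, sed and awk are assumed, since the sed expressions rely on the GNU `\?` and `\|` extensions.

```shell
#!/bin/bash
# Hypothetical wrapper around the same stages as _timedWhich.sh, written
# with `grep -E` in place of `egrep`. Assumes GNU grep, sed and awk.
timed_which() {
  # 1. Keep only the URI scheme and the Last-Modified value of each record.
  grep -Eo '("WARC-Target-URI":"https?:|"Last-Modified":"[^"]*")' |
    grep -Eo '(https?:|:".*"$)' |
    # 2. Join each scheme with the date that follows it, then drop quotes.
    tr '\012' '#' | sed 's/:#:/ /g' | tr '#' '\012' | tr -d '"' |
    # 3. Strip time of day, weekday, time zone and day of month,
    #    leaving "scheme month year".
    sed 's/ [[:digit:]][[:digit:]]\?:[[:digit:]][[:digit:]]:[[:digit:]][[:digit:]] / /;s/\(https\? \)\(: \)\?[MTWFSa-z]..\.\?, \?/\1/;s/ \([-+][[:digit:]]\{4\}\|[[:upper:]]\{2,3\}\)$//;s/ [[:digit:]]\{1,2\} / /;s/\/[[:digit:]]\{1,2\}\/\([[:digit:]]\{4\}\)$/ \1/' |
    # 4. Count how often each distinct "scheme month year" line occurs.
    awk '{c[$0]+=1} END {for (k in c) {print k, c[k]}}'
}

# Made-up sample records, one per line (not real crawl data).
sample='{"WARC-Target-URI":"https://example.org/a","Last-Modified":"Tue, 15 Nov 1994 12:45:26 GMT"}
{"WARC-Target-URI":"http://example.com/b","Last-Modified":"Wed, 16 Nov 1994 08:00:00 GMT"}
{"WARC-Target-URI":"https://example.org/c","Last-Modified":"Thu, 17 Nov 1994 09:30:00 GMT"}'

# awk's for-in order is unspecified, so sort for a stable listing.
printf '%s\n' "$sample" | timed_which | LC_ALL=C sort
```

With this sample the sorted output is `http Nov 1994 1` followed by `https Nov 1994 2`: one count per combination of scheme, month and year. A record with no `Last-Modified` value would, as far as the expressions suggest, come through as a bare `http:` or `https:` key.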