Mercurial repository cc/cirrus_home: changeset graph
- add -c switch to btot
  Mon, 01 Nov 2021 21:23:13 +0000, by Henry S. Thompson
- use sqlite3 just to tabulate
  Thu, 28 Oct 2021 12:11:08 +0000, by Henry S. Thompson
- fixed
  Tue, 26 Oct 2021 14:07:34 +0000, by Henry S. Thompson
- working, with compound driver files
  Tue, 26 Oct 2021 14:05:35 +0000, by Henry S. Thompson
- better comments
  Mon, 25 Oct 2021 15:07:03 +0000, by Henry S. Thompson
- do the work for cdx2sql
  Mon, 25 Oct 2021 15:05:46 +0000, by Henry S. Thompson
- change test to use Master
  Mon, 25 Oct 2021 15:05:25 +0000, by Henry S. Thompson
- works for 0--9
  Fri, 22 Oct 2021 12:36:15 +0000, by Henry S. Thompson
- replace too-complex invocation of cdx2tsv
  Thu, 21 Oct 2021 19:18:47 +0000, by Henry S. Thompson
- basic, works
  Wed, 20 Oct 2021 17:14:18 +0000, by Henry S. Thompson
- too clever by half, keys won't work in parallel for e.g. media types
  Wed, 20 Oct 2021 15:47:55 +0000, by Henry S. Thompson
- working, w. pickle
  Tue, 19 Oct 2021 12:57:50 +0000, by Henry S. Thompson
- mail-lib
  Tue, 19 Oct 2021 12:56:14 +0000, by Henry S. Thompson
- move to ec164.guest
  Tue, 19 Oct 2021 12:55:30 +0000, by Henry S. Thompson
- fixed bug(s) wrt large payload files
  Fri, 23 Jul 2021 22:19:15 +0000, by Henry S. Thompson
- just barely working
  Fri, 23 Jul 2021 16:23:46 +0000, by Henry S. Thompson
- add cl arg --fpath replacing FPAT, which is now default value
  Wed, 21 Jul 2021 20:05:42 +0000, by Henry S. Thompson
- more paths
  Wed, 21 Jul 2021 20:04:11 +0000, by Henry S. Thompson
- add usage/help info
  Wed, 14 Jul 2021 16:50:30 +0000, by Henry S. Thompson
- add usage/help info
  Wed, 14 Jul 2021 16:49:54 +0000, by Henry S. Thompson
- parameterise the temp file and move it to /dev/shm
  Wed, 14 Jul 2021 16:49:35 +0000, by Henry S. Thompson
- sic
  Wed, 14 Jul 2021 15:30:29 +0000, by Henry S. Thompson
- use printf safely
  Fri, 09 Jul 2021 14:20:51 +0000, by Henry S. Thompson
- handle multiple L-M lines :-(
  Fri, 09 Jul 2021 13:46:10 +0000, by Henry S. Thompson
- improve error handling
  Fri, 09 Jul 2021 13:45:43 +0000, by Henry S. Thompson
- more focussed, better SLURM_... vars
  Fri, 09 Jul 2021 13:45:04 +0000, by Henry S. Thompson
- bits and pieces
  Tue, 29 Jun 2021 08:00:40 +0000, by Henry S. Thompson
- better btot
  Tue, 29 Jun 2021 07:53:47 +0000, by Henry S. Thompson
- extract Last Modified via cdx
  Mon, 28 Jun 2021 21:50:30 +0000, by Henry S. Thompson
- fix path to qpdf
  Mon, 28 Jun 2021 17:16:34 +0000, by Henry S. Thompson
- silently skip robotstxt
  Mon, 28 Jun 2021 17:16:15 +0000, by Henry S. Thompson
- workaround histcontrol
  Mon, 28 Jun 2021 17:15:19 +0000, by Henry S. Thompson
- support field edit
  Mon, 28 Jun 2021 15:40:10 +0000, by Henry S. Thompson
- for use in processing CC index files
  Mon, 28 Jun 2021 14:01:41 +0000, by Henry S. Thompson
- implement --cmd
  Wed, 16 Jun 2021 16:12:46 +0000, by Henry S. Thompson
- qpdf needs LD_LIB_PATH
  Wed, 16 Jun 2021 16:12:16 +0000, by Henry S. Thompson
- refactor final processing loop,
  Tue, 15 Jun 2021 18:04:34 +0000, by Henry S. Thompson
- frame size
  Tue, 15 Jun 2021 16:58:31 +0000, by Henry S. Thompson
- include sh-script
  Tue, 15 Jun 2021 16:58:03 +0000, by Henry S. Thompson
- all parts working, idempotency achieved
  Mon, 26 Apr 2021 17:18:29 +0000, by Henry S. Thompson
- debugging
  Mon, 26 Apr 2021 17:17:58 +0000, by Henry S. Thompson
- (none)
  Mon, 26 Apr 2021 17:17:38 +0000, by Henry S. Thompson
- warc and headers parts working
  Mon, 26 Apr 2021 15:28:23 +0000, by Henry S. Thompson
- back to IGzipFile
  Thu, 22 Apr 2021 21:31:03 +0000, by Henry S. Thompson
- approved Popen version using .communicate
  Thu, 22 Apr 2021 21:10:02 +0000, by Henry S. Thompson
- using Popen to run igzip (also not great)
  Thu, 22 Apr 2021 19:06:55 +0000, by Henry S. Thompson
- added support for copying to/using /dev/shm or /tmp
  Tue, 20 Apr 2021 19:11:57 +0000, by Henry S. Thompson
- working with -x and rich directory structure
  Tue, 20 Apr 2021 12:26:09 +0000, by Henry S. Thompson
- convert to rich directory structure per 2019-35
  Tue, 20 Apr 2021 11:12:35 +0000, by Henry S. Thompson
- -x barely working
  Mon, 19 Apr 2021 18:09:51 +0000, by Henry S. Thompson
- never should have added
  Mon, 19 Apr 2021 18:09:25 +0000, by Henry S. Thompson
- better dd error handling
  Mon, 19 Apr 2021 13:08:16 +0000, by Henry S. Thompson
- (none)
  Mon, 19 Apr 2021 13:07:58 +0000, by Henry S. Thompson
- bare minimum working
  Sun, 18 Apr 2021 17:03:45 +0000, by Henry S. Thompson
- triple args checked, filename opened
  Fri, 16 Apr 2021 18:28:00 +0000, by Henry S. Thompson
- help format hacking done
  Fri, 16 Apr 2021 13:15:23 +0000, by Henry S. Thompson
- basic help format hacking works
  Fri, 16 Apr 2021 12:55:05 +0000, by Henry S. Thompson
- (none)
  Fri, 16 Apr 2021 09:01:16 +0000, by Henry S. Thompson
- (none)
  Fri, 16 Apr 2021 09:00:17 +0000, by Henry S. Thompson
- just struggling with argparse
  Thu, 15 Apr 2021 19:22:27 +0000, by Henry S. Thompson
- support a command to receive each result,
  Thu, 15 Apr 2021 10:59:25 +0000, by Henry S. Thompson
- accepts index lines, less line-at-a-time
  Wed, 14 Apr 2021 20:15:32 +0000, by Henry S. Thompson
- working with one input
  Wed, 14 Apr 2021 10:08:41 +0000, by Henry S. Thompson
- -w and -h working
  Wed, 14 Apr 2021 08:57:43 +0000, by Henry S. Thompson
- working on flags
  Tue, 13 Apr 2021 17:52:31 +0000, by Henry S. Thompson
- new
  Tue, 13 Apr 2021 17:02:09 +0000, by Henry S. Thompson
- working with locking and copying
  Tue, 16 Mar 2021 16:20:02 +0000, by Henry S. Thompson
- working for -t 2 -c 2
  Mon, 15 Mar 2021 14:26:42 +0000, by Henry S. Thompson
- minor
  Mon, 15 Mar 2021 14:20:00 +0000, by Henry S. Thompson
- prepare for real parallel distribution
  Sun, 14 Mar 2021 21:28:02 +0000, by Henry S. Thompson
- environment improvements
  Sun, 14 Mar 2021 21:25:01 +0000, by Henry S. Thompson
- trying to move to slurm
  Wed, 03 Mar 2021 19:33:56 +0000, by Henry S. Thompson
- improved F handling/logging
  Sat, 09 May 2020 16:16:28 +0100, by Henry S. Thompson
- keep separate antecedents separate, buggy?
  Fri, 08 May 2020 19:52:36 +0100, by Henry S. Thompson
- track redirects, need to use full crawldiagnostics.warc.gz for "location:" and "Uri:"
  Thu, 07 May 2020 18:47:24 +0100, by Henry S. Thompson
- refactor, change summary print (problem?)
  Thu, 07 May 2020 11:33:24 +0100, by Henry S. Thompson
- bare framework working
  Wed, 06 May 2020 18:28:52 +0100, by Henry S. Thompson
- starting on tool to assemble as complete as we have info wrt a seed URI
  Wed, 06 May 2020 14:25:44 +0100, by Henry S. Thompson
- use local .m2/repository for Hadoop 3.4.0
  Wed, 06 May 2020 14:24:42 +0100, by Henry S. Thompson
- works for big files with Hadoop 3.4.0
  Wed, 06 May 2020 14:23:33 +0100, by Henry S. Thompson
- x
  Wed, 06 May 2020 14:22:48 +0100, by Henry S. Thompson
- log truncations
  Tue, 28 Apr 2020 19:02:34 +0100, by Henry S. Thompson
- impose some limits
  Tue, 28 Apr 2020 19:02:14 +0100, by Henry S. Thompson
- x
  Tue, 28 Apr 2020 19:01:41 +0100, by Henry S. Thompson
- x
  Fri, 24 Apr 2020 20:12:44 +0100, by Henry S. Thompson
- mostly from Sebastian
  Fri, 24 Apr 2020 20:12:29 +0100, by Henry S. Thompson
- misc
  Fri, 24 Apr 2020 20:03:29 +0100, by Henry S. Thompson
- misc
  Fri, 24 Apr 2020 20:01:35 +0100, by Henry S. Thompson
- fix from Sebastian
  Fri, 24 Apr 2020 20:01:25 +0100, by Henry S. Thompson
- misc
  Fri, 24 Apr 2020 19:57:16 +0100, by Henry S. Thompson
- misc
  Fri, 24 Apr 2020 19:55:11 +0100, by Henry S. Thompson
- several efficiency (hopefully) tweaks
  Fri, 24 Apr 2020 15:20:33 +0100, by Henry S. Thompson
- x
  Thu, 23 Apr 2020 17:26:55 +0100, by Henry S. Thompson
- switch for use on login server, invoke by hand with 0/1 as only cmd line arg
  Thu, 23 Apr 2020 17:25:25 +0100, by Henry S. Thompson
- java stuff
  Wed, 22 Apr 2020 18:42:40 +0100, by Henry S. Thompson
- try nutch fetch for big pdfs
  Wed, 22 Apr 2020 18:42:23 +0100, by Henry S. Thompson
- final most general version
  Wed, 15 Apr 2020 18:44:18 +0100, by Henry S. Thompson
- too big for /dev/shm, split in half
  Tue, 14 Apr 2020 17:52:34 +0100, by Henry S. Thompson
- one-off to convert big extracts.tar into lots of smaller ones
  Tue, 14 Apr 2020 16:10:22 +0100, by Henry S. Thompson
- as used successfully for 3rd run
  Mon, 13 Apr 2020 17:29:31 +0100, by Henry S. Thompson
- ready to try another pass with robust diff checking
  Mon, 13 Apr 2020 15:24:32 +0100, by Henry S. Thompson
- working towards more robust diff checking
  Mon, 13 Apr 2020 14:12:12 +0100, by Henry S. Thompson
- a few tweaks after 2nd parallel run
  Sat, 11 Apr 2020 13:41:46 +0100, by Henry S. Thompson
- another few log fixes
  Fri, 10 Apr 2020 18:45:30 +0100, by Henry S. Thompson
- as running, modulo 1 log output wrong
  Fri, 10 Apr 2020 18:42:08 +0100, by Henry S. Thompson
- log more, work around more glitches
  Fri, 10 Apr 2020 18:22:48 +0100, by Henry S. Thompson
- x
  Fri, 10 Apr 2020 18:22:24 +0100, by Henry S. Thompson
- start try to work around failures
  Wed, 08 Apr 2020 14:11:04 +0100, by Henry S. Thompson
- parallelised version of reExtract.sh
  Wed, 08 Apr 2020 11:27:33 +0100, by Henry S. Thompson
- complete change of array var construction, used it for log file names too, tar update enabled, so maybe complete but w/o any parallel
  Tue, 07 Apr 2020 18:00:29 +0100, by Henry S. Thompson
- added computation of required additions to tar file, but not actually added
  Sat, 04 Apr 2020 15:31:58 +0100, by Henry S. Thompson
- refactored, not tested
  Fri, 03 Apr 2020 19:04:06 +0100, by Henry S. Thompson
- done through re-extraction, fixing tars still to come
  Fri, 03 Apr 2020 17:35:17 +0100, by Henry S. Thompson
- sketching more
  Thu, 02 Apr 2020 19:21:21 +0100, by Henry S. Thompson
- towards re-running extraction in part
  Thu, 02 Apr 2020 19:14:23 +0100, by Henry S. Thompson
- up the time limit
  Thu, 02 Apr 2020 19:13:40 +0100, by Henry S. Thompson
- clean up after ourselves
  Thu, 02 Apr 2020 19:13:14 +0100, by Henry S. Thompson
- fixed scope pblm in tar step
  Thu, 26 Mar 2020 15:29:12 +0000, by Henry S. Thompson
- sync up filenames and log names,
  Thu, 26 Mar 2020 12:24:30 +0000, by Henry S. Thompson
- pass through extract args
  Thu, 26 Mar 2020 12:23:33 +0000, by Henry S. Thompson