log

age author description
Sun, 14 Mar 2021 21:28:02 +0000 Henry S. Thompson prepare for real parallel distribution
Sun, 14 Mar 2021 21:25:01 +0000 Henry S. Thompson environment improvements
Wed, 03 Mar 2021 19:33:56 +0000 Henry S. Thompson trying to move to slurm
Sat, 09 May 2020 16:16:28 +0100 Henry S. Thompson improved F handling/logging
Fri, 08 May 2020 19:52:36 +0100 Henry S. Thompson keep separate antecedents separate, buggy?
Thu, 07 May 2020 18:47:24 +0100 Henry S. Thompson track redirects, need to use full crawldiagnostics.warc.gz for "location:" and "Uri:"
Thu, 07 May 2020 11:33:24 +0100 Henry S. Thompson refactor, change summary print (problem?)
Wed, 06 May 2020 18:28:52 +0100 Henry S. Thompson bare framework working
Wed, 06 May 2020 14:25:44 +0100 Henry S. Thompson starting on tool to assemble as complete info as we have wrt a seed URI
Wed, 06 May 2020 14:24:42 +0100 Henry S. Thompson use local .m2/repository for Hadoop 3.4.0
Wed, 06 May 2020 14:23:33 +0100 Henry S. Thompson works for big files with Hadoop 3.4.0
Wed, 06 May 2020 14:22:48 +0100 Henry S. Thompson x
Tue, 28 Apr 2020 19:02:34 +0100 Henry S. Thompson log truncations
Tue, 28 Apr 2020 19:02:14 +0100 Henry S. Thompson impose some limits
Tue, 28 Apr 2020 19:01:41 +0100 Henry S. Thompson x
Fri, 24 Apr 2020 20:12:44 +0100 Henry S. Thompson x
Fri, 24 Apr 2020 20:12:29 +0100 Henry S. Thompson mostly from Sebastian
Fri, 24 Apr 2020 20:03:29 +0100 Henry S. Thompson misc
Fri, 24 Apr 2020 20:01:35 +0100 Henry S. Thompson misc
Fri, 24 Apr 2020 20:01:25 +0100 Henry S. Thompson fix from Sebastian
Fri, 24 Apr 2020 19:57:16 +0100 Henry S. Thompson misc
Fri, 24 Apr 2020 19:55:11 +0100 Henry S. Thompson misc
Fri, 24 Apr 2020 15:20:33 +0100 Henry S. Thompson several efficiency (hopefully) tweaks
Thu, 23 Apr 2020 17:26:55 +0100 Henry S. Thompson x
Thu, 23 Apr 2020 17:25:25 +0100 Henry S. Thompson switch for use on login server, invoke by hand with 0/1 as only cmd line arg
Wed, 22 Apr 2020 18:42:40 +0100 Henry S. Thompson java stuff
Wed, 22 Apr 2020 18:42:23 +0100 Henry S. Thompson try nutch fetch for big pdfs
Wed, 15 Apr 2020 18:44:18 +0100 Henry S. Thompson final most general version
Tue, 14 Apr 2020 17:52:34 +0100 Henry S. Thompson too big for /dev/shm, split in half
Tue, 14 Apr 2020 16:10:22 +0100 Henry S. Thompson one-off to convert big extracts.tar into lots of smaller ones