Mon, 26 Apr 2021 17:18:29 +0000 |
Henry S. Thompson |
all parts working, idempotency achieved
|
Mon, 26 Apr 2021 17:17:58 +0000 |
Henry S. Thompson |
debugging
|
Mon, 26 Apr 2021 17:17:38 +0000 |
Henry S. Thompson |
(none)
|
Mon, 26 Apr 2021 15:28:23 +0000 |
Henry S. Thompson |
warc and headers parts working
|
Thu, 22 Apr 2021 21:31:03 +0000 |
Henry S. Thompson |
back to IGzipFile
|
Thu, 22 Apr 2021 21:10:02 +0000 |
Henry S. Thompson |
approved Popen version using .communicate
|
Thu, 22 Apr 2021 19:06:55 +0000 |
Henry S. Thompson |
using Popen to run igzip (also not great)
|
Tue, 20 Apr 2021 19:11:57 +0000 |
Henry S. Thompson |
added support for copying to/using /dev/shm or /tmp
|
Tue, 20 Apr 2021 12:26:09 +0000 |
Henry S. Thompson |
working with -x and rich directory structure
|
Tue, 20 Apr 2021 11:12:35 +0000 |
Henry S. Thompson |
convert to rich directory structure per 2019-35
|
Mon, 19 Apr 2021 18:09:51 +0000 |
Henry S. Thompson |
-x barely working
|
Mon, 19 Apr 2021 18:09:25 +0000 |
Henry S. Thompson |
never should have added
|
Mon, 19 Apr 2021 13:08:16 +0000 |
Henry S. Thompson |
better dd error handling
|
Mon, 19 Apr 2021 13:07:58 +0000 |
Henry S. Thompson |
(none)
|
Sun, 18 Apr 2021 17:03:45 +0000 |
Henry S. Thompson |
bare minimum working
|
Fri, 16 Apr 2021 18:28:00 +0000 |
Henry S. Thompson |
triple args checked, filename opened
|
Fri, 16 Apr 2021 13:15:23 +0000 |
Henry S. Thompson |
help format hacking done
|
Fri, 16 Apr 2021 12:55:05 +0000 |
Henry S. Thompson |
basic help format hacking works
|
Fri, 16 Apr 2021 09:01:16 +0000 |
Henry S. Thompson |
(none)
|
Fri, 16 Apr 2021 09:00:17 +0000 |
Henry S. Thompson |
(none)
|
Thu, 15 Apr 2021 19:22:27 +0000 |
Henry S. Thompson |
just struggling with argparse
|
Thu, 15 Apr 2021 10:59:25 +0000 |
Henry S. Thompson |
support a command to receive each result,
|
Wed, 14 Apr 2021 20:15:32 +0000 |
Henry S. Thompson |
accepts index lines, less line-at-a-time
|
Wed, 14 Apr 2021 10:08:41 +0000 |
Henry S. Thompson |
working with one input
|
Wed, 14 Apr 2021 08:57:43 +0000 |
Henry S. Thompson |
-w and -h working
|
Tue, 13 Apr 2021 17:52:31 +0000 |
Henry S. Thompson |
working on flags
|
Tue, 13 Apr 2021 17:02:09 +0000 |
Henry S. Thompson |
new
|
Tue, 16 Mar 2021 16:20:02 +0000 |
Henry S. Thompson |
working with locking and copying
|
Mon, 15 Mar 2021 14:26:42 +0000 |
Henry S. Thompson |
working for -t 2 -c 2
|
Mon, 15 Mar 2021 14:20:00 +0000 |
Henry S. Thompson |
minor
|
Sun, 14 Mar 2021 21:28:02 +0000 |
Henry S. Thompson |
prepare for real parallel distribution
|
Sun, 14 Mar 2021 21:25:01 +0000 |
Henry S. Thompson |
environment improvements
|
Wed, 03 Mar 2021 19:33:56 +0000 |
Henry S. Thompson |
trying to move to slurm
|
Sat, 09 May 2020 16:16:28 +0100 |
Henry S. Thompson |
improved F handling/logging
|
Fri, 08 May 2020 19:52:36 +0100 |
Henry S. Thompson |
keep separate antecedents separate, buggy?
|
Thu, 07 May 2020 18:47:24 +0100 |
Henry S. Thompson |
track redirects, need to use full crawldiagnostics.warc.gz for "location:" and "Uri:"
|
Thu, 07 May 2020 11:33:24 +0100 |
Henry S. Thompson |
refactor, change summary print (problem?)
|
Wed, 06 May 2020 18:28:52 +0100 |
Henry S. Thompson |
bare framework working
|
Wed, 06 May 2020 14:25:44 +0100 |
Henry S. Thompson |
starting on tool to assemble as complete info as we have wrt a seed URI
|
Wed, 06 May 2020 14:24:42 +0100 |
Henry S. Thompson |
use local .m2/repository for Hadoop 3.4.0
|
Wed, 06 May 2020 14:23:33 +0100 |
Henry S. Thompson |
works for big files with Hadoop 3.4.0
|
Wed, 06 May 2020 14:22:48 +0100 |
Henry S. Thompson |
x
|
Tue, 28 Apr 2020 19:02:34 +0100 |
Henry S. Thompson |
log truncations
|
Tue, 28 Apr 2020 19:02:14 +0100 |
Henry S. Thompson |
impose some limits
|
Tue, 28 Apr 2020 19:01:41 +0100 |
Henry S. Thompson |
x
|
Fri, 24 Apr 2020 20:12:44 +0100 |
Henry S. Thompson |
x
|
Fri, 24 Apr 2020 20:12:29 +0100 |
Henry S. Thompson |
mostly from Sebastian
|
Fri, 24 Apr 2020 20:03:29 +0100 |
Henry S. Thompson |
misc
|
Fri, 24 Apr 2020 20:01:35 +0100 |
Henry S. Thompson |
misc
|
Fri, 24 Apr 2020 20:01:25 +0100 |
Henry S. Thompson |
fix from Sebastian
|
Fri, 24 Apr 2020 19:57:16 +0100 |
Henry S. Thompson |
misc
|
Fri, 24 Apr 2020 19:55:11 +0100 |
Henry S. Thompson |
misc
|
Fri, 24 Apr 2020 15:20:33 +0100 |
Henry S. Thompson |
several (hopefully) efficiency tweaks
|
Thu, 23 Apr 2020 17:26:55 +0100 |
Henry S. Thompson |
x
|
Thu, 23 Apr 2020 17:25:25 +0100 |
Henry S. Thompson |
switch for use on login server, invoke by hand with 0/1 as only cmd line arg
|
Wed, 22 Apr 2020 18:42:40 +0100 |
Henry S. Thompson |
java stuff
|
Wed, 22 Apr 2020 18:42:23 +0100 |
Henry S. Thompson |
try nutch fetch for big pdfs
|
Wed, 15 Apr 2020 18:44:18 +0100 |
Henry S. Thompson |
final, most general version
|
Tue, 14 Apr 2020 17:52:34 +0100 |
Henry S. Thompson |
too big for /dev/shm, split in half
|
Tue, 14 Apr 2020 16:10:22 +0100 |
Henry S. Thompson |
one-off to convert big extracts.tar into lots of smaller ones
|