Tue, 24 Oct 2023 16:58:44 +0100 |
Henry S. Thompson |
resurrect parallel fetch
|
Tue, 24 Oct 2023 14:34:58 +0100 |
Henry S. Thompson |
convert to single thread,
|
Tue, 24 Oct 2023 14:26:36 +0100 |
Henry S. Thompson |
avoid global name conflict
|
Wed, 11 Oct 2023 12:51:06 +0100 |
Henry S. Thompson |
moved from /beegfs/common-crawl to get under .hg
|
Wed, 11 Oct 2023 12:50:29 +0100 |
Henry S. Thompson |
fix typo
|
Fri, 06 Oct 2023 15:06:53 +0100 |
Henry S. Thompson |
build cluster.idx
|
Fri, 06 Oct 2023 15:05:55 +0100 |
Henry S. Thompson |
no longer using cmp_to_key
|
Wed, 04 Oct 2023 20:04:34 +0100 |
Henry S. Thompson |
handle -m case, support src from cmdline
mergefix
|
Thu, 05 Oct 2023 10:42:15 +0100 |
Henry S. Thompson |
new branch to save do_idx.sh from abandoned merge fixup
mergefix
|
Wed, 04 Oct 2023 18:53:55 +0100 |
Henry S. Thompson |
try to get the counts right, particularly when re-merging
|
Wed, 04 Oct 2023 18:51:56 +0100 |
Henry S. Thompson |
for use in debugging, see notes and tests 2, 17, merge test
|
Tue, 03 Oct 2023 17:45:57 +0100 |
Henry S. Thompson |
add various www deletion cases
|
Tue, 03 Oct 2023 17:44:59 +0100 |
Henry S. Thompson |
iterate WPAT fix with improved pattern
|
Tue, 03 Oct 2023 17:43:52 +0100 |
Henry S. Thompson |
loosen WARC pattern to avoid failure from "mime" = "{...}" intervening
|
Mon, 02 Oct 2023 18:56:50 +0100 |
Henry S. Thompson |
refactor to enable rerun with fixup,
|