17 months ago |
Henry S. Thompson |
robotstxt and crawldiagnostics get free ride,
|
17 months ago |
Henry S. Thompson |
a few more from ecclerig,
|
17 months ago |
Henry S. Thompson |
refactor datestream reading,
|
17 months ago |
Henry S. Thompson |
more faithful regexps and non-byte uri output
|
17 months ago |
Henry S. Thompson |
one uncommitted fix from quentin
|
17 months ago |
Henry Thompson |
pass in debug flag(s) to merge_date.py
|
17 months ago |
Henry Thompson |
loosen must-match criterion in the both-messy case
|
17 months ago |
Henry Thompson |
one more sid fix,
|
17 months ago |
Henry S. Thompson |
still working on sessionID problems
|
17 months ago |
Henry Thompson |
first try
|
17 months ago |
Henry S. Thompson |
switch to gzip -7 to get comparable compressed cdx block size
|
17 months ago |
Henry S. Thompson |
use my own Canonicalizer to fix more obscure
|
17 months ago |
Henry S. Thompson |
re-instate logging splits for .idx
|
17 months ago |
Henry S. Thompson |
reinstate better check to start queuing,
|
17 months ago |
Henry S. Thompson |
bug4 fixed, but that created a new, earlier bug
|
17 months ago |
Henry S. Thompson |
rework handling of session key problem
|
17 months ago |
Henry S. Thompson |
initialise paths for csing
|
17 months ago |
Henry S. Thompson |
d'oh
|
17 months ago |
Henry S. Thompson |
include full URI in output
|
17 months ago |
Henry S. Thompson |
try to do csing correctly on compute nodes
|
17 months ago |
Henry S. Thompson |
version which outputs more identification,
|
18 months ago |
Henry S. Thompson |
last version before giving up on approach based only on key and datestamp
|
18 months ago |
Henry S. Thompson |
improve reordering, still failing on cdx-00004
|
18 months ago |
Henry S. Thompson |
attempt at reordering if necessary
|
18 months ago |
Henry S. Thompson |
mostly working, but need to reorder in case of cfid and friends
|
18 months ago |
Henry S. Thompson |
flip loops
|
18 months ago |
Henry S. Thompson |
merge a stream of ks files with a set of cdx files
|
18 months ago |
Henry S. Thompson |
final keystroke fixes, recurse and decimal www stripping
|
18 months ago |
Henry S. Thompson |
final keystroke fixes,
|
18 months ago |
Henry S. Thompson |
handle double .www, more keep-me chars
|
18 months ago |
Henry S. Thompson |
work-around for weird handling of %-encoding in Java impl. of SURT
|
18 months ago |
Henry Thompson |
merge, including pointless fix wrt pq
|
18 months ago |
Henry Thompson |
use surt instead of trying to create index term by hand
|
18 months ago |
Henry Thompson |
merge
|
18 months ago |
Henry Thompson |
stale
|
18 months ago |
Henry Thompson |
catching up by hand with markup version,
|
18 months ago |
Henry S. Thompson |
include timestamp
|
18 months ago |
Henry S. Thompson |
include query
|
18 months ago |
Henry S. Thompson |
make CC's own sorting explicit
|
18 months ago |
Henry S. Thompson |
handle corner cases with final . and initial www..+
|
19 months ago |
Henry S. Thompson |
handle %-encoded utf-8 as idna
|
19 months ago |
Henry S. Thompson |
merge
|
19 months ago |
Henry S. Thompson |
compute timestamps, key and sort lmh lines
|
19 months ago |
Henry S. Thompson |
work with csing
|
19 months ago |
Henry S. Thompson |
get man -k working
|
19 months ago |
Henry Thompson |
for warc_lmh slurm logs
|
19 months ago |
Henry S. Thompson |
for timing analysis
|
19 months ago |
Henry S. Thompson |
add support for multiple calls to srun with a counter
|
19 months ago |
Henry S. Thompson |
fix eof bug, expand error messages
|
19 months ago |
Henry S. Thompson |
part 2 is now working for all types
|
19 months ago |
Henry S. Thompson |
add a response-only test
|
19 months ago |
Henry S. Thompson |
revert to just showing first LM
|
19 months ago |
Henry S. Thompson |
more tests
|
19 months ago |
Henry S. Thompson |
Test 2 works with parts=1,2,3.
|
19 months ago |
Henry S. Thompson |
whole working
|
19 months ago |
Henry S. Thompson |
tests 1 & 2 now working
|