Thu, 28 Sep 2023 08:46:01 +0100 |
Henry S. Thompson |
moved from bin
|
Wed, 27 Sep 2023 17:29:51 +0100 |
Henry S. Thompson |
minor bug wrt EOF of final cdx input file
|
Wed, 27 Sep 2023 17:29:09 +0100 |
Henry S. Thompson |
replicate two extremely-corner cases of the way
|
Tue, 26 Sep 2023 18:55:43 +0100 |
Henry S. Thompson |
a bit more logging
|
Tue, 26 Sep 2023 18:55:11 +0100 |
Henry S. Thompson |
a bit more logging
|
Tue, 26 Sep 2023 17:42:57 +0100 |
Henry S. Thompson |
robotstxt and crawldiagnostics get free ride,
|
Tue, 26 Sep 2023 14:18:40 +0100 |
Henry S. Thompson |
a few more from ecclerig,
|
Tue, 26 Sep 2023 09:03:47 +0100 |
Henry S. Thompson |
refactor datestream reading,
|
Mon, 25 Sep 2023 23:53:13 +0100 |
Henry S. Thompson |
more faithful regexps and non-byte uri output
|
Fri, 22 Sep 2023 15:27:28 +0100 |
Henry S. Thompson |
one uncommitted fix from quentin
|
Tue, 19 Sep 2023 19:40:58 +0100 |
Henry Thompson |
pass in debug flag(s) to merge_date.py
|
Tue, 19 Sep 2023 19:29:41 +0100 |
Henry Thompson |
loosen must-match criterion in the both-messy case
|
Tue, 19 Sep 2023 19:28:34 +0100 |
Henry Thompson |
one more sid fix,
|
Sun, 17 Sep 2023 15:18:11 +0100 |
Henry S. Thompson |
working on sessionID problems, still
|
Thu, 14 Sep 2023 19:27:23 +0100 |
Henry Thompson |
first try
|
Wed, 13 Sep 2023 16:48:43 +0100 |
Henry S. Thompson |
switch to gzip -7 to get comparable compressed cdx block size
|
Wed, 13 Sep 2023 12:41:55 +0100 |
Henry S. Thompson |
use my own Canonicalizer to fix more obscure
|
Wed, 13 Sep 2023 12:40:39 +0100 |
Henry S. Thompson |
re-instate logging splits for .idx
|
Tue, 12 Sep 2023 12:14:04 +0100 |
Henry S. Thompson |
reinstate better check to start queuing,
|
Mon, 11 Sep 2023 22:06:45 +0100 |
Henry S. Thompson |
bug4 fixed, but that created a new, earlier bug
|
Mon, 11 Sep 2023 12:56:47 +0100 |
Henry S. Thompson |
rework handling of session key problem
|
Fri, 08 Sep 2023 21:40:52 +0100 |
Henry S. Thompson |
initialise paths for csing
|
Fri, 08 Sep 2023 21:40:06 +0100 |
Henry S. Thompson |
d'oh
|
Fri, 08 Sep 2023 18:06:54 +0100 |
Henry S. Thompson |
include full URI in output
|
Fri, 08 Sep 2023 18:05:57 +0100 |
Henry S. Thompson |
try to do csing correctly on compute nodes
|
Fri, 08 Sep 2023 09:29:25 +0100 |
Henry S. Thompson |
version which outputs more identification,
|
Thu, 07 Sep 2023 18:03:55 +0100 |
Henry S. Thompson |
last version before giving up on approach based only on key and datestamp
|
Wed, 06 Sep 2023 18:51:21 +0100 |
Henry S. Thompson |
improve reordering, still failing on cdx-00004
|
Tue, 05 Sep 2023 17:33:29 +0100 |
Henry S. Thompson |
attempt at reordering if necessary
|
Tue, 05 Sep 2023 17:32:46 +0100 |
Henry S. Thompson |
mostly working, but need to reorder in case of cfid and friends
|
Thu, 31 Aug 2023 14:14:21 +0100 |
Henry S. Thompson |
flip loops
|
Wed, 30 Aug 2023 21:49:43 +0100 |
Henry S. Thompson |
merge a stream of ks files with a set of cdx files
|
Wed, 30 Aug 2023 11:11:31 +0100 |
Henry S. Thompson |
final keystroke fixes, recurse and decimal www stripping
|
Wed, 30 Aug 2023 11:10:54 +0100 |
Henry S. Thompson |
final keystroke fixes,
|
Mon, 28 Aug 2023 21:07:43 +0100 |
Henry S. Thompson |
handle double .www, more keep-me chars
|
Thu, 24 Aug 2023 18:21:41 +0100 |
Henry S. Thompson |
work-around for weird handling of %-encoding in Java impl. of SURT
|
Mon, 21 Aug 2023 13:06:20 -0400 |
Henry Thompson |
merge, including pointless fix wrt pq
|
Sat, 19 Aug 2023 16:33:23 -0400 |
Henry Thompson |
use surt instead of trying to create index term by hand
|
Sat, 19 Aug 2023 16:02:29 -0400 |
Henry Thompson |
merge
|
Sat, 19 Aug 2023 15:58:38 -0400 |
Henry Thompson |
stale
|
Sat, 19 Aug 2023 15:53:59 -0400 |
Henry Thompson |
catching up by hand with markup version,
|
Mon, 21 Aug 2023 13:37:07 +0100 |
Henry S. Thompson |
include timestamp
|
Sun, 20 Aug 2023 00:28:43 +0100 |
Henry S. Thompson |
include query
|
Fri, 18 Aug 2023 18:25:54 +0100 |
Henry S. Thompson |
make CC's own sorting explicit
|
Thu, 10 Aug 2023 22:14:49 +0100 |
Henry S. Thompson |
handle corner cases with final . and initial www..+
|
Wed, 09 Aug 2023 02:01:32 +0100 |
Henry S. Thompson |
handle %-encoded utf-8 as idna
|
Tue, 08 Aug 2023 17:48:29 +0100 |
Henry S. Thompson |
merge
|
Tue, 08 Aug 2023 17:47:27 +0100 |
Henry S. Thompson |
compute timestamps, key and sort lmh lines
|
Tue, 08 Aug 2023 17:46:20 +0100 |
Henry S. Thompson |
work with csing
|
Tue, 08 Aug 2023 17:46:02 +0100 |
Henry S. Thompson |
get man -k working
|
Fri, 28 Jul 2023 00:50:13 +0100 |
Henry Thompson |
for warc_lmh slurm logs
|
Wed, 26 Jul 2023 18:42:19 +0100 |
Henry S. Thompson |
for timing analysis
|
Fri, 21 Jul 2023 11:37:47 +0100 |
Henry S. Thompson |
add support for multiple calls to srun with a counter
|
Thu, 20 Jul 2023 10:32:55 +0100 |
Henry S. Thompson |
fix eof bug, expand error messages
|
Wed, 19 Jul 2023 13:20:46 +0100 |
Henry S. Thompson |
part 2 is now working for all types
|
Wed, 19 Jul 2023 13:19:58 +0100 |
Henry S. Thompson |
add a response-only test
|
Wed, 19 Jul 2023 13:19:42 +0100 |
Henry S. Thompson |
revert to just showing first LM
|
Fri, 14 Jul 2023 17:39:14 +0100 |
Henry S. Thompson |
more tests
|
Fri, 14 Jul 2023 17:38:54 +0100 |
Henry S. Thompson |
Test 2 works with parts=1,2,3.
|
Fri, 14 Jul 2023 12:08:09 +0100 |
Henry S. Thompson |
whole working
|
Thu, 13 Jul 2023 14:02:02 +0100 |
Henry S. Thompson |
tests 1 & 2 now working
|
Thu, 13 Jul 2023 11:28:24 +0100 |
Henry S. Thompson |
avoid slicing buf by using memoryview to save copying
|
Wed, 12 Jul 2023 19:07:56 +0100 |
Henry S. Thompson |
but skip at eobp is not working (with test 2)
|
Wed, 12 Jul 2023 18:48:27 +0100 |
Henry S. Thompson |
works with all types, part=1
|
Mon, 10 Jul 2023 19:52:18 +0100 |
Henry S. Thompson |
rework completely to refill as much as possible only when necessary,
|
Mon, 10 Jul 2023 18:17:35 +0100 |
Henry S. Thompson |
finds multiples
|
Fri, 07 Jul 2023 19:30:23 +0100 |
Henry S. Thompson |
little steps
|
Fri, 07 Jul 2023 19:04:16 +0100 |
Henry S. Thompson |
made 1 mean 1, still losing after a while
|
Fri, 07 Jul 2023 17:04:05 +0100 |
Henry S. Thompson |
better debugging output
|
Fri, 07 Jul 2023 17:03:52 +0100 |
Henry S. Thompson |
working better, gets confused by 3-part response
|
Fri, 07 Jul 2023 13:39:23 +0100 |
Henry S. Thompson |
a bit better
|
Thu, 06 Jul 2023 14:53:28 +0100 |
Henry S. Thompson |
just barely working for 1, need to rethink buffering
|
Thu, 06 Jul 2023 13:27:33 +0100 |
Henry S. Thompson |
starting on conversion to direct-querying of buffer
|
Thu, 06 Jul 2023 10:19:02 +0100 |
Henry S. Thompson |
sic
|
Wed, 05 Jul 2023 19:32:36 +0100 |
Henry S. Thompson |
support on-board unzipping, reduce buffer size to 2MB
|
Wed, 05 Jul 2023 19:32:02 +0100 |
Henry S. Thompson |
make test 1 idempotent
|
Wed, 05 Jul 2023 17:51:44 +0100 |
Henry S. Thompson |
just count part length
|
Wed, 05 Jul 2023 17:49:24 +0100 |
Henry S. Thompson |
get EOF right, finally
|
Wed, 05 Jul 2023 15:37:16 +0100 |
Henry S. Thompson |
make warc.py a library, separate out testing
|
Wed, 05 Jul 2023 15:12:54 +0100 |
Henry S. Thompson |
correct comment
|
Wed, 05 Jul 2023 15:12:07 +0100 |
Henry S. Thompson |
add lots more debugging output,
|
Wed, 05 Jul 2023 15:09:57 +0100 |
Henry S. Thompson |
moved from home bin
|
Tue, 10 Jan 2023 17:49:01 +0000 |
Henry S. Thompson |
doc pointer
|
Tue, 13 Dec 2022 14:16:42 +0000 |
Henry S. Thompson |
push actions in main fn
|
Tue, 13 Dec 2022 14:16:22 +0000 |
Henry S. Thompson |
fixed for paper
|
Thu, 24 Nov 2022 12:37:17 +0000 |
Henry S. Thompson |
fix N
|
Wed, 23 Nov 2022 11:05:45 +0000 |
Henry S. Thompson |
compute and graph confidence intervals
|
Tue, 22 Nov 2022 19:13:25 +0000 |
Henry S. Thompson |
generalise hist
|
Tue, 22 Nov 2022 11:02:51 +0000 |
Henry S. Thompson |
add sort flag to plot_x
|
Thu, 17 Nov 2022 13:51:19 +0000 |
Henry S. Thompson |
get multi-ranking done right
|
Thu, 17 Nov 2022 11:27:07 +0000 |
Henry S. Thompson |
comments and more care about rows vs. columns
|
Wed, 16 Nov 2022 19:52:50 +0000 |
Henry S. Thompson |
start work on ranking,
|
Wed, 16 Nov 2022 17:29:55 +0000 |
Henry S. Thompson |
Spearman for matlab
|
Wed, 16 Nov 2022 17:28:56 +0000 |
Henry S. Thompson |
move all plots into functions
|
Tue, 15 Nov 2022 19:37:28 +0000 |
Henry S. Thompson |
a bit more
|
Mon, 14 Nov 2022 18:52:35 +0000 |
Henry S. Thompson |
framework for stats over results of rank correlations
|
Fri, 11 Nov 2022 14:44:05 +0000 |
Henry S. Thompson |
first plot efforts w. scipy
|
Fri, 21 Oct 2022 18:09:53 +0100 |
Henry S. Thompson |
sic
|
Thu, 29 Sep 2022 16:36:52 +0100 |
Henry S. Thompson |
accept filenames on stdin,
|
Thu, 29 Sep 2022 16:33:42 +0100 |
Henry S. Thompson |
interpolate process0, support permanent subproc
|
Thu, 29 Sep 2022 16:31:28 +0100 |
Henry S. Thompson |
new
|
Thu, 29 Sep 2022 16:30:48 +0100 |
Henry S. Thompson |
new
|
Sun, 07 Aug 2022 13:58:33 +0100 |
Henry S. Thompson |
write to tmp file implemented
|
Sun, 07 Aug 2022 13:57:28 +0100 |
Henry S. Thompson |
use awk for simple cut
|
Sun, 07 Aug 2022 13:56:49 +0100 |
Henry S. Thompson |
toward link extractions from pdf
|
Sun, 07 Aug 2022 13:56:00 +0100 |
Henry S. Thompson |
in progress...
|
Sun, 07 Aug 2022 13:55:05 +0100 |
Henry S. Thompson |
x
|
Thu, 28 Jul 2022 17:32:11 +0100 |
Henry S. Thompson |
x
|
Thu, 28 Jul 2022 17:30:10 +0100 |
Henry S. Thompson |
fix quoting problem by using parallel ... -q
|
Thu, 28 Jul 2022 14:45:35 +0100 |
Henry S. Thompson |
catch-up
|
Sat, 23 Jul 2022 11:50:46 +0100 |
Henry S. Thompson |
minimal hst preferred options
|
Sat, 23 Jul 2022 11:50:02 +0100 |
Henry S. Thompson |
work around problem with PROMPT_COMMAND
|
Wed, 20 Jul 2022 19:44:03 +0100 |
Henry S. Thompson |
x
|
Wed, 20 Jul 2022 19:41:49 +0100 |
Henry S. Thompson |
fix PROMPT_COMMAND
|
Wed, 20 Jul 2022 19:41:19 +0100 |
Henry S. Thompson |
x
|
Wed, 20 Jul 2022 19:39:41 +0100 |
Henry S. Thompson |
tidy up and include uniq -c
|
Wed, 20 Jul 2022 19:38:30 +0100 |
Henry S. Thompson |
convert to no longer need uniq -c
|
Tue, 19 Jul 2022 11:02:41 +0100 |
Henry S. Thompson |
oops, 1.1 was half-modified, bogus
|
Mon, 18 Jul 2022 19:22:42 +0100 |
Henry S. Thompson |
compute node workers, see cirrus_home/bin repo for login node masters
|
Mon, 18 Jul 2022 19:20:56 +0100 |
Henry S. Thompson |
getting started
|