log

age author description
Mon, 04 Dec 2023 19:06:13 +0000 Henry S. Thompson with three tracks from two years
Mon, 04 Dec 2023 10:42:02 +0000 Henry S. Thompson for pub
Mon, 04 Dec 2023 10:40:47 +0000 Henry S. Thompson tweaked formatting
Mon, 04 Dec 2023 10:21:30 +0000 Henry S. Thompson excel rewrote, no important changes (?)
Mon, 04 Dec 2023 09:42:39 +0000 Henry S. Thompson replace wrong one with right one
Mon, 04 Dec 2023 09:37:14 +0000 Henry S. Thompson merge
Mon, 04 Dec 2023 09:35:53 +0000 Henry S. Thompson implement alternative confidence measure using stats.bootstrap,
Mon, 04 Dec 2023 09:33:13 +0000 Henry S. Thompson for LMh percentile
Thu, 30 Nov 2023 14:42:46 +0000 Henry S. Thompson decorated
Thu, 30 Nov 2023 14:20:22 +0000 Henry S. Thompson merge
Thu, 30 Nov 2023 14:18:56 +0000 Henry S. Thompson can't add props to DescribeResult
Thu, 30 Nov 2023 14:17:34 +0000 Henry S. Thompson for 2023-40
Tue, 28 Nov 2023 18:40:38 +0000 Henry S. Thompson with decorations
Tue, 28 Nov 2023 18:40:17 +0000 Henry S. Thompson excel rewrote, no important changes (?)
Tue, 28 Nov 2023 10:23:20 +0000 Henry S. Thompson with percentile instead of raw mean correl
Tue, 28 Nov 2023 10:22:38 +0000 Henry S. Thompson change heatmap to by percentile
Tue, 28 Nov 2023 10:21:36 +0000 Henry S. Thompson with heat
Mon, 27 Nov 2023 22:15:39 +0000 Henry S. Thompson heat map for mime vs. nl1 vs. len
Mon, 27 Nov 2023 22:14:53 +0000 Henry S. Thompson add head_map fn
Mon, 27 Nov 2023 18:25:39 +0000 Henry S. Thompson add explore_deltas and predict analysis fns
Sun, 26 Nov 2023 21:24:38 +0000 Henry S. Thompson rename to avoid name clash with scipy.stats
Fri, 24 Nov 2023 20:41:03 +0000 Henry S. Thompson move to class with local vars instead of many globals
Fri, 24 Nov 2023 20:40:09 +0000 Henry S. Thompson renamed to by_interval.py
Fri, 24 Nov 2023 20:39:08 +0000 Henry S. Thompson renamed from spearman.py
Fri, 24 Nov 2023 20:38:39 +0000 Henry S. Thompson renamed to stats.py
Fri, 24 Nov 2023 19:52:52 +0000 Henry S. Thompson do the __main__ thing
Fri, 24 Nov 2023 19:52:14 +0000 Henry S. Thompson put results in numbered subdirs
Fri, 24 Nov 2023 19:50:12 +0000 Henry S. Thompson add minimal logging and don't return until finished
Wed, 15 Nov 2023 10:24:32 +0000 Henry S. Thompson should work for months also now
Wed, 15 Nov 2023 09:36:23 +0000 Henry S. Thompson cross-language confusion :-)
Mon, 06 Nov 2023 15:55:57 +0000 Henry S. Thompson LM plot for multiple crawls, magnitude or %age
Fri, 03 Nov 2023 19:05:54 +0000 Henry S. Thompson can overlay the two
Thu, 02 Nov 2023 15:38:39 +0000 Henry S. Thompson fix output year
Thu, 02 Nov 2023 13:49:02 +0000 Henry S. Thompson sic
Tue, 31 Oct 2023 14:05:12 +0000 Henry S. Thompson sic
Tue, 31 Oct 2023 14:04:24 +0000 Henry S. Thompson get in/out file management working right
Tue, 31 Oct 2023 14:03:02 +0000 Henry S. Thompson refactor to provide for buffer overflow fix
Tue, 31 Oct 2023 14:01:50 +0000 Henry S. Thompson bug-fix wrt 1st time,
Mon, 30 Oct 2023 12:19:53 +0000 Henry S. Thompson make extra file info optional
Wed, 25 Oct 2023 23:01:59 +0100 Henry S. Thompson forget parallel, just do (default 2) parallel single threads
Wed, 25 Oct 2023 23:00:45 +0100 Henry S. Thompson add missing makedir
Tue, 24 Oct 2023 16:59:23 +0100 Henry S. Thompson now does one named segment only
Tue, 24 Oct 2023 16:58:44 +0100 Henry S. Thompson resurrect parallel fetch
Tue, 24 Oct 2023 14:34:58 +0100 Henry S. Thompson convert to single thread,
Tue, 24 Oct 2023 14:26:36 +0100 Henry S. Thompson avoid global name conflict
Wed, 11 Oct 2023 12:51:06 +0100 Henry S. Thompson moved from /beegfs/common-crawl to get under .hg
Wed, 11 Oct 2023 12:50:29 +0100 Henry S. Thompson fix typo
Fri, 06 Oct 2023 15:06:53 +0100 Henry S. Thompson build cluster.idx
Fri, 06 Oct 2023 15:05:55 +0100 Henry S. Thompson no longer using cmp_to_key
Wed, 04 Oct 2023 20:04:34 +0100 Henry S. Thompson handle -m case, support src from cmdline (branch: mergefix)
Thu, 05 Oct 2023 10:42:15 +0100 Henry S. Thompson new branch to save do_idx.sh from abandoned merge fixup (branch: mergefix)
Wed, 04 Oct 2023 18:53:55 +0100 Henry S. Thompson try to get the counts right, particularly when re-merging
Wed, 04 Oct 2023 18:51:56 +0100 Henry S. Thompson for use in debugging, see notes and tests 2, 17, merge test
Tue, 03 Oct 2023 17:45:57 +0100 Henry S. Thompson add various www deletion cases
Tue, 03 Oct 2023 17:44:59 +0100 Henry S. Thompson iterate WPAT fix with improved pattern
Tue, 03 Oct 2023 17:43:52 +0100 Henry S. Thompson loosen WARC pattern to avoid failure from "mime" = "{...}" intervening
Mon, 02 Oct 2023 18:56:50 +0100 Henry S. Thompson refactor to enable rerun with fixup,
Mon, 02 Oct 2023 18:55:48 +0100 Henry S. Thompson correct mistaken futnsz test,
Mon, 02 Oct 2023 18:54:10 +0100 Henry S. Thompson change path to merge_date.py
Mon, 02 Oct 2023 18:52:43 +0100 Henry S. Thompson remove the mistaken deletion of NONPRINT,
Sat, 30 Sep 2023 18:04:15 +0100 Henry S. Thompson fix a bad fix and a bad test for the televida case
Sat, 30 Sep 2023 14:13:19 +0100 Henry S. Thompson fix and test for all-decimal host
Sat, 30 Sep 2023 14:12:39 +0100 Henry S. Thompson no import in lmh.__init__ any more
Sat, 30 Sep 2023 14:11:49 +0100 Henry S. Thompson importing in __init__ causes problems
Fri, 29 Sep 2023 15:59:34 +0100 Henry S. Thompson commented out duplicate, handle comments better
Fri, 29 Sep 2023 15:14:29 +0100 Henry S. Thompson more corner case tests
Fri, 29 Sep 2023 15:13:51 +0100 Henry S. Thompson tweaks to get all tests through #14
Thu, 28 Sep 2023 18:31:23 +0100 Henry S. Thompson get 7f (two cases) and %25 working
Thu, 28 Sep 2023 18:30:48 +0100 Henry S. Thompson add televida case test
Thu, 28 Sep 2023 16:36:15 +0100 Henry S. Thompson add test description
Thu, 28 Sep 2023 16:35:39 +0100 Henry S. Thompson importable just in case
Thu, 28 Sep 2023 16:34:49 +0100 Henry S. Thompson move most of the hacking into fixGoogleCanon,
Thu, 28 Sep 2023 16:10:05 +0100 Henry S. Thompson forget assert, allow multiple failures
Thu, 28 Sep 2023 16:09:38 +0100 Henry S. Thompson x
Thu, 28 Sep 2023 14:08:36 +0100 Henry S. Thompson found right place for \x7f hack, maybe
Thu, 28 Sep 2023 14:06:11 +0100 Henry S. Thompson readability
Thu, 28 Sep 2023 11:00:36 +0100 Henry S. Thompson x
Thu, 28 Sep 2023 11:00:24 +0100 Henry S. Thompson refactor to sort a module in an lmh package
Thu, 28 Sep 2023 10:54:12 +0100 Henry S. Thompson start some regression tests
Thu, 28 Sep 2023 09:01:18 +0100 Henry S. Thompson creating lmh package
Thu, 28 Sep 2023 08:46:01 +0100 Henry S. Thompson moved from bin
Wed, 27 Sep 2023 17:29:51 +0100 Henry S. Thompson minor bug wrt EOF of final cdx input file
Wed, 27 Sep 2023 17:29:09 +0100 Henry S. Thompson replicate two extremely-corner cases of the way
Tue, 26 Sep 2023 18:55:43 +0100 Henry S. Thompson a bit more logging
Tue, 26 Sep 2023 18:55:11 +0100 Henry S. Thompson a bit more logging
Tue, 26 Sep 2023 17:42:57 +0100 Henry S. Thompson robotstxt and crawldiagnostics get free ride,
Tue, 26 Sep 2023 14:18:40 +0100 Henry S. Thompson a few more from ecclerig,
Tue, 26 Sep 2023 09:03:47 +0100 Henry S. Thompson refactor datestream reading,
Mon, 25 Sep 2023 23:53:13 +0100 Henry S. Thompson more faithful regexps and non-byte uri output
Fri, 22 Sep 2023 15:27:28 +0100 Henry S. Thompson one uncommitted fix from quentin
Tue, 19 Sep 2023 19:40:58 +0100 Henry Thompson pass in debug flag(s) to merge_date.py
Tue, 19 Sep 2023 19:29:41 +0100 Henry Thompson loosen must-match criterion in the both-messy case
Tue, 19 Sep 2023 19:28:34 +0100 Henry Thompson one more sid fix,
Sun, 17 Sep 2023 15:18:11 +0100 Henry S. Thompson working on sessionID pblms, still
Thu, 14 Sep 2023 19:27:23 +0100 Henry Thompson first try
Wed, 13 Sep 2023 16:48:43 +0100 Henry S. Thompson switch to gzip -7 to get comparable compressed cdx block size
Wed, 13 Sep 2023 12:41:55 +0100 Henry S. Thompson use my own Canonicalizer to fix more obscure
Wed, 13 Sep 2023 12:40:39 +0100 Henry S. Thompson re-instate logging splits for .idx
Tue, 12 Sep 2023 12:14:04 +0100 Henry S. Thompson reinstate better check to start queuing,
Mon, 11 Sep 2023 22:06:45 +0100 Henry S. Thompson bug4 fixed, but that created a new, earlier bug
Mon, 11 Sep 2023 12:56:47 +0100 Henry S. Thompson rework handling of session key problem
Fri, 08 Sep 2023 21:40:52 +0100 Henry S. Thompson initialise paths for csing
Fri, 08 Sep 2023 21:40:06 +0100 Henry S. Thompson d'oh
Fri, 08 Sep 2023 18:06:54 +0100 Henry S. Thompson include full URI in output
Fri, 08 Sep 2023 18:05:57 +0100 Henry S. Thompson try to do csing correctly on compute nodes
Fri, 08 Sep 2023 09:29:25 +0100 Henry S. Thompson version which outputs more identification,
Thu, 07 Sep 2023 18:03:55 +0100 Henry S. Thompson last version before giving up on approach based only on key and datestamp
Wed, 06 Sep 2023 18:51:21 +0100 Henry S. Thompson improve reordering, still failing on cdx-00004
Tue, 05 Sep 2023 17:33:29 +0100 Henry S. Thompson attempt at reordering if necessary
Tue, 05 Sep 2023 17:32:46 +0100 Henry S. Thompson mostly working, but need to reorder in case of cfid and friends
Thu, 31 Aug 2023 14:14:21 +0100 Henry S. Thompson flip loops
Wed, 30 Aug 2023 21:49:43 +0100 Henry S. Thompson merge a stream of ks files with a set of cdx files
Wed, 30 Aug 2023 11:11:31 +0100 Henry S. Thompson final keystroke fixes, recurse and decimal www stripping
Wed, 30 Aug 2023 11:10:54 +0100 Henry S. Thompson final keystroke fixes,
Mon, 28 Aug 2023 21:07:43 +0100 Henry S. Thompson handle double .www, more keep-me chars
Thu, 24 Aug 2023 18:21:41 +0100 Henry S. Thompson work-around for weird handling of %-encoding in Java impl. of SURT
Mon, 21 Aug 2023 13:06:20 -0400 Henry Thompson merge, including pointless fix wrt pq
Sat, 19 Aug 2023 16:33:23 -0400 Henry Thompson use surt instead of trying to create index term by hand
Sat, 19 Aug 2023 16:02:29 -0400 Henry Thompson merge
Sat, 19 Aug 2023 15:58:38 -0400 Henry Thompson stale
Sat, 19 Aug 2023 15:53:59 -0400 Henry Thompson catching up by hand with markup version,
Mon, 21 Aug 2023 13:37:07 +0100 Henry S. Thompson include timestamp
Sun, 20 Aug 2023 00:28:43 +0100 Henry S. Thompson include query
Fri, 18 Aug 2023 18:25:54 +0100 Henry S. Thompson make CC's own sorting explicit
Thu, 10 Aug 2023 22:14:49 +0100 Henry S. Thompson handle corner cases with final . and initial www..+
Wed, 09 Aug 2023 02:01:32 +0100 Henry S. Thompson handle %-encoded utf-8 as idna
Tue, 08 Aug 2023 17:48:29 +0100 Henry S. Thompson merge
Tue, 08 Aug 2023 17:47:27 +0100 Henry S. Thompson compute timestamps, key and sort lmh lines
Tue, 08 Aug 2023 17:46:20 +0100 Henry S. Thompson work with csing
Tue, 08 Aug 2023 17:46:02 +0100 Henry S. Thompson get man -k working
Fri, 28 Jul 2023 00:50:13 +0100 Henry Thompson for warc_lmh slurm logs
Wed, 26 Jul 2023 18:42:19 +0100 Henry S. Thompson for timing analysis
Fri, 21 Jul 2023 11:37:47 +0100 Henry S. Thompson add support for multiple calls to srun with a counter
Thu, 20 Jul 2023 10:32:55 +0100 Henry S. Thompson fix eof bug, expand error messages
Wed, 19 Jul 2023 13:20:46 +0100 Henry S. Thompson part 2 is now working for all types
Wed, 19 Jul 2023 13:19:58 +0100 Henry S. Thompson add a response-only test
Wed, 19 Jul 2023 13:19:42 +0100 Henry S. Thompson revert to just showing first LM
Fri, 14 Jul 2023 17:39:14 +0100 Henry S. Thompson more tests
Fri, 14 Jul 2023 17:38:54 +0100 Henry S. Thompson Test 2 works with parts=1,2,3.
Fri, 14 Jul 2023 12:08:09 +0100 Henry S. Thompson whole working
Thu, 13 Jul 2023 14:02:02 +0100 Henry S. Thompson tests 1 & 2 now working
Thu, 13 Jul 2023 11:28:24 +0100 Henry S. Thompson avoid slicing buf by using memoryview to save copying
Wed, 12 Jul 2023 19:07:56 +0100 Henry S. Thompson but skip at eobp is not working (with test 2)
Wed, 12 Jul 2023 18:48:27 +0100 Henry S. Thompson works with all types, part=1
Mon, 10 Jul 2023 19:52:18 +0100 Henry S. Thompson rework completely to refill as much as possible only when necessary,
Mon, 10 Jul 2023 18:17:35 +0100 Henry S. Thompson finds multiples
Fri, 07 Jul 2023 19:30:23 +0100 Henry S. Thompson little steps
Fri, 07 Jul 2023 19:04:16 +0100 Henry S. Thompson made 1 mean 1, still losing after a while
Fri, 07 Jul 2023 17:04:05 +0100 Henry S. Thompson better debugging output
Fri, 07 Jul 2023 17:03:52 +0100 Henry S. Thompson working better, gets confused by 3-part response
Fri, 07 Jul 2023 13:39:23 +0100 Henry S. Thompson a bit better
Thu, 06 Jul 2023 14:53:28 +0100 Henry S. Thompson just barely working for 1, need to rethink buffering
Thu, 06 Jul 2023 13:27:33 +0100 Henry S. Thompson starting on conversion to direct-querying of buffer
Thu, 06 Jul 2023 10:19:02 +0100 Henry S. Thompson sic
Wed, 05 Jul 2023 19:32:36 +0100 Henry S. Thompson support on-board unzipping, reduce buffer size to 2MB
Wed, 05 Jul 2023 19:32:02 +0100 Henry S. Thompson make test 1 idempotent
Wed, 05 Jul 2023 17:51:44 +0100 Henry S. Thompson just count part length
Wed, 05 Jul 2023 17:49:24 +0100 Henry S. Thompson get EOF right, finally
Wed, 05 Jul 2023 15:37:16 +0100 Henry S. Thompson make warc.py a library, separate out testing
Wed, 05 Jul 2023 15:12:54 +0100 Henry S. Thompson correct comment
Wed, 05 Jul 2023 15:12:07 +0100 Henry S. Thompson add lots more debugging output,
Wed, 05 Jul 2023 15:09:57 +0100 Henry S. Thompson moved from home bin
Tue, 10 Jan 2023 17:49:01 +0000 Henry S. Thompson doc pointer
Tue, 13 Dec 2022 14:16:42 +0000 Henry S. Thompson push actions in main fn
Tue, 13 Dec 2022 14:16:22 +0000 Henry S. Thompson fixed for paper
Thu, 24 Nov 2022 12:37:17 +0000 Henry S. Thompson fix N
Wed, 23 Nov 2022 11:05:45 +0000 Henry S. Thompson compute and graph confidence intervals
Tue, 22 Nov 2022 19:13:25 +0000 Henry S. Thompson generalise hist
Tue, 22 Nov 2022 11:02:51 +0000 Henry S. Thompson add sort flag to plot_x
Thu, 17 Nov 2022 13:51:19 +0000 Henry S. Thompson get multi-ranking done right
Thu, 17 Nov 2022 11:27:07 +0000 Henry S. Thompson comments and more care about rows vs. columns
Wed, 16 Nov 2022 19:52:50 +0000 Henry S. Thompson start work on ranking,
Wed, 16 Nov 2022 17:29:55 +0000 Henry S. Thompson Spearman for matlab
Wed, 16 Nov 2022 17:28:56 +0000 Henry S. Thompson move all plots into functions
Tue, 15 Nov 2022 19:37:28 +0000 Henry S. Thompson a bit more
Mon, 14 Nov 2022 18:52:35 +0000 Henry S. Thompson framework for stats over results of rank correlations
Fri, 11 Nov 2022 14:44:05 +0000 Henry S. Thompson first plot efforts w. scipy
Fri, 21 Oct 2022 18:09:53 +0100 Henry S. Thompson sic
Thu, 29 Sep 2022 16:36:52 +0100 Henry S. Thompson accept filenames on stdin,
Thu, 29 Sep 2022 16:33:42 +0100 Henry S. Thompson interpolate process0, support permanent subproc
Thu, 29 Sep 2022 16:31:28 +0100 Henry S. Thompson new
Thu, 29 Sep 2022 16:30:48 +0100 Henry S. Thompson new
Sun, 07 Aug 2022 13:58:33 +0100 Henry S. Thompson write to tmp file implemented
Sun, 07 Aug 2022 13:57:28 +0100 Henry S. Thompson use awk for simple cut
Sun, 07 Aug 2022 13:56:49 +0100 Henry S. Thompson toward link extractions from pdf
Sun, 07 Aug 2022 13:56:00 +0100 Henry S. Thompson in progress...
Sun, 07 Aug 2022 13:55:05 +0100 Henry S. Thompson x
Thu, 28 Jul 2022 17:32:11 +0100 Henry S. Thompson x
Thu, 28 Jul 2022 17:30:10 +0100 Henry S. Thompson fix quoting pblm by using parallel ... -q
Thu, 28 Jul 2022 14:45:35 +0100 Henry S. Thompson catch-up
Sat, 23 Jul 2022 11:50:46 +0100 Henry S. Thompson minimal hst preferred options
Sat, 23 Jul 2022 11:50:02 +0100 Henry S. Thompson work around problem with PROMPT_COMMAND
Wed, 20 Jul 2022 19:44:03 +0100 Henry S. Thompson x
Wed, 20 Jul 2022 19:41:49 +0100 Henry S. Thompson fix PROMPT_COMMAND
Wed, 20 Jul 2022 19:41:19 +0100 Henry S. Thompson x
Wed, 20 Jul 2022 19:39:41 +0100 Henry S. Thompson tidy up and include uniq -c
Wed, 20 Jul 2022 19:38:30 +0100 Henry S. Thompson convert to no longer need uniq -c
Tue, 19 Jul 2022 11:02:41 +0100 Henry S. Thompson oops, 1.1 was half-modified, bogus
Mon, 18 Jul 2022 19:22:42 +0100 Henry S. Thompson compute node workers, see cirrus_home/bin repo for login node masters
Mon, 18 Jul 2022 19:20:56 +0100 Henry S. Thompson getting started
Mon, 18 Jul 2022 18:50:26 +0100 Henry S. Thompson getting started