log

age author description
Thu, 23 Jan 2025 12:53:28 +0000 Henry S. Thompson renamed cpython class Cdb to CCdb to avoid name conflict with cdb.Cdb (default, tip)
Thu, 23 Jan 2025 12:27:57 +0000 Henry S. Thompson work with libcdb.a
Sat, 18 Jan 2025 23:00:30 +0000 Henry S. Thompson value from memory view working
Sat, 18 Jan 2025 21:25:17 +0000 Henry S. Thompson try using cdb as C library
Fri, 17 Jan 2025 20:37:10 +0000 Henry S. Thompson add some cython decoration, not much effect
Fri, 17 Jan 2025 20:35:21 +0000 Henry S. Thompson run with login shell
Fri, 17 Jan 2025 20:34:32 +0000 Henry S. Thompson tweak XEmacs font/key bindings
Fri, 17 Jan 2025 19:58:04 +0000 Henry S. Thompson tweak XEmacs font
Thu, 02 Jan 2025 18:35:08 +0000 Henry S. Thompson time the unpickling
Thu, 02 Jan 2025 18:30:03 +0000 Henry S. Thompson with bloom prefilter
Thu, 02 Jan 2025 14:52:14 +0000 Henry S. Thompson try adding lm to existing index from ks_0-9
Thu, 02 Jan 2025 14:51:00 +0000 Henry S. Thompson output bytes, pickle and save dict if -p, trim lm value to int
Wed, 01 Jan 2025 23:02:35 +0000 Henry S. Thompson test big dict for associating lm timestamp with cc timestamp+uri
Thu, 03 Oct 2024 18:17:55 +0100 Henry S. Thompson working together works well to provide what's needed to update a cdx to include lastmod where possible
Wed, 02 Oct 2024 19:54:45 +0100 Henry S. Thompson make into a library, entry point def unpackz(infileName, callback, outfile = None),
Wed, 02 Oct 2024 11:09:58 +0100 Henry S. Thompson cleaned up indentation to 2 spaces throughout
Wed, 02 Oct 2024 09:56:37 +0100 Henry S. Thompson take bufsize from cmdline
Tue, 01 Oct 2024 15:59:26 +0100 Henry S. Thompson eof pblms fixed, seems to work
Sat, 28 Sep 2024 15:19:05 +0100 Henry S. Thompson working, but last count/offset not being written
Thu, 26 Sep 2024 17:54:12 +0100 Henry S. Thompson fix error message
Thu, 26 Sep 2024 12:38:34 +0100 Henry S. Thompson csing disabled for now
Thu, 26 Sep 2024 12:29:27 +0100 Henry S. Thompson font hacking, see also lib/xemacs/common-init.el
Thu, 26 Sep 2024 12:25:54 +0100 Henry S. Thompson new default from CC themselves
Thu, 26 Sep 2024 12:24:16 +0100 Henry S. Thompson for debugging?
Thu, 09 May 2024 12:36:57 +0100 Henry S. Thompson for use in Stuttgart, maybe
Sat, 02 Mar 2024 10:59:06 +0000 Henry S. Thompson xxx
Thu, 29 Feb 2024 15:01:10 +0000 Henry S. Thompson merge
Thu, 29 Feb 2024 15:01:02 +0000 Henry S. Thompson post-processing
Wed, 28 Feb 2024 18:31:52 +0000 Henry S. Thompson sic
Thu, 29 Feb 2024 14:59:50 +0000 Henry S. Thompson compute offset between LM and crawl timestamp
Thu, 29 Feb 2024 14:59:09 +0000 Henry S. Thompson sic
Wed, 28 Feb 2024 15:27:00 +0000 Henry S. Thompson rebuild to match triple fig line colour
Wed, 28 Feb 2024 15:13:38 +0000 Henry S. Thompson rebuild with more consistent appearance
Wed, 28 Feb 2024 14:50:08 +0000 Henry S. Thompson merge
Wed, 28 Feb 2024 14:49:45 +0000 Henry S. Thompson replaced mean_lens by w or wo bogon
Wed, 28 Feb 2024 14:44:59 +0000 Henry S. Thompson now using clean 2005 count
Wed, 28 Feb 2024 10:32:01 +0000 Henry Thompson minor addition?
Wed, 28 Feb 2024 10:20:44 +0000 Henry S. Thompson merge
Wed, 28 Feb 2024 10:15:56 +0000 Henry S. Thompson what is this?
Tue, 20 Feb 2024 15:23:47 +0000 Henry Thompson add percentage of non-latin by crawl table
Fri, 16 Feb 2024 16:24:28 +0000 Henry Thompson tld change investigation
Fri, 16 Feb 2024 13:54:12 +0000 Henry S. Thompson nl1 and tld summary results
Thu, 15 Feb 2024 22:31:09 +0000 Henry S. Thompson correct Usage
Thu, 15 Feb 2024 22:30:40 +0000 Henry S. Thompson csing-related tweaks
Thu, 15 Feb 2024 16:36:00 +0000 Henry S. Thompson merge
Thu, 15 Feb 2024 15:10:34 +0000 Henry S. Thompson see Paul:Documents/HTalks/WebSci2024
Thu, 11 Jan 2024 16:44:45 +0000 Henry S. Thompson add some debugging info
Thu, 11 Jan 2024 16:43:16 +0000 Henry S. Thompson use 2-digit suffixes,
Fri, 08 Dec 2023 10:32:07 +0000 Henry S. Thompson sic
Thu, 07 Dec 2023 18:23:11 +0000 Henry S. Thompson sic
Thu, 07 Dec 2023 18:21:48 +0000 Henry S. Thompson added back missing years
Thu, 07 Dec 2023 18:15:43 +0000 Henry S. Thompson support semilogy from cmd line
Wed, 06 Dec 2023 13:36:49 +0000 Henry S. Thompson means of all columns in length analyses
Wed, 06 Dec 2023 13:33:25 +0000 Henry S. Thompson normalise % counts by non-empty bases only
Tue, 05 Dec 2023 19:49:29 +0000 Henry S. Thompson new plots various
Tue, 05 Dec 2023 19:49:11 +0000 Henry S. Thompson get single graph working, tweak params various
Tue, 05 Dec 2023 10:35:15 +0000 Henry S. Thompson compute (component) uri lengths and a few other properties
Mon, 04 Dec 2023 19:06:13 +0000 Henry S. Thompson with three tracks from two years
Mon, 04 Dec 2023 10:42:02 +0000 Henry S. Thompson for pub
Mon, 04 Dec 2023 10:40:47 +0000 Henry S. Thompson tweaked formatting
Mon, 04 Dec 2023 10:21:30 +0000 Henry S. Thompson excel rewrote, no important changes (?)
Mon, 04 Dec 2023 09:42:39 +0000 Henry S. Thompson replace wrong one with right one
Mon, 04 Dec 2023 09:37:14 +0000 Henry S. Thompson merge
Mon, 04 Dec 2023 09:35:53 +0000 Henry S. Thompson implement alternative confidence measure using stats.bootstrap,
Mon, 04 Dec 2023 09:33:13 +0000 Henry S. Thompson for LMh percentile
Thu, 30 Nov 2023 14:42:46 +0000 Henry S. Thompson decorated
Thu, 30 Nov 2023 14:20:22 +0000 Henry S. Thompson merge
Thu, 30 Nov 2023 14:18:56 +0000 Henry S. Thompson can't add props to DescribeResult
Thu, 30 Nov 2023 14:17:34 +0000 Henry S. Thompson for 2023-40
Tue, 28 Nov 2023 18:40:38 +0000 Henry S. Thompson with decorations
Tue, 28 Nov 2023 18:40:17 +0000 Henry S. Thompson excel rewrote, no important changes (?)
Tue, 28 Nov 2023 10:23:20 +0000 Henry S. Thompson with percentile instead of raw mean correl
Tue, 28 Nov 2023 10:22:38 +0000 Henry S. Thompson change heatmap to by percentile
Tue, 28 Nov 2023 10:21:36 +0000 Henry S. Thompson with heat
Mon, 27 Nov 2023 22:15:39 +0000 Henry S. Thompson heat map for mime vs. nl1 vs. len
Mon, 27 Nov 2023 22:14:53 +0000 Henry S. Thompson add head_map fn
Mon, 27 Nov 2023 18:25:39 +0000 Henry S. Thompson add explore_deltas and predict analysis fns
Sun, 26 Nov 2023 21:24:38 +0000 Henry S. Thompson rename to avoid name clash with scipy.stats
Fri, 24 Nov 2023 20:41:03 +0000 Henry S. Thompson move to class with local vars instead of many globals
Fri, 24 Nov 2023 20:40:09 +0000 Henry S. Thompson renamed to by_interval.py
Fri, 24 Nov 2023 20:39:08 +0000 Henry S. Thompson renamed from spearman.py
Fri, 24 Nov 2023 20:38:39 +0000 Henry S. Thompson renamed to stats.py
Fri, 24 Nov 2023 19:52:52 +0000 Henry S. Thompson do the __main__ thing
Fri, 24 Nov 2023 19:52:14 +0000 Henry S. Thompson put results in numbered subdirs
Fri, 24 Nov 2023 19:50:12 +0000 Henry S. Thompson add minimal logging and don't return until finished
Wed, 15 Nov 2023 10:24:32 +0000 Henry S. Thompson should work for months also now
Wed, 15 Nov 2023 09:36:23 +0000 Henry S. Thompson cross-language confusion :-)
Mon, 06 Nov 2023 15:55:57 +0000 Henry S. Thompson LM plot for multiple crawls, magnitude or %age
Fri, 03 Nov 2023 19:05:54 +0000 Henry S. Thompson can overlay the two
Thu, 02 Nov 2023 15:38:39 +0000 Henry S. Thompson fix output year
Thu, 02 Nov 2023 13:49:02 +0000 Henry S. Thompson sic
Tue, 31 Oct 2023 14:05:12 +0000 Henry S. Thompson sic
Tue, 31 Oct 2023 14:04:24 +0000 Henry S. Thompson get in/out file management working right
Tue, 31 Oct 2023 14:03:02 +0000 Henry S. Thompson refactor to provide for buffer overflow fix
Tue, 31 Oct 2023 14:01:50 +0000 Henry S. Thompson bug-fix wrt 1st time,
Mon, 30 Oct 2023 12:19:53 +0000 Henry S. Thompson make extra file info optional
Wed, 25 Oct 2023 23:01:59 +0100 Henry S. Thompson forget parallel, just do (default 2) parallel single threads
Wed, 25 Oct 2023 23:00:45 +0100 Henry S. Thompson add missing makedir
Tue, 24 Oct 2023 16:59:23 +0100 Henry S. Thompson now does one named segment only
Tue, 24 Oct 2023 16:58:44 +0100 Henry S. Thompson resurrect parallel fetch
Tue, 24 Oct 2023 14:34:58 +0100 Henry S. Thompson convert to single thread,
Tue, 24 Oct 2023 14:26:36 +0100 Henry S. Thompson avoid global name conflict
Wed, 11 Oct 2023 12:51:06 +0100 Henry S. Thompson moved from /beegfs/common-crawl to get under .hg
Wed, 11 Oct 2023 12:50:29 +0100 Henry S. Thompson fix typo
Fri, 06 Oct 2023 15:06:53 +0100 Henry S. Thompson build cluster.idx
Fri, 06 Oct 2023 15:05:55 +0100 Henry S. Thompson no longer using cmp_to_key
Wed, 04 Oct 2023 20:04:34 +0100 Henry S. Thompson handle -m case, support src from cmdline (mergefix)
Thu, 05 Oct 2023 10:42:15 +0100 Henry S. Thompson new branch to save do_idx.sh from abandoned merge fixup (mergefix)
Wed, 04 Oct 2023 18:53:55 +0100 Henry S. Thompson try to get the counts right, particularly when re-merging
Wed, 04 Oct 2023 18:51:56 +0100 Henry S. Thompson for use in debugging, see notes and tests 2, 17, merge test
Tue, 03 Oct 2023 17:45:57 +0100 Henry S. Thompson add various www deletion cases
Tue, 03 Oct 2023 17:44:59 +0100 Henry S. Thompson iterate WPAT fix with improved pattern
Tue, 03 Oct 2023 17:43:52 +0100 Henry S. Thompson loosen WARC pattern to avoid failure from "mime" = "{...}" intervening
Mon, 02 Oct 2023 18:56:50 +0100 Henry S. Thompson refactor to enable rerun with fixup,
Mon, 02 Oct 2023 18:55:48 +0100 Henry S. Thompson correct mistaken futnsz test,
Mon, 02 Oct 2023 18:54:10 +0100 Henry S. Thompson change path to merge_date.py
Mon, 02 Oct 2023 18:52:43 +0100 Henry S. Thompson remove the mistaken deletion of NONPRINT,
Sat, 30 Sep 2023 18:04:15 +0100 Henry S. Thompson fix a bad fix and a bad test for the televida case
Sat, 30 Sep 2023 14:13:19 +0100 Henry S. Thompson fix and test for all-decimal host
Sat, 30 Sep 2023 14:12:39 +0100 Henry S. Thompson no import in lmh.__init__ any more