Wed, 02 Oct 2024 09:56:37 +0100 |
Henry S. Thompson |
take bufsize from cmdline
|
Tue, 01 Oct 2024 15:59:26 +0100 |
Henry S. Thompson |
EOF problems fixed, seems to work
|
Sat, 28 Sep 2024 15:19:05 +0100 |
Henry S. Thompson |
working, but last count/offset not being written
|
Thu, 26 Sep 2024 17:54:12 +0100 |
Henry S. Thompson |
fix error message
|
Thu, 26 Sep 2024 12:38:34 +0100 |
Henry S. Thompson |
csing disabled for now
|
Thu, 26 Sep 2024 12:29:27 +0100 |
Henry S. Thompson |
font hacking, see also lib/xemacs/common-init.el
|
Thu, 26 Sep 2024 12:25:54 +0100 |
Henry S. Thompson |
new default from CC themselves
|
Thu, 26 Sep 2024 12:24:16 +0100 |
Henry S. Thompson |
for debugging?
|
Thu, 09 May 2024 12:36:57 +0100 |
Henry S. Thompson |
for use in Stuttgart, maybe
|
Sat, 02 Mar 2024 10:59:06 +0000 |
Henry S. Thompson |
xxx
|
Thu, 29 Feb 2024 15:01:10 +0000 |
Henry S. Thompson |
merge
|
Thu, 29 Feb 2024 15:01:02 +0000 |
Henry S. Thompson |
post-processing
|
Wed, 28 Feb 2024 18:31:52 +0000 |
Henry S. Thompson |
sic
|
Thu, 29 Feb 2024 14:59:50 +0000 |
Henry S. Thompson |
compute offset between LM and crawl timestamp
|
Thu, 29 Feb 2024 14:59:09 +0000 |
Henry S. Thompson |
sic
|
Wed, 28 Feb 2024 15:27:00 +0000 |
Henry S. Thompson |
rebuild to match triple fig line colour
|
Wed, 28 Feb 2024 15:13:38 +0000 |
Henry S. Thompson |
rebuild with more consistent appearance
|
Wed, 28 Feb 2024 14:50:08 +0000 |
Henry S. Thompson |
merge
|
Wed, 28 Feb 2024 14:49:45 +0000 |
Henry S. Thompson |
replaced mean_lens by w or wo bogon
|
Wed, 28 Feb 2024 14:44:59 +0000 |
Henry S. Thompson |
now using clean 2005 count
|
Wed, 28 Feb 2024 10:32:01 +0000 |
Henry Thompson |
minor addition?
|
Wed, 28 Feb 2024 10:20:44 +0000 |
Henry S. Thompson |
merge
|
Wed, 28 Feb 2024 10:15:56 +0000 |
Henry S. Thompson |
what is this?
|
Tue, 20 Feb 2024 15:23:47 +0000 |
Henry Thompson |
add percentage of non-latin by crawl table
|
Fri, 16 Feb 2024 16:24:28 +0000 |
Henry Thompson |
tld change investigation
|
Fri, 16 Feb 2024 13:54:12 +0000 |
Henry S. Thompson |
nl1 and tld summary results
|
Thu, 15 Feb 2024 22:31:09 +0000 |
Henry S. Thompson |
correct Usage
|
Thu, 15 Feb 2024 22:30:40 +0000 |
Henry S. Thompson |
csing-related tweaks
|
Thu, 15 Feb 2024 16:36:00 +0000 |
Henry S. Thompson |
merge
|
Thu, 15 Feb 2024 15:10:34 +0000 |
Henry S. Thompson |
see Paul:Documents/HTalks/WebSci2024
|
Thu, 11 Jan 2024 16:44:45 +0000 |
Henry S. Thompson |
add some debugging info
|
Thu, 11 Jan 2024 16:43:16 +0000 |
Henry S. Thompson |
use 2-digit suffixes,
|
Fri, 08 Dec 2023 10:32:07 +0000 |
Henry S. Thompson |
sic
|
Thu, 07 Dec 2023 18:23:11 +0000 |
Henry S. Thompson |
sic
|
Thu, 07 Dec 2023 18:21:48 +0000 |
Henry S. Thompson |
added back missing years
|
Thu, 07 Dec 2023 18:15:43 +0000 |
Henry S. Thompson |
support semilogy from cmd line
|
Wed, 06 Dec 2023 13:36:49 +0000 |
Henry S. Thompson |
means of all columns in length analyses
|
Wed, 06 Dec 2023 13:33:25 +0000 |
Henry S. Thompson |
normalise % counts by non-empty bases only
|
Tue, 05 Dec 2023 19:49:29 +0000 |
Henry S. Thompson |
new plots various
|
Tue, 05 Dec 2023 19:49:11 +0000 |
Henry S. Thompson |
get single graph working, tweak params various
|
Tue, 05 Dec 2023 10:35:15 +0000 |
Henry S. Thompson |
compute (component) uri lengths and a few other properties
|
Mon, 04 Dec 2023 19:06:13 +0000 |
Henry S. Thompson |
with three tracks from two years
|
Mon, 04 Dec 2023 10:42:02 +0000 |
Henry S. Thompson |
for pub
|
Mon, 04 Dec 2023 10:40:47 +0000 |
Henry S. Thompson |
tweaked formatting
|
Mon, 04 Dec 2023 10:21:30 +0000 |
Henry S. Thompson |
excel rewrote, no important changes (?)
|
Mon, 04 Dec 2023 09:42:39 +0000 |
Henry S. Thompson |
replace wrong one with right one
|
Mon, 04 Dec 2023 09:37:14 +0000 |
Henry S. Thompson |
merge
|
Mon, 04 Dec 2023 09:35:53 +0000 |
Henry S. Thompson |
implement alternative confidence measure using stats.bootstrap,
|
Mon, 04 Dec 2023 09:33:13 +0000 |
Henry S. Thompson |
for LMh percentile
|
Thu, 30 Nov 2023 14:42:46 +0000 |
Henry S. Thompson |
decorated
|
Thu, 30 Nov 2023 14:20:22 +0000 |
Henry S. Thompson |
merge
|
Thu, 30 Nov 2023 14:18:56 +0000 |
Henry S. Thompson |
can't add props to DescribeResult
|
Thu, 30 Nov 2023 14:17:34 +0000 |
Henry S. Thompson |
for 2023-40
|
Tue, 28 Nov 2023 18:40:38 +0000 |
Henry S. Thompson |
with decorations
|
Tue, 28 Nov 2023 18:40:17 +0000 |
Henry S. Thompson |
excel rewrote, no important changes (?)
|
Tue, 28 Nov 2023 10:23:20 +0000 |
Henry S. Thompson |
with percentile instead of raw mean correl
|
Tue, 28 Nov 2023 10:22:38 +0000 |
Henry S. Thompson |
change heatmap to by percentile
|
Tue, 28 Nov 2023 10:21:36 +0000 |
Henry S. Thompson |
with heat
|
Mon, 27 Nov 2023 22:15:39 +0000 |
Henry S. Thompson |
heat map for mime vs. nl1 vs. len
|
Mon, 27 Nov 2023 22:14:53 +0000 |
Henry S. Thompson |
add heat_map fn
|
Mon, 27 Nov 2023 18:25:39 +0000 |
Henry S. Thompson |
add explore_deltas and predict analysis fns
|
Sun, 26 Nov 2023 21:24:38 +0000 |
Henry S. Thompson |
rename to avoid name clash with scipy.stats
|
Fri, 24 Nov 2023 20:41:03 +0000 |
Henry S. Thompson |
move to class with local vars instead of many globals
|
Fri, 24 Nov 2023 20:40:09 +0000 |
Henry S. Thompson |
renamed to by_interval.py
|
Fri, 24 Nov 2023 20:39:08 +0000 |
Henry S. Thompson |
renamed from spearman.py
|
Fri, 24 Nov 2023 20:38:39 +0000 |
Henry S. Thompson |
renamed to stats.py
|
Fri, 24 Nov 2023 19:52:52 +0000 |
Henry S. Thompson |
do the __main__ thing
|
Fri, 24 Nov 2023 19:52:14 +0000 |
Henry S. Thompson |
put results in numbered subdirs
|
Fri, 24 Nov 2023 19:50:12 +0000 |
Henry S. Thompson |
add minimal logging and don't return until finished
|
Wed, 15 Nov 2023 10:24:32 +0000 |
Henry S. Thompson |
should work for months also now
|
Wed, 15 Nov 2023 09:36:23 +0000 |
Henry S. Thompson |
cross-language confusion :-)
|
Mon, 06 Nov 2023 15:55:57 +0000 |
Henry S. Thompson |
LM plot for multiple crawls, magnitude or %age
|
Fri, 03 Nov 2023 19:05:54 +0000 |
Henry S. Thompson |
can overlay the two
|
Thu, 02 Nov 2023 15:38:39 +0000 |
Henry S. Thompson |
fix output year
|
Thu, 02 Nov 2023 13:49:02 +0000 |
Henry S. Thompson |
sic
|
Tue, 31 Oct 2023 14:05:12 +0000 |
Henry S. Thompson |
sic
|
Tue, 31 Oct 2023 14:04:24 +0000 |
Henry S. Thompson |
get in/out file management working right
|
Tue, 31 Oct 2023 14:03:02 +0000 |
Henry S. Thompson |
refactor to provide for buffer overflow fix
|
Tue, 31 Oct 2023 14:01:50 +0000 |
Henry S. Thompson |
bug-fix wrt 1st time,
|
Mon, 30 Oct 2023 12:19:53 +0000 |
Henry S. Thompson |
make extra file info optional
|
Wed, 25 Oct 2023 23:01:59 +0100 |
Henry S. Thompson |
forget parallel, just do (default 2) parallel single threads
|
Wed, 25 Oct 2023 23:00:45 +0100 |
Henry S. Thompson |
add missing makedir
|
Tue, 24 Oct 2023 16:59:23 +0100 |
Henry S. Thompson |
now does one named segment only
|
Tue, 24 Oct 2023 16:58:44 +0100 |
Henry S. Thompson |
resurrect parallel fetch
|
Tue, 24 Oct 2023 14:34:58 +0100 |
Henry S. Thompson |
convert to single thread,
|
Tue, 24 Oct 2023 14:26:36 +0100 |
Henry S. Thompson |
avoid global name conflict
|
Wed, 11 Oct 2023 12:51:06 +0100 |
Henry S. Thompson |
moved from /beegfs/common-crawl to get under .hg
|
Wed, 11 Oct 2023 12:50:29 +0100 |
Henry S. Thompson |
fix typo
|
Fri, 06 Oct 2023 15:06:53 +0100 |
Henry S. Thompson |
build cluster.idx
|
Fri, 06 Oct 2023 15:05:55 +0100 |
Henry S. Thompson |
no longer using cmp_to_key
|
Wed, 04 Oct 2023 20:04:34 +0100 |
Henry S. Thompson |
handle -m case, support src from cmdline (branch: mergefix)
|
Thu, 05 Oct 2023 10:42:15 +0100 |
Henry S. Thompson |
new branch to save do_idx.sh from abandoned merge fixup (branch: mergefix)
|
Wed, 04 Oct 2023 18:53:55 +0100 |
Henry S. Thompson |
try to get the counts right, particularly when re-merging
|
Wed, 04 Oct 2023 18:51:56 +0100 |
Henry S. Thompson |
for use in debugging, see notes and tests 2, 17, merge test
|
Tue, 03 Oct 2023 17:45:57 +0100 |
Henry S. Thompson |
add various www deletion cases
|
Tue, 03 Oct 2023 17:44:59 +0100 |
Henry S. Thompson |
iterate WPAT fix with improved pattern
|
Tue, 03 Oct 2023 17:43:52 +0100 |
Henry S. Thompson |
loosen WARC pattern to avoid failure from "mime" = "{...}" intervening
|
Mon, 02 Oct 2023 18:56:50 +0100 |
Henry S. Thompson |
refactor to enable rerun with fixup,
|
Mon, 02 Oct 2023 18:55:48 +0100 |
Henry S. Thompson |
correct mistaken futnsz test,
|
Mon, 02 Oct 2023 18:54:10 +0100 |
Henry S. Thompson |
change path to merge_date.py
|
Mon, 02 Oct 2023 18:52:43 +0100 |
Henry S. Thompson |
remove the mistaken deletion of NONPRINT,
|
Sat, 30 Sep 2023 18:04:15 +0100 |
Henry S. Thompson |
fix a bad fix and a bad test for the televida case
|
Sat, 30 Sep 2023 14:13:19 +0100 |
Henry S. Thompson |
fix and test for all-decimal host
|
Sat, 30 Sep 2023 14:12:39 +0100 |
Henry S. Thompson |
no import in lmh.__init__ any more
|
Sat, 30 Sep 2023 14:11:49 +0100 |
Henry S. Thompson |
importing in __init__ causes problems
|
Fri, 29 Sep 2023 15:59:34 +0100 |
Henry S. Thompson |
commented out duplicate, handle comments better
|
Fri, 29 Sep 2023 15:14:29 +0100 |
Henry S. Thompson |
more corner case tests
|
Fri, 29 Sep 2023 15:13:51 +0100 |
Henry S. Thompson |
tweaks to get all tests through #14
|
Thu, 28 Sep 2023 18:31:23 +0100 |
Henry S. Thompson |
get 7f (two cases) and %25 working
|
Thu, 28 Sep 2023 18:30:48 +0100 |
Henry S. Thompson |
add televida case test
|
Thu, 28 Sep 2023 16:36:15 +0100 |
Henry S. Thompson |
add test description
|
Thu, 28 Sep 2023 16:35:39 +0100 |
Henry S. Thompson |
importable just in case
|
Thu, 28 Sep 2023 16:34:49 +0100 |
Henry S. Thompson |
move most of the hacking into fixGoogleCanon,
|
Thu, 28 Sep 2023 16:10:05 +0100 |
Henry S. Thompson |
forget assert, allow multiple failures
|
Thu, 28 Sep 2023 16:09:38 +0100 |
Henry S. Thompson |
x
|
Thu, 28 Sep 2023 14:08:36 +0100 |
Henry S. Thompson |
found right place for \x7f hack, maybe
|
Thu, 28 Sep 2023 14:06:11 +0100 |
Henry S. Thompson |
readability
|
Thu, 28 Sep 2023 11:00:36 +0100 |
Henry S. Thompson |
x
|
Thu, 28 Sep 2023 11:00:24 +0100 |
Henry S. Thompson |
refactor to sort a module in an lmh package
|
Thu, 28 Sep 2023 10:54:12 +0100 |
Henry S. Thompson |
start some regression tests
|