log

age author description
2 months ago Henry S. Thompson try adding lm to existing index from ks_0-9
2 months ago Henry S. Thompson output bytes, pickle and save dict if -p, trim lm value to int
2 months ago Henry S. Thompson test big dict for associating lm timestamp with cc timestamp+uri
5 months ago Henry S. Thompson working together works well to provide what's needed to update a cdx to include lastmod where possible
5 months ago Henry S. Thompson make into a library, entry point def unpackz(infileName, callback, outfile = None),
5 months ago Henry S. Thompson cleaned up indentation to 2 spaces throughout
5 months ago Henry S. Thompson take bufsize from cmdline
5 months ago Henry S. Thompson EOF problems fixed, seems to work
5 months ago Henry S. Thompson working, but last count/offset not being written
5 months ago Henry S. Thompson fix error message
5 months ago Henry S. Thompson csing disabled for now
5 months ago Henry S. Thompson font hacking, see also lib/xemacs/common-init.el
5 months ago Henry S. Thompson new default from CC themselves
5 months ago Henry S. Thompson for debugging?
10 months ago Henry S. Thompson for use in Stuttgart, maybe
12 months ago Henry S. Thompson xxx
12 months ago Henry S. Thompson merge
12 months ago Henry S. Thompson post-processing
12 months ago Henry S. Thompson sic
12 months ago Henry S. Thompson compute offset between LM and crawl timestamp
12 months ago Henry S. Thompson sic
12 months ago Henry S. Thompson rebuild to match triple fig line colour
12 months ago Henry S. Thompson rebuild with more consistent appearance
12 months ago Henry S. Thompson merge
12 months ago Henry S. Thompson replaced mean_lens by w or wo bogon
12 months ago Henry S. Thompson now using clean 2005 count
12 months ago Henry Thompson minor addition?
12 months ago Henry S. Thompson merge
12 months ago Henry S. Thompson what is this?
12 months ago Henry Thompson add percentage of non-latin by crawl table
12 months ago Henry Thompson tld change investigation
12 months ago Henry S. Thompson nl1 and tld summary results
12 months ago Henry S. Thompson correct Usage
12 months ago Henry S. Thompson csing-related tweaks
12 months ago Henry S. Thompson merge
12 months ago Henry S. Thompson see Paul:Documents/HTalks/WebSci2024
14 months ago Henry S. Thompson add some debugging info
14 months ago Henry S. Thompson use 2-digit suffixes,
15 months ago Henry S. Thompson sic
15 months ago Henry S. Thompson sic
15 months ago Henry S. Thompson added back missing years
15 months ago Henry S. Thompson support semilogy from cmd line
15 months ago Henry S. Thompson means of all columns in length analyses
15 months ago Henry S. Thompson normalise % counts by non-empty bases only
15 months ago Henry S. Thompson new plots various
15 months ago Henry S. Thompson get single graph working, tweak params various
15 months ago Henry S. Thompson compute (component) uri lengths and a few other properties
15 months ago Henry S. Thompson with three tracks from two years
15 months ago Henry S. Thompson for pub
15 months ago Henry S. Thompson tweaked formatting
15 months ago Henry S. Thompson excel rewrote, no important changes (?)
15 months ago Henry S. Thompson replace wrong one with right one
15 months ago Henry S. Thompson merge
15 months ago Henry S. Thompson implement alternative confidence measure using stats.bootstrap,
15 months ago Henry S. Thompson for LMh percentile
15 months ago Henry S. Thompson decorated
15 months ago Henry S. Thompson merge
15 months ago Henry S. Thompson can't add props to DescribeResult
15 months ago Henry S. Thompson for 2023-40
15 months ago Henry S. Thompson with decorations
15 months ago Henry S. Thompson excel rewrote, no important changes (?)
15 months ago Henry S. Thompson with percentile instead of raw mean correl
15 months ago Henry S. Thompson change heatmap to by percentile
15 months ago Henry S. Thompson with heat
15 months ago Henry S. Thompson heat map for mime vs. nl1 vs. len
15 months ago Henry S. Thompson add head_map fn
15 months ago Henry S. Thompson add explore_deltas and predict analysis fns
15 months ago Henry S. Thompson rename to avoid name clash with scipy.stats
15 months ago Henry S. Thompson move to class with local vars instead of many globals
15 months ago Henry S. Thompson renamed to by_interval.py
15 months ago Henry S. Thompson renamed from spearman.py
15 months ago Henry S. Thompson renamed to stats.py
15 months ago Henry S. Thompson do the __main__ thing
15 months ago Henry S. Thompson put results in numbered subdirs
15 months ago Henry S. Thompson add minimal logging and don't return until finished
16 months ago Henry S. Thompson should work for months also now
16 months ago Henry S. Thompson cross-language confusion :-)
16 months ago Henry S. Thompson LM plot for multiple crawls, magnitude or %age
16 months ago Henry S. Thompson can overlay the two
16 months ago Henry S. Thompson fix output year
16 months ago Henry S. Thompson sic
16 months ago Henry S. Thompson sic
16 months ago Henry S. Thompson get in/out file management working right
16 months ago Henry S. Thompson refactor to provide for buffer overflow fix
16 months ago Henry S. Thompson bug-fix wrt 1st time,
16 months ago Henry S. Thompson make extra file info optional
16 months ago Henry S. Thompson forget parallel, just do (default 2) parallel single threads
16 months ago Henry S. Thompson add missing makedir
16 months ago Henry S. Thompson now does one named segment only
16 months ago Henry S. Thompson resurrect parallel fetch
16 months ago Henry S. Thompson convert to single thread,
16 months ago Henry S. Thompson avoid global name conflict
17 months ago Henry S. Thompson moved from /beegfs/common-crawl to get under .hg
17 months ago Henry S. Thompson fix typo
17 months ago Henry S. Thompson build cluster.idx
17 months ago Henry S. Thompson no longer using cmp_to_key
17 months ago Henry S. Thompson handle -m case, support src from cmdline (branch: mergefix)
17 months ago Henry S. Thompson new branch to save do_idx.sh from abandoned merge fixup (branch: mergefix)
17 months ago Henry S. Thompson try to get the counts right, particularly when re-merging
17 months ago Henry S. Thompson for use in debugging, see notes and tests 2, 17, merge test
17 months ago Henry S. Thompson add various www deletion cases
17 months ago Henry S. Thompson iterate WPAT fix with improved pattern
17 months ago Henry S. Thompson loosen WARC pattern to avoid failure from "mime" = "{...}" intervening
17 months ago Henry S. Thompson refactor to enable rerun with fixup,
17 months ago Henry S. Thompson correct mistaken futnsz test,
17 months ago Henry S. Thompson change path to merge_date.py
17 months ago Henry S. Thompson remove the mistaken deletion of NONPRINT,
17 months ago Henry S. Thompson fix a bad fix and a bad test for the televida case
17 months ago Henry S. Thompson fix and test for all-decimal host
17 months ago Henry S. Thompson no import in lmh.__init__ any more
17 months ago Henry S. Thompson importing in __init__ causes problems
17 months ago Henry S. Thompson commented out duplicate, handle comments better