log

age author description
15 months ago Henry S. Thompson renamed to by_interval.py
15 months ago Henry S. Thompson renamed from spearman.py
15 months ago Henry S. Thompson renamed to stats.py
15 months ago Henry S. Thompson do the __main__ thing
15 months ago Henry S. Thompson put results in numbered subdirs
15 months ago Henry S. Thompson add minimal logging and don't return until finished
15 months ago Henry S. Thompson should work for months also now
15 months ago Henry S. Thompson cross-language confusion :-)
15 months ago Henry S. Thompson LM plot for multiple crawls, magnitude or %age
15 months ago Henry S. Thompson can overlay the two
15 months ago Henry S. Thompson fix output year
15 months ago Henry S. Thompson sic
16 months ago Henry S. Thompson sic
16 months ago Henry S. Thompson get in/out file management working right
16 months ago Henry S. Thompson refactor to provide for buffer overflow fix
16 months ago Henry S. Thompson bug-fix wrt 1st time,
16 months ago Henry S. Thompson make extra file info optional
16 months ago Henry S. Thompson forget parallel, just do (default 2) parallel single threads
16 months ago Henry S. Thompson add missing makedir
16 months ago Henry S. Thompson now does one named segment only
16 months ago Henry S. Thompson resurrect parallel fetch
16 months ago Henry S. Thompson convert to single thread,
16 months ago Henry S. Thompson avoid global name conflict
16 months ago Henry S. Thompson moved from /beegfs/common-crawl to get under .hg
16 months ago Henry S. Thompson fix typo
16 months ago Henry S. Thompson build cluster.idx
16 months ago Henry S. Thompson no longer using cmp_to_key
16 months ago Henry S. Thompson handle -m case, support src from cmdline mergefix
16 months ago Henry S. Thompson new branch to save do_idx.sh from abandoned merge fixup mergefix
16 months ago Henry S. Thompson try to get the counts right, particularly when re-merging
16 months ago Henry S. Thompson for use in debugging, see notes and tests 2, 17, merge test
16 months ago Henry S. Thompson add various www deletion cases
16 months ago Henry S. Thompson iterate WPAT fix with improved pattern
16 months ago Henry S. Thompson loosen WARC pattern to avoid failure from "mime" = "{...}" intervening
17 months ago Henry S. Thompson refactor to enable rerun with fixup,
17 months ago Henry S. Thompson correct mistaken futnsz test,
17 months ago Henry S. Thompson change path to merge_date.py
17 months ago Henry S. Thompson remove the mistaken deletion of NONPRINT,
17 months ago Henry S. Thompson fix a bad fix and a bad test for the televida case
17 months ago Henry S. Thompson fix and test for all-decimal host
17 months ago Henry S. Thompson no import in lmh.__init__ any more
17 months ago Henry S. Thompson importing in __init__ causes problems
17 months ago Henry S. Thompson commented out duplicate, handle comments better
17 months ago Henry S. Thompson more corner case tests
17 months ago Henry S. Thompson tweaks to get all tests through #14
17 months ago Henry S. Thompson get 7f (two cases) and %25 working
17 months ago Henry S. Thompson add televida case test
17 months ago Henry S. Thompson add test description
17 months ago Henry S. Thompson importable just in case
17 months ago Henry S. Thompson move most of the hacking into fixGoogleCanon,
17 months ago Henry S. Thompson forget assert, allow multiple failures
17 months ago Henry S. Thompson x
17 months ago Henry S. Thompson found right place for \x7f hack, maybe
17 months ago Henry S. Thompson readability
17 months ago Henry S. Thompson x
17 months ago Henry S. Thompson refactor to sort a module in an lmh package