changeset 69:fb3dcd144e59 default tip
notes.txt
author | Henry S. Thompson <ht@inf.ed.ac.uk> |
---|---|
date | Tue, 11 Feb 2025 15:31:47 +0000 |
parents | 3cd52d1849bb |
children | |
files | APP39557 Vision and Approach.docx, lurid3/notes.txt |
diffstat | 2 files changed, 80 insertions(+), 0 deletions(-) |
--- a/lurid3/notes.txt	Mon Feb 10 15:27:12 2025 +0000
+++ b/lurid3/notes.txt	Tue Feb 11 15:31:47 2025 +0000
@@ -1429,7 +1429,87 @@
  -rw-r--r-- 1 hst dc007 3.9G Feb 10 15:22 ks-13.cdb
  -rw-r--r-- 1 hst dc007 3.9G Feb 10 15:22 ks-14.cdb
  -rw-r--r-- 1 hst dc007 3.9G Feb 10 15:22 ks-15.cdb
+ >: time cdbtest < ks-00.cdb
+ found: 32449485
+ different record: 0
+ bad length: 0
+ not found: 0
+ untested: 19063
+ real 2m42.674s
+ user 0m46.146s
+ sys 0m29.479s
+ >: ls *.cdb | fgrep -v 00 | parallel -j 8 'cdbtest < {} | egrep -v " 0$"' |& tee /tmp/hst/cdbtest.out &
+ found: 32588023
+ untested: 23504
+ found: 32180038
+ untested: 14228
+ found: 33045196
+ untested: 13271
+ found: 32216571
+ untested: 19361
+ found: 33329508
+ untested: 11891
+ found: 32449830
+ untested: 17097
+ found: 32913005
+ untested: 12445
+ found: 33046940
+ untested: 12611
+ found: 33056259
+ untested: 12760
+ found: 32277255
+ untested: 17407
+ found: 32270904
+ untested: 11798
+ found: 32838272
+ untested: 14305
+ found: 32560341
+ untested: 24139
+ found: 33389520
+ untested: 12154
+ found: 33002021
+ untested: 14181
+
+All good.
+Loaded all 16 in to a python image w/o difficulty
+Converted CCdb shim to use cpdef for most methods w/o apparent loss of
+speed.
+
+But, sigh, need to be on segment boundaries so I can tell by the
+filename in an original index entry which cdb to look in.
+ >: echo ?/ks_lines.txt ??/ks_lines.txt | xargs cat | tr ' ' '\n' |python3 -c 'M = 33000000
+ import sys
+ n = 0
+ for i in range(100):
+   j = int(sys.stdin.readline())
+   if n+j > M:
+     print(i-1, n, sep = "\t")
+     n = j
+   else:
+     n += j
+ ' > ks_divs.tsv
+ >: cat ks_divs.tsv
+ 5	31296954
+ 11	31587394
+ 17	31428928
+ 23	31622785
+ 29	31304094
+ 35	31467923
+ 41	31502409
+ 47	31456463
+ 53	31462669
+ 59	31538406
+ 65	31355147
+ 71	31281451
+ 77	31429609
+ 83	31389542
+ 89	31444473
+ 95	31385531
+ >: wc -l ks_divs.tsv
+ 16 ks_divs.tsv
+
+So, has to be 17 :-(
 ================
 Try it with the existing _per segment_ index we have for 2019-35
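
Editorial note on the "has to be 17" conclusion: the last line of ks_divs.tsv ends at segment 95, and the one-liner only prints a group when the next segment would overflow it, so the group still accumulating when input ends (presumably segments 96-99) is never written out. The following is a minimal sketch of the same greedy grouping with that final group included. The function name segment_divisions and the uniform count of 5,250,000 lines per segment are hypothetical, chosen only so that six segments fit under the 33,000,000 limit; the real counts are in the ks_lines.txt files and are not reproduced here.

```python
def segment_divisions(counts, limit):
    """Greedily pack consecutive segments into groups of at most `limit` lines.

    Returns a list of (last_segment_index, total_lines) pairs, one per group,
    including the final group that the original one-liner does not print.
    """
    groups = []
    total = 0
    for i, count in enumerate(counts):
        # Close the current group when adding this segment would exceed the
        # limit. The `total and` guard (not in the original) avoids emitting
        # an empty group if the very first segment is itself over the limit.
        if total and total + count > limit:
            groups.append((i - 1, total))
            total = count
        else:
            total += count
    if counts:
        # Flush the group still being accumulated at end of input.
        groups.append((len(counts) - 1, total))
    return groups


# Hypothetical input: 100 segments of equal size (an assumption, see above).
hypothetical_counts = [5_250_000] * 100
divisions = segment_divisions(hypothetical_counts, 33_000_000)

for last_segment, total_lines in divisions:
    print(last_segment, total_lines, sep="\t")
print(len(divisions), "groups")
```

With these assumed counts the first sixteen groups end at segments 5, 11, ..., 95, matching the shape of ks_divs.tsv, and the flush adds a seventeenth group ending at segment 99.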