changeset 69:fb3dcd144e59 default tip

notes.txt
author Henry S. Thompson <ht@inf.ed.ac.uk>
date Tue, 11 Feb 2025 15:31:47 +0000
parents 3cd52d1849bb
children
files APP39557 Vision and Approach.docx lurid3/notes.txt
diffstat 2 files changed, 80 insertions(+), 0 deletions(-) [+]
Binary file APP39557 Vision and Approach.docx has changed
--- a/lurid3/notes.txt	Mon Feb 10 15:27:12 2025 +0000
+++ b/lurid3/notes.txt	Tue Feb 11 15:31:47 2025 +0000
@@ -1429,7 +1429,87 @@
   -rw-r--r-- 1 hst dc007 3.9G Feb 10 15:22 ks-13.cdb
   -rw-r--r-- 1 hst dc007 3.9G Feb 10 15:22 ks-14.cdb
   -rw-r--r-- 1 hst dc007 3.9G Feb 10 15:22 ks-15.cdb
+  >: time cdbtest < ks-00.cdb
+  found: 32449485
+  different record: 0
+  bad length: 0
+  not found: 0
+  untested: 19063
 
+  real    2m42.674s
+  user    0m46.146s
+  sys     0m29.479s
+  >: ls *.cdb | fgrep -v 00 | parallel -j 8 'cdbtest < {} | egrep -v " 0$"' |& tee /tmp/hst/cdbtest.out &
+  found: 32588023
+  untested: 23504
+  found: 32180038
+  untested: 14228
+  found: 33045196
+  untested: 13271
+  found: 32216571
+  untested: 19361
+  found: 33329508
+  untested: 11891
+  found: 32449830
+  untested: 17097
+  found: 32913005
+  untested: 12445
+  found: 33046940
+  untested: 12611
+  found: 33056259
+  untested: 12760
+  found: 32277255
+  untested: 17407
+  found: 32270904
+  untested: 11798
+  found: 32838272
+  untested: 14305
+  found: 32560341
+  untested: 24139
+  found: 33389520
+  untested: 12154
+  found: 33002021
+  untested: 14181
+
+All good.
+Loaded all 16 into a Python image w/o difficulty.
+Converted CCdb shim to use cpdef for most methods w/o apparent loss of
+speed.
+
+But, sigh, the cdbs need to split on segment boundaries, so that the
+filename in an original index entry tells me which cdb to look in.
+  >: echo ?/ks_lines.txt ??/ks_lines.txt | xargs cat | tr ' ' '\n' |python3 -c 'M = 33000000
+  import sys
+  n = 0
+  for i in range(100):
+   j = int(sys.stdin.readline())
+   if n+j > M:
+    print(i-1, n, sep = "\t")
+    n = j
+   else:
+    n += j
+  ' > ks_divs.tsv
+  >: cat ks_divs.tsv
+  5       31296954
+  11      31587394
+  17      31428928
+  23      31622785
+  29      31304094
+  35      31467923
+  41      31502409
+  47      31456463
+  53      31462669
+  59      31538406
+  65      31355147
+  71      31281451
+  77      31429609
+  83      31389542
+  89      31444473
+  95      31385531
+  >: wc -l ks_divs.tsv
+  16 ks_divs.tsv
+
+So, has to be 17 :-( The last bucket (segments 96-99) is never printed.
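The split on segment boundaries can be sketched as a function that also reports the final partial bucket, together with a lookup from segment number to cdb number. This is only a sketch: the names segment_divisions and cdb_for_segment are hypothetical, the 33000000 limit is taken from the one-liner above, and it assumes cdbs are numbered in bucket order starting from 0. Unlike the one-liner, it skips an empty leading bucket if the very first count already exceeds the limit.

```python
from bisect import bisect_left


def segment_divisions(counts, limit=33_000_000):
    """Greedily pack per-segment key counts into buckets of at most `limit` keys.

    Returns a list of (last_segment_index, keys_in_bucket) pairs, one per
    bucket, including the final partial bucket.
    """
    divs = []
    n = 0
    for i, j in enumerate(counts):
        if n and n + j > limit:
            # Segment i does not fit: close the bucket at segment i-1.
            divs.append((i - 1, n))
            n = j
        else:
            n += j
    if n:
        # The remainder after the last division is a bucket too.
        divs.append((len(counts) - 1, n))
    return divs


def cdb_for_segment(divs, seg):
    """Return the 0-based bucket (cdb) number that holds segment `seg`."""
    ends = [end for end, _ in divs]
    return bisect_left(ends, seg)


if __name__ == "__main__":
    # Toy counts, not real data: five segments of 10 keys, limit 25.
    divs = segment_divisions([10, 10, 10, 10, 10], limit=25)
    for end, total in divs:
        print(end, total, sep="\t")
    print(cdb_for_segment(divs, 2))
```

Run over the 100 real per-segment counts, this would emit the 16 boundaries listed in ks_divs.tsv plus one more line for segments 96-99, which is where the 17th cdb comes from.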
 ================
 
 Try it with the existing _per segment_ index we have for 2019-35