cc/work: lurid3/notes.txt comparison

comparison lurid3/notes.txt @ 67:24ca6ab32e47

malloc

author	Henry S. Thompson <ht@inf.ed.ac.uk>
date	Tue, 04 Feb 2025 12:50:52 +0000
parents	0c814f07865a
children	3cd52d1849bb

comparison

equal deleted inserted replaced

-:0c814f07865a
+:24ca6ab32e47
 tested
 2.055266048759222
 Oops, that was ndb, and nndb doesn't work!
 Things to try next:
-1) Build a bigger .cdb w. as close to 4GB as possible
+1) Build a bigger .cdb w. as close to 4GB as possible Done
-2) Shift to a shared library for cdb-0.75
+2) Shift to a shared library for cdb-0.75 Done
 3) Get rid of the single fixed Cdb struct instance and malloc it as
-required
+required Done
-3a) Remove debugging output and recompile everything
+3a) Remove debugging output and recompile everything Done
 4) Build and test the real harness to process .cdx files using .cdb
 Try 50% more, e.g. approx. 1.5 segments
 >: python3 -c "print(1.5 * 5236974)"
 7855461.0
 so much:
 'cfind(probe)',
 vs
 '(X:=X+1) if cfind(probe)==1 else None',
 setup = 'global X'
+For this test, size of db doesn't matter:
+>: python3 -c 'import nndb' ~/results/CC-MAIN-2019-35/warc_lmhx/ks_0-9.60.cdb 20190825142846http://71.43.189.10/dermorph/ 10000000
+testing... /work/dc007/dc007/hst/results/CC-MAIN-2019-35/warc_lmhx/ks_0-9.60.cdb b'20190825142846http://71.43.189.10/dermorph/' x 10000000
+1 2488 10
+1564555978
+tested
+1.9914793614298105
+2.711402891203761 10000000
+>: python3 -c 'import nndb' ~/results/CC-MAIN-2019-35/warc_lmhx/ks_0.cdb 20190825142846http://71.43.189.10/dermorph/ 10000000
+testing... /work/dc007/dc007/hst/results/CC-MAIN-2019-35/warc_lmhx/ks_0.cdb b'20190825142846http://71.43.189.10/dermorph/' x 10000000
+1 2488 10
+1564555978
+tested
+2.015953341498971
+2.798953704535961 10000000
+Now using non-static Cdb, it's slower???  Cirrus load is low :-(.
+>: python3 -c 'import nndb' ~/results/CC-MAIN-2019-35/warc_lmhx/ks_0.cdb 20190825142846http://71.43.189.10/dermorph/ 10000000
+testing... /work/dc007/dc007/hst/results/CC-MAIN-2019-35/warc_lmhx/ks_0.cdb b'20190825142846http://71.43.189.10/dermorph/' x 10000000
+1 2488 10
+1564555978
+tested
+2.474294847997953
+3.2440936170023633 10000000
+At least it works:
+>: python3 -c 'import get2' cdb/rts-tmp/sv.cdb cdb/rts-tmp/12.cdb two discard/tcp
+two cdb/rts-tmp/sv.cdb missing
+discard/tcp cdb/rts-tmp/sv.cdb 9
+two cdb/rts-tmp/12.cdb Goodbye
+discard/tcp cdb/rts-tmp/12.cdb missing
 ================
 Try it with the existing _per segment_ index we have for 2019-35
 Assuming we have to key on segment / file and offset, as reconstructing the

Mercurial > hg > cc > work

comparison lurid3/notes.txt @ 67:24ca6ab32e47