comparison lurid3/notes.txt @ 67:24ca6ab32e47

malloc
author Henry S. Thompson <ht@inf.ed.ac.uk>
date Tue, 04 Feb 2025 12:50:52 +0000
parents 0c814f07865a
children 3cd52d1849bb
66:0c814f07865a 67:24ca6ab32e47
1189 tested 1189 tested
1190 2.055266048759222 1190 2.055266048759222
1191 Oops, that was ndb, and nndb doesn't work! 1191 Oops, that was ndb, and nndb doesn't work!
1192 1192
1193 Things to try next: 1193 Things to try next:
1194 1) Build a bigger .cdb w. as close to 4GB as possible 1194 1) Build a bigger .cdb w. as close to 4GB as possible Done
1195 2) Shift to a shared library for cdb-0.75 1195 2) Shift to a shared library for cdb-0.75 Done
1196 3) Get rid of the single fixed Cdb struct instance and malloc it as 1196 3) Get rid of the single fixed Cdb struct instance and malloc it as
1197 required 1197 required Done
1198 3a) Remove debugging output and recompile everything 1198 3a) Remove debugging output and recompile everything Done
1199 4) Build and test the real harness to process .cdx files using .cdb 1199 4) Build and test the real harness to process .cdx files using .cdb
1200 1200
1201 Try 50% more, e.g. approx. 1.5 segments 1201 Try 50% more, e.g. approx. 1.5 segments
1202 >: python3 -c "print(1.5 * 5236974)" 1202 >: python3 -c "print(1.5 * 5236974)"
1203 7855461.0 1203 7855461.0
1285 so much: 1285 so much:
1286 'cfind(probe)', 1286 'cfind(probe)',
1287 vs 1287 vs
1288 '(X:=X+1) if cfind(probe)==1 else None', 1288 '(X:=X+1) if cfind(probe)==1 else None',
1289 setup = 'global X' 1289 setup = 'global X'
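A minimal sketch of that comparison with timeit. The real cfind comes from the nndb extension and is not shown in these notes, so the cfind below is a hypothetical dict-backed stand-in, and passing a namespace via globals= is an assumption about how X, cfind and probe reach the timed statements.

```python
import timeit

# Hypothetical stand-in for the real cfind(): returns 1 when the key is present.
_FAKE_DB = {b"20190825142846http://71.43.189.10/dermorph/": b"value"}


def cfind(key):
    return 1 if key in _FAKE_DB else 0


def compare(number=100_000):
    """Time the bare call against the counting variant.

    Returns (plain_seconds, counting_seconds, hit_count).
    """
    # Namespace the timed statements run in; X lives here so 'global X' can reach it.
    ns = {
        "cfind": cfind,
        "probe": b"20190825142846http://71.43.189.10/dermorph/",
        "X": 0,
    }
    plain = timeit.timeit("cfind(probe)", globals=ns, number=number)
    counting = timeit.timeit(
        "(X:=X+1) if cfind(probe)==1 else None",
        setup="global X",
        globals=ns,
        number=number,
    )
    return plain, counting, ns["X"]


if __name__ == "__main__":
    plain, counting, hits = compare()
    print(plain)
    print(counting, hits)
```

The count is read back from the namespace afterwards, which doubles as a check that every probe really was found.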
1290
1291 For this test, size of db doesn't matter:
1292 >: python3 -c 'import nndb' ~/results/CC-MAIN-2019-35/warc_lmhx/ks_0-9.60.cdb 20190825142846http://71.43.189.10/dermorph/ 10000000
1293 testing... /work/dc007/dc007/hst/results/CC-MAIN-2019-35/warc_lmhx/ks_0-9.60.cdb b'20190825142846http://71.43.189.10/dermorph/' x 10000000
1294 1 2488 10
1295 1564555978
1296 tested
1297 1.9914793614298105
1298 2.711402891203761 10000000
1299
1300 >: python3 -c 'import nndb' ~/results/CC-MAIN-2019-35/warc_lmhx/ks_0.cdb 20190825142846http://71.43.189.10/dermorph/ 10000000
1301 testing... /work/dc007/dc007/hst/results/CC-MAIN-2019-35/warc_lmhx/ks_0.cdb b'20190825142846http://71.43.189.10/dermorph/' x 10000000
1302 1 2488 10
1303 1564555978
1304 tested
1305 2.015953341498971
1306 2.798953704535961 10000000
1307
1308 Now using non-static Cdb, it's slower??? Cirrus load is low :-(.
1309
1310 >: python3 -c 'import nndb' ~/results/CC-MAIN-2019-35/warc_lmhx/ks_0.cdb 20190825142846http://71.43.189.10/dermorph/ 10000000
1311 testing... /work/dc007/dc007/hst/results/CC-MAIN-2019-35/warc_lmhx/ks_0.cdb b'20190825142846http://71.43.189.10/dermorph/' x 10000000
1312 1 2488 10
1313 1564555978
1314 tested
1315 2.474294847997953
1316 3.2440936170023633 10000000
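To put a number on "slower", the two ks_0.cdb runs can be compared line for line. This is only arithmetic on the figures above, assuming the earlier ks_0.cdb run used the static Cdb struct and the later one the non-static version, and that both did 10,000,000 lookups as the command line says.

```python
# Figures copied from the two ks_0.cdb runs above.
N = 10_000_000
before = (2.015953341498971, 2.798953704535961)   # assumed: static Cdb struct
after = (2.474294847997953, 3.2440936170023633)   # assumed: non-static Cdb


def slowdown(old, new):
    """Relative slowdown of new versus old as a fraction (0.10 means 10% slower)."""
    return new / old - 1.0


def per_call_ns(total_seconds, calls=N):
    """Average cost of one call in nanoseconds."""
    return total_seconds / calls * 1e9


if __name__ == "__main__":
    for old, new in zip(before, after):
        print(f"{per_call_ns(old):.0f} ns -> {per_call_ns(new):.0f} ns per call, "
              f"{slowdown(old, new):.1%} slower")
```

On these figures the slowdown comes out at roughly 23% for the first timing and roughly 16% for the second, i.e. a few tens of nanoseconds per lookup.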
1317
1318 At least it works:
1319
1320 >: python3 -c 'import get2' cdb/rts-tmp/sv.cdb cdb/rts-tmp/12.cdb two discard/tcp
1321 two cdb/rts-tmp/sv.cdb missing
1322 discard/tcp cdb/rts-tmp/sv.cdb 9
1323 two cdb/rts-tmp/12.cdb Goodbye
1324 discard/tcp cdb/rts-tmp/12.cdb missing
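The shape of that check, two databases open at once and every key probed in each, can be sketched as below. The get2 source is not shown here, so this is an illustration only: the dicts are hypothetical in-memory stand-ins for the two .cdb files, holding just the entries visible in the output above.

```python
# Hypothetical stand-ins for the two .cdb files; contents taken from the output above.
DBS = {
    "cdb/rts-tmp/sv.cdb": {b"discard/tcp": b"9"},
    "cdb/rts-tmp/12.cdb": {b"two": b"Goodbye"},
}


def report(db_names, keys, dbs=DBS):
    """Return one 'key file value' line per (file, key) pair, 'missing' for absent keys."""
    lines = []
    for name in db_names:
        table = dbs[name]
        for key in keys:
            value = table.get(key.encode())
            shown = value.decode() if value is not None else "missing"
            lines.append(f"{key} {name} {shown}")
    return lines


if __name__ == "__main__":
    print("\n".join(report(["cdb/rts-tmp/sv.cdb", "cdb/rts-tmp/12.cdb"],
                           ["two", "discard/tcp"])))
```

Each key is found in exactly one file and reported missing in the other, which is what one would expect if the two handles no longer share state.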
1325
1290 ================ 1326 ================
1291 1327
1292 Try it with the existing _per segment_ index we have for 2019-35 1328 Try it with the existing _per segment_ index we have for 2019-35
1293 1329
1294 Assuming we have to key on segment / file and offset, as reconstructing the 1330 Assuming we have to key on segment / file and offset, as reconstructing the