comparison lurid3/notes.txt @ 67:24ca6ab32e47
malloc
author | Henry S. Thompson <ht@inf.ed.ac.uk> |
---|---|
date | Tue, 04 Feb 2025 12:50:52 +0000 |
parents | 0c814f07865a |
children | 3cd52d1849bb |
66:0c814f07865a | 67:24ca6ab32e47 |
---|---|
1189 tested | 1189 tested |
1190 2.055266048759222 | 1190 2.055266048759222 |
1191 Oops, that was ndb, and nndb doesn't work! | 1191 Oops, that was ndb, and nndb doesn't work! |
1192 | 1192 |
1193 Things to try next: | 1193 Things to try next: |
1194 1) Build a bigger .cdb w. as close to 4GB as possible | 1194 1) Build a bigger .cdb w. as close to 4GB as possible Done |
1195 2) Shift to a shared library for cdb-0.75 | 1195 2) Shift to a shared library for cdb-0.75 Done |
1196 3) Get rid of the single fixed Cdb struct instance and malloc it as | 1196 3) Get rid of the single fixed Cdb struct instance and malloc it as |
1197 required | 1197 required Done |
1198 3a) Remove debugging output and recompile everything | 1198 3a) Remove debugging output and recompile everything Done |
1199 4) Build and test the real harness to process .cdx files using .cdb | 1199 4) Build and test the real harness to process .cdx files using .cdb |
1200 | 1200 |
1201 Try 50% more, e.g. approx. 1.5 segments | 1201 Try 50% more, e.g. approx. 1.5 segments |
1202 >: python3 -c "print(1.5 * 5236974)" | 1202 >: python3 -c "print(1.5 * 5236974)" |
1203 7855461.0 | 1203 7855461.0 |
1285 so much: | 1285 so much: |
1286 'cfind(probe)', | 1286 'cfind(probe)', |
1287 vs | 1287 vs |
1288 '(X:=X+1) if cfind(probe)==1 else None', | 1288 '(X:=X+1) if cfind(probe)==1 else None', |
1289 setup = 'global X' | 1289 setup = 'global X' |
| | 1290 |
| | 1291 For this test, size of db doesn't matter: |
| | 1292 >: python3 -c 'import nndb' ~/results/CC-MAIN-2019-35/warc_lmhx/ks_0-9.60.cdb 20190825142846http://71.43.189.10/dermorph/ 10000000 |
| | 1293 testing... /work/dc007/dc007/hst/results/CC-MAIN-2019-35/warc_lmhx/ks_0-9.60.cdb b'20190825142846http://71.43.189.10/dermorph/' x 10000000 |
| | 1294 1 2488 10 |
| | 1295 1564555978 |
| | 1296 tested |
| | 1297 1.9914793614298105 |
| | 1298 2.711402891203761 10000000 |
| | 1299 |
| | 1300 >: python3 -c 'import nndb' ~/results/CC-MAIN-2019-35/warc_lmhx/ks_0.cdb 20190825142846http://71.43.189.10/dermorph/ 10000000 |
| | 1301 testing... /work/dc007/dc007/hst/results/CC-MAIN-2019-35/warc_lmhx/ks_0.cdb b'20190825142846http://71.43.189.10/dermorph/' x 10000000 |
| | 1302 1 2488 10 |
| | 1303 1564555978 |
| | 1304 tested |
| | 1305 2.015953341498971 |
| | 1306 2.798953704535961 10000000 |
| | 1307 |
| | 1308 Now using non-static Cdb, it's slower??? Cirrus load is low :-(. |
| | 1309 |
| | 1310 >: python3 -c 'import nndb' ~/results/CC-MAIN-2019-35/warc_lmhx/ks_0.cdb 20190825142846http://71.43.189.10/dermorph/ 10000000 |
| | 1311 testing... /work/dc007/dc007/hst/results/CC-MAIN-2019-35/warc_lmhx/ks_0.cdb b'20190825142846http://71.43.189.10/dermorph/' x 10000000 |
| | 1312 1 2488 10 |
| | 1313 1564555978 |
| | 1314 tested |
| | 1315 2.474294847997953 |
| | 1316 3.2440936170023633 10000000 |
| | 1317 |
| | 1318 At least it works: |
| | 1319 |
| | 1320 >: python3 -c 'import get2' cdb/rts-tmp/sv.cdb cdb/rts-tmp/12.cdb two discard/tcp |
| | 1321 two cdb/rts-tmp/sv.cdb missing |
| | 1322 discard/tcp cdb/rts-tmp/sv.cdb 9 |
| | 1323 two cdb/rts-tmp/12.cdb Goodbye |
| | 1324 discard/tcp cdb/rts-tmp/12.cdb missing |
| | 1325 |
1290 ================ | 1326 ================ |
1291 | 1327 |
1292 Try it with the existing _per segment_ index we have for 2019-35 | 1328 Try it with the existing _per segment_ index we have for 2019-35 |
1293 | 1329 |
1294 Assuming we have to key on segment / file and offset, as reconstructing the | 1330 Assuming we have to key on segment / file and offset, as reconstructing the |