# HG changeset patch
# User Henry S. Thompson
# Date 1724164067 -3600
# Node ID 4167d8f33325a5e5b88ecf250c8be7704a8293b9
# Parent  ddedac65afa21a4f3dc5ca16d5ca6c227d68947a
start lab notes for LURID3

diff -r ddedac65afa2 -r 4167d8f33325 lurid3/notes.txt
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/lurid3/notes.txt	Tue Aug 20 15:27:47 2024 +0100
@@ -0,0 +1,10 @@
+See old_notes.txt for all older notes on Common Crawl dataprocessing,
+starting from Azure via Turing and then LURID and LURID2.
+
+Installed /beegfs/common_crawl/CC-MAIN-2024-33/cdx
+  >: cd results/CC-MAIN-2024-33/cdx/
+  >: cut -f 2 counts.tsv | btot
+  2,793,986,828
+
+State of play wrt data -- see status.xlsx
+
diff -r ddedac65afa2 -r 4167d8f33325 lurid3/status.xlsx
Binary file lurid3/status.xlsx has changed