annotate lurid3/notes.txt @ 40:4167d8f33325

start lab notes for LURID3
author Henry S. Thompson <ht@inf.ed.ac.uk>
date Tue, 20 Aug 2024 15:27:47 +0100
parents
children 64b7fb44e8dc
rev   line source
40
1 See old_notes.txt for all older notes on Common Crawl data processing,
2 starting from Azure via Turing and then LURID and LURID2.
3
4 Installed /beegfs/common_crawl/CC-MAIN-2024-33/cdx
5 >: cd results/CC-MAIN-2024-33/cdx/
6 >: cut -f 2 counts.tsv | btot
7 2,793,986,828
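
Note on the total above: btot is a local helper that is not defined in these notes. Judging only from the output, it appears to sum a column of numbers and print the total with thousands separators. A minimal stand-in under that assumption (the name btot_sketch is hypothetical, and this is not the actual btot):

```shell
# Hypothetical stand-in for btot (assumed behaviour: total the numbers
# on stdin, one per line, and print the sum with comma separators).
btot_sketch() {
  # awk accumulates in double precision, which is exact for integer
  # totals below 2^53; %.0f avoids scientific notation on large sums.
  awk '{ s += $1 } END { printf "%.0f\n", s }' |
    # Insert a comma before each trailing group of three digits,
    # looping until no further substitution applies.
    sed -e ':a' -e 's/^\([0-9]\{1,\}\)\([0-9]\{3\}\)/\1,\2/' -e 'ta'
}
```

Usage would mirror the command above: cut -f 2 counts.tsv | btot_sketch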
8
9 State of play wrt data -- see status.xlsx
10