annotate lurid3/notes.txt @ 40:4167d8f33325
start lab notes for LURID3
author: Henry S. Thompson <ht@inf.ed.ac.uk>
date: Tue, 20 Aug 2024 15:27:47 +0100
children: 64b7fb44e8dc
See old_notes.txt for all older notes on Common Crawl data processing,
starting from Azure via Turing and then LURID and LURID2.

Installed /beegfs/common_crawl/CC-MAIN-2024-33/cdx
>: cd results/CC-MAIN-2024-33/cdx/
>: cut -f 2 counts.tsv | btot
2,793,986,828
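
For reference, a minimal Python sketch of what the cut/btot pipeline above appears to do. btot is not defined in these notes; the assumption here is that it sums the numbers on stdin and prints the total with thousands separators. The layout of counts.tsv (a count in the second tab-separated column) is also assumed, and the function names and sample data are hypothetical.

```python
def column_total(tsv_text: str, column: int = 1) -> int:
    """Sum the integers in one tab-separated column (0-based index).

    column=1 corresponds to `cut -f 2`. Lines too short to have
    that column are skipped.
    """
    total = 0
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        if len(fields) > column:
            total += int(fields[column])
    return total


def format_total(total: int) -> str:
    """Render a total with comma thousands separators (assumed btot style)."""
    return f"{total:,}"


# Hypothetical sample, not real counts.tsv content.
sample = "seg-a\t1200\nseg-b\t34\nseg-c\t1000000\n"
print(format_total(column_total(sample)))  # prints 1,001,234
```

To run this against the real file, read it first, e.g. column_total(open("counts.tsv").read()).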

State of play wrt data -- see status.xlsx