comparison index.xml @ 4:268fe5fd117f
add slides
author | Henry Thompson <ht@markup.co.uk> |
---|---|
date | Wed, 22 May 2024 17:18:23 +0200 |
parents | d6f13dda3a11 |
children | cc5cef8ba548 |
3:7ec8f691a25a | 4:268fe5fd117f |
---|---|
3 <!DOCTYPE doc SYSTEM "../../../lib/xml/doc.dtd" > | 3 <!DOCTYPE doc SYSTEM "../../../lib/xml/doc.dtd" > |
4 <doc> | 4 <doc> |
5 <head> | 5 <head> |
6 <title>Augmentations to Common Crawl</title> | 6 <title>Augmentations to Common Crawl</title> |
7 <author>Henry S. Thompson</author> | 7 <author>Henry S. Thompson</author> |
8 <date>15 Apr 2024</date> | 8 <date>22 May 2024</date> |
9 </head> | 9 </head> |
10 <body> | 10 <body> |
11 <div> | 11 <div> |
12 <title>Introduction</title> | 12 <title>Introduction</title> |
13 <p>This site contains a preliminary publication of my augmented <link href="https://commoncrawl.org/blog/announcing-the-common-crawl-index">index files</link> | 13 <p>This site contains a preliminary publication of my augmented <link href="https://commoncrawl.org/blog/announcing-the-common-crawl-index">index files</link> |
20 the augmented index and its uses</item> | 20 the augmented index and its uses</item> |
21 <item>The <link href="CC-MAIN-2019-35/cdx/warc/cluster.idx">top-level index file</link></item> | 21 <item>The <link href="CC-MAIN-2019-35/cdx/warc/cluster.idx">top-level index file</link></item> |
22 <item><link href="CC-MAIN-2019-35/cdx/warc/idx/">The directory containing | 22 <item><link href="CC-MAIN-2019-35/cdx/warc/idx/">The directory containing |
23 the individual gzipped index files themselves</link>, with names of the form | 23 the individual gzipped index files themselves</link>, with names of the form |
24 <code>cdx-00nnn.gz</code>, for <code>nnn</code> in <code>000–299</code></item> | 24 <code>cdx-00nnn.gz</code>, for <code>nnn</code> in <code>000–299</code></item> |
| | 25 <item><link href="Thompson_WebSci24_slides.pdf">WebSci 24 conference slides</link></item> |
25 </list> | 26 </list> |
26 </div> | 27 </div> |
27 <div> | 28 <div> |
28 <title>Licence and citation</title> | 29 <title>Licence and citation</title> |
29 <p>The paper and data contained herein are Copyright © 2024 Henry S. Thompson <link href="http://creativecommons.org/licenses/by-sa/3.0/deed.en">CC-BY-SA</link></p> | 30 <p>The paper and data contained herein are Copyright © 2024 Henry S. Thompson <link href="http://creativecommons.org/licenses/by-sa/3.0/deed.en">CC-BY-SA</link></p> |