view data/00README @ 163:ef961d91eea5

previous approach to lang/field extraction
author Henry S. Thompson <ht@inf.ed.ac.uk>
date Mon, 18 Jul 2022 18:16:27 +0100
parents 128b18459f9e

*CC-MAIN-2019-35/*

About 100 sample WARC files, all the index files, and an index file for
some PDFs

*bin/*

Release versions of various tools

See 00README files in subdirectories for more information.