annotate bin/count_warc.py @ 44:083229195d12

just count part length
author Henry S. Thompson <ht@inf.ed.ac.uk>
date Wed, 05 Jul 2023 17:51:44 +0100
rev   line source
44
083229195d12 just count part length
Henry S. Thompson <ht@inf.ed.ac.uk>
parents:
diff changeset
#!/usr/bin/env python3
import warc,sys

# Binary view of stdout, so bytes can be written without encoding
OUT=open(sys.stdout.fileno(),'wb')

# An optional leading -d turns on debugging and is then dropped from argv
if (debug:=(sys.argv[1]=='-d')):
    sys.argv.pop(1)

def countme(wtype,buf,part):
    # Write the length of buf as a decimal number, one per line
    if debug:
        breakpoint()
    OUT.write(b"%d\n"%len(buf))

#warc(showme,[b'response','warcinfo','request','metadata'],int(sys.argv[2]))
#warc(showme,[b'response'],whole=True)

warc.warc(sys.argv[1],countme,[b'response'],parts=int(sys.argv[2]),debug=debug)