annotate master/src/wecu/sac_schemes.py @ 63:d46c8b12fc04

support multiple approaches to key combination, use local files to collect results
author Henry S. Thompson <ht@markup.co.uk>
date Wed, 03 Jun 2020 16:40:34 +0000
parents 892e1c0240e1
children b04870ab3035
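Lines in this file come from three changesets, all by Henry S. Thompson <ht@markup.co.uk>:
 61 cfaf5223b071  trying to get my own mapper working
 62 892e1c0240e1  added more robust (I hope) error handling,  (parent 61)
 63 d46c8b12fc04  support multiple approaches to key combination, use local files to collect results  (parent 62)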
```python
#!/usr/bin/python3
'''Assumes export PYTHONIOENCODING=utf-8 has been done if necessary

Usage: uz ...wat.gz | sac_schemes.py [-d] [altStorageScheme]

-d, if present, prints the result structure with pprint instead of as
tab-separated lines.

altStorageScheme, if present, selects an alternative approach to storing
the (region, path, key, scheme) counts:
 [absent]: nested dictionaries, region -> path -> key -> scheme -> count
 1: one dictionary indexed by a 4-tuple
 2 (or any other value): one dictionary indexed by the four keys joined
    with the NUL separator SEP'''

import sys, json
import regex  # third-party 'regex' package from PyPI, not the stdlib re
from collections.abc import Iterable

# -d must come first; it is removed so that altStorageScheme is always sys.argv[1]
if len(sys.argv)>1 and sys.argv[1]=='-d':
    sys.argv.pop(1)
    dictRes=True
else:
    dictRes=False
```
```python
# Location of the HTTP response metadata within each WAT JSON record
META_PATH=['Envelope', 'Payload-Metadata', 'HTTP-Response-Metadata']

# Regions to scan, relative to META_PATH: HTTP headers, HTML <head>, page links
PATHS={'hdr':['Headers'],
       'head':['HTML-Metadata','Head'],
       'body':['HTML-Metadata','Links']}

# A URI scheme at the start of a value, optionally preceded by '<'; for URNs
# the namespace identifier is kept as well, e.g. 'urn:isbn'
SCHEME=regex.compile('(<?[a-zA-Z][a-zA-Z0-9+.-]*):')
URN=regex.compile('(<?urn:[a-z][a-z0-9+.-]*):',regex.I)

EMPTY=''
```
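Running the two patterns on sample strings shows what pp() and its variants will count for a given value. This sketch uses the stdlib re module instead of regex, assuming these simple patterns behave the same in both. scheme_of is a hypothetical helper and is not part of the script.

```python
import re

# Same patterns as SCHEME and URN above, compiled with the stdlib re module
# (assumption: these simple patterns behave identically under re and regex)
SCHEME_RE = re.compile('(<?[a-zA-Z][a-zA-Z0-9+.-]*):')
URN_RE = re.compile('(<?urn:[a-z][a-z0-9+.-]*):', re.I)

def scheme_of(value):
    """Return the scheme label pp() would count for value, or None."""
    m = SCHEME_RE.match(value)
    if m is None:
        return None
    n = URN_RE.match(value)   # prefer the longer 'urn:nid' form when present
    return (n or m).group(1)

for v in ['https://example.org/', '<http://example.org/>',
          'urn:isbn:0451450523', 'text/html; charset=utf-8', 'localhost:8080']:
    print(repr(v), '->', scheme_of(v))
```

SCHEME follows the RFC 3986 scheme syntax (a letter, then letters, digits, '+', '-' or '.'). As a result, any leading token followed by a colon is counted, including host:port values such as localhost:8080.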
```python
def walk(o,f,r,path=None):
    '''Call f(leaf,key,path,r) for every key+leaf of JSON object o in region r.
    path holds the enclosing keys as nested (parent,key) pairs, e.g. ((None,'a'),'b')'''
    if isinstance(o,dict):
        for k,v in o.items():
            if isinstance(v,dict):
                walk(v,f,r,(path,k))
            elif isinstance(v,Iterable) and not isinstance(v,str):
                # A list: walk the dicts in it; a list with no dicts is a leaf
                walked=False
                for i in v:
                    if isinstance(i,dict):
                        if (not walked) and (i is not v[0]):
                            # first dict comes after non-dict items
                            print('oops',path,k,i,file=sys.stderr)
                        walked=True
                        walk(i,f,r,(path,k))
                    elif walked:
                        # non-dict item comes after a dict
                        print('oops2',path,k,i,file=sys.stderr)
                if not walked:
                    f(v,k,path,r)
            else:
                f(v,k,path,r)
    elif isinstance(o,Iterable) and not isinstance(o,str):
        # strings are excluded: a one-character string iterates to itself
        for i in o:
            walk(i,f,r,path)
```
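The path argument that walk() passes to f is easiest to understand on a small input. The sketch below repeats a simplified walk(), with the oops warnings dropped, so that it runs on its own. It feeds the function a hypothetical object loosely shaped like a WAT Head entry. record is a hypothetical callback.

```python
from collections.abc import Iterable

def walk(o, f, r, path=None):
    """Simplified copy of walk() above (warnings omitted) so this runs alone."""
    if isinstance(o, dict):
        for k, v in o.items():
            if isinstance(v, dict):
                walk(v, f, r, (path, k))
            elif isinstance(v, Iterable) and not isinstance(v, str):
                dicts = [i for i in v if isinstance(i, dict)]
                if dicts:
                    for i in dicts:
                        walk(i, f, r, (path, k))
                else:
                    f(v, k, path, r)  # a list of plain values is one leaf
            else:
                f(v, k, path, r)
    elif isinstance(o, Iterable) and not isinstance(o, str):
        for i in o:
            walk(i, f, r, path)

calls = []

def record(v, k, p, r):
    calls.append((r, p, k, v))

# Hypothetical fragment, loosely shaped like a WAT 'HTML-Metadata' -> 'Head' object
head = {'Title': 'Example',
        'Link': [{'path': 'LINK@/href', 'url': 'https://example.org/a'}]}
walk(head, record, 'head')
for c in calls:
    print(c)
```

Leaves directly under the region get path None, and leaves one level down get (None, key). These are the two shapes that the pp* functions' assert p[0] is None accepts.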
```python
def pp(v,k,p,r):
    '''Count the scheme of string leaf v in nested dictionaries res[r][p][k]'''
    if isinstance(v,str):
        m=SCHEME.match(v)
        if m is not None:
            n=URN.match(v)
            if n is not None:
                m=n  # prefer 'urn:nid' over bare 'urn'
            s=m.group(1)
            # Open-coded rather than using qq(p), so this assumes p is
            # either None or (None,key), i.e. paths of length 1 or 2
            if p is not None:
                assert p[0] is None
                p=p[1]
            d=res[r].setdefault(p,dict())
            d=d.setdefault(k,dict())
            d[s]=d.get(s,0)+1
```
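The three setdefault steps at the end of pp() can also be written with collections.defaultdict. This is an alternative technique, not what the script does. nested_counter and count_nested are hypothetical names, and the counts are made up.

```python
from collections import defaultdict

def nested_counter():
    # path -> key -> scheme -> count, each level created on first access
    return defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

res = {r: nested_counter() for r in ('hdr', 'head', 'body')}

def count_nested(r, p, k, s):
    """Same update as the last three lines of pp(), written with defaultdict."""
    res[r][p][k][s] += 1

count_nested('hdr', None, 'Link', 'https')
count_nested('hdr', None, 'Link', 'https')
count_nested('head', 'Link', 'url', 'http')
print(res['hdr'][None]['Link']['https'])   # 2
```

There are two differences to keep in mind. With -d, pprint would show defaultdict(...) wrappers rather than plain dicts. Also, merely reading a missing entry creates it.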
```python
def pp_tuple(v,k,p,r):
    '''Count the scheme of string leaf v in one dict keyed by (r,p,k,s)'''
    if isinstance(v,str):
        m=SCHEME.match(v)
        if m is not None:
            n=URN.match(v)
            if n is not None:
                m=n  # prefer 'urn:nid' over bare 'urn'
            s=m.group(1)
            # Open-coded rather than using qq(p), so this assumes p is
            # either None or (None,key), i.e. paths of length 1 or 2
            if p is not None:
                assert p[0] is None
                p=p[1]
            k=(r,p,k,s)
            res[k]=res.get(k,0)+1
```
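The flat 4-tuple scheme of pp_tuple() maps naturally onto collections.Counter. Because Counter is a dict subclass, dump_tuple() could iterate it unchanged. count_tuple is hypothetical, and the counts are made up.

```python
from collections import Counter

res = Counter()

def count_tuple(r, p, k, s):
    """Equivalent of the last two lines of pp_tuple(): one flat counter keyed by 4-tuples."""
    res[(r, p, k, s)] += 1

for key in [('hdr', None, 'Link', 'https'),
            ('hdr', None, 'Link', 'https'),
            ('body', None, 'url', 'http')]:
    count_tuple(*key)

print(res.most_common(1))   # [(('hdr', None, 'Link', 'https'), 2)]
```

Counters from separate runs can also be combined with +, which adds the counts for matching keys.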
```python
SEP='\x00'  # joins the four keys in pp_concat; must not occur inside them
DOT='.'     # joins region and path in the output

def pp_concat(v,k,p,r):
    '''Count the scheme of string leaf v in one dict keyed by a SEP-joined string'''
    if isinstance(v,str):
        m=SCHEME.match(v)
        if m is not None:
            n=URN.match(v)
            if n is not None:
                m=n  # prefer 'urn:nid' over bare 'urn'
            s=m.group(1)
            # Open-coded rather than using qq(p), so this assumes p is
            # either None or (None,key), i.e. paths of length 1 or 2
            if p is None:
                p=EMPTY  # str.join needs strings, so None becomes ''
            else:
                assert p[0] is None
                p=p[1]
            k=SEP.join((r,p,k,s))
            res[k]=res.get(k,0)+1
```
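pp_concat() joins its four keys with NUL (SEP) rather than with a printable character such as '.'. A dot would be ambiguous, because JSON keys and schemes themselves (RFC 3986 allows '.') may contain one. The sample keys below are hypothetical.

```python
SEP = '\x00'

# Two different (region, path, key, scheme) entries; 'a.b' is a plausible key
t1 = ('head', 'a.b', 'c', 'http')
t2 = ('head', 'a', 'b.c', 'http')

dot1, dot2 = '.'.join(t1), '.'.join(t2)
print(dot1 == dot2)                     # True: '.' joining collides

nul1, nul2 = SEP.join(t1), SEP.join(t2)
print(nul1 == nul2)                     # False
print(tuple(nul1.split(SEP)) == t1)     # True: splits back, as dump_concat relies on
```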
```python
def dump(res):
    '''Print nested-dictionary results, one line per count:
    region[.path] TAB key TAB scheme TAB count'''
    for r,rv in res.items():
        for p,pv in rv.items():
            for k,v in pv.items():
                for s,c in v.items():
                    print(r,end=EMPTY)
                    if p is None:
                        print(EMPTY,end='\t')
                    else:
                        print(DOT,p,sep=EMPTY,end='\t')
                    print(k,end='\t')
                    print(s,c,sep='\t')
```
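Each dump variant prints one tab-separated line per count, with the path (if any) appended to the region after a dot. format_row is a hypothetical helper that builds the same line as a string, which makes the format easy to check.

```python
def format_row(r, p, k, s, c):
    """Build the line dump()/dump_tuple() print for one count (p=None means no path)."""
    region = r if p is None else r + '.' + p
    return '\t'.join((region, k, s, str(c)))

print(format_row('hdr', None, 'Link', 'https', 3))
print(format_row('head', 'Link', 'url', 'http', 5))
```

dump_concat() produces identical lines, because it stores a missing path as '' where the other schemes use None.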
```python
def dump_tuple(res):
    '''Print 4-tuple-keyed results in the same format as dump'''
    for (r,p,k,s),c in res.items():
        print(r,end=EMPTY)
        # p is None or a single key (see pp_tuple), so qq(p) is not needed
        if p is None:
            print(EMPTY,end='\t')
        else:
            print(DOT,p,sep=EMPTY,end='\t')
        print(k,end='\t')
        print(s,c,sep='\t')
```
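dump_tuple() prints entries in insertion order. If sorted output were wanted, sorted(res) would not be enough. p is None for some keys and a str for others, and Python 3 refuses to compare the two. One way around this is a key function that maps None to ''. The counts here are hypothetical.

```python
res = {('hdr', None, 'Link', 'https'): 2,
       ('head', 'Link', 'url', 'http'): 5,
       ('head', None, 'Title', 'http'): 1}

# Plain tuple comparison fails once it reaches None versus a str path
try:
    ('head', None, 'Title', 'http') < ('head', 'Link', 'url', 'http')
except TypeError as e:
    print('cannot compare:', e)

def sort_key(t):
    r, p, k, s = t
    return (r, '' if p is None else p, k, s)   # treat a missing path as ''

for t in sorted(res, key=sort_key):
    print(t, res[t])
```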
```python
def dump_concat(res):
    '''Print SEP-joined-string-keyed results in the same format as dump'''
    for ks,c in res.items():
        (r,p,k,s)=ks.split(SEP)
        print(r,end=EMPTY)
        # p is '' (no path) or a single key (see pp_concat)
        if p==EMPTY:
            print(EMPTY,end='\t')
        else:
            print(DOT,p,sep=EMPTY,end='\t')
        print(k,end='\t')
        print(s,c,sep='\t')
```
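All three schemes print the same four-column format. Outputs from separate runs, for instance over different WAT files collected in local files, can therefore be combined afterwards by summing the last column. merge_dumps is a hypothetical helper, not part of this repository, and the sample lines are invented.

```python
from collections import Counter

def merge_dumps(*streams):
    """Sum counts across several dump outputs (each an iterable of TSV lines)."""
    total = Counter()
    for stream in streams:
        for line in stream:
            line = line.rstrip('\n')
            if not line:
                continue
            # region[.path], key, scheme, count
            region, key, scheme, count = line.split('\t')
            total[(region, key, scheme)] += int(count)
    return total

run1 = ['hdr\tLink\thttps\t3\n', 'head.Link\turl\thttp\t5\n']
run2 = ['hdr\tLink\thttps\t4\n']
for (region, key, scheme), c in sorted(merge_dumps(run1, run2).items()):
    print(region, key, scheme, c, sep='\t')
```

With real files, you could pass open file objects instead, e.g. merge_dumps(*(open(name) for name in names)), where names is a hypothetical list of file names.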
```python
# Choose the storage scheme: no argument -> nested dicts (pp/dump as defined
# above); '1' -> 4-tuple keys; any other single argument -> SEP-joined keys
if len(sys.argv)==2:
    res=dict()
    if sys.argv[1]=='1':
        pp=pp_tuple
        dump=dump_tuple
    else:
        pp=pp_concat
        dump=dump_concat
else:
    res=dict((r,dict()) for r in PATHS.keys())
```
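The selection logic above is easy to misread. Any single argument other than '1' selects the concatenated-string scheme, and two or more arguments silently fall back to nested dictionaries. choose_scheme is a hypothetical function that mirrors that logic on an argument list after any -d has been removed.

```python
def choose_scheme(args):
    """Mirror the selection above; args is sys.argv[1:] after '-d' was popped."""
    if len(args) == 1:
        return 'tuple' if args[0] == '1' else 'concat'
    return 'nested'

for args in ([], ['1'], ['2'], ['x'], ['1', '2']):
    print(args, '->', choose_scheme(args))
```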
```python
def main():
    global n # count of response records seen, global for debugging
    n=0
    for l in sys.stdin:
        # Cheap textual test before parsing: only JSON lines of response records
        if l[0]=='{' and '"WARC-Type":"response"' in l:
            j=json.loads(l)
            n+=1
            try:
                for s in META_PATH:
                    j=j[s]
            except KeyError:
                # record has no HTTP response metadata: skip it
                continue
            for k,v in PATHS.items():
                p=j
                try:
                    for s in v:
                        p=p[s]
                except KeyError:
                    # this region is absent from this record
                    continue
                # pp is looked up at call time, so the scheme chosen above applies
                walk(p,pp,k)

    print(n,file=sys.stderr)

    if dictRes:
        print('res=',end=EMPTY)
        from pprint import pprint
        pprint(res)
    else:
        dump(res)
```
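main() filters lines textually before parsing, then navigates META_PATH and each entry of PATHS. The sketch below does the same navigation on a made-up record, using a hypothetical dig helper that returns None instead of raising. It assumes compact JSON with no space after the colon, as the script's own '"WARC-Type":"response"' test does. For that reason the sample line is built with separators=(',', ':').

```python
import json

META_PATH = ['Envelope', 'Payload-Metadata', 'HTTP-Response-Metadata']
PATHS = {'hdr': ['Headers'],
         'head': ['HTML-Metadata', 'Head'],
         'body': ['HTML-Metadata', 'Links']}

def dig(obj, keys):
    """Follow keys into nested dicts; return None if any step is missing."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj

def regions(line):
    """Yield (region, subtree) pairs for one WAT JSON line, as main() selects them."""
    if not (line.startswith('{') and '"WARC-Type":"response"' in line):
        return
    meta = dig(json.loads(line), META_PATH)
    if meta is None:
        return
    for region, path in PATHS.items():
        sub = dig(meta, path)
        if sub is not None:
            yield region, sub

# Made-up record with headers and links but no <head> metadata
record = {'Envelope': {
    'WARC-Header-Metadata': {'WARC-Type': 'response'},
    'Payload-Metadata': {'HTTP-Response-Metadata': {
        'Headers': {'Link': '<https://example.org/>; rel=preconnect'},
        'HTML-Metadata': {'Links': [{'path': 'A@/href',
                                     'url': 'http://example.org/'}]}}}}}
line = json.dumps(record, separators=(',', ':'))
print(sorted(r for r, _ in regions(line)))   # ['body', 'hdr']
```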
```python
def qq(p):
    '''Print path p (nested (parent,key) pairs) as a dotted string plus TAB.
    Not currently called; the pp*/dump* functions open-code this instead.'''
    if p is None:
        sys.stdout.write('\t')
    else:
        qq1(p[0])
        print(p[1],end='\t')

def qq1(p):
    '''Print the ancestors of a path, each followed by a dot'''
    if p is None:
        return
    else:
        qq1(p[0])
        print(p[1],end='.')

if __name__=="__main__":
    main()
```
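qq() and qq1() print a nested-pair path recursively, and are now only mentioned in comments. A string-returning, iterative counterpart handles paths of any depth. path_to_str is a hypothetical name.

```python
def path_to_str(p):
    """Return the dotted form of a nested-pair path, e.g. ((None,'a'),'b') -> 'a.b'.

    A string-returning counterpart of qq()/qq1(), which print instead.
    """
    parts = []
    while p is not None:
        p, key = p          # each level is a (parent, key) pair
        parts.append(key)
    return '.'.join(reversed(parts))

print(repr(path_to_str(None)))           # ''
print(path_to_str((None, 'Link')))       # Link
print(path_to_str(((None, 'a'), 'b')))   # a.b
```

If deeper paths ever occur, something like this could replace the open-coded p=p[1] step in the pp* functions, at the cost of a loop per leaf.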