annotate bin/per_segment.py @ 26:5c5440e7854a

a bit more
author Henry S. Thompson <ht@inf.ed.ac.uk>
date Tue, 15 Nov 2022 19:37:28 +0000
parents e82a82ea3704
children
source (every line annotated as rev 23)
#!/usr/bin/python3
'''Refactor a per-cdx count table to be per-segment

 Input on STDIN, tab-separated
 Usage: per_segment segment-column
 Assumes column 0 is empty, count is in column 1
 Segment column is 0-origin
 Output is one file per segment, s0.tsv ... s99.tsv, in the current
  directory, each sorted by descending count
'''

import sys

c=int(sys.argv[1]) # index of the segment column

ss=[dict() for i in range(100)] # one {remaining columns: total count} table per segment

for l in sys.stdin:
  try:
    cc=l.split('\t')
    s=int(cc.pop(c))
    n=int(cc.pop(1))
    ll='\t'.join(cc[1:]) # note we ditch the initial empty column
    #print(s,n,cc,ll,sep='|')
    #exit(0)
    t=ss[s].get(ll,0)
    ss[s][ll]=t+n
  except Exception:
    # Malformed line: echo it and its split form, then give up
    sys.stdout.write(l)
    print(cc)
    sys.exit(1)

# note this won't work if c is last column: the line's trailing newline
#  is still attached to the last column, so it would be popped with the
#  segment and the rows written below would have no line terminator
for s in range(100):
  with open('s%s.tsv'%s,'w') as f:
    for (ll,n) in sorted(ss[s].items(),key=lambda p:p[1],reverse=True):
      f.write(str(n))
      f.write('\t')
      f.write(ll)
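
The aggregation step can be tried on a few in-memory rows without touching STDIN or writing any files. What follows is only a sketch under stated assumptions, not the script itself: the function name `per_segment`, the `n_segments` parameter, the use of `collections.Counter` and the sample rows (with made-up MIME-type and status columns) are all hypothetical. It also strips the trailing newline before splitting, which the script does not do, so here the segment column may be the last one.

```python
from collections import Counter

def per_segment(lines, seg_col, n_segments=100):
    """Sum counts per (segment, remaining columns); return one Counter per segment."""
    tables = [Counter() for _ in range(n_segments)]
    for line in lines:
        cols = line.rstrip('\n').split('\t')
        seg = int(cols.pop(seg_col))   # pop the segment first, as the script does
        count = int(cols.pop(1))       # count is still at index 1 when seg_col > 1
        key = '\t'.join(cols[1:])      # drop the empty column 0
        tables[seg][key] += count
    return tables

# Hypothetical rows: empty column 0, count, two made-up key columns, segment
sample = [
    "\t3\ttext/html\t200\t0\n",
    "\t2\ttext/html\t200\t0\n",
    "\t5\timage/png\t200\t1\n",
    "\t1\ttext/html\t404\t0\n",
]
tables = per_segment(sample, seg_col=4)

# Rows for segment 0, highest total first, in the same count-then-key layout
for key, total in tables[0].most_common():
    print(total, key, sep='\t')
```

The two `text/html`, `200` rows for segment 0 collapse into a single entry whose count is their sum, which is the per-segment merging the script performs across cdx files.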