annotate bin/uniq_merge.py @ 143:ddff993994be

too clever by half, keys won't work in parallel for e.g. media types
author Henry S. Thompson <ht@inf.ed.ac.uk>
date Wed, 20 Oct 2021 15:47:55 +0000
parents 464d2dfb99c9
children f5e2211b50bd
#!/usr/bin/env python3
# Merge counts by key from the output of "uniq -c" and sort in descending order.
# An alternative to sus when the scale is too big for the initial sort, or if
# uniq -c already does a lot of the work.
# Usage: ... | uniq -c | uniq_merge.py
import sys

s = {}  # maps each key to its summed count
for l in sys.stdin:
    # Each input line is "<count> <key>"; a key containing whitespace
    # makes this unpacking raise ValueError
    (i, d) = l.split()
    i = int(i)
    if d in s:
        s[d] += i
    else:
        s[d] = i
# Print keys in descending order of merged count
for (d, n) in sorted(s.items(), key=lambda j: j[1], reverse=True):
    print('%5d\t%s' % (n, d))
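
The merge step of this script can also be sketched with collections.Counter. This is a minimal illustration and not part of the repository: the function name merge_counts and the sample input are hypothetical, and splitting on only the first run of whitespace (so that keys may contain spaces) is an assumption that differs from the script's l.split().

```python
from collections import Counter


def merge_counts(lines):
    """Sum the counts of repeated keys from `uniq -c`-style lines.

    Hypothetical helper for illustration; returns (key, total) pairs
    sorted by total, highest first.
    """
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first run of whitespace only, so the key may
        # itself contain spaces (an assumption, see the note above)
        count, key = line.split(None, 1)
        totals[key] += int(count)
    # sorted() is stable, so keys with equal totals keep first-seen order
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Made-up sample input in the shape that "uniq -c" produces
sample = ["      3 text/html", "      1 image/png", "      2 text/html"]
for key, n in merge_counts(sample):
    print('%5d\t%s' % (n, key))
```

A Counter removes the explicit "if d in s" branch because missing keys start at zero; the output format string is the same one the script uses.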