annotate parse.py @ 0:ddd5f7539abc

reconstruct the original more or less
author Henry Thompson <ht@markup.co.uk>
date Fri, 24 Mar 2017 20:27:15 +0000
parents
children 20424d7e99e4
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
1 ''' Tokenise Excel formulae
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
2 Starting from regexps and concat rules in
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
3 https://sites.google.com/site/e90e50/random-topics/tool-for-parsing-formulas-in-excel
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
4 and
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
5 parser_formule_with_textbox_v01_2003.xla
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
6 linked to therein'''
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
7
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
8 import sys,re
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
9
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
10 cw = "[^=\-+*/();:,.$<>^]"
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
11 #cWW = "[=\-+*/();:,.$<>^]"
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
12
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
13 sListSeparator = ","
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
14 sRowSeparator = ";"
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
15
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
16 # If comma is decimal symbol, then \ is used in place of comma in Array.
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
17 # If semi-colon is decimal symbol, then \ is used in place of semi-colon
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
18 # in Array. (Semi-colon is row separator)
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
19 # See https://www.ablebits.com/office-addins-blog/2015/02/25/array-formulas-functions-excel/
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
20
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
21 pats=["\"[^\"]*\"|'[^']*'", ## "'[^']*'!",
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
22 "\{[^}]+}",
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
23 sListSeparator,
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
24 cw + "+(?:\." + cw + "+)*\\(",
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
25 "\\)",
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
26 "^=|\\(",
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
27 "."]
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
28
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
29 ## They perform the following tasks, in order:
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
30 ## 1. Represents a text (delimited by double quotes) or a sheet
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
31 ## name (delimited by single quotes)
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
32 ## 2. Represents a constant matrix
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
33 ## 3. Represents a list (function parameter) separator
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
34 ## 4. Represents a function name followed by an opening parenthesis
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
35 ## 5. Represents a closing parenthesis
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
36 ## 6. Represents the beginning of the formula or an opening
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
37 ## parenthesis (not part of a function)
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
38 ## 7. Each characters not matched by the previous patterns
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
39
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
40 tokPat=re.compile("("+(")|(".join(pats))+")",re.IGNORECASE)
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
41
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
42 f="""=IF(I$1>$C$2,VLOOKUP($C52,GP_input!$C$187:$CM$196,VLOOKUP($C$9,lists!$A$34:$B$39,2,FALSE)+BC$4,FALSE),SUMIF('BEX2011'!$C$32:$C$5000,$B$9&"sl"&$B52&$C52&$B53,'BEX2001'!Q$32))"""
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
43
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
44 l=tokPat.findall(f)
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
45
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
46 ## Visual basic code:
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
47 ## set M = RE.Execute(s)
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
48 ## s = ""
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
49 ## For Each SM In M
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
50 ## Set SB = SM.SubMatches
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
51 ## If Len(SB(0) & SB(6)) Then
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
52 ## t = SB(0) & SB(6) [HST doesn't understand why atoms
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
53 ## and strings are concatenated]
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
54 ## ElseIf Len(SB(1)) Then
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
55 ## t = Array_Const_Wrap(SB(1), sRowSeparator) & vbCr
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
56 ## ElseIf Len(SB(2) & SB(5)) Then
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
57 ## t = SB(2) & SB(5) & vbCr
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
58 ## ElseIf Len(SB(3)) Then
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
59 ## t = vbCr & SB(3) & vbCr
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
60 ## ElseIf Len(SB(4)) Then
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
61 ## t = vbCr & SB(4)
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
62 ## End If
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
63 ## s = s & t
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
64 ## Next
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
65
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
66 def mergeMatches(l):
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
67 res=""
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
68 cur=""
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
69 for txt,cm,sep,ofun,close,opn,misc in l:
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
70 if txt is not '':
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
71 res+=txt
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
72 continue
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
73 if misc is not '':
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
74 res+=misc
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
75 continue
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
76 if cm is not '':
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
77 cur=cm
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
78 elif sep is not '':
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
79 cur=sep
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
80 elif ofun is not '':
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
81 cur=ofun
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
82 elif close is not '':
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
83 cur=close
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
84 elif opn is not '':
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
85 cur=opn
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
86 if res is not '':
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
87 yield res
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
88 res=''
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
89 yield cur
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
90
ddd5f7539abc reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff changeset
91 toks=list(mergeMatches(l))