Mercurial > hg > ooxml
annotate parse.py @ 0:ddd5f7539abc
reconstruct the original more or less
author | Henry Thompson <ht@markup.co.uk> |
---|---|
date | Fri, 24 Mar 2017 20:27:15 +0000 |
parents | |
children | 20424d7e99e4 |
rev | line source |
---|---|
0
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
1 ''' Tokenise Excel formulae |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
2 Starting from regexps and concat rules in |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
3 https://sites.google.com/site/e90e50/random-topics/tool-for-parsing-formulas-in-excel |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
4 and |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
5 parser_formule_with_textbox_v01_2003.xla |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
6 linked to therein''' |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
7 |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
8 import sys,re |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
9 |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
10 cw = "[^=\-+*/();:,.$<>^]" |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
11 #cWW = "[=\-+*/();:,.$<>^]" |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
12 |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
13 sListSeparator = "," |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
14 sRowSeparator = ";" |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
15 |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
16 # If comma is decimal symbol, then \ is used in place of comma in Array. |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
17 # If semi-colon is decimal symbol, then \ is used in place of semi-colon |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
18 # in Array. (Semi-colon is row separator) |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
19 # See https://www.ablebits.com/office-addins-blog/2015/02/25/array-formulas-functions-excel/ |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
20 |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
21 pats=["\"[^\"]*\"|'[^']*'", ## "'[^']*'!", |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
22 "\{[^}]+}", |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
23 sListSeparator, |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
24 cw + "+(?:\." + cw + "+)*\\(", |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
25 "\\)", |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
26 "^=|\\(", |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
27 "."] |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
28 |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
29 ## They perform the following tasks, in order: |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
30 ## 1. Represents a text (delimited by double quotes) or a sheet |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
31 ## name (delimited by single quotes) |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
32 ## 2. Represents a constant matrix |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
33 ## 3. Represents a list (function parameter) separator |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
34 ## 4. Represents a function name followed by an opening parenthesis |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
35 ## 5. Represents a closing parenthesis |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
36 ## 6. Represents the beginning of the formula or an opening |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
37 ## parenthesis (not part of a function) |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
38 ## 7. Each characters not matched by the previous patterns |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
39 |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
40 tokPat=re.compile("("+(")|(".join(pats))+")",re.IGNORECASE) |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
41 |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
42 f="""=IF(I$1>$C$2,VLOOKUP($C52,GP_input!$C$187:$CM$196,VLOOKUP($C$9,lists!$A$34:$B$39,2,FALSE)+BC$4,FALSE),SUMIF('BEX2011'!$C$32:$C$5000,$B$9&"sl"&$B52&$C52&$B53,'BEX2001'!Q$32))""" |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
43 |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
44 l=tokPat.findall(f) |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
45 |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
46 ## Visual basic code: |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
47 ## set M = RE.Execute(s) |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
48 ## s = "" |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
49 ## For Each SM In M |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
50 ## Set SB = SM.SubMatches |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
51 ## If Len(SB(0) & SB(6)) Then |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
52 ## t = SB(0) & SB(6) [HST doesn't understand why atoms |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
53 ## and strings are concatenated] |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
54 ## ElseIf Len(SB(1)) Then |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
55 ## t = Array_Const_Wrap(SB(1), sRowSeparator) & vbCr |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
56 ## ElseIf Len(SB(2) & SB(5)) Then |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
57 ## t = SB(2) & SB(5) & vbCr |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
58 ## ElseIf Len(SB(3)) Then |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
59 ## t = vbCr & SB(3) & vbCr |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
60 ## ElseIf Len(SB(4)) Then |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
61 ## t = vbCr & SB(4) |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
62 ## End If |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
63 ## s = s & t |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
64 ## Next |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
65 |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
66 def mergeMatches(l): |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
67 res="" |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
68 cur="" |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
69 for txt,cm,sep,ofun,close,opn,misc in l: |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
70 if txt is not '': |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
71 res+=txt |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
72 continue |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
73 if misc is not '': |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
74 res+=misc |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
75 continue |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
76 if cm is not '': |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
77 cur=cm |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
78 elif sep is not '': |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
79 cur=sep |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
80 elif ofun is not '': |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
81 cur=ofun |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
82 elif close is not '': |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
83 cur=close |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
84 elif opn is not '': |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
85 cur=opn |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
86 if res is not '': |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
87 yield res |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
88 res='' |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
89 yield cur |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
90 |
ddd5f7539abc
reconstruct the original more or less
Henry Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
91 toks=list(mergeMatches(l)) |