annotate notes.txt @ 52:9bb415e0adc9
try to fix error processin odd REUTER|IDN\!'.SPX' external ref
author | Henry S. Thompson <ht@markup.co.uk> |
---|---|
date | Tue, 16 May 2017 17:28:52 +0100 |
parents | 01a7c2ebd3d1 |
children | 191c95187e87 |
Tokenisation patterns, derived from parse.py, derived from
 https://sites.google.com/site/e90e50/random-topics/tool-for-parsing-formulas-in-excel
and
 parser_formule_with_textbox_v01_2003.xla
linked to therein

 1  ("[^"]*")   q
     A text (delimited by double quotes)
 2  (\{[^}]+})   m
     A constant matrix
 3  (,)   c
     A list (function parameter) separator
 4  ([^=\-+*/();:,.$<>^!]+(?:\.[^=\-+*/();:,.$<>^!]+)*\()   f
     A function name followed by an opening parenthesis
 5  ([)])   p
     A closing parenthesis
 6  (^=|\()   l
     The beginning of the formula or an opening
     parenthesis (not part of a function)
 7  ((?:(?:'[^']+')|(?:\[[0-9]+\][^!]*)|(?:[a-zA-Z_][a-zA-Z0-9._]*)!))   n
     A sheet name (either delimited by single quotes, or
     bracketed number plus optional string,
     or simple name (syntax is a _guess_))
 8  (\$?[A-Z]+\$?[0-9]+)   s or r
     A cell reference
 9  ([a-zA-Z_\\][a-zA-Z0-9._]*)   v
     A name (boolean constant or a variable -- anything else?)
10  (.)   x
     Single characters not matched by the previous patterns
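A minimal sketch of how the ten patterns above could drive a tokeniser.
Assumptions, none of them confirmed by these notes: the patterns are tried
in numbered order at each position and the first match wins; pattern 8 is
always tagged 's'; DOTALL is set so that pattern 10 can never fail to make
progress.  tokenise is a hypothetical name, not the function in parse.py.

```python
import re

# (pattern, type code) pairs, copied from the table above.
PATTERNS = [
    (r'("[^"]*")', "q"),
    (r"(\{[^}]+})", "m"),
    (r"(,)", "c"),
    (r"([^=\-+*/();:,.$<>^!]+(?:\.[^=\-+*/();:,.$<>^!]+)*\()", "f"),
    (r"([)])", "p"),
    (r"(^=|\()", "l"),
    (r"((?:(?:'[^']+')|(?:\[[0-9]+\][^!]*)|(?:[a-zA-Z_][a-zA-Z0-9._]*)!))", "n"),
    (r"(\$?[A-Z]+\$?[0-9]+)", "s"),   # assumption: always 's', never 'r'
    (r"([a-zA-Z_\\][a-zA-Z0-9._]*)", "v"),
    (r"(.)", "x"),
]
# DOTALL only affects the bare '.' in pattern 10.
COMPILED = [(re.compile(p, re.DOTALL), tag) for p, tag in PATTERNS]


def tokenise(formula):
    """Return a list of (type code, text) pairs for one formula string."""
    tokens, pos = [], 0
    while pos < len(formula):
        for regex, tag in COMPILED:
            # With an explicit pos, '^' still means the real start of the
            # string, which is what pattern 6 needs.
            m = regex.match(formula, pos)
            if m:
                tokens.append((tag, m.group(1)))
                pos = m.end()
                break
    return tokens


if __name__ == "__main__":
    print(tokenise('=SUM(A1,$B$2)+"x"'))
```

First-match-wins keeps the ordering of the table meaningful: pattern 4 has
to be tried before 8 and 9, or a function name would be split off as a
variable followed by a stray '('.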
----------
You can't depend on
  <f si="..." t="shared"/>
That is, it's _true_, but you can have a table with shared formulae
that doesn't use it.  Compare M17:T28 (see below, uses shared) and
C17:J28 (mostly no shared) in sample4

Looks like a sweep-and-copy-{right,down} results in the
_new_ cells covered showing as 'shared':

 [ ][1][1][1][1]...
 [2][2][2][2][2]...
 [2][2][2][2][2]...
 ...

Presumably that one was right-then-down; down-then-right would give a
slightly different pattern
--------
Thinking about a pipeline...
 1) convert all variable references into (verbose!) elts:
     <!ELEMENT R EMPTY>
     <!ATTLIST R
               ac CDATA #IMPLIED
               rc CDATA #IMPLIED
               ar CDATA #IMPLIED
               rr CDATA #IMPLIED>

    where e.g. ac is 'absolute column'
     'D6' --> <R rc='D' rr='6'/>
    and
     '$E5' --> <R ac='E' rr='5'/>
No, in fact -- absolute vs. 'variable' isn't relevant for our purposes.
What we probably _do_ want is to add to every reference a _relative_
version, i.e. +/-columnDelta, +/-rowDelta
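A small sketch of the relative version, assuming the delta is taken as
(referenced cell) minus (cell holding the formula) and that columns count
from A = 1.  The names col_number and relative are made up for
illustration.

```python
import re

# A1-style reference, with optional '$' markers that are ignored here.
REF = re.compile(r"\$?([A-Z]+)\$?([0-9]+)")


def col_number(letters):
    """Convert column letters to a 1-based number: A -> 1, Z -> 26, AA -> 27."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def relative(ref, home):
    """Return (columnDelta, rowDelta) of ref as seen from the cell at home."""
    rcol, rrow = REF.fullmatch(ref).groups()
    hcol, hrow = REF.fullmatch(home).groups()
    return col_number(rcol) - col_number(hcol), int(rrow) - int(hrow)


if __name__ == "__main__":
    print(relative("D6", "E7"))   # one column left, one row up
```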
--------
Identifying dates is . . . tedious.  They will be ints or floats (?),
with s="<int>", where the int is a 0-origin index into the list of
  <xf...numFmtId="<bin>".../>
children of <cellXfs> in styles.xml, and bin is a built-in date format
code, see 18.8.30 numFmt (Number Format) in ISO/IEC 29500-1:2016(E) ==
C071691e.pdf  DONE
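A sketch of the lookup just described, using Python's ElementTree rather
than the XSLT the pipeline actually uses.  It only recognises the
language-neutral built-in date/time ids (14-22 and 45-47); custom formats
(ids from 164 up) would need their formatCode inspected, which is not
attempted here.  date_style_indexes is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Built-in numFmtId values that denote dates or times.
BUILTIN_DATE_IDS = set(range(14, 23)) | {45, 46, 47}


def date_style_indexes(styles_xml):
    """Return the 0-origin cellXfs indexes whose numFmtId is a built-in date."""
    root = ET.fromstring(styles_xml)
    xfs = root.findall("m:cellXfs/m:xf", NS)
    return {
        i for i, xf in enumerate(xfs)
        if int(xf.get("numFmtId", "0")) in BUILTIN_DATE_IDS
    }
```

A cell <c s="N"> is then a date candidate when N is in the returned set.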
---------
Decided to distinguish between type (num, date, str, err, ...) and
class (cur(rency), others to come?).  If non-standard code, just record
that.
The current pipe has two main steps, followed by an optional
prettifying step:
  format.xsl (extracts type={bool,date,num,str,err}
                       class={cur,[nothing else yet]}
                       code={raw format code if not recognised})
  rect.xsl (fills in gaps, cuts down size, using only bdnse for
            <t>[ype] with attrs c[lass]={c,...} and [co]d[e]=...)
For now, just using first letters of type, class  DONE
----------
Hmm, looking at real data (kenneth_lay__19506), I see _lots_ of cells
with (numerical) formats, but no content.  Where do I throw those
away?  Can throw away empty _rows_ in rect.xsl, but for _cells_ have
to wait for ascii.xsl or html.xsl.  But only copy type in rect if
there was content before.  DONE
-----------
Using attributes to hold space-separated lists, as in
refs.xsl output, is risky!  Fixed, see below.
-----------
Not handling variables as references FIXED.  Not catching external
references to variables FIXED (as externals).  Not catching naked [n]! as external
references FIXED
Solo local vars are recursively dereferenced
The definition table is in workbook.xml definedNames/definedName[@name=$name]/.
Sheet name to filename mapping for locals is in workbook.xml
   sheets/sheet[@name=$sname]/@sheetId
These appear in definedName, single-quoted if (iff?) the sheet name has spaces
 (or other specials?)
??? Variables on l or r of ranges are just looked up: if they are complex
no recursion is done: the _semantics_ of this case are not clear to
me, need a real-life example...
Variables whose value is itself a range are not being handled FIXED
-----------
Switch to default namespace in order to reduce size and improve
readability, and to elements instead of attributes  DONE
-----------
Should put another step after refs.xsl to compute a map from
distinct-values of all targets to all the cells which use them
DONE.
Likewise ranges @@

That really does mean we should move to elts for
each ref or range, since at this point we want to compute vector
representation as well DONE, so we can identify projections

Slightly irritating that we'll have to serialise this as XML and then
re-build it later...
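The target-to-users map is just an inverted index.  A sketch in Python
(the real step is presumably XSLT), assuming the refs.xsl output has
already been read into a dict from each cell to the targets it
references; users_by_target is a made-up name.

```python
from collections import defaultdict


def users_by_target(refs):
    """Invert {cell: [targets]} into {target: {cells that use it}}."""
    index = defaultdict(set)
    for cell, targets in refs.items():
        for target in targets:
            index[target].add(cell)
    return dict(index)


if __name__ == "__main__":
    print(users_by_target({"C1": ["A1", "B1"], "C2": ["A1"]}))
```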
-----------
Overgenerating in kenneth_lay__19506: e.g. <e:ref c="E9" er="[1]!'.SPX' '.SPX'!"/>
from <f>[1]!'.SPX'</f>
Hmm.  This cell displays in Excel as REUTERS|IDN!.SPX
The indirections work as follows:
 in workbook.xml:
  <externalReferences>
   <externalReference r:id="rId3"/>
   <externalReference r:id="rId4"/>
  </externalReferences>
 in _rels/workbook.xml.rels
  <Relationship Id="rId3" Target="externalLinks/externalLink1.xml" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/externalLink"/>
 in externalLinks/externalLink1.xml
  <ddeLink ddeService="REUTER" ddeTopic="IDN"...
   <ddeItems>
    ...
    <ddeItem advise="1" name=".SPX">
     <values>
      <value>
       <val>1264.96</val>
      </value>
     </values>
    </ddeItem>
Whew!
FIXED
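The chain of indirections above, sketched with ElementTree.  Assumptions:
the n in [n]! is a 1-based index into the externalReference children
(consistent with [1] leading to rId3 in the example, but not checked
against the spec here), and <ddeLink> is a direct child of the root of the
externalLink part.  Both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
DOCREL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKGREL = "http://schemas.openxmlformats.org/package/2006/relationships"


def external_target(workbook_xml, rels_xml, n):
    """Map the n of '[n]!' to the Target path of its external link part."""
    wb = ET.fromstring(workbook_xml)
    refs = wb.findall(
        f"{{{MAIN}}}externalReferences/{{{MAIN}}}externalReference")
    rid = refs[n - 1].get(f"{{{DOCREL}}}id")   # assumption: n is 1-based
    for rel in ET.fromstring(rels_xml).iter(f"{{{PKGREL}}}Relationship"):
        if rel.get("Id") == rid:
            return rel.get("Target")
    return None


def dde_name(link_xml, item):
    """Build the SERVICE|TOPIC!item form from an externalLink part."""
    dde = ET.fromstring(link_xml).find(f"{{{MAIN}}}ddeLink")
    return f'{dde.get("ddeService")}|{dde.get("ddeTopic")}!{item}'
```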
----------
http://upcommons.upc.edu/bitstream/handle/2117/100584/KDIR_2016_47_CR.pdf
[downloaded]
uses appearance a lot.  That needs to be harvested from styles.xml
The kenneth_lay enron sample has _403_ numbered formats...
----------
Tried the largest sheet from the largest .xlsx I could find:
fuse1k/'benjamin_rogers__1002__NYISO Price Information version 2'.xlsx
-rw-r--r-- 1 ht None  6273325 Apr  3 16:22 '../benjamin_rogers__1002__NYISO Price Information version 2.xlsx'
-rw-r--r-- 1 ht None 23221149 Jan  1  1980 xl/worksheets/sheet3.xml

> lxcount xl/worksheets/sheet3.xml | sort -k2nr
*Total*  1230217
c         596032
v         595876
f          19201
row        18985
col          106

<dimension ref="A1:DY18985"/>

Blew java out of the water :-(
 java.lang.OutOfMemoryError: Java heap space

Need to try again with more memory, if I remember how...

The raw result is going to have 18985 x 102 == 1,936,470 (about 2 million)
cells == (assuming average cell size of 30 bytes and row overhead of 20) (*
18985 (+ 20 (* 102 30))) 58,473,800 bytes, which is big but tolerable...
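The same estimate as a function, so it can be redone for other sheets.
The 30 bytes per cell and 20 bytes per row are the guesses from the note
above, not measurements; estimated_bytes is a made-up name.

```python
def estimated_bytes(rows, cols, cell_bytes=30, row_overhead=20):
    """Rough size of the rectangular output: rows * (overhead + cols * cell)."""
    return rows * (row_overhead + cols * cell_bytes)


if __name__ == "__main__":
    print(18985 * 102)                   # number of cells
    print(estimated_bytes(18985, 102))   # estimated output size in bytes
```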
----------------
sample4 html reveals several problems:
  mistaken content based on class bug, e.g. B4 is 'a'  FIXED
  highlighted cells are being labelled as cur, e.g. B61 in
   output of format.xsl  FIXED
-----------
Need to rethink variable handling...
Is all we really need a normalised formula computation?:
 1) recursively replace variables;
 2) convert all simple refs to new CR string normal form:
      crnf ::= col row
      col ::= abs | rel
      row ::= abs | rel
      abs ::= '\xAA' xs:positiveInteger
      rel ::= '\xAE' ( ( '-' xs:positiveInteger ) | xs:nonNegativeInteger )
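A sketch of step 2 for a single A1-style ref, following the grammar above.
Assumptions not stated in the grammar: absolute columns are numbered from
A = 1, and a relative part is (referenced cell) minus (home cell).  The
function names are hypothetical.

```python
import re

ABS, REL = "\xaa", "\xae"   # the two marker characters from the grammar
REF = re.compile(r"(\$?)([A-Z]+)(\$?)([0-9]+)")


def col_number(letters):
    """Convert column letters to a 1-based number: A -> 1, AA -> 27."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def _part(is_abs, value, home):
    # abs keeps the coordinate itself, rel keeps the offset from home.
    return f"{ABS}{value}" if is_abs else f"{REL}{value - home}"


def crnf(ref, home):
    """Normalise ref (e.g. '$E5') as seen from the cell at home (e.g. 'C7')."""
    cabs, col, rabs, row = REF.fullmatch(ref).groups()
    _, hcol, _, hrow = REF.fullmatch(home).groups()
    return (_part(bool(cabs), col_number(col), col_number(hcol))
            + _part(bool(rabs), int(row), int(hrow)))
```

The point of the normal form is that a formula copied from C7 to D8 gives
the same string in both places: crnf("D6", "C7") and crnf("E7", "D8") are
equal, so comparable formulae can be found by plain string comparison.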
-----------
Would <c c=COL [fi= [si=]]> be sufficient?
fi an index into _all_ functions, si original index into explicitly
shared functions -- note that the same fi may appear with multiple
si, see discussion back at the top of this doc.
Brute force for now -- rect sees shared table, computes CRNF
Not good enough -- <f> in shared table can't be used as is, need to
rebuild ref names relative to each new home.  FIXED
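Rebuilding a ref name for a new home is the inverse of the CRNF mapping.
A sketch, assuming a CRNF string holds exactly one column part and one row
part, with relative parts stored as offsets from the home cell and columns
numbered from A = 1.  denormalise and col_letters are made-up names, and
this is not the code that produced the FIXED above.

```python
import re

ABS = "\xaa"   # absolute marker; "\xae" marks a relative part
PART = re.compile("([\xaa\xae])(-?[0-9]+)")


def col_letters(n):
    """Convert a 1-based column number to letters: 1 -> A, 27 -> AA."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def denormalise(nf, home_col, home_row):
    """Turn a CRNF string back into an A1-style ref for the given home cell."""
    (ckind, cval), (rkind, rval) = PART.findall(nf)
    if ckind == ABS:
        col = "$" + col_letters(int(cval))
    else:
        col = col_letters(home_col + int(cval))
    if rkind == ABS:
        row = "$" + str(int(rval))
    else:
        row = str(home_row + int(rval))
    return col + row
```

For example, the form "\xae1\xae-1" comes out as D6 when the home is C7
(column 3, row 7) and as E7 when the home is D8.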
-----------
Picking colours to label regions, e.g. with similar formulae:
http://stackoverflow.com/questions/470690/how-to-automatically-generate-n-distinct-colors
Start with just top-n, limited to 22 from Kelly
    #FFB300, # Vivid Yellow
    #803E75, # Strong Purple
    #FF6800, # Vivid Orange
    #A6BDD7, # Very Light Blue
    #C10020, # Vivid Red
    #CEA262, # Grayish Yellow
    #817066, # Medium Gray
    # The following don't work well for people with defective color vision
    #007D34, # Vivid Green
    #F6768E, # Strong Purplish Pink
    #00538A, # Strong Blue
    #FF7A5C, # Strong Yellowish Pink
    #53377A, # Strong Violet
    #FF8E00, # Vivid Orange Yellow
    #B32851, # Strong Purplish Red
    #F4C800, # Vivid Greenish Yellow
    #7F180D, # Strong Reddish Brown
    #93AA00, # Vivid Yellowish Green
    #593315, # Deep Yellowish Brown
    #F13A13, # Vivid Reddish Orange
    #232C16, # Dark Olive Green
------------
@@ string identity, to say nothing of actual value, is lost -- fix?
@@ row/column/both spans
------
enron/kenneth_lay__19506 contains this formula:

   <f>[1]!'.SPX'</f>

which crashes tokenise/rnf

Changes intended to fix this fixed a bug (?) which wasn't properly
merging e.g. +3 -- no examples of larger numbers available to check
with...