annotate notes.txt @ 52:9bb415e0adc9

try to fix error processing odd REUTER|IDN\!'.SPX' external ref
author Henry S. Thompson <ht@markup.co.uk>
date Tue, 16 May 2017 17:28:52 +0100
parents 01a7c2ebd3d1
children 191c95187e87
rev   line source
37
ac3cd8de7a10 towards big rework of tokenisation
Henry S. Thompson <ht@markup.co.uk>
parents: 36
diff changeset
1 Tokenisation patterns, derived from parse.py, derived from
2 https://sites.google.com/site/e90e50/random-topics/tool-for-parsing-formulas-in-excel
3 and
4 parser_formule_with_textbox_v01_2003.xla
5 linked to therein
6
7 1 ("[^"]*") q
8 A text (delimited by double quotes)
9 2 (\{[^}]+}) m
10 A constant matrix
11 3 (,) c
12 A list (function parameter) separator
13 4 ([^=\-+*/();:,.$<>^!]+(?:\.[^=\-+*/();:,.$<>^!]+)*\() f
14 A function name followed by an opening parenthesis
15 5 ([)]) p
16 A closing parenthesis
17 6 (^=|\() l
18 The beginning of the formula or an opening
19 parenthesis (not part of a function)
20 7 ((?:(?:'[^']+')|(?:\[[0-9]+\][^!]*)|(?:[a-zA-Z_][a-zA-Z0-9._]*)!)) n
21 A sheet name (either delimited by single quotes, or
22 bracketed number plus optional string,
23 or simple name (syntax is a _guess_))
24 8 (\$?[A-Z]+\$?[0-9]+) s or r
25 A cell reference
26 9 ([a-zA-Z_\\][a-zA-Z0-9._]*) v
39
4c6a341e75da big rework works on sample2, w/o refs processing
Henry S. Thompson <ht@markup.co.uk>
parents: 37
diff changeset
27 A name (boolean constant or a variable -- anything else?)
37
28 10 (.) x
29 Single characters not matched by the previous patterns
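A minimal sketch of how the ten patterns above could be tried in priority order, assuming Python's `re` module. The patterns are copied from the list (without their outer capturing parentheses); the names `PATTERNS`, `TOKEN` and `tokenise` are hypothetical, and pattern 8 is always tagged `s` here although the notes say "s or r".

```python
import re

# (code, pattern) pairs copied from the notes, in priority order.
PATTERNS = [
    ("q", r'"[^"]*"'),
    ("m", r"\{[^}]+}"),
    ("c", r","),
    ("f", r"[^=\-+*/();:,.$<>^!]+(?:\.[^=\-+*/();:,.$<>^!]+)*\("),
    ("p", r"[)]"),
    ("l", r"^=|\("),
    ("n", r"(?:(?:'[^']+')|(?:\[[0-9]+\][^!]*)|(?:[a-zA-Z_][a-zA-Z0-9._]*)!)"),
    ("s", r"\$?[A-Z]+\$?[0-9]+"),
    ("v", r"[a-zA-Z_\\][a-zA-Z0-9._]*"),
    ("x", r"."),
]

# One alternation of named groups; at each position the leftmost
# alternative that matches wins, which gives the priority ordering.
TOKEN = re.compile("|".join("(?P<%s>%s)" % p for p in PATTERNS))


def tokenise(formula):
    """Return a list of (code, text) pairs for one formula string."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(formula)]


if __name__ == "__main__":
    for code, text in tokenise('=SUM(A1:B2,"x")'):
        print(code, text)
```

Because `(.)` is the last alternative, every character other than a newline ends up in some token, so nothing in a single-line formula is silently dropped.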
30 ----------
3
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
31 You can't depend on
32 <f si="..." t="shared"/>
33 That is, it's _true_, but you can have a table with shared formulae
34 that doesn't use it. Compare M17:T28 (see below, uses shared) and
35 C17:J28 (mostly no shared) in sample4
36
37 Looks like a sweep-and-copy-{right,down} results in the
38 _new_ cells covered showing as 'shared':
39
40 [ ][1][1][1][1]...
41 [2][2][2][2][2]...
42 [2][2][2][2][2]...
43 ...
44
45 Presumably that one was right-then-down; down-then-right would give a
46 slightly different pattern
47 --------
48 Thinking about a pipeline...
49 1) convert all variable references into (verbose!) elts:
50 <!ELEMENT R EMPTY>
51 <!ATTLIST R
52 ac CDATA #IMPLIED
53 rc CDATA #IMPLIED
54 ar CDATA #IMPLIED
55 rr CDATA #IMPLIED>
56
57 where e.g. ac is 'absolute column'
58 'D6' --> <R rc='D' rr='6'/>
59 and
60 '$E5' --> <R ac='E' rr='5'/>
27
8309dcfce613 preparing for variable deref
Henry S. Thompson <ht@markup.co.uk>
parents: 24
diff changeset
61 No, in fact -- absolute vs. 'variable' isn't relevant for our purposes.
62 What we probably _do_ want is to add to every reference a _relative_
63 version, i.e. +/-columnDelta, +/-rowDelta
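A small sketch of the relative version of a reference, assuming plain A1-style references and ignoring the `$` markers, as the note suggests. The helper names `col_number` and `relative` are hypothetical.

```python
import re

# Optional '$' markers are accepted but ignored, per the note above.
REF = re.compile(r"\$?([A-Z]+)\$?([0-9]+)$")


def col_number(letters):
    """Convert column letters to a 1-based number: A -> 1, Z -> 26, AA -> 27."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def relative(ref, home):
    """Return (columnDelta, rowDelta) of ref as seen from the cell home."""
    ref_col, ref_row = REF.match(ref).groups()
    home_col, home_row = REF.match(home).groups()
    return (col_number(ref_col) - col_number(home_col),
            int(ref_row) - int(home_row))


if __name__ == "__main__":
    print(relative("D6", "F8"))
    print(relative("$E5", "E6"))
```

Two cells whose formulae were produced by copying end up with identical deltas, which is what makes the relative form useful for comparison.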
3
64 --------
65 Identifying dates is . . . tedious. They will be ints or floats (?),
66 with s="<int>", where the int is a 0-origin index into the list of
67 <xf...numFmtId="<bin>".../>
68 children of <cellXfs> in styles.xml, and bin is a built-in date format
69 code, see 18.8.30 numFmt (Number Format) in ISO/IEC 29500-1:2016(E) ==
27
70 C071691e.pdf DONE
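A rough sketch of the lookup described above, assuming Python's `xml.etree.ElementTree`. It returns the 0-origin style indexes whose `numFmtId` is one of the built-in date/time codes. The set used here (14 to 22 and 45 to 47) is my reading of the numFmt table and should be checked against the spec; custom formats declared under `<numFmts>` are not handled. `date_style_indexes` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Assumed built-in date/time format ids; verify against 18.8.30.
BUILTIN_DATE_IDS = set(range(14, 23)) | {45, 46, 47}


def date_style_indexes(styles_xml):
    """Return the set of cellXfs positions (as used in s="...") that are dates."""
    root = ET.fromstring(styles_xml)
    xfs = root.findall("m:cellXfs/m:xf", NS)
    return {i for i, xf in enumerate(xfs)
            if int(xf.get("numFmtId", "0")) in BUILTIN_DATE_IDS}


SAMPLE = (
    '<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    '<cellXfs count="3">'
    '<xf numFmtId="0"/><xf numFmtId="14"/><xf numFmtId="2"/>'
    "</cellXfs></styleSheet>"
)

if __name__ == "__main__":
    print(date_style_indexes(SAMPLE))
```

A cell `<c s="1">` in a worksheet using this sample stylesheet would then be treated as a date.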
10
01e80c7a9575 simple ascii type matrix output working
Henry S. Thompson <ht@markup.co.uk>
parents: 3
diff changeset
71 ---------
72 Decided to distinguish between type (num, date, str, err, ...) and
27
73 class (cur(rency), others to come?). If non-standard code, just record
74 that.
75 The current pipe has two main steps, followed by an optional
76 prettifying step:
77 format.xsl (extracts type={bool,date,num,str,err}
78 class={cur,[nothing else yet]}
79 code={raw format code if not recognised})
80 rect.xsl (fills in gaps, cuts down size, using only bdnse for
81 <t>[ype] with attrs c[lass]={c,...} and [co]d[e]=...)
82 For now, just using first letters of type, class DONE
16
2bbd067529b6 improve efficiency, detect blank rows, don't type empty cells
Henry S. Thompson <ht@markup.co.uk>
parents: 10
diff changeset
83 ----------
84 Hmm, looking at real data (kenneth_lay__19506), I see _lots_ of cells
85 with (numerical) formats, but no content. Where do I throw those
86 away? Can throw away empty _rows_ in rect.xsl, but for _cells_ have
87 to wait for ascii.xsl or html.xsl. But only copy type in rect if
27
88 there was content before. DONE
22
ca98c74a7cb1 towards var handling, no lookup yet
Henry S. Thompson <ht@markup.co.uk>
parents: 16
diff changeset
89 -----------
23
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
90 Using attributes to hold space-separated lists is risky, as in
24
87e0d620deea switch to elements from attributes and default namespace
Henry S. Thompson <ht@markup.co.uk>
parents: 23
diff changeset
91 refs.xsl output. Fixed, see below.
23
92 -----------
30
16eff0d30d4d tidied dereferencing, added simple (no recursion) coverage for variables in ranges
Henry S. Thompson <ht@markup.co.uk>
parents: 28
diff changeset
93 Not handling variables as references FIXED. Not catching external
94 references to variables FIXED (as externals). Not catching naked [n]! as external
95 references FIXED
96 Solo local vars are recursively dereferenced
22
97 The definition table is in workbook.xml definedNames/definedName[@name=$name]/.
37
98 Sheet name to filename mapping for locals is in workbook.xml
99 sheets/sheet[@name=$sname]/@sheetId
100 These appear in definedName, single-quoted if (iff?) the sheet name has spaces
101 (or other specials?)
36
ae605b77d1e4 compute (but not use) master formula cells info,
Henry S. Thompson <ht@markup.co.uk>
parents: 35
diff changeset
102 ??? Variables on l or r of ranges are just looked up: if they are complex
30
103 no recursion is done: the _semantics_ of this case are not clear to
104 me, need a real-life example...
47
3e9a3e51627e explicit form match working, but shared still needs work
Henry S. Thompson <ht@markup.co.uk>
parents: 45
diff changeset
105 Variables whose value is itself a range are not being handled FIXED
23
106 -----------
24
107 Switch to default namespace in order to reduce size and improve
27
108 readability, and to elements instead of attributes DONE
23
109 -----------
110 Should put another step after refs.xsl to compute a map from
111 distinct-values of all targets to all the cells which use them
36
112 DONE.
113 Likewise ranges @@
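The map from targets to their users is an inverted index. A sketch in Python, assuming the refs step can be read as a dict from each cell to the list of targets it mentions (that input shape and the name `users_by_target` are assumptions, not what refs.xsl emits):

```python
from collections import defaultdict


def users_by_target(refs):
    """Invert {cell: [targets...]} into {target: {cells that use it}}."""
    index = defaultdict(set)
    for cell, targets in refs.items():
        for target in targets:
            index[target].add(cell)
    return dict(index)


if __name__ == "__main__":
    print(users_by_target({"C1": ["A1", "B1"], "C2": ["A1"]}))
```

Using a set per target gives the distinct-values behaviour for free when one formula mentions the same target twice.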
114
115 That really does mean we should move to elts for
23
116 each ref or range, since at this point we want to compute vector
30
117 representation as well DONE, so we can identify projections
23
118
119 Slightly irritating that we'll have to serialise this as XML and then
120 re-build it later...
121 -----------
122 Overgenerating in kenneth_lay__19506: e.g. <e:ref c="E9" er="[1]!'.SPX' '.SPX'!"/>
123 from <f>[1]!'.SPX'</f>
124 Hmm. This cell displays in Excel as REUTERS|IDN!.SPX
125 The indirections work as follows:
126 in workbook.xml:
127 <externalReferences>
128 <externalReference r:id="rId3"/>
129 <externalReference r:id="rId4"/>
130 </externalReferences>
131 in _rels/workbook.xml.rels
132 <Relationship Id="rId3" Target="externalLinks/externalLink1.xml" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/externalLink"/>
133 in externalLinks/externalLink1.xml
134 <ddeLink ddeService="REUTER" ddeTopic="IDN"...
135 <ddeItems>
136 ...
137 <ddeItem advise="1" name=".SPX">
138 <values>
139 <value>
140 <val>1264.96</val>
141 </value>
142 </values>
143 </ddeItem>
144 Whew!
30
145 FIXED
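The chain of indirections traced above can be followed mechanically. A sketch with Python's `xml.etree.ElementTree`, working on a dict of part name to XML text rather than a real .xlsx; it assumes that the n in `[n]!` is the 1-based position of the `<externalReference>` in workbook.xml. `dde_value` and the sample `PARTS` are hypothetical, filled in only with the values quoted in the notes.

```python
import xml.etree.ElementTree as ET

MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG = "http://schemas.openxmlformats.org/package/2006/relationships"


def dde_value(parts, index, item):
    """Resolve [index]!'item' to (ddeService, ddeTopic, cached value text)."""
    # Step 1: workbook.xml, n-th externalReference -> relationship id
    workbook = ET.fromstring(parts["workbook.xml"])
    ext = workbook.findall(
        f"{{{MAIN}}}externalReferences/{{{MAIN}}}externalReference")
    rid = ext[index - 1].get(f"{{{REL}}}id")
    # Step 2: _rels/workbook.xml.rels, relationship id -> target part
    rels = ET.fromstring(parts["_rels/workbook.xml.rels"])
    target = next(r.get("Target")
                  for r in rels.findall(f"{{{PKG}}}Relationship")
                  if r.get("Id") == rid)
    # Step 3: the externalLink part, ddeLink/ddeItem[@name=item] -> val
    dde = ET.fromstring(parts[target]).find(f"{{{MAIN}}}ddeLink")
    for dde_item in dde.iter(f"{{{MAIN}}}ddeItem"):
        if dde_item.get("name") == item:
            val = dde_item.find(f".//{{{MAIN}}}val")
            return dde.get("ddeService"), dde.get("ddeTopic"), val.text
    return None


PARTS = {
    "workbook.xml": (
        f'<workbook xmlns="{MAIN}" xmlns:r="{REL}"><externalReferences>'
        '<externalReference r:id="rId3"/><externalReference r:id="rId4"/>'
        "</externalReferences></workbook>"),
    "_rels/workbook.xml.rels": (
        f'<Relationships xmlns="{PKG}"><Relationship Id="rId3" '
        'Target="externalLinks/externalLink1.xml" '
        f'Type="{REL}/externalLink"/></Relationships>'),
    "externalLinks/externalLink1.xml": (
        f'<externalLink xmlns="{MAIN}"><ddeLink ddeService="REUTER" '
        'ddeTopic="IDN"><ddeItems><ddeItem advise="1" name=".SPX"><values>'
        "<value><val>1264.96</val></value></values></ddeItem></ddeItems>"
        "</ddeLink></externalLink>"),
}

if __name__ == "__main__":
    print(dde_value(PARTS, 1, ".SPX"))
```

The service and topic returned here are the pieces that Excel joins as service|topic!item when it displays the cell.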
23
146 ----------
28
c56a2e6990bd convert tokenisation to a function, so can make recursive
Henry S. Thompson <ht@markup.co.uk>
parents: 27
diff changeset
147 http://upcommons.upc.edu/bitstream/handle/2117/100584/KDIR_2016_47_CR.pdf
148 [downloaded]
149 uses appearance a lot. That needs to be harvested from styles.xml
150 The kenneth_lay enron sample has _403_ numbered formats...
30
151 ----------
23
152 Tried the largest sheet from the largest .xlsx I could find:
153 fuse1k/'benjamin_rogers__1002__NYISO Price Information version 2'.xlsx
154 -rw-r--r-- 1 ht None 6273325 Apr 3 16:22 '../benjamin_rogers__1002__NYISO Price Information version 2.xlsx'
155 -rw-r--r-- 1 ht None 23221149 Jan 1 1980 xl/worksheets/sheet3.xml
156
157 > lxcount xl/worksheets/sheet3.xml | sort -k2nr
158 *Total* 1230217
159 c 596032
160 v 595876
161 f 19201
162 row 18985
163 col 106
164
165 <dimension ref="A1:DY18985"/>
166
167 Blew java out of the water :-(
168 java.lang.OutOfMemoryError: Java heap space
169
170 Need to try again with more memory, if I remember how...
171
172 The raw result is going to have 18985 x 102 == 2 million cells ==
173 (assuming average cell size of 30 bytes and row overhead of 20) (*
174 18985 (+ 20 (* 102 30))) 58,473,800 bytes, which is big but tolerable...
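The same estimate as a few lines of Python, using only the figures given above (18985 rows, 102 cells per row, 30 bytes per cell, 20 bytes per row); the variable names are mine.

```python
rows, cols = 18985, 102
cell_bytes, row_overhead = 30, 20

cells = rows * cols                                # about 2 million
total = rows * (row_overhead + cols * cell_bytes)  # (* 18985 (+ 20 (* 102 30)))

print(f"{cells:,} cells, {total:,} bytes")
```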
175 ----------------
35
e500d7c18aad Fixed confusion wrt gen vs. num, nature of @ format (id=49)
Henry S. Thompson <ht@markup.co.uk>
parents: 30
diff changeset
176 sample4 html reveals several problems: mistaken content based on class
177 bug, e.g. B4 is 'a' FIXED
178 highlighted cells are being
179 labelled as cur, e.g. B61 in
180 output of format.xsl FIXED
36
181 -----------
182 Need to rethink variable handling...
45
6ed900e8cc61 towards comparable formulae
Henry S. Thompson <ht@markup.co.uk>
parents: 39
diff changeset
183 Is all we really need a normalised formula computation?:
36
184 1) recursively replace variables;
185 2) convert all simple refs to new CR string normal form:
186 crnf ::= col row
187 col ::= abs | rel
188 row ::= abs | rel
189 abs ::= '\xAA' xs:positiveInteger
190 rel ::= '\xAE' ( ( '-' xs:positiveInteger ) | xs:nonNegativeInteger )
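A sketch of step 2 in Python, following the grammar above: an absolute part is '\xAA' plus the 1-based column or row number, a relative part is '\xAE' plus the signed offset from the home cell. Encoding columns as numbers rather than letters is my reading of `xs:positiveInteger`; the helper names are hypothetical.

```python
import re

ABS, REL = "\xaa", "\xae"
REF = re.compile(r"(\$?)([A-Z]+)(\$?)([0-9]+)$")


def col_number(letters):
    """A -> 1, Z -> 26, AA -> 27."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def crnf(ref, home):
    """Normalise a simple reference, relative to the cell that contains it."""
    col_abs, col, row_abs, row = REF.match(ref).groups()
    _, home_col, _, home_row = REF.match(home).groups()
    if col_abs:
        col_part = ABS + str(col_number(col))
    else:
        col_part = REL + str(col_number(col) - col_number(home_col))
    if row_abs:
        row_part = ABS + str(int(row))
    else:
        row_part = REL + str(int(row) - int(home_row))
    return col_part + row_part


if __name__ == "__main__":
    print(repr(crnf("$E5", "F8")))
    print(crnf("A1", "B1") == crnf("A2", "B2"))
```

The point of the normal form is the last line: formulae that are copies of one another become string-equal, whatever cell they live in.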
47
191 -----------
48
5d9806f90896 basic integration of shared, but copying <f> is wrong, should reconstruct by denormalising <nf> for new home
Henry S. Thompson <ht@markup.co.uk>
parents: 47
diff changeset
192 Would <c c=COL [fi= [si=]]> be sufficient?
50
01a7c2ebd3d1 top 20 shared formulae coloured in
Henry S. Thompson <ht@markup.co.uk>
parents: 49
diff changeset
193 fi an index into _all_ functions, si original index into explicitly
194 shared functions -- note that the same fi may appear with multiple
195 si, see discussion back at the top of this doc.
48
196 Brute force for now -- rect sees shared table, computes CRNF
49
d3569a8cbf7a shared refs rebuilt correctly
Henry S. Thompson <ht@markup.co.uk>
parents: 48
diff changeset
197 Not good enough -- <f> in shared table can't be used as is, need to
198 rebuild ref names relative to each new home. FIXED
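Rebuilding a reference for a new home is the inverse of normalisation. A sketch in Python, assuming the normal form is '\xAA' or '\xAE' followed by an integer, once for the column and once for the row, with columns held as 1-based numbers; `denormalise` and `col_letters` are hypothetical names, not the ones used in the stylesheets.

```python
import re

ABS, REL = "\xaa", "\xae"
PART = re.compile("([\xaa\xae])(-?[0-9]+)")


def col_letters(n):
    """1 -> A, 26 -> Z, 27 -> AA."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def denormalise(crnf, home_col, home_row):
    """Turn a normalised reference back into A1 notation for the given home."""
    (col_kind, col_val), (row_kind, row_val) = PART.findall(crnf)
    col = int(col_val) if col_kind == ABS else home_col + int(col_val)
    row = int(row_val) if row_kind == ABS else home_row + int(row_val)
    return (("$" if col_kind == ABS else "") + col_letters(col)
            + ("$" if row_kind == ABS else "") + str(row))


if __name__ == "__main__":
    # the same normalised ref, rebuilt for two different homes (B1 and C7)
    print(denormalise("\xae-1\xae0", 2, 1))
    print(denormalise("\xae-1\xae0", 3, 7))
```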
48
199 -----------
47
200 Picking colours to label regions, e.g. with similar formulae:
201 http://stackoverflow.com/questions/470690/how-to-automatically-generate-n-distinct-colors
202 Start with just top-n, limited to 22 from Kelly
203 #FFB300, # Vivid Yellow
204 #803E75, # Strong Purple
3e9a3e51627e explicit form match working, but shared still needs work
Henry S. Thompson <ht@markup.co.uk>
parents: 45
diff changeset
205 #FF6800, # Vivid Orange
3e9a3e51627e explicit form match working, but shared still needs work
Henry S. Thompson <ht@markup.co.uk>
parents: 45
diff changeset
206 #A6BDD7, # Very Light Blue
3e9a3e51627e explicit form match working, but shared still needs work
Henry S. Thompson <ht@markup.co.uk>
parents: 45
diff changeset
207 #C10020, # Vivid Red
3e9a3e51627e explicit form match working, but shared still needs work
Henry S. Thompson <ht@markup.co.uk>
parents: 45
diff changeset
208 #CEA262, # Grayish Yellow
3e9a3e51627e explicit form match working, but shared still needs work
Henry S. Thompson <ht@markup.co.uk>
parents: 45
diff changeset
209 #817066, # Medium Gray
3e9a3e51627e explicit form match working, but shared still needs work
Henry S. Thompson <ht@markup.co.uk>
parents: 45
diff changeset
210 # The following don't work well for people with defective color vision
3e9a3e51627e explicit form match working, but shared still needs work
Henry S. Thompson <ht@markup.co.uk>
parents: 45
diff changeset
211 #007D34, # Vivid Green
3e9a3e51627e explicit form match working, but shared still needs work
Henry S. Thompson <ht@markup.co.uk>
parents: 45
diff changeset
212 #F6768E, # Strong Purplish Pink
3e9a3e51627e explicit form match working, but shared still needs work
Henry S. Thompson <ht@markup.co.uk>
parents: 45
diff changeset
213 #00538A, # Strong Blue
3e9a3e51627e explicit form match working, but shared still needs work
Henry S. Thompson <ht@markup.co.uk>
parents: 45
diff changeset
214 #FF7A5C, # Strong Yellowish Pink
3e9a3e51627e explicit form match working, but shared still needs work
Henry S. Thompson <ht@markup.co.uk>
parents: 45
diff changeset
215 #53377A, # Strong Violet
3e9a3e51627e explicit form match working, but shared still needs work
Henry S. Thompson <ht@markup.co.uk>
parents: 45
diff changeset
216 #FF8E00, # Vivid Orange Yellow
3e9a3e51627e explicit form match working, but shared still needs work
Henry S. Thompson <ht@markup.co.uk>
parents: 45
diff changeset
217 #B32851, # Strong Purplish Red
3e9a3e51627e explicit form match working, but shared still needs work
Henry S. Thompson <ht@markup.co.uk>
parents: 45
diff changeset
218 #F4C800, # Vivid Greenish Yellow
3e9a3e51627e explicit form match working, but shared still needs work
Henry S. Thompson <ht@markup.co.uk>
parents: 45
diff changeset
219 #7F180D, # Strong Reddish Brown
3e9a3e51627e explicit form match working, but shared still needs work
Henry S. Thompson <ht@markup.co.uk>
parents: 45
diff changeset
220 #93AA00, # Vivid Yellowish Green
3e9a3e51627e explicit form match working, but shared still needs work
Henry S. Thompson <ht@markup.co.uk>
parents: 45
diff changeset
221 #593315, # Deep Yellowish Brown
3e9a3e51627e explicit form match working, but shared still needs work
Henry S. Thompson <ht@markup.co.uk>
parents: 45
diff changeset
222 #F13A13, # Vivid Reddish Orange
3e9a3e51627e explicit form match working, but shared still needs work
Henry S. Thompson <ht@markup.co.uk>
parents: 45
diff changeset
223 #232C16, # Dark Olive Green
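A minimal sketch of the top-n idea, assuming the formulae to be coloured are available as a flat list of hashable keys (for example normalised formula strings). The function name `colour_top_n` is hypothetical; the colour values are the ones listed above, in the same order.

```python
from collections import Counter

# The 20 colours listed above, in the same order.
KELLY = [
    "#FFB300", "#803E75", "#FF6800", "#A6BDD7", "#C10020",
    "#CEA262", "#817066",
    # The remaining colours work less well with defective colour vision.
    "#007D34", "#F6768E", "#00538A", "#FF7A5C", "#53377A",
    "#FF8E00", "#B32851", "#F4C800", "#7F180D", "#93AA00",
    "#593315", "#F13A13", "#232C16",
]

def colour_top_n(formulae, n=len(KELLY)):
    """Map the n most frequent formulae to distinct colours.

    Formulae outside the top n get no entry, so callers can leave
    those regions uncoloured.
    """
    n = min(n, len(KELLY))
    ranked = Counter(formulae).most_common(n)
    return {f: KELLY[i] for i, (f, _count) in enumerate(ranked)}

# Hypothetical usage with made-up formula keys.
print(colour_top_n(["a", "b", "a", "c", "a", "b"], 2))
```

Assigning colours in frequency order means the most common formulae get the earlier colours, which are the ones noted as safer for colour-deficient vision.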
3e9a3e51627e explicit form match working, but shared still needs work
Henry S. Thompson <ht@markup.co.uk>
parents: 45
diff changeset
224 ------------
3e9a3e51627e explicit form match working, but shared still needs work
Henry S. Thompson <ht@markup.co.uk>
parents: 45
diff changeset
225 @@ string identity, to say nothing of actual value, is lost -- fix?
3e9a3e51627e explicit form match working, but shared still needs work
Henry S. Thompson <ht@markup.co.uk>
parents: 45
diff changeset
226 @@ row/column/both spans
52
9bb415e0adc9 try to fix error processin odd REUTER|IDN\!'.SPX' external ref
Henry S. Thompson <ht@markup.co.uk>
parents: 50
diff changeset
227 ------
9bb415e0adc9 try to fix error processin odd REUTER|IDN\!'.SPX' external ref
Henry S. Thompson <ht@markup.co.uk>
parents: 50
diff changeset
228 enron/kenneth_lay__19506 contains this formula:
9bb415e0adc9 try to fix error processin odd REUTER|IDN\!'.SPX' external ref
Henry S. Thompson <ht@markup.co.uk>
parents: 50
diff changeset
229
9bb415e0adc9 try to fix error processin odd REUTER|IDN\!'.SPX' external ref
Henry S. Thompson <ht@markup.co.uk>
parents: 50
diff changeset
230 <f>[1]!'.SPX'</f>
9bb415e0adc9 try to fix error processin odd REUTER|IDN\!'.SPX' external ref
Henry S. Thompson <ht@markup.co.uk>
parents: 50
diff changeset
231
9bb415e0adc9 try to fix error processin odd REUTER|IDN\!'.SPX' external ref
Henry S. Thompson <ht@markup.co.uk>
parents: 50
diff changeset
232 which crashes tokenise/rnf
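A minimal sketch of a pattern that accepts this shape of reference, with a workbook index in brackets, an empty sheet name, and a quoted item name after the `!`. This is an assumption about how such a token could be recognised, not the pattern actually used in tokenise/rnf; the name `EXT_REF` is hypothetical.

```python
import re

# Hypothetical pattern for external references such as [1]!'.SPX'
EXT_REF = re.compile(r"""
    \[(?P<book>[0-9]+)\]      # external workbook index
    (?P<sheet>[^!\[\]']*)     # optional sheet name, empty in this case
    !
    '(?P<name>[^']*)'         # quoted item name
""", re.VERBOSE)

m = EXT_REF.fullmatch("[1]!'.SPX'")
if m:
    print(m.group("book"), repr(m.group("sheet")), m.group("name"))
```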
9bb415e0adc9 try to fix error processin odd REUTER|IDN\!'.SPX' external ref
Henry S. Thompson <ht@markup.co.uk>
parents: 50
diff changeset
233
9bb415e0adc9 try to fix error processin odd REUTER|IDN\!'.SPX' external ref
Henry S. Thompson <ht@markup.co.uk>
parents: 50
diff changeset
234 Changes intended to fix this ended up fixing a bug (?) where e.g. +3
9bb415e0adc9 try to fix error processin odd REUTER|IDN\!'.SPX' external ref
Henry S. Thompson <ht@markup.co.uk>
parents: 50
diff changeset
235 was not being merged properly -- no examples of larger numbers available
9bb415e0adc9 try to fix error processin odd REUTER|IDN\!'.SPX' external ref
Henry S. Thompson <ht@markup.co.uk>
parents: 50
diff changeset
236 to check with...
9bb415e0adc9 try to fix error processin odd REUTER|IDN\!'.SPX' external ref
Henry S. Thompson <ht@markup.co.uk>
parents: 50
diff changeset
237