annotate notes.txt @ 26:d2ca3ea1f5ae

plumb in and use inverted ref table (via key)
author Henry S. Thompson <ht@markup.co.uk>
date Fri, 07 Apr 2017 18:42:47 +0100
parents 87e0d620deea
children 8309dcfce613
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
3
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
1 You can't depend on
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
2 <f si="..." t="shared"/>
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
3 That is, it's _true_, but you can have a table with shared formulae
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
4 that doesn't use it. Compare M17:T28 (see below, uses shared) and
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
5 C17:J28 (mostly no shared) in sample4
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
6
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
7 Looks like the result of a sweep-and-copy-{right,down} results in the
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
8 _new_ cells covered showing as 'shared':
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
9
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
10 [ ][1][1][1][1]...
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
11 [2][2][2][2][2]...
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
12 [2][2][2][2][2]...
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
13 ...
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
14
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
15 Presumably that one was right-then-down, down-then-right would give a
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
16 slightly different pattern
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
17 --------
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
18 Thinking about a pipeline...
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
19 1) convert all variable references into (verbose!) elts:
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
20 <!ELEMENT R EMPTY>
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
21 <!ATTLIST R
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
22 ac CDATA IMPLIED
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
23 rc CDATA IMPLIED
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
24 ar CDATA IMPLIED
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
25 rr CDATA IMPLIED>
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
26
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
27 where e.g. ac is 'absolute column'
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
28 'D6' --> <R rc='D' rr='6'/>
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
29 and
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
30 '$E5' --> <R ac='E' rr='5'/>
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
31 --------
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
32 Identifying dates is . . . tedious. They will be ints or floats (?),
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
33 with s="<int>", where the int is a 0-origin index into the list of
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
34 <xf...numFmtId="<bin>".../>
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
35 children of <cellXfs> in styles.xml, and bin is a built-in date format
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
36 code, see 18.8.30 numFmt (Number Format) in ISO/IEC 29500-1:2016(E) ==
2c115aefde6b beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff changeset
37 C071691e.pdf
10
01e80c7a9575 simple ascii type matrix output working
Henry S. Thompson <ht@markup.co.uk>
parents: 3
diff changeset
38 ---------
01e80c7a9575 simple ascii type matrix output working
Henry S. Thompson <ht@markup.co.uk>
parents: 3
diff changeset
39 Decided to distinguish between type (num, date, str, err, ...) and
01e80c7a9575 simple ascii type matrix output working
Henry S. Thompson <ht@markup.co.uk>
parents: 3
diff changeset
40 class (cur, others to come?). If non-standard code, just record that.
16
2bbd067529b6 improve efficiency, detect blank rows, don't type empty cells
Henry S. Thompson <ht@markup.co.uk>
parents: 10
diff changeset
41 ----------
2bbd067529b6 improve efficiency, detect blank rows, don't type empty cells
Henry S. Thompson <ht@markup.co.uk>
parents: 10
diff changeset
42 Hmm, looking at real data (kenneth_lay__19506), I see _lots_ of cells
2bbd067529b6 improve efficiency, detect blank rows, don't type empty cells
Henry S. Thompson <ht@markup.co.uk>
parents: 10
diff changeset
43 with (numerical) formats, but no content. Where do I throw those
2bbd067529b6 improve efficiency, detect blank rows, don't type empty cells
Henry S. Thompson <ht@markup.co.uk>
parents: 10
diff changeset
44 away? Can throw away empty _rows_ in rect.xsl, but for _cells_ have
2bbd067529b6 improve efficiency, detect blank rows, don't type empty cells
Henry S. Thompson <ht@markup.co.uk>
parents: 10
diff changeset
45 to wait for ascii.xsl or html.xsl. But only copy type in in rect if
2bbd067529b6 improve efficiency, detect blank rows, don't type empty cells
Henry S. Thompson <ht@markup.co.uk>
parents: 10
diff changeset
46 there was content before.
22
ca98c74a7cb1 towards var handling, no lookup yet
Henry S. Thompson <ht@markup.co.uk>
parents: 16
diff changeset
47 -----------
23
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
48 Using attributes to hold space-separated lists is risky, as in
24
87e0d620deea switch to elements from attributes and default namespace
Henry S. Thompson <ht@markup.co.uk>
parents: 23
diff changeset
49 refs.xsl output, is risky! Fixed, see below.
23
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
50 -----------
22
ca98c74a7cb1 towards var handling, no lookup yet
Henry S. Thompson <ht@markup.co.uk>
parents: 16
diff changeset
51 Not handling variables as references. Not catching external
ca98c74a7cb1 towards var handling, no lookup yet
Henry S. Thompson <ht@markup.co.uk>
parents: 16
diff changeset
52 references to variables. Not catching naked [n]! as external
ca98c74a7cb1 towards var handling, no lookup yet
Henry S. Thompson <ht@markup.co.uk>
parents: 16
diff changeset
53 references.
ca98c74a7cb1 towards var handling, no lookup yet
Henry S. Thompson <ht@markup.co.uk>
parents: 16
diff changeset
54 Fixed, but not dereferenced vars
ca98c74a7cb1 towards var handling, no lookup yet
Henry S. Thompson <ht@markup.co.uk>
parents: 16
diff changeset
55 The definition table is in workbook.xml definedNames/definedName[@name=$name]/.
ca98c74a7cb1 towards var handling, no lookup yet
Henry S. Thompson <ht@markup.co.uk>
parents: 16
diff changeset
56 Sheet name to filename mapping for locals is in workbook.xml sheets/sheet[@name=$sname]/@sheetId
23
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
57 -----------
24
87e0d620deea switch to elements from attributes and default namespace
Henry S. Thompson <ht@markup.co.uk>
parents: 23
diff changeset
58 Switch to default namespace in order to reduce size and improve
87e0d620deea switch to elements from attributes and default namespace
Henry S. Thompson <ht@markup.co.uk>
parents: 23
diff changeset
59 readability, and to elements instead of attributes
23
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
60 -----------
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
61 Should put another step after refs.xsl to compute a map from
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
62 distinct-values of all targets to all the cells which use them
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
63 (likewise ranges). That really does mean we should move to elts for
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
64 each ref or range, since at this point we want to compute vector
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
65 representation as well, so we can identify projections
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
66
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
67 Slightly irritating that we'll have to serialise this as XML and then
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
68 re-build it later...
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
69 -----------
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
70 Overgenerating in kenneth_lay__19506: e.g. <e:ref c="E9" er="[1]!'.SPX' '.SPX'!"/>
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
71 from <f>[1]!'.SPX'</f>
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
72 Hmm. This cell displays in Excel as REUTERS|IDN!.SPX
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
73 The indirections work as follows:
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
74 in workbook.xml:
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
75 <externalReferences>
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
76 <externalReference r:id="rId3"/>
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
77 <externalReference r:id="rId4"/>
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
78 </externalReferences>
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
79 in _rels/workbook.xml.rels
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
80 <Relationship Id="rId3" Target="externalLinks/externalLink1.xml" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/externalLink"/>
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
81 in externalLinks/externalLink1.xml
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
82 <ddeLink ddeService="REUTER" ddeTopic="IDN"...
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
83 <ddeItems>
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
84 ...
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
85 <ddeItem advise="1" name=".SPX">
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
86 <values>
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
87 <value>
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
88 <val>1264.96</val>
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
89 </value>
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
90 </values>
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
91 </ddeItem>
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
92 Whew!
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
93 ----------
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
94 Tried the largest sheet from the largest .xlsx I could find:
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
95 fuse1k/'benjamin_rogers__1002__NYISO Price Information version 2'.xlsx
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
96 -rw-r--r-- 1 ht None 6273325 Apr 3 16:22 '../benjamin_rogers__1002__NYISO Price Information version 2.xlsx'
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
97 -rw-r--r-- 1 ht None 23221149 Jan 1 1980 xl/worksheets/sheet3.xml
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
98
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
99 > lxcount xl/worksheets/sheet3.xml | sort -k2nr
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
100 *Total* 1230217
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
101 c 596032
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
102 v 595876
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
103 f 19201
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
104 row 18985
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
105 col 106
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
106
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
107 <dimension ref="A1:DY18985"/>
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
108
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
109 Blew java out of the water :-(
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
110 java.lang.OutOfMemoryError: Java heap space
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
111
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
112 Need to try again with more memory, if I remember how...
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
113
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
114 The raw result is going to have 18985 x 102 == 2 million cells ==
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
115 (assuming average cell size of 30 bytes and row overhead of 20 (*
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
116 18985 (+ 20 (* 102 30))) 58,473,800 bytes, which is big but tolerable...
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
117 ----------------
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
118 Back to ranges -
bfa38afaea63 change to default ns
Henry S. Thompson <ht@markup.co.uk>
parents: 22
diff changeset
119