Mercurial > hg > ooxml
annotate notes.txt @ 24:87e0d620deea
switch to elements from attributes and default namespace
author | Henry S. Thompson <ht@markup.co.uk> |
---|---|
date | Thu, 06 Apr 2017 17:24:30 +0100 |
parents | bfa38afaea63 |
children | 8309dcfce613 |
rev | line source |
---|---|
3
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
1 You can't depend on |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
2 <f si="..." t="shared"/> |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
3 That is, it's _true_, but you can have a table with shared formulae |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
4 that doesn't use it. Compare M17:T28 (see below, uses shared) and |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
5 C17:J28 (mostly no shared) in sample4 |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
6 |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
7 Looks like the result of a sweep-and-copy-{right,down} results in the |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
8 _new_ cells covered showing as 'shared': |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
9 |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
10 [ ][1][1][1][1]... |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
11 [2][2][2][2][2]... |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
12 [2][2][2][2][2]... |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
13 ... |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
14 |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
15 Presumably that one was right-then-down, down-then-right would give a |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
16 slightly different pattern |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
17 -------- |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
18 Thinking about a pipeline... |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
19 1) convert all variable references into (verbose!) elts: |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
20 <!ELEMENT R EMPTY> |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
21 <!ATTLIST R |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
22 ac CDATA IMPLIED |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
23 rc CDATA IMPLIED |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
24 ar CDATA IMPLIED |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
25 rr CDATA IMPLIED> |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
26 |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
27 where e.g. ac is 'absolute column' |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
28 'D6' --> <R rc='D' rr='6'/> |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
29 and |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
30 '$E5' --> <R ac='E' rr='5'/> |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
31 -------- |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
32 Identifying dates is . . . tedious. They will be ints or floats (?), |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
33 with s="<int>", where the int is a 0-origin index into the list of |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
34 <xf...numFmtId="<bin>".../> |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
35 children of <cellXfs> in styles.xml, and bin is a built-in date format |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
36 code, see 18.8.30 numFmt (Number Format) in ISO/IEC 29500-1:2016(E) == |
2c115aefde6b
beginning work on elaboration of worksheets
Henry S. Thompson <ht@markup.co.uk>
parents:
diff
changeset
|
37 C071691e.pdf |
10
01e80c7a9575
simple ascii type matrix output working
Henry S. Thompson <ht@markup.co.uk>
parents:
3
diff
changeset
|
38 --------- |
01e80c7a9575
simple ascii type matrix output working
Henry S. Thompson <ht@markup.co.uk>
parents:
3
diff
changeset
|
39 Decided to distinguish between type (num, date, str, err, ...) and |
01e80c7a9575
simple ascii type matrix output working
Henry S. Thompson <ht@markup.co.uk>
parents:
3
diff
changeset
|
40 class (cur, others to come?). If non-standard code, just record that. |
16
2bbd067529b6
improve efficiency, detect blank rows, don't type empty cells
Henry S. Thompson <ht@markup.co.uk>
parents:
10
diff
changeset
|
41 ---------- |
2bbd067529b6
improve efficiency, detect blank rows, don't type empty cells
Henry S. Thompson <ht@markup.co.uk>
parents:
10
diff
changeset
|
42 Hmm, looking at real data (kenneth_lay__19506), I see _lots_ of cells |
2bbd067529b6
improve efficiency, detect blank rows, don't type empty cells
Henry S. Thompson <ht@markup.co.uk>
parents:
10
diff
changeset
|
43 with (numerical) formats, but no content. Where do I throw those |
2bbd067529b6
improve efficiency, detect blank rows, don't type empty cells
Henry S. Thompson <ht@markup.co.uk>
parents:
10
diff
changeset
|
44 away? Can throw away empty _rows_ in rect.xsl, but for _cells_ have |
2bbd067529b6
improve efficiency, detect blank rows, don't type empty cells
Henry S. Thompson <ht@markup.co.uk>
parents:
10
diff
changeset
|
45 to wait for ascii.xsl or html.xsl. But only copy type in in rect if |
2bbd067529b6
improve efficiency, detect blank rows, don't type empty cells
Henry S. Thompson <ht@markup.co.uk>
parents:
10
diff
changeset
|
46 there was content before. |
22
ca98c74a7cb1
towards var handling, no lookup yet
Henry S. Thompson <ht@markup.co.uk>
parents:
16
diff
changeset
|
47 ----------- |
23 | 48 Using attributes to hold space-separated lists is risky, as in |
24
87e0d620deea
switch to elements from attributes and default namespace
Henry S. Thompson <ht@markup.co.uk>
parents:
23
diff
changeset
|
49 refs.xsl output, is risky! Fixed, see below. |
23 | 50 ----------- |
22
ca98c74a7cb1
towards var handling, no lookup yet
Henry S. Thompson <ht@markup.co.uk>
parents:
16
diff
changeset
|
51 Not handling variables as references. Not catching external |
ca98c74a7cb1
towards var handling, no lookup yet
Henry S. Thompson <ht@markup.co.uk>
parents:
16
diff
changeset
|
52 references to variables. Not catching naked [n]! as external |
ca98c74a7cb1
towards var handling, no lookup yet
Henry S. Thompson <ht@markup.co.uk>
parents:
16
diff
changeset
|
53 references. |
ca98c74a7cb1
towards var handling, no lookup yet
Henry S. Thompson <ht@markup.co.uk>
parents:
16
diff
changeset
|
54 Fixed, but not dereferenced vars |
ca98c74a7cb1
towards var handling, no lookup yet
Henry S. Thompson <ht@markup.co.uk>
parents:
16
diff
changeset
|
55 The definition table is in workbook.xml definedNames/definedName[@name=$name]/. |
ca98c74a7cb1
towards var handling, no lookup yet
Henry S. Thompson <ht@markup.co.uk>
parents:
16
diff
changeset
|
56 Sheet name to filename mapping for locals is in workbook.xml sheets/sheet[@name=$sname]/@sheetId |
23 | 57 ----------- |
24
87e0d620deea
switch to elements from attributes and default namespace
Henry S. Thompson <ht@markup.co.uk>
parents:
23
diff
changeset
|
58 Switch to default namespace in order to reduce size and improve |
87e0d620deea
switch to elements from attributes and default namespace
Henry S. Thompson <ht@markup.co.uk>
parents:
23
diff
changeset
|
59 readability, and to elements instead of attributes |
23 | 60 ----------- |
61 Should put another step after refs.xsl to compute a map from | |
62 distinct-values of all targets to all the cells which use them | |
63 (likewise ranges). That really does mean we should move to elts for | |
64 each ref or range, since at this point we want to compute vector | |
65 representation as well, so we can identify projections | |
66 | |
67 Slightly irritating that we'll have to serialise this as XML and then | |
68 re-build it later... | |
69 ----------- | |
70 Overgenerating in kenneth_lay__19506: e.g. <e:ref c="E9" er="[1]!'.SPX' '.SPX'!"/> | |
71 from <f>[1]!'.SPX'</f> | |
72 Hmm. This cell displays in Excel as REUTERS|IDN!.SPX | |
73 The indirections work as follows: | |
74 in workbook.xml: | |
75 <externalReferences> | |
76 <externalReference r:id="rId3"/> | |
77 <externalReference r:id="rId4"/> | |
78 </externalReferences> | |
79 in _rels/workbook.xml.rels | |
80 <Relationship Id="rId3" Target="externalLinks/externalLink1.xml" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/externalLink"/> | |
81 in externalLinks/externalLink1.xml | |
82 <ddeLink ddeService="REUTER" ddeTopic="IDN"... | |
83 <ddeItems> | |
84 ... | |
85 <ddeItem advise="1" name=".SPX"> | |
86 <values> | |
87 <value> | |
88 <val>1264.96</val> | |
89 </value> | |
90 </values> | |
91 </ddeItem> | |
92 Whew! | |
93 ---------- | |
94 Tried the largest sheet from the largest .xlsx I could find: | |
95 fuse1k/'benjamin_rogers__1002__NYISO Price Information version 2'.xlsx | |
96 -rw-r--r-- 1 ht None 6273325 Apr 3 16:22 '../benjamin_rogers__1002__NYISO Price Information version 2.xlsx' | |
97 -rw-r--r-- 1 ht None 23221149 Jan 1 1980 xl/worksheets/sheet3.xml | |
98 | |
99 > lxcount xl/worksheets/sheet3.xml | sort -k2nr | |
100 *Total* 1230217 | |
101 c 596032 | |
102 v 595876 | |
103 f 19201 | |
104 row 18985 | |
105 col 106 | |
106 | |
107 <dimension ref="A1:DY18985"/> | |
108 | |
109 Blew java out of the water :-( | |
110 java.lang.OutOfMemoryError: Java heap space | |
111 | |
112 Need to try again with more memory, if I remember how... | |
113 | |
114 The raw result is going to have 18985 x 102 == 2 million cells == | |
115 (assuming average cell size of 30 bytes and row overhead of 20 (* | |
116 18985 (+ 20 (* 102 30))) 58,473,800 bytes, which is big but tolerable... | |
117 ---------------- | |
118 Back to ranges - | |
119 |