comparison notes.txt @ 23:bfa38afaea63

change to default ns
author Henry S. Thompson <ht@markup.co.uk>
date Thu, 06 Apr 2017 16:47:53 +0100
parents ca98c74a7cb1
children 87e0d620deea
22:ca98c74a7cb1 23:bfa38afaea63
43 with (numerical) formats, but no content. Where do I throw those
44 away? Can throw away empty _rows_ in rect.xsl, but for _cells_ have
45 to wait for ascii.xsl or html.xsl. But only copy type in in rect if
46 there was content before.
47 -----------
48 Using attributes to hold space-separated lists, as in
49 refs.xsl output, is risky!
50 -----------
51 Not handling variables as references. Not catching external
52 references to variables. Not catching naked [n]! as external
53 references.
54 Fixed, but not dereferenced vars
55 The definition table is in workbook.xml definedNames/definedName[@name=$name]/.
56 Sheet name to filename mapping for locals is in workbook.xml sheets/sheet[@name=$sname]/@sheetId
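
The two workbook.xml lookups named above can be sketched in Python with the standard library. This is only an illustration: the sample workbook, the name "Rate" and the helpers defined_name and sheet_id are made up for the example, and the step from sheetId to an actual filename is left out.

```python
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace, bound to a prefix for ElementTree paths.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Hypothetical, heavily trimmed workbook.xml used only for this sketch.
SAMPLE_WORKBOOK = """
<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheets>
    <sheet name="Prices" sheetId="1"/>
    <sheet name="Summary" sheetId="2"/>
  </sheets>
  <definedNames>
    <definedName name="Rate">Prices!$B$2</definedName>
  </definedNames>
</workbook>
"""

def defined_name(root, name):
    """Equivalent of definedNames/definedName[@name=$name]; returns its text."""
    for dn in root.findall("m:definedNames/m:definedName", NS):
        if dn.get("name") == name:
            return dn.text
    return None

def sheet_id(root, sheet_name):
    """Equivalent of sheets/sheet[@name=$sname]/@sheetId."""
    for sheet in root.findall("m:sheets/m:sheet", NS):
        if sheet.get("name") == sheet_name:
            return sheet.get("sheetId")
    return None

root = ET.fromstring(SAMPLE_WORKBOOK)
print(defined_name(root, "Rate"))
print(sheet_id(root, "Summary"))
```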
57 -----------
58 Switch to default namespace in order to reduce size and improve readability
59 -----------
60 Should put another step after refs.xsl to compute a map from
61 distinct-values of all targets to all the cells which use them
62 (likewise ranges). That really does mean we should move to elts for
63 each ref or range, since at this point we want to compute vector
64 representation as well, so we can identify projections
65
66 Slightly irritating that we'll have to serialise this as XML and then
67 re-build it later...
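
A minimal Python sketch of the map described above, assuming the refs.xsl output has already been reduced to (using cell, target) pairs. The pair list and the helper names are hypothetical, and reading "vector representation" as the (column, row) offset from the using cell to its target is an interpretation, not something the notes spell out.

```python
import re
from collections import defaultdict

def parse_a1(ref):
    """Turn an A1-style reference such as 'E9' into (column, row), both 1-based."""
    m = re.fullmatch(r"([A-Z]+)([0-9]+)", ref)
    if m is None:
        raise ValueError("not a plain A1 reference: " + ref)
    col = 0
    for ch in m.group(1):
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return col, int(m.group(2))

def users_by_target(pairs):
    """Map each distinct target to the list of cells that refer to it."""
    users = defaultdict(list)
    for cell, target in pairs:
        users[target].append(cell)
    return dict(users)

def offset(cell, target):
    """Offset (dcol, drow) from the using cell to its target (assumed 'vector')."""
    (cc, cr), (tc, tr) = parse_a1(cell), parse_a1(target)
    return tc - cc, tr - cr

# Hypothetical references: B1 and C1 both use A1, B2 uses A2.
pairs = [("B1", "A1"), ("B2", "A2"), ("C1", "A1")]
print(users_by_target(pairs))
print([offset(c, t) for c, t in pairs])
```

Cells in a column that all share the same offset (B1 and B2 here) would be candidates for the kind of projection mentioned above.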
68 -----------
69 Overgenerating in kenneth_lay__19506: e.g. <e:ref c="E9" er="[1]!'.SPX' '.SPX'!"/>
70 from <f>[1]!'.SPX'</f>
71 Hmm. This cell displays in Excel as REUTERS|IDN!.SPX
72 The indirections work as follows:
73 in workbook.xml:
74 <externalReferences>
75 <externalReference r:id="rId3"/>
76 <externalReference r:id="rId4"/>
77 </externalReferences>
78 in _rels/workbook.xml.rels
79 <Relationship Id="rId3" Target="externalLinks/externalLink1.xml" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/externalLink"/>
80 in externalLinks/externalLink1.xml
81 <ddeLink ddeService="REUTER" ddeTopic="IDN"...
82 <ddeItems>
83 ...
84 <ddeItem advise="1" name=".SPX">
85 <values>
86 <value>
87 <val>1264.96</val>
88 </value>
89 </values>
90 </ddeItem>
91 Whew!
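
The chain of indirections above can be followed mechanically. The Python sketch below is a rough illustration on trimmed copies of the three fragments. It assumes that [n] is a 1-based index into externalReferences in document order and that ddeLink lives in the SpreadsheetML main namespace. The helper names link_target and dde_item are invented for the example.

```python
import xml.etree.ElementTree as ET

MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
DOC_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"
NS = {"m": MAIN, "p": PKG_REL}

# Trimmed stand-ins for the three files quoted in the notes.
WORKBOOK = f"""<workbook xmlns="{MAIN}" xmlns:r="{DOC_REL}">
  <externalReferences>
    <externalReference r:id="rId3"/>
    <externalReference r:id="rId4"/>
  </externalReferences>
</workbook>"""

RELS = f"""<Relationships xmlns="{PKG_REL}">
  <Relationship Id="rId3" Target="externalLinks/externalLink1.xml"
                Type="{DOC_REL}/externalLink"/>
</Relationships>"""

EXTERNAL_LINK = f"""<externalLink xmlns="{MAIN}">
  <ddeLink ddeService="REUTER" ddeTopic="IDN">
    <ddeItems>
      <ddeItem advise="1" name=".SPX">
        <values><value><val>1264.96</val></value></values>
      </ddeItem>
    </ddeItems>
  </ddeLink>
</externalLink>"""

def link_target(workbook_xml, rels_xml, n):
    """[n] -> nth externalReference -> r:id -> Relationship/@Target."""
    refs = ET.fromstring(workbook_xml).findall(
        "m:externalReferences/m:externalReference", NS)
    rid = refs[n - 1].get(f"{{{DOC_REL}}}id")  # assumes [n] is 1-based
    for rel in ET.fromstring(rels_xml).findall("p:Relationship", NS):
        if rel.get("Id") == rid:
            return rel.get("Target")
    return None

def dde_item(link_xml, name):
    """Return (service, topic, cached value) for the named ddeItem, or None."""
    link = ET.fromstring(link_xml).find("m:ddeLink", NS)
    if link is None:
        return None
    for item in link.findall("m:ddeItems/m:ddeItem", NS):
        if item.get("name") == name:
            val = item.find("m:values/m:value/m:val", NS)
            cached = val.text if val is not None else None
            return link.get("ddeService"), link.get("ddeTopic"), cached
    return None

print(link_target(WORKBOOK, RELS, 1))
print(dde_item(EXTERNAL_LINK, ".SPX"))
```

With this in hand, <f>[1]!'.SPX'</f> would come out as a single external DDE reference (service, topic, item) rather than the doubled er value shown above.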
92 ----------
93 Tried the largest sheet from the largest .xlsx I could find:
94 fuse1k/'benjamin_rogers__1002__NYISO Price Information version 2'.xlsx
95 -rw-r--r-- 1 ht None 6273325 Apr 3 16:22 '../benjamin_rogers__1002__NYISO Price Information version 2.xlsx'
96 -rw-r--r-- 1 ht None 23221149 Jan 1 1980 xl/worksheets/sheet3.xml
97
98 > lxcount xl/worksheets/sheet3.xml | sort -k2nr
99 *Total* 1230217
100 c 596032
101 v 595876
102 f 19201
103 row 18985
104 col 106
105
106 <dimension ref="A1:DY18985"/>
107
108 Blew java out of the water :-(
109 java.lang.OutOfMemoryError: Java heap space
110
111 Need to try again with more memory, if I remember how...
112
113 The raw result is going to have 18985 x 102 == roughly 2 million cells.
114 Assuming an average cell size of 30 bytes and a row overhead of 20 bytes,
115 that is (* 18985 (+ 20 (* 102 30))) == 58,473,800 bytes, which is big but tolerable...
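
The same back-of-envelope estimate, restated in Python so the figures are easy to vary. The variable names are made up here. The 30 bytes per cell and 20 bytes per row are the assumptions stated in the note, not measurements.

```python
rows = 18985        # from <dimension ref="A1:DY18985"/>
cols = 102          # column count used in the note
cell_bytes = 30     # assumed average size of one cell
row_overhead = 20   # assumed per-row overhead

cells = rows * cols
total_bytes = rows * (row_overhead + cols * cell_bytes)

print(cells)        # 1936470, i.e. roughly 2 million
print(total_bytes)  # 58473800
```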
116 ----------------
117 Back to ranges -
118