# HG changeset patch
# User Henry S. Thompson
# Date 1491493673 -3600
# Node ID bfa38afaea63fe654a1597895694ffa58d2d2b4b
# Parent ca98c74a7cb180bb929481f64aed66c92ef3e95e
change to default ns

diff -r ca98c74a7cb1 -r bfa38afaea63 notes.txt
--- a/notes.txt Wed Apr 05 11:57:00 2017 +0100
+++ b/notes.txt Thu Apr 06 16:47:53 2017 +0100
@@ -45,10 +45,74 @@
 to wait for ascii.xsl or html.xsl.
 But only copy type in in rect if there was content before.
 -----------
+Using attributes to hold space-separated lists is risky, as in
+refs.xsl output, is risky!
+-----------
 Not handling variables as references.
 Not catching external references to variables.
 Not catching naked [n]! as external references.
 Fixed, but not dereferenced vars
 The definition table is in workbook.xml definedNames/definedName[@name=$name]/.
 Sheet name to filename mapping for locals is in workbook.xml sheets/sheet[@name=$sname]/@sheetId
-
+-----------
+Switch to default namespace in order to reduce size and improve readability
+-----------
+Should put another step after refs.xsl to compute a map from
+distinct-values of all targets to all the cells which use them
+(likewise ranges). That really does mean we should move to elts for
+each ref or range, since at this point we want to compute vector
+representation as well, so we can identify projections
+
+Slightly irritating that we'll have to serialise this as XML and then
+re-build it later...
+-----------
+ Overgenerating in kenneth_lay__19506: e.g.
+ from [1]!'.SPX'
+ Hmm. This cell displays in Excel as REUTERS|IDN!.SPX
+ The indirections work as follows:
+ in workbook.xml:
+
+
+
+
+ in _rels/workbook.xml.rels
+
+ in externalLinks/externalLink1.xml
+
+ ...
+
+
+
+ 1264.96
+
+
+
+ Whew!
+----------
+Tried the largest sheet from the largest .xlsx I could find:
+ fuse1k/'benjamin_rogers__1002__NYISO Price Information version 2'.xlsx
+ -rw-r--r-- 1 ht None 6273325 Apr 3 16:22 '../benjamin_rogers__1002__NYISO Price Information version 2.xlsx'
+ -rw-r--r-- 1 ht None 23221149 Jan 1 1980 xl/worksheets/sheet3.xml
+
+ > lxcount xl/worksheets/sheet3.xml | sort -k2nr
+ *Total* 1230217
+ c 596032
+ v 595876
+ f 19201
+ row 18985
+ col 106
+
+
+
+Blew java out of the water :-(
+ java.lang.OutOfMemoryError: Java heap space
+
+Need to try again with more memory, if I remember how...
+
+The raw result is going to have 18985 x 102 == 2 million cells ==
+(assuming average cell size of 30 bytes and row overhead of 20 (*
+18985 (+ 20 (* 102 30))) 58,473,800 bytes, which is big but tolerable...
+----------------
+Back to ranges -
+
diff -r ca98c74a7cb1 -r bfa38afaea63 refs.xsl
--- a/refs.xsl Wed Apr 05 11:57:00 2017 +0100
+++ b/refs.xsl Thu Apr 06 16:47:53 2017 +0100
@@ -1,11 +1,11 @@
-
+
 ("[^"]*")|(\{[^}]+})|(,)|([^=\-+*/();:,.$<>^!]+(?:\.[^=\-+*/();:,.$<>^!]+)*\()|([)])|(^=|\()|((?:'[^']+')|(?:\[[0-9]+\][^!]*))|(\$?[A-Z]+\$?[0-9]+)|([a-zA-Z_\\][a-zA-Z0-9._]*)|(.)
-
+
@@ -16,9 +16,14 @@
+
-
+
+
+
+
+
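
The tokenising regular expression in the refs.xsl hunk can be tried out on its own. The following is a minimal sketch in Python, not the stylesheet's own code: how refs.xsl applies the pattern is not visible in this diff, and Python's `re` dialect may differ from the XPath regex dialect in edge cases. The function name `tokenize` and the sample formula are assumptions made for illustration. Because each top-level alternative is exactly one capturing group, the number of the group that matched identifies the kind of token.

```python
import re

# Pattern copied verbatim from the refs.xsl hunk. Every top-level alternative
# is a single capturing group (the inner groups are non-capturing), so
# match.lastindex tells us which alternative fired.
TOKEN_RE = re.compile(
    r'''("[^"]*")|(\{[^}]+})|(,)|([^=\-+*/();:,.$<>^!]+(?:\.[^=\-+*/();:,.$<>^!]+)*\()|([)])|(^=|\()|((?:'[^']+')|(?:\[[0-9]+\][^!]*))|(\$?[A-Z]+\$?[0-9]+)|([a-zA-Z_\\][a-zA-Z0-9._]*)|(.)'''
)


def tokenize(formula):
    """Return (group_number, text) pairs, one per token (hypothetical helper)."""
    return [(m.lastindex, m.group()) for m in TOKEN_RE.finditer(formula)]


if __name__ == "__main__":
    # Sample formula invented for this sketch; it is not taken from the corpus.
    for group, text in tokenize("=SUM(A1,$B$2)"):
        print(group, repr(text))
```

On this sample the leading `=` falls to group 6, `SUM(` to group 4 (a name followed by an opening parenthesis), the two cell references to group 8, the comma to group 3 and the closing parenthesis to group 5. Group 10, the final `(.)`, is the catch-all for any character no earlier alternative claims.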