diff notes.txt @ 37:ac3cd8de7a10

towards big rework of tokenisation
author Henry S. Thompson <ht@markup.co.uk>
date Tue, 25 Apr 2017 18:30:04 +0100
parents ae605b77d1e4
children 4c6a341e75da
--- a/notes.txt	Tue Apr 25 12:24:31 2017 +0100
+++ b/notes.txt	Tue Apr 25 18:30:04 2017 +0100
@@ -1,3 +1,33 @@
+Tokenisation patterns, derived from parse.py, which was itself derived from
+ https://sites.google.com/site/e90e50/random-topics/tool-for-parsing-formulas-in-excel
+   and
+ parser_formule_with_textbox_v01_2003.xla
+   (linked to from that page)
+
+1 ("[^"]*") q
+  A string constant (delimited by double quotes)
+2 (\{[^}]+}) m
+  A constant matrix
+3 (,) c
+  A list (function parameter) separator
+4 ([^=\-+*/();:,.$<>^!]+(?:\.[^=\-+*/();:,.$<>^!]+)*\() f
+  A function name followed by an opening parenthesis
+5 ([)]) p
+  A closing parenthesis
+6 (^=|\() l
+  The beginning of the formula or an opening
+         parenthesis (not part of a function)
+7 ((?:(?:'[^']+')|(?:\[[0-9]+\][^!]*)|(?:[a-zA-Z_][a-zA-Z0-9._]*)!)) n
+  A sheet name (either delimited by single quotes, or
+                a bracketed number plus optional string, as in an external
+                workbook reference such as [1]Sheet1, or a simple name (syntax is a _guess_))
+8 (\$?[A-Z]+\$?[0-9]+) s or r
+  A cell reference
+9 ([a-zA-Z_\\][a-zA-Z0-9._]*) v
+  A name (always for a variable?)
+10 (.) x
+  Single characters not matched by the previous patterns
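The "not matched by the previous patterns" wording implies the ten patterns are tried in order at each position, the first match winning. A minimal Python sketch of that loop follows, assuming that reading; the names `PATTERNS` and `tokenise` are hypothetical and not taken from parse.py, and pattern 8 is always labelled s here because the notes do not say how s and r are told apart.

```python
import re

# Patterns 1-10 from the list above, in priority order, with their type codes.
# Pattern 8 is labelled "s" only; the s/r split is left to a later pass (assumption).
PATTERNS = [
    ("q", r'("[^"]*")'),
    ("m", r"(\{[^}]+})"),
    ("c", r"(,)"),
    ("f", r"([^=\-+*/();:,.$<>^!]+(?:\.[^=\-+*/();:,.$<>^!]+)*\()"),
    ("p", r"([)])"),
    ("l", r"(^=|\()"),
    ("n", r"((?:(?:'[^']+')|(?:\[[0-9]+\][^!]*)|(?:[a-zA-Z_][a-zA-Z0-9._]*)!))"),
    ("s", r"(\$?[A-Z]+\$?[0-9]+)"),
    ("v", r"([a-zA-Z_\\][a-zA-Z0-9._]*)"),
    ("x", r"(.)"),
]
COMPILED = [(code, re.compile(rx)) for code, rx in PATTERNS]


def tokenise(formula):
    """Return a list of (type_code, text) pairs for an Excel formula string."""
    tokens = []
    pos = 0
    while pos < len(formula):
        for code, rx in COMPILED:
            # rx.match(s, pos) anchors the match at pos, but '^' still matches
            # only at index 0, so pattern 6's '^=' fires only for the leading '='.
            m = rx.match(formula, pos)
            if m:
                tokens.append((code, m.group(1)))
                pos = m.end()
                break
        else:
            # Only reachable for a newline, which (.) does not match.
            raise ValueError(f"no pattern matches at position {pos}")
    return tokens


if __name__ == "__main__":
    print(tokenise("=SUM(A1:B2)"))
    # [('l', '='), ('f', 'SUM('), ('s', 'A1'), ('x', ':'), ('s', 'B2'), ('p', ')')]
    print(tokenise("='My Sheet'!A1"))
    # [('l', '='), ('n', "'My Sheet'"), ('x', '!'), ('s', 'A1')]
```

Running the patterns this way makes two things visible. First, the trailing ! in pattern 7 binds only to its third (simple-name) alternative, so a quoted sheet name comes out as n followed by a separate x token for the !. Second, since no pattern covers numeric literals, digits reach pattern 10 and are emitted one character at a time.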
+----------
 You can't depend on 
   <f si="..." t="shared"/>
  That is, it's _true_, but you can have a table with shared formulae
@@ -65,7 +95,10 @@
 references FIXED
  Solo local vars are recursively dereferenced
  The definition table is in workbook.xml definedNames/definedName[@name=$name]/.
-  Sheet name to filename mapping for locals is in workbook.xml sheets/sheet[@name=$sname]/@sheetId
+  Sheet name to filename mapping for locals is in workbook.xml
+    sheets/sheet[@name=$sname]/@sheetId
+    Sheet names appear in definedName values single-quoted if (iff?) they contain spaces
+      (or other special characters?)
  ??? Variables on the left or right of a range are just looked up: if they are
   complex, no recursion is done. The _semantics_ of this case are not clear to
   me; need a real-life example...
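The workbook.xml lookups in the references note can be sketched with Python's standard xml.etree.ElementTree. This is a minimal sketch under stated assumptions. The helper names (defined_names, sheet_ids, dereference, split_sheet_ref) and SAMPLE_WORKBOOK are hypothetical. "Recursively dereferenced" is read as following a definition that is itself just another defined name. Following the note, sheets are mapped via @sheetId, although OOXML formally ties each sheet element to its part through r:id and xl/_rels/workbook.xml.rels, so sheetId need not match the sheetN.xml filename in every workbook.

```python
import re
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by xl/workbook.xml.
NS = {"x": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Hypothetical test for "a definition that is just another name":
# same shape as tokenisation pattern 9 (v).
SOLO_NAME = re.compile(r"[a-zA-Z_\\][a-zA-Z0-9._]*")

# Hypothetical minimal workbook.xml (real files also carry r:id on each sheet).
SAMPLE_WORKBOOK = """<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheets>
    <sheet name="Sheet1" sheetId="1"/>
    <sheet name="My Data" sheetId="2"/>
  </sheets>
  <definedNames>
    <definedName name="Rate">'My Data'!$B$2</definedName>
    <definedName name="TaxRate">Rate</definedName>
  </definedNames>
</workbook>"""


def defined_names(workbook_xml):
    """Map definedNames/definedName/@name to its text (ignores @localSheetId scoping)."""
    root = ET.fromstring(workbook_xml)
    return {d.get("name"): (d.text or "")
            for d in root.findall("x:definedNames/x:definedName", NS)}


def sheet_ids(workbook_xml):
    """Map sheets/sheet/@name to @sheetId, as the note describes."""
    root = ET.fromstring(workbook_xml)
    return {s.get("name"): s.get("sheetId")
            for s in root.findall("x:sheets/x:sheet", NS)}


def dereference(name, names, seen=None):
    """Follow definitions that are just another defined name; stop at anything else."""
    seen = set() if seen is None else seen
    if name in seen:
        raise ValueError(f"circular definition involving {name!r}")
    seen.add(name)
    body = names[name]  # KeyError if the name is not defined
    if SOLO_NAME.fullmatch(body) and body in names:
        return dereference(body, names, seen)
    return body


def split_sheet_ref(ref):
    """Split "'My Data'!$B$2" or "Sheet1!A1" into (sheet name, cell part)."""
    sheet, _, cell = ref.rpartition("!")  # sheet is "" if the ref is unqualified
    if sheet.startswith("'") and sheet.endswith("'"):
        # Strip the quotes; Excel doubles an apostrophe inside a quoted sheet name.
        sheet = sheet[1:-1].replace("''", "'")
    return sheet, cell


if __name__ == "__main__":
    names = defined_names(SAMPLE_WORKBOOK)
    sheet, cell = split_sheet_ref(dereference("TaxRate", names))
    print(sheet, cell, sheet_ids(SAMPLE_WORKBOOK)[sheet])  # My Data $B$2 2
```

The cycle check in dereference is a defensive addition; the note does not say whether circular definitions can occur.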