# HG changeset patch
# User Henry S. Thompson
# Date 1497278805 -7200
# Node ID 823ac978f4abf09aaf169ac74f2cf612b2d4ab64
# Parent 9cf99d5131970154f7ca8d2c71016c2f82150be8
scrappy first pass w/o auto features

diff -r 9cf99d513197 -r 823ac978f4ab annotate.xml
--- a/annotate.xml Mon Jun 12 13:40:56 2017 +0200
+++ b/annotate.xml Mon Jun 12 16:46:45 2017 +0200
@@ -38,16 +38,91 @@
representation and 'char' being ASCII-only (?).

If possible, the selected range should appear as the value of the new name without single-quotes.

-Some top-level features should be computed, others require annotator
-decision. Some features are unique to a particular selection type, others are
+Some features can and should be computed, others require annotator
+decision. Some features and/or feature values are unique to a particular selection type, others are
shared across all or some types.

Accordingly, in order for the annotator to supply the required information, a form should pop up with all the features appropriate to the selection type. Literal or array-valued form fields will just require a value menu (allowing multiple selection in the array-valued case), but features with dictionary values will require cascading sub-forms.
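The choice of form widget described above can be sketched as follows. This is a minimal illustration, not part of the patch: the `Feature` dataclass, its `kind` tags ("literal", "array", "dict") and `build_form` are hypothetical names, and the dictionary-valued `rel`/`parent` example is only borrowed from the comments in sample2y/workbook_indented.xml.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Feature:
    """One annotator-supplied feature (hypothetical schema, not from annotate.xml)."""
    name: str
    kind: str                                                 # "literal", "array" or "dict"
    values: list[str] = field(default_factory=list)          # allowed values; empty means free text
    subfeatures: list[Feature] = field(default_factory=list)  # only used when kind == "dict"


def build_form(features: list[Feature]) -> list[dict]:
    """Describe the widget the pop-up form would show for each feature."""
    form = []
    for f in features:
        if f.kind == "literal":
            # Single value: a menu if the values are tabulated, otherwise a text box.
            form.append({"name": f.name, "widget": "menu" if f.values else "text",
                         "options": f.values})
        elif f.kind == "array":
            # Array value: a menu that allows multiple selection.
            form.append({"name": f.name, "widget": "multi-menu", "options": f.values})
        elif f.kind == "dict":
            # Dictionary value: recurse to build a cascading sub-form.
            form.append({"name": f.name, "widget": "sub-form",
                         "fields": build_form(f.subfeatures)})
        else:
            raise ValueError(f"unknown feature kind: {f.kind!r}")
    return form


# Example: 'comment' is free text, 'type' uses the 1-D vocabulary, 'rel' is a dictionary.
form = build_form([
    Feature("comment", "literal"),
    Feature("type", "literal", ["data", "key", "label"]),
    Feature("rel", "dict", subfeatures=[Feature("parent", "literal")]),
])
```

Recursion is what gives the cascading sub-forms: a dictionary-valued feature simply yields a nested form built from its own sub-features.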

-The sections below tabulate the annotator-supplied features and their
-possible values.
+The next two sections document the annotator-supplied and
+software-supplied features. Except for 'comment', whose value is free text,
+allowed values are tabulated.

+ Annotator-supplied features
+ All types
+ string: unconstrained. By its nature difficult to
+exploit, really should only be used to document a problem with the available
+feature&value vocabulary or structure.
+ Both one-dimensional types
+ string: "data"|"key"|"label"
+"key" is my preferred word for what Dresden call "attribute". In the
+simpler cases, think of it as what you might use in an HLOOKUP or VLOOKUP cell.

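To make the HLOOKUP/VLOOKUP analogy concrete, here is a toy exact-match lookup in which a 'key' range picks out a cell of a 'data' range. The function name `vlookup` and the sample values are made up for illustration. HLOOKUP is the same idea with rows in place of columns.

```python
def vlookup(key, key_range, data_range):
    """Exact-match lookup, like VLOOKUP(key, table, col, FALSE): the position
    of `key` in the key range selects the matching cell of the data range."""
    for k, v in zip(key_range, data_range):
        if k == key:
            return v
    raise KeyError(key)


# Toy example: one 'key' column range and one 'data' column range.
keys = ["north", "south", "east"]
sales = [120, 95, 143]
print(vlookup("south", keys, sales))  # 95
```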
+ fvd:
+ string: "currency"|"date"|"datetime"|"integer"|"float"|"key"|"label"|"string"|"time"

+The "key" and "label" content types are for use (as in the Dresden
+paper example) where compound keys/labels are indicated by row or column spans.
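One way compound labels might be read off spanned header rows is to forward-fill each row and then join the header cells column by column. The sketch assumes a horizontal span is represented by its value in the leftmost cell and `None` in the cells it covers. The header is a toy, not the actual Dresden layout, and vertical spans are not handled.

```python
def fill_spans(row):
    """Forward-fill empty (None) cells, assuming each one is covered by a
    horizontal span that starts at the nearest non-empty cell to its left."""
    filled, current = [], None
    for cell in row:
        if cell is not None:
            current = cell
        filled.append(current)
    return filled


def compound_labels(header_rows, sep="/"):
    """Join the filled header rows column by column into compound labels."""
    filled = [fill_spans(row) for row in header_rows]
    return [sep.join(c for c in column if c is not None) for column in zip(*filled)]


# Toy three-row header (not the actual Dresden layout).
header = [
    ["Group stage", None, None, None],
    ["Match 1", None, "Match 2", None],
    ["GF", "GA", "GF", "GA"],
]
print(compound_labels(header))
# ['Group stage/Match 1/GF', 'Group stage/Match 1/GA', 'Group stage/Match 2/GF', 'Group stage/Match 2/GA']
```

The forward-fill assumption is the weak point: a genuinely blank header cell would be filled as if it were part of a span, so a real implementation would read the sheet's merged-cell ranges instead.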

+ Matrices
+ string: "table"|"data"|"label"|"condition"
+ fvd:
+ string: "rows"|"columns"|"cells"
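The tabulated vocabularies lend themselves to simple validation of a completed form. In this sketch the allowed values are copied from the tables above. The feature names `content` and `orientation` are assumptions, as is treating the first feature of each type as `type`, because the original feature names are not recoverable from this text.

```python
# Allowed values transcribed from the tables; 'content' and 'orientation'
# are hypothetical feature names.
VOCAB = {
    "1d": {
        "type": {"data", "key", "label"},
        "content": {"currency", "date", "datetime", "integer", "float",
                    "key", "label", "string", "time"},
    },
    "matrix": {
        "type": {"table", "data", "label", "condition"},
        "orientation": {"rows", "columns", "cells"},
    },
}
FREE_TEXT = {"comment"}  # allowed on all types, value unconstrained


def validate(selection_kind, features):
    """Return a list of problems; an empty list means the annotation is valid."""
    allowed = VOCAB[selection_kind]
    problems = []
    for name, value in features.items():
        if name in FREE_TEXT:
            continue
        if name not in allowed:
            problems.append(f"unknown feature {name!r} for {selection_kind}")
        elif value not in allowed[name]:
            problems.append(f"{name}={value!r} not in {sorted(allowed[name])}")
    return problems


print(validate("matrix", {"type": "data", "orientation": "rows", "comment": "ok"}))  # []
```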

+When a form for a matrix is completed, if type is 'data' a pop-up should offer to auto-fill
+based on content/type. If chosen, this fills the matrix with
+named ranges of the appropriate orientation (rows, columns or, in the case of
+cells, both). If
+it's not too hard, it would be good to go on to pop up the form for each
+generated range
+in turn, either having asked in advance for appropriate
+features whose values are the same for all the ranges, or carrying forward
+values from one to the next as defaults.
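The auto-fill step might generate absolute references in the same form as the defined names in sample2y/workbook_indented.xml (e.g. Sheet1!$B$2:$B$61). This is a rough sketch, assuming the data area is given as 1-based row/column bounds. `auto_fill`, `abs_ref` and `col_letter` are hypothetical helpers, and 'cells' is read as producing both the row and the column ranges.

```python
def col_letter(n: int) -> str:
    """Convert a 1-based column number to Excel letters: 1 -> A, 27 -> AA."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def abs_ref(sheet: str, r1: int, c1: int, r2: int, c2: int) -> str:
    """Absolute A1-style range reference, e.g. Sheet1!$B$2:$B$61."""
    return f"{sheet}!${col_letter(c1)}${r1}:${col_letter(c2)}${r2}"


def auto_fill(sheet, top, left, bottom, right, orientation):
    """Split a matrix's data area into one named-range reference per row and/or column."""
    columns = [abs_ref(sheet, top, c, bottom, c) for c in range(left, right + 1)]
    rows = [abs_ref(sheet, r, left, r, right) for r in range(top, bottom + 1)]
    if orientation == "columns":
        return columns
    if orientation == "rows":
        return rows
    if orientation == "cells":
        return rows + columns  # 'cells' gets both orientations
    raise ValueError(f"unknown orientation: {orientation!r}")


print(auto_fill("Sheet1", 2, 1, 61, 3, "columns"))
# ['Sheet1!$A$2:$A$61', 'Sheet1!$B$2:$B$61', 'Sheet1!$C$2:$C$61']
```

Each generated reference could then seed its own pop-up form, with defaults carried forward from the previous one as the paragraph above suggests.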

+ Software-supplied features
+ Issues
+ Compound labels and keys

+There's a problem with
+defining the structure I want for compound labels and keys, in that you can't
+for example select the 6th column of rows 2 through 4 in the Dresden example,
+to denote the "Group stage/Match 2/GA" column label:


+Excel would allow you to define a name for
+F2:F4 in that spreadsheet, but I don't think you can select that
+range with the mouse.
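If F2:F4 really can't be selected with the mouse, one workaround is to write the defined name directly. The sketch below builds a SpreadsheetML `definedName` element for a single column over a run of rows. The name `GroupStage_Match2_GA` and the sheet name `Sheet1` are assumptions (Excel names may not contain '/'), and the element is built without the SpreadsheetML namespace for brevity.

```python
import xml.etree.ElementTree as ET


def col_letter(n: int) -> str:
    """Convert a 1-based column number to Excel letters: 6 -> F, 27 -> AA."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def defined_name(name: str, sheet: str, col: int, first_row: int, last_row: int) -> ET.Element:
    """Build a definedName element covering one column from first_row to last_row."""
    el = ET.Element("definedName", {"name": name})
    c = col_letter(col)
    el.text = f"{sheet}!${c}${first_row}:${c}${last_row}"
    return el


# 6th column, rows 2 through 4 -> F2:F4 (hypothetical name and sheet).
el = defined_name("GroupStage_Match2_GA", "Sheet1", 6, 2, 4)
print(ET.tostring(el, encoding="unicode"))
```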

+ Metadata

+Nothing in the above proposal provides a way to annotate what Dresden
+call 'Metadata'. We could simply provide another 1-D type, e.g. 'meta', I suppose, or just allow uninteresting regions to remain unannotated.
+There is a difference between on the one hand informative prose such as occurs in the Dresden
+example with the Metadata label, and regions whose type is just not obvious (as
+e.g. lots in the Kenneth Lay sheet from the Enron dataset...

diff -r 9cf99d513197 -r 823ac978f4ab dresden.jpg
Binary file dresden.jpg has changed
diff -r 9cf99d513197 -r 823ac978f4ab notes.txt
--- a/notes.txt Mon Jun 12 13:40:56 2017 +0200
+++ b/notes.txt Mon Jun 12 16:46:45 2017 +0200
@@ -228,6 +228,12 @@
------------
@@ string identity, to say nothing of actual value, is lost -- fix?
@@ row/column/both spans [what?]
+
+Now using up to 4 border colours to reflect incoming refs
+ @@ sort these before clipping to 4 to reflect frequency
+ @@ use vertical layering in the cell to get the borders
+ more evident when a background colour is present? Already
+ happening, just a bit hard to see, need a 1px space?
------
enron1k/kenneth_lay__19506 contains this formula:
diff -r 9cf99d513197 -r 823ac978f4ab sample2y/workbook_indented.xml
--- a/sample2y/workbook_indented.xml Mon Jun 12 13:40:56 2017 +0200
+++ b/sample2y/workbook_indented.xml Mon Jun 12 16:46:45 2017 +0200
@@ -1,4 +1,4 @@
-
+
@@ -14,90 +14,69 @@
-
+ rel:{label:down,parent:_M000}}"> "Sheet1!$A$1:$H$1"
-
+ rel:{parent:_M000}}"> "Sheet1!$A$2:$A$61"
-
+ rel:{parent:_M000,reffed:[_C003]}}"> "Sheet1!$B$2:$B$61"
-
+ rel:{parent:_M000}}"> "Sheet1!$D$2:$D$61"
-
+ rel:{parent:_M000,reffed:[_C009]}}"> "Sheet1!$E$2:$E$61"
-
+ rel:{parent:_M000,reffed:[_C003]}}"> "Sheet1!$F$2:$F$61"
-
+ rel:{parent:_M000}}"> "Sheet1!$H$2:$H$61"
-
+ rel:{parent:_M000}}"> "Sheet1!$J$2:$J$61"
-
+ rel:{parent:_M000,refs:[_C001,_C005]}}"> "Sheet1!$C$2:$C$61"
-
+ rel:{parent:_M000,refs:[_C004,_C005]}}"> "Sheet1!$G$2:$G$61"
-
+ data:[_C001,_C002,_C008,_C003,_C004,
+ _C005,_C009,_C006,_C007]}}"> Sheet1!$A$1:$J$61
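The notes.txt to-do about sorting incoming refs by frequency before clipping to four border colours could look something like this. The input format (a list of referring range ids such as `_C001`, one entry per incoming reference) and the id-to-colour `palette` are assumptions for illustration.

```python
from collections import Counter


def border_colours(incoming_refs, palette, limit=4):
    """Pick up to `limit` border colours for a cell, most frequent referrer first.
    Ties keep the order in which referrers were first seen (Counter.most_common)."""
    counts = Counter(incoming_refs)
    top = [ref for ref, _ in counts.most_common(limit)]
    return [palette[ref] for ref in top]


# Hypothetical incoming refs for one cell and a hypothetical colour palette.
incoming = ["_C001", "_C005", "_C001", "_C004", "_C005", "_C001", "_C009", "_C003"]
palette = {"_C001": "red", "_C005": "blue", "_C004": "green",
           "_C009": "orange", "_C003": "purple"}
print(border_colours(incoming, palette))  # ['red', 'blue', 'green', 'orange']
```

Sorting before clipping means the least frequent referrer (here `_C003`) is the one dropped, rather than whichever happened to come last.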