view annotate.xml @ 64:823ac978f4ab

scrappy first pass w/o auto features
author Henry S. Thompson <ht@markup.co.uk>
date Mon, 12 Jun 2017 16:46:45 +0200

<?xml version='1.0'?>
<?xml-stylesheet type="text/xsl" href="../../../lib/xml/doc.xsl" ?>
<!DOCTYPE doc SYSTEM "../../../lib/xml/doc.dtd" >
<doc>
 <head>
  <title>Spreadsheet annotation spec</title>
  <author>Henry S. Thompson</author>
  <date>$Id$</date>
 </head>
 <body>
  <div>
   <title>Introduction</title>
   <p>This is a first pass at defining an annotation menu structure for
spreadsheets. The assumption is that we'll have an 'Annotate' entry in the Excel
right-button menu for selected regions, which will pop up region-appropriate
menus.</p>
  </div>
  <div>
   <title>Top-level menus</title>
   <note>If the selection is a single cell, I guess we try popping up a selection-type menu,
with choices 'Row', 'Column', 'Matrix' and 'None' (the last resulting in <code>_Nnnn</code>).</note>
   <p>Choosing 'Annotate' from the right-button menu over a selected range will create a new
defined name of the form <code>_Xnnn</code>, where <code>X</code> is one of <code>R</code>,
<code>C</code> or <code>M</code> for <name>r</name>ows (horizontal range
selection), <name>c</name>olumns (vertical range selection) or
<name>m</name>atrices (two-dimensional range selection) respectively, and
<code>nnn</code> is a serial number for the relevant selection type.</p>
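   <p>As a rough illustration of the naming scheme only, the following Python
sketch picks the type letter from the shape of the selection and hands out
serial numbers per type.  The names <code>selection_type</code> and
<code>NameAllocator</code>, the three-digit zero-padding and the start at 1
are all assumptions made for the example; nothing above fixes them, and a real
add-in would also have to avoid names already present in the workbook.</p>
   <display><code>
```python
from collections import defaultdict


def selection_type(n_rows, n_cols):
    """Map the shape of a selection to its type letter.

    Returns None for a single cell, where the annotator has to choose.
    """
    if not (n_rows >= 1 and n_cols >= 1):
        raise ValueError("a selection needs at least one row and one column")
    if n_rows == 1 and n_cols == 1:
        return None
    if n_rows == 1:
        return "R"  # horizontal range
    if n_cols == 1:
        return "C"  # vertical range
    return "M"      # two-dimensional range


class NameAllocator:
    """Hand out _Xnnn names, with one serial counter per type letter."""

    def __init__(self):
        self._next = defaultdict(lambda: 1)  # assumption: serials start at 1

    def new_name(self, type_letter):
        if type_letter not in ("R", "C", "M", "N"):
            raise ValueError("unknown selection type: %r" % (type_letter,))
        serial = self._next[type_letter]
        self._next[type_letter] += 1
        return "_%s%03d" % (type_letter, serial)  # assumption: zero-padded
```
</code></display>
   <p>Under these assumptions the first two row selections would be named
<code>_R001</code> and <code>_R002</code>, and the first matrix
<code>_M001</code>.</p>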
   <p>The comment field (attribute in the XML) of the defined name should contain a
feature-value dictionary, represented in JSON/Python style, that is, using the
following BNF:</p>
   <display><code>fvd := '{' ( fvp ( ',' fvp )* )? '}'
fvp := key ':' value
key := string
value := string | number | fvd | array
string := '"' char* '"'
array := '[' ( value ( ',' value )* )? ']'</code></display>
<p>with whitespace ignored, 'number' being the usual integer or decimal
representation and 'char' being ASCII-only (?).</p>
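   <p>A minimal sketch of how such a comment string might be checked, in
Python, assuming the grammar is meant as a subset of JSON in which a value may
itself be a dictionary.  The names <code>parse_fvd</code> and
<code>check_value</code> are invented for the example.  The standard
<code>json</code> module does the parsing; it is more liberal than the BNF (it
accepts exponent notation and the keywords true, false and null), so the sketch
rejects the keywords explicitly but makes no attempt to police number syntax.
The ASCII test can be switched off, since that restriction is still an open
question.</p>
   <display><code>
```python
import json


def check_value(value, ascii_only=True):
    """Raise ValueError if a parsed value falls outside the BNF."""
    # bool is a subclass of int in Python, so rule it out before the number test.
    if value is None or isinstance(value, bool):
        raise ValueError("true, false and null are JSON but not part of the BNF")
    if isinstance(value, str):
        if ascii_only and not value.isascii():
            raise ValueError("non-ASCII string: %r" % (value,))
    elif isinstance(value, (int, float)):
        pass  # integer or decimal number
    elif isinstance(value, list):
        for item in value:
            check_value(item, ascii_only)
    elif isinstance(value, dict):
        for key, item in value.items():
            check_value(key, ascii_only)  # keys are strings after json.loads
            check_value(item, ascii_only)
    else:
        raise ValueError("unexpected value: %r" % (value,))


def parse_fvd(text, ascii_only=True):
    """Parse the comment of a defined name into a feature-value dictionary."""
    parsed = json.loads(text)  # raises json.JSONDecodeError, a kind of ValueError
    if not isinstance(parsed, dict):
        raise ValueError("the top level must be a dictionary")
    check_value(parsed, ascii_only)
    return parsed
```
</code></display>
   <p>For instance,
<code>parse_fvd('{"type": "data", "content": {"type": "integer"}}')</code>
returns the corresponding nested Python dictionary, whereas a top-level array
or a <code>true</code> value raises <code>ValueError</code>.</p>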
   <p>If possible, the selected range should appear as the value of the new
name <emph>without</emph> single-quotes.</p>
   <p>Some features can and should be computed, others require annotator
decision.  Some features and/or feature values are unique to a particular selection type, others are
shared across all or some types.</p>
   <p>Accordingly, in order for the annotator to supply the required
information, a form should pop up with all the features appropriate to the
selection type.  Literal or array-valued form fields will just require a value
menu (allowing multiple selection in the array-valued case), but features with
dictionary values will require cascading sub-forms.</p>
   <p>The next two sections document the annotator-supplied and
software-supplied features.  Except for 'comment', whose value is free text,
allowed values are tabulated.</p>
  </div>
  <div>
   <title>Annotator-supplied features</title>
   <div>
    <title>All types</title>
    <list type="defn">
     <item term="comment">string: unconstrained.  Free text is by its nature difficult to
exploit, so this should really only be used to document a problem with the available
feature/value vocabulary or structure.</item>
   </list>
   </div>
   <div>
    <title>Both one-dimensional types</title>
    <list type="defn">
     <item term="type">string: <code>"data"|"key"|"label"</code>
      <p>"key" is my preferred word for what Dresden call "attribute".  In the
simpler cases, think of it as what you might use in an HLOOKUP or VLOOKUP cell.</p>
     </item>
     <item term="content">fvd:
      <list type="defn">
       <item term="type">string: <code>"currency"|"date"|"datetime"|"integer"|"float"|"key"|"label"|"string"|"time"</code></item>
      </list>
      <p>The "key" and "label" content types are for use (as in the Dresden
paper example) where compound keys/labels are indicated by row or column spans.</p>
     </item>
    </list>
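    <p>The vocabulary for the one-dimensional types can be checked mechanically
once the comment has been parsed.  The Python sketch below is only an
illustration: the name <code>check_one_dimensional</code> is invented, and it
assumes that both <code>type</code> and <code>content</code> are required and
that <code>comment</code> is optional, which the lists above do not actually
state.</p>
    <display><code>
```python
RANGE_TYPES = {"data", "key", "label"}
CONTENT_TYPES = {
    "currency", "date", "datetime", "integer", "float",
    "key", "label", "string", "time",
}


def _allowed(value, vocabulary):
    """True if value is a string drawn from the given vocabulary."""
    return isinstance(value, str) and value in vocabulary


def check_one_dimensional(features):
    """Return a list of problems found in a row or column annotation."""
    problems = []
    if not _allowed(features.get("type"), RANGE_TYPES):
        problems.append("type must be one of %s" % sorted(RANGE_TYPES))
    content = features.get("content")
    if not isinstance(content, dict):
        problems.append("content must be a dictionary")
    elif not _allowed(content.get("type"), CONTENT_TYPES):
        problems.append("content/type must be one of %s" % sorted(CONTENT_TYPES))
    comment = features.get("comment")
    if comment is not None and not isinstance(comment, str):
        problems.append("comment must be a string")
    return problems
```
</code></display>
    <p>An empty list means the annotation uses only the tabulated values; a
form could run the same check before writing the comment field.</p>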
   </div>   
   <div>
    <title>Matrices</title>
    <list type="defn">
     <item term="type">string: <code>"table"|"data"|"label"|"condition"</code></item>
     <item term="content">fvd:
      <list type="defn">
       <item term="type">string: <code>"rows"|"columns"|"cells"</code></item>
      </list>
     </item>     
    </list>    
   <p>When a form for a matrix is completed, if <code>type</code> is 'data' a pop-up
should offer to auto-fill based on <code>content/type</code>.  If chosen, this
fills the matrix with named ranges of the appropriate orientation (rows, columns
or, in the case of <code>cells</code>, both).  If it's not too hard, it would be
good to go on to pop up the form for each generated range in turn, either
having asked in advance for appropriate features whose values are the same for
all the ranges, or carrying forward values from one to the next as defaults.</p>
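   <p>To make the auto-fill step concrete, here is a small Python sketch that
lists the ranges it would generate for a matrix.  It only computes A1-style
addresses; creating the defined names and popping up the forms is left out.
The function names and the use of 1-based row and column indices are
assumptions made for the example.</p>
   <display><code>
```python
def column_letters(index):
    """Convert a 1-based column index to Excel letters (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def auto_fill(first_row, first_col, last_row, last_col, content_type):
    """Return the A1-style ranges that auto-fill would name inside a matrix."""
    left, right = column_letters(first_col), column_letters(last_col)
    rows = ["%s%d:%s%d" % (left, r, right, r)
            for r in range(first_row, last_row + 1)]
    columns = ["%s%d:%s%d" % (column_letters(c), first_row,
                              column_letters(c), last_row)
               for c in range(first_col, last_col + 1)]
    if content_type == "rows":
        return rows
    if content_type == "columns":
        return columns
    if content_type == "cells":
        return rows + columns  # both orientations
    raise ValueError("unknown content type: %r" % (content_type,))
```
</code></display>
   <p>For the matrix B2:D3, <code>auto_fill(2, 2, 3, 4, "rows")</code> gives
<code>B2:D2</code> and <code>B3:D3</code>, while <code>"columns"</code> gives
<code>B2:B3</code>, <code>C2:C3</code> and <code>D2:D3</code>.</p>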
   </div>
  </div>
  <div>
   <title>Software-supplied features</title>
  </div>
  <div>
   <title>Issues</title>
   <div>
    <title>Compound labels and keys</title>
    <p>There's a problem with
defining the structure I want for compound labels and keys, in that you can't
for example select the 6th column of rows 2 through 4 in the Dresden example,
to denote the "Group stage/Match 2/GA" column label:</p>
    <image source="dresden.jpg" textGloss="table with three-row labels involving column spans" original="http://upcommons.upc.edu/bitstream/handle/2117/100584/KDIR_2016_47_CR.pdf" width="75%">
<licence></licence>
    </image>
    <p>Excel would allow you to define a name for
F2:F4 in that spreadsheet, but I don't <emph>think</emph> you can select that
range with the mouse.</p>
   </div>
   <div>
    <title>Metadata</title>
    <p>Nothing in the above proposal provides a way to annotate what Dresden
call 'Metadata'.  We could simply provide another 1-D type, e.g. 'meta', I suppose, or just allow uninteresting regions to remain unannotated.
There is a difference between, on the one hand, informative prose such as occurs in the Dresden
example with the Metadata label and, on the other, regions whose type is just not obvious
(e.g. lots of those in the Kenneth Lay sheet from the Enron dataset).</p>
   </div>
  </div>
 </body>
</doc>