Visual-Meta design considerations

Authors
Jakob Voß
Date
2019-10-29
Identifier
http://jakobvoss.de/visual-meta/
Status
Work in progress
Repository
https://github.com/nichtich/visual-meta/
Feedback
Annotate via hypothes.is
Open a GitHub issue
License
CC BY 4.0

Abstract

This paper summarizes some design considerations of Visual-Meta, an approach to visibly embed metadata in documents.

Introduction

The connection between a document and information about it (metadata) is easily lost when the document is copied or converted. Some document formats support embedded metadata, such as EXIF in image files and XMP in PDF files, but the quality of this data is usually poor and it is rarely used when referencing the document. The loss of metadata is even more likely when only a section of a document is copied. Software for citation management and personal knowledge management, such as Zotero and Citavi, provides tools to keep track of document metadata and notes. However, the data first has to be provided in some form, and it is then kept in a database separate from the document, so it is only usable with that specific software.

Visual-Meta is an approach to keep the connection between document and metadata by visibly putting the metadata into a document. The idea was presented by Hegland (2019) at the Hypertext’19 conference where I first criticized it but then got into a fruitful discussion that eventually led to this document.1

Visual-Meta

In its current, informal specification Visual-Meta is basically a BibTeX entry embedded in a document. The BibTeX entry MUST start with @ directly followed by the document type2 and an opening {.

In its simplest form, a Visual-Meta record is just a BibTeX entry like this (referencing Engelbart (1962)):

A regular expression to catch potential start positions of Visual-Meta records is “@[a-z]+{”. To extract Visual-Meta from a document, a parser needs to find the last of these positions that starts a syntactically well-formed BibTeX entry.
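The extraction step can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it treats balanced braces as a stand-in for "syntactically well-formed", whereas a real tool would pass each candidate to a full BibTeX parser. The function names and the sample text (including its entry keys and entry types) are assumptions made for this example.

```python
import re

# Start pattern from the text: "@", lowercase letters, then an opening brace.
START = re.compile(r"@[a-z]+\{")


def balanced_end(text, open_pos):
    """Return the index just past the brace matching text[open_pos], or None."""
    depth = 0
    for i in range(open_pos, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i + 1
    return None


def extract_visual_meta(text):
    """Return the last candidate entry whose braces balance, or None."""
    for match in reversed(list(START.finditer(text))):
        # match.end() - 1 is the position of the opening brace.
        end = balanced_end(text, match.end() - 1)
        if end is not None:
            return text[match.start():end]
    return None


# Hypothetical document text: two complete entries and a truncated one.
sample = (
    "Some text citing @misc{old, title = {Earlier}} in passing.\n"
    "@report{engelbart1962, title = {Augmenting Human Intellect}, year = {1962}}\n"
    "A dangling fragment: @misc{broken, title = {oops"
)
print(extract_visual_meta(sample))
```

Scanning candidates from the end means the truncated fragment is skipped and the last complete entry is returned.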

BibTeX format

BibTeX is an outdated legacy format, but it is still the most common format for exchanging bibliographic records for citation management. In contrast to more modern alternatives such as CSL-JSON, BibTeX records may at least be familiar to some users. Moreover, its syntax is relatively concise, extensible, and not affected by line breaks. An extensive description of the BibTeX format is included in the biblatex manual (Kime, Wemheuer, and Lehman (2019); section 2).

Processing the BibTeX format requires a BibTeX parser. Such libraries exist for several programming languages, although not all of them fully respect the actual BibTeX grammar as specified in the BibTeX source code.3 BibTeX entries can include custom key-value pairs, so Visual-Meta can extend the format with additional fields.

BibTeX format extension

BibTeX supports lists of values in “separated value fields” using a comma as separator (for instance in the standard BibTeX field keywords). Values can optionally be wrapped in braces to support literal commas in field values. To further support nested fields, Visual-Meta introduces its own syntax on top of custom BibTeX fields like this:

The extended BibTeX syntax with custom fields can internally be converted to another format such as JSON to simplify processing. The BibTeX entry type and optional key should NOT be stored as additional fields named type and key, because both names are already standard BibTeX fields, but for instance as entrytype and entrykey (if needed). The example above could then be processed like this:

Note that the BibTeX extension does not support encoding arbitrary JSON in BibTeX: extended field values must not contain unbalanced braces or lists of lists.
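The conversion of an extended field value into a JSON-like structure might look like the Python sketch below. It is not part of any specification. It assumes that a value whose top-level items all have the form key = value is a map, that several comma-separated items form a list, and that a braced item that is not a map is a literal string (so lists never nest). The key pattern, the function names and the sample record are hypothetical.

```python
import json
import re

# Assumed shape of a key: a letter followed by letters, digits, "_" or "-".
KEY = re.compile(r"([A-Za-z][\w-]*)\s*=\s*(.*)", re.S)


def split_top_level(text):
    """Split at commas that are not nested inside braces."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced braces")
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if depth != 0:
        raise ValueError("unbalanced braces")
    parts.append("".join(current).strip())
    return parts


def unwrap(item):
    """Return (inner, True) if one brace pair encloses the whole item."""
    if item.startswith("{") and item.endswith("}"):
        depth = 0
        for i, ch in enumerate(item):
            depth += (ch == "{") - (ch == "}")
            if depth == 0:
                if i == len(item) - 1:
                    return item[1:-1].strip(), True
                return item, False
    return item, False


def parse_pairs(content):
    """Return a dict if every top-level item is `key = value`, else None."""
    matches = [KEY.fullmatch(item) for item in split_top_level(content)]
    if not all(matches):
        return None
    return {m.group(1): parse_item(m.group(2)) for m in matches}


def parse_item(item):
    """A braced item is a nested map or a literal string; lists do not nest."""
    inner, braced = unwrap(item)
    if not braced:
        return inner
    nested = parse_pairs(inner)
    return inner if nested is None else nested


def parse_field(content):
    """Parse a field value whose outer braces have already been removed."""
    pairs = parse_pairs(content)
    if pairs is not None:
        return pairs
    items = split_top_level(content)
    if len(items) > 1:
        return [parse_item(item) for item in items]
    return parse_item(items[0])


# Hypothetical record: entry type and key are stored as entrytype/entrykey.
field = "heading = {font-weight=bold}, footnote = {font-size=<small}"
record = {"entrytype": "misc", "entrykey": "example", "styles": parse_field(field)}
print(json.dumps(record, indent=2))
```

A plain value that happens to look like key = value would be misread as a map here, which is one reason the real syntax needs a precise definition.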

Visual-Meta fields

Non-standard BibTeX fields of Visual-Meta are yet to be specified. Fields proposed in earlier documents about Visual-Meta include:

  • document
  • citations
  • glossary
  • visible-meta

Custom Visual-Meta field names must be checked against the biblatex manual (Kime, Wemheuer, and Lehman 2019) so that they do not collide with existing fields. Some possible Visual-Meta fields are described in the following.

orcid

A list of the authors' ORCID iDs. This is useful to extend the author field. For instance:

Corresponding fields can be used for other name fields, e.g. editororcid, holderorcid… The orcid field should have the same number of elements as there are authors listed in the corresponding author field. If an author has no ORCID, an empty list element can be included.
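A consumer could align the two fields as in the following Python sketch. It assumes that names in the author field are separated by the word "and" (the usual BibTeX convention) and that the orcid field is a plain comma-separated list. The function name is hypothetical, and the sample iD is the one ORCID uses for its fictitious example researcher Josiah Carberry.

```python
import re


def pair_authors_with_orcids(author, orcid):
    """Pair each author name with its ORCID iD (None for empty elements)."""
    authors = [name.strip() for name in re.split(r"\s+and\s+", author.strip())]
    orcids = [value.strip() or None for value in orcid.split(",")]
    if len(authors) != len(orcids):
        raise ValueError(
            f"{len(authors)} authors but {len(orcids)} orcid elements"
        )
    return list(zip(authors, orcids))


# The first author has no ORCID, so the first list element is left empty.
pairs = pair_authors_with_orcids(
    "Doe, Jane and Carberry, Josiah",
    ", 0000-0002-1825-0097",
)
for name, orcid in pairs:
    print(name, orcid)
```

Raising an error on a length mismatch keeps a misaligned list from silently assigning an iD to the wrong author.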

styles

A list of layout rules that describe how to extract semantic parts from layout.4 The syntax of such rules has yet to be defined. At the least, layout properties such as font-size, color etc. should be based on the corresponding CSS names instead of inventing an additional style language. The style location language might also allow comparison and more complex locations, as known from query languages like XPath, but it should not be too complex. A preliminary example:

styles = {
  heading = {font-weight=bold},
  footnote = {font-size=<small}
}

fragments

A list of document fragments such as chapters, transcluded quotes5… Identification of fragments is relevant for assigning different metadata to different parts of the document.

fragments = {
  {selector = ..., author = ...},
  {selector = ..., author = ...}
}

Each fragment is identified by a selector that references the fragment using an existing locating method (Web annotation selectors, purple numbers…).

Copy & Paste with Visual-Meta

An important design goal of Visual-Meta is to support persistence of metadata for copy & paste. Two solutions exist to transport the metadata via the clipboard:

  1. Visual-Meta is appended as a BibTeX entry to the end of the document (at least in raw text format)
  2. Visual-Meta is included as an alternative data format, BibTeX, with content type application/x-bibtex

The second approach has the benefit (or disadvantage) of not including a BibTeX entry in the default format, so applications not aware of Visual-Meta will not behave differently when content is pasted into them. The first approach, however, may be easier to implement. See the Clipboard API for accessing the clipboard from web applications. If a section is copied, the Visual-Meta field selector should be added to point to the particular selection, as done with selectors of the Web Annotation Data Model (Sanderson, Ciccarese, and Young 2017).
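The first approach, combined with a selector for a copied section, might be sketched in Python as follows. The TextQuoteSelector type with its exact, prefix and suffix properties comes from the Web Annotation Data Model. How it is written as a nested BibTeX field, the function names, and the sample entry (its key and entry type) are assumptions for illustration only.

```python
def is_balanced(text):
    """Check that braces in text are balanced, as extended fields require."""
    depth = 0
    for ch in text:
        depth += (ch == "{") - (ch == "}")
        if depth < 0:
            return False
    return depth == 0


def text_quote_selector(document, start, end, context=20):
    """Build a selector field value for document[start:end]."""
    parts = {
        "exact": document[start:end],
        "prefix": document[max(0, start - context):start],
        "suffix": document[end:end + context],
    }
    if not all(is_balanced(value) for value in parts.values()):
        raise ValueError("selected text contains unbalanced braces")
    fields = ", ".join(f"{name} = {{{value}}}" for name, value in parts.items())
    return f"{{type = TextQuoteSelector, {fields}}}"


def copy_with_visual_meta(document, start, end, entry):
    """Return the copied text followed by the entry plus a selector field."""
    body = entry.rstrip()
    if not body.endswith("}"):
        raise ValueError("entry must end with a closing brace")
    # Drop the closing brace and any trailing comma before adding the field.
    body = body[:-1].rstrip().rstrip(",")
    selector = text_quote_selector(document, start, end)
    return f"{document[start:end]}\n\n{body},\n  selector = {selector}\n}}"


# Hypothetical document text and Visual-Meta entry.
document = "Visual-Meta is an approach to visibly embed metadata in documents."
entry = "@misc{voss2019,\n  title = {Visual-Meta design considerations}\n}"
start = document.index("visibly")
end = start + len("visibly embed metadata")
clip = copy_with_visual_meta(document, start, end, entry)
print(clip)
```

Because the entry is appended after the copied text, a receiving application that looks for the last well-formed BibTeX entry would still find it.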

Visual-Meta of this document

The following BibTeX entry is the last occurring in this document, so it will be used as Visual-Meta.

References

Engelbart, Douglas. 1962. “Augmenting Human Intellect: A Conceptual Framework.” Menlo Park, USA: SRI International. http://dougengelbart.org/content/view/138.

Hegland, Frode. 2019. “Visual-Meta: An Approach to Surfacing Metadata.” In Proceedings of the 2nd International Workshop on Human Factors in Hypertext, 31–33. https://doi.org/10.1145/3345509.3349281.

Kime, Philip, Moritz Wemheuer, and Philipp Lehman. 2019. “The Biblatex Package.” August 17, 2019. http://www.ctan.org/pkg/biblatex.

Nelson, Ted. 1965. “Complex Information Processing: A File Structure for the Complex, the Changing and the Indeterminate.” Essay. In Proceedings of the 1965 20th National Conference, 84–100. https://doi.org/10.1145/800197.806036.

Sanderson, Robert, Paolo Ciccarese, and Benjamin Young, eds. 2017. Web Annotation Data Model.

Voss, Jakob. 2019. “Infrastructure-Agnostic Hypertext,” June. https://jakobib.github.io/hypertext2019/.


  1. The pros and cons of putting metadata into documents are out of the scope of this paper.

  2. See Kime, Wemheuer, and Lehman (2019); section 2.1 for a list of common document types. To allow arbitrary document types, any sequence of letters a to z should be allowed.

  3. Available at http://ftp.rrze.uni-erlangen.de/ctan/biblio/bibtex/base/bibtex.web. See https://github.com/aclements/biblib#recognized-grammar for a formal grammar.

  4. The use case is the reverse of applying styles to elements. See https://stackoverflow.com/q/58503493/373710 for a discussion of how to extract document sections based on style.

  5. Hypertext as originally envisioned by Nelson (1965) requires transclusion but this feature is rarely implemented in current systems. See Voss (2019) for a summary of what’s needed to finally get real hypertext.