Tag Archives: xml

Balisage 2010 The Markup Conference

Balisage 2010: The Markup Conference was preceded by the International Symposium on XML for the Long Haul: Issues in the Long-term Preservation of XML, which opened with:

A brief history of markup of social science data: from punched cards to “the life cycle” approach covering the “25-year process of historical evolution leading to DDI, the Data Documentation Initiative, which unites several levels of metadata in one emerging standard.”

Sustainability of linguistic resources revisited looked at some of the difficulties facing language resources over the long-term.

Report from the field: PubMed Central, an XML-based archive of life science journal articles provided insight into the processes deployed to give public access to the full text of more than two million articles.

Portico: A case study in the use of XML for the long-term preservation of digital artifacts discussed some practices that can help assure the semantic stability of digital assets.

The Sustainability of the Scholarly Edition in a Digital World explored the need for “tools to make XML encoding easier, to encourage collaboration, to exploit social media, and to separate transcriptions of texts from the editorial scholarship applied to them”.

A formal approach to XML semantics: implications for archive standards examined whether “The application of Montague semantics to markup languages may make it possible to distinguish vocabularies that can last from those which will not last”.

Metadata for long term preservation of product data discussed the “valuable lessons to be learned from the library metadata and packaging standards and how they relate to product metadata”.

The day concluded with Beyond eighteen wheels: Considerations in archiving documents represented using the Extensible Markup Language (XML) which contemplated “strategies for extending the useful life of archived documents”.

Sessions in the main 2010 conference covered topics such as:

gXML, a new approach to cultivating XML trees in Java, which proposed that “A single unified Java-based API, gXML, can provide a programming platform for all tree models for which a ‘bridge’ has been developed. gXML exploits the Handle/Body design pattern and supports the XQuery Data Model (XDM)”.

Java integration of XQuery — an information unit oriented approach explored a novel pattern of cooperation between XQuery and Java developers: a new API, XQJPLUS, makes it possible to let XQuery build “information units” collected into “information trays”.

XML pipeline processing in the browser discussed the benefits of a JavaScript-based implementation of XProc, which would offer comprehensive client-side portability for XML pipelines specified in XProc.

Where XForms meets the glass: Bridging between data and interaction design explored using XForms, which offers a model-view framework for XML, within the conventions of existing Ajax frameworks such as Dojo, as a way to bridge two differing development approaches: data-centric design versus starting from the user interface.

A packaging system for EXPath demonstrated how to adapt conventional ideas of packaging to work well in the EXPath environment. “EXPath provides a framework for collaborative community-based development of extensions to XPath and XPath-based technologies (including XSLT and XQuery)”.

A streaming XSLT processor, in which Michael Kay (editor of the XSLT 2.1 specification) showed how he has been implementing streaming features in his Saxon XSLT processor.

Processing arbitrarily large XML using a persistent DOM covered moving the DOM out of memory and into persistent storage, offering another processing option for large documents by using an efficient binary representation of the XML document, with a supporting Java API.

Scripting documents with XQuery: virtual documents in TNTBase presented a virtual-document facility integrated into TNTBase, an XML database with support for versioning. The virtual documents can be edited, and changes to elements in the underlying XML repository are propagated automatically back to the database.

XQuery design patterns illustrated the benefits that might arise from applying meta design patterns to XQuery.

-Renhart Gittens

Validating normalised dates in XML

I had some fun (hmm…, maybe that’s not the right word) a year or two ago with regular expressions, trying to come up with something that could validate the kinds of ‘normalised’ dates that archivists use. You know the ones. The fuzzy dates, the approximates, the uncertains, the ‘it was in this decade, but I can’t be more precise than that’ date, the ‘I can tell you the start-date, but not the end-date’ (and vice-versa) date. To add to this, we now have the very precise dates associated with born-digital materials, down to the second and complete with timezone. In the event, my problem was dispatched by the folks working on PREMIS, who created a union type that brings together some regular expressions to provide a fix (not perfect, but that’s regular expressions for you). Just recently the Library of Congress have mounted some pages in the Standards section of their website, where they have put together a nice statement of the problem, as well as publishing the union type and an XML document with some test dates. See their Extended Date Time Format page.
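To give a flavour of the approach, here is a minimal Python sketch of a "union" of regular expressions, one per kind of date. It is not the PREMIS union type or the Library of Congress patterns: the notation (a trailing `?` for uncertain, `~` for approximate, `196x` for a decade, `/` for open-ended ranges) and the name `is_normalised_date` are assumptions chosen purely for illustration.

```python
import re

# Hypothetical, simplified building blocks for illustration only.
YEAR = r"\d{4}"
MONTH = r"(?:0[1-9]|1[0-2])"
DAY = r"(?:0[1-9]|[12]\d|3[01])"
TIME = r"(?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d"
ZONE = r"(?:Z|[+-](?:[01]\d|2[0-3]):[0-5]\d)"

# Each "member" of the union covers one kind of date.
CALENDAR_DATE = rf"{YEAR}(?:-{MONTH}(?:-{DAY})?)?"   # 1950, 1950-06, 1950-06-01
DATE_TIME = rf"{YEAR}-{MONTH}-{DAY}T{TIME}{ZONE}"    # born-digital precision
DECADE = r"\d{3}x"                                   # assumed decade notation
FUZZY = rf"{CALENDAR_DATE}[?~]"                      # assumed: ? uncertain, ~ approximate

SINGLE = rf"(?:{DATE_TIME}|{FUZZY}|{CALENDAR_DATE}|{DECADE})"
# A range may be missing its start or its end, but not both.
INTERVAL = rf"(?:{SINGLE}/{SINGLE}|{SINGLE}/|/{SINGLE})"

NORMALISED_DATE = re.compile(rf"(?:{INTERVAL}|{SINGLE})")


def is_normalised_date(value: str) -> bool:
    """Return True if the whole string matches one member of the union."""
    return NORMALISED_DATE.fullmatch(value) is not None
```

With these patterns, `is_normalised_date("196x")` and `is_normalised_date("1950/")` are accepted and `is_normalised_date("circa 1950")` is rejected. The "not perfect" caveat shows up quickly, though: `2010-02-30` is accepted, because a regular expression on its own knows nothing about month lengths.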

-Susan Thomas