July 7, 2015

It has been an intense time for the Structured Stories project – mostly due to the ongoing Structured Stories NYC experiment. While I haven’t been very active on this blog, a lot has been written about our experiment on the RJI blog, in a Nieman Lab article, and on the Reporter’s Lab blog, which is becoming a really interesting record of the experiment.

A deluge of excellent reporting has been pouring in from the impressive and productive team in New York (Ishan Thakore, Natalie Ritchie and Rachel Chason), along with a steady stream of revelations, clarifications, consternations and realizations – all valuable learning that could only have resulted from real reporting. The quantity of rich information being generated about the reality of structuring news events and narratives far exceeds what was available before this experiment, and will take months to fully digest. This is the first time that Structured Stories has been used in a production setting and, while many bugs and inadequacies have been revealed, the team has nonetheless been able to use the software to continually capture events and stories, enabling us to explore the editorial aspects of the approach.

Some of the things we’ve learned are substantial. The range of situations described in the FrameNet semantic database seems to be sufficient to cover the majority of news events that we are seeking to report, which is very encouraging. A significant proportion of reporting seems to involve speech acts and other forms of communication by characters – probably to an extent that will require special handling of those kinds of events. There seems to be a previously unappreciated challenge in distinguishing between the structuring of events from language and the structuring of events from ‘models’ of stories. There are several built-in trade-offs in the nature of the event frames that we are creating – for example general versus specific frames, or frames that span multiple FrameNet frames – which will probably require a shallow taxonomy of event frames (as FrameNet itself already has). Reporting and editing tools for handling characters and entities that have no external knowledge graph references will need to be substantially improved.
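
To make the shape of these records a little more concrete, here is a minimal sketch in Python of how an event, its frame and its characters might be represented. Everything in it is an assumption made for illustration: the class names `Entity` and `Event`, the `external_ref` and `importance` fields, the helper `unreferenced` and the sample data are all hypothetical, and none of it is the actual Structured Stories data model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entity:
    """A character or entity in a story (hypothetical shape)."""
    name: str
    # Identifier in some external knowledge graph; None when no reference exists.
    external_ref: Optional[str] = None


@dataclass
class Event:
    """One structured news event (hypothetical shape)."""
    frame: str                 # name of the event frame, illustrative only
    roles: Dict[str, Entity]   # frame role -> participant filling that role
    importance: int = 1        # illustrative importance value


def unreferenced(events: List[Event]) -> List[str]:
    """Return the sorted names of participants lacking an external reference."""
    names = {
        entity.name
        for event in events
        for entity in event.roles.values()
        if entity.external_ref is None
    }
    return sorted(names)


# Made-up sample data: a communication event with one referenced
# participant and one that has no knowledge graph entry.
events = [
    Event(
        frame="Statement",
        roles={
            "Speaker": Entity("Example Official", external_ref="kg:example-official"),
            "Topic": Entity("Example Budget Proposal"),
        },
        importance=3,
    ),
]

print(unreferenced(events))
```

A helper like `unreferenced` hints at the kind of tooling the last point calls for: surfacing the characters and entities that reporters have to name and manage by hand because no external reference is available.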

We have also learned much that will enable the development of nascent editorial guidelines to aid future structured reporting – how to define and choose event frames, how to choose between importance values and sub-narratives to represent detail, how to name characters and entities, how to systematically select external references for characters and entities, and how to approach the specificity required for capturing structure. The list of software issues to be fixed and improved is also long and somewhat daunting. We have not yet come across any specific issue that suggests an insurmountable editorial barrier to the concept, although there are still lots of puzzles, questions, weirdness, vagueness and things-to-explore that may yet prove to be major challenges.

This isn’t easy. We are attempting to record general news events and news stories as structured data, which is a radical and unexplored notion. Success of any kind is not guaranteed and the events and stories that we are reporting and recording may be somewhat simplistic, coarse and clunky. All of this is, obviously, much harder than just writing more text. But we are actually reporting and recording general news as structured data. That is actually happening. For real.