Structure meets Journalism

July 7, 2015

It has been an intense time for the Structured Stories project – mostly due to the ongoing Structured Stories NYC experiment. While I haven’t been very active on this blog, plenty has been written about our experiment on the RJI blog, in a Nieman Lab article, and on the Reporters’ Lab blog, which is becoming a really interesting record of the experiment.

A deluge of excellent reporting has been pouring in from the impressive and productive team in New York (Ishan Thakore, Natalie Ritchie and Rachel Chason), along with a steady stream of revelations, clarifications, consternations and realizations – all valuable learning that could only have resulted from real reporting. The quantity of rich information being generated about the reality of structuring news events and narratives far exceeds what was available before this experiment, and will take months to fully digest. This is the first time that Structured Stories has been used in a production setting and, while many bugs and inadequacies have been revealed, the team has nonetheless been able to use the software to continually capture events and stories, enabling us to explore the editorial aspects of the approach.

Some of the things we’ve learned are substantial. The range of situations described in the FrameNet semantic database seems to be sufficient to cover the majority of news events that we are seeking to report, which is very encouraging. A significant proportion of reporting seems to involve speech acts and other forms of communication by characters – probably to an extent that will require special handling of those kinds of events. There seems to be a previously unappreciated challenge in distinguishing between the structuring of events from language and the structuring of events from ‘models’ of stories. There are several built-in trade-offs in the nature of the event frames that we are creating – for example general vs specific, or across multiple FrameNet frames – which will probably require a shallow taxonomy of event frames (as FrameNet itself already has). Reporting and editing tools for handling characters and entities that have no external knowledge graph references will need to be substantially improved.

We have also learned much that will enable the development of nascent editorial guidelines to aid future structured reporting – how to define and choose event frames, how to choose between importance values and sub-narratives to represent detail, how to name characters and entities, how to systematically select external references for characters and entities, how to approach the specificity required for capturing structure. The list of software issues to be fixed and improved is also long and somewhat daunting. We have not yet come across any specific issue that suggests an insurmountable editorial barrier to the concept, although there are still lots of puzzles, questions, weirdness, vagueness and things-to-explore that may yet prove to be major challenges.

This isn’t easy. We are attempting to record general news events and news stories as structured data, which is a radical and unexplored notion. Success of any kind is not guaranteed and the events and stories that we are reporting and recording may be somewhat simplistic, coarse and clunky. All of this is, obviously, much harder than just writing more text. But we are actually reporting and recording general news as structured data. That is actually happening. For real.

On the Road

June 8, 2015

In the month since my last blog post I’ve made three week-long trips across the country, engaging with the two communities most closely associated with Structured Stories – Journalism and Computational Narrative.

My first trip was to the Reynolds Journalism Institute at the University of Missouri in Columbia, where I am a fellow this year, conducting a formal evaluation of Structured Stories. I met most of the RJI leadership team, including Executive Director Randy Picht and Research Director Esther Thorson, and spent several days with my research partner at Mizzou, Frank Russell. I was impressed by the intellect and seriousness of everyone I met and I’m convinced that RJI is the perfect environment for a careful, thoughtful and credible evaluation of the concept. I will publish more details on our research program as Frank and I develop the particulars.

Trip two was to Atlanta, Georgia, where the 6th workshop on Computational Models of Narrative was taking place. I was there to deliver my paper on ‘Narrative Structures as a Framework for Journalism’, and this was the first time I had presented the Structured Stories concept to the computational narrative community. I was very pleased by the interest and response, and I came away feeling confident about the conceptual basis of the approach and about the place of Structured Stories within the field. I also met many fascinating people with long experience in representing narrative as data, made new friends at the workshop and over dinner each night, and was introduced to several other people doing interesting and related work in the Atlanta area.

The third trip was to New York, New York – specifically the exciting and fast-paced neighbourhoods of Soho and TriBeCa. This trip was for the training program for the team participating in the Structured Stories NYC reporting project. The team members are Ishan Thakore, Natalie Ritchie and Rachel Chason – all students from Duke recruited and guided by Bill Adair. We were also joined by several guests, and went from an introductory overview to structuring real events and stories in three days. It was an intense experience filled with interesting discussions and examples, and the high calibre of our reporters made me very pleased with how the project has kicked off. Structured events from the NYC project are already pouring in and stories should be up on the website within a few days.

I am now back in L.A. for at least the next 2 months, focused primarily on supporting the reporting team in NYC. The NYC project is critical because it will determine whether the Structured Stories concept is editorially feasible – i.e. can it work on real stories in a real reporting workflow. With this project we are exploring ‘structured editorial’ issues that are new for journalism, and we may uncover many unanticipated challenges and opportunities. These are still very early days for Structured Stories, but they are increasingly busy and filled with interesting engagement!

Coding, reporting and editing

Call for entries by CJF Innovation Award, among others

March 11, 2015

The past 6 weeks have been mostly consumed with activity that was important to moving Structured Stories forward, but was not coding and was not reporting. More will be forthcoming in the fullness of time.

The bit of coding that was done in February was mostly about building a nascent editing tool, which isn’t publicly visible but which will enable publicly visible things to happen – namely many more events and stories.

The editing tool will do a lot of things (see the previous post), but at the moment the one big thing that it needs to do is enable the easy creation and editing of event frames. Event frames are at the heart of structured stories and working with them quickly and easily is essential to establishing a reporting process.

Nothing is easy, and everything takes more time than hoped, but things are moving forward. My goal for 2015 is to enable enough reporting into Structured Stories to support a robust evaluation of the concept. There are many aspects to that goal that will require more exploration, trial, error and time. If you want to help then get in touch.

From story code to story data

December 31, 2014

With just a few hours left in 2014 I figured it was time for a quick recap of the year that was and a quick preview of the year to come. I also just pushed a major update to the ‘production beta’ code, including the ability to use Facebook reference IDs to define characters.

A year ago Structured Stories was just a goal, a pile of research notes, some nascent ideas and a primitive prototype. The data model and architecture stabilized in Q1 and design and coding of the application began in Q2. The early beta application launched in October and has been continuously improved since then. Today it is an increasingly stable tool with an accessible user experience and a robust API, and users can use it to consume and create structured events and structured stories. Judge for yourself here.

2015 is about transitioning the Structured Stories project from coding to journalism. The beta application will be function-complete within a few weeks, and by February I hope to have editing tools in place sufficient to enable user-entered stories to become permanent. At that point the project changes from a technical focus to a journalism focus – coding will scale back to mostly bug fixes, and the creation and growing of stories will become the primary activity. Local government news in Los Angeles remains the domain.

Focusing on journalism requires discovering, understanding and addressing a complex set of editorial challenges that will probably be at least as daunting as the technical challenges of the past year, including:

  • Creating and applying editorial guidelines for the creation of event frames.
  • Creating and applying editorial guidelines for the entry of events and the creation of stories.
  • Building the event frame library from just over 100 frames now to several thousand frames.
  • Developing an editorial process that can accommodate many contributors to stories and that can support coherent editing of events and stories.
  • Understanding how story and event editing and maintenance actually work and building tools to support those activities.
  • Observing and reacting to how real users use Structured Stories to create and consume stories.
  • Discovering how to educate early users about Structured Stories, its functionality and utility.

The primary usefulness of the beta application in 2015 is to enable these editorial challenges and others to be clearly identified, defined and addressed. There is much to do and much to learn, but there has also been some progress. The Structured Stories concept works technically, and if it can also work editorially then it may be useful.

It’s here!

November 7, 2014

The Structured Stories beta application launched publicly in late October, 2014, after a long and painful delivery.

The application is now open for anyone to browse and explore structured stories, but registration – required for creating and editing stories – is still ‘upon request’ until at least the end of 2014. Pick a story and look at the ‘structured story’ view to quickly understand the concept.

This is a milestone, but these are still very early days for Structured Stories. The next few months will be about debugging and completing the full v1.0 stack (backend apps, API and web app), and then the focus will shift from technology to journalism. The rough goal is to build out a dense collection of stories (a ‘narrative network’) on local government in Los Angeles by mid-2015 – sufficient to fully demonstrate the concept, test its value and generate learnings about the application of semantic and knowledge engineering techniques to news.

At this stage of the project FEEDBACK IS CRITICAL. If you have opinions or criticism or comments about this project, or if you find bugs, then please let me know.

Summer of Code

August 4, 2014

Well – so much for ‘updates coming soon’! I have been so consumed in technical matters for the past two months that I haven’t had time to do much else. The explanatory presentation remains unfinished and I’m beginning to feel guilty about not keeping my small handful of blog readers informed. So here’s a quick post to check in and to give a (hopefully) more realistic estimate of what to expect next and when to expect it.

The beta product is generally on track and is dramatically simpler and easier to grasp than the prototype. The functionality is almost identical to the prototype – no more and no less – but the user interaction is unrecognizable and (I think) a vast improvement. I am aiming to open the beta site to anyone who is interested in the last week of September, following the ONA14 conference in Chicago. I will be attending ONA14 and demonstrating the beta application there, so if you would like to meet up please get in touch. More details to follow. Really.

The technical foundations of the beta are firmly in place and are working well. I am developing the front-end in AngularJS, with various plug-ins and libraries. The back end remains Node.js, with data primarily in Redis and serving from Heroku (Node) and EC2 (event data). All coding is in JavaScript (and CoffeeScript, my new superpower!), and client-server communication is through a fully RESTful API – which really is as zen as it sounds. It’s a stack that is working well and that should be robust enough and scalable enough to last for quite a while.
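For readers unfamiliar with the term, here is a rough sketch of what ‘RESTful’ means in practice: each kind of thing (here, events) is a resource with its own URL, and the HTTP method says what to do with it. This is only an illustration. The `/events` path, the `handle` function and the in-memory `Map` (standing in for a data store such as Redis) are assumptions made for this example, not the actual Structured Stories API.

```javascript
// Illustrative only: a tiny resource-oriented request handler.
// The route names and the in-memory store are hypothetical stand-ins.
const eventStore = new Map(); // stands in for a real data store
let nextEventId = 1;

// Maps an HTTP-style (method, path, body) triple to a { status, body } response.
function handle(method, path, body) {
  const match = path.match(/^\/events(?:\/(\d+))?$/);
  if (!match) return { status: 404, body: { error: "Not found" } };
  const id = match[1]; // undefined for the collection URL /events

  if (method === "POST" && !id) {
    // Create a new event; the server assigns the identifier.
    const created = { ...body, id: String(nextEventId++) };
    eventStore.set(created.id, created);
    return { status: 201, body: created };
  }
  if (method === "GET" && !id) {
    // List the whole collection.
    return { status: 200, body: [...eventStore.values()] };
  }
  if (method === "GET" && id) {
    // Fetch a single event by its identifier.
    const found = eventStore.get(id);
    return found
      ? { status: 200, body: found }
      : { status: 404, body: { error: "Not found" } };
  }
  return { status: 405, body: { error: "Method not allowed" } };
}

const createdResponse = handle("POST", "/events", { phrase: "council approved budget" });
console.log(createdResponse.status, createdResponse.body.id);
console.log(handle("GET", "/events/" + createdResponse.body.id).status);
```

A real server would wrap a function like this in an HTTP listener, but the mapping from method and URL to action is the part that makes the API RESTful.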

The central update from the last few months is that Structured Stories is advancing from being just a technology to being a real product – a process that has been rooted in many iterations of the user experience design and that includes deep product decisions that have been forced by the UI development (most prominently the decision to include reporting tools directly in the product from the get-go). I have already begun to demo the beta application and we should be only about 7-8 weeks away from a product that YOU, dear blog reader, will be able to play around with yourself – browsing and entering stories. This is hard. Please be patient. :-)

As always, I invite anyone who might be interested in Structured Stories to get in touch. I always have time to chat or email and I can even set up a web-ex for a sneak preview of the beta. Have a great summer and look forward to playing with a new approach to news in the fall!

Demo Day

April 9, 2014

Since February I have been doing some outreach about StructuredStories and have come to realise that I need to communicate more clearly about the technology. I am therefore providing two related guides – one in the form of a video demonstration of the prototype, and the other in the form of an informal FAQ document.

The video demonstration is quite extensive – about 1 hour – and my goal has been to replicate what an interested layperson would experience if I were demonstrating the prototype to them personally. I provide lots of commentary, including brief introductions to knowledge graphs and to FrameNet, and I apologise in advance for butchering some of these descriptions in attempts at simpler explanations. The video is on YouTube – I recommend that you view it in High Definition mode. The link is here:

A Demonstration of the StructuredStories prototype.

[IMPORTANT!: The link above is a video demo of the *proof-of-concept prototype*, not of the beta product, and it is intended for those interested in the technical background behind Structured Stories. If you want a demo of the beta product then either get in touch or wait until it launches in mid-October, 2014.]

The Frequently Asked Questions document is loosely based on questions that I receive while demonstrating and talking about the technology, and I intended for this document to accompany the demonstration. I suggest that you read this document first, then decide if you want to view the 1 hour demo. The FAQ document is in PDF format and is linked here:

StructuredStories – Frequently Asked Questions

The StructuredStories project is now in a new phase. I have decided to focus my attention on the specification and architectural design of a ‘version 1’ product, and therefore to place less emphasis on the prototype except for experimentation and communication. This will increase the time until I can release a publicly-accessible beta product, but will hopefully result in the creation of an enduring suite of technology. The prototype has largely served its purpose. It works.

Patent Pending!

February 25, 2014

Well, it has been a very busy two months! January was very challenging but extremely productive and was centered on two additional major revisions to the event and narrative data structures. This brings the number of major model revisions to five. This process of alternately coding a working prototype and then assessing the resulting functionality has been absolutely critical to moving the Structured Stories project forward, as has a deeper review of various research literatures in light of learnings from the prototypes. The system is now very much clarified, and I have spent most of February documenting it and preparing the patent application. That application has now been submitted, and therefore I can now be a bit more forthcoming here about the details. This will be a highly technical description, but I am publishing it here anyway in an effort to be as transparent as I can about this project. Please email me if you would like more (or less!) detail.

The basis of the Structured Stories system is the concept of ‘semantic events’, which are uniquely identifiable representations of any specific activity in the world, no matter how large or small. These semantic events are defined using a library of abstract definitions of general forms of semantic events, which are in turn defined by a formal library of semantic units (specifically the FrameNet semantic library from UC Berkeley). The interior definitions of these semantic events are based on two basic elements – (1) a set of formal semantic roles describing the specific activity of the characters, entities, locations, etc. that are involved in the event and which are ‘filled’ using typed references to Freebase, Wikidata, GeoNames, etc., and (2) semantic phrases, primarily verb-based predicate phrases, that convey the activity that the event is about and which are semi-formal. The use of these partially-formal definitions of events, based on an open-ended library of abstract event forms and grounded in FrameNet, is intended to enable any conceivable event to be captured and represented without primary dependence on natural language. Furthermore, the availability of uniquely identified and partially-formal semantic events enables quite a few additional features at the event level, some of which are extremely powerful – for example the ability to represent and use cause-and-effect relationships between discrete events.
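To make the shape of a semantic event a little more concrete, here is a minimal JavaScript sketch. It is an illustration under stated assumptions, not the actual data model: the field names, the helper functions `createEvent` and `renderEvent`, the frame and role names, and all identifiers (including the external reference IDs) are placeholders invented for this example.

```javascript
// Illustrative only: every name and identifier below is a hypothetical placeholder.

// An abstract event form ('event frame') grounded in a FrameNet frame.
const appointFrame = {
  id: "frame:appoint-official",
  frameNetFrame: "Appointing", // illustrative; check against the FrameNet index
  roles: ["selector", "official", "role"],
  // Semi-formal predicate phrase with one slot per semantic role.
  phrase: "{selector} appointed {official} as {role}",
};

// Creates a uniquely identified event, insisting that every role is filled
// with a typed external reference ({ source, id, label }).
function createEvent(frame, id, fillers, causedBy = []) {
  const missing = frame.roles.filter((role) => !(role in fillers));
  if (missing.length > 0) {
    throw new Error("Unfilled roles: " + missing.join(", "));
  }
  // causedBy holds the ids of other events, giving cause-and-effect links.
  return { id, frameId: frame.id, fillers, causedBy };
}

// Renders a readable sentence from the structure, not the other way round.
function renderEvent(frame, event) {
  return frame.phrase.replace(/\{(\w+)\}/g, (_, role) => event.fillers[role].label);
}

const appointment = createEvent(
  appointFrame,
  "event:0001",
  {
    selector: { source: "wikidata", id: "PLACEHOLDER-1", label: "The Mayor" },
    official: { source: "freebase", id: "PLACEHOLDER-2", label: "Jane Doe" },
    role: { source: "wikidata", id: "PLACEHOLDER-3", label: "Police Commissioner" },
  },
  ["event:0000"] // hypothetical earlier event that caused this one
);

console.log(renderEvent(appointFrame, appointment));
// prints "The Mayor appointed Jane Doe as Police Commissioner"
```

The point of the sketch is that the sentence is derived from the roles and references, so the same event can be queried, linked or rendered differently without parsing natural language.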

But capturing and representing semantic events is only half of the challenge here. The other half is in providing a method for organizing and navigating those events in a manner that is not merely coherent, but is also optimized for human understanding of the underlying events within their context. This method is the ‘narrative structure’, or ‘structured story’. A narrative structure is a set of references to semantic events that enables those events to be ‘consumed’ as a narrative, and is much more than merely a simple list of events. Narrative structures include recursive elements (or ‘sub-narratives’), importance weightings and mechanisms for navigation between stories via common events, all of which mirror the natural narrative capacity of human beings.
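As a rough illustration of how sub-narratives and importance weightings might work together, here is a small JavaScript sketch. The structure, the 1 to 5 importance scale and the `readNarrative` function are assumptions made for this example, not the real narrative data model.

```javascript
// Illustrative only: field names, ids and the importance scale are hypothetical.
// A narrative is an ordered list of items; each item references either an
// event or a nested sub-narrative, and carries an importance weighting.
const budgetStory = {
  id: "story:budget",
  items: [
    { eventId: "ev1", importance: 5 },
    { eventId: "ev2", importance: 2 },
    {
      importance: 3,
      narrative: {
        id: "story:budget-hearings",
        items: [
          { eventId: "ev3", importance: 4 },
          { eventId: "ev4", importance: 1 },
        ],
      },
    },
    { eventId: "ev5", importance: 4 },
  ],
};

// Returns the events a reader would see at a given level of detail.
// Items below the threshold are skipped; sub-narratives are expanded
// recursively, and 'depth' records how deeply nested each event is.
function readNarrative(narrative, minImportance, depth = 0) {
  const visible = [];
  for (const item of narrative.items) {
    if (item.importance < minImportance) continue;
    if (item.narrative) {
      visible.push(...readNarrative(item.narrative, minImportance, depth + 1));
    } else {
      visible.push({ eventId: item.eventId, depth });
    }
  }
  return visible;
}

const summary = readNarrative(budgetStory, 4).map((e) => e.eventId);
const detailed = readNarrative(budgetStory, 3).map((e) => e.eventId);
console.log(summary.join(", "));  // prints "ev1, ev5"
console.log(detailed.join(", ")); // prints "ev1, ev3, ev5"
```

Lowering the threshold reveals more of the story, including events inside the sub-narrative, which is one way a single structure could serve both a headline reader and a reader who wants every detail.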

There are a lot of interesting things that result from this representation of semantic events and of narrative structures. The most significant is the establishment of an ‘event graph’ and/or a ‘narrative graph’ that is formed from the various relationships between events and narratives – it has been interesting to observe this event/narrative graph emerge even from just the 40 or so L.A. local government stories that have been represented within the system to date. The power of this representation for extremely specific question answering has also become very apparent, as has its potential as a powerful way to navigate the ‘document web’ based on narrative.
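One way to picture the narrative graph is that two stories become linked whenever they reference the same event. The JavaScript sketch below shows that idea with made-up data. The story and event ids and the functions `buildEventIndex` and `linkedNarratives` are hypothetical and exist only for this illustration.

```javascript
// Illustrative only: story ids, event ids and function names are hypothetical.
// Each story is reduced here to the list of event ids it references.
const storyEvents = {
  "story:budget": ["ev1", "ev2", "ev3"],
  "story:police-chief": ["ev3", "ev4"],
  "story:transit": ["ev5"],
};

// Inverts the mapping: for each event, which stories reference it?
function buildEventIndex(stories) {
  const index = new Map();
  for (const [storyId, eventIds] of Object.entries(stories)) {
    for (const eventId of eventIds) {
      if (!index.has(eventId)) index.set(eventId, new Set());
      index.get(eventId).add(storyId);
    }
  }
  return index;
}

// Finds the stories reachable from one story via shared events.
// Returns a Map of other story id -> array of the shared event ids.
function linkedNarratives(storyId, stories, index) {
  const links = new Map();
  for (const eventId of stories[storyId] || []) {
    for (const otherId of index.get(eventId)) {
      if (otherId === storyId) continue;
      if (!links.has(otherId)) links.set(otherId, []);
      links.get(otherId).push(eventId);
    }
  }
  return links;
}

const eventIndex = buildEventIndex(storyEvents);
const budgetLinks = linkedNarratives("story:budget", storyEvents, eventIndex);
console.log([...budgetLinks.entries()]);
```

In this toy data the budget story and the police chief story share `ev3`, so a reader could step from one to the other through that event, while the transit story stays unconnected until it shares an event with something else.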

I know that readers of this blog are keen to get access to the demonstration site so that they can play with the stories and draw their own conclusions, and I am working to make this possible as soon as I can. My challenge now is to rebuild the codebase and dataset following the last major data model revision in late January, plus address some scale considerations, and so I believe that I have another month or two of JavaScript ahead of me before I can open the site. I also need to catch up on the reporting of new events and stories. Please bear with me, and please drop me an email if you want to learn more or if you want early access to the demo.

Modeling Narratives

December 5, 2013

Working with stories in computers is obviously dependent upon a knowledge representation mechanism that allows for the storage and retrieval of narrative information – a narrative data model. The Structured Stories narrative data model is now in its third major revision, and expresses all of the major requirements for working with stories within the domain chosen for the proof-of-concept project: local government news. The model has been influenced by published models used in various pioneering computational narrative projects, but has some fundamental differences from these – most importantly in the use of semantic web concepts and in its focus on micro-content digital media requirements rather than on literary or academic objectives. Although it is likely that the model will undergo a further two or three major revisions before it is ready to support a beta release, it is now beginning to stabilize and it currently supports the addition of new events and new stories at a rate that is equivalent to existing news reporting in the domain. Revisions of the model are primarily driven by learnings from working with these ‘live’ events and stories, and I expect to publish more details here once the model has stabilized and I have filed key patent applications.

From a technology perspective the current ‘stack’ is aimed at supporting rapid iteration on the data model while providing sufficient flexibility for a beta release and for some demonstration applications (primarily a reporting tool and a news reader). The current platform is based around Node.js, allowing JavaScript (with jQuery) to be used for both client and back-end development. Data is managed in a Neo4j NoSQL database, with characters, entities, locations, etc. all referenced externally in the linked data universe. While the various ontologies used are currently being managed somewhat awkwardly in XML, the goal is to move all of these into OWL 2.0 and SKOS at some point in the new year. At the moment all of this is solely being used to exercise and revise the data model – a relatively narrow set of functionality – however it will also support considerably more functionality and scale when that becomes necessary.

I have found that attempting to capture the abstraction of stories in a useful way is not easy. However, after months of modeling and three major revisions I have not yet come across an insurmountable conceptual obstacle. I believe that my narrative data model is growing increasingly useful – at the very least within the domain of local government news.
