Patent Pending

February 25, 2014

Well, it has been a very busy two months! January was very challenging but extremely productive, and was centered on two additional major revisions to the event and narrative data structures. This brings the number of major model revisions to five. This process of alternately coding a working prototype and then assessing the resulting functionality has been absolutely critical to moving the Structured Stories project forward, as has a deeper review of various research literatures in light of what I learned from the prototypes. The system is now much clearer, and I have spent most of February documenting it and preparing the patent application. That application has now been submitted, so I can be a bit more forthcoming here about the details. This will be a highly technical description, but I am publishing it here anyway in an effort to be as transparent as I can about this project. Please email me if you would like more (or less!) detail.

The basis of the Structured Stories system is the concept of ‘semantic events’, which are uniquely identifiable representations of any specific activity in the world, no matter how large or small. These semantic events are defined using a library of abstract definitions of general forms of semantic events, which are in turn defined by a formal library of semantic units (specifically the FrameNet semantic library from UC Berkeley). The interior definitions of these semantic events are based on two basic elements: (1) a set of formal semantic roles describing the specific activity of the characters, entities, locations, etc. that are involved in the event, which are ‘filled’ using typed references to FreeBase, WikiData, GeoNames, etc., and (2) semi-formal semantic phrases, primarily verb-based predicate phrases, that convey the activity that the event is about. The use of these partially-formal definitions of events, based on an open-ended library of abstract event forms and grounded in FrameNet, is intended to enable any conceivable event to be captured and represented without primary dependence on natural language. Furthermore, the availability of uniquely identified and partially-formal semantic events enables quite a few additional features at the event level, some of which are extremely powerful, for example the ability to represent and use cause-and-effect relationships between discrete events.
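To make the shape of a semantic event a little more concrete, here is a minimal sketch in Python. It is an illustration only and not the actual Structured Stories data model: the class names, field names, frame names and identifiers are all hypothetical placeholders, and the real system defines its event forms against FrameNet rather than free-form strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class EntityRef:
    """A typed reference to an external knowledge base (hypothetical shape)."""
    source: str      # which knowledge base, e.g. "WikiData" or "GeoNames"
    identifier: str  # placeholder identifier, not a real record


@dataclass
class SemanticEvent:
    """A uniquely identified event: an abstract form, filled roles and a phrase."""
    event_id: str                    # unique identifier for this event
    frame: str                       # name of the abstract event form (placeholder)
    roles: Dict[str, EntityRef] = field(default_factory=dict)  # semantic role -> entity
    phrase: str = ""                 # semi-formal predicate phrase
    caused_by: List[str] = field(default_factory=list)         # ids of causing events


# Two illustrative events; every name and identifier here is made up.
vote = SemanticEvent(
    event_id="evt-001",
    frame="example_frame_approval",
    roles={
        "agent": EntityRef("WikiData", "placeholder-city-council"),
        "place": EntityRef("GeoNames", "placeholder-city"),
    },
    phrase="approved the annual budget",
)

protest = SemanticEvent(
    event_id="evt-002",
    frame="example_frame_protest",
    roles={"agent": EntityRef("WikiData", "placeholder-residents-group")},
    phrase="protested the approved budget",
    caused_by=[vote.event_id],  # cause-and-effect link between discrete events
)
```

Because each event carries its own identifier, the cause-and-effect link is just a reference from one event to another, with no dependence on how either event is phrased in natural language.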

But capturing and representing semantic events is only half of the challenge here. The other half is in providing a method for organizing and navigating those events in a manner that is not merely coherent, but is also optimized for human understanding of the underlying events within their context. This method is the ‘narrative structure’, or ‘structured story’. A narrative structure is a set of references to semantic events that enables those events to be ‘consumed’ as a narrative, and is much more than a simple list of events. Narrative structures include recursive elements (or ‘sub-narratives’), importance weightings and mechanisms for navigation between stories via common events, all of which mirror the natural narrative capacity of human beings.
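The recursive and weighted nature of a narrative structure can be sketched roughly as follows. Again, this is only an assumption-laden illustration: the `EventRef` and `Narrative` classes, the numeric importance scale and the `event_ids` helper are hypothetical names invented for this example, not the system's real design.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class EventRef:
    """A reference to a semantic event, with an importance weighting."""
    event_id: str
    importance: float = 1.0  # hypothetical scale: higher means more important


@dataclass
class Narrative:
    """An ordered set of event references and nested sub-narratives."""
    title: str
    items: List[Union[EventRef, "Narrative"]] = field(default_factory=list)

    def event_ids(self, min_importance: float = 0.0) -> List[str]:
        """Return event ids in narrative order, descending into sub-narratives
        and skipping events weighted below min_importance."""
        ids: List[str] = []
        for item in self.items:
            if isinstance(item, Narrative):
                ids.extend(item.event_ids(min_importance))  # recursive element
            elif item.importance >= min_importance:
                ids.append(item.event_id)
        return ids


# A story containing a sub-narrative; all ids and titles are placeholders.
hearing = Narrative("Public hearing", [EventRef("evt-002", importance=0.3)])
story = Narrative(
    "City budget",
    [EventRef("evt-001", importance=1.0), hearing, EventRef("evt-003", importance=0.8)],
)

full_version = story.event_ids()       # every event, in order
short_version = story.event_ids(0.5)   # only the more important events
```

The same structure can therefore be read at different levels of detail: raising the threshold gives a shorter telling of the same story without changing the underlying events.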

There are a lot of interesting things that result from this representation of semantic events and of narrative structures. The most significant is the establishment of an ‘event graph’ and/or a ‘narrative graph’ that is formed from the various relationships between events and narratives. It has been interesting to observe this event/narrative graph emerge even from just the 40 or so L.A. local government stories that have been represented within the system to date. The power of this representation for extremely specific question answering has also become very apparent, as has its potential as a powerful way to navigate the ‘document web’ based on narrative.
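One simple way to picture how a narrative graph emerges is to link any two stories that reference a common event. The sketch below assumes each story has already been reduced to a set of event identifiers; the `narrative_graph` function, the story titles and the ids are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set


def narrative_graph(stories: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Link every pair of stories that share at least one event.

    Keys are unordered pairs of story titles; values are the shared event ids.
    """
    edges: Dict[FrozenSet[str], Set[str]] = {}
    for (title_a, events_a), (title_b, events_b) in combinations(stories.items(), 2):
        common = events_a & events_b
        if common:  # only connect stories with at least one common event
            edges[frozenset((title_a, title_b))] = common
    return edges


# Placeholder stories and event ids, not real data from the system.
stories = {
    "City budget": {"evt-001", "evt-002"},
    "Transit plan": {"evt-002", "evt-010"},
    "Park renovation": {"evt-020"},
}

graph = narrative_graph(stories)
```

Here the two stories that both reference `evt-002` become neighbours in the graph, while the third story stays unconnected until it shares an event with another one. A shared event is then a natural place for a reader to step from one story into the other.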

I know that readers of this blog are keen to get access to the demonstration site so that they can play with the stories and draw their own conclusions, and I am working to make this possible as soon as I can. My challenge now is to rebuild the codebase and dataset following the last major data model revision in late January, and to address some scale considerations. I believe that I have another month or two of JavaScript ahead of me before I can open the site. I also need to catch up on the reporting of new events and stories. Please bear with me, and please drop me an email if you want to learn more or if you want early access to the demo.