My interpretation of the Message Understanding Conferences (MUC) was that
text extraction could be done effectively by a cascade of finite
state automata (a good description of this approach is Jerry Hobbs'
FASTUS system). This is what the GATE system uses as at least the
introductory part of its standard text extraction system. So, some of
these standard components could be built as a starting point for
processing language. One cascade would be:
- Tokenizer
- Sentence Splitter
- Part of Speech Tagger (e.g. Brill)
- Simple Phrase Parser
- Phrase Combination
- Template Filling
Note that only template filling is specific to text extraction. The
other components could be used for other systems such as dialog
agents.
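The cascade above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the FASTUS or GATE implementation: the function names, the tiny LEXICON (standing in for a real tagger such as Brill's), and the agent/action/target template are all hypothetical. Each stage consumes the output of the previous one, and only the last stage knows anything about the extraction task.

```python
import re

# Hypothetical toy lexicon; a real system would use a trained tagger.
LEXICON = {"the": "DET", "a": "DET", "attacked": "VERB", "destroyed": "VERB"}

def tokenize(text):
    # Stage 1: words, numbers, or single punctuation marks.
    return re.findall(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]", text)

def split_sentences(tokens):
    # Stage 2: close a sentence at each terminal punctuation mark.
    sentences, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok in {".", "!", "?"}:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences

def tag(sentence):
    # Stage 3: lexicon lookup; unknown words default to NOUN.
    tagged = []
    for tok in sentence:
        if not tok.isalnum():
            tagged.append((tok, "PUNCT"))
        else:
            tagged.append((tok, LEXICON.get(tok.lower(), "NOUN")))
    return tagged

def chunk(tagged):
    # Stage 4: group DET? NOUN+ into NP and each VERB into VG.
    chunks, i = [], 0
    while i < len(tagged):
        start = i
        if tagged[i][1] == "DET":
            i += 1
        nouns_from = i
        while i < len(tagged) and tagged[i][1] == "NOUN":
            i += 1
        if i > nouns_from:
            chunks.append(("NP", [w for w, _ in tagged[start:i]]))
            continue
        i = start  # no noun phrase starts here
        if tagged[i][1] == "VERB":
            chunks.append(("VG", [tagged[i][0]]))
        i += 1
    return chunks

def combine(chunks):
    # Stage 5: combine NP VG NP sequences into clause triples.
    labels = [label for label, _ in chunks]
    clauses = []
    for i in range(len(chunks) - 2):
        if labels[i:i + 3] == ["NP", "VG", "NP"]:
            clauses.append(tuple(" ".join(ws) for _, ws in chunks[i:i + 3]))
    return clauses

def fill_templates(clauses):
    # Stage 6: the only task-specific stage.
    return [{"agent": a, "action": v, "target": t} for a, v, t in clauses]

def extract(text):
    templates = []
    for sentence in split_sentences(tokenize(text)):
        templates.extend(fill_templates(combine(chunk(tag(sentence)))))
    return templates
```

For example, extract("The rebels attacked the embassy.") yields one template with agent "The rebels", action "attacked" and target "the embassy". Swapping fill_templates for a different final stage would leave the first five stages untouched.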
As all of these components can be implemented with FSAs, each could
readily be implemented on neuromorphic systems. Moreover, all of these
subprojects could be combined into one system. Putting them in cascade on
one board would reduce the delays of moving data onto and off the board.
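To make the FSA point concrete, here is a minimal sketch of one stage written as an explicit transition table, which is the form one would presumably start from when mapping a stage onto hardware. The state names, tag set, and the pattern recognised (an optional determiner, any number of adjectives, then one or more nouns) are assumptions chosen for illustration; how such a table would be encoded in neurons is not addressed here.

```python
# Transition table for a noun-phrase recognizer over part-of-speech tags.
# Pattern: DET? ADJ* NOUN+  (states and tags are illustrative only).
TRANSITIONS = {
    ("start", "DET"): "det",
    ("start", "ADJ"): "adj",
    ("start", "NOUN"): "noun",
    ("det", "ADJ"): "adj",
    ("det", "NOUN"): "noun",
    ("adj", "ADJ"): "adj",
    ("adj", "NOUN"): "noun",
    ("noun", "NOUN"): "noun",
}
ACCEPTING = {"noun"}

def accepts(tags):
    """Return True if the tag sequence is a noun phrase under the table."""
    state = "start"
    for t in tags:
        state = TRANSITIONS.get((state, t))
        if state is None:  # no transition defined: reject
            return False
    return state in ACCEPTING
```

The whole behaviour of the stage lives in the table and the set of accepting states, so chaining stages amounts to feeding the output symbols of one table-driven machine into the next.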