Other Language Modules

Traditional Text Engineering systems are divided into separate modules. A reasonable modular breakdown would include:
  1. Word Recognition, which has been explained.
  2. Part of Speech Analysis. This is a categorisation task: the system learns the parts of speech and assigns examples to them. It can take advantage of context.
  3. Parsing, which has been explained.
  4. Co-Reference Resolution, which involves aligning anaphors (e.g. pronouns) with existing entities. Existing entities are either active CAs, recently inactive CAs, or recently activated bound n-tuples. The question is which CA to activate; this is decided by the candidates' activation.
  5. Discourse Analysis. This is the least understood module. It would involve building up temporary structure and allowing some of that structure to persist. In some sense, it can be implemented with very complex grammar rules.
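
The co-reference step in item 4 can be illustrated with a minimal sketch. It assumes that "decided by activation" means the candidate entity with the highest current activation is chosen as the referent; the text does not specify the exact rule. The names Candidate and resolve_anaphor, and the activation values, are hypothetical and are not part of any described system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str          # hypothetical label for the entity (a CA or bound n-tuple)
    activation: float  # hypothetical current activation level


def resolve_anaphor(candidates: List[Candidate]) -> Optional[Candidate]:
    """Pick the candidate with the highest activation (assumed selection rule)."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.activation)


# Illustrative values only: one active CA, one recently inactive CA,
# and one recently activated bound n-tuple.
candidates = [
    Candidate("John", 0.8),
    Candidate("Mary", 0.3),
    Candidate("the dog", 0.5),
]
referent = resolve_anaphor(candidates)
```

In a real CA network the competition would presumably emerge from the dynamics of the assemblies themselves rather than from an explicit maximum; the sketch only makes the selection criterion concrete.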