- One of the big problems is that the scientific community does
not really understand semantics or meaning.
- Note that most parsers work largely (or entirely) on the part of
speech of each word.
- It's not entirely clear what each word means, much less a sentence,
or a paragraph.
- One resource that is useful is WordNet. It's a dictionary with
a semantic net placing words in a semantic hierarchy.
- Bag-of-words techniques like Latent Semantic Analysis (LSA)
are also useful.
- You take a corpus, and you build a matrix of words by documents.
- You then mark the cells where words occur in documents.
- The meaning of a document is its vector of words (its column), and
the meaning of a word is its vector of documents (its row).
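- LSA reduces this, via a truncated singular value decomposition, to
two much smaller matrices (one for words, one for documents), and thus
compresses the representation.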
- While these are useful, they don't really give the meaning of a word.
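
The LSA steps above can be sketched in a few lines. This is only a minimal illustration, assuming NumPy is available; the toy corpus, the function names (`build_term_document_matrix`, `lsa`, `cosine`), and the choice of `k=2` are made up for the example and are not from any particular library. Real systems usually weight the cells (for example with tf-idf) rather than just marking them.

```python
import numpy as np

# Toy corpus (hypothetical, for illustration only).
docs = [
    "cat sat on the mat",
    "dog sat on the log",
    "stocks fell on the market",
    "market stocks rose",
]

def build_term_document_matrix(docs):
    """Build a words-by-documents matrix, marking cells where a word occurs."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            X[index[w], j] = 1.0  # mark the cell: word occurs in document j
    return vocab, X

def lsa(X, k):
    """Truncated SVD: keep the k largest singular values.

    Returns one small matrix of word vectors (rows = words) and one
    small matrix of document vectors (rows = documents).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    word_vecs = U[:, :k] * s[:k]
    doc_vecs = Vt[:k, :].T * s[:k]
    return word_vecs, doc_vecs

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

vocab, X = build_term_document_matrix(docs)
word_vecs, doc_vecs = lsa(X, k=2)

# Compare two documents in the compressed space.
print(cosine(doc_vecs[2], doc_vecs[3]))
```

Words or documents that occur in similar contexts end up with similar vectors, which is useful for retrieval and clustering, but the vectors only capture co-occurrence and say nothing about what a word actually means.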