MinorThird is a collection of Java classes for storing text, annotating text, and learning to extract entities and categorize text. It was written primarily by Dr. William W. Cohen. It comes with a collection of publicly available extraction problems in MinorThird format (about 2 MB).
MinorThird differs from existing NLP and learning toolkits in a number of ways:
- Unlike many NLP packages (e.g., GATE, Alembic), it combines tools for annotating and visualizing text with state-of-the-art learning methods.
- Unlike many other learning packages, it contains methods to visualize both training data and the performance of classifiers, which facilitates debugging.
- Unlike learning packages that are less tightly integrated with text-manipulation tools, it lets you track and visualize the transformation of text data into machine learning data.
- Unlike many packages (including WEKA), it is open-source, and available for both commercial and research purposes.
- Unlike any open-source learning system I know of, it is architected to support active learning and on-line learning, which should facilitate integration of learning methods into agents.