Saturday, March 28, 2009

Semantic Web vs. Semantic Interpretation

The article on The Unreasonable Effectiveness of Data also contrasts the Semantic Web with Semantic Interpretation. Research on the Semantic Web aims to enable machines to comprehend semantic documents and data so as to achieve software service interoperability. It is fundamentally different from the problem of Semantic Interpretation, i.e., understanding human speech and writing. Roughly speaking, the Semantic Web creates precise data, while Semantic Interpretation deals with inherently imprecise data: natural language. It is important to distinguish clearly between these two meanings of 'semantic', both of which are widely used in computer science.

Follow the Data

Recently some Google researchers published an article advocating The Unreasonable Effectiveness of Data. They argue that AI researchers should embrace complexity and follow the data rather than attempt to create elegant theories.


  • "Choose a representation that can use unsupervised learning on unlabeled data, which is so much more plentiful than labeled data."

  • "Represent all the data with a nonparametric model rather than trying to summarize it with a parametric model, because with very large data sources, the data holds a lot of detail."


For example, due to the availability of Web-scale text data, natural language applications can achieve better performance by simply relying on word occurrence and co-occurrence statistics instead of complex latent factor analysis. The former approach is also more scalable because it only requires online learning that can be easily parallelized.
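
To make the point about occurrence and co-occurrence statistics concrete, here is a minimal sketch in Python. It is only an illustration, not the method from the article: the function name count_stats, the toy sentences, and the choice of "same sentence" as the co-occurrence window are all assumptions made for this example. It shows why such counting parallelizes easily: each shard of the corpus is counted independently and the per-shard counts are simply added.

```python
from collections import Counter
from itertools import combinations


def count_stats(sentences):
    """Count word occurrences and within-sentence co-occurrences in one pass.

    Hypothetical helper for illustration; whitespace tokenization and a
    sentence-level window are simplifying assumptions.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        word_counts.update(tokens)
        # Count each unordered pair of distinct words once per sentence.
        for a, b in combinations(sorted(set(tokens)), 2):
            pair_counts[(a, b)] += 1
    return word_counts, pair_counts


# Toy corpus split into two shards, standing in for two workers.
shard_a = ["the cat sat on the mat", "the dog sat"]
shard_b = ["the cat chased the dog"]

words_a, pairs_a = count_stats(shard_a)
words_b, pairs_b = count_stats(shard_b)

# Counters add element-wise, so per-shard results merge into global counts.
word_counts = words_a + words_b
pair_counts = pairs_a + pairs_b

print(word_counts.most_common(3))
print(pair_counts[("cat", "the")])
```

Because the merge step is plain addition, the same counts come out whether the corpus is processed on one machine or split across many, which is not true of most latent factor models.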

This clearly echoes the previous posts More Data vs. Better Algorithms and Right Data vs. Better Models.

Wednesday, March 25, 2009

Yahoo's Key Terms Extraction

It seems that the Key Terms feature of Yahoo! Search BOSS is based on a technique formerly called Prisma, which is described in the following SIGIR-2003 paper:
Using terminological feedback for web search refinement: a log-based study.

Saturday, March 21, 2009

The Bayesian approach to ML

To be Bayesian or not to be Bayesian, that is the question. Zoubin Ghahramani recently gave a thought-provoking and humorous talk on this topic.