Saturday, November 26, 2005

AJAX and "mashups"

Google has been an accidental catalyst for two major advances in Web application development: AJAX (Asynchronous JavaScript and XML) and "mashups" (new applications created by blending data and tools from different sources).
Stacy Cowley, Google Attracts Developers, IDG News Service, 11/23/2005.

Skepticism on the Semantic Web vision

Rob McCool, Rethinking the Semantic Web, Part 1, IEEE Internet Computing, Nov./Dec. issue, 2005, pp. 88.
"This column proposes several reasons that the Semantic Web vision hasn't been adopted. Part two will present an alternate approach."

Monday, November 21, 2005

New Introductory Book on Web Search

Prof. Mark Levene at Birkbeck, University of London has written a new book: An Introduction to Search Engines and Web Navigation. It is ideal for undergraduates studying web search, and those on general web technology courses.

Wednesday, May 25, 2005

Yahoo! API provides key-term extraction ability

Yahoo!'s content analysis web service can extract significant words and phrases from text.
John Herren's Yahoo! News Tag Soup is a representative application of this tool.
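To give a rough feel for what "extracting significant words" means, here is a minimal sketch that ranks terms by raw frequency after dropping common words. It does not call the Yahoo! web service and is not how that service works internally; the `STOP_WORDS` list and the `extract_key_terms` function are hypothetical names made up for this illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "from", "can"}


def extract_key_terms(text, top_n=3):
    """Return the top_n most frequent tokens that are not stop words."""
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    # Count only tokens longer than two characters that are not stop words.
    counts = Counter(t for t in tokens if t not in STOP_WORDS and len(t) > 2)
    return [term for term, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    sample = "search engines index the web; web search engines rank web pages"
    print(extract_key_terms(sample))
```

A production extractor would presumably also weigh phrases and term rarity, which plain frequency counting ignores.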

Thursday, May 12, 2005

Automatic Identification of User Goals in Web Search

Dr. Junghoo Cho's research group recently published a very nice paper on classifying Web queries into two categories (navigational or informational) using features drawn from user-click behavior and anchor-link distribution.
Uichin Lee, Zhenyu Liu, Junghoo Cho. "Automatic Identification of User Goals in Web Search." In Proceedings of the 14th International World Wide Web Conference 2005, Chiba, Japan, May 2005.
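One way to picture the click-behavior feature: if nearly everyone who issues a query clicks the same result, the query is probably navigational. The sketch below is my own simplification, not the paper's method; the function name `classify_query` and the `threshold` value of 0.7 are assumptions chosen for illustration.

```python
from collections import Counter


def classify_query(clicked_urls, threshold=0.7):
    """Label a query from the list of URLs users clicked after issuing it.

    threshold is a hypothetical cutoff for this sketch, not a value from the paper.
    """
    if not clicked_urls:
        return "unknown"
    counts = Counter(clicked_urls)
    # Share of all clicks that went to the single most-clicked URL.
    top_share = counts.most_common(1)[0][1] / len(clicked_urls)
    return "navigational" if top_share >= threshold else "informational"


if __name__ == "__main__":
    print(classify_query(["site-a"] * 9 + ["site-b"]))
    print(classify_query(["site-a", "site-b", "site-c", "site-d"]))
```

The anchor-link feature could be handled the same way by feeding in the destinations of anchor texts that match the query instead of clicked URLs.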

Wednesday, May 11, 2005

KDD-Cup 2005

This year, the KDD-Cup competition is about classifying internet users' search queries. The dataset is publicly available.

Thursday, March 17, 2005

Google Code

Google has launched Google Code, Google's place for Open Source software.
The featured projects are selected applications that use a Google tool or API.
There are also links to some cool developer tools on the

Wednesday, March 16, 2005


Terrier (TERabyte RetrIEveR) is open source software from the University of Glasgow, released under the Mozilla Public License, for the rapid development of Web, intranet, and desktop search engines. More generally, it is a modular platform for the rapid development of large-scale Information Retrieval applications, providing indexing and retrieval functionality. It is written in Java.
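For readers new to the terms "indexing" and "retrieval", here is a toy sketch of both steps using an in-memory inverted index with boolean AND matching. It is not Terrier's API and is written in Python rather than Java; `build_index` and `search` are hypothetical names for this illustration only.

```python
from collections import defaultdict


def build_index(docs):
    """Indexing: map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Retrieval: return sorted ids of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    # Start from the postings of the first term, then intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


if __name__ == "__main__":
    docs = {1: "web search engines", 2: "desktop search", 3: "web navigation"}
    index = build_index(docs)
    print(search(index, "web search"))
```

A real platform adds ranking, compression, and on-disk structures on top of this basic idea.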

A9 Open Search

A9 Launches Open Search to Syndicate Vertical Search Results using RSS.
Could it be used as the user interface of my experimental search engines?
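As a first step toward answering that question, here is a minimal sketch of reading search results out of a plain RSS 2.0 feed. The `SAMPLE_FEED` document and the `parse_results` function are made up for illustration; the sketch assumes results arrive as ordinary RSS `item` elements and ignores any Open Search specific extension elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed for illustration; a real engine would return its own results.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example search results</title>
    <item><title>First hit</title><link>http://example.com/1</link></item>
    <item><title>Second hit</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""


def parse_results(rss_text):
    """Return a list of (title, link) pairs, one per RSS item, in feed order."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


if __name__ == "__main__":
    for title, link in parse_results(SAMPLE_FEED):
        print(title, link)
```

An experimental engine would only need to emit a feed of this shape for a syndicating front end to display its results.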

Thursday, March 10, 2005


SIGMOD Online now offers an RSS feed with the news items posted on

"Learning in Web Search"

Wednesday, March 09, 2005


"MALLET is an integrated collection of Java code useful for statistical natural language processing, document classification, clustering, information extraction, and other machine learning applications to text".
It seems that MALLET has superseded the widely used Rainbow toolkit. The most important new feature of MALLET is probably its implementation of finite-state Conditional Random Fields. However, the current version of MALLET does not yet include Support Vector Machines.

Sunday, March 06, 2005


I have started using Bloglines to read blog feeds that I am interested in.