- C. J. "Keith" van Rijsbergen, Information Retrieval. Classic. Free online.
- Bill Frakes and Ricardo Baeza-Yates, Information Retrieval Data Structures & Algorithms. Technical. With source code.
- Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval. Popular.
- Ian H. Witten, Alistair Moffat and Timothy C. Bell, Managing Gigabytes: Compressing and Indexing Documents and Images. Tight integration with data compression. Its companion software MG is open source and widely used.
- William Hersh, Information Retrieval: A Health and Biomedical Perspective. Primer. Domain specific.
- Peter Jackson and Isabelle Moulinier, Natural Language Processing for Online Applications: Text Retrieval, Extraction & Categorization. Biased to text mining.
- David A. Grossman and Ophir Frieder, Information Retrieval: Algorithms and Heuristics. Part of Springer's The Information Retrieval Series. A very clear "fundamentals" book for students and practitioners.
- Richard K. Belew, Find Out About: Search Engine Technology from a Cognitive Perspective.
- Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval. Forthcoming. One to look forward to.
- ?
Monday, December 26, 2005
Information Retrieval Textbooks
Saturday, November 26, 2005
AJAX and "mashups"
Google has been an accidental catalyst for two major advances in Web application development: AJAX (Asynchronous JavaScript and XML) and "mashups" (new applications created by blending data and tools from different sources).
Stacy Cowley, Google Attracts Developers, IDG News Service, 11/23/2005.
Skepticism on the Semantic Web vision
Rob McCool, Rethinking the Semantic Web, Part 1, IEEE Internet Computing, Nov./Dec. issue, 2005, pp. 88.
"This column proposes several reasons that the Semantic Web vision hasn't been adopted. Part two will present an alternate approach."
Monday, November 21, 2005
New Introductory Book on Web Search
Prof. Mark Levene at Birkbeck, University of London, has written a new book: An Introduction to Search Engines and Web Navigation. It is ideal for undergraduates studying web search and for those on general web technology courses.
Wednesday, May 25, 2005
Yahoo! API provides key-term extraction ability
Yahoo!'s content analysis web service can extract significant words and phrases from text.
John Herren's Yahoo! News Tag Soup is a representative application of this tool.
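To give a feel for what key-term extraction involves, here is a minimal sketch that ranks words by frequency after dropping stopwords. This is not Yahoo!'s method or API, whose internals are not described here; the `key_terms` function, the tiny `STOPWORDS` set, and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "is", "are"}

def key_terms(text, k=3):
    """Return the k most frequent non-stopword terms in text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(k)]

sample = "Search engines index the web. Web search engines rank web pages."
print(key_terms(sample))
```

A frequency count like this only finds single words; extracting phrases, as the Yahoo! service does, would need at least n-gram counting on top.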
Thursday, May 12, 2005
Automatic Identification of User Goals in Web Search
Dr. Junghoo Cho's research group recently published a very nice paper on classifying Web queries into two categories (navigational or informational) using features drawn from user-click behavior and anchor-link distributions.
Uichin Lee, Zhenyu Liu, Junghoo Cho. "Automatic Identification of User Goals in Web Search." In Proceedings of the 14th International World Wide Web Conference 2005, Chiba, Japan, May 2005.
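The click-behavior idea can be illustrated with a rough sketch: if nearly all clicks for a query land on a single result, the query is probably navigational; if clicks are spread out, it is probably informational. The code below is a simplified illustration under that assumption, not the paper's actual feature set or classifier. The function names, the 0.8 threshold, and the sample click logs are all hypothetical.

```python
from collections import Counter

def top_click_share(clicked_urls):
    """Fraction of all clicks that went to the single most-clicked URL."""
    if not clicked_urls:
        raise ValueError("need at least one click")
    counts = Counter(clicked_urls)
    return counts.most_common(1)[0][1] / len(clicked_urls)

def classify_query(clicked_urls, threshold=0.8):
    """Label a query 'navigational' when clicks concentrate on one result.

    The threshold is an arbitrary illustrative value, not taken from the paper.
    """
    if top_click_share(clicked_urls) >= threshold:
        return "navigational"
    return "informational"

# Hypothetical click logs for two queries.
nav_clicks = ["ucla.edu"] * 9 + ["en.wikipedia.org/wiki/UCLA"]
info_clicks = ["a.com", "b.org", "c.net", "a.com", "d.edu"]

print(classify_query(nav_clicks))
print(classify_query(info_clicks))
```

The same skew test could in principle be applied to the destinations of anchor links that contain the query text, which is the second kind of feature the post mentions.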
Wednesday, May 11, 2005
KDD-Cup 2005
This year, the KDD-Cup competition is about classifying Internet users' search queries. Its dataset is publicly available.
Thursday, March 17, 2005
Google Code
Google has launched Google Code, Google's place for Open Source software.
The featured projects are selected applications that use a Google tool or API.
There are also links to some cool developer tools on SourceForge.net.
Wednesday, March 16, 2005
Terrier
Terrier (TERabyte RetrIEveR) is open source software from the University of Glasgow, released under the Mozilla Public License, for the rapid development of Web, intranet and desktop search engines. More generally, it is a modular platform for the rapid development of large-scale Information Retrieval applications, providing indexing and retrieval functionality. It is written in Java.
A9 Open Search
A9 Launches Open Search to Syndicate Vertical Search Results using RSS.
Could it be used as the user interface of my experimental search engines?
Thursday, March 10, 2005
SIGMOD RSS feed
SIGMOD Online now offers an RSS feed with the news items posted on sigmod.org:
http://www.sigmod.org/rss/index.rss
Wednesday, March 09, 2005
MALLET
"MALLET is an integrated collection of Java code useful for statistical natural language processing, document classification, clustering, information extraction, and other machine learning applications to text".
It seems that MALLET has superseded the widely used toolkit Rainbow. The most important new feature of MALLET is probably its implementation of finite-state Conditional Random Fields. However, the current version of MALLET does not contain Support Vector Machines yet.
Monday, March 07, 2005