Wednesday, November 14, 2007

Web Search Results Clustering based on Related Queries

Although automatic organization of Web search results should be helpful, search engines that offer clustered search results, such as Clusty from Vivisimo, have achieved only limited success so far. The underlying reasons may be that (1) the clusters created are not really consistent with users' search interests and (2) the labels extracted for the clusters are not really readable or informative from the users' perspective.

I have an idea which may sound silly :) Many search engines return a list of related queries in addition to the search results for a given query. Such related queries are real queries extracted from search logs, so they reflect users' real information needs. By using related queries as search result categories, we may be able to get user-oriented clustering of search results. This approach is simple to understand and easy to implement. Furthermore, it can be done on the fly, at query time.

For example, a clustering interface for Windows Live Search could be developed in the following way.


  • Start with a query q, such as 'jaguar'.

  • Get the top 100 search results for q through the Live Search SDK.

  • Get the related queries of q, such as 'jaguar animal' and 'jaguar car', by scraping the 'Related searches' column in the search results page directly, or using the ITermSuggestion provider in the Microsoft adCenter Keyword Services Platform.

  • For each related query r_i, create a corresponding cluster c_i with the label r_i, and assign all search results containing the r_i terms to the cluster c_i.

  • Present the clustered search results in a Clusty-like interface.
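
The cluster assignment step above can be sketched in a few lines of Python. This is only a rough illustration under some assumptions of mine: the search results and related queries are already fetched (the sample data below is made up, and no Live Search or adCenter call is shown), each result is a dict with hypothetical "title" and "snippet" fields, and a result "contains the r_i terms" when every word of r_i appears as a whole word in its title or snippet. A result may land in more than one cluster, and results that match no related query are collected separately.

```python
import re
from collections import OrderedDict


def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"\w+", text.lower()))


def cluster_by_related_queries(results, related_queries):
    """Assign each result to every related query whose terms it contains.

    Assumption: "contains" means all terms of the related query occur
    as whole words in the result's title or snippet.
    """
    clusters = OrderedDict((r, []) for r in related_queries)
    unassigned = []
    for result in results:
        words = tokenize(result["title"] + " " + result["snippet"])
        matched = False
        for r in related_queries:
            if tokenize(r) <= words:  # every term of r is present
                clusters[r].append(result)
                matched = True
        if not matched:
            unassigned.append(result)
    return clusters, unassigned


# Made-up sample data standing in for real search results of 'jaguar'.
results = [
    {"title": "Jaguar - Wikipedia",
     "snippet": "The jaguar is a large cat, an animal native to the Americas."},
    {"title": "Jaguar Cars",
     "snippet": "Official site of the luxury car maker Jaguar."},
    {"title": "Jaguar (band)",
     "snippet": "A British heavy metal group."},
]
related_queries = ["jaguar animal", "jaguar car"]

clusters, unassigned = cluster_by_related_queries(results, related_queries)
for label, members in clusters.items():
    print(label, "->", [m["title"] for m in members])
print("other ->", [m["title"] for m in unassigned])
```

Exact word matching is the simplest choice here; it would miss a page that only says "cars" for the label 'jaguar car', so stemming or matching against the full page text might be worth trying in a real implementation.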
