Monday, April 28, 2008


ParsCit is an open-source reference string parsing package developed by Min-Yen Kan et al. It is built on CRF++, a Conditional Random Fields (CRF) toolkit, and is used by CiteSeerX, the well-known computer science digital library.

Tuesday, April 08, 2008

More Data vs. Better Algorithms

The recent blog posts from Anand Rajaraman arguing that more data usually beats better algorithms (part 1, part 2 and part 3) remind me of a talk David Hand gave two years ago, Classifier Technology and the Illusion of Progress. There has also been discussion of this issue in Hal Daume III's blog post about Heuristics.

Laplacian Kernel, Resistance Distance and Commute Time

The Laplacian kernel for a graph is closely connected to two other quantities defined over the graph: the resistance distance (the effective resistance between two nodes when every edge is treated as a resistor) and the commute time (the expected number of steps a random walk takes to go from one node to the other and back).
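
To make the connection concrete, here is a minimal sketch in Python with NumPy. It assumes that "Laplacian kernel" means the Moore-Penrose pseudoinverse L+ of the graph Laplacian L = D - A, and that the graph is connected and undirected. Under those assumptions the resistance distance is R(i, j) = L+[i, i] + L+[j, j] - 2 L+[i, j], and the commute time is vol(G) * R(i, j), where vol(G) is the sum of all node degrees. The function name and the three-node path graph are made up for illustration.

```python
import numpy as np


def resistance_and_commute(adjacency):
    """Return (resistance distance matrix, commute time matrix).

    Assumes a connected, undirected graph given as a symmetric
    adjacency matrix, and takes the "Laplacian kernel" to be the
    pseudoinverse of the graph Laplacian (an assumption of this sketch).
    """
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=1)
    L = np.diag(degrees) - A          # graph Laplacian L = D - A
    L_pinv = np.linalg.pinv(L)        # Laplacian kernel L+

    # R(i, j) = L+[i, i] + L+[j, j] - 2 * L+[i, j]
    d = np.diag(L_pinv)
    resistance = d[:, None] + d[None, :] - 2.0 * L_pinv

    # Commute time = vol(G) * R(i, j), with vol(G) = sum of degrees
    volume = degrees.sum()
    return resistance, volume * resistance


# Hypothetical example: the path graph 0 - 1 - 2 with unit edge weights.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
resistance, commute = resistance_and_commute(path)
print(resistance[0, 2], commute[0, 2])
```

For the path graph, the two unit resistors between the end nodes are in series, so the resistance distance between nodes 0 and 2 is 2, and with vol(G) = 4 the commute time is 8 steps.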