Saturday, February 17, 2007

Normalized Latent Semantic Indexing

Laplacian Eigenmap on the term-document bipartite graph with adjacency matrix W = [0, A; A^T, 0] is actually equivalent to Latent Semantic Indexing (Singular Value Decomposition) on the normalized term-document matrix A_n = D_1^{-1/2} A D_2^{-1/2}, where D_1 and D_2 are the diagonal degree matrices of the terms and documents. This follows easily from the fact that the generalized eigenvector problem (D-W) y = \lambda D y can be transformed into a standard eigenvector problem D^{-1/2} (D-W) D^{-1/2} x = \lambda x, i.e., D^{-1/2} W D^{-1/2} x = (1-\lambda) x, where x = D^{1/2} y. See the paper Normalized Cuts and Image Segmentation, p. 12. Note that we do not use D^{-1} W here because it is not guaranteed to be symmetric. Refer to the book Kernel Methods for Pattern Analysis, section 6.4. It is not yet clear whether this kind of normalization leads to better empirical performance.

To my knowledge, the proof of this connection has been presented in the following two papers, but in an unnecessarily complicated way.
Co-clustering documents and words using Bipartite Spectral Graph Partitioning
Bipartite Graph Partitioning and Data Clustering
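The equivalence is easy to check numerically. Below is a minimal NumPy sketch (the random matrix and variable names are only illustrative): the eigenvalues of the symmetrically normalized adjacency matrix D^{-1/2} W D^{-1/2} of the bipartite graph are exactly +/- the singular values of A_n (plus zeros), so the top eigenvectors and the top singular vectors carry the same information.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 4))  # a toy term-document matrix

# Degree normalization: A_n = D_1^{-1/2} A D_2^{-1/2}
d1 = A.sum(axis=1)      # term degrees (row sums)
d2 = A.sum(axis=0)      # document degrees (column sums)
An = A / np.sqrt(np.outer(d1, d2))
s = np.linalg.svd(An, compute_uv=False)

# Bipartite adjacency W = [0, A; A^T, 0] and its symmetric normalization
n, m = A.shape
W = np.zeros((n + m, n + m))
W[:n, n:] = A
W[n:, :n] = A.T
d = W.sum(axis=1)
Wn = W / np.sqrt(np.outer(d, d))

# Eigenvalues of Wn are +/- the singular values of A_n (plus zeros),
# so x = D^{1/2} y maps the SVD solution back to the Laplacian eigenmap one.
eig = np.linalg.eigvalsh(Wn)  # sorted ascending
assert np.allclose(eig[-len(s):], np.sort(s))
```

Note that the top singular value of A_n is always 1, attained by the vectors D_1^{1/2} 1 and D_2^{1/2} 1, which corresponds to the trivial \lambda = 0 solution of the Laplacian problem.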

Manifold Learning

Manifold Learning or Nonlinear Dimensionality Reduction has been a hot topic for several years, particularly since the publication of ISOMAP and LLE in Science (December 2000).

A collection of Manifold Learning papers.
A collection of Manifold Learning codes (in MatLab).

John Langford has an insightful blog post: Why Manifold-Based Dimension Reduction Techniques?

Datasets in LIBSVM format

The LIBSVM Data page contains a number of preprocessed and formatted datasets for various machine learning problems. It is very helpful when one wants to try some new machine learning algorithms (not necessarily SVM) without the hassle of preprocessing and formatting.
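For reference, each line of a LIBSVM-format file is a label followed by sparse index:value pairs, with 1-based, increasing feature indices and zero entries omitted. A minimal Python sketch of parsing one line (the sample line here is made up):

```python
# LIBSVM format: "<label> <index>:<value> <index>:<value> ..."
line = "1 3:0.5 7:1.2"
label, *pairs = line.split()
# Build a sparse representation: feature index -> value
features = {int(i): float(v) for i, v in (p.split(":") for p in pairs)}
print(label, features)  # 1 {3: 0.5, 7: 1.2}
```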

Thursday, February 15, 2007

Laplace Prior and Sparsity

Sparse parameter estimation can be achieved in the Bayesian framework by using the Laplace (double exponential) prior distribution. It is known that the MAP estimates under the Laplace prior are the same as those produced by the lasso, which minimizes the usual sum of squared errors subject to a bound on the sum of the absolute values of the coefficients.
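In the special case of an orthonormal design, this MAP estimate has a closed form: each ordinary least-squares coefficient is soft-thresholded, which is exactly why small coefficients become identically zero. A minimal sketch (the coefficient values are made up for illustration):

```python
import numpy as np

def soft_threshold(b, lam):
    """Lasso / Laplace-prior MAP estimate for an orthonormal design:
    shrink each OLS coefficient toward zero by lam and clip at zero."""
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

# OLS estimates: two strong effects, the rest is noise near zero
b = np.array([2.0, -1.5, 0.05, -0.08, 0.02])
w = soft_threshold(b, lam=0.1)
print(w)  # the small coefficients are set exactly to zero
```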

The Bayesian Logistic Regression (BBR) and Bayesian Multinomial Regression (BMR) software packages implement this idea and have shown good performance in text classification.

Challenges in Statistical Machine Learning

John Lafferty and Larry Wasserman have recently published an article on the Challenges in Statistical Machine Learning. The following problems are considered current challenges that cut across both the statistics and machine learning communities.

  • Sparse Learning in High Dimensions

  • Semi-Supervised Learning

  • Relation between Computation and Risk

  • Structured Prediction