Saturday, February 17, 2007

Normalized Latent Semantic Indexing

Laplacian Eigenmap on the term-document bipartite graph with adjency matrix W = [0, A; A^T, 0] is acutally equivalent to Latent Semantic Indexing (Singular Value Decomposition) on the normalized term-document matrix A_n = D_1^{-1/2} A D_2^{-1/2}. This could be easily derived from the fact that the generalized eigenvector problem (D-W) y = \lambda D y can be tranformed into a standard eigenvector problem D^{-1/2} (D-W) D^{-1/2} x = \lambda x, i.e., D^{-1/2} W D^{-1/2} x = (1-\lambda) x where x = D^{1/2} y. See the paper Normalized Cuts and Image Segmentation p12. Note that we do not use D^{-1} W here because it is not garanteed to be symmetric. Refer to the book Kernel Methods for Pattern Analysis section 6.4. It is not clear yet whether such kind of normalization will lead to better empirical performance.

To my knowledge, the proof of this connection has been presented in the following two papers, but in an unnecessarily complicated way.
Co-clustering documents and words using Bipartite Spectral Graph Partitioning
Bipartite Graph Partitioning and Data Clustering

No comments: