Sunday, August 28, 2011

Fastest membership test in Python

What is the most efficient method to check whether an item is in a given group or not? In Python, it seems that set (or frozenset) would be slightly faster than dict and much much faster than list.

Friday, August 26, 2011

Submodular functions

Intuitively, a submodular function over the subsets demonstrates "diminishing returns", which is related to the concept of marginal utility in economics. Its usefulness for machine learning is well explained and illustrated by the Beyond Convexity tutorial. There is a Matlab toolbox for submodular function optimization available that is developed by Andreas Krause.

L1 regularisation Is efficient for selecting relevant features

Andrew Ng has proven in his ICML-2004 paper that sample complexity grows linearly in the number of irrelevant features when using L2 regularisation (in logistic regression, support vector machine, and back-propagation neural network), but only logarithmically when using L1 regularisation (in logistic regression).

Sunday, July 03, 2011

New linear-time algorithm for suffix array construction

Juha Kärkkäinen, Peter Sanders , and Stefan Burkhardt: Linear Work Suffix Array Construction, Journal of the ACM (JACM), Volume 53 Issue 6, November 2006.
As the authors have said, this algorithm narrows the gap between suffix tree and suffix array, which are widely used and largely interchangeable index structures on strings and sequences. Usually theoreticians prefer the former due to linear-time construction algorithms and more explicit structure while practitioners prefer the latter due to its simplicity and space efficiency. Now there is one more reason for practitioners to stick with suffix array.

Friday, July 01, 2011

Research Impact for REF

The British government's emphasis on the practical impact of research in REF reminds me of Feynman's following words.

Physics [research] is like sex: sure, it may give some practical results, but that's not why we do it.