Monday, July 14, 2008

Python defaultdict

I recently learned the trick of using the defaultdict class for frequency counting and smoothing from Peter Norvig's influential technical article How to Write a Spelling Corrector.

As its name suggests, defaultdict is like a regular Python dict, except that a default value (in fact, a factory function that produces it) can be specified in advance. For example, the following piece of code uses the standard dict to build a term-frequency hash table.


tf = {}
for t in words:
    tf[t] = tf.get(t, 0) + 1

The same code becomes simpler and faster with the defaultdict class from the collections module.

import collections
tf = collections.defaultdict(int)
for t in words:
    tf[t] += 1
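
The smoothing part of the trick comes from the factory: whatever it returns becomes the starting value for a key that has not been seen before. Here is a minimal sketch of both uses side by side. The sample sentence and the name smoothed are my own, made up for illustration, and the lambda factory is just one simple way to get add-one style smoothing.

```python
import collections

# Hypothetical sample data, for illustration only.
words = "the quick brown fox jumps over the lazy dog the end".split()

# int() returns 0, so every unseen term starts from a count of zero.
tf = collections.defaultdict(int)
for t in words:
    tf[t] += 1

# A factory that returns 1 gives every term, seen or not, a starting
# count of one (a simple form of add-one smoothing).
smoothed = collections.defaultdict(lambda: 1)
for t in words:
    smoothed[t] += 1

print(tf["the"], smoothed["the"], smoothed["unicorn"])
```

One thing to keep in mind: looking up a missing key with square brackets inserts it into the defaultdict with the default value. If you only want to peek without adding the key, use the get method instead.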


PS: I am glad to see that my colleague Roger Mitton's work on spell-checking is cited in the above article.
