It seems that there is currently no suffix tree implementation with Unicode support publicly available online. So I adapted Thomas Mailund's suffix tree implementation in C with a Python binding and put it here. The changes that I made to the code were mainly to make it support Unicode text and be compatible with new version Python. It also includes an example program all_comsubstr.py that illustrates the extraction of common substrings from two Chinese strings (encoded in UTF-8).
Tuesday, May 04, 2010
Subscribe to:
Post Comments (Atom)
6 comments:
Thank you!
Thanks very much for your contribution, it's just what I was looking for!
I was wondering if you have any example about how to implement an efficient dictionary search using an underlying suffix tree in python.
The usage is same as Thomas Mailund's original module.
Although this is some years old by now: Under which license is this code released?
As it is based on Thomas Mailund's suffix tree code, it should bear the same constraints. I do not know which license he has adopted ---probably a non-restrictive one as most 3rd-party modules for Python.
Hi Dell Zhang,
Could you make an example of using the SuffixTree to find the longest repeating substring (Repeating 2x, or more generally, n-times)?
Post a Comment