Announcing Data.SuffixTree, a lazy, efficient Haskell suffix tree

A few days ago, I wrote a Haskell library for building and working with suffix trees. It builds a suffix tree lazily, so even though its performance is O(n log n) on large input strings, it often has linear performance for many applications.

The implementation is simple and easy to read; the API is well-documented and easy to use; and the internals of the tree structure are exposed, so you can build your own traversal, inspection, and manipulation functions on top of it. The code is liberally licensed (BSD3), to encourage wide use.

You can download the stable version from HackageDB, or fetch the source repository:

darcs get

Posted in haskell
2 comments on “Announcing Data.SuffixTree, a lazy, efficient Haskell suffix tree
  1. Dylan Thurston says:

    I’m surprised that construction is so limited; right now the library is only useful for searching for a sequence in a long list and not, for instance, for looking up identifiers.

    That is, I think construct should have type

    construct’ :: (Ord a) => [[a]] -> STree a

    where the constructor you have now is

    construct l = construct’ (tails l)

  2. No, the library isn’t only useful for searching. It’s also usable for a number of other tasks, such as counting repeats, maximal unique matching, and so on.

    If you wanted to be able to store arbitrary sequences, you’d need a trie, not a suffix tree. The two are related, but they’re not the same, and you can’t use the suffix tree construction algorithm to create a trie.

Leave a Reply

Your email address will not be published. Required fields are marked *