Here is the PDF copy.
And for the impatient, here is the abstract.
Analysis of biological data often involves large data sets and computationally expensive algorithms. Databases of biological data continue to grow, leading to an increasing demand for improved algorithms and data structures. Despite having many advantages over more traditional indexing structures, the Bloom filter is almost unused in bioinformatics. Here we present a robust and efficient Bloom filter implementation in Haskell, and implement a simple bioinformatics application for indexing and matching sequence data. We use this to index the chromosomes that make up the human genome, and map all available gene sequences to it. Our experiences with developing and tuning our application suggest that for bioinformatics applications, Haskell offers a compelling combination of rapid development, quality assurance, and high performance.
I’ll write a friendlier overview of the paper later. The Cliff’s Notes version: Bloom filters are almost unused in bioinformatics, they’re tremendously useful, and our Haskell code is really fast.
Right now, I have to catch a plane from Victoria back to San Francisco, since ICFP is safely put to bed.