Comments on: Using Bloom filters for large scale gene sequence analysis in Haskell

By: Matías Iturburu

Matías Iturburu — Thu, 13 Sep 2012 01:35:23 +0000

I’m currently interested in the subject. Pity that the PDF link is down.

Is it possible for you to upload it again?

By: Tim Yates

Tim Yates — Mon, 13 Oct 2008 15:25:40 +0000

Nice paper 🙂 I’m evaluating using a Bloom filter for getting my 25mer probe sequences pre-filtered into sets per chromosome rather than searching for millions of them for each chromosome in turn. How do you ensure you are not out of phase on the words you extract from the target sequence? I.e., if I read 8mer words with an overlap of 2, how do I ensure I am not just out by one base, thereby missing the word’s existence when running it through my hashes?

I’ve probably got the algorithm wrong in my head… More reading required 😉

By: Jeremy Leipzig

Jeremy Leipzig — Wed, 01 Oct 2008 16:11:00 +0000

Interesting paper. Aligning ESTs is a bit old-school, but there is a lot of interest in aligning many very short sequences (<30bp) to the genome at high or exact thresholds. Due to its k-mer based heuristics, BLAT has not been very good at finding these matches. A lot of researchers have been turning to suffix trees and as a result they are spending a lot more time at home with their families.
I think it would be interesting to implement a short sequence alignment tool along these lines in Haskell using your Bloom filters. I’m not sure the bottleneck is in storage, but perhaps the decreased footprint could make a distributed solution more attractive.

By: Yatima

Yatima — Mon, 29 Sep 2008 16:47:51 +0000

You’ll be pleased to hear that Haskell’s laziness and type inference, along with Anathem, were major topics of conversation at my date night last night. I adore geeks. That is all.

By: Dan G

Dan G — Mon, 29 Sep 2008 03:47:20 +0000

I think this project uses similar techniques for network forensics, http://isis.poly.edu/projects/fornet/