<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>teideal glic deisbhéalach &#187; open source</title>
	<atom:link href="http://www.serpentine.com/blog/category/software/open-source/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.serpentine.com/blog</link>
	<description>Bryan O&#039;Sullivan&#039;s blog</description>
	<lastBuildDate>Thu, 01 Dec 2011 16:53:49 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>aeson 0.4: easier, faster, better</title>
		<link>http://www.serpentine.com/blog/2011/11/30/893/</link>
		<comments>http://www.serpentine.com/blog/2011/11/30/893/#comments</comments>
		<pubDate>Thu, 01 Dec 2011 06:20:57 +0000</pubDate>
		<dc:creator>Bryan O'Sullivan</dc:creator>
				<category><![CDATA[haskell]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://www.serpentine.com/blog/?p=893</guid>
		<description><![CDATA[After months of work, and a number of great contributions from other developers, I just released version 0.4 of aeson, the de facto standard Haskell JSON library. This is a major release, with a number of improvements. Enjoy! Ease of use The new decode function complements the longstanding encode function, and makes the API simpler. [...]]]></description>
			<content:encoded><![CDATA[<p>After months of work, and a number of great contributions from other developers, I just released version 0.4 of <a href="http://hackage.haskell.org/package/aeson">aeson</a>, the de facto standard Haskell JSON library. This is a major release, with a number of improvements. Enjoy!</p>
<h2 id="ease-of-use">Ease of use</h2>
<p>The new <a href="http://hackage.haskell.org/packages/archive/aeson/latest/doc/html/Data-Aeson.html#v:decode"><code>decode</code> function</a> complements the longstanding <code>encode</code> function, and makes the API simpler.</p>
<p><a href="https://github.com/bos/aeson/tree/master/examples">New examples</a> make it easier to learn to use the package.</p>
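<p>As a rough illustration of how the two functions pair up, here is a minimal round-trip sketch. It assumes the <code>OverloadedStrings</code> extension, so that the string literals are read as lazy <code>ByteString</code> values, and the helper name <code>parseInts</code> is made up for this example:</p>
<pre class="sourceCode"><code class="sourceCode haskell">{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (decode, encode)
import qualified Data.ByteString.Lazy.Char8 as L

-- decode yields Nothing when the input is malformed or has the wrong shape.
parseInts :: L.ByteString -&gt; Maybe [Int]
parseInts = decode

main :: IO ()
main = do
  print (parseInts "[1,2,3]")            -- Just [1,2,3]
  print (parseInts "not json")           -- Nothing
  L.putStrLn (encode [1, 2, 3 :: Int])   -- [1,2,3]
</code></pre>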
<h2 id="generics-support">Generics support</h2>
<p>aeson&#8217;s support for data-type generic programming makes it possible to use JSON encodings of most data types without writing any boilerplate instances.</p>
<p>Thanks to Bas Van Dijk, aeson now supports the two major schemes for doing datatype-generic programming:</p>
<ul>
<li><p>the modern mechanism, <a href="http://www.haskell.org/ghc/docs/latest/html/users_guide/generic-programming.html">built into GHC itself</a></p></li>
<li><p>the older mechanism, based on SYB (aka &quot;scrap your boilerplate&quot;)</p></li>
</ul>
<p>The modern GHC-based generic mechanism is fast and terse: in fact, its performance is generally comparable to that of hand-written and TH-derived <code>ToJSON</code> and <code>FromJSON</code> instances. To see how to use GHC generics, refer to <a href="https://github.com/bos/aeson/blob/master/examples/Generic.hs"><code>examples/Generic.hs</code></a>.</p>
<p>The SYB-based generics support lives in <a href="http://hackage.haskell.org/packages/archive/aeson/latest/doc/html/Data-Aeson-Generic.html">Data.Aeson.Generic</a>, and is provided mainly for users of GHC older than 7.2. SYB is far slower (by about 10x) than the more modern generic mechanism. To see how to use SYB generics, refer to <a href="https://github.com/bos/aeson/blob/master/examples/GenericSYB.hs"><code>examples/GenericSYB.hs</code></a>.</p>
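<p>For a feel of what "no boilerplate" means with the GHC mechanism, here is a small sketch (not taken from the examples directory). The <code>Person</code> type is hypothetical, and the only assumption beyond aeson itself is the <code>DeriveGeneric</code> extension:</p>
<pre class="sourceCode"><code class="sourceCode haskell">{-# LANGUAGE DeriveGeneric #-}
import Data.Aeson (FromJSON, ToJSON, decode, encode)
import GHC.Generics (Generic)

-- A hypothetical record type, used only for this example.
data Person = Person
  { name :: String
  , age  :: Int
  } deriving (Show, Eq, Generic)

-- Empty instance bodies: the methods are filled in via GHC generics.
instance ToJSON Person
instance FromJSON Person

main :: IO ()
main = do
  let p = Person "Ada" 36
  -- Encoding and then decoding should give back the original value.
  print (decode (encode p) == Just p)  -- True
</code></pre>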
<h2 id="improved-performance">Improved performance</h2>
<ul>
<li><p>We switched the intermediate representation of JSON objects from <code>Data.Map</code> to <a href="http://hackage.haskell.org/package/unordered-containers"><code>Data.HashMap</code></a>, which has improved type conversion performance.</p></li>
<li><p>Instances of <code>ToJSON</code> and <code>FromJSON</code> for tuples are between 45% and 70% faster than in 0.3.</p></li>
</ul>
<h2 id="evaluation-control">Evaluation control</h2>
<p>This version of aeson makes explicit the decoupling between <em>identifying</em> an element of a JSON document and <em>converting</em> it to Haskell. See the <a href="http://hackage.haskell.org/packages/archive/aeson/latest/doc/html/Data-Aeson-Parser.html"><code>Data.Aeson.Parser</code></a> documentation for details.</p>
<p>The normal aeson <code>decode</code> function performs identification strictly, but defers conversion until needed. This can result in improved performance (e.g. if the results of some conversions are never needed), but at a cost in increased memory consumption.</p>
<p>The new <code>decode'</code> function performs identification and conversion immediately. This incurs an up-front cost in CPU cycles, but reduces memory consumption.</p>]]></content:encoded>
			<wfw:commentRss>http://www.serpentine.com/blog/2011/11/30/893/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The future of MailRank&#8217;s open source technologies</title>
		<link>http://www.serpentine.com/blog/2011/11/15/the-future-of-mailranks-open-source-technologies/</link>
		<comments>http://www.serpentine.com/blog/2011/11/15/the-future-of-mailranks-open-source-technologies/#comments</comments>
		<pubDate>Tue, 15 Nov 2011 22:23:53 +0000</pubDate>
		<dc:creator>Bryan O'Sullivan</dc:creator>
				<category><![CDATA[haskell]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://www.serpentine.com/blog/?p=891</guid>
		<description><![CDATA[(Cross-posted from the MailRank engineering blog.) You may have seen my exciting news about our upcoming move to Facebook. It&#8217;s been a total blast working on our product, and of course as we did so we released a number of open source libraries and tools. It only added to our pleasure to see so much [...]]]></description>
			<content:encoded><![CDATA[<p>(Cross-posted from the <a href="http://engineering.mailrank.com/the-future-of-mailranks-open-source-technolog">MailRank engineering blog</a>.)</p>
<p>You may have seen my <a href="http://blog.mailrank.com/the-mailrank-team-is-joining-facebook">exciting news about our upcoming move to Facebook</a>.</p>
<p>It&#8217;s been a total blast working on our product, and of course as we did so we released a number of open source libraries and tools. It only added to our pleasure to see so much of that code used outside of our own domain. I will continue to develop and maintain the code that we have released.</p>
<p>Here is a quick rundown of the code we have released, roughly ordered by significance. Yep, we wrote all of these projects in Haskell, definitely a decision that in retrospect I&#8217;m very happy about.</p>
<ul>
<li>
<p><a href="https://github.com/bos/pronk"><strong>pronk</strong></a> (not yet actually released) is an application for load testing web servers. Think of it as similar to httperf or ab, only more modern, simpler to deal with, and with vastly better analytic and reporting capabilities.</p>
</li>
<li>
<p><a href="https://github.com/bos/configurator"><strong>configurator</strong></a> is a library that allows fast, dynamic reconfiguration of a Haskell application or daemon.</p>
</li>
<li>
<p><a href="https://github.com/bos/aeson"><strong>aeson</strong></a> is a JSON encoding and decoding library optimized for high performance and ease of use.</p>
</li>
<li>
<p><a href="https://github.com/bos/text-format"><strong>text-format</strong></a> is a library for <code>printf</code>-like text formatting.</p>
</li>
<li>
<p><a href="https://github.com/bos/mysql-simple"><strong>mysql-simple</strong></a> is an easy-to-use client library for the MySQL database. It is several times faster than its competitors, and easier to use. It is built on top of the low-level <a href="https://github.com/bos/mysql"><strong>mysql</strong></a> library.</p>
</li>
<li>
<p><a href="https://github.com/bos/riak-haskell-client"><strong>riak-haskell-client</strong></a> is a client for the Riak decentralized data store.</p>
</li>
<li>
<p><a href="https://github.com/bos/blaze-textual"><strong>blaze-textual</strong></a> is a library for efficiently rendering Haskell data as text.</p>
</li>
<li>
<p><a href="https://github.com/bos/double-conversion"><strong>double-conversion</strong></a> is a very fast library for rendering double precision floating point numbers as text, based on the <a href="http://code.google.com/p/double-conversion/">code from the V8 Javascript engine</a>.</p>
</li>
<li>
<p><strong><a href="https://github.com/bos/pool">resource-pool</a></strong>&nbsp;is a fast resource pooling library.</p>
</li>
<li>
<p><a href="https://github.com/bos/snappy"><strong>snappy</strong></a> provides Haskell bindings to Google&#8217;s extremely fast <a href="http://code.google.com/p/snappy/">snappy compression library</a>.</p>
</li>
<li>
<p><a href="https://github.com/bos/base16-bytestring"><strong>base16-bytestring</strong></a> provides fast handling of base16-encoded data.</p>
</li>
<li>
<p><a href="https://github.com/bos/hdbc-mysql"><strong>hdbc-mysql</strong></a> provides a MySQL transport for the HDBC database access library. (Yes, we recommend using mysql-simple instead!)</p>
</li>
</ul>
<p>Thanks to all of you who have contributed patches and bug reports. It&#8217;s going to be an exciting future!</p>]]></content:encoded>
			<wfw:commentRss>http://www.serpentine.com/blog/2011/11/15/the-future-of-mailranks-open-source-technologies/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A major new release of the Haskell statistics library</title>
		<link>http://www.serpentine.com/blog/2011/11/10/a-major-new-release-of-the-haskell-statistics-library/</link>
		<comments>http://www.serpentine.com/blog/2011/11/10/a-major-new-release-of-the-haskell-statistics-library/#comments</comments>
		<pubDate>Fri, 11 Nov 2011 04:45:36 +0000</pubDate>
		<dc:creator>Bryan O'Sullivan</dc:creator>
				<category><![CDATA[haskell]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://www.serpentine.com/blog/?p=887</guid>
		<description><![CDATA[I'm pleased to announce a major release of the Haskell statistics library, version 0.10.0.0. I'd particularly like to thank Alexey Khudyakov for his wonderful work on this release. New features: Student-T, Fisher-Snedecor, F-distribution, and Cauchy-Lorentz distributions are added. Histogram computation is added, in Sample.Histogram. Forward and inverse discrete Fourier and cosine transforms are added, [...]]]></description>
			<content:encoded><![CDATA[<p>I'm pleased to announce a major release of the Haskell <a href="http://hackage.haskell.org/package/statistics">statistics</a> library, version 0.10.0.0.</p>
<p>I'd particularly like to thank Alexey Khudyakov for his wonderful work on this release.</p>
<p>New features:</p>
<ul>
<li><p>Student-T, Fisher-Snedecor, F-distribution, and Cauchy-Lorentz distributions are added.</p></li>
<li><p>Histogram computation is added, in <code>Sample.Histogram</code>.</p></li>
<li><p>Forward and inverse discrete Fourier and cosine transforms are added, in <code>Transform</code>.</p></li>
<li><p>Root finding is added, in <code>Math.RootFinding</code>.</p></li>
</ul>
<p>Major changes:</p>
<ul>
<li><p>The <code>Sample.KernelDensity</code> module has been renamed, and completely rewritten to be much more robust. The older module oversmoothed multi-modal data. (The older module is still available under the name <code>Sample.KernelDensity.Simple</code>.)</p></li>
<li><p>The type classes <code>Mean</code> and <code>Variance</code> are split in two. This is required for distributions which do not have finite variance or mean.</p></li>
</ul>
<p>Smaller changes:</p>
<ul>
<li><p>The <code>complCumulative</code> function is added to the <code>Distribution</code> class in order to accurately assess probabilities P(X&gt;x) which are used in one-tailed tests.</p></li>
<li><p>A <code>stdDev</code> function is added to the <code>Variance</code> class for distributions.</p></li>
<li><p>The constructor <code>Distribution.normalDistr</code> now takes standard deviation instead of variance as its parameter.</p></li>
<li><p>A bug in <code>Quantile.weightedAvg</code> is fixed. It produced a wrong answer if a sample contained only one element.</p></li>
<li><p>Bugs in quantile estimations for chi-square and gamma distribution are fixed.</p></li>
<li><p>Integer overflow in <code>mannWhitneyUCriticalValue</code> is fixed. It produced incorrect critical values for moderately large samples: around 20 on 32-bit machines and 40 on 64-bit ones.</p></li>
<li><p>A bug in <code>mannWhitneyUSignificant</code> is fixed. If either sample was larger than 20, it produced a completely incorrect answer.</p></li>
<li><p>One- and two-tailed tests in <code>Tests.NonParametric</code> are selected with sum types instead of <code>Bool</code>.</p></li>
<li><p>Test results are returned as an enumeration instead of <code>Bool</code>.</p></li>
<li><p>Performance improvements for Mann-Whitney U and Wilcoxon tests.</p></li>
<li><p>Module <code>Tests.NonParametric</code> is split into <code>Tests.MannWhitneyU</code> and <code>Tests.WilcoxonT</code>.</p></li>
<li><p><code>sortBy</code> is added to <code>Function</code>.</p></li>
<li><p>Mean and variance for gamma distribution are fixed.</p></li>
<li><p>Much faster cumulative probability functions for Poisson and hypergeometric distributions.</p></li>
<li><p>Better density functions for gamma and Poisson distributions.</p></li>
<li><p>The function <code>Function.create</code> is removed. Use <code>generateM</code> from the <code>vector</code> package instead.</p></li>
<li><p>A function to perform approximate comparison of doubles is added to <code>Function.Comparison</code>.</p></li>
<li><p>Regularized incomplete beta function and its inverse are added to <code>Function</code>.</p></li>
</ul>]]></content:encoded>
			<wfw:commentRss>http://www.serpentine.com/blog/2011/11/10/a-major-new-release-of-the-haskell-statistics-library/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The Strange Loop conference was a blast</title>
		<link>http://www.serpentine.com/blog/2011/09/27/the-strange-loop-conference-was-a-blast/</link>
		<comments>http://www.serpentine.com/blog/2011/09/27/the-strange-loop-conference-was-a-blast/#comments</comments>
		<pubDate>Wed, 28 Sep 2011 04:54:05 +0000</pubDate>
		<dc:creator>Bryan O'Sullivan</dc:creator>
				<category><![CDATA[haskell]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://www.serpentine.com/blog/?p=877</guid>
		<description><![CDATA[Last week, I flew to St Louis for the excellent Strange Loop conference, where I gave a 3-hour Haskell tutorial and a talk on how we use Haskell at my startup company, MailRank. Strange Loop is a pretty good approximation to my ideal conference, covering a narrow family of topics I&#8217;m interested in, mainly leading-edge [...]]]></description>
			<content:encoded><![CDATA[<p>Last week, I flew to St Louis for the excellent <a href="https://thestrangeloop.com/">Strange Loop</a> conference, where I gave a 3-hour Haskell tutorial and a talk on how we use Haskell at my startup company, MailRank.</p>
<p>Strange Loop is a pretty good approximation to my ideal conference, covering a narrow family of topics I&#8217;m interested in, mainly leading-edge matters in programming languages and distributed systems. The focus is not at all academic, instead being on open source software that you can download and get to work with. I have to strongly commend the organizers for finding a great venue, excellent speakers, and running a fabulous show at an impressively low price.</p>
<p>I&#8217;ve published the source for the slides from both my workshop and talk in a <a href="https://github.com/bos/strange-loop-2011">github repository</a>, but you might prefer to look at the slides directly:</p>
<ul>
<li><p><a href="http://bos.github.com/strange-loop-2011/slides/slides.html">My Haskell workshop</a> (141 slides, and we managed to cover the whole lot at a pleasant pace!)</p></li>
<li><p><a href="http://bos.github.com/strange-loop-2011/talk/talk.html">My talk on startups and Haskell</a> (41 slides)</p></li>
</ul>
<p>I buried a <a href="http://bos.github.com/strange-loop-2011/talk/talk.html#(31)">teaser slide</a> 3/4 of the way through the startup talk, which turned out to be an excellent way to accidentally find out how many people on Twitter were reading through the slides (the answer surprised me: a lot!). The enigmatic bug in question was quite spectacular: a development version of GHC would delete a source file if it contained a type error. Ouch!</p>
<p>One of the people in the capacity crowd at the Haskell workshop was none other than <a href="http://groups.csail.mit.edu/mac/users/gjs/">Gerry Sussman</a>, who was finally learning Haskell for the first time (doubtless after several decades of overexposure to parentheses). He pronounced the workshop &quot;pretty good&quot;, which I believe is MIT-speak for &quot;OMG teh awesome!&quot;</p>
<p>At one point, Gerry had a pretty amusing epigram to offer. &quot;Haskell is the best of the obsolete programming languages!&quot; he pronounced, with a mischievous look. Now, I know when I&#8217;m being trolled, so I said nothing and waited a moment, whereupon he continued, &quot;but don&#8217;t take it the wrong way &#8211; I think they&#8217;re <em>all</em> obsolete!&quot;</p>]]></content:encoded>
			<wfw:commentRss>http://www.serpentine.com/blog/2011/09/27/the-strange-loop-conference-was-a-blast/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Fitter, happier, more productive UTF-8 decoding</title>
		<link>http://www.serpentine.com/blog/2011/07/11/fitter-happier-more-productive-utf-8-decoding/</link>
		<comments>http://www.serpentine.com/blog/2011/07/11/fitter-happier-more-productive-utf-8-decoding/#comments</comments>
		<pubDate>Mon, 11 Jul 2011 07:25:38 +0000</pubDate>
		<dc:creator>Bryan O'Sullivan</dc:creator>
				<category><![CDATA[haskell]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://www.serpentine.com/blog/?p=854</guid>
		<description><![CDATA[The other night, I had a random whim to spend a couple of minutes looking at the performance of UTF-8 decoding in the Haskell Unicode text package. Actually, rather than look at the actual performance, what I did was use Don Stewart's excellent ghc-core tool to inspect the high-level &#34;Core&#34; code generated by the compiler. [...]]]></description>
			<content:encoded><![CDATA[<p>The other night, I had a random whim to spend a couple of minutes looking at the performance of UTF-8 decoding in the Haskell <a href="http://hackage.haskell.org/package/text">Unicode text package</a>. Rather than measure the actual performance, I used Don Stewart's excellent <a href="http://hackage.haskell.org/package/ghc-core"><code>ghc-core</code></a> tool to inspect the high-level &quot;Core&quot; code generated by the compiler. Core is the last layer at which Haskell code is still somewhat intelligible, and although it takes quite a bit of practice to interpret, the effort is often worth it.</p>
<p>For instance, in this case, I could immediately tell by inspection that something bad was afoot in the inner loop of the UTF-8 decoder. A decoder more or less has to read a byte of input at a time, as this heavily edited bit of Core illustrates:</p>
<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">let</span><span class="ot"> x2 </span><span class="ot">::</span> <span class="dt">Word8</span><br />    x2 <span class="fu">=</span> <span class="kw">case</span> readWord8OffAddr<span class="fu">#</span> <span class="co">{- ... -}</span> <span class="kw">of</span><br />           (<span class="fu">#</span> s, x <span class="fu">#</span>) <span class="ot">-&gt;</span> <span class="dt">W8</span><span class="fu">#</span> x<br /><span class="kw">in</span> <span class="co">{- ... -}</span></code></pre>
<p>What's important in the snippet above is that the value <code>x2</code> is <em>boxed</em>, i.e. packaged up with a <code>W8#</code> constructor so that it must be accessed via a pointer indirection. Since a decoder must read up to 4 bytes to emit a single Unicode code point, the loop was potentially boxing up 4 bytes, then immediately <em>unboxing</em> them:</p>
<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">case</span> x2 <span class="kw">of</span> <br />  <span class="dt">W8</span><span class="fu">#</span> x2<span class="fu">#</span> <span class="ot">-&gt;</span><br />    <span class="kw">case</span> x3 <span class="kw">of</span><br />      <span class="dt">W8</span><span class="fu">#</span> x3<span class="fu">#</span> <span class="ot">-&gt;</span><br />        <span class="kw">case</span> x4 <span class="kw">of</span><br />          <span class="dt">W8</span><span class="fu">#</span> x4<span class="fu">#</span> <span class="ot">-&gt;</span> <span class="co">{- ... -}</span></code></pre>
<p>While both boxing and unboxing are cheap in Haskell, they're certainly not <em>free</em>, and we surely don't want to be doing either in the inner loop of an oft-used function.</p>
<p>We can see <em>why</em> this was happening at line 96 of the <a href="https://bitbucket.org/bos/text/src/cac7dbcbc392/Data/Text/Encoding.hs#cl-88"><code>decodeUtf8With</code> function</a>. I'd hoped that the compiler would be smart enough to unbox the values <code>x1</code> through <code>x4</code>, but it turned out not to be.</p>
<p>Fixing this excessive boxing and unboxing wasn't hard at all, but <a href="https://bitbucket.org/bos/text/src/71ead801296a/Data/Text/Encoding.hs#cl-88">it made the code uglier</a>. The rewritten code had identical performance on pure ASCII data, but was about 1.7 times faster on data that was partly or entirely non-ASCII. Nice! Right?</p>
<p>Not quite content with this improvement, I tried writing a decoder based on <a href="http://bjoern.hoehrmann.de/utf-8/decoder/dfa/">Björn Höhrmann's work</a>. My initial attempt looked promising; it was up to 2.5 times faster than my first improved Haskell decoder, but it fell behind on decoding ASCII, due to the extra overhead of maintaining the DFA state.</p>
<p>In English-speaking countries, ASCII is still the king of encodings. Even in non-English-speaking countries that use UTF-8, a whole lot of text is at least partly ASCII in nature. For instance, other European languages contain frequent extents of 7-bit-clean text. Even in languages where almost all code points need two or more bytes to be represented in UTF-8, data such as XML and HTML <em>still</em> contains numerous extents of ASCII text.</p>
<p>What would happen if we were to special-case ASCII? If we read a 32-bit chunk of data, mask it against 0x80808080, and get zero, we know that all four bytes must be ASCII, so we can just <a href="https://bitbucket.org/bos/text/src/d6b9108799ba/cbits/cbits.c#cl-64">write them straight out without going through the DFA</a> (see <a href="https://bitbucket.org/bos/text/src/d6b9108799ba/cbits/cbits.c#cl-82">lines 82 through 110</a>).</p>
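<p>The check itself is tiny. Here is a sketch of it in Haskell rather than C (the linked implementation is the C one; the names <code>pack4</code> and <code>allAscii4</code> are invented for this example). It packs four bytes into a <code>Word32</code> and tests the high bit of each in one step:</p>
<pre class="sourceCode"><code class="sourceCode haskell">import Data.Bits (shiftL, (.&amp;.), (.|.))
import Data.Word (Word32, Word8)

-- Pack four bytes into one 32-bit word, first byte in the high position.
pack4 :: Word8 -&gt; Word8 -&gt; Word8 -&gt; Word8 -&gt; Word32
pack4 a b c d =
      fromIntegral a `shiftL` 24
  .|. fromIntegral b `shiftL` 16
  .|. fromIntegral c `shiftL` 8
  .|. fromIntegral d

-- Every ASCII byte has its top bit clear, so masking with 0x80808080
-- gives zero exactly when all four bytes are ASCII.
allAscii4 :: Word32 -&gt; Bool
allAscii4 w = w .&amp;. 0x80808080 == 0

main :: IO ()
main = do
  print (allAscii4 (pack4 0x48 0x69 0x21 0x0a))  -- True: the bytes of "Hi!\n"
  print (allAscii4 (pack4 0x48 0xc3 0xa9 0x21))  -- False: 0xc3 0xa9 encodes U+00E9
</code></pre>
<p>A real decoder would load the word straight from the input buffer instead of packing bytes by hand; the packing step is only there to keep this sketch free of pointer code.</p>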
<p>As <a href="https://spreadsheets0.google.com/a/serpentine.com/spreadsheet/pub?hl=en_US&amp;hl=en_US&amp;key=0AlCjMsgkVJXcdG11ZGNaa2FkX3gwZ241bV9IYTduWkE&amp;output=html">the numbers suggest</a>, this makes a big difference to performance! Decoding pure ASCII becomes <em>much</em> faster, while both HTML and XML see respectable improvements. Of course, even this approach comes with a tradeoff: we lose a little performance when decoding entirely non-ASCII text.</p>
<img src="https://spreadsheets.google.com/a/serpentine.com/spreadsheet/oimg?key=0AlCjMsgkVJXcdG11ZGNaa2FkX3gwZ241bV9IYTduWkE&#038;oid=5&#038;zx=nq9fqkp2ty6x" width=600 height=371 />
<p>Even in the slowest case, we can now decode upwards of 250MB of UTF-8 text per second, while for ASCII, we exceed 1.7GB per second!</p>
<p>These changes have made a big difference to decoding performance across the board: it is now always between 2 and 4 times faster than before.</p>
<img src="https://spreadsheets0.google.com/a/serpentine.com/spreadsheet/oimg?key=0AlCjMsgkVJXcdG11ZGNaa2FkX3gwZ241bV9IYTduWkE&#038;oid=3&#038;zx=muagi2e0n09u" width=600 height=371 />
<p>As a final note, I haven't released the new code quite yet - so keep an eye out!</p>]]></content:encoded>
			<wfw:commentRss>http://www.serpentine.com/blog/2011/07/11/fitter-happier-more-productive-utf-8-decoding/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Here be dragons: advances in problems you didn&#8217;t even know you had</title>
		<link>http://www.serpentine.com/blog/2011/06/29/here-be-dragons-advances-in-problems-you-didnt-even-know-you-had/</link>
		<comments>http://www.serpentine.com/blog/2011/06/29/here-be-dragons-advances-in-problems-you-didnt-even-know-you-had/#comments</comments>
		<pubDate>Wed, 29 Jun 2011 07:27:08 +0000</pubDate>
		<dc:creator>Bryan O'Sullivan</dc:creator>
				<category><![CDATA[haskell]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.serpentine.com/blog/?p=844</guid>
		<description><![CDATA[Here&#8217;s something I bet you never think about, and for good reason: how are floating-point numbers rendered as text strings? This is a surprisingly tough problem, but it&#8217;s been regarded as essentially solved since about 1990. Prior to Steele and White&#8217;s &#34;How to print floating-point numbers accurately&#34;, implementations of printf and similar rendering functions did their [...]]]></description>
			<content:encoded><![CDATA[<p
>Here&#8217;s something I bet you never think about, and for good reason: how are floating-point numbers rendered as text strings? This is a surprisingly tough problem, but it&#8217;s been regarded as essentially solved since about 1990.</p
><p
>Prior to Steele and White&#8217;s &quot;<a href="http://portal.acm.org/citation.cfm?id=93559"
  >How to print floating-point numbers accurately</a
  >&quot;, implementations of <code
  >printf</code
  > and similar rendering functions did their best to render floating point numbers, but there was wide variation in how well they behaved. A number such as 1.3 might be rendered as 1.29999999, for instance, or if a number was put through a feedback loop of being written out and its written representation read back, each successive result could drift further and further away from the original.</p
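><p>The write-out-and-read-back loop is easy to try for yourself. A small sketch using only the Prelude, assuming GHC's stock <code>show</code> and <code>read</code> for <code>Double</code> (the name <code>roundTrip</code> is made up here):</p>
<pre class="sourceCode"><code class="sourceCode haskell">-- Write a Double out as text, then read the text back in.
roundTrip :: Double -&gt; Double
roundTrip = read . show

main :: IO ()
main = do
  -- The rendered text is as short as it can be while still
  -- identifying the value exactly.
  print (0.1 + 0.2 :: Double)  -- 0.30000000000000004
  -- No drift: each value survives the trip through text unchanged.
  print (all (\x -&gt; roundTrip x == x) [1.3, 0.1 + 0.2, 1 / 3, 1e22])  -- True
</code></pre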
><p
>Steele and White effectively solved the problem with a clever algorithm named &quot;Dragon4&quot; (the fourth version of the &quot;Dragon&quot; algorithm, which acquired its name because the authors were inspired to obscure puns by Heighway's <a href="http://en.wikipedia.org/wiki/Dragon_curve"
  >dragon curve</a
  >).</p
><p
>The Dragon4 algorithm spread quickly across language runtimes, such that few programmers today understand that this was ever a problem, much less how hairy it was (and is). Indeed, prior to last year, there was almost no activity in this area: two papers proposed widely used refinements to Dragon4, and that was about it. (Alas, the problem was originally solved around a decade before Steele and White published their work, but nobody noticed. If you have a clever idea and sufficient chutzpah, try to enlist Guy Steele as a coauthor. Your work will be read.)</p
><p
>But how solved was the problem? Dragon4 and its derivatives are complicated and tricky, and they have a hefty performance cost, since they rely on arbitrary-precision integer arithmetic to compute their results. There might be a significant performance improvement to be gained if someone could figure out how to use native machine integers instead.</p
><p
>In 2010, Florian Loitsch published a wonderful paper in PLDI, &quot;<a href="http://florian.loitsch.com/publications/dtoa-pldi2010.pdf?attredirects=0"
  >Printing floating-point numbers quickly and accurately with integers</a
  >&quot;, which represents the biggest step in this field in 20 years: he mostly figured out how to use machine integers to perform accurate rendering! Why do I say &quot;mostly&quot;? Because although Loitsch's &quot;Grisu3&quot; algorithm is very fast, it <em
  >gives up</em
  > on about 0.5% of numbers, in which case you have to fall back to Dragon4 or a derivative.</p
><p
>If you're a language runtime author, the Grisu algorithms are a big deal: Grisu3 is about 5 times faster than the algorithm used by <code
  >printf</code
  > in GNU libc, for instance. A few language implementors have already taken note: Google hired Loitsch, and the Grisu family acts as the default rendering algorithms in both the V8 and Mozilla Javascript engines (replacing David Gay's 17-year-old <code
  >dtoa</code
  > code). Loitsch has kindly released implementations of his Grisu algorithms as a library named <a href="http://code.google.com/p/double-conversion"
  ><code
    >double-conversion</code
    ></a
  >.</p
><p
>And of course I can't talk about performance without mentioning Haskell somewhere :-) I've taken Loitsch's library and written a <a href="http://hackage.haskell.org/package/double-conversion"
  >Haskell interface</a
  >, which I've measured to be 30 times faster than the default renderer used in the Haskell runtime libraries. This has some nice knock-on effects: my <a href="http://hackage.haskell.org/package/aeson"
  ><code>aeson</code> JSON library</a
  > is now 10 times faster at rendering big arrays of floating point numbers, for instance. I accidentally noticed in the course of that work that my Haskell <a href="http://hackage.haskell.org/package/text"
  ><code
    >text</code
    > Unicode library</a
  >'s UTF-8 encoder wasn't as fast as it could be, so I improved its performance by about 50% along the way. Hooray for faster code!</p
><p
>(By the way, the punnery in algorithm naming continues: the Grisu algorithms are named for <a href="http://de.wikipedia.org/wiki/Grisu,_der_kleine_Drache"
  >Grisù, the little dragon</a
  >.)</p
>]]></content:encoded>
			<wfw:commentRss>http://www.serpentine.com/blog/2011/06/29/here-be-dragons-advances-in-problems-you-didnt-even-know-you-had/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>A new week, a new JSON performance improvement</title>
		<link>http://www.serpentine.com/blog/2011/03/22/a-new-week-a-new-json-performance-improvement/</link>
		<comments>http://www.serpentine.com/blog/2011/03/22/a-new-week-a-new-json-performance-improvement/#comments</comments>
		<pubDate>Tue, 22 Mar 2011 06:53:09 +0000</pubDate>
		<dc:creator>Bryan O'Sullivan</dc:creator>
				<category><![CDATA[haskell]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://www.serpentine.com/blog/?p=817</guid>
		<description><![CDATA[It&#8217;s been a few weeks since I last wrote about the aeson library for working with JSON in Haskell, but this isn&#8217;t because I&#8217;ve been idle. In fact, just tonight I put out a new release. Where the previous releases focused on parsing performance, this one focuses on encoding performance. And the performance news is [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s been a few weeks since I last wrote about the <a href="http://hackage.haskell.org/package/aeson">aeson library</a> for working with JSON in Haskell, but this isn&#8217;t because I&#8217;ve been idle. In fact, just tonight I put out a new release. Where the previous releases focused on parsing performance, this one focuses on encoding performance.</p>

<p>And the performance news is good: on real-world data, I&#8217;ve improved encoding performance by about a factor of 4. Why don&#8217;t we let the graphs do the talking?</p>

<a href="http://www.serpentine.com/wordpress/wp-content/uploads/2011/03/shot.png"><img src="http://www.serpentine.com/wordpress/wp-content/uploads/2011/03/shot.png" alt="Encoding performance" title="shot" width="308" height="257" class="aligncenter size-full wp-image-818" /></a>

<p>Enjoy!</p>]]></content:encoded>
			<wfw:commentRss>http://www.serpentine.com/blog/2011/03/22/a-new-week-a-new-json-performance-improvement/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>A little care and feeding can go a long way</title>
		<link>http://www.serpentine.com/blog/2011/03/18/a-little-care-and-feeding-can-go-a-long-way/</link>
		<comments>http://www.serpentine.com/blog/2011/03/18/a-little-care-and-feeding-can-go-a-long-way/#comments</comments>
		<pubDate>Fri, 18 Mar 2011 06:36:09 +0000</pubDate>
		<dc:creator>Bryan O'Sullivan</dc:creator>
				<category><![CDATA[haskell]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://www.serpentine.com/blog/?p=805</guid>
		<description><![CDATA[Sometimes, when a software package meets a certain level of maturity (or the desire to hack on it fades sufficiently), it's tempting to consider it &#34;done&#34;. Here's a little tale of when done isn't really done. About a week ago, I received a message from Finlay Thompson asking about my Haskell statistics package: he wanted to [...]]]></description>
			<content:encoded><![CDATA[<p
>Sometimes, when a software package meets a certain level of maturity (or the desire to hack on it fades sufficiently), it's tempting to consider it &quot;done&quot;. Here's a little tale of when done isn't really done.</p
><p
>About a week ago, I received a message from Finlay Thompson asking about my Haskell <a href="http://hackage.haskell.org/package/statistics"
  >statistics</a
  > package: he wanted to know how to generate pseudo-random variables using it. I redirected him from that to my <a href="http://hackage.haskell.org/package/mwc-random"
  >mwc-random</a
  > package, where my pseudo-random number generation code lives.</p
><p
>The mwc-random package currently provides generators for two widely used distributions: uniform and normal. When I was originally writing it, I paid particular attention to making it high quality, fast, and easy to use.</p
><p
>&quot;High quality&quot; sounds a little nebulous, but in the world of pseudo-random number generation, it's actually pretty well defined: a good PRNG should have a large period (the number of samples you need to pull out of it before it repeats itself, assuming a good seed), and the numbers it generates should withstand stringent tests of apparent independence (simply put, given one datum, you shouldn't be able to predict the next).</p
><p
>One algorithm that satisfies these criteria of quality is George Marsaglia's <a href="http://en.wikipedia.org/wiki/Multiply-with-carry_(random_number_generator)"
  >multiply-with-carry</a
  > algorithm MWC256 (also known as MWC8222). It has a period of about 2<sup>8222</sup> (huge enough for all conceivable practical purposes), and stands up well to the Diehard tests and to the BigCrush battery of the TestU01 suite.</p
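><p
>To make the multiply-with-carry idea concrete, here is a toy lag-1 generator. It is only a sketch of the recurrence, not MWC256 itself (which keeps a lag of 256 words) and not the code in mwc-random; the multiplier and the names <code>MWC</code>, <code>step</code> and <code>samples</code> are assumptions made for illustration.</p
>

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word32, Word64)

-- | State of a toy lag-1 multiply-with-carry generator: the previous
-- 32-bit output and the current carry. (Hypothetical type, for
-- illustration only.)
data MWC = MWC !Word32 !Word32
  deriving (Eq, Show)

-- | Illustrative multiplier. This is an assumption of the sketch, not
-- the constant used by MWC256 or by mwc-random.
multiplier :: Word64
multiplier = 4294957665

-- | One step: compute t = a*x + c in 64 bits. The low 32 bits of t are
-- the next output and the high 32 bits are the next carry.
step :: MWC -> (Word32, MWC)
step (MWC x c) = (x', MWC x' c')
  where
    t  = multiplier * fromIntegral x + fromIntegral c
    x' = fromIntegral (t .&. 0xFFFFFFFF)
    c' = fromIntegral (t `shiftR` 32)

-- | The first n outputs produced from a seed state.
samples :: Int -> MWC -> [Word32]
samples n s0 = take n (go s0)
  where
    go s = let (x, s') = step s in x : go s'

main :: IO ()
main = mapM_ print (samples 5 (MWC 12345 67890))
```

<p
>The whole state is two machine words and each step is one multiply, one add and one shift, which is where the speed comes from. A real generator needs a much longer lag to reach a period anywhere near the one quoted above.</p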
><p
>Due to its simplicity, MWC256 is also very fast, and under appropriate circumstances (e.g. on a 64-bit machine) it can be even faster than the well known Mersenne Twister algorithm (which also fails some statistical tests that MWC256 passes).</p
><p
>The Mersenne Twister is itself available for Haskellers to use, in the form of the <a href="http://hackage.haskell.org/package/mersenne-random"
  >mersenne-random</a
  > package. This package is a wrapper around the Mersenne Twister library, and unfortunately it imposes on its users the underlying library's typically horrible constraints born of too much Fortran programming: you can only have one PRNG per application, and it can only be used from a single thread! The mwc-random package is less restrictive: fire up as many PRNGs in different threads as you like, and they'll all operate independently. You can also use the PRNGs in either the ST or the IO monad, for further convenience.</p
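><p
>Here is a small sketch of what using a generator in both monads can look like. It assumes a reasonably recent mwc-random, where <code>create</code> and <code>uniform</code> live in <code>System.Random.MWC</code> and <code>standard</code> lives in <code>System.Random.MWC.Distributions</code>; module layout has changed between releases, so check the documentation for the version you have installed. The name <code>pureSample</code> is made up for this example.</p
>

```haskell
import Control.Monad.ST (runST)
import System.Random.MWC (create, uniform)
import System.Random.MWC.Distributions (standard)

-- | Draw one uniform Double purely, with a generator that lives in the
-- ST monad. 'create' starts from a fixed default seed, so this value is
-- the same on every run.
pureSample :: Double
pureSample = runST (create >>= uniform)

main :: IO ()
main = do
  -- An independent generator in IO, started from the same default seed.
  gen <- create
  u <- uniform gen :: IO Double
  -- A standard normal variate drawn from the same generator.
  z <- standard gen
  print (u, z)
  -- Both generators started from the same seed, so their first
  -- uniform draws should agree.
  print (u == pureSample)
```

<p
>Each call to <code>create</code> yields its own generator state, so nothing is shared between the ST computation and the IO one. For real work you would seed from the system's entropy source instead of relying on the fixed default seed.</p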
><p
>When generating normally distributed random variables, the mwc-random package uses an algorithm known as the &quot;modified ziggurat&quot;. One of the more popular algorithms for generating normally distributed variables is called the ziggurat, but its popularity belies an ill-understood quality problem: the numbers it generates aren't independent enough! It turns out that they are noticeably correlated. The modified ziggurat gives up a little speed in exchange for improved independence, and is still almost as fast.</p
><p
>The base-level performance of the random number generators looks like this on my Mac using 32-bit GHC 6.12.3, where the time quoted is to generate a single double-precision floating point number:</p
><ul
><li
  ><p
    >Uniform: 142.6 nanoseconds</p
    ></li
  ><li
  ><p
    >Normal: 15149 nanoseconds</p
    ></li
  ></ul
><p
>Where does the question of being &quot;done&quot; or not come in? Well, while poking around tonight, I was a little surprised at the large difference in speed between the uniform and normal PRNGs, so I investigated. The <a href="http://en.wikipedia.org/wiki/Ziggurat_algorithm"
  >ziggurat algorithm</a
  > gets its name from the precomputed lookup tables it uses to gain its speed. It turns out that GHC's inliner was being too aggressive with the table-related code, causing the ziggurat tables to be regenerated over and over instead of precomputed just once. Ouch!</p
><p
>One <a href="https://bitbucket.org/bos/mwc-random/changeset/123ccdb62a3a"
  >small and very quick change</a
  >, and the performance of the PRNG for normally distributed variables changed dramatically:</p
><ul
><li
  ><p
    >Before: 15149 nanoseconds</p
    ></li
  ><li
  ><p
    >After: 246.8 nanoseconds</p
    ></li
  ></ul
><p
>That's a little over 61 times faster. Not bad for a couple of lines of changes!</p
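><p
>For anyone who hasn't run into this class of problem, the usual remedy can be sketched as follows. This is an illustration of the general technique, not the mwc-random code: <code>table</code> and <code>lookupEntry</code> are hypothetical names, the table contents are arbitrary, and the assumption that a <code>NOINLINE</code> pragma is the right tool here should be checked against the changeset linked above.</p
>

```haskell
import Data.Array (Array, listArray, (!))

-- | Hypothetical lookup table standing in for the ziggurat tables.
-- As a top-level constant it is evaluated at most once per run, as
-- long as GHC keeps it as a single shared definition.
table :: Array Int Double
table = listArray (0, 127) (map entry [0 .. 127])
  where
    entry i = exp (negate (fromIntegral (i :: Int) / 32))
-- Ask GHC not to copy the definition into its call sites, where the
-- table could end up being rebuilt on every use.
{-# NOINLINE table #-}

-- | Look up an entry, wrapping the index into the table's range.
lookupEntry :: Int -> Double
lookupEntry i = table ! (i `mod` 128)

main :: IO ()
main = print (sum (map lookupEntry [0 .. 999999]))
```

<p
>The pragma changes nothing about what the program computes; it only constrains where the optimizer may duplicate work, which is why a fix like this can be so small and still matter so much.</p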
><p
>As a final note, now that GHC can build 64-bit programs on a Mac, you might wonder how it performs. Here's a comparison between 32-bit and 64-bit versions of GHC 7.0.2 (times in nanoseconds):</p
><table>
  <tr>
    <th align="left">type</th><th>32-bit</th><th>64-bit</th><th>speedup</th>
  </tr>
  <tr>
    <td>uniform Double</td><td align="right">148</td><td align="right">28</td><td align="right">5.3</td>
  </tr>
  <tr>
    <td>uniform Int32</td><td align="right">53</td><td align="right">16.7</td><td align="right">3.2</td>
  </tr>
  <tr>
    <td>normal Double</td><td align="right">252</td><td align="right">62</td><td align="right">4.1</td>
  </tr>
</table>
<p
>Those are some pretty nice performance improvements! Of course, not all applications come in the form of nice tight numeric kernels, so don't take it as given that you'll see improvements like this in your code.</p
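><p
>The speedup figures in this post are plain ratios of the measured times, which is easy to check. The helper names <code>speedup</code> and <code>round1</code> below are made up for this sketch.</p
>

```haskell
-- | Speedup factor: how many times faster the second timing is.
speedup :: Double -> Double -> Double
speedup before after = before / after

-- | Round to one decimal place, as in the table.
round1 :: Double -> Double
round1 x = fromIntegral (round (x * 10) :: Integer) / 10

main :: IO ()
main = do
  -- Normal PRNG before and after the fix (nanoseconds).
  print (speedup 15149 246.8)
  -- 32-bit versus 64-bit timings from the table (nanoseconds).
  mapM_ (print . round1 . uncurry speedup)
    [(148, 28), (53, 16.7), (252, 62)]
```

<p
>Since these are single-number microbenchmark timings, treat the ratios as rough indications.</p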
>]]></content:encoded>
			<wfw:commentRss>http://www.serpentine.com/blog/2011/03/18/a-little-care-and-feeding-can-go-a-long-way/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Faster, better, cleaner: new aeson and attoparsec releases</title>
		<link>http://www.serpentine.com/blog/2011/02/25/faster-better-cleaner-new-aeson-and-attoparsec-releases/</link>
		<comments>http://www.serpentine.com/blog/2011/02/25/faster-better-cleaner-new-aeson-and-attoparsec-releases/#comments</comments>
		<pubDate>Fri, 25 Feb 2011 19:24:51 +0000</pubDate>
		<dc:creator>Bryan O'Sullivan</dc:creator>
				<category><![CDATA[haskell]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://www.serpentine.com/blog/?p=785</guid>
		<description><![CDATA[I&#8217;ve spent some time over the past few weeks improving the performance of the attoparsec parsing library, and of the aeson JSON library. Since they&#8217;ve now reached a new plateau of performance and stability, I thought this would be a good time to release new versions. The major advance in the new version of aeson [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve spent some time over the past few weeks improving the performance of the <a href="http://hackage.haskell.org/package/attoparsec">attoparsec</a> parsing library, and of the <a href="http://hackage.haskell.org/package/aeson">aeson</a> JSON library. Since they&#8217;ve now reached a new plateau of performance and stability, I thought this would be a good time to release new versions.</p>

<p>The major advance in the new version of aeson is a considerable speed improvement.</p>

<a href="http://www.serpentine.com/wordpress/wp-content/uploads/2011/02/flump.png"><img src="http://www.serpentine.com/wordpress/wp-content/uploads/2011/02/flump.png" alt="Performance improvement" title="Performance improvement" width="356" height="307" class="size-full wp-image-789" /></a>

<p>The datasets I&#8217;m using are Twitter search results, from the Twitter JSON search API. For mostly-English results, 0.2.0.0 is up to 30% faster than before, while on Japanese data (which makes heavy use of Unicode escapes), I&#8217;ve bumped performance by more than 50%.</p>

<p>To see how well aeson performs compared to JSON parsers for other languages, I compared it against the <tt>json</tt> module in Python 2.7. That module&#8217;s JSON parser is written in C, so it&#8217;s very fast indeed, and the amount of actual Python being executed in my microbenchmark is tiny. How do we fare?</p>

<a href="http://www.serpentine.com/wordpress/wp-content/uploads/2011/02/bumf.png"><img src="http://www.serpentine.com/wordpress/wp-content/uploads/2011/02/bumf.png" alt="JSON parsing performance" title="JSON parsing performance" width="439" height="399" class="size-full wp-image-787" /></a>

<p>On mostly-English data, aeson is actually <i>faster</i> than Python&#8217;s native-code <tt>json</tt> parser. Nice! And on Japanese data, we&#8217;re a little slower, but still very competitive.</p>

<p>What if you&#8217;ve been using the Haskell <a href="http://hackage.haskell.org/package/json">json</a> package, which was the first open source Haskell JSON parser to be published? Well, I do think that aeson is easier to use, but it&#8217;s also 3x faster than the json package:</p>

<a href="http://www.serpentine.com/wordpress/wp-content/uploads/2011/02/flurb.png"><img src="http://www.serpentine.com/wordpress/wp-content/uploads/2011/02/flurb.png" alt="aeson vs json" title="aeson vs json" width="340" height="312" class="size-full wp-image-792" /></a>

<p>The new version of aeson introduces some other useful improvements.</p>
<ul>
<li>There&#8217;s a new Generic module, which lets you convert almost any instance of the Data typeclass to and from JSON without writing boilerplate code. (Be warned: generics are slow. If performance is important to you, write that boilerplate!)</li>
<li>We introduce a Number type that represents integers to full accuracy, and which handles floating point numbers efficiently.</li>
<li>Instead of parsing via the Applicative typeclass, we now use a custom parsing monad, improving both ease of use and performance.</li>
</ul>]]></content:encoded>
			<wfw:commentRss>http://www.serpentine.com/blog/2011/02/25/faster-better-cleaner-new-aeson-and-attoparsec-releases/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>So I went and started a company</title>
		<link>http://www.serpentine.com/blog/2011/01/28/so-i-went-and-started-a-company/</link>
		<comments>http://www.serpentine.com/blog/2011/01/28/so-i-went-and-started-a-company/#comments</comments>
		<pubDate>Fri, 28 Jan 2011 00:14:15 +0000</pubDate>
		<dc:creator>Bryan O'Sullivan</dc:creator>
				<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://www.serpentine.com/blog/?p=781</guid>
		<description><![CDATA[I&#8217;m delighted to say that after a couple of years of a break from the startup world (which I&#8217;ve inhabited for most of the past decade), I&#8217;ve decided to throw my hat back into the ring. Together with Bethanye Blount, I&#8217;ve started a company named MailRank. We&#8217;re working on helping people to manage the all-too-common [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m delighted to say that after a couple of years of a break from the startup world (which I&#8217;ve inhabited for most of the past decade), I&#8217;ve decided to throw my hat back into the ring. Together with Bethanye Blount, I&#8217;ve started a company named MailRank. We&#8217;re working on helping people to manage the all-too-common problem of email overload. You can read a little more about what we&#8217;re up to in our <a href="http://blog.mailrank.com/hello-world">initial announcement</a>, and we&#8217;ll have more details to share soon.</p>

<p>Since I&#8217;ve spent the past two decades in the world of open source, of course <a href="http://engineering.mailrank.com/introducing-some-open-source-technologies">I have goodies to show off</a>:</p>

<ul>
<li>A fast, powerful library for working with the Riak decentralized data store (<a href="https://github.com/mailrank/riak-haskell-client">mailrank/riak-haskell-client</a> on github). I think it&#8217;s the best Riak client library available for any programming language <img src='http://www.serpentine.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </li>
<li>An efficient, easy to use JSON library for Haskell (<a href="https://github.com/mailrank/aeson">mailrank/aeson</a> on github). Twice as fast as the competition.</li>
</ul>

<p>I&#8217;m excited to be working with Bethanye once again, and thrilled that we&#8217;re going to have our first employee join in ten days.</p>]]></content:encoded>
			<wfw:commentRss>http://www.serpentine.com/blog/2011/01/28/so-i-went-and-started-a-company/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

