Here’s something I bet you never think about, and for good reason: how are floating-point numbers rendered as text strings? This is a surprisingly tough problem, but it’s been regarded as essentially solved since about 1990.

Prior to Steele and White’s "How to print floating-point numbers accurately", implementations of `printf`

and similar rendering functions did their best to render floating point numbers, but there was wide variation in how well they behaved. A number such as 1.3 might be rendered as 1.29999999, for instance, or if a number was put through a feedback loop of being written out and its written representation read back, each successive result could drift further and further away from the original.

Steele and White effectively solved the problem with a clever algorithm named "Dragon4" (the fourth version of the "Dragon" algorithm, which acquired its name because the authors were inspired to obscure puns by Heighway’s dragon curve).

The Dragon4 algorithm spread quickly across language runtimes, such that few programmers today understand that this was ever a problem, much less how hairy it was (and is). Indeed, prior to last year, there was almost no activity in this area: two papers proposed widely used refinements to Dragon4, and that was about it. (Alas, the problem was originally solved around a decade before Steele and White published their work, but nobody noticed. If you have a clever idea and sufficient chutzpah, try to enlist Guy Steele as a coauthor. Your work will be read.)

But how solved was the problem? Dragon4 and its derivatives are complicated and tricky, and they have a hefty performance cost, since they rely on arbitrary-precision integer arithmetic to compute their results. There might be a significant performance improvement to be gained if someone could figure out how to use native machine integers instead.

In 2010, Florian Loitsch published a wonderful paper in PLDI, "Printing floating-point numbers quickly and accurately with integers", which represents the biggest step in this field in 20 years: he mostly figured out how to use machine integers to perform accurate rendering! Why do I say "mostly"? Because although Loitsch’s "Grisu3" algorithm is very fast, it *gives up* on about 0.5% of numbers, in which case you have to fall back to Dragon4 or a derivative.

If you’re a language runtime author, the Grisu algorithms are a big deal: Grisu3 is about 5 times faster than the algorithm used by `printf`

in GNU libc, for instance. A few language implementors have already taken note: Google hired Loitsch, and the Grisu family acts as the default rendering algorithms in both the V8 and Mozilla Javascript engines (replacing David Gay’s 17-year-old `dtoa`

code). Loitsch has kindly released implementations of his Grisu algorithms as a library named `double-conversion`

.

And of course I can’t talk about performance without mentioning Haskell somewhere :-) I’ve taken Loitsch’s library and written a Haskell interface, which I’ve measured to be 30 times faster than the default renderer used in the Haskell runtime libraries. This has some nice knock-on effects: my `aeson`

JSON library is now 10 times faster at rendering big arrays of floating point numbers, for instance. I accidentally noticed in the course of that work that my Haskell `text`

Unicode library‘s UTF-8 encoder wasn’t as fast as it could be, so I improved its performance by about 50% along the way. Hooray for faster code!

(By the way, the punnery in algorithm naming continues: the Grisu algorithms are named for GrisÃ¹, the little dragon.)

Haha, wow! I had no idea I had this problem, and definitely not that it has been improved!

You’ve just shown me a new, lovely facet of the world! Very, very interesting stuff.

(No, this is not ironic. It really is awesome.)

I’ve tried once to “improve %f a bit”, and failed miserably. Since then, I know.

@koala_man: It’s amazing how many mundane things become insanely complicated once you look into it.

Well, is it a real problem? Floating-point arithmetic is inaccurate. A need to distinguish between 1 and 0.999 is a clue that other structure should be used, such as integers, fixed-point numbers or fractions.

So what the first, forgotten solution?

Was it identical to Dragon4?

@Ivan, while it’s true that floating-point arithmetic is inherently approximate there’s no need to introduce even more inaccuracy when you’re not doing arithmetic. It’s quite reasonable to expect that read (show x) == x. That said, read the paper. There is a variant that doesn’t backslide to bignums but loses accuracy on 0.2% of floats.

The paper is available for free at Dr. Loitsch’s website: https://4196667435410119098-a-loitsch-com-s-sites.googlegroups.com/a/loitsch.com/florian/publications/dtoa-pldi2010.pdf?attachauth=ANoY7co8FNccBkCKg5VapkNS0a5rJI5XJ3xWsd-BJPBU6-RKIMS-1zhAbUaa1Ek95G9Zix9M_oC9PpAdKjtgCYCgLF__NVLpqPG_VIzdOt2TrOYALMjcgmWeOXT6ZhNCw_uwxpaPvn1TzooZNlpWgjKzxO0_Vjl_8Gyed7–wYDk8WJ9THTQxeWTHSawcK6iOtfwgzKiu9CVFRwhwc3D8AdDAdJcoKjyug%3D%3D&attredirects=0

I may take a hack at modifying it to C so it can be included in python…

Informative blog post, Bryan. Thanks!

What was the trick wrto UTF-8 encoding? I’d love to integrate it also in the new bytestring builder.

No tricks, Simon, just a shortcut that I improved and a mistake that I caught.

The shortcut was that any time I needed to write one or more bytes and I got within 4 bytes of the end of the buffer, I’d grow the buffer. This meant that sometimes I’d grow the buffer when I didn’t need to. This was easy enough to fix up.

Also, I screwed up the initial buffer size calculation. I intended the initial buffer size to be either 4 bytes or the number of words in the source buffer, whichever was bigger. But I actually ended up using whichever was *smaller* – oops!

Neither of these affected correctness; they just meant that I’d resize the destination buffer more often than necessary. With them fixed, if the source Unicode text contains only ASCII, the destination bytestring never needs to be resized during encoding. Otherwise, it’s still resized more conservatively than before.

I’ve got 99 problems, but fast rendering of floating point numbers as text strings isn’t one of them.

The bug in UTF-8 encoding bug is very similar to another in Modula-3’s “Sequence” datatype (a double-ended queue based on arrays) that was one of the early bugs found by the Extended Static Checking project.

Between the paper of Steele and White in 1994 and the paper by Loitsch in 2010 others contributed greatly to this field. In particular Burger and Dybvig in 1996 wrote a paper optimising Steele’s and White’s algorithms titled ‘Printing Floating-Point Numbers Quickly and Accurately’, in homage to S & W’s title ‘How to print floating-point numbers accurately’. The paper by Loitsch builds on both of these works and shows its heritage in the title ‘Printing Floating-Point Numbers Quickly and Accurately with Integers’. For completeness, I feel it is only fair to mention these other contributors. In fact, Loitsch mentions Burger and Dybvig 1996 on the first page.

For those interested in Scheme, the Burger and Dybvig paper includes complete code, as well as elegant correctness proofs, using a mathematical approach. Recommended reading.

In the better-late-than-never category, responding a few years later to @ivan’s query about whether it matters.

Two decades ago, I worked on checkpointing systems for large scientific computations. We needed them because the computations routinely exceeded the MTBF for the workstations the scientists used (and these were state-of-the-art Sun’s).

Fortunately I’d set up tests, and everything seemed great until I noticed some tests failing—sometimes causing quite a bit of divergence over time (as these things sometimes do).

You know where this is going. All the computations were floating-point computations, and the problem was being caused because I was printing values and reading them back in. In other words, it was a failure to meet the expectation that @James Iry so concisely expressed: that (show x) == x.

A checkpoint system is a perfect illustration of why you want this, because you want its behavior to be a semantic no-op. It sure can’t go around changing the computation’s values!