Two-dimensional spatial hashing with space-filling curves

Posted on 2007-01-11 by Bryan O'Sullivan — 9 Comments ↓

The Hilbert curve is a fractal space-filling curve that is rather pretty to look at.

A Hilbert curve of order n traces a single path over a square of side 2^n units, as you can see in the images from MathWorld above (with curves of order 2 through 6). If you pick a point p from within the square covered by a curve, and trace out the curve until you reach that point, the number of intermediate points that you passed through along your way to p is called the Hilbert distance. The notion of distance along a space-filling curve is useful for a few reasons.

You can use it as a hash from a 2D coordinate to an integer.
Some curves preserve locality; for example, points that are nearby in 2D space are likely to have similar Hilbert distances.
It’s an isomorphism: you can turn a distance back into a 2D coordinate, in some cases quite easily.
Locality-preserving hashes can be useful for clustering nodes of an R-tree, which is the generalisation of a B-tree to spatial data.

There’s a nice algorithm for computing the Hilbert distance of a 2D point that’s linear in the order of the curve (or logarithmic in the size of the side of the square you’re trying to cover). To put this into useful terms, a “10 megapixel” camera typically has a resolution of 3766×2800 pixels. To cover every pixel with a Hilbert curve, you’d need a curve with 4096 points on a side, which is an order-12 curve. You can compute the Hilbert distance of any point in your 10-megapixel photo with 12 steps of the distance algorithm. Here’s an implementation of this Hilbert distance algorithm in Haskell.

import Data.Bits

hilbertDistance :: (Bits a, Ord a) => Int -> (a,a) -> a

hilbertDistance d (x,y)

    | x < 0 || x >= 1 `shiftL` d = error "x bounds"

    | y < 0 || y >= 1 `shiftL` d = error "y bounds"

    | otherwise = dist (1 `shiftL` (d - 1)) (1 `shiftL` ((d - 1) * 2)) 0 x y

    where dist 0 _ result _ _ = result

          dist side area result x y =

              case (compare x side, compare y side) of

              (LT, LT) -> step result y x

              (LT, _)  -> step (result + area) x (y - side)

              (_, LT)  -> step (result + area * 3) (side - y - 1)

                          (side * 2 - x - 1)

              (_, _)   -> step (result + area * 2) (x - side) (y - side)

              where step = dist (side `shiftR` 1) (area `shiftR` 2)

As you can see, each step requires a few comparisons, a few shifts, and a handful of arithmetic operations, so it’s fairly cheap, and the loop unrolls and constant-folds well. There’s some controversy in the literature over whether a Hilbert curve gives better spatial locality than a Lebesgue (also known as a Z-order, or Morton) curve. It’s certainly the case that computing the Lebesgue distance is much cheaper than the Hilbert distance, as it’s possible to finagle an answer using a fixed, unconditional sequence of bit-twiddling operations to interleave the bits of the x coordinate with those of the y.

lebesgueDistance :: (Int, Int) -> Int

lebesgueDistance (x,y) =

    let ms = [(0x00FF00FF, 8), (0x0F0F0F0F, 4),

              (0x33333333, 2), (0x55555555, 1)]

        aaargh x (mask,shift) = (x .|. (x `shiftL` shift)) .&. mask

        x' = foldl aaargh x ms

        y' = foldl aaargh y ms

    in x .|. (y `shiftL` 1)

I haven’t myself checked to see which curve better preserves locality. It’s certainly easier to see how to extend the Lebesgue bit-fiddling hack to a higher dimension, while the given Hilbert code is a bit of a head wrecker.

Posted in haskell, software

9 comments on “Two-dimensional spatial hashing with space-filling curves”

Ben Collins-Sussman says:

2007-01-11 at 04:45

Hey Bryan, why the interest in fractal curves? Are you using them in some application?
Patrick MÃ©zard says:

2007-01-11 at 04:59

For the record, I did performance comparison of Hilbert vs Z-order indexing for a cartographic tiling or Western Europe, mostly containing the road network and water/forest/building features. The indexed database was something like 15/20Gb of vector data, test machine looks like a bi-PIII-700Mhz with 1Gb of memory and SCSI disks (was 7 years ago).

I did not find relevant differences despite what several papers/thesis claim (Moon,Faloutsos&Jagadish 1996, Lawder 2001). Maybe the code/tests I wrote were incorrect or biased in a manner which invalidated the expected locality gains, anyway nothing spectacular came from it.

I was sorry about it, I thought it was a cool idea. Too bad it just did not work for me.
Bryan O'Sullivan says:

2007-01-11 at 08:38

Ben – a number of years ago, I wrote a spatial database in Scheme, using Hilbert R-trees. The fast Hilbert distance algorithm was a useful little snippet that fit nicely into the constraints of a blog posting
Malcolm Rowe says:

2007-01-12 at 07:41

Nice, I was aware of Hilbert curves, but I hadn’t heard about Lebesgue curves before!

See also: http://xkcd.com/c195.html
Ezra says:

2007-05-13 at 13:10

Hello–

Is there a typo in the lebesgue algorithm? It appears you’re defining x’ and y’ and then ignoring them. Is the last line meant to be

in x’ .|. (y’ `shiftL` 1)

instead? It seems to work better that way.
jberryman says:

2009-05-14 at 10:51

Cool, thanks! I wonder if space-filling curves can be used in functional languages to represent a grid of mutable objects, like a checker board.
taotree says:

2011-09-04 at 05:29

Thanks for the info and especially the code implementations.

Ezra is correct, there is a typo. I used the code given and it results in incorrect values. I took Ezra’s fix and it looks much better now. It would be nice if the author could correct the code in the post.
Mats Rauhala says:

2012-01-10 at 23:39

Could you point on to some material about hilbert curves, especially about mapping 2d->1d, 1d->2d?
Carter T Schonwald says:

2013-04-16 at 12:45

I”ll actually have some benchmarks on morton vs hilbert layouts for linear algebra soon. It also looks like a straightline loopless hilbert algorithm *may* exist, trying to sus out the details shortly. (will certainly be bigger than the morton algorithm, but thats a small price for better locality!)