Two-dimensional spatial hashing with space-filling curves
January 11th, 2007 by Bryan O'Sullivan
The Hilbert curve is a fractal space-filling curve that is rather pretty to look at.
A Hilbert curve of order n traces a single path over a square of side 2^n units, as you can see in the images from MathWorld above (with curves of order 2 through 6).
If you pick a point p from within the square covered by a curve, and trace out the curve until you reach that point, the number of intermediate points that you passed through along your way to p is called the Hilbert distance. The notion of distance along a space-filling curve is useful for a few reasons.
A Hilbert curve of order n traces a single path over a square of side 2^n units, as you can see in the images from MathWorld above (with curves of order 2 through 6).
If you pick a point p from within the square covered by a curve, and trace out the curve until you reach that point, the number of intermediate points that you passed through along your way to p is called the Hilbert distance. The notion of distance along a space-filling curve is useful for a few reasons.
- You can use it as a hash from a 2D coordinate to an integer.
- Some curves preserve locality; for example, points that are nearby in 2D space are likely to have similar Hilbert distances.
- It’s an isomorphism: you can turn a distance back into a 2D coordinate, in some cases quite easily.
- Locality-preserving hashes can be useful for clustering nodes of an R-tree, which is the generalisation of a B-tree to spatial data.
import Data.Bits
hilbertDistance :: (Bits a, Ord a) => Int -> (a,a) -> a
hilbertDistance d (x,y)
| x < 0 || x >= 1 `shiftL` d = error “x bounds”
| y < 0 || y >= 1 `shiftL` d = error “y bounds”
| otherwise = dist (1 `shiftL` (d - 1)) (1 `shiftL` ((d - 1) * 2)) 0 x y
where dist 0 _ result _ _ = result
dist side area result x y =
case (compare x side, compare y side) of
(LT, LT) -> step result y x
(LT, _) -> step (result + area) x (y - side)
(_, LT) -> step (result + area * 3) (side - y - 1)
(side * 2 - x - 1)
(_, _) -> step (result + area * 2) (x - side) (y - side)
where step = dist (side `shiftR` 1) (area `shiftR` 2)
As you can see, each step requires a few comparisons, a few shifts, and a handful of arithmetic operations, so it’s fairly cheap, and the loop unrolls and constant-folds well.
There’s some controversy in the literature over whether a Hilbert curve gives better spatial locality than a Lebesgue (also known as a Z-order, or Morton) curve. It’s certainly the case that computing the Lebesgue distance is much cheaper than the Hilbert distance, as it’s possible to finagle an answer using a fixed, unconditional sequence of bit-twiddling operations to interleave the bits of the x coordinate with those of the y.
lebesgueDistance :: (Int, Int) -> Int
lebesgueDistance (x,y) =
let ms = [(0x00FF00FF, 8), (0x0F0F0F0F, 4),
(0x33333333, 2), (0x55555555, 1)]
aaargh x (mask,shift) = (x .|. (x `shiftL` shift)) .&. mask
x' = foldl aaargh x ms
y' = foldl aaargh y ms
in x .|. (y `shiftL` 1)
I haven’t myself checked to see which curve better preserves locality. It’s certainly easier to see how to extend the Lebesgue bit-fiddling hack to a higher dimension, while the given Hilbert code is a bit of a head wrecker.
Hey Bryan, why the interest in fractal curves? Are you using them in some application?
For the record, I did performance comparison of Hilbert vs Z-order indexing for a cartographic tiling or Western Europe, mostly containing the road network and water/forest/building features. The indexed database was something like 15/20Gb of vector data, test machine looks like a bi-PIII-700Mhz with 1Gb of memory and SCSI disks (was 7 years ago).
I did not find relevant differences despite what several papers/thesis claim (Moon,Faloutsos&Jagadish 1996, Lawder 2001). Maybe the code/tests I wrote were incorrect or biased in a manner which invalidated the expected locality gains, anyway nothing spectacular came from it.
I was sorry about it, I thought it was a cool idea. Too bad it just did not work for me.
Ben - a number of years ago, I wrote a spatial database in Scheme, using Hilbert R-trees. The fast Hilbert distance algorithm was a useful little snippet that fit nicely into the constraints of a blog posting
Nice, I was aware of Hilbert curves, but I hadn’t heard about Lebesgue curves before!
See also: http://xkcd.com/c195.html
[...] A nice way to hash a 2-D point, while preserving some locality. Link. [...]
Hello–
Is there a typo in the lebesgue algorithm? It appears you’re defining x’ and y’ and then ignoring them. Is the last line meant to be
in x’ .|. (y’ `shiftL` 1)
instead? It seems to work better that way.
Fascinating site and well worth the visit. I will be back