For a while now, I’ve had it in mind to improve the encoding performance of my Haskell JSON package, aeson.
Over the weekend, I went from hazy notion to a proof of concept for what I think could be a reasonable approach.
This post is a case of me “thinking out loud” about the initial design I came up with. I’m very interested in hearing if you have a cleaner idea.
The problem with the encoding method currently used by aeson is that it occurs via a translation to the Value type. While this is simple and uniform, it involves a large amount of intermediate work that is essentially wasted. When encoding a complex value, the Value that we build up is expensive, and it will become garbage immediately.
It should be much more efficient to simply serialize straight to a Builder, the type that is optimized for concatenating many short string fragments. But before marching down that road, I want to make sure that I provide a clean API that is easy to use correctly.
I’ve posted a gist that contains a complete copy of this proof-of-concept code.
{-# LANGUAGE GeneralizedNewtypeDeriving, FlexibleInstances,
OverloadedStrings #-}
import Data.Monoid (Monoid(..), (<>))
import Data.Text (Text)
import Data.Text.Lazy.Builder (Builder, singleton)
import qualified Data.Text.Lazy.Builder as Builder
import qualified Data.Text.Lazy.Builder.Int as Builder
The core Build type has a phantom type parameter that allows us to say “I am encoding a value of type a”. We’ll see where this type tracking is helpful (and annoying) below.
data Build a = Build {
      _count :: !Int
    , run    :: Builder
    }
The internals of the Build type would be hidden from users; here’s what they mean. The _count field tracks the number of elements we’re encoding of an aggregate JSON value (an array or object); we’ll see why this matters shortly. The run field lets us access the underlying Builder.
We provide three empty types to use as parameters for the Build type.
data Object
data Array
data Mixed
We’ll want to use the Mixed type if we’re cramming a set of disparate Haskell values into a JSON array; read on for more.
When it comes to gluing values together, the Monoid class is exactly what we need.
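-- Newer GHC releases require a Semigroup instance alongside Monoid.
instance Semigroup (Build a) where
    Build i a <> Build j b
      -- Insert a comma only between two non-empty sides, so that
      -- mempty stays an identity even next to a multi-element value.
      | i > 0 && j > 0 = Build ij (a <> singleton ',' <> b)
      | otherwise      = Build ij (a <> b)
      where ij = i + j

instance Monoid (Build a) where
    mempty = Build 0 mempty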
Here’s where the _count field comes in; we want to separate elements of an array or object using commas, but this is necessary only when the array or object contains more than one value.
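To make the comma bookkeeping concrete, here is a minimal, self-contained sketch of the same idea. It uses a stripped-down copy of the Build type; the helpers `item` and `toText` are hypothetical names invented for this illustration. The guard in this sketch fires only when both sides are non-empty, so that `mempty` remains a true identity.

```haskell
import Data.Text.Lazy.Builder (Builder, fromString, singleton, toLazyText)
import qualified Data.Text.Lazy as L

-- A stripped-down copy of the Build type: an element count plus a Builder.
data Build a = Build !Int Builder

instance Semigroup (Build a) where
  Build i a <> Build j b
    -- A comma is needed only when both sides contribute elements.
    | i > 0 && j > 0 = Build (i + j) (a <> singleton ',' <> b)
    | otherwise      = Build (i + j) (a <> b)

instance Monoid (Build a) where
  mempty = Build 0 mempty

-- Hypothetical helper: a single element whose text is taken verbatim.
item :: String -> Build a
item = Build 1 . fromString

-- Hypothetical helper: extract the accumulated text.
toText :: Build a -> L.Text
toText (Build _ b) = toLazyText b
```

With these definitions, `toText (item "1" <> item "2" <> item "3")` yields `"1,2,3"`, while `toText (mempty <> item "1")` yields `"1"` with no stray comma.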
To encode a simple value, we provide a few obvious helpers. (These are clearly so simple as to be wrong, but remember: my purpose here is to explore the API design, not to provide a proper implementation.)
build :: Builder -> Build a
build = Build 1
int :: Integral a => a -> Build a
int = build . Builder.decimal
text :: Text -> Build Text
text = build . Builder.fromText
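One way in which these helpers are too simple is that `text` emits its argument verbatim, with no quoting or escaping. As a rough sketch of what a less naive version might build on, the following quotes a Text and escapes the characters that JSON requires to be escaped (the double quote, the backslash, and control characters below U+0020). The names `quoted`, `escape`, and `quotedText` are made up for this illustration; they are not part of aeson or of the gist.

```haskell
import Data.Char (ord)
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Lazy as L
import Data.Text.Lazy.Builder (Builder, fromString, singleton, toLazyText)
import Numeric (showHex)

-- Hypothetical helper: wrap a Text in double quotes, escaping as we go.
quoted :: Text -> Builder
quoted t = singleton '"' <> T.foldr step mempty t <> singleton '"'
  where step c acc = escape c <> acc

-- Hypothetical helper: escape one character per the JSON string grammar.
escape :: Char -> Builder
escape '"'  = fromString "\\\""
escape '\\' = fromString "\\\\"
escape '\n' = fromString "\\n"
escape '\r' = fromString "\\r"
escape '\t' = fromString "\\t"
escape c
  | c < '\x20' = fromString ("\\u" ++ pad (showHex (ord c) ""))
  | otherwise  = singleton c
  where pad s = replicate (4 - length s) '0' ++ s

-- Convenience for trying the sketch out in ghci.
quotedText :: Text -> L.Text
quotedText = toLazyText . quoted
```

Under these assumptions, a quoting variant of the helper could be written as `text = build . quoted`. This sketch favours clarity over speed; a production encoder would likely copy unescaped runs in bulk rather than go character by character.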
Encoding a JSON array is easy.
array :: Build a -> Build Array
array (Build 0 _) = build "[]"
array (Build _ vs) = build $ singleton '[' <> vs <> singleton ']'
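To inspect values like these in ghci, something has to turn a Build back into text. The snippets in this post do not show that step, so the following is only a guess at one way to do it: a `render` function and a Show instance, both hypothetical additions for this sketch. The surrounding definitions are compact copies of the post's code so that the block stands alone.

```haskell
import Data.Text.Lazy.Builder (Builder, singleton, toLazyText)
import qualified Data.Text.Lazy as L
import qualified Data.Text.Lazy.Builder.Int as Builder

-- Compact copies of the post's definitions.
data Build a = Build { _count :: !Int, run :: Builder }
data Array

instance Semigroup (Build a) where
  Build i a <> Build j b
    | i > 0 && j > 0 = Build (i + j) (a <> singleton ',' <> b)
    | otherwise      = Build (i + j) (a <> b)

instance Monoid (Build a) where
  mempty = Build 0 mempty

int :: Integral a => a -> Build a
int = Build 1 . Builder.decimal

-- An empty Build carries an empty Builder, so one equation covers "[]" too.
array :: Build a -> Build Array
array (Build _ vs) = Build 1 (singleton '[' <> vs <> singleton ']')

-- Hypothetical addition: run the Builder to obtain lazy Text.
render :: Build a -> L.Text
render = toLazyText . run

-- Hypothetical addition: display a Build in ghci as its rendered text.
instance Show (Build a) where
  show = show . render
```

The Show instance simply reuses the one for lazy Text, which is why the result appears wrapped in double quotes at the prompt.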
If we try this out in ghci, it behaves as we might hope.
?> array $ int 1 <> int 2
"[1,2]"
JSON puts no constraints on the types of the elements of an array. Unfortunately, our phantom type causes us difficulty here.
An expression of this form will not typecheck, as it’s trying to join a Build Int with a Build Text.
?> array $ int 1 <> text "foo"
This is where the Mixed type from earlier comes in. We use it to forget the original phantom type so that we can construct an array with elements of different types.
mixed :: Build a -> Build Mixed
mixed (Build a b) = Build a b
Our new mixed function gets the types to be the same, giving us something that typechecks.
?> array $ mixed (int 1) <> mixed (text "foo")
"[1,foo]"
This seems like a fair compromise to me. A Haskell programmer will normally want the types of values in an array to be the same, so the default behaviour of requiring this makes sense (at least to my current thinking), but we get a back door for when we absolutely have to go nuts with mixing types.
The last complication stems from the need to build JSON objects. Each key in an object must be a string, but the value can be of any type.
-- Encode a key-value pair.
(<:>) :: Build Text -> Build a -> Build Object
k <:> v = Build 1 (run k <> ":" <> run v)
object :: Build Object -> Build Object
object (Build 0 _) = build "{}"
object (Build _ kvs) = build $ singleton '{' <> kvs <> singleton '}'
If you’ve had your morning coffee, you’ll notice that I am not living up to my high-minded principles from earlier. Perhaps the types involved here should be something closer to this:
data Object a
(<:>) :: Build Text -> Build a -> Build (Object a)
object :: Build (Object a) -> Build (Object a)
(In which case we’d need a mixed-like function to forget the phantom types for when we want to get mucky and unsafe, but I digress.)
How does this work out in practice?
?> object $ "foo" <:> int 1 <> "bar" <:> int 3
"{foo:1,bar:3}"
Hey look, that’s more or less as we might have hoped!
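For comparison, here is a rough sketch of how the alternative with a phantom parameter on Object might look when filled in. It is only one possible reading of the signatures given earlier. The function `mixedPair` is a hypothetical name for the mixed-like escape hatch, and `render`, `homogeneous`, and `heterogeneous` exist purely for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text.Lazy as L
import Data.Text.Lazy.Builder (Builder, fromText, singleton, toLazyText)
import qualified Data.Text.Lazy.Builder.Int as Builder

data Build a = Build { _count :: !Int, run :: Builder }

instance Semigroup (Build a) where
  Build i a <> Build j b
    | i > 0 && j > 0 = Build (i + j) (a <> singleton ',' <> b)
    | otherwise      = Build (i + j) (a <> b)

instance Monoid (Build a) where
  mempty = Build 0 mempty

data Object a  -- the phantom parameter records the type of the values
data Mixed

int :: Integral a => a -> Build a
int = Build 1 . Builder.decimal

-- As in the post, text is emitted verbatim (no quoting).
text :: Text -> Build Text
text = Build 1 . fromText

(<:>) :: Build Text -> Build a -> Build (Object a)
k <:> v = Build 1 (run k <> singleton ':' <> run v)

object :: Build (Object a) -> Build (Object a)
object (Build _ kvs) = Build 1 (singleton '{' <> kvs <> singleton '}')

-- Hypothetical escape hatch: forget the value type of a key-value pair.
mixedPair :: Build (Object a) -> Build (Object Mixed)
mixedPair (Build n b) = Build n b

render :: Build a -> L.Text
render = toLazyText . run

-- All values share one type, so no escape hatch is needed.
homogeneous :: Build (Object Int)
homogeneous = object (text "foo" <:> int 1 <> text "bar" <:> int 3)

-- Values of different types must each pass through mixedPair.
heterogeneous :: Build (Object Mixed)
heterogeneous =
  object (mixedPair (text "n" <:> int (1 :: Int)) <> mixedPair (text "s" <:> text "x"))
```

Here `render homogeneous` gives `"{foo:1,bar:3}"` and `render heterogeneous` gives `"{n:1,s:x}"`. The cost of the extra safety is visible in the second definition: every pair in a heterogeneous object needs its own `mixedPair`.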
Open questions, for which I appeal to you for help:
Does this design appeal to you at all?
If not, what would you change?
If yes, to what extent am I wallowing in the “types for thee, but not for me” sin bin by omitting a phantom parameter for Object?
Helpful answers welcome!