yo dawg, I heard you like strings in your types so I put a type in your string so you could type check your strings while you stringify your types - Proxy "XZibit"
The saying goes that one should encode as many invariants in the type system as possible. This way bad programs don't type check, thereby affording you hours previously spent writing unit tests to now meander in the warm embrace of nihilism. With that in mind I was faced with the following problem: at work we have a json-encoded message object that has a generic payloads. However, each payload should contain the keys msg_type and data where msg_type contains a string-encoded value to indicate how to route or parse data. For example:
{ "from": "you"
, "to" : "me"
, "payload" : { "msg_type" : "identity_validation_request"
, "data" : { ... IdentityValidationData data ... }
}
}As time withers and our corporeal bodies float aimlessly through our cold, meaningless universe, the possible values of msg_type (in this example: "identity_validation_request") will increase and be litered throughout our codebase as we add various types of payload. These values are a great example of invariants that should be encoded somewhere in our type system. But how?
The goal of this post is to create a data type that encodes our "msg_type" value in its type and succesfully parses a json encoded string if the "msg_type" value matches the value in our type. It should look like this:
-- | `s` will hold the value of our msg_type, `a` is the data.
data Payload s a
-- | this is the goal
parse :: ByteString -> Maybe (Payload s a)
-- | where
jsonB :: ByteString
jsonB = "{\"msg_type\" : \"identity_validation_request\", \"data\": ... }"
-- itworks evaluates to: Just (Payload "identity_validation_request" IdentityValidationData)
itworks = decode jsonB :: Maybe (Payload "identity_validation_request" IdentityValidationData)
-- doesntparse evalutes to: Nothing
doesntparse = decode jsonB :: Maybe (Payload "xxxx" IdentityValidationData)To get there, we will:
- Sum Types: The Simplest Thing That Works
- (De)Serializing Data.Proxy
- A More General DeSerialization of Data.Proxy
- Data.Proxy woes
- Data.Reflection
- Stringly-Typed Programming for the Masses
- Next Steps
Before we dive into type-level tomfoolery, let's create the simplest thing that works. The goal is to dispatch based on the key in "msg_type". There are a few ways of doing this. We could forgoe any sense of type-safety and write FromJSON instances that inspect "msg_type" willy-nilly. However, to bring some sanity to our codebase, let's use a basic sum type called Payloads which contain all possible payloads. This will force us to put all "msg_type" string assumptions in one place.
{-# LANGUAGE OverloadedStrings #-}
module SimpleThings where
import Control.Applicative ((<$>), (<*>))
import Control.Monad (mzero)
import Data.Aeson
import qualified Data.Aeson as A
-- | sum type containing all possible payloads
data Payloads = FooPayload Int String
| InviteRequestPayload String
| IdentityValidationRequestPayload Int Int String
-- | dispatch on the value of `"msg_type"`
instance FromJSON Payloads where
parseJSON (Object v) = v .: "msg_type" >>= handlePayloadType
where
-- handle foo_payload key
handlePayloadType (A.String "foo_payload") =
v .: "data" >>= \sub -> FooPayload <$> sub .: "id" <*> sub .: "msg"
-- handle invite_request key
handlePayloadType (A.String "invite_request") =
v .: "data" >>= \sub -> InviteRequestPayload <$> sub .: "msg"
-- handle identity_validation_data key
handlePayloadType (A.String "identity_validation_data") =
v .: "data" >>= \sub -> IdentityValidationRequestPayload <$> sub .: "from"
<*> sub .: "to"
<*> sub .: "validation_msg"
-- default
handlePayloadType _ = mzero
parseJSON _ = mzeroIn this design we are expecting a json structure that looks like: {"msg_type" : "...", "data": { ... }}. To parse this we inspect the value of the "msg_type" key and dispatch accordingly. This approach is simple and forces us to keep the assumed json keys in one place. However, we loose lots of generality. This solution is not polymorphic (i.e. we aren't working with an arbitrary, abstract container) and is prone to the expression problem (add a new payload, we now have to dispatch in a bunch of places).
To improve on this we want a data type that looks like:
data Payload a = Payload { payload :: a }How and where do we encode the "msg_type" string in this polymorphic container? Note that we can no longer dispatch over the "msg_type" value anymore and return different types of a. To resolve this we'll need to take a detour into Data.Proxy.
Data.Proxy is a great example of Haskell's expressive type system. It's an incredibly simple and essential data type in Haskell's type-level swiss army knife. It's definition:
data Proxy a = ProxyNote that a can be of any kind, specifically, it can be of kind Symbol, which means that Proxy "foo" is a valid type (with the DataKinds extension enabled). Haskell also exposes some support for transitioning from the Symbol kind to a value of type String:
symbolVal :: KnownSymbol s => Proxy s -> String$ ghci
Prelude> import GHC.TypeLits
Prelude GHC.TypeLits> import Data.Proxy
Prelude GHC.TypeLits Data.Proxy> :set -XDataKinds
Prelude GHC.TypeLits Data.Proxy> symbolVal (Proxy :: Proxy "foo")
"foo"
Armed with Proxy and symbolVal we can now attempt to serialize and de-serialize JSON into Proxy "foo". The ToJSON instance is pretty straight forward:
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE OverloadedStrings #-}
module Proxy where
import Control.Monad (mzero)
import Data.Aeson
import Data.Proxy (Proxy(Proxy))
import GHC.TypeLits (KnownSymbol, symbolVal)
instance ToJSON (Proxy "foo") where
toJSON p = object [ "type" .= symbolVal p ]This will serialize Proxy :: Proxy "foo" into {"type": "foo"}. In fact, we can do better:
instance KnownSymbol s => ToJSON (Proxy s) where
toJSON p = object [ "type" .= symbolVal p ]Now, let's write a FromJSON instance for Proxy "foo":
instance FromJSON (Proxy "foo") where
parseJSON (Object v) = v .: "type" >>= handleType
where
handleType (A.String "foo") = return (Proxy :: Proxy "foo")
handleType _ = mzero
parseJSON _ = mzero
jsonString :: BL.ByteString
jsonString = "{\"type\": \"foo\"}"You can see this work in action by loading the code in ghci:
$ ghci
Prelude> :load 02-data-proxy.hs
[1 of 1] Compiling Proxy ( 02-data-proxy.hs, interpreted )
Ok, modules loaded: Proxy.
*Proxy> :set -XDataKinds
*Proxy> decode jsonString :: Maybe (Proxy "foo")
Just Proxy
*Proxy> decode jsonString :: Maybe (Proxy "bar")
<interactive>:5:1:
No instance for (FromJSON (Proxy "bar"))
arising from a use of ‘decode’
In the expression: decode jsonString :: Maybe (Proxy "bar")
In an equation for ‘it’:
it = decode jsonString :: Maybe (Proxy "bar")
Awesome! This works for Proxy "foo" and we get a compiler error when trying to deserializing into Proxy "bar".
Obviously we don't want to write a FromJSON instance for Proxy "bar" and every other string / symbol kind that might appear. If you write the naive FromJSON instance you'll hit a wall:
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE OverloadedStrings #-}
module Proxy where
import Control.Monad (mzero)
import Data.Aeson
import qualified Data.Aeson as A
import qualified Data.ByteString.Lazy as BL
import Data.Proxy (Proxy(Proxy))
import GHC.TypeLits (KnownSymbol, symbolVal)
import Data.Text (pack)
instance KnownSymbol s => ToJSON (Proxy s) where
toJSON p = object [ "type" .= symbolVal p ]
instance KnownSymbol s => FromJSON (Proxy s) where
parseJSON (Object v) = v .: "type" >>= handleType
where
handleType (A.String s) | s == pack (symbolVal (Proxy :: Proxy s)) = return (Proxy :: Proxy s)
handleType _ = mzero
parseJSON _ = mzero
jsonString :: BL.ByteString
jsonString = "{\"type\": \"foo\"}"This intance naively checks if the value corresponding to the "type" key matches the symbolVal of the Proxy and if they match, it returns a Proxy of the correct type.
Unfortunately, if you compile this you will get the following error:
03-more-general-proxy.hs:21:44:
Couldn't match kind ‘*’ with ‘GHC.TypeLits.Symbol’
Expected type: Value
-> aeson-0.8.0.2:Data.Aeson.Types.Internal.Parser (Proxy s)
Actual type: Value
-> aeson-0.8.0.2:Data.Aeson.Types.Internal.Parser (Proxy s0)
In the second argument of ‘(>>=)’, namely ‘handleType’
In the expression: v .: "type" >>= handleType
The problem is that GHC differentiates between the variables that appear in a type signature from the variables that appear in a function's definition. Therefore, the s in Proxy s in the type signature is different from the s in (Proxy :: Proxy s) appearing in the definition. To resolve this, we can enable the ScopedTypeVariables extension, which will extend the scope of a type variable throughout the function definition. This will allow GHC to infer that s satisfies the KnownSymbol constraint and compile. Loading this into ghci:
$ ghci
Prelude> :load 03-more-general-proxy.hs
[1 of 1] Compiling Proxy ( 03-more-general-proxy.hs, interpreted )
Ok, modules loaded: Proxy.
*Proxy> :set -XDataKinds
*Proxy> decode jsonString :: Maybe (Proxy "foo")
Just Proxy
*Proxy> decode jsonString :: Maybe (Proxy "bar")
Nothing
*Proxy> import Data.ByteString.Lazy
*Proxy Data.ByteString.Lazy> :set -XOverloadedStrings
*Proxy Data.ByteString.Lazy> let otherString = "{\"type\":\"bar\"}" :: ByteString
*Proxy Data.ByteString.Lazy> decode otherString :: Maybe (Proxy "bar")
Just Proxy
Great post! On GHC 8.0.1 you would also need
{-# LANGUAGE OverlappingInstances #-}or it will complain about "Overlapping instances" for ToJSON FromJSON. Overlapping instances are getting depricated, so a new way of doing it without{-# LANGUAGE OverlappingInstances #-}would be like this: