Creating a Custom Serialization Format (Gophercon 2017)
Creating a Custom
Serialization Format
Scott Mansfield (@sgmansfield)
Senior Software Engineer
Netflix
What are we doing here?
1. Motivations
2. Queries
3. The Format
4. Performance
5. Future
1. Motivations
"The field is too in love with horribly
inefficient frameworks. Writing network
code and protocols is now considered
too low level for people."
- jnordwick (Hacker News)
Motivations
● Computers make meaning out of voltages
● Serialization is everywhere
○ Network protocols
○ Video encoding
○ Machine code
○ HTTP/2 headers
○ Hard drive communication
○ Video display
● Engineers should know what's inside the black box
Motivations
● JSON is the de facto serialization format
● Common pattern:
1. Get entire document
2. Inflate serialized data
3. Walk data structure & extract
● New pattern:
1. Query the document
2. Get only the data you need
3. Still need to inflate
Motivations
● Query capabilities over JSON documents
● Documents stored as a byte array
JSON Document (Augmented)
{
"null" : null,
"boolean" : true,
"integer" : 1,
"float" : 2.3,
"string" : "a string",
"array" : [4, 5, 6],
"map" : {"foo": 1}
}
2. Queries
Query Types
● Array Index
● Array Slice
● Array Iteration
● Map Access
● Map Keys
● Map Iteration
Array Index
Query: [2]
Result: 3
[1, 2, 3, 4, 5]
↑
Index 2
Array Slice
Query: [2:-1]
Result: [3, 4]
[1, 2, (3, 4), 5]
↑
Index 2 up to (not including) index 4; -1 means the last index
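The slice bounds are half-open: the start index is included, the end index is excluded, and a negative bound counts back from the end of the array. Below is a minimal Go sketch of that resolution. It assumes out-of-range bounds are clamped, which the talk does not specify, and `resolveSlice` is a hypothetical helper name.

```go
package main

import "fmt"

// resolveSlice converts possibly-negative slice bounds into absolute,
// half-open [start, end) indices for an array of length n.
// Negative values count back from the end, so -1 means n-1.
// Clamping out-of-range bounds is an assumption, not from the talk.
func resolveSlice(n, start, end int) (int, int) {
	if start < 0 {
		start += n
	}
	if end < 0 {
		end += n
	}
	clamp := func(i int) int {
		if i < 0 {
			return 0
		}
		if i > n {
			return n
		}
		return i
	}
	start, end = clamp(start), clamp(end)
	if end < start {
		end = start // empty slice rather than an inverted range
	}
	return start, end
}

func main() {
	arr := []int{1, 2, 3, 4, 5}
	s, e := resolveSlice(len(arr), 2, -1)
	fmt.Println(arr[s:e]) // [3 4]
}
```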
Array Iteration
Query: .a[] [0]
Result: [1,2,3,4,5]
[[1], [2], [3], [4], [5]]
↑ ↑ ↑ ↑ ↑
Index 0 of each list
Map Access (Single)
Query: .foo
Result: 3
{"foo": 3, "bar": 4}
↑
Key foo
Map Access (Multiple)
Query: .foo|bar
Result: {"foo":3, "bar":4}
{"foo":3, "bar":4, "baz":5}
↑ ↑
Key foo Key bar
Map Keys
Query: keys
Result: ["foo", "bar"]
{"foo": 3, "bar": 4}
↑ ↑
Map Keys
Map Iteration
Query: .m[] [0]
Result: {"foo": 3, "bar": 4}
{"foo": [3], "bar": [4]}
↑ ↑
Index 0 of each array value
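To pin down what these query operators return, the sketch below evaluates them over already-decoded Go values. It illustrates the semantics only. The talk's queries run against the serialized bytes instead. The `step` type, its kind names, and `eval` are hypothetical. Keys are sorted because Go map iteration order is random.

```go
package main

import (
	"fmt"
	"sort"
)

// step is one query operation. This struct is hypothetical: the talk shows
// the textual syntax (.key, [i], .a[], .m[], keys), not its representation.
type step struct {
	kind string // "key", "index", "aiter" (.a[]), "miter" (.m[]) or "keys"
	key  string
	idx  int
}

// eval applies steps to a decoded value built from map[string]any, []any and scalars.
func eval(v any, steps []step) any {
	if len(steps) == 0 {
		return v
	}
	s, rest := steps[0], steps[1:]
	switch s.kind {
	case "key":
		return eval(v.(map[string]any)[s.key], rest)
	case "index":
		return eval(v.([]any)[s.idx], rest)
	case "aiter": // apply the rest of the query to every array element
		arr := v.([]any)
		out := make([]any, len(arr))
		for i, sub := range arr {
			out[i] = eval(sub, rest)
		}
		return out
	case "miter": // apply the rest of the query to every map value, keeping keys
		out := map[string]any{}
		for k, sub := range v.(map[string]any) {
			out[k] = eval(sub, rest)
		}
		return out
	case "keys":
		m := v.(map[string]any)
		keys := make([]string, 0, len(m))
		for k := range m {
			keys = append(keys, k)
		}
		sort.Strings(keys) // Go map order is random; sort for stable output
		out := make([]any, len(keys))
		for i, k := range keys {
			out[i] = k
		}
		return eval(out, rest)
	}
	panic("unknown step kind: " + s.kind)
}

func main() {
	doc := map[string]any{
		"foo": map[string]any{"k1": []any{3, 4}},
		"bar": map[string]any{"k1": []any{5, 6}},
	}
	// .m[] .k1 [0]
	q := []step{{kind: "miter"}, {kind: "key", key: "k1"}, {kind: "index", idx: 0}}
	fmt.Println(eval(doc, q)) // map[bar:5 foo:3]
}
```

The query in `main` is the `.m[] .k1 [0]` example from the next slide.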
Example
{"foo": {"k1": [3,4]},
"bar": {"k1": [5,6]}}
Query: .m[] .k1 [0]
Result: {"foo": 3, "bar": 5}
Example
{"foo": {"1":1, "2":2, "3":3},
"bar": {"4":4, "5":5, "6":6}}
Query: .m[] keys
Result: {"foo": ["1","2","3"],
"bar": ["4","5","6"] }
3. The Format
Types
Augmented JSON == JSON + integers
● Scalars
○ Null
○ Boolean
○ Integer (64 bit)
○ Float (64 bit)
○ String
● Composites
○ Array
○ Map
General Format
Every record starts with a single byte for the type:
● Type: 1 byte (e.g. int)
● Data: ...
Scalars
● Null
● Boolean
● Integer (64 bit)
● Float (64 bit)
● String
Scalar: Null
● Type: null (1 byte), no data
Scalar: Boolean
● Type: bool (1 byte)
● Data: 1 or 0 (1 byte)
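Here is a sketch of the two smallest records. The talk does not give the byte value of each type tag, so `typeNull` and `typeBool` below are placeholder values. The function names are hypothetical.

```go
package main

import "fmt"

// Placeholder type-tag values; the talk does not say which byte
// value identifies each type.
const (
	typeNull byte = 0
	typeBool byte = 1
)

// appendNull appends a null record: just the 1-byte type tag.
func appendNull(buf []byte) []byte {
	return append(buf, typeNull)
}

// appendBool appends a boolean record: type tag + 1 data byte (1 or 0).
func appendBool(buf []byte, v bool) []byte {
	b := byte(0)
	if v {
		b = 1
	}
	return append(buf, typeBool, b)
}

func main() {
	var buf []byte
	buf = appendNull(buf)
	buf = appendBool(buf, true)
	fmt.Printf("% x\n", buf) // 00 01 01
}
```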
Scalar: Integer
● Type: int (1 byte)
● Data: little-endian int64 (8 bytes)
Scalar: Integer (example)
4 = 0x0000_0000_0000_0004
● Type: int (1 byte)
● Data: 04 00 00 00 00 00 00 00 (8 bytes)
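Encoding an int is a tag byte followed by the value's two's-complement bits in little-endian order, which `encoding/binary` handles directly. The tag value and the function names are placeholders.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Placeholder tag value for the int type.
const typeInt byte = 2

// appendInt appends an int record: 1-byte type tag followed by the
// value as a little-endian int64 (8 bytes).
func appendInt(buf []byte, v int64) []byte {
	var data [8]byte
	binary.LittleEndian.PutUint64(data[:], uint64(v))
	buf = append(buf, typeInt)
	return append(buf, data[:]...)
}

// readInt decodes the 8 data bytes following the type tag.
func readInt(rec []byte) int64 {
	return int64(binary.LittleEndian.Uint64(rec[1:9]))
}

func main() {
	rec := appendInt(nil, 4)
	fmt.Printf("% x\n", rec[1:]) // 04 00 00 00 00 00 00 00
	fmt.Println(readInt(rec))    // 4
}
```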
Scalar: Float
● Type: float (1 byte)
● Data: float64 bits as a little-endian uint64 (8 bytes)
Scalar: Float (example)
4.5 = 0x4012_0000_0000_0000
● Type: float (1 byte)
● Data: 00 00 00 00 00 00 12 40 (8 bytes)
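For floats, `math.Float64bits` exposes the IEEE 754 bit pattern, which is then written like any other little-endian uint64. This reproduces the 4.5 example above. The tag value and the function names are placeholders.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

const typeFloat byte = 3 // placeholder tag value

// appendFloat stores the IEEE 754 bit pattern of v as a little-endian uint64.
func appendFloat(buf []byte, v float64) []byte {
	var data [8]byte
	binary.LittleEndian.PutUint64(data[:], math.Float64bits(v))
	buf = append(buf, typeFloat)
	return append(buf, data[:]...)
}

// readFloat reverses appendFloat for the 8 bytes after the tag.
func readFloat(rec []byte) float64 {
	return math.Float64frombits(binary.LittleEndian.Uint64(rec[1:9]))
}

func main() {
	rec := appendFloat(nil, 4.5)
	fmt.Printf("%#x\n", math.Float64bits(4.5)) // 0x4012000000000000
	fmt.Printf("% x\n", rec[1:])               // 00 00 00 00 00 00 12 40
	fmt.Println(readFloat(rec))                // 4.5
}
```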
Scalar: String
● Type: string (1 byte)
● Length: little-endian uint32 (4 bytes)
● Data: string contents (length bytes)
Scalar: String (example)
"Hello, Go!" Length: 10 = 0x0000_000A
● Type: string (1 byte)
● Length: 0A 00 00 00 (4 bytes)
● Data: "Hello, Go!" (10 bytes)
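A string record consists of the tag, a 4-byte little-endian length, and the raw bytes. The sketch also returns the total record size, which a reader needs in order to skip to the next record. The tag value is a placeholder. `appendString` and `readString` are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const typeString byte = 4 // placeholder tag value

// appendString writes: type tag, 4-byte little-endian uint32 length, raw bytes.
func appendString(buf []byte, s string) []byte {
	var n [4]byte
	binary.LittleEndian.PutUint32(n[:], uint32(len(s)))
	buf = append(buf, typeString)
	buf = append(buf, n[:]...)
	return append(buf, s...)
}

// readString returns the string and the total record length consumed.
func readString(rec []byte) (string, int) {
	n := int(binary.LittleEndian.Uint32(rec[1:5]))
	return string(rec[5 : 5+n]), 5 + n
}

func main() {
	rec := appendString(nil, "Hello, Go!")
	fmt.Printf("% x\n", rec[1:5]) // 0a 00 00 00
	s, size := readString(rec)
	fmt.Println(s, size) // Hello, Go! 15
}
```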
Composites
Recursive - contained data are defined by this same format
● Array
● Map
Composite: Array
● Type: array (1 byte)
● Header: array header (var bytes)
● Data: array entries (var bytes)
Composite: Array - Header
● numoffsets: uvarint (var bytes)
● offlen: 1 byte, range (0,8)
● offsets: numoffsets uints, each offlen bytes long (numoffsets × offlen bytes)
Composite: Array - Header offsets
● 2 or more offsets (the start of each record, plus the end of the data)
● Each offlen bytes
offset | offset | offset | offset
Composite: Array - Data
● 1 or more records
● Each var bytes
record | record | record | record
Composite: Empty Array
● Type: array (1 byte)
● numoffsets: uvarint 0 (1 byte)
Composite: Array (example)
[true, false]
● Type: array (1 byte)
● Header (5 bytes): numoffsets = 3, offlen = 1, offsets = 0 2 4
● Data (4 bytes): record 1 = bool 1, record 2 = bool 0
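The encoder below rebuilds the [true, false] example from already-serialized child records. It records one offset at the start of each record plus one at the end of the data, and picks the smallest offlen that fits the largest offset. That width rule, the little-endian order for multi-byte offsets, and the tag values `0x01`/`0x05` are assumptions, not details from the talk. `appendArray` and `minWidth` are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	typeBool  byte = 0x01 // placeholder tag values
	typeArray byte = 0x05
)

// minWidth returns the fewest bytes (1..8) needed to hold v.
func minWidth(v uint64) int {
	w := 1
	for v >>= 8; v > 0; v >>= 8 {
		w++
	}
	return w
}

// appendArray encodes already-serialized records as an array:
// type, uvarint numoffsets, offlen, offsets, then the records.
func appendArray(buf []byte, records [][]byte) []byte {
	buf = append(buf, typeArray)
	var tmp [binary.MaxVarintLen64]byte
	if len(records) == 0 { // empty array: type + uvarint(0)
		n := binary.PutUvarint(tmp[:], 0)
		return append(buf, tmp[:n]...)
	}
	// One offset per record start, plus one for the end of the data.
	offsets := make([]uint64, 0, len(records)+1)
	var total uint64
	for _, r := range records {
		offsets = append(offsets, total)
		total += uint64(len(r))
	}
	offsets = append(offsets, total)
	offLen := minWidth(total)

	n := binary.PutUvarint(tmp[:], uint64(len(offsets)))
	buf = append(buf, tmp[:n]...)
	buf = append(buf, byte(offLen))
	for _, off := range offsets {
		for j := 0; j < offLen; j++ { // little-endian (assumed)
			buf = append(buf, byte(off>>(8*j)))
		}
	}
	for _, r := range records {
		buf = append(buf, r...)
	}
	return buf
}

func main() {
	t := []byte{typeBool, 1}
	f := []byte{typeBool, 0}
	fmt.Printf("% x\n", appendArray(nil, [][]byte{t, f})) // 05 03 01 00 02 04 01 01 01 00
	fmt.Printf("% x\n", appendArray(nil, nil))            // 05 00
}
```

The output matches the slide: a 1-byte type, a 5-byte header (3, 1, 0 2 4), and 4 bytes of data.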
Composite: Array (example, slicing)
[true, false]
● Type: array (1 byte)
● Header (5 bytes): numoffsets = 3, offlen = 1, offsets = 0 2 4
● Data (4 bytes): record 1 = bool 1, record 2 = bool 0
● A slice needs only two offsets: records i through j-1 occupy the data bytes from offset i up to offset j, so no record has to be decoded
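Because the offsets table gives every record boundary, a reader can return one record or a contiguous range by reading two offsets and decoding nothing. This is consistent with the flat Array Get and Array Slice timings in the benchmarks. Below is a sketch of such a view over the encoded bytes. It assumes little-endian offsets and uses placeholder tag values. `arrayView` and its methods are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// arrayView gives random access to an encoded array without decoding it.
type arrayView struct {
	offsets []byte // raw offsets table
	offLen  int
	data    []byte // concatenated records
}

// newArrayView parses an encoded array record (type byte included).
func newArrayView(rec []byte) (arrayView, error) {
	if len(rec) < 2 {
		return arrayView{}, errors.New("truncated array")
	}
	b := rec[1:] // skip the type byte
	n, k := binary.Uvarint(b)
	if k <= 0 {
		return arrayView{}, errors.New("bad numoffsets")
	}
	if n == 0 {
		return arrayView{}, nil // empty array: no offlen, no offsets
	}
	if len(b) <= k {
		return arrayView{}, errors.New("truncated header")
	}
	offLen := int(b[k])
	hdr := k + 1 + int(n)*offLen
	if offLen < 1 || offLen > 8 || len(b) < hdr {
		return arrayView{}, errors.New("bad header")
	}
	return arrayView{offsets: b[k+1 : hdr], offLen: offLen, data: b[hdr:]}, nil
}

// Len returns the number of records (one fewer than the number of offsets).
func (a arrayView) Len() int {
	if a.offLen == 0 {
		return 0
	}
	return len(a.offsets)/a.offLen - 1
}

// off reads offset i as a little-endian uint of offLen bytes (assumed order).
func (a arrayView) off(i int) int {
	var v uint64
	for j := a.offLen - 1; j >= 0; j-- {
		v = v<<8 | uint64(a.offsets[i*a.offLen+j])
	}
	return int(v)
}

// Slice returns the raw bytes of records [i, j): two offset reads, no decoding.
// Callers must pass 0 <= i <= j <= Len().
func (a arrayView) Slice(i, j int) []byte {
	return a.data[a.off(i):a.off(j)]
}

// Get returns the raw bytes of record i.
func (a arrayView) Get(i int) []byte { return a.Slice(i, i+1) }

func main() {
	// [true, false] with placeholder tags array = 0x05, bool = 0x01.
	rec := []byte{0x05, 3, 1, 0, 2, 4, 0x01, 1, 0x01, 0}
	a, err := newArrayView(rec)
	if err != nil {
		panic(err)
	}
	fmt.Println(a.Len())               // 2
	fmt.Printf("% x\n", a.Get(1))      // 01 00
	fmt.Printf("% x\n", a.Slice(0, 2)) // 01 01 01 00
}
```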
Composite: Map
● Type: map (1 byte)
● Header: map header (var bytes)
● Data: map entries (var bytes)
Composite: Map - Header
● num recs: uvarint (var bytes)
● offlen: 1 byte, range (0,8)
● lenlen: 1 byte, range (0,8)
● header records: size ∝ num recs
Composite: Map - Header
● 1 or more header records
● Each 4 + offlen + lenlen bytes
record | record | record | record
Composite: Map - Data
● 1 or more records
● Each var bytes
record | record | record | record
Composite: Map - Header Record
● Intern ID: uint32 (4 bytes)
● offset: uint (offlen bytes)
● length: uint (lenlen bytes)
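Each header record is a fixed 4 + offlen + lenlen bytes, so record i starts at byte i × (4 + offlen + lenlen) of the header records. The decoding sketch below assumes every field is little-endian, although the talk only states this for scalars. `headerRecord`, `readUint`, and `parseHeaderRecords` are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// headerRecord is one entry of a map header.
type headerRecord struct {
	id     uint32 // interned key ID
	offset uint64 // start of the value within the map data
	length uint64 // length of the value in bytes
}

// readUint reads a little-endian unsigned integer of width bytes (1..8).
func readUint(b []byte, width int) uint64 {
	var v uint64
	for j := width - 1; j >= 0; j-- {
		v = v<<8 | uint64(b[j])
	}
	return v
}

// parseHeaderRecords decodes n records, each 4 + offLen + lenLen bytes.
func parseHeaderRecords(b []byte, n, offLen, lenLen int) []headerRecord {
	size := 4 + offLen + lenLen
	recs := make([]headerRecord, n)
	for i := range recs {
		r := b[i*size : (i+1)*size]
		recs[i] = headerRecord{
			id:     binary.LittleEndian.Uint32(r[:4]),
			offset: readUint(r[4:], offLen),
			length: readUint(r[4+offLen:], lenLen),
		}
	}
	return recs
}

func main() {
	// Two records with offlen = lenlen = 1:
	// (ID 1, offset 0, length 2) and (ID 2, offset 2, length 2).
	b := []byte{1, 0, 0, 0, 0, 2, 2, 0, 0, 0, 2, 2}
	for _, r := range parseHeaderRecords(b, 2, 1, 1) {
		fmt.Printf("%+v\n", r) // {id:1 offset:0 length:2}, then {id:2 offset:2 length:2}
	}
}
```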
Composite: Map - Interned Keys
● Map keys are assigned a unique uint32 ID
● IDs are shared by identical strings
● Forward and reverse mappings stored next to the data
● Example:
○ "true" → 1
○ "false" → 2
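One way to keep both mappings is a Go map for string → ID and a slice for ID → string. This sketch hands out IDs sequentially from 1 in first-seen order. That matches the example but is an assumption about how IDs are actually chosen. `interner` and its methods are hypothetical names.

```go
package main

import "fmt"

// interner assigns each distinct key string a uint32 ID and keeps
// both the forward (string -> ID) and reverse (ID -> string) mappings.
type interner struct {
	ids  map[string]uint32
	keys []string // keys[id-1] is the string for id
}

func newInterner() *interner {
	return &interner{ids: make(map[string]uint32)}
}

// ID returns the existing ID for s, or assigns the next one.
func (in *interner) ID(s string) uint32 {
	if id, ok := in.ids[s]; ok {
		return id // identical strings share one ID
	}
	in.keys = append(in.keys, s)
	id := uint32(len(in.keys))
	in.ids[s] = id
	return id
}

// Key is the reverse mapping from ID back to string.
func (in *interner) Key(id uint32) (string, bool) {
	if id == 0 || int(id) > len(in.keys) {
		return "", false
	}
	return in.keys[id-1], true
}

func main() {
	in := newInterner()
	fmt.Println(in.ID("true"), in.ID("false"), in.ID("true")) // 1 2 1
	k, _ := in.Key(2)
	fmt.Println(k) // false
}
```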
Composite: Map - Header
[Figure: header records]
Composite: Empty Map
● Type: map (1 byte)
● num recs: uvarint 0 (1 byte)
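Putting the map pieces together, a lookup interns the key, scans the header records for its ID, and returns the value's bytes from the data section. This sketch scans linearly rather than assuming the header records are sorted. It treats all fields as little-endian and uses placeholder tags (`0x06` for map, `0x01` for bool). The bytes in `main` encode a two-entry map with "true" → 1 and "false" → 2 and offlen = lenlen = 1, which gives a 1 + 1 + 1 + 2 × (4 + 1 + 1) = 15-byte header. `mapGet` and `readUint` are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// readUint reads a little-endian unsigned integer of width bytes (1..8).
func readUint(b []byte, width int) uint64 {
	var v uint64
	for j := width - 1; j >= 0; j-- {
		v = v<<8 | uint64(b[j])
	}
	return v
}

// mapGet returns the raw value bytes for intern ID want in an encoded
// map record (type byte included), scanning header records linearly.
func mapGet(rec []byte, want uint32) ([]byte, error) {
	if len(rec) < 2 {
		return nil, errors.New("truncated map")
	}
	b := rec[1:] // skip the type byte
	n, k := binary.Uvarint(b)
	if k <= 0 {
		return nil, errors.New("bad num recs")
	}
	if n == 0 {
		return nil, errors.New("key not found") // empty map
	}
	if len(b) < k+2 {
		return nil, errors.New("truncated header")
	}
	offLen, lenLen := int(b[k]), int(b[k+1])
	size := 4 + offLen + lenLen // bytes per header record
	hdrEnd := k + 2 + int(n)*size
	if len(b) < hdrEnd {
		return nil, errors.New("truncated header records")
	}
	data := b[hdrEnd:]
	for i := 0; i < int(n); i++ {
		r := b[k+2+i*size : k+2+(i+1)*size]
		if binary.LittleEndian.Uint32(r[:4]) != want {
			continue
		}
		off := readUint(r[4:], offLen)
		ln := readUint(r[4+offLen:], lenLen)
		if off+ln > uint64(len(data)) {
			return nil, errors.New("value out of range")
		}
		return data[off : off+ln], nil
	}
	return nil, errors.New("key not found")
}

func main() {
	ids := map[string]uint32{"true": 1, "false": 2} // interned keys
	rec := []byte{
		0x06, 2, 1, 1, // type (placeholder tag), num recs, offlen, lenlen
		1, 0, 0, 0, 0, 2, // ID 1 ("true"): offset 0, length 2
		2, 0, 0, 0, 2, 2, // ID 2 ("false"): offset 2, length 2
		0x01, 1, // record 1: bool 1
		0x01, 0, // record 2: bool 0
	}
	v, err := mapGet(rec, ids["false"])
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", v) // 01 00
}
```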
Composite: Map (example)
{"false":false, "true":true}
● Type: map (1 byte)
● Header (15 bytes): # recs = 2, offlen = 1, lenlen = 1, then 2 header records:
○ record 1: ID 1, offset 0, length 2
○ record 2: ID 2, offset 2, length 2
● Data (4 bytes): record 1 = bool 1, record 2 = bool 0
● Interned keys: "true" → 1, "false" → 2
4. Performance
How fast is it?
It depends on:
● How much data you ask for
● How complex the query is
● How many CPUs
● Speed of the underlying data storage
● Speed of the underlying data storage
Scalars
Serialize
Type time/op
Null 64.3 ns ± 2%
Boolean 71.6 ns ± 1%
Int 75.7 ns ± 0%
Float 75.4 ns ± 1%
String 88.6 ns ± 1% (value: "foobar")
Deserialize
Type time/op
Null 16.0 ns ± 1%
Boolean 23.9 ns ± 1%
Int 26.6 ns ± 1%
Float 27.1 ns ± 1%
String 70.1 ns ± 1%
Composites: Serialize
Type # elems time/op time/op (ns)
Array 0 115 ns ± 0% 115 ns
Array 1 273 ns ± 1% 273 ns
Array 10 900 ns ± 1% 900 ns
Array 100 5.42 µs ± 1% 5420 ns
Array 1000 43.7 µs ± 1% 43700 ns
Array 10000 453 µs ± 1% 453000 ns
Array 100000 5.35 ms ± 1% 5350000 ns
Array 1000000 54.0 ms ± 3% 54000000 ns
Map 0 87.2 ns ± 1% 87 ns
Map 1 608 ns ± 1% 608 ns
Map 10 3.39 µs ± 1% 3390 ns
Map 100 34.1 µs ± 1% 34100 ns
Map 1000 374 µs ± 0% 374000 ns
Map 10000 4.37 ms ± 1% 4370000 ns
Map 100000 58.7 ms ± 2% 58700000 ns
Map 1000000 866 ms ± 4% 866000000 ns
Composites: Deserialize
Type # elems time/op time/op (ns)
Array 0 136 ns ± 1% 136 ns
Array 1 201 ns ± 0% 201 ns
Array 10 588 ns ± 2% 588 ns
Array 100 4.05 µs ± 3% 4050 ns
Array 1000 38.1 µs ± 1% 38100 ns
Array 10000 380 µs ± 2% 380000 ns
Array 100000 3.81 ms ± 1% 3810000 ns
Array 1000000 39.9 ms ± 2% 39900000 ns
Map 0 158 ns ± 0% 158 ns
Map 1 361 ns ± 0% 361 ns
Map 10 1.97 µs ± 0% 1970 ns
Map 100 21.3 µs ± 0% 21300 ns
Map 1000 261 µs ± 1% 261000 ns
Map 10000 2.67 ms ± 1% 2670000 ns
Map 100000 38.3 ms ± 2% 38300000 ns
Map 1000000 757 ms ± 3% 757000000 ns
Composites: Queries
Type # elems time/op
Array Get 1 25.9 ns ± 7%
Array Get 10 26.4 ns ± 6%
Array Get 100 26.6 ns ± 6%
Array Get 1000 26.3 ns ± 6%
Array Get 10000 26.3 ns ± 8%
Array Get 100000 26.0 ns ± 4%
Array Get 1000000 26.2 ns ± 7%
Map Get 1 35.3 ns ± 1%
Map Get 10 64.7 ns ± 0%
Map Get 100 74.6 ns ± 1%
Map Get 1000 121 ns ± 1%
Map Get 10000 157 ns ± 0%
Map Get 100000 221 ns ± 2%
Map Get 1000000 375 ns ± 1%
Type # elems time/op
Array Slice 1 70.1 ns ± 1%
Array Slice 10 73.9 ns ± 4%
Array Slice 100 73.7 ns ± 3%
Array Slice 1000 73.0 ns ± 2%
Array Slice 10000 73.4 ns ± 3%
Array Slice 100000 75.6 ns ± 3%
Array Slice 1000000 73.4 ns ± 2%
Map Keys 1 662 ns ± 9%
Map Keys 10 2.11 µs ± 8%
Map Keys 100 17.4 µs ± 8%
Map Keys 1000 173 µs ± 8%
Map Keys 10000 2.28 ms ± 4%
Map Keys 100000 35.6 ms ± 5%
Map Keys 1000000 348 ms ± 7%
5. Future
In Progress & Future Work
● Replace simple scalar values
● Append to arrays
● Add new keys to a map
● Other ops (inc, dec, etc)
● Compression
Thank You
@sgmansfield
smansfield@netflix.com
techblog.netflix.com