My Study Note: Recursive Length Prefix (RLP) Serialization

Justin Thein Kyaw
9 min read · Nov 22, 2023

Introduction

The RLP Serialization Standard is a method used to transfer data between nodes in a space-efficient format. It aims to encode arbitrarily nested arrays of binary data and is widely utilized as the primary encoding method for serializing objects in Ethereum’s execution layer.

Purpose

The main purpose of RLP is to encode the structure of data. It does not handle the encoding of specific data types such as strings or floats, as this responsibility is left to higher-order protocols. However, it is important to note that positive RLP integers must be represented in big-endian binary form without any leading zeroes. Consequently, an RLP integer value of zero is equivalent to the empty byte array.
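The integer rule above can be sketched in Python. The helper name int_to_rlp_bytes is made up for this note and is not part of any library. It only turns a non-negative integer into its shortest big-endian byte string, which is what RLP then encodes as a string.

```python
def int_to_rlp_bytes(value: int) -> bytes:
    """Return the shortest big-endian byte string for a non-negative integer.

    Hypothetical helper for illustration; zero becomes the empty byte array.
    """
    if value < 0:
        raise ValueError("RLP integers must be non-negative")
    # (bit_length + 7) // 8 is the fewest bytes that can hold the value.
    # For value == 0 this is 0, so the result is b"".
    return value.to_bytes((value.bit_length() + 7) // 8, "big")


print(int_to_rlp_bytes(0))     # b''
print(int_to_rlp_bytes(15))    # b'\x0f'
print(int_to_rlp_bytes(1024))  # b'\x04\x00'
```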

To use RLP to encode a dictionary, the two suggested canonical forms are:

  • use [[k1,v1],[k2,v2]...] with keys in lexicographic order
  • use the higher-level Patricia Tree encoding as Ethereum does
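As a small sketch of the first canonical form, the snippet below turns a Python dict into a list of [key, value] pairs sorted by key. The function name dict_to_rlp_item and the sample keys are assumptions made for this note; the result is still an ordinary RLP item (a list of lists of strings) that would then go through the encoding rules.

```python
def dict_to_rlp_item(mapping: dict) -> list:
    """Turn a bytes-to-bytes mapping into [[k1, v1], [k2, v2], ...] sorted by key.

    Hypothetical helper for illustration only.
    """
    # Python compares bytes objects lexicographically, byte by byte,
    # so sorted() gives the key order the canonical form asks for.
    return [[key, mapping[key]] for key in sorted(mapping)]


item = dict_to_rlp_item({b"dog": b"puppy", b"cat": b"kitten"})
print(item)  # [[b'cat', b'kitten'], [b'dog', b'puppy']]
```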

The RLP encoding function takes in an item. An item is defined as follows:

  • a string (i.e. byte array) is an item
  • a list of items is an item

For example, all of the following are items:

  • an empty string;
  • the string containing the word “cat”;
  • a list containing any number of strings;
  • and a more complex data structure like ["cat", ["puppy", "cow"], "horse", [[]], "pig", [""], "sheep"].

Encoding Process

The RLP encoding process involves the following steps:

Encoding Single Bytes

If a single byte is in the range 0x00 to 0x7f, it is encoded as itself.

Encoding Short Strings

Strings with a length of 55 bytes or less are encoded as a single byte, representing the length of the string plus 0x80, followed by the string itself.

Encoding Long Strings

Strings longer than 55 bytes are encoded as follows:

  1. The first byte is 0xb7 plus the number of bytes needed to hold the length of the string (the length of the length).
  2. The length of the string follows, encoded as a big-endian byte array.
  3. The string itself is appended after the length.

Encoding Lists or Arrays

Lists or arrays are encoded using the same principles as strings, except that the length used is the length of the payload (the concatenated RLP encodings of the list's items) and the prefixes are based on 0xc0 and 0xf7 instead of 0x80 and 0xb7.

Implementation Details

RLP encoding focuses only on the following data types. In the content below, ‘string’ means “a certain number of bytes of binary data”; no special encodings are used, and no knowledge about the content of the strings is implied.

💡 For the rows highlighted in green, computing the first byte and the bytes that follow it is a little more complicated than for the others.

Rule 1: Encoding Single Bytes

For a single byte whose value is in the [0x00, 0x7f] (decimal [0, 127]) range, that byte is its own RLP encoding.

For Example:

  • the byte ‘\x00’ = [ 0x00 ] (this is the single byte 0x00, not the integer 0, which is the empty byte array)
  • the encoded integer 15 (‘\x0f’) = [ 0x0f ]

Rule 2: Encoding Short Strings

Otherwise, if a string is 0–55 bytes long, the RLP encoding consists of a single byte with value 0x80 (dec. 128) plus the length of the string followed by the string. The range of the first byte is thus [0x80, 0xb7] (dec. [128, 183]).

If a string is 0–55 bytes long (and is not a single byte covered by Rule 1), the RLP encoding is:

  • 0x80 + (the length of the string), followed by the string
  • Because the length is 0–55 bytes, the first byte falls in the range 0x80–0xb7 (0xb7 = 0x80 + 55)

For Example:

  • the string “Ou” = [ 0x82, ‘O’, ‘u’ ]
  • the encoded integer 1024 (‘\x04\x00’) = [ 0x82, 0x04, 0x00 ]
  • the empty string (‘null’) = [ 0x80 ]
  • the integer 0 = [ 0x80 ]
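Rules 1 and 2 together can be sketched as one small Python function. The name encode_short_string is made up for this note, and the function deliberately refuses anything longer than 55 bytes, since that case belongs to Rule 3.

```python
def encode_short_string(data: bytes) -> bytes:
    """RLP-encode a byte string of at most 55 bytes (Rules 1 and 2).

    Hypothetical helper for illustration; longer input raises ValueError.
    """
    if len(data) > 55:
        raise ValueError("strings longer than 55 bytes need Rule 3")
    if len(data) == 1 and data[0] <= 0x7F:
        return data  # Rule 1: the byte is its own encoding
    # Rule 2: one prefix byte (0x80 + length), then the string itself
    return bytes([0x80 + len(data)]) + data


print(encode_short_string(b"\x0f").hex())      # 0f
print(encode_short_string(b"Ou").hex())        # 824f75
print(encode_short_string(b"\x04\x00").hex())  # 820400
print(encode_short_string(b"").hex())          # 80
```

Note that a single byte of 0x80 or above does not qualify for Rule 1, so under this sketch b"\x80" becomes 0x81 0x80.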

Rule 3: Encoding Long Strings

If a string is more than 55 bytes long, the RLP encoding consists of a single byte with value 0xb7 (dec. 183) plus the length in bytes of the length of the string in binary form, followed by the length of the string, followed by the string. The range of the first byte is thus [0xb8, 0xbf] (dec. [184, 191]).

If a string is more than 55 bytes long, the RLP encoding is:

  • 0xb7 + (the length in bytes of the length of the string in binary form), followed by the length of the string, followed by the string.

For Example:

  • a 1024 byte long string would be encoded as \xb9\x04\x00 (dec. 185, 4, 0) followed by the string.
  • the string Lorem ipsum dolor sit amet, consectetur adipisicing elit = [ 0xb8, 0x38, 'L', 'o', 'r', 'e', 'm', ' ', ... , 'e', 'l', 'i', 't' ]

Here is why:

# the length in bytes of the length of the string in binary form
0xb7 + 2 = 0xb9
       ^
       |
'2' is the length in bytes of the length of the string in binary form
- the length of a 1024-byte string is 0x0400 in hex
- 0x0400 takes 2 bytes
- the string is over 55 bytes, so the prefix is `0xb7` plus 2, the size of the length in bytes

# followed by the length of the string
the length 0x0400 takes 2 bytes, so `0x04 0x00` comes next

# followed by the actual string
"iloveouou........"

\xb9\x04\x00\x.........

The range of the first byte is [0xb8, 0xbf].
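The same computation can be sketched in Python. The function name encode_long_string is an assumption made for this note, and it only covers Rule 3, so it rejects strings of 55 bytes or fewer.

```python
def encode_long_string(data: bytes) -> bytes:
    """RLP-encode a byte string longer than 55 bytes (Rule 3).

    Hypothetical helper for illustration only.
    """
    if len(data) <= 55:
        raise ValueError("Rule 3 applies only to strings longer than 55 bytes")
    # The length of the string as big-endian bytes, without leading zeros
    length_bytes = len(data).to_bytes((len(data).bit_length() + 7) // 8, "big")
    # Prefix byte: 0xb7 + the number of bytes the length itself takes
    return bytes([0xB7 + len(length_bytes)]) + length_bytes + data


encoded = encode_long_string(b"a" * 1024)
print(encoded[:3].hex())  # b90400

lorem = b"Lorem ipsum dolor sit amet, consectetur adipisicing elit"
print(len(lorem))                           # 56
print(encode_long_string(lorem)[:2].hex())  # b838
```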

Rule 4: Encoding Short List

If the total payload of a list (i.e. the combined length of the RLP encodings of all its items) is 0–55 bytes long, the RLP encoding consists of a single byte with value 0xc0 plus the length of the payload, followed by the concatenation of the RLP encodings of the items. The range of the first byte is thus [0xc0, 0xf7] (dec. [192, 247]).

For more details, please refer to Theoretical Representation.

If the payload of the list is 0–55 bytes long:

  • the first byte is (0xc0 + the combined length of the RLP encodings of all items), followed by the concatenation of the RLP encodings of the items
  • the first byte range is 0xc0–0xf7

For example:

  • the empty list = [ 0xc0 ]
  • the list [ “cat”, “dog” ] = [ 0xc8, 0x83, 'c', 'a', 't', 0x83, 'd', 'o', 'g' ]
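To isolate just the list step of Rule 4, the sketch below takes items that are already RLP-encoded and wraps them. The name encode_short_list and the choice to pass pre-encoded items are assumptions made for this note.

```python
def encode_short_list(encoded_items: list) -> bytes:
    """Wrap already-encoded items into a short RLP list (Rule 4).

    Hypothetical helper for illustration; each element of encoded_items
    must already be the RLP encoding of one item.
    """
    payload = b"".join(encoded_items)
    if len(payload) > 55:
        raise ValueError("payloads longer than 55 bytes need Rule 5")
    # Prefix byte: 0xc0 + the length of the payload in bytes
    return bytes([0xC0 + len(payload)]) + payload


print(encode_short_list([]).hex())                        # c0
print(encode_short_list([b"\x83cat", b"\x83dog"]).hex())  # c88363617483646f67
```

The prefix counts payload bytes, not items: two 4-byte encodings give 0xc0 + 8 = 0xc8.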

Rule 5: Encoding Long List

If the total payload of a list is more than 55 bytes long, the RLP encoding consists of a single byte with value 0xf7 plus the length in bytes of the length of the payload in binary form, followed by the length of the payload, followed by the concatenation of the RLP encodings of the items. The range of the first byte is thus [0xf8, 0xff] (dec. [248, 255]).

If the payload of the list is more than 55 bytes long, the encoding is:

  • 0xf7 + (the length in bytes of the length of the payload in binary form), followed by the length of the payload, followed by the concatenation of the RLP encodings of the items.
  • The first byte range is 0xf8–0xff
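Putting the five rules together, a minimal recursive encoder might look like the following sketch. The names rlp_encode and _length_prefix are chosen for this note and are not taken from any particular library. An item is assumed to be either a Python bytes object or a list of items.

```python
def _length_prefix(length: int, offset: int) -> bytes:
    """Build the prefix for a payload of `length` bytes.

    offset is 0x80 for strings and 0xc0 for lists (hypothetical helper).
    """
    if length <= 55:
        return bytes([offset + length])  # Rules 2 and 4
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    # Rules 3 and 5: offset + 55 is 0xb7 for strings and 0xf7 for lists
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item) -> bytes:
    """Encode an item (bytes, or a list of items) following Rules 1 to 5."""
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] <= 0x7F:
            return item  # Rule 1
        return _length_prefix(len(item), 0x80) + item
    if isinstance(item, list):
        payload = b"".join(rlp_encode(child) for child in item)
        return _length_prefix(len(payload), 0xC0) + payload
    raise TypeError("an RLP item is either bytes or a list of items")


print(rlp_encode([b"cat", b"dog"]).hex())  # c88363617483646f67

# Two 30-byte strings encode to 31 bytes each, so the payload is 62 bytes
# and Rule 5 applies: 0xf7 + 1, then the length 0x3e.
long_list = [b"a" * 30, b"b" * 30]
print(rlp_encode(long_list)[:2].hex())  # f83e
```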

Theoretical Representation

In Zermelo–Fraenkel (ZF) set theory, the natural numbers are defined recursively by letting 0 = {} be the empty set and n + 1 (the successor function) = n ∪ {n} for each n. In this way n = {0, 1, …, n − 1} for each natural number n. This definition has the property that n is a set with n elements. The first few numbers defined this way are 0 = {}, 1 = {0} = {{}}, 2 = {0, 1} = {{}, {{}}}, and 3 = {0, 1, 2} = {{}, {{}}, {{}, {{}}}} (Goldrei 1996).

For example, the set-theoretic representation of three,

[ [], [[]], [ [], [[]] ] ] can be encoded as [ 0xc7, 0xc0, 0xc1, 0xc0, 0xc3, 0xc0, 0xc1, 0xc0 ]

The item is a list whose payload is no more than 55 bytes, so Rule 4 applies.

  • The first byte is (0xc0 + the length of the payload), followed by the concatenation of the RLP encodings of the items.
  • The payload is 7 bytes long, so the first byte is 0xc0 + 0x07 = 0xc7.
  • The first item, [], is 0xc0. So far: 0xc7 0xc0.
  • The second item, [[]], has a 1-byte payload (the empty list, 0xc0), so it is 0xc1 0xc0. So far: 0xc7 0xc0 0xc1 0xc0.
  • The third item, [ [], [[]] ], contains [] (0xc0) and [[]] (0xc1 0xc0). Its payload is 3 bytes long, so its first byte is 0xc0 + 0x03 = 0xc3, and it encodes as 0xc3 0xc0 0xc1 0xc0.
  • The final encoded result is 0xc7 0xc0 0xc1 0xc0 0xc3 0xc0 0xc1 0xc0.
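The walkthrough can be checked with a short Python sketch. Both function names are made up for this note: von_neumann builds n = {0, 1, …, n − 1} as nested lists, and encode_nested_lists applies Rule 4 only, which is enough because every payload here is far below 55 bytes.

```python
def von_neumann(n: int) -> list:
    """Build the set-theoretic number n = {0, 1, ..., n-1} as nested lists."""
    return [von_neumann(k) for k in range(n)]


def encode_nested_lists(item: list) -> bytes:
    """Encode a list that contains only (possibly nested) lists, Rule 4 only.

    Hypothetical helper for illustration; long payloads raise ValueError.
    """
    payload = b"".join(encode_nested_lists(child) for child in item)
    if len(payload) > 55:
        raise ValueError("payloads longer than 55 bytes need Rule 5")
    return bytes([0xC0 + len(payload)]) + payload


three = von_neumann(3)
print(three)                                # [[], [[]], [[], [[]]]]
print(encode_nested_lists(three).hex(" "))  # c7 c0 c1 c0 c3 c0 c1 c0
```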

Decoding Rule

Following the rules and process of RLP encoding, the input to RLP decoding is treated as an array of binary data. The decoding process is as follows:

  1. from the first byte (i.e. the prefix) of the input, determine the data type, the length of the actual data, and its offset;
  2. decode the data according to its type and offset;
  3. continue decoding the rest of the input.
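Step 1 of the process can be sketched as a function that looks only at the prefix. The name decode_length and the (kind, offset, length) return shape are assumptions made for this note. The offset is where the data starts inside the input, and the length is how many bytes of data follow.

```python
def decode_length(data: bytes):
    """Return (kind, offset, length) for the first item in data.

    Hypothetical helper for illustration; it does no validation beyond
    rejecting empty input.
    """
    if not data:
        raise ValueError("input is empty")
    prefix = data[0]
    if prefix <= 0x7F:
        return "string", 0, 1                # the byte is the data itself
    if prefix <= 0xB7:
        return "string", 1, prefix - 0x80    # short string
    if prefix <= 0xBF:
        len_of_len = prefix - 0xB7           # long string
        length = int.from_bytes(data[1:1 + len_of_len], "big")
        return "string", 1 + len_of_len, length
    if prefix <= 0xF7:
        return "list", 1, prefix - 0xC0      # short list
    len_of_len = prefix - 0xF7               # long list
    length = int.from_bytes(data[1:1 + len_of_len], "big")
    return "list", 1 + len_of_len, length


print(decode_length(b"\x83dog"))                            # ('string', 1, 3)
print(decode_length(b"\xb9\x04\x00" + b"a" * 1024))         # ('string', 3, 1024)
print(decode_length(bytes.fromhex("c88363617483646f67")))   # ('list', 1, 8)
```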

Implementation Rules

Rule 1: the data is a string if the first byte (i.e. the prefix) is in the range [0x00, 0x7f], and the string is exactly the first byte itself;

Rule 2: the data is a string if the first byte is in the range [0x80, 0xb7], and the string, whose length is equal to the first byte minus 0x80, follows the first byte;

  • For example, the encoded string “dog” is [ 0x83, ‘d’, ‘o’, ‘g’ ]

If the first byte (prefix) is within 0x80–0xb7,

  • the length of the string = first byte - 0x80
  • in this case, the length = 0x83 - 0x80 = 0x03.

Rule 3: the data is a string if the first byte is in the range [0xb8, 0xbf]. The length of the string, which takes (first byte minus 0xb7) bytes, follows the first byte, and the string follows the length of the string;

  • For example, a 1024 byte long string would be encoded as \xb9\x04\x00 (dec. 185, 4, 0) followed by the string.
  • the length in bytes of the string length = first byte - 0xb7
  • 0xb9 - 0xb7 = 0x02.
  • The length of the string is stored in 0x02 bytes (meaning the 2 bytes after the first byte hold the actual length of the data)
  • In this case, 0xb90400…, the 2 bytes after 0xb9 are 0x0400, which is 1024 in decimal.

Rule 4: the data is a list if the first byte is in the range [0xc0, 0xf7]. The concatenation of the RLP encodings of all items of the list, whose total length (the payload) is equal to the first byte minus 0xc0, follows the first byte;

  • the list [ “cat”, “dog” ] = [ 0xc8, 0x83, 'c', 'a', 't', 0x83, 'd', 'o', 'g' ]
  • 0xc8 - 0xc0 = 0x08, which is the length of the concatenation of the RLP encodings of all items of the list.
  • This means the 8 bytes after the first byte are the concatenation of the RLP encodings of all items of the list.
  • Then check the range of the second byte, 0x83.
  • Apply Rule 2 here to get the length of the string.
  • 0x83 - 0x80 = 3, so the next 3 bytes are “cat”; the same steps then decode “dog”.

Rule 5: the data is a list if the first byte is in the range [0xf8, 0xff]. [1] The first byte minus 0xf7 gives the length in bytes of the payload length; [2] the payload length follows the first byte; and [3] the concatenation of the RLP encodings of all items of the list follows the payload length;

If we put [1], [2], and [3] back together (see encoding Rule 5), we get:

  • (0xf7 + the length in bytes of the length of the payload in binary form), followed by the length of the payload, followed by the concatenation of the RLP encodings of the items.
  • This rule is similar to Rule 3 and Rule 4.
  • TL (the length in bytes of the payload length in binary form) = first byte - 0xf7
  • Then take the next TL bytes of the RLP-encoded input as the length of the payload.
  • Read that many bytes after the payload length and decode them as the items of the list.
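The five decoding rules can be combined into one recursive decoder, sketched below. The names rlp_decode, _read_header and _decode_at are made up for this note. The sketch checks only that the input is long enough and has no trailing bytes; it does not reject non-canonical encodings, which a real implementation would need to do.

```python
def _read_header(data: bytes, pos: int):
    """Return (is_list, start, length) for the item whose prefix is at pos."""
    prefix = data[pos]
    if prefix <= 0x7F:
        return False, pos, 1                      # Rule 1
    if prefix <= 0xB7:
        return False, pos + 1, prefix - 0x80      # Rule 2
    if prefix <= 0xBF:
        n = prefix - 0xB7                         # Rule 3
        return False, pos + 1 + n, int.from_bytes(data[pos + 1:pos + 1 + n], "big")
    if prefix <= 0xF7:
        return True, pos + 1, prefix - 0xC0       # Rule 4
    n = prefix - 0xF7                             # Rule 5: TL = first byte - 0xf7
    return True, pos + 1 + n, int.from_bytes(data[pos + 1:pos + 1 + n], "big")


def _decode_at(data: bytes, pos: int):
    """Decode one item starting at pos; return (item, position after it)."""
    is_list, start, length = _read_header(data, pos)
    end = start + length
    if end > len(data):
        raise ValueError("input is shorter than the declared length")
    if not is_list:
        return data[start:end], end
    items = []
    cursor = start
    while cursor < end:  # keep decoding items until the payload is used up
        child, cursor = _decode_at(data, cursor)
        items.append(child)
    return items, end


def rlp_decode(data: bytes):
    """Decode a single RLP item into bytes or nested lists of bytes."""
    item, end = _decode_at(data, 0)
    if end != len(data):
        raise ValueError("trailing bytes after the first item")
    return item


print(rlp_decode(bytes.fromhex("c88363617483646f67")))  # [b'cat', b'dog']
print(rlp_decode(bytes.fromhex("c7c0c1c0c3c0c1c0")))    # [[], [[]], [[], [[]]]]
```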

Conclusion

The RLP Serialization Standard is a crucial component in Ethereum’s execution layer, enabling efficient data transfer between nodes. By providing a standardized approach to encoding nested arrays of binary data, RLP facilitates the interoperability of Ethereum-based applications and protocols.

For more detailed information and implementation guidelines, refer to the official RLP documentation.
