Radix Tree Data Structure

Justin Thein Kyaw
6 min readNov 27, 2023

--

About Radix in wikipedia.org

In computer science, a radix tree (also radix trie or compact prefix tree or compressed trie) is a data structure that represents a space-optimized trie (prefix tree) in which each node that is the only child is merged with its parent. The result is that the number of children of every internal node is at most the radix r of the radix tree, where r is a positive integer and a power x of 2, having x ≥ 1.

Unlike regular trees, edges can be labeled with sequences of elements as well as single elements. This makes radix trees much more efficient for small sets (especially if the strings are long) and for sets of strings that share long prefixes.

Unlike regular trees (where whole keys are compared en masse from their beginning up to the point of inequality), the key at each node is compared chunk-of-bits by chunk-of-bits, where the quantity of bits in that chunk at that node is the radix r of the radix trie.

Explanation

The key at each node in the radix trie is compared chunk-of-bits by chunk-of-bits. The term “chunk-of-bits” refers to a group of consecutive bits in the binary representation of the key.

The quantity of bits in each chunk at a particular node is determined by the radix r of the radix trie. When r is set to 2, the radix trie is binary, and each node compares a 1-bit portion of the key.

Explain Radix tree in One pics.

Sparseness

“Sparseness” refers to the potential inefficiency arising from internal nodes having fewer children than the maximum allowed by the radix r of the trie. Sparseness occurs when internal nodes have fewer than r children, leaving gaps or unused slots in the node.

  • Wasted Memory: Sparseness can lead to wasted memory because internal nodes are allocated space for a certain number of children (r), even if they have fewer actual children.
  • Potential for Gaps: The presence of gaps in internal nodes can complicate trie traversal algorithms, as there may be unused slots that need to be skipped during key comparison.

The choice of radix r in a radix trie involves a trade-off between trie depth and sparseness. A larger r (e.g., r=16) reduces trie depth but may introduce more sparseness because internal nodes need to have 16 children. A smaller r (e.g., r=2) minimizes sparseness but can lead to deeper tries.

Some trie variations or optimizations aim to address sparseness. For instance, Patricia Tries (or Radix Tries) use a compact representation that can reduce memory wastage by storing multiple bits per node.

Insertion

To insert a string, we search the tree until we can make no further progress. At this point we either add a new outgoing edge labeled with all remaining elements in the input string, or if there is already an outgoing edge sharing a prefix with the remaining input string, we split it into two edges (the first labeled with the common prefix) and proceed. This splitting step ensures that no node has more children than there are possible string elements.

Deletion

To delete a string x from a tree, we first locate the leaf representing x. Then, assuming x exists, we remove the corresponding leaf node. If the parent of our leaf node has only one other child, then that child’s incoming label is appended to the parent’s incoming label and the child is removed.

Child merging to Parent

Redundant Edge and Redundant Chain of Edges

In the context of radix tries or Patricia tries, the terms “redundant” and “redundant chain of edges” refer to situations where the trie structure contains edges that do not contribute to distinguishing between keys. These redundant edges may arise due to common prefixes shared by keys in the trie.

A redundant edge is an edge in the trie that doesn’t add any new information for distinguishing between keys. It represents a common prefix shared by the keys associated with the nodes at both ends of the edge.

A redundant chain of edges is a sequence of edges in the trie, leading from the root to a leaf, where each edge in the sequence is redundant. Such a chain may result from a series of keys sharing a common prefix.

Limitation of Radix Tree.

1) Radix Tree needs Reversibility (String-Like Elements or Reversible Mapping)

Radix trees are well-suited for scenarios where the keys are strings or can be efficiently mapped to strings. Each node in the radix tree represents a portion of the key.

In a radix tree, each node represents a portion of a key. For example, if you have a radix tree storing words, each node might represent a letter in a word. The path from the root to a leaf node corresponds to the complete key.

“Mapped to strings” we mean that the keys or portions of the keys are represented as strings. This mapping is typically reversible, meaning you can convert between the original key and its string representation.

For example:

Original keys: Words like “bat,” “batman,” and “batwoman” are keys

String Representation: We can represent each letter in these words as a string. For example, “bat” might be represented as the string “b,” “a,” “t.”

Reversibility: The mapping from the original keys to strings is reversible. You can reconstruct the original keys from their string representations.

If your keys are strings, this mapping is straightforward, as each character in the string is a portion of the key. If your keys are not inherently strings (e.g., numeric values or custom objects), you need a reversible mapping to convert them into strings for use in the radix tree.

The reversible mapping ensures that you can easily convert a portion of a key back to its original form. This reversible mapping is crucial for radix trees because it allows for the efficient organization of keys based on their string representations. It enables the trie structure to efficiently handle prefix-based searches and comparisons.

While the mapping of keys to strings is advantageous for trie operations, it’s important to note that this doesn’t mean all keys must be character strings. Instead, it means that keys or portions of keys are represented as strings to take advantage of the inherent properties of strings for efficient trie-based operations. This approach is particularly common in radix trees, which leverage string representations for effective prefix-based searches.

One specific challenge highlighted is when a data type only provides a comparison operation but not a (de)serialization operation. (De)serialization is important for radix trees because it allows converting elements to and from strings.

2) Lack of Generality Compared to Balanced Search Trees

Delve into the concept of generality and total ordering in the context of radix trees and balanced search trees (such as AVL trees or Red-Black trees).

Radix Tree are specific to Strings or Reversible Mapping.

Radix trees are designed with a focus on strings or elements that can be efficiently mapped to strings. Each node in a radix tree typically represents a character in the string key, and the tree relies on the lexicographic order of characters.

Radix trees, by their nature, lack the full generality of handling any data type with a total ordering. If your keys are not naturally strings, a reversible mapping to strings is required for effective use in a radix tree.

Balanced Search Trees

Balanced search trees, like AVL trees or Red-Black trees, can handle any data type that has a total ordering defined. Total ordering means there is a consistent way to compare any two elements, facilitating sorting and efficient searching. Unlike radix trees, balanced search trees do not depend on strings or reversible mappings. They directly use the total ordering of elements.

For Example:

Let’s consider an example with a custom data type, a Person object, with attributes age and name

In this radix tree, each node represents a digit or character in the age or name. The lexicographic order of strings is utilized to organize keys.

In this AVL tree, the total ordering of the Person objects is used directly. It doesn't rely on string representations, making it more versatile for any data type with a total ordering.

Summary:

  • Radix trees excel when keys are strings or can be mapped to strings efficiently.
  • Balanced search trees offer greater generality, handling any data type with a total ordering without dependence on string representations.

In situations where keys are not naturally strings, and a total ordering is defined, balanced search trees provide a more flexible and general solution.

--

--