# SSZ Offset Exploits
This document describes how a malicious entity may exploit a naive SSZ implementation to create multiple, distinct SSZ representations of the same object.
## Introduction
**Note:** this document is not comprehensive or guaranteed to be error-free. It should be considered a collection of informal notes.
**Note:** these potential attacks are not unavoidable flaws in SSZ; they will only be present in an imperfect implementation.
These potential exploits are only related to malicious _offsets_; data validation attacks are ignored (e.g., ensuring that exactly 8 bytes are provided to decode a `uint64`).
First, some helper functions are described; then each exploit is listed along with Python-esque pseudo-code for a potential safeguard.
## Detail
### Helpers & Examples
We define helper functions where `enc` is of type `bytes` and is the SSZ encoding of some container, list or vector. These functions are assumed to operate in a magic context where they may access all information required to decode the object (e.g., the schema).
With each function we provide an example that references the following struct (defined in Rust, sorry; it should be straightforward to understand: `Vec` is a variable-length list):
```rust
struct Example {
    a: u16,
    b: Vec<u8>,
    c: Vec<u8>,
}

let example_a = Example {
    a: 42,
    b: vec![5, 6],
    c: vec![7, 8],
};

// schema: | a | b (offset) | c (offset) | b | c |
let enc = vec![42, 00, 10, 00, 00, 00, 12, 00, 00, 00, 05, 06, 07, 08];
// indices:     0   1   2   3   4   5   6   7   8   9  10  11  12  13

assert_eq!(
    ssz_encode(example_a),
    enc
);
```
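For readers more comfortable with Python, the same layout can be reproduced with a minimal sketch. `encode_example` is a hypothetical helper written for this note, not part of any SSZ library; it assumes SSZ's 4-byte little-endian offsets and hard-codes the `Example` schema.

```python
import struct


def encode_example(a, b, c):
    """Encode `Example` using the layout | a | b (offset) | c (offset) | b | c |."""
    fixed_len = 2 + 4 + 4          # u16 + two 4-byte offsets
    b_offset = fixed_len           # `b` starts right after the fixed portion
    c_offset = b_offset + len(b)   # `c` starts right after `b`
    fixed = struct.pack("<HII", a, b_offset, c_offset)
    return fixed + bytes(b) + bytes(c)


enc = encode_example(42, [5, 6], [7, 8])
print(list(enc))  # [42, 0, 10, 0, 0, 0, 12, 0, 0, 0, 5, 6, 7, 8]
```

Note that the offsets are derived from the data by the encoder; every attack in this document amounts to supplying offsets that an honest encoder would never produce.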
### Helpers
#### `num_fixed_bytes(bytes)`
Returns the length of the fixed-length portion of `enc`.
Example:
```python
num_fixed_bytes(ssz_encode(example_a)) == 10
```
#### `offsets(bytes)`
Returns a list containing every offset in `enc`, in order of appearance, each decoded as an `int`.
Example:
```python
offsets(ssz_encode(example_a)) == [10, 12]
```
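One way the two helpers might look outside the "magic context" is sketched below. This is an illustration only: the schema is passed explicitly as a list of field sizes (`None` marks a variable-length field), which is an assumption of this sketch and not how the helpers are declared above.

```python
import struct

BYTES_PER_OFFSET = 4  # SSZ offsets are 4-byte little-endian unsigned integers

# Hypothetical schema description for `Example`: a fixed-size field is given
# by its byte length, a variable-size field by None.
EXAMPLE_SCHEMA = [2, None, None]  # a: u16, b: Vec<u8>, c: Vec<u8>


def num_fixed_bytes(schema):
    """Fixed-length portion: fixed fields plus one offset per variable field."""
    return sum(BYTES_PER_OFFSET if size is None else size for size in schema)


def offsets(enc, schema):
    """Decode every offset in the fixed-length portion, in order of appearance."""
    found = []
    position = 0
    for size in schema:
        if size is None:
            (offset,) = struct.unpack_from("<I", enc, position)
            found.append(offset)
            position += BYTES_PER_OFFSET
        else:
            position += size
    return found


enc = bytes([42, 0, 10, 0, 0, 0, 12, 0, 0, 0, 5, 6, 7, 8])
print(num_fixed_bytes(EXAMPLE_SCHEMA))  # 10
print(offsets(enc, EXAMPLE_SCHEMA))     # [10, 12]
```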
## Attacks
## 1. Offset into fixed portion
An offset points "backwards" into the fixed-bytes portion of the message, essentially double-decoding bytes that will also be decoded as fixed-length.
Potential safeguard:
```python
for offset in offsets(enc):
    assert offset >= num_fixed_bytes(enc)
```
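To make this attack concrete, the sketch below crafts an encoding of `Example` whose first offset is `2` and shows what an unchecked decoder would read. `example_offsets` is a hypothetical stand-in for `offsets(enc)` that hard-codes the `Example` layout; the "naive decoder" is simply a slice between the two offsets.

```python
import struct

FIXED_LEN = 10  # `a` (2 bytes) + two 4-byte offsets, per the `Example` layout


def example_offsets(enc):
    """Hypothetical stand-in for `offsets(enc)`: the offsets sit at indices 2 and 6."""
    return list(struct.unpack_from("<II", enc, 2))


def offsets_clear_fixed_portion(enc):
    """Safeguard 1: no offset may point into the fixed-length portion."""
    return all(offset >= FIXED_LEN for offset in example_offsets(enc))


honest = bytes([42, 0, 10, 0, 0, 0, 12, 0, 0, 0, 5, 6, 7, 8])
crafted = bytes([42, 0, 2, 0, 0, 0, 12, 0, 0, 0, 5, 6, 7, 8])  # b's offset is 2

# A naive decoder slices `b` between the two offsets without checking them,
# so it re-reads the offset bytes themselves as the contents of `b`.
b_start, c_start = example_offsets(crafted)
naive_b = list(crafted[b_start:c_start])

print(naive_b)                               # [2, 0, 0, 0, 12, 0, 0, 0, 5, 6]
print(offsets_clear_fixed_portion(honest))   # True
print(offsets_clear_fixed_portion(crafted))  # False
```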
## 2. Skip first variable byte
The first offset does not point to the byte immediately following the fixed-length portion, so one or more bytes of the variable-length portion are skipped and never decoded.
Potential safeguard:
```python
enc_offsets = offsets(enc)  # avoid shadowing the `offsets` helper
if len(enc_offsets) > 0:
    assert enc_offsets[0] == num_fixed_bytes(enc)
```
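This attack directly yields several encodings of one object, because the skipped byte can hold any value. A minimal sketch, assuming a naive decoder for `Example` that slices on the offsets without validating them (`naive_decode` and `example_offsets` are hypothetical names used only here):

```python
import struct

FIXED_LEN = 10  # `a` (2 bytes) + two 4-byte offsets, per the `Example` layout


def example_offsets(enc):
    """Hypothetical stand-in for `offsets(enc)`: the offsets sit at indices 2 and 6."""
    return list(struct.unpack_from("<II", enc, 2))


def naive_decode(enc):
    """Decode `Example` by slicing on the offsets, with no validation at all."""
    (a,) = struct.unpack_from("<H", enc, 0)
    b_start, c_start = example_offsets(enc)
    return a, list(enc[b_start:c_start]), list(enc[c_start:])


def first_offset_is_tight(enc):
    """Safeguard 2: the first offset must equal the fixed-length portion's size."""
    found = example_offsets(enc)
    return len(found) == 0 or found[0] == FIXED_LEN


canonical = bytes([42, 0, 10, 0, 0, 0, 11, 0, 0, 0, 6, 7, 8])
# Both crafted encodings skip index 10, which may therefore hold any byte.
padded_1 = bytes([42, 0, 11, 0, 0, 0, 12, 0, 0, 0, 0xAA, 6, 7, 8])
padded_2 = bytes([42, 0, 11, 0, 0, 0, 12, 0, 0, 0, 0xBB, 6, 7, 8])

# Three distinct byte strings, one decoded object.
print(naive_decode(canonical))  # (42, [6], [7, 8])
print(naive_decode(padded_1))   # (42, [6], [7, 8])
print(naive_decode(padded_2))   # (42, [6], [7, 8])

print(first_offset_is_tight(canonical))  # True
print(first_offset_is_tight(padded_1))   # False
```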
## 3. Offsets are decreasing
An offset points to bytes prior to the previous offset. Depending on how you look at it, this either double-decodes bytes or gives the item at the previous offset a negative length.
Potential safeguard:
```python
enc_offsets = offsets(enc)  # avoid shadowing the `offsets` helper
for i in range(1, len(enc_offsets)):
    assert enc_offsets[i] >= enc_offsets[i - 1]
```
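As an illustration, the sketch below swaps the two offsets of the `Example` encoding and computes the length a naive decoder would derive for `b`. `example_offsets` is a hypothetical stand-in for `offsets(enc)` that hard-codes the `Example` layout.

```python
import struct


def example_offsets(enc):
    """Hypothetical stand-in for `offsets(enc)`: the offsets sit at indices 2 and 6."""
    return list(struct.unpack_from("<II", enc, 2))


def offsets_non_decreasing(enc):
    """Safeguard 3: each offset must be at least as large as the one before it."""
    found = example_offsets(enc)
    return all(found[i] >= found[i - 1] for i in range(1, len(found)))


honest = bytes([42, 0, 10, 0, 0, 0, 12, 0, 0, 0, 5, 6, 7, 8])
crafted = bytes([42, 0, 12, 0, 0, 0, 10, 0, 0, 0, 5, 6, 7, 8])  # offsets swapped

# A naive decoder derives the length of `b` as the gap between the offsets.
b_start, c_start = example_offsets(crafted)
naive_b_length = c_start - b_start

print(naive_b_length)                   # -2
print(offsets_non_decreasing(honest))   # True
print(offsets_non_decreasing(crafted))  # False
```

In Python the slice `crafted[12:10]` is silently empty; a decoder that computes the length with unsigned arithmetic may behave very differently, so the check should run before any length is derived.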
## 4. Offsets are out-of-bounds
An offset references byte indices that do not exist in `enc`.
Potential safeguard:
```python
for offset in offsets(enc):
    assert offset <= len(enc)
```
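_Note: use `<=` rather than `<`, since an offset equal to `len(enc)` is valid when the final variable-length item is empty; e.g., `[[]]` (a list holding one empty list) is encoded as `[4, 0, 0, 0]`._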
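The sketch below shows why this matters even in a memory-safe language: Python slicing silently clamps out-of-range indices, so a naive decoder accepts the crafted bytes and produces the same object as a well-formed encoding. `naive_decode`, `example_offsets` and `offsets_in_bounds` are hypothetical names used only for this illustration.

```python
import struct


def example_offsets(enc):
    """Hypothetical stand-in for `offsets(enc)`: the offsets sit at indices 2 and 6."""
    return list(struct.unpack_from("<II", enc, 2))


def naive_decode(enc):
    """Decode `Example` by slicing on the offsets, with no validation at all."""
    (a,) = struct.unpack_from("<H", enc, 0)
    b_start, c_start = example_offsets(enc)
    return a, list(enc[b_start:c_start]), list(enc[c_start:])


def offsets_in_bounds(enc, enc_offsets):
    """Safeguard 4: every offset must be at most `len(enc)`."""
    return all(offset <= len(enc) for offset in enc_offsets)


well_formed = bytes([42, 0, 10, 0, 0, 0, 14, 0, 0, 0, 5, 6, 7, 8])
crafted = bytes([42, 0, 10, 0, 0, 0, 200, 0, 0, 0, 5, 6, 7, 8])  # c's offset is 200

# Slicing clamps 200 to the end of the buffer, so both decode identically.
print(naive_decode(well_formed))  # (42, [5, 6, 7, 8], [])
print(naive_decode(crafted))      # (42, [5, 6, 7, 8], [])

print(offsets_in_bounds(well_formed, example_offsets(well_formed)))  # True
print(offsets_in_bounds(crafted, example_offsets(crafted)))          # False

# The `[[]]` case from the note: a single offset equal to len(enc) is valid.
nested_empty = bytes([4, 0, 0, 0])
print(offsets_in_bounds(nested_empty, [4]))  # True
```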