Model-Based Stateful Testing with proptest-state-machine

This post is part of the Model-Based Stateful Testing with madhouse-rs series.

Imagine trying to test a distributed system with dozens of operations:

miners submitting blocks
nodes joining and leaving
transactions flooding the mempool
network partitions
and more.

Traditional unit tests can’t capture the chaotic, interleaved nature of these scenarios.

Model-based testing offers a solution: define all possible operations, let the framework generate random sequences, and check that your system behaves correctly. But… how you structure those operations determines whether your test harness scales from 5 commands to 500.

proptest-state-machine sits firmly on the operations-open, data-closed side of the expression problem. You define a central Transition enum that lists every possible operation.

The Enum Approach

Let’s explore this with a simple counter example first—while the real-world blockchain scenarios would be more complex to demonstrate initially, the core design patterns are identical. We’ll see the blockchain applications later in the series.

use proptest_state_machine::*;
use proptest::prelude::*;

// The model state.
#[derive(Clone, Debug)]
struct CounterModel {
    value: i32,
    max_value: i32,
}

// The real system under test.
struct Counter {
    value: i32,
    max_value: i32,
}

// All possible operations in one enum.
#[derive(Clone, Debug)]
enum CounterTransition {
    Inc,
    Dec,
    Reset,
}

impl StateMachine for CounterModel {
    type State = CounterModel;
    type Sut = Counter;
    type Transition = CounterTransition;

    fn init_state() -> BoxedStrategy<Self::State> {
        Just(CounterModel { value: 0, max_value: 100 }).boxed()
    }

    fn init_sut(state: &Self::State) -> BoxedStrategy<Self::Sut> {
        Just(Counter {
            value: state.value,
            max_value: state.max_value
        }).boxed()
    }

    fn transitions(_state: &Self::State) -> BoxedStrategy<Self::Transition> {
        prop_oneof![
            Just(CounterTransition::Inc),
            Just(CounterTransition::Dec),
            Just(CounterTransition::Reset),
        ].boxed()
    }

    fn apply(
        mut state: Self::State,
        sut: &mut Self::Sut,
        transition: Self::Transition,
    ) -> Self::State {
        match transition {
            CounterTransition::Inc => {
                if state.value < state.max_value {
                    state.value += 1;
                    sut.value += 1;
                }
            }
            CounterTransition::Dec => {
                state.value -= 1;
                sut.value -= 1;
            }
            CounterTransition::Reset => {
                state.value = 0;
                sut.value = 0;
            }
        }
        state
    }

    fn postconditions(state: &Self::State, sut: &Self::Sut) {
        assert_eq!(state.value, sut.value);
        assert_eq!(state.max_value, sut.max_value);
        assert!(state.value <= state.max_value);
    }
}

The Scalability Problem

This approach starts simple, but every new operation requires:

Adding a variant to the central CounterTransition enum.
Updating the apply function with a new match arm.
Updating the transitions function to include the new operation.

With 10 operations, this is manageable. With 100+ operations—like testing a blockchain node—it becomes unwieldy. The apply function grows into a monolithic match statement. Every developer adding a test command must touch this central file.

Real-World Complexity

Consider testing the Stacks blockchain, where operations include:

enum StacksTransition {
    MineBitcoinBlock,
    MineStacksBlock(BlockData),
    SubmitTransaction(Transaction),
    ConnectPeer(PeerInfo),
    DisconnectPeer(PeerId),
    NetworkPartition(Vec<NodeId>),
    RestoreNetwork,
    RestartNode(NodeId),
    // ... 50+ more operations.
}

The apply function becomes hundreds of lines. Adding new test scenarios requires editing this central bottleneck. Worse, the logic for each command is scattered—generation in transitions, preconditions mixed into apply, and postconditions in a separate function.

This scaling limitation becomes apparent in complex scenarios like reproducing a Stacks mainnet bug. While that specific attempt used hand-written tests, the enum-based approach would face the same bottleneck—dozens of blockchain operations in a central enum and apply function, making it difficult to generate the chaotic scenarios needed to reveal subtle consensus issues.

Next: Scaling Model-Based Stateful Testing with madhouse-rs