Scaling Model-Based Stateful Testing with madhouse-rs

This post is part of the Model-Based Stateful Testing with madhouse-rs series.

In the previous post, we saw how proptest-state-machine’s enum-based design becomes a bottleneck when scaling to hundreds of operations. What if there were a different approach, one that embraces the “data-open” side of the expression problem?

madhouse-rs was born from this exact frustration. When we tried to reproduce that elusive Stacks mainnet bug, the traditional enum approach simply couldn’t scale to the complexity we needed.

The Trait-Based Approach

Instead of a central enum, madhouse-rs makes each command its own type implementing a stable Command trait. There is no central bottleneck—no enum to extend, no monolithic match statement to update.

Let’s return to our counter example from the previous post to see how this trait-based approach works in practice:

use madhouse::prelude::*;
use proptest::prelude::*;
use std::sync::Arc;

// Define your state and context.
#[derive(Debug, Default)]
struct CounterState {
    value: u64,
    max_value: u64, // Upper bound enforced by IncrementCommand::check; the derived Default leaves it at 0.
}
impl State for CounterState {}

#[derive(Debug, Clone, Default)]
struct CounterContext {
    increment_range: (u64, u64),
}
impl TestContext for CounterContext {}

// Each operation is its own self-contained type.
struct IncrementCommand {
    amount: u64,
}

impl Command<CounterState, CounterContext> for IncrementCommand {
    // Check preconditions against the model state.
    fn check(&self, state: &CounterState) -> bool {
        state.value + self.amount <= state.max_value
    }

    // Apply the command to both model and real system.
    fn apply(&self, state: &mut CounterState) {
        state.value += self.amount;
        // In a real test, you'd also apply to the actual system here.
        println!("Incremented counter by {}, now at {}", self.amount, state.value);
    }

    // Human-readable label for debugging.
    fn label(&self) -> String {
        format!("INCREMENT({})", self.amount)
    }

    // Strategy for generating instances of this command.
    fn build(
        ctx: Arc<CounterContext>,
    ) -> impl Strategy<Value = CommandWrapper<CounterState, CounterContext>> {
        let (min, max) = ctx.increment_range;
        (min..=max).prop_map(|amount| CommandWrapper::new(IncrementCommand { amount }))
    }
}

struct ResetCommand;

impl Command<CounterState, CounterContext> for ResetCommand {
    fn check(&self, state: &CounterState) -> bool {
        state.value > 0  // Only reset if there's something to reset.
    }

    fn apply(&self, state: &mut CounterState) {
        state.value = 0;
        println!("Counter reset to 0");
    }

    fn label(&self) -> String {
        "RESET".to_string()
    }

    fn build(
        _ctx: Arc<CounterContext>,
    ) -> impl Strategy<Value = CommandWrapper<CounterState, CounterContext>> {
        Just(CommandWrapper::new(ResetCommand))
    }
}

Running the Scenario

With madhouse-rs, you compose test scenarios using the scenario! macro:

#[test]
fn test_counter_chaos() {
    let test_context = Arc::new(CounterContext {
        increment_range: (1, 100),
    });

    // Run the scenario - madhouse-rs handles the rest.
    scenario![
        test_context,
        IncrementCommand,
        ResetCommand,
        (IncrementCommand { amount: 42 })  // Fixed command instance.
    ];
}

The Power of Data-Open Design

What makes this approach scale? Each command is autonomous:

Self-contained logic: Generation, preconditions, and application logic all live together.
No central bottleneck: Adding DecrementCommand requires zero edits to existing code.
Composable: Mix and match commands freely in different test scenarios.
Maintainable: Each command can be developed, tested, and reviewed independently.
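
To make the "zero edits" claim concrete, here is a minimal, standard-library-only sketch. It assumes a simplified stand-in for the Command trait (no build method, no context type), so it is not the madhouse-rs API itself, and the run helper is hypothetical. DecrementCommand is added purely by writing a new type and impl; IncrementCommand and the runner stay untouched.

```rust
// Simplified stand-in for the post's Command trait (hypothetical: the real
// madhouse-rs trait also has `build` and is generic over a context type).
#[derive(Debug, Default)]
struct CounterState {
    value: u64,
    max_value: u64,
}

trait Command {
    fn check(&self, state: &CounterState) -> bool;
    fn apply(&self, state: &mut CounterState);
    fn label(&self) -> String;
}

// Existing command: not modified by the addition below.
struct IncrementCommand {
    amount: u64,
}

impl Command for IncrementCommand {
    fn check(&self, state: &CounterState) -> bool {
        state.value + self.amount <= state.max_value
    }
    fn apply(&self, state: &mut CounterState) {
        state.value += self.amount;
    }
    fn label(&self) -> String {
        format!("INCREMENT({})", self.amount)
    }
}

// New command: a new type plus one impl block, nothing else changes.
struct DecrementCommand {
    amount: u64,
}

impl Command for DecrementCommand {
    fn check(&self, state: &CounterState) -> bool {
        // Never drive the unsigned counter below zero.
        self.amount <= state.value
    }
    fn apply(&self, state: &mut CounterState) {
        state.value -= self.amount;
    }
    fn label(&self) -> String {
        format!("DECREMENT({})", self.amount)
    }
}

// Hypothetical runner: applies each command whose precondition holds and
// returns the labels of the commands that actually ran.
fn run(state: &mut CounterState, commands: &[Box<dyn Command>]) -> Vec<String> {
    let mut executed = Vec::new();
    for cmd in commands {
        if cmd.check(state) {
            cmd.apply(state);
            executed.push(cmd.label());
        }
    }
    executed
}

fn main() {
    let mut state = CounterState { value: 0, max_value: 10 };
    let commands: Vec<Box<dyn Command>> = vec![
        Box::new(DecrementCommand { amount: 1 }), // skipped: counter is still 0
        Box::new(IncrementCommand { amount: 5 }),
        Box::new(DecrementCommand { amount: 2 }),
    ];
    let executed = run(&mut state, &commands);
    println!("{:?}, final value = {}", executed, state.value);
}
```

The runner only sees trait objects, so it never needs to know which concrete commands exist. That is the property an enum plus a central match cannot offer.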

Real-World Impact

Update (June 14, 2025): This design proved its worth in testing the Stacks blockchain. Consider this actual test scenario from stacks-core PR #6007, which was merged yesterday:

scenario![
    test_context,
    SkipCommitOpMiner2,
    BootToEpoch3,
    SkipCommitOpMiner1,
    PauseStacksMining,
    MineBitcoinBlock,
    VerifyMiner1WonSortition,
    SubmitBlockCommitMiner2,
    ResumeStacksMining,
    WaitForTenureChangeBlockFromMiner1,
    MineBitcoinBlock,
    VerifyMiner2WonSortition,
    VerifyLastSortitionWinnerReorged,
    WaitForTenureChangeBlockFromMiner2,
    ShutdownMiners
]

Each of those 14 steps (13 distinct command types, since MineBitcoinBlock appears twice) is a self-contained Command implementation. No central enum to maintain. No monolithic match statement. No coordination between developers adding new test operations.

More importantly, when the framework runs with MADHOUSE=1, it generates random permutations of these operations, creating chaotic scenarios that hand-written tests are unlikely to cover. This is how the framework can reproduce production bugs that traditional testing might miss.
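
The switch between the two modes can be pictured with a small standard-library sketch. This is an illustration under assumptions, not the madhouse-rs implementation: the real framework draws commands from the proptest strategies returned by build, whereas the plan function and the tiny xorshift generator below are hypothetical stand-ins that only show how one environment variable can flip a fixed order into a shuffled one.

```rust
use std::env;

// Tiny xorshift64 generator so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

// Returns the steps in declared order, or a seeded Fisher-Yates shuffle of
// them when `chaos` is true. The same seed always yields the same order.
fn plan(steps: &[&'static str], chaos: bool, seed: u64) -> Vec<&'static str> {
    let mut order = steps.to_vec();
    if chaos {
        // A zero seed would leave xorshift stuck at zero, so nudge it to 1.
        let mut rng = XorShift64(seed.max(1));
        for i in (1..order.len()).rev() {
            let j = (rng.next_u64() % (i as u64 + 1)) as usize;
            order.swap(i, j);
        }
    }
    order
}

fn main() {
    let steps = [
        "BootToEpoch3",
        "PauseStacksMining",
        "MineBitcoinBlock",
        "ResumeStacksMining",
    ];
    // Mirrors the MADHOUSE=1 switch described above.
    let chaos = env::var("MADHOUSE").map(|v| v == "1").unwrap_or(false);
    for step in plan(&steps, chaos, 42) {
        println!("{step}");
    }
}
```

In a shuffled run, many orderings are invalid for the system under test. That is where each command's check method earns its keep: steps whose preconditions do not hold can be skipped instead of failing the test.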

The Expression Problem Solved

By choosing the “data-open” side, madhouse-rs makes it trivial to add new command types while keeping the core operations (check, apply, label, build) stable. This is exactly the opposite trade-off from proptest-state-machine, and for model-based testing at scale, it’s the right choice.
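
The cost of this choice sits on the other axis: a new operation on the trait touches every command, unless it ships with a default body. A minimal sketch of that mitigation follows, using a cut-down Command trait and a hypothetical is_destructive method that is not part of madhouse-rs:

```rust
trait Command {
    fn label(&self) -> String;

    // Hypothetical operation added after the fact. The default body keeps
    // every existing implementation compiling unchanged.
    fn is_destructive(&self) -> bool {
        false
    }
}

struct IncrementCommand {
    amount: u64,
}

// Written before `is_destructive` existed; it relies on the default.
impl Command for IncrementCommand {
    fn label(&self) -> String {
        format!("INCREMENT({})", self.amount)
    }
}

struct ResetCommand;

// Opts in to the new operation by overriding it.
impl Command for ResetCommand {
    fn label(&self) -> String {
        "RESET".to_string()
    }
    fn is_destructive(&self) -> bool {
        true
    }
}

fn main() {
    let commands: Vec<Box<dyn Command>> =
        vec![Box::new(IncrementCommand { amount: 3 }), Box::new(ResetCommand)];
    for cmd in &commands {
        println!("{} destructive={}", cmd.label(), cmd.is_destructive());
    }
}
```

A required method without a default would force an edit to every command at once, which is why a small, stable set of operations matters so much in a data-open design.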

Next: Chaos Testing stacks-node with Model-Based Stateful Testing