The Expression Problem in Practice: A Trait-Based Testing Harness

This post is part of the Model-Based Stateful Testing with madhouse-rs series.

We started this series with a production bug that couldn’t be reproduced. We end with a framework that can not only catch that bug but also fundamentally change how we think about testing complex systems. The journey reveals practical lessons about the expression problem that extend far beyond testing.

The Design That Emerged

Through trial and error, madhouse-rs converged on a simple but powerful architecture, as described in the whitepaper commit:

Each Command follows a predictable lifecycle:

  1. Generated by a proptest Strategy
  2. Validated via check against current state
  3. Applied via apply, mutating both model and real system
  4. Verified through assertions and postconditions
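
In code, this lifecycle amounts to a small driver loop. The sketch below is illustrative only: it assumes the Command trait shown later in this post is object-safe (its build function bounded with where Self: Sized), that SystemState implements Default, and that the command sequence has already been generated; the executor that ships with madhouse-rs may differ in detail.

// Illustrative driver loop for a pre-generated command sequence.
fn run_sequence(commands: Vec<Box<dyn Command<SystemState, SystemContext>>>) {
    let mut state = SystemState::default();
    for cmd in commands {
        // Step 2: skip commands whose preconditions fail in the current state.
        if !cmd.check(&state) {
            continue;
        }
        // Step 3: apply the command, mutating the model (a real harness also
        // drives the system under test here).
        cmd.apply(&mut state);
        // Step 4: postconditions are asserted inside apply or right after it;
        // the label is what shows up in failure output.
        println!("applied {}", cmd.label());
    }
}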

Why Traits Won Over Enums

The contrast with proptest-state-machine is instructive. Consider how each approach handles a new test operation:

Enum approach (proptest-state-machine):

// 1. Add to the central enum (affects everyone).
enum SystemTransition {
    ExistingOp1,
    ExistingOp2,
    NewOperation(NewOpData), // <- New variant.
}

// 2. Update the central apply function (affects everyone).
fn apply(state: State, transition: SystemTransition) -> State {
    match transition {
        SystemTransition::ExistingOp1 => { /* existing logic */ }
        SystemTransition::ExistingOp2 => { /* existing logic */ }
        SystemTransition::NewOperation(data) => { // <- New arm.
            // New logic scattered across this central function.
        }
    }
}

// 3. Update the transitions function (affects everyone).
fn transitions() -> BoxedStrategy<SystemTransition> {
    prop_oneof![
        existing_strategy_1(),
        existing_strategy_2(),
        new_operation_strategy(), // <- New generator.
    ].boxed()
}

Trait approach (madhouse-rs):

// Self-contained - zero impact on existing code.
struct NewOperationCommand {
    data: NewOpData,
}

impl Command<SystemState, SystemContext> for NewOperationCommand {
    fn check(&self, state: &SystemState) -> bool {
        // Preconditions logic here.
    }

    fn apply(&self, state: &mut SystemState) {
        // Application logic here.
    }

    fn label(&self) -> String {
        format!("NEW_OPERATION({:?})", self.data)
    }

    fn build(ctx: Arc<SystemContext>) -> impl Strategy<Value = CommandWrapper<SystemState, SystemContext>> {
        // Generation strategy here.
        new_operation_strategy()
            .prop_map(|data| CommandWrapper::new(NewOperationCommand { data }))
    }
}

The difference is profound: trait-based commands are autonomous. All logic—generation, preconditions, application, and labeling—lives in one place. No coordination required.
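
To make that concrete, here is a hedged sketch of one way autonomous commands might be combined into a single generation strategy at the test entry point. combined_strategy and ExistingCommand are hypothetical names; madhouse-rs ships its own composition machinery, which may differ in shape.

use proptest::prelude::*;
use std::sync::Arc;

// Hypothetical composition point: each command contributes its own strategy.
fn combined_strategy(
    ctx: Arc<SystemContext>,
) -> impl Strategy<Value = CommandWrapper<SystemState, SystemContext>> {
    prop_oneof![
        ExistingCommand::build(ctx.clone()),
        NewOperationCommand::build(ctx), // One added line; no existing impl changes.
    ]
}

This one-line registration is the only shared touch point: generation, preconditions, application, and labeling all stay inside the new command’s own impl block.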

Real-World Scale: The PoX-4 Experience

Before madhouse-rs, we applied these principles with Radu Bahmata to test the Proof-of-Transfer (PoX-4) consensus using TypeScript and fast-check. The harness grew to include 20+ command types, each testing a different aspect of the staking protocol.

The key insight: each command class was self-contained. A developer could add StackExtendCommand without understanding the internals of DelegateStxCommand. The framework composed them automatically.

When a test failed after 200+ operations, the shrinking algorithm would reduce it to something like:

Original sequence: [200+ operations]
Shrunk to: [
    DelegateStx(account, pool),
    StackAggregationCommit(pool, account),
    RevokeDelegateStx(account),
    StackAggregationCommit(pool, account)
]

This four-step sequence revealed a subtle bug: revoking delegation didn’t properly invalidate pending aggregation commits. Finding this manually would have taken weeks.

Lessons for System Design

The expression problem appears everywhere in software design, not just testing frameworks:

1. Plugin Architectures

Want users to extend your system with new functionality? Choose the “data-open” side—make plugins implement traits rather than forcing them to modify central enums.

2. Event Systems

Need to handle dozens of event types? Each event type should be its own struct implementing an Event trait, not variants in a central enum.
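
A minimal sketch of that “data-open” shape, assuming an illustrative Event trait and event types that are not drawn from any particular library:

// Illustrative only: each event is its own type; dispatch needs no central enum.
trait Event {
    fn name(&self) -> &'static str;
    fn handle(&self);
}

struct UserRegistered {
    user_id: u64,
}

impl Event for UserRegistered {
    fn name(&self) -> &'static str {
        "user_registered"
    }

    fn handle(&self) {
        // Event-specific logic lives next to the event's own data.
        println!("registering user {}", self.user_id);
    }
}

fn dispatch(events: &[Box<dyn Event>]) {
    for event in events {
        // No match over event kinds: a new event type touches no code here.
        println!("handling {}", event.name());
        event.handle();
    }
}

The same shape recurs in the other three cases: the shared trait is the stable contract, and each concrete type stays self-contained.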

3. Command Patterns

Building a command-line tool with subcommands? Each subcommand should be its own type, not a variant in a central enum.

4. Middleware Systems

Web frameworks often choose the “data-open” side: each middleware is its own type implementing a common trait.

The Cost of Getting It Wrong

We’ve seen both sides of this trade-off in practice:

When the enum approach breaks down: new variants arrive constantly, so every addition touches the central enum, the apply match, and the strategy generator, and unrelated changes keep colliding in the same few files.

When the trait approach breaks down: the set of operations itself keeps changing, so adding a method to the shared trait forces every existing implementation to be updated at once.

For madhouse-rs, the trade-off was clear: we needed to add new test operations constantly, but the core operations (check, apply, label, build) were stable. The “data-open” choice was correct.

Performance Considerations

One concern with trait-based approaches is performance. CommandWrapper uses Arc<dyn Command<S, C>>, which involves heap allocation and dynamic dispatch. In our testing scenarios, this overhead was negligible compared to the actual blockchain operations being tested.
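
For readers curious what that wrapper looks like, here is a hedged sketch of the shape described above; it assumes build is kept off the trait object (bounded with where Self: Sized) so that Command<S, C> stays object-safe, and the real madhouse-rs definition may carry extra bounds or fields.

use std::sync::Arc;

// Hedged sketch: type-erased wrapper around any concrete command.
// The 'static bounds keep the trait object well-formed without extra lifetimes.
struct CommandWrapper<S: 'static, C: 'static> {
    command: Arc<dyn Command<S, C>>,
}

impl<S: 'static, C: 'static> CommandWrapper<S, C> {
    fn new<T: Command<S, C> + 'static>(command: T) -> Self {
        // One heap allocation per generated command.
        Self {
            command: Arc::new(command),
        }
    }
}

impl<S: 'static, C: 'static> Clone for CommandWrapper<S, C> {
    fn clone(&self) -> Self {
        // Cloning bumps a reference count; it does not deep-copy the command.
        Self {
            command: Arc::clone(&self.command),
        }
    }
}

impl<S: 'static, C: 'static> std::fmt::Debug for CommandWrapper<S, C> {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        // proptest requires generated values to be Debug; reusing the label
        // is one convenient way to satisfy that.
        write!(f, "{}", self.command.label())
    }
}

Each check, apply, or label call then goes through a vtable; in practice that cost disappears next to the blockchain operations under test.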

The Full Circle

We began with a simple question: how do you design systems that are easy to extend? The expression problem provided the theoretical framework, but the real learning came from building systems that needed to scale.

The Stacks blockchain bug that started this journey taught us that complexity is the enemy of correctness. Traditional testing assumes you can predict where bugs hide. Model-based testing with madhouse-rs assumes you can’t—so it generates the chaos systematically.

The trait-based design made this scalable. Instead of a monolithic test harness that becomes unmaintainable, we have an ecosystem of autonomous commands that compose naturally.

Practical Takeaways

  1. Choose your trade-off consciously: The expression problem forces a choice. Understanding the trade-off helps you pick the right tool.

  2. Favor autonomy at scale: When systems grow large, autonomous components (traits) usually scale better than centralized ones (enums).

  3. Let chaos find the bugs: For complex systems, generated test scenarios often find bugs that manual tests miss.

  4. Design for shrinking: When random tests fail, automatic reduction to minimal cases is invaluable.

  5. Start simple, then scale: Both approaches work for small systems. The difference emerges at scale.

The expression problem isn’t academic theory—it’s a practical design constraint that affects every system you build. Understanding it helps you make better architectural choices, whether you’re building testing frameworks, plugin systems, or distributed applications.

In the end, good design isn’t about avoiding trade-offs. It’s about making them consciously, understanding their implications, and choosing the ones that align with how your system needs to grow.

References and Further Reading

The ideas in this series draw from decades of research and practice:

  1. Philip Wadler’s original expression problem
  2. QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs
  3. Experiences with QuickCheck: Testing the Hard Stuff and Staying Sane
  4. eqc_statem documentation
  5. Clarity Model-Based Testing Primer
  6. Hedgehog State.hs - Haskell stateful testing implementation
  7. Hedgehog References.hs - Practical stateful testing example
  8. PoX-4 Commands TypeScript - Original disjointed command implementation
  9. The original GitHub comment that sparked madhouse-rs
  10. The pull request where both traditional and madhouse-rs approaches reproduced the production bug

Series Complete: Model-Based Stateful Testing with madhouse-rs.