This post is part of the Model-Based Stateful Testing with madhouse-rs series.
We started this series with a production bug that couldn’t be reproduced. We end with a framework that not only catches that bug but fundamentally changes how we think about testing complex systems. The journey reveals practical lessons about the expression problem that extend far beyond testing.
Through trial and error, madhouse-rs converged on a simple but powerful architecture, as described in the whitepaper commit:
Each `Command` follows a predictable lifecycle:

- `Strategy` - generates the command
- `check` - validates preconditions against the current state
- `apply` - executes, mutating both model and real system

The contrast with proptest-state-machine is instructive. Consider how each approach handles a new test operation:
Enum approach (proptest-state-machine):
```rust
// 1. Add to the central enum (affects everyone).
enum SystemTransition {
    ExistingOp1,
    ExistingOp2,
    NewOperation(NewOpData), // <- New variant.
}

// 2. Update the central apply function (affects everyone).
fn apply(state: State, transition: SystemTransition) -> State {
    match transition {
        SystemTransition::ExistingOp1 => { /* existing logic */ }
        SystemTransition::ExistingOp2 => { /* existing logic */ }
        SystemTransition::NewOperation(data) => { // <- New arm.
            // New logic scattered across this central function.
        }
    }
}

// 3. Update the transitions function (affects everyone).
fn transitions() -> BoxedStrategy<SystemTransition> {
    prop_oneof![
        existing_strategy_1(),
        existing_strategy_2(),
        new_operation_strategy(), // <- New generator.
    ].boxed()
}
```
Trait approach (madhouse-rs):
```rust
// Self-contained - zero impact on existing code.
struct NewOperationCommand {
    data: NewOpData,
}

impl Command<SystemState, SystemContext> for NewOperationCommand {
    fn check(&self, state: &SystemState) -> bool {
        // Preconditions logic here.
    }

    fn apply(&self, state: &mut SystemState) {
        // Application logic here.
    }

    fn label(&self) -> String {
        format!("NEW_OPERATION({:?})", self.data)
    }

    fn build(ctx: Arc<SystemContext>) -> impl Strategy<Value = CommandWrapper<SystemState, SystemContext>> {
        // Generation strategy here.
        new_operation_strategy()
            .prop_map(|data| CommandWrapper::new(NewOperationCommand { data }))
    }
}
```
The difference is profound: trait-based commands are autonomous. All logic—generation, preconditions, application, and labeling—lives in one place. No coordination required.
Before madhouse-rs, we applied these principles with Radu Bahmata to test the Proof-of-Transfer (PoX-4) consensus using TypeScript and fast-check. The harness grew to include 20+ command types, each testing different aspects of the staking protocol:
- `StackStxCommand` - Lock STX tokens for stacking
- `DelegateStxCommand` - Delegate stacking rights to a pool
- `StackAggregationCommitCommand` - Commit aggregated stacking transactions
- `RevokeDelegateStxCommand` - Revoke previously delegated stacking rights
- `StackExtendCommand` - Extend an existing stacking commitment
- `GetStackerInfoCommand` - Query stacker information and verify state

The key insight: each command class was self-contained. A developer could add `StackExtendCommand` without understanding the internals of `DelegateStxCommand`. The framework composed them automatically.
When a test failed after 200+ operations, the shrinking algorithm would reduce it to something like:
```
Original sequence: [200+ operations]
Shrunk to: [
  DelegateStx(account, pool),
  StackAggregationCommit(pool, account),
  RevokeDelegateStx(account),
  StackAggregationCommit(pool, account)
]
```
This four-step sequence revealed a subtle bug: revoking delegation didn’t properly invalidate pending aggregation commits. Finding this manually would have taken weeks.
The expression problem appears everywhere in software design, not just testing frameworks:
Want users to extend your system with new functionality? Choose the “data-open” side—make plugins implement traits rather than forcing them to modify central enums.
Need to handle dozens of event types? Each event type should be its own struct implementing an Event trait, not variants in a central enum.
Building a command-line tool with subcommands? Each subcommand should be its own type, not a variant in a central enum.
Web frameworks often choose the “data-open” side: each middleware is its own type implementing a common trait.
We’ve seen both sides of this trade-off in practice:
When the enum approach breaks down: the set of variants keeps growing, so every addition forces edits to central match arms, and unrelated contributors contend for the same files.

When the trait approach breaks down: you need a new operation across all existing types, so every implementor must be revisited, and cross-cutting logic is harder to express without a closed set of variants.
For madhouse-rs, the trade-off was clear: we needed to add new test operations constantly, but the core operations (`check`, `apply`, `label`, `build`) were stable. The “data-open” choice was correct.
One concern with trait-based approaches is performance. `CommandWrapper` uses `Arc<dyn Command<S, C>>`, which involves heap allocation and dynamic dispatch. In our testing scenarios, this overhead was negligible compared to the actual blockchain operations being tested.
We began with a simple question: how do you design systems that are easy to extend? The expression problem provided the theoretical framework, but the real learning came from building systems that needed to scale.
The Stacks blockchain bug that started this journey taught us that complexity is the enemy of correctness. Traditional testing assumes you can predict where bugs hide. Model-based testing with madhouse-rs assumes you can’t—so it generates the chaos systematically.
The trait-based design made this scalable. Instead of a monolithic test harness that becomes unmaintainable, we have an ecosystem of autonomous commands that compose naturally.
- Choose your trade-off consciously: The expression problem forces a choice. Understanding the trade-off helps you pick the right tool.
- Favor autonomy at scale: When systems grow large, autonomous components (traits) usually scale better than centralized ones (enums).
- Let chaos find the bugs: For complex systems, generated test scenarios often find bugs that manual tests miss.
- Design for shrinking: When random tests fail, automatic reduction to minimal cases is invaluable.
- Start simple, then scale: Both approaches work for small systems. The difference emerges at scale.
The expression problem isn’t academic theory—it’s a practical design constraint that affects every system you build. Understanding it helps you make better architectural choices, whether you’re building testing frameworks, plugin systems, or distributed applications.
In the end, good design isn’t about avoiding trade-offs. It’s about making them consciously, understanding their implications, and choosing the ones that align with how your system needs to grow.
The ideas in this series draw from decades of research and practice.
Series Complete: Model-Based Stateful Testing with madhouse-rs.