Firaz Zakariya

Most experimentation tooling is built for companies with millions of daily active users. If you have hundreds, the standard advice (“just wait for statistical significance”) means waiting months, or running tests so underpowered that you’re mostly measuring noise. microexp is a Python package built for that situation.

Problem

Running experiments on Cawosh meant working with small weekly traffic. Classical fixed-horizon tests required sample sizes I couldn’t reach before conditions changed. I needed methods that let me peek at results without inflating the false positive rate, and that extract more signal from the data I had.

Approach

Sequential testing (mSPRT). The mixture Sequential Probability Ratio Test lets you check results continuously and stop as soon as the evidence against the null is strong enough, whether the effect is positive or negative. Unlike naive peeking, the Type I error stays controlled at the chosen level no matter how often you look.
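To make the stopping rule concrete, here is a minimal sketch of an always-valid p-value for a continuous metric. It assumes observations arrive as paired differences (variant minus control) that are normal with a known variance var_diff, and it mixes over alternatives with a normal prior of variance tau2. The function names and both parameters are illustrative assumptions, not microexp’s API.

```python
import math


def msprt_log_lr(n, mean_diff, var_diff, tau2):
    """Log mixture likelihood ratio for H0: true mean difference = 0.

    Assumes each paired difference is normal with known variance var_diff,
    and mixes over alternatives with a N(0, tau2) prior on the effect.
    """
    denom = var_diff + n * tau2
    return 0.5 * math.log(var_diff / denom) + (
        n * n * tau2 * mean_diff ** 2
    ) / (2.0 * var_diff * denom)


def always_valid_p_values(diffs, var_diff, tau2):
    """Return the running always-valid p-value after each observation."""
    p_value = 1.0
    running_sum = 0.0
    p_values = []
    for n, diff in enumerate(diffs, start=1):
        running_sum += diff
        log_lr = msprt_log_lr(n, running_sum / n, var_diff, tau2)
        # The p-value is the smallest 1 / LR seen so far, capped at 1,
        # so it can only go down as evidence accumulates.
        p_value = min(p_value, math.exp(-log_lr))
        p_values.append(p_value)
    return p_values
```

Under these assumptions, stopping the first time the running p-value falls below the chosen significance level is what keeps the false positive rate bounded, however many times the results are checked.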

Bayesian A/B testing. Models each variant’s conversion rate with a Beta distribution and updates it as data arrives. The output is the probability that variant B beats control rather than a p-value, which is easier to act on when traffic is small.
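As an illustration of the Beta update, the sketch below estimates the probability that B beats control by sampling from the two posteriors. The uniform Beta(1, 1) prior, the function name, and the Monte Carlo approach are assumptions made for this example; the package may compute the quantity differently.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b,
                   prior_alpha=1.0, prior_beta=1.0,
                   draws=20_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta posteriors."""
    rng = random.Random(seed)
    # A Beta prior with binomial data gives a Beta posterior:
    # alpha gains the conversions, beta gains the non-conversions.
    post_a = (prior_alpha + conv_a, prior_beta + n_a - conv_a)
    post_b = (prior_alpha + conv_b, prior_beta + n_b - conv_b)
    wins = 0
    for _ in range(draws):
        if rng.betavariate(*post_b) > rng.betavariate(*post_a):
            wins += 1
    return wins / draws


# Hypothetical counts: 12 of 150 conversions on control, 21 of 160 on B.
p_b_better = prob_b_beats_a(12, 150, 21, 160)
```

With small samples the prior matters more than usual, so it is worth checking how much the answer moves under a different prior before acting on it.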

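CUPED variance reduction. Uses pre-experiment covariates (e.g. prior week’s behaviour) to reduce outcome variance, tightening confidence intervals without collecting more data. The variance shrinks by the squared correlation between covariate and outcome, so the gain depends on how predictive the covariate is. It is often equivalent to running the experiment with 20 to 40% more users, which corresponds to a correlation of roughly 0.4 to 0.55.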
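The adjustment itself is short. The sketch below subtracts the part of each outcome that the covariate predicts, using theta = Cov(Y, X) / Var(X). It assumes one pre-experiment covariate per user and that theta is estimated on data pooled across both arms; the function name is hypothetical.

```python
import statistics


def cuped_adjust(outcomes, covariates):
    """Return CUPED-adjusted outcomes: y - theta * (x - mean(x))."""
    mean_x = statistics.fmean(covariates)
    mean_y = statistics.fmean(outcomes)
    # Sample covariance between the covariate and the outcome.
    cov_xy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(covariates, outcomes)
    ) / (len(outcomes) - 1)
    theta = cov_xy / statistics.variance(covariates)
    # Centering the covariate keeps the mean of the outcome unchanged,
    # so only the variance is affected.
    return [y - theta * (x - mean_x) for x, y in zip(covariates, outcomes)]
```

The adjusted values then go into the same comparison as the raw outcomes would have. Because the covariate is measured before assignment, it cannot be affected by the treatment, which is what keeps the adjusted estimate of the effect unbiased.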

Package design. The target API is a small Test class: you pass a metric type and significance level at construction, stream observations via .update(), and read a decision summary from .result(). The goal is notebook-first usage with a thin CLI on top, not a heavyweight experimentation platform.
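One way that interface could look is sketched below. This is a hypothetical stand-in, not the shipped class: the paired update(control, variant) signature, the Result fields, and the var_diff and tau2 parameters are all assumptions, and only continuous metrics are handled. The mSPRT update is inlined so the example stands alone.

```python
import math
from dataclasses import dataclass


@dataclass
class Result:
    n_pairs: int
    mean_diff: float
    p_value: float
    decision: str  # "reject_null" or "continue"


class Test:
    """Hypothetical stand-in for the target API; not the shipped class."""

    def __init__(self, metric, alpha=0.05, var_diff=1.0, tau2=1.0):
        if metric != "continuous":
            raise ValueError("this sketch only covers continuous metrics")
        self.alpha = alpha
        self.var_diff = var_diff  # assumed known variance of a paired difference
        self.tau2 = tau2          # assumed variance of the normal mixing prior
        self._n = 0
        self._sum = 0.0
        self._p_value = 1.0

    def update(self, control, variant):
        """Add one paired observation and refresh the always-valid p-value."""
        self._n += 1
        self._sum += variant - control
        mean_diff = self._sum / self._n
        denom = self.var_diff + self._n * self.tau2
        log_lr = 0.5 * math.log(self.var_diff / denom) + (
            self._n ** 2 * self.tau2 * mean_diff ** 2
        ) / (2.0 * self.var_diff * denom)
        self._p_value = min(self._p_value, math.exp(-log_lr))

    def result(self):
        """Summarise the evidence so far without consuming any data."""
        mean_diff = self._sum / self._n if self._n else 0.0
        decision = "reject_null" if self._p_value <= self.alpha else "continue"
        return Result(self._n, mean_diff, self._p_value, decision)


# Notebook-style usage with made-up observations.
test = Test(metric="continuous", alpha=0.05)
for control, variant in [(1.0, 3.1), (0.8, 2.9), (1.2, 3.0)]:
    test.update(control, variant)
summary = test.result()
```

Keeping all state in a few running totals means .result() can be called after every update at no extra cost, which suits the peek-as-you-go workflow.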

Results

The package is under active development. The current focus is repository scaffolding, CI (lint, typecheck, test), and the mSPRT path for continuous metrics. Unit tests will assert Type I error control under the null and reasonable power under a known effect size. Before running anything on live Cawosh traffic, I will validate behaviour in simulation against a naive fixed-horizon baseline.
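A simulation harness for that kind of check might look like the sketch below. It assumes unit-variance paired differences under the null, and every name in it is hypothetical. It only exercises the fixed-horizon z-test, once as designed and once with naive peeking; a sequential rule could be passed in as another rejects callable.

```python
import math
import random
from statistics import NormalDist

CRITICAL_Z = NormalDist().inv_cdf(0.975)  # two-sided test at the 5% level


def fixed_horizon(diffs):
    """Look once, at the planned sample size."""
    z = sum(diffs) / math.sqrt(len(diffs))
    return abs(z) > CRITICAL_Z


def naive_peeking(diffs):
    """Re-run the same z-test after every observation; stop at the first rejection."""
    running_sum = 0.0
    for n, diff in enumerate(diffs, start=1):
        running_sum += diff
        if abs(running_sum / math.sqrt(n)) > CRITICAL_Z:
            return True
    return False


def false_positive_rate(rejects, n_sims=2000, horizon=100, seed=0):
    """Share of null simulations (true effect = 0) in which `rejects` fires."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        diffs = [rng.gauss(0.0, 1.0) for _ in range(horizon)]
        hits += rejects(diffs)
    return hits / n_sims


fixed_rate = false_positive_rate(fixed_horizon)
peeking_rate = false_positive_rate(naive_peeking)
```

Using the same seed for both rules means they see identical simulated data, so any gap between the two rates comes from the stopping rule rather than from sampling noise.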

Code & package

Source code, documentation, and a PyPI release are planned for this sprint. The motivation is real: experiments I need to run on Cawosh, not a toy dataset. Until the package ships, see github.com/Firazak for other work.