Create Your Own Testing Framework
Introduction
Creating a testing framework for your services is easier than you might think. We’re not reinventing the wheel or writing everything from scratch; we’re borrowing smartly from existing tools to make our lives easier. In this guide, we’ll combine two fantastic tools, Locust and Docker, to craft a deterministic simulation testing framework that lets you test your Python services with precision and control.
Docker
Before we jump into the how-to, let’s get clear on what these tools are and what they bring to the table. Docker is a platform that packages your application and all its dependencies into a neat little container. This container ensures your service runs the same way everywhere, whether it’s your laptop or a production server, while letting you control resources like CPU and memory. In our setup, Docker runs our service, not Locust, giving us a sandbox to mimic real-world constraints.
Locust
Locust, on the other hand, is an open-source load testing tool where you define user behavior using Python code. It can simulate millions of users hammering your service, and it runs outside of Docker in this framework. Locust provides a slick web UI dashboard to monitor tests, a CLI to fire off requests, and detailed stats like response times and request rates, everything you need to see how your service holds up.
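To make that concrete, here’s a minimal sketch of what a Locust user class can look like. The SmokeTestUser name, the /health endpoint, and the wait times are placeholders, not part of the framework we build below.
from locust import HttpUser, between, task


class SmokeTestUser(HttpUser):
    # Wait 1 to 3 seconds between tasks, roughly like a real user.
    wait_time = between(1, 3)

    @task
    def ping_service(self) -> None:
        # "/health" is a placeholder; point this at any route your service exposes.
        self.client.get("/health")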
Fuzz testing
And then there’s fuzz testing. This is the art of throwing random or malformed inputs at your service to see if it breaks, catches fire, or just shrugs it off. It’s perfect for testing robustness and finding those sneaky edge cases that only show up when someone (or something) sends garbage data.
With those basics down, let’s dive in.
Why Use Locust and Docker?
Locust takes care of sending requests, defining how those requests behave, and serving up a dashboard with pretty charts to watch it all unfold. It’s our command center for testing. Docker, meanwhile, is the stage where our service performs. By running our service in a container, we can dictate exactly how much CPU or memory it gets, specify where it reads or writes data, and keep the environment consistent every time we test.

Testing on your local machine isn’t ideal, since local environments differ from production in many ways. Still, having a setup to test your software and get a rough idea of its capabilities isn’t a bad start. This framework helps you perform performance and fuzz testing, catch errors, and reproduce tests in a production-like environment with consistent parameters. You can spot regressions locally and monitor resource usage, then deploy and test in production by changing only the service URL.
Step 1: Setting Up Test Parameters
To make this framework flexible, we need to customize things like CPU limits, memory, worker counts, threads, or even the Docker image we’re using. And we want to adjust these easily, either through a web UI or the terminal.
Here’s how we set it up:
from locust import events


@events.init_command_line_parser.add_listener
def _(parser):
    add_args_to_parser(parser)
Nice, right? Almost nothing to add here. All the logic to parse the file and register the arguments will live in a second file.
from locust.argument_parser import LocustArgumentParser


def add_args_to_parser(parser: LocustArgumentParser) -> None:
    parser.add_argument(
        "--name",
        type=str,
        default=DOCKER_INFO["container_name"],
        help="A name for the reports and the container if you are using custom settings. A report's name would look "
        "like: <name>-<cpu>cpu-<memory><memory-unit>memory-<date>",
    )
    parser.add_argument(
        "--gunicorn-threads",
        type=int,
        default=DOCKER_INFO["environment"]["THREADS"],
        help="The number of threads that gunicorn can use. This will override the value of the .env file.",
    )
    parser.add_argument(
        "--gunicorn-workers",
        type=int,
        default=DOCKER_INFO["environment"]["WORKERS"],
        help="The number of workers that gunicorn can use. This will override the value of the .env file.",
    )
    parser.add_argument(
        "--service-port",
        type=str,
        default=DOCKER_INFO["environment"]["PORT"],
        help="The port for the service to use.",
    )
    parser.add_argument(
        "--cpu",
        type=int,
        default=DOCKER_INFO["deploy"]["resources"]["limits"]["cpus"],
        help="The number of CPUs available to the container. 1 CPU typically means 1 core with 2 threads.",
    )
    parser.add_argument(
        "--memory",
        type=int,
        default=DOCKER_INFO["deploy"]["resources"]["limits"]["memory"][0],
        help="The amount of memory available to the container.",
    )
    parser.add_argument(
        "--memory-unit",
        type=str,
        choices=("gb", "mb", "kb"),
        default=DOCKER_INFO["deploy"]["resources"]["limits"]["memory"][1:],
        help="The unit for the amount of memory.",
    )
    parser.add_argument(
        "--image",
        type=str,
        default=DOCKER_INFO["image"],
        help="The image for your docker container.",
    )
    parser.add_argument(
        "--docker-tag",
        type=str,
        default=DOCKER_INFO["environment"]["TAG"],
        help="The tag for the docker image.",
    )
For simplicity, we’ll lean on a docker-compose file to define resources and the image, pre-filling our fields with defaults we can tweak as needed.
import yaml


def get_dockerfile_information() -> dict:
    with open("docker-compose.yaml") as file:
        services = yaml.safe_load(file)["services"]
    service_name = next(filter(lambda x: x.startswith("test"), services))
    return services[service_name]


DOCKER_INFO = get_dockerfile_information()
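To make those lookups easier to follow, here’s roughly the shape DOCKER_INFO takes for a hypothetical test-service entry in docker-compose.yaml. The values are purely illustrative; only the keys matter:
# Illustrative only: the structure the parser defaults above expect to find.
DOCKER_INFO_EXAMPLE = {
    "container_name": "my-service-test",
    "image": "my-service",
    "environment": {"PORT": "8080", "WORKERS": 2, "THREADS": 4, "TAG": "latest"},
    "deploy": {"resources": {"limits": {"cpus": 2, "memory": "4gb"}}},
    "volumes": ["./data:/app/data"],
}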
Step 2: Running the Service and Collecting Stats
With parameters in place, it’s time to put them to work. One approach is to fire off Docker commands through a bash script or subprocess.
But there’s a smoother path: using the Docker SDK for Python to talk directly to the Docker daemon, skipping the CLI entirely. How do we launch the container and track its stats without everything grinding to a halt? We borrow a trick from Locust and use greenlets for non-blocking execution. It adds a dependency, sure, but it’s one Locust already uses, so no harm done.
import csv
import logging
import time
from pathlib import Path

import docker
import gevent
from docker.models.containers import Container
from locust import env

logger = logging.getLogger(__name__)


class ServerProvider:
    __slots__ = (
        "locenv",
        "name",
        "results_dir",
        "docker_client",
        "container",
        "_background",
        "_docker_stats_csv_filehandle",
        "_docker_stats_csv_writer",
    )

    def __init__(
        self,
        locenv: env.Environment,
        results_dir: Path,
        name: str,
        memory: str,
    ) -> None:
        self.docker_client = docker.from_env()
        self.locenv = locenv
        self.name = name
        self.results_dir = results_dir
        image = self._clean_image(
            locenv.parsed_options.image,
            locenv.parsed_options.docker_tag,
        )
        self._start_docker(
            memory,
            int(locenv.parsed_options.cpu),
            name,
            image,
        )
        self._set_up_writers()
        self._background = gevent.spawn(self._update_stats)
        locenv.events.quit.add_listener(self._quit)
We then use our parameters to start the container:
    def _start_docker(self, memory: str, cpu: int, name: str, image: str) -> None:
        volumes = fix_volume_paths(DOCKER_INFO.get("volumes", []))
        try:
            nano_cpus, cpuset_cpus = self._get_cpu_values(cpu)
            self.container: Container = self.docker_client.containers.run(
                image,
                detach=True,
                environment={
                    "PORT": self.locenv.parsed_options.service_port,
                    "WORKERS": self.locenv.parsed_options.gunicorn_workers,
                    "THREADS": self.locenv.parsed_options.gunicorn_threads,
                },
                nano_cpus=nano_cpus,
                cpuset_cpus=cpuset_cpus,
                mem_limit=memory,
                mem_reservation=memory,
                mem_swappiness=0,
                memswap_limit=memory,
                name=name,
                ports={
                    f"{self.locenv.parsed_options.service_port}/tcp": (
                        "0.0.0.0",
                        self.locenv.parsed_options.service_port,
                    )
                },
                volumes=volumes,
            )
            logger.debug("Starting container %s from image %s", name, image)
            self.container.reload()
        except docker.errors.ImageNotFound as e:
            logger.warning("Image not found, we'll create it. %s", str(e))
            # Build (or pull) the missing image here before retrying,
            # otherwise this call would recurse forever.
            return self._start_docker(memory, cpu, name, image)
        except (docker.errors.ContainerError, docker.errors.APIError) as e:
            logger.error("Error running the docker container %s", str(e))
            exit(1)
Next, we need to keep an eye on our container’s stats as it runs:
    def _get_machine_stats(self, docker_stats: dict) -> MachineStat:
        try:
            return MachineStat(
                memory_usage=docker_stats["memory_stats"].get("usage"),
                memory_stats_cache=docker_stats["memory_stats"]
                .get("stats", {})
                .get("cache"),
                memory_limit=docker_stats["memory_stats"].get("limit"),
                cpu_usage=docker_stats["cpu_stats"]["cpu_usage"].get("total_usage"),
                precpu_usage=docker_stats["precpu_stats"]["cpu_usage"].get(
                    "total_usage"
                ),
                system_cpu_usage=docker_stats["cpu_stats"].get("system_cpu_usage"),
                system_precpu_usage=docker_stats["precpu_stats"].get(
                    "system_cpu_usage"
                ),
                online_cpus=docker_stats["cpu_stats"]["online_cpus"],
                number_cpus=len(
                    docker_stats["cpu_stats"]["cpu_usage"].get("percpu_usage", [])
                ),
                time=time.time(),
                user_count=self.locenv.runner.user_count,
            )
        except Exception as e:
            logger.warning(
                "Error getting the docker container stats %s",
                str(e),
            )
            return MachineStat()

    def _update_stats(self) -> None:
        last_flush_time: float = 0.0
        stats_source = self.container.stats(decode=True, stream=True)
        while (
            self.locenv.runner.state not in STATE_NOT_RUNNING
            and self.container.status == "running"
        ):
            stats = self._get_machine_stats(next(stats_source))
            self._docker_stats_csv_writer.writerow(stats)
            now = time.time()
            if now - last_flush_time > 15:
                self._docker_stats_csv_filehandle.flush()
                last_flush_time = now
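The class above leans on a few pieces that aren’t shown here: MachineStat, STATE_NOT_RUNNING, fix_volume_paths, and the private helpers. Below is a minimal sketch of what they might look like; treat it as my reading of the missing code, not the original implementation.
import csv
from pathlib import Path
from typing import NamedTuple, Optional

from locust.runners import STATE_CLEANUP, STATE_STOPPED, STATE_STOPPING

# Runner states in which we should stop polling Docker for stats.
STATE_NOT_RUNNING = (STATE_STOPPING, STATE_STOPPED, STATE_CLEANUP)


class MachineStat(NamedTuple):
    # Every field defaults to None so MachineStat() is a valid "empty" row.
    memory_usage: Optional[int] = None
    memory_stats_cache: Optional[int] = None
    memory_limit: Optional[int] = None
    cpu_usage: Optional[int] = None
    precpu_usage: Optional[int] = None
    system_cpu_usage: Optional[int] = None
    system_precpu_usage: Optional[int] = None
    online_cpus: Optional[int] = None
    number_cpus: Optional[int] = None
    time: Optional[float] = None
    user_count: Optional[int] = None


def fix_volume_paths(volumes: list[str]) -> list[str]:
    # Turn the relative host paths from docker-compose into absolute ones the SDK accepts.
    return [
        f"{Path(volume.split(':')[0]).resolve()}:{volume.split(':', 1)[1]}"
        for volume in volumes
    ]


class ServerProvider:  # sketch of the helpers; in practice they live on the class shown above
    def _clean_image(self, image: str, tag: str) -> str:
        # Combine image name and tag into the "name:tag" form Docker expects.
        return f"{image}:{tag}" if tag else image

    def _get_cpu_values(self, cpu: int) -> tuple[int, str]:
        # nano_cpus is expressed in billionths of a CPU; cpuset_cpus pins the container
        # to specific cores ("0", "0,1", ...), which keeps runs reproducible.
        return cpu * 1_000_000_000, ",".join(str(i) for i in range(cpu))

    def _set_up_writers(self) -> None:
        # One CSV file per run, named after the container, one MachineStat row per sample.
        stats_path = self.results_dir / f"{self.name}-docker-stats.csv"
        self._docker_stats_csv_filehandle = open(stats_path, "w", newline="")
        self._docker_stats_csv_writer = csv.writer(self._docker_stats_csv_filehandle)
        self._docker_stats_csv_writer.writerow(MachineStat._fields)

    def _quit(self, **kwargs) -> None:
        # On Locust's quit event: stop polling, flush the CSV, and clean up the container.
        self._background.kill(block=False)
        self._docker_stats_csv_filehandle.flush()
        self._docker_stats_csv_filehandle.close()
        self.container.stop()
        self.container.remove()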
And there you have it: our container, service, and client are all running with the settings we chose.
Where’s the Fuzzing?
Fuzz testing depends on the type of requests you want to simulate. Want to mimic production traffic? Easy. Most cloud providers offer logs with requested URLs. You can retrieve these URLs and use them to send requests to your local service.
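As a sketch, replaying those logged URLs from a Locust user could look like the following; the urls.txt file and its one-path-per-line format are assumptions about how you export the logs, not something the framework provides.
import random
from pathlib import Path

from locust import HttpUser, task

# Assumed export format: one request path per line, e.g. "/api/v1/items?limit=10".
LOGGED_URLS = Path("urls.txt").read_text().splitlines()


class ReplayUser(HttpUser):
    @task
    def replay_logged_request(self) -> None:
        # Pick a real production path at random and replay it against the service under test.
        self.client.get(random.choice(LOGGED_URLS))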
To take it further, your logs likely include timestamps. You could replay requests with the same time intervals as in production, simulating real-world traffic rates. Implementing this is left as an exercise for the reader (or maybe a future post). Check the Locust documentation for details.
Alternatively, you can fuzz your API by generating random requests with valid (or invalid) data. Python’s built-in libraries offer handy functions for creating random combinations:
import random

# SecurityValue is assumed to be an Enum of the identifiers the service accepts; it is defined elsewhere.


def all_security_values():
    # Cycle endlessly through every valid security value.
    while True:
        for e in SecurityValue:
            yield e.value


def all_weighting_methods(d: str):
    # Randomly include or omit the optional lower and upper bounds.
    lb = round(random.uniform(0.01, 1.0), 3) if random.choice((True, False)) else None
    ub = round(random.uniform(0.01, 1.0), 3) if random.choice((True, False)) else None
    return {"d": d, "lb": lb, "ub": ub}


def all_filters(d: str):
    if random.choice((True, False)):
        n = random.randint(1, 10)
        return {"n": n, "d": d}
    p = round(random.uniform(0.01, 100.0), 2)
    return {"p": p, "d": d}


def all_calendar_rules(dates: list[str]):
    if random.choice((True, False)):
        return {"dates": random.sample(dates, k=random.randint(1, len(dates)))}
    return {"initial_date": random.choice(dates)}


def generate_all_payload_combinations(dates: list[str]):
    for d in all_security_values():
        yield {
            "calendar_rule": all_calendar_rules(dates),
            "backtest_filter": all_filters(d),
            "weighting_method": all_weighting_methods(d),
        }
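Here’s a sketch of how these generators might feed a Locust user. The /backtest route and the hard-coded dates are placeholders for whatever your service actually exposes:
from locust import HttpUser, task

# Placeholder dates; in practice you would load real ones from your data.
DATES = ["2024-01-02", "2024-01-03", "2024-01-04"]


class FuzzUser(HttpUser):
    def on_start(self) -> None:
        # One infinite payload stream per simulated user.
        self.payloads = generate_all_payload_combinations(DATES)

    @task
    def fuzz_backtest(self) -> None:
        # "/backtest" is a placeholder endpoint; swap in the route you want to fuzz.
        self.client.post("/backtest", json=next(self.payloads))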
Wrapping Up
Now that you have everything in place, simply launch Locust from the folder that holds your locustfile, adding any of the custom flags we defined earlier; the values below are only an example:
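locust --name my-service-test --cpu 2 --memory 4 --memory-unit gb --gunicorn-workers 2 --gunicorn-threads 4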
Let the magic happen, and when you’ve had enough, stop the test and look at your service’s results.
That’s the gist of running a fuzzer or performance test with this setup! But don’t stop there: if your service leans on a database or another dependency, use the same Docker library trick to spin those up and test how everything plays together.
Want more? Hook this into Prometheus with Locust’s plugin for real-time monitoring and alerts. And since you can run it from the CLI in headless and autostart modes, it slots right into your CI/CD pipeline. Catch issues early, stress-test under load, and ship with confidence.
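For instance, a headless CI run might look something like this; the user count, spawn rate, run time, and host are placeholders, while the resource flags are the custom ones defined earlier:
locust --headless --users 50 --spawn-rate 5 --run-time 10m --host http://localhost:8080 --csv results/my-service --cpu 2 --memory 4 --memory-unit gb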
Try it out with your own service, drop your thoughts on X, or peek at my GitHub repo for the full code. Happy testing!