Inside Even

Testing fast and easy with Bazel

Run 4,200 tests on every change with this one simple command.
[Image: a terminal window with the command "bazel test"]

At Even, we run lots (~4,200) of tests every time an engineer makes a pull request. At first, it took a bunch of complicated support code and a lot of time to run all of those tests. But since I joined in October 2018, we have totally revamped the way we run tests. This post tells the story of how we made testing faster and easier on our backend services. If you're interested in learning more, or if you want to help us improve testing for our React Native mobile apps, we're hiring.

TL;DR

Testing at Even is as simple as:

bazel test //...

But that is not where we started, and that simple command hides a lot of complexity.

Where we started

Our backend code is largely made up of the following languages and components:

  1. Golang
  2. Python
  3. PostgreSQL
  4. AWS
  5. Docker & docker-compose

This is a simplified list of the most complex parts of our system.

We have multiple services that use gRPC for communication. To mimic our production system, we use docker-compose to stitch together containers and support services like PostgreSQL during local testing and development.

Our "unit tests" could be considered integration tests as most of them test code end-to-end (e.g. reading/writing to PostgreSQL). Furthermore, we rarely mock out any interaction with external services including AWS. When I started at Even, some of our tests relied on things like S3 buckets in our production account!

Developers (and our CI system) ran tests using a script that basically called docker-compose run <service>.test. That test service was a container defined in docker-compose.yml that existed for each component in our backend. Each service had to duplicate the same basic setup:

  1. Link in support services (databases, caches, blob stores, etc).
  2. Migrate the database.
  3. Run go test <packages> or tox (for Python).

This seems simple enough but came with a lot of problems and hacks:

  1. All services depended on the same single database. We had to carefully make sure the database names were different for each service being tested.
  2. Our Go tests were split into serial and parallel tests. Go supports marking individual tests as parallel, but go test also runs separate packages in parallel by default. We had to write our own test runner to execute packages containing serial tests sequentially.
  3. There was no way to run all tests in our repository. Most developers would only test the packages they were working on and would rely on CI to test everything else.
  4. Caching test results wasn't really an option, especially in CI.
  5. Adding new directories to the root of the repo required updating CI and our test runner. Otherwise those packages were never tested. Some core packages were never being tested, even in CI!

In addition to the testing issues listed above, we had no way to guarantee consistent build tools between developers and CI.

Introducing Bazel

Bazel is a fantastic build tool for monorepos. This post isn't really about why we chose bazel or what bazel can do. We wanted reproducible builds and tighter control over our tooling, so bazel was a good fit.

The first step towards bazel test //... was to get bazel build //... working. Luckily, there is a great tool called gazelle that integrates with bazel to produce BUILD.bazel files for Go projects. We heavily rely on this to update our BUILD.bazel files automatically.

Since we wanted tight control over tooling, bazel is actually a script in our repository that ensures that every developer and our CI system gets the exact same version of the "real" bazel. The script automatically installs the desired version and handles some other complex setup (more on that later).

bazel test //... almost worked out of the box. We quickly hit the serial/parallel issue mentioned above: bazel wants to run as many actions as possible in parallel, but we required packages with serial tests to run in isolation. My first approach was to run the parallel tests first and then all serial tests with -j 1, which tells bazel to run a single job at a time. The challenge with this approach is that bazel keys its test cache on each test's arguments and environment. If either changes, the cache is invalidated.

For example, suppose we have a test like so:


import (
  "os"
  "testing"
)

func TestComplex(t *testing.T) {
  // Complex() only runs when explicitly enabled via the environment.
  if os.Getenv("COMPLEX_ENABLED") == "true" {
    Complex()
  }
}

Running bazel test //... effectively skips this test (COMPLEX_ENABLED is empty, so the body does nothing). Running bazel test //... --test_env COMPLEX_ENABLED=true runs the test again since the environment changed. That is exactly what we want for the complex test, but it also invalidates the cache for the first run: a subsequent bazel test //... re-runs every test even though no code changed. That means a script like the following:


bazel test --parallel //...
bazel test --serial //...

...will always re-run the tests, even when nothing has changed. So I decided to remove the package isolation requirement instead.

PostgreSQL templates

In order to have bazel test everything in parallel, we needed to remove the shared resource of PostgreSQL. Our serial tests were marked that way because they a) truncated a table or b) relied on a pre-existing database state. But since we couldn't give every service a dedicated database container, we decided to use PostgreSQL templates to isolate each test within the same database.

CREATE DATABASE foo WITH TEMPLATE bar is a PostgreSQL command that will create a database named foo that is identical to the database bar. Our migration step runs all migrations on our core databases and then makes a copy into a database named <database>_template. Each package that interacts with the database automatically runs CREATE DATABASE source_my_package WITH TEMPLATE source_template and reconnects to the new database. This isolates the package to a copy of the database it needs so that it does not conflict with other tests.
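The per-package copy step can be sketched as a small Go helper. The <database>_template naming convention comes straight from the description above; the function and example names here are hypothetical:

```go
package main

import "fmt"

// templateCopySQL builds the statement that clones the migrated
// <base>_template database into a fresh copy dedicated to one test package.
func templateCopySQL(base, pkg string) string {
	return fmt.Sprintf("CREATE DATABASE %s_%s WITH TEMPLATE %s_template", base, pkg, base)
}

func main() {
	// A package named "my_package" testing against the "source" database
	// gets its own identical copy to read and write freely.
	fmt.Println(templateCopySQL("source", "my_package"))
	// CREATE DATABASE source_my_package WITH TEMPLATE source_template
}
```

After running the statement, the test simply reconnects with dbname=source_my_package and proceeds in full isolation.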

Our tests interact with other services that need to talk to the database as well. Our gRPC layer will forward the new database name to remote services (running under docker) so the remote service will use the same copy.

Test setup

bazel is a powerful tool. But that power comes at the cost of complexity, especially for the engineers on our team who had never used bazel before. That’s why we integrated it into the test scripts our developers already knew.

In the old system, all developers needed to do was run a single command, run-tests foo, that took care of docker-compose, database migrations, and testing the targeted package. bazel is not a tool for managing long-running applications like docker. I mentioned earlier in the post that our bazel is a script that wraps the real bazel binary. This script does a lot to ease the transition for developers to a new command:

  1. docker-compose setup
  2. Database migrations
  3. Setting the environment for bazel via --test_env arguments
  4. Running gazelle
  5. Convenience flags
  6. Go-style package matching (e.g. ./... -> //...)
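As an illustration of item 6, the pattern translation can be as simple as a prefix rewrite. The exact mappings our script supports may differ; this is a minimal sketch:

```go
package main

import (
	"fmt"
	"strings"
)

// toBazelTarget translates Go-style package patterns into bazel labels,
// e.g. ./... -> //... and ./foo/... -> //foo/....
// Anything already in bazel form passes through unchanged.
func toBazelTarget(pattern string) string {
	switch {
	case pattern == "./...":
		return "//..."
	case strings.HasPrefix(pattern, "./"):
		return "//" + strings.TrimPrefix(pattern, "./")
	default:
		return pattern
	}
}

func main() {
	for _, p := range []string{"./...", "./payroll/...", "//already/bazel"} {
		fmt.Println(p, "->", toBazelTarget(p))
	}
}
```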

When a developer runs bazel test //..., they will first see a run of gazelle to automatically update BUILD.bazel files. Next, bazel will run a dedicated docker-compose service that ensures all dependencies are up and available. Then, bazel runs all migrations. Finally, bazel tests the desired packages.

We added flags to bazel to control all of the behaviors above, which lets developers iterate much faster. We also added a bazel watch command, which runs ibazel under the hood to automatically re-test the desired packages as changes are made. Using bazel watch feels very similar to automatic build+test in an IDE.

Python

I mentioned Python earlier as one of our development languages. bazel's Python support is not as advanced as other languages but we still wanted bazel test //... to work with our Python code.

sh_test is a bazel rule that runs a shell script as a test target. We do not test our Python code directly with bazel because managing those dependencies within bazel is incredibly complex and requires native libraries which are difficult to manage across developer machines and CI. Instead, we continue to use the docker container for running tox. Our sh_test includes all dependencies (Dockerfile, docker-compose.yml, etc) and Python sources as the data argument, so bazel only needs to run these tests when one of those files changes. The script invokes docker-compose to run tox within the container. The nice part about this is that we have tightened up the dependency set so that these tests are easy to cache.
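A minimal version of such a target might look like the following BUILD.bazel rule; the file names and the wrapper script are hypothetical:

```python
sh_test(
    name = "python_tests",
    srcs = ["run_tox_tests.sh"],  # invokes docker-compose to run tox in the container
    data = [
        "Dockerfile",
        "docker-compose.yml",
    ] + glob(["**/*.py"]),
)
```

Because bazel hashes everything listed in srcs and data, this test is re-run only when the Python sources or the container definition change; otherwise the cached result is reused.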

Localstack

The last big component of our testing infrastructure is localstack. We use localstack to mock out AWS services required by tests. Localstack does a great job of duplicating the behavior of most AWS services. This allows us to write tests that utilize the same code path as production — we just point the test to our localstack instance.
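Wiring localstack in is mostly configuration. A sketch of a docker-compose service is below; the image name is real, but the ports and enabled services are assumptions for illustration:

```yaml
localstack:
  image: localstack/localstack
  environment:
    # Only start the AWS services the tests actually touch.
    - SERVICES=s3,sqs
  ports:
    - "4566:4566"  # localstack's single edge port
```

Tests then construct their AWS clients with the endpoint overridden to point at this container instead of real AWS.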

The future

At this point we are heavily invested in bazel. We use it for linting, docker build and push, code generation, and infrastructure. We have made incredible progress but we have a lot more to do. We still have several docker images that exist outside of bazel and our Python code is not properly integrated. Our client codebase has just gotten started with bazel and has an entirely unique set of challenges. If you want to help us do more with Bazel across our entire codebase, we’re hiring.


About the author

Patrick is a Staff Software Engineer working remotely from North Carolina. He’s always looking for ways to make the right thing also be the easy thing, especially when it comes to the tools and internals in our codebase.
