Configurator

Code is read more often than it is written. - Guido van Rossum

Here’s an alternate wording for that quote: code is about communication; communicating instructions to machines, and concepts to developers.

Given this reality, I find most developers underestimate the value of clean code. Clean code is a blessing because readers can focus on the task at hand. By contrast, messy code is full of distractions; readers waste mental clock cycles thinking about latent bugs or inefficiencies.

The ability to draft clean code is not necessarily a product of experience or of investing a bunch of time manually reviewing code diffs; it comes with the right developer toolchain.

Origins

While working on the biotech startup Synthego’s innovation team, I often worked in a green field without code review. Paired with the reality that immunotherapy research involves multi-week experiments and costly reagents, preventable software failures (e.g. TypeErrors) were excruciating.

I became interested in automated quality assurance, both to prevent bugs and to receive feedback on my programs. Modern conveniences such as Python typing, pyproject.toml-centralized configs, and GitHub Actions didn’t yet exist, though flake8 plugins were aplenty. I began building a toolchain just for myself, discovering where the tools helped, missed, and hindered.
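To make the "preventable TypeError" idea concrete, here is a minimal sketch using today's Python type hints. The function names and the reagent-volume scenario are hypothetical illustrations, not code from Synthego. A static type checker such as mypy would flag the mismatched argument before anything runs, whereas without one the failure only appears at runtime.

```python
def total_volume_ul(volumes_ul: list[float]) -> float:
    """Sum per-well reagent volumes, in microliters (hypothetical helper)."""
    return sum(volumes_ul)


def run_experiment(raw_volumes: list[str]) -> float:
    """Hypothetical entry point that forgets to parse its inputs."""
    # Bug: the values are still strings (e.g. read from a CSV file).
    # A static type checker reports that list[str] is not list[float] here;
    # at runtime, sum() instead raises TypeError on the first element.
    return total_volume_ul(raw_volumes)
```

Calling `run_experiment(["1.5", "2.0"])` raises a `TypeError` at runtime, which is exactly the class of failure that is cheap to catch in an editor or CI job and expensive to catch midway through an experiment.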

Adoption

Over the years, I gradually evolved the toolchain: growing it for new discoveries like codespell or black preview rules, disabling aspects I found unhelpful, and contracting it alongside consolidations into ruff. Like a plumber’s toolbox, I repeatedly ported it to my current setting: production software at Synthego, group projects in Stanford computer science, and open-source software facing easily prevented bugs.

I started receiving gratitude from colleagues who had not experienced such a toolchain before, and this began to give me ideas. Leaving Synthego for FutureHouse in early February 2024, I created the GitHub repo configurator with a two-part vision:

  1. Creating a single toolchain fluent for both early-stage exploratory research and production software.
  2. Building an automated yet flexible system to propagate tooling updates across repos.

Fast-forward to today, and item 1 has come true. FutureHouse has adopted the configuration system org-wide. It’s been used for research projects such as PaperQA2, aviary, and ether0, and adopted universally by our platform engineering team.

Item 2 has some weak competition from tools like cookiecutter or nitpick, but really the ultimate solution will be an AI agent. Maybe someday there will be time to properly tackle item 2.

Source Code

James Braza
Artificial Intelligence Research