Open Sourcing pbcc: A Faster, Leaner Protobuf Compiler for Python
Our infrastructure team has developed a custom Protocol Buffers implementation for Python, named pbcc, to replace the standard library in our high-performance workloads. pbcc is a streamlined compiler that generates specialized C++ code for Protobuf messages, enabling us to handle massive datasets in memory with significantly reduced overhead and a much cleaner Python API.
In this article, we'll first provide background on why the standard tools didn't fit our needs. We'll then describe the design of pbcc, focusing on how we achieved speed and simplicity. Finally, we'll outline the key takeaways from building a core infrastructure component from scratch.
Let's get started!
▶ Background
At its core, Protocol Buffers is a way to serialize and deserialize structured data. To use it, you would first write the structures in a .proto file using Protobuf’s definition language, compile that using Google’s protobuf compiler, then use the resulting Python library to serialize and deserialize instances of the structures (“messages”) defined in the .proto file. Abstracting away serialization in this manner makes it easy to safely interact with services running on other machines, or to store structured data in long-term storage such as Google Cloud Storage.
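For a flavor of what the compiler abstracts away, here is a minimal sketch of base-128 varint encoding, the integer scheme the Protobuf wire format is built on (an illustration of the format itself, not pbcc's implementation):

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F          # low 7 bits of the value
        n >>= 7
        if n:
            out.append(byte | 0x80)  # set continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        result |= (byte & 0x7F) << shift
        pos += 1
        if not byte & 0x80:      # continuation bit clear: last byte
            return result, pos
        shift += 7

# The canonical example from the Protobuf docs: 300 encodes as 0xAC 0x02.
assert encode_varint(300) == b"\xac\x02"
assert decode_varint(b"\xac\x02") == (300, 2)
```

Every field in a serialized message is prefixed by a varint-encoded tag, so even this tiny primitive hints at how much bookkeeping a compiler can hide behind a clean API.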
As our data scale grew, we began hitting limits with the standard Python implementation. We found that for very large messages, the standard library was often too slow and memory-hungry, and frequently failed entirely on input messages larger than 2GB. Furthermore, the Python interface provided by the standard library did not satisfy standard type-checking utilities like mypy, and was cumbersome to use in general. For example, Google's implementation wraps repeated fields in custom container types rather than native Python lists, so equality checks against ordinary lists fail and engineers end up casting data constantly just to compare values.
We needed a solution that was:
• Fast: Capable of serializing and deserializing large payloads with minimal latency.
• Memory Efficient: Able to handle very large messages in memory without crashing the process.
• Pythonic: Exposing a clean API that uses native types (like list and dict) and supports modern IDE type hinting.
▶ The Architecture of pbcc
To solve this, we decided to build pbcc. Unlike the standard library, which relies on interpreting message descriptors at runtime, pbcc takes an ahead-of-time (AOT) compilation approach.
The system is driven by a Python script named compile.py, which reads the message descriptors from existing modules compiled by Google’s Protobuf compiler. It then uses a highly optimized C++ template (pymodule.in.cc) to generate specific C++ source code for each message type. This generated C++ code is then compiled into an extension module (.so file) that Python can import directly. This module provides the message types as classes which behave similarly to Python’s built-in dataclasses, but also have serialization and deserialization functions.
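The code-generation step can be pictured as template substitution driven by the message descriptors. Below is a heavily simplified sketch; the placeholder names and field layout are hypothetical stand-ins, not the actual format of pymodule.in.cc:

```python
from string import Template

# Hypothetical, drastically simplified stand-in for pymodule.in.cc.
CPP_TEMPLATE = Template("""\
// Generated for message $msg_name -- do not edit by hand.
struct $msg_name {
$fields
};
""")

def render_message(msg_name: str, fields: list[tuple[str, str]]) -> str:
    """Fill the C++ template for one message type (illustrative only)."""
    field_lines = "\n".join(f"    {ctype} {name};" for name, ctype in fields)
    return CPP_TEMPLATE.substitute(msg_name=msg_name, fields=field_lines)

print(render_message("Point", [("x", "int64_t"), ("y", "int64_t")]))
```

In the real pipeline, the rendered C++ is then fed to a compiler to produce the importable .so module; the sketch only illustrates the descriptor-to-source step.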
This architecture allows us to encapsulate the complexity of Protobufs entirely in the extension module, so we can just treat pbcc objects like normal Python objects. Next, we’ll describe some implementation details that allowed us to optimize performance while maintaining safety in the extension module.
▶ Key Design Decisions
Correctness was the first priority when building pbcc, followed by ease of use, and finally speed.
For correctness:
• We wrote an extensive test suite which exercises every field type with every kind of modifier (optional, repeated, oneof, and map). The test suite also verifies that deserializing data which doesn’t match the message definition results in raising an exception.
• We maintain the unknown field behavior from the Protobuf specification, so re-serializing a message will always result in semantically equivalent data, even if the input contained fields that the module didn't know about. The caller can explicitly delete these unknown fields on a pbcc object to save memory.
• For Python objects, we handle memory management via a PyObjectRef wrapper class, which acts like a std::shared_ptr that holds a reference to a Python object. This ensures we don't leak memory even when processing massive, deeply nested messages.
• We use native (64-bit) integer types everywhere, which means there are no artificial size limits in pbcc. We suspect that the use of signed 32-bit values for message sizes is why Google’s upb library is unable to deserialize messages larger than 2GB, but we didn’t verify this.
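The 2GB ceiling is consistent with what happens when a byte count is stored in a signed 32-bit integer. The following stdlib illustration shows the wraparound (this demonstrates the arithmetic only; our guess about upb's internals remains unverified):

```python
import ctypes

TWO_GB = 2 * 1024**3  # 2,147,483,648 bytes

# A signed 32-bit counter tops out one byte below 2GB...
assert ctypes.c_int32(TWO_GB - 1).value == 2147483647

# ...and wraps negative exactly at the 2GB boundary, so any size
# check or offset computation using it breaks down.
assert ctypes.c_int32(TWO_GB).value == -2147483648

# A native 64-bit size, as pbcc uses, represents the value directly.
assert ctypes.c_int64(TWO_GB).value == 2147483648
```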
For ease of use, we wanted our engineers to interact with pbcc message objects as if they were normal Python objects. To help with this, the compiler generates a .pyi stub file alongside the compiled module. This means that developers get full autocomplete and type checking in their IDEs, which doesn’t work well with the original Protobuf library. In addition, pbcc maps all Protobuf field types to native Python types, which means that repeated fields become lists, maps become dicts, and optional fields may be None. This behavior allows for intuitive operations, such as direct equality comparisons between a repeated field and a list literal.
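The ergonomics this enables can be sketched with a plain dataclass standing in for a generated message class (the Polyline class below is a hypothetical example, not pbcc's actual output):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Polyline:  # stand-in for a pbcc-generated message class
    points: list[tuple[int, int]] = field(default_factory=list)  # repeated -> list
    labels: dict[int, str] = field(default_factory=dict)         # map -> dict
    name: Optional[str] = None                                   # unset optional -> None

line = Polyline(points=[(0, 0), (1, 2)])

# Repeated fields are real lists, so direct comparisons just work --
# no casting through list(...) as with the standard library's containers.
assert line.points == [(0, 0), (1, 2)]
assert line.labels == {}
assert line.name is None
```

Because the generated classes behave like this, mypy and IDE autocomplete see ordinary typed attributes rather than opaque runtime containers.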
To optimize speed and scalability, the C++ implementation uses custom StringReader and StringWriter classes which manage memory efficiently, copying data only when needed. We didn't do much explicit optimization beyond this, so there are likely further improvements to be made in the C++ code. Even so, pbcc was already about as fast as Google's upb library.
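The "copying only when needed" idea has a direct analogue in Python's own zero-copy primitive, memoryview. This is a conceptual illustration only; the real StringReader and StringWriter are C++:

```python
payload = b"header" + b"x" * 1000  # pretend this is a serialized message

# Slicing bytes copies; slicing a memoryview only records offsets.
view = memoryview(payload)
body = view[6:]              # no copy yet: just a window into payload
assert body.obj is payload   # still backed by the original buffer

materialized = bytes(body)   # the copy happens only when we ask for one
assert materialized == b"x" * 1000
```

Deferring copies this way means a reader can walk nested submessages by adjusting offsets into one input buffer, rather than duplicating bytes at every level.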
▶ Conclusion
As with the REPL service, there's a tradeoff between building something in-house vs. using an open-source or vendored product to fill an engineering need. Even though serialization is considered by many to be a solved problem, we weren't satisfied with the solution, so we built what we needed instead.
pbcc is open source and available on GitHub.
At Harmonic, we like to solve problems in simple, elegant, and performant ways, and we aren’t afraid to dive deeper down the stack when needed. If this kind of work sounds interesting to you, join us!