Introducing Harmonic: Our Mission and First Results
Harmonic is building mathematical superintelligence.
When we want to know whether an answer to a question is correct, we check the reasoning: the logical steps behind the answer. For artificial intelligence systems to be truthful, explainable, and aligned with us, we must endow them with powerful, verifiable reasoning capabilities.
The language of reasoning is mathematics, and mathematics is the means through which humans have discovered the fundamental truths about our universe. Mathematical superintelligence will greatly accelerate human advancement in science and engineering.
▶ AI today
Modern AI systems based on large language models appear to understand and use language in a way that emulates humans, and we have seen new and exciting capabilities emerge as these models have grown in scale. However, today's models fall short of human-level reasoning; hallucinations are common and models are often as confident about wrong answers as they are about right ones. They can fail in unpredictable ways, and the consequences of such failures become increasingly significant as their usage grows. In their current state, they are unusable in most safety-critical applications.
▶ The promise of mathematical reasoning
Models capable of formal mathematical reasoning will produce output guaranteed to be correct, with an interpretable chain of reasoning. We believe such models, with transparent and automatically verifiable reasoning traces, will be fundamentally safe in ways that the current generation of models is not.
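To illustrate what "automatically verifiable" means here (a toy sketch, not drawn from Harmonic's system): in Lean 4, a proof is a term that the kernel checks mechanically, so the mere fact that a file compiles guarantees every statement in it is true.

```lean
-- Illustrative only. The proof term `rfl` (reflexivity) is checked
-- step-by-step by Lean's kernel; acceptance is a correctness guarantee.
theorem two_add_two : 2 + 2 = 4 := rfl

-- A slightly less trivial fact from the core library: every natural
-- number is at most its successor.
theorem le_succ' (n : Nat) : n ≤ n + 1 := Nat.le_succ n
```

A model whose output is such a proof term cannot hallucinate a wrong answer past the checker: an invalid proof simply fails to compile.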
This approach will be immediately applicable to critical industries such as aerospace, chip design, industrial systems, and healthcare, where software reliability is paramount. Moreover, the development of such models will push the boundaries of AI research, ultimately driving the creation of more powerful and reliable AI systems across different domains.
▶ Where we are today
Our first research result is Aristotle: an automated theorem prover advancing the state of the art on MiniF2F. This standard benchmark measures problem-solving ability across a range of difficulties, up to and including problems from the International Mathematical Olympiad. To evaluate our system, we manually reformalized and improved MiniF2F in Lean 4. To obtain a training set, we re-split the 488 MiniF2F problems (originally evenly divided into validation and test sets) randomly into 392 training, 48 validation, and 48 test problems.
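The re-split described above can be sketched as a simple seeded shuffle-and-partition. This is a hypothetical illustration of the 392/48/48 arithmetic, not Harmonic's actual tooling; the function name, seed, and problem labels are assumptions.

```python
import random

def resplit(problems, n_train=392, n_valid=48, n_test=48, seed=0):
    """Randomly partition a list of problems into train/validation/test.

    Split sizes mirror the 392/48/48 re-split of the 488 MiniF2F
    problems described in the text. Illustrative sketch only.
    """
    assert len(problems) == n_train + n_valid + n_test
    shuffled = list(problems)
    random.Random(seed).shuffle(shuffled)  # deterministic for a fixed seed
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])

# Hypothetical problem identifiers standing in for the 488 statements.
train, valid, test = resplit([f"minif2f_{i}" for i in range(488)])
```

Because the shuffle is seeded, the partition is reproducible across runs, which matters when comparing provers on a fixed validation split.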
We evaluate Aristotle in two settings: one where external computer algebra systems are permitted to solve simple subproblems in a trusted way, and one where the full proofs are expressed in Lean. Our system currently achieves an 83% success rate in the first setting and a 63% success rate when restricted to Lean. We compare our results on the validation split to two previous state-of-the-art approaches below.
▶ Join us
Our journey to build mathematical superintelligence is just beginning. We are a commercial research lab, and our primary focus is to enable our team of talented researchers, mathematicians, and engineers to build the world's most advanced mathematical reasoning engine.
If our mission resonates with you, join us or follow us on X.
- Tudor, Vlad, and the Harmonic Team