Syzygy

Dual Code-Test C to Rust Translation using LLMs and Dynamic Analysis


UC Berkeley
*Equal Contribution, Equal Advising

[Teaser figure]

Despite its extensive use in high-performance, low-level systems programming, C is susceptible to vulnerabilities due to manual memory management and unsafe pointer operations. Rust, a modern systems programming language, offers a compelling alternative: through its ownership model and type system, it ensures memory safety without sacrificing performance. We present Syzygy, an automated approach for translating C to safe Rust. Our technique uses a synergistic combination of LLM-driven code and test generation, guided by execution information collected through dynamic analysis. We apply our approach to successfully translate Zopfli, a high-performance compression library with ~3,000 lines of code and 98 functions. We validate the translation by testing equivalence with the source C program on a set of inputs. To our knowledge, this is the largest automated, test-validated translation of C to safe Rust achieved to date.

Dualities for Safe and Valid C to Rust

As the name suggests, Syzygy exhibits two kinds of dualities that form the core principles of our approach.
  • LLMs meet Dynamic Analysis. First, we synergistically combine the generative capabilities of LLMs with semantic execution information collected by dynamic analysis of the source C codebase.
  • Dual Code-Test Translation. Second, rather than using the LLM + dynamic analysis recipe for Rust code generation alone, we also build a test-translation approach for reliable equivalence testing.

Syzygy Overview

Overview. The Slicer decomposes the codebase into translation units, the CodeGenerator translates each unit, and the EqTester checks equivalence of the generated Rust code with the original C code. The ArgTranslator maps C function arguments to their Rust counterparts, enabling appropriate equivalence checks in the EqTester. The SpecMiner, our dynamic analysis module, mines property and I/O specifications for internal C functions using top-level tests; these specifications later guide our LLM-driven dual code-test translation.
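To make the mining step concrete, the following is a hypothetical sketch of the kind of record the SpecMiner could produce for a single observed call. The struct and its field names are illustrative, not the paper's actual format; `ZopfliCalculateEntropy` is a real Zopfli function used here only as an example.

```rust
// Hypothetical shape of a mined specification for one observed C call.
// Field names and values are illustrative, not the paper's format.
#[derive(Debug)]
struct MinedSpec {
    function: &'static str,
    input_summary: &'static str,   // concrete inputs seen on a top-level test
    output_summary: &'static str,  // concrete outputs observed for that call
    properties: Vec<&'static str>, // dynamic properties: types, aliasing, bounds
}

fn main() {
    let spec = MinedSpec {
        function: "ZopfliCalculateEntropy",
        input_summary: "count: size_t[288] (non-null), n = 288",
        output_summary: "bitlengths: double[288]",
        properties: vec![
            "count and bitlengths do not alias",
            "all array accesses stay within n",
        ],
    };
    println!("{spec:?}");
}
```

Such a record gives the LLM both concrete I/O for test construction and property hints (aliasing, bounds) for choosing safe Rust types.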

Scaling LLM Inference with Dynamic Analysis

[Figure: Translation pipeline]

Translation Pipeline. As shown above, given a C function and test suite, we run four modules:

  1. SpecMiner: Analyzes function execution, collecting I/O and key properties like types, aliasing, and bounds.
  2. CodeGenerator: An LLM uses the spec to craft a candidate Rust translation.
  3. ArgTranslator: An LLM creates a function that translates C I/O to Rust inputs and expected outputs, aligned with the generated Rust translation.
  4. EqTester: An LLM generates a test that executes the candidate translation, along with the previously generated Rust code (Invariant), and checks whether the outputs match expected values.

Rejection Sampling. As shown below, we use rejection sampling to scale LLM inference throughout the pipeline: we alternate between generation and verification steps. An LLM first generates N candidate Rust functions, which are filtered for compilation; the pipeline then samples K translate_args functions for each valid candidate. Subsequent execution-filtering and equivalence-testing phases create and verify triples, ultimately yielding successfully translated and equivalent Rust functions.

[Figure: Rejection sampling pipeline]
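Each generate-then-verify round reduces to a simple rejection-sampling loop. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: `generate` stands in for an LLM sampling call, and `verify` for a filter such as compilation or equivalence testing.

```rust
// Generic rejection sampling: draw n candidates, keep those passing `verify`.
fn rejection_sample<T, G, V>(n: usize, mut generate: G, verify: V) -> Vec<T>
where
    G: FnMut(usize) -> T, // stand-in for an LLM sampling call
    V: Fn(&T) -> bool,    // stand-in for a verification filter
{
    (0..n).map(|i| generate(i)).filter(|c| verify(c)).collect()
}

fn main() {
    // Stub: "candidates" are integers and "verification" keeps even ones,
    // standing in for the compilation / equivalence filters in the pipeline.
    let survivors = rejection_sample(10, |i| i, |c| c % 2 == 0);
    println!("{} candidates survived", survivors.len()); // prints "5 candidates survived"
}
```

In the actual pipeline, the survivors of one filter seed the next stage's sampling (e.g., K translate_args functions per compiling candidate), so the rounds chain rather than run independently.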

Translating Zopfli from C to Rust

We completely translate Zopfli, a compression library known to achieve superior compression ratios by extensively optimizing the compression process. The codebase consists of over 3,000 lines of C code (excluding comments), comprising 98 functions and 10 structs across 21 files. It provides a challenging testbed for evaluating our approach due to the diversity of source-code constructs, including heap-based data structures (such as linked lists and array iterators), function pointers, and void* arguments. Our translated Zopfli program spans over 7,000 lines (including verbose comments and docstrings).

Pass Rate

We study the quality of the translation candidates across functions with two metrics: compilation pass rate and execution pass rate. The compilation pass rate is the fraction of generated translations that compile; the execution pass rate is the fraction of compiling translations that pass the equivalence test.
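As a worked example with made-up numbers (not results from the paper): if 100 translations are sampled, 82 compile, and 64 of those pass the equivalence test, the two rates are computed over different denominators.

```rust
fn main() {
    // Illustrative counts only; not the paper's measured numbers.
    let generated = 100u32;  // translations sampled
    let compiled = 82u32;    // pass the Rust compiler
    let equivalent = 64u32;  // pass the equivalence test

    // Compilation rate is over all samples; execution rate is over compiling ones.
    let compilation_rate = compiled as f64 / generated as f64;
    let execution_rate = equivalent as f64 / compiled as f64;
    println!("compilation pass rate: {:.0}%", compilation_rate * 100.0); // 82%
    println!("execution pass rate: {:.1}%", execution_rate * 100.0);     // 78.0%
}
```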
The mean pass rates are high, exceeding 75% for both compilable and executable translations. However, the distribution has a long tail of challenging functions with pass rates below 20%.

Efficiency

We further evaluate the final translated Rust code on varied test inputs, comparing its output and execution time with the original C code. We construct two test suites: the first contains six strings formed by repeating 'a' (between 10 and 1,000,000 repetitions); the second contains twelve random strings of varying lengths (between 1,000 and 1,000,000 characters). Additionally, we compare execution times under the default compilation configuration (gcc -O0 -g and cargo build, debug profile) and the optimized configuration (gcc -O3 and cargo build --release) for both the C and Rust implementations.
Performance Comparison of C and Rust Implementations

Input Type        Default                     Optimized
                  C       Rust    Slowdown    C       Rust    Slowdown
Repeated Input    0.17    1.56    8.9x        0.040   0.059   1.47x
Random Input      0.242   3.336   13.8x       0.094   0.330   3.67x

We find that the translated Rust code is functionally equivalent but slower, with larger slowdowns under the default configuration than under the optimized one. Even with compiler optimizations, the Rust code is up to 3.67x slower than the C code.