Rust changed how we think about systems performance. Not immediately — the first few weeks with the borrow checker were genuinely painful, and we had real doubts about whether we'd made the right call. But the throughput numbers at the end of the rewrite made the decision look obvious in retrospect: 10x improvement in processing throughput, p99 latency down from 340ms to 18ms, and memory usage cut by roughly 80%. This is the story of how we got there and what we'd do differently.
Table of Contents
- The Original System: What We Had and Where It Failed
- Why Rust and Not Go or C++
- The Learning Curve: Being Honest About the Friction
- The Rewrite Strategy: Strangler Fig and FFI Bindings
- Performance Results
- The Ownership Model in Practice
- Rayon for Data Parallelism
- Error Handling with Result and the ? Operator
- The Operational Reality
- When Rust Is and Is Not the Right Choice
The Original System: What We Had and Where It Failed
Our data processing pipeline ingested structured event streams from multiple upstream sources, normalized them, ran a series of validation and enrichment passes, aggregated metrics in configurable time windows, and wrote results to a downstream data store. At modest scale, the Node.js implementation was fine. At the scale we reached in 2023, it wasn't.
The pipeline processed batches of JSON records — anywhere from a few hundred to tens of thousands per invocation. Each record went through five sequential processing stages, some of which involved regex matching, string manipulation, and schema validation. Under light load, Node.js handled this without complaint. Under sustained high concurrency, three problems emerged.
Memory pressure and garbage collection pauses. JSON.parse allocates heavily. Each record generated several intermediate objects across the processing stages, none of which were particularly large but all of which created GC pressure. We spent months tuning the Node.js garbage collector (--max-old-space-size, --gc-interval) and reducing allocations in the hot path, but we kept hitting GC pauses of 40–150ms at the p99 that we couldn't eliminate.
Single-threaded CPU saturation. Node.js is single-threaded for JavaScript execution. Our processing stages were pure CPU work — parsing, regex matching, transformation. Worker threads helped, but the coordination overhead between workers and the main thread added latency that offset much of the parallelism gain. We were leaving multiple CPU cores idle on every server.
Memory ceiling. Processing a large batch required holding the entire batch in memory. In Node.js, a batch of 50,000 records would occupy significantly more memory than the raw data size due to V8's object representation overhead — strings as UTF-16, object headers, pointer sizes. We were regularly hitting 4–6GB of heap usage on instances sized for much less.
We tried vertical scaling. We tuned the garbage collector exhaustively. We profiled with clinic.js and flamegraphs and squeezed real gains. But the architectural constraints were real: we needed a language with deterministic memory management, true parallelism, and minimal overhead at the object representation level.
Why Rust and Not Go or C++
Go was the first alternative we evaluated seriously. The case for it was obvious: fast compilation, simple concurrency model with goroutines and channels, garbage collector designed for low latency, strong ecosystem, and — critically — a much smaller learning curve than Rust for engineers coming from Node.js.
We prototyped the hot path in Go. The results were good: GC pauses under 2ms at p99, easy parallelism with goroutines, readable code. Go was the pragmatic choice.
We chose Rust anyway, and the reason was the ownership model.
Go's garbage collector is excellent but still non-deterministic. In our use case, we were processing records that could be arbitrarily large, and we needed guaranteed memory behavior under load spikes. Go's GC pause times have improved dramatically since 1.17, but at the scale we were targeting, even sub-2ms pauses were a constraint.
The deeper reason: our processing pipeline had several stages where the same data needed to be read by multiple concurrent workers without copying. In Go, this requires either copying the data (allocation) or using synchronization primitives (locks or channels). In Rust, the borrow checker enforces at compile time that shared references are read-only and mutable references are exclusive. Zero-copy shared reads are safe by construction, and the compiler won't let you create a data race. We didn't need to think about it at runtime because we couldn't write the code wrong.
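To make that concrete, here is a minimal standard-library sketch of zero-copy shared reads — not our pipeline code, the names are illustrative. Several scoped threads read the same batch through plain shared references; no copies, no locks, and the compiler verifies there is no data race:

```rust
use std::thread;

// Sum the lengths of all records by splitting the batch across scoped
// threads. Each worker gets a shared slice (&[String]) into the same
// underlying data; shared references are read-only, so the borrow
// checker proves this is free of data races.
fn parallel_total(records: &[String]) -> usize {
    thread::scope(|s| {
        let handles: Vec<_> = records
            .chunks(256)
            .map(|chunk| s.spawn(move || chunk.iter().map(|r| r.len()).sum::<usize>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let records: Vec<String> = (0..1000).map(|i| format!("record-{i}")).collect();
    let total = parallel_total(&records);
    // Same answer as a sequential pass over the shared data.
    assert_eq!(total, records.iter().map(|r| r.len()).sum::<usize>());
}
```

Trying the same thing with mutable access from two threads simply fails to compile, which is exactly the guarantee described above.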
C++ was rejected more quickly. The safety argument against C++ isn't purely theoretical — it's operational. We are not a systems programming shop. Our engineers have strong experience with JavaScript and moderate experience with typed languages, but none with C++. The risk of memory unsafety bugs (use-after-free, buffer overflows, undefined behavior) in a production data pipeline was not a risk we were willing to accept without a significant investment in C++ expertise we didn't have.
Rust gives you C++-level performance with a type system that makes memory safety bugs a compile-time error instead of a production incident.
The Learning Curve: Being Honest About the Friction
I want to be direct about this because every Rust adoption post glosses over it: the first few weeks were hard.
The borrow checker rejected code that looked correct to everyone on the team. Lifetimes were conceptually intelligible but practically frustrating — we understood why they existed, but we couldn't always write the code the compiler wanted. The error messages are genuinely excellent by the standard of systems languages, but "consider adding a lifetime parameter" is less helpful when you're not sure which lifetime is appropriate.
The specific patterns that caused the most friction:
Iterating and mutating simultaneously. You cannot hold a mutable reference and any other reference to the same data in the same scope. Coming from JavaScript, where you can freely read and write during iteration, this feels unnecessarily restrictive. The restriction is there for good reason — it prevents iterator invalidation bugs — but the mental shift takes time.
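For example, this is the shape of the fix we reached for most of those cases — do the immutable pass first, then mutate (toy example, not pipeline code):

```rust
// Duplicate every score above a threshold into the same vector.
fn add_high_scores(scores: &mut Vec<i32>) {
    // This would NOT compile: the iterator holds a shared borrow of
    // `scores` while `push` needs an exclusive one.
    // for s in &scores { if *s > 20 { scores.push(*s); } }

    // Idiomatic fix: collect what to add first (immutable pass),
    // then mutate — the two borrows never overlap.
    let to_add: Vec<i32> = scores.iter().filter(|&&s| s > 20).copied().collect();
    scores.extend(to_add);
}

fn main() {
    let mut scores = vec![10, 25, 40];
    add_high_scores(&mut scores);
    assert_eq!(scores, vec![10, 25, 40, 25, 40]);
}
```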
Returning references from functions. In JavaScript, you return objects freely without thinking about who owns them. In Rust, returning a reference to data that's local to the function is a compile error — the data won't exist when the caller tries to use the reference. This is a correct restriction, but it forced us to redesign several functions that were written in a "return a slice of this thing" style.
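A minimal illustration of the compile error and the two standard fixes — borrow from an input so the output lifetime is tied to it, or return an owned value (illustrative functions, not from our codebase):

```rust
// Rejected by the compiler: `local` is dropped when the function
// returns, so the reference would dangle.
// fn first_word_bad() -> &str {
//     let local = String::from("hello world");
//     &local[..5]
// }

// Fix 1: borrow from an input; the returned slice lives as long as `s`.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

// Fix 2: return an owned value and let the caller own it.
fn first_word_owned(s: &str) -> String {
    first_word(s).to_string()
}

fn main() {
    assert_eq!(first_word("hello world"), "hello");
    assert_eq!(first_word_owned("hello world"), "hello");
}
```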
Async Rust specifically. Async/await in Rust is genuinely more complex than in Node.js or Go. The Future trait, the need for Pin, and the Send + Sync bounds on async functions in multithreaded contexts create a category of compiler errors that require real understanding of how async works under the hood.
How we got over the hump: we started with a small, non-critical component of the pipeline, kept the scope narrow, and had one engineer (our most experienced with typed systems) own the first two weeks of implementation. They became the internal expert and helped the rest of the team through the borrow checker friction. After about six weeks, most engineers were writing Rust without regular compiler battles. After three months, the team was productive.
The standard advice is true: read The Rust Book, then read Rust for Rustaceans. Do the Rustlings exercises. Expect weeks of borrow checker friction and don't interpret it as a sign the language is wrong.
The Rewrite Strategy: Strangler Fig and FFI Bindings
We did not do a big-bang rewrite. We've seen big-bang rewrites go wrong too many times.
The strangler fig pattern: incrementally replace pieces of the existing system with new implementations, routing traffic to the new implementation piece by piece, while the old system continues to run. The Node.js pipeline continued processing traffic throughout the rewrite. We never had a flag day where everything cut over simultaneously.
Neon for Node.js/Rust Bindings
During the transition period, we used Neon — a library for writing native Node.js modules in Rust — to call Rust code from our existing Node.js pipeline. This let us replace individual processing stages in Rust while the orchestration, I/O, and downstream writing remained in Node.js.
use neon::prelude::*;
fn process_batch(mut cx: FunctionContext) -> JsResult<JsArray> {
let input = cx.argument::<JsArray>(0)?;
let records: Vec<Handle<JsValue>> = input.to_vec(&mut cx)?;
// process records, return results
let output = JsArray::new(&mut cx, records.len() as u32);
// ...
Ok(output)
}
#[neon::main]
fn main(mut cx: ModuleContext) -> NeonResult<()> {
cx.export_function("processBatch", process_batch)?;
Ok(())
}
The performance benefit of running processing stages in Rust was visible immediately, even with the overhead of crossing the Node.js-to-Rust FFI boundary. The speedups in the hot stages — JSON parsing, regex matching, schema validation — were large enough that the FFI overhead was insignificant by comparison.
Over about four months, we moved stage by stage until the Rust implementation handled the full pipeline. The Node.js wrapper became a thin entrypoint that we eventually replaced with a standalone Rust binary.
Performance Results
Before and after comparison on the same hardware (8-core, 32GB instances):
| Metric | Node.js | Rust | Improvement |
|---|---|---|---|
| Throughput (records/sec) | ~8,000 | ~85,000 | 10.6x |
| Latency p50 | 45ms | 4ms | 11x |
| Latency p95 | 180ms | 9ms | 20x |
| Latency p99 | 340ms | 18ms | 19x |
| Memory per batch (50k records) | ~5.2GB | ~620MB | 88% reduction |
| CPU utilization at peak | 92% (1 core saturated) | 76% (8 cores utilized) | True parallelism |
The throughput number understates the real gain because the Node.js system was CPU-saturated at those numbers — throughput was limited by the single-threaded bottleneck. The Rust system was not saturated; it had headroom.
The memory numbers are the ones that changed our operational model. The Node.js system required instances with 8–16GB of memory to handle large batches safely. The Rust system runs the same workload on 4GB instances with comfortable headroom.
GC pauses went from a real operational problem to nonexistent. Rust's ownership model means memory is freed deterministically when values go out of scope. There is no garbage collector. There are no GC pauses.
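The mechanism behind this is the Drop trait: cleanup runs at a statically known point, the end of scope. A small illustration (toy types, not pipeline code) that records exactly when a value is freed:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// A value whose cleanup we can observe: it appends to a shared log
// when dropped.
struct Batch {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Batch {
    fn drop(&mut self) {
        // Runs deterministically the moment the value goes out of
        // scope — no garbage collector involved.
        self.log.borrow_mut().push(format!("freed {}", self.name));
    }
}

fn main() {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _b = Batch { name: "batch-1", log: Rc::clone(&log) };
        log.borrow_mut().push("processing".to_string());
    } // `_b` is freed exactly here
    log.borrow_mut().push("after scope".to_string());

    assert_eq!(*log.borrow(), vec!["processing", "freed batch-1", "after scope"]);
}
```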
The Ownership Model in Practice
The ownership model is what enables most of the performance characteristics above.
Zero-Copy Parsing
When we parse a JSON batch in Node.js, each parsed string becomes a new JavaScript String object on the heap. Manipulating strings allocates more strings. By the end of a processing chain, we've allocated many times the original data size.
In Rust, we use serde_json with borrowed deserialization, deserializing records into types that hold string slices (&str) referencing the original input buffer rather than copying string data into new allocations. The borrow checker guarantees this is safe — the borrow checker statically verifies that the parsed struct's lifetime doesn't outlive the buffer it borrows from.
use serde::Deserialize;
use serde_json::value::RawValue;

#[derive(Deserialize)]
struct Record<'a> {
    id: &'a str,
    event_type: &'a str,
    #[serde(borrow)]
    payload: &'a RawValue,
}
This struct holds references into the input buffer. No string copying occurs during deserialization. The processing stages that only need to read fields operate on these references. Only stages that produce output allocate new strings.
Arena Allocation
For stages that do need to allocate, we use bumpalo — a fast arena allocator. Instead of allocating each record's output struct individually through the global allocator (which involves synchronization), we allocate into a bump arena that's created at the start of each batch and freed all at once at the end. Allocation becomes a pointer increment.
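The real pipeline uses bumpalo; as a dependency-free sketch of the same idea — allocate into batch-local storage, then free everything in one shot — here is a toy arena built on a Vec (illustrative only, without bumpalo's pointer-bump performance characteristics):

```rust
// Toy batch arena: every allocation lives until the arena itself is
// dropped at the end of the batch, so there is no per-record free.
struct BatchArena {
    storage: Vec<String>,
}

impl BatchArena {
    fn new() -> Self {
        BatchArena { storage: Vec::new() }
    }

    // Store an owned string and hand back its index; nothing is freed
    // until the whole arena goes away.
    fn alloc(&mut self, s: String) -> usize {
        self.storage.push(s);
        self.storage.len() - 1
    }

    fn get(&self, idx: usize) -> &str {
        &self.storage[idx]
    }
}

fn main() {
    let mut arena = BatchArena::new();
    let id = arena.alloc("record-1".to_string());
    assert_eq!(arena.get(id), "record-1");
} // arena dropped here: all batch allocations freed together
```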
Rayon for Data Parallelism
Rayon is a data parallelism library that turns iterator chains into parallel iterator chains with a single method change:
use rayon::prelude::*;
let results: Vec<ProcessedRecord> = records
.par_iter()
.map(|record| process(record))
.filter(|r| r.is_valid())
.collect();
par_iter() instead of iter() parallelizes the work across all available CPU cores using a work-stealing thread pool. The borrow checker ensures the parallel processing is safe: because process takes an immutable reference, Rayon knows it's safe to run multiple instances concurrently.
For our pipeline, replacing sequential iteration with Rayon gave us near-linear scaling with core count on the CPU-bound processing stages. An 8-core machine processes roughly 7.5x faster than a single-threaded baseline — close to the theoretical maximum.
Error Handling with Result and the ? Operator
Rust has no exceptions. Functions that can fail return Result<T, E>. This forces errors to be explicit in function signatures and handled by callers.
The ? operator propagates errors up the call stack without boilerplate:
fn process_record(raw: &str) -> Result<ProcessedRecord, ProcessingError> {
let record: RawRecord = serde_json::from_str(raw)?;
let validated = validate(&record)?;
let enriched = enrich(validated)?;
Ok(enriched)
}
Each ? unwraps the Ok value or returns the Err from the current function, converting the error type as needed. The result is that the happy path reads linearly while error propagation is automatic.
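The "converting the error type as needed" part works through the From trait: ? calls From::from on the underlying error. A self-contained sketch (ProcessingError here is a stand-in, not our real error type):

```rust
#[derive(Debug)]
enum ProcessingError {
    Parse(std::num::ParseIntError),
    Invalid(String),
}

// This impl is what lets `?` convert a ParseIntError from the standard
// library into our pipeline's error type automatically.
impl From<std::num::ParseIntError> for ProcessingError {
    fn from(e: std::num::ParseIntError) -> Self {
        ProcessingError::Parse(e)
    }
}

fn parse_count(raw: &str) -> Result<u32, ProcessingError> {
    let n: u32 = raw.trim().parse()?; // ParseIntError auto-converted
    if n == 0 {
        return Err(ProcessingError::Invalid("count must be nonzero".into()));
    }
    Ok(n)
}

fn main() {
    assert_eq!(parse_count("42").unwrap(), 42);
    assert!(parse_count("zero").is_err()); // Parse variant
    assert!(parse_count("0").is_err());    // Invalid variant
}
```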
For our pipeline, this pattern made error handling more consistent than the Node.js version. In JavaScript, it's easy to accidentally swallow errors in async chains or miss error branches. In Rust, the compiler requires every Result to be handled. We discovered several error handling gaps in our Node.js logic during the rewrite, simply because Rust forced us to think about them explicitly.
The Operational Reality
Cross-Compilation
We build for Linux x86_64 in CI (GitHub Actions). Cross-compilation in Rust is genuinely straightforward — add the target with rustup target add, use cross or a Docker-based builder for targets with different libc requirements. Our build produces a single statically linked binary with no runtime dependencies.
Binary Size
Debug builds are large. Release builds with opt-level = "z" (size optimization) and strip = true produce binaries around 8–12MB for our pipeline. Strip the debug symbols from release builds — the default keeps them in, which makes binaries 5–10x larger than necessary.
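As a sketch, the relevant Cargo.toml knobs look like this (not our exact profile; lto and codegen-units are additional options worth trying, beyond the two settings mentioned above):

```toml
[profile.release]
opt-level = "z"   # optimize for size rather than speed
strip = true      # strip symbols from the release binary
lto = true        # link-time optimization, often shrinks binaries further
codegen-units = 1 # better optimization at the cost of compile time
```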
Deployment
Deploying a Rust binary is operationally simpler than deploying a Node.js application. There's no node_modules, no runtime version to manage, no npm. The binary runs or it doesn't. Container images are smaller because you're copying one binary into a minimal base image rather than a full Node.js installation plus dependencies.
Startup time is negligible — sub-100ms vs. the 500–1500ms startup we had in Node.js for a warm application. For batch processing workloads invoked frequently, this matters.
Debugging
This is the honest downside. Debugging Rust in production is harder than Node.js. Stack traces from panics are useful but require building with debug symbols. Async stack traces are less readable than sync ones. println! debugging is effective for local development but you obviously don't want it in production. We use tracing for structured logging and tokio-console for async task inspection. The ecosystem is good but requires more setup than the Node.js observability toolchain.
When Rust Is and Is Not the Right Choice
Be honest with yourself about whether Rust is the right tool before you commit.
Rust is the right choice when:
- You have a specific, measurable performance problem that you've demonstrated can't be solved within your current language's constraints.
- The hot path does CPU-intensive work (parsing, cryptography, data transformation, compression) rather than I/O-bound waiting.
- Memory predictability matters — you need guaranteed latency, not just average latency.
- You have time to invest in onboarding (budget 2–3 months before the team is productive).
- The component can be isolated — a library, a service, a binary — rather than requiring a wholesale rewrite.
Rust is not the right choice when:
- Your bottleneck is network I/O or database queries. Node.js handles async I/O efficiently; switching to Rust won't help if you're waiting on Postgres.
- Your team has no systems programming background and your project timeline is tight.
- The application is CRUD-heavy with modest throughput requirements. Rust's productivity cost isn't justified by the marginal performance gain.
- You're trying to solve a scaling problem that horizontal scaling or better algorithmic choices would solve instead.
We had a concrete performance problem with a measured root cause. The rewrite had clear success criteria. The component boundaries were well-defined. Those conditions made it the right call.
If you're considering Rust because it's fast and interesting — those are real things — make sure you have the organizational patience for the learning curve and the specific performance requirements that justify it. Rust is not a general-purpose upgrade from Node.js. It's a specialized tool for specific problems. When you have that problem, it's the best tool available.