TL;DR
Your JVM ran out of off-heap (direct) memory. Three quick fixes:
- Raise the direct memory cap: -XX:MaxDirectMemorySize=512m
- On Netty, add -Dio.netty.maxDirectMemory=0 so Netty uses the JVM cap instead of its own separate limit.
- Audit ByteBuffer leaks: direct buffers don't free themselves until their Cleaner fires, which can lag minutes behind allocation under load.
What's going on under the hood
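Direct buffers live outside the Java heap. They're allocated via ByteBuffer.allocateDirect() and tracked against a separate ceiling: -XX:MaxDirectMemorySize. Memory obtained through sun.misc.Unsafe.allocateMemory() is also off-heap, but it bypasses that accounting, so libraries that use it (Netty among them) enforce their own limit.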
The default is usually equal to -Xmx, but on some JVM versions it's as low as 64 MB. Hit the ceiling and you get this:
Exception in thread "main" java.lang.OutOfMemoryError: Direct buffer memory
at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118)
at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318)
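To see the ceiling in action, here is a minimal reproduction sketch. The class name DirectOomDemo and the chunk sizes are made up for illustration; it simply keeps direct buffers reachable so GC cannot reclaim them. Run it with a small cap such as -XX:MaxDirectMemorySize=16m to trigger the error before the loop's own limit is reached.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class DirectOomDemo {
    /**
     * Allocates direct buffers of chunkBytes each and keeps them reachable.
     * Stops at maxChunks or at the first OutOfMemoryError and returns how
     * many allocations succeeded.
     */
    static int allocateUntilFailure(int chunkBytes, int maxChunks) {
        List<ByteBuffer> held = new ArrayList<>();
        try {
            while (held.size() < maxChunks) {
                held.add(ByteBuffer.allocateDirect(chunkBytes));
            }
        } catch (OutOfMemoryError e) {
            // The message text mentions direct buffer memory; exact wording varies by JDK version.
            System.err.println("Hit the cap after " + held.size() + " buffers: " + e.getMessage());
        }
        return held.size();
    }

    public static void main(String[] args) {
        // 64 x 1 MiB; with -XX:MaxDirectMemorySize=16m this fails partway through.
        int chunks = allocateUntilFailure(1024 * 1024, 64);
        System.out.println("Allocated " + chunks + " MiB of direct memory");
    }
}
```

The maxChunks bound is there so the demo stays harmless when the cap is large.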
Four situations that trigger this:
- Netty's PooledByteBufAllocator pre-allocates direct arenas at startup. On a 16-core machine, this can eat 300-500 MB before your app handles a single request.
- Code that allocates direct buffers in a tight loop and relies on GC to free them. GC doesn't run often enough under sustained load.
- gRPC, Kafka client, RxNetty: these use Netty or raw NIO internally and quietly consume direct memory you didn't budget for.
- A container with a 2 GB memory limit where the JVM defaulted MaxDirectMemorySize to match a 1 GB -Xmx. Add metaspace and thread stacks and you're already over.
Fix 1: Raise the direct memory ceiling
This is the fastest lever to pull. Tune the value based on your actual workload:
-XX:MaxDirectMemorySize=1g
For a Spring Boot fat JAR:
java -XX:MaxDirectMemorySize=1g -jar app.jar
In Kubernetes, set it via JAVA_TOOL_OPTIONS so it applies regardless of how the JVM is launched:
env:
  - name: JAVA_TOOL_OPTIONS
    value: "-XX:MaxDirectMemorySize=512m -Xmx1g"
One rule to live by: heap + direct memory + metaspace + thread stacks must fit inside your container limit. Blow past that and Kubernetes OOMKills the pod: no JVM error, just a silent restart.
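The budget rule is plain arithmetic, so it is easy to sanity-check before you deploy. The sketch below uses the 1 GB heap, 512 MB direct cap, and 2 GB container limit from this article; the 256 MiB metaspace, 200 threads, and 1 MiB stack size are assumed placeholders, so substitute your own measurements. ContainerBudget is a hypothetical helper, not part of any library.

```java
final class ContainerBudget {
    static final long MIB = 1024L * 1024L;

    /** Sums the four regions named in the rule above, in bytes. */
    static long totalBytes(long heap, long direct, long metaspace,
                           int threads, long stackBytesPerThread) {
        return heap + direct + metaspace + threads * stackBytesPerThread;
    }

    /** True when the summed budget stays within the container limit. */
    static boolean fits(long totalBytes, long containerLimitBytes) {
        return totalBytes <= containerLimitBytes;
    }

    public static void main(String[] args) {
        long limit = 2048 * MIB;                 // 2 GB container
        long total = totalBytes(1024 * MIB,      // -Xmx1g
                                512 * MIB,       // -XX:MaxDirectMemorySize=512m
                                256 * MIB,       // assumed metaspace
                                200, MIB);       // assumed 200 threads x 1 MiB stacks
        System.out.println("Budgeted: " + total / MIB + " MiB of " + limit / MIB + " MiB");
        System.out.println("Fits: " + fits(total, limit));
    }
}
```

With these inputs the total is 1992 MiB, which fits but leaves only 56 MiB of headroom. The JVM also uses memory this sum ignores (code cache, GC structures), so treat a result this tight as a warning.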
Fix 2: Netty-specific tuning
Netty's PooledByteBufAllocator creates up to two direct arenas per CPU core by default (fewer when the direct memory cap is small). On a beefy machine, startup alone can pre-allocate 400 MB before your app handles a single request.
Option A: Remove Netty's own direct memory cap and let the JVM flag govern everything:
-Dio.netty.maxDirectMemory=0
Option B: Switch to heap buffers. Slightly lower throughput, but the memory model is much simpler:
// In your Netty server bootstrap
ServerBootstrap b = new ServerBootstrap();
b.childOption(ChannelOption.ALLOCATOR, new UnpooledByteBufAllocator(false)); // false = heap
Option C: Keep direct buffers but reduce the arena count to shrink the pre-allocation footprint:
-Dio.netty.allocator.numDirectArenas=1
Option C is often the best tradeoff for I/O-heavy services: you keep the performance benefits of direct memory while cutting startup allocation from 400 MB down to ~25 MB.
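To reason about how arena count drives footprint, a back-of-the-envelope estimate helps. The sketch below assumes every direct arena holds exactly one chunk and that chunk size is pageSize shifted left by maxOrder (the io.netty.allocator.pageSize and io.netty.allocator.maxOrder settings). The inputs 8192 and 11 are illustrative only, since defaults differ between Netty releases, and ArenaFootprint is a hypothetical helper. Real usage depends on how many chunks each arena ends up holding, so expect measured numbers to differ.

```java
final class ArenaFootprint {
    /** Chunk size derived from page size and max order: pageSize << maxOrder. */
    static long chunkBytes(int pageSize, int maxOrder) {
        return (long) pageSize << maxOrder;
    }

    /** Rough footprint if every arena holds exactly one chunk (an assumption). */
    static long estimateBytes(int arenas, long chunkBytes) {
        return arenas * chunkBytes;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long chunk = chunkBytes(8192, 11); // illustrative inputs: 16 MiB per chunk
        System.out.println("32 arenas: " + estimateBytes(32, chunk) / mib + " MiB");
        System.out.println(" 1 arena:  " + estimateBytes(1, chunk) / mib + " MiB");
    }
}
```

The point is the linear relationship: cutting the arena count cuts the baseline by the same factor, whatever the chunk size is in your Netty version.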
Fix 3: Find and fix buffer leaks
Raising the limit buys time. If usage climbs steadily and the crash just happens later, you have a leak.
Direct buffers are freed by a Cleaner object tied to garbage collection. If your code holds references or allocates faster than GC can clean up, memory grows without bound. Start by checking what's actually allocated right now, without restarting:
# Requires -XX:NativeMemoryTracking=summary at startup
jcmd <pid> VM.native_memory summary scale=MB
# Older fallback via jmap
jmap -histo <pid> | grep Direct
To disable buffer caching in the NIO layer (Java 9+, useful during profiling):
-Djdk.nio.maxCachedBufferSize=0
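For raw NIO, force immediate release rather than waiting for GC. Note: this uses an internal JDK API that may change in future versions, and on Java 9+ it typically needs --add-exports for java.base/sun.nio.ch and java.base/jdk.internal.ref:
ByteBuffer buf = ByteBuffer.allocateDirect(1024 * 1024);
try {
    // use buf
} finally {
    // Never touch buf (or any slice or duplicate of it) after this point
    if (buf instanceof sun.nio.ch.DirectBuffer) {
        ((sun.nio.ch.DirectBuffer) buf).cleaner().clean();
    }
}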
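Where the same code path allocates over and over, a gentler alternative is to allocate once and reuse the buffer, which removes the dependence on GC timing. This is a minimal sketch under the assumption that one thread owns the buffer; ReusedDirectBuffer and stage() are hypothetical names, not an existing API.

```java
import java.nio.ByteBuffer;

final class ReusedDirectBuffer {
    private final ByteBuffer buf;

    ReusedDirectBuffer(int capacity) {
        // One direct allocation for the lifetime of this object.
        this.buf = ByteBuffer.allocateDirect(capacity);
    }

    /**
     * Copies as much of payload as fits into the shared buffer and
     * returns the number of bytes staged and ready to be written out.
     */
    int stage(byte[] payload) {
        buf.clear();                                   // resets position and limit; frees nothing
        int n = Math.min(payload.length, buf.remaining());
        buf.put(payload, 0, n);
        buf.flip();                                    // switch to read mode for a channel write
        return buf.remaining();
    }

    public static void main(String[] args) {
        ReusedDirectBuffer staging = new ReusedDirectBuffer(64 * 1024);
        byte[] message = new byte[1500];
        long staged = 0;
        for (int i = 0; i < 10_000; i++) {
            staged += staging.stage(message);          // no new direct allocation per iteration
        }
        System.out.println("Staged " + staged + " bytes through one 64 KiB buffer");
    }
}
```

This is not thread-safe as written; give each thread its own instance or guard access if the buffer is shared.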
For Netty, every ByteBuf must be released. Miss one release() call and the buffer leaks for the lifetime of the process:
ByteBuf buf = ctx.alloc().directBuffer(1024);
try {
    // write to buf, pass through pipeline
} finally {
    buf.release(); // decrements ref count; frees when refCnt reaches 0
}
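If the reference-counting contract feels abstract, a toy model makes it concrete. The class below is not Netty code: ToyRefCounted is a hypothetical stand-in that only mimics the behavior described above, where a buffer starts with one reference, retain() adds one, and release() frees the memory when the count reaches zero.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ToyRefCounted {
    // A freshly created buffer starts with a reference count of 1.
    private final AtomicInteger refCnt = new AtomicInteger(1);

    int refCnt() {
        return refCnt.get();
    }

    /** Adds a reference, e.g. before handing the buffer to another owner. */
    ToyRefCounted retain() {
        int current;
        do {
            current = refCnt.get();
            if (current == 0) {
                throw new IllegalStateException("already freed");
            }
        } while (!refCnt.compareAndSet(current, current + 1));
        return this;
    }

    /** Drops a reference; returns true when this call freed the buffer. */
    boolean release() {
        int current;
        do {
            current = refCnt.get();
            if (current == 0) {
                throw new IllegalStateException("released more often than retained");
            }
        } while (!refCnt.compareAndSet(current, current - 1));
        return current == 1;
    }

    public static void main(String[] args) {
        ToyRefCounted buf = new ToyRefCounted();
        buf.retain();                                   // second owner takes a reference
        System.out.println("freed? " + buf.release());  // first owner done, count is now 1
        System.out.println("freed? " + buf.release());  // last owner done, count is now 0
    }
}
```

The takeaway for real code: every retain() and every allocation needs exactly one matching release() somewhere, and whoever holds the last reference is responsible for making that call.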
Turn on Netty's leak detector in staging. It's expensive in production, but invaluable for tracking down exactly where the leak was allocated:
-Dio.netty.leakDetection.level=PARANOID
Leaks show up in the logs as "LEAK: ByteBuf.release() was not called before it's garbage-collected", with a full stack trace pointing at the allocation site.
Fix 4: Force more frequent GC (stopgap only)
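Can't change code right now? You can nudge the JVM to collect more often, which triggers Cleaner callbacks on unreferenced direct buffers sooner. A lower pause goal generally pushes G1 toward smaller, more frequent collections, and ExplicitGCInvokesConcurrent makes explicit System.gc() calls run as concurrent cycles rather than full stop-the-world pauses: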
-XX:+ExplicitGCInvokesConcurrent -XX:MaxGCPauseMillis=50
Alternatively, call System.gc() from a background thread. It's ugly, and it doesn't fix the root cause, but it's saved more than a few on-call engineers at 3 AM:
ScheduledExecutorService cleaner = Executors.newSingleThreadScheduledExecutor();
cleaner.scheduleAtFixedRate(System::gc, 0, 30, TimeUnit.SECONDS);
Treat this as a hotfix. Ship the real fix (explicit releases or a higher memory budget) within the next sprint.
Verify the fix
After applying changes, watch direct memory under realistic load. It should plateau, not climb:
# Requires -XX:NativeMemoryTracking=summary at startup
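# Direct buffers are reported under "Internal" on older JDKs and under "Other" on newer ones (JDK 11+)
watch -n 2 'jcmd $(pgrep -f app.jar) VM.native_memory summary scale=MB | grep -E -A5 "Internal|Other"'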
# Via JMX: works without NativeMemoryTracking
# MBean: java.nio:type=BufferPool,name=direct
# Attributes: Count, MemoryUsed, TotalCapacity
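The same MBean can be read from inside the process through the standard java.lang.management API, which is handy for a health endpoint or a log line. A minimal sketch, with DirectPoolStats as a hypothetical class name:

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

final class DirectPoolStats {
    /** Returns the platform buffer pool named "direct". */
    static BufferPoolMXBean directPool() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool;
            }
        }
        throw new IllegalStateException("this JVM exposes no 'direct' buffer pool");
    }

    public static void main(String[] args) {
        ByteBuffer held = ByteBuffer.allocateDirect(4 * 1024 * 1024);
        BufferPoolMXBean pool = directPool();
        System.out.printf("count=%d memoryUsed=%d totalCapacity=%d%n",
                pool.getCount(), pool.getMemoryUsed(), pool.getTotalCapacity());
        // Reference the buffer afterwards so it stays reachable while the pool is read.
        System.out.println("held buffer capacity: " + held.capacity());
    }
}
```

This pool only reflects buffers created through ByteBuffer.allocateDirect(). Memory that a library obtains by other means may not show up here, so cross-check against process RSS if the numbers look too low.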
With Prometheus + JMX exporter, graph the direct pool's memory-used metric. The exact name depends on your exporter rules, for example:
java_nio_buffer_pool_memory_used_bytes{pool="direct"}
A healthy app shows a flat line after warmup. A leak looks like a ski slope: steady upward drift until the next crash.
If you ran with PARANOID leak detection, a clean run produces zero "LEAK: ByteBuf.release() was not called" lines. Even one is worth investigating.
Quick reference
- Immediate relief: -XX:MaxDirectMemorySize=512m (or higher)
- Netty startup bloat: -Dio.netty.allocator.numDirectArenas=1
- Find leaks: -Dio.netty.leakDetection.level=PARANOID + always call buf.release()
- Monitor: JMX MBean java.nio:type=BufferPool,name=direct or Prometheus java_nio_buffer_pool_memory_used_bytes{pool="direct"}

