## When the JVM Chokes on Garbage Collection

I recently wrestled with this during a heavy data migration task. For the first 10 minutes, the logs looked perfect. Then throughput plummeted, CPU fans started screaming, and the process finally collapsed with a familiar, frustrating message:
java.lang.OutOfMemoryError: GC overhead limit exceeded
This isn't your typical "Java heap space" error. It is the JVM’s way of throwing in the towel. It happens when the Garbage Collector (GC) works overtime but reclaims almost nothing, leaving your application starved of both memory and CPU time.
## The 98/2 Rule

By default, the JVM triggers this error if it spends over 98% of its time performing garbage collection while recovering less than 2% of the heap. It is a vital defensive mechanism. Without this limit, your application would hang indefinitely while the CPU sits pegged at 100% just to free up a few measly bytes.
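To get a feel for how close a running process is to that threshold, you can sample the standard `GarbageCollectorMXBean` counters from inside the application. The sketch below is only an approximation: the class name `GcOverheadProbe` and its helper methods are made up for this post, the 2-second window is arbitrary, and the JVM's internal overhead calculation is not exposed through this API, so treat the number as a rough indicator rather than the exact figure the JVM uses.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: estimates the share of wall-clock time spent in GC.
class GcOverheadProbe {

    /** Sums the accumulated collection time (ms) reported by all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** GC time as a percentage of the sampling window. */
    static double overheadPercent(long gcMillis, long windowMillis) {
        if (windowMillis <= 0) {
            return 0.0;
        }
        return 100.0 * gcMillis / windowMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        Thread.sleep(2_000); // sampling window; the real workload runs on other threads
        long windowMillis = (System.nanoTime() - start) / 1_000_000;
        double percent = overheadPercent(totalGcMillis() - gcBefore, windowMillis);
        System.out.printf("Approximate GC overhead: %.1f%%%n", percent);
    }
}
```

If that percentage sits in the high nineties for several windows in a row, the process is likely heading for the error described above.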
## The Immediate Band-Aid: Expanding the Heap

If your application is simply handling a larger dataset than usual (say, processing a 2GB XML file when you only allocated 1GB), you can give it more breathing room via the `-Xmx` parameter.
# Bump the maximum heap size to 4GB
java -Xmx4g -jar high-load-app.jar
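It is worth confirming that the flag actually reached the JVM, since wrapper scripts and container settings sometimes swallow it. A minimal sketch using the standard `Runtime` API follows; the class name `HeapCeiling` and the `toMiB` helper are placeholders of mine. Note that `maxMemory()` can report slightly less than the `-Xmx` value with some collectors, so compare loosely.

```java
// Hypothetical helper: prints the heap ceiling the running JVM actually sees.
class HeapCeiling {

    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("Max heap (MiB):       " + toMiB(rt.maxMemory()));
        System.out.println("Committed heap (MiB): " + toMiB(rt.totalMemory()));
        System.out.println("Free in committed:    " + toMiB(rt.freeMemory()));
    }
}
```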
## The "Panic Button" Flag

You can technically disable this safety check. I’ve used this once to let a 5-hour migration finish its last 10%, but use it with extreme caution. Disabling it usually results in the app freezing entirely or crashing with a standard Java heap space error moments later.
-XX:-UseGCOverheadLimit
## Hunting the Root Cause

Adding memory is often just delaying the inevitable. If you have a leak, a 16GB heap will fill up just as surely as a 2GB one; it only takes longer. You need to see what is actually hogging the RAM.
### 1. Catch the Crash with a Heap Dump

Don't try to guess. Tell the JVM to save its state the moment it fails so you can inspect the "crime scene" later:
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./diagnostics/crash_dump.hprof
If the app is currently lagging but hasn't crashed yet, grab a live dump using the process ID (PID):
# Get the PID from 'jps' first
jmap -dump:live,format=b,file=debug_dump.hprof [PID]
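If you cannot shell into the box to run `jmap`, the same kind of dump can be triggered from inside the process through HotSpot's `HotSpotDiagnosticMXBean`. This is a sketch under a few assumptions: it only works on HotSpot-based JVMs, the `HeapDumper` class and the output file name are invented for illustration, and the target directory must already exist. The `dumpHeap` call refuses to overwrite an existing file, so the sketch deletes any stale dump first.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: writes a heap dump of the current JVM to disk.
class HeapDumper {

    /**
     * Dumps the heap to the given .hprof file.
     * liveOnly = true keeps only reachable objects, like jmap's "live" option.
     */
    static Path dump(Path target, boolean liveOnly) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            Files.deleteIfExists(target); // dumpHeap fails if the file already exists
            bean.dumpHeap(target.toAbsolutePath().toString(), liveOnly);
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path written = dump(Paths.get("debug_dump.hprof"), true);
        System.out.println("Heap dump written to " + written.toAbsolutePath());
    }
}
```

You could call `dump` from an admin endpoint or a signal handler, but keep in mind that dumping a large heap pauses the application while the file is written.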
### 2. Analyze the Evidence

Load that `.hprof` file into Eclipse Memory Analyzer (MAT) or VisualVM. In MAT, run the "Leak Suspects" report. In my experience, the culprit is usually one of these four:
- Abandoned Collections: A `static HashMap` used for caching that never removes old entries.
- Resource Leaks: Database `ResultSet`s or file streams that weren't wrapped in a `try-with-resources` block.
- Object Churn: Creating 500,000 `String` objects inside a loop instead of using a `StringBuilder`.
- Missing Pagination: Fetching 200,000 rows from a database into a `List<User>` instead of processing them in batches of 500.

### 3. Practical Code Fix: Streaming vs. Loading

Stop loading entire datasets into RAM. If you're using Spring Data JPA, swap the standard list fetch for a stream. This allows the GC to clean up processed objects while the loop is still running.
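// AVOID: Pulls every record into memory at once
List<Transaction> history = repository.findAll();
history.forEach(this::calculateTax);

// BETTER: Processes one record at a time.
// Assumes streamAll() is a custom repository method declared to return
// Stream<Transaction>, and that this runs inside a (read-only) transaction.
try (Stream<Transaction> stream = repository.streamAll()) {
    stream.forEach(this::calculateTax);
}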
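If streaming isn't an option, the "batches of 500" idea from the pagination bullet can be sketched in plain Java. The `BatchProcessor` class below is illustrative only: it drains any `Iterator` in fixed-size chunks, and the integer range in `main` is a stand-in for a database cursor or paged query. In a Spring Data application you would more likely page through the repository, but the memory behaviour is the same: only one batch is reachable at a time, so earlier batches become eligible for collection.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.IntStream;

// Hypothetical helper: processes a large source in fixed-size batches.
class BatchProcessor {

    /** Hands the source to the handler in chunks of batchSize; returns the number of batches. */
    static <T> int processInBatches(Iterator<T> source, int batchSize, Consumer<List<T>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (source.hasNext()) {
            batch.add(source.next());
            if (batch.size() == batchSize) {
                handler.accept(batch);
                batch = new ArrayList<>(batchSize); // drop the reference to the finished batch
                batches++;
            }
        }
        if (!batch.isEmpty()) { // final, partially filled batch
            handler.accept(batch);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // Stand-in for 200,000 rows coming from a database cursor.
        Iterator<Integer> rows = IntStream.range(0, 200_000).iterator();
        int batches = processInBatches(rows, 500, batch -> {
            // Per-batch work goes here, e.g. calculating tax for each row.
        });
        System.out.println("Processed " + batches + " batches");
    }
}
```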
## Verification: Is it actually fixed?

Don't cross your fingers and hope. Use `jstat` to watch GC health in real time while your app works:
# Watch GC statistics every 2 seconds
jstat -gcutil [PID] 2000
Keep an eye on the GCT column (total GC time, in seconds). If it climbs steadily while the O column (old generation utilization, as a percentage) stays at 99%, your leak is still there. A healthy application shows a "sawtooth" pattern: memory climbs, the GC fires, and usage drops back to a clean baseline.
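The same baseline can be read from inside the process with the standard `MemoryPoolMXBean` API, which reports how full each heap pool was right after its most recent collection. This is a rough sketch, not a replacement for `jstat`: the class name `PostGcBaseline` is mine, pool names differ between collectors, and some pools do not report post-collection usage at all, in which case they are skipped.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper: prints heap pool occupancy as measured after the last GC.
class PostGcBaseline {

    /** Used space as a percentage of max, or -1.0 when the pool has no defined max. */
    static double percentUsed(long used, long max) {
        return max > 0 ? 100.0 * used / max : -1.0;
    }

    public static void main(String[] args) throws InterruptedException {
        for (int sample = 0; sample < 5; sample++) {
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                if (pool.getType() != MemoryType.HEAP) {
                    continue; // skip metaspace, code cache, and other non-heap pools
                }
                MemoryUsage afterGc = pool.getCollectionUsage(); // null if unsupported
                if (afterGc == null) {
                    continue;
                }
                System.out.printf("%-25s used after last GC: %d KiB (%.1f%% of max)%n",
                        pool.getName(), afterGc.getUsed() / 1024,
                        percentUsed(afterGc.getUsed(), afterGc.getMax()));
            }
            Thread.sleep(2_000); // same 2-second cadence as the jstat command
        }
    }
}
```

If the post-GC figure for the old generation keeps creeping upward from sample to sample, the baseline is not clean and something is still holding on to objects.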

