The Incident
It's 2 AM and your batch job just died with this:
java.lang.IllegalStateException: Duplicate key userId=1042
    at java.util.stream.Collectors.duplicateKeyException(Collectors.java:133)
    at java.util.stream.Collectors.lambda$toMap$1(Collectors.java:174)
    at java.util.HashMap.merge(HashMap.java:1262)
    at java.util.stream.Collectors.lambda$toMap$2(Collectors.java:174)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
    at com.example.UserService.buildUserMap(UserService.java:58)
The code that blew up looks innocent enough:
Map<Long, User> userMap = users.stream()
    .collect(Collectors.toMap(User::getId, u -> u));
Works fine in dev. Works fine in staging. Fails in prod at 2 AM because prod data has duplicates that your test datasets don't.
Why This Happens
The two-argument Collectors.toMap() has zero tolerance for duplicate keys. The moment two elements map to the same key, it throws IllegalStateException and stops. No "last one wins". No silent overwrite. Just a crash.
On Java 9 and later, the exception message names the key and both colliding values, in this general form:
java.lang.IllegalStateException: Duplicate key <key> (attempted merging values <first value> and <second value>)
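To see the failure in isolation, here is a minimal sketch that feeds two rows with the same key into the two-argument toMap(). The class name, the ids, and the names are made-up sample data, and Map.entry stands in for a real User type.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class; the ids and names are made-up sample data.
class DuplicateKeyDemo {

    // Runs the two-argument toMap() and returns the exception message,
    // or null when every key was unique and the collect succeeded.
    static String tryCollect(List<Map.Entry<Long, String>> rows) {
        try {
            Map<Long, String> byId = rows.stream()
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        List<Map.Entry<Long, String>> rows = List.of(
            Map.entry(1042L, "alice"),
            Map.entry(1042L, "alice-again"), // same key as the row above
            Map.entry(7L, "bob"));
        System.out.println(tryCollect(rows));
    }
}
```

The exact wording of the message depends on the JDK version, so the sketch only surfaces it rather than parsing it.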
Three common culprits:
- The database returned multiple rows for a field you assumed was unique: a stale constraint, soft deletes, or a botched data migration
- A list was assembled from multiple sources and joined without deduplication
- The key function isn't as unique as you thought (a nullable field, for instance, collapses all nulls into a single key)
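The nullable-key culprit is easy to reproduce. The sketch below assumes a hypothetical Account type whose externalId may be null; neither the type nor the field comes from the code above. It shows the collision and one possible guard, which is to filter out keyless elements before collecting. Whether silently dropping them is acceptable depends on your data.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class NullKeyDemo {

    // Hypothetical type with a nullable business key.
    static final class Account {
        final String externalId; // may be null
        final String name;

        Account(String externalId, String name) {
            this.externalId = externalId;
            this.name = name;
        }
    }

    // Returns true when the two-argument toMap() rejects the input.
    // Two accounts without an externalId both map to the key null and collide.
    static boolean collides(List<Account> accounts) {
        try {
            Map<String, Account> byId = accounts.stream()
                .collect(Collectors.toMap(a -> a.externalId, a -> a));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // One possible guard: skip accounts that have no key at all.
    static Map<String, Account> indexByExternalId(List<Account> accounts) {
        return accounts.stream()
            .filter(a -> a.externalId != null)
            .collect(Collectors.toMap(a -> a.externalId, a -> a));
    }
}
```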
Step 1: Find the Duplicate
Before touching the code, confirm what's actually duplicated. Add this diagnostic:
// Find which keys appear more than once
Map<Long, Long> idCount = users.stream()
    .collect(Collectors.groupingBy(User::getId, Collectors.counting()));

idCount.entrySet().stream()
    .filter(e -> e.getValue() > 1)
    .forEach(e -> System.err.println("Duplicate id: " + e.getKey() + " count: " + e.getValue()));
This tells you which keys are duplicated and how many times. Run it once against your production data snapshot. You might find 3 duplicates. You might find 3,000. That scope changes which fix makes sense.
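If you expect to run this check against more than one collection, the diagnostic can be wrapped in a small generic helper. The sketch below is one way to do it; the class and method names are hypothetical. Keys are wrapped in Optional because groupingBy() throws a NullPointerException when the classifier returns null, and null keys are one of the culprits worth counting.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;

class DuplicateKeyReport {

    // Returns every key that occurs more than once, mapped to its occurrence count.
    // A null key is reported as Optional.empty().
    static <T, K> Map<Optional<K>, Long> duplicates(List<T> items, Function<T, K> keyFn) {
        Map<Optional<K>, Long> counts = items.stream()
            .collect(Collectors.groupingBy(
                item -> Optional.ofNullable(keyFn.apply(item)),
                HashMap::new, // explicit factory so the result is known to be mutable
                Collectors.counting()));
        counts.values().removeIf(n -> n <= 1); // keep only the duplicated keys
        return counts;
    }
}
```

With the User type from the earlier snippets, a call would look like DuplicateKeyReport.duplicates(users, User::getId).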
Fix 1: Merge Function (Most Common Fix)
Pass a third argument to toMap(), the merge function, to define what happens when two elements share a key.
Keep the last one (most common intent)
Map<Long, User> userMap = users.stream()
    .collect(Collectors.toMap(
        User::getId,
        u -> u,
        (existing, replacement) -> replacement // last one wins
    ));
Keep the first one
Map<Long, User> userMap = users.stream()
    .collect(Collectors.toMap(
        User::getId,
        u -> u,
        (existing, replacement) -> existing // first one wins
    ));
Merge values when duplicates are intentional
// Example: sum scores for the same userId
Map<Long, Integer> scoreMap = records.stream()
    .collect(Collectors.toMap(
        Record::getUserId,
        Record::getScore,
        Integer::sum
    ));
Pick whichever strategy matches your business requirement. "Last one wins" is fine when you're building a lookup table and the duplicates are a data quality problem. When duplicates carry meaningful data (two transactions for the same order, say), you need a real merge.
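"First" and "last" depend on the order in which elements arrive, which is not always the order you care about. As one illustration of a merge that looks at the data instead, the sketch below keeps the most recently updated row per id. The UserRow type and its updatedAt field are assumptions made for this example; the User class in this article may have nothing equivalent.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

class LatestWinsDemo {

    // Hypothetical row type; updatedAt is an assumed field.
    static final class UserRow {
        final long id;
        final String name;
        final Instant updatedAt;

        UserRow(long id, String name, Instant updatedAt) {
            this.id = id;
            this.name = name;
            this.updatedAt = updatedAt;
        }
    }

    // For each id, keeps the row with the greatest updatedAt,
    // regardless of where it appears in the input list.
    static Map<Long, UserRow> latestById(List<UserRow> rows) {
        return rows.stream().collect(Collectors.toMap(
            r -> r.id,
            r -> r,
            BinaryOperator.maxBy(Comparator.comparing((UserRow r) -> r.updatedAt))));
    }
}
```

BinaryOperator.maxBy returns the first argument when the comparator reports a tie, so rows with identical timestamps fall back to "first one wins".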
Fix 2: Collect to a List of Values Instead
Sometimes you actually need all the values per key, not just one. Don't reach for toMap() here; use groupingBy():
// Map<userId, List<User>>
Map<Long, List<User>> grouped = users.stream()
    .collect(Collectors.groupingBy(User::getId));

// Map<userId, count>
Map<Long, Long> counts = users.stream()
    .collect(Collectors.groupingBy(User::getId, Collectors.counting()));
Classic use case: mapping an order ID to its line items. The duplicates aren't a bug; they're the data model. groupingBy() is built exactly for this.
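The order-to-line-items case might look like the sketch below. LineItem and its fields are hypothetical. The second argument to groupingBy() is a downstream collector, which lets you keep just one field per item or aggregate the group directly.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class OrderGroupingDemo {

    // Hypothetical line item; many items share one orderId by design.
    static final class LineItem {
        private final long orderId;
        private final String sku;
        private final int quantity;

        LineItem(long orderId, String sku, int quantity) {
            this.orderId = orderId;
            this.sku = sku;
            this.quantity = quantity;
        }

        long getOrderId() { return orderId; }
        String getSku() { return sku; }
        int getQuantity() { return quantity; }
    }

    // orderId -> SKUs on that order, in the order they were encountered
    static Map<Long, List<String>> skusByOrder(List<LineItem> items) {
        return items.stream().collect(Collectors.groupingBy(
            LineItem::getOrderId,
            Collectors.mapping(LineItem::getSku, Collectors.toList())));
    }

    // orderId -> total quantity across all of its line items
    static Map<Long, Integer> quantityByOrder(List<LineItem> items) {
        return items.stream().collect(Collectors.groupingBy(
            LineItem::getOrderId,
            Collectors.summingInt(LineItem::getQuantity)));
    }
}
```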
Fix 3: Deduplicate with Logging
When duplicates are bad data and you want to stay alive and leave a paper trail:
Map<Long, User> userMap = users.stream()
    .collect(Collectors.collectingAndThen(
        Collectors.toMap(
            User::getId,
            u -> u,
            (a, b) -> {
                log.warn("Duplicate user id {}, dropping: {}", a.getId(), b);
                return a;
            }
        ),
        Collections::unmodifiableMap
    ));
The service keeps running. The warning log gives whoever owns the data quality issue something concrete to act on: the duplicated ID, the record that was dropped, and the timestamp your logger attaches.
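A merge function fires once per collision, so a key with fifty duplicates produces forty-nine log lines. If you would rather emit one warning per key with a total count, a variant along these lines could work. It is a sketch, not a drop-in replacement: the class and method names are hypothetical, it uses java.util.logging only to stay self-contained (swap in your own logger), and it assumes keyFn never returns null, because groupingBy() rejects null keys.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.logging.Logger;
import java.util.stream.Collectors;

class DedupWithReport {

    private static final Logger LOG = Logger.getLogger(DedupWithReport.class.getName());

    // Keeps the first value seen for each key and logs one warning
    // per duplicated key, including how many values shared it.
    static <K, V> Map<K, V> keepFirst(List<V> values, Function<V, K> keyFn) {
        Map<K, List<V>> grouped = values.stream()
            .collect(Collectors.groupingBy(keyFn, LinkedHashMap::new, Collectors.toList()));

        Map<K, V> result = new LinkedHashMap<>();
        grouped.forEach((key, group) -> {
            if (group.size() > 1) {
                LOG.warning("Duplicate key " + key + ": kept first of " + group.size() + " values");
            }
            result.put(key, group.get(0));
        });
        return Collections.unmodifiableMap(result);
    }
}
```

The trade-off is memory: every value is held in its group until the stream finishes, whereas the merge-function version discards duplicates as it goes.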
Fix 4: Specify the Map Implementation
A fourth argument lets you control the output map type. Handy when order matters:
Map<Long, User> userMap = users.stream()
    .collect(Collectors.toMap(
        User::getId,
        u -> u,
        (existing, replacement) -> replacement,
        LinkedHashMap::new // preserves insertion order
    ));
Use LinkedHashMap::new for insertion order, TreeMap::new for sorted keys. Without this argument, current JDK implementations hand back a plain HashMap, and the API makes no promise about the map type or its ordering.
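The difference between the two factories shows up in the key order of the result. The sketch below collects the same rows into whichever map the caller supplies and returns the keys in iteration order; the class name and sample rows are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;
import java.util.stream.Collectors;

class MapTypeDemo {

    // Collects id -> name with "last one wins" into the supplied map type,
    // then returns the keys in that map's iteration order.
    static List<Long> keyOrder(List<Map.Entry<Long, String>> rows,
                               Supplier<Map<Long, String>> mapFactory) {
        Map<Long, String> map = rows.stream().collect(Collectors.toMap(
            Map.Entry::getKey,
            Map.Entry::getValue,
            (existing, replacement) -> replacement,
            mapFactory));
        return new ArrayList<>(map.keySet());
    }
}
```

Called with LinkedHashMap::new, the keys come back in the order each key was first seen. A later duplicate replaces the value but does not move the key. Called with TreeMap::new, the keys come back in ascending order.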
Verify the Fix
After applying the merge function, write a test that deliberately feeds in duplicates:
@Test
void toMap_withDuplicates_shouldKeepLast() {
    List<User> users = List.of(
        new User(1L, "Alice"),
        new User(1L, "Alice-updated"), // duplicate id
        new User(2L, "Bob")
    );

    Map<Long, User> result = users.stream()
        .collect(Collectors.toMap(
            User::getId,
            u -> u,
            (existing, replacement) -> replacement
        ));

    assertEquals(2, result.size());
    assertEquals("Alice-updated", result.get(1L).getName());
    assertEquals("Bob", result.get(2L).getName());
}
Also trace the duplicates upstream. If the data comes from a database query, check whether the SQL accidentally produces a cross-join or whether the repository layer is returning un-deduplicated rows.
Quick Reference
- Two-arg toMap(): throws on duplicate keys, period
- Three-arg toMap(): merge function handles duplicates; use this in production
- groupingBy(): when multiple values per key is the correct model
- Dedup with logging: when duplicates are bad data and you want evidence
The Real Lesson
Two-argument Collectors.toMap() is a landmine in production code. Dev and staging have clean, hand-crafted test data. Production has five years of imports, re-runs, botched migrations, and records nobody remembers creating.
Make it a team convention: any toMap() that touches data you don't fully control gets a merge function. The three-argument version isn't slower or harder to read. It just forces you to be explicit about what your code does when reality doesn't match your assumptions, which, in production, it won't.

