Fix java.lang.IllegalStateException: Duplicate Key in Collectors.toMap()

The Incident

It's 2 AM and your batch job just died with this:

java.lang.IllegalStateException: Duplicate key userId=1042
  at java.util.stream.Collectors.duplicateKeyException(Collectors.java:133)
  at java.util.stream.Collectors.lambda$toMap$1(Collectors.java:174)
  at java.util.HashMap.merge(HashMap.java:1262)
  at java.util.stream.Collectors.lambda$toMap$2(Collectors.java:174)
  at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
  at com.example.UserService.buildUserMap(UserService.java:58)

The code that blew up looks innocent enough:

Map<Long, User> userMap = users.stream()
    .collect(Collectors.toMap(User::getId, u -> u));

Works fine in dev. Works fine in staging. Fails in prod at 2 AM because prod data has duplicates that your test datasets don't.

Why This Happens

The two-argument Collectors.toMap() has zero tolerance for duplicate keys. The moment two elements map to the same key, it throws IllegalStateException and stops. No "last one wins". No silent overwrite. Just a crash.
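A minimal reproduction makes this concrete. The sketch below uses a hypothetical DuplicateKeyDemo class and plain "id:name" strings instead of the article's User type (both are assumptions for illustration). It only shows that the two-argument collector throws as soon as a key repeats.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class; not part of the original service code.
class DuplicateKeyDemo {

    // Builds an id -> name map with the two-argument toMap(),
    // which rejects duplicate keys.
    static Map<Long, String> strictMap(List<String> rows) {
        return rows.stream()
            .map(row -> row.split(":"))
            .collect(Collectors.toMap(
                parts -> Long.valueOf(parts[0]),
                parts -> parts[1]));
    }

    // Returns true if strictMap() throws IllegalStateException for these rows.
    static boolean throwsOnDuplicate(List<String> rows) {
        try {
            strictMap(rows);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Unique ids: prints false
        System.out.println(throwsOnDuplicate(Arrays.asList("1:Alice", "2:Bob")));
        // Repeated id 1: prints true
        System.out.println(throwsOnDuplicate(Arrays.asList("1:Alice", "1:Alice-updated")));
    }
}
```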

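On Java 9 and later, the exception message names the duplicated key and both colliding values, in this format:

java.lang.IllegalStateException: Duplicate key <key> (attempt to merge values <first value> and <second value>)

On Java 8 the message is just "Duplicate key" followed by one of the colliding values rather than the key, so you have to work out the key yourself.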

Three common culprits:

The database returned multiple rows for a field you assumed was unique — stale constraint, soft deletes, or a botched data migration
A list was assembled from multiple sources and joined without deduplication
The key function isn't as unique as you thought (a nullable field, for instance, collapses all nulls into a single key)
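The nullable-key case is the easiest of the three to miss, so here is a small sketch of it. The NullableKeyDemo class and the {externalId, email} row layout are made up for illustration. The unfiltered version throws because both rows without an external id land on the same null key. The filtered version drops them first, which may or may not be what your job should do.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo; each row is {externalId, email} and externalId may be null.
class NullableKeyDemo {

    // Two rows with a null externalId collide on the same (null) key and throw.
    static Map<String, String> byExternalIdUnfiltered(List<String[]> rows) {
        return rows.stream()
            .collect(Collectors.toMap(row -> row[0], row -> row[1]));
    }

    // Dropping rows without a usable key before collecting avoids the collision.
    static Map<String, String> byExternalId(List<String[]> rows) {
        return rows.stream()
            .filter(row -> row[0] != null)
            .collect(Collectors.toMap(row -> row[0], row -> row[1]));
    }

    static List<String[]> sampleRows() {
        return Arrays.asList(
            new String[] {null, "a@example.com"},
            new String[] {null, "b@example.com"},
            new String[] {"x1", "c@example.com"});
    }

    public static void main(String[] args) {
        System.out.println(byExternalId(sampleRows()).size()); // prints 1
        try {
            byExternalIdUnfiltered(sampleRows());
        } catch (IllegalStateException e) {
            System.out.println("null keys collided");
        }
    }
}
```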

Step 1 — Find the Duplicate

Before touching the code, confirm what's actually duplicated. Add this diagnostic:

// Find which keys appear more than once
Map<Long, Long> idCount = users.stream()
    .collect(Collectors.groupingBy(User::getId, Collectors.counting()));

idCount.entrySet().stream()
    .filter(e -> e.getValue() > 1)
    .forEach(e -> System.err.println("Duplicate id: " + e.getKey() + " count: " + e.getValue()));

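This tells you which keys are duplicated and how many times. Run it once against a snapshot of your production data. You might find 3 duplicates. You might find 3,000. That scope changes which fix makes sense. One caveat: groupingBy() throws NullPointerException when the classifier returns null, so filter out null IDs first if the key can be null.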
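Counts tell you the scope, but you usually also want to see the colliding records themselves. One way to get them is sketched below as a generic helper. DuplicateFinder and findDuplicates are hypothetical names, and the sample rows are plain "id:name" strings rather than real User objects.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper; not part of the JDK or of the original service.
class DuplicateFinder {

    // Returns only the keys that occur more than once,
    // each mapped to every element that shares that key.
    static <T, K> Map<K, List<T>> findDuplicates(
            Collection<T> items, Function<? super T, ? extends K> keyFn) {
        Map<K, List<T>> grouped = items.stream()
            .collect(Collectors.groupingBy(keyFn));
        return grouped.entrySet().stream()
            .filter(entry -> entry.getValue().size() > 1)
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("1042:alice", "1042:alice-old", "7:bob");
        Map<String, List<String>> dupes = findDuplicates(rows, row -> row.split(":")[0]);
        dupes.forEach((id, group) ->
            System.err.println("Duplicate id " + id + ": " + group));
    }
}
```

The second toMap() call is safe without a merge function here because the keys come from a map's entry set, which cannot contain duplicates.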

Fix 1 — Merge Function (Most Common Fix)

Pass a third argument to toMap() — the merge function — to define what happens when two elements share a key.

Keep the last one (most common intent)

Map<Long, User> userMap = users.stream()
    .collect(Collectors.toMap(
        User::getId,
        u -> u,
        (existing, replacement) -> replacement  // last one wins
    ));

Keep the first one

Map<Long, User> userMap = users.stream()
    .collect(Collectors.toMap(
        User::getId,
        u -> u,
        (existing, replacement) -> existing  // first one wins
    ));

Merge values when duplicates are intentional

// Example: sum scores for the same userId
Map<Long, Integer> scoreMap = records.stream()
    .collect(Collectors.toMap(
        Record::getUserId,
        Record::getScore,
        Integer::sum
    ));

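Pick whichever strategy matches your business requirement. "Last one wins" is fine when you're building a lookup table and the duplicates are a data quality problem. When duplicates carry meaningful data (two transactions for the same order, say) you need a real merge. Keep in mind that "first" and "last" follow the stream's encounter order, so they only mean something when the source is ordered: a List, or a query with an ORDER BY.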
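If "last" really means "most recently updated", it can be safer to say so in the merge function than to rely on row order. A minimal sketch, assuming each record carries an update timestamp: the UserRow class and its updatedAtMillis field are hypothetical, since the article's User type is not shown with one.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

class NewestWins {

    // Hypothetical record shape used only for this example.
    static final class UserRow {
        final long id;
        final String name;
        final long updatedAtMillis;

        UserRow(long id, String name, long updatedAtMillis) {
            this.id = id;
            this.name = name;
            this.updatedAtMillis = updatedAtMillis;
        }
    }

    private static final Comparator<UserRow> BY_UPDATED =
        Comparator.comparingLong(row -> row.updatedAtMillis);

    // On a key collision, keeps the row with the larger timestamp,
    // regardless of the order in which the rows arrive.
    static Map<Long, UserRow> newestById(List<UserRow> rows) {
        return rows.stream()
            .collect(Collectors.toMap(
                row -> row.id,
                Function.identity(),
                BinaryOperator.maxBy(BY_UPDATED)));
    }

    public static void main(String[] args) {
        List<UserRow> rows = Arrays.asList(
            new UserRow(1L, "Alice-updated", 200L),
            new UserRow(1L, "Alice", 100L),
            new UserRow(2L, "Bob", 150L));
        System.out.println(newestById(rows).get(1L).name); // prints Alice-updated
    }
}
```

When two rows have the same timestamp, BinaryOperator.maxBy keeps the one already in the map.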

Fix 2 — Collect to a List of Values Instead

Sometimes you actually need all the values per key, not just one. Don't reach for toMap() here — use groupingBy():

// Map<userId, List<User>>
Map<Long, List<User>> grouped = users.stream()
    .collect(Collectors.groupingBy(User::getId));

// Map<userId, count>
Map<Long, Long> counts = users.stream()
    .collect(Collectors.groupingBy(User::getId, Collectors.counting()));

Classic use case: mapping an order ID to its line items. The duplicates aren't a bug — they're the data model. groupingBy() is built exactly for this.
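The order-to-line-items case might look like the sketch below. The LineItem class and its getters are hypothetical. The point is that a downstream collector (mapping, summingInt) lets groupingBy() shape the per-key value without any merge function.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class OrderGrouping {

    // Hypothetical line item type used only for this example.
    static final class LineItem {
        private final String orderId;
        private final String sku;
        private final int quantity;

        LineItem(String orderId, String sku, int quantity) {
            this.orderId = orderId;
            this.sku = sku;
            this.quantity = quantity;
        }

        String getOrderId() { return orderId; }
        String getSku() { return sku; }
        int getQuantity() { return quantity; }
    }

    // orderId -> list of SKUs on that order
    static Map<String, List<String>> skusByOrder(List<LineItem> items) {
        return items.stream()
            .collect(Collectors.groupingBy(
                LineItem::getOrderId,
                Collectors.mapping(LineItem::getSku, Collectors.toList())));
    }

    // orderId -> total quantity across all of its line items
    static Map<String, Integer> quantityByOrder(List<LineItem> items) {
        return items.stream()
            .collect(Collectors.groupingBy(
                LineItem::getOrderId,
                Collectors.summingInt(LineItem::getQuantity)));
    }

    public static void main(String[] args) {
        List<LineItem> items = Arrays.asList(
            new LineItem("A", "sku-1", 2),
            new LineItem("A", "sku-2", 3),
            new LineItem("B", "sku-3", 1));
        System.out.println(skusByOrder(items).get("A"));     // prints [sku-1, sku-2]
        System.out.println(quantityByOrder(items).get("A")); // prints 5
    }
}
```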

Fix 3 — Deduplicate with Logging

When duplicates are bad data and you want to stay alive and leave a paper trail:

Map<Long, User> userMap = users.stream()
    .collect(Collectors.collectingAndThen(
        Collectors.toMap(
            User::getId,
            u -> u,
            (a, b) -> {
                log.warn("Duplicate user id {}, dropping: {}", a.getId(), b);
                return a;
            }
        ),
        Collections::unmodifiableMap
    ));

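The service keeps running. Each warning names the duplicated ID and the record that was dropped, and the logger's timestamp says when it happened. That gives whoever owns the data quality issue something concrete to act on. Note that the merge function runs once per collision, so 3,000 duplicates mean 3,000 log lines.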
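If one log line per collision is too noisy, a possible variation is to collect the dropped records and report them once after the stream finishes. The sketch below assumes a sequential stream, because the merge function writes to a plain ArrayList. DedupWithReport and its Result holder are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class DedupWithReport {

    // Holds the deduplicated map plus every element that lost a key collision.
    static final class Result<K, T> {
        final Map<K, T> kept;
        final List<T> dropped;

        Result(Map<K, T> kept, List<T> dropped) {
            this.kept = kept;
            this.dropped = dropped;
        }
    }

    // First element per key wins; later ones are recorded instead of logged one by one.
    // Sequential streams only: the merge function mutates a non-thread-safe list.
    static <T, K> Result<K, T> keepFirst(
            Collection<T> items, Function<? super T, ? extends K> keyFn) {
        List<T> dropped = new ArrayList<>();
        Map<K, T> kept = items.stream()
            .collect(Collectors.toMap(
                keyFn,
                Function.identity(),
                (first, duplicate) -> {
                    dropped.add(duplicate);
                    return first;
                },
                LinkedHashMap::new));
        return new Result<>(kept, dropped);
    }

    public static void main(String[] args) {
        Result<String, String> result = keepFirst(
            Arrays.asList("1042:alice", "1042:alice-old", "7:bob"),
            row -> row.split(":")[0]);
        // One summary line instead of one warning per collision.
        System.err.println("Dropped " + result.dropped.size()
            + " duplicate rows: " + result.dropped);
    }
}
```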

Fix 4 — Specify the Map Implementation

A fourth argument lets you control the output map type. Handy when order matters:

Map<Long, User> userMap = users.stream()
    .collect(Collectors.toMap(
        User::getId,
        u -> u,
        (existing, replacement) -> replacement,
        LinkedHashMap::new  // preserves insertion order
    ));

Use LinkedHashMap::new for insertion order, TreeMap::new for sorted keys. Without this argument, the collector makes no promise about the map type. In current JDKs you get a plain HashMap, with no ordering guarantees.
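To see the difference, the sketch below feeds the same rows into both map types. MapTypeDemo and the "id:name" row format are assumptions for illustration. One detail worth knowing: with LinkedHashMap and a last-wins merge, a duplicated key keeps the position of its first appearance and only the value is replaced.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Supplier;
import java.util.stream.Collectors;

class MapTypeDemo {

    // Collects "id:name" rows into whatever map the factory supplies; last value wins.
    static Map<Integer, String> collectInto(
            List<String> rows, Supplier<Map<Integer, String>> mapFactory) {
        return rows.stream()
            .map(row -> row.split(":"))
            .collect(Collectors.toMap(
                parts -> Integer.valueOf(parts[0]),
                parts -> parts[1],
                (existing, replacement) -> replacement,
                mapFactory));
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("3:Carol", "1:Alice", "3:Carol-updated", "2:Bob");

        // Insertion order; key 3 stays first, its value is replaced.
        System.out.println(collectInto(rows, LinkedHashMap::new));
        // prints {3=Carol-updated, 1=Alice, 2=Bob}

        // Sorted by key.
        System.out.println(collectInto(rows, TreeMap::new));
        // prints {1=Alice, 2=Bob, 3=Carol-updated}
    }
}
```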

Verify the Fix

After applying the merge function, write a test that deliberately feeds in duplicates:

@Test
void toMap_withDuplicates_shouldKeepLast() {
    List<User> users = List.of(
        new User(1L, "Alice"),
        new User(1L, "Alice-updated"),  // duplicate id
        new User(2L, "Bob")
    );

    Map<Long, User> result = users.stream()
        .collect(Collectors.toMap(
            User::getId,
            u -> u,
            (existing, replacement) -> replacement
        ));

    assertEquals(2, result.size());
    assertEquals("Alice-updated", result.get(1L).getName());
    assertEquals("Bob", result.get(2L).getName());
}

Also trace the duplicates upstream. If the data comes from a database query, check whether a JOIN is multiplying rows (a one-to-many join or an accidental cross join) or whether the repository layer is returning un-deduplicated rows.

Quick Reference

Two-arg toMap() — throws on duplicate keys, period
Three-arg toMap() — merge function handles duplicates; use this in production
groupingBy() — when multiple values per key is the correct model
Dedup with logging — when duplicates are bad data and you want evidence

The Real Lesson

Two-argument Collectors.toMap() is a landmine in production code. Dev and staging have clean, hand-crafted test data. Production has five years of imports, re-runs, botched migrations, and records nobody remembers creating.

Make it a team convention: any toMap() that touches data you don't fully control gets a merge function. The three-argument version isn't slower or harder to read. It just forces you to be explicit about what your code does when reality doesn't match your assumptions — which, in production, it won't.
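One way to make that convention visible at the call site is a tiny set of named merge policies. This is only a sketch: MergePolicies and its three methods are hypothetical, not a JDK or library API. The failWith variant is for the case where you genuinely expect uniqueness and want a crash, but with a message that says which map was being built.

```java
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical utility class; the names are illustrative only.
final class MergePolicies {

    private MergePolicies() {}

    // Keeps the value that was seen first for a key.
    static <T> BinaryOperator<T> keepFirst() {
        return (first, second) -> first;
    }

    // Keeps the value that was seen last for a key.
    static <T> BinaryOperator<T> keepLast() {
        return (first, second) -> second;
    }

    // Still fails on duplicates, but says where and with which values.
    static <T> BinaryOperator<T> failWith(String context) {
        return (first, second) -> {
            throw new IllegalStateException(
                context + ": duplicate values " + first + " and " + second);
        };
    }

    public static void main(String[] args) {
        Map<Integer, String> byLength = Stream.of("a", "bb", "cc")
            .collect(Collectors.toMap(String::length, word -> word, keepLast()));
        System.out.println(byLength.get(2)); // prints cc
    }
}
```

A call such as Collectors.toMap(User::getId, u -> u, MergePolicies.keepLast()) then reads as a stated decision rather than an accident.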