Fix PostgreSQL 'could not create unique index' When Duplicate Data Exists

The Error

You run a migration to add a UNIQUE constraint, and PostgreSQL stops you cold:

ERROR: could not create unique index "users_email_key"
DETAIL: Key (email)=(user@example.com) is duplicated.

PostgreSQL scans the entire table before building any index. The moment it finds two rows sharing the same value, it aborts — the constraint never gets created.

Three situations cause this most often:

A migration adds UNIQUE to a column that was previously unconstrained
Data from multiple sources got merged without enforcing uniqueness first
A bug in application code allowed duplicate inserts before any guard was in place

Find the Duplicates First

Don't touch anything yet. Start by seeing exactly what you're dealing with:

SELECT email, COUNT(*) AS cnt
FROM users
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY cnt DESC;

This lists every duplicated email alongside its count. On a table with millions of rows, tack on LIMIT 100 to get a quick read on the scale before you commit to a cleanup strategy.

To see the full rows — not just the offending values — use a window function:

SELECT *
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
  FROM users
) ranked
WHERE rn > 1;

Rows where rn > 1 are duplicates. The row with rn = 1 is the one you keep.

Quick Fix: Delete the Duplicate Rows

If you can safely discard the duplicates, keep the row with the lowest id and delete the rest:

DELETE FROM users
WHERE id IN (
  SELECT id
  FROM (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
    FROM users
  ) ranked
  WHERE rn > 1
);

Wrap it in a transaction so you can verify the result before it becomes permanent:

BEGIN;

DELETE FROM users
WHERE id IN (
  SELECT id
  FROM (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
    FROM users
  ) ranked
  WHERE rn > 1
);

-- Should return 0 rows if the cleanup worked
SELECT email, COUNT(*)
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

-- Clean? Commit. Something off? ROLLBACK.
COMMIT;

A real-world users table with 500,000 rows and a few hundred duplicates typically completes this delete in under a second. Larger tables with millions of duplicates may need batched deletes to avoid locking the table for too long.

When You Can't Delete — Merge or Nullify Instead

Deleting isn't always an option. Other tables might reference those duplicate rows via foreign keys, or business rules might require keeping every record.

Option A: Reassign foreign key references, then delete

-- Move all orders from the duplicate user (id=99) to the canonical one (id=42)
UPDATE orders SET user_id = 42 WHERE user_id = 99;

-- Now it's safe to remove the duplicate
DELETE FROM users WHERE id = 99;

Option B: Nullify the duplicate values

PostgreSQL's unique index ignores NULL values entirely — multiple NULLs in the same column don't conflict. So if the column permits NULL, you can blank out the duplicates rather than deleting the rows:

UPDATE users
SET email = NULL
WHERE id IN (
  SELECT id
  FROM (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
    FROM users
  ) ranked
  WHERE rn > 1
);

Now Add the Constraint

Table is clean. Time to create the constraint:

-- Preferred: named constraint you can drop or reference later
ALTER TABLE users ADD CONSTRAINT users_email_key UNIQUE (email);

-- Alternative: standalone index
CREATE UNIQUE INDEX users_email_key ON users (email);

Got a large production table? Use CONCURRENTLY. Without it, PostgreSQL holds an AccessShareLock during the entire index build — blocking writes for however long that takes on, say, a 50 million-row table.

CREATE UNIQUE INDEX CONCURRENTLY users_email_key ON users (email);

One catch: CONCURRENTLY can't run inside a transaction block. Run it as a standalone statement.

Prevent It from Happening Again in Migrations

If this error surfaced inside a migration script, bundle the deduplication step directly into the migration — don't rely on the data being clean before the migration runs:

-- Migration: 20240501_add_unique_email.sql

-- Step 1: Remove duplicates
DELETE FROM users
WHERE id IN (
  SELECT id
  FROM (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
    FROM users
  ) ranked
  WHERE rn > 1
);

-- Step 2: Add the constraint
ALTER TABLE users ADD CONSTRAINT users_email_key UNIQUE (email);

Packaging both steps together means the migration handles whatever state the data happens to be in. No surprises on the next environment.

Verify the Fix

-- 1. Confirm no duplicates remain
SELECT email, COUNT(*)
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
-- Expected: 0 rows

-- 2. Confirm the constraint exists in the catalog
SELECT conname, contype
FROM pg_constraint
WHERE conrelid = 'users'::regclass
  AND conname = 'users_email_key';
-- Expected: 1 row, contype = 'u'

-- 3. Confirm the constraint actually rejects duplicates
INSERT INTO users (email) VALUES ('user@example.com');
INSERT INTO users (email) VALUES ('user@example.com'); -- must fail
-- Expected: ERROR: duplicate key value violates unique constraint