The Error
You run a migration to add a UNIQUE constraint, and PostgreSQL stops you cold:
ERROR: could not create unique index "users_email_key"
DETAIL: Key (email)=(user@example.com) is duplicated.
PostgreSQL scans the entire table before building any index. The moment it finds two rows sharing the same value, it aborts โ the constraint never gets created.
Three situations cause this most often:
- A migration adds
UNIQUEto a column that was previously unconstrained - Data from multiple sources got merged without enforcing uniqueness first
- A bug in application code allowed duplicate inserts before any guard was in place
Find the Duplicates First
Don't touch anything yet. Start by seeing exactly what you're dealing with:
SELECT email, COUNT(*) AS cnt
FROM users
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY cnt DESC;
This lists every duplicated email alongside its count. On a table with millions of rows, tack on LIMIT 100 to get a quick read on the scale before you commit to a cleanup strategy.
To see the full rows โ not just the offending values โ use a window function:
SELECT *
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
FROM users
) ranked
WHERE rn > 1;
Rows where rn > 1 are duplicates. The row with rn = 1 is the one you keep.
Quick Fix: Delete the Duplicate Rows
If you can safely discard the duplicates, keep the row with the lowest id and delete the rest:
DELETE FROM users
WHERE id IN (
SELECT id
FROM (
SELECT id,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
FROM users
) ranked
WHERE rn > 1
);
Wrap it in a transaction so you can verify the result before it becomes permanent:
BEGIN;
DELETE FROM users
WHERE id IN (
SELECT id
FROM (
SELECT id,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
FROM users
) ranked
WHERE rn > 1
);
-- Should return 0 rows if the cleanup worked
SELECT email, COUNT(*)
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
-- Clean? Commit. Something off? ROLLBACK.
COMMIT;
A real-world users table with 500,000 rows and a few hundred duplicates typically completes this delete in under a second. Larger tables with millions of duplicates may need batched deletes to avoid locking the table for too long.
When You Can't Delete โ Merge or Nullify Instead
Deleting isn't always an option. Other tables might reference those duplicate rows via foreign keys, or business rules might require keeping every record.
Option A: Reassign foreign key references, then delete
-- Move all orders from the duplicate user (id=99) to the canonical one (id=42)
UPDATE orders SET user_id = 42 WHERE user_id = 99;
-- Now it's safe to remove the duplicate
DELETE FROM users WHERE id = 99;
Option B: Nullify the duplicate values
PostgreSQL's unique index ignores NULL values entirely โ multiple NULLs in the same column don't conflict. So if the column permits NULL, you can blank out the duplicates rather than deleting the rows:
UPDATE users
SET email = NULL
WHERE id IN (
SELECT id
FROM (
SELECT id,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
FROM users
) ranked
WHERE rn > 1
);
Now Add the Constraint
Table is clean. Time to create the constraint:
-- Preferred: named constraint you can drop or reference later
ALTER TABLE users ADD CONSTRAINT users_email_key UNIQUE (email);
-- Alternative: standalone index
CREATE UNIQUE INDEX users_email_key ON users (email);
Got a large production table? Use CONCURRENTLY. Without it, PostgreSQL holds an AccessShareLock during the entire index build โ blocking writes for however long that takes on, say, a 50 million-row table.
CREATE UNIQUE INDEX CONCURRENTLY users_email_key ON users (email);
One catch: CONCURRENTLY can't run inside a transaction block. Run it as a standalone statement.
Prevent It from Happening Again in Migrations
If this error surfaced inside a migration script, bundle the deduplication step directly into the migration โ don't rely on the data being clean before the migration runs:
-- Migration: 20240501_add_unique_email.sql
-- Step 1: Remove duplicates
DELETE FROM users
WHERE id IN (
SELECT id
FROM (
SELECT id,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
FROM users
) ranked
WHERE rn > 1
);
-- Step 2: Add the constraint
ALTER TABLE users ADD CONSTRAINT users_email_key UNIQUE (email);
Packaging both steps together means the migration handles whatever state the data happens to be in. No surprises on the next environment.
Verify the Fix
-- 1. Confirm no duplicates remain
SELECT email, COUNT(*)
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
-- Expected: 0 rows
-- 2. Confirm the constraint exists in the catalog
SELECT conname, contype
FROM pg_constraint
WHERE conrelid = 'users'::regclass
AND conname = 'users_email_key';
-- Expected: 1 row, contype = 'u'
-- 3. Confirm the constraint actually rejects duplicates
INSERT INTO users (email) VALUES ('user@example.com');
INSERT INTO users (email) VALUES ('user@example.com'); -- must fail
-- Expected: ERROR: duplicate key value violates unique constraint

