Fix PostgreSQL 'invalid input syntax for type integer: "abc"' Error

The situation

It's 2 AM. A batch import job just died halfway through, leaving your table in a half-loaded state. The logs show:

ERROR: invalid input syntax for type integer: "abc"
LINE 1: INSERT INTO orders (user_id, amount) VALUES ('abc', 100);
                                                      ^

Or maybe it surfaces through your application driver:

psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type integer: "abc"

PostgreSQL is strict about types. Unlike MySQL, it won't silently coerce "abc" into 0 — it throws an error and rolls back the whole transaction. Annoying at 2 AM, but genuinely useful behavior once you fix the root cause.

Debug: find what's actually being inserted

The error message gives you two clues: the target type (integer, numeric, date, uuid) and the exact offending value. Start there.

-- Reproduce it manually
SELECT 'abc'::integer;
-- ERROR: invalid input syntax for type integer: "abc"

-- Test edge cases from your actual data
SELECT '42'::integer;   -- OK: returns 42
SELECT ' 42 '::integer; -- OK: trims whitespace
SELECT '42.5'::integer; -- ERROR: not an integer (use numeric or round first)
SELECT ''::integer;     -- ERROR: empty string is not zero
SELECT NULL::integer;   -- OK: NULL is always valid

These are the usual suspects in the wild:

CSV exports where the header row got mixed into the data rows
Empty strings ("") where NULL was expected
Decimal numbers ("12.50") being inserted into an integer column
IDs that look numeric but contain letters ("USR-042")
Locale-formatted numbers ("1,234" instead of 1234)

Find the bad rows before inserting

When loading from a staging table or CSV, scan for trouble before you touch the real table. One regex catches most problems:

-- Assuming you loaded CSV into a staging table with all TEXT columns
SELECT user_id, amount
FROM staging_orders
WHERE user_id !~ '^-?[0-9]+$'
   OR user_id = '';

-- More portable version that also handles whitespace
SELECT user_id
FROM staging_orders
WHERE user_id IS NOT NULL
  AND user_id !~ '^\s*-?[0-9]+\s*$';

PostgreSQL 14+ lets you write a custom try_cast function for a cleaner approach, but the regex above works on every version.

Fix 1: Clean the data before inserting

Fixing the source data beats working around it in SQL. Less clever, more reliable.

-- Empty strings become NULL instead of exploding
INSERT INTO orders (user_id, amount)
SELECT
  NULLIF(user_id, '')::integer,
  amount::numeric
FROM staging_orders
WHERE user_id ~ '^-?[0-9]+$' OR user_id IS NULL;

-- Strip commas from locale-formatted numbers like "1,234"
INSERT INTO orders (user_id, amount)
SELECT
  user_id::integer,
  REPLACE(amount, ',', '')::numeric
FROM staging_orders;

Fix 2: Validate in application code before the query runs

Bad data from user input or an external API? Catch it in your application layer — before it ever reaches Postgres.

Python (psycopg2 / psycopg3):

def safe_insert_order(conn, user_id_raw, amount_raw):
    try:
        user_id = int(user_id_raw)
    except (ValueError, TypeError):
        raise ValueError(f"Invalid user_id: {user_id_raw!r}")

    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO orders (user_id, amount) VALUES (%s, %s)",
            (user_id, amount_raw)
        )
    conn.commit()

Node.js (pg):

async function insertOrder(client, userIdRaw, amount) {
  const userId = parseInt(userIdRaw, 10);
  if (isNaN(userId)) {
    throw new Error(`Invalid user_id: ${userIdRaw}`);
  }
  await client.query(
    'INSERT INTO orders (user_id, amount) VALUES ($1, $2)',
    [userId, amount]
  );
}

Fix 3: Add a CHECK constraint or domain type

If bad data keeps sneaking in at the database level, lock it down with a constraint. This way even a buggy migration script can't break things silently.

-- Simple check constraint on one column
ALTER TABLE orders
  ADD CONSTRAINT chk_user_id_positive CHECK (user_id > 0);

-- Or define a reusable domain type across multiple tables
CREATE DOMAIN positive_int AS integer CHECK (VALUE > 0);

ALTER TABLE orders ALTER COLUMN user_id TYPE positive_int;

Fix 4: Bulk load with COPY and a staging table

For large CSV imports, skip INSERT entirely. Load into an all-text staging table first, validate, then cast. Recovering from a bad staging load is painless. Recovering mid-transaction on a production table is not.

-- Step 1: staging table with all TEXT columns
CREATE TEMP TABLE staging_orders (
  user_id    TEXT,
  amount     TEXT,
  created_at TEXT
);

-- Step 2: load the raw CSV
COPY staging_orders FROM '/tmp/orders.csv' CSV HEADER;

-- Step 3: validate — this should return 0
SELECT COUNT(*) FROM staging_orders
WHERE user_id !~ '^[0-9]+$';

-- Step 4: cast and insert only when validation passes
INSERT INTO orders (user_id, amount, created_at)
SELECT
  user_id::integer,
  amount::numeric,
  created_at::timestamptz
FROM staging_orders;

Verify the fix worked

-- Row count should match your source file
SELECT COUNT(*) FROM orders;

-- Make sure no unexpected NULLs slipped in
SELECT COUNT(*) FROM orders WHERE user_id IS NULL;

-- Spot-check the most recently inserted rows
SELECT user_id, amount FROM orders ORDER BY id DESC LIMIT 10;

If you wrapped the import in an explicit transaction, confirm it committed:

-- In psql
\echo :AUTOCOMMIT
-- or
SELECT txid_current();

Lessons learned

External data is never the type you think it is. Validate or sanitize everything before inserting into a typed column — CSVs, webhooks, and legacy systems all lie.
Staging tables are worth the extra step. Load as TEXT, validate, then cast. You'll thank yourself the first time a 500k-row import has 3 bad rows at line 450,000.
Empty string ≠ NULL in PostgreSQL. NULLIF(val, '') is your friend. MySQL quietly accepts "" in numeric columns; PostgreSQL will not.
Parameterized queries do type conversion for you. Use %s or $1 placeholders with properly typed Python or Node values. String concatenation bypasses that safety net completely.
The error message already tells you the answer. PostgreSQL always quotes the offending value — use it to trace back to the exact source row rather than hunting through the whole dataset.