The situation
It's 2 AM. A batch import job just died halfway through, leaving your table in a half-loaded state. The logs show:
ERROR: invalid input syntax for type integer: "abc"
LINE 1: INSERT INTO orders (user_id, amount) VALUES ('abc', 100);
^
Or maybe it surfaces through your application driver:
psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type integer: "abc"
PostgreSQL is strict about types. Unlike MySQL, it won't silently coerce "abc" into 0 โ it throws an error and rolls back the whole transaction. Annoying at 2 AM, but genuinely useful behavior once you fix the root cause.
Debug: find what's actually being inserted
The error message gives you two clues: the target type (integer, numeric, date, uuid) and the exact offending value. Start there.
-- Reproduce it manually
SELECT 'abc'::integer;
-- ERROR: invalid input syntax for type integer: "abc"
-- Test edge cases from your actual data
SELECT '42'::integer; -- OK: returns 42
SELECT ' 42 '::integer; -- OK: trims whitespace
SELECT '42.5'::integer; -- ERROR: not an integer (use numeric or round first)
SELECT ''::integer; -- ERROR: empty string is not zero
SELECT NULL::integer; -- OK: NULL is always valid
These are the usual suspects in the wild:
- CSV exports where the header row got mixed into the data rows
- Empty strings (
"") whereNULLwas expected - Decimal numbers (
"12.50") being inserted into anintegercolumn - IDs that look numeric but contain letters (
"USR-042") - Locale-formatted numbers (
"1,234"instead of1234)
Find the bad rows before inserting
When loading from a staging table or CSV, scan for trouble before you touch the real table. One regex catches most problems:
-- Assuming you loaded CSV into a staging table with all TEXT columns
SELECT user_id, amount
FROM staging_orders
WHERE user_id !~ '^-?[0-9]+$'
OR user_id = '';
-- More portable version that also handles whitespace
SELECT user_id
FROM staging_orders
WHERE user_id IS NOT NULL
AND user_id !~ '^\s*-?[0-9]+\s*$';
PostgreSQL 14+ lets you write a custom try_cast function for a cleaner approach, but the regex above works on every version.
Fix 1: Clean the data before inserting
Fixing the source data beats working around it in SQL. Less clever, more reliable.
-- Empty strings become NULL instead of exploding
INSERT INTO orders (user_id, amount)
SELECT
NULLIF(user_id, '')::integer,
amount::numeric
FROM staging_orders
WHERE user_id ~ '^-?[0-9]+$' OR user_id IS NULL;
-- Strip commas from locale-formatted numbers like "1,234"
INSERT INTO orders (user_id, amount)
SELECT
user_id::integer,
REPLACE(amount, ',', '')::numeric
FROM staging_orders;
Fix 2: Validate in application code before the query runs
Bad data from user input or an external API? Catch it in your application layer โ before it ever reaches Postgres.
Python (psycopg2 / psycopg3):
def safe_insert_order(conn, user_id_raw, amount_raw):
try:
user_id = int(user_id_raw)
except (ValueError, TypeError):
raise ValueError(f"Invalid user_id: {user_id_raw!r}")
with conn.cursor() as cur:
cur.execute(
"INSERT INTO orders (user_id, amount) VALUES (%s, %s)",
(user_id, amount_raw)
)
conn.commit()
Node.js (pg):
async function insertOrder(client, userIdRaw, amount) {
const userId = parseInt(userIdRaw, 10);
if (isNaN(userId)) {
throw new Error(`Invalid user_id: ${userIdRaw}`);
}
await client.query(
'INSERT INTO orders (user_id, amount) VALUES ($1, $2)',
[userId, amount]
);
}
Fix 3: Add a CHECK constraint or domain type
If bad data keeps sneaking in at the database level, lock it down with a constraint. This way even a buggy migration script can't break things silently.
-- Simple check constraint on one column
ALTER TABLE orders
ADD CONSTRAINT chk_user_id_positive CHECK (user_id > 0);
-- Or define a reusable domain type across multiple tables
CREATE DOMAIN positive_int AS integer CHECK (VALUE > 0);
ALTER TABLE orders ALTER COLUMN user_id TYPE positive_int;
Fix 4: Bulk load with COPY and a staging table
For large CSV imports, skip INSERT entirely. Load into an all-text staging table first, validate, then cast. Recovering from a bad staging load is painless. Recovering mid-transaction on a production table is not.
-- Step 1: staging table with all TEXT columns
CREATE TEMP TABLE staging_orders (
user_id TEXT,
amount TEXT,
created_at TEXT
);
-- Step 2: load the raw CSV
COPY staging_orders FROM '/tmp/orders.csv' CSV HEADER;
-- Step 3: validate โ this should return 0
SELECT COUNT(*) FROM staging_orders
WHERE user_id !~ '^[0-9]+$';
-- Step 4: cast and insert only when validation passes
INSERT INTO orders (user_id, amount, created_at)
SELECT
user_id::integer,
amount::numeric,
created_at::timestamptz
FROM staging_orders;
Verify the fix worked
-- Row count should match your source file
SELECT COUNT(*) FROM orders;
-- Make sure no unexpected NULLs slipped in
SELECT COUNT(*) FROM orders WHERE user_id IS NULL;
-- Spot-check the most recently inserted rows
SELECT user_id, amount FROM orders ORDER BY id DESC LIMIT 10;
If you wrapped the import in an explicit transaction, confirm it committed:
-- In psql
\echo :AUTOCOMMIT
-- or
SELECT txid_current();
Lessons learned
- External data is never the type you think it is. Validate or sanitize everything before inserting into a typed column โ CSVs, webhooks, and legacy systems all lie.
- Staging tables are worth the extra step. Load as TEXT, validate, then cast. You'll thank yourself the first time a 500k-row import has 3 bad rows at line 450,000.
- Empty string โ NULL in PostgreSQL.
NULLIF(val, '')is your friend. MySQL quietly accepts""in numeric columns; PostgreSQL will not. - Parameterized queries do type conversion for you. Use
%sor$1placeholders with properly typed Python or Node values. String concatenation bypasses that safety net completely. - The error message already tells you the answer. PostgreSQL always quotes the offending value โ use it to trace back to the exact source row rather than hunting through the whole dataset.

