Fix PostgreSQL "column must appear in the GROUP BY clause" Error

The Error

ERROR: column "orders.customer_name" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT customer_name, status, COUNT(*) FROM orders GROUP BY status
               ^

Spot the problem: SELECT asks for customer_name, but GROUP BY only lists status. One status group might contain 50 rows with 50 different customer names. PostgreSQL has no idea which one to return — so it refuses.

Coming from MySQL? This might catch you off guard. MySQL silently picks an arbitrary row's value, which is usually wrong. PostgreSQL rejects the query outright. That's not a bug — it's correct SQL standard behavior.

Typical Query That Causes This

-- This fails:
SELECT customer_name, status, COUNT(*)
FROM orders
GROUP BY status;

-- ERROR: column "orders.customer_name" must appear in the GROUP BY clause...

You're grouping by status but selecting customer_name. Each status group holds many different customer names — PostgreSQL won't pick one for you.

Step-by-Step Fixes

Fix 1: Add the Missing Column to GROUP BY

The simplest fix. Every non-aggregated column in your SELECT must also appear in GROUP BY.

-- Broken:
SELECT customer_name, status, COUNT(*)
FROM orders
GROUP BY status;

-- Fixed:
SELECT customer_name, status, COUNT(*)
FROM orders
GROUP BY customer_name, status;

Use this when grouping by both columns together is exactly what your report needs.

Fix 2: Wrap the Column in an Aggregate Function

Need just one value from the group — not all of them? Use MIN(), MAX(), or string_agg().

-- Pick any single customer name per status group:
SELECT MIN(customer_name) AS customer_name, status, COUNT(*)
FROM orders
GROUP BY status;

-- Collect all customer names as a comma-separated list:
SELECT status,
       COUNT(*) AS order_count,
       string_agg(DISTINCT customer_name, ', ') AS customers
FROM orders
GROUP BY status;

-- Or as an array:
SELECT status, COUNT(*), array_agg(DISTINCT customer_name) AS customers
FROM orders
GROUP BY status;

Fix 3: Use DISTINCT ON (PostgreSQL-specific)

DISTINCT ON keeps one row per partition after sorting. No subquery, no CTE.

-- Get the most recent order per status:
SELECT DISTINCT ON (status)
  customer_name, status, order_amount, created_at
FROM orders
ORDER BY status, created_at DESC;

The ORDER BY decides which row wins for each group. Here: the latest created_at comes out on top.

Fix 4: Use a CTE with ROW_NUMBER()

Want the full row matching an aggregate condition — say, the biggest order per status? Window functions handle this cleanly.

-- Get the highest-value order per status:
WITH ranked AS (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY status ORDER BY order_amount DESC) AS rn
  FROM orders
)
SELECT customer_name, status, order_amount
FROM ranked
WHERE rn = 1;

Full Working Example

-- Setup:
CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  customer_name TEXT,
  status TEXT,
  order_amount NUMERIC,
  created_at TIMESTAMP DEFAULT NOW()
);

INSERT INTO orders (customer_name, status, order_amount) VALUES
  ('Alice', 'pending', 150.00),
  ('Bob',   'pending', 200.00),
  ('Carol', 'shipped', 350.00),
  ('Dave',  'shipped', 100.00);

-- BROKEN — customer_name not in GROUP BY:
SELECT customer_name, status, COUNT(*), SUM(order_amount)
FROM orders
GROUP BY status;
-- ERROR: column "orders.customer_name" must appear in the GROUP BY clause...

-- OPTION A: Group by both columns:
SELECT customer_name, status, SUM(order_amount)
FROM orders
GROUP BY customer_name, status;
--  customer_name | status  |   sum
-- ---------------+---------+--------
--  Alice         | pending | 150.00
--  Bob           | pending | 200.00
--  Carol         | shipped | 350.00
--  Dave          | shipped | 100.00

-- OPTION B: Summary by status only — drop customer_name:
SELECT status, COUNT(*) AS orders, SUM(order_amount) AS total
FROM orders
GROUP BY status;
--  status  | orders |  total
-- ---------+--------+--------
--  pending |      2 | 350.00
--  shipped |      2 | 450.00

-- OPTION C: Stats + all customers per status:
SELECT status,
       COUNT(*),
       string_agg(customer_name, ', ') AS customers
FROM orders
GROUP BY status;
--  status  | count |   customers
-- ---------+-------+---------------
--  pending |     2 | Alice, Bob
--  shipped |     2 | Carol, Dave

Verify the Fix

Paste your corrected query into psql and make sure it returns rows, not an error:

-- Should return rows, not an error:
SELECT status, COUNT(*), SUM(order_amount)
FROM orders
GROUP BY status;

--  status  | count |   sum
-- ---------+-------+--------
--  pending |     2 | 350.00
--  shipped |     2 | 450.00
-- (2 rows)

Testing DISTINCT ON? Check that you get exactly one row per unique status value:

SELECT DISTINCT ON (status) status, customer_name, created_at
FROM orders
ORDER BY status, created_at DESC;

-- You should see one row per unique status value.

Which Fix Should You Use?

Add to GROUP BY — when grouping by that column too is exactly what you want (most common fix)
MIN / MAX / string_agg / array_agg — when you need a summary or combined value from non-grouped columns
DISTINCT ON — when you need one row per group and can define an ORDER BY to pick the winner
ROW_NUMBER() in a CTE — when you need the full row matching an aggregate condition, like the highest sale per region

One Extra Gotcha: Aliases in GROUP BY

PostgreSQL rejects column aliases in GROUP BY. Many developers hit this on their first day with date extraction:

-- This fails — alias in GROUP BY:
SELECT EXTRACT(YEAR FROM created_at) AS year, COUNT(*)
FROM orders
GROUP BY year;
-- ERROR: column "year" does not exist

-- Fix: repeat the full expression:
SELECT EXTRACT(YEAR FROM created_at) AS year, COUNT(*)
FROM orders
GROUP BY EXTRACT(YEAR FROM created_at);

-- Or use the column position (1-indexed):
SELECT EXTRACT(YEAR FROM created_at) AS year, COUNT(*)
FROM orders
GROUP BY 1;