Fix OpenAI BadRequestError: The response was filtered by Content Moderation

The Error

Mid-API call, your app dies with this:

openai.BadRequestError: Error code: 400 - {'error': {'message': 'The response was filtered due to the prompt triggering Azure OpenAI\'s content management policy. Please modify your prompt and retry.', 'type': 'content_filter', 'param': 'prompt', 'code': 'content_filter'}}

On the standard OpenAI API it's terser:

openai.BadRequestError: The response was filtered

Either way, the content moderation layer intercepted your request. It blocked either the input prompt before it reached the model, or the generated output before it reached your code. This is a 400, not a 500: the server didn't fail; it deliberately refused.

Why This Happens

Every prompt and completion passes through OpenAI's moderation pipeline. The system scores content across categories like hate, self-harm, sexual content, and violence. The Moderation API reports these as scores from 0.0 to 1.0, while Azure's filter reports severity levels (safe, low, medium, high). Cross the threshold in any single category and you get this error instead of a completion.

Common triggers:

Sensitive keywords in technical or academic context (security research, medical discussions)
Raw user input dropped into the prompt without any sanitization
Creative writing that involves violence, abuse, or adult themes
Medical or legal scenarios describing harm-related situations
Security prompts — asking about CVEs, exploit patterns, or vulnerability details
The model's own output being filtered — your prompt was clean, but the completion wasn't

That last one catches people off guard. You can't always predict what the model will generate, so even a safe-looking prompt can produce filtered output.

Step 1 — Identify What Got Filtered

Before rewriting anything, confirm whether the problem is your input or the model's output. Run your prompt through the Moderation API directly:

import openai

client = openai.OpenAI()

response = client.moderations.create(
    input="Your prompt text here"
)

result = response.results[0]
print("Flagged:", result.flagged)
print("Categories:", result.categories)
print("Scores:", result.category_scores)

If result.flagged is True, your input is the problem. The result.categories object tells you exactly what triggered it, with one boolean per category: something like violence: True or sexual/minors: True. The values in result.category_scores show how close the other categories are to the threshold, which is useful when a prompt fails intermittently.

If flagged is False but you still get the error, the output is being filtered. Move on to checking finish_reason (Step 4).
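
When a prompt fails only intermittently, it can help to see which categories score highest. The sketch below is one way to rank them. It assumes the scores have already been converted to a plain dict of category name to float (the SDK's response models are Pydantic-based, so something like model_dump() may do this, but treat that as an assumption). The helper name rank_scores and the sample numbers are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def rank_scores(
    scores: Dict[str, Optional[float]], top_n: int = 3
) -> List[Tuple[str, float]]:
    """Return the top_n (category, score) pairs, highest score first."""
    # Skip categories with no score so sorting never compares None to a float.
    present = [(name, value) for name, value in scores.items() if value is not None]
    ranked = sorted(present, key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Illustrative numbers only, not real API output.
example_scores = {"violence": 0.41, "hate": 0.02, "self-harm": 0.07, "sexual": 0.001}

for category, score in rank_scores(example_scores, top_n=2):
    print(f"{category}: {score:.3f}")
```

Logging the top two or three categories for every rejected prompt gives you a quick picture of which kind of content is pushing requests over the line.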

Step 2 — Rewrite the Prompt

Rephrasing is usually enough. A few patterns that work:

Swap emotionally charged words for clinical equivalents. "How to hurt X" → "What are the risks associated with X".
Add framing that makes intent explicit: "for a security audit", "in a fictional context", "for a medical training dataset".
Split long prompts. A single flagged sentence buried in a 500-word prompt will still kill the whole request.
Sanitize user input before it hits the API. Don't trust what users send you.

# Unsafe — user input goes straight into the prompt
prompt = f"User asked: {user_input}\nAnswer:"

# Safe — check first, then build the prompt
moderation_check = client.moderations.create(input=user_input)
if moderation_check.results[0].flagged:
    raise ValueError("User input contains flagged content.")

prompt = f"User asked: {user_input}\nAnswer:"
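
To find the single sentence that sinks a long prompt, one option is to check the prompt piece by piece. The sketch below assumes a naive regex sentence splitter and takes the checker as a callable, so it runs offline with a stand-in. The names split_sentences, find_flagged_sentences, and fake_checker are hypothetical; in practice the callable could wrap the client.moderations.create call shown above.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: breaks after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def find_flagged_sentences(text: str, is_flagged: Callable[[str], bool]) -> List[str]:
    """Return the sentences for which the supplied checker returns True."""
    return [s for s in split_sentences(text) if is_flagged(s)]


# Stand-in checker so the sketch runs without network access.
def fake_checker(sentence: str) -> bool:
    return "FLAGME" in sentence


long_prompt = "Summarize the report. This sentence has FLAGME in it. Keep it short!"
print(find_flagged_sentences(long_prompt, fake_checker))
```

Moderation is context-sensitive, so a sentence that passes on its own may still be flagged inside the full prompt, and the reverse. Treat the result as a hint about where to start rewriting, not as proof.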

Step 3 — Handle the Error Gracefully in Code

Don't let a content filter crash your whole app. Catch BadRequestError and give users a meaningful message:

import openai

client = openai.OpenAI()

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": user_message}
        ]
    )
    answer = response.choices[0].message.content
except openai.BadRequestError as e:
    if "content_filter" in str(e) or "response was filtered" in str(e).lower():
        answer = "Sorry, I can't respond to that request due to content policy."
    else:
        raise  # Different BadRequestError — don't swallow it

The else: raise matters. Not every BadRequestError is a content filter — invalid model names, malformed messages, and token limit overflows throw the same exception type.
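
String matching on the exception text is brittle if the wording changes. A slightly sturdier variant checks a structured error code first and keeps the string match as a fallback. This sketch assumes the exception exposes a code attribute (to my knowledge the v1 Python SDK's API errors do, but verify against your installed version). is_content_filter_error and FakeBadRequest are hypothetical names, and the fake exception exists only so the example runs without the openai package.

```python
from typing import Optional


def is_content_filter_error(exc: Exception) -> bool:
    """Best-effort check for whether an API exception is a content-filter rejection."""
    # Prefer the structured error code when the exception exposes one.
    if getattr(exc, "code", None) == "content_filter":
        return True
    # Fall back to matching the message text, case-insensitively.
    text = str(exc).lower()
    return "content_filter" in text or "response was filtered" in text


# Stand-in exception type for an offline demo.
class FakeBadRequest(Exception):
    def __init__(self, message: str, code: Optional[str] = None) -> None:
        super().__init__(message)
        self.code = code


print(is_content_filter_error(FakeBadRequest("blocked", code="content_filter")))
print(is_content_filter_error(FakeBadRequest("model not found", code="model_not_found")))
```

In the except block, this would replace the inline condition: call is_content_filter_error(e) and re-raise when it returns False.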

Step 4 — Check finish_reason on Successful Responses

There's a subtler variant: the API returns HTTP 200, but the output was filtered mid-generation. In this case finish_reason is content_filter instead of stop, and message.content may be None or truncated.

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
)

choice = response.choices[0]
if choice.finish_reason == "content_filter":
    print("Output was filtered — content may be incomplete or None")
    print("Content:", choice.message.content)  # May be None or truncated
else:
    print(choice.message.content)

Always check finish_reason in production. Treating message.content as a string when it's None will raise an AttributeError or TypeError deeper in your code (for example, on .strip() or string concatenation), which is harder to trace than the original filter error.
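
One way to make that check hard to forget is to route every response through a small accessor. The sketch below assumes a choice object with finish_reason and message.content attributes, as in the snippet above. safe_content is a hypothetical helper, and the SimpleNamespace objects are stand-ins for real API responses.

```python
from types import SimpleNamespace
from typing import Any


def safe_content(choice: Any, fallback: str = "") -> str:
    """Return the message text, or fallback if the output was filtered or missing."""
    content = getattr(choice.message, "content", None)
    # Filtered output may be None or truncated, so discard it in both cases.
    if choice.finish_reason == "content_filter" or content is None:
        return fallback
    return content


# Stand-in objects shaped like response.choices[0].
filtered = SimpleNamespace(
    finish_reason="content_filter", message=SimpleNamespace(content=None)
)
normal = SimpleNamespace(
    finish_reason="stop", message=SimpleNamespace(content="  Hello  ")
)

print(safe_content(filtered, fallback="[filtered]").strip())
print(safe_content(normal).strip())
```

Because the helper always returns a string, downstream calls like .strip() are safe. Dropping truncated output entirely is a design choice; if partial text is useful to you, return it alongside a flag instead.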

Step 5 — Azure OpenAI Specific Fix

Azure OpenAI gives you more control. Content filter strictness is configurable per deployment: you create a filter configuration with severity thresholds for each category and attach it to a deployment.

Navigate to Azure OpenAI → Your Resource → Content Filters and create a custom filter profile. For example, a game moderation service might need looser violence filtering. Going further than the adjustable thresholds, such as turning a filter off entirely, typically requires approval from Microsoft, granted on a case-by-case basis.

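One more thing specific to Azure: filter configurations are attached to deployments, so the model argument must name the deployment your custom filter is applied to. A mistyped name usually fails with a 404 rather than a content filter error, but calling a different, valid deployment silently applies that deployment's filter instead. Double-check it: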

client = openai.AzureOpenAI(
    azure_endpoint="https://your-resource.openai.azure.com/",
    api_key="your-api-key",
    api_version="2024-02-01"
)

response = client.chat.completions.create(
    model="your-deployment-name",  # Must match your Azure deployment exactly
    messages=[{"role": "user", "content": prompt}]
)
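
Azure's 400 response can also say which category tripped the filter. The sketch below assumes the error payload carries an innererror.content_filter_result block with a filtered flag and a severity per category. That shape is an assumption based on commonly seen Azure responses, so check it against what your deployment actually returns. filtered_categories is a hypothetical helper and sample is hand-written, not captured output.

```python
from typing import Any, Dict, List


def filtered_categories(error_body: Dict[str, Any]) -> List[str]:
    """List the categories marked as filtered in an Azure-style error payload."""
    # Accept either the full payload or the already-unwrapped 'error' dict.
    error = error_body.get("error", error_body)
    inner = error.get("innererror") or {}
    results = inner.get("content_filter_result") or {}
    return sorted(
        name
        for name, detail in results.items()
        if isinstance(detail, dict) and detail.get("filtered")
    )


# Hand-written sample in the assumed shape.
sample = {
    "error": {
        "code": "content_filter",
        "innererror": {
            "code": "ResponsibleAIPolicyViolation",
            "content_filter_result": {
                "hate": {"filtered": False, "severity": "safe"},
                "violence": {"filtered": True, "severity": "medium"},
            },
        },
    }
}

print(filtered_categories(sample))
```

With the Python SDK, the payload may be available on the caught exception (for example as e.body), which would let you log the offending category next to the user-facing message.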

Verify the Fix

Two checks before calling it done:

import openai

client = openai.OpenAI()

# 1. Confirm the prompt is clean
mod = client.moderations.create(input=your_new_prompt)
print("Still flagged:", mod.results[0].flagged)  # Should be False

# 2. Run the actual completion
try:
    res = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": your_new_prompt}]
    )
    print("finish_reason:", res.choices[0].finish_reason)  # Should be 'stop'
    print("Response:", res.choices[0].message.content)
except openai.BadRequestError as e:
    print("Still getting error:", e)

flagged: False and finish_reason: stop — that's the clean state you want.

Tips

Never pass raw user input directly into prompts. Always pre-screen with the Moderation API first, particularly in apps that handle user-generated content.
The Moderation API is free to call. Use it as a gate before every completion in high-risk workflows — the latency cost is negligible compared to a failed request.
System prompts can trigger filtering too. A system message that instructs the model to role-play as a harmful character will get caught just like user messages.
gpt-3.5-turbo and gpt-4o don't have identical sensitivity. A prompt that fails on one model sometimes works on the other — worth testing if you have flexibility.
Legitimately sensitive use cases (medical education, security research, legal analysis) may be able to request a usage policy exception by contacting OpenAI support. Document your use case clearly when submitting.
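
The pre-screening tip and the finish_reason check can be combined into one gate. The sketch below is a minimal version that takes the moderation check and the completion call as injected callables, so it runs offline with stand-ins. gated_completion, fake_is_flagged, and fake_complete are hypothetical names; the real callables would wrap client.moderations.create and client.chat.completions.create.

```python
from typing import Callable, Optional, Tuple


def gated_completion(
    prompt: str,
    is_flagged: Callable[[str], bool],
    complete: Callable[[str], Tuple[Optional[str], str]],
) -> Tuple[str, Optional[str]]:
    """Moderate first, then complete. Returns (status, text).

    status is 'input_flagged', 'output_filtered', or 'ok'.
    complete must return a (text, finish_reason) pair.
    """
    if is_flagged(prompt):
        return "input_flagged", None
    text, finish_reason = complete(prompt)
    if finish_reason == "content_filter" or text is None:
        return "output_filtered", None
    return "ok", text


# Stand-ins so the sketch runs without network access.
def fake_is_flagged(text: str) -> bool:
    return "FLAGME" in text


def fake_complete(text: str) -> Tuple[Optional[str], str]:
    return f"echo: {text}", "stop"


print(gated_completion("hello", fake_is_flagged, fake_complete))
print(gated_completion("FLAGME please", fake_is_flagged, fake_complete))
```

Returning a status string rather than raising keeps the three outcomes explicit at the call site, which makes it easier to show users a different message for blocked input and filtered output.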