## The Error Scenario

You're mid-build, maybe on a customer-facing chatbot or a heavy-duty data pipeline, and everything grinds to a halt. Your logs start filling up with a specific, frustrating message:
```
google.api_core.exceptions.ResourceExhausted: 429 RESOURCE_EXHAUSTED
```

If you check the raw JSON response from a cURL command, you'll see this:

```json
{
  "error": {
    "code": 429,
    "message": "Quota exceeded for aiplatform.googleapis.com/generate_content_requests per minute...",
    "status": "RESOURCE_EXHAUSTED"
  }
}
```
Essentially, the Gemini API is acting like a traffic cop. You've sent too much data or too many requests in a short window. While this is most common on the Free Tier, even paid users can hit these walls if their project settings aren't optimized.
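If you call the REST endpoint directly, a small check on the response body can separate quota errors from other failures. The following is a minimal sketch, not part of any Google SDK: `is_quota_error` is a hypothetical helper name, and the sample body is shortened from the response shown above.

```python
import json


def is_quota_error(body: str) -> bool:
    """Return True if a JSON error body reports 429 / RESOURCE_EXHAUSTED."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False  # Not JSON at all, so not a recognizable quota error.
    if not isinstance(payload, dict):
        return False
    error = payload.get("error")
    if not isinstance(error, dict):
        return False
    return error.get("code") == 429 or error.get("status") == "RESOURCE_EXHAUSTED"


# Sample body, shortened from the error response shown above.
sample = '{"error": {"code": 429, "message": "Quota exceeded", "status": "RESOURCE_EXHAUSTED"}}'
print(is_quota_error(sample))
```

Checking both `code` and `status` keeps the helper tolerant of responses that include only one of the two fields.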
## Analysis: Why is this happening?

Google manages traffic using three specific metrics. Understanding which one you're hitting is the first step to a fix.
- RPM (Requests Per Minute): A simple count of how many times you call the API.
- RPD (Requests Per Day): Your total daily allowance, which resets at midnight Pacific Time.
- TPM (Tokens Per Minute): The volume of text processed, including both your prompt and the model's response.

The limits vary widely by model. For instance, the Gemini 1.5 Flash Free Tier allows 15 RPM. However, the Gemini 1.5 Pro Free Tier is much tighter, often limited to just 2 RPM. If you process a CSV with 50 rows in a fast loop, you'll trigger a 429 error in less than five seconds.
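One way to stay under an RPM limit in a loop like the CSV example is to space calls out on the client side. The sketch below is an illustration under stated assumptions, not an official pattern: `MinIntervalThrottle` is a hypothetical class, and it only addresses RPM, not TPM or RPD.

```python
import time


class MinIntervalThrottle:
    """Keeps consecutive calls at least 60 / rpm seconds apart."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / rpm
        self._clock = clock  # Injectable so the class can be tested without waiting.
        self._sleep = sleep
        self._next_allowed = float("-inf")

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._clock()
        self._next_allowed = max(now, self._next_allowed) + self.interval


# At 15 RPM, calls end up spaced 4 seconds apart.
throttle = MinIntervalThrottle(rpm=15)
print(f"Minimum spacing: {throttle.interval:.1f}s")

# Hypothetical usage inside a batch loop:
# for row in rows:
#     throttle.wait()
#     call_model(row)
```

Throttling like this reduces how often you hit a 429 in the first place, while retries handle the cases that still slip through.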
## Quick Fix: Implement Exponential Backoff

Don't just retry the request immediately. This creates a "thundering herd" problem that keeps you blocked. Instead, use exponential backoff. This technique adds an increasing delay between each retry, giving the API time to reset your quota bucket.
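To make the idea concrete, here is a rough sketch of how such a delay schedule can be computed in plain Python. It assumes a 1-second base, a 60-second cap, and the "full jitter" variant (a random delay between zero and the exponential ceiling). `backoff_delay` is a hypothetical helper, not a library function.

```python
import random


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based), with full jitter."""
    # The ceiling doubles on every attempt until it reaches the cap.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: pick a random point between 0 and the ceiling.
    return rng() * ceiling


for attempt in range(6):
    ceiling = min(60.0, 2 ** attempt)
    print(f"retry {attempt}: wait up to {ceiling:.0f}s, e.g. {backoff_delay(attempt):.2f}s")
```

The random component matters when several workers are rate limited at once, because it spreads their retries out instead of letting them all fire at the same moment.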
### Python Example with Tenacity

The tenacity library is a widely used way to handle retries in Python. It manages the timing so you don't have to write retry loops by hand.
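```python
import os

import google.generativeai as genai
from google.api_core.exceptions import ResourceExhausted
from tenacity import (
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_random_exponential,
)

# Read the key from the environment instead of hard-coding it.
# GEMINI_API_KEY is an assumed variable name; use whatever your setup defines.
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")


@retry(
    # Retry only on 429 RESOURCE_EXHAUSTED; other errors surface immediately.
    retry=retry_if_exception_type(ResourceExhausted),
    # Randomized exponential wait, capped at 60 seconds.
    wait=wait_random_exponential(min=1, max=60),
    # Give up after six attempts in total.
    stop=stop_after_attempt(6),
)
def generate_content_with_retry(prompt):
    try:
        return model.generate_content(prompt).text
    except ResourceExhausted:
        print("Rate limit reached. Backing off...")
        raise
```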
### Node.js Example

In JavaScript, you can use a simple async helper. This version adds a bit of "jitter" (randomness) to the wait time, which helps prevent multiple instances from retrying at the exact same millisecond.
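```javascript
const sleep = (ms) => new Promise((res) => setTimeout(res, ms));

// Assumes `model` is an already-configured Gemini model instance.
async function callGeminiWithRetry(prompt, retries = 5) {
  for (let i = 0; i < retries; i++) {
    try {
      const result = await model.generateContent(prompt);
      return result.response.text();
    } catch (err) {
      // Retry only on rate-limit errors, and give up after the last attempt.
      const isRateLimit = err.status === 429 || String(err.message).includes("429");
      if (!isRateLimit || i === retries - 1) throw err;
      // Exponential delay (1s, 2s, 4s, ...) plus up to 1s of random jitter.
      await sleep(2 ** i * 1000 + Math.random() * 1000);
    }
  }
}
```

If retries alone are not enough, you can request a higher quota in the Google Cloud console:

- Go to **… > Quotas**.
- Filter the list for "Generative Language API".
- Find `generate_content_requests_per_minute`.
- Click **Edit Quotas** and provide a brief explanation of your use case. Requests are often approved within 24–48 hours.

## Verification: Is it Fixed?

Don't just assume the error is gone. Monitor the **API & Services Dashboard** in Google Cloud. A healthy project should show a high volume of 200 OK responses and only a tiny sliver of 4xx errors. If average latency rises during peak traffic while the error rate stays low, that suggests your backoff logic is working: requests are being delayed instead of failing.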
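You can also track the same ratio from inside your own application. The snippet below is a small sketch that assumes you record the HTTP status of each call yourself. `rate_limit_share` is a hypothetical helper, and the sample numbers are made up for illustration.

```python
from collections import Counter


def rate_limit_share(status_codes):
    """Fraction of responses that were HTTP 429, from 0.0 to 1.0."""
    counts = Counter(status_codes)
    total = sum(counts.values())
    return counts[429] / total if total else 0.0


# Illustrative data only: 97 successful calls and 3 rate-limited ones.
observed = [200] * 97 + [429] * 3
print(f"{rate_limit_share(observed):.1%} of calls were rate limited")
```

A share that keeps climbing after you add backoff suggests the underlying quota is too small for your traffic, rather than a retry problem.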

