Recovering from AWS RDS 'storage-full': Immediate Fixes and Long-term Prevention

The Panic of a 'storage-full' RDS

It usually hits at the worst possible time. Your application starts throwing 500 errors, and the AWS console greets you with a dreaded red status: storage-full. This isn't just a simple warning. It is a critical state where the database engine stops accepting writes. In many cases, it will drop existing connections entirely to protect the integrity of the file system.

When an RDS instance hits 100% disk usage, it loses its 'Available' status. You will see this specific error message:

DB instance is in storage-full state. The instance may not be able to accept connections or perform writes.

The 60-Second Recovery

Open the AWS RDS Console.
Select your database and click Modify.
Find Allocated storage. Increase the value by at least 20% or 50 GB, whichever is larger, to give yourself a comfortable buffer.
Scroll to the bottom and click Continue. Select Apply immediately, then confirm with Modify DB instance.
Wait for the status to cycle from 'Modifying' back to 'Available'.
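
The sizing rule in step 3 is easy to fumble under pressure. Here is a minimal shell sketch that works out a target value, assuming "20% or 50 GB" means whichever is larger. The function name next_allocated_storage is hypothetical, not part of any AWS tool.

```shell
# Hypothetical helper: suggest a new allocated-storage value in GB.
# Assumption: grow by 20% or by 50 GB, whichever gives the larger result.
next_allocated_storage() (
    current_gb=$1
    # 20% growth, rounded up to a whole GB
    by_percent=$(( ($current_gb * 120 + 99) / 100 ))
    # Flat 50 GB growth
    by_fixed=$(( $current_gb + 50 ))
    if [ "$by_percent" -gt "$by_fixed" ]; then
        echo "$by_percent"
    else
        echo "$by_fixed"
    fi
)

next_allocated_storage 100     # prints 150
next_allocated_storage 1000    # prints 1200
```

For small volumes the flat 50 GB wins; above 250 GB the percentage takes over.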

Why did the disk fill up?

Data growth is rarely the only culprit. Often, hidden overhead consumes your space. I have seen 100 GB volumes crash because of a single rogue background job. Watch out for these common offenders:

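Uncommitted Transactions: One transaction left open for hours stops the engine from cleaning up old row versions. In MySQL (InnoDB) the undo log keeps growing; in Postgres, VACUUM cannot remove dead tuples, so tables and indexes bloat. A related offender is log retention: a lagging replica or an inactive Postgres replication slot forces the instance to keep binary logs or WAL, which can grow by tens of gigabytes in a few hours on a write-heavy workload.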
Spilled Temporary Files: Complex joins or large ORDER BY operations that don't fit in RAM will write tmp files to the disk. A single bad query can eat 20 GB of space in minutes.
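Log Bloat: If you set log_output to 'TABLE' on MySQL, the general and slow query logs are written to tables in the mysql schema. On a busy instance with verbose logging, those tables can consume a large share of the volume before they are rotated.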
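
To check for the first offender on PostgreSQL, you can list the oldest open transactions. The sketch below wraps a query against pg_stat_activity in a shell function. The function name oldest_transactions is hypothetical, and it assumes psql is installed and that the instance still accepts your connection.

```shell
# Hypothetical helper: list the ten oldest open transactions on PostgreSQL.
# $1 is a database name or libpq connection string (an assumption of this sketch).
oldest_transactions() {
    psql -X -d "$1" -c "
        SELECT pid, usename, state,
               now() - xact_start AS open_for,
               left(query, 60)    AS query_start
        FROM pg_stat_activity
        WHERE xact_start IS NOT NULL
        ORDER BY xact_start
        LIMIT 10;"
}

# Example (uncomment to run against a real database):
# oldest_transactions "dbname=app"
```

Rows that have been open for hours in an 'idle in transaction' state are the usual suspects.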

Fix Approach 1: Use the AWS CLI for Speed

When the console is lagging during an outage, the terminal is your best friend. This command triggers an immediate storage expansion.

# Bump storage from 100GB to 150GB
aws rds modify-db-instance \
    --db-instance-identifier my-production-db \
    --allocated-storage 150 \
    --apply-immediately

Warning: Storage increases are permanent. You cannot shrink an RDS volume once it is provisioned. After you scale up, the instance enters 'storage-optimization' for several hours. You cannot modify the storage size again until this process finishes.
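
Because a second resize is blocked while the first is still settling, it helps to check where the instance stands before you try again. This is a rough sketch built on aws rds describe-db-instances. The function names storage_status and instance_is_available are hypothetical.

```shell
# Hypothetical helper: print status, allocated storage (GB), and any
# pending storage change for one instance, tab-separated.
storage_status() {
    aws rds describe-db-instances \
        --db-instance-identifier "$1" \
        --query 'DBInstances[0].[DBInstanceStatus,AllocatedStorage,PendingModifiedValues.AllocatedStorage]' \
        --output text
}

# Succeeds only when the instance reports 'available', meaning it has left
# 'modifying' and 'storage-optimization'.
instance_is_available() {
    [ "$(storage_status "$1" | cut -f1)" = "available" ]
}

# Example (uncomment to run against a real instance):
# instance_is_available my-production-db && echo "optimization finished"
```

An 'available' status only tells you that optimization has finished. AWS may still reject a new storage change inside the six-hour window described under Important Caveats.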

Fix Approach 2: Manual Cleanup

If you have reached the maximum storage limit for your engine and storage type, or you are locked out of another resize, you must free space manually. If you can still squeeze in a connection, try these commands.

For MySQL/MariaDB:

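Force a rotation of the general and slow query log tables (used when log_output is 'TABLE'). Call each procedure twice in a row: the first call moves the current log to a backup table, and the second drops the old data and reclaims the space.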

CALL mysql.rds_rotate_general_log;
CALL mysql.rds_rotate_slow_log;

For PostgreSQL:

Find the largest tables, counting their indexes and TOAST data, so you know where the space went:

SELECT relname, pg_size_pretty(pg_total_relation_size(relid))
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;

Fix Approach 3: Set and Forget with Autoscaling

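Stop fixing this error manually. AWS can handle it for you. Storage Autoscaling triggers an increase when free space stays at or below 10% of allocated storage for at least five minutes, provided at least six hours have passed since the last storage modification.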

Go to the Modify menu for your RDS instance.
Check the Enable storage autoscaling box.
Set a Maximum storage threshold (e.g., 1000 GB). This cap prevents a runaway bug from costing you thousands of dollars.
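
The same setting is available from the terminal through the --max-allocated-storage option of aws rds modify-db-instance. Here is a minimal sketch, reusing the example identifier from earlier. The function name enable_storage_autoscaling is hypothetical.

```shell
# Hypothetical helper: enable storage autoscaling by setting a ceiling.
# $1 = instance identifier, $2 = maximum storage threshold in GB.
enable_storage_autoscaling() {
    aws rds modify-db-instance \
        --db-instance-identifier "$1" \
        --max-allocated-storage "$2" \
        --apply-immediately
}

# Example (uncomment to run against a real instance):
# enable_storage_autoscaling my-production-db 1000
```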

Verification: Is it actually fixed?

Don't trust the green 'Available' light alone. Verify the health of the system with these three checks:

Test Writes: Create a tiny test table to confirm the engine is accepting data again.

CREATE TABLE rds_check (id INT);
INSERT INTO rds_check VALUES (1);
DROP TABLE rds_check;

Check the CLI Status: Confirm that the instance reports 'available' rather than 'storage-full' or 'modifying'.

aws rds describe-db-instances --db-instance-identifier my-production-db --query 'DBInstances[*].DBInstanceStatus'

Monitor CloudWatch: Ensure the FreeStorageSpace metric is trending upward or stabilized.
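
If you would rather read the trend from the terminal than from the console graph, the sketch below pulls FreeStorageSpace datapoints with aws cloudwatch get-metric-statistics. The function names free_storage_trend and bytes_to_gib are hypothetical, and the caller supplies the time window as ISO 8601 timestamps.

```shell
# Hypothetical helper: print FreeStorageSpace datapoints (bytes), oldest first.
# $1 = instance identifier, $2 = start time, $3 = end time (ISO 8601, UTC).
free_storage_trend() {
    aws cloudwatch get-metric-statistics \
        --namespace AWS/RDS \
        --metric-name FreeStorageSpace \
        --dimensions Name=DBInstanceIdentifier,Value="$1" \
        --start-time "$2" \
        --end-time "$3" \
        --period 300 \
        --statistics Minimum \
        --query 'sort_by(Datapoints, &Timestamp)[].[Timestamp, Minimum]' \
        --output text
}

# FreeStorageSpace is reported in bytes; convert to whole GiB (rounded down).
bytes_to_gib() {
    echo $(( $1 / 1073741824 ))
}

bytes_to_gib 53687091200    # prints 50
```

Using the Minimum statistic over five-minute periods shows the worst point in each window, which is what matters when you are close to the limit.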

Important Caveats

The 6-Hour Cooling Period: Once you modify storage, AWS locks you out of further storage changes for 6 hours or until storage optimization completes, whichever is longer. Do not just add 1 GB; add enough to last at least 24 hours.
IOPS Scaling: If you use Provisioned IOPS (io1), your performance ratio is tied to your storage size. You may need to scale IOPS alongside storage to maintain speed.
Performance Dip: Expansion happens on live volumes. You might see a 5-10% increase in latency while the underlying EBS volume optimizes.