TL;DR
AWS ran out of physical hardware for the instance type you requested in the Availability Zone you targeted. Three things work immediately: try a different AZ, switch to a similar instance type, or move to a different region. For long-term guarantees (production workloads, compliance requirements), set up an On-Demand Capacity Reservation before you actually need it.
What triggers this error
Every RunInstances call, and every restart of a stopped instance, asks AWS to allocate real physical servers in a specific AZ. When that AZ's hypervisor fleet is fully booked for your instance family, you get InsufficientInstanceCapacity. It's a supply problem, not an account problem. Raising your service quota does nothing here.
This hits most often in these situations:
- Launching GPU or ML-accelerator instances like p4d.24xlarge, trn1, or inf2: these have thin global supply
- Restarting a stopped instance: AWS released its hardware when you stopped it, and now that spot is taken
- Auto Scaling groups configured to hammer a single AZ with a single instance type
- Deploying into a small or lightly-provisioned AZ (e.g., us-east-1e in older accounts)
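If you script your launches, it can help to tell a capacity failure apart from a quota failure, since only the quota kind is fixed by a limit increase. Here's a minimal sketch, assuming you've captured the CLI's error text in a variable. The function name and the capacity/quota/other labels are invented for this example; the error codes it matches are the ones EC2 returns.

```shell
#!/bin/bash
# classify_launch_error: map CLI error text to a coarse failure class.
# The labels "capacity", "quota", and "other" are hypothetical names for this sketch.
classify_launch_error() {
  local err_text="$1"
  case "$err_text" in
    *InsufficientInstanceCapacity*)              echo "capacity" ;;  # AWS is out of hardware
    *InstanceLimitExceeded*|*VcpuLimitExceeded*) echo "quota" ;;     # your account limit was hit
    *)                                           echo "other" ;;
  esac
}
```

Feed it the stderr of a failed run-instances call and branch on the result: retry elsewhere on "capacity", open a quota request on "quota".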
Fix 1: Try a different Availability Zone
Fastest fix, highest success rate. Capacity pools are per-AZ, so us-east-1b might have plenty while us-east-1a is slammed.
# Drop the AZ constraint and let AWS pick, or point to a subnet in a different AZ
aws ec2 run-instances \
--image-id ami-0abcdef1234567890 \
--instance-type m5.xlarge \
--subnet-id subnet-0bb1234567890abcd \
--count 1
Need to try several AZs automatically? Here's a quick shell loop:
#!/bin/bash
SUBNETS=("subnet-aaa" "subnet-bbb" "subnet-ccc") # one subnet per AZ
for SUBNET in "${SUBNETS[@]}"; do
  echo "Trying subnet $SUBNET..."
  RESULT=$(aws ec2 run-instances \
    --image-id ami-0abcdef1234567890 \
    --instance-type m5.xlarge \
    --subnet-id "$SUBNET" \
    --count 1 2>&1)
  if echo "$RESULT" | grep -q "InstanceId"; then
    echo "Success in $SUBNET"
    echo "$RESULT"
    break
  fi
  echo "Failed: $RESULT"
done
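If you'd rather not hard-code the subnet list, something like the following could build it for you. Treat it as a sketch: one_subnet_per_az is a hypothetical helper name, and it assumes every subnet in the VPC is an acceptable launch target (public vs. private, routing, and so on are not checked).

```shell
#!/bin/bash
# Print one subnet ID per Availability Zone for the given VPC.
# Usage: one_subnet_per_az <vpc-id>
one_subnet_per_az() {
  local vpc_id="$1"
  # Text output gives one "AZ<TAB>SubnetId" row per subnet;
  # awk keeps only the first subnet seen for each AZ.
  aws ec2 describe-subnets \
    --filters "Name=vpc-id,Values=$vpc_id" \
    --query 'Subnets[].[AvailabilityZone,SubnetId]' \
    --output text | sort | awk '!seen[$1]++ { print $2 }'
}
```

Call it with your own VPC ID and loop over the lines it prints in place of the fixed SUBNETS array.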
Fix 2: Switch to an equivalent instance type
Different instance families draw from different capacity pools, even when the hardware underneath is nearly identical. An m5.xlarge (Intel) and an m5a.xlarge (AMD) give you 4 vCPU and 16 GB RAM either way, but they compete for separate physical inventory.
# m5.xlarge failing? Drop in m6i.xlarge: same specs, separate pool
aws ec2 run-instances \
--image-id ami-0abcdef1234567890 \
--instance-type m6i.xlarge \
--subnet-id subnet-0bb1234567890abcd \
--count 1
Reliable fallback pairs to try:
- m5 → m5a / m6i / m6a
- c5 → c5a / c6i
- r5 → r5a / r6i
- p3 → p3dn / p4d if available, otherwise EC2 Capacity Blocks
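The fallback pairs above lend themselves to a loop that walks a list of types until one launches. A rough sketch follows; launch_with_type_fallback is a made-up name, and for brevity it moves on after any failure, not only InsufficientInstanceCapacity, so a bad AMI ID would also fall through every type.

```shell
#!/bin/bash
# Try each instance type in order until one launches.
# Usage: launch_with_type_fallback <ami-id> <subnet-id> <type> [<type> ...]
# Prints "<type> <instance-id>" on success; returns 1 if every type fails.
launch_with_type_fallback() {
  local ami="$1" subnet="$2"
  shift 2
  local itype instance_id
  for itype in "$@"; do
    if instance_id=$(aws ec2 run-instances \
        --image-id "$ami" \
        --instance-type "$itype" \
        --subnet-id "$subnet" \
        --count 1 \
        --query 'Instances[0].InstanceId' \
        --output text 2>/dev/null); then
      echo "$itype $instance_id"
      return 0
    fi
    echo "No luck with $itype, trying the next type" >&2
  done
  return 1
}
```

For example: launch_with_type_fallback ami-0abcdef1234567890 subnet-0bb1234567890abcd m5.xlarge m5a.xlarge m6i.xlarge m6a.xlarge. List the types in order of preference, since the first one that launches wins.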
Fix 3: Use a Spot Instance as a fallback
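Spot Instances run on spare EC2 capacity, and each instance type and AZ combination is its own Spot pool. A Spot request is unlikely to succeed in the exact pool where On-Demand just failed, but it can succeed in a neighboring type or AZ. This works well for batch jobs, CI runners, or any workload that can handle interruptions.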
aws ec2 run-instances \
--image-id ami-0abcdef1234567890 \
--instance-type m5.xlarge \
--subnet-id subnet-0bb1234567890abcd \
--instance-market-options '{"MarketType":"spot","SpotOptions":{"SpotInstanceType":"one-time"}}' \
--count 1
Fix 4: Pre-reserve capacity with On-Demand Capacity Reservations
Running production traffic or compliance-sensitive workloads? Don't gamble on capacity being available when you need it. Create a reservation in advance, before an incident forces your hand.
aws ec2 create-capacity-reservation \
--instance-type m5.xlarge \
--instance-platform Linux/UNIX \
--availability-zone us-east-1a \
--instance-count 5 \
--instance-match-criteria open
The open match criteria means any On-Demand launch in that AZ with a matching type automatically uses your reservation, with no extra flags needed at launch time. One catch: you pay the On-Demand rate 24/7 whether or not instances are running in it. Reserve only what you'll actually use.
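Because an empty reservation still bills, it's worth checking now and then how much of each one is in use. A small sketch, with hypothetical helper names, that lists active reservations and flags the ones where every slot is unused:

```shell
#!/bin/bash
# List active Capacity Reservations as tab-separated rows:
# id, instance type, AZ, total slots, unused slots.
list_reservation_usage() {
  aws ec2 describe-capacity-reservations \
    --filters Name=state,Values=active \
    --query 'CapacityReservations[].[CapacityReservationId,InstanceType,AvailabilityZone,TotalInstanceCount,AvailableInstanceCount]' \
    --output text
}

# Read rows from list_reservation_usage on stdin and print the IDs
# of reservations whose unused count equals their total count.
find_idle_reservations() {
  awk -F'\t' '$4 == $5 { print $1 }'
}
```

Run list_reservation_usage | find_idle_reservations to get the idle IDs. A reservation you no longer need can be released with aws ec2 cancel-capacity-reservation --capacity-reservation-id followed by its ID.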
To target a reservation explicitly:
aws ec2 run-instances \
--image-id ami-0abcdef1234567890 \
--instance-type m5.xlarge \
--subnet-id subnet-in-us-east-1a \
--capacity-reservation-specification \
'CapacityReservationTarget={CapacityReservationId=cr-0123456789abcdef0}' \
--count 1
Fix 5: Enable multi-AZ and mixed instance types in Auto Scaling groups
Single-AZ, single-instance-type ASGs are a capacity disaster waiting to happen. A mixed instances policy spreads the risk across multiple pools at once:
aws autoscaling update-auto-scaling-group \
--auto-scaling-group-name my-asg \
--availability-zones us-east-1a us-east-1b us-east-1c \
--mixed-instances-policy '{
"LaunchTemplate": {
"LaunchTemplateSpecification": {
"LaunchTemplateId": "lt-0123456789abcdef0",
"Version": "$Latest"
},
"Overrides": [
{"InstanceType": "m5.xlarge"},
{"InstanceType": "m5a.xlarge"},
{"InstanceType": "m6i.xlarge"},
{"InstanceType": "m6a.xlarge"}
]
},
"InstancesDistribution": {
"OnDemandBaseCapacity": 1,
"OnDemandPercentageAboveBaseCapacity": 100
}
}'
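To put a number on "multiple pools": each AZ and instance type combination is its own pool, so the group above can draw from 3 × 4 = 12 of them. Here's a rough sketch for checking an existing group. The function name is hypothetical, and it assumes the group already has a mixed instances policy with overrides; without one, the second query will error.

```shell
#!/bin/bash
# Print how many (AZ, instance type) pools an Auto Scaling group can draw from.
# Usage: count_capacity_pools <asg-name>
count_capacity_pools() {
  local asg_name="$1" az_count type_count
  az_count=$(aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names "$asg_name" \
    --query 'length(AutoScalingGroups[0].AvailabilityZones)' \
    --output text)
  type_count=$(aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names "$asg_name" \
    --query 'length(AutoScalingGroups[0].MixedInstancesPolicy.LaunchTemplate.Overrides)' \
    --output text)
  echo $(( az_count * type_count ))
}
```

A result of 1 means the group is back in single-AZ, single-type territory.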
Fix 6: For stopped instances that won't start
Stopping an EC2 instance releases its underlying hardware back into the pool. When you try to start it again, AWS has to find new hardware, and sometimes there isn't any in that AZ.
Three options, in order of least to most disruptive:
- Wait and retry: Capacity shifts constantly. Short-lived shortages in popular AZs like us-east-1a often clear within 15 to 60 minutes.
- Change the instance type temporarily: Stop → modify instance type → start. A different type might have open capacity.
- Create an AMI and relaunch in another AZ: More work, but lets you migrate to wherever capacity exists.
# Snapshot the stopped instance
aws ec2 create-image \
--instance-id i-0123456789abcdef0 \
--name "my-instance-backup-$(date +%Y%m%d)" \
--no-reboot
# Launch from that AMI in a different AZ
aws ec2 run-instances \
--image-id ami- \
--instance-type m5.xlarge \
--subnet-id subnet-in-different-az \
--count 1
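The first two options can be scripted as well. The sketch below uses hypothetical function names. The defaults of 6 attempts with a 300-second pause are arbitrary picks that keep trying for about 25 minutes, and the retry loop treats any start failure as retryable, which is a simplification.

```shell
#!/bin/bash
# Option 1: retry start-instances with a fixed pause between attempts.
# Usage: start_with_retry <instance-id> [max-attempts] [pause-seconds]
start_with_retry() {
  local instance_id="$1" max_attempts="${2:-6}" pause="${3:-300}"
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if aws ec2 start-instances --instance-ids "$instance_id" >/dev/null 2>&1; then
      echo "Started $instance_id on attempt $attempt"
      return 0
    fi
    echo "Attempt $attempt of $max_attempts failed" >&2
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$pause"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Option 2: switch a stopped instance to another type, then start it.
# Usage: change_type_and_start <instance-id> <new-instance-type>
change_type_and_start() {
  local instance_id="$1" new_type="$2"
  aws ec2 modify-instance-attribute \
    --instance-id "$instance_id" \
    --instance-type "Value=$new_type" &&
  aws ec2 start-instances --instance-ids "$instance_id"
}
```

For example, start_with_retry i-0123456789abcdef0 uses the defaults, and change_type_and_start i-0123456789abcdef0 m5a.xlarge swaps in the AMD sibling. The instance must be in the stopped state before its type can be changed.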
Verifying the fix
A successful launch returns an InstanceId. Check that the instance actually reached running state:
aws ec2 describe-instances \
--instance-ids i-0123456789abcdef0 \
--query 'Reservations[].Instances[].{ID:InstanceId,State:State.Name,AZ:Placement.AvailabilityZone}' \
--output table
Expected output:
--------------------------------------------------
|               DescribeInstances                |
+----------------------+----------+--------------+
|          AZ          |    ID    |    State     |
+----------------------+----------+--------------+
|  us-east-1b          | i-0abc.. |  running     |
+----------------------+----------+--------------+
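Right after launch the instance usually reports pending, so a one-off describe call can be misleading. The CLI has a built-in waiter that polls until the instance is running. A small sketch wrapping it, with a hypothetical function name:

```shell
#!/bin/bash
# Block until the instance is running, then print its state.
# Usage: wait_until_running <instance-id>
wait_until_running() {
  local instance_id="$1"
  # The waiter polls describe-instances and exits non-zero if it times out.
  aws ec2 wait instance-running --instance-ids "$instance_id" &&
  aws ec2 describe-instances \
    --instance-ids "$instance_id" \
    --query 'Reservations[0].Instances[0].State.Name' \
    --output text
}
```

If the waiter gives up, the function returns non-zero and prints nothing, which makes it easy to chain into the rest of a deploy script.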
Before your next launch, check which AZs actually carry the instance type you need:
aws ec2 describe-instance-type-offerings \
--location-type availability-zone \
--filters Name=instance-type,Values=m5.xlarge \
--query 'InstanceTypeOfferings[].Location' \
--output table
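The same call works for a whole list of fallback types, which helps when you're planning the substitutions from Fix 2. A sketch, with a made-up function name. Keep in mind that this reports where a type is offered, not whether capacity is free right now.

```shell
#!/bin/bash
# For each instance type given, print "<type> <az>" for every AZ
# in the current region that offers it.
# Usage: list_offering_azs <type> [<type> ...]
list_offering_azs() {
  local itype
  for itype in "$@"; do
    aws ec2 describe-instance-type-offerings \
      --location-type availability-zone \
      --filters "Name=instance-type,Values=$itype" \
      --query 'InstanceTypeOfferings[].Location' \
      --output text | tr '\t' '\n' | sort | awk -v t="$itype" 'NF { print t, $1 }'
  done
}
```

For example: list_offering_azs m5.xlarge m5a.xlarge m6i.xlarge. A type that prints no lines isn't offered in any AZ of the region you're pointed at.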
Further reading
- AWS docs: On-Demand Capacity Reservations (guaranteed capacity in specific AZs)
- AWS docs: EC2 Capacity Blocks for ML (time-boxed GPU instance reservations)
- AWS docs: Auto Scaling mixed instances policy (building resilient fleets)

