### The Provisioner Trap

You've finally nailed the Terraform configuration. You hit terraform apply, watch the instance spin up, and then... silence. The terminal hangs for five minutes before failing with a familiar error:

    Error: timeout - last error: dial tcp 10.0.1.5:22: connect: connection refused

This usually means your Terraform runner (whether it's your laptop, a GitHub Actions agent, or a CI/CD pipeline) cannot reach the SSH port. It's a frustrating roadblock, but usually easy to clear. Here is how to debug it without losing your mind.

### TL;DR: The 60-Second Audit

- Security Groups: Is port 22 open for your specific IP? Check the ingress rules.
- IP Selection: Are you trying to hit a private IP from the public internet? Switch the host to public_ip.
- Boot Lag: The VM might be 'running' in the AWS console, but sshd might still be initializing.
- Usernames: Are you using ubuntu for an Amazon Linux AMI? (It should be ec2-user.)

### 1. The Routing Gap: Public vs. Private IPs

Look closely at the IP in your error: 10.0.1.5. That is a private address inside the VPC, so if you are running Terraform from a local machine, you can't route traffic to it. Terraform dials whatever the connection block's host resolves to, and a host set to self.private_ip (or copied from a config that ran inside the VPC) gives you the internal address.

The remote-exec provisioner needs a direct line of sight. Unless you are on a VPN or using a bastion host, you must use the instance's public IP.

The Solution: Force the connection block to use the public IP attribute.

    resource "aws_instance" "web" {
      # ... configuration ...

      provisioner "remote-exec" {
        connection {
          type        = "ssh"
          user        = "ubuntu"
          private_key = file("~/.ssh/deploy_key")
          host        = self.public_ip # Don't leave this to chance
        }

        inline = ["sudo apt-get update"]
      }
    }

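
If the instance has no public IP at all and you reach it through a bastion host, the connection block can tunnel through that bastion instead. The following is a minimal sketch, not a definitive setup: the aws_instance.bastion resource, the private_web name, and the reuse of the same key for both hops are assumptions made for illustration.

```hcl
# Sketch only: aws_instance.bastion and the key path are hypothetical.
resource "aws_instance" "private_web" {
  # ... configuration ...

  provisioner "remote-exec" {
    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/deploy_key")
      host        = self.private_ip # only reachable from inside the VPC

      # Terraform connects to the bastion first, then hops to host.
      bastion_host        = aws_instance.bastion.public_ip
      bastion_user        = "ubuntu"
      bastion_private_key = file("~/.ssh/deploy_key")
    }

    inline = ["sudo apt-get update"]
  }
}
```

With this layout, port 22 on the private instance only needs to be open to the bastion, while the bastion's port 22 must be open to wherever Terraform runs.
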

### 2. Security Groups: The Invisible Wall

Even with the correct IP, a timeout suggests a firewall is silently dropping your packets. A connection refused, however, means you reached the host and it actively rejected the connection, typically because nothing is listening on port 22 yet. Most cloud providers default to 'deny all' for inbound traffic.

Your Next Step: Verify that your Security Group (AWS) or Network Security Group (Azure) allows TCP port 22. For testing, you might use 0.0.0.0/0, but for production, restrict this to your specific IP range (e.g., 203.0.113.5/32).

    resource "aws_security_group" "ssh_access" {
      ingress {
        from_port   = 22
        to_port     = 22
        protocol    = "tcp"
        cidr_blocks = ["203.0.113.5/32"] # example address; replace with your own IP
      }
    }

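
A rule only helps if the security group is actually attached to the instance. The sketch below shows one way to wire it up; the admin_cidr variable and the ssh-access name are hypothetical, and the egress block is included on the assumption that the instance needs outbound access for package updates.

```hcl
# Hypothetical variable: the CIDR block allowed to reach port 22.
variable "admin_cidr" {
  type        = string
  description = "CIDR block allowed to SSH in, e.g. 203.0.113.5/32"

  validation {
    # cidrhost() errors on a malformed CIDR; can() turns that error into false.
    condition     = can(cidrhost(var.admin_cidr, 0))
    error_message = "The admin_cidr value must be a valid CIDR block, such as 203.0.113.5/32."
  }
}

resource "aws_security_group" "ssh_access" {
  name = "ssh-access"

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = [var.admin_cidr]
  }

  # Terraform removes the allow-all egress rule AWS adds by default,
  # so declare it explicitly if the instance must reach the internet.
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_instance" "web" {
  # ... configuration ...

  # Without this, the instance falls back to the VPC's default security group.
  vpc_security_group_ids = [aws_security_group.ssh_access.id]
}
```

Passing the CIDR as a variable keeps personal IP addresses out of version control, and the validation block catches a typo at plan time rather than halfway through an apply.
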

### 3. The Race Condition: Boot Time Realities

Cloud APIs are fast; Linux boots are slower. An AWS t3.micro might report as 'Running' within 20 seconds, but cloud-init and sshd often need another 40 to 60 seconds to fully start. If Terraform attempts to connect too early, every attempt is refused until sshd is listening, and Terraform gives up once the connection timeout (5 minutes by default) runs out.

How to Fix it: Increase the connection timeout to give the OS room to breathe. Since 5m is already the default, raising it to 10m is usually safe for most standard images.

    connection {
      type    = "ssh"
      user    = "ec2-user"
      host    = self.public_ip
      timeout = "10m"
    }

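
A longer timeout gets you past sshd startup, but cloud-init may still be running when the first command executes. One possible way to close that gap is to make the provisioner wait for cloud-init before doing anything else. This is a sketch that assumes the image ships a cloud-init version supporting status --wait and that the block sits inside an aws_instance resource; the key path is a placeholder.

```hcl
# Fragment: place inside the aws_instance resource being provisioned.
provisioner "remote-exec" {
  connection {
    type        = "ssh"
    user        = "ec2-user"
    private_key = file("~/.ssh/deploy_key") # placeholder path
    host        = self.public_ip
    timeout     = "10m"
  }

  inline = [
    # Blocks until cloud-init reports that it has finished.
    "cloud-init status --wait",
    "sudo yum update -y",
  ]
}
```

Putting the wait first means later commands do not compete with first-boot package operations that cloud-init may still be running.
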

### 4. Local Firewalls (UFW/Firewalld)

Some hardened AMIs come with internal firewalls enabled. Even if the cloud-level Security Group is wide open, ufw (on Ubuntu) or firewalld (on RHEL) might be blocking port 22.

Run a manual check to rule out Terraform-specific issues:

    ssh -i ~/.ssh/key.pem user@1.2.3.4

If this manual command fails, your problem is the network or the OS, not your Terraform code.
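
To avoid retyping the address for that manual check, an output can assemble the command for you. This is only a sketch: the ssh_command output name, the ubuntu user, and the key path are assumptions, and it relies on an aws_instance.web resource like the one shown earlier.

```hcl
# Hypothetical output: prints a ready-to-run SSH command for the instance.
output "ssh_command" {
  description = "Manual SSH check against the instance Terraform created"
  value       = "ssh -i ~/.ssh/deploy_key ubuntu@${aws_instance.web.public_ip}"
}
```

After a successful apply (for example, with the provisioner temporarily commented out), terraform output -raw ssh_command prints the command so you can paste it straight into a shell.
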

### The Pro Move: Ditch Provisioners

HashiCorp considers provisioners a 'last resort.' Terraform cannot model what a provisioner does as part of a plan, and a failed provisioner leaves the resource tainted, which makes them brittle. If network flakiness is a recurring theme, move your setup logic into user_data.

Cloud-init runs locally on the machine. It doesn't need an SSH connection from your laptop, which makes it far more reliable for bootstrapping instances.

    resource "aws_instance" "web" {
      ami           = "ami-0c55b159cbfafe1f0"
      instance_type = "t3.micro"

      user_data = <<-EOF
      #!/bin/bash
      yum update -y
      yum install -y httpd
      systemctl start httpd
      EOF
    }

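
One caveat with user_data is that cloud-init runs the script only on an instance's first boot by default, so editing the script does not re-run it on a machine that already exists. The sketch below assumes a recent AWS provider that supports the user_data_replace_on_change argument; it asks Terraform to recreate the instance whenever the script changes.

```hcl
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"

  # Assumption: the provider version in use supports this argument.
  # Recreates the instance when user_data changes so the new script runs.
  user_data_replace_on_change = true

  user_data = <<-EOF
  #!/bin/bash
  yum update -y
  yum install -y httpd
  # Start httpd now and on every subsequent boot.
  systemctl enable --now httpd
  EOF
}
```

Replacement destroys the old instance, so this suits stateless servers; anything holding data you care about needs a different update path.
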

