Fixing Terraform remote-exec SSH Timeouts and Connection Errors

### The Provisioner Trap

You’ve finally nailed the Terraform configuration. You hit `terraform apply`, watch the instance spin up, and then... silence. The terminal hangs for five minutes before crashing with a familiar failure:

Error: timeout - last error: dial tcp 10.0.1.5:22: connect: connection refused

This means your Terraform runner (your laptop, a GitHub Actions runner, or any other CI/CD pipeline agent) cannot complete a TCP connection to port 22 on the instance. It’s a frustrating roadblock, but usually easy to clear. Here is how to debug it without losing your mind.

### TL;DR: The 60-Second Audit

- Security Groups: Is port 22 open to your runner's IP? Check the ingress rules.
- IP Selection: Are you trying to reach a private IP from the public internet? Point `host` at `self.public_ip`.
- Boot Lag: The VM might show as 'running' in the AWS console while `sshd` is still initializing.
- Usernames: Are you using `ubuntu` for an Amazon Linux AMI? (It should be `ec2-user`.)

### 1. The Routing Gap: Public vs. Private IPs

Look closely at the IP in your error: `10.0.1.5`. That is a private (RFC 1918) address, so a runner outside the VPC, such as your laptop, has no route to it. Since Terraform 0.12 the `connection` block requires an explicit `host`, so this almost always means `host` is set to the private address (for example, `self.private_ip`).

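The `remote-exec` provisioner opens its SSH connection directly from the machine running Terraform. Unless that machine is on a VPN or you route through a bastion host, `host` must be the instance's public IP, and the instance must actually have one (a public subnet, or `associate_public_ip_address = true`).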

The Solution: Force the connection block to use the public IP attribute.

resource "aws_instance" "web" {
  # ... configuration ...

  provisioner "remote-exec" {
    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/deploy_key")
      host        = self.public_ip # Don't leave this to chance
    }

    inline = ["sudo apt-get update"]
  }
}
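If the instance has to stay in a private subnet, the `connection` block can hop through a bastion instead of requiring a public IP on the target. A minimal sketch, assuming a hypothetical `aws_instance.bastion` that has a public IP, can reach the target on port 22, and accepts the same key:

```hcl
resource "aws_instance" "app" {
  # ... configuration (private subnet, no public IP) ...

  provisioner "remote-exec" {
    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/deploy_key")
      host        = self.private_ip # reachable from the bastion, not from your runner

      # Hypothetical jump host; it must accept SSH from your runner
      bastion_host        = aws_instance.bastion.public_ip
      bastion_user        = "ubuntu"
      bastion_private_key = file("~/.ssh/deploy_key")
    }

    inline = ["sudo apt-get update"]
  }
}
```

For this to work, the bastion's security group must allow port 22 from your runner, and the private instance's group must allow port 22 from the bastion.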

### 2. Security Groups: The Invisible Wall

Even with the correct IP, a `timeout` suggests a firewall is silently dropping your packets. A `connection refused`, by contrast, means your packets reached the host but nothing accepted them on port 22: either `sshd` isn't listening yet or a host firewall is actively rejecting the connection. Most cloud providers deny all inbound traffic by default.

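Your Next Step: Verify that your Security Group (AWS) or Network Security Group (Azure) allows inbound TCP on port 22, and that the group is actually attached to the instance (`vpc_security_group_ids` on `aws_instance`). For quick testing you might open `0.0.0.0/0`, but in production restrict the rule to your runner's address range (e.g., `203.0.113.5/32`).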

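resource "aws_security_group" "ssh_access" {
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["203.0.113.5/32"] # replace with your runner's public IP
  }
  # Terraform removes AWS's default allow-all egress rule, so restore it
  # or provisioning steps like apt-get can't reach package mirrors.
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}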
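CI runners rarely keep a fixed address, so a hard-coded `/32` goes stale. One option is to look up the runner's public IP at plan time with the `hashicorp/http` provider and attach the group to the instance explicitly. A sketch, assuming http provider v3 or later (which exposes `response_body`) and that `checkip.amazonaws.com` is reachable from the runner; resource names are illustrative:

```hcl
terraform {
  required_providers {
    aws  = { source = "hashicorp/aws" }
    http = { source = "hashicorp/http", version = ">= 3.0" }
  }
}

# Returns the caller's public IPv4 address followed by a newline.
data "http" "runner_ip" {
  url = "https://checkip.amazonaws.com"
}

resource "aws_security_group" "ssh_from_runner" {
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["${chomp(data.http.runner_ip.response_body)}/32"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_instance" "web" {
  # ... ami, instance_type, subnet_id ...

  # Without this, the instance gets the VPC's default group instead.
  vpc_security_group_ids = [aws_security_group.ssh_from_runner.id]
}
```

Because the data source is re-read on every plan, a runner with a new address will show an in-place update to the ingress rule; that is expected.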

### 3. The Race Condition: Boot Time Realities

Cloud APIs are fast; operating systems boot more slowly. An AWS `t3.micro` might report as 'Running' within 20 seconds, while `cloud-init` and `sshd` often need another 40 to 60 seconds to start. Until then, connection attempts are refused or time out, and Terraform keeps retrying only until the connection `timeout` (5 minutes by default) expires.

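How to Fix It: Raise the connection `timeout` to give slower images room to finish booting. The default is already 5 minutes, so `10m` is a reasonable ceiling for most standard images; if the connection still fails after that, the problem is more likely the network than boot time.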

connection {
  type        = "ssh"
  user        = "ec2-user"
  private_key = file("~/.ssh/deploy_key")
  host        = self.public_ip
  timeout     = "10m" # default is 5m
}
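A longer timeout only covers the wait for `sshd`. On images that run `cloud-init`, the first SSH session can still race the boot-time work (for example, `apt-get` colliding with a package install that `user_data` started). A sketch that blocks until cloud-init reports it has finished, assuming the image ships cloud-init (stock Ubuntu and Amazon Linux cloud images do) and that the login user has passwordless `sudo`:

```hcl
resource "aws_instance" "web" {
  # ... configuration ...

  provisioner "remote-exec" {
    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/deploy_key")
      host        = self.public_ip
      timeout     = "10m"
    }

    inline = [
      # Wait for cloud-init's boot stages to finish before touching packages
      "sudo cloud-init status --wait",
      "sudo apt-get update",
    ]
  }
}
```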

### 4. Local Firewalls (UFW/firewalld)

Some hardened AMIs ship with a host firewall enabled. Even if the cloud-level Security Group is wide open, `ufw` (on Ubuntu) or `firewalld` (on RHEL) might be blocking port 22.

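Run the same connection by hand, from the same machine that runs Terraform, to rule out Terraform-specific issues:

ssh -v -i ~/.ssh/deploy_key ubuntu@<instance-public-ip>

If this manual command fails, your problem is the network or the OS, not your Terraform code. The `-v` output shows how far the connection gets: if it reaches authentication and then fails with `Permission denied (publickey)`, the network path is fine and the culprit is the username or key.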
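Since `cloud-init` doesn't depend on SSH, one workaround when a host firewall is the culprit is to allow port 22 from `user_data` and let Terraform's retries connect once that has run. A minimal sketch, assuming the image ships `ufw` or `firewalld` and that your security policy permits allowing SSH on the host; it only touches whichever tool is present:

```hcl
resource "aws_instance" "web" {
  # ... ami, instance_type, networking ...

  user_data = <<-EOF
    #!/bin/bash
    # Ubuntu-style images: add an allow rule for SSH if ufw is installed
    if command -v ufw >/dev/null 2>&1; then
      ufw allow 22/tcp
    fi
    # RHEL-style images: open the ssh service if firewalld is running
    if systemctl is-active --quiet firewalld; then
      firewall-cmd --permanent --add-service=ssh
      firewall-cmd --reload
    fi
    EOF
}
```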

### The Pro Move: Ditch Provisioners

HashiCorp's documentation calls provisioners a last resort. Terraform can't model what they do in a plan or track it in state, which makes them brittle. If network flakiness keeps coming back, move your setup logic into `user_data`.

`cloud-init` runs the `user_data` script on the instance itself during boot. It doesn't need an SSH connection from your runner, so every network failure described above simply drops out of the bootstrapping path.

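data "aws_ami" "al2023" {
  most_recent = true
  owners      = ["amazon"]
  filter {
    name   = "name"
    values = ["al2023-ami-2023.*-x86_64"]
  }
}

resource "aws_instance" "web" {
  # The script uses yum, so look up an Amazon Linux AMI instead of hard-coding a region-specific ID
  ami           = data.aws_ami.al2023.id
  instance_type = "t3.micro"
  user_data     = <<-EOF
    #!/bin/bash
    yum update -y
    yum install -y httpd
    systemctl enable --now httpd
    EOF
}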
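By default `user_data` runs only on an instance's first boot, so editing the script later won't re-run it on an existing instance. Below is the same bootstrap written as `#cloud-config` YAML, paired with the AWS provider's `user_data_replace_on_change` argument (available in recent provider versions) so that a change recreates the instance. The `ami_id` variable is a hypothetical input; it must point at an Amazon Linux AMI, because `httpd` is the Red Hat-family package name:

```hcl
variable "ami_id" {
  description = "Hypothetical input: an Amazon Linux AMI ID for your region"
  type        = string
}

resource "aws_instance" "web" {
  ami           = var.ami_id
  instance_type = "t3.micro"

  # Replace the instance when user_data changes instead of updating it in place
  user_data_replace_on_change = true

  # cloud-init picks the distro's package manager for the packages module
  user_data = <<-EOF
    #cloud-config
    package_update: true
    packages:
      - httpd
    runcmd:
      - systemctl enable --now httpd
    EOF
}
```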

### Final Checklist

- Subnet: Is the instance in a private subnet? If so, you'll need a bastion host or VPN.
- Key Permissions: Ensure your private key is protected (`chmod 400`); OpenSSH refuses to use a key that other users can read, which breaks the manual `ssh` test.
- Username: Double-check the AMI's default user. Using `root` or `admin` when the OS expects `ubuntu` will fail every time.
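Hard-coding `user` in every `connection` block is how the Ubuntu/Amazon Linux mix-up slips in. One way to keep it in a single place is a lookup keyed by OS family. A sketch: `os_family` is a hypothetical variable, and the map lists only the stock AWS image defaults for the distributions mentioned here plus Debian:

```hcl
variable "os_family" {
  description = "Hypothetical input: which stock image family the AMI belongs to"
  type        = string
  default     = "ubuntu"

  validation {
    condition     = contains(["ubuntu", "amazon_linux", "rhel", "debian"], var.os_family)
    error_message = "os_family must be one of: ubuntu, amazon_linux, rhel, debian."
  }
}

locals {
  # Default login users for stock AWS images
  default_ssh_user = {
    ubuntu       = "ubuntu"
    amazon_linux = "ec2-user"
    rhel         = "ec2-user"
    debian       = "admin"
  }
}

resource "aws_instance" "web" {
  # ... configuration ...

  provisioner "remote-exec" {
    connection {
      type        = "ssh"
      user        = local.default_ssh_user[var.os_family]
      private_key = file("~/.ssh/deploy_key")
      host        = self.public_ip
    }

    inline = ["echo connected"]
  }
}
```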

Related Error Notes

Fixing the 'Provider Plugin Crashed' Error in Terraform

Fixing the Terraform "Saved plan is stale" Error in CI/CD Pipelines

Fix Terraform 'Cannot import non-existent remote object' Error When Importing Resources