Fixing the 'Essential container in task exited' Error in AWS ECS Fargate

Intermediate | AWS | 2026-06-26 | AWS ECS Fargate, Docker, AWS Task Definitions

## Error Message

`Essential container in task exited. Exit Code: 1 (or 137, 127)`

Tags: #aws-ecs #fargate #docker #troubleshooting

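## The Problem

You hit deploy on your ECS service. Instead of a running application, your tasks flip to a STOPPED state within seconds, and the AWS Console shows only a vague, frustrating message: `Essential container in task exited`. It means the primary container (the one marked `essential` in the task definition) stopped. Usually it crashed or was killed shortly after startup, and ECS stops the entire task whenever an essential container exits. Failed health checks can also stop a task, but they usually report a different stopped reason, such as `Task failed container health checks`.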

The real answer is hidden in the Exit Code. If your container crashes before the awslogs driver can even ship logs to CloudWatch, you might feel like you are debugging in the dark. Understanding these codes is the fastest way to get your service back online.
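Exit codes above 128 follow the shell convention of 128 plus the number of the signal that killed the process. That is why 137 points to `SIGKILL` (9). The helper below is a small sketch that turns a code from the task details into a hint. The hint strings are this article's rules of thumb, not output from ECS, and the signal names assume Linux signal numbering.

```python
import signal

# Rules of thumb for common container exit codes (hints, not ECS output).
KNOWN_CODES = {
    0: "exited normally; an essential container that exits still stops the task",
    1: "generic application error; check logs, environment variables, and dependencies",
    126: "command found but not executable; check file permissions",
    127: "command not found; check ENTRYPOINT/CMD paths and line endings",
}

SIGKILL_NUMBER = 9  # Linux signal number for SIGKILL


def describe_exit_code(code: int) -> str:
    """Return a short hint for a container exit code."""
    if code in KNOWN_CODES:
        return KNOWN_CODES[code]
    if code > 128:
        # Shell convention: 128 + signal number when a signal terminated the process.
        sig_num = code - 128
        try:
            sig_name = signal.Signals(sig_num).name
        except ValueError:
            sig_name = f"signal {sig_num}"
        hint = " (often an out-of-memory kill on Fargate)" if sig_num == SIGKILL_NUMBER else ""
        return f"killed by {sig_name}{hint}"
    return "application-defined exit code"
```

For example, `describe_exit_code(137)` returns `"killed by SIGKILL (often an out-of-memory kill on Fargate)"`, and `describe_exit_code(143)` returns `"killed by SIGTERM"`.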

## Common Exit Codes and Their Meanings

- **Exit Code 1:** A generic application crash. This usually stems from a runtime error, a missing environment variable, or a failed database connection.
- **Exit Code 127:** Command not found. Your `ENTRYPOINT` or `CMD` points to a script or binary that doesn't exist inside the container image, or that can't be found on the `PATH`.
- **Exit Code 137:** The process was killed with `SIGKILL` (128 + 9). On Fargate this is most often an out-of-memory (OOM) kill: the container tried to use more memory than the limit you set in the task definition (for example, 512 MB or 1 GB). Check the container's Reason field for `OutOfMemoryError`. The same code also appears when a container ignores `SIGTERM` and is force-killed after the stop timeout.

## How to Find the Root Cause

Don't waste time on the high-level service events, because they rarely include the specific error. Instead, drill down into the individual failed task:

- Open the ECS cluster and select your service.
- Navigate to the **Tasks** tab and change the filter to **Stopped**.
- Click the task ID of the most recent failure.
- Note the task-level **Stopped reason**, then expand the **Containers** section and look specifically at the **Exit Code** and **Reason** fields.

## Proven Solutions

### 1. Resolving Exit Code 1: Application Runtime Crashes

Most Exit Code 1 errors are configuration issues. If your Node.js app expects a `DB_HOST` variable but finds `undefined`, it will likely crash during the bootstrap phase.
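One way to turn a silent bootstrap crash into a readable log line is to validate required settings before anything else runs, then exit with a clear message. The sketch below is in Python rather than Node.js, but the same pattern works in any runtime. Apart from `DB_HOST`, the variable list is hypothetical.

```python
import os
import sys
from typing import Iterable, List, Mapping

# Hypothetical settings this service cannot start without (DB_HOST is from the example above).
REQUIRED_VARS = ("DB_HOST", "DB_PASSWORD")


def missing_env_vars(required: Iterable[str], env: Mapping[str, str]) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


def check_environment_or_exit() -> None:
    """Exit with code 1 and a readable message if any required variable is missing."""
    missing = missing_env_vars(REQUIRED_VARS, os.environ)
    if missing:
        # With the awslogs driver configured, stderr ends up in CloudWatch Logs.
        print(
            f"FATAL: missing required environment variables: {', '.join(missing)}",
            file=sys.stderr,
            flush=True,
        )
        sys.exit(1)
```

Call `check_environment_or_exit()` as the first step of the entrypoint. The task still stops with exit code 1. The difference is that the CloudWatch log stream now names the missing variable, instead of showing a stack trace from deep inside a database client.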
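- **Review CloudWatch Logs:** Look for errors such as `Module not found`, or for stack traces. If the log stream is empty or missing, one of two things happened: the process died before printing anything, or the container definition has no `awslogs` log configuration.
- **Verify Secrets Manager Permissions:** This is a common pitfall. When secrets are injected through the container definition's `secrets` field, the **Task Execution Role** (not the Task Role) needs `secretsmanager:GetSecretValue`. It also needs `kms:Decrypt` if the secret is encrypted with a customer managed KMS key. If that permission is missing, the task usually stops with a `ResourceInitializationError` before your code runs. If your application instead fetches the secret itself at runtime through the AWS SDK, the **Task Role** needs the permission, and a denial surfaces as an application crash (Exit Code 1).

A minimal policy for the Task Execution Role looks like this. Replace the `<...>` placeholders with your own values, and keep the KMS statement only if the secret uses a customer managed key:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadAppDbSecret",
      "Effect": "Allow",
      "Action": "secretsmanager:GetSecretValue",
      "Resource": "arn:aws:secretsmanager:<region>:<account-id>:secret:app-db-creds-*"
    },
    {
      "Sid": "DecryptWithCustomerManagedKey",
      "Effect": "Allow",
      "Action": "kms:Decrypt",
      "Resource": "arn:aws:kms:<region>:<account-id>:key/<key-id>"
    }
  ]
}
```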

### 2. Resolving Exit Code 127: Path and Permission Errors

This error occurs when the container starts but the shell cannot find your startup script or binary. It is particularly common when you override `command` or `entryPoint` in the task definition.
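Two of the usual culprits, Windows line endings and a missing execute bit, can be caught before the image is built. The following is a minimal sketch that assumes your scripts live in the Docker build context. `check_entrypoint_script` and `normalize_line_endings` are illustrative helpers, not part of any tool. The second one does roughly what `dos2unix` does for CRLF files.

```python
import stat
from pathlib import Path
from typing import List


def check_entrypoint_script(path: Path) -> List[str]:
    """Return problems that commonly make a startup script fail with 127 or 126."""
    if not path.is_file():
        return [f"{path} does not exist"]
    problems = []
    data = path.read_bytes()
    if b"\r\n" in data:
        # A CRLF shebang names an interpreter like '/bin/sh\r', which does not exist.
        problems.append("CRLF line endings found; convert to LF (e.g. with dos2unix)")
    # Only the owner bit is checked here; a non-root container user may need more.
    if not path.stat().st_mode & stat.S_IXUSR:
        problems.append("owner execute bit is not set; add chmod +x")
    return problems


def normalize_line_endings(path: Path) -> None:
    """Rewrite CRLF as LF in place, roughly what dos2unix does for this case."""
    path.write_bytes(path.read_bytes().replace(b"\r\n", b"\n"))
```

Running the check over every script referenced by `ENTRYPOINT`, `CMD`, or the task definition's `command` catches these problems locally instead of in a stopped task.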
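- **Use Absolute Paths:** Instead of `./start.sh`, use `/usr/src/app/start.sh`. A relative path resolves against the container's working directory, which may not be the directory an overridden command expects.
- **Fix Line Endings:** Scripts written on Windows often use `CRLF` line endings, but Linux expects `LF`. The trailing carriage return turns the shebang into an interpreter name such as `/bin/sh\r`, which doesn't exist. So if you see `sh: /start.sh: not found` even though the file clearly exists, line endings are the likely culprit. Run `dos2unix` on your scripts before building the image.
- **Set Permissions:** Ensure your Dockerfile includes `RUN chmod +x /path/to/script.sh`. A missing execute bit usually shows up as exit code 126 (permission denied) rather than 127, so check it alongside the path.

### 3. Resolving Exit Code 137: Memory Constraints

Fargate is less forgiving than local Docker, which by default sets no per-container memory limit. On Fargate, the memory in the task definition is a hard limit. If a Java Spring Boot app spikes to 2.1 GB during startup on a 2 GB task, the container is killed immediately with exit code 137.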
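Before you bump the task size, it helps to check the arithmetic. The sketch below encodes the Fargate CPU/memory combinations for 0.25 to 4 vCPU as they appear in the AWS task-size table; verify them against the current documentation before relying on them. It also estimates the heap that `-XX:MaxRAMPercentage` would allow, assuming the JVM sees the full task memory as its container limit.

```python
from typing import Dict, List, Tuple

# Fargate memory options (MiB) for each CPU size (CPU units), as listed in the
# AWS Fargate task-size table for 0.25 to 4 vCPU. Verify against current docs.
FARGATE_MEMORY_MIB: Dict[int, List[int]] = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}


def is_valid_fargate_size(cpu_units: int, memory_mib: int) -> bool:
    """True if the CPU/memory pair is one of the combinations listed above."""
    return memory_mib in FARGATE_MEMORY_MIB.get(cpu_units, [])


def jvm_heap_budget(memory_mib: int, max_ram_percentage: float = 75.0) -> Tuple[int, int]:
    """Approximate (max heap, non-heap headroom) in MiB for -XX:MaxRAMPercentage.

    Assumes the JVM sees the full task memory as its container limit.
    """
    heap = int(memory_mib * max_ram_percentage / 100)
    return heap, memory_mib - heap
```

For the 2 GB example, `jvm_heap_budget(2048)` returns `(1536, 512)`: a 1536 MiB heap, leaving 512 MiB for metaspace, thread stacks, and other native memory. `is_valid_fargate_size(256, 4096)` is `False`, so doubling to 4 GB on a 0.25 vCPU task also means raising CPU to at least 512 units.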
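- **Bump Task Size:** Try doubling your memory allocation (e.g., from 2 GB to 4 GB) to see if the task stabilizes. Fargate only accepts specific CPU/memory combinations, so you may need to raise CPU as well.
- **JVM Tuning:** For Java applications, use `-XX:MaxRAMPercentage=75.0`. Container-aware JVMs (Java 10+ and 8u191+) already size the heap from the container's memory limit rather than the host's, but by default the maximum heap is only 25% of that limit. This flag caps the heap at 75% of the limit and leaves headroom for metaspace, thread stacks, and other native memory.

### 4. Networking Failures (The "No Logs" Scenario)

If your task stops with `ResourceInitializationError: unable to pull secrets or registry auth`, the container never actually ran, so there is no exit code and no application log. This usually means the task has no network path from your VPC to ECR or Secrets Manager. It can also be an IAM problem, and the full error message usually says which.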
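For the private-subnet case, one quick way to spot a missing endpoint is to compare the endpoint service names in your VPC against the ones a Fargate task typically needs to pull from ECR, ship logs, and read secrets. The required list in this sketch is an assumption based on the standard `com.amazonaws.<region>.<service>` naming. You could feed it the output of `aws ec2 describe-vpc-endpoints --filters Name=vpc-id,Values=<vpc-id> --query "VpcEndpoints[].ServiceName"`.

```python
from typing import Iterable, List


def missing_endpoints(region: str, configured: Iterable[str], uses_secrets_manager: bool = True) -> List[str]:
    """Return endpoint service names a private-subnet Fargate task typically needs but lacks."""
    present = set(configured)
    required = [
        f"com.amazonaws.{region}.ecr.api",  # ECR API calls such as GetAuthorizationToken
        f"com.amazonaws.{region}.ecr.dkr",  # Docker Registry API used to pull images
        f"com.amazonaws.{region}.s3",  # gateway endpoint; ECR stores image layers in S3
        f"com.amazonaws.{region}.logs",  # CloudWatch Logs, used by the awslogs driver
    ]
    if uses_secrets_manager:
        required.append(f"com.amazonaws.{region}.secretsmanager")
    return [name for name in required if name not in present]
```

Having the endpoints is necessary but not sufficient. Their security groups must allow HTTPS from the tasks, and the interface endpoints need private DNS enabled so the default service hostnames resolve to them.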
- **Public Subnets:** Set **Auto-assign Public IP** to `ENABLED` so the task can reach ECR over the internet through the subnet's internet gateway.
- **Private Subnets:** Make sure the subnet routes outbound traffic through a NAT Gateway. Alternatively, configure VPC endpoints so traffic stays within the AWS network. You need interface endpoints for ECR (`ecr.api` and `ecr.dkr`), for CloudWatch Logs, and for Secrets Manager if you use it. You also need a gateway endpoint for S3, where ECR stores image layers.

## Verification and Deployment

After applying a fix, such as updating an IAM policy or increasing RAM, deploy the changes so that new tasks pick them up:
- Create a **New Revision** of your Task Definition. If you only changed an IAM policy, the existing revision is fine; select **Force new deployment** when you update the service instead.
- Update the Service to use this latest revision.
- Watch the **Tasks** tab. A successful deployment moves from `PROVISIONING` through `PENDING` to `RUNNING` and stays there.
- If you defined a container health check, confirm the **Health Status** shows `HEALTHY`. For services behind a load balancer, also confirm that the targets are healthy in the target group.

## Summary Checklist

- Check the **Stopped Reason** in the Task details, not just the Service events.
- Differentiate between the **Task Execution Role** (pulling images and injecting secrets) and the **Task Role** (permissions for your application code).
- Always use absolute paths in Docker commands to prevent 127 errors.
- Configure `awslogs` immediately so you can capture stdout/stderr during those critical first seconds of execution.
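You can also read the stopped reason and exit codes from the ECS `DescribeTasks` API instead of clicking through the console. The sketch below only formats a response that has already been fetched, so it runs without AWS credentials. The field names (`taskArn`, `stoppedReason`, `containers`, `exitCode`, `reason`) are those of the `describe_tasks` response. `summarize_stopped_tasks` itself is a hypothetical helper.

```python
from typing import Any, Dict, List


def summarize_stopped_tasks(response: Dict[str, Any]) -> List[str]:
    """Format one line per container from an ECS DescribeTasks response."""
    lines = []
    for task in response.get("tasks", []):
        # The task ID is the last path segment of the task ARN.
        task_id = task.get("taskArn", "?").rsplit("/", 1)[-1]
        stopped_reason = task.get("stoppedReason", "n/a")
        for container in task.get("containers", []):
            # exitCode is absent when the container never started (e.g. ResourceInitializationError).
            exit_code = container.get("exitCode", "none")
            reason = container.get("reason", "")
            lines.append(
                f"{task_id} {container.get('name', '?')}: exit={exit_code} "
                f"stopped={stopped_reason!r} reason={reason!r}"
            )
    return lines
```

With boto3, you would first create a client with `ecs = boto3.client("ecs")` and get the stopped task ARNs with `arns = ecs.list_tasks(cluster=..., serviceName=..., desiredStatus="STOPPED")["taskArns"]`. Pass those ARNs to `ecs.describe_tasks(cluster=..., tasks=arns)` to get the response. The cluster and service names are placeholders for your own.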
