## The Problem

Imagine you have a Lambda function designed to process a 500 MB CSV file or generate a complex quarterly report. You've wisely set the Lambda's internal timeout to 15 minutes to handle the load. However, the moment you run it via Step Functions, the task fails after exactly one minute. You see this error in your execution history:
```
States.HeartbeatTimeout: State machine execution failed: Task timed out after 60 seconds
```
This is frustrating because your Lambda might still be working perfectly fine in the background. The issue isn't that the Lambda crashed; it's that Step Functions gave up waiting.
## Why This Happens

Step Functions uses two distinct timers to manage tasks. Understanding the difference is key to a stable workflow.
- `TimeoutSeconds`: This is the hard ceiling. It defines the total time a task is allowed to run from start to finish. If a task hits this limit, Step Functions fails the state with a `States.Timeout` error.
- `HeartbeatSeconds`: Think of this as a "dead man's switch." It expects the task to report back periodically to prove it hasn't stalled. If the task doesn't send a heartbeat within this window, Step Functions assumes the worker died and fails the state.

The `States.HeartbeatTimeout` error triggers because your Amazon States Language (ASL) definition includes a `HeartbeatSeconds` value, but your Lambda isn't checking in. From the perspective of Step Functions, a standard Lambda invocation is a single request and response: the function doesn't automatically send heartbeats while it processes data.
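To make the interplay of the two timers concrete, here is a deliberately simplified Python model. It is not how Step Functions is implemented; `task_outcome` and its parameters are hypothetical names invented for this sketch. It only assumes that whichever event comes first (completion, the total timeout, or a missed heartbeat) decides the result.

```python
from typing import Optional, Sequence


def task_outcome(
    duration: float,
    timeout_seconds: Optional[float] = None,
    heartbeat_seconds: Optional[float] = None,
    heartbeats: Sequence[float] = (),
) -> str:
    """Simplified model of how the two timers decide a task's fate.

    duration: seconds the worker needs to finish.
    heartbeats: times (seconds since task start) at which the worker checks in.
    Returns "Succeeded", "States.Timeout", or "States.HeartbeatTimeout".
    """
    # Collect the moment each event would happen; the earliest one wins.
    events = [(duration, "Succeeded")]
    if timeout_seconds is not None:
        events.append((timeout_seconds, "States.Timeout"))
    if heartbeat_seconds is not None:
        last_seen = 0.0  # the heartbeat clock starts when the task starts
        for beat in sorted(heartbeats):
            if beat - last_seen > heartbeat_seconds:
                break  # the gap was too long; the timer fired before this beat
            last_seen = beat
        events.append((last_seen + heartbeat_seconds, "States.HeartbeatTimeout"))
    return min(events, key=lambda event: event[0])[1]


# A 15-minute job with a 60-second heartbeat and no check-ins fails at the 60-second mark.
print(task_outcome(900, timeout_seconds=900, heartbeat_seconds=60))  # States.HeartbeatTimeout
# The same job with only a timeout is allowed to finish.
print(task_outcome(840, timeout_seconds=900))  # Succeeded
```

The first call mirrors the failure described above: the Lambda would have finished, but the heartbeat timer fires long before it gets the chance.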
## Step-by-Step Fix

### Step 1: Audit Your ASL Definition

Locate the failing state in your workflow definition. Often, IDE plugins or snippets automatically include a 60-second heartbeat by default. It usually looks like this:

```json
"ProcessLargeData": {
  "Type": "Task",
  "Resource": "arn:aws:lambda:us-east-1:123456789012:function:data-processor",
  "HeartbeatSeconds": 60,
  "End": true
}
```
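In a larger workflow it can be quicker to scan the definition than to read it by eye. The sketch below walks an ASL definition and lists every Task state that sets `HeartbeatSeconds`. The helper name `find_heartbeat_states` is made up for this example, and the inline definition simply wraps the state shown above in a minimal state machine.

```python
import json
from typing import Iterator, Tuple


def find_heartbeat_states(states: dict, path: str = "") -> Iterator[Tuple[str, int]]:
    """Yield (state path, HeartbeatSeconds) for every Task state that sets a heartbeat."""
    for name, state in states.items():
        here = f"{path}/{name}" if path else name
        if state.get("Type") == "Task" and "HeartbeatSeconds" in state:
            yield here, state["HeartbeatSeconds"]
        # Parallel states nest complete sub-workflows under "Branches".
        for index, branch in enumerate(state.get("Branches", [])):
            yield from find_heartbeat_states(branch.get("States", {}), f"{here}[{index}]")
        # Map states nest one under "ItemProcessor" ("Iterator" in older definitions).
        for key in ("ItemProcessor", "Iterator"):
            if key in state:
                yield from find_heartbeat_states(state[key].get("States", {}), here)


definition = json.loads("""
{
  "StartAt": "ProcessLargeData",
  "States": {
    "ProcessLargeData": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:data-processor",
      "HeartbeatSeconds": 60,
      "End": true
    }
  }
}
""")

for state_path, seconds in find_heartbeat_states(definition["States"]):
    print(f"{state_path}: HeartbeatSeconds={seconds}")
```

To audit a real workflow, you would load your own definition file into `definition` instead of the inline string.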
### Step 2: Remove the Heartbeat Requirement

For the vast majority of Lambda tasks, you don't need heartbeats. Since Lambda handles its own compute lifecycle, Step Functions just needs to wait for the final response.

The Best Fix: Delete the `HeartbeatSeconds` line entirely. Instead, use `TimeoutSeconds` to define how long you are willing to wait for the result.

```json
"ProcessLargeData": {
  "Type": "Task",
  "Resource": "arn:aws:lambda:us-east-1:123456789012:function:data-processor",
  "TimeoutSeconds": 900,
  "End": true
}
```
In this example, we set the timeout to 900 seconds (15 minutes) to match the maximum execution time of a Lambda function.
### Step 3: Sync Your Lambda Configuration

A common pitfall is updating Step Functions but forgetting the Lambda settings. If your Step Function waits for 15 minutes but the Lambda is configured with a 30-second timeout, the Lambda will still fail. Use the AWS CLI to ensure they match:
```bash
aws lambda update-function-configuration \
  --function-name data-processor \
  --timeout 900
```
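If you would rather check the two values than trust that they line up, a small script can compare them. This is a sketch that assumes boto3 is installed and AWS credentials are configured. `compare_timeouts` is a hypothetical helper; the only AWS call it makes is the Lambda client's `get_function_configuration`, and the client is passed in so the function stays easy to test.

```python
def compare_timeouts(lambda_client, function_name: str, state_timeout_seconds: int) -> str:
    """Compare a Lambda's configured timeout with the state's TimeoutSeconds."""
    config = lambda_client.get_function_configuration(FunctionName=function_name)
    lambda_timeout = config["Timeout"]  # seconds, as configured on the function
    if lambda_timeout < state_timeout_seconds:
        return (
            f"Lambda stops after {lambda_timeout}s, "
            f"before the state's {state_timeout_seconds}s timeout"
        )
    if lambda_timeout > state_timeout_seconds:
        return (
            f"Step Functions gives up after {state_timeout_seconds}s, "
            f"before the Lambda's {lambda_timeout}s timeout"
        )
    return "Timeouts match"


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   print(compare_timeouts(boto3.client("lambda"), "data-processor", 900))
```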
## Verification

Test your changes by triggering a new execution. Open the Graph Inspector in the AWS Console and select the task. It should now stay in the "Running" state (blue) well past the previous 60-second mark. Finally, check your CloudWatch Logs to confirm the Lambda successfully returned its payload to the state machine.
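The same check can be scripted instead of watched in the console. The sketch below assumes a Standard workflow and polls the execution with the boto3 Step Functions client's `describe_execution` call until it leaves the `RUNNING` status. `wait_for_execution` and its parameters are hypothetical names, and the execution ARN is whatever your own start call returned.

```python
import time


def wait_for_execution(sfn_client, execution_arn: str, poll_seconds: float = 15, sleep=time.sleep) -> str:
    """Poll until the execution leaves RUNNING, then return its final status."""
    while True:
        result = sfn_client.describe_execution(executionArn=execution_arn)
        if result["status"] != "RUNNING":
            return result["status"]
        sleep(poll_seconds)


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   status = wait_for_execution(boto3.client("stepfunctions"), execution_arn)
#   print(status)
```

A final status of `SUCCEEDED` after more than a minute of polling suggests the heartbeat timer is no longer cutting the task short.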
## When Should You Actually Use Heartbeats?

Heartbeats aren't useless; they just aren't meant for simple Lambda calls. Use them in these two scenarios:
- Activity Workers: If you are running code on an EC2 instance or an on-premises server that polls Step Functions for work, heartbeats are essential to detect if that server goes offline.
- The Callback Pattern: If your task involves a human approval step or a job that takes hours, use `.waitForTaskToken`. In this case, if the state sets `HeartbeatSeconds`, your worker must periodically call the `SendTaskHeartbeat` API to keep the execution alive.

For Standard or Express workflows using Lambda, stick to `TimeoutSeconds` and keep it simple.
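For the callback pattern, one way a worker might keep the task alive is to send heartbeats from a background thread while the real job runs. This is a minimal sketch, not a production worker. `run_with_heartbeat` is a hypothetical helper, and it assumes a boto3 Step Functions client, a task token received from the state machine, and a heartbeat interval comfortably shorter than the state's `HeartbeatSeconds`.

```python
import json
import threading


def run_with_heartbeat(sfn_client, task_token: str, job, interval_seconds: float = 20.0):
    """Run job() while a background thread sends heartbeats, then report the result."""
    done = threading.Event()

    def beat():
        # wait() returns False on timeout (job still running) and True once done is set.
        while not done.wait(interval_seconds):
            sfn_client.send_task_heartbeat(taskToken=task_token)

    pulse = threading.Thread(target=beat, daemon=True)
    pulse.start()
    try:
        try:
            result = job()
        finally:
            # Stop the heartbeat before reporting, so no beat follows the final call.
            done.set()
            pulse.join()
    except Exception as exc:
        sfn_client.send_task_failure(
            taskToken=task_token, error=type(exc).__name__, cause=str(exc)
        )
        raise
    # The output must be a JSON string, so the job's result has to be serializable.
    sfn_client.send_task_success(taskToken=task_token, output=json.dumps(result))
    return result


# Example usage (requires boto3 and AWS credentials; process_file is a placeholder):
#   import boto3
#   run_with_heartbeat(boto3.client("stepfunctions"), token, lambda: process_file(path))
```

Passing the client in as an argument keeps the helper independent of how credentials are set up and makes it easy to exercise with a stub.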

