Fixing SQS Duplicate Messages and ReceiptHandleIsInvalidException

intermediate | ☁️ AWS | 2026-04-12 | AWS SQS, Java AWS SDK (v1/v2), Boto3 (Python), Node.js AWS SDK, distributed consumer environments.

Error Message

com.amazonaws.services.sqs.model.ReceiptHandleIsInvalidException: The input receipt handle is not a valid receipt handle (Service: AmazonSQS; Status Code: 404)

TL;DR: The Quick Fix

This error occurs when your consumer takes longer to process a message than the queue's VisibilityTimeout. SQS assumes your worker crashed, makes the message visible to other consumers, and invalidates the original ReceiptHandle.

The Fix: Bump your VisibilityTimeout to at least 1.5x your maximum processing time. For unpredictable jobs, use a "heartbeat" via the ChangeMessageVisibility API to keep the handle alive while the worker is still active.

Why This Happens: A Race Against the Clock

When a consumer grabs a message, SQS doesn't delete it. It hides it for a specific window. Your job is to finish the work and call DeleteMessage before that window closes. If you're too slow, the message "reappears" for others to grab.

The ReceiptHandleIsInvalidException is the result of a specific timeline:

  • T+0: Consumer A picks up a message. The queue has a 30-second VisibilityTimeout.
  • T+31: Consumer A is still processing a heavy task, like resizing a 50MB image. The 30-second window has already expired, so SQS assumes Consumer A is dead and makes the message visible again.
  • T+32: Consumer B picks up the same message. SQS issues a new ReceiptHandle, and the old one is now stale.
  • T+40: Consumer A finally finishes and tries to delete the message.
  • Result: SQS returns a 404. The handle is invalid because a newer one has been issued or the visibility window has expired.
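To make the timeline concrete, here is a toy Python model of a single message. It is not the SQS API: the ToyQueue class and the ReceiptHandleIsInvalid exception are hypothetical stand-ins, and the model assumes, as the timeline does, that only the most recently issued receipt handle is accepted. The real service has more edge cases, so treat this as an illustration of the race only.

```python
class ReceiptHandleIsInvalid(Exception):
    """Hypothetical stand-in for the SDK's ReceiptHandleIsInvalidException."""


class ToyQueue:
    """Toy model of ONE message in a queue (not the real SQS API)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self.visible_at = 0          # the message is visible from t=0
        self.current_handle = None   # assumption: only the latest handle is accepted
        self.deleted = False
        self._receives = 0

    def receive(self, now):
        """Return a fresh receipt handle, or None while hidden or after deletion."""
        if self.deleted or now < self.visible_at:
            return None
        self._receives += 1
        self.current_handle = f"handle-{self._receives}"
        self.visible_at = now + self.visibility_timeout  # start a new visibility window
        return self.current_handle

    def delete(self, handle):
        """Delete the message; any handle other than the latest one is rejected."""
        if handle != self.current_handle:
            raise ReceiptHandleIsInvalid(handle)
        self.deleted = True


# Replay the timeline with a 30-second VisibilityTimeout
q = ToyQueue(visibility_timeout=30)
handle_a = q.receive(now=0)    # T+0: Consumer A gets "handle-1"
handle_b = q.receive(now=32)   # T+32: the window closed at T+30, Consumer B gets "handle-2"
try:
    q.delete(handle_a)         # T+40: Consumer A finishes and tries to delete
except ReceiptHandleIsInvalid as exc:
    print("Consumer A failed:", exc)  # prints "Consumer A failed: handle-1"
q.delete(handle_b)             # Consumer B's newer handle still works
```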

Solutions That Actually Work

1. Adjust the Global Visibility Timeout

If your logic consistently takes 2 minutes but your timeout is only 30 seconds, your configuration is working against you. Adjust the queue settings to handle your slowest possible successful request.

Using AWS CLI:

aws sqs set-queue-attributes \
    --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/my-queue \
    --attributes VisibilityTimeout=300

Using Terraform:

resource "aws_sqs_queue" "my_queue" {
  name                       = "data-processor-queue"
  visibility_timeout_seconds = 300 # 5 minutes
}
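If you want to derive the number from the 1.5x rule in the TL;DR rather than guess, a small helper can do the arithmetic. This is a sketch that assumes you already know your worst-case processing time in seconds; visibility_attributes is a hypothetical name. It clamps to 43,200 seconds (12 hours), the largest visibility timeout SQS accepts, and returns a dict shaped for Boto3's set_queue_attributes, which takes attribute values as strings.

```python
import math

SQS_MAX_VISIBILITY_TIMEOUT = 43_200  # 12 hours: the largest value SQS accepts


def visibility_attributes(max_processing_seconds, factor=1.5):
    """Queue attributes for set_queue_attributes, sized with the 1.5x rule of thumb."""
    seconds = math.ceil(max_processing_seconds * factor)
    seconds = max(0, min(seconds, SQS_MAX_VISIBILITY_TIMEOUT))  # keep within SQS limits
    return {"VisibilityTimeout": str(seconds)}  # Boto3 takes attribute values as strings


print(visibility_attributes(200))  # {'VisibilityTimeout': '300'}

# Applying it (assumes a real queue URL and configured AWS credentials):
# import boto3
# boto3.client("sqs").set_queue_attributes(QueueUrl=queue_url, Attributes=visibility_attributes(200))
```

For a 200-second worst case this yields the same 300-second value used in the CLI and Terraform examples.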

2. Implement a Programmatic Heartbeat

Sometimes you don't know if a job will take 10 seconds or 10 minutes. Setting a massive 15-minute timeout globally is dangerous. If a worker actually crashes, that message stays invisible for 15 minutes, stalling your pipeline. Instead, extend the timeout dynamically.

Example in Java (AWS SDK v2):

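import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.ChangeMessageVisibilityRequest;

public class VisibilityHeartbeat {
    // Build the client once and reuse it across heartbeats
    private final SqsClient sqsClient = SqsClient.builder().build();

    // Call this every ~30 seconds while a long-running task is still in progress
    public void extend(String queueUrl, String receiptHandle) {
        ChangeMessageVisibilityRequest request = ChangeMessageVisibilityRequest.builder()
            .queueUrl(queueUrl)
            .receiptHandle(receiptHandle)
            .visibilityTimeout(60) // Hide the message for 60s counted from this call (a reset, not an addition)
            .build();
        sqsClient.changeMessageVisibility(request);
    }
}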

Example in Python (Boto3):

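import boto3

sqs = boto3.client('sqs')

def extend_lifetime(queue_url, receipt_handle, seconds=60):
    # Keep the message hidden for `seconds`, counted from this call
    # (this resets the window; it does not add to the time already left)
    sqs.change_message_visibility(
        QueueUrl=queue_url,
        ReceiptHandle=receipt_handle,
        VisibilityTimeout=seconds
    )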
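A single extend call only helps if something keeps calling it. One way to do that in Python is a background thread that calls change_message_visibility on a fixed interval until the work finishes. The sketch below assumes a Boto3 SQS client, though any object with the same change_message_visibility method works, which also makes it testable. run_with_heartbeat and its defaults (every 30 seconds, 60-second window) are illustrative choices, not an SQS feature.

```python
import threading


def run_with_heartbeat(sqs, queue_url, receipt_handle, work, interval=30, extension=60):
    """Run work() while a background thread keeps the message invisible.

    Every `interval` seconds the thread resets the visibility window to
    `extension` seconds from that moment, so interval must stay below extension.
    """
    stop = threading.Event()

    def beat():
        # wait() returns False on timeout (send a heartbeat) and True once stop is set
        while not stop.wait(interval):
            sqs.change_message_visibility(
                QueueUrl=queue_url,
                ReceiptHandle=receipt_handle,
                VisibilityTimeout=extension,
            )

    thread = threading.Thread(target=beat, daemon=True)
    thread.start()
    try:
        return work()
    finally:
        stop.set()      # stop heartbeating whether work() succeeded or raised
        thread.join()


# Usage (resize_image is a hypothetical long-running job):
# sqs = boto3.client("sqs")
# run_with_heartbeat(sqs, queue_url, msg["ReceiptHandle"], lambda: resize_image(msg))
# sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```

Keep interval comfortably below extension so a slow API call cannot let the window lapse. If a heartbeat call fails (for example because the handle already expired), the exception surfaces in the background thread and heartbeats stop; a production worker would log that and avoid deleting with a stale handle.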

3. Design for Idempotency

In distributed systems, you must assume a message might be processed twice. Even with perfect timeouts, standard queues only guarantee at-least-once delivery, so occasional duplicates still happen. Ensure your database updates use unique keys or check status flags (e.g., WHERE status != 'COMPLETED') before doing any heavy lifting.
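As a sketch of the unique-key approach, the helper below records each SQS MessageId in a table with a primary key and does the work in the same transaction, so a second delivery of the same message is skipped and a failed attempt is rolled back for retry. It uses SQLite only to stay self-contained; process_once and the processed_messages and orders tables are hypothetical, and the pattern carries over to any database with unique constraints. It relies on the MessageId staying the same across redeliveries, unlike the ReceiptHandle.

```python
import sqlite3


def process_once(conn, message_id, handler):
    """Run handler(conn) at most once per SQS MessageId (hypothetical helper).

    Expects: CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY)
    Returns True if the work ran, False if this MessageId was already handled.
    """
    with conn:  # one transaction: commit on success, roll back if handler raises
        try:
            conn.execute(
                "INSERT INTO processed_messages (message_id) VALUES (?)",
                (message_id,),
            )
        except sqlite3.IntegrityError:
            return False  # duplicate delivery: the primary key already exists
        handler(conn)  # do the real writes inside the same transaction
    return True


# Usage with an in-memory database and a hypothetical orders table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total INTEGER)")


def record_order(c):
    c.execute("INSERT INTO orders (total) VALUES (42)")


print(process_once(conn, "msg-123", record_order))  # True: first delivery
print(process_once(conn, "msg-123", record_order))  # False: duplicate skipped
```

Side effects outside the database, such as emails or HTTP calls, are not covered by the rollback and need their own idempotency key.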

How to Verify the Fix

Check your Amazon CloudWatch dashboard for these specific signals:

  • ApproximateNumberOfMessagesVisible: This should trend toward zero. If it stays high while consumers are active, messages are likely timing out and recycling.
  • NumberOfMessagesReceived vs. NumberOfMessagesDeleted: In a healthy system, these should be nearly 1:1. If you see 5,000 receives but only 3,000 deletes, you have a major duplication problem.
  • Log Patterns: Set up a CloudWatch Logs Insights query to count ReceiptHandleIsInvalidException occurrences. After the fix, this count should drop to zero.
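The receive-versus-delete comparison can be scripted. This sketch assumes a Boto3 CloudWatch client and calls get_metric_statistics with the Sum statistic in the AWS/SQS namespace; receive_delete_ratio is a hypothetical helper, and the one-hour window and 5-minute period are arbitrary defaults. A ratio near 1.0 matches the healthy case; 5,000 receives against 3,000 deletes would come out at about 1.67.

```python
from datetime import datetime, timedelta, timezone


def _metric_sum(cloudwatch, queue_name, metric_name, start, end, period=300):
    """Total of one AWS/SQS metric for a queue between start and end."""
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/SQS",
        MetricName=metric_name,
        Dimensions=[{"Name": "QueueName", "Value": queue_name}],
        StartTime=start,
        EndTime=end,
        Period=period,
        Statistics=["Sum"],
    )
    return sum(point["Sum"] for point in response["Datapoints"])


def receive_delete_ratio(cloudwatch, queue_name, hours=1):
    """Received / deleted over the last `hours`; None if nothing was deleted."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    received = _metric_sum(cloudwatch, queue_name, "NumberOfMessagesReceived", start, end)
    deleted = _metric_sum(cloudwatch, queue_name, "NumberOfMessagesDeleted", start, end)
    return received / deleted if deleted else None


# Usage (assumes configured AWS credentials):
# import boto3
# cw = boto3.client("cloudwatch")
# print(receive_delete_ratio(cw, "data-processor-queue"))
```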

Common Pitfalls

  • The Lambda 6x Rule: If SQS triggers a Lambda, AWS recommends setting the queue's VisibilityTimeout to at least 6 times the function's Timeout. The extra headroom gives Lambda time to retry a batch if the function is throttled, instead of the messages reappearing while an earlier invocation still holds them.
  • Batch Processing Risks: If you pull 10 messages at once, the visibility timeout starts for all 10 immediately. If you process them one after another, message #1 takes 29 seconds, and your timeout is 30, messages #2 through #10 will almost certainly expire before you even touch them.
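For the batch case, one option is to watch the clock while working through the batch and reset the window for every not-yet-processed message before it runs out. This sketch assumes the Messages list returned by Boto3's receive_message and a client with change_message_visibility_batch and delete_message; process_batch, margin, and the injectable clock are hypothetical. It starts its own timer when called, slightly after SQS started the real one, so keep the margin generous. It also ignores the Failed entries that change_message_visibility_batch can report, which production code should check.

```python
import time


def process_batch(sqs, queue_url, messages, handler,
                  visibility_timeout=30, margin=10, clock=time.monotonic):
    """Process a received batch in order, extending waiting messages before they expire.

    messages is the "Messages" list from receive_message (at most 10 entries,
    so the remaining ones always fit in a single batch call).
    """
    deadline = clock() + visibility_timeout
    for i, msg in enumerate(messages):
        if deadline - clock() < margin:
            # Reset the window for this message and every one still queued behind it
            sqs.change_message_visibility_batch(
                QueueUrl=queue_url,
                Entries=[
                    {"Id": str(n), "ReceiptHandle": m["ReceiptHandle"],
                     "VisibilityTimeout": visibility_timeout}
                    for n, m in enumerate(messages[i:])
                ],
            )
            deadline = clock() + visibility_timeout
        handler(msg)
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```

Receiving fewer messages per call (a lower MaxNumberOfMessages) is the simpler alternative when per-message time is unpredictable, and a single very slow message still needs the heartbeat approach.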
