
SQS

Created: 2019-03-05 10:57:44 -0800 Modified: 2020-07-24 09:17:22 -0700

  • You can connect SQS to Lambda automatically (reference). Lambda itself is the service that polls for the presence of SQS messages (as opposed to CloudWatch), and you are charged for this polling, but you get 1M requests for free per month, and the polling should only amount to something like 200k requests from what I can understand (reference - see “Additional information”). Overall, SQS + Lambda should be a cost-effective way of handling events.
  • To connect SQS to Lambda via Terraform, you would use aws_lambda_event_source_mapping (reference).
  • You set a batch size via createEventSourceMapping (reference) in the Lambda API. For SQS, this defaults to the maximum, which is 10.
    • The batch size itself is a maximum, not a requirement. For example, if you have 11 messages to process and your batch size is 10, the eleventh message just arrives in a later batch; you won’t end up with one “dangling” message that never gets processed.
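    • Here’s a rough sketch of wiring this up with the AWS SDK for JavaScript (the queue ARN and function name below are made up):
const AWS = require('aws-sdk');
const lambda = new AWS.Lambda();

async function connectQueueToLambda() {
    // Hypothetical ARN and function name; a BatchSize of 10 is both the default and the max for SQS.
    return lambda.createEventSourceMapping({
        EventSourceArn: 'arn:aws:sqs:us-east-1:123456789012:my-queue',
        FunctionName: 'my-notification-function',
        BatchSize: 10,
    }).promise();
}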
  • When Lambda successfully returns, any messages from SQS that were sent to it will be removed from the queue (reference).
    • Here’s some boilerplate Node.js code for handling messages that I wrote for Bot Land’s notification system:
const _ = require('lodash'); // used for _.size below

async function lambdaHandler(event, context) {
    console.log(`This lambda has ${_.size(event.Records)} record(s)`);
    try {
        await Promise.all(
            event.Records.map((record) => {
                return processMessage(record);
            })
        );
        console.log('Finished everything successfully');
        return { success: true };
    } catch (error) {
        // If you throw here, then SQS would mark this entire batch as having failed
        // even though SOME of the messages may have been handled successfully.
        console.error('Error while processing a record', error);
    }
}
  • It’s incredibly important to make sure your Lambda works correctly when connected to SQS; otherwise, it’ll just keep getting re-invoked for ~2 days straight, or however long your message retention period is. To temporarily disable SQS as a source, go to the console → Lambda → Designer → click SQS in the “triggers” section → Disable.
  • To actually write the Lambda code, check out this page. Note that “body” corresponds to “MessageBody” in the sendMessage* APIs.
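    • For example, here’s a minimal sketch of a processMessage function like the one used in the handler above (the JSON payload shape is just an assumption; “body” holds whatever string the producer passed as MessageBody):
async function processMessage(record) {
    // record.body is the raw MessageBody string from sendMessage/sendMessageBatch.
    const payload = JSON.parse(record.body); // assumes the producer sent JSON
    console.log(`Processing message ${record.messageId}`, payload);
}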
  • The producer for the queue will need write access to SQS, which is not granted by the AWSLambdaSQSQueueExecutionRole managed policy, so I used AmazonSQSFullAccess.
  • sendMessageBatch requires that IDs be unique within the batch, but there are limitations on the IDs themselves (they can only contain alphanumeric characters, hyphens, and underscores, and they can’t be longer than 80 characters). I was trying to use them for device IDs, but Android device IDs don’t conform to those restrictions, so I ended up just using natural numbers (so 0, 1, 2, … 9 for each batch). I don’t think it’s worth trying to make IDs meaningful unless they naturally fit AWS’s constraints. Just make sure to convert the numbers to strings (‘1’, ‘2’, etc.) so that you don’t get a type error.
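    • Here’s a quick sketch of what that ends up looking like (the queue URL is made up, and deviceIds is assumed to be an array of at most 10 device IDs):
const AWS = require('aws-sdk');
const sqs = new AWS.SQS();

async function sendDeviceBatch(deviceIds) {
    const entries = deviceIds.map((deviceId, index) => ({
        // IDs only need to be unique within this batch, so stringified indexes work fine.
        Id: String(index),
        MessageBody: JSON.stringify({ deviceId }),
    }));

    return sqs.sendMessageBatch({
        QueueUrl: 'https://sqs.us-east-1.amazonaws.com/123456789012/my-queue',
        Entries: entries,
    }).promise();
}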
  • sendMessage and sendMessageBatch each have a limit of 256 KB (i.e. sending a batch of 10 messages does NOT increase it to 2560 KB), so if you find yourself trying to send, say, one million user IDs via SQS, then you’re doing it wrong. You should instead send 100K batches of 10 small messages each:
    • BAD

sendMessage({
    userIds: [1, 2, 3, …, 1e6],
})

    • GOOD

sendMessage({
    id: 1,
})

sendMessage({
    id: 2,
})

  • Part of the reason for this is that each item in the queue has its own status for whether it was handled. If you were to make your own concept of batching, then you’d lose that property. One additional feature of SQS is that it implements dead-letter queues (AWS reference here). This lets you isolate problematic messages. If your “message” were really a batch, then it would be harder to conclude what the problem was.

    • Note that dead-letter queues are for messages that couldn’t be handled, not for messages that couldn’t even be added to the queue to begin with (via sendMessage or sendMessageBatch). For those, you’d check the return value of the corresponding function.
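    • As a rough sketch, checking the return value of sendMessageBatch for enqueue failures could look like this (sqs, queueUrl, and entries are assumed from the earlier example):
async function sendBatchAndCheck(queueUrl, entries) {
    const result = await sqs.sendMessageBatch({ QueueUrl: queueUrl, Entries: entries }).promise();

    // Entries that never made it into the queue show up in result.Failed,
    // each with an error code and message explaining why.
    if (result.Failed.length > 0) {
        console.error('Some messages could not be enqueued:', result.Failed);
    }

    return result;
}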
  • If your Lambda receives messages in a batch, then you can’t have individual messages report a failure; the batch succeeds or fails as a whole. AWS says that if you need to handle individual failures, then you should instead be able to reprocess successes: “When Lambda reads a message from the queue, it stays in the queue but becomes hidden until Lambda deletes it. If your function returns an error, or doesn’t finish processing before the queue’s visibility timeout, it becomes visible again. Then Lambda sends it to your Lambda function again. All messages in a failed batch return to the queue, so your function code must be able to process the same message multiple times without side effects.” (reference)

    • Your solutions in general to the problem of individual failures in a batch are:
      • Be able to process successful messages a second time (e.g. have a database keep track of which messages were successful)
      • Set the batch size to 1 so that each invocation handles a single message. You lose all of the benefits of batching by doing this.
      • Manually add failed messages to a dead-letter queue rather than letting AWS handle it via the redrive policy (reference). I opted for this in Bot Land for push notifications so that I don’t end up having to hit the database each time or lose the benefits of batching. See the sketch below.
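      • Here’s a rough sketch of that manual approach, as a variant of processMessage from earlier (DEAD_LETTER_QUEUE_URL is a hypothetical environment variable, and handleRecord stands in for the real per-message logic):
const AWS = require('aws-sdk');
const sqs = new AWS.SQS();

async function processMessage(record) {
    try {
        await handleRecord(record); // the real per-message work goes here
    } catch (error) {
        // Swallow the error so that the whole batch still succeeds, but park the
        // problematic message in a separate queue for later inspection or retry.
        await sqs.sendMessage({
            QueueUrl: process.env.DEAD_LETTER_QUEUE_URL,
            MessageBody: record.body,
        }).promise();
    }
}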