Lambda

Created: 2018-11-05 14:39:32 -0800 Modified: 2021-06-06 15:23:56 -0700

  • Lambda is for running relatively quick functions without a persistent server.
  • Lambda is part of the “always free” tier of AWS, meaning you can use it for some number of requests or time per month without paying money (reference).
  • Lambda functions have a specifiable timeout (reference) that cannot exceed 15 minutes. There are more limits listed here.

If your Lambda uses more RAM at runtime than you allocated for it, it will likely crash, and you’ll need to rerun it with a larger memory allocation.

By giving your Lambda more RAM, you also automatically get more CPU allocated to it.
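
If you manage functions from code rather than the console or CLI, the memory setting can be bumped with the AWS SDK for JavaScript; a minimal sketch (the function name and memory size are placeholders, not values from these notes):

const AWS = require("aws-sdk");

const lambda = new AWS.Lambda({ region: "us-east-1" }); // region is an assumption

// Bump the memory allocation; CPU scales with it automatically.
lambda
  .updateFunctionConfiguration({
    FunctionName: "LambdaTesteroni", // placeholder name (reused from the example later in these notes)
    MemorySize: 512,                 // MB
  })
  .promise()
  .then((config) => console.log("New memory size:", config.MemorySize))
  .catch(console.error);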

  • There’s a guide on best practices here.
  • Lambda@Edge has its own costs (reference)
  • Lambdas inside a VPC can only use private subnets (reference). You need a VPC in order to talk to AWS services like RDS. With no VPC, you can openly talk to the web. With a VPC and a NAT, you can do both.
    • Some AWS services are (or at least were) only able to be accessed via the Internet. SNS was this way up until 2018 (reference, reference2), meaning that back in 2017, you would have needed a NAT just to call into SNS from a lambda. If you use SNS from a VPC via VPC endpoints, you need to modify security groups to explicitly allow inbound access to the SNS endpoint, otherwise you’ll fail to connect.
      • Update (August 2nd, 2019): I just found out that VPC endpoints cost ~$7/month unless they’re a Gateway Type VPC endpoint. The SNS endpoint is an Interface endpoint.
  • Lambda@Edge is a way of running code right before your content is served, e.g. to set headers or redirect requests (see the main page for more information).

If you want to set certain security-related headers like X-Frame-Options or Content-Security-Policy, you need to use Lambda@Edge. This blog post tells you how to set them up manually or via Terraform. The manual process is apparently super easy.

For some more details about the policies themselves, I’ve got these links:

  • W3C comparison of how CSP is an improvement on X-Frame-Options (reference)
  • Equivalent policies for Content-Security-Policy given existing X-Frame-Options settings (reference)

Here’s the entirety of my index.js to set these headers for Bot Land:

exports.handler = async (event) => {
  const response = event.Records[0].cf.response;
  const headers = response.headers;

  headers["strict-transport-security"] = [{key: "Strict-Transport-Security", value: "max-age=31536000; includeSubdomains; preload"}];
  headers["content-security-policy"] = [
    {
      key: "Content-Security-Policy",
      value:
        "default-src 'self' *.bot.land; manifest-src 'self'; img-src 'self' https://i.imgur.com/ https://imgur.com/ blob: data: https://www.google-analytics.com https://static-cdn.jtvnw.net; script-src 'unsafe-inline' www.google-analytics.com https://ssl.google-analytics.com 'unsafe-eval' *.bot.land; style-src 'self' 'unsafe-inline' fonts.googleapis.com; font-src 'self' fonts.googleapis.com fonts.gstatic.com https://themes.googleusercontent.com https://at.alicdn.com/; object-src 'none'; frame-src https://*.youtube.com; worker-src 'self' blob:"
    }
  ];
  headers["x-content-type-options"] = [{key: "X-Content-Type-Options", value: "nosniff"}];
  headers["x-frame-options"] = [{key: "X-Frame-Options", value: "DENY"}];
  headers["x-xss-protection"] = [{key: "X-XSS-Protection", value: "1; mode=block"}];
  headers["referrer-policy"] = [{key: "Referrer-Policy", value: "same-origin"}];

  return response;
};

Lambda shows environment variables in plaintext in the AWS console.

For example, any Lambda that has to connect to the database will show the database credentials in the clear.

  • You can use either the callback signature or an asynchronous function that simply returns whatever object would have been passed to the callback on success (or throws an error on failure).
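
For example, these two (hypothetical) handlers behave the same way:

// Callback style: report success by passing the result as the second argument
// (or pass an Error as the first argument to fail).
exports.callbackHandler = (event, context, callback) => {
  callback(null, { success: true });
};

// Async style: just return the result; throwing (or rejecting) marks the
// invocation as failed.
exports.asyncHandler = async (event) => {
  return { success: true };
};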

To get started incredibly quickly, here are the steps that I went through:

  • Make a new folder locally called “aws_nodejs_test”.
  • yarn init -y
  • yarn add lodash (just so that I could test having a package)
  • Create and edit main.js
  • Add this code - note: the handler function’s name can be anything since the create-function API lets you specify it
const _ = require("lodash");

async function lambdaHandler(event, context) {
  return testeroni();
}

async function testeroni() {
  console.log("Hello world from Lambda");
  const randomNumbers = _.times(10, () => _.random(1, 10));
  console.log("randomNumbers: " + JSON.stringify(randomNumbers));
  return {success: true};
}

module.exports = {
  lambdaHandler,
  testeroni,
};
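
Since the logic lives in testeroni rather than the handler itself, it can be smoke-tested locally before anything touches AWS; a quick hypothetical test file:

// test_main.js - hypothetical local smoke test; no AWS involved.
const { testeroni } = require("./main");

testeroni().then((result) => {
  console.log("local result:", result); // expect { success: true }
});
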
  • The reason I separated lambdaHandler from testeroni is that AWS’s best practices say your handler should just be a thin wrapper around other code; that way, you can still unit-test the logic outside the context of AWS (as in the quick test above).
  • Make an IAM role in AWS
    • At the very least, you’re going to need CloudWatchLogsFullAccess if you want to produce logs, but also add any permissions for services that you know you’ll eventually want (e.g. S3 access).
    • Whichever permissions you pick, make sure that the role has a trust relationship set up for the service “lambda.amazonaws.com” or else you’ll hit the error “The role defined for the function cannot be assumed by Lambda.”.
  • Zip the code (on Linux, I used the “zip” executable, so make sure it’s installed). I believe that you cannot use “gzip”, since Lambda needs an actual .zip archive.
  • Run this command through the CLI

aws lambda create-function \
  --function-name LambdaTesteroni \
  --runtime nodejs8.10 \
  --role arn:aws:iam::212785478310:role/LambdaWithCloudwatch \
  --handler aws_nodejs_test/main.lambdaHandler \
  --timeout 3 \
  --memory-size 128 \
  --publish \
  --zip-file fileb://./aws_nodejs_test.zip

  • The “handler” part wasn’t easy to figure out from their documentation (it was here, which wasn’t where I expected), but the AWS Console UI online helped me figure it out.

    • Note that the handler doesn’t necessarily have to include a folder name.
  • To invoke the function, you can either use the AWS Console or this CLI command

aws lambda invoke --function-name LambdaTesteroni output.json

By running that, “output.json” should get created in the current directory containing the JSON {"success": true}.

  • After invoking the function, if you want to check the CloudWatch logs, go to AWS Console → Lambda → Functions → Click your function → Monitoring → View Logs in CloudWatch

When doing anything more complex than “hello world”, you’ll probably want environment variables. These are encrypted in AWS with a default service key unless you specify “--kms-key-arn” and provide your own.

To specify the environment variables, you have to pass in “--environment Variables={foo=bar}” (WITH the “Variables=” part).

The only way to get access to RDS from a Lambda is to call either create-function or update-function-configuration with --vpc-config specified, then make sure the subnets / security groups are configured correctly. In order to be able to manage network connections to a VPC, the execution role also needs the AWSLambdaVPCAccessExecutionRole policy.
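
Once the networking is in place, the Lambda itself connects like any other Node process. Here’s a minimal sketch, assuming the “mysql” package is bundled in the zip and that the database settings come from the environment variable names used in the CircleCI config below:

// Hypothetical handler that checks connectivity to RDS from inside the VPC.
const mysql = require("mysql");

exports.handler = async () => {
  const connection = mysql.createConnection({
    host: process.env.databaseHost,
    user: process.env.databaseUser,
    password: process.env.databasePassword,
    database: process.env.databaseName,
  });

  // The mysql package uses callbacks, so wrap the query in a promise for the
  // async handler.
  const rows = await new Promise((resolve, reject) => {
    connection.query("SELECT 1 AS ok", (error, results) => {
      connection.end();
      if (error) return reject(error); // e.g. "connect ETIMEDOUT" if subnets/security groups are wrong
      resolve(results);
    });
  });

  return { success: true, rows };
};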

My general steps went something like this for my specific scenario with CircleCI:

  1. Make a new IAM role named lambda-vpc-role with the AWSLambdaVPCAccessExecutionRole policy.
    1. Cloudwatch permissions are part of this policy, so there’s no need to add an extra policy.
  2. Make two new subnets for the Lambda to go into. I chose 10.0.2.32/27 as the CIDR block for the first one so that there would be 32 possible IP addresses (10.0.2.32 → 10.0.2.63), then 10.0.2.64/27 for the second block. These should be in different availability zones or else there’s no reason to have two separate ones.
  3. Make a new security group for Lambda
  4. Modify the RDS security group that I already had to allow inbound traffic on 3306 from the new security group.
  5. Add CircleCI variables for the comma-separated subnet IDs, the comma-separated security groups (which is just the one security group for me), and the IAM role.
  6. Modify my CircleCI config to do this when it comes to updating the function configuration
- run:
    name: Update Lambda function configuration
    working_directory: ~/botland/packages/database
    command: >
      . venv/bin/activate &&
      aws lambda update-function-configuration
      --function-name $CREATE_DATABASE_LAMBDA_FUNCTION_NAME
      --role $CREATE_DATABASE_LAMBDA_ROLE_ARN
      --handler $CREATE_DATABASE_LAMBDA_HANDLER_VALUE
      --timeout $DATABASE_MIGRATION_LAMBDA_TIMEOUT
      --vpc-config SubnetIds=$CREATE_DATABASE_LAMBDA_SUBNET_IDS,SecurityGroupIds=$CREATE_DATABASE_LAMBDA_SECURITY_GROUP_IDS
      --environment "Variables={BABEL_CACHE_PATH=/tmp/babel-cache,databaseHost=$DATABASE_HOST,databaseUser=$DATABASE_USER,databaseName=$DATABASE_NAME,databaseRootPassword=$DATABASE_ROOT_PASSWORD,databasePassword=$DATABASE_PASSWORD}"
      --memory-size $MB_RAM_GIVEN_TO_LAMBDA

When you use VPCs with Lambda, you get this warning in the AWS console:

“When you enable a VPC, your Lambda function loses default internet access. If you require external internet access for your function, make sure that your security group allows outbound connections and that your VPC has a NAT gateway.”

Zipping files while using Yarn workspaces


Yarn workspaces will hoist dependencies to the root of the workspace by default. When it comes to zipping files for Lambda, this makes it very difficult to pull in all of the necessary dependencies. Yarn has a feature called nohoist to prevent this hoisting. However, when I tried that, I ran into a problem where Lerna was symlinking dependencies. I’m not 100% certain that my explanation below is correct, but I’m relatively positive that I can’t get this to work as-is (keep reading afterward for a workaround).

For example, suppose I have this setup:

lerna_root
  packages
    database
      package.json with lodash, shared, and left-pad
    shared
      package.json with lodash, semver
    some_other_module
      package.json with semver

With Yarn workspaces, running “yarn” will install lodash and semver at the root since they’re shared modules, and everything else (e.g. “left-pad”, “shared”) in the specific package folder since those would be unique modules.

With Lerna, database/node_modules/shared would actually be a symlink. For the database package to run, it relies on shared, and shared relies on semver. However, the symlink points to a directory whose node_modules does not contain semver (since semver would be hoisted to the root), and I assume the “nohoist” rules somehow prevent checking a parent folder, so simply running “yarn” in this scenario causes an error that looks like this:

error An unexpected error occurred: "ENOENT: no such file or directory, symlink 'C:\Users\agd13_000\scoop\persist\yarn\cache\v3\npm-jest-23.1.0-bbb7f893100a11a742dd8bd0d047a54b0968ad1a\node_modules\jest\bin\jest.js' -> 'B:\Code\BotLand\botland\packages\database\node_modules\@botland\dbwrapper\node_modules\jest\bin\jest'".

This error randomly manifested itself with other packages, e.g.:

error An unexpected error occurred: "ENOENT: no such file or directory, symlink 'C:\Users\agd13_000\scoop\persist\yarn\cache\v3\npm-semver-5.6.0-7e74256fbaa49c75aa7c7a205cc22799cac80004\node_modules\semver\bin\semver' -> 'B:\Code\BotLand\botland\packages\database\node_modules\@botland\dbwrapper\node_modules\semver\bin\semver'".

There’s a simple workaround for all of this:

  1. Copy your entire “database” folder (or whatever package has to be zipped) to something outside of the monorepo.
  2. Run “yarn” to install everything
  3. Zip everything

Another idea for a workaround is to write a script that copies all of the dependencies (and, recursively, any of those dependencies’ dependencies) needed by the “database” package into a single folder and then zips that new folder.
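
A rough sketch of what such a script could look like (all paths and file names here are assumptions, and fs.cpSync requires Node 16.7+):

// copy_deps.js - hypothetical sketch: stage the "database" package's dependency
// tree into one folder so that folder can be zipped for Lambda.
const fs = require("fs");
const path = require("path");

const workspaceRoot = path.resolve(__dirname);                       // monorepo root (assumption)
const packageDir = path.join(workspaceRoot, "packages", "database"); // package to bundle
const stagingDir = path.join(workspaceRoot, "lambda_staging", "node_modules");

// Look for a module in the package's own node_modules first, then fall back
// to the hoisted copy at the workspace root.
function findModuleDir(name) {
  return [
    path.join(packageDir, "node_modules", name),
    path.join(workspaceRoot, "node_modules", name),
  ].find((candidate) => fs.existsSync(candidate));
}

function copyModule(name, seen) {
  if (seen.has(name)) return;
  seen.add(name);

  const source = findModuleDir(name);
  if (!source) throw new Error(`Could not locate module: ${name}`);

  const destination = path.join(stagingDir, name);
  fs.mkdirSync(path.dirname(destination), { recursive: true });
  // dereference follows Lerna's symlinks so real files get copied.
  fs.cpSync(source, destination, { recursive: true, dereference: true });

  // Recurse into this module's own runtime dependencies.
  const pkg = JSON.parse(fs.readFileSync(path.join(source, "package.json"), "utf8"));
  Object.keys(pkg.dependencies || {}).forEach((dep) => copyModule(dep, seen));
}

const packageJson = JSON.parse(
  fs.readFileSync(path.join(packageDir, "package.json"), "utf8")
);
const seen = new Set();
Object.keys(packageJson.dependencies || {}).forEach((dep) => copyModule(dep, seen));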

For the nohoist attempt mentioned earlier, I added this to the package.json in the specific package where I had dependencies installed:

"workspaces": {
"nohoist": [
"*botland/**",
"*botland/**/**",
"babel-register",
"babel-register/**",
"dotenv",
"dotenv/**",
"knex",
"knex/**",
"lodash",
"lodash/**"
]
}

I did this for every single module that was included in the dependencies.

Note that the “Botland” modules are all private (e.g. “@botland/foo”), but specifying “@” in the glob seemed to cause weird symlink errors that looked like this issue (unsolved as of 11/6/2018), so I had to use a different glob.

The request returns a 200 even though an error occurred


This is just how the CLI is supposed to work (reference). The 200 status refers to the invoke request itself succeeding, not to whether the function that was invoked succeeded.

If you want to check for errors, use something like JQ and check for the existence of a FunctionError property (reference).

aws lambda invoke blahblahblah output.json > invocation_output.json

cat invocation_output.json | jq -r ".FunctionError" | grep -v -e "Handled" -e "Unhandled"

Notes:

  • If you’re going to do this, then you need to make sure that your Lambda handler itself always returns JSON and not just a string like “success”, otherwise JQ will fail. You can test this quickly on the command line via something like:

echo hello | jq -r ".FunctionError"

  • invocation_output.json will look like this
{
  "FunctionError": "Handled",
  "ExecutedVersion": "$LATEST",
  "StatusCode": 200
}
  • output.json will contain the error output of your process, so it’ll be something like this:
{
  "errorMessage": "connect ETIMEDOUT",
  "errorType": "Error",
  "stackTrace": [
    "Connection._handleConnectTimeout (/var/task/node_modules/mysql/lib/Connection.js:419:13)",
    "Object.onceWrapper (events.js:313:30)",
    "emitNone (events.js:106:13)",
    "Socket.emit (events.js:208:7)",
    "Socket._onTimeout (net.js:420:8)",
    "ontimeout (timers.js:482:11)",
    "tryOnTimeout (timers.js:317:5)",
    "Timer.listOnTimeout (timers.js:277:5)",
    "    --------------------",
    "Protocol._enqueue (/var/task/node_modules/mysql/lib/protocol/Protocol.js:145:48)",
    "Protocol.handshake (/var/task/node_modules/mysql/lib/protocol/Protocol.js:52:23)",
    "Connection.connect (/var/task/node_modules/mysql/lib/Connection.js:130:18)",
    "/var/task/node_modules/knex/lib/dialects/mysql/index.js:109:18",
    "Promise._execute (/var/task/node_modules/bluebird/js/release/debuggability.js:313:9)",
    "Promise._resolveFromExecutor (/var/task/node_modules/bluebird/js/release/promise.js:483:18)",
    "new Promise (/var/task/node_modules/bluebird/js/release/promise.js:79:10)",
    "Client_MySQL.acquireRawConnection (/var/task/node_modules/knex/lib/dialects/mysql/index.js:104:12)",
    "create (/var/task/node_modules/knex/lib/client.js:283:23)",
    "tryPromise (/var/task/node_modules/tarn/lib/Pool.js:366:22)",
    "tryPromise (/var/task/node_modules/tarn/lib/utils.js:57:20)",
    "Promise (/var/task/node_modules/tarn/lib/Pool.js:366:5)",
    "new Promise (<anonymous>)",
    "callbackOrPromise (/var/task/node_modules/tarn/lib/Pool.js:357:10)",
    "Pool._create (/var/task/node_modules/tarn/lib/Pool.js:307:5)",
    "Pool._doCreate (/var/task/node_modules/tarn/lib/Pool.js:275:32)"
  ]
}

For example, here’s a failed invocation:

{
  "StatusCode": 200,
  "FunctionError": "Handled",
  "ExecutedVersion": "$LATEST"
}

Here’s a successful invocation:

{
  "StatusCode": 200,
  "ExecutedVersion": "$LATEST"
}

For CircleCI, here’s what I ended up with:

- run:
    name: Invoke Lambda function
    working_directory: ~/botland/packages/database
    command: |
      . venv/bin/activate
      aws lambda invoke --function-name $CREATE_DATABASE_LAMBDA_FUNCTION_NAME output.json
      cat output.json
      cat output.json | jq -r ".FunctionError" | grep -v -e "Handled" -e "Unhandled"

I could probably have used “tee” if I wanted to write to output.json and print to stdout at the same time, but I figured this is clearer to me.
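
If the check ever needs to live in Node rather than a shell pipeline, the AWS SDK for JavaScript returns the same FunctionError field from invoke; a hedged sketch (file name and region are assumptions):

// check_invoke.js - hypothetical Node equivalent of the jq/grep check above.
const AWS = require("aws-sdk");

const lambda = new AWS.Lambda({ region: "us-east-1" }); // region is an assumption

async function invokeAndCheck(functionName) {
  const result = await lambda
    .invoke({ FunctionName: functionName, Payload: JSON.stringify({}) })
    .promise();

  // StatusCode is 200 whenever the invoke call itself works; the function's own
  // failure only shows up as FunctionError ("Handled" or "Unhandled").
  if (result.FunctionError) {
    throw new Error(`Lambda reported ${result.FunctionError}: ${result.Payload}`);
  }
  return JSON.parse(result.Payload.toString());
}

invokeAndCheck("LambdaTesteroni").then(console.log).catch((error) => {
  console.error(error);
  process.exit(1); // non-zero exit so CI fails, mirroring the grep approach
});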