First, before you continue with this post, please read part one if you haven’t already. Second, if you would like to review the complete source code, you can find it on our GitHub and use it to follow along or modify it for your own use. Last, to replicate the process used here, you will need to get Twitter credentials for your account from Twitter Application Management. As for the Twitter API, you can find it on the Twitter Developer page and use it directly, or use a library such as the Python package on PyPI listed in the links below. I’ve chosen to use the Python package, but that’s my preference.
Brief Reminder for Context
The purpose of this Lambda is to generate a tweet when a new post is available. More specifically, it will create a tweet when a new object whose key begins with a particular prefix is put into an S3 bucket. The Lambda is triggered by an event produced by the S3 bucket of interest.
I realized that I’ve provided a large number of links throughout the text so I’ve included them all here for quick reference.
- GitHub repo for code
- Twitter applications
- Twitter developers
- Twitter library for Python
- AWS Lambda handler
- AWS Blog post on Lambda container reuse
- AWS Key Management Service pricing
- AWS Key Management Service
- AWS Lambda context object
- Python logging in AWS Lambda
- Python getLogger method
- AWS Lambda best practices
- Python list comprehension
- AWS DynamoDB
- AWS Identity and Access Management
- AWS S3 put event object
The scope of this post is limited to the Lambda function (the code) that is triggered by an event in S3. This means that the AWS-specific aspects like deployment aren’t included. I decided that it would be easy to lose focus and delve into other concepts such as Identity and Access Management (IAM). Instead, I want to highlight some of the code-specific considerations and provide a real example of how to use this technology.
Below is the entire handler. The handler is the entry point; it will be called when the function is invoked. Each function is run inside a container that may be reused so the handler can be invoked more than once per container deployment. This is an important consideration when we discuss environment variables and Key Management Service (KMS) below. I’ll explain each part beginning with the function definition itself.
lambda_handler has two parameters: event, which is usually a dict and contains attributes about the event causing the function to run, and context, which contains runtime information. I will use the event object, because it contains attributes that I need, but I will not use the context object. An example of an event object can be seen in the repository.
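To make the two parameters concrete, here is a minimal sketch of a handler and a trimmed S3 put event. It is not the handler from the repository: the bucket name, the key, and the return value are assumptions made up for illustration, and a real event carries many more fields.

```python
def lambda_handler(event, context):
    """Entry point that Lambda calls on every invocation."""
    # event: a dict describing what triggered the function (an S3 put here).
    # context: runtime information; this sketch never reads it.
    records = event.get("Records", [])
    return {"record_count": len(records)}


# Trimmed, illustrative S3 put event; bucket and key are hypothetical.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "example/key.html"},
            }
        }
    ]
}

# Calling the handler locally with a hand-built event and no context.
result = lambda_handler(sample_event, None)
```

Because the handler is just a function that takes a dict, it can be exercised locally like this before it is ever deployed.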
Next, logging is configured. Here a logger named tweet is created. Calling logging.getLogger(name) with the same name always returns a reference to the same logger object, so it can be configured once and then reused. This is useful when working with a logger across modules. Some of our code has been extracted to a separate class, so reuse is helpful here. Separating the handler from the business logic is one practice listed in AWS’s Lambda best practices.
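The named-logger behavior can be seen in a few lines. This is only a sketch: the name tweet comes from the text above, while the log level and the helper function are assumptions for illustration.

```python
import logging

# Configure the named logger once, at module level.
logger = logging.getLogger("tweet")
logger.setLevel(logging.INFO)  # the level is an assumption for this sketch


def logger_from_another_module():
    """Stand-in for code living in a different module or class."""
    # Asking for the same name returns the already-configured object.
    return logging.getLogger("tweet")


same_logger = logger_from_another_module()
```

Here same_logger and logger are the very same object, so the level set at the top applies wherever the name tweet is requested.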
Now the real work starts and the S3 object keys are retrieved from the event object. A list comprehension is used to build a list of the keys that are in the event object. In this particular use case, when an event is received, it is assumed that a single S3 bucket object will be in the event. Anything other than a list of size 1 is considered invalid. Remember that this Lambda is triggered when an object whose key begins with a particular prefix is put into an S3 bucket. Given that posts are generally created one at a time, one is the only valid length.
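The key extraction and the length check described above could look roughly like this. The function names and the choice to raise ValueError are assumptions for this sketch, not the repository's code; the Records, s3, object, and key fields follow the S3 event format.

```python
def extract_keys(event):
    """Build a list of S3 object keys from an S3 event using a list comprehension."""
    return [record["s3"]["object"]["key"] for record in event.get("Records", [])]


def single_key(event):
    """Return the only key in the event, or fail if there is not exactly one."""
    keys = extract_keys(event)
    if len(keys) != 1:
        # Anything other than a list of size 1 is treated as invalid.
        raise ValueError("expected exactly one key, got {}".format(len(keys)))
    return keys[0]


# Hypothetical events for illustration only.
one_record = {"Records": [{"s3": {"object": {"key": "example/key.html"}}}]}
two_records = {"Records": one_record["Records"] * 2}

key = single_key(one_record)
```

Passing two_records to single_key raises the ValueError, which is how the sketch rejects events that do not match the one-post-at-a-time assumption.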
Creating the Twitter Connection
Once an event has been received and it contains a single key, a connection to Twitter is created using the Python package mentioned earlier.
Using Environment Variables for Sensitive Information
Notice that there aren’t any values hardcoded here. Instead, variables defined outside of the handler are used. Normally I wouldn’t recommend global variables but in this case, they are preferred.
Using a convenience function for decryption (shown below), the values for sensitive information are retrieved from encrypted environment variables. The encryption is important due to the nature of the values.
The decryption is also the reason for defining the variables outside of the handler. If they are defined in the handler, the environment variables will be decrypted every time the handler is called. KMS is a great service, but it has associated costs. If you create a customer master key (CMK), which is used for in-transit encryption of environment variables, you will pay for each CMK and per x number of requests; I use x intentionally, as the price will vary depending on region and when you are reading this. Therefore, defining them once can save you money throughout the life of your Lambda.
Plaintext above is a bytes object and not a str, so it will need to be decoded. It took a bit to realize this, so don’t make that mistake.
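The sketch below illustrates both points: decrypting once at module level and decoding the bytes result. To keep it runnable without AWS, fake_kms_decrypt is a hypothetical stand-in for the real KMS call (with boto3 that would be the KMS client's decrypt method called with CiphertextBlob), and the variable name EXAMPLE_SECRET and its value are made up.

```python
import base64
import os

DECRYPT_CALLS = 0  # counts how often the (fake) KMS call happens


def fake_kms_decrypt(ciphertext_blob):
    """Hypothetical stand-in for a KMS decrypt call; returns bytes, as KMS does."""
    global DECRYPT_CALLS
    DECRYPT_CALLS += 1
    # Reversing bytes is NOT encryption; it only simulates a transformation.
    return {"Plaintext": ciphertext_blob[::-1]}


def decrypt_env(name):
    """Decode and 'decrypt' an environment variable, returning a str."""
    blob = base64.b64decode(os.environ[name])
    plaintext = fake_kms_decrypt(blob)["Plaintext"]  # bytes, not str
    return plaintext.decode("utf-8")


# Simulate an encrypted environment variable (made-up name and value).
os.environ["EXAMPLE_SECRET"] = base64.b64encode(b"terces").decode("ascii")

# Module level: runs once per container, not once per invocation.
EXAMPLE_SECRET = decrypt_env("EXAMPLE_SECRET")


def lambda_handler(event, context):
    # The handler only reads the already-decrypted value.
    return EXAMPLE_SECRET


first = lambda_handler({}, None)
second = lambda_handler({}, None)
```

The handler runs twice here, but the decrypt function is called only once, which is the saving described above when a container is reused.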
On the subject of CMK and KMS, I encourage you to create your own key as it provides more security. If you use the default key, you can encrypt your environment variables at rest but not in transit. Encryption in transit is important for protecting sensitive information such as your Twitter API keys.
Posting a New Status
Finally, the new status (new tweet) is created and posted.
In an attempt to separate the logic needed for AWS Lambda and the business logic, a separate class is used to actually create the tweet (see twitter.py in the repository). Admittedly, it’s a small class, but creating separation can aid in testing. The method that posts the tweet is shown below. Most of the code uses the
In order to reuse the configured logger, a logger with the same name is retrieved in the __init__ method of the class.
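As a rough illustration of why the separation helps with testing, here is a sketch of a small class that retrieves the named logger in its constructor and posts through whatever client it is given. StatusPoster, FakeClient, the post method, and the message text are all hypothetical; the real class lives in twitter.py in the repository and talks to the Twitter library instead.

```python
import logging


class StatusPoster:
    """Hypothetical stand-in for the class in twitter.py."""

    def __init__(self, client):
        # Same name as in the handler, so the configured logger is reused.
        self.logger = logging.getLogger("tweet")
        # Any object exposing post(text); an assumed interface for this sketch.
        self.client = client

    def post_new_status(self, key):
        """Build the status text for a new post and send it via the client."""
        text = "A new post is available: {}".format(key)
        self.logger.info("Posting status for %s", key)
        self.client.post(text)
        return text


class FakeClient:
    """Test double that records what would have been tweeted."""

    def __init__(self):
        self.sent = []

    def post(self, text):
        self.sent.append(text)


fake = FakeClient()
poster = StatusPoster(fake)
message = poster.post_new_status("example/key.html")
```

Because the client is passed in, the class can be tested with FakeClient and no network access or Twitter credentials.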
Admittedly, there are things that can be done to improve this setup for a more production-ready deployment. First, using a database such as DynamoDB can aid in tracking posts and preventing duplicate tweets in the event of edits to existing posts. Also, given our setup, when Jekyll builds the site, it builds all files, so S3 will treat every post as a new post. That isn’t accurate, so an alternative eventing strategy may be required. Lastly, you can consider adding some content to the post that could be used to provide some context to the tweet, beyond just commenting that a new post is available. One possibility is allowing Lambda access to the S3 bucket to retrieve and parse the post.
I hope that this example provides you with enough to get started with your own Lambda-based applications. If you have questions or comments, please post them in the Disqus section below.