Automating Tweets by Processing Events from S3 with AWS Lambda [how-to]

Getting Started

First, before you continue with this post, please read [part one]({{site.base_url}}{% link _posts/2018-03-02-sending-events-to-aws-lambda-from-s3.md %}) if you haven’t already. Second, if you would like to review the complete source code, you can find it on our GitHub and use it to follow along or modify it for your own use. Last, to replicate the process used here, you will need Twitter credentials for your account from Twitter Application Management. As for the Twitter API, you can find it on the Twitter Developer page and use it directly, or use a library such as this Python package from PyPI. I’ve chosen to use the Python package, but that’s my preference.

Brief Reminder for Context

The purpose of this Lambda is to generate a tweet when a new post is available. More specifically, it will create a tweet when a new object whose key begins with /posts is put into an S3 bucket. The Lambda is triggered by an event produced by the S3 bucket of interest.

References

I realized that I’ve provided a large number of links throughout the text so I’ve included them all here for quick reference.

Scope

The scope of this post is limited to the Lambda function (the code) that is triggered by an event in S3. This means that the AWS-specific aspects like deployment aren’t included. I decided that it would be easy to lose focus and delve into other concepts such as Identity and Access Management (IAM). Instead, I want to highlight some of the code-specific considerations and provide a real example of how to use this technology.

Function Overview

Below is the entire handler. The handler is the entry point; it will be called when the function is invoked. Each function is run inside a container that may be reused so the handler can be invoked more than once per container deployment. This is an important consideration when we discuss environment variables and Key Management Service (KMS) below. I’ll explain each part beginning with the function definition itself.

{% highlight python %}
def lambda_handler(event, context):
    logger = logging.getLogger('tweet')
    logger.setLevel(logging.INFO)

    keys = [record['s3']['object']['key'] for record in event.get('Records', [])]
    if not keys or len(keys) > 1:
        logger.error('Only one new post at a time is expected.')
        return

    t = twitter.Twitter(auth=twitter.OAuth(token=DECRYPTED_TOKEN,
                                           token_secret=DECRYPTED_TOKEN_SECRET,
                                           consumer_key=DECRYPTED_CONSUMER_KEY,
                                           consumer_secret=DECRYPTED_CONSUMER_SECRET))
    tweeter = tweet.Tweeter(t)
    tweeter.tweet(f"{URL_BASE}/{keys[0]}")
{% endhighlight %}

Function Definition

{% highlight python %}
def lambda_handler(event, context):
{% endhighlight %}

lambda_handler has two parameters: event, which is usually a dict and contains attributes of the event that caused the function to run, and context, which contains runtime information. I use the event object because it contains the attributes I need; the context object is not used. An example of an event object, a put event, can be seen in the [repository].
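For orientation, here is a trimmed sketch of the shape of an S3 put event and of how the object key is read from it. The bucket name and key are made-up placeholders, and real events carry many more fields than shown.

```python
# A trimmed, illustrative S3 put event. The bucket name and object key are
# placeholders; real events contain additional fields.
sample_event = {
    "Records": [
        {
            "eventSource": "aws:s3",
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "posts/example-post.html"},
            },
        }
    ]
}

# Each record describes one object; the key lives under s3 -> object -> key.
first_record = sample_event["Records"][0]
object_key = first_record["s3"]["object"]["key"]
print(object_key)
```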

Logging

Next, logging is configured. Here a logging object identified by the name tweet is created. Calling logging.getLogger(name) with the same name always returns a reference to the same object, so it can be configured once and then reused. This is useful when working with a logging object across modules. Some of our code has been extracted to a separate class, so reuse is helpful here. Separating the handler from the business logic is one practice listed in AWS’s Lambda best practices.

{% highlight python %}
logger = logging.getLogger('tweet')
logger.setLevel(logging.INFO)
{% endhighlight %}
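The reuse described above can be checked with a short sketch. The variable names handler_logger and other_logger are only for illustration; the point is that both refer to one object.

```python
import logging

# Configure the named logger once, for example in the handler module.
handler_logger = logging.getLogger('tweet')
handler_logger.setLevel(logging.INFO)

# Elsewhere (for example in another module), asking for the same name
# returns the very same object, already configured.
other_logger = logging.getLogger('tweet')

print(other_logger is handler_logger)       # True
print(other_logger.level == logging.INFO)   # True
```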

Getting Events

Now the real work starts and the S3 object keys are retrieved from the event object. A list comprehension is used to build a list of the keys that are in the event object. In this particular use case, when an event is received, it is assumed that a single S3 object will be in the event. Anything other than a list of size 1 is considered invalid. Remember that this Lambda is triggered when a key that begins with /posts is put into an S3 bucket. Given that posts are generally created one at a time, one is the only valid length.

{% highlight python %}
keys = [record['s3']['object']['key'] for record in event.get('Records', [])]
if not keys or len(keys) > 1:
    logger.error('Only one new post at a time is expected.')
    return
{% endhighlight %}
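One way to make this validation easier to test is to pull it into a small helper. This is a sketch, not the code from the repository: the name extract_single_key is hypothetical, and the unquote_plus call is an addition, based on the fact that S3 URL-encodes object keys in event notifications.

```python
from typing import Optional
from urllib.parse import unquote_plus


def extract_single_key(event: dict) -> Optional[str]:
    """Return the object key if the event holds exactly one record, else None."""
    keys = [record['s3']['object']['key'] for record in event.get('Records', [])]
    if len(keys) != 1:
        return None
    # S3 URL-encodes keys in event notifications (a space arrives as '+').
    return unquote_plus(keys[0])


one = {'Records': [{'s3': {'object': {'key': 'posts/my+post.html'}}}]}
two = {'Records': one['Records'] * 2}

print(extract_single_key(one))   # posts/my post.html
print(extract_single_key(two))   # None
print(extract_single_key({}))    # None
```

Returning None instead of logging inside the helper keeps it free of side effects; the handler can still log the error and return early.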

Creating the Twitter Connection

If an event has been received, and it contains a single key, a connection to Twitter is created using the twitter library.

{% highlight python %}
t = twitter.Twitter(auth=twitter.OAuth(token=DECRYPTED_TOKEN,
                                       token_secret=DECRYPTED_TOKEN_SECRET,
                                       consumer_key=DECRYPTED_CONSUMER_KEY,
                                       consumer_secret=DECRYPTED_CONSUMER_SECRET))
{% endhighlight %}

Using Environment Variables for Sensitive Information

Notice that there aren’t any values hardcoded here. Instead, variables defined outside of the handler are used. Normally I wouldn’t recommend global variables but in this case, they are preferred.

{% highlight python %}
URL_BASE = os.environ['URL_BASE']
DECRYPTED_TOKEN = decrypt_env_var(os.environ['TOKEN'])
DECRYPTED_TOKEN_SECRET = decrypt_env_var(os.environ['TOKEN_SECRET'])
DECRYPTED_CONSUMER_KEY = decrypt_env_var(os.environ['CONSUMER_KEY'])
DECRYPTED_CONSUMER_SECRET = decrypt_env_var(os.environ['CONSUMER_SECRET'])
{% endhighlight %}

Using a convenience function for decryption (shown below), the values for sensitive information are retrieved from encrypted environment variables. The encryption is important due to the nature of the values.

The decryption is also the reason for defining the variables outside of the handler. If they are defined in the handler, the environment variables will be decrypted every time the handler is called. KMS is a great service but it has associated costs. If you are creating a customer master key (CMK), which is used for in-transit encryption of environment variables, you will pay for each CMK and per x number of requests; I use x intentionally as the price will vary depending on region and when you are reading this. Therefore, defining them once can save you money throughout the life of your function.
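The effect of module-level definitions can be illustrated without touching AWS. In this sketch, fake_decrypt is a hypothetical stand-in for the KMS call that only counts how often it runs, and the loop simulates repeated invocations within one reused container.

```python
# Counts how many times the (fake) decryption runs.
calls = {'decrypt': 0}


def fake_decrypt(value: str) -> str:
    """Hypothetical stand-in for a KMS call; it only counts invocations."""
    calls['decrypt'] += 1
    return value.upper()


# Module scope: runs once, when the container first loads the module.
SECRET = fake_decrypt('token')


def lambda_handler(event, context):
    # Reuses the already-decrypted value on every invocation.
    return SECRET


# Simulate three invocations in the same container.
for _ in range(3):
    lambda_handler({}, None)

print(calls['decrypt'])  # 1
```

Had fake_decrypt been called inside lambda_handler, the counter would have reached 3, which with a real KMS client would mean three billable requests per secret.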

{% highlight python %}
def decrypt_env_var(env_var: str) -> str:
    """
    example return value from decrypt
    {
        'KeyId': 'string',
        'Plaintext': b'bytes'
    }
    """
    return boto3.client('kms').decrypt(CiphertextBlob=base64.b64decode(env_var))['Plaintext'].decode('utf-8')
{% endhighlight %}

Notice that Plaintext above is a bytes object and not a str so it will need to be decoded. It took a bit to realize this so don’t make that mistake.
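To make the bytes versus str difference concrete, here is a small sketch that uses a hand-written dict in place of a real KMS response. The key ID and secret value are made up.

```python
# Shape of the relevant part of a KMS decrypt response, with made-up values.
fake_response = {'KeyId': 'example-key-id', 'Plaintext': b'my-secret-token'}

plaintext = fake_response['Plaintext']
print(type(plaintext).__name__)   # bytes

# Converting with str() keeps the b'...' wrapper, which is not a usable token.
print(str(plaintext))             # b'my-secret-token'

# Decoding yields the actual string value.
decoded = plaintext.decode('utf-8')
print(decoded)                    # my-secret-token
```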

On the subject of CMK and KMS, I encourage you to create your own key as it provides more security. If you use the default key, you can encrypt your environment variables at rest but not in transit. Encryption in transit is important for protecting sensitive information such as your Twitter API keys.

Posting a New Status

Finally, the new status (new tweet) is created and posted.

{% highlight python %}
tweeter = tweet.Tweeter(t)
tweeter.tweet(f"{URL_BASE}/{keys[0]}")
{% endhighlight %}

To separate the logic needed for AWS Lambda from the business logic, a separate class is used to actually create the tweet (see tweet.py in the repository, the module the handler refers to as tweet.Tweeter). Admittedly, it’s a small class, but the separation can aid in testing. The Tweeter class that posts the tweet is shown below. Most of the code uses the twitter library.

{% highlight python %}
import twitter
import logging


class Tweeter:
    def __init__(self, t: twitter.Twitter) -> None:
        self.connection = t
        self.logger = logging.getLogger('tweet')

    def tweet(self, new_post: str) -> None:
        status = f"Check out our latest blog post: {new_post}"

        try:
            self.connection.statuses.update(status=status)
            self.logger.info('Successfully created new tweet.')
        except twitter.TwitterError:
            self.logger.error('An error occurred while creating tweet.')
{% endhighlight %}

In order to reuse the configured logging object, a named logging object is created in the __init__ method of Tweeter.
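Because the connection is passed in, the class can be exercised without contacting Twitter. The sketch below is one possible way to do that, not a test from the repository: it uses a simplified copy of Tweeter (no twitter import and no error handling, so that it runs on its own) and a MagicMock in place of the real connection. The URL is a placeholder.

```python
import logging
from unittest.mock import MagicMock


class Tweeter:
    """Simplified copy of the Tweeter class shown earlier, for illustration only."""

    def __init__(self, t) -> None:
        self.connection = t
        self.logger = logging.getLogger('tweet')

    def tweet(self, new_post: str) -> None:
        status = f"Check out our latest blog post: {new_post}"
        self.connection.statuses.update(status=status)
        self.logger.info('Successfully created new tweet.')


# MagicMock stands in for twitter.Twitter, so nothing is sent to Twitter.
fake_connection = MagicMock()
Tweeter(fake_connection).tweet('https://example.com/posts/new-post')

# Raises AssertionError if update was not called exactly once with this status.
fake_connection.statuses.update.assert_called_once_with(
    status='Check out our latest blog post: https://example.com/posts/new-post'
)
```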

Future Improvements

Admittedly, there are things that can be done to improve this setup for a more production-ready deployment. First, using a database such as DynamoDB can aid in tracking posts and preventing duplicate tweets in the event of edits to existing posts. Also, given our setup, when Jekyll builds the site, it builds all files so S3 will mark all posts as new posts, which isn’t accurate and using an alternative eventing strategy may be required. Lastly, you can consider adding some content to the post that could be used to provide some context to the tweet, outside of just commenting that a new post is available. One possibility is allowing Lambda access to the S3 bucket to retrieve and parse the post.
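The duplicate-prevention idea can be sketched as a "record the key only if it is new" check. The PostLog class below is a hypothetical in-memory stand-in used purely for illustration. Because in-memory state does not survive across containers, a real deployment would need a persistent store such as DynamoDB, where a conditional write could play the role of record_if_new.

```python
class PostLog:
    """Hypothetical in-memory stand-in for a table keyed on the post's S3 key."""

    def __init__(self) -> None:
        self._seen = set()

    def record_if_new(self, key: str) -> bool:
        """Record the key and return True only the first time it is seen."""
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


log = PostLog()
print(log.record_if_new('posts/first.html'))   # True: tweet this one
print(log.record_if_new('posts/first.html'))   # False: already tweeted
```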

I hope that this example provides you with enough to get started with your own Lambda-based applications. If you have questions or comments, please post them in the Disqus section below.