Amazon Alexa and many other voice assistants are taking over the world with their automation of daily tasks. And, given how fast-paced our lives have become, it seems like there is no better way to stay on top than by utilizing these new innovations for your routine day-to-day needs!
Throughout the course of this article I will take you through the development of an Alexa skill, utilizing Alexa Skills Kit, Boto3, and Amazon S3. All hosting of the finished product will be through AWS Lambda.
We will begin by developing the Alexa Interaction Model (“frontend”) using the Alexa Skills Kit. Then we will work through the “backend” with the Alexa SDK and integrate the skill with AWS S3. Let’s get started.
Information on Alexa development
Designing for Alexa means building with the help of the Alexa Skills Kit. We can create skills that are supported by the Alexa Voice Service interaction model and then integrate them on the backend with Alexa SDK.
Similar to other common web applications, Alexa skills have three tiers: a frontend (the Alexa Voice Service interaction model), a backend (the Alexa SDK in Python or Node.js) and storage (Amazon DynamoDB). The storage tier isn’t necessary unless you want the skill to remember information between sessions.
For our purposes, we are only going to address the first two pieces.
There are three main parts to our building in this article:
- Alexa Voice Service
- Alexa SDK
- The platform/service we will integrate with utilizing Alexa (AWS S3)
Before beginning this project, you should log in to your AWS account and your Alexa developer account. If you don’t have an account yet, please take a moment to register and then resume the tutorial.
Start by accessing this link. You should see the following webpage:
By clicking the ‘Create Skill’ button you will be taken to this page (see image below for reference). You can name your new skill, choose a default language for it and also pick out which model best suits its needs. For my article I chose custom but there are plenty of other models available too like smart home devices or chatbots!
Next you need to select a method to host your skill’s backend. You should choose a hosting method that aligns with your tech stack. Options include Python or Node.js, or alternatively you could choose to “provision your own” if you plan on going outside the default Alexa hosted platform.
With all of the technical details set, click on “Create Skill” which will take you to this page.
By reading through the tags on each template you should be able to determine some of the core features of each. Take some time to peruse these and choose a template which sounds similar to the skill you would like to develop. The screenshot above showcases just a few of the many options available.
Once you have made your selection, click the “create” button, which will generate your new skill using the code from the template you chose. This code can then be edited to suit your own specifications.
Now a new skill dashboard console will be generated.
Next, you will need to choose the word or phrase that will start (invoke) your newly created skill. The invocation phrase (the words you speak to tell Alexa to run your skill) defaults to the name of the skill from the template, but you can change it in the “Invocation” section of your skill console. It is located within the Build tab; check out the image below if you need help locating it.
Next, we will move to formatting the frontend.
We will define and build the “frontend” for the Alexa skill in the Alexa Voice Service interaction model. The interaction model is an interface that allows a user’s interaction with the skill through Alexa to be digested and filtered to the backend.
As developers, we use the interaction model to map the user’s invocation phrase or verbal input to intents specifically defined in the skill service hosted by Alexa in the cloud.
Frontend building blocks:
An intent represents an action that fulfills a user’s spoken request/phrase. These can optionally have arguments called slots.
A slot is a variable value that is supplied to the intent at runtime through an utterance spoken by the user.
An utterance is a spoken phrase that a user may speak.
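To make these three building blocks concrete, here is a minimal sketch of how an intent, a slot and a few sample utterances might be laid out in the interaction model’s JSON, written as a Python dictionary. The intent name `CreateBucketIntent`, the slot name `bucketName`, the custom slot type `BUCKET_NAME` and the sample phrases are assumptions made for illustration; they are not necessarily the names used in the finished skill.

```python
import json

# Hypothetical names: CreateBucketIntent, bucketName and BUCKET_NAME are
# placeholders chosen for this sketch.
interaction_model = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "devops cloud assistant",
            "intents": [
                {
                    "name": "CreateBucketIntent",
                    # The slot is the variable part of the utterance.
                    "slots": [{"name": "bucketName", "type": "BUCKET_NAME"}],
                    # Each sample is one phrase a user might say.
                    "samples": [
                        "create {bucketName} bucket",
                        "create a bucket called {bucketName}",
                        "make a bucket named {bucketName}",
                    ],
                },
                {"name": "AMAZON.FallbackIntent", "samples": []},
            ],
            # A custom slot type listing example values for the slot.
            "types": [
                {"name": "BUCKET_NAME", "values": [{"name": {"value": "kampala"}}]}
            ],
        }
    }
}


def utterances_for(model, intent_name):
    """Return the sample utterances registered for one intent."""
    intents = model["interactionModel"]["languageModel"]["intents"]
    for intent in intents:
        if intent["name"] == intent_name:
            return intent["samples"]
    return []


print(json.dumps(interaction_model, indent=2))
```

With a layout like this, saying “create kampala bucket” would match the first sample, and “kampala” would arrive at the backend as the value of the `bucketName` slot.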
All of the Alexa backend is constructed using the Alexa Skills Kit (ASK) SDK. There are a number of coding languages that can be used, including Python, Java and Node.js, but for our purposes we will be using Python.
The way the backend operates is that each time a user voices a command, the Alexa voice service “digests” it and passes it through the “frontend” (interaction model) to check for possible matches. If a match is found, the request is sent to the handler for the intent that the utterance matches.
The phrases that Alexa users can say to trigger an intent are specific, so it is important for programmers to include as many representative phrases as they can think of that their users might say. The interaction model is very literal and not intuitive, so each likely utterance needs to be specified by the coder in order to lead to the right intent.
Once an intent is matched, the Alexa Voice Service queries the backend for an intent handler registered under that intent’s name, so each handler must be “labelled” as the handler for its intent. Examples of this will be evident as we work through the code.
The intent handler (backend) processes the request and formulates an answer for the Alexa Voice Service to relay to the user through a device such as the Echo Dot or the Echo Show.
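The request flow described above can be sketched without any SDK by treating the incoming request and the outgoing response as plain JSON-shaped dictionaries. The handler functions, the intent name `CreateBucketIntent` and the slot name `bucketName` are hypothetical; the ASK SDK for Python wraps the same idea in handler classes, which this sketch does not attempt to reproduce.

```python
def build_response(speech_text, end_session=False):
    """Wrap plain text in the JSON shape a skill returns to Alexa."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            "shouldEndSession": end_session,
        },
    }


def handle_launch(request):
    """Answer the request sent when the user opens the skill."""
    return build_response("Welcome back sir. how may I help you today?")


def handle_create_bucket(request):
    """Read the slot value and confirm what will be created."""
    slots = request["intent"].get("slots", {})
    bucket_name = slots.get("bucketName", {}).get("value")
    if not bucket_name:
        return build_response("Which bucket should I create?")
    return build_response(f"Creating the bucket {bucket_name}.")


def handle_fallback(request):
    """Answer anything that did not match a known intent."""
    return build_response("Sorry, I did not understand that.")


# Each intent name is "labelled" with the function that handles it.
INTENT_HANDLERS = {"CreateBucketIntent": handle_create_bucket}


def route(event):
    """Pick a handler based on the request type and intent name."""
    request = event["request"]
    if request["type"] == "LaunchRequest":
        return handle_launch(request)
    if request["type"] == "IntentRequest":
        handler = INTENT_HANDLERS.get(request["intent"]["name"], handle_fallback)
        return handler(request)
    return handle_fallback(request)
```

The dictionary lookup plays the role of matching an intent to its handler: an intent name with no registered handler falls through to the fallback response.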
The backend of each Alexa skill must be hosted, and the location of the hosting needs to be declared when the skill is created. Hosting can occur on any platform that can expose an HTTP(S) API endpoint. Some platform options include Kubernetes, EC2, GCE and AWS Lambda.
For this tutorial we will use AWS Lambda hosting as the backend/endpoint but the app will be deployed and maintained by the Alexa development platform. This is because most of the focus needs to remain on the actual Alexa Skill and not on the underlying infrastructure.
The next step of the process will be to build out the backend of the skill, set up the required AWS IAM permissions, and then launch the completed work for testing.
The skill that we are building in this process creates, lists and deletes a registered AWS user’s S3 buckets.
Building the backend
As part of the initial implementation, we need to integrate Alexa with AWS S3 through its API. It’s helpful that the engineers at AWS already created very accessible APIs for each of their functions and paired them with easy to follow instructions on how to use SDKs.
As we discussed earlier in this tutorial, we will be using the AWS SDK for Python, Boto3. Part of the reason is what Amazon’s AWS literature says:
“Boto is the Amazon Web Services (AWS) SDK for Python. It enables Python developers to create, configure and manage AWS services, such as EC2 and S3. Boto provides an easy to use, object-oriented API, as well as low-level access to AWS services”
That sounds great, but at this point it is all just talk; let’s get to writing some code.
Start by creating a file (you can give it any name you choose) and include the following code within it:
We import all of the packages that we will need to build the backend and set up STS (Secure Token Service), the AWS security credentials service, in the section of code pictured above (lines 16-27). Line 18 will be addressed shortly when we are setting up AWS IAM permissions.
We choose the bucket name to be created from the slot value we assign to the variable “bucket_name” on line 55.
For the command “create kampala bucket”, for example, the slot value (and therefore the name of the S3 bucket) is “kampala”.
On lines 57 to 63 we create the S3 bucket with the help of AWS’ Boto3 SDK and set the “speak_output” variable to an appropriate response depending on whether certain checks (the “if” statement on line 59) pass or fail.
On line 64 we handle any errors that occur while we try to create the S3 bucket.
On line 69 we end the “create bucket” class definition and tell the Alexa service to build and deliver our response to the user. You can find this delivery on lines 70-73.
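As a rough, independent illustration of the steps just described (it is not the listing from the repository, and its line numbers do not correspond to the ones cited above), the bucket-creation step might look something like this. The function name `create_bucket_speech` and the response wording are assumptions; the S3 client is passed in as a parameter so that the sketch can be exercised without AWS credentials.

```python
def create_bucket_speech(s3_client, bucket_name, region="us-east-1"):
    """Create a bucket and return the sentence Alexa should speak."""
    try:
        if region == "us-east-1":
            # us-east-1 is the default region and takes no location constraint.
            s3_client.create_bucket(Bucket=bucket_name)
        else:
            s3_client.create_bucket(
                Bucket=bucket_name,
                CreateBucketConfiguration={"LocationConstraint": region},
            )
    except Exception:
        # boto3 reports API failures as botocore.exceptions.ClientError;
        # Exception is caught here only so the sketch runs without botocore.
        return f"Sorry, I could not create the bucket {bucket_name}."
    return f"The bucket {bucket_name} has been created."


# Usage with real credentials (requires boto3 and network access):
#   import boto3
#   speak_output = create_bucket_speech(boto3.client("s3"), "kampala")
```

Returning the spoken sentence from one function keeps the success and failure wording next to the call that can fail, which is the same role the “speak_output” variable plays in the walkthrough.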
For this Alexa skill I have implemented commands that create, list, count and delete S3 buckets.
In the code block pictured above you will find one of the classes that creates an S3 bucket. In order to make this article a manageable length, I have left out some of the remaining code, but if you are interested in finding any of the missing pieces please check out this GitHub repository.
Also, in an effort to make the code easily accessible so that anyone can follow this tutorial I have simplified the code, but if you do need further clarification on any part of this tutorial, please let me know in the comments.
AWS IAM permissions
Because the skill will be hosted on AWS Lambda and we are using the S3 API, we need to allow the Lambda function to call the S3 API while it runs the skill backend. To accomplish this, we will use an assume role. This allows “trusted” users or services (Lambda, EC2, etc.) to assume a “role” for a prescribed time period (for example 1 hour or 15 minutes, depending on how the assume role is coded). In simpler terms, it grants an entity the role’s permissions for a prescribed amount of time. We will use the “create”, “list” and “delete” operations as the permission scopes.
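As a minimal sketch of what assuming a role looks like in code, the helper below asks STS for temporary credentials and returns them in the form that `boto3.client` accepts. The function name `assume_role_credentials`, the session name and the role ARN in the usage comment are placeholders chosen for illustration; the STS client is passed in so the sketch does not need AWS access to run.

```python
def assume_role_credentials(sts_client, role_arn, session_name="alexa-skill",
                            duration_seconds=3600):
    """Ask STS for temporary credentials and return them as client kwargs."""
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
        DurationSeconds=duration_seconds,  # how long the credentials last
    )
    credentials = response["Credentials"]
    return {
        "aws_access_key_id": credentials["AccessKeyId"],
        "aws_secret_access_key": credentials["SecretAccessKey"],
        "aws_session_token": credentials["SessionToken"],
    }


# Usage (requires boto3 and a real role; the ARN below is a placeholder):
#   import boto3
#   kwargs = assume_role_credentials(
#       boto3.client("sts"), "arn:aws:iam::123456789012:role/ExampleRole")
#   s3_client = boto3.client("s3", **kwargs)
```

An S3 client built from these credentials can do only what the role permits, and only until the credentials expire.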
When you go to create an assume role, you should see something like this:
Choose “AWS Service” as the entity type that will be trusted, then click on “Lambda” as the service that will call AWS services on your behalf. You are basically giving that AWS service permission to use this role.
Then, click the “Next: Permissions” button. Clicking will take you to the following page to assign permissions to the role:
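Select “AmazonS3FullAccess” to grant all S3 access rights to this role. This is broader than the skill strictly needs, but it keeps the tutorial simple; for anything beyond a demo, a policy limited to creating, listing and deleting buckets is the safer choice.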
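For readers who would rather not grant full S3 access, a narrower policy covering only the create, list and delete scopes could look something like the sketch below, written as a Python dictionary that serializes to policy JSON. This is an assumption about what the skill needs and has not been checked against the finished code, which may require additional actions (for example, to empty a bucket before deleting it).

```python
import json

# A hypothetical, narrower alternative to AmazonS3FullAccess.
scoped_s3_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Creating and deleting buckets applies to bucket resources.
            "Effect": "Allow",
            "Action": ["s3:CreateBucket", "s3:DeleteBucket"],
            "Resource": "arn:aws:s3:::*",
        },
        {
            # Listing all buckets in the account is not tied to one bucket.
            "Effect": "Allow",
            "Action": "s3:ListAllMyBuckets",
            "Resource": "*",
        },
    ],
}

print(json.dumps(scoped_s3_policy, indent=2))
```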
If you want to, you can click the “Next: Tags” button to go to the next page and add some tags. Finally, click next to reach the page where you can give your role a name and create it.
It’s official, you have now created a role. Next you need to “tell” the role which entities are allowed to assume it.
Find your newly created role, like in the picture below, in the IAM section of your AWS console and click on it.
You will then be taken to the next page where you can see your role in detail.
To edit the entities that can use this role, go to the “Trust relationships” tab. For our purposes, the Lambda function of the Alexa-hosted skill is the entity that will make use of this role, so we need to add its Amazon Resource Name (ARN) to the trusted relationships. To accomplish this, switch over to the Alexa Developer Console and, under the “Code” tab, choose the icon shown in this image:
This will display the ARN for your skill as shown below.
Copy this ARN, go to the AWS console and add it to the role’s trust relationships. The new trust relationship configuration JSON will look like this (don’t pay attention to line 15):
The purpose of lines 12 to 20 (which we added) is to allow the Lambda function that hosts our Alexa skill to assume the role on line 16.
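A trust relationship of this kind can be sketched as a Python dictionary that serializes to the JSON the IAM console expects. The layout below is an assumption and will not match the line numbers cited above; the helper name `build_trust_policy` and the ARN in the usage line are placeholders, and the real value is the ARN copied from the Alexa Developer Console.

```python
import json


def build_trust_policy(skill_arn):
    """Build a trust policy that lets Lambda and one extra ARN assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Created by the console when Lambda is the trusted service.
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            },
            {
                # The statement we add: the ARN copied from the skill console.
                "Effect": "Allow",
                "Principal": {"AWS": skill_arn},
                "Action": "sts:AssumeRole",
            },
        ],
    }


# Placeholder ARN for illustration only.
policy = build_trust_policy("arn:aws:iam::123456789012:role/ExampleAlexaHostedRole")
print(json.dumps(policy, indent=2))
```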
Looks good, now it is time to test out our new skill.
Go back to the Alexa Developer Console here and then go to the “test” tab. You can enable the microphone and speak the commands, or simply type them in using your computer keyboard, which is what I will do for the sake of this article.
Make sure you invoke the Alexa Skill using the same word or phrase name you programmed at the beginning. For my skill the invocation will be “devops cloud assistant.”
This will trigger the “LaunchRequest” and, subsequently, its handler, and Alexa will speak the phrase “Welcome back sir. how may I help you today?” as I set it through the “speech_text” variable.
Try out different command words or phrases based on the utterances you defined. To invoke the proper response, what you say needs to match the utterances coded in the interaction model; otherwise a fallback intent will be triggered, and that will not elicit the same response.
As with other Alexa skills, once she has responded to your initial request, she is available to accept further commands for 30 seconds. After the allotted time, you need to invoke the skill again by using the original invocation phrase.
And that’s it, you have created an S3 bucket on Amazon AWS using Amazon Alexa.
Remember, you can use all these voice commands on any of the array of Alexa-enabled Amazon devices, like the Echo or Echo Dot.
Try using other “commands” to list, count or delete your S3 buckets. You can confirm each action by going to the AWS S3 page to check for any created or deleted buckets.
All the code for this is pushed to this GitHub repository for your reference.
While this code certainly isn’t perfect, it does work, and as our forefathers have always said, “If it works, don’t mess with it.”
A few bottlenecks to look out for
- A user must fulfill the requirements of all the platforms they are “gluing” together to avoid headaches. These include the AWS IAM permissions that allow an Alexa skill to access the AWS S3 API. Essentially this is authentication and authorization.
- Alexa skill builder slot types are limited to a set of built-in categories, like numbers and movie titles. Some slot types can only be used in certain regions, though: for example, if you search for them while located in Kenya, you won’t see any that apply there. It’s worth noting that it is possible to create your own custom slot type by specifying the values it should contain, like African surnames or village clan chiefs, for example.
- Bucket naming conventions are important. This is a sticky issue because a value can be perfectly valid for the Alexa slot type and still be rejected by S3: bucket names must be unique across the whole S3 universe and must follow S3’s naming rules, neither of which the slot type knows anything about.
All these requirements must align like Orion’s belt to pull off a project of this sort seamlessly. The slot type you choose determines what bucket names you can create and those must align with the S3 buckets naming convention.
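One way to narrow the gap between what a slot delivers and what S3 accepts is to normalize the spoken value before using it as a bucket name. The helper below is a minimal sketch under that assumption; the name `to_bucket_name` is hypothetical, and it covers only the basic character and length rules (lowercase letters, digits and hyphens, 3 to 63 characters). It cannot tell whether a name is already taken, since global uniqueness is only known to S3 itself.

```python
import re


def to_bucket_name(spoken_value):
    """Turn a spoken slot value into a candidate S3 bucket name, or None."""
    name = spoken_value.strip().lower()
    # Every run of characters other than a-z and 0-9 becomes one hyphen.
    name = re.sub(r"[^a-z0-9]+", "-", name)
    # Names must start and end with a letter or digit.
    name = name.strip("-")
    if not 3 <= len(name) <= 63:
        return None
    return name


print(to_bucket_name("Kampala Photos"))
```

A handler could call this on the slot value and ask the user for a different name whenever it returns `None` or S3 rejects the result.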
While this project does show many of the possibilities and capabilities that Alexa offers engineers in the software delivery industry, it is just the tip of the iceberg in terms of code quality, creativity and Alexa best practices. Feel free to take the skill we have practiced here and make your own additions to the code, or expand on the ideas. Use this tutorial as inspiration to create many more functions for Alexa that can further automate or simplify your life.
Just think of telling Alexa to take all your semester exams and report back on the grades you received, or commanding Alexa to build a Docker image for a given microservice all while you are watching a movie in your living room. Just imagine… The sky is the limit.