We are only beginning to scratch the surface of combining voice user interfaces (VUIs) with natural language generation (NLG), and the future landscape is an exciting one. By using NLG, platforms such as Amazon’s Alexa can now hold more human-like conversations.

The current state of VUIs and voice user experience (VUX) leaves a lot to be desired. Designing a VUI that delivers an enjoyable VUX usually means paying close attention to a user’s potential requests. Responses, however, typically sound awkward and unnatural. By leveraging NLG, responses can become more human-like and conversational, and can lead users on a unique journey through a skill. This leads to an optimal VUX.

In January, Automated Insights and Amazon Alexa hosted the first conversational language hackathon, highlighting this new incorporation of NLG. Teams from around the country gathered at Automated Insights’ headquarters to teach Alexa new NLG-based skills. Demonstrating cutting-edge integrations of Automated Insights’ Wordsmith platform and Alexa, the projects analyzed company stocks, evaluated soccer plays, reported on a loved one’s well-being, shared a child’s academic status, suggested television shows, music, and concerts, and prepared a financial advisor for a client meeting.

The Alexa + Wordsmith Hackathon Recap Video

As the technology continues to evolve, product managers, designers, and writers of voice interfaces who want to build new NLG-based Alexa skills need to change the way they design those skills. Quite simply, it is time to start “designing for the ear.” When setting out to develop a new skill that uses NLG, it is important to keep three things in mind.

  1. Have a good user interface
  2. Make it sound human
  3. Keep it simple

Building a Good User Interface

Having a good user interface may seem like an obvious tenet of designing a new skill for the Alexa platform. Oftentimes, though, especially with a new kind of user interface where best practices are yet to be defined, it is easily overlooked. It is essential to make functions discoverable.

Building Faith

As a user interacts with the device, the skill should offer suggestions that facilitate discovery. Responses such as “Would you like to check your balance?” or “Would you like to purchase more credits?” offer a pathway that takes a user deeper into the interface. Building trust with the user also makes for a great user experience. This can be achieved by a subtle or implicit confirmation of the user’s request: provide responses that incorporate the user’s original question. “Buy tickets for the new Star Wars,” a user may state. Responding with “Star Wars: Rogue One is playing at 7:30 and 9. Which would you like?” lets the user build faith in the skill.
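The implicit confirmation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Showing` type and the `join_times` and `confirm_and_prompt` helpers are hypothetical names, not part of the Alexa or Wordsmith APIs. The point is that the reply repeats what the skill understood before asking the next question.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Showing:
    """Hypothetical result of looking up the movie the user asked about."""
    title: str
    times: List[str]


def join_times(times: List[str]) -> str:
    """Join show times the way a person would say them aloud."""
    if len(times) == 1:
        return times[0]
    return ", ".join(times[:-1]) + " and " + times[-1]


def confirm_and_prompt(showing: Showing) -> str:
    """Echo what was understood (implicit confirmation), then ask a follow-up."""
    if not showing.times:
        return f"{showing.title} has no showings today. Would you like to try another day?"
    if len(showing.times) == 1:
        return f"{showing.title} is playing at {showing.times[0]}. Would you like tickets?"
    return f"{showing.title} is playing at {join_times(showing.times)}. Which would you like?"


print(confirm_and_prompt(Showing("Star Wars: Rogue One", ["7:30", "9"])))
```

Because the title and times come back in the reply, the user hears right away whether the skill understood the request, without a separate “Did you mean...?” step.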

Accounting for Question Variability

Likewise, it is important to allow users to ask questions in different ways. For instance, if a user attempting to check an account balance asks, “How many credits do I have?” a bad response would be, “I didn’t understand that. You can say ‘check your balance.’” With NLG incorporated into a new skill, a seemingly vague question like this can be understood, and appropriate responses can guide the user through the depth of the skill.
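One way to picture this is to treat several phrasings as the same request. The sketch below is a simplified stand-in, not how Alexa itself resolves requests (on a real skill the platform matches speech against the sample utterances in the skill’s interaction model). The intent names, the sample phrases, and the 0.5 overlap threshold are all assumptions chosen for illustration.

```python
import re
from typing import Optional

# Hypothetical intents, each with several phrasings a user might try.
SAMPLE_UTTERANCES = {
    "CheckBalance": [
        "check my balance",
        "how many credits do i have",
        "what is my balance",
        "how many credits are left",
    ],
    "BuyCredits": [
        "buy more credits",
        "purchase credits",
        "add credits to my account",
    ],
}


def normalize(text: str) -> str:
    """Lowercase and strip punctuation so phrasing differences matter less."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()


def match_intent(utterance: str) -> Optional[str]:
    """Return the intent whose samples share the most words with the utterance."""
    words = set(normalize(utterance).split())
    best_intent, best_score = None, 0.0
    for intent, samples in SAMPLE_UTTERANCES.items():
        for sample in samples:
            sample_words = set(sample.split())
            # Word overlap divided by combined vocabulary (Jaccard similarity).
            score = len(words & sample_words) / len(words | sample_words)
            if score > best_score:
                best_intent, best_score = intent, score
    # Assumed cutoff: below this overlap, report that nothing matched.
    return best_intent if best_score >= 0.5 else None


print(match_intent("How many credits do I have?"))
print(match_intent("purchase more credits"))
print(match_intent("play some jazz"))
```

The more phrasings each intent lists, the less often a user lands on the “I didn’t understand that” path.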

Making a Skill Sound Human

Sounding human seems like it should be easy, but it is incredibly easy to fall into the trap of providing only static responses. It is essential to adapt logically, providing insights rather than static information. For instance, a bad response may look like, “You have 1000 credits. The average user has 600 credits.” Useful? Slightly. Human-sounding? Not at all. A response such as, “You are doing great! You have far more credits than the average user,” instills a conversational, human-like tone.

Responding in the same way every time sounds robotic. It’s important to vary word choice and sentence structure naturally so that your responses don’t get stale. NLG makes this possible, allows personality to be built right into the responses, and gives the user a far better experience.

The typical robotic-sounding error messages can benefit greatly from NLG as well. Instead of answering every unrecognized command with the same “I’m sorry, I did not understand your request,” the platform can now suggest possible commands. This allows the responses to teach and guide the user.
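The difference between reciting numbers and offering an insight can be sketched as follows. This is a minimal illustration, not Wordsmith’s template logic: the tier names, the 1.25 and 0.75 cutoffs, and the phrasings are all assumptions. The function compares the user’s credits with the average, picks a tier, and avoids repeating whatever was said last time.

```python
import random
from typing import Optional

# Hypothetical phrasings per tier; a real skill would carry many more.
RESPONSES = {
    "ahead": [
        "You are doing great! You have far more credits than the average user.",
        "Nice work! Your credits are well above average.",
    ],
    "typical": [
        "You are right on track. Your credits are close to the average.",
        "You have about as many credits as most users.",
    ],
    "behind": [
        "You are running a little low. Most users have more credits than you do.",
        "Your credits are below average. Would you like to purchase more?",
    ],
}


def classify(credits: int, average: int) -> str:
    """Turn raw numbers into a tier using assumed cutoffs."""
    if average <= 0:
        raise ValueError("average must be positive")
    ratio = credits / average
    if ratio >= 1.25:
        return "ahead"
    if ratio >= 0.75:
        return "typical"
    return "behind"


def credit_insight(credits: int, average: int, last: Optional[str] = None,
                   rng: Optional[random.Random] = None) -> str:
    """Pick a phrasing for the user's tier, skipping the one used last time."""
    rng = rng or random.Random()
    options = RESPONSES[classify(credits, average)]
    fresh = [option for option in options if option != last] or options
    return rng.choice(fresh)


first = credit_insight(1000, 600)
second = credit_insight(1000, 600, last=first)
print(first)
print(second)
```

Passing the previous reply back in as `last` is what keeps two consecutive answers from sounding identical.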

Keeping it Simple

Being an early adopter at the forefront of a new integration of technologies is exciting, but it is important to keep things relatively simple. Just as in good writing, keep it short. Be concise and be consistent. Use fewer clauses, because too many independent clauses within a sentence make it hard for a user to parse the device’s response. Extra clauses also give the text-to-speech engine more places to pause, which makes the tempo of the conversation more awkward. Use simple, clear language. Don’t overcomplicate things with big words. Use language familiar to users, especially when your potential audience may include children. Simplicity is key.
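These guidelines can be checked roughly before a response ever reaches the text-to-speech engine. The sketch below is a simple lint. The three limits are arbitrary assumptions rather than platform rules, and counting commas is only a crude proxy for counting clauses.

```python
import re
from typing import List

# Assumed limits for illustration; tune them for your own audience.
MAX_WORDS_PER_SENTENCE = 15
MAX_COMMAS_PER_SENTENCE = 2
MAX_WORD_LENGTH = 12


def review_response(text: str) -> List[str]:
    """Return warnings for sentences that may be hard to follow by ear."""
    warnings = []
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    for sentence in sentences:
        words = re.findall(r"[A-Za-z']+", sentence)
        if len(words) > MAX_WORDS_PER_SENTENCE:
            warnings.append(f"Long sentence ({len(words)} words): {sentence}")
        if sentence.count(",") > MAX_COMMAS_PER_SENTENCE:
            warnings.append(f"Many clauses ({sentence.count(',')} commas): {sentence}")
        for word in words:
            if len(word) > MAX_WORD_LENGTH:
                warnings.append(f"Long word: {word}")
    return warnings


print(review_response("You have 1000 credits."))
```

A response that comes back with an empty list is not guaranteed to sound good, but one that triggers several warnings is probably worth reading aloud and trimming.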