Hidden Complexities in Voice Enabling Smart IoT Devices

3 min read · Apr 30, 2019

Voice Enablement of Internet of Things (IoT) Devices is Inherently Complex

Speech is an integral part of human communication, refined over thousands of years. Voice predates any form of written communication in our evolutionary history, and humans naturally gravitate towards voice when it is available. The rapidly growing adoption of intelligent voice assistants such as Alexa, Google Assistant, Siri, and Cortana provides strong evidence for the “Voice is the new Touch” phenomenon. “Shark Tank” billionaire investor Mark Cuban also feels that new business successes might come from the voice domain.

Data Sources: TechCrunch and CBInsights

Voice is increasingly being used for “checking the weather,” “shopping for groceries,” “booking a cab,” or “controlling electronic appliances” these days. While these assistants and services get labeled as “Intelligent” or “Simple,” the effort to understand end-user needs and implement them at scale is fairly complex. Providers of the backend services (weather data providers, taxi booking services, smart device manufacturers, etc.) have to spend significant resources on providing these “easy” end-user interactions over voice.

Designing a voice experience requires a specialized skill set that is not the same as the one needed to design a Web or Mobile App, or an API.

**Designing a Voice Enabled IoT Lightbulb**

As an example, let’s try to understand the challenges facing a smart device manufacturer that wants to support various voice assistants for its customers. Imagine that Manufacturer X is planning to launch voice-enabled smart light bulbs that can be controlled by Amazon Alexa and Google Assistant. Here are some of the project steps they have to follow:

1. Build a cloud infrastructure that lightbulbs will connect to after joining the consumer’s Wi-Fi.

Implement user identity management and accounting.
Provide device identity management.
Provide firmware management and the other basic requirements for secure communication between the cloud and all consumers’ devices.

2. Develop secure firmware for the lightbulb to protect consumers from botnet attacks.

3. Integrate a separate API/SDK for each of the desired voice assistants, such as Amazon Alexa and Google Assistant.

4. Test all of these functionalities at scale.

5. Obtain voice provider certification for public access.
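Step 3 is where the differences between voice providers show up most directly. The following is a minimal sketch, not a definitive implementation, of one way a manufacturer might isolate those differences behind a provider-neutral command. The request shapes, the names `provider_a` and `provider_b`, and the `LightCommand` type are all hypothetical; they do not reflect the actual Alexa or Google Assistant schemas.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class LightCommand:
    """Provider-neutral command understood by the manufacturer's cloud."""
    device_id: str
    power_on: bool


# Hypothetical, simplified request shapes. These are NOT the real Alexa or
# Google Assistant schemas; they only illustrate that each provider differs.
def from_provider_a(request: dict) -> LightCommand:
    return LightCommand(
        device_id=request["endpoint"],
        power_on=request["action"] == "TurnOn",
    )


def from_provider_b(request: dict) -> LightCommand:
    return LightCommand(
        device_id=request["device"]["id"],
        power_on=bool(request["params"]["on"]),
    )


# One adapter per voice provider; supporting a new provider means adding
# one entry here rather than touching the device-control logic.
ADAPTERS: Dict[str, Callable[[dict], LightCommand]] = {
    "provider_a": from_provider_a,
    "provider_b": from_provider_b,
}


def handle_voice_request(provider: str, request: dict) -> LightCommand:
    """Translate a provider-specific request into a neutral command."""
    try:
        adapter = ADAPTERS[provider]
    except KeyError:
        raise ValueError(f"unsupported voice provider: {provider}") from None
    return adapter(request)


if __name__ == "__main__":
    a = handle_voice_request(
        "provider_a", {"endpoint": "bulb-1", "action": "TurnOn"}
    )
    b = handle_voice_request(
        "provider_b", {"device": {"id": "bulb-1"}, "params": {"on": True}}
    )
    print(a == b)  # both requests map to the same neutral command
```

Keeping the device logic dependent only on `LightCommand` means a breaking change in one provider's request format is confined to that provider's adapter function.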

While the above project plan is over-simplified, one can easily anticipate that adding voice assistant support can extend Manufacturer X’s time to market for the lightbulb by an additional 6 to 12 months.

Time to Release Voice Compatible Devices

Of course, these numbers can vary based on the engineering skill set, budget, etc., but the tasks facing Manufacturer X are tough nonetheless. Also, the complexity grows rapidly once the device has to support functionality beyond merely turning a lightbulb on or off. Quality of experience is another important metric that is often compromised to achieve a quick time-to-market, but that discussion is for another day.

As a device manufacturer, you might find yourself asking some of the following questions:

How can we reduce time-to-market from a few months to a few days?
Do we have the in-house engineering skills to build an intuitive voice assistant service across multiple voice providers like Alexa and Google?
How can we future-proof our investment when a new voice provider enters the market or, more pragmatically, when an existing provider makes API/SDK changes that are not backward compatible?

We will address some of these daunting questions and outline the options available to a device manufacturer in our next post. Feel free to check out Speak to IoT in the meantime.