Google and Amazon smart speakers can be leveraged to record user conversation or to phish for passwords through malicious voice apps, security researchers warn.
Unless the two companies take measures to improve the review process and the restrictions for apps integrating with their smart devices, malicious developers could exploit the weakness to capture audio from users.
Called ‘skills’ for Amazon Alexa and ‘actions’ for Google Home, voice apps for these smart speakers are activated using a phrase (‘invocation name’) designated by the developer to start the app, which is typically the name of the app.
"Hey Google/Alexa, turn on my Horoscope" - invoking the Horoscope app
Functions of a skill or an action are called via an ‘intent’ – a set phrase that can have slot values for custom variables. In many cases, these reach the developer’s server.
"Tell me my horoscope for today" - intent with the slot value 'today'
Users can tell the smart speaker to stop a skill or action, but security researchers at Germany’s Security Research Labs demonstrated that an app can bypass this intent and keep on listening.
They show that a malicious app behaving this way can receive the security approval stamp from both Google and Amazon and put user privacy at risk as the converted audio reaches a third-party server.
A malicious Amazon skill used for eavesdropping could come with a modified deactivation intent that does not turn it off. Instead, the session could stay open for a defined period.
SRLabs achieved this by changing the ‘stop’ intent to keep the skill running instead of turning it off. Users will still hear the ‘Goodbye’ message signaling the end of the skill, though.
To keep the speaker silent during the eavesdropping session, the researchers added the Unicode character sequence “�. “ (U+D801, dot, space) after the intent.
The sequence cannot be pronounced and the speaker will stay quiet for a few more seconds while the malicious app listens to the conversation. The eavesdrop time can be extended by adding the characters multiple times.
With a second intent triggered by specific words, an attacker can record sentences as slot values. This would act as a backup method to spy on users.
Eavesdropping on Google Home
Actions for Google Home could monitor for user speech for much longer, because of its design. By putting the user in a loop, the device sends a continuous stream of recognized speech to the attacker without making a peep to signal its activity.
The speaker is designed to wait for about nine seconds for vocal input and stops for a brief moment. This is repeated three times before the action is deactivated. When speech is detected, the count is reset.
This hack is possible by changing the main intent to end with the ‘Bye’ earcon sound, normally used to mark the end of a voice app.
Multiple ‘noInputPrompts’ with SSML element or the unpronounceable Unicode character sequence will silence the speaker silent while the eavesdropping action continues its speech-to-text activity.
Changing the malicious intent occurred after the apps passed the initial review from Amazon and Google, and the modifications did not trigger a second verification.
Phishing the password
Using similar tricks, SRLabs demonstrated another attack scenario that could fool users into giving up their passwords.
For this purpose, the silence given by the Unicode characters is cut short to play a phishing message.
“An important security update is available for your device. Please say start update followed by your password,” could pass for a genuine request. However, neither Google nor Amazon asks for passwords this way.
Anything the user says after this message is turned into text and delivered to the attacker’s server. SRLabs created two additional videos to show how this would work.
Amazon Alexa password phishing
Google Home password phishing
The German researchers recommend that unpronounceable characters be removed and allowing sensitive output that could be used to extract secret information should be considered more carefully.
By Ionut Ilascu