Vista Speech Recognition Vulnerability?

February 1, 2007 Dr. Jones

As much as some security groups, including SANS, are salivating over the idea, this is not a vulnerability. It is not really even a weakness.

SANS contends that you can be tricked into downloading a specially crafted WAV file that takes advantage of the speech recognition capabilities in Vista to perform a malicious action, such as deleting a file or opening IE7 to go to a malicious URL and download a payload.

From SANS:

The best picture in my mind of this attack vector is a large trading room, in the middle of the night, and one computer shouting out loud “start listening”, “start”, “internet explorer”, “download <some tinyurl>”, etc.

So, how about prevention? Well, the answer is that you should disable Speech Command for the time being or use it carefully and wait for Microsoft to issue a patch which ignore output from the computer’s own speakers.

Microsoft counters that claim on its TechNet blog:

It is not possible through the use of voice commands to get the system to perform privileged functions such as creating a user without being prompted by UAC for Administrator credentials. The UAC prompt cannot be manipulated by voice commands by default. There are also additional barriers that would make an attack difficult including speaker and microphone placement, microphone feedback, and the clarity of the dictation.

SANS recommends disabling the Speech Tools because it fears they are a vector for exploiting machines. Clearly, the exploit is extremely unlikely to happen. However, I still recommend that enterprises disable the Speech Recognition Tools for a different reason: superstitious, clueless users.

As an incident handler for various organizations, I have seen over a dozen cases where a user reported that a hacking incident was taking place because “someone is typing on my screen!” And this was not with Vista, but with XP and the MS Office tools installed to provide speech-to-text. What the user believes is an invader typing on their screen is invariably the microphone picking up background noise or voices and attempting to write it into an open email or Word document. The solution is to turn off the speech tools, since the user is obviously too stupid to understand their own software.

You see, the speech recognition software is unconfigured by default. And it takes hours of ‘training’ the software by reading stories into the computer for it to recognize anyone’s voice with any measurable amount of success.

But what is worse than believing your system is hacked because you see someone typing? Why, having an incident handling team believe it too. I once worked with an incident handling team that was ready to dispatch the FBI to a critical site because a user believed that not only was there a hacker writing on their screen, but they were typing in Russian! I had to calm a lot of supervisors down with a firm voice of reason and I had to train several team members to think critically and to use evidence to reach their conclusions, rather than to react excitedly to hysterical claims by end users.

BelchSpeak
