
End of Day 1

So, at the end of the first day, one thing is clear:  It’s not just easy to use the speech recognition classes in .NET 3.0’s System.Speech.Recognition namespace, it’s really, really freaking easy to use them.  It takes just a handful of lines to start recognizing speech.

The harder part is figuring out what you want to recognize and how to deal with it after you’ve recognized it.
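For what it’s worth, here’s a rough sketch of one way the “deal with it” part could go: map each phrase to an action and look it up when something is recognized.  This isn’t code from the project.  The class name, the phrases, and the actions are all made up for illustration, and it assumes a reference to the System.Speech assembly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Speech.Recognition;

// Hypothetical sketch: none of these names come from the project itself.
public static class CommandDispatchSketch
{
    public static void Main()
    {
        // Placeholder commands: each recognized phrase maps to an action.
        var commands = new Dictionary<string, Action>(StringComparer.OrdinalIgnoreCase)
        {
            { "red",   () => Console.ForegroundColor = ConsoleColor.Red },
            { "green", () => Console.ForegroundColor = ConsoleColor.Green },
            { "blue",  () => Console.ForegroundColor = ConsoleColor.Blue }
        };

        using (var recognizer = new SpeechRecognizer())
        {
            // Build the grammar from the dictionary keys so the two can't drift apart.
            var choices = new Choices(commands.Keys.ToArray());
            recognizer.LoadGrammar(new Grammar(new GrammarBuilder(choices)));

            recognizer.SpeechRecognized += (sender, e) =>
            {
                Action action;
                if (commands.TryGetValue(e.Result.Text, out action))
                {
                    action();
                    Console.WriteLine("Ran command: {0}", e.Result.Text);
                }
            };

            Console.WriteLine("Say Something");
            Console.ReadLine(); // keep the process alive while events fire
        }
    }
}
```

Building the Choices from the dictionary keys means adding a command is a one-line change, which is about as much design as this deserves.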

There’s really not that much left for me to do with the speech recognition at this point.  I wanted to spend time getting it working, but it didn’t actually take any time to get it working…  Any more work would be toward a specific use, which I wasn’t really planning on doing here.  So that means tomorrow will focus on the facial recognition side of the project.

November 25, 2009   No Comments

reform might be the poetry

Like I said… Speech recognition sucks.

Instead of doing a grammar with choices, like I did before, you can use a DictationGrammar. Free form dictation. Free form, like beatnik poetry. As in it ends up making as much sense as beatnik poetry.
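Roughly, the swap looks like the sketch below.  It’s a minimal, standalone version rather than the exact code I ran: the class name is made up, and it assumes a reference to the System.Speech assembly.

```csharp
using System;
using System.Speech.Recognition;

// Hypothetical standalone sketch of free-form dictation.
public static class DictationSketch
{
    public static void Main()
    {
        using (var recognizer = new SpeechRecognizer())
        {
            recognizer.SpeechRecognized += (sender, e) => Console.WriteLine(e.Result.Text);

            // The default constructor loads the general free-form dictation grammar,
            // in place of a GrammarBuilder with a fixed set of Choices.
            recognizer.LoadGrammar(new DictationGrammar());

            Console.WriteLine("Say Something");
            Console.ReadLine(); // keep the process alive while events fire
        }
    }
}
```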

I have no idea what I said, but it certainly was not the following:

into the earth and one half of this room here we are we’re on

After that sentence got derailed, for a fun exercise, I decided to see how the speech recognizer would interpret the words of others, say, for instance, the Constitution:

we have people of the United States or 11 for you, justice for domestic front of the provider, and jail where welfare and secure the blessings of liberty to ourselves and our posterity Jordan’s challenge this constitution the united states of America

And to that, I believe Madison would have said “Here we are, we’re on!”

November 25, 2009   No Comments

And a better example…

Here’s an example of using System.Speech.Recognition to recognize the words “red”, “blue”, and “green”.

using System;
using System.Speech.Recognition;

namespace MathPirate.AlternativeInputDevices.SpeechCommandProcessor
{
    public class SpeechProcessor
    {
        protected SpeechRecognizer Recognizer { get; set; }

        public SpeechProcessor()
        {
            // SpeechRecognizer uses the shared Windows desktop recognizer.
            Recognizer = new SpeechRecognizer();

            // SpeechRecognized is the only handler that matters; the rest are debug output.
            Recognizer.SpeechRecognized += (sender, e) => Console.WriteLine(e.Result.Text);
            Recognizer.SpeechDetected += (sender, e) => Console.WriteLine("Detected: {0}", e.AudioPosition);
            Recognizer.SpeechHypothesized += (sender, e) => Console.WriteLine("Hypothesis: {0}", e.Result.Text);
            Recognizer.SpeechRecognitionRejected += (sender, e) => Console.WriteLine("Rejected: {0}", e.Result.Text);

            // Limit recognition to exactly these three words.
            GrammarBuilder builder = new GrammarBuilder();
            builder.Append(new Choices("blue", "green", "red"));
            Recognizer.LoadGrammar(new Grammar(builder));
        }
    }
}

No, seriously. That’s it. And most of it’s debug writelines that you don’t even need.
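If you want to actually run it, something like the following console entry point should do.  This is a sketch, not the original Main: the Program class is an assumption, the “Say Something” prompt is borrowed from the log output in the earlier post, and it assumes the SpeechProcessor class above is compiled into the same project with a reference to the System.Speech assembly.

```csharp
using System;
using MathPirate.AlternativeInputDevices.SpeechCommandProcessor;

// Hypothetical entry point for the SpeechProcessor class above.
public static class Program
{
    public static void Main()
    {
        // The constructor wires up the event handlers and loads the grammar.
        var processor = new SpeechProcessor();

        Console.WriteLine("Say Something");
        Console.ReadLine(); // recognition events fire until Enter is pressed

        // Make sure the processor isn't collected while we wait.
        GC.KeepAlive(processor);
    }
}
```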

November 25, 2009   No Comments

Well, that seems to be working…

Of course, this is completely meaningless without the audio to go along with it, but still…  It had a decently high hit rate for something that was quickly thrown together in an attempt to get something going on.

Say Something
Detected: 00:00:00.5400000
Rejected: blue
Detected: 00:00:01.0300000
Rejected: blue
Detected: 00:00:01.7600000
Rejected: blue
Detected: 00:00:02.7600000
Hypothesis: green
green
Detected: 00:00:04.8900000
Hypothesis: red
red
Detected: 00:00:07.8000000
Hypothesis: blue
blue
Detected: 00:00:10.8700000
blue
Detected: 00:00:15.3700000
Hypothesis: red
Rejected: red
Detected: 00:00:15.7300000
Rejected: blue
Detected: 00:00:18.8700000
Rejected: red
Detected: 00:00:20.4500000
Hypothesis: blue
Rejected: blue
Detected: 00:00:21.0500000
Rejected: blue
Detected: 00:00:21.4100000
Rejected: blue
Detected: 00:00:21.6900000
Hypothesis: blue
Hypothesis: blue
Rejected: blue
Detected: 00:00:31.7400000
Hypothesis: blue
Hypothesis: blue
Rejected: blue
Detected: 00:00:34.4500000
blue
Detected: 00:00:36.4600000
Rejected: blue
Detected: 00:00:38.6400000
Hypothesis: green
green
Detected: 00:00:40.2800000
Hypothesis: blue
Rejected: blue
Detected: 00:00:40.8800000
Rejected: blue
Detected: 00:00:42.6900000
Rejected: red
Detected: 00:00:50.6900000
Hypothesis: red
red
Detected: 00:00:52
Rejected: blue
Detected: 00:00:52.4400000
Hypothesis: blue
blue
Detected: 00:00:53.5600000
Rejected: blue
Detected: 00:00:55.0100000
Hypothesis: green
green
Detected: 00:00:56.4800000
Rejected: blue
Detected: 00:00:58.1300000
red
Detected: 00:00:59.8000000
Rejected: blue
Detected: 00:01:01.3400000
Hypothesis: blue
blue

November 25, 2009   No Comments

Rule #1: The Documentation Must Not Suck

So that’s how you create, initialize, and     a SpeechRecognizer…

[Image: WonderfulExamples]

Besides the fact that this example is apparently showing something that’s invisible, it’s not even really showing the creation and initialization, either.  Calling your local functions “SetupEventHandlers()” and “LoadInitialGrammars()” isn’t exactly all that helpful to me.  I understand that I need to set up event handlers and load grammars.  THAT’S WHY I’M READING THE DOCUMENTATION:  I know I need to do something, but I don’t know how.  You’ve pretty much shown me how to call the constructor on your class.  I managed to get that bit on my own, remarkably enough.

November 25, 2009   No Comments

Speech Command Processor

As mentioned before, one of the tasks will be to write a speech-activated command processor.  By this, I do not mean that I’ll be doing a full speech recognition engine capable of dictation.  Instead, I mean that I want to do something that will be able to recognize commands, like a “Say ‘one’ for service in Swahili” menu system or telling the computer to perform some operation like “Open the pod bay doors”.

There are two reasons for this limitation:

  1. I don’t need full dictation support for the application I have in mind.
  2. Speech recognition sucks.

Obviously, any form of speech recognition is a highly complex task, involving in-depth knowledge of linguistics and signal processing and all sorts of related things that I know nothing about.  That is why I am very glad that I don’t have to write any of it.  You see, one of the namespaces included in .NET 3.0 was something called System.Speech.Recognition.  It looks like the classes in that namespace will pretty much do everything for me, hopefully making this task dead simple to implement, while seeming really impressive to anyone who doesn’t know about them.

November 25, 2009   No Comments

Crazy Weekend Project 2: Semi-Crazy

As you may recall, back in September, I spent five solid days building a robot that could play Atari 2600 Pong. It wasn’t perfect, but it did beat the computer player in several matches. However, there was significant room for improvement. The motion was too jerky, the trajectory projection algorithm had problems, and the robot was no match for a human player. Over the next five days, I will not be continuing that project.

You see, a couple of weeks ago, I finally bought an Xbox 360, so I just don’t have that kind of time to devote to building robots at the moment.  Instead, I’ll be doing something much more practical and limited in scope, and only spending a few hours a day on it.  The rest of the time I’ll be alone in my apartment, immersed in HD gaming glory, like any other sane person would be this weekend.

Now, by “more practical and limited in scope”, I mean that I intend to attempt to build a facial recognition system and voice activated command processor.  The reason for this is plain:  Everyone needs a facial recognition system and voice activated command processor.  What good is a computer without one?  Additionally, these are two of the three necessary pieces that I need in order to fully exploit an HP TouchSmart PC that I got from Haggle.com a few weeks back, which has an integrated webcam and microphone.  The third piece, exploitation of the multi-touch screen, is left as an exercise to the reader.

As with the previous Crazy Project Weekend, I have not done any work in these areas or used any of these technologies prior to the commencement of the Crazy Project Weekend, other than a cursory glance to make sure that I’d have a chance of doing something useful in the timeframe allotted.  Additionally, I will be sharing successes, failures, thoughts, and above all, source code, which, in this case, might actually be useful to other people.

November 24, 2009   No Comments