Victory At Last!

So, I finally got a new paddle cage put together.  While the old one was all stylish and minimalist, this one looks like it’s built out of Legos.  Real Legos, too, not those lame Technic beams.

Features include gearing to hopefully reduce the impact of the motor overrun, a button pressing motor (Not yet active), and flexible support arm for the angled knob.

I wrote a test program that will repeatedly rotate it like crazy and hopefully shake out the bugs.  Here’s video of it in action.

Of course, what kind of test would it be if it didn’t end in catastrophic structural failure?  Unfortunately, the camera was not rolling for the chaos, but I assure you it was about as spectacular as it could be given what it is.  Anyway, I did record the aftermath.

The flaw is fairly obvious in the first video:  there are some tall connector blocks on the spinner that kept knocking against the gear bar.  Eventually one of them caught and didn’t knock loose, but the motor spun the 360 anyway.  It kinda popped loose for a bit, then shuddered, before eventually collapsing.  I’ve since fixed the problem[1] and had it run for upwards of ten minutes without issue.

It feels good to have some forward progress.

For those curious, the paddle in the cage is one of the Indy 500 Driving Paddles with a free 360 degree range of motion.  That’s how it was able to spin all it wanted without running into the range limitation that’s in a normal paddle.

  1. I was going to fix it before even running it the first time, but I wanted to see if it would cause a catastrophic structural failure…

February 27, 2010

Previously on Crazy Project Weekend…

A Crazy Project Weekend is when I take an extended weekend and dedicate my time to a AAA project:  One that is Achievable, Awesome, and slightly Abnormal.  There are a couple of rules, made up on the spot this instant, guiding the Crazy Project:

  • Work must be done within the limited weekend time frame.  You cannot begin any concrete work prior to the time window, and if you do not complete by the end of the time, you have failed.  You may do some preparation ahead of time, such as feasibility research or acquiring necessary materials; however, nothing should be built and there should be no written plans.  The point is to see what can be done in five days, not what can be done in five days and a couple of hours an evening for three weeks prior to those five days.
  • It must not be something you would otherwise normally do.  Setting up a website with a blog and a bunch of pictures of the dog doesn’t count.  Cleaning the garage doesn’t count.  It must not be something that anyone would normally do.
  • You have to learn something.  If you know exactly what you’re doing going in, then it’s no fun.  One of the central pieces of the project must involve something you’ve never worked with before.  There must be several moments where you have no idea what in the hell you’re doing and wonder what you’ve gotten yourself into.
  • You must post regular progress updates throughout the weekend, detailing what you’re doing and what you’ve done.  Viewers must be able to get a glimpse of your thought process and understand what you’re going through.  You should talk about initial goals and milestones, obstacles you see on the path to those milestones, and the general approach you plan to take.
  • Reaction from outsiders to your project must be a mix of “Why did you do that?” and “Oh man, that is AWESOME”.
  • It’s fine to have a mental plan going in, to make sure that you’ve appropriately scoped the project so you have a reasonable chance of success, no matter how unreasonable the project itself may be.
  • Continuing the effort from a previous Crazy Project Weekend is acceptable, even though it violates some of the previous rules.

The first Crazy Weekend Project was over Labor Day Weekend, in September 2009.  I decided that it would be a good use of my time to build a robot out of Lego Mindstorms that could play a game of Pong on an unmodified Atari 2600 and win.  Initially, I had planned to make it play a perfect game of Pong, but I didn’t get there.  Full details here:  http://www.mathpirate.net/log/category/crazy-weekend-project-1-pong-robot/

The second Crazy Weekend Project was over Thanksgiving 2009.  It was limited, in that I only dedicated about half the day to the project (The other half being dedicated to XBox 360…).  There were two goals for this project:  Put together a speech recognition system capable of recognizing and responding to a set series of commands, as well as write a system that could identify faces.  Speech recognition came together very quickly, so the bulk of the time was spent trying to make Wesley Crusher disappear from episodes of Star Trek: The Next Generation.  Full details here:  http://www.mathpirate.net/log/category/crazy-weekend-project-2/

This will be my third Crazy Weekend Project.

February 25, 2010

Web Automation (or: How To Write A Bot To Steal Porn)

A while back, I wrote about using the System.Windows.Automation libraries to write automation to drive Windows applications. With SWA and UIAutomation, you can write code to use Win32 apps, Windows.Forms programs, WPF and even Silverlight. That’s all happy and fun, as long as you’re only dealing with Windows applications. Trouble is, there’s this thing called “The Web” that’s all the rage with kids these days, and sooner or later, you’ll probably have to use it, too. If you pull out your handy installation of UISpy and try to inspect a web page, you get a whole big block of nothing. The red rectangle will outline the window and tell you that all of those things that look like text boxes and buttons aren’t really text boxes and buttons. That means you can’t use SWA for web sites.

That, well, that kinda sucks.  So, what do you do about it?

Obviously, the correct solution here is to admit defeat:   The tool you know about doesn’t work, so it’s too hard to do.  Time to give up and pay thousands of dollars a seat for some whiz-bang tool that promises to do what you need and even has a handy-dandy recorder, so you don’t even have to think about what you’re doing!

Or…  Not.

That whiz-bang tool is only going to cost you money and it’s not going to do a damn thing for you.  You’ll have to pay high-priced consultants and high-priced support engineers just to figure out how it works.  You see, their model is to cram so many features in and make it so complicated to use that you think that you must be stupid because you can’t understand it and as soon as you figure out that one last thing, you’ll be more productive than you ever were before.

And oh, will that test recorder make you productive!  You’ll be able to hire a monkey to point and click your way to hundreds of test cases with ease!  Except that they’re hundreds of useless test cases, because either the verification that the tool provides is hopelessly limited and unable to actually verify your website, or, well, you hired a monkey to do your testing and they have no idea how to do anything beyond pointing and clicking.  But that’s all right.  You see, as soon as a single line in the HTML of your web page changes in just the tiniest way, every last one of those recorded tests will break and you’ll have to completely redo them.

So, SWA is out and the big expensive tool is a total waste.  What else is there?

Well, there’s things like Selenium or WebAii or WatiN.  They’re free or open source libraries that you can use to drive web browsers to do your bidding.  They all support IE and Firefox and possibly other browsers.  And they’re all written by people who don’t seem to have ever tried to write web automation.

  • Selenium:  The default mode is to write your tests in HTML tables, with the thinking that “Anyone can write HTML tables, so anyone can write tests.”  That’s not what happens.  What happens is that you set it up, all the devs and PMs excitedly chatter about how “Anyone can write tests now!”, you give a training session, two devs out of a team of seven will ever write tests using it, creating a grand total of thirteen absolutely worthless tests before giving up, yet somehow, two months later, the director of software engineering will be talking to the EVP of product development and tell him how great it is that we’re using Selenium because “Anyone can write tests now!”, so when you try to tell them what a complete waste it is and how you hate having to maintain the intermediary server and how unstable the test automation is and that we should dump the whole system, they look at you like you’re trying to kill a basketful of cute puppies.
  • WebAii, in my experience, is a tad unstable, and since it’s not open source, you can’t even try to fix it.  Additionally, it needs a plug-in to work, so again, you have to maintain a test machine.
  • WatiN hasn’t even been compelling enough for me to try to use.  That’s not saying it’s bad, it’s just that nothing about it has really stood out to me.

Another thing that really bugs me about these solutions is that many of them don’t really work that well with continuous integration situations, despite claims that they’re designed for that very use.  At my company, our CI servers are all using CCNet, which is running as a service.  When running as a service, you don’t typically get an interactive window station.  In general, that’s fine.  You don’t need one.  Our build servers are spare boxes stuffed in a cabinet somewhere or rack machines living in an off-site datacenter.  Once they’re set up, it’s pretty much all automatic.  We can log into the build box remotely in the rare instance that something does go wrong, but we never stay logged in.  In fact, we can’t.  You see, in my company (and probably in yours), there are computing security policies in place that prohibit leaving an unattended computer logged in and unlocked.  If you leave your computer unlocked, it will be locked for you.[1]  Trouble is, most of these web automation libraries I mentioned above require a logged in and unlocked session to function at all.[2]

Okay, so no SWA, no expensive tool, and now the free stuff is shot down, as well.  What’s left?

Wouldn’t it be great if there’s something that’s free?

Wouldn’t it be great if there’s something that’s already installed on pretty much every copy of Windows since 95 OSR2?

Wouldn’t it be great if there’s something that works in headless service environments?

Wouldn’t it be great if there’s something that uses the same technology as the majority of web users?

In other words, why don’t you just use Internet Explorer to do your web automation?

Now, I’m guessing that you just answered my question with some sarcastic remark regarding Firefox, so let me address that before continuing.  Yes, using IE means you’re not using Firefox.  I understand that you like Firefox and all, but in the real world, people use IE.  Additionally and importantly, it usually doesn’t really matter that you’re only using IE.  Most of the differences between browsers are cosmetic things, like Firefox’s strange habit of occasionally making oversized divs that mask clickable areas or IE6 generally making every page look as attractive as cat vomit.  Normal web automation, regardless of what tool you’re using, will typically not pick that sort of thing up.  Web automation looks at the structure and functionality of the page, but it’s blind to the looks.  Many of the other tools I mentioned do support Firefox, if you need it, but you probably don’t need it.  After several years of web testing, I’ve only come across a handful of cases where running an automated test in Firefox would have picked up issues that would not have been seen in IE.[3]  For the most part, going the extra mile to support Firefox in your automation is unnecessary and simply complicates things.

So, let’s look at using IE to solve all of your automation problems!

Okay, it won’t solve all your problems.  In fact, it’ll create new ones, I guarantee it.  But still, it’s very useful.

But first, a little warning…

You’re going to have to use COM.

Well, okay, you don’t have to use COM.  There is a .Net Web Browser class that you can probably do most of these things with, but I don’t use it.  I don’t use it because, as far as I’ve found, there’s no way to attach it to a real instance of IE.  Instead, you’d have to write your own little Windows Forms app, stick the control on it, and use it that way.  That might work for you, but I’ll stick to the full instance of IE that I can watch and manually interact with if necessary, even if it means using COM.  It is COM in .Net, though, so it’s not as bad as straight COM in C++.  There’s no QueryInterface or CComPtr<>s or anything like that.  There are slightly weird things now and then, but they’re not that bad.

Right.  Disclaimer out of the way, let’s get started.

First, you need to add two COM references to your project.  In the Add Reference dialog, go to the COM tab and select “Microsoft HTML Object Library”, which will give you MSHTML, and “Microsoft Internet Controls”, which will give you SHDocVw.

 MSHTML is where all of the HTML parsing and related classes live.  SHDocVw is where the Internet Explorer classes are.

Now that you’ve added those references, add your using statements for the libraries, so you won’t have those ugly namespaces all over your code.

using mshtml;
using SHDocVw;

Note that although the reference to MSHTML gives the name in all caps, the namespace is, in fact, lower case.  SHDocVw is the same case both places.

Once you’re set up, you can create an instance of Internet Explorer that will launch and be ready for you to drive it through your code with one line:

InternetExplorer ieBrowser = new InternetExplorerClass();

Of course, there’s a slight problem here.  You can’t actually see the browser.  It’s there, trust me, it’s there, and pretty much everything I’m about to talk about will still work, even though you can’t see it.  However, in the interest of proving to you that what I’m talking about does, in fact, actually work, let’s make a minor modification so you can see things.

InternetExplorer ieBrowser = new InternetExplorerClass();
ieBrowser.Visible = true;

There, if you run that, IE will pop open.  It won’t do much yet, but at least there’s some progress being made.

A brief aside:  You may have noticed that I created an instance of “InternetExplorerClass”, but assigned it to a variable of type “InternetExplorer”.  I did that because InternetExplorer is actually an interface, so you can’t create an instance of it.[4]  InternetExplorerClass is the actual class that you need an instance of.  You could probably also do something with Activator.CreateInstance(), but I’m not going there.  I’ll have more about interfaces in a bit.

Back to the fun, to prove that we’re in control, and to start doing something actually useful, let’s point the browser at a website.  Let’s have our browser go to everybody’s favorite search engine:  Dogpile.com.  To navigate the browser you’re in control of, you use the .Navigate() method.  Unfortunately, .Navigate is all COMtaminated and ugly.[5]

No, that’s not Intellisense having a freak out.  The signature of the Navigate method is actually void IWebBrowser2.Navigate(string URL, ref object Flags, ref object TargetFrameName, ref object PostData, ref object Headers).  You only care about the URL, but it’s not going to provide you with an overload that only uses the URL.  Instead, you get all of this “ref object” crap.[6]

I bet your first instinct is to think that you’ll just pass nulls to the parameters you don’t care about, and compile it and be happy.  Well, that ain’t gonna work.  See, the “ref” part of the signature means that it expects an actual object reference.  null is not an object reference, null is nothing.  The compiler won’t let you pass in nulls directly.  However, you can pass in a null object reference, and that’ll work.  Like so:

public static void NavigateToUrl(InternetExplorer ieBrowser, string url)
{
    object nullObject = null;
    ieBrowser.Navigate(url, ref nullObject, ref nullObject, ref nullObject, ref nullObject);
}

You may have noticed that I put the Navigate call inside a helper method.  Helper methods and wrapper classes are one of your closest friends in the world of SHDocVw and MSHTML.  It’ll help hide all of the IE COM object’s interesting personality quirks in much the same way that girl you met on Match hid her interesting personality quirks until the fifth date.  Trust me, you don’t want “ref nullObject” a thousand different places in your tests, largely because it’ll scare the hell out of anyone reading your code.

If you go back to the main function and call NavigateToUrl(ieBrowser, "http://www.dogpile.com");, then run the code, you’ll have a browser that will open up and go to Dogpile all by itself.  Of course, if you are running the code as we go, you’ll probably have noticed that the browser remains open after your program exits.  Let’s take care of that before your computer explodes under the weight of a thousand IEs, shall we?  Just call .Quit() on the browser and it’ll go away.

If you call .Quit() immediately after navigating, the browser will probably close before the page even loads, so let’s add a sleep for a few seconds so you can see what’s going on.

For those of you playing the home game, here’s what my main function looks like at this point:

InternetExplorer ieBrowser = new InternetExplorerClass();
ieBrowser.Visible = true;
NavigateToUrl(ieBrowser, "http://www.dogpile.com");
Thread.Sleep(5000);

//Do stuff here...

ieBrowser.Quit();

At this point, the code above is fairly useless.  Sure, you can use this to build a program that forces IE to navigate to web pages all day, but that’s not terribly exciting.  We’re having the browser navigate to a search engine, why don’t we search for something?

(By the way, you’ll want to leave that Thread.Sleep(5000); where it is.  I’ll come back to it later, but for now, DON’T TOUCH!)

When you search for something, what do you do?  Type a word in a box and click a button, right?  That’s what we need to do here.  The InternetExplorer object allows you to access all of the HTML elements on the page and interact with them, including text boxes and buttons.  If you’ve ever used JavaScript and dealt with the Document Object Model, or DOM, the methods and properties you’ll find in MSHTML will be very familiar, because they’re another implementation of the DOM standard.  The way you gain access to HTML elements is through the .Document property.

If you try to use it, Intellisense will be really helpful and tell you that the .Document property is an object.  A plain object.  A plain, useless object.  So what is the .Document property returning?

An IHTMLDocument object.

Or an IHTMLDocument2 object.

Or an IHTMLDocument3 or 4 or 5 object…

Now’s probably the time to talk about the use of interfaces in MSHTML.

In the land of .Net, if you had an IHTMLDocument5 interface, it would probably derive from IHTMLDocument4, which would derive from 3 and so on.  IHTMLDocument5 would have all of the stuff that was on the previous four interfaces, so that would be the only one you’d ever need to use, at least until IHTMLDocument6 comes along.  Not so in the land of MSHTML.  I’m not sure if it’s a COM restriction, a C++ thing, the way .Net deals with COM interfaces, or some strange design decision on the part of MSHTML, but the end result is that IHTMLDocument3 and IHTMLDocument2 are pretty much independent.  If you want the title of the page, you need a reference to an IHTMLDocument2 object.  If you want to call .getElementById(), you need IHTMLDocument3.

But that’s only if you want to do it the “Right” way.  If you want to do it the quick and easy way, then the .Document property is returning an HTMLDocument object.  That’s the class that implements IHTMLDocument*, so it’s got everything on it.  If you want the page title and if you want to call .getElementById(), HTMLDocument will work for you.

Of course, it’s slightly riskier to do it that way.  The interfaces guarantee the contract, the class does not.  Microsoft could change the class at any time and you’d be screwed.  However, I highly doubt they’re going to do anything like that, because it would screw them over far more than it’ll screw you over.  In other words, just use HTMLDocument and you’ll have access to all the available properties, functions, events, etc., without having to cast between the interface types three hundred different places in each method.

It’s important to know that IHTMLDocument*s exist, since that’s where you’ll find much of the documentation.  And on a similar note, all of the HTMLElements that I’m going to talk about have corresponding interface types, and they’re usually what’s documented or talked about.  So, if you can’t find something about how HTMLElement works, try looking for IHTMLElement.  Or IHTMLElement2.  Or 3.  Or 4.
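For comparison, here’s a rough sketch of what the “Right” way might look like, with one cast per interface.  The class, the method name and the elementId parameter are made up for illustration; the only MSHTML members it leans on are IHTMLDocument2.title, IHTMLDocument3.getElementById() and IHTMLElement.tagName.

```csharp
using System;
using mshtml;
using SHDocVw;

static class DocumentInterfaceSample
{
    // Hypothetical helper: prints the page title and the tag name of one element,
    // using the individual interfaces instead of the HTMLDocument class.
    public static void PrintTitleAndTag(InternetExplorer ieBrowser, string elementId)
    {
        // .Document comes back as a plain object, so every interface needs its own cast.
        object rawDocument = ieBrowser.Document;

        // The title lives on IHTMLDocument2...
        IHTMLDocument2 document2 = (IHTMLDocument2)rawDocument;
        Console.WriteLine(document2.title);

        // ...but getElementById() lives on IHTMLDocument3.
        IHTMLDocument3 document3 = (IHTMLDocument3)rawDocument;
        IHTMLElement element = document3.getElementById(elementId);

        // getElementById() returns null when nothing on the page has that ID.
        Console.WriteLine(element == null ? "(not found)" : element.tagName);
    }
}
```

Two casts for two lines of real work is the pattern that multiplies across a test suite, which is why the rest of this post sticks with the concrete class.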

Now that we’ve taken that little vacation, let’s get back to work here.  I made such a big fuss about getting the page title, so let’s do that here.

HTMLDocument htmlDoc = (HTMLDocument)ieBrowser.Document;
string pageTitle = htmlDoc.title;
Console.WriteLine(pageTitle);

Before you can interact with an element on a page, you have to find it in the document.  There are two easy ways to find things, along with a few ways that aren’t quite that easy.  Here are the ones I find the most useful.

  •  .getElementById(string):  This method takes the ID of the HTML element you want and returns the IHTMLElement with that ID. In HTML, an ID is supposed to be a unique identifier, identifying a single element.  Of course, certain popular HTML editors (like Notepad, for instance) won’t enforce a unique ID, so if there are multiple elements with the same ID, this method will return one of them.  This one is good if you know exactly what you’re looking for.
  • .getElementsByName(string):  This method takes the name of HTML elements and returns an IHTMLElementCollection of all of the elements with that name.  This one is good if you have an element or a handful of elements with a known name.
  • .getElementsByTagName(string):  This method takes a tag name, like “a” or “img” or “div” and will return an IHTMLElementCollection of all of the elements with that tag name.  Use this method to quickly get a collection of all of the links or images on a page, or if you’re looking for an element of a certain type with certain characteristics and need to run through the list to find it.
  • .documentElement:  This property returns an IHTMLElement of the root of the HTML content of the page.  On a page that plays by the rules, this will be the <html> element.  If your page doesn’t play by the rules, good luck.  This is a good starting point if you want to walk the tree.
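  • .children:  This property will give you an IHTMLElementCollection of the direct child elements of the current element.  (The similar .childNodes property also includes text nodes and hands back an IHTMLDOMChildrenCollection instead.)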
  • .all:  This property returns an IHTMLElementCollection containing a flattened list of all of the elements in the document.  Use this when you don’t care about structure and need to do something that involves lots of nodes of different types.

Unfortunately, as far as I’ve found, there’s no support for something like XPath, which would let you give the node tree path of the elements you want in a simple string format.  If you enjoy pain, you could build something like that yourself.

The specified return type for most of these methods is IHTMLElement, which is the base element type in MSHTML.  In reality, the element instances are all specific element types.  For instance, an <img> tag will return an IHTMLImgElement object, and an <a> will give you an IHTMLAnchorElement object.[7]  The specific types will have specific properties, so if you know what element type you have and you need to use it for something (Say, for instance, if you need to get the src attribute from an <img>), then you should cast it to the specific type.

Right-o, let’s start doing useful stuff, shall we?  Back before we took a wild turn and ended up hopelessly sidetracked, we had Internet Explorer going to the front page of the search engine Dogpile.com.  Now, let’s make IE do a search.  To do that, we need to grab the search box, put text in it, then grab the search button and click it.  We’ll use the .getElementById method I talked about to get the search box.  Using something like the IE developer tools or Firebug[8] or even viewing the page source in Notepad, you can find that the search box is an <input> element with an ID of “icePage_SearchBoxTop_qkw”.

IHTMLInputTextElement textBox = (IHTMLInputTextElement)htmlDoc.getElementById("icePage_SearchBoxTop_qkw");
textBox.value = "powered by awesome";

If you run this, the browser will open, and the phrase “powered by awesome” will appear in the search box.

A couple of points to note.  Even though in the HTML, the search box is an <input> tag, the element you’ll get back is an IHTMLInputTextElement.  The different <input> types are all represented by distinct classes, which is very helpful, because there’s not much in common between a checkbox, a text box, or a button.  Then, once you have the element, it has a .value property, which acts as a getter and setter for the contents of the text box.

Grabbing the submit button is similar:

IHTMLInputButtonElement submitButton = (IHTMLInputButtonElement)htmlDoc.getElementById("icePage_SearchBoxTop_qkwsubmit");

Unfortunately, when you try to click the button, you’ll run into this:

There’s no click there.  There’s nothing remotely resembling a click.  A button’s sole reason for existing is to be clicked, yet you can’t click this button.

Actually, you can.  Just not on IHTMLInputButtonElement, where you’d think you should be able to.  You see, you can actually click on any HTML element, so the .click() method is on the base IHTMLElement.  This goes back to the interfaces I went rambling on and on about a while back.  To find the functionality you want, you sometimes have to bounce around almost randomly until you find what you need.  So, to hell with the interfaces, let’s go directly with the concrete class again, like we did with the document.  In this case, it’s HTMLInputButtonElement.[9]

HTMLInputButtonElement submitButton = (HTMLInputButtonElement)htmlDoc.getElementById("icePage_SearchBoxTop_qkwsubmit");
submitButton.click();
Thread.Sleep(5000);

Again, there’s a Thread.Sleep() after the action, so the program will wait long enough for the page to finish loading.  And again, I promise I’ll talk about it later, but for now, trust me and just leave it there.

Now we’re on an entirely new page.  If you try to use the document or the elements you grabbed before, the results will be, uh, shall we say, unpredictable…  The old page no longer exists, so don’t try to use anything from it.  You have to grab a new reference to the document, as well as new elements to play around with.

We’re on a search results page now, so let’s do something like print out all the result titles and the URLs to all of the images.

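//The search navigated to a new page, so grab a fresh reference to the document first.
htmlDoc = (HTMLDocument)ieBrowser.Document;

Console.WriteLine("Links by tag name:");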
foreach(IHTMLElement anchorElement in htmlDoc.getElementsByTagName("a"))
{
    if(anchorElement.className == "resultLink")
    {
        Console.WriteLine(anchorElement.innerText);
    }
}

Console.WriteLine("Links by result walking:");
IHTMLElement resultContainerDiv = htmlDoc.getElementById("icePage_SearchResults_ResultsRepeaterByRelevance_ResultRepeaterContainerWeb");
foreach (HTMLDivElement resultDiv in (IHTMLElementCollection)resultContainerDiv.children)
{
    IHTMLElement resultLink = (IHTMLElement)resultDiv.firstChild;
    Console.WriteLine(resultLink.innerText);
}

Console.WriteLine("img src:");
foreach (IHTMLImgElement imgElement in htmlDoc.getElementsByTagName("img"))
{
    Console.WriteLine(imgElement.src);
}

Console.WriteLine("img src 2:");
foreach (IHTMLImgElement imgElement in htmlDoc.images)
{
    Console.WriteLine(imgElement.src);
}

The first bit walks through all of the links, which are <a> tags, looking for elements with the class “resultLink”.  When it finds one, it prints out the .innerText property, which contains the flattened text content of the element.  The second section finds the same elements, but walks through a bit of the tree structure to find the links among the child nodes.

I should probably point out now that the structure of websites tends to change over time, so if you try to run this and you get a bunch of exceptions, that’s what’s going on.  If anything on the page changes, this code is likely to break.  It works for me right now, and that’s really all that matters anyway.

The last bit, the part with the “img src” is walking through the page and printing out the URLs of all of the images on the page in two different ways.  First by using the tag name method you already have seen, and the second time by using the .images convenience property on the document.  There are a few other properties like that, so take a look at what Intellisense shows you and play around a bit to get a feel for what’s there.

BUT WAIT, THERE’S MORE!

We’ve got all this access to stuff on the page.  We can put text in text boxes, we can click buttons, we can read the links and images, so why not step it up a notch and modify the page in some crazy way.  Like, I don’t know, maybe we could put a box around every div on the page?

Like so:

foreach (HTMLDivElement divElement in htmlDoc.getElementsByTagName("div"))
{
    divElement.runtimeStyle.borderStyle = "groove";
    divElement.runtimeStyle.borderWidth = "3";
}

KABOOM!

Of course, a crazy box cascade is of little practical value, but you get the basic idea of what you’re able to do.  You’re inside the page being rendered, so you can completely rewrite it if you want.  You’re not stuck with a static, read-only page, so learn how and where to use that to your advantage.  I’ve used this ability inside tests to write out debug information or inject JavaScript functions to be called by the automation.
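To make that last bit slightly more concrete, here’s a minimal sketch of both tricks.  It isn’t lifted from my actual tests; the class, the helper names and the automationPing function are invented for the example.  It assumes the page has a <body> and that you already have the HTMLDocument in hand.  The MSHTML calls it relies on are IHTMLElement.insertAdjacentHTML() and IHTMLWindow2.execScript().

```csharp
using mshtml;

static class PageInjectionSample
{
    // Hypothetical helper: appends a line of debug output to the bottom of the page.
    public static void WriteDebugMessage(HTMLDocument htmlDoc, string message)
    {
        // "beforeEnd" places the new markup just inside the end of the <body> element.
        // The message goes in as raw HTML, so only pass in text you trust.
        htmlDoc.body.insertAdjacentHTML("beforeEnd", "<div>" + message + "</div>");
    }

    // Hypothetical helper: defines a JavaScript function in the page's window,
    // where later script calls from the automation can find it.
    public static void InjectPingFunction(HTMLDocument htmlDoc)
    {
        IHTMLWindow2 window = htmlDoc.parentWindow;
        window.execScript("function automationPing() { return 'pong'; }", "JavaScript");
    }
}
```

Like everything else you grab from the page, whatever you inject is gone as soon as the browser navigates somewhere else, so it has to be redone for each new document.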

Here’s the full example code from today:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

using mshtml;
using SHDocVw;
using System.Threading;

namespace IEAutomationSample
{
    class Program
    {
        static void Main(string[] args)
        {
            InternetExplorer ieBrowser = new InternetExplorerClass();
            ieBrowser.Visible = true;
            NavigateToUrl(ieBrowser, "http://www.dogpile.com");
            Thread.Sleep(5000);

            //Do stuff here...
            HTMLDocument htmlDoc = (HTMLDocument)ieBrowser.Document;
            string pageTitle = htmlDoc.title;
            Console.WriteLine(pageTitle);

            IHTMLInputTextElement textBox = (IHTMLInputTextElement)htmlDoc.getElementById("icePage_SearchBoxTop_qkw");
            textBox.value = "powered by awesome";

            HTMLInputButtonElement submitButton = (HTMLInputButtonElement)htmlDoc.getElementById("icePage_SearchBoxTop_qkwsubmit");
            submitButton.click();
            Thread.Sleep(5000);

            htmlDoc = (HTMLDocument)ieBrowser.Document;

            Console.WriteLine("Links by tag name:");
            foreach(IHTMLElement anchorElement in htmlDoc.getElementsByTagName("a"))
            {
                if(anchorElement.className == "resultLink")
                {
                    Console.WriteLine(anchorElement.innerText);
                }
            }

            Console.WriteLine("Links by result walking:");
            IHTMLElement resultContainerDiv = htmlDoc.getElementById("icePage_SearchResults_ResultsRepeaterByRelevance_ResultRepeaterContainerWeb");
            foreach (HTMLDivElement resultDiv in (IHTMLElementCollection)resultContainerDiv.children)
            {
                IHTMLElement resultLink = (IHTMLElement)resultDiv.firstChild;
                Console.WriteLine(resultLink.innerText);
            }

            Console.WriteLine("img src:");
            foreach (IHTMLImgElement imgElement in htmlDoc.getElementsByTagName("img"))
            {
                Console.WriteLine(imgElement.src);
            }

            Console.WriteLine("img src 2:");
            foreach (IHTMLImgElement imgElement in htmlDoc.images)
            {
                Console.WriteLine(imgElement.src);
            }

            foreach (HTMLDivElement divElement in htmlDoc.getElementsByTagName("div"))
            {
                divElement.runtimeStyle.borderStyle = "groove";
                divElement.runtimeStyle.borderWidth = "3";
            }   

            Thread.Sleep(5000);
            ieBrowser.Quit();
        }

        public static void NavigateToUrl(InternetExplorer ieBrowser, string url)
        {
            object nullObject = null;
            ieBrowser.Navigate(url, ref nullObject, ref nullObject, ref nullObject, ref nullObject);
        }
    }
}

As always, you can pull the project out of SVN:  http://www.mathpirate.net/svn/Projects/IEAutomationSample/

That’s about all I wanted to get into as far as a hands-on demonstration.  Now, it’s time for warnings about what can and will go wrong.  So watch out.

First, as promised, let’s talk about those Thread.Sleep()s that I scattered throughout the code.  They’re there because you have to wait for the browser to finish its work, otherwise you’ll get random exceptions.  Exceptions that will never happen when you step through in a debugger, either.  However, it’s not a good practice to rely on sleeping for a fixed amount of time in your automation.  If the browser loads the page in half a second, but you’re sleeping for five seconds, then you’ve wasted four and a half seconds.  That kind of time adds up fast.  On the other hand, if the process is slow, five seconds might not be enough.  Your application will wake up too early and die.

In most cases, I’d suggest polling.  Check the status of something, or look to see if something exists fairly frequently, but keep looking for a reasonable amount of time.  For instance, you could check the .Busy flag on the InternetExplorer object every 100 ms for 30 seconds.  That way, you’ll never sit around for more than 100 ms longer than you need to, plus, you’ll keep checking long enough to be sure that it will finish.  If the page isn’t done loading in 30 seconds, you should probably fail right there.
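Here's a minimal sketch of that kind of polling loop.  The Wait class and its Until method are names I'm making up for illustration, they're not part of SHDocVw or mshtml, and the 100 ms / 30 second numbers are just the ones from the paragraph above.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper, not part of any IE automation library.
static class Wait
{
    // Checks the condition every pollInterval until it returns true
    // or until timeout has elapsed.  Returns true if the condition was
    // met, false if we gave up.
    public static bool Until(Func<bool> condition, TimeSpan timeout, TimeSpan pollInterval)
    {
        Stopwatch timer = Stopwatch.StartNew();
        while (true)
        {
            if (condition())
            {
                return true;
            }
            if (timer.Elapsed >= timeout)
            {
                return false;
            }
            Thread.Sleep(pollInterval);
        }
    }
}
```

With something like that in place, the Busy check described above would look roughly like Wait.Until(() => !ieBrowser.Busy, TimeSpan.FromSeconds(30), TimeSpan.FromMilliseconds(100)), and a false return value is your cue to fail the test.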

Except that polling the .Busy flag doesn’t actually work reliably.

If you try to poll the Busy flag exclusively, you’ll find that your tests will sometimes randomly fail.  They’ll look like they should be working.  IE will be loading the page you expect it to load and everything will look right, but you’ll get an exception.  You see, you’re not synchronously driving IE.  You’re talking to an intermediate layer that’s relaying your commands to IE, and IE will respond eventually.  What that means is that you’ll tell IE to load a page, then you’ll check the Busy flag.  Most of the time, Busy will return true because it’s loading the page or false because it’s done loading the page.  But sometimes, your check on the Busy flag will get to IE before it’s started loading the page.  In this case, Busy will return false.  As far as IE is concerned, it’s not busy.  It’s done loading the page.  Trouble is, it’s telling you that it’s done loading the last page, not the page you just told it to load.

One way to counteract this is to sleep for a small amount of time before starting the polling, perhaps 250 ms.  This usually gives IE a chance to start moving and will increase the reliability.  However, it’s going to have the same problem as sleeping did originally.  You’ll often be wasting time waiting around for something that’s already done, and occasionally, you still won’t be waiting long enough.

Another way to combat this is to listen to some of the events hanging on the InternetExplorer interface.  There are events, such as NavigateComplete2, DocumentComplete, and DownloadComplete, that you might be able to handle and set your own status flags in.  For instance, you can set a flag before you start to navigate, then have your NavigateComplete2 event handler unset that flag when it’s called.  If it’s called…  And if it’s called for the correct navigation event.  You have to be very careful with some of these events.  I believe DownloadComplete is fired by XMLHttpRequests used by AJAX calls, so that could trip up your detection.  NavigateComplete2 will get called when the main page finishes loading as well as when a frame finishes loading, so if you have a hidden iframe on your page for something like tracking and analytics, watch out for that.
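The flag idea can be sketched without touching COM at all.  NavigationTracker below is a hypothetical class of my own invention, and it deliberately leaves the hard part to you: deciding whether a given event belongs to the top-level page or to some frame.  A commonly described trick is to compare the pDisp argument of the event against the browser object itself, but treat that as an assumption to verify, not something this sketch proves.

```csharp
using System;
using System.Threading;

// Hypothetical helper that tracks a single pending navigation.
class NavigationTracker
{
    // Starts out signaled, because nothing is pending yet.
    private readonly ManualResetEvent completed = new ManualResetEvent(true);

    // Call this immediately BEFORE telling the browser to navigate.
    public void BeginNavigation()
    {
        completed.Reset();
    }

    // Call this from your completion event handler.  Events that come
    // from frames (isTopLevelDocument == false) are ignored.
    public void OnCompleted(bool isTopLevelDocument)
    {
        if (isTopLevelDocument)
        {
            completed.Set();
        }
    }

    // Blocks until the pending navigation completes or the timeout
    // elapses.  Returns false on timeout.
    public bool WaitForCompletion(TimeSpan timeout)
    {
        return completed.WaitOne(timeout);
    }
}
```

The point of calling BeginNavigation() before Navigate() is that the flag is already marked as pending by the time IE gets around to doing anything, which sidesteps the stale "I'm not busy" answer described earlier.  It does nothing about events firing for the wrong navigation, so the warnings above still apply.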

I still have not found a flawless way to wait for page completion.  I’ve found a complicated tangle of states and flags and events that make it work in most cases, but not all.  So, good luck with that.

Security will also get in your way when dealing with IE Automation.  Microsoft rightfully doesn’t want script kiddies and other assorted bastards being able to do things like automatically download files to your computer.  Unfortunately, script kiddies are using the same bit of DOM technology that you’re trying to use, and MS has no way to tell you apart, so that means that sometimes you’ll be blocked from doing things.  I don’t think you can read from a password text box and I don’t think you can directly write to a file upload control.  Sometimes when you click links or buttons that launch certain actions like file downloads, you’ll get a yellow bar that wouldn’t be there if you’d clicked the button yourself.  You have to find crazy workarounds for these issues.  Sometimes you’ll spend all day trying to circumvent IE’s security just to click one stupid button.

Another issue you’re likely to run into is random, unexplained failures, often with useless error messages, like “COMException -21234115153” or “RPC server has exploded, try again.”  Many of these exceptions will be timing problems.  Wait just a little longer and you’ll be fine.  I’ve had the constructor for the IE COM object give me an instance of IE that had already been destroyed.  Some errors I’ve seen are obscure COM threading issues.  You’ll get InvalidCastExceptions trying to access some of the properties, like .location or .frames, even though you’re not casting anything.  You can sometimes fix those by setting your application to run in a Single-Threaded Apartment (Whatever in the hell that means) by putting the [STAThread] attribute on your Main method…  If you have a Main method.  If you’re in some library, or someplace like NUnit or VS Unit Tests, well, then, you’re just plain screwed.  And just this past week, I ran into a case where ieBrowser.HWND would throw an InvalidCastException every other time I called it.  Seriously, odd-numbered calls led to an exception, while even-numbered calls gave me a number.  The fix?

try { hwnd = ieBrowser.HWND; }
catch { hwnd = ieBrowser.HWND; }

Seriously.  I wrote that this week.  WTF?

I still feel dirty.
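If you find yourself needing that trick in more than one place, you can at least hide the shame in one spot.  This is only a sketch: Retry and Get are hypothetical names, and catching every Exception is a simplification.  In real code you'd probably want to catch only the specific exception types you've actually seen.

```csharp
using System;
using System.Threading;

// Hypothetical helper for flaky property reads.
static class Retry
{
    // Calls getter up to maxAttempts times, sleeping for delay between
    // failed attempts.  If the last attempt also throws, that exception
    // is rethrown to the caller.
    public static T Get<T>(Func<T> getter, int maxAttempts, TimeSpan delay)
    {
        if (maxAttempts < 1)
        {
            throw new ArgumentOutOfRangeException("maxAttempts");
        }
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return getter();
            }
            catch (Exception)
            {
                if (attempt >= maxAttempts)
                {
                    throw;
                }
                Thread.Sleep(delay);
            }
        }
    }
}
```

The two-line fix above would then become something like hwnd = Retry.Get(() => ieBrowser.HWND, 2, TimeSpan.Zero).  It's exactly as dirty as before, just better organized.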

And finally, speaking of dirty, writing a bot to steal porn is left as an exercise for the reader.

  1. After some kind soul Hasslehoffs your desktop…
  2. For that matter, so does SWA, but that’s a different story.
  3. A Firefox specific toolbar and some Javascript issues
  4. It really bugs me, too, because it should be IInternetExplorer…
  5. And to make it even better, there’s a Navigate2() method, which is even uglier.
  6. In the C++ world, the ref objects are all VARIANT*s.  The .Net magic that lets you use COM translates the VARIANT to object and the * to the ref.  Unfortunately, every one of those parameters could have had a strong type.  Flags is an int, TargetFrameName is a string (Well, BSTR, but whatever), and so on.  It didn’t have to be like this!  ARGH COM.
  7. Okay, they’re really HTMLImageElements and HTMLAnchorElements, but who’s keeping track?
  8. Yeah, Firebug is for Firefox, but a good web tester will have at least two or three browsers at the ready at all times.
  9. Just don’t look too closely at the definition of HTMLInputButtonElement or HTMLDocument or any of the other things I called concrete classes, or you’ll discover that they, too, are interfaces.  The actual class is HTMLInputButtonElementClass or HTMLDocumentClass.  Whatever.  I don’t know what’s right and what’s real anymore…

February 13, 2010   2 Comments

LCARS: Little Crusher Automated Removal System

Here’s the video:

http://www.mathpirate.net/hold/LCARS1.wmv

This is a scene from the Next Generation episode Journey’s End.  The first run is simply the result with Wesley Crusher blacked out.  The second part displays the faces that are recognized.  There are occasional blips (Data gets blacked out in several frames, Wesley’s ear gets “recognized” a few times), but overall, not too shabby for something thrown together in a few days with virtually no tuning of the system being done.  The reason he’s not blacked out at the end is likely due to the decision to expand the minimum face size to 50×50.  If the minimum face size were smaller, it likely would have detected and blocked that part, as well.

November 29, 2009   No Comments

Visualization Activated

Images in action:

STTNGFaces1

Target Acquired

STTNGFaces2

Multiple hits.  Note that it doesn’t see Geordi…  I think the face detector relies heavily on the eyes, so his VISOR is confusing it.

STTNGFaces3

It can even detect redshirts!

November 28, 2009   No Comments

It Works!

IT WORKS!

HOLY CRAP IT ACTUALLY WORKS!

I trained it on a set of about half of the images, because it takes a long time to train.  Then I ran it on the video and it was actually returning the right person most of the time.  I’m actually shocked that it’s working so well, especially seeing that this is the first time I’ve had the recognition part actually compiling and running.  No real tweaking necessary.  The problems I’ve been having have all been with calling the functions and simply using the library, it’s not that the software was doing what I told it to do and it all came out wrong.

Looking at this trace, you can even get a sense of the cuts in the scene:

Recognized: JeanLucPicard
Recognized: DeannaTroi
Recognized: JeanLucPicard
Recognized: DeannaTroi
Recognized: DeannaTroi
Recognized: JeanLucPicard
Recognized: DeannaTroi
Recognized: Anthwarta
Recognized: Anthwarta
Recognized: Anthwarta
Recognized: Anthwarta
Recognized: Anthwarta
Recognized: Anthwarta
Recognized: Anthwarta
Recognized: DeannaTroi
Recognized: DeannaTroi
Recognized: DeannaTroi
Recognized: JeanLucPicard
Recognized: JeanLucPicard
Recognized: DeannaTroi
Recognized: JeanLucPicard
Recognized: DeannaTroi
Recognized: DeannaTroi
Recognized: JeanLucPicard
Recognized: DeannaTroi
Recognized: DeannaTroi
Recognized: DeannaTroi
Recognized: DeannaTroi
Recognized: JeanLucPicard
Recognized: DeannaTroi
Recognized: JeanLucPicard
Recognized: Anthwarta
Recognized: Anthwarta
Recognized: Anthwarta
Recognized: Anthwarta

Of course, right now, it’s only working on the positive cases.  When it knows someone, it knows them.  When it doesn’t know someone, it randomly guesses about who it is.

Now, something interesting about that…  I haven’t trained the system on Wesley yet.  He’s down in the Ws and the images I trained with were alphabetical.  However, in the scenes with him, he’s fairly regularly identified as “JeanLucPicard”.

Gotta wonder if the Doctor and the Captain have something they need to talk to him about…

Now I just have to label the faces on the screen, since screenshots are better than random text.

November 28, 2009   No Comments

Putting People In Boxes

So, after fixing the crashing issue, I started up the detection algorithm and let it roll.

FaceDetection1

Despite the fact that I’m in a poorly lit room and wasn’t even looking at the camera, it found me.  I like that it’s sensitive enough for that, since my planned application of this can’t rely on well-lit clear images all the time.

Of course, the sensitivity has a downside…

FalsePositives

I turned on the lights and it still found me, which was good.  However, it also believes that there are faces on the wall, on the ceiling, a big face made up of shelving and boxes, as well as three separate faces on the Commodore Plus/4 box.

It’s confused, of course, because it’s actually the Tomy Tutor box that has all the faces on it.

TomyTutorFaces

So, in summary:

  • Multiple faces: good.
  • Faces where they don’t exist: psychotic and delusional.

Must fix the psychotic and delusional part before I let this loose on the world.

There’s one other slight problem…  The facial detection is pegging my CPU and processing about two frames a second with a two second delay.  That’s not going to be acceptable, either…

November 26, 2009   No Comments

Well, that seems to be working…

Of course, this is completely meaningless without the audio to go along with it, but still…  It had a decently high hit rate for something that was quickly thrown together in an attempt to get something going on.

Say Something
Detected: 00:00:00.5400000
Rejected: blue
Detected: 00:00:01.0300000
Rejected: blue
Detected: 00:00:01.7600000
Rejected: blue
Detected: 00:00:02.7600000
Hypothesis: green
green
Detected: 00:00:04.8900000
Hypothesis: red
red
Detected: 00:00:07.8000000
Hypothesis: blue
blue
Detected: 00:00:10.8700000
blue
Detected: 00:00:15.3700000
Hypothesis: red
Rejected: red
Detected: 00:00:15.7300000
Rejected: blue
Detected: 00:00:18.8700000
Rejected: red
Detected: 00:00:20.4500000
Hypothesis: blue
Rejected: blue
Detected: 00:00:21.0500000
Rejected: blue
Detected: 00:00:21.4100000
Rejected: blue
Detected: 00:00:21.6900000
Hypothesis: blue
Hypothesis: blue
Rejected: blue
Detected: 00:00:31.7400000
Hypothesis: blue
Hypothesis: blue
Rejected: blue
Detected: 00:00:34.4500000
blue
Detected: 00:00:36.4600000
Rejected: blue
Detected: 00:00:38.6400000
Hypothesis: green
green
Detected: 00:00:40.2800000
Hypothesis: blue
Rejected: blue
Detected: 00:00:40.8800000
Rejected: blue
Detected: 00:00:42.6900000
Rejected: red
Detected: 00:00:50.6900000
Hypothesis: red
red
Detected: 00:00:52
Rejected: blue
Detected: 00:00:52.4400000
Hypothesis: blue
blue
Detected: 00:00:53.5600000
Rejected: blue
Detected: 00:00:55.0100000
Hypothesis: green
green
Detected: 00:00:56.4800000
Rejected: blue
Detected: 00:00:58.1300000
red
Detected: 00:00:59.8000000
Rejected: blue
Detected: 00:01:01.3400000
Hypothesis: blue
blue

November 25, 2009   No Comments

Crazy Weekend Project 2: Semi-Crazy

As you may recall, back in September, I spent five solid days building a robot that could play Atari 2600 Pong. It wasn’t perfect, but it did beat the computer player in several matches. However, there was significant room for improvement. The motion was too jerky, the trajectory projection algorithm had problems, and the robot was no match for a human player. Over the next five days, I will not be continuing that project.

You see, a couple of weeks ago, I finally bought an XBox 360, so I just don’t have that kind of time to devote to building robots at the moment.  Instead, I’ll be doing something much more practical and limited in scope, and only spend a few hours a day on it.  The rest of the time I’ll be alone in my apartment, immersed in HD gaming glory, like any other sane person would be this weekend.

Now, by “more practical and limited in scope”, I mean that I intend to attempt to build a facial recognition system and voice activated command processor.  The reason for this is plain:  Everyone needs a facial recognition system and voice activated command processor.  What good is a computer without one?  Additionally, these are two of the three necessary pieces that I need in order to fully exploit an HP TouchSmart PC that I got from Haggle.com a few weeks back, which has an integrated webcam and microphone.  The third piece, exploitation of the multi-touch screen, is left as an exercise to the reader.

As with the previous Crazy Project Weekend, I have not done any work in these areas or used any of these technologies prior to the commencement of the Crazy Project Weekend, other than a cursory glance to make sure that I’d have a chance of doing something useful in the timeframe allotted.  Additionally, I will be sharing successes, failures, thoughts, and above all, source code, which, in this case, might actually be useful to other people.

November 24, 2009   No Comments

END OF LINE

And with that, I’m declaring Crazy Project Weekend complete.  All in all, it was a success, although there were some things that didn’t work out.  I started on Friday morning with nothing but an idea and ended up successfully building a robot that could win a game of Pong, despite never having touched most of the major technologies (OpenCV, Mindstorms, Bluetooth) prior to this weekend.

I’m definitely going to have to do this sort of thing more often.

Just a reminder, the source code, if you’re interested, is available in SVN:  http://www.mathpirate.net/svn/

And now…  Sleep.

September 8, 2009   No Comments