
Well, That Could Have Something To Do With It…

public CvPoint GetAverageVelocity()
{
    double avgX = BombPositions.Average(bomb => bomb.X);
    double avgY = BombPositions.Average(bomb => bomb.Y);

    return new CvPoint((int)avgX, (int)avgY);
}

And that’s why I don’t work for NASA.

March 1, 2010

Bomb Tracking Code

No no no, that just won’t do.  Won’t do at all.

I think the idea will work, but the implementation needs refinement.

March 1, 2010

That Could Be A Problem

Can you spot the bug here?

public void ButtonDown()
{
    Rotate(NxcMotorPort.OUT_B, -50, 45);
}
public void ButtonUp()
{
    Rotate(NxcMotorPort.OUT_B, -50, 45);
}

February 28, 2010

Working for the TSA

That’s looking better.  This is an implementation of the strip method of bomb detection suggested by a friend I mentioned in the previous post.  This appears to be roughly twice as fast as the rescan method.

The assumptions are as follows:

  • There can only be one bomb in a horizontal row.
  • Not all rows will have a bomb.

To find all bombs:

  1. Partition the playfield into strips the height of a bomb.
  2. Look for a bomb in each strip.
  3. Blot out the bomb point, in case the match overlaps strip boundaries.

This ends up only having to scan the source image once.  It also reduces the amount of noise, because only one bomb is allowed per row.  However, you do end up making more calls to cvMinMaxLoc(), because you’re calling it once per strip, whereas in the rescan algorithm, you stop making calls after you hit a cutoff threshold.

In all, this method seems to be roughly twice as fast as the rescan, which is very good.

It also means that I’m now performing a strip search in an attempt to find bombs.  I should work at the airport.

Anyway, since I haven’t posted any code today, here’s the strip search algorithm.

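IplImage bombMatches = new IplImage(image.Width - _bombTemplate.Width + 1, image.Height - _bombTemplate.Height + 1, BitDepth.F32, 1);
Cv.MatchTemplate(image, _bombTemplate, bombMatches, MatchTemplateMethod.SqDiffNormed);

// Outputs of each per-strip search.  minThreshold is the match cutoff, set elsewhere.
double minVal, maxVal;
CvPoint minLoc, maxLoc;

for (int stripY = 0; stripY < bombMatches.Height; stripY += _bombTemplate.Height)
{
    // Restrict the search to one bomb-height strip.
    bombMatches.ROI = new CvRect(0, stripY, bombMatches.Width, _bombTemplate.Height);
    Cv.MinMaxLoc(bombMatches, out minVal, out maxVal, out minLoc, out maxLoc);

    // With SqDiffNormed, lower is better, so the minimum is the best match.
    if (minVal < minThreshold)
    {
        // minLoc is treated as relative to the strip, so add stripY back in and shift to the template's center.
        CvPoint adjusted = minLoc + new CvPoint(_bombTemplate.Width / 2, _bombTemplate.Height / 2 + stripY);
        image.Circle(adjusted, 10, CvColor.Red);
        // Blot out the match point so it isn't picked up again.
        bombMatches.Circle(minLoc, 5, CvColor.White, -1);
    }
}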
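For anyone who wants to poke at the strip logic without OpenCV installed, here's a rough sketch of the same three steps on a plain 2D array of match scores.  The names (StripSearch, FindMatches, Blot) and the square-shaped blot are stand-ins I made up for illustration.  It is not what the OpenCV version does internally, just the same idea in miniature.

```csharp
using System;
using System.Collections.Generic;

public static class StripSearch
{
    // Returns the (x, y) of the best score in each strip, if that score is
    // below the threshold.  Lower scores are better, as with SqDiffNormed.
    public static List<(int X, int Y)> FindMatches(float[,] scores, int stripHeight, float threshold)
    {
        int height = scores.GetLength(0);
        int width = scores.GetLength(1);
        var matches = new List<(int X, int Y)>();

        // Step 1: partition the rows into strips.
        for (int stripY = 0; stripY < height; stripY += stripHeight)
        {
            int stripEnd = Math.Min(stripY + stripHeight, height);
            float best = float.MaxValue;
            int bestX = -1, bestY = -1;

            // Step 2: find the lowest score inside this strip.
            for (int y = stripY; y < stripEnd; y++)
            {
                for (int x = 0; x < width; x++)
                {
                    if (scores[y, x] < best)
                    {
                        best = scores[y, x];
                        bestX = x;
                        bestY = y;
                    }
                }
            }

            if (best < threshold)
            {
                matches.Add((bestX, bestY));
                // Step 3: blot out the match in case it spills into the next strip.
                Blot(scores, bestX, bestY, stripHeight / 2);
            }
        }

        return matches;
    }

    // Overwrites a square around (cx, cy) with the worst possible score.
    private static void Blot(float[,] scores, int cx, int cy, int radius)
    {
        int height = scores.GetLength(0);
        int width = scores.GetLength(1);
        for (int y = Math.Max(0, cy - radius); y <= Math.Min(height - 1, cy + radius); y++)
        {
            for (int x = Math.Max(0, cx - radius); x <= Math.Min(width - 1, cx + radius); x++)
            {
                scores[y, x] = float.MaxValue;
            }
        }
    }
}
```

Calling StripSearch.FindMatches(scores, bombHeight, cutoff) makes one pass over the array and calls the blot once per hit, which is the trade described above: one scan of the data, but one search per strip.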

Now, the problem is that the cvMatchTemplate call seems to take about 50 ms to run.  At 30 FPS, I have only about 33 ms (1000 ms / 30) to process a frame.  I might be able to live with 50 ms, but I haven’t even started to find the buckets yet.

February 27, 2010

Achievement Unlocked: The Situation Is Under Control

The first bit of code for Crazy Weekend Project III has been checked in.  It’s a rework of the OpenCV Video Capture code from CWP I that makes it a little bit more friendly and reusable.  (And probably introduces some fun bugs…)

For those wishing to play along at home:

The new solution is located here:  http://www.mathpirate.net/svn/Projects/AtariRobot/AtariRobot2/

You may be required to pull projects from other places in the repository, including /external and /Projects/AtariRobot.

Here’s the current interface of the VideoCapture class:

public class VideoCapture
{
    public List<IplImage> HistoricalFrames { get; protected set; }
    public IplImage CurrentFrame { get; protected set; }
    public CvCapture Capture { get; protected set; }
    public int FrameTimeMS { get; protected set; }
    public Options Options { get; protected set; }
    public int FrameNumber { get; protected set; }

    public void Start();
    public void Start(Options options);
    public void Stop();
}

Argh, I’ve forgotten how to successfully paste code into WordPress so that the indentation isn’t mangled (Good thing I’m not writing Python!) and so greater thans and less thans and ampersands aren’t destroyed.

Anyway, you just new up one of these VideoCapture classes, then call Start().  If you’ve already configured the video options (What camera ID to use or what video file to read), you can pass them in; otherwise it will ask when you call Start().  Start() will start a new thread that constantly reads frames at the given video’s frame rate[1] and sticks them into the HistoricalFrames List.  CurrentFrame is always the latest frame, and operations will generally use this frame.  When you’re done, call Stop() and it’ll dispose everything for you and kill the thread.  Three lines to get frames from a video.  Much nicer than the 30 or so that were there before.

  1. Or at least close to it.  I’m using Thread.Sleep() here, which is not the best for timing this sort of thing…
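For the curious, here's a stripped-down sketch of the Start()/Stop() pattern described above, with strings standing in for frames so it runs without a camera.  FramePump and its members are hypothetical names, not the real VideoCapture internals.  It also shows one possible way around the Thread.Sleep() drift from the footnote: schedule each frame against a Stopwatch, so a late wake-up doesn't push every later frame back.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical stand-in for the capture class; frames are just strings here.
public class FramePump
{
    private readonly int _frameTimeMs;
    private readonly Func<int, string> _readFrame;
    private Thread _thread;
    private volatile bool _running;

    public string CurrentFrame { get; private set; }
    public int FrameNumber { get; private set; }

    public FramePump(int frameTimeMs, Func<int, string> readFrame)
    {
        _frameTimeMs = frameTimeMs;
        _readFrame = readFrame;
    }

    // Starts the background thread that keeps CurrentFrame up to date.
    public void Start()
    {
        _running = true;
        _thread = new Thread(Run) { IsBackground = true };
        _thread.Start();
    }

    // Asks the thread to stop and waits for it to finish.
    public void Stop()
    {
        _running = false;
        if (_thread != null)
        {
            _thread.Join();
        }
    }

    private void Run()
    {
        Stopwatch clock = Stopwatch.StartNew();
        long nextDueMs = 0;

        while (_running)
        {
            CurrentFrame = _readFrame(FrameNumber);
            FrameNumber++;

            // Sleep until the next frame is due on the shared clock,
            // rather than sleeping a fixed amount after each frame.
            nextDueMs += _frameTimeMs;
            long waitMs = nextDueMs - clock.ElapsedMilliseconds;
            if (waitMs > 0)
            {
                Thread.Sleep((int)waitMs);
            }
        }
    }
}
```

Usage is the same three-line shape: new one up, call Start(), and call Stop() when you're done.  Individual frames will still jitter, since Thread.Sleep() is only as precise as the OS scheduler, but the error shouldn't pile up over time.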

February 25, 2010

Round 1

First task of the project is fairly dull and unexciting:  Refactor what I did last time.  The code that’s there is messy and tangled.  It was all about doing it fast, not doing it right.  I need to try to get a better framework in place so that I’m not redoing all of the core pieces every time.  The video input code will pretty much be the same for anything, so I’ll try to see about pulling that out of the grips of the Pong code.

February 25, 2010

Web Automation (or: How To Write A Bot To Steal Porn)

A while back, I wrote about using the System.Windows.Automation libraries to write automation to drive Windows applications. With SWA and UIAutomation, you can write code to use Win32 apps, Windows.Forms programs, WPF and even Silverlight. That’s all happy and fun, as long as you’re only dealing with Windows applications. Trouble is, there’s this thing called “The Web” that’s all the rage with kids these days, and sooner or later, you’ll probably have to use it, too. If you pull out your handy installation of UISpy and try to inspect a web page, you get a whole big block of nothing. The red rectangle will outline the window and tell you that all of those things that look like text boxes and buttons aren’t really text boxes and buttons. That means you can’t use SWA for web sites.

That, well, that kinda sucks.  So, what do you do about it?

Obviously, the correct solution here is to admit defeat:   The tool you know about doesn’t work, so it’s too hard to do.  Time to give up and pay thousands of dollars a seat for some whiz-bang tool that promises to do what you need and even has a handy-dandy recorder, so you don’t even have to think about what you’re doing!

Or…  Not.

That whiz-bang tool is only going to cost you money and it’s not going to do a damn thing for you.  You’ll have to pay high-priced consultants and high-priced support engineers just to figure out how it works.  You see, their model is to cram so many features in and make it so complicated to use that you think that you must be stupid because you can’t understand it and as soon as you figure out that one last thing, you’ll be more productive than you ever were before.

And oh, will that test recorder make you productive!  You’ll be able to hire a monkey to point and click your way to hundreds of test cases with ease!  Except that they’re hundreds of useless test cases, because either the verification that the tool provides is hopelessly limited and unable to actually verify your website, or, well, you hired a monkey to do your testing and they have no idea how to do anything beyond pointing and clicking.  But that’s all right.  You see, as soon as a single line in the HTML of your web page changes in just the tiniest way, every last one of those recorded tests will break and you’ll have to completely redo them.

So, SWA is out and the big expensive tool is a total waste.  What else is there?

Well, there’s things like Selenium or WebAii or WatiN.  They’re free or open source libraries that you can use to drive web browsers to do your bidding.  They all support IE and Firefox and possibly other browsers.  And they’re all written by people who don’t seem to have ever tried to write web automation.

  • Selenium:  The default mode is to write your tests in HTML tables, with the thinking that “Anyone can write HTML tables, so anyone can write tests.”  That’s not what happens.  What happens is that you set it up, all the devs and PMs excitedly chatter about how “Anyone can write tests now!”, you give a training session, two devs out of a team of seven will ever write tests using it, creating a grand total of thirteen absolutely worthless tests before giving up, yet somehow, two months later, the director of software engineering will be talking to the EVP of product development and tell him how great it is that we’re using Selenium because “Anyone can write tests now!”, so when you try to tell them what a complete waste it is and how you hate having to maintain the intermediary server and how unstable the test automation is and that we should dump the whole system, they look at you like you’re trying to kill a basketful of cute puppies.
  • WebAii, in my experience, is a tad unstable, and since it’s not open source, you can’t even try to fix it.  Additionally, it needs a plug-in to work, so again, you have to maintain a test machine.
  • WatiN hasn’t even been compelling enough for me to try to use.  That’s not saying it’s bad, it’s just that nothing about it has really stood out to me.

Another thing that really bugs me about these solutions is that many of them don’t really work that well with continuous integration situations, despite claims that they’re designed for that very use.  At my company, our CI servers are all using CCNet, which is running as a service.  When running as a service, you don’t typically get an interactive window station.  In general, that’s fine.  You don’t need one.  Our build servers are spare boxes stuffed in a cabinet somewhere or rack machines living in an off-site datacenter.  Once they’re set up, it’s pretty much all automatic.  We can log into the build box remotely in the rare instance that something does go wrong, but we never stay logged in.  In fact, we can’t.  You see, in my company (and probably in yours), there are computing security policies in place that prohibit leaving an unattended computer logged in and unlocked.  If you leave your computer unlocked, it will be locked for you.[1]  Trouble is, most of these web automation libraries I mentioned above require a logged in and unlocked session to function at all.[2]

Okay, so no SWA, no expensive tool, and now the free stuff is shot down, as well.  What’s left?

Wouldn’t it be great if there’s something that’s free?

Wouldn’t it be great if there’s something that’s already installed on pretty much every copy of Windows since 95 OSR2?

Wouldn’t it be great if there’s something that works in headless service environments?

Wouldn’t it be great if there’s something that uses the same technology as the majority of web users?

In other words, why don’t you just use Internet Explorer to do your web automation?

Now, I’m guessing that you just answered my question with some sarcastic remark regarding Firefox, so let me address that before continuing.  Yes, using IE means you’re not using Firefox.  I understand that you like Firefox and all, but in the real world, people use IE.  Additionally and importantly, it usually doesn’t really matter that you’re only using IE.  Most of the differences between browsers are cosmetic things, like Firefox’s strange habit of occasionally making oversized divs that mask clickable areas or IE6 generally making every page look as attractive as cat vomit.  Normal web automation, regardless of what tool you’re using, will typically not pick that sort of thing up.  Web automation looks at the structure and functionality of the page, but it’s blind to the looks.  Many of the other tools I mentioned do support Firefox, if you need it, but you probably don’t need it.  After several years of web testing, I’ve only come across a handful of cases where running an automated test in Firefox would have picked up issues that would not have been seen in IE.[3]  For the most part, going the extra mile to support Firefox in your automation is unnecessary and simply complicates things.

So, let’s look at using IE to solve all of your automation problems!

Okay, it won’t solve all your problems.  In fact, it’ll create new ones, I guarantee it.  But still, it’s very useful.

But first, a little warning…

You’re going to have to use COM.

Well, okay, you don’t have to use COM.  There is a .Net Web Browser class that you can probably do most of these things with, but I don’t use it.  I don’t use it because, as far as I’ve found, there’s no way to attach it to a real instance of IE.  Instead, you’d have to write your own little Windows Forms app, stick the control on it, and use it that way.  That might work for you, but I’ll stick to the full instance of IE that I can watch and manually interact with if necessary, even if it means using COM.  It is COM in .Net, though, so it’s not as bad as straight COM in C++.  There’s no QueryInterface or CComPtr<>s or anything like that.  There are slightly weird things now and then, but they’re not that bad.

Right.  Disclaimer out of the way, let’s get started.

First, you need to add two references to your project.  Open the Add Reference dialog, go to the COM tab, and select “Microsoft HTML Object Library”, which will give you MSHTML, and “Microsoft Internet Controls”, which will give you SHDocVw.

MSHTML is where all of the HTML parsing and related classes live.  SHDocVw is where the Internet Explorer classes are.

Now that you’ve added those references, add your using statements for the libraries, so you won’t have those ugly namespaces all over your code.

using mshtml;
using SHDocVw;

Note that although the reference to MSHTML gives the name in all caps, the namespace is, in fact, lower case.  SHDocVw is the same case both places.

Once you’re set up, you can create an instance of Internet Explorer that will launch and be ready for you to drive it through your code with one line:

InternetExplorer ieBrowser = new InternetExplorerClass();

Of course, there’s a slight problem here.  You can’t actually see the browser.  It’s there, trust me, it’s there, and pretty much everything I’m about to talk about will still work, even though you can’t see it.  However, in the interest of proving to you that what I’m talking about does, in fact, actually work, let’s make a minor modification so you can see things.

InternetExplorer ieBrowser = new InternetExplorerClass();
ieBrowser.Visible = true;

There, if you run that, IE will pop open.  It won’t do much yet, but at least there’s some progress being made.

A brief aside:  You may have noticed that I created an instance of “InternetExplorerClass”, but assigned it to a variable of type “InternetExplorer”.  I did that because InternetExplorer is actually an interface, so you can’t create an instance of it.[4]  InternetExplorerClass is the actual class that you need an instance of.  You could probably also do something with Activator.CreateInstance(), but I’m not going there.  I’ll have more about interfaces in a bit.

Back to the fun.  To prove that we’re in control, and to start doing something actually useful, let’s point the browser at a website.  Let’s have our browser go to everybody’s favorite search engine:  Dogpile.com.  To navigate the browser you’re in control of, you use the .Navigate() method.  Unfortunately, .Navigate is all COMtaminated and ugly.[5]

No, that’s not Intellisense having a freak out.  The signature of the Navigate method is actually void IWebBrowser2.Navigate(string URL, ref object Flags, ref object TargetFrameName, ref object PostData, ref object Headers).  You only care about the URL, but it’s not going to provide you with an overload that only uses the URL.  Instead, you get all of this “ref object” crap.[6]

I bet your first instinct is to think that you’ll just pass nulls to the parameters you don’t care about, and compile it and be happy.  Well, that ain’t gonna work.  See, the “ref” part of the signature means that it expects an actual object reference.  null is not an object reference, null is nothing.  The compiler won’t let you pass in nulls directly.  However, you can pass in a null object reference, and that’ll work.  Like so:

public static void NavigateToUrl(InternetExplorer ieBrowser, string url)
{
    object nullObject = null;
    ieBrowser.Navigate(url, ref nullObject, ref nullObject, ref nullObject, ref nullObject);
}

You may have noticed that I put the Navigate call inside a helper method.  Helper methods and wrapper classes are among your closest friends in the world of SHDocVw and MSHTML.  They’ll help hide all of the IE COM object’s interesting personality quirks in much the same way that girl you met on Match hid her interesting personality quirks until the fifth date.  Trust me, you don’t want “ref nullObject” in a thousand different places in your tests, largely because it’ll scare the hell out of anyone reading your code.

If you go back to the main function and call NavigateToUrl(ieBrowser, "http://www.dogpile.com");, then run the code, you’ll have a browser that will open up and go to Dogpile all by itself.  Of course, if you are running the code as we go, you’ll probably have noticed that the browser remains open after your program exits.  Let’s take care of that before your computer explodes under the weight of a thousand IEs, shall we?  Just call .Quit() on the browser and it’ll go away.

If you call .Quit() immediately after navigating, the browser will probably close before the page even loads, so let’s add a sleep for a few seconds so you can see what’s going on.

For those of you playing the home game, here’s what my main function looks like at this point:

InternetExplorer ieBrowser = new InternetExplorerClass();
ieBrowser.Visible = true;
NavigateToUrl(ieBrowser, "http://www.dogpile.com");
Thread.Sleep(5000);

//Do stuff here...

ieBrowser.Quit();

At this point, the code above is fairly useless.  Sure, you can use this to build a program that forces IE to navigate to web pages all day, but that’s not terribly exciting.  We’re having the browser navigate to a search engine, why don’t we search for something?

(By the way, you’ll want to leave that Thread.Sleep(5000); where it is.  I’ll come back to it later, but for now, DON’T TOUCH!)

When you search for something, what do you do?  Type a word in a box and click a button, right?  That’s what we need to do here.  The InternetExplorer object allows you to access all of the HTML elements on the page and interact with them, including text boxes and buttons.  If you’ve ever used JavaScript and dealt with the Document Object Model, or DOM, the methods and properties you’ll find in MSHTML will be very familiar, because they’re another implementation of the DOM standard.  The way you gain access to HTML elements is through the .Document property.

If you try to use it, Intellisense will be really helpful and tell you that the .Document property is an object.  A plain object.  A plain, useless object.  So what is the .Document property returning?

An IHTMLDocument object.

Or an IHTMLDocument2 object.

Or an IHTMLDocument3 or 4 or 5 object…

Now’s probably the time to talk about the use of interfaces in MSHTML.

In the land of .Net, if you had an IHTMLDocument5 interface, it would probably derive from IHTMLDocument4, which would derive from 3 and so on.  IHTMLDocument5 would have all of the stuff that was on the previous four interfaces, so that would be the only one you’d ever need to use, at least until IHTMLDocument6 comes along.  Not so in the land of MSHTML.  I’m not sure if it’s a COM restriction, a C++ thing, the way .Net deals with COM interfaces, or some strange design decision on the part of MSHTML, but the end result is that IHTMLDocument3 and IHTMLDocument2 are pretty much independent.  If you want the title of the page, you need a reference to an IHTMLDocument2 object.  If you want to call .getElementById(), you need IHTMLDocument3.

But that’s only if you want to do it the “Right” way.  If you want to do it the quick and easy way, then the .Document property is returning an HTMLDocument object.  That’s the class that implements IHTMLDocument*, so it’s got everything on it.  If you want the page title and if you want to call .getElementById(), HTMLDocument will work for you.

Of course, it’s slightly riskier to do it that way.  The interfaces guarantee the contract, the class does not.  Microsoft could change the class at any time and you’d be screwed.  However, I highly doubt they’re going to do anything like that, because it would screw them over far more than it’ll screw you over.  In other words, just use HTMLDocument and you’ll have access to all the available properties, functions, events, etc., without having to cast between the interface types three hundred different places in each method.
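For comparison, here's roughly what the interface-by-interface “Right” way looks like.  This is only a sketch: DocumentInterfaces and PrintTitleAndElement are names I made up, and it assumes you have a running browser and the MSHTML and SHDocVw references described earlier.

```csharp
using System;
using mshtml;
using SHDocVw;

public static class DocumentInterfaces
{
    // Prints the page title and the tag name of one element, using a
    // separate interface for each job.
    public static void PrintTitleAndElement(InternetExplorer ieBrowser, string elementId)
    {
        // Each cast asks the same underlying document object for a different interface.
        IHTMLDocument2 doc2 = (IHTMLDocument2)ieBrowser.Document;
        Console.WriteLine(doc2.title);                          // title is on IHTMLDocument2

        IHTMLDocument3 doc3 = (IHTMLDocument3)ieBrowser.Document;
        IHTMLElement element = doc3.getElementById(elementId);  // getElementById is on IHTMLDocument3

        Console.WriteLine(element == null ? "(not found)" : element.tagName);
    }
}
```

Two casts for two members doesn't look so bad here, but it adds up quickly once a method touches a dozen different properties, which is the whole argument for grabbing the concrete class instead.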

It’s important to know that IHTMLDocument*s exist, since that’s where you’ll find much of the documentation.  And on a similar note, all of the HTMLElements that I’m going to talk about have corresponding interface types, and they’re usually what’s documented or talked about.  So, if you can’t find something about how HTMLElement works, try looking for IHTMLElement.  Or IHTMLElement2.  Or 3.  Or 4.

Now that we’ve taken that little vacation, let’s get back to work here.  I made such a big fuss about getting the page title, so let’s do that here.

HTMLDocument htmlDoc = (HTMLDocument)ieBrowser.Document;
string pageTitle = htmlDoc.title;
Console.WriteLine(pageTitle);

Before you can interact with an element on a page, you have to find it in the document.  There are two easy ways to find things, along with a few ways that aren’t quite that easy.  Here are the ones I find the most useful.

  •  .getElementById(string):  This method takes the ID of the HTML element you want and returns the IHTMLElement with that ID. In HTML, an ID is supposed to be a unique identifier, identifying a single element.  Of course, certain popular HTML editors (like Notepad, for instance) won’t enforce a unique ID, so if there are multiple elements with the same ID, this method will return one of them.  This one is good if you know exactly what you’re looking for.
  • .getElementsByName(string):  This method takes the name of HTML elements and returns an IHTMLElementCollection of all of the elements with that name.  This one is good if you have an element or a handful of elements with a known name.
  • .getElementsByTagName(string):  This method takes a tag name, like “a” or “img” or “div” and will return an IHTMLElementCollection of all of the elements with that tag name.  Use this method to quickly get a collection of all of the links or images on a page, or if you’re looking for an element of a certain type with certain characteristics and need to run through the list to find it.
  • .documentElement:  This property returns an IHTMLElement of the root of the HTML content of the page.  On a page that plays by the rules, this will be the <html> element.  If your page doesn’t play by the rules, good luck.  This is a good starting point if you want to walk the tree.
  • .childNodes:  This property will give you an IHTMLElementCollection of the direct children of the current node.
  • .all:  This property returns an IHTMLElementCollection containing a flattened list of all of the elements in the document.  Use this when you don’t care about structure and need to do something that involves lots of nodes of different types.

Unfortunately, as far as I’ve found, there’s no support for something like XPath, which would let you give the node tree path of the elements you want in a simple string format.  If you enjoy pain, you could build something like that yourself.
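If you do enjoy pain, a very small version of the idea might start out something like the sketch below.  To keep it runnable on its own, it walks a made-up Node class rather than real MSHTML elements, and the path syntax (tag names separated by slashes, with an optional zero-based [n]) is something I invented for the example.  It is not XPath, and it is nowhere near as capable.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for an HTML element.  This is not an MSHTML type.
public class Node
{
    public string TagName;
    public List<Node> Children = new List<Node>();

    public Node(string tagName, params Node[] children)
    {
        TagName = tagName;
        Children.AddRange(children);
    }
}

public static class PathWalker
{
    // Follows a path such as "body/div[1]/a" down from the root.
    // Each step names a child tag; [n] picks the n-th child with that
    // tag (zero-based, default 0).  Returns null if a step has no match.
    public static Node Find(Node root, string path)
    {
        Node current = root;

        foreach (string step in path.Split('/'))
        {
            string tag = step;
            int index = 0;

            // Split "div[1]" into the tag "div" and the index 1.
            int bracket = step.IndexOf('[');
            if (bracket >= 0)
            {
                tag = step.Substring(0, bracket);
                index = int.Parse(step.Substring(bracket + 1, step.Length - bracket - 2));
            }

            // Find the index-th direct child with a matching tag name.
            Node next = null;
            int seen = 0;
            foreach (Node child in current.Children)
            {
                if (string.Equals(child.TagName, tag, StringComparison.OrdinalIgnoreCase))
                {
                    if (seen == index)
                    {
                        next = child;
                        break;
                    }
                    seen++;
                }
            }

            if (next == null)
            {
                return null;
            }
            current = next;
        }

        return current;
    }
}
```

To point something like this at a real page, you'd presumably start from .documentElement and loop over each element's children collection instead of Node.Children.  I haven't built that version, so treat it as a starting point rather than a recipe.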

The specified return type for most of these methods is IHTMLElement, which is the base element type in MSHTML.  In reality, the element instances are all specific element types.  For instance, an <img> tag will return an IHTMLImgElement object, and an <a> will give you an IHTMLAnchorElement object.[7]  The specific types will have specific properties, so if you know what element type you have and you need to use it for something (Say, for instance, if you need to get the src attribute from an <img>), then you should cast it to the specific type.

Right-o, let’s start doing useful stuff, shall we?  Back before we took a wild turn and ended up hopelessly sidetracked, we had Internet Explorer going to the front page of the search engine Dogpile.com.  Now, let’s make IE do a search.  To do that, we need to grab the search box, put text in it, then grab the search button and click it.  We’ll use the .getElementById method I talked about to get the search box.  Using something like the IE developer tools or Firebug[8] or even viewing the page source in Notepad, you can find that the search box is an <input> element with an ID of “icePage_SearchBoxTop_qkw”.

IHTMLInputTextElement textBox = (IHTMLInputTextElement)htmlDoc.getElementById("icePage_SearchBoxTop_qkw");
textBox.value = "powered by awesome";

If you run this, the browser will open, and the phrase “powered by awesome” will appear in the search box.

A couple of points to note.  Even though in the HTML, the search box is an <input> tag, the element you’ll get back is an IHTMLInputTextElement.  The different <input> types are all represented by distinct classes, which is very helpful, because there’s not much in common between a checkbox, a text box, or a button.  Then, once you have the element, it has a .value property, which acts as a getter and setter for the contents of the text box.

Grabbing the submit button is similar:

IHTMLInputButtonElement submitButton = (IHTMLInputButtonElement)htmlDoc.getElementById("icePage_SearchBoxTop_qkwsubmit");

Unfortunately, when you try to click the button, you’ll run into this:

There’s no click there.  There’s nothing remotely resembling a click.  A button’s sole reason for existing is to be clicked, yet you can’t click this button.

Actually, you can.  Just not on IHTMLInputButtonElement, where you’d think you should be able to.  You see, you can actually click on any HTML element, so the .click() method is on the base IHTMLElement.  This goes back to the interfaces I went rambling on and on about a while back.  To find the functionality you want, you sometimes have to bounce around almost randomly until you find what you need.  So, to hell with the interfaces, let’s go directly with the concrete class again, like we did with the document.  In this case, it’s HTMLInputButtonElement.[9]

HTMLInputButtonElement submitButton = (HTMLInputButtonElement)htmlDoc.getElementById("icePage_SearchBoxTop_qkwsubmit");
submitButton.click();
Thread.Sleep(5000);

Again, there’s a Thread.Sleep() after the action, so the program will wait long enough for the page to finish loading.  And again, I promise I’ll talk about it later, but for now, trust me and just leave it there.

Now we’re on an entirely new page.  If you try to use the document or the elements you grabbed before, the results will be, uh, shall we say, unpredictable…  The old page no longer exists, so don’t try to use anything from it.  You have to grab a new reference to the document, as well as new elements to play around with.

We’re on a search results page now, so let’s do something like print out all the result titles and the URLs to all of the images.

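// The search loaded a new page, so grab a fresh reference to the document first.
htmlDoc = (HTMLDocument)ieBrowser.Document;

Console.WriteLine("Links by tag name:");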
foreach(IHTMLElement anchorElement in htmlDoc.getElementsByTagName("a"))
{
    if(anchorElement.className == "resultLink")
    {
        Console.WriteLine(anchorElement.innerText);
    }
}

Console.WriteLine("Links by result walking:");
IHTMLElement resultContainerDiv = htmlDoc.getElementById("icePage_SearchResults_ResultsRepeaterByRelevance_ResultRepeaterContainerWeb");
foreach (HTMLDivElement resultDiv in (IHTMLElementCollection)resultContainerDiv.children)
{
    IHTMLElement resultLink = (IHTMLElement)resultDiv.firstChild;
    Console.WriteLine(resultLink.innerText);
}

Console.WriteLine("img src:");
foreach (IHTMLImgElement imgElement in htmlDoc.getElementsByTagName("img"))
{
    Console.WriteLine(imgElement.src);
}

Console.WriteLine("img src 2:");
foreach (IHTMLImgElement imgElement in htmlDoc.images)
{
    Console.WriteLine(imgElement.src);
}

The first bit walks through all of the links, which are <a> tags, looking for elements with the class “resultLink”.  When it finds one, it prints out the .innerText property, which contains the flattened text content of the element.  The second section finds the same elements, but walks through a bit of the tree structure to find the links among children nodes.

I should probably point out now that the structure of websites tends to change over time, so if you try to run this and you get a bunch of exceptions, that’s what’s going on.  If anything on the page changes, this code is likely to break.  It works for me right now, and that’s really all that matters anyway.
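One thing that can take some of the sting out of that: when a lookup comes back empty, fail with a message that says which element went missing, instead of a bare null reference or cast exception three lines later.  Here's a minimal sketch of such a guard.  ElementGuard and Require are hypothetical names, not anything from MSHTML.

```csharp
using System;

public static class ElementGuard
{
    // Casts a looked-up element to the expected type, or throws an
    // error that names what was being looked for.
    public static T Require<T>(object element, string description) where T : class
    {
        if (element == null)
        {
            throw new InvalidOperationException(
                "Could not find " + description + ".  Has the page changed?");
        }

        T typed = element as T;
        if (typed == null)
        {
            throw new InvalidOperationException(
                description + " was found, but it is not a " + typeof(T).Name + ".");
        }

        return typed;
    }
}
```

In the search example, that would look something like ElementGuard.Require<IHTMLInputTextElement>(htmlDoc.getElementById("icePage_SearchBoxTop_qkw"), "search box").  The test still breaks when the page changes, but at least the failure tells you where to look.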

The last bit, the part with the “img src” is walking through the page and printing out the URLs of all of the images on the page in two different ways.  First by using the tag name method you already have seen, and the second time by using the .images convenience property on the document.  There are a few other properties like that, so take a look at what Intellisense shows you and play around a bit to get a feel for what’s there.

BUT WAIT, THERE’S MORE!

We’ve got all this access to stuff on the page.  We can put text in text boxes, we can click buttons, we can read the links and images, so why not step it up a notch and modify the page in some crazy way.  Like, I don’t know, maybe we could put a box around every div on the page?

Like so:

foreach (HTMLDivElement divElement in htmlDoc.getElementsByTagName("div"))
{
    divElement.runtimeStyle.borderStyle = "groove";
    divElement.runtimeStyle.borderWidth = "3";
}

KABOOM!

Of course, a crazy box cascade is of little practical value, but you get the basic idea of what you’re able to do.  You’re inside the page being rendered, so you can completely rewrite it if you want.  You’re not stuck with a static, read-only page, so learn how and where to use that to your advantage.  I’ve used this ability inside tests to write out debug information or inject JavaScript functions to be called by the automation.

Here’s the full example code from today:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

using mshtml;
using SHDocVw;
using System.Threading;

namespace IEAutomationSample
{
    class Program
    {
        static void Main(string[] args)
        {
            InternetExplorer ieBrowser = new InternetExplorerClass();
            ieBrowser.Visible = true;
            NavigateToUrl(ieBrowser, "http://www.dogpile.com");
            Thread.Sleep(5000);

            //Do stuff here...
            HTMLDocument htmlDoc = (HTMLDocument)ieBrowser.Document;
            string pageTitle = htmlDoc.title;
            Console.WriteLine(pageTitle);

            IHTMLInputTextElement textBox = (IHTMLInputTextElement)htmlDoc.getElementById("icePage_SearchBoxTop_qkw");
            textBox.value = "powered by awesome";

            HTMLInputButtonElement submitButton = (HTMLInputButtonElement)htmlDoc.getElementById("icePage_SearchBoxTop_qkwsubmit");
            submitButton.click();
            Thread.Sleep(5000);

            htmlDoc = (HTMLDocument)ieBrowser.Document;

            Console.WriteLine("Links by tag name:");
            foreach(IHTMLElement anchorElement in htmlDoc.getElementsByTagName("a"))
            {
                if(anchorElement.className == "resultLink")
                {
                    Console.WriteLine(anchorElement.innerText);
                }
            }

            Console.WriteLine("Links by result walking:");
            IHTMLElement resultContainerDiv = htmlDoc.getElementById("icePage_SearchResults_ResultsRepeaterByRelevance_ResultRepeaterContainerWeb");
            foreach (HTMLDivElement resultDiv in (IHTMLElementCollection)resultContainerDiv.children)
            {
                IHTMLElement resultLink = (IHTMLElement)resultDiv.firstChild;
                Console.WriteLine(resultLink.innerText);
            }

            Console.WriteLine("img src:");
            foreach (IHTMLImgElement imgElement in htmlDoc.getElementsByTagName("img"))
            {
                Console.WriteLine(imgElement.src);
            }

            Console.WriteLine("img src 2:");
            foreach (IHTMLImgElement imgElement in htmlDoc.images)
            {
                Console.WriteLine(imgElement.src);
            }

            foreach (HTMLDivElement divElement in htmlDoc.getElementsByTagName("div"))
            {
                divElement.runtimeStyle.borderStyle = "groove";
                divElement.runtimeStyle.borderWidth = "3";
            }   

            Thread.Sleep(5000);
            ieBrowser.Quit();
        }

        public static void NavigateToUrl(InternetExplorer ieBrowser, string url)
        {
            object nullObject = null;
            ieBrowser.Navigate(url, ref nullObject, ref nullObject, ref nullObject, ref nullObject);
        }
    }
}

As always, you can pull the project out of SVN:  http://www.mathpirate.net/svn/Projects/IEAutomationSample/

That’s about all I wanted to get into as far as a hands-on demonstration.  Now, it’s time for warnings about what can and will go wrong.  So watch out.

First, as promised, let’s talk about those Thread.Sleep()s that I scattered throughout the code.  They’re there because you have to wait for the browser to finish its work, otherwise you’ll get random exceptions.  Exceptions that will never happen when you step through in a debugger, either.  However, it’s not a good practice to rely on sleeping for a fixed amount of time in your automation.  If the browser loads the page in half a second, but you’re sleeping for five seconds, then you’ve wasted four and a half seconds.  That kind of time adds up fast.  On the other hand, if the process is slow, five seconds might not be enough.  Your application will wake up too early and die.

In most cases, I’d suggest polling.  Check the status of something, or look to see if something exists fairly frequently, but keep looking for a reasonable amount of time.  For instance, you could check the .Busy flag on the InternetExplorer object every 100 ms for 30 seconds.  That way, you’ll never sit around for more than 100 ms longer than you need to, plus, you’ll keep checking long enough to be sure that it will finish.  If the page isn’t done loading in 30 seconds, you should probably fail right there.

Except that polling the .Busy flag doesn’t actually work reliably.

If you try to poll the Busy flag exclusively, you’ll find that your tests will sometimes randomly fail.  They’ll look like they should be working.  IE will be loading the page you expect it to load and everything will look right, but you’ll get an exception.  You see, you’re not synchronously driving IE.  You’re talking to an intermediate layer that’s relaying your commands to IE, and IE will respond eventually.  What that means is that you’ll tell IE to load a page, then you’ll check the Busy flag.  Most of the time, Busy will return true because it’s loading the page or false because it’s done loading the page.  But sometimes, your check on the Busy flag will get to IE before it’s started loading the page.  In this case, Busy will return false.  As far as IE is concerned, it’s not busy.  It’s done loading the page.  Trouble is, it’s telling you that it’s done loading the last page, not the page you just told it to load.

One way to counteract this is to sleep for a small amount of time before starting the polling, perhaps 250 ms.  This usually gives IE a chance to start moving and will increase the reliability.  However, it’s going to have the same problem as sleeping did originally.  You’ll often be wasting time waiting around for something that’s already done, and occasionally, you still won’t be waiting long enough.
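For what it's worth, the poll-with-a-head-start idea might be sketched like this.  The `Wait` class and its parameter names are made up for illustration; nothing here comes from the IE interop libraries, and it's a rough outline rather than the code I actually run.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper: none of these names exist in SHDocVw or mshtml.
public static class Wait
{
    // Sleeps for initialDelay, then checks condition every pollInterval until it
    // returns true (result: true) or timeout has elapsed (result: false).
    // The initial delay only narrows the stale-Busy-flag race; it cannot close it.
    public static bool Until(Func<bool> condition, TimeSpan timeout,
                             TimeSpan pollInterval, TimeSpan initialDelay)
    {
        Thread.Sleep(initialDelay);
        Stopwatch elapsed = Stopwatch.StartNew();
        while (true)
        {
            if (condition())
            {
                return true;
            }
            if (elapsed.Elapsed >= timeout)
            {
                return false;
            }
            Thread.Sleep(pollInterval);
        }
    }
}
```

With the numbers from above, a call would look something like `Wait.Until(() => !ieBrowser.Busy, TimeSpan.FromSeconds(30), TimeSpan.FromMilliseconds(100), TimeSpan.FromMilliseconds(250))`, and a false return is your cue to fail the test.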

Another way to combat this is to listen to some of the events hanging on the InternetExplorer interface.  There are events, such as NavigateComplete2, DocumentComplete, and DownloadComplete that you might be able to handle and set your own status flags in.  For instance, you can set a flag before you start to navigate, then have your NavigateComplete2 event handler unset that flag when it’s called.  If it’s called…  And if it’s called for the correct navigation event.  You have to be very careful with some of these events.  I believe DownloadComplete is fired by XMLHttpRequests used by AJAX calls, so that could trip up your detection.  NavigateComplete2 will get called when the main page finishes loading as well as when a frame finishes loading, so if you have a hidden iframe on your page for something like tracking and analytics, watch out for that.
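The flag-and-event idea could be sketched roughly like so.  `NavigationWaiter` is a hypothetical class of mine, not part of SHDocVw, and matching on the URL is an assumption that will fall over on redirects.  You'd call `OnCompleted` from whichever completion event handler you decide to trust.

```csharp
using System;
using System.Threading;

// Hypothetical helper: arm it before navigating, signal it from the browser's
// completion event handler, then wait on it with a timeout.
public class NavigationWaiter
{
    private readonly ManualResetEvent _completed = new ManualResetEvent(false);
    private volatile string _expectedUrl;

    // Call just before telling the browser to navigate.
    public void Arm(string expectedUrl)
    {
        _expectedUrl = expectedUrl;
        _completed.Reset();
    }

    // Call from the event handler.  Completions for any other URL (frames,
    // tracking iframes and the like) are ignored.
    public void OnCompleted(string url)
    {
        if (string.Equals(url, _expectedUrl, StringComparison.OrdinalIgnoreCase))
        {
            _completed.Set();
        }
    }

    // Returns true if the expected completion arrived within the timeout.
    public bool WaitForCompletion(TimeSpan timeout)
    {
        return _completed.WaitOne(timeout);
    }
}
```

This keeps the "was it my navigation?" question in one place, which is about the best that can be said for it.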

I still have not found a flawless way to wait for page completion.  I’ve found a complicated tangle of states and flags and events that make it work in most cases, but not all.  So, good luck with that.

Security will also get in your way when dealing with IE Automation.  Microsoft rightfully doesn’t want script kiddies and other assorted bastards being able to do things like automatically download files to your computer.  Unfortunately, script kiddies are using the same bit of DOM technology that you’re trying to use, and MS has no way to tell you apart, so that means that sometimes you’ll be blocked from doing things.  I don’t think you can read from a password text box and I don’t think you can directly write to a file upload control.  Sometimes when you click links or buttons that launch certain actions like file downloads, you’ll get a yellow bar that wouldn’t be there if you’d clicked the button yourself.  You have to find crazy workarounds for these issues.  Sometimes you’ll spend all day trying to circumvent IE’s security just to click one stupid button.

Another issue you’re likely to run into is random, unexplained failures, often with useless error messages, like “COMException -21234115153” or “RPC server has exploded, try again.”  Many of these exceptions will be timing problems.  Wait just a little longer and you’ll be fine.  I’ve had the constructor for the IE COM object give me an instance of IE that had already been destroyed.  Some errors I’ve seen are obscure COM threading issues.  You’ll get InvalidCastExceptions trying to access some of the properties, like .location or .frames, even though you’re not casting anything.  You can sometimes fix those by setting your application to run in a Single-Threaded Apartment (Whatever in the hell that means) by putting the [STAThread] attribute on your Main method…  If you have a Main method.  If you’re in some library, or someplace like NUnit or VS Unit Tests, well, then, you’re just plain screwed.  And just this past week, I ran into a case where ieBrowser.HWND would throw an InvalidCastException every other time I called it.  Seriously, odd-numbered calls led to an exception, while even-numbered calls gave me a number.  The fix?

try { hwnd = ieBrowser.HWND; }
catch { hwnd = ieBrowser.HWND; }

Seriously.  I wrote that this week.  WTF?

I still feel dirty.

And finally, speaking of dirty, writing a bot to steal porn is left as an exercise for the reader.

  1. After some kind soul Hasslehoffs your desktop…
  2. For that matter, so does SWA, but that’s a different story.
  3. A Firefox specific toolbar and some Javascript issues
  4. It really bugs me, too, because it should be IInternetExplorer…
  5. And to make it even better, there’s a Navigate2() method, which is even uglier.
  6. In the C++ world, the ref objects are all VARIANT*s.  The .Net magic that lets you use COM translates the VARIANT to object and the * to the ref.  Unfortunately, every one of those parameters could have had a strong type.  Flags is an int, TargetFrameName is a string (Well, BSTR, but whatever), and so on.  It didn’t have to be like this!  ARGH COM.
  7. Okay, they’re really HTMLImageElements and HTMLAnchorElements, but who’s keeping track?
  8. Yeah, Firebug is for Firefox, but a good web tester will have at least two or three browsers at the ready at all times.
  9. Just don’t look too closely at the definition of HTMLInputButtonElement or HTMLDocument or any of the other things I called concrete classes, or you’ll discover that they, too, are interfaces.  The actual class is HTMLInputButtonElementClass or HTMLDocumentClass.  Whatever.  I don’t know what’s right and what’s real anymore…

February 13, 2010   2 Comments

This Code Doesn’t Work And Here’s Why

I am a fan of continuous integration and automated builds.  It’s a great tool to help ensure that people are always checking in clean code that can be built successfully.  However, I have found that simply having an automated build system that sends out mail isn’t enough.  Some people are too lazy to set up proper e-mail filters, so they end up deleting all mails from the build server, including the “HEY, YOU BROKE THE BUILD” e-mails that are required for the proper functioning of any CI system.  As a result, I realized that simply having the system is not enough, there needs to be some other indication of the state of a build, something completely passive, yet visible to all.  At first, I set up a red light, using X10.  This alone was not enough, so it was soon joined by a screen of shame:  A computer monitor that would display, in big text for everyone to see, who broke the build and how long they’d left it broken.  That system worked fairly well.

Over time, though, the number of builds that needed to be monitored grew unwieldy.  The original display program I’d written was no longer up to the task.  When you have over a hundred separate builds, split across multiple teams, something that can display the status of only seven builds at once doesn’t work all that well.  It was clear that I needed to start over and create a new display system for this new world.

At the core of this build monitor system is a timer.  Every 50 ms, it wakes up and sees if there are any tasks that need to be performed.  This timer function is used to control the build status updates.  Every 30 seconds, the system will contact each of the CCNet servers that are being monitored, and pull down their latest build information.  Here’s the essence of that code (Error checking, etc. removed):

protected void FireEvents(object sender, ElapsedEventArgs e)
{
    foreach (IntervalEventBase intervalEvent in _collectionList)
    {
        if (intervalEvent.NextEventTime < DateTime.Now)
        {
             ThreadPool.QueueUserWorkItem(
                 delegate
                 {
                    intervalEvent.UpdateNextEventTime();
                    intervalEvent.FireEvent(_state);
                 });
        }
    }
}

Those IntervalEvents can be any action I want to take on a timed interval.  One type of event checks a build server every thirty seconds, while another checks the time to determine if the application should enter a power saver mode which turns the screen off if it’s after hours and no one is around to look at it.  Not all events are on the same schedule, which is why it has to check whether or not it needs to fire a given event.
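To make the scheduling a little more concrete, here's a guess at the general shape of such a base class.  Only NextEventTime, UpdateNextEventTime() and FireEvent() appear in the code above; the constructor, the interval field and the `CountingEvent` subclass are assumptions made purely for illustration, not the real implementation.

```csharp
using System;

// Hypothetical reconstruction.  Only the three member names used by the timer
// code come from the post; everything else is assumed.
public abstract class IntervalEventBase
{
    private readonly TimeSpan _interval;

    protected IntervalEventBase(TimeSpan interval)
    {
        _interval = interval;
        NextEventTime = DateTime.Now;   // due right away on startup
    }

    // The timer compares this against DateTime.Now to decide whether to fire.
    public DateTime NextEventTime { get; private set; }

    // Pushes the next due time one interval into the future.
    public void UpdateNextEventTime()
    {
        NextEventTime = DateTime.Now + _interval;
    }

    // Each kind of event (build check, power saver, ...) supplies its own work.
    public abstract void FireEvent(object state);
}

// Trivial made-up subclass that just counts how often it fires.
public class CountingEvent : IntervalEventBase
{
    public int FireCount;

    public CountingEvent(TimeSpan interval) : base(interval) { }

    public override void FireEvent(object state)
    {
        FireCount++;
    }
}
```

Because every event carries its own due time, a 30 second build check and a once-a-minute clock check can share the same 50 ms timer.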

I set my program up to monitor a single build machine and, as expected, it updated the status every thirty seconds.  I failed a build and it showed up as failed on the next update.  Stopped a build and it displayed as stopped.  Everything was working, so I added a few other build servers to be monitored and deployed my program.  My new and improved build monitor application was now visible to all of the engineers in the company, telling them whether or not any of their builds were broken.

Almost immediately, I noticed that the screen was reporting that some of the builds were “stale”, that is, they’d missed their next expected build time by more than five hours.  Usually, this is a sign that a particular build on a server has gone sideways and is blocking all of the other builds as a result.  I didn’t think much of it at the time, especially when it cleared.  The server that was getting the stale warning was one I wasn’t very familiar with, so I figured that maybe there actually was a build that kicked off occasionally and that took six hours and blocked everything else while it ran.

I ignored the problem for a few days, until I noticed that a build box I was familiar with showed the same problem.  I knew that nothing on that box takes more than five minutes to run, so blocking for five hours was a major problem.  I went to the CCNet Dashboard and found that it was all clear.  Nothing was building and nothing was late.  So…  What the hell?  Why was the build monitor saying everything on that box was stale?  Obviously, my update timer was working right, because otherwise all of the build machines would be stale.  Network issues, perhaps?  Whatever.  I kicked the box, restarted my program, and everything showed up fine.

But still, at least once a day, individual build boxes would randomly report that everything was late to build when nothing was wrong.  I had to dig deeper and figure out what was going on.  The simplest way to begin was to put the last updated time for each server on the screen.  The servers should never be more than 30 seconds out of date, so by knowing when a particular server ran into trouble, I might be able to solve the problem.  Maybe the anti-virus software was kicking off at 11:30 every day and killing a connection to a build server somehow.  A broken connection would result in an error being displayed, but it would get displayed once for a total of ten seconds.  Very easy to miss.

I put the last updated time on the build status slides, then left the program to bake for a day so things could get in a weird state.  I checked it at 4:30 PM, and here are the times it reported:

Build Server 1: 12:23 PM
Build Server 2: 4:27 PM
Build Server 3: 6:12 AM
Build Server 4: 3:56 PM
Build Server 5: 2:03 PM
Build Server 6: 4:30 PM
Build Server 7: 10:24 AM
Build Server 8: 3:38 PM

Blink.  Blink.  WTF?

Okay, that makes NO sense at all.  WHAT THE HELL?

If the timer is working, all of them should report a last updated time of around 4:30 PM.

If the timer isn’t working, all of them should be stale and all reporting the time I launched the program.

But this?  Scattered randomly throughout the day?  How in the hell was that happening?  I know that the .Net Timer classes aren’t meant to provide 100% realtime reliability and that using the ThreadPool will cause execution to be unpredictable, but this is well beyond the expected limits of tolerance.  This just cannot be happening.  There’s no way.

I open up the code and start looking around.  The timer is set up with an interval of 50 ms.  Clearly it’s firing properly because some of the builds are getting updated.  And if any of them are they all should be, since they’re all being queued in the same function call from the same timer tick.  Then it hit me what the problem was.

You see, using the ThreadPool doesn’t really make many guarantees about the execution.  You hand some work over to the ThreadPool and it’ll get done at some point in the future when the pool gets around to it.  It’s great for when you have something you need to do asynchronously and don’t require too much control over when or where it’s executed.  If you go back to the code, you’ll see that the function I’m putting on the pool will schedule the next execution of the event.  Trouble is, it’s the timer method that’s checking that value, and the timer is in its own asynchronous little world.  Obviously, what’s going on here is that the timer is firing and enqueuing a bunch of events and then firing again and enqueuing some of them AGAIN, before they had a chance to run and update their scheduled time.  As this happens, the number of queued events will grow larger and larger and who knows when any particular build will run.  Obviously, that will cause all sorts of problems.

Okay then, easy fix.  Wrap the body of the FireEvents method in a Monitor.TryEnter()/Monitor.Exit() block to ensure it can’t execute multiple times at once.  Then, put a flag on the events, so they won’t get re-queued if they’re already in the queue to be executed.  Here’s what the function looked like:

protected void FireEvents(object sender, ElapsedEventArgs e)
{
    if (Monitor.TryEnter(_collectionList))
    {
        foreach (IntervalEventBase intervalEvent in _collectionList)
        {
            if (intervalEvent.NextEventTime < DateTime.Now && !intervalEvent.IsQueued)
            {
                intervalEvent.IsQueued = true;
                ThreadPool.QueueUserWorkItem(
                    delegate
                    {
                        intervalEvent.UpdateNextEventTime();
                        intervalEvent.FireEvent(_state);
                        intervalEvent.IsQueued = false;
                    });
            }
        }

        Monitor.Exit(_collectionList);
    }
}

Okay, everything’s golden now.  The problem is solved.  I rebuild, fire it up and…

COMPLETE FAILURE

Now they’re not updating at all.  AT ALL.  I put in some breakpoints and watch it run.  There are 9 events in _collectionList.  The foreach iterates through all nine, sets the .IsQueued flag to true on them and enqueues them on the ThreadPool.  The next time the timer fires, the first eight events, all of which are build servers, still have their IsQueued flags set to true.  Only the last event, which is the stupid power saver check, has its flag set to false.

What the hell?  It can’t possibly be taking thirty seconds to ask the build servers for their status.  One of them had to have finished by now.  How can they all be enqueued, yet not be finished?  I set a breakpoint inside the FireEvent method on the build status updater class and start things running, then wait.

And wait.

And wait.

It’s obviously been more than thirty seconds and it’s never getting called.  Not once.

I’m watching eight build status update events getting enqueued, but they’re never running and never having their queued flags reset.  But the power saver event gets its flag reset, so it must be getting called.  I put a breakpoint inside it and start things over.

Breakpoint hit!  Well, at least something is working.

Press F5.

Breakpoint hit!  Did I wait 30 seconds?

Press F5.  Breakpoint Hit!  Press F5.  Breakpoint Hit!  It’s definitely not 30 seconds.

I keep going.  I hit the breakpoint inside the power saver event NINE TIMES.

Great.  I now have a build monitor that doesn’t monitor any builds, but at least it’ll be absolutely certain to shut off the screen at 7PM Monday through Friday and keep it off all day Saturday and Sunday.  WTF?  It’s clearly enqueuing work for build update events and somewhere along the line, they’re all magically becoming power saver events.  Did I accidentally set System.Environment.TreeHugger = true; ?

I go back and stare at the code.

Oh…  Yeah…  Now I get it…

I like anonymous delegates and lambda functions.  I think they’re a great addition to the language.  I know they utterly confuse some people, but I think they’re great.  They let you keep your logic all together in one block, instead of forcing you to write a separate function and then figure out how to get your data into that function when it gets called.  In this case, I want to set the IsQueued flag on the event, then schedule its next execution time, fire the event, and finally say that it’s not queued up anymore.  That’s the logical flow of operations and thanks to an anonymous function, it’s all in one place.  Behind the scenes, it’s the compiler that ends up writing a separate function and figures out how to get your data into that function.  In most cases, that process works fine.

This is not one of those cases.

See, when the compiler works its magic, it packages up the anonymous function and any variables your function references into what’s called a closure.  This closure has a reference to the variable, not the value of the variable.  Subtle difference.  In most cases, you’re using anonymous functions synchronously, so there’s really no practical difference.  The value of the variable cannot change by the time the anonymous function is called and the variable is referenced.  In an asynchronous context, however, this subtle difference will get all hopped up on angry juice and come back to bite you.
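The variable-versus-value distinction is easy to see without any threads at all.  A tiny sketch (the class and method names are invented for the example):

```csharp
using System;

public static class ClosureDemo
{
    // Shows that a delegate captures the variable itself, not a snapshot of its value.
    public static int ReadAfterChange()
    {
        int value = 1;
        Func<int> read = delegate { return value; };  // captures the variable 'value'
        value = 42;                                   // changed after the delegate was created
        return read();                                // returns 42, not 1
    }
}
```

The delegate never saw the 1.  By the time anyone asked, the variable said 42, so that's what came back.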

What went wrong here is that the foreach loop does not create a new variable with every iteration.  It’s reusing intervalEvent and simply giving it a new value every time through the loop.1  When I’m calling QueueUserWorkItem, I’m enqueuing a function that references that variable, not the value in it at the time I enqueue.  Which means that by the time the ThreadPool actually executes my function, there’s no guarantee what intervalEvent will be pointing at.  Most of the time, it will be the last event in _collectionList.  Sometimes, the ThreadPool will fire off an event while the iteration is still happening, so it will execute with some value in the middle.  It’s even possible that the value of intervalEvent will change while my anonymous function is executing.

This is what was causing all of the wackiness with the updates.  Most of the time, none of the updates were getting through, but it would run the screen shutoff event nine times in a row every 30 seconds.  Every once in a while, the ThreadPool would have mercy on an intervalEvent and allow it to run while the foreach was still iterating, which was how many of the servers would appear to be up-to-date and also how stale servers got cured automatically.

Okay, so, that’s why everything’s all FUBAR.  Now, how do you fix it?  Well, the problem is that the iterator variable is being reused, so, what you need to do is create a local variable inside the scope of the foreach to capture the value, then use that variable inside the anonymous delegate.  Like so:

protected void FireEvents(object sender, ElapsedEventArgs e)
{
    if (Monitor.TryEnter(_collectionList))
    {
        foreach (IntervalEventBase intervalEvent in _collectionList)
        {
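            // Copy the loop variable on purpose: the delegate below must capture this
            // per-iteration local, not intervalEvent, which foreach reuses.  Do not "simplify".
            IntervalEventBase localEvent = intervalEvent;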
            if (localEvent.NextEventTime < DateTime.Now && !localEvent.IsQueued)
            {
                localEvent.IsQueued = true;
                ThreadPool.QueueUserWorkItem(
                    delegate
                    {
                        localEvent.UpdateNextEventTime();
                        localEvent.FireEvent(_state);
                        localEvent.IsQueued = false;
                    });
            }
        }

        Monitor.Exit(_collectionList);
    }
}

You have to love it when the fix for a big nasty bug is something that looks like a rookie coding mistake.  Which, of course, means that you have to clearly comment the fix, otherwise someone will think that it IS a rookie coding mistake and “fix” it.  The closure now has the proper value when it gets run and all of my builds are updating like clockwork every thirty seconds.

Well, at least I think they are.  I fixed it at 6:59 PM, so pretty much as soon as it started running, the power saver event fired and the screen turned off…

 

Anyway, if you want a compact repro case to play with, here you go.  “foreachString” will usually print out all “nine”s down the line (Or occasionally a “three” or something in the first spot or two), while “localString” will have a different value on every line (Although not necessarily in order).

string[] strings = new string[] { "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine" };

foreach (string foreachString in strings)
{
    string localString = foreachString;
    ThreadPool.QueueUserWorkItem(delegate { Console.WriteLine("foreachString: {0}  localString: {1}", foreachString, localString); });
}

Sample Output:

foreachString: zero  localString: zero
foreachString: nine  localString: two
foreachString: nine  localString: three
foreachString: nine  localString: four
foreachString: nine  localString: five
foreachString: nine  localString: six
foreachString: nine  localString: seven
foreachString: nine  localString: eight
foreachString: nine  localString: nine
foreachString: nine  localString: one
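If you want something deterministic to poke at, here's a sketch of the same trap using a regular for loop and no ThreadPool (the class and method names are made up for illustration).  It's also worth knowing that, as far as I'm aware, C# 5 and later compilers give foreach a fresh variable on every iteration, so the repro above may stubbornly print matching columns on a newer toolchain.  A for loop still shares one variable across iterations everywhere.

```csharp
using System;
using System.Collections.Generic;

public static class LoopCaptureDemo
{
    // Builds one delegate per element, then runs them all after the loop has ended.
    public static List<string> Run()
    {
        string[] strings = { "zero", "one", "two" };
        List<Func<string>> readers = new List<Func<string>>();

        for (int i = 0; i < strings.Length; i++)
        {
            int localIndex = i;  // fresh variable every iteration
            readers.Add(delegate { return "shared: " + i + "  local: " + strings[localIndex]; });
        }

        // By now the loop is done and the shared i equals strings.Length (3).
        List<string> lines = new List<string>();
        foreach (Func<string> reader in readers)
        {
            lines.Add(reader());
        }
        return lines;
    }
}
```

Every line reports the shared counter as 3, while the local copies come back as zero, one and two.  Note that indexing strings with the shared i inside the delegate would throw, since 3 is past the end of the array.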

And for more information on the code generated by the compiler for anonymous functions and closures, including information on this very problem (called “an interesting but dangerous side effect” in section 6), check out this CodeProject article by P. Adityanand: http://www.codeproject.com/KB/cs/InsideAnonymousMethods.aspx

  1. With a foreach, this variable reuse isn’t really apparent.  If you’re using a regular for loop, it’s clear that the iterator variable is the same each time through.

January 23, 2010   No Comments

This Code Doesn’t Work

The following is a snippet of code that reproduces a bug I came across today.

string[] strings = new string[] { "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine" };

foreach (string str in strings)
{
    ThreadPool.QueueUserWorkItem( delegate { Console.WriteLine(str); } );
}

It should print out the numbers “zero” through “nine”.  It doesn’t work.  Why not?

I’ll write up the answer tomorrow, but I want to see if anyone else can get it first.

January 22, 2010   No Comments

UI Automation: Tricks and Traps

UI Automation and testing can be among the trickiest areas of software testing.  Directly testing an API is relatively easy.  You’ve got functions to call, well defined inputs and outputs.  It’s meant to be used in the way you’re using it when you write your tests.  You can spend most of your time writing real test cases that will generally work correctly with minimal effort.  UI Automation, however, isn’t nearly as friendly.  You’ll sometimes spend hours twisting and tweaking one test case to get it running, and even then it’ll still randomly fail 25% of the time.

A large part of this is due to the fact that a UI is meant to present an interaction model for a human.  It’s not actually meant for another computer program to deal with.  A person is clicking the buttons and typing text in the text boxes and so on.  Allowing a computer to interact with it is usually an afterthought, hacked together using technologies that will work, sometimes, and only if the application programmer followed the rules.  If they’re not using a button, but instead are using something that they’re drawing themselves to look and act like a button, it’s not going to be a button for you and your UI tests aren’t going to be able to click it easily.

Another major problem with UI tests is that the user interface frequently changes.  That’s not supposed to be a radio button, it’s supposed to be a check box.  Move that button after the text box.  Make that list box a combo box.  The UI is often the most fluid piece of a software application.  Once the API is in place, your API tests have a decent chance of working version over version, because an API isn’t subject to focus groups or marketing studies.  But it’s very rare to leave the UI untouched between versions.

There’s also a problem of perception regarding what UI tests do.  People often think that since UI automation is testing the UI, that means that it’s covering the look of the UI, as well.  Most of the time, it won’t because it can’t.  It’s very difficult to have automated visual testing.  Sure, you can compare screenshots, but what if the window size changes?  It’ll break if you move a button or box.  Your graphical verification tests will report complete and total failures if you took the screenshots on plain XP and someone later uses Vista with Aero to run them.  Hell, they’ll likely die if you turn font smoothing on or off.  Doing something so fragile is what we testers call “A Waste of Time”.  UI testing generally doesn’t cover the look of the application.  Instead, it verifies the correct functionality of the controls in the application.  It’s possible to have your UI tests reporting a 100% success rate when nothing is shown on the screen.  As long as the controls are accessible in the way you specify in your tests, they’ll run.

So then, what can be done about automated UI testing?  It’s obviously very valuable to have, despite the difficulties.  Here are a few tips and tricks, as well as some traps to avoid.

Name Everything:

In web applications, you can give elements IDs or names.  In regular Windows apps, you can use SWA or  MSAA to identify things.  At any rate, anything you interact with should be uniquely identifiable in some way.  If you’re a developer, do this.  If you’re a tester, get your devs to do this.  If they refuse, do it for them.  Naming things will tend to make your automation resilient in the face of most general changes.  Bits and pieces can move around, but as long as they’re named the same and work the same way, your test will probably survive.

You don’t have to give a completely unique identifier to absolutely everything.  What I’ve found that tends to work well is giving logical groups a unique id for the current window or page, then naming repeated controls.  Consider, for example, a page of results from a search engine.  You’ll have a search box at the top of the page and at the bottom, and you’ll have multiple sets of results in the middle area.  Give the logical areas unique IDs, like “SearchBoxTop” and “SearchBoxBottom” for the search boxes, and “MainResults”, “AdResultsRight” and “AdResultsTop” for the result sections.  Then, those areas can share names across them.  For instance, I don’t really care that I’m dealing with the top search button or the bottom search button specifically.   All I need at that point is “Button”.   “Button” can be used as a name for fifteen controls on the page, but I already know that it’s the top search button I’m using because I got it in the context of SearchBoxTop. 

Turn UI Testing Into API Testing.  Sort Of…:

I’ve seen UI test code that’s an unreadable mess of copied and pasted bits to extract controls or elements followed by copied and pasted unreadable messes where the controls or elements are fiddled with followed by messes of bits that had been copied and pasted to the point of unreadability which extract results from controls or elements.  In fact, that’s what pretty much any test recorder will spit out at you.  It’s a total nightmare to look at and deal with even on a good day, and if you’re looking at it and dealing with it, chances are it’s not a good day.  Chances are all your tests broke last night and now you have to dig through a hundred separate tests and repair the element extraction code in each one of them, all because your UI developer made a “quick change” from tables to divs in the page layout.  Even though the page looks identical, the entire structure is different now and nothing is going to work.

I mentioned in the intro that API testing was relatively easy, because it’s typically well defined what you’re doing and how things are expected to function.  Things may fail, but usually they’ll fail in somewhat predictable ways.  Well, the best way I’ve found to make UI testing easier is to make it closer to API testing.  Wrap the code that interacts with the UI that you’re testing in classes and functions that behave somewhat predictably and expose the bits and pieces of the UI in ways that make sense.1  I prefer to create a class with ordinary properties or methods that operate on a web page or dialog or whatever.  Going back to the web search example, you’ll have a page with a text box and a button next to it.  That translates to a simple class along these lines:2

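public class SearchPage
{
    // Reads or writes the text in the search box.
    public string SearchText { get; set; }
    // Finds the search button and clicks it, by whatever means that takes.
    public void ClickSearch()
    {
        // UI-specific lookup and click logic goes here.
    }
}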

Then it’s up to the SearchPage class to determine how to find the text box and how to click the button, and to deal with all of the nonsense and WTFery that the UI throws at you.  Your test case that needs to do a search then only needs these three lines:

...
    SearchPage page = new SearchPage();
    page.SearchText = "nuclear manatee seesaw detector";
    page.ClickSearch();
...

In that example, it should be clear to anyone looking at the code what’s going on.  It’s not full of element paths and SWA control patterns.  I’m just setting the text of the search box and clicking a button.  Your test usually doesn’t care about the mechanics of getting the textbox filled in or what kind of stupid tricks are required to click the button, and it shouldn’t.  Doing it this way means that it won’t.  And then the next time the devs make a “quick change” that breaks everything, you only have to make a “quick change” yourself to the code of the wrapper classes and everything should be fixed.
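
Footnote 2 mentions going one step further and wrapping the individual controls too.  Here’s a rough sketch of how that might look.  The class names come from the footnote; the Log list and the method bodies are placeholders invented here so the sketch runs without a real UI behind it.

```csharp
using System;
using System.Collections.Generic;

public class UITextBox
{
    public virtual string Text { get; set; }
}

public class UIButton
{
    // Placeholder: records what the button did instead of driving a real control.
    public List<string> Log { get; } = new List<string>();

    public virtual void Click()
    {
        Log.Add("invoke");
    }
}

// A button that ignores the normal click and has to be pressed from the keyboard.
public class StupidButton : UIButton
{
    public override void Click()
    {
        Log.Add("focus");
        Log.Add("send Enter");
    }
}

public class SearchPage
{
    public UITextBox SearchBox { get; } = new UITextBox();

    // The page decides which kind of button it needs; tests only ever see UIButton.
    public UIButton SearchButton { get; } = new StupidButton();

    public string SearchText
    {
        get { return SearchBox.Text; }
        set { SearchBox.Text = value; }
    }

    public void ClickSearch()
    {
        SearchButton.Click();
    }
}
```

The test code from earlier doesn’t change at all.  It still sets SearchText and calls ClickSearch(), and whether the button underneath is a plain UIButton or a StupidButton is the page class’s problem.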

Always Have A Plan B.  And A Plan C.  (And D…):

Successful UI Automation often requires hacks.  Not just hacks, but dirty hacks.  If you feel completely clean after writing UI automation, then there’s a good chance your tests won’t actually work.  Start by trying to do everything the “right” way, using the controls provided to you by SWA or the browser DOM or what have you.  They’ll work, most of the time.  Unfortunately, every so often you’ll run into a button or a dialog that just doesn’t behave.  Sometimes there are security measures put in place to prevent automated tasks from doing certain things, for instance downloading files in a browser.  You have to be ready to defeat whatever is thrown in your way.  Remember, you’re dealing with UI elements, so if you have to, you can act like an actual user.  Can’t “click” a button using SWA’s InvokePattern?  Try simulating a mouse click or sending keystrokes to the application (Space or Enter will usually activate a button that has focus).  Hell, if you need to, don’t be afraid to buy a Lego Mindstorms kit and build a robot that can click a physical mouse button for you.
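
One way to keep Plans A through D organized is to line them up and try them in order.  This is only a sketch: Fallback and ClickWithFallbacks are hypothetical names, and each strategy is just a delegate that returns true if it thinks it worked.  What goes inside the delegates (InvokePattern, a simulated mouse click, keystrokes) is up to you.

```csharp
using System;
using System.Collections.Generic;

public static class Fallback
{
    // Tries each named strategy in order and returns the name of the first one
    // that reports success. A strategy fails by returning false or by throwing.
    // Throws InvalidOperationException if every strategy fails.
    public static string ClickWithFallbacks(
        IEnumerable<KeyValuePair<string, Func<bool>>> strategies)
    {
        var failures = new List<string>();
        foreach (KeyValuePair<string, Func<bool>> strategy in strategies)
        {
            try
            {
                if (strategy.Value())
                    return strategy.Key;
                failures.Add(strategy.Key + ": no effect");
            }
            catch (Exception ex)
            {
                failures.Add(strategy.Key + ": " + ex.Message);
            }
        }
        throw new InvalidOperationException(
            "Every strategy failed: " + string.Join("; ", failures));
    }
}
```

Returning the name of the winning strategy is deliberate.  If a button quietly starts needing Plan C, that’s worth logging.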

SendKeys is Your Worst Enemy

Available through the Windows API, as well as exposed in the .NET Framework, there’s a function called “SendKeys”.  It lets you send keystrokes to windows.  The application will then respond as if an actual user pressed the keys.  You can use keyboard shortcuts, tab through dialogs, type text into textboxes.  Pretty much anything a user can do from a keyboard, you can do with SendKeys.  It might be tempting to write all of your UI automation using SendKeys, but don’t.  Just don’t.  SendKeys is one of the least reliable and most fragile ways to try to interact with your software.  It won’t survive any kind of change to the interface, and even when it is set up properly, it doesn’t always work right.  Keys will get lost or come early or late, and if the focus changes at all for some reason, you’re screwed.

SendKeys is Your Best Friend

When all else fails, SendKeys will get the job done.  I once ran across a pretty normal looking Windows dialog, with normal looking buttons.  Unfortunately, for whatever reason, the dialog refused to respond to any kind of standard attempts to reach it.  I tried SWA first, and although I could find the button I wanted to click, Invoking it did nothing.  So I tried sending the button a Windows Message to tell it that it had been clicked.  Still nothing.  Then I tried setting its focus and sending the Enter key and still nothing.  In the end, what worked was SendKeys("{Right}{Right}{Enter}"), which selected the button and triggered it completely from the keyboard.  Not a happy solution by any means, but it worked and that’s all that matters.  It’s definitely worth learning its syntax for those obscure cases where you need to hold down ALT for twenty seconds or whatever.[3]

Beware of “Don’t Show This Dialog Again” and Similar Conditions

You know that dialog option.  It’s everywhere and you always check it.  It turns off stupid things like the “Tip of the Day” or warnings about the mean and scary hackers that want to steal your life on the Internet.  And it will come back to bite you when you try to do UI automation.  You’ll write your tests on your machine and they’ll run beautifully.  Then you’ll put them on your automation box and they’ll fall apart because there’s some window or dialog that appears that you had long forgotten about.   You’ll need to alter your test to take into account the possibility that an optional dialog might be there and handle it if it is or move along quickly if it isn’t.  Speaking of which…

Waiting, Waiting, Waiting…

Pretty much any piece of UI automation will have some kind of timing dance.  Normal API testing is usually synchronous.  You call a method and are blocked until it returns or have some clear way of waiting for an asynchronous operation to complete.  This is often not the case with UI automation.  After all, you’re trying to run something that doesn’t know about you and doesn’t care about your schedule.  As a human, you click an icon, wait a few seconds, and continue when the application has finished opening.  You can’t just do that with your automated test.  You click, fine, that’s easy.  Then what?  You have to wait, but for how long?  One second?  Two?  What happens if your virus scanner kicks in while the test is running and now it takes ten seconds to open the application?  It’s ridiculous to force your test to wait for ten seconds every time just in case something goes wrong, but it’s equally bad to only wait one second and fail the test one out of ten times when something does go wrong.  The common solution is to poll, looking for something you expect, like a window with a certain title.  Every 100 ms or so, see if the window (or whatever) you’re waiting for is there yet.  But don’t wait forever, because if it doesn’t show up, you don’t want to be stuck.  Use a reasonable timeout that you’re willing to wait before giving up and fail after you reach that point.
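
A minimal sketch of that poll-with-timeout loop might look like the following.  Wait.Until is a hypothetical helper name, and the condition is whatever check makes sense for your application (a window title, a control being visible, a file existing).

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class Wait
{
    // Checks `condition` immediately, then again every `pollInterval`,
    // returning as soon as it is true. Throws TimeoutException once
    // `timeout` has elapsed without the condition becoming true.
    public static void Until(
        Func<bool> condition,
        TimeSpan timeout,
        TimeSpan pollInterval,
        string description)
    {
        Stopwatch clock = Stopwatch.StartNew();
        while (!condition())
        {
            if (clock.Elapsed >= timeout)
            {
                throw new TimeoutException(
                    "Gave up after " + timeout + " waiting for " + description);
            }
            Thread.Sleep(pollInterval);
        }
    }
}
```

A call would look something like Wait.Until(() => MainWindowIsOpen(), TimeSpan.FromSeconds(30), TimeSpan.FromMilliseconds(100), "main window"), where MainWindowIsOpen is your own check.  The fast case returns within one poll, and the slow case costs you the timeout only when something has actually gone wrong.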

Okay, so you’ve waited for the window to show up, so you can continue with your test.  CRASH!  Well, sure, the window is there, but the control you’re trying to use won’t actually be visible for another 20 ms, so your test dies in a fire.  Watch out for things like that.

Wherever possible, use some indicator within the application itself as a guide for when something is done.  If your app has a status bar that reads “Working” when it’s working and “Done” when it’s done, then watch that status bar text for a change.  If a file is supposed to be written, then look for that file.  You have to be careful, though: don’t always trust the application outright.  As you’ll soon see, the application isn’t always telling you what you really want to know.

My absolute favorite brainbender of a timing issue is dealing with the IE COM object that lets you run browser automation through the IE DOM.  With this COM object, your commands are shipped off to be executed in another process, largely asynchronously.  You call the navigate method to open a web page.  Obviously, since you’re opening a web page, that can take some time, so you make sure that you wait for the page to finish loading before you begin the test.  Your test runs perfectly about 70-80% of the time.  But there’s that remaining chunk where your test reports that it can’t find the page element you’re trying to use.  So, you watch the test run.  It opens the browser and navigates to the page, the element is clearly present on the page you see, yet your test whines that it’s missing and it dies.  WTF?  As far as you can tell, everything is doing exactly what it should be doing except for the failing miserably part somewhere in the middle.  You step through in a debugger, hoping to catch the bug in action, but it works every time.  Here’s where it gets fun:  The IE instance that you’re driving lives in another process and operates on its own time.  You send off an asynchronous request to load a page, then almost immediately thereafter, you ask it if it’s done.  Most of the time, the browser will say “Not yet”, and your test goes to sleep.  But, once in a while, the browser responds, “Yeah, I’m done” on that first request.  You continue, and die because obviously it hasn’t loaded your page yet.  Why is it saying that it has?  Well, you’re not asking if it’s loaded the page you’re looking for.  You’ve asked if it’s done loading.  It says “Sure”, because as far as it’s concerned, it is done…  It’s done loading the LAST page you sent it to.  It hasn’t even started loading the page you just told it to go to.
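
One way around that race is to stop asking “are you done?” and start asking “are you done with the page I asked for?”  The sketch below does that against a made-up IBrowser interface.  It is not the IE COM API, and every name in it is an assumption for illustration.  LaggyFakeBrowser exists only to reproduce the stale “Yeah, I’m done” answer so the wait logic can be exercised without a real browser.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical minimal view of an out-of-process browser.
public interface IBrowser
{
    void Navigate(string url);   // returns immediately; loading happens elsewhere
    bool IsBusy { get; }
    string CurrentUrl { get; }
}

public static class BrowserWait
{
    public static void NavigateAndWait(IBrowser browser, string url, TimeSpan timeout)
    {
        browser.Navigate(url);
        Stopwatch clock = Stopwatch.StartNew();
        // "Not busy" on its own may describe the previous page, so also
        // require the browser to report the page that was requested.
        while (browser.IsBusy ||
               !string.Equals(browser.CurrentUrl, url, StringComparison.OrdinalIgnoreCase))
        {
            if (clock.Elapsed >= timeout)
                throw new TimeoutException("Still not showing " + url + " after " + timeout);
            Thread.Sleep(100);
        }
    }
}

// Fake that reproduces the race: for the first two polls after Navigate it
// still answers for the old page, which finished loading long ago.
public class LaggyFakeBrowser : IBrowser
{
    private string _pendingUrl;
    private int _staleAnswersLeft;

    public LaggyFakeBrowser(string startUrl)
    {
        CurrentUrl = startUrl;
    }

    public string CurrentUrl { get; private set; }

    public void Navigate(string url)
    {
        _pendingUrl = url;
        _staleAnswersLeft = 2;
    }

    public bool IsBusy
    {
        get
        {
            if (_pendingUrl == null) return false;       // nothing was requested
            if (_staleAnswersLeft-- > 0) return false;   // stale answer about the old page
            CurrentUrl = _pendingUrl;                    // the new page "finishes" loading
            _pendingUrl = null;
            return false;
        }
    }
}
```

A naive “wait until not busy” returns on the very first poll against this fake, exactly like the failing 20-30% of runs.  Comparing URLs is itself a simplification, since a redirect would leave the browser on a different address than the one requested.  In that case, wait on something only the new page has, such as the element the test is about to use.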

Debugger == FAIL:

Stepping through your automated UI test case in a debugger is a blueprint for fail.  It won’t work right.  It just won’t.  Your test is happily humming along, driving controls, setting text, having fun, when all of a sudden, a breakpoint is hit.  Your trusty debugger IDE comes to the foreground and you tell it to step to the next line.

Where are you now?

The debugger stole focus.  Does it give focus back to the window you were at before?  Does it give it back to the same control?  When the debugger steps in, it FUBARs the state of your test.  Things might work.  Maybe.  Then again, your test might go completely off the rails and start opening menus and typing things in whatever application you land in.  It could go catastrophically wrong, and while it’s often entertaining to sit back and watch your computer flip out, it usually doesn’t help you solve the original problem.

You can try stepping through an automated test using a debugger, but dust off your Console.WriteLine or printf debugging skills, because there’s a good chance you’ll need them.

Make Sure You Have A UI To Test:

Standard operating procedure for automated tests in a Continuous Integration environment is to have some automated process kick off your tests in response to a check-in or a build.  Trouble is, these automated processes typically live as a service or a scheduled task on a machine hidden in a closet that no one ever logs in to.  If no one is logged into a machine, then there’s a good chance that the application you’re trying to test won’t be running in an interactive window station, and if it’s not in an interactive window station, then your application probably won’t have things like, oh, windows.  It’s pretty hard to test a UI when the UI doesn’t exist.  Make sure that you’re running your UI tests somewhere that they’ll have an interactive window station.  If you can leave a machine unlocked and open all the time, then that’s the easiest thing to do.  Unfortunately, things like “Corporate Computing Security Policies” tend to get in the way of you getting done what you need to do.   If you can’t leave a machine unlocked, then another possible solution is to use a virtual machine.  It’s not as scary to set up a simple VM as it might sound initially[4], and it’s possible to have a VM running and unlocked and with a nice shiny interactive window station, even on a physical box that no one is logged in to.

Now, some of you might be thinking of using Remote Desktop to solve your problems, but good luck with that.  I’ve found that any place big enough to have a computing security policy that prohibits unlocked machines also tends to have a computing security policy that will log you out of inactive remote sessions.  Even without a policy, remote desktop sessions tend to log themselves out when they get bored.  And, to top it all off, even if you don’t get logged out, I’ve had problems with UI things over Remote Desktop, so use at your own risk.  You might have better luck than I did…

Beware of Outside Influences:

With direct API or service testing, you’re usually insulated from whatever’s happening on the machine.

“Windows has just updated your computer and it will restart automatically in 3, 2, 1…”

Unfortunately, that’s not the case with UI automation.

“There is an update available for Flash.  Download NOW!”

You’re much more at the mercy of unexpected windows and popups and dialogs.

“This application has encountered an error and will be shut down.”

There’s not much you can do about it.

“You need administrative rights to perform this action.  Allow?”

You can always try to eliminate or tune down the things that you can predict, but there will always be the unexpected willing to come along and bite you.

“You need administrative rights to allow this action to obtain administrative rights to perform this action.  Allow?  Are you REALLY sure this time?”

Basically, your only option is to be defensive.  You can’t always recover from some random dialog or other interference[5], but you can make sure that your tests don’t hang forever and at least report that something went wrong.  If you’re looking for a window or a control, don’t look forever.  If it’s not there within a minute, it ain’t coming, so kill the test and move on.  And whenever possible, have your tests take screenshots of unexpected failures.  You’d be amazed how much frustration you’ll avoid if you have a screenshot that clearly shows what went wrong.

Take Screenshots Whenever Possible If Something Goes Wrong:

Yeah, I know I just said that above, but it needed its own headline.
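
A sketch of how that might be wired in, under some assumptions: FailureEvidence and RunWithScreenshot are hypothetical names, and the actual screen grab is passed in as a delegate so the wrapper doesn’t care how the image gets made.

```csharp
using System;
using System.IO;

public static class FailureEvidence
{
    // Runs `test`. If it throws, asks `captureScreenshot` to save an image to a
    // timestamped path under `outputDir`, then rethrows the original exception.
    public static void RunWithScreenshot(
        string testName,
        Action test,
        Action<string> captureScreenshot,
        string outputDir)
    {
        try
        {
            test();
        }
        catch (Exception)
        {
            string fileName =
                testName + "_" + DateTime.Now.ToString("yyyyMMdd_HHmmss") + ".png";
            try
            {
                captureScreenshot(Path.Combine(outputDir, fileName));
            }
            catch (Exception)
            {
                // A broken screenshot must never hide the real failure.
            }
            throw;   // rethrow the test's own exception, stack trace intact
        }
    }
}
```

On Windows, the capture delegate could be built on System.Drawing’s Graphics.CopyFromScreen, but that part is left out here to keep the sketch independent of any particular UI stack.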

Sometimes It’s Just Plain Flaky

Even when you’ve tailored the environment to be exactly what the test needs, even when you’ve taken care of all the stupid timing issues and dialog interference, even when it should work, sometimes, it just won’t work.  UI testing should always be treated as your sworn enemy because it hates you.  And there’s nothing you can do about it.  A good rule to live by is that if a UI test fails once, run it again, if it fails twice, run it once more, and if it fails a third time in a row, it’s an actual bug in the software.  It is a waste of time to attempt to get UI automation running flawlessly 100% of the time.  Shoot for 90% and call it a day.  You’ll find more bugs in the software if you write 30 slightly imperfect tests than if you spend all that time writing 5 perfect tests.
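
The three-strikes rule is easy enough to sketch.  Flaky.PassesWithin is a hypothetical helper, and it assumes a test signals failure by throwing.

```csharp
using System;

public static class Flaky
{
    // Runs `test` up to `maxAttempts` times. Returns true as soon as one
    // attempt completes without throwing; returns false only if every
    // attempt throws.
    public static bool PassesWithin(int maxAttempts, Action test)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                test();
                return true;
            }
            catch (Exception ex)
            {
                Console.WriteLine(
                    "Attempt {0} of {1} failed: {2}", attempt, maxAttempts, ex.Message);
            }
        }
        return false;
    }
}
```

Calling Flaky.PassesWithin(3, RunSearchTest), where RunSearchTest is your own test method, gives the rule above: only three failures in a row get reported as a real bug.  Logging every failed attempt still matters, because a test that routinely needs its second try is telling you something.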

And When All Else Fails…

Thread.Sleep(5000);

  1. And really, if you’re an SDET, you already have been thinking of a solution of some form along these lines.  If not, then give the D in your title back and get the hell out of my pay grade because you have no business calling yourself a Software Development Engineer, in Test or otherwise.
  2. I actually follow a slightly more complicated model where I have wrapper classes for the controls, too.  For this SearchPage example, I’d actually have something like a “UITextBox” class or interface with a Text property, and a “UIButton” class with a “Click” method and a “Text” property, etc.  This lets me expand the functionality of the controls without having to change the container class.  (For instance, if I need a “Focus()” method on the button, I just add it on my “UIButton” class and it’s accessible on every button I have.)  Additionally, it allows for subclassing/inheritance, so if I have a stupid button that requires keystrokes to press, I can have “StupidButton” derive from UIButton, then make the class return a StupidButton instance, and the test cases are none the wiser.
  3. Helpful tip:  If you want to send a space, call SendKeys(" ").  Seems obvious now, but it’s amazing how your mind shuts out that possibility when you’re trying to do it.
  4. MS gives away Virtual PC and Virtual Server for free, and chances are you have an OS install disc around somewhere, and that’s all you need.
  5. Keep in mind that interference need not be from the system.  If you’re running on an open machine somewhere, they have a bad habit of being used to check Facebook in the middle of a test run, and that’s not good for your UI driving automation…

December 23, 2009   7 Comments