Random header image... Refresh for more!

Continuous Integration and You

Continuous Integration is probably the most important development in team programming since the rise of source control systems.  It’s not a recent development, either, but while source control usage is universal, Continuous Integration tends to meet with calls of “It’s too hard” and “We don’t need it yet”.  Even in my company, where the benefits of Continuous Integration are obvious and clear, there always seems to be some resistance or avoidance when it’s time to put a project into the system.

Well, it’s not that hard1, and yes, you do need it right now.

Before I begin, I work at a web company.  Most of our software are fairly small (code-wise) sites, services and tools, that sort of thing.  We don’t really have any monolithic enterprise systems that take seven hours to compile with thousands of dependencies.  This posting is written from my experience in this world.  If that’s not your world, some of these comments might not apply to you, although the basic concepts should still be able to translate into what you do.

Continuous Integration, in its most basic form, is a system that watches source control check-ins, and when it detects a change, will build the code.

And deploy the code.

And run the automated tests against the code.

And build other projects that depend on the module that was just built, and deploy and test those, too.

You may have heard of Continuous Integration before, but didn’t look into it because you think it’s an “Agile process” and that you have to be doing test-driven design and extreme programming and all of that other nonsense in order for it to work.  Well, you don’t have to be doing Agile to do this.  Even if you’re in a barrel and going over a hard-core ISO-9000 Waterfall, you’ll benefit from at least part of it.  Hell, I’ve even set it up for some of my own personal projects, just so I don’t have to deal with the hassles of manual deployment.

Here’s why Continous Integration is Awesomeness:

  • Broken code detected automatically and immediately.  Accidentally check in a file with only half a function written?  Forget to check in a new file or library?  Within a few minutes, CI will fire off a build and choke, so you’ll know right away that there was a problem with what you just checked in.  You don’t have the case where two days later, someone else checks out your code, discovers that it’s broken, then wastes an hour trying to figure out what’s wrong before calling you over to help. 2  Plus, there’s the “public shaming” factor working in favor of quality software.  When someone breaks the build, everyone finds out about it, so people tend to be more careful about what they check in.
  • Automatic deployments.  If you’ve got web sites or services, you can’t use them unless they’re running somewhere.  Obviously, you can install the site by hand every couple of days, but then you have to go through the tedious steps of logging in, finding the latest build package, uninstalling the old code and installing the new code, repeating those steps for every machine you need the code on.  You could easily waste up to an hour or two a day doing this.  (On top of that, you’re running code that’s out of date most of the time.)  CI can do it for you.  Your machines are always up to date and you’re not wasting any of your time to make that happen.  Best of all, pesky testers won’t bug you asking for an updated build or begging you to push your latest fixes.
  • Automatic testing.  There’s test automation, then there’s automated test automation.  Automated tests that don’t run on their own don’t run often enough.  You’ve gone to all the trouble of making a computer do all the work of testing for you, so why do you still have it where someone has to push the button to make it go?  Think that’s job security for you or something?  If you have automated tests, make them run in a CI system.  That way, the developers check in their code and a few minutes later, they get an e-mail that tells them if anything broke, and you don’t have to do anything.  You don’t even have to be in the office!
  • Always up-to-date.  Testers are always looking at the latest code, so they’re not filing bugs about things you fixed last week.  The upper-ups are always looking at the latest code, so they’re not complaining about things that you fixed last week.  And the teams that rely on your component are using the component as it actually is, not as you designed it two months ago.
  • Code builds are clean.  Everything’s built automatically out on the build servers, not on your machine, where you’ve modified config files and replaced critical libraries with the debug versions.  The build servers are clean, and becaue what they’re building is constantly being deployed and tested, you know that the build packages they’re producing work.
  • Integration and deployment aren’t afterthoughts.    In ages past, I worked on a project which had a three month release cycle.  The first month and a half was strictly local development, on the developer’s machines.  Nothing was running anywhere other than the Cassini server in Visual Studio.  Like many systems, there was a front end and a back end.  Of course, the back end wasn’t running anywhere, so the front end was using a mock service that worked like the actual back end would, in theory, at least.  Then came the day of the first drop.  Developers spent hours building and installing everything, then turned it over to QA and went home.  The next day, QA was only able to log a single bug:  “[Drop 1] NOTHING WORKS. (Pri1/Sev1)” 3  You see, while everything worked according to the spec, in theory, none of the developers actually talked to one another, so neither side mentioned how they filled in the holes in the spec.  The back end was using a different port, because the original port conflicted with something, and the front end was using lowercase parameter names and expecting a different date format.   This sort of trainwreck deployment disaster doesn’t happen when you have continuous integration in place.  Not because there aren’t miscommunications, but because those miscommunications are discovered and resolved immediately, so they don’t pile up.  The front end developers point their code at the live work-in-progress back end servers.  If there’s a port change or a date format mismatch, it can be detected and corrected right away. 
  • Testers get something to test immediately.  In the story above, the testers were forced to try to write their tests against a phantom system for the first month and a half, based on partial technical specs and a handful of incomplete mockups.  After that, we were writing tests on code that was out of date.  It doesn’t have to be like that.  With Continuous Integration,  whatever is checked in is deployed and running somewhere, making it at least partially testable.  Even if there’s no functionality, at the very least, there is a test that the deployment package works.  The argument that testers can just pull the code out of the repository and run the service or website on their local machine and run all of their tests against that is ridiculous.  It’s a waste of time, because every tester has to pull down the code, configure their machines, build the code, then bug the developers for an hour or two because things aren’t working right.  Even after all that, there’s no telling whether some bugs are actual bugs, or just byproducts of the way it’s set up on a particular tester’s machine.  Worse still, the tests that are written aren’t going to be run regularly, because there’s no stable place to run them against.
  • Deployment packages and test runs are easily archived.  The system can automatically keep a record of everything that it does.  Need to compare the performance of builds before and after a change that was made two weeks ago?  You’ll be able to do that, because you still have the .MSIs archived for every build over the last two months.  Need to find out when a particular piece of functionality broke?  You’ll be able to do that, because the tests ran with almost every check-in and the results were saved.  You don’t have drops and test passes that are done two weeks and a hundred check-ins apart.
  • Progress is always visible.  Ever have a manager above you, sometimes far above you, want to see your progress?  Ever have to race to get something set up to demo?  CI takes care of that.  You’re always deploying with every check in.  A demo of the current state is always a link away.
  • It saves time and saving time saves money.  Use this point with skeptical management.  The first time you set up a project in a CI system, yes, you’ll spend several days getting everything running.  However, if you add up all the time that your team will waste without an automated build/deploy/test system, it will easily pay for itself.  You won’t spend any time on getting a build ready, deploying the build, firing off the automated tests, reporting on the automated tests, trying to get something you’re not working on set up on your box, explaining how to set up your code on someone else’s box, creating a demonstration environment to show off what you’ve done to the upper-ups after they asked about your progress in a status meeting, integrating with a service that’s been under development for months but that you’ve never seen, trying to figure out what check in over the last three weeks broke a feature, wasting a morning because a coworker checked in broken code, or being yelled at by coworkers for checking in broken code and causing them to waste their morning.  And, of course, the second time you set up a project in Continuous Integration, you can copy and paste what you did the first time, then only spend a fraction of the time working out the kinks.  It gets easier with every project you do. 

Like almost anything that’s good, there are some points where it’s painful.

  • When the build goes bad, people can be blocked.  Everyone’s relying on the code being deployed and functional.  Now that you have an automatic push of whatever you’ve checked in, there’s always the chance that that what you’ve checked in is garbage, leading to garbage getting pushed out.  However, the flip side to this is that the problem is exposed immedately, and there’s a very real and pressing need to fix the problem.  Otherwise, the blocking issue may have gone undetected for days, perhaps weeks, where it would be far more difficult to track down.
  • It never works right the first time.  Never.  Whenever you set up a project in your Continuous Integration system or make major changes to an existing project, it won’t work.  You’ll spend an hour tracking down little problems here and there.  Even if you get the build scripts running on your local machine, they’ll fail when you put them out on the build server.  It’s just part of the territory.  However, tweaking the build to work on a different machine does mean that you’re more aware of the setup requirements, so when it comes time to ramp up the new guy or to deploy to production, you’ll have a better idea of what needs to be done.
  • Everyone knows when you screw up.  Everyone gets the build failure e-mail.  So don’t screw up.  Where I work, we’ve taken it a step further.  We used to have a “Build Breaker” high score screen, which would keep track of who had left builds broken the longest.  When the build broke, everyone involved would make sure they fixed it as soon as they could, so they’d avoid getting the top score.  However, we also have a watcher set up that will send an e-mail to the specific person responsible for the build break, letting them know that something went wrong.

Now you know why it’s good to set up Continuous Integration, as well as some of the things that will go wrong when you do.  So, all that’s left is for me to give you some advice based on what I’ve learned in my own experience.  I’m not going to go so far as to say that these are Continuous Integration Best Practices or anything like that, just some tips and tricks you might find helpful. 4

  • Do it.  Just do it.  Stop complaining about how hard it is or how it takes so much time to set up or how we don’t really need it at this point in the project.  Do it.  DO IT NOW.  It’s easier to do it earlier in the project, and the overall benefit is greater.
  • Add new projects to CI immediately.  Create the project, check it in, and add it to CI.  Don’t wait, don’t say “I’ll do it when I’ve added functionality X”.  Get it in the system as soon as you can.  That way, it’s out of the way and you won’t try to find some excuse for putting it off later when the testers are begging you to finally do it.  I view CI as a fundamental part of collaborative software development and the responsibility of any developer who sets up a new project.  If you’re a tester and the devs won’t do it themselves, do it for them.  You need it.  Trust me, you do.
  • Use what makes sense.  Where I work, we use CCNet and nant.  Not necessarily because they’re the best, but because they’re what we’ve used for years and they make sense for us.  make is ancient and confusing, we don’t use Ruby for anything, so Rake would be a bit outside of our world.  If you can do what you need to do with a batch file and a scheduled task, then go for it.  Although, trust me, you’re going to want something that can handle all of the automatic source control checkout stuff and allow wide flexibility in how it fires off builds.  And don’t pay for anything.  Seriously, there’s free stuff out there, you don’t need to be paying thousands of dollars a year just because something that has a great sales pitch.
  • Consistency is the key.  On a major project about a year and a half ago, we standardized the set up of our projects and build scripts.  Ever since then, we’ve reused that model on a number of other projects, and it has been amazingly effective.  As long as we give parts of our projects standard names (project.Setup, project.Tests, etc.), we can copy and paste a small configuration file, change a few values in it, and in just a few minutes, we have that project being built, deployed, and tested in our CI system.  An additional benefit is that our projects now all have a similar organization, so any part of the code you look at will have a sense of familiarity.
  • Flexibility is also the key.  While we’ve gotten enormous gains from standardizing what we can standardize, there’s always going to be something that has to be different.  Don’t paint yourself into a corner by forcing everything you put in the system to act in exactly the same way.  However, don’t be too flexible.  If someone’s trying to take advantage of the flexibility in the system because they’re too lazy to be consistent, then, by all means, break out the Cane of Conformity and make them play by the rules.
  • Your build system is software, too.  Don’t think of it as just a throwaway collection of files to get the job done.  It isn’t.  It’s an integral part of your software.  It might not be deployed to production or shipped to the customer, but everything that is deployed to production or shipped to the customer will run through it.  Do things that make sense.  If you can share functionality, share it.  If you can parameterize something and reuse it, do that.  Check in your build scripts and CI configuration.  Remember, you’re going to have to update, extend and maintain your build system as you go, so it’s in your best interest to spend the time to make a system that’s simple to update, extend and maintain.
  • Be mindful of oversharing build scripts.  You want to try to make your build scripts and CI configs so that they’re modular and reusable, but be careful that you don’t share too many things or share the wrong things between unrelated projects.  At my company, we have a handful of teams, and most of them have one or more build servers.  At one point, one of the teams was reorganized into several smaller teams, and sent to work on wildly divergent projects.  However, they continued to share a single library of build scripts.  Some time later, someone made a change to the one of the scripts in the library that he needed for his project.  His project on his server worked just fine.  Then, two weeks later, every other server that used these shared scripts began to fail randomly.  No one had any idea what was going on, and it took several hours to trace the problem back to the original change.  This illustrates the danger of sharing too widely.  You want to try to structure the build script libraries in such a way that changes are isolated.  Perhaps you can version them, like any other library.  Or, like we’ve done for most of our projects, copy the build script libraries locally into the project’s directory, so everyone’s referencing their own copy. 5
  • Check in your build scripts and CI configuration.  I know I just said that in a point above, but it’s important enough that it deserves its own bullet.  Your build scripts are an important piece of your software, so when they change, you need to rebuild your code, in order to make sure that the process still works.  You want them checked in for the same reasons the rest of your source code is checked in.  We even have our CCNet.config checked in, which means that our Continuous Integration systems themselves are in CI.  In other words, CCNet will detect a change to the CCNet.config that’s checked in, pull down the latest version, and restart itself to pick up the changes.  Under normal circumstances, we never have to touch the build server.  We just check in our changes and everything is automatically picked up.
  • Don’t touch the build server.  Obviously, you’ll have to set up the box, and once in a while, there’s some maintenance to be done, but for day to day operations, don’t log on to the box.  In fact, don’t even manually check anything out other than the CI configuration file.  Everything that needs to be checked out should be checked out automatically by the CI system.  This even extends to tools, where possible.  We’ve got versions of nant and other command line tools checked into SVN, and any build that uses them is responsible for checking them out.  One of the benefits of this is that it makes it easy to move the builds to a different server in an emergency if something goes wrong.  If any of our build servers dies, then we can probably get things back up and running on an alternate server in about half an hour.
  • PSExec is your friend.  If you’re doing stuff on Windows systems, there’s a tool from Sysinternals called “PSExec”, which makes it fairly straightforward to run a command on a remote machine.  Get it.  Use it.  Love it.
  • Every build should check out everything it needs.  Every build should be able to run from scratch. 6  Every build should check out all the code it needs, all the dependencies, all the tools, all the data files.  It should be possible for you to go on to the build box, wipe out the source tree, and have all of your builds succeed the next time they run.  In fact, I’d recommend intentionally deleting the source tree from time to time, just to may sure that everything is self sufficient.  The reason for this is that every build should be using the latest code.  If Build A depends on a library that only Build B checks out, then there’s a chance that Build A will run before Build B updates, leaving Build A potentially out of date and in an unknown state.  Yes, this requirement often means that there are certain libraries or files that are included by pretty much everything, so even a simple change to those libraries will cause a build storm where everything gets rebuilt.  People hate these, but think about it:  You’ve changed a core dependency, therefore everything has to rebuild because everything was changed.
  • Only check out what you need.  Target the checkouts to be only the directories that you need for that particular build.  Don’t have every project watching the root of your repository, because then you’ll get builds for every single check in, no matter how unrelated the check in was.
  • Don’t set up your CI system with a normal employee account.  You want your CI system to run under an account that won’t go on vacation or get fired.
  • Fail builds when unit tests fail.  If you’re doing unit tests, you want those tests to be running as part of the CI build.  You also want them to fail the build.  The philosophy of unit testing is that when a unit test breaks, your code is broken and you need to fix the issue immediately.  This will very strongly encourage developers to make sure that the unit tests are maintained and in good working order and that their new code doesn’t break anything.
  • Don’t fail builds when functional/regression/integration tests fail.  It’s generally expected that at least some of your functional or regression tests will fail with every build.  If you don’t have at least a handful of regression tests failing due to open bugs in the software, then you need more tests.  Where I work, the functional test builds only fail when there is a problem when building or executing the tests, not when one of those tests fail.
  • Don’t deploy if a build fails.  Deployment should be one of the last steps in a build and should only be performed after the build and unit tests (if you have them) are successful.  Don’t wipe out a previous build that was successfully deployed with something that you know is broken.
  • Don’t archive deployment packages if a build fails.  If the build breaks or the unit tests die or the deployment doesn’t work, don’t archive the deployment package.  This will ensure that any MSI that’s saved is of a build that passed all the unit tests and was installed successfully.
  • Split your builds into logical components.  If possible, avoid having a monolithic build that will build everything in your company.  The bigger a build is, the more stuff that’s included, the more chances for the build to go wrong and the longer a build will take.  You want to aim for quick and small builds for faster feedback.  It’s fine if you have cascading builds, where one build triggers the next, as long as the project that was actually changed is built first in order to give immediate feedback.
  • Don’t filter build failure mails.  EVER.  Build failure notices are vital to the health of a CI system.  When a build breaks, you need to know about it.  However, I’ve seen a lot of people who simply set up a mail rule that forwards all mail from the build server into a folder that they never look at.  DON’T DO THAT.  It’s fine if you filter the successes away, but the failures should be front and center.  If anything, you need to set up a rule that flags failures as important.  I have a mail rule that specifically filters mails from the build server only when the subject line contains “build successful” and does not contain “RE:” or “FW:”, etc.
  • Fix your broken builds.  NOW.  Don’t wait.  Fix it.  Now.  NOW!  When a build breaks because of you, it should leap to the front of your priority queue.  Do not go to lunch, do not go for coffee, do not pass go, do not collect $200.  FIX. IT. NOW.
  • Don’t go home without making sure your builds are successful.  And especially don’t go on vacation.
  • Sometimes, broken builds aren’t the end of the world.  Sometimes it’s okay to check in something you know will break the build.  If you’re doing a major refactor of part of the code, it’s fine to do it in stages.  Go ahead and check in bits and pieces as you go.  In fact, in some source control systems, it would be suicidal to attempt to pile up all of the renames, moves, and deletes into a single check in at the end.  In some cases, you might even want to disable a particular build while you make changes.  (Just make sure to turn it back on again.)

Finally, and most importantly, stop your whining, stop your objections, stop reading this post, and go set up your Continuous Integration system immediately.

  1. You people know seven programming languages, yet you’re afraid of a build script?  It’s not hard.  You’re just lazy. []
  2. Except they can’t, because you’re in Cancun for the next two weeks. []
  3. And, of course, QA got blamed for slipping the release date because they didn’t get their testing done according to schedule. []
  4. Plus, I just wanted to say “Continuous Integration Best Practices” a couple of times to get more hits.  Continuous Integration Best Practices.  Okay, that oughta be enough. []
  5. However you choose to organize it, sticking project specific e-mail addresses in a shared file that everyone references is a dumb idea.  There’s nothing shared about it.  Don’t force me to rebuild everything just because you’ve hired a new intern. []
  6. Well, almost every build, at least.  Obviously a test run is going to need the project it’s testing to have been built first. []

February 6, 2011   No Comments

Automated Testing and TinyMCE

Ever come across one of these TinyMCE editors in the stuff you have to test?  Well, it’s probably going to give you a headache when you do.  There’s a <textarea> in the page, and normally, you’d grab the element, set its value, and call it good.

Not here.

You see…  TinyMCE doesn’t use content of the text area.  Instead, it seems to be dynamically building an HTML document within an <iframe> and the text area is just there to be a placeholder.  Now, I’m sure that if you really wanted to, you can probably manipulate the HTML document in that <iframe>, but if you want to do that, there’s probably something wrong with you.  Fortunately, there’s a better way.

TinyMCE has a JavaScript API.  (Details here…)  This API lets you do all sorts of crazy things when you’re developing a page with a TinyMCE editor on it.  Of course, we’re not developing the page, but fortunately, the TinyMCE editor doesn’t know that.  JavaScript called from the test is just as valid as JavaScript called from the page in its view, so we can call the function to insert text into the document.  Like so:

window.tinyMCE.execCommand(‘mceInsertContent’,false,’contentHTML‘);

where “contentHTML” is the stuff you want to insert.

I’m assuming that all you have to do is insert a bit of content into the editor field, much like it were a regular text input box.  If you have to do more in depth testing of various TinyMCE controls, you’re either testing the wrong thing (the editor instead of your app), or you’re on the dev team for TinyMCE, in which case you shouldn’t be reading my blog post on this subject, you should be writing your own explaining this so that outside testers don’t waste the better part of their day to figure this sort of stuff out.

Now, as far as how to actually call JavaScript code when you’re automating the browser through the DOM…  I thought I’d written something about that before, but now I can’t seem to find it.  I guess that’ll have to be a topic for the future.

May 11, 2010   No Comments

Web Automation (or: How To Write A Bot To Steal Porn)

A while back, I wrote about using the System.Windows.Automation libraries to write automation to drive Windows applications. With SWA and UIAutomation, you can write code to use Win32 apps, Windows.Forms programs, WPF and even Silverlight. That’s all happy and fun, as long as you’re only dealing with Windows applications. Trouble is, there this thing called “The Web” that’s all the rage with kids these days, and sooner or later, you’ll probably have to use it, too. If you pull out your handy installation of UISpy and try to inspect a web page, you get a whole big block of nothing. The red rectangle will outline the window and tell you that all of those things that look like text boxes and buttons aren’t really text boxes and buttons. That means you can’t use SWA for web sites.

That, well, that kinda sucks.  So, what do you do about it?

Obviously, the correct solution here is to admit defeat:   The tool you know about doesn’t work, so it’s too hard to do.  Time to give up and pay thousands of dollars a seat for some whiz-bang tool that promises to do what you need and even has a handy-dandy recorder, so you don’t even have to think about what you’re doing!

Or…  Not.

That whiz-bang tool is only going to cost you money and it’s not going to do a damn thing for you.  You’ll have to pay high-priced consultants and high-priced support engineers just to figure out how it works.  You see, their model is to cram so many features in and make it so complicated to use that you think that you must be stupid because you can’t understand it and as soon as you figure out that one last thing, you’ll be more productive than you ever were before.

And oh, will that test recorder make you productive!  You’ll be able to hire a monkey to point and click your way to hundreds of test cases with ease!  Except that they’re hundreds of useless test cases, because either the verification that the tool provides is hopelessly limited and unable to actually verify your website, or, well, you hired a monkey to do your testing and they have no idea how to do anything beyond pointing and clicking.  But that’s all right.  You see, as soon as a single line in the HTML of your web page changes in just the tiniest way, every last one of those recorded tests will break and you’ll have to completely redo them.

So, SWA is out and the big expensive tool is a total waste.  What else is there?

Well, there’s things like Selenium or WebAii or WatiN.  They’re free or open source libraries that you can use to drive web browsers to do your bidding.  They all support IE and Firefox and possibly other browsers.  And they’re all written by people who don’t seem to have ever tried to write web automation.

  • Selenium:  The default mode is to write your tests in HTML tables, with the thinking that “Anyone can write HTML tables, so anyone can write tests.”  That’s not what happens.  What happens is that you set it up, all the devs and PMs excitedly chatter about how “Anyone can write tests now!”, you give a training session, two devs out of a team of seven will ever write tests using it, creating a grand total of thirteen absolutely worthless tests before giving up, yet somehow, two months later, the director of software engineering will be talking to the EVP of product development and tell him how great it is that we’re using Selenium because “Anyone can write tests now!”, so when you try to tell them what a complete waste it is and how you hate having to maintain the intermediary server and how unstable the test automation is and that we should dump the whole system, they look at you like you’re trying to kill a basketful of cute puppies.
  • WebAii, in my experience, is a tad unstable, and since it’s not open source, you can’t even try to fix it.  Additionally, it needs a plug-in to work, so again, you have to maintain a test machine.
  • WatiN hasn’t even been compelling enough for me to try to use.  That’s not saying it’s bad, it’s just that nothing about it has really stood out to me.

Another thing that really bugs me about these solutions is that many of them don’t really work that well with continuous integration situations, despite claims that they’re designed for that very use.  At my company, our CI servers are all using CCNet, which is running as a service.  When running as a service, you don’t typically get an interactive window station.  In general, that’s fine.  You don’t need one.  Our build servers are spare boxes stuffed in a cabinet somewhere or rack machines living in an off-site datacenter.  Once they’re set up, it’s pretty much all automatic.  We can log into the build box remotely in the rare instance that something does go wrong, but we never stay logged in.  In fact, we can’t.  You see,  in my company (and probably in yours), there are computing security policies in place that prohibit leaving an unattended computer logged in and unlocked.  If you leave your computer unlocked, it will be locked for you.1  Trouble is, most of these web automation libraries I mentioned above require a logged in and unlocked session to function at all. 2

Okay, so no SWA, no expensive tool, and now the free stuff is shot down, as well.  What’s left?

Wouldn’t it be great if there’s something that’s free?

Wouldn’t it be great if there’s something that’s already installed on pretty much every copy of Windows since 95 OSR2?

Wouldn’t it be great if there’s something that works in headless service environments?

Wouldn’t it be great if there’s something that uses the same technology as the majority of web users?

In other words, why don’t you just use Internet Explorer to do your web automation?

Now, I’m guessing that you just answered my question with some sarcastic remark regarding Firefox, so let me address that before continuing.  Yes, using IE means you’re not using Firefox.  I understand that you like Firefox and all, but in the real world, people use IE.  Additionally and importantly, it usually doesn’t really matter that you’re only using IE.  Most of the differences between browsers are cosmetic things, like Firefox’s strange habit of occasionally making oversized divs that mask clickable areas or IE6 generally making every page look as attractive as cat vomit.  Normal web automation, regardless of what tool you’re using, will typically not pick that sort of thing up.  Web automation looks at the structure and functionality of the page, but it’s blind to the looks.  Many of the other tools I mentioned do support Firefox, if you need it, but you probably don’t need it.   After several years of web testing, I’ve only come across a handful of cases where running an automated test in Firefox would have picked up issues that would not have been seen in IE. 3  For the most part, going the extra mile to support Firefox in your automation is unnecessary and simply complicates things.

So, let’s look at using IE to solve all of your automation problems!

Okay, it won’t solve all your problems.  In fact, it’ll create new ones, I guarantee it.  But still, it’s very useful.

But first, a little warning…

You’re going to have to use COM.

Well, okay, you don’t have to use COM.  There is a .Net Web Browser class that you can probably do most of these things with, but I don’t use it.  I don’t use it because, as far as I’ve found, there’s no way to attach it to a real instance of IE.  Instead, you’d have to write your own little Windows Forms app, stick the control on it, and use it that way.  That might work for you, but I’ll stick to the full instance of IE that I can watch and manually interact with if necessary, even if it means using COM.  It is COM in .Net, though, so it’s not as bad as straight COM in C++.  There’s no QueryInterface or CComPtr<>s anything like that.  There are slightly weird things now and then, but they’re not that bad.

Right.  Disclaimer out of the way, let’s get started.

First, you need to add two references to your project.  Add a reference to your project, go to the COM tab in the dialog, and select “Microsoft HTML Object Library”, which will give you MSHTML, and “Microsoft Internet Controls”, which will give you SHDocVw.

 MSHTML is where all of the HTML parsing and related classes live.  SHDocVw is where the Internet Explorer classes are.

Now that you’ve added those references, add your using statements for the libraries, so you won’t have those ugly namespaces all over your code.

using mshtml;
using SHDocVw;

Note that although the reference to MSHTML gives the name in all caps, the namespace is, in fact, lower case.  SHDocVw is the same case both places.

Once you’re set up, you can create an instance of Internet Explorer that will launch and be ready for you to drive it through your code with one line:

InternetExplorer ieBrowser = new InternetExplorerClass();

Of course, there’s a slight problem here.  You can’t actually see the browser.  It’s there, trust me, it’s there, and pretty much everything I’m about to talk about will still work, even though you can’t see it.  However, in the interest of proving to you that what I’m talking about does, in fact, actually work, let’s make a minor modification so you can see things.

InternetExplorer ieBrowser = new InternetExplorerClass();
ieBrowser.Visible = true;

There, if you run that, IE will pop open.  It won’t do much yet, but at least there’s some progress being made.

A brief aside:  You may have noticed that I created an instance of “InternetExplorerClass”, but assigned it to a variable of type “InternetExplorer”.  I did that because InternetExplorer is actually an interface, so you can’t create an instance of it. 4  InternetExplorerClass is the actual class that you need an instance of.  You could probably also do something with Activator.CreateInstance(), but I’m not going there.  I’ll have more about interfaces in a bit.

Back to the fun, to prove that we’re in control, and to start doing something actually useful, let’s point the browser at a website.  Let’s have our browser go to everybody’s favorite search engine:  Dogpile.com.  To navigate the browser you’re in control of, you use the .Navigate() method.  Unfortunately, .Navigate is all COMtaminated and ugly. 5

No, that’s not Intellisense having a freak out.  The signature of the Navigate method is actually void IWebBrowser2.Navigate(string URL, ref object Flags, ref object TargetFrameName, ref object PostData, ref object Headers).  You only care about the URL, but it’s not going to provide you with an overload that only uses the URL.  Instead, you get all of this “ref object” crap. 6

I bet your first instinct is to think that you’ll just pass nulls to the parameters you don’t care about, and compile it and be happy.  Well, that ain’t gonna work.  See, the “ref” part of the signature means that it expects an actual object reference.  null is not an object reference, null is nothing.  The compiler won’t let you pass in nulls directly.  However, you can pass in a null object reference, and that’ll work.  Like so:

public static void NavigateToUrl(InternetExplorer ieBrowser, string url)
{
    object nullObject = null;
    ieBrowser.Navigate(url, ref nullObject, ref nullObject, ref nullObject, ref nullObject);
}

You may have noticed that I put the Navigate call inside a helper method.  Helper methods and wrapper classes are one of your closest friends in the world of SHDocVw and MSHTML.  It’ll help hide all of the IE COM object’s interesting personality quirks in much the same way that girl you met on Match hid her interesting personality quirks until the fifth date.  Trust me, you don’t want “ref nullObject” a thousand different places in your tests, largely because it’ll scare the hell out of anyone reading your code.

If you go back to the main function and call NavigateToUrl(ieBrowser, "http://www.dogpile.com");, then run the code, you’ll have a browser that will open up and go to Dogpile all by itself.  Of course, if you are running the code as we go, you’ll probably have noticed that the browser remains open after your program exits.  Let’s take care of that before your computer explodes under the weight of a thousand IEs, shall we?  Just call .Quit() on the browser and it’ll go away.

If you call .Quit() immediately after navigating, the browser will probably close before the page even loads, so let’s add a sleep for a few seconds so you can see what’s going on.

For those of you playing the home game, here’s what my main function looks like at this point:

InternetExplorer ieBrowser = new InternetExplorerClass();
ieBrowser.Visible = true;
NavigateToUrl(ieBrowser, "http://www.dogpile.com");
Thread.Sleep(5000);

//Do stuff here...

ieBrowser.Quit();

At this point, the code above is fairly useless.  Sure, you can use this to build a program that forces IE to navigate to web pages all day, but that’s not terribly exciting.  We’re having the browser navigate to a search engine, why don’t we search for something?

(By the way, you’ll want to leave that Thread.Sleep(5000); where it is.  I’ll come back to it later, but for now, DON’T TOUCH!)

When you search for something, what do you do?  Type a word in a box and click a button, right?  That’s what we need to do here.  The InternetExplorer object allows you to access all of the HTML elements on the page and interact with them, including text boxes and buttons.  If you’ve ever used JavaScript and dealt with the Document Object Model, or DOM, the methods and properties you’ll find in MSHTML will be very familiar, because they’re another implementation of the DOM standard.  The way you gain access to HTML elements is through the .Document property.

If you try to use it, Intellisense will be really helpful and tell you that the .Document property is an object.  A plain object.  A plain, useless object.  So what is the .Document property returning?

An IHTMLDocument object.

Or an IHTMLDocument2 object.

Or an IHTMLDocument3 or 4 or 5 object…

Now’s probably the time to talk about the use of interfaces in MSHTML.

In the land of .Net, if you had an IHTMLDocument5 interface, it would probably derive from IHTMLInterface4, which would derive from 3 and so on.  IHTMLDocument5 would have all of the stuff that was on the previous four interfaces, so that would be the only one you’d ever need to use, at least until IHTMLDocument6 comes along.  Not so in the land of MSHTML.  I’m not sure if it’s a COM restriction, a C++ thing, the way .Net deals with COM interfaces, or some strange design decision on the part of MSHTML, the end result is that IHTMLDocument3 and IHTMLDocument2 are pretty much independent.    If you want the title of the page, you need a reference to an IHTMLDocument2 object. If you want to call .getElementById(), you need IHTMLDocument3.

But that’s only if you want to do it the “Right” way.  If you want to do it the quick and easy way, then the .Document property is returning an HTMLElement object.  That’s the class that implements IHTMLDocument*, so it’s got everything on it.  If you want the page title and if you want to call .getElementById(), HTMLDocument will work for you.

Of course, it’s slightly riskier to do it that way.  The interfaces guarantee the contract, the class does not.  Microsoft could change the class at any time and you’d be screwed.  However, I highly doubt they’re going to do anything like that, because it would screw them over far more than it’ll screw you over.  In other words, just use HTMLDocument and you’ll have access to all the available properties, functions, events, etc., without having to cast between the interface types three hundred different places in each method.

It’s important to know that IHTMLDocument*s exist, since that’s where you’ll find much of the documentation.  And on a similar note, all of the HTMLElements that I’m going to talk about have corresponding interface types, and they’re usually what’s documented or talked about.  So, if you can’t find something about how HTMLElement works, try looking for IHTMLElement.  Or IHTMLElement2.  Or 3.  Or 4.

Now that we’ve taken that little vacation, let’s get back to work here.  I made such a big fuss about getting the page title, so let’s do that here.

HTMLDocument htmlDoc = (HTMLDocument)ieBrowser.Document;
string pageTitle = htmlDoc.title;
Console.WriteLine(pageTitle);

Before you can interact with an element on a page, you have to find it in the document.  There are two easy ways to find things, along with a few ways that aren’t quite that easy.  Here are the ones I find the most useful.

  •  .getElementById(string):  This method takes the ID of the HTML element you want and returns the IHTMLElement with that ID. In HTML, an ID is supposed to be a unique identifier, identifying a single element.  Of course, certain popular HTML editors (like Notepad, for instance) won’t enforce a unique ID, so if there are multiple elements with the same ID, this method will return one of them.  This one is good if you know exactly what you’re looking for.
  • .getElementsByName(string):  This method takes the name of HTML elements and returns an IHTMLElementCollection of all of the elements with that name.  This one is good if you have an element or a handful of elements with a known name.
  • .getElementsByTagName(string):  This method takes a tag name, like “a” or “img” or “div” and will return an IHTMLElementCollection of all of the elements with that tag name.  Use this method to quickly get a collection of all of the links or images on a page, or if you’re looking for an element of a certain type with certain characteristics and need to run through the list to find it.
  • .documentElement:  This property returns an IHTMLElement of the root of the HTML content of the page.  On a page that plays by the rules, this will be the <html> element.  If your page doesn’t play by the rules, good luck.  This is a good starting point if you want to walk the tree.
  • .childNodes:  This property will give you an IHTMLElementCollection of the direct children of the current node.
  • .all:  This property returns an IHTMLElementCollection containing a flattened list of all of the elements in the document.  Use this when you don’t care about structure and need to do something that involves lots of nodes of different types.

Unfortunately, as far as I’ve found, there’s no support for something like XPath, which would let you give the node tree path of the elements you want in a simple string format.  If you enjoy pain, you could build something like that yourself.

 The specified return type for most of these methods is IHTMLElement, which is the base element type in MSHTML.  In reality, the element instances are all specific element types.  For instance, an <img> tag will return an IHTMLImageElement object, and an <a> will give you an IHTMLAnchorElement object.7  The specific types will have specific properties, so if you know what element type you have and you need to use it for something (Say, for instance, if you need to get the src attribute from an <img>), then you should cast it to the specific type.

Right-o, let’s start doing useful stuff, shall we?  Back before we took a wild turn and ended up hopelessly sidetracked, we had Internet Explorer going to the front page of the search engine Dogpile.com.  Now, let’s make IE do a search.  To do that, we need to grab the search box, put text in it, then grab the search button and click it.  We’ll use the .getElementById method I talked about to get the search box.  Using something like the IE developer tools or Firebug8 or even viewing the page source in Notepad, you can find that the search box is an <input> element with an ID of “icePage_SearchBoxTop_qkw”.

IHTMLInputTextElement textBox = (IHTMLInputTextElement)htmlDoc.getElementById("icePage_SearchBoxTop_qkw");
textBox.value = "powered by awesome";

If you run this, the browser will open, and the phrase “powered by awesome” will appear in the search box.

A couple of points to note.  Even though in the HTML, the search box is an <input> tag, the element you’ll get back is an IHTMLInputTextElement.  The different <input> types are all represented by distinct classes, which is very helpful, because there’s not much in common between a checkbox, a text box, or a button.  Then, once you have the element, it has a .value property, which acts as a getter and setter for the contents of the text box.

Grabbing the submit button is similar:

IHTMLInputButtonElement submitButton = (IHTMLInputButtonElement)htmlDoc.getElementById("icePage_SearchBoxTop_qkwsubmit");

Unfortunately, when you try to click the button, you’ll run into this:

There’s no click there.  There’s nothing remotely resembling a click.  A button’s sole reason for existing is to be clicked, yet you can’t click this button.

Actually, you can.  Just not on IHTMLInputButtonElement, where you’d think you should be able to.  You see, you can actually click on any HTML element, so the .click() method is on the base IHTMLElement.  This goes back to the interfaces I went rambling on and on about a while back.  To find the functionality you want, you sometimes have to bounce around almost randomly until you find what you need.  So, to hell with the interfaces, let’s go directly with the concrete class again, like we did with the document.  In this case, it’s HTMLInputButtonElement. 9

HTMLInputButtonElement submitButton = (HTMLInputButtonElement)htmlDoc.getElementById("icePage_SearchBoxTop_qkwsubmit");
submitButton.click();
Thread.Sleep(5000);

Again, there’s a Thread.Sleep() after the action, so the program will wait long enough for the page to finish loading.  And again, I promise I’ll talk about it later, but for now, trust me and just leave it there.

Now we’re on an entirely new page.  If you try to use the document or the elements you grabbed before, the results will be, uh, shall we say, unpredictable…  The old page no longer exists, so don’t try to use anything from it.  You have to grab a new reference to the document, as well as new elements to play around with.

We’re on a search results page now, so let’s do something like print out all the result titles and the URLs to all of the images.

Console.WriteLine("Links by tag name:");
foreach(IHTMLElement anchorElement in htmlDoc.getElementsByTagName("a"))
{
    if(anchorElement.className == "resultLink")
    {
        Console.WriteLine(anchorElement.innerText);
    }
}

Console.WriteLine("Links by result walking:");
IHTMLElement resultContainerDiv = htmlDoc.getElementById("icePage_SearchResults_ResultsRepeaterByRelevance_ResultRepeaterContainerWeb");
foreach (HTMLDivElement resultDiv in (IHTMLElementCollection)resultContainerDiv.children)
{
    IHTMLElement resultLink = (IHTMLElement)resultDiv.firstChild;
    Console.WriteLine(resultLink.innerText);
}

Console.WriteLine("img src:");
foreach (IHTMLImgElement imgElement in htmlDoc.getElementsByTagName("img"))
{
    Console.WriteLine(imgElement.src);
}

Console.WriteLine("img src 2:");
foreach (IHTMLImgElement imgElement in htmlDoc.images)
{
    Console.WriteLine(imgElement.src);
}

The first bit walks through all of the links, which are <a> tags, looking for elements with the class “resultLink”.  When it finds one, it prints out the .innerText property, which contains the flattened text content of the element.  The second section finds the same elements, but walks through a bit of the tree structure to find the links among children nodes.

I should probably point out now that the structure of websites tends to change over time, so if you try to run this and you get a bunch of exceptions, that’s what’s going on.  If anything on the page changes, this code is likely to break.  It works for me right now, and that’s really all that matters anyway.

The last bit, the part with the “img src” is walking through the page and printing out the URLs of all of the images on the page in two different ways.  First by using the tag name method you already have seen, and the second time by using the .images convenience property on the document.  There are a few other properties like that, so take a look at what Intellisense shows you and play around a bit to get a feel for what’s there.

BUT WAIT, THERE’S MORE!

We’ve got all this access to stuff on the page.  We can put text in text boxes, we can click buttons, we can read the links and images, so why not step it up a notch and modify the page in some crazy way.  Like, I don’t know, maybe we could put a box around every div on the page?

Like so:

foreach (HTMLDivElement divElement in htmlDoc.getElementsByTagName("div"))
{
    divElement.runtimeStyle.borderStyle = "groove";
    divElement.runtimeStyle.borderWidth = "3";
}

KABOOM!

Of course, a crazy box cascade is of little practical value, but you get the basic idea of what you’re able to do.  You’re inside the page being rendered, so you can completely rewrite it if you want.  You’re not stuck with a static, read-only page, so learn how and where to use that to your advantage.  I’ve used this ability inside tests to write out debug information or inject JavaScript functions to be called by the automation.

Here’s the full example code from today:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

using mshtml;
using SHDocVw;
using System.Threading;

namespace IEAutomationSample
{
    class Program
    {
        static void Main(string[] args)
        {
            InternetExplorer ieBrowser = new InternetExplorerClass();
            ieBrowser.Visible = true;
            NavigateToUrl(ieBrowser, "http://www.dogpile.com");
            Thread.Sleep(5000);

            //Do stuff here...
            HTMLDocument htmlDoc = (HTMLDocument)ieBrowser.Document;
            string pageTitle = htmlDoc.title;
            Console.WriteLine(pageTitle);

            IHTMLInputTextElement textBox = (IHTMLInputTextElement)htmlDoc.getElementById("icePage_SearchBoxTop_qkw");
            textBox.value = "powered by awesome";

            HTMLInputButtonElement submitButton = (HTMLInputButtonElement)htmlDoc.getElementById("icePage_SearchBoxTop_qkwsubmit");
            submitButton.click();
            Thread.Sleep(5000);

            htmlDoc = (HTMLDocument)ieBrowser.Document;

            Console.WriteLine("Links by tag name:");
            foreach(IHTMLElement anchorElement in htmlDoc.getElementsByTagName("a"))
            {
                if(anchorElement.className == "resultLink")
                {
                    Console.WriteLine(anchorElement.innerText);
                }
            }

            Console.WriteLine("Links by result walking:");
            IHTMLElement resultContainerDiv = htmlDoc.getElementById("icePage_SearchResults_ResultsRepeaterByRelevance_ResultRepeaterContainerWeb");
            foreach (HTMLDivElement resultDiv in (IHTMLElementCollection)resultContainerDiv.children)
            {
                IHTMLElement resultLink = (IHTMLElement)resultDiv.firstChild;
                Console.WriteLine(resultLink.innerText);
            }

            Console.WriteLine("img src:");
            foreach (IHTMLImgElement imgElement in htmlDoc.getElementsByTagName("img"))
            {
                Console.WriteLine(imgElement.src);
            }

            Console.WriteLine("img src 2:");
            foreach (IHTMLImgElement imgElement in htmlDoc.images)
            {
                Console.WriteLine(imgElement.src);
            }

            foreach (HTMLDivElement divElement in htmlDoc.getElementsByTagName("div"))
            {
                divElement.runtimeStyle.borderStyle = "groove";
                divElement.runtimeStyle.borderWidth = "3";
            }   

            Thread.Sleep(5000);
            ieBrowser.Quit();
        }

        public static void NavigateToUrl(InternetExplorer ieBrowser, string url)
        {
            object nullObject = null;
            ieBrowser.Navigate(url, ref nullObject, ref nullObject, ref nullObject, ref nullObject);
        }
    }
}

As always, you can pull the project out of SVN:  http://www.mathpirate.net/svn/Projects/IEAutomationSample/

That’s about all I wanted to get into as far as a hands-on demonstration.  Now, it’s time for warnings about what can and will go wrong.  So watch out.

First, as promised, let’s talk about those Thread.Sleep()s that I scattered throughout the code.  They’re there because you have to wait for the browser to finish its work, otherwise you’ll get random exceptions.  Exceptions that will never happen when you step through in a debugger, either.  However, it’s not a good practice to rely on sleeping for a fixed amount of time in your automation.  If the browser loads the page in half a second, but you’re sleeping for five seconds, then you’ve wasted four and a half seconds.  That kind of time adds up fast.  On the other hand, if the process is slow, five seconds might not be enough.  Your application will wake up too early and die.

In most cases, I’d suggest polling.  Check the status of something, or look to see if something exists fairly frequently, but keep looking for a reasonable amount of time.  For instance, you could check the .Busy flag on the InternetExplorer object every 100 ms for 30 seconds.  That way, you’ll never sit around for more than 100 ms longer than you need to, plus, you’ll keep checking long enough to be sure that it will finish.  If the page isn’t done loading in 30 seconds, you should probably fail right there.

Except that polling the .Busy flag doesn’t actually work reliably.

If you try to poll the Busy flag exclusively, you’ll find that your tests will sometimes randomly fail.  They’ll look like they should be working.  IE will be loading the page you expect it to load and everything will look right, but you’ll get an exception.  You see, you’re not synchronously driving IE.  You’re talking to an intermediate layer that’s relaying your commands to IE, and IE will respond eventually.  What that means is that you’ll tell IE to load a page, then you’ll check the Busy flag.  Most of the time, Busy will return true because it’s loading the page or false because it’s done loading the page.  But sometimes, your check on the Busy flag will get to IE before it’s started loading the page.  In this case, Busy will return false.  As far as IE is concerned, it’s not busy.  It’s done loading the page.  Trouble is, it’s telling you that it’s done loading the last page, not the page you just told it to load.

One way to counteract this is to sleep for a small amount of time before starting the polling, perhaps 250 ms.  This usually gives IE a chance to start moving and will increase the reliability.  However, it’s going to have the same problem as sleeping did originally.  You’ll often be wasting time waiting around for something that’s already done, and occasionally, you still won’t be waiting long enough.

Another way to combat this is to listen to some of the events hanging on the InternetExplorer interface.  There are events, such as NavigateComplete2, DocumentComplete, and DownloadComplete that you might be able to handle and set your own status flags in.  For instance, you can set a flag before you start to navigate, then have your NavigateComplete2 event handler unset that flag when it’s called.  If it’s called…  And if it’s called for the correct navigation event.  You have to be very careful with some of these events.  I believe DownloadComplete is fired by XMLHttpRequests used by AJAX calls, so that could trip up your detection.  NavigateComplete2 will get called when the main page finishes loading as well as when a frame finishes loading, so if you have a hidden iframe on your page for something like tracking and analytics, watch out for that.

I still have not found a flawless way to wait for page completion.  I’ve found a complicated tangle of states and flags and events that make it work in most cases, but not all.  So, good luck with that.

Security will also get in your way when dealing with IE Automation.  Microsoft rightfully doesn’t want script kiddies and other assorted bastards being able to do things like automatically download files to your computer.  Unfortunately, script kiddies are using the same bit of DOM technology that you’re trying to use, and MS has no way to tell you apart, so that means that sometimes you’ll be blocked from doing things.  I don’t think you can read from a password text box and I don’t think you can directly write to a file upload control.  Sometimes when you click links or buttons that launch certain actions like file downloads, you’ll get a yellow bar that wouldn’t be there if you’d clicked the button yourself.  You have to find crazy workarounds for these issues.  Sometimes you’ll spend all day trying circumvent IE’s security just to click one stupid button.

Another issue you’re likely to run into are random, unexplained failures, often with useless error messages, like “COMException -21234115153” or “RPC server has exploded, try again.”  Many of these exceptions will be timing problems.  Wait just a little longer and you’ll be fine.  I’ve had the constructor for the IE COM object give me an instance of IE that had already been destroyed.  Some errors I’ve seen are obscure COM threading issues.  You’ll get InvalidCastExceptions trying to access some of the properties, like  .location or .frames, even though you’re not casting anything.  You can sometimes fix those by setting your application to run in a Single-Threaded Apartment (Whatever in the hell that means) by putting the [STAThread] attribute on your Main method…  If you have a Main method.  If you’re in some library, or someplace like NUnit or VS Unit Tests, well, then, you’re just plain screwed.  And just this past week, I ran into a case where ieBrowser.HWND would throw an InvalidCastException every other time I called it.  Seriously, odd numbered of calls led to an exception, while even calls gave me a number.  The fix?

try { hwnd = ieBrowser.HWND; }
catch { hwnd = ieBrowser.HWND; }

Seriously.  I wrote that this week.  WTF?

I still feel dirty.

And finally, speaking of dirty, writing a bot to steal porn is left as an exercise for the reader.

  1. After some kind soul Hasslehoffs your desktop… []
  2. For that matter, so does SWA, but that’s a different story. []
  3. A Firefox specific toolbar and some Javascript issues []
  4. It really bugs me, too, because it should be IInternetExplorer… []
  5. And to make it even better, there’s a Navigate2() method, which is even uglier. []
  6. In the C++ world, the ref objects are all VARIANT*s.  The .Net magic that lets you use COM translates the VARIANT to object and the * to the ref.  Unfortunately, every one of those parameters could have had a strong type.  Flags is an int, TargetFrameName is a string (Well, BSTR, but whatever), and so on.  It didn’t have to be like this!  ARGH COM. []
  7. Okay, they’re really HTMLImageElements and HTMLAnchorElements, but who’s keeping track? []
  8. Yeah, Firebug is for Firefox, but a good web tester will have at least two or three browsers at the ready at all times. []
  9. Just don’t look too closely at the definition of HTMLInputButtonElement or HTMLDocument or any of the other things I called concrete classes, or you’ll discover that they, too, are interfaces.  The actual class is HTMLInputButtonElementClass or HTMLDocumentClass.  Whatever.  I don’t know what’s right and what’s real anymore… []

February 13, 2010   2 Comments

UI Automation: Tricks and Traps

UI Automation and testing can be among the trickiest areas of software testing.  Directly testing an API is relatively easy.  You’ve got functions to call, well defined inputs and outputs.  It’s meant to be used in the way you’re using it when you write your tests.  You can spend most of your time writing real test cases that will generally work correctly with minimal effort.  UI Automation, however, isn’t nearly as friendly.  You’ll sometimes spend hours twisting and tweaking one test case to get it running, and even then it’ll still randomly fail 25% of the time.

A large part of this is due to the fact that a UI is meant to present an interaction model for a human.  It’s not actually meant for another computer program to deal with.  A person is clicking the buttons and typing text in the text boxes and so on.  Allowing a computer to interact with it is usually an afterthought, hacked together using technologies that will work, sometimes, and only if the application programmer followed the rules.  If they’re not using a button, but instead are using something that they’re drawing themselves to look and act like a button, it’s not going to be a button for you and your UI tests aren’t going to be able click it easily.

Another major problem with UI tests is that the user interface frequently changes.  That’s not supposed to be a radio button, it’s supposed to be a check box.  Move that button after the text box.  Make that list box a combo box.  The UI is often the most fluid piece of a software application.  Once the API is in place, your API tests have a decent chance of working version over version, because an API isn’t subject to focus groups or marketing studies.  But it’s very rare to leave the UI untouched between versions.

There’s also a problem of perception regarding what UI tests do.  People often think that since UI automation is testing the UI, that means that it’s covering the look of the UI, as well.  Most of the time, it won’t because it can’t.  It’s very difficult to have automated visual testing.  Sure, you can compare screenshots, but what if the window size changes?  It’ll break if you move a button or box.  Your graphical verification tests will report complete and total failures if you took the screenshots on plain XP and someone later uses Vista with Aero to run them.  Hell, they’ll likely die if you turn font smoothing on or off.  Doing something so fragile is what we testers call “A Waste of Time”.  UI testing generally doesn’t cover the look of the application.  Instead, it verifies the correct functionality of the controls in the application.  It’s possible to have your UI tests reporting a 100% success rate when nothing is shown on the screen.  As long as the controls are accessible in the way you specify in your tests, they’ll run.

So then, what can be done about automated UI testing?  It’s obviously very valuable to have, despite the difficulties.  Here’s a few tips and tricks, as well as some traps to avoid.

Name Everything:

In web applications, you can give elements IDs or names.  In regular Windows apps, you can use SWA or  MSAA to identify things.  At any rate, anything you interact with should be uniquely identifiable in some way.  If you’re a developer, do this.  If you’re a tester, get your devs to do this.  If they refuse, do it for them.  Naming things will tend to make your automation resilient in the face of most general changes.  Bits and pieces can move around, but as long as they’re named the same and work the same way, your test will probably survive.

You don’t have to give a completely unique identifier to absolutely everything.  What I’ve found that tends to work well is giving logical groups a unique id for the current window or page, then naming repeated controls.  Consider, for example, a page of results from a search engine.  You’ll have a search box at the top of the page and at the bottom, and you’ll have multiple sets of results in the middle area.  Give the logical areas unique IDs, like “SearchBoxTop” and “SearchBoxBottom” for the search boxes, and “MainResults”, “AdResultsRight” and “AdResultsTop” for the result sections.  Then, those areas can share names across them.  For instance, I don’t really care that I’m dealing with the top search button or the bottom search button specifically.   All I need at that point is “Button”.   “Button” can be used as a name for fifteen controls on the page, but I already know that it’s the top search button I’m using because I got it in the context of SearchBoxTop. 

Turn UI Testing Into API Testing.  Sort Of…:

I’ve seen UI test code that’s an unreadable mess of copied and pasted bits to extract controls or elements followed by copied and pasted unreadable messes where the controls or elements are fiddled with followed by messes of bits that had been copied and pasted to the point of unreadability which extract results from controls or elements.  In fact, that’s what pretty much any test recorder will spit out at you.  It’s a total nightmare to look at and deal with even on a good day, and if you’re looking at it and dealing with it, chances are it’s not a good day.  Chances are all your tests broke last night and now you have to dig through a hundred separate tests and repair the element extraction code in each one of them, all because your UI developer made a “quick change” from tables to divs in the page layout.  Even though the page looks identical, the entire structure is different now and nothing is going to work.

I mentioned in the intro that API testing was relatively easy, because it’s typically well defined what you’re doing and how things are expected to function.  Things may fail, but usually they’ll fail in somewhat predictable ways.  Well, the best way I’ve found to make UI testing easier is to make it closer to API testing.  Wrap the code that interacts with the UI that you’re testing in classes and functions that behave somewhat predictably and expose the bits and pieces of the UI in ways that make sense.1  I prefer to create a class with ordinary properties or methods that operate on a web page or dialog or whatever.  Going back to the web search example, you’ll have a page with a text box and a button next to it.  That translates to a simple class along these lines:2

public class SearchPage
{
    public string SearchText { get; set; }
    public void ClickSearch();
}

Then it’s up to the SearchPage class to determine how to find the text box how to click the button, and to deal with all of the nonsense and WTFery that the UI throws at you.  Your test case that needs to do a search then only needs these two lines:

...
    SearchPage page = new SearchPage();
    page.SearchText = "nuclear manatee seesaw detector";
    page.ClickSearch();
...

In that example, it should be clear to anyone looking at the code what’s going on.  It’s not full of element paths and SWA control patterns.  I’m just setting the text of the search box and clicking a button.  Your test usually doesn’t care about the mechanics of getting the textbox filled in or what kind of stupid tricks are required to click the button, and it shouldn’t.  Doing it this way means that it won’t.  And then the next time the devs make a “quick change” that breaks everything, you only have to make a “quick change” yourself to the code of the wrapper classes and everything should be fixed.

Always Have A Plan B.  And A Plan C.  (And D…):

Successful UI Automation often requires hacks.  Not just hacks, but dirty hacks.  If you feel completely clean after writing UI automation, then there’s a good chance your tests won’t actually work.  Start by trying to do everything the “right” way, using the controls provided to you by SWA or the browser DOM or what have you.  They’ll work, most of the time.  Unfortunately, every so often you’ll run into a button or a dialog that just doesn’t behave.  Sometimes there are security measures put in place to prevent automated tasks from doing certain things, for instance downloading files in a browser.  You have to be ready to defeat whatever is thrown in your way.  Remember, you’re dealing with UI elements, so if you have to, you can act like an actual user.  Can’t “click” a button using SWA’s InvokePattern?  Try simulating a mouse click or sending keystrokes to the application (Space or Enter will usually activate a button that has focus).  Hell, if you need to, don’t be afraid to buy a Lego Mindstorms kit and build a robot that can click a physical mouse button for you.

SendKeys is Your Worst Enemy

 Available through the Windows API, as well as exposed in the .Net Framework, there’s a function called “SendKeys”.  It lets you send keystrokes to windows.  The application will then respond as if an actual user pressed the keys.  You can use keyboard shortcuts, tab through dialogs, type text into textboxes.  Pretty much anything a user can do from a keyboard, you can do with SendKeys.  It might be tempting to write all of your UI automation using SendKeys, but don’t.  Just don’t.  SendKeys is one of the least reliable and most fragile ways to try to interact with your software.  It won’t survive any kind of change to the interface, and even when it is set up properly, it doesn’t always work right.  Keys will get lost or come early or late, and if the focus changes at all for some reason, you’re screwed.

SendKeys is Your Best Friend

When all else fails, SendKeys will get the job done.  I once ran across a pretty normal looking Windows dialog, with normal looking buttons.  Unfortunately, for whatever reason, the dialog refused to respond to any kind of standard attempts to reach it.  I tried SWA first, and although I could find the button I wanted to click, Invoking it did nothing.  So I tried sending the button a Windows Message to tell it that it had been clicked.  Still nothing.  Then I tried setting its focus and sending the Enter key and still nothing.  In the end, what worked was SendKeys(“{Right}{Right}{Enter}”), which selected the button and triggered it completely from the keyboard.  Not a happy solution by any means, but it worked and that’s all that matters.  It’s definitely worth learning its syntax for those obscure cases where you need to hold down ALT for twenty seconds or whatever.3

Beware of “Don’t Show This Dialog Again” and Similar Conditions

You know that dialog option.  It’s everywhere and you always check it.  It turns off stupid things like the “Tip of the Day” or warnings about the mean and scary hackers that want to steal your life on the Internet.  And it will come back to bite you when you try to do UI automation.  You’ll write your tests on your machine and they’ll run beautifully.  Then you’ll put them on your automation box and they’ll fall apart because there’s some window or dialog that appears that you had long forgotten about.   You’ll need to alter your test to take into account the possibility that an optional dialog might be there and handle it if it is or move along quickly if it isn’t.  Speaking of which…

Waiting, Waiting, Waiting…

Pretty much any piece of UI automation will have some kind of timing dance.  Normal API testing is usually synchronous.  You call a method and are blocked until it returns or have some clear way of waiting for an asynchronous operation to complete.  This is often not the case with UI automation.  After all, you’re trying to run something that doesn’t know about you and doesn’t care about your schedule.  As a human, you click an icon, wait a few seconds, and continue when the application has finished opening.  You can’t just do that with your automated test.  You click, fine, that’s easy.  Then what?  You have to wait, but for how long?  One second?  Two?  What happens if your virus scanner kicked on when the test is running and now it takes ten seconds to open the application?  It’s ridiculous to force your test to wait for ten seconds every time just in case something goes wrong, but it’s equally bad to only wait one second and fail the test one out of ten times when something does go wrong.  The common solution is to poll, looking for something you expect, like a window with a certain title.  Every 100 ms or so, see if the window (or whatever) you’re waiting for is there yet.  But don’t wait forever, because if it doesn’t show up, you don’t want to be stuck.  Use a reasonable timeout that you’re willing to wait before giving up and fail after you reach that point. 

Okay, so you’ve waited for the window to show up, so you can continue with your test.  CRASH!  Well, sure, the window is there, but the control you’re trying to use won’t actually be visible for another 20 ms, so your test dies in a fire.  Watch out for things like that.

Wherever possible, use some indicator within the application itself as a guide for when something is done.  If your app has a status bar that reads “Working” when it’s working and “Done” when it’s done, then watch that status bar text for a change.  If a file is supposed to be written, then look for that file.  You have to be careful, though, don’t always trust the application outright.  As you’ll soon see, the application isn’t always telling you what you really want to know.

My absolute favorite brainbender of a timing issue is dealing with the IE COM object that lets you run browser automation through the IE DOM.  With this COM object, your commands are shipped off to be executed in another process, largely asynchronously.  You call the navigate method to open a web page.  Obviously, since you’re opening a web page, that can take some time, so you make sure that you wait for the page to finish loading before you begin the test.  Your test runs perfectly about 70-80% of the time.  But there’s that remaining chunk where your test reports that it can’t find the page element you’re trying to use.  So, you watch the test run.  It opens the browser and navigates to the page, the element is clearly present on the page you see, yet your test whines that it’s missing and it dies.  WTF?  As far as you can tell, everything is doing exactly what it should be doing except for the failing miserably part somewhere in the middle.  You step through in a debugger, hoping to catch the bug in action, but it works every time.  Here’s where it gets fun:  The IE instance that you’re driving lives in another process and operates on its own time.  You send off an asynchronous request to load a page, then almost immediately thereafter, you ask it if it’s done.  Most of the time, the browser will say “Not yet”, and your test goes to sleep.  But, once in a while, the browser responds, “Yeah, I’m done” on that first request.  You continue, and die because obviously it hasn’t loaded your page yet.  Why is it saying that it has?  Well, you’re not asking if it’s loaded the page you’re looking for.  You’ve asked if it’s done loading.  It says “Sure”, because as far as it’s concerned, it is done…  It’s done loading the LAST page you sent it to.  It hasn’t even started loading the page you just told it to go to.

Debugger == FAIL:

Stepping through your automated UI test case in a debugger is a blueprint for fail.  It won’t work right.  It just won’t.  Your test is happily humming along, driving controls, setting text, having fun, when all of a sudden, a breakpoint is hit.  Your trusty debugger IDE comes to the foreground and you tell it to step to the next line.

Where are you now?

The debugger stole focus.  Does it give focus back to the window you were at before?  Does it give it back to the same control?  When the debugger steps in, it FUBARs the state of your test.  Things might work.  Maybe.  Then again, your test might go completely off the rails and start opening menus and typing things in whatever application you land in.  It could go catastrophically wrong, and while it’s often entertaining to sit back and watch your computer flip out, it usually doesn’t help you solve the original problem.

You can try stepping through an automated test using a debugger, but dust off your Console.WriteLine or printf debugging skills, because there’s a good chance you’ll need them.

Make Sure You Have A UI To Test:

Standard operating prodcedure for automated tests in a Continuous Integration environment is to have some automated process kick off your tests in response to a check-in or a build.  Trouble is, these automated processes typically live as a service or a scheduled task on a machine hidden in a closet that no one ever logs in to.  If no one is logged into a machine, then there’s a good chance that the application you’re trying to test won’t be running in an interactive window station, and if it’s not in an interactive window station, then your application probably won’t have things like, oh, windows.  It’s pretty hard to test a UI when the UI doesn’t exist.  Make sure that you’re running your UI tests somewhere that they they’ll have an interactive window station.  If you can leave a machine unlocked and open all the time, then that’s the easiest thing to do.  Unfortunately, things like “Corporate Computing Security Policies” tend to get in the way of you getting done what you need to do.   If you can’t leave a machine unlocked, then another possible solution is to use a virtual machine.  It’s not as scary to set up a simple VM as it might sound initially4, and it’s possible to have a VM running and unlocked and with a nice shiny interactive window station, even on a physical box that no one is logged in to.

Now, some of you might be thinking of using Remote Desktop to solve your problems, but good luck with that.  I’ve found that any place big enough to have a computing security policy that prohibits unlocked machines also tends to have a computing security policy that will log you out of inactive remote sessions.  Even without a policy, remote desktop sessions tend to log themselves out when they get bored.  And, to top it all off, even if you don’t get logged out, I’ve had problems with UI things over Remote Desktop, so use at your own risk.  You might have better luck than I did…

Beware of Outside Influences:

With direct API or service testing, you’re usually insulated from whatever’s happening on the machine.

“Windows has just updated your computer and it will restart automatically in 3, 2, 1…”

Unfortunately, that’s not the case with UI automation.

“There is an update available for Flash.  Download NOW!”

You’re much more at the mercy of unexpected windows and popups and dialogs.

“This application has encountered an error and will be shut down.”

There’s not much you can do about it.

“You need administrative rights to perform this action.  Allow?”

You can always try to eliminate or tune down the things that you can predict, but there will always be the unexpected willing to come along and bite you.

“You need administrative rights to allow this action to obtain administrative rights to perform this action.  Allow?  Are you REALLY sure this time?”

Basically, your only option is to be defensive.  You can’t always recover from some random dialog or other interference5, but you can make sure that your tests don’t hang forever and at least report that something went wrong.  If you’re looking for a window or a control, don’t look forever.  If it’s not there within a minute, it ain’t coming, so kill the test and move on.  And whenever possible, have your tests take screenshots of unexpected failures.  You’d be amazed how much frustration you’ll avoid if you have a screenshot that clearly shows what went wrong.

Take Screenshots Whenever Possible If Something Goes Wrong:

Yeah, I know I just said that above, but it needed its own headline.

Sometimes It’s Just Plain Flaky

Even when you’ve tailored the environment to be exactly what the test needs, even when you’ve taken care of all the stupid timing issues and dialog interference, even when it should work, sometimes, it just won’t work.  UI testing should always be treated as your sworn enemy because it hates you.  And there’s nothing you can do about it.  A good rule to live by is that if a UI test fails once, run it again, if it fails twice, run it once more, and if it fails a third time in a row, it’s an actual bug in the software.  It is a waste of time to attempt to get UI automation running flawlessly 100% of the time.  Shoot for 90% and call it a day.  You’ll find more bugs in the software if you write 30 slightly imperfect tests than if you spend all that time writing 5 perfect tests.

And When All Else Fails…

Thread.Sleep(5000);

  1. And really, if you’re an SDET, you already have been thinking of a solution of some form along these lines.  If not, then give the D in your title back and get the hell out of my pay grade because you have no business calling yourself a Software Development Engineer, in Test or otherwise. []
  2. I actually follow a slightly more complicated model where I have wrapper classes for the controls, too.  For this SearchPage example, I’d actually have something like a “UITextBox” class or interface with a Text property, and a “UIButton” class with a “Click” method and a “Text” property, etc.  This lets me expand the functionality of the controls without having to change the container class.  (For instance, if I need a “Focus()” method on the button, I just add it on my “UIButton” class and it’s accessible on every button I have.)  Additionally, it allows for subclassing/inheritance, so if I have a stupid button that requires keystrokes to press, I can have “StupidButton” derive from UIButton, then make the class return a StupidButton instance, and the test cases are none the wiser. []
  3. Helpful tip:  If you want to send a space, call SendKeys(” “);.  Seems obvious now, but it’s amazing how your mind shuts out that possibility when you’re trying to do it. []
  4. MS gives away Virtual PC and Virtual Server for free, and chances are you have an OS install disc around somewhere, and that’s all you need. []
  5. Keep in mind that interference need not be from the system.  If you’re running on an open machine somewhere, they have a bad habit of being used to check Facebook in the middle of a test run, and that’s not good for your UI driving automation… []

December 23, 2009   7 Comments

Test-Driven Disaster

When first considering using Test-Driven Development, many people will consult their local tester.  This is, of course, the wrong thing to do, because their local tester doesn’t actually care about Test-Driven Development.  It’s not a testing methodology, it’s a development methodology.  Asking a tester about it because it has the word “Test” in it is just as wrong as asking a bus driver about it because it has the word “Drive” in it.  And if you’re assigning your tester the task of “Test-Driven Development”, you need to stop before you damage something, because you’re doing it wrong.

For those who don’t know, “Test-Driven Development” is an Agile methodology for designing classes and developing code, where you begin with writing an automated test, stubbing out a method as you go, then after you have the test, you implement the method until the test case passes.  What many people seem to miss is that testing is merely a side-effect, it’s not the central goal.  Instead, the goal is to develop code from the top down by looking at how you’re going to use it.  There’s virtually no difference between TDD and stubbing out a low level module while implementing a higher level module.  The approach is the same:  You focus on how you’re going to use the functions you’re writing by actually using them, rather than trying to list all the operations you might need without the context on how they’ll be called.  If you don’t understand that’s what you’re doing, you’re bound to foul it up and hurt something.  You can’t do TDD after you’ve already specced out the method signatures, so don’t even try.

Beyond the rampant misunderstanding regarding what TDD is, my biggest reservation about it is the fact that most people who want to try it don’t know how to write tests.  Bad tests are far worse than no tests.  By strictly adhering to the principles behind TDD, you’ll write five tests and think you have everything covered because you red-green-refactored your way to perfection.   In fact, since TDD and the Wide World of Agile will encourage you to use code coverage you’re guaranteed to have tested everything.  Trouble is, you’ve only covered five specific cases, not the five hundred cases that would be apparent if you actually looked at the problem and nowhere near the five thousand cases your users will throw your way.  Code coverage will only tell you that you’ve executed lines of code, not that they’re correctly executing according to your plan.  By exactly following the process of Test-Driven Development, you’re pretty much assured to write nice shiny code that doesn’t actually work, even though it passes all of your tests.  To successfully write effective unit tests, you have to go beyond the initial requirements (Which are incomplete) and the initial design (Which has changed) and look at the solution you actually implemented to see where the problems are.  In other words, your job isn’t over after “Refactor”, you still need to go back and enhance your initial suite of tests before you’re done.

Take, for example, the requirement that you write a function that takes any two numbers and returns their sums.  In the world of TDD, you start with a test:

[Test]
public void TestSumMethod()
{
    Assert.AreEqual(5, SumMethod(2, 3), "2 + 3 = 5 Check");
}

Okay, so, we have the test.  Now we need the function stub.

public int SumMethod(int a, int b)
{
    return 0;
}

Compile, run tests, and blammo!  Your test failed, as it should, because you haven’t implemented anything yet.  This was the “Red” step.  Now it’s time to go Green.  For that, you implement the function you stubbed out.

public int SumMethod(int a, int b)
{
    return a + b;
}

Run the test again and it’ll pass.  Run code coverage and -look at that- 100%.  Now you refactor, rerun the tests and code coverage, and everything’s green, so you’re done!

Except…  You’re not done.  The requirements said “Any two numbers”.  So, you need more tests.  Does it work with negatives?  Big numbers?  Small numbers?  Any tester worth paying will immediately break your function by trying to pass in 2147483647, 2147483647.  And that’s just the beginning.  What about real numbers?  Can it do π + e?  And let’s not even get into complex numbers.  During your refactoring, did you change the inputs to a type that includes NaNs and Infinities?

The point is that following Test-Driven Development left you thinking that you had written a function that was adequately tested, when in reality, it was woefully under-tested.  Obviously, this was a simplified example.  The consequences of doing this with some real, worthwhile code could be disastrous.

Now, I like developers writing unit tests (Well, I like it when they write good unit tests, as I’ve already written…).  I don’t care if they come first, last, or during.  Just don’t fool yourself into thinking that practicing TDD will mean that you’ll have all of your unit tests written as part of your red-green-refactor cycle.

All in all, I like what TDD tries to do.  Thinking about code before you write it and writing unit tests are generally good things to do.  What I don’t like is the shiny glowing path to failure that TDD sets out for those who are unprepared.  I have no doubt that certain Agile practitioners can do TDD and do TDD well.  Unfortunately, I don’t think those people live in The Real World.  If you run your own software consulting firm, then great, go off and do things how you want.  But for everyone else, you’re going to get fired if you try to do that sort of thing, because you’re going to do it wrong and waste a lot of time in the process.

November 4, 2009   No Comments

SWA in WPF and Silverlight

A couple of weeks ago, I wrote about using the System.Windows.Automation libraries in the .Net Framework to write UI Automation, primarily for use in writing automated test cases.  Well, that’s only half the story.  In order to truly unlock the power of SWA and the UIAutomation library, you’ll want to add SWA instrumentation to your application.  It’s not required for using SWA, but believe me, it will make your testers happier.

Microsoft has provided a way to add enhanced support for UIA into Win32 and Windows Forms apps, but WPF and Silverlight have been built from the ground up to have full and simple support for the UI Automation properies.  In this post, I’ll be exploring how to instrument a Silverlight application, largely because I can give a link to my finished sample app for you to explore.  Working with WPF should be mostly the same. 1

Let’s start by looking at what you get from Silverlight, straight out of the box.  We’ll begin with a simple text box and button, the sort of thing you’ll find in pretty much any UI program ever written.

<StackPanel x:Name="LayoutRoot" Orientation="Horizontal" Width="250" Height="32">
    <TextBox x:Name="TextEntryBox" Width="150"/>
    <Button x:Name="SubmitButton" Content="Submit"/>
</StackPanel>

That produces a super exciting display that looks like this:

SWASilverlightApp1

I has mad UX skillz.  Fear them.

If you open good old trusty UISpy and examine the control tree of your XAML masterpiece, you’ll find something that looks a bit like this:

SWASilverlightApp-UISpy1

An “edit” control and a “button” control in a Silverlight Control?  That looks an awful lot like what we just wrote up in XAML, doesn’t it?  That’s because it is.  Yep.  You write it and presto, it’s seen by SWA.  And if you look at the automation properties panel in UISpy, you’ll see that the edit control has an AutomationId with the value of “TextEntryBox” and that the button control has an AutomationId of “”SubmitButton”.  Those are also what we put in the x:Name attributes back in the XAML.  In other words, for Silverlight and WPF apps, most of your controls will automatically be exposed to UIAutomation using the programmatic names you’ve already given them in your code.  You get that for free.

But wait!  There’s more!

Do you see all those other properties in UISpy?  Well, if you don’t, here’s a screenshot for your enlightenment:

SWASilverlightApp-UISpy2

Well, you have direct access to set some of them straight from XAML.  Okay, so it’s not completely free, but it’s still really really easy.  They’re exposed through the AutomationProperties attribute.  As an example, let’s set the “Name” property of the text box in the sample.  If you look at the UISpy control tree, you’ll see that it’s called “”, instead of anything useful.  The automatic population of the “Name” property usually uses the content of a control.  In the case of the button, it has a text label of “Submit”, which is where it gets the name from.  But the text box doesn’t have any content, so it gets left out.  Let’s give it a name before it gets depressed.

<TextBox x:Name="TextEntryBox" Width="150" AutomationProperties.Name="Search Query"/>

All it needs is that “AutomationProperties.Name” attribute set and there you have it.  our edit box is now showing up with a name in UISpy and everyone’s happy.

SWASilverlightApp-UISpy3

Most of the time, I’ll stick to “Name” and “AutomationId” for helping me find elements for my automated tests.

For a good number of decently simple applications, that’s all you’ll need to know.  For that reason, most tutorials I’ve seen on the subject stop here and pretend that this much information is actually adequate for using SWA in the real world.  I, however, have actually done some work in the real world and know that this alone can be painful to work with.  So, I’m going to keep rambling for a bit.

Consider, for a moment, that you’ve assembled some combination of controls that is useful to you, and that you’d like to reuse that combination of controls and not have to copy/paste/rename.  Now, I understand that many of you will never attempt something as wild and complex as this, and that the thought of it alone might drive you to madness, but please bear with me, as it’s required for my next demonstration.  For the sake of conversation, let’s say that this useful combination of controls is a text box and a button to go with it.

So, you have a user control with a box and a button.  I’ve given it the bleedingly obvious name of “BoxAndButton”.  This code should look familar to you:

<UserControl x:Class="SWASilverlightApp.BoxAndButton"
xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
Width="250" Height="32">
    <StackPanel x:Name="LayoutRoot" Orientation="Horizontal">
        <TextBox x:Name="TextEntryBox" Width="150" AutomationProperties.Name="Search Query"/>
        <Button x:Name="SubmitButton" Content="Submit"/>
    </StackPanel>
</UserControl>

Okay, now that we have the control, let’s use it somewhere.  In fact, it’s a reusable control, so let’s use it twice, just because we can!

<StackPanel x:Name="LayoutRoot">
    <MathPirate:BoxAndButton x:Name="SearchDatabase"/>
    <MathPirate:BoxAndButton x:Name="SearchWeb"/>
</StackPanel>

(Please note the “MathPirate” namespace that you’ll have to add to your XAML file, if you’re playing the home game.)

Now we run our app and see our two glorious text boxes and buttons.

SWASilverlightApp2

Now let’s go into UISpy and see what Silverlight or WPF has given us for free.

SWASilverlightApp-UISpy4

Now, this is the point in the application development cycle where you’ve made your UI tester cry.  (Or, if they’re anything like me, you’ve made them cry and shoot you with a fully automatic Nerf gun.) While all of the elements are there, everything is jumbled together.  Despite naming one of the controls as “SearchDatabase” and the other as “SearchWeb” and using a hierarchy of controls, SWA has decided that you really want a flat list of useless.  Anyone trying to automate against this isn’t going to know which button is which, and will therefore have to rely on ordinal positions of controls and a whole lot of luck.  If, for some reason, you decide that Web should come first and you go and swap the boxes, you’ll decimate every single test written against this app.  Database queries will go to the Web and Web queries will end up at the Database, and a quantum singularity will form that will destroy the entire universe.  So, let’s fix things, shall we?

I know!  I know!  AutomationProperties!

<StackPanel x:Name="LayoutRoot">
    <MathPirate:BoxAndButton x:Name="SearchDatabase" AutomationProperties.Name="Database"/>
    <MathPirate:BoxAndButton x:Name="SearchWeb" AutomationProperties.Name="Web"/>
</StackPanel>

Give it a run, and…

SWASilverlightApp-UISpy4

FAIL.

So, what gives?  You’re using a nicely hierarchical control tree, you’re giving things nice names and identifiers, why aren’t you getting anything useful from SWA?  The reason:  UIAutomation is primarily designed to expose interactive controls and elements, and your user control isn’t an interactive element in itself, only the text box and button inside it are.  It’s wrapped in a StackPanel, but things like StackPanels, Grids and Canvases in Silverlight and WPF are considered to be strictly for graphical layout and say nothing about the interaction with the system.  As a result, they don’t show up in the control tree. 2  And the BoxAndButton control itself doesn’t show up because you haven’t told SWA that it’s interesting yet.  Let’s do something about that.

The method that WPF and Silverlight use to expose these AutomationProperties is called an “AutomationPeer”.  AutomationPeer is an abstract class that has a bunch of methods for you to implement, like GetNameCore() and GetAutomationIdCore().  The idea is that these Get*Core are supposed to return the value of one of the Automation Properties for UIA to expose.  This gives you a lot of control over what gets seen by UI Automation clients or tools like UISpy.  If you like that level of control, then feel free to inherit directly from AutomationPeer and implement all thirty or so Automation Properties.  The rest of us, who don’t like tedium, will simply use the implementation provided by FrameworkElementAutomationPeer.

Of course, having an AutomationPeer isn’t going to help you much, unless you tell SWA to use it.  You do that by overriding the method OnCreateAutomationPeer() on your control class.  The method is defined on the UIElement base class, so every element in Silverlight already has it, you’ve just probably never noticed it before.  OnCreateAutomationPeer() just needs to return the appropriate AutomationPeer object that exposes your object and the Automation Properties and Control Patterns you want the world to know about.

Okay, that sounded scary and confusing, but it’s not really that bad when you see it in action.  So, let’s get to the code!  Open up the .cs file for your control.

First, add a using:

using System.Windows.Automation.Peers;

Then, override OnCreateAutomationPeer:

protected override AutomationPeer OnCreateAutomationPeer()
{
    return new FrameworkElementAutomationPeer(this);
}

And now let’s run our pretty application once more and see what UISpy has for us this time.

SWASilverlightApp-UISpy5

Boom-diggity, that’s what I’m talking about.

Now, you have a nice control tree, with your web and database search controls uniquely identified.  While your efforts may not have made the testers completely happy, at the very least, they should have stopped shooting you with the Nerf guns for a while.

But wait!  There’s more!

If you were paying attention, you may have noticed the mention of Control Patterns that I ever-so-slyly snuck in a few paragraphs back.  Control Patterns were the prime focus of the earlier article on using SWA and UIAutomation, but I haven’t really said anything about them here at all.  Why is that?  Because on the implementation side, you typically don’t have to care about them.  You’re using a text box or a button or a scroll bar or whatever that already implements all of the control patterns that it needs, for the most part.  That means you typically don’t have to do a damned thing to get the ValuePattern or InvokePattern to work, while the testers all have to suffer with that idiotic rehash of QueryInterface and COM.

Before you start gloating about that, you need to realize that I’m a tester and now I’m going to make you suffer, too.

All along, we’ve been using a Box and Button as two separate elements.  We’ve put them in a single control, though.  Why not make that single control appear as a single control to UIAutomation?  In other words, why don’t we make our custom control look like it is a text box and a button, rather than something that just contains a text box and button?3

So, somehow, we now have to implement the ValuePattern and the InvokePattern on our BoxAndButton control.  The way to do that, as mentioned before, is through the AutomationPeer.  First, we’ll have to create our own subclass of AutomationPeer.  Again, since I’m not crazy, I’m going to use FrameworkElementAutomationPeer as a base, but you don’t have to.

public class BoxAndButtonAutomationPeer : FrameworkElementAutomationPeer
{
    public BoxAndButtonAutomationPeer(BoxAndButton owner) : base(owner) { }
}

Be sure to change your OnCreateAutomationPeer in BoxAndButton to return one of these instead.  If you run this now, you should see everything act exactly the same as it was before you added this new class.

If you go inside the your new AutomationPeer class and type “public override “, you’ll conjure the magic Intellisense window that’ll show you the list of all those Get*Core() and other Automation Property methods I talked about earlier.

AutomationPeerMethods

You can implement any of them if you want to, but the implementation in FrameworkElementAutomationPeer is usually good enough.  Except for the method GetPattern…

GetPattern is the method that SWA calls on your AutomationPeer when someone using SWA calls “GetCurrentPattern” on an automation element representing your control.  You’ll need to implement it and return an object that implements whatever ControlPatterns you wish to support.  In our case, we want to implement ValueProvider for the Text Box and InvokeProvider for the Button.  The only parameter to GetPattern is a PatternInterface enum containing a value for each of the existing ControlPatterns, so a simple implementation is to switch on patternInterface and return the appropriate object.

public override object GetPattern(PatternInterface patternInterface)
{
    switch(patternInterface)
    {
        case PatternInterface.Value:
        return Owner;

        case PatternInterface.Invoke:
        return Owner;
    }

    return base.GetPattern(patternInterface);
}

If you run it right now and look at it with UISpy, you’ll see that your BoxAndButton control now is apparently supporting the Invoke Pattern and the Value Pattern, just like we want it to do.  Trouble is…  It doesn’t actually support anything.  You may have noticed that I was simply returning “Owner” from GetPattern and found that odd.  Owner is the UIElement we passed into the constructor for FrameworkElementAutomationPeer.  In other words, Owner is the BoxAndButton control itself.  But, we haven’t actually done anything to BoxAndButton to let it know that it’s supposed to support InvokePattern.  Let’s take care of that now.

Each of the control patterns is exposed through an interface. 4  For the Invoke Pattern, it’s IInvokeProvider.  For the Value Pattern, you’ll want to use IValueProvider.  These and other Provider interfaces for for the other patterns live in the System.Windows.Automation.Providers namespace, so be sure to add that.

IInvokePattern is an interface with a single method, called Invoke.  It takes no parameters and returns nothing.  The function is supposed to cause the control to perform some action.  In our case, we’re going to want it to act like the button was clicked.  Like so:

#region IInvokeProvider Members

void IInvokeProvider.Invoke()
{
    AutomationPeer peer = FrameworkElementAutomationPeer.CreatePeerForElement(SubmitButton);
    IInvokeProvider invokeProvider = (IInvokeProvider)peer.GetPattern(PatternInterface.Invoke);
    invokeProvider.Invoke();
}

#endregion

Okay, that was freaking ugly.  You see, Silverlight and WPF don’t really have a way to easily simulate a button click on the Button class.  So, this method is diving into the world of UIAutomation itself and using the InvokePattern that’s already available on the Button.  Basically, it’s a passthrough.  Making this function suck less is left as an exercise to the reader.

IValueProvider is somewhat more straightforward and sane.  It has three members.  IsReadOnly, a boolean property that returns true/false based on whether or not the control is read only.  Value, a string getter property that returns the value of the control.  And SetValue, a function that takes a string and sets the value of the control with it, because apparently the designers hadn’t heard that properties like “Value” can have both getters AND setters.

#region IValueProvider Members

bool IValueProvider.IsReadOnly
{
    get { return TextEntryBox.IsReadOnly; }
}

void IValueProvider.SetValue(string value)
{
    TextEntryBox.Text = value;
}

string IValueProvider.Value
{
    get { return TextEntryBox.Text; }
}

#endregion

 

 Conveniently, all three of those members on IValueProvider map directly to things we can easily do with a text box already.

Now, if you go into UISpy and play around, you’ll see that not only does the BoxAndButton control say that it supports the Value and Invoke patterns, but that you can now actually use them.  You can easily see the Value Pattern in action.  Simply put some text in the box, then look in UISpy, and you’ll see that the custom BoxAndButton now can see the text value.  To see that the InvokePattern is wired up properly, you’ll have to hook up an event handler on the Click event of the button.

Of course, you don’t have to do exactly what I did in my example.  You can use whatever you need and whatever you have available in the I*Provider implementations.  Do it the way that makes sense for you.   You don’t even have to implement these interfaces on the control class directly.  I did it because it was convenient in this case, but it might not make sense for you.  You can use any class you want, it’s entirely up to your design.  Just change what you return from GetPattern.  However, if you put the interfaces on the control class, like I’m doing, I would recommend explicitly implementing these interfaces, since it’s unlikely you’ll want to expose these methods to anyone using the class normally.

There’s just one little annoying thing left about the control.  If you look in UISpy, you’ll see that our BoxAndButton control now supports the Invoke and Value patterns, but we still have the Text Box and Button as children, and those also have the Value and Invoke patterns on them.  They’re not needed anymore.  Let’s axe them, shall we?

If you’ll look at the list of functions available to override on our AutomationPeer, you’ll see one called GetChildrenCore.  You can override that function to tell UIA what your children should be.

protected override List GetChildrenCore()
{
    return new List<AutomationPeer>();
}

Obviously, it’s more useful to tell the system that you have children that it wouldn’t ordinarily know about (Like in the case of an owner drawn control), than it is to disown the children that you do have.  So, when using this function in a normal situation, you’re probably going to add something to the list that’s being returned.

Now, if you run the app and look at it in UISpy, you’ll see that the children of the BoxAndButton control have been written out of the will.  You’ll also likely see that, in fact, UISpy is having trouble finding our control.

Element : Element details not available.
Name : TreeValidationException
Message : UI Automation tree navigation is broken. The parent of one of the descendants exist but the descendant is not the child of the parent
Stack Trace : at UISpy.Base.Tree.BuildTree(AutomationElement element)
    at UISpy.Base.Tree.BuildTreeAndRelatedTrees(AutomationElement element)

Well now…  That sucks.  See, what’s happening is that UISpy is finding the real Text Box and real Button controls that still exist and trying to select them, but it’s unable to find their parents in the tree.  In other words, we’ve tampered with the natural order of things and are making UISpy go all loopy.  Good luck with that, I’m gonna end on this high note.

You can find my sample application here (A version that’s not hOrking UISpy):  http://mathpirate.net/log/UploadedFiles/SWASilverlightApp1/SWASilverlightAppTestPage.html

You can get the source code out of SVN here:  http://www.mathpirate.net/svn/Projects/SWAExample/SWASilverlightApp/

  1. I haven’t done anything with SWA and Windows Forms or Win32 yet, so I don’t know what that entails, but from a quick glance at the docs, it’s a bit more complicated. []
  2. And honestly, you don’t want them to.  It would be insane to try to navigate through all the different panels that appear everywhere.  One of the Silverlight 2 Betas had all of the panels and things exposed and it was frightening. []
  3. Why don’t we?  Because it’s actually useless to do so.  A text box and a button are perfetly good controls to contain.  However, treat this as an exercise in exposing a completely custom owner drawn control to System.Windows.Automation. []
  4. If you’ll remember from the previous article, a large serving of wrath was thrown toward the designers of SWA for not using interfaces so finding out that they do, in fact, use interfaces on objects internally was a bit of a relief, but also upsetting.  I still can’t understand why the person responsible for the design on this side didn’t smack the designers of the consuming side around until they used interfaces, too.  And if it’s the same designer, then they need to smack themselves around until they make up their mind. []

October 10, 2009   No Comments

SWA: Straight Outta Redmond

Back in .Net 3.0, Microsoft slipped a little library into the Framework that was missed by most people. That library, System.Windows.Automation, was intended to allow direct programmatic access to MS UIA from .Net. UIA, or UIAutomation is Microsoft’s replacement for MSAA (Microsoft Active Accessibility), and is designed to expose window controls to accessibility devices, like screen readers for the blind. However, since it exposes all manner of window controls and operations through a direct programming interface, UIA is one of the most useful tools for UI Testers who are trying to write automation for Windows applications.1 In other words, if you want to write a program to drive the UI of another program for your automated tests, then System.Windows.Automation is where to begin.

Where to begin with SWA itself is a bit of a mystery, though. The documentation was sparse and confusing when I first started playing around with it, so most of what I know what a result of tinkering until it worked or searching the Internet until I found a similar confused person that had already solved the same problem and posted the solution. That’s why I’m writing this tutorial. I found that SWA had a learning cliff to overcome, so I hope to spare you some of the same trouble by explaining what I had to discover the hard way.

First, though, let’s take a trip through a highly opinionated aside about the general design of SWA. .Net 1.0 was beautiful and clean and easy. Everything made sense. .Net 1.1 cleaned more things up and made it even better. Then .Net 2.0 came out and the awesome was truly solidified by the introduction of generics and anonymous delegates. After that, everything fell apart. .Net 3.0 and 3.5 saw the introduction of bizarre things like WCF and Linq and semi-awesome, yet complicated things like WPF. It was like all the people who had guided the .Net Framework up through version 2 and shaped it with the mantra “Make it easy, make it clean” had been thrown out in a coup and replaced by an evil cabal of leftover COM programmers who wanted to restore the glory of MFC and ATL.

System.Windows.Automation seems to have been designed by one of these groups. At it’s core, it seems that the people who wrote it had never heard of things like interfaces or the MAUI libraries.2 When you work in SWA, you get generic AutomationElement objects, but they’re not the control type you want, and you can’t case them to the control type you want. There’s no Button class or TextBox class that you can get. Instead, you have to ask the element for the control pattern you’re interested in, and only then will you get an object that you can use. When I was first working with SWA, this approach made absolutely no sense to me. Why can’t I get an AutomationElement that I can cast to ButtonElement or IButtonElement and use directly? Why do I have to ask for these control patterns and get back some strange type? Then, about a year ago, I discovered what the model was. At that time, I was developing a toolbar for Internet Explorer, which requires extensive use of COM. This was my first exposure the the special brand of hell that is COM programming, as I mercifully had spent the late 90’s in the sheltered arena of school, and by the time I joined the real world, everyone was using .Net. When I saw QueryInterface in COM and what it was doing, it struck me that it was exactly the same thing that I’d had to do with AutomationElement.GetCurrentPattern().

The people who designed System.Windows.Automation had brought QueryInterface into the world of .Net. There is a special place in Hell for doing things like that.

Anyway, the utility of the library is enough to overcome any stupid choices in its design. So, let’s get going!

First, you’ll want to get the UISpy tool. You may already have it buried in your Visual Studio installation, but if not, head over to the MSDN and try to track it down. It’s usually part of the Windows SDK or .Net SDK, except when Microsoft apparently forgets to include it. I got mine from the Developer Tools in the Vista SDK Update, but you might want to see if there’s a better place by the time you read this.

UISpy is a lot like Spy++, which has been around since at least the VS6 days. You can walk the window control tree and find window handles and window classes and other things like that, just like Spy++, but it’s been extended with support for UIA. Once you get it, take it for a spin around your window hierarchy.3 I’d suggest turning on the “Hover” tracking mode, which lets you select a window to inspect by holding the CTRL key and hovering over a window. It’ll sometimes take a while to get to the control you’re selecting, but that’s what a full tree walk will do to you.

UISpyBasic

This screenshot shows the basic window of UISpy. On the left is the control hierarchy. On the right are the properties of the currently selected window. You’ll become very familiar with some of these properties and you’ll decide that other properties are completely useless. The determination of which fields are useful or useless is left as an exercise to the reader.

UISpyCalculatorWindow

Here’s an example of what UISpy will tell you about the window used by the Windows Calculator. On the left side, you can see that it has a bunch of child controls. They’re marked as check boxes, radio buttons, buttons, even an edit box. If the window were bigger, you’d also see that it has a title bar and menu bar. You can get information on all of these objects and interact with most of these objects. Pretty much anything you see here is something you can use SWA to control. On the right side are the properties for the selected object. Things like AutomationId, Name, and ClassName are generally good identifiers, while fields like ProcessId and RuntimeId may change from run to run.

At the bottom of the property list are the Control Patterns supported by this element. Control Patterns are how SWA interacts with controls. For instance, in this screenshot, it shows that the main calculator window supports the Transform pattern and the Window pattern. The Transform pattern means that you may be able to perform move, rotate, and resize actions on this object. In this case, the calculator reports that you can move it, but that you can’t resize or rotate it. If you right click on the element in the tree on the left side and select “Control Patterns” from the menu, you’ll get a dialog where you can trigger some of the methods on a supported control pattern. When you get to writing your automation program, you’ll ask for one of these Control Patterns and be able to use it to drive the control. There are other ControlPatterns, like “Invoke” for buttons and “Text” or “Value” for things like text boxes. You’ll probably find that you only use a small handful of these patterns regularly.

If you looked at that last screenshot of UISpy, you probably noticed that odd rectangle floating over the screen. If you’re playing the home game along with me, then you’ve probably had your own rectangle floating about. It’s the UISpy highlight window, showing the outline of the last window you selected.4 It’ll go away if you close UISpy, but I’ve found that you’ll tune it out. Sometimes I’ve had it linger around for several hours after I get the information I’ve needed, until someone comes by and asks me what that strange red thing on my screen is. If you move the mouse over the edge of the box, you’ll get a tooltip with a little bit of information on the selected window.

Anyway, we’ve been playing around in the ever-important UISpy, but we haven’t gotten around to actually using SWA yet. Giving that’s what this article is supposed to be about, let’s get to it.

My example code can be pulled from SVN here: http://www.mathpirate.net/svn/Projects/SWAExample/

I’m creating a Console App, but there’s no reason you can’t use SWA in a test DLL or a Windows app or whatever.5 It’s just a .Net library.

To begin, add references to UIAutomationClient, UIAutomationClientsideProviders and UIAutomationTypes.6

SWAReference

After that, add a using for System.Windows.Automation.

Now you’re ready to get rolling. The main class you’ll be using is the AutomationElement class. The entire control hierarchy is made up of AutomationElements. Think of them as an analog to the Control class in Windows Forms. It’s important to note that you’ll never create an AutomationElement yourself. AutomationElement does not have any constructors on it that you’re allowed to use. Instead, you’ll use methods on AutomationElements to give you other AutomationElements. The AutomationElement class has a number of static methods on it. Type “AutomationElement.” to bring up your good friend Intellisense to tell you what to do.

The first thing you’ll notice is that the AutomationElement has a hell of a lot of static fields on it. They’re pretty much all Dependency Property garbage. If you don’t know what Dependency Properties are, think of them as an enum value that will let you pass the name of a property to access on an object. (And if you do know what they are, then you know that the explanation I gave is horribly oversimplified and pretty much wrong and misleading. SHHH! Don’t tell anyone!) You can ignore them for now, but they’ll come back to haunt us in a bit. Right now, there are only four things you’ll care about on Automation Element:

  • RootElement: A reference to the root automation element, otherwise known as the main desktop window. Everything else is a child of this element. If you have to search for an element, you probably want to use this element as your base, at least until you find a better parent element to operate from.
  • FromHandle(IntPtr hwnd): If you have the window handle to the window or control you want to work with, use this method to grab an AutomationElement. It’ll be faster than searching for it and it will also give you exactly what you were looking for. I almost always start here rather than starting with a search, because you really don’t want to walk the entire control tree looking for something if you don’t have to.
  • FocusedElement: If the element you’re interested in has focus, use this and go straight there. No searching and no window handles necessary.
  • FromPoint(Point pt): Need the control at 132, 526? Use this. I’m not sure if this will do a tree walk, so use at your own risk.

To begin my example, I’m going to launch an instance of the Windows Calculator application, then use FromHandle to grab the Calculator window and print out some information on it. (BTW, I’m running XP, so if you’re playing along at home, the calculator may be different in your operating system.)

//Launches the Windows Calculator and gets the Main Window's Handle.
Process calculatorProcess = Process.Start("calc.exe");
calculatorProcess.WaitForInputIdle();
IntPtr calculatorWindowHandle = calculatorProcess.MainWindowHandle;

//Here I use a window handle to get an AutomationElement for a specific window.
AutomationElement calculatorElement = AutomationElement.FromHandle(calculatorWindowHandle);

if(calculatorElement == null)
{
throw new Exception("Uh-oh, couldn't find the calculator...");
}

//Walks some of the more interesting properties on the AutomationElement.
Console.WriteLine("--------Element");
Console.WriteLine("AutomationId: {0}", calculatorElement.Current.AutomationId);
Console.WriteLine("Name: {0}", calculatorElement.Current.Name);
Console.WriteLine("ClassName: {0}", calculatorElement.Current.ClassName);
Console.WriteLine("ControlType: {0}", calculatorElement.Current.ControlType.ProgrammaticName);
Console.WriteLine("IsEnabled: {0}", calculatorElement.Current.IsEnabled);
Console.WriteLine("IsOffscreen: {0}", calculatorElement.Current.IsOffscreen);
Console.WriteLine("ProcessId: {0}", calculatorElement.Current.ProcessId);

//Commented out because it requires another library reference. However, it's useful to see that this exists.
//Console.WriteLine("BoundingRectangle: {0}", calculatorElement.Current.BoundingRectangle);

Console.WriteLine("Supported Patterns:");
foreach (AutomationPattern supportedPattern in calculatorElement.GetSupportedPatterns())
{
Console.WriteLine("\t{0}", supportedPattern.ProgrammaticName);
}

(Apologies for the horizontal scrollies…)

The example above will output something like this, although your specific values may vary.

--------Element
AutomationId:
Name: Calculator
ClassName: SciCalc
ControlType: ControlType.Window
IsEnabled: True
IsOffscreen: False
ProcessId: 3660
Supported Patterns:
         WindowPatternIdentifiers.Pattern
         TransformPatternIdentifiers.Pattern

As you may have noticed, the information that was just printed out here is the same that was in UISpy, although the output in my example has been edited for time. Of course, if you ran this sample, you probably also noticed that the Calculator remains open after the app exits. That’s not very polite. Let’s clear that up now.

One of the patterns listed supported by the main window is, surprisingly enough, the WindowPattern. If you looked at what methods are on the WindowPattern back when you were playing around in UISpy, you may have noticed that there’s a method called Close which you can call. Something tells me that method will be useful for our current situation. I think I’m going to give it a spin.

(By the way, for my sanity and to make the examples more compact, I’m going to be moving parts of the sample code into helper functions as I go. For instance, all of those WriteLine statements have been put into a method called “PrintElementInfo”. So, if you see an odd function in the samples, that’s probably all it is. I’m not going to intentionally leave out important code that you’ll need to make things work.)

In order to get a control pattern off an AutomationElement object, you have to call QueryInt- er, I mean, you have to call GetCurrentPattern on the object. The GetCurrentPattern method takes an AutomationPattern object. AutomationPattern has a static LookupById() method on it, which is completely worthless to you, and, like everything else we’ve seen, has no public constructor. So, WTF, where are you supposed to get the pattern from? In a complete failure to make the code usable from Intellisense alone, you have to use a static member off of the type of the pattern you want to retrieve. You want to use a text box? Use TextPattern.Pattern. Need to play with a dropdown combo box? SelectionPattern.Pattern. We want the WindowPattern7, so we’re going to call GetCurrentPattern(WindowPattern.Pattern). Of course, GetCurrentPattern returns the pattern object as an ever-helpful IUnkno- I mean, object type, so you have to cast it.

Once you have the WindowPattern object, a quick examination of its members shows that it has a Close() method. Calling it should close the calculator and clean up after our program.

Here’s what those lines look like in code. Add them to the end of the sample and watch the window disappear!

//Get the WindowPattern from the window and use it to close the calculator app.
WindowPattern calculatorWindowPattern = (WindowPattern)calculatorElement.GetCurrentPattern(WindowPattern.Pattern);
calculatorWindowPattern.Close();

So, there you go! That’s all you need to know about System.Windows.Automation! You can find a window and close it, therefore, you can do anything! Have at it!

Or… Not… Let’s continue, shall we?

Since this is a calculator, let’s calculate something. Something too complex to do by hand, something we need the full power of a modern multi-core computer to figure out. Something like “What do you get if you multiply six by nine”, perhaps? To begin, you’ll need to list out the steps that you take when you manually perform this action.

  1. Open Calculator. (Hey! We did that already. We’re awesome.)
  2. Type “6”.
  3. Press the Multiplication button.
  4. Type “9”.
  5. Press the Equals button.
  6. Read the result and know the answer.

So, the first thing we need to do is type “6” into the calculator text box. So, we need to find the Calculator’s text box. Let’s bring up our friend UISpy to find out how to reference that box.

CalculatorEditBox

So, we’ve got a class name of “Edit”, a control type “ControlType.Edit” and an AutomationID of “403”. That should be enough to find the control we’re looking for, so let’s get to the code and grab it.

(BTW, obviously, this code will have to go before the Close method call we added. You can’t use a calculator that isn’t running anymore…)

An AutomationElement object has two methods that let you search the control hierarchy for the elements you’re interested in: FindAll and FindFirst. Since we’re only expecting a single text box, we’ll be using FindFirst. Intellisense will show you that FindFirst has two parameters: FindFirst(TreeScope scope, Condition condition);

TreeScope is an enum, so it’s very Intellisensible and clear. You’re like to use the values “Children” and “Descdendants” the most. Children limits the search to the immediate children of the element you’re searching on, while Descendants are the children and the children of children and so on, all the way to the bottom. I prefer to use Descendants by default, unless I know that I want something else. It should be noted that the Parent and Ancestor scopes are listed as “Not Supported”, so don’t expect to be able to use them. Anyway, we’ll use TreeScope.Descendants here.

Condition, on the other hand, offers no Intellisense help for you at all. That’s because Condition is an abstract base class of many conditions. There’s PropertyCondition, which will match based on a property value, and And/Or/Not conditions, which can be used to group multiple conditions logically. Off of the Condition class are static True and False conditions. And, if you need your own sort of crazy condition, I think you can derive from Condition and make one yourself, although I would question your MentalCondition if you were to do that without good reason. PropertyCondition is the only stock condition that you’ll find yourself using, and it’s also the only one that requires any kind of in depth explanation.

Warning! We are about to be haunted by Dependency Properties!

PropertyCondition allows you to specify the value you want a property to match for your control tree search. PropertyCondition actually has a constructor, to which you pass an AutomationProperty and a value to match. The AutomationProperty parameter is where Dependency Properties come in. You have to pass in one of the static values from the AutomationElement that I told you to ignore earlier. If you look, you’ll find that there’s one of these static values for each of the properties on an AutomationElement instance. So, if you want to find an AutomationElement that has an AutomationId of 403 (Which, coincidentally, is what we want to find), then you’ll use AutomationElement.AutomationIdProperty in your PropertyCondition. Like so:

PropertyCondition editBoxAutomationIDProperty = new PropertyCondition(AutomationElement.AutomationIdProperty, "403");

(Note that the value “403” is passed as a string. That’s because AutomationId is a string, and the types need to match. You’ll have to make sure that you’re passing the same type as the property yourself, otherwise you’ll get an exception at runtime.)

Then you pass that condition to your FindFirst call and presto, you get the element you’re looking for. (Or null or possibly an exception or maybe some other element that happens to match what you asked for or that isn’t what you wanted…) I’m going to do that now, but first, we need something to call FindFirst on. Doing a tree search can be very slow, on the order of 10+ seconds per call in some cases, so you want to limit the scope of the search. If you don’t have any element to go on, then you can use the static AutomationElement.RootElement property that I mentioned earlier, and that will look through EVERYTHING. However, we already have the main calculator window, so let’s just assume that anything in the window, including the edit box, will be a descendant of that window and use that as the starting point of our search. That gives us this:

PropertyCondition editBoxAutomationIDProperty = new PropertyCondition(AutomationElement.AutomationIdProperty, "403");
AutomationElement editBoxElement = calculatorElement.FindFirst(TreeScope.Descendants, editBoxAutomationIDProperty);

Printing out the element that’s returned gives you something like this:

--------Element
AutomationId: 403
Name:
ClassName: Edit
ControlType: ControlType.Edit
IsEnabled: True
IsOffscreen: False
ProcessId: 1884
Supported Patterns:
         ValuePatternIdentifiers.Pattern
         TextPatternIdentifiers.Pattern

Looks like the one we want, so let’s use it. We want to set the value of the control, so let’s grab the ValuePattern for the text box and use the SetValue method to set the value of the element to “6”, our first number.

ValuePattern editBoxValuePattern = (ValuePattern)editBoxElement.GetCurrentPattern(ValuePattern.Pattern);
editBoxValuePattern.SetValue("6");

Now you hit run and…

System.InvalidOperationException was unhandled
   Message="Exception of type 'System.InvalidOperationException' was thrown."
   Source="UIAutomationClient"
   StackTrace:
        at System.Windows.Automation.ValuePattern.SetValue(String value)
        at SWACalculatorExample.Program.Main(String[] args) in E:\svn\Projects\SWAExample\SWACalculatorExample\Program.cs:line 28
        at System.AppDomain._nExecuteAssembly(Assembly assembly, String[] args)
        at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
        at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
        at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
        at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
        at System.Threading.ThreadHelper.ThreadStart()
   InnerException:

AW CRAP IT BROKE

You followed the directions, you got an instance of the correct pattern, it should have worked, but didn’t. So what happened? Well, if you took the time to investigate the edit box in UISpy, you would have noticed this little tidbit down in the information about the Value pattern:

IsReadOnly: "True"

Lovely, so the edit box is read-only, which means we can’t assign the value in that way. Now, for a confession: I noticed that the box was read-only early on, but I still dragged you down this dead end for three reasons:

  1. I wanted to teach you that life sucks sometimes.
  2. I wanted to teach you how to use ValuePattern to set the value of a text box that’s not read-only.
  3. I wanted to illustrate one of the most important skills for an automation developer to have: The ability to come up with a Plan B workaround when the automation tool fails you. Because it will fail you. Frequently. And at the most annoying times.

The most common Plan B is to use System.Windows.Forms.SendKeys to send a series of keystrokes to do what you want. (If you’re using the VS Testing stuff, there’s also a Keyboard class that exposes essentially the same thing.) It’s less reliable than the SWA methods, so use them when you can, but when you can’t, SendKeys might just do the trick. It has a wide grammar for sending complex keystrokes and it’s well work browsing the documentation to see what it can do, but for now, we just need to type the number “6”. Focus the edit box first, using the SetFocus() method on the edit box AutomationElement. That will make the key strokes go where we want, then use SendKeys.SendWait(“6”) to simulate the keypress. (SendKeys also requires a reference to System.Windows.Forms, for don’t forget to set that up if you don’t have one already.)

//Since the direct ValuePattern method didn't work, this is Plan B, using SendKeys.
editBoxElement.SetFocus();
SendKeys.SendWait("6");

If you run it now, well, you’ll see the calculator open and close really fast, but trust me, if you watch really really carefully (or comment out the Close line), you’ll see that the edit box will have the number 6 that we just “typed”.

Now that that’s out of the way, it’s time to hit the multiplication button. You grab the button the same way you grabbed the edit box: Look in UISpy for something uniquely identifying the control, then run a tree search and get what you’re looking for. In this case, the multiply button has a meaningful name that we can use: “*”. Again, it will be a PropertyCondition, but this time, we’ll have to use the AutomationElement.NameProperty as the property to match against. Once you have the button, you’ll want to click on it. The button click action is handled by an InvokePattern, so ask the button for its InvokePattern and call Invoke on it to click the button.

//Grab the multiplication button by name, then click on it.
AutomationElement multiplyButtonElement = calculatorElement.FindFirst(TreeScope.Descendants, new PropertyCondition(AutomationElement.NameProperty, "*"));
InvokePattern multiplyButtonInvokePattern = (InvokePattern)multiplyButtonElement.GetCurrentPattern(InvokePattern.Pattern);
multiplyButtonInvokePattern.Invoke();

Now we need to enter the “9” and press the “=” button to get our answer. I’ll leave that up to you, since that’s pretty much a copy and paste of the last two things I just got done doing. You may even want to take this opportunity to refactor. You’ll find that using SWA will result in tons of areas in your code where one logical action takes three or four long lines of code, and you’ll end up with those lines copied all over the place. For instance, by parameterizing the last block of code, you can make a function called “InvokeChildButtonByName” and turn those three lines of ugly into one line that makes sense. I would strongly recommend moving as much of your SWA related code into various helper functions or classes because, quite simply, SWA code is ugly and will distract from what you’re actually trying to do.

At this point, we’re finished with the first 5 steps in performing the calculation, and are left only with the sixth and final step: Reading the result. If you recall, we’ve already seen how to pull the ValuePattern from the edit box. We couldn’t use it at the time, but that’s because it was read-only. Now that we only want to read from it, its read-onlyness shouldn’t be a problem. Once you have the pattern, in order to get the value, you’ll have to use the .Value property off the instance. However, .Value isn’t on the pattern itself. To get to it, you either have to go through the .Current or .Cached properties first. I’m not entirely sure what all of the features and limitations and differences are between .Current and .Cached, but I know that .Current usually works and that .Cached usually doesn’t, therefore I’d strongly recommend using .Current.

ValuePattern editBoxValuePattern = (ValuePattern)editBoxElement.GetCurrentPattern(ValuePattern.Pattern);
string editBoxResult = editBoxValuePattern.Current.Value;

At this point, the string editBoxResult will have the answer to the question.

Six by nine: 54.

That’s it. That’s all there is.

If you want the full source code to the example, go here.

At this point, you should know about as much about SWA as I learned in three days of fighting with the technology and documentation. Points to remember:

  • Use UISpy to inspect windows and controls to find some uniquely identifying set of information.
  • Use FromHandle, FindFirst, or FindAll to grab elements that you’re interested in using.
  • Use GetCurrentPattern to ask for an object that you can use to interact with the control in a specific way. ValuePattern and InvokePattern are two of the most common ones you’re likely to use, but become familiar with the others for when you find a different control.
  • If all else fails, use a dirty hack workaround.8

Of course, this posting only talks about how to use SWA to write automation. It completely omits the other half of the equation, which is to write a UI application that can play nicely with SWA, which will lead to happier testers. The example of using Calculator was simple. In a lot of applications, you’ll run into controls without automatable IDs, leaving you to do nasty things like grab all of the Edit controls in a window and selecting the third one from the list and hoping that it’s actually the edit box that you want. You’ll find buttons that don’t actually click, forcing you to perform crazy workarounds. It doesn’t have to be like that. When writing an application, you can implement some support for SWA, making it easier to reference the elements in your tests later. Perhaps I’ll cover that in a later post.

  1. It’s also supported by Silverlight, so all you Internet testers don’t have to feel left out. []
  2. MAUI was an internal library that was widely used for UI Automation by teams within Microsoft back when I did my time as a contractor in 2004. It was a .Net wrapper which had classes for every type of UI control. It was simple to understand and use, largely because if you wanted to interact with a button or a text box, there was a Button class and a TextBox class that you could program against. It was almost a mirror to Windows Forms. []
  3. You’ll probably find that it doesn’t work for web pages, because they’re custom drawn panels and not real Windows controls. It won’t even work for Firefox at all, because that browser is laid out entirely in XUL. I’ll probably talk about browser automation in a later article. []
  4. Which is usually the main editor window in Visual Studio, because every time you do a CTRL-C, CTRL-V, you’ll be hitting the CTRL hotkey, which makes UISpy select the window the mouse is over. []
  5. There is, however, a restriction on the environment where you can use SWA. Since it interacts with window controls and UI things, you must have a UI present for it to interact with. That means if you try to use SWA in a headless session, like the one your automated tests running on your build server will be executed in, you’re screwed. If you can, you may need to have your automated tests run on a server that’s always logged into a window session and unlocked. If not, you can do what I did, which was write a program to launch a virtual machine on an instance of Virtual Server, then remotely execute your tests within that virtual machine. Although complicated, you may actually find it easier to implement that solution than it is to get the corporate computing security policy altered so that it doesn’t get in the way of doing your job. []
  6. And why these all aren’t under one unified System.Windows.Automation library, I don’t know. Or at least named System.Windows.Automation.Client, .ClientsideProviders, and .Types. More evidence that the SWA team probably hadn’t used .Net before writing this library and that all the people keeping things sane had fled the scene. []
  7. Not to be confused with the Willow Pattern []
  8. And when that fails, then you can use fire. []

September 27, 2009   3 Comments