Automating procedures using Sikuli

A few days ago, I came across an interesting open-source GUI testing application called Sikuli. This tool promises to automate just about any procedure involving graphical elements displayed on the screen, using a vision engine to intelligently match regions of your GUI display to widgets where you might click, drag, or type things. Sikuli is distributed under the permissive MIT License.

Unlike other automation tools I've used, such as WinRunner, QuickTest Pro, SilkTest, and LoadRunner, Sikuli does not depend on API-level access to the technology used in the target application; it works purely from the pixels displayed on the screen. This could allow automation of tasks in graphical environments that those other tools do not support.

The demos are quite impressive. Sikuli scripts are written in Jython (Python running on the JVM), which gives them the full power of a high-level, general-purpose language. This also sets Sikuli apart from the proprietary tools mentioned above, whose scripting languages I have found far more limited. The Sikuli IDE looks much like a regular text editor, except that screenshot regions can be inserted directly into the code. This makes for some very intuitive scripts.

However, once I tried Sikuli, I became less hopeful. I tried to automate a few basic tasks, including:

  • Run Notepad, type "Hello world", then select the text and make it boldface
  • Run Firefox, load Google Maps, and zoom into Colorado Springs
  • Run Internet Explorer, browse to a VPN login page, fill in my username and password, and log in

I had some measure of success with each of these, but I continually ran into problems with screen elements not being found and typed text failing to be entered. After starting up an application, it seems necessary to wait for a particular graphical element in the application to show up before doing anything else. I tried waiting on several different elements, including the menu bar, the title bar icon, and the URL entry field, with inconsistent results.
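The "wait for an element to show up" pattern can be sketched in plain Python as a small polling helper. This is only an illustration, not Sikuli's own mechanism: wait_until and its parameters are names I made up, and the predicate is a stand-in for whatever check detects the element. In a Sikuli script the predicate might wrap a call such as exists("menubar.png", 0), where the image name is hypothetical; Sikuli's built-in wait() also accepts a timeout.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Poll predicate() until it returns a truthy value or the timeout expires.

    Returns the truthy value, or None if the timeout was reached first.
    """
    deadline = time.time() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.time() >= deadline:
            return None
        time.sleep(interval)


if __name__ == "__main__":
    # Stand-in for a screen check: the "element" appears on the third poll.
    polls = {"count": 0}

    def element_visible():
        polls["count"] += 1
        return polls["count"] >= 3

    found = wait_until(element_visible, timeout=2.0, interval=0.01)
    print("found" if found else "timed out")
```

Returning None on timeout, rather than carrying on regardless, lets the calling script decide whether to abort or try a different element.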

Once the applications started up, I had other difficulties. I got "Hello world" to be entered in Notepad with no problem, but then couldn't get it to be selected or boldfaced. I also had problems with automatically entering URLs into Internet Explorer and Firefox. I did finally manage to get Google Maps to load, and the graphical map interface worked pretty well; however, the seemingly simpler task of logging in to the VPN failed when, after the page loaded, the username and password fields wouldn't fill in. Sikuli didn't even notice that anything was wrong; it just went ahead and tried to log in without a username and password.
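One way to keep a script from charging ahead after a silent failure like this is to pair every action with a check that it actually took effect. The following is a minimal sketch in plain Python, not anything Sikuli provides: run_step, StepFailed, and the fake form are all hypothetical names for illustration. In a real Sikuli script the verify callable might look for a screenshot of the filled-in field.

```python
class StepFailed(Exception):
    """Raised when an automation step cannot be confirmed."""


def run_step(name, action, verify):
    """Perform action(), then require verify() to confirm it took effect."""
    action()
    if not verify():
        raise StepFailed("step %r was not confirmed" % name)


if __name__ == "__main__":
    # Fake login form standing in for the real page.
    form = {"username": ""}

    def type_username():
        pass  # simulates typed text that silently goes nowhere

    def username_present():
        return form["username"] == "alice"

    try:
        run_step("enter username", type_username, username_present)
    except StepFailed as err:
        # Stop here instead of submitting an empty login form.
        print("stopping: %s" % err)
```

Raising an exception makes the failure loud at the step where it happened, instead of surfacing later as a rejected login.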

Finally, and perhaps most troubling of all, in a few cases I ran the exact same script twice in a row and got two different results. The first time, the URL wouldn't be entered; the next time it would. Or the first time the "File" menu would open correctly, and the next time it wouldn't. This is something you never want to see in automation, but unfortunately I have seen it many times (and yes, I've seen it happen quite often in the commercial automation tools).
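Two runs are not much of a sample, so a rough way to put a number on this kind of inconsistency is to repeat a step many times and tally the outcomes. Here is a minimal plain-Python sketch; tally_outcomes is a made-up helper, and flaky_step is a stand-in for a real automation step that returns True on success.

```python
def tally_outcomes(step, runs=10):
    """Run step() repeatedly and count how often each outcome occurs."""
    outcomes = {}
    for _ in range(runs):
        try:
            key = "ok" if step() else "failed"
        except Exception as err:
            # Count each exception type as its own outcome.
            key = type(err).__name__
        outcomes[key] = outcomes.get(key, 0) + 1
    return outcomes


if __name__ == "__main__":
    # Stand-in for a flaky step: succeeds on every other run.
    state = {"n": 0}

    def flaky_step():
        state["n"] += 1
        return state["n"] % 2 == 0

    print(tally_outcomes(flaky_step, runs=6))
```

A step that succeeds in only some of its runs is a good candidate for a longer wait or a different target image before it goes into a larger script.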

Overall, I must say I'm not sure Sikuli is ready for prime-time automation, particularly in domains such as web applications, where a more specialized tool like Selenium would be more reliable. But it does look very promising. It's possible that I just didn't play with it long enough to learn its subtleties, and I expect that with a little patience it could turn out to be very useful.