Automating procedures using Sikuli

A few days ago, I came across an interesting open-source GUI testing application called Sikuli. This tool promises to automate just about any procedure involving graphical elements displayed on the screen, using a vision engine to intelligently match regions of your GUI display to widgets where you might click, drag, or type things. Sikuli is distributed under the permissive MIT License.

Unlike other automation tools I've used such as WinRunner, QuickTest Pro, SilkTest, and LoadRunner, Sikuli does not depend on API-level access to the technology used in the target application, and instead works purely based on the pixels displayed on the screen. This could allow automation of tasks in graphical environments whose API is unsupported by these other tools.

The demos are quite impressive. Sikuli scripts are written in Jython, giving full access to the power of the high-level, general-purpose Python language. This also sets it apart from the proprietary tools mentioned above, which have extremely limited scripting languages. The Sikuli IDE looks much like a regular text editor, except that screenshot regions can be inserted directly into the code. This makes for some very intuitive scripts.

However, once I tried Sikuli, I became less hopeful. I tried to automate a few basic tasks, including:

  • Run Notepad, type "Hello world", then select the text and make it boldface
  • Run Firefox, load Google Maps, and zoom into Colorado Springs
  • Run Internet Explorer, browse to a VPN login page, fill in my username and password, and log in

I had some measure of success with each of these, but I continually ran into problems with screen elements not being found, and typed text failing to be entered. After starting up an application, it seems necessary to wait for a particular graphical element in the application to show up. I tried several variations on this, including the menubar, titlebar icon, or URL entry field, with varying and inconsistent results.
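
The "wait for an element to show up" step described above can be sketched as a small polling helper. This is a minimal sketch in plain Python, not Sikuli's own mechanism: `wait_until` and `fake_menubar_visible` are hypothetical names introduced for illustration, and the fake check stands in for whatever screen test the script would really perform (in a Sikuli script that might be a call such as `exists()` on a screenshot of the menubar).

```python
import time

def wait_until(condition, timeout=10.0, interval=0.5):
    """Poll condition() until it returns a truthy value or the timeout expires.

    Returns the truthy result, or None if the timeout was reached.
    """
    deadline = time.time() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.time() >= deadline:
            return None
        time.sleep(interval)

# Hypothetical stand-in for a screen check: the "menubar" appears on the third poll.
polls = []

def fake_menubar_visible():
    polls.append(1)
    return len(polls) >= 3

found = wait_until(fake_menubar_visible, timeout=2.0, interval=0.01)
timed_out = wait_until(lambda: False, timeout=0.05, interval=0.01)
```

Returning `None` on timeout, rather than carrying on, lets the script decide explicitly what to do when the application never finishes starting.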

Once the applications started up, I had other difficulties. I got "Hello world" to be entered in Notepad with no problem, but then couldn't get it to be selected or boldfaced. I also had problems with automatically entering URLs into Internet Explorer and Firefox. I did finally manage to get Google Maps to load, and the graphical map interface worked pretty well; however, the seemingly simpler task of logging in to the VPN failed when, after getting the page to load, the username and password fields wouldn't fill in. Sikuli didn't even realize anything was wrong, but just proceeded with trying to log in without a username and password.
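
One way to keep a script from continuing past a step that silently failed, as happened with the empty login fields, is to pair every action with an explicit check of its result. The sketch below is plain Python with hypothetical names (`checked_step`, `StepFailed`, and the fake `form` dictionary); in a real script the check would have to be some observable screen state, which is an assumption about what the target page makes visible.

```python
class StepFailed(Exception):
    """Raised when an action ran but its expected effect was not observed."""

def checked_step(name, action, verify):
    """Run action(), then require verify() to be truthy before continuing."""
    action()
    if not verify():
        raise StepFailed("step %r did not take effect" % name)

# Hypothetical stand-in for the login form; the "typing" never reaches it.
form = {"username": ""}

def type_username_but_lose_keystrokes():
    pass  # simulates typed text that never arrives in the field

def username_filled():
    return form["username"] != ""

try:
    checked_step("enter username", type_username_but_lose_keystrokes, username_filled)
    outcome = "continued"
except StepFailed as err:
    outcome = "stopped: %s" % err
```

Here the script stops at the username step instead of going on to submit an empty form.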

Finally, and perhaps most troubling of all, in a few cases I ran the exact same script twice in a row and got two different results. The first time, the URL wouldn't be entered; the next time it would. Or, the first time the "File" menu would correctly be opened, and the next time it wouldn't. This is something you never want to see in automation, but unfortunately I have seen it many times (and yes, I've seen it happen quite often in the commercial automation tools).
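
A common workaround for steps that succeed on one run and fail on the next is to retry them a bounded number of times while recording how many attempts were needed. The following is a minimal sketch under that assumption; `run_with_retries` and `open_file_menu` are hypothetical names, and the flaky step is simulated rather than driven by a real GUI.

```python
def run_with_retries(step, attempts=3):
    """Call step() up to `attempts` times.

    Returns (succeeded, tries_used) so that flaky steps stay visible in logs.
    """
    for n in range(1, attempts + 1):
        if step():
            return True, n
    return False, attempts

# Hypothetical flaky step: fails on the first try, succeeds on the second.
outcomes = iter([False, True])

def open_file_menu():
    return next(outcomes)

ok, tries = run_with_retries(open_file_menu)
never_ok = run_with_retries(lambda: False, attempts=2)
```

Retrying hides the inconsistency rather than fixing it, so reporting the attempt count is what keeps the underlying flakiness measurable.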

Overall, I must say I'm not sure Sikuli is ready for prime-time automation, particularly in domains such as web applications where a more specialized tool like Selenium would be more reliable. But it does look very promising--it's possible that I just didn't play with it long enough to learn its subtleties, and I expect with a little patience it could turn out to be very useful.

Comments

Hasn't Sikuli already been done and rejected?

Hello,

I just discovered Sikuli, and the first thing that came to mind was 'have we come full circle?' I recall that in the early days of GUI test automation, using screenshots for verification was quickly rejected for the same reasons you give here for Sikuli's failure. Relying on pixels is fraught with danger. The premise is a good one--no matter what the technology being used, the GUI objects can be identified without an API. However, it still seems to be a bit of the holy grail story. I fail to see how Sikuli advances GUI test automation, given that even the most mundane tasks (those you tested) fail. Is Sikuli the thing of the future? Maybe, but I'm probably going with Sauce Labs and Selenium if my boss lets me make the call.
