Geb and Sikuli – Adding image recognition to help your functional tests see

One of the drawbacks of testing with WebDriver is that it offers very little for validating things visually. WebDriver also has very limited interaction with alerts, the desktop and embedded components such as Flash, Java or browser plugins.

When you have a QA person, they can immediately see that an image is too big, or that the responsive design is broken at a certain screen size. The vocabulary of XPaths and DOM elements is sometimes too limited to capture what a human sees.

Webdriver can simulate the user interacting with the browser with a mouse and keyboard, but not what this user is able to perceive.

In this post, I will describe my experiment marrying the Sikuli image recognition framework with the Geb testing framework. I will show how we can use some of the features in Sikuli to expand our functional tests and even overcome some limitations of the very powerful WebDriver / Geb combination.

What is Sikuli?

It would be great if there was a way to describe things in our tests and use image recognition to make sure that certain things look the way we expect them to.

Sikuli is an open source project first started at MIT and now maintained at The University of Colorado Boulder. It can both interact with a screen using screenshot fragments and use these fragments to verify that things are as expected.

The main difference between Sikuli and WebDriver is that in WebDriver you interact with DOM elements, while in Sikuli you interact with images that represent screen regions. It allows you to write screenshot driven tests.

The technology is very cool and you should definitely learn more about it on the Sikuli website.

There are a few components with Sikuli.

The coolest one is perhaps the Sikuli IDE, which is available for download here. The Sikuli IDE is very similar to the Selenium IDE. It allows you to record certain interactions with your screens based on screenshots. However, unlike Selenium, you record parts of your screen as images and use Sikuli to identify this image within your current screen.

To see a quick example of what the Sikuli IDE looks like when searching for Google results, see the video below:

There are a few more examples of Sikuli in action in the videos section of their site.

Integrating with Geb.

Ok, so we have this little pseudo cool utility that can click on screenshot fragments. How do we use this for fun and profit?

Adding Sikuli to Maven

Unfortunately, the Sikuli script runner is not available on the public maven repo, so you will have to install this locally or put it in your local artifactory repo ( everyone’s got one of those, right? ).

Once we have the Sikuli script jar file installed, we can use it within our tests to interact with content.

Download the Sikuli application and install it.

Navigate to the jar file. On a mac, this is located at /Applications/Sikuli-IDE.app/Contents/Resources/Java .

To add this jar file to your local Maven repo, issue the following command:

mvn install:install-file -Dfile=sikuli-script.jar -DgroupId=org.sikuli -DartifactId=sikuli-script -Dversion=0.10.2 -Dpackaging=jar

After the entire Internet downloads, you should see something like the following

[INFO] Installing /var/folders/7l/pz3gny4534z8xt0r0dy36v4m0000gn/T/mvninstall1644711524611360566.pom to /Users/tomaslin/.m2/repository/org/sikuli/sikuli-script/0.10.2/sikuli-script-0.10.2.pom

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 19.731s
[INFO] Finished at: Fri Oct 26 23:15:32 BST 2012
[INFO] Final Memory: 5M/24M
[INFO] ------------------------------------------------------------------------

Example: Using Sikuli to validate image resizing.

Sikuli is now available to Maven, which means we can pull it into our application and use its screen capturing abilities for fun and profit. Let’s see how that works.

Let’s say I want to test the image resizing service on the Google App Engine that I wrote about two years ago to resize images in Gaelyk. You can read the original blogpost about it here.

So I have a website that sits at http://gaelykresize.appspot.com/upload.groovy. What I would like to do is to be able to upload an image from my file system into it and see that all these options work properly.

So I start writing a small geb test and set a height and width:


@Grapes([
   @Grab("org.codehaus.geb:geb-core:0.7.2"),
   @Grab("org.seleniumhq.selenium:selenium-chrome-driver:2.23.0"),
   @Grab("org.seleniumhq.selenium:selenium-support:2.23.0")
])

import geb.Browser

Browser.drive{
   go 'http://gaelykresize.appspot.com/upload.groovy'
   $('form').with{
     customWidth = 200
     customHeight = 129
     // TODO: upload image
   }
   $('input[name="submit"]').click()
}

I get stuck on how to tell the browser to choose an image in the file system. Normally, I would assign a filepath to the uploaded file and test my application since Geb cannot really manipulate the File Chooser. With Sikuli, I can just record a script that will select the file from my desktop and use that instead.

So I fire up the Sikuli IDE, navigate to the page I want to test and record a small script that does exactly that. I save it as sikuli.sikuli. I can use the play slowly mechanism to make sure I have all the bits right. But I end up with a script that looks like this:

This script selects the window in Chrome with our website title, clicks on the right UI buttons to get to the desktop and then selects a filename. In this example we’re using a screenshot for the filename, but you could have also pasted a path into the file selection dialog.

Great, so now we have this file sikuli.sikuli. On the Mac, it appears that this is a file, but it’s actually a directory. When we navigate into this directory we can see the following structure:

1351295066702.png	Desktob.png		sikuli.html
1351295181355.png	HideFAVORITL.png	sikuli.py
5rcnioe12728.png	SampletoResi.png
AllMyFiles.png		addfile.py

We see a bunch of python scripts and all the images that are needed to drive this test. If you open the sikuli.html file, you can see the test that I have created with the embedded images.

If we open my saved sikuli.py, you will see the same output, except with the image names relative to this directory.

click( "SampletoResi.png" )
click( "1351295066702.png" )
click( "AllMyFiles.png")
wait( "Desktob.png")
click( "Desktob.png" )

click( "5rcnioe12728.png" )
click( "1351295181355.png" )
waitVanish( "HideFAVORITL.png" )

Since we’re using groovy, we can write a quick wrapper around this that would allow us to simply copy and paste this content and run via the Sikuli Java API.

@Grapes([
    @Grab('org.sikuli:sikuli-script:0.10.2')
])

import org.sikuli.script.*;

Screen s = new Screen()

s.with{
     // hey look, this is just copied and pasted code!
     click( "SampletoResi.png" )
     click( "1351295066702.png" )
     click( "AllMyFiles.png")
     wait( "Desktob.png")
     click( "Desktob.png" )

     click( "5rcnioe12728.png" )
     click( "1351295181355.png" )
     waitVanish( "HideFAVORITL.png" )

     // end of copied over Sikuli script
}

Using Groovy’s with() method, the exact same calls that we recorded as Python work against the Sikuli Java API. If we run the above with Java 1.6, we will see the screen interaction that we just recorded being replayed. Because Groovy allows us to preserve the exact script we recorded with the IDE, replaying and overwriting existing scripts is possible and easy.

So now we have a quick snippet of code we can add to our geb script to select the file we want.

The last bit here is simply to check that the image that we have generated is the size that we want. While you’re able to set a file in Geb easily, making sure that the resulting image is what we expect is much more difficult. With Sikuli, this task becomes incredibly easy. In fact, all we need to do is just add one line of Sikuli script.

Putting it all together, we get a script that looks like the following:

@Grapes([
    @Grab("org.codehaus.geb:geb-core:0.7.2"),
    @Grab("org.seleniumhq.selenium:selenium-chrome-driver:2.23.0"),
    @Grab("org.seleniumhq.selenium:selenium-support:2.23.0"),
    @Grab('org.sikuli:sikuli-script:0.10.2')
])

import geb.Browser
import org.sikuli.script.*;

Screen s = new Screen()

Browser.drive{
    
    go 'http://gaelykresize.appspot.com/upload.groovy'
    
    $('form').with{        
        customWidth = 200
        customHeight = 129
    }

    // pick a file from the desktop 
    s.with{
        click( "SampletoResi.png" )
        click( "1351295066702.png" )
        click( "AllMyFiles.png")
        wait( "Desktob.png")
        click( "Desktob.png" )

        click( "5rcnioe12728.png" )
        click( "1351295181355.png" )
        waitVanish( "HideFAVORITL.png" )
    }
    
    $('input[name="submit"]').click()
    
    s.wait( "JAviF.png" )
    
}
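One thing to watch with that final s.wait() line: as far as I know, Sikuli accepts a match at a default minimum similarity of around 0.7, so an image that came out slightly the wrong size could still pass. Here is a minimal sketch of a stricter check using Pattern.similar(). It assumes JAviF.png is a screenshot of the correctly resized image, and the 0.95 threshold is my own guess rather than a recommended value.

```groovy
@Grapes([
    @Grab('org.sikuli:sikuli-script:0.10.2')
])

import org.sikuli.script.FindFailed
import org.sikuli.script.Pattern
import org.sikuli.script.Screen

Screen s = new Screen()

// Assumption: JAviF.png shows the correctly resized image.
// 0.95 is an arbitrary, stricter-than-default threshold; tune it for your screen.
Pattern strict = new Pattern('JAviF.png').similar(0.95f)

try {
    // wait() throws FindFailed when nothing on screen matches closely enough
    s.wait(strict)
    println 'resized image found on screen'
} catch (FindFailed e) {
    println "resized image not found: ${e.message}"
}
```

Inside a real test I would drop the try/catch and let FindFailed fail the test.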

This example is pretty artificial, but it helps demonstrate how many more things Sikuli makes visible. You can test the installation of an adobe air application, change your Java plugin version, click on webstart run dialogs, etc. You can also use this mechanism to automatically test things like region encoding for geocoded videos, share button interaction with facebook and other external javascript / flash related garbage that goes into websites.

Another quick example: Interacting with Plugins and responsive design with Sikuli.

One of the cool things about Sikuli is that you’re able to use Chrome plugins and firefox add-ons to control different elements of the screen. Here is a little Sikuli script I recorded using the window resize chrome plugin to check the responsive design of http://www.css-tricks.com.

Since Sikuli allows me to click on anything on the screen, I can use extensions and plugins to change my user-agent, screen sizes, etc. It definitely makes more things testable.

Provided I set up the profiles and extensions within chromedriver correctly, I can easily add this test to my existing suite.

Things to Consider

The notion that you could take little screenshots and interact with them without knowing how deeply they are positioned within your DOM tree is quite exciting. However, there are a few things that we should consider when playing with Sikuli:

1. Cross Platform Look and Feel – one of the problems that might arise is that certain system interactions look different across different browsers and operating systems. The screenshot based test that works fine in MacOS will certainly not work the same in Windows or Linux.

2. Ease of update – The problem with a tool like Sikuli is that whenever a designer drunk with power and rage changes the CSS, every single one of your tests that relies on the old look is going to fail. I think this is where taking a Geb Page Object approach would be very helpful.

3. Speed – One of the complaints about Sikuli is that it is fairly slow compared to having webdriver just rip through DOM elements the same way lies spew out of Mitt Romney’s mouth. Since Sikuli has to interact with the screen, tests written in Sikuli will be slower than just writing them with plain old Geb. Having the dependency on screen state also means you can’t run tests in parallel on the same screen.

4. Pure Sikuli IDE vs. Geb/Sikuli hybrids – The Sikuli IDE is quite amazing. It allows you to quickly record the interactions that you want without worrying much about XPaths and expressions. As a Geb user, I am sometimes blinded by the idea that writing a functional test means only writing it in Geb. But I think we should always ask whether there is value in spending a bunch of time writing tests by hand. Given the brittleness of some of the interactions, perhaps it makes sense in some cases to just have a bunch of throwaway tests not written in Geb that can be quickly replaced. These tests will complement the more rigidly tested code.

The compromise here should be to only use Sikuli where traditional webdriver cannot do its job. Sometimes it might be better to have a slow test rather than no test at all. Sikuli will not replace a real QA person, but it can definitely automate some of the work they do.

Also, it should really help to use the same best practices in Geb when writing tests with Sikuli. This might involve using Page Objects extensively and having helper methods within these objects that collaborate with Sikuli. In the long term, it might help to have a common Screen object shared across all the tests in the same way Geb shares the Browser and Driver. This can easily be done in Spock via a base class that contains a @Shared screen.
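To make the helper idea a bit more concrete, here is a rough sketch of what such an object could look like. DesktopFilePicker and RecordingScreen are names I made up for illustration; neither is part of Geb or Sikuli. The helper only needs something that responds to click, wait and waitVanish, so the same shared Screen can be handed to every helper. The recording stand-in is there purely so the sketch runs without Sikuli or a display.

```groovy
// Hypothetical helper, not part of Geb or Sikuli. It keeps the recorded
// screenshot names in one place, so a redesign means editing one class.
class DesktopFilePicker {

    // Anything that responds to click(String), wait(String) and
    // waitVanish(String), for example a shared org.sikuli.script.Screen.
    def screen

    // Replays the file selection recorded earlier with the Sikuli IDE.
    void pickSampleFile() {
        screen.click('SampletoResi.png')
        screen.click('1351295066702.png')
        screen.click('AllMyFiles.png')
        screen.wait('Desktob.png')
        screen.click('Desktob.png')
        screen.click('5rcnioe12728.png')
        screen.click('1351295181355.png')
        screen.waitVanish('HideFAVORITL.png')
    }
}

// Stand-in that only records the calls it receives (illustration only).
class RecordingScreen {
    List<String> calls = []
    def click(String image)      { calls << 'click ' + image }
    def wait(String image)       { calls << 'wait ' + image }
    def waitVanish(String image) { calls << 'waitVanish ' + image }
}

def screen = new RecordingScreen()
new DesktopFilePicker(screen: screen).pickSampleFile()
screen.calls.each { println it }
```

In a real suite I would pass in a single new Screen() instead of the stand-in, ideally the one shared by the base spec, and call pickSampleFile() from the test or from a Geb page object.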

One of the cool things about Sikuli is that it helps extend Geb outside of the browser. This allows us to interact with content in Flash or Java that might not be possible with straight WebDriver. It also opens up the world of plugins, allowing us to perform tasks that might not have been possible in our existing tests. For example, we can use Sikuli to change user agents, screen sizes and even drive emulators. Better test coverage should hopefully lead to better code.

Now we just have to figure out how to double click on reality with our Google glasses and we can get rid of our entire QA team.

2 thoughts on “Geb and Sikuli – Adding image recognition to help your functional tests see”

  1. Luke Daley

    Great post!

    Excellent points about a hybrid model. I think it makes a lot of sense and your rationale is great.

    I think I might incorporate some of this into The Book of Geb if you don’t mind, as a pointer for automating things that WebDriver can’t do.
