Testing != test execution

April 19, 2011

Testing includes:

  1. Building a useful mental model of the application under test, including what value it should provide and what risks could threaten that value,
  2. Designing powerful tests, using that model to investigate important questions about the application’s quality,
  3. Test execution (which might be automated entirely, partly, or not at all),
  4. Analyzing the results to determine if there’s a problem, how severe it might be, or otherwise answer stakeholder questions, and
  5. Test reporting that clearly communicates the impact of any problems found, and how to reproduce them quickly.

We often talk about testing as if it’s only test execution, yet often the most interesting, challenging, skill-intensive aspects of testing are in creating a mental model that helps us understand the problem space, designing tests to quickly and effectively answer key questions, analyzing what specifically the problem is, and communicating it effectively.

“Labeling a bundle of problems”

July 6, 2010

One of my favorite quotes from The Logic of Failure by Dietrich Dorner:

By labeling a bundle of problems with a single conceptual label, we make dealing with that problem easier — provided we’re not interested in solving it. Phrases like “urgently needed measures for combating unemployment” roll easily off the tongue if we don’t have to do anything about unemployment. A simple label can’t make the complex nature of a problem go away, but it can so obscure complexity that we lose sight of it. And that, of course, is a great relief.

By translating an unclear goal into a clear one, we often discover a multifaceted problem, one that consists of many partial problems.

A Creative Heuristic for Automated Detection of Certain Layout Bugs

February 23, 2010

I was just looking over the talks from the 2009 Google Test Automation Conference (GTAC) in Zurich.

In test automation, the most difficult challenge is often finding a useful Test Oracle — a heuristic mechanism by which we can determine if our test has found a bug. This is perhaps particularly true when we’re trying to use automated checks to find layout problems. Given that, I was curious what Michael Tamm’s “Fighting Layout Bugs” would have to say…and I have to say I’m impressed.

Tamm started out by running the HTML and CSS through the W3C validator, which is useful as far as it goes, but it got really interesting when he described a heuristic oracle for determining whether text has overrun a boundary (e.g. a name in a summary box that overflows the edge of that box). How does he do it?

Get the set of pixels that are text:

  • inject jQuery
  • jQuery('*').css('color', 'white');  # set all text to white
  • take screenshot
  • jQuery('*').css('color', 'black');  # set all text to black
  • take second screenshot
  • compare; the diff should be the set of all text pixels

Next get the borders:

  • jQuery('*').css('color', 'transparent'); # make all text transparent
  • Take screenshot
  • Find vertical pixel sequences of a certain min length, where each pixel has the same color (or a very similar one)
  • Only select those which have a high contrast to the left or right. This is the set of vertical borders.
  • Do the same for horizontal borders.

At this point, we have two sets of pixels; any pixel that appears in both sets (text sitting on top of a border) likely indicates a layout bug.
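For readers who want to experiment with the idea, here’s a rough sketch of the text-pixel half of the heuristic in Python, using Selenium WebDriver and Pillow. This is my paraphrase of the approach rather than Tamm’s actual code, and it assumes the page under test already loads jQuery:

    import io

    from PIL import Image, ImageChops
    from selenium import webdriver

    def text_pixels(url):
        """Return the set of (x, y) pixels that belong to text on the page."""
        driver = webdriver.Firefox()
        try:
            driver.get(url)

            # Force all text to white, then to black, screenshotting each time.
            driver.execute_script("jQuery('*').css('color', 'white');")
            white = Image.open(io.BytesIO(driver.get_screenshot_as_png()))
            driver.execute_script("jQuery('*').css('color', 'black');")
            black = Image.open(io.BytesIO(driver.get_screenshot_as_png()))

            # Pixels that changed between the two screenshots are text pixels.
            diff = ImageChops.difference(white, black).convert("L")
            width, _ = diff.size
            return {(i % width, i // width)
                    for i, value in enumerate(diff.getdata()) if value > 0}
        finally:
            driver.quit()

(In practice you’d probably threshold the diff a little higher to ignore anti-aliasing noise, and the border-detection half needs its own pixel scanning.)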

Now, obviously all of these browser screenshots make for slow-running tests, and I’m curious how many false negatives there would be…but the idea is intriguing, and I quite admire the creative test oracle. I’m not sure whether this would be an effective use of resources in my current work on Freebase.com, but I wonder if there are areas where it would deliver enough bang for the buck.

For the curious, here’s his code. I recommend the slides, which stand on their own quite well. Here’s the talk on YouTube, but be warned that it suffers from poor audio. If you don’t want to watch the entire talk, I agree with the YouTube commenter dericbytes that the highlight is roughly minutes 21 to 26.

Congrats to Michael Tamm for thinking outside the box and creating a novel test oracle for detecting certain classes of layout bugs!

Oh, THAT’S what you meant?!?

January 19, 2010

Prashant Chavan expressed frustration today on the Agile Testing mailing list about there not being any standard definitions of testing terms. I can sympathize. That said, one of the hallmarks of becoming a more experienced tester is learning that there are no standard definitions…and that there can’t be any, given the lack of consensus in the field on what testing is.

I learned early on not to assume that I knew what others meant when they used a term. True story:

Early on at my second testing job, my boss came by and asked me how much “Sanity Testing” I’d done on the product so far. Not being familiar with the term, I asked how he used it and he replied, “You know, when you’ve finished the standard acceptance level testing, and you’re doing lots of insane cases, looking for hidden bugs.” We were then able to have a conversation.

I was curious, though, because it frankly seemed like an odd usage. I googled “sanity testing” and most of the references I found were to quick checks, often done before moving on to more thorough testing, e.g. this from Wikipedia: “In computer science, a sanity test is a very brief run-through of the functionality of a computer program, system, calculation, or other analysis, to assure that the system or methodology works as expected, often prior to a more exhaustive round of testing.”

Our industry doesn’t have enough agreement (or any one source of authority) to have “right” meanings of words…but I noted that in this case my boss’s usage of “sanity test” was almost the opposite of the most common examples I’ve seen. If I already had a definition of sanity testing, perhaps I would’ve assumed I knew what he meant…and ended up in a frustrating miscommunication.

In the long run, the Association for Software Testing has been talking about a Dictionary Project that (rather than trying to present a nonexistent One True Definition) will show a range of usage for words, together with citations, similar to the Oxford English Dictionary. Needless to say, this is an ambitious project and right now I believe it’s awaiting volunteers who’re ready to lead the effort. If you think that might be you, let me know!

Relief for the pain of waiting for AJAX elements on the page?

October 21, 2009

More and more web applications are taking advantage of asynchronous JavaScript (or AJAX) – and with good reason. It gives tremendous power to a web application, allowing the app to respond fluidly to the user without requiring a full page reload and making it feel more like a native desktop app. At the same time, it makes the application trickier to automate.

When I first started writing browser-based test automation, it was safe to assume that once the page loaded, if an element wasn’t there yet, it wasn’t going to be. With AJAX, that assumption goes out the window. Let’s take an example:

Someone’s registering for our website. The site prompts her for a username, which she enters. Now, in the olden days, she would fill out the rest of the page and then submit. The page would refresh…and often tell her that the username she selected is unavailable. Rinse and repeat until she finally finds an available name.

With AJAX, the moment she clicks away from the username field, we can have feedback appear on the page, letting her know whether or not that name has been taken. She doesn’t need to wait for a page refresh before each attempt; the feedback can be nearly instantaneous.

Sounds good, right? Now I come along to automate a test of this page. If I write something like (in pseudocode):

    set username = reserved_name
    assert response_text says reserved_name is taken.

…much of the time, I’ll get an error saying that the response_text isn’t there on the screen.

How can I tell my script not to progress to the next step until the application is ready? I have a range of options here. At the cringe-inspiring end of the spectrum, I can write:

    set username = reserved_name
    sleep 10 seconds
    assert response_text says reserved_name is taken.

The problem here is that one either sets the sleep too short, and the script errors out when the test server is under a bit of extra load — or one sets it too long, and the scripts begin to spend 90% of their time sleeping.

Somewhat better is to do like so:

    set username = reserved_name
    wait (up to 20 seconds) for response_text, then
    assert text in page says reserved_name is taken.

The issue here is that you need to know exactly what you’re waiting for, you may need to list multiple triggers to wait for, and your script risks aging poorly as those triggers change.
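For concreteness, here’s roughly what that explicit wait might look like as a Python sketch with Selenium WebDriver (the element IDs here are hypothetical):

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    def check_duplicate_username_warning(driver, reserved_name):
        """Enter a name we know is taken, then wait for the inline feedback."""
        driver.find_element(By.ID, "username").send_keys(reserved_name)
        driver.find_element(By.ID, "email").click()  # click away to trigger the check

        # Poll (up to 20 seconds) for the feedback element before asserting on it.
        feedback = WebDriverWait(driver, 20).until(
            EC.visibility_of_element_located((By.ID, "username-feedback")))
        assert "taken" in feedback.text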

To get better than this likely requires testability features in the application under test. One thing that’s worked well for me is having the application set a flag whenever it starts to execute AJAX, and unset that flag when the AJAX completes. At Freebase.com, we chose to have that flag be an attribute on the body tag.

Whenever AJAX is kicked off, that attribute is set to ajaxStart, and whenever the last asynchronous process completes, it is set back to a value indicating that the AJAX has finished.

Now I can say something like:

    set username = reserved_name
    wait_until_loaded
    assert text in page says reserved_name is taken.

Where wait_until_loaded is defined like:

    wait (up to 20 seconds) while body.attribute(ajax) == 'ajaxStart'
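A minimal sketch of that helper in Python with Selenium WebDriver might look like this (assuming, as described above, that the application exposes an ajax attribute on the body tag):

    import time

    from selenium.webdriver.common.by import By

    def wait_until_loaded(driver, timeout=20):
        """Block until the application reports that no AJAX call is in flight."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            ajax_flag = driver.find_element(By.TAG_NAME, "body").get_attribute("ajax")
            if ajax_flag != "ajaxStart":
                return  # nothing pending, safe to move on
            time.sleep(0.2)  # poll a few times per second
        raise TimeoutError("AJAX still running after %d seconds" % timeout)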

This pushes the burden of knowing when the page is ready for the next step back to the application under test. The downsides of this approach are: (1) it requires code in the application under test to make it work. Sometimes that’s difficult (politically or otherwise). (2) This solution is only as robust as the code that sets the ajax flag. If the application under test can’t or won’t report this reliably, this solution will only cause you grief.

That said, the upside of this approach is potentially huge: It frees the tester from writing a custom wait trigger after every AJAX step, speeding up test script writing and often making the scripts more robust as well.

Depending on how familiar you are with the testing library you’re using, it may be worth considering putting this check deeper in your code. For example, one might change the click method to always wait_until_loaded before clicking. This has the potential to streamline test scripts that much more.
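As a sketch, such a wrapper (building on the hypothetical wait_until_loaded above) could be as simple as:

    from selenium.webdriver.common.by import By

    def click(driver, locator, timeout=20):
        """Wait out any in-flight AJAX, then click the element."""
        wait_until_loaded(driver, timeout)  # the helper sketched earlier
        driver.find_element(*locator).click()

    # e.g. click(driver, (By.LINK_TEXT, "Sign up"))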

In fact, the Windmill test automation library has already done this on trunk. As of the next release of Windmill, the default behavior for everything other than asserts is to wait for that element for a set number of seconds — progressing immediately if it’s there right away, but only throwing an error if the timeout is reached. I suspect as web applications become more AJAX-heavy, this will become the standard in test automation libraries.

There isn’t a One Right Answer for any context, but I’ve been very happy with having the application under test set an ajax flag of some sort. If you’re testing an application where the page changes asynchronously, I encourage you to give it a try.

When (Broken) Software Inspires New Language

October 15, 2009

Software Testing has many joys. Most have to do with solving tricky puzzles, collaborating with brilliant minds, or contributing to the creation of elegant software…but sometimes there are simpler pleasures as well.

Today a colleague sent around a prototype of an application that takes any string and attempts to pluralize it. It handles a lot of tricky cases, e.g. correctly pluralizing “surgeon general” as “surgeons general”. On the other hand, it makes many mistakes, and I suspect it always will. One example: it currently assumes that any input is singular. When run against one of the data sets it was designed for (a list of Types Freebase.com users have created), I noticed it turned “Most Disliked Things” into “Most Disliked Thingses”. At that point I couldn’t resist watching it turn “Hobbits” into “Hobbitses”…and then coining a new term:

Smeagolize: To alter a word or phrase, making it sound like something the character Smeagol from The Lord of the Rings would say, e.g. turning “Most Disliked Things” into “Most Disliked Thingses”.

This reminded me of one of my favorite examples of a coder solving a problem in an amusing way. This was many years and companies ago. A harried coder was asked to display some (potentially long) URLs in a very small space on a home page. As he worked on it, he discovered that some URLs didn’t fit the space. What to do? A less intrepid coder might have asked for help with the design problem, but he tackled it himself…by writing code to remove all the vowels from any URL over 20 characters.  This led us to coin the term:

Disemvowel: To remove the vowels from a URL or other word, e.g. changing
http://www.associationforsoftwaretesting.org/ to
http://www.ssctnfrsftwrtstng.rg/
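For the record, the transformation itself is trivial; here’s a throwaway sketch in Python (the 20-character threshold is the one from the story):

    def disemvowel(url, max_length=20):
        """If a URL is too long for the layout, strip its vowels. (Please don't.)"""
        if len(url) <= max_length:
            return url
        return "".join(ch for ch in url if ch.lower() not in "aeiou")

    # disemvowel("http://www.associationforsoftwaretesting.org/")
    # => "http://www.ssctnfrsftwrtstng.rg/"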

Needless to say, his particular code didn’t live to see the light of day…but its memory lives on in the term Disemvowel.

Has broken software inspired new words for you as well?

Controlled Experiments To Test For Bugs In Our Mental Models

June 24, 2009

Here’s a 22-minute video lecture by Kohavi on using controlled experiments to discover what your customers think, and if you work on web software I suspect you’ll find it 22 minutes well spent. Kohavi walks through several examples where the results of the experiments were quite surprising, and offers some interesting suggestions for how to organize them.

Note, this is related to testing software in the broad sense. The bugs that split tests find are the (plentiful) bugs in our mental models, not in our implementation.

Question about Model-Based Testing

June 3, 2009

If you haven’t been to Stack Overflow yet, it’s an interesting forum for asking technical questions — and sorting through the answers — created by Joel Spolsky and Jeff Atwood.

I noticed a question on Model-Based Testing over there that I had something to say about. I wanted to link to articles by Harry Robinson, Ben Simo, and James Bach…but as a new user, I’m allowed to add only one link. What to do? How about using my one link to point to my blog? Here’s the original question.

And here’s my answer, complete with links:

First, a quick note on terms. I tend to use James Bach’s definition of Testing as “Questioning a product in order to evaluate it”. All tests rely on mental models of the application under test. The term Model-Based Testing, though, is typically used to describe programming a model which can be explored via automation. For example, one might specify a number of states that an application can be in, various paths between those states, and certain assertions about what should occur on the transition between those states. Then one can have scripts execute semi-random permutations of transitions within the state model, logging potentially interesting results.

There are real costs here: building a useful model, creating algorithms for exploring it, logging systems that let one weed through the output for interesting failures, etc. Whether or not the costs are reasonable has a lot to do with the questions you want to answer. In general, start with “What do I want to know? And how can I best learn about it?” rather than looking for a use for an interesting technique.

All that said, some excellent testers have gotten a lot of mileage out of automated model-based tests. Sometimes the important questions we have about the application under test are best explored by automated, high-volume, semi-randomized tests. Here’s one very colorful example from Harry Robinson (one of the leading theorists and proponents of model-based testing), where he discovered many interesting bugs in Google driving directions using a model-based test written with Ruby’s Watir library: http://model.based.testing.googlepages.com/exploratory-automation.pdf

Robinson has used MBT successfully at companies including Bell Labs, Microsoft, and Google, and has a number of helpful essays at http://www.harryrobinson.net/

Ben Simo (another great testing thinker and writer) has also written quite a bit worth reading on model-based testing: http://www.questioningsoftware.com/search/label/Model-Based%20Testing

Finally, a few cautions: to make good use of a strategy, one needs to explore both its strengths and its weaknesses. Toward that end, James Bach has an excellent talk on the limits and challenges of Model-Based Testing, “The Unbearable Lightness of Model-Based Testing.” This blog post of Bach’s links to the hour-long talk and associated slides: http://www.satisfice.com/blog/archives/87

I’ll end with a note about what Boris Beizer calls the Pesticide Paradox: “Every method you use to prevent or find bugs leaves a residue of subtler bugs against which those methods are ineffective.” Scripted tests (whether executed by a computer or a person) are particularly vulnerable to the Pesticide Paradox, tending to find less and less useful information each time the same script is executed. Folks sometimes turn to model-based testing thinking that it gets around the pesticide problem. In some contexts model-based testing may well find a much larger set of bugs than a given set of scripted tests…but one should remember that it is still fundamentally limited by the Pesticide Paradox. If you remember its limits — and start with questions MBT addresses well — it has the potential to be a very powerful testing strategy.
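To make the shape of such a test concrete, here’s a minimal sketch in Python: a toy state model, a semi-random walk over its transitions, and an assertion after each step. The states, actions, and the app driver object are all hypothetical.

    import random

    # For each state: the available actions, and the state each action should lead to.
    MODEL = {
        "logged_out": {"log_in": "logged_in", "reset_password": "logged_out"},
        "logged_in": {"view_profile": "logged_in", "log_out": "logged_out"},
    }

    def run_model_based_test(app, steps=100, seed=None):
        """Take a semi-random walk through the model, checking the app at each step."""
        rng = random.Random(seed)
        state = "logged_out"
        for _ in range(steps):
            action, expected = rng.choice(sorted(MODEL[state].items()))
            getattr(app, action)()  # drive the real application
            actual = app.current_state()  # ask (or probe) the app for its state
            assert actual == expected, (
                "after %r: expected state %r, but the app reports %r"
                % (action, expected, actual))
            state = expected

In a real model-based test the model would be richer, the walk might be guided rather than uniform, and the logging would need to capture enough detail to reproduce any failure it stumbles into.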

Are Ladders Useful?

April 6, 2009

On the watir-general email list, George Sanders recently wrote:

“It seems that I’ve been encountering more people within my workplace (and, alas, even within my own QA team!) that are not sold on test automation. From what I’ve learned so far, there seems that automation will never cover 100% of what needs to be tested, but this doesn’t negate the need.

Another frustration is that I’ve been tasked to write automation scripts as part of my year-end goals. However, I haven’t been assigned hours in my work week to do them! All of my script development has been after-hours and weekends (notice I’m posting this on a Saturday!).

Has anyone else run into naysayers? How can I convince the decision-makers that this is a worthwhile effort?”

I responded on-list, but want to improve my answer here (and change my analogy). This conversation tends to be quite unproductive when it becomes “Is or isn’t test automation useful?” It’s a lot like asking “Are ladders useful?” Ladders won’t solve any problems on their own. There are plenty of problems a ladder won’t help with at all. Problems that are helped by a step-ladder may only be made worse by a 20-foot ladder. Yet there are some problems where the right ladder, well used, will add tremendous value.

Similarly with testing, I like to look at the problems that need to be solved, then think about each of them and consider solutions. Is one of your problems basic functionality breaking when code is changed? Unit tests might be a great solution. Do you suspect intermittent production failures are the result of concurrency issues? If so, automation-assisted tests might really help in reproducing the problem in-house.

I think discussing the relative costs/benefits of automated browser-based regression testing is a good idea, and getting real experience reports helps a lot. Within this realm in particular — depending on many contextual factors —  there may be some problems that’re helped through automated browser-based tests and others where the cost is too high. 

Note, for some concrete examples of problems where Watir has been useful for me, see my post about a few of the most compelling cases where I’ve used Watir over the past five years.

Getting time added to the schedule for test automation is a different question, but one that might become easier if you’re able to focus the conversation on solutions to particular problems. (On the other hand, there are a lot of reasons — sensible and not — why test automation might not get onto the schedule.) In any event, focusing on specific problems and potential solutions is likely to increase the odds of a productive conversation.

Some Benefits Of Being Part of Real Professional Communities

April 3, 2009

A sign of the times, I know a handful of great testers and coders who’ve been laid off in recent months. One I just heard about today is Chris McMahon. I first encountered Chris as a contributor on the Watir, Software-testing, and Agile-testing mailing lists. At the time, I was QA Manager for a company that was having a rough time filling a position for a very technical API tester and test automation expert. I emailed Chris, one thing led to another, and I had the good fortune to work with Chris for a period.

Several jobs later for both of us, he fell prey to economic downsizing today. I have to say I was very impressed (though not really surprised) to see what a stir it has made in the software testing communities I’m a part of. The impact is clearest in all the conversation about, and testimonials for, Chris on Twitter.

Is this because Chris is a very skilled tester? Absolutely…but there are other very skilled testers out there who just aren’t as well known. He blogs, he has had articles published in several journals, and he actively contributes to multiple online testing communities. And by contributes I mean he engages in dialog, he offers ideas, and he offers help to folks who ask good questions. On the Watir list, he claimed an unofficial spot some time back as the Answerer Of Off Topic Questions. When someone raises something that’s more of a Ruby, test design, or other library-related question, he has frequently had something helpful to contribute, and he’s done so even while working for the last year or so at a company that uses Selenium instead.

Being a part of a testing community has many benefits: Exposure to new ideas, meeting colleagues, a chance to have our ideas tested and improved via feedback, etc. — but this is a place where it’s particularly clear. It may turn out to be a tough time to be looking for a remote testing position, but the way Chris has chosen to live his professional life over many years seems to be reaping major dividends right about now.