Controlled Experiments To Test For Bugs In Our Mental Models

Here’s a 22-minute video lecture on using controlled experiments to discover what your customers actually think, and if you work on web software I suspect you’ll find it 22 minutes well spent. Kohavi points out several examples where test results were quite surprising, along with some interesting suggestions for how to organize such experiments.

Note, this is related to testing software in the broad sense. The bugs that split tests find are the (plentiful) bugs in our mental models, not in our implementations.


4 Responses to “Controlled Experiments To Test For Bugs In Our Mental Models”

  1. Justin Hunter Says:


    I LOVED listening to that presentation. So much, in fact, that (a) I’ll try to track down the presenter for a phone call and (b) I was motivated to document a quick summary of his talk, included below.

    I firmly believe that applied statistics-based experiments are under-appreciated by businesses (and, for that matter, business schools). Few people who understand them are as articulate and concise as Kohavi. Admittedly, I could be accused of being biased as: (a) I am the son of a prominent applied statistician and (b) I am the founder of a software testing tools company that uses applied statistics-based methods and algorithms to make our tool work. Thank you for posting it. I’ll also bring it to the attention of my brother, whose blog focuses on similar topics.

    – Justin

    Justin Hunter
    Founder and CEO
    “More coverage. Fewer tests.”

    Ron Kohavi, Microsoft Research

    1:00 Amazon: in 2000, Greg Linden wanted to add recommendations to shopping carts during the checkout process. The “HiPPO” (the Highest Paid Person’s Opinion) was against it on the grounds that it would be a bad idea; recommendations would confuse and/or distract people. Amazon, a company with a good culture of experimentation, decided to run a small experiment anyway, “just to get the data” – it was wildly successful, and such recommendations are in widespread use today at Amazon and other firms.

    3:00 Dr. Footcare example: Including a coupon code above the total price to be paid had a dramatic impact on abandonment rates.

    4:00 “Was this answer useful?” Dramatic differences occur when Y/N is replaced with 5 stars, and depending on whether an empty text box is shown up front or only appears after a user clicks to give an initial response.

    6:00 Sewing machines: experimenting with a sales promotion strategy led to an extremely counter-intuitive pricing choice.

    7:00 “We are really, really bad at understanding what is going to work with customers…”

    7:30 “DATA TRUMPS INTUITION” {especially on novel ideas}. Get valuable data through quick, cheap experimentation. “The less the data, the stronger the opinions.”

    8:00 Overall Evaluation Criteria: “OEC” What will you measure? What are you trying to optimize? (Optimizing for the “customer lifetime value”)

    9:00 Analyzing data / looking under the hood is often useful to get meaningful answers as to what really happened and why

    10:30 A/B tests are good; more sophisticated multi-variate testing methods are often better

    12:00 Some problems: settling on an OEC is culturally hard; people won’t agree. And if there are 10 changes per page, you will need to break things down into smaller experiments.

    14:00 Many people are more afraid of multiple-factor experiments [e.g., multi-variate experiments] than they should be.

    16:00 People do a very bad job at understanding natural variation and are often too quick to jump to conclusions.
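    The “natural variation” point is easy to demonstrate with a quick simulation (my illustration, not from the talk; the rates and sample sizes are made up): even when two variants are literally identical, their observed conversion rates will differ, which is exactly why people jump to conclusions too quickly.

    ```python
    import random

    def simulate_variant(n_users, true_rate, rng):
        """Count conversions for n_users who all convert at the same true rate."""
        return sum(rng.random() < true_rate for _ in range(n_users))

    rng = random.Random(42)
    true_rate = 0.05   # both "variants" convert at exactly 5%
    n = 2000           # users per variant

    conv_a = simulate_variant(n, true_rate, rng)
    conv_b = simulate_variant(n, true_rate, rng)

    rate_a, rate_b = conv_a / n, conv_b / n
    # The gap below is pure noise -- there is no real difference to find.
    print(f"A: {rate_a:.3f}  B: {rate_b:.3f}  gap: {abs(rate_a - rate_b):.3f}")
    ```

    Rerun with different seeds and the gap wanders around; a significance test, not eyeballing, is what separates signal from this kind of noise.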

    17:00 eBay does A/B testing with a control group of ~1%. Ron Kohavi, the presenter, suggests starting small and then quickly ramping up to 50/50 (e.g., 50% of visitors see version A, 50% see version B).
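    A back-of-the-envelope calculation (my illustration, with made-up traffic and conversion numbers) shows why ramping toward 50/50 helps: for the same total traffic, the standard error of the measured difference between variants is smallest when the split is even.

    ```python
    import math

    def se_of_difference(total_users, control_fraction, rate=0.05):
        """Standard error of (rate_b - rate_a) for a given traffic split,
        assuming both variants convert at roughly the same base rate."""
        n_a = total_users * control_fraction
        n_b = total_users * (1 - control_fraction)
        var = rate * (1 - rate)  # variance of a single Bernoulli conversion
        return math.sqrt(var / n_a + var / n_b)

    total = 100_000
    se_tiny = se_of_difference(total, 0.01)  # 1% control, eBay-style
    se_even = se_of_difference(total, 0.50)  # 50/50 split

    print(f"1%/99% split SE: {se_tiny:.5f}")
    print(f"50/50 split SE:  {se_even:.5f}")
    ```

    With these numbers the 1% split’s standard error is about five times larger, so the even split can detect a much smaller real effect with the same traffic.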

    19:00 Beware of launching features that “do not hurt”; there are feature maintenance costs.

    20:00 Drive to a data-driven culture. “It makes a huge difference. People who have worked in a data-driven culture really, really love it… At Amazon… we built an optimization system that replaced all the debates that used to happen on Fridays about what gets on the home page with something that is automated.”

    21:00 Microsoft will be releasing its controlled-experiments-on-the-web platform at some point in the future, though probably not in the next year.

    21:00 Summary
    Listen to your customers because our intuition at assessing new ideas is poor
    Don’t let the HiPPO drive decisions; it is likely to be wrong – let the customer data decide
    Experiment often
    Create a trustworthy system to accelerate innovation

  2. testingjeff Says:

    Justin, thanks for your comments, and for adding your notes! I’ll have a look at Hexawise as well.

  3. Learning Using Controlled Experiments for Software Solutions « Hexawise's Blog Says:

    […] Jeff Fry linked to a great webcast in Controlled Experiments To Test For Bugs In Our Mental Models. […]

  4. Curious Cat Science and Engineering Blog » Controlled Experiments for Software Solutions Says:

    […] Jeff Fry linked to a great webcast in Controlled Experiments To Test For Bugs In Our Mental Models. […]
