UI Design Newsletter
Quantitative Usability Articles:
1. Success Rates
| 2. Time-on-Task

John Sorflaten, PhD, CUA, CPE, Project Director, looks at the pitfalls of using "averages" when dealing with time-on-task, and how to avoid them.

Secrets to Using Time-on-Task for ROI Goals
   

Summary

We're so accustomed to using "averages" to characterize ROI measurements that we easily go astray when dealing with time-on-task. First, time-on-task data is a strange animal, and merits special mathematical treatment. Second, most measures have some uncertainty.

For both problems, we show you how to wrestle them to the ground. Figuratively, of course. The math is easy when you use a special Web calculator built just for this purpose.

Check your ROI savvy with the following docudrama.

Episode #342 of Usability Crossroads: "Telling it Like It Is"

Scene 1: DAY, EXTERIOR, IN CAR ON LOS ANGELES FREEWAY

You (usability specialist): Wow, look at that decked-out Beemer (also known as a BMW). People spend money on anything in LA if it's their car. Now, if we could only get management to buy into saving 20 seconds on our Customer Service average handle time. THAT would be real value for their usability bucks.

Customer Service User Rep: It would be great to save 20 seconds out of our average 6 minutes per call. Can your usability team really save that much time in a re-design?

You: Well, at $1.11 million in savings per second, my usability team will jump through a lot of hoops for you. Watch us. For enough savings, we do real magic for you.

Customer Service User Rep: Five thousand people handling customer service calls really adds up fast.

You: (smiling) Twenty seconds is ONLY twenty seconds! It's a piece of cake. Oops, watch the road. You just cut that Beemer off. Listen to that guy honk his horn!

Scene 2: DAY, ATHENS, GREECE, INTERIOR, COMPTROLLER'S OFFICE (speaking via video camera)

Comptroller (mumbling while reviewing your Excel chart): Tell me, Glaucon, about this savings of $1.11 million per second. How concrete is that? You know, we get ROI savings sent up here every week. I hardly know what to believe any more.

You: We are already showing mean savings of 18.67 seconds, and we can boost that to our 20-second savings goal with a little more re-design work. We even used the geometric mean, which suits "time-on-task" data. Have you checked all our spreadsheet numbers for the details?

Comptroller: Although these numbers look good, there's always some little glitch that affects the calculations badly. Did you know that some research shows up to 90% of corporate spreadsheets have an error?

Scene 3: CUT TO: LOS ANGELES, DAY, INTERIOR, USABILITY OFFICE

You (to the CSR User Representative): I just told our comptroller that our numbers were solid. We have a loaded labor rate of $40 per hour. That gives us $22.2 million in annual savings. Then I calculated all the costs of our usability work. I even added in the cost of programmer and additional CSR training for the new system. These investments totaled $2.7 million, giving a net savings of $19.5 million for the first year! That's a 7.22 times return, or 722% ROI, in just the first year! What more could she want?

CSR Representative: Yeah, these are big dollars. I don't see where we can go wrong on this.
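The arithmetic behind Scene 3 can be sketched roughly as follows. The figure of 2,000 paid hours per representative per year is an assumption, not something stated in the dialog; it is chosen because it reproduces the quoted $1.11 million per second. The sketch also rounds annual savings to $22.2 million before subtracting costs, as the dialog appears to do.

```python
# Sketch of the Scene 3 ROI arithmetic.
# HOURS_PER_REP_PER_YEAR is an assumption; the other inputs come from the dialog.
REPS = 5_000
LOADED_RATE_PER_HOUR = 40.0       # dollars
AVG_CALL_SECONDS = 6 * 60         # "average 6 minutes per call"
HOURS_PER_REP_PER_YEAR = 2_000    # assumption, not given in the article
SECONDS_SAVED_PER_CALL = 20
INVESTMENT = 2_700_000            # dollars

# Calls handled per year across all representatives.
calls_per_year = REPS * HOURS_PER_REP_PER_YEAR * 3600 / AVG_CALL_SECONDS

# Dollar value of shaving one second off every call for a year.
value_per_second_saved = calls_per_year * LOADED_RATE_PER_HOUR / 3600

# Annual savings, rounded to the nearest $100,000 as in the dialog.
annual_savings = round(value_per_second_saved * SECONDS_SAVED_PER_CALL, -5)

net_savings = annual_savings - INVESTMENT
roi = net_savings / INVESTMENT

print(f"Value per second saved: ${value_per_second_saved:,.0f}")
print(f"Annual savings:         ${annual_savings:,.0f}")
print(f"Net savings:            ${net_savings:,.0f}")
print(f"ROI:                    {roi:.0%}")
```

Note that every figure here scales directly with the seconds saved, so any uncertainty in that one measurement carries straight through to the ROI.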

Reality quiz

Which is right – scene 2 or scene 3?
Pick one:     Scene 2     Scene 3     Both     Neither

Analysis: To end the suspense, the answer is "Neither". Your ROI calculations are right – as far as they go. You even used the geometric mean (see below). But did you know why? Meanwhile, your comptroller DOES need more information before investing in your idea. By limiting your goal to reaching the mean as a target, you failed to take into account the "margin of error" in your data.

Pitfalls to learn about

Although you remembered to use the "geometric mean" for the time-on-task average, you still failed to give a "margin of error" for your test results. In other words, the usability team never qualified its goal of saving 20 seconds in terms of a "margin of error".

1. Geometric mean: This first issue involves knowledge of how time-on-task data gets skewed.

Our typical use of the "mean" (or "average") as a representative value assumes that our data has about as many values below the mean as above it. It also assumes that these below and above values lie about the same distance from the mean, but in opposite directions. The classic example of such symmetric data is the normal "bell-shaped curve".

However, when we use time-on-task data, this assumption often fails. The data gets lopsided, with some people taking much longer to do the task than expected. We fail to get a bell-shaped curve when looking at the distribution of the data values.

This makes the mean (average) sensitive to the extra-slow performance of a few individuals. It affects the "average" much like the sale of a few $10 million homes in your small town raises the average price of all the homes by hundreds of thousands of dollars. Clearly our understanding of "average" needs more clarification in this case.

For example, as a consequence of these problems, newspapers express "average" home prices as a "median" price. Thus, readers easily visualize half of all homes being priced above the median and half below. We get an intuitive feel for the "practical" average when we use the median as the average.

However, for time-on-task data statisticians use an even better averaging method called the geometric mean.

Simply put, we give more weight to the typical data points and less weight to the few outliers. The details shouldn't worry us here. We'll give you a Web-based calculator to do that math.
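For readers who want to see the math anyway, here is a minimal sketch of the geometric mean: average the logarithms of the task times, then convert back. The task times below are hypothetical, invented to include one very slow participant.

```python
import math
from statistics import geometric_mean, mean, median

# Hypothetical task times in seconds; one slow participant skews the sample.
task_times = [38.0, 41.0, 44.0, 47.0, 52.0, 55.0, 190.0]


def geometric_mean_from_logs(times):
    """Average the natural logs of the times, then exponentiate the result."""
    log_average = sum(math.log(t) for t in times) / len(times)
    return math.exp(log_average)


arithmetic = mean(task_times)
middle = median(task_times)
geometric = geometric_mean_from_logs(task_times)

print(f"Arithmetic mean: {arithmetic:.1f} s")
print(f"Median:          {middle:.1f} s")
print(f"Geometric mean:  {geometric:.1f} s")

# Python's standard library (3.8+) provides the same calculation directly.
print(f"Library check:   {geometric_mean(task_times):.1f} s")
```

With this sample, the single 190-second participant pulls the arithmetic mean well above the median, while the geometric mean lands between the two.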

All said, most of your colleagues would never know to ask THIS question about the geometric mean. So it becomes your responsibility alone to lead them correctly.

2. Margin of error: This second issue occurred when both you and your comptroller failed to ask about the "confidence level" in your time-on-task results.

Let's cover familiar ground first, then we'll apply it to our ROI calculations.

Think about how you read the results of voting polls. When 49% of prospective voters claim they will vote for Candidate A and 51% claim they will vote for Candidate B, which candidate is winning?

If someone says 51% wins the contest, you know they are probably wrong.

Newspaper polls always come with a "margin of error", typically plus or minus (+/-) 3%.

In our example, the candidates tied because +/- 3% causes the 49% and 51% to overlap.

3% plus 49% makes 52%. This overlaps with the 51% polled for the other candidate. Thus, we see there is no "true" difference between the two candidates.
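The overlap check described above can be sketched in a few lines. The function names are illustrative only, and the second comparison uses made-up poll numbers to show a case where the ranges do not overlap.

```python
def poll_range(percent, margin):
    """Return the (low, high) range implied by a poll result and its margin of error."""
    return (percent - margin, percent + margin)


def ranges_overlap(first, second):
    """True when two (low, high) ranges share at least one value."""
    return first[0] <= second[1] and second[0] <= first[1]


candidate_a = poll_range(49, 3)   # 46% to 52%
candidate_b = poll_range(51, 3)   # 48% to 54%
print(ranges_overlap(candidate_a, candidate_b))

# Hypothetical wider gap: 45% versus 55% with the same +/- 3% margin.
print(ranges_overlap(poll_range(45, 3), poll_range(55, 3)))
```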

But these statistics we read in the newspaper typically leave out an important point. Newspapers leave out the "level of confidence" that we get with a given margin of error.

They should state that these plus and minus ranges are calculated to give a 95% confidence level. That is, if the same poll were repeated 100 times, the interval between the low value and the high value would be expected to contain the true value in about 95 of those polls.

As you can imagine, more subjects give you a better feel for the outcome. Fewer subjects give less confidence.

Note that to get a margin of error as small as +/-3% you need 1068 subjects. Whew.

And 385 subjects gets you a margin of error of +/- 5%. And 97 subjects gets you a margin of error of +/- 10%. Thus smaller numbers of subjects make the margin of error larger.
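These subject counts can be reproduced with the standard sample-size formula for a proportion. This is a sketch under two assumptions not spelled out above: a 95% confidence level and the worst-case split of 50% for each answer. The function name is illustrative.

```python
import math
from statistics import NormalDist


def subjects_needed(margin, confidence=0.95, proportion=0.5):
    """Subjects needed so a polled proportion has the given margin of error.

    Assumes a simple random sample and the normal approximation;
    proportion=0.5 is the worst case and gives the largest sample size.
    """
    # Two-tailed critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * proportion * (1 - proportion) / margin ** 2)


for margin in (0.03, 0.05, 0.10):
    print(f"+/- {margin:.0%}: {subjects_needed(margin)} subjects")
```

Because the margin is squared in the denominator, halving the margin of error roughly quadruples the number of subjects you need.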

Image1

The smart person's "fix" on margin of error for time-on-task

Lucky for us, methods exist to easily calculate the margin of error for time-on-task for a given confidence level you choose. Jeff Sauro, educator and usability consultant, has put together a web site with lots of usability know-how for quantitative analysis. In that site, you'll find a time-on-task calculator that gives you the geometric mean and the upper and lower margins of error. See www.measuringusability.com/time_intervals.php

Check it out. You can chart your initial finding like this. These are the details you should have shown your comptroller.

Image2

By the way, notice this chart says "one-tailed test". That means we are really interested only in one of the margin of error bars – here, that's the bottom one. For the record, we show two margins of error – the bottom and the top error bars. We do that just to meet the visual expectations people have when they look at such charts. Jeff's calculator is set up to handle two-tailed tests for the confidence level you select on his web page.

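Therefore, when you use Jeff's calculator for a one-tailed test, as we discuss here, you have to set the confidence level drop-down list to 90% to get the desired 95% confidence level. A two-tailed 90% interval leaves 5% of the uncertainty below the bottom bar and 5% above the top bar, so the bottom bar on its own marks a one-tailed 95% limit. While this sounds strange, professionals use this method to do one-tailed tests.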
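To make the one-tailed versus two-tailed point concrete, here is a rough sketch of a margin of error around a geometric mean. It is not a reproduction of Jeff's calculator. It assumes the task times are log-transformed and, to stay within Python's standard library, uses a normal critical value; with the small samples typical of usability tests, a t critical value is the usual choice and gives a wider interval. The task times are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def geometric_mean_interval(times, confidence=0.95, one_tailed=False):
    """Return (lower, geometric_mean, upper) for a list of positive task times.

    Assumption: normal critical value on the log scale. Small samples
    would normally use a t critical value instead (wider interval).
    """
    logs = [math.log(t) for t in times]
    center = mean(logs)
    std_err = stdev(logs) / math.sqrt(len(logs))
    alpha = 1 - confidence
    # One-tailed puts all of alpha in one tail; two-tailed splits it in half.
    tail = alpha if one_tailed else alpha / 2
    z = NormalDist().inv_cdf(1 - tail)
    return (
        math.exp(center - z * std_err),
        math.exp(center),
        math.exp(center + z * std_err),
    )


# Hypothetical task times in seconds.
task_times = [38.0, 41.0, 44.0, 47.0, 52.0, 55.0, 190.0]

two_tailed_90 = geometric_mean_interval(task_times, confidence=0.90)
one_tailed_95 = geometric_mean_interval(task_times, confidence=0.95, one_tailed=True)

print(f"Two-tailed 90% lower bound: {two_tailed_90[0]:.2f} s")
print(f"One-tailed 95% lower bound: {one_tailed_95[0]:.2f} s")
```

The two lower bounds match, which is why selecting 90% in a two-tailed calculator yields the one-tailed 95% limit.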

How to say it like it is...

Now you know how to rewrite Scene 2 above.

Comptroller: Great, Glaucon. I like these margin of error bars you show in your Excel chart. They give me a sense of how much uncertainty we have in our data. You said that a range calculated this way, from top to bottom, would capture the true geometric mean in 95 out of 100 replications of your test. Is that right?

You: Yes, we call that range the 95% confidence interval, and the margin of error bars mark its ends. That means we are using a 95% confidence level as a criterion for our time-on-task tests.

Also, note that we used the geometric mean to handle the skewed data we expect for task-times. We used the geometric mean because other research has shown that some test participants take extra long and that data is not balanced out by participants who are extra fast.

Also, remember we are only concerned with the bottom Y-error bar, because that's the number that has to meet our 20 second saving goal.

So, in our first test, the mean showed 18.67 seconds savings, and the bottom of the confidence interval ended at 15.7 seconds – we clearly failed to meet our 95% confidence level in hitting the goal. We need to have the bottom error bar above our goal of 20 seconds savings.

Comptroller: I see, Glaucon. How did you do on subsequent re-design efforts?

You: We conducted 4 more iterations. Finally, the application was fast enough to give us data where the bottom of the margin of error bars fell at 20.08 seconds.

Whew, exceeding our goal by only 8 hundredths of a second was so close that we did another iteration just to make sure.

Our 6th iteration put the bottom of the margin of error bar at 21.14 seconds saved. The geometric mean savings for that last test came out to be 24.08 seconds. In a sense, the bottom margin of error number is our real target. The geometric mean is only a means to an end (pun intended :-) )!

Image3

Comptroller: Very good, Glaucon. Your ability to explain these things with a margin of error gives me confidence that you have handled the calculations in a reasonable way. I hadn't heard of using geometric means for time-on-task. Thanks for mentioning that.
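The decision rule in this dialog can be sketched as a simple check: a design iteration passes only when the bottom of its margin of error, not its mean, reaches the savings goal. The numbers are the ones quoted in the dialog; results for iterations 2 through 4 are not given, so they are left out.

```python
GOAL_SECONDS_SAVED = 20.0

# Bottom of the margin of error (seconds saved) for the iterations quoted above.
lower_bounds = {1: 15.7, 5: 20.08, 6: 21.14}


def meets_goal(lower_bound, goal=GOAL_SECONDS_SAVED):
    """True when the bottom of the margin of error reaches the savings goal."""
    return lower_bound >= goal


for iteration, lower in lower_bounds.items():
    verdict = "meets goal" if meets_goal(lower) else "falls short"
    print(f"Iteration {iteration}: bottom bar {lower:.2f} s -> {verdict}")
```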

Conclusion and wrap-up advice...

Now you have a dialog worth repeating. Remember to use Jeff Sauro's calculator to get the geometric mean and confidence intervals at www.measuringusability.com/time_intervals.php.

For people who like to make charts, use your spreadsheet to make these calculations. (Microsoft Excel uses the term "Y-error bars" for "margin of error" bars.) For a starter-kit on these calculations and charts with the margins of error bars, visit this HFI Web page.

At the link, first we give an example of the calculations for this article using Jeff's calculator. Then we give you a link to an Excel spreadsheet with formulas for doing the same thing. You can modify the data and see your associated charts.

References

Jeff Sauro (2005). "Confidence Interval for Task Times in Usability Tests," Downloaded from www.measuringusability.com/time_intervals.php on 21 Nov 2006.

John Sorflaten (2006). "Making the Fuzzy Part of ROI Clear" in interactions, 13, 6 (Nov + Dec), 38-41.
