published in The Journal of Electronic Commerce, Volume 12, Number 2
We've already given you the punch line: the low-fidelity test performed as well as the high-fidelity test. In fact, the researchers stated that "we would not have spent the time and effort to build a high-fidelity prototype" if their only goal had been usability testing. (The project did have other goals.) "In fact this is how we currently design IVR systems in practice." To encourage you in future analyses, here are some of the measures that showed group equivalence. First, the experimenters identified 21 problems with the IVR interface. Comparing the two groups, they found no differences in...
The authors list issues for which a high-fidelity prototype can be useful. However, mockups limited to specific questions could serve as well. A prototype or mockup can test...
Furthermore, investment in a prototype can enhance...
| How to Conduct an IVR Usability Test | ||
|---|---|---|
| Introduction | The following steps and data come from a demonstration project that Human Factors International, Inc. carried out on an IVR serving a telecommunications firm that we will call Phones-R-Us. Expert review indicated significant potential for user confusion and consequent overload of the CSR staff. Subjects came from a university population – students and staff. |
|
| Step 1. Get Subjects | Choose the number of subjects to match the expected probability of finding a given problem: big problems need fewer subjects, subtle problems need more. Experience indicated that 10-20 subjects would provide insight into the problems that we anticipated. Our intern tested 16 subjects with telephone experience and varied educational backgrounds and genders. He used 2 of the sessions to learn to write down the subject's comments rapidly and concisely, so we used data from the 14 subjects who followed. Our intern videotaped five of the interviews in case we wanted to demo the process. If needed, provide training to give your subjects the same expertise your actual users have. (If you expect a specific background, then recruit – and pay – subjects from your user population.) In our case, subjects only needed experience using a telephone and to be old enough to qualify for a telephone card. Here's a subject selection summary: Subjects
Comment During data analysis (see below) we wanted to see whether ESL (English as a second language) status made a difference in how subjects felt about the IVR menu. Therefore, we used a statistical test to check for differences between the two groups' average scores on each of the 5 satisfaction ratings (given below). "NSD" means No Significant Difference at the 5% significance level, that is, the observed difference did not meet the so-called "95% confidence" criterion. We used the t-test for unequal variances found in Microsoft Excel. You don't need such confirmation if your own group of test subjects has no particular differentiating characteristic. |
|
| Step 2. Determine the Tasks and Test Script | Based on preliminary expert review, we had specific issues we wanted to test. Were our suspicions correct? What percentage of average users would have difficulty? One of us devised 10 test scenarios to meet these needs. Concrete language and specific instances make the test more valid. If necessary, provide any paper documents that would normally be used, such as a credit card statement. On the next page is our test script, with the task scenarios. The data supervisor read the script to maintain consistency of expectation and motivation among subjects. |
|
| Step 3. Collect Performance Data | Our intern spent about an hour with each subject. He recorded the demographic data (indicated above), then administered the test script. He used a speakerphone so that he could hear the IVR prompts. (Remember, if this were a "low-fidelity" test he would have read out the prompts himself.) He asked the subjects to tell him which button they pressed, and he made a point of recording the button presses in sequence for each test question. He also recorded his observations and useful subject comments for each button press. At the end of each task, he asked them to describe their experience and degree of difficulty. Note that a subject may have felt they completed the task correctly because they got a CSR – whether by accident or on purpose. In such cases, they were scored as a fail because they didn't follow the intent of the design. Subjects had no difficulty with the presence of videotape equipment and its operation. |
|
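Step 1 advises matching the number of subjects to the expected probability of finding a given problem. One common way to put numbers on that advice is the binomial discovery model, sketched below in Python. This model is an assumption on our part, not something the Phones-R-Us project is stated to have used: it supposes that each subject independently has the same chance `p` of running into a problem. The function names are hypothetical.

```python
import math


def detection_probability(p: float, n: int) -> float:
    """Chance that at least one of n subjects hits a problem.

    Assumes (hypothetically) that each subject independently
    encounters the problem with the same probability p.
    """
    return 1.0 - (1.0 - p) ** n


def subjects_needed(p: float, target: float) -> int:
    """Smallest n for which detection_probability(p, n) >= target."""
    if not (0.0 < p < 1.0) or not (0.0 < target < 1.0):
        raise ValueError("p and target must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


if __name__ == "__main__":
    # Illustrative values only: a "big" problem (p = 0.5)
    # versus a "subtle" one (p = 0.1), each at a 90% target.
    for p in (0.5, 0.1):
        print(p, subjects_needed(p, 0.90))
```

Under this model, 14 subjects give roughly a 77% chance of seeing a problem that affects 1 user in 10, and 22 subjects would be needed to reach 90%. That is the sense in which big problems need fewer subjects and subtle problems need more.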
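The Comment in Step 1 mentions a t-test for unequal variances, often called Welch's t-test. As a rough sketch of what that test computes, the Python below derives the t statistic and the approximate degrees of freedom from two groups of ratings. The ratings are made-up placeholders, not the project's data, and `welch_t` is a hypothetical helper name.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test.

    a and b are sequences of ratings from two independent groups,
    each with at least two values and non-zero variance overall.
    """
    va = variance(a) / len(a)  # squared standard error, group a
    vb = variance(b) / len(b)  # squared standard error, group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical satisfaction ratings on one scale (NOT real data).
    esl_ratings = [4, 5, 3, 4, 2]
    other_ratings = [4, 3, 5, 4, 4, 3, 5, 4, 5]
    t, df = welch_t(esl_ratings, other_ratings)
    print(f"t = {t:.3f}, df = {df:.1f}")
```

To decide whether a result is "NSD", compare the absolute value of t with the two-tailed critical value for the computed degrees of freedom at the 0.05 level; a smaller value means no significant difference. A statistics package can supply the p-value directly, for example SciPy's `ttest_ind` called with `equal_var=False`.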
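Step 3 scores a task as a fail whenever the subject strays from the intent of the design, even if a CSR eventually picks up. A minimal Python sketch of that scoring rule follows. It assumes that "following the intent" means pressing exactly the designed key sequence, which is our simplification; the key sequences, field names, and `intended_path` are invented for illustration and do not come from the Phones-R-Us script.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskAttempt:
    subject_id: str
    presses: List[str]   # keys in the order the subject pressed them
    reached_csr: bool    # recorded for analysis; does not affect scoring


def passed(attempt: TaskAttempt, intended_path: List[str]) -> bool:
    """Pass only if the presses match the designed path exactly.

    Reaching a CSR by some other route still counts as a fail.
    """
    return attempt.presses == intended_path


def failure_rate(attempts: List[TaskAttempt], intended_path: List[str]) -> float:
    """Percentage of attempts that did not follow the designed path."""
    if not attempts:
        raise ValueError("no attempts recorded")
    fails = sum(1 for a in attempts if not passed(a, intended_path))
    return 100.0 * fails / len(attempts)


if __name__ == "__main__":
    intended = ["2", "1"]  # hypothetical designed path for one task
    attempts = [
        TaskAttempt("S01", ["2", "1"], reached_csr=False),
        TaskAttempt("S02", ["0"], reached_csr=True),  # got a CSR, still a fail
    ]
    print(failure_rate(attempts, intended))
```

Keeping `reached_csr` alongside the key presses makes it possible to report separately how many failing subjects believed they had succeeded because a CSR answered.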