The purpose of this experiment was to examine the phenomenon of task-metaphor correspondence in interfaces. Our goal was to find out whether a single metaphor adopted in interface design is more appropriate for certain types of tasks than for others. The experiment compared three interfaces, each designed around an existing information retrieval metaphor, for a tourist information system.
An information system containing tourist information about St. Louis was constructed for this study. Three interfaces based on three information retrieval metaphors (the book metaphor, the note card metaphor, and the map metaphor) were implemented in the information system. The three metaphors were selected based on the correspondences between the underlying characteristics of the metaphors and the characteristics of complex systems. Two principles were followed for choosing the appropriate metaphors: (a) The metaphors would have to be familiar to ordinary users, and (b) the metaphors in combination would have to cover as many characteristics of a complex system as possible.
The St. Louis information system was implemented as a HyperCard stack. The three metaphors chosen served as the bases of interface design for the tourist information system. Each metaphor resulted in one interface version of the St. Louis tourist information system. The main bodies of information contained in the three versions were identical. The differences among the three interfaces were in (a) their basic screen layouts, (b) the elements specifically belonging to a metaphor, and (c) the methods of information retrieval. Table 1 is a summary of major features of the three interfaces of the information system from these three perspectives. (See Lin (1989) for sample screens of the basic screen layouts.)
Three types of tasks were designed for the experiment. The rationale behind the three types of tasks was the assumption that each type of question would be answered most easily by using one of the three interface systems. All three types of tasks required searching for information in the St. Louis tourist information system.
Type I tasks required subjects to answer questions which referred to information appearing at the beginning of a large section. This type of task should be easier to accomplish with the book interface because of the physical location of the answers. In general, books may well be best for extracting information which requires reading sequentially for the underlying ideas, philosophy, and reasoning of the writers. However, since the system used for this experiment was a tourist information system, its information was mostly factual in nature. There were no questions which required extensive reading and inference making. Therefore, this type of question was designed instead.
Type II tasks required subjects to answer questions on simple facts. This type of task should be accomplished more easily with the note card interface. Locating the card with a specific piece of information on it with the note card interface should be relatively easy, since information was organized into smaller units, and could be accessed directly by using tabs. The major difference between Type I and Type II tasks is the physical location of the target information.
Type III tasks required subjects to answer questions that refer to relations such as distance, order, and precedence. This type of task should be more appropriate for the map metaphor because relations were presented explicitly in maps and diagrams.
Table 1: A Summary of Major Features of the Book Interface, the Note Card Interface, and the Map Interface

The book interface
Screen layout and specific elements: An open book that includes a two-page table of contents and three end notes
Methods of information retrieval:
a. Directly through the table of contents of the book
b. Sequentially through page turning (both forward and backward)
c. Through footnote references

The note card interface
Screen layout and specific elements: A stack of note cards with two-level tabs
Methods of information retrieval:
a. Directly through first level tabs
b. Directly through second level tabs
c. Sequentially through flipping cards
d. Through cross-references

The map interface
Screen layout and specific elements: Various organization maps and diagrams that include time lines, geographical maps, abstract diagrams, and text pages
Methods of information retrieval:
a. Hierarchically through different maps and diagrams
b. Sequentially browsing through text pages where available
c. Through cross-references

Experiment 1 used a 3 x 3 mixed between- and within-subjects design (Kirk, 1982). The between-subjects treatment variable was the type of interface used. It had three treatment levels: I1 = the book interface, I2 = the note card interface, and I3 = the map interface. The within-subjects treatment variable was the type of task to be performed. It also had three treatment levels: T1 = type I tasks (answering questions on information appearing at the beginning of a large section); T2 = type II tasks (answering questions on simple facts); T3 = type III tasks (answering questions about relationships). Three types of information were collected for analysis: (a) the response time for answering each question, (b) whether the answer provided by the subject for each question was correct, and (c) the search pattern for each question, recorded as the screens of information viewed while answering the question.
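One way to picture the design and the three kinds of data collected per question is as a small record type. The sketch below is illustrative only: the class name `TrialRecord`, its field names, and the string labels are assumptions introduced here, not part of the original HyperCard implementation.

```python
from dataclasses import dataclass
from itertools import product
from typing import List

# Between-subjects factor (I1, I2, I3) and within-subjects factor (T1, T2, T3).
INTERFACES = ("book", "note card", "map")
TASK_TYPES = ("I", "II", "III")

# The nine cells of the 3 x 3 design.
DESIGN_CELLS = list(product(INTERFACES, TASK_TYPES))


@dataclass
class TrialRecord:
    """Hypothetical record of one subject's answer to one question."""
    subject_id: int
    interface: str             # one of INTERFACES
    task_type: str             # one of TASK_TYPES
    response_time_s: float     # (a) response time in seconds
    correct: bool              # (b) whether the answer was correct
    screens_viewed: List[str]  # (c) search pattern as an ordered list of screens


example = TrialRecord(1, "book", "I", 42.0, True, ["contents", "page 3"])
```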
Forty-five subjects drawn from the Department of Educational Psychology subject pool at the University of Illinois participated in this experiment.
Subjects were randomly assigned to one of the three treatment groups. Each treatment group had 15 subjects. The subjects began by filling out a questionnaire given by the experimenter. The questionnaire contained five questions that collected information on subjects' prior experience with computers, and four questions about St. Louis to see how familiar the subjects were with St. Louis. Subjects then worked with the information system. They first read a brief on-line introduction explaining the version of the tourist information system they would be using (making the metaphor explicit), and the nature of the tasks they would be asked to do. Each subject then worked through one neutral practice question and nine randomly arranged questions (three for each task type). The subject's task was to locate the information necessary to answer each question in the information system as quickly as possible, and to write down their answers on the same sheet of paper where the question was listed. The computer collected the response time and search pattern for each question.
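The assignment and ordering steps of this procedure can be sketched as follows. This is a minimal illustration, not the procedure actually used: the function names, the fixed seed, and the question labels are hypothetical, and the source does not say how the randomization was carried out.

```python
import random


def assign_subjects(subject_ids, interfaces, seed=None):
    """Randomly split subjects into equal-sized groups, one per interface."""
    subject_ids = list(subject_ids)
    if len(subject_ids) % len(interfaces) != 0:
        raise ValueError("subjects must divide evenly among the interfaces")
    rng = random.Random(seed)
    rng.shuffle(subject_ids)
    size = len(subject_ids) // len(interfaces)
    return {
        interface: subject_ids[i * size:(i + 1) * size]
        for i, interface in enumerate(interfaces)
    }


def session_order(practice_question, questions, rng):
    """Practice question first, then the test questions in random order."""
    shuffled = list(questions)
    rng.shuffle(shuffled)
    return [practice_question] + shuffled


# 45 subjects, three interfaces, 15 subjects per group.
groups = assign_subjects(range(1, 46), ["book", "note card", "map"], seed=0)

# Nine test questions (three per task type); labels are placeholders.
questions = [f"type {t} q{n}" for t in ("I", "II", "III") for n in (1, 2, 3)]
order = session_order("practice", questions, random.Random(0))
```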
The three dependent measures used in the analyses were response time, number of errors, and blocking. Response time for each question was defined as the interval between the start and end of each search. It was measured in seconds. For any question that was answered incorrectly, the response time was replaced by the subject's average response time for the remaining correct questions of the same task type. That is, if a type I question was wrong, its response time would be replaced with the average response time of the remaining correct type I questions of that subject. An average correct response time was calculated for each subject for a given task type.
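The replacement rule for incorrect answers can be sketched in a few lines. The function name and the example values are hypothetical. The source does not say what was done when a subject answered every question of a task type incorrectly, so returning `None` in that case is an assumption.

```python
def impute_response_times(times, correct):
    """Apply the replacement rule to one subject's questions of one task type.

    times:   response times in seconds, one per question
    correct: booleans, True where the answer was correct
    Returns (imputed_times, average), or None if no answer was correct
    (that case is not covered by the source; None is an assumption).
    """
    correct_times = [t for t, ok in zip(times, correct) if ok]
    if not correct_times:
        return None
    mean_correct = sum(correct_times) / len(correct_times)
    # Each incorrect answer takes the mean of the correct answers.
    imputed = [t if ok else mean_correct for t, ok in zip(times, correct)]
    return imputed, sum(imputed) / len(imputed)


# Third question answered incorrectly: its 200 s is replaced by (40 + 60) / 2.
result = impute_response_times([40.0, 60.0, 200.0], [True, True, False])
```

Under this rule, a subject's average for a task type equals the mean of that subject's correct answers, since every replaced value is that same mean.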
An error was counted if the subject wrote down incorrect information for a particular question. Blocking was a measure developed for this particular study. Besides relatively standard indicators such as time, error, and number of cards, a process measure which allowed us to make more detailed inferences about the internal mental processes of the subjects seemed very desirable. We were particularly interested in developing a measure which would not only indicate that subjects were having difficulty with a question but also allow us to locate the point of difficulty and to quantify the degree of the difficulty. One way of doing this would be to show that subjects were expressing signs of being blocked. In the analyses, blocking was defined as a subject taking an identical path from one screen to another more than once. Whenever this type of looping happened, it was taken as a sign that the subject was in a mental state of being blocked. Each repetitive path was counted as one block. Thus, each subject's response to a question can be analyzed to determine the number of times the subject was blocked.
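The blocking measure defined above can be computed from a recorded search pattern. The sketch below rests on one reading of that definition, which is an assumption: every traversal of the same screen-to-screen transition after the first counts as one block. The function name and the screen names are hypothetical.

```python
from collections import Counter


def count_blocks(screens):
    """Count blocks in one search pattern.

    screens: ordered list of the screens viewed while answering a question.
    A transition is a (from_screen, to_screen) pair. Each time a transition
    is taken again after its first occurrence, one block is counted
    (an assumed reading of the definition).
    """
    transitions = Counter(zip(screens, screens[1:]))
    return sum(times_taken - 1 for times_taken in transitions.values())


# "contents" -> "parks" is taken twice, so one block is counted.
path = ["contents", "parks", "contents", "parks", "zoo"]
blocks = count_blocks(path)
```

Because the transitions are kept as pairs, the same tally also shows where a subject was looping, which matches the stated aim of locating the point of difficulty.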
The mean response time for answering a question of all subjects in Experiment 1 was 64.32 seconds with a standard deviation of 37.13 seconds. A two-way analysis of variance (ANOVA) with repeated measures was performed on the average response time. The two independent variables were (a) the type of interface used, and (b) the type of task performed. The latter variable was the variable on which the repeated measures were performed.
No significant main effects were found for type of interface used or type of task performed. There was a strongly significant interaction between the two variables (F(4,84) = 12.98, p<.0001). Overall, then, the interfaces did not differ from one another in response time, nor did the types of tasks. However, the strong interaction implies that the effectiveness of an interface varied as the nature of the tasks changed. Further analyses showed that the significant interaction was due to the significant main effect of type of task at the book interface (F(2,28) = 10.92, p<.0003) and at the map interface (F(2,28) = 13.31, p<.0001). The subjects using the book interface performed significantly better with type I tasks than with type II and type III tasks (F(1,14) = 14.10, p<.002, and F(1,14) = 21.43, p<.0004). The subjects using the map interface performed better with type III tasks than with type I and type II tasks (F(1,14) = 22.60, p<.0003, and F(1,14) = 27.02, p<.0001). No significant effects were found for type of task at the note card interface. The mean response time for each treatment group, displayed in Figure 1, gives a clearer view of the correspondence between the interface used and the type of task performed. For two of the three interface-task pairs, subjects took less time to locate the desired information when the interface was designed with a metaphor matched to the task.
Figure 1: Average response time of all treatment conditions in Experiment 1.
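The degrees of freedom in the response time analysis follow from the design: three groups of 15 subjects, each measured at three task types. The sketch below uses the standard formulas for a mixed design with one between-subjects and one within-subjects factor. The function names are hypothetical, and the code computes only degrees of freedom, not the F ratios.

```python
def mixed_design_df(n_groups, n_per_group, n_levels):
    """Degrees of freedom for a mixed (split-plot) two-way ANOVA."""
    n_subjects = n_groups * n_per_group
    between = n_groups - 1                 # interface main effect
    between_error = n_subjects - n_groups  # subjects within groups
    within = n_levels - 1                  # task main effect
    interaction = between * within         # interface x task
    within_error = between_error * within  # task x subjects within groups
    return {
        "interface": (between, between_error),
        "task": (within, within_error),
        "interface x task": (interaction, within_error),
    }


def simple_effect_df(n_per_group, n_levels):
    """Degrees of freedom for a repeated-measures effect within one group."""
    effect = n_levels - 1
    return effect, (n_per_group - 1) * effect


overall = mixed_design_df(n_groups=3, n_per_group=15, n_levels=3)
task_within_one_interface = simple_effect_df(15, 3)  # all three task types
pairwise_comparison = simple_effect_df(15, 2)        # two task types at a time
```

With these inputs the interaction has 4 and 84 degrees of freedom, the effect of task within one interface has 2 and 28, and a comparison between two task types has 1 and 14, in line with the F(4,84), F(2,28), and F(1,14) values reported for response time.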
In total, there were 30 errors out of 405 answers. A frequency distribution of the errors is displayed in Figure 2. As shown, there seems to be a systematic pattern of error distribution that also reflects the interface-task correspondence. The fewest errors occurred when the tasks were performed with the interface whose design metaphor matched them. The results of a two-way ANOVA with repeated measures on errors support this interpretation by producing a strongly significant interaction between interface used and type of task performed (F(4,82) = 5.00, p<.001). Both the main effects of interface used (F(2,42) = 5.04, p<.011) and type of task performed (F(2,84) = 6.40, p<.003) were also significant. Main effect analyses revealed that the interface effect resulted from subjects making significantly more errors with the book and the note card interfaces than with the map interface while dealing with type III tasks (p<.05). Further analyses showed that the interaction was due to the significant main effects of type of task at the book interface (F(2,28) = 4.42, p<.021) and the note card interface (F(2,28) = 7.46, p<.003). Fewer errors occurred for the book interface group with type I tasks than with type II (F(1,14) = 6.00, p<.028) and type III tasks (F(1,14) = 9.33, p<.009). The note card interface group made more errors when working with type III tasks than with type I or type II tasks (F(1,14) = 7.27, p<.015, and F(1,14) = 13.50, p<.003). The number of errors for the map interface group was so small (2) that there were no significant differences between the different types of tasks. The analyses of errors supported the hypothesis that a particular metaphor adopted in interface design was only appropriate for certain types of tasks.
Figure 2: Total number of errors of all treatment conditions in Experiment 1.
Due to the simple nature of the tasks, not every search showed signs of blocking. Only 73 out of 405 answers showed blocking, and 6 subjects showed no blocking on any of the nine questions. A two-way ANOVA with repeated measures was carried out for blocking. The results show that the main effect of interface used was significant (F(2,42) = 5.09, p<.011). Main effect analysis revealed that less blocking occurred with the book interface (p<.05). Neither the main effect of type of task performed nor the interaction of the two factors was significant. Though the distribution of total blocking presented in Figure 3 seems to indicate correspondences between the book interface and type I tasks and between the map interface and type III tasks, the effects did not reach significance.
On the whole, the findings from the analyses of response time and errors point rather strongly in one direction: the interface-task correspondence does exist. Findings on blocking were consistent with the other two measures, though the effect did not reach significance. The analyses demonstrated strong correspondences for at least two interface-task pairs, i.e., the book interface with type I tasks and the map interface with type III tasks. One possible reason for failing to show a significant correspondence between the note card interface and type II tasks may be an inappropriate selection of questions as type II tasks. This should be examined in future studies.
Figure 3: Total blocking of all treatment conditions in Experiment 1.
An interface metaphor is more effective for the types of tasks that closely match its model and less effective when the match is not as close. The effectiveness of an interface has a great deal to do with the type of tasks the interface is used to accomplish. Interface designers should therefore make sure that the interface model matches the task model. Given these results, it would also be unrealistic to adopt a single interface model for a complex information retrieval system.