User studies for information access purposes

As it turns out, I have had reason to formulate some of my thoughts on user studies for information access purposes.

  1. My first pet point is to note the difference between benchmarking and validation. Benchmarking is what Cranfield-style studies do. The original metaphor of benchmarking is useful for understanding the point: bolting a piece of machinery to the bench and running it with various inputs. Validation is another sort of exercise: seeing if tools and technologies (and the design principles behind them) actually work for the tasks they are envisioned to address.
  2. The second point is to stress the importance of a use case. A use case is one way of bridging a divide: on the one hand, designing interaction with systems, building useful tools, studying usage, and evaluating systems with respect to that usage; on the other, evaluating system components with respect to their capacity to deliver the goods necessary for a complete system. "Use case" is a technical term for one such framework for the informal specification of usage for this type of purpose. While I happen to like that specific practice and school of thought, I don’t think it is necessary to commit to it. There might be other equally useful frameworks around, and certainly others will emerge. Wikipedia has a fairly good text on use cases. The standard references on use cases:
    • Alistair Cockburn. (2002). Agile Software Development. Addison-Wesley.
    • Ivar Jacobson, Magnus Christerson, Patrik Jonsson and Gunnar Övergaard. (1992). Object-Oriented Software Engineering: A Use Case Driven Approach. Addison-Wesley.
  3. The third point is to note that the Cranfield-style studies performed so far have not been agnostic with respect to use cases. While the notion of a use case has not been explored to any great extent in information access research, there is an implicit notion of retrieval being task-based and topical, with active and well-spoken users. This implicit use case informs both the evaluation and the design of systems. Recall and precision can be combined into a fair proxy for user satisfaction in that usage scenario, even when abstracted to be a relation between query and document rather than between a need and the fulfilment of that need.
  4. The fourth point is the advent of multimedia, which breaks the implicit information retrieval use case. Multimedia is different, used differently, by different users, and for different reasons than is text. Benchmarking must change to capture the most important criteria for success for multimedia information access systems, using appeal and satisfaction rather than completeness and precision as target notions.
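
As a concrete illustration of point (3), here is a minimal sketch of how recall and precision might be combined into a single score. Using the F-measure for this is my assumption for the sake of the example; the argument above does not commit to any particular combination. The function names and the document identifiers are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: documents the system returned
    relevant:  documents judged relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily, beta < 1 weights precision.
    """
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0.0:
        return 0.0
    return (1 + b2) * precision * recall / denominator


# Hypothetical judgments for a single query.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
score = f_measure(p, r)
```

Note that nothing in such a score refers to a user or a task: it relates a query to a set of documents, which is exactly the abstraction that the implicit use case makes acceptable.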

So what consequences does the above argument have for the purposes of future research in the field?

Cranfield studies have worked well to establish the usefulness of systems with respect to some human activities, provided the activities in question fit the implicit use case in (3) above. If they do not, as in (4) above, evaluations will fail to establish success criteria. (There are, however, studies which claim that, at the limit, differences in retrieval system performance do not influence the end result of user output.)

User studies of the "traditional" type, where a system is shown to a set of users for a brief while and the users are given movie tickets as thanks for participating, do neither. They might be useful for evaluating the ergonomics of some specific interface widget, but they are very unlikely to establish the usefulness of a system solution for a new task. In other words, shoddy user studies neither benchmark nor validate. (I am allowed to use strong language here, since I have performed such studies myself and am one of the people I am criticising!)

There is a craft of performing user studies of the kind we would need. SIGCHI conferences have a lively discussion on topics related to this. But very seldom do user studies address information access, and especially not multimedia information access, probably to some extent because of the high threshold for building and testing such systems. The gap between engineering and application-oriented research in information retrieval on the one hand, and the craft of designing and building appealing and habitable interfaces and studying users in action on the other, needs to be closed somehow.

I believe one path to doing this is formulating several new use cases. These may be put together with various levels of ambition, competence, and insight, but once formulated, HCI specialists can debate and test their validity and IR specialists can set parameters for system benchmarking, based on crucial characteristics of the use case. I believe we need inspiration from meeting people outside the field: that means talking to broadcasters, media editors and archivists, consumer electronic device manufacturers and personal information management researchers, product designers, and net activists of various types.

I proffer as an exhibit to demonstrate the validity of my argument the current (and hopefully transient) success of YouTube as a social space and communication channel. The interface is horrid ergonomically, breaking every design guideline imaginable; the search capabilities and components don’t deserve being called mediocre; the content is of unpredictable aesthetic, cultural, and technical quality. Yet it works. I would like to be able to give a reasoned argument why the major information access success story of the early 2000s cannot be predicted or described through the research results in either HCI or IR that we like to take such pride in.

Some published texts where this argument is made: