The controlled experiment approach to evaluating user interfaces
Research methods for Human-Computer Interaction

8 May 2011

This post is part of a series of notes I collated during my studies at UCL’s Interaction Centre (UCLIC).

The question most commonly asked: “Does making a change to the value of variable x have a significant effect on the value of variable y?”

x could be an interface or interaction feature.
y could be the time to complete a task, number of errors, workload, the user’s subjective satisfaction, etc.

Controlled experiments are more widely used in HCI research than in practice, where the costs of designing and running experiments often outweigh their benefits.

The Method

1. Participants

Consider what the appropriate user population is:

A representative sample of the user population is recruited as participants (not always feasible).
If a non-representative sample of users is involved, consequences must be considered.
How many participants to recruit depends upon the power of the statistical tests, the time available for the study, the ease of recruiting participants, the incentives available for participant rewards, etc.
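The link between statistical power and the number of participants can be sketched with a rough calculation. This is a minimal sketch, not a substitute for a proper power analysis: it assumes a two-group comparison of means, a standardised effect size (Cohen's d) that you have to estimate yourself, and the conventional defaults of a 5% significance level and 80% power, none of which come from these notes. The function name is hypothetical, and the normal approximation used here slightly underestimates the number a t-test based calculation would give.

```python
from math import ceil
from statistics import NormalDist


def participants_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group for a two-sided comparison.

    Normal approximation: n = 2 * ((z_alpha + z_power) / d) ** 2,
    where d is the assumed standardised effect size (Cohen's d).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Assumed effect sizes, chosen only for illustration.
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {participants_per_group(d)} participants per group")
```

The smaller the effect you expect, the more participants are needed to detect it, which is why the time and recruitment constraints listed above matter.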

2. Ethical considerations

Blandford et al. (2008) – VIP:

Vulnerable participants.
Informed consent.
Privacy, confidentiality and trust.

Ensure all participants are informed of the purpose of the study and what will be done with the data.

Anonymise data where possible.

Offer the participants the opportunity to talk about the experiment in a debriefing session after they have finished the tasks.

Data stored in accordance with legislation:

UK Data Protection Act.
Need to register with the government if identifiable data is stored.

3. Design: Dependent and independent variables

A controlled experiment tests a hypothesis – the effects of a design change on a measurable performance indicator.

Hypothesis example:

“A particular combination of speech and key-press input will greatly enhance the speed and accuracy of people sending text messages on their mobile phone.”

The aim of the experiment is, formally, to reject the null hypothesis.

The null hypothesis is the assumption that there is no difference between designs. For the above hypothesis: the new input combination makes no difference to the speed and accuracy of texting.

The study is designed so that, if the null hypothesis were true, the conditions would differ by no more than chance would produce.

If the observed difference is too large to be plausibly explained by chance, the null hypothesis is rejected. In a well-controlled experiment, this provides evidence that there is a causal relationship between the independent and the dependent variables.
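The logic of testing a null hypothesis can be sketched as an exact permutation test. This is one possible analysis, not the method prescribed by the source: if the design really made no difference, the condition labels would be interchangeable, so we count how often a relabelling of the data produces a difference at least as large as the one observed. The function name and the task times are hypothetical, invented purely for illustration.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference between group means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    at_least_as_extreme = 0
    splits = 0
    # Under the null hypothesis every split of the pooled data is equally likely.
    for chosen in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in chosen)
        difference = abs(sum_a / n_a - (total - sum_a) / n_b)
        if difference >= observed - 1e-9:  # tolerance for float rounding
            at_least_as_extreme += 1
        splits += 1
    return at_least_as_extreme / splits


# Hypothetical task times in seconds, invented purely for illustration.
key_press_only = [41.2, 38.5, 44.0, 39.8, 42.1, 40.3]
speech_and_key_press = [35.1, 33.8, 37.2, 34.5, 36.0, 32.9]

p = permutation_p_value(key_press_only, speech_and_key_press)
print(f"p = {p:.4f}")
```

In this made-up data every time in one group exceeds every time in the other, so only 2 of the 924 possible splits are as extreme as the observed one. A small p-value like this is grounds for rejecting the null hypothesis; it does not by itself rule out confounds.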

Variables:

Independent variable(s) – the variable that is intentionally varied (one or many).
Dependent variable(s) – the variable that is measured (time, error rate, workload).
Confounding variable(s) – variables that are unintentionally varied between conditions of the experiment and which affect the measured values of the dependent variable.
- Example: when testing two interfaces for text entry, using a different message on each device would affect entry time.
- Example: complexity of certain words, using dictionary mode vs. not, using natural language vs. “text speak”.

A simple answer to the texting example is to make sure each message is input on each device.

The aim is to vary the independent variable in a known manner, to measure the dependent variable(s) and to minimise the effects of the confounding variables on the outcome of the study.

Avoid different rooms, different computers.
Randomise variables such as the time of day.
Be aware of differences between individuals.
- Men vs. women
- Personality
- Aesthetic sensibilities
- Cognitive skills
- Age
- Education level

4. Design: “Within Subjects” or “Between Subjects”

Subjects:

Within subjects – each participant performs under all sets of conditions.
Between subjects – each participant only performs under one condition.
Mixed factorial – one independent variable is within subjects and another is between subjects.
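For a between subjects design, participants are usually allocated to conditions at random so that individual differences are spread evenly across groups. A minimal sketch, assuming anonymised participant codes and two hypothetical condition names; the function name and the fixed seed (used only to make the example repeatable) are also assumptions:

```python
import random


def assign_between_subjects(participants, conditions, seed=None):
    """Randomly allocate each participant to exactly one condition, in balanced groups."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    # Deal the shuffled participants round-robin so group sizes differ by at most one.
    for index, participant in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(participant)
    return groups


participants = [f"P{n:02d}" for n in range(1, 13)]  # hypothetical participant codes
groups = assign_between_subjects(participants, ["speech", "key-press"], seed=1)
for condition, members in groups.items():
    print(condition, members)
```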

Within vs. between

Are participants required to compare interfaces (therefore a within subjects design)?
Are there likely to be unwelcome learning or interference effects across conditions (therefore a between subjects design)?
What statistical tests are planned?
Time taken to complete the task? (The longer the task, the less feasible a within subjects design becomes.)
How easy is it to recruit participants? (The more people, the more feasible a between subjects design is).

Within subjects

Advantage: individual differences are less likely to influence results.
Disadvantage: possible learning effects, complex statistics.
Participants typically required to repeat similar procedures multiple times with different values of the independent variable.
Advisable to generate multiple tasks, one for each condition, for each participant to perform.
- The task becomes an independent variable, but one with no direct interest in analysis.
- Different tasks/values sometimes referred to as levels.
- Example, 2 independent variables: mode of navigation (speech, key-press), mode of message (speech, predictive text, multi-tap).

Nav \ Message	Speech	Predictive text	Multi-tap
Speech	Condition 1	Condition 2	Condition 3
Key-press	Condition 4	Condition 5	Condition 6
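The conditions in the table are every combination of the levels of the two independent variables. A minimal sketch of enumerating them, using the level names from the example above (the variable names are my own):

```python
from itertools import product

navigation_modes = ["speech", "key-press"]
message_modes = ["speech", "predictive text", "multi-tap"]

# Number the conditions row by row, matching the table above.
conditions = {
    number: {"navigation": navigation, "message": message}
    for number, (navigation, message) in enumerate(
        product(navigation_modes, message_modes), start=1
    )
}

for number, levels in conditions.items():
    print(f"Condition {number}: navigation={levels['navigation']}, message={levels['message']}")
```

Two levels of navigation crossed with three levels of message mode give six conditions. Each extra variable or level multiplies the number of conditions, which in a within subjects design lengthens every participant's session.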

5. Apparatus and materials

Software simulators
[Smartphone] emulators

Often require some programming skills to create prototypes.

6. Procedure

The procedure describes what the participants are going to do during the experiment.

Ensures that every participant in the experiment has the same experience, which helps eliminate confounds.
Example: in a study of young vs. old participants, you don’t want to treat the latter more deferentially.
Allows others to replicate the experiment – the basis of “good science” – which helps eliminate confounds between entirely different attempts.

Minimising the effect of confounds

Control the order of the conditions – to counter performance effects (learning) and novelty effects.
Control the tasks that the participants are to perform – devise different tasks to avoid learning effects and reduce boredom.
Control the context in which the study is run – the computer, the room, the time of day. Even seemingly innocuous changes can influence results.

A systematic approach to variation is appropriate.

“Latin square” design – a grid in which every element appears precisely once in each row and each column.
- Each row gives the order in which the elements are administered to one group of participants.
- Reading down a column steps through the successive groups of participants.

Group \ Task	First	Second	Third	Fourth
One	A	B	C	D
Two	B	C	D	A
Three	C	D	A	B
Four	D	A	B	C
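A square like the one above can be generated by rotating the list of conditions one step per group. This is a minimal sketch of the cyclic construction only; the function names are hypothetical. Note that a cyclic square balances the position of each condition but not which condition precedes which (A is always followed by B here), so it does not control for every carry-over effect.

```python
def latin_square(conditions):
    """Build a cyclic Latin square: row r is the condition list rotated left by r."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


def is_latin_square(square):
    """Check that every symbol appears exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return len(symbols) == n and rows_ok and cols_ok


square = latin_square(["A", "B", "C", "D"])
for group, order in zip(["One", "Two", "Three", "Four"], square):
    print(group, " ".join(order))
print("Valid Latin square:", is_latin_square(square))
```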

Mixed factorial example – the order of presentation of tasks and interfaces is systematically varied to eliminate order effects.

Group \ Task	First	Second
One	Interface 1, Task A	Interface 2, Task B
Two	Interface 1, Task B	Interface 2, Task A
Three	Interface 2, Task A	Interface 1, Task B
Four	Interface 2, Task B	Interface 1, Task A
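The four groups above are every pairing of an interface order with a task order. A minimal sketch of generating that schedule, with names taken from the table (the function and variable names are my own):

```python
from itertools import product

interface_orders = [("Interface 1", "Interface 2"), ("Interface 2", "Interface 1")]
task_orders = [("Task A", "Task B"), ("Task B", "Task A")]


def counterbalanced_schedule(interface_orders, task_orders):
    """Pair every interface order with every task order, one group per pairing."""
    schedule = []
    for interfaces, tasks in product(interface_orders, task_orders):
        # zip pairs the first interface with the first task, and so on.
        schedule.append(list(zip(interfaces, tasks)))
    return schedule


schedule = counterbalanced_schedule(interface_orders, task_orders)
for group, sessions in zip(["One", "Two", "Three", "Four"], schedule):
    print(group, " then ".join(f"{interface}, {task}" for interface, task in sessions))
```

Participants would then be allocated to the four groups in equal numbers, so that each interface appears first as often as second and is paired with each task equally often.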

Making the experiment robust

Clear and consistent task instructions.
- Level of detail required.
- Reasonable time limit.
- Engaging.
Piloting the experiment to ensure people behave as anticipated.
- Ensure correct data is being gathered.
Ensure all equipment is working properly.

Bigger investigations

Probes the phenomenon more deeply.
Series of limited experiments each of which involves a simple controlled manipulation.
More reliable to conduct a series of investigations.

References and further reading

Cairns, P., & Cox, A. L. (Eds.). (2008). Research methods for human-computer interaction. Cambridge University Press. https://doi.org/10.1017/CBO9780511814570

Updated on: 13 February 2021
