Controlled Study · Visual Cues · CHI 2024

Closing the Age Gap with the Right Visual Cue

A controlled experiment (n=56) comparing 4 visual cue types — finding that highlighting with context or weighted zoom eliminates the age gap in mobile UI navigation

Published at CHI 2024: "Reducing the Search Space on Demand Helps Older Adults Find Mobile UI Features Quickly, on Par with Younger Adults" · Yu & Chattopadhyay

Abstract

Building on prior Wizard-of-Oz (WoZ) findings, we systematically compared visual cues for reducing the on-screen search space. A controlled mixed-factorial study (n=56: 28 older + 28 younger adults, 24 tasks, 6 apps) showed that Highlight with Context and Weighted Zoom help older adults find features quickly and eliminate the age gap. All three hypotheses were supported.

Keywords: visual cues, mobile navigation, search space reduction, older adults, multivariate testing, equivalence test, CHI 2024

1. Motivation

From WoZ to Controlled Evaluation

The WoZ study used simple red rectangles for highlighting. They worked, but they were never empirically evaluated. Visually augmenting a feature-rich mobile UI can do more harm than good if the cue adds cognitive load or is not salient enough. There was also no prior evidence on which visual cue design works best for older adults.

Our feature search space reduction pipeline combines Android's Assist API, APE keyword extraction and Universal Sentence Encoder matching. It automatically identifies the top 3 UI elements relevant to a user's voice query. It achieves 87.2% accuracy at a search space size of 3, meaning the target feature is among the 3 identified elements.
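The final matching step can be pictured as nearest-neighbor ranking, and the 87.2% figure as a top-3 hit rate. Below is a minimal sketch under several assumptions. The query and each UI element's text are taken as already embedded (by Universal Sentence Encoder in the source). The toy 3-dimensional vectors and element IDs are hypothetical. Cosine similarity is an assumed choice, since the source does not name the similarity measure.

```python
import math
from typing import Sequence

Vector = Sequence[float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k_elements(query: Vector, elements: dict[str, Vector], k: int = 3) -> list[str]:
    """IDs of the k UI elements most similar to the query, most similar first."""
    ranked = sorted(elements, key=lambda eid: cosine(query, elements[eid]), reverse=True)
    return ranked[:k]

def top_k_accuracy(trials: list[tuple[Vector, dict[str, Vector], str]], k: int = 3) -> float:
    """Share of (query, elements, target_id) trials whose target lands in the top k."""
    hits = sum(target in top_k_elements(q, elems, k) for q, elems, target in trials)
    return hits / len(trials)

# Toy example with hypothetical element IDs and embeddings.
elements = {
    "scan_barcode": [0.9, 0.1, 0.0],
    "search": [0.5, 0.5, 0.0],
    "cart": [0.0, 1.0, 0.0],
    "account": [0.0, 0.0, 1.0],
    "stores": [0.1, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]  # stand-in for an embedded voice query
print(top_k_elements(query, elements))  # ['scan_barcode', 'search', 'cart']
```

Under this reading, "accuracy at a search space size of 3" corresponds to `top_k_accuracy(..., k=3)` over a set of labeled queries.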

2. Candidate Visual Cue Designs

Six Designs Evaluated in Two Phases

Figure 3. Six candidate cues: highlight (a, b), highlight with context (c), zoom (d), weighted zoom (e), paired with content labels (a, c) or manually authored instructions (b, d, e). (CHI 2024)
Figure 4. Four spatial cue types on the same task screen: highlight (H), highlight with context (HC), zoom (Z), weighted zoom (WZ). Task: "Scan the barcode of a dress to get more information." (CHI 2024)
  • Highlight (H): a 20% black overlay covers the entire screen except the 3 target elements.
  • Highlight with Context (HC) ★: target elements are shown with their surrounding context (a 3×-diameter circle); the immediate neighborhood is not darkened.
  • Zoom (Z): all three target elements are magnified equally (×1.3).
  • Weighted Zoom (WZ) ★: target elements are magnified according to relevance rank (×1.6, ×1.4, ×1.2).
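The per-element rendering rules above can be summarized as a handful of parameters. This is a minimal sketch, and several parts are hypothetical: the `Element` type, its bounding-circle geometry, and reading "3× diameter" as a context circle three times the target's diameter. The scale factors and overlay opacity are the values listed above.

```python
import math
from dataclasses import dataclass

ZOOM_SCALE = 1.3                        # Z: all targets magnified equally
WEIGHTED_ZOOM_SCALES = (1.6, 1.4, 1.2)  # WZ: relevance ranks 1, 2, 3
OVERLAY_OPACITY = 0.20                  # H/HC: 20% black overlay
CONTEXT_DIAMETER_FACTOR = 3.0           # HC: context circle 3x the target's diameter (assumed)

@dataclass(frozen=True)
class Element:
    eid: str
    cx: float        # center x (px)
    cy: float        # center y (px)
    diameter: float  # hypothetical bounding-circle diameter (px)

def element_scale(cue: str, eid: str, ranked_targets: list[str]) -> float:
    """Magnification of one element under a cue; 1.0 means unchanged."""
    top = ranked_targets[:len(WEIGHTED_ZOOM_SCALES)]  # only the top 3 are cued
    if eid not in top:
        return 1.0
    if cue == "Z":
        return ZOOM_SCALE
    if cue == "WZ":
        return WEIGHTED_ZOOM_SCALES[top.index(eid)]
    return 1.0  # H and HC do not magnify

def overlay_opacity(cue: str, el: Element, targets: list[Element]) -> float:
    """Opacity of the black overlay over one element; 0.0 means not darkened."""
    if cue not in ("H", "HC") or el in targets:
        return 0.0
    if cue == "HC":
        for t in targets:
            radius = CONTEXT_DIAMETER_FACTOR * t.diameter / 2
            if math.hypot(el.cx - t.cx, el.cy - t.cy) <= radius:
                return 0.0  # inside a target's context circle
    return OVERLAY_OPACITY
```

In this framing, HC differs from H only by the context exemption, and WZ differs from Z only in looking up the scale by rank.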

3. Experiment

Mixed Factorial Design · Online · 56 Participants

Between-group factor: age (older adults, OA, 60+ vs. younger adults, YA, 20–29). Within-group factor: visual cue type (4 levels in Phase 1). Task: find a UI feature within 13 seconds. If the participant failed, a visual cue was shown, with the cue type chosen at random. There were 24 tasks across 6 apps (Starbucks, Uniqlo, AMC, Ventra, Subway, Audible).

  • 56 participants (28 OA, 28 YA); all own a touchscreen smartphone
  • 24 tasks across 6 feature-rich apps; 13-second baseline timeout
  • GLMM in R (lme4, REML); TOST equivalence test for H3; α = .05

Hypotheses:
  • H1: OA are less efficient and less accurate than YA at baseline.
  • H2: A reduced search space improves OA efficiency and accuracy.
  • H3: There is no large age effect when the search space is reduced (equivalence, d ≤ .55).
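H3 is an equivalence claim, so a non-significant difference test would not be enough. A two one-sided tests (TOST) procedure instead checks whether the difference lies inside ±bound. Below is a minimal sketch of a two-sample TOST under these assumptions:

- the samples are independent;
- the bound d = .55 is converted to raw units via the pooled SD;
- a large-sample normal approximation (`statistics.NormalDist`) replaces t distributions.

The function name and toy data are hypothetical. The study's own TOST, run alongside its lme4 models, may be set up differently.

```python
import math
from statistics import NormalDist, mean, stdev

def tost_two_sample(x: list[float], y: list[float], d_bound: float = 0.55) -> float:
    """TOST p-value for equivalence of two independent means.

    The equivalence bounds are +/- d_bound in Cohen's d units (pooled SD).
    Uses a normal approximation rather than t distributions.
    Equivalence is concluded when the returned p-value is below alpha.
    """
    nx, ny = len(x), len(y)
    pooled_sd = math.sqrt(((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2))
    diff = mean(x) - mean(y)
    se = pooled_sd * math.sqrt(1 / nx + 1 / ny)
    bound = d_bound * pooled_sd
    z = NormalDist()
    p_lower = 1 - z.cdf((diff + bound) / se)  # one-sided test of H0: diff <= -bound
    p_upper = z.cdf((diff - bound) / se)      # one-sided test of H0: diff >= +bound
    return max(p_lower, p_upper)              # both one-sided tests must reject

same = [float(v) for v in range(1, 31)]
shifted = [v + 20 for v in same]
print(tost_two_sample(same, same) < 0.05)     # True: equivalent within d = .55
print(tost_two_sample(same, shifted) < 0.05)  # False: difference far outside the bound
```

Taking the larger of the two one-sided p-values is what makes TOST conservative: equivalence is claimed only when the difference is shown to be above the lower bound and below the upper bound.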

4. Results

All Three Hypotheses Supported

Figure 5. Task completion times (left) and preferences (right) of older adults. HC and WZ are fastest and most preferred; H and Z are significantly slower. (CHI 2024, Fig. 5)
Figure 6. With a complete search space, OA are significantly slower and less accurate than YA. With a reduced search space, differences are not significant. (CHI 2024, Fig. 6)
  • H1 ✓: Without cues, OA took significantly longer (M=6.24s vs. 4.15s, p<.001, r=1.66) and made more errors (p=.006).
  • H2 ✓: With cues, OA completed tasks in M=4.49s vs. 6.24s without (p<.001) and made significantly fewer errors (p=.001).
  • H3 ✓: The TOST equivalence test was significant for both completion time (p=.007) and error rate (p=.038), indicating no large age effect (d ≤ .55).
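To put the reported means in proportion, here is simple arithmetic on the numbers above. These are descriptive differences of means only, not model-based estimates from the GLMM.

```latex
\begin{aligned}
\text{Baseline gap (OA vs.\ YA)} &= 6.24\,\text{s} - 4.15\,\text{s} = 2.09\,\text{s},
  &\quad \tfrac{2.09}{4.15} &\approx 50\%\ \text{longer than YA} \\
\text{OA time saved with cues} &= 6.24\,\text{s} - 4.49\,\text{s} = 1.75\,\text{s},
  &\quad \tfrac{1.75}{6.24} &\approx 28\%\ \text{of the uncued time}
\end{aligned}
```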

Key Finding

HC and WZ were the best-performing and most-preferred cue types. With either, older adults performed on par with younger adults. Text cue quality (content labels vs. manually authored instructions) had no significant effect on performance.