Data-driven models help mobile app designers understand best practices and trends, and can be used to make predictions about design performance and support the creation of adaptive UIs. This paper presents Rico, the largest repository of mobile app designs to date, created to support five classes of data-driven applications: design search, UI layout generation, UI code generation, user interaction modeling, and user perception prediction. To create Rico, we built a system that combines crowdsourcing and automation to scalably mine design and interaction data from Android apps at runtime. The Rico dataset contains design data from more than 9.7k Android apps spanning 27 categories. It exposes visual, textual, structural, and interactive design properties of more than 72k unique UI screens. To demonstrate the kinds of applications that Rico enables, we present results from training an autoencoder for UI layout similarity, which supports query-by-example search over UIs.
Biplab Deka, Zifeng Huang, Chad Franzen, Joshua Hibschman, Daniel Afergan, Yang Li, Jeffrey Nichols and Ranjitha Kumar. 2017. Rico: A Mobile App Dataset for Building Data-Driven Design Applications. In Proceedings of the 30th Annual Symposium on User Interface Software and Technology (UIST '17).

This work was supported in part by a Google Faculty Research Award.
The Dataset
We mined over 9.7k free Android apps from 27 categories to create the Rico dataset. Apps in the dataset had an average user rating of 4.1. The Rico dataset contains visual, textual, structural, and interactive design properties of more than 72k unique UI screens and 3M UI elements.

Number of apps in different categories
Data Collection
Rico Android farm
Rico was built by mining Android apps at runtime via human-powered and programmatic exploration. Like its predecessor ERICA, Rico’s app mining infrastructure requires no access to — or modification of — an app’s source code. Apps are downloaded from the Google Play Store and served to crowd workers through a web interface. When crowd workers use an app, the system records a user interaction trace that captures the UIs visited and the interactions performed on them. Then, an automated agent replays the trace to warm up a new copy of the app and continues the exploration programmatically, leveraging a content-agnostic similarity heuristic to efficiently discover new UI states. By combining crowdsourcing and automation, Rico can achieve higher coverage over an app’s UI states than either crawling strategy alone. In total, 13 workers recruited on UpWork spent 2,450 hours using apps on the platform over five months, producing 10,811 user interaction traces. After collecting a user trace for an app, we ran the automated crawler on the app for one hour.
Design Representation
Rico dataset contents
For each app, Rico exposes Google Play Store metadata, a set of user interaction traces, and a list of all the unique UIs discovered. The Play Store metadata includes an app’s category, average rating, number of ratings, and number of downloads. Each user trace is composed of a sequence of UIs and user interactions that connect them. Each UI comprises a screenshot, an augmented Android view hierarchy, a set of explored user interactions, a set of animations capturing transition effects in response to user interaction, and a learned vector representation of the UI’s layout. View hierarchies capture all of the elements comprising a UI, their properties, and relationships between them. For each element, Rico exposes its visual properties such as screen position, dimensionality, and visibility, textual properties such as class name, id, and displayed text, structural properties such as a list of its children in the hierarchy, and interactive properties such as the ways a user can interact with it. Additionally, we annotate elements with any Android superclasses that they are derived from (e.g., TextView), which can help third-party applications reason about element types.
Deep Learning Applications
The Rico dataset is large enough to support deep learning applications. We trained an autoencoder to learn an embedding for UI layouts, and used it to annotate each UI with a 64-dimensional vector representation encoding visual layout. This vector representation can be used to compute structurally — and often semantically — similar UIs, supporting example-based search over the dataset. To create training inputs for the autoencoder that embed layout information, we constructed a new image for each UI capturing the bounding box regions of all leaf elements in its view hierarchy, differentiating between text and non-text elements. Rico’s view hierarchies obviate the need for noisy image processing or OCR techniques to create these inputs.

Deep learning training procedure
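As a rough illustration of the input construction described above, the sketch below rasterizes the leaf elements of one view hierarchy into a small two-channel array, one channel for text elements and one for non-text elements. The output resolution, the [left, top, right, bottom] reading of bounds, and the TextView-based test for text are assumptions made for this example; they are not details confirmed on this page, and the function names are hypothetical.

```python
import numpy as np

SCREEN_W, SCREEN_H = 1440, 2560  # screen window that element bounds refer to


def iter_leaves(node):
    """Yield every element of the hierarchy that has no children."""
    children = [c for c in (node.get("children") or []) if c]
    if not children:
        yield node
    for child in children:
        yield from iter_leaves(child)


def is_text(node):
    """Assumed heuristic: an element is text if it is, or derives from, TextView."""
    names = [node.get("class", "")] + list(node.get("ancestors") or [])
    return any(name.endswith("TextView") for name in names)


def layout_image(root, width=72, height=128):
    """Rasterize leaf bounding boxes into a (2, height, width) array.

    Channel 0 marks text leaves, channel 1 marks non-text leaves.
    The default resolution is an arbitrary choice for this sketch.
    """
    image = np.zeros((2, height, width), dtype=np.float32)
    for leaf in iter_leaves(root):
        bounds = leaf.get("bounds")
        if not bounds or len(bounds) != 4:
            continue
        left, top, right, bottom = bounds  # assumed coordinate order
        x0 = max(0, int(left * width / SCREEN_W))
        x1 = min(width, int(np.ceil(right * width / SCREEN_W)))
        y0 = max(0, int(top * height / SCREEN_H))
        y1 = min(height, int(np.ceil(bottom * height / SCREEN_H)))
        if x1 > x0 and y1 > y0:
            image[0 if is_text(leaf) else 1, y0:y1, x0:x1] = 1.0
    return image
```

Calling layout_image(hierarchy["activity"]["root"]) on a parsed view hierarchy would give an array that could be fed to an autoencoder; the network itself is not sketched here.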
1. UI Screenshots and View Hierarchies
Contains 72k+ unique UI screens. For each UI, we present a screenshot (PNG file) and a detailed view hierarchy (JSON file). A few sample UI screenshots are shown below.
A few example screenshots

A sample view hierarchy file is shown below. The activity_name contains the name of the app package as well as the name of the activity the UI belongs to. All elements in the UI can be accessed by traversing the view hierarchy starting at the root node: ["activity"]["root"]. For each element, the class property specifies its class name, and the ancestors property contains a list of its superclasses. The bounds property specifies an element's bounding box within a 1440x2560 screen window.
{ "activity_name": "",
  "activity": {
    "root": {
      "scrollable-horizontal": false,
      "draw": true,
      "ancestors": [ ... ],
      "clickable": false,
      "pressed": "not_pressed",
      "focusable": false,
      "long-clickable": false,
      "enabled": true,
      "bounds": [ ... ],
      "visibility": "visible",
      "content-desc": [ ... ],
      "rel-bounds": [ ... ],
      "focused": false,
      "selected": false,
      "scrollable-vertical": false,
      "children": [ ... ],
      "adapter-view": false,
      "abs-pos": true,
      "pointer": "2e18ce7",
      "class": "$DecorView",
      "visible-to-user": true
    },
    "added_fragments": [],
    "active_fragments": []
  },
  "is_keyboard_deployed": true,
  "request_id": "1350"
}
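The traversal described above can be sketched in a few lines of Python. This is a minimal example, not an official loader: the helper names are hypothetical, and the tiny hierarchy embedded in the code uses made-up values in place of a real file. It starts at ["activity"]["root"], walks the children lists depth-first, and reads the class, ancestors, and bounds properties of each element.

```python
import json


def iter_elements(node):
    """Depth-first traversal yielding node and every element beneath it."""
    if not node:  # tolerate missing or null children
        return
    yield node
    for child in node.get("children") or []:
        yield from iter_elements(child)


def summarize(hierarchy):
    """Return (class, ancestors, bounds) for each element of a parsed hierarchy."""
    root = hierarchy["activity"]["root"]
    return [
        (el.get("class"), el.get("ancestors", []), el.get("bounds"))
        for el in iter_elements(root)
    ]


# Tiny hand-written hierarchy with hypothetical values, standing in for a real file.
sample = json.loads("""
{
  "activity_name": "com.example.app/com.example.app.MainActivity",
  "activity": {"root": {
    "class": "android.widget.FrameLayout",
    "ancestors": ["android.view.ViewGroup", "android.view.View"],
    "bounds": [0, 0, 1440, 2560],
    "children": [
      {"class": "android.widget.TextView",
       "ancestors": ["android.view.View"],
       "bounds": [0, 84, 1440, 280]}
    ]
  }}
}
""")

for cls, ancestors, bounds in summarize(sample):
    print(cls, bounds)
```

For a file from the dataset, the same functions would apply after reading it with json.load.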
2. UI Metadata
Contains metadata about each UI screen: the name of the app it came from, the user interaction trace within that app, and the UI number within that trace. A few lines from the CSV file are shown below.
UI Number,App Package Name,Interaction Trace Number,UI Number in Trace
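One way to use this file is to group UI screens by the app they came from. The sketch below assumes the numeric columns hold integers; the data rows are hypothetical placeholders and only the header is taken from the file. The function name is likewise made up for this example.

```python
import csv
import io
from collections import defaultdict

# Header as listed above; the data rows are hypothetical placeholders.
SAMPLE_CSV = """UI Number,App Package Name,Interaction Trace Number,UI Number in Trace
1,com.example.alpha,0,15
2,com.example.alpha,0,48
3,com.example.beta,1,7
"""


def uis_by_app(csv_file):
    """Map each app package to a list of (ui_number, trace_number, ui_in_trace)."""
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        grouped[row["App Package Name"]].append(
            (
                int(row["UI Number"]),
                int(row["Interaction Trace Number"]),
                int(row["UI Number in Trace"]),
            )
        )
    return dict(grouped)


index = uis_by_app(io.StringIO(SAMPLE_CSV))
print(index["com.example.alpha"])
```

With the real CSV, an open file handle (opened with newline="") can be passed in place of the io.StringIO object.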
3. UI Layout Vectors
Contains a 64-dimensional vector representation for each UI screen, encoding its layout based on the distribution of text and images. To access the layout vector for a particular UI, first find its index in the array contained in ui_names.json. Then, load the 2-D array in ui_vectors.npy and take the slice at that index along the first dimension. For example, the UI 20353.png is at index 2, so its layout vector can be obtained with ui_vectors[2,:] in Python.
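The lookup can be wrapped in a small helper, sketched below. To keep the example runnable without the dataset it uses stand-in data: the file names other than 20353.png and all vector values are made up. With the real files, ui_vectors would come from np.load("ui_vectors.npy") and ui_names from the array stored in ui_names.json; how that array is nested inside the JSON file is not specified here, so the helper simply takes it as a Python list.

```python
import numpy as np


def layout_vector(ui_file, ui_names, ui_vectors):
    """Return the layout vector for ui_file (for example "20353.png")."""
    index = ui_names.index(ui_file)  # raises ValueError if the UI is unknown
    return ui_vectors[index, :]


# Stand-in data: hypothetical names (except 20353.png) and made-up vector values.
ui_names = ["1001.png", "1002.png", "20353.png"]
ui_vectors = np.arange(3 * 64, dtype=np.float32).reshape(3, 64)

vec = layout_vector("20353.png", ui_names, ui_vectors)
print(vec.shape)
```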

This representation can be used to cluster and retrieve similar UIs from different apps. Below are some results that illustrate nearest-neighbor search in the learned layout space.

Similar UIs
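A nearest-neighbor query over the layout vectors might look like the following sketch. Euclidean distance is an assumption made for illustration, since the distance metric is not stated here, and the random vectors stand in for the contents of ui_vectors.npy. The function name is hypothetical.

```python
import numpy as np


def nearest_uis(query_index, ui_vectors, k=5):
    """Return indices of the k UIs closest to the query in Euclidean distance."""
    diffs = ui_vectors - ui_vectors[query_index]
    distances = np.linalg.norm(diffs, axis=1)
    distances[query_index] = np.inf  # exclude the query itself
    k = min(k, len(ui_vectors) - 1)
    return np.argsort(distances)[:k]


# Random stand-in vectors; real ones would be loaded from ui_vectors.npy.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(100, 64)).astype(np.float32)
vectors[7] = vectors[3] + 0.001  # plant a near-duplicate of UI 3

print(nearest_uis(3, vectors, k=3))
```

A brute-force scan like this is adequate for tens of thousands of 64-dimensional vectors; an approximate index would only matter at much larger scale.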
4. Interaction Traces
Contains user interaction traces organized by app. Each app can have multiple traces: trace_0, trace_1, etc. Each trace comprises a sequence of UIs (shown below) captured as screenshots and view hierarchies.

Contents of an interaction trace

Each trace also has a corresponding gestures.json file, which captures the XY coordinates of user interactions performed on each UI screen (example below). A UI with a single pair of XY coordinates represents a tap; a UI with multiple XY coordinates represents a swipe. In the example below, a user tapped on UI 48, and swiped on UI 73.

  "48": [[0.2671957671957672, 0.7721088435374149]],
  "73": [[0.5302343159486017, 0.36904761904761907],
         [0.5302343159486017, 0.36904761904761908],
         [0.5302343159486017, 0.36904761904761909],
         [0.5302343159486017, 0.36904761904761910],
         [0.5302343159486017, 0.36904761904761911]],
  "550": [[0.36999244142101284, 0.7721088435374149]],
  "764": [[0.5483749055177627, 0.3758503401360544]],
  "828": [[0.46674225245653816, 0.7704081632653061]],
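The tap and swipe rule above translates directly into code. In the sketch below, the swipe coordinates in the sample are hypothetical, the "none" label for a UI with no recorded coordinates is an assumed extra case, and the pixel conversion assumes the values are fractions of the 1440x2560 screen window, which this page does not state explicitly.

```python
import json


def classify_gestures(gestures):
    """Label each UI's recorded interaction as a tap, a swipe, or none."""
    labels = {}
    for ui, points in gestures.items():
        if len(points) == 0:
            labels[ui] = "none"  # assumed case: no interaction recorded
        elif len(points) == 1:
            labels[ui] = "tap"
        else:
            labels[ui] = "swipe"
    return labels


def to_pixels(point, width=1440, height=2560):
    """Scale a coordinate pair to pixels, assuming values are screen fractions."""
    x, y = point
    return round(x * width), round(y * height)


# Stand-in for a parsed gestures.json; the swipe coordinates are hypothetical.
sample = json.loads("""
{
  "48": [[0.2671957671957672, 0.7721088435374149]],
  "73": [[0.53, 0.37], [0.53, 0.45], [0.53, 0.52]]
}
""")

labels = classify_gestures(sample)
print(labels)
print(to_pixels(sample["48"][0]))
```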
5. Animations
Contains GIFs that show how screens animate in response to a user interaction. The files follow the same folder structure introduced for interaction traces. Example animations are shown below.

Animation 1 Animation 2 Animation 3
6. Play Store Metadata
Contains metadata about the apps in the dataset including an app’s category, average rating, number of ratings, and number of downloads.