Shirbi
Ish-Shalom

Roundtable – Experimentation: Beyond A/B Testing

Intuit


Bio

Shirbi Ish-Shalom is a product manager at Intuit building data-driven products. She focuses on managing sophisticated recommendation engines and driving experimentation. Previously, she conducted research in the Data Science and Computational Biology division at Pfizer where she focused on personalized medicine in genetics. She completed her undergraduate and master’s degrees in Biomedical Informatics at Stanford University.

Abstract

Traditional methods of experimentation have been used in industry for years. However, state-of-the-art experimentation techniques offer new opportunities to tailor product features to user behavior more rapidly, more accurately, and at larger scale. This roundtable discusses the latest techniques (including multi-armed bandits and sequential hypothesis testing), common pitfalls, and best practices. The goal of the roundtable is to increase understanding of these methods and provide guidance for cutting-edge experimentation.

Discussion Points

  • What is the current status of experimentation? Which techniques are currently employed at Intuit and other companies?
  • Three topics to focus on: 1. Outline of best practices; 2. Common pitfalls; 3. Latest and greatest techniques (what are they and how do we use them?). Outline 2-3 techniques that go beyond standard A/B testing (e.g., multi-armed bandit systems, sequential tests).

  • Deep dive 1:
    Based on my experience with experimentation at Intuit and other companies, I have found that subtleties in the most basic questions deeply impact experimental results. When can I end my experiment? How many secondary metrics can I track? What makes a good primary metric? Though seemingly high-level, the answers to these questions have unintuitive technical underpinnings that are essential to decision-making. For example, many companies target underpowered primary metrics, track secondary metrics without proper statistical corrections, or unknowingly end experiments prematurely based on random noise. Discussion will center on navigating these pitfalls, with worked examples from Intuit.

  • Deep dive 2:

    Some experiments at Intuit face the low-traffic problem: certain product features receive visitors too infrequently to run experiments on feasible timescales. New improvements in sequential testing offer ways to accelerate the time to detection of true effects. Other experiments face the opposite problem in high-traffic settings: revenue lost to missed conversions under the traditional A/B testing framework. Multi-armed bandit systems can minimize these potentially lost conversions by optimizing the exploration-exploitation tradeoff. We will review leading multi-armed bandit techniques and discuss their application in industry settings.

  • Summary & wrap-up
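To make the power and multiple-testing pitfalls from deep dive 1 concrete, here is a minimal sketch (not Intuit's actual tooling; the conversion rates are illustrative) of a sample-size calculation for a two-proportion test, with a Bonferroni correction showing how each additional tracked metric inflates the traffic requirement:

```python
from statistics import NormalDist

def sample_size_per_arm(p1, p2, alpha=0.05, power=0.8, n_metrics=1):
    """Approximate per-arm sample size for a two-sided two-proportion z-test.

    A Bonferroni correction divides alpha by the number of metrics tested,
    illustrating why each extra secondary metric raises the traffic needed
    to keep the same power.
    """
    z = NormalDist().inv_cdf
    a = alpha / n_metrics              # Bonferroni-adjusted significance level
    z_alpha = z(1 - a / 2)             # two-sided critical value
    z_beta = z(power)                  # quantile for the desired power
    p_bar = (p1 + p2) / 2              # pooled rate under the null
    num = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
           + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return int(num / (p2 - p1) ** 2) + 1

# Detecting a lift from a 5% to a 6% conversion rate with one primary metric:
base = sample_size_per_arm(0.05, 0.06)
# The same lift tested across five metrics under Bonferroni needs more traffic:
corrected = sample_size_per_arm(0.05, 0.06, n_metrics=5)
```

Running the numbers shows why an "underpowered primary metric" is so easy to end up with: a one-point lift on a 5% baseline already requires thousands of visitors per arm, and multiple-testing corrections push that requirement up further.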
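As an illustration of the sequential-testing idea in deep dive 2, the following is a toy version of Wald's sequential probability ratio test (SPRT) for a conversion rate, a classical precursor of the modern always-valid sequential methods; the hypothesized rates p0 and p1 and the error rates are made-up inputs, not values from any Intuit experiment:

```python
import math

def sprt_bernoulli(outcomes, p0, p1, alpha=0.05, beta=0.2):
    """Wald's sequential probability ratio test for Bernoulli outcomes.

    Processes observations one at a time and stops as soon as the cumulative
    log-likelihood ratio crosses a decision boundary, often well before a
    fixed-horizon test would have collected its full sample -- the property
    that makes sequential methods attractive in low-traffic settings.
    """
    upper = math.log((1 - beta) / alpha)   # cross above: accept H1 (p = p1)
    lower = math.log(beta / (1 - alpha))   # cross below: accept H0 (p = p0)
    llr = 0.0
    for n, converted in enumerate(outcomes, start=1):
        if converted:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "continue", len(outcomes)

# A stream with no conversions ends the test early in favor of H0,
# well before all 100 observations are consumed:
decision, n_seen = sprt_bernoulli([0] * 100, p0=0.05, p1=0.10)
```

Note the built-in guard against the "peeking" pitfall from deep dive 1: the stopping boundaries are fixed in advance from alpha and beta, so looking at the data after every visitor is part of the design rather than a source of inflated false positives.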
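To ground the exploration-exploitation discussion, here is a minimal Thompson-sampling bandit over Bernoulli conversion rates, one of the leading techniques the roundtable will review; the arm count and true rates below are invented for illustration:

```python
import random

def thompson_pick(successes, failures):
    """Choose an arm by sampling each arm's Beta posterior and taking the max."""
    draws = [random.betavariate(s + 1, f + 1)   # Beta(1, 1) uniform prior
             for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)

def run_bandit(true_rates, n_visitors=10_000, seed=0):
    """Simulate a bandit that routes each visitor to the arm it currently favors."""
    random.seed(seed)
    k = len(true_rates)
    successes, failures = [0] * k, [0] * k
    for _ in range(n_visitors):
        arm = thompson_pick(successes, failures)
        if random.random() < true_rates[arm]:   # simulated conversion
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures

# Three arms with true conversion rates of 4%, 5%, and 7%:
succ, fail = run_bandit([0.04, 0.05, 0.07])
pulls = [s + f for s, f in zip(succ, fail)]
```

Unlike a fixed 50/50 split, traffic concentrates on the best-performing arm as evidence accumulates, which is exactly how bandits reduce the conversions lost to inferior variants during the test.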
