A/B Testing

Date Posted: June 4, 2020

Friendly reminder that I am a student, and not a professional in the field. Please review the resources below before implementing the model. If there are any errors, please leave a comment/contact me and I will get it resolved immediately.

Introduction

A/B testing. If you’re a student like me, you have probably never heard of it in your statistics classes. It sounds fancy. It sounds complex.

It’s neither of those.

A/B testing is a two-sample hypothesis test. Remember your unpaired t-tests and ANOVA? Congratulations, you have almost all the statistics background necessary for A/B testing.

A/B testing is frequently used in the context of websites and apps. Take your original website and call it the control: version A. If you think a modification will improve user experience, or some other metric of value (click rate, session time, etc.), make the change and call it version B. Now, randomly assign a sample of users to each version, and then test whether version B is significantly better.

Sound familiar? That’s exactly the textbook example of a t-test. Change a factor, compare for a significant difference, and present results.
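To make the comparison concrete, here is a minimal sketch of an unpaired (Welch's) t-test on session time, using only the Python standard library. The session times are simulated, the function name welch_t_test is my own, and the p-value uses a normal approximation that I am assuming is acceptable because the samples are large. With SciPy installed, scipy.stats.ttest_ind(a, b, equal_var=False) would give the exact t-based p-value.

```python
import math
import random
from statistics import NormalDist, mean, variance


def welch_t_test(a, b):
    """Return (t, df, p) for a two-sided Welch's t-test comparing b against a."""
    n_a, n_b = len(a), len(b)
    # Squared standard errors of each sample mean
    se_a, se_b = variance(a) / n_a, variance(b) / n_b
    t = (mean(b) - mean(a)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite degrees of freedom
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    # Normal approximation to the t distribution (assumes large samples)
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p


# Simulated session times in seconds (made-up data, not real measurements)
rng = random.Random(42)
version_a = [rng.gauss(120, 30) for _ in range(500)]
version_b = [rng.gauss(126, 30) for _ in range(500)]

t, df, p = welch_t_test(version_a, version_b)
print(f"t = {t:.2f}, df = {df:.0f}, p = {p:.4f}")
```

A positive t here means version B had the longer average session time. Whether the p-value is small enough is still a judgment call about the significance level.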

However, from my research, the complexity of A/B testing is less about the background methodology and more about the business implications and implementation. It ultimately takes domain knowledge to know what factors should be tested, what sort of results are meaningful, and how much of an improvement is required to justify any upgrade costs.
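One way to picture the gap between "significant" and "worth the upgrade cost" is to compare a confidence interval for the improvement against a business threshold. The sketch below is only an illustration: the visitor counts, the MIN_WORTHWHILE_LIFT threshold, and the function name are all made up, and it uses the simple Wald interval for a difference in proportions.

```python
import math
from statistics import NormalDist


def lift_confidence_interval(clicks_a, n_a, clicks_b, n_b, level=0.95):
    """Wald confidence interval for the difference in click rates (B minus A)."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


# Hypothetical numbers: 10,000 visitors per version, and a made-up business
# rule that the change only pays off if click rate rises by 0.5 points.
MIN_WORTHWHILE_LIFT = 0.005
low, high = lift_confidence_interval(500, 10_000, 600, 10_000)

if low > MIN_WORTHWHILE_LIFT:
    verdict = "worth it: even the pessimistic estimate clears the threshold"
elif low > 0:
    verdict = "statistically significant, but may not justify the cost"
else:
    verdict = "no clear improvement"
print(f"95% CI for lift: [{low:.4f}, {high:.4f}] -> {verdict}")
```

With these made-up counts the interval excludes zero but its lower end sits below the threshold, which is exactly the situation where the statistics alone cannot make the decision.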

Statistical Methodology

First and foremost, it goes without saying that the sample should be random and representative of the population. We will not discuss this topic further, but recall that if the sample is biased, so too will the results be. We use statistics to model reality, and without a proper sample, that is impossible.

Moving onwards, we want to examine the actual models applied for A/B testing. Interestingly enough, it was fairly difficult to find reliable sources that explain the models used.

The Wikipedia article lists several two-sample hypothesis tests, such as the t-test, Fisher’s exact test, and the chi-squared test, but surprisingly doesn’t provide any citations for the table.
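To show what two of the tests on that list look like for click data, here is a minimal standard-library sketch of a chi-squared test and Fisher's exact test on a 2x2 table of clicks versus non-clicks. The counts are made up, the function names are my own, and the chi-squared version skips the continuity correction. In practice, scipy.stats.chi2_contingency (which applies a continuity correction to 2x2 tables by default) and scipy.stats.fisher_exact would be the more usual route.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper tail equals erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact test: total probability of every table
    that is no more likely than the observed one, with margins fixed."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small tolerance so ties with the observed table are counted
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-9))


# Made-up counts: rows are versions A and B, columns are [clicked, did not click]
clicks = [[48, 952], [63, 937]]

chi2, p_chi2 = chi_squared_2x2(clicks)
p_fisher = fisher_exact_2x2(clicks)
print(f"chi-squared = {chi2:.3f}, p = {p_chi2:.4f}")
print(f"Fisher's exact p = {p_fisher:.4f}")
```

With samples this large the two p-values should land close together. Fisher's exact test matters more when some cell counts are small.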

When I attempted to examine what software people use for A/B testing, I was struck by the lack of statistics-oriented programs such as R and SPSS. Again, it seems like A/B testing software is heavily oriented towards business cases, and so interactive dashboards took priority. Examining the software pages to see what models they used, I found very little information. I will be reaching out to several of these companies to see if they are willing to share the background statistics.

Considerations Beyond Statistics

A/B testing is the use of statistics to make data-driven decisions, but it seems that the topic is much less about statistics than about business. The 2017 paper by Kohavi and Longbotham seems to provide a great overview of the business considerations and applications of A/B testing, highlighting real-world implementation. This paper may help bridge the gap to the real world for other statistics students.

While the statistical models used are the same as those in an introductory class, the implementation and analysis are entirely business oriented. So, before you add A/B testing to your resume after taking STAT 200, definitely do a bit of research first!

Resources

Online Controlled Experiments and A/B Testing by Kohavi and Longbotham, 2017

A Refresher on A/B Testing by Amy Gallo, 2017, Harvard Business Review

Improving Library User Experience with A/B Testing: Principles and Process by Young, 2014