Busted! DigitalMarketer Calls Bull$hit on 4 Conversion Rate Optimization (CRO) Case Studies
I’ve evaluated thousands of conversion rate optimization case studies in my career.
I’m pretty darn good at sniffing out the good ones I can apply to our web properties at DigitalMarketer. And, I’m equally good at sniffing out the posers that are peddling weak information that, if implemented, could damage our site.
Today, I’ll give you the 8 points I look for when evaluating the validity of a CRO case study.
Then, I’ll show you 4 CRO case studies that aren’t worth the pixels they’re printed on. Plus, I’ll show you one rock-solid case study you can take to the bank.
(RELATED: Leveraging Conversion Rate Optimization to Drive Growth)
The Current State of Split Test Case Studies
…and why the increasingly low quality is becoming problematic for the testing industry. Problem: due to our love for case studies, we…
consume them
share them
try to implement the variants on our own site without regard to their validity
This needs to stop, or the optimization industry will start to look like digital snake oil salesmen.
I’m on a mission to eliminate bad case studies! Each quarter I’ll share some examples ranging from truly actionable case studies to the worst in the business.
We’ll get into the case study examples in just a minute. First, you need the…
8-Point Case Study Evaluation Process
Do yourself a favor and use this process when you read your next split test case study (or publish your own).
1. Is the sample size available?
Generally when people don’t publish the sample size it’s because it is way too small.
Most will hide behind the guise that they couldn’t get sign off from the client to share this data, but we really know what’s going on.
If you can’t share all the data points, then your case study isn’t worth sharing.
Would you take a case study seriously if the entire sample size were just 400 visitors?
Notice that in this test by GoodUI, both the number of visits AND the number of conversions are listed.
2. What’s the lift percentage?
If you see a lift percentage that’s too good to be true, it probably is.
If I see any case study that has a triple-digit lift, I’m immediately suspicious. The only way to combat lift disbelief is to share the raw number of conversions. Also, remember to contextualize the conversion lift.
When someone says there was a 30% lift in conversions they aren’t saying they went from a 10% conversion rate to a 40% conversion rate. They went from a 10% conversion rate to a 13% conversion rate!
Numbers can be tricky sometimes; don’t let the lift percentage create false expectations. Pop quiz: what is the percent lift for these two conversion rates?
Control: 5%
Variation 1: 7%
(Don’t say 2%, don’t say 2%, don’t say 2%)
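If you want to double-check the math yourself, here’s a minimal sketch of the relative lift calculation (and yes, the answer to the pop quiz is a 40% lift, not 2 percentage points):

```python
def relative_lift(control_rate, variation_rate):
    """Relative lift: the change expressed as a percentage of the control's
    conversion rate, not in percentage points."""
    return (variation_rate - control_rate) / control_rate * 100

print(round(relative_lift(0.05, 0.07), 1))  # 40.0 -> the pop quiz: a 40% lift
print(round(relative_lift(0.10, 0.13), 1))  # 30.0 -> the "30% lift" example above
```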
3. Are the numbers of raw conversions published?
I’m a little flexible with this one.
I know a lot of organizations don’t like to share the raw number of conversions, especially larger companies. They think that competitors can get more detailed information about their organization through these numbers.
If you’re unable to share the raw conversions, then don’t share case studies with lift percentages that look unbelievable.
The burden of proof is on the person/organization publishing the case study!
4. Is the conversion metric listed?
GoodUI makes the primary metric abundantly clear.
This is a major pet peeve of mine.
I’ve seen plenty of case studies where they just say ‘20% conversion lift!’
Well… what the heck was the metric lifted?!
Most optimizers measure more than just one metric, so when you publish your case study be as clear as possible. Avoid saying ‘conversion lift’ and be more descriptive, e.g., 20% lift in sign-ups, 10% lift in clicks, 15% lift in purchases, etc…
5. Is the confidence rate published?
Ideally, we like to see a minimum 95% confidence rate on a split test. Some organizations reduce this number and assume the increased risk.
However, if you see a sub-90% confidence rate on a split test case study, move on.
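For context, the confidence rate most tools report comes from a significance test on the two conversion rates. Here’s a minimal sketch assuming a standard one-sided, two-proportion z-test (your testing tool may calculate it differently), with made-up numbers:

```python
from math import sqrt, erf

def confidence(conv_a, visitors_a, conv_b, visitors_b):
    """Approximate confidence that variation B's conversion rate truly
    differs from control A's, via a two-proportion z-test."""
    p_a, p_b = conv_a / visitors_a, conv_b / visitors_b
    p_pool = (conv_a + conv_b) / (visitors_a + visitors_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
    z = (p_b - p_a) / se
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF

# Hypothetical test: control 250/5,000 (5%) vs. variation 300/5,000 (6%)
print(round(confidence(250, 5000, 300, 5000) * 100, 1))  # 98.6 -> report it
print(round(confidence(25, 500, 30, 500) * 100, 1))      # 75.6 -> move on
```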
6. Is there a test procedure? What is it?
In the early days of split testing, it was okay to test random button colors.
Back then, people mainly wanted to show they could actually run split tests, not that they could run good ones.
Now that optimization has matured we need to have a reason for testing. I’ve said it many times:
For every test you run there are an infinite number of other test possibilities.
It is our job as optimizers to find the best pages to optimize and to test the right elements. This requires more than a ‘guess and check’ approach.
If a case study doesn’t share how they came up with the idea for the test or how they implemented the test, then you can’t really learn a whole lot from the case study.
Context is everything.
Without context a split test case study is just idea fodder showing you things you could possibly test.
Here are a few things to be sure to outline in your procedure:
Why you selected that page
The segment of traffic you’re testing, e.g., new visitors, desktop only, past purchasers, etc…
Why you selected the test elements
Your test hypothesis
ConversionXL has started sharing more case studies and they go into a lot of detail in regards to their procedure.
7. Is the conclusion justified by data or just hyperbole?
Learning from our test results is the reason we test (well, on top of increasing sales, leads, etc.).
However, most case studies will jump to some major conclusions unjustifiably.
The only time you can attribute a change to a single element is if you isolate that element (via a single-element A/B test or an MVT test). If more than one thing on the page changed, for the love of Pete, don’t attribute the lift to the element you THINK caused the change.
If you want to run a test where a bunch of elements change, that’s fine.
Just know that some elements may lift conversions while others depress them. You can only know the actual impact if you isolate each element or run a full factorial MVT test.
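To make the isolation point concrete, here’s a minimal sketch of what a full factorial layout looks like. The headlines and button labels are hypothetical; the point is that every combination gets its own variant, so each element’s effect can be separated out:

```python
from itertools import product

# Hypothetical elements under test
headlines = ["Start Your Free Trial", "See Plans & Pricing"]
cta_labels = ["Get Started", "Learn More"]

# A full factorial MVT runs every combination of the elements
for i, (headline, cta) in enumerate(product(headlines, cta_labels), start=1):
    print(f"Variant {i}: headline='{headline}' | CTA='{cta}'")
# 2 headlines x 2 CTA labels = 4 variants. Change both in a single A/B test
# and you only learn which page won, not which element did the work.
```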
In this example from WhichTestWon you can see the pages are radically different. Yet the headline and much of the summary focuses on the offer change, 400 off vs. Free Quote.
There is just too much going on here to actually draw any kind of conclusion.
If you are going to test an offer, don’t test the design. If you are going to test design, don’t test an offer. The only thing you can actually learn in this case is which variation performed better; you’ll never know why from a single test.
8. What was the test timeline?
Tests require the perfect balance of time for their results to be taken seriously.
If a test is called too soon, it may not account for the natural variance between days or may only capture a portion of the buying cycle (invalidating the results).
On the other hand, if a test runs for too long, seasonal variances could impact results. My rule of thumb here is simple:
Don’t run a single test for less than 7 days or more than 6-8 weeks.
All tests must complete the week. If you start a test on a Wednesday you need to end it on a Wednesday.
The timeline is extremely important, and so is the time of year.
The former is a must in case studies, while the latter is often omitted. I wouldn’t shut down a case study because it didn’t tell me the time of year, but that sort of context makes a case study extremely valuable.
To sum up the spot check, ask these questions before sharing a case study or publishing one on your site:
1. Did I/they publish total visitors?
2. Did I/they share the lift percentage correctly?
3. Did I/they share the raw conversions? (Does the lack of raw conversions hurt my case study?)
4. Did I/they identify the primary conversion metric?
5. Did I/they publish the confidence rate? Is it >90%?
6. Did I/they share the test procedure?
7. Did I/they only use data to justify the conclusion?
8. Did I/they share the test timeline and date?
A case study that has more no’s than yes’s is a dud. In the next section I’ll evaluate a few case studies based on this rubric.
(Digital Marketer Lab Member Extra: Access your Landing Page Testing Formula Execution Plan now! Split test your way to more leads and sales from your landing pages. Not a Lab member? Click here to learn more about DM Lab.)
Case Study #1: Button Color Test
Did they share the lift percentage correctly?
They shared the lift percentage, but it is pretty high. Without the raw conversions we don’t know if it is correct. I’ll give them the benefit of the doubt.
Did they share the raw conversions?
No.
Does the lack of raw conversions hurt my case study?
Yes – with an 84% increase in opt-ins, I would have liked to see the raw conversions (or the sample size at the very least).
Did they identify the primary conversion metric?
Yes – opt-ins
Did they publish the confidence rate and is it greater than 90%?
Yes.
Did they share the test procedure?
No. Button color tests are the lowest common denominator of tests. It is well known that the actual hue won’t make nearly as much of a difference as whether people can easily find your CTA.
This was more of a guess and check test.
Did they only use data to justify the conclusion?
There’s no real conclusion here, but I am a little worried about where they were going with the color theory. If you don’t have a clear test procedure or reason to run the test, you won’t be able to justify a conclusion.
Did they share the test timeline and date?
No – we don’t have any idea about the test duration or when this test was run during the year.
Case Study Status:
Dud.
There is no reliable data or procedural context. The only thing you can learn is that button colors might impact conversions.
Case Study #2: CTA Click Test
Did they publish total visitors?
Yes. We get to see both the total number of visitors and the number of visitors per variation.
Did they share the lift percentage correctly?
Yes. The relative lift is calculated correctly and we can verify this with the raw number of conversions.
Did they share the raw conversions?
Yes. We were told the number of total conversions and conversions per variation.
Did they identify the primary conversion metric?
Yes – clicks. Normally I don’t care for ‘click’ tests, but this is an affiliate site where they need to get people to click on these CTA buttons to eventually make a commission.
Did they publish the confidence rate and is it greater than 90%?
Yes.
Did they share the test procedure?
No. I really enjoy the case studies on GoodUI, but they don’t talk about test procedure in depth. This is the quintessential split test one pager.
Did they only use data to justify the conclusion?
There is no real conclusion drawn. Since they didn’t share a procedure, the team also didn’t share a definitive conclusion. They isolated the elements but didn’t attempt to connect the results to any larger conclusion, e.g., active CTAs vs. passive CTAs.
The lack of conclusion when a procedure is not in place isn’t necessarily a bad thing. Some optimizers detest the idea of trying to ‘tell stories’ from the data and would actually prefer this method.
Did they share the test timeline and date?
No – we don’t have any idea about the test duration or when this test was run during the year.
Case Study Status:
Pretty Good!
The ease of consuming the data and the depth of the data are what make this case study. However, there is no real procedural context or analysis to help inspire case study readers.
Case Study #3: Mobile UX Test
Read the full case study here.
Note: Unlike the other case studies we’ve seen so far this one is in prose.
Did they publish total visitors?
Yes. They share both the total visitors and the number of visitors per variation.
Did they share the lift percentage correctly?
Yes. The relative lift is calculated correctly and displayed in a screenshot of the Optimizely report.
Did they share the raw conversions?
No. Office politics and worries of corporate espionage strike again; the raw conversions and sales figures were left out.
Does the lack of raw conversions hurt my case study?
Though I would have liked to see the raw conversions, the study also includes a line graph of conversion rate over time. This helps ease any worries around the test’s validity.
If you can’t share raw conversions, this is a nice workaround to give your case study more credibility.
Did they identify the primary conversion metric?
Yes – sales and revenue. Unlike the last few case studies, the team here had a primary and a secondary metric they were monitoring.
Did they publish the confidence rate and is it greater than 90%?
Yes.
Did they share the test procedure?
Yes. This entire post is a testament to their procedure. They shared their testing framework, segments, tangential research, and design inspiration.
Did they only use data to justify the conclusion?
Yes. The goal was to simplify the page, and the simplified version won. They didn’t attribute the conversion lift to any one element, but to the new design as a whole.
Did they share the test timeline and date?
Yes. We know this test ran from Aug. 18th through Sept 15th. They didn’t round out the final week, which would have been ideal.
Case Study Status:
Stud.
Other than the raw conversion numbers, this test opened the kimono nearly all the way. From a procedural standpoint, it really went in depth so you could draw connections to your own mobile site.
Case Study #4: Coupon Code Test
Did they publish total visitors?
Yes. There were 10,000 visitors, though we don’t know exactly how many went to each variation.
Did they share the lift percentage correctly?
They reported a 7% lift in purchases. We don’t know if they measured this correctly since we don’t have the raw number of conversions. In this case, I’ll give them the benefit of the doubt.
Did they share the raw conversions?
No. Raw conversions weren’t shared and they also didn’t share the conversion rate graph like ConversionXL did.
Does the lack of raw conversions hurt my case study?
It certainly does. With ~5,000 visitors going to each variation, we need to know more about the actual raw conversions (or they could at least assure us the number is reasonable). Remember, it isn’t just about the traffic; it’s about the conversions.
Did they identify the primary conversion metric?
Yes – total sales. With a coupon code test it would be nice to see how this impacted revenue. With only a 7% lift in purchases, there’s a chance revenue could suffer. However, at $3 off it likely didn’t break the bank.
Did they publish the confidence rate and is it greater than 90%?
Yes.
Did they share the test procedure?
Yes. They shared the test inspiration, hypothesis, and how they created the test.
Did they only use data to justify the conclusion?
Yes. By meeting the user’s expectation, e.g., giving them a discount when they are actively trying to get one, they were able to increase purchases.
Did they share the test timeline and date?
Yes. This test ran from June to September in 2015.
This test went on for entirely too long. There are all kinds of external variables that could have impacted test results outside of the changed element.
Case Study Status:
Meh…
The timeline and lack of revenue focus really hurt the test. However, this is a really cool example of how to use tech to meet user behavior so it is a great source of inspiration (the test’s saving grace).
Case Study #5: Product Page Test
Did they publish total visitors?
No. What’s most troubling is that there are a total of 4 variations, and we have no idea what the traffic volume looks like here.
Did they share the lift percentage correctly?
The lift percentage is shared, 12.3%.
However, we don’t know if it is correct because the raw numbers were omitted. Furthermore, we don’t know which variation the winner is being compared to! Is it compared to all variations, the control, or the other variants?
Did they share the raw conversions?
No. Raw conversions weren’t shared and we don’t know the traffic split. We can assume that traffic was split evenly 4 ways.
Does the lack of raw conversions hurt my case study?
Yes. We have no idea about the number of conversions, visitors, or the sample size at all. Without that we can’t be absolutely sure about the findings.
Did they identify the primary conversion metric?
Yes – completed checkouts.
Did they publish the confidence rate and is it greater than 90%?
No. This is a little troubling especially coming from a site with a testing technology. They have to know the importance of the confidence rate and why it needs to be published.
Did they share the test procedure?
Yes. The testing team used insight from the customer service team and an internal prioritization framework to select the test.
Did they only use data to justify the conclusion?
No. In the results and learnings they talk about natural eye flow and the importance of playing down the price on the product page. None of the variations played down the price; in fact, each one makes it equally difficult to find the price!
Also, we don’t know how the other variants performed and whether there is any conclusive difference between each variation.
Did they share the test timeline and date?
No. We do not know how long this test ran or during what time of the year.
Case Study Status:
Dud.
With the lack of data, timeline, and information about each variation, this case study doesn’t provide much.
That said, having a research procedure in place and using your customer support team as a data source is genius and should be used at every organization.
However, they lost points again by citing unfounded conclusions that fall outside the test’s scope and its ability to verify.
Your turn! If you read a case study recently, check how it holds up to this 8-point process.
Even better: if you are finishing up a test you plan to publish as a case study, make sure to run it through my process so you provide the most value without people wondering if the results are valid.
June 2016
As promised, we’re back with more split test case study evaluations and this couldn’t come at a better time. I recently spoke at Conversion Conference and when I got my speaker review and comments I saw this one:
“…He seemed too concerned about making sure we all knew the data was accurate, which was unnecessary – we trust him!”
Well I’m glad you trust me, but the reason I’m so concerned about sharing raw data and proving we haven’t cut corners is because (heads up for those with sensitive ears!):
Most split test case studies and reports out there on the interwebs actually do cut corners with…
Overhyped tests
Manipulated data
False conclusions
One final note: I found these tests by Googling “Split test Case Studies.”
Despite some of these tests being a few years old, they are still some of the first results people will see when looking for an example of a split test. They weren’t disqualified for being older, precisely because they still hold SEO weight and still get a lot of attention. You’ll see momentarily why they have been disqualified.
All right, let’s get cracking!
Case Study #6: Background Image vs. Nothing at all
Did they share the lift percentage correctly?
They never actually calculated the lift; they only showed the percent loss in the VWO screenshot. If you’re curious, it was a 42.19% lift, but based on the raw data that lift is pretty meaningless.
Did they share the raw conversions?
Yes.
Does the lack of raw conversions hurt the case study?
The raw conversions are what hurt this case study.
Did they identify the primary conversion metric?
No. This is another case study that falls into the generic “Conversion” trap.
Did they publish the confidence rate and is it greater than 90%?
No, they didn’t publish the confidence, but the raw data will help us figure that out. This test did reach a 95% confidence rate, but there are far too few conversions to call this test.
Did they share the test procedure?
Yes. The procedure was to settle a dispute.
I honestly can’t think of anyone who would think the white background form was going to win anything…
Did they only use data to justify the conclusion?
No.
I can’t think of a page that wouldn’t beat this non-contextualized form on a white background. ANYTHING would have made this page better; it just so happened that the background image was what they decided to go with.
The case study continues to talk about what makes up a cluttered page and attempts to make other points with this case study, too.
I’m okay with a little storytelling or a small jump to conclusions when the data is there. However, this is just unacceptable.
Did they share the test timeline and date?
No.
Case Study Status:
Nope. I can’t even…
You can’t learn anything from this case study. Not one thing. Actually, I take it back. You can learn how NOT to write a case study.
Case Study #7: No Free Shipping Threshold Text vs. Free Shipping Threshold Text
Did they share the lift percentage correctly?
Without the raw data I can’t verify the percentage. In this case, I will give them the benefit of the doubt.
Did they share the raw conversions?
No.
Does the lack of raw conversions hurt the case study?
Yes, it certainly does! Give me that data!
Did they identify the primary conversion metric?
Yes.
Orders and Average Order Value (AOV).
Did they publish the confidence rate and is it greater than 90%?
They published the confidence rate for the orders.
You aren’t going to see a confidence rate for something like AOV. AOV will tell you whether the percent lift in sales actually made you more money.
This is a figure you get during post-test analysis.
Did they share the test procedure?
They shared the test inspiration, which was a great way to come up with this test.
I’ve also seen plenty of well-designed split tests that used a similar hypothesis and saw major lifts in orders, AOV, and Revenue Per Visitor (RPV).
Did they only use data to justify the conclusion?
They didn’t make any major jumps in their conclusion that would have fallen outside of the observed data.
Did they share the test timeline and date?
No.
Case Study Status:
My second favorite of the batch despite its issues.
Really the main issues come from the lack of transparency.
I know a lot of organizations don’t want their raw data shared, but this is merely a snapshot of a period of time and isn’t indicative of their raw financials. I know VWO is a solid brand, but even though I trust them… well, I can’t trust them here.
#showmethedata
Case Study #8: Hamburger Icon vs. Menu
Read the full case study here… seriously do it. It’s informative and entertaining.
Did they publish total visitors?
Yes, in fact the case study shows all of the raw data.
Did they share the lift percentage correctly?
Yes, the percent lift is accurate and is the relative lift between the two conversion rates.
Did they share the raw conversions?
Yes, all data was shared. The author also shared device segments and their associated conversion rates.
Does the lack of raw conversions hurt the case study?
N/A
Did they identify the primary conversion metric?
Yes, menu clicks.
Did they publish the confidence rate and is it greater than 90%?
The confidence rate was not published. The author only said that the numbers were statistically significant. However, since the author shared the raw data you can figure out the confidence rate. This test reached a 99% confidence rate.
Did they share the test procedure?
The test procedure was shared.
This was the second test in a series of mobile navigation icon tests. The first had interesting results, but the data wasn’t significant between these tested variations.
The author also shared that he ran this test on an internal tool and noted its potential technical limitations.
Did they only use data to justify the conclusion?
Absolutely.
There were no long-winded explanations as to why visitors acted this way or any pseudo-psychology tricks that supposedly caused one variant to outperform the other.
This case study states the winning variation then includes potential caveats as well.
Did they share the test timeline and date?
No, this was the only thing missing from this test.
Case Study Status:
Delicious. 🙂
The lesson from this test is transparency.
If you read his intro, you’ll see he got some backlash on his first test. People publishing tests try to avoid backlash by obscuring the numbers. Sure, it hurts to be called out, but if people take your work for something more than it is, then their failures are on your conscience.
Case Study #9: Button Copy Test
Did they publish total visitors?
No, and this is incredibly unfortunate.
Whenever I see a triple-digit lift, I immediately suspect the sample size is too small.
Did they share the lift percentage correctly?
I’m actually not sure here.
Their screen grab shows a 114% improvement, while the data in the Mixpanel calculator would put it at a 135.8% lift.
All that said, the headline reads there was a 139% lift from this button copy change. If you’re going to have a massive lift, at least make sure your math is right.
Did they share the raw conversions?
No.
Which is bad news bears for this case study.
Does the lack of raw conversions hurt the case study?
Absolutely! Just like the traffic, this is problematic.
We all know (or you’ll know after you read this) that it’s not the amount of traffic that dictates your test’s validity but the number of conversions per variation. Without the raw data, I’m still very suspicious of this lift.
The single saving grace was that they showed the conversion rate over time in the screenshot and the fact that the data had normalized really helped their case.
Did they identify the primary conversion metric?
Yes. Watched demos.
Did they publish the confidence rate and is it greater than 90%?
Yes, but there are multiple reports. Remember 100% confidence is a mathematical impossibility, but I know they were trying to say 99.9%.
Did they share the test procedure?
Yes. They followed WiderFunnel’s LIFT methodology for creating their test variation.
Did they only use data to justify the conclusion?
I love a simple conclusion.
Similar to the last case study, the team just shared their results for this particular test at this particular time.
Did they share the test timeline and date?
The timeline wasn’t overtly shared, but you can see the timeline in the screenshot of the raw data. Unfortunately, this test wasn’t run for an entire week.
It would have been nice to see them round out the week to get cleaner data.
Case Study Status:
Missed the mark.
This test was very strong with its pre-test procedure and had a proper conclusion. However, without the raw data, the test is suspect! Since the change on the page was so minor and the lift was so large, I’m worried this test may have suffered from small-sample-size-itis.
Case Study #10: Below the Fold Testimonials vs. None
Did they publish total visitors?
Yes, and thank goodness they did.
You CANNOT think any test will provide a large enough sample size with only 227 TOTAL visitors to a landing page.
Furthermore, the split is not equal; one variation had double the number of visitors. If your tech isn’t splitting your traffic equally, you need to get a new tech.
Did they share the lift percentage correctly?
Yes, the calculation is accurate, but the data is flawed.
Did they share the raw conversions?
Yes, they did share the raw conversions, and they are exactly what make me immediately disregard this case study.
Does the lack of raw conversions hurt the case study?
This time the sharing of raw conversions hurt the case study.
Did they identify the primary conversion metric?
They do not outright say what the converting action is.
I’m assuming it is opt-ins, but it could be clicks! Everyone needs to be 100% clear on what the conversion action is when they post a case study.
Just saying “conversion” doesn’t cut it!
Did they publish the confidence rate and is it greater than 90%?
Yes and yes!
However, the sample size is too small to take seriously. When you have an initial big lift between variations like this, your test will look like it is statistically significant. Just because the tech says so doesn’t make it so.
We have to be smarter than our tech and contextualize these numbers! Remember, at least 100 conversions per variation, and at the very least complete the week or run the test for 14 days.
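If you want a quick gut check before trusting a “significant” result, here’s a minimal sketch built on the rules of thumb above (at least 100 conversions per variation and full weeks of data). These thresholds are heuristics, not a formal power analysis:

```python
def passes_spot_check(conversions_per_variation, days_run):
    """Rough screen: enough conversions in every variation and at least
    one completed week of data before we trust the 'significant' call."""
    enough_conversions = all(c >= 100 for c in conversions_per_variation)
    full_weeks = days_run >= 7 and days_run % 7 == 0
    return enough_conversions and full_weeks

# Hypothetical results
print(passes_spot_check([212, 187], 14))  # True  -> worth a closer look
print(passes_spot_check([14, 6], 5))      # False -> too small, too short
```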
Did they share the test procedure?
Nope, this was more or less a random act of testing.
Did they only use data to justify the conclusion?
This case study started to break down into some storytelling.
Removing testimonials has nothing to do with “simplistic design” at all. There are other factors! Maybe the visitors didn’t trust the testimonials. Maybe the image file size was too big and caused a load time issue. Who knows!
Did they share the test timeline and date?
Nope!
Case Study Status:
Idea fodder… maybe.
This case study is kind of a dumpster fire, but like all bad case studies out there, it might provide some very minor inspiration.
There may be value if you haven’t thought about using testimonials on your landing pages, and you were looking for a new way to build trust or a new way to display testimonials. But that’s really it.
Got a case study you want me to look at or include in our next edition? Tweet the link to @digitalmktr or @jtrondeau, and I’ll give you my thoughts.
(Digital Marketer Lab Member Extra: Access your Landing Page Testing Formula Execution Plan now! Split test your way to more leads and sales from your landing pages. Not a Lab member? Click here to learn more about DM Lab.) Have a question? Ask the DM team and 9,036 other members in the DM Engage Facebook Group! Not a DM Lab Member? Learn more here.
Justin Rondeau
Justin Rondeau has been doing this whole “Marketing” thing since 2010, when he was mildly [...okay maybe more than mildly] obsessed with all things data, email, optimization, and split testing. Currently he is the co-owner and President of InvisiblePPC - the #1 white-label PPC provider for agencies. Previously he co-founded Truconversion and even sat at the helm as General Manager of DigitalMarketer and Scalable Labs. Rondeau has trained thousands of marketers, spoken on hundreds of stages, runs a delightful team of marketers, has dozens of shirts louder than his voice, and loves one hockey team: the Boston Bruins.