### Did you know that when you do an A/B test there is a point when it becomes “reliable enough”? And did you know that stopping the test before it gets to that point might give you misleading results?

The “reliability” of an A/B test — or any scientific experiment — is called *statistical significance.*

If you’re a mathematician, you might hate the way I am going to explain this, but if you’re not a mathematician it should help. :)


**Think of it like this:** if you had to predict the behaviour of 100 million people based on the first person that does the A/B test, would that be reliable?

Probably not. They could be anybody!

When the second person does it, the odds that those two people represent the majority are still low, but it’s a bit better than just one person.

After 10 million people have done the A/B test it is probably getting pretty reliable.

Lots of things can affect the people you’re testing. Maybe they are having a bad day, maybe someone asks them a question while they are on your site, maybe they have used your app a hundred times before, maybe they aren’t in your target group (or maybe they are!).

Who knows?!

Statistical significance isn’t true or false. It’s a probability. So the A/B test is 20% reliable, or 60%, or 99%, or anything in between.

Go for the big numbers (95+%). It takes longer. It requires more users. And it’s worth it.

Then *you* know (probably).

The analytics tool usually does all the math for you, so don’t panic… but it is good if you understand the idea. I don’t calculate my own statistical significance, and you probably won’t either, but it helps to know how to think about it.
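If you’re curious what “the math” looks like, here is a rough sketch of one common approach those tools use: a two-proportion z-test. (Your tool may use a different method — this is just to show the idea that the same conversion rates become more “reliable” as more users go through the test. The function name and sample numbers are made up for illustration.)

```python
import math

def ab_test_confidence(conv_a, n_a, conv_b, n_b):
    """Rough two-proportion z-test: how confident can we be
    that variant B really differs from variant A?"""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate, assuming "no real difference"
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the normal distribution
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return 1 - p_value  # "confidence" that the difference is real

# Same 10% vs 12% conversion rates, different sample sizes:
small = ab_test_confidence(10, 100, 12, 100)        # 100 users per variant
large = ab_test_confidence(1000, 10000, 1200, 10000)  # 10,000 per variant
print(f"small sample: {small:.0%} confident")
print(f"large sample: {large:.0%} confident")
```

With 100 users per variant the confidence is low; with 10,000 per variant the exact same 10%-vs-12% split becomes very reliable. That’s the whole point: more people, more reliability.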

As they say in the article, just because an A/B test result is *reliable* doesn’t mean it is *important*. That’s a whole other conversation.

But it can’t be important if it isn’t reliable.