While metrics like the p-value tell us whether results are statistically significant, the effect size tells us whether those results are meaningful.

Say we measured toe length in two populations of lizards. After collecting thousands of measurements, we find a statistically significant difference between the two populations.

But is the difference between populations actually meaningful?

Say our control group has a mean toe length of 50 mm. If the second population has a mean toe length of 51 mm, that 1 mm difference may not matter much biologically. In essence, the results are statistically significant, but not biologically meaningful.

But let's say instead that the population difference was still significant, and the second population's mean toe length was 60 mm. That difference might actually be meaningful! In the context of lizard toe length, generally speaking, the more arboreal a lizard is, the longer its toes. We might actually be able to infer that these longer-toed lizards are indeed more arboreal!

[Figure: "Effect Size: Toe Lengths in Lizards." Two plots, each with a blue control distribution centered at 50 mm and a red distribution to its right (a mean of 51 mm in the left plot, 60 mm in the right). The left plot is labeled with a p-value of 0.02 and a Cohen's d of 0.2.]

While both results allow us to reject the null hypothesis, because both are significant, we would use our understanding of the biology to say that one is biologically meaningful while the other is not. With enough data, we will almost always detect statistically significant differences between populations.
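We can see this "significant but tiny" situation in a quick simulation. The sketch below uses only the Python standard library and hypothetical numbers (two normal populations whose means differ by just 0.5 mm, with a standard deviation of 5 mm): with 10,000 measurements per group, a large-sample z-test declares the difference significant even though the effect size is minuscule.

```python
import math
import random

random.seed(42)  # reproducible draws

# Two hypothetical populations: a tiny 0.5 mm mean difference, sd = 5 mm
n = 10_000
control = [random.gauss(50.0, 5.0) for _ in range(n)]
treated = [random.gauss(50.5, 5.0) for _ in range(n)]

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    """Sample variance (n - 1 denominator)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Large-sample z-test for a difference in means
diff = mean(treated) - mean(control)
se = math.sqrt(var(control) / n + var(treated) / n)
z = diff / se
p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # two-sided p-value

# Cohen's d: difference in means over the pooled standard deviation
pooled_sd = math.sqrt((var(control) + var(treated)) / 2)
d = diff / pooled_sd

print(f"p = {p:.2e}, d = {d:.2f}")  # a highly significant p, but a tiny effect
```

With samples this large, p comes out far below 0.05 while d sits around 0.1, well under even the "small effect" benchmark.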

Given this, how do we determine how meaningful a value is?

That is where effect size comes into play.

Effect size measures the magnitude of difference between populations or the strength of a relationship between two variables. It helps us understand whether a statistically significant result is also meaningful. Understanding effect size can help us interpret statistical results in the context of the research question and real-world implications.

Cohen’s d is a measure of effect size that describes the difference between two means. As a rule of thumb, a Cohen’s d of about 0.2 is considered a small effect size, 0.5 a moderate effect size, and 0.8 or higher a large effect size. Cohen’s d is calculated by taking the difference between the means of two groups and dividing it by the pooled standard deviation.

With that calculation, we can say that a Cohen’s d of 1 indicates that the difference between population means is equal to one standard deviation, while a Cohen’s d of 0.5 tells us that the mean difference is only half a standard deviation.

There are many metrics for determining effect size, and which one you use depends on the statistical test you are performing. Another common one is the correlation coefficient. We’ll cover it in depth in our correlation section, but it is a value from -1 to 1 that tells us the direction and strength of the relationship between two continuous variables. We can visualize that relationship by how tightly the points cluster around a line of best fit.

[Figure: two scatterplots, each with a line of best fit and a p-value of 0.02, with toe length (mm) on the x-axis and tail length (mm) on the y-axis. The left plot has an r value of 0.2 and the right an r value of 0.8.]

The closer that r value is to 0, the weaker (and less meaningful) the relationship; the closer the value is to 1 (or to -1 for a negative correlation), the stronger (and more meaningful) the relationship. For example, a value of 0.2 indicates a pretty weak relationship, even if the result is significant. Conversely, a value of 0.8 (closer to the maximum of 1) indicates a rather strong relationship!
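The correlation coefficient described above (the Pearson r) can also be computed in a few lines. A minimal standard-library sketch, with made-up data in the usage lines:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

# Perfectly linear data gives the extremes of the -1 to 1 range:
pearson_r([1, 2, 3], [2, 4, 6])  # → 1.0 (perfect positive relationship)
pearson_r([1, 2, 3], [6, 4, 2])  # → -1.0 (perfect negative relationship)
```

Real data falls somewhere in between, and the sign tells you the direction of the relationship while the magnitude tells you its strength.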

Last modified: Wednesday, 26 November 2025, 6:43 AM