What is data variance?

Variance is a value that shows how “varied” a certain data is, and in mathematics it is used mainly in statistics and probabilities.

To calculate the variance, we

① Calculate the average difference for each data
② Square each of the differences
③ Add all calculated values
④ Divide everything by the number of data

The steps might be simple, but for people who do not understand ‘variance’, it might be hard to understand the meaning to these steps.

 

In that case, remember that

“The smaller the calculated value, the smaller the data variance”

In other words, the more the value is closer to ‘0’, the less the data is varied.

 

Calculate the variance/VAR

To calculate the variance in a data, we use the VAR function.

 

=VAR(number1, [number2]…)

[number1] = Select the 1st value or range to calculate the variance.
[number2] = Not obligatory. Select if there are other data to calculate the variance.

Definition of function A function to calculate the variance of a selected data
EXCEL version Excel2003 ・ Excel2007 ・ Excel2010 ・ Excel2013・ Excel2016

 

Example of use) VAR function

Below are end-of-year biology exam scores according to classes.
We will calculate the average for each class and look at the variance of scores.

① Calculate the variance for class ‘1-A’.

(1) [number] : Select cell range [C6:C11].
=VAR(C6:C11)

② Calculate the variance for class ‘1-B’.

(1) [number] : Select cell range [F6:F11].
=VAR(F6:F11)

We have successfully calculated variance for these 2 data.

 

POINT!

How to “analyze” (read) the data using the calculated variance

We have discussed the meaning of variance above, but here we will look at how to ‘analyze the data’ using the variance value.

How to analyze the data①

Firstly, in this data, the average score for class ‘1-B’ is higher than that of class 1-A.
This indicates from first sight that “students from class 1-B are better at Biology “.

How to analyze the data②

Next, if we look at the variance, class ‘1-A’ has a smaller value than class 1-B.
This indicates that “more students from class 1-A were able to achieve a score close to the average“.

Below is what we can see from ① and ②:

In class [1-B], there are students that are capable of achieving high scores, yet there are variances between the students. There is a big academic skill variance in this class.

In class [1-A], there are no students that have excellent scores, but the variance between students is small. There is a smaller academic skill variance in this class.

While it seems that class 1-B seems to have better grades overall, it is important to remember that variance between student grades are also important when analyzing data like these.

Calculate the variance/VARP

To calculate the variance in a specific data, we can also use the VARP function.

 

=VARP(number1, [number2]…)

[number1] = Select the 1st value or range to calculate the variance.
[number2] = Not obligatory. Select if there are other data to calculate the variance.

Definition of function A function to calculate the variance of a selected data
EXCEL version Excel2003 ・ Excel2007 ・ Excel2010 ・ Excel2013・ Excel2016

 

Example of use) VARP function

The method is the same as the VAR function, but the results show up differently.

① Calculate the variance for class ‘1-A’.

(1)[number] : Select cell range [C6:C11].
=VARP(C6:C11)

② Calculate the variance for class ‘1-B’.

(1)[number] : Select cell range [F6:F11].
=VARP(F6:F11)

 

Difference between VAR and VARP

The results that we get in the VAR function and the VARP function are different because the conditions in each function are different.

In the beginning, we have discussed that there are 4 steps in calculating the variation:

① Calculate the average difference for each data
② Square each of the differences
③ Add all calculated values
④ Divide everything by the number of data

The difference lies in step④, and this is what makes the result come out different from the VAR function and VARP function.

 

Below we discuss the difference of the VAR function and the VARP function.

The VAR function:

The VAR function identifies the selected data as a ‘sample‘ (A section of the whole data).
As the whole data number is unknown, in ④, the VAR function divides all the calculated values by the ‘selected data-1 (minus 1)‘, and try to avoid a biased result.

The VARP function:

We use the VARP function when the selected data is ‘complete‘ (i.e. all data is selected and there are no others).
Complete data = selected data for this function.

 

Furthermore, when calculating the variance, the ‘expected value’ (=accuracy of actual variance) becomes important and below are examples.

  • When there is a difference in the number of students between class A and class B, dividing the values by the ‘selected data-1 (minus 1)’ will make the expected value equal in each class. However, when dividing them merely by the ‘selected data’, the variance will tend to be smaller of that of the class with the smaller number of students.
  • When some students are absent on the day of the exam in class A, dividing the values by the ‘selected data-1’ will make the expected value equal regardless of absents. However, when dividing them merely by the ‘selected data’, the variance will tend to be smaller if there are absentees.

Therefore, in mathematics, we usually use the VAR function when calculating the variance.

 

We have discussed in full detail until here…

but it is still very confusing. To makes things easier, remember that…

↓Normally we would use this function
‘If there are a lot or an infinite number of data, or if the number of data is not certain, use the VAR function’

‘If all data is acknowledged and certain, use the VARP function’

 

Diagram example) VAR function

 

The VAR and VARP function may be extremely confusing, so if not sure, use the VAR function and select a section of the data and calculate.