Descriptive statistics, such as measures of central tendency and measures of dispersion, are used to describe a statistical population. Using either kind alone, however, may give a misleading picture. Looking at concentration and dispersion together makes the population easier to understand and allows its behavior to be compared and analyzed. Together, these measures are known as summary statistics.
Measures of central tendency show the point around which the data are concentrated, but examining that point alone may lead the researcher astray. In this post, we talk about measures of dispersion and their applications.
Measures of Dispersion
Let’s assume that the grades of the students of two statistics teachers are recorded in the tables below.
Teacher A:
grades | 18 | 12 | 13 | 17 | 14 | 16 | 15 | 15 |
Teacher B:
grades | 18 | 20 | 10 | 20 | 20 | 12 | 10 | 10 |
If only the average (mean) is used to evaluate the two teachers, they appear to perform equally, since both classes have an average of 15. However, the scores in the class of “Teacher A” are more uniform, while the scores in the class of “Teacher B” are more scattered, which may indicate inconsistency in the teaching of statistics. To judge these teachers more fairly, it is better to use a measure of dispersion in addition to the average. In the following, we introduce the two best-known measures: variance and standard deviation.
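As a quick check of the comparison above, the following minimal sketch computes the mean of each class with Python's standard `statistics` module. The variable names (`grades_a`, `grades_b`) are illustrative choices, not part of the original example.

```python
from statistics import mean

# Grades copied from the two tables above.
grades_a = [18, 12, 13, 17, 14, 16, 15, 15]
grades_b = [18, 20, 10, 20, 20, 12, 10, 10]

mean_a = mean(grades_a)
mean_b = mean(grades_b)

# Both classes have the same mean ...
print(mean_a, mean_b)  # 15 15

# ... but the sorted grades show how differently they are spread out.
print(sorted(grades_a))
print(sorted(grades_b))
```

The means are identical, so the mean alone cannot distinguish the two classes. The sorted lists already hint at the difference in spread that the measures below quantify.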
Variance
Variance is the average of the squared distances between the values and the mean. If we denote the mean of the statistical population by $\mu$ and the number of values by $n$, the variance, which is denoted by Var or $\sigma^2$, is calculated as follows.
$$\sigma ^2=Var(x)= \frac{\sum (x_i-\mu)^2}{n}$$
Note: It can be shown that the sum of squared distances $\sum (x_i-a)^2$ takes its lowest value when $a$ is the mean.
The formula above calculates the variance of a statistical population. If a statistical sample is used instead, the population mean ($\mu$) is unknown and must first be estimated by the sample mean $\overline x$; the sample variance is then used to estimate the population variance. Because the sample mean is fixed when the sample variance is calculated, all the values except one can change freely: the last value is determined by the requirement that the average equals $\overline x$. We therefore say that the data have n-1 degrees of freedom. The sample variance, denoted by $S^2$, differs slightly from the population variance because n-1 is used instead of n in the denominator. It is calculated as follows:
$$S^2= \frac{\sum (x_i-\overline x)^2}{n-1}$$
where n-1 is called the degrees of freedom of the sample variance. As the sample size becomes large, the difference between dividing by n-1 and dividing by n becomes negligible, since n-1 approaches n. In the example of students’ grades, the population variance of the grades is 3.5 for “Teacher A” and 21 for “Teacher B”. If these classes are instead treated as a sample of the classes of these two teachers, the sample variance is 4 for “Teacher A” and 24 for “Teacher B”.
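The variance values quoted above can be reproduced with a short sketch that implements the two formulas directly. The function names `population_variance` and `sample_variance` are hypothetical helpers introduced here for illustration.

```python
from statistics import mean

# Grades copied from the two tables above.
grades_a = [18, 12, 13, 17, 14, 16, 15, 15]
grades_b = [18, 20, 10, 20, 20, 12, 10, 10]


def population_variance(values):
    """Sum of squared distances from the mean, divided by n."""
    mu = mean(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared distances from the sample mean, divided by n - 1."""
    x_bar = mean(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)


print(population_variance(grades_a), sample_variance(grades_a))  # 3.5 4.0
print(population_variance(grades_b), sample_variance(grades_b))  # 21.0 24.0
```

In practice, the standard library functions `statistics.pvariance` and `statistics.variance` compute the same two quantities. The explicit versions are shown only to mirror the formulas.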
Advantages and Disadvantages
- Advantages
  - Widely used in mathematical and statistical theory
  - Measures the dispersion around the mean
  - Uses every data value in the calculation
- Disadvantages
  - Expressed in the squared unit of the data
  - Strongly affected by very large or very small values
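The sensitivity to extreme values listed above can be illustrated with a small sketch. The modified list `grades_a_outlier`, in which one grade of 15 is replaced by 0, is a made-up scenario and not part of the original data.

```python
from statistics import pvariance

# Grades of "Teacher A" from the example above.
grades_a = [18, 12, 13, 17, 14, 16, 15, 15]

# Hypothetical scenario: one grade of 15 is replaced by an extreme value of 0.
grades_a_outlier = [18, 12, 13, 17, 14, 16, 15, 0]

var_original = pvariance(grades_a)
var_outlier = pvariance(grades_a_outlier)

# A single extreme value inflates the variance considerably,
# because its distance from the mean is squared.
print(var_original, var_outlier)
```

Only one of the eight values changes, yet the variance becomes several times larger.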
Standard Deviation
Because the distances are squared in the variance calculation, the measurement unit of the variance is the square of the unit of the data. For example, if the data are in grams, the variance will be in grams squared. This makes it difficult to compare the variance directly with the data. To solve this problem, it is enough to take the square root of the variance, so that the measurement unit of the dispersion measure is the same as that of the data. The result is called “Standard Deviation”. The standard deviation of the statistical population is represented by $\sigma$ and the standard deviation of the sample is represented by $S$.
So, the standard deviation is calculated as follows:
$$\sigma = \sqrt {\sigma ^2}$$
$$S= \sqrt {S^2} $$
In the example of students’ grades, the population standard deviation of the grades is 1.87 for “Teacher A” and 4.58 for “Teacher B”. If these classes are instead treated as a sample of the classes of these two teachers, the sample standard deviation is 2 for “Teacher A” and 4.9 for “Teacher B”.
Note: If the data are multiplied or divided by a fixed value, their standard deviation is multiplied or divided by the absolute value of that number. As a result, changing the scale of the data changes the standard deviation, but changing the location of the data (adding a constant to every value) has no effect on it.
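The standard deviations above, and the effect of rescaling or shifting the data, can be checked with the following sketch based on Python's `statistics` module. The scale factor 5 and the shift 10 are arbitrary values chosen for illustration.

```python
from statistics import pstdev, stdev

# Grades copied from the two tables above.
grades_a = [18, 12, 13, 17, 14, 16, 15, 15]
grades_b = [18, 20, 10, 20, 20, 12, 10, 10]

# pstdev: population standard deviation; stdev: sample standard deviation.
print(round(pstdev(grades_a), 2), round(stdev(grades_a), 2))  # 1.87 2.0
print(round(pstdev(grades_b), 2), round(stdev(grades_b), 2))  # 4.58 4.9

# Change of scale: multiplying every grade by 5 (illustrative factor)
# multiplies the standard deviation by 5 as well.
scaled = [5 * x for x in grades_a]

# Change of location: adding 10 (illustrative shift) to every grade
# leaves the standard deviation unchanged.
shifted = [x + 10 for x in grades_a]

print(pstdev(scaled) / pstdev(grades_a))    # approximately 5
print(pstdev(shifted) - pstdev(grades_a))   # approximately 0
```

The last two lines are compared approximately rather than exactly because the results are floating-point numbers.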
Advantages and Disadvantages
- Advantages
  - Widely used in mathematical and statistical theory
  - Measures the dispersion around the mean
  - Uses every data value in the calculation
  - Can be used in most statistical comparisons, since it has the same unit as the data
- Disadvantages
  - Strongly affected by very large or very small values
  - Changes when the unit of measurement (scale) of the data changes