How to Determine Measures of Position (Percentiles and Quartiles)

Although you may not often use measures such as percentiles and quartiles, these values are used to describe data in some situations, and knowing how to interpret them is beneficial.

Key Terms

o Range

o Measure of position

o Percentile

o Quartile

Objectives

o Determine the range of a data set

o Know how to interpret and determine measures of position (percentiles and quartiles)

While measures of central tendency, dispersion, and skewness are used often in statistics, there are other commonly used methods of characterizing or describing data distributions or portions of them. We will examine several of these statistical measures, some of which you may already know or have seen elsewhere.

Range

The range of a data set is simply the difference between the maximum and minimum values of the set. (This measure is typically considered a measure of dispersion, since it is a simple description of how far the data extends.) Thus, if a data set such as {x₁, x₂, x₃, ..., x_N} is provided in increasing order so that x_i ≤ x_(i+1), then the range of the data set is simply x_N – x₁. If the data set is not ordered, then you must determine the maximum and minimum values by inspection.

Practice Problem: Find the range of the following data set.

{1, 6, 3, 9, 7, 2, 5, 4, 11, 15, 10}

Solution: We can find the range either by simply looking for the maximum and minimum values or by arranging the set in increasing order and then subtracting the first element from the last. Although the latter approach is a bit more time-consuming, it can be beneficial in cases where you need to perform other calculations. So, let's order the data set for the sake of completeness.

{1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 15}

The range is then 15 – 1 = 14.
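As a quick check of the calculation above, here is a minimal Python sketch. The function name data_range is only an illustrative choice; the data set is the one from the practice problem.

```python
# Data set from the practice problem (unordered).
data = [1, 6, 3, 9, 7, 2, 5, 4, 11, 15, 10]

def data_range(values):
    """Return the maximum minus the minimum; sorting is not required."""
    return max(values) - min(values)

# Sorting is optional for the range, but handy for later calculations.
ordered = sorted(data)

print(ordered)           # [1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 15]
print(data_range(data))  # 14
```

Both approaches described in the solution give the same answer: the last ordered value minus the first is 15 – 1 = 14.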

Quartiles and Percentiles

Virtually anyone who has taken a standardized test at one time or another is familiar with the term percentile. Although percentiles seem dangerously similar to percentages (that is, "percent correct," referring to the number of questions answered correctly divided by the total number of questions, all multiplied by 100%), they are actually different. A similar measurement is the quartile, which we will also discuss. Both percentiles and quartiles are statistical measures of position; that is, they do not measure a central tendency or a spread (dispersion), but instead measure location in a data set. (The exact definitions of percentiles and quartiles differ from source to source; these differences, however, tend to be minor and concern fine points, such as how to handle a position that falls between two data values. Also, these differences tend to disappear when the number of data values in the set is large.)

Let's consider a number p, where p is a whole number between 0 and 100. Assume that the number p describes the percentage of values less than or equal to some data value N_p. Consequently, 100 – p is the percentage of values greater than N_p. This number N_p is the pth percentile. Thus, to say that some data value x is the 75th percentile is to say that 75% of all the values in the data set are less than or equal to x, and that 25% of the data values are greater than x. Note that the percentile of a data value can also be understood as 100 times the cumulative relative frequency of that value. (Recall that the cumulative relative frequency of a value x is the relative frequency of all values less than or equal to x.) So, a student who gets a test score in the 90th percentile, for instance, hasn't (necessarily) scored 90/100 correct--he simply has a score that is at least as good as 90% of the other students. Although such a description isn't necessarily very satisfying for the student (who is probably more interested in finding out his percentage of correct answers), it is statistically helpful in certain situations. Typically, the 0th and 100th percentiles are not discussed, because these values are simply the minimum and maximum (respectively) of the data set.
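The link between a percentile and cumulative relative frequency can be sketched in a few lines of Python. The function name percentile_rank is a hypothetical label chosen for this illustration, and the data set is simply reused from the range practice problem.

```python
# Data set reused from the range practice problem.
data = [1, 6, 3, 9, 7, 2, 5, 4, 11, 15, 10]

def percentile_rank(values, x):
    """Return 100 times the cumulative relative frequency of x,
    that is, the percentage of values less than or equal to x."""
    count_at_or_below = sum(1 for v in values if v <= x)
    return 100 * count_at_or_below / len(values)

print(percentile_rank(data, 15))  # 100.0 (the maximum)
print(percentile_rank(data, 9))   # about 72.7 (8 of the 11 values are <= 9)
```

Note that the result describes where a value sits relative to the rest of the data; it says nothing about how large the value is in absolute terms.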

Practice Problem: For the data set below, which value is the 75th percentile?

{1, 3, 3, 4, 6, 7, 7, 7, 8, 9, 9, 10, 12, 15, 16, 17}

Solution: We want to find the data value N_p for which 75% of the data set is less than or equal to N_p. Note that there are a total of 16 values in the set; thus, 75% of the data set is 12 values. Because the data set is ordered, we need simply find the 12th data value; then, 75% (12 out of 16 values) of the data set will be less than or equal to this value. The number 10 is the 75th percentile: 75% of the values in the set are less than or equal to 10.
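The counting procedure in this solution can be sketched in Python as follows. This convention is often called the nearest-rank method; the function name percentile_nearest_rank is an illustrative choice, and other definitions of the percentile (for example, those that interpolate between values) can give slightly different answers.

```python
import math

def percentile_nearest_rank(values, p):
    """Return the smallest data value such that at least p% of the
    values are less than or equal to it (assumes 0 < p <= 100)."""
    ordered = sorted(values)
    # 1-based position of the value that covers p% of the data.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

data = [1, 3, 3, 4, 6, 7, 7, 7, 8, 9, 9, 10, 12, 15, 16, 17]

# 75% of 16 values is 12, so the 12th ordered value is returned.
print(percentile_nearest_rank(data, 75))  # 10
```

When p% of the number of values is not a whole number, this sketch rounds the position up, so that at least p% of the data is at or below the returned value.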

Practice Problem: Which of the following data values is the 50th percentile?

{1.52, 5.36, 6.79, 5.21, 0.28, 6.36, 8.47, 5.52, 6.26, 5.97}

Solution: The 50th percentile is that value N for which 50% of the values in the set are less than or equal to N. To help us find this value, let's first order the data set.

{0.28, 1.52, 5.21, 5.36, 5.52, 5.97, 6.26, 6.36, 6.79, 8.47}

The data set has 10 values; thus, the 50th percentile is the fifth data value, 5.52. Exactly half (50%) of the data values are less than or equal to 5.52, and the remaining half are greater than 5.52.

Another measure of position is the quartile, which is similar to the percentile except that it divides data into quarters (segments of 25% each) instead of hundredths. Thus, the nth quartile is the value x for which (25n)% of the values are less than or equal to x. Three quartiles are defined: Q1, Q2, and Q3. The quartile Q1 corresponds to the 25th percentile, Q2 to the 50th percentile, and Q3 to the 75th percentile.

Q2 and the 50th percentile are sometimes said to correspond to the median of a data set. Given our definition of a median, this is true when there is an odd number of data values; it is not strictly true for an even number of data values (see the practice problem above)--the median, according to our definition, would actually be the mean of 5.52 and 5.97. We could, however, say that this median value (5.745) is the 50th percentile for the data set: half the values in the data set are below this value, and half are above. Thus, we can still maintain our definition of the median if we appropriately define percentiles and quartiles. In addition, we can also note that Q1 is the median of the first half of the values, and Q3 is the median of the second half of the values. (Our above considerations on the definition of the median apply here as well.)
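To make the difference concrete, the following Python sketch compares the 50th percentile found by counting (as in the practice problem above) with the median computed by the standard library's statistics.median, which averages the two middle values when the number of values is even. The variable names are illustrative.

```python
import math
import statistics

data = [1.52, 5.36, 6.79, 5.21, 0.28, 6.36, 8.47, 5.52, 6.26, 5.97]
ordered = sorted(data)

# 50th percentile by counting: the fifth of the ten ordered values.
rank = math.ceil(50 * len(ordered) / 100)
p50 = ordered[rank - 1]

# Median: the mean of the fifth and sixth ordered values.
median = statistics.median(data)

# Count how many values fall strictly on each side of the median.
below = sum(1 for v in data if v < median)
above = sum(1 for v in data if v > median)

print(p50)               # 5.52
print(round(median, 3))  # 5.745
print(below, above)      # 5 5
```

Half of the values lie below the median and half lie above it, which is why the median can also serve as a 50th percentile under a suitable definition.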

Practice Problem: What is Q3 for the following data set?

{20, 40, 50, 65, 70, 75, 80, 100}

Solution: Q3 is the value x for which 75% (three out of four) of the data values are at most x. Since there are eight members in the data set, 75% of the set is six values, so the sixth value, 75, is Q3. This value is also the 75th percentile.
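All three quartiles for this data set can be found with the same counting approach. The sketch below is one possible Python version under the definition used in this article (the nth quartile is the value at or below which 25n% of the data lies); the function name quartile is an illustrative choice.

```python
import math

def quartile(values, n):
    """Return the nth quartile (n = 1, 2, or 3): the smallest data value
    such that at least (25 * n)% of the values are <= it."""
    ordered = sorted(values)
    # 1-based position of the value that covers (25 * n)% of the data.
    rank = math.ceil(25 * n * len(ordered) / 100)
    return ordered[rank - 1]

data = [20, 40, 50, 65, 70, 75, 80, 100]

q1, q2, q3 = (quartile(data, n) for n in (1, 2, 3))
print(q1, q2, q3)  # 40 65 75
```

Library routines such as Python's statistics.quantiles interpolate between data values by default, so they may return somewhat different quartiles for a small data set like this one; as noted earlier, such differences tend to shrink as the number of data values grows.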