Part of my job as a post-doc here at Alabama involves helping faculty seek external funding for projects. One of these projects involves developing a program to promote educational outreach throughout the state, and in the course of this work I’ve been digging through data on educational attainment rates in Alabama. It has been an informative way to learn about the political and economic characteristics of my new state. It also illustrates the value of getting to know your data, and in particular of paying closer attention to how your data are distributed, so I thought I’d write a quick post on the subject. For the purposes of this post I’ll be drawing on data from the American Community Survey’s 2006–2010 five-year population estimates.
The mean percentage of Alabama’s population holding a bachelor’s degree or higher over this period is 21.7%. Compare this with my home state: New York’s overall rate is 32.1%. That is a fairly large gap, but further disparities emerge when we disaggregate the data by race. The graph below shows the mean value of this same variable for the black and white populations of both states.
From this we can see that the overall values mask important differences between groups within each state. In both states the college attainment rate of the white population is slightly higher than the overall mean, while the rate for the black population is substantially lower. Even New York, which on its face has a much higher overall college attainment rate than Alabama, looks quite similar to Alabama once we compare black college attainment rates in the two states.
Even when disaggregating the data in this way, we often still lean on the mean value pretty heavily. The distribution of the data around the mean can be incredibly informative as well, but we don’t often pay as much attention to this as we should. Below I present a quick illustrative example of why this can matter.
Returning to Alabama’s data, I have plotted the county-level distributions of college attainment rates for both groups.
Looking back at the first graph, we can see that the statewide mean white college attainment rate is about 24%, and the statewide mean black attainment rate is roughly 14%. The county-level distributions give us a better grasp of educational attainment rates across the state. For white attainment, the distribution is slightly skewed, with counties bunched toward the low end, but the lowest value is still around 8–9%. By contrast, the county-level distribution of black college attainment rates is more heavily skewed, and its lowest value is considerably lower than the lowest white rate. (The X-axes of the two graphs are held constant to make the figures easier to compare.) In fact, the lowest black attainment rate recorded in the data is 0.30%.
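To make "skewed" and "lowest value" concrete, here is a minimal Python sketch that summarizes a set of county-level rates by its minimum, maximum, mean, median, and sample skewness. The `describe` and `sample_skewness` helpers are hypothetical names, the skewness formula is the standard adjusted Fisher-Pearson coefficient (not necessarily anything used to produce the graphs in this post), and the rates are made-up illustrative values, not the ACS data.

```python
import statistics


def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (G1).

    Positive values mean a long right tail: most observations sit at the
    low end while a few high values pull the mean upward.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # all values identical: no spread, so no skew
    g1 = m3 / m2 ** 1.5
    return g1 * (n * (n - 1)) ** 0.5 / (n - 2)


def describe(values):
    """Summary that looks beyond the mean: range, median, and skew."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "skew": sample_skewness(values),
    }


# Hypothetical county-level rates in percent, NOT the ACS estimates.
rates = [2.0, 3.5, 4.0, 5.0, 6.5, 7.0, 8.0, 30.0]
summary = describe(rates)
print(summary)
```

With one high value stretching the right tail, the skewness comes out positive and the mean (8.25) sits above the median (5.75), which is the pattern to expect when most counties bunch at the low end.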
Looking at the distribution of the data, in addition to the mean values for each group, can provide further insight into the substantive issues we’re interested in. In this case, the mean values alone already show that there are disparities in the college attainment rates of the black and white segments of the state’s population. However, the means may actually understate the magnitude of those disparities. Though the distribution of white attainment rates is also skewed, its floor is considerably higher than that of black attainment rates. By my count, there were 14 counties in which the recorded black attainment rate was lower than the lowest recorded white attainment rate. The maximum for each group is roughly the same, however. Areas around Tuscaloosa, Birmingham, and Huntsville have fairly high college attainment rates, but these areas are highly unrepresentative of the state more broadly. Although the mean statewide black attainment rate is 14%, the majority of counties fall well below this figure. Most counties also fall below the mean white attainment rate, but comparatively more counties sit above it. All of this suggests that the statewide mean for black college attainment is more heavily influenced by outliers in the data than the mean for white attainment is.
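Both checks in the paragraph above, counting counties whose black attainment rate falls below the lowest white rate and measuring how many counties fall below their own group's mean, are simple to compute. The sketch below assumes each group's county rates are plain Python lists of percentages; the helper names `count_below_floor` and `share_below_mean` and the numbers are hypothetical and only illustrate the calculation.

```python
import statistics


def count_below_floor(group, reference):
    """Number of values in `group` strictly below the lowest value in `reference`."""
    floor = min(reference)
    return sum(1 for x in group if x < floor)


def share_below_mean(values):
    """Fraction of observations that fall below the group's own mean."""
    mean = statistics.fmean(values)
    return sum(1 for x in values if x < mean) / len(values)


# Hypothetical county rates in percent, for illustration only.
white = [9.0, 12.0, 15.0, 18.0, 22.0, 35.0]
black = [0.3, 4.0, 6.0, 8.5, 10.0, 33.0]

print(count_below_floor(black, white))  # counties below the lowest white rate
print(share_below_mean(black), share_below_mean(white))
```

In this toy data a single high county pulls the black mean up to about 10.3, so five of six counties fall below it, compared with four of six for the white group.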
Lastly, these plots reveal another important feature of the distributions: their variance. The variance in white college attainment rates is 66.2, while the variance in black college attainment rates is only 37.5, a little over half as large (about 57%). This tells us that black college attainment rates are not only lower than white attainment rates but more uniformly so. Put differently, white college attainment rates are more widely dispersed, while black college attainment rates are more tightly clustered around lower values. The two preceding graphs make this apparent: though both distributions are bunched toward the lower end of the X-axis, the white attainment figures are more spread out.
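Variances are measured in squared percentage points, so taking square roots gives standard deviations on the same scale as the rates themselves. The sketch below converts the two reported variances and, on hypothetical toy data, shows the two common conventions in Python's `statistics` module: `variance` (dividing by n - 1) and `pvariance` (dividing by n). The post does not say which convention produced 66.2 and 37.5, so treat that choice as an assumption; the `dispersion` helper is a hypothetical name.

```python
import math
import statistics


def dispersion(values):
    """Spread of a set of rates (percent) under both variance conventions."""
    return {
        # n - 1 denominator: the usual choice when treating data as a sample
        "sample_variance": statistics.variance(values),
        # n denominator: treats the values as the whole population
        "population_variance": statistics.pvariance(values),
        # square root of the sample variance, in percentage points
        "std_dev": statistics.stdev(values),
    }


# Standard deviations implied by the variances reported in the post.
reported = {"white": 66.2, "black": 37.5}
for group, var in reported.items():
    print(group, round(math.sqrt(var), 2))

# Ratio of the two reported variances.
print(round(reported["black"] / reported["white"], 2))

# Hypothetical toy data, only to show the two conventions side by side.
toy = [2, 4, 4, 4, 5, 5, 7, 9]
print(dispersion(toy))
```

The reported figures correspond to standard deviations of roughly 8.1 and 6.1 percentage points, which may be easier to read alongside the rates than the variances themselves.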
This is an issue I’ll come back to in the future, but for now it seemed like a useful example of the importance of familiarizing yourself with your data, and of how relying less on simple mean values can reveal interesting features of that data. Bear Braumoeller has an interesting Political Analysis piece about examining the variance, which is worth checking out. More substantively, I also think this exercise helps draw out more fully the disparities that may exist between social groups. Simple measures of central tendency are useful in some contexts, when we want to convey information quickly, but they can also be misleading. At the very least, they don’t always tell us the full story.