Teaching Introductory Graduate Methods: Some Ideas About Understanding Sampling Distributions

Everyone has their own way of teaching methods, but in this entry I’d like to share a brief exercise that I thought worked well. This code was developed when I was the teaching assistant for a second-semester methods class focusing on regression. Specifically, it comes from a section of the class in the first few weeks that dealt with two things: first, revisiting the idea of a sampling distribution, and second, showing some introductory code on loops and graphics.

The code uses the ggplot2 package for R. This software, and this package, have (in my experience) a somewhat steep learning curve, so starting slowly is important, even in the second semester. Equally, the concept of a sampling distribution is something that takes a while to click with most people who don’t have a statistical background. So what I’m trying to do with this code is show some basic graphics, and also introduce sampling distributions by ‘bringing one to you on a plate’.

The first part of the code simulates ten samples, each consisting of ten rolls of a (pseudo) six-sided die, and records the mean of each sample:

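# Empty vector to hold the ten sample means
cenlt <- vector(length = 10)

for (i in 1:10) {
  # Draw ten values from a uniform distribution between 1 and 6
  x <- runif(10, 1, 6)
  # Store the mean of this sample as element i of cenlt
  cenlt[i] <- mean(x)
}

cenlt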

Here we create a vector, called cenlt (named for the central limit theorem), of length ten. We don’t give R any instructions about what to put in this vector, so if we inspect it (typing “cenlt”) before running the loop, it will display ‘FALSE’ ten times. We leave this empty vector to one side. Then we set up a loop, which works like an imaginary conveyor belt: we tell R to sit beside the belt and perform the same operation every time it rolls past, which it will do ten times.
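As a minimal sketch of what the empty vector and the conveyor belt do on their own, the following fills a vector with a trivial calculation instead of a sample mean. The name belt and the doubling operation are made up for illustration and are not part of the class code.

```r
# 'belt' is a hypothetical name used only for this illustration
belt <- vector(length = 10)
print(belt)        # the empty vector: ten FALSE values

for (i in 1:10) {
  # Each pass of the belt stores one value in position i
  belt[i] <- i * 2
}

print(belt)        # the vector is now full of numbers
```

Because the first assignment stores a number, R quietly converts the whole vector from logical to numeric, which is why starting with ten FALSE values causes no trouble.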

The operation R will perform is to create ‘x’, which is a set of ten numbers, each randomly sampled from a uniform distribution between 1 and 6. We then take the mean of this set, and store it as a new element of cenlt. When we’ve done this ten times, then, cenlt will be full. We then display cenlt. We can actually display it a little bit more attractively if we graph it:

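library(ggplot2)

# Start an empty plot object
p1 <- ggplot()

# Histogram of the sample means
p1 <- p1 + geom_histogram(aes(x = cenlt), binwidth = 0.2,
                          linetype = 1, alpha = .3, color = "black", fill = "red")

# Axis labels and limits
p1 <- p1 + ylab("Count") +
  xlab("Sample Mean") + xlim(c(0, 6)) + ylim(c(0, 6))

# Title
p1 <- p1 + ggtitle("Distribution of Sample Means")

# Vertical line at the expected mean of 3.5
p1 <- p1 + geom_segment(aes(x = 3.5, y = 0, xend = 3.5, yend = 6))

p1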

The first line creates a new object called p1, which we set to be a graph object. Then we take it, and add another layer. The second line tells it to fill the graph with the shapes necessary to make a histogram: aesthetically, this means an x-axis defined by the values of cenlt. We ask for the shapes to have a solid black outline (linetype = 1, color = "black") and to be semi-transparent and red (alpha = 0.3, fill = "red"). We then add a line showing the expected mean (geom_segment) as well as other details.

This code was then followed by a discussion of what each element of ‘cenlt’ represented. Each was a mean. So the mean of cenlt was a ‘mean of means’, and so cenlt has its own distribution: the distribution of means of ‘x’, and if we consider each x to be a sample, then seeing cenlt as the sampling distribution is relatively straightforward.

The code can then be tweaked to make ‘x’ bigger, or to run ‘x’ more times, and discussing each of these options helps elucidate the idea of a sampling distribution. Changing the distribution away from uniform can also be useful. We also got some good discussions out of manipulating the basic parameters of the graph (the axis labels, the colors and binwidth of the histogram), and then later by switching the geoms to different options.

All in all, this was a useful exercise that led to some good discussions, and helped introduce the ideas of sampling distributions, loops, and statistical graphics.
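The tweaks mentioned above (a bigger ‘x’, more runs, a different distribution) can be tried with something like the following sketch. The helper sample_means and its arguments n, reps and draw are hypothetical names introduced here, not part of the original class code, and the seed is arbitrary.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Hypothetical helper: draw 'reps' samples of size 'n' and return their means.
# 'draw' is any function that returns k random values; the default matches
# the runif(10, 1, 6) call used in the exercise.
sample_means <- function(n, reps, draw = function(k) runif(k, 1, 6)) {
  means <- numeric(reps)
  for (i in seq_len(reps)) {
    means[i] <- mean(draw(n))
  }
  means
}

# Same sample size as the exercise, but many more samples
small <- sample_means(n = 10, reps = 1000)

# A bigger 'x': each sample now holds 100 values
large <- sample_means(n = 100, reps = 1000)

# A different distribution: a discrete die with faces 1 to 6
die <- sample_means(n = 10, reps = 1000,
                    draw = function(k) sample(1:6, k, replace = TRUE))

# Spread of the sample means for each sample size
sd(small)
sd(large)

# Centre of the sample means for the discrete die
mean(die)
```

Comparing sd(small) with sd(large) shows the sample means clustering more tightly around 3.5 as each sample gets bigger, and any of the three vectors can be passed to the histogram code in place of cenlt (the y-axis limit will need raising for 1,000 samples).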