1 The Problem

You want to obtain 2010 US Census data at a low level of geographic aggregation. This example uses Census tracts in Washington, DC. Other states and other levels of aggregation work similarly.

2 A Solution

2.1 Setup

Install (if needed) and load the UScensus2010 package in R:

# install.packages("UScensus2010")
library(UScensus2010)

Select a level of geography. Then obtain and load the companion package for that level, which contains its data. For tracts, use install.tract(), which takes your operating system as its argument ("osx" here). Note: low levels of geography contain many units, so this step may take a while, depending on your computer and connection speed.

install.tract("osx")

After the first installation, you can start each new session here by loading the tract package directly:

library(UScensus2010tract)

2.2 Obtain the Data

Make the data available in the workspace:

data("district_of_columbia.tract10")

The data are stored as a SpatialPolygonsDataFrame, a class defined in the sp package. It includes a standard data frame with 179 tracts and 461 Census variables. See help(district_of_columbia.tract10) for details.

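SpatialPolygonsDataFrame is an S4 class, so its components are stored in slots, which are accessed with @. The data frame is in the @data slot, and the tract boundaries are in the @polygons slot.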

To see the first 6 rows and columns,

district_of_columbia.tract10@data[1:6, 1:6]
##                                state county  tract        fips P0010001
## district_of_columbia.tract10_0    11    001 002201 11001002201     3442
## district_of_columbia.tract10_1    11    001 002202 11001002202     3087
## district_of_columbia.tract10_2    11    001 002301 11001002301     2974
## district_of_columbia.tract10_3    11    001 002302 11001002302     2036
## district_of_columbia.tract10_4    11    001 002400 11001002400     3618
## district_of_columbia.tract10_5    11    001 002501 11001002501     2554
##                                P0020001
## district_of_columbia.tract10_0     3442
## district_of_columbia.tract10_1     3087
## district_of_columbia.tract10_2     2974
## district_of_columbia.tract10_3     2036
## district_of_columbia.tract10_4     3618
## district_of_columbia.tract10_5     2554
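Slot access can be tried without downloading any Census data. The sketch below uses the sp package, which defines SpatialPolygonsDataFrame, to build a tiny object from two made-up squares. The helper square(), the object toy, and the column pop are hypothetical illustrations. They are not part of UScensus2010.

```r
library(sp)

# Hypothetical helper: a closed, clockwise square ring marked as an outer boundary
square <- function(x0, y0, side) {
  Polygon(cbind(c(x0, x0, x0 + side, x0 + side, x0),
                c(y0, y0 + side, y0 + side, y0, y0)),
          hole = FALSE)
}

# Two made-up "tracts" with one attribute column (not Census data)
polys <- SpatialPolygons(list(
  Polygons(list(square(0, 0, 1)), ID = "t1"),
  Polygons(list(square(2, 0, 1)), ID = "t2")
))
toy <- SpatialPolygonsDataFrame(
  polys,
  data.frame(pop = c(100, 250), row.names = c("t1", "t2"))
)

# Names of the S4 slots; "data" and "polygons" are among them
slotNames(toy)

# `@` and slot() read the same slot
attr_table <- toy@data
identical(attr_table, slot(toy, "data"))

# The attribute table has one row per polygon
nrow(toy@data) == length(toy)
```

On the Census object, slotNames(district_of_columbia.tract10) lists the same kinds of slots.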

To get the center of the \(k\)th tract (its label point, stored in the labpt slot), e.g., the 10th tract,

k <- 10
district_of_columbia.tract10@polygons[[k]]@labpt
## [1] -77.04182  38.93018
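The label point is what sp stores for each Polygons object. The sketch below uses made-up squares again. For a single square, the label point is the square's centroid. My understanding of sp is that, for a tract made of several separate pieces, the label point comes from the largest piece rather than being the centroid of all pieces combined. The test below checks this assumption on toy data. The helper square() and the IDs "a" and "b" are hypothetical.

```r
library(sp)

# Hypothetical helper: a closed, clockwise square ring marked as an outer boundary
square <- function(x0, y0, side) {
  Polygon(cbind(c(x0, x0, x0 + side, x0 + side, x0),
                c(y0, y0 + side, y0 + side, y0, y0)),
          hole = FALSE)
}

# Tract "a": one unit square.
# Tract "b": a small 1 x 1 piece listed first, then a larger 2 x 2 piece.
toy <- SpatialPolygons(list(
  Polygons(list(square(0, 0, 1)), ID = "a"),
  Polygons(list(square(20, 0, 1), square(10, 0, 2)), ID = "b")
))

k <- 1
toy@polygons[[k]]@labpt  # centroid of the unit square: 0.5 0.5

# Label point of the two-piece tract (assumed: centroid of the larger piece)
labpt_b <- toy@polygons[[2]]@labpt

# Area-weighted centroid of both pieces, for comparison
combined <- c((1 * 20.5 + 4 * 11) / 5, (1 * 0.5 + 4 * 1) / 5)
```

If this assumption holds, then the centers extracted in this tutorial are best read as representative points for each tract. They are not necessarily the geometric centroid of a multi-part tract.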

To store the centers of all 179 tracts,

## Get the number of tracts:
n_tract <- length(district_of_columbia.tract10)

## Create storage matrix:
centers <- matrix(NA, nrow = n_tract, ncol = 2)

## Loop over tracts, extracting longitude and latitude and storing:
for(i in 1:n_tract){
  
  this_center <- district_of_columbia.tract10@polygons[[i]]@labpt
  
  centers[i, ] <- this_center
}
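The loop can also be written without explicit indexing. The sketch below builds three made-up squares (hypothetical, not Census data) and compares the loop with two alternatives. The first is vapply() over the polygons slot. The second is sp's coordinates() method for SpatialPolygons, which returns the label points.

```r
library(sp)

# Hypothetical helper: a closed, clockwise square ring marked as an outer boundary
square <- function(x0, y0, side) {
  Polygon(cbind(c(x0, x0, x0 + side, x0 + side, x0),
                c(y0, y0 + side, y0 + side, y0, y0)),
          hole = FALSE)
}

# Three made-up unit squares at x = 2, 4, 6
toy <- SpatialPolygons(lapply(seq_len(3), function(i) {
  Polygons(list(square(2 * i, 0, 1)), ID = paste0("t", i))
}))

# Loop version, as in the text
n_poly <- length(toy)
centers_loop <- matrix(NA_real_, nrow = n_poly, ncol = 2)
for (i in seq_len(n_poly)) {
  centers_loop[i, ] <- toy@polygons[[i]]@labpt
}

# vapply() returns a 2 x n matrix (one column per polygon), so transpose it
centers_vapply <- t(vapply(toy@polygons, function(p) p@labpt, numeric(2)))

# coordinates() returns the same label points, with polygon IDs as row names
centers_sp <- coordinates(toy)
```

All three versions should give the same numbers. On the Census object, the equivalent one-liner would presumably be coordinates(district_of_columbia.tract10).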

Then convert the matrix to a data frame and name the columns:

centers <- data.frame(centers)
names(centers) <- c("long", "lat")

Here are the first few longitudes and latitudes:

head(centers)
##        long      lat
## 1 -77.02329 38.94913
## 2 -77.01386 38.94884
## 3 -77.01657 38.94259
## 4 -77.01088 38.93390
## 5 -77.02241 38.94178
## 6 -77.03173 38.94465
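A SpatialPolygonsDataFrame keeps the rows of @data in the same order as @polygons. Because of this, the centers can be bound to the attribute table so that each longitude and latitude stays with its tract's identifiers. The sketch below shows the idea on made-up squares. The column names tract_id and pop, and their values, are hypothetical.

```r
library(sp)

# Hypothetical helper: a closed, clockwise square ring marked as an outer boundary
square <- function(x0, y0, side) {
  Polygon(cbind(c(x0, x0, x0 + side, x0 + side, x0),
                c(y0, y0 + side, y0 + side, y0, y0)),
          hole = FALSE)
}

polys <- SpatialPolygons(list(
  Polygons(list(square(0, 0, 1)), ID = "t1"),
  Polygons(list(square(2, 0, 1)), ID = "t2")
))
toy <- SpatialPolygonsDataFrame(
  polys,
  data.frame(tract_id = c("A", "B"), pop = c(100, 250),
             row.names = c("t1", "t2"), stringsAsFactors = FALSE)
)

# Label points in polygon order, renamed as in the text
centers <- data.frame(coordinates(toy))
names(centers) <- c("long", "lat")

# Rows line up because @data is stored in the same order as @polygons
tract_centers <- cbind(toy@data, centers)
tract_centers
```

With the Census data, the same cbind() would attach long and lat next to columns such as fips.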