Joshua R. Bruce, PhD

Visualizing Per-Capita Federal Financial Assistance to States (with R)

23 Aug 2017

There are often discussions about the appropriate level of funding for the US federal government. Recently, this discussion took on renewed vigor as the Trump administration released a budget plan that would dramatically shrink domestic, non-defense spending.

As someone who does research on federal expenditures, and who is generally interested in politics, I wanted to know more about this debate. It’s also an excuse to play around with maps in R, which can be tricky but very satisfying. Hence, this post - a look at federal financial assistance to state and local governments from the US federal government. I start by looking at broad funding levels, and then dig a little deeper on the EPA and funding to colleges and universities.

The data here are only for fiscal year 2016 (i.e., Oct. 1, 2015 - Sept. 30, 2016), but could easily be expanded back to at least FY 2000. The data are available from USASpending.gov, a repository of all spending on contracts, grants, and other forms of assistance provided by the federal government. I specifically look at grants, not contracts, which include most forms of non-contractual assistance from the government to non-federal entities, such as food stamps and research grants.

Get the data

To download the data, which I call govt_assistance_2016 in the code below, proceed to the “Download Center” tab on the USASpending website. Select all agencies and fiscal year 2016. After you submit the request, it will generate the dataset, which may take a few minutes. Save this .csv file in your working directory. USASpending also provides a codebook for the data available; Section 4 of the codebook pertains to grants. The following analysis also requires a variety of R packages, listed below.

<?prettify?>

# Loading data and required packages for visualizing federal spending data
setwd('~/My Working Directory')
require(stringr)
require(ggplot2)
require(rgdal)
require(maps)
require(maptools)
require(mapproj)
require(viridis)
require(rgeos)
require(dplyr)
require(fiftystater)
require(plyr)

govt_assistance <- read.csv('fed_assistance_fy_2016.csv', stringsAsFactors = F)

Having loaded the data, we can now start to see where funding goes. Fist, by using length(table(govt_assistance$principal_place_state_code)), we can see there are 60 types of “states” that receive funding, indicating that non-state entities are included. I’m specifically interested in the 50 states, so I want to create a new data frame of just those records (I prefer to create new data frames rather than drop rows incase I make a mistake).

To begin, I make a table of the places where federal funds go. This is a count of grants, of which there are several sub-types, to each of the 60 state entities. This table is useful in part because it tells us states are identified by all-caps, two-letter abbreviations, which we can use to extract the records we want using R’s built-in list of state abbreviations.

<?prettify?>

# Create a table of grant frequency by state entity
table_of_state_grants <- as.data.frame(table(
    govt_assistance$principal_place_state_code),
    stringsAsFactors = F)

<?prettify?>

# Look at top six rows from table_of_state_grants
head(table_of_state_grants)

<?prettify?>

##   Var1  Freq
## 1      75527
## 2   00    98
## 3   AK  6373
## 4   AL  7990
## 5   AR  5042
## 6   AS   487

This table tells us not only how many grants went to entities in each state, but also that the states are identified by two-letter, all-caps abbreviations. The next section of code subsets the data to only the 50 US states.

<?prettify?>

# Create data frame of only grants that went to the 50 US states
govt_assistance_states <- govt_assistance[which(
    govt_assistance$principal_place_state_code
    %in% as.list(toupper(state.abb))), ]

Grant recipients

Having subset the data, we can now look at what types of entities in each state were receiving grants, according to the classification in the USASpending data.

<?prettify?>

# Make a data frame of grant recipient types 
grant_recipient_types <- as.data.frame(table(govt_assistance_states$recipient_type), 
                                        stringsAsFactors = F)
print(grant_recipient_types)

<?prettify?>

##                                                    Var1   Freq
## 1                                  00: State government 365575
## 2                                 01: County government   5988
## 3                       02: City or township government   8778
## 4                       04: Special district government   2254
## 5                       05: Independent school district  13153
## 6  06: State controlled institution of higher education  68415
## 7                                      11: Indian tribe  13819
## 8                                   12: Other nonprofit  45113
## 9                          20: Private higher education  31154
## 10                                       21: Individual  11738
## 11                              22: Profit organization   3633
## 12                                   23: Small business   8007
## 13                                        25: All other   8777

In this case, I’m interested in funding that went to a state or lower-level governmental unit, which are the first five types of entities listed in the table. As such, I’ll now make a new data frame that includes only these five types of recipients.

<?prettify?>

state_local_assistance <- govt_assistance_states[which(
    govt_assistance_states$recipient_type %in% grant_recipient_types[c(1:5),1]),]

Types of assistance

Within the federal assistance data, there are a number of variables that tell us about each grant. One of the most important for understanding funding levels is to look at new and continuing grants (rather than a funding adjustment or revision to a previous grant). The action_type variable identifies whether a grant is new, continuing, revised, or adjusted. I’m interested in only the new and continuing grant disbursements, so I subset to just those records.

<?prettify?>

# create subset data frame of new and continuing funding
state_local_assistance <- state_local_assistance[which(
    state_local_assistance$action_type == 'A: New assistance action' |
        state_local_assistance$action_type == 'B: Continuation (funding in succeeding budget period which stemmed from prior agreement to fund amount of the current action)'),]

We’re left with 107,177 records, totaling over $476 billion in domestic assistance from the federal government to state and local entities.

State-by-state federal funds

I next use dplyr to calculate the total funding received by each state. I label this variable state_assistance, which is then used in the next code chunk to map the data.

<?prettify?>

# calculate state-by-state funding totals
total_state_funding <- state_local_assistance %>% 
    group_by(principal_place_state_code) %>%
        summarise(state_assistance = sum(fed_funding_amount))

At this point, we can finally start to visualize how federal funds are distributed across the country. I start with a simple map of the fifty states shaded by their total funding amount. You can read more about creating maps in R with ggplot2 and fiftystater on this page. (And for more on the color palette I use, check out the viridis package.)

<?prettify?>

# Load fifty-state shape file from fiftystater package
data('fifty_states')

# Add column to data with lower-case state names rather than abbreviation
total_state_funding$state_id <- ''

# Loop to look up state name; make lower case to match map data
for(i in 1:nrow(total_state_funding)){
    total_state_funding$state_id[i] <-
        tolower(state.name[grep(total_state_funding$principal_place_state_code[i], 
                        state.abb)])
}

# Plot map
ggplot(total_state_funding, aes(map_id = state_id)) + 
    # map points to the fifty_states shape data
    geom_map(aes(fill = state_assistance), map = fifty_states) + 
    expand_limits(x = fifty_states$long, y = fifty_states$lat) +
    coord_map() +
    scale_x_continuous(breaks = NULL) + 
    scale_y_continuous(breaks = NULL) +
    labs(x = "", y = "", title = 'Federal Spending to US States') +
    theme(legend.position = "bottom", 
    panel.background = element_blank()) + 
    fifty_states_inset_boxes() +
    scale_fill_viridis(option = 'viridis')

Per-capita state spending

Not surprisingly, California, New York, and Texas get the most money. These are also the three most populous states. Thus, a reasonable feature to take into account is population. The next code block calculate per-capita spending by drawing in US Census population estimates for each state.

There are multiple ways to incorporate demographic data into analyses. For example, the Census makes population estimates available for download on its website. However, in the interest of keeping the analysis contained within R, I will use the censusapi package. Before using this package, you’ll need to set up an API key on the Census website, which you can do here. It’s worth spending some time getting to know the censusapi package, starting with the list of available APIs. You can make a table of these with as.data.frame(listCensusApis()). The API I use here is the “Vintage 2016 Population Estimates: Population Estimates,” known as ‘pep/population’ in the API. You can also see the list of variables included in a Census API with the listCensusMetadata() command (in this case, listCensusMetadata(name="pep/population", type = "variables", vintage = 2016)).

<?prettify?>

# Get population data from censusapi
# Set your Census API key using Sys.setenv(CENSUS_KEY='YOUR_API_KEY')

# Download Census population estimates by state for 2016
population_data <- getCensus(name = 'pep/population', 
                                vars = c('STATE', 'POP'), 
                                region = "state:*", 
                                vintage = 2016)

The STATE variable in the population data is the state FIPS code, rather than the state name. We haven’t made use of FIPS codes thus far, and so need to convert them to names or abbreviations. Thankfully, the noncensus package is here to help. After creating a table with state abbreviations and corresponding FIPS codes, I run a loop to add the abbreviations to the population data frame.

<?prettify?>

# Load noncensus package 
require(noncensus)
data("counties")

# Use dplyr to create table of states and state FIPS codes 
state_fips <- distinct(select(counties, state, state_fips))

# Make character instead of factor
state_fips$state <- as.character(state_fips$state)
state_fips$state_fips <- as.character(state_fips$state_fips)

# Make columns for state name and abbreviation in population data
# Re-use principal_place_state_code as name for merge later
population_data$principal_place_state_code <- ''

# Loop to match state name and abbreviation to FIPS in population_data
for(i in 1:nrow(population_data)){
    population_data$principal_place_state_code[i] <- 
        state_fips$state[which(state_fips$state_fips ==
                                population_data$STATE[i])]
}

# clean up data frame by only keeping columns with abbreviations and population
population_data <- subset(population_data, 
                            select = c('principal_place_state_code', 'POP'))

Now, we can merge the population data with the total state spending data and calculate per-capita spending.

<?prettify?>

# Merge population data with total_state_funding data frame
total_state_funding <- left_join(total_state_funding,
                                    population_data,
                                    by = 'principal_place_state_code')

# Divide 2016 spending by estimated 2016 population
total_state_funding$per_capita_funding <- as.numeric(total_state_funding$state_assistance)/
as.numeric(total_state_funding$POP)

# Summarize per-capita funding
summary(total_state_funding$per_capita_funding)

<?prettify?>

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   434.7  1211.0  1505.0  1521.0  1824.0  2769.0

As we can see, federal funding to state and local government entities ranged from $434.70/person to $2,769/person in fiscal year 2016. We can now map this using a slightly altered version of the code block above.

<?prettify?>

# Plot map - per-capita federal funding in FY 2016
ggplot(total_state_funding, aes(map_id = state_id)) + 
    # map points to the fifty_states shape data
    geom_map(aes(fill = per_capita_funding), map = fifty_states) + 
    expand_limits(x = fifty_states$long, y = fifty_states$lat) +
    coord_map() +
    scale_x_continuous(breaks = NULL) + 
    scale_y_continuous(breaks = NULL) +
    labs(x = "", y = "", title = 'Federal Spending to US States') +
    theme(legend.position = "bottom", 
    panel.background = element_blank()) + 
    fifty_states_inset_boxes() +
    scale_fill_viridis(option = 'viridis')

To get a sense of the actual values, we can also print a rank-ordered table of per-capita federal funding.

<?prettify?>

# Print a rank-ordered table of top 10 states per-capita federal funding
options(tibble.print_min = 50)
total_state_funding[order(total_state_funding$per_capita_funding, 
decreasing = T), 
c(3,5)] # columns for state_id and per_capita_funding

<?prettify?>

## # A tibble: 50 x 2
##          state_id per_capita_funding
##             
##  1         alaska          2768.7554
##  2        vermont          2508.5928
##  3       new york          2405.7763
##  4       kentucky          2217.5354
##  5      louisiana          2153.0957
##  6   rhode island          2113.8055
##  7     new mexico          2034.6928
##  8       arkansas          1986.8485
##  9  massachusetts          1932.4569
## 10   north dakota          1921.1059
## 11        montana          1861.3508
## 12     california          1830.5677
## 13       delaware          1826.5266
## 14    mississippi          1814.5886
## 15  west virginia          1801.9148
## 16    connecticut          1766.0705
## 17       michigan          1750.9256
## 18         oregon          1706.3330
## 19          maine          1697.9878
## 20         hawaii          1586.1937
## 21        wyoming          1582.6338
## 22        arizona          1579.7738
## 23     new jersey          1564.4301
## 24        indiana          1533.1844
## 25      minnesota          1528.4687
## 26           iowa          1481.9301
## 27     washington          1435.8609
## 28       missouri          1428.5307
## 29       maryland          1370.7874
## 30   pennsylvania          1355.9804
## 31        alabama          1307.1957
## 32           ohio          1299.9173
## 33         nevada          1270.8444
## 34          texas          1252.0911
## 35       oklahoma          1240.8785
## 36       colorado          1231.7817
## 37      tennessee          1216.1975
## 38       illinois          1209.7643
## 39 south carolina          1208.4932
## 40  new hampshire          1200.9699
## 41 north carolina          1150.6585
## 42          idaho          1079.1449
## 43      wisconsin          1074.1797
## 44   south dakota          1067.0331
## 45        georgia          1034.1041
## 46        florida           930.6878
## 47           utah           877.6799
## 48         kansas           724.3746
## 49       virginia           668.7814
## 50       nebraska           434.6791

End Note

I had originally intended to go into agency-specific funding, but time and energy have gone elsewhere.