There are often discussions about the appropriate level of funding for the US federal government. Recently, this discussion took on renewed vigor as the Trump administration released a budget plan that would dramatically shrink domestic, non-defense spending.
As someone who does research on federal expenditures, and who is generally interested in politics, I wanted to know more about this debate. It’s also an excuse to play around with maps in R, which can be tricky but very satisfying. Hence, this post - a look at federal financial assistance to state and local governments from the US federal government. I start by looking at broad funding levels, and then dig a little deeper on the EPA and funding to colleges and universities.
The data here are only for fiscal year 2016 (i.e., Oct. 1, 2015 - Sept. 30, 2016), but could easily be expanded back to at least FY 2000. The data are available from USASpending.gov, a repository of all spending on contracts, grants, and other forms of assistance provided by the federal government. I specifically look at grants, not contracts, which include most forms of non-contractual assistance from the government to non-federal entities, such as food stamps and research grants.
To download the data, which I call govt_assistance_2016
in the code below, proceed to the “Download Center” tab on the USASpending website. Select all agencies and fiscal year 2016. After you submit the request, it will generate the dataset, which may take a few minutes. Save this .csv file in your working directory. USASpending also provides a codebook for the data available; Section 4 of the codebook pertains to grants. The following analysis also requires a variety of R packages, listed below.
<?prettify?>
# Loading data and required packages for visualizing federal spending data setwd('~/My Working Directory') require(stringr) require(ggplot2) require(rgdal) require(maps) require(maptools) require(mapproj) require(viridis) require(rgeos) require(dplyr) require(fiftystater) require(plyr) govt_assistance <- read.csv('fed_assistance_fy_2016.csv', stringsAsFactors = F)
Having loaded the data, we can now start to see where funding goes. Fist, by using length(table(govt_assistance$principal_place_state_code))
, we can see there are 60 types of “states” that receive funding, indicating that non-state entities are included. I’m specifically interested in the 50 states, so I want to create a new data frame of just those records (I prefer to create new data frames rather than drop rows incase I make a mistake).
To begin, I make a table of the places where federal funds go. This is a count of grants, of which there are several sub-types, to each of the 60 state entities. This table is useful in part because it tells us states are identified by all-caps, two-letter abbreviations, which we can use to extract the records we want using R’s built-in list of state abbreviations.
<?prettify?>
# Create a table of grant frequency by state entity table_of_state_grants <- as.data.frame(table( govt_assistance$principal_place_state_code), stringsAsFactors = F)
<?prettify?>
# Look at top six rows from table_of_state_grants head(table_of_state_grants)
<?prettify?>
## Var1 Freq ## 1 75527 ## 2 00 98 ## 3 AK 6373 ## 4 AL 7990 ## 5 AR 5042 ## 6 AS 487
This table tells us not only how many grants went to entities in each state, but also that the states are identified by two-letter, all-caps abbreviations. The next section of code subsets the data to only the 50 US states.
<?prettify?>
# Create data frame of only grants that went to the 50 US states govt_assistance_states <- govt_assistance[which( govt_assistance$principal_place_state_code %in% as.list(toupper(state.abb))), ]
Having subset the data, we can now look at what types of entities in each state were receiving grants, according to the classification in the USASpending data.
<?prettify?>
# Make a data frame of grant recipient types grant_recipient_types <- as.data.frame(table(govt_assistance_states$recipient_type), stringsAsFactors = F) print(grant_recipient_types)
<?prettify?>
## Var1 Freq ## 1 00: State government 365575 ## 2 01: County government 5988 ## 3 02: City or township government 8778 ## 4 04: Special district government 2254 ## 5 05: Independent school district 13153 ## 6 06: State controlled institution of higher education 68415 ## 7 11: Indian tribe 13819 ## 8 12: Other nonprofit 45113 ## 9 20: Private higher education 31154 ## 10 21: Individual 11738 ## 11 22: Profit organization 3633 ## 12 23: Small business 8007 ## 13 25: All other 8777
In this case, I’m interested in funding that went to a state or lower-level governmental unit, which are the first five types of entities listed in the table. As such, I’ll now make a new data frame that includes only these five types of recipients.
<?prettify?>
state_local_assistance <- govt_assistance_states[which( govt_assistance_states$recipient_type %in% grant_recipient_types[c(1:5),1]),]
Within the federal assistance data, there are a number of variables that tell us about each grant. One of the most important for understanding funding levels is to look at new and continuing grants (rather than a funding adjustment or revision to a previous grant). The action_type variable identifies whether a grant is new, continuing, revised, or adjusted. I’m interested in only the new and continuing grant disbursements, so I subset to just those records.
<?prettify?>
# create subset data frame of new and continuing funding state_local_assistance <- state_local_assistance[which( state_local_assistance$action_type == 'A: New assistance action' | state_local_assistance$action_type == 'B: Continuation (funding in succeeding budget period which stemmed from prior agreement to fund amount of the current action)'),]
We’re left with 107,177 records, totaling over $476 billion in domestic assistance from the federal government to state and local entities.
I next use dplyr
to calculate the total funding received by each state. I label this variable state_assistance
, which is then used in the next code chunk to map the data.
<?prettify?>
# calculate state-by-state funding totals total_state_funding <- state_local_assistance %>% group_by(principal_place_state_code) %>% summarise(state_assistance = sum(fed_funding_amount))
At this point, we can finally start to visualize how federal funds are distributed across the country. I start with a simple map of the fifty states shaded by their total funding amount. You can read more about creating maps in R with ggplot2
and fiftystater
on this page. (And for more on the color palette I use, check out the viridis
package.)
<?prettify?>
# Load fifty-state shape file from fiftystater package data('fifty_states') # Add column to data with lower-case state names rather than abbreviation total_state_funding$state_id <- '' # Loop to look up state name; make lower case to match map data for(i in 1:nrow(total_state_funding)){ total_state_funding$state_id[i] <- tolower(state.name[grep(total_state_funding$principal_place_state_code[i], state.abb)]) } # Plot map ggplot(total_state_funding, aes(map_id = state_id)) + # map points to the fifty_states shape data geom_map(aes(fill = state_assistance), map = fifty_states) + expand_limits(x = fifty_states$long, y = fifty_states$lat) + coord_map() + scale_x_continuous(breaks = NULL) + scale_y_continuous(breaks = NULL) + labs(x = "", y = "", title = 'Federal Spending to US States') + theme(legend.position = "bottom", panel.background = element_blank()) + fifty_states_inset_boxes() + scale_fill_viridis(option = 'viridis')
Not surprisingly, California, New York, and Texas get the most money. These are also the three most populous states. Thus, a reasonable feature to take into account is population. The next code block calculate per-capita spending by drawing in US Census population estimates for each state.
There are multiple ways to incorporate demographic data into analyses. For example, the Census makes population estimates available for download on its website. However, in the interest of keeping the analysis contained within R, I will use the censusapi
package. Before using this package, you’ll need to set up an API key on the Census website, which you can do here. It’s worth spending some time getting to know the censusapi
package, starting with the list of available APIs. You can make a table of these with as.data.frame(listCensusApis())
. The API I use here is the “Vintage 2016 Population Estimates: Population Estimates,” known as ‘pep/population’ in the API. You can also see the list of variables included in a Census API with the listCensusMetadata()
command (in this case, listCensusMetadata(name="pep/population", type = "variables", vintage = 2016)
).
<?prettify?>
# Get population data from censusapi # Set your Census API key using Sys.setenv(CENSUS_KEY='YOUR_API_KEY') # Download Census population estimates by state for 2016 population_data <- getCensus(name = 'pep/population', vars = c('STATE', 'POP'), region = "state:*", vintage = 2016)
The STATE variable in the population data is the state FIPS code, rather than the state name. We haven’t made use of FIPS codes thus far, and so need to convert them to names or abbreviations. Thankfully, the noncensus
package is here to help. After creating a table with state abbreviations and corresponding FIPS codes, I run a loop to add the abbreviations to the population data frame.
<?prettify?>
# Load noncensus package require(noncensus) data("counties") # Use dplyr to create table of states and state FIPS codes state_fips <- distinct(select(counties, state, state_fips)) # Make character instead of factor state_fips$state <- as.character(state_fips$state) state_fips$state_fips <- as.character(state_fips$state_fips) # Make columns for state name and abbreviation in population data # Re-use principal_place_state_code as name for merge later population_data$principal_place_state_code <- '' # Loop to match state name and abbreviation to FIPS in population_data for(i in 1:nrow(population_data)){ population_data$principal_place_state_code[i] <- state_fips$state[which(state_fips$state_fips == population_data$STATE[i])] } # clean up data frame by only keeping columns with abbreviations and population population_data <- subset(population_data, select = c('principal_place_state_code', 'POP'))
Now, we can merge the population data with the total state spending data and calculate per-capita spending.
<?prettify?>
# Merge population data with total_state_funding data frame total_state_funding <- left_join(total_state_funding, population_data, by = 'principal_place_state_code') # Divide 2016 spending by estimated 2016 population total_state_funding$per_capita_funding <- as.numeric(total_state_funding$state_assistance)/ as.numeric(total_state_funding$POP) # Summarize per-capita funding summary(total_state_funding$per_capita_funding)
<?prettify?>
## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 434.7 1211.0 1505.0 1521.0 1824.0 2769.0
As we can see, federal funding to state and local government entities ranged from $434.70/person to $2,769/person in fiscal year 2016. We can now map this using a slightly altered version of the code block above.
<?prettify?>
# Plot map - per-capita federal funding in FY 2016 ggplot(total_state_funding, aes(map_id = state_id)) + # map points to the fifty_states shape data geom_map(aes(fill = per_capita_funding), map = fifty_states) + expand_limits(x = fifty_states$long, y = fifty_states$lat) + coord_map() + scale_x_continuous(breaks = NULL) + scale_y_continuous(breaks = NULL) + labs(x = "", y = "", title = 'Federal Spending to US States') + theme(legend.position = "bottom", panel.background = element_blank()) + fifty_states_inset_boxes() + scale_fill_viridis(option = 'viridis')
To get a sense of the actual values, we can also print a rank-ordered table of per-capita federal funding.
<?prettify?>
# Print a rank-ordered table of top 10 states per-capita federal funding options(tibble.print_min = 50) total_state_funding[order(total_state_funding$per_capita_funding, decreasing = T), c(3,5)] # columns for state_id and per_capita_funding
<?prettify?>
## # A tibble: 50 x 2 ## state_id per_capita_funding ## ## 1 alaska 2768.7554 ## 2 vermont 2508.5928 ## 3 new york 2405.7763 ## 4 kentucky 2217.5354 ## 5 louisiana 2153.0957 ## 6 rhode island 2113.8055 ## 7 new mexico 2034.6928 ## 8 arkansas 1986.8485 ## 9 massachusetts 1932.4569 ## 10 north dakota 1921.1059 ## 11 montana 1861.3508 ## 12 california 1830.5677 ## 13 delaware 1826.5266 ## 14 mississippi 1814.5886 ## 15 west virginia 1801.9148 ## 16 connecticut 1766.0705 ## 17 michigan 1750.9256 ## 18 oregon 1706.3330 ## 19 maine 1697.9878 ## 20 hawaii 1586.1937 ## 21 wyoming 1582.6338 ## 22 arizona 1579.7738 ## 23 new jersey 1564.4301 ## 24 indiana 1533.1844 ## 25 minnesota 1528.4687 ## 26 iowa 1481.9301 ## 27 washington 1435.8609 ## 28 missouri 1428.5307 ## 29 maryland 1370.7874 ## 30 pennsylvania 1355.9804 ## 31 alabama 1307.1957 ## 32 ohio 1299.9173 ## 33 nevada 1270.8444 ## 34 texas 1252.0911 ## 35 oklahoma 1240.8785 ## 36 colorado 1231.7817 ## 37 tennessee 1216.1975 ## 38 illinois 1209.7643 ## 39 south carolina 1208.4932 ## 40 new hampshire 1200.9699 ## 41 north carolina 1150.6585 ## 42 idaho 1079.1449 ## 43 wisconsin 1074.1797 ## 44 south dakota 1067.0331 ## 45 georgia 1034.1041 ## 46 florida 930.6878 ## 47 utah 877.6799 ## 48 kansas 724.3746 ## 49 virginia 668.7814 ## 50 nebraska 434.6791
I had originally intended to go into agency-specific funding, but time and energy have gone elsewhere.