In [1]:
import geopandas
import pandas as pd
import os
import re
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
import math

import matplotlib.ticker as mtick

import plotly.express as px
import plotly.offline as py
import plotly.graph_objects as go

import statsmodels.api as sm

Mailbox Removals

This project looks for political bias in the mailboxes that are removed.

Background:

Some news sources have accused the Trump administration of sabotaging the mail system as a means to influence the 2020 election (Vox, CNN, Mother Jones, Vanity Fair). Some articles specifically address the removal of mailboxes, particularly in Portland. More recently, some news sources have reported that the photos are misleading, after learning that some were taken while mailboxes were being refurbished. During refurbishment, a mailbox is generally replaced with a new one in the same trip in which it is removed (Vox).

We can use data to measure more definitively whether the Trump administration is removing mailboxes in an effort to sabotage the election.

Source data:

I will use the following data:

  • collection_boxes by Nathaniel Story on GitHub. Mr. Story used FOIA to request a full listing of collection boxes from USPS for his site MailboxLocate. The FOIA request lists all mailboxes from September 9, 2019. Then he performed a scrape of USPS PO Locator on 8/15/2020. He then used a script to get a list of removed and added mailboxes.

    • Data reliability: I took a random sample of 15 mailboxes from the scraped dataset and compared them with the USPS Mailbox Locator. All instances matched. Because I do not have access to USPS data, I did not perform a completeness test to ensure that every mailbox controlled by USPS is included in the Mailbox Locator. According to Mr. Story's GitHub page, he compared a scrape to the FOIA information and was able to determine that the scrape was reliably reading complete information.
    • Mr. Story and I independently noted a large number of collection boxes removed from Washington D.C. Because it is such an outlier, it influences many analyses. In some instances I exclude Washington D.C. and disclose it in the individual analysis. If there is no note on the analysis, I am using the full dataset.
  • 2018 TIGER/Line Shapefiles, US Census Bureau. I used the US Census projection of the shapefiles. I chose 2018 so the data could be easily compared with the 2018 ACS population data.

  • American Community Survey, 2018: 1-Year Estimates, Table B01003, US Census Bureau. This table gives population estimates by county.
    • Data reliability: I did not consider margins of error, but only the US Census best estimate. Because there is only survey data and the last Census is from 2010, there are relatively large margins of error for smaller communities.
  • County Presidential Returns 2000-2016. While this dataset contains results from multiple elections, I will only consider the results from the 2016 election.
    • Note on level of analysis: I attempted to perform this analysis using voting precinct data which is more specific than county level data. At the time of compiling this project, while all precinct results were available, not all shapefiles were available. Therefore the analysis would have omitted some states entirely. County level data is the most granular election data available at the time.
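
The spot check described under data reliability can be drawn reproducibly with a seeded sampler. This is a minimal sketch, not the procedure actually used: the `box_ids` list, the seed, and the helper name `draw_spot_check` are hypothetical placeholders, and the comparison against the USPS Mailbox Locator is still done by hand.

```python
import random

def draw_spot_check(ids, k=15, seed=2020):
    """Return k distinct ids chosen uniformly at random, reproducibly."""
    rng = random.Random(seed)  # fixed seed so the same sample can be re-drawn later
    return rng.sample(list(ids), k)

# Hypothetical identifiers standing in for rows of the scraped dataset.
box_ids = [f"box-{i:05d}" for i in range(1000)]

sample = draw_spot_check(box_ids)
print(sample)
```

Recording the seed alongside the results lets a reviewer re-draw the same 15 boxes and repeat the comparison.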

Limitations:

This analysis only considers changes in collection box locations; it does not consider other potentially important measures of the Postal Service's effectiveness, such as delivery times, wait times in Post Offices, and how accessible Post Offices are. Further, if the Postal Service removes collection boxes without updating its website, the removal would not be reflected in this analysis.

Conclusion

There is no evidence of partisan bias in the removals. Collection boxes are removed from both Democratic and Republican communities in approximately equal measure. Direct comparisons are difficult because many Democratic counties tend to be large population centers. Large population centers, in general, have more collection boxes, and therefore saw more collection boxes removed. However, when considering percent changes, and changes based on population size, there is no evidence of bias.

When considering only net change in the number of mailboxes, Democratic counties have more mailboxes removed, but in general, they have more mailboxes to begin with.
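
To make the distinction between net change and relative change concrete, here is a small sketch with made-up numbers. The two counties, their box counts, and their populations are illustrative assumptions, not values from the dataset, and the helper names are hypothetical.

```python
def percent_change(before, after):
    """Relative change in box count, e.g. -0.10 means 10% fewer boxes."""
    return (after - before) / before

def net_change_per_100k(before, after, population):
    """Net change in boxes per 100,000 residents."""
    return (after - before) / population * 100_000

# Hypothetical counties: (boxes before, boxes after, population)
counties = {
    "large": (2000, 1900, 1_000_000),
    "small": (50, 45, 20_000),
}

for name, (before, after, pop) in counties.items():
    print(name,
          after - before,
          round(percent_change(before, after), 3),
          round(net_change_per_100k(before, after, pop), 1))
```

In this toy case the large county loses more boxes in absolute terms, while the small county loses more relative to both its starting count and its population, which is why the net figures alone can mislead.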

Washington D.C. is an interesting case, seeing nearly 80% of collection boxes removed according to Mr. Story's data. I believe this to be irrelevant to the election because of the electoral college system. Washington D.C. overwhelmingly supported Clinton in 2016 (Trump received fewer than 5% of overall votes). Also, Washington D.C. is not divided into multiple counties, so the district votes as a single unit and that vote determines all of its electoral college votes. Even if Trump successfully suppressed 90% of all Clinton voters, the electoral votes for Washington D.C. would still have been awarded to Clinton. I do not believe removing mailboxes from Washington D.C. specifically is intended to interfere with the presidential election.
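
The argument above is simple arithmetic: suppression flips a winner-take-all result only if the remaining Democratic votes fall below the Republican total. A minimal sketch follows, using round illustrative vote counts. The 90,000 and 4,500 figures are assumptions chosen to respect the "fewer than 5%" statement above, not official returns, and both function names are hypothetical.

```python
def winner_after_suppression(dem_votes, rep_votes, suppressed_share):
    """Winner if `suppressed_share` of Democratic votes are never cast."""
    remaining_dem = dem_votes * (1 - suppressed_share)
    return "democrat" if remaining_dem > rep_votes else "republican"

def share_needed_to_flip(dem_votes, rep_votes):
    """Share of Democratic votes that must vanish to reach a tie."""
    return 1 - rep_votes / dem_votes

# Illustrative numbers only (not official returns).
dem, rep = 90_000, 4_500

print(winner_after_suppression(dem, rep, 0.90))
print(share_needed_to_flip(dem, rep))
```

With these assumed totals, the result only changes once roughly 95% of Democratic votes are removed, so a 90% suppression rate would still leave the outcome unchanged.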

Code:

Data import

County-level voting data

The following block of code imports the 2018 US Census Shapefile and plots it to confirm it was imported correctly.

In [2]:
source = 'source data/us county shapefile/tl_2018_us_county.shp'

county = geopandas.read_file(source)
county['COUNTYFP'] = county['COUNTYFP'].astype(int)
county['GEOID'] = county['GEOID'].astype(int)

print(county.head())
county.plot()
  STATEFP  COUNTYFP  COUNTYNS  GEOID       NAME          NAMELSAD LSAD  \
0      31        39  00835841  31039     Cuming     Cuming County   06   
1      53        69  01513275  53069  Wahkiakum  Wahkiakum County   06   
2      35        11  00933054  35011    De Baca    De Baca County   06   
3      31       109  00835876  31109  Lancaster  Lancaster County   06   
4      31       129  00835886  31129   Nuckolls   Nuckolls County   06   

  CLASSFP  MTFCC CSAFP CBSAFP METDIVFP FUNCSTAT       ALAND    AWATER  \
0      H1  G4020  None   None     None        A  1477652222  10690952   
1      H1  G4020  None   None     None        A   680956809  61588406   
2      H1  G4020  None   None     None        A  6016819484  29089486   
3      H1  G4020   339  30700     None        A  2169287528  22832516   
4      H1  G4020  None   None     None        A  1489645187   1718484   

      INTPTLAT      INTPTLON  \
0  +41.9158651  -096.7885168   
1  +46.2946377  -123.4244583   
2  +34.3592729  -104.3686961   
3  +40.7835474  -096.6886584   
4  +40.1764918  -098.0468422   

                                            geometry  
0  POLYGON ((-97.01952 42.00410, -97.01952 42.004...  
1  POLYGON ((-123.43639 46.23820, -123.44759 46.2...  
2  POLYGON ((-104.56739 33.99757, -104.56772 33.9...  
3  POLYGON ((-96.91075 40.78494, -96.91075 40.790...  
4  POLYGON ((-98.27367 40.08940, -98.27367 40.089...  
Out[2]:
<matplotlib.axes._subplots.AxesSubplot at 0x2a2ec4369a0>
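
Casting `GEOID` to `int` drops the leading zero from codes such as `01001`, which is harmless as long as every table being merged uses the same integer convention. If string keys are preferred, a helper like the hypothetical `to_fips5` below restores the five-character form. It is a sketch, not part of the original notebook.

```python
def to_fips5(code):
    """Format a county FIPS code as a zero-padded 5-character string."""
    return str(int(code)).zfill(5)

print(to_fips5(1001))     # a code whose leading zero was lost by astype(int)
print(to_fips5("31039"))  # Cuming County, from the output above
```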

Voter data

The following block of code imports the 2016 presidential election results and reshapes them to one row per county. The merge with the shapefile happens in the next block.

In [3]:
source = 'source data/county presidential results/countypres_2000-2016.csv'
# Rows without a FIPS code cannot be matched to a county, so drop them.
county_vote = pd.read_csv(source).dropna(subset=['FIPS'])
county_vote['FIPS'] = county_vote['FIPS'].astype(int)
cv_voi = ['party','candidatevotes','totalvotes','FIPS']

# Keep 2016 only, label missing parties 'na', and pivot to one row per county
# with one column per party.
mask = county_vote['year'] == 2016
c2016 = county_vote[mask].fillna(
    {'party':'na','candidatevotes':0}).pivot_table(values='candidatevotes',columns='party',index='FIPS')

vote_cols = c2016.columns
# Rename the key so it matches the shapefile's GEOID column.
c2016 = c2016.reset_index().rename(columns={'FIPS':'GEOID'})
c2016
Out[3]:
party GEOID democrat na republican
0 1001 5936.0 865.0 18172.0
1 1003 18458.0 3874.0 72883.0
2 1005 4871.0 144.0 5454.0
3 1007 1874.0 207.0 6738.0
4 1009 2156.0 573.0 22859.0
... ... ... ... ...
3150 56037 3231.0 1745.0 12154.0
3151 56039 7314.0 1392.0 3921.0
3152 56041 1202.0 1114.0 6154.0
3153 56043 532.0 371.0 2911.0
3154 56045 299.0 194.0 3033.0

3155 rows × 4 columns
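
The reshaping step is easier to follow on a toy table. The sketch below uses made-up rows (only county 1001's numbers are copied from the output above; county 9999 is hypothetical) and passes `aggfunc='sum'` explicitly. The notebook relies on `pivot_table`'s default, which averages duplicate entries instead, so the two agree only when each county has a single row per party.

```python
import pandas as pd

# Long format: one row per county and party.
long_df = pd.DataFrame({
    "FIPS": [1001, 1001, 1001, 9999, 9999],
    "party": ["democrat", "republican", None, "democrat", "republican"],
    "candidatevotes": [5936, 18172, 865, 10, 20],
})

wide = (
    long_df.fillna({"party": "na"})  # label missing parties so they survive the pivot
           .pivot_table(values="candidatevotes", index="FIPS",
                        columns="party", aggfunc="sum")
           .reset_index()
           .rename(columns={"FIPS": "GEOID"})
)
print(wide)
```

County 9999 has no `na` row, so its `na` cell comes out as missing rather than zero, which matters for any later comparison against that column.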

To compute how much a county leaned Democrat or Republican, I calculated Trump's share of the two-party vote, labeled r/(rd) in the data. I excluded any votes for third party candidates so that Republican vs. Democrat could be shown with a single variable.

Using this method, if r/(rd) $> 0.5$ then the county voted for Trump. If r/(rd) $< 0.5$, the county voted for Clinton.

In [4]:
cdf2016=county[['GEOID','NAME','geometry']].merge(c2016,on='GEOID')
cdf2016['r/(rd)'] = cdf2016['republican'] / (cdf2016['republican']+cdf2016['democrat'])
cdf2016
Out[4]:
GEOID NAME geometry democrat na republican r/(rd)
0 31039 Cuming POLYGON ((-97.01952 42.00410, -97.01952 42.004... 719.0 233.0 3122.0 0.812809
1 53069 Wahkiakum POLYGON ((-123.43639 46.23820, -123.44759 46.2... 832.0 182.0 1344.0 0.617647
2 35011 De Baca POLYGON ((-104.56739 33.99757, -104.56772 33.9... 193.0 97.0 620.0 0.762608
3 31109 Lancaster POLYGON ((-96.91075 40.78494, -96.91075 40.790... 61898.0 12737.0 61588.0 0.498745
4 31129 Nuckolls POLYGON ((-98.27367 40.08940, -98.27367 40.089... 353.0 126.0 1726.0 0.830207
... ... ... ... ... ... ... ...
3109 13123 Gilmer POLYGON ((-84.65478 34.66559, -84.65488 34.669... 1965.0 331.0 10477.0 0.842067
3110 27135 Roseau POLYGON ((-96.40466 48.80528, -96.40467 48.813... 1856.0 497.0 5451.0 0.745997
3111 28089 Madison POLYGON ((-90.09363 32.70763, -90.09360 32.707... 20343.0 1194.0 28265.0 0.581489
3112 48227 Howard POLYGON ((-101.69227 32.27106, -101.69221 32.2... 1770.0 316.0 6637.0 0.789461
3113 54099 Wayne POLYGON ((-82.59529 38.36978, -82.59515 38.369... 3357.0 673.0 11152.0 0.768626

3114 rows × 7 columns
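
As a quick check on the `r/(rd)` column, the two-party share can be recomputed by hand for rows shown in the output above. The helper name `two_party_share` is hypothetical; the vote counts are copied from the table.

```python
def two_party_share(republican, democrat):
    """Republican share of the two-party vote, ignoring third parties."""
    return republican / (republican + democrat)

cuming = two_party_share(3122, 719)        # table shows 0.812809
lancaster = two_party_share(61588, 61898)  # table shows 0.498745

print(round(cuming, 6), round(lancaster, 6))
```

Lancaster falls just under 0.5, so it counts as a Clinton county under this measure even though the margin is only a few hundred votes.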

To check the shapefiles visually, I created a map of the election results and compared it to the map from the NY Times (screenshot below).

Visually, this map looks similar to the NY Times map, with two exceptions. First, Oglala Lakota County, South Dakota, for some reason didn't get plotted. Second, the US Census TIGER/Line boundaries include parts of Lake Michigan. I'm assuming there are no mailboxes on the lake, so areas taken from these shapes would not reflect where people actually live. For this reason, I will not consider population density.
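
Counties that drop out of the map can be listed by repeating the merge as a left join with `indicator=True`. The sketch below uses toy stand-ins for the two tables. The mismatched codes are an assumption on my part: the county was renamed from Shannon County in 2015 and, as far as I know, its FIPS code changed from 46113 to 46102, so an election file keyed on the old code would fail to match the 2018 shapefile. The vote counts for that row are placeholders.

```python
import pandas as pd

# Toy stand-ins for the shapefile attributes and the pivoted vote counts.
shapes = pd.DataFrame({
    "GEOID": [31039, 46102],
    "NAME": ["Cuming", "Oglala Lakota"],
})
votes = pd.DataFrame({
    "GEOID": [31039, 46113],          # 46113 is the assumed pre-2015 code
    "democrat": [719.0, 1.0],         # second row holds placeholder counts
    "republican": [3122.0, 1.0],
})

# A left join keeps every shape; `_merge` records whether votes were found.
merged = shapes.merge(votes, on="GEOID", how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", ["GEOID", "NAME"]]
print(unmatched)
```

Running the same pattern on `county` and `c2016` would show every county that the inner merge silently drops, not just the one visible on the map.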

In [5]:
# Label each county by its plurality winner; counties where neither major party leads stay unlabeled.
mask = (cdf2016['republican'] > cdf2016['democrat']) & (cdf2016['republican'] > cdf2016['na'])
cdf2016.loc[mask,'color'] = 'Trump'

mask = (cdf2016['democrat'] > cdf2016['republican']) & (cdf2016['democrat'] > cdf2016['na'])
cdf2016.loc[mask,'color'] = 'Clinton'

# Create the figure once at its final size; a separate plt.figure() call only adds an empty figure.
f,ax = plt.subplots(1,1,figsize=(15,15))

# Crop the x-axis to exclude far-western and far-eastern longitudes.
ax.set_xlim(-135,-60)
cdf2016.plot(column='color',ax=ax,
             legend=True,cmap='RdBu',
             categories=['Trump','Clinton']
            )
Out[5]:
<matplotlib.axes._subplots.AxesSubplot at 0x2a2ec4a2580>