maup is the geospatial toolkit for redistricting data. The package streamlines the basic workflows that arise when working with blocks, precincts, and districts, such as Assigning precincts to districts, Aggregating block data to precincts,Disaggregating data from precincts down to blocks,Prorating data when units do not nest neatly, and Fixing overlaps and gaps
The project's priorities are to be efficient by using spatial indices whenever possible and to integrate well with the existing ecosystem around
shapely. The package is distributed under the MIT License.
We can installing maup from conda-forge using conda:
conda install -c conda-forge maup
Also we can get
conda by installing
Miniconda, a free Python distribution made especially for data science and scientific computing. You might also consider
Anaconda, which includes many data science packages that you might find useful. To install
run pip install maup from your terminal.
Here are some basic situations where we might find
maup helpful. For these examples, we use test data from Providence, Rhode Island
"zip://./examples/blocks.zip") > precincts = geopandas.read_file("zip://./examples/precincts.zip") > districts = geopandas.read_file("zip://./examples/districts.zip")> import geopandas > import pandas > >>> blocks = geopandas.read_file(
Assigning precincts to districts
The assign function in
maup takes two sets of
geometries called sources and targets and returns a
pandas Series. The Series maps each
geometry in sources to the
geometry in targets that covers it. (Here,
geometry A covers
geometry B if every point of A and its
boundary lies in B or its
boundary.) If a source
geometry is not covered by one single target
geometry, it is assigned to the target
geometry that covers the largest portion of its area.
# Add the assigned districts as a column of the `precincts` GeoDataFrame: > precincts["DISTRICT"] = assignment > assignment.head() 0 7 1 5 2 13 3 6 4 1 dtype: int64> import maup > >>> assignment = maup.assign(precincts, districts) >
As an aside, you can use that assignment object to create a gerrychain Partition representing this districting plan
Aggregating block data to precincts
Precinct shapefiles usually come with election data, but not demographic data. In order to study their demographics, we need to aggregate demographic data from census blocks up to the precinct level. We can do this by assigning blocks to precincts and then aggregating the data with a Pandas groupby operation:
"TOTPOP", "NH_BLACK", "NH_WHITE"] > >>> assignment = maup.assign(blocks, precincts) > precincts[variables] = blocks[variables].groupby(assignment).sum() > precincts[variables].head() TOTPOP NH_BLACK NH_WHITE 0 5907 886 380 1 5636 924 1301 2 6549 584 4699 3 6009 435 1053 4 4962 156 3713> variables = [
If you want to move data from one set of geometries to another but your source and target geometries do not nest neatly (i.e. have overlaps), see Prorating data when units do not nest neatly.
Disaggregating data from precincts down to blocks
It's common to have data at a coarser scale that you want to attach to finer-scaled geometries. Usually this happens when vote totals for a certain election are only reported at the county level, and we want to attach that data to precinct geometries.
Let's say we want to prorate the vote totals in the columns
"PRES16R" from our precincts
GeoDataFrame down to our blocks
GeoDataFrame. The first crucial step is to decide how we want to distribute a precinct's data to the blocks within it. Since we're prorating election data, it makes sense to use a block's total population or
voting-age population. Here's how we might prorate by population
"PRES16D", "PRES16R"] > assignment = maup.assign(blocks, precincts) > >>> # We prorate the vote totals according to each block's share of the overall > # precinct population: > weights = blocks.TOTPOP / assignment.map(precincts.TOTPOP) > prorated = maup.prorate(assignment, precincts[election_columns], weights) > >>> # Add the prorated vote totals as columns on the `blocks` GeoDataFrame: > blocks[election_columns] = prorated > # We'll call .round(2) to round the values for display purposes. > blocks[election_columns].round(2).head() PRES16D PRES16R 0 0.00 0.00 1 12.26 1.70 2 15.20 2.62 3 15.50 2.67 4 3.28 0.45> election_columns = [
Warning about areal interpolation
We strongly urge you not to prorate by area! The area of a census block is not a good predictor of its population. In fact, the correlation goes in the other direction: larger census blocks are less populous than smaller ones.
Prorating data when units do not nest neatly
Suppose you have a shapefile of precincts with some election results data and you want to join that data onto a different, more recent precincts shapefile. The two sets of precincts will have overlaps, and will not nest neatly like the blocks and precincts did in the above examples. (Not that blocks and precincts always nest neatly...)
We can use
maup.intersections to break the two sets of precincts into pieces that nest neatly into both sets. Then we can disaggregate from the old precincts onto these pieces, and aggregate up from the pieces to the new precincts. This move is a bit complicated, so maup provides a function called
prorate that does just that.
We'll use our same blocks
GeoDataFrame to estimate the populations of the pieces for the purposes of proration. For our
"new precincts" shapefile, we'll use the
VTD shapefile for Rhode Island that the U.S. Census Bureau produced as part of their 2018 test run of for the 2020 Census.
"zip://./examples/new_precincts.zip") > >>> columns = ["SEN18D", "SEN18R"] > >>> # Include area_cutoff=0 to ignore any intersections with no area, > # like boundary intersections, which we do not want to include in > # our proration. > pieces = maup.intersections(old_precincts, new_precincts, area_cutoff=0) > >>> # Weight by prorated population from blocks > weights = blocks["TOTPOP"].groupby(maup.assign(blocks, pieces)).sum() > # Normalize the weights so that votes are allocated according to their > # share of population in the old_precincts > weights = maup.normalize(weights, level=0) > >>> # Use blocks to estimate population of each piece > new_precincts[columns] = maup.prorate( ... pieces, ... old_precincts[columns], ... weights=weights ... ) > new_precincts[columns].head() SEN18D SEN18R 0 752.0 51.0 1 370.0 21.0 2 97.0 17.0 3 585.0 74.0 4 246.0 20.0> old_precincts = precincts > new_precincts = geopandas.read_file(
Fixing overlaps and gaps
Precinct shapefiles are often created by stitching together collections of precinct geometries sourced from different counties or different years. As a result, the shapefile often has
overlaps between precincts where the different sources disagree about the boundaries. These
overlaps pose problems when you are interested in working with the adjacency graph of the precincts, and not just in mapping the precincts. This adjacency information is especially important when studying redistricting, because districts are almost always expected to be contiguous.
maup provides functions for closing gaps and resolving overlaps in a collection of geometries. As an example, we'll apply both functions to these geometries, which have both an
overlap and a
overlaps in real shapefiles are tiny and easy to miss, but this exaggerated example will help illustrate the functionality.First, we'll use shapely to create the polygons from scratch:
from shapely.geometry import Polygon geometries = geopandas.GeoSeries([ Polygon([(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]), Polygon([(2, 0), (4, 0), (4, 2), (2, 2)]), Polygon([(0, 2), (2, 2), (2, 4), (0, 4)]), Polygon([(2, 1), (4, 1), (4, 4), (2, 4)]), ])
Now we'll close the
0 POLYGON ((0 0, 2 0, 2 1, 1 1, 1 2, 0 2, 0 0)) 1 POLYGON ((2 0, 4 0, 4 2, 2 2, 2 0)) 2 POLYGON ((0 2, 2 2, 2 4, 0 4, 0 2)) 3 POLYGON ((2 1, 4 1, 4 4, 2 4, 2 1)) dtype: object> without_gaps = maup.close_gaps(geometries) > without_gaps
And then resolve the
0 POLYGON ((0 0, 2 0, 2 1, 1 1, 1 2, 0 2, 0 0)) 1 POLYGON ((2 0, 4 0, 4 2, 2 2, 2 0)) 2 POLYGON ((0 2, 2 2, 2 4, 0 4, 0 2)) 3 POLYGON ((2 1, 4 1, 4 4, 2 4, 2 1)) dtype: object> without_overlaps_or_gaps = maup.resolve_overlaps(without_gaps) > without_overlaps_or_gaps
Both of the functions
close_gaps accept a
relative_threshold argument. This threshold controls how large of a gap or overlap the function will attempt to fix. The default value of
0.1, which means that the functions will leave alone any
overlap whose area is more than
10% of the area of the geometries that might absorb that
overlap. In the above example, we set
relative_threshold=None to ensure that no
overlaps were ignored.