Note
Go to the end to download the full example code.
Aggregate from points
Group points into grid cells
Note
For a tiled implementation of this approach using Dask, see example aggregate_dask.py
Introduction
Grouping points in cells (here referred to as ‘aggregation’) is a common occurrence in spatial data processing. This is often used to generate heatmaps, obtain statistics of the data or to reduce the size of the data for easier processing. By grouping nearby points in the same cell, you can for example calculate the standard deviation to get a feel for the variability of your data.
In this example we will count the number of points in a cell. This gives a sense of the distribution of the points. Other common operations for this kind of exercise are: std, mean, median, percentile, min and max.
Tip
Groupby operations can also be done on polygons of arbitrary shape. If this is of interest to you, have a look at GeoPanda’s sjoin
Generate input data
Let’s start by generating some points. The data will be a set of points scattered around a circle to create a dougnut-like shape. I’ll plot the input here to show what it looks like.
import matplotlib.pyplot as plt
from gridkit.doc_utils import generate_2d_scatter_doughnut, plot_polygons
points = generate_2d_scatter_doughnut(num_points=2000, radius=4)
# Create the scatter plot
plt.scatter(*points.T, s=5)
plt.xlabel("X")
plt.ylabel("Y")
plt.title("Doughnut with Gaussian Scattering")
plt.axis("equal")
plt.show()

Relate points to grid cells
Now we can create a grid and aggregate our points onto that grid.
from gridkit import HexGrid
grid = HexGrid(size=1, shape="flat")
cell_ids = grid.cell_at_point(points)
/home/runner/work/GridKit/GridKit/gridkit/hex_grid.py:100: UserWarning: A 'flat' ``shape`` will be deprecated in version v1.0.0. It is advised to use ``rotation=30`` instead.
warnings.warn(
Count the points per cell
The cell_ids
obtained in the previous step can be used to group the points.
Each point with the same ‘cell_id’ will be regarded as being in the same ‘bin’.
We can then do statistics on these bins. In our case we will count the number of points per bin.
For convenience, I will use a panda’s groupby functionality for this.
Note that GridIndex.index_1d()
is used here, and not GridIndex.index.
In the latter the x and y ids are split, which does not work well with DataFrames.
import pandas
df = pandas.DataFrame(
{"nr_points": 0, "cell_id": cell_ids.index_1d}
) # The 'nr_points' will contain the result after `.count()` is called
occurrences = df.groupby("cell_id").count()
Visualize the results
Now we have the number of points per cell, let’s obtain the corresponding cell shapes from the grid object and plot them.
Since we used GridIndex.index_1d()
,
we will have to convert that back into a GridIndex using GridIndex.from_index_1d()
.
from gridkit import GridIndex
occurrences_ids = GridIndex.from_index_1d(occurrences.index)
geoms = grid.to_shapely(occurrences_ids)
plot_polygons(geoms, colors=occurrences["nr_points"].values, cmap="Oranges")
plt.title("Number of points per cell")
plt.show()

Total running time of the script: (0 minutes 0.310 seconds)