.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "example_gallery/vector_synergy/aggregate.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_example_gallery_vector_synergy_aggregate.py:

.. _example aggregate:

Aggregate from points
=====================

Group points into grid cells

.. Note ::

    For a tiled implementation of this approach using Dask,
    see example :ref:`aggregate_dask.py `

TL;DR
-----

.. code-block:: python
    :linenos:
    :emphasize-lines: 5-7

    import pandas
    from gridkit import HexGrid

    grid = HexGrid(size=1, shape="flat")
    cell_ids = grid.cell_at_point(points)
    df = pandas.DataFrame({"nr_points": 0, "cell_id": list(cell_ids)})
    occurrences = df.groupby("cell_id").count()

..

Introduction
------------

Grouping points into cells (referred to here as 'aggregation') is a common operation in spatial data processing.
It is often used to generate heatmaps, to obtain statistics of the data, or to reduce the size of the data for easier processing.
By grouping nearby points in the same cell you can, for example, calculate the standard deviation per cell
to get a feel for the local variability of your data.
In this example we will count the number of points in each cell.
This gives a sense of how the points are distributed.
Other common operations for this kind of exercise are: std, mean, median, percentile, min and max.

.. Tip ::

    Groupby operations can also be done on polygons of arbitrary shape.
    If this is of interest to you, have a look at `GeoPandas' sjoin `_

..

Generate input data
-------------------

Let's start by generating some points.
The data will be a set of points scattered around a circle to create a doughnut-like shape.
I'll plot the input here to show what it looks like.

.. GENERATED FROM PYTHON SOURCE LINES 61-78

.. code-block:: Python

    import matplotlib.pyplot as plt

    from gridkit.doc_utils import generate_2d_scatter_doughnut, plot_polygons

    points = generate_2d_scatter_doughnut(num_points=2000, radius=4)

    # Create the scatter plot
    plt.scatter(*points.T, s=5)
    plt.xlabel("X")
    plt.ylabel("Y")
    plt.title("Doughnut with Gaussian Scattering")
    plt.axis("equal")
    plt.show()

.. image-sg:: /example_gallery/vector_synergy/images/sphx_glr_aggregate_001.png
   :alt: Doughnut with Gaussian Scattering
   :srcset: /example_gallery/vector_synergy/images/sphx_glr_aggregate_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 80-84

Relate points to grid cells
---------------------------

Now we can create a grid and look up the cell that contains each point.

.. GENERATED FROM PYTHON SOURCE LINES 85-91

.. code-block:: Python

    from gridkit import HexGrid

    grid = HexGrid(size=1, shape="flat")
    cell_ids = grid.cell_at_point(points)

.. GENERATED FROM PYTHON SOURCE LINES 92-100

Count the points per cell
-------------------------

The ``cell_ids`` obtained in the previous step can be used to group the points.
All points with the same ``cell_id`` are regarded as being in the same 'bin'.
We can then compute statistics on these bins.
In our case we will count the number of points per bin.
For convenience, I will use pandas' groupby functionality for this.

.. GENERATED FROM PYTHON SOURCE LINES 101-108

.. code-block:: Python

    import pandas

    df = pandas.DataFrame(
        {"nr_points": 0, "cell_id": list(cell_ids)}
    )  # The 'nr_points' will contain the result after `.count()` is called
    occurrences = df.groupby("cell_id").count()

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /tmp/gridkit_docs/v0.7.1/examples/vector_synergy/aggregate.py:101: DeprecationWarning:
    Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
    (to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
    but was not found to be installed on your system.
    If this would cause problems for you,
    please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466

      import pandas

.. GENERATED FROM PYTHON SOURCE LINES 109-113

Visualize the results
---------------------

Now that we have the number of points per cell,
let's obtain the corresponding cell shapes from the grid object and plot them.

.. GENERATED FROM PYTHON SOURCE LINES 114-118

.. code-block:: Python

    geoms = grid.to_shapely(occurrences.index.to_list())
    plot_polygons(geoms, colors=occurrences["nr_points"].values, cmap="Oranges")
    plt.title("Number of points per cell")
    plt.show()

.. image-sg:: /example_gallery/vector_synergy/images/sphx_glr_aggregate_002.png
   :alt: Number of points per cell
   :srcset: /example_gallery/vector_synergy/images/sphx_glr_aggregate_002.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.510 seconds)

.. _sphx_glr_download_example_gallery_vector_synergy_aggregate.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: aggregate.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: aggregate.py `

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_
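
Other statistics per cell
-------------------------

The example above only counts points, but the introduction also mentions std, mean, median, min and max.
The following is a minimal sketch of how several of these could be computed in one pass with pandas.
It is not part of the original example: the points and the ``value`` measurements are made up,
and a simple square binning (``floor(coordinate / cell_size)``) is used as a stand-in for ``grid.cell_at_point``
so that the snippet runs without gridkit.

```python
import numpy as np
import pandas as pd

# Five hypothetical points with one made-up measurement each.
points = np.array([[0.2, 0.3], [0.8, 0.1], [1.5, 0.5], [1.2, 0.9], [1.9, 0.4]])
values = np.array([1.0, 3.0, 2.0, 4.0, 6.0])

# Assumption: square cells of size 1 stand in for grid.cell_at_point(points).
cell_size = 1.0
ids = np.floor(points / cell_size).astype(int)

df = pd.DataFrame({"cell_x": ids[:, 0], "cell_y": ids[:, 1], "value": values})

# One row per cell, one column per statistic.
stats = df.groupby(["cell_x", "cell_y"])["value"].agg(
    ["count", "mean", "median", "std", "min", "max"]
)
print(stats)
```

Keeping the two cell index components in separate columns gives a two-level index on the result,
so a single cell can be looked up with ``stats.loc[(1, 0)]``.
Note that pandas computes ``std`` as the sample standard deviation (``ddof=1``) by default.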