
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "example_gallery/vector_synergy/aggregate.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_example_gallery_vector_synergy_aggregate.py:

.. _example aggregate:

Aggregate from points
=====================

Group points into grid cells

.. Note ::

    For a tiled implementation of this approach using Dask, see example :ref:`aggregate_dask.py `

Introduction
------------

Grouping points into cells (referred to here as 'aggregation') is a common operation in spatial data processing.
It is often used to generate heatmaps, to obtain statistics of the data, or to reduce the size of the data for easier processing.
By grouping nearby points into the same cell you can, for example, calculate the standard deviation per cell
to get a feel for the local variability of your data.

In this example we will count the number of points in each cell.
This gives a sense of the distribution of the points.
Other common operations for this kind of exercise are:
std, mean, median, percentile, min and max.

.. Tip ::

    Groupby operations can also be done on polygons of arbitrary shape.
    If this is of interest to you, have a look at `GeoPandas' sjoin `_

..

Generate input data
-------------------

Let's start by generating some points.
The data will be a set of points scattered around a circle to create a doughnut-like shape.
I'll plot the input here to show what it looks like.

.. GENERATED FROM PYTHON SOURCE LINES 43-60

.. code-block:: Python

    import matplotlib.pyplot as plt

    from gridkit.doc_utils import generate_2d_scatter_doughnut, plot_polygons

    points = generate_2d_scatter_doughnut(num_points=2000, radius=4)

    # Create the scatter plot
    plt.scatter(*points.T, s=5)
    plt.xlabel("X")
    plt.ylabel("Y")
    plt.title("Doughnut with Gaussian Scattering")
    plt.axis("equal")
    plt.show()

.. image-sg:: /example_gallery/vector_synergy/images/sphx_glr_aggregate_001.png
   :alt: Doughnut with Gaussian Scattering
   :srcset: /example_gallery/vector_synergy/images/sphx_glr_aggregate_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 62-66

Relate points to grid cells
---------------------------

Now we can create a grid and find, for each point, the cell of that grid that contains it.

.. GENERATED FROM PYTHON SOURCE LINES 67-73

.. code-block:: Python

    from gridkit import HexGrid

    grid = HexGrid(size=1, shape="flat")
    cell_ids = grid.cell_at_point(points)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /home/runner/work/GridKit/GridKit/gridkit/hex_grid.py:100: UserWarning: A 'flat' ``shape`` will be deprecated in version v1.0.0. It is advised to use ``rotation=30`` instead.
      warnings.warn(

.. GENERATED FROM PYTHON SOURCE LINES 74-84

Count the points per cell
-------------------------

The ``cell_ids`` obtained in the previous step can be used to group the points.
Points with the same ``cell_id`` are regarded as being in the same 'bin'.
We can then compute statistics on these bins.
In our case we will count the number of points per bin.
For convenience, I will use pandas' groupby functionality for this.
Note that :meth:`.GridIndex.index_1d` is used here, and not ``GridIndex.index``.
In the latter the x and y ids are split, which does not work well with DataFrames.

.. GENERATED FROM PYTHON SOURCE LINES 85-92

.. code-block:: Python

    import pandas

    df = pandas.DataFrame(
        {"nr_points": 0, "cell_id": cell_ids.index_1d}
    )  # The 'nr_points' will contain the result after `.count()` is called
    occurrences = df.groupby("cell_id").count()

.. GENERATED FROM PYTHON SOURCE LINES 93-100

Visualize the results
---------------------

Now that we have the number of points per cell,
let's obtain the corresponding cell shapes from the grid object and plot them.
Since we used :meth:`.GridIndex.index_1d`, we will have to convert the ids back
into a GridIndex using :meth:`.GridIndex.from_index_1d`.

.. GENERATED FROM PYTHON SOURCE LINES 101-108

.. code-block:: Python

    from gridkit import GridIndex

    occurrences_ids = GridIndex.from_index_1d(occurrences.index)
    geoms = grid.to_shapely(occurrences_ids)
    plot_polygons(geoms, colors=occurrences["nr_points"].values, cmap="Oranges")
    plt.title("Number of points per cell")
    plt.show()

.. image-sg:: /example_gallery/vector_synergy/images/sphx_glr_aggregate_002.png
   :alt: Number of points per cell
   :srcset: /example_gallery/vector_synergy/images/sphx_glr_aggregate_002.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.310 seconds)

.. _sphx_glr_download_example_gallery_vector_synergy_aggregate.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: aggregate.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: aggregate.py `

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: aggregate.zip `

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_
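
The two aggregation steps used in this example (assign every point to a cell, then count the points per cell) can also be sketched with the standard library only. The snippet below is an illustration under stated assumptions, not GridKit's implementation: it uses a plain square grid instead of a ``HexGrid``, and the helpers ``scatter_doughnut``, ``cell_of`` and ``count_points_per_cell`` are hypothetical names introduced here for the sketch.

```python
import math
import random
from collections import Counter


def scatter_doughnut(num_points, radius, spread=0.5, seed=0):
    """Hypothetical stand-in for the doughnut generator used in the example.

    Points get a uniformly random angle and a Gaussian-distributed distance
    from the origin, centred on ``radius``.
    """
    rng = random.Random(seed)
    points = []
    for _ in range(num_points):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        dist = rng.gauss(radius, spread)
        points.append((dist * math.cos(angle), dist * math.sin(angle)))
    return points


def cell_of(x, y, size=1.0):
    """Return the (column, row) id of the square cell that contains (x, y)."""
    return (math.floor(x / size), math.floor(y / size))


def count_points_per_cell(points, size=1.0):
    """Group points by cell id and count the members of each group."""
    return Counter(cell_of(x, y, size) for x, y in points)


points = scatter_doughnut(num_points=2000, radius=4)
counts = count_points_per_cell(points, size=1.0)

# Every point lands in exactly one cell, so the counts add up to the input size.
total = sum(counts.values())
```

Swapping ``Counter`` for a dictionary of lists would make it possible to compute other per-cell statistics such as the mean or the standard deviation, at the cost of keeping every point in memory.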