Graph Module

The graph module provides functions for converting between different graph representations, including GeoDataFrames, NetworkX graphs, and PyTorch Geometric data objects.

Conversion Functions

Module for creating heterogeneous graph representations of urban environments.

This module provides comprehensive functionality for converting spatial data (GeoDataFrames and NetworkX objects) into PyTorch Geometric Data and HeteroData objects, supporting both homogeneous and heterogeneous graphs. It handles the complex mapping between geographical coordinates, node/edge features, and the tensor representations required by graph neural networks.

The module serves as a bridge between geospatial data analysis tools and deep learning frameworks, enabling integration of spatial urban data with Graph Neural Networks (GNNs) for GeoAI tasks such as urban modeling, traffic prediction, and spatial analysis.

Functions:

Name	Description
`gdf_to_pyg`	Convert GeoDataFrames (nodes/edges) to a PyTorch Geometric object.
`nx_to_pyg`	Convert NetworkX graph to PyTorch Geometric Data object.
`pyg_to_gdf`	Convert PyTorch Geometric data to GeoDataFrames.
`pyg_to_nx`	Convert a PyTorch Geometric object to a NetworkX graph.

gdf_to_pyg

gdf_to_pyg(
    nodes,
    edges=None,
    node_feature_cols=None,
    node_label_cols=None,
    edge_feature_cols=None,
    device=None,
    dtype=None,
    keep_geom=True,
)

Convert GeoDataFrames (nodes/edges) to a PyTorch Geometric object.

This function serves as the main entry point for converting spatial data into PyTorch Geometric graph objects. It automatically detects whether to create homogeneous or heterogeneous graphs based on input structure. Node identifiers are taken from the GeoDataFrame index. Edge relationships are defined by a MultiIndex on the edge GeoDataFrame (source ID, target ID).
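
As a rough illustration of the index conventions described above, the following pure-Python sketch maps node identifiers to consecutive integer positions and turns (source ID, target ID) pairs into a two-row edge index of the kind PyTorch Geometric uses. The helper name `build_edge_index` and the sample identifiers are hypothetical and are not part of city2graph; the library's internal implementation may differ.

```python
def build_edge_index(node_ids, edge_pairs):
    """Map (source ID, target ID) pairs to integer positions.

    Hypothetical helper for illustration only; not part of city2graph.
    """
    # Position of each node ID in the node index (order is preserved)
    id_to_pos = {node_id: pos for pos, node_id in enumerate(node_ids)}

    sources, targets = [], []
    for src_id, dst_id in edge_pairs:
        # Every edge endpoint must exist in the node index
        if src_id not in id_to_pos or dst_id not in id_to_pos:
            raise ValueError(f"Unknown node ID in edge ({src_id!r}, {dst_id!r})")
        sources.append(id_to_pos[src_id])
        targets.append(id_to_pos[dst_id])

    # Two rows: source positions and target positions
    return [sources, targets]


node_ids = ["a", "b", "c"]                 # stands in for nodes_gdf.index
edge_pairs = [("a", "b"), ("b", "c")]      # stands in for edges_gdf.index
edge_index = build_edge_index(node_ids, edge_pairs)
```

Under these assumptions, an edge whose endpoint is missing from the node index is rejected early, which is one plausible reason for incompatible inputs to raise `ValueError`.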

Parameters:

Name	Type	Description	Default
`nodes`	`dict[str, GeoDataFrame] or GeoDataFrame`	Node data. For homogeneous graphs, provide a single GeoDataFrame. For heterogeneous graphs, provide a dictionary mapping node type names to their respective GeoDataFrames. The index of these GeoDataFrames will be used as node identifiers.	required
`edges`	`dict[tuple[str, str, str], GeoDataFrame] or GeoDataFrame`	Edge data. For homogeneous graphs, provide a single GeoDataFrame. For heterogeneous graphs, provide a dictionary mapping edge type tuples (source_type, relation_type, target_type) to their GeoDataFrames. The GeoDataFrame must have a MultiIndex where the first level represents source node IDs and the second level represents target node IDs.	`None`
`node_feature_cols`	`dict[str, list[str]] or list[str]`	Column names to use as node features. For heterogeneous graphs, provide a dictionary mapping node types to their feature columns.	`None`
`node_label_cols`	`dict[str, list[str]] or list[str]`	Column names to use as node labels for supervised learning tasks. For heterogeneous graphs, provide a dictionary mapping node types to their label columns.	`None`
`edge_feature_cols`	`dict[str, list[str]] or list[str]`	Column names to use as edge features. For heterogeneous graphs, provide a dictionary mapping relation types to their feature columns.	`None`
`device`	`str or device`	Target device for tensor placement ('cpu', 'cuda', or torch.device). If None, automatically selects CUDA if available, otherwise CPU.	`None`
`dtype`	`dtype`	Data type for float tensors (e.g., torch.float32, torch.float16). If None, uses torch.float32 (default PyTorch float type).	`None`
`keep_geom`	`bool`	Whether to preserve geometry information during conversion. If True, original geometries are serialized and stored in metadata for exact reconstruction. If False, geometries are reconstructed from node positions during conversion back to GeoDataFrames (creating straight-line edges between nodes).	`True`

Returns:

Type	Description
`Data or HeteroData`	PyTorch Geometric Data object for homogeneous graphs or HeteroData object for heterogeneous graphs. The returned object contains: node features (x), positions (pos), and labels (y) if available; edge connectivity (edge_index) and features (edge_attr) if available; and metadata for reconstruction, including ID mappings and column names.

Raises:

Type	Description
`ImportError`	If PyTorch Geometric is not installed.
`ValueError`	If input GeoDataFrames are invalid or incompatible.

See Also

pyg_to_gdf : Convert PyTorch Geometric data back to GeoDataFrames.
nx_to_pyg : Convert NetworkX graph to PyTorch Geometric object.
city2graph.utils.validate_gdf : Validate GeoDataFrame structure.

Notes

This function automatically detects the graph type based on input structure. For heterogeneous graphs, provide dictionaries mapping types to GeoDataFrames. Node positions are automatically extracted from geometry centroids when available.

Preserves original coordinate reference systems (CRS)
Maintains index structure for bidirectional conversion
Handles both Point and non-Point geometries (using centroids)
Creates empty tensors for missing features/edges
For heterogeneous graphs, ensures consistent node/edge type mapping

Examples:

Create a homogeneous graph from single GeoDataFrames:

>>> import geopandas as gpd
>>> from city2graph.graph import gdf_to_pyg
>>>
>>> # Load and prepare node data
>>> nodes_gdf = gpd.read_file("nodes.geojson").set_index("node_id")
>>> edges_gdf = gpd.read_file("edges.geojson").set_index(["source_id", "target_id"])
>>>
>>> # Convert to PyTorch Geometric
>>> data = gdf_to_pyg(nodes_gdf, edges_gdf,
...                   node_feature_cols=['population', 'area'])

Create a heterogeneous graph from dictionaries:

>>> # Prepare heterogeneous data
>>> buildings_gdf = buildings_gdf.set_index("building_id")
>>> roads_gdf = roads_gdf.set_index("road_id")
>>> connections_gdf = connections_gdf.set_index(["building_id", "road_id"])
>>>
>>> # Define node and edge types
>>> nodes_dict = {'building': buildings_gdf, 'road': roads_gdf}
>>> edges_dict = {('building', 'connects', 'road'): connections_gdf}
>>>
>>> # Convert to heterogeneous graph with labels
>>> data = gdf_to_pyg(nodes_dict, edges_dict,
...                   node_label_cols={'building': ['type'], 'road': ['category']})

nx_to_pyg

nx_to_pyg(
    graph,
    node_feature_cols=None,
    node_label_cols=None,
    edge_feature_cols=None,
    device=None,
    dtype=None,
    keep_geom=True,
)

Convert NetworkX graph to PyTorch Geometric Data object.

Converts a NetworkX Graph to a PyTorch Geometric Data object by first converting to GeoDataFrames then using the main conversion pipeline. This provides a bridge between NetworkX's rich graph analysis tools and PyTorch Geometric's deep learning capabilities.

Parameters:

Name	Type	Description	Default
`graph`	`Graph`	NetworkX graph to convert.	required
`node_feature_cols`	`list[str]`	List of node attribute names to use as features.	`None`
`node_label_cols`	`list[str]`	List of node attribute names to use as labels.	`None`
`edge_feature_cols`	`list[str]`	List of edge attribute names to use as features.	`None`
`device`	`device or str`	Target device for tensor placement ('cpu', 'cuda', or torch.device). If None, automatically selects CUDA if available, otherwise CPU.	`None`
`dtype`	`dtype`	Data type for float tensors (e.g., torch.float32, torch.float16). If None, uses torch.float32 (default PyTorch float type).	`None`
`keep_geom`	`bool`	Whether to preserve geometry information during conversion. If True, original geometries are serialized and stored in metadata for exact reconstruction. If False, geometries are reconstructed from node positions during conversion back to GeoDataFrames.	`True`

Returns:

Type	Description
`Data or HeteroData`	PyTorch Geometric Data object for homogeneous graphs or HeteroData object for heterogeneous graphs. The returned object contains: node features (x), positions (pos), and labels (y) if available; edge connectivity (edge_index) and features (edge_attr) if available; and metadata for reconstruction, including ID mappings and column names.

Raises:

Type	Description
`ImportError`	If PyTorch Geometric is not installed.
`ValueError`	If the NetworkX graph is invalid or empty.

See Also

pyg_to_nx : Convert PyTorch Geometric data to NetworkX graph.
gdf_to_pyg : Convert GeoDataFrames to PyTorch Geometric object.
city2graph.utils.nx_to_gdf : Convert NetworkX graph to GeoDataFrames.

Notes

Uses intermediate GeoDataFrame conversion for consistency
Preserves all graph attributes and metadata
Handles spatial coordinates if present in node attributes
Maintains compatibility with existing city2graph workflows
Automatically creates geometry from 'x', 'y' coordinates if available

Examples:

Convert a NetworkX graph with spatial data:

>>> import networkx as nx
>>> from city2graph.graph import nx_to_pyg
>>>
>>> # Create NetworkX graph with spatial attributes
>>> G = nx.Graph()
>>> G.add_node(0, x=0.0, y=0.0, population=1000)
>>> G.add_node(1, x=1.0, y=1.0, population=1500)
>>> G.add_edge(0, 1, weight=0.5, road_type='primary')
>>>
>>> # Convert to PyTorch Geometric
>>> data = nx_to_pyg(G,
...                  node_feature_cols=['population'],
...                  edge_feature_cols=['weight'])

Convert from graph analysis results:

>>> # Use NetworkX for analysis, then convert for ML
>>> communities = nx.community.greedy_modularity_communities(G)
>>> # Add community labels to nodes
>>> for i, community in enumerate(communities):
...     for node in community:
...         G.nodes[node]['community'] = i
>>>
>>> # Convert with community labels
>>> data = nx_to_pyg(G, node_label_cols=['community'])

pyg_to_gdf

pyg_to_gdf(data, node_types=None, edge_types=None, keep_geom=True)

Convert PyTorch Geometric data to GeoDataFrames.

Reconstructs the original GeoDataFrame structure from PyTorch Geometric Data or HeteroData objects. This function provides bidirectional conversion capability, preserving spatial information, feature data, and metadata.

Parameters:

Name	Type	Description	Default
`data`	`Data or HeteroData`	PyTorch Geometric data object to convert back to GeoDataFrames.	required
`node_types`	`str or list[str]`	For heterogeneous graphs, specify which node types to reconstruct. If None, reconstructs all available node types.	`None`
`edge_types`	`str or list[tuple[str, str, str]]`	For heterogeneous graphs, specify which edge types to reconstruct. Edge types are specified as (source_type, relation_type, target_type) tuples. If None, reconstructs all available edge types.	`None`
`keep_geom`	`bool`	Whether to use stored geometries for reconstruction. If True and geometries are stored in metadata, uses the original geometries. If False or no stored geometries exist, reconstructs geometries from node positions (creating straight-line edges between nodes).	`True`
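
The straight-line fallback described for `keep_geom` can be pictured with the minimal sketch below, which pairs each edge's source and target positions into a two-point segment. The helper `straight_line_edges` and the sample coordinates are hypothetical and only illustrate the idea; they are not city2graph's actual reconstruction code.

```python
def straight_line_edges(positions, edge_index):
    """Build two-point line coordinates for each edge from node positions.

    Hypothetical helper for illustration only; not part of city2graph.
    """
    sources, targets = edge_index
    lines = []
    for src, dst in zip(sources, targets):
        # Each edge becomes a segment from the source to the target position
        lines.append((tuple(positions[src]), tuple(positions[dst])))
    return lines


positions = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]   # one (x, y) per node
edge_index = [[0, 1], [1, 2]]                      # source row, target row
lines = straight_line_edges(positions, edge_index)
```

Any curvature in the original edge geometries is lost in this form, which is why storing the geometries is needed for exact reconstruction.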

Returns:

Type	Description
`tuple[GeoDataFrame, GeoDataFrame] \| tuple[dict[str, GeoDataFrame], dict[tuple[str, str, str], GeoDataFrame]]`	For Data input: a tuple whose first element is the GeoDataFrame of nodes and whose second element is the GeoDataFrame of edges (or None if there are no edges). For HeteroData input: a tuple whose first element is a dict mapping node type names to GeoDataFrames and whose second element is a dict mapping edge types to GeoDataFrames.

See Also

gdf_to_pyg : Convert GeoDataFrames to PyTorch Geometric object.
pyg_to_nx : Convert PyTorch Geometric data to NetworkX graph.

Notes

Preserves original index structure and names when available
Reconstructs geometry from stored position tensors
Maintains coordinate reference system (CRS) information
Converts feature tensors back to named DataFrame columns
Handles both homogeneous and heterogeneous graph structures

Examples:

Convert homogeneous PyTorch Geometric data back to GeoDataFrames:

>>> from city2graph.graph import pyg_to_gdf
>>>
>>> # Convert back to GeoDataFrames
>>> nodes_gdf, edges_gdf = pyg_to_gdf(data)

Convert heterogeneous data with specific node types:

>>> # Convert only specific node types
>>> node_gdfs, edge_gdfs = pyg_to_gdf(hetero_data,
...                                   node_types=['building', 'road'])

pyg_to_nx

pyg_to_nx(data, keep_geom=True)

Convert a PyTorch Geometric object to a NetworkX graph.

Converts PyTorch Geometric Data or HeteroData objects to NetworkX graphs, preserving node and edge features as graph attributes. This enables compatibility with the extensive NetworkX ecosystem for graph analysis.

Parameters:

Name	Type	Description	Default
`data`	`Data or HeteroData`	PyTorch Geometric data object to convert.	required
`keep_geom`	`bool`	Whether to use stored geometries for reconstruction. If True and geometries are stored in metadata, uses the original geometries. If False or no stored geometries exist, reconstructs geometries from node positions.	`True`

Returns:

Type	Description
`Graph`	The converted NetworkX graph with node and edge attributes. For heterogeneous graphs, node and edge types are stored as attributes.

Raises:

Type	Description
`ImportError`	If PyTorch Geometric is not installed.

See Also

nx_to_pyg : Convert NetworkX graph to PyTorch Geometric object.
pyg_to_gdf : Convert PyTorch Geometric data to GeoDataFrames.

Notes

Node features, positions, and labels are stored as node attributes
Edge features are stored as edge attributes
For heterogeneous graphs, type information is preserved
Geometry information is converted from tensor positions
Maintains compatibility with NetworkX analysis algorithms

Examples:

Convert PyTorch Geometric data to NetworkX:

>>> from city2graph.graph import pyg_to_nx
>>> import networkx as nx
>>>
>>> # Convert to NetworkX graph
>>> nx_graph = pyg_to_nx(data)
>>>
>>> # Use NetworkX algorithms
>>> centrality = nx.betweenness_centrality(nx_graph)
>>> communities = nx.community.greedy_modularity_communities(nx_graph)

Validation Functions

Functions:

Name	Description
`is_torch_available`	Check if PyTorch Geometric is available.
`validate_pyg`	Validate PyTorch Geometric Data or HeteroData objects and return metadata.

is_torch_available

is_torch_available()

Check if PyTorch Geometric is available.

This utility function checks whether the required PyTorch and PyTorch Geometric packages are installed and can be imported. It's useful for conditional functionality and providing helpful error messages.
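
A check of this kind can be approximated with the standard library alone. The sketch below uses `importlib.util.find_spec` to test whether packages can be located without importing them; the helper name `packages_available` is hypothetical, and city2graph's own check may work differently (for example, by attempting the imports).

```python
import importlib.util


def packages_available(*names):
    """Return True only if every named top-level package can be located.

    Hypothetical helper for illustration only; not part of city2graph.
    """
    # find_spec returns None when a top-level package is not installed
    return all(importlib.util.find_spec(name) is not None for name in names)


# Both packages must be present for the torch-dependent functions to work
torch_ready = packages_available("torch", "torch_geometric")
```

Locating a package is cheaper than importing it, but it does not guarantee that the import itself will succeed.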

Returns:

Type	Description
`bool`	True if PyTorch Geometric can be imported, False otherwise.

See Also

gdf_to_pyg : Convert GeoDataFrames to PyTorch Geometric (requires torch).
pyg_to_gdf : Convert PyTorch Geometric to GeoDataFrames (requires torch).

Notes

Returns False if either PyTorch or PyTorch Geometric is missing
Used internally by torch-dependent functions to provide helpful error messages

Examples:

Check availability before using torch-dependent functions:

>>> from city2graph.graph import is_torch_available
>>>
>>> if is_torch_available():
...     from city2graph.graph import gdf_to_pyg
...     data = gdf_to_pyg(nodes_gdf, edges_gdf)
... else:
...     print("PyTorch Geometric not available.")

validate_pyg

validate_pyg(data)

Validate PyTorch Geometric Data or HeteroData objects and return metadata.

This centralized validation function performs comprehensive validation of PyG objects, including type checking, metadata validation, and structural consistency checks. It serves as the single point of validation for all PyG objects in city2graph.

Parameters:

Name	Type	Description	Default
`data`	`Data or HeteroData`	PyTorch Geometric data object to validate.	required

Returns:

Type	Description
`GraphMetadata`	Metadata object containing graph information for reconstruction.

Raises:

Type	Description
`ImportError`	If PyTorch Geometric is not installed.
`TypeError`	If data is not a valid PyTorch Geometric object.
`ValueError`	If the data object is missing required metadata or is inconsistent.

See Also

pyg_to_gdf : Convert PyG objects to GeoDataFrames.
pyg_to_nx : Convert PyG objects to NetworkX graphs.

Examples:

>>> from city2graph.graph import gdf_to_pyg, validate_pyg
>>>
>>> data = gdf_to_pyg(nodes_gdf, edges_gdf)
>>> metadata = validate_pyg(data)

Metapath Functions

Functions:

Name	Description
`add_metapaths`	Add metapath-derived edges to a heterogeneous graph.
`add_metapaths_by_weight`	Connect nodes of a specific type if they are reachable within a cost threshold band.

add_metapaths

add_metapaths(
    graph=None,
    nodes=None,
    edges=None,
    sequence=None,
    new_relation_name=None,
    edge_attr=None,
    edge_attr_agg="sum",
    directed=False,
    trace_path=False,
    multigraph=False,
    as_nx=False,
    **_
)

Add metapath-derived edges to a heterogeneous graph.

The operation multiplies typed adjacency tables to connect terminal node pairs and can aggregate additional numeric edge attributes along the way.
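
To make the idea of multiplying typed adjacency tables concrete, here is a minimal pure-Python sketch for a two-step metapath. It joins two edge tables on their shared intermediate node, counts the paths between each terminal pair (the same number a product of the two adjacency matrices would give), and sums a numeric attribute along each path. The helper `compose_two_step`, the sample data, and the exact way attributes are combined across multiple paths are assumptions for illustration, not city2graph's implementation.

```python
from collections import defaultdict


def compose_two_step(first_edges, second_edges):
    """Join two typed edge tables on their shared intermediate node.

    Each table maps (source, target) -> numeric attribute. Returns a dict
    mapping (start, end) -> {"weight": path count, "attr": summed attribute}.
    Hypothetical helper for illustration only; not part of city2graph.
    """
    # Index the second step by its source node for fast lookup
    by_source = defaultdict(list)
    for (mid, end), attr in second_edges.items():
        by_source[mid].append((end, attr))

    result = defaultdict(lambda: {"weight": 0, "attr": 0.0})
    for (start, mid), attr_1 in first_edges.items():
        for end, attr_2 in by_source.get(mid, []):
            entry = result[(start, end)]
            entry["weight"] += 1                # one more path found
            entry["attr"] += attr_1 + attr_2    # "sum" aggregation along the path
    return dict(result)


# ('building', 'connects', 'road') followed by ('road', 'connects', 'building')
building_to_road = {("b1", "r1"): 2.0, ("b2", "r1"): 3.0, ("b1", "r2"): 1.0}
road_to_building = {("r1", "b3"): 3.0, ("r2", "b3"): 4.0}
metapath_edges = compose_two_step(building_to_road, road_to_building)
```

In this toy data, `b1` reaches `b3` through both `r1` and `r2`, so that pair receives a path count of 2, while `b2` reaches `b3` through `r1` only.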

Parameters:

Name	Type	Description	Default
`graph`	`tuple or Graph or MultiGraph`	Heterogeneous graph input expressed as typed GeoDataFrame dictionaries or a city2graph-compatible NetworkX graph.	`None`
`nodes`	`dict[str, GeoDataFrame]`	Dictionary of node GeoDataFrames.	`None`
`edges`	`dict[tuple[str, str, str], GeoDataFrame]`	Dictionary of edge GeoDataFrames.	`None`
`sequence`	`list[tuple[str, str, str]]`	Sequence of metapath specifications; every edge type is a `(src_type, relation, dst_type)` tuple and the path must contain at least two steps.	`None`
`new_relation_name`	`str`	Target edge relation name for the new metapath edges. If None (default), edges are named `metapath_0`.	`None`
`edge_attr`	`str \| list[str] \| None`	Numeric edge attributes to aggregate along metapaths. When `None`, only path weights are produced.	`None`
`edge_attr_agg`	`str \| object \| None`	Aggregation strategy for `edge_attr` columns. Supported values are `"sum"` and `"mean"` (default `"sum"`).	`'sum'`
`directed`	`bool`	Treat metapaths as directed when `True`; otherwise both edge directions are accepted when available in the input graph.	`False`
`trace_path`	`bool`	When `True`, attempt to create traced geometries. Currently ignored but retained for API compatibility.	`False`
`multigraph`	`bool`	When returning NetworkX data, build a `networkx.MultiGraph` if `True`.	`False`
`as_nx`	`bool`	Return the result as a NetworkX graph when `True`.	`False`
`**_`	`object`	Ignored placeholder for future keyword extensions.	`{}`

Returns:

Type	Description
`tuple[dict[str, GeoDataFrame], dict[tuple[str, str, str], GeoDataFrame]] \| Graph \| MultiGraph`	The graph with metapath-derived edges. If as_nx is False (default), returns a tuple of node and edge GeoDataFrames. If as_nx is True, returns a NetworkX graph (Graph or MultiGraph).

Notes

The trace_path argument is retained for API compatibility but is currently a no-op: straight-line geometries are generated for all metapath edges. The legacy path-tracing scaffolding was removed because it was never executed.

add_metapaths_by_weight

add_metapaths_by_weight(
    graph=None,
    nodes=None,
    edges=None,
    weight=None,
    threshold=None,
    new_relation_name=None,
    min_threshold=0.0,
    edge_types=None,
    endpoint_type=None,
    directed=False,
    multigraph=False,
    as_nx=False,
)

Connect nodes of a specific type if they are reachable within a cost threshold band.

This function dynamically adds metapaths (edges) between nodes of a specified endpoint_type if they are reachable within a given cost band [min_threshold, threshold] based on edge weights (e.g., travel time). It uses Dijkstra's algorithm for path finding via scipy.sparse.csgraph for efficiency.
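
The cost-band idea can be sketched without any dependencies. The example below swaps in a plain `heapq`-based Dijkstra in place of `scipy.sparse.csgraph`, assumes non-negative edge weights, and keeps only nodes whose shortest-path cost falls in [min_threshold, threshold]. The helper `reachable_within_band` and the toy network are hypothetical and are not part of city2graph.

```python
import heapq


def reachable_within_band(adjacency, source, min_threshold, threshold):
    """Return {node: cost} for nodes whose shortest-path cost from source
    lies in [min_threshold, threshold].

    Hypothetical helper for illustration only; not part of city2graph.
    """
    costs = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > costs.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, weight in adjacency.get(node, []):
            new_cost = cost + weight
            # Paths beyond the upper threshold can never qualify
            if new_cost <= threshold and new_cost < costs.get(neighbor, float("inf")):
                costs[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return {
        n: c for n, c in costs.items()
        if n != source and min_threshold <= c <= threshold
    }


# Undirected toy network: buildings b1..b3 linked through road nodes r1, r2
adjacency = {
    "b1": [("r1", 2.0)],
    "r1": [("b1", 2.0), ("b2", 1.0), ("r2", 4.0)],
    "b2": [("r1", 1.0)],
    "r2": [("r1", 4.0), ("b3", 1.0)],
    "b3": [("r2", 1.0)],
}
in_band = reachable_within_band(adjacency, "b1", min_threshold=0.0, threshold=5.0)
# Keep only endpoints of the chosen type (here, names starting with "b")
building_pairs = {("b1", n): c for n, c in in_band.items() if n.startswith("b")}
```

Filtering the reachable set by endpoint type afterwards mirrors the role of `endpoint_type`: intermediate road nodes are traversed but do not receive new edges.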

Parameters:

Name	Type	Description	Default
`graph`	`tuple or Graph or MultiGraph`	Input graph. Can be a tuple of (nodes_dict, edges_dict) or a NetworkX graph.	`None`
`nodes`	`dict[str, GeoDataFrame]`	Dictionary of node GeoDataFrames.	`None`
`edges`	`dict[tuple[str, str, str], GeoDataFrame]`	Dictionary of edge GeoDataFrames.	`None`
`weight`	`str`	The edge attribute to use as weight (e.g., 'travel_time').	`None`
`threshold`	`float`	The maximum cost threshold for connection.	`None`
`new_relation_name`	`str`	Name of the new edge relation.	`None`
`min_threshold`	`float`	The minimum cost threshold for connection.	`0.0`
`edge_types`	`list[tuple[str, str, str]]`	List of edge types to consider for traversal. If None, all edges are used.	`None`
`endpoint_type`	`str`	The node type to connect (e.g., 'building').	`None`
`directed`	`bool`	If True, creates a directed graph for traversal.	`False`
`multigraph`	`bool`	If True, returns a MultiGraph (only relevant if as_nx=True).	`False`
`as_nx`	`bool`	If True, returns a NetworkX graph.	`False`

Returns:

Type	Description
`Graph or MultiGraph or tuple`	The graph with added metapaths. Format depends on `as_nx` parameter.