API reference

This page contains the generated API reference for the library.

XGeoDatasetAccessor

class xgeo.raster.XGeoDatasetAccessor(xarray_obj)

XGeoDatasetAccessor adds geospatial functionality to the xarray Dataset. The accessor combines the versatility of xarray with the geospatial operations provided by rasterio. It also adds many custom operations for common day-to-day geospatial tasks.

add_mask(vector_file, geometry_name='geometry', value_name=None, mask_name='mask')

Rasterizes vector_file and adds the resulting mask to the Dataset as a coordinate named mask_name.

Parameters
  • vector_file (str or geopandas.GeoDataFrame) – Vector file to be rasterized and added as the mask

  • geometry_name (str) – Name of the geometry column in the vector file. Default is “geometry”

  • value_name (str) – Name of the value column whose values are used to fill the raster. If None, every pixel covered by a geometry is filled with 1

  • mask_name (str) – Name of the mask coordinate
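Rasterizing a vector layer into a mask means deciding, for every pixel, whether a geometry covers it. The plain-Python sketch below shows the idea for a single polygon. It applies a ray-casting point-in-polygon test to each pixel center. It is not xgeo's implementation, and the helpers `point_in_polygon` and `rasterize_polygon` are hypothetical. Covered pixels get 1, which matches the documented behaviour of `value_name=None`.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, ring: Sequence[Point]) -> bool:
    """Ray casting: count how often a ray going right from (x, y) crosses the ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize_polygon(ring: Sequence[Point], x_coords: Sequence[float],
                      y_coords: Sequence[float], value: int = 1,
                      fill: int = 0) -> List[List[int]]:
    """Burn `value` into every pixel whose center lies inside `ring`."""
    return [
        [value if point_in_polygon(x, y, ring) else fill for x in x_coords]
        for y in y_coords
    ]
```

For example, a 4 x 4 north-up grid with pixel centers at 0.5 to 3.5 and the square (1, 1)–(3, 3) yields a 2 x 2 block of ones in the middle. A centers-only test is one possible choice. Other rasterizers can also count pixels that the geometry merely touches.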

band_coords

Gets the band coordinates of the Dataset.

Returns

bandcoords – Band coordinates of the Dataset

Return type

xarray.DataArray

band_dim

Gets the name of the band dimension.

Returns

band_dim – Name of the band dimension

Return type

str

band_size

Gets the size of the band dimension.

Returns

bands – Size of band dimension

Return type

int

bounds

Gets the bounds of the data.

Returns

bounds – Bounds of the data (left, bottom, right, top)

Return type

tuple
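Bounds follow from the geo-transform and the raster size. The sketch below assumes an unrotated grid and the transform order documented under transform, (x resolution, 0, x origin, 0, y resolution, y origin). It also assumes the origin is the outer corner of the first pixel. The function name `bounds_from_transform` is hypothetical.

```python
def bounds_from_transform(transform, width, height):
    """Return (left, bottom, right, top) for an unrotated raster.

    transform: (x resolution, 0, x origin, 0, y resolution, y origin),
    with the origin taken as the outer corner of the first pixel.
    """
    x_res, _, x_origin, _, y_res, y_origin = transform
    x_edges = (x_origin, x_origin + width * x_res)
    y_edges = (y_origin, y_origin + height * y_res)
    # min/max makes the result independent of the sign of each resolution
    return (min(x_edges), min(y_edges), max(x_edges), max(y_edges))
```

For a north-up raster of 100 x 50 pixels at 10 m with origin (500000, 6700000), this gives (500000.0, 6699500.0, 501000.0, 6700000.0).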

origin

Gets the origin of the Dataset in a human-readable format.

Returns

origin – Origin of the Dataset.

Return type

str

projection

Gets the projection (CRS) of the Dataset.

Returns

projection – Projection/CRS as a PROJ.4 string

Return type

str

reproject(data_var=None, target_crs=None, resolution=None, resampling=None, target_height=None, target_width=None, source_nodata=None, target_nodata=None, memory_limit=0, threads=None)

Reprojects and resamples the Dataset.

Parameters
  • data_var (str) – Name of the raster DataArray to reproject. Defaults to all raster DataArrays

  • target_crs (int or str or dict) – Target projection/CRS to reproject the Dataset to

  • resolution (int or float (Optional)) – Target resolution

  • resampling (rasterio.warp.Resampling or string) – Resampling method to be used. Default is ‘nearest’

  • target_height (int (Optional)) – Target height

  • target_width (int (Optional)) – Target width

  • source_nodata (int or float (Optional)) – Source NoData value

  • target_nodata (int or float (Optional)) – Target NoData value

  • memory_limit (int (Optional)) – Maximum memory the process should use. Defaults to 0, which means 64 MB

  • threads (int (Optional)) – Number of threads the process should use. Defaults to the number of CPUs.

Returns

dsout – Dataset with the reprojected rasters.

Return type

xarray.Dataset

Examples

>>> import xgeo  # In order to use the xgeo accessor
>>> import xarray as xr
>>> ds = xr.open_rasterio('test.tif')
>>> ds = ds.to_dataset(name='data')
>>> ds_reprojected = ds.geo.reproject(target_crs=4326)

resolutions

Gets the resolutions of the DataArrays in the Dataset. If the resolutions are not available, they are calculated from the current coordinates.

Returns

resolutions – x and y resolutions of the DataArrays.

Return type

(float, float)
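On a regular grid, resolutions can be derived from the coordinates by dividing the coordinate span by the number of pixel steps. The sketch below assumes evenly spaced 1-D pixel-center coordinates and keeps the sign. This is a choice made here: a north-up raster gets a negative y resolution, and xgeo may report it differently. `resolutions_from_coords` is a hypothetical name.

```python
def resolutions_from_coords(x_coords, y_coords):
    """Estimate (x resolution, y resolution) from 1-D pixel-center coordinates.

    Assumes a regular grid. The sign is kept, so y coordinates that decrease
    from top to bottom give a negative y resolution.
    """
    if len(x_coords) < 2 or len(y_coords) < 2:
        raise ValueError("need at least two coordinates along each axis")
    x_res = (x_coords[-1] - x_coords[0]) / (len(x_coords) - 1)
    y_res = (y_coords[-1] - y_coords[0]) / (len(y_coords) - 1)
    return (x_res, y_res)
```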

sample(vector_file, geometry_name='geometry', value_name='id')

Samples the pixels within the given regions. Each sampled pixel carries the data values for every timestamp and every band.

Parameters
  • vector_file (str) – Path to the vector file used for sampling. Any format supported by geopandas can be used.

  • geometry_name (str) – Name of the geometry column in the vector file. Default is ‘geometry’

  • value_name (str) – Name of the value column for the regions. Each sampled pixel is labelled with the value of its region.

Returns

samples – Pixels contained in or touched by each region, as a pandas.DataFrame.

Return type

pandas.DataFrame
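Sampling produces one record per covered pixel: the region value plus one column per timestamp/band pair. The sketch below works on grids that are already rasterized. `zones` holds region values, with 0 meaning "no region". Each entry of `layers` holds one timestamp/band slice. The helper name and the column labels such as "t0_b1" are hypothetical, and xgeo's actual column naming may differ.

```python
def sample_pixels(zones, layers, value_name="id", outside=0):
    """Return one record (dict) per pixel covered by a region.

    zones:  2-D list of rasterized region values; `outside` marks pixels
            that no region covers
    layers: dict mapping a column label (one per timestamp/band pair)
            to a 2-D list with the same shape as `zones`
    """
    records = []
    for r, zone_row in enumerate(zones):
        for c, zone in enumerate(zone_row):
            if zone == outside:
                continue  # pixel lies in no region
            record = {value_name: zone, "row": r, "col": c}
            for label, grid in layers.items():
                record[label] = grid[r][c]
            records.append(record)
    return records
```

A list of dicts like this can be passed directly to `pandas.DataFrame(records)` to obtain a table.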

Examples

>>> import xgeo  # In order to use the xgeo accessor
>>> import xarray as xr
>>> ds = xr.open_rasterio('test.tif')
>>> ds = ds.to_dataset(name='data')
>>> df_sample = ds.geo.sample(vector_file='test.shp', value_name="class")

slice_dataset(indices=None, bounds=None)

Subsets the Dataset using either indices or bounds.

Parameters
  • indices (tuple or list) – Indices (row_x_min, col_y_min, row_x_max, col_y_max)

  • bounds (tuple or list) – Bounds (x_min, y_min, x_max, y_max)

Returns

ds – Dataset with data in given bounds or indices

Return type

xarray.Dataset
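Slicing by bounds comes down to converting map coordinates into pixel indices through the geo-transform. The sketch below assumes a north-up raster and the transform order documented under transform. It returns a half-open window (row_start, col_start, row_stop, col_stop), which is this sketch's own convention. It does not clip the window to the raster size. `bounds_to_window` is hypothetical; rasterio provides `rasterio.windows.from_bounds` for the same conversion.

```python
import math


def bounds_to_window(bounds, transform):
    """Convert (x_min, y_min, x_max, y_max) into a half-open pixel window.

    transform: (x resolution, 0, x origin, 0, y resolution, y origin) for a
    north-up raster (negative y resolution).
    Returns (row_start, col_start, row_stop, col_stop).
    """
    x_min, y_min, x_max, y_max = bounds
    x_res, _, x_origin, _, y_res, y_origin = transform
    col_start = math.floor((x_min - x_origin) / x_res)
    col_stop = math.ceil((x_max - x_origin) / x_res)
    # The y resolution is negative, so the northern edge (y_max) gives the first row
    row_start = math.floor((y_max - y_origin) / y_res)
    row_stop = math.ceil((y_min - y_origin) / y_res)
    return (row_start, col_start, row_stop, col_stop)
```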

stats()

Calculates general statistics (mean, standard deviation, maximum and minimum) for each band.

Returns

statistics – DataFrame with statistics

Return type

pandas.DataFrame
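Per-band statistics are the listed aggregates computed over all pixels of each band. The stdlib sketch below takes each band as a flat list of values. It ignores NoData handling and uses the population standard deviation (ddof=0). Whether xgeo uses the population or the sample form is not stated here. `band_stats` is a hypothetical name.

```python
import statistics


def band_stats(bands):
    """Mean, population standard deviation, max and min for each band.

    bands: dict mapping a band name to a flat list of pixel values
    """
    return {
        name: {
            "mean": statistics.mean(values),
            "std": statistics.pstdev(values),
            "max": max(values),
            "min": min(values),
        }
        for name, values in bands.items()
    }
```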

subset(vector_file, geometry_name='geometry', crop=False, extent_only=False, invert=False)

Subsets the Dataset with the vector file.

Parameters
  • vector_file (str or geopandas.GeoDataFrame) – Path to the vector file. Any vector format supported by GDAL is supported.

  • geometry_name (str) – Name of the column that holds the geometries in the vector file. Default value is “geometry”

  • crop (bool) – If True, the bounds of the output Dataset are approximately equal to the total bounds of the geometries. Default value is False

  • extent_only (bool) – If True, the output Dataset contains all the data within the total bounds of the geometries. Default value is False. If extent_only is True, crop is implicitly True.

  • invert (bool) – If True, the output Dataset contains only the values outside the geometries. Default value is False. This has no effect if extent_only is True

Returns

ds_subset – Subset dataset

Return type

xarray.Dataset
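Two ideas behind these options can be shown in a few lines. extent_only and crop depend on the total bounds of all geometries. invert flips which side of the geometry mask is kept. The sketch below uses polygon rings and an already rasterized boolean mask. Both helper names are hypothetical; for real data, geopandas exposes `GeoDataFrame.total_bounds`.

```python
def total_bounds(rings):
    """(x_min, y_min, x_max, y_max) enclosing all polygon rings."""
    xs = [x for ring in rings for x, _ in ring]
    ys = [y for ring in rings for _, y in ring]
    return (min(xs), min(ys), max(xs), max(ys))


def apply_mask(values, inside, invert=False, nodata=None):
    """Keep pixels inside the geometries, or outside them if invert=True.

    values: 2-D list of pixel values
    inside: 2-D list of booleans, True where a geometry covers the pixel
    """
    return [
        # flag != invert keeps inside pixels normally, outside pixels when inverted
        [v if flag != invert else nodata for v, flag in zip(vrow, mrow)]
        for vrow, mrow in zip(values, inside)
    ]
```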

time_coords

Gets the time coordinates of the Dataset

Returns

timecoords – Time coordinates of the Dataset

Return type

xarray.DataArray

time_dim

Gets the name of the time dimension.

Returns

time_dim – Name of the time dimension

Return type

str

time_size

Gets the size of the time dimension.

Returns

times – Size of time dimension

Return type

int

to_geotiff(output_path='.', file_prefix='data', overviews=False, bigtiff=True, compress='lzw', tiled=True, num_threads='ALL_CPUS')

Creates one or more GeoTIFFs for the Dataset. If the Dataset has multiple raster arrays, or raster arrays for multiple timestamps, a separate GeoTIFF is created for each at the following path:

output_path/<file_prefix>_<variable_name>_<timestamp>.tif

Parameters
  • output_path (str) – Output directory for the files.

  • file_prefix (str) – Prefix for the filename

  • overviews (bool) – Creates image overviews if True.

  • bigtiff (bool) – Creates BigTiff if True

  • compress (str) – Compression algorithm to apply, default ‘lzw’

  • tiled (bool) – The tif is tiled if True

  • num_threads (int or str) – The number of threads the process should use. Default is ‘ALL_CPUS’
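The naming scheme above produces one file per data variable and timestamp. The sketch below builds those paths. It assumes the timestamps are already formatted as strings, for example "20200101". How xgeo actually formats timestamps is not stated on this page. `geotiff_paths` is a hypothetical name.

```python
import os


def geotiff_paths(output_path, file_prefix, variables, timestamps):
    """One path per (variable, timestamp) pair, following
    output_path/<file_prefix>_<variable_name>_<timestamp>.tif
    """
    return [
        os.path.join(output_path, f"{file_prefix}_{var}_{ts}.tif")
        for var in variables
        for ts in timestamps
    ]
```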

transform

Gets the geo-transform of the Dataset. If the transform isn’t present, it is calculated from the current coordinates of the Dataset.

Returns

transform – Geo-transform (x resolution, 0, x origin, 0, y resolution, y origin)

Return type

tuple
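Calculating the transform from coordinates takes two steps. First, derive the resolutions from the coordinate spacing. Then, shift the first coordinate by half a pixel to reach the grid's outer corner. The sketch below assumes a regular, unrotated grid whose 1-D coordinates are pixel centers, as is common for rasters opened with xarray. It returns the tuple in the order documented above. `transform_from_coords` is a hypothetical name.

```python
def transform_from_coords(x_coords, y_coords):
    """Geo-transform (x res, 0, x origin, 0, y res, y origin) from pixel centers.

    Assumes a regular, unrotated grid. The origin is the outer corner of the
    first pixel, i.e. half a pixel away from that pixel's center.
    """
    if len(x_coords) < 2 or len(y_coords) < 2:
        raise ValueError("need at least two coordinates along each axis")
    x_res = (x_coords[-1] - x_coords[0]) / (len(x_coords) - 1)
    y_res = (y_coords[-1] - y_coords[0]) / (len(y_coords) - 1)
    x_origin = x_coords[0] - x_res / 2
    y_origin = y_coords[0] - y_res / 2
    return (x_res, 0.0, x_origin, 0.0, y_res, y_origin)
```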

validate_and_restructure()

Validates and restructures the Dataset to make full use of GeoDataset.
  • Validates that the x and y dimensions exist

  • Validates that the band and time dimensions exist. If they don’t, adds those dimensions to the raster DataArrays

Returns

dsout – A copy of the original Dataset, restructured so that every raster DataArray has four dimensions. This keeps the library’s operations consistent.

Return type

xarray.Dataset
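Restructuring to four dimensions leaves the data unchanged. Missing band or time dimensions are added with length 1. The sketch below shows only the bookkeeping on dimension names and sizes. The target order ("time", "band", "y", "x") and the name `restructure_shape` are assumptions of this sketch, not confirmed xgeo behaviour. On an actual xarray object, the equivalent steps are `DataArray.expand_dims` followed by `DataArray.transpose`.

```python
def restructure_shape(dims, shape, order=("time", "band", "y", "x")):
    """Map a raster's dims/shape onto a 4-D layout, adding size-1 dims.

    dims:  dimension names of the raster, e.g. ("y", "x")
    shape: matching sizes, e.g. (512, 512)
    order: target dimension order (an assumption of this sketch)
    """
    sizes = dict(zip(dims, shape))
    for required in ("x", "y"):
        if required not in sizes:
            raise ValueError(f"raster has no '{required}' dimension")
    extra = set(sizes) - set(order)
    if extra:
        raise ValueError(f"unexpected dimension(s): {sorted(extra)}")
    # Missing band/time dimensions get length 1, so no data values change
    return tuple(order), tuple(sizes.get(d, 1) for d in order)
```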

x_coords

Gets the X coordinates of the Dataset.

Returns

xcoords – X coordinates of the Dataset

Return type

xarray.DataArray

x_dim

Gets the name of the X dimension.

Returns

x_dim – Name of the X dimension

Return type

str

x_size

Gets the size of the X dimension.

Returns

xsize – Size of the X dimension

Return type

int

y_coords

Gets the Y coordinates of the Dataset.

Returns

ycoords – Y Coordinates of the Dataset

Return type

xarray.DataArray

y_dim

Gets the name of the Y dimension.

Returns

y_dim – Name of the Y dimension

Return type

str

y_size

Gets the size of the Y dimension.

Returns

ysize – Size of the Y dimension

Return type

int

zonal_stats(vector_file, geometry_name='geometry', value_name='id')

Calculates statistics for regions in the vector file.

Parameters
  • vector_file (str or geopandas.GeoDataFrame) – Vector file with the regions/zones for which statistics need to be calculated

  • geometry_name (str) – Name of the geometry column in the vector file. Default is “geometry”

  • value_name (str) – Name of the column that identifies the zones. Statistics are calculated for each value in this column. Default is “id”

Returns

zonal_statistics – DataFrame with statistics for each zone

Return type

pandas.DataFrame
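Zonal statistics group pixel values by the zone that covers them and then aggregate each group. The stdlib sketch below starts from an already rasterized zone grid, with 0 meaning "no zone", and a value grid of the same shape. The helper name `zonal_summary` and the chosen statistics are illustrative. The exact statistics xgeo reports are not listed on this page.

```python
import statistics
from collections import defaultdict


def zonal_summary(zones, values, outside=0):
    """Per-zone pixel count, mean, min and max.

    zones:  2-D list of rasterized zone ids (`outside` = in no zone)
    values: 2-D list of pixel values with the same shape
    """
    groups = defaultdict(list)
    for zone_row, value_row in zip(zones, values):
        for zone, value in zip(zone_row, value_row):
            if zone != outside:
                groups[zone].append(value)
    return {
        zone: {
            "count": len(vals),
            "mean": statistics.mean(vals),
            "min": min(vals),
            "max": max(vals),
        }
        for zone, vals in groups.items()
    }
```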

XCRS

class xgeo.crs.XCRS(initialdata=None, **kwargs)

classmethod from_any(proj: dict)

Creates a CRS from any supported representation.

Parameters

proj (dict or str or int) – Projection in PROJ, EPSG, WKT or CF.

Returns

CRS

Return type

XCRS
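Accepting "any" representation implies first detecting which one was passed. The sketch below shows one heuristic for the four documented forms (PROJ, EPSG, WKT, CF). It treats an int as an EPSG code. A dict with a grid_mapping_name key is treated as CF and any other dict as PROJ parameters. Strings are classified by their prefix. `detect_crs_format` is hypothetical, and xgeo's own dispatch may work differently.

```python
def detect_crs_format(proj):
    """Guess which representation `proj` uses: 'EPSG', 'PROJ', 'WKT' or 'CF'."""
    if isinstance(proj, int):
        return "EPSG"
    if isinstance(proj, dict):
        # CF grid_mapping dictionaries carry a grid_mapping_name attribute
        return "CF" if "grid_mapping_name" in proj else "PROJ"
    if isinstance(proj, str):
        text = proj.strip()
        if text.upper().startswith("EPSG:"):
            return "EPSG"
        if text.startswith("+"):
            return "PROJ"
        if text.upper().startswith(("GEOGCS", "PROJCS", "GEOGCRS", "PROJCRS")):
            return "WKT"
    raise ValueError(f"unrecognised CRS definition: {proj!r}")
```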

classmethod from_cf_dict(cf_dict: dict)

Creates a CRS from a Climate and Forecast (CF) Conventions grid_mapping.

Parameters

cf_dict (dict) – CF grid_mapping in dictionary format

Returns

CRS

Return type

XCRS
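A CF grid_mapping is a set of named attributes that map onto PROJ parameters. The sketch below covers only two grid mappings, latitude_longitude and transverse_mercator, plus the ellipsoid attributes, and produces a PROJ parameter dictionary. It illustrates the translation and is not xgeo's converter. `cf_to_proj_dict` is a hypothetical name. For broad coverage, pyproj provides `pyproj.CRS.from_cf`.

```python
# CF grid_mapping_name -> PROJ projection name (only two mappings shown)
CF_TO_PROJ_NAME = {
    "latitude_longitude": "longlat",
    "transverse_mercator": "tmerc",
}

# CF grid_mapping attribute -> PROJ parameter
CF_TO_PROJ_PARAM = {
    "longitude_of_central_meridian": "lon_0",
    "latitude_of_projection_origin": "lat_0",
    "scale_factor_at_central_meridian": "k_0",
    "false_easting": "x_0",
    "false_northing": "y_0",
    "semi_major_axis": "a",
    "inverse_flattening": "rf",
}


def cf_to_proj_dict(cf_dict):
    """Translate a CF grid_mapping dictionary into PROJ parameters."""
    name = cf_dict["grid_mapping_name"]
    if name not in CF_TO_PROJ_NAME:
        raise ValueError(f"unsupported grid_mapping_name: {name}")
    proj = {"proj": CF_TO_PROJ_NAME[name]}
    for cf_key, proj_key in CF_TO_PROJ_PARAM.items():
        if cf_key in cf_dict:
            proj[proj_key] = cf_dict[cf_key]
    return proj
```

For a UTM zone 33N style grid_mapping, the result is a tmerc definition with lon_0=15, k_0=0.9996 and x_0=500000.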