API reference
This page lists the generated API reference for the library.
XGeoDatasetAccessor
-
class xgeo.raster.XGeoDatasetAccessor(xarray_obj)
XGeoDatasetAccessor adds geospatial functionality to the xarray Dataset. The accessor combines the versatility of xarray with the geospatial operations provided by rasterio, together with many custom operations that are common in day-to-day geospatial tasks.
-
add_mask(vector_file, geometry_name='geometry', value_name=None, mask_name='mask')
Rasterizes vector_file and adds the mask to the Dataset as a coordinate named mask_name.
- Parameters
vector_file (str or geopandas.GeoDataFrame) – Vector file to be rasterized and added as a mask
geometry_name (str) – Name of the geometry column in the vector file. Defaults to “geometry”
value_name (str) – Name of the value column whose values are used to fill the raster. If None, all pixels covered by the geometries are filled with 1
mask_name (str) – Name of the mask index
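Rasterizing a vector file comes down to deciding, for every pixel, which geometry covers it. The sketch below is not xgeo's implementation: it assumes a single polygon given as a list of (x, y) vertices, a north-up grid described by hypothetical `x_origin`, `y_origin` (top-left corner) and square resolution `res` parameters, and a pixel-centre rule, and it applies an even-odd ray-casting test with NumPy. In practice rasterio provides this operation as `rasterio.features.rasterize`.

```python
import numpy as np


def rasterize_polygon(polygon, x_origin, y_origin, res, width, height, value=1):
    """Burn `value` into every pixel whose centre lies inside `polygon`.

    polygon: list of (x, y) vertices; the ring is closed automatically.
    The grid is north-up: x grows to the right of x_origin and y
    decreases downwards from y_origin (the top-left corner).
    """
    # Coordinates of the pixel centres, shape (height, width).
    xs = x_origin + (np.arange(width) + 0.5) * res
    ys = y_origin - (np.arange(height) + 0.5) * res
    px, py = np.meshgrid(xs, ys)

    inside = np.zeros((height, width), dtype=bool)
    n = len(polygon)
    with np.errstate(divide="ignore", invalid="ignore"):
        for i in range(n):
            x1, y1 = polygon[i]
            x2, y2 = polygon[(i + 1) % n]
            # Does this edge straddle the horizontal line through each centre?
            crosses = (y1 > py) != (y2 > py)
            # x position where the edge meets that line (only used where crosses is True).
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            # Even-odd rule: toggle for each crossing to the right of the centre.
            inside ^= crosses & (px < x_cross)

    mask = np.zeros((height, width), dtype=np.int32)
    mask[inside] = value
    return mask
```

Rasterizing every geometry of a vector file would repeat this per geometry, burning each geometry's value from value_name (or 1).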
-
band_coords
Gets the band coordinates of the Dataset.
- Returns
bandcoords – Band coordinates of the Dataset
- Return type
xarray.DataArray
-
band_dim
Gets the name of the band dimension.
- Returns
band_dim – Name of the band dimension
- Return type
str
-
band_size
Gets the size of the band dimension.
- Returns
bands – Size of the band dimension
- Return type
int
-
bounds
Gets the bounds of the data.
- Returns
bounds – Bounds of the data (left, bottom, right, top)
- Return type
tuple
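A sketch of how bounds can be derived when the coordinates mark pixel centres on a regular grid (an assumption; `bounds_from_coords` is a hypothetical helper, not part of xgeo). The outer edges lie half a pixel beyond the first and last centres.

```python
import numpy as np


def bounds_from_coords(x_coords, y_coords):
    """Return (left, bottom, right, top) for regularly spaced pixel-centre coordinates.

    Needs at least two coordinates along each axis to infer the pixel size.
    """
    x = np.asarray(x_coords, dtype=float)
    y = np.asarray(y_coords, dtype=float)
    # Coordinates mark pixel centres, so the edges lie half a pixel further out.
    half_x = abs(x[1] - x[0]) / 2
    half_y = abs(y[1] - y[0]) / 2
    left, right = x.min() - half_x, x.max() + half_x
    bottom, top = y.min() - half_y, y.max() + half_y
    return float(left), float(bottom), float(right), float(top)
```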
-
origin
Gets the origin of the Dataset in human-readable format.
- Returns
origin – Origin of the Dataset.
- Return type
str
-
projection
Gets the projection/CRS of the Dataset.
- Returns
projection – Projection/CRS as a proj4 string
- Return type
str
-
reproject(data_var=None, target_crs=None, resolution=None, resampling=None, target_height=None, target_width=None, source_nodata=None, target_nodata=None, memory_limit=0, threads=None)
Reprojects and resamples the Dataset.
- Parameters
data_var (str) – Name of the raster DataArray to be reprojected. Defaults to all raster DataArrays
target_crs (int or string or dict) – Target projection/CRS to which the Dataset should be reprojected
resolution (int or float (Optional)) – Target resolution
resampling (rasterio.warp.Resampling or string) – Resampling method to be used. Default is ‘nearest’
target_height (int (Optional)) – Target height
target_width (int (Optional)) – Target width
source_nodata (int or float (Optional)) – Source NoData value
target_nodata (int or float (Optional)) – Target NoData value
memory_limit (int (Optional)) – Maximum memory the process should use. The default value, 0, means 64MB
threads (int (Optional)) – Number of threads the process should use. Defaults to the number of CPUs.
- Returns
dsout – Dataset with the reprojected rasters.
- Return type
xarray.Dataset
Examples
>>> import xgeo  # In order to use the xgeo accessor
>>> import xarray as xr
>>> ds = xr.open_rasterio('test.tif')
>>> ds = ds.to_dataset(name='data')
>>> ds_reprojected = ds.geo.reproject(target_crs=4326)
-
resolutions
Gets the resolutions of the DataArrays in the Dataset. If the resolutions are not available, they are calculated from the current coordinates.
- Returns
resolutions – x and y resolutions of the DataArrays.
- Return type
(float, float)
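Calculating resolutions from coordinates can be sketched as taking the spacing between neighbouring coordinates. The helper below is hypothetical and assumes regularly spaced 1-D coordinate arrays; for a north-up raster the y resolution comes out negative.

```python
import numpy as np


def resolutions_from_coords(x_coords, y_coords):
    """Return (x_res, y_res) from regularly spaced 1-D coordinates."""
    dx = np.diff(np.asarray(x_coords, dtype=float))
    dy = np.diff(np.asarray(y_coords, dtype=float))
    # A raster grid is regular; refuse to guess when the spacing varies.
    if not (np.allclose(dx, dx[0]) and np.allclose(dy, dy[0])):
        raise ValueError("coordinates are not regularly spaced")
    return float(dx[0]), float(dy[0])
```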
-
sample(vector_file, geometry_name='geometry', value_name='id')
Samples the pixels within the given regions. Each sampled pixel contains the data values for every timestamp and every band.
- Parameters
vector_file (str) – Path to the vector file used for sampling. Any vector format supported by geopandas can be used.
geometry_name (str) – Name of the geometry column in the vector file. Defaults to ‘geometry’
value_name (str) – Name of the value column identifying each region. This value is attached to every pixel sampled from that region.
- Returns
samples – Samples of the pixels contained in or touched by each region, as a pandas.DataFrame.
- Return type
pandas.DataFrame
Examples
>>> import xgeo  # In order to use the xgeo accessor
>>> import xarray as xr
>>> ds = xr.open_rasterio('test.tif')
>>> ds = ds.to_dataset(name='data')
>>> df_sample = ds.geo.sample(vector_file='test.shp', value_name="class")
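Conceptually, sampling keeps every pixel covered by a region and attaches that region's value to it. The hypothetical helper below skips the vector-file step: it assumes the regions are already rasterized into an integer array (0 meaning no region) and that the data are ordered (time, band, y, x). It returns plain dictionaries rather than a pandas.DataFrame to keep the sketch dependency-light.

```python
import numpy as np


def sample_pixels(data, region_ids, value_name="id"):
    """Return one record per pixel covered by a region.

    data:       array shaped (time, band, y, x)
    region_ids: integer array shaped (y, x); 0 marks pixels outside every region
    """
    records = []
    rows, cols = np.nonzero(region_ids)
    for r, c in zip(rows, cols):
        records.append({
            value_name: int(region_ids[r, c]),  # value of the region the pixel belongs to
            "row": int(r),
            "col": int(c),
            # Every timestamp and band for this pixel, nested as [time][band].
            "values": data[:, :, r, c].tolist(),
        })
    return records
```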
-
slice_dataset(indices=None, bounds=None)
Subsets the Dataset using either indices or bounds.
- Parameters
indices (tuple or list) – Indices (row_x_min, col_y_min, row_x_max, col_y_max)
bounds (tuple or list) – Bounds (x_min, y_min, x_max, y_max)
- Returns
ds – Dataset with the data within the given bounds or indices
- Return type
xarray.Dataset
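When bounds are given, they have to be converted into pixel indices before slicing. A minimal sketch, assuming a north-up raster whose transform follows the (x resolution, 0, x origin, 0, y resolution, y origin) layout described under transform. `bounds_to_slices` is hypothetical and returns row and column slices rather than the indices tuple, whose exact ordering is only implied by its names above. Any pixel partly inside the bounds is kept, and the result is not clipped to the raster size.

```python
import math


def bounds_to_slices(bounds, transform):
    """Convert (x_min, y_min, x_max, y_max) into (row_slice, col_slice).

    transform is (x_res, 0, x_origin, 0, y_res, y_origin) for a north-up
    raster (x_res > 0, y_res < 0).
    """
    x_res, _, x_origin, _, y_res, y_origin = transform
    x_min, y_min, x_max, y_max = bounds
    col_start = math.floor((x_min - x_origin) / x_res)
    col_stop = math.ceil((x_max - x_origin) / x_res)
    # Rows count downwards, so the top edge (y_max) gives the first row.
    row_start = math.floor((y_max - y_origin) / y_res)
    row_stop = math.ceil((y_min - y_origin) / y_res)
    return slice(row_start, row_stop), slice(col_start, col_stop)
```

The two slices can then index the y and x axes of a NumPy array or be passed to xarray's `isel`.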
-
stats()
Calculates general statistics (mean, standard deviation, maximum and minimum) for each band.
- Returns
statistics – DataFrame with statistics
- Return type
pandas.DataFrame
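The per-band statistics can be sketched with NumPy by flattening every axis except the band axis. This sketch assumes NaN marks NoData and uses the population standard deviation (ddof=0); whether xgeo makes the same choices is not stated above, and `band_stats` is a hypothetical name.

```python
import numpy as np


def band_stats(data, band_axis=0):
    """Mean, standard deviation, maximum and minimum for each band.

    NaN is treated as NoData and ignored; the standard deviation uses ddof=0.
    """
    arr = np.moveaxis(np.asarray(data, dtype=float), band_axis, 0)
    flat = arr.reshape(arr.shape[0], -1)  # one row per band
    return {
        "mean": np.nanmean(flat, axis=1),
        "std": np.nanstd(flat, axis=1),
        "max": np.nanmax(flat, axis=1),
        "min": np.nanmin(flat, axis=1),
    }
```

For a four-dimensional (time, band, y, x) array, pass `band_axis=1` so the statistics pool every timestamp of a band.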
-
subset(vector_file, geometry_name='geometry', crop=False, extent_only=False, invert=False)
Subsets the Dataset with the vector file.
- Parameters
vector_file (str or geopandas.GeoDataFrame) – Path to the vector file, or a GeoDataFrame. Any vector format supported by GDAL is supported.
geometry_name (str) – Name of the column that holds the geometries in the vector file. Default value is “geometry”
crop (bool) – If True, the bounds of the output Dataset are approximately equal to the total bounds of the geometries. Default value is False
extent_only (bool) – If True, the output Dataset contains all the data within the total bounds of the geometries. Default value is False. If extent_only is True, crop is also True.
invert (bool) – If True, the output Dataset contains only the values outside the geometries. Default value is False. This has no effect if extent_only is True
- Returns
ds_subset – Subset dataset
- Return type
xarray.Dataset
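The interplay of crop, extent_only and invert can be sketched on a boolean mask that is True where a geometry covers a pixel (producing that mask is the rasterization step). This hypothetical `subset_with_mask` helper follows the parameter descriptions above and fills removed pixels with NaN; it assumes the mask has at least one True pixel when cropping.

```python
import numpy as np


def subset_with_mask(data, inside, crop=False, extent_only=False, invert=False):
    """Subset `data` (shape (..., y, x)) with a boolean geometry mask `inside` (shape (y, x)).

    Removed pixels are set to NaN.
    """
    if extent_only:
        keep = np.ones_like(inside, dtype=bool)  # keep everything in the extent; invert is ignored
        crop = True  # extent_only implies crop
    else:
        keep = ~inside if invert else inside
    out = np.where(keep, np.asarray(data, dtype=float), np.nan)
    if crop:
        # Crop to the bounding box of the pixels covered by the geometries.
        rows = np.flatnonzero(inside.any(axis=1))
        cols = np.flatnonzero(inside.any(axis=0))
        out = out[..., rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    return out
```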
-
time_coords
Gets the time coordinates of the Dataset.
- Returns
timecoords – Time coordinates of the Dataset
- Return type
xarray.DataArray
-
time_dim
Gets the name of the time dimension.
- Returns
time_dim – Name of the time dimension
- Return type
str
-
time_size
Gets the size of the time dimension.
- Returns
times – Size of the time dimension
- Return type
int
-
to_geotiff(output_path='.', file_prefix='data', overviews=False, bigtiff=True, compress='lzw', tiled=True, num_threads='ALL_CPUS')
Creates one or more GeoTIFFs from the Dataset. If the Dataset has multiple raster arrays, or raster arrays for multiple timestamps, a separate GeoTIFF is created for each at the following path:
output_path/<file_prefix>_<variable_name>_<timestamp>.tif
- Parameters
output_path (str) – Output directory for the files.
file_prefix (str) – Prefix for the filename
overviews (bool) – Creates image overviews if True.
bigtiff (bool) – Creates a BigTIFF if True
compress (str) – Compression algorithm to apply, default ‘lzw’
tiled (bool) – Creates a tiled GeoTIFF if True
num_threads (int or str) – The number of threads the process should use. Default is ‘ALL_CPUS’
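The naming scheme above can be sketched as follows. The timestamp format (`%Y%m%dT%H%M%S`) is an assumption, since the reference does not state which format is used, and `geotiff_paths` is a hypothetical helper. The compress, tiled, bigtiff and num_threads parameters presumably map to the GDAL GTiff creation options COMPRESS, TILED, BIGTIFF and NUM_THREADS.

```python
import os
from datetime import datetime


def geotiff_paths(output_path, file_prefix, variables, timestamps):
    """Build output_path/<file_prefix>_<variable_name>_<timestamp>.tif for each pair."""
    paths = []
    for variable in variables:
        for ts in timestamps:
            # Assumed timestamp format; the reference does not specify one.
            stamp = ts.strftime("%Y%m%dT%H%M%S")
            paths.append(os.path.join(output_path, f"{file_prefix}_{variable}_{stamp}.tif"))
    return paths


# Two data variables and two timestamps give four files.
example = geotiff_paths(".", "data", ["ndvi", "evi"], [datetime(2020, 1, 2), datetime(2020, 1, 3)])
```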
-
transform
Gets the geo-transform of the Dataset. If the transform is not present, it is calculated from the current coordinates of the Dataset.
- Returns
transform – Geo-transform (x resolution, 0, x origin, 0, y resolution, y origin)
- Return type
tuple
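A sketch of calculating the transform from the coordinates, assuming they mark pixel centres on a regular grid: the origin is the outer corner of the first pixel, half a pixel before the first centre on each axis. `transform_from_coords` is hypothetical. The returned tuple has the same coefficient order as `affine.Affine(a, b, c, d, e, f)`, so it can be unpacked into that constructor.

```python
import numpy as np


def transform_from_coords(x_coords, y_coords):
    """Return (x_res, 0, x_origin, 0, y_res, y_origin) from pixel-centre coordinates."""
    x = np.asarray(x_coords, dtype=float)
    y = np.asarray(y_coords, dtype=float)
    x_res = x[1] - x[0]
    y_res = y[1] - y[0]  # negative for a north-up raster
    # The origin is the outer corner of the first pixel: half a pixel
    # before the first centre along each axis.
    x_origin = x[0] - x_res / 2
    y_origin = y[0] - y_res / 2
    return float(x_res), 0.0, float(x_origin), 0.0, float(y_res), float(y_origin)
```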
-
validate_and_restructure()
Validates and restructures the Dataset to make full use of the GeoDataset functionality:
- Validates that the x and y dimensions exist.
- Validates that the band and time dimensions exist. If they do not exist, adds those dimensions to the raster DataArrays.
- Returns
dsout – A copy of the original Dataset, restructured so that every raster DataArray is four-dimensional. This keeps the operations of the library consistent.
- Return type
xarray.Dataset
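The restructuring step can be sketched with NumPy by inserting length-1 axes for any missing time or band dimension and transposing to a fixed order. The target order (time, band, y, x) and the helper name `to_4d` are assumptions, not taken from xgeo.

```python
import numpy as np

TARGET_DIMS = ("time", "band", "y", "x")  # assumed target order


def to_4d(data, dims):
    """Return `data` as a 4-D array ordered like TARGET_DIMS.

    dims names the existing axes, e.g. ("y", "x") or ("band", "y", "x").
    Missing time/band axes are added with length 1.
    """
    dims = list(dims)
    if "x" not in dims or "y" not in dims:
        raise ValueError("raster must have x and y dimensions")
    arr = np.asarray(data)
    for name in ("time", "band"):
        if name not in dims:
            arr = arr[np.newaxis, ...]  # new length-1 axis at the front
            dims.insert(0, name)
    # Move every axis into the target order.
    return np.transpose(arr, [dims.index(name) for name in TARGET_DIMS])
```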
-
x_coords
Gets the X coordinates of the Dataset.
- Returns
xcoords – X coordinates of the Dataset
- Return type
xarray.DataArray
-
x_dim
Gets the name of the X dimension.
- Returns
x_dim – Name of the X dimension
- Return type
str
-
x_size
Gets the size of the X dimension.
- Returns
xsize – Size of the X dimension
- Return type
int
-
y_coords
Gets the Y coordinates of the Dataset.
- Returns
ycoords – Y coordinates of the Dataset
- Return type
xarray.DataArray
-
y_dim
Gets the name of the Y dimension.
- Returns
y_dim – Name of the Y dimension
- Return type
str
-
y_size
Gets the size of the Y dimension.
- Returns
ysize – Size of the Y dimension
- Return type
int
-
zonal_stats(vector_file, geometry_name='geometry', value_name='id')
Calculates statistics for the regions in the vector file.
- Parameters
vector_file (str or geopandas.GeoDataFrame) – Vector file with the regions/zones for which statistics need to be calculated
geometry_name (str) – Name of the geometry column in vector file. Default is “geometry”
value_name (str) – Name of the value column identifying each region; statistics are calculated for each value. Default is “id”
- Returns
zonal_statistics – DataFrame with statistics
- Return type
pandas.DataFrame
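Once the zones are rasterized (0 marking pixels outside every zone), zonal statistics reduce to grouping pixel values by zone id. The helper below, `zonal_stats_from_raster`, is a hypothetical sketch for a single 2-D band; it does not read vector files, and the choice of statistics mirrors the stats method above rather than anything documented for zonal_stats.

```python
import numpy as np


def zonal_stats_from_raster(data, zones):
    """Mean, std, min and max of a 2-D `data` array for every non-zero zone id in `zones`."""
    result = {}
    for zone_id in np.unique(zones):
        if zone_id == 0:  # 0 marks pixels outside every zone
            continue
        values = data[zones == zone_id]
        result[int(zone_id)] = {
            "mean": float(np.mean(values)),
            "std": float(np.std(values)),
            "min": float(np.min(values)),
            "max": float(np.max(values)),
        }
    return result
```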
-