Geodata¶
Tip
Use QGIS to display geospatial data and to create maps in PDF or image formats (e.g., tif, png, jpg).
Geodata sources¶
Geospatial data can be retrieved for various purposes from different sources. Here are some of them:
Geographical, atlas map-like data are provided by naturalearthdata.com (e.g., with their 227-MB Natural Earth quick start kit).
Satellite imagery is available at
ESA’s Copernicus Open Access Hub (Sentinel-2)
planet.com (commercial)
LiDAR data can be found at opentopography.org.
Climatological data are provided by NASA Earth Observation.
Meteorological (e.g., temperature or precipitation) and real-time satellite data are available at wunderground.com and its wundermap.
Data on land use (including canopy cover), socioeconomic characteristics, and global change are available at the FAO GeoNetwork or the archived ISCGM Global Map portal (go to their GitHub archive).
Visualization¶
GIS software is needed to display geospatial data and many tools exist. This website primarily provides examples using QGIS. Since the use of GIS software, especially QGIS, is necessary in several places on the website, explanations on how to install QGIS are already included on the Get Started > Geospatial software page.
Tip
The BASEMENT pre-processing page features the basics of geospatial data handling with QGIS. Therefore, this introduction to numerical modeling is also a good introduction to QGIS.
Geodatabase¶
A geodatabase (also known as a spatial database) can store, query (e.g., using the Structured Query Language, SQL), or modify data with geographic references (geospatial data). Geospatial data primarily consist of vector data (see shapefiles), but raster data can also be stored. A geodatabase links these data with attribute tables and geographic coordinates. The special aspect of geodatabases is that users can query and manipulate these data via a (web or local) GIS (geographic information system) server. With software like QGIS (or ArcGIS Pro), for example, queries can be made on a kind of local server using locally stored geodata. The typical geodatabase format is .gdb, which actually works like a directory in QGIS or ArcGIS, and the maximum size of a .gdb file is 1 terabyte.
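As a rough illustration of the query idea (not of the .gdb format itself), the following sketch uses Python's built-in sqlite3 module to store point features in an attribute table and to select them with SQL. The table name gauges, its columns, and all sample values are hypothetical and only serve this example.

```python
import sqlite3

# In-memory database standing in for a geodatabase (illustration only)
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE gauges (id INTEGER PRIMARY KEY, name TEXT, lon REAL, lat REAL, discharge REAL)"
)

# Hypothetical sample features: coordinates plus one attribute value
rows = [
    (1, "A", 9.10, 48.74, 12.5),
    (2, "B", 9.25, 48.80, 3.2),
    (3, "C", 8.95, 48.60, 27.9),
]
con.executemany("INSERT INTO gauges VALUES (?, ?, ?, ?, ?)", rows)

# Attribute query combined with a simple bounding-box (spatial) filter
query = """
    SELECT name FROM gauges
    WHERE discharge > ? AND lon BETWEEN ? AND ? AND lat BETWEEN ? AND ?
    ORDER BY name
"""
selected = [r[0] for r in con.execute(query, (5.0, 9.0, 9.5, 48.5, 49.0))]
print(selected)
con.close()
```

A real geodatabase adds geometry types and spatial indices on top of this pattern, so the bounding-box comparison above is only a stand-in for a proper spatial query.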
Vector data¶
Vector data are visually smooth and efficient for overlay operations, especially regarding shape-driven geo-information such as roads or surface delineations. Vector data are typically less storage-intensive, easier to scale, and more compatible with relational environments. Common formats are .shp, GeoJSON, or TIN.
The shapefile format was invented by Esri (download their PDF documentation) and information contained in shapefiles can be:
Polygons (surface patches).
Points with x-y-z coordinates and an m field containing point data.
(Poly)lines consisting of segments defined by start points and endpoints.
Shapefile¶
Note
The driver for shapefile handling with gdal.ogr is obtained through ogr.GetDriverByName('ESRI Shapefile'). A shapefile is not just one file, but consists of the following essential parts:
a .shp file, where geometries are stored,
a .shx file, where indices of the geometries are stored,
a .prj file that stores the projection, and
a .dbf file containing attribute information (constitutes the attribute table).
These files need to be in the same folder; otherwise, the shapefile does not work. A couple of other files may occur when we manipulate a shapefile (e.g., .atx, .sb*, .shp.xml, .cpg, .mxs, .ai*, or .fb*), but we can ignore those files.
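Because a shapefile only works when its parts sit next to each other, a quick check for missing parts can save debugging time. The following sketch uses only the Python standard library; the function name missing_shapefile_parts and the file name rivers.shp are hypothetical, and the list of expected extensions simply mirrors the parts named in this section.

```python
from pathlib import Path
import tempfile

# Sidecar extensions named in this section
EXPECTED = (".shp", ".shx", ".dbf", ".prj")


def missing_shapefile_parts(shp_path):
    """Return the extensions of expected shapefile parts that do not exist."""
    shp_path = Path(shp_path)
    return [ext for ext in EXPECTED if not shp_path.with_suffix(ext).exists()]


# Usage with empty placeholder files in a temporary folder (no real geodata)
with tempfile.TemporaryDirectory() as tmp:
    for ext in (".shp", ".shx", ".dbf"):
        (Path(tmp) / f"rivers{ext}").touch()
    missing = missing_shapefile_parts(Path(tmp) / "rivers.shp")

print(missing)
```

The check only looks at file names in one folder; it does not verify that the files are valid or belong together.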
Shapefile vector data typically have an attribute table (just like any other geodatabase) in which each polygon, line, or point object can be assigned an attribute value. Attributes are defined by columns along with their names (column headers) and can have numeric (e.g., float, double, int, or long), text (string), or date/time (e.g., yyyymmdd or HH:MM:SS) formats.
Shapefile versus geodatabase¶
A shapefile can be understood as a competing format to a geodatabase. Which file format is better? Strictly speaking, both a geodatabase and a shapefile can perform similar operations, but a shapefile requires more storage space to store similar contents, cannot store combinations of date and time, and supports neither raster files nor Null (not-a-number) values. So basically we are better off with geodatabases, but the usage of shapefiles is popular and many geospatial operations focus on shapefile manipulations.
Triangulated Irregular Network (TIN)¶
A triangulated irregular network (TIN) represents a surface consisting of multiple triangles. In hydraulic engineering and water resources research, one of the most important uses of a TIN is the generation of computational meshes for numerical models (e.g., on this website’s BASEMENT tutorial). In such models, a TIN consists of lines and nodes forming georeferenced, three-dimensionally sloped triangles of the surface, which represent a digital elevation model (DEM). TIN nodes have georeferenced coordinates and potentially more attribute information such as node IDs and elevation. The advantage of a TIN DEM over a raster DEM is that it requires less storage space. Alas, manipulating a TIN is not as easy as manipulating a raster. The figure below shows an example TIN created with matplotlib.tri.TriAnalyzer (https://matplotlib.org/3.1.1/api/tri_api.html#matplotlib.tri.TriAnalyzer), based on a showcase from the matplotlib docs.
The file ending of a TIN is .TIN.
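To make the idea of nodes and sloped triangles more tangible, the following sketch stores a tiny TIN as a node table plus a triangle list and reads an elevation at an arbitrary point by linear (barycentric) interpolation. The coordinates, the node IDs, and the function name interpolate_z are hypothetical; this is a minimal illustration and not the data model of any particular TIN file format.

```python
# Nodes: id -> (x, y, z); hypothetical coordinates in meters
nodes = {
    0: (0.0, 0.0, 10.0),
    1: (4.0, 0.0, 12.0),
    2: (0.0, 4.0, 14.0),
    3: (4.0, 4.0, 16.0),
}
# Triangles reference three node ids each
triangles = [(0, 1, 2), (1, 3, 2)]


def interpolate_z(x, y):
    """Return the linearly interpolated elevation at (x, y), or None outside the TIN."""
    for a, b, c in triangles:
        (xa, ya, za), (xb, yb, zb), (xc, yc, zc) = nodes[a], nodes[b], nodes[c]
        # Barycentric weights of the point with respect to the triangle corners
        det = (yb - yc) * (xa - xc) + (xc - xb) * (ya - yc)
        wa = ((yb - yc) * (x - xc) + (xc - xb) * (y - yc)) / det
        wb = ((yc - ya) * (x - xc) + (xa - xc) * (y - yc)) / det
        wc = 1.0 - wa - wb
        # All weights non-negative means the point lies inside (or on) the triangle
        if min(wa, wb, wc) >= -1e-12:
            return wa * za + wb * zb + wc * zc
    return None


print(interpolate_z(1.0, 1.0))
print(interpolate_z(5.0, 5.0))
```

The loop over all triangles is fine for a handful of elements; meshes with many triangles would typically use a spatial index to find the enclosing triangle.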
GeoJSON¶
Note
The driver for GeoJSON handling with gdal.ogr is obtained through ogr.GetDriverByName('GeoJSON').
GeoJSON is an open format for representing geographic data with simple feature access standards, where JSON denotes JavaScript Object Notation (read more about JSON file manipulation in the Python intro on this website). The GeoJSON file name ending is .geojson and a file typically has the following structure:
{
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [9.104028940200806, 48.74417005744522]
            },
            "properties": {
                "name": "IWS"
            }
        }
    ]
}
Visit geojson.io to build a customized GeoJSON file. While GeoJSON metadata can provide height information (z values) as a properties value, the still rather young TopoJSON format is a more suitable offspring for encoding geospatial topology.
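Since GeoJSON is plain JSON, the structure shown above can be built and read with Python's standard json module. The following sketch assumes a hypothetical helper named point_feature; it reuses the coordinates and the name from the example above and notes that GeoJSON lists longitude before latitude.

```python
import json


def point_feature(lon, lat, **properties):
    """Build a GeoJSON Feature dictionary for a point (longitude first, then latitude)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


collection = {
    "type": "FeatureCollection",
    "features": [point_feature(9.104028940200806, 48.74417005744522, name="IWS")],
}

# Serialize to text (the content of a .geojson file) and parse it back
text = json.dumps(collection, indent=2)
parsed = json.loads(text)
lon, lat = parsed["features"][0]["geometry"]["coordinates"]
print(parsed["features"][0]["properties"]["name"], lon, lat)
```

Writing the text variable to a file with the ending .geojson would yield a file that GIS software such as QGIS can typically open as a point layer.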
Gridded cell (raster) data¶
Raster datasets store pixel values (cells), which require large storage space, but have a simple structure. A big advantage of rasters is the possibility to perform powerful geospatial and statistical analyses. Common raster formats are, among others, .tif (GeoTIFF), GRID (a folder with BND, HDR, STA, VAT, and other files), .flt (floating points), ASCII (American Standard Code for Information Interchange), and many more image-like file types.
Tip
Preferably use the GeoTIFF format in raster analyses. A GeoTIFF dataset typically includes a .tif file (with heavy data) and a .tfw file (a six-line plain text world file containing georeference information).
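The six lines of a world file define an affine transformation from pixel indices to map coordinates. In file order, the values are commonly labeled A (pixel size in x), D and B (rotation terms), E (pixel size in y, usually negative), and C and F (map coordinates of the center of the upper-left pixel). The sketch below parses such a file and converts a pixel position; the sample values and the function names are hypothetical.

```python
def parse_world_file(text):
    """Parse the six numbers of a world file in file order: A, D, B, E, C, F."""
    a, d, b, e, c, f = (float(value) for value in text.split())
    return a, d, b, e, c, f


def pixel_to_map(col, row, params):
    """Return the map coordinates of the center of pixel (col, row)."""
    a, d, b, e, c, f = params
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


# Hypothetical world file: 2-m pixels, no rotation,
# upper-left pixel center at x=500001, y=5399999
tfw = "2.0\n0.0\n0.0\n-2.0\n500001.0\n5399999.0\n"
params = parse_world_file(tfw)
print(pixel_to_map(0, 0, params))
print(pixel_to_map(10, 5, params))
```

A world file carries no information about the coordinate system itself, so the resulting x and y values are only meaningful together with a known projection.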
Note
The driver for GeoTIFF handling with gdal is obtained through gdal.GetDriverByName('GTiff').
Projections and coordinate systems¶
In geospatial data analyses, a projection represents an approach to flatten (a part of) the globe. In this flattening process, latitudinal (North/South) and longitudinal (West/East) coordinates of a location on the globe (three-dimensional, 3D) are projected into the coordinates of a two-dimensional (2D) map. When 3D coordinates are projected onto 2D coordinates, distortions occur, and a variety of projection systems is used in geospatial analyses. In practice, this means that if we use geospatial data files with different projections, a distortion effect propagates into all subsequent calculations. It is absolutely crucial to avoid distortion effects by ensuring that the same projections and coordinate systems are applied to all geospatial data used. This starts with the creation of a new geospatial layer (e.g., a point vector shapefile) in QGIS and should be maintained consistently in all program codes. To specify a projection or coordinate system in QGIS, click on Project > Properties > CRS tab and select a COORDINATE_SYSTEM. For example, an appropriate coordinate system for central Europe is ESRI:31493 (read more in the QGIS docs). Projected systems may vary with regions (local coordinate systems), which can, for example, be found at epsg.io or spatialreference.org.
In shapefiles, information about the projection is stored in a .prj file (recall definitions in the geospatial data section), which is a plain text file. The Open Geospatial Consortium (OGC) and Esri use Well-Known Text (WKT) files for standard descriptions of coordinate systems, and such a WKT-formatted .prj file can look like this:
PROJCS["unknown",GEOGCS["GCS_unknown",
DATUM["D_Unknown_based_on_GRS80_ellipsoid",SPHEROID["GRS_1980",6378137.0,298.257222101]],
PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],
PROJECTION["Lambert_Conformal_Conic"], PARAMETER["False_Easting",6561666.66666667],
..., UNIT["US survey foot",0.304800609601219]]
In GeoJSON files, the standard coordinate system is WGS84 according to the developers’ specifications. The units and measures defined in the WKT-formatted .prj file also determine the units of WKB (Well-Known Binary) definitions of geometries such as line length (e.g., in meters or feet) or polygon area (e.g., in square meters, square kilometers, or acres).
Tip
To ensure that all geometries are measured in meters and powers of meters, use EPSG:3857 (formerly 900913 - g00glE) to define the WKT-formatted projection file.
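To illustrate what a projection does numerically, the sketch below converts longitude and latitude in degrees (WGS84) into EPSG:3857 coordinates in meters with the commonly documented spherical Mercator formulas. The function name lonlat_to_web_mercator is hypothetical; in practice, such conversions are usually left to a library (e.g., the osr module shipped with gdal), so this is only a minimal sketch of the underlying arithmetic.

```python
import math

# Sphere radius used by EPSG:3857 (equals the WGS84 semi-major axis), in meters
R = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project WGS84 longitude/latitude (degrees) to EPSG:3857 x/y (meters)."""
    x = R * math.radians(lon_deg)
    y = R * math.log(math.tan(math.pi / 4.0 + math.radians(lat_deg) / 2.0))
    return x, y


# Coordinates of the GeoJSON example point from this page
x, y = lonlat_to_web_mercator(9.104028940200806, 48.74417005744522)
print(round(x, 2), round(y, 2))
```

Although the resulting coordinates are expressed in meters, this projection stretches lengths and areas with increasing distance from the equator, so measurements taken in EPSG:3857 should be interpreted with care at higher latitudes.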