Revision 11 (Rafael Bailon-Ruiz, 2021-01-12 15:56) → Revision 12/14 (Rafael Bailon-Ruiz, 2021-01-12 16:17)
h1. Working with the database

The feature database is the CAMS component that manages the storage and access of vector data: pieces of information, like sensor measurements, that can be described in space with a geometric figure such as a point, a line or a polygon. See http://wiki.gis.com/wiki/index.php/Vector_data_model to read more about vector data models.

The basic piece of information of this data model is the _feature_, defined by a _geometry_ that indicates unequivocally its position and shape in the world, and a set of _attributes_, which are the characteristics associated with that location. Related features sharing a common geometry definition and attribute set are usually grouped together.

{{toc}}

h2. Data model

The CAMS database model takes inspiration from the _OGR data model_ and the _OGC GeoPackage specification_.

h3. Dataset

A dataset encompasses a set of feature collections stored in the same database or file.

h3. Collection

A collection describes the characteristics of features of the same kind or category, such as "Wind", "UAV state" or "Liquid water content". It corresponds roughly to a table in a relational database or a layer in many geographic information models.

A collection is defined by the following parameters:

# A computer id (name_id),
# A human-readable name,
# A coordinate reference system as an "EPSG code":https://en.wikipedia.org/wiki/EPSG_Geodetic_Parameter_Dataset,
# A geometry type (as of 12/2020 only the "point" geometry is supported),
# An ordered set of attributes and corresponding types (attributes can be of type _int_, _str_, _float_, or _datetime_),
# And, optionally, a long description.

h3. Feature

!Feature%20definition.png! !vector_feature.png!

The collection field is used to identify the collection to which a particular feature belongs, thus determining the geometry and attribute set. General attributes, *t* (time) and *producer*, are mandatory for features generated repeatedly by UAV sensors.
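As a rough illustration of the collection definition above, the following sketch models the six collection parameters and checks a set of attribute values against them. The names @CollectionSketch@, @ATTRIBUTE_TYPES@ and @check@ are hypothetical and chosen for this example only; they are not part of the CAMS API, and the real classes may validate features differently.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical names for illustration only; this is not the CAMS API.
ATTRIBUTE_TYPES = {"int": int, "str": str, "float": float, "datetime": datetime}


@dataclass(frozen=True)
class CollectionSketch:
    name_id: str            # computer id
    name: str               # human-readable name
    epsg: int               # coordinate reference system as an EPSG code
    geometry_type: str      # only "point" is described as supported
    attributes: tuple       # ordered (attribute name, type name) pairs
    description: str = ""   # optional long description

    def check(self, values: dict) -> None:
        """Raise if `values` does not match the declared attribute set."""
        declared = dict(self.attributes)
        if set(values) != set(declared):
            raise ValueError(
                f"expected attributes {sorted(declared)}, got {sorted(values)}")
        for key, type_name in declared.items():
            if not isinstance(values[key], ATTRIBUTE_TYPES[type_name]):
                raise TypeError(f"attribute {key!r} should be of type {type_name}")


lwc = CollectionSketch(
    "lwc", "Liquid Water Content", 32631, "point",
    (("t", "datetime"), ("producer", "str"), ("humidity", "float")),
)

good = {
    "t": datetime(2020, 3, 5, 14, 35, 20, tzinfo=timezone.utc),
    "producer": "200",
    "humidity": 0.0125,
}
lwc.check(good)  # matches the declared attribute set, so nothing is raised

try:
    lwc.check({**good, "producer": 200})  # an int where a str is declared
    rejected = False
except TypeError:
    rejected = True
```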
The time attribute is represented by a date and time (datetime.datetime in Python) in Coordinated Universal Time (UTC). The producer attribute is a string.

Specific attributes are unique to a particular collection. All features of the same collection must have the same attributes, but features of different collections do not need to share specific attributes, unlike general ones. For instance, a _"wind"_ collection can have the _"east"_ and _"west"_ attributes to describe the wind vector components.

h2. Code architecture

The GeoPackageDatabase and MemoryDatabase classes provide two alternative feature storage strategies for the FeatureDatabase. The first uses the GDAL/OGR library to write and read GeoPackage files, and the second implements a memory-backed database tailor-made to provide fast access to common simple queries. Depending on the request complexity, the FeatureDatabase _query_ method chooses which storage backend to use: the MemoryDatabase when possible, or the GeoPackageDatabase otherwise.

The GeoPackageDatabase class uses SQLite transactions, which can be slow when writing or reading small pieces of data. When writing features, it is advised to use the _register_features_ method to delay disk I/O operations and reduce the number of transactions. In any case, fetching information from the database triggers a write transaction beforehand to ensure data integrity.

The DataServer class receives SensorSample and AircraftStatus objects from the add_sample and add_status events, respectively, and converts them to database features.

h3. Class diagram

!db%20diagram.png!

h2. Code examples

<code>nephelae_base/unittests/test_feature_database.py</code> provides many examples of how to use the CAMS database.

p{border: solid 1px #8B0000; padding: 1em; margin: 1em; background: #FEE}. %{color:red; font-weight: bold; font-size: large}Important:% *You should be aware that neither CAMS nor GDAL/OGR sanitizes SQL statements.
Your program may be the target of "SQL injection":https://en.wikipedia.org/wiki/SQL_injection attacks by malicious users, resulting in significant data loss and/or serious denial of service.*

<pre><code class="python">
import datetime
import math

# FeatureDatabase, CollectionSchema and Feature must also be imported
# from the CAMS package.

# Create a FeatureDatabase with memory and GeoPackage storage backends
fdb = FeatureDatabase("database.gpkg")

lwc_attrs = (("t", "datetime"), ("producer", "str"), ("humidity", "float"))
# EPSG:32631 corresponds to WGS84/UTM31N
lwc_collection = CollectionSchema(
    "lwc", "Liquid Water Content", 32631, "point", lwc_attrs,
    description="The liquid water content measurements")
fdb.add_collection(lwc_collection)
</code></pre>

<pre><code class="python">
# Define some features from liquid water content sensors on UAVs "200" and "202".
# The producer attribute is a string.
lwc_feature = Feature(
    'lwc', (360347.0, 4813681.0, 300.0),
    datetime.datetime(2020, 3, 5, 14, 35, 20, int(123.0 * 1000)),
    "200", {"humidity": 0.0125})
lwc_feature2 = Feature(
    'lwc', (360347.0, 4813681.0, 300.0),
    datetime.datetime(2020, 3, 5, 14, 35, 20, int(123.0 * 1000)),
    "202", {"humidity": 0.0125})
lwc_feature3 = Feature(
    'lwc', (361347.0, 4814681.0, 300.0),
    datetime.datetime(2020, 3, 5, 14, 35, 22, int(123.0 * 1000)),
    "200", {"humidity": 0.0125})

# Add them to the database
fdb.insert(lwc_feature, lwc_feature2, lwc_feature3)
</code></pre>

<pre><code class="python">
# Get all features from the "lwc" collection
result_iter = fdb.query("lwc")
# The result is an iterator: the actual reading operation is performed
# lazily, which makes it easy to combine with further filtering code
# without extra memory usage.
list_of_lwc = list(result_iter)  # But you can have a list if needed

# complex_r is a complex request that requires an SQL engine to be processed
complex_r = list(fdb.query(
    "lwc", where='"producer" == "200"', order_by="t", direction="asc"))

# (minx, miny, minz, maxx, maxy, maxz)
bbox = (lwc_feature.geometry[0] - 0.1, lwc_feature.geometry[1] - 0.1, -math.inf,
        lwc_feature.geometry[0] + 0.1, lwc_feature.geometry[1] + 0.1, math.inf)
# Simple bounding box request. Fast result from the memory database
bbox_r = list(fdb.query("lwc", bounding_box=bbox))
</code></pre>

h2. Post-mission analysis

While the GeoPackage (.gpkg) files generated by CAMS during a mission can be read using this software, it is better to use general-purpose geographic information systems or more mature GIS libraries to process the information. Popular Python libraries are "fiona":https://fiona.readthedocs.io/en/stable/README.html (a Pythonic interface to the GDAL/OGR library) and "geopandas":https://geopandas.org/, which extends the pandas library model to geographic data. "QGIS":https://www.qgis.org/en/site/ is an easy option for non-developers to visualize geospatial data and combine it visually with other sources.
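
For a quick look at a mission file before reaching for a GIS library, note that a GeoPackage is an SQLite database whose @gpkg_contents@ table lists the layers it holds. The sketch below uses only the Python standard library. The helper name @list_gpkg_layers@ is hypothetical, and the in-memory database with a reduced @gpkg_contents@ table is a stand-in so that the example runs without a real file; with an actual mission file you would connect to its path instead. It is an assumption here that each CAMS collection appears as one row of that table. Decoding the geometries themselves is best left to fiona, geopandas or QGIS.

```python
import sqlite3


def list_gpkg_layers(connection):
    """Return (table_name, data_type, srs_id) rows from gpkg_contents."""
    cursor = connection.execute(
        "SELECT table_name, data_type, srs_id "
        "FROM gpkg_contents ORDER BY table_name")
    return cursor.fetchall()


# Stand-in for sqlite3.connect("database.gpkg"): an in-memory database
# holding a reduced gpkg_contents table with one illustrative row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gpkg_contents ("
    "table_name TEXT PRIMARY KEY, data_type TEXT NOT NULL, "
    "identifier TEXT, srs_id INTEGER)")
conn.execute(
    "INSERT INTO gpkg_contents VALUES (?, ?, ?, ?)",
    ("lwc", "features", "Liquid Water Content", 32631))

layers = list_gpkg_layers(conn)
conn.close()

for table_name, data_type, srs_id in layers:
    print(f"{table_name}: {data_type}, EPSG code {srs_id}")
```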