Working with the database » History » Version 10
Rafael Bailon-Ruiz, 2020-12-08 12:40
Add SQL injection warning
1 | 1 | Rafael Bailon-Ruiz | h1. Working with the database |
---|---|---|---|
2 | |||
3 | The feature database is CAMS managing the storage and access of vector data, pieces of information like sensor measurements that can be described in space with a geometric figure such as a points, lines or polygons. See http://wiki.gis.com/wiki/index.php/Vector_data_modelto read more about vector data models. |
||
4 | |||
5 | 2 | Rafael Bailon-Ruiz | The basic piece of information of this data model is the _feature_, defined by a _geometry_ that indicates unequivocally its position and shape in the world, and a set of _attributes_ which are the characteristics associated to that location. Related features sharing a common geometry definition and attribute set are usually grouped together. |
6 | 1 | Rafael Bailon-Ruiz | |
7 | 3 | Rafael Bailon-Ruiz | {{toc}} |
8 | 2 | Rafael Bailon-Ruiz | |
9 | 1 | Rafael Bailon-Ruiz | h2. Data model |
10 | |||
11 | The CAMS database model is takes inspiration from the _OGR data model_ and the _OGC OpenPackage specification_. |
||
12 | |||
13 | h3. Dataset |
||
14 | |||
15 | A dataset is encompasses a set of feature collections stored in the same database or file. |
||
16 | |||
17 | |||
18 | h3. Collection |
||
19 | |||
20 | A collection describes the characteristics of features of the same kind or category. I.e.: "Wind", "UAV state", "Liquid water content", etc. It corresponds roughly to a table in a relational database or a layer in many geographic information models. |
||
21 | |||
22 | A collection is defined by the following parameters: |
||
23 | # A computer id (name_id), |
||
24 | # A human-readable name, |
||
25 | 2 | Rafael Bailon-Ruiz | # An coordinate reference system as an "EPSG code":https://en.wikipedia.org/wiki/EPSG_Geodetic_Parameter_Dataset, |
26 | 1 | Rafael Bailon-Ruiz | # A geometry type (As of 12/2020 only the "point" geometry is supported), |
27 | 3 | Rafael Bailon-Ruiz | # A ordered set of attributes and corresponding types. Attributes can be of type _int_, _str_, _float_, or _datetime_. |
28 | 1 | Rafael Bailon-Ruiz | # And, optionally, a long description. |
29 | |||
30 | |||
31 | h3. Feature |
||
32 | |||
33 | !vector_feature.png! |
||
34 | |||
35 | 3 | Rafael Bailon-Ruiz | The collection field is used to identify the collection to with a particular field belongs; thus determining the geometry and attribute set. |
36 | 1 | Rafael Bailon-Ruiz | |
37 | 3 | Rafael Bailon-Ruiz | General attributes, *t* (time) and *producer* , are mandatory for features generated repeatedly by UAV sensors. The time attribute is represented by a date and time (datetime.datetime in python) in Coordinated Universal Time (UTC). The producer attribute is a string. |
38 | 1 | Rafael Bailon-Ruiz | |
39 | 3 | Rafael Bailon-Ruiz | Specific attributes are unique to a particular collection. All features of the same collection must have the same attributes, but features of different collections do not need to share specific attributes unlike general ones. For instance, a _"wind"_ collection can have the _"east"_ and _"west"_ attributes to describe the wind vector components. |
40 | 1 | Rafael Bailon-Ruiz | |
41 | 4 | Rafael Bailon-Ruiz | h2. Code architecture |
42 | 1 | Rafael Bailon-Ruiz | |
43 | 7 | Rafael Bailon-Ruiz | The GeoPacakgeDatabase and MemoryDatabase provide two alternative feature storage strategies for the FeatureDatabase. The first uses the GDAL/OGR library to write and read GeoPackage files and the second implements a memory-backed database tailor-made to provide fast access to common simple queries. Depending on the request complexity, the FeatureDatabase _query_ method chooses one of the storage backends will use, the MemoryDatabase when possible or the GeoPackageDatabase otherwise. |
44 | |||
45 | The GeoPackageDatabase class use Sqlite transactions that can be slow for writing or reading small pieces of data. When writing features it is advised to use the _register_features_ method to delay disk I/O operations and reduce the number of transactions. Anyway, fetching information from the database triggers a write transaction beforehand to ensure data integrity. |
||
46 | |||
47 | The DataServer class receives AircraftStatus and SensorSample objects from the add_sample and add_status events and converts them to database features. |
||
48 | |||
49 | |||
50 | 4 | Rafael Bailon-Ruiz | h3. Class diagram |
51 | !db%20diagram.png! |
||
52 | 1 | Rafael Bailon-Ruiz | |
53 | 7 | Rafael Bailon-Ruiz | h2. Code examples |
54 | 8 | Rafael Bailon-Ruiz | |
55 | 9 | Rafael Bailon-Ruiz | <code>nephelae_base/unittests/test_feature_database.py</code> provides many examples on using the CAMS database. |
56 | 1 | Rafael Bailon-Ruiz | |
57 | 10 | Rafael Bailon-Ruiz | p{border: solid 1px #8B0000; padding: 1em; margin: 1em; background: #FEE}. %{color:red; font-weight: bold; font-size: large}Important:% *You should be aware that neither CAMS or GDAL/OGR sanitize SQL statements. Your program may be the target of "SQL injection":https://en.wikipedia.org/wiki/SQL_injection attacks by malicious users resulting on important data loss and/or serious denial-of-service.* |
58 | 9 | Rafael Bailon-Ruiz | |
59 | 7 | Rafael Bailon-Ruiz | <pre><code class="python"> |
60 | fdb = FeatureDatabase("database.gpkg") # Create a FeatureDatabase with memory and geopackage storage backends |
||
61 | 1 | Rafael Bailon-Ruiz | |
62 | 7 | Rafael Bailon-Ruiz | lwc_attrs = (("t", "datetime"), ("producer", "str"), ("humidity", "float")) |
63 | lwc_collection = CollectionSchema( |
||
64 | "lwc", "Liquid Water Content", 32631, "point", lwc_attrs, |
||
65 | description="The liquid water content measurements") # epsg:32631 corresponds to WGS84/UTM31N |
||
66 | 1 | Rafael Bailon-Ruiz | |
67 | 7 | Rafael Bailon-Ruiz | fdb.add_collection(lwc_collection) |
68 | </code></pre> |
||
69 | 1 | Rafael Bailon-Ruiz | |
70 | 7 | Rafael Bailon-Ruiz | <pre><code class="python"> |
71 | # Define some features from a liquid water content sensor on UAV "200" |
||
72 | lwc_feature = feature = ('lwc', (360347.0, 4813681.0, 300.0), { |
||
73 | "t": datetime.datetime(2020, 3, 5, 14, 35, 20, int(123.0 * 1000)), |
||
74 | "producer": "200", |
||
75 | "humidity": 0.0125}) |
||
76 | lwc_feature2 = ('lwc', (360347.0, 4813681.0, 300.0), { |
||
77 | "t": datetime.datetime(2020, 3, 5, 14, 35, 20, int(123.0 * 1000)), |
||
78 | "producer": "202", |
||
79 | "humidity": 0.0125}) |
||
80 | lwc_feature3 = ('lwc', (361347.0, 4814681.0, 300.0), { |
||
81 | "t": datetime.datetime(2020, 3, 5, 14, 35, 22, int(123.0 * 1000)), |
||
82 | "producer": "200", |
||
83 | "humidity": 0.0125}) |
||
84 | 1 | Rafael Bailon-Ruiz | |
85 | 7 | Rafael Bailon-Ruiz | # Add them to the database |
86 | fdb.insert(lwc_feature, lwc_feature2, lwc_feature3) |
||
87 | </code></pre> |
||
88 | 8 | Rafael Bailon-Ruiz | |
89 | 7 | Rafael Bailon-Ruiz | <pre><code class="python"> |
90 | 1 | Rafael Bailon-Ruiz | # Get all featres from the "lwc" collection |
91 | 7 | Rafael Bailon-Ruiz | result_iter = fdb.query("lwc") |
92 | 1 | Rafael Bailon-Ruiz | |
93 | 7 | Rafael Bailon-Ruiz | # The result is an iterator (the actual reading operation is performed |
94 | 9 | Rafael Bailon-Ruiz | # lazily and makes it easier to combine with further filtering code |
95 | 1 | Rafael Bailon-Ruiz | # without extra memory usage. |
96 | 8 | Rafael Bailon-Ruiz | list_of_lwc = list(result_iter) # But you can have a list if needed |
97 | 7 | Rafael Bailon-Ruiz | |
98 | 8 | Rafael Bailon-Ruiz | # complex_r is a complex request that requires an sql engine to be processed |
99 | 1 | Rafael Bailon-Ruiz | complex_r = list(empty_feature_db.query( |
100 | 7 | Rafael Bailon-Ruiz | "lwc", where="\"producer\" == \"200\"", |
101 | order_by="t", direction="asc")) |
||
102 | |||
103 | 8 | Rafael Bailon-Ruiz | |
104 | 7 | Rafael Bailon-Ruiz | # (minx, miny, minz, maxx, maxy, maxz) |
105 | bbox = (lwc_feature[1][0] - 0.1, lwc_feature[1][1] - 0.1, |
||
106 | -math.inf, |
||
107 | lwc_feature[1][0] + 0.1, lwc_feature[1][1] + 0.1, |
||
108 | math.inf) |
||
109 | # Simple bounding box request. Fast result from the memory database |
||
110 | bbox_r = = list(empty_feature_db.query("lwc", bounding_box=bbox)) |
||
111 | </code></pre> |
||
112 | 4 | Rafael Bailon-Ruiz | |
113 | 3 | Rafael Bailon-Ruiz | h2. Post-mission analysis |
114 | |||
115 | While GeoPackage .gpkg files generated by CAMS during a mission can be read using this software, it is better to use general purpose geographic information systems or more mature GIS libraries to process the information. |
||
116 | |||
117 | 1 | Rafael Bailon-Ruiz | Popular python libraries are "fiona":https://fiona.readthedocs.io/en/stable/README.html —a pythonic style interface to the popular GDAL/OGR library— and "geopandas":https://geopandas.org/, extending the python pandas library model to geographic data. "QGIS":https://www.qgis.org/en/site/ is an easy option for non-developpers to visualize geospatial data and visually combine the information with other sources. |