Cloud mapping using Gaussian Process Regression¶
This page shows how to build a map from entries of the database. It introduces the GprPredictor class and its subclasses (ValueMap, StdMap), followed by a presentation of the BorderMap subclasses.
The GprPredictor class¶
GprPredictor is a class that produces maps from the data contained in the database using Gaussian Process Regression. To work, the map generator needs a Kernel that indicates the length scales for the 4 dimensions (t, x, y, z). This way, the map knows which data is relevant and which is not, since wind shifts the effective position of data over time. So before any computation is done, the data must be selected: at time t, we fetch the current samples and all the data lying within the length scales. Using wind*dt (where dt = t_current - t_data), we can select which samples are still relevant. Once done, Gaussian Process Regression produces the array of data. The regression itself is performed by the sklearn library (see here).
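The relevance selection described above can be sketched as follows. Note that the function and parameter names below are hypothetical, not part of the real GprPredictor API: each sample is advected by wind*dt, and only samples whose age stays within the time length scale of the kernel are kept.

```python
import numpy as np

def advect_and_filter(samples, t_now, wind, t_scale):
    """Sketch (hypothetical helper): advect samples with the wind and
    keep only those still relevant at time t_now.

    samples : (N, 4) array of (t, x, y, z) measurement positions
    wind    : (wx, wy) horizontal wind vector
    t_scale : time length scale of the kernel
    """
    dt = t_now - samples[:, 0]        # age of each sample
    advected = samples.copy()
    advected[:, 1] += wind[0] * dt    # x drifts with the wind
    advected[:, 2] += wind[1] * dt    # y drifts with the wind
    return advected[dt <= t_scale]    # keep samples younger than the scale
```

This is only the time-dimension filter; the real implementation bounds all four dimensions with the kernel span, as shown in the at_locations code below.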
To specify which map you want to compute, you first choose the coordinates of the map. The training locations and values are then built from the resulting change of coordinate system and the span of the Kernel (start coordinates + 3 times the span of the Kernel).
The bounding box (the region over which data is actually fetched) is obtained by adding the wind-induced displacement to the borders of the training data. Taking the minimum or maximum of the coordinates, depending on which side the displacement falls, gives the real bounding box.
We then select the candidate locations within the bounding box +/- the span of the kernel, and these are passed to the Gaussian Process Regression after it has been fitted with the training values and locations.
The getitem function returns a tuple of size two. The first element always contains a ScaledArray of the computed values; the second element is either None or the ScaledArray of the standard deviation of the Gaussian Process Regression.
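A minimal sketch of this getitem contract (the class below is a stand-in for illustration, not the real GprPredictor):

```python
class TupleMap:
    """Hypothetical stand-in mimicking GprPredictor's getitem contract:
    it always returns a 2-tuple (values, std_or_None)."""
    def __init__(self, compute_std):
        self.computeStd = compute_std

    def __getitem__(self, keys):
        values = [0.0, 1.0, 2.0]                       # placeholder values
        std = [0.1, 0.1, 0.1] if self.computeStd else None
        return (values, std)

# The second element is None when the standard deviation is not computed.
values, std = TupleMap(compute_std=False)[0:3]
```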
Example of code found inside the GprPredictor class (slightly abridged; the repeated per-dimension statements of the original are written with explicit indices):

```python
def at_locations(self, locations, locBounds=None):
    """
    Computes predicted value at each given location using GPR.

    This method is used in the map interface when requesting a dense
    map. In that case, each location is the position of one pixel of
    the requested map. This method automatically fetches relevant
    data from self.database to compute the predicted values.

    Parameters
    ----------
    locations : numpy.array (N x 4)
        Locations N x (t,x,y,z) for each of the N points where to
        compute a predicted value using GPR.
        Note : this implementation of GprPredictor does not enforce
        the use of a 4D space (t,x,y,z) for locations. However, the
        self.database attribute is most likely a
        nephelae.database.NephelaeDataServer, which does enforce the
        use of a 4D (t,x,y,z) space.

    Returns
    -------
    numpy.array (N x M)
        Predicted values at locations. Can be more than 1-dimensional
        depending on the data fetched from the database.
        Example : if the database contains samples of a 2D wind
        vector, the predicted map is a 2D vector field defined on a
        4D space-time.

    Note : This method probably needs more refining.
    (TODO : investigate this)
    """
    with self.locationsLock:
        # This is happening after the change of coordinates
        kernelSpan = self.kernel.span()
        if locBounds is None:
            locBounds = Bounds.from_array(locations.T)
            # Extending the bounds by the kernel span in each dimension
            for i in range(4):
                locBounds[i].min = locBounds[i].min - kernelSpan[i]
                locBounds[i].max = locBounds[i].max + kernelSpan[i]

        # Searching data inside the database
        samples = self.dataview[locBounds[0].min:locBounds[0].max,
                                locBounds[1].min:locBounds[1].max,
                                locBounds[2].min:locBounds[2].max,
                                locBounds[3].min:locBounds[3].max]

        if len(samples) < 1:
            # No data found, filling the blanks
            return (np.ones((locations.shape[0], 1))*self.kernel.mean,
                    np.ones(locations.shape[0])*self.kernel.variance)
        else:
            # Setting up all the locations of the found samples...
            trainLocations = np.array([[s.position.t,
                                        s.position.x,
                                        s.position.y,
                                        s.position.z]
                                       for s in samples])
            # ...and all their values
            trainValues = np.array([s.data for s in samples]).squeeze()
            if len(trainValues.shape) < 2:
                trainValues = trainValues.reshape(-1, 1)

            # Creating the bounding box of the training data
            boundingBox = (np.min(trainLocations, axis=0),
                           np.max(trainLocations, axis=0))

            # Extending the x/y bounds with the wind-induced drift
            dt = boundingBox[1][0] - boundingBox[0][0]  # delta time
            wind = self.kernel.windMap.get_wind()
            dx, dy = dt*wind
            boundingBox[0][1] = min(boundingBox[0][1], boundingBox[0][1] + dx)
            boundingBox[1][1] = max(boundingBox[1][1], boundingBox[1][1] + dx)
            boundingBox[0][2] = min(boundingBox[0][2], boundingBox[0][2] + dy)
            boundingBox[1][2] = max(boundingBox[1][2], boundingBox[1][2] + dy)

            # Searching all requested locations inside the extended box
            same_locations = np.where(np.logical_and(
                np.logical_and(
                    np.logical_and(
                        locations[:,0] >= boundingBox[0][0] - kernelSpan[0],
                        locations[:,0] <= boundingBox[1][0] + kernelSpan[0]),
                    np.logical_and(
                        locations[:,1] >= boundingBox[0][1] - kernelSpan[1],
                        locations[:,1] <= boundingBox[1][1] + kernelSpan[1])),
                np.logical_and(
                    np.logical_and(
                        locations[:,2] >= boundingBox[0][2] - kernelSpan[2],
                        locations[:,2] <= boundingBox[1][2] + kernelSpan[2]),
                    np.logical_and(
                        locations[:,3] >= boundingBox[0][3] - kernelSpan[3],
                        locations[:,3] <= boundingBox[1][3] + kernelSpan[3])
                    ))))
            # Getting the locations we are interested in
            selected_locations = locations[same_locations]

            # Training of the Gaussian process
            self.gprProc.fit(trainLocations, trainValues)
            computed_locations = self.gprProc.predict(
                selected_locations, return_std=self.computeStd)
            # [...]
```
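The regression itself is delegated to scikit-learn. Below is a minimal, self-contained illustration of the fit/predict pattern used by gprProc, with a generic RBF kernel on 1D data instead of the project's 4D wind-aware kernel:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Training data: values sampled at a few 1D locations
train_locations = np.array([[0.0], [1.0], [2.0], [3.0]])
train_values = np.sin(train_locations).ravel()

# Fit the Gaussian process, as gprProc.fit(trainLocations, trainValues) does
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0))
gpr.fit(train_locations, train_values)

# Predict mean and standard deviation at new locations,
# mirroring the predict(..., return_std=self.computeStd) call above
query = np.linspace(0.0, 3.0, 7).reshape(-1, 1)
mean, std = gpr.predict(query, return_std=True)
```

With return_std=True the prediction is a pair (mean, std), which is exactly what feeds the two elements of the tuple returned by getitem.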
GprPredictor has two subclasses: ValueMap and StdMap.
ValueMap and StdMap¶
You can see ValueMap as an alias of GprPredictor: every call is forwarded to the underlying GprPredictor. The difference is that its getitem function always returns the ScaledArray of Gaussian Process Regression values instead of a tuple.
StdMap follows the same principle, returning only the standard deviation ScaledArray.
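The delegation idea can be sketched like this (simplified stand-ins for illustration; the real classes forward many more methods):

```python
class ValueMap:
    """Sketch: delegate getitem to a GprPredictor and keep only the
    first element of its (values, std) tuple."""
    def __init__(self, gpr):
        self.gpr = gpr

    def __getitem__(self, keys):
        return self.gpr[keys][0]   # values only, drop the std part


class StdMap:
    """Sketch: same idea, keeping only the standard deviation."""
    def __init__(self, gpr):
        self.gpr = gpr

    def __getitem__(self, keys):
        return self.gpr[keys][1]   # std only
```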
Mapping the border of clouds¶
You can map the border of the clouds using the BorderIncertitude class or the BorderRaw class.
To compute the border of the clouds, we take the result of the GprPredictor and apply a threshold to it (clouds = values greater than or equal to the threshold).
Then, we erode the thresholded array and apply a binary xor operation between the thresholded array and its eroded version.
The result of this operation is the border of the clouds.
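The threshold/erode/xor steps above can be reproduced on a small array with scipy.ndimage, which is also what the border classes use internally:

```python
import numpy as np
from scipy import ndimage

# A small thresholded "cloud" mask: 1 where the GPR value >= threshold
cloud = np.array([[0, 0, 0, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 1, 1, 1, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 0, 0, 0]], dtype=np.int32)

# Erosion shrinks the cloud by one pixel on every side
eroded = ndimage.binary_erosion(cloud).astype(np.int32)

# Xor keeps exactly the pixels removed by the erosion: the border
border = np.bitwise_xor(cloud, eroded)
```

Here only the center pixel survives the erosion, so the xor leaves the ring of pixels around it: the cloud border.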
Note: if you use the BorderIncertitude class, you will also need the standard deviation map, and the getitem function will return a couple of ScaledArrays (+/- n times the deviation).
Example of code found inside the BorderIncertitude class:
```python
def at_locations(self, arrays):
    """
    Computes the values of the inner and outer borders using binary
    erosion operations and bitwise exclusive-or operations. Returns a
    tuple of arrays that contain 1/0 values (1 = border, 0 = no border).

    Parameters
    ----------
    arrays : Tuple(2*ScaledArray)
        The arrays to process to obtain the tuple of borders

    Returns
    -------
    Tuple(2*ScaledArray)
        The tuple of border arrays
    """
    typ = np.int32  # Ensuring the type for binary operations

    # Computes the inner and outer cross-section borders
    inner, outer = compute_cross_section_border(arrays[0], arrays[1],
                                                factor=self.factor,
                                                threshold=self.threshold)

    # Keeps the values >= threshold
    thresholded_inner = threshold_array(inner, threshold=self.threshold)
    thresholded_outer = threshold_array(outer, threshold=self.threshold)

    # Erosion operations
    eroded_inner = ndimage.binary_erosion(thresholded_inner).astype(typ)
    eroded_outer = ndimage.binary_erosion(thresholded_outer).astype(typ)

    # Keeps only the borders
    border_inner = np.bitwise_xor(thresholded_inner, eroded_inner)
    border_outer = np.bitwise_xor(thresholded_outer, eroded_outer)

    inner_scarray = ScaledArray(border_inner, arrays[0].dimHelper,
                                arrays[0].interpolation)
    outer_scarray = ScaledArray(border_outer, arrays[1].dimHelper,
                                arrays[1].interpolation)
    return (inner_scarray, outer_scarray)
```