Cloud mapping using Gaussian Process Regression¶
This page shows how to build a map from entries of the database. It introduces the GprPredictor class and its subclasses (ValueMap, StdMap), followed by a presentation of the BorderMap subclasses.
The GprPredictor class¶
GprPredictor is a class that produces maps from the data contained in the database using Gaussian Process Regression. To work, the map generator needs a Kernel that indicates the length scales for the 4 dimensions (t, x, y, z). This way, the map knows which data is relevant and which is not, since wind shifts the effective position of data over time. So before any computation is done, the data must be selected: at time t, we fetch the current samples and all the data lying within the length scales. Using wind*dt (where dt = t_current - t_data), we can select which samples are still relevant. Once done, Gaussian Process Regression produces the array of data. The regression itself is performed by the sklearn library (see here).
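The relevance selection described above can be sketched as follows. Note that the function and parameter names below are hypothetical, not part of the real GprPredictor API: each sample is advected by wind*dt, and only samples whose age stays within the time length scale of the kernel are kept.

```python
import numpy as np

def advect_and_filter(samples, t_now, wind, t_scale):
    """Sketch (hypothetical helper): advect samples with the wind and
    keep only those still relevant at time t_now.

    samples : (N, 4) array of (t, x, y, z) measurement positions
    wind    : (wx, wy) horizontal wind vector
    t_scale : time length scale of the kernel
    """
    dt = t_now - samples[:, 0]        # age of each sample
    advected = samples.copy()
    advected[:, 1] += wind[0] * dt    # x drifts with the wind
    advected[:, 2] += wind[1] * dt    # y drifts with the wind
    return advected[dt <= t_scale]    # keep samples younger than the scale
```

This is only the time-dimension filter; the real implementation bounds all four dimensions with the kernel span, as shown in the at_locations code below.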
To specify which map you want to compute, you first choose the coordinates of the map. The training locations and values are then built from the resulting change of coordinate system and the span of the Kernel (start coordinates + 3 times the span of the Kernel).
The bounding box (the region over which data is actually fetched) is obtained by adding the wind-induced displacement to the borders of the training data. Taking the minimum or maximum of the coordinates, depending on which side the displacement falls, gives the real bounding box.
We then select the candidate locations within the bounding box +/- the span of the kernel, and these are passed to the Gaussian Process Regression after it has been fitted with the training values and locations.
The getitem function returns a tuple of size two. The first element always contains a ScaledArray of the computed values; the second element is either None or the ScaledArray of the standard deviation of the Gaussian Process Regression.
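A minimal sketch of this getitem contract (the class below is a stand-in for illustration, not the real GprPredictor):

```python
class TupleMap:
    """Hypothetical stand-in mimicking GprPredictor's getitem contract:
    it always returns a 2-tuple (values, std_or_None)."""
    def __init__(self, compute_std):
        self.computeStd = compute_std

    def __getitem__(self, keys):
        values = [0.0, 1.0, 2.0]                       # placeholder values
        std = [0.1, 0.1, 0.1] if self.computeStd else None
        return (values, std)

# The second element is None when the standard deviation is not computed.
values, std = TupleMap(compute_std=False)[0:3]
```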
Example of code found inside the GprPredictor class (slightly abridged; the repeated per-dimension statements of the original are written with explicit indices):

```python
def at_locations(self, locations, locBounds=None):
    """
    Computes predicted value at each given location using GPR.

    This method is used in the map interface when requesting a dense
    map. In that case, each location is the position of one pixel of
    the requested map. This method automatically fetches relevant
    data from self.database to compute the predicted values.

    Parameters
    ----------
    locations : numpy.array (N x 4)
        Locations N x (t,x,y,z) for each of the N points where to
        compute a predicted value using GPR.
        Note : this implementation of GprPredictor does not enforce
        the use of a 4D space (t,x,y,z) for locations. However, the
        self.database attribute is most likely a
        nephelae.database.NephelaeDataServer, which does enforce the
        use of a 4D (t,x,y,z) space.

    Returns
    -------
    numpy.array (N x M)
        Predicted values at locations. Can be more than 1-dimensional
        depending on the data fetched from the database.
        Example : if the database contains samples of a 2D wind
        vector, the predicted map is a 2D vector field defined on a
        4D space-time.

    Note : This method probably needs more refining.
    (TODO : investigate this)
    """
    with self.locationsLock:
        # This is happening after the change of coordinates
        kernelSpan = self.kernel.span()
        if locBounds is None:
            locBounds = Bounds.from_array(locations.T)
            # Extending the bounds by the kernel span in each dimension
            for i in range(4):
                locBounds[i].min = locBounds[i].min - kernelSpan[i]
                locBounds[i].max = locBounds[i].max + kernelSpan[i]

        # Searching data inside the database
        samples = self.dataview[locBounds[0].min:locBounds[0].max,
                                locBounds[1].min:locBounds[1].max,
                                locBounds[2].min:locBounds[2].max,
                                locBounds[3].min:locBounds[3].max]

        if len(samples) < 1:
            # No data found, filling the blanks
            return (np.ones((locations.shape[0], 1))*self.kernel.mean,
                    np.ones(locations.shape[0])*self.kernel.variance)
        else:
            # Setting up all the locations of the found samples...
            trainLocations = np.array([[s.position.t,
                                        s.position.x,
                                        s.position.y,
                                        s.position.z]
                                       for s in samples])
            # ...and all their values
            trainValues = np.array([s.data for s in samples]).squeeze()
            if len(trainValues.shape) < 2:
                trainValues = trainValues.reshape(-1, 1)

            # Creating the bounding box of the training data
            boundingBox = (np.min(trainLocations, axis=0),
                           np.max(trainLocations, axis=0))

            # Extending the x/y bounds with the wind-induced drift
            dt = boundingBox[1][0] - boundingBox[0][0]  # delta time
            wind = self.kernel.windMap.get_wind()
            dx, dy = dt*wind
            boundingBox[0][1] = min(boundingBox[0][1], boundingBox[0][1] + dx)
            boundingBox[1][1] = max(boundingBox[1][1], boundingBox[1][1] + dx)
            boundingBox[0][2] = min(boundingBox[0][2], boundingBox[0][2] + dy)
            boundingBox[1][2] = max(boundingBox[1][2], boundingBox[1][2] + dy)

            # Searching all requested locations inside the extended box
            same_locations = np.where(np.logical_and(
                np.logical_and(
                    np.logical_and(
                        locations[:,0] >= boundingBox[0][0] - kernelSpan[0],
                        locations[:,0] <= boundingBox[1][0] + kernelSpan[0]),
                    np.logical_and(
                        locations[:,1] >= boundingBox[0][1] - kernelSpan[1],
                        locations[:,1] <= boundingBox[1][1] + kernelSpan[1])),
                np.logical_and(
                    np.logical_and(
                        locations[:,2] >= boundingBox[0][2] - kernelSpan[2],
                        locations[:,2] <= boundingBox[1][2] + kernelSpan[2]),
                    np.logical_and(
                        locations[:,3] >= boundingBox[0][3] - kernelSpan[3],
                        locations[:,3] <= boundingBox[1][3] + kernelSpan[3])
                    ))))
            # Getting the locations we are interested in
            selected_locations = locations[same_locations]

            # Training of the Gaussian process
            self.gprProc.fit(trainLocations, trainValues)
            computed_locations = self.gprProc.predict(
                selected_locations, return_std=self.computeStd)
            # [...]
```
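The regression itself is delegated to scikit-learn. Below is a minimal, self-contained illustration of the fit/predict pattern used by gprProc, with a generic RBF kernel on 1D data instead of the project's 4D wind-aware kernel:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Training data: values sampled at a few 1D locations
train_locations = np.array([[0.0], [1.0], [2.0], [3.0]])
train_values = np.sin(train_locations).ravel()

# Fit the Gaussian process, as gprProc.fit(trainLocations, trainValues) does
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0))
gpr.fit(train_locations, train_values)

# Predict mean and standard deviation at new locations,
# mirroring the predict(..., return_std=self.computeStd) call above
query = np.linspace(0.0, 3.0, 7).reshape(-1, 1)
mean, std = gpr.predict(query, return_std=True)
```

With return_std=True the prediction is a pair (mean, std), which is exactly what feeds the two elements of the tuple returned by getitem.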
GprPredictor has two subclasses: ValueMap and StdMap.
ValueMap and StdMap¶
You can see ValueMap as an alias of GprPredictor: every call is forwarded to the underlying GprPredictor. The difference is that its getitem function always returns the ScaledArray of Gaussian Process Regression values instead of a tuple.
StdMap follows the same principle, returning only the standard deviation ScaledArray.
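The delegation idea can be sketched like this (simplified stand-ins for illustration; the real classes forward many more methods):

```python
class ValueMap:
    """Sketch: delegate getitem to a GprPredictor and keep only the
    first element of its (values, std) tuple."""
    def __init__(self, gpr):
        self.gpr = gpr

    def __getitem__(self, keys):
        return self.gpr[keys][0]   # values only, drop the std part


class StdMap:
    """Sketch: same idea, keeping only the standard deviation."""
    def __init__(self, gpr):
        self.gpr = gpr

    def __getitem__(self, keys):
        return self.gpr[keys][1]   # std only
```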
Mapping the border of clouds¶
You can map the border of the clouds using the BorderIncertitude class or the BorderRaw class.
To compute the border of the clouds, we take the result of the GprPredictor and apply a threshold to it (clouds = values greater than or equal to the threshold).
Then, we erode the thresholded array and apply a binary xor operation between the thresholded array and its eroded version.
The result of this operation is the border of the clouds.
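The threshold/erode/xor steps above can be reproduced on a small array with scipy.ndimage, which is also what the border classes use internally:

```python
import numpy as np
from scipy import ndimage

# A small thresholded "cloud" mask: 1 where the GPR value >= threshold
cloud = np.array([[0, 0, 0, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 1, 1, 1, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 0, 0, 0]], dtype=np.int32)

# Erosion shrinks the cloud by one pixel on every side
eroded = ndimage.binary_erosion(cloud).astype(np.int32)

# Xor keeps exactly the pixels removed by the erosion: the border
border = np.bitwise_xor(cloud, eroded)
```

Here only the center pixel survives the erosion, so the xor leaves the ring of pixels around it: the cloud border.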
Note: if you use the BorderIncertitude class, you will also need the standard deviation map, and the getitem function will return a couple of ScaledArrays (+/- n times the deviation).
Example of code found inside the BorderIncertitude class:
```python
def at_locations(self, arrays):
    """
    Computes the values of the inner and outer borders using binary
    erosion operations and bitwise exclusive-or operations. Returns a
    tuple of arrays that contain 1/0 values (1 = border, 0 = no border).

    Parameters
    ----------
    arrays : Tuple(2*ScaledArray)
        The arrays to process to obtain the tuple of borders

    Returns
    -------
    Tuple(2*ScaledArray)
        The tuple of border arrays
    """
    typ = np.int32  # Ensuring the type for binary operations

    # Computes the inner and outer cross-section borders
    inner, outer = compute_cross_section_border(arrays[0], arrays[1],
                                                factor=self.factor,
                                                threshold=self.threshold)

    # Keeps the values >= threshold
    thresholded_inner = threshold_array(inner, threshold=self.threshold)
    thresholded_outer = threshold_array(outer, threshold=self.threshold)

    # Erosion operations
    eroded_inner = ndimage.binary_erosion(thresholded_inner).astype(typ)
    eroded_outer = ndimage.binary_erosion(thresholded_outer).astype(typ)

    # Keeps only the borders
    border_inner = np.bitwise_xor(thresholded_inner, eroded_inner)
    border_outer = np.bitwise_xor(thresholded_outer, eroded_outer)

    inner_scarray = ScaledArray(border_inner, arrays[0].dimHelper,
                                arrays[0].interpolation)
    outer_scarray = ScaledArray(border_outer, arrays[1].dimHelper,
                                arrays[1].interpolation)
    return (inner_scarray, outer_scarray)
```