Cloud mapping using Gaussian Process Regression

The topic of this page is to show you how to build a map from entries of the database. This page will introduce you to the GprPredictor class and its subclasses (ValueMap, StdMap). There will be also a presentation of the BorderMap subclasses.

The GprPredictor class

GprPredictor is a class that is intended to produce maps based on the data contained in database using Gaussian Process Regression. To work, the Map generator needs a Kernel that will indicate the length scales for the 4 dimensions (time, x, y ,z). This way, the map can know which data is relevant and which one is not, since wind affect position of data in time. So before any computation is done, the data must be chosen. At time t, get the current samples and all the data present the length scales. Using wind*dt (where dt = tcurrent - tdata), we can select which are still relevant. Once done, we can use the Gaussian process regression to create our array of data. The Gaussian process regression is done by the sklearn library (see here).
To know which map you want to compute, you first have to use the choose the coordinates of the map you want to compute. The training locations and values will be built from the the resulting change of coordinate system and the span of the Kernel (start coordinates + 3 times the span of the Kernel).
The bounding box (the real computation of data) is given by the addition of the derivation of wind and its borders. The minimum or maximum of coordinates, depending where the computation is done, creates the real bounding box.
We select the possible locations between the boundingBox +/- the span of the kernel and these are used with the Gaussian Process Regression, after fitting with the train values and locations.
The return of the getitem function is a tuple. This tuple is always a size of two and the first element always contains a ScaledArray of the computed locations. The second position of the tuple can either be None or can be the ScaledArray of the standard deviation of the Gaussian Process Regression.

def at_locations(self, locations, locBounds=None):
    Computes predicted value at each given location using GPR.
    This method is used in the map interface when requesting a dense map.
    When requesting a dense map, each location must be the position of       
    on pixel of the requested map.

    This method automatically fetch relevant data from self.database         
    to compute the predicted values.                                         

    locations : numpy.array (N x 4)
        Locations N x (t,x,y,z) for each of the N points where to compute
        a predicted value using GPR.
        Note : this implementation of GprPredictor does not enforce the
        use of a 4D space (t,x,y,z) for location. However, the
        self.database attribute is most likely a
        nephelae.database.NephelaeDataServer type, which enforce the
        use of a 4D (t,x,y,z) space.

    numpy.array (N x M)
       Predicted values at locations. Can be more than 1 dimensional
       depending on the data fetched from the database.
       Example : If the database contains samples of 2D wind vector.
       The samples are 2D. The predicted map is then a 2D field vector
       defined on a 4D space-time.                                                                                                                  

       Note : This method probably needs more refining.
       (TODO : investigate this)
    with self.locationsLock: # This is happening after the change of coordinates
       if locBounds is None:
           kernelSpan = self.kernel.span()
           locBounds = Bounds.from_array(locations.T)
           locBounds[0].min = locBounds[0].min - kernelSpan[0]
           locBounds[0].max = locBounds[0].max + kernelSpan[0]

           locBounds[1].min = locBounds[1].min - kernelSpan[1]
           locBounds[1].max = locBounds[1].max + kernelSpan[1]

           locBounds[2].min = locBounds[2].min - kernelSpan[2]
           locBounds[2].max = locBounds[2].max + kernelSpan[2]

           locBounds[3].min = locBounds[3].min - kernelSpan[3]
           locBounds[3].max = locBounds[3].max + kernelSpan[3]

       samples = self.dataview[locBounds[0].min:locBounds[0].max,
                               locBounds[3].min:locBounds[3].max] # Searching data inside database

       if len(samples) < 1: # No data found, filling the blanks
           return (np.ones((locations.shape[0], 1))*self.kernel.mean,
           trainLocations =\
                         for s in samples]) # Setting up all the locations of found samples...

                trainValues = np.array([ for s in samples]).squeeze() # ...And all their values
                if len(trainValues.shape) < 2:
                    trainValues = trainValues.reshape(-1,1)

                boundingBox = (np.min(trainLocations, axis=0),
                    np.max(trainLocations, axis=0)) # Creating the bounding box 

                dt = boundingBox[1][0] - boundingBox[0][0] # delta time
                wind = self.kernel.windMap.get_wind()
                dx, dy = dt*wind

                # Creating the bounding box
                boundingBox[0][1] = min(boundingBox[0][1], boundingBox[0][1] +
                boundingBox[1][1] = max(boundingBox[1][1], boundingBox[1][1] +

                boundingBox[0][2] = min(boundingBox[0][2], boundingBox[0][2] +
                boundingBox[1][2] = max(boundingBox[1][2], boundingBox[1][2] +

                same_locations = np.where(np.logical_and(
                            locations[:,0] >= boundingBox[0][0] - kernelSpan[0],
                            locations[:,0] <= boundingBox[1][0] + kernelSpan[0]),
                            locations[:,1] >= boundingBox[0][1] - kernelSpan[1],
                            locations[:,1] <= boundingBox[1][1] + kernelSpan[1])),
                            locations[:,2] >= boundingBox[0][2] - kernelSpan[2],
                            locations[:,2] <= boundingBox[1][2] + kernelSpan[2]),
                            locations[:,3] >= boundingBox[0][3] - kernelSpan[3],
                            locations[:,3] <= boundingBox[1][3] + kernelSpan[3])
                    )))[0] # Searching all similar locations

                selected_locations = locations[same_locations] # Getting locations we are interested in
      , trainValues) # Training of the Gaussian process
                computed_locations = self.gprProc.predict(
                        selected_locations, return_std=self.computeStd) # Locations

GprPredictor has two subclasses : ValueMap and StdMap

ValueMap and StdMap

You can see ValueMap as an alias map of GprPredictor, all functions are calls of the GprPredictor. The difference is that the getitem function will alway return the ScaledArray of Gaussian Process Regression values instead of a tuple.
StdMap follow the same principle, returning only the standard deviation ScaledArray.

Mapping the border of clouds

You can map the border of the clouds using the BorderIncertitude class or the BorderRaw class.
To compute the border of the clouds, we use the result of the GprPredictor and a threshold to apply on it.
Then, we apply a binary xor operation between the thresholded array and the clouds (clouds = values superior or equal to the threshold).
The result of this operation are the border of the clouds.
Note : If you use the BorderIncertitude class, you will need also the standard deviation map and the result of the getitem will give you a couple of ScaledArray (+/- n times the deviation).
Example of code found inside the BorderIncertitude class :

def at_locations(self, arrays):
        Computes the values of the inner and outer border using binary erosion
        operations and bitwise exclusive or operations. Returns a tuple of
        arrays taht conatains 1/0 values (1=border, 0=noborder)
        arrays : Tuple(2*ScaledArray)
            The arrays to process to obtain the tuple of borders
            The tuple of borders array
        typ = np.int32 # Ensuring the type for binary operations
        inner, outer = compute_cross_section_border(arrays[0],
                arrays[1], factor=self.factor, threshold=self.threshold) # Computes borders
        thresholded_inner = threshold_array(inner, threshold=self.threshold) # Returns the values >= thr
        thresholded_outer = threshold_array(outer, threshold=self.threshold) # Returns the values >= thr
        eroded_inner = ndimage.binary_erosion(thresholded_inner).astype(typ) # Erosion operation
        eroded_outer = ndimage.binary_erosion(thresholded_outer).astype(typ) # Erosion operation
        border_inner = np.bitwise_xor(thresholded_inner, eroded_inner) # Returns only the inner borders
        border_outer = np.bitwise_xor(thresholded_outer, eroded_outer)# # Returns only the outer borders
        inner_scarray = ScaledArray(border_inner, arrays[0].dimHelper,
        outer_scarray = ScaledArray(border_outer, arrays[0].dimHelper,
        return (inner_scarray, outer_scarray)

Updated by Fiagnon Paul Etse about 3 years ago · 2 revisions