Cloud mapping using Gaussian Process Regression » History » Version 2

Fiagnon Paul Etse, 2021-03-17 09:21
h1. Cloud mapping using Gaussian Process Regression
This page shows how to build a map from entries of the database. It introduces the GprPredictor class and its subclasses (ValueMap and StdMap), and also presents the BorderMap subclasses.
h2. The GprPredictor class
GprPredictor is a class that produces maps from the data contained in the database using Gaussian Process Regression. To work, the map generator needs a kernel that defines the length scales for the four dimensions (t, x, y, z). These length scales tell the map which data are relevant and which are not, since wind shifts the position of data over time. So before any computation is done, the data must be selected: at time t, we take the current samples and all past data that fall within the length scales. Using wind*dt (where dt = t_current - t_data), we determine which samples are still relevant. Once this is done, Gaussian Process Regression builds the output array of data. The regression itself is performed by the sklearn library (see "here":https://scikit-learn.org/stable/modules/gaussian_process.html).

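The wind-based selection described above can be sketched as follows. This is a minimal illustration, not the actual nephelae implementation: the function name @select_relevant_samples@ and its signature are hypothetical, and only the time length scale is used for the relevance test.

```python
import numpy as np

def select_relevant_samples(samples, t_current, wind, length_scales):
    """Hypothetical sketch: keep samples still relevant at t_current and
    advect their (x, y) position by wind*dt, with dt = t_current - t_data."""
    samples = np.asarray(samples, dtype=float)        # rows of (t, x, y, z)
    dt = t_current - samples[:, 0]                    # age of each sample
    mask = np.abs(dt) <= length_scales[0]             # within the time length scale
    advected = samples[mask].copy()
    advected[:, 1:3] += dt[mask, None] * np.asarray(wind)  # shift x, y by wind*dt
    return advected
```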
To specify which map you want to compute, you first have to choose the coordinates of the requested map. The training locations and values are then built from the resulting change of coordinate system and the span of the kernel (start coordinates plus three times the kernel span).

The bounding box (the region where the computation is actually done) is obtained by shifting the data borders by the wind displacement. Taking the minimum or maximum of the original and shifted coordinates, depending on the side, yields the final bounding box.

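A minimal sketch of this min/max construction (the function name and array layout are illustrative, not the project's API):

```python
import numpy as np

def extend_bbox_with_wind(bbox_min, bbox_max, wind):
    """Hypothetical sketch: extend a (t, x, y, z) bounding box by the
    horizontal displacement wind*dt, where dt is the time span of the box.
    Depending on the wind sign, either the min or the max side moves."""
    bbox_min = np.array(bbox_min, dtype=float)
    bbox_max = np.array(bbox_max, dtype=float)
    dt = bbox_max[0] - bbox_min[0]                        # time span of the data
    dx, dy = dt * np.asarray(wind)                        # wind displacement over dt
    bbox_min[1] = min(bbox_min[1], bbox_min[1] + dx)      # x min moves if dx < 0
    bbox_max[1] = max(bbox_max[1], bbox_max[1] + dx)      # x max moves if dx > 0
    bbox_min[2] = min(bbox_min[2], bbox_min[2] + dy)      # same for y
    bbox_max[2] = max(bbox_max[2], bbox_max[2] + dy)
    return bbox_min, bbox_max
```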
We then select the candidate locations inside the bounding box extended by +/- the kernel span; after fitting the Gaussian Process with the training values and locations, the regression is evaluated at these locations.

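The fit-then-predict step relies on scikit-learn's GaussianProcessRegressor. Below is a minimal one-dimensional stand-in for the 4D (t, x, y, z) case; the kernel choice and data are toy values, not what GprPredictor actually uses.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

trainLocations = np.array([[0.0], [1.0], [2.0], [3.0]])   # toy 1D "locations"
trainValues = np.sin(trainLocations).ravel()              # toy sample values

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0))
gpr.fit(trainLocations, trainValues)                      # training, as in at_locations
values, std = gpr.predict(np.array([[1.5]]),
                          return_std=True)                # prediction with uncertainty
```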
The getitem function returns a tuple of size two: the first element always contains a ScaledArray of the computed values, and the second element is either None or the ScaledArray of the standard deviation of the Gaussian Process Regression.

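This return contract can be illustrated with a small stand-in (GprPredictor itself is not reproduced here; the class below only mimics the shape of the tuple):

```python
import numpy as np

class FakeGprMap:
    """Hypothetical stand-in mimicking GprPredictor's getitem contract."""
    def __init__(self, computeStd):
        self.computeStd = computeStd

    def __getitem__(self, keys):
        values = np.zeros((4, 4))                            # computed values, always present
        std = np.ones((4, 4)) if self.computeStd else None   # std only when requested
        return (values, std)

values, std = FakeGprMap(computeStd=False)[0]   # std is None in this case
```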
<pre><code class="python">def at_locations(self, locations, locBounds=None):
    """
    Computes predicted value at each given location using GPR.
    This method is used in the map interface when requesting a dense map.
    When requesting a dense map, each location must be the position of
    one pixel of the requested map.

    This method automatically fetches relevant data from self.database
    to compute the predicted values.

    Parameters
    ----------
    locations : numpy.array (N x 4)
        Locations N x (t,x,y,z) for each of the N points where to compute
        a predicted value using GPR.
        Note : this implementation of GprPredictor does not enforce the
        use of a 4D space (t,x,y,z) for locations. However, the
        self.database attribute is most likely a
        nephelae.database.NephelaeDataServer type, which enforces the
        use of a 4D (t,x,y,z) space.

    Returns
    -------
    numpy.array (N x M)
        Predicted values at locations. Can be more than 1-dimensional
        depending on the data fetched from the database.
        Example : if the database contains samples of a 2D wind vector,
        the samples are 2D. The predicted map is then a 2D vector field
        defined on a 4D space-time.

        Note : This method probably needs more refining.
        (TODO : investigate this)
    """
    with self.locationsLock: # This is happening after the change of coordinates
        kernelSpan = self.kernel.span() # Needed below even when locBounds is given
        if locBounds is None:
            locBounds = Bounds.from_array(locations.T)
            locBounds[0].min = locBounds[0].min - kernelSpan[0]
            locBounds[0].max = locBounds[0].max + kernelSpan[0]

            locBounds[1].min = locBounds[1].min - kernelSpan[1]
            locBounds[1].max = locBounds[1].max + kernelSpan[1]

            locBounds[2].min = locBounds[2].min - kernelSpan[2]
            locBounds[2].max = locBounds[2].max + kernelSpan[2]

            locBounds[3].min = locBounds[3].min - kernelSpan[3]
            locBounds[3].max = locBounds[3].max + kernelSpan[3]

        samples = self.dataview[locBounds[0].min:locBounds[0].max,
                                locBounds[1].min:locBounds[1].max,
                                locBounds[2].min:locBounds[2].max,
                                locBounds[3].min:locBounds[3].max] # Searching data inside database

        if len(samples) < 1: # No data found, filling the blanks
            return (np.ones((locations.shape[0], 1))*self.kernel.mean,
                    np.ones(locations.shape[0])*self.kernel.variance)
        else:
            trainLocations = np.array([[s.position.t,
                                        s.position.x,
                                        s.position.y,
                                        s.position.z]
                                       for s in samples]) # Locations of found samples...
            trainValues = np.array([s.data for s in samples]).squeeze() # ...and their values
            if len(trainValues.shape) < 2:
                trainValues = trainValues.reshape(-1,1)

            boundingBox = (np.min(trainLocations, axis=0),
                           np.max(trainLocations, axis=0)) # Bounding box of the samples

            dt = boundingBox[1][0] - boundingBox[0][0] # delta time
            wind = self.kernel.windMap.get_wind()
            dx, dy = dt*wind

            # Extending the bounding box with the wind displacement
            boundingBox[0][1] = min(boundingBox[0][1], boundingBox[0][1] + dx)
            boundingBox[1][1] = max(boundingBox[1][1], boundingBox[1][1] + dx)

            boundingBox[0][2] = min(boundingBox[0][2], boundingBox[0][2] + dy)
            boundingBox[1][2] = max(boundingBox[1][2], boundingBox[1][2] + dy)

            same_locations = np.where(np.logical_and(
                np.logical_and(
                    np.logical_and(
                        locations[:,0] >= boundingBox[0][0] - kernelSpan[0],
                        locations[:,0] <= boundingBox[1][0] + kernelSpan[0]),
                    np.logical_and(
                        locations[:,1] >= boundingBox[0][1] - kernelSpan[1],
                        locations[:,1] <= boundingBox[1][1] + kernelSpan[1])),
                np.logical_and(
                    np.logical_and(
                        locations[:,2] >= boundingBox[0][2] - kernelSpan[2],
                        locations[:,2] <= boundingBox[1][2] + kernelSpan[2]),
                    np.logical_and(
                        locations[:,3] >= boundingBox[0][3] - kernelSpan[3],
                        locations[:,3] <= boundingBox[1][3] + kernelSpan[3])
                )))[0] # Indices of the locations inside the extended bounding box

            selected_locations = locations[same_locations] # Locations we are interested in
            self.gprProc.fit(trainLocations, trainValues) # Training of the Gaussian process
            computed_locations = self.gprProc.predict(
                    selected_locations, return_std=self.computeStd) # Prediction
            [...]
</code></pre>

GprPredictor has two subclasses: ValueMap and StdMap.

h2. ValueMap and StdMap
You can see ValueMap as an alias map of GprPredictor: all of its functions call into GprPredictor. The difference is that its getitem function always returns the ScaledArray of Gaussian Process Regression values instead of a tuple.
StdMap follows the same principle, returning only the standard deviation ScaledArray.

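The aliasing idea can be sketched like this (the class names below are illustrative stand-ins, not the real ValueMap and StdMap, which wrap GprPredictor the same way by returning one element of the tuple):

```python
class FakePredictor:
    """Stand-in for GprPredictor: getitem returns (values, std)."""
    def __getitem__(self, keys):
        return ([1.0, 2.0, 3.0], [0.1, 0.2, 0.3])

class FakeValueMap:
    """Stand-in for ValueMap: forwards to the predictor, keeps only values."""
    def __init__(self, gpr):
        self.gpr = gpr
    def __getitem__(self, keys):
        return self.gpr[keys][0]    # drop the standard deviation part

class FakeStdMap:
    """Stand-in for StdMap: keeps only the standard deviation."""
    def __init__(self, gpr):
        self.gpr = gpr
    def __getitem__(self, keys):
        return self.gpr[keys][1]
```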
h2. Mapping the border of clouds
You can map the border of the clouds using the BorderIncertitude class or the BorderRaw class.
To compute the borders, we take the result of the GprPredictor and apply a threshold to it (clouds = values greater than or equal to the threshold).
Then, we apply a binary xor operation between the thresholded array and its binary erosion.
The result of this operation is the border of the clouds.
Note : if you use the BorderIncertitude class, you will also need the standard deviation map, and the result of getitem will give you a pair of ScaledArrays (+/- n times the deviation).
Example of code found inside the BorderIncertitude class:
<pre><code class="python">def at_locations(self, arrays):
        """
        Computes the values of the inner and outer border using binary erosion
        operations and bitwise exclusive or operations. Returns a tuple of
        arrays that contains 1/0 values (1=border, 0=no border).

        Parameters
        ----------
        arrays : Tuple(2*ScaledArray)
            The arrays to process to obtain the tuple of borders

        Returns
        -------
        Tuple(2*ScaledArray)
            The tuple of border arrays
        """
        typ = np.int32 # Ensuring the type for binary operations
        inner, outer = compute_cross_section_border(arrays[0],
                arrays[1], factor=self.factor, threshold=self.threshold) # Computes borders
        thresholded_inner = threshold_array(inner, threshold=self.threshold) # Keeps the values >= threshold
        thresholded_outer = threshold_array(outer, threshold=self.threshold) # Keeps the values >= threshold
        eroded_inner = ndimage.binary_erosion(thresholded_inner).astype(typ) # Erosion operation
        eroded_outer = ndimage.binary_erosion(thresholded_outer).astype(typ) # Erosion operation
        border_inner = np.bitwise_xor(thresholded_inner, eroded_inner) # Keeps only the inner borders
        border_outer = np.bitwise_xor(thresholded_outer, eroded_outer) # Keeps only the outer borders
        inner_scarray = ScaledArray(border_inner, arrays[0].dimHelper,
                arrays[0].interpolation)
        outer_scarray = ScaledArray(border_outer, arrays[0].dimHelper,
                arrays[0].interpolation)
        return (inner_scarray, outer_scarray)
</code></pre>
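To see the threshold/erosion/xor idea in isolation, here is a toy example on a synthetic field (the values and shapes are made up; only numpy and scipy.ndimage are used, as in the class above):

```python
import numpy as np
from scipy import ndimage

# Toy cloud map: a 5x5 block of high values inside a 7x7 field
values = np.zeros((7, 7))
values[1:6, 1:6] = 1.0

threshold = 0.5
cloud = (values >= threshold).astype(np.int32)           # thresholding: 1 inside the cloud
eroded = ndimage.binary_erosion(cloud).astype(np.int32)  # shrink the cloud by one pixel
border = np.bitwise_xor(cloud, eroded)                   # xor keeps only the border ring
```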