Tutorial: Write an Interpolator¶
In case you want to extend the interpolation facilities its straightforward to write an additional Interpolator.
First, it has to be decided what interpolator should be implemented. In this tutorial a nearest neighbor interpolator is implemented.
Example: Nearest Neighbor Interpolator¶
A Nearest Neighbor interpolator just looks for the dataset, whose distance in the coordinate space is smallest. Then, the properties of this nearest dataset are returned for the requested coordinates. Our implementation in this tutorial will use scipy’s nearest neighbor search algorithm for a fast search and overall performance
Step 0: Colt Factory¶
Interpolator are written as Plugins of the Interpolator Colt-Factory, i.e. they are inherited from the Factory. The Factory takes already care of many things, which are common to any interpolator, e.g. coordinate transformation and energy-only calculations etc. During the example we will encounter many of them and explain them whenever they appear.
1from scipy.spatial import cKDTree
2from pysurf.spp import Interpolator
Step 1: Basic Features¶
Every Interpolator has to implement 5 functions:
- def get(self, request):
fill the request with the demanded data and additionally a flag has to be returned whether the result is trustworthy If fit_only of the SPP is False, an electronic structure calculation is started, else the result of the interpolator is taken. If fit_only is True, the result of the interpolator will be used anyway.
- def get_interpolators(self, db, properties):
for each property that is requested, a separate interpolator is set up. The function has to return a dictionary that contains the property name and the interpolator for the property
- def save(self, filename):
The save method is primarily for machine learning algorithms. The weights have to put in a file.
- def get_interpolators_from_file(self, filename, properties):
This function reads the weightsfile and sets up the interpolators for the properties from the weightsfile. Specifically in machine learning algorithms, its not necessary to train the algorithm again.
- def _train(self):
This function uses the input data to train the interpolator. It is primarily used in machine learning interpolators.
- def loadweights(self, filename):
The weights are loaded from a file.
1 class NearestNeighborInterpolator(Interpolator):
2 """Nearest Neighbor Interpolator"""
3
4 @classmethod
5 def from_config(cls, config, db, properties, logger, energy_only, weightsfile, crdmode,
6 fit_only):
7 """ This classmethod overwrites the from_config method of the factory and is needed
8 if interpolator specific user input is implemented
9 """
10
11 def __init__(self, db, properties, logger, energy_only=False, weightsfile=None,
12 crdmode='cartesian', fit_only=False)
13 """ This init overwrites the init of the interpolator factory. Therefor, it is
14 advisable to call the init of the interpolator factory to make sure that all
15 functionality is working and only additional features are added here.
16 """
17 super().__init__(db, properties, logger, energy_only, weightsfile, crdmode=crdmode,
18 fit_only=fit_only)
19
20
21 def get(self, request):
22 """fill request
23
24 Return request and if data is trustworthy or not
25 """
26
27 def get_interpolators(self, db, properties):
28 """ """
29
30 def save(self, filename):
31 """Save weights"""
32
33 def get_interpolators_from_file(self, filename, properties):
34 """setup interpolators from file"""
35
36 def _train(self):
37 """train the interpolators using the existing data"""
38
39 def loadweights(self, filename):
40 """load weights from file"""
Step 2: User Input¶
PySurf is build around the Colt framework, developed along the lines of this project. To specify specific input needed for your class you simply use the _questions string: In our example we will need 4 user inputs:
- trust_radius_general, float
the radius to decide whether an interpolation is trustworthy
- trust_radius_ci, float
the radius in the region of small energy gaps to decide whether an interpolation is trustworthy
- energy_threshold, float
the threshold to distinguish between regions with small and large energy gap
- norm, str
the norm that is used to measure the distance between points
1 class NearestNeighborInterpolator(Interpolator):
2 """Basic Rbf interpolator"""
3
4 _questions = """
5 trust_radius_general = 0.75 :: float
6 trust_radius_ci = 0.25 :: float
7 energy_threshold = 0.02 :: float
8 norm = euclidean :: str :: [euclidean]
9 """
10 @classmethod
11 def from_config(cls, config, db, properties, logger, energy_only, weightsfile, crdmode, fit_only):
12 trust_radius_general = config['trust_radius_general']
13 trust_radius_CI = config['trust_radius_ci']
14 energy_threshold = config['energy_threshold']
15 #
16 # convert input for norm in corresponding input (p-Norm) for the cKDTree
17 # for more information go to the cKDTree.query documentation
18 if config['norm'] == 'manhattan':
19 norm = 1
20 elif config['norm'] == 'max':
21 norm = 'infinity'
22 else:
23 norm = 2
24 #
25 return cls(db, properties, logger, energy_only=energy_only, weightsfile=weightsfile,
26 crdmode=crdmode, trust_radius_general=trust_radius_general,
27 trust_radius_CI=trust_radius_CI, energy_threshold=energy_threshold,
28 fit_only=fit_only, norm=norm)
29
30 def __init__(self, db, properties, logger, energy_only=False, weightsfile=None,
31 crdmode='cartesian', fit_only=False, trust_radius_general=0.75,
32 trust_radius_CI=0.25, energy_threshold=0.02, norm='euclidean'):
33 self.trust_radius_general = trust_radius_general
34 self.trust_radius_CI = trust_radius_CI
35 self.energy_threshold = energy_threshold
36 self.tree = None
37 self.norm = norm
38 # Call the init method of the Interpolator Factory
39 super().__init__(db, properties, logger, energy_only, weightsfile,
40 crdmode=crdmode, fit_only=fit_only)
Parameters
- db:
databse containing the datasets, on which the interpolation is based on
- properties: list
properties (e.g. [‘energy’, ‘gradient’]) that should be fitted
- logger:
logger to log any incident
- energy_only: bool, optional
if energy_only is True, gradients are derived from the energy surface
- weightsfile: str, optional
filepath, where to save the weights. Not used in the case of the NearestNeighborInterpolator, but needed for the overall framework.
- crdmode: str, optional
Variable to determine whether a coordinate transformation is applied before fitting.
- fit_only: bool, optional
Flag to determine, whether no new QM calculations are performed
- trust_radius_general: float, optional
radius to determine whether fitted result is trustworthy in regions of a large energy gap
- trust_radius_CI: float, optional
radius to determine whether fitted result is trustworthy in regions of a small energy gap
- energy_threshold: float, optional
Threshold to distinguish regions of small and large energy gaps.
- norm: str, optional
Determining the norm for the nearest neighbor search. ‘manhattan’ corresponds to the 1-norm, ‘euclidean’ is the 2-norm, and ‘max’ is the infinity norm.
Step 3: Implement get_interpolators function¶
The next step is to implement the get_interpolators method and the helper class for the NearestNeighborInterpolator of each property NNInterpolator. For each property, an Interpolator is set up, which is an instance of the NNInterpolator class. Each interpolator has to be callable and to return the desired property.
1class NearestNeighborInterpolator(Interpolator):
2 """Nearest Neighbor Interpolator"""
3
4 ...
5 def get_interpolators(self, db, properties):
6 """ """
7 self.tree = cKDTree(self.crds)
8 return {prop_name: NNInterpolator(db, self.tree, prop_name)
9 for prop_name in properties}, len(db)
10
11
12 class NNInterpolator():
13 def __init__(self, db, ckdtree, prop):
14 self.db = db
15 self.tree = ckdtree
16 self.prop = prop
17
18 def __call__(self, crd, request=None, idx=None):
19 if idx is None:
20 dist, idx = self.tree.query(crd)
21 return self.db.get(self.prop, idx)
The get_interpolators method returns a dictionary with the property names as keys and the interpolator for that specific property as value. For each property a separate interpolator has to be set up so that the interpolator factory can handle the interpolators for the properties independently, which allows e.g. the energy_only calculations. Implementing the interpolators in this way, they naturally are included in the code package and the full functionality is available.
To avoid that the cKDTree is set up several times, the NNInterpolator takes the tree as a Parameter. Moreover, if NNInterpolator is called with an index, no nearest neighbor search is performed, but the property of dataset with the index is returned. This is important in the case when several properties are demanded so that the nearest neighbor search is done only once, cf. Step 4 and the get function.
Step 4: Implement get function¶
The get function is called with the request as parameter. It has to fill in the desired results from the fit into the request instance and state whether the fit is trustworthy.
1class NearestNeighborInterpolator(Interpolator):
2 """Nearest Neighbor Interpolator"""
3
4 ...
5
6 def get(self, request):
7 #
8 # Convert coordinate into desired format
9 if self.crdmode == 'internal':
10 crd = internal(request.crd)
11 else:
12 crd = request.crd
13 #
14 # Make nearest neighbor search once and pass it to all interpolators
15 dist, idx = self.tree.query(crd, p=self.norm)
16 for prop in request:
17 request.set(prop, self.interpolators[prop](crd, request, idx))
18 #
19 # Determine whether result is trustworthy, using the trust radii
20 diffmin = np.min(np.diff(request['energy']))
21 is_trustworthy = False
22 if diffmin < self.energy_threshold:
23 if dist < self.trust_radius_CI: is_trustworthy = True
24 else:
25 if dist < self.trust_radius_general: is_trustworthy = True
26 #
27 return request, is_trustworthy
The get function first has to make sure that the interpolators get the right coordinates. Subsequently, the interpolators for all the desired properties are called and the results are put into the request instance. Finally, it is checked, whether the requested point is within the trusted region. The trusted region is devided into two parts, depending whether the smallest energy gap between two potential energy surfaces is small or large. The threshold is given as the energy_threshold as user input as well as the radii trust_radius_ci and trust_radius_general.
Step 5: Implement the save, load and _train methods¶
The NearestNeighborInterpolator does not use the functionality that the interpolators and their parameters are stored to a file and read afterwards, to avoid long training sessions. The training of the Nearest Neighbor search is just to update the cKDTree, which doesn’t take very long. Therefor, these functions are not really used, but implemented in a way to make sure that the full functionality is available.
1class NearestNeighborInterpolator(Interpolator):
2 """Nearest Neighbor Interpolator"""
3
4 ...
5
6def loadweights(self, filename):
7 """ Weights are loaded for the interpolators from a file. As the
8 NearestNeighborInterpolator is not using the save option, also
9 here, interpolators are just set up from the database
10
11 Parameters:
12 -----------
13 filename, str:
14 filepath of the file containing the weights. Not used here!
15 """
16 #
17 self.logger.warning("NearestNeighborInterpolator cannot load weights, interpolators are " +
18 "set up from DB")
19 # As saving is not used, interpolators are set up from the database
20 self.get_interpolators(self.db, self.properties)
21
22 def save(self, filename):
23 """ Method to save the interpolators to a file. Not used here!
24
25 Parameters:
26 -----------
27 filename:
28 filepath where to save the information. Not used here!
29 """
30 #
31 self.logger.warning("NearestNeighborInterpolator cannot be saved to a file")
32
33 def _train(self):
34 """ Method to train the interpolators. In the case of the NearestNeighborInterpolator
35 the cKDTree has to be updated.
36 """
37 #update cKDTree
38 self.tree = cKDTree(self.crds)