In GPC (Generalized Predictive Control) controllers, we must simulate the process to be controlled. To perform the simulation we must first identify the system. There are mainly three types of identification algorithms: global, structured and local.
Lazy learning postpones all the computation until an explicit request for a prediction is received. The request is fulfilled by locally modelling the relevant data stored in the database and using the newly constructed model to answer the request. Each prediction therefore requires a local modelling procedure. We will use, as regressor, a simple polynomial. The number of points used for the construction of this polynomial (in other words, the kernel size or the bandwidth) and the degree of the polynomial (0, 1 or 2) are chosen using the leave-one-out cross-validation technique. The identification of the polynomial is performed using a recursive least-squares algorithm. The cross-validation is performed at high speed using the PRESS statistic.
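The PRESS trick can be sketched as follows (a minimal Python/NumPy illustration, not the toolbox's actual C++ code): for a linear least-squares fit, the leave-one-out residual of point i equals the ordinary residual divided by 1 − h_ii, where h_ii is the i-th diagonal element of the hat matrix, so the whole cross-validation error is obtained from a single fit instead of n refits.

```python
import numpy as np

def press_loo_errors(X, y):
    """Leave-one-out residuals of a linear least-squares fit, computed
    from a single fit via the PRESS statistic:
    e_loo[i] = e[i] / (1 - h[i]), with h the hat-matrix diagonal."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    # Hat matrix H = X (X'X)^-1 X'; only its diagonal is needed.
    XtX_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    return residuals / (1.0 - h)

# Check against explicit leave-one-out refitting on toy data.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=20)])
y = 2.0 + 3.0 * X[:, 1] + 0.1 * rng.normal(size=20)

fast = press_loo_errors(X, y)
slow = np.empty(len(y))
for i in range(len(y)):
    mask = np.arange(len(y)) != i
    b, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    slow[i] = y[i] - X[i] @ b
```

The two computations agree, but the PRESS version costs a single fit.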
What is leave-one-out cross-validation? It is a technique used to evaluate the quality of a regressor built on a dataset of n points. First, take n−1 points from the dataset (these points form the training set). The remaining point will be used for validation purposes (the validation set). Build the regressor with the training set. Use this new regressor to predict the value of the single point which has not been used during the building process. You obtain a fitting error between the true value of this point and the predicted one. Continue, in a similar way, to build n regressors, each time leaving out a different point. You obtain n fitting errors, one for each regressor you have built. The global cross-validation error is the mean of all these fitting errors.
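The procedure above can be sketched as follows (a plain Python/NumPy illustration, without the PRESS speed-up used by the toolbox): for each candidate degree, leave out one point at a time, fit on the rest, and average the squared prediction errors.

```python
import numpy as np

def loo_cv_error(x, y, degree):
    """Mean squared leave-one-out error of a polynomial of a given degree."""
    n = len(x)
    errors = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i                  # training set: all points but the i-th
        coefs = np.polyfit(x[mask], y[mask], degree)
        errors[i] = y[i] - np.polyval(coefs, x[i])  # fitting error on the held-out point
    return np.mean(errors ** 2)

# Pick the degree (0, 1 or 2) with the lowest cross-validation error.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 15)
y = 1.0 + 2.0 * x + 0.05 * rng.normal(size=15)    # underlying model is linear
best = min((0, 1, 2), key=lambda d: loo_cv_error(x, y, d))
```

On this toy data the constant model is heavily penalized, and cross-validation favours a low-degree polynomial that matches the linear trend.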
We choose the polynomial which has the lowest cross-validation error. We can also give an estimate of the quality of the prediction: it is simply the cross-validation error of the chosen model.
We can also take the k polynomials which have the lowest cross-validation errors and use, as the final model, a weighted average of these k polynomials, each weight being the inverse of the corresponding cross-validation error. The Lazy Learning toolbox offers other combination schemes as well.
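This combination can be sketched as follows (an illustrative Python snippet with made-up candidate models and cross-validation errors, not the toolbox's implementation):

```python
import numpy as np

def combine_best_k(candidates, k, x_query):
    """Weighted average of the k candidate models with the lowest
    cross-validation error; each weight is the inverse of that error.

    `candidates` is a list of (cv_error, coefficients) pairs, where the
    coefficients describe a polynomial in the format used by np.polyval."""
    best = sorted(candidates, key=lambda c: c[0])[:k]
    weights = np.array([1.0 / err for err, _ in best])
    preds = np.array([np.polyval(coefs, x_query) for _, coefs in best])
    return np.sum(weights * preds) / np.sum(weights)

# Three candidate polynomials with hypothetical cross-validation errors.
candidates = [
    (0.50, [1.0, 0.0]),        # degree 1: y = x
    (0.10, [0.0, 2.0]),        # constant model: y = 2
    (0.25, [1.0, 0.0, 0.0]),   # degree 2: y = x^2
]
y_hat = combine_best_k(candidates, k=2, x_query=1.0)
```

The two best models predict 2.0 and 1.0 at the query point, with weights 1/0.10 and 1/0.25, so the combined prediction lands between them, closer to the better model.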
Lazy Learning requires a dataset in which the variance of the noise in the data is only locally constant. There can be huge noise on a small part of the data and nearly no noise everywhere else: this is no problem. On the contrary, global identification tools like neural networks demand that the variance of the noise in the data be globally constant. This is why Lazy Learning almost systematically performs better than other classical global identification techniques.
Another advantage is the ease of on-line adaptation of the regressor. In the case of Takagi-Sugeno fuzzy systems, each time we enlarge the training set we have to wait several minutes for the computation of the new optimal number of rules, the calculation of the new "position" of each rule in the input space and the construction of all the local models. On the contrary, in the case of Lazy Learning, we only have to add one point to the training dataset. That's all. You can see here an example of on-line adaptation with lazy learning.
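The contrast can be sketched with a toy in-memory lazy learner in Python (class and method names are hypothetical, not the toolbox's API): adaptation is a constant-time append, and all modelling is deferred to the next query.

```python
import numpy as np

class LazyDataset:
    """Toy lazy learner: adaptation is just storing the new point."""
    def __init__(self):
        self.x, self.y = [], []

    def add(self, x, y):
        # On-line adaptation: O(1), no retraining of any model.
        self.x.append(x)
        self.y.append(y)

    def query(self, x0, k=5, degree=1):
        """Fit a local polynomial on the k nearest stored points."""
        xs, ys = np.array(self.x), np.array(self.y)
        nearest = np.argsort(np.abs(xs - x0))[:k]
        coefs = np.polyfit(xs[nearest], ys[nearest], degree)
        return np.polyval(coefs, x0)

d = LazyDataset()
for x in np.linspace(0.0, 2.0, 30):
    d.add(x, x ** 2)        # observe samples of y = x^2, one at a time
pred = d.query(1.0)
```

Each `add` call is the entire adaptation step; the local model is built only when `query` is called.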
One drawback of the lazy learning approach is that constructing a good local model at each query takes time. Fortunately, most of the computation time is spent searching for the neighbouring points inside the dataset. For a small dataset (fewer than 10000 points), the query is fulfilled immediately (no perceptible wait on a P3 600 MHz), since the search loop has been highly optimized.
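The dominant cost, the neighbour search, can be kept fast on moderate datasets with a linear scan plus a partial sort, sketched here in Python (the toolbox's optimized C++ loop is of course different):

```python
import numpy as np

def k_nearest(points, query, k):
    """Indices of the k points closest to `query`, nearest first.
    np.argpartition performs a partial sort: O(n) to isolate the k
    smallest distances, instead of O(n log n) for a full sort."""
    dist = np.sum((points - query) ** 2, axis=1)   # squared Euclidean distances
    idx = np.argpartition(dist, k - 1)[:k]          # the k winners, in any order
    return idx[np.argsort(dist[idx])]               # order the k winners by distance

points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
nearest = k_nearest(points, np.array([0.1, 0.1]), k=2)
```

Only the k selected indices are fully sorted, which is what matters when k is small compared to the dataset size.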
The first Lazy Learning toolbox is available for use with Matlab at the homepage of M. Birattari.
The original algorithm has been nearly completely rewritten and now forms a stand-alone package. The source code is available here. It is pure C/C++ code which compiles on both Windows and Unix. The search inside the dataset has been accelerated, and you can now re-configure the parameters on the fly. The object-oriented approach allows a better comprehension of the algorithm. A small example file, "testSetGen.cpp", demonstrates the toolbox. You will find in this paper information on how to use the classes.