DESIGN OF ONLINE LEARNING SCHEMES

The previous section dealt with rewriting the nonlinear system, in particular the functional uncertainty $f^*$, into a form that is convenient for designing online learning models and parameter adaptive laws. In that section we defined the utility variable $\chi$. For each type of system, we presented two equations: the parametric model equation, which shows the dependence of $\chi$ on the parametric function approximator, and the measurement equation, which shows how $\chi$ can be computed from measured signals. In this section, we consider the design of online learning models for nonlinear function approximation, based on the parametric forms derived in the previous section. The online learning model generates a training signal $e(t)$ that is used to approximate the unknown nonlinearities in the system. It consists of the adaptive approximator augmented by identifier dynamics. The identifier dynamics serve two purposes. They incorporate any a priori knowledge into the identification design. They also filter some of the signals, which avoids the use of differentiators and decreases the effects of noise.

We now proceed to the design of online learning schemes for dynamic systems. We will consider the design of two approaches: (i) the Error Filtering Online Learning (EFOL) scheme, and (ii) the Regressor Filtering Online Learning (RFOL) scheme.

4.3.1 Error Filtering Online Learning (EFOL) Scheme

Based on the general parametric model (4.11), the EFOL model is described by

$$\hat\chi(t) = \frac{\lambda}{s+\lambda}\Big[\hat f\big(z(t);\hat\theta(t),\hat\sigma(t)\big)\Big]. \tag{4.36}$$

Therefore, the estimator is obtained by replacing the unknown “optimal” weights $\theta^*$ and $\sigma^*$ by their parameter estimates $\hat\theta(t)$ and $\hat\sigma(t)$, respectively. The output estimation error $e(t)$, which will be used in the update of the parameter estimates, is given by

$$e(t) = \hat\chi(t) - \chi(t), \tag{4.37}$$

where $\chi(t)$, generated by (4.12), is a measurable variable. The architecture of the EFOL scheme is depicted as a block diagram in Figure 4.7. As can be seen from the diagram, the inputs to the EFOL scheme are the plant input vector $u(t)$ and the measurable state vector $z(t)$. The output estimation error $e(t)$, used in the update of the parameter estimates $\hat\theta(t)$ and $\hat\sigma(t)$, can be regarded as the output of the EFOL model.

Alternatively, one may consider the EFOL model as consisting of two components:

(1) the adaptive approximator, which is selected based on the considerations outlined in Chapters 2 and 3; and (2) the remaining components, referred to as the estimator, which contains the filters and the a priori known nonlinearities $f_0$. The block diagram of this configuration is depicted in Figure 4.8. As seen from the diagram, this way of viewing the EFOL model isolates the approximator. This is usually a convenient way to implement the online learning design, since it requires fewer filters.

To gain some intuition behind this online learning scheme, and to understand why it is referred to as an “error filtering” scheme, we use (4.36) and (4.11) to rewrite the output estimation error as

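$$e(t) = \frac{\lambda}{s+\lambda}\Big[\hat f\big(z(t);\hat\theta(t),\hat\sigma(t)\big) - f^*\big(z(t)\big)\Big].$$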

Figure 4.7: Block diagram of the error filtering online learning system. The dashed box under the approximator indicates the dynamics of the parameter estimator.

Figure 4.8: Alternative block diagram configuration for the EFOL model for dynamical systems.

Therefore, $e(t)$ is equal to the filtered version of the approximation error $\hat f(z(t);\hat\theta(t),\hat\sigma(t)) - f^*(z(t))$ at time $t$; hence the term “error filtering.”

A key observation concerns what happens at a specific time $t = t_1$. If the estimation error satisfies $e(t_1) = 0$, this does not necessarily imply that $\hat f(z(t_1);\hat\theta(t_1),\hat\sigma(t_1)) = f^*(z(t_1))$. Conversely, the fact that $\hat f(z(t_1);\hat\theta(t_1),\hat\sigma(t_1)) = f^*(z(t_1))$ does not imply that $e(t_1) = 0$ (see Exercise 4.2). In general, the estimation error signal $e(t)$ follows the approximation error signal $\hat f(z(t);\hat\theta(t),\hat\sigma(t)) - f^*(z(t))$ through first-order decay dynamics with time constant $1/\lambda$. Hence, the larger the value of $\lambda$, the more closely the estimation error follows the approximation error. On the other hand, in the presence of measurement noise, a large value of $\lambda$ allows noise to have a greater effect on the approximator parameters. This may also be seen from Figures 4.7 and 4.8, where $\lambda$ multiplies the state measurement vector $z(t)$.
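The trade-off in the choice of $\lambda$ can be illustrated numerically. The sketch below assumes a hypothetical approximation error $\sin(2t)$ and unit-variance discrete white noise. It discretizes the filter $\lambda/(s+\lambda)$ with forward Euler and, for several values of $\lambda$, reports how far the filtered signal deviates from the approximation error and how much noise passes through. The signals, step size and $\lambda$ values are illustrative assumptions, not part of the design above.

```python
import numpy as np

def filter_lambda(u, lam, dt):
    """Forward-Euler discretization of y = lam/(s + lam)[u] with y(0) = 0."""
    y = np.zeros_like(u)
    for k in range(1, len(u)):
        y[k] = y[k - 1] + dt * lam * (u[k - 1] - y[k - 1])
    return y

dt = 1e-3
t = np.arange(0.0, 10.0, dt)
approx_err = np.sin(2.0 * t)                  # hypothetical approximation error signal
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 1.0, size=t.size)     # hypothetical measurement noise
late = t > 5.0                                # ignore the initial transient

tracking, noise_std = [], []
for lam in (1.0, 10.0, 100.0):
    e = filter_lambda(approx_err, lam, dt)
    tracking.append(np.max(np.abs(e[late] - approx_err[late])))
    noise_std.append(np.std(filter_lambda(noise, lam, dt)[late]))
    print(f"lambda={lam:5.0f}  max|e - err|={tracking[-1]:.3f}  "
          f"filtered noise std={noise_std[-1]:.3f}")
```

As $\lambda$ grows, the tracking deviation shrinks and the filtered noise level rises. This is the qualitative behavior described in the text. The absolute noise numbers depend on the assumed step size.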

The EFOL scheme can be applied to both linearly and nonlinearly parameterized approximators. In the special case of linearly parameterized approximators, the EFOL model described by (4.36) becomes

$$\hat\chi(t) = \frac{\lambda}{s+\lambda}\Big[\hat\theta(t)^T\phi\big(z(t)\big)\Big], \tag{4.38}$$

where $\hat\theta(t)$ are the adjustable parameters and $\phi(z(t))$ is a vector of basis functions. The remaining components of the online learning model remain the same. As presented in Figure 4.8, any of the approximators described in Chapter 3 can be inserted as the approximator component of the online learning scheme.

Equation (4.38) should be contrasted with (4.17). In (4.17), $\theta^*$ is a constant vector that can be factored through the filter without affecting the validity of the equation. In (4.38), $\hat\theta(t)$ cannot be pulled through the filter, because it is not a constant vector.
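The distinction between (4.17) and (4.38) can be checked numerically. The minimal sketch below uses a forward-Euler version of $\lambda/(s+\lambda)$, which is a linear filter. With constant weights, filtering $\theta^T\phi$ gives the same result as $\theta^T$ times the filtered $\phi$. With time-varying weights, the two differ. The state trajectory, the three basis functions and the weight trajectories are hypothetical choices made only for this illustration.

```python
import numpy as np

def lowpass(u, lam, dt):
    """Euler discretization of lam/(s+lam), applied along axis 0, zero initial state."""
    y = np.zeros_like(u)
    for k in range(1, u.shape[0]):
        y[k] = y[k - 1] + dt * lam * (u[k - 1] - y[k - 1])
    return y

dt, lam = 1e-3, 5.0
t = np.arange(0.0, 5.0, dt)
z = np.sin(t)                                          # hypothetical measured state
phi = np.stack([np.ones_like(z), z, z ** 2], axis=1)   # hypothetical basis functions, N = 3

theta_const = np.array([0.5, -1.0, 2.0])               # constant weights, as theta* in (4.17)
theta_tv = np.stack([0.5 + 0.0 * t, -1.0 + t, 2.0 * np.cos(3.0 * t)], axis=1)  # time-varying

# Constant weights: filtering commutes with the inner product.
diff_const = np.max(np.abs(lowpass(phi @ theta_const, lam, dt)
                           - lowpass(phi, lam, dt) @ theta_const))
# Time-varying weights: it does not.
diff_tv = np.max(np.abs(lowpass(np.sum(theta_tv * phi, axis=1), lam, dt)
                        - np.sum(theta_tv * lowpass(phi, lam, dt), axis=1)))
print(f"constant weights: {diff_const:.1e}, time-varying weights: {diff_tv:.3f}")
```

In the constant-weight case, the difference is at round-off level. In the time-varying case, the difference is clearly nonzero. This is why $\hat\theta(t)$ must stay inside the filter in (4.38).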

For readers who are more comfortable with state space representations, the EFOL model can be readily described in state space form using the same procedure described in Section 4.2. Specifically, $\hat\chi(t)$ is described in state space form as

$$\dot{\hat\chi}(t) = -\lambda\hat\chi(t) + \lambda\hat f\big(z(t);\hat\theta(t),\hat\sigma(t)\big). \tag{4.39}$$

To compute the output estimation error $e(t) = \hat\chi(t) - \chi(t)$, the variable $\chi(t)$ is generated according to (4.23)-(4.24). Therefore, the estimation error $e(t)$ is described in state space form as:

$$\dot\xi(t) = -\lambda\xi(t) - \lambda z(t) - f_0\big(z(t),u(t)\big), \tag{4.40}$$

$$e(t) = \hat\chi(t) - \lambda\xi(t) - \lambda z(t). \tag{4.41}$$
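The state space form can be exercised in discrete time. The sketch below assumes a hypothetical scalar plant $\dot z = f_0(z,u) + f^*(z)$. It also assumes particular choices of $f_0$, $f^*$ and a one-parameter approximator $\hat f(z) = \hat\theta z$, with $\hat\theta$ held constant, since no adaptive law is simulated here. The sketch propagates $\hat\chi$ and the states of (4.40) with forward Euler and forms $e(t)$ from (4.41). It then compares the result with the filter $\lambda/(s+\lambda)$ applied directly to $\hat f - f^*$. The initial condition $\xi(0) = -z(0)$ is an assumption that makes $\chi(0) = 0$. With any other choice, the comparison differs by a decaying transient.

```python
import numpy as np

def f0(z, u):
    """A priori known part of the hypothetical plant."""
    return -z + u

def f_star(z):
    """Unknown part (assumed; used only to simulate the plant)."""
    return 0.5 * np.sin(z)

def f_hat(z, theta):
    """Hypothetical approximator with a single basis function phi(z) = z."""
    return theta * z

lam, dt, steps = 10.0, 1e-3, 5000
theta_hat = 0.3        # held constant: no parameter update in this sketch

z = 1.0
xi = -z                # chosen so that chi(0) = lam * (xi + z) = 0
chi_hat = 0.0          # state of the chi_hat filter
ref = 0.0              # lam/(s+lam) applied directly to f_hat - f_star
mismatch = 0.0

for k in range(steps):
    u = np.sin(k * dt)
    e = chi_hat - lam * xi - lam * z                   # (4.41)
    mismatch = max(mismatch, abs(e - ref))
    f0_k, fs_k, fh_k = f0(z, u), f_star(z), f_hat(z, theta_hat)
    # One forward-Euler step for every state, all using values at step k
    chi_hat += dt * (-lam * chi_hat + lam * fh_k)      # chi_hat filter
    xi += dt * (-lam * xi - lam * z - f0_k)            # (4.40)
    ref += dt * (-lam * ref + lam * (fh_k - fs_k))
    z += dt * (f0_k + fs_k)                            # plant, updated last

print(f"max |e - filtered approximation error| = {mismatch:.2e}")
```

Because every state is advanced with the same Euler step, the error computed from measured signals matches the filtered approximation error to round-off. This mirrors the continuous-time identity behind the term “error filtering.”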

Although in this section we have worked only with the filter $\lambda/(s+\lambda)$, the same design procedure can be applied to any SPR filter $W(s)$. Based on the parametric model (4.17), the EFOL model is of the form

$$\hat\chi(t) = W(s)\Big[\hat\theta(t)^T\phi\big(z(t)\big)\Big]. \tag{4.42}$$

4.3.2 Regressor Filtering Online Learning (RFOL) Scheme

The second class of learning models that we consider is the Regressor Filtering Online Learning (RFOL) scheme. As introduced here, this learning model can be designed only for linearly parameterized approximators. It is important to reiterate that the RFOL scheme is not based on the EFOL model (4.38).

Based on the linearly parameterized model (4.18), the RFOL model is described by

$$\hat\chi(t) = \hat\theta(t)^T\zeta(t), \tag{4.43}$$

where $\zeta$ is a vector of filtered basis functions

$$\zeta(t) = \frac{\lambda}{s+\lambda}\Big[\phi\big(z(t)\big)\Big]. \tag{4.44}$$

In the more general case of a filter of the form $W(s)$, $\zeta$ becomes

$$\zeta(t) = W(s)\Big[\phi\big(z(t)\big)\Big]. \tag{4.45}$$

The name “regressor filtering” comes from the placement of the filter $W(s)$ between the basis functions $\phi$ (sometimes referred to as the regressor) and the adjustable parameters $\hat\theta$, as shown in Figure 4.9. As we will see later on, RFOL models allow the use of powerful optimization methods, with provable convergence properties, for deriving parameter adaptive laws.

Figure 4.9: Online learning scheme based on regressor filtering.

An important observation from Figure 4.9 is that the adaptive approximator used in generating $\hat\chi(t)$ is no longer a static mapping. It contains filters, which have dynamics, between the basis functions and the parameters. At any time instant, a static approximator can still be produced as $\hat f(z) = \hat\theta(t)^T\phi(z)$, but it is not utilized in the learning scheme.

In the state space representation, the RFOL model is described by

$$\dot\zeta(t) = -\lambda\zeta(t) + \lambda\phi\big(z(t)\big), \tag{4.46}$$

$$\hat\chi(t) = \hat\theta(t)^T\zeta(t). \tag{4.47}$$
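A minimal forward-Euler sketch of (4.46)-(4.47) follows. It also contrasts $\hat\chi(t)$ with the static map $\hat f(z) = \hat\theta^T\phi(z)$ evaluated at the same instant. The Gaussian radial basis functions (the hypothetical helper `rbf`, with assumed centers and width), the measured trajectory and the fixed $\hat\theta$ are illustrative assumptions. One filter state is kept per basis function.

```python
import numpy as np

def rbf(z, centers, width):
    """Hypothetical Gaussian radial basis vector phi(z); one entry per center."""
    return np.exp(-((z - centers) ** 2) / (2.0 * width ** 2))

centers = np.linspace(-2.0, 2.0, 9)                # assumed centers, N = 9
width = 0.5                                        # assumed common width
lam, dt, steps = 5.0, 1e-3, 3000
theta_hat = np.linspace(-1.0, 1.0, centers.size)   # held fixed for illustration

zeta = np.zeros(centers.size)   # one filter state per basis function
gap = 0.0                       # largest difference between chi_hat and the static map
for k in range(steps):
    z = 1.5 * np.sin(2.0 * k * dt)                 # hypothetical measured trajectory
    phi = rbf(z, centers, width)
    chi_hat = theta_hat @ zeta                     # (4.47): output of the dynamic approximator
    f_static = theta_hat @ phi                     # static f_hat(z), not used for learning
    gap = max(gap, abs(chi_hat - f_static))
    zeta += dt * (-lam * zeta + lam * phi)         # (4.46), forward-Euler step

print(f"N = {zeta.size} filter states; max |chi_hat - f_hat(z)| = {gap:.3f}")
```

The nonzero gap reflects the filter dynamics between $\phi$ and $\hat\theta$. Even with identical parameters, $\hat\chi(t)$ lags the static approximator output.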

To compute the output estimation error $e(t) = \hat\chi(t) - \chi(t)$, the variable $\chi(t)$ is again generated according to (4.23)-(4.24). A key characteristic of the RFOL model is that the output estimation error $e(t)$ satisfies

$$e(t) = \big(\hat\theta(t) - \theta^*\big)^T\zeta(t) - \delta(t). \tag{4.48}$$

Therefore, the relationship between the output estimation error $e(t)$ and the parameter estimation error $\tilde\theta(t) = \hat\theta(t) - \theta^*$ is linear and static. This allows the direct use of linear regression methods.
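The linear, static relationship (4.48) can be verified in simulation. The sketch below assumes a hypothetical scalar plant $\dot z = f_0(z,u) + \theta^{*T}\phi(z)$, so that $\delta = 0$. It generates $\chi = \lambda(\xi + z)$ with the Euler form of (4.40) and generates $\zeta$ with the Euler form of (4.46). It then checks that $e = \hat\theta^T\zeta - \chi$ equals $(\hat\theta-\theta^*)^T\zeta$. Finally, it recovers $\theta^*$ by batch least squares on the pairs $(\zeta, \chi)$, as one example of a linear regression method. The basis functions, $f_0$, the input and the weights are illustrative assumptions. The initial condition $\xi(0) = -z(0)$ is chosen so that $\chi(0) = 0$.

```python
import numpy as np

lam, dt, steps = 5.0, 1e-3, 4000
theta_star = np.array([0.8, -0.4])    # hypothetical "optimal" weights; delta = 0 here
theta_hat = np.array([0.2, 0.1])      # arbitrary current estimate, held fixed

def phi(z):
    """Hypothetical basis vector, N = 2."""
    return np.array([np.tanh(z), np.cos(z)])

def f0(z, u):
    """Assumed a priori known part of the plant."""
    return -2.0 * z + u

z = 0.5
xi = -z                               # gives chi(0) = lam * (xi + z) = 0
zeta = np.zeros(2)
Z, chi, e = [], [], []
for k in range(steps):
    u = np.sin(1.3 * k * dt) + 0.5 * np.cos(0.4 * k * dt)
    chi_k = lam * (xi + z)            # measurable chi(t), consistent with (4.40)-(4.41)
    Z.append(zeta.copy())
    chi.append(chi_k)
    e.append(theta_hat @ zeta - chi_k)        # e = chi_hat - chi, chi_hat from (4.47)
    f0_k, phi_k = f0(z, u), phi(z)
    xi += dt * (-lam * xi - lam * z - f0_k)   # Euler step of (4.40)
    zeta += dt * (-lam * zeta + lam * phi_k)  # Euler step of (4.46)
    z += dt * (f0_k + theta_star @ phi_k)     # plant with f*(z) = theta*^T phi(z)

Z, chi, e = np.array(Z), np.array(chi), np.array(e)
residual = np.max(np.abs(e - Z @ (theta_hat - theta_star)))   # checks (4.48), delta = 0
theta_ls, *_ = np.linalg.lstsq(Z, chi, rcond=None)            # batch linear regression
print(f"residual of (4.48): {residual:.1e}; least-squares estimate: {theta_ls}")
```

Batch least squares is used here only because it is the simplest linear regression method. The online adaptive laws developed for the RFOL scheme are a separate matter.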

A block diagram representation of the overall configuration for the RFOL model is depicted in Figure 4.10. Comparing the EFOL and RFOL configurations shown in Figure 4.8 and Figure 4.10, we notice that the EFOL requires only $n$ filters (where $n$ is the number of state variables), whereas the RFOL requires $n + N$ filters, where $N$ is the number of basis functions. In general, the number of basis functions is quite large, especially when the dimension of the input $z$ is large. Therefore, the RFOL scheme is significantly more demanding computationally than the EFOL scheme.
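The filter counts can be made concrete for one common approximator layout. The sketch below assumes a hypothetical tensor-product grid with $m$ basis functions per input dimension, so that $N = m^d$ for a $d$-dimensional input $z$. The grid assumption and the helper name `filter_counts` are illustrative only. Other approximators from Chapter 3 scale differently.

```python
def filter_counts(n, m, d):
    """Filters needed by EFOL (n) and RFOL (n + N) for a hypothetical tensor grid.

    n: number of state variables
    m: assumed number of basis functions per input dimension
    d: dimension of the approximator input z
    """
    N = m ** d                 # basis functions on a full tensor-product grid
    return {"EFOL": n, "RFOL": n + N}

for d in (1, 2, 4):
    print(d, filter_counts(n=2, m=10, d=d))
```

Under these assumptions, with $n = 2$ and $m = 10$, the EFOL count stays at 2. The RFOL count grows from 12 to 102 to 10002 as $d$ goes from 1 to 4.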

COMPONENTS OF APPROXIMATION BASED CONTROL

Extent of Influence Function Support