14
Novel Framework of Robot Force Control Using
Reinforcement Learning
Byungchan Kim1 and Shinsuk Park2
1Center for Cognitive Robotics Research, Korea Institute of Science and Technology
2Department of Mechanical Engineering, Korea University
Korea
1 Introduction
Over the past decades, robotic technologies have advanced remarkably and have proven successful, especially in the field of manufacturing, where conventional position-controlled robots perform simple repetitive tasks in static environments. In recent years, there have been increasing needs for robot systems in many areas that involve physical contact with human-populated environments, yet conventional robotic systems have been ineffective in such contact tasks. Contrary to robots, humans cope with dynamic environments by the aid of excellent adaptation and learning ability. In this sense, a robot force control strategy inspired by human motor control would be a promising approach.
There have been several studies on biologically-inspired motor learning. Cohen et al. suggested an impedance learning strategy for a contact task using an associative search network and applied this approach to a wall-following task (Cohen et al., 1991). Another study investigated a motor learning method for a musculoskeletal arm model in free-space motion using reinforcement learning (Izawa et al., 2002). These studies, however, are limited to rather simple problems. In other studies, artificial neural network models were used for impedance learning in contact tasks (Jung et al., 2001; Tsuji et al., 2004). One of the noticeable works, by Tsuji et al., suggested an on-line virtual impedance learning method that exploits visual information. Despite its usefulness, however, neural network-based learning involves a heavy computational load and may easily lead to local optimum solutions.
The purpose of this study is to present a novel framework of force control for robotic contact tasks. To develop appropriate motor skills for various contact tasks, this study employs the following methodologies. First, our robot control strategy employs impedance control based on a human motor control theory, the equilibrium point control model. The equilibrium point control model suggests that the central nervous system utilizes the spring-like property of the neuromuscular system in coordinating multi-DOF human limb movements (Flash, 1987). Under the equilibrium point control scheme, force can be controlled separately by a series of equilibrium points and modulated stiffness (or, more generally, impedance) at the joints, so the control scheme can be simplified considerably. Second, as the learning framework, reinforcement learning (RL) is employed to optimize the performance of the contact task. RL can handle an optimization problem in an
unknown environment by making sequential decision policies that maximize an external reward (Sutton et al., 1998). While RL is widely used in machine learning, it is not computationally efficient, since it is basically a Monte-Carlo-based estimation method with a heavy calculation burden and large sample variance. To enhance the learning performance, two approaches are usually employed to determine the policy gradient. One approach provides a baseline for the gradient estimator to reduce variance (Peters et al., 2006), and the other suggests a Bayesian update rule for estimating the gradient (Engel et al., 2003). This study employs the former approach in constructing the RL algorithm.
In this work, an episodic natural actor-critic algorithm based on the RLS filter was implemented as the RL algorithm. The episodic Natural Actor-Critic method proposed by Peters et al. is known to be effective for high-dimensional continuous state/action problems and can provide a solution close to the optimum (Peters et al., 2005). An RLS filter is used with the Natural Actor-Critic algorithm to further reduce the computational burden, as in the work of Park et al. (Park et al., 2005). Finally, different task goals or performance indices are selected depending on the characteristics of each task. In this work, the performance indices of two contact tasks were chosen to be optimized: point-to-point movement in an unknown force field, and catching a flying ball. The performance of the tasks was tested through dynamic simulations.
This paper is organized as follows. Section 2 introduces the impedance control method based on the equilibrium point control model. In Section 3, we describe the details of motor skill learning based on reinforcement learning. Finally, simulation results and a discussion of the results are presented.
2 Impedance control based on equilibrium point control model
Mechanical impedance of a robot arm plays an important role in the dynamic interaction between the robot arm and its environment in contact. Impedance control is a widely-adopted control method for executing robotic contact tasks by regulating the mechanical impedance that characterizes the dynamic behavior of the robot at the port of interaction with its environment. The impedance control law may be described as follows (Asada et al., 1986):

\tau_{actuator} = J^T \left[ K_C (x_d - x) - B_C \dot{x} \right] \qquad (1)

where \tau_{actuator} represents the joint torque exerted by the actuators, J is the manipulator Jacobian, and the current and desired positions of the end-effector are denoted by vectors x and x_d, respectively. Matrices K_C and B_C are stiffness and damping matrices in Cartesian space. This form of impedance
control is analogous to the equilibrium point control, which suggests that the resulting torque of the muscles is given by the deviation of the instantaneous hand position from its corresponding equilibrium position. The equilibrium point control model proposes that the muscles and neural control circuits have "spring-like" properties: the central nervous system may generate a series of equilibrium points for a limb, and the "spring-like" properties of the neuromuscular system will tend to drive the motion along a trajectory that follows these intermediate equilibrium postures (Park et al., 2004; Hogan, 1985). Fig. 1 illustrates the concept of the equilibrium point control. Impedance control is an extension of the equilibrium point control in the context of robotics, where robotic control is achieved by imposing an end-effector dynamic behavior described by mechanical impedance.
Figure 1 Conceptual model of equilibrium point control hypothesis
For impedance control of a two-link manipulator, stiffness matrix K_C is formed as follows:

K_C = V \Sigma V^T, \qquad \Sigma = \mathrm{diag}(\lambda_1, \lambda_2) \qquad (3)

In equation (3), the orthogonal matrix V is composed of the eigenvectors of stiffness matrix K_C, and the diagonal elements of diagonal matrix Σ consist of the eigenvalues of stiffness matrix K_C. Stiffness matrix K_C can be graphically represented by the stiffness ellipse (Lipkin et al., 1992). As shown in Fig. 2, the eigenvectors and eigenvalues of the stiffness matrix correspond to the directions and lengths of the principal axes of the stiffness ellipse, respectively. The characteristics of stiffness matrix K_C are determined by three parameters of its corresponding stiffness ellipse: the magnitude (the area of the ellipse: πλ1λ2), the shape (the length ratio of the major and minor axes: λ1/λ2), and the orientation (the directions of the principal axes: θ). By regulating these three parameters, all the elements of stiffness matrix K_C can be determined.
In this study, the stiffness matrix in Cartesian space is assumed to be symmetric and positive definite. This provides a sufficient condition for static stability of the manipulator when it interacts with a passive environment (Kazerooni et al., 1986). It is also assumed that damping matrix B_C is approximately proportional to stiffness matrix K_C. The ratio B_C/K_C is chosen to be a constant of 0.05, as in the work of Won for human arm movement (Won, 1993).
Figure 2 A graphical representation of the end-effector's stiffness in Cartesian space. The lengths λ1 and λ2 of the principal axes and the relative angle θ represent the magnitude and the orientation of the end-effector's stiffness, respectively
For trajectory planning, it is assumed that the trajectory of the equilibrium point for the end-effector, also called the virtual trajectory, has a minimum-jerk velocity profile for smooth movement of the robot arm (Flash et al., 1985). The virtual trajectory is calculated from the start point x_i to the final point x_f as follows:

x_v(t) = x_i + (x_f - x_i)\left[ 10\left(\tfrac{t}{t_f}\right)^3 - 15\left(\tfrac{t}{t_f}\right)^4 + 6\left(\tfrac{t}{t_f}\right)^5 \right] \qquad (4)

where t is the current time and t_f is the duration of the movement.
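As an illustration, the virtual trajectory of equation (4) can be computed with a few lines of code. This is a minimal sketch under the stated minimum-jerk assumption; the function name and array conventions are illustrative, not part of the original work.

```python
import numpy as np

def virtual_trajectory(x_i, x_f, t, t_f):
    """Minimum-jerk equilibrium-point (virtual) trajectory, equation (4).

    x_i, x_f : start and goal positions of the end-effector (2-vectors)
    t, t_f   : current time and movement duration
    """
    s = np.clip(t / t_f, 0.0, 1.0)               # normalized time in [0, 1]
    blend = 10 * s**3 - 15 * s**4 + 6 * s**5     # minimum-jerk position profile
    return np.asarray(x_i) + (np.asarray(x_f) - np.asarray(x_i)) * blend
```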
3 Motor Skill Learning Strategy
A two-link robotic manipulator for two-dimensional contact tasks was modeled as shown in Fig. 3. The robotic manipulator is controlled using the impedance control method based on the equilibrium point control hypothesis, as described in Section 2. The stiffness and damping of the manipulator are modulated during a contact task, while the trajectory of the equilibrium point is given for the task. The manipulator learns the impedance modulation strategy for a specific task through reinforcement learning. The state vector is composed of the joint angles and velocities at the shoulder and elbow joints. The action vector changes the three parameters of the stiffness ellipse: the magnitude (the area of the ellipse), the shape (the length ratio of the major and minor axes), and the orientation (the direction of the principal axes). This section describes the learning method based on reinforcement learning for controlling the task impedance of the two-link manipulator in performing two-dimensional contact tasks.
Figure 3 Two-link robotic manipulator (L1 and L2: Length of the link, M1 and M2: Mass of
the link, I1 and I2: Inertia of the link)
3.1 Reinforcement Learning
The main components of RL are the agent, which is the decision maker, and its interaction with the external environment. In the interaction process, the agent selects action a_t and receives the environmental state s_t and a scalar-valued reward r_t as a result of the action at discrete time t. The reward r_t is a function that indicates the performance of the action. The agent strives to maximize reward r_t by modulating the policy π(s_t, a_t) that chooses the action for a given state s_t. The RL algorithm aims to maximize the total sum of future rewards, or the expected return, rather than the present reward. A discounted sum of rewards during one episode (the sequence of steps to achieve the goal from the start state) is widely used as the expected return:
V^\pi(s) = E\left[ \sum_{k=0}^{T-1} \gamma^k r_{t+k} \,\middle|\, s_t = s \right] \qquad (5)

Here, γ is the discounting factor (0≤γ≤1), and V^π(s) is the value function that represents the expected sum of rewards. The update rule of value function V^π(s) is given as follows:

V^\pi(s_t) \leftarrow V^\pi(s_t) + \alpha \left[ r_t + \gamma V^\pi(s_{t+1}) - V^\pi(s_t) \right] \qquad (6)
In equation (6), the term r_t + γV^π(s_{t+1}) − V^π(s_t) is called the Temporal Difference (TD) error. The TD error indicates whether action a_t at state s_t is good or not. This update rule is repeated to decrease the TD error so that value function V^π(s_t) converges.
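For illustration, a tabular TD(0) update implementing equation (6) is sketched below; the learning-rate value and the dictionary-based value table are assumptions of this sketch, not details from the original chapter.

```python
def td0_update(V, s_t, r_t, s_next, alpha=0.1, gamma=0.99):
    """One temporal-difference update of value table V (a dict keyed by state)."""
    td_error = r_t + gamma * V.get(s_next, 0.0) - V.get(s_t, 0.0)  # TD error
    V[s_t] = V.get(s_t, 0.0) + alpha * td_error                    # equation (6)
    return td_error
```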
3.2 RLS-based episodic Natural Actor-Critic algorithm
For many robotic problems, RL schemes are required to deal with continuous state/action spaces, since learning methods based on discrete spaces are not applicable to high-dimensional systems. High-dimensional continuous state/action problems, however, are more complicated to solve than discrete state/action problems. While the Natural Actor-Critic (NAC) algorithm is known to be an effective approach for solving continuous state/action problems, it imposes a high computational burden in calculating inverse matrices. To relieve this burden, Park et al. suggested a modified NAC algorithm combined with an RLS filter. The RLS filter is used for adaptive signal filtering due to its fast convergence speed and low computational burden, while its possibility of divergence is known to be rather low (Xu et al., 2002). Since this filter is designed for an infinitely repeated task with no final state (a non-episodic task), the approach is unable to deal with episodic tasks.
This work develops a novel NAC algorithm combined with an RLS filter for episodic tasks. We named this algorithm the "RLS-based eNAC (episodic Natural Actor-Critic) algorithm." The RLS-based eNAC algorithm has two separate memory structures: the actor structure and the critic structure. The actor structure determines the policies that select actions at each state, and the critic structure evaluates whether the action selected by the actor structure is good or not.
In the actor structure, the policy at state s_t in episode e is parameterized as π(a_t|s_t) = p(a_t|s_t, ψ_e), and the policy parameter vector ψ_e is iteratively updated after finishing one episode by the following update rule:

\psi_{e+1} \leftarrow \psi_e + \alpha \nabla_\psi J(\psi_e) \qquad (7)

where ∇_ψ J(ψ_e) represents the gradient of the objective function J(ψ_e). Peters et al. derived the gradient of the objective function based on the natural gradient method originally proposed by Amari (Amari, 1998). They suggested a simpler update rule by introducing the natural gradient vector w_e as follows:

\psi_{e+1} \leftarrow \psi_e + \alpha \nabla_\psi J(\psi_e) \approx \psi_e + \alpha w_e \qquad (8)

where α denotes the learning rate (0≤α≤1).
In the critic structure, the least-squares (LS) TD-Q(1) algorithm is used to minimize the TD error, which represents the deviation between the expected return and the current prediction value (Boyan, 1999). By considering the Bellman equation in deriving the LSTD-Q(1) algorithm, the action-value function Q^π(s, a) can be formulated as follows (Sutton et al., 1998):

Q^\pi(s_t, a_t) = E\left[ r_t + \gamma V^\pi(s_{t+1}) \right] \qquad (10)

where V(s_{t+1}) approximates value function V^π(s_{t+1}) as the iterative learning is repeated. Peters et al. showed that the advantage function can be approximated using the compatible function approximation A^\pi(s, a) = \nabla_\psi \log \pi(a|s)^T w_e (Peters et al., 2003). With this assumption, we can rearrange the discounted summation of (10) for one episode trajectory with N states as below:

\sum_{t=0}^{N} \gamma^t \nabla_\psi \log \pi(a_t|s_t)^T w_e + V(s_0) = \sum_{t=0}^{N} \gamma^t r_t + \gamma^{N+1} V(s_{N+1}) \qquad (11)

where the term γ^{N+1} V(s_{N+1}) is negligible for its small effect. By letting V(s_0) be the product of a 1-by-1 critic parameter vector v and a 1-by-1 feature vector [1], we can formulate the following:

\sum_{t=0}^{N} \gamma^t \nabla_\psi \log \pi(a_t|s_t)^T w_e + v = \sum_{t=0}^{N} \gamma^t r_t \qquad (12)
Here, the natural gradient vector w_e is used for updating the policy parameter vector ψ_e (equation (8)), and the value function parameter v_e is used for approximating the value function V^π(s_t). By defining the episode feature vector \phi = \left[ \sum_t \gamma^t \nabla_\psi \log \pi(a_t|s_t)^T, \; 1 \right]^T and the discounted return \bar{r} = \sum_t \gamma^t r_t, we can set G_e = \sum \phi \phi^T and b_e = \sum \phi \bar{r}. Matrix G_e for episode e is updated to yield the solution vector χ_e = [w_e^T, v_e]^T using the following update rule:

\chi_e = (G_e + \delta I)^{-1} b_e \qquad (14)

where δ is a positive scalar constant and I is an identity matrix. By adding the term δI in (14), matrix G_e becomes diagonally dominant and non-singular, and thus invertible (Strang, 1988). As the update progresses, the effect of this perturbation diminishes. Equation (14) is the conventional form of the critic update rule of the eNAC algorithm.
The main difference between the conventional eNAC algorithm and the RLS-based eNAC algorithm is the way matrix G_e and solution vector χ_e are updated. The RLS-based eNAC algorithm employs the update rule of the RLS filter. The key feature of the RLS algorithm is to exploit G_{e-1}^{-1}, which already exists, for the estimation of G_e^{-1}. Rather than calculating P_e (the inverse matrix of G_e) using the conventional method for matrix inversion, we used the RLS filter-based update rule as follows (Moon et al., 2000):

k_e = \frac{P_{e-1}\,\phi_e}{\beta + \phi_e^T P_{e-1}\,\phi_e}, \qquad
\chi_e = \chi_{e-1} + k_e \left( \bar{r}_e - \phi_e^T \chi_{e-1} \right), \qquad
P_e = \frac{1}{\beta}\left( P_{e-1} - k_e\,\phi_e^T P_{e-1} \right) \qquad (15)

where the forgetting factor β (0≤β≤1) is used to accumulate the past information in a discounted manner. By using the RLS filter-based update rule, the inverse matrix can be calculated without too much computational burden. This is repeated until the solution vector χ_e converges. It should be noted, however, that matrix G_e should be obtained using (14) for the first episode (episode 1).
The entire procedure of the RLS-based episodic Natural Actor-Critic algorithm is summarized in Table 1.

for each episode e,
    for each step,
        Take action a_t from stochastic policy π,
        then observe next state s_{t+1} and reward r_t.
    Update critic structure:
        Update P_e and solution vector χ_e = [w_e^T, v_e]^T following the rule in (15).
    Update actor structure:
        Update policy parameter vector following the rule in (8).
repeat until convergence

Table 1 RLS-based episodic Natural Actor-Critic algorithm
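A per-episode update implementing Table 1 can be sketched as follows. This is a minimal illustration assuming the reconstructed equations (8), (12) and (15); the caller supplies the per-step score vectors ∇_ψ log π(a_t|s_t), and the function signature and array shapes are assumptions of this sketch.

```python
import numpy as np

def enac_episode_update(psi, P, chi, grads, rewards,
                        alpha=0.05, beta=0.99, gamma=0.99):
    """One RLS-based eNAC update after an episode.

    psi     : policy parameter vector, shape (n,)
    P       : running estimate of G^{-1}, shape (n+1, n+1)
    chi     : running solution [w; v], shape (n+1,)
    grads   : per-step score vectors grad_psi log pi(a_t|s_t), shape (T, n)
    rewards : per-step rewards r_t, length T
    """
    disc = gamma ** np.arange(len(rewards))
    phi = np.append(disc @ np.asarray(grads), 1.0)   # episode features, eq. (12)
    R = float(disc @ np.asarray(rewards))            # discounted return
    # Critic: RLS update of P ~ G^{-1} and chi = [w; v], eq. (15).
    k = P @ phi / (beta + phi @ P @ phi)
    chi = chi + k * (R - phi @ chi)
    P = (P - np.outer(k, phi @ P)) / beta
    # Actor: natural-gradient step psi <- psi + alpha * w, eq. (8).
    psi = psi + alpha * chi[:-1]
    return psi, P, chi
```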
3.3 Stochastic Action Selection
As discussed in Section 2, the characteristics of the stiffness ellipse can be changed by modulating three parameters: the magnitude, shape, and orientation. For performing contact tasks, policy π is designed to plan the change rates of the magnitude, shape, and orientation of the stiffness ellipse at each state by taking actions a_t = [a_mag, a_shape, a_orient]^T. Through the sequence of an episode, policy π determines those actions. The goal of the learning algorithm is to find the trajectory of the optimal stiffness ellipse during the episode. Fig. 4 illustrates the change of the stiffness ellipse corresponding to each action.
Policy π takes the form of a Gaussian density function. In the critic structure, the compatible function approximation \nabla_\psi \log \pi(a_t|s_t)^T w is derived from the stochastic policy π based on the algorithm suggested by Williams (Williams, 1992). Policy π for each component of action vector a_t can be described as follows:

\pi(a_t|s_t) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(a_t - \mu)^2}{2\sigma^2} \right) \qquad (16)
Since the stochastic action selection depends on the conditions of the Gaussian density function, the action policy can be designed by controlling the mean μ and the variance σ of equation (16). These variables are defined as:

\mu = \xi_\mu \, \omega^T s_t, \qquad
\sigma = \xi_\sigma \left( \frac{1}{1 + \exp(-\eta)} + 0.001 \right) \qquad (17)
As can be seen in the equation, mean μ is a linear combination of the vector ω and the state vector s_t with mean scaling factor ξ_μ. Variance σ takes the form of a sigmoid function with the positive scalar parameter η and variance scaling factor ξ_σ. As a two-link manipulator model is used in this study, the components of state vector s_t include the joint angles and velocities of the shoulder and elbow joints (s_t = [\theta_1, \theta_2, \dot{\theta}_1, \dot{\theta}_2, 1]^T, where the fifth component of 1 is a bias factor). Therefore, the natural gradient vector w (and thus the policy parameter vector ψ) is composed of 18 components as follows:

\psi = \left[ \omega_{mag}^T, \eta_{mag}, \omega_{shape}^T, \eta_{shape}, \omega_{orient}^T, \eta_{orient} \right]^T

where ω_mag, ω_shape, and ω_orient are 5-by-1 vectors and η_mag, η_shape, and η_orient are the parameters corresponding to the three components of action vector a_t = [a_mag, a_shape, a_orient]^T.
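A sketch of the stochastic selection of one action component under equations (16)-(17); the scaling constants and function signature are illustrative assumptions, not the authors' code.

```python
import numpy as np

def select_action(s_t, omega, eta, xi_mu=1.0, xi_sigma=1.0, rng=np.random):
    """Draw one action component (e.g., a_mag) from the Gaussian policy.

    s_t   : state vector [theta1, theta2, dtheta1, dtheta2, 1] (bias last)
    omega : 5-vector of mean weights for this action component
    eta   : scalar shaping the exploration variance
    """
    mu = xi_mu * float(np.dot(omega, s_t))                    # mean, linear in state
    sigma = xi_sigma * (1.0 / (1.0 + np.exp(-eta)) + 0.001)   # sigmoid variance
    return rng.normal(mu, sigma)
```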
4 Contact Task Applications
In this section, the method developed in the previous sections is applied to two contact tasks: point-to-point movement in an unknown force field, and catching a flying ball. The two-link manipulator developed in Section 3 was used to perform the tasks in two-dimensional space. The dynamic simulator was constructed with MSC.ADAMS 2005, and the control algorithm was implemented in Matlab/Simulink (Mathworks, Inc.).
Figure 4 Some variation examples of the stiffness ellipse. (a) The magnitude (area of the ellipse) is changed. (b) The shape (length ratio of the major and minor axes) is changed. (c) The orientation (direction of the major axis) is changed. (solid line: stiffness ellipse at time t, dashed line: stiffness ellipse at time t+1)
4.1 Point-to-Point Movement in an Unknown Force Field
In this task, the two-link robotic manipulator makes a point-to-point movement in an unknown force field. The velocity-dependent force is given as:

F_{viscous} = -B \begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix} \qquad (18)

where B is a constant viscosity matrix. The movement of the endpoint of the manipulator is impeded by this force, proportional to its velocity, as it moves to the goal position in the force field. This condition is identical to that of the biomechanics study of Kang et al. (Kang et al., 2007). In their study, the subject moves one's hand from a point to another point in a velocity-dependent force field, the same as in equation (18), where the force field was applied by a special apparatus (KINARM, BKIN Technology).
For this task, the performance indices are chosen as follows:
1. The root mean square of the difference between the desired position (virtual trajectory) and the actual position of the endpoint (Δ_rms).
2. The magnitude of the time rate of the torque vector of the two arm joints, \dot{\tau} = \sqrt{\dot{\tau}_1^2 + \dot{\tau}_2^2}. The torque rate represents power consumption, which can also be interpreted as the metabolic cost of human arm movement (Franklin et al., 2004; Uno et al., 1989).
By combining the two performance indices, two different rewards are formulated for one episode as follows:

r_1 = -\kappa_1 \int \Delta_{rms}\, dt, \qquad
r_2 = -\left( w_1 \kappa_1 \int \Delta_{rms}\, dt + w_2 \kappa_2 \int \dot{\tau}\, dt \right)

where w_1 and w_2 are weighting factors, and κ_1 and κ_2 are constants. The reward is a weighted linear combination of the time integrals of the two performance indices.
The learning parameters were chosen as follows: α = 0.05, β = 0.99, γ = 0.99. The change limits for the action are set as [-10, 10] degrees for the orientation, [-2, 2] for the major/minor ratio, and [-200π, 200π] for the area of the stiffness ellipse. The initial ellipse before learning was set to be circular with an area of 2500π.
The same physical properties as in (Kang et al., 2007) were chosen for the dynamic simulations (Table 2).
Table 2 Physical properties of two link arm model
Fig. 5 shows the change of the stiffness ellipse trajectory before and after learning. Before learning, the endpoint of the manipulator was not even able to reach the goal position (Fig. 5 (a)). Figs. 5 (b) and (c) compare the stiffness ellipse trajectories after learning using the two different rewards (reward1 and reward2). As can be seen in the figures, for both rewards the major axis of the stiffness ellipse was directed toward the goal position to overcome the resistance of the viscous force field.
Fig. 6 compares the effects of the two rewards on the changes of the two performance indices (Δ_rms and \dot{\tau}) as learning iterates. While the choice of reward does not affect the time integral of Δ_rms, the time integral of \dot{\tau} was suppressed considerably by using reward2 in learning. The results of the dynamic simulations are comparable with the biomechanics study of Kang et al. The results of their study suggest that the human actively modulates the major axis toward the direction of the external force against the motion, which is in accordance with our results.
Figure 5 Stiffness ellipse trajectories (dotted line: virtual trajectory, solid line: actual trajectory). (a) Before learning. (b) After learning (reward1). (c) After learning (reward2)
Figure 6 Learning effects on the performance indices (average of 10 learning trials). (a) position error (b) torque rate
4.2 Catching a Flying Ball
In this task, the two-link robotic arm catches a flying ball, as illustrated in Fig. 7. The simulation was performed using the physical properties of the arm listed in Table 2. The main issues in a ball-catching task are how to detect the ball trajectory and how to reduce the impulsive force between the ball and the end-effector. This work focuses on the latter and assumes that the ball trajectory is known in advance.
Figure 7 Catching a flying ball
When a human catches a ball, one moves one's arm backward to reduce the impulsive contact force. With the human ball-catching in mind, the task is modeled as follows. A ball is thrown toward the end-effector of the robot arm; the time for the ball to reach the end-effector is approximately 0.8 s. After the ball is thrown, the arm starts to move following the parabolic orbit of the flying ball. While the end-effector is moving, the ball is caught, and the two then move to the goal position together. The robot is set to catch the ball when the end-effector is moving at its highest speed, to reduce the impulsive contact force between the ball and the end-effector. The impulsive force can also be reduced by modulating the stiffness ellipse during the contact.
The learning parameters were chosen as follows: α = 0.05, β = 0.99, γ = 0.99. The change limits for the action are set as [-10, 10] degrees for the orientation, [-2, 2] for the major/minor ratio, and [-200π, 200π] for the area of the stiffness ellipse. The initial ellipse before learning was set to be circular with an area of 10000π.
For this task, the contact force is chosen as the performance index, and the reward is formulated as:

r = -\kappa \int \left| F_{contact} \right| dt

where κ is a constant. Fig. 8 illustrates the change of the stiffness during contact after learning. As can be seen in the figure, the stiffness is tuned soft in the direction of the ball trajectory, while the stiffness normal to the trajectory is much higher. Fig. 9 shows the change of the impulse as learning continues. As can be seen in the figure, the impulse was reduced considerably after learning.
Figure 8 Stiffness ellipse during contact after learning
Figure 9 Change of the impulse as learning continues
5 Conclusions
Safety in robotic contact tasks has become one of the most important issues as robots spread into dynamic, human-populated environments. The determination of the impedance control parameters for a specific contact task is the key to enhancing robot performance. This study proposes a novel motor learning framework for determining the impedance parameters required for various contact tasks. As the learning framework, we employed reinforcement learning to optimize the performance of the contact task. We have demonstrated that the proposed framework enhances contact tasks, such as door-opening, point-to-point movement, and ball-catching.
In our future work we will extend our method to teach a service robot that is required to perform more realistic tasks in three-dimensional space. Also, we are currently investigating a learning method to develop motor schemata that combine the internal models of contact tasks with the actor-critic algorithm developed in this study.
6 References
Boyan J (1999) Least-squares temporal difference learning, Proceeding of the 16th
International Conference on Machine Learning, pp 49-56
Cohen, M & Flash, T (1991) Learning impedance parameters for robot control using
associative search network, IEEE Transactions on Robotics and Automation, Vol 7,
Issue 3, pp 382-390, ISSN 1042-296X
Engel, Y.; Mannor, S & Meir, R (2003) Bayes meets Bellman: the Gaussian process approach
to temporal difference learning, Proceeding of the 20th International Conference on Machine Learning, pp 154-161
Flash, T & Hogan, N (1985) The coordination of arm movements: an experimentally
confirmed mathematical model, Journal of Neuroscience, Vol 5, No 7, pp 1688-1703,
ISSN 1529-2401
Flash, T (1987) The control of hand equilibrium trajectories in multi-joint arm movement,
Biological Cybernetics, Vol 57, No 4-5, pp 257-274, ISSN 1432-0770
Franklin, D W.; So, U.; Kawato, M & Milner, T E (2004) Impedance control balances
stability with metabolically costly muscle activation, Journal of Neurophysiology, Vol
92, pp 3097-3105, ISSN 0022-3077
Hogan, N (1985) Impedance control: An approach to manipulation: part I theory, part II
implementation, part III application, ASME Journal of Dynamic System, Measurement, and Control, Vol 107, No 1, pp 1-24, ISSN 0022-0434
Izawa, J.; Kondo, T & Ito, K (2002) Biological robot arm motion through reinforcement
learning, Proceedings of the IEEE International Conference on Robotics and Automation,
Vol 4, pp 3398-3403, ISBN 0-7803-7272-7
Jung, S.; Yim, S B & Hsia, T C (2001) Experimental studies of neural network impedance
force control of robot manipulator, Proceedings of the IEEE International Conference on Robotics and Automation, Vol 4, pp 3453-3458, ISBN 0-7803-6576-3
Kang, B.; Kim, B.; Park, S & Kim, H (2007) Modeling of artificial neural network for the
prediction of the multi-joint stiffness in dynamic condition, Proceeding of IEEE/RSJ International Conference on Intelligent Robots and Systems, pp 1840-1845, ISBN 978-1-
4244-0912-9, San Diego, CA, USA
Kazerooni, H.; Houpt, P K & Sheridan, T B (1986) The fundamental concepts of robust
compliant motion for robot manipulators, Proceedings of the IEEE International Conference on Robotics and Automation, Vol 3, pp 418-427, MN, USA
Lipkin, H & Patterson, T (1992) Generalized center of compliance and stiffness, Proceedings
of the IEEE International Conference on Robotics and Automation, Vol 2, pp 1251-1256,
ISBN 0-8186-2720-4, Nice, France
Moon, T K & Stirling, W C (2000) Mathematical Methods and Algorithm for Signal Processing,
Prentice Hall, Upper Saddle River, NJ, ISBN 0-201-36186-8
Park, J.; Kim, J & Kang, D (2005) An rls-based natural actor-critic algorithm for locomotion
of a two-linked robot arm, Proceeding of International Conference on Computational Intelligence and Security, Part I, LNAI, Vol 3801, pp 65-72, ISSN 1611-3349
Park, S & Sheridan, T B (2004) Enhanced human-machine interface in braking, IEEE
Transactions on Systems, Man, and Cybernetics, - Part A: Systems and Humans, Vol 34,
No 5, pp 615-629, ISSN 1083-4427
Peters, J.; Vijayakumar, S & Schaal, S (2005) Natural actor-critic, Proceeding of the 16th
European Conference on Machine Learning, LNCS, Vol 3720, pp.280-291, ISSN
1611-3349
Peters J & Schaal, S (2006) Policy gradient methods for robotics, Proceeding of IEEE/RSJ
International Conference on Intelligent Robots and Systems, pp 2219-2225, ISBN
1-4244-0259-X, Beijing, China
Strang, G (1988) Linear Algebra and Its Applications, Harcourt Brace & Company
Sutton, R S & Barto, A G (1998) Reinforcement Learning: An Introduction, MIT Press, ISBN
0-262-19398-1
Tsuji, T.; Terauchi, M & Tanaka, Y (2004) Online learning of virtual impedance parameters
in non-contact impedance control using neural networks, IEEE Transactions on Systems, Man, and Cybernetics, - Part B: Cybernetics, Vol 34, Issue 5, pp 2112-2118,
ISSN 1083-4419
Uno, Y.; Suzuki, R & Kawato, M (1989) Formation and control of optimal trajectory in
human multi-joint arm movement: minimum torque change model, Biological Cybernetics, Vol 61, No 2, pp 89-101, ISSN 1432-0770
Williams, R J (1992) Simple statistical gradient-following algorithms for connectionist
reinforcement learning, Machine Learning, Vol 8, No 3-4, pp 229-256 ISSN
1573-0565
Won, J (1993) The Control of Constrained and Partially Constrained Arm Movement, S M
Thesis, Department of Mechanical Engineering, MIT, Cambridge, MA, 1993
Xu, X.; He, H & Hu, D (2002) Efficient reinforcement learning using recursive least-squares
methods, Journal of Artificial Intelligence Research, Vol 16, pp 259-292
15
Link Mass Optimization Using Genetic Algorithms for Industrial Robot Manipulators

1 Introduction

Having optimum mass and minimum joint torques are ways of improving the energy efficiency of industrial robot manipulators. The inverse of the inertia matrix can be used to locally minimize the joint torques (Nedungadi & Kazerouinian, 1989). This approach is similar to global kinetic energy minimization.
Several optimization techniques, such as genetic algorithms (Painton & Campbell, 1995; Chen & Zalzasa, 1997; Choi et al., 1999; Pires & Machado, 1999; Garg & Kumar, 2002; Kucuk & Bingul, 2006; Qudeiri et al., 2007), neural networks (Sexton & Gupta, 2000; Tang & Wang, 2002) and minimax algorithms (Pin & Culioli, 1992; Stocco et al., 1998), have been studied in the robotics literature. Genetic algorithms (GAs) are superior to other optimization techniques in that they search over an entire population instead of a single point, use the objective function instead of derivatives, and deal with parameter codings instead of the parameters themselves. GA has recently found increasing use in several engineering applications such as machine learning, pattern recognition and robot motion planning. It is an adaptive heuristic search algorithm based on the evolutionary ideas of natural selection and genetics, and it provides a robust search procedure for solving difficult problems. In this work, GA is applied to optimize the link masses of a three-link robot manipulator to obtain minimum energy. The rest of the chapter is composed of the following sections. In Section II, genetic algorithms are explained in detail. The dynamic equations and the trajectory generation of robot manipulators are presented in Section III and Section IV, respectively. The problem definition and formulation are described in Section V. In the following section, the rigid-body dynamics of a cylindrical robot manipulator is given as an example. Finally, the contribution of this study is presented in Section VII.
2 Genetic algorithms
GAs were introduced by John Holland at the University of Michigan in the 1970s. The improvements in computational technology have made them attractive for optimization problems. A genetic algorithm is a non-traditional search method used in computing to find exact or approximate solutions to optimization and search problems, based on the evolutionary ideas of natural selection and genetics. The basic concept of GA is to simulate the processes in natural systems necessary for evolution, specifically those that follow the principle first laid down by Charles Darwin of survival of the fittest. As such, GAs represent an intelligent exploitation of a random search within a defined search space to solve a problem. The obtained optima are an end product containing the best elements of previous generations, where the attributes of a stronger individual tend to be carried forward into the following generation. The rule is survival of the fittest. The three basic features of GAs are as follows:
i. Description of the objective function.
ii. Description and implementation of the genetic representation.
iii. Description and implementation of the genetic operators such as reproduction, crossover and mutation.
If these basic features are chosen properly, the genetic algorithm will work quite well for optimization applications. GA optimization possesses some unique features that separate it from other optimization techniques:
i. It requires no knowledge or gradient information about the search space.
ii. It is capable of scanning a vast solution set in a short time.
iii. It searches over an entire population instead of a single point.
iv. It allows a number of solutions to be examined in a single design cycle.
v. It deals with parameter codings instead of the parameters themselves.
vi. Discontinuities in the search space have little effect on overall optimization performance.
vii. It is resistant to becoming trapped in local optima.
These features make GA a robust and useful optimization technique compared with other search techniques (Garg & Kumar, 2002). However, there are some disadvantages to using genetic algorithms:
i. Finding the exact global optimum in the search space is not certain.
ii. Large numbers of fitness function evaluations are required.
iii. Configuration is not straightforward and is problem dependent.
The representation or coding of the variables being optimized has a large impact on search performance, as the optimization is performed on this representation of the variables. The two most common representations, binary and real-number codings, differ mainly in how the recombination and mutation operators are performed. The most suitable choice of representation depends on the type of application. In GAs, a set of solutions represented by chromosomes is created randomly. The chromosomes used here are binary coded. Each zero and one in a chromosome corresponds to a gene. A typical chromosome can be given as
10000010
An initial population of random chromosomes is generated at the beginning. The size of the initial population may vary according to the difficulty of the problem under consideration; a small initial population may, for example, be composed of eight such chromosomes. A different solution to the problem is obtained by decoding each chromosome.
At the reproduction stage, chromosomes are chosen based on natural selection. The chromosomes in the new population are selected according to their fitness values, defined with respect to some specific criterion, using methods such as roulette-wheel selection, rank selection or steady-state selection. The fittest chromosomes have a higher probability of reproducing one or more offspring in the next generation, in proportion to the value of their fitness.
At the crossover stage, two members of the population exchange their genes. Crossover can be implemented in many ways, such as having a single crossover point or many crossover points chosen randomly. A simple crossover can be implemented as follows. In the first step, the newly reproduced members in the mating pool are mated randomly. In the second step, two new members are generated by swapping all characteristics after a randomly selected crossover point. A good value for the crossover rate is 0.7. A simple crossover structure is shown below. Two chromosomes are selected according to their fitness values, and the crossover point (marked "*") is selected randomly:
101*10011
111*00110
After the crossover process is applied, all of the bits after the crossover point are swapped. Hence, the new chromosomes take the form
101*00110
111*10011
At the mutation stage, the value of a particular gene in a selected chromosome is changed from 1 to 0 or vice versa. The probability of mutation is generally kept very small, so these changes are made rarely. In general, the scheme of a genetic algorithm can be summarized as follows (a minimal code sketch is given after the list):
i. Create an initial population.
ii. Check each chromosome to observe how well it solves the problem, and evaluate the fitness of each chromosome based on the objective function.
iii. Choose two chromosomes from the current population using a selection method such as roulette-wheel selection; the chance of being selected is in proportion to the fitness value.
iv. If a probability of crossover is attained, cross over the bits from each chosen chromosome at a randomly chosen point, according to the crossover rate.
v. If a probability of mutation is attained, implement a mutation operation according to the mutation rate.
vi. Continue until a maximum number of generations has been produced.
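The scheme above can be condensed into a short sketch. This is a minimal illustration, not the chapter's implementation: the chromosome length, population size, rates, and the user-supplied `fitness` function are assumptions; roulette-wheel selection here requires non-negative fitness values, so a minimization objective such as the torque cost would be passed in inverted form.

```python
import random

def run_ga(fitness, n_bits=24, pop_size=20, p_cross=0.7, p_mut=0.01, n_gen=50):
    """Minimal binary-coded GA: roulette selection, 1-point crossover, bit mutation."""
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(n_gen):
        scores = [fitness(c) for c in pop]
        new_pop = []
        while len(new_pop) < pop_size:
            # Roulette-wheel selection: probability proportional to fitness.
            p1, p2 = random.choices(pop, weights=scores, k=2)
            c1, c2 = p1[:], p2[:]
            if random.random() < p_cross:          # single-point crossover
                pt = random.randrange(1, n_bits)
                c1, c2 = p1[:pt] + p2[pt:], p2[:pt] + p1[pt:]
            for c in (c1, c2):                     # bit-flip mutation
                for i in range(n_bits):
                    if random.random() < p_mut:
                        c[i] ^= 1
            new_pop += [c1, c2]
        pop = new_pop[:pop_size]
    return max(pop, key=fitness)                   # fittest chromosome found
```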
3 Dynamic Equations
A variety of approaches have been developed to derive the manipulator dynamics equations (Hollerbach, 1980; Luh et al., 1980; Paul, 1981; Kane & Levinson, 1983; Lee et al., 1983). The most popular among them are the Lagrange-Euler (Paul, 1981) and Newton-Euler methods (Luh et al., 1980). The energy-based Lagrange-Euler (LE) method is used to derive the manipulator dynamics in this chapter. To obtain the dynamic equations using the Lagrange-Euler method, one should define the homogeneous transformation matrix for each joint. Using the D-H (Denavit & Hartenberg, 1955) parameters, the homogeneous transformation matrix for a single joint is
T_i^{i-1} =
\begin{bmatrix}
\cos\theta_i & -\sin\theta_i & 0 & a_{i-1} \\
\sin\theta_i \cos\alpha_{i-1} & \cos\theta_i \cos\alpha_{i-1} & -\sin\alpha_{i-1} & -d_i \sin\alpha_{i-1} \\
\sin\theta_i \sin\alpha_{i-1} & \cos\theta_i \sin\alpha_{i-1} & \cos\alpha_{i-1} & d_i \cos\alpha_{i-1} \\
0 & 0 & 0 & 1
\end{bmatrix} \qquad (1)
where a_{i-1}, α_{i-1}, d_i, θ_i, c_i and s_i are the link length, link twist, link offset, joint angle, cosθ_i and sinθ_i, respectively. In this way, the successive transformations from the base toward the end-effector are obtained by multiplying all of the matrices defined for each joint.
The difference between the kinetic and potential energy gives the Lagrangian function

L(q, \dot{q}) = K(q, \dot{q}) - P(q) \qquad (2)

where q and \dot{q} represent the joint positions and velocities, respectively. Note that q_i is the joint angle \theta_i for revolute joints, or the distance d_i for prismatic joints. While the potential energy P depends on position only, the kinetic energy K depends on both the position and the velocity of the manipulator. The total kinetic energy of the manipulator is defined as

K(q, \dot{q}) = \sum_{i=1}^{n} \frac{1}{2} \left[ m_i v_i^T v_i + \omega_i^T I_i \omega_i \right] \qquad (3)
where m_i is the mass of link i and I_i denotes the 3x3 inertia tensor for the center of link i with respect to the base coordinate frame. I_i can be expressed as
I_i = R_i^0 \, I_m \, (R_i^0)^T \qquad (4)

where R_i^0 represents the rotation matrix and I_m stands for the inertia tensor of the rigid object about its center of mass. The terms v_i and \omega_i refer to the linear and angular velocities of the center of mass of link i with respect to the base coordinate frame, respectively, and can be written as

v_i = A_i(q)\,\dot{q}, \qquad \omega_i = B_i(q)\,\dot{q} \qquad (5)

where A_i(q) and B_i(q) are obtained from the Jacobian matrix J_i(q). If v_i and \omega_i in equation 5 are substituted into equation 3, the total kinetic energy is obtained as follows:
K(q, \dot{q}) = \frac{1}{2} \dot{q}^T M(q) \dot{q} \qquad (6)

where M(q) denotes the mass matrix, given by

M(q) = \sum_{i=1}^{n} \left[ m_i A_i(q)^T A_i(q) + B_i(q)^T I_i B_i(q) \right] \qquad (7)
The total potential energy of the manipulator is

P(q) = -\sum_{i=1}^{n} m_i g^T h_i(q) \qquad (9)
where g and h_i(q) denote the gravitational acceleration vector and the center of mass of link i relative to the base coordinate frame, respectively. As a result, the Lagrangian function can
be obtained by combining equations 7 and 9 as follows:

L(q, \dot{q}) = \frac{1}{2} \dot{q}^T M(q) \dot{q} + \sum_{i=1}^{n} m_i g^T h_i(q) \qquad (10)
The equations of motion can be derived by substituting the Lagrangian function in equation 10 into the Euler-Lagrange equations:

\frac{d}{dt} \frac{\partial L(q, \dot{q})}{\partial \dot{q}_i} - \frac{\partial L(q, \dot{q})}{\partial q_i} = \tau_i \qquad (11)

which yields

\tau_i = \sum_{j=1}^{n} M_{ij}(q)\, \ddot{q}_j + \sum_{k=1}^{n} \sum_{j=1}^{n} c_{kj}^i(q)\, \dot{q}_k \dot{q}_j + G_i(q) \qquad (12)

where τ is the n×1 generalized torque vector applied at the joints, and q, \dot{q} and \ddot{q} are the n×1 joint position, velocity and acceleration vectors, respectively. M(q) is the n×n mass matrix, c_{kj}^i(q) are the Coriolis and centrifugal coefficients obtained from the partial derivatives of M(q) with respect to q, and G_i(q) is the gravity loading obtained from the partial derivative of the potential energy with respect to q_i. The friction term is omitted in equation 12. Detailed information about the Lagrangian dynamic formulation can be found in the text (Schilling, 1990).
4 Trajectory Generation
In general, smooth motion between the initial and final positions is desired for the end-effector of a robot manipulator, since jerky motions can cause vibration in the manipulator. Joint and Cartesian trajectories are two common ways to generate smooth motion in robot manipulators. In a joint trajectory, the initial and final positions of the end-effector are converted into joint angles by using the inverse kinematics equations, and a time-dependent smooth function is computed for each joint. All of the robot joints pass through the initial and final points at the same time. Several smooth functions can be obtained by interpolating the joint values. A 5th-order polynomial is defined here under the boundary conditions of the joints (position, velocity, and acceleration):

x(0) = \theta_i, \quad \dot{x}(0) = \dot{\theta}_i, \quad \ddot{x}(0) = \ddot{\theta}_i
x(t_f) = \theta_f, \quad \dot{x}(t_f) = \dot{\theta}_f, \quad \ddot{x}(t_f) = \ddot{\theta}_f

These boundary conditions uniquely specify a particular 5th-order polynomial:

x(t) = s_0 + s_1 t + s_2 t^2 + s_3 t^3 + s_4 t^4 + s_5 t^5 \qquad (14)

The desired velocity and acceleration are calculated, respectively, as

\dot{x}(t) = s_1 + 2 s_2 t + 3 s_3 t^2 + 4 s_4 t^3 + 5 s_5 t^4
\ddot{x}(t) = 2 s_2 + 6 s_3 t + 12 s_4 t^2 + 20 s_5 t^3
where s_0, s_1, ..., s_5 are the coefficients of the polynomial, given by

s_0 = \theta_i, \quad s_1 = \dot{\theta}_i, \quad s_2 = \frac{\ddot{\theta}_i}{2}
s_3 = \frac{20(\theta_f - \theta_i) - (8\dot{\theta}_f + 12\dot{\theta}_i) t_f - (3\ddot{\theta}_i - \ddot{\theta}_f) t_f^2}{2 t_f^3}
s_4 = \frac{30(\theta_i - \theta_f) + (14\dot{\theta}_f + 16\dot{\theta}_i) t_f + (3\ddot{\theta}_i - 2\ddot{\theta}_f) t_f^2}{2 t_f^4}
s_5 = \frac{12(\theta_f - \theta_i) - 6(\dot{\theta}_f + \dot{\theta}_i) t_f - (\ddot{\theta}_i - \ddot{\theta}_f) t_f^2}{2 t_f^5}
where θ_i, θ_f, \dot{\theta}_i, \dot{\theta}_f, \ddot{\theta}_i, \ddot{\theta}_f denote the initial and final positions, velocities and accelerations, respectively, and t_f is the time at the final position. Fig. 1 illustrates the position, velocity and acceleration profiles for a single-link robot manipulator. Note that the manipulator is assumed to be motionless at the initial and final positions.
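The coefficient formulas above amount to solving a 6x6 linear system set up from the boundary conditions; a minimal sketch follows (the solver-based form is an implementation choice of this sketch, not the chapter's code):

```python
import numpy as np

def quintic_coeffs(th_i, th_f, t_f, v_i=0.0, v_f=0.0, a_i=0.0, a_f=0.0):
    """Coefficients s0..s5 of x(t) = s0 + s1*t + ... + s5*t^5 from the six
    boundary conditions on position, velocity and acceleration."""
    A = np.array([
        [1, 0,    0,       0,        0,         0],
        [0, 1,    0,       0,        0,         0],
        [0, 0,    2,       0,        0,         0],
        [1, t_f,  t_f**2,  t_f**3,   t_f**4,    t_f**5],
        [0, 1,    2*t_f,   3*t_f**2, 4*t_f**3,  5*t_f**4],
        [0, 0,    2,       6*t_f,    12*t_f**2, 20*t_f**3],
    ], dtype=float)
    b = np.array([th_i, v_i, a_i, th_f, v_f, a_f], dtype=float)
    return np.linalg.solve(A, b)   # [s0, s1, s2, s3, s4, s5]
```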
5 Problem Formulation
If the manipulator moves freely in the workspace, the dynamic equation of motion for an n-DOF robot manipulator is given by equation 12. This equation can be written in the simple matrix form

M(q)\ddot{q} + C(q, \dot{q})\dot{q} + G(q) = \tau \qquad (24)

Note that the inertia matrix M(q) is a symmetric positive definite matrix. The local optimization of the joint torque weighted by the inertia yields resolutions with global characteristics; that is, it solves the local minimization problem (Lui & Wang, 2004). The formulation of the problem is

Minimize over (m_1, m_2, m_3): \quad \tau^T M^{-1} \tau \qquad (25)

subject to

\dot{J}(q)\dot{q} + J(q)\ddot{q} - \ddot{r} = 0 \qquad (26)

where f(q) is the position vector of the end-effector and J(q) is the Jacobian matrix, defined as

J(q) = \frac{\partial f(q)}{\partial q} \qquad (27)

The objective function in equation 25 also minimizes the kinetic energy.
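Numerically, the objective of equation (25) is evaluated per trajectory sample; a minimal sketch, assuming the mass matrix M and torque vector tau come from the dynamic model of Section 3:

```python
import numpy as np

def torque_objective(M, tau):
    """Inertia-weighted torque cost tau^T M^{-1} tau for one sample.

    M   : n-by-n mass matrix at the sample's joint position
    tau : n-vector of joint torques at the same sample
    """
    # Solve M x = tau rather than forming M^{-1} explicitly (better conditioned).
    return float(tau @ np.linalg.solve(M, tau))
```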
6 Example
A three-link robot manipulator comprising two revolute joints and a prismatic joint is chosen as an example to examine the optimum energy performance. The transformation matrices are obtained using the well-known D-H method (Denavit & Hartenberg, 1955). For simplification, each link of the robot manipulator is modelled as a homogeneous cylindrical or prismatic beam of mass m located at the centre of the link. The kinematics and dynamic representations for the three-link robot manipulator are shown in Fig. 2.
Figure 2 Kinematics and dynamic representations for the three-link robot manipulator
The kinematic and dynamic link parameters for the three-link robot are given in Table 1, where m_i and L_{Ci} stand for the link masses and the centers of mass of the links, respectively. The inertia tensor I_m includes only the diagonal elements I_xx, I_yy and I_zz, the principal moments of inertia about the x, y and z axes. Since the mass distribution of each rigid link is symmetric relative to the body-attached coordinate frame, the cross products of inertia are taken as zero.
The transformation matrices for the robot manipulator illustrated in Fig. 2 are obtained using the parameters in Table 1. Substituting them into the Lagrangian formulation of Section 3 yields the 3x3 mass matrix M(q), whose elements are functions of the link masses m_1, m_2, m_3, the link lengths, the principal moments of inertia I_{zz1}, I_{zz2}, I_{zz3}, and the joint variables, and the generalized torque vector applied at the joints,

\tau = [\tau_1, \tau_2, \tau_3]^T

whose components τ_1, τ_2 and τ_3 follow from equation 12. As a result, the objective function \tau^T M^{-1} \tau is obtained as an explicit function of the link masses and of the joint positions, velocities and accelerations.
6.1 Optimization with Genetic Algorithms
The displacement of the end-effector with respect to time is illustrated in Fig. 3. The end-effector starts from the initial position p_xi=44, p_yi=0, p_zi=18 at t=0 seconds and reaches the final position p_xf=13.2721, p_yf=12.7279, p_zf=2 at t=3 seconds in Cartesian space. When the end-effector performs this motion, each joint has the position, velocity and acceleration profiles obtained from the 5th-order polynomial, as shown in Fig. 4.
Ten samples, obtained from Fig. 4 at time intervals between 0 and 3 seconds, are used for the optimization problem and are given in Table 2. Pos.-i, Vel.-i and Acc.-i represent the position, velocity and acceleration of the ith joint of the robot manipulator, respectively (1≤ i ≤3). The letter "S" represents the sample.
Figure 3 The displacement of the end-effector with respect to the time
Figure 4 The position, velocity and acceleration profiles for the first, second and third joints
Table 2 Position, velocity and acceleration samples of the first, second and third joints
In the robot design optimization problem, the link masses m_1, m_2 and m_3 are limited to an upper bound of 10 and a lower bound of 0. The objective function depends not only on the design variables but also on the joint variables (q_1, q_2, q_3), which have lower and upper bounds (0<q_1<360, 0<q_2<135 and 0<q_3<16), on the joint velocities (\dot{q}_1, \dot{q}_2, \dot{q}_3) and on the joint accelerations (\ddot{q}_1, \ddot{q}_2, \ddot{q}_3). The initial and final velocities of each joint are defined as zero. In order to optimize the link masses, the objective function should be as small as possible at all working positions, velocities and accelerations. The following relationship was adopted to specify the corresponding fitness function:

F = \tau^T M^{-1} \tau\,(1) + \tau^T M^{-1} \tau\,(2) + \cdots + \tau^T M^{-1} \tau\,(9) + \tau^T M^{-1} \tau\,(10)

where the argument (k) denotes evaluation at the kth sample of Table 2.
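The fitness then accumulates this cost over the ten sampled points; a sketch using the `torque_objective` function from the previous section, where `dynamics` is a hypothetical user-supplied callable returning (M, tau) for given masses and a joint-state sample:

```python
def ga_fitness(masses, samples, dynamics):
    """Total inertia-weighted torque cost over the sampled joint states.

    masses   : candidate (m1, m2, m3) decoded from a chromosome
    samples  : iterable of (q, dq, ddq) tuples, e.g. the ten rows of Table 2
    dynamics : callable returning (M, tau) for given masses and sample
    """
    total = 0.0
    for q, dq, ddq in samples:
        M, tau = dynamics(masses, q, dq, ddq)
        total += torque_objective(M, tau)   # tau^T M^{-1} tau per sample
    return total                            # GA minimizes this quantity
```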
In the GA solution approach, the influences of different population sizes and mutation rates were examined to find the best GA parameters for the mass optimization problem of minimizing the total kinetic energy. The population sizes and mutation rates were varied between 20-60 and 0.005-0.1, where the number of iterations was taken as 50 and 100, respectively. The GA parameters used in this study are summarized as follows:
Population size: 20 to 60
Mutation rate: 0.005, 0.01, 0.1
Number of iterations: 50, 100
All results related to these GA parameters are given in Table 3. As can be seen from Table 3, a population size of 20, a mutation rate of 0.01 and an iteration number of 50 produce the minimum kinetic energy of 0.892×10^-14, where the optimum link masses are m_1=9.257, m_2=1.642 and m_3=4.222.
7 Conclusion
In this chapter, the link masses of robot manipulators were optimized using GAs to obtain the best energy performance. First, fundamental knowledge about genetic algorithms, the dynamic equations of robot manipulators and trajectory generation was presented in detail. Second, GA was applied to find the optimum link masses for the three-link serial robot manipulator. Finally, the influences of different population sizes and mutation rates were examined to find the best GA parameters for the mass optimization. The optimum link masses obtained at minimum torque requirements show that the GA optimization technique is very consistent in robotic design applications. The mathematically simple and easy coding features of GA also provide a convenient solution approach for robotic optimization problems.
8 References
Denavit, J.; Hartenberg, R S (1955) A kinematic notation for lower-pair mechanisms based
on matrices, Journal of Applied Mechanics, pp 215-221, vol 1, 1955
Hollerbach, J M (1980) A Recursive Lagrangian Formulation of Manipulator Dynamics and
A Comparative Study of Dynamic Formulation Complexity, IEEE Trans. Systems, Man, and Cybernetics, 1980
Luh, J Y S.; Walker, M W & Paul, R P C (1980) On-Line Computational Scheme for
Mechanical Manipulators, Trans ASME, J Dynamic Systems, Measurement and Control, 1980
Paul, R P (1981) Robot Manipulators: Mathematics, Programming, and Control, MIT Press,
Cambridge Mass., 1981
Kane, T R.; Levinson, D A (1983) The Use of Kane’s Dynamic Equations in Robotics, Int J
of Robotics Research, vol 2, no 3, pp 3-20, 1983
Lee, C S G.; Lee, B H & Nigam, N (1983) Development of the Generalized D’Alembert
Equations of Motion for Robot Manipulators, Proc of 22nd Conf on Decision and Control, 1983
Nedungadi, A.; Kazerouinian, K (1989) A local solution with global characteristics for joint
torque optimization of redundant manipulator, Journal of Robotic Systems, vol 6,
1989
Schilling, R (1990) Fundamentals of Robotics Analysis and Control, Prentice Hall, 1990,
New Jersey
Delingette, H.; Herbert, M & Ikeuchi, K (1992) Trajectory generation with curvature
constraint based on energy minimization, Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 1992
cooperating manipulation, Proceedings of the 23rd Annual Pittsburgh Modeling and
Pin, F.; Culioli, J (1992) Optimal positioning of combined mobile platform-manipulator
systems for material handling tasks,Journal of Intelligent Robotic System, Theory and
Painton, L.; Campbell, J (1995) Genetic algorithms in optimization of system reliability,
Hirakawa, A.;Kawamura, A (1996) Proposal of trajectory generation for redundant
manipulators using variational approach applied to minimization of consumed electrical energy,Proceedings of the Fourth International Workshop on Advanced Motion
Chen, M.; Zalzasa, A (1997) A genetic approach to motion planning of redundant mobile
manipulator systems considering safety and configuration, Journal of Robotic
Stocco, L.; Salcudean, S E & Sassani, F (1998) Fast constraint global minimax optimization
of robot parameters,Robotica,pp 595-605, vol 16, 1998
Choi, S.; Choi, Y & Kim, J (1999).Optimal working trajectory generation for a biped robot
using genetic algorithm, Proceedings of IEEE/RSJ International Conference on
Pires, E.; Machado, J (1999) Trajectory planner for manipulators using genetic algorithms,
1999
Sexton, R.; Gupta, J (2000) Comparative evaluation of genetic algorithm and back
propagation for training neural networks, Information Sciences, pp 45-59, 2000
Garg, D.; Kumar, M (2002) Optimization techniques applied to multiple manipulators for
path planning and torque minimization, Journal for Engineering Applications of Artificial Intelligence, pp 241-252, vol 15, 2002
Trang 32Tang, W.; Wang, J (2002) Two recurrent neural networks for local joint torque optimization
of kinematically redundant manipulators,IEEE Trans Syst Man Cyber, Vol 32(5),
2002
Lui, S.; Wang, J (2004) A dual neural network for bi-criteria torque optimization of
redundant robot manipulators, Lecture Notes in Computer Science, vol 3316, pp 1142-1147, 2004
Kucuk S.; Bingul Z (2006) Link mass optimization of serial robot manipulators using
genetic algorithms, Lecture Notes in Computer Science, Knowledge-Based Intelligent
Qudeiri, J A.; Yamamoto, H & Ramli, R (2007) Optimization of operation sequence in
CNC machine tools using genetic algorithm, Journal of Advance Mechanical Design,
16
FPGA-Realization of a Motion Control IC for
Robot Manipulator
Ying-Shieh Kung and Chia-Sheng Chen
Southern Taiwan University
Taiwan
1 Introduction
Robotic control is currently an exciting and highly challenging research focus Several solutions for implementing the control architecture for robots have been proposed (Kabuka et al., 1988; Yasuda, 2000; Li et al., 2003; Oh et al., 2003) Kabuka et al (1998) apply two high-performance floating-point signal processors and a set of dedicated motion controllers to build
a control system for a six-joint robots arm Yasuda (2000) adopts a PC-based microcomputer and several PIC microcomputers to construct a distributed motion controller for mobile robots
Li et al (2003) utilize an FPGA (Field Programmable Gate Array) to implement autonomous fuzzy behavior control on mobile robot Oh et al (2003) present a DSP (Digital Signal Processor) and a FPGA to design the overall hardware system in controlling the motion of biped robots However, these methods can only adopt PC-based microcomputer or the DSP chip to realize the software part or adopt the FPGA chip to implement the hardware part of the robotic control system They do not provide an overall hardware/software solution by a single chip in implementing the motion control architecture of robot system
For the progress of VLSI technology, the Field programmable gate arrays (FPGAs) have been widely investigated due to their programmable hard-wired feature, fast time-to-market, shorter design cycle, embedding processor, low power consumption and higher density for implementing digital system FPGA provides a compromise between the special-purpose ASIC (application specified integrated circuit) hardware and general-purpose processors (Wei
et al., 2005) Hence, many practical applications in motor control (Zhou et al., 2004; Yang et al., 2006; Monmasson & Cirstea, 2007) and multi-axis motion control (Shao & Sun, 2005) have been studied, using the FPGA to realize the hardware component of the overall system
The novel FPGA (Field Programmable Gate Array) technology is able to combine an embedded processor IP (Intellectual Property) and an application IP to be an SoPC (System-on-a-Programmable-Chip) developing environment (Xu et al., 2003; Altera, 2004; Hall & Hamblen, 2004), allowing a user to design an SoPC module by mixing hardware and software The circuits required with fast processing but simple computation are suitable to be implemented by hardware in FPGA, and the highly complicated control algorithm with heavy computation can be realized by software in FPGA Results that the software/hardware co-design function increase the programmable, flexibility of the designed digital system and reduce the development time Additionally, software/hardware parallel processing enhances the controller performance Our previous works (Kung & Shu, 2005; Kung et al., 2006; Kung &
Trang 34Tsai, 2007) have successfully applied the novel FPGA technology to the servo system of PMSM drive, robot manipulator and X-Y table
To exploit these advantages, this study presents a fully digital motion control IC for a five-axis robot manipulator based on FPGA technology, as shown in Fig. 1(a). Firstly, the mathematical model and the servo controller of the robot manipulator are described. Secondly, the inverse kinematics and the motion trajectory are formulated. Thirdly, the circuit designed to implement the functions of the motion control IC is introduced. The proposed motion control IC has two IPs. One IP performs the motion trajectory functions for the robot manipulator; the other IP performs the inverse kinematics and the five-axis position controllers. The former is implemented by software using the Nios II embedded processor, owing to its complicated processing but low sampling-frequency requirement (motion trajectory: 100 Hz). The latter is realized by hardware in the FPGA, owing to its high sampling-frequency requirements (position loop: 762 Hz, speed loop: 1525 Hz, PWM circuit: 4-8 MHz) but simple processing. To reduce the usage of logic gates in the FPGA, an FSM (Finite State Machine) method is presented to model the overall computation of the inverse kinematics, the five-axis position controllers and the five-axis speed controllers; VHDL (VHSIC Hardware Description Language) is then adopted to describe the FSM circuit. Hence, all the functionality required to build a fully digital motion controller for a five-axis robot manipulator can be integrated in one FPGA chip. As a result, hardware/software co-design technology makes the motion controller of the robot manipulator more compact, flexible and cost-effective, with better performance. The FPGA chip employed herein is an Altera Stratix II EP2S60F672C5ES (Altera, 2008), which has 48,352 ALUTs (Adaptive Look-Up Tables), a maximum of 718 user I/O pins and a total of 2,544,192 RAM bits. A Nios II embedded processor, which has a 32-bit configurable CPU core, 16 Mbytes of Flash memory, 1 Mbyte of SRAM and 16 Mbytes of SDRAM, is used to construct the SoPC development environment. Finally, an experimental system consisting of an FPGA experimental board, five sets of inverters and a Movemaster RV-M1 micro robot produced by Mitsubishi is set up to verify the correctness and effectiveness of the proposed FPGA-based motion control IC.
2 Architecture of the motion control system
The typical architecture of a conventional motion control system for a robot manipulator is shown in Fig. 1(b); it consists of a central controller, five sets of servo drivers and one robot manipulator. The central controller, which usually adopts a floating-point processor, performs the functions of motion trajectory generation, inverse kinematics, and data communication with the servo drivers and the external device. Each servo driver uses a fixed-point processor, some specific ICs and an inverter to perform the position control of one axis of the robot manipulator and the data communication with the central controller. The communication medium between them may be an analog signal, a bus signal or a serial asynchronous signal. Once the central controller receives a motion command from the external device, it executes the computation of the motion trajectory and the inverse kinematics, generates five position commands and sends those signals to the five servo drivers. Each servo driver, having received its position command from the central controller, executes its position controller to control the servo motor of the manipulator. As a result, the motion trajectory of the robot manipulator follows the prescribed motion command. However, the motion control system in Fig. 1(b) has some drawbacks, such as large volume, susceptibility to noise, high cost and inflexibility. In addition, the data communication and handshake protocol between the central controller and the servo drivers slow down the execution speed of the system. To overcome these drawbacks,
based on FPGA technology, the central controller and the controller parts of the five servo drivers in Fig. 1(b) are integrated into a single motion control IC in this study, as shown in Fig. 1(a). The proposed motion control IC has two IPs: one IP performs the motion trajectory functions by software, and the other IP performs the inverse kinematics and the five-axis position controllers by hardware. Hence, hardware/software co-design technology makes the motion controller of the robot manipulator more compact, flexible and cost-effective, with better performance. The detailed design methodology is described in the following sections, and the contributions of this study are summarized in the conclusion.
Figure 1 (a) The proposed FPGA-based motion control IC for the five-axis articulated robot manipulator; (b) the typical architecture of a conventional motion control system, with a central controller issuing the 1st to 5th axis position commands to five servo drivers (controller, inverter and QEP encoder feedback per axis), each driving one servo motor (J1- to J5-axis) of the five-axis articulated robot manipulator
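Before detailing the model, the multi-rate structure described above (motion trajectory at 100 Hz in software; position and speed loops at 762 Hz and 1525 Hz in hardware) can be pictured with a small scheduler model. The following C sketch is illustrative only: in the actual IC the fast loops are parallel FPGA circuits rather than sequential function calls, and all names here (base_tick, position_loop, speed_loop, trajectory_and_ik) are hypothetical.

#include <stdio.h>

#define AXES 5

/* Stubs standing in for the real controllers; in the proposed IC the
 * fast loops are parallel FPGA circuits and the trajectory/inverse-
 * kinematics task runs in software on the Nios II processor. */
static void trajectory_and_ik(void) { /* ~100 Hz software task */ }
static void position_loop(int axis) { (void)axis; /* ~762 Hz  */ }
static void speed_loop(int axis)    { (void)axis; /* 1525 Hz  */ }

/* Multi-rate model of Fig. 1(a): a 1525 Hz base tick runs the speed
 * loops every tick, the position loops every 2nd tick (~762 Hz) and
 * the trajectory generation roughly every 15th tick (~100 Hz). */
static void base_tick(unsigned long n)
{
    if (n % 15 == 0)
        trajectory_and_ik();
    for (int i = 0; i < AXES; i++) {
        if (n % 2 == 0)
            position_loop(i);
        speed_loop(i);
    }
}

int main(void)
{
    for (unsigned long n = 0; n < 1525; n++)  /* simulate one second */
        base_tick(n);
    puts("one second of control ticks simulated");
    return 0;
}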
3 System description and design method of robot manipulator
3.1 Mathematical model of a robot manipulator with actuator
The dynamic equation of the n-link manipulator is given by (Lewis et al., 1993):

    M(θ)θ̈ + V(θ, θ̇)θ̇ + G(θ) + F(θ̇) = τ                                  (1)

where M(θ) denotes the inertia matrix; V(θ, θ̇) denotes the Coriolis/centripetal matrix; G(θ) denotes the gravity vector; and F(θ̇) denotes the friction vector. The terms θ, θ̇, θ̈ and τ represent the n-vectors of the position, the velocity, the acceleration and the generalized forces, respectively, all defined on R^n. The dynamic model of the ith-axis DC motor drive system for the robot manipulator is expressed as

    v_i = L_ai (di_i/dt) + R_ai i_i + K_bi θ̇_Mi                            (2)

    τ_Mi = K_Mi i_i                                                        (3)

    J_Mi θ̈_Mi + B_Mi θ̇_Mi + F_Mi(θ̇_Mi) + r_i τ_i = τ_Mi                   (4)

where L_ai, R_ai, i_i, v_i, K_bi and K_Mi denote the inductance, resistance, current, voltage, voltage constant and current constant, respectively. The terms J_Mi, B_Mi and τ_Mi denote the inertia, viscous-friction coefficient and generated torque of the motor, and r_i τ_i denotes the load torque of the ith arm link of the robot manipulator reflected through the gear, respectively.
Considering the motor friction F_Mi(θ̇_Mi) and assuming that the inductance term in (2) is small and can be ignored, (2) to (4) can be arranged and simplified into the following equation:

    J_Mi θ̈_Mi + (B_Mi + K_bi K_Mi / R_ai) θ̇_Mi + F_Mi + r_i τ_i = (K_Mi / R_ai) v_i    (5)
Therefore, the dynamics of the DC motors that drive the arm links are given by the n decoupled equations

    J_M θ̈_M + B θ̇_M + F_M + R τ = K v                                     (6)

with

    J_M = diag{J_Mi},   B = diag{B_Mi + K_bi K_Mi / R_ai},   R = diag{r_i},
    K = diag{K_Mi / R_ai},   v = vec{v_i},   F_M = vec{F_Mi(θ̇_Mi)}         (7)
The dynamic equations including both the robot manipulator and the DC motors can be obtained by combining (1) and (6). The gear ratio of the coupling from motor i to arm link i is given by r_i, which is defined as

    r_i = θ_i / θ_Mi                                                       (8)
Hence, substituting (8) into (6), and then into (1), it can be obtained that

    (J_M + R M(θ) R) θ̈_M + (B + R V(θ, θ̇) R) θ̇_M + R G(θ) + R F(θ̇) + F_M = K v    (9)

The gear ratio r_i of a commercial robot manipulator is usually small (< 1/100) to increase the torque value; therefore, the formula can be simplified. From (9), the dynamic equation combining the motor and the arm link at the ith axis is expressed as

    (J_Mi + r_i² m_ii(θ)) θ̈_Mi + (B_Mi + K_bi K_Mi / R_ai) θ̇_Mi + r_i F_i + F_Mi = (K_Mi / R_ai) v_i    (10)

where m_ii(θ) denotes the ith diagonal element of the inertia matrix M(θ), and the coupling torque neglected in (10) owing to the small gear ratio is

    τ_di = r_i ( Σ_{j≠i} m_ij(θ) θ̈_j + Σ_{j=1}^{n} Σ_{k=1}^{n} c_jki(θ) θ̇_j θ̇_k + G_i(θ) )    (11)

where c_jki(θ) are the Christoffel symbols associated with M(θ).
Figure 2 Control block diagram of the five-axis motion and servo controller: for each axis (1st to 5th), the position-loop P controller and the speed-loop digital PI controller (gains K_p, K_i) drive the DC motor and arm-link dynamics (armature K_Mi/(L_ai s + R_ai), mechanical 1/(J_Mi s + B_Mi), and an integrator producing the motor position θ_Mi), following the position commands θ*_1 to θ*_5
Neglecting the coupling torque of (11), the per-axis dynamics can be written compactly as

    J_i θ̈_Mi + B_i θ̇_Mi + r_i F_i = (K_Mi / R_ai) v_i                      (12)

where J_i = J_Mi + r_i² m_ii(θ) and B_i = B_Mi + K_bi K_Mi / R_ai. Substituting (7) and (8) into (12) yields

    J_Mi θ̈_Mi + (B_Mi + K_bi K_Mi / R_ai) θ̇_Mi + F_Mi = (K_Mi / R_ai) v_i    (13)
If the gear ratio is small, the control block diagram combining the arm link, the motor actuator, the position-loop P controller, the speed-loop PI controller, the inverse kinematics and the trajectory planning for the servo and motion controller of a five-axis robot manipulator can be represented as in Fig. 2.
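As a sanity check on the simplified model, the following C sketch integrates the per-axis equation (13) with a forward-Euler step. It is an illustrative simulation, not part of the chapter's FPGA design, and all parameter values are placeholders rather than the RV-M1's actual constants.

#include <stdio.h>

/* One-axis simplified DC-motor/arm model of (13):
 *   J_M*acc + (B_M + Kb*KM/Ra)*vel + F_M = (KM/Ra)*v
 * Parameter values below are placeholders. */
typedef struct {
    double Jm, Bm, Kb, Km, Ra, Fm; /* inertia, damping, constants, Coulomb friction */
    double th, w;                  /* motor position (rad) and speed (rad/s)        */
} axis_t;

static void axis_step(axis_t *a, double v, double dt)
{
    double fric   = (a->w > 0) - (a->w < 0);            /* sign(w)       */
    double torque = (a->Km / a->Ra) * v
                  - (a->Bm + a->Kb * a->Km / a->Ra) * a->w
                  - a->Fm * fric;
    double acc    = torque / a->Jm;                     /* from (13)     */
    a->w  += acc * dt;                                  /* forward Euler */
    a->th += a->w * dt;
}

int main(void)
{
    axis_t ax = { .Jm = 1e-4, .Bm = 1e-4, .Kb = 0.05,
                  .Km = 0.05, .Ra = 2.0, .Fm = 0.0 };
    for (int n = 0; n < 1000; n++)                      /* 1 s at 1 kHz, 1 V step */
        axis_step(&ax, 1.0, 1e-3);
    printf("speed after 1 s: %.3f rad/s\n", ax.w);
    return 0;
}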
In Fig. 2, the digital PI controllers in the speed loop are formulated as follows:

    v_p(n) = k_p e(n)                                                      (14)

    v_I(n) = v_I(n−1) + k_i e(n)                                           (15)

    v(n) = v_p(n) + v_I(n)                                                 (16)

where k_p and k_i are the P-controller gain and the I-controller gain, e(n) is the error between the command and the sensor value, and v(n) is the PI controller output.
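Equations (14)-(16) map directly to a few lines of fixed-point code. The sketch below is a minimal C model of the speed-loop PI controller; the Q15 gain scaling and the anti-windup clamp are illustrative assumptions that the equations themselves do not include. In the chapter's design this arithmetic is realized in FPGA hardware, with the FSM sequencing the multiplications so that multiplier resources can be shared.

#include <stdint.h>

/* Digital PI speed controller of (14)-(16):
 *   v_p(n) = kp*e(n);  v_I(n) = v_I(n-1) + ki*e(n);  v(n) = v_p(n) + v_I(n)
 * Gains are in Q15 fixed point, as is common in FPGA realizations. */
typedef struct {
    int32_t kp_q15, ki_q15;  /* P and I gains, Q15 fixed point */
    int32_t vI;              /* integral term v_I(n-1)         */
} pi_t;

static int32_t pi_update(pi_t *c, int32_t cmd, int32_t meas)
{
    int32_t e  = cmd - meas;                 /* e(n)        */
    int32_t vp = (c->kp_q15 * e) >> 15;      /* (14)        */
    c->vI     += (c->ki_q15 * e) >> 15;      /* (15)        */

    /* anti-windup clamp on the integrator (illustrative limits) */
    if (c->vI >  32767) c->vI =  32767;
    if (c->vI < -32768) c->vI = -32768;

    return vp + c->vI;                       /* (16): v(n)  */
}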
3.2 Computation of the inverse kinematics for the robot manipulator
Figure 3 shows the link coordinate system of the five-axis articulated robot manipulator using the Denavit-Hartenberg convention, and Table 1 lists the values of the kinematic parameters. The inverse kinematics of the articulated robot manipulator transforms the coordinates of the robot manipulator from the Cartesian space R3 (x, y, z) to the joint space R5 (θ1, θ2, θ3, θ4, θ5); the procedure is described in detail by Schilling (1998). The computation begins with the base joint angle

    θ1 = atan2(y, x)

and proceeds through intermediate wrist-position terms formed from z, d1, d5, a1 and a5, where a1, a5, d1 and d5 are Denavit-Hartenberg parameters and the atan2 function refers to Table 2.
Figure 3 Link coordinates of a five-axis robot manipulator
Table 1 Denavit-Hartenberg parameters of the robot manipulator (joints 1-5: joint angles θ1-θ5, link offsets d_i, link lengths a_i and link twists α_i of 0 and ±π/2; in particular d5 = 72 mm and a3 = 250 mm)
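For reference, the Table 1 parameters enter the kinematics through the standard Denavit-Hartenberg link transform A_i = Rot_z(θ_i)·Trans_z(d_i)·Trans_x(a_i)·Rot_x(α_i). The C sketch below is a generic utility, not code from the chapter, and takes the four parameters symbolically rather than assuming the table's values.

#include <math.h>

/* Standard Denavit-Hartenberg link transform, row-major 4x4:
 *   A = Rotz(theta) * Transz(d) * Transx(a) * Rotx(alpha) */
void dh_transform(double theta, double d, double a, double alpha,
                  double A[4][4])
{
    double ct = cos(theta), st = sin(theta);
    double ca = cos(alpha), sa = sin(alpha);

    A[0][0] = ct; A[0][1] = -st * ca; A[0][2] =  st * sa; A[0][3] = a * ct;
    A[1][0] = st; A[1][1] =  ct * ca; A[1][2] = -ct * sa; A[1][3] = a * st;
    A[2][0] = 0;  A[2][1] =  sa;      A[2][2] =  ca;      A[2][3] = d;
    A[3][0] = 0;  A[3][1] =  0;       A[3][2] =  0;       A[3][3] = 1;
}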
Case     Quadrants   atan2(y, x)
x > 0    1, 4        arctan(y/x)
x = 0    1, 4        [sgn(y)] π/2
x < 0    2, 3        arctan(y/x) + [sgn(y)] π
Table 2 Four-quadrant arctan function (Schilling, 1998)
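Table 2 can be coded directly. The C sketch below mirrors its three cases; note that the standard library's atan2() already implements this behavior, so the function exists only to make the table concrete (sgn(0) is taken as +1 here, an assumption the table leaves open).

#include <math.h>
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Four-quadrant arctangent following Table 2. */
double atan2_table(double y, double x)
{
    double sgn_y = (y >= 0.0) ? 1.0 : -1.0;

    if (x > 0.0)                      /* quadrants 1, 4 */
        return atan(y / x);
    else if (x == 0.0)                /* on the y-axis  */
        return sgn_y * M_PI / 2.0;
    else                              /* quadrants 2, 3 */
        return atan(y / x) + sgn_y * M_PI;
}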
3.3 Planning of the motion trajectory
3.3.1 The computation of the point-to-point motion trajectory
To achieve smooth motion when manipulating the robot, a point-to-point motion scheme with a constant acceleration/deceleration trapezoid velocity profile is applied, as shown in Fig. 2. In this scheme, the design parameters at each axis of the robot are the overall displacement Δθ*_i (degree), the maximum velocity W_i (degree/s), the acceleration/deceleration period T_acc and the position-loop sampling interval t_d. The instantaneous position command q at each sampling interval, based on the trapezoid velocity profile, can be computed as follows:
Step 1: Compute the overall running time. First, compute the maximum running time without considering the acceleration and deceleration:

    T_1 = max(Δθ*_1/W_1, Δθ*_2/W_2, Δθ*_3/W_3, Δθ*_4/W_4, Δθ*_5/W_5)

This T_1 must exceed the acceleration time T_acc. The acceleration and deceleration design is then considered, and the overall running time is

    T = T_1 + T_acc
Step 2: Adjust the overall running time to meet the condition of a multiple of the sampling interval:

    N = [T / t_d],   T_L = N t_d

where N represents the interpolation number and [.] refers to the Gauss function, with integer output value.

Step 3: Recalculate the maximum velocity at each axis from the adjusted running time T_L:

    [W_1, W_2, W_3, W_4, W_5] = [Δθ*_1, Δθ*_2, Δθ*_3, Δθ*_4, Δθ*_5] / (T_L − T_acc)
Step 4: Calculate the acceleration/deceleration value:

    A_i = W_i / T_acc
Step 5: Calculate the position command, q = [q_1, q_2, q_3, q_4, q_5], at each mid-position.

a. Acceleration region:

    q = q_r0 + (1/2) A t²

where t = n t_d, 0 < n ≤ N′, and q_r0 denotes the initial position.

b. Constant-speed region:

    q = q_r1 + W t,   with q_r1 = q_r0 + (1/2) A T_acc²

where t = n t_d measured from the start of the region, 0 < n ≤ N′, q_r1 is the position at the end of the acceleration region, and q_r2 denotes the final position in the constant-speed region.
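Steps 1-5 assemble into a compact position-command generator. The following C sketch is a hedged reconstruction: the acceleration and constant-speed regions follow the formulas above, while the deceleration region is completed with the standard symmetric constant-deceleration form, an assumption beyond the steps shown here; the function name and signature are illustrative.

#include <math.h>

/* Single-axis trapezoidal velocity profile, following Steps 1-5.
 * dtheta: total displacement, Tacc: accel/decel period,
 * TL: adjusted overall running time (a multiple of t_d, Step 2;
 * TL > 2*Tacc is assumed so a cruise phase exists).
 * Returns the position command q(t) relative to q_r0 = 0. */
double trapezoid_q(double t, double dtheta, double Tacc, double TL)
{
    double W = dtheta / (TL - Tacc);   /* Step 3: cruise velocity */
    double A = W / Tacc;               /* Step 4: acceleration    */

    if (t <= 0.0) return 0.0;
    if (t >= TL)  return dtheta;

    if (t < Tacc)                      /* (a) acceleration region   */
        return 0.5 * A * t * t;
    else if (t < TL - Tacc)            /* (b) constant-speed region */
        return 0.5 * A * Tacc * Tacc + W * (t - Tacc);
    else {                             /* (c) symmetric deceleration */
        double tr = TL - t;            /* time remaining             */
        return dtheta - 0.5 * A * tr * tr;
    }
}

Sampling the function at t = n t_d for n = 1 .. N reproduces the interpolated position commands; the three branches meet continuously at t = T_acc and t = T_L − T_acc, and the total area under the velocity trapezoid equals W (T_L − T_acc) = Δθ*, consistent with Step 3.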