Data Mining and Knowledge Discovery Handbook, 2nd Edition (Part 44)


In order to achieve good behavior, the agent must explore its environment. Exploration means trying different sorts of actions in various situations. While exploring, some of the choices may be poor ones, which may lead to severe costs. In such cases, it is more appropriate to train the agent on a computer-simulated model of the environment. It is sometimes possible to simulate an environment without explicitly understanding it.

RL methods have been used to solve a variety of problems in a number of domains. Pednault et al. (2002) solved targeted marketing problems. Tesauro (1994, 1995) developed an artificial backgammon player with RL. Hong and Prabhu (2004) and Zhang and Dietterich (1996) used RL to solve manufacturing problems. Littman and Boyan (1993) used RL for the solution of a network routing problem. Using RL, Crites and Barto (1996) trained an elevator dispatching controller.

20.6 Reinforcement-Learning and Data-Mining

This chapter presents an overview of some of the ideas and computational methods in RL. In this section the relation and relevance of RL to DM is discussed.

Most DM learning methods are taken from ML. It is popular to distinguish between three categories of learning methods: Supervised Learning (SL), Unsupervised Learning and Reinforcement Learning. In SL, the learner is programmed to extract a model from a set of observations, where each observation consists of explaining variables and corresponding responses. In unsupervised learning there is a set of observations but no response, and the learner is expected to extract a helpful representation of the domain from which the observations were drawn. RL requires the learner to extract a model of response based on experience observations that include states, responses and the corresponding reinforcements.
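To make the distinction concrete, the following sketch (an illustration added here, not part of the original chapter) contrasts the shape of the data a supervised learner consumes with the experience tuples an RL agent consumes; the type names are hypothetical.

```python
from typing import NamedTuple, Sequence

class SLObservation(NamedTuple):
    """Supervised learning: explaining variables plus a response labeled by an advisor."""
    features: Sequence[float]    # explaining variables
    label: float                 # desired response supplied by the advisor/oracle

class RLExperience(NamedTuple):
    """Reinforcement learning: no advisor; a reward only scores the action chosen."""
    state: Sequence[float]       # the situation (state) the agent faced
    action: int                  # the response (action) it chose autonomously
    reward: float                # reinforcement indicating how good that choice was
    next_state: Sequence[float]  # the situation that followed

# A supervised learner fits a mapping features -> label from many SLObservation
# records; an RL agent must infer good actions from many RLExperience records.
```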

SL methods are central in DM, and a correspondence may be established between SL and RL in the following manner. Consider a learner that needs to extract a model of response for different situations. A supervised learner relies on a set of observations, each of which is labeled by an advisor (or an oracle). The label of each observation is regarded by the agent as the desired response for the situation introduced by the explanatory variables of that observation. In RL the privilege of having an advisor is not given. Instead, the learner views situations (in RL these are called states), chooses responses (in RL these are called actions) autonomously, and obtains rewards that indicate how good the choices were. In this view of SL and RL, states and realizations of "explaining variables" are actually the same.

In some DM problems, cases arise in which the responses in one situation affect future outcomes. This is typically the case in cost-sensitive DM problems. Since SL relies on labeled observations and assumes no dependence between observations, it is sometimes inappropriate for such problems. The RL model, on the other hand, fits cost-sensitive DM problems perfectly.⁵ For example, Pednault et al. (2002) used RL to solve a problem in targeted marketing: deciding on the optimal targeting of promotion efforts in order to maximize the benefits due to promotion. Targeted marketing is a classical DM problem in which the desired response is unknown, and responses taken at one point in time affect the future (for example, deciding on an extensive campaign for a specific product this month may reduce the effectiveness of a similar campaign the following month).

⁵ Despite this claim, there are several difficulties in applying RL methods to DM problems. A serious issue is that DM problems present batches of observations stored in a database, whereas RL methods require incremental accumulation of observations through interaction.

Finally, DM may be defined as a process in which computer programs manipulate data in order to provide knowledge about the domain that produced the data. From the point of view implied by this definition, RL should definitely be considered a certain type of DM.

20.7 An Instructive Example

In this section, an example problem from the area of supply-chain management is presented and solved through RL. Specifically, the modeling of the problem as an MDP with unknown reward and state-transition functions is shown; a sequence of Q-Learning updates is demonstrated; and the relations between RL and DM are discussed with respect to the problem.

The term "supply-chain management" refers to the attempts of an enterprise to optimize the processes involved in purchasing, producing, shipping and distributing goods. Among other objectives, enterprises seek to formulate a cost-effective inventory policy. Consider the problem of an enterprise that purchases a single product from a manufacturer and sells it to end-customers. The enterprise may maintain a stock of the product in one or more warehouses. The stock helps the enterprise respond to customer demand, which is usually stochastic. On the other hand, the enterprise has to invest in purchasing the stock and maintaining it. These activities lead to costs.

Consider an enterprise that has two warehouses in two different locations and behaves as follows. At the beginning of epoch t, the enterprise observes the stock levels s1(t) and s2(t) at the first and the second warehouses respectively. As a response, it may order from the manufacturer quantities a1(t) and a2(t) for the first and second warehouse respectively. The decision of how many units to order for each of the warehouses is taken centrally (i.e., simultaneously by a single decision-maker), but the actual orders are issued separately by the two warehouses. The manufacturer charges c_d for each unit ordered, and an additional c_K for delivering an order to a warehouse (i.e., if the enterprise issues orders at both warehouses it is charged a fixed 2c_K in addition to the direct costs of the units ordered). It is assumed that there is no lead time (i.e., the units ordered become available immediately after the orders are issued). Subsequently, each of the warehouses observes a stochastic demand.

A warehouse that has enough units in stock sells the units and charges p for each unit sold. If one of the warehouses fails to respond to the demand, whereas the other warehouse, after delivering to its own customers, can spare units, transshipment is initiated. Transshipment means transporting units between the warehouses in order to meet demand. Transshipment costs c_T for each unit transshipped. Any unit remaining in stock at the end of the epoch costs the enterprise c_i for that epoch. The successive epoch begins with the number of units available at the end of the current epoch, and so on.
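The dynamics just described can be condensed into a single simulation step. The sketch below is an illustrative reading of the model, not code from the chapter; the parameter names follow the text (c_d, c_K, c_T, c_i, p), and the numeric defaults anticipate the worked example later in this section.

```python
def simulate_epoch(s1, s2, a1, a2, d1, d2,
                   c_d=2.0, c_K=10.0, c_T=1.0, c_i=0.5, p=10.0):
    """One epoch of the two-warehouse model: order, observe demand, transship, hold.

    (s1, s2): stock levels at the start of the epoch
    (a1, a2): quantities ordered for each warehouse (no lead time)
    (d1, d2): demand realizations at the two warehouses
    Returns (immediate reward, stock levels at the start of the next epoch).
    """
    # Ordering costs: c_d per unit plus a fixed c_K for each warehouse that orders.
    order_cost = c_d * (a1 + a2) + c_K * ((a1 > 0) + (a2 > 0))

    # Quantities on hand after the (immediate) delivery of the orders.
    q1, q2 = s1 + a1, s2 + a2

    # Each warehouse serves its own demand first.
    sold1, sold2 = min(q1, d1), min(q2, d2)
    q1, q2 = q1 - sold1, q2 - sold2
    short1, short2 = d1 - sold1, d2 - sold2

    # Spare units are transshipped to cover the other warehouse's shortage.
    ship_1to2, ship_2to1 = min(q1, short2), min(q2, short1)
    q1, q2 = q1 - ship_1to2, q2 - ship_2to1

    units_sold = sold1 + sold2 + ship_1to2 + ship_2to1
    transship_cost = c_T * (ship_1to2 + ship_2to1)
    holding_cost = c_i * (q1 + q2)   # units still in stock at the end of the epoch

    reward = p * units_sold - order_cost - transship_cost - holding_cost
    return reward, (q1, q2)
```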

The enterprise wants to formulate an optimal inventory policy (i.e., given the stock levels, and in order to maximize its long-run expected profits, the enterprise wants to know when to issue orders and in what quantities). This problem can be modeled as an MDP (see the definition of an MDP in Section 20.2). The stock levels s1(t) and s2(t) at the beginning of an epoch are the states faced by the enterprise's decision-makers. The possible quantities for the two orders are the possible actions given a state. As a consequence of choosing a certain action in a certain state, each warehouse obtains a deterministic quantity-on-hand. As the demand is observed and met (either directly or through transshipment), the actual immediate profit r_t can be calculated as the revenue gained from selling products minus the costs due to purchasing the products, delivering the orders, the transshipments and maintaining inventory. The stock levels at the end of the period, and thus the state for the successive epoch, are also determined. Since the demand is stochastic, both the reward (the profit) and the state-transition function are stochastic.

Assuming that the demand functions at the two warehouses are unknown, the problem of the enterprise is how to solve an MDP with unknown reward and state-transition functions. In order to solve the problem via RL, a large number of experience episodes needs to be presented to an agent. Gathering such experience is expensive, because in order to learn an optimal policy the agent must explore its environment while simultaneously exploiting its current knowledge (see the discussion of the exploration-exploitation dilemma in Section 20.3.2). However, in many cases learning may be based on simulated experience.

Consider using Q-Learning (see Section 20.3.2) for the solution of the enterprise's problem. Let this application be demonstrated for epoch t = 158 and the initial stock levels s1(158) = 4, s2(158) = 2. The agent constantly maintains a unique Q-value for each combination of initial stock levels and quantities ordered. Assuming that the capacity at both warehouses is limited to 10 units of stock, the possible actions given the state are:

A(s1(t), s2(t)) = {(a1, a2) : a1 + s1(t) ≤ 10, a2 + s2(t) ≤ 10}   (20.18)

The agent chooses an action from the set of possible actions based on some heuristic that resolves the exploration-exploitation dilemma (see the discussion in Section 20.3.2).
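Equation 20.18 can be enumerated directly. The snippet below is an added illustration, not code from the original text; the 10-unit capacity is the limit assumed above.

```python
CAPACITY = 10  # maximum stock per warehouse, as assumed in the text

def feasible_actions(s1, s2):
    """Enumerate A(s1, s2) from Equation 20.18: all order pairs (a1, a2)
    that keep each warehouse at or below its capacity."""
    return [(a1, a2)
            for a1 in range(CAPACITY - s1 + 1)
            for a2 in range(CAPACITY - s2 + 1)]

# For the state encountered in epoch 158 there are 7 * 9 = 63 feasible actions:
actions = feasible_actions(4, 2)
assert len(actions) == 63 and (0, 8) in actions   # includes the action chosen below
```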

Assume that the current Q-values for the state s1(158) = 4, s2(158) = 2 are as described in Figure 20.1. The heuristic used should tend to choose actions for which the corresponding Q-value is high, while still allowing each action to be chosen with a positive probability. Assume that the action chosen is a1(158) = 0, a2(158) = 8. This action means that the first warehouse does not issue an order while the second warehouse orders 8 units. Assume that the direct cost per unit is c_d = 2 and that the fixed cost per order is c_K = 10. Since only the second warehouse issued an order, the enterprise's ordering costs are 10 + 8 · 2 = 26. The quantities-on-hand after receiving the order are 4 units in the first warehouse and 10 units in the second warehouse. Assume the demand realizations are 5 units at the first warehouse and a single unit at the second warehouse. Although the first warehouse can provide only 4 units directly, the second warehouse can spare a unit from its stock, so transshipment occurs and both warehouses meet their demand. Assume the transshipment cost is c_T = 1 for each unit transshipped. Since only one unit needs to be transshipped, the total transshipment cost is 1. In epoch 158, six units were sold. Assuming the enterprise charges p = 10 for each unit sold, the revenue from selling products in this epoch is 60. At the end of the epoch, the stock levels are zero units at the first warehouse and 8 units at the second warehouse. Assuming the inventory cost is c_i = 0.5 per unit in stock for one period, the total inventory costs for epoch 158 are 4 (= 8 · 0.5). The immediate reward for that epoch is 60 − 26 − 1 − 4 = 29. The state for the next epoch is s1(159) = 0 and s2(159) = 8.
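Feeding the epoch-158 numbers into the simulate_epoch sketch given earlier reproduces this walkthrough (again an added illustration that relies on that hypothetical function, not on code from the chapter):

```python
reward, next_state = simulate_epoch(s1=4, s2=2, a1=0, a2=8, d1=5, d2=1)
assert reward == 29.0        # 60 revenue - 26 ordering - 1 transshipment - 4 holding
assert next_state == (0, 8)  # the stock levels carried into epoch 159
```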

The agent can calculate V158(s1(159), s2(159)) by maximizing over the Q-values corresponding to s1(159) and s2(159) that it holds at the end of epoch 158. Assume that the result of this maximization is 25. Assume that the appropriate learning rate for s1 = 4, s2 = 2, a1 = 0, a2 = 8 and t = 158 is α158(4,2,0,8) = 0.1, and that the discount factor is γ = 0.9. The agent updates the appropriate entry according to the update rule in Equation 20.14 as follows:

Q159(4,2,0,8) = 0.9 · Q158(4,2,0,8) + 0.1 · [r158 + γ · V158(0,8)]
             = 0.9 · 45 + 0.1 · [29 + 0.9 · 25] = 45.65.   (20.19)
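The arithmetic of Equation 20.19 is easy to check with a minimal sketch of the tabular update rule as it is used here (an added illustration, not code from the chapter):

```python
def q_update(q_old, reward, v_next, alpha=0.1, gamma=0.9):
    """One tabular Q-Learning step: Q <- (1 - alpha) * Q + alpha * (r + gamma * V(s'))."""
    return (1 - alpha) * q_old + alpha * (reward + gamma * v_next)

# Reproducing Equation 20.19 with the values assumed in the text:
q_new = q_update(q_old=45.0, reward=29.0, v_next=25.0)
assert abs(q_new - 45.65) < 1e-9
```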

The consequence of this update is a change in the corresponding Q-value, as indicated in Figure 20.2. Figure 20.3 shows the learning curve of a Q-Learning agent that was trained to solve the enterprise's problem in accordance with the parameters assumed in this section. The agent was presented with 200,000 simulated experience episodes, in each of which the demands were drawn from Poisson distributions with means 5 and 3 for the first and second warehouses respectively. The learning rates were set to 0.05 for all t, and a heuristic based on Boltzmann's distribution was used to resolve the exploration-exploitation dilemma (see Sutton and Barto, 1998). The figure shows a plot of the moving-average reward (over 2,000 episodes) against the experience of the agent while gaining these rewards.
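A training run of the kind described here can be sketched by combining the illustrative pieces above with a Boltzmann (softmax) selection rule. Everything below is an added sketch, not code from the chapter: the temperature, the episode length, the starting stock levels and the random seed are assumptions, while the learning rate of 0.05, the discount factor of 0.9 and the Poisson demand means of 5 and 3 follow the text.

```python
import math
import random
from collections import defaultdict

import numpy as np

def boltzmann_action(q, state, actions, temperature=1.0):
    """Softmax (Boltzmann) exploration: favour actions with high Q-values while
    keeping every feasible action at a positive probability."""
    q_max = max(q[state + a] for a in actions)          # shift for numerical stability
    weights = [math.exp((q[state + a] - q_max) / temperature) for a in actions]
    return random.choices(actions, weights=weights, k=1)[0]

def train(episodes=200_000, epochs_per_episode=50, alpha=0.05, gamma=0.9, seed=0):
    """Skeleton of the simulated training run described above."""
    rng = np.random.default_rng(seed)
    q = defaultdict(float)                              # tabular Q, keyed by (s1, s2, a1, a2)
    for _ in range(episodes):
        state = (0, 0)                                  # assumed initial stock levels
        for _ in range(epochs_per_episode):
            actions = feasible_actions(*state)
            action = boltzmann_action(q, state, actions)
            d1, d2 = rng.poisson(5.0), rng.poisson(3.0) # demands as in the experiment
            reward, next_state = simulate_epoch(*state, *action, d1, d2)
            v_next = max(q[next_state + a] for a in feasible_actions(*next_state))
            q[state + action] = ((1 - alpha) * q[state + action]
                                 + alpha * (reward + gamma * v_next))
            state = next_state
    return q
```

The resulting Q-table can then be turned into a greedy ordering policy by picking, in each state, the feasible action with the highest Q-value.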

This section shows how RL algorithms (specifically, Q-Learning) can be used to learn from data observations. As discussed in Section 20.6, this by itself makes RL, in this case, a DM tool. However, the term DM may imply the use of an SL algorithm. Within the scope of the problem discussed here, SL is inappropriate. A supervised learner could induce an optimal (or at least a near-optimal) policy based on examples of the form (s1, s2, a1, a2), where s1 and s2 describe a certain state and a1 and a2 are the optimal responses (order quantities) for that state. However, in the case discussed here, such examples are probably not available.

The methods presented in this chapter are useful for many application domains, such as manufacturing, security and medicine, and for many data mining techniques, such as decision trees, clustering, ensemble methods and genetic algorithms (see the corresponding entries in the reference list).


Fig. 20.1. Q-values for the state encountered in epoch 158, before the update. The value corresponding to the action finally chosen is marked.

Fig. 20.2. Q-values for the state encountered in epoch 158, after the update. The value corresponding to the action finally chosen is marked.

Fig. 20.3. The learning curve of a Q-Learning agent assigned to solve the enterprise's transshipment problem.

References

Arbel, R. and Rokach, L., Classifier evaluation under limited resources, Pattern Recognition Letters, 27(14):1619–1631, Elsevier, 2006.

Averbuch, M., Karson, T., Ben-Ami, B., Maimon, O. and Rokach, L., Context-sensitive medical information retrieval, The 11th World Congress on Medical Informatics (MEDINFO 2004), San Francisco, CA, September 2004, IOS Press, pp. 282–286.

Bellman R. Dynamic Programming. Princeton University Press, 1957.



Bertsekas D.P. Dynamic Programming: Deterministic and Stochastic Models. Prentice-Hall, 1987.

Bertsekas D.P., Tsitsiklis J.N. Neuro-Dynamic Programming. Athena Scientific, 1996.

Claus C., Boutilier C. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems. AAAI-97 Workshop on Multiagent Learning, 1998.

Cohen S., Rokach L., Maimon O. Decision Tree Instance Space Decomposition with Grouped Gain-Ratio. Information Sciences, 177(17):3592–3612, 2007.

Crites R.H., Barto A.G. Improving Elevator Performance Using Reinforcement Learning. Advances in Neural Information Processing Systems: Proceedings of the 1995 Conference, 1996.

Filar J., Vrieze K. Competitive Markov Decision Processes. Springer, 1997.

Hong J., Prabhu V.V. Distributed Reinforcement Learning for Batch Sequencing and Sizing in Just-In-Time Manufacturing Systems. Applied Intelligence, 2004; 20:71–87.

Howard R.A. Dynamic Programming and Markov Processes. M.I.T. Press, 1960.

Hu J., Wellman M.P. Multiagent Reinforcement Learning: Theoretical Framework and Algorithm. In Proceedings of the 15th International Conference on Machine Learning, 1998.

Jaakkola T., Jordan M.I., Singh S.P. On the Convergence of Stochastic Iterative Dynamic Programming Algorithms. Neural Computation, 1994; 6:1185–1201.

Kaelbling L.P., Littman M.L., Moore A.W. Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, 1996; 4:237–285.

Littman M.L., Boyan J.A. A Distributed Reinforcement Learning Scheme for Network Routing. In Proceedings of the International Workshop on Applications of Neural Networks to Telecommunications, 1993.


Littman M.L. Markov Games as a Framework for Multi-Agent Reinforcement Learning. In Proceedings of the 7th International Conference on Machine Learning, 1994.

Littman M.L. Friend-or-Foe Q-Learning in General-Sum Games. In Proceedings of the 18th International Conference on Machine Learning, 2001.

Maimon O. and Rokach L., Data Mining by Attribute Decomposition with semiconductors manufacturing case study, in Data Mining for Design and Manufacturing: Methods and Applications, D. Braha (Ed.), Kluwer Academic Publishers, pp. 311–336, 2001.

Maimon O. and Rokach L., Improving supervised learning by feature decomposition, Proceedings of the Second International Symposium on Foundations of Information and Knowledge Systems, Lecture Notes in Computer Science, Springer, pp. 178–196, 2002.

Maimon O. and Rokach L., Decomposition Methodology for Knowledge Discovery and Data Mining: Theory and Applications, Series in Machine Perception and Artificial Intelligence, Vol. 61, World Scientific Publishing, ISBN 981-256-079-3, 2005.

Moskovitch R., Elovici Y., Rokach L., Detection of unknown computer worms based on behavioral classification of the host, Computational Statistics and Data Analysis, 52(9):4544–4566, 2008.

Pednault E., Abe N., Zadrozny B. Sequential Cost-Sensitive Decision Making with Reinforcement Learning. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002.

Puterman M.L. Markov Decision Processes. Wiley, 1994.

Rokach L., Decomposition methodology for classification tasks: a meta decomposer framework, Pattern Analysis and Applications, 9:257–271, 2006.

Rokach L., Genetic algorithm-based feature set partitioning for classification problems, Pattern Recognition, 41(5):1676–1700, 2008.

Rokach L., Mining manufacturing data using genetic algorithm-based feature set decomposition, International Journal of Intelligent Systems Technologies and Applications, 4(1):57–78, 2008.

Rokach L. and Maimon O., Theory and applications of attribute decomposition, IEEE International Conference on Data Mining, IEEE Computer Society Press, pp. 473–480, 2001.

Rokach L. and Maimon O., Feature Set Decomposition for Decision Trees, Journal of Intelligent Data Analysis, 9(2):131–158, 2005.

Rokach L. and Maimon O., Clustering methods, Data Mining and Knowledge Discovery Handbook, Springer, pp. 321–352, 2005.

Rokach L. and Maimon O., Data mining for improving the quality of manufacturing: a feature set decomposition approach, Journal of Intelligent Manufacturing, 17(3):285–299, Springer, 2006.

Rokach L. and Maimon O., Data Mining with Decision Trees: Theory and Applications, World Scientific Publishing, 2008.

Rokach L., Maimon O. and Lavi I., Space Decomposition in Data Mining: A Clustering Approach, Proceedings of the 14th International Symposium on Methodologies for Intelligent Systems, Maebashi, Japan, Lecture Notes in Computer Science, Springer-Verlag, pp. 24–31, 2003.

Rokach L., Maimon O. and Averbuch M., Information Retrieval System for Medical Narrative Reports, Lecture Notes in Artificial Intelligence 3055, Springer-Verlag, pp. 217–228, 2004.

Rokach L., Maimon O. and Arbel R., Selective voting - getting more for less in sensor fusion, International Journal of Pattern Recognition and Artificial Intelligence, 20(3):329–350, 2006.

Ross S. Introduction to Stochastic Dynamic Programming. Academic Press, 1983.


Sen S., Sekaran M., Hale J. Learning to Coordinate Without Sharing Information. In Proceedings of the Twelfth National Conference on Artificial Intelligence, 1994.

Sutton R.S., Barto A.G. Reinforcement Learning: An Introduction. MIT Press, 1998.

Szepesvári C., Littman M.L. A Unified Analysis of Value-Function-Based Reinforcement-Learning Algorithms. Neural Computation, 1999; 11:2017–2060.

Tesauro G.T. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play. Neural Computation, 1994; 6:215–219.

Tesauro G.T. Temporal Difference Learning and TD-Gammon. Communications of the ACM, 1995; 38:58–68.

Watkins C.J.C.H. Learning from Delayed Rewards. Ph.D. thesis, Cambridge University, 1989.

Watkins C.J.C.H., Dayan P. Technical Note: Q-Learning. Machine Learning, 1992; 8:279–292.

Zhang W., Dietterich T.G. High-Performance Job-Shop Scheduling with a Time-Delay TD(λ) Network. Advances in Neural Information Processing Systems, 1996; 8:1024–1030.


Neural Networks For Data Mining

G. Peter Zhang

Georgia State University,

Department of Managerial Sciences,

gpzhang@gsu.edu

Summary. Neural networks have become standard and important tools for data mining. This chapter provides an overview of neural network models and their applications to data mining tasks. We describe the historical development of the field of neural networks and present three important classes of neural models: feedforward multilayer networks, Hopfield networks, and Kohonen's self-organizing maps. Modeling issues and applications of these models for data mining are discussed.

Key words: neural networks, regression, classification, prediction, clustering

21.1 Introduction

Neural networks, or artificial neural networks, are an important class of tools for quantitative modeling. They have enjoyed considerable popularity among researchers and practitioners over the last 20 years and have been successfully applied to solve a variety of problems in almost all areas of business, industry, and science (Widrow, Rumelhart & Lehr, 1994). Today, neural networks are treated as a standard data mining tool and are used for many data mining tasks such as pattern classification, time series analysis, prediction, and clustering. In fact, most commercial data mining software packages include neural networks as a core module.

Neural networks are computing models for information processing and are particularly useful for identifying the fundamental relationships among a set of variables or patterns in the data. They grew out of research in artificial intelligence; specifically, attempts to mimic the learning of biological neural networks, especially those in the human brain, which may contain more than 10^11 highly interconnected neurons. Although the artificial neural networks discussed in this chapter are extremely simple abstractions of biological systems and are very limited in size, ability, and power compared with biological neural networks, they do share two very important characteristics: 1) parallel processing of information and 2) learning and generalizing from experience.

