Algorithmic Trading: A Game-theoretic and Simulation Approach to a Reinforcement Learning Bot

Keywords: Data mining, game theory, policy-making process, reinforcement learning
Experienced analysts could apply mathematical models, proven on past data, to evaluate a company's intrinsic value. However, markets do not remain stable, and indicators that have strong predictive value over one period may cease to generate excess returns as soon as market conditions change. New investment strategies and new technology were introduced, which made some of the old models obsolete. As financial literacy has risen, there are more market players than ever. Two measures have been proposed to counter this evolving market behavior. First, some trading systems are based on genetic algorithms that transform the indicators used as attributes over time [6] [28]. Second, more commonly, the data set is fit to nonlinear models using machine learning algorithms such as Artificial Neural Networks [10].
The introduction of algorithms to trading fundamentally changed the stock market. Algorithms made it possible to react quickly to events on the stock market, and machine learning enabled analysts to build price-prediction models from past data far more easily. As evidence, AI funds have outperformed their peers while providing downside protection, according to Eurekahedge's report.
[Table omitted: comparison of AI funds against the average hedge fund and systematic CTA/managed futures strategies, which can be considered a rough approximation of the average quant fund. Source: Eurekahedge.]
Motivated by the strong performance of AI funds, this thesis introduces a method for creating an artificial agent that trades on the stock market using stock prices and several machine learning algorithms.
1.2. Objective of the research
The monetary value of buying and selling stocks at profitable positions is a key driver of this research. Our main hypothesis is that by applying machine learning trained on past data, it is possible to predict the movement of a stock price through market patterns, and then to apply algorithms to create a profitable trading agent. We use the Profit and Loss (PnL) of the agent through the test period to judge its profitability. We conduct simulations to examine whether the agent is profitable on different data sets (seen and unseen) and then calculate the agent's average PnL.
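As a concrete illustration, the following is a minimal sketch of how a per-trade PnL series and its average might be computed from a list of executed trades; the trade record format here is a hypothetical one chosen for the example, not the thesis's actual bookkeeping.

```python
# Minimal PnL bookkeeping sketch; field names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Trade:
    side: str      # "long" or "short"
    entry: float   # entry price per share
    exit: float    # exit price per share
    size: int      # number of shares

def trade_pnl(t: Trade) -> float:
    """PnL of a single round-trip trade, ignoring fees."""
    direction = 1.0 if t.side == "long" else -1.0
    return direction * (t.exit - t.entry) * t.size

trades = [Trade("long", 100.0, 103.5, 100), Trade("short", 98.0, 99.2, 100)]
pnls = [trade_pnl(t) for t in trades]
print(sum(pnls), sum(pnls) / len(pnls))  # total PnL and average PnL
```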
1.3. Scope of the research
This thesis provides only an elementary introduction to algorithmic trading, with game theory as the framework for the market environment. The game environment is kept simple: we assume that the responses of other participants to our agent's strategies are reflected in the stock price movement. Moreover, the algorithms used to create and train the agent draw on the machine learning libraries Scikit-learn and Keras; the exploited algorithms and functions are explained in the Appendix of this thesis.
The thesis is organized in the following manner:
• Chapter 1 states the motivation for writing this thesis, and the objectives and scope of the research.
• In Chapter 2, we provide the background of the Efficient Market Hypothesis (EMH) and the evidence that contradicts it, as well as relevant work on this topic.
• The game-theoretical framework for describing the market, the simulation approach and the algorithms are established in Chapter 3.
• Chapter 4 describes the methods of data collection and data processing, and the implementation and simulation of different variants of the model.
• Chapter 5 is the last chapter, where we discuss the final results of our agent, explain the limitations of our research and state future improvements.
Chapter 2: Literature review
This chapter begins with the background of efficient markets and then gives a brief review of previous empirical studies that use machine learning algorithms to construct trading strategies.
2.1. Efficient Markets
One of the strongest objections to the existence of profitable trading strategies is founded on the Efficient Market Hypothesis (EMH). Since EMH implies that our search for continuously profitable trading strategies is futile, we first give an overview of EMH and then show the empirical results that contradict this theory.
EMH states that the current market price reflects the assimilation of all available information [13]. That is, its proponents argue that since stocks always trade at their fair value on stock exchanges, it is impossible to outperform the overall market through expert stock selection or market timing. Any new information is quickly integrated into the market price. Fama formalized the concept of efficient markets in 1970 by expressing the non-predictability of market prices:

E(p̃_{j,t+1} | Φ_t) = [1 + E(r̃_{j,t+1} | Φ_t)] p_{j,t}

where:

p_{j,t} is the price of security j at time t;
r̃_{j,t+1} is the one-period percentage return (p_{j,t+1} − p_{j,t}) / p_{j,t}; and
Φ_t is the information reflected at time t.

Based on this expectation expression, Fama argues that there is no possibility of finding excess market returns via market timing based solely on the information in Φ_t, hence dispelling the possibility of trading strategies based on technical indicators.
On the other hand, despite the theoretically sound nature of EMH, research over the last 30 years has shown that several assumptions made in EMH may be unrealistic. First, a fundamental assumption is that investors behave rationally, or that the deviations of the many irrational investors cancel out. However, some research has shown that investors are not strictly rational [41], or devoid of biases [20]. Indeed, people with a conservatism bias tend to underweight new information. Moreover, experiments have shown that these biases tend to be systematic and that deviations do not cancel each other out [21]. This leads to over- and under-reaction to news events.

From the 1990s, the literature has seen the growing decline of the EMH and the emergence of behavioral finance. Behavioral finance views the market as an aggregate of human actions filled with imperfect and inefficient decisions. Under this theory, the financial markets are a reflection of human desires, goals, motivations, errors and overconfidence [40]. An alternative to EMH that has gained traction is the Adaptive Market Hypothesis, which posits that profit opportunities from inefficiencies exist in financial markets but are eroded away as knowledge of the inefficiency spreads throughout the public and the public capitalizes on the opportunities. By this view of financial markets, many have built evolutionary and/or non-linear models and demonstrated that excess returns can be attained on out-of-sample data.
2.2. Previous Research
Because of their ability to model nonlinear relationships without pre-specification during the modeling process, neural networks (NNs) have become a popular method in financial time-series forecasting. NNs also offer huge flexibility in the architecture of the model, in terms of the number of hidden nodes and layers. Indeed, Pekkaya and Hamzacebi compare the results from using a linear regression versus a NN model to forecast macro variables and show that the NN gives much better results [35].

Many studies have used NNs and shown promising results in the financial markets. Grudnitski and Osburn implemented NNs to forecast S&P500 and Gold futures price directions and found they were able to correctly predict the direction of monthly price changes 75% and 61% of the time, respectively [15]. Another study showed that a NN-based model leads to higher arbitrage profits compared to cost-of-carry models. Phua, Ming and Lin implement a NN using Singapore's stock market index and show a forecasting accuracy of 81% [36]. Similarly, NN models applied to weekly forecasting of Germany's FAZ index find favorable predictive results compared to conventional statistical approaches [14].

More recently, NNs have been augmented or adapted to improve performance on financial time-series forecasting. Shaoo et al. show that cascaded functional link artificial neural networks (CFLANN) perform the best in FX markets [39]. Egrioglu et al. introduce a new method based on feed-forward artificial neural networks to analyze multivariate high-order fuzzy time-series forecasting models [12]. Liao and Wang used a stochastic time-effective neural network model to show predictive results on global stock indices. Bildirici and Ersin combined NNs with ARCH/GARCH and other volatility-based models to produce a model that outperformed ANNs or GARCH-based models alone. Moreover, Yudong and Lenan used bacterial chemotaxis optimization (BCO) and a back-propagation NN on the S&P500 index and conclude that their hybrid model (IBCO-BP) offers less computational complexity, better prediction accuracy and less training time.
Another popular machine learning classification technique that does not require any domain knowledge or parameter setting is the decision tree. It also often offers a more visually interpretable model compared to a NN, as the nodes in the tree can be easily understood. The simplest type of decision tree model is the classification and regression tree (CART). Sorensen et al. show that CART decision trees perform better than single-factor models based on the same variables in picking stock portfolios [42]. Wang and Chan use a two-layer bias decision tree to predict the daily stock prices of Microsoft, Intel and IBM, finding excess returns compared to a buy-and-hold method [43]. Another study found that a boosted alternating decision tree with expert weighting generated abnormal returns for the S&P500 index during the test period [11]. To improve accuracy, some studies used the random forest algorithm for classification, which will be further discussed in chapter 4. Namely, Booth et al. show that a recency-weighted ensemble of random forests produced superior results when analyzed on a large sample of stocks from the DAX, in terms of both profitability and prediction accuracy, compared with other ensemble techniques [7]. Similarly, a gradient-boosted random forest model applied to Singapore's stock market was able to generate excess returns compared with a buy-and-hold strategy [37]. Some recent research combines decision tree analysis with evolutionary algorithms to allow the model to adapt to changing market conditions. Hsu et al. present constraint-based evolutionary classification trees (CECT) and show strong predictability of a company's financial performance [16].
Support Vector Machines (SVMs) are also often used to predict market behavior. Huang et al. compare SVM with other classification methods (random walk, linear discriminant analysis, quadratic discriminant analysis and Elman backpropagation neural networks) and find that SVM performs the best in forecasting weekly movements of the Nikkei 225 index [17]. Similarly, Kim compares SVM with NN and case-based reasoning (CBR) and finds that SVM outperforms both in forecasting the daily direction of change in the Korea composite stock price index (KOSPI) [23]. Likewise, Yang et al. use a margin-varying Support Vector Regression model and show empirical results that have good predictive value for the Hang Seng Index [46]. Nair et al. propose a genetic-algorithm-optimized decision tree/support vector machine hybrid, validate its performance on the BSE-Sensex, and find that its predictive accuracy is better than that of both a NN and a Naive Bayes based model [31].

While some studies have tried to compare various machine learning algorithms against each other, the results have been inconsistent. Patel et al. compare four prediction models, NN, SVM, random forest and naive Bayes, and find that over a ten-year period on various indices, the random forest model performed the best. However, Ou and Wang examine the performance of ten machine learning classification techniques on the Hang Seng Index and found that SVM outperformed the other models [33]. Kara et al. compared the performance of NN versus SVM on the daily Istanbul Stock Exchange National 100 Index and found that the average performance of the NN model (75.74%) was significantly better than that of the SVM model (71.52%) [22].
Most machine learning research focuses on predictive modeling. However, creating an agent in a dynamic environment that is able to learn and improve its performance policy during training requires another branch of machine learning, reinforcement learning, in which an agent is created to find optimal policies and maximize its reward. But that is an isolated way to think about the trading environment: what if there are other agents in the world? Indeed, evidence suggests that other agents do exist in the world alongside our agent. Thus game theory, the mathematics of conflict between participants, is the missing piece needed to complete the model of the market. Eric Engle et al. [note] provided the theoretical ideas of combining game theory and machine learning in an agent-based approach to stocks, but lacked implementation results.
Chapter 3: Theoretical reviews
In the first part of this chapter, we lay out the foundations of game theory. It begins by formalizing the basic definitions that are necessary to speak correctly about games and game-plays, and then presents the standard representations of games. The background in game theory is essential for finding rational responses and for general reasoning about games. The mathematical formalization of game theory in this chapter is inspired by [16]. In the later part of the chapter, we discuss how game theory is applied to create a decision-making agent in the stock market environment, along with the difficulties of the traditional game theory approach and the need for a simulation approach and algorithms.
Game theory framework
Game theory is a part of applied mathematics that studies strategic decision making. It uses mathematical models to formulate interactions between intelligent, rational decision-makers. These interactions are called games.
Games are played within a game environment (footnote: "The difference between games and game environments is sometimes omitted. However, it is useful to distinguish them, especially in the context of general game playing. This problematic is further explained in chapter 4."), also called a world, and are composed of a system of rules, which defines the players and the actions and postulates the dynamics of the game. The game is called a puzzle if there is no more than one agent involved; otherwise it is a conflict [18].
Definition 2.1 Player
A player (or an agent) is an entity able to act. His activities alter the world in which he exists.
The concept of a game consists of active and passive elements. Passive elements represent the information, i.e. which actions are feasible for a particular agent in a given state, or how the game will evolve under certain conditions and actions taken. The active elements in the game are the players. Without the players, the game remains static; only their actions can manipulate the game.
Definition 2.2 Action
An action (or a move) is a change in the game caused by a player in a particular situation.
A valid game environment enables all agents to act and to be immediately aware of their actions. Their activity can change the current situation as a consequence of their decision making. The different situations which can occur before the game terminates are called states of the game.
Every game begins in a root state and then progresses according to the game dynamics, as participating agents make their decisions. All rational players select their actions to achieve their goals. The theory of utility was established to recognize the effects of their behavior and to evaluate the situations in which the agents are located. Utility is a value which measures the usefulness of the current state of the game for each player.
Definition 2.3 Utility

Let S be a set with a weak ordering preference relation ≤. A utility (or outcome) is a cardinal element e ∈ S representing the motivation of players. A function u is said to be a utility function iff ∀x, y ∈ S: u(x) ≤ u(y) ⇔ x ≤ y.
Altogether, a mathematical game is a structure which conclusively defines the whole game and its development.
This approach is certainly rational enough in puzzles, where there is only one agent to set the course of the world. In contrast, in environments with a greater number of other players it is preferable to randomize over the set of pure strategies, following a selected probability distribution. Sometimes, rather than a strategy, randomizing the decisions can be seen as a belief of an agent that he can profit from playing such an action. This kind of strategy is called mixed. Playing a mixed strategy ensures that every agent can only guess what will happen; compared to pure strategies, the outcome is now less predictable.
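As a small illustration, the following sketch samples actions from a mixed strategy represented as a probability distribution over pure strategies; the actions and probabilities are made up for the example.

```python
import random

# A mixed strategy: a probability distribution over pure strategies.
# The actions and weights below are hypothetical.
mixed_strategy = {"Head": 0.5, "Tail": 0.5}

def sample_action(strategy: dict[str, float]) -> str:
    """Draw one action according to the mixed strategy's probabilities."""
    actions, probs = zip(*strategy.items())
    return random.choices(actions, weights=probs, k=1)[0]

print(sample_action(mixed_strategy))  # e.g. "Head"
```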
Optimal strategy
Game theory was originally established to answer a simple question: what is an optimal reaction? How should an agent react to be the most likely to win the game? The answer is that the fundamental advantage for a player is information about the strategies of his opponents. In other words, once an agent is able to guess the next action of any other agent, he can deliberately follow a strategy which maximizes his terminal utility. In conclusion, the set of all optimal strategies (meaning the strategies with the highest expected utility) of a rational, well-informed agent p_i is then entirely determined by the strategies of the others.
Definition 2.4 Best response

An agent i's strategy s_i* is a best response to the other agents' strategies s_{-i} iff

u_i(s_i*, s_{-i}) ≥ u_i(s_i', s_{-i}) for every alternative strategy s_i' of player i.
Unfortunately, in most cases the information about the opponents' strategies is out of reach, or obtaining it is infeasible in the sense of computational complexity. Another possibility is to estimate the strategies, e.g. from the previous actions of the other players, and consecutively adjust one's own.
Definition 2.5 Nash equilibrium (NE)

Given a game with strategy profile s* = (s_1*, …, s_n*), the players P are in Nash equilibrium iff for every player i, s_i* is a best response to s_{-i}*, i.e. u_i(s_i*, s_{-i}*) ≥ u_i(s_i', s_{-i}*) for all alternative strategies s_i'.

If the state of the world allows no one to benefit from changing his strategy, the situation remains stable. It has been proved that in every game with finitely many players and a finite set of pure strategies, there is at least one Nash equilibrium profile, although it might consist of mixed strategies [22].
Game representations
There are a number of different representations of games. The simplest one was presented at the beginning of this part. Although the general definition is sufficient for the mathematical apparatus, for concrete game examples it is more convenient to establish standard forms and structures for working with the game data. Different representations extend the general definition, thus allowing various games to express their specific aspects in a more suitable form. Algorithms for finding Nash equilibria can be adapted to a particular representation to reduce computational complexity. There exist several representations of games, taking into account stochasticity, the number of players and decision points, the possibility of cooperation and other important characteristics of the game.
Normal form
Normal (or strategic) form is a basic type of game representation. Each player moves once and actions are chosen simultaneously. This makes the model simpler than other forms and easier to solve for Nash equilibrium, but it lacks any temporal locality.
The most famous representative of normal-form games is the Prisoner's Dilemma, which is described as follows:
Two members of a criminal gang are arrested and imprisoned. Each prisoner is in solitary confinement with no means of communicating with the other. The prosecutors lack sufficient evidence to convict the pair on the principal charge. They hope to get both sentenced to a year in prison on a lesser charge. Simultaneously, the prosecutors offer each prisoner a bargain. Each prisoner is given the opportunity either to betray the other by testifying that the other committed the crime, or to cooperate with the other by remaining silent. The offer is:
If X and Y each betray the other, each of them serves 5 years in prison
If X betrays Y but Y remains silent, X will be set free and Y will serve 20 years in prison
(and vice versa)
If X and Y both remain silent, both of them will only serve 1 year in prison (on the lesser charge)
[Figure: an example payoff matrix for the Prisoner's Dilemma game]
From this example we can observe that both players betraying (confessing) is the Nash equilibrium of this game, because neither player has an incentive to unilaterally change his option.
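To make the equilibrium check concrete, here is a small sketch that enumerates best responses in the payoff matrix above; payoffs are expressed as negative years in prison, so higher is better.

```python
# Prisoner's Dilemma payoffs as (utility for X, utility for Y),
# i.e. negated prison years, so higher is better.
import itertools

ACTIONS = ["betray", "silent"]
PAYOFF = {
    ("betray", "betray"): (-5, -5),
    ("betray", "silent"): (0, -20),
    ("silent", "betray"): (-20, 0),
    ("silent", "silent"): (-1, -1),
}

def is_nash(x: str, y: str) -> bool:
    """True if neither player gains by unilaterally deviating."""
    ux, uy = PAYOFF[(x, y)]
    best_x = all(PAYOFF[(x2, y)][0] <= ux for x2 in ACTIONS)
    best_y = all(PAYOFF[(x, y2)][1] <= uy for y2 in ACTIONS)
    return best_x and best_y

for x, y in itertools.product(ACTIONS, ACTIONS):
    if is_nash(x, y):
        print(x, y)  # prints: betray betray
```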
Extensive form

Extensive form models multi-agent sequential decision making. A convenient representation of an extensive-form game is a game tree. Such a structure allows one to express even complicated branching of the game, restricting the actions in different game states to the feasible ones only.
Definition 2.6 Game tree
Every game tree is a tuple (S, Z, A, e, f, r), where:
S is a set of game states;
Z is a subset of S of terminal states;
A is a set of game actions;
e is an expander function, e: s ∈ S → {a ∈ A | a is executable in s};
f is a successor function, f: (s ∈ S × a ∈ e(s)) → t ∈ S; and
r ∈ S is a root state.
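The following is a minimal sketch of this structure in code, with the expander and successor functions supplied as callables; all names and the toy game are our own illustrative choices.

```python
from dataclasses import dataclass
from typing import Callable

State, Action = str, str

@dataclass
class GameTree:
    states: set[State]                          # S
    terminals: set[State]                       # Z, a subset of S
    actions: set[Action]                        # A
    expander: Callable[[State], set[Action]]    # e(s): actions executable in s
    successor: Callable[[State, Action], State] # f(s, a): resulting state
    root: State                                 # r

# A toy two-move game: root -> one internal state or a leaf, then leaves.
tree = GameTree(
    states={"r", "l", "t1", "t2", "t3"},
    terminals={"t1", "t2", "t3"},
    actions={"L", "R"},
    expander=lambda s: {"L", "R"} if s in {"r", "l"} else set(),
    successor=lambda s, a: {("r", "L"): "l", ("r", "R"): "t1",
                            ("l", "L"): "t2", ("l", "R"): "t3"}[(s, a)],
    root="r",
)
print(tree.expander(tree.root))  # {'L', 'R'}
```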
Using the notion of a game tree, it is now possible to define an extensive-form game. This representation consists of a game tree together with a set of players, who are assigned to the states of the tree, and a utility function, which determines the utility in every terminal state, i.e. in every leaf of the game tree.
Definition 2.7 Extensive-form games

A game in extensive form is a tuple (T, P, ρ, u), where:

T = (S, Z, A, e, f, r) is a game tree;
P is a set of players;
ρ: S \ Z → P assigns an acting player to every non-terminal state; and
u: Z × P → ℝ is a utility function on the terminal states.
In the example of matching pennies in extensive form, the second player can always make her choice dependent on the first player's choice: if the first player selects Head, she will select Tail, and if the first player selects Tail, she will select Head. Paired with either of the two pure strategies of the first player, this yields a Nash equilibrium in pure strategies.
[Figure: an example of an extensive-form game – Matching pennies]
Stochastic games (Markov games)
Arguably, most, if not all, real-world systems are influenced by events of a probabilistic nature. Shapley (1953) was the first to define a game model that incorporates probabilistic choices.
Definition 2.8 Stochastic games

According to Shapley, a stochastic game is a tuple (S, A, T, R, γ, N), where:

S is the set of states of the game;
A_i is the set of available actions for player i, and A is the set of joint actions of all players;
T: S × A × S → [0, 1] is the transition function, giving the probability of reaching a next state S' when, in state S, player i chooses action a_i and the others choose their actions simultaneously;
R gives the rewards the players receive for taking the chosen actions;
γ is the discount factor; and
N is the number of players.
Shapley games are played by a finite number of players on a finite state space; in each state, each player chooses one of finitely many actions, and the resulting profile of actions determines a reward for each player and a probability distribution on successor states.
In principle, a stochastic game proceeds ad infinitum. The payoff that each player receives is given by a function of the infinite stream of rewards for this player. Shapley considered games where the payoff is a discounted sum of rewards; other popular payoff functions are the limit average of the rewards and the total sum of the rewards, as discussed by Filar & Vrieze in 1997.
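As a worked illustration, the following sketch computes the discounted-sum payoff on a truncated reward stream; the reward values are invented for the example.

```python
# Discounted sum of a (truncated) reward stream: sum over t of gamma^t * r_t.
def discounted_payoff(rewards: list[float], gamma: float) -> float:
    return sum(gamma ** t * r for t, r in enumerate(rewards))

rewards = [1.0, 0.0, 2.0, 1.0]           # hypothetical per-step rewards
print(discounted_payoff(rewards, 0.9))   # 1 + 0 + 2*0.81 + 1*0.729 = 3.349
```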
A pure strategy in a stochastic game assigns an action to each possible sequence of states visited so far, whereas a randomized strategy assigns a probability distribution on actions to each such sequence. Hence every player has infinitely many strategies at his command, and Nash's equilibrium theorem is not applicable. Nevertheless, in the case of discounted payoffs there always exists a Nash equilibrium in randomized strategies. There is even a Nash equilibrium where the strategies depend only on the current state and not on the full history of visited states; we call such strategies stationary. For general-sum games with other payoff criteria, Nash equilibria need not exist.
How, then, can the stochastic game be applied in our research to create an agent that is able to make decisions without human supervision? In principle, the stock market is a stochastic game between our agent and other self-interested agents, which can cooperate or compete with each other in order to optimize their rewards. The practical problem, however, is that it is impossible to know all the information about the other agents' decisions and states. In the context of this thesis, we therefore describe the stock market game as a two-player stochastic game, in which all the interactions of other agents with our agent's actions are reflected through the market movement (nature). It might seem easy to apply the stochastic game directly to the stock market, letting our agent choose an action based on the current state, estimate the next available states and rewards, and then choose the best response at the current state. However, it is impossible to predetermine all states, the available next states and the rewards from taking actions, because of the complex nature of the market. Fortunately, another research field holds the key to solving our problem: simulation and the computer science approach in the form of machine learning.
Simulation
In the following parts, we mention some key concepts of simulation and machine learning to provide more insight into how they can solve the problems of the traditional stochastic game approach.
Simulation
Simulation methods are ways to imitate the operation of real-world systems. A simulation first requires that a model be developed representing the characteristics, behaviors and functions of the selected system or process. The model represents the system itself, whereas the simulation represents the operation of the system over time.

These methods are widely used in economics, biology, engineering and almost all sciences. Simulation is usually done using computers, making changes to variables and producing predictions about the behavior of the system. Good examples of the usefulness of computer simulation can be found in automobile traffic simulation, grocery store checkout lines, inventory management, stock price prediction, environmental consequences of policies and so on.
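As an illustration of the last example, here is a sketch of a Monte Carlo simulation of a stock price as geometric Brownian motion; the drift and volatility figures are invented, and this is not a model from the thesis itself.

```python
import random, math

def simulate_gbm(s0: float, mu: float, sigma: float, steps: int) -> list[float]:
    """One simulated daily price path under geometric Brownian motion."""
    dt = 1 / 252                      # one trading day in years
    prices = [s0]
    for _ in range(steps):
        z = random.gauss(0.0, 1.0)
        prices.append(prices[-1] * math.exp((mu - 0.5 * sigma**2) * dt
                                            + sigma * math.sqrt(dt) * z))
    return prices

# Average terminal price over many simulated paths (hypothetical parameters).
paths = [simulate_gbm(100.0, mu=0.05, sigma=0.2, steps=252) for _ in range(1000)]
print(sum(p[-1] for p in paths) / len(paths))
```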
Key issues in simulation include the acquisition of valid source information about the relevant selection of key characteristics and behaviors, the use of simplifying approximations and assumptions within the simulation, and the fidelity and validity of the simulation outcomes. Procedures and protocols for model verification and validation are an ongoing field of academic study, refinement, research and development in simulation technology and practice, particularly in the field of computer simulation.
[Figure: the simulation procedure]
Algorithms
Machine learning
Machine learning is a field of computer science that often uses statistical techniques to give computers the ability to "learn" (i.e., progressively improve performance on a specific task) from data, without being explicitly programmed [Samuel, Arthur (1959). "Some Studies in Machine Learning Using the Game of Checkers". IBM Journal of Research and Development.]
Analysts like to talk about the models they build in terms of the problem they solve. A model is a process that takes in observations and provides predictions. Many models have been built as applications of the simulation approach, for example the famous Black-Scholes model that predicts options prices. Those models were developed from mathematical formulas.
However, to deal with the problem of building an agent that can learn and adapt to its environment, we need the simulation approach in the form of machine learning. With machine learning, we do not build the model from direct observations as in classical modeling; we use data. The machine learning process is to take historical data and run it through a machine learning algorithm to generate the model. The model is built not by a human but by the machine itself. Then, when we need to use the model, we just provide some input and the output comes out automatically.
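A minimal sketch of this process using Scikit-learn, which the thesis names as one of its libraries; the feature matrix and labels here are random placeholders, not real market data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder historical data: rows are days, columns are features,
# labels are next-day price direction (1 = up, 0 = down).
rng = np.random.default_rng(0)
X_hist = rng.normal(size=(500, 3))
y_hist = rng.integers(0, 2, size=500)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_hist, y_hist)        # "run historical data through the algorithm"

x_today = rng.normal(size=(1, 3))
print(model.predict(x_today))    # the model produces the output automatically
```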
Application to stock data
The application of the machine learning approach to stock data is quite straightforward; the following figure describes how it works with historical stock data. The historical data represent the values of the features for a particular stock over the time horizon; we represent those features by stacking them one behind the other. We use machine learning algorithms to train our agent based on those features and the historical price.
[Figure: historical data with features (x): P/E, Bollinger band, moving average]
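As a sketch of how such features might be computed from a price series with pandas; the window length and band width are common conventions, not values specified by the thesis, and P/E is omitted because it requires earnings data.

```python
import pandas as pd

def make_features(close: pd.Series, window: int = 20) -> pd.DataFrame:
    """Stack a few common technical features, one column per feature."""
    sma = close.rolling(window).mean()      # moving average
    std = close.rolling(window).std()
    return pd.DataFrame({
        "moving_avg": sma,
        "boll_upper": sma + 2 * std,        # Bollinger bands: SMA +/- 2 std
        "boll_lower": sma - 2 * std,
    })

prices = pd.Series([100, 101, 99, 102, 103] * 10, dtype=float)  # placeholder
print(make_features(prices).tail())
```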
The trading agent can be conveniently modeled in the framework of reinforcement learning, as mentioned above. This framework adjusts the parameters of an agent to maximize the expected payoff or reward generated by its actions. The agent thereby learns a policy that tells it the actions it must perform to achieve its best performance. This optimal policy is exactly what we hope to find when we are building an automated trading strategy.
To solve the stochastic game of our agent, Markov decision processes (MDPs) are the most common model for implementing reinforcement learning; an MDP can be considered a narrowed-down stochastic game. The MDP model of the environment consists, among other things, of a discrete set of states S and a discrete set of actions taken from A. In this project we only specify the action set of our agent, because we assume that the other agents' actions are reflected in the price movement of the stock. Depending on the position of the learner (long or short), at each time step t the agent is allowed to choose an action a_t from a subset of the action space A, which consists of three possible actions:
A = {None, Long, Short}

where:

None indicates that the agent should not have any order in the market; and
Long and Short mean that the agent should execute a market order to buy or sell 100 stocks (the size of an order is always one hundred shares).
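The following sketch encodes this action space together with a position-dependent feasible subset; the subset rule is our reading of "different subsets from the action space" above, so treat it as an assumption rather than the thesis's exact rule.

```python
from enum import Enum

class Action(Enum):
    NONE = 0   # no order in the market
    LONG = 1   # market order to buy 100 shares
    SHORT = 2  # market order to sell 100 shares

ORDER_SIZE = 100  # shares, fixed by the thesis's setup

def feasible_actions(position: int) -> set[Action]:
    """Assumed rule: a long position may not add another long, and vice versa."""
    if position > 0:        # currently long
        return {Action.NONE, Action.SHORT}
    if position < 0:        # currently short
        return {Action.NONE, Action.LONG}
    return {Action.NONE, Action.LONG, Action.SHORT}

print(feasible_actions(0))
```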
So, at each discrete time step t, the agent senses the current state s_t and chooses to take an action a_t. The environment responds by providing the agent a reward r_t = r(s_t, a_t) and by producing the succeeding state s_{t+1} = δ(s_t, a_t). The functions r and δ depend only on the current state and action (the process is memoryless); they are part of the environment and are not necessarily known to the agent.
The task of the agent is to learn a policy π: S → A that maps each state to an action, selecting its next action a_t based solely on the current observed state s_t, that is, π(s_t) = a_t. The optimal policy, or control strategy, is the one that produces the greatest possible cumulative reward over time. So, stating that:

V^π(s_t) = r_t + γ r_{t+1} + γ² r_{t+2} + … = Σ_{i=0}^∞ γ^i r_{t+i}

where V^π(s_t) is called the discounted cumulative reward, representing the cumulative value achieved by following policy π from an initial state s_t, and γ ∈ [0, 1) is a constant that determines the relative value of delayed versus immediate rewards.

If we set γ = 0, only immediate rewards are considered. As γ approaches 1, future rewards are given greater emphasis relative to the immediate reward. The optimal policy π* that maximizes V^π(s) for all states s can be written as:

π* = argmax_π V^π(s), ∀s ∈ S.
However, learning π* directly is difficult because the available training data does not provide training examples of the form (s, a). Instead, the only available information is the sequence of immediate rewards r(s_i, a_i) for i = 0, 1, 2, …

So, as we are trying to maximize the cumulative reward V^π(s) for all states s, the agent should prefer state s_1 over state s_2 whenever V*(s_1) > V*(s_2). Given that the agent must choose among actions and not states, and that it cannot perfectly predict the immediate reward and immediate successor for every possible state-action transition, we must learn V* indirectly.
To solve this, we define a function Q(s, a) such that its value is the maximum discounted cumulative reward that can be achieved starting from state s and applying action a as the first action. So, we can write:

Q(s, a) = r(s, a) + γ V*(δ(s, a)).
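Since the chapter builds toward learning Q from experience, here is a minimal sketch of the standard tabular Q-learning update, which estimates this function from sampled transitions without knowing r and δ in advance; the environment interface and hyperparameters are illustrative assumptions.

```python
from collections import defaultdict
import random

alpha, gamma, epsilon = 0.1, 0.9, 0.1   # assumed hyperparameters
ACTIONS = ["None", "Long", "Short"]
Q = defaultdict(float)                  # Q[(state, action)] -> estimated value

def choose_action(state) -> str:
    """Epsilon-greedy exploration over the current Q estimates."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state) -> None:
    """One Q-learning step: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```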