
The BTC-USD data for 7th December 2020 was used to obtain the feature importance values with the MDI, MDA and SFI metrics, in order to select the most important features to use as input to the Alpha-AS neural network model. At each training step the parameters of the prediction DQN are updated using gradient descent. An early stopping strategy, holding out 25% of the training set, is followed to avoid overfitting. The architecture of the target DQN is identical to that of the prediction DQN, the parameters of the former being copied from the latter every 8 hours. The prediction DQN receives as input the state-defining features, with their values normalised, and it outputs a value between 0 and 1 for each action.
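As a rough sketch of this prediction/target arrangement (in PyTorch; the two 104-neuron hidden layers match the architecture detailed further down, while the feature and action counts are placeholder assumptions):

```python
import copy
import torch
import torch.nn as nn

class PredictionDQN(nn.Module):
    """Two hidden layers of 104 ReLU neurons; the sigmoid head outputs
    a value between 0 and 1 for each action, as described above."""
    def __init__(self, n_features: int = 13, n_actions: int = 20):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 104), nn.ReLU(),
            nn.Linear(104, 104), nn.ReLU(),
            nn.Linear(104, n_actions), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

prediction_dqn = PredictionDQN()
target_dqn = copy.deepcopy(prediction_dqn)  # identical architecture

def sync_target() -> None:
    # Copy the prediction network's parameters into the target network;
    # the paper does this every 8 hours of trading.
    target_dqn.load_state_dict(prediction_dqn.state_dict())
```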

Keeping the rest of the parameters as in Table 1, our expectations match the solutions obtained, which can be tracked in Tables 7 and 8. As the drift increases, the trader expects the price to move up and sends orders at higher prices to profit from the increase, which meets our expectation. The results also show that our strategy has a lower standard deviation. It can also be seen that the trader's inventory reverts to zero more quickly than under the symmetric strategy, and that the standard deviation of the inventory is lower under our strategy. This part presents the numerical experiments and the behaviour of the market maker, following the results given in the preceding section.


From the negative values in the Max DD columns, we see that Alpha-AS-1 had a larger Max DD (i.e., performed worse) than Gen-AS on 16 of the 30 test days. However, on 13 of those days Alpha-AS-1 achieved a better P&L-to-MAP score than Gen-AS, substantially so in many instances. Only on one day was the trend reversed, with Gen-AS performing slightly worse than Alpha-AS-1 on Max DD, but then performing better than Alpha-AS-1 on P&L-to-MAP. Table 6 compares the results of the Alpha-AS models, combined, against the two baseline models and Gen-AS.

The price to pay is diminished nuance in the learning from very large values, while retaining a higher sensitivity for the majority of values, which are much smaller. By truncating we also limit potentially spurious effects of noise in the data, which can be particularly acute with cryptocurrency data. A second problem with Q-learning is that performance can be unstable: increasing the number of training experiences may result in a decrease in performance; effectively, a loss of learning. To improve stability, a DQN stores its experiences in a replay buffer and samples from it when updating the value function given by Eq , where now the Q-value estimates are not stored in a matrix but are obtained as the outputs of the neural network, given the current state as its input.
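A minimal replay-buffer sketch follows (Python; the capacity and batch size are illustrative choices of ours, not values from the paper):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences drop out first

    def push(self, state, action, reward, next_state, done) -> None:
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int = 32):
        # Uniform random sampling breaks the temporal correlation between
        # consecutive experiences, which is the source of the stability gain.
        return random.sample(self.buffer, batch_size)

    def __len__(self) -> int:
        return len(self.buffer)
```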


The minimum spread is given by the minimum_spread parameter, expressed as a percentage of the mid price. It serves as a hard limit below which orders won't be placed, should users choose to ensure that buy and sell orders are never placed too close to each other, which could be detrimental to the market maker's earned fees. By default its value is 0, so the strategy places orders at the optimal bid and ask prices. Consequently, the Alpha-AS agent adapts its bid and ask order prices dynamically, reacting closely (at 5-second steps) to the changing market. This 5-second interval allows the Alpha-AS algorithm to acquire experience trading repeatedly with a certain bid and ask price under quasi-current market conditions. As we shall see in Section 4.2, the parameters of the direct Avellaneda-Stoikov model to which we compare the Alpha-AS model are fixed at a parameter-tuning step once every 5 days of trading data.
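A hypothetical illustration of enforcing this floor around the mid price (the function below is our own sketch, not the client's actual code):

```python
def apply_minimum_spread(bid: float, ask: float, mid: float,
                         minimum_spread_pct: float = 0.0) -> tuple[float, float]:
    """Push the optimal bid/ask out to the minimum spread if they sit too
    close to the mid price. With the default of 0, the optimal
    (Avellaneda-Stoikov) quotes are used unchanged."""
    half_spread = mid * (minimum_spread_pct / 100.0) / 2.0
    return min(bid, mid - half_spread), max(ask, mid + half_spread)

# Example: a 0.2% minimum spread around a mid price of 20,000
# moves quotes of (19999, 20001) out to (19980.0, 20020.0).
print(apply_minimum_spread(19_999.0, 20_001.0, 20_000.0, minimum_spread_pct=0.2))
```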

The 0 subscript denotes the best orderbook price level on the ask and on the bid side, i.e., the price levels of the lowest ask and of the highest bid, respectively. The state is described by market indicators, consisting of features characterising the environment. A discount factor (γ) gives future rewards less weight than more immediate ones when estimating the value of an action (an action's value is its relative worth in terms of the maximization of the cumulative reward at termination time). To maximize trade profitability, spreads should be enlarged such that the expected future value of the account is maximized.
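For reference, γ plays its usual role in the textbook Q-learning update rule (the standard formula, not one quoted from this paper):

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```

With γ near 0 the agent is myopic; with γ near 1, distant rewards weigh almost as much as immediate ones.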


More precisely, we investigate an optimal trading strategy under these extended price models and different utility functions, proposing two independent counting processes to express the market impact of arriving and filled limit orders, motivated by Cartea and Jaimungal. We derive closed-form solutions for the optimal quotes and solve the corresponding nonlinear HJB equations using a finite difference discretization method, which enables us to evaluate the spread values and derive various simulation analyses. Furthermore, we explore risk and normality tests of the models depending on their strategies. Lastly, we compare the models derived in this paper with existing optimal market making models in the literature under both quadratic and exponential utility functions.

Moreover, in practice the ability to get out of positions with back-of-queue orders is very important and is completely exogenous to the model. This potential weakness of the analytical AS approach notwithstanding, we believe the theoretical optimality of its output approximations is not to be undervalued. On the contrary, we find value in using it as a starting point from which to diverge dynamically, taking into account the most recent market behaviour. Tables 2 to 5 show performance results over 30 days of test data, by indicator (2: Sharpe ratio; 3: Sortino ratio; 4: Max DD; 5: P&L-to-MAP), for the two baseline models, the Avellaneda-Stoikov model with genetically optimised parameters (Gen-AS) and the two Alpha-AS models.

Characterisation of different market conditions and specific training under them, with appropriate data, can also broaden and improve the agent's strategic repertoire. The agent's action space itself can potentially be enriched profitably, by adding more values for the agent to choose from and making more parameters settable by the agent, beyond the two used in the present study (i.e., risk aversion and skew). In the present study we have simply chosen finite value sets for these two parameters that we deem reasonable for modelling trading strategies of differing levels of risk. This keeps the models simple and shortens the training time of the neural network, allowing us to test the idea of combining the Avellaneda-Stoikov procedure with reinforcement learning.
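As a hypothetical sketch of such a discrete action space (the value grids below are invented for illustration and are not the sets used in the paper):

```python
from itertools import product

# Candidate values for the two parameters the agent sets at each step.
RISK_AVERSION_VALUES = [0.01, 0.1, 0.5, 1.0]   # gamma in the AS formulas
SKEW_VALUES = [-0.5, -0.25, 0.0, 0.25, 0.5]    # relative shift applied to the quotes

# One action = one (risk aversion, skew) pair; enriching either grid, or
# adding a third settable parameter, multiplies the number of actions and
# hence the DQN's training time.
ACTIONS = list(product(RISK_AVERSION_VALUES, SKEW_VALUES))
print(len(ACTIONS))  # 20
```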

Kumar, who uses Spooner's algorithm as a benchmark, proposes using deep recurrent Q-networks as an improved alternative to DQNs for a time-series data environment such as trading. Gašperov and Konstanjčar tackle the problem by means of an ensemble of supervised learning models that provide predictive buy/sell signals as inputs to a DRL network trained with a genetic algorithm. The same authors have recently explored the use of a soft actor-critic RL algorithm in market making, to obtain a continuous action space of spread values. Comprehensive examinations of the use of RL in market making can be found in Gašperov et al. and Patel. The training of the neural network has room for improvement through systematic optimisation of the network's parameters.

In this, the most time-consuming step of the backtest process, our algorithms learned from their trading environment what AS model parameter values to choose every five seconds of trading (in those 5 seconds; see Section 4.1.3). For the case of a quadratic utility function, we derive the optimal spreads for limit orders and observe their behaviour. For this purpose, we obtain an appropriate solution with the final condition and show that this solution verifies the value function. Recently, there have been crucial developments in quantitative financial strategies for executing orders in markets driven by computer programs at very high speed. In particular, high-frequency trading is one of the major topics attracting attention due to its effects on market microstructure; it is an interdisciplinary field spanning such topics as stochastic optimization, finance, economics and statistics.

In comparison, both the mean and the standard deviation of the Max DD for the Alpha-AS models were very high. Indeed, the differences in Max DD performance between Gen-AS and either of the Alpha-AS models, over all test days, are not statistically significant, despite the large differences in means. The latter are the result of extreme outliers for the Alpha-AS models, from days on which these obtained a very poor (i.e., high) value for Max DD. The medians, however, are very similar to the median for the Gen-AS model. Mann-Whitney tests were run comparing the four daily performance indicator values (Sharpe, Sortino, Max DD and P&L-to-MAP) obtained for the Gen-AS model with the corresponding values obtained for the other models over the 30 test days, and we also count the number of days on which either Alpha-AS-1 or Alpha-AS-2 scored best out of all tested models for each of the four performance indicators.
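A sketch of one such test using scipy, on synthetic placeholder data (the real inputs would be the 30 daily values of one indicator for two models):

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
gen_as_maxdd = rng.normal(-0.05, 0.02, size=30)    # placeholder daily Max DD, Gen-AS
alpha_as_maxdd = rng.normal(-0.06, 0.05, size=30)  # placeholder, Alpha-AS (wider spread)

stat, p = mannwhitneyu(gen_as_maxdd, alpha_as_maxdd, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.3f}")
# Being rank-based, the test is insensitive to the extreme outliers that
# inflate the Alpha-AS means, which is why the differences can fail to be
# significant despite large differences in mean Max DD.
```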


We use reinforcement learning to modify the risk aversion parameter of the Avellaneda-Stoikov algorithm and to skew the AS quotes based on a characterization of the latest steps of market activity. Another distinctive feature of our work is the use of a genetic algorithm to determine the parameters of the AS formulas, which we use as a benchmark, to offer a fairer performance comparison to our RL algorithm. The goal of this paper is first to propose an optimal quoting strategy that incorporates stochastic volatility, a drift effect and the market impact of the amount and type of the orders in the price dynamics. We also consider the case of market impact occurring through jumps in the volatility dynamics.

  • Before any estimates can be given, both estimators need to have their buffers filled.
  • Balancing exploration and exploitation advantageously is a central challenge in RL.
  • By default the lengths of these buffers are set to be 200 ticks.
  • Comparison of values for Max DD and P&L-to-MAP between the Gen-AS model and the Alpha-AS models (αAS1 and αAS2).

The DQN has two hidden layers, each with 104 neurons, all applying a ReLU activation function. In the reward expression, Ψ(τi) is the open P&L for the 5-second action time step, I(τi) is the inventory held by the agent, and Δm(τi) is the speculative P&L (the difference between the open P&L and the close P&L) at time τi, the end of the ith 5-second agent action cycle. The target for the random forest classifier is simply the sign of the difference in mid-prices at the start and the end of each 5-second timestep.
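A minimal sketch of that classifier target in scikit-learn (the mid-price series and feature matrix below are placeholders):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

mid_prices = np.array([100.0, 100.2, 100.1, 100.1, 99.9, 100.3])  # placeholder series
y = np.sign(np.diff(mid_prices))  # label per 5-second step: +1 up, -1 down, 0 flat

rng = np.random.default_rng(0)
X = rng.normal(size=(len(y), 5))  # placeholder state features for each step

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)  # train on historical 5-second steps
```

The MDI importances used in the feature-selection step described earlier can then be read from clf.feature_importances_.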


Through repeated exploration the agent gradually learns the relationships between states, actions and rewards. It can then start exploiting this knowledge to apply an action selection policy that takes it closer to achieving its reward maximization goal. The trading_intensity estimator is designed to be consistent with the ideas outlined in the Avellaneda-Stoikov paper. The instant_volatility estimator defines volatility as the deviation of prices from one tick to another relative to a zero-change price action. The higher the value, the more aggressively the strategy will move to reach the inventory_target_base_pct, increasing the distance between the reservation price and the market mid price. That problem is introduced with a quadratic utility function and solved by providing a closed-form solution.
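A sketch of such an estimator under our reading of that description (the function is our own; the 200-tick window follows the default buffer length mentioned in the list above):

```python
import numpy as np

def instant_volatility(prices: np.ndarray, window: int = 200) -> float:
    """Volatility as the RMS tick-to-tick relative price change, i.e. the
    deviation measured against a zero-change baseline rather than the mean."""
    ticks = prices[-(window + 1):]
    returns = np.diff(ticks) / ticks[:-1]
    return float(np.sqrt(np.mean(returns ** 2)))
```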


The large amount of data available in these fields makes it possible to run reliable environment simulations with which to train DRL algorithms. DRL is widely used in the algorithmic trading world, primarily to determine the best action to take in trading by candles, by predicting what the market is going to do. For instance, Lee and Jangmin used Q-learning with two pairs of agents cooperating to predict market trends (through two “signal” agents, one on the buy side and one on the sell side) and determine a trading strategy (through a buy “order” agent and a sell “order” agent). RL has also been used to pace buying and selling optimally, in order to reduce the market impact of high-volume trades, which would otherwise damage the trader's returns.


More advanced models have been developed with adverse selection effects and stronger market order dynamics; see for example the paper of Cartea et al. Guéant et al. have extended and formalized the results of Avellaneda and Stoikov. Another extended market making model with inventory constraints has been provided by Fodra and Labadie, who consider a general case of midprice dynamics under linear and exponential utility criteria and find closed-form solutions for the optimal spreads. Cartea and Jaimungal have proposed a solution to the problem of including the market impact on the midprice and have worked on risk metrics for the high-frequency trading strategies they developed. Moreover, Yang et al. have improved the existing models with the Heston stochastic volatility model, to characterize the volatility of the stock price with price impact, and implemented an approximation method to solve the nonlinear HJB equation. They considered a constant price impact, using the same counting processes for both arrival and filled limit orders.