Artificial Intelligence · Preprint

Machine Learning Approaches to Order Flow Prediction in Indian Equity Markets

Abstract

We investigate the application of machine learning techniques to predict short-term order flow dynamics in the Indian equity market. Using a novel dataset of level-2 order book snapshots from the National Stock Exchange of India (NSE), we compare the predictive performance of gradient boosting, LSTM networks, and transformer-based architectures for forecasting order imbalance, trade direction, and short-term price movements. Our results demonstrate that attention-based models achieve superior performance in capturing the complex, non-linear relationships inherent in high-frequency order flow data.

Authors: TRW AI Research Lab, Quantitative Engineering Division
April 2026

Introduction

Order flow — the sequence and characteristics of buy and sell orders arriving at a market — contains rich information about the supply-demand dynamics driving price formation. Traditional market microstructure models rely on simplified assumptions about order arrival processes, often failing to capture the complex, non-linear patterns present in real market data.

Recent advances in machine learning, particularly in sequence modeling and attention mechanisms, offer promising new approaches to understanding and predicting order flow dynamics. This paper applies these techniques to the Indian equity market, where the growing scale and complexity of electronic trading create both challenges and opportunities for data-driven analysis.

The application of machine learning to market microstructure has grown rapidly. Key contributions include:

  • Sirignano & Cont (2019): Universal features in price formation across equity markets using deep learning
  • Zhang et al. (2019): DeepLOB architecture for limit order book modeling
  • Kolm et al. (2021): Comprehensive survey of ML applications in market making and order flow

Our work differs in its focus on the Indian market, which has distinct characteristics, including its tick size structure, trading-hour patterns, and participant composition.

Data and Features

Dataset

We use level-2 order book data from the NSE for 50 liquid equities over a 12-month period (January to December 2025), comprising:

  • Order book snapshots: Top 20 levels of bid and ask prices and quantities, sampled at 100ms intervals
  • Trade records: Individual trade executions with timestamps, prices, and quantities
  • Market context: Index levels, sector indices, and aggregate market volume

Feature Engineering

We construct three categories of features:

Order Book Features:

  • Bid-ask spread (absolute and relative)
  • Order book imbalance at multiple levels
  • Depth-weighted mid-price
  • Queue position indicators
  • Volume-weighted price levels
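A minimal sketch of the snapshot-level order book features, assuming each snapshot arrives as best-first arrays of bid/ask prices and quantities. The paper does not define its depth-weighted mid-price, so the sketch assumes the common level-1 "microprice" (each side's price weighted by the opposite side's size). The helper name `order_book_features` and the choice of levels are hypothetical, and queue-position and volume-weighted price-level features are omitted.

```python
import numpy as np

def order_book_features(bid_px, bid_qty, ask_px, ask_qty, levels=(1, 3, 5)):
    """Snapshot-level order book features (hypothetical helper).

    Inputs are best-first arrays: index 0 is the best bid / best ask.
    """
    bid_px, bid_qty = np.asarray(bid_px, float), np.asarray(bid_qty, float)
    ask_px, ask_qty = np.asarray(ask_px, float), np.asarray(ask_qty, float)

    best_bid, best_ask = bid_px[0], ask_px[0]
    mid = 0.5 * (best_bid + best_ask)
    spread = best_ask - best_bid

    feats = {
        "spread_abs": spread,
        "spread_rel": spread / mid,  # spread as a fraction of the mid-price
        # Level-1 microprice: leans toward the ask when the bid queue is larger
        "weighted_mid": (best_bid * ask_qty[0] + best_ask * bid_qty[0])
        / (bid_qty[0] + ask_qty[0]),
    }
    for k in levels:
        b, a = bid_qty[:k].sum(), ask_qty[:k].sum()
        # Cumulative imbalance over the top k levels, in [-1, 1]; > 0 means bid-heavy
        feats[f"imbalance_l{k}"] = (b - a) / (b + a)
    return feats

# Example: a bid-heavy top of book
f = order_book_features(
    bid_px=[100.00, 99.95, 99.90], bid_qty=[300, 200, 100],
    ask_px=[100.05, 100.10, 100.15], ask_qty=[100, 200, 300],
    levels=(1, 3),
)
```

Here the level-1 imbalance is 0.5 while the three-level imbalance is 0, which illustrates why the paper computes imbalance at several depths rather than only at the touch.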

Trade Flow Features:

  • Trade direction (Lee-Ready algorithm adapted for Indian market conventions)
  • Order imbalance ratios over multiple windows (1s, 5s, 30s, 1min, 5min)
  • Volume-weighted average trade size
  • Aggressive vs. passive order ratios
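The trade-flow features can be sketched as follows, assuming each trade comes with the prevailing quote mid already attached. This is the textbook Lee-Ready rule (quote test first, tick test for trades exactly at the mid). The paper's adaptations for Indian market conventions are not described, and the original rule's quote-lag convention is left to the caller. The windowed ratio is assumed to be (buy volume - sell volume) / total volume over a trailing window in seconds; both function names are hypothetical.

```python
import numpy as np

def lee_ready(trade_px, mid_px):
    """Sign trades: +1 buyer-initiated, -1 seller-initiated, 0 unclassified.

    Quote test first; trades exactly at the mid fall back to the tick test.
    """
    signs = np.zeros(len(trade_px), dtype=int)
    last_tick = 0        # direction of the most recent non-zero price change
    prev_px = None
    for i, (p, m) in enumerate(zip(trade_px, mid_px)):
        if prev_px is not None and p != prev_px:
            last_tick = 1 if p > prev_px else -1
        if p > m:
            signs[i] = 1
        elif p < m:
            signs[i] = -1
        else:
            signs[i] = last_tick  # tick test (zero ticks inherit the last change)
        prev_px = p
    return signs

def trade_imbalance(ts, signs, qty, window):
    """(buy vol - sell vol) / total vol over the trailing (t - window, t] seconds."""
    ts = np.asarray(ts, float)
    qty = np.asarray(qty, float)
    cum_signed = np.concatenate([[0.0], np.cumsum(np.asarray(signs) * qty)])
    cum_qty = np.concatenate([[0.0], np.cumsum(qty)])
    start = np.searchsorted(ts, ts - window, side="right")  # first trade in window
    end = np.arange(1, len(ts) + 1)                         # one past current trade
    return (cum_signed[end] - cum_signed[start]) / (cum_qty[end] - cum_qty[start])

signs = lee_ready([100.00, 100.05, 100.05, 100.00],
                  [100.025, 100.025, 100.05, 100.025])
imb_1s = trade_imbalance([0.0, 0.4, 0.9, 1.5], signs, [100, 200, 50, 150], window=1.0)
```

Calling `trade_imbalance` once per window length (1, 5, 30, 60, and 300 seconds) yields the multi-window ratios listed above. Prefix sums keep each call linear in the number of trades.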

Contextual Features:

  • Time-of-day indicators (opening, closing, lunch hour patterns)
  • Relative spread to daily average
  • Distance from daily VWAP
  • Realized volatility over rolling windows
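A sketch of three of the contextual features for a single trading day sampled at fixed intervals. Two assumptions are not from the paper. First, "daily average" spread and VWAP are computed as running values up to the current interval, so no later-in-the-day information leaks in. Second, realized volatility is taken as the square root of the sum of squared log returns over the last `rv_window` intervals. The function name and window are illustrative, and time-of-day indicators are omitted.

```python
import numpy as np

def contextual_features(px, vol, spread, rv_window):
    """Per-interval contextual features for one trading day (hypothetical helper).

    px: price per interval; vol: traded volume per interval (cumulative volume
    must be positive from the first interval); spread: quoted spread per interval.
    """
    px, vol, spread = (np.asarray(a, float) for a in (px, vol, spread))
    # Running intraday VWAP and the distance from it in basis points
    vwap = np.cumsum(px * vol) / np.cumsum(vol)
    dist_vwap_bps = 1e4 * (px / vwap - 1.0)
    # Spread relative to its running daily average (past and current data only)
    rel_spread = spread / (np.cumsum(spread) / np.arange(1, len(spread) + 1))
    # Rolling realized volatility from squared log returns
    r2 = np.concatenate([[0.0], np.diff(np.log(px)) ** 2])
    c = np.concatenate([[0.0], np.cumsum(r2)])
    idx = np.arange(len(px))
    start = np.maximum(0, idx - rv_window + 1)
    realized_vol = np.sqrt(c[idx + 1] - c[start])
    return dist_vwap_bps, rel_spread, realized_vol

dist, rel, rv = contextual_features(
    px=[100.0, 101.0, 100.0, 102.0], vol=[1, 1, 2, 0],
    spread=[0.05, 0.05, 0.10, 0.05], rv_window=2,
)
```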

Model Architectures

Gradient Boosted Trees (XGBoost)

We train XGBoost models on tabular feature representations, tuning hyperparameters with Bayesian optimization. This serves as our primary baseline, since gradient-boosted trees remain among the strongest and most widely used methods for tabular data.

LSTM Networks

Bidirectional LSTM networks process sequential order book snapshots, capturing temporal dependencies in the order flow:

  • 2-layer BiLSTM with 128 hidden units per direction
  • Attention pooling over the sequence dimension
  • Dropout regularization (0.3) and gradient clipping
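The paper does not specify the form of its attention pooling. The sketch below assumes additive (Bahdanau-style) scoring over the BiLSTM output sequence and uses random parameters purely for illustration. The recurrent layers themselves are omitted; in practice they would come from a framework layer such as PyTorch's `torch.nn.LSTM` with `num_layers=2` and `bidirectional=True`. The lookback length and attention size are hypothetical.

```python
import numpy as np

def attention_pool(H, W, b, v):
    """Additive attention pooling over the time axis.

    H: (T, d) sequence of BiLSTM outputs; with 128 units per direction, d = 256.
    W: (d, a), b: (a,), v: (a,) are learned parameters (a = attention size).
    Returns the pooled (d,) summary and the (T,) attention weights.
    """
    scores = np.tanh(H @ W + b) @ v      # one relevance score per time step
    scores = scores - scores.max()       # shift for a numerically stable softmax
    alpha = np.exp(scores)
    alpha /= alpha.sum()
    return alpha @ H, alpha              # weighted average of the time steps

rng = np.random.default_rng(0)
T, d, a = 50, 2 * 128, 64                # 50-snapshot lookback (illustrative)
H = rng.standard_normal((T, d))
pooled, alpha = attention_pool(H, 0.1 * rng.standard_normal((d, a)),
                               np.zeros(a), rng.standard_normal(a))
```

Compared with taking only the last hidden state, pooling lets the classifier weight whichever snapshots in the lookback window are most informative. With `v` set to zero it reduces to plain mean pooling.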

Transformer Architecture

We implement a custom transformer encoder adapted for financial time series:

  • 4-layer transformer encoder with 8 attention heads
  • Positional encoding adapted for irregular time intervals
  • Feature-wise attention to learn relationships between order book levels
  • Causal masking to prevent information leakage
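The paper does not say how its positional encoding handles irregular intervals. One common approach, assumed here, is to evaluate the standard sinusoidal encoding at elapsed time in seconds rather than at integer positions. The sin and cos halves are concatenated rather than interleaved, an equivalent layout up to a permutation of dimensions. The causal mask sets scores for future steps to minus infinity before the softmax, so step t attends only to steps up to t. This is a single head with no learned projections and no feature-wise attention; the function names and `max_period` are illustrative.

```python
import numpy as np

def time_encoding(t_sec, d_model, max_period=10_000.0):
    """Sinusoidal encoding evaluated at real timestamps (d_model must be even)."""
    t = np.asarray(t_sec, float)[:, None]            # (T, 1)
    i = np.arange(d_model // 2)[None, :]             # (1, d_model / 2)
    freq = 1.0 / max_period ** (2 * i / d_model)
    angles = t * freq
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)  # (T, d_model)

def causal_attention(Q, K, V):
    """Single-head scaled dot-product attention; step t sees only steps <= t."""
    T, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)
    future = np.triu(np.ones((T, T), dtype=bool), k=1)  # True above the diagonal
    scores = np.where(future, -np.inf, scores)
    scores -= scores.max(axis=1, keepdims=True)         # diagonal keeps rows finite
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)
    return w @ V, w

rng = np.random.default_rng(0)
t = np.array([0.0, 0.1, 0.35, 0.4, 1.2])   # irregular arrival times in seconds
X = rng.standard_normal((len(t), 16)) + time_encoding(t, 16)
out, w = causal_attention(X, X, X)
```

Because masked entries are replaced before the softmax, perturbing the last input leaves every earlier output unchanged, which is the leakage guarantee the causal mask is meant to provide.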

Results

Prediction Tasks

We evaluate models on three prediction tasks with 5-minute forward horizons:

Model         Order Imbalance (R²)   Trade Direction (AUC)   Price Movement (Accuracy)
XGBoost       0.31                   0.72                    61.3%
BiLSTM        0.35                   0.74                    63.1%
Transformer   0.41                   0.78                    65.8%
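The paper does not specify how its targets are built. As an illustration for the price-movement task only: a 5-minute horizon at the 100 ms snapshot interval corresponds to 5 × 60 × 1000 / 100 = 3000 snapshots ahead. The sketch assumes a three-class label from the sign of the mid-price change over that horizon. The paper's actual class definition, for example any dead zone around zero, is not stated, and the function name is hypothetical.

```python
import numpy as np

SNAPSHOT_MS = 100                               # order book sampling interval
HORIZON_STEPS = 5 * 60 * 1000 // SNAPSHOT_MS    # 5-minute horizon = 3000 snapshots

def forward_direction_labels(mid, horizon=HORIZON_STEPS):
    """+1 / -1 / 0 for an up / down / unchanged mid-price `horizon` steps ahead.

    The final `horizon` snapshots have no future price and are returned as NaN.
    """
    if horizon < 1:
        raise ValueError("horizon must be >= 1")
    mid = np.asarray(mid, float)
    labels = np.full(len(mid), np.nan)
    labels[:-horizon] = np.sign(mid[horizon:] - mid[:-horizon])
    return labels
```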

Key Observations

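  1. Transformer superiority: The transformer outperforms the BiLSTM on every task, by 0.06 in R² (0.41 vs. 0.35), 0.04 in AUC (0.78 vs. 0.74), and 2.7 percentage points in accuracy (65.8% vs. 63.1%), relative gains of roughly 17%, 5%, and 4%. This is consistent with the attention mechanism capturing cross-level relationships in the order book that sequential models miss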

  2. Feature importance: Order book imbalance at levels 1-3 and recent trade flow direction are the most predictive features across all models

  3. Time-of-day effects: Model performance varies significantly by time of day, with the highest accuracy during the first and last 30 minutes of trading

  4. Stock specificity: Performance varies by stock liquidity, with the most liquid stocks (top 10 by volume) showing 3-5% higher prediction accuracy
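The gaps between models can be computed directly from the results table. The small check below copies the table values. The R² and AUC gaps are absolute differences in the metric, while the accuracy gap is in percentage points; "relative" here means the gap divided by the baseline model's score.

```python
# Scores from the results table: (order-imbalance R², trade-direction AUC, accuracy %)
results = {
    "XGBoost": (0.31, 0.72, 61.3),
    "BiLSTM": (0.35, 0.74, 63.1),
    "Transformer": (0.41, 0.78, 65.8),
}
metrics = ("R2", "AUC", "Accuracy")

def gains(model, baseline):
    """Absolute and relative (%) improvement of `model` over `baseline` per metric."""
    out = {}
    for name, m, b in zip(metrics, results[model], results[baseline]):
        out[name] = (round(m - b, 4), round(100 * (m - b) / b, 1))
    return out

print(gains("Transformer", "BiLSTM"))
# {'R2': (0.06, 17.1), 'AUC': (0.04, 5.4), 'Accuracy': (2.7, 4.3)}
```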

Practical Applications

Integration with Strat AI

The order flow prediction models developed in this research are being integrated into the Strat AI platform to provide:

  • Real-time order flow analysis: Visual representation of predicted buying/selling pressure
  • Entry timing optimization: Alerts when order flow conditions suggest favorable entry points
  • Risk warnings: Detection of unusual order flow patterns that may precede adverse price movements

Limitations

  • Models require low-latency data feeds for real-time deployment
  • Performance degrades during market stress events (regime changes)
  • Training data requirements are substantial (minimum 6 months per instrument)

Conclusion

Our results show that modern machine learning architectures, particularly attention-based transformers, capture the complex dynamics of order flow in the Indian equity market more effectively than gradient-boosting and recurrent baselines. The transformer's advantage over the BiLSTM suggests that cross-level relationships in the order book contain predictive information that sequential models do not fully exploit.

Future work will extend these models to the derivatives segment and investigate multi-market order flow dynamics across correlated instruments.


This preprint is part of TRW’s research program on applying advanced AI techniques to Indian financial markets. The models described are under active development for integration into the Strat AI platform.

Keywords: machine learning, order flow, deep learning, market microstructure, NLP, transformers