Machine Learning Approaches to Order Flow Prediction in Indian Equity Markets
Abstract
We investigate the application of machine learning techniques to predict short-term order flow dynamics in the Indian equity market. Using a novel dataset of level-2 order book snapshots from the NSE, we compare the predictive performance of gradient boosting, LSTM networks, and transformer-based architectures for forecasting order imbalance, trade direction, and short-term price movements. Our results demonstrate that attention-based models achieve superior performance in capturing the complex, non-linear relationships inherent in high-frequency order flow data.
Introduction
Order flow — the sequence and characteristics of buy and sell orders arriving at a market — contains rich information about the supply-demand dynamics driving price formation. Traditional market microstructure models rely on simplified assumptions about order arrival processes, often failing to capture the complex, non-linear patterns present in real market data.
Recent advances in machine learning, particularly in sequence modeling and attention mechanisms, offer promising new approaches to understanding and predicting order flow dynamics. This paper applies these techniques to the Indian equity market, where the growing scale and complexity of electronic trading create both challenges and opportunities for data-driven analysis.
Related Work
The application of machine learning to market microstructure has grown rapidly. Key contributions include:
- Sirignano & Cont (2019): Universal features in price formation across equity markets using deep learning
- Zhang et al. (2019): DeepLOB architecture for limit order book modeling
- Kolm et al. (2021): Comprehensive survey of ML applications in market making and order flow
Our work differs in its focus on the Indian market, which presents distinct characteristics including different tick size structures, trading hour patterns, and participant composition.
Data and Features
Dataset
We use level-2 order book data from the NSE for 50 liquid equities over a 12-month period (January 2025 to December 2025), comprising:
- Order book snapshots: Top 20 levels of bid and ask prices and quantities, sampled at 100ms intervals
- Trade records: Individual trade executions with timestamps, prices, and quantities
- Market context: Index levels, sector indices, and aggregate market volume
Feature Engineering
We construct three categories of features:
Order Book Features:
- Bid-ask spread (absolute and relative)
- Order book imbalance at multiple levels
- Depth-weighted mid-price
- Queue position indicators
- Volume-weighted price levels
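As an illustration of the order book features listed above, the following is a minimal sketch, not the pipeline used in this study. It assumes each snapshot is given as two lists of (price, quantity) pairs, best level first, with positive quantity on both sides. The function names are hypothetical, and the cross-weighted form of the depth-weighted mid-price is one common definition chosen for illustration.

```python
from typing import List, Tuple

Level = Tuple[float, float]  # (price, quantity), best level first


def spread(bids: List[Level], asks: List[Level]) -> Tuple[float, float]:
    """Return (absolute spread, spread relative to the mid-price)."""
    best_bid, best_ask = bids[0][0], asks[0][0]
    mid = (best_bid + best_ask) / 2.0
    absolute = best_ask - best_bid
    return absolute, absolute / mid


def imbalance(bids: List[Level], asks: List[Level], depth: int) -> float:
    """Signed imbalance in [-1, 1] over the top `depth` levels."""
    bid_qty = sum(q for _, q in bids[:depth])
    ask_qty = sum(q for _, q in asks[:depth])
    total = bid_qty + ask_qty
    return 0.0 if total == 0 else (bid_qty - ask_qty) / total


def depth_weighted_mid(bids: List[Level], asks: List[Level], depth: int) -> float:
    """Assumed definition: each side's volume-weighted price over `depth`
    levels, weighted by the opposite side's total quantity."""
    bid_qty = sum(q for _, q in bids[:depth])
    ask_qty = sum(q for _, q in asks[:depth])
    bid_vwap = sum(p * q for p, q in bids[:depth]) / bid_qty
    ask_vwap = sum(p * q for p, q in asks[:depth]) / ask_qty
    return (bid_vwap * ask_qty + ask_vwap * bid_qty) / (bid_qty + ask_qty)


# Hypothetical three-level snapshot.
bids = [(100.0, 300), (99.95, 200), (99.90, 500)]
asks = [(100.05, 100), (100.10, 400), (100.15, 500)]
features = {
    "spread": spread(bids, asks),
    "imbalance_l1": imbalance(bids, asks, 1),
    "imbalance_l3": imbalance(bids, asks, 3),
    "dw_mid_l1": depth_weighted_mid(bids, asks, 1),
}
```

Computing the imbalance at several depths gives the "multiple levels" variant. With more resting quantity on the bid, the cross-weighted mid-price leans toward the ask.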
Trade Flow Features:
- Trade direction (Lee-Ready algorithm adapted for Indian market conventions)
- Order imbalance ratios over multiple windows (1s, 5s, 30s, 1min, 5min)
- Volume-weighted average trade size
- Aggressive vs. passive order ratios
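The trade direction and windowed imbalance features can be sketched as follows. This is the textbook quote rule with a tick-test fallback, in the spirit of Lee-Ready. The adaptation for Indian market conventions mentioned above is not specified here and is not reproduced. The function names, the contemporaneous mid-price input, and the half-open window convention are assumptions made for illustration.

```python
from typing import List, Sequence


def lee_ready_signs(prices: Sequence[float], mids: Sequence[float]) -> List[int]:
    """Classify each trade as buyer-initiated (+1) or seller-initiated (-1).

    Quote rule: a price above the prevailing mid is a buy, below is a sell.
    Tick test (for trades at the mid): use the direction of the most recent
    non-zero price change; 0 if no price change has been seen yet.
    """
    signs = []
    prev_price = None
    tick_sign = 0
    for price, mid in zip(prices, mids):
        if prev_price is not None and price != prev_price:
            tick_sign = 1 if price > prev_price else -1
        if price > mid:
            sign = 1
        elif price < mid:
            sign = -1
        else:
            sign = tick_sign
        signs.append(sign)
        prev_price = price
    return signs


def order_imbalance(times: Sequence[float], signs: Sequence[int],
                    qtys: Sequence[float], now: float, window: float) -> float:
    """(buy volume - sell volume) / total volume for trades in (now - window, now]."""
    buy = sell = 0.0
    for t, s, q in zip(times, signs, qtys):
        if now - window < t <= now:
            if s > 0:
                buy += q
            elif s < 0:
                sell += q
    total = buy + sell
    return 0.0 if total == 0 else (buy - sell) / total


# Hypothetical trades: timestamps in seconds, with the mid held at 100.0.
times = [0.2, 0.9, 3.0, 4.5, 5.0]
prices = [100.5, 99.5, 100.0, 100.0, 99.5]
mids = [100.0] * 5
qtys = [100, 50, 200, 50, 100]
signs = lee_ready_signs(prices, mids)
ratios = {w: order_imbalance(times, signs, qtys, now=5.0, window=w)
          for w in (1.0, 5.0, 30.0)}
```

Evaluating `order_imbalance` with each of the window lengths listed above (1s, 5s, 30s, 1min, 5min) yields one feature per window.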
Contextual Features:
- Time-of-day indicators (opening, closing, lunch hour patterns)
- Relative spread to daily average
- Distance from daily VWAP
- Realized volatility over rolling windows
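Two of the contextual features can be written down compactly. The sketch below assumes realized volatility is the square root of the sum of squared log returns over the most recent `window` returns, and that distance from VWAP is expressed relative to VWAP. Both conventions and all function names are assumptions made for illustration, not definitions taken from this study.

```python
import math
from typing import Sequence


def vwap(prices: Sequence[float], qtys: Sequence[float]) -> float:
    """Volume-weighted average price of the trades seen so far in the day."""
    return sum(p * q for p, q in zip(prices, qtys)) / sum(qtys)


def distance_from_vwap(last_price: float, prices: Sequence[float],
                       qtys: Sequence[float]) -> float:
    """Signed distance of the last price from VWAP, as a fraction of VWAP."""
    v = vwap(prices, qtys)
    return (last_price - v) / v


def realized_volatility(prices: Sequence[float], window: int) -> float:
    """Square root of the sum of squared log returns over the last `window` returns."""
    returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    recent = returns[-window:]
    return math.sqrt(sum(r * r for r in recent))


# Hypothetical intraday trade prices and quantities.
trade_prices = [100.0, 100.2, 100.1, 100.4, 100.3]
trade_qtys = [500, 300, 200, 400, 100]
dist = distance_from_vwap(trade_prices[-1], trade_prices, trade_qtys)
rv_short = realized_volatility(trade_prices, window=2)
rv_long = realized_volatility(trade_prices, window=4)
```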
Model Architectures
Gradient Boosted Trees (XGBoost)
We train XGBoost models on tabular feature representations, with hyperparameters tuned via Bayesian search. This serves as our primary baseline, since gradient boosted trees remain a strong and widely used approach for tabular data.
LSTM Networks
Bidirectional LSTM networks process sequential order book snapshots, capturing temporal dependencies in the order flow:
- 2-layer BiLSTM with 128 hidden units per direction
- Attention pooling over the sequence dimension
- Dropout regularization (0.3) and gradient clipping
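The attention pooling step can be illustrated in isolation. The sketch below assumes the simplest form: each hidden state is scored by a dot product with a single learned vector, the scores are normalized with a softmax over the sequence, and the pooled representation is the weighted sum of hidden states. The exact scoring function used in our BiLSTM is not detailed here, so `score_vector` and the function name are hypothetical.

```python
import math
from typing import List, Sequence


def attention_pool(hidden_states: Sequence[Sequence[float]],
                   score_vector: Sequence[float]) -> List[float]:
    """Collapse a sequence of hidden states into one vector.

    hidden_states: one vector per time step (e.g. BiLSTM outputs).
    score_vector:  assumed learned parameter, same length as each hidden state.
    """
    # One scalar relevance score per time step.
    scores = [sum(h * w for h, w in zip(state, score_vector))
              for state in hidden_states]
    # Numerically stable softmax over the sequence dimension.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    weights = [e / norm for e in exps]
    # Weighted sum of hidden states, feature by feature.
    dim = len(hidden_states[0])
    return [sum(w * state[j] for w, state in zip(weights, hidden_states))
            for j in range(dim)]


# With an all-zero score vector the weights are uniform, so pooling
# reduces to a plain average over time steps.
pooled = attention_pool([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
```

Compared with taking only the final hidden state, pooling of this kind lets the model weight informative snapshots anywhere in the input window.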
Transformer Architecture
We implement a custom transformer encoder adapted for financial time series:
- 4-layer transformer encoder with 8 attention heads
- Positional encoding adapted for irregular time intervals
- Feature-wise attention to learn relationships between order book levels
- Causal masking to prevent information leakage
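To make the causal masking item concrete, the following is a single-head scaled dot-product attention in plain Python, with no learned projections. It is a deliberately reduced illustration rather than the 4-layer, 8-head encoder described above. The only point it demonstrates is that position t attends to positions 0 through t, so no output depends on later snapshots.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def causal_attention(queries: Sequence[Vector], keys: Sequence[Vector],
                     values: Sequence[Vector]) -> List[List[float]]:
    """Scaled dot-product attention where step t sees only steps 0..t."""
    scale = math.sqrt(len(queries[0]))
    out_dim = len(values[0])
    outputs = []
    for t, q in enumerate(queries):
        # Causal mask: future positions (s > t) are never scored.
        scores = [sum(a * b for a, b in zip(q, keys[s])) / scale
                  for s in range(t + 1)]
        top = max(scores)
        exps = [math.exp(x - top) for x in scores]
        norm = sum(exps)
        weights = [e / norm for e in exps]
        outputs.append([sum(w * values[s][j] for s, w in enumerate(weights))
                        for j in range(out_dim)])
    return outputs


# Leakage check on a hypothetical 3-step sequence: changing the last step
# must leave the outputs for the first two steps untouched.
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
altered = [[1.0, 0.0], [0.0, 1.0], [9.0, -9.0]]
base_out = causal_attention(seq, seq, seq)
altered_out = causal_attention(altered, altered, altered)
no_leak = base_out[:2] == altered_out[:2]
```

A check of this kind is a useful unit test for any forecasting pipeline, since look-ahead leakage tends to inflate backtest metrics.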
Results
Prediction Tasks
We evaluate models on three prediction tasks with 5-minute forward horizons:
| Model | Order Imbalance (R²) | Trade Direction (AUC) | Price Movement (Accuracy) |
|---|---|---|---|
| XGBoost | 0.31 | 0.72 | 61.3% |
| BiLSTM | 0.35 | 0.74 | 63.1% |
| Transformer | 0.41 | 0.78 | 65.8% |
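For reference, the three metrics in the table can be computed as in the sketch below. It uses the standard definitions: the coefficient of determination for R², the pairwise-ranking form of AUC with ties counted as one half, and the fraction of correct labels for accuracy. The function names are hypothetical, and the evaluation code actually used may differ in detail, for example in how price-movement classes are defined.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(l == p for l, p in zip(labels, preds)) / len(labels)


# Hypothetical toy inputs, unrelated to the results in the table.
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
toy_acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
toy_r2 = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

The pairwise AUC is quadratic in the number of samples, so a rank-based implementation is preferable for evaluation sets of realistic size.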
Key Observations
- Transformer superiority: The attention mechanism excels at capturing cross-level relationships in the order book. Relative to the BiLSTM, the transformer improves R² by 0.06, AUC by 0.04, and accuracy by 2.7 percentage points
- Feature importance: Order book imbalance at levels 1-3 and recent trade flow direction are the most predictive features across all models
- Time-of-day effects: Model performance varies significantly by time of day, with the highest accuracy during the first and last 30 minutes of trading
- Stock specificity: Performance varies by stock liquidity, with the most liquid stocks (top 10 by volume) showing 3-5% higher prediction accuracy
Practical Applications
Integration with Strat AI
The order flow prediction models developed in this research are being integrated into the Strat AI platform to provide:
- Real-time order flow analysis: Visual representation of predicted buying/selling pressure
- Entry timing optimization: Alerts when order flow conditions suggest favorable entry points
- Risk warnings: Detection of unusual order flow patterns that may precede adverse price movements
Limitations
- Models require low-latency data feeds for real-time deployment
- Performance degrades during market stress events (regime change)
- Training data requirements are substantial (minimum 6 months per instrument)
Conclusion
Our results indicate that modern machine learning architectures, particularly transformers with attention mechanisms, can effectively capture the complex dynamics of order flow in the Indian equity market. The stronger performance of the attention-based model suggests that cross-level relationships in the order book contain significant predictive information that the sequential models in our comparison did not fully exploit.
Future work will extend these models to the derivatives segment and investigate multi-market order flow dynamics across correlated instruments.
This preprint is part of TRW’s research program on applying advanced AI techniques to Indian financial markets. The models described are under active development for integration into the Strat AI platform.

