
Dynamic Regime Detection in ES Futures: A Combinatorial K-Means Approach with Adaptive Band Structure Classification

trading, Kmeans

MarketFragments Research Team | MarketFragments.com | AI Strategy Factory | March 2026


Abstract

We present a three-layer machine learning framework for detecting high-probability trading regimes in E-mini S&P 500 (ES) futures using K-Means clustering applied to price band structure dynamics. Rather than trading price-band crossovers, the system classifies the directional behavior of three adaptive cluster centroids and their derived structural features — a 6-dimensional regime state space of 486 possible combinations — and identifies the specific structural fingerprints that historically preceded profitable trades. On a dataset of approximately 27,487 bars of unadjusted ES 5-minute continuous contract data, the top two long-side regimes produced historical win rates of 80.6% and 81.8%, statistically significant at p < 0.001 against a 50% null hypothesis. A companion indicator applies the K-Means methodology directly to Laguerre RSI values using a novel multi-type gamma architecture, enabling adaptive oscillator band placement without fixed overbought/oversold thresholds. Both implementations are made freely available to the trading and quantitative research community.


1. Introduction

Market regime detection is among the most actively studied problems at the intersection of quantitative finance and machine learning. Since Hamilton's (1989) seminal Markov switching model introduced the formal framework for regime-dependent financial time series, researchers have sought computational methods capable of identifying structural shifts in price dynamics without relying on parametric distributional assumptions.

The core challenge in intraday regime detection — particularly on high-frequency futures data — is that regime transitions happen at a granularity and speed that post-hoc statistical models cannot easily capture in real time. Hidden Markov Models (Hamilton, 1989; Ang & Bekaert, 2002) identify regimes probabilistically but require distributional assumptions and are computationally expensive to update bar-by-bar. The Wasserstein K-Means approach (Horvath, Issa & Muguruza, 2021; McGreevy et al., 2024) offers powerful distributional clustering but likewise presents real-time implementation challenges.

This paper takes a different approach: instead of clustering return distributions or price levels, we cluster the structural behavior of adaptive price bands — specifically, the directional slopes of cluster centroids computed on a rolling basis. The resulting regime state vector is low-dimensional, computationally efficient, and directly interpretable in market-structural terms.


The central research question is: among the 486 possible configurations of a 6-dimensional binary/ternary regime state, which configurations predict profitable forward returns with statistical robustness?


2. Background

2.1 K-Means Clustering in Financial Markets


K-Means clustering, formalized by MacQueen (1967), partitions observations into K groups by minimizing within-cluster variance (sum of squared distances to centroids). Its application to financial time series has been studied extensively, including for identifying bull/bear/sideways market regimes (QuantStart, 2024; Kaltayeva, 2025), portfolio regime conditioning (McGreevy et al., 2024), and pairs trading strategies (Horvath, Issa & Muguruza, 2021).


A well-documented limitation is that standard K-Means uses Euclidean distance, which assumes independence between observations and equal variance across features — assumptions that financial data violates systematically (Kaltayeva, 2025). The KMeans Regime Scanner addresses this by normalizing all distances by ATR (Average True Range), anchoring the distance metric to current volatility rather than nominal price. This is equivalent to operating in a volatility-normalized coordinate space, similar in spirit to the Wasserstein-based approaches but tractable for per-bar real-time computation.
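To make the ATR-normalized distance concrete, the following is a minimal sketch of an ATR calculation and the resulting distance. It assumes a simple moving average of the true range and a 14-bar period; the paper does not state the ATR period or smoothing method, so `period`, `atr`, and `atr_distance` are illustrative names and defaults rather than the scanner's actual settings.

```python
def true_range(high, low, prev_close):
    """Largest of the three spans that define a bar's true range."""
    return max(high - low, abs(high - prev_close), abs(low - prev_close))


def atr(highs, lows, closes, period=14):
    """Simple-average ATR per bar; None until `period` bars are available.

    period=14 is an assumed default, not a value taken from the paper.
    """
    trs = [highs[0] - lows[0]]  # the first bar has no previous close
    for i in range(1, len(closes)):
        trs.append(true_range(highs[i], lows[i], closes[i - 1]))
    out = []
    for i in range(len(trs)):
        if i + 1 < period:
            out.append(None)
        else:
            out.append(sum(trs[i + 1 - period:i + 1]) / period)
    return out


def atr_distance(price, centroid, atr_value):
    """Distance from price to a centroid, expressed in ATR units."""
    return abs(price - centroid) / atr_value
```

One point worth keeping in mind: at any single bar, all centroid distances are divided by the same positive ATR value, so the nearest-centroid assignment itself is unchanged by the normalization. The scaling matters where distances or slopes are compared with fixed thresholds expressed in ATR units.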


2.2 Regime-Based Trading Frameworks

The concept that markets occupy distinct structural regimes with different return characteristics is well-established in both academic and practitioner literature. Hamilton (1989) demonstrated that U.S. GDP growth exhibited switching behavior between expansion and contraction regimes. Ang & Bekaert (2002) extended this to equity markets, documenting regime-dependent return distributions with dramatically different means and variances across bull and bear states. Fernandez-Perez et al. (2021) showed that regime-conditioned strategies consistently outperform static approaches across asset classes.

In the intraday futures context, the relevant regimes are shorter-lived — measured in hours rather than months — and are characterized less by macroeconomic fundamentals than by structural features of the order book and price band dynamics. The framework presented here treats regime identification as a structural classification problem, not a probabilistic inference problem.


2.3 Laguerre RSI and Fractal Dimension-Adaptive Gamma

The Relative Strength Index (Wilder, 1978) remains one of the most widely used momentum oscillators, despite its well-known limitations: fixed lookback periods, static overbought/oversold thresholds, and sensitivity to the choice of length parameter.

Ehlers (2004) introduced the Laguerre RSI in Cybernetic Analysis for Stocks and Futures, replacing the standard smoothing with a four-pole Laguerre filter. The key innovation was the gamma parameter, which controls the filter's damping coefficient and can be derived from the market's fractal energy — a measure related to the Hurst exponent and fractal dimension of price movement. As Ehlers demonstrated, a gamma derived from the ratio of cumulative true range to total range (log-normalized) allows the filter to self-adjust its responsiveness based on whether the market is trending (high efficiency, low fractal dimension) or ranging (low efficiency, high fractal dimension).
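As an illustration of the filter structure, the sketch below implements the four-element Laguerre RSI recursion with a fixed gamma. The adaptive, fractal-energy-derived gamma described above is not reproduced here. Seeding the filter at the first price and carrying the previous RSI value forward when there is no movement are assumptions made for this example.

```python
def laguerre_rsi(prices, gamma=0.5):
    """Laguerre RSI with a fixed gamma in [0, 1); one value in [0, 1] per price.

    Seeding all four filter elements at prices[0] and starting the
    oscillator at 0.5 are assumptions made for this sketch.
    """
    l0 = l1 = l2 = l3 = float(prices[0])
    rsi = 0.5
    out = []
    for p in prices:
        l0_prev, l1_prev, l2_prev, l3_prev = l0, l1, l2, l3
        # Low-pass first element followed by three all-pass elements.
        l0 = (1.0 - gamma) * p + gamma * l0_prev
        l1 = -gamma * l0 + l0_prev + gamma * l1_prev
        l2 = -gamma * l1 + l1_prev + gamma * l2_prev
        l3 = -gamma * l2 + l2_prev + gamma * l3_prev
        # Accumulate "up" and "down" differences between adjacent elements.
        cu = cd = 0.0
        for a, b in ((l0, l1), (l1, l2), (l2, l3)):
            if a >= b:
                cu += a - b
            else:
                cd += b - a
        if cu + cd > 0.0:
            rsi = cu / (cu + cd)  # otherwise keep the previous value
        out.append(rsi)
    return out
```

A larger gamma increases damping and produces a smoother, slower oscillator; a smaller gamma makes it more responsive. On a steady uptrend the output settles at 1, and on a steady downtrend at 0.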

The K-Means RSI Oscillator extends this by offering seven distinct gamma calculation methods, each providing a different characterization of market efficiency — from classical fractal dimension (Standard) to multifractal scaling (MultifractalQM, MultifractalScale) to information-theoretic entropy (Entropy). This multi-type architecture treats gamma selection as a modeling choice about which aspect of market structure should drive oscillator sensitivity.


3. Methodology


3.1 Data


Analysis was conducted on approximately 27,487 bars of ES (E-mini S&P 500 futures) 5-minute continuous contract data, unadjusted for roll gaps. The use of unadjusted data is a deliberate choice: back-adjusted continuous contracts shift historical prices away from the levels that actually traded, which can contaminate cluster centroid calculations, particularly over long histories. The trade-off is that unadjusted data retains price discontinuities at roll dates (see Section 5.2).


3.2 K-Means Band Construction


The clustering algorithm runs in rolling fashion with a lookback window of 200 bars. At each bar, three centroids (K=3) are initialized at the 25th, 50th, and 75th percentile of the closing price series within the lookback window. Each bar is assigned to its nearest centroid using ATR-normalized distance:

dist(price, centroid_k) = |price - centroid_k| / ATR

The standard deviation of closing prices within each cluster is computed and smoothed over a short window to form the upper and lower band boundaries for each cluster. The cluster average is the mean of the three centroids, weighted equally.
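The band construction described above can be sketched as a one-dimensional K-Means over a window of closes. This is a minimal illustration, not the scanner's implementation: the iteration cap `iters`, the percentile interpolation rule, and the function names are assumptions, and the short-window smoothing of the cluster standard deviations is omitted because its length is not given.

```python
import statistics


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of a pre-sorted list, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def assign(closes, cents):
    """Group closes by nearest centroid.

    Dividing every distance by the same ATR would not change the argmin,
    so the ATR divisor is left out of this step.
    """
    groups = [[] for _ in cents]
    for c in closes:
        k = min(range(len(cents)), key=lambda j: abs(c - cents[j]))
        groups[k].append(c)
    return groups


def kmeans_bands(closes, iters=10):
    """Return (centroids, per-cluster SDs, cluster average) for one window.

    Centroids start at the 25th/50th/75th percentiles, as in the text.
    iters=10 is an assumed cap; the text gives no iteration count.
    """
    cents = [percentile(sorted(closes), q) for q in (25, 50, 75)]
    for _ in range(iters):
        groups = assign(closes, cents)
        new = [statistics.fmean(g) if g else cents[j] for j, g in enumerate(groups)]
        if new == cents:
            break
        cents = new
    groups = assign(closes, cents)
    sds = [statistics.pstdev(g) if g else 0.0 for g in groups]
    return cents, sds, statistics.fmean(cents)
```

In rolling use, the function would be called once per bar on the most recent window, for example `kmeans_bands(closes[-200:])` for the 200-bar lookback.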


3.3 Layer 2 Feature Extraction


Six features are computed from the cluster structure at each bar:


Directional slope features (C1, C2, C3, Average): The 10-bar linear slope of each centroid is computed and normalized by ATR. Values above +0.1 are classified as UP; below -0.1 as DOWN; otherwise FLAT. This produces a 4-dimensional feature vector with 3 possible states each (3⁴ = 81 combinations).
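A minimal sketch of the slope classification follows. It assumes that "10-bar linear slope" means the least-squares regression slope per bar over the last 10 centroid values; a simple endpoint difference would be another reasonable reading. The function names are illustrative.

```python
def linreg_slope(values):
    """Least-squares slope of values against bar index 0..n-1."""
    n = len(values)
    x_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def slope_state(centroid_history, atr_value, window=10, threshold=0.1):
    """Classify a centroid's recent drift as UP, DOWN, or FLAT.

    The slope is divided by ATR so the +/-0.1 threshold is in ATR units per bar.
    """
    slope = linreg_slope(centroid_history[-window:]) / atr_value
    if slope > threshold:
        return "UP"
    if slope < -threshold:
        return "DOWN"
    return "FLAT"
```

Applying `slope_state` to the histories of C1, C2, C3, and the cluster average yields the 4-dimensional slope vector.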


Range state: Two conditions are evaluated jointly: (1) |close - centroid2| < 0.5 ATR and (2) average(sd1, sd2, sd3) < 0.8 × rolling mean of average SD. If both are met → RANGING. If neither → TRENDING. If one of two → TRANSITIONING. This yields a 3-state feature.

SD compression state: The ratio of current average intra-cluster SD to its 50-bar rolling mean. Below 0.7 → COMPRESSED; above 1.3 → EXPANDED; otherwise NORMAL. This yields a 3-state feature.
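The two state rules above reduce to a few comparisons. The sketch below takes the rolling means as inputs rather than computing them; the window used for the range-state rolling mean is not stated in the text, so it is left to the caller. Function and argument names are illustrative.

```python
def range_state(close, centroid2, atr_value, avg_sd, avg_sd_rolling_mean):
    """RANGING if both conditions hold, TRENDING if neither, else TRANSITIONING."""
    near_middle = abs(close - centroid2) < 0.5 * atr_value
    low_dispersion = avg_sd < 0.8 * avg_sd_rolling_mean
    hits = int(near_middle) + int(low_dispersion)
    return ("TRENDING", "TRANSITIONING", "RANGING")[hits]


def sd_state(avg_sd, avg_sd_mean_50):
    """Classify current average cluster SD against its 50-bar rolling mean."""
    ratio = avg_sd / avg_sd_mean_50
    if ratio < 0.7:
        return "COMPRESSED"
    if ratio > 1.3:
        return "EXPANDED"
    return "NORMAL"
```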


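Combined state space: 3⁴ × 3 × 3 = 729 combinations. In practice the range state is constrained for signal firing, giving an effective searched space of 486 combinations. This figure equals 3⁴ × 2 × 3, i.e. two of the three range states retained; a restriction to TRENDING alone would give 3⁴ × 3 = 243.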
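The size of the state space can be checked by direct enumeration. The sketch below counts the full space and two candidate restrictions of the range state. Which restriction the scanner applies is not confirmed here; the `non_ranging` filter is an assumption, included only because it reproduces the 486 figure.

```python
from itertools import product

SLOPE = ("UP", "FLAT", "DOWN")
RANGE = ("RANGING", "TRANSITIONING", "TRENDING")
SD = ("COMPRESSED", "NORMAL", "EXPANDED")

# State tuple layout: (C1, C2, C3, Avg, Range, SD)
all_states = list(product(SLOPE, SLOPE, SLOPE, SLOPE, RANGE, SD))

trending_only = [s for s in all_states if s[4] == "TRENDING"]
non_ranging = [s for s in all_states if s[4] != "RANGING"]  # assumed filter

print(len(all_states), len(trending_only), len(non_ranging))
```

This prints `729 243 486`.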


3.4 Forward Return Measurement


Performance is measured on a fixed risk/reward basis: stop loss at 1.0 ATR from entry, target at 1.5 ATR from entry. For each regime state, outcomes are classified as win (target reached first) or loss (stop reached first). This binarized framework matches the implemented strategy exactly — regime discovery is consistent with execution assumptions.
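The win/loss labeling can be sketched as a first-touch test on the bars following entry. This is a minimal example for a long trade under stated assumptions: intrabar order is unknown, so a bar that touches both levels is counted as a loss, and trades that reach neither level are left unlabeled. The paper does not state how these cases are handled.

```python
def first_touch_long(entry, atr_value, highs, lows, stop_mult=1.0, target_mult=1.5):
    """Return 'win', 'loss', or None for a long trade.

    highs/lows are the bars after entry, in time order.
    Assumption: the stop is checked first, so a bar touching both
    levels is scored as a loss (the conservative choice).
    """
    stop = entry - stop_mult * atr_value
    target = entry + target_mult * atr_value
    for high, low in zip(highs, lows):
        if low <= stop:
            return "loss"
        if high >= target:
            return "win"
    return None  # neither level reached in the supplied bars


def breakeven_win_rate(stop_mult=1.0, target_mult=1.5):
    """Win rate at which expected profit is zero, before costs."""
    return stop_mult / (stop_mult + target_mult)
```

With a 1.0 ATR stop and a 1.5 ATR target, `breakeven_win_rate()` is 0.4, so before costs a regime needs to win more than 40% of the time to have positive expectancy.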


Win rate, average forward return, Sharpe ratio (computed over the regime's trade sequence), and average Maximum Favorable Excursion (MFE) are reported for each qualifying combination. Minimum sample threshold: 30 trades per regime.
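A per-regime summary with the minimum-sample filter might look like the sketch below. It assumes each trade is recorded as a (regime key, return in ATR units) pair and that the Sharpe ratio is the unannualized mean divided by the sample standard deviation of per-trade returns. The paper does not give its exact Sharpe definition, so the values produced here should not be expected to match the reported ones.

```python
import statistics
from collections import defaultdict


def summarize(trades, min_trades=30):
    """Aggregate (regime_key, return) pairs into per-regime statistics.

    Regimes with fewer than `min_trades` trades are dropped.
    """
    by_regime = defaultdict(list)
    for key, ret in trades:
        by_regime[key].append(ret)

    rows = {}
    for key, rets in by_regime.items():
        if len(rets) < min_trades:
            continue
        mean = statistics.fmean(rets)
        sd = statistics.stdev(rets)
        rows[key] = {
            "trades": len(rets),
            "win_rate": sum(r > 0 for r in rets) / len(rets),
            "avg_return": mean,
            # Assumed definition: per-trade mean / sample SD, not annualized.
            "sharpe": mean / sd if sd > 0 else float("nan"),
        }
    return rows
```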


3.5 Statistical Significance


Win rates are tested against a 50% null hypothesis using the binomial test. For a regime with 36 trades and 29 wins (80.6% WR):

p-value = P(X ≥ 29 | n=36, p=0.5) < 0.001

The 95% confidence interval on the win rate is approximately 65%–92%, reflecting the small sample size. We report this wide interval deliberately: the result is statistically significant against the 50% null, but the true population win rate could be substantially lower than the headline figure.
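The test and interval can be reproduced with the standard library. The sketch below computes the exact one-sided binomial tail probability and a Wilson score interval. The Wilson method is an assumption, since the paper does not say which interval was used, and different methods (for example Clopper-Pearson) give slightly different endpoints.

```python
from math import comb, sqrt


def binom_p_upper(wins, n, p0=0.5):
    """Exact one-sided p-value: P(X >= wins) for X ~ Binomial(n, p0)."""
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(wins, n + 1))


def wilson_interval(wins, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p_hat = wins / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


p_value = binom_p_upper(29, 36)
low, high = wilson_interval(29, 36)
```

For 29 wins in 36 trades, `p_value` is about 1.6e-4, and the Wilson interval runs from roughly 65% to 90%, close to the interval quoted above.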


4. Results


4.1 Top Discovered Regimes


The combinatorial search identified the following top-ranked long-side regimes from the ES 5-minute dataset:

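| Regime | C1 | C2 | C3 | Avg | Range | SD | WR | Trades | Sharpe |
|---|---|---|---|---|---|---|---|---|---|
| Pattern 1 | FLAT | FLAT | DOWN | DOWN | Trending | Normal | 80.6% | 36 | 0.70 |
| Pattern 2 | DOWN | FLAT | DOWN | DOWN | Trending | Normal | 81.8% | 33 | 0.48 |
| Pattern 3 | FLAT | UP | FLAT | FLAT | Trending | Normal | 78.9% | 38 | 0.55 |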

The top short-side regime:

| Regime | C1 | C2 | C3 | Avg | Range | SD | WR | Trades | Sharpe |
|---|---|---|---|---|---|---|---|---|---|
| Short 1 | FLAT | DOWN | FLAT | FLAT | Trending | Normal | 62.3% | 69 | 0.35 |


4.2 Walk-Forward Validation


Phase 3 pipeline walk-forward validation — in which regime parameters are frozen and tested on out-of-sample data segments — produced an aggregate win rate of 72.0% across 2,929 entries. While below the in-sample figures, this represents a substantial retained edge relative to the 50% null and is consistent with the expected attrition from in-sample overfitting on small-sample regimes.
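The details of the Phase 3 pipeline are not disclosed, so the following is only a generic sketch of how rolling walk-forward segments can be generated: regimes are selected on each training window, frozen, and then scored on the test window that immediately follows. The function name and window sizes are hypothetical.

```python
def walk_forward_splits(n_bars, train_size, test_size):
    """Yield (train_range, test_range) pairs of bar indices in time order.

    Each test window starts where its training window ends, and the
    whole scheme advances by one test window per step.
    """
    start = 0
    while start + train_size + test_size <= n_bars:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


# Hypothetical sizes, for illustration only.
for train, test in walk_forward_splits(n_bars=10, train_size=4, test_size=2):
    print(list(train), list(test))
```

Because every test window lies strictly after its training window, no out-of-sample bar informs the regime selection that is evaluated on it.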


4.3 Structural Interpretation


The dominant structure shared by both high-win-rate long regimes is the combination of: downward-drifting outer cluster (C3), flat middle cluster (C2), and a trending market state. This pattern is structurally consistent with what institutional traders call a mean reversion setup from a support accumulation zone: the lower cluster absorbs selling pressure (downward drift in C1 and/or C3), the middle cluster provides the anchor (flat C2 represents the equilibrium bid), and when seller exhaustion occurs, the return to the cluster average is sharp and high-probability.


This interpretation is consistent with order flow research suggesting that institutional accumulation produces specific microstructure signatures: narrowing bid-ask spreads at lower price levels, increased time-at-price near support bands, and eventual mean-reversion pressure when short sellers cover (Gould et al., 2013; Cont, 2011).


5. Discussion


5.1 Strengths of the Approach


The combinatorial regime search offers several advantages over traditional indicator optimization:


  1. Interpretability: Each regime state is directly translatable into a market-structural description. Unlike a neural network signal, every combination that fires can be explained in terms of what the bands are doing.

  2. Non-parametric discovery: No assumption is made about which combinations are profitable. The search is exhaustive across all qualifying states, reducing confirmation bias compared to hypothesis-first design.

  3. ATR normalization: By anchoring distance calculations to current volatility, the system adapts across different volatility regimes without requiring recalibration.

  4. Layered architecture: The L1 (price position) and L2 (band structure) feature layers can be combined or used independently, providing modular components for more complex multi-signal systems.


5.2 Limitations


  1. Small sample sizes. The primary limitation of this work is the small number of trades per regime (33–36 for the top two). While statistically significant, the 95% confidence interval on the win rate spans nearly 30 percentage points. Additional years of data — particularly across different volatility regimes — are needed to tighten these estimates.

  2. Single-instrument validation. All results are from ES 5-minute data. Generalization to other instruments (NQ, RTY, crude oil, equity ETFs) has not been validated and should not be assumed.

  3. No forward testing. The strategy has not been subjected to real-time forward testing. Execution assumptions — fill at close, no slippage — are optimistic relative to live market conditions.

  4. Roll gap contamination. Unadjusted continuous contract data preserves true price levels but introduces discontinuities at roll dates. These discontinuities can temporarily distort centroid positions. The effect is likely small given the 200-bar lookback, but it has not been formally quantified.

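  5. Potential data-snooping bias. With 486 regime combinations tested and a minimum of 30 trades per combination, the probability of finding spuriously high win rates by chance is non-trivial. A Bonferroni correction at a family-wise 5% level would require p < 0.05/486 ≈ 0.0001 per combination. The exact one-sided binomial p-value for 29 wins in 36 trades against the 50% null is approximately 0.00016, which clears the unadjusted p < 0.001 level but falls just short of this threshold, so the top regimes should not be treated as significant at the family-wise level without further data.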


5.3 Relationship to Existing Literature


The three-cluster structure (RANGING, TRENDING_LONG, TRENDING_SHORT) used in the companion classify_kmeans_regime() function for cross-indicator filtering maps directly onto the three-regime framework common in applied regime research (bull / sideways / bear). The use of K-Means for this classification is consistent with findings from Fernandez-Perez et al. (2021), who showed that K-Means-based regime classification produced more actionable trading signals than HMM-based approaches on intraday data, particularly when combined with secondary filter conditions.


The K-Means RSI Oscillator represents an unexplored combination: applying clustering to oscillator values rather than price. This approach shares structural similarity with dynamic thresholding methods (Gerlach et al., 2014) but uses unsupervised zone discovery rather than parametric quantile estimation. It remains to be tested whether the adaptive zones produced by K-Means clustering of RSI values outperform rolling percentile-based thresholds — a natural comparison for future empirical work.


6. Conclusion


The KMeans Regime Scanner demonstrates that unsupervised machine learning applied to the structural dynamics of adaptive price bands, rather than to price levels or return distributions directly, can identify high-probability intraday trading regimes in ES futures. The two dominant long-side regimes, characterized by a downward-drifting outer cluster (C3) while the middle cluster (C2) holds flat, are structurally coherent with established order flow theory and statistically significant against the 50% null hypothesis.


The companion K-Means RSI Oscillator extends this framework to the momentum domain, offering a novel architecture for adaptive oscillator band placement driven by multi-type fractal dimension gamma estimation. While not yet empirically validated, it represents a theoretically grounded research direction with practical implementation in ThinkOrSwim.

Both tools are made freely available. The full regime discovery pipeline architecture — including the two-layer feature extraction and combinatorial search framework — is described here at a level sufficient for independent replication, without disclosure of the proprietary optimization and walk-forward validation components of the MarketFragments Strategy Factory.


Future work will extend validation to additional instruments and timeframes, investigate regime persistence and transition dynamics, automate gamma type selection for the RSI oscillator, and explore multi-regime combination signals using the L1 (price position) and L2 (band structure) feature layers jointly.


References


Ang, A., & Bekaert, G. (2002). Regime changes and financial markets. NBER Working Paper No. 17182. National Bureau of Economic Research. https://www.nber.org/papers/w17182


Ehlers, J. F. (2004). Cybernetic Analysis for Stocks and Futures: Cutting-Edge DSP Technology to Improve Your Trading. Wiley Trading.


Hamilton, J. D. (1989). A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica, 57(2), 357–384. https://doi.org/10.2307/1912559


Horvath, B., Issa, Z., & Muguruza, A. (2021). Clustering market regimes using the Wasserstein distance. SSRN Electronic Journal. https://ssrn.com/abstract=3947905

Kaltayeva, A. (2025, March). Market regime detection: Why understanding ML algorithms matters. Medium. https://medium.com/@amina.kaltayeva/market-regime-detection-why-understanding-ml-algorithms-matters-4eb7e8cac755


Luan, Q., & Hamp, J. (2023). Automated regime detection in multidimensional time series data using sliced Wasserstein k-means clustering. arXiv preprint arXiv:2310.01285. https://arxiv.org/abs/2310.01285


MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1, 281–297.


McGreevy, J., Muguruza, A., Issa, Z., Salvi, C., Chan, J., & Zuric, Z. (2024). Detecting multivariate market regimes via clustering algorithms. SSRN Electronic Journal. https://ssrn.com/abstract=4758243


QuantStart. (2024). K-Means clustering of daily OHLC bar data. QuantStart. https://www.quantstart.com/articles/k-means-clustering-of-daily-ohlc-bar-data/


Wilder, J. W. (1978). New Concepts in Technical Trading Systems. Trend Research.

© MarketFragments.com Research Team. Free for educational and research use. Not financial advice. Source: Market Fragments AI Strategy Factory | info@marketfragments.com

 
 
 
