MF Strategy Factory: Main Pivot Engine paper

mcdon030
May 31

Non-Repainting Pivot Detection in ES Futures: A Regression-Slope and ATR-Gated Approach with Session-Conditioned Behavior Classification

MarketFragments Research Team - MarketFragments.com | MF Strategy Factory · May 2026

Author: mcdon030

Editor's note: All numeric figures in this draft have been reconciled against the source CSVs (`slope_pivot_sweep_ES_full_5min_continuous_UNadjusted_5yr.csv`, `session_gate_sweep_*`, `session_trail_backtest_*`, `pivot_validation_v2_by_session_*`, `pivot_validation_v2_by_conviction_*`, `length_opt_*_lstm.csv`). See `RECONCILIATION_REPORT.md` for the cell-by-cell audit.

Abstract

We present a session-aware, regression-based pivot detection framework for intraday futures data and a structured evaluation of its design choices. Nine slope-detection variants — eight weighted-mean kernels (V1–V8) and one least-squares regression with an ATR significance gate (V9) — were evaluated on approximately five years of E-mini S&P 500 (ES) 5-minute continuous contract data. Three principal findings are reported. First, the choice of weighting kernel does not alter pivot directional accuracy: every weighted-mean variant produced an overall swing-verification rate in a narrow 75.1%–76.0% band, with kernel choice influencing swing-scale and rotation frequency rather than accuracy. Second, the ATR significance gate functions as a swing-scale dial: tightening the gate from K=0.00 to K=0.15 reduces rotation count from 20,158 to 8,433, raises average swing size from 16.4 pt to 27.2 pt, and lifts overall accuracy from 74.9% to 78.2%, but reduces total points captured from 330,783 to 229,007 — the proxy `success × swing` inflates mechanically under tightening while total opportunity falls. Third, session classification (regular trading hours vs. extended/overnight hours) produces the largest single effect observed in the study: RTH pivots are approximately twice the swing size of ETH pivots (33.75 vs. 17.86 pt), verify 5.30 percentage points more often (81.03% vs. 75.73%, n_RTH=3,543 / n_ETH=8,025, two-proportion z ≈ 6.6, p < 0.001), and pay 2.12× per-trade PnL in a relative trailing-stop backtest (2.37 vs. 1.11 pt), while ETH pivots exhibit substantially higher revisit (70.3% vs. 44.0%) and first-touch rejection (67.2% vs. 59.3%) rates consistent with mean-reverting microstructure. 
A separate evaluation tested whether lookback length is a tunable predictive parameter by training an out-of-sample LSTM on a triple-barrier directional label across lookback values 6 through 25; directional accuracy lay in the band [49.4%, 51.3%] (RTH mean 50.23%; ETH mean 50.28%) at every length and in every session, with the 95% confidence interval including 50% at every point — no predictive optimum was found. The pivots are therefore best characterized as descriptors of completed swing structure rather than predictors of forward direction, and lookback length is treated as a responsiveness parameter rather than a tuned hyperparameter. The resulting indicator — a non-repainting, ATR-gated, session-switched regression-pivot stream — is implemented in ThinkScript and released to the trading research community.

1. Introduction

A pivot detector is a low-level building block in technical analysis: it identifies completed swing highs and swing lows from a price series and emits a stream of confirmed turning points consumable by higher-level signal logic. Pattern scanners — including the harmonic family (Gartley, 1935) that motivated this work — rely on a stable, non-repainting pivot stream as their input substrate. Two failure modes of conventional pivot/zigzag implementations are well known to practitioners and were the immediate motivation for this study.

The first failure mode is repainting. Standard zigzag implementations re-draw their legs as new price data arrives, so the location of a previously-flagged swing point can change after the fact. For backtesting downstream pattern scanners this is fatal: the history of the pivot stream rewrites itself, and a pattern that appears to have completed at one bar may not have existed at the time the bar printed. For live trading it is equally problematic: a stop or entry placed at a "pivot level" can be invalidated by a redraw at the next bar.

The second failure mode is excessive stringency. Monotonicity-based ascending/descending logic (a leg registers only if every bar in the lookback window steps monotonically in the appropriate direction) is highly sensitive to single-bar noise. In intraday futures data this causes a substantial fraction of legs to be skipped, producing a sparse and unreliable pivot stream.

The research question addressed in this paper is: what construction of a pivot detector simultaneously satisfies (a) non-repainting, in the sense that confirmed pivot levels do not move after the bar of confirmation; (b) robust leg registration in the presence of single-bar noise; (c) operational tractability, i.e., per-bar computation cheap enough to scan multi-year intraday datasets; and (d) measurable, statistically supported behavior over a long historical window. Additionally, we ask whether the lookback length parameter — typically pinned by convention rather than tested — encodes a tunable predictive edge, or whether it functions purely as a responsiveness selector.

2. Background

2.1 Pivot detection in technical analysis

Pivot points and swing structure form the foundation of multiple technical analysis traditions, including Dow theory, Elliott wave, and the harmonic pattern framework introduced by Gartley (1935) and extended in subsequent practitioner literature. The common requirement across these traditions is a discrete, ordered sequence of confirmed turning points (X, A, B, C, D, …) that can be used to measure ratios, projections, and levels. The quality of any pattern detector is bounded above by the quality of its underlying pivot stream — a noisy or repainting pivot stream cannot be salvaged by a more sophisticated downstream scanner.

2.2 Slope estimation and trend filters

Slope estimation over a rolling window is one of the simplest available trend filters. Weighted-mean variants — equal-weighted, linear-recency-weighted, exponentially-weighted, and so on — differ in how recent observations are emphasized relative to older ones, but share the property that the sign of the estimated slope is what is consumed by a pivot detector. Least-squares regression provides an alternative slope estimate with desirable statistical properties; combined with a significance threshold derived from local volatility (e.g., ATR; Wilder, 1978), it offers a principled mechanism for ignoring slopes that fail to clear the market's bar-to-bar noise floor.

2.3 Intraday session microstructure

The distinction between regular trading hours (RTH) and extended/overnight hours (ETH) in U.S. equity-index futures is a structural feature of the market documented in the intraday seasonality literature (Andersen & Bollerslev, 1997). RTH is characterized by concentrated order flow, higher volume, and directional price discovery driven by cash-market participants. ETH is characterized by thinner volume, predominantly hedging and global-macro flow, and a different mean-reversion / range-bound profile. The hypothesis that pivot-stream properties differ structurally between these two sessions is a natural one to test.

2.4 Out-of-sample predictive evaluation

A common failure mode in indicator design is the use of in-sample fit metrics as proxies for forward predictive value. Where the metric of interest carries a known bias — for example, a hit-rate × swing-size proxy mechanically inflates as the swing filter tightens — the only reliable evaluation is an out-of-sample predictive test against a label defined independently of the metric. We use the triple-barrier labeling scheme of López de Prado (2018) and a sequence model (LSTM; Hochreiter & Schmidhuber, 1997) trained walk-forward as the out-of-sample evaluator for the lookback-length question.

3. Methodology

3.1 Data

Analysis was conducted on E-mini S&P 500 (ES) continuous contract data at 5-minute resolution, **unadjusted** for roll gaps. Unadjusted data was used to preserve real price levels relevant to non-repainting level construction; roll discontinuities were inspected for their effect on pivot-flagging behavior and no material distortion was found at the 5-minute resolution. The principal sweep window is approximately five years. The Python research pipeline and the ThinkScript live indicator implement the same calculations for rotation count, swing-verification rate, and average swing size, so research-side and live-side statistics are directly comparable.

3.2 Slope detector variants

Let `p_t` denote the closing price at bar `t`, and let `W` denote the lookback window of size `L` ending at bar `t`. Each slope detector variant returns a signed slope estimate `s_t`. The sign of `s_t` is the only quantity consumed downstream; magnitude is used only for the V9 significance gate.

Variants V1 through V8 are weighted-mean estimators of the form

$$s_t = \sum_{i \in W} w_i \,(p_i - b_i)$$

where `b_i` is a body reference (open or close of bar `i`) and `w_i` is a weighting kernel that varies by variant: V1 equal weighting, V2 a flat average over the window, V3 a linear recency ramp, V4 a heavy exponential, V5 a recent-half ramp, V6 inverse-distance weighting, V7 a triangular kernel, V8 a normalized variant of V1. The full kernel specifications are released alongside the code.
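As a rough illustration of the weighted-mean form above, the following sketch computes `s_t` for one window under two of the named kernels. The kernel definitions here (a flat average and a linear recency ramp, both normalized to sum to 1) and the use of the bar open as `b_i` are assumptions made for the example; the released kernel specifications may differ.

```python
from typing import List, Sequence


def weighted_slope(prices: Sequence[float],
                   refs: Sequence[float],
                   weights: Sequence[float]) -> float:
    """Signed slope estimate s_t = sum_i w_i * (p_i - b_i) over one window."""
    if not (len(prices) == len(refs) == len(weights)):
        raise ValueError("prices, refs and weights must have the same length")
    return sum(w * (p - b) for p, b, w in zip(prices, refs, weights))


def flat_kernel(length: int) -> List[float]:
    """Equal weight on every bar in the window (hypothetical normalization)."""
    return [1.0 / length] * length


def linear_ramp_kernel(length: int) -> List[float]:
    """Weights 1, 2, ..., L scaled to sum to 1, so newer bars count more."""
    total = length * (length + 1) / 2
    return [(i + 1) / total for i in range(length)]


# Toy four-bar window, oldest bar first. The open is used as b_i (assumption).
closes = [101.0, 102.0, 103.5, 103.0]
opens = [100.0, 101.0, 102.0, 103.5]

s_flat = weighted_slope(closes, opens, flat_kernel(4))
s_ramp = weighted_slope(closes, opens, linear_ramp_kernel(4))

# Only the sign is consumed downstream: +1 up, -1 down, 0 flat.
direction_flat = (s_flat > 0) - (s_flat < 0)
direction_ramp = (s_ramp > 0) - (s_ramp < 0)
```

Both kernels give different magnitudes on this window but the same sign, which is the only quantity the pivot detector reads.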

Variant V9 is a least-squares linear regression of `p_t` against time across `W`, returning the regression slope `β`, combined with an ATR significance gate:

$$\text{flag direction change at } t \iff |\beta| > K \cdot \mathrm{ATR}_t$$

where `K` is the gate parameter and `ATR_t` is the Average True Range at bar `t` over a fixed lookback. Pivots that fail the gate condition are not registered; small slopes consistent with bar-level noise are filtered out. `K = 0` reduces V9 to an ungated regression-slope detector.
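A minimal sketch of the V9 ingredients, assuming an ordinary least-squares fit against bar index and a simple-average ATR (Wilder's original ATR uses a smoothed average; the averaging used in the released study is not specified here, so treat `simple_atr` as a stand-in). All function names are illustrative.

```python
from typing import Sequence


def ols_slope(prices: Sequence[float]) -> float:
    """Least-squares slope of price against bar index 0..n-1."""
    n = len(prices)
    if n < 2:
        raise ValueError("need at least two prices")
    x_mean = (n - 1) / 2
    y_mean = sum(prices) / n
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(prices))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    return sxy / sxx


def simple_atr(highs: Sequence[float], lows: Sequence[float],
               closes: Sequence[float], period: int) -> float:
    """Mean true range over the last `period` bars (simple average, an assumption)."""
    n = len(closes)
    if n < period + 1:
        raise ValueError("need period + 1 bars to form true ranges")
    true_ranges = [
        max(highs[i] - lows[i],
            abs(highs[i] - closes[i - 1]),
            abs(lows[i] - closes[i - 1]))
        for i in range(n - period, n)
    ]
    return sum(true_ranges) / period


def passes_gate(beta: float, atr: float, k: float) -> bool:
    """Gate condition |beta| > K * ATR_t."""
    return abs(beta) > k * atr


beta = ols_slope([100.0, 101.0, 102.0, 103.0, 104.0])   # one point per bar
atr = simple_atr([10.0, 12.0, 13.0], [9.0, 10.0, 11.0], [9.5, 11.0, 12.0], 2)
loose = passes_gate(beta, atr=2.0, k=0.15)   # threshold 0.30
tight = passes_gate(beta, atr=2.0, k=0.60)   # threshold 1.20
```

With `k = 0` the threshold is zero, so any nonzero slope passes, which is the ungated case described above.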

3.3 Pivot confirmation and non-repainting construction

A pivot is confirmed when the slope sign changes from positive to negative (high pivot) or negative to positive (low pivot), with the gate condition satisfied where applicable. At confirmation, the indicator snaps the pivot level to the corresponding extreme (high or low) already printed within the lookback window. Once snapped, the level is not modified by subsequent bars. This produces a strict non-repainting property at the cost of an inherent confirmation latency equal to the lookback window. A separate forward-aligned visualization layer is used in the research code for marker placement only, and is excluded from all reported statistics and from all live signal logic.
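The confirmation rule can be sketched as follows. This is an illustrative reading of the paragraph above, not the released ThinkScript: it assumes gated-out bars are passed in as a slope of 0 and are skipped, and that the snap window is the last `lookback` bars including the confirmation bar.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Pivot:
    confirm_bar: int   # bar index at which the sign flip was observed
    kind: str          # "high" or "low"
    level: float       # snapped extreme; never modified afterwards


def confirm_pivots(slopes: Sequence[float], highs: Sequence[float],
                   lows: Sequence[float], lookback: int) -> List[Pivot]:
    """Append a pivot on each slope sign flip, snapping to the window extreme."""
    pivots: List[Pivot] = []
    prev_sign = 0
    for t, s in enumerate(slopes):
        sign = (s > 0) - (s < 0)
        if sign == 0:
            continue  # flat or gated-out bar: keep the prior direction
        if prev_sign != 0 and sign != prev_sign:
            start = max(0, t - lookback + 1)
            if prev_sign > 0:   # up-leg ended: snap to the highest high
                pivots.append(Pivot(t, "high", max(highs[start:t + 1])))
            else:               # down-leg ended: snap to the lowest low
                pivots.append(Pivot(t, "low", min(lows[start:t + 1])))
        prev_sign = sign
    return pivots


slopes = [0.5, 0.4, -0.2, -0.3, 0.1]
highs = [10.0, 12.0, 11.0, 9.0, 10.0]
lows = [9.0, 10.0, 9.0, 7.0, 8.0]

full = confirm_pivots(slopes, highs, lows, lookback=3)
# Re-running on only the first three bars reproduces the first pivot unchanged.
partial = confirm_pivots(slopes[:3], highs[:3], lows[:3], lookback=3)
```

Because each pivot is computed only from bars at or before its confirmation bar and is never revisited, running the function on a prefix of the data yields a prefix of the full pivot list. That is the non-repainting property in miniature.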

3.4 Session classification

Bars were labeled RTH for timestamps within 09:30:00–16:00:00 ET and ETH otherwise. Pivots are assigned the session of their confirmation bar. All session-conditioned statistics are computed within-session.
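A small sketch of the labeling rule, assuming the bar timestamp is already expressed in Eastern Time and that the window is half-open (09:30:00 inclusive, 16:00:00 exclusive). The boundary treatment is an assumption; the text above gives the window but not how the 16:00:00 bar is handled. Weekends and exchange holidays are not considered.

```python
from datetime import datetime, time

RTH_OPEN = time(9, 30)
RTH_CLOSE = time(16, 0)


def session_label(ts_eastern: datetime) -> str:
    """Return 'RTH' or 'ETH' for a bar timestamp given in Eastern Time."""
    t = ts_eastern.time()
    return "RTH" if RTH_OPEN <= t < RTH_CLOSE else "ETH"


labels = [
    session_label(datetime(2024, 1, 2, 3, 0)),    # overnight
    session_label(datetime(2024, 1, 2, 9, 30)),   # cash open
    session_label(datetime(2024, 1, 2, 15, 55)),  # last RTH 5-minute bar
    session_label(datetime(2024, 1, 2, 16, 0)),   # treated as ETH here
]
```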

3.5 Evaluation metrics

For each pivot, the following outcomes are recorded.

- Swing-verification (overall %): whether the swing implied by the pivot — measured from the prior confirmed pivot — verified in the expected direction. This is the canonical pivot quality metric.

- Average swing size: the magnitude in points of the verified swing.

- Rotations per year: rotation count normalized to an annualized rate.

- First-touch rejection rate: among levels revisited after confirmation, the fraction at which the first touch produced a directional rejection.

- Times-revisited rate: the fraction of confirmed pivot levels revisited within a fixed forward window.

- Total points captured: the sum of verified swing magnitudes across the sweep.

The ranking proxy used in earlier-stage screening is `success × swing`. Because tightening the significance gate mechanically increases average swing size by removing small swings, this proxy is biased toward tighter gate settings. We report it for continuity with prior work but report total points captured alongside as the unbiased opportunity-cost measure.

3.6 Relative trailing-stop backtest

Entries were taken at pivot confirmation in the direction implied by the new leg. Three trail formulas were evaluated: a fixed-multiple ATR trail, a pivot-anchored trail placed at the most recent prior-leg pivot level, and a hybrid trail taking the tighter of the two on each bar. Exits were taken at the first bar at which the trail was breached. No commission or slippage was charged. The reported output is the mean per-trade PnL by session and is interpreted as a *relative* comparison only; the absolute per-point figures are not a tradeable expectancy.
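To make the trail mechanics concrete, here is a hedged sketch of a long-side ATR trail with an optional pivot floor that turns it into a hybrid trail. Several details are assumptions for illustration only: the stop ratchets from the bar close, the exit fills exactly at the stop price, the pivot level is held fixed for the life of the trade, and the short side is omitted.

```python
from typing import Optional, Sequence, Tuple


def trail_long(entry: float,
               lows: Sequence[float],
               closes: Sequence[float],
               atrs: Sequence[float],
               atr_mult: float,
               pivot_level: Optional[float] = None) -> Tuple[int, float]:
    """Return (exit_bar_index, pnl_points) for one long trade.

    With pivot_level=None this is a plain ATR trail. With a pivot level it is
    a hybrid: the stop is the tighter (higher) of the ATR stop and the pivot.
    """
    def stop_candidate(ref_price: float, atr: float) -> float:
        atr_stop = ref_price - atr_mult * atr
        return atr_stop if pivot_level is None else max(atr_stop, pivot_level)

    stop = stop_candidate(entry, atrs[0])
    for i, (low, close, atr) in enumerate(zip(lows, closes, atrs)):
        if low <= stop:                 # trail breached on this bar
            return i, stop - entry
        stop = max(stop, stop_candidate(close, atr))  # stop only ratchets up
    return len(closes) - 1, closes[-1] - entry        # still open at data end


closes = [101.0, 103.0, 104.0, 102.0]
atrs = [1.0, 1.0, 1.0, 1.0]

atr_only = trail_long(100.0, [100.5, 101.5, 103.0, 101.0], closes, atrs, 2.0)
# Same trade with an early dip to 99.0: the ATR stop at 98.0 survives it,
# while a hybrid stop floored at a pivot of 99.5 is taken out on the first bar.
dip_lows = [99.0, 101.5, 103.0, 101.0]
atr_dip = trail_long(100.0, dip_lows, closes, atrs, 2.0)
hybrid_dip = trail_long(100.0, dip_lows, closes, atrs, 2.0, pivot_level=99.5)
```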

3.7 LSTM-based length evaluation

For each lookback length `L ∈ {6, 7, …, 25}`, the pivot stream was constructed from V9 with the standard gate. From each pivot, a feature vector encoding the geometry of the prior `n` pivots (leg sizes, leg durations, body proportions, session label) was constructed. A triple-barrier label (López de Prado, 2018) was assigned to each pivot: UP if the upper barrier was reached before the lower or time barrier, DOWN for the symmetric case, NEUTRAL otherwise. An LSTM classifier was trained walk-forward, per session, with strict separation of train and test windows to avoid lookahead. The metric reported is out-of-sample directional accuracy (UP vs. DOWN, conditional on the time barrier not being hit first) at each length.
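The labeling step can be sketched as below. Barrier widths in points, the time barrier in bars, and the treatment of a bar that touches both barriers (labeled NEUTRAL here) are all assumptions for the example; the study's actual barrier settings are not given in this section.

```python
from typing import Sequence


def triple_barrier_label(entry: float,
                         highs: Sequence[float],
                         lows: Sequence[float],
                         up_pts: float,
                         down_pts: float,
                         max_bars: int) -> str:
    """Label a pivot UP, DOWN or NEUTRAL from the bars that follow it."""
    upper = entry + up_pts
    lower = entry - down_pts
    for high, low in list(zip(highs, lows))[:max_bars]:
        hit_up = high >= upper
        hit_down = low <= lower
        if hit_up and hit_down:
            return "NEUTRAL"   # order within the bar is unknown (assumption)
        if hit_up:
            return "UP"
        if hit_down:
            return "DOWN"
    return "NEUTRAL"           # time barrier reached first


up = triple_barrier_label(100.0, [101.0, 102.5], [99.5, 100.5], 2.0, 2.0, 5)
down = triple_barrier_label(100.0, [100.5, 100.2], [99.0, 97.5], 2.0, 2.0, 5)
timed_out = triple_barrier_label(100.0, [101.0, 102.5], [99.5, 100.5], 2.0, 2.0, 1)
```

Directional accuracy as reported would then be computed only over pivots labeled UP or DOWN.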

Data availability. Source 5-minute ES continuous bars are not redistributed with this paper (file-size and exchange-licensing constraints). Readers wishing to reproduce results should source unadjusted ES continuous from their preferred data provider. All derived result CSVs from this study — the slope sweep, gate sweep, session split, trail backtest, conviction stratifications, LSTM length sweep, and HMM ablation — are linked under Data files below, and reproduce the published figures directly when the same construction is applied.

4. Results

4.1 Kernel comparison: weighted-mean variants

Across the eight weighted-mean variants, overall pivot directional accuracy is essentially constant. Every variant produced an overall verification rate in a narrow 75.13%–76.00% band over the five-year sweep window; the differences across variants are within the measurement noise of the sample. The principal axis of variation across V1–V8 is the rotation count and the corresponding average swing size. In Table 1 the heavy exponential (V4) emits the most rotations at the smallest average swing (25,809 at 14.91 pt), followed by the flat kernel (V1 ≡ V8; 23,665 at 15.59 pt), while the linear recency ramp (V3) emits the fewest rotations at the largest average swing (19,366 at 17.29 pt).

Table 1. Slope-kernel comparison, V1–V8 (OC body reference; full sweep including HL2 body in `slope_pivot_sweep_ES_full_5min_continuous_UNadjusted_5yr.csv`).

| Variant | Kernel | Rotations | Verification rate | Avg swing (pt) | Total points captured |
|---|---|---|---|---|---|
| V2 | flat average | 20,826 | 76.00% | 16.71 | 348,073 |
| V3 | linear recency ramp | 19,366 | 75.91% | 17.29 | 334,858 |
| V7 | triangular | 20,726 | 75.90% | 16.72 | 346,557 |
| V5 | recent-half ramp | 20,028 | 75.87% | 16.99 | 340,359 |
| V1 ≡ V8 | flat / normalized-flat | 23,665 | 75.83% | 15.59 | 368,898 |
| V6 | inverse-distance | 21,000 | 75.78% | 16.59 | 348,416 |
| V4 | heavy exponential | 25,809 | 75.13% | 14.91 | 384,892 |

The OC and HL2 body reference settings produced near-identical outputs across all V1–V8 kernels (rotation counts within ≈0.5%, accuracy within 0.1 pp); OC values are reported throughout.

A separate observation worth recording: V8, designed as a normalization of V1 by the local price range, produced pivot output identical to V1 across the sweep. After sorting, 12 of the 14 columns matched in every row; the only two differing columns were the rank ordinal and the version label. This is a consequence of normalizing by a strictly positive scalar, which cannot change the sign of the slope; since the pivot detector consumes only the sign, the normalization is a no-op. V8 is therefore retained as a duplicate of V1 in the kernel table rather than treated as an independent variant.
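The sign-invariance argument is easy to check numerically. The sketch below uses synthetic slopes and synthetic positive ranges (not study data) to show that dividing by a strictly positive scalar never changes the sign that the detector consumes.

```python
import random


def sign(x: float) -> int:
    """Return +1, -1 or 0."""
    return (x > 0) - (x < 0)


random.seed(0)
# Synthetic raw slopes and strictly positive local ranges (illustrative only).
raw_slopes = [random.uniform(-5.0, 5.0) for _ in range(1000)]
local_ranges = [random.uniform(0.25, 50.0) for _ in range(1000)]

normalized = [s / r for s, r in zip(raw_slopes, local_ranges)]

signs_match = all(sign(s) == sign(n) for s, n in zip(raw_slopes, normalized))
```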

The kernel choice is interpreted as a swing-scale selector rather than an accuracy lever: any reasonable slope estimator over the same window identifies approximately the same directional turning points; variation across kernels reflects which subset of small-amplitude turns is included or excluded by the kernel's implicit smoothing.

4.2 ATR significance gate sensitivity

With V9 — the regression slope with explicit ATR gate — the gate parameter `K` is the principal control. Setting `K = 0` recovers an ungated regression detector with behavior comparable to V1–V8 (overall accuracy 74.94%, avg swing 16.41 pt). Tightening the gate reduces rotation count substantially and increases average swing size correspondingly.

Table 2. ATR-gate sensitivity for V9 (`slope_pivot_sweep_ES_full_5min_continuous_UNadjusted_5yr.csv`, V9 rows; rotations are 5-year totals, /yr column is the annualized rate).

| K | Rotations (5-yr total) | Rotations /yr | Overall accuracy | Avg swing (pt) | Total points captured |
|---|---|---|---|---|---|
| 0.00 | 20,158 | ≈4,032 | 74.94% | 16.41 | 330,783 |
| 0.05 | 15,368 | ≈3,074 | 76.50% | 19.32 | 296,937 |
| 0.10 | 11,568 | ≈2,314 | 77.35% | 22.73 | 262,888 |
| 0.15 | 8,433 | ≈1,687 | 78.18% | 27.16 | 229,007 |
| 0.20 | 5,854 | ≈1,171 | 78.46% | 33.23 | 194,523 |

Two structural observations follow. First, level-repaint match — measured as the agreement between the live indicator's confirmed pivot levels and a stat-only reproduction of the same construction — climbs to 100% once the gate is engaged at `K ≥ 0.10`. The sub-1% mismatches present in the ungated case are caused by small-amplitude turns where rounding sensitivity produces ambiguous confirmations; these are filtered by the gate. Second, total points captured falls monotonically as the gate tightens, which is the direct opportunity-cost signal of the gate setting. The mechanical inflation of the `success × swing` proxy that occurs under gate tightening is the methodological reason for reporting total capture alongside. From K=0.00 to K=0.20, average swing roughly doubles (16.41 → 33.23 pt) and accuracy gains 3.5 percentage points (74.94% → 78.46%), but total capture falls by 41% (330,783 → 194,523 pt) — fewer trades, larger each, less aggregate opportunity.

The gate is therefore characterized as a swing-scale dial with an explicit cost. Tighter gate produces fewer, larger, cleaner pivots at the cost of reduced total opportunity; looser gate produces the inverse.
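The divergence between the proxy and total capture can be reproduced from the Table 2 figures. In this sketch the proxy is taken to be accuracy (as a fraction) times average swing in points, which is an assumption about its exact scaling; the direction of the effect does not depend on that choice.

```python
# (K, rotations, accuracy, avg swing in pt, total points) from Table 2.
gate_rows = [
    (0.00, 20158, 0.7494, 16.41, 330783),
    (0.05, 15368, 0.7650, 19.32, 296937),
    (0.10, 11568, 0.7735, 22.73, 262888),
    (0.15, 8433, 0.7818, 27.16, 229007),
    (0.20, 5854, 0.7846, 33.23, 194523),
]

proxies = [acc * swing for _, _, acc, swing, _ in gate_rows]
totals = [total for *_, total in gate_rows]

# The proxy rises at every step while total capture falls at every step.
proxy_rising = all(a < b for a, b in zip(proxies, proxies[1:]))
capture_falling = all(a > b for a, b in zip(totals, totals[1:]))

# Fractional loss of total capture between K=0.00 and K=0.20.
capture_drop = 1 - totals[-1] / totals[0]

for (k, *_), proxy, total in zip(gate_rows, proxies, totals):
    print(f"K={k:.2f}  proxy={proxy:6.2f}  total_points={total:,}")
```

A ranking on the proxy alone would favor the tightest gate at every step, even though each tightening gives up aggregate points.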

4.3 Session-conditioned pivot behavior

Splitting confirmed pivots by RTH and ETH produces the largest single effect observed anywhere in the study.

Table 3. Pivot behavior by session

(`pivot_validation_v2_by_session_ES_full_5min_continuous_UNadjusted_5yr.csv`, gate K=0.10).

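| Session | n (pivots) | Swing verification | Avg swing (pt) | Respect rate | First-touch rejection | Revisit rate |
|---|---|---|---|---|---|---|
| RTH | 3,543 | 81.03% | 33.75 | 53.10% | 59.34% | 43.95% |
| ETH | 8,025 | 75.73% | 17.86 | 56.66% | 67.17% | 70.29% |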

RTH pivots are 1.89× the swing size of ETH pivots and verify 5.30 percentage points more often. With n_RTH=3,543 and n_ETH=8,025, the two-proportion z-statistic for the verification-rate difference is z ≈ 6.6, corresponding to p < 0.001 against a null of session-invariance.
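For readers who want to check the significance claim, the following sketch recomputes a two-proportion z-statistic from the rounded figures quoted above. Which standard-error convention the study used is not stated, so both the pooled and unpooled forms are shown; from the rounded inputs they give roughly 6.3 and 6.5 respectively, so small differences from the reported value may reflect rounding or convention. Either way the two-sided p-value is far below 0.001.

```python
from math import erfc, sqrt


def two_proportion_z(p1: float, n1: int, p2: float, n2: int,
                     pooled: bool = True) -> float:
    """z-statistic for the difference between two independent proportions."""
    if pooled:
        p = (p1 * n1 + p2 * n2) / (n1 + n2)
        se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    else:
        se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se


def two_sided_p(z: float) -> float:
    """Two-sided p-value under the standard normal."""
    return erfc(abs(z) / sqrt(2))


z_pooled = two_proportion_z(0.8103, 3543, 0.7573, 8025, pooled=True)
z_unpooled = two_proportion_z(0.8103, 3543, 0.7573, 8025, pooled=False)
p_value = two_sided_p(z_pooled)
```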

The reverse pattern appears in the level-respect columns. ETH pivots are revisited at 1.60× the RTH rate (70.29% vs. 43.95%), and the first-touch rejection rate at ETH levels is 7.83 percentage points higher than at RTH levels (67.17% vs. 59.34%). This is the empirical fingerprint of mean-reverting microstructure: levels persist as bounce zones because price oscillates around them rather than running through them.

The structural interpretation supported by these four metrics jointly is that the same engine produces operationally distinct pivots in the two sessions. RTH pivots mark directional structure (price runs through the level once and continues); ETH pivots mark range structure (price oscillates around the level and bounces). The session label, available as a free input from the bar timestamp, is sufficient to distinguish the two operational modes.

A relative trailing-stop backtest on the same pivot stream produces a consistent per-trade PnL advantage for RTH.

Table 4. Trail-backtest PnL by session, relative comparison (`session_trail_backtest_ES_full_5min_continuous_UNadjusted_5yr.csv`).

| Trail | Session | Trades | Mean PnL/trade (pt) | Win rate | R:R |
|---|---|---|---|---|---|
| ATR trail | RTH | 3,245 | 2.367 | 44.25% | 2.18 |
| ATR trail | ETH | 8,321 | 1.114 | 41.80% | 2.40 |
| Hybrid trail | RTH | 3,245 | 2.378 | 44.25% | 2.19 |
| Hybrid trail | ETH | 8,321 | 1.114 | 41.76% | 2.40 |
| **Pivot trail** | RTH | 3,245 | **2.058** | 43.02% | 1.91 |
| **Pivot trail** | ETH | 8,321 | **1.097** | 41.79% | 2.07 |

The per-trade ratio is 2.12× RTH-to-ETH across the ATR and hybrid trail formulas. Aggregated PnL across the test window is higher for ETH (9,272 pt vs. 7,682 pt for ATR-trail) because of the substantially larger overnight bar count and corresponding trade count, so the practical interpretation is not "discard ETH pivots" but "size RTH trades proportionally larger because each pivot is doing more work per trade." The absolute per-point figures are again *relative* — no commission or slippage modeling is included.
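The relationship between per-trade and aggregate PnL is simple arithmetic on the ATR-trail rows of Table 4, sketched here. Totals rebuilt from the rounded per-trade means land within a few points of the aggregates quoted above.

```python
# ATR-trail rows from Table 4: (trade count, mean PnL per trade in points).
rth_trades, rth_mean = 3245, 2.367
eth_trades, eth_mean = 8321, 1.114

per_trade_ratio = rth_mean / eth_mean    # RTH earns about 2.1x ETH per trade
rth_total = rth_trades * rth_mean        # aggregate RTH points
eth_total = eth_trades * eth_mean        # aggregate ETH points

# ETH wins on the aggregate only because it has about 2.6x as many trades.
trade_count_ratio = eth_trades / rth_trades
```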

The pivot-anchored trail rows (bold) are the negative finding discussed in §4.4: the pivot-anchored stop produced strictly lower per-trade PnL than the ATR trail in both sessions (RTH: 2.058 vs. 2.367 = 13% worse; ETH: 1.097 vs. 1.114 ≈ 1.5% worse) and a worse R:R profile.

4.4 Negative findings: pivot-anchored trail, volume conviction, and an HMM regime layer

Three augmentations of the engine were evaluated and rejected.

Pivot-anchored trailing stop. The hypothesis was that a trail anchored to the most recent prior-leg pivot level would be tighter and more structurally grounded than a fixed-multiple ATR trail. The data did not support this. The pivot-anchored trail produced 2.058 pt mean PnL/trade in RTH and 1.097 in ETH, against 2.367 and 1.114 for the ATR trail — strictly worse in both sessions, and with a worse risk-reward (RTH R:R 1.91 vs. 2.18). The hybrid trail (whichever stop is tighter at each bar) effectively reduces to ATR-trail performance (2.378 vs. 2.367 in RTH), because the pivot level rarely sat tighter than the ATR distance from the prevailing price. The pivots earn their utility at entry timing, level identification, and session selection; the trail formula is a separate problem for which ATR is at least as good.

Volume conviction. A composite score of relative volume and directional bar commitment was computed for each pivot and used to stratify the level-respect statistics (terciles: high / mid / low; n = 3,856 each). The hypothesis was that high-conviction pivots would exhibit higher first-touch rejection and higher overall verification rates. Unconditional results: high 78.35%, mid 78.29%, low 75.41%. That is a 2.94 percentage point gap between high and low, but high and mid are statistically indistinguishable (0.06 pp apart). A conviction signal with a true gradient should separate all three terciles; the high ≈ mid > low pattern points instead to a low-conviction noise floor rather than a high-conviction edge. Session-stratified, the high-minus-low gap narrows in RTH and widens in ETH: 1.65 pp in RTH (82.01% vs. 80.36%) and 3.66 pp in ETH (76.74% vs. 73.08%), with mid actually leading ETH (77.27%). The composite score was dropped from the released engine not because the effect disappears, but because it is small, non-monotonic in ETH, and inconsistent across sessions; it fails the test that a true signal should be monotonic in the discriminator.

HMM regime layer. An HMM-based regime classifier was tested both as a pre-filter (with two assignment strategies: argmax and a competitive "raced" variant) and as a feature input to the downstream LSTM. The argmax HMM produces 5 active regimes with mean persistence of 18.2 bars; the raced HMM produces 4 active regimes with mean persistence of 120.6 bars. Adding the regime label as an LSTM input produced a 0.2 pp lift in the downstream metric (0.9595 → 0.9615), which is within the model-training noise floor and cannot be distinguished from an uninformative regime label. The regime layer was dropped from the released engine.

4.5 LSTM length sweep: no predictive optimum

For each lookback length from `L = 6` to `L = 25`, the LSTM classifier described in §3.7 was trained walk-forward, per session, and evaluated out-of-sample on the triple-barrier directional label. The metric of interest is whether any length produces directional accuracy distinguishably above the 50% null hypothesis.

Table 5. LSTM out-of-sample directional accuracy by lookback length and session

(selected rows; full sweep of all 20 lengths in `length_opt_ES_full_5min_continuous_UNadjusted_5yr_lstm.csv`).

| Lookback L | RTH dir_acc | ETH dir_acc | RTH proxy | ETH proxy |
|---|---|---|---|---|
| 6 | 0.5008 | 0.5046 | 17.55 | 7.87 |
| 10 | 0.4996 | 0.5044 | 23.49 | 11.09 |
| 13 | 0.5039 | 0.5007 | 27.35 | 13.52 |
| 17 | 0.5095 | 0.5005 | 32.54 | 16.56 |
| 21 | 0.4991 | 0.5006 | 36.33 | 20.00 |
| 25 | 0.4987 | 0.5057 | 40.11 | 23.92 |

Across all 20 lengths in 6–25, in both sessions, directional accuracy lies within a narrow band centered on 0.50. The full RTH range is [0.4939, 0.5126] with mean 0.5023; the full ETH range is [0.4975, 0.5101] with mean 0.5028. No length produces accuracy distinguishable from chance at any conventional significance threshold; the 95% binomial confidence interval at every length includes 0.50.
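The confidence-interval check can be sketched with a normal-approximation interval. The out-of-sample count per length is not stated in this section, so `HYPOTHETICAL_N` below is a placeholder chosen purely for illustration; the interval narrows as the true count grows, so the check should be rerun with the actual per-length test counts from the released CSV.

```python
from math import sqrt
from typing import Tuple

HYPOTHETICAL_N = 1000  # placeholder test-set size, NOT taken from the study


def binomial_ci95(accuracy: float, n: int) -> Tuple[float, float]:
    """Normal-approximation 95% interval for an accuracy measured on n trials."""
    half_width = 1.96 * sqrt(accuracy * (1 - accuracy) / n)
    return accuracy - half_width, accuracy + half_width


def includes_chance(accuracy: float, n: int) -> bool:
    """True when the 95% interval contains 0.50."""
    low, high = binomial_ci95(accuracy, n)
    return low <= 0.5 <= high


# Extremes of the reported RTH and ETH accuracy ranges.
extremes = [0.4939, 0.5126, 0.4975, 0.5101]
all_include_chance = all(includes_chance(a, HYPOTHETICAL_N) for a in extremes)
```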

In contrast, the in-sample `success × swing` proxy increases monotonically with lookback length, exactly as expected from its known swing-size bias. In RTH the proxy rises from 17.55 at L=6 to 40.11 at L=25, a 2.29× increase, while dir_acc stays flat at roughly 0.50 (the L=6 and L=25 values differ by no more than about 0.2 pp in either session). In ETH the proxy rises 3.04× over the same range (7.87 to 23.92) with similarly flat dir_acc. A grid search ranked on the proxy would have selected `L = 25` as the optimum and shipped it. The LSTM evaluation establishes that this preference is an artifact of the proxy's bias rather than a real predictive edge.

The conclusion drawn from this section is that lookback length is not a tunable predictive parameter for this pivot construction. The pivots are accurate descriptors of *completed* swings (75.73% in ETH, 81.03% in RTH at the K = 0.10 gate used for Table 3); the geometry of a freshly-confirmed pivot does not encode information about the *next* move above chance, at any of the lookbacks tested. Lookback length is therefore appropriately treated as a responsiveness selector: shorter `L` produces more pivots confirmed at lower latency; longer `L` produces fewer pivots representing larger structural legs. The choice is operational, driven by the downstream consumer (e.g., a pattern scanner requiring leg density vs. a structural trader requiring quiet levels), and not optimized against a fit metric.

5. Discussion

5.1 Strengths of the approach

The construction has four operational properties that follow directly from its design and that we explicitly validated.

Non-repainting. Confirmed pivot levels do not move under subsequent price action. Repaint-match measurements at gate `K ≥ 0.10` are 100%. The level a downstream pattern scanner or stop placement reads is the level the indicator drew.

Robust leg registration. The regression-slope plus ATR-gate construction registers legs in the presence of single-bar noise, addressing the leg-skipping pathology of monotonic ascending/descending logic. Rotation counts at moderate gate settings are consistent with the swing density practitioners expect at the 5-minute resolution.

Operational tractability. Per-bar computation is dominated by a least-squares fit over a window of length `L ≤ 25` and an ATR calculation. The full 5-year sweep is tractable in minutes on a single machine.

Honest separation of descriptive and predictive claims. The kernel and gate decisions are made against the descriptive metric (swing verification); the lookback length is evaluated against an independent predictive metric (LSTM out-of-sample). The two evaluation channels disagree about lookback length, and we follow the predictive channel because that is the question the parameter purports to answer.

5.2 Limitations

Volatility regime robustness. The session-conditioned differences in Table 3 are large in absolute magnitude and supported by large per-session sample sizes (n_RTH=3,543; n_ETH=8,025), but a longer-window robustness check across volatility regimes is needed to confirm that the magnitude of the RTH–ETH gap is stable and not driven by one or two outlier years. This check is planned and not yet reported.

Single-instrument validation. All results are on ES 5-minute. Generalization to other index futures (NQ, RTY, YM), interest-rate futures, energy futures, or equity ETFs is not implied. The framework is general; specific parameter settings may not be.

Relative, not absolute, PnL. The trail-backtest PnL figures in Table 4 are relative comparisons. Entries are pivot turns rather than a production entry model, commissions and slippage are not charged, and execution assumptions are optimistic relative to live conditions. The *relative* findings — RTH ≈ 2× ETH per trade; ATR ≥ pivot-anchored for trailing — are interpreted as the load-bearing conclusions; the per-point magnitudes are not.

Timezone-dependent session labeling. The RTH/ETH split depends on bar timestamps interpreted as Eastern Time and on the 09:30–16:00 RTH window. The four independent metrics in Table 3 jointly produce a textbook session microstructure pattern, which is strong corroborating evidence that the labeling is correct, but the dependence is noted explicitly.

No claim about downstream signal performance. The validation in this paper is of pivot-stream properties — non-repainting, leg registration robustness, swing verification, session-conditioned behavior, and the absence of a predictive optimum in lookback length. The performance of any specific harmonic pattern, trading strategy, or signal built on top of these pivots is a separate evaluation problem and is not addressed here.

5.3 Relationship to existing literature

The use of ATR (Wilder, 1978) to define a market-relative significance threshold is well established. The session-microstructure finding is consistent with the broader intraday seasonality literature (Andersen & Bollerslev, 1997), in which RTH and ETH are shown to differ systematically in volatility profile, volume concentration, and price-discovery role. The out-of-sample failure of a parameter that looks favorable in-sample is a textbook case of the proxy-bias / lookahead pathology described in López de Prado (2018), and the triple-barrier label used here follows that book's methodology directly; the walk-forward LSTM evaluation applies the same out-of-sample discipline. The negative result on lookback length is, accordingly, an instance of the broader category of "in-sample fit metric inflated by mechanical selection effect," not a finding specific to this construction.

6. Conclusion

A pivot detector built from a least-squares regression slope and an ATR significance gate, applied per bar and confirmed non-destructively, satisfies the operational requirements (non-repainting, robust leg registration, tractable compute) that are necessary for use as a substrate for higher-level pattern and signal logic. The choice of slope weighting kernel does not affect directional accuracy; the ATR gate functions as an explicit swing-scale dial with a known cost; and the session label produces operationally distinct pivot behavior — directional in RTH, mean-reverting in ETH — that is exploitable as a built-in use-case switch.

An independent out-of-sample evaluation using an LSTM trained on a triple-barrier directional label established that the lookback length parameter does not encode a tunable predictive edge across the tested range. The pivots are descriptors of completed swing structure with high verification rates, not predictors of the next move; lookback length is accordingly treated as a responsiveness parameter set by the downstream consumer's requirements.

The MarketFragments Pivot Engine is released as a single ThinkScript study implementing the V9 construction with session-switched gate defaults (`K = 0.15` RTH; `K = 0.05` ETH). The indicator and its short specification are published in the Thinkscript Library group at [marketfragments.com/group-page/thinkscript-library/discussion/a307d17d-16a0-4bae-b0cc-bc3d461fc55b][indicator-drop]. The full slope-sweep, gate-sweep, session-split, trail-backtest, and LSTM-length-sweep result CSVs accompany this paper. The out-of-sample evaluation of the lookback-length parameter is published separately as a companion paper, [*Out-of-Sample Evaluation of Lookback Length as a Predictive Parameter in Regression-Based Pivot Detection: An LSTM Negative Result*][lstm-companion]. Future work will extend the evaluation to additional instruments, formalize the longer-window volatility-regime robustness check, and investigate whether session-conditioned pivot statistics can be embedded as features in downstream pattern-scanner scoring rather than used purely as a use-case label.
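The session-switched gate defaults can be expressed as a small lookup. The sketch below is in Python rather than ThinkScript and is not taken from the released study: the RTH window of 09:30 to 16:00 exchange-local (New York) time, the function names, and the omission of weekend and holiday handling are assumptions made for illustration. Only the two `K` values come from the paper.

```python
from datetime import time

# Assumed RTH window in exchange-local (New York) time; the paper's exact
# session boundaries are defined elsewhere and may differ.
RTH_OPEN = time(9, 30)
RTH_CLOSE = time(16, 0)

# Gate defaults stated in the paper.
GATE_K = {"RTH": 0.15, "ETH": 0.05}


def session_of(bar_time: time) -> str:
    """Classify a bar's local time of day as 'RTH' or 'ETH'."""
    return "RTH" if RTH_OPEN <= bar_time < RTH_CLOSE else "ETH"


def gate_k(bar_time: time) -> float:
    """Return the session-switched ATR gate multiplier for a bar."""
    return GATE_K[session_of(bar_time)]


if __name__ == "__main__":
    for t in (time(3, 0), time(9, 30), time(15, 55), time(16, 0)):
        print(t, session_of(t), gate_k(t))
```

Keeping the session classifier separate from the gate lookup means the same label can be reused wherever session-conditioned behavior is needed, for example to choose between directional and mean-reverting handling downstream.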

[indicator-drop]: https://www.marketfragments.com/group-page/thinkscript-library/discussion/a307d17d-16a0-4bae-b0cc-bc3d461fc55b

[lstm-companion]: https://www.marketfragments.com/post/mf-strategy-factory-pivot-engine-lstm-companion

Data files

Every result CSV/XLSX cited in this paper is hosted in the MarketFragments Media Library. Raw event-level data is included for independent re-analysis.

Slope-kernel and ATR-gate sweeps (§3.2, §4.1, §4.2)

▸ slope_pivot_sweep_ES_full_5min_continuous_UNadjusted_5yr.csv — Full 9-variant × body × gate sweep (V1–V9, OC/HL2 bodies, K=0.00–0.20).

▸ slope_pivot_sweep_ES_full_5min_continuous_UNadjusted_5yr.xlsx — Same data, Excel format.

Session-conditioned pivot behavior (§3.4, §4.3)

▸ pivot_validation_v2_by_session_ES_full_5min_continuous_UNadjusted_5yr.csv — Per-session swing success, avg swing, respect rate, first-touch rejection, revisit rate (source of Table 3).

▸ session_gate_sweep_ES_full_5min_continuous_UNadjusted_5yr.csv — Per-session × gate-K behavior across the full sweep window.

Trailing-stop backtest (§3.6, §4.3, §4.4)

▸ session_trail_backtest_ES_full_5min_continuous_UNadjusted_5yr.csv — ATR / hybrid / pivot-anchored trail PnL, win rate, R:R by session (source of Table 4).

Conviction and pivot-validation detail (§4.4)

▸ pivot_validation_v2_by_conviction_ES_full_5min_continuous_UNadjusted_5yr.csv — Volume-conviction tercile stratification (high / mid / low).

▸ pivot_validation_v2_ES_full_5min_continuous_UNadjusted_5yr.xlsx — Combined v2 pivot-validation workbook (all stratifications, Excel).

▸ pivot_validation_ES_full_5min_continuous_UNadjusted_5yr.xlsx — Original (v1) pivot-validation workbook.

▸ pivot_validation_raw_ES_full_5min_continuous_UNadjusted_5yr.csv — Event-level raw pivot data for independent re-analysis.

Rejected regime layer (§4.4)

▸ hmm_engine_comparison.csv — argmax vs. raced HMM: entropy, persistence, active-regime count.

▸ lstm_regime_ablation.csv — Regime-layer ablation on the LSTM (lift = 0.002).

▸ regime_stratified_sweep.csv — Sweep stratified by regime label — exploratory.

LSTM lookback-length sweep (§4.5)

▸ length_opt_ES_full_5min_continuous_UNadjusted_5yr_lstm.csv — Per-length, per-session OOS dir_acc + proxy values (source of Table 5).

▸ length_opt_ES_full_5min_continuous_UNadjusted_5yr_lstm.xlsx — Same data, Excel format.

References

Andersen, T. G., & Bollerslev, T. (1997). Intraday periodicity and volatility persistence in financial markets. *Journal of Empirical Finance*, 4(2–3), 115–158.

Gartley, H. M. (1935). *Profits in the Stock Market*. Lambert-Gann Publishing.

Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. *Neural Computation*, 9(8), 1735–1780.

López de Prado, M. (2018). *Advances in Financial Machine Learning*. Wiley.

Wilder, J. W. (1978). *New Concepts in Technical Trading Systems*. Trend Research.

Companion materials

- Out-of-Sample Evaluation of Lookback Length as a Predictive Parameter in Regression-Based Pivot Detection: An LSTM Negative Result. MF Strategy Factory companion paper: https://www.marketfragments.com/post/mf-strategy-factory-pivot-engine-lstm-companion

- MarketFragments Pivot Engine — Market Structure Drop. ThinkScript indicator and short specification, Thinkscript Library group: https://www.marketfragments.com/group-page/thinkscript-library/discussion/a307d17d-16a0-4bae-b0cc-bc3d461fc55b

- MF Strategy Factory category index: https://www.marketfragments.com/blog/categories/mf-strategy-factory

---

Research preview. Not financial advice. Figures are from historical study and are not a promise of future results.