pith. machine review for the scientific record.

arxiv: 2605.04004 · v1 · submitted 2026-05-05 · 💱 q-fin.TR · q-fin.CP · q-fin.ST

Recognition: unknown

Structural Limits of OHLCV-Based Intraday Signals in MNQ Futures: A Systematic Falsification Study

Mathias Mesfin

Pith reviewed 2026-05-07 03:26 UTC · model grok-4.3

classification 💱 q-fin.TR · q-fin.CP · q-fin.ST
keywords MNQ futures · OHLCV signals · intraday trading · falsification study · trading edge · transaction costs · momentum strategies · walk-forward validation

The pith

No OHLCV-derived intraday signal meets institutional criteria for a tradable edge in MNQ futures.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests fourteen families of signals built from five-minute open-high-low-close-volume bars, including opening-range breakouts, gap continuations, volume surges, liquidity grabs, and volatility-conditioned rules, all applied to Micro E-mini Nasdaq 100 futures across 947 trading days. Each signal family is required to clear four simultaneous hurdles in out-of-sample walk-forward tests: a T-statistic of at least 2.0, a minimum of thirty trades, positive net return after a fixed two-point round-trip cost, and stability across multiple years. None of the families satisfies every condition at once. The gross edge available under next-bar-open execution is roughly 0.07 to 1.50 points per trade, which is below the two-point round-trip cost. The study supplies a reproducible null result that places an explicit upper bound on the gross edge recoverable from standard price and volume data in this market under the tested execution model.

Core claim

Despite exhaustive evaluation of opening-range breakouts, gap continuations, liquidity grabs, cross-session momentum, and other OHLCV-derived rules, no configuration simultaneously satisfies the full set of institutional performance criteria. A gap-continuation rule produces a T-statistic of 3.23 and a cumulative profit of 14.52 points yet triggers only twenty-two trades, violating the sample-size requirement. Two signals known to work from prior internal validation pass the same tests, confirming that the framework is capable of detecting edge when it exists. The gross per-trade advantage available to any of the tested rules therefore remains bounded between 0.07 and 1.50 points, which is 0

What carries the argument

A walk-forward falsification protocol that applies four pre-specified institutional filters (T-statistic >= 2.0, minimum 30 trades, positive net return after two-point cost, multi-year stability) to fourteen distinct signal families on five-minute MNQ data.
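As a rough illustration of how the four filters combine, the sketch below gates a signal on all of them at once. It is not the author's code: the names `Criteria`, `SignalStats`, `evaluate`, and `passes` are hypothetical, and reading multi-year stability as "at least two calendar years, each of them net positive" is an assumption, since this page does not define the term. The example reuses the reported T = 3.23 and N = 22 for the gap-continuation signal; the per-year figures are placeholders.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Criteria:
    min_t_stat: float = 2.0  # out-of-sample T-statistic threshold (from the paper)
    min_trades: int = 30     # minimum trade count (from the paper)
    min_years: int = 2       # ASSUMPTION: "multi-year" read as at least two years


@dataclass(frozen=True)
class SignalStats:
    t_stat: float                         # out-of-sample T-statistic
    n_trades: int                         # out-of-sample trade count
    net_points_by_year: Dict[int, float]  # net points per year, after the 2-point cost


def evaluate(stats: SignalStats, crit: Criteria = Criteria()) -> Dict[str, bool]:
    """Return one pass/fail flag per filter."""
    yearly = list(stats.net_points_by_year.values())
    return {
        "t_stat": stats.t_stat >= crit.min_t_stat,
        "trade_count": stats.n_trades >= crit.min_trades,
        "net_return": sum(yearly) > 0,
        # ASSUMPTION: stability means every observed year is net positive.
        "multi_year": len(yearly) >= crit.min_years and all(v > 0 for v in yearly),
    }


def passes(stats: SignalStats, crit: Criteria = Criteria()) -> bool:
    """A signal passes only if every filter passes simultaneously."""
    return all(evaluate(stats, crit).values())


# T and N are the reported gap-continuation values; yearly figures are placeholders.
gap = SignalStats(t_stat=3.23, n_trades=22, net_points_by_year={2023: 1.0, 2024: 1.0})
report = evaluate(gap)
print(report, passes(gap))
```

Returning a flag per filter rather than a single boolean makes it visible which hurdle rejects a signal, which is how the gap-continuation case is described on this page: it clears the T-statistic bar and fails only on trade count.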

If this is right

  • Standard OHLCV-based intraday strategies cannot generate consistent profits in MNQ futures once realistic transaction costs are included.
  • Execution at the next bar open on five-minute bars leaves insufficient room for slippage and commissions in this contract.
  • Signals that appear profitable on raw data lose their edge when subjected to realistic sample-size and stability requirements.
  • The methodology correctly identifies edge when it is present, as shown by the two positive-control signals that pass every filter.
  • The documented upper bound of 1.50 points gross per trade applies across momentum, gap, volume, and volatility-conditioned families alike.
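The cost argument behind these points is simple arithmetic, sketched here using only the figures quoted on this page (a gross edge of 0.07 to 1.50 points and a fixed 2.0-point round-trip cost). The function name `net_edge` is hypothetical.

```python
COST_POINTS = 2.0                 # fixed round-trip cost quoted in the paper
GROSS_EDGE_RANGE = (0.07, 1.50)   # reported gross edge per trade, in points


def net_edge(gross_points: float, cost_points: float = COST_POINTS) -> float:
    """Net points per trade after subtracting the round-trip cost."""
    return gross_points - cost_points


low, high = (net_edge(g) for g in GROSS_EDGE_RANGE)
shortfall = COST_POINTS - GROSS_EDGE_RANGE[1]  # extra gross edge needed to break even

print(f"net edge range: {low:.2f} to {high:.2f} points per trade")
print(f"break-even shortfall at the best case: {shortfall:.2f} points")
```

Even at the top of the reported range the net figure is negative, so a signal would need at least another half point of gross edge per trade before the net-return filter could pass under this cost model.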

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same structural constraint on gross edge is likely to appear in other liquid index futures when subjected to identical filters.
  • Traders seeking intraday edges in MNQ will probably need to incorporate order-book imbalance or external news flow rather than OHLCV alone.
  • Extending the falsification protocol to additional years or to the full-size E-mini Nasdaq contract would provide a direct test of robustness.
  • The null result supplies a concrete baseline against which any newly proposed intraday rule can be measured.

Load-bearing premise

The pre-specified institutional criteria correctly identify whether any tradable edge exists, and the 2021-2025 MNQ data with next-bar-open execution fairly represent live trading conditions.

What would settle it

Discovery of even one OHLCV-derived signal that simultaneously achieves a T-statistic of at least 2.0, produces at least 30 trades, delivers positive net profit after two-point round-trip costs, and remains stable across multiple years in an out-of-sample test would falsify the structural-limit claim.

read the original abstract

This paper tests whether intraday momentum signals derived from open-high-low-close-volume (OHLCV) data produce a statistically significant trading edge in Micro E-mini Nasdaq 100 futures (MNQ) under realistic execution constraints. Using 947 trading days of five-minute data (2021-2025), fourteen signal families are evaluated, including opening range breakouts, gap strategies, volume signals, cross-session momentum, liquidity grabs, volatility-conditioned classifiers, and news-driven strategies. All signals are assessed using strict institutional criteria: out-of-sample walk-forward validation, minimum T-statistic of 2.0, at least 30 trades, positive net return after a fixed two-point round-trip cost, and multi-year stability. No signal satisfies all criteria simultaneously. The gross edge available to next-bar-open execution is constrained to approximately 0.07-1.50 points per trade, insufficient to overcome transaction costs. A gap-continuation signal achieves T = 3.23 and +14.52 points but fails minimum sample requirements (N = 22). Two validated signals from a separate research program are included as positive controls, confirming the methodology detects genuine edge when present. The primary contribution is a reproducible falsification framework and a documented null result, highlighting structural limits of OHLCV-based intraday strategies.
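For readers unfamiliar with the statistic, the sketch below computes a one-sample T-statistic on per-trade results, testing whether the mean differs from zero. This is the textbook formula; the page does not say which estimator the paper uses, so treat the choice as an assumption. The function name `t_statistic` is hypothetical.

```python
import math
import statistics
from typing import Sequence


def t_statistic(trade_pnl: Sequence[float]) -> float:
    """One-sample t-statistic of per-trade P&L against a zero mean.

    t = mean / (sample_std / sqrt(n)), using the n - 1 sample standard deviation.
    """
    n = len(trade_pnl)
    if n < 2:
        raise ValueError("need at least two trades to estimate dispersion")
    mean = statistics.fmean(trade_pnl)
    sd = statistics.stdev(trade_pnl)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("zero dispersion: t-statistic is undefined")
    return mean / (sd / math.sqrt(n))


# Toy input, not data from the paper.
example_t = t_statistic([1.0, 2.0, 3.0])
print(round(example_t, 4))
```

Because the denominator shrinks with the square root of the trade count, a small sample can produce a large T-statistic from a handful of favourable trades, which is the motivation for pairing the T-statistic threshold with a minimum trade count.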

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper evaluates 14 families of OHLCV-derived intraday signals (opening range breakouts, gap strategies, volume signals, cross-session momentum, liquidity grabs, volatility classifiers, and news-driven approaches) on 947 days of 5-minute MNQ futures data (2021-2025). Using walk-forward out-of-sample validation and pre-specified institutional criteria (T-statistic ≥ 2.0, minimum 30 trades, positive net return after a fixed 2-point round-trip cost, and multi-year stability), no signal meets all criteria simultaneously. Gross edge under next-bar-open execution is reported as constrained to 0.07-1.50 points per trade. A gap-continuation signal achieves T = 3.23 and +14.52 points (N = 22) but is excluded by the sample-size rule. Two positive-control signals from a separate program pass the criteria, validating detection power. The contribution is a reproducible falsification framework and documented null result.

Significance. If the central null result holds under the stated assumptions, the paper supplies a valuable, reproducible falsification benchmark for OHLCV-based intraday strategies. The inclusion of validated positive controls strengthens the methodology and distinguishes this from an untested null claim. It provides concrete quantitative bounds on gross edge and highlights the difficulty of overcoming realistic costs with standard OHLCV features, which can serve as a reference point for future work on intraday futures trading.

major comments (2)
  1. [Results section (gap-continuation signal)] The signal reports T = 3.23, +14.52 points net, yet N = 22 trades and is excluded solely by the N ≥ 30 rule. This threshold is load-bearing for the claim that 'no signal satisfies all criteria simultaneously,' but the manuscript provides no sensitivity analysis varying the minimum-trade threshold (e.g., N ≥ 20 or N ≥ 25) to show whether the null conclusion is robust to reasonable changes in the pre-specified criterion.
  2. [Methodology (execution and cost model)] The central claim that gross edge is 'insufficient to overcome transaction costs' rests on next-bar-open execution plus a fixed 2-point round-trip cost. No alternative execution models (limit-order entry, variable slippage, or regime-dependent costs) are tested. Because the paper's title asserts 'structural limits,' the absence of robustness checks on the execution assumption is load-bearing for the generalization beyond the specific simulation.
minor comments (3)
  1. [Abstract] The reported gross-edge range (0.07-1.50 points) should explicitly identify which signal family or families attain the upper bound to improve interpretability.
  2. [Signal definitions] Several families (liquidity grabs, volatility-conditioned classifiers) would benefit from explicit pseudocode or mathematical formulations in an appendix to support exact reproducibility.
  3. [Figures] Walk-forward equity curves would be clearer if out-of-sample periods were shaded distinctly and if axis scales were standardized across panels.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments raise valid points regarding the robustness of our criteria and assumptions. We address each major comment below and outline the revisions we will make to the manuscript.

read point-by-point responses
  1. Referee: Results section (gap-continuation signal): The signal reports T = 3.23, +14.52 points net, yet N = 22 trades and is excluded solely by the N >= 30 rule. This threshold is load-bearing for the claim that 'no signal satisfies all criteria simultaneously,' but the manuscript provides no sensitivity analysis varying the minimum-trade threshold (e.g., N >= 20 or N >= 25) to show whether the null conclusion is robust to reasonable changes in the pre-specified criterion.

    Authors: We agree that examining the sensitivity of our conclusions to the minimum trade threshold is important for demonstrating robustness. In the revised manuscript, we will add a sensitivity analysis that varies the N threshold from 20 to 40 and reports how many signals (if any) meet the other criteria under these alternative thresholds. This will clarify whether the null result is driven by the specific choice of N >= 30. revision: yes

  2. Referee: Methodology (execution and cost model): The central claim that gross edge is 'insufficient to overcome transaction costs' rests on next-bar-open execution plus a fixed 2-point round-trip cost. No alternative execution models (limit-order entry, variable slippage, or regime-dependent costs) are tested. Because the paper's title asserts 'structural limits,' the absence of robustness checks on the execution assumption is load-bearing for the generalization beyond the specific simulation.

    Authors: The next-bar-open execution model with a fixed cost is chosen as a conservative and transparent benchmark that aligns with the OHLCV data resolution and avoids assumptions about intrabar liquidity. We acknowledge that this limits the generalizability of the 'structural limits' claim. In the revision, we will add a new subsection in the methodology discussing the rationale for this model, its limitations, and how alternative assumptions (such as limit-order placement or variable costs) might affect the results. We will also revise the title and abstract to more precisely scope the findings to next-bar-open execution. Full testing of limit-order strategies would require additional modeling of order book dynamics not available in our 5-minute OHLCV dataset, but we will provide qualitative bounds where possible. revision: partial
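The sensitivity analysis promised in the first response could take roughly the following shape: sweep the minimum-trade threshold and list which signals pass at each value. This is a sketch, not the authors' planned analysis. The names `Result` and `sweep` are hypothetical, the second signal is invented for illustration, and whether the gap-continuation signal passes the net-return and multi-year checks is assumed here rather than taken from the paper.

```python
from typing import Dict, Iterable, List, NamedTuple


class Result(NamedTuple):
    name: str
    t_stat: float
    n_trades: int
    passes_other: bool  # net-return and multi-year checks, computed elsewhere


def sweep(results: Iterable[Result], thresholds: Iterable[int],
          min_t: float = 2.0) -> Dict[int, List[str]]:
    """For each minimum-trade threshold, list the signals passing every filter."""
    results = list(results)
    return {
        n_min: [r.name for r in results
                if r.t_stat >= min_t and r.n_trades >= n_min and r.passes_other]
        for n_min in thresholds
    }


signals = [
    # T and N as reported; passes_other=True is an ASSUMPTION for illustration.
    Result("gap_continuation", t_stat=3.23, n_trades=22, passes_other=True),
    # Entirely hypothetical signal, included only to show a failing row.
    Result("hypothetical_orb", t_stat=1.10, n_trades=140, passes_other=False),
]

table = sweep(signals, range(20, 41, 5))
for n_min, names in table.items():
    print(n_min, names)
```

Under these assumed inputs the gap-continuation signal would pass at a threshold of 20 and fail from 25 upward, which is exactly the kind of dependence on the pre-specified N that the referee asks the authors to report.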

Circularity Check

0 steps flagged

Empirical falsification with pre-specified criteria yields independent null result

full rationale

The paper applies fourteen pre-specified OHLCV signal families to 2021-2025 MNQ five-minute data under walk-forward validation and fixed institutional thresholds (T-stat >=2.0, N>=30, positive net after 2-point costs, multi-year stability). The central finding that none satisfy all criteria simultaneously is a direct computational outcome on the held-out data and does not reduce to any self-referential definition, fitted parameter renamed as prediction, or self-citation chain. Positive controls from a separate program confirm test power but are not required to establish the null on the tested signals. The framework is self-contained against external data benchmarks and does not invoke uniqueness theorems or ansatzes that loop back to the paper's own inputs.
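To make the held-out structure concrete, here is a minimal sketch of rolling walk-forward splits over the 947 trading days. The window lengths (250 training days, 60 test days) are placeholders, since this page does not state the paper's window sizes, and `walk_forward_splits` is a hypothetical helper.

```python
from typing import List, Tuple


def walk_forward_splits(n_days: int, train_days: int,
                        test_days: int) -> List[Tuple[range, range]]:
    """Rolling (train, test) index ranges; each test window follows its train window."""
    if train_days <= 0 or test_days <= 0:
        raise ValueError("window lengths must be positive")
    splits = []
    start = 0
    while start + train_days + test_days <= n_days:
        train = range(start, start + train_days)
        test = range(start + train_days, start + train_days + test_days)
        splits.append((train, test))
        start += test_days  # advance by one test window so test sets never overlap
    return splits


# 947 days is from the paper; 250 and 60 are ASSUMED window lengths.
splits = walk_forward_splits(947, train_days=250, test_days=60)
print(len(splits), splits[0], splits[-1])
```

Only results computed on the test ranges would count toward the four filters; parameters chosen on a training range are frozen before the following test range is touched.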

Axiom & Free-Parameter Ledger

3 free parameters · 2 axioms · 0 invented entities

The central claim rests on conventional but arbitrary thresholds for statistical and practical significance plus the assumption that the chosen data period and execution model are representative; these are not derived from first principles.

free parameters (3)
  • minimum T-statistic = 2.0
    Threshold of 2.0 chosen as the bar for statistical significance
  • minimum number of trades = 30
    Threshold of 30 trades chosen to ensure sample reliability
  • round-trip transaction cost = 2 points
    Fixed 2-point cost assumed for realistic MNQ execution
axioms (2)
  • domain assumption Walk-forward out-of-sample validation on 2021-2025 data accurately reflects future performance
    Invoked to support the claim that no signal meets the criteria
  • domain assumption Next-bar-open execution with fixed costs is a realistic representation of institutional trading
    Used to conclude that the observed gross edge is insufficient
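The second axiom can be made concrete with a small sketch of next-bar-open execution under a fixed cost: a signal observed on bar i is filled at the open of bar i + 1, and the round-trip cost is subtracted from the result. The exit rule (close the position at the open a fixed number of bars later) is an assumption, as this page does not describe the paper's exit logic, and `next_bar_open_pnl` is a hypothetical name.

```python
from typing import Optional, Sequence


def next_bar_open_pnl(opens: Sequence[float], signal_index: int, direction: int,
                      hold_bars: int, cost_points: float = 2.0) -> Optional[float]:
    """Net points for one trade entered at the open of the bar after the signal.

    direction is +1 for long, -1 for short. Returns None when the exit bar
    falls outside the available data.
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    entry_i = signal_index + 1     # fill on the next bar's open, never the signal bar
    exit_i = entry_i + hold_bars   # ASSUMPTION: fixed-horizon exit at a later open
    if exit_i >= len(opens):
        return None
    gross = direction * (opens[exit_i] - opens[entry_i])
    return gross - cost_points


# Toy open prices, not MNQ data.
opens = [100.0, 101.0, 102.0, 104.5]
net = next_bar_open_pnl(opens, signal_index=0, direction=1, hold_bars=2)
print(net)
```

Filling at the next open avoids look-ahead into the signal bar, but it also gives up whatever move occurs between the signal and the fill, which is one reason the measured gross edge is sensitive to this modelling choice.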

pith-pipeline@v0.9.0 · 5539 in / 1745 out tokens · 77193 ms · 2026-05-07T03:26:13.299298+00:00 · methodology


Reference graph

Works this paper leans on

8 extracted references

  1. [1]

    The practical question—whether such patterns survive realistic execution costs on modern, highly liquid instruments—receives considerably less rigorous treatment

    Introduction The hypothesis that short-term price patterns in equity index futures contain exploitable directional information is widely held in retail trading communities and appears frequently in early academic literature on intraday momentum (Gao, Han, Li, & Zhou, 2018; Heston, Korajczyk, & Sadka, 2010). The practical question—whether such patterns sur...

  2. [2]

    After removing partial days and session boundary artifacts, 947 complete trading days remain

    Data and Execution Framework 2.1 Data Mesfin (2026) | 2 The primary dataset consists of 72,604 five-minute OHLCV bars for MNQ continuous front-month futures, covering regular trading hours (09:30–16:00 ET) from December 2021 through August 2025. After removing partial days and session boundary artifacts, 947 complete trading days remain. Data was sourced ...

  3. [3]

    Satisfying a subset is not sufficient for deployment consideration

    Validation Standards A signal is considered to pass validation only if it satisfies all five of the following criteria simultaneously. Satisfying a subset is not sufficient for deployment consideration. Criterion Threshold Rationale T-statistic ≥ 2.0 (OOS only) Minimum statistical significance; in-sample T-stats are not reported as evidence Trade count ≥ ...

  4. [4]

    The hypothesis is that a price breakout beyond the high or low of the first N bars of the session signals directional continuation

    Signal Families Tested 4.1 Opening Range Breakout (ORB) The opening range breakout is among the most widely discussed intraday signals in retail futures trading literature. The hypothesis is that a price breakout beyond the high or low of the first N bars of the session signals directional continuation. We test the 09:30–09:55 ET opening...

  5. [5]

    Positive Control Signals Two signals from a separate research program are presented here as positive controls. Their purpose is to confirm that the methodology used in this paper is capable of detecting genuine edge when it exists—that the consistent null results in Section 4 reflect a true absence of edge rather than a flaw in the testing framework. Thes...

  6. [6]

    The minimum friction assumption of 2.0 points round-trip consistently exceeds this gross edge

    The Gross Edge Ceiling: A Structural Interpretation 6.1 Quantifying the Ceiling Across all fourteen signal families in Section 4, the maximum gross return observed before friction deduction is approximately 1.05 to 1.50 points at the most favorable tested horizons. The minimum friction assumption of 2.0 points round-trip consistently exceeds this gross ed...

  7. [7]

    Actual friction varies with session time, volatility, and order size

    Limitations and Extensions 7.1 Limitations The friction assumption of two points round-trip is fixed throughout this study. Actual friction varies with session time, volatility, and order size. In high-volatility regimes, MNQ bid-ask spreads can widen significantly, making the two-point assumption conservative and potentially understati...

  8. [8]

    No signal survived the combined criteria of T ≥ 2.0 out-of-sample, ≥ 30 trades, positive net return after friction, and multi-year consistency

    Conclusion This paper has documented a systematic falsification study of fourteen OHLCV-based intraday momentum signal families on MNQ futures. No signal survived the combined criteria of T ≥ 2.0 out-of-sample, ≥ 30 trades, positive net return after friction, and multi-year consistency. The consistent finding across all signal families ...