pith.

arXiv: 2606.23367 · v1 · pith:WO5V7NS2 · submitted 2026-06-22 · 💱 q-fin.CP · cs.CE · cs.DC · math.OC · q-fin.PM

Asymmetry PRISM: A CPU/GPU Portfolio Optimization Engine for Deadline-Bounded Institutional Rebalancing

Pith reviewed 2026-06-26 05:48 UTC · model grok-4.3

classification: 💱 q-fin.CP · cs.CE · cs.DC · math.OC · q-fin.PM
keywords: portfolio optimization · GPU acceleration · institutional rebalancing · deadline constraints · tax-aware optimization · batch quadratic programming · constraint handling · audit records

The pith

Asymmetry PRISM completes 500 institutional rebalances over a 10,000-instrument universe in 109.5 seconds on GPU while meeting a 25-minute deadline.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces and benchmarks Asymmetry PRISM, a CPU/GPU engine built for batched portfolio rebalancing under hard deadlines and multiple constraints, including taxes, turnover, exposures, and exclusions. It reports that the CPU route runs 4.5x to 24.1x faster than the fastest completed reference solver on multi-solver problems sized N=100 to N=2,000, and that the GPU route finishes all 500 accounts of a production queue in 109.5 seconds with zero missed deadlines. A reader would care because real institutional rebalancing must produce new weights before trading windows close, and missed deadlines force either delayed trades or simplified models. The evaluation fixes hardware and software versions, declares its timing lanes, and claims objective gaps only where a reference solver also finished.

Core claim

Asymmetry PRISM is a portfolio optimization engine that on completed multi-solver rows from N=100 to N=2,000 is 4.5x to 24.1x faster than the fastest completed reference row in the same lane; on a production queue of 500 accounts over a 10,000-instrument universe the GPU route finishes all 500 solves in 109.5 seconds inside a declared 25-minute operating window with an audit record for every solve while the recorded OSQP baseline finishes only 4 of 500; on an operationally constrained real-data suite the engine clears constrained solves 3.4x to 126.7x faster than the best completing incumbent at certified-equal objectives and the GPU route widens to 8.8x over the CPU route at N=384,800.
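As a quick sanity check on the queue figures, the arithmetic below uses only the numbers quoted above (500 accounts, 109.5 s, a 25-minute window). It assumes 109.5 s is the end-to-end wall-clock time for the whole queue; the headroom figure additionally assumes solve times would stay the same if the queue were repeated, which the paper does not claim.

```python
# Figures quoted in the core claim.
ACCOUNTS = 500
QUEUE_SECONDS = 109.5          # GPU route, all 500 solves
WINDOW_SECONDS = 25 * 60       # declared 25-minute operating window = 1500 s

mean_per_account = QUEUE_SECONDS / ACCOUNTS        # average seconds per account
window_fraction = QUEUE_SECONDS / WINDOW_SECONDS   # share of the window consumed
headroom = WINDOW_SECONDS / QUEUE_SECONDS          # assumption: repeated queues take equally long

print(f"{mean_per_account:.3f} s/account")   # 0.219 s/account
print(f"{window_fraction:.1%} of window")    # 7.3% of window
print(f"{headroom:.1f}x headroom")           # 13.7x headroom
```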

What carries the argument

Asymmetry PRISM, a CPU/GPU portfolio optimization engine that ingests problem data and returns weights, status codes, timings, memory class, feasibility diagnostics, and audit records for batched institutional rebalancing.
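A minimal sketch of what one record leaving that boundary might look like. The class and field names (`SolveRecord`, `SolveStatus`, `max_constraint_violation`, the memory-class string) are hypothetical illustrations of the outputs listed above, not the engine's actual interface, which this summary does not describe.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class SolveStatus(Enum):
    # Hypothetical status codes; the paper's actual codes are not listed here.
    OPTIMAL = "optimal"
    INFEASIBLE = "infeasible"
    TIME_LIMIT = "time_limit"
    FAILED = "failed"


@dataclass
class SolveRecord:
    """One row leaving the evaluation boundary (illustrative field names)."""
    account_id: str
    weights: List[float]
    status: SolveStatus
    wall_seconds: float
    memory_class: str                    # coarse bucket, e.g. "<1GiB"
    max_constraint_violation: float      # external feasibility diagnostic
    audit: Dict[str, str] = field(default_factory=dict)

    def is_feasible(self, tol: float = 1e-6) -> bool:
        # Trust the external diagnostic, not only the solver's own status.
        return self.status is SolveStatus.OPTIMAL and self.max_constraint_violation <= tol


# Usage with made-up values.
rec = SolveRecord("acct-001", [0.6, 0.4], SolveStatus.OPTIMAL, 0.21, "<1GiB", 2e-9, {"route": "GPU"})
```

Keeping the feasibility check outside the solver's own status mirrors the "external feasibility diagnostics" named in the abstract.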

If this is right

  • Institutions can process hundreds of accounts with full constraint sets inside fixed operating windows without missed deadlines.
  • The GPU route supplies an 8.8x additional speedup over the CPU route at the largest tested scale of N=384,800.
  • Every solve produces a complete audit record of feasibility, timing, memory, and failure status.
  • Speedups of 3.4x to 126.7x over the best completing incumbent are achieved at certified-equal objective values on tax-motivated transition penalties and restriction caps.
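"Certified-equal objectives" is not defined in this summary. One plausible reading, sketched below as an assumption, is that both solvers return externally verified feasible points and their objectives agree within a relative tolerance. The scaling rule and the `1e-6` default are illustrative choices, not the paper's.

```python
def relative_gap(f_engine: float, f_ref: float) -> float:
    # Scale by max(1, |f_ref|) so near-zero objectives do not inflate the ratio.
    return abs(f_engine - f_ref) / max(1.0, abs(f_ref))


def certified_equal(f_engine: float, f_ref: float,
                    engine_feasible: bool, ref_feasible: bool,
                    tol: float = 1e-6) -> bool:
    # Objectives are only comparable when both points passed external feasibility checks.
    if not (engine_feasible and ref_feasible):
        return False
    return relative_gap(f_engine, f_ref) <= tol
```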

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The design could support more frequent rebalancing cycles without lengthening the operating window.
  • The same engine structure might transfer to other finance workloads that combine batch quadratic programs with hard deadlines, such as intraday risk hedging.
  • At still larger account counts the memory-class and parallel scaling behavior reported for N=384,800 would determine the practical ceiling.

Load-bearing premise

The reference solvers including OSQP represent the relevant state-of-the-art baselines and the chosen problem instances with tax penalties, turnover controls, and restriction caps are representative of real institutional workloads.

What would settle it

Running the same 500-account production queue with the 10,000-instrument universe on identical hardware and observing whether any standard solver such as OSQP completes more than 4 accounts inside the 25-minute window.
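The test above reduces to counting how many accounts finish before the window closes. The sketch assumes a single worker that solves accounts back to back, which may not match the paper's queue setup (it could run solves in parallel); the per-account times in the usage line are hypothetical.

```python
from itertools import accumulate
from typing import Iterable


def completions_within_window(solve_seconds: Iterable[float], window_seconds: float) -> int:
    """Count accounts finished by the deadline when solved one after another."""
    done = 0
    for finish in accumulate(solve_seconds):   # cumulative finish time of each account
        if finish > window_seconds:
            break
        done += 1
    return done


# Hypothetical: 400 s per account leaves room for only 3 accounts in 1500 s.
print(completions_within_window([400.0] * 500, 25 * 60))
```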

Original abstract

Institutional rebalancing is a batched optimization workload with a hard operating deadline: hundreds of accounts need new weights under budget, turnover, exposure, exclusion, and tax-aware controls before trading can proceed. This paper evaluates Asymmetry PRISM, a CPU/GPU portfolio optimization engine, through a public evaluation boundary: problem data go in, and returned weights, status codes, timings, memory class, external feasibility diagnostics, eligible objective comparisons, and audit records come out. Within that boundary, the evaluation protocol fixes hardware and software versions, declares timing lanes, separates cold single calls from repeated workloads, and admits objective-gap claims only where an eligible reference solver completed. On completed multi-solver rows from N=100 to N=2,000, Asymmetry PRISM-CPU is 4.5x to 24.1x faster than the fastest completed reference row in the same lane. In the production queue study, Asymmetry PRISM-GPU completes 500/500 accounts over a 10,000-instrument universe in 109.5 s within a declared 25-minute operating window, with zero missed deadlines and an audit record for every solve; the recorded OSQP queue baseline completes 4/500. On an operationally constrained real-data suite (tax-motivated transition penalties, restriction caps, turnover controls, batches), Asymmetry PRISM clears constrained solves 3.4x to 126.7x faster than the best completing incumbent at certified-equal objectives, and the GPU route widens to 8.8x over the CPU route at N=384,800. Rows without a completed reference are reported as feasibility, timing, memory, and failure-status evidence.
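The abstract's speedup rule (compare against the fastest reference that completed in the same lane, and report no ratio when none did) can be written as a small function. The solver names and timings in the usage line are hypothetical, and the sketch assumes every time in the dictionary was measured in a single lane.

```python
from typing import Dict, Optional


def speedup_vs_fastest_completed(engine_seconds: float,
                                 reference_seconds: Dict[str, Optional[float]]) -> Optional[float]:
    """Fastest completed reference time divided by the engine time.

    reference_seconds maps solver name -> wall time, or None if that solver
    did not complete. Rows with no completed reference return None and are
    reported only as feasibility/timing evidence.
    """
    completed = [t for t in reference_seconds.values() if t is not None]
    if not completed:
        return None
    return min(completed) / engine_seconds


# Hypothetical row: the fastest completed reference takes 2.25 s, the engine 0.5 s.
print(speedup_vs_fastest_completed(0.5, {"osqp": 2.25, "clarabel": None, "mosek": 3.0}))  # 4.5
```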

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper presents Asymmetry PRISM, a CPU/GPU portfolio optimization engine for institutional rebalancing under hard deadlines and constraints including tax penalties, turnover controls, exposure limits, and restrictions. It reports empirical timing and completion results on batched solves for N=100 to 2000 instruments and a production queue of 500 accounts over a 10,000-instrument universe, claiming 4.5x–24.1x speedups versus the fastest completed reference solver (including OSQP) on multi-solver rows, 500/500 completions in 109.5 s (versus 4/500 for the OSQP baseline) within a 25-minute window, and up to 126.7x faster clears at certified-equal objectives, with all results conditioned on a declared public evaluation boundary that admits objective comparisons only where a reference completed.

Significance. If the evaluation protocol, baseline configurations, and instance representativeness hold, the work would demonstrate practical feasibility for deadline-bounded, tax-aware rebalancing at institutional scale on commodity CPU/GPU hardware, with the emphasis on audit records, feasibility diagnostics, and separate reporting of non-completed rows providing a useful template for reproducible systems evaluation in computational finance.

major comments (3)
  1. [Abstract] The headline claims of 4.5x–24.1x speedups and 500/500 vs. 4/500 completions are restricted to “completed multi-solver rows” and “eligible reference solver completed,” yet the manuscript provides no count or characterization of excluded rows, nor any analysis of whether those rows correspond to the operationally hardest instances; this selection criterion is load-bearing for the generalization of the performance advantage.
  2. [Abstract, production queue study] The comparison of Asymmetry PRISM-GPU completing all 500 accounts with the OSQP queue baseline completing only 4/500 lacks any description of the reference solver’s configuration (tolerances, iteration limits, warm-starting), tuning effort, or resource allocation, making it impossible to verify that the baseline represents a fair or state-of-the-art comparator for the tax-penalty/turnover/restriction instances.
  3. [Abstract] The evaluation protocol is described only at the level of “fixes hardware and software versions, declares timing lanes, separates cold single calls,” with no methods section, data-generation procedure, or verification details supplied; this absence directly undermines the soundness of all reported timings, memory classes, and “certified-equal objectives” assertions.
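Major comment 1 asks for the count and character of excluded rows. A minimal sketch of such a summary, assuming each benchmark row can be reduced to its problem size N and a flag for whether any reference solver completed; the row data in the usage line are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def completion_table(rows: List[Tuple[int, bool]]) -> Dict[int, Tuple[int, int]]:
    """Map problem size N -> (rows with a completed reference, rows without one)."""
    table = defaultdict(lambda: [0, 0])
    for n, reference_completed in rows:
        table[n][0 if reference_completed else 1] += 1
    return {n: (c, e) for n, (c, e) in sorted(table.items())}


# Hypothetical rows: (N, did any reference complete?)
print(completion_table([(100, True), (100, True), (2000, True), (2000, False)]))
```

A growing excluded count at larger N would be the selection effect the referee is worried about.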
minor comments (1)
  1. The abstract is information-dense; separating the CPU versus GPU results and the single-call versus repeated-workload lanes into distinct sentences would improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract and evaluation protocol. We address each major comment below and will revise the manuscript accordingly to improve transparency.

Point-by-point responses
  1. Referee: [Abstract] The headline claims of 4.5x–24.1x speedups and 500/500 vs. 4/500 completions are restricted to “completed multi-solver rows” and “eligible reference solver completed,” yet the manuscript provides no count or characterization of excluded rows, nor any analysis of whether those rows correspond to the operationally hardest instances; this selection criterion is load-bearing for the generalization of the performance advantage.

    Authors: The manuscript conditions all speedup and completion claims on completed multi-solver rows and separately reports non-completed rows with feasibility, timing, memory, and failure-status evidence. We agree the abstract would be strengthened by explicit counts and characterization of excluded rows. We will revise the abstract and add a table summarizing the fraction and properties of completed versus non-completed instances to allow readers to evaluate selection effects. revision: yes

  2. Referee: [Abstract, production queue study] The comparison of Asymmetry PRISM-GPU completing all 500 accounts with the OSQP queue baseline completing only 4/500 lacks any description of the reference solver’s configuration (tolerances, iteration limits, warm-starting), tuning effort, or resource allocation, making it impossible to verify that the baseline represents a fair or state-of-the-art comparator for the tax-penalty/turnover/restriction instances.

    Authors: The abstract omits these configuration details. While the evaluation protocol section of the manuscript specifies reference solver settings, we will expand the abstract with a concise description of the OSQP configuration parameters (tolerances, iteration limits, warm-starting), tuning effort, and resource allocation to enable independent verification of the baseline. revision: yes

  3. Referee: [Abstract] The evaluation protocol is described only at the level of “fixes hardware and software versions, declares timing lanes, separates cold single calls,” with no methods section, data-generation procedure, or verification details supplied; this absence directly undermines the soundness of all reported timings, memory classes, and “certified-equal objectives” assertions.

    Authors: The abstract summarizes the protocol at a high level. We agree a dedicated methods description is required. We will add a Methods section detailing the data-generation procedure, verification steps, hardware/software versions, timing lane definitions, and criteria for certified-equal objectives to support the reported timings and claims. revision: yes
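Response 2 promises a description of the OSQP configuration. Below is a sketch of a disclosure check that flags unstated baseline settings. The key names follow the OSQP 0.6.x settings (later releases rename some of them); the required list and the example values are hypothetical, not the paper's configuration.

```python
from typing import Dict, List

# Settings a reader needs to judge the baseline (OSQP 0.6.x names; illustrative list).
REQUIRED_KEYS = ["eps_abs", "eps_rel", "max_iter", "warm_start", "time_limit"]


def missing_disclosures(settings: Dict[str, object]) -> List[str]:
    """Return required baseline settings that the report leaves unstated."""
    return [k for k in REQUIRED_KEYS if k not in settings]


def disclosure_line(solver: str, version: str, settings: Dict[str, object]) -> str:
    """Render a sorted one-line settings summary for a table footnote."""
    parts = ", ".join(f"{k}={settings[k]}" for k in sorted(settings))
    return f"{solver} {version}: {parts}"


# Hypothetical, incomplete disclosure.
partial = {"eps_abs": 1e-5, "max_iter": 10000}
print(disclosure_line("OSQP", "0.6.x", partial))
print(missing_disclosures(partial))
```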

Circularity Check

0 steps flagged

No circularity; purely empirical timing benchmarks with no derivation chain

full rationale

The paper contains no mathematical derivations, first-principles results, fitted parameters, or ansatzes. All claims consist of direct wall-clock timing measurements on fixed problem instances against external reference solvers (OSQP and others). The evaluation protocol explicitly conditions objective-gap claims on completed reference solves and reports non-completed rows separately as feasibility/timing evidence. No self-citation is used to justify any core claim, and no result reduces to its own inputs by construction. This is a standard empirical performance study whose validity rests on the representativeness of the test suite and baselines rather than any definitional or self-referential loop.
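Because the claims rest on wall-clock measurements with cold single calls kept separate from repeated workloads, the protocol implies a timing harness of roughly this shape. The split below (setup plus first call counted as cold, median of later calls as warm) is an assumption about how such lanes could be defined; the paper's actual lane definitions are not given here.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_lanes(make_solver: Callable[[], Callable[[], object]], repeats: int = 5) -> Dict[str, float]:
    """Measure a cold single call separately from repeated (warm) calls."""
    start = time.perf_counter()
    solve = make_solver()          # setup cost (allocation, factorization) lands in the cold lane
    solve()
    cold = time.perf_counter() - start

    warm: List[float] = []
    for _ in range(repeats):       # later calls reuse whatever setup make_solver did
        t0 = time.perf_counter()
        solve()
        warm.append(time.perf_counter() - t0)
    return {"cold_s": cold, "warm_median_s": statistics.median(warm)}


# Usage with a stand-in workload in place of a real solver.
def make_dummy():
    data = list(range(100_000))
    return lambda: sum(data)


print(time_lanes(make_dummy, repeats=3))
```

Reporting the median rather than the mean keeps a single scheduler hiccup from dominating the warm lane.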

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Central claim rests entirely on empirical timing measurements of an unreleased engine against named reference solvers; no free parameters, mathematical axioms, or invented entities are identifiable from the abstract.



Reference graph

Works this paper leans on

38 extracted references · 19 canonical work pages

  1. ACM. Artifact review and badging, version 1.1. ACM Publications Policy, 2020.
  2. Robert Almgren and Neil Chriss. Optimal execution of portfolio transactions. Journal of Risk, 3(2):5–39, 2001. doi: 10.21314/JOR.2001.041
  3. Antoine Bambade, Fabian Schramm, Sarah El-Kazdadi, Stéphane Caron, Adrien B. Taylor, and Justin Carpentier. ProxQP: An efficient and versatile quadratic programming solver for real-time robotics applications and beyond. IEEE Transactions on Robotics, 2025. doi: 10.1109/TRO.2025.3577107
  4. Thomas Bartz-Beielstein, Carola Doerr, Daan van den Berg, Jakob Bossek, Sowmya Chandrasekaran, Tome Eftimov, Andreas Fischbach, Pascal Kerschke, William La Cava, Manuel López-Ibáñez, Katherine M. Malan, Jason H. Moore, Boris Naujoks, Patryk Orzechowski, Vanessa Volz, Markus Wagner, and Thomas Weise. Benchmarking in optimization: Best practice and open issues. arXiv p...
  5. Fischer Black and Robert Litterman. Global portfolio optimization. Financial Analysts Journal, 48(5):28–43, 1992. doi: 10.2469/faj.v48.n5.28
  6. Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004. doi: 10.1017/CBO9780511804441
  7. Shomesh E. Chaudhuri, Terence C. Burnham, and Andrew W. Lo. An empirical evaluation of tax-loss-harvesting alpha. Financial Analysts Journal, 76(3):99–108, 2020. doi: 10.1080/0015198X.2020.1760064
  8. George M. Constantinides. Capital market equilibrium with personal tax. Econometrica, 51(3):611–636, 1983. doi: 10.2307/1912150
  9. Steven Diamond and Stephen Boyd. CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research, 17(83):1–5, 2016.
  10. Elizabeth D. Dolan and Jorge J. Moré. Benchmarking optimization software with performance profiles. Mathematical Programming, 91(2):201–213, 2002. doi: 10.1007/s101070100263
  11. European Commission. Commission delegated regulation (EU) 2017/589: Regulatory technical standards specifying the organisational requirements of investment firms engaged in algorithmic trading. Official Journal of the European Union, L 87, 2017. MiFID II RTS 6.
  12. Eugene F. Fama and Kenneth R. French. Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33(1):3–56, 1993. doi: 10.1016/0304-405X(93)90023-5
  13. Qingliang Fan, Marcelo C. Medeiros, Hanming Yang, and Songshan Yang. Cost-aware portfolios in a large universe of assets. arXiv preprint arXiv:2412.11575, 2025. URL https://arxiv.org/abs/2412.11575
  14. Paul J. Goulart and Yuwen Chen. Clarabel: An interior-point solver for conic programs with quadratic objectives. arXiv preprint arXiv:2405.12762, 2024.
  15. Campbell R. Harvey, Michele G. Mazzoleni, and Alessandro Melone. The unintended consequences of rebalancing. Working Paper 33554, National Bureau of Economic Research, 2025.
  16. Qi Huangfu and J. A. Julian Hall. Parallelizing the dual revised simplex method. Mathematical Programming Computation, 10(1):119–142, 2018. doi: 10.1007/s12532-017-0130-5
  17. IOSCO Technical Committee. Principles for direct electronic access to markets: Final report. International Organization of Securities Commissions, FR08/10, 2010.
  18. Yilun Jiang, Haihao Lu, Zedong Peng, and Jinwen Yang. FlashFolio: A GPU-accelerated solver for portfolio optimization. arXiv preprint arXiv:2604.22625, 2026.
  19. Kevin Khang, Alan Cummings, Thomas Paradise, and Brennan O’Connor. Personalized indexing: A portfolio construction plan. Vanguard Research, 2022.
  20. Olivier Ledoit and Michael Wolf. A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2):365–411, 2004. doi: 10.1016/S0047-259X(03)00096-4
  21. Olivier Ledoit and Michael Wolf. Nonlinear shrinkage of the covariance matrix for portfolio selection: Markowitz meets goldilocks. Review of Financial Studies, 30(12):4349–4388, 2017. doi: 10.1093/rfs/hhx052
  22. Miguel Sousa Lobo, Lieven Vandenberghe, Stephen Boyd, and Hervé Lebret. Applications of second-order cone programming. Linear Algebra and its Applications, 284:193–228, 1998. doi: 10.1016/S0024-3795(98)10032-0
  23. Harry Markowitz. Portfolio selection. Journal of Finance, 7(1):77–91, 1952. doi: 10.2307/2975974
  24. Richard O. Michaud. The Markowitz optimization enigma: Is ‘optimized’ optimal? Financial Analysts Journal, 45(1):31–42, 1989. doi: 10.2469/faj.v45.n1.31
  25. Nicholas Moehle, Mykel J. Kochenderfer, Stephen Boyd, and Andrew Ang. Tax-aware portfolio construction via convex optimization. Journal of Optimization Theory and Applications, 189:364–383, 2021. doi: 10.1007/s10957-021-01823-0
  26. Nicholas Moehle, Jacob Gindi, Stephen Boyd, and Mykel J. Kochenderfer. Portfolio construction as linearly constrained separable optimization. Optimization and Engineering, 24:1667–1687, 2023. doi: 10.2139/ssrn.3800965
  27. MOSEK ApS. MOSEK Optimizer API Manual. MOSEK ApS, 2026. Version 11.1.
  28. Nasdaq. Nasdaq closing cross: Frequently asked questions. Nasdaq Trader market-system documentation, https://www.nasdaqtrader.com/content/productsservices/Trading/ClosingCrossfaq.pdf, 2024.
  29. Yi-Shuai Niu and Yajuan Wang. Scalable mean-variance portfolio optimization via subspace embeddings and GPU-friendly Nesterov-accelerated projected gradient. arXiv preprint arXiv:2604.02917, 2026. URL https://arxiv.org/abs/2604.02917
  30. NVIDIA Corporation. NVIDIA cuOpt documentation. https://docs.nvidia.com/cuopt/, 2026. Version 26.2.
  31. Brendan O’Donoghue, Eric Chu, Neal Parikh, and Stephen Boyd. Conic optimization via operator splitting and homogeneous self-dual embedding. Journal of Optimization Theory and Applications, 169(3):1042–1068, 2016. doi: 10.1007/s10957-016-0892-3
  32. André F. Perold. The implementation shortfall: Paper versus reality. Journal of Portfolio Management, 14(3):4–9, 1988. doi: 10.3905/jpm.1988.409150
  33. Michel Schubiger, Goran Banjac, and John Lygeros. GPU acceleration of ADMM for large-scale quadratic programming. Journal of Parallel and Distributed Computing, 144:55–67, 2020. doi: 10.1016/j.jpdc.2020.05.021
  34. William F. Sharpe. Capital asset prices: A theory of market equilibrium under conditions of risk. Journal of Finance, 19(3):425–442, 1964. doi: 10.1111/j.1540-6261.1964.tb02865.x
  35. David M. Stein, Hemambara Vadlamudi, and Paul Bouchey. Enhancing active tax management through the realization of capital gains. Journal of Wealth Management, 10(4):9–16, 2008.
  36. Bartolomeo Stellato, Goran Banjac, Paul Goulart, Alberto Bemporad, and Stephen Boyd. OSQP: An operator splitting solver for quadratic programs. Mathematical Programming Computation, 12(4):637–672, 2020. doi: 10.1007/s12532-020-00179-2
  37. U.S. Securities and Exchange Commission. Risk management controls for brokers or dealers with market access. 17 CFR 240.15c3-5; Exchange Act Release No. 34-63241, 2010.
  38. Shlomo Zilberstein. Using anytime algorithms in intelligent systems. AI Magazine, 17(3):73–83, 1996. doi: 10.1609/aimag.v17i3.1232

PRISM: deadline-bounded portfolio optimization · Ghosh, 2026