arxiv: 2604.20109 · v1 · submitted 2026-04-22 · 💻 cs.LG · cs.AI· math.OC

Recognition: unknown

Learning to Solve the Quadratic Assignment Problem with Warm-Started MCMC Finetuning

Yicheng Pan , Ruisong Zhou , Haijun Zou , Tianyou Li , Zaiwen Wen

Authors on Pith no claims yet

Pith reviewed 2026-05-10 01:29 UTC · model grok-4.3

classification 💻 cs.LG cs.AImath.OC

keywords quadratic assignment problempermutation learningMCMC finetuningenergy-based modelcombinatorial optimizationcross-graph attentionQAPLIB

0 comments

The pith

PLMA combines a neural permutation model with warm-started MCMC finetuning to reach near-zero optimality gaps on QAP instances.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PLMA as a permutation learning framework that trains a neural network to initialize solutions and then applies short Markov chains for deployment-time adaptation. The chains start from the model's predictions and use an additive energy-based model to enable fast sampling over permutations. If the approach holds, learning-based methods could deliver consistent high performance on structurally varied real-world QAP problems where both classical heuristics and prior neural solvers fall short. The design includes a cross-graph attention layer to capture facility-location interactions and an O(1) 2-swap Metropolis-Hastings step for rapid exploration. Experiments show this yields superior results on standard libraries and hard instances.

Core claim

PLMA features an efficient warm-started MCMC finetuning procedure to enhance deployment-time performance, leveraging short Markov chains to anchor the adaptation to the promising regions previously explored. For rapid exploration via MCMC over the permutation space, an additive energy-based model (EBM) enables an O(1)-time 2-swap Metropolis-Hastings sampling step. The neural network parameterizing the EBM incorporates a scalable and flexible cross-graph attention mechanism to model interactions between facilities and locations in the QAP.

What carries the argument

The additive energy-based model with O(1)-time 2-swap Metropolis-Hastings sampling, warm-started from neural initial predictions and refined by short Markov chains.

If this is right

PLMA consistently outperforms state-of-the-art baselines across various benchmarks.
PLMA achieves a near-zero average optimality gap on QAPLIB.
PLMA exhibits remarkably superior robustness on the notoriously difficult Taixxeyy instances.
PLMA serves as an effective QAP solver in bandwidth minimization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same warm-start plus short-chain pattern could be tested on other permutation optimization tasks such as the traveling salesman problem or graph matching.
The cross-graph attention component may generalize to model interactions in other bipartite assignment settings beyond QAP.
Combining the learned initializer with longer or adaptive chain lengths could reveal whether the short-chain regime is sufficient for still-larger instances.

Load-bearing premise

Short Markov chains started from the neural model's initial predictions reliably anchor adaptation to promising regions for structurally diverse real-world QAP instances without getting trapped in poor local optima.

What would settle it

Running PLMA on a fresh collection of QAP instances drawn from the same distribution as Taixxeyy where the average optimality gap exceeds that of the strongest baseline or where solution quality plateaus in clearly suboptimal regions.

Figures

Figures reproduced from arXiv: 2604.20109 by Haijun Zou, Ruisong Zhou, Tianyou Li, Yicheng Pan, Zaiwen Wen.

**Figure 1.** Figure 1: Gap distributions on representative Taixxeyy instances over 10 independent runs, comparing PLMA against Ro-TS and [PITH_FULL_IMAGE:figures/full_fig_p009_1.png] view at source ↗

**Figure 2.** Figure 2: Performance comparison between finetuning from a pretrained model and finetuning from a randomly initialized model. [PITH_FULL_IMAGE:figures/full_fig_p010_2.png] view at source ↗

**Figure 3.** Figure 3: Ablation and hyperparameter study. Panel (a) summarizes the Taixxeyy ablation results using the mean gap and the [PITH_FULL_IMAGE:figures/full_fig_p010_3.png] view at source ↗

read the original abstract

The quadratic assignment problem (QAP) is a fundamental NP-hard task that poses significant challenges for both traditional heuristics and modern learning-based solvers. Existing QAP solvers still struggle to achieve consistently competitive performance across structurally diverse real-world instances. To bridge this performance gap, we propose PLMA, an innovative permutation learning framework. PLMA features an efficient warm-started MCMC finetuning procedure to enhance deployment-time performance, leveraging short Markov chains to anchor the adaptation to the promising regions previously explored. For rapid exploration via MCMC over the permutation space, we design an additive energy-based model (EBM) that enables an $O(1)$-time 2-swap Metropolis-Hastings sampling step. Moreover, the neural network used to parameterize the EBM incorporates a scalable and flexible cross-graph attention mechanism to model interactions between facilities and locations in the QAP. Extensive experiments demonstrate that PLMA consistently outperforms state-of-the-art baselines across various benchmarks. In particular, PLMA achieves a near-zero average optimality gap on QAPLIB, exhibits remarkably superior robustness on the notoriously difficult Taixxeyy instances, and also serves as an effective QAP solver in bandwidth minimization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

PLMA pairs cross-graph attention initialization with an additive EBM for cheap 2-swap MCMC finetuning, but the reported near-zero gaps rest on experiments that omit error bars, ablations, and instance details.

read the letter

The core idea is straightforward: train a neural model with cross-graph attention to output an initial permutation for QAP, then run short MCMC chains from that point using an additive energy-based model that makes each 2-swap Metropolis-Hastings step O(1). The abstract positions this warm-started finetuning as the key to better deployment performance on diverse instances. That combination of attention for facility-location modeling and the additive EBM for fast sampling looks new relative to the QAP solvers cited. The reported results on QAPLIB (near-zero average gap) and especially the robustness on Taixxeyy instances are the strongest part of the pitch, and the bandwidth-minimization application shows the method is not limited to one benchmark family. Those empirical claims give the work a practical flavor that could interest people working on hybrid neural-heuristic solvers. The main soft spots are the missing pieces in the experimental section. No error bars appear in the abstract summary, there is no mention of data splits or how the Taixxeyy instances were selected, and there are no ablations that isolate the contribution of the MCMC finetuning versus the neural initialization alone. The stress-test worry about short chains failing to escape poor local optima on rugged QAP landscapes is reasonable given how the method is described; without chain-length statistics or mixing diagnostics, it is hard to know whether the adaptation step is doing real work or whether the gains are mostly from the attention model. If the full paper supplies those checks and shows consistent improvement from the finetuning, the central claim holds up better. This paper is aimed at researchers in combinatorial optimization who already follow learning-based approaches to assignment problems. It has enough concrete machinery and benchmark numbers to deserve a serious referee, even though the current write-up would need revisions for transparency on the MCMC component and statistical reporting. I would send it out for review rather than desk-reject.

Referee Report

3 major / 2 minor

Summary. The paper introduces PLMA, a permutation learning framework for the Quadratic Assignment Problem (QAP) that uses a cross-graph attention neural network to parameterize an additive energy-based model (EBM), enabling an efficient O(1)-time 2-swap Metropolis-Hastings step in a warm-started MCMC finetuning procedure. Extensive experiments are reported showing that PLMA outperforms state-of-the-art baselines, achieving near-zero average optimality gaps on QAPLIB, superior robustness on difficult Taixxeyy instances, and good performance in bandwidth minimization applications.

Significance. If the empirical results hold with stronger statistical support, this would advance hybrid neural-MCMC solvers for NP-hard combinatorial optimization by showing how short Markov chains can refine neural predictions for QAP. The O(1) 2-swap MH step enabled by the additive EBM is a clear technical strength for efficiency, and the cross-graph attention mechanism provides a scalable way to model facility-location interactions. These elements could influence similar approaches in other permutation problems if the robustness claims are substantiated.

major comments (3)

[§5] §5 (Experimental Results): The central claims of near-zero average optimality gap on QAPLIB and superior robustness on Taixxeyy instances are presented without error bars, standard deviations across multiple runs, or details on instance selection and data splits. This directly undermines assessment of whether the outperformance is statistically reliable and reproducible.
[§4 and §5] §4 (Method) and §5: No ablation is reported on the contribution of the warm-started MCMC finetuning versus the neural network's initial predictions alone, nor on the warm-start chain length. This is load-bearing for the headline claim that short chains reliably anchor adaptation to promising regions on rugged QAP landscapes like Taixxeyy, as the gains could otherwise be attributable to the NN predictor.
[§3.2] §3.2 (Additive EBM): The additivity assumption that enables the O(1) 2-swap MH step is not ablated against non-additive alternatives. Without this, it remains unclear whether the efficiency and performance benefits stem from the proposed EBM structure or could be achieved with standard sampling.

minor comments (2)

[§3.1] The description of the cross-graph attention mechanism in §3.1 would benefit from an explicit equation showing how facility and location embeddings are combined before the energy computation.
[Tables and Figures] Table captions and axis labels in the experimental figures should include the exact number of instances and runs to improve clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our work. The comments highlight important aspects for improving the clarity and rigor of our experimental claims. We address each major comment point-by-point below and will incorporate revisions to strengthen the manuscript.

read point-by-point responses

Referee: [§5] §5 (Experimental Results): The central claims of near-zero average optimality gap on QAPLIB and superior robustness on Taixxeyy instances are presented without error bars, standard deviations across multiple runs, or details on instance selection and data splits. This directly undermines assessment of whether the outperformance is statistically reliable and reproducible.

Authors: We agree that the current presentation lacks sufficient statistical detail to fully assess reproducibility and reliability. In the revised manuscript, we will add error bars and report standard deviations computed over multiple independent runs with varied random seeds. We will also expand the experimental section to include explicit details on instance selection from QAPLIB and Taixxeyy, along with any data preprocessing or split procedures used. revision: yes
Referee: [§4 and §5] §4 (Method) and §5: No ablation is reported on the contribution of the warm-started MCMC finetuning versus the neural network's initial predictions alone, nor on the warm-start chain length. This is load-bearing for the headline claim that short chains reliably anchor adaptation to promising regions on rugged QAP landscapes like Taixxeyy, as the gains could otherwise be attributable to the NN predictor.

Authors: We concur that isolating the contribution of the warm-started MCMC finetuning is essential to substantiate our claims about short chains anchoring adaptation on difficult landscapes. We will add ablation experiments in the revision that compare the neural network's standalone predictions against the full PLMA with MCMC finetuning, and systematically vary the warm-start chain length to quantify its impact, with particular focus on the Taixxeyy instances. revision: yes
Referee: [§3.2] §3.2 (Additive EBM): The additivity assumption that enables the O(1) 2-swap MH step is not ablated against non-additive alternatives. Without this, it remains unclear whether the efficiency and performance benefits stem from the proposed EBM structure or could be achieved with standard sampling.

Authors: We will include an ablation study in the revised version comparing the additive EBM to a non-additive alternative (e.g., a denser neural parameterization of the energy). This will help demonstrate the specific benefits of additivity. We note that any non-additive model would necessarily forgo the O(1) 2-swap MH step that is a core efficiency contribution of our framework, and we will discuss this design trade-off explicitly in the revision. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper is an empirical ML framework for QAP that combines a cross-graph attention neural model with a warm-started MCMC finetuning step using an additive EBM. All performance claims (near-zero optimality gap on QAPLIB, robustness on Taixxeyy, etc.) are established via direct benchmark comparisons against baselines rather than any mathematical derivation chain. No equations reduce a claimed prediction to a fitted parameter by construction, no self-citation is load-bearing for a uniqueness or ansatz result, and the method does not rename known patterns as new derivations. The central results therefore remain independent of the inputs they are evaluated against.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are stated beyond the standard neural-network training assumptions implicit in any learning-based solver.

pith-pipeline@v0.9.0 · 5522 in / 1222 out tokens · 34245 ms · 2026-05-10T01:29:35.780671+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

42 extracted references · 2 canonical work pages

[1]

Hospital layout as a quadratic assignment problem,

A. N. Elshafei, “Hospital layout as a quadratic assignment problem,” Journal of the Operational Research Society, vol. 28, no. 1, pp. 167– 179, 1977

1977
[2]

Learning graph matching,

T. S. Caetano, J. J. McAuley, L. Cheng, Q. V . Le, and A. J. Smola, “Learning graph matching,”IEEE transactions on pattern analysis and machine intelligence, vol. 31, no. 6, pp. 1048–1058, 2009

2009
[3]

Quadratic assignment problems,

R. E. Burkard, “Quadratic assignment problems,”European Journal of Operational Research, vol. 15, no. 3, pp. 283–289, 1984

1984
[4]

Cela,The quadratic assignment problem: theory and algorithms

E. Cela,The quadratic assignment problem: theory and algorithms. Springer Science & Business Media, 2013, vol. 1

2013
[5]

P-complete approximation problems,

S. Sahni and T. Gonzalez, “P-complete approximation problems,”Jour- nal of the ACM (JACM), vol. 23, no. 3, pp. 555–565, 1976

1976
[6]

Lower bounds for the quadratic assignment problem,

Y . Li, P. M. Pardalos, K. Ramakrishnan, and M. G. Resende, “Lower bounds for the quadratic assignment problem,”Annals of Operations Research, vol. 50, no. 1, pp. 387–410, 1994

1994
[7]

A thermodynamically motivated simulation procedure for combinatorial optimization problems,

R. E. Burkard and F. Rendl, “A thermodynamically motivated simulation procedure for combinatorial optimization problems,”European Journal of Operational Research, vol. 17, no. 2, pp. 169–174, 1984

1984
[8]

An improved annealing scheme for the qap,

D. T. Connolly, “An improved annealing scheme for the qap,”European Journal of Operational Research, vol. 46, no. 1, pp. 93–100, 1990

1990
[9]

Robust taboo search for the quadratic assignment problem,

´E. Taillard, “Robust taboo search for the quadratic assignment problem,” Parallel computing, vol. 17, no. 4-5, pp. 443–455, 1991

1991
[10]

Comparison of iterative searches for the quadratic assignment problem,

E. D. Taillard, “Comparison of iterative searches for the quadratic assignment problem,”Location science, vol. 3, no. 2, pp. 87–105, 1995

1995
[11]

Fitness landscape analysis and memetic algorithms for the quadratic assignment problem,

P. Merz and B. Freisleben, “Fitness landscape analysis and memetic algorithms for the quadratic assignment problem,”IEEE transactions on evolutionary computation, vol. 4, no. 4, pp. 337–352, 2002

2002
[12]

Memetic search for the quadratic assignment problem,

U. Benlic and J.-K. Hao, “Memetic search for the quadratic assignment problem,”Expert Systems with Applications, vol. 42, no. 1, pp. 584–595, 2015

2015
[13]

Attention, learn to solve routing problems!

W. Kool, H. Van Hoof, and M. Welling, “Attention, learn to solve routing problems!” inInternational Conference on Learning Representations, 2018

2018
[14]

POMO: Policy optimization with multiple optima for reinforcement learning,

Y .-D. Kwon, B. K. Jinho Choo, Y . G. Iljoo Yoon, and S. Min, “POMO: Policy optimization with multiple optima for reinforcement learning,” in Advances in Neural Information Processing Systems, 2020

2020
[15]

Learning improvement heuristics for solving routing problems,

Y . Wu, W. Song, Z. Cao, J. Zhang, and A. Lim, “Learning improvement heuristics for solving routing problems,”IEEE transactions on neural networks and learning systems, vol. 33, no. 9, pp. 5057–5069, 2021

2021
[16]

LMask: Learn to solve constrained routing problems with lazy masking,

T. Li, H. Zou, J. Wu, and Z. Wen, “LMask: Learn to solve constrained routing problems with lazy masking,” inThe Fourteenth International Conference on Learning Representations, 2026

2026
[17]

Flexible job-shop scheduling via graph neural network and deep reinforcement learning,

W. Song, X. Chen, Q. Li, and Z. Cao, “Flexible job-shop scheduling via graph neural network and deep reinforcement learning,”IEEE Transactions on Industrial Informatics, vol. 19, no. 2, pp. 1600–1610, 2022

2022
[18]

Neural DAG scheduling via one-shot priority sampling,

W. Jeon, M. Gagrani, B. Bartan, W. W. Zeng, H. Teague, P. Zappi, and C. Lott, “Neural DAG scheduling via one-shot priority sampling,” inThe Eleventh International Conference on Learning Representations, 2023

2023
[19]

A learning method with gap-aware generation for heterogeneous dag scheduling,

R. Zhou, H. Zou, L. Zhou, C. Sun, and Z. Wen, “A learning method with gap-aware generation for heterogeneous dag scheduling,”arXiv preprint arXiv:2603.23249, 2026

work page arXiv 2026
[20]

Neural graph matching network: Learn- ing lawler’s quadratic assignment problem with extension to hypergraph and multiple-graph matching,

R. Wang, J. Yan, and X. Yang, “Neural graph matching network: Learn- ing lawler’s quadratic assignment problem with extension to hypergraph and multiple-graph matching,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 9, pp. 5261–5279, 2021

2021
[21]

Revocable deep reinforcement learning with affinity regularization for outlier-robust graph matching,

C. Liu, Z. Jiang, R. Wang, L. Huang, P. Lu, and J. Yan, “Revocable deep reinforcement learning with affinity regularization for outlier-robust graph matching,” inThe Eleventh International Conference on Learning Representations, 2023

2023
[22]

Learning solution-aware transformers for efficiently solving quadratic assignment problem,

Z. Tan and Y . Mu, “Learning solution-aware transformers for efficiently solving quadratic assignment problem,” inForty-first International Con- ference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27, 2024, 2024

2024
[23]

QAPLIB–a quadratic assignment problem library,

R. E. Burkard, S. E. Karisch, and F. Rendl, “QAPLIB–a quadratic assignment problem library,”Journal of Global optimization, vol. 10, no. 4, pp. 391–403, 1997

1997
[24]

Recent advances for the quadratic assignment problem with special emphasis on instances that are difficult for meta-heuristic methods,

Z. Drezner, P. M. Hahn, and ´E. D. Taillard, “Recent advances for the quadratic assignment problem with special emphasis on instances that are difficult for meta-heuristic methods,”Annals of Operations research, vol. 139, no. 1, pp. 65–94, 2005

2005
[25]

Revised note on learning quadratic assignment with graph neural networks,

A. Nowak, S. Villar, A. S. Bandeira, and J. Bruna, “Revised note on learning quadratic assignment with graph neural networks,” in2018 IEEE Data Science Workshop (DSW). IEEE, 2018, pp. 1–5

2018
[26]

Learning combinatorial embedding networks for deep graph matching,

R. Wang, J. Yan, and X. Yang, “Learning combinatorial embedding networks for deep graph matching,” inProceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 3056–3065

2019
[27]

Neural Combinatorial Optimization with Reinforcement Learning

I. Bello, H. Pham, Q. V . Le, M. Norouzi, and S. Bengio, “Neural combinatorial optimization with reinforcement learning,”arXiv preprint arXiv:1611.09940, 2016

work page Pith review arXiv 2016
[28]

Learning feature embedding refiner for solving vehicle routing prob- lems,

J. Li, Y . Ma, Z. Cao, Y . Wu, W. Song, J. Zhang, and Y . M. Chee, “Learning feature embedding refiner for solving vehicle routing prob- lems,”IEEE Transactions on Neural Networks and Learning Systems, 2023

2023
[29]

Winner takes it all: Training performant RL populations for combinatorial optimization,

N. Grinsztajn, D. Furelos-Blanco, S. Surana, C. Bonnet, and T. D. Barrett, “Winner takes it all: Training performant RL populations for combinatorial optimization,” inThirty-seventh Conference on Neural Information Processing Systems, 2023

2023
[30]

PolyNet: Learning diverse solution strategies for neural combinatorial optimization,

A. Hottung, M. Mahajan, and K. Tierney, “PolyNet: Learning diverse solution strategies for neural combinatorial optimization,” inThe Thir- teenth International Conference on Learning Representations, 2025

2025
[31]

Efficient active search for combinatorial optimization problems,

A. Hottung, Y .-D. Kwon, and K. Tierney, “Efficient active search for combinatorial optimization problems,” inInternational Conference on Learning Representations, 2022

2022
[32]

DIMES: A differentiable meta solver for combinatorial optimization problems,

R. Qiu, Z. Sun, and Y . Yang, “DIMES: A differentiable meta solver for combinatorial optimization problems,” inAdvances in Neural Informa- tion Processing Systems, 2022

2022
[33]

A Monte Carlo policy gra- dient method with local search for binary optimization,

C. Chen, R. Chen, T. Li, R. Ao, and Z. Wen, “A Monte Carlo policy gra- dient method with local search for binary optimization,”Mathematical Programming, pp. 1–57, 2025

2025
[34]

Revisiting sampling for combinatorial optimization,

H. Sun, K. Goshvadi, A. Nova, D. Schuurmans, and H. Dai, “Revisiting sampling for combinatorial optimization,” inICML, 2023

2023
[35]

Regularized langevin dynamics for combinatorial optimization,

S. Feng and Y . Yang, “Regularized langevin dynamics for combinatorial optimization,” inForty-second International Conference on Machine Learning, 2025

2025
[36]

Kissing to find a match: Efficient low-rank permutation representation,

H. Dr ¨oge, Z. L ¨ahner, Y . Bahat, O. M. Nadal, F. Heide, and M. Moeller, “Kissing to find a match: Efficient low-rank permutation representation,” inAdvances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023, 2023

2023
[37]

An integer projected fixed point method for graph matching and map inference,

M. Leordeanu, M. Hebert, and R. Sukthankar, “An integer projected fixed point method for graph matching and map inference,”Advances in neural information processing systems, vol. 22, 2009

2009
[38]

A spectral technique for correspondence problems using pairwise constraints,

M. Leordeanu and M. Hebert, “A spectral technique for correspondence problems using pairwise constraints,” in10th IEEE International Con- ference on Computer Vision (ICCV 2005), 17-20 October 2005, Beijing, China. IEEE Computer Society, 2005, pp. 1482–1489

2005
[39]

Reweighted random walks for graph matching,

M. Cho, J. Lee, and K. M. Lee, “Reweighted random walks for graph matching,” inComputer Vision - ECCV 2010 - 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5- 11, 2010, Proceedings, Part V, ser. Lecture Notes in Computer Science, K. Daniilidis, P. Maragos, and N. Paragios, Eds., vol. 6315. Springer, 2010, pp. 492–505

2010
[40]

George and J

A. George and J. W. Liu,Computer solution of large sparse positive definite. Prentice Hall Professional Technical Reference, 1981

1981
[41]

On bounding the bandwidth of graphs with symmetry,

E. R. van Dam and R. Sotirov, “On bounding the bandwidth of graphs with symmetry,”INFORMS Journal on Computing, vol. 27, no. 1, pp. 75–88, 2015

2015
[42]

The university of florida sparse matrix collection,

T. A. Davis and Y . Hu, “The university of florida sparse matrix collection,”Acm transactions on mathematical software (toms), vol. 38, no. 1, pp. 1–25, 2011. 13 APPENDIXA SUPPLEMENTARYTHEORETICALRESULTS A.1 Proof of Theorem 2 Theorem 2.Letπ 1, ..., πN be i.i.d. samples fromp θ in (4). Then for any functiong: Π n →R, the gradient of the expectation Eπ∼pθ ...

2011