Recognition: 2 theorem links
Equivariant Multi-agent Reinforcement Learning for Multimodal Vehicle-to-Infrastructure Systems
Pith reviewed 2026-05-10 18:11 UTC · model grok-4.3
The pith
By aligning multimodal features through self-supervised learning and training equivariant GNN policies with MARL, roadside units can maximize rates in V2I systems while exploiting the rotational symmetries of vehicle locations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors report that a self-supervised multimodal sensing framework extracts vehicle positions by aligning latent features, and that these positions feed an equivariant GNN-based MARL policy that coordinates agents through message passing and a signaling scheme. In simulations that use ray tracing for wireless data and computer graphics for visual data, the approach delivers more than twofold accuracy gains over baselines in position estimation and more than 50 percent performance gains over standard MARL in rate maximization.
What carries the argument
An equivariant policy network built from a graph neural network (GNN) with message-passing layers, which takes as input vehicle positions extracted by self-supervised alignment of multimodal latent features.
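The idea of a rotation-equivariant message-passing layer can be illustrated with a minimal NumPy sketch in the style of E(n)-equivariant GNNs (Satorras et al., 2021). This is a generic, swapped-in layer, not the paper's architecture: the function names, the two-layer weight structure, and the fully connected toy graph are all assumptions. Messages are built only from rotation-invariant quantities (node features and squared distances), and positions are updated along relative vectors. As a result, rotating the input positions rotates the output positions and leaves the features unchanged.

```python
import numpy as np

def init_params(rng, d_in, d_hidden, d_out):
    """Hypothetical two-layer MLP weights (biases start at zero)."""
    return (rng.normal(size=(d_in, d_hidden)) * 0.5, np.zeros(d_hidden),
            rng.normal(size=(d_hidden, d_out)) * 0.5, np.zeros(d_out))

def mlp(params, z):
    """Two-layer perceptron with tanh hidden units, applied on the last axis."""
    w1, b1, w2, b2 = params
    return np.tanh(z @ w1 + b1) @ w2 + b2

def egnn_layer(h, x, adj, phi_e, phi_x, phi_h):
    """One E(2)-equivariant message-passing step (EGNN-style sketch).

    h:   (N, d) rotation-invariant node features
    x:   (N, 2) node positions, e.g. sensed vehicle coordinates
    adj: (N, N) 0/1 adjacency matrix without self-loops
    """
    n, d = h.shape
    diff = x[:, None, :] - x[None, :, :]               # (N, N, 2) relative vectors
    dist2 = np.sum(diff ** 2, axis=-1, keepdims=True)  # (N, N, 1) rotation-invariant
    pair = np.concatenate([np.broadcast_to(h[:, None, :], (n, n, d)),
                           np.broadcast_to(h[None, :, :], (n, n, d)),
                           dist2], axis=-1)
    m = mlp(phi_e, pair) * adj[..., None]              # invariant messages on edges only
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)
    # Positions move along relative vectors scaled by invariant weights,
    # so the position update rotates together with the input.
    x_new = x + np.sum(diff * mlp(phi_x, m) * adj[..., None], axis=1) / deg
    h_new = h + mlp(phi_h, np.concatenate([h, m.sum(axis=1)], axis=-1))
    return h_new, x_new

# Toy check: 5 nodes on a fully connected graph.
rng = np.random.default_rng(0)
N, d, dm = 5, 4, 8
h = rng.normal(size=(N, d))
x = rng.normal(size=(N, 2))
adj = 1.0 - np.eye(N)
phi_e = init_params(rng, 2 * d + 1, 16, dm)
phi_x = init_params(rng, dm, 16, 1)
phi_h = init_params(rng, d + dm, 16, d)
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
h1, x1 = egnn_layer(h, x, adj, phi_e, phi_x, phi_h)
h2, x2 = egnn_layer(h, x @ R.T, adj, phi_e, phi_x, phi_h)
print(np.allclose(h1, h2), np.allclose(x1 @ R.T, x2))  # expected: True True
```

Building messages from distances rather than raw coordinates is what makes the layer equivariant by construction, so no data augmentation over rotations is needed.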
If this is right
- The self-supervised multimodal sensing generalizes and produces more than twofold accuracy gains in vehicle position extraction compared with baselines.
- Equivariant MARL training produces more than 50 percent performance gains in decentralized rate maximization over non-equivariant approaches.
- Each agent computes its policy locally while a signaling scheme overcomes partial observability and maintains global policy equivariance.
- Rotation symmetries of vehicle locations are incorporated directly into the policy structure via the GNN architecture.
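How a signaling scheme lets each agent act on more than its local view can be sketched as K rounds of neighbor-to-neighbor message exchange. This is a hypothetical illustration, not the paper's scheme: the names `run_signaling` and `local_policy`, the mean aggregation, and the tanh updates are assumptions. It shows only the partial-observability aspect. After K rounds, an agent's state can depend on observations up to K hops away, while every computation stays local to that agent. It does not model how the paper preserves equivariance.

```python
import numpy as np

def run_signaling(local_obs, neighbors, w_enc, w_self, w_nbr, rounds):
    """Hypothetical K-round signaling among RSU agents.

    local_obs: dict agent_id -> local observation vector
    neighbors: dict agent_id -> list of agent ids it exchanges signals with
    Each agent reads only its own observation and the messages it receives.
    """
    state = {i: np.tanh(w_enc @ o) for i, o in local_obs.items()}
    for _ in range(rounds):
        sent = dict(state)  # every agent broadcasts its current state
        new_state = {}
        for i, s in state.items():
            inbox = [sent[j] for j in neighbors[i]]
            agg = np.mean(inbox, axis=0) if inbox else np.zeros_like(s)
            new_state[i] = np.tanh(w_self @ s + w_nbr @ agg)
        state = new_state
    return state

def local_policy(state_i, w_pi):
    """Softmax action distribution computed by one agent from its own state."""
    logits = w_pi @ state_i
    p = np.exp(logits - logits.max())
    return p / p.sum()

# Toy chain of 4 RSUs: 0 - 1 - 2 - 3.
rng = np.random.default_rng(1)
d_obs, d_h, n_actions = 3, 6, 4
w_enc = rng.normal(size=(d_h, d_obs)) / np.sqrt(d_obs)
w_self = rng.normal(size=(d_h, d_h)) / np.sqrt(d_h)
w_nbr = rng.normal(size=(d_h, d_h)) / np.sqrt(d_h)
w_pi = rng.normal(size=(n_actions, d_h))
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
obs = {i: rng.normal(size=d_obs) for i in range(4)}
obs_perturbed = dict(obs)
obs_perturbed[3] = obs[3] + 1.0  # change only agent 3's local view

s2 = run_signaling(obs, neighbors, w_enc, w_self, w_nbr, rounds=2)
s2p = run_signaling(obs_perturbed, neighbors, w_enc, w_self, w_nbr, rounds=2)
s3 = run_signaling(obs, neighbors, w_enc, w_self, w_nbr, rounds=3)
s3p = run_signaling(obs_perturbed, neighbors, w_enc, w_self, w_nbr, rounds=3)
```

Agent 0 is three hops from agent 3. Agent 0's state is therefore unaffected by agent 3's observation after two rounds, but it does reflect that observation after three rounds.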
Where Pith is reading between the lines
- If the simulation symmetries persist in field deployments, the method could reduce labeled-data requirements for training large-scale V2I networks.
- The same combination of self-supervised alignment and equivariant message passing could transfer to other geometrically symmetric multi-agent settings such as traffic signal control or drone coordination.
- The signaling coordination layer might be adapted to handle additional uncertainties like sensor failures or changing vehicle densities.
Load-bearing premise
That the rotational symmetries of vehicle locations can be faithfully captured by the GNN policy and that the self-supervised feature alignment reliably extracts accurate positions from multimodal observations under the simulation conditions used.
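The premise that latent alignment yields metric positions can be made concrete with a small example. The sketch below is not the paper's self-supervised objective. It assumes that the row-wise correspondence between a latent 2D map `Z` (for instance, from the wireless modality) and visually estimated positions `P` is already known. It then solves for the least-squares similarity transform (scale, rotation, translation) with the Umeyama/Kabsch SVD construction. The function name `similarity_align` is hypothetical. If a latent map is correct only up to such a transform, an alignment step of this kind converts it to metric coordinates. If the distortion is non-rigid, no similarity transform can remove it, which is one way the premise could fail.

```python
import numpy as np

def similarity_align(Z, P):
    """Least-squares similarity transform mapping latent points Z onto P.

    Returns (s, R, t) minimizing sum_i ||s * R @ z_i + t - p_i||^2
    (Umeyama-style solution; rows of Z and P are assumed to correspond).
    """
    mz, mp = Z.mean(axis=0), P.mean(axis=0)
    Zc, Pc = Z - mz, P - mp
    # SVD of the cross-covariance between the centered point sets.
    U, S, Vt = np.linalg.svd(Pc.T @ Zc)
    D = np.eye(Z.shape[1])
    D[-1, -1] = np.sign(np.linalg.det(U @ Vt))  # force a proper rotation, no reflection
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / np.sum(Zc ** 2)
    t = mp - s * R @ mz
    return s, R, t

# Toy check: P is a scaled, rotated, shifted copy of Z.
rng = np.random.default_rng(0)
Z = rng.normal(size=(20, 2))
theta = 1.1
R_true = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
P = 3.0 * Z @ R_true.T + np.array([5.0, -2.0])
s, R, t = similarity_align(Z, P)
aligned = s * Z @ R.T + t  # latent map expressed in metric coordinates
```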
What would settle it
A simulation or real-world trial in which vehicle positions break the assumed rotational symmetries, or in which the multimodal data introduce alignment errors, and the reported accuracy and performance improvements disappear.
Original abstract
In this paper, we study a vehicle-to-infrastructure (V2I) system where distributed base stations (BSs) acting as road-side units (RSUs) collect multimodal (wireless and visual) data from moving vehicles. We consider a decentralized rate maximization problem, where each RSU relies on its local observations to optimize its resources, while all RSUs must collaborate to guarantee favorable network performance. We recast this problem as a distributed multi-agent reinforcement learning (MARL) problem, by incorporating rotation symmetries in terms of vehicles' locations. To exploit these symmetries, we propose a novel self-supervised learning framework where each BS agent aligns the latent features of its multimodal observation to extract the positions of the vehicles in its local region. Equipped with this sensing data at each RSU, we train an equivariant policy network using a graph neural network (GNN) with message passing layers, such that each agent computes its policy locally, while all agents coordinate their policies via a signaling scheme that overcomes partial observability and guarantees the equivariance of the global policy. We present numerical results carried out in a simulation environment, where ray-tracing and computer graphics are used to collect wireless and visual data. Results show the generalizability of our self-supervised and multimodal sensing approach, achieving more than two-fold accuracy gains over baselines, and the efficiency of our equivariant MARL training, attaining more than 50% performance gains over standard approaches.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a self-supervised multimodal (wireless and visual) sensing framework to extract vehicle positions in a V2I system, which is then used to train an equivariant GNN-based policy for decentralized MARL that maximizes rates while exploiting rotational symmetries of vehicle locations via message passing and a signaling scheme to handle partial observability.
Significance. If the performance claims hold under rigorous validation, the work could advance decentralized MARL for wireless networks by showing how self-supervision on multimodal data and symmetry-aware GNN policies improve generalization and efficiency in V2I rate optimization. The reported gains suggest practical value for intelligent transportation systems, though the simulation-only evaluation limits broader claims.
major comments (3)
- [Abstract] The claim of 'more than two-fold accuracy gains over baselines' for the self-supervised multimodal sensing provides no description of the baselines, the position accuracy metric (e.g., RMSE or IoU), ablation studies separating multimodal alignment from single-modality inputs, or statistical significance tests; this is load-bearing for the generalizability assertion.
- [Abstract / Numerical Results] The claim of 'more than 50% performance gains over standard approaches' in the equivariant MARL lacks specification of the standard approaches (e.g., non-equivariant MARL or centralized RL), sensitivity analysis to simulation parameters such as vehicle density or ray-tracing settings, and any ablation on the signaling scheme; without these the efficiency claim cannot be assessed.
- [Proposed framework: self-supervised alignment] The latent-feature matching objective is asserted to recover accurate metric vehicle positions without explicit labels, yet no analysis or test is provided showing robustness outside the specific ray-tracing/graphics simulator (e.g., under changed antenna patterns, lighting, or densities); this directly affects whether the downstream MARL gains are artifacts of the simulation.
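To make the first two comments concrete, the helpers below show one plausible way to compute the quantities the referee asks for. Both readings are assumptions, since the abstract alone does not say whether "two-fold accuracy gain" means a ratio of position RMSEs, nor how the "50% performance gain" is normalized. The function names are hypothetical.

```python
import numpy as np

def position_rmse(est, true):
    """Root-mean-square Euclidean error between (N, 2) estimated and true positions."""
    return float(np.sqrt(np.mean(np.sum((est - true) ** 2, axis=1))))

def accuracy_gain(rmse_baseline, rmse_method):
    """Error-reduction factor; a value above 2 would read as 'more than two-fold'."""
    return rmse_baseline / rmse_method

def relative_gain(perf_method, perf_baseline):
    """Relative improvement of a higher-is-better metric; 0.5 means a 50% gain."""
    return (perf_method - perf_baseline) / perf_baseline

true = np.zeros((2, 2))
est = np.array([[3.0, 4.0], [3.0, 4.0]])  # every estimate is off by 5 m
```

Reporting the metric definition alongside the ratio matters because a "two-fold" claim can change substantially between, say, RMSE and median error.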
minor comments (1)
- [Abstract] The abstract would benefit from a brief parenthetical definition of 'equivariant policy' and 'signaling scheme' for readers outside MARL.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We agree that clarifying the abstract claims and providing additional robustness analysis will strengthen the manuscript. We respond to each major comment below and will make the indicated revisions.
-
Referee: [Abstract] The claim of 'more than two-fold accuracy gains over baselines' for the self-supervised multimodal sensing provides no description of the baselines, the position accuracy metric (e.g., RMSE or IoU), ablation studies separating multimodal alignment from single-modality inputs, or statistical significance tests; this is load-bearing for the generalizability assertion.
Authors: We acknowledge that the abstract is concise and omits these details. The manuscript body describes the baselines as single-modality sensing and supervised methods, uses RMSE as the position accuracy metric, presents ablations separating multimodal alignment from single-modality inputs, and reports statistical significance via repeated runs with confidence intervals. In the revised manuscript we will expand the abstract to concisely specify the baselines, metric, reference the ablations, and note the significance testing. revision: yes
-
Referee: [Abstract / Numerical Results] The claim of 'more than 50% performance gains over standard approaches' in the equivariant MARL lacks specification of the standard approaches (e.g., non-equivariant MARL or centralized RL), sensitivity analysis to simulation parameters such as vehicle density or ray-tracing settings, and any ablation on the signaling scheme; without these the efficiency claim cannot be assessed.
Authors: We agree that explicit specification is needed. The standard approaches are non-equivariant GNN-based MARL and centralized RL, as evaluated in the numerical results. We will revise the abstract to name them. We will also add sensitivity analysis varying vehicle density and ray-tracing parameters together with an ablation isolating the signaling scheme in the updated numerical results section. revision: yes
-
Referee: [Proposed framework: self-supervised alignment] The latent-feature matching objective is asserted to recover accurate metric vehicle positions without explicit labels, yet no analysis or test is provided showing robustness outside the specific ray-tracing/graphics simulator (e.g., under changed antenna patterns, lighting, or densities); this directly affects whether the downstream MARL gains are artifacts of the simulation.
Authors: The evaluation uses a ray-tracing and graphics simulator that permits controlled variation of parameters. We will add experiments in the revised manuscript that vary vehicle densities, antenna array configurations, and lighting conditions within the simulator to test robustness of the latent-feature matching. These additions will show that the position recovery and downstream MARL gains are not tied to one fixed simulator configuration. revision: partial
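The repeated-run confidence intervals mentioned in the responses could be computed as in this sketch, which reports a two-sided 95% Student-t interval for the mean of per-seed results. The hard-coded critical values cover 2 to 11 runs. For more runs, the code falls back to the normal quantile, which is a slight approximation. The function name and the choice of a t-interval are assumptions, not details from the manuscript.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, indexed by degrees of freedom.
_T975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def mean_ci95(samples):
    """Return (mean, lower, upper) of a 95% CI for the mean over repeated runs."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two runs for a confidence interval")
    mean = statistics.fmean(samples)
    sem = statistics.stdev(samples) / math.sqrt(n)  # standard error of the mean
    # Normal quantile used as an approximation when df exceeds the table.
    crit = _T975.get(n - 1, statistics.NormalDist().inv_cdf(0.975))
    return mean, mean - crit * sem, mean + crit * sem

# Hypothetical per-seed results for one method.
runs = [1.0, 2.0, 3.0, 4.0, 5.0]
m, lo, hi = mean_ci95(runs)
```

Non-overlapping intervals for two methods are a conservative indication of a difference. Overlapping intervals do not by themselves rule one out, so a paired test across seeds is a natural complement.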
Circularity Check
No significant circularity; empirical results from training, not tautological derivations
The paper frames its contributions as a self-supervised multimodal alignment step feeding into an equivariant GNN-MARL policy, with all headline gains (>2× accuracy, >50% performance) reported as simulation outcomes under ray-tracing and graphics models. No equations, fitted parameters, or uniqueness theorems are presented that reduce predictions to inputs by construction. The derivation relies on standard MARL value functions, GNN message passing for equivariance, and a latent-feature matching objective; these are trained end-to-end rather than defined circularly. No self-citation chains or ansatzes are load-bearing for the central claims. The reader's score of 2 on this check is consistent with, at most, minor and ordinary self-citation; the chain of reasoning remains independent and can be falsified against external simulation benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The V2I resource allocation problem can be modeled as a partially observable Markov decision process amenable to distributed MARL.
- domain assumption: Rotational symmetries in vehicle locations can be incorporated into policy networks without loss of optimality.
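One standard way to make the second assumption precise follows the MDP-homomorphism view of symmetry (Ravindran and Barto; van der Pol et al.). The notation below (state transformation L_g, state-dependent action transformation K_g^s) is illustrative and is not taken from the paper.

```latex
% Illustrative notation: G is a rotation group (e.g. C_n or SO(2));
% L_g transforms states, K_g^s transforms actions taken in state s.
\[
\begin{aligned}
R\big(L_g[s],\,K_g^s[a]\big) &= R(s,a), \\
T\big(L_g[s'] \,\big|\, L_g[s],\,K_g^s[a]\big) &= T(s' \mid s,a)
  \qquad \forall g \in G,\ s,s' \in \mathcal{S},\ a \in \mathcal{A}, \\
\Longrightarrow\quad Q^*\big(L_g[s],\,K_g^s[a]\big) &= Q^*(s,a), \\
\pi^*\big(K_g^s[a] \,\big|\, L_g[s]\big) &= \pi^*(a \mid s) \quad \text{for some optimal } \pi^* .
\end{aligned}
\]
```

When reward and transitions are invariant in this sense, restricting the search to G-equivariant policies does not exclude every optimal policy, which is the sense of "without loss of optimality". In a partially observable multi-agent setting, the observation model and the agents' local frames must also transform consistently, an additional requirement that this domain assumption implicitly carries.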
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
unclear: relation between the paper passage and the cited Recognition theorem.
self-supervised multimodal learning framework where each BS agent aligns the latent features of its multimodal observation to extract the positions of the vehicles
-
IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking
unclear: relation between the paper passage and the cited Recognition theorem.
equivariant policy network using a graph neural network (GNN) with message passing layers... guarantees the equivariance of the global policy
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.