Recognition: 2 theorem links
Echoes Over Time: Unlocking Length Generalization in Video-to-Audio Generation Models
Pith reviewed 2026-05-15 19:50 UTC · model grok-4.3
The pith
Models trained only on short video clips can generate coherent audio exceeding five minutes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A multimodal hierarchical network augmented with non-causal Mamba enables length generalization in video-to-audio generation, so that training exclusively on short instances produces usable audio for videos longer than five minutes at test time.
What carries the argument
MMHNet: a multimodal hierarchical network that combines hierarchical feature processing with non-causal Mamba blocks to capture long-range video-audio temporal dependencies.
Load-bearing premise
The hierarchical structure plus non-causal Mamba can maintain alignment and coherence across long video-audio sequences even when no long examples were present in training.
What would settle it
Run the model on videos several times longer than the training clips and check whether audio-video synchronization scores or human coherence ratings collapse compared with short-video results.
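One concrete form this check could take, sketched under assumptions: generate audio for held-out videos several times longer than the training clips and compare an audio-visual synchronization score against the short-clip baseline. The `generate_audio` and `sync_score` callables below are hypothetical stand-ins (the paper does not fix either interface), and the 10% tolerance is an illustrative threshold, not a value from the paper.

```python
# Hedged sketch of the settle-it test: does AV synchronization hold up when
# inference length far exceeds training length? `generate_audio(model, video)`
# and `sync_score(video, audio)` are assumed interfaces, not the paper's API.

def length_generalization_check(model, short_videos, long_videos,
                                generate_audio, sync_score, tolerance=0.10):
    """Return True if the average sync score on long videos stays within
    `tolerance` (relative) of the short-video average (scores assumed positive,
    higher = better alignment)."""
    def mean_score(videos):
        scores = [sync_score(v, generate_audio(model, v)) for v in videos]
        return sum(scores) / len(scores)

    short_avg = mean_score(short_videos)   # clips near the training length
    long_avg = mean_score(long_videos)     # clips well beyond it (e.g. > 5 min)
    return (short_avg - long_avg) / short_avg <= tolerance
```

The same loop could swap `sync_score` for pooled human coherence ratings; the question is only whether the long-video numbers collapse relative to the short-video ones.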
Original abstract
Scaling multimodal alignment between video and audio is challenging, particularly due to limited data and the mismatch between text descriptions and frame-level video information. In this work, we tackle the scaling challenge in multimodal-to-audio generation, examining whether models trained on short instances can generalize to longer ones during testing. To tackle this challenge, we present multimodal hierarchical networks so-called MMHNet, an enhanced extension of state-of-the-art video-to-audio models. Our approach integrates a hierarchical method and non-causal Mamba to support long-form audio generation. Our proposed method significantly improves long audio generation up to more than 5 minutes. We also prove that training short and testing long is possible in the video-to-audio generation tasks without training on the longer durations. We show in our experiments that our proposed method could achieve remarkable results on long-video to audio benchmarks, beating prior works in video-to-audio tasks. Moreover, we showcase our model capability in generating more than 5 minutes, while prior video-to-audio methods fall short in generating with long durations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces MMHNet, a multimodal hierarchical extension of video-to-audio models that combines hierarchical temporal modeling with non-causal Mamba. It claims this architecture enables length generalization: models trained exclusively on short clips can generate coherent audio for videos exceeding 5 minutes at test time, without any long-duration training data. The paper further claims that this 'proves' training-short/testing-long is feasible, and that MMHNet outperforms prior video-to-audio methods on long-video benchmarks.
Significance. If the length-generalization results hold with proper validation, the work would address a key scaling bottleneck in multimodal generation—limited availability of long-form aligned video-audio data—by allowing extrapolation beyond training lengths. The hierarchical + non-causal Mamba design offers a plausible route to maintaining temporal alignment and acoustic consistency over extended durations, which could have practical impact on video editing and long-form content synthesis.
major comments (2)
- Abstract: The central claim that 'we prove that training short and testing long is possible' and that the method 'significantly improves long audio generation up to more than 5 minutes' is presented without any quantitative metrics, baselines, ablation studies, or experimental protocol. This absence is load-bearing because the title, abstract, and contribution rest entirely on an empirical demonstration of length generalization that cannot be assessed from the given text.
- Abstract: The assumption that the hierarchical MMHNet structure plus non-causal Mamba prevents error accumulation and distribution shift when extrapolating far beyond training lengths (e.g., evolving scene dynamics or cumulative audio drift) is stated but not supported by any mechanism description, theoretical argument, or empirical test. This is load-bearing for the length-generalization claim.
minor comments (1)
- Abstract: Phrases such as 'remarkable results' and 'beating prior works' are used without naming the specific benchmarks, comparison methods, or quantitative improvements, reducing clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address the two major comments regarding the abstract below, providing clarifications from the full paper and indicating planned revisions to strengthen the presentation of our claims and mechanisms.
Point-by-point responses
- Referee: Abstract: The central claim that 'we prove that training short and testing long is possible' and that the method 'significantly improves long audio generation up to more than 5 minutes' is presented without any quantitative metrics, baselines, ablation studies, or experimental protocol. This absence is load-bearing because the title, abstract, and contribution rest entirely on an empirical demonstration of length generalization that cannot be assessed from the given text.
Authors: We acknowledge that the abstract lacks specific quantitative metrics and protocol details, which limits immediate assessment. The full manuscript (Sections 4 and 5) reports results on long-video-to-audio benchmarks, including comparisons against prior video-to-audio methods with metrics such as audio quality scores and temporal alignment measures, demonstrating coherent generation beyond 5 minutes without long-duration training data. We will revise the abstract to include representative quantitative improvements and a brief reference to the benchmarks and setup.
Revision: yes
- Referee: Abstract: The assumption that the hierarchical MMHNet structure plus non-causal Mamba prevents error accumulation and distribution shift when extrapolating far beyond training lengths (e.g., evolving scene dynamics or cumulative audio drift) is stated but not supported by any mechanism description, theoretical argument, or empirical test. This is load-bearing for the length-generalization claim.
Authors: The method section details how the hierarchical temporal modeling captures multi-scale dependencies while non-causal Mamba enables bidirectional long-range context without causal accumulation of errors. We will add a concise mechanism description to the abstract and expand the paper with additional empirical tests (e.g., drift analysis over extended sequences) and a brief theoretical note on Mamba's state-space properties for extrapolation. This revision will better support the claim.
Revision: yes
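One way such a drift analysis could be operationalized, sketched under assumptions: slide a fixed-length window over a single long generated clip and score audio-visual alignment per window; a clearly negative trend across windows would indicate cumulative drift toward the end of the sequence. The `window_score` function, window length, and linear-trend summary below are illustrative choices, not details taken from the paper.

```python
import numpy as np

def drift_analysis(video_frames, audio, fps, sr, window_score, window_s=10.0):
    """Score AV alignment in consecutive windows of one long generated clip.

    `window_score(frames, samples)` is an assumed per-window alignment metric
    (e.g. a pretrained sync scorer). A negative slope of the scores over the
    window index suggests alignment degrades as generation length grows.
    """
    frames_per_win = int(window_s * fps)
    samples_per_win = int(window_s * sr)
    n_windows = min(len(video_frames) // frames_per_win,
                    len(audio) // samples_per_win)

    scores = []
    for i in range(n_windows):
        f = video_frames[i * frames_per_win:(i + 1) * frames_per_win]
        a = audio[i * samples_per_win:(i + 1) * samples_per_win]
        scores.append(window_score(f, a))

    # Linear trend over window index: slope < 0 indicates drift over time.
    slope = np.polyfit(np.arange(n_windows), np.asarray(scores), deg=1)[0]
    return scores, slope
```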
Circularity Check
No significant circularity: empirical result with no derivation chain or self-referential definitions
Full rationale
The paper advances an empirical architecture (MMHNet: hierarchical extension of video-to-audio models using non-causal Mamba) and reports experimental outcomes on long-video benchmarks, including generation beyond 5 minutes when trained only on short clips. No equations, parameter-fitting steps, or formal derivations appear in the provided text. The central claim (training short, testing long is possible) is presented as an observed experimental outcome rather than a mathematical reduction to the model's own inputs or prior self-citations. No load-bearing uniqueness theorems, ansatzes smuggled via citation, or renamings of known results are invoked. The work is therefore self-contained against external benchmarks and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean (LogicNat recovery and embed_strictMono), tag: unclear
  The relation between the paper passage and the cited Recognition theorem is unclear.
  Paper passage: "We adopt the Mamba-2 architecture [6], which inherently supports sequence modeling without explicit positional embeddings... Non-Causal Mamba-2 [37] for two key reasons: 1) video conditions are available offline... 2) multimodal fusion across multiple modalities is difficult without a predefined order."
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean (absolute_floor_iff_bare_distinguishability), tag: unclear
  The relation between the paper passage and the cited Recognition theorem is unclear.
  Paper passage: "We propose a hierarchical framework... temporal routing... MM routing... Chunking with downsampling... Dechunking with upsampling."
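Read literally, the passage just quoted suggests a chunk-and-downsample, mix, upsample-and-dechunk pattern around a non-causal sequence model. The sketch below only illustrates that pattern: a bidirectional GRU stands in for the non-causal Mamba blocks, and average pooling / nearest upsampling stand in for the paper's unspecified chunking and dechunking operators, so this is not the authors' MMHNet.

```python
import torch
import torch.nn as nn

class ChunkMixDechunkSketch(nn.Module):
    """Illustrative chunk/downsample -> non-causal mix -> upsample/dechunk block.

    Assumptions: input is (batch, time, dim) fused multimodal features with
    time divisible by `chunk`; the non-causal mixer is a bidirectional GRU
    purely for illustration.
    """
    def __init__(self, dim=256, chunk=16):
        super().__init__()
        self.down = nn.AvgPool1d(kernel_size=chunk, stride=chunk)    # chunking
        self.mixer = nn.GRU(dim, dim // 2, batch_first=True,
                            bidirectional=True)                      # non-causal stand-in
        self.up = nn.Upsample(scale_factor=chunk, mode="nearest")    # dechunking

    def forward(self, x):                                    # x: (B, T, D)
        coarse = self.down(x.transpose(1, 2)).transpose(1, 2)        # (B, T/chunk, D)
        mixed, _ = self.mixer(coarse)                        # sees both directions at once
        fine = self.up(mixed.transpose(1, 2)).transpose(1, 2)        # (B, T, D)
        return x + fine                                      # residual back to full rate
```

Because the coarse path sees the whole (offline) video condition at once, the sequence length at test time is limited only by memory, which is presumably the property the length-generalization claim leans on.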
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Yoshua Bengio, Nicholas Léonard, and Aaron Courville. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432, 2013.
- [2] Honglie Chen, Weidi Xie, Andrea Vedaldi, and Andrew Zisserman. VGGSound: A large-scale audio-visual dataset. In ICASSP 2020, pages 721–725. IEEE, 2020.
- [3] Shouyuan Chen, Sherman Wong, Liangjian Chen, and Yuandong Tian. Extending context window of large language models via positional interpolation. arXiv preprint arXiv:2306.15595, 2023.
- [4] Ho Kei Cheng, Masato Ishii, Akio Hayakawa, Takashi Shibuya, Alexander Schwing, and Yuki Mitsufuji. MMAudio: Taming multimodal joint training for high-quality video-to-audio synthesis. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 28901–28911, 2025.
- [5] Xin Cheng, Xihua Wang, Yihan Wu, Yuyue Wang, and Ruihua Song. LoVA: Long-form video-to-audio generation. In ICASSP 2025, pages 1–5. IEEE, 2025.
- [6] Tri Dao and Albert Gu. Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality. arXiv preprint arXiv:2405.21060, 2024.
- [7] Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et al. Scaling rectified flow transformers for high-resolution image synthesis. In Forty-first International Conference on Machine Learning, 2024.
- [8] Jort F. Gemmeke, Daniel P. W. Ellis, Dylan Freedman, Aren Jansen, Wade Lawrence, Ryan C. Moore, Manoj Plakal, and Marvin Ritter. Audio Set: An ontology and human-labeled dataset for audio events. In ICASSP 2017, pages 776–780. IEEE, 2017.
- [9] Tiantian Geng, Teng Wang, Jinming Duan, Runmin Cong, and Feng Zheng. Dense-localizing audio-visual events in untrimmed videos: A large-scale benchmark and baseline. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22942–22951, 2023.
- [10] Tiantian Geng, Jinrui Zhang, Qingni Wang, Teng Wang, Jinming Duan, and Feng Zheng. LongVALE: Vision-audio-language-event benchmark towards time-aware omni-modal perception of long videos. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 18959–18969, 2025.
- [11] Rohit Girdhar, Alexander Kirillov, Mathilde Caron, Ross Girshick, Piotr Dollár, and Ishan Misra. ImageBind: One embedding space to bind them all. arXiv preprint arXiv:2305.05665, 2023.
- [12] Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752, 2023.
- [13] Chi Han, Qifan Wang, Hao Peng, Wenhan Xiong, Yu Chen, Heng Ji, and Sinong Wang. LM-Infinite: Zero-shot extreme length generalization for large language models. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 3991–4008, 2024.
- [14] Ali Hatamizadeh and Jan Kautz. MambaVision: A hybrid Mamba-Transformer vision backbone. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 25261–25270, 2025.
- [15] Vincent Tao Hu, Stefan Andreas Baumann, Ming Gui, Olga Grebenkova, Pingchuan Ma, Johannes Fischer, and Björn Ommer. ZigMa: A DiT-style zigzag Mamba diffusion model. arXiv, 2024.
- [16] Sukjun Hwang, Brandon Wang, and Albert Gu. Dynamic chunking for end-to-end hierarchical sequence modeling. arXiv preprint arXiv:2507.07955, 2025.
- [17] Vladimir Iashin and Esa Rahtu. Taming visually guided sound generation. arXiv preprint arXiv:2110.08791, 2021.
- [18] Vladimir Iashin, Weidi Xie, Esa Rahtu, and Andrew Zisserman. Synchformer: Efficient synchronization from sparse cues. In ICASSP 2024, pages 5325–5329. IEEE, 2024.
- [19] Amirhossein Kazemnejad, Inkit Padhi, Karthikeyan Natesan Ramamurthy, Payel Das, and Siva Reddy. The impact of positional encoding on length generalization in transformers. Advances in Neural Information Processing Systems, 36:24892–24928, 2023.
- [20] Qiuqiang Kong, Yin Cao, Turab Iqbal, Yuxuan Wang, Wenwu Wang, and Mark D. Plumbley. PANNs: Large-scale pretrained audio neural networks for audio pattern recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, pages 2880–2894, 2020.
- [21] Khaled Koutini, Hamid Eghbal-zadeh, and Gerhard Widmer. Efficient training of audio transformers with Patchout. arXiv preprint arXiv:2110.05069, 2021.
- [22] Black Forest Labs. FLUX. https://github.com/black-forest-labs/flux, 2024.
- [23] Sizhe Li, Yiming Qin, Minghang Zheng, Xin Jin, and Yang Liu. Diff-BGM: A diffusion model for video background music generation. In CVPR, 2024.
- [24] Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. arXiv preprint arXiv:2210.02747, 2022.
- [25] Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. arXiv preprint arXiv:2209.03003, 2022.
- [26] Simian Luo, Chuanhao Yan, Chenxu Hu, and Hang Zhao. Diff-Foley: Synchronized video-to-audio synthesis with latent diffusion models. Advances in Neural Information Processing Systems, 36:48855–48876, 2023.
- [27] Shentong Mo, Jing Shi, and Yapeng Tian. Text-to-audio generation synchronized with videos. arXiv preprint arXiv:2403.07938, 2024.
- [28] William Peebles and Saining Xie. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4195–4205, 2023.
- [29] Bowen Peng, Jeffrey Quesnelle, Honglu Fan, and Enrico Shippole. YaRN: Efficient context window extension of large language models. In The Twelfth International Conference on Learning Representations, 2024.
- [30] Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, and Aaron Courville. FiLM: Visual reasoning with a general conditioning layer. In Proceedings of the AAAI Conference on Artificial Intelligence, 2018.
- [31] Mark D. Plumbley, Thomas Blumensath, Laurent Daudet, Rémi Gribonval, and Mike E. Davies. Sparse representations in audio and music: From coding to source separation. Proceedings of the IEEE, 98(6):995–1005, 2010.
- [32] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
- [33] Koichi Saito, Julian Tanke, Christian Simon, Masato Ishii, Kazuki Shimada, Zachary Novack, Zhi Zhong, Akio Hayakawa, Takashi Shibuya, and Yuki Mitsufuji. SoundReactor: Frame-level online video-to-audio generation. arXiv preprint arXiv:2510.02110, 2025.
- [34] Siavash Shams, Sukru Samet Dindar, Xilin Jiang, and Nima Mesgarani. SSAMBA: Self-supervised audio representation learning with Mamba state space model. In 2024 IEEE Spoken Language Technology Workshop (SLT), pages 1053–.
- [35] Sizhe Shan, Qiulin Li, Yutao Cui, Miles Yang, Yuehai Wang, Qun Yang, Jin Zhou, and Zhao Zhong. HunyuanVideo-Foley: Multimodal diffusion with representation alignment for high-fidelity Foley audio generation, 2025.
- [36] Roy Sheffer and Yossi Adi. I hear your true colors: Image guided audio generation. In ICASSP 2023, pages 1–5. IEEE, 2023.
- [37] Yuheng Shi, Minjing Dong, Mingjia Li, and Chang Xu. VSSD: Vision Mamba with non-causal state space duality. arXiv preprint arXiv:2407.18559, 2024.
- [38] Christian Simon, Masato Ishii, Akio Hayakawa, Zhi Zhong, Shusuke Takahashi, Takashi Shibuya, and Yuki Mitsufuji. TITAN-Guide: Taming inference-time alignment for guided text-to-video diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 16662–16671, 2025.
- [39] Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, and Yunfeng Liu. RoFormer: Enhanced transformer with rotary position embedding. arXiv preprint arXiv:2104.09864, 2021.
- [40] Jianlin Su, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. RoFormer: Enhanced transformer with rotary position embedding. Neurocomputing, 568:127063, 2024.
- [41] Matthew Tancik, Pratul Srinivasan, Ben Mildenhall, Sara Fridovich-Keil, Nithin Raghavan, Utkarsh Singhal, Ravi Ramamoorthi, Jonathan Barron, and Ren Ng. Fourier features let networks learn high frequency functions in low dimensional domains. In Proc. NeurIPS, pages 7537–7547. Curran Associates, Inc., 2020.
- [42] Michael Tschannen, Alexey Gritsenko, Xiao Wang, Muhammad Ferjad Naeem, Ibrahim Alabdulmohsin, Nikhil Parthasarathy, Talfan Evans, Lucas Beyer, Ye Xia, Basil Mustafa, et al. SigLIP 2: Multilingual vision-language encoders with improved semantic understanding, localization, and dense features. arXiv preprint arXiv:2502.14786, 2025.
- [43] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
- [44] Ilpo Viertola, Vladimir Iashin, and Esa Rahtu. Temporally aligned audio for video with autoregression. In ICASSP 2025, pages 1–5. IEEE, 2025.
- [45] Xiao Wang, Qingyi Si, Jianlong Wu, Shiyu Zhu, Li Cao, and Liqiang Nie. ReTaKe: Reducing temporal and knowledge redundancy for long video understanding. arXiv preprint arXiv:2412.20504, 2024.
- [46] Yongqi Wang, Wenxiang Guo, Rongjie Huang, Jiawei Huang, Zehan Wang, Fuming You, Ruiqi Li, and Zhou Zhao. Frieren: Efficient video-to-audio generation network with rectified flow matching. Advances in Neural Information Processing Systems, 37:128118–128138, 2024.
- [47] Zhifan Ye, Kejing Xia, Yonggan Fu, Xin Dong, Jihoon Hong, Xiangchi Yuan, Shizhe Diao, Jan Kautz, Pavlo Molchanov, and Yingyan Celine Lin. LongMamba: Enhancing Mamba's long context capabilities via training-free receptive field enlargement. arXiv preprint arXiv:2504.16053, 2025.
- [48] Lin Zhang, Shentong Mo, Yijing Zhang, and Pedro Morgado. Audio-synchronized visual animation. In ECCV, 2024.
- [49] Yiming Zhang, Yicheng Gu, Yanhong Zeng, Zhening Xing, Yuancheng Wang, Zhizheng Wu, and Kai Chen. FoleyCrafter: Bring silent videos to life with lifelike and synchronized sounds. arXiv preprint arXiv:2407.01494, 2024.
- [50] Yehang Zhang, Xinli Xu, Xiaojie Xu, Li Liu, and Yingcong Chen. Long-video audio synthesis with multi-agent collaboration. arXiv preprint arXiv:2503.10719, 2025.