pith. machine review for the scientific record.

arxiv: 2604.16377 · v2 · submitted 2026-03-24 · 💻 cs.CL · cs.CY

Recognition: 2 theorem links


GoCoMA: Hyperbolic Multimodal Representation Fusion for Large Language Model-Generated Code Attribution

Authors on Pith · no claims yet

Pith reviewed 2026-05-15 00:36 UTC · model grok-4.3

classification 💻 cs.CL cs.CY
keywords: LLM code attribution · hyperbolic embeddings · multimodal fusion · code stylometry · binary artifacts · Poincaré ball · cross-modal attention

The pith

GoCoMA fuses stylometric code features and binary artifact images in hyperbolic space to attribute LLM sources more accurately.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GoCoMA to determine whether a piece of code was generated by a particular large language model. It models an extrinsic hierarchy where code stylometry captures higher-level structural signatures and binary pre-executable artifact images capture lower-level byte semantics. Embeddings from these modalities are projected into a hyperbolic Poincaré ball, fused using geodesic-cosine similarity-based cross-modal attention, and then used for attribution after back-projection to Euclidean space. Experiments demonstrate that this approach outperforms both unimodal methods and Euclidean multimodal baselines on two benchmarks. This matters because accurate attribution can address security risks and licensing issues arising from LLM-generated code.

Core claim

GoCoMA projects modality embeddings into a hyperbolic Poincaré ball, fuses them via a geodesic-cosine similarity-based cross-modal attention (GCSA) fusion mechanism, and back-projects the fused representation to Euclidean space for final LLM-source attribution, consistently outperforming unimodal and Euclidean multimodal baselines on the CoDET-M4 and LLMAuthorBench benchmarks.

What carries the argument

Geodesic-cosine similarity-based cross-modal attention (GCSA) for fusing hyperbolic embeddings of code stylometry and binary pre-executable artifacts.
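The paper's exact GCSA formulation is not reproduced in this excerpt, so the sketch below is only an illustration of the ingredients it names: Möbius addition on the Poincaré ball, the induced geodesic distance d_c(x, y) = (2/√c) tanh⁻¹(√c ‖(−x) ⊕_c y‖), and a weighting of the two modality embeddings by geodesic proximity. The softmax step and the convex-mix fusion are placeholder assumptions, not the authors' mechanism.

```python
import numpy as np

def mobius_add(x, y, c=1.0):
    """Mobius addition x (+)_c y on the Poincare ball (standard form,
    as in the hyperbolic neural networks literature)."""
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c ** 2 * x2 * y2
    return num / den

def geodesic_dist(x, y, c=1.0):
    """d_c(x, y) = (2 / sqrt(c)) * artanh(sqrt(c) * ||(-x) (+)_c y||)."""
    sc = np.sqrt(c)
    return (2.0 / sc) * np.arctanh(sc * np.linalg.norm(mobius_add(-x, y, c)))

def gcsa_fuse(h_code, h_img, c=1.0, tau=1.0):
    """Placeholder fusion (illustrative assumption): softmax weight the
    two modalities by negative geodesic distance, then convex-mix them."""
    d = geodesic_dist(h_code, h_img, c)
    w = np.exp(np.array([0.0, -d]) / tau)
    w /= w.sum()
    return w[0] * h_code + w[1] * h_img   # ball is convex, so result stays inside

h_code = np.array([0.10, 0.20, -0.10])   # toy stylometry embedding (made up)
h_img = np.array([0.05, -0.10, 0.30])    # toy BPEA image embedding (made up)
fused = gcsa_fuse(h_code, h_img)
```

As a sanity check, d_c is symmetric, vanishes on identical inputs, and reduces to 2‖x − y‖ as the curvature parameter c → 0, which is a quick way to verify any reimplementation.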

Load-bearing premise

Code stylometry forms a higher extrinsic level than binary pre-executable artifact images in a hierarchy that hyperbolic geometry can exploit to improve attribution.
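One way to make this premise checkable: in Poincaré embeddings, hierarchy level is commonly reflected in geodesic distance from the origin (higher-level, more general items near the center; lower-level items near the boundary). A minimal sketch under that assumption, with made-up embedding values rather than anything from the paper:

```python
import numpy as np

def hyperbolic_norm(x, c=1.0):
    """Geodesic distance from the origin of the Poincare ball:
    d_c(0, x) = (2 / sqrt(c)) * artanh(sqrt(c) * ||x||)."""
    return (2.0 / np.sqrt(c)) * np.arctanh(np.sqrt(c) * np.linalg.norm(x))

# Under the paper's premise, stylometry embeddings (higher extrinsic level)
# should sit nearer the origin than BPEA image embeddings (lower level).
# Toy values for illustration only:
h_style = np.array([0.10, 0.05])   # hypothetical stylometry embedding
h_bpea = np.array([0.70, 0.55])    # hypothetical BPEA image embedding
assert hyperbolic_norm(h_style) < hyperbolic_norm(h_bpea)
```

Measuring whether learned embeddings actually exhibit this norm ordering would be one concrete way to test the premise rather than assert it.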

What would settle it

Showing, on the same benchmarks and under the same evaluation protocol, that the hyperbolic projection and GCSA fusion fail to improve performance (or actively degrade it) relative to Euclidean alternatives would disprove the claimed central benefit.

Figures

Figures reproduced from arXiv: 2604.16377 by Arun Balaji Buduru, Bhavinkumar Vinodbhai Kuwar, Bikrant Bikram Pratap Maurya, Nitin Choudhury.

Figure 1. Overview of the GoCoMA architecture. (a) Projection from Euclidean to hyperbolic space: let h^(E)_code, h^(E)_img ∈ ℝ^n denote the Euclidean embeddings extracted from the code PLM and the vision PTM, respectively. Each embedding is projected onto the hyperbolic Poincaré ball D^n_c = {x ∈ ℝ^n : c‖x‖² < 1} through the Riemannian exponential map at the origin:

    h^(H) = exp_0^c(h^(E)) = tanh(√c ‖h^(E)‖) · h^(E) / (√c ‖h^(E)‖).   (1)

… view at source ↗
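The exponential map in Eq. (1) is straightforward to render in plain NumPy; this is an illustrative sketch, not the authors' code, and the `eps` guard for the zero vector is an added assumption:

```python
import numpy as np

def expmap0(x, c=1.0, eps=1e-8):
    """Riemannian exponential map at the origin of the Poincare ball D^n_c,
    as written in Eq. (1):
        exp_0^c(x) = tanh(sqrt(c) * ||x||) * x / (sqrt(c) * ||x||).
    The eps guard for x = 0 is an added assumption, not from the paper."""
    sqrt_c = np.sqrt(c)
    norm = max(np.linalg.norm(x), eps)
    return np.tanh(sqrt_c * norm) * x / (sqrt_c * norm)

h_code = np.array([0.6, -1.2, 3.0])   # toy stand-in for a code-PLM embedding
h_hyp = expmap0(h_code, c=1.0)
# The image must satisfy the ball constraint c * ||h||^2 < 1:
ball_ok = 1.0 * np.dot(h_hyp, h_hyp) < 1.0
```

Because tanh saturates, arbitrarily large Euclidean embeddings land strictly inside the ball, while small embeddings are left nearly unchanged (tanh(t) ≈ t near zero).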
read the original abstract

Large Language Models (LLMs) trained on massive code corpora are now increasingly capable of generating code that is hard to distinguish from human-written code. This raises practical concerns, including security vulnerabilities and licensing ambiguity, and also motivates a forensic question: 'Who (or which LLM) wrote this piece of code?' We present GoCoMA, a multimodal framework that models an extrinsic hierarchy between (i) code stylometry, capturing higher-level structural and stylistic signatures, and (ii) image representations of binary pre-executable artifacts (BPEA), capturing lower-level, execution-oriented byte semantics shaped by compilation and toolchains. GoCoMA projects modality embeddings into a hyperbolic Poincaré ball, fuses them via a geodesic-cosine similarity-based cross-modal attention (GCSA) fusion mechanism, and back-projects the fused representation to Euclidean space for final LLM-source attribution. Experiments on two open-source benchmarks (CoDET-M4 and LLMAuthorBench) show that GoCoMA consistently outperforms unimodal and Euclidean multimodal baselines under identical evaluation protocols.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces GoCoMA, a multimodal framework for attributing code to its LLM source (or human). It explicitly models an extrinsic hierarchy with code stylometry as higher-level and binary pre-executable artifact (BPEA) images as lower-level, projects both modality embeddings into a hyperbolic Poincaré ball, fuses them via geodesic-cosine similarity-based cross-modal attention (GCSA), and back-projects the result to Euclidean space for final classification. Experiments on the CoDET-M4 and LLMAuthorBench benchmarks are reported to show consistent gains over unimodal and Euclidean multimodal baselines under identical protocols.

Significance. If the reported gains are reproducible, the work would demonstrate a concrete use of hyperbolic geometry to capture hierarchical structure in code stylometry and execution artifacts, offering a new direction for forensic attribution of LLM-generated code. The GCSA fusion mechanism is a specific technical contribution that could be tested in related multimodal settings.

major comments (2)
  1. [§4] §4 (Experiments): the abstract states that GoCoMA 'consistently outperforms' baselines on CoDET-M4 and LLMAuthorBench, yet no data splits, exact metrics, statistical significance tests, or ablation results are referenced. Without these, the central empirical claim cannot be verified and remains load-bearing for the paper's contribution.
  2. [§3.2] §3.2 (Hyperbolic Projection and Hierarchy): the extrinsic hierarchy between stylometry (higher) and BPEA images (lower) is asserted without supporting evidence, prior literature, or sensitivity analysis. Because the choice of Poincaré ball and geodesic-cosine fusion rests directly on this hierarchy, its justification is required for the modeling approach to be defensible.
minor comments (2)
  1. [§3.3] Notation for the fused representation after back-projection should be defined once and used consistently; currently the transition from hyperbolic to Euclidean space is described only at a high level.
  2. [§2] Add a short related-work paragraph contrasting GCSA with existing hyperbolic attention mechanisms (e.g., those based on Möbius operations) to clarify novelty.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and will incorporate revisions to improve clarity and verifiability of the results and modeling choices.

read point-by-point responses
  1. Referee: [§4] §4 (Experiments): the abstract states that GoCoMA 'consistently outperforms' baselines on CoDET-M4 and LLMAuthorBench, yet no data splits, exact metrics, statistical significance tests, or ablation results are referenced. Without these, the central empirical claim cannot be verified and remains load-bearing for the paper's contribution.

    Authors: We agree that the current presentation of experimental details is insufficient for full verification. In the revised manuscript we will expand §4 with explicit descriptions of the train/validation/test splits for both CoDET-M4 and LLMAuthorBench, report exact metric values (accuracy, macro-F1, etc.) together with standard deviations over multiple runs, include results of statistical significance tests (paired t-tests with p-values) against all baselines, and provide comprehensive ablation tables isolating the contributions of hyperbolic projection, GCSA, and the hierarchy. These additions will directly support the reproducibility of the reported gains. revision: yes

  2. Referee: [§3.2] §3.2 (Hyperbolic Projection and Hierarchy): the extrinsic hierarchy between stylometry (higher) and BPEA images (lower) is asserted without supporting evidence, prior literature, or sensitivity analysis. Because the choice of Poincaré ball and geodesic-cosine fusion rests directly on this hierarchy, its justification is required for the modeling approach to be defensible.

    Authors: We acknowledge that the hierarchy motivation requires stronger grounding. In the revision we will add citations to prior literature on hierarchical multimodal representations in code analysis and vision-language models that similarly assign higher-level semantic features to one modality and lower-level execution or pixel features to another. We will also insert a sensitivity analysis (new subsection or appendix) that swaps the modality levels and reports the resulting performance drop, thereby demonstrating that the chosen hierarchy is empirically beneficial rather than arbitrary. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper describes a multimodal fusion approach using hyperbolic projections and geodesic-cosine attention for code attribution, with claims of outperformance on the CoDET-M4 and LLMAuthorBench benchmarks under identical protocols. No equations, derivations, or modeling steps in the available text reduce any prediction or result to a fitted parameter, self-definition, or self-citation chain by construction. The hierarchy between stylometry and BPEA images is presented as an explicit modeling assumption rather than derived from prior results, and the central performance claims rest on empirical comparisons against external benchmarks rather than on any internally circular derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the method description is high-level and does not enumerate modeling assumptions beyond the stated hierarchy and projection steps.

pith-pipeline@v0.9.0 · 5503 in / 1166 out tokens · 34101 ms · 2026-05-15T00:36:14.316055+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    projects modality embeddings into a hyperbolic Poincaré ball, fuses them via a geodesic-cosine similarity-based cross-modal attention (GCSA) fusion mechanism... curvature-consistent Möbius linear maps... geodesic distance d_c(x, y) = (2/√c) tanh⁻¹(√c ‖(−x) ⊕_c y‖)

  • IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · costAlphaLog_high_calibrated_iff · echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    models an extrinsic hierarchy between (i) code stylometry, capturing higher-level structural and stylistic signatures, and (ii) image representations of binary pre-executable artifacts (BPEA), capturing lower-level, execution-oriented byte semantics

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
