pith. machine review for the scientific record.

arxiv: 2604.16377 · v2 · submitted 2026-03-24 · 💻 cs.CL · cs.CY

Recognition: 2 theorem links


GoCoMA: Hyperbolic Multimodal Representation Fusion for Large Language Model-Generated Code Attribution

Authors on Pith · no claims yet

Pith reviewed 2026-05-15 00:36 UTC · model grok-4.3

classification 💻 cs.CL cs.CY
keywords: LLM code attribution · hyperbolic embeddings · multimodal fusion · code stylometry · binary artifacts · Poincaré ball · cross-modal attention

The pith

GoCoMA fuses stylometric code features and binary artifact images in hyperbolic space to attribute LLM sources more accurately.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GoCoMA to determine whether a piece of code was generated by a particular large language model. It models an extrinsic hierarchy where code stylometry captures higher-level structural signatures and binary pre-executable artifact images capture lower-level byte semantics. Embeddings from these modalities are projected into a hyperbolic Poincaré ball, fused using geodesic-cosine similarity-based cross-modal attention, and then used for attribution after back-projection to Euclidean space. Experiments demonstrate that this approach outperforms both unimodal methods and Euclidean multimodal baselines on two benchmarks. This matters because accurate attribution can address security risks and licensing issues arising from LLM-generated code.

Core claim

GoCoMA projects modality embeddings into a hyperbolic Poincaré ball, fuses them via a geodesic-cosine similarity-based cross-modal attention (GCSA) fusion mechanism, and back-projects the fused representation to Euclidean space for final LLM-source attribution, consistently outperforming unimodal and Euclidean multimodal baselines on the CoDET-M4 and LLMAuthorBench benchmarks.

What carries the argument

Geodesic-cosine similarity-based cross-modal attention (GCSA) for fusing hyperbolic embeddings of code stylometry and binary pre-executable artifacts.
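The paper's exact GCSA formulation is not reproduced in this excerpt, so the sketch below is only an illustration of the ingredients it names: Möbius addition on the Poincaré ball, the induced geodesic distance d_c(x, y) = (2/√c) tanh⁻¹(√c ‖(−x) ⊕_c y‖), and a weighting of the two modality embeddings by geodesic proximity. The softmax step and the convex-mix fusion are placeholder assumptions, not the authors' mechanism.

```python
import numpy as np

def mobius_add(x, y, c=1.0):
    """Mobius addition x (+)_c y on the Poincare ball (standard form,
    as in the hyperbolic neural networks literature)."""
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c ** 2 * x2 * y2
    return num / den

def geodesic_dist(x, y, c=1.0):
    """d_c(x, y) = (2 / sqrt(c)) * artanh(sqrt(c) * ||(-x) (+)_c y||)."""
    sc = np.sqrt(c)
    return (2.0 / sc) * np.arctanh(sc * np.linalg.norm(mobius_add(-x, y, c)))

def gcsa_fuse(h_code, h_img, c=1.0, tau=1.0):
    """Placeholder fusion (illustrative assumption): softmax weight the
    two modalities by negative geodesic distance, then convex-mix them."""
    d = geodesic_dist(h_code, h_img, c)
    w = np.exp(np.array([0.0, -d]) / tau)
    w /= w.sum()
    return w[0] * h_code + w[1] * h_img   # ball is convex, so result stays inside

h_code = np.array([0.10, 0.20, -0.10])   # toy stylometry embedding (made up)
h_img = np.array([0.05, -0.10, 0.30])    # toy BPEA image embedding (made up)
fused = gcsa_fuse(h_code, h_img)
```

As a sanity check, d_c is symmetric, vanishes on identical inputs, and reduces to 2‖x − y‖ as the curvature parameter c → 0, which is a quick way to verify any reimplementation.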

Load-bearing premise

Code stylometry forms a higher extrinsic level than binary pre-executable artifact images in a hierarchy that hyperbolic geometry can exploit to improve attribution.
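One way to make this premise checkable: in Poincaré embeddings, hierarchy level is commonly reflected in geodesic distance from the origin (higher-level, more general items near the center; lower-level items near the boundary). A minimal sketch under that assumption, with made-up embedding values rather than anything from the paper:

```python
import numpy as np

def hyperbolic_norm(x, c=1.0):
    """Geodesic distance from the origin of the Poincare ball:
    d_c(0, x) = (2 / sqrt(c)) * artanh(sqrt(c) * ||x||)."""
    return (2.0 / np.sqrt(c)) * np.arctanh(np.sqrt(c) * np.linalg.norm(x))

# Under the paper's premise, stylometry embeddings (higher extrinsic level)
# should sit nearer the origin than BPEA image embeddings (lower level).
# Toy values for illustration only:
h_style = np.array([0.10, 0.05])   # hypothetical stylometry embedding
h_bpea = np.array([0.70, 0.55])    # hypothetical BPEA image embedding
assert hyperbolic_norm(h_style) < hyperbolic_norm(h_bpea)
```

Measuring whether learned embeddings actually exhibit this norm ordering would be one concrete way to test the premise rather than assert it.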

What would settle it

Showing, on the same benchmarks and under the same evaluation protocol, that the hyperbolic projection and GCSA fusion fail to improve performance (or actively degrade it) relative to Euclidean alternatives would disprove the claimed central benefit.

Figures

Figures reproduced from arXiv: 2604.16377 by Arun Balaji Buduru, Bhavinkumar Vinodbhai Kuwar, Bikrant Bikram Pratap Maurya, Nitin Choudhury.

Figure 1. Overview of the GoCoMA architecture. (a) Projection from Euclidean to hyperbolic space: let h^(E)_code, h^(E)_img ∈ ℝ^n denote the Euclidean embeddings extracted from the code PLM and the vision PTM, respectively. Each embedding is projected onto the hyperbolic Poincaré ball D^n_c = {x ∈ ℝ^n : c‖x‖² < 1} through the Riemannian exponential map at the origin:

    h^(H) = exp_0^c(h^(E)) = tanh(√c ‖h^(E)‖) · h^(E) / (√c ‖h^(E)‖).   (1)

… view at source ↗
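The exponential map in Eq. (1) is straightforward to render in plain NumPy; this is an illustrative sketch, not the authors' code, and the `eps` guard for the zero vector is an added assumption:

```python
import numpy as np

def expmap0(x, c=1.0, eps=1e-8):
    """Riemannian exponential map at the origin of the Poincare ball D^n_c,
    as written in Eq. (1):
        exp_0^c(x) = tanh(sqrt(c) * ||x||) * x / (sqrt(c) * ||x||).
    The eps guard for x = 0 is an added assumption, not from the paper."""
    sqrt_c = np.sqrt(c)
    norm = max(np.linalg.norm(x), eps)
    return np.tanh(sqrt_c * norm) * x / (sqrt_c * norm)

h_code = np.array([0.6, -1.2, 3.0])   # toy stand-in for a code-PLM embedding
h_hyp = expmap0(h_code, c=1.0)
# The image must satisfy the ball constraint c * ||h||^2 < 1:
ball_ok = 1.0 * np.dot(h_hyp, h_hyp) < 1.0
```

Because tanh saturates, arbitrarily large Euclidean embeddings land strictly inside the ball, while small embeddings are left nearly unchanged (tanh(t) ≈ t near zero).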
read the original abstract

Large Language Models (LLMs) trained on massive code corpora are now increasingly capable of generating code that is hard to distinguish from human-written code. This raises practical concerns, including security vulnerabilities and licensing ambiguity, and also motivates a forensic question: 'Who (or which LLM) wrote this piece of code?' We present GoCoMA, a multimodal framework that models an extrinsic hierarchy between (i) code stylometry, capturing higher-level structural and stylistic signatures, and (ii) image representations of binary pre-executable artifacts (BPEA), capturing lower-level, execution-oriented byte semantics shaped by compilation and toolchains. GoCoMA projects modality embeddings into a hyperbolic Poincaré ball, fuses them via a geodesic-cosine similarity-based cross-modal attention (GCSA) fusion mechanism, and back-projects the fused representation to Euclidean space for final LLM-source attribution. Experiments on two open-source benchmarks (CoDET-M4 and LLMAuthorBench) show that GoCoMA consistently outperforms unimodal and Euclidean multimodal baselines under identical evaluation protocols.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces GoCoMA, a multimodal framework for attributing code to its LLM source (or human). It explicitly models an extrinsic hierarchy with code stylometry as higher-level and binary pre-executable artifact (BPEA) images as lower-level, projects both modality embeddings into a hyperbolic Poincaré ball, fuses them via geodesic-cosine similarity-based cross-modal attention (GCSA), and back-projects the result to Euclidean space for final classification. Experiments on the CoDET-M4 and LLMAuthorBench benchmarks are reported to show consistent gains over unimodal and Euclidean multimodal baselines under identical protocols.

Significance. If the reported gains are reproducible, the work would demonstrate a concrete use of hyperbolic geometry to capture hierarchical structure in code stylometry and execution artifacts, offering a new direction for forensic attribution of LLM-generated code. The GCSA fusion mechanism is a specific technical contribution that could be tested in related multimodal settings.

major comments (2)
  1. [§4] §4 (Experiments): the abstract states that GoCoMA 'consistently outperforms' baselines on CoDET-M4 and LLMAuthorBench, yet no data splits, exact metrics, statistical significance tests, or ablation results are referenced. Without these, the central empirical claim cannot be verified and remains load-bearing for the paper's contribution.
  2. [§3.2] §3.2 (Hyperbolic Projection and Hierarchy): the extrinsic hierarchy between stylometry (higher) and BPEA images (lower) is asserted without supporting evidence, prior literature, or sensitivity analysis. Because the choice of Poincaré ball and geodesic-cosine fusion rests directly on this hierarchy, its justification is required for the modeling approach to be defensible.
minor comments (2)
  1. [§3.3] Notation for the fused representation after back-projection should be defined once and used consistently; currently the transition from hyperbolic to Euclidean space is described only at a high level.
  2. [§2] Add a short related-work paragraph contrasting GCSA with existing hyperbolic attention mechanisms (e.g., those based on Möbius operations) to clarify novelty.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and will incorporate revisions to improve clarity and verifiability of the results and modeling choices.

read point-by-point responses
  1. Referee: [§4] §4 (Experiments): the abstract states that GoCoMA 'consistently outperforms' baselines on CoDET-M4 and LLMAuthorBench, yet no data splits, exact metrics, statistical significance tests, or ablation results are referenced. Without these, the central empirical claim cannot be verified and remains load-bearing for the paper's contribution.

    Authors: We agree that the current presentation of experimental details is insufficient for full verification. In the revised manuscript we will expand §4 with explicit descriptions of the train/validation/test splits for both CoDET-M4 and LLMAuthorBench, report exact metric values (accuracy, macro-F1, etc.) together with standard deviations over multiple runs, include results of statistical significance tests (paired t-tests with p-values) against all baselines, and provide comprehensive ablation tables isolating the contributions of hyperbolic projection, GCSA, and the hierarchy. These additions will directly support the reproducibility of the reported gains. revision: yes

  2. Referee: [§3.2] §3.2 (Hyperbolic Projection and Hierarchy): the extrinsic hierarchy between stylometry (higher) and BPEA images (lower) is asserted without supporting evidence, prior literature, or sensitivity analysis. Because the choice of Poincaré ball and geodesic-cosine fusion rests directly on this hierarchy, its justification is required for the modeling approach to be defensible.

    Authors: We acknowledge that the hierarchy motivation requires stronger grounding. In the revision we will add citations to prior literature on hierarchical multimodal representations in code analysis and vision-language models that similarly assign higher-level semantic features to one modality and lower-level execution or pixel features to another. We will also insert a sensitivity analysis (new subsection or appendix) that swaps the modality levels and reports the resulting performance drop, thereby demonstrating that the chosen hierarchy is empirically beneficial rather than arbitrary. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper describes a multimodal fusion approach using hyperbolic projections and geodesic-cosine attention for code attribution, with claims of outperformance on the CoDET-M4 and LLMAuthorBench benchmarks under identical protocols. No equations, derivations, or modeling steps in the available text reduce any prediction or result to a fitted parameter, self-definition, or self-citation chain by construction. The hierarchy between stylometry and BPEA images is presented as an explicit modeling assumption rather than derived from prior results, and the central performance claims rest on empirical comparisons against external benchmarks rather than on any internally circular derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the method description is high-level and does not enumerate modeling assumptions beyond the stated hierarchy and projection steps.

pith-pipeline@v0.9.0 · 5503 in / 1166 out tokens · 34101 ms · 2026-05-15T00:36:14.316055+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    projects modality embeddings into a hyperbolic Poincaré ball, fuses them via a geodesic-cosine similarity-based cross-modal attention (GCSA) fusion mechanism... curvature-consistent Möbius linear maps... geodesic distance d_c(x, y) = (2/√c) tanh⁻¹(√c ‖(−x) ⊕_c y‖)

  • IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · costAlphaLog_high_calibrated_iff · echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    models an extrinsic hierarchy between (i) code stylometry, capturing higher-level structural and stylistic signatures, and (ii) image representations of binary pre-executable artifacts (BPEA), capturing lower-level, execution-oriented byte semantics

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
