pith. sign in

arxiv: 2606.31971 · v1 · pith:C5OQEJ74new · submitted 2026-06-30 · 💻 cs.SE

CoCoMUT: A Tool for Code-Context Mining and Automated Dataset Generation

Pith reviewed 2026-07-01 03:51 UTC · model grok-4.3

classification 💻 cs.SE
keywords code context miningautomated dataset generationJava static analysiscall graph reconciliationsoftware engineering toolsmethod level context
0
0 comments X

The pith

CoCoMUT unifies build discovery, call-graph construction and source reconciliation to produce method context records from Java projects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a tool that automatically collects detailed context for each method in a Java project, such as its class, documentation, callers and callees. This context is packaged into versioned JSON records that can be used individually or as full datasets. The authors show that the pipeline works on twenty real repositories, reconciling nearly all call edges and passing manual checks on sampled records. A sympathetic reader would care because software engineering tools often need this kind of surrounding information but collecting it by hand is slow and error-prone.

Core claim

CoCoMUT extracts context for a focal method or generates datasets by discovering project structure, resolving build information, constructing a static call graph with SootUp, reconciling bytecode edges with Spoon source extraction, and emitting versioned JSON. On twenty repositories it produced 56,512 method-context records and 386,048 call edges, with 97.8 percent reconciliation for source targets and 99 percent passing audit.

What carries the argument

The reusable pipeline that combines build discovery, SootUp call-graph construction, Spoon source extraction, and custom source-bytecode reconciliation.

If this is right

  • Researchers can generate large context datasets without manual effort.
  • Individual method records supply ready context for assistants doing explanation, repair or search.
  • The same pipeline can be applied at class, package or system scope.
  • Versioned outputs support reproducible experiments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar pipelines could be developed for other languages if equivalent static analysis tools exist.
  • The high reconciliation rate suggests the approach generalizes beyond the tested Maven and Gradle projects.
  • Generated datasets might serve as benchmarks for program comprehension tasks.

Load-bearing premise

The assumption that SootUp call graphs combined with Spoon extraction and reconciliation logic will correctly capture context across arbitrary real-world Java projects without significant unhandled edge cases.

What would settle it

Running the tool on a new collection of Java projects and finding that reconciliation falls below 90 percent or that more than 5 percent of sampled records fail the audit checks.

Figures

Figures reproduced from arXiv: 2606.31971 by Ahsanul Ameen Sabit, Alessandro Botta, Jaya Vardhini Akurathi, Shiven Garisa, Soneya Binta Hossain, Trey Woodlief.

Figure 1
Figure 1. Figure 1: summarizes the CoCoMUT workflow. CoCoMUT com￾bines two views of the same project: source declarations modeled by Spoon and bytecode methods modeled by SootUp. It reconciles these views through stable method identities and emits one method￾context record per selected method. The workflow has four phases; phase 2 builds the source and bytecode views. 1. Input and extraction request. The user provides a Java … view at source ↗
read the original abstract

Software-engineering assistants often need method-level context beyond an isolated body, including enclosing-class information, documentation, callers, callees, type hierarchy, and structural characteristics. Manually collecting this context is time-consuming, inconsistent, and difficult to reproduce across large Java projects. We present CoCoMUT, a Java tool for Code-Context Mining and Automated Dataset Generation. CoCoMUT extracts context for a focal method or generates datasets at class, package, or system scope. It discovers project structure, resolves build and classpath information, constructs a SootUp static call graph, and reconciles bytecode-level call edges with Spoon-based source extraction. Each method record combines source, class, documentation, call-graph, and metadata context, providing reproducible inputs for training and running learned software-engineering techniques. The key contribution is a reusable, task-independent pipeline that unifies build discovery, source extraction, call-graph construction, source-bytecode reconciliation, and versioned JSON dataset generation. The resulting records can be consumed individually as context for a focal method or collectively as datasets for documentation, explanation, testing, review, repair, search, and program-comprehension workflows. We evaluate CoCoMUT on 20 real-world Java repositories evenly split between Maven and Gradle. CoCoMUT processed all 20 repositories, emitting 56,512 method-context records and 386,048 serialized call edges. Among call edges whose bytecode targets belonged to project source, CoCoMUT reconciled 97.8% to source method identities. In a manual audit of 200 randomly sampled methods across 10 systems, 99.0% of generated context records passed all applicable correctness checks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper presents CoCoMUT, a Java tool that automates extraction of method-level context (enclosing class, documentation, callers/callees via call graph, type hierarchy, structural metadata) from real projects. The pipeline integrates build discovery and classpath resolution, SootUp static call-graph construction, Spoon source extraction, custom source-bytecode reconciliation, and emission of versioned JSON records. It was evaluated on 20 real-world repositories (evenly split Maven/Gradle), successfully processing all of them to produce 56,512 method-context records and 386,048 call edges, with 97.8% reconciliation of project-source bytecode targets and 99% pass rate on a 200-method manual audit across 10 systems.

Significance. If the empirical results hold, the tool supplies a reusable, task-independent pipeline for generating reproducible method-context datasets that can directly support training and evaluation of learned SE techniques for documentation, repair, testing, search, and comprehension. The reported success on 20 diverse, real repositories (with concrete output volumes and two independent correctness metrics) provides direct evidence of practicality for dataset-generation workflows.

major comments (1)
  1. [Evaluation] Evaluation (the reported results on 20 repositories): the manuscript supplies only aggregate figures (97.8% reconciliation among project-source bytecode targets; 99% audit pass rate) and states that all 20 repositories were processed, but provides no per-repository breakdown, no enumeration of any build-discovery or classpath failures, and no characterization of the 2.2% unreconciled edges. This information is load-bearing for the central claim that the SootUp + Spoon + reconciliation combination correctly captures context across arbitrary real-world Java projects.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation of minor revision. We address the single major comment below.

read point-by-point responses
  1. Referee: [Evaluation] Evaluation (the reported results on 20 repositories): the manuscript supplies only aggregate figures (97.8% reconciliation among project-source bytecode targets; 99% audit pass rate) and states that all 20 repositories were processed, but provides no per-repository breakdown, no enumeration of any build-discovery or classpath failures, and no characterization of the 2.2% unreconciled edges. This information is load-bearing for the central claim that the SootUp + Spoon + reconciliation combination correctly captures context across arbitrary real-world Java projects.

    Authors: We agree that a per-repository breakdown would provide stronger evidence for the claim of broad applicability. In the revised manuscript we will add a table that reports, for each of the 20 repositories, the build system (Maven or Gradle), number of method-context records emitted, number of call edges, reconciliation rate among project-source bytecode targets, and any build-discovery or classpath-resolution failures encountered. We will also add a short section characterizing the 2.2% unreconciled edges, grouping them by observable cause (e.g., ambiguous overload resolution, missing source for synthetic methods, or classpath mismatches). The 99% audit figure already reflects a random sample drawn across 10 of the 20 systems; we will note that no audited record failed its applicable checks. These additions directly address the load-bearing concern while preserving the paper's focus on the overall pipeline. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical tool description with direct measurements

full rationale

The paper describes an implemented pipeline (build discovery, SootUp call graphs, Spoon extraction, reconciliation) and reports concrete empirical outcomes: processing 20 repositories, emitting 56,512 records, 97.8% reconciliation rate, and 99% audit pass rate. No equations, fitted models, predictions, or derivations exist that could reduce to author-defined inputs. No self-citation load-bearing steps or uniqueness theorems are invoked. The evaluation is self-contained against external benchmarks (the 20 repositories and manual audit), satisfying the criteria for score 0.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the reliability of two external static-analysis frameworks and the assumption that Java build systems follow discoverable conventions; no free parameters or new entities are introduced.

axioms (2)
  • domain assumption Static analysis frameworks SootUp and Spoon can be used to construct call graphs and extract source information from compiled and source Java artifacts.
    The pipeline depends on these frameworks for call-graph construction and source extraction.
  • domain assumption Maven and Gradle projects expose sufficient metadata for automatic discovery of structure and classpath without manual configuration.
    The evaluation processed 20 repositories evenly split between the two build systems.

pith-pipeline@v0.9.1-grok · 5864 in / 1576 out tokens · 62296 ms · 2026-07-01T03:51:21.669334+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

22 extracted references · 13 canonical work pages

  1. [1]

    ASSERT Lab. 2026. CoCoMUT: Context Constructor. https://github.com/assert- lab/CoCoMUT. Research artifact

  2. [2]

    ASSERT Lab. 2026. CoCoMUT Demonstration Video. https://youtu.be/ RCUzkCQjG30. Video demonstration. CoCoMUT: A Tool for Code-Context Mining and Automated Dataset Generation Conference’17, July 2017, Washington, DC, USA

  3. [3]

    2026.CoCoMUT: A Tool for Code- Context Mining and Automated Dataset Generation

    Alessandro Botta, Shiven Garisa, Jaya Vardhini Akurathi, Ahsanul Ameen Sabit, Trey Woodlief, and Soneya Binta Hossain. 2026.CoCoMUT: A Tool for Code- Context Mining and Automated Dataset Generation. doi:10.5281/zenodo.20962959

  4. [4]

    Yinghao Chen, Zehao Hu, Chen Zhi, Junxiao Han, Shuiguang Deng, and Jian- wei Yin. 2024. ChatUniTest: A Framework for LLM-Based Test Generation. In Companion Proceedings of the 32nd ACM International Conference on the Founda- tions of Software Engineering. Association for Computing Machinery, 572–576. doi:10.1145/3663529.3663801

  5. [5]

    2026.Jackson

    FasterXML. 2026.Jackson. https://github.com/fasterxml/jackson GitHub reposi- tory

  6. [6]

    Sakib Haque, Alexander LeClair, Lingfei Wu, and Collin McMillan. 2020. Improved Automatic Summarization of Subroutines via Attention to File Context. InProceed- ings of the 17th International Conference on Mining Software Repositories (MSR ’20). Association for Computing Machinery, 300–310. doi:10.1145/3379597.3387449

  7. [7]

    Soneya Binta Hossain and Matthew B. Dwyer. 2025. TOGLL: Correct and Strong Test Oracle Generation with LLMs. In47th IEEE/ACM International Conference on Software Engineering (ICSE). IEEE, 1475–1487. doi:10.1109/ICSE55347.2025.00098

  8. [8]

    Soneya Binta Hossain, Nan Jiang, Qiang Zhou, Xiaopeng Li, Wen-Hao Chiang, Yingjun Lyu, Hoan Nguyen, and Omer Tripp. 2024. A deep dive into large language models for automated bug localization and repair.Proceedings of the ACM on Software Engineering1, FSE (2024), 1471–1493

  9. [9]

    Soneya Binta Hossain, Raygan Taylor, and Matthew Dwyer. 2025. Doc2oracll: Investigating the impact of documentation on llm-based test oracle generation. Proceedings of the ACM on Software Engineering2, FSE (2025), 1870–1891

  10. [10]

    JSON Lines Community. [n. d.]. JSON Lines. https://jsonlines.org/. Accessed: 2026-06-26

  11. [11]

    Kadiray Karakaya, Stefan Schott, Jonas Klauke, Eric Bodden, Markus Schmidt, Linghui Luo, and Dongjie He. 2024. SootUp: A Redesign of the Soot Static Analysis Framework. InTools and Algorithms for the Construction and Analysis of Systems: 30th International Conference, TACAS 2024, Held as Part of the European Joint Conferences on Theory and Practice of Sof...

  12. [12]

    Yi Li, Shaohua Wang, and Tien N. Nguyen. 2020. DLFix: Context-Based Code Transformation Learning for Automated Program Repair. InProceedings of the ACM/IEEE 42nd International Conference on Software Engineering (ICSE ’20). As- sociation for Computing Machinery, 602–614. doi:10.1145/3377811.3380345

  13. [13]

    Grundy, and Xiaohu Yang

    Chao Liu, Cuiyun Gao, Xin Xia, David Lo, John C. Grundy, and Xiaohu Yang

  14. [14]

    doi:10.1145/3477535

    On the Reproducibility and Replicability of Deep Learning in Software Engineering.ACM Transactions on Software Engineering and Methodology31, 1, Article 15 (2022), 46 pages. doi:10.1145/3477535

  15. [15]

    Oracle. 2021. Documentation Comment Specification for the Standard Doclet. https://docs.oracle.com/en/java/javase/17/docs/specs/javadoc/doc- comment-spec.html. Accessed June 22, 2026

  16. [16]

    Renaud Pawlak, Martin Monperrus, Nicolas Petitprez, Carlos Noguera, and Lionel Seinturier. 2016. SPOON: A library for implementing analyses and transforma- tions of Java source code.Softw. Pract. Exper.46, 9 (Sept. 2016), 1155–1179. doi:10.1002/spe.2346

  17. [17]

    Max Schäfer, Sarah Nadi, Aryaz Eghbali, and Frank Tip. 2024. An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation. IEEE Transactions on Software Engineering50, 1 (2024), 85–105. doi:10.1109/TSE. 2023.3334955

  18. [18]

    Hendren, Chrislain Razafimahefa, Raja Vallée-Rai, Patrick Lam, Etienne Gagnon, and Charles Godin

    Vijay Sundaresan, Laurie J. Hendren, Chrislain Razafimahefa, Raja Vallée-Rai, Patrick Lam, Etienne Gagnon, and Charles Godin. 2000. Practical Virtual Method Call Resolution for Java. InProceedings of the 15th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA ’00). Association for Computing Machinery, 264–280

  19. [19]

    Michele Tufano, Shao Kun Deng, Neel Sundaresan, and Alexey Svyatkovskiy. 2022. Methods2Test: A Dataset of Focal Methods Mapped to Test Cases. InProceedings of the 19th International Conference on Mining Software Repositories (MSR ’22). Association for Computing Machinery, 299–303. doi:10.1145/3524842.3528009

  20. [20]

    Zejun Wang, Kaibo Liu, Ge Li, and Zhi Jin. 2024. HITS: High-Coverage LLM-Based Unit Test Generation via Method Slicing. InProceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering. Association for Computing Machinery, 1258–1268. doi:10.1145/3691620.3695501

  21. [21]

    Fengji Zhang, Bei Chen, Yue Zhang, Jacky Keung, Jin Liu, Daoguang Zan, Yi Mao, Jian-Guang Lou, and Weizhu Chen. 2023. RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Singapore, 2471–24...

  22. [22]

    Yuntong Zhang, Haifeng Ruan, Zhiyu Fan, and Abhik Roychoudhury. 2024. Au- toCodeRover: Autonomous Program Improvement. InProceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis. Association for Computing Machinery, 1592–1604. doi:10.1145/3650212.3680384