arxiv: 2605.03353 · v2 · submitted 2026-05-05 · 💻 cs.CR · cs.AI

Recognition: no theorem link

SkCC: Portable and Secure Skill Compilation for Cross-Framework LLM Agents

Yipeng Ouyang , Yi Xiao , Yuhao Gu , Xianwei Zhang

Authors on Pith no claims yet

Pith reviewed 2026-05-12 01:45 UTC · model grok-4.3

classification 💻 cs.CR cs.AI

keywords LLM agentsskill compilationintermediate representationportabilitysecuritycross-frameworkcompilation pipeline

0 comments

The pith

SkCC compiles LLM agent skills into a portable intermediate representation that works across frameworks with built-in security.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

LLM agent skills written in Markdown vary widely in performance across different frameworks because of prompt formatting differences. The paper presents SkCC as a compiler that translates these skills into SkIR, a strongly-typed intermediate representation, to decouple the skill logic from any specific framework. This allows one skill to be adapted to many frameworks efficiently. A static optimizer adds security enforcement at compile time. Experiments show improved success rates, low compilation time, and token savings.

Core claim

The central discovery is that introducing classical compilation techniques, centered on a strongly-typed IR called SkIR, allows skill semantics to be separated from framework formatting. This enables a four-phase pipeline that reduces adaptation complexity from O(m × n) to O(m + n), while the Optimizer blocks vulnerabilities before deployment.

What carries the argument

SkIR, the strongly-typed intermediate representation that captures the full semantics of skills independently of framework-specific prompt formats.

If this is right

Adaptation effort scales linearly with the number of skills and frameworks rather than quadratically.
Security vulnerabilities are detected and blocked proactively with a 94.8% trigger rate.
Consistent pass rate improvements across tested frameworks like Claude Code and Kimi CLI.
Sub-10ms compilation latency and runtime token reductions of 10-46%.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Skill authors could create once and reach all frameworks without additional work.
It opens the possibility for a unified skill ecosystem or marketplace.
Similar compilation approaches might address portability in other prompt engineering domains.

Load-bearing premise

One strongly-typed IR is sufficient to represent the semantics of any Markdown-written skill without losing necessary details or expressiveness.

What would settle it

A test where SkCC-compiled skills perform no better than originals or still need per-framework tweaks on new frameworks would disprove the portability benefit.

Figures

Figures reproduced from arXiv: 2605.03353 by Xianwei Zhang, Yipeng Ouyang, Yi Xiao, Yuhao Gu.

**Figure 1.** Figure 1: Left: Agent workflow with SKCC integration. Skills are authored once as SKILL.md, compiled to framework-native formats, and loaded via progressive routing manifests at agent initialization. Right: Adaptation complexity reduction from O(m × n) to O(m + n). Traditional per-framework rewriting requires m × n manual adaptations. SKCC decouples skills and frameworks through a unified IR, requiring only m skill… view at source ↗

**Figure 2.** Figure 2: SKCC’s four-phase compilation pipeline. A unified SKILL.md source is parsed into a raw AST (Syntax Parser), transformed into a strongly-typed SKIR (IR Builder), validated and optimized by compile-time security analysis (Security Optimizer), and emitted into framework-native formats (Target Emitters). A representative capability of the IR level is nested data detection: when a skill declares schemas with ne… view at source ↗

**Figure 3.** Figure 3: Pass rate and mean reward comparison of Baseline vs. [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: Average relative pass rate improvement across methods. 4.2.2 Security: Injection Trigger and Compilation Interception [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Cross-framework token and time efficiency heatmap. SKCC skills show consistent reductions in total tokens and execution time across all frameworks. Claude token counts are reported in hundreds due to API measurement differences. Static Expansion vs. Dynamic Efficiency. Compilation introduces static structural overhead from XML tags, Anti-Skill constraints, and format hardening (ranging from +4% on Kimi t… view at source ↗

**Figure 6.** Figure 6: Ablation study radar chart: the same Kimi-compiled format produces divergent effects [PITH_FULL_IMAGE:figures/full_fig_p017_6.png] view at source ↗

read the original abstract

LLM agents increasingly rely on reusable skills (e.g., `SKILL.md`) to execute complex tasks, yet these artifacts lack portability: agent frameworks are highly sensitive to prompt formatting, leading to a large performance variation for the same skill. Nevertheless, most skills are authored once as format-agnostic Markdown, necessitating costly per-framework rewrites and also leaving security largely unaddressed, with widespread vulnerabilities in practice. To address this, we present SkCC, a compiler for LLM agents that introduces classical compilation design into agent skill development. SkCC centers on SkIR, a strongly-typed intermediate representation that decouples skill semantics from framework-specific formatting, thus enabling portable deployment across agent frameworks. Atop of this IR, a static Optimizer enforces security constraints, blocking vulnerabilities before deployment. Implemented as a four-phase pipeline, SkCC effectively reduces adaptation complexity from $O(m \times n)$ to $O(m + n)$ across $m$ skills and $n$ frameworks. Experiments on SkillsBench demonstrate that SkCC delivers consistent and substantial gains over original counterparts, with pass rate increases from 21.1% to 33.3% on Claude Code and from 35.1% to 48.7% on Kimi CLI. Further, the design achieves sub-10ms compilation latency, 94.8% proactive security trigger rate, and 10-46% runtime token savings across frameworks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SkCC applies a typed IR and static optimizer to make Markdown skills portable across LLM frameworks with reported security and efficiency gains, but the IR's coverage of natural-language nuances is the key unproven piece.

read the letter

SkCC is a compiler that takes format-agnostic Markdown skills, converts them to a strongly-typed intermediate representation called SkIR, then applies a static security optimizer before emitting framework-specific versions. The four-phase pipeline is the concrete new artifact here, and the O(m + n) complexity claim follows directly if the IR succeeds in decoupling semantics from formatting details.

Referee Report

2 major / 2 minor

Summary. The paper presents SkCC, a compiler for LLM agent skills that introduces a strongly-typed intermediate representation (SkIR) to decouple skill semantics from framework-specific formatting. It describes a four-phase pipeline that reduces adaptation complexity from O(m × n) to O(m + n) across m skills and n frameworks, includes a static optimizer to enforce security constraints before deployment, and reports experimental results on SkillsBench showing pass-rate gains (21.1% to 33.3% on Claude Code; 35.1% to 48.7% on Kimi CLI), sub-10 ms compilation latency, 94.8% proactive security trigger rate, and 10-46% runtime token savings.

Significance. If the central claims hold, the work would be significant for standardizing skill development in LLM agents, offering a practical path to portability and proactive security via classical compilation techniques. The four-phase pipeline design and cross-framework empirical evaluation on SkillsBench provide concrete evidence of reduced engineering effort and measurable efficiency gains, which could influence how reusable skills are authored and deployed in agent systems.

major comments (2)

[Abstract and SkIR/pipeline description] The O(m + n) complexity reduction and portability guarantee rest on the assumption that SkIR can represent the complete semantics and intent of arbitrary natural-language Markdown skills without loss or the need for framework-specific extensions. No formal grammar, type definitions, or coverage analysis for edge cases (e.g., implicit context, conditional logic, or ambiguous instructions) is supplied in the SkIR definition or pipeline description, leaving open the possibility that expressiveness gaps would reintroduce per-framework adaptations.
[Experimental evaluation] The reported performance and security metrics (pass-rate increases, 94.8% trigger rate, token savings) are load-bearing for the practicality claims, yet the experimental section provides no details on SkillsBench composition, data splits, baseline implementations, number of skills/frameworks tested, or statistical tests. This prevents verification that the gains are consistent and attributable to the compiler rather than experimental artifacts.

minor comments (2)

[Abstract] The abstract introduces the four-phase pipeline but does not name the phases; a one-sentence enumeration would improve immediate clarity without lengthening the abstract.
[Introduction] Acronyms SkCC and SkIR should be expanded on first use in the main body even if defined in the abstract.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below, indicating where revisions will strengthen the manuscript while defending the core contributions on the basis of the presented design and results.

read point-by-point responses

Referee: The O(m + n) complexity reduction and portability guarantee rest on the assumption that SkIR can represent the complete semantics and intent of arbitrary natural-language Markdown skills without loss or the need for framework-specific extensions. No formal grammar, type definitions, or coverage analysis for edge cases (e.g., implicit context, conditional logic, or ambiguous instructions) is supplied in the SkIR definition or pipeline description, leaving open the possibility that expressiveness gaps would reintroduce per-framework adaptations.

Authors: We acknowledge that the current manuscript presents SkIR at a descriptive level with illustrative examples rather than a complete formal grammar or exhaustive coverage analysis. The strongly-typed nature of SkIR is intended to capture core semantics (control flow, data dependencies, and security-relevant operations) through its type system, and the four-phase pipeline is designed to preserve these semantics during lowering. The empirical results on SkillsBench demonstrate that the evaluated skills compiled and executed portably without requiring framework-specific extensions, supporting the O(m + n) claim in practice. To address the concern directly, the revised manuscript will include an explicit formal grammar for SkIR, complete type definitions, and a dedicated subsection analyzing coverage of edge cases such as conditional logic and ambiguous instructions, drawing on the skills present in the benchmark. revision: yes
Referee: The reported performance and security metrics (pass-rate increases, 94.8% trigger rate, token savings) are load-bearing for the practicality claims, yet the experimental section provides no details on SkillsBench composition, data splits, baseline implementations, number of skills/frameworks tested, or statistical tests. This prevents verification that the gains are consistent and attributable to the compiler rather than experimental artifacts.

Authors: We agree that the experimental section must be expanded for reproducibility and to allow readers to verify that observed gains are attributable to SkCC. The manuscript currently reports aggregate pass-rate, latency, security-trigger, and token-saving figures but does not detail the benchmark composition, evaluation protocol, baseline implementations, exact numbers of skills and frameworks, or statistical tests. In the revision we will add: (i) a full description of SkillsBench including skill count, framework coverage, and task categories; (ii) the evaluation protocol and any data splits used; (iii) precise baseline implementations; (iv) the number of trials per configuration; and (v) statistical significance tests (e.g., paired t-tests with p-values) confirming that the reported improvements are consistent and not artifacts. revision: yes

Circularity Check

0 steps flagged

No circularity: design claim and empirical results are independent

full rationale

The abstract and provided text introduce SkCC as a four-phase compiler pipeline centered on a new strongly-typed SkIR that decouples Markdown skill semantics from framework formatting. The O(m+n) complexity reduction is a direct consequence of successful decoupling (standard compiler benefit) rather than a fitted or self-referential quantity. Reported gains (pass-rate improvements, sub-10ms latency, 94.8% security trigger rate, token savings) are presented as experimental measurements on SkillsBench, not as predictions derived from parameters fitted to the same data or from self-citations. No equations, ansatzes, uniqueness theorems, or load-bearing self-citations appear in the supplied material that would collapse any central claim back to its inputs by construction. The design's correctness therefore rests on external validation rather than definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

Abstract introduces SkIR and the optimizer as new constructs without enumerating free parameters or background axioms; the design implicitly assumes that skill semantics are fully expressible in a typed IR and that static analysis suffices for security.

invented entities (2)

SkIR no independent evidence
purpose: strongly-typed intermediate representation that decouples skill semantics from framework-specific formatting
Newly defined in the paper to enable portability.
SkCC Optimizer no independent evidence
purpose: static enforcement of security constraints before deployment
New component introduced for proactive vulnerability blocking.

pith-pipeline@v0.9.0 · 5560 in / 1447 out tokens · 60279 ms · 2026-05-12T01:45:17.913725+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages · 2 internal anchors

[1]

Jennings

Michael Wooldridge and Nicholas R. Jennings. Intelligent Agents: Theory and Practice.The Knowledge Engineering Review, 10(2):115–152, 1995. doi: 10.1017/S0269888900008122

work page doi:10.1017/s0269888900008122 1995
[2]

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

Shunyu Yao, Dian Yu, et al. Tree of Thoughts: Deliberate Problem Solving with Large Language Models. InAdvances in Neural Information Processing Systems (NeurIPS), 2023. doi: 10.48550/arXiv.2305.10601

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2305.10601 2023
[3]

A survey on large language model based autonomous agents , volume =

Lei Wang, Chen Ma, et al. A Survey on Large Language Model Based Autonomous Agents. Frontiers of Computer Science, 18(6):186345, 2024. doi: 10.1007/s11704-024-40231-1

work page doi:10.1007/s11704-024-40231-1 2024
[4]

Claude Code Overview, 2026

Anthropic. Claude Code Overview, 2026. URL https://code.claude.com/docs/en/ov erview

work page 2026
[5]

Codex Documentation, 2026

OpenAI. Codex Documentation, 2026. URLhttps://developers.openai.com/codex

work page 2026
[6]

Gemini CLI Documentation, 2026

Google. Gemini CLI Documentation, 2026. URL https://google-gemini.github.io/ gemini-cli/docs/

work page 2026
[7]

Kimi CLI Documentation, 2026

Kimi. Kimi CLI Documentation, 2026. URL https://moonshotai.github.io/kimi-cli /en/guides/getting-started.html

work page 2026
[8]

SKILL.md Specification and Progressive Disclosure Mechanism, 2026

Agent Skills. SKILL.md Specification and Progressive Disclosure Mechanism, 2026. URL ht tps://deepwiki.com/agentskills/agentskills/2.2-skill.md-specification

work page 2026
[9]

Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward, 2026

Renjun Xu, Yang Yan, et al. Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward, 2026

work page 2026
[10]

Anthropic Skills: Public repository for Agent Skills, 2026

Anthropic. Anthropic Skills: Public repository for Agent Skills, 2026. URL https://github .com/anthropics/skills

work page 2026
[11]

Everything Claude Code: The agent harness performance optimization system, 2026

Affan-m. Everything Claude Code: The agent harness performance optimization system, 2026. URLhttps://github.com/affaan-m/everything-claude-code

work page 2026
[12]

Sentry Skills: Agent Skills used by the Sentry team for development, 2026

getSentry. Sentry Skills: Agent Skills used by the Sentry team for development, 2026. URL https://github.com/getsentry/skills

work page 2026
[13]

Does Prompt Formatting Have Any Impact on LLM Performance?, 2024

Jia He, Mukund Rungta, et al. Does Prompt Formatting Have Any Impact on LLM Performance?, 2024

work page 2024
[14]

Claude API Docs: Prompting Best Practices — Structure Prompts with XML Tags,

Anthropic. Claude API Docs: Prompting Best Practices — Structure Prompts with XML Tags,

work page
[15]

URL https://platform.claude.com/docs/en/build-with-claude/prompt-e ngineering/claude-prompting-best-practices

work page
[16]

Structured Outputs and Format Tax Elimination, 2025

OpenAI. Structured Outputs and Format Tax Elimination, 2025. URL https://platform.o penai.com/docs/guides/structured-outputs. 10

work page 2025
[17]

Which Nested Data Format Do LLMs Understand Best? JSON vs

Improving Agents. Which Nested Data Format Do LLMs Understand Best? JSON vs. Y AML vs. XML vs. Markdown, 2025. URL https://www.improvingagents.com/blog/best-n ested-data-format/

work page 2025
[18]

Snyk Finds Prompt Injection in 36%, 1467 Malicious Payloads in a ToxicSkills Study of Agent Skills Supply Chain Compromise, 2026

Luca Beurer-Kellner, Alexey Kudrinskii, et al. Snyk Finds Prompt Injection in 36%, 1467 Malicious Payloads in a ToxicSkills Study of Agent Skills Supply Chain Compromise, 2026. URL https://snyk.io/blog/toxicskills-malicious-ai-agent-skills-clawhub /

work page 2026
[19]

SKILL.md Explained: How to Structure Your Product for AI Agents — Add Guardrails and Common Pitfalls, 2026

Agent Skills. SKILL.md Explained: How to Structure Your Product for AI Agents — Add Guardrails and Common Pitfalls, 2026. URL https://www.gitbook.com/blog/skill-md

work page 2026
[20]

Model Context Protocol (MCP) Specification, 2025

Anthropic. Model Context Protocol (MCP) Specification, 2025. URL https://modelconte xtprotocol.io/docs/

work page 2025
[21]

Aho, Ravi Sethi, et al.Compilers: Principles, Techniques, and Tools

Alfred V . Aho, Ravi Sethi, et al.Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, MA, 1986. ISBN 0-201-10088-6

work page 1986
[22]

Muchnick.Advanced Compiler Design and Implementation

Steven S. Muchnick.Advanced Compiler Design and Implementation. Morgan Kaufmann, San Francisco, CA, 1997. ISBN 1-55860-320-4

work page 1997
[23]

The Problem of Programming Communication with Changing Machines: A Proposed Solution.Communications of the ACM, 1(8):12–18, 1958

John Strong, Joseph Wegstein, et al. The Problem of Programming Communication with Changing Machines: A Proposed Solution.Communications of the ACM, 1(8):12–18, 1958. doi: 10.1145/368892.368915

work page doi:10.1145/368892.368915 1958
[24]

LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation

Chris Lattner and Vikram Adve. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. InInternational Symposium on Code Generation and Optimization (CGO), 2004. doi: 10.1109/CGO.2004.1281665

work page doi:10.1109/cgo.2004.1281665 2004
[25]

MLIR: Scaling Compiler Infrastructure for Domain Specific Computation

Chris Lattner, Mehdi Amini, et al. MLIR: Scaling Compiler Infrastructure for Domain Specific Computation. InInternational Symposium on Code Generation and Optimization (CGO), 2021. doi: 10.1109/CGO51591.2021.9370308

work page doi:10.1109/cgo51591.2021.9370308 2021
[26]

SoK: Eternal war in memory

Laszlo Szekeres, Mathias Payer, et al. SoK: Eternal War in Memory. InIEEE Symposium on Security and Privacy (S&P), 2013. doi: 10.1109/SP.2013.13

work page doi:10.1109/sp.2013.13 2013
[27]

Anthropic’s Official Take on XML-Structured Prompting as the Core Strategy, 2026

Reddit r/ClaudeAI. Anthropic’s Official Take on XML-Structured Prompting as the Core Strategy, 2026. URLhttps://www.reddit.com/r/ClaudeAI/comments/1psxuv7/

work page 2026
[28]

Roy Philip. JSON vs. XML: A Data-Driven Analysis of LLM Parsing Efficiency, 2025. URL https://royphilip.xyz/blog/json-vs-xml-llm-showdown

work page 2025
[29]

Prompt Engineering Across the OpenAI, Anthropic, and Gemini APIs, 2026

Steve Kinney. Prompt Engineering Across the OpenAI, Anthropic, and Gemini APIs, 2026. URLhttps://stevekinney.com/writing/prompt-engineering-frontier-llms

work page 2026
[30]

Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization, 2025

Yuanye Liu, Jiahang Xu, et al. Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization, 2025

work page 2025
[31]

Towards Learning Boulder Excavation with Hydraulic Excavators

Renxi Wang, Xudong Han, et al. ToolGen: Unified Tool Retrieval and Calling via Generation. InInternational Conference on Learning Representations (ICLR), 2025. doi: 10.48550/arXiv.2 410.03439

work page doi:10.48550/arxiv.2 2025
[32]

Skill Retrieval Augmentation for Agentic AI, 2026

Weihang Su, Jianming Long, et al. Skill Retrieval Augmentation for Agentic AI, 2026

work page 2026
[33]

From Local to Global: A Graph RAG Approach to Query-Focused Summarization, 2024

Darren Edge, Ha Trinh, et al. From Local to Global: A Graph RAG Approach to Query-Focused Summarization, 2024

work page 2024
[34]

Sentence- BERT : Sentence Embeddings using S iamese BERT -Networks

Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. InConference on Empirical Methods in Natural Language Processing and International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019. doi: 10.18653/v1/D19-1410

work page doi:10.18653/v1/d19-1410 2019
[35]

How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings, 2026

Yujian Liu, Jiabao Ji, et al. How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings, 2026. 11

work page 2026
[36]

Agentic Code Optimization via Compiler-LLM Cooperation, 2026

Benjamin Mikek, Danylo Vashchilenko, et al. Agentic Code Optimization via Compiler-LLM Cooperation, 2026

work page 2026
[37]

Mahoney, Kurt Keutzer, and Amir Gholami

Sehoon Kim, Suhong Moon, et al. An LLM Compiler for Parallel Function Calling. In International Conference on Machine Learning (ICML), 2024. doi: 10.48550/arXiv.2312.04511

work page doi:10.48550/arxiv.2312.04511 2024
[38]

SkVM: Revisiting Language VM for Skills Across Heterogeneous LLMs and Harnesses, 2026

Le Chen, Erhu Feng, et al. SkVM: Revisiting Language VM for Skills Across Heterogeneous LLMs and Harnesses, 2026

work page 2026
[39]

A. B. V . Kumar. Deep Dive SKILL.md (Part 1/2): Negative Boundaries and Triggering Accuracy,

work page
[40]

URL https://abvijaykumar.medium.com/deep-dive-skill-md-part-1-2-0 9fc9a536996

work page
[41]

SecPI: Secure code generation with reasoning models via security reasoning internalization, 2026

Hao Wang, Niels Mündler, et al. SecPI: Secure code generation with reasoning models via security reasoning internalization, 2026

work page 2026
[42]

SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks, 2026

Xiangyi Li, Wenbo Chen, et al. SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks, 2026

work page 2026
[43]

UI/UX Pro Max Skill: An Agent Skill for UI/UX design tasks, 2026

NextLevelBuilder. UI/UX Pro Max Skill: An Agent Skill for UI/UX design tasks, 2026. URL https://github.com/nextlevelbuilder/ui-ux-pro-max-skill

work page 2026
[44]

Harbor: A Framework for Evaluating and Optimizing Agents and Models in Container Environments, 2026

Harbor Framework Team. Harbor: A Framework for Evaluating and Optimizing Agents and Models in Container Environments, 2026. URL https://github.com/harbor-framework /harbor

work page 2026
[45]

The OpenHands Software Agent SDK: A Composable and Extensible Foundation for Production Agents

Xingyao Wang, Simon Rosenberg, et al. The OpenHands Software Agent SDK: A Composable and Extensible Foundation for Production Agents. InConference on Machine Learning and Systems (MLSys), 2026. doi: 10.48550/arXiv.2511.03690. A Implementation Details SKCC is implemented in Rust and organized into four crates: nexa-skill-cli.CLI entry point using clap for ...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2511.03690 2026
[46]

46## P r o c e d u r e s

**[ CRITICAL ]** 30## Pa ra met er Schema ( YAML O pt im ize d ) 15 31‘‘‘ yaml 32type : object 33p r o p e r t i e s : 34m i g r a t i o n _ c o n f i g : 35type : object 36p r o p e r t i e s : 37s our ce _d b : 38type : object 39p r o p e r t i e s : 40host : { type : string } 41‘‘‘ 42 43Kimi : # data - mi gr at io n 44## D e s c r i p t i o n 45... 46#...

work page
[47]

w/ Reduction

**[ CRITICAL ]** 48## Pa ra met er Schema 49- ‘ m i g r a t i o n _ c o n f i g . so ur ce _d b . host ‘ ( string ) : ... D Complete Experimental Data D.1 Four-Model Comparison Summary Table 7: Four-Model Comparison Summary Model Paired∆Rwd.p dVerdict claude-opus-4-6 22–27+0.26–0.270.0096** 0.59–0.60SKCC≫Baseline kimi-k2.5 74+0.1420.0063** 0.33SKCC>Baseli...

work page arXiv 2026