Ishigaki-IDS: An Open-Weight Verifier-Aware Model for Information Delivery Specification Drafting in Building Information Modeling

Atomu Kondo; Chenguang Wang; Daiho Nishioka; Dayuan Jiang; Koki Arakawa; Koyo Hidaka; Naofumi Fujita; Ryo Kanazawa; Shuhei Saitoh; Takayuki Kato

arxiv: 2606.08545 · v1 · pith:HW64LCOGnew · submitted 2026-06-07 · 💻 cs.CL · cs.SE

Ishigaki-IDS: An Open-Weight Verifier-Aware Model for Information Delivery Specification Drafting in Building Information Modeling

Ryo Kanazawa , Koyo Hidaka , Teppei Miyamoto , Takayuki Kato , Tomoki Ando , Chenguang Wang , Dayuan Jiang , Naofumi Fujita

show 4 more authors

Shuhei Saitoh Atomu Kondo Koki Arakawa Daiho Nishioka

This is my paper

Pith reviewed 2026-06-27 18:23 UTC · model grok-4.3

classification 💻 cs.CL cs.SE

keywords BIMInformation Delivery SpecificationIDSlarge language modelreinforcement learningverifier-aware trainingopen-weight modelBuilding Information Modeling

0 comments

The pith

An open-weight LLM trained with external validator rewards generates BIM IDS drafts that pass checks at twice the rate of Claude Opus.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops Ishigaki-IDS, an open-weight language model specialized for drafting Information Delivery Specification files required in Building Information Modeling projects. Training combines continued pretraining on BIM and IDS corpora, supervised fine-tuning on requirement-to-IDS pairs, and reinforcement learning that draws rewards directly from an external validator to enforce schema conformance. On the 166-case expert benchmark, the 8B model reaches an IDSAuditPass score of 0.651 versus 0.331 for Claude Opus 4.5, with 14B and 32B variants reaching 0.753 and 0.693 while also improving facet alignment among passing outputs. A workflow study with six BIM practitioners found that model assistance reduced aggregate authoring time by 54.7 percent under the same validation endpoint. The stated goal is to shift effort from manual XML schema repair toward inspection and correction of validator-loadable drafts.

Core claim

Ishigaki-IDS models are produced by continued pretraining on BIM/IDS data, supervised fine-tuning on information-requirement-to-IDS pairs, and reinforcement learning with rewards from an external validator. On the 166-case Ishigaki-IDS-Bench the 8B, 14B and 32B variants obtain IDSAuditPass scores of 0.651, 0.753 and 0.693, exceeding the 0.331 score of the strongest single-shot baseline evaluated, and they also record Audit-Gated FacetF1 scores of 0.282, 0.392 and 0.369 among validator-passing drafts. In a workflow check the assisted process reduced aggregate practitioner time by 54.7 percent at fixed validation and alignment endpoints.

What carries the argument

Reinforcement learning that supplies rewards from an external IDS validator to guide generation toward conformant and requirement-aligned outputs.

If this is right

Larger model sizes within the same training recipe produce higher IDSAuditPass and Audit-Gated FacetF1 scores.
Generated drafts become validator-loadable, allowing practitioners to move from low-level XML editing to review and correction.
The combination of domain pretraining, supervised fine-tuning, and validator-guided reinforcement learning scales to improved performance on the task.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Integration of such models into existing BIM tools could provide interactive draft suggestions during requirements capture.
Similar verifier-aware reinforcement learning may transfer to other engineering domains that require machine-checkable structured outputs.
Widespread use would shift practitioner expertise toward high-level validation rather than manual schema construction.

Load-bearing premise

The external validator supplies reliable, unbiased reward signals during reinforcement learning that generalize beyond the 166-case benchmark and six-practitioner study.

What would settle it

A fresh collection of real-world BIM information requirements, never seen in training or the original benchmark, on which Ishigaki-IDS models no longer exceed general LLMs on IDSAuditPass or produce measurable time reductions in practitioner trials.

Figures

Figures reproduced from arXiv: 2606.08545 by Atomu Kondo, Chenguang Wang, Daiho Nishioka, Dayuan Jiang, Koki Arakawa, Koyo Hidaka, Naofumi Fujita, Ryo Kanazawa, Shuhei Saitoh, Takayuki Kato, Teppei Miyamoto, Tomoki Ando.

**Figure 2.** Figure 2: Verifier-Aware Multi-Stage Adaptation of Ishigaki-IDS. CPT adapts the model to BIM/IFC/IDS vocabulary, SFT teaches [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

read the original abstract

Building Information Modeling (BIM) projects require information requirements to be described as machine-checkable Information Delivery Specification (IDS) files in order to verify whether building models contain the required attributes. However, IDS authoring remains a practical bottleneck: practitioners must handle domain vocabulary, strict XML schema constraints, and external validator conformance while also checking whether the requirement itself is correctly expressed. We present Ishigaki-IDS, an open-weight LLM specialized for verifier-aware IDS draft generation. The model combines continued pretraining on BIM/IDS corpora, supervised fine-tuning on information-requirement-to-IDS pairs, and reinforcement learning with verifiable rewards from an external validator. The goal is not to replace expert review, but to move IDS authoring from low-level XML and schema repair toward validator-loadable drafts that practitioners can inspect and correct. On the 166-case expert-created Ishigaki-IDS-Bench, Ishigaki-IDS-8B achieves an IDSAuditPass score of 0.651, a validator-pass metric for generated IDS files, substantially outperforming Claude Opus 4.5, the strongest single-shot LLM baseline we evaluated, at 0.331. It also obtains an Audit-Gated FacetF1 of 0.282, which measures requirement-facet alignment among validator-passing drafts. The same recipe scales: 14B and 32B variants reach IDSAuditPass 0.753 / 0.693 and Audit-Gated FacetF1 0.392 / 0.369. In a workflow check with six BIM practitioners, Ishigaki-assisted authoring reduced aggregate work time by 54.7% under the same validation and alignment endpoint. These results suggest that verifier-aware IDS generation can reduce the practical burden of converting BIM information requirements into reviewable IDS drafts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

They fine-tuned an open LLM with RL against an IDS validator for BIM and beat Claude on their test set while showing time savings in a small practitioner trial, though the eval is tightly coupled to the reward signal.

read the letter

The core of this is taking an 8B model, doing continued pretraining on BIM stuff, SFT on requirement to IDS pairs, then RL where the reward is whether the output passes an external validator. On 166 expert cases it hits 0.65 IDSAuditPass while Claude gets 0.33. Larger sizes go higher. In a check with six BIM people, using the model cut total work time by 55 percent.

This is a direct application of verifier RL to a narrow but real pain point in construction modeling. Releasing the weights is the right move here. The focus on producing validator-ready drafts rather than perfect text makes sense for the use case.

The results are presented without much fanfare, which is fine. The time saving number is the kind of thing practitioners care about.

Where it gets thin is the evaluation. 166 cases is not a lot for claiming broad usefulness, and there's no breakdown on what kinds of requirements or building domains are covered. The six-person study is even smaller and lacks any measure of how consistent the time savings were. Since the RL reward comes from the same validator used in the main score, the big jump in passing rate is expected once the model learns the validator's preferences. It doesn't tell us much about whether the actual information requirements are accurate beyond what that one tool checks.

This work is for people building tools in the AEC industry or anyone doing domain-specific LLM fine-tuning with external verifiers. It has enough concrete numbers and an open model to be worth referee time, though the authors would need to add details on benchmark construction and perhaps an external test set to make the claims stick. I'd recommend sending it for review with those points in mind.

Referee Report

3 major / 1 minor

Summary. The paper presents Ishigaki-IDS, an open-weight LLM for generating verifier-aware Information Delivery Specification (IDS) drafts in BIM. It combines continued pretraining on BIM/IDS corpora, supervised fine-tuning on requirement-to-IDS pairs, and RL using rewards from an external validator. On the custom 166-case Ishigaki-IDS-Bench, the 8B model reports IDSAuditPass of 0.651 (vs. Claude Opus 4.5 at 0.331), with 14B/32B variants at 0.753/0.693; it also reports Audit-Gated FacetF1 scores and a 54.7% aggregate work-time reduction in a six-practitioner workflow study. The goal is to produce reviewable drafts rather than replace experts.

Significance. If the benchmark results and time-reduction claim hold under independent scrutiny, the work could reduce a documented practical bottleneck in BIM projects by shifting IDS authoring from XML/schema repair toward validator-loadable drafts. The open-weight release and use of an external verifiable reward signal are concrete strengths that support reproducibility and extension in the AEC domain.

major comments (3)

[Abstract] Abstract: The headline IDSAuditPass scores (0.651 for 8B, scaling to 0.753 for 14B) rest on the 166-case Ishigaki-IDS-Bench, yet no information is supplied on case selection criteria, domain coverage across BIM subfields, or training-data overlap; this directly affects whether the performance gap versus Claude can be interpreted as general improvement rather than benchmark-specific optimization.
[Abstract] Abstract: The 54.7% time reduction is reported from a workflow check with n=6 practitioners under the same validation endpoint, but the abstract supplies neither per-practitioner times, variance, experimental design (e.g., counterbalancing, controls), nor statistical tests; with such small N the aggregate figure is load-bearing for the practical-utility claim yet cannot be evaluated for reliability.
[Abstract] Abstract: Reinforcement learning employs rewards from the identical external validator that defines the IDSAuditPass metric used at test time; without an ablation that decouples training reward from evaluation metric, the reported scores risk reflecting validator-specific conformance rather than independent requirement correctness, which is central to the verifier-aware framing.

minor comments (1)

[Abstract] Abstract: Model parameter counts (8B/14B/32B) and the three-stage training recipe are mentioned only in passing; a single sentence summarizing the data sources for continued pretraining and SFT would improve readability without lengthening the abstract.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on transparency in the benchmark, user study, and RL design. We address each point below with proposed revisions to strengthen the manuscript.

read point-by-point responses

Referee: [Abstract] Abstract: The headline IDSAuditPass scores (0.651 for 8B, scaling to 0.753 for 14B) rest on the 166-case Ishigaki-IDS-Bench, yet no information is supplied on case selection criteria, domain coverage across BIM subfields, or training-data overlap; this directly affects whether the performance gap versus Claude can be interpreted as general improvement rather than benchmark-specific optimization.

Authors: We agree that the abstract omits key benchmark details. The full manuscript describes the 166 cases as expert-created from real BIM projects. We will revise the experimental setup section (and reference it from the abstract) to specify case selection criteria, domain coverage across architectural/structural/MEP disciplines, and confirmation of no training-data overlap via project-level splits and manual review. This will allow readers to better assess generalizability versus optimization. revision: yes
Referee: [Abstract] Abstract: The 54.7% time reduction is reported from a workflow check with n=6 practitioners under the same validation endpoint, but the abstract supplies neither per-practitioner times, variance, experimental design (e.g., counterbalancing, controls), nor statistical tests; with such small N the aggregate figure is load-bearing for the practical-utility claim yet cannot be evaluated for reliability.

Authors: The reported figure derives from a preliminary within-subjects workflow check. We will expand the relevant section to include per-practitioner times, variance, design details (counterbalancing of assisted vs. unassisted tasks), and an explicit statement that no formal statistical tests were performed due to small N. The claim will be qualified as preliminary evidence of utility rather than a definitive result. revision: yes
Referee: [Abstract] Abstract: Reinforcement learning employs rewards from the identical external validator that defines the IDSAuditPass metric used at test time; without an ablation that decouples training reward from evaluation metric, the reported scores risk reflecting validator-specific conformance rather than independent requirement correctness, which is central to the verifier-aware framing.

Authors: The shared validator is by design to optimize for the practical goal of producing validator-passing drafts. We acknowledge that this leaves open the possibility of validator-specific behavior rather than fully independent correctness. No ablation with a held-out validator was performed. We will add a limitations discussion noting this risk and proposing future multi-validator experiments, while noting that Audit-Gated FacetF1 offers a partial check on requirement alignment. revision: partial

Circularity Check

0 steps flagged

No circularity; results rest on external benchmark comparisons and practitioner study

full rationale

The paper trains via continued pretraining, SFT on requirement-to-IDS pairs, and RL with rewards from an external validator, then reports IDSAuditPass (validator-pass rate) and Audit-Gated FacetF1 on a separate 166-case expert-created benchmark, plus time savings from a six-practitioner workflow check. These are direct empirical measurements against baselines (e.g., Claude Opus) that lack the RL stage; no equation, definition, or self-citation reduces the reported scores or time reduction to the training inputs by construction. The derivation chain is self-contained against the stated external validator and held-out cases.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review based solely on abstract; no explicit free parameters, invented entities, or non-standard axioms are described beyond the general assumption that validator feedback can guide useful model behavior.

axioms (1)

domain assumption An external IDS validator can supply reliable and generalizable reward signals for reinforcement learning on schema-constrained generation.
This premise underpins the RL stage described in the abstract.

pith-pipeline@v0.9.1-grok · 5922 in / 1439 out tokens · 28331 ms · 2026-06-27T18:23:01.378142+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

33 extracted references · 20 canonical work pages · 5 internal anchors

[1]

Chunyan An, Yuying Huang, Qiang Yang, Siyu Yuan, and Zhixu Li. 2025. LLM- Powered Information Extraction for the Dairy Financial Domain: Tackling Data Scarcity and Ambiguity. InProceedings of the 34th ACM International Conference on Information and Knowledge Management. Association for Computing Machinery, New York, NY, USA, 55–64. doi:10.1145/3746252.3761030

work page doi:10.1145/3746252.3761030 2025
[2]

Anand Brahmbhatt, Mohith Pokala, Rishi Saket, and Aravindan Raghuveer. 2024. LLP-Bench: A Large Scale Tabular Benchmark for Learning from Label Propor- tions. InProceedings of the 33rd ACM International Conference on Information and Knowledge Management. Association for Computing Machinery, New York, NY, USA, 4374–4381. doi:10.1145/3627673.3680032

work page doi:10.1145/3627673.3680032 2024
[3]

buildingSMART International. 2024. IDSAuditTool. https://github.com/ buildingSMART/IDS-Audit-tool. Official buildingSMART tool for auditing IDS files. Accessed: 2026-05-17

2024
[4]

buildingSMART International. 2026. Industry Foundation Classes (IFC). https://www.buildingsmart.org/standards/bsi-standards/industry-foundation- classes/. Accessed: 2026-05-08

2026
[5]

buildingSMART International. 2026. Information Delivery Specification (IDS). https://www.buildingsmart.org/standards/bsi-standards/information- delivery-specification-ids/. Accessed: 2026-05-08

2026
[6]

Tomo Cerovšek and Mohamed Omar. 2025. Advancing Semantic Enrichment Compliance in BIM: An Ontology-Based Framework and IDS Evaluation.Build- ings15, 15 (2025), 2621. doi:10.3390/buildings15152621

work page doi:10.3390/buildings15152621 2025
[7]

Giancarlo de Marco, Cinzia Slongo, and Dietmar Siegele. 2024. Enriching Build- ing Information Modeling Models through Information Delivery Specification. Buildings14, 7 (2024), 2206. doi:10.3390/buildings14072206

work page doi:10.3390/buildings14072206 2024
[8]

Li Dong and Mirella Lapata. 2016. Language to Logical Form with Neural Atten- tion. InProceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Berlin, Germany, 33–43. doi:10.18653/v1/P16-1004

work page doi:10.18653/v1/p16-1004 2016
[9]

Ruan, Yaxing Cai, Ruihang Lai, Ziyi Xu, Yilong Zhao, and Tianqi Chen

Yixin Dong, Charlie F. Ruan, Yaxing Cai, Ruihang Lai, Ziyi Xu, Yilong Zhao, and Tianqi Chen. 2024. XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models. arXiv:2411.15100. doi:10.48550/arXiv.2411. 15100

work page doi:10.48550/arxiv.2411 2024
[10]

2018.BIM Hand- book: A Guide to Building Information Modeling for Owners, Designers, Engineers, Contractors, and Facility Managers(3 ed.)

Chuck Eastman, Paul Teicholz, Rafael Sacks, and Ghang Lee. 2018.BIM Hand- book: A Guide to Building Information Modeling for Owners, Designers, Engineers, Contractors, and Facility Managers(3 ed.). Wiley, Hoboken, NJ, USA

2018
[11]

Simon Fischer, Harald Urban, Christian Schranz, and Gerhard Zucker. 2025. Bridging the Gap Between Tabular Information Requirements and the Infor- mation Delivery Specification (IDS).Buildings15, 7 (2025), 1017. doi:10.3390/ buildings15071017

2025
[12]

Forestry Agency, Ministry of Agriculture, Forestry and Fisheries of Japan. 2025. Wooden BIM Model Parameter Guide: Explanation of How to Assign Information to Object Data in Wooden Design BIM Models. https://www.rinya.maff.go.jp/j/ mokusan/attach/pdf/bim-19.pdf. In Japanese. Accessed: 2026-05-17

2025
[13]

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, et al. 2025. DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning.Nature645 (2025), 633–638. doi:10.1038/s41586-025- 09422-z

work page doi:10.1038/s41586-025- 2025
[14]

IfcOpenShell Contributors. 2026. IfcOpenShell. https://docs.ifcopenshell.org/ ifcopenshell.html. Open source IFC library and geometry engine. Accessed: 2026-05-17

2026
[15]

Yuhe Ji, Yilun Liu, Feiyu Yao, Minggui He, Shimin Tao, Xiaofeng Zhao, Chang Su, Xinhua Yang, Weibin Meng, Yuming Xie, Boxing Chen, Shenglin Zhang, and Yongqian Sun. 2025. Adapting Large Language Models to Log Analysis with Interpretable Domain Knowledge. InProceedings of the 34th ACM International Conference on Information and Knowledge Management. Associ...

work page doi:10.1145/3746252.3761189 2025
[16]

Ryo Kanazawa, Koyo Hidaka, Tomoki Ando, Chenguang Wang, Dayuan Jiang, and Daiho Nishioka. 2026. ONESTRUCTION/Ishigaki-IDS-8B. https://huggingface. co/ONESTRUCTION/Ishigaki-IDS-8B. doi:10.57967/hf/8871 Model card and weights. Accessed: 2026-05-20

work page doi:10.57967/hf/8871 2026
[17]

Ryo Kanazawa, Koyo Hidaka, Teppei Miyamoto, Takayuki Kato, Tomoki Ando, Chenguang Wang, Dayuan Jiang, Naofumi Fujita, Shuhei Saitoh, Atomu Kondo, Koki Arakawa, and Daiho Nishioka. 2026. Ishigaki-IDS-Bench. doi:10.57967/hf/ 8873 Hugging Face dataset. Accessed: 2026-05-21

work page doi:10.57967/hf/ 2026
[18]

Ryo Kanazawa, Koyo Hidaka, Teppei Miyamoto, Takayuki Kato, Tomoki Ando, Chenguang Wang, Dayuan Jiang, Naofumi Fujita, Shuhei Saitoh, Atomu Kondo, Koki Arakawa, and Daiho Nishioka. 2026. Ishigaki-IDS-Bench: A Benchmark for Generating Information Delivery Specification from BIM Information Require- ments. arXiv:2605.22079 [cs.CL] https://arxiv.org/abs/2605.22079

Pith/arXiv arXiv 2026
[19]

Ryo Kanazawa, Koyo Hidaka, Teppei Miyamoto, Takayuki Kato, Tomoki Ando, Chenguang Wang, Dayuan Jiang, Naofumi Fujita, Shuhei Saitoh, Atomu Kondo, Koki Arakawa, and Daiho Nishioka. 2026. Ishigaki-IDS-Bench: Evaluation Code and Reproducibility Repository. doi:10.5281/zenodo.20362616 GitHub repository release v1.0.0. Accessed: 2026-05-24

work page doi:10.5281/zenodo.20362616 2026
[20]

Hwang, Jiangjiang Yang, Ro- nan Le Bras, Oyvind Tafjord, Christopher Wilhelm, Luca Soldaini, Noah A

Nathan Lambert, Jacob Morrison, Valentina Pyatkin, Shengyi Huang, Hamish Ivi- son, Faeze Brahman, Lester James Validad Miranda, Alisa Liu, Nouha Dziri, Xinxi Lyu, Yuling Gu, Saumya Malik, Victoria Graf, Jena D. Hwang, Jiangjiang Yang, Ro- nan Le Bras, Oyvind Tafjord, Christopher Wilhelm, Luca Soldaini, Noah A. Smith, et al. 2025. Tulu 3: Pushing Frontiers...

2025
[21]

Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. 2023. Let’s Verify Step by Step. arXiv:2305.20050. doi:10.48550/arXiv.2305.20050

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2305.20050 2023
[22]

Gang Luo, Julien Han, Hayreddin Ceker, and Karim Bouyarmane. 2025. Using Large Language Models to Improve Product Information in E-commerce Catalogs. InProceedings of the 34th ACM International Conference on Information and Knowledge Management. Association for Computing Machinery, New York, NY, USA, 6873–6874. doi:10.1145/3746252.3761437

work page doi:10.1145/3746252.3761437 2025
[23]

Soumya Madireddy, Lu Gao, Zia Ud Din, Kinam Kim, Ahmed Senouci, Zhe Han, and Yunpeng Zhang. 2025. Large Language Model-Driven Code Compliance Checking in Building Information Modeling.Electronics14, 11 (2025), 2146. doi:10.3390/electronics14112146

work page doi:10.3390/electronics14112146 2025
[24]

NVIDIA Corporation. 2026. NVIDIA NeMo Framework. https://docs.nvidia.com/ nemo-framework/user-guide/latest/overview.html. Documentation. Accessed: 2026-05-21

2026
[25]

Guilherme Penedo, Hynek Kydlíček, Vinko Sabolčec, Bettina Messmer, Negar Foroutan, Amir Hossein Kargaran, Colin Raffel, Martin Jaggi, Leandro Von Werra, and Thomas Wolf. 2025. FineWeb2: One Pipeline to Scale Them All – Adapting Pre-Training Data Processing to Every Language. arXiv:2506.20920. doi:10.48550/ arXiv.2506.20920

arXiv 2025
[26]

Maxim Rabinovich, Mitchell Stern, and Dan Klein. 2017. Abstract Syntax Net- works for Code Generation and Semantic Parsing. InProceedings of the 55th An- nual Meeting of the Association for Computational Linguistics. Association for Com- putational Linguistics, Vancouver, Canada, 1139–1149. doi:10.18653/v1/P17-1105

work page doi:10.18653/v1/p17-1105 2017
[27]

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, and Daya Guo. 2024. DeepSeek- Math: Pushing the Limits of Mathematical Reasoning in Open Language Models. arXiv:2402.03300. doi:10.48550/arXiv.2402.03300

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2402.03300 2024
[28]

Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Yanghua Peng, Haibin Lin, and Chuan Wu. 2024. HybridFlow: A Flexible and Efficient RLHF Framework. arXiv:2409.19256. doi:10.48550/arXiv.2409.19256

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2409.19256 2024
[29]

vLLM Team. 2026. Structured Outputs. https://docs.vllm.ai/usage/structured_ outputs.html. vLLM documentation. Accessed: 2026-05-17

2026
[30]

Efficient Guided Generation for Large Language Models

Brandon T. Willard and Rémi Louf. 2023. Efficient Guided Generation for Large Language Models. arXiv:2307.09702. doi:10.48550/arXiv.2307.09702

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2307.09702 2023
[31]

An Yang et al. 2025. Qwen3 Technical Report. arXiv:2505.09388. doi:10.48550/ arXiv.2505.09388

Pith/arXiv arXiv 2025
[32]

Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, and Dragomir Radev. 2018. Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task. InProceedings of the 2018 Conference on Empirical Methods in Natural Language Proces...

work page doi:10.18653/v1/d18- 2018
[33]

Chunting Zhou, Pengfei Liu, Puxin Xu, Srini Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, Susan Zhang, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer, and Omer Levy. 2023. LIMA: Less Is More for Alignment. arXiv:2305.11206. doi:10.48550/arXiv.2305.11206

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2305.11206 2023

[1] [1]

Chunyan An, Yuying Huang, Qiang Yang, Siyu Yuan, and Zhixu Li. 2025. LLM- Powered Information Extraction for the Dairy Financial Domain: Tackling Data Scarcity and Ambiguity. InProceedings of the 34th ACM International Conference on Information and Knowledge Management. Association for Computing Machinery, New York, NY, USA, 55–64. doi:10.1145/3746252.3761030

work page doi:10.1145/3746252.3761030 2025

[2] [2]

Anand Brahmbhatt, Mohith Pokala, Rishi Saket, and Aravindan Raghuveer. 2024. LLP-Bench: A Large Scale Tabular Benchmark for Learning from Label Propor- tions. InProceedings of the 33rd ACM International Conference on Information and Knowledge Management. Association for Computing Machinery, New York, NY, USA, 4374–4381. doi:10.1145/3627673.3680032

work page doi:10.1145/3627673.3680032 2024

[3] [3]

buildingSMART International. 2024. IDSAuditTool. https://github.com/ buildingSMART/IDS-Audit-tool. Official buildingSMART tool for auditing IDS files. Accessed: 2026-05-17

2024

[4] [4]

buildingSMART International. 2026. Industry Foundation Classes (IFC). https://www.buildingsmart.org/standards/bsi-standards/industry-foundation- classes/. Accessed: 2026-05-08

2026

[5] [5]

buildingSMART International. 2026. Information Delivery Specification (IDS). https://www.buildingsmart.org/standards/bsi-standards/information- delivery-specification-ids/. Accessed: 2026-05-08

2026

[6] [6]

Tomo Cerovšek and Mohamed Omar. 2025. Advancing Semantic Enrichment Compliance in BIM: An Ontology-Based Framework and IDS Evaluation.Build- ings15, 15 (2025), 2621. doi:10.3390/buildings15152621

work page doi:10.3390/buildings15152621 2025

[7] [7]

Giancarlo de Marco, Cinzia Slongo, and Dietmar Siegele. 2024. Enriching Build- ing Information Modeling Models through Information Delivery Specification. Buildings14, 7 (2024), 2206. doi:10.3390/buildings14072206

work page doi:10.3390/buildings14072206 2024

[8] [8]

Li Dong and Mirella Lapata. 2016. Language to Logical Form with Neural Atten- tion. InProceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Berlin, Germany, 33–43. doi:10.18653/v1/P16-1004

work page doi:10.18653/v1/p16-1004 2016

[9] [9]

Ruan, Yaxing Cai, Ruihang Lai, Ziyi Xu, Yilong Zhao, and Tianqi Chen

Yixin Dong, Charlie F. Ruan, Yaxing Cai, Ruihang Lai, Ziyi Xu, Yilong Zhao, and Tianqi Chen. 2024. XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models. arXiv:2411.15100. doi:10.48550/arXiv.2411. 15100

work page doi:10.48550/arxiv.2411 2024

[10] [10]

2018.BIM Hand- book: A Guide to Building Information Modeling for Owners, Designers, Engineers, Contractors, and Facility Managers(3 ed.)

Chuck Eastman, Paul Teicholz, Rafael Sacks, and Ghang Lee. 2018.BIM Hand- book: A Guide to Building Information Modeling for Owners, Designers, Engineers, Contractors, and Facility Managers(3 ed.). Wiley, Hoboken, NJ, USA

2018

[11] [11]

Simon Fischer, Harald Urban, Christian Schranz, and Gerhard Zucker. 2025. Bridging the Gap Between Tabular Information Requirements and the Infor- mation Delivery Specification (IDS).Buildings15, 7 (2025), 1017. doi:10.3390/ buildings15071017

2025

[12] [12]

Forestry Agency, Ministry of Agriculture, Forestry and Fisheries of Japan. 2025. Wooden BIM Model Parameter Guide: Explanation of How to Assign Information to Object Data in Wooden Design BIM Models. https://www.rinya.maff.go.jp/j/ mokusan/attach/pdf/bim-19.pdf. In Japanese. Accessed: 2026-05-17

2025

[13] [13]

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, et al. 2025. DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning.Nature645 (2025), 633–638. doi:10.1038/s41586-025- 09422-z

work page doi:10.1038/s41586-025- 2025

[14] [14]

IfcOpenShell Contributors. 2026. IfcOpenShell. https://docs.ifcopenshell.org/ ifcopenshell.html. Open source IFC library and geometry engine. Accessed: 2026-05-17

2026

[15] [15]

Yuhe Ji, Yilun Liu, Feiyu Yao, Minggui He, Shimin Tao, Xiaofeng Zhao, Chang Su, Xinhua Yang, Weibin Meng, Yuming Xie, Boxing Chen, Shenglin Zhang, and Yongqian Sun. 2025. Adapting Large Language Models to Log Analysis with Interpretable Domain Knowledge. InProceedings of the 34th ACM International Conference on Information and Knowledge Management. Associ...

work page doi:10.1145/3746252.3761189 2025

[16] [16]

Ryo Kanazawa, Koyo Hidaka, Tomoki Ando, Chenguang Wang, Dayuan Jiang, and Daiho Nishioka. 2026. ONESTRUCTION/Ishigaki-IDS-8B. https://huggingface. co/ONESTRUCTION/Ishigaki-IDS-8B. doi:10.57967/hf/8871 Model card and weights. Accessed: 2026-05-20

work page doi:10.57967/hf/8871 2026

[17] [17]

Ryo Kanazawa, Koyo Hidaka, Teppei Miyamoto, Takayuki Kato, Tomoki Ando, Chenguang Wang, Dayuan Jiang, Naofumi Fujita, Shuhei Saitoh, Atomu Kondo, Koki Arakawa, and Daiho Nishioka. 2026. Ishigaki-IDS-Bench. doi:10.57967/hf/ 8873 Hugging Face dataset. Accessed: 2026-05-21

work page doi:10.57967/hf/ 2026

[18] [18]

Ryo Kanazawa, Koyo Hidaka, Teppei Miyamoto, Takayuki Kato, Tomoki Ando, Chenguang Wang, Dayuan Jiang, Naofumi Fujita, Shuhei Saitoh, Atomu Kondo, Koki Arakawa, and Daiho Nishioka. 2026. Ishigaki-IDS-Bench: A Benchmark for Generating Information Delivery Specification from BIM Information Require- ments. arXiv:2605.22079 [cs.CL] https://arxiv.org/abs/2605.22079

Pith/arXiv arXiv 2026

[19] [19]

Ryo Kanazawa, Koyo Hidaka, Teppei Miyamoto, Takayuki Kato, Tomoki Ando, Chenguang Wang, Dayuan Jiang, Naofumi Fujita, Shuhei Saitoh, Atomu Kondo, Koki Arakawa, and Daiho Nishioka. 2026. Ishigaki-IDS-Bench: Evaluation Code and Reproducibility Repository. doi:10.5281/zenodo.20362616 GitHub repository release v1.0.0. Accessed: 2026-05-24

work page doi:10.5281/zenodo.20362616 2026

[20] [20]

Hwang, Jiangjiang Yang, Ro- nan Le Bras, Oyvind Tafjord, Christopher Wilhelm, Luca Soldaini, Noah A

Nathan Lambert, Jacob Morrison, Valentina Pyatkin, Shengyi Huang, Hamish Ivi- son, Faeze Brahman, Lester James Validad Miranda, Alisa Liu, Nouha Dziri, Xinxi Lyu, Yuling Gu, Saumya Malik, Victoria Graf, Jena D. Hwang, Jiangjiang Yang, Ro- nan Le Bras, Oyvind Tafjord, Christopher Wilhelm, Luca Soldaini, Noah A. Smith, et al. 2025. Tulu 3: Pushing Frontiers...

2025

[21] [21]

Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. 2023. Let’s Verify Step by Step. arXiv:2305.20050. doi:10.48550/arXiv.2305.20050

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2305.20050 2023

[22] [22]

Gang Luo, Julien Han, Hayreddin Ceker, and Karim Bouyarmane. 2025. Using Large Language Models to Improve Product Information in E-commerce Catalogs. InProceedings of the 34th ACM International Conference on Information and Knowledge Management. Association for Computing Machinery, New York, NY, USA, 6873–6874. doi:10.1145/3746252.3761437

work page doi:10.1145/3746252.3761437 2025

[23] [23]

Soumya Madireddy, Lu Gao, Zia Ud Din, Kinam Kim, Ahmed Senouci, Zhe Han, and Yunpeng Zhang. 2025. Large Language Model-Driven Code Compliance Checking in Building Information Modeling.Electronics14, 11 (2025), 2146. doi:10.3390/electronics14112146

work page doi:10.3390/electronics14112146 2025

[24] [24]

NVIDIA Corporation. 2026. NVIDIA NeMo Framework. https://docs.nvidia.com/ nemo-framework/user-guide/latest/overview.html. Documentation. Accessed: 2026-05-21

2026

[25] [25]

Guilherme Penedo, Hynek Kydlíček, Vinko Sabolčec, Bettina Messmer, Negar Foroutan, Amir Hossein Kargaran, Colin Raffel, Martin Jaggi, Leandro Von Werra, and Thomas Wolf. 2025. FineWeb2: One Pipeline to Scale Them All – Adapting Pre-Training Data Processing to Every Language. arXiv:2506.20920. doi:10.48550/ arXiv.2506.20920

arXiv 2025

[26] [26]

Maxim Rabinovich, Mitchell Stern, and Dan Klein. 2017. Abstract Syntax Net- works for Code Generation and Semantic Parsing. InProceedings of the 55th An- nual Meeting of the Association for Computational Linguistics. Association for Com- putational Linguistics, Vancouver, Canada, 1139–1149. doi:10.18653/v1/P17-1105

work page doi:10.18653/v1/p17-1105 2017

[27] [27]

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, and Daya Guo. 2024. DeepSeek- Math: Pushing the Limits of Mathematical Reasoning in Open Language Models. arXiv:2402.03300. doi:10.48550/arXiv.2402.03300

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2402.03300 2024

[28] [28]

Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Yanghua Peng, Haibin Lin, and Chuan Wu. 2024. HybridFlow: A Flexible and Efficient RLHF Framework. arXiv:2409.19256. doi:10.48550/arXiv.2409.19256

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2409.19256 2024

[29] [29]

vLLM Team. 2026. Structured Outputs. https://docs.vllm.ai/usage/structured_ outputs.html. vLLM documentation. Accessed: 2026-05-17

2026

[30] [30]

Efficient Guided Generation for Large Language Models

Brandon T. Willard and Rémi Louf. 2023. Efficient Guided Generation for Large Language Models. arXiv:2307.09702. doi:10.48550/arXiv.2307.09702

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2307.09702 2023

[31] [31]

An Yang et al. 2025. Qwen3 Technical Report. arXiv:2505.09388. doi:10.48550/ arXiv.2505.09388

Pith/arXiv arXiv 2025

[32] [32]

Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, and Dragomir Radev. 2018. Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task. InProceedings of the 2018 Conference on Empirical Methods in Natural Language Proces...

work page doi:10.18653/v1/d18- 2018

[33] [33]

Chunting Zhou, Pengfei Liu, Puxin Xu, Srini Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, Susan Zhang, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer, and Omer Levy. 2023. LIMA: Less Is More for Alignment. arXiv:2305.11206. doi:10.48550/arXiv.2305.11206

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2305.11206 2023