Buildrix: An Open Platform for Sharing and Benchmarking Agentic AI Skills in Building Engineering
Pith reviewed 2026-06-25 21:40 UTC · model grok-4.3
The pith
Buildrix is an open platform that packages agentic AI skills for building engineering into reusable, expert-verifiable units with standardized benchmarks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Buildrix supplies an open, community-driven platform with three components: a Python command-line package, a web-based Hub, and a local agent harness. Together they let standardized skills be developed, published, installed, executed, and evaluated against quantitative test cases that domain experts verify and promote to golden benchmarks for building engineering tasks.
What carries the argument
The argument rests on the standardized, self-contained skill package. It bundles task instructions, executable scripts, dependencies, and resources, and the Python package, web Hub, and local harness manage it for validation and execution.
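The package contents named above can be pictured as a small data structure with a validation pass. This is a minimal sketch only: the paper as summarized here does not publish a schema, so `SkillPackage`, its field names, and the `validate` rules are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SkillPackage:
    """Hypothetical in-memory view of a skill package (not Buildrix's actual schema)."""
    name: str
    instructions: str                                      # task instructions for the agent
    scripts: list[str] = field(default_factory=list)       # relative paths to executable scripts
    dependencies: list[str] = field(default_factory=list)  # e.g. pip requirement strings
    resources: list[str] = field(default_factory=list)     # supporting files


def validate(pkg: SkillPackage) -> list[str]:
    """Return human-readable problems; an empty list means the package passes."""
    problems = []
    if not pkg.name.strip():
        problems.append("name is empty")
    if not pkg.instructions.strip():
        problems.append("instructions are empty")
    if not pkg.scripts:
        problems.append("no executable scripts listed")
    for path in pkg.scripts + pkg.resources:
        # Assumed rule: a self-contained package must not reach outside its own directory.
        if path.startswith("/") or ".." in path.split("/"):
            problems.append(f"path escapes the package: {path}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a command-line tool report every issue in a single run, which is one plausible design for a "validate before publish" step.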
If this is right
- Skills become reusable packages that developers can publish, install, and manage through the Python command-line package.
- The web Hub organizes open challenges, collects reviews, and displays benchmark results across skills.
- Expert-verified quantitative test cases are promoted to golden standards that support consistent, reproducible evaluation.
- The local harness enables agents to discover skills, provision external tools, and run multi-step building engineering workflows.
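The skill discovery step in the last bullet, combined with what the abstract calls progressive context loading, might look roughly like the sketch below: the agent keeps only short summaries in context and loads a skill's full instructions once it has been selected. Everything here is an assumption made for illustration, including the `SkillEntry` type, the `discover` function, and the word-overlap matching, which stands in for whatever selection logic a real harness uses.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SkillEntry:
    name: str
    summary: str                  # short description kept in the agent's context
    load_body: Callable[[], str]  # deferred loader for the full instructions


def discover(task: str, index: list[SkillEntry]) -> Optional[SkillEntry]:
    """Pick the entry whose summary shares the most words with the task.

    Returns None when no summary shares any word with the task.
    """
    task_words = set(task.lower().split())
    best, best_overlap = None, 0
    for entry in index:
        overlap = len(task_words & set(entry.summary.lower().split()))
        if overlap > best_overlap:
            best, best_overlap = entry, overlap
    return best
```

Only the selected entry's `load_body` is ever called, so the cost of reading full instructions is paid for one skill rather than for the whole index.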
Where Pith is reading between the lines
- Widespread use could accumulate a library of verified skills that reduce duplication when automating building control or design tasks.
- The same packaging and verification model could be applied to agentic AI in related engineering domains if the components prove workable.
- Public challenges on the Hub could surface which agent architectures reliably handle specific building workflow problems.
Load-bearing premise
The described components will be implemented and domain experts will adopt them to verify test cases and generate reproducible benchmarks.
What would settle it
No functional Python package, web Hub, or local harness is released, or no community skills with expert-verified golden test cases appear for benchmarking.
Original abstract
Agentic AI offers significant potential to automate complex building-engineering workflows. However, most existing applications remain isolated proof-of-concept demonstrations and lack reusable domain capabilities, human-verified evaluation cases, and standardized benchmarking infrastructure. This study presents Buildrix, an open, community-driven platform for developing, sharing, executing, and evaluating agentic AI skills for building engineering. Buildrix integrates three components: a Python command-line package for developing, validating, publishing, installing, and managing skills and test cases; a web-based Hub for organizing open challenges, reusable skills, test cases, reviews, and benchmark results; and a local agent harness that supports skill discovery, external toolchain provisioning, progressive context loading, and multi-step workflow execution. Buildrix skills are organized as standardized, self-contained packages containing task instructions, executable scripts, dependencies, and supporting resources. Quantitative test cases can be verified by domain experts and promoted to golden test cases for reproducible benchmark evaluation. Buildrix provides an open foundation for reusable capability development, transparent evaluation, and community-driven advancement of agentic AI in building engineering.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents Buildrix as an open, community-driven platform for agentic AI skills in building engineering. It integrates a Python command-line package for skill and test-case management, a web-based Hub for challenges and benchmarks, and a local agent harness for execution. Skills are standardized self-contained packages, with quantitative test cases verifiable by domain experts as golden cases for reproducible evaluation. The central claim is that Buildrix supplies an open foundation for reusable capability development, transparent evaluation, and community advancement.
Significance. If the described components were implemented, adopted, and used to produce verified benchmarks, the platform could address the isolation of current proof-of-concept agentic AI applications in the domain. The manuscript, however, contains only high-level descriptions of intended functionality with no code, artifacts, execution traces, benchmark numbers, or adoption data, so any significance remains prospective.
major comments (2)
- [Abstract] The claim that Buildrix 'provides an open foundation for reusable capability development, transparent evaluation, and community-driven advancement' is not supported by any implementation details, source code, example skill packages, test-case artifacts, or usage metrics. The three components (Python package, web Hub, local harness) are described at the level of intended workflows only.
- [Abstract] The assertion that 'quantitative test cases can be verified by domain experts and promoted to golden test cases for reproducible benchmark evaluation' rests on the unverified premise that the platform will be built and used; no mechanism, example workflow, or verification process is demonstrated.
Simulated Author's Rebuttal
We thank the referee for the detailed review and constructive feedback on our manuscript. The paper introduces Buildrix as a platform design to address fragmentation in agentic AI for building engineering, with the three components described through their architecture and workflows. We respond point-by-point to the major comments below, noting where revisions can clarify the scope of the current contribution.
- Referee: [Abstract] The claim that Buildrix 'provides an open foundation for reusable capability development, transparent evaluation, and community-driven advancement' is not supported by any implementation details, source code, example skill packages, test-case artifacts, or usage metrics. The three components (Python package, web Hub, local harness) are described at the level of intended workflows only.
Authors: We agree that the manuscript presents the platform at the level of design and intended workflows rather than including embedded source code, example packages, or usage metrics. This is consistent with system-description papers that focus on architecture to enable community adoption. The standardized skill package format, Hub organization, and harness capabilities are specified in sufficient detail to define the foundation. We will revise the abstract and add a short implementation-status paragraph to make this scope explicit.
Revision: partial
- Referee: [Abstract] The assertion that 'quantitative test cases can be verified by domain experts and promoted to golden test cases for reproducible benchmark evaluation' rests on the unverified premise that the platform will be built and used; no mechanism, example workflow, or verification process is demonstrated.
Authors: The verification mechanism is described as part of the skill-package structure and the Hub's review workflow: quantitative test cases are bundled with skills, domain experts can review them via the web interface, and accepted cases become golden references for benchmark runs executed by the local harness. While the manuscript does not include a live example workflow, the design specifies how this process operates. We will revise the abstract to frame this as a defined capability of the platform rather than a demonstrated outcome.
Revision: partial
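The process the authors describe (expert review, promotion of accepted cases to golden references, benchmark runs against those references) can be sketched in a few lines. This is an illustrative assumption rather than the platform's method: `QuantCase`, `Status`, `apply_review`, and `benchmark` are hypothetical names, a single expert decision is assumed to settle a case, and a relative tolerance is assumed as the pass criterion.

```python
import math
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"
    GOLDEN = "golden"
    REJECTED = "rejected"


@dataclass
class QuantCase:
    case_id: str
    expected: float   # reference value proposed by the case author
    rel_tol: float    # acceptable relative deviation (assumed criterion)
    status: Status = Status.SUBMITTED


def apply_review(case: QuantCase, accepted: bool) -> None:
    """Record one expert decision; only a submitted case can change status."""
    if case.status is Status.SUBMITTED:
        case.status = Status.GOLDEN if accepted else Status.REJECTED


def benchmark(cases: list[QuantCase], outputs: dict[str, float]) -> float:
    """Return the fraction of golden cases whose output is within tolerance.

    Non-golden cases are ignored; a missing output counts as a failure.
    """
    golden = [c for c in cases if c.status is Status.GOLDEN]
    if not golden:
        raise ValueError("no golden cases available")
    passed = sum(
        1 for c in golden
        if c.case_id in outputs
        and math.isclose(outputs[c.case_id], c.expected, rel_tol=c.rel_tol)
    )
    return passed / len(golden)
```

Scoring only against golden cases is what would make results comparable across skills in this sketch: unreviewed or rejected cases never influence a benchmark number.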
Circularity Check
No circularity: platform description without derivations or predictions
The paper is a descriptive announcement of a proposed platform (Python package, web Hub, local harness) with no equations, no quantitative predictions, no fitted parameters, and no derivation chain of any kind. The central claim is a statement of intended functionality and community benefit rather than a result obtained from internal logic or self-referential steps. No self-citations, ansatzes, or uniqueness theorems appear in any load-bearing role. This is a normal non-finding for a platform paper.