CornerCase: Automated Extremal Testing of Protocol Implementations using LLMs
Pith reviewed 2026-06-30 02:53 UTC · model grok-4.3
The pith
CornerCase uses LLMs to extract validity constraints from protocol specs and generates tests at their boundaries to find implementation bugs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CornerCase decomposes test generation into two steps: LLM-based extraction of explicit validity constraints from protocol specifications, done section by section in a structured manner, followed by generation of extremal test cases at or near the boundary of each constraint. The tests are executed across multiple implementations, and differential testing identifies inconsistencies that expose bugs missed by fuzzing and model-based testing.
What carries the argument
A two-stage process: LLM-driven, structured constraint extraction from specifications, then extremal test generation at the constraint boundaries, with differential testing across implementations to judge the results.
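A minimal sketch of the second stage, assuming a constraint has already been reduced to an inclusive integer range. The `RangeConstraint` type and the `extremal_values` helper are hypothetical names chosen for illustration; the source does not describe how CornerCase represents constraints internally. The example range uses the 63-octet DNS label limit from RFC 1035 §2.3.4 and, as a simplification, treats a zero-length label as invalid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeConstraint:
    """Hypothetical form of an extracted constraint: lo <= value <= hi."""
    section: str
    field: str
    lo: int
    hi: int


def extremal_values(c: RangeConstraint) -> list[tuple[int, bool]]:
    """Return (value, expected_valid) pairs at and just beyond each boundary."""
    candidates = [c.lo - 1, c.lo, c.lo + 1, c.hi - 1, c.hi, c.hi + 1]
    seen = set()
    out = []
    for v in candidates:
        if v not in seen:  # narrow ranges can produce duplicate candidates
            seen.add(v)
            out.append((v, c.lo <= v <= c.hi))
    return out


# Illustrative constraint: label lengths 1..63 are treated as valid.
label_len = RangeConstraint(section="2.3.4", field="label_length", lo=1, hi=63)
for value, valid in extremal_values(label_len):
    print(value, "valid" if valid else "invalid")
```

Each pair carries the expected validity, so a harness can compare what an implementation does with what the constraint says it should do.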
If this is right
- Boundary behaviors such as encoded null bytes in URLs or state-dependent message validity can be targeted systematically rather than left to chance in random testing.
- Differential testing across implementations of the same protocol reliably surfaces bugs through observable inconsistencies.
- The same decomposition can be repeated on additional protocols beyond the five evaluated here.
- Many previously unknown bugs can be identified and reported, leading to fixes in production implementations.
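To illustrate the first point above, the sketch below builds a few request targets that place a percent-encoded null byte (`%00`) at different positions in a URL path. The helper name and the choice of positions are assumptions made for illustration; the source does not list the exact inputs CornerCase generated.

```python
from urllib.parse import unquote_to_bytes


def null_byte_paths(path: str) -> list[str]:
    """Insert a percent-encoded NUL at the start, middle, and end of a path."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path")
    mid = len(path) // 2
    return [
        "/%00" + path[1:],                 # NUL right after the leading slash
        path[:mid] + "%00" + path[mid:],   # NUL in the middle of the path
        path + "%00",                      # NUL at the very end
    ]


for target in null_byte_paths("/index.html"):
    # The encoded form is printable ASCII; decoding yields a raw 0x00 byte.
    assert b"\x00" in unquote_to_bytes(target)
    print(target)
```

The encoded form is syntactically ordinary, which is why such inputs sit at the boundary: whether the decoded byte is acceptable depends on how each server interprets the path after decoding.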
Where Pith is reading between the lines
- If LLM accuracy on constraint extraction improves, the approach could extend to longer or more ambiguous specifications.
- Patterns in the extracted constraints might highlight recurring ambiguities in how protocols are specified.
- Combining the boundary-focused tests with existing fuzzers could produce broader coverage of protocol edge cases.
Load-bearing premise
Large language models can accurately extract explicit validity constraints from protocol specifications without significant omissions or hallucinations.
What would settle it
A protocol specification where the LLM extraction step misses or misstates a key validity constraint, causing the generated tests to miss a known boundary bug or produce only false inconsistencies.
Original abstract
Many software bugs in network protocol implementations arise near specification boundaries, such as inputs just within or outside allowed ranges, or messages that are valid in isolation but invalid in a given state. From the SSL Heartbleed exploit to TCP Christmas Tree packets, boundary inputs have repeatedly exposed critical weaknesses, yet remain under-tested by existing techniques such as fuzzing and model-based testing. We present CornerCase, an automated extremal testing approach that systematically targets such boundary behaviors. Our key idea is to decompose test generation into two stages: first, large language models (LLMs) extract explicit validity constraints from protocol specifications (e.g., RFCs) in a structured, section-by-section manner; second, extremal test cases are generated at or near the boundary of each constraint. These tests are executed across multiple implementations, and differential testing identifies inconsistencies. We evaluate CornerCase on widely used implementations of HTTP, DNS, BGP, SMTP, and QUIC, uncovering many previously unknown bugs. For example, the HTTP server h2o enters a redirect loop when processing URLs containing encoded null bytes. Overall, we used CornerCase to identify and file 42 anomalies; to date 26 have been acknowledged as bugs and 18 fixed, with others under active investigation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents CornerCase, an automated extremal testing framework that uses LLMs to extract explicit validity constraints from protocol specifications (e.g., RFCs) in a structured section-by-section manner, generates test cases at or near those constraint boundaries, and applies differential testing across multiple implementations of HTTP, DNS, BGP, SMTP, and QUIC to identify anomalies, reporting 42 anomalies of which 26 have been acknowledged as bugs.
Significance. If the LLM extraction step proves accurate, the work would provide a useful complement to fuzzing and model-based testing by systematically targeting boundary behaviors that have historically caused serious vulnerabilities; the external validation via 26 acknowledged bugs is a concrete strength that supports the empirical utility of the generated tests.
major comments (2)
- [§3 (Approach)] The description of LLM-based constraint extraction provides no quantitative audit (e.g., precision/recall against expert-annotated ground truth on held-out RFC sections) of extraction accuracy, omissions, or hallucinations. This is load-bearing for the central claim because any systematic error in the extracted constraints directly invalidates the extremal tests and prevents confident attribution of differential anomalies to implementation bugs rather than test-construction artifacts.
- [§5 (Evaluation)] The reported outcomes (42 anomalies, 26 acknowledged) give no breakdown of false-positive rates, no comparison of LLM-extracted constraints against the actual tests that triggered each anomaly, and no discussion of how differential testing distinguishes bugs from benign implementation differences or from LLM-induced invalid inputs.
minor comments (1)
- [Abstract] The abstract and method overview could more explicitly flag the current lack of extraction validation as a limitation.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which identifies key areas where additional validation would strengthen the manuscript. We address each major comment below and outline planned revisions to the approach and evaluation sections.
- Referee: §3 (Approach): The description of LLM-based constraint extraction provides no quantitative audit (e.g., precision/recall against expert-annotated ground truth on held-out RFC sections) of extraction accuracy, omissions, or hallucinations. This is load-bearing for the central claim because any systematic error in the extracted constraints directly invalidates the extremal tests and prevents confident attribution of differential anomalies to implementation bugs rather than test-construction artifacts.
Authors: We agree that a quantitative audit of the LLM extraction step is important for rigorously validating the approach and for confident attribution of anomalies. The current manuscript relies on downstream developer acknowledgments (26 bugs) as indirect evidence of utility but does not include precision/recall metrics against expert ground truth. In the revised version we will add such an audit: we will manually annotate a held-out sample of RFC sections for a subset of the evaluated protocols, compute precision and recall for the extracted constraints, and report omission and hallucination rates. This will be presented in an expanded §3.
Revision: yes
- Referee: §5 (Evaluation): The reported outcomes (42 anomalies, 26 acknowledged) give no breakdown of false-positive rates, no comparison of LLM-extracted constraints against the actual tests that triggered each anomaly, and no discussion of how differential testing distinguishes bugs from benign implementation differences or from LLM-induced invalid inputs.
Authors: We acknowledge that the evaluation section would benefit from greater transparency on these points. The manuscript currently emphasizes the aggregate counts and acknowledgments without the requested breakdowns or explicit discussion of differential-testing mechanics. In the revision we will expand §5 to include: (i) an analysis of potential false-positive anomalies and how they were filtered, (ii) concrete examples mapping specific extracted constraints to the test cases that triggered each reported anomaly, and (iii) a discussion of how running the same extremal inputs across multiple independent implementations helps separate implementation bugs from benign differences or from any LLM-induced invalid inputs. These additions will be supported by additional tables and case studies.
Revision: yes
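One simple way to picture the differential-testing step in point (iii) is majority voting over normalized responses to a single extremal input. The sketch below is an assumption, not the paper's stated method: the function name, the strict-majority rule, and the placeholder implementation names and status codes are all hypothetical.

```python
from collections import Counter


def find_outliers(responses: dict[str, str]) -> dict[str, str]:
    """Map implementation -> normalized response to the subset that
    disagrees with the strict-majority response (empty if there is none)."""
    if not responses:
        return {}
    counts = Counter(responses.values())
    top, top_count = counts.most_common(1)[0]
    if top_count * 2 <= len(responses):
        return {}  # no strict majority: leave for manual triage
    return {impl: r for impl, r in responses.items() if r != top}


# Hypothetical normalized outcomes for one extremal input.
observed = {
    "impl_a": "400",
    "impl_b": "400",
    "impl_c": "400",
    "impl_d": "302",
}
print(find_outliers(observed))
```

A disagreement only flags a candidate. The majority can itself deviate from the RFC, and a split can come from configuration, so each outlier still has to be checked against the constraint that produced the test.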
Circularity Check
No circularity; empirical method with external validation
The paper presents CornerCase as an empirical testing technique that uses LLMs to extract constraints section-by-section from protocol specs (RFCs) and then generates extremal tests for differential testing across implementations. No mathematical derivation chain, equations, predictions, or first-principles results are claimed. Results rest on reported bug findings (42 anomalies, 26 acknowledged) that are externally validated by third-party acknowledgments rather than internal fits or self-citations. No self-definitional steps, fitted-input predictions, or load-bearing self-citation chains appear in the described approach. The central assumption about LLM extraction accuracy is an empirical claim open to external audit, not a circular reduction.
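The external audit mentioned above could be a set comparison between LLM-extracted constraints and expert annotations. The sketch below assumes each constraint is a (section, sentence) pair and that two constraints match when they are equal after whitespace normalization. The matching rule, function names, and all sample data are hypothetical.

```python
def normalize(c: tuple[str, str]) -> tuple[str, str]:
    """Strip the section label and collapse runs of whitespace in the sentence."""
    section, sentence = c
    return section.strip(), " ".join(sentence.split())


def precision_recall(extracted, ground_truth) -> tuple[float, float]:
    """Precision and recall of extracted constraints against expert annotations."""
    ext = {normalize(c) for c in extracted}
    gt = {normalize(c) for c in ground_truth}
    tp = len(ext & gt)
    precision = tp / len(ext) if ext else 0.0
    recall = tp / len(gt) if gt else 0.0
    return precision, recall


# Made-up sample data for illustration only.
gold = [
    ("1.1", "Field A MUST be at most 255 octets."),
    ("1.2", "Field B MUST NOT be empty."),
    ("2.1", "Message X MUST NOT be sent before message Y."),
    ("2.2", "Field C SHOULD be lowercase."),
]
extracted = [
    ("1.1", "Field A  MUST be at most 255 octets."),  # matches after normalization
    ("1.2", "Field B MUST NOT be empty."),
    ("3.1", "Field D MUST be zero."),  # absent from gold: lowers precision
]
p, r = precision_recall(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Low precision would point to hallucinated constraints and low recall to omissions, the two failure modes the referee report asks the authors to quantify.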
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLMs can extract explicit validity constraints from protocol specifications (e.g., RFCs) accurately and completely in a structured manner.