Recognition: 2 theorem links
· Lean Theorem
Strengthening Human-Centric Chain-of-Thought Reasoning Integrity in LLMs via a Structured Prompt Framework
Pith reviewed 2026-05-10 18:49 UTC · model grok-4.3
The pith
A 16-factor structured prompt framework enhances Chain-of-Thought reasoning integrity in LLMs for cybersecurity tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By using a 16-factor prompt structure instead of unstructured prompts, LLMs demonstrate stronger reasoning integrity in analyzing security threats, leading to better detection performance and more interpretable outputs that hold up under human review.
What carries the argument
The 16-factor structured prompt framework, organized into four core dimensions of context control, evidence grounding, reasoning structure, and security constraints, which provides explicit controls to maintain reasoning quality.
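The shape of such a framework can be sketched in a few lines. This is a minimal illustration only: the four dimension names come from the paper's abstract, but the individual factor texts below are hypothetical placeholders, not the paper's actual 16 factors.

```python
# Sketch of assembling a structured prompt from the paper's four dimensions.
# Dimension names are from the abstract; factor wording is illustrative only.
DIMENSIONS = {
    "Context and Scope Control": [
        "Role: you are a network security analyst.",           # hypothetical factor
        "Scope: analyze only the flow records provided below.",
    ],
    "Evidence Grounding and Traceability": [
        "Cite the specific field and value supporting each inference.",
        "Do not introduce facts absent from the input.",
    ],
    "Reasoning Structure and Cognitive Control": [
        "Reason step by step; number each step.",
        "State your confidence for each step.",
    ],
    "Security-Specific Analytical Constraints": [
        "Classify the traffic as BENIGN or DDoS with a rationale.",
        "Flag any indicator consistent with SYN flooding explicitly.",
    ],
}

def build_structured_prompt(task_input: str) -> str:
    """Concatenate dimension headers and their factors ahead of the task input."""
    sections = []
    for dim, factors in DIMENSIONS.items():
        lines = "\n".join(f"- {f}" for f in factors)
        sections.append(f"## {dim}\n{lines}")
    return "\n\n".join(sections) + f"\n\n## Input\n{task_input}"

print(build_structured_prompt("proto=TCP, pkt_rate=9500/s, syn_ratio=0.97"))
```

The point of the explicit grouping, per the paper's claim, is that each dimension acts as a separate control surface, rather than one heuristically worded mega-prompt.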
If this is right
- Reasoning improvements reach up to 40 percent in smaller models.
- Accuracy gains hold steady across different model scales.
- Human evaluations show strong agreement on the enhanced reliability and explainability.
- The method serves as a lightweight way to make AI-driven security analysis more trustworthy and auditable.
Where Pith is reading between the lines
- The framework could be adapted for other domains requiring careful analytical reasoning, such as medical diagnosis or legal review.
- Future work might identify which subsets of the 16 factors contribute most to the observed benefits.
- Deploying this prompting method in real-time security monitoring systems could reduce false positives from AI hallucinations.
Load-bearing premise
That the performance improvements stem specifically from the structured 16-factor design rather than from using any carefully worded prompt in the same domain.
What would settle it
Running the same experiments but replacing the 16-factor structure with a comparably detailed unstructured prompt and measuring if gains disappear would test the necessity of the specific framework.
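That necessity test reduces to a simple paired comparison. A minimal sketch, assuming `query_model` stands in for any LLM call and the prompt templates are supplied by the experimenter; none of these names come from the paper.

```python
# Sketch of the proposed control: score the same model on the same examples
# under the structured prompt and a comparably detailed unstructured one.
def evaluate(query_model, prompt_template, examples):
    """Fraction of (features, label) examples the model labels correctly."""
    correct = 0
    for features, label in examples:
        prediction = query_model(prompt_template.format(input=features))
        correct += (prediction == label)
    return correct / len(examples)

def necessity_test(query_model, structured, unstructured_matched, examples):
    """If this gap collapses, the 16-factor structure is not doing the work."""
    acc_structured = evaluate(query_model, structured, examples)
    acc_matched = evaluate(query_model, unstructured_matched, examples)
    return acc_structured - acc_matched
```

In practice the gap would be tested with a paired statistical test across examples, not a single subtraction; the sketch only shows the experimental contrast.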
Original abstract
Chain-of-Thought (CoT) prompting has been used to enhance the reasoning capability of LLMs. However, its reliability in security-sensitive analytical tasks remains insufficiently examined, particularly under structured human evaluation. Alternative approaches such as model scaling and fine-tuning can help improve performance, but these methods are often costly, computationally intensive, or difficult to audit. In contrast, prompt engineering provides a lightweight, transparent, and controllable mechanism for guiding LLM reasoning. This study proposes a structured prompt engineering framework designed to strengthen CoT reasoning integrity while improving security threat and attack detection reliability in local LLM deployments. The framework includes 16 factors grouped into four core dimensions: (1) Context and Scope Control, (2) Evidence Grounding and Traceability, (3) Reasoning Structure and Cognitive Control, and (4) Security-Specific Analytical Constraints. Rather than optimizing the wording of the prompt heuristically, the framework introduces explicit reasoning controls to mitigate hallucination, prevent reasoning drift, and strengthen interpretability in security-sensitive contexts. Using DDoS attack detection in SDN traffic as a case study, multiple model families were evaluated under structured and unstructured prompting conditions. Pareto frontier analysis and ablation experiments demonstrate consistent reasoning improvements (up to 40% in smaller models) and stable accuracy gains across scales. Human evaluation with strong inter-rater agreement (Cohen's κ > 0.80) confirms robustness. The results establish structured prompting as an effective and practical approach for reliable and explainable AI-driven cybersecurity analysis.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a structured prompt engineering framework with 16 factors grouped into four dimensions (Context and Scope Control, Evidence Grounding and Traceability, Reasoning Structure and Cognitive Control, and Security-Specific Analytical Constraints) to improve Chain-of-Thought reasoning integrity in LLMs for security-sensitive tasks. Using DDoS attack detection in SDN traffic as a case study, it evaluates multiple model families under structured versus unstructured prompting, claiming reasoning improvements up to 40% in smaller models, stable accuracy gains across scales, and robust human evaluation with Cohen's κ > 0.80. Pareto frontier analysis and ablation experiments are invoked to support the framework as a lightweight, transparent alternative to model scaling or fine-tuning for reliable and explainable AI-driven cybersecurity analysis.
Significance. If the reported gains are shown to arise specifically from the 16-factor structure rather than from detailed prompting in general, the work would offer a practical contribution to prompt engineering in high-stakes domains by providing explicit controls against hallucination and reasoning drift. The inclusion of human evaluation with strong inter-rater agreement is a positive element that supports claims of robustness and interpretability.
major comments (3)
- [Abstract] The central claims of 'up to 40% in smaller models' reasoning improvements and 'stable accuracy gains across scales' are presented without any details on model sizes, exact baselines, data splits, metrics, or statistical tests. This absence prevents verification of the performance results and attribution to the proposed framework.
- [Evaluation] Evaluation section (Pareto frontier analysis and ablation experiments): The manuscript states that these analyses demonstrate consistent improvements, yet provides no description of the ablated components, the unstructured prompting baseline conditions, or quantitative results from the Pareto analysis. Without this, it is impossible to determine whether gains are due to the specific 16-factor, four-dimension structure or to any comparably detailed prompt.
- [Case Study] Case study and framework description: The evaluation is confined to a single DDoS-in-SDN scenario. The claim that the 16-factor structure itself strengthens reasoning integrity requires a control condition consisting of a prompt matched in length, domain specificity, and security detail but lacking the explicit four-dimensional grouping and reasoning controls; the current design does not isolate this mechanism.
minor comments (1)
- [Abstract] The abstract and introduction would benefit from explicit statements of the exact models evaluated and the precise definition of 'unstructured prompting conditions' to improve reproducibility.
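The Pareto frontier analysis the referee asks to see quantified has a concrete form: among (model size, accuracy) configurations, keep only those not dominated by a smaller or equal model with equal or higher accuracy. The sizes and accuracies below are illustrative, not results from the paper.

```python
# Sketch of a Pareto-frontier computation over (size_in_B_params, accuracy)
# pairs, where smaller size and higher accuracy are both preferred.
def pareto_frontier(points):
    """Return non-dominated (size, accuracy) points in ascending size order."""
    frontier = []
    for size, acc in sorted(points):            # ascending size, then accuracy
        if not frontier or acc > frontier[-1][1]:
            frontier.append((size, acc))        # strictly better accuracy at larger size
    return frontier

runs = [(7, 0.81), (13, 0.84), (13, 0.79), (70, 0.86), (70, 0.83)]
print(pareto_frontier(runs))  # → [(7, 0.81), (13, 0.84), (70, 0.86)]
```

Reporting the frontier for structured versus unstructured prompting side by side would make the claimed "stable accuracy gains across scales" directly inspectable.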
Simulated Author's Rebuttal
Thank you for the detailed review and constructive feedback on our manuscript. We appreciate the recognition of the human evaluation aspect and the potential contribution to prompt engineering in high-stakes domains. We address each major comment below and will incorporate revisions to strengthen the paper.
Point-by-point responses
-
Referee: [Abstract] The central claims of 'up to 40% in smaller models' reasoning improvements and 'stable accuracy gains across scales' are presented without any details on model sizes, exact baselines, data splits, metrics, or statistical tests. This absence prevents verification of the performance results and attribution to the proposed framework.
Authors: We agree that the abstract lacks specific details on these elements. In the revised version, we will incorporate information on the model sizes evaluated, the exact baselines employed (unstructured prompting), the data splits used, the metrics for reasoning improvements and accuracy, as well as the statistical tests performed. This will enable better verification and attribution of the results to the proposed framework.
Revision: yes
-
Referee: [Evaluation] Evaluation section (Pareto frontier analysis and ablation experiments): The manuscript states that these analyses demonstrate consistent improvements, yet provides no description of the ablated components, the unstructured prompting baseline conditions, or quantitative results from the Pareto analysis. Without this, it is impossible to determine whether gains are due to the specific 16-factor, four-dimension structure or to any comparably detailed prompt.
Authors: We will expand the Evaluation section to provide a full description of the ablated components in the ablation experiments, the conditions for the unstructured prompting baseline, and the quantitative outcomes from the Pareto frontier analysis. These additions will allow readers to assess whether the gains stem from the specific 16-factor structure.
Revision: yes
-
Referee: [Case Study] Case study and framework description: The evaluation is confined to a single DDoS-in-SDN scenario. The claim that the 16-factor structure itself strengthens reasoning integrity requires a control condition consisting of a prompt matched in length, domain specificity, and security detail but lacking the explicit four-dimensional grouping and reasoning controls; the current design does not isolate this mechanism.
Authors: We recognize that the single-scenario case study limits generalizability, and that a matched control prompt is needed to isolate the effect of the structured framework. In the revised manuscript, we will include an additional control condition with a prompt matched for length, domain specificity, and security detail but without the four-dimensional grouping and explicit reasoning controls. Comparative results will be presented to demonstrate the unique contribution of the 16-factor structure.
Revision: yes
Circularity Check
No significant circularity; empirical evaluation is self-contained
full rationale
The paper proposes a 16-factor structured prompting framework and evaluates it empirically on a DDoS-in-SDN case study using ablation experiments, Pareto frontier analysis, and human evaluation (Cohen's κ > 0.80). No equations, derivations, or first-principles claims are present that reduce reported accuracy or reasoning gains to quantities fitted inside the same experiment or to self-citations. The comparison is between structured and unstructured prompting conditions on held-out traffic data, with the central claims resting on observable experimental outcomes rather than tautological redefinitions or imported uniqueness theorems. This is the standard honest finding for an empirical prompt-engineering study.
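The inter-rater agreement statistic cited here, Cohen's κ, measures agreement between two raters beyond what chance would produce (κ > 0.80 is conventionally read as strong agreement). A minimal two-rater implementation; the example ratings in the test are illustrative, not the paper's data.

```python
# Cohen's kappa for two raters over the same items:
# kappa = (observed agreement - expected-by-chance agreement) / (1 - expected).
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a = Counter(rater_a)                    # per-label counts for rater A
    freq_b = Counter(rater_b)                    # per-label counts for rater B
    labels = set(rater_a) | set(rater_b)
    expected = sum(freq_a[l] * freq_b[l] for l in labels) / (n * n)
    return (observed - expected) / (1 - expected)
```

With more than two raters, a generalization such as Fleiss' kappa would be used instead; the two-rater form above matches the pairwise statistic the review cites.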
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLMs can follow explicit structural instructions to reduce reasoning drift and hallucination in security analysis
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction (unclear)
Unclear relation between the paper passage and the cited Recognition theorem.
The framework includes 16 factors grouped into four core dimensions: (1) Context and Scope Control, (2) Evidence Grounding and Traceability, (3) Reasoning Structure and Cognitive Control, and (4) Security-Specific Analytical Constraints.
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (unclear)
Unclear relation between the paper passage and the cited Recognition theorem.
Pareto frontier analysis and ablation experiments demonstrate consistent reasoning improvements (up to 40% in smaller models)
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.