
arXiv: 2512.15791 · v1 · submitted 2025-12-16 · 💻 cs.CY · cs.AI · cs.CL

Evaluation of AI Ethics Tools in Language Models: A Developers' Perspective Case Stud

Pith reviewed 2026-05-16 22:34 UTC · model grok-4.3

classification 💻 cs.CY · cs.AI · cs.CL
keywords AI ethics tools · language models · Portuguese language · developers perspective · Model Cards · ALTAI · FactSheets · Harms Modeling

The pith

AI ethics tools guide developers on general language model issues but miss Portuguese-specific harms like idiomatic effects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper surveys 213 AI ethics tools, narrows to four (Model Cards, ALTAI, FactSheets, Harms Modeling), and applies them to Portuguese language models through 35 hours of developer interviews. It concludes these tools offer a useful starting framework for broad ethical questions yet overlook model-unique features such as idiomatic expressions and fail to surface negative impacts on the Portuguese language. A reader would care because many AI systems now serve non-English users, and incomplete tools could leave cultural and linguistic risks unaddressed during development.

Core claim

The applied AIETs serve as a guide for formulating general ethical considerations about language models. However, they do not address unique aspects of these models, such as idiomatic expressions. Additionally, these AIETs did not help to identify potential negative impacts of models for the Portuguese language.

What carries the argument

Selection of four AIETs after screening 213 publications, followed by their direct application to Portuguese language models and structured interviews with developers to rate usefulness and gaps.
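
The screening step described above can be pictured as a filter over candidate tools. The following is a minimal sketch only: the paper's actual inclusion and exclusion criteria are not given here, so the three criteria, the `Tool` fields, and the two "Hypothetical Tool" entries are illustrative assumptions, not the authors' method.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Tool:
    """A candidate AI ethics tool. All fields are hypothetical screening attributes."""
    name: str
    has_documentation: bool
    applicable_to_language_models: bool
    publicly_available: bool


# Hypothetical inclusion criteria; the paper's real criteria are not stated in this summary.
CRITERIA: List[Callable[[Tool], bool]] = [
    lambda t: t.has_documentation,
    lambda t: t.applicable_to_language_models,
    lambda t: t.publicly_available,
]


def screen(tools: List[Tool], criteria: List[Callable[[Tool], bool]]) -> List[Tool]:
    """Keep only the tools that satisfy every criterion."""
    return [t for t in tools if all(rule(t) for rule in criteria)]


# Toy candidate pool; attribute values are invented for illustration.
candidates = [
    Tool("Model Cards", True, True, True),
    Tool("Hypothetical Tool A", False, True, True),
    Tool("Hypothetical Tool B", True, False, True),
]

selected = screen(candidates, CRITERIA)
print([t.name for t in selected])
```

Expressing each criterion as a separate predicate keeps the screening auditable: a reader can see which rule excluded which tool, which is the kind of operational detail the referee report asks for.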

If this is right

  • Developers working on language models in specific languages will need supplementary methods to catch harms not covered by current AIETs.
  • AI ethics tools require updates to incorporate checks for idiomatic expressions and cultural or linguistic nuances.
  • Evaluations of AIETs should routinely include developers from underrepresented languages rather than relying on general-purpose testing.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Gaps found here may appear in other low-resource languages, suggesting standard tools favor dominant-language assumptions.
  • Tool creators could test revised versions by repeating the developer-interview process with new language-specific prompts.

Load-bearing premise

That interviews with the developers of these specific Portuguese models, using only the four chosen tools, provide enough evidence to judge how well AI ethics tools work for language models in general.

What would settle it

A follow-up study that applies the same four tools to English or other language models and finds they successfully surface unique aspects and negative impacts would contradict the reported limitations.

Original abstract

In Artificial Intelligence (AI), language models have gained significant importance due to the widespread adoption of systems capable of simulating realistic conversations with humans through text generation. Because of their impact on society, developing and deploying these language models must be done responsibly, with attention to their negative impacts and possible harms. In this scenario, the number of AI Ethics Tools (AIETs) publications has recently increased. These AIETs are designed to help developers, companies, governments, and other stakeholders establish trust, transparency, and responsibility with their technologies by bringing accepted values to guide AI's design, development, and use stages. However, many AIETs lack good documentation, examples of use, and proof of their effectiveness in practice. This paper presents a methodology for evaluating AIETs in language models. Our approach involved an extensive literature survey on 213 AIETs, and after applying inclusion and exclusion criteria, we selected four AIETs: Model Cards, ALTAI, FactSheets, and Harms Modeling. For evaluation, we applied AIETs to language models developed for the Portuguese language, conducting 35 hours of interviews with their developers. The evaluation considered the developers' perspective on the AIETs' use and quality in helping to identify ethical considerations about their model. The results suggest that the applied AIETs serve as a guide for formulating general ethical considerations about language models. However, we note that they do not address unique aspects of these models, such as idiomatic expressions. Additionally, these AIETs did not help to identify potential negative impacts of models for the Portuguese language.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper surveys 213 AI Ethics Tools (AIETs), applies inclusion/exclusion criteria to select four (Model Cards, ALTAI, FactSheets, Harms Modeling), applies them to Portuguese-language models, and reports findings from 35 hours of developer interviews. It claims these tools guide only general ethical considerations and fail to address unique aspects such as idiomatic expressions or negative impacts specific to the Portuguese language.

Significance. If the findings are substantiated with transparent methods, the work would usefully map the AIET landscape via the 213-tool survey and provide practical developer perspectives on tool limitations for non-English models, highlighting the need for linguistically and culturally tailored ethics guidance.

major comments (2)
  1. [Methods] Methods: No details are supplied on the interview protocol, participant count or selection, exact procedure for applying each of the four AIETs to the models, qualitative analysis approach (coding scheme, saturation criteria), or how exclusion criteria were operationalized on the 213 tools. This absence directly undermines verifiability of the central claim that the tools failed to surface Portuguese-specific issues.
  2. [Results] Results/Discussion: The claim that the AIETs 'did not help to identify potential negative impacts of models for the Portuguese language' rests entirely on unverified developer self-reports. No cross-validation against model outputs, external audits, or documented harms is presented, making the evidence load-bearing for the failure conclusion but insufficiently grounded.
minor comments (1)
  1. [Title] Title is truncated ('Case Stud'); complete to 'Case Study'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive comments. We address each major point below and will revise the manuscript to improve transparency and qualify our claims where appropriate.

  1. Referee: [Methods] Methods: No details are supplied on the interview protocol, participant count or selection, exact procedure for applying each of the four AIETs to the models, qualitative analysis approach (coding scheme, saturation criteria), or how exclusion criteria were operationalized on the 213 tools. This absence directly undermines verifiability of the central claim that the tools failed to surface Portuguese-specific issues.

    Authors: We agree that the current Methods section lacks sufficient detail for full verifiability. In the revised manuscript we will expand it to describe the semi-structured interview protocol, the number of developer participants and their selection criteria, the step-by-step procedure for applying each of the four AIETs, the qualitative coding scheme and saturation criteria used, and the precise operationalization of the inclusion/exclusion criteria applied to the 213 tools. revision: yes

  2. Referee: [Results] Results/Discussion: The claim that the AIETs 'did not help to identify potential negative impacts of models for the Portuguese language' rests entirely on unverified developer self-reports. No cross-validation against model outputs, external audits, or documented harms is presented, making the evidence load-bearing for the failure conclusion but insufficiently grounded.

    Authors: The study is framed from the developers' perspective, so developer self-reports constitute the primary data. We acknowledge the absence of external validation as a limitation. In the revision we will add an explicit discussion of this limitation, include additional illustrative quotes from the interviews showing Portuguese-specific concerns raised by developers, and qualify the relevant claims to reflect that they are based on developer perceptions rather than independent verification. revision: partial

Circularity Check

0 steps flagged

No significant circularity; claims rest on external survey and interviews


The paper conducts a literature survey of 213 AIETs, applies inclusion/exclusion criteria to select four tools, and evaluates them via 35 hours of developer interviews on Portuguese-language models. All central claims (that the tools guide only general ethics and miss idiomatic expressions or Portuguese-specific impacts) are presented as direct outcomes of this empirical process. No equations, fitted parameters, self-definitional reductions, or load-bearing self-citations appear in the abstract or described methodology. The derivation chain is self-contained against external benchmarks and does not reduce any result to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that the selected tools are representative after screening and that interview data validly measures tool effectiveness for language-specific issues.

axioms (1)
  • Domain assumption: the four selected AIETs are representative of tools that should address language-specific ethical concerns.
    Stated in the abstract, after inclusion/exclusion criteria were applied to the 213 tools.

pith-pipeline@v0.9.0 · 5618 in / 1156 out tokens · 62148 ms · 2026-05-16T22:34:03.956172+00:00 · methodology

