Counting Worlds Branching Time Semantics for post-hoc Bias Mitigation in generative AI

Alessandro G. Buda; Giuseppe Primiero; Leonardo Ceragioli; Melissa Antonelli

arxiv: 2604.19431 · v1 · submitted 2026-04-21 · 💻 cs.LO · cs.AI

Counting Worlds Branching Time Semantics for post-hoc Bias Mitigation in generative AI

Alessandro G. Buda , Giuseppe Primiero , Leonardo Ceragioli , Melissa Antonelli This is my paper

Pith reviewed 2026-05-10 01:38 UTC · model grok-4.3

classification 💻 cs.LO cs.AI

keywords branching-time logicbias mitigationgenerative AIfairnessmodal logiccounting semanticspost-hoc bias mitigation

0 comments

The pith

A new logic counts possible AI outputs as worlds to verify and fix bias in generations

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops CTLF, a branching-time logic for analyzing bias in generative AI output sequences. Each possible output is treated as a distinct world in a tree of branching possibilities. Modal operators check compliance with target probability distributions over protected attributes, forecast if future generations will stay fair, and calculate the exact number of outputs to drop to correct bias. The framework is tested on a simple example of biased image generation to show how fairness can be expressed formally at different stages.

Core claim

CTLF adopts a counting worlds semantics where each world represents a possible output at a given step in the generation process and introduces modal operators that allow us to verify whether the current output series respects an intended probability distribution over a protected attribute, to predict the likelihood of remaining within acceptable bounds as new outputs are generated, and to determine how many outputs are needed to remove in order to restore fairness. The framework is illustrated on a toy example of biased image generation.

What carries the argument

Counting worlds semantics for branching-time logic, using modal operators to verify probability distributions, predict bound adherence, and compute removals for fairness restoration.

Load-bearing premise

The actual generative AI process can be modeled faithfully as a branching-time structure with countable worlds at each step and that the modal operators accurately capture and enforce the intended fairness properties outside of toy examples.

What would settle it

An empirical test where CTLF is applied to outputs from a real generative model, and either the fairness verification fails to match observed data or the recommended removals do not achieve the target distribution.

Figures

Figures reproduced from arXiv: 2604.19431 by Alessandro G. Buda, Giuseppe Primiero, Leonardo Ceragioli, Melissa Antonelli.

**Figure 2.** Figure 2: A Subset of the Output at the fourth item in the series. [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: A mitigated Output subset. In the following example we will refer to odds for the sake of readability, however it can be easily extended to frequencies and/or probabilities. Consider again [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗

**Figure 4.** Figure 4: A model representing 6 generations of images on the prompt [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

read the original abstract

Generative AI systems are known to amplify biases present in their training data. While several inference-time mitigation strategies have been proposed, they remain largely empirical and lack formal guarantees. In this paper we introduce CTLF, a branching-time logic designed to reason about bias in series of generative AI outputs. CTLF adopts a counting worlds semantics where each world represents a possible output at a given step in the generation process and introduces modal operators that allow us to verify whether the current output series respects an intended probability distribution over a protected attribute, to predict the likelihood of remaining within acceptable bounds as new outputs are generated, and to determine how many outputs are needed to remove in order to restore fairness. We illustrate the framework on a toy example of biased image generation, showing how CTLF formulas can express concrete fairness properties at different points in the output series.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

CTLF sets up a branching-time logic with counting worlds to express bias verification, prediction, and correction in generative output sequences, but the uniform counting semantics looks likely to mismatch the non-uniform probabilities that real models actually produce.

read the letter

The paper introduces CTLF as a modal logic over branching time where each world is a possible output at a generation step. The operators are meant to check whether the current sequence respects a target distribution over a protected attribute, to forecast whether future outputs will stay inside fairness bounds, and to compute how many outputs to drop to restore compliance. The toy image-generation example shows the formulas in action at different points in a sequence. That combination of branching time and counting for post-hoc fairness properties on output streams is new relative to the cited work, and the setup gives a clean way to write down these checks instead of leaving them purely empirical. The illustration helps make the intended use cases concrete. The central soft spot is the semantics. Counting worlds treats outputs as equiprobable by cardinality, yet generative models assign sharply different probabilities via softmax, diffusion schedules, or other mechanisms. Without explicit weighting by the model's actual generation measure, the probability-style operators will not track the intended distribution on real systems. The abstract and example give no sign that weighting is built in, so the stress-test concern holds up. The paper would also be stronger with explicit definitions of the operators, proofs that they deliver the claimed properties, and at least one non-toy evaluation. This is aimed at people working on formal methods for AI fairness or safety who already know modal logic. A reader could pull the operator definitions and test them on their own bias scenarios. It deserves peer review because the formal framing is worth checking and refining, even if the probability modeling needs work.

Referee Report

2 major / 1 minor

Summary. The paper introduces CTLF, a branching-time logic employing counting-worlds semantics in which each world represents a possible generative output at a given step. It defines modal operators intended to verify whether an output series respects a target probability distribution over a protected attribute, to predict the likelihood of remaining within fairness bounds as generation continues, and to compute the number of outputs that must be removed to restore compliance. These capabilities are illustrated via a toy example of biased image generation.

Significance. If the semantics and operators can be shown to correctly encode the intended fairness properties with respect to the actual (non-uniform) generative distribution, the framework would supply the first formal, post-hoc verification tool for bias mitigation in sequential generative outputs, replacing purely empirical strategies with checkable guarantees.

major comments (2)

[Abstract / Semantics] Abstract and semantics section: the counting-worlds semantics is described as using cardinality ratios to evaluate probability operators, yet generative models induce non-uniform measures (softmax, diffusion schedules, etc.). No reduction to the model's actual probability measure is indicated, so the operators cannot correctly verify or predict respect for an intended distribution; this is load-bearing for all three claimed capabilities.
[Abstract] Abstract: the manuscript states that CTLF supplies modal operators with the listed verification, prediction, and removal properties but provides neither the syntax, the satisfaction relation, nor any proof that the operators achieve these properties. Without these definitions the central claims cannot be checked.

minor comments (1)

[Toy example] The toy example would be clearer if it included explicit CTLF formulas together with the step-by-step evaluation under the counting-worlds semantics.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and for identifying key areas where the formal foundations of CTLF require clarification and strengthening. We address each major comment below and outline the revisions we will undertake.

read point-by-point responses

Referee: [Abstract / Semantics] Abstract and semantics section: the counting-worlds semantics is described as using cardinality ratios to evaluate probability operators, yet generative models induce non-uniform measures (softmax, diffusion schedules, etc.). No reduction to the model's actual probability measure is indicated, so the operators cannot correctly verify or predict respect for an intended distribution; this is load-bearing for all three claimed capabilities.

Authors: We agree that this distinction is essential. The submitted manuscript defines the counting-worlds semantics via unweighted cardinality ratios, which corresponds to a uniform measure. To support the non-uniform distributions produced by generative models, we will revise the semantics section to introduce a weighted counting-worlds model. Each world will be equipped with a probability weight taken directly from the generative process (e.g., token probabilities or diffusion-step likelihoods). The modal operators will then be re-defined with respect to this weighted measure, and we will supply a formal reduction together with a proof that the verification, prediction, and removal operators correctly encode the intended fairness properties under the actual distribution. The toy example will be updated to use non-uniform probabilities. These changes will be incorporated in the revised manuscript. revision: yes
Referee: [Abstract] Abstract: the manuscript states that CTLF supplies modal operators with the listed verification, prediction, and removal properties but provides neither the syntax, the satisfaction relation, nor any proof that the operators achieve these properties. Without these definitions the central claims cannot be checked.

Authors: We acknowledge that the submitted version does not present the syntax, satisfaction relation, or proofs in sufficient detail for the claims to be independently verified. In the revision we will add a dedicated subsection (or expand Section 3) that gives the full syntax of CTLF, defines the satisfaction relation for each modal operator, and includes complete proofs that the operators realize the stated verification, prediction, and removal capabilities under the (revised, weighted) semantics. A brief summary of the syntax and satisfaction clauses will also be added to the abstract to guide readers to the formal material. revision: yes

Circularity Check

0 steps flagged

No circularity: CTLF is a definitional semantics for fairness properties

full rationale

The paper introduces CTLF as a new branching-time logic whose counting-worlds semantics and modal operators are explicitly defined to capture verification of probability distributions over protected attributes, prediction of future compliance, and computation of output removals needed for fairness. This is a standard definitional construction of a formal system rather than any derivation that reduces to its own fitted inputs or self-referential equations. No load-bearing self-citations, no parameters fitted to data then renamed as predictions, and no uniqueness theorems imported from prior author work appear in the provided abstract or description. The toy example simply illustrates the defined operators. The framework is self-contained against external benchmarks as a proposed logic; any mismatch with non-uniform generative probabilities is an assumption or modeling question, not a circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central contribution rests on the introduction of the new CTLF logic itself. The main supporting assumption is that generative AI output sequences can be modeled as branching time with countable worlds; no free parameters or external evidence for the new entities are supplied in the abstract.

axioms (1)

domain assumption Generative AI output series can be modeled as branching time structures with countable worlds at each step.
This underpins the counting-worlds semantics described in the abstract.

invented entities (1)

CTLF logic with counting worlds semantics and modal operators no independent evidence
purpose: To verify probability distributions, predict bias bounds, and compute outputs to remove for fairness.
Newly introduced framework without independent evidence supplied in the abstract.

pith-pipeline@v0.9.0 · 5447 in / 1425 out tokens · 66658 ms · 2026-05-10T01:38:20.978343+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · 1 internal anchor

[1]

Antonelli, L

M. Antonelli, L. Ceragioli, A. Buda, and G. Primiero. A linear temporal logic of frequencies on series of events, 2026

work page 2026
[2]

Antonelli, U

M. Antonelli, U. Dal Lago, and P. Pistone. On counting propositional logic and Wag- ner’s hierarchy.Theoretical Computer Science, 966-967, 2023

work page 2023
[3]

Antonelli, U

M. Antonelli, U. Dal Lago, and P. Pistone. Towards logical foundations for probabilistic computation.Annals of Pure and Applied Logic, 175(9), 2024

work page 2024
[4]

S. G. Ayyamperumal and L. Ge. Current state of LLM risks and AI guardrails. 2024

work page 2024
[5]

Bacchus.Representing and Reasoning with Probabilistic Knowledge A Logical Ap- proach to Probabilities

F. Bacchus.Representing and Reasoning with Probabilistic Knowledge A Logical Ap- proach to Probabilities. The MIT Press, 1990

work page 1990
[6]

S. L. Blodgett et al. Bias and fairness in large language models: A survey.Computa- tional Linguistics, 2024

work page 2024
[7]

A. G. Buda, G. Coraglia, F. A. Genco, C. Manganini, and G. Primiero. Bias amplifi- cation chains in ml-based systems with an application to credit scoring, Jan 2023

work page 2023
[8]

A. G. Buda, G. Coraglia, F. A. Genco, C. Manganini, and G. Primiero. Bias amplifica- tion chains in ml-based systems with an application to credit scoring. In G. Coraglia, F. A. D’Asaro, A. Dyoub, F. A. Lisi, and G. Primiero, editors,Proceedings of the 3rd Workshop on Bias, Ethical AI, Explainability and the role of Logic and Logic Program- ming co-locate...

work page 2024
[9]

A. G. Buda, G. Coraglia, F. A. Genco, C. Manganini, and G. Primiero. Assessing the risk of discrimination with BRIO: A use case from the financial sector. In D. Pedreschi, M. Milano, I. Tiddi, S. Russell, C. Boldrini, L. Pappalardo, A. Passerini, and S. Wang, editors,Proceedings of the 4th International Conference on Hybrid Human-Artificial Intelligence, ...

work page 2025
[10]

A. G. Buda and G. Primiero. A logic for using information.Logique Et Analyse, 265(n/a):59–103, 2025

work page 2025
[11]

Ceragioli and G

L. Ceragioli and G. Primiero. Trustworthiness preservation by copies of machine learn- ing systems.Int. J. Approx. Reason., 192:109638, 2026

work page 2026
[12]

T. Chen, G. Primiero, F. Raimondi, and N. Rungta. A computationally grounded, weighted doxastic logic.Stud Logica, 104(4):679–703, 2016. 10

work page 2016
[13]

Cheng, R

X. Cheng, R. Chen, H. Zan, Y. Jia, and M. Peng. Biasfilter: An inference-time debiasing framework for large language models.arXiv preprint arXiv:2505.23829, 2025

work page arXiv 2025
[14]

Coraglia, F

G. Coraglia, F. A. D’Asaro, F. A. Genco, D. Giannuzzi, D. Posillipo, G. Primiero, and C. Quaggio. Brioxalkemy: a bias detecting tool. In G. Boella, F. A. D’Asaro, A. Dyoub, L. Gorrieri, F. A. Lisi, C. Manganini, and G. Primiero, editors,Proceedings of the 2nd Workshop on Bias, Ethical AI, Explainability and the role of Logic and Logic Programming co-locat...

work page 2023
[15]

F. A. D’Asaro, F. A. Genco, and G. Primiero. Checking trustworthiness of probabilistic computations in a typed natural deduction system.J. Log. Comput., 35(6), 2025

work page 2025
[16]

Fagin, J

R. Fagin, J. Halpern, and N. Megiddo. A logic for reasoning about probabilities.In- formation and Computation, 87(1-2):78–128, 1990

work page 1990
[17]

Y. Guo, M. Guo, J. Su, Z. Yang, and M. Zhu. Bias in large language models: Origin, evaluation, and mitigation.arXiv preprint arXiv:2411.10915, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[18]

S. Han, S. Avestimehr, and C. He. Bridging the safety gap: A guardrail pipeline for trustworthy llm inferences. 2025

work page 2025
[19]

Haque, D

F. Haque, D. Xu, and X. Niu. A comprehensive survey on bias and fairness in large lan- guage models. In S. Yuan, F. Malliaros, and X. Zheng, editors,Trends and Applications in Knowledge Discovery and Data Mining, pages 83–101. Springer Nature Singapore, Singapore, 2025

work page 2025
[20]

Hirota, Y

Y. Hirota, Y. Nakashima, and N. Garcia. Quantifying societal bias amplification in image captioning. In2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 13440–13449, 2022

work page 2022
[21]

Hutschenreiter, C

L. Hutschenreiter, C. Baier, and J. Klein. Parametric markov chains: Pctl complexity and fraction-free gaussian elimination.Electronic Proceedings in Theoretical Computer Science, 256:16–30, Sept. 2017

work page 2017
[22]

Jansen, S

N. Jansen, S. Junges, and J.-P. Katoen.Parameter Synthesis in Markov Models: A Gentle Survey, pages 407–437. Springer Nature Switzerland, Cham, 2022

work page 2022
[23]

Kiashemshaki, M

K. Kiashemshaki, M. J. Torkamani, N. Mahmoudi, and M. S. Bilehsavar. Simulating a bias mitigation scenario in large language models. 2025

work page 2025
[24]

Kontinen

J. Kontinen. A logical characterization of the counting hierarchy.ACM Trans. Comput. Log., 10(1):7:1–7:21, 2009

work page 2009
[25]

Kubyshkina and G

E. Kubyshkina and G. Primiero. A possible worlds semantics for trustworthy non- deterministic computations.Int. J. Approx. Reason., 172:109212, 2024

work page 2024
[26]

Legastelois, M.-J

B. Legastelois, M.-J. Lesot, and A. R. d’Allonnes. Typology of axioms for a weighted modal logic.International Journal of Approximate Reasoning, 90:341–358, 2017

work page 2017
[27]

L. Lin, L. Wang, J. Guo, and K.-F. Wong. Investigating bias in llm-based bias detection: Disparities between llms and human perception. InProceedings of COLING, 2025. 11

work page 2025
[28]

Maaz and T

M. Maaz and T. C. Y. Chan. Formal verification of markov processes with learned parameters, 2025

work page 2025
[29]

Monteiro Paes et al

L. Monteiro Paes et al. Direct steering optimization for bias mitigation.arXiv preprint arXiv:2512.15926, 2025

work page arXiv 2025
[30]

Moskowakis

A. Moskowakis. On a generalization of quantifiers.Fundamenta Mathematicae, 1957

work page 1957
[31]

N. J. Nilsson. Probabilistic logic.Artificial Intelligence, 28(1):71–87, 1986

work page 1986
[32]

Seshadri, S

P. Seshadri, S. Singh, and Y. Elazar. The bias amplification paradox in text-to-image generation, 2023

work page 2023
[33]

Siddique, I

Z. Siddique, I. Khalid, et al. Shifting perspectives: Steering vector ensembles for robust bias mitigation in llms.arXiv preprint arXiv:2503.05371, 2025

work page arXiv 2025
[34]

A. A. Syed. Guardrails for large language models: A review of techniques and challenges. URF Journals, 2025

work page 2025
[35]

K. W. Wagner. The complexity of combinatorial problems with succinct input repre- sentation.Acta informatica, 23(3):325–356, 1986

work page 1986
[36]

H. Zhou, Z. Feng, Z. Zhu, J. Qian, and K. Mao. Unibias: unveiling and mitigating llm bias through internal attention and ffn manipulation. 2024. 12

work page 2024

[1] [1]

Antonelli, L

M. Antonelli, L. Ceragioli, A. Buda, and G. Primiero. A linear temporal logic of frequencies on series of events, 2026

work page 2026

[2] [2]

Antonelli, U

M. Antonelli, U. Dal Lago, and P. Pistone. On counting propositional logic and Wag- ner’s hierarchy.Theoretical Computer Science, 966-967, 2023

work page 2023

[3] [3]

Antonelli, U

M. Antonelli, U. Dal Lago, and P. Pistone. Towards logical foundations for probabilistic computation.Annals of Pure and Applied Logic, 175(9), 2024

work page 2024

[4] [4]

S. G. Ayyamperumal and L. Ge. Current state of LLM risks and AI guardrails. 2024

work page 2024

[5] [5]

Bacchus.Representing and Reasoning with Probabilistic Knowledge A Logical Ap- proach to Probabilities

F. Bacchus.Representing and Reasoning with Probabilistic Knowledge A Logical Ap- proach to Probabilities. The MIT Press, 1990

work page 1990

[6] [6]

S. L. Blodgett et al. Bias and fairness in large language models: A survey.Computa- tional Linguistics, 2024

work page 2024

[7] [7]

A. G. Buda, G. Coraglia, F. A. Genco, C. Manganini, and G. Primiero. Bias amplifi- cation chains in ml-based systems with an application to credit scoring, Jan 2023

work page 2023

[8] [8]

A. G. Buda, G. Coraglia, F. A. Genco, C. Manganini, and G. Primiero. Bias amplifica- tion chains in ml-based systems with an application to credit scoring. In G. Coraglia, F. A. D’Asaro, A. Dyoub, F. A. Lisi, and G. Primiero, editors,Proceedings of the 3rd Workshop on Bias, Ethical AI, Explainability and the role of Logic and Logic Program- ming co-locate...

work page 2024

[9] [9]

A. G. Buda, G. Coraglia, F. A. Genco, C. Manganini, and G. Primiero. Assessing the risk of discrimination with BRIO: A use case from the financial sector. In D. Pedreschi, M. Milano, I. Tiddi, S. Russell, C. Boldrini, L. Pappalardo, A. Passerini, and S. Wang, editors,Proceedings of the 4th International Conference on Hybrid Human-Artificial Intelligence, ...

work page 2025

[10] [10]

A. G. Buda and G. Primiero. A logic for using information.Logique Et Analyse, 265(n/a):59–103, 2025

work page 2025

[11] [11]

Ceragioli and G

L. Ceragioli and G. Primiero. Trustworthiness preservation by copies of machine learn- ing systems.Int. J. Approx. Reason., 192:109638, 2026

work page 2026

[12] [12]

T. Chen, G. Primiero, F. Raimondi, and N. Rungta. A computationally grounded, weighted doxastic logic.Stud Logica, 104(4):679–703, 2016. 10

work page 2016

[13] [13]

Cheng, R

X. Cheng, R. Chen, H. Zan, Y. Jia, and M. Peng. Biasfilter: An inference-time debiasing framework for large language models.arXiv preprint arXiv:2505.23829, 2025

work page arXiv 2025

[14] [14]

Coraglia, F

G. Coraglia, F. A. D’Asaro, F. A. Genco, D. Giannuzzi, D. Posillipo, G. Primiero, and C. Quaggio. Brioxalkemy: a bias detecting tool. In G. Boella, F. A. D’Asaro, A. Dyoub, L. Gorrieri, F. A. Lisi, C. Manganini, and G. Primiero, editors,Proceedings of the 2nd Workshop on Bias, Ethical AI, Explainability and the role of Logic and Logic Programming co-locat...

work page 2023

[15] [15]

F. A. D’Asaro, F. A. Genco, and G. Primiero. Checking trustworthiness of probabilistic computations in a typed natural deduction system.J. Log. Comput., 35(6), 2025

work page 2025

[16] [16]

Fagin, J

R. Fagin, J. Halpern, and N. Megiddo. A logic for reasoning about probabilities.In- formation and Computation, 87(1-2):78–128, 1990

work page 1990

[17] [17]

Y. Guo, M. Guo, J. Su, Z. Yang, and M. Zhu. Bias in large language models: Origin, evaluation, and mitigation.arXiv preprint arXiv:2411.10915, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[18] [18]

S. Han, S. Avestimehr, and C. He. Bridging the safety gap: A guardrail pipeline for trustworthy llm inferences. 2025

work page 2025

[19] [19]

Haque, D

F. Haque, D. Xu, and X. Niu. A comprehensive survey on bias and fairness in large lan- guage models. In S. Yuan, F. Malliaros, and X. Zheng, editors,Trends and Applications in Knowledge Discovery and Data Mining, pages 83–101. Springer Nature Singapore, Singapore, 2025

work page 2025

[20] [20]

Hirota, Y

Y. Hirota, Y. Nakashima, and N. Garcia. Quantifying societal bias amplification in image captioning. In2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 13440–13449, 2022

work page 2022

[21] [21]

Hutschenreiter, C

L. Hutschenreiter, C. Baier, and J. Klein. Parametric markov chains: Pctl complexity and fraction-free gaussian elimination.Electronic Proceedings in Theoretical Computer Science, 256:16–30, Sept. 2017

work page 2017

[22] [22]

Jansen, S

N. Jansen, S. Junges, and J.-P. Katoen.Parameter Synthesis in Markov Models: A Gentle Survey, pages 407–437. Springer Nature Switzerland, Cham, 2022

work page 2022

[23] [23]

Kiashemshaki, M

K. Kiashemshaki, M. J. Torkamani, N. Mahmoudi, and M. S. Bilehsavar. Simulating a bias mitigation scenario in large language models. 2025

work page 2025

[24] [24]

Kontinen

J. Kontinen. A logical characterization of the counting hierarchy.ACM Trans. Comput. Log., 10(1):7:1–7:21, 2009

work page 2009

[25] [25]

Kubyshkina and G

E. Kubyshkina and G. Primiero. A possible worlds semantics for trustworthy non- deterministic computations.Int. J. Approx. Reason., 172:109212, 2024

work page 2024

[26] [26]

Legastelois, M.-J

B. Legastelois, M.-J. Lesot, and A. R. d’Allonnes. Typology of axioms for a weighted modal logic.International Journal of Approximate Reasoning, 90:341–358, 2017

work page 2017

[27] [27]

L. Lin, L. Wang, J. Guo, and K.-F. Wong. Investigating bias in llm-based bias detection: Disparities between llms and human perception. InProceedings of COLING, 2025. 11

work page 2025

[28] [28]

Maaz and T

M. Maaz and T. C. Y. Chan. Formal verification of markov processes with learned parameters, 2025

work page 2025

[29] [29]

Monteiro Paes et al

L. Monteiro Paes et al. Direct steering optimization for bias mitigation.arXiv preprint arXiv:2512.15926, 2025

work page arXiv 2025

[30] [30]

Moskowakis

A. Moskowakis. On a generalization of quantifiers.Fundamenta Mathematicae, 1957

work page 1957

[31] [31]

N. J. Nilsson. Probabilistic logic.Artificial Intelligence, 28(1):71–87, 1986

work page 1986

[32] [32]

Seshadri, S

P. Seshadri, S. Singh, and Y. Elazar. The bias amplification paradox in text-to-image generation, 2023

work page 2023

[33] [33]

Siddique, I

Z. Siddique, I. Khalid, et al. Shifting perspectives: Steering vector ensembles for robust bias mitigation in llms.arXiv preprint arXiv:2503.05371, 2025

work page arXiv 2025

[34] [34]

A. A. Syed. Guardrails for large language models: A review of techniques and challenges. URF Journals, 2025

work page 2025

[35] [35]

K. W. Wagner. The complexity of combinatorial problems with succinct input repre- sentation.Acta informatica, 23(3):325–356, 1986

work page 1986

[36] [36]

H. Zhou, Z. Feng, Z. Zhu, J. Qian, and K. Mao. Unibias: unveiling and mitigating llm bias through internal attention and ffn manipulation. 2024. 12

work page 2024