Research

Bias is the Foundation

Why every fairness claim begins with a bias diagnosis, and why skipping it breaks everything downstream.

Executive summary

Fairness in AI is not a feature you bolt on. It is a property that emerges when bias is diagnosed, measured, and addressed at every layer of the pipeline. This report establishes five foundational principles: bias is the origin of every unfair outcome; fairness itself is abstract while bias is measurable; diagnosis must precede treatment; the intervention order is fixed (bias, then metric, then intervention); and bias is the load-bearing layer on which all trust, governance, and compliance ultimately rest.

It then maps how bias enters at each stage of the AI lifecycle (data collection, modelling, and human review) before explaining why choosing a fairness objective is a normative decision, not a data decision. The report closes with the insight that no universal fairness lens exists: what counts as fair is determined by the harm at stake, not by the mathematics, and cross-domain AI systems inherit cross-domain obligations.

A bias you can see on a shelf

Line drawing of the same shelf, abstracting the gendered pricing difference between Scrub Daddy and Scrub Mommy. — Illustrative real-world bias: a male-coded product priced above its female-coded variant, for the same job. Hover to abstract it. Photo courtesy of Katrina Marshall Dyrting (via LinkedIn).

Before bias is a statistic in a dataset, it is a fact about the world. Take the two near-identical cleaning sponges shown here. The original is sold as “Scrub Daddy”; beside it sits “Scrub Mommy,” a variant of the same product line. They do the same job, yet the male-coded original is priced higher than the female-coded version.

Two signals are embedded here at once. The “Mommy” product is positioned as a derivative of the “Daddy” brand, framed as the secondary version rather than an equal, and it is priced lower for doing identical work. A difference in label, not in function, becomes a difference in status and price. It is the shape of the gender wage gap, rendered in retail packaging.

This is an illustrative example, not a claim about any company’s intent. The point is structural: bias does not originate in a model. It is already present in the world the model learns from, in prices, in labels, in who is treated as the default and who as the variant. An AI system trained on a world like this inherits its assumptions unless someone deliberately diagnoses and corrects them. That is why fairness work begins not with the model, but with the bias that precedes it.

1. The five axioms of bias

Before any metric is chosen, any model is trained, or any mitigation is applied, five structural truths about bias must be understood. These are not recommendations. They are the load-bearing premises on which every subsequent fairness decision rests. Violating any one of them does not merely weaken an audit; it renders the audit meaningless.

01
Bias is the origin, not the symptom
Every unfair outcome traces back to an upstream bias. AI inherits unfairness; it does not invent it.
02
Fairness is abstract; bias is measurable
You cannot measure fairness directly. Every fairness metric is, underneath, a measurement of a bias gap.
03
Diagnosis precedes treatment
You cannot mitigate what you have not located. Naming the bias is the first technical act of fairness.
04
The order is fixed
Bias → metric → intervention. Reorder it and you optimise the wrong objective.
05
Bias is the load-bearing layer of trust
Governance, compliance and assurance all rest on the bias diagnosis beneath them.

The foundation — bias precedes everything above it

1.1 Bias is the origin, not the symptom

Every unfair AI outcome traces back to a bias upstream of it. AI does not invent unfairness; it inherits it. The model is a mirror: it reflects the distributions, the omissions, and the measurement choices that were already present in the data and the design decisions that shaped it. To treat an unfair outcome without tracing it to its bias source is to treat a fever without asking what caused the infection.

1.2 Fairness is abstract; bias is measurable

You cannot measure fairness directly. Every fairness metric (demographic parity, equal opportunity, calibration, predictive parity) is, underneath, a measurement of bias. The metric quantifies the gap; the gap is the bias. Fairness is the name we give to the state where the gap is acceptable. This distinction matters operationally: you do not optimise for fairness; you reduce bias until the remaining gap falls within a defensible threshold.

1.3 Diagnosis precedes treatment

You cannot mitigate what you have not located. Naming the bias (its type, its source, the stage at which it enters, the groups it affects) is the first technical act of fairness work. Without diagnosis, interventions are guesses applied to symptoms. With diagnosis, interventions become targeted, auditable, and reversible.

1.4 The order is fixed: bias, metric, intervention

Skipping the bias step turns interventions into guesses, and guesses do not generalise. The correct sequence is invariant:

01Identify the bias (type, source, affected group).
02Select a metric that operationalises the relevant fairness objective for that bias.
03Apply the intervention that targets the diagnosed bias, then re-measure.

Reversing steps 1 and 2 (choosing a metric before understanding the bias) leads to the most common failure mode in applied fairness: optimising the wrong objective because it happened to be computable.

1.5 Bias is the load-bearing layer of trust

Trust in an AI system begins the moment its biases stop hiding. Everything in the architecture above (governance frameworks, compliance certifications, assurance audits) rests on this foundation. A governance framework that does not know where the bias lives is built on sand. A compliance certificate issued without a bias diagnosis is a document, not evidence.

Architectural line drawing of a long colonnade receding to a vanishing point, with a single figure walking down it. — Each column is a fairness check the model must pass before a decision; the figure walking the corridor is the model under audit.

2. Where bias enters the pipeline

Bias is pipeline-deep, not point-deep. There is no single bias step to fix. It enters at every stage of the AI lifecycle: data collection, model construction, and human review. Each stage demands its own diagnosis and its own intervention. Data fixes cannot repair modelling bias. Modelling fixes cannot repair reviewer bias. Fairness work must be layered, because bias is layered.

Diagram of the AI lifecycle: World, Data, AI/ML, Human Review and Actions, with the biases that enter at each stage and a feedback loop returning Actions to the World. — Bias enters at every stage of the lifecycle, and a feedback loop carries it back into the world.

2.1 Bias in data

Bias in data can show up in several forms. The three principal offenders are historical bias, representation bias, and measurement bias.

Historical bias is the already existing bias in the world that has seeped into our data. It can occur even given perfect sampling and feature selection, and tends to show up for groups that have been historically disadvantaged or excluded. It was illustrated by the 2016 paper “Man is to Computer Programmer as Woman is to Homemaker,” whose authors showed that word embeddings trained on Google News articles exhibit and perpetuate gender-based stereotypes in society.

Representation bias happens from the way we define and sample a population to create a dataset. For example, the data used to train Amazon’s facial-recognition system was mostly based on white faces, leading to significant accuracy gaps when detecting darker-skinned faces. Datasets collected through smartphone apps can similarly underrepresent lower-income or older demographics.

Measurement bias occurs when choosing or collecting the features or labels used in predictive models. Data that is easily available is often a noisy proxy for the actual feature or label of interest, and measurement processes and data quality often vary across groups. In a 2016 report, ProPublica investigated predictive policing and found that the use of proxy measurements in predicting recidivism can lead to black defendants receiving harsher sentences than white defendants for the same crime.

Bias type	Mechanism	Canonical example
Historical bias	World’s existing inequities absorbed into training data	Word embeddings encoding gender stereotypes (Bolukbasi et al. 2016)
Representation bias	Population sampling does not reflect the deployment population	Amazon facial recognition trained predominantly on white faces
Measurement bias	Proxy features or labels diverge from the true construct across groups	COMPAS recidivism: arrest records as a proxy for re-offending (ProPublica 2016)

2.2 Bias in modelling

Even with perfect data, our modelling methods can introduce bias. Two common pathways are evaluation bias and aggregation bias.

Evaluation bias occurs during model iteration and evaluation. A model is optimised on training data, but its quality is often measured against benchmarks. Bias arises when those benchmarks do not represent the general population, or are not appropriate for the way the model will be used.

Aggregation bias arises during model construction when distinct populations are inappropriately combined. In many applications the population of interest is heterogeneous, and a single model is unlikely to suit all groups. In healthcare, models for diagnosing and monitoring diabetes have historically used Hemoglobin A1c (HbA1c) levels; a 2019 paper showed those levels differ in complicated ways across ethnicities, so a single model for all populations is bound to exhibit bias.

Bias type	Mechanism	Canonical example
Evaluation bias	Benchmarks do not represent the deployment population or use case	ImageNet benchmarks overrepresenting Western contexts
Aggregation bias	Heterogeneous populations forced into a single model	HbA1c diabetes models ignoring ethnicity-linked variation (Vyas et al. 2019)

2.3 Bias in human review

Even if the model is making correct predictions, a human reviewer can introduce their own biases when deciding whether to accept or disregard a prediction. The human in the loop is a bias source, not a bias filter.

A reviewer might override a correct prediction based on systemic bias, reasoning along the lines of “I know that demographic, and they never perform well.” This re-injects unfairness after the model has done its job, making human-in-the-loop oversight a potential amplifier of the very harm it was designed to prevent.

2.4 The feedback loop

Actions taken by the system return to the world as new training data. The cycle does not reset between releases; it compounds. A biased decision today becomes a biased training signal tomorrow. This is why bias auditing is not a one-time event but a continuous obligation.

3. Choosing the fairness objective

Choosing the fairness objective is a normative decision, not a data decision. You pick the target from the harm structure, not from what the file lets you compute. This section explains why, and what to do when the ideal metric is unmeasurable.

Three families of fairness definitions compared: individual fairness, group fairness and counterfactual fairness, arranged by operational cost. — Three families of fairness, each encoding a different theory of justice, at a different operational cost.

3.1 The objective follows the harm, not the data

In hiring, the costly, rights-relevant error is the false negative: a qualified person screened out. The criterion that targets exactly that is Equal Opportunity, equal true-positive rate across groups (Hardt et al. 2016): among people who really are qualified, are all groups invited at the same rate?

That choice is correct because of what hiring is, regardless of whether your particular dataset can measure it. Letting measurability pick your ethics is backwards: you would end up optimising demographic parity just because it is computable, even when it is the wrong objective for the domain. The rule: keep equal opportunity as the declared target. Data availability decides the estimator, not the goal.

Four stakeholders define fairness differently: engineering as accurate prediction, legal as no disparate impact, civil society as correcting historical exclusion, and executives as a defensible business outcome. — Four people, four answers: fairness is a normative choice, so the objective must be declared, not assumed.

3.2 Why you still cannot compute it directly, and that is realistic

Equal opportunity needs a “this person was actually qualified” label. A real audit almost never has it, for a structural reason called the selective labels problem: you only ever observe outcomes for people the model already let through. You cannot see how the rejected candidates would have performed.

The honest position is that the right objective is unmeasurable directly with production data, by nature, not by oversight. This is not a failure of the audit; it is the defining constraint of applied fairness work.

3.3 The proxy workflow

You do not abandon the objective; you estimate it with progressively weaker proxies and report honestly:

01Demographic parity / four-fifths rule. A large selection-rate gap is a necessary-condition screen: if a group is invited at one-fifth the rate, qualified people in it are almost certainly being lost. A pass here does not clear equal opportunity, but a fail is strong evidence of a problem.
02Conditional selection parity on legitimate observables. Among candidates matched on experience, skills, and education bands, are invite rates equal across groups? This uses observed merit features as a stand-in for the missing “truly qualified” label, and is the best label-free approximation of equal opportunity.
03Outcome proxies in real deployments. Who was hired and later retained, performed, or promoted? Biased and partial (selective labels again), but it lets you actually estimate TPR gaps with caveats.

This is precisely why Validant’s fairness showcase ships the oracle dataset. The full 43-column file exists to prove the proxy works. You run the label-free proxies on the 41-column file, then join the oracle back and check: did conditional selection parity on observables correctly recover the true equal-opportunity violation that the qualification score reveals? If yes, you have evidence the same proxy is trustworthy in real audits where no oracle exists. That is the entire pedagogical payload of having two files.

4. No universal fairness lens

There is no universal fairness lens. What “fair” means is determined by the harm at stake, not by the mathematics. The dominant lens follows the dominant harm.

Domain harm	Dominant fairness lens	Regulatory context	Key metric
Exclusion (hiring, lending, housing)	Distributive fairness	EU AI Act, NYC LL 144, EEOC four-fifths rule	Selection-rate parity, equal opportunity
Liberty (criminal justice, sentencing)	Calibration	COMPAS litigation, ProPublica analysis	Calibration across groups, predictive parity
Speech (content moderation)	Procedural fairness	DSA, platform governance frameworks	Appeal-rate equality, notice completeness
Health (clinical decision support)	Aggregation-aware equity	FDA guidance, Obermeyer et al. 2019	Subgroup-stratified accuracy, HbA1c-adjusted thresholds
General (cross-domain deployment)	Procedural floor	EU AI Act Art. 13-14, NIST AI RMF	Notice, explainability, contestability

Mapping of domains to their dominant fairness lens: hiring to equal opportunity, lending to equal credit access, healthcare to equal access to care, criminal justice to calibration and counterfactual, content and speech to procedural fairness. — The dominant fairness lens follows the dominant harm: each domain demands a different objective.

4.1 The same metric can be mandatory in one domain and meaningless in another

Selection-rate parity is law in hiring (the EEOC four-fifths rule) and irrelevant in healthcare. Calibration is essential in criminal justice and beside the point in content moderation. This is not a technicality; it is the structural reason why a single fairness score is scientifically incoherent.

4.2 Cross-domain systems inherit cross-domain obligations

A general-purpose model deployed in hiring, lending, and healthcare must satisfy three fairness regimes simultaneously, not one. This is the regulatory reality of foundation models. The vendor’s obligation is not to pick one lens and optimise for it; it is to demonstrate awareness of every domain in which the model is deployed and to provide domain-specific evaluation evidence for each.

4.3 Procedural fairness is the universal floor

Where substantive metrics disagree, every domain converges on the same minimum: notice, explainability, contestability. Where outcome fairness fails, procedural fairness is what remains. A system that cannot explain its decisions has failed the procedural floor regardless of how well it scores on any substantive metric.

“One word. Five regimes. No single answer. The only honest response is to build fairness evaluation that is domain-aware by design.”
Bias is the Foundation

Sources and further reading

01Hardt, M., Price, E. & Srebro, N. (2016). Equality of Opportunity in Supervised Learning. NeurIPS 2016.
02Bolukbasi, T. et al. (2016). Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. NeurIPS 2016.
03Angwin, J. et al. (2016). Machine Bias. ProPublica.
04Obermeyer, Z. et al. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science 366(6464).
05Vyas, D.A. et al. (2020). Hidden in Plain Sight: Reconsidering the Use of Race Correction in Clinical Algorithms. NEJM 383(9).
06Suresh, H. & Guttag, J. (2021). A Framework for Understanding Sources of Harm throughout the Machine Learning Life Cycle. EAAMO 2021.
07Buolamwini, J. & Gebru, T. (2018). Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. FAccT 2018.

Share this post

The Wrong Question, Asked at Scale

A landmark study of 4 million job applications shows how AI hiring tools hide their bias, why one rejection can become rejection everywhere, and why independent, position-level assessment is no longer optional.

Read

Blanco-style technical line drawing of three spheres of different sizes held in balance by interlacing elliptical orbits, with soft coral washes, on a white ground.

ResearchOpen to read

5 June 2026

Digital Trust Is an Orbit, Not a Pillar

Trust is not one more pillar to stack. It is the orbit three bodies trace together: the model, the person and the organisation. Why the three-body problem is the honest metaphor for trustworthy AI, and how to tell where you are in the orbit.

Read

Blanco-style technical line drawing of Lucerne’s covered wooden Chapel Bridge and octagonal Water Tower over the Reuss, with Mount Pilatus behind and a soft coral wash in the sky.

EventsOpen to read

30 May 2026

Two Views of One Decision: Trustworthy and Explainable AI in Practice at HSLU

Notes from a Lucerne specialists course on Trustworthy and Explainable AI, and what it confirms about the Validant.ai approach.

Read

Back to all updates