Bias and Risks in AI Systems, Safety Considerations

AI/IA

Well-intentioned fixes are tricky

Example from Bender et al. (2021): the Colossal Clean Crawled Corpus (C4), used to train T5, is a “cleaned” dataset built by removing any page that contains one of a list of about 400 “Dirty, Naughty, Obscene or Otherwise Bad Words”:

This list is overwhelmingly words related to sex, with a handful of racial slurs and words related to white supremacy (e.g., swastika, white power) included. While possibly effective at removing documents containing pornography (and the associated problematic stereotypes encoded in the language of such sites) and certain kinds of hate speech, this approach will also undoubtedly attenuate, by suppressing such words as twink, the influence of online spaces built by and for LGBTQ people.
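To make the failure mode concrete, here is a minimal sketch of a document-level blocklist filter, assuming simple lowercase word matching. The two-word `BLOCKLIST`, the `blocklist_filter` helper, and the sample pages are all hypothetical illustrations, not the actual C4 word list or cleaning pipeline. The point is that the rule has no notion of context: one matching word drops the whole page.

```python
import re

# Hypothetical stand-in for the ~400-word C4 list: only two words taken
# from the Bender et al. quote, used purely for illustration.
BLOCKLIST = {"swastika", "twink"}

def tokenize(text: str) -> set[str]:
    """Lowercase word tokens (a deliberate simplification of real tokenizers)."""
    return set(re.findall(r"[a-z']+", text.lower()))

def blocklist_filter(pages: list[str], blocklist: set[str]) -> tuple[list[str], list[str]]:
    """Keep a page only if none of its words appears in the blocklist.

    Returns (kept, removed). The decision is made per page, so a single
    matching word removes the entire document regardless of context.
    """
    kept, removed = [], []
    for page in pages:
        (removed if tokenize(page) & blocklist else kept).append(page)
    return kept, removed

pages = [
    "A recipe for lemon cake with a crunchy sugar glaze.",
    "A museum exhibit explains how the swastika was adopted as a hate symbol.",
    "An advice column written by and for gay men about twink stereotypes.",
]

kept, removed = blocklist_filter(pages, BLOCKLIST)
print(len(kept), len(removed))  # 1 2
```

Both removed pages are benign (an educational text and a community-written column), which mirrors the concern in the quote: a context-free word list cannot distinguish hate speech or pornography from discussion by, or about, the groups those words refer to.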

Bias and Risks in AI Systems, Safety Considerations

Labs Discussion

In, Out, and About

Inside the Model

Algorithms are not neutral

Adverse or Unknown Effects

Indirect Discrimination

Tips

Social Biases

Cognitive Biases

Adversarial Use & Jailbreaks: Overview

Safety mitigations in Generative AI

Red Teaming

Removing all biases isn't always preferred

Well-intentioned fixes are tricky

Lab: Bias Analysis