(Image: Byte Magazine cover, Sept 1981)

Technical Foundations of Modern AI

Week 2: How We Got Here


Agenda

  1. Our AI Toolbelt: Getting to Know the Tools
  2. Lab 1 Discussion: Tool Comparisons
  3. Skill Check Quiz
  4. The Evolution of Text Understanding
  5. How Modern AI Works
  6. Interactive Exercises
  7. Looking Ahead

Class Questions

  • Can I work on labs with classmates?
  • What chatbot should I use?
  • Should I pay for a chatbot?
  • How do I cite AI?

Our Chat Bot Toolbelt

What is a 'chat bot'? We'll establish some language later today - for now, let's consider it by example



ChatGPT (OpenAI)

  • Different models: GPT-4, GPT-4o, o1
  • Strong general capabilities
  • GPT-4 was state of the art for over a year
  • Integration with DALL-E for images, plus multi-modal models that handle voice and video
  • Extra features: custom GPTs, code interpreter, web search

Basic: https://chat.openai.com / Advanced: https://platform.openai.com/playground


Claude (Anthropic)

  • Known for longer context, detailed analysis
  • Different "personalities": Haiku, Sonnet, Opus
  • Strong coding and technical capabilities
  • Extra features: projects, artifacts

Basic: https://claude.ai / Advanced: https://console.anthropic.com


Gemini (Google)

  • Multiple versions (e.g. Flash, Pro)
  • Particularly affordable for dev use
  • Supports video and audio

Basic: https://gemini.google.com / Advanced: https://aistudio.google.com


LLaMA (Meta)

  • Open weights (often described as 'open source')
  • Different sizes, flexible
  • Various tools for running - you can run (smaller) versions of this model on your own computer with Ollama (https://ollama.com)

Key Features to Consider

A teaser - we'll dive into these in future weeks

  • Temperature settings
  • Context window size
  • Multimodal capabilities
  • Cost and access limitations
  • API availability

More Info on Labs

  • Labs are marked as either 'completion' or 'portfolio'
  • How do I submit? Each lab lists this under 'completion details' - some labs require a submission, some don't

Lab 1: Tool Evaluation Discussion

Group Discussion:

  • What differences did you notice between tools?
  • Where did they excel or struggle?
  • Any surprising observations?

Skill Check Quiz

In pairs/small groups

Take turns asking each other:

  1. What are three major impacts of AI on society we discussed?
  2. Why is AI literacy important right now?
  3. How does AI adoption create potential divides in society?
  4. What's the difference between system, use, and effect when discussing AI?

The Evolution of Text Understanding

Today's class tracks the evolution of Large Language Models from the natural language processing (NLP) side. Concurrently, there was a rich history in machine learning and artificial intelligence that eventually intersected with this text-handling work. See the [Deep Learning](https://www.nature.com/articles/nature14539) article for a good overview.


Tokenization

  • Tokenization is the process of breaking down unstructured data into smaller, countable units (tokens)
  • In text, there's historically one type of 'token' that reigns supreme: the word
    (so much so that people often use 'token' interchangeably with 'word')
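A minimal sketch of word-level tokenization - the function name and regex here are illustrative choices, not a standard:

```python
import re

def word_tokenize(text):
    """Split text into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())

print(word_tokenize("The quick brown fox jumps over the lazy dog!"))
# ['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
```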

Bag of Words

  • Simple counting - a document is just the collection of words that represent it (see the sketch below)
  • Order doesn't matter
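A minimal sketch of a bag-of-words representation, using only Python's standard library (the example sentence is made up):

```python
from collections import Counter

doc = "the dog chased the cat and the cat ran"

# A bag of words is just token counts - word order is discarded
bag = Counter(doc.split())
print(bag)
# Counter({'the': 3, 'cat': 2, 'dog': 1, 'chased': 1, 'and': 1, 'ran': 1})
```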

Vector Space Model (Salton, 1970s)

  • Words as coordinates
  • Similarity through distance
  • Basic relationships emerge
  • Can compare the 'distance' between documents
Document   cat   dog   pet   food   bowl
doc1.txt    2     0     1     1      1
doc2.txt    0     3     1     2      1
doc3.txt    1     1     2     0      0
doc4.txt    2     3     2     0      0
Caveat: this is a simplified example. In reality, 1) we're comparing across *all* dimensions, not just cat/dog, and 2) we're using cosine distance, so these are scaled to unit vectors
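A sketch of that full comparison, reusing the toy document-term matrix above (numpy assumed):

```python
import numpy as np

# Rows of the document-term matrix above; columns: cat, dog, pet, food, bowl
docs = {
    "doc1.txt": np.array([2, 0, 1, 1, 1], dtype=float),
    "doc2.txt": np.array([0, 3, 1, 2, 1], dtype=float),
    "doc3.txt": np.array([1, 1, 2, 0, 0], dtype=float),
    "doc4.txt": np.array([2, 3, 2, 0, 0], dtype=float),
}

def cosine_similarity(a, b):
    # Scale both vectors to unit length, then take the dot product
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(docs["doc1.txt"], docs["doc3.txt"]))  # cat-heavy vs mixed
print(cosine_similarity(docs["doc1.txt"], docs["doc2.txt"]))  # cat-heavy vs dog-heavy
```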

What problems might we run into by thinking of text as just a jumble of words?


Problems with the Vector Space Model

  • Not all words matter equally
  • Synonymy and polysemy are difficult to handle
  • Semantic relationships are hard to capture
  • What about new words??
  • Size of the matrix is the size of the vocabulary - that will get huge!
    • and the matrix is 'sparse' - most of the values are 0

Thought - not all words matter equally

'Term weighting' emerged in the 1970s as a way to downweight common words in systems like the vector space model (e.g. the work of Karen Spärck Jones) - see the tf-idf sketch below.
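A minimal sketch of one common tf-idf variant (there are many weighting schemes; the tiny corpus here is made up):

```python
import math
from collections import Counter

corpus = {
    "doc1.txt": "the cat ate the cat food from the pet bowl".split(),
    "doc2.txt": "the dog and the dog like dog food in a bowl".split(),
}

def tf_idf(term, doc_id):
    tf = Counter(corpus[doc_id])[term] / len(corpus[doc_id])   # term frequency
    df = sum(1 for words in corpus.values() if term in words)  # document frequency
    idf = math.log(len(corpus) / df)                           # rarer terms score higher
    return tf * idf

print(tf_idf("the", "doc1.txt"))  # 0.0 - 'the' appears in every document
print(tf_idf("cat", "doc1.txt"))  # > 0 - 'cat' is distinctive to doc1
```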


Making It Smaller, Making It Smarter


Latent Semantic Analysis

1990 (Deerwester, Dumais, Furnas, Landauer)

"You shall know a word by the company it keeps" - J.R. Firth

  • Finding hidden patterns
  • Reducing complexity

Document-term matrix is reduced to a document-concept matrix and a concept-term matrix

e.g. a word like 'dog' might be related to concepts like 'animal', 'pet', etc.

a document about pets is then represented by the mixture of concepts its words evoke - see the sketch below
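A minimal sketch of this reduction via truncated SVD (the standard machinery behind LSA), reusing the toy matrix from earlier:

```python
import numpy as np

# Document-term matrix: rows = docs, columns = terms (cat, dog, pet, food, bowl)
X = np.array([
    [2, 0, 1, 1, 1],
    [0, 3, 1, 2, 1],
    [1, 1, 2, 0, 0],
    [2, 3, 2, 0, 0],
], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2                              # keep only the k strongest 'concepts'
doc_concepts = U[:, :k] * s[:k]    # document-concept matrix
concept_terms = Vt[:k, :]          # concept-term matrix

print(doc_concepts.shape)   # (4, 2): each document is now just 2 numbers
print(concept_terms.shape)  # (2, 5): each concept is a weighting over terms
```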


Take a moment to digest this. What would the practical implication be of this approach?


Finding Hidden Patterns: Synonymy

"The quick brown fox jumps over the lazy dog"
"A fast dark fox leaps above the idle canine"

  • LSA helps find these relationships
  • Similar meaning, different words
  • Understanding beyond exact matches
  • Also, your document representations are now much smaller, and less sparse

Word Embeddings

2010s (Mikolov et al. 2013, Pennington et al. 2014), following two decades of iteration on LSA

  • 'embedding': a vector representation
  • word2vec/GloVe revolution
  • conceptually similar to LSA, but trained on term co-occurrence within a window rather than a term-document matrix - this allows scaling!
  • Words carry meaning

King - Man + Woman = Queen

  • Word embeddings capture relationships
  • Words as mathematical vectors
  • Enable arithmetic with meaning
  • Other examples:
    • Paris - France + Italy = Rome
(Image: Flavio Giobergia)
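A sketch of these analogies using gensim's pretrained GloVe vectors (a small model is downloaded on first use; exact neighbors depend on which vectors you load):

```python
# Requires: pip install gensim
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-50")  # small pretrained GloVe vectors

# king - man + woman ≈ queen
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

# paris - france + italy ≈ rome
print(model.most_similar(positive=["paris", "italy"], negative=["france"], topn=1))
```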

Transfer Learning

  • word2vec was released with large, pre-trained word vectors - a general model of language
  • Before that, models were typically trained from scratch on your specific data
  • Not a new concept, but word2vec popularized it

This was revolutionary because:

  • Not everyone had massive computing resources
  • Not everyone had massive datasets
  • Democratized access to sophisticated NLP
  • What hasn't changed through all this time? Why?

Understanding Complex Language

Sequences matter in some cases!

  • Polysemy: river 'bank' vs. 'bank' as a financial institution
  • Syntactic ambiguity (garden-path sentences): "The old man the boat", "The horse raced past the barn fell"

But: too computationally intensive to consider all possible sequences


The Bitter Lesson (Sutton, 2019)

Key Points:

  • Computation beats clever algorithms
    • but algorithms that allow computation to scale are valuable
  • Scale changes everything

Improvements from Word Embeddings to the Models of Today

  • changing tokenization to use subwords (e.g. fastText)
  • adopting neural network innovations for efficient training (e.g. ELMo, built on LSTMs)
  • using attention mechanisms to allow even more computation for learning

Updates to Tokenization

  • Modern subword tokenization: BPE (Byte-Pair Encoding)
  • Balance between efficiency and meaning
  • Handles unknown words better
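A quick way to see BPE in action is OpenAI's tiktoken library (the exact subword splits depend on the encoding you choose):

```python
# Requires: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # BPE encoding used by GPT-4-era models

tokens = enc.encode("Transformers tokenize uncommon words into subwords")
print(tokens)                             # integer token IDs
print([enc.decode([t]) for t in tokens])  # the subword pieces
```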

https://tiktokenizer.vercel.app


Attention is All You Need


exBERT Visualization of Attention

https://huggingface.co/spaces/exbert-project/exbert


Transformer Models

2017: Attention is All You Need, 2018: BERT

  • The 'Transformer' proposed using self-attention for sequence modeling (Vaswani et al. 2017) - see the sketch after this list
  • BERT applied this to a masked language-modeling training objective (Devlin et al. 2018)
  • T5 (Raffel et al. 2020) compared different training strategies and introduced the 'Text to Text' approach
  • GPT introduced decoder-only models (i.e. models that generate a continuation of the input rather than transforming an input into an output)
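A minimal sketch of single-head scaled dot-product self-attention in numpy (random weights stand in for learned parameters; real models add multiple heads, masking, and much more):

```python
import numpy as np

def self_attention(X, seed=0):
    """Single-head scaled dot-product attention over a sequence of token vectors."""
    d = X.shape[-1]
    rng = np.random.default_rng(seed)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d)   # how strongly each token attends to each other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ V              # each output is a weighted mix of all token values

X = np.random.default_rng(1).standard_normal((4, 8))  # 4 tokens, 8-dim embeddings
print(self_attention(X).shape)                        # (4, 8): same shape, contextualized
```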


(Figure from Raffel et al. 2020)



Text to Text Approach, with encoder-decoder (Raffel et al. 2020)


Extra Reading

Other things we've learned through the years:

  • The bigger general models get, the better they are at few-shot learning (Brown et al. 2020; will return to this in week 3)
  • Compute is still king (Kaplan et al.'s Scaling Laws)
    • but scaling models and scaling data are a delicate balance - can't just scale up one (e.g. Hoffmann et al. 2022)
  • And clean input data is really important (e.g. Raffel et al. 2020)
  • Reinforcement Learning from Human Feedback (RLHF) can be a powerful tool for aligning models with intended use

Model Architectures

  • Decoder-Only
    • GPT family (GPT-3, GPT-4), Claude, Llama
    • Focus: Text generation (see the sketch below)
  • Encoder-Only
    • BERT, RoBERTa
    • Focus: Understanding/Analysis
  • Encoder-Decoder
    • T5, BART
    • Focus: Translation, Summarization
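A sketch of decoder-only generation using the Hugging Face transformers library with the small GPT-2 model (weights download on first run; any causal LM would do, and the prompt is just an example):

```python
# Requires: pip install transformers torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Decoder-only models just keep predicting the next token after the input
inputs = tokenizer("The history of NLP began with", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output[0]))
```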

Q: If it's just generating text, how does the computer chat?


A: It's faking it!


Input:

<|im_start|>system<|im_sep|>You are a helpful assistant<|im_end|>
<|im_start|>user<|im_sep|>Explain Controlled Vocabularies like I'm 5<|im_end|>

Output:

<|im_start|>assistant<|im_sep|>Okay! Imagine you have a big toy box with all kinds of toys: blocks, 
cars, dolls, and puzzles. Now, what if everyone called the toys different names? Someone might call 
the blocks "building stuff," another person might call them "squares," and someone else might say 
"colorful thingies." That would get really confusing, right?

So, we decide on special names that everyone will use for the toys. For example:

- Blocks are always called "blocks."
- Cars are always called "cars."
- Dolls are always called "dolls."
- This way, when we want a toy, everyone knows exactly what to look for.

Controlled vocabularies are like that but for words. They help everyone use the same names for things
so we can find them easily, especially in big places like libraries or on the internet!<|im_end|>
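A minimal sketch of the 'faking it': a chat interface flattens the messages into one string using special tokens like those above, then asks the model to continue it (render_chat is a hypothetical helper; the exact tokens vary by model):

```python
def render_chat(messages):
    """Flatten chat messages into a single prompt using the special tokens above."""
    parts = [f"<|im_start|>{m['role']}<|im_sep|>{m['content']}<|im_end|>"
             for m in messages]
    # Leave an open assistant turn - the model 'completes' the conversation
    parts.append("<|im_start|>assistant<|im_sep|>")
    return "\n".join(parts)

prompt = render_chat([
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Explain Controlled Vocabularies like I'm 5"},
])
print(prompt)
```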

Instruction Tuning

2022: InstructGPT, FLAN-T5, etc.

  • Models trained on base tasks weren't naturally good at following instructions
  • Additional training on instruction-following tasks
  • RLHF helps align with intended use

ChatGPT

  • instruction-tuned GPT model
  • released in November 2022
  • immediately captured the public imagination - and launched the rapid (rabid?) commercialization of artificial intelligence

Beyond Text: Multi-Modal Transformers

  • Vision Transformers (ViT)

    • Images as patches
    • Same attention mechanism
    • Used in DALL-E, Midjourney
  • Multi-Modal Models (e.g. GPT-4o, Gemini)


Why does the history matter?

  • Transfer learning is a double-edged sword
    • makes models general purpose and accessible to masses
    • hard to inspect, hard to match performance with your own data
    • But you don't need to train a model from scratch - try fine-tuning if you need to adapt to your own data
  • parameterization matters for serious use of these models - we'll discuss temperature and few-shot learning in coming weeks
  • the commercial arms race has consequences for collaboration, reproducibility, and risk, though it brings value through general accessibility and shared infrastructure

Looking Ahead

  • Lab: New Hobby - we'll cover it in a couple of weeks, but it may be something you want to think about early!
  • Next Week: Prompt Engineering Fundamentals
  • Questions?

Lab 2: Bot Don't Kill My Vibe

In this lab, you'll learn to use AI systems as a constructive critic to identify potential problems, weaknesses, or overlooked issues in your work. By having AI tools take an adversarial stance, you'll practice extracting valuable feedback while learning to evaluate and filter AI critiques.

Link to Details

Post your entry on Canvas


References
