AI in Information Behavior
In the News
- o3-mini, o3-mini-high -> Reasoning models getting cheaper and more efficient
- OpenAI releases a deep research model -> cool, but does anybody want it?
- S1 paper (Muennighoff et al. 2025):
Will 2025 be the year of improving smaller models?
Skill Check
- What is the AI Effect and why is it relevant to discussions of AI creativity?
- What are the three ‘voices’ typically recognized by instruction-tuned models?
- How might AI enhance or limit different aspects of creativity (e.g., originality vs flexibility)?
- What makes a good creative partnership between humans and AI?
Today
Information Behavior and AI:
- LLMs for information seeking
- LLMs and AI-tools as part of information retrieval
We’ll discuss our New Hobby outcomes in the middle.
From Document Retrieval to Information Retrieval
"As a medium for the display of information, the printed page is superb... With respect to retrievability they are poor. And when it comes to organizing the body of knowledge, books by themselves make no active contribution at all. ...If human interaction with the body of knowledge is a dynamic process involving repeated examinations and intercomparisons of very many small and scattered parts, then any concept of a library that begins with books on shelves is sure to encounter trouble"
"We need to substitute for the book a device that will make it easy to transmit information without transporting material, and that will not only present information to people but also process it for them, following procedures they specify, apply, monitor, and, if necessary, revise and reapply."
AI Tools and Generative AI in Research and Information Behavior
AI Information Retrieval For Ready Reference
Chat bots are good at information retrieval, but
a. they’re not perfect, and b. their reliability is selective
Good for standard advice, common information, ready reference
Worse for niche or very specific information
e.g. Research Methods Advice
- Which statistical test to use?
- How to write a literature review?
- What sections would you expect in a research paper on topic X?
If you expect something to be findable online, but want to pull it up quickly and in a form that responds to exactly how you’re thinking about your problem, try a chat bot
Speaking Your Language
- For most information seeking, you have to speak the system’s language - a language that you may not know. Chat bots let you speak your own language.
Example: Vague Questions, Vague Memories
Example: Highly-customized Information Seeking
Coding Assistance
Because of the logical structure of that form of writing, generative AI is particularly effective at helping you write code
- GitHub Copilot and Cursor are good implementations of an LLM coding assistant
But I don’t code!
Think about highly structured formats that you use in your professional life
- Stats software (Mplus, R, SAS, Stata)
- Retrieval languages (SQL, SPARQL); complex boolean queries
- Markup languages (HTML, XML, Markdown, JSON, JSON-LD)
From my scripting class, Strategies for Pair Programming with Chat Bots
From Last Week: Niche and Hallucinated Information
LLMs are not always trustworthy, and their trustworthiness varies in different ways than human trustworthiness does, which makes validation a challenge
Intermission
Sharing Our ’New Hobby’ Outcomes
Let’s discuss your experiences learning something new with AI assistance:
As a class:
- What new skill or hobby did you explore, and what AI tools did you use?
In groups:
- Share more detail about your project! Not every conversation needs to be about the tools - take some show and tell time!
- Did AI help you overcome the “vocabulary problem” in learning something new?
- What were the strengths and limitations of using AI as a learning assistant?
- How did the AI-assisted learning experience compare to traditional learning methods?
- What surprised you most about using AI to learn a new skill?
Information Retrieval in the Age of LLMs
INDEXING | USER INPUT
RETRIEVAL | OUTPUT
What can change?
- A newly designed AI-first retrieval system can apply AI at any step
- BUT AI tools can also be wrapped around an existing system, at varying levels of complexity, to make that system easier to use
The current AI standard for retrieval
Coming full circle - AI-based embedding models
Models derived from the exemplary language understanding of LLMs, trained to put text in a geometric space where similar text is close together.
-> just like the vector space model in the 70s -> just like latent semantic indexing in the 80s
Use traditional information retrieval systems with an AI-first approach
Many traditional search systems now use LLMs in their indexing, access, and document processing
Embedding-based retrieval systems
Ranked Retrieval Model -> not boolean, but with grades of relevance - the most relevant documents at the top
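A minimal sketch of the difference between boolean and ranked retrieval, using the classic vector space idea with plain word counts. The three documents and the function names are made up for illustration. Word counts only reward shared words; an embedding model is meant to also place paraphrases close together, which this toy version cannot do.

```python
import math
from collections import Counter

# Made-up mini collection for illustration.
DOCS = {
    "d1": "gothic cathedral with gothic arches",
    "d2": "a guide to cathedral architecture",
    "d3": "prairie chicken habitat restoration",
}


def term_vector(text):
    """Turn text into a sparse term-count vector (one dimension per word)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def boolean_and(query, docs):
    """Boolean retrieval: a document matches only if it contains every query term."""
    terms = set(query.lower().split())
    return [doc_id for doc_id, text in docs.items()
            if terms <= set(text.lower().split())]


def ranked(query, docs):
    """Ranked retrieval: score every document, most similar first."""
    q = term_vector(query)
    scored = [(doc_id, cosine(q, term_vector(text))) for doc_id, text in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


query = "gothic cathedral"
print(boolean_and(query, DOCS))  # all-or-nothing matches
for doc_id, score in ranked(query, DOCS):
    print(doc_id, round(score, 3))  # graded relevance, best first
```

The boolean search drops d2 entirely because it lacks one query term, while the ranked search keeps it with a lower score.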
The Tools
Embedding Models: The process that makes ‘text’ into ‘embeddings’
- Sentence-Transformers, OpenAI Embeddings, Google Gecko, Nomic Embed
The retrieval system: The system that can quickly find and rank the document embeddings from a query embedding
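A rough sketch of the retrieval half, assuming the embeddings already exist. The 3-dimensional vectors below are hand-written stand-ins, not output from any real embedding model, and `search` is a hypothetical helper. It scores every document, which is fine for a toy collection; large systems typically rely on approximate nearest-neighbor indexes instead.

```python
import math
from typing import Dict, List, Tuple

# Hand-written stand-in "embeddings" for illustration only.
# A real embedding model would produce vectors with hundreds of dimensions.
DOC_EMBEDDINGS: Dict[str, List[float]] = {
    "Notre-Dame de Paris": [0.9, 0.1, 0.0],
    "Cologne Cathedral": [0.8, 0.2, 0.1],
    "Attwater's prairie chicken": [0.0, 0.1, 0.9],
}


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two dense vectors of the same length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_embedding: List[float], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k documents whose embeddings are closest to the query embedding."""
    scored = [(doc_id, cosine(query_embedding, vec))
              for doc_id, vec in DOC_EMBEDDINGS.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Pretend this vector came from embedding a query about Gothic churches.
query_embedding = [1.0, 0.0, 0.0]
for doc_id, score in search(query_embedding):
    print(f"{score:.3f}  {doc_id}")
```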
Working with existing systems
Automating Manual Processes (INDEXING > DATA PREP)
LLMs can be implemented into the data preparation process
Project: To assist, enhance and promote the Attwater Prairie Chicken National Wildlife Refuge’s goal of recovering the endangered Attwater’s prairie chicken and restoring native Texas coastal prairie for the benefit of present and future generations.
Output:
biome_habitat='Grassland'
taxon=['Bird']
common_species_name="Attwater's Prairie Chicken"
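One way this data prep step might be wired up, as a sketch rather than the actual pipeline behind the example: ask the model for JSON with fixed keys, then parse the reply into a typed record. The prompt wording, `ProjectMetadata`, and both helper functions are assumptions for illustration, and the model reply is a hard-coded stand-in so that no API is called.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class ProjectMetadata:
    """Fields mirroring the example output above."""
    biome_habitat: str
    taxon: List[str]
    common_species_name: str


def build_extraction_prompt(description: str) -> str:
    """Hypothetical prompt asking the model for JSON with fixed keys."""
    return (
        "Extract metadata from the project description below. "
        "Reply with JSON only, using exactly these keys: "
        "biome_habitat (string), taxon (list of strings), "
        "common_species_name (string).\n\n"
        f"Project description: {description}"
    )


def parse_metadata(llm_reply: str) -> ProjectMetadata:
    """Parse the model's JSON reply; raises if it is malformed or a key is missing."""
    data = json.loads(llm_reply)
    return ProjectMetadata(
        biome_habitat=str(data["biome_habitat"]),
        taxon=[str(t) for t in data["taxon"]],
        common_species_name=str(data["common_species_name"]),
    )


# Stand-in for a real model reply; no API is called in this sketch.
example_reply = json.dumps({
    "biome_habitat": "Grassland",
    "taxon": ["Bird"],
    "common_species_name": "Attwater's Prairie Chicken",
})
print(parse_metadata(example_reply))
```

Parsing into a fixed structure means a malformed reply fails loudly instead of quietly entering the index.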
User Language instead of System Language
Pipeline: Using an LLM to translate to the system
Example: SPARQL query over DBpedia (Wikipedia)
User:
Where are the Gothic buildings in the world
LLM:
SELECT ?building ?location WHERE { ?building dbo:architecturalStyle dbr:Gothic_architecture ; dbo:location ?location . }
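The two ends of this pipeline can be sketched as follows. The prompt wording and function names are hypothetical, and the model reply is hard-coded as a stand-in. The sketch only builds the request URL and sends nothing. It also assumes the endpoint predefines the dbo: and dbr: prefixes; otherwise the query would need explicit PREFIX declarations.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_translation_prompt(question: str) -> str:
    """Hypothetical prompt asking an LLM to turn a question into SPARQL."""
    return (
        "Translate the user's question into a SPARQL query over DBpedia. "
        "Use the dbo: and dbr: prefixes and reply with the query only.\n\n"
        f"Question: {question}"
    )


def build_request_url(sparql_query: str, endpoint: str = DBPEDIA_ENDPOINT) -> str:
    """Encode the query in the `query` parameter of a GET request."""
    return f"{endpoint}?{urlencode({'query': sparql_query})}"


# Stand-in for the model's reply: the query from the example above.
llm_query = (
    "SELECT ?building ?location WHERE { "
    "?building dbo:architecturalStyle dbr:Gothic_architecture ; "
    "dbo:location ?location . }"
)

print(build_translation_prompt("Where are the Gothic buildings in the world"))
print(build_request_url(llm_query))
```

In practice the generated query should be checked before it runs, since the model can invent properties that do not exist in the dataset.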
Result Parsing
- Examples: SearchGPT (OpenAI), Perplexity, Google Search
Controversial for web search, because it a) cannibalizes the effort of a broad swath of creators online and b) is convenient, which keeps people from leaving the search page
For your own databases there is the value without the ethical harm (though quality risks remain)
Elicit
Consensus.app
SciSpace
Semantic Scholar
https://www.semanticscholar.org
RAG - Retrieval Augmented Generation
Combines INPUT and OUTPUT parsing with AI - using a chat model, but giving it access to a search engine.*
Broadly, an example of an AGENT (by one definition, an AGENT is a system that can take actions in the world).
*This is what the little globe icon in ChatGPT does
- e.g. “What is 1 + 1?” vs “What is 1 + 1? If the question requires a calculator, you can call it with calculator('1+1').” - the latter lets the model offload functions that it’s not good at.
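A toy version of the other side of that exchange: code that scans the model's reply for a `calculator('...')` call, runs it, and splices the result back in. The regular expression, the `run_tool_calls` helper, and the hard-coded reply are assumptions for illustration; real chat APIs expose tool calls in a structured form rather than as raw text.

```python
import ast
import operator
import re

# Arithmetic operators the toy calculator is allowed to evaluate.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _is_number(value):
    """True for ints and floats, but not booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def calculator(expression):
    """Evaluate basic arithmetic by walking the syntax tree instead of using eval()."""
    def _eval(node):
        if isinstance(node, ast.Constant) and _is_number(node.value):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f"Unsupported expression: {expression!r}")

    return _eval(ast.parse(expression, mode="eval").body)


# Matches the call format suggested in the prompt above.
TOOL_CALL = re.compile(r"calculator\('([^']*)'\)")


def run_tool_calls(model_output):
    """Replace each calculator('...') call in the model's text with its result."""
    return TOOL_CALL.sub(lambda m: str(calculator(m.group(1))), model_output)


# Stand-in for a model reply; no model is called in this sketch.
print(run_tool_calls("The answer is calculator('1+1')."))
```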
Example prompt
This information is usually abstracted away, but behind the scenes, RAG is really just a prompt
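As a sketch of what "just a prompt" can look like, the function below pastes retrieved passages into the text sent to the chat model. The wording of the instructions and the name `build_rag_prompt` are assumptions, not the prompt any particular product uses, and the passages stand in for real search results.

```python
from typing import List


def build_rag_prompt(question: str, passages: List[str]) -> str:
    """Combine retrieved passages and the user's question into a single prompt."""
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the sources below. "
        "Cite sources by number, and say so if they do not contain the answer.\n\n"
        f"Sources:\n{numbered}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Passages standing in for search engine results.
passages = [
    "Cologne Cathedral is a Gothic church in Cologne, Germany.",
    "Notre-Dame de Paris is a medieval Gothic cathedral in Paris, France.",
]
print(build_rag_prompt("Where are some Gothic buildings?", passages))
```

Numbering the sources gives the model something concrete to cite, which makes the answer easier to validate.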
You can implement other tools in the ‘chain’
Recall the system prompts we looked at
Code Interpreter
Conclusion
Machine learning can help database searching and information retrieval at every step of the process. LLMs themselves are sometimes useful for information behavior, but are often more valuable as part of a bigger system.
- Fundamentally, innovations in text-based language modelling allow our interactions with database systems to be more human
Today’s Lab: Prompt & Circumstance
Design a self-contained interactive experience (game, tool, or interface) that lives entirely within a single AI prompt. The experience should activate when a user types ‘START’ and maintain its designed behavior throughout the interaction.
I’ve been creating some examples to model this.