
AI in Information Behavior


In the News

  • o3-mini, o3-mini-high -> Reasoning models getting cheaper and more efficient
  • OpenAI releases a deep research model -> cool, doesn’t anybody want it?
  • S1 paper (Muennighoff et al. 2025)

Will 2025 be the year of improving smaller models?


Skill Check

  • What is the AI Effect and why is it relevant to discussions of AI creativity?
  • What are the three ‘voices’ typically recognized by instruction-tuned models?
  • How might AI enhance or limit different aspects of creativity (e.g., originality vs flexibility)?
  • What makes a good creative partnership between humans and AI?

Today

Information Behavior and AI:

  1. LLMs for information seeking
  2. LLMs and AI-tools as part of information retrieval

We’ll discuss our New Hobby outcomes in the middle.


From Document Retrieval to Information Retrieval

"As a medium for the display of information, the printed page is superb... With respect to retrievability they are poor. And when it comes to organizing the body of knowledge, books by themselves make no active contribution at all. ...If human interaction with the body of knowledge is a dynamic process involving repeated examinations and intercomparisons of very many small and scattered parts, then any concept of a library that begins with books on shelves is sure to encounter trouble"

"We need to substitute for the book a device that will make it easy to transmit information without transporting material, and that will not only present information to people but also process it for them, following procedures they specify, apply, monitor, and, if necessary, revise and reapply."

J.C.R. Licklider's *Libraries of the Future* (1965)


AI Tools and Generative AI in Research and Information Behavior


AI Information Retrieval For Ready Reference

  • Chat bots are good at information retrieval, but

    a. they’re not perfect, and b. their reliability is selective

  • Good for standard advice, common information, ready reference

  • Worse for niche or very specific information


e.g. Research Methods Advice

  • Which statistical test to use?
  • How to write a literature review?
  • What sections would you expect in a research paper on topic X?

If it’s something that you expect to be findable online, but you want to bring it up quickly and in a way that responds to exactly how you’re thinking about your problem, try a chat bot.



Speaking Your Language

  • For most information seeking, you have to speak the language - a language that you may not know. Chat bots let you speak your language.

Example: Vague Questions, Vague Memories

New post to 'What's the Name of That Book???' forums


GPT-4 speculation


Example: Highly-customized Information Seeking

Cocktail Menu. Depending on how it's prompted it can retrieve real recipes or invent new ones


Coding Assistance

Because code is such a logically structured form of writing, generative AI is particularly effective at helping you write it

  • GitHub Copilot and Cursor are good implementations of an LLM coding assistant

But I don’t code!

Think about highly structured formats that you use in your professional life

  • Stats software (Mplus, R, SAS, Stata)
  • Retrieval languages (SQL, SPARQL); complex boolean queries
  • Markup languages (HTML, XML, Markdown, JSON, JSON-LD)

From my scripting class: Strategies for Pair Programming with Chat Bots - https://links.porg.dev/chat-strategies


From Last week: Niche and Hallucinated Information

LLMs are not always trustworthy, and their trustworthiness varies in different ways than a person's does, which makes validation a challenge



Intermission


Sharing Our ’New Hobby’ Outcomes

Let’s discuss your experiences learning something new with AI assistance:

As a class:

  • What new skill or hobby did you explore, and what AI tools did you use?

In groups:

  • Share more detail about your project! Not every conversation needs to be about the tools - take some show and tell time!
  • Did AI help you overcome the “vocabulary problem” in learning something new?
  • What were the strengths and limitations of using AI as a learning assistant?
  • How did the AI-assisted learning experience compare to traditional learning methods?
  • What surprised you most about using AI to learn a new skill?

Information Retrieval in the Age of LLMs


Information Retrieval Process


INDEXING | USER INPUT

RETRIEVAL | OUTPUT


What can change?

  • A newly designed AI-first retrieval system can rethink any step
  • BUT AI-tools can also be wrapped around an existing system - at varying levels of complexity - to still ease that system’s use

The current AI standard for retrieval

Coming full circle - AI-based embedding models

Models derived from the exemplary language understanding of LLMs, trained to put text in a geometric space where similar text is close together.

-> just like the vector space model in the 70s -> just like latent semantic indexing in the 80s


Use traditional information retrieval systems with an AI-first approach

Many traditional search systems now use LLMs in their indexing, access, and document processing


Embedding-based retrieval systems

Ranked Retrieval Model -> not boolean, but with grades of relevance - the most relevant documents at the top

Diagram of embedding models


Example from OpenAI


The Tools

Embedding Models: The process that makes ‘text’ into ‘embeddings’

  • Sentence-Transformers, OpenAI Embeddings, Google Gecko, Nomic Embed

The retrieval system: The system that can quickly find and rank the document embeddings from a query embedding
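To make the two pieces concrete, here is a minimal sketch of ranked retrieval. It assumes a stand-in `toy_embed` function (hypothetical, just word counts) in place of a real embedding model such as the ones listed above; the names `toy_embed`, `cosine`, and `rank` are illustrative, not any library's API. Word-count vectors are essentially the 70s vector space model, so the ranking logic is the same idea, only the embedding is much cruder.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means no shared terms."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: str, documents: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Score every document against the query and return the top k, best first."""
    query_vec = toy_embed(query)
    scored = [(cosine(query_vec, toy_embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


docs = [
    "gothic cathedrals in france",
    "prairie chicken habitat in texas",
    "cocktail recipes with gin",
]
results = rank("texas prairie habitat", docs, k=2)
for score, doc in results:
    print(f"{score:.2f}  {doc}")
```

Every document gets a graded score rather than a yes/no match, which is what makes this a ranked model instead of a boolean one. A production system would swap `toy_embed` for a trained model and the linear scan for an index built for fast nearest-neighbor search.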



Working with existing systems

Diagram of working with existing systems


Automating Manual Processes (INDEXING > DATA PREP)

LLMs can be implemented into the data preparation process




Project: To assist, enhance and promote the Attwater Prairie Chicken National Wildlife Refuge’s goal of recovering the endangered Attwater’s prairie chicken and restoring native Texas coastal prairie for the benefit of present and future generations.

Output:

biome_habitat='Grassland'
taxon=['Bird']
common_species_name="Attwater's Prairie Chicken"
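One way this kind of extraction could be wired up is sketched below. It assumes the model is asked to reply in JSON and is passed in as a plain function; `fake_metadata_llm` is a hypothetical stand-in that returns a canned reply, and `ProjectMetadata`, `PROMPT_TEMPLATE`, and `extract_metadata` are illustrative names rather than part of any real pipeline.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProjectMetadata:
    biome_habitat: str
    taxon: list[str]
    common_species_name: str


PROMPT_TEMPLATE = (
    "Extract metadata from the project description below.\n"
    "Reply with only a JSON object with the keys "
    "biome_habitat (string), taxon (list of strings), "
    "and common_species_name (string).\n\n"
    "Project: {description}"
)


def extract_metadata(description: str, llm: Callable[[str], str]) -> ProjectMetadata:
    """Build the prompt, call the model, and validate its reply into typed fields."""
    raw = llm(PROMPT_TEMPLATE.format(description=description))
    data = json.loads(raw)  # raises ValueError if the reply is not valid JSON
    return ProjectMetadata(
        biome_habitat=str(data["biome_habitat"]),
        taxon=[str(t) for t in data["taxon"]],
        common_species_name=str(data["common_species_name"]),
    )


def fake_metadata_llm(prompt: str) -> str:
    # Hypothetical stand-in: a real system would send the prompt to a model.
    return json.dumps({
        "biome_habitat": "Grassland",
        "taxon": ["Bird"],
        "common_species_name": "Attwater's Prairie Chicken",
    })


meta = extract_metadata(
    "Recovering the endangered Attwater's prairie chicken and restoring "
    "native Texas coastal prairie.",
    fake_metadata_llm,
)
print(meta)
```

Parsing the reply into a fixed structure is the part that matters for indexing: a reply that is missing a field or is not valid JSON fails loudly instead of silently entering the database.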

User Language instead of System Language


Pipeline: Using an LLM to translate to the system

Diagram of retrieval pipeline


Example: SPARQL query over DBpedia (Wikipedia)

  1. User: Where are the Gothic buildings in the world?

  2. LLM: SELECT ?building ?location WHERE { ?building dbo:architecturalStyle dbr:Gothic_architecture ; dbo:location ?location . }
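The translation step might look like the sketch below. The model is again passed in as a function, and `fake_sparql_llm` is a hypothetical stand-in that returns the query from this slide; `question_to_sparql` is an illustrative name. The `dbo:` and `dbr:` prefix declarations are DBpedia's standard namespaces, added here so the query is complete on its own. The sketch stops before sending anything to an endpoint.

```python
from typing import Callable

PREFIXES = (
    "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
    "PREFIX dbr: <http://dbpedia.org/resource/>\n"
)


def question_to_sparql(question: str, llm: Callable[[str], str]) -> str:
    """Ask the model for a query, sanity-check it, and prepend the prefixes."""
    prompt = (
        "Translate the question into a SPARQL SELECT query for DBpedia. "
        "Use the dbo: and dbr: prefixes and reply with the query only.\n\n"
        f"Question: {question}"
    )
    query = llm(prompt).strip()
    # Only accept read-only SELECT queries from the model.
    if not query.upper().startswith("SELECT"):
        raise ValueError("expected a SELECT query from the model")
    return PREFIXES + query


def fake_sparql_llm(prompt: str) -> str:
    # Hypothetical stand-in: returns the query shown on the slide.
    return (
        "SELECT ?building ?location WHERE { "
        "?building dbo:architecturalStyle dbr:Gothic_architecture ; "
        "dbo:location ?location . }"
    )


sparql = question_to_sparql(
    "Where are the Gothic buildings in the world?", fake_sparql_llm
)
print(sparql)
```

The check on the model's reply is deliberately simple. Even so, it shows the general pattern: the user speaks their own language, the LLM speaks the system's, and ordinary code sits in between to decide what actually gets run.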


Result Parsing

Diagram of result parsing


  • Examples: SearchGPT (OpenAI), Perplexity, Google Search

Controversial for web search, because it a) cannibalizes the effort of a broad swath of creators online and b) is so convenient that it keeps people from leaving the search page

For your own databases there is the value without the ethical harm (though quality risks remain)


Elicit

https://elicit.org/

Elicit enables natural language search (and summarization) over peer-reviewed literature


Consensus.app

https://consensus.app/


SciSpace

https://typeset.io/


Semantic Scholar

https://www.semanticscholar.org

Semantic Scholar is a research search and tracking tool with effective recommendation (passive search)


RAG - Retrieval Augmented Generation

Combines INPUT and OUTPUT parsing with AI - using a chat model, but giving it access to a search engine.*

Broadly, an example of an AGENT. (By one definition, an AGENT is a system that can take actions in the world.)

  • e.g. “What is 1 + 1?” vs “What is 1 + 1? If the question requires a calculator, you can call it with calculator('1+1')” - the latter lets the model offload functions that it’s not good at.

*This is what the little globe icon in ChatGPT does
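The calculator example can be sketched in a few lines. This assumes the model writes the tool call literally as `calculator('...')` in its output, as in the prompt above; real chat APIs use their own structured tool-call formats, so `TOOL_CALL` and `run_tool_calls` here are illustrative only.

```python
import ast
import operator
import re

# The only arithmetic operators the calculator tool will accept.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> float:
    """Evaluate plain arithmetic by walking the syntax tree, without eval()."""
    def evaluate(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        raise ValueError(f"unsupported expression: {expression!r}")

    return evaluate(ast.parse(expression, mode="eval").body)


# Matches a tool call written as calculator('...') in the model's text.
TOOL_CALL = re.compile(r"calculator\('([^']*)'\)")


def run_tool_calls(model_output: str) -> str:
    """Replace each calculator call in the model's text with its computed result."""
    return TOOL_CALL.sub(lambda m: str(calculator(m.group(1))), model_output)


print(run_tool_calls("The answer is calculator('1+1')."))
```

The model only has to decide when a calculator is needed and what to type into it; the arithmetic itself is done by ordinary code. Search in RAG works the same way, with a search engine in place of the calculator.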

Example prompt

This information is usually abstracted away, but behind the scenes, RAG is really just a prompt

RAG Diagram
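As a rough illustration of "just a prompt", the sketch below pastes retrieved passages into the text sent to the chat model. The wording of the instructions and the name `build_rag_prompt` are assumptions for illustration; actual products phrase and structure this differently.

```python
def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Assemble the prompt a RAG system sends: instructions, sources, question."""
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the sources below. "
        "Cite sources by number, and say so if the answer is not in them.\n\n"
        f"Sources:\n{numbered}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# In a real system these passages would come back from the search step.
passages = [
    "Libraries of the Future was written by J.C.R. Licklider and published in 1965.",
    "Latent semantic indexing was introduced in the late 1980s.",
]
prompt = build_rag_prompt("Who wrote Libraries of the Future?", passages)
print(prompt)
```

Everything the model "knows" about the retrieved documents is whatever fits in this text, so the quality of the retrieval step sets the ceiling on the quality of the answer.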


You can implement other tools in the ‘chain’

Recall the system prompts we looked at

Langchain


Code Interpreter


Conclusion

Machine learning can help database searching and information retrieval at every step of the process. LLMs are sometimes useful for information seeking on their own, but they are often more valuable as part of a bigger system.

  • Fundamentally, innovations in text-based language modelling allow our interactions with database systems to be more human

Today’s Lab: Prompt & Circumstance

Design a self-contained interactive experience (game, tool, or interface) that lives entirely within a single AI prompt. The experience should activate when a user types ‘START’ and maintain its designed behavior throughout the interaction.

I’ve been creating some examples to model this.


References