AI Tool Evaluation Journal

Week 1

Competencies: Critical Analysis, Fluency, and Evaluation; Technical Understanding; Skilling and Productivity

Overview

Compare different AI tools on specific tasks to understand their capabilities, limitations, and appropriate use cases.

Details

  • Select 2 different AI tools (e.g., ChatGPT, Claude, Gemini), or two different models from the same tool
  • Write a basic query. It can be anything you like, but here are some ideas:
    • recommendations (e.g. finding a movie or book you might like, suggesting gifts)
    • information gathering (e.g. researching a topic or its history)
    • creative ideation and feedback (e.g. brainstorming ideas, writing stories)
  • Document differences in outputs for identical prompts
  • Analyze strengths/weaknesses for different types of tasks
  • Consider factors like accuracy, creativity, reasoning ability, and overall ‘vibes’
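
One way to keep the comparison organized is a small note-taking structure. The following is a minimal sketch, not a required format: the `Comparison` class, the `format_entry` helper, and the tool names and responses are all hypothetical placeholders, and the responses are assumed to be pasted in by hand rather than fetched from any service.

```python
from dataclasses import dataclass, field


@dataclass
class Comparison:
    """One identical prompt sent to several tools, plus your observations."""
    prompt: str
    outputs: dict[str, str]  # tool name -> response text, pasted in by hand
    notes: list[str] = field(default_factory=list)


def format_entry(entry: Comparison) -> str:
    """Render one comparison as a short point-form journal entry."""
    lines = [f"Prompt: {entry.prompt}"]
    for tool, text in entry.outputs.items():
        # Word count is only a rough signal of how verbose each response was.
        lines.append(f"  {tool}: {len(text.split())} words")
    for note in entry.notes:
        lines.append(f"  - {note}")
    return "\n".join(lines)


# Placeholder data for illustration only.
entry = Comparison(
    prompt="Suggest three books for a rainy weekend.",
    outputs={
        "Tool A": "placeholder response one",
        "Tool B": "a longer placeholder response here",
    },
    notes=["Tool B gave more detail", "Tool A answered faster"],
)
print(format_entry(entry))
```

Word counts are easy to record, but factors such as accuracy, creativity, and reasoning still need your own judgment, which is what the `notes` list is for.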

You can also try systems that show different models side by side. For example, Chatbot Arena is a project that shows you responses from two anonymous models and lets you choose the better of the two. Another tool is the Vercel playground (to use it without an account, choose models that don’t say ‘Hobby’ or ‘Pro’).

You may notice that some models come in different sizes (e.g. ‘8B’ vs ‘27B’ in the model name, or ‘-mini’, or ‘haiku/sonnet/opus’). How much is lost with smaller models, or gained with bigger ones?
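
If you are comparing several models, it can help to pull the size hint out of each name so your notes record which variant you tested. This is a rough sketch under stated assumptions: `size_hint` is a hypothetical helper, the model names are made-up placeholders, and real naming schemes vary, so treat the result as a hint rather than a fact about the model.

```python
import re

# Tier words mentioned above; this list is illustrative, not exhaustive.
TIER_WORDS = ("mini", "haiku", "sonnet", "opus")


def size_hint(model_name: str) -> str:
    """Return a size hint found in a model name, or 'no size hint'."""
    # Split into tokens so that 'mini' does not match inside 'gemini'.
    tokens = re.split(r"[^a-z0-9.]+", model_name.lower())
    for token in tokens:
        match = re.fullmatch(r"(\d+(?:\.\d+)?)b", token)
        if match:
            # 'B' conventionally stands for billions of parameters.
            return f"{match.group(1)}B parameters"
    for token in tokens:
        if token in TIER_WORDS:
            return f"tier: {token}"
    return "no size hint"


# Placeholder names for illustration only.
for name in ["example-model-8b", "example-model-27b", "example-mini", "gemini-example"]:
    print(name, "->", size_hint(name))
```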

Completion Details

We’ll discuss and compare results in group discussion.

Even though we’re just discussing it, it’s good to keep notes or screenshots in case you include this in your portfolio.

Portfolio Details

Size: 1 point

If including in portfolio: compare models on at least three queries, jotting down brief notes on the differences you observe for each (point-form is fine), and summarize your impressions in one or two paragraphs.

You can choose what to focus on. For example: do you have model recommendations? Did you observe little or no variability? What other quirks did you notice? Would a specific system’s characteristics make it better or worse suited to specific types of tasks?

You can also choose what types of comparisons you want to make. Different companies’ systems?

Grading

Grading focus will be on: whether the submission demonstrates thoughtful comparison and critical analysis of the AI tools tested; demonstration of fluency in interpreting AI results; understanding of different