Introduction to AI Evals
An introduction to AI Evals: why we need them, and how to create them.
What you'll learn
What to expect from this series, and what you should know before you start.
Mental model
Mapping your web testing knowledge to the world of large language models.
Design evaluations
Define what good and bad look like for your AI application.
Build rule-based evaluations
Automate the basics. Use code to catch simple errors.
Build a basic judge, part 1
Get your subjective evaluations running with a basic judge model.
Build a basic judge, part 2
Finish setting up your basic judge model to get your subjective evaluations running.
Build an evals pipeline
Applied engineering tips to build your AI testing pipeline.
Run evaluations
Structure your testing into layers.
Course resources
Optional
A non-exhaustive list of sources used in this course and eval tools that can help you.