Question 1

What is "LLM-as-a-judge" about?

Accepted Answer

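LLM-as-a-judge means using a strong model to grade outputs against a rubric, either pointwise (score each answer on its own) or pairwise (pick the better of two answers). It is fast and scalable, but the judge has known biases (answer order, verbosity, self-preference), so calibrate it against human labels before trusting its scores.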
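To make the order bias concrete, here is a minimal sketch of one common mitigation: ask the judge twice with the two answers swapped, and only accept a winner when both verdicts agree. The `Judge` signature, `debiased_pairwise`, and the two stand-in judges are hypothetical names for illustration; a real judge would wrap a model call, which is left out here.

```python
from typing import Callable, Literal

Verdict = Literal["A", "B", "tie"]
# Hypothetical judge signature: (question, first_shown, second_shown) -> verdict,
# where "A" means the first-shown answer wins and "B" the second-shown one.
Judge = Callable[[str, str, str], Verdict]


def debiased_pairwise(judge: Judge, question: str, ans_1: str, ans_2: str) -> str:
    """Query the judge in both orders; return a winner only if the verdicts agree."""
    first = judge(question, ans_1, ans_2)   # ans_1 shown in position A
    second = judge(question, ans_2, ans_1)  # ans_2 shown in position A
    if first == "A" and second == "B":
        return "ans_1"
    if first == "B" and second == "A":
        return "ans_2"
    # Ties and order-dependent (inconsistent) verdicts are both reported as a tie.
    return "tie"


def always_first(question: str, a: str, b: str) -> Verdict:
    """Stand-in judge with pure position bias: always picks the first-shown answer."""
    return "A"


def longer_wins(question: str, a: str, b: str) -> Verdict:
    """Stand-in judge that prefers the longer answer, regardless of position."""
    if len(a) == len(b):
        return "tie"
    return "A" if len(a) > len(b) else "B"


if __name__ == "__main__":
    print(debiased_pairwise(always_first, "q", "x", "y"))
    print(debiased_pairwise(longer_wins, "q", "short", "much longer"))
```

The swap cancels out a judge that simply favors a position, since its two verdicts contradict each other. It does not remove verbosity or self-preference bias: `longer_wins` is perfectly consistent across orders and still prefers length, which is why a check against human labels is needed as well.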

Question 2

What problem does it solve?

Accepted Answer

You have 10,000 open-ended answers to grade. Humans can't score them all, but a judge model can.

Question 3

What will I be able to do after this lesson?

Accepted Answer

You will be able to explain LLM-as-a-judge, name its main biases, and describe how to make it trustworthy.

Question 4

What comes next?

Accepted Answer

Both evals and judges need a source of truth: labeled data.
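As a rough illustration of how labeled data acts as the source of truth for a judge, the sketch below compares judge verdicts with human labels on the same items and reports raw agreement plus Cohen's kappa (agreement corrected for chance). The function name and the pass/fail labels are made up for the example, and returning 1.0 when both raters use a single identical label is a convention chosen here, not a standard.

```python
from collections import Counter
from typing import Sequence, Tuple


def agreement_and_kappa(human: Sequence[str], judge: Sequence[str]) -> Tuple[float, float]:
    """Return (observed agreement, Cohen's kappa) for two label sequences."""
    if len(human) != len(judge) or not human:
        raise ValueError("label sequences must be non-empty and the same length")
    n = len(human)
    # Fraction of items where judge and human gave the same label.
    observed = sum(h == j for h, j in zip(human, judge)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    h_counts, j_counts = Counter(human), Counter(judge)
    expected = sum(h_counts[c] * j_counts[c] for c in h_counts) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label everywhere; kappa is undefined,
        # so this sketch reports 1.0 by convention.
        return observed, 1.0
    return observed, (observed - expected) / (1 - expected)


if __name__ == "__main__":
    human_labels = ["pass", "pass", "fail", "fail"]  # illustrative data only
    judge_labels = ["pass", "pass", "fail", "pass"]
    print(agreement_and_kappa(human_labels, judge_labels))
```

Raw agreement can look high when one label dominates, so kappa is the more informative number when deciding whether the judge tracks the human labels.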