Interpretability
The idea: Explore real attention patterns / feature activations on curated inputs.
What you'll be able to do: You can explain interpretability: features inside a model track human concepts.
The problem it solves: Can we see what's happening inside?
Builds on: Attention scores, softmax & the weighted sum
← Hallucination · Next: Training-data curation →
All lessons