Skip to content
See How AI Works
← all lessonsE.9●●●○○elective

Safety attack lab

Injection, excessive agency, data exfiltration, unsafe output: break each chain with one guard.

1The attack is landing. Tap the green guard node to break the chain.your turn
LLM01: Prompt InjectionPrompt injection
Untrusted web page
Agent obeys the hidden command
HARM: Agent runs the attacker's instructions

the wallThe page hid "ignore your instructions and …" β€” the agent read it as a command and obeyed.

β†’ continue← backR replay

Next up

E.10 Capstone: fix a broken AI assistant
Elective RoomsΒ·

Common questions

What is "Safety attack lab" about?
Injection, excessive agency, data exfiltration, unsafe output: break each chain with one guard.
What problem does it solve?
A helpful agent can be turned against you. How do attackers do it, and how do you stop them?
What will I be able to do after this lesson?
You can recognize prompt injection, excessive agency, data exfiltration, and unsafe output handling, and the guard that defuses each (OWASP LLM Top 10).