Three lessons from using AI to “red team” risk

Red-Teaming Risk

Red-teaming is where we put ourselves in the position of an adversary – in our case, someone trying to subvert our security, values, and rules – to test the robustness of our preventative framework. Can AI help us with this task?

If you ask AI questions like “Where is a manufacturing business most vulnerable to fraud? Please provide examples, labelling the most common scenes and emerging threats.” It will provide answers like those in the screenshot 👆. Not revelatory, nor customised to what (you manufacture), where, how, and with whom. But it’s a start if you’re stuck.

However, some AI tools will refuse requests for what they deem harmful information, for instance, how to perform insider trading (response 👇).

There is a workaround. The New Scientist suggests that researchers got around ChatGPT’s moral conscience when they switched the language to more niche ones. Specifically, they used Google Translate to pose questions about insider trading and how to commit terrorist acts and found that Zulu had the highest success rate, bypassing the chatbot’s safeguards 53% of the time, followed by Scots Gaelic (43%), the Hmong (29%), and Guaraní (16%).

So what?

Well, a few lessons spring to mind:

🚦 We can expect “bad actors” (not Steven Seagal, dancing below; people who wish your organisation harm) to leverage AI for ideas. We might, therefore, wish to give AI’s “red-teaming” capabilities a whirl. AI will get better at this. If you want a breakdown of the different AI tools I use and how, I wrote about it here.

🚦 Much like compliance, we’ll get bad results if we give unclear instructions. Play around with the commands you give.

🚦 Whatever safeguards we create – like ChatGPT’s prohibition of certain questions – humans are endlessly creative when seeking to subvert. In the example I gave, a simple hop to Google Translate, but what is your risk framework’s Zulu?

Need more?

Book a (free) strategy session, get new articles, and other content designed to be useful and fun.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Need more?

Services

Ethics Insight

Your Quick Guide To Managing Ethics & Compliance