Red-Teaming Risk
Red-teaming is where we put ourselves in the position of an adversary – in our case, someone trying to subvert our security, values, and rules – to test the robustness of our preventative framework. Can AI help us with this task?
If you ask AI a question like “Where is a manufacturing business most vulnerable to fraud? Please provide examples, labelling the most common schemes and emerging threats”, it will provide answers like those in the screenshot 👆. Not revelatory, nor customised to what you manufacture, where, how, and with whom. But it’s a start if you’re stuck.
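If you want more than that generic starting point, fold your own context into the prompt. Here is a minimal sketch of what that could look like via OpenAI’s Python client; the model name and the business details are illustrative placeholders, not a recommendation of any particular tool.

```python
# A minimal sketch of a customised red-teaming prompt. The model name and the
# business details are illustrative placeholders; swap in your own provider,
# model, and context.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

business_context = (
    "We manufacture packaging materials at two UK plants, sell through "
    "regional distributors, and rely on a handful of overseas raw-material "
    "suppliers."
)

prompt = (
    "Act as a fraud red-teamer. Given this context: "
    f"{business_context} "
    "Where are we most vulnerable to fraud? List the most common schemes and "
    "emerging threats, and explain how each would exploit our specific setup."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; use whichever model you have access to
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```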
However, some AI tools will refuse requests for what they deem harmful information, for instance, how to perform insider trading (response 👇).
There is a workaround. New Scientist reports that researchers got around ChatGPT’s moral conscience by switching to more niche languages. Specifically, they used Google Translate to pose questions about insider trading and how to commit terrorist acts, and found that Zulu had the highest success rate, bypassing the chatbot’s safeguards 53% of the time, followed by Scots Gaelic (43%), Hmong (29%), and Guaraní (16%).
So what?
Well, a few lessons spring to mind:
🚦 We can expect “bad actors” (not Steven Seagal, dancing below; people who wish your organisation harm) to leverage AI for ideas. We might, therefore, wish to give AI’s “red-teaming” capabilities a whirl. AI will get better at this. If you want a breakdown of the different AI tools I use and how I use them, I wrote about it here.
🚦 Much as with compliance, we’ll get bad results if we give unclear instructions. Play around with the prompts you give; see the sketch after this list.
🚦 Whatever safeguards we create – like ChatGPT’s prohibition of certain questions – humans are endlessly creative when seeking to subvert them. In the example above, the workaround was a simple hop to Google Translate; what is your risk framework’s Zulu?
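On the second lesson, here’s a small, hypothetical sketch of the difference clearer instructions make, again via OpenAI’s Python client with placeholder details: the same question asked vaguely, then with the specifics a red-teamer would actually care about.

```python
# Illustrative only: the same red-teaming question asked vaguely, then with
# enough specifics for the answer to be usable. Model name and business
# details are placeholders.
from openai import OpenAI

client = OpenAI()

vague_prompt = "Where could fraud happen in my company?"

specific_prompt = (
    "You are red-teaming a mid-sized packaging manufacturer. Procurement is "
    "decentralised, expense approvals sit with line managers, and payments "
    "run through one shared-services team. List the five fraud scenarios "
    "most likely to succeed here, who would perpetrate each, the red flags "
    "to monitor, and one control that would make each scenario harder."
)

for label, prompt in [("vague", vague_prompt), ("specific", specific_prompt)]:
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {label} ---")
    print(response.choices[0].message.content)
```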