On the Necessity and Challenges of Safety Guardrails for Deep Learning Models
In recent years, deep learning models, especially transformer and diffusion models, have become the powerhouses of the AI world. They have demonstrated superhuman or near-superhuman performance on tasks such as natural-language processing and image generation. Yet this expressive capacity also brings trouble: the more capable these models become, the more damage they can do and the more deeply they become entangled with our everyday lives. In this article we address safety guardrails for deep learning models and the unique explainability challenges they raise.

A Human Analogy: Understanding Minds and Brains

You and I routinely arrive at decisions, you about how to behave and I about how to predict your behaviour, even though we each have very little insight into the other's inner control loops. And we are often highly successful at establishing trust and safety that way, thanks to societal conventions, laws, and governance: the 'rules of the game' that create robust norms for behaviour. The same dynamic applies to machine learning.
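To make the idea concrete, here is a minimal sketch of an external guardrail: a policy check wrapped around a model's output that needs no insight into the model's internals, much as the 'rules of the game' above need no insight into our brains. This is an illustration under assumptions, not a real policy; the function name, the blocklist, and the stand-in model are all hypothetical.

```python
# A minimal, illustrative output-side guardrail. It treats the model as a
# black box (any callable from prompt to text) and enforces a policy check
# on the result, independent of the model's internal workings.
# All names here are hypothetical placeholders.

from typing import Callable

# Toy policy list; a real guardrail would use a maintained policy or a
# trained classifier, not a hard-coded set of phrases.
BLOCKED_TERMS = {"how to build a weapon", "stolen credentials"}

def guarded_generate(model: Callable[[str], str], prompt: str) -> str:
    """Run the model, then apply an output-side policy check."""
    output = model(prompt)
    lowered = output.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        # Withhold rather than return policy-violating text.
        return "[response withheld by safety guardrail]"
    return output

if __name__ == "__main__":
    # A stand-in "model": any prompt-to-text callable works.
    echo_model = lambda p: f"Echo: {p}"
    print(guarded_generate(echo_model, "hello world"))
```

The design point mirrors the analogy: the guardrail lives outside the model, so it can be audited, versioned, and updated without retraining or understanding the model itself.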