What it takes to make AI responsible in an era of advanced models

AI’s content-creation capabilities have skyrocketed in the last year, yet the act of writing remains incredibly personal. When AI is used to help people communicate, respecting the original intent of a message is of paramount importance—but recent innovation, particularly in generative AI, has outpaced existing approaches to delivering responsible writing assistance.

When thinking about safety and fairness in the context of AI writing systems, researchers and industry professionals usually focus on identifying toxic language like derogatory terms or profanity and preventing it from appearing to users. This is an essential step toward making models safer and ensuring they don’t produce the worst of the worst content. But on its own, this isn’t enough to make a model safe. What if a model produces content that is entirely innocuous in isolation but becomes offensive in particular contexts? A saying like “Look on the bright side” might be positive in the context of a minor inconvenience yet outrageously offensive in the context of war.

As AI developers, it’s not enough for us to block toxic language to claim our models are safe. To actually deliver responsible AI products, we must understand how our models work, what their flaws are, and what contexts they might be used in—and we must put controls in place to prevent harmful interactions between our AI systems and our users.

The problem and why it matters

According to a Forrester study, 70 percent of people use generative AI for most or all of their writing and editing at work. With this rise in the use of generative AI tools, more content than ever before is regularly interacting with AI, machine learning (ML), and large language models (LLMs).

And we know that AI makes mistakes. Typically, when an AI model makes a suggestion that changes the meaning of a sentence, it’s a harmless error—it can simply be rejected. This gets more complicated as technology advances and as developers rely more on LLMs. For instance, if an LLM is prone to political bias, it might not be responsible to allow it to generate political reporting. If it’s prone to misinformation and hallucination, it may be dangerous and unethical to allow it to generate medical advice and diagnoses. The stakes of inappropriate outputs are much higher, with harmless errors no longer the only outcome.

A way forward

The industry must develop new tactics for safety efforts to keep up with the capabilities—and flaws—of the latest AI models.

I previously mentioned a few circumstances in which blocking toxic language is not enough to prevent dangerous interactions between AI systems and our users in today’s ecosystem. When we take the time to explore how our models work, their weaknesses, and the contexts they will be used in, we can deliver responsible support in those examples and more:

A generative AI writing tool can draft a summary of a medical diagnosis. However, given the risk of inserting misleading or out-of-context information, we can prevent the LLM from returning inaccurate information by using the right ML model as a guardrail.

Political opinions are nuanced, and an AI product’s suggestion or output can easily misconstrue the integrity of a point since it doesn’t understand the intent or context. Here again, a carefully crafted model may prevent an LLM from engaging with some political topics in cases where there is a risk of misinformation or bias.

If you’re writing a condolence note to a coworker, a model can prevent an AI writing assistant from making a tone-deaf suggestion to sound more positive.

One example of a mechanism that can help deliver results like these is Seismograph—the first model of its kind that can be layered on top of large language models and proprietary machine learning models to mitigate the likelihood of dicey outputs. Much as a seismograph machine measures earthquake waves, Seismograph technology detects and measures how sensitive a text is so models know how to engage, minimizing the negative impact on customers.

Seismograph is just one example of how a hybrid approach to building—with LLMs, ML, and AI models working together—creates more trustworthy and reliable AI products. By reducing the odds of AI delivering adverse content without appropriate context, the industry can provide AI communication assistance from a place of empathy and responsibility.

The future of responsible AI

When AI communication tools were primarily limited to the basic mechanics of writing, the potential damage done by a writing suggestion was minimal regardless of the context. Today, we rely on AI to take on more complex writing tasks where context matters, so AI providers have a greater responsibility to ensure their technology doesn’t have unintended consequences.

Product builders can follow these three principles to hold themselves accountable:

1. Test for weak spots in your product: Red teaming, bias and fairness evaluations, and other pressure tests can uncover vulnerabilities before they significantly impact customers.

2. Identify industry-wide solutions that make building responsible AI easier and more accessible: Developments in responsible approaches help us all improve the quality of our products and strengthen consumer trust in AI technology.

3. Embed Responsible AI teams across product development: This work can fall through the cracks if no one is explicitly responsible for ensuring models are safe. Companies must prioritize Responsible AI teams and empower them to play a central role in building new features and maintaining existing ones.

These principles can guide the industry’s work and commitment to developing publicly available models like Seismograph. In doing so, we demonstrate that the industry can stay ahead of risk and provide people with more complex suggestions and generated outputs—without causing harm.

We’ve featured the best AI chatbot for business.

This article was produced as part of TechRadarPro’s Expert Insights channel where we feature the best and brightest minds in the technology industry today. The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing find out more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro