In a bid to improve accountability and transparency in AI development, OpenAI has released a preliminary draft of “Model Spec.” This first-of-its-kind document outlines the principles guiding model behavior in its API and ChatGPT, OpenAI announced in a blog post.
“We’re doing this because we think it’s important for people to be able to understand and discuss the practical choices involved in shaping model behavior,” the company said in the blog. “The Model Spec reflects existing documentation that we’ve used at OpenAI, our research and experience in designing model behavior, and work in progress to inform the development of future models. This is a continuation of our ongoing commitment to improve model behavior using human input and complements our collective alignment work and broader systematic approach to model safety.”
Model behavior, or how AI models respond to user inputs, encompasses aspects such as tone, personality, and response length, and plays a critical role in AI-human interactions. Shaping that behavior is complex: models learn from diverse datasets and may encounter conflicting objectives in practice.
Shaping this behavior is still a nascent science, as models are not explicitly programmed but instead learn from a broad range of data, OpenAI said.
The draft Model Spec outlines a three-pronged approach to shaping AI behavior, specifying OpenAI’s “desired model behavior” and how the company evaluates tradeoffs when “conflicts arise,” the ChatGPT creator added in the blog.
The first part of the Model Spec focuses on core objectives: broad principles that guide model behavior, including helping users achieve their goals, benefiting humanity, and reflecting positively on OpenAI. These foundational principles also call for models to adhere to “social norms and applicable law.”
Beyond these broad objectives, the document also provides clear instructions, which the blog refers to as “rules.” These rules are designed to address complex situations and “help ensure the safety and legality” of AI actions. Some of these rules include following instructions from users, complying with laws, avoiding the creation of information hazards, respecting user rights and privacy, and avoiding the generation of inappropriate or NSFW (not safe for work) content.
Finally, the Model Spec acknowledges that there may be situations where these objectives and rules “conflict.” To navigate these complexities, the document suggests default behaviors for the AI model to follow, including assuming the best intentions from users, being helpful without “overstepping” boundaries, and encouraging respectful interactions.
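Taken together, the Spec reads like a layered decision procedure: rules act as hard constraints, while defaults apply unless the user legitimately overrides them. The sketch below is a hypothetical illustration of that hierarchy; the Spec itself is a prose document, and the type names, checks, and resolution logic here are assumptions for illustration, not OpenAI code.

```python
# Hypothetical sketch of the Model Spec's three-tier structure.
# The Spec is prose, not code; these names and this logic are assumed.

from dataclasses import dataclass
from typing import Callable

Check = Callable[[str], bool]  # returns True if a candidate response passes

@dataclass
class ModelSpec:
    objectives: list[str]        # broad principles guiding behavior
    rules: dict[str, Check]      # hard constraints; never overridable
    defaults: dict[str, Check]   # sensible behaviors users may override

def vet_response(spec: ModelSpec, response: str, user_overrides: set[str]) -> bool:
    """Accept a response only if it satisfies every rule, plus every
    default the user has not explicitly overridden."""
    if not all(check(response) for check in spec.rules.values()):
        return False  # rules always take precedence
    for name, check in spec.defaults.items():
        if name not in user_overrides and not check(response):
            return False
    return True

# Example: a rule blocking NSFW content and an overridable length default.
spec = ModelSpec(
    objectives=["assist users", "benefit humanity", "reflect well on OpenAI"],
    rules={"no_nsfw": lambda r: "nsfw" not in r.lower()},
    defaults={"be_concise": lambda r: len(r) < 500},
)
print(vet_response(spec, "A short, safe answer.", user_overrides=set()))  # True
```

The ordering is the point the Spec emphasizes: objectives inform both rules and defaults, rules are non-negotiable for safety and legality, and defaults exist precisely so they can be adjusted in context.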
“This is the direction the models should ideally be going and it’s great to see OpenAI making the effort with this new spec on how a model should behave according to the user with greater context and personalization but more so ‘responsibly,’” said Neil Shah, VP for research and partner at Counterpoint Research, a global research and consulting firm.
“Our intention is to use the Model Spec as guidelines for researchers and data labelers to create data as part of a technique called reinforcement learning from human feedback (RLHF),” OpenAI said in the Model Spec document itself. “The Spec, like our models themselves, will be continuously updated based on what we learn by sharing it and listening to feedback from stakeholders.”
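In RLHF, human labelers’ preferences between candidate responses become the training data for a reward model that then steers the base model. The toy sketch below illustrates what spec-guided labeling could look like; the schema, function names, and scoring function are hypothetical, since OpenAI has not published its pipeline.

```python
# Toy sketch of spec-guided labeling producing RLHF preference data.
# All names here are hypothetical; OpenAI's actual pipeline is not public.

from dataclasses import dataclass
from typing import Callable

@dataclass
class PreferencePair:
    prompt: str
    chosen: str     # response the labeler judged closer to the Spec
    rejected: str   # response the labeler judged further from it

def label_by_spec(prompt: str, a: str, b: str,
                  spec_score: Callable[[str], float]) -> PreferencePair:
    """A labeler (simulated here by spec_score) picks the response that
    better follows the Model Spec; such pairs later train a reward model."""
    if spec_score(a) >= spec_score(b):
        return PreferencePair(prompt, chosen=a, rejected=b)
    return PreferencePair(prompt, chosen=b, rejected=a)

# Stand-in scorer: prefer concise answers (purely illustrative).
score = lambda r: -len(r)
pair = label_by_spec(
    "Explain RLHF briefly.",
    "RLHF tunes a model using human preference data.",
    "RLHF is a very long-winded explanation ..." * 5,
    score,
)
print(pair.chosen)  # the shorter answer wins under this toy scorer
```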
RLHF will tune models more closely to actual human behavior while also making that tuning transparent through set objectives, principles, and rules, taking OpenAI’s models to the next level in responsibility and usefulness, Shah said. “Though this will be a constantly moving target to fine-tune the specs as there are a lot of grey areas with respect to how a query is construed and what the final objective is and the model has to be intelligent and responsible enough to detect if the query and response is less responsible.”
The Model Spec represents a significant step toward ethical AI. The company emphasizes the importance of building trust with users and the public, who are increasingly interacting with AI systems in their daily lives.