OpenAI’s new Structured Outputs feature is designed to ensure that model-generated outputs exactly match JSON schemas that you provide. This feature is particularly beneficial for developers who need consistent and structured data formats, whether for API integration, data processing or application development.
I’ll walk you through getting started with Structured Outputs, including setting up your environment, defining JSON schemas and generating model outputs that conform to your specifications using the OpenAI API.
Introduction to Structured Outputs
Structured Outputs allow you to enforce specific data formats by defining JSON schemas that the model’s output must follow. This ensures that the model-generated data is both predictable and reliable, fitting seamlessly into your existing data workflows. Structured Outputs can be implemented in two main ways: through function calling and by using the response_formatparameter with the new json_schema option.
Why Use Structured Outputs?
Structured Outputs are incredibly useful when you need to:
- Integrate with other APIs that require data in a specific format.
- Ensure consistency in data returned by your model, reducing the need for additional validation or formatting.
- Simplify the process of using large language models (LLMs) in applications that depend on structured data, such as databases or web services.
-
Getting Started
Prerequisites
Before diving into Structured Outputs, make sure you have the following:
- Python installed on your machine.
- An OpenAI API key.
- The dotenv library for managing environment variables. You can install the necessary libraries using pip:
pip install openai python-dotenv
Set Up Your Environment
Start by creating a .env file in your project directory to securely store your OpenAI API key:
OPENAI_API_KEY=your_openai_api_key
Next, load this API key in your Python script to interact with the OpenAI API:
import os
import openai
from dotenv import load_dotenv
load_dotenv()
# Set OpenAI API key
openai.api_key = os.getenv('OPENAI_API_KEY')
Using Structured Outputs in the OpenAI API
Let’s walk through how to use Structured Outputs in practice, focusing on both function calling and the response_format parameter.
1. Define a JSON Schema
To begin, define a JSON schema that your model’s output should conform to. For this example, I’ll assume you are working with a simple schema for user profile data.
{
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer"},
"email": {"type": "string", "format": "email"}
},
"required": ["name", "age", "email"]
}
This schema specifies that the output must be an object containing three fields: name, age and email. The name field is a string, age is an integer and email must follow the email format.
2. Set Up the API Request
Next, set up an API request that instructs the model to generate data matching this schema. Use the response_format parameter with the json_schema option to enforce the structure.
def generate_profile():
schema = {
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer"},
"email": {"type": "string", "format": "email"}
},
"required": ["name", "age", "email"]
}
response = openai.Completion.create(
model="gpt-3.5-turbo",
prompt="Generate a user profile:",
response_format={"json_schema": schema},
max_tokens=100
)
return response['choices'][0]['message']['content']
print(generate_profile())
3. Use Function Calling With Structured Outputs
Another way to leverage Structured Outputs is through function calling. This approach allows you to define specific functions that the model can call based on the provided schema. Here’s how you can implement this:
a. Define the Function
def create_user_profile(name, age, email):
return {
"name": name,
"age": age,
"email": email
}
b. Register the Function With OpenAI
You’ll need to register the function as a tool that the model can call.
tools = [
{
"type": "function",
"function": {
"name": "create_user_profile",
"description": "Create a user profile with name, age, and email.",
"parameters": {
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer"},
"email": {"type": "string", "format": "email"}
},
"required": ["name", "age", "email"]
}
}
}
]
c. Generate the Output
Now, you can generate the output by invoking the function via the API.
import jsonschema
from jsonschema import validate
def validate_profile(profile, schema):
try:
validate(instance=profile, schema=schema)
print("Profile is valid.")
except jsonschema.exceptions.ValidationError as err:
print("Profile is invalid:", err)
# Example usage
profile = generate_profile_with_function_calling()
validate_profile(json.loads(profile), schema)
d. Validate the Output
Once you have the output, it’s crucial to validate it against the schema to ensure it meets all the specified requirements. Although the API attempts to conform to the schema, it’s always a good idea to add an extra layer of validation.
Handling Errors and Exceptions
When working with Structured Outputs, you might encounter errors if the model’s output does not match the defined schema. Handling these errors gracefully is essential for building robust applications.
def generate_and_validate_profile():
try:
profile = generate_profile_with_function_calling()
validate_profile(json.loads(profile), schema)
return profile
except Exception as e:
return f"An error occurred: {e}"
print(generate_and_validate_profile())
Conclusion
Structured Outputs is a powerful feature that enables developers to enforce specific data formats in model outputs using JSON schemas. Whether through function calling or the response_format parameter, this feature ensures that the outputs generated by your models are predictable, consistent and ready for integration with other systems.
By following the steps outlined in this guide, you can start using Structured Outputs in your own projects, improving the reliability and utility of your AI-powered applications. Whether you’re integrating with APIs, working with databases or building data-driven applications, Structured Outputs can help you maintain the integrity of your data and reduce the need for post-processing.
Start experimenting with Structured Outputs today and see how this feature can streamline your workflow and enhance your application’s capabilities.
About the author
Oladimeji Sowole is a member of the Andela Talent Network, a private marketplace for global tech talent. A Data Scientist and Data Analyst with more than 6 years of professional experience building data visualizations with different tools and predictive models for actionable insights, he has hands-on expertise in implementing technologies such as Python, R, and SQL to develop solutions that drive client satisfaction. A collaborative team player, he has a great passion for solving problems.