What is JSON Mode in Google Gemini?
I recently blogged about JSON Mode and Structured Outputs in OpenAI APIs. Today, I look at OpenAI’s competitor, Google Gemini, and test whether it can adhere to a requested schema.
JSON mode is a feature that forces the model’s output into JSON format rather than natural language. This is very useful when you call the API from software that has to parse the response.
Example: Gemini response without JSON mode
So, as a baseline, let’s see how the API responds when the output format is not specified:
// POST https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:generateContent
// x-goog-api-key: ***
{
  "contents": [
    { "parts": [ { "text": "What are the capitals of France, Spain, Poland?" } ] }
  ]
}
The response is something like:
Here are the capitals of the countries you listed:
* **France:** Paris
* **Spain:** Madrid
* **Poland:** Warsaw
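For readers who prefer code to raw HTTP, here is a minimal Python sketch of the same request using only the standard library. The endpoint, header, and body come from the example above; the helper names `build_request` and `generate`, and the `YOUR_API_KEY` placeholder, are my own assumptions for illustration.

```python
import json
import urllib.request

API_ROOT = "https://generativelanguage.googleapis.com/v1beta/models"


def build_request(prompt, model="gemini-1.5-flash-latest",
                  api_key="YOUR_API_KEY", generation_config=None):
    """Build (but do not send) a generateContent request."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    if generation_config:
        # Optional settings such as {"response_mime_type": "application/json"}
        body["generationConfig"] = generation_config
    return urllib.request.Request(
        url=f"{API_ROOT}/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def generate(request):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `generate(build_request("What are the capitals of France, Spain, Poland?", api_key=...))` with a real key should perform the baseline call; passing `generation_config` covers the JSON-mode variants.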
Example: Gemini 1.0 response in JSON mode (the model does NOT support it)
Before I show a request that works, let me share one that I expected to work but didn’t:
// POST https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent
// x-goog-api-key: ***
{
  "generationConfig": {
    "response_mime_type": "application/json"
  },
  "contents": [
    { "parts": [ { "text": "What are the capitals of France, Spain, Poland?" } ] }
  ]
}
The response here is:
{
  "error": {
    "code": 400,
    "message": "Json mode is not enabled for models/gemini-pro",
    "status": "INVALID_ARGUMENT"
  }
}
The lesson here is that the gemini-pro model ID points to the older Gemini 1.0 Pro, not to Gemini 1.5 Pro, where this feature is enabled. So you most likely want to use the gemini-1.5-pro-latest model ID instead.
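If your code lets callers choose the model, it may help to recognize this specific failure and report it clearly. The sketch below is one way to do that; the function name `json_mode_unsupported` is hypothetical, and the check relies on the message wording from the error above, which is an assumption that could break if Google changes the text.

```python
import json


def json_mode_unsupported(response_body):
    """Return True if an error body looks like the 'Json mode is not enabled' rejection.

    Assumption: the wording matches the error shown above; it is not a documented contract.
    """
    try:
        error = json.loads(response_body)["error"]
        return (error["status"] == "INVALID_ARGUMENT"
                and "json mode is not enabled" in error["message"].lower())
    except (ValueError, KeyError, TypeError, AttributeError):
        # Not JSON, or not shaped like an error response.
        return False
```

A caller could use this to retry with a 1.5 model ID, or simply to surface a friendlier message than a bare HTTP 400.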
Example: Gemini 1.5 response in JSON mode
When we use a model that supports JSON mode (like gemini-1.5-flash or gemini-1.5-pro), the response indeed comes back nicely formatted as JSON:
// POST https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro-latest:generateContent
// x-goog-api-key: ***
{
  "generationConfig": {
    "response_mime_type": "application/json"
  },
  "contents": [
    { "parts": [ { "text": "What are the capitals of France, Spain, Poland?" } ] }
  ]
}
The response now was:
{
  "France": "Paris",
  "Spain": "Madrid",
  "Poland": "Warsaw"
}
This JSON mode flag is slightly more refined than OpenAI’s non-strict JSON mode. OpenAI’s JSON mode requires you to ask for JSON output in the prompt as well, and OpenAI documents that the model can get stuck generating whitespace if the prompt is not explicit enough.
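One practical detail: the JSON shown above is the text the model generated. In the raw generateContent response, as far as I know, that text sits inside an envelope under `candidates[0].content.parts[0].text`, so you still need to pull it out and parse it. A minimal sketch, with `extract_json` being a hypothetical helper name:

```python
import json


def extract_json(api_response):
    """Parse the model's JSON text out of a generateContent response envelope.

    Assumption: the first candidate holds the answer and its parts carry "text".
    """
    parts = api_response["candidates"][0]["content"]["parts"]
    text = "".join(part.get("text", "") for part in parts)
    return json.loads(text)
```

Even in JSON mode it seems wise to let `json.loads` raise on malformed text rather than assume the output is always valid, for example when the answer is cut off by a token limit.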
Example: Gemini 1.5 response in JSON mode with specific JSON schema
The cherry on top is the ability to request an answer that adheres to a given JSON schema. Here, I specifically ask for an array of objects with two properties: “Country” and “Capital”. This is how you do it in an HTTP request:
// POST https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro-latest:generateContent
// x-goog-api-key: ***
{
  "contents": [
    { "parts": [{ "text": "What are the capitals of France, Spain, Poland?" }] }
  ],
  "generationConfig": {
    "response_mime_type": "application/json",
    "response_schema": {
      "type": "object",
      "properties": {
        "capitals": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "Country": { "type": "string" },
              "Capital": { "type": "string", "description": "Country's capital name, in ALL CAPS. A local name, not international one." }
            }
          }
        }
      }
    }
  }
}
This request produces the desired response:
{
  "capitals": [
    { "Country": "France", "Capital": "PARIS" },
    { "Country": "Spain", "Capital": "MADRID" },
    { "Country": "Poland", "Capital": "WARSZAWA" }
  ]
}
It’s interesting to note that the AI model takes the “description” property from the schema as a hint on how to fill in the value, which is super useful and allows us to define the details of the contract.
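Since the description is only a hint and the schema above does not mark any property as required, it seems prudent to check the parsed response on the client side too. Below is a small hand-rolled check for this particular shape; `check_capitals` is a hypothetical helper, not part of any Gemini SDK, and the ALL CAPS rule mirrors the description in the schema.

```python
def check_capitals(payload):
    """Return a list of problems found in a parsed response (empty list = looks fine)."""
    rows = payload.get("capitals") if isinstance(payload, dict) else None
    if not isinstance(rows, list):
        return ["missing 'capitals' array"]
    problems = []
    for index, row in enumerate(rows):
        for key in ("Country", "Capital"):
            if not isinstance(row, dict) or not isinstance(row.get(key), str):
                problems.append(f"item {index}: '{key}' is missing or not a string")
        capital = row.get("Capital") if isinstance(row, dict) else None
        # The schema description asks for ALL CAPS, so verify the hint was followed.
        if isinstance(capital, str) and capital != capital.upper():
            problems.append(f"item {index}: capital {capital!r} is not in ALL CAPS")
    return problems
```

For the response shown above this returns an empty list; a response with a missing or mixed-case capital would be flagged instead of silently passing through.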
Learnings: Structured Outputs in Gemini-1.5 vs GPT-4o
It seems that, at a high level, we now have feature parity between OpenAI’s newest GPT-4o models and Gemini-1.5 regarding structured outputs. Both can respond in a way that adheres to the provided JSON schema.
Both APIs are a bit picky about the schema you provide, so it may take some tweaking to get the same schema working everywhere:
- OpenAI requires additionalProperties: false to be set in the schema and all properties to be marked as required.
- Gemini has fewer restrictions, but it complains about schema properties it doesn’t recognize.
Both APIs pick up hints from descriptive property names and from separate description fields.
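To keep one schema as the source of truth, you could write it in the looser Gemini style and derive the stricter OpenAI variant from it. The sketch below applies the two OpenAI requirements mentioned above to every object in a schema. `to_openai_strict` is a hypothetical helper, and it only handles plain nested objects and arrays, which is all the example schema needs.

```python
import copy


def to_openai_strict(schema):
    """Return a copy of the schema with OpenAI-style strictness added to every object."""
    result = copy.deepcopy(schema)  # leave the original (Gemini) schema untouched
    _tighten(result)
    return result


def _tighten(node):
    if not isinstance(node, dict):
        return
    if node.get("type") == "object":
        properties = node.get("properties", {})
        node["additionalProperties"] = False
        node["required"] = list(properties)  # mark every property as required
        for child in properties.values():
            _tighten(child)
    elif node.get("type") == "array":
        _tighten(node.get("items"))
```

Going in this direction avoids the Gemini complaint about unrecognized properties, because the extra flags never reach the Gemini request.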
Summary
It’s great news that we can cause two independently developed AI models to generate output in precisely the same structured format. This not only makes integration with those APIs much easier but also allows us to consult two or more independently trained AI models and see if their responses match or diverge, increasing (or decreasing) confidence in the response.
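As a rough illustration of that cross-checking idea, here is a sketch that compares two parsed responses in the "capitals" format used in this post and reports where they disagree. The function names are hypothetical, and the case-insensitive comparison is my own simplification.

```python
def index_by_country(payload):
    """Map normalized country name -> normalized capital for one parsed response."""
    return {
        row["Country"].strip().casefold(): row["Capital"].strip().casefold()
        for row in payload["capitals"]
    }


def diverging(first, second):
    """Return {country: (capital_in_first, capital_in_second)} for every disagreement."""
    a, b = index_by_country(first), index_by_country(second)
    return {
        country: (a.get(country), b.get(country))
        for country in sorted(a.keys() | b.keys())
        if a.get(country) != b.get(country)
    }
```

An empty result means the two models agree on every country; any entry is a candidate for a closer look, for instance a local versus international spelling of the same city.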