Gemini: JSON mode and Structured Outputs

Table of Contents

What is JSON Mode in Google Gemini?
Example: Gemini response without JSON mode
Example: Gemini 1.0 response in JSON mode (the model does NOT support it)
Example: Gemini 1.5 response in JSON mode
Example: Gemini 1.5 response in JSON mode with specific JSON schema
Learnings: Structured Outputs in Gemini-1.5 vs GPT-4o
Summary

What is JSON Mode in Google Gemini?

I recently blogged about JSON Mode and Structured Outputs in OpenAI APIs. Today, I examine OpenAI's competitor, Google Gemini, and test whether it can adhere to a requested schema.

JSON Mode is a feature that forces the model's output to be JSON rather than natural language. This is super useful when you call the API from software that has to parse the response.

Example: Gemini response without JSON mode

So, as a baseline, let’s see how the API responds when the output format is not specified:

// POST https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:generateContent
// x-goog-api-key: ***

{
    "contents": [
        { "parts": [ { "text": "What are the capitals of France, Spain, Poland?" } ] }
    ]
}

The response is something like:

Here are the capitals of the countries you listed:

* **France:** Paris
* **Spain:** Madrid
* **Poland:** Warsaw
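
For readers who prefer code to raw HTTP, here is a minimal Python sketch of the same request using only the standard library. The function names `build_request` and `send` and the `json_mode` parameter are my own illustration, not part of any Google SDK; the URL, header, and body fields are the ones from the requests in this post.

```python
import json
import urllib.request

API_ROOT = "https://generativelanguage.googleapis.com/v1beta/models"


def build_request(model: str, prompt: str, api_key: str,
                  json_mode: bool = False) -> urllib.request.Request:
    """Build (but do not send) a generateContent request.

    Hypothetical helper: with json_mode=True it adds the
    response_mime_type flag used later in this post.
    """
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    if json_mode:
        body["generationConfig"] = {"response_mime_type": "application/json"}
    return urllib.request.Request(
        url=f"{API_ROOT}/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response envelope."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `send(build_request("gemini-1.5-flash-latest", "What are the capitals of France, Spain, Poland?", api_key))` with a real key should reproduce the baseline call above.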

Example: Gemini 1.0 response in JSON mode (the model does NOT support it)

Before I show an example of a request that works, let me share one that I expected to work but didn't:

// POST https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent
// x-goog-api-key: ***

{
	"generationConfig": {
		"response_mime_type": "application/json"
	},
	"contents": [
		{ "parts": [ { "text": "What are the capitals of France, Spain, Poland?" } ] }
	]
}

The response here is:

{
	"error": {
		"code": 400,
		"message": "Json mode is not enabled for models/gemini-pro",
		"status": "INVALID_ARGUMENT"
	}
}

The lesson here is that the gemini-pro model ID points to the older Gemini 1.0 Pro, not to Gemini 1.5 Pro, where this feature is enabled! So, you most likely want to use the gemini-1.5-pro-latest model ID instead.
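
If you want your client to recognize this failure, a small check on the error payload is enough. This is only a sketch: `json_mode_unsupported` is a hypothetical helper, and it matches on the message wording from the single error response shown above, which may differ for other models or API versions.

```python
import json


def json_mode_unsupported(error_body: str) -> bool:
    """Return True if the payload looks like the 'Json mode is not enabled' error.

    Assumption: the error has the shape shown in this post
    ({"error": {"code", "message", "status"}}).
    """
    try:
        error = json.loads(error_body)["error"]
        message = str(error.get("message", "")).lower()
        return (error.get("status") == "INVALID_ARGUMENT"
                and "json mode is not enabled" in message)
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
        # Not JSON, or not the expected error shape.
        return False
```

Matching on message text is brittle, so treat this as a convenience for clearer logging rather than a contract.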

Example: Gemini 1.5 response in JSON mode

When we use a model that supports JSON mode (like gemini-1.5-flash or gemini-1.5-pro), the response indeed comes nicely formatted as JSON:

// POST https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro-latest:generateContent
// x-goog-api-key: ***

{
	"generationConfig": {
		"response_mime_type": "application/json"
	},
	"contents": [
		{ "parts": [ { "text": "What are the capitals of France, Spain, Poland?" } ] }
	]
}

The response now was:

{
  "France": "Paris",
  "Spain": "Madrid",
  "Poland": "Warsaw"
}

This JSON mode flag is slightly more refined than OpenAI’s non-strict JSON mode. OpenAI’s JSON mode requires you to ask for JSON output explicitly in the prompt, and OpenAI has documented that the model can get stuck generating whitespace if you are not explicit enough.
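
One practical detail: the JSON shown above is the text the model produced, and in the raw API response it arrives as a string inside the response envelope. The sketch below assumes the usual `candidates[0].content.parts[0].text` layout of a generateContent response, which this post does not show; `extract_json` is a hypothetical helper name.

```python
import json


def extract_json(response: dict):
    """Pull the model's text out of the response envelope and parse it as JSON.

    Assumption: the envelope looks like
    {"candidates": [{"content": {"parts": [{"text": "..."}]}}]}.
    """
    text = response["candidates"][0]["content"]["parts"][0]["text"]
    return json.loads(text)


# Example envelope built by hand from the response shown above.
sample = {
    "candidates": [
        {"content": {"parts": [
            {"text": '{"France": "Paris", "Spain": "Madrid", "Poland": "Warsaw"}'}
        ]}}
    ]
}
capitals = extract_json(sample)
```

Because JSON mode makes the text parseable, `json.loads` can be called directly, without stripping Markdown fences or surrounding prose first.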

Example: Gemini 1.5 response in JSON mode with specific JSON schema

The cherry on top is the ability to request an answer that adheres to a given JSON schema. Here, I specifically ask for an object whose “capitals” array contains objects with two properties: “Country” and “Capital”. This is how you do it in an HTTP request:

// POST https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro-latest:generateContent
// x-goog-api-key: ***

{
  "contents": [
    { "parts": [{ "text": "What are the capitals of France, Spain, Poland?" }] }
  ],
  "generationConfig": {
    "response_mime_type": "application/json",
    "response_schema": {
      "type": "object",
      "properties": {
        "capitals": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "Country": { "type": "string" },
              "Capital": { "type": "string", "description": "Country's capital name, in ALL CAPS. A local name, not international one." }
            }
          }
        }
      }
    }
  }
}

This request produces the desired response:

{
  "capitals": [
    { "Country": "France", "Capital": "PARIS" },
    { "Country": "Spain", "Capital": "MADRID" },
    { "Country": "Poland", "Capital": "WARSZAWA" }
  ]
}

It’s interesting to note that the AI model takes the “description” property from the schema as a hint on how to fill in the value, which is super useful and allows us to define the details of the contract.
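
Even with a schema in the request, it can be reassuring to check the parsed response on the client side. Below is a minimal sketch of such a check for the subset of schema features used in this post (object, array, string). It is not a full JSON Schema validator, and `conforms` is a hypothetical helper. Since the schema above declares no required properties, missing keys pass; undeclared keys do not.

```python
SCHEMA = {
    "type": "object",
    "properties": {
        "capitals": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "Country": {"type": "string"},
                    "Capital": {"type": "string"},
                },
            },
        }
    },
}


def conforms(value, schema: dict) -> bool:
    """Check a parsed value against a tiny subset of JSON Schema."""
    kind = schema.get("type")
    if kind == "string":
        return isinstance(value, str)
    if kind == "array":
        items = schema.get("items", {})
        return isinstance(value, list) and all(conforms(v, items) for v in value)
    if kind == "object":
        if not isinstance(value, dict):
            return False
        props = schema.get("properties", {})
        # Every key present must be declared and its value must match.
        return all(k in props and conforms(v, props[k]) for k, v in value.items())
    return True  # types outside this sketch are accepted unchecked


response = {
    "capitals": [
        {"Country": "France", "Capital": "PARIS"},
        {"Country": "Spain", "Capital": "MADRID"},
        {"Country": "Poland", "Capital": "WARSZAWA"},
    ]
}
is_valid = conforms(response, SCHEMA)
```

Note that this only checks structure. The hints in “description” (ALL CAPS, local name) are instructions to the model, and nothing here verifies them.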

Learnings: Structured Outputs in Gemini-1.5 vs GPT-4o

It seems that, on a high level, we now have feature parity between OpenAI’s newest GPT-4o models and Gemini-1.5 regarding structured outputs. Both can respond by adhering to the provided JSON schema.

The APIs are a bit picky about the provided schema. It might take some tweaking to get the same schema working everywhere:

OpenAI requires additionalProperties: false to be set in the schema and all properties to be marked as required.
Gemini has fewer restrictions, but it complains when it doesn’t recognize some properties in the schema.
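
The two OpenAI requirements named above can be applied mechanically to a schema written for Gemini. Here is a minimal sketch, assuming those two rules are the only changes needed (I have not checked every other difference between the two APIs); `to_strict_schema` is a hypothetical helper.

```python
import copy

GEMINI_SCHEMA = {
    "type": "object",
    "properties": {
        "capitals": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "Country": {"type": "string"},
                    "Capital": {"type": "string"},
                },
            },
        }
    },
}


def to_strict_schema(schema: dict) -> dict:
    """Return a copy with additionalProperties: false and every property
    marked as required on each object, recursively."""
    strict = copy.deepcopy(schema)
    _tighten(strict)
    return strict


def _tighten(node: dict) -> None:
    if node.get("type") == "object":
        props = node.get("properties", {})
        node["additionalProperties"] = False
        node["required"] = list(props)
        for child in props.values():
            _tighten(child)
    elif node.get("type") == "array" and isinstance(node.get("items"), dict):
        _tighten(node["items"])


openai_schema = to_strict_schema(GEMINI_SCHEMA)
```

Going the other way, you would strip the keys Gemini rejects, so it is easiest to keep the Gemini-friendly schema as the source of truth and derive the stricter one from it.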

Both APIs take hints from descriptive property names and from separate description fields.

Summary

It’s great news that we can cause two independently developed AI models to generate output in precisely the same structured format. This not only makes integration with those APIs much easier but also allows us to consult two or more independently trained AI models and see if their responses match or diverge, increasing (or decreasing) confidence in the response.
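
As a rough illustration of that cross-check, here is a minimal sketch that compares two parsed responses after light normalization. The second response is made up for the example, and `normalize` and `responses_agree` are hypothetical helpers. The comparison ignores case and surrounding whitespace in string values but is sensitive to list order and to property names.

```python
def normalize(value):
    """Lower-case and trim string values so cosmetic differences are ignored."""
    if isinstance(value, dict):
        return {key: normalize(item) for key, item in value.items()}
    if isinstance(value, list):
        return [normalize(item) for item in value]
    if isinstance(value, str):
        return value.strip().casefold()
    return value


def responses_agree(first, second) -> bool:
    """True if two parsed model responses match after normalization."""
    return normalize(first) == normalize(second)


gemini = {"capitals": [{"Country": "Poland", "Capital": "WARSZAWA"}]}
other_model = {"capitals": [{"Country": "Poland", "Capital": "Warszawa "}]}  # made-up response
agree = responses_agree(gemini, other_model)
```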
