DALL-E 3: `vivid` vs `natural` styles compared

What is the new style parameter?

OpenAI released a preview of the DALL-E 3 API this week, and I’m excited to play with it today.

We can now control a new parameter named style. The documentation explains:

style (defaults to vivid):

The style of the generated images. Must be one of vivid or natural:
vivid causes the model to lean towards generating hyper-real and dramatic images.
natural causes the model to produce more natural, less hyper-real looking images.

This param is only supported for dall-e-3.

Source: DALL-E 3 API reference

Some examples of this parameter’s impact are in the cookbook, but I couldn’t get a sense of how much they differ based on a few images of a coffee set.

My current interest is in generating photorealistic images (as opposed to symbolic images or art), so I asked DALL-E to generate a few photography-like images.

Example images

The following table contains example images generated using DALL-E 3 in different ways:

  1. Image generated using ChatGPT’s new “DALL-E” GPT. It is available as part of the ChatGPT Plus subscription. It’s only available via the web UI, and we have no direct control over low-level API parameters like style, quality, and size. We can only influence the result with our prompt.
  2. Image generated using the https://api.openai.com/v1/images/generations API with quality=standard, size=1024x1024, model=dall-e-3 and style=natural
  3. Image generated using the https://api.openai.com/v1/images/generations API with quality=standard, size=1024x1024, model=dall-e-3 and style=vivid
Prompt | ChatGPT’s style | natural style | vivid style
A portrait of a school bus driver
Generate a photo of a surgeon performing brain operation
Generate a photorealistic image of a rock star performing on a stage of a large festival in the evening
Dancing people in the evening, sunset, city, flash photography, ƒ/3.5
A portrait of a dog in a library, Sigma 85mm f/1.4 *
A bitten-into apple hanging on branch of an apple tree, Sigma 85mm f/1.4
An image of a couple sharing an umbrella on a quaint park bench amidst falling rain.

* Acknowledgement: this and the following prompts come from an article by Merzmensch, which helped me direct DALL-E to generating more photo-realistic results. Thanks!
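
For anyone who wants to reproduce the two API columns, here is a minimal sketch of how such requests could be made with the Python standard library only. It uses the endpoint and the parameter values listed above (quality=standard, size=1024x1024, model=dall-e-3). The helper names `build_payload` and `generate`, and reading the key from the `OPENAI_API_KEY` environment variable, are my own assumptions for illustration, not something taken from the OpenAI docs.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/images/generations"
STYLES = ("natural", "vivid")


def build_payload(prompt: str, style: str) -> dict:
    """Return the JSON body for one DALL-E 3 request (hypothetical helper)."""
    if style not in STYLES:
        raise ValueError(f"style must be one of {STYLES}, got {style!r}")
    return {
        "model": "dall-e-3",
        "prompt": prompt,
        "quality": "standard",
        "size": "1024x1024",
        "style": style,
        "n": 1,
    }


def generate(prompt: str, style: str, api_key: str) -> dict:
    """POST the request and return the parsed JSON response (needs network access)."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt, style)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Assumption: the API key is stored in the OPENAI_API_KEY environment variable.
    key = os.environ["OPENAI_API_KEY"]
    for style in STYLES:
        result = generate("A portrait of a school bus driver", style, key)
        # With the default response format, each item should carry an image URL.
        print(style, result["data"][0]["url"])
```

Running the same prompt through both styles back to back makes the side-by-side comparison easier, since everything except `style` stays fixed.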

Lessons learned

Each image is unique, and comparing them is subjective by nature. My view is that:

  • ChatGPT seems to produce images close to ‘vivid’ in style. In my opinion, at this moment, it tends to make more interesting images than those returned by the API with the vivid setting.

    There are some threads in the forums, like this one, showing examples of the API generating arguably lower-quality images than ChatGPT. This might be temporary, as we’re dealing with a preview product, and it can probably be explained by differences in how the prompt is pre-processed and rewritten in each case.
  • The natural style produces images that I’d describe as more “photorealistic” – looking like something possible to capture with a camera rather than created by a computer game. They might look a bit more bland, but I like them, and I’ll probably use this setting a lot.
  • The default “vivid” style leads to cartoony, dramatic images. Sometimes they look fantastic, and sometimes they look artificial, with too much saturation and drama. I like the last one with the umbrella, which could perfectly serve as an illustration for a book. But some of them are overdone for my taste. They scream, “I’m generated by AI,” much louder than the natural ones 😉
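
On the prompt rewriting mentioned in the first point: as far as I can tell, DALL-E 3 API responses include a `revised_prompt` field next to each image, which makes it possible to see what the model was actually asked to draw. Below is a small sketch of pulling it out of a parsed response. The `revised_prompts` helper and the sample response are made up for illustration, so treat the exact response shape as an assumption and check it against a real response.

```python
import json


def revised_prompts(response: dict) -> list:
    """Collect the rewritten prompt for each image, or None when it is absent."""
    return [item.get("revised_prompt") for item in response.get("data", [])]


# Made-up sample that mimics the response shape; not real API output.
sample = json.loads(
    '{"created": 0, "data": [{"url": "https://example.invalid/image.png",'
    ' "revised_prompt": "placeholder for the rewritten prompt"}]}'
)

for text in revised_prompts(sample):
    print(text)
```

Comparing the revised prompt with the original one is a cheap way to check whether a surprising image comes from the style setting or from the rewrite.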

Hope you enjoyed this short comparison!
