What is the new style parameter?
OpenAI released a preview of a DALL-E 3 API this week, and I’m excited to play with it today.
We can now control a new parameter named `style`. The documentation explains:
`style` (defaults to `vivid`): The style of the generated images. Must be one of `vivid` or `natural`:
- `vivid` causes the model to lean towards generating hyper-real and dramatic images.
- `natural` causes the model to produce more natural, less hyper-real looking images.

This param is only supported for `dall-e-3`.

Source: DALL-E 3 API reference
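To make the parameter concrete, here is a minimal sketch, using only Python’s standard library, of a request to the images endpoint with `style` set explicitly. The JSON field names mirror the parameters quoted above. The helper name `build_image_request` is hypothetical, and the sketch assumes the API key is read from an `OPENAI_API_KEY` environment variable and sent as a Bearer token. It only builds the request; nothing is sent until you pass it to `urllib.request.urlopen`.

```python
import json
import os
import urllib.request
from typing import Optional

API_URL = "https://api.openai.com/v1/images/generations"
ALLOWED_STYLES = {"vivid", "natural"}  # per the API reference quoted above


def build_image_request(prompt: str, style: str = "vivid",
                        api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request for one DALL-E 3 image.

    Hypothetical helper: `style` defaults to "vivid" to match the API's
    documented default.
    """
    if style not in ALLOWED_STYLES:
        raise ValueError(f"style must be one of {sorted(ALLOWED_STYLES)}, got {style!r}")
    # Assumption: the key lives in the OPENAI_API_KEY environment variable.
    key = api_key if api_key is not None else os.environ.get("OPENAI_API_KEY", "")
    payload = {
        "model": "dall-e-3",  # style is only supported for dall-e-3
        "prompt": prompt,
        "size": "1024x1024",
        "quality": "standard",
        "style": style,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_image_request("A portrait of a school bus driver", style="natural")
    # Actually sending it requires a valid key and network access:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
    print(req.get_method(), req.full_url)
```

Building the `Request` separately from sending it keeps the payload easy to inspect, and only the `style` value has to change between the `natural` and `vivid` columns of the comparison table below.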
The cookbook has some examples of this parameter’s impact, but a few images of a coffee set weren’t enough for me to get a sense of how much the two styles differ.
My current interest is in generating photorealistic images (as opposed to symbolic images or art), so I asked DALL-E to generate a few photography-like images.
Example images
The following table contains example images generated using DALL-E 3 in different ways:
- ChatGPT’s style: the image was generated using ChatGPT’s new “DALL-E” GPT, available as part of a ChatGPT Plus subscription. It’s only available via the web UI, and we have no direct control over low-level API parameters like `style`, `quality`, or `size`. We can only influence the result with our prompt.
- natural style: the image was generated using the https://api.openai.com/v1/images/generations API with `quality=standard`, `size=1024x1024`, `model=dall-e-3` and `style=natural`.
- vivid style: the image was generated using the https://api.openai.com/v1/images/generations API with `quality=standard`, `size=1024x1024`, `model=dall-e-3` and `style=vivid`.
Prompt | ChatGPT’s style | natural style | vivid style |
---|---|---|---|
A portrait of a school bus driver | | | |
Generate a photo of a surgeon performing brain operation | | | |
Generate a photorealistic image of a rock star performing on a stage of a large festival in the evening | | | |
Dancing people in the evening, sunset, city, flash photography, ƒ/3.5 | | | |
A portrait of a dog in a library, Sigma 85mm f/1.4 (*) | | | |
A bitten-into apple hanging on branch of an apple tree, Sigma 85mm f/1.4 | | | |
An image of a couple sharing an umbrella on a quaint park bench amidst falling rain. | | | |

(*) Acknowledgement: this and the following prompts come from an article by Merzmensch, which helped me steer DALL-E toward more photorealistic results. Thanks!
Lessons learned
Each image is unique, and comparing them is subjective by nature. My view is that:
- ChatGPT seems to produce images close to the `vivid` style. In my opinion, at the moment it tends to produce more interesting images than the API does with the `vivid` setting. There are some threads in the forums, like this one, showing examples of the API generating arguably lower-quality images than ChatGPT. This might be temporary, as we’re dealing with a preview product, and it can probably be explained by the prompt being pre-processed and rewritten differently in the two cases.
- The `natural` style produces images that I’d describe as more “photorealistic”: they look like something you could capture with a camera rather than something rendered by a computer game. They might look a bit blander, but I like them, and I’ll probably use this setting a lot.
- The default `vivid` style leads to cartoony, dramatic images. Sometimes they look fantastic, and sometimes they look artificial, with too much saturation and drama. I like the last one, with the umbrella, which could serve perfectly as a book illustration. But some of them are overdone for my taste. They scream “I’m generated by AI” much louder than the `natural` ones 😉
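One way to check the prompt-rewriting explanation from the first point is to look at what the API returns. For `dall-e-3`, each entry in the response’s `data` list carries a `revised_prompt` field next to the image `url` (assuming the default `url` response format). The minimal sketch below pulls both out of a parsed JSON response so the rewritten prompt can be compared with the one you sent. The function name `summarize_response` is hypothetical, and the sample response in it is a placeholder, not real API output.

```python
import json
from typing import Any, Dict, List


def summarize_response(response: Dict[str, Any], sent_prompt: str) -> List[Dict[str, Any]]:
    """Pair each generated image URL with the prompt DALL-E 3 actually used.

    Hypothetical helper; assumes the response JSON has the shape
    {"created": ..., "data": [{"url": ..., "revised_prompt": ...}, ...]}.
    """
    results = []
    for item in response.get("data", []):
        revised = item.get("revised_prompt")
        results.append({
            "url": item.get("url"),
            "revised_prompt": revised,
            # True only when a revised prompt is present and differs from ours.
            "was_rewritten": revised is not None and revised != sent_prompt,
        })
    return results


if __name__ == "__main__":
    # Placeholder response for illustration only, not real API output.
    sample = json.loads("""
    {"created": 0,
     "data": [{"url": "https://example.com/placeholder.png",
               "revised_prompt": "A placeholder rewritten prompt."}]}
    """)
    for entry in summarize_response(sample, "A portrait of a school bus driver"):
        print(entry["was_rewritten"], entry["revised_prompt"])
```

Logging the revised prompt for each `natural`/`vivid` pair would show whether differences between images come from the style setting itself or from how the prompt was rewritten.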
Hope you enjoyed this short comparison!