What is the new quality parameter?
DALL-E 3 introduced a new API parameter, “quality”. The documentation explains:
quality (standard or hd; defaults to standard)
The quality of the image that will be generated. hd creates images with finer details and greater consistency across the image. This param is only supported for dall-e-3.
Source: DALL-E 3 API reference
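To make the parameter concrete, here is a minimal sketch of the request arguments for two generations that differ only in quality. The helper name build_image_request is hypothetical, and I am assuming the other arguments (model, prompt, size, style, n) as accepted by the OpenAI Images API for dall-e-3; the function only builds a dictionary and does not call the API.

```python
# Hypothetical helper: builds keyword arguments for a DALL-E 3 image request.
# Parameter names are assumed to match the OpenAI Images API; nothing is sent.
VALID_QUALITIES = {"standard", "hd"}


def build_image_request(prompt: str, quality: str = "standard",
                        size: str = "1024x1024", style: str = "natural") -> dict:
    """Return request kwargs; quality defaults to "standard", like the API."""
    if quality not in VALID_QUALITIES:
        raise ValueError(
            f"quality must be one of {sorted(VALID_QUALITIES)}, got {quality!r}")
    return {
        "model": "dall-e-3",  # the quality param is only supported for dall-e-3
        "prompt": prompt,
        "size": size,
        "quality": quality,
        "style": style,
        "n": 1,  # dall-e-3 returns one image per request
    }


standard_req = build_image_request("A photo of a Maine Coon hunting, Sigma 24 mm f/8")
hd_req = build_image_request("A photo of a Maine Coon hunting, Sigma 24 mm f/8", quality="hd")
# The two requests differ only in the "quality" key.
```

With the official openai Python package (v1.x), such a dictionary could be passed as client.images.generate(**hd_req), and the image URL read from response.data[0].url.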
Which one should we choose, and is the difference significant? I decided to put them to a small test.
Before we begin, note that the two options are priced differently, so the choice is not only about quality but also about how much we are willing to pay 🤑 So the real question is: is HD worth double the price (for square images) for your use case? Or are we already in the zone of diminishing returns, where the difference would be hard to notice?
As I write this in November 2023, the pricing page lists the following prices:
| Model | Quality | Resolution | Price |
|---|---|---|---|
| DALL·E 3 | Standard | 1024×1024 | $0.040 / image |
| DALL·E 3 | Standard | 1024×1792, 1792×1024 | $0.080 / image |
| DALL·E 3 | HD | 1024×1024 | $0.080 / image |
| DALL·E 3 | HD | 1024×1792, 1792×1024 | $0.120 / image |
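The price difference is easy to quantify. This sketch hard-codes the November 2023 prices from the table (they will go stale), and the dictionary keys and helper names are my own. Decimal avoids floating-point rounding in the money arithmetic.

```python
from decimal import Decimal

# USD per image, copied from the November 2023 pricing table (may be outdated).
PRICES = {
    ("standard", "1024x1024"): Decimal("0.040"),
    ("standard", "1024x1792"): Decimal("0.080"),
    ("standard", "1792x1024"): Decimal("0.080"),
    ("hd", "1024x1024"): Decimal("0.080"),
    ("hd", "1024x1792"): Decimal("0.120"),
    ("hd", "1792x1024"): Decimal("0.120"),
}


def batch_cost(quality: str, size: str, count: int) -> Decimal:
    """Total cost of generating `count` images with the given settings."""
    return PRICES[(quality, size)] * count


def hd_premium(size: str) -> Decimal:
    """How many times more an HD image costs than a standard one."""
    return PRICES[("hd", size)] / PRICES[("standard", size)]
```

For 100 square images this gives $4 at standard versus $8 at HD, so HD is exactly double there; for the wide formats the premium is 1.5× ($0.080 versus $0.120).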
Standard and HD settings compared
To see how this particular setting affects generated images, I decided to:
- Use the same prompts and API requests, and only modify the quality parameter.
- Use the natural style for the generated images. See my other post if you are interested in how the choice of style between natural and vivid affects the output images.
- Measure the performance of the operation (the total time) as well, expecting that it could be a decision factor besides cost.
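Measuring the total time can be done with a small wall-clock wrapper. This is a sketch, not the exact script behind the numbers below: timed_call and compare_qualities are hypothetical names, and generate stands for any callable that accepts prompt= and quality= (in practice, a wrapper around the real API call with every other parameter fixed). The timing includes network latency, which is what a caller actually experiences.

```python
import time
from typing import Any, Callable, Dict, Tuple


def timed_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, int]:
    """Call fn(*args, **kwargs) and return (result, elapsed wall-clock time in ms)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = round((time.perf_counter() - start) * 1000)
    return result, elapsed_ms


def compare_qualities(generate: Callable[..., Any], prompt: str) -> Dict[str, int]:
    """Run the same prompt once per quality setting and record each total time."""
    timings: Dict[str, int] = {}
    for quality in ("standard", "hd"):
        _, elapsed_ms = timed_call(generate, prompt=prompt, quality=quality)
        timings[quality] = elapsed_ms
    return timings
```

With the openai package (v1.x), generate could be something like lambda prompt, quality: client.images.generate(model="dall-e-3", prompt=prompt, quality=quality, style="natural", size="1024x1024"). One call per setting is a noisy measurement; repeating the calls and averaging would be more robust.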
Prompt 1: Umbrella
Prompt: An image of a couple sharing an umbrella on a quaint park bench amidst falling rain, evening, lantern light, summer.
Response time (standard): 9368 ms
Response time (hd): 15048 ms
Prompt 2: Maine Coon
Prompt: A photo of a Maine Coon hunting, Sigma 24 mm f/8
Response time (standard): 10669 ms
Response time (hd): 14314 ms
Prompt 3: Style mimicking
Prompt: A photo of a woman playing drums in the style of Herman Leonard
Response time (standard): 15699 ms
Response time (hd): 12991 ms
Prompt 4: Portrait
Prompt: A portrait of a woman with long hair, outdoor, in elegant dress, posing in blooming garden, golden hour
Response time (standard): 10608 ms
Response time (hd): 21375 ms
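Four single measurements are too few for firm conclusions, but a few lines of Python summarize them. The numbers below are copied from the four prompts above; the dictionary layout is my own. HD was slower in three of the four prompts (the style-mimicking prompt was the exception), and on average it took 15932 ms versus 11586 ms for standard.

```python
from statistics import mean

# Response times in milliseconds, copied from the four prompts above.
TIMINGS_MS = {
    "Umbrella": {"standard": 9368, "hd": 15048},
    "Maine Coon": {"standard": 10669, "hd": 14314},
    "Style mimicking": {"standard": 15699, "hd": 12991},
    "Portrait": {"standard": 10608, "hd": 21375},
}

# Per-prompt ratio: values above 1 mean HD took longer than standard.
for name, t in TIMINGS_MS.items():
    print(f"{name}: hd/standard = {t['hd'] / t['standard']:.2f}")

avg_standard = mean(t["standard"] for t in TIMINGS_MS.values())
avg_hd = mean(t["hd"] for t in TIMINGS_MS.values())
print(f"average standard: {avg_standard:.0f} ms, average hd: {avg_hd:.0f} ms")
```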
I hope you enjoyed this quick comparison! Please note that the test above was performed on a preview version of the dall-e-3 model. I’m sure this technology will change rapidly, and sooner or later, the examples above will be outdated.
The images vary a lot, and I don’t know how to make them vary less. Ideally, we’d see the same picture “rendered” with different quality settings, but I don’t think that level of control is currently possible.
Based on the examples above, I would choose the “hd” option most of the time if the price were equal, but the difference isn’t striking to me. I use image generation to help explain words in language learning, and “standard” seems good enough for this use case.
I am a bit spoiled from playing with Midjourney, which raised the image quality bar very high (although it is much harder to control with prompts), so I’d still like to see improved results in DALL-E. But the difference between DALL-E 2 and DALL-E 3 is striking anyway.
And how about you, do you see a big difference between “standard” and “hd” yourself? 🙂