Recently, I’ve been generating many images to serve as illustrations in my Spanish vocabulary courses published on Flashcard Space.
Last week I switched from Stable Diffusion XL to Flux.1 dev. It is a significant upgrade in image quality, but I also noticed the illustrations have less variety than SDXL produced, at least when I use the same simple prompts as before. When I worked on a flashcard set teaching the names of professions in Spanish, it felt like most illustrations depicted an average-looking, middle-aged, bearded Caucasian male as the representative of every profession, so I looked for a way to add variance to my image outputs 🙂
Idea: Using AI to refine prompts for the image generation model
People keep discovering interesting prompt-engineering tricks to improve the outputs of every image generation model. And since we already have powerful and cheap general-purpose LLMs like gpt-4o, we can codify the knowledge of what makes a good prompt tailored to Flux.1 and let AI preprocess our basic prompt into a fancier, refined version before we pass it to Flux.
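To make the idea concrete, here is a minimal sketch of such a pipeline, assuming FLUX.1-dev runs locally via the diffusers library and an OpenAI API key is available. The REFINER_GUIDELINES string is my own illustrative guess at what "prompt guidelines" could look like, not the instructions used by the generators linked later in this post.

```python
import torch
from openai import OpenAI
from diffusers import FluxPipeline

# My own example of guidelines for the refiner LLM (not from any real tool).
REFINER_GUIDELINES = (
    "You rewrite short flashcard sentences into detailed prompts for the "
    "Flux.1 image model. Describe a concrete scene: subject, setting, "
    "lighting, composition and style. Vary the age, gender and ethnicity of "
    "people across prompts. Return only the rewritten prompt."
)

def refine_prompt(base_prompt: str) -> str:
    """Ask gpt-4o to expand a basic prompt into a Flux-friendly one."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": REFINER_GUIDELINES},
            {"role": "user", "content": base_prompt},
        ],
    )
    return response.choices[0].message.content.strip()

def generate_image(prompt: str):
    """Generate one image with FLUX.1-dev via diffusers."""
    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
    ).to("cuda")
    result = pipe(
        prompt,
        height=1024,
        width=1024,
        guidance_scale=3.5,
        num_inference_steps=50,
    )
    return result.images[0]

if __name__ == "__main__":
    base = "The road is full of leaves"
    refined = refine_prompt(base)
    print(refined)
    generate_image(refined).save("road_refined.png")
```

In my experiments I used ready-made prompt generators instead of my own guidelines, but the flow is the same: the base prompt goes through the LLM once, and only the refined text reaches Flux.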
My experiments will show images generated using three prompts:
- Base prompt, something like
The road is full of leaves
- AI-preprocessed base prompt created with the prompt generator at https://flux1.ai/prompt-generator – much more verbose and elaborate
- AI-preprocessed base prompt created with https://shop.xerophayze.com/xerogenlite – a different prompt generator where the author put significant work into crafting good LLM prompts
Experiment #1: refining a concrete, non-abstract prompt
Base prompt: The road is full of leaves
Comment: The aesthetics differ, but there isn’t much difference in the image’s composition or idea.
Experiment #2: refining a more abstract prompt
The area where I hope to see the most significant benefit of AI-refining prompts is when they are vague and relate to abstract concepts rather than describing something visual. So as an example, let’s generate an illustration for a flashcard with the following vague sentence:
Base prompt: She is able to solve the problem.
Comment: The results differ in aesthetics, but Flux.1 had the same idea about depicting “solving a problem.” Let me run one more experiment to see whether it can handle even vaguer descriptions, this time with idioms.
Experiment #3: an ultra-abstract prompt with idiomatic expressions
Base prompt: He tried to kill two birds with one stone, but ended up biting off more than he could chew.
Comment: Finally, with a highly vague prompt and two idiomatic expressions, the Flux model gave up and produced useless output. The refined prompts let me create pictures with reasonable aesthetics, but they didn’t get far beyond the literal meaning of the idioms.
Discussion
Based on the above, I don’t have a definite conclusion on whether it’s worth adding the complexity of preprocessing and refining prompts with an LLM such as gpt-4o before passing them to Flux.1.
It seems that Flux.1 has quite a good understanding of natural language, and as long as we avoid idioms and abstract ideas, it can handle some ambiguity in prompts. Of course, the value added by preprocessing the prompt will depend on the guidelines we provide to the LLM. If we are determined to invest time in it, we can surely guide the model to understand idioms, improve the aesthetics, or enforce a concrete image style; a sketch of what such guidelines could look like follows below.
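If I were to go that route, the guidelines might look something like the snippet below. This is entirely my own wording, not taken from either of the prompt generators I tested; it would simply replace the REFINER_GUIDELINES constant in the earlier sketch.

```python
# Hypothetical guidelines asking the LLM to unpack idioms into a literal
# visual scene and to enforce a consistent flashcard illustration style.
IDIOM_AWARE_GUIDELINES = (
    "You rewrite sentences into prompts for the Flux.1 image model. "
    "If the sentence contains an idiom, first restate its figurative meaning, "
    "then describe a single concrete scene that conveys that meaning, never "
    "the literal words of the idiom. Always use the same style: flat vector "
    "illustration, soft pastel colors, clean white background. "
    "Return only the final prompt."
)
```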
If you have some experience with pre-processing prompts before sending them to Flux.1 and would like to share your conclusions with other readers, please leave a comment!