Just when we all got cozy with Midjourney and DALL-E 3, thinking they were the gold standard, OpenAI went ahead and dropped GPT-4o. No big promo campaign, no mysterious teaser. Just a casual announcement that, oh by the way, their new model happens to be ridiculously good at creating images.
At first glance, you might think, “Alright, it’s probably just DALL-E 3 with a new coat of paint.” But no, this isn’t just an update. It’s a full-blown glow-up. Imagine DALL-E 3 going through a Rocky-style training montage, learning from its past mistakes, and coming back shredded.
So I did what any curious, slightly obsessive nerd would do: I put them to the test. Side by side. Prompt for prompt. From photorealism to pixel art to abstract concepts, and even that cursed “room without an elephant” challenge, I threw everything at them.
Here’s how GPT-4o stacks up against its older sibling. Spoiler alert: things get a little one-sided.
What’s DALL-E 3?
If you’ve been anywhere near ChatGPT over the past few years, you’ve probably heard of DALL-E 3.
It’s (or was, but I’m getting ahead of myself) OpenAI’s flagship text-to-image generation model, one optimized for understanding context. Developed as a major step up from its predecessors, DALL-E 3 marked a leap in how artificial intelligence can transform textual descriptions into stunning, nuanced visuals.

What made DALL-E 3 genuinely impressive was its unprecedented level of prompt understanding and image-generation accuracy. Unlike earlier models, which often produced somewhat abstract or imperfect images, it could translate complex, multi-layered descriptions into precise visuals.
But hey, don’t take my word for it. Instead, take my word for it from back when I reviewed the model right after it came out.
What’s GPT-4o Image Generation?
When I first heard the news, my first question was: what makes OpenAI’s new image model different from DALL-E?
At a surface level, not much. The way you access and use the new model is the same as it always was: through ChatGPT or through the API. The most significant change (and trust me, it’s significant) is capability.
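For the API-curious, here’s a minimal sketch of what a call looks like with OpenAI’s official Python SDK. The DALL-E 3 request uses the documented Images API; the model identifier for the GPT-4o-based generator is my assumption here, so check OpenAI’s docs for whatever your account actually exposes.

```python
# Minimal sketch using the official OpenAI Python SDK (pip install openai).
# Assumes OPENAI_API_KEY is set in your environment.
from openai import OpenAI

client = OpenAI()
prompt = "A pixel art illustration of the Taj Mahal."

# DALL-E 3 via the long-standing Images API (returns a URL by default).
dalle = client.images.generate(
    model="dall-e-3",
    prompt=prompt,
    size="1024x1024",
    n=1,
)
print(dalle.data[0].url)

# GPT-4o-based image generation: the model name below is an assumption,
# not a confirmed identifier; adjust it to whatever the docs list for you.
newer = client.images.generate(
    model="gpt-image-1",  # assumed identifier for the newer image model
    prompt=prompt,
    size="1024x1024",
)
```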

The biggest limitations of AI image generators today are context handling and text generation. It doesn’t matter if it’s DALL-E 3, Midjourney, Firefly, or Meta: they often fail when given a long prompt or a request that needs a lot of text.
OpenAI’s GPT-4o image generator is the change we needed. I mean, just look at this:


Original Prompt:
That isn’t just acceptable, that’s perfect.
This is why I’m excited to try this one out, but a simple test wouldn’t cut it. Instead, I wanted to test it against its predecessor: DALL-E 3.
GPT-4o Picture Era vs. DALL-E 3
Photorealism
Prompt: A 1:1 image taken with a phone of a young man reaching the summit of a mountain at sunrise. The field of view shows other hikers in the background taking a photo of the view.


DALL-E 3 is still stuck in that uncomfortable “uncanny valley” where people look like they’ve been stretched. Background figures scale about as naturally as a fun-house mirror.
But GPT-4o? This is different. These images look like they were snapped on a smartphone, so convincing that you’d swear a human photographer was behind the lens. It’s not just good. It’s “did I accidentally download a stock photo?” good.
Pixel Art
Prompt: A pixel art illustration of the Taj Mahal.


DALL-E 3 tries hard, really hard. It generates flashy pixel art images that look impressive at first glance. Zoom in, though, and the magic falls apart. Pixels blend like watercolors instead of staying distinct.
As for GPT-4o, it’s the pixel art purist’s dream. Clean, crisp, every pixel exactly where it should be.
Architecture & Interior Design
Prompt: Create an image of the interior design of a Bauhaus-inspired home.


DALL-E 3 apparently missed the memo on Bauhaus completely. Throw a Bauhaus prompt at it, and you’ll get something that looks like it was designed by a bat who once saw a Bauhaus poster from really far away.
GPT-4o nails it. Colors pop, every line is intentional, and every shade is calculated. This is Pinterest-ready.
Mimicking Art Styles
Prompt: Create an image of a sunrise as seen from a beachfront villa, in the style of Van Gogh.


After seeing y’all make “Studio Ghibli”-style images of yourselves, I’ll admit I was tempted to do the same for this round, but I opted to go a different (yet familiar) route: Van Gogh.
DALL-E 3’s Van Gogh? Sure, there are swirls. Sure, there’s some blue. But this isn’t Van Gogh; this is Van Gogh’s distant, less talented cousin. Meanwhile, GPT-4o recreates the brush strokes so convincingly you can almost feel the texture of the canvas.
Abstract Concepts


Both models handle abstract concepts surprisingly well. But DALL-E 3 still can’t shake that telltale “AI smoothness,” the digital polish that screams “computer-generated.” It’s like looking at a perfectly waxed floor: impressive, but something’s just… off.
Text Generation
Prompt: Create an image of a mileage sign taken with a phone. The content of the sign must be as follows:
Line 1: “Manila” “10.1KM”
Line 2: “Antipolo” “20.4KM”
Line 3: “Batangas” “34.5KM”
Line 4: “Quezon” “49.44KM”
Line 5: “Naga” “142.4KM”


GPT-4o has perfected AI text generation in images. It’s not just DALL-E 3; Midjourney, Firefly, and Grok all have to play catch-up to be this good. There isn’t a single missed letter, stray artifact, or malformed number. This is just an image of a mileage sign, and I mean that in a good way.
“A Room Without An Elephant”
Prompt: Create an image of a room without an elephant.


This is a well-known prompt in the r/ChatGPT community that famously breaks DALL-E. When you specify an exclusion, DALL-E, with its limited contextual understanding, includes the excluded thing in the image anyway. You can see exactly that happening above.
Fortunately, GPT-4o no longer has the same issue, showing that its grasp of nuance is evolving. It’s boring, exactly as it should be.
The Bottom Line
I’ve said this before and I’ll say it again: DALL-E 3, while good at context, was bad at art. Now GPT-4o has walked in and made it look like a warm-up act.
In nearly every category, GPT-4o doesn’t just outperform; it redefines what “good” means in AI image generation. Whether you’re talking about realism, art-style mimicry, or the absolute nightmare that is rendering readable text in an image, GPT-4o handled it all like it was built for this.
The real kicker? Context. GPT-4o actually gets what you’re asking for: not just the words, but the intention behind them. You say “a room without an elephant,” and for once, the model doesn’t try to sneak a cartoon elephant into the corner. It just… listens.
That’s what sets it apart. It’s not just about sharper pixels or prettier outputs. It’s about understanding. And once an AI model starts doing that reliably? That’s when things get exciting.
So yeah, DALL-E 3 had a good run. But if this is where GPT-4o starts, I can’t wait to see what comes next.