Let’s take a quick history lesson and look back at the state of AI image generation a year ago. We couldn’t reliably generate faces, DALL-E 2 had been released only a few months prior with mixed results, Midjourney V4 was starting to make some noise, and Stable Diffusion was leading the way with 2.0.
In just a year, AI art has become almost perfect except for two significant roadblocks: nuance and text generation.
Fast forward to today: DALL-E 3 arrived just a few months ago, and earlier this week Midjourney V6 was finally released. Could these finally be the AI image generators that handle text perfectly? Let’s find out.
Why Midjourney and DALL-E 3?
For a while now, DALL-E 3 has been the only AI image generator that can consistently create images with text. It’s one of its main selling points, along with improved creativity and nuance. It’s even showcased on OpenAI’s announcement page with this image:
Recently, Midjourney unveiled its newest model: V6. And what do you know, it also highlights better nuance, better creativity, and, most importantly, minor text drawing as its improvements. I’ve always avoided testing text generation when comparing Midjourney against other generators because it would have been unfair, but now that Midjourney has this feature, it only makes sense to pit it against the best.
Head-to-Head: Midjourney vs. DALL-E for Text Generation
Each comparison will focus on text, but we’ll also analyze how much nuance and creativity each model shows in applying the text. So, without further ado, here’s a head-to-head comparison of Midjourney and DALL-E 3 using the same prompts:
Simple Text
Text: “This is text.”
In terms of the text itself, Midjourney performed better than DALL-E 3, which made a small mistake when writing the last part of the text. However, DALL-E’s result is more cohesive as an image, since the teacher in Midjourney’s image is using a pen to write on a chalkboard.
Winner: Midjourney V6
Long Text
Text: “The quick brown fox jumps over a lazy dog, and promptly tripped over the dog’s tail, earning a disgruntled grumble.”
Both tried to add their own flair to a simple prompt (a piece of paper with writing on it), but neither actually produced readable text. This shows that AI image generators can write short words or sentences, but they get worse as you add more words.
Winner: None.
Keyboard
For this one, I didn’t ask either model to write a specific word or sentence; instead, I tasked them with generating an accurate QWERTY keyboard. Neither is actually correct, but DALL-E couldn’t even arrange the letters properly, while Midjourney somehow got the placement right for more than half the letters.
Winner: Midjourney V6
Logo
Textual content: “Matcha.”
Both of these images demonstrate a good understanding of my original prompt (a green coffee mug logo) and showcase creativity. There’s nothing wrong with the text in either one, and each even matches the art style its generator created for the logo.
Winner: Tie round.
Postcards
Text: “Happy Halloween.”
As AI image models evolve, I have to be extremely nitpicky about how I judge their text generation prowess. Case in point: I’d love to make this a tie round, but the minor errors in DALL-E’s output (a triple L in “Halloween” and inconsistent coloring in “Happy”) prevent me from doing so.
I’ll say this, though: I prefer DALL-E’s postcard over Midjourney’s.
Winner: Midjourney V6
Signs
Textual content: “Bacon and Eggs.”
It’s a clear win for DALL-E. Midjourney V6 tried its best, but the unnecessary and out-of-place yellow “and” sign keeps this round from being a tie.
DALL-E also shows great nuance this round by turning “and” into an ampersand and adding a separate “Diner” neon sign without me asking. It’s not just readable; it’s also creative, unique, and immersive.
Winner: DALL-E 3
Book Covers
Textual content: “Shapes and Stuff.”
I’ll admit: DALL-E 3 created a much better book cover than V6. However, the title generated by DALL-E has far too many errors, so I have to give this point to Midjourney, which perfectly rendered “Shapes and Stuff” in a consistent font. V6’s cover design also shows its improved comprehension by highlighting the key words of the text.
Winner: Midjourney V6.
Comic Panel
Textual content: “Knock knock!”
Midjourney V6 and DALL-E 3 both made minor errors in writing the text. Since both are still readable and their artwork is amazingly done, I’m declaring this round another tie.
Winner: Tie round.
Surreal Settings
Textual content: “To infinity”
Just to provide a little background: my prompt for this round explicitly states that the text should be composed of stars. Although I said the focus would be on the text itself, and Midjourney rendered the text better this round, DALL-E’s minor mistake won’t stop me from awarding it this point, because it did, in fact, create the text out of stars.
Winner: DALL-E 3
The Final Tally and Observations
| Round | DALL-E 3 | Midjourney V6 |
|---|---|---|
| Simple Text | Almost perfect text, and showcases a high level of nuance and creativity. | Perfect text, and showcases a good level of nuance and creativity. |
| Keyboard | Letters aren’t placed in the right order. | Around half of the letters are placed in the correct order. |
| Logo | Perfect text, and showcases a high level of nuance and creativity. | Perfect text, and showcases a high level of nuance and creativity. |
| Postcards | Almost perfect text, and showcases a high level of nuance and creativity. | Perfect text, and showcases a high level of nuance and good creativity. |
| Signs | Perfect text, and showcases an incredibly high level of nuance and creativity. | Almost perfect text with a noticeable mistake. Showcases a good level of nuance and creativity. |
| Book Covers | A good attempt with a few noticeable mistakes. Showcases a great level of creativity. | Perfect text, and showcases a good level of nuance and creativity. |
| Comic Panel | Almost perfect text, and showcases an incredibly high level of nuance and creativity. | Almost perfect text, and showcases an incredibly high level of nuance and creativity. |
| Surreal Settings | Almost perfect text, and showcases a high level of nuance and creativity. | Perfect text, but shows a low understanding of the prompt. |
One thing I noticed in this testing is that DALL-E 3 appears to have a higher error rate than Midjourney. On the other hand, Midjourney tends to lack the same level of creativity and nuance when asked to generate images that specifically call for text. I believe V6 sacrifices some of its creativity when given prompts that explicitly focus on text generation.
Wrapping Up
This head-to-head was a lot closer than I expected, but Midjourney V6 pulls through with the win, taking four rounds to DALL-E 3’s two, with two ties and one round (Long Text) that neither model won. However, as I said earlier, V6’s improved but still limited nuance keeps it from generating text while making full use of its creativity.
Still, that’s to be expected, because this isn’t the final version of V6 yet. Midjourney is only going to get better from here as the team gradually improves the model behind it. There’s no concrete news on DALL-E 4 yet, but we can expect similar improvements from that model too. For now, though, Midjourney is without a doubt the one leading the space in text generation.
That’s it for this head-to-head comparison. If you’re looking for more articles about V6 and DALL-E 3, I highly recommend reading this article. Good luck!