MidJourney has launched the Alpha version of its V6 model, and there are many promised improvements over V5.2. We look at what’s better on paper, and test it against the older model.

What’s New in V6?

Undoubtedly, a lot has happened under the hood with V6, but MidJourney highlighted the key features in an official Discord thread. Note that you’ll have to be a member of the MidJourney Discord to view the post in question. These are the most important changes:

In short, V6 brings MidJourney more in line with the impressive new capabilities of rival tool DALL-E 3, but here we’re interested in seeing how much better it is than the V5.2 model, which was the default at the time of writing.

An AI-generated image of a marketplace in a futuristic city.

Prompt Adherence

The first thing I want to test is how well the new model adheres to the prompt. In the past, MidJourney would take details in the prompt more like vague suggestions than instructions. So here’s a prompt with very detailed instructions.

For each model I’ve chosen the image that most closely matches my prompt. Here’s the best V5.2 came up with.

AI-generated image of a marketplace in a futuristic city.

Here is the best that V6 came up with.

While V5.2 generally includes all the elements I asked for, they aren’t arranged correctly relative to the frame or to each other at all. The only real mistake V6 made here is putting the apple in the robot’s left arm and the shopping basket in the girl’s right arm. Perhaps most importantly, all the images generated by V6 are much more coherent than those made by V5.2, which has no sense of framing or balance here, and whose images just feel sort of mashed together.

Putting Text Into Images

Like DALL-E 3, MidJourney V6 boasts the ability to properly integrate text in an image. All you have to do is separate the text using quotation marks in your prompt. Here’s the prompt we used:

I’m putting all four attempts of both models here to show that V6 is not perfect at this yet, but none of the V5.2 images are anywhere close to getting the text right.

Four AI-generated flags in each quadrant of the image, with garbled text.

V6, however, was 75% successful on the first attempt, and you can clearly see that the text is properly integrated into the image, rather than simply overlaid.
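The quotation-mark convention described above can be sketched as a small prompt-building helper. This is only an illustration: the function name `quote_text_for_prompt` and the surrounding prompt wording are hypothetical, not the prompt used in this test. The only part taken from the article is that the text to render goes inside quotation marks.

```python
def quote_text_for_prompt(description: str, text: str) -> str:
    """Build a prompt string that asks for `text` to appear in the image.

    Hypothetical helper: the sentence template is an assumption made for
    illustration. Only the quoting convention comes from the article.
    """
    # Drop stray double quotes so the quoted span stays unambiguous.
    cleaned = text.replace('"', "").strip()
    if not cleaned:
        raise ValueError("text to render must not be empty")
    # Wrap the text in quotation marks, as V6 expects for in-image text.
    return f'{description} with the words "{cleaned}" written on it'


if __name__ == "__main__":
    print(quote_text_for_prompt("a flag waving in the wind", "How-To Geek"))
```

Paste the resulting string into MidJourney as you would any other prompt. Expect to reroll a few times, since V6 still gets the text wrong on some attempts.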

Artistic Quality

While we can more or less objectively test how well V6 can follow a prompt or integrate text, artistic quality is much harder to nail down. In my comparison of MidJourney models V1 to V5.2, it was clear that with every new model the AI was becoming more “imaginative,” for lack of a better word. Composition and detail also drastically improved, and honestly, V5.2 still came out on top when it comes to artistic flair, as I noted when I compared MidJourney to DALL-E 3.

I think this is best left to the judgment of each person reading this, so here are a few pairs of images, with V5.2 on the left and V6 on the right.

Four AI-generated flags that have the words How-To Geek on them.

It’s Just an Alpha (For Now)

It’s really important to keep in mind that MidJourney V6 is not finished at the time of writing. This is a new model trained from scratch, but with the lessons learned from previous models. V6 is still missing some of the awesome value-adds you’re able to find in V5.2, such as the ability to pan the image.

What is clear is that you can throw all the prompt engineering tricks you know for MidJourney out the window. V5.2 is still perfectly capable of creating stunning and usable images. At this stage, there’s no harm in trying the V6 Alpha model to see if it gives better results with your prompts, but keep V5.2 close at hand too.
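One way to keep both models close at hand is to run the same prompt against each. The sketch below assumes MidJourney's `--v` version parameter selects the model (`--v 5.2` and `--v 6`). The helper names are hypothetical, and the prompt text is taken from one of the image descriptions in this article rather than from the prompts actually used.

```python
def with_version(prompt: str, version: str) -> str:
    """Append the --v parameter (assumed to select the model version)."""
    return f"{prompt.strip()} --v {version}"


def side_by_side(prompt: str, versions=("5.2", "6")) -> list[str]:
    """Return one prompt string per model version for an A/B comparison."""
    return [with_version(prompt, v) for v in versions]


if __name__ == "__main__":
    for line in side_by_side("a marketplace in a futuristic city"):
        print(line)
```

Submitting both lines gives you a like-for-like pair to judge, much like the side-by-side images in this article.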

Two side-by-side AI-generated images of idyllic elven villages featuring whimsical houses and lush greenery.

Two side-by-side images of a futuristic street scene with aliens, robots, and humans all living in the same city.

An AI-generated pair of side-by-side images of a nature photograph of mountains as seen from the beach, with a large visible moon in the sky.