Techno Blender

Exploring Midjourney V4 for Creating Digital Art



A deep dive into the features and options for the popular text-to-image creation system

Top Row: Mona Lisa by Leonardo da Vinci (left), stylized by Midjourney as an “Expressionist painting” (center) and a “Cubist painting” (right), Images by Author; Bottom Row: Fields of Tulips With The Rijnsburg Windmill by Claude Monet (left), stylized by Midjourney as an “Expressionist painting” (center) and a “Cubist painting” (right), Images by Author

In my article last month, I tested three text-to-image generation systems, Stable Diffusion, DALL-E, and Midjourney, to see which was the best for creating digital art. [spoiler alert] Midjourney won the showdown. For the most part, I only used the default options for the three systems, even though many features can be used to make different and often better images.

In this article, I will lead you through a deep dive into the features and options available in Midjourney for creating digital art, including version 4, which is currently available as a public beta. And v4 is fantastic!

Here’s a list of what I’ll cover in the article:

  1. Midjourney Basics
  2. Exploring the Versions
  3. Creating Variations and Remixes
  4. Style Transfer with Text Prompts
  5. Midjourney Mashups
  6. Rendering Options
  7. Upsizing Options

Midjourney Basics

Midjourney is a text-to-image creation service that uses a Discord server, a group messaging platform mostly used by gamers, as its primary user interface [1]. Signing up is free, and you can create about 200 images at no cost. After that, you can check out the pricing options spelled out in the quick start guide here.

Basic Settings

Once I got set up and headed to a “newbies” room, I saw the main options by typing /settings and hitting enter twice. The following UI showed up.

Midjourney Settings, Image by Author

These are just some of the basic settings. The complete list is available as command-line options here.

Creating an Image

Using the default settings, I created four thumbnail images by typing /imagine, hitting enter to bring up the UI, and entering the following prompt, “abstract painting with colorful circles.”
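For reference, the full command as typed into Discord looks like this; the prompt text after the colon is the only part you change:

```
/imagine prompt: abstract painting with colorful circles
```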

Creating Images in Midjourney, Image by Author

The system thought about it for a second, then started the process of creating the thumbnails. After about 20 seconds, I got the results.

“abstract painting with colorful circles” Midjourney Thumbnails, Image by Author

I got four thumbnails at 256×256 pixels based on my prompt, and they look pretty good. The “U” buttons below upscale a selected image to 1024×1024, the “V” buttons create variations of a selected image, and the blue re-roll icon reruns the process, creating four new thumbnails.

I liked the upper right image (number 2), so I hit the U2 button to upscale that one. After about 10 seconds, I got the result.

“abstract painting with colorful circles” Midjourney Upscale Results (left), Upscaled Image (right), Images by Author

Here’s what I saw when the upsizing process was finished, with various options for further processing. When I checked on the image, it came up full-res in my Midjourney account, as seen on the right.

Exploring the Versions

Midjourney released its first version of the image-generating service in March 2022. Version 2 was released soon after in April, and version 3 in July. The beta for version 4 was released on November 10, 2022.

As mentioned above, you can use the /settings command to select versions 1 through 4. I noticed that version 4 creates 512×512 initial renderings, whereas the previous versions created initial images at 256×256. To test all four versions, I sent in the prompt “painting of a French man wearing a hat drinking wine at an outdoor cafe” to see what I would get.

“painting of a French man wearing a hat drinking wine at an outdoor cafe” rendered in Midjourney V1 (top left), V2 (top right), V3 (bottom left), and V4 (bottom right), images by Author

The four thumbnails in the upper left were rendered using version 1, version 2 at the upper right, version 3 at the lower left, and version 4 at the lower right. Wow! What a big difference in quality! With each version, the images steadily improve. The version 4 images look extraordinary! In the next section, I’ll show some quantitative results.

Contrastive Similarity Test

For my article last month, I researched various metrics for gauging the esthetics of generated art. I discovered a technique using the CLIP model by OpenAI [2] that can be used to estimate esthetic quality. When I compared an image embedding to the embeddings of positive and negative text prompts, like “bad art” and “good art,” I found that using the difference in similarity produced reasonable results. I also calculated a prompt similarity metric to see how well the image matched the prompt using similar logic. The graph below shows the two metrics for the sixteen images above.
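Here is a minimal sketch of the scoring logic. It assumes the image and text embeddings have already been computed with CLIP's image and text encoders; the helper names are mine, not from any library:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def esthetic_quality(image_emb, good_emb, bad_emb) -> float:
    """Difference in similarity to the 'good art' vs. 'bad art' text embeddings."""
    return cosine_similarity(image_emb, good_emb) - cosine_similarity(image_emb, bad_emb)

def prompt_similarity(image_emb, prompt_emb) -> float:
    """How well the image matches its own text prompt."""
    return cosine_similarity(image_emb, prompt_emb)
```

These two scores are what get plotted on the vertical and horizontal axes of the graph.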

Prompt Similarity vs. Esthetic Quality, Graph by Author

You can see how the renderings from the newer Midjourney versions rank higher on both the esthetic quality metric (vertical axis) and the prompt similarity metric (horizontal axis) than those from the older versions, and V4 stands a cut above the rest. Two notable outliers are v2-1 (unusually good prompt similarity) and v2-3 (unusually bad on both metrics). Scrolling up to look at the images again, these metrics seem to hold up reasonably well. The math and Python for this test are in my GitHub repository here.

Creating Variations and Remixes

After creating four thumbnails with a text prompt, the system allows you to make additional variations using the V1-V4 buttons. For example, for the prompt, “a man and woman holding an umbrella on a city street in the rain,” the system generates four thumbnails. After choosing V2 for the upper right, it generates four variations. Note that I used version 4 for this and all of the following experiments.

“a man and woman holding an umbrella on a city street in the rain” Initial Render (left) and Variations of the Upper Right (right), Images by Author

The variations all look pretty good. Since version 3 of Midjourney, you can modify the text prompt when creating variations. The feature is called Remix mode and is available in the settings. For example, I added the following text to the prompts using Remix mode: “1880s”, “1950s”, and “future.” Here are the results.

“a man and woman holding an umbrella on a city street in the rain” with “1880s” (left), “1950s” (center), and “future” (right), Images by Author

Sure enough, the system renders the image with distinctive visual looks associated with the specified time periods.

Style Transfer with Text Prompts

Similar to using Remix mode to create variations, you can start with an image available on the web and add text to stylize it. I used the /imagine command, pasted in the URL for the base painting, and specified the style using text.
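The command takes the same form as a regular prompt, with the image URL pasted in first, followed by the style text (the URL here is a placeholder):

```
/imagine prompt: https://example.com/mona_lisa.jpg Expressionist painting
```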

For example, starting with the Mona Lisa, a Monet landscape, and a Cezanne still life painting, I used Midjourney to stylize these works as an “Expressionist painting” and a “Cubist painting.” Here are the results.

Mona Lisa by Leonardo da Vinci (left), Midjourney “Expressionist painting” by Author (center), Midjourney “Cubist painting” by Author (right)
Fields of Tulips With The Rijnsburg Windmill by Claude Monet (left), Midjourney “Expressionist painting” by Author (center), Midjourney “Cubist painting” by Author (right)
Curtain, Jug, and Fruit by Paul Cezanne (left), Midjourney “Expressionist painting” by Author (center), Midjourney “Cubist painting” by Author (right)

The original works are on the left, and the Expressionist and Cubist stylizations follow to the right. The results seem pretty nice. I like how the system is free to change the composition a bit in addition to changing the style. Note how the Mona Lisa figure is scaled up and down a bit and how some extra items like flowers and a wine bottle were added to the still life paintings.

Midjourney Mashups

The system has a nice feature that allows users to specify two or three images as inputs in addition to a text prompt. I used this feature to create what I call Midjourney Mashups. I pasted in two URLs to paintings in the public domain, with no text prompt, and here’s what Midjourney created. The new images are in the center.
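A mashup command is simply two image URLs with nothing after them (placeholder URLs shown):

```
/imagine prompt: https://example.com/painting_one.jpg https://example.com/painting_two.jpg
```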

The Girl with a Pearl Earring by Johannes Vermeer (left), Midjourney Mashup by Author (center), Madame Roulin Rocking the Cradle by Vincent van Gogh (right)
Landscape by Pierre-Auguste Renoir (left), Midjourney Mashup by Author (center), Wheat Field with Cypresses at the Haute Galline near Eygalieres by Vincent van Gogh (right)
Still Life with Herring, Wine, and Bread by Pieter Claesz (left), Midjourney Mashup by Author (center), Still life, pitcher and fruit by Paul Cezanne (right)
Yellow-Red-Blue by Wassily Kandinsky (left), Midjourney Mashup by Author (center), Composition C (No. III) with Red, Yellow, and Blue by Piet Mondrian (right)

OK, these look pretty cool. You can see how the system picked up on the key compositional elements of both source images and worked out a new way to express them. After experimenting with this technique, I found that this works best for sources that have some basis of thematic similarity. When I tried this with completely disparate sources, the results were unpredictable and often not visually coherent.

Also, the Midjourney V4 model only supports 1:1 image aspect ratios. Hopefully, they will release a version to create images with portrait and landscape aspect ratios, as they did for the earlier versions.

Mashups of Contemporary Works

To see how this would look with contemporary works, I reached out to four artists in the Boston area, and they gave me permission to use some of their recent pieces. Katie Cornog is a watercolor artist, Noah Sussman paints with oil on canvas, Anna Kristina Goransson creates felt sculptures, and Georgina Lewis is an installation artist working in mixed media. The center column shows the four Midjourney Mashups I made, blending two pieces from each artist.

Sandy Point Autumn by Katie Cornog (left), Midjourney Mashup by Author (center), Plum Island Reflections by Katie Cornog (right)
Studio Scene by Noah Sussman (left), Midjourney Mashup by Author (center), Seeing Beyond The Eclipse by Noah Sussman (right)
Beauty of Melancholy by Anna Kristina Goransson (left), Midjourney Mashup by Author (center), Searching by Anna Kristina Goransson (right)
recent common ancestor by Georgina Lewis (left), Midjourney Mashup by Author (center), rappaccini’s offspring by Georgina Lewis (right)

Once again, the results were spectacular. All four mashups look like the original artists could have created them. Two of the artists told me that they wanted to do just that.

Rendering Options

As I mentioned above, there are a lot of rendering options in Midjourney. To run an apples-to-apples test, however, I first had to learn how to retrieve and reuse the random seed used to generate images. It’s a bit hidden: after I generated an image, I created a “reaction” to the image and chose the “envelope” emoji.

Sending a Reaction with an Envelope Emoji in Discord, Image by Author

This triggered the bot to send me a message that included the seed for the run, like this.

Direct Message with the Image Seed, Image by Author

I then used the --seed 6534 parameter to generate the same image with different options. For example, here are renderings using the prompt, “painting of the Boston Public Garden in springtime,” with the quality set to 0.5, 1.0 (the default), and 2.0.
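Putting the two parameters together, a command like the following re-renders the same image at a different quality setting (the flag names follow Midjourney's documented --seed and --quality parameters):

```
/imagine prompt: painting of the Boston Public Garden in springtime --seed 6534 --quality 2
```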

“painting of the Boston Public Garden in springtime,” with quality of 0.5 (left), 1.0 (center), and 2.0 (right), Images by Author

It’s interesting to see how much the image changed with the different quality selections. The yellow tree in the middle roughly stayed the same, but the flowers, buildings, and people all moved around. It’s subtle, but the image on the right with a quality of 2.0 seems to have the fewest “problem” areas.

Upsizing Options

Midjourney also has options for upsizing that can improve the details of the final images. For this test, I used the prompt, “steampunk rotary phone,” with the quality set to 2.0. I had to use the “seed” trick to test out the three upsizing options: lite, regular, and beta. (For the record, this was seed #24863.) Here are the results, along with a close-up of the rotary dialer.

“steampunk rotary phone” with Upsize Set to Lite (left), Regular (center), and Beta (right), with Details in the Second Row, Images by Author

All three images show some fine details, but the beta upsize on the right seems a bit more orderly. For example, the inner ring on the dialer seems to be more realistically formed.

I have been looking at using AI to generate digital art for the last two and a half years, and I can tell you that Midjourney V4 is by far the best system I have seen. Hopefully, they will release this version with the full suite of features (aspect ratio, style amount, etc.). A couple of features that still seem to be missing are inpainting and outpainting, like in DALL-E.

The source code for running the Contrastive Similarity Test on images is available on GitHub. I released the source code under the MIT open-source license.

MIT Open-source License

I want to thank Jennifer Lim and Oliver Strimpel for their help with this project.

[1] Midjourney https://midjourney.gitbook.io/docs/

[2] CLIP by A. Radford et al., Learning Transferable Visual Models From Natural Language Supervision (2021)

