
Stable Diffusion 1.5 Vs. 2.1 – Which Is Better?

If you are looking for a reliable way to generate images from text prompts, you might have heard of Stable Diffusion.

But what are the differences and similarities between Stable Diffusion 1.5 and 2.1, and which one should you use for your creative projects?

Stable Diffusion 1.5 and 2.1 are similar in their underlying frameworks and model architecture. However, they differ in text encoder, datasets, resolutions and strengths.

This article will compare these two models’ features, performance, and limitations and help you decide which is better for your artistic needs.

What Are The Features Of Stable Diffusion 1.5?

Stable Diffusion version 1.5 is a text-to-image generation model that uses latent diffusion to create high-resolution images from text prompts.

It was released in October 2022 by RunwayML, a partner of Stability AI. The model uses a fixed, pre-trained CLIP ViT-L/14 text encoder.

Stable Diffusion 1.5 is a latent diffusion model: it combines an autoencoder with a diffusion model and is trained in the latent space of the autoencoder.

Images are passed through an encoder that turns them into compact latent representations.

The diffusion process then operates on these latent representations, generating images by iteratively de-noising data.
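The idea of iterative de-noising can be illustrated with a toy numerical sketch. This is not the actual sampler: a real model predicts the noise with a U-Net at each step, while here the clean "latent" is simply known in advance so the loop stays self-contained.

```python
import numpy as np

def toy_denoise(target, steps=50, seed=0):
    """Toy sketch of iterative de-noising: start from pure Gaussian noise
    and repeatedly blend toward a clean latent. In real Stable Diffusion,
    a U-Net predicts the noise instead of the target being known."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(target.shape)  # start from random noise
    for t in range(steps):
        predicted_clean = target           # stand-in for the model's estimate
        alpha = (t + 1) / steps            # blend weight grows each step
        x = (1 - alpha) * x + alpha * predicted_clean
    return x

clean = np.ones((4, 4))        # pretend 4x4 "latent"
result = toy_denoise(clean)
print(np.allclose(result, clean))  # → True (the final step uses alpha = 1)
```

The loop runs from almost pure noise toward the clean representation, which is the same direction of travel a real diffusion sampler takes.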

Here are some interesting features of Stable Diffusion version 1.5:

  • It can generate high-resolution images from any text prompt.
  • It uses a pre-trained text encoder that understands natural language.
  • It uses a latent variable that is computed from the noisy image.
  • It can improve classifier-free guidance sampling by dropping the text-conditioning during training.
  • It can be used with the Diffusers library or the RunwayML GitHub repository.
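Using the Diffusers library, generating an image with version 1.5 takes only a few lines. This is a minimal sketch, assuming `pip install diffusers transformers torch`, a CUDA-capable GPU, and the `runwayml/stable-diffusion-v1-5` checkpoint on the Hugging Face Hub:

```python
# Minimal sketch: text-to-image with Stable Diffusion 1.5 via the
# Diffusers library. Assumes diffusers/transformers/torch are installed
# and a CUDA-capable GPU is available.
MODEL_ID = "runwayml/stable-diffusion-v1-5"
PROMPT = "a lighthouse on a cliff at sunset, oil painting"

def generate(prompt=PROMPT):
    import torch
    from diffusers import StableDiffusionPipeline

    # Download the checkpoint (cached after the first run) and move to GPU.
    pipe = StableDiffusionPipeline.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16
    )
    pipe = pipe.to("cuda")
    return pipe(prompt).images[0]  # a PIL image at 512 x 512

if __name__ == "__main__":
    generate().save("lighthouse.png")
```

The prompt and output filename are placeholders; any text prompt works the same way.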

What Are The Features Of Stable Diffusion 2.1?

Stable Diffusion V-2.1 is a high-resolution image synthesis model that uses latent Diffusion and an OpenCLIP (Contrastive Language-Image Pre-Training) text encoder.

It was released in December 2022 by Stability AI. Some of the features of Stable Diffusion V-2.1 are listed below.

  • It supports negative and weighted prompts, allowing users to control the image synthesis.
  • It can effectively render natural scenery, people and pop culture.
  • It supports non-standard resolutions and extreme aspect ratios.
  • It can be combined with other models, such as KARLO.
  • It can perform image variations and mixing operations.

Note: Stable Diffusion V-2.1 has two latent spaces: the image representation space learned by the encoder during training and the prompt latent space learned through a combination of pre-training and training-time fine-tuning.
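As a sketch of how version 2.1 is used in practice (assuming the `stabilityai/stable-diffusion-2-1` checkpoint on the Hugging Face Hub and a CUDA GPU), the Diffusers library lets you run it at its native 768 x 768 resolution:

```python
# Minimal sketch: Stable Diffusion 2.1 at its native 768 x 768 resolution
# via the Diffusers library. Assumes a CUDA-capable GPU.
MODEL_ID = "stabilityai/stable-diffusion-2-1"
WIDTH = HEIGHT = 768  # the SD2.1-768 checkpoint was fine-tuned at this size

def generate(prompt):
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16
    )
    pipe = pipe.to("cuda")
    # Request the model's native resolution explicitly.
    return pipe(prompt, height=HEIGHT, width=WIDTH).images[0]

if __name__ == "__main__":
    generate("a misty alpine valley at dawn").save("valley.png")
```

Non-standard sizes can be requested by passing different `height` and `width` values, at the cost of more GPU memory and time.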

Stable Diffusion 1.5 Vs. 2.1 – Similarities

Stable Diffusion uses a deep generative neural network that progressively denoises random noise into images guided by a text encoder.

You can find some similarities between Stable Diffusion V-1.5 and V-2.1 below.

1. Purpose And Objective

Stable Diffusion can generate detailed images based on text prompts.

It can modify existing images or fill in the missing details on the image with the help of a prompt.

Stable Diffusion V-1.5 and V-2.1 are two versions of the same text-to-image model, and both share the same objective: generating high-quality images with a latent diffusion model so that anyone can create stunning art within seconds.

2. Architecture And Parameters

Stable Diffusion versions 1.5 and 2.1 share the same architecture and roughly the same number of parameters. The architecture is based on a U-Net with residual blocks and cross-attention layers.

They use the latent diffusion model architecture developed by the CompVis group at LMU Munich.

Both condition image generation on a Transformer-based, CLIP-style text encoder.

Stable Diffusion 1.5 and 2.1 both follow the CLIP text-encoder approach.

3. Image Resolution

Both models can generate images at 512 x 512 resolution. However, Stable Diffusion V-2.1 can also generate larger, more detailed images at 768 x 768 pixels.

This means V-2.1 requires more computational resources and time to generate images.


Stable Diffusion 1.5 Vs. 2.1 – Major Differences

Stable Diffusion has different versions, such as v-1.5 and v-2.1, which differ in text encoder, resolution and training data.

Moreover, Stable Diffusion 2.1 improves on version 1.5 in the quality, diversity and stability of the generated images.

Refer to the table below for the main differences between Stable Diffusion 1.5 and 2.1.

| Aspect | Stable Diffusion V-1.5 | Stable Diffusion V-2.1 |
| --- | --- | --- |
| Text encoder | CLIP ViT-L/14 | OpenCLIP-ViT/H |
| Image resolution | 512 x 512 pixels | 768 x 768 pixels |
| Dataset filtering | Lightly filtered for adult content | Trained on a filtered dataset, then fine-tuned with less aggressive filtering |
| Negative and weighted prompts | Not supported | Supported |
| Non-standard resolutions and aspect ratios | Not supported | Supported |
| Strengths | People, celebrities and pop culture | Architecture, interior design, wildlife and landscapes |

Now, let’s dive deeper into the significant differences between Stable Diffusion 1.5 and 2.1.

1. Text Encoder

Stable Diffusion version-1.5 uses CLIP ViT-L/14 as its text encoder, while version-2.1 uses OpenCLIP-ViT/H.

OpenCLIP is an open-source text encoder developed by LAION (Large-scale Artificial Intelligence Open Network) that offers a deeper range of expression than the original CLIP.

2. Resolution Of Images

Stable Diffusion V-1.5 supports 512 x 512 resolution images.

However, Stable Diffusion V-2.1 supports higher-resolution 768 x 768 images through the SD2.1-768 checkpoint, which have 2.25 times the pixel area of the former.

This model also allows the image variation and mixing operations described in "Hierarchical Text-Conditional Image Generation with CLIP Latents".

This means Stable Diffusion V-2.1 can generate larger, more detailed images than V-1.5, capturing more nuance from the prompt and staying more faithful to the original description.
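The area comparison is easy to verify: a 768 x 768 image contains 2.25 times as many pixels as a 512 x 512 one.

```python
# Pixel-count comparison between the two models' native resolutions.
sd15_pixels = 512 * 512   # 262,144 pixels
sd21_pixels = 768 * 768   # 589,824 pixels
ratio = sd21_pixels / sd15_pixels
print(ratio)  # → 2.25
```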

The image resolutions of Stable Diffusion v-1.5 and v-2.1.

3. Training Data

Stable Diffusion V-1.5 was trained on a dataset called laion-aesthetics v2 5+ with comparatively light filtering for adult content.

This makes it good at generating people, celebrities and pop-culture images.

On the other hand, the Stable Diffusion 2.x series was trained on data screened with LAION's NSFW (Not Safe For Work) filter, and V-2.1 was then fine-tuned on a less aggressively filtered dataset.

This fine-tuning improved its ability to generate people compared with version 2.0, and it excels at architecture, interior design and landscapes.

Additionally, the Stable Diffusion 2.1 version supports non-standard resolutions as well as negative and weighted prompts, allowing users to specify what should not appear in the image.

This means Stable Diffusion V-2.1 offers a deeper range of expression and more control over the image synthesis.

In contrast, Stable Diffusion V-1.5 does not support negative prompts out of the box.
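In the Diffusers library, a negative prompt is passed as a single extra argument. This is a hedged sketch, assuming the `stabilityai/stable-diffusion-2-1` checkpoint and a CUDA GPU; the prompt strings are illustrative placeholders:

```python
# Sketch: steering Stable Diffusion 2.1 with a negative prompt via the
# Diffusers library's `negative_prompt` argument. Assumes a CUDA GPU.
MODEL_ID = "stabilityai/stable-diffusion-2-1"
PROMPT = "portrait photo of an astronaut, studio lighting"
NEGATIVE = "blurry, low quality, watermark"  # what NOT to generate

def generate(prompt=PROMPT, negative=NEGATIVE):
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16
    )
    pipe = pipe.to("cuda")
    # The negative prompt replaces the empty string normally used for
    # classifier-free guidance, pushing samples away from its contents.
    return pipe(prompt, negative_prompt=negative).images[0]

if __name__ == "__main__":
    generate().save("astronaut.png")
```

Weighted prompts are typically handled by front-end tools or prompt-embedding helpers layered on top of the same pipeline.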


4. Strengths

Stable Diffusion V-1.5 is more suitable for generating images of people and pop culture, while V-2.1 is better suited to architecture, interior design and other landscape scenes.

Stable Diffusion V-2.1 also comes in two unCLIP variants, Stable unCLIP-L and Stable unCLIP-H, conditioned on CLIP ViT-L and CLIP ViT-H image embeddings respectively.

You can try a demo of the Stable unCLIP model on the web.

This version also improves anatomy and supports a wider range of art styles than version 1.5.

Note: Users have reported that Stable Diffusion v-1.5 can represent celebrities and artistic styles better than v-2.1, as its CLIP encoder was trained on a proprietary dataset that appears to contain more such imagery than the open data behind OpenCLIP.

The Bottom Line

Stable Diffusion versions 1.5 and 2.1 can both generate high-quality images; choosing between them depends on your preferences and use cases.

If you want to generate more realistic, diverse and stable images, you can try V-2.1. However, if you want to generate images more specific to certain celebrities or styles, you can try V-1.5.

Moreover, you can even try both versions and compare the results yourself.
