0% found this document useful (0 votes)

17 views19 pages

Stable Diffusion

Stable Diffusion is a deep learning text-to-image model released in 2022 that utilizes diffusion techniques to generate detailed images based on text prompts. Developed by Stability AI in collaboration with researchers from Ludwig Maximilian University of Munich, it allows for various applications including inpainting and image-to-image translations, and can run on consumer hardware. The model has undergone several updates, addressing limitations in image quality and bias, while enabling user fine-tuning for specific use cases.

Uploaded by

rhea.stuart.russell

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

17 views19 pages

Stable Diffusion

Uploaded by

rhea.stuart.russell

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 19

Stable Diffusion

Stable Diffusion is a deep learning, text-to-image

Stable Diffusion
model released in 2022 based on diffusion techniques.
The generative artificial intelligence technology is the
premier product of Stability AI and is considered to be
a part of the ongoing artificial intelligence boom.

It is primarily used to generate detailed images

conditioned on text descriptions, though it can also be
applied to other tasks such as inpainting, outpainting,
and generating image-to-image translations guided by
a text prompt.[3] Its development involved researchers
from the CompVis Group at Ludwig Maximilian
University of Munich and Runway with a
computational donation from Stability and training
data from non-profit organizations.[4][5][6][7] An image generated with Stable Diffusion 3.5
based on the text prompt a photograph of
Stable Diffusion is a latent diffusion model, a kind of an astronaut riding a horse
deep generative artificial neural network. Its code and
Original author(s) Runway, CompVis, and
model weights have been released publicly,[8] and it
Stability AI
can run on most consumer hardware equipped with a
modest GPU with at least 4 GB VRAM. This marked a Developer(s) Stability AI

departure from previous proprietary text-to-image Initial release August 22, 2022
models such as DALL-E and Midjourney which were Stable release SD 3.5 (model)[1] / October
accessible only via cloud services.[9][10] 22, 2024
Repository github.com/Stability-AI
/generative-models (http
Development s://github.com/Stability-AI/
generative-models)
Stable Diffusion originated from a project called
Latent Diffusion,[11] developed in Germany by Written in Python[2]
researchers at Ludwig Maximilian University in Type Text-to-image model
Munich and Heidelberg University. Four of the original License Stability AI Community
5 authors (Robin Rombach, Andreas Blattmann, License
Patrick Esser and Dominik Lorenz) later joined
Website stability.ai/stable-image (ht
Stability AI and released subsequent versions of Stable
tps://stability.ai/stable-imag
Diffusion.[12] e)

The technical license for the model was released by the

CompVis group at Ludwig Maximilian University of Munich.[10] Development was led by Patrick Esser
of Runway and Robin Rombach of CompVis, who were among the researchers who had earlier invented
the latent diffusion model architecture used by Stable Diffusion.[7] Stability AI also credited EleutherAI
and LAION (a German nonprofit which assembled the dataset on which Stable Diffusion was trained) as
supporters of the project.[7]

Technology

Architecture
Models in Stable Diffusion series before SD 3 all used
a kind of diffusion model (DM), called a latent
diffusion model (LDM), developed by the CompVis
(Computer Vision & Learning)[13] group at LMU
Munich.[14][8] Introduced in 2015, diffusion models are
Diagram of the latent diffusion architecture used
trained with the objective of removing successive
by Stable Diffusion
applications of Gaussian noise on training images,
which can be thought of as a sequence of denoising
autoencoders. Stable Diffusion consists of 3 parts: the
variational autoencoder (VAE), U-Net, and an
optional text encoder.[15] The VAE encoder
compresses the image from pixel space to a smaller
dimensional latent space, capturing a more
fundamental semantic meaning of the image.[14]
Gaussian noise is iteratively applied to the
compressed latent representation during forward
diffusion.[15] The U-Net block, composed of a ResNet
backbone, denoises the output from forward diffusion
backwards to obtain a latent representation. Finally, The denoising process used by Stable Diffusion.
the VAE decoder generates the final image by The model generates images by iteratively
denoising random noise until a configured number
converting the representation back into pixel
of steps have been reached, guided by the CLIP
space.[15] text encoder pretrained on concepts along with
the attention mechanism, resulting in the desired
The denoising step can be flexibly conditioned on a image depicting a representation of the trained
string of text, an image, or another modality. The concept.
encoded conditioning data is exposed to denoising U-
Nets via a cross-attention mechanism.[15] For
conditioning on text, the fixed, pretrained CLIP ViT-L/14 text encoder is used to transform text prompts
to an embedding space.[8] Researchers point to increased computational efficiency for training and
generation as an advantage of LDMs.[7][14]

The name diffusion takes inspiration from the thermodynamic diffusion and an important link was made
between this purely physical field and deep learning in 2015.[16][17]
With 860 million parameters in the U-Net and 123 million in the text encoder, Stable Diffusion is
considered relatively lightweight by 2022 standards, and unlike other diffusion models, it can run on
consumer GPUs,[18] and even CPU-only if using the OpenVINO version of Stable Diffusion.[19]

SD XL
The XL version uses the same LDM architecture as previous versions,[20] except larger: larger UNet
backbone, larger cross-attention context, two text encoders instead of one, and trained on multiple aspect
ratios (not just the square aspect ratio like previous versions).

The SD XL Refiner, released at the same time, has the same architecture as SD XL, but it was trained for
adding fine details to preexisting images via text-conditional img2img.

SD 3.0
The 3.0 version[21] completely changes the backbone. Not a UNet, but a Rectified Flow Transformer,
which implements the rectified flow method[22][23] with a Transformer.

The Transformer architecture used for SD 3.0 has three "tracks", for original text encoding, transformed
text encoding, and image encoding (in latent space). The transformed text encoding and image encoding
are mixed during each transformer block.

The architecture is named "multimodal diffusion transformer (MMDiT), where the "multimodal" means
that it mixes text and image encodings inside its operations. This differs from previous versions of DiT,
where the text encoding affects the image encoding, but not vice versa.

Training data
Stable Diffusion was trained on pairs of images and captions taken from LAION-5B, a publicly available
dataset derived from Common Crawl data scraped from the web, where 5 billion image-text pairs were
classified based on language and filtered into separate datasets by resolution, a predicted likelihood of
containing a watermark, and predicted "aesthetic" score (e.g. subjective visual quality).[24] The dataset
was created by LAION, a German non-profit which receives funding from Stability AI.[24][25] The Stable
Diffusion model was trained on three subsets of LAION-5B: laion2B-en, laion-high-resolution, and laion-
aesthetics v2 5+.[24] A third-party analysis of the model's training data identified that out of a smaller
subset of 12 million images taken from the original wider dataset used, approximately 47% of the sample
size of images came from 100 different domains, with Pinterest taking up 8.5% of the subset, followed by
websites such as WordPress, Blogspot, Flickr, DeviantArt and Wikimedia Commons. An investigation by
Bayerischer Rundfunk showed that LAION's datasets, hosted on Hugging Face, contain large amounts of
private and sensitive data.[26]

Training procedures
The model was initially trained on the laion2B-en and laion-high-resolution subsets, with the last few
rounds of training done on LAION-Aesthetics v2 5+, a subset of 600 million captioned images which the
LAION-Aesthetics Predictor V2 predicted that humans would, on average, give a score of at least 5 out
of 10 when asked to rate how much they liked them.[27][24][28] The LAION-Aesthetics v2 5+ subset also
excluded low-resolution images and images which LAION-5B-WatermarkDetection identified as
carrying a watermark with greater than 80% probability.[24] Final rounds of training additionally dropped
10% of text conditioning to improve Classifier-Free Diffusion Guidance.[29]

The model was trained using 256 Nvidia A100 GPUs on Amazon Web Services for a total of 150,000
GPU-hours, at a cost of $600,000.[30][31][32]

SD3 was trained at a cost of around $10 million.[33]

Limitations
Stable Diffusion has issues with degradation and inaccuracies in certain scenarios. Initial releases of the
model were trained on a dataset that consists of 512×512 resolution images, meaning that the quality of
generated images noticeably degrades when user specifications deviate from its "expected" 512×512
resolution;[34] the version 2.0 update of the Stable Diffusion model later introduced the ability to natively
generate images at 768×768 resolution.[35] Another challenge is in generating human limbs due to poor
data quality of limbs in the LAION database.[36] The model is insufficiently trained to understand human
limbs and faces due to the lack of representative features in the database, and prompting the model to
generate images of such type can confound the model.[37] Stable Diffusion XL (SDXL) version 1.0,
released in July 2023, introduced native 1024x1024 resolution and improved generation for limbs and
text.[38][39]

Accessibility for individual developers can also be a problem. In order to customize the model for new
use cases that are not included in the dataset, such as generating anime characters ("waifu diffusion"),[40]
new data and further training are required. Fine-tuned adaptations of Stable Diffusion created through
additional retraining have been used for a variety of different use-cases, from medical imaging[41] to
algorithmically generated music.[42] However, this fine-tuning process is sensitive to the quality of new
data; low resolution images or different resolutions from the original data can not only fail to learn the
new task but degrade the overall performance of the model. Even when the model is additionally trained
on high quality images, it is difficult for individuals to run models in consumer electronics. For example,
the training process for waifu-diffusion requires a minimum 30 GB of VRAM,[43] which exceeds the
usual resource provided in such consumer GPUs as Nvidia's GeForce 30 series, which has only about
12 GB.[44]

The creators of Stable Diffusion acknowledge the potential for algorithmic bias, as the model was
primarily trained on images with English descriptions.[31] As a result, generated images reinforce social
biases and are from a western perspective, as the creators note that the model lacks data from other
communities and cultures. The model gives more accurate results for prompts that are written in English
in comparison to those written in other languages, with western or white cultures often being the default
representation.[31]

End-user fine-tuning
To address the limitations of the model's initial training, end-users may opt to implement additional
training to fine-tune generation outputs to match more specific use-cases, a process also referred to as
personalization. There are three methods in which user-accessible fine-tuning can be applied to a Stable
Diffusion model checkpoint:
An "embedding" can be trained from a collection of user-provided images, and allows the
model to generate visually similar images whenever the name of the embedding is used
within a generation prompt.[45] Embeddings are based on the "textual inversion" concept
developed by researchers from Tel Aviv University in 2022 with support from Nvidia, where
vector representations for specific tokens used by the model's text encoder are linked to
new pseudo-words. Embeddings can be used to reduce biases within the original model, or
mimic visual styles.[46]
A "hypernetwork" is a small pretrained neural network that is applied to various points within
a larger neural network, and refers to the technique created by NovelAI developer Kurumuz
in 2021, originally intended for text-generation transformer models. Hypernetworks steer
results towards a particular direction, allowing Stable Diffusion-based models to imitate the
art style of specific artists, even if the artist is not recognised by the original model; they
process the image by finding key areas of importance such as hair and eyes, and then
patch these areas in secondary latent space.[47]
DreamBooth is a deep learning generation model developed by researchers from Google
Research and Boston University in 2022 which can fine-tune the model to generate precise,
personalised outputs that depict a specific subject, following training via a set of images
which depict the subject.[48]

Capabilities
The Stable Diffusion model supports the ability to generate new images from scratch through the use of a
text prompt describing elements to be included or omitted from the output.[8] Existing images can be re-
drawn by the model to incorporate new elements described by a text prompt (a process known as "guided
image synthesis"[49]) through its diffusion-denoising mechanism.[8] In addition, the model also allows the
use of prompts to partially alter existing images via inpainting and outpainting, when used with an
appropriate user interface that supports such features, of which numerous different open source
implementations exist.[50]

Stable Diffusion is recommended to be run with 10 GB or more VRAM, however users with less VRAM
may opt to load the weights in float16 precision instead of the default float32 to tradeoff model
performance with lower VRAM usage.[34]

Text to image generation

The text to image sampling script within Stable Diffusion, known as "txt2img", consumes a text prompt
in addition to assorted option parameters covering sampling types, output image dimensions, and seed
values. The script outputs an image file based on the model's interpretation of the prompt.[8] Generated
images are tagged with an invisible digital watermark to allow users to identify an image as generated by
Stable Diffusion,[8] although this watermark loses its efficacy if the image is resized or rotated.[51]

Each txt2img generation will involve a specific seed value which affects the output image. Users may opt
to randomize the seed in order to explore different generated outputs, or use the same seed to obtain the
same image output as a previously generated image.[34] Users are also able to adjust the number of
inference steps for the sampler; a higher value takes a longer duration of time, however a smaller value
may result in visual defects.[34] Another configurable option, the classifier-free guidance scale value,
allows the user to adjust how closely the output image adheres to the prompt.[29] More experimentative
use cases may opt for a lower scale value, while use cases aiming for more specific outputs may use a
higher value.[34]
Additional text2img features are
provided by front-end
implementations of Stable
Diffusion, which allow users to
modify the weight given to specific
parts of the text prompt. Emphasis
markers allow users to add or reduce
emphasis to keywords by enclosing
them with brackets.[52] An
alternative method of adjusting
weight to parts of the prompt are
"negative prompts". Negative
prompts are a feature included in
some front-end implementations,
including Stability AI's own Demonstration of the effect of negative prompts on image generation
DreamStudio cloud service, and
Top: no negative prompt
allow the user to specify prompts
Centre: "green trees"
which the model should avoid
during image generation. The Bottom: "round stones, round rocks"
specified prompts may be
undesirable image features that
would otherwise be present within image outputs due to the positive prompts provided by the user, or due
to how the model was originally trained, with mangled human hands being a common example.[50][53]

Image modification
Stable Diffusion also includes
another sampling script,
"img2img", which consumes a
text prompt, path to an existing
image, and strength value
between 0.0 and 1.0. The script
outputs a new image based on
the original image that also
features elements provided
within the text prompt. The
Demonstration of img2img modification
strength value denotes the
amount of noise added to the Left: Original image created with Stable Diffusion 1.5
output image. A higher strength Right: Modified image created with Stable Diffusion XL 1.0
value produces more variation
within the image but may
produce an image that is not semantically consistent with the prompt provided.[8]

There are different methods for performing img2img. The main method is SDEdit,[54] which first adds
noise to an image, then denoises it as usual in text2img.
The ability of img2img to add noise to the original image makes it potentially useful for data
anonymization and data augmentation, in which the visual features of image data are changed and
anonymized.[55] The same process may also be useful for image upscaling, in which the resolution of an
image is increased, with more detail potentially being added to the image.[55] Additionally, Stable
Diffusion has been experimented with as a tool for image compression. Compared to JPEG and WebP, the
recent methods used for image compression in Stable Diffusion face limitations in preserving small text
and faces.[56]

Additional use-cases for image modification via img2img are offered by numerous front-end
implementations of the Stable Diffusion model. Inpainting involves selectively modifying a portion of an
existing image delineated by a user-provided layer mask, which fills the masked space with newly
generated content based on the provided prompt.[50] A dedicated model specifically fine-tuned for
inpainting use-cases was created by Stability AI alongside the release of Stable Diffusion 2.0.[35]
Conversely, outpainting extends an image beyond its original dimensions, filling the previously empty
space with content generated based on the provided prompt.[50]

A depth-guided model, named "depth2img", was introduced with the release of Stable Diffusion 2.0 on
November 24, 2022; this model infers the depth of the provided input image, and generates a new output
image based on both the text prompt and the depth information, which allows the coherence and depth of
the original input image to be maintained in the generated output.[35]

ControlNet
ControlNet[57] is a neural network architecture designed to manage diffusion models by incorporating
additional conditions. It duplicates the weights of neural network blocks into a "locked" copy and a
"trainable" copy. The "trainable" copy learns the desired condition, while the "locked" copy preserves the
original model. This approach ensures that training with small datasets of image pairs does not
compromise the integrity of production-ready diffusion models. The "zero convolution" is a 1×1
convolution with both weight and bias initialized to zero. Before training, all zero convolutions produce
zero output, preventing any distortion caused by ControlNet. No layer is trained from scratch; the process
is still fine-tuning, keeping the original model secure. This method enables training on small-scale or
even personal devices.

User Interfaces
Stability provides an online image generation service called DreamStudio.[58][59] The company also
released an open source version of DreamStudio called StableStudio.[60][61] In addition to Stability's
interfaces, many third party open source interfaces exist, such as AUTOMATIC1111 Stable Diffusion
Web UI, which is the most popular and offers extra features,[62] Fooocus, which aims to decrease the
amount of prompting needed by the user,[63] and ComfyUI, which has a node-based user interface,
essentially a visual programming language akin to many 3D modeling applications.[64][65][66]

Releases

Version
Release date Parameters Notes
number
1.1, 1.2, 1.3, All released by CompVis. There is no "version 1.0". 1.1
August 2022
1.4[67] gave rise to 1.2, and 1.2 gave rise to both 1.3 and 1.4.[68]
Initialized with the weights of 1.2, not 1.4. Released by
1.5[69] October 2022 983M
RunwayML.

2.0[70] November 2022 Retrained from scratch on a filtered dataset.[71]

2.1[72] December 2022 Initialized with the weights of 2.0.

The XL 1.0 base model has 3.5 billion parameters, making

XL 1.0[73][20] July 2023 3.5B
it around 3.5x larger than previous versions.[74]

XL Turbo[75] November 2023 Distilled from XL 1.0 to run in fewer diffusion steps.[76]
February 2024
3.0[77][21] (early preview)
800M to 8B A family of models.

A family of models with Large (8 billion parameters), Large

[78] October 2024 2.5B to 8B Turbo (distilled from SD 3.5 Large), and Medium (2.5 billion
3.5
parameters).

Key papers

Learning Transferable Visual Models From Natural Language Supervision (2021).[79] This
paper describes the CLIP method for training text encoders, which convert text into floating
point vectors. Such text encodings are used by the diffusion model to create images.
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
(2021).[54] This paper describes SDEdit, aka "img2img".
High-Resolution Image Synthesis with Latent Diffusion Models (2021, updated in 2022).[80]
This paper describes the latent diffusion model (LDM). This is the backbone of the Stable
Diffusion architecture.
Classifier-Free Diffusion Guidance (2022).[29] This paper describes CFG, which allows the
text encoding vector to steer the diffusion model towards creating the image described by
the text.
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis (2023).[20]
Describes SDXL.
Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow
(2022).[22][23] Describes rectified flow, which is used for the backbone architecture of SD
3.0.
Scaling Rectified Flow Transformers for High-resolution Image Synthesis (2024).[21]
Describes SD 3.0.
Training cost
SD 2.0: 0.2 million hours on A100 (40GB).[70]
Stable Diffusion 3.5 Large was made available for enterprise usage on Amazon Bedrock of Amazon Web
Services.[81]

Usage and controversy

Stable Diffusion claims no rights on generated images and freely gives users the rights of usage to any
generated images from the model provided that the image content is not illegal or harmful to
individuals.[82]

The images Stable Diffusion was trained on have been filtered without human input, leading to some
harmful images and large amounts of private and sensitive information appearing in the training data.[26]

More traditional visual artists have expressed concern that widespread usage of image synthesis software
such as Stable Diffusion may eventually lead to human artists, along with photographers, models,
cinematographers, and actors, gradually losing commercial viability against AI-based competitors.[83]

Stable Diffusion is notably more permissive in the types of content users may generate, such as violent or
sexually explicit imagery, in comparison to other commercial products based on generative AI.[84]
Addressing the concerns that the model may be used for abusive purposes, CEO of Stability AI, Emad
Mostaque, argues that "[it is] peoples' responsibility as to whether they are ethical, moral, and legal in
how they operate this technology",[10] and that putting the capabilities of Stable Diffusion into the hands
of the public would result in the technology providing a net benefit, in spite of the potential negative
consequences.[10] In addition, Mostaque argues that the intention behind the open availability of Stable
Diffusion is to end corporate control and dominance over such technologies, who have previously only
developed closed AI systems for image synthesis.[10][84] This is reflected by the fact that any restrictions
Stability AI places on the content that users may generate can easily be bypassed due to the availability of
the source code.[85]

Controversy around photorealistic sexualized depictions of underage characters have been brought up,
due to such images generated by Stable Diffusion being shared on websites such as Pixiv.[86]

In June of 2024, a hack on an extension of ComfyUI, a user interface for Stable Diffusion, took place,
with the hackers claiming they targeted users who committed "one of our sins", which included AI-art
generation, art theft, promoting cryptocurrency.[87]

Litigation

Andersen, McKernan, and Ortiz v. Stability AI, Midjourney, and DeviantArt

In January 2023, three artists, Sarah Andersen, Kelly McKernan, and Karla Ortiz, filed a copyright
infringement lawsuit against Stability AI, Midjourney, and DeviantArt, claiming that these companies
have infringed the rights of millions of artists by training AI tools on five billion images scraped from the
web without the consent of the original artists.[88]
In July 2023, U.S. District Judge William Orrick inclined to dismiss most of the lawsuit filed by
Andersen, McKernan, and Ortiz but allowed them to file a new complaint, providing them an opportunity
to reframe their arguments.[89]

Getty Images v. Stability AI

In January 2023, Getty Images initiated legal proceedings against Stability AI in the English High Court,
alleging significant infringement of its intellectual property rights. Getty Images claims that Stability AI
"scraped" millions of images from Getty’s websites without consent and used these images to train and
develop its deep-learning Stable Diffusion model.[90][91]

Key points of the lawsuit include:

Getty Images asserting that the training and development of Stable Diffusion involved the
unauthorized use of its images, which were downloaded on servers and computers that
were potentially in the UK. However, Stability AI argues that all training and development
took place outside the UK, specifically in U.S. data centers operated by Amazon Web
Services.[92]
Stability AI applied for reverse summary judgment and/or strike out of two claims: the
training and development claim, and the secondary infringement of copyright claim. The
High Court, however, refused to strike out these claims, allowing them to proceed to trial.
The court is to determine whether the training and development of Stable Diffusion occurred
in the UK, which is crucial for establishing jurisdiction under the UK's Copyright, Designs
and Patents Act 1988 (CDPA).[93]
The secondary infringement claim revolves around whether the pre-trained Stable Diffusion
software, made available in the UK through platforms like GitHub, HuggingFace, and
DreamStudio, constitutes an "article" under sections 22 and 23 of the CDPA. The court will
decide whether the term "article" can encompass intangible items such as software.[93]
The trial is expected to take place in summer 2025 and has significant implications for UK copyright law
and the licensing of AI-generated content.

License
Unlike models like DALL-E, Stable Diffusion makes its source code available,[94][8] along with the
model (pretrained weights). Prior to Stable Diffusion 3, it applied the Creative ML OpenRAIL-M license,
a form of Responsible AI License (RAIL), to the model (M).[95] The license prohibits certain use cases,
including crime, libel, harassment, doxing, "exploiting ... minors", giving medical advice, automatically
creating legal obligations, producing legal evidence, and "discriminating against or harming individuals
or groups based on ... social behavior or ... personal or personality characteristics ... [or] legally protected
characteristics or categories".[96][97] The user owns the rights to their generated output images, and is free
to use them commercially.[98]
Stable Diffusion 3.5 applies the permissive Stability AI Community License while commercial
enterprises with revenue exceed $1 million need the Stability AI Enterprise License.[99] As with the
OpenRAIL-M license, the user retains the rights to their generated output images and is free to use them
commercially.[78]

See also
Artificial intelligence art
Runway
Midjourney
Craiyon
Hugging Face
Imagen (Google Brain)

References
1. "Stable Diffusion 3.5" (https://stability.ai/news/introducing-stable-diffusion-3-5). Stability AI.
Archived (https://archive.today/20241023040750/https://stability.ai/news/introducing-stable-d
iffusion-3-5) from the original on October 23, 2024. Retrieved October 23, 2024.
2. Ryan O'Connor (August 23, 2022). "How to Run Stable Diffusion Locally to Generate
Images" (https://www.assemblyai.com/blog/how-to-run-stable-diffusion-locally-to-generate-i
mages/). Archived (https://web.archive.org/web/20231013123717/https://www.assemblyai.c
om/blog/how-to-run-stable-diffusion-locally-to-generate-images/) from the original on
October 13, 2023. Retrieved May 4, 2023.
3. "Diffuse The Rest - a Hugging Face Space by huggingface" (https://huggingface.co/spaces/
huggingface/diffuse-the-rest). huggingface.co. Archived (https://web.archive.org/web/20220
905141431/https://huggingface.co/spaces/huggingface/diffuse-the-rest) from the original on
September 5, 2022. Retrieved September 5, 2022.
4. "Leaked deck raises questions over Stability AI's Series A pitch to investors" (https://sifted.e
u/articles/stability-ai-fundraise-leak). sifted.eu. Archived (https://web.archive.org/web/202306
29201917/https://sifted.eu/articles/stability-ai-fundraise-leak) from the original on June 29,
2023. Retrieved June 20, 2023.
5. "Revolutionizing image generation by AI: Turning text into images" (https://www.lmu.de/en/n
ewsroom/news-overview/news/revolutionizing-image-generation-by-ai-turning-text-into-imag
es.html). www.lmu.de. Archived (https://web.archive.org/web/20220917200820/https://www.l
mu.de/en/newsroom/news-overview/news/revolutionizing-image-generation-by-ai-turning-te
xt-into-images.html) from the original on September 17, 2022. Retrieved June 21, 2023.
6. Mostaque, Emad (November 2, 2022). "Stable Diffusion came from the Machine Vision &
Learning research group (CompVis) @LMU_Muenchen" (https://twitter.com/EMostaque/stat
us/1587844074064822274?lang=en). Twitter. Archived (https://web.archive.org/web/202307
20002303/https://twitter.com/EMostaque/status/1587844074064822274?lang=en) from the
original on July 20, 2023. Retrieved June 22, 2023.
7. "Stable Diffusion Launch Announcement" (https://stability.ai/blog/stable-diffusion-announce
ment). Stability.Ai. Archived (https://web.archive.org/web/20220905105009/https://stability.ai/
blog/stable-diffusion-announcement) from the original on September 5, 2022. Retrieved
September 6, 2022.
8. "Stable Diffusion Repository on GitHub" (https://github.com/CompVis/stable-diffusion).
CompVis - Machine Vision and Learning Research Group, LMU Munich. September 17,
2022. Archived (https://web.archive.org/web/20230118183342/https://github.com/CompVis/s
table-diffusion) from the original on January 18, 2023. Retrieved September 17, 2022.
9. "The new killer app: Creating AI art will absolutely crush your PC" (https://www.pcworld.com/
article/916785/creating-ai-art-local-pc-stable-diffusion.html). PCWorld. Archived (https://web.
archive.org/web/20220831065139/https://www.pcworld.com/article/916785/creating-ai-art-lo
cal-pc-stable-diffusion.html) from the original on August 31, 2022. Retrieved August 31,
2022.
10. Vincent, James (September 15, 2022). "Anyone can use this AI art generator — that's the
risk" (https://www.theverge.com/2022/9/15/23340673/ai-image-generation-stable-diffusion-e
xplained-ethics-copyright-data). The Verge. Archived (https://web.archive.org/web/20230121
153021/https://www.theverge.com/2022/9/15/23340673/ai-image-generation-stable-diffusion
-explained-ethics-copyright-data) from the original on January 21, 2023. Retrieved
September 30, 2022.
11. "CompVis/Latent-diffusion" (https://github.com/CompVis/latent-diffusion). GitHub.
12. "Stable Diffusion 3: Research Paper" (https://stability.ai/news/stable-diffusion-3-research-pa
per).
13. "Home" (https://ommer-lab.com/). Computer Vision & Learning Group. Retrieved
September 5, 2024.
14. Rombach; Blattmann; Lorenz; Esser; Ommer (June 2022). High-Resolution Image
Synthesis with Latent Diffusion Models (https://openaccess.thecvf.com/content/CVPR2022/
papers/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVP
R_2022_paper.pdf) (PDF). International Conference on Computer Vision and Pattern
Recognition (CVPR). New Orleans, LA. pp. 10684–10695. arXiv:2112.10752 (https://arxiv.or
g/abs/2112.10752). Archived (https://web.archive.org/web/20230120163151/https://openacc
ess.thecvf.com/content/CVPR2022/papers/Rombach_High-Resolution_Image_Synthesis_W
ith_Latent_Diffusion_Models_CVPR_2022_paper.pdf) (PDF) from the original on January
20, 2023. Retrieved September 17, 2022.
15. Alammar, Jay. "The Illustrated Stable Diffusion" (https://jalammar.github.io/illustrated-stable-
diffusion/). jalammar.github.io. Archived (https://web.archive.org/web/20221101104342/http
s://jalammar.github.io/illustrated-stable-diffusion/) from the original on November 1, 2022.
Retrieved October 31, 2022.
16. David, Foster. "8. Diffusion Models". Generative Deep Learning (2 ed.). O'Reilly.
17. Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli (March 12,
2015). "Deep Unsupervised Learning using Nonequilibrium Thermodynamics".
arXiv:1503.03585 (https://arxiv.org/abs/1503.03585) [cs.LG (https://arxiv.org/archive/cs.LG)].
18. "Stable diffusion pipelines" (https://huggingface.co/docs/diffusers/v0.5.1/en/api/pipelines/sta
ble_diffusion). huggingface.co. Archived (https://web.archive.org/web/20230625030241/http
s://huggingface.co/docs/diffusers/v0.5.1/en/api/pipelines/stable_diffusion) from the original
on June 25, 2023. Retrieved June 22, 2023.
19. "Text-to-Image Generation with Stable Diffusion and OpenVINO™" (https://docs.openvino.a
i/2023.3/notebooks/225-stable-diffusion-text-to-image-with-output.html). openvino.ai. Intel.
Retrieved February 10, 2024.
20. Podell, Dustin; English, Zion; Lacey, Kyle; Blattmann, Andreas; Dockhorn, Tim; Müller,
Jonas; Penna, Joe; Rombach, Robin (July 4, 2023). "SDXL: Improving Latent Diffusion
Models for High-Resolution Image Synthesis". arXiv:2307.01952 (https://arxiv.org/abs/2307.
01952) [cs.CV (https://arxiv.org/archive/cs.CV)].
21. Esser, Patrick; Kulal, Sumith; Blattmann, Andreas; Entezari, Rahim; Müller, Jonas; Saini,
Harry; Levi, Yam; Lorenz, Dominik; Sauer, Axel (March 5, 2024), Scaling Rectified Flow
Transformers for High-Resolution Image Synthesis, arXiv:2403.03206 (https://arxiv.org/abs/
2403.03206)
22. Liu, Xingchao; Gong, Chengyue; Liu, Qiang (September 7, 2022), Flow Straight and Fast:
Learning to Generate and Transfer Data with Rectified Flow, arXiv:2209.03003 (https://arxiv.
org/abs/2209.03003)
23. "Rectified Flow — Rectified Flow" (https://www.cs.utexas.edu/~lqiang/rectflow/html/intro.htm
l). www.cs.utexas.edu. Retrieved March 6, 2024.
24. Baio, Andy (August 30, 2022). "Exploring 12 Million of the 2.3 Billion Images Used to Train
Stable Diffusion's Image Generator" (https://waxy.org/2022/08/exploring-12-million-of-the-im
ages-used-to-train-stable-diffusions-image-generator/). Waxy.org. Archived (https://web.arch
ive.org/web/20230120124332/https://waxy.org/2022/08/exploring-12-million-of-the-images-u
sed-to-train-stable-diffusions-image-generator/) from the original on January 20, 2023.
Retrieved November 2, 2022.
25. "This artist is dominating AI-generated art. And he's not happy about it" (https://www.technol
ogyreview.com/2022/09/16/1059598/this-artist-is-dominating-ai-generated-art-and-hes-not-h
appy-about-it/). MIT Technology Review. Archived (https://web.archive.org/web/2023011412
5952/https://www.technologyreview.com/2022/09/16/1059598/this-artist-is-dominating-ai-ge
nerated-art-and-hes-not-happy-about-it/) from the original on January 14, 2023. Retrieved
November 2, 2022.
26. Brunner, Katharina; Harlan, Elisa (July 7, 2023). "We Are All Raw Material for AI" (https://inte
raktiv.br.de/ki-trainingsdaten/en/index.html). Bayerischer Rundfunk (BR). Archived (https://w
eb.archive.org/web/20230912092308/https://interaktiv.br.de/ki-trainingsdaten/en/index.html)
from the original on September 12, 2023. Retrieved September 12, 2023.
27. Schuhmann, Christoph (November 2, 2022), CLIP+MLP Aesthetic Score Predictor (https://gi
thub.com/christophschuhmann/improved-aesthetic-predictor), archived (https://web.archive.
org/web/20230608005334/http://github.com/christophschuhmann/improved-aesthetic-predic
tor/) from the original on June 8, 2023, retrieved November 2, 2022
28. "LAION-Aesthetics | LAION" (https://laion.ai/blog/laion-aesthetics). laion.ai. Archived (https://
web.archive.org/web/20220826121216/https://laion.ai/blog/laion-aesthetics/) from the
original on August 26, 2022. Retrieved September 2, 2022.
29. Ho, Jonathan; Salimans, Tim (July 25, 2022). "Classifier-Free Diffusion Guidance".
arXiv:2207.12598 (https://arxiv.org/abs/2207.12598) [cs.LG (https://arxiv.org/archive/cs.LG)].
30. Mostaque, Emad (August 28, 2022). "Cost of construction" (https://twitter.com/emostaque/st
atus/1563870674111832066). Twitter. Archived (https://web.archive.org/web/202209061554
26/https://twitter.com/EMostaque/status/1563870674111832066) from the original on
September 6, 2022. Retrieved September 6, 2022.
31. "CompVis/stable-diffusion-v1-4 · Hugging Face" (https://huggingface.co/CompVis/stable-diff
usion-v1-4). huggingface.co. Archived (https://web.archive.org/web/20230111161920/http
s://huggingface.co/CompVis/stable-diffusion-v1-4) from the original on January 11, 2023.
Retrieved November 2, 2022.
32. Wiggers, Kyle (August 12, 2022). "A startup wants to democratize the tech behind DALL-E
2, consequences be damned" (https://techcrunch.com/2022/08/12/a-startup-wants-to-democ
ratize-the-tech-behind-dall-e-2-consequences-be-damned/). TechCrunch. Archived (https://w
eb.archive.org/web/20230119005503/https://techcrunch.com/2022/08/12/a-startup-wants-to-
democratize-the-tech-behind-dall-e-2-consequences-be-damned/) from the original on
January 19, 2023. Retrieved November 2, 2022.
33. emad_9608 (April 19, 2024). "10m is about right" (https://www.reddit.com/r/StableDiffusion/c
omments/1c870a5/any_estimate_on_how_much_money_they_spent_to/l0dc2ni/).

🧨
r/StableDiffusion. Retrieved April 25, 2024.
34. "Stable Diffusion with Diffusers" (https://huggingface.co/blog/stable_diffusion).
huggingface.co. Archived (https://web.archive.org/web/20230117222142/https://huggingfac
e.co/blog/stable_diffusion) from the original on January 17, 2023. Retrieved October 31,
2022.
35. "Stable Diffusion 2.0 Release" (https://stability.ai/blog/stable-diffusion-v2-release).
stability.ai. Archived (https://web.archive.org/web/20221210062729/https://stability.ai/blog/st
able-diffusion-v2-release) from the original on December 10, 2022.
36. "LAION" (https://laion.ai/). laion.ai. Archived (https://web.archive.org/web/20231016082902/
https://laion.ai/) from the original on October 16, 2023. Retrieved October 31, 2022.
37. "Generating images with Stable Diffusion" (https://blog.paperspace.com/generating-images-
with-stable-diffusion/). Paperspace Blog. August 24, 2022. Archived (https://web.archive.org/
web/20221031231727/https://blog.paperspace.com/generating-images-with-stable-diffusio
n/) from the original on October 31, 2022. Retrieved October 31, 2022.
38. "Announcing SDXL 1.0" (https://stability.ai/blog/stable-diffusion-sdxl-1-announcement).
Stability AI. Archived (https://web.archive.org/web/20230726215239/https://stability.ai/blog/st
able-diffusion-sdxl-1-announcement) from the original on July 26, 2023. Retrieved
August 21, 2023.
39. Edwards, Benj (July 27, 2023). "Stability AI releases Stable Diffusion XL, its next-gen image
synthesis model" (https://arstechnica.com/information-technology/2023/07/stable-diffusion-xl
-puts-ai-generated-visual-worlds-at-your-gpus-command/). Ars Technica. Archived (https://w
eb.archive.org/web/20230821011216/https://arstechnica.com/information-technology/2023/0
7/stable-diffusion-xl-puts-ai-generated-visual-worlds-at-your-gpus-command/) from the
original on August 21, 2023. Retrieved August 21, 2023.
40. "hakurei/waifu-diffusion · Hugging Face" (https://huggingface.co/hakurei/waifu-diffusion).
huggingface.co. Archived (https://web.archive.org/web/20231008120655/https://huggingfac
e.co/hakurei/waifu-diffusion) from the original on October 8, 2023. Retrieved October 31,
2022.
41. Chambon, Pierre; Bluethgen, Christian; Langlotz, Curtis P.; Chaudhari, Akshay (October 9,
2022). "Adapting Pretrained Vision-Language Foundational Models to Medical Imaging
Domains". arXiv:2210.04133 (https://arxiv.org/abs/2210.04133) [cs.CV (https://arxiv.org/arch
ive/cs.CV)].
42. Seth Forsgren; Hayk Martiros. "Riffusion - Stable diffusion for real-time music generation" (h
ttps://www.riffusion.com/about). Riffusion. Archived (https://web.archive.org/web/202212160
92717/https://www.riffusion.com/about) from the original on December 16, 2022.
43. Mercurio, Anthony (October 31, 2022), Waifu Diffusion (https://github.com/harubaru/waifu-dif
fusion/blob/6bf942eb6368ebf6bcbbb24b6ba8197bda6582a0/docs/en/training/README.m
d), archived (https://web.archive.org/web/20221031234225/https://github.com/harubaru/waif
u-diffusion/blob/6bf942eb6368ebf6bcbbb24b6ba8197bda6582a0/docs/en/training/READM
E.md) from the original on October 31, 2022, retrieved October 31, 2022
44. Smith, Ryan. "NVIDIA Quietly Launches GeForce RTX 3080 12GB: More VRAM, More
Power, More Money" (https://www.anandtech.com/show/17204/nvidia-quietly-launches-gefor
ce-rtx-3080-12gb-more-vram-more-power-more-money). www.anandtech.com. Archived (htt
ps://web.archive.org/web/20230827092451/https://www.anandtech.com/show/17204/nvidia-
quietly-launches-geforce-rtx-3080-12gb-more-vram-more-power-more-money) from the
original on August 27, 2023. Retrieved October 31, 2022.
45. Dave James (October 28, 2022). "I thrashed the RTX 4090 for 8 hours straight training
Stable Diffusion to paint like my uncle Hermann" (https://www.pcgamer.com/nvidia-rtx-4090-
stable-diffusion-training-aharon-kahana/). PC Gamer. Archived (https://web.archive.org/web/
20221109154310/https://www.pcgamer.com/nvidia-rtx-4090-stable-diffusion-training-aharon-
kahana/) from the original on November 9, 2022.
46. Gal, Rinon; Alaluf, Yuval; Atzmon, Yuval; Patashnik, Or; Bermano, Amit H.; Chechik, Gal;
Cohen-Or, Daniel (August 2, 2022). "An Image is Worth One Word: Personalizing Text-to-
Image Generation using Textual Inversion". arXiv:2208.01618 (https://arxiv.org/abs/2208.01
618) [cs.CV (https://arxiv.org/archive/cs.CV)].
47. "NovelAI Improvements on Stable Diffusion" (https://blog.novelai.net/novelai-improvements-
on-stable-diffusion-e10d38db82ac). NovelAI. October 11, 2022. Archived (https://archive.tod
ay/20221027041603/https://blog.novelai.net/novelai-improvements-on-stable-diffusion-e10d
38db82ac) from the original on October 27, 2022.
48. Yuki Yamashita (September 1, 2022). "愛犬の合成画像を生成できるAI 文章で指示するだけ
でコスプレ米Googleが開発" (https://www.itmedia.co.jp/news/articles/2209/01/news041.htm
l). ITmedia Inc. (in Japanese). Archived (https://web.archive.org/web/20220831232021/http
s://www.itmedia.co.jp/news/articles/2209/01/news041.html) from the original on August 31,
2022.
49. Meng, Chenlin; He, Yutong; Song, Yang; Song, Jiaming; Wu, Jiajun; Zhu, Jun-Yan; Ermon,
Stefano (August 2, 2021). "SDEdit: Guided Image Synthesis and Editing with Stochastic
Differential Equations". arXiv:2108.01073 (https://arxiv.org/abs/2108.01073) [cs.CV (https://a
rxiv.org/archive/cs.CV)].
50. "Stable Diffusion web UI" (https://github.com/AUTOMATIC1111/stable-diffusion-webui-featur
e-showcase). GitHub. November 10, 2022. Archived (https://web.archive.org/web/20230120
032734/https://github.com/AUTOMATIC1111/stable-diffusion-webui-feature-showcase) from
the original on January 20, 2023. Retrieved September 27, 2022.
51. invisible-watermark (https://github.com/ShieldMnt/invisible-watermark/blob/9802ce3e0c3a5e
c43b41d503f156717f0c739584/README.md), Shield Mountain, November 2, 2022,
archived (https://web.archive.org/web/20221018062806/https://github.com/ShieldMnt/invisibl
e-watermark/blob/9802ce3e0c3a5ec43b41d503f156717f0c739584/README.md) from the
original on October 18, 2022, retrieved November 2, 2022
52. "stable-diffusion-tools/emphasis at master · JohannesGaessler/stable-diffusion-tools" (http
s://github.com/JohannesGaessler/stable-diffusion-tools). GitHub. Archived (https://web.archi
ve.org/web/20221002081041/https://github.com/JohannesGaessler/stable-diffusion-tools)
from the original on October 2, 2022. Retrieved November 2, 2022.
53. "Stable Diffusion v2.1 and DreamStudio Updates 7-Dec 22" (https://stability.ai/blog/stablediff
usion2-1-release7-dec-2022). stability.ai. Archived (https://web.archive.org/web/2022121006
2732/https://stability.ai/blog/stablediffusion2-1-release7-dec-2022) from the original on
December 10, 2022.
54. Meng, Chenlin; He, Yutong; Song, Yang; Song, Jiaming; Wu, Jiajun; Zhu, Jun-Yan; Ermon,
Stefano (January 4, 2022). "SDEdit: Guided Image Synthesis and Editing with Stochastic
Differential Equations". arXiv:2108.01073 (https://arxiv.org/abs/2108.01073) [cs.CV (https://a
rxiv.org/archive/cs.CV)].
55. Luzi, Lorenzo; Siahkoohi, Ali; Mayer, Paul M.; Casco-Rodriguez, Josue; Baraniuk, Richard
(October 21, 2022). "Boomerang: Local sampling on image manifolds using diffusion
models". arXiv:2210.12100 (https://arxiv.org/abs/2210.12100) [cs.CV (https://arxiv.org/archiv
e/cs.CV)].
56. Bühlmann, Matthias (September 28, 2022). "Stable Diffusion Based Image Compression" (h
ttps://pub.towardsai.net/stable-diffusion-based-image-compresssion-6f1f0a399202).
Medium. Archived (https://web.archive.org/web/20221102231642/https://pub.towardsai.net/s
table-diffusion-based-image-compresssion-6f1f0a399202) from the original on November 2,
2022. Retrieved November 2, 2022.
57. Zhang, Lvmin (February 10, 2023). "Adding Conditional Control to Text-to-Image Diffusion
Models". arXiv:2302.05543 (https://arxiv.org/abs/2302.05543) [cs.CV (https://arxiv.org/archiv
e/cs.CV)].
58. Edwards, Benj (November 10, 2022). "Stable Diffusion in your pocket? "Draw Things" brings
AI images to iPhone" (https://arstechnica.com/information-technology/2022/11/stable-diffusi
on-in-your-pocket-draw-things-brings-ai-images-to-iphone/). Ars Technica. Retrieved
July 10, 2024.
59. Wendling, Mike (March 6, 2024). "AI can be easily used to make fake election photos -
report" (https://www.bbc.com/news/world-us-canada-68471253). bbc.com. Retrieved
July 10, 2024. "The CCDH, a campaign group, tested four of the largest public-facing AI
platforms: Midjourney, OpenAI's ChatGPT Plus, Stability.ai's DreamStudio and Microsoft's
Image Creator."
60. Wiggers, Kyle (May 18, 2023). "Stability AI open sources its AI-powered design studio" (http
s://techcrunch.com/2023/05/18/stability-ai-open-sources-its-ai-powered-design-studio/).
TechCrunch. Retrieved July 10, 2024.
61. Weatherbed, Jess (May 17, 2023). "Stability AI is open-sourcing its DreamStudio web app"
(https://www.theverge.com/2023/5/17/23726751/stability-ai-stablestudio-dreamstudio-stable-
diffusion-artificial-intelligence). The Verge.
62. Mann, Tobias (June 29, 2024). "A friendly guide to local AI image gen with Stable Diffusion
and Automatic1111" (https://www.theregister.com/2024/06/29/image_gen_guide/). The
Register.
63. Hachman, Mak. "Fooocus is the easiest way to create AI art on your PC" (https://www.pcwor
ld.com/article/2253285/fooocus-is-the-easiest-way-to-run-ai-art-on-your-pc.html). PCWorld.
64. "ComfyUI Workflows and what you need to know" (https://learn.thinkdiffusion.com/comfyui-w
orkflows-and-what-you-need-to-know/). thinkdiffusion.com. December 2023. Retrieved
July 10, 2024.
65. "ComfyUI" (https://github.com/comfyanonymous/ComfyUI). github.com. Retrieved July 10,
2024.
66. Huang, Yenkai (May 10, 2024). Latent Auto-recursive Composition Engine (https://digitalcom
mons.dartmouth.edu/cgi/viewcontent.cgi?article=1188&context=masters_theses) (M.S.
Computer Science thesis). Dartmouth College. Retrieved July 10, 2024.
67. "CompVis/stable-diffusion-v1-4 · Hugging Face" (https://huggingface.co/CompVis/stable-diff
usion-v1-4). huggingface.co. Archived (https://web.archive.org/web/20230111161920/http
s://huggingface.co/CompVis/stable-diffusion-v1-4) from the original on January 11, 2023.
Retrieved August 17, 2023.
68. "CompVis (CompVis)" (https://huggingface.co/CompVis). huggingface.co. August 23, 2023.
Retrieved March 6, 2024.
69. "runwayml/stable-diffusion-v1-5 · Hugging Face" (https://huggingface.co/runwayml/stable-dif
fusion-v1-5). huggingface.co. Archived (https://web.archive.org/web/20230921025150/http
s://huggingface.co/runwayml/stable-diffusion-v1-5) from the original on September 21, 2023.
Retrieved August 17, 2023.
70. "stabilityai/stable-diffusion-2 · Hugging Face" (https://huggingface.co/stabilityai/stable-diffusi
on-2). huggingface.co. Archived (https://web.archive.org/web/20230921135247/https://huggi
ngface.co/stabilityai/stable-diffusion-2) from the original on September 21, 2023. Retrieved
August 17, 2023.
71. "stabilityai/stable-diffusion-2-base · Hugging Face" (https://huggingface.co/stabilityai/stable-
diffusion-2-base). huggingface.co. Retrieved January 1, 2024.
72. "stabilityai/stable-diffusion-2-1 · Hugging Face" (https://huggingface.co/stabilityai/stable-diffu
sion-2-1). huggingface.co. Archived (https://web.archive.org/web/20230921025146/https://h
uggingface.co/stabilityai/stable-diffusion-2-1) from the original on September 21, 2023.
Retrieved August 17, 2023.
73. "stabilityai/stable-diffusion-xl-base-1.0 · Hugging Face" (https://huggingface.co/stabilityai/sta
ble-diffusion-xl-base-1.0). huggingface.co. Archived (https://web.archive.org/web/202310080
71719/https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) from the original on
October 8, 2023. Retrieved August 17, 2023.
74. "Announcing SDXL 1.0" (https://stability.ai/news/stable-diffusion-sdxl-1-announcement).
Stability AI. Retrieved January 1, 2024.
75. "stabilityai/sdxl-turbo · Hugging Face" (https://huggingface.co/stabilityai/sdxl-turbo).
huggingface.co. Retrieved January 1, 2024.
76. "Adversarial Diffusion Distillation" (https://stability.ai/research/adversarial-diffusion-distillatio
n). Stability AI. Retrieved January 1, 2024.
77. "Stable Diffusion 3" (https://stability.ai/news/stable-diffusion-3). Stability AI. Retrieved
March 5, 2024.
78. "Stable Diffusion 3.5" (https://stability.ai/news/introducing-stable-diffusion-3-5). Stability AI.
Archived (https://archive.today/20241023040750/https://stability.ai/news/introducing-stable-d
iffusion-3-5) from the original on October 23, 2024. Retrieved October 23, 2024.
79. Radford, Alec; Kim, Jong Wook; Hallacy, Chris; Ramesh, Aditya; Goh, Gabriel; Agarwal,
Sandhini; Sastry, Girish; Askell, Amanda; Mishkin, Pamela (February 26, 2021). "Learning
Transferable Visual Models From Natural Language Supervision". arXiv:2103.00020 (https://
arxiv.org/abs/2103.00020) [cs.CV (https://arxiv.org/archive/cs.CV)].
80. Rombach, Robin; Blattmann, Andreas; Lorenz, Dominik; Esser, Patrick; Ommer, Björn
(2022). "High-Resolution Image Synthesis With Latent Diffusion Models" (https://openacces
s.thecvf.com/content/CVPR2022/html/Rombach_High-Resolution_Image_Synthesis_With_L
atent_Diffusion_Models_CVPR_2022_paper.html). Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition (CVPR). pp. 10684–10695.
arXiv:2112.10752 (https://arxiv.org/abs/2112.10752).
81. Kerner, Sean Michael (December 19, 2024). "Stable Diffusion 3.5 hits Amazon Bedrock:
What it means for enterprise AI workflows" (https://venturebeat.com/ai/stable-diffusion-3-5-hi
ts-amazon-bedrock-what-it-means-for-enterprise-ai-workflows/). VentureBeat. Retrieved
December 25, 2024.
82. "LICENSE.md · stabilityai/stable-diffusion-xl-base-1.0 at main" (https://huggingface.co/stabili
tyai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md). huggingface.co. July 26, 2023.
Retrieved January 1, 2024.
83. Heikkilä, Melissa (September 16, 2022). "This artist is dominating AI-generated art. And he's
not happy about it" (https://www.technologyreview.com/2022/09/16/1059598/this-artist-is-do
minating-ai-generated-art-and-hes-not-happy-about-it/). MIT Technology Review. Archived
(https://web.archive.org/web/20230114125952/https://www.technologyreview.com/2022/09/1
6/1059598/this-artist-is-dominating-ai-generated-art-and-hes-not-happy-about-it/) from the
original on January 14, 2023. Retrieved September 26, 2022.
84. Ryo Shimizu (August 26, 2022). "Midjourneyを超えた？無料の作画AI｢ #StableDiffusion ｣が
｢AIを民主化した｣と断言できる理由" (https://www.businessinsider.jp/post-258369).
Business Insider Japan (in Japanese). Archived (https://web.archive.org/web/202212101924
53/https://www.businessinsider.jp/post-258369) from the original on December 10, 2022.
Retrieved October 4, 2022.
85. Cai, Kenrick. "Startup Behind AI Image Generator Stable Diffusion Is In Talks To Raise At A
Valuation Up To $1 Billion" (https://www.forbes.com/sites/kenrickcai/2022/09/07/stability-ai-fu
nding-round-1-billion-valuation-stable-diffusion-text-to-image/). Forbes. Archived (https://we
b.archive.org/web/20230930125226/https://www.forbes.com/sites/kenrickcai/2022/09/07/sta
bility-ai-funding-round-1-billion-valuation-stable-diffusion-text-to-image/) from the original on
September 30, 2023. Retrieved October 31, 2022.
86. "Illegal trade in AI child sex abuse images exposed" (https://www.bbc.com/news/uk-6593237
2). BBC News. June 27, 2023. Archived (https://web.archive.org/web/20230921100213/http
s://www.bbc.com/news/uk-65932372) from the original on September 21, 2023. Retrieved
September 26, 2023.
87. Maiberg, Emanuel (June 11, 2024). "Hackers Target AI Users With Malicious Stable
Diffusion Tool on GitHub to Protest 'Art Theft' " (https://www.404media.co/hackers-target-ai-u
sers-with-malicious-stable-diffusion-tool-on-github/). 404 Media. Retrieved June 14, 2024.
88. Vincent, James (January 16, 2023). "AI art tools Stable Diffusion and Midjourney targeted
with copyright lawsuit" (https://www.theverge.com/2023/1/16/23557098/generative-ai-art-cop
yright-legal-lawsuit-stable-diffusion-midjourney-deviantart). The Verge. Archived (https://we
b.archive.org/web/20230309010528/https://www.theverge.com/2023/1/16/23557098/generat
ive-ai-art-copyright-legal-lawsuit-stable-diffusion-midjourney-deviantart) from the original on
March 9, 2023. Retrieved January 16, 2023.
89. Brittain, Blake (July 19, 2023). "US judge finds flaws in artists' lawsuit against AI companies"
(https://www.reuters.com/legal/litigation/us-judge-finds-flaws-artists-lawsuit-against-ai-comp
anies-2023-07-19/). Reuters. Archived (https://web.archive.org/web/20230906193839/http
s://www.reuters.com/legal/litigation/us-judge-finds-flaws-artists-lawsuit-against-ai-companies
-2023-07-19/) from the original on September 6, 2023. Retrieved August 6, 2023.
90. Goosens, Sophia (February 28, 2024). "Getty Images v Stability AI: the implications for UK
copyright law and licensing" (https://www.pinsentmasons.com/out-law/analysis/getty-images
-v-stability-ai-implications-copyright-law-licensing).
91. Gill, Dennis (December 11, 2023). "Getty Images v Stability AI: copyright claims can
proceed to trial" (https://www.pinsentmasons.com/out-law/news/getty-images-v-stability-ai).
92. Goosens, Sophia (February 28, 2024). "Getty v. Stability AI case goes to trial in the UK –
what we learned" (https://www.reedsmith.com/en/perspectives/2024/02/getty-v-stability-ai-ca
se-goes-to-trial-in-the-uk-what-we-learned).
93. Hill, Charlotte (February 16, 2024). "Generative AI in the courts: Getty Images v Stability AI"
(https://www.penningtonslaw.com/news-publications/latest-news/2024/generative-ai-in-the-c
ourts-getty-images-v-stability-ai).
94. "Stable Diffusion Public Release" (https://stability.ai/blog/stable-diffusion-public-release).
Stability.Ai. Archived (https://web.archive.org/web/20220830210535/https://stability.ai/blog/st
able-diffusion-public-release) from the original on August 30, 2022. Retrieved August 31,
2022.
95. "From RAIL to Open RAIL: Topologies of RAIL Licenses" (https://www.licenses.ai/blog/2022/
8/18/naming-convention-of-responsible-ai-licenses). Responsible AI Licenses (RAIL).
August 18, 2022. Archived (https://web.archive.org/web/20230727145215/https://www.licens
es.ai/blog/2022/8/18/naming-convention-of-responsible-ai-licenses) from the original on July
27, 2023. Retrieved February 20, 2023.
96. "Ready or not, mass video deepfakes are coming" (https://www.washingtonpost.com/technol
ogy/2022/08/30/deep-fake-video-on-agt/). The Washington Post. August 30, 2022. Archived
(https://web.archive.org/web/20220831115010/https://www.washingtonpost.com/technology/
2022/08/30/deep-fake-video-on-agt/) from the original on August 31, 2022. Retrieved
August 31, 2022.
97. "License - a Hugging Face Space by CompVis" (https://huggingface.co/spaces/CompVis/sta
ble-diffusion-license). huggingface.co. Archived (https://web.archive.org/web/202209042156
16/https://huggingface.co/spaces/CompVis/stable-diffusion-license) from the original on
September 4, 2022. Retrieved September 5, 2022.
98. Katsuo Ishida (August 26, 2022). "言葉で指示した画像を凄いAIが描き出す「Stable
Diffusion」～画像は商用利用も可能" (https://forest.watch.impress.co.jp/docs/review/14348
93.html). Impress Corporation (in Japanese). Archived (https://web.archive.org/web/202211
14020520/https://forest.watch.impress.co.jp/docs/review/1434893.html) from the original on
November 14, 2022. Retrieved October 4, 2022.
99. "Community License" (https://stability.ai/news/license-update). Stability AI. July 5, 2024.
Retrieved October 23, 2024.

External links
Stable Diffusion Demo (https://huggingface.co/spaces/stabilityai/stable-diffusion)
"Step by Step visual introduction to Diffusion Models. - Blog by Kemal Erdem" (https://erde
m.pl/2023/11/step-by-step-visual-introduction-to-diffusion-models/). Retrieved August 31,
2024.
"U-Net for Stable Diffusion" (https://nn.labml.ai/diffusion/stable_diffusion/model/unet.html).
U-Net for Stable Diffusion. Retrieved August 31, 2024.
Interactive Explanation of Stable Diffusion (https://poloclub.github.io/diffusion-explainer/)
"We Are All Raw Material for AI" (https://interaktiv.br.de/ki-trainingsdaten/en/index.html):
Investigation on sensitive and private data in Stable Diffusions training data
"Negative Prompts in Stable Diffusion (https://talkdigital.com.au/ai/stable-diffusion-negative-
prompt-list/)"
"Negative Prompts in Stable Diffusion (https://infoofai.com/negative-prompts-in-stable-diffusi
on//)"

Retrieved from "https://en.wikipedia.org/w/index.php?title=Stable_Diffusion&oldid=1271108917"

Limitless Expanded Workbook
100% (1)
Limitless Expanded Workbook
101 pages
Foundations of AI
No ratings yet
Foundations of AI
9 pages
SDXL Diffusion Model Training - Style & Objects
No ratings yet
SDXL Diffusion Model Training - Style & Objects
49 pages
Understanding Stable Diffusion
100% (1)
Understanding Stable Diffusion
66 pages
Diffusion
100% (6)
Diffusion
62 pages
Lecture 24 26
No ratings yet
Lecture 24 26
123 pages
2nd Quarter Exam - Attempt Review
No ratings yet
2nd Quarter Exam - Attempt Review
6 pages
Comparative Analysis of RAG Fine-Tuning and Prompt Engineering in Chatbot Development
No ratings yet
Comparative Analysis of RAG Fine-Tuning and Prompt Engineering in Chatbot Development
4 pages
StableDiffusion Presentation
No ratings yet
StableDiffusion Presentation
27 pages
Lec16 DiffusionModels
No ratings yet
Lec16 DiffusionModels
57 pages
Cbys fq1 Cbys fq1
No ratings yet
Cbys fq1 Cbys fq1
10 pages
Empowering Local Image Generation: Harnessing Stable Diffusion For Machine Learning and AI
No ratings yet
Empowering Local Image Generation: Harnessing Stable Diffusion For Machine Learning and AI
3 pages
Stable Diffusion
No ratings yet
Stable Diffusion
19 pages
Iso Iec 42001 Lead Auditor Elearning
No ratings yet
Iso Iec 42001 Lead Auditor Elearning
10 pages
PARRY
No ratings yet
PARRY
2 pages
How Does Stable Diffusion Work
No ratings yet
How Does Stable Diffusion Work
79 pages
Exploring Multi Faceted Image Processing Generation Adversarial Attacks and Denoising
No ratings yet
Exploring Multi Faceted Image Processing Generation Adversarial Attacks and Denoising
38 pages
Stable Diffusion
No ratings yet
Stable Diffusion
5 pages
Presentation 1 - Introdution To Computer
No ratings yet
Presentation 1 - Introdution To Computer
72 pages
DT Unit 2 Lecture Notes
No ratings yet
DT Unit 2 Lecture Notes
18 pages
2-Mathematical Optimization and Deep Learning
No ratings yet
2-Mathematical Optimization and Deep Learning
53 pages
Advancing AI-Powered Medical Image Synthesis - Insights From MedVQA-GI Challenge Using CLIP, Fine-Tuned Stable Diffusion, and Dream-Booth + LoRA
No ratings yet
Advancing AI-Powered Medical Image Synthesis - Insights From MedVQA-GI Challenge Using CLIP, Fine-Tuned Stable Diffusion, and Dream-Booth + LoRA
19 pages
Generative AI in Business Consulting Analyzing Its Impact On Client Engagement and Service Delivery Models
No ratings yet
Generative AI in Business Consulting Analyzing Its Impact On Client Engagement and Service Delivery Models
8 pages
Text To Image Survey
No ratings yet
Text To Image Survey
40 pages
Thesis 11 51
No ratings yet
Thesis 11 51
41 pages
(Nsdi24) Nirvana
No ratings yet
(Nsdi24) Nirvana
18 pages
Mycin
No ratings yet
Mycin
5 pages
Image Classification and Generation of Images
No ratings yet
Image Classification and Generation of Images
21 pages
Generative AI in Vision: A Survey On Models, Metrics and Applications
No ratings yet
Generative AI in Vision: A Survey On Models, Metrics and Applications
12 pages
Paper 10
No ratings yet
Paper 10
8 pages
The Serfdom of Crowds
No ratings yet
The Serfdom of Crowds
5 pages
Robotics: Lecture 1: Introduction To Robotics
No ratings yet
Robotics: Lecture 1: Introduction To Robotics
44 pages
Diffusion Models in Vision: A Survey: Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, and Mubarak Shah
No ratings yet
Diffusion Models in Vision: A Survey: Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, and Mubarak Shah
25 pages
Stable Diffusion Clearly Explained! - by Steins - Medium
No ratings yet
Stable Diffusion Clearly Explained! - by Steins - Medium
13 pages
SVDiff Compact Parameter Space For Diffusion Fine-Tuning
No ratings yet
SVDiff Compact Parameter Space For Diffusion Fine-Tuning
12 pages
Module 5
No ratings yet
Module 5
23 pages
L10-DL Intro
No ratings yet
L10-DL Intro
25 pages
Module 5
No ratings yet
Module 5
23 pages
Interactive Visual Learning For Stable Diffusion
No ratings yet
Interactive Visual Learning For Stable Diffusion
4 pages
1-Effective Data Augmentation With Diffusion Models
No ratings yet
1-Effective Data Augmentation With Diffusion Models
23 pages
Machine Learning for Engineers
100% (1)
Machine Learning for Engineers
80 pages
Fine Tuning Text-To-Image Diffusion Models For Correcting Anomalous Images
No ratings yet
Fine Tuning Text-To-Image Diffusion Models For Correcting Anomalous Images
7 pages
Multi-Modal AI Models: Vision & Language
No ratings yet
Multi-Modal AI Models: Vision & Language
26 pages
IEEE Editable
No ratings yet
IEEE Editable
8 pages
Stable Diffusion - Wikipedia
No ratings yet
Stable Diffusion - Wikipedia
1 page
Diffusion-StableDiffusion
No ratings yet
Diffusion-StableDiffusion
27 pages
DeepAI Text
No ratings yet
DeepAI Text
6 pages
Slides Curso Generacion Imagenes Ai
No ratings yet
Slides Curso Generacion Imagenes Ai
180 pages
Hollein ViewDiff 3D-Consistent Image Generation With Text-to-Image Models CVPR 2024 Paper
No ratings yet
Hollein ViewDiff 3D-Consistent Image Generation With Text-to-Image Models CVPR 2024 Paper
10 pages
18 Deeprl
No ratings yet
18 Deeprl
19 pages
Deep Learning Models Overview
No ratings yet
Deep Learning Models Overview
66 pages
Lecture 21
No ratings yet
Lecture 21
10 pages
Autodecoding Latent 3D Diffusion Models
No ratings yet
Autodecoding Latent 3D Diffusion Models
22 pages
Diffusion Processes 31 Picked Subjects94412
No ratings yet
Diffusion Processes 31 Picked Subjects94412
9 pages
Background and Literature Review
No ratings yet
Background and Literature Review
17 pages
Deep Learning Akash
No ratings yet
Deep Learning Akash
12 pages
Three Things We Need To Know About Transferring Stable Diffusion To Visual Dense Prediction Tasks
No ratings yet
Three Things We Need To Know About Transferring Stable Diffusion To Visual Dense Prediction Tasks
18 pages
Stable Diffusion With Generative Ai
No ratings yet
Stable Diffusion With Generative Ai
3 pages
V2 - Gen AI
No ratings yet
V2 - Gen AI
1 page
Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models For 3D Generation
No ratings yet
Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models For 3D Generation
16 pages
Base Paper Batch 9 Final Updated 3
No ratings yet
Base Paper Batch 9 Final Updated 3
10 pages
Background and Literature Review
No ratings yet
Background and Literature Review
7 pages
Efficient Diffusion Models For Vision A Survey
No ratings yet
Efficient Diffusion Models For Vision A Survey
16 pages
Diffusion Models in Vision Survey
No ratings yet
Diffusion Models in Vision Survey
20 pages
Dimba Transformer-Mamba Diffusion Models
No ratings yet
Dimba Transformer-Mamba Diffusion Models
15 pages
Motion Tracking
No ratings yet
Motion Tracking
49 pages
Screenshot 2023-07-29 at 10.26.11 AM
No ratings yet
Screenshot 2023-07-29 at 10.26.11 AM
1 page
Understanding Stable Diffusion
No ratings yet
Understanding Stable Diffusion
13 pages
PLC 631-Session 1
No ratings yet
PLC 631-Session 1
25 pages
AI Chapter 5
No ratings yet
AI Chapter 5
31 pages
GPU-Optimized On-Device Diffusion Models
No ratings yet
GPU-Optimized On-Device Diffusion Models
5 pages
3D-Consistent Image Generation Method
No ratings yet
3D-Consistent Image Generation Method
22 pages
Diffusion Models in Vision: A Survey: IEEE Transactions On Pattern Analysis and Machine Intelligence March 2023
No ratings yet
Diffusion Models in Vision: A Survey: IEEE Transactions On Pattern Analysis and Machine Intelligence March 2023
26 pages
You Only Sample Once: Taming One-Step Text-To-Image Synthesis by Self-Cooperative Diffusion GANs
No ratings yet
You Only Sample Once: Taming One-Step Text-To-Image Synthesis by Self-Cooperative Diffusion GANs
14 pages
Stable Diffusion For Image Generation
No ratings yet
Stable Diffusion For Image Generation
23 pages
Report: Trends in Generative Models
No ratings yet
Report: Trends in Generative Models
10 pages
Tech & Innovation for Business Growth
No ratings yet
Tech & Innovation for Business Growth
97 pages
AI in UK 1690306074
No ratings yet
AI in UK 1690306074
58 pages
Beginner's Guide to Generative Art
No ratings yet
Beginner's Guide to Generative Art
14 pages
B 5 - HubSpot and Motion AI Chatbot Enabled CRM
No ratings yet
B 5 - HubSpot and Motion AI Chatbot Enabled CRM
2 pages
Data Extracted For XAI in CyberSec Paper
No ratings yet
Data Extracted For XAI in CyberSec Paper
354 pages
Free HAL
No ratings yet
Free HAL
2 pages
Chatbot Evolution: Jabberwacky to Cleverbot
No ratings yet
Chatbot Evolution: Jabberwacky to Cleverbot
2 pages
Data Applied
No ratings yet
Data Applied
1 page
Philosophical Debate on AI Minds
No ratings yet
Philosophical Debate on AI Minds
28 pages
Dynamic Planning With A LLM
No ratings yet
Dynamic Planning With A LLM
9 pages
CBSE English 1 1 1 Question Paper With Solution
No ratings yet
CBSE English 1 1 1 Question Paper With Solution
44 pages
Shield Intro For Deloitte UK
No ratings yet
Shield Intro For Deloitte UK
19 pages
Stockfish (Chess)
No ratings yet
Stockfish (Chess)
28 pages
q3 KPMG Insurance Top of Mind
No ratings yet
q3 KPMG Insurance Top of Mind
4 pages
List of Artificial Intelligence Projects
No ratings yet
List of Artificial Intelligence Projects
12 pages
Comparison of Deep Learning Software - Wikipedia
No ratings yet
Comparison of Deep Learning Software - Wikipedia
4 pages
AIML: A Guide for AI Developers
No ratings yet
AIML: A Guide for AI Developers
4 pages
Minds DB
No ratings yet
Minds DB
4 pages
Synthetic Environment For Analysis and Simulations
No ratings yet
Synthetic Environment For Analysis and Simulations
3 pages
CMU Sphinx
No ratings yet
CMU Sphinx
3 pages
Speech Emotion Recognition Insights
No ratings yet
Speech Emotion Recognition Insights
4 pages
Deep Learning Course Overview
No ratings yet
Deep Learning Course Overview
4 pages
Ethical and Psychological Implications of Human
No ratings yet
Ethical and Psychological Implications of Human
1 page
(Ce351 Ai4e) A-01 (A, B)
No ratings yet
(Ce351 Ai4e) A-01 (A, B)
2 pages