Talk:Stable Diffusion
Not Open Source

The license has usage restrictions, and therefore does not meet the Open Source Definition (OSD):

https://opensource.org/faq#restrict
https://stability.ai/blog/stable-diffusion-public-release

Nor is the "Creative ML OpenRAIL-M" license OSI-approved:

https://opensource.org/licenses/alphabetical

It would be correct to refer to it as "source available" or perhaps "ethical source", but it certainly isn't Open Source.

Gladrim (talk) 12:40, 7 September 2022 (UTC)

This is my understanding as well, and I thought about editing this article to reflect this. However, I'm not sure how to do this in a way that is compliant with WP:NOR, as the Stability press release clearly states that the model is open source and I have been unable to find a WP:RS that clearly contradicts that specific claim. The obvious solution is to say "Stability claims it is open source", but even that doesn't seem appropriate given the lack of sourcing saying anything else (after all, the explicit purpose of that language is to cast implicit doubt on the claim). I have a relatively weak understanding of Wikipedia policy and would be more than happy if someone can point to evidence that correcting this claim would be consistent with Wikipedia policy, but at the moment I don't see a way to justify it.
It's also worth noting that the OSI-approved list hasn't been updated since Stable Diffusion came out, and SD is the first model to be released with this license as far as I can tell. Thus the lack of endorsement is not evidence of non-endorsement. Perhaps we could say "Stability claims it is open source, though the OSI has not commented on the novel license" (this is poorly worded, but you get my point).
Stellaathena (talk) 17:41, 7 September 2022 (UTC)
According to the license, which is adapted from Open RAIL-M (Responsible AI Licenses), the 'M' means the usage restrictions apply only to the published model or derivatives of the model, not to the source code.
Open RAIL has various types of licenses available: RAIL-D (use restrictions apply only to the data), RAIL-A (use restrictions apply only to the application/executable), RAIL-M (use restrictions apply only to the model), and RAIL-S (use restrictions apply only to the source code). These can be combined in D-A-M-S order, e.g. RAIL-DAMS, RAIL-MS, RAIL-AM.
The term 'Open' can be prefixed to these licenses to clarify that the license is royalty-free and that the work and subsequent derivative works can be re-licensed 'as long as the Use Restrictions similarly apply to the relicensed artifacts'.
"
Open RAIL Licenses
Does a RAIL License include open-access/free-use terms, akin to what is used with open source software?
If it does, it would be helpful for the community to know upfront that the license promotes free use and re-distribution of the applicable artifact, albeit subject to Use Restrictions. We suggest the use of the prefix "Open" to each RAIL license to clarify, on its face, that the licensor offers the licensed artifact at no charge and allows licensees to re-license such artifact or any subsequent derivative works as they choose, as long as the Use Restrictions similarly apply to the relicensed artifacts and its subsequent derivatives. A RAIL license that does not offer the artifact royalty-free and/or does not permit downstream licensing of the artifact or derivative versions of it in any form would not use the “Open” prefix." source
So technically, the source code is 'Open Source'.
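To illustrate the naming rule concretely (a hypothetical helper of my own, not any official RAIL tool): the artifact letters combine in the fixed D-A-M-S order, with the 'Open' prefix marking the royalty-free, relicensable variants.

  # Hypothetical sketch of the RAIL naming scheme described above.
  def rail_license_name(data=False, app=False, model=False, source=False, open_prefix=False):
      # Letters combine in the fixed D-A-M-S order.
      letters = "".join(c for c, used in zip("DAMS", (data, app, model, source)) if used)
      if not letters:
          raise ValueError("at least one artifact type must be covered")
      # 'Open' marks royalty-free licenses that allow downstream relicensing.
      return ("Open" if open_prefix else "") + "RAIL-" + letters

  print(rail_license_name(model=True, open_prefix=True))  # OpenRAIL-M (the variant used by Stable Diffusion)
  print(rail_license_name(app=True, model=True))          # RAIL-AM
  print(rail_license_name(data=True, app=True, model=True, source=True))  # RAIL-DAMS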
Maybe some useful links:
https://huggingface.co/blog/open_rail
https://www.licenses.ai/ai-licenses
https://www.licenses.ai/blog/2022/8/26/bigscience-open-rail-m-license Dorakuthelekor (talk) 23:04, 17 September 2022 (UTC)
It is definitely not open source, and to describe it that way is misleading. Ichiji (talk) 15:56, 3 October 2022 (UTC)

The Japanese language page has a gallery of various examples that Stable Diffusion can create; perhaps we should do the same to showcase a few examples for people to see. I'd be curious to hear others weigh in. Camdoodlebop (talk) 00:57, 11 September 2022 (UTC)

Image variety

Benlisquare, I appreciate all the work you've done expanding this article, including the addition of images, but I think the article would be improved if we could get a greater variety of subject matter in the examples. To be honest, I think any amount of "cute anime girl with eye-popping cleavage" content has the potential to raise the hackles of readers who are sensitive to the well-known biases of Wikipedia's editorship, so it might be better to avoid that minefield altogether. At the very least though, we should strive for variety.

I was thinking about maybe replacing the inpainting example with figure 12 from the latent diffusion paper, but that's not totally ideal since it's technically not the output of Stable Diffusion itself (but rather a model trained by LMU researchers under very similar conditions, though I think with slightly fewer parameters). Colin M (talk) 21:49, 28 September 2022 (UTC)

My rationale for leaving the examples as-is is threefold:
  1. Firstly, based on my completely anecdotal and non-scientific experimentation generating over 9,500 images (approx. 11 GB) with SD, non-photorealistic images work best with img2img's ability to upscale an image and fill in tiny, closer details without the final result appearing too uncanny to the human eye, which is why I opted to generate a non-photorealistic image of a person for my inpainting/outpainting example (a minimal sketch of this txt2img-then-img2img workflow follows after this list). Sure, we theoretically could leave all our demonstration examples as 512x512 images (akin to how the majority of example images throughout that paper were small squares), but my spicy and highly subjective take on this is: why not strive for better? If we can generate high detail, high resolution images, then we may as well. The technology exists, the software exists, the means to go above and beyond exists. At least, that's how I feel.
  2. Specifically regarding figure 12 from that paper, it makes no mention of whether the original inpainted images were generated through txt2img and then inpainted using img2img, or whether they used img2img to inpaint an existing real-world photograph. If it is the latter, then we'd run into issues concerning Commons:Derivative works. At least with all of the txt2img images that I generate, I can guarantee that there wouldn't be any concern in this area, as long as I don't outright prompt to generate a copyrighted object like the Eiffel Tower or Duke Nukem or something.
  3. Finally, I don't particularly think the systemic bias issue on this page is that severe. Out of the four images currently on this article, we have a photorealistic image of an astronaut, an architectural diagram, and two demonstration images containing artworks featuring non-photorealistic women. From my perspective, I don't think that's at the point of concern. Of course, if you still have concerns in spite of my assurances, give me time and I could generate another 10+ row array of different txt2img prompts featuring a different subject, but it'll definitely take me quite some time to finetune and perfect to a reasonable standard (given the unpredictability of txt2img outputs). As a sidenote, the original 13-row array I generated was over 300 MB with dimensions of 14336 x 26624 pixels, and the filesize limit for uploading to Commons is 100 MB, hence I needed to split the image into four parts.
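A minimal sketch of the txt2img-then-img2img upscaling workflow described in point 1, using the Hugging Face diffusers library; the model ID, prompt, and parameters here are illustrative assumptions, not necessarily the exact tooling or settings used for the article's images:

  import torch
  from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

  model_id = "CompVis/stable-diffusion-v1-4"  # assumed checkpoint; any SD 1.x weights would work

  # Step 1: txt2img generates a base 512x512 image from a text prompt.
  txt2img = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
  base = txt2img("watercolor painting of a lighthouse at dusk").images[0]

  # Step 2: enlarge the base image, then let img2img re-denoise it at low
  # strength so the model fills in finer detail at the higher resolution
  # without discarding the original composition.
  init = base.resize((1024, 1024))
  img2img = StableDiffusionImg2ImgPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
  detailed = img2img(
      prompt="watercolor painting of a lighthouse at dusk, highly detailed",
      image=init,
      strength=0.4,  # low strength preserves layout while adding detail
  ).images[0]
  detailed.save("lighthouse_1024.png")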
Let me know your thoughts, @Colin M. Cheers, --benlisquareTCE 03:08, 29 September 2022 (UTC)
Actually, now that I think about it, would you be keen on a compromise where I generate a fifth image, either containing a landscape, or an object, or a man, to demonstrate how negative prompting works, as a counterbalance to the images already present? The final result would be something like this: Astronaut in the infobox, diagram under "Architecture", the 13-row matrix comparing art styles under "Usage" (right-align), some nature landscape or urban skyline image under "Text to image generation" (left-align), the inpainting/outpainting demonstration under "Inpainting and outpainting" (right-align). I'm open to adjustments if suggested, of course. --benlisquareTCE 03:33, 29 September 2022 (UTC)
Regarding your point 1:
  1. I don't think we're obliged to carefully curate prompts and outputs that give the nicest possible results. We're trying to document the actual capabilities of the model, not advertise it. Seeing the ways that the model fails to generate photorealistic faces, for example, could be very helpful to the reader's understanding.
  2. Even if we accept the reasoning of your point 1, that's merely an argument for showing examples in a non-photorealistic style. But why specifically non-photorealistic images of sexualized young women? Why not cartoonish images of old women, or sharks, or clocktowers, or literally anything else? It's distracting and borderline WP:GRATUITOUS.
Colin M (talk) 04:19, 29 September 2022 (UTC)
Creating the inpainting example took me quite a few hours' worth of trial-and-error, given that for any satisfactory img2img output obtained, one would need to cherry-pick through dozens upon dozens of poor quality images with deformities and physical mutations, so I hope you can understand why I might be a bit hesitant about replacing it. Yes, I'm aware that's not a valid argument for keeping or not keeping something; I'm merely providing my viewpoint. As for WP:GRATUITOUS, I don't think that particularly applies, as the subject looks like any other youthful woman one would find on the street in inner-city Melbourne during office hours, but I can understand the concern that it may reflect poorly on the systemic bias of Wikipedia's editorbase. Hence, my suggested solution to that issue would be to balance it out with more content, since there's always room for prose and image expansion. --benlisquareTCE 06:01, 29 September 2022 (UTC)
I've gone ahead and added the landscape art demonstration for negative prompting to the article. When generating these, this time I've deliberately left in a couple of visual defects (e.g. roof tiles appearing out of nowhere from inside a tree, and strange squiggles appearing on the sides of some columns), because what you mentioned earlier about also showcasing Stable Diffusion's flaws and imperfections does make sense. There are two potential ways we can lay these out, at least with the current amount of text prose we have (which optimistically would increase, one would hope): between this revision and this revision, which would seem preferable? --benlisquareTCE 06:05, 29 September 2022 (UTC)
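For readers unfamiliar with the technique, a minimal sketch of negative prompting via the Hugging Face diffusers library (the prompts are illustrative, not the exact ones used for the article's images):

  from diffusers import StableDiffusionPipeline

  pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to("cuda")
  image = pipe(
      prompt="traditional ink painting of a mountain temple among pine trees",
      # Under classifier-free guidance, the negative prompt replaces the empty
      # unconditional prompt, steering the sampler away from these concepts.
      negative_prompt="blurry, deformed, watermark, text",
  ).images[0]
  image.save("temple.png")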
+1 on avoiding the exploitive images. The history of AI is rife with them; let's not add to that. Ichiji (talk) 15:59, 3 October 2022 (UTC)

Using images to promote unsourced opinions

I've removed two different versions of editors trying to use images to promote unsourced legal opinions and other viewpoints. Please, just use reliable sources that support that these images illustrate these opinions, if those sources exist. You can't just place an image and claim that it illustrates an unsourced opinion. Thanks. Elspea756 (talk) 15:54, 6 October 2022 (UTC)

"But Stable Diffusion’s lack of safeguards compared to systems like DALL-E 2 poses tricky ethical questions for the AI community. Even if the results aren’t perfectly convincing yet, making fake images of public figures opens a large can of worms." - TechCrunch. "And three weeks ago, a start-up named Stable AI released a program called Stable Diffusion. The AI image-generator is an open-source program that, unlike some rivals, places few limits on the images people can create, leading critics to say it can be used for scams, political disinformation and privacy violations." - Washington Post. I don't know what additional convincing you need. As for your edit summary of "Removing unsourced claim that it is a "common concern" that this particular image might mislead people to believe this is an actual photograph of Vladimir Putin", nowhere in the caption was that ever mentioned, that's purely your own personal interpretation that completely misses the mark of what the caption meant. --benlisquareTCE 16:02, 6 October 2022 (UTC)[reply]
Thank you for discussing here on the talk page. I see images of Barack Obama and Boris Johnson included in that TechCrunch article, so those do seem to illustrate the point you are trying to make and are supported by the source you are citing. Can we agree to replace the previously used unsourced image with either that Barack Obama image or the series of Boris Johnson images? Elspea756 (talk) 16:06, 6 October 2022 (UTC)