-
-
Save NetHole/0db8325d343ed77eb9d08813920fda8f to your computer and use it in GitHub Desktop.
Revisions
-
harubaru revised this gist
Oct 9, 2022 . 1 changed file with 1 addition and 1 deletion.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -35,7 +35,7 @@ The finetuning had been conducted with a fork from the original [Stable Diffusio For finetuning, the base model [Stable Diffusion 1.4](https://huggingface.co/CompVis/stable-diffusion-v-1-4-original) had been finetuned on 680k text-image samples for **10 epochs** at a flat learning rate of ``5e-6``. The hardware used was a GPU VM instance which had the following specs: - 8x 48GB A40 GPUs - 24 AMD Epyc Milan vCPU cores - 192GB of RAM - 250GB Storage -
harubaru revised this gist
Oct 8, 2022 . 1 changed file with 3 additions and 3 deletions.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -17,9 +17,9 @@ The **Waifu Diffusion 1.3 model** is a [Stable Diffusion model](https://github.c The data used for finetuning **Waifu Diffusion 1.3** was **680k text-image samples** that had been downloaded through a booru site that provides high-quality tagging and original sources to the artworks themselves that are uploaded to the site. I also want to personally thank them as well, as without their hardwork the generative quality from this model would not have been feasible without going to financially extreme lengths to acquiring the data to use for training. The Booru in question would also like to remain anonymous due to the current climate regarding AI generated imagery. Within the [**HuggingFace Waifu Diffusion 1.3 Repository**](https://huggingface.co/hakurei/waifu-diffusion-v1-3) are 4 final models: - [Float 16 EMA Pruned](https://huggingface.co/hakurei/waifu-diffusion-v1-3/blob/main/wd-v1-3-float16.ckpt): This is the smallest available form for the model at 2GB. This model is to be used for **inference purposes only.** - [Float 32 EMA Pruned](https://huggingface.co/hakurei/waifu-diffusion-v1-3/blob/main/wd-v1-3-float32.ckpt): The float32 weights are the second smallest available form of the model at 4GB. This is to be used for **inference purposes only.** - [Float 32 Full Weights](https://huggingface.co/hakurei/waifu-diffusion-v1-3/blob/main/wd-v1-3-full.ckpt): The full weights contain the EMA weights which are not used during inference. These can be used for either training or inference. - [Float 32 Full Weights + Optimizer Weights](https://huggingface.co/hakurei/waifu-diffusion-v1-3/blob/main/wd-v1-3-full-opt.ckpt): The optimizer weights contain all of the optimizer states used during training. It is 14GB large and there is no quality difference between this model and the others as this model is to be used for **training purposes only.** Various modifications to the data had been made since the **Waifu Diffusion 1.2** model which included: -
harubaru revised this gist
Oct 8, 2022 . 1 changed file with 2 additions and 1 deletion.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -8,6 +8,7 @@ HuggingFace Page for model download: https://huggingface.co/hakurei/waifu-diffus - [Prompting](#prompting) - [License](#license) - [Sample Generations](#sample-generations) - [Team Members and Acknowledgements](#team-members-and-acknowledgements) ## Model Overview @@ -97,7 +98,7 @@ The **Waifu Diffusion 1.3** Weights have been released under the [CreativeML Ope ## Team Members and Acknowledgements This project would not have been possible without the incredible work by the [CompVis Researchers](https://ommer-lab.com/). I would also like to personally thank everyone for their generous support in our Discord server! Thank you guys. - [Anthony Mercurio](https://github.com/harubaru) - [Salt](https://github.com/sALTaccount/) -
harubaru revised this gist
Oct 8, 2022 . 1 changed file with 18 additions and 2 deletions.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -3,7 +3,11 @@ HuggingFace Page for model download: https://huggingface.co/hakurei/waifu-diffusion-v1-3 #### Table of Contents - [Model Overview](#model-overview) - [Training Process](#training-process) - [Prompting](#prompting) - [License](#license) - [Sample Generations](#sample-generations) ## Model Overview @@ -69,7 +73,7 @@ There, that's more like it. The **Waifu Diffusion 1.3** Weights have been released under the [CreativeML Open RAIL-M](https://huggingface.co/spaces/CompVis/stable-diffusion-license) License. ## Sample Generations  @@ -90,3 +94,15 @@ The **Waifu Diffusion 1.3** Weights have been released under the [CreativeML Ope  <sub>chen, arknights, 1girl, animal ears, brown hair, cat ears, cat tail, closed mouth, earrings, face, hat, jewelry, lips, multiple tails, nekomata, painterly, red eyes, short hair, simple background, solo, tail, white background</sub> ## Team Members and Acknowledgements This project would not have been possible without the incredible work by the [CompVis Researchers](https://ommer-lab.com/). - [Anthony Mercurio](https://github.com/harubaru) - [Salt](https://github.com/sALTaccount/) - [Cafe](https://twitter.com/cafeai_labs) In order to reach us, you can join our [Discord server](https://discord.gg/touhouai). [](https://discord.gg/touhouai) -
harubaru revised this gist
Oct 8, 2022 . 1 changed file with 26 additions and 5 deletions.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -2,6 +2,9 @@ HuggingFace Page for model download: https://huggingface.co/hakurei/waifu-diffusion-v1-3 #### Table of Contents [Model Overview](#modeloverview) ## Model Overview The **Waifu Diffusion 1.3 model** is a [Stable Diffusion model](https://github.com/compVis/stable-diffusion) that has been finetuned from [Stable Diffusion v1.4](https://huggingface.co/CompVis/stable-diffusion-v-1-4-original). I would like to personally thank everyone that had been involved with the development and release of Stable Diffusion, as all of this work for Waifu Diffusion would not have been possible without their original codebase and pre-existing model weights from which Waifu Diffusion was finetuned from. @@ -46,13 +49,13 @@ To generate an image of a girl wearing a hoodie in the rain using **Waifu Diffus ``original, 1girl, solo, portrait, hoodie, wearing hoodie``  Looks boring, right? Thankfully, with Booru tags, all you have to do to make the output look more detailed is to simply add more tags. Let's say you want to add rain and backlighting, then you would have a prompt that looks like: ``original, 1girl, solo, portrait, hoodie, wearing hoodie, red hoodie, long sleeves, simple background, backlighting, rain, night, depth of field`` <img src=https://user-images.githubusercontent.com/26317155/194689897-371fe9d3-7959-4a2a-be68-0aa19b66595f.png width=512 height=512> There, that's more like it. @@ -62,10 +65,28 @@ There, that's more like it. - **Be specific.** Use specific tags. The model cannot assume what you want, so you have to be specific. - **Do not use underscores.** They're deprecated since Waifu Diffusion 1.2. ## License The **Waifu Diffusion 1.3** Weights have been released under the [CreativeML Open RAIL-M](https://huggingface.co/spaces/CompVis/stable-diffusion-license) License. ## Non-Cherrypicked Generations  <sub>1girl, black eyes, black hair, black sweater, blue background, bob cut, closed mouth, glasses, medium hair, red-framed eyewear, simple background, solo, sweater, upper body, wide-eyed</sub>  <sub>1girl, aqua eyes, baseball cap, blonde hair, closed mouth, earrings, green background, hat, hoop earrings, jewelry, looking at viewer, shirt, short hair, simple background, solo, upper body, yellow shirt</sub>  <sub>1girl, black bra, black hair, black panties, blush, borrowed character, bra, breasts, cleavage, closed mouth, gradient hair, hair bun, heart, large breasts, lips, looking at viewer, multicolored hair, navel, panties, pointy ears, red hair, short hair, sweat, underwear</sub>  <sub>yakumo ran, arknights, 1girl, :d, animal ears, blonde hair, breasts, cowboy shot, extra ears, fox ears, fox shadow puppet, fox tail, head tilt, large breasts, looking at viewer, multiple tails, no headwear, short hair, simple background, smile, solo, tabard, tail, white background, yellow eyes</sub>  <sub>chen, arknights, 1girl, animal ears, brown hair, cat ears, cat tail, closed mouth, earrings, face, hat, jewelry, lips, multiple tails, nekomata, painterly, red eyes, short hair, simple background, solo, tail, white background</sub> -
harubaru created this gist
Oct 8, 2022 .There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -0,0 +1,71 @@ # Waifu Diffusion 1.3 Release Notes HuggingFace Page for model download: https://huggingface.co/hakurei/waifu-diffusion-v1-3 ## Model Overview The **Waifu Diffusion 1.3 model** is a [Stable Diffusion model](https://github.com/compVis/stable-diffusion) that has been finetuned from [Stable Diffusion v1.4](https://huggingface.co/CompVis/stable-diffusion-v-1-4-original). I would like to personally thank everyone that had been involved with the development and release of Stable Diffusion, as all of this work for Waifu Diffusion would not have been possible without their original codebase and pre-existing model weights from which Waifu Diffusion was finetuned from. The data used for finetuning **Waifu Diffusion 1.3** was **680k text-image samples** that had been downloaded through a booru site that provides high-quality tagging and original sources to the artworks themselves that are uploaded to the site. I also want to personally thank them as well, as without their hardwork the generative quality from this model would not have been feasible without going to financially extreme lengths to acquiring the data to use for training. The Booru in question would also like to remain anonymous due to the current climate regarding AI generated imagery. Within the [**HuggingFace Waifu Diffusion 1.3 Repository**](https://huggingface.co/hakurei/waifu-diffusion-v1-3) are 4 final models: - [Float 16 EMA Pruned](https://huggingface.co/hakurei/waifu-diffusion-v1-3/blob/main/wd-v1-3-float16.ckpt): This is the smallest available form for the model. There is a noticeable decrease in generative quality compared to the float32 model, but this model allows users with low VRAM or low-capability computers to inference from the model. - [Float 32 EMA Pruned](https://huggingface.co/hakurei/waifu-diffusion-v1-3/blob/main/wd-v1-3-float32.ckpt): The float32 weights are the second smallest available form of the model, it offers a slight quality increase compared to the float16 model. This is to be used for inference purposes only. - [Float 32 Full Weights](https://huggingface.co/hakurei/waifu-diffusion-v1-3/blob/main/wd-v1-3-full.ckpt): The full weights contain the EMA weights which are not used during inference. - [Float 32 Full Weights + Optimizer Weights](https://huggingface.co/hakurei/waifu-diffusion-v1-3/blob/main/wd-v1-3-full-opt.ckpt): The optimizer weights contain all of the optimizer states used during training. It is 14GB large and there is no quality difference between this model and the others as this model is to be used for **training purposes only.** Various modifications to the data had been made since the **Waifu Diffusion 1.2** model which included: - Removing underscores. - Removing parenthesis. - Separating each booru tag with a comma. - Randomizing tag order. ## Training Process The finetuning had been conducted with a fork from the original [Stable Diffusion](https://github.com/compVis/stable-diffusion) codebase, which is known as [Waifu Diffusion](https://github.com/harubaru/waifu-diffusion). The differences between the main repo and the fork is that the fork includes fixes to the original training code as well as a custom dataloader to train on text-image pairs that are stored locally. For finetuning, the base model [Stable Diffusion 1.4](https://huggingface.co/CompVis/stable-diffusion-v-1-4-original) had been finetuned on 680k text-image samples for **10 epochs** at a flat learning rate of ``5e-6``. The hardware used was a GPU VM instance which had the following specs: - 4x 48GB A40 GPUs - 24 AMD Epyc Milan vCPU cores - 192GB of RAM - 250GB Storage Training had taken approximately 10 days to finish and roughly around $3.1k had been spent on compute costs. ## Prompting As a result of the removal of underscores and including random tag order, it is now much more easier to prompt with the **Waifu Diffusion 1.3** model compared to the predecessor 1.2 version. So now, how do you generate something? Let's say you have a regular prompt like this: ``a girl wearing a hoodie in the rain`` To generate an image of a girl wearing a hoodie in the rain using **Waifu Diffusion 1.3**, you would prompt it out with booru tags that would look like: ``original, 1girl, solo, portrait, hoodie, wearing hoodie``  Looks boring, right? Thankfully, with Booru tags, all you have to do to make the output look more detailed is to simply add more tags. Let's say you want to add rain and backlighting, then you would have a prompt that looks like: ``original, 1girl, solo, portrait, hoodie, wearing hoodie, red hoodie, long sleeves, simple background, backlighting, rain, night, depth of field``  There, that's more like it. **Overall, to get better outputs you have to:** - **Add more tags** to your prompt, especially compositional tags. [A full list can be found here.](https://danbooru.donmai.us/wiki_pages/tag_group:image_composition) - **Include a copyright tag** that is associated with high quality art like ``genshin impact`` or ``arknights``. - **Be specific.** Use specific tags. The model cannot assume what you want, so you have to be specific. - **Do not use underscores.** They're deprecated since Waifu Diffusion 1.2. ## Disclaimer & Copyright <img src=https://user-images.githubusercontent.com/26317155/193715317-a64207a3-6b3d-4a7c-986b-d1dea1a832d6.png width=40% height=40%> <sub>An image generated at resolution 512x512 then upscaled to 1024x1024 with Waifu Diffusion 1.3 Epoch 7, demonstrating iterative refinement on higher resolutions.</sub>