Using Deep Learning to Identify and Generate Art as if by an Artist
The latest and greatest AI content generation trend is AI-generated art. In January 2021, OpenAI demoed DALL-E, a GPT-3 variant which creates images instead of text. Notably, it can create images in response to a text prompt, allowing for some very fun output.
However, the generated images are not always coherent, so OpenAI also demoed CLIP, which can be used to translate an image into text and therefore identify which generated images were actually avocado armchairs. CLIP was then open-sourced, although DALL-E was not.
Since CLIP is essentially an interface between representations of text and image data, clever hacking can allow anyone to create their own pseudo-DALL-E. The first implementation was Big Sleep by Ryan Murdock/@advadnoun, which combined CLIP with an image-generating GAN named BigGAN. Then open source worked its magic: the GAN base was changed to VQGAN, a newer model architecture by Patrick Esser, Robin Rombach, and Björn Ommer which allows more coherent image generation. The core CLIP-guided training was improved and translated to a Colab Notebook by Katherine Crowson/@RiversHaveWings and others in a special Discord server. Twitter accounts like @images_ai and @ai_curio which leverage VQGAN + CLIP with user-submitted prompts have gone viral and received mainstream press. @ak92501 created a fork of that Notebook which has a convenient UI, through which I became aware of how far AI image generation has developed in a few months.
From that, I forked my own Colab Notebook and streamlined the UI a bit to minimize the number of clicks needed to start generating and to make it more mobile-friendly.
The VQGAN + CLIP technology is now in a good enough state that it can be used for more serious experimentation. Some say art is better when there's mystery, but my view is that knowing how AI art is made is the key to making even better AI art.
A Hello World for AI-Generated Art
All AI-generated image examples in this blog post are generated using this Colab Notebook, with the captions indicating the text prompt and other relevant deviations from the default inputs to reproduce the image.
Let's jump right into it with something fantastical: how well can the AI generate a cyberpunk forest?
cyberpunk forest
The TL;DR of how VQGAN + CLIP works is that VQGAN generates an image, CLIP scores the image according to how well it can detect the input prompt, and VQGAN uses that feedback to iteratively improve its image generation. LJ Miranda has a good, detailed technical writeup.
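To make that loop concrete, here's a minimal sketch of the optimization in Python, assuming hypothetical vqgan.decode and clip_model.encode_image interfaces (the actual Notebook code has many more tricks, such as image augmentations):

    import torch

    def generate(vqgan, clip_model, prompt_embedding, steps=300, lr=0.1):
        # Start from random latent codes; VQGAN decodes these into an image.
        z = torch.randn(1, 256, 16, 16, requires_grad=True)
        optimizer = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            image = vqgan.decode(z)                           # latents -> image
            image_embedding = clip_model.encode_image(image)  # image -> CLIP space
            # CLIP "scores" the image: higher cosine similarity to the
            # prompt embedding means the image matches the text better.
            loss = -torch.cosine_similarity(image_embedding, prompt_embedding, dim=-1).mean()
            optimizer.zero_grad()
            loss.backward()  # feedback flows through CLIP back to the latents
            optimizer.step()
        return vqgan.decode(z)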
Now let's do the same prompt as before, but with an added artist from a time well before the cyberpunk genre existed and see if the AI can follow their style. Let's try Salvador Dali.
cyberpunk forest by Salvador Dali
It's definitely a cyberpunk forest, and it's definitely Dali's style.
One trick the community found to improve generated image quality is to simply add phrases that tell the AI to make a good image, such as artstationHQ or trending on /r/art. Trying that here:
cyberpunk forest by Salvador Dali artstationHQ
In this instance, it's unclear if the artstationHQ part of the prompt gets higher priority than the Salvador Dali part. Another trick that VQGAN + CLIP can do is take multiple input text prompts, which can add more control. Additionally, you can assign weights to these different prompts. So if we did cyberpunk forest by Salvador Dali:3 | artstationHQ, the model will try three times as hard to make the image follow a Dali painting as to follow artstationHQ.
cyberpunk forest by Salvador Dali:3 | artstationHQ
Much better! Lastly, we can use negative weights for prompts, such that the model targets the opposite of that prompt. Let's do the opposite of green and white to see if the AI tries to remove those two colors from the palette and maybe make the final image more cyberpunky.
cyberpunk forest by Salvador Dali:3 | artstationHQ | green and white:-1
Now we're getting to video game concept art quality generation. Indeed, VQGAN + CLIP rewards the use of clever input prompt engineering.
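Putting the weighting tricks together: here's a minimal sketch of how the text:weight | text:weight syntax could be parsed and turned into a weighted loss. This is my assumption about the implementation, reusing the hypothetical clip_model interface from above; note how a negative weight flips the sign and pushes the image away from that prompt:

    import torch

    def parse_prompts(spec):
        # "a:3 | b | c:-1" -> [("a", 3.0), ("b", 1.0), ("c", -1.0)]
        prompts = []
        for part in spec.split("|"):
            text, sep, weight = part.strip().rpartition(":")
            if sep:
                prompts.append((text.strip(), float(weight)))
            else:
                prompts.append((weight.strip(), 1.0))  # no explicit weight
        return prompts

    def weighted_loss(image_embedding, clip_model, prompts):
        # The total loss is a weighted sum of CLIP similarities across prompts.
        loss = 0.0
        for text, weight in prompts:
            text_embedding = clip_model.encode_text(text)
            loss -= weight * torch.cosine_similarity(
                image_embedding, text_embedding, dim=-1).mean()
        return loss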
Initial Images and Style Transfer
Normally with VQGAN + CLIP, the generation starts from a blank slate. However, you can optionally provide an image to start from instead. This provides both a good base for generation and speeds it up, since the model doesn't have to learn from empty noise. I usually recommend a lower learning rate as a result.
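Mechanically, this likely amounts to encoding the starting image into VQGAN's latent space and optimizing from there instead of from random noise; a minimal sketch, assuming the same hypothetical vqgan interface as above plus a vqgan.encode method:

    import torch
    from PIL import Image
    from torchvision.transforms import functional as TF

    def latents_from_image(vqgan, path, size=384):
        # Load and normalize the initial image to the range VQGAN expects.
        image = Image.open(path).convert("RGB").resize((size, size))
        tensor = TF.to_tensor(image).unsqueeze(0) * 2 - 1  # [0, 1] -> [-1, 1]
        z = vqgan.encode(tensor)  # image -> latent codes
        # Optimize these latents instead of starting from random noise.
        return z.detach().requires_grad_()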
So let's try an initial image of myself, naturally.
Let's try another artist, such as Junji Ito, who has a very distinctive horror style of art.
a black and white portrait by Junji Ito
— initial image above, learning rate = 0.1
One of the earliest promising use cases of AI image generation was neural style transfer, where an AI could take the "style" of one image and transpose it to another. Can it follow the style of a specific painting, such as Starry Night by Vincent Van Gogh?
Starry Night by Vincent Van Gogh
— initial image above, learning rate = 0.1
Well, it got the colors and style, but the AI appears to have taken the "Van Gogh" part literally and gave me a nice beard.
Of course, with the power of AI, you can do both prompts at the same time for maximum chaos.
Starry Night by Vincent Van Gogh | a black and white portrait by Junji Ito
— initial image above, learning rate = 0.1
Icons and Generating Images With a Specific Shape
While I was first experimenting with VQGAN + CLIP, I saw an interesting tweet by AI researcher Mark Riedl:
— Mark "Loki Variant" Riedl (@mark_riedl) July 31, 2021
Intrigued, I adapted some icon generation code I had handy from another project and created icon-image, a Python tool to programmatically generate an icon using Font Awesome icons and paste it onto a noisy background.
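The gist of the approach, as a rough Pillow sketch rather than the actual icon-image code (assuming the icon has already been rendered to an RGBA image file):

    import numpy as np
    from PIL import Image

    def icon_on_noise(icon_path, size=512, icon_opacity=0.9, noise_opacity=0.5):
        # Random RGB noise gives VQGAN + CLIP texture it can reshape,
        # unlike a flat solid-color background.
        noise = np.random.randint(0, 256, (size, size, 3), dtype=np.uint8)
        background = Image.fromarray(noise).convert("RGBA")
        gray = Image.new("RGBA", background.size, (128, 128, 128, 255))
        background = Image.blend(gray, background, noise_opacity)
        # Resize the icon and scale its alpha channel by the desired opacity.
        icon = Image.open(icon_path).convert("RGBA").resize((size // 2, size // 2))
        icon.putalpha(icon.getchannel("A").point(lambda a: int(a * icon_opacity)))
        background.paste(icon, (size // 4, size // 4), mask=icon)  # centered
        return background.convert("RGB")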
This icon can be used as an initial image, as above. Adjusting the text prompt to accommodate the icon can result in very cool images, such as a black and white evil robot by Junji Ito.
a black and white evil robot by Junji Ito
— initial image above, learning rate = 0.1
The background and icon noise is the key, as the AI can shape it much better than solid colors. Omitting the noise results in a duller image that doesn't reflect the prompt as well, although it has its own style.
a black and white evil robot by Junji Ito
— initial image above except 1.0 icon opacity and 0.0 background noise opacity, learning rate = 0.1
Another fun prompt addition is rendered in unreal engine (with an optional high quality), which instructs the AI to create a three-dimensional image and works particularly well with icons.
smiling rusted robot rendered in unreal engine high quality
— icon initial image, learning rate = 0.1
icon-image can also generate brand images, such as the Twitter logo, which can be good for comedy, especially if you tweak the logo/background colors as well. What if we turn the Twitter logo into Mordor, which is a fair metaphor?
Mordor
— fab fa-twitter icon, icon initial image, black icon background, red icon, learning rate = 0.1
That didn't turn out well, as the Twitter logo got overpowered by the prompt (you can see outlines of the logo at the bottom). However, there's a trick to force the AI to respect the logo: set the icon as both the initial image and the target image, and apply a high weight to the prompt (the weight can be lowered iteratively to preserve the logo better).
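One plausible way to implement the target image (the Notebook may do it differently, e.g. with a pixel- or latent-space penalty) is to embed the logo with CLIP and add it as a second term in the loss, reusing the hypothetical interfaces from the earlier sketches:

    import torch

    def combined_loss(image_embedding, prompt_embedding, logo_embedding, prompt_weight=3.0):
        # The text prompt pulls toward "Mordor"; the weight matches "Mordor:3".
        prompt_term = -prompt_weight * torch.cosine_similarity(
            image_embedding, prompt_embedding, dim=-1).mean()
        # The logo acts as an image-based "prompt" the output must also match.
        logo_term = -torch.cosine_similarity(
            image_embedding, logo_embedding, dim=-1).mean()
        return prompt_term + logo_term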
Mordor:3
— fab fa-twitter icon, icon initial image, icon target image, black icon background, red icon, learning rate = 0.1
More Fun Examples
Here are a few more good demos of what VQGAN + CLIP can do using the ideas and tricks above:
Microsoft Excel by Junji Ito
— 500 steps
a portrait of Mark Zuckerberg:2 | a portrait of a bottle of Sweet Baby Ray's barbecue sauce
— 500 steps
Never gonna give you up, Never gonna let you down
— 500 steps
a portrait of cyberpunk Elon Musk:2 | a human:-1
— 500 steps
hamburger of the Old Gods:5
— fas fa-hamburger icon, icon initial image, icon target image, black icon background, white icon, learning rate = 0.1, 500 steps
reality is an illusion:8
— fas fa-eye icon, icon initial image, icon target image, black icon background, white icon, learning rate = 0.1
@kingdomakrillic released an album with many more examples of prompt augmentations and their results.
Making Money Off of VQGAN + CLIP
Can these AI-generated images be commercialized as software-as-a-service? It's unclear. In contrast to StyleGAN2 images (where the license is explicitly noncommercial), all aspects of the VQGAN + CLIP pipeline are MIT licensed, which does support commercialization. However, the ImageNet 16384 VQGAN used in this Colab Notebook and many other VQGAN + CLIP Notebooks was trained on ImageNet, which has famously complicated licensing, and whether finetuning the VQGAN counts as sufficiently detached from an IP perspective hasn't been legally tested to my knowledge. There are other VQGANs available, such as ones trained on the Open Images Dataset or COCO, both of which have commercial-friendly CC-BY-4.0 licenses, although in my testing they had substantially lower image generation quality.
Granted, the biggest blocker to making money off of VQGAN + CLIP in a scalable way is generation speed; unlike most commercial AI models which use inference and can therefore be optimized to drastically increase performance, VQGAN + CLIP requires training, which is much slower and can't allow content generation in real time like GPT-3. Even with expensive GPUs and generating at small image sizes, training takes a couple minutes at minimum, which correlates with a higher cost-per-image and annoyed users. It's still cheaper per image than what OpenAI charges for their GPT-3 API, though, and many startups have built on that successfully.
Of course, if you just want to make NFTs from manual usage of VQGAN + CLIP, go ahead.
The Next Steps for AI Image Generation
CLIP itself is just the first practical iteration of translating text to images, and I suspect this won't be the last implementation of such a model (OpenAI may pull a GPT-3 and not open-source the inevitable CLIP-2 now that there's a proven monetizable use case).
Still, the AI art generation industry is developing at a record pace, especially on the image-generating part of the equation. Just the day before this article was posted, Katherine Crowson released a Colab Notebook for CLIP with Guided Diffusion, which generates more realistic images (albeit less fantastical), and Tom White released a pixel art generating Notebook which doesn't use a VQGAN variant.
The possibilities with just VQGAN + CLIP alone are countless.
If you liked this post, I have set up a Patreon to fund my machine learning/deep learning/software/hardware needs for my future crazy yet cool projects, and any monetary contributions to the Patreon are appreciated and will be put to good creative use.
Source: https://minimaxir.com/2021/08/vqgan-clip/