The art of asking nicely

There are upsides to working with a neural net that trained on a huge collection of internet images and text. One is that, instead of producing ominous grey geometric blobs when it doesn't understand your prompt (there is a free interactive demo of AttnGAN here and it is a lot of fun), a huge neural net like CLIP can follow a wide array of prompts. Since CLIP technically only judges how well a picture matches a prompt, people have developed a few ways of using CLIP's judgements to aim an image generator. I've used this to generate such useful things as Frodo Baggins delivering pizza through the Mines of Moria, cursed candy hearts, fully illustrated sea shanties, and more fully illustrated sea shanties.
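To make the "judge aims the generator" idea concrete, here is a toy sketch. It is not CLIP or VQGAN: the "image" and "prompt" are just hypothetical lists of 8 numbers, cosine similarity stands in for CLIP's judgement, and random hill climbing stands in for the gradient-based updates that real notebooks use. The names `cosine_similarity` and `steer` are made up for illustration.

```python
import math
import random


def cosine_similarity(a, b):
    """Stand-in for the judge: how well does vector `a` match vector `b`?"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def steer(image, prompt, iterations=450, step=0.1, seed=0):
    """Nudge `image` at random, keeping only nudges the judge scores higher."""
    rng = random.Random(seed)
    best = list(image)
    best_score = cosine_similarity(best, prompt)
    for _ in range(iterations):
        candidate = [x + rng.gauss(0, step) for x in best]
        score = cosine_similarity(candidate, prompt)
        if score > best_score:  # the judge likes this version better
            best, best_score = candidate, score
    return best, best_score


# Hypothetical 8-number "embeddings" standing in for a prompt and a start image.
rng = random.Random(42)
prompt_vec = [rng.uniform(-1, 1) for _ in range(8)]
start_image = [rng.uniform(-1, 1) for _ in range(8)]

start_score = cosine_similarity(start_image, prompt_vec)
final_image, final_score = steer(start_image, prompt_vec)
print(f"match before steering: {start_score:.3f}")
print(f"match after steering:  {final_score:.3f}")
```

The judge never draws anything. It only scores candidates, and the loop keeps whatever scores higher, which is roughly why the wording of the prompt ends up mattering so much.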

I decided to give the VQGAN + CLIP method of steering a try (tutorial and link here), and for its first task I decided to have it generate a subject that's given neural nets trouble in the past: "a herd of sheep grazing on a lush green hillside".

Here's what it generated after 450 iterations:

A mossy hillocky slope with weird leafy cracks and flaps. The sheep are very molar-like in color and texture.
Prompt: A herd of sheep grazing on a lush green hillside

It could be better. The sheep are not so much grazing as embedded like weird molars, and the hillside isn't very picturesque.

But this version of VQGAN+CLIP allows me to upload an image as a starting point. So I decided to start with an image that a different neural net had captioned as "a herd of sheep grazing on a lush green hillside". In fact, Azure image description had still called it "a herd of sheep grazing on a lush green hillside" even after I had removed all the sheep. With the image on the right as a starting point, would VQGAN simply add the sheep?

Left: a landscape on the Isle of Skye, with sheep grazing in the foreground and midground. Right: the same landscape, but without sheep.

The answer is: sorta?

A flatly illuminated green clifftop with low shrubs and palm trees. The sheep are either white-green car-sized lumps or smaller popcorn huddles.
A herd of sheep grazing on a lush green hillside

Can't really tell whether those are sheep or cauliflower. And did it add palm trees? The original image's composition is destroyed and what is left looks pretty flat.

Here is where it gets interesting. If I use the same prompt and add "Amazing awesome and epic", the picture gets noticeably better. "Oh," goes the neural net, "you wanted a GOOD picture".

Deeply eroded green cliffs rise above some grass-topped mesas. The sheep are oversaturated but flowing down a tree-lined stream.
A herd of sheep grazing on a lush green hillside | amazing awesome and epic

And how good a picture you get depends on exactly how you ask for it. There are several phrases you can add that seem to make things better, like "trending on artstation" or "unreal engine" (a fancy new video game rendering engine).
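The prompts in the image captions here use `|` to separate the subject from the add-on phrases. Below is a small sketch of building and splitting prompts in that shape. The helper names `build_prompt` and `split_prompt` are hypothetical, and whether a given notebook treats each `|`-separated part as its own target is an assumption worth checking against that notebook.

```python
def build_prompt(subject, modifiers=()):
    """Join a subject and optional add-on phrases with ' | '."""
    parts = [subject.strip()] + [m.strip() for m in modifiers if m.strip()]
    return " | ".join(parts)


def split_prompt(prompt):
    """Split a combined prompt back into its '|'-separated parts."""
    return [part.strip() for part in prompt.split("|") if part.strip()]


SUBJECT = "a herd of sheep grazing on a lush green hillside"

plain = build_prompt(SUBJECT)
epic = build_prompt(SUBJECT, ["amazing awesome and epic"])
# Illustrative combination of the add-on phrases mentioned above.
stacked = build_prompt(SUBJECT, ["trending on artstation", "unreal engine"])

for prompt in (plain, epic, stacked):
    print(prompt)
    print("  parts:", split_prompt(prompt))
```

Keeping the subject fixed and swapping only the add-ons makes it easier to compare what each phrase does to the picture.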

Here's "a painting of a herd of sheep grazing on a lush green hillside in the style of disney trending on artstation | unreal engine" (prompt combo borrowed from here).

Steep cliffs encircle some grassy mesas, with clumps of grass in the foreground. Numerous lumps might be sheep, except they're purple and turquoise. Sharp focus and vignetting give the picture a great feeling of depth.
a painting of a herd of sheep grazing on a lush green hillside in the style of disney trending on artstation | unreal engine

Granted, the sheep are more like multicolored bundles of cloth, but the saturation and vignetting got much more dramatic. There's even a soft focus effect in the background. All the time it was giving me the flat, lackluster landscape of the first picture, the AI was perfectly capable of giving me this instead.

So what are some other ways of asking CLIP-VQGAN to try harder?

The results from "Fine art print" remind me of unicorn calendars from the 80s or something. The sheep are sort of bubbling out from the hillside.

There's an oversaturated pastel look to the landscape, with everything in soft focus and brushstrokes clearly visible. The sheep are indistinct white lumps sheltering beneath much larger white folds of the landscape.
a herd of sheep grazing on a lush green hillside | fine art print

I tried to specify particular artists. Bob Ross was a hilarious mistake.

Steep domed mountains rise from among lakes and happy little trees. The white sheep lumps are there, but they're all wearing brown afros.
a herd of sheep grazing on a lush green hillside by bob ross

The Tim Burton version was very cool-looking, if completely unrecognizable as sheep.

Skeletal trees, dark picket fences, and swirling, brooding mist. There are some stringy striped beings standing on floating ribbons of grass.
a herd of sheep grazing on a lush green hillside by tim burton

"Award winning National Geographic Photography" gave me nice looking background cliffs and trees - and sheep that look disturbingly like people crawling around under green blankets.

Near photorealistic red cliffs and jungle textures in the background, growing more vague in the foreground. Green lumps look like hunched-over humans with legs and arms protruding.
a herd of sheep grazing on a lush green hillside | award winning national geographic photography

But the most effective prompt? In terms of producing a realistic but dramatically lit landscape with recognizable mountains and hills and (okay not sheep)?

You're going to hate it. I hate it, and I'm the one who thought of it. But it's the natural extension of layering on descriptors to try to boost performance.

"dramatic atmospheric ultra high definition free desktop wallpaper"

Clumps of mist scatter a mountainous and dramatically lit landscape. Pointed mountains are picked out by glimmers of sun, which also dots verdant meadows in places. There are white lumps that might be sheep, but they're building-sized and irregularly shaped. One looks like a kaiju armadillo.
a herd of sheep grazing on a lush green hillside | dramatic atmospheric ultra high definition free desktop wallpaper

With that cursed prompt as a base, I could layer more styles on top. I am ashamed to admit that "a herd of sheep grazing on a lush green hillside | dramatic atmospheric ultra high definition free desktop wallpaper | cubist cezanne" looks pretty cool.

Meadows rise in terraces into the misty distance, sharply outlined in black. Colors are rich, and sections of the landscape are wooly and white.
a herd of sheep grazing on a lush green hillside | dramatic atmospheric ultra high definition free desktop wallpaper | cubist cezanne

The most haunted prompt turned out to be "a herd of sheep grazing on a lush green hillside | dramatic atmospheric ultra high definition free desktop wallpaper by Lisa Frank". I'm not sure what kind of rainbow apocalypse is happening here, but I wouldn't recommend poking at the violet shimmery patches that are oozing into the lake. Maybe those are the sheep.

A mostly purple sky is shot with lightning and rainbows, which are beginning to devour a gemlike purple mountain range. There's an area of fairly normal patchwork fields, but in the foreground is a rainbow lake whose shores are occupied by field-sized purple tentacled areas, beginning to bleed into the water.
a herd of sheep grazing on a lush green hillside | dramatic atmospheric ultra high definition free desktop wallpaper by lisa frank

For a more technical description of how CLIP steering works, and some gorgeous image examples, check out this blog post by Charlie Snell.

You can try CLIP+VQGAN for free by following the instructions in this tutorial (no coding or Spanish language skills necessary). Or if you ask nicely, the @images_ai Twitter account may try a prompt you request.

I generated way more images than will fit here, so I'll post a bunch more of them as bonus content. Become an AI Weirdness supporter to get bonus content! Or become a free subscriber to get new AI Weirdness posts in your inbox.