I’ve been working with an image-generating algorithm by Vadim Epstein called CLIP+FFT, which uses OpenAI’s CLIP algorithm to judge whether images match a given caption, and an FFT algorithm to come up with new images to present to CLIP. Give it any random phrase, and CLIP+FFT will try its best to come up with a matching image. And now there’s a version that will generate images to go with several phrases in a row and then fuse them into a video.
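To make that judge-and-generator loop a little more concrete, here is a toy sketch in Python. It is not Vadim Epstein's code and it does not call CLIP. The `toy_score` function is a made-up stand-in for CLIP (it just measures how closely an image resembles a fixed stripe pattern), and the search is plain random hill climbing, which I'm assuming is much cruder than whatever optimization the real tool uses. The only part it shares with the description above is the shape of the idea: the image lives as Fourier coefficients, an inverse FFT turns them into pixels, and a judge decides whether each new attempt is an improvement.

```python
import numpy as np

def toy_score(image, target):
    """Hypothetical stand-in for CLIP: cosine similarity to a fixed target pattern."""
    a = image.ravel() - image.mean()
    b = target.ravel() - target.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def spectrum_to_image(spectrum, size):
    """Turn Fourier coefficients into pixels with an inverse real 2D FFT."""
    return np.fft.irfft2(spectrum, s=(size, size))

def random_spectrum(rng, shape):
    """Random complex Fourier coefficients."""
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)

def generate(target, steps=300, step_size=0.5, seed=0):
    """Hill-climb in Fourier space: keep a random tweak only if the judge likes it more."""
    rng = np.random.default_rng(seed)
    size = target.shape[0]
    shape = (size, size // 2 + 1)  # coefficient layout expected by irfft2
    spectrum = random_spectrum(rng, shape)
    best = toy_score(spectrum_to_image(spectrum, size), target)
    history = [best]
    for _ in range(steps):
        candidate = spectrum + step_size * random_spectrum(rng, shape)
        score = toy_score(spectrum_to_image(candidate, size), target)
        if score > best:  # the "judge" prefers the new image
            spectrum, best = candidate, score
        history.append(best)
    return spectrum_to_image(spectrum, size), history

# Toy "caption": a 16x16 pattern of horizontal stripes.
stripes = np.sin(np.linspace(0, 4 * np.pi, 16))[:, None] * np.ones((1, 16))
image, history = generate(stripes)
print(f"score went from {history[0]:.3f} to {history[-1]:.3f}")
```

Swapping `toy_score` for a real image-and-caption similarity model is where all of the interesting (and horrifying) behavior would come from.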

Here’s the sea shanty The Wellerman, sung by Nathan Evans, Jonny Stewart, and others, and illustrated by CLIP+FFT.

Now, there are several interesting things going on here, once you get past the sheer AI fever dream horror of it. One thing you’ll notice is that I changed some of the lines from the standard lyrics. CLIP+FFT deals with each line independently, so even if we have been talking about a ship and a whale throughout the song, the AI doesn’t know that in “when down on her a right whale bore”, the “her” refers to a ship. I made similar tweaks in one or two places.

There was nothing I could do about the line “One day, when the tonguing is done”. Trying to be more precise about the whaling sense of “tonguing” would, if anything, have made the image more horrifying.

Having none of the “Wellerman is a ship” context, the AI interprets The Wellerman itself as some kind of eldritch oil-well-drilling supervillain.

Collage of ominous dark helmeted and caped figures (they kind of look like Lord Buckethead), some bright yellow cranes, and some faces that look like middle-aged white men. "Wellerman" is written in several places as a semi-legible logo.

I kind of like what happened to “The winds blew hard, her bow dipped down,” with golden locks of hair and bows everywhere. I mean, I like it in an “oh no this has gone terribly yet fascinatingly wrong” sort of way.

A windswept landscape with tresses of blond hair everywhere, along with what might be ruffled bows.

The image for “We’ll take our leave and go” is also interesting, since it illustrates “leave” in so many ways. Sometimes there are cars and suitcases, or people shaking hands. Interestingly, I see hints of European Union flags and British flags in many of them, signs that during training CLIP was learning to associate “leave” with Brexit.

A variety of luggage-looking things and car-looking things. Yellow stars on blue scatter here and there, along with a definite Union Jack.

The “bully boys” are hilarious, with classic glowering expressions and mean-kid haircuts. The AI is not used to the early-1900s meaning of “bully = awesome.”

The Bully Boys are all scowling, and some appear to be pursing their lips to blow. They have the classic Draco Malfoy blond haircuts. Scattered throughout the image are little puffs of air, and mostly illegible writing saying BLOW.

You’ll notice that many of the frames have text, which I find charming, as if the AI is frowning to itself and muttering “tea. tea. Billy. tea.” or “blow. blow.” The less interpretable the phrase is in image form, the more likely the AI is to use text instead.

Classic sailing ships are interspersed with mugs of tea and pale young faces. Billy O Tea is written in fragmented form here and there.

In fact, CLIP’s tendency to treat the word and the object as equivalent has led to an interesting way of fooling its image recognition capabilities:

Attack label: pizza. On the left is an apple which the AI has identified as Granny Smith (a kind of apple). On the right, someone has written the word PIZZA and taped it to the front of the apple. Now the AI identifies it as pizza.

I also had CLIP+FFT illustrate The Twelve Days of Christmas, and this is one of my favorite frames from it: Ten Lords A-Leaping.

A gothic parliament-like hall with a bunch of whitehaired figures in business suits leaping high into the air, like one of those fun wedding photos.

To see the other illustrated Days of Christmas (including the weirdly human-faced swans), become a supporter of AI Weirdness! Or become a free subscriber to get new AI Weirdness posts in your inbox.
