There was a paper recently where a research team trained a machine learning algorithm (a GAN they called AttnGAN) to generate pictures based on written descriptions. It’s like Visual Chatbot in reverse. When it was just trained to generate pictures of birds, it did pretty well, actually.
(Although the description didn’t specify a beak and so it just… left it out.)
But when they trained the same algorithm on a huge and highly varied dataset, it had a lot more trouble generating a picture to go with that caption. Below, I give the same caption to a version of their algorithm that has been trained to generate everything from sheep to shopping centers. Cris Valenzuela wrapped their trained model in an entertaining demo that attempts to generate a picture for any caption.
This bird is less, um, recognizable. When the GAN has to draw *anything* I ask for, there’s just too much to keep track of - the problem’s too broad, and the algorithm spreads itself too thin. It doesn’t just have trouble with birds. A GAN that’s been trained just on celebrity faces will tend to produce photorealistic portraits. This one, however…
In fact, it does a horrifying job with humans because it can never quite seem to get the number of orifices correct.
It’s fun to ask it to draw animals, though. It knows the texture of giraffes, but not quite their shape. And it knows that boats are on the water, but not necessarily that they are boats.
It also (like many image recognition algorithms) gets a bit confused about the difference between sheep and the landscapes they’re found on. Other algorithms recognize sheep in pictures of empty green fields. And this one, when asked to draw sheep…
That’s different, though, from asking it to draw *a* sheep. In that case, it knows exactly what to do. It draws the sheep, and then just to be safe it fills the entire planet with wool too.
It really likes drawing stop signs and clocks. Give it the slightest opportunity to draw one, and it will chuck those things all over the place.
Other than its horrifying humans, this algorithm can actually be pretty delightful.