Training AIs to Think Like Artists Through Examples
Prompts Tap Into Implicit Knowledge Learned from Images
AI art generators like DALL-E, Midjourney, and Stable Diffusion are powered by deep neural networks, a type of machine learning model. Instead of being explicitly programmed with rules and knowledge about art, these AIs are trained by examining massive datasets of images and captions.
For example, DALL-E 2 was trained on roughly 650 million image-text pairs from the internet. By analyzing these vast datasets, the AI learns relationships between words, concepts, and visual elements. It develops a learned sense of which text descriptions correspond to which types of images.
This means the AI has not been directly taught specific information like “who Frida Kahlo is” or “what a wide-angle lens does.” It picks up on these concepts indirectly through seeing related words and images together frequently in its training data. The AI learns holistically rather than having discrete facts and artistic skills intentionally programmed in.
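As a rough analogy for this kind of indirect learning, the sketch below counts how often caption words appear alongside visual features in a tiny, made-up dataset. Everything in it is hypothetical: the `TOY_PAIRS` data, the feature labels, and the helper names are invented for illustration. Real generators train neural networks on pixels rather than counting labels, so treat this only as a picture of how associations can emerge from paired examples without anyone writing the rule down.

```python
from collections import Counter, defaultdict

# Hypothetical toy dataset: each caption is paired with hand-labeled
# "visual features" standing in for what a real model sees in the image.
TOY_PAIRS = [
    ("frida kahlo self portrait", {"floral crown", "bold colors", "portrait"}),
    ("painting in the style of frida kahlo", {"floral crown", "bold colors"}),
    ("wide-angle lens street photo", {"stretched edges", "deep perspective"}),
    ("wide-angle lens landscape", {"deep perspective", "big sky"}),
    ("portrait of a woman", {"portrait"}),
]


def learn_associations(pairs):
    """Count how often each caption word appears alongside each visual feature."""
    counts = defaultdict(Counter)
    for caption, features in pairs:
        for word in set(caption.split()):
            for feature in features:
                counts[word][feature] += 1
    return counts


def strongest_features(counts, word, top_n=2):
    """Return the features most often seen with a word (ties broken alphabetically)."""
    ranked = sorted(counts[word].items(), key=lambda item: (-item[1], item[0]))
    return [feature for feature, _ in ranked[:top_n]]


associations = learn_associations(TOY_PAIRS)
print(strongest_features(associations, "kahlo"))       # ['bold colors', 'floral crown']
print(strongest_features(associations, "wide-angle"))  # ['deep perspective', 'big sky']
```

Nothing in the code states who Frida Kahlo is or what a wide-angle lens does. The associations come entirely from which words and features happened to appear together.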
As a result, even the creators of these systems cannot be entirely sure what knowledge the AI has gleaned or how it will interpret different prompts. Its inner workings are complex and opaque. There is no traditional software manual to consult; the AI is more like a human brain that learned art intuitively over time than a program built from written instructions.
To harness these AIs’ potential, users need to experiment and discover how they respond to different prompts phrased in natural language. Much like a child who learns about art by example, the AI learned by seeing text paired with images during training. The key is to prompt the AI with descriptions that tap into what it has learned about relating words to visuals from its training data.
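One way to make this experimentation systematic is to vary one part of a prompt at a time and compare the results. The sketch below only builds the prompt strings; it does not call any image generator. The function name `build_prompt_variations` and the example subject, styles, and modifiers are assumptions chosen for illustration, not part of any tool's API.

```python
from itertools import product


def build_prompt_variations(subject, styles, modifiers):
    """Combine one subject with every style/modifier pair, one prompt per combination."""
    return [
        f"{subject}, {style}, {modifier}"
        for style, modifier in product(styles, modifiers)
    ]


# Hypothetical example values; swap in whatever you want to compare.
prompts = build_prompt_variations(
    "a lighthouse on a cliff",
    styles=["oil painting", "in the style of Frida Kahlo"],
    modifiers=["wide-angle lens", "soft morning light"],
)

for prompt in prompts:
    print(prompt)
```

Pasting each resulting prompt into the generator of your choice and viewing the outputs side by side makes it easier to see which words the AI has actually learned to associate with a visual effect.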