What are AI Image Generators?
AI Generated concept art of how AI views and interacts with the natural world
AI image generators are computer algorithms that use machine learning techniques to generate images that resemble real-world images. They can create images from scratch or modify existing images to produce new variations.
Typically, generators are text-to-image software. They radically reduce the time required to generate high-quality images, which allows for quicker visual studies, experimentation, and faster iteration and validation during ideation.
Know your Platforms
AI Generated image of a mechanical AI mind, meant to depict what an AI might look like
There are many AI image generators available, each with its own strengths and weaknesses, and some may be better suited to specific tasks than others. Therefore, you should be clear about the kind of image output you are looking for:
“Am I just playing around and exploring?”
“What will the images be used for?”
“Are there image requirements such as size, dimension or resolutions?”
“Does it matter if the output images are realistic or conceptual?”
Some AI generators produce more surreal images, while others lean realistic. Software updates have also revamped prompting and image output on certain platforms, and some experts deliberately switch between versions of the same software to generate different, more desirable results. Three main platforms will be discussed here: MidJourney, Stable Diffusion and DALL-E 2.
MidJourney is a platform popular with experts that allows access to different model versions. Versions one through four output more surreal or abstract images and lean conceptual, while the newer version five outputs more photorealistic images because it uses a different set of syntax and algorithms. Playing with the different versions demonstrates how understanding each algorithm's output can benefit and add value to your project. It is important to note that the newer version prefers sentence-based prompts, so users should know exactly what they are looking for and be specific.
An evolution of prompt deliverables through the lens of Midjourney
Stable Diffusion is another well-rounded text-to-image tool. What is great about this software is that it can run locally on your computer, or through hosted platforms like DreamStudio or Hugging Face. Stable Diffusion generates a few variations of each prompt, which gives you options. Like MidJourney, Stable Diffusion has different model versions, and you can specify which one to use to produce alternate results; its style tends to be more realistic, with stylized subjects. The software also lets you set the number of steps: the more steps you specify, the more detailed your image becomes. In addition, the program offers a CFG scale, a setting that controls how closely your prompt is followed. Lastly, you can even upload an image for the software to use as a reference.
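To give a sense of what the CFG scale actually controls: under the hood, classifier-free guidance combines the model's unconditional and prompt-conditioned predictions, and higher scales push the result harder toward the prompt. The sketch below is a minimal numeric illustration of that arithmetic using made-up scalar values, not Stable Diffusion's actual code (which applies this to latent noise tensors at every denoising step).

```python
# Minimal sketch of classifier-free guidance (CFG), the mechanism
# behind Stable Diffusion's CFG scale. Real pipelines apply this to
# latent tensors each denoising step; plain floats stand in here to
# show the arithmetic only.

def apply_cfg(uncond_pred: float, cond_pred: float, cfg_scale: float) -> float:
    """Blend unconditional and prompt-conditioned predictions.

    cfg_scale = 1.0 follows the conditioned prediction as-is;
    higher values exaggerate the difference, following the prompt
    more literally at the cost of variety.
    """
    return uncond_pred + cfg_scale * (cond_pred - uncond_pred)

# Hypothetical predictions for a single latent value.
uncond, cond = 0.2, 0.5

for scale in (1.0, 7.5, 15.0):
    print(f"cfg_scale={scale:>4}: guided prediction = {apply_cfg(uncond, cond, scale):.2f}")
```

Note how the guided value drifts further from the unconditional prediction as the scale grows, which is why very high CFG settings can look over-saturated or over-literal.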
DALL-E 2 was created by OpenAI, the same company that built ChatGPT. This successor to the original DALL-E generates images at four times the resolution, and experts prefer it for both prompt matching (albeit with a literal interpretation of natural language) and photorealism. The platform offers variations, inpainting and text diffs. Variations are both syntactic and semantic: the program goes beyond simple text-to-image because it leverages CLIP to produce an image embedding, the AI's version of a "mental image". Just as several humans prompted to draw a house with a fence would each draw it differently, the AI reads the prompt and outputs images based on what it deems essential. Because DALL-E 2 preserves semantic information and stylistic elements, you can receive similar images with slight variations that better suit your project. DALL-E 2 can also edit existing images via inpainting: it can add objects to an image and match the style of the surrounding context. Lastly, text diffs enable interpolation: the program can blend two images and show the stages of the blend. Taken further, this feature can even produce a movie; you may have seen the history-of-art videos circulating on social media.
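The reason variations stay semantically close is that CLIP maps images and text into a shared embedding space where similar content sits close together, typically measured by cosine similarity. The sketch below uses small hand-made vectors rather than real CLIP embeddings (which have hundreds of dimensions) purely to illustrate that comparison.

```python
import math

# Toy "embeddings": real CLIP vectors have 512+ dimensions. These
# hand-made 3-D vectors are hypothetical, and only illustrate the
# cosine-similarity test that keeps "house with a fence" variations
# near each other in embedding space.

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

house_with_fence = [0.9, 0.4, 0.1]   # hypothetical embedding of the prompt's image
house_variation = [0.8, 0.5, 0.2]    # a stylistic variation of the same scene
unrelated_cat = [0.1, 0.2, 0.9]      # a semantically different image

print(cosine_similarity(house_with_fence, house_variation))  # close to 1
print(cosine_similarity(house_with_fence, unrelated_cat))    # much lower
```

A variation that preserves the "essential" content keeps a high similarity to the original embedding, while an unrelated image does not.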
The evolution of art depicted by AI
Text-to-image software has revolutionized art, much as the camera did for painters. However, this software still has its limitations and has not yet perfected the space. For example, text, typography, emoji, natural materials, closeups, vintage photography and hybrid architecture are difficult to generate. Another interesting limitation is human hands: they often look alien or are missing fingers.
AI generated image of people shaking hands depicting that there are limitations to AI images
AI images are great to use as background images to set the scene. The goal is to integrate generated images into an already established creative process: establish the story you want to tell, and understand which platform can help portray it. These images can help foster and solidify the narrative and emotions of your content or project. Setting your intentions early can promote an expert level of efficiency.
Set your intentions
AI generators are a new visual communication tool. They can quickly communicate complex visuals and emotions, and can be an important aid when conceptualizing a vision within the ideation phase of a project. Thinking in prompts can help you become more direct, explicit and descriptive in communication with your team. Ultimately, AI generators can challenge us to better ourselves by improving articulation and collaboration.
Prompt Descriptor Tips
Here are some overview prompt tips you can use when generating specific images for your project:
- Be explicit with how you describe the image you want generated.
- Think of your prompt in an Object-Oriented Programming fashion – this can help break down your vision.
- Try rewording or rephrasing prompts, it can make all the difference.
- Positive phrases tend to work better than reductive ones, e.g., "an empty room" versus "a room with no people".
- Consider creative styles, effects, artist style, medium, device, lens, lighting, shot type, time of day, context, weather, viewing direction, aesthetics, detail, emotion, time period.
- Use jargon from mathematics, computer science, animation, illustration, photography or anthropology.
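One way to apply the Object-Oriented tip above is to treat each descriptor category as a field and assemble the prompt from them. The helper below is a hypothetical sketch, not part of any generator's API; the field names are illustrative categories drawn from the tips above.

```python
from dataclasses import dataclass, field

@dataclass
class PromptSpec:
    """Break a vision into descriptor categories, then join them.

    The field names are illustrative categories (medium, style,
    lighting, shot type), not parameters recognized by any
    particular generator.
    """
    subject: str
    medium: str = ""
    style: str = ""
    lighting: str = ""
    shot_type: str = ""
    extras: list[str] = field(default_factory=list)

    def build(self) -> str:
        parts = [self.subject, self.medium, self.style,
                 self.lighting, self.shot_type, *self.extras]
        return ", ".join(p for p in parts if p)

spec = PromptSpec(
    subject="a lighthouse on a rocky coast",
    medium="oil painting",
    style="impressionist",
    lighting="golden hour",
    shot_type="wide shot",
)
print(spec.build())
# → "a lighthouse on a rocky coast, oil painting, impressionist, golden hour, wide shot"
```

Structuring prompts this way also makes rewording easy: change one field, regenerate, and compare, rather than rewriting the whole sentence each time.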
Sources such as Promptcraft by Evo Heyning break down the science of prompting and how you can excel at it; the book explores the intersection of creative writing and artificial intelligence, providing guidance and exercises for those who want to use AI tools. Another resource is Lexica, which works hand in hand with Stable Diffusion and is a great way to understand which prompts produce which kinds of images.
Real World Example
These are some examples of AI-generated images we created for a recent client using MidJourney software. To test the quality and validity of the software, we specified various parameters such as aspect ratio, stylization, and quality to generate different versions of the images.
Initially, we asked for a veterinarian of colour holding a snake, but the images generated were somber and the snake looked like it belonged in a monster movie. To improve the image and make it more cheerful, we added more context to the prompt, specifying a "happy female veterinarian of colour holding a fluffy cat in a brightly lit veterinarian office. 85mm lens, photo realistic --s 750 --aspect 16:9". However, even with these specifications, the image remained dark, closely cropped, and posed. We wanted the environment to be more prominent and the image to look candid, so we changed the prompt to feature an equine veterinarian.
Real photos of equine veterinarians often show them wearing white lab coats, which is not realistic. So, we specified exactly what we wanted: "a happy and smiling female veterinarian of colour examining a horse outside a horse stable, wearing blue scrubs and a stethoscope. The environment is brightly lit by the sun, with the wind blowing through her hair, 85mm lens, photo-realistic, candid photography --s 750 --aspect 16:9".
However, the AI fixated on our mention of the horse stable and set the environment inside, resulting in a dark image where the veterinarian was looking straight at the camera. Additionally, the horse in the image didn't look like a Canadian breed. After refining the prompt, we finally received four images that were close to what we envisioned. This was our final prompt: "A full body shot of a happy and smiling female veterinarian of colour wearing blue scrubs and a stethoscope examining a Canadian pacer horse outside. She is looking at the horse and smiling. The wind is blowing in her hair, and the sun is shining on her. The environment is brightly lit by the sun. 85mm lens, photo realistic, candid photography --s 750 --aspect 16:9".
Midjourney offers variations of each output, and after some adjustments we settled on two images that met our expectations. However, upon closer inspection there were minor inconsistencies such as a wonky eye, hair not attached to the head, and reins on a sleeve. The second image's horse didn't have reins in its mouth, and the mouth itself looked odd. Both images are missing the stethoscope and appear slightly blurry, which could be attributed to the stylize parameter. Thus, adjusting these settings is essential to generating the desired image.
AI generated image of a female veterinarian of colour wearing blue scrubs with a horse
An alternative AI generated image of a female veterinarian of colour wearing blue scrubs with a horse
This exercise was helpful in understanding the evolution of prompts and what to expect from an AI image generator. Although these images are passable at first glance, they do not meet our quality standard for presenting to a client. However, with further refinement, it is possible to generate an appropriate image. Since AI-generated images are generally not protected by copyright, clients can use them as they see fit.
In summary, AI image generators are computer algorithms that use machine learning techniques to generate images that resemble real-world images. These generators can create images from scratch or modify existing images to produce new variations.
There are many AI image generators available, each with its own strengths and weaknesses, and it is important to understand the intention of the image output you are looking for. Three main platforms, MidJourney, Stable Diffusion, and DALL-E 2, have been discussed, highlighting their differences in image output, syntax, and algorithms.
AI generators are a valuable tool in the ideation phase of a project, allowing for quicker visual studies, experimentation, faster iteration and validation. However, there are limitations to these programs, and human input is still necessary in some cases. Setting your intentions early and using prompt descriptor tips can promote an expert level of efficiency, helping to improve articulation and collaboration. Ultimately, AI generators can challenge us to become better at conceptualizing our vision and communicating it to our team.
Lead UX/UI Web Designer
Ray has over 10 years' experience as a Product Designer building websites, apps, SaaS products, and client portals from the ground up. She is passionate about user psychology and design. Her skills and interests extend into animation, video editing and graphic design. She puts 110% into every project and aims to exceed client expectations. When she isn't at her computer you can catch her with a good book, volunteering, or on the slopes.