Sociology and AI Text-to-Image: A Practical Guide

Example of a Midjourney picture: “City of Rainbows at Dawn”

AI text-to-image — where users input textual descriptions into an Artificial Intelligence website or app to generate a unique picture — is a new way for sociologists to understand society and teach sociology.

This post provides a practical guide for sociologists to use AI text-to-image for research and classroom instruction.

What is AI Text-to-Image?

There are many definitions, but in short, AI text-to-image refers to generating new pictures from textual descriptions, or “prompts,” that sociologists can input into an AI program.

The textual description can be anything: “a person sitting on a bench” or “a dog at a fire hydrant.” It can be a “Jean-Michel Basquiat painting of the Washington Monument” or “Superman flying above Metropolis.”

The text-to-image commands, or “prompts,” can be simple or complex. It’s best to start simply, and then try more complex prompts.
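The start-simple approach can be sketched in a few lines of Python. This is only an illustration, not part of any AI tool: the function name `layered_prompts` and the added descriptors are hypothetical, and the code merely builds text strings that one could paste into a text-to-image website.

```python
def layered_prompts(base, descriptors):
    """Return prompts that start with the base and add one descriptor at a time."""
    prompts = [base]
    for descriptor in descriptors:
        # Each new prompt extends the previous one, so complexity grows step by step.
        prompts.append(f"{prompts[-1]}, {descriptor}")
    return prompts


# The base prompt is one of the examples above; the descriptors are hypothetical.
for prompt in layered_prompts("a person sitting on a bench", ["in a city park", "at dawn"]):
    print(prompt)
```

Running each prompt in turn shows how the picture changes as the description grows more detailed.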

AI Text-to-Image Websites for Everyday Sociologists

With the introduction of DALL-E, the popularity of AI text-to-image is skyrocketing. Many new websites have appeared, and existing websites are adding text-to-image features. We will look in-depth at two of them: DALL-E and Midjourney.

DALL-E

A simple and powerful one is DALL-E. It looks like this:

Screenshot of DALL-E

The interface resembles Google’s search page: you enter the text into the box and click “Generate.”

Let’s try one.

DALL-E provides us with four pictures that are variations of the text. These pictures have never been created before. They are a product of DALL-E’s AI and you.

DALL-E Pros and Cons

It is simple to use. But there are some issues.

As you can see, DALL-E has a well-known problem with generating human faces, though it will likely solve the problem within months.

Can anyone use it? Not yet: DALL-E is only for those whom it has “invited.” You can ask OpenAI, the company that runs DALL-E, to put you on the waitlist. It is easy to do. You can access the form for the waitlist here.

Simple and Complex AI Text-to-Image Websites for Sociology

Simpler tools are available in Canva, which has unveiled a new text-to-image feature. So has TikTok. More will follow.

There are more complex versions that require advanced knowledge to use: Google Imagen, Midjourney, DeepAI, Hugging Face, and Stability AI, which uses Stable Diffusion. They require some complex steps, but sociologists can access them.

Of these, Midjourney is the “simplest” because you can use it within a well-known interface.

Midjourney

Midjourney’s interface is Discord, which is like Zoom and Skype, in that one can communicate over the internet with text, audio, and video. It is used by gamers (video game enthusiasts), but more and more sociologists are using it for teaching sociology and for research meetings. It takes some getting used to, but one can quickly learn it.

The interface looks like this:

Midjourney Pros and Cons

Midjourney is excellent for complex pictures. Sociologists can use a variety of prompts to generate interesting pictures for their PowerPoint slides at research conferences and for teaching sociology.

However, Midjourney costs 10 USD per month to generate up to 200 images a month. Also, it has a steeper learning curve than DALL-E. Sociologists would first have to learn about Discord, and then they would have to learn Midjourney prompts.

But you can learn them both relatively quickly! Midjourney itself explains how to create pictures. There are also many excellent tutorials on YouTube. For example:

YouTube is an excellent search engine for guides. If you don’t like this YouTube guide, you can choose another. There are many to choose from.

Text-to-Image Struggles with Abstract Sociological Concepts

Text-to-image AI is built on millions and millions of images that it scraped from the web. It can picture well-known things, such as a person sitting on a bench. However, it struggles with abstract sociological concepts. We explored this a bit in a previous post, but not with a comparison.

Let’s try a comparison between DALL-E and Midjourney on the sociological concept of “inequality.” The prompt is “inequality,” with no other descriptors. What does it show?

DALL-E “inequality”

Yikes! DALL-E translated “inequality” into a jumble of letters and images. And who knows what “quay quaity” means! Let’s see Midjourney.

Midjourney “inequality”

Midjourney’s pictures are more interesting than those of DALL-E on this sociological topic, but it would require an art major to decipher what these AI-generated images of “inequality” mean.

As we can see, text-to-image AI struggles with abstract concepts. It is best to envision a physical object that is connected to an abstract one.

For example, to picture how economic inequality is connected to political inequality, here is the result of the Midjourney prompt “the Earth wrapped in dollar bills”:

Uses of AI Text-to-Image for Sociology Research and Teaching

While this powerful tool will improve over time, it has already shown many interesting uses for sociology research and teaching. How can sociology researchers, students, and teachers use AI text-to-image?

Research Questions

The AI tools suggest research questions that sociologists can ask and answer.

1. Stereotypes: What do these images say about society?

What can we learn from the images that society produced and that OpenAI (DALL-E) and Midjourney fed into their algorithms? Do AI images portray stereotypes, as the outputs of other AI algorithms have?

2. Threats and Opportunities: How are people using AI text-to-image? What occupations are threatened by this tool?

Is this an opportunity for some? What new occupations will be created from it? In short, who benefits, and who does not benefit, from AI text-to-image? Will this tool deepen inequality?

3. Redefining art: How will the widespread use of these tools change how society views art?

What is art? Is AI-generated art “art”? Who is the artist?

4. Ethics: What are the ethical concerns with text-to-image AI?

Contemporary artists are already demanding that these tools remove their images. Since the images are based on other images, are AI images legal? What does copyright mean when it comes to AI-generated images? What comparisons can we make with sampling in hip hop? In hip hop, as with music from time immemorial, artists have “borrowed” parts of others’ music to use in their own music. Is AI-generated art the same thing? Who owns the “prompts” used to generate AI art?

Teaching Sociology with AI Text-to-Image

Here are suggestions on how sociologists can use AI text-to-image in the classroom.

1. Illustrate sociological concepts

Asking students to illustrate sociological concepts with AI text-to-image is a feasible and exciting class assignment, as young students are typically software-savvy, perhaps more so than the teacher. It is ideal for group work, as students within the same group can run different Midjourney prompts for the same sociological concept to illustrate sociological themes.

2. Discuss the tool’s potential for deepening inequalities and harm

The sociology class can discuss the research questions posed above. They can also discuss potential abuses of AI text-to-image. While DALL-E does not allow text prompts of sexual imagery or violence, other text-to-image tools do. The potential for abuse is high. How will abuse connect to existing inequalities? Who is more likely to be harmed by AI text-to-image?

3. Discuss the larger issue of AI and society

AI will continue to change society. What we are witnessing now is merely the beginning — it is akin to the dawn of the 1990s internet. The potential is great, but so are the risks. How is society building AI into its social, economic, cultural, and political structures? What will AI do to society?

Conclusion

This practical guide showed what AI text-to-image is, how everyday sociologists can access it, and how they can use it for research and teaching. This powerful new tool democratizes art, but it also poses threats, offers new opportunities, and has the potential to deepen inequalities.

In sum, this new tool is a perfect topic for sociology.

Joshua K. Dubrow holds a PhD from The Ohio State University and is a Professor of Sociology at the Polish Academy of Sciences.

