Why Midjourney’s New Image-to-Text Generator Is an Accessibility Home Run

Look! AI can be used for good, too

  • Midjourney's new tool can create rich descriptions from images.
  • Text descriptions are essential for the screen reader software used by blind and visually impaired people. 
  • AI could open up many more areas of accessibility. 
Someone editing photo metadata on a laptop computer.

Pexels / Mockup Photos

Instead of turning a text prompt into an image, AI image-creation company Midjourney can now turn a picture into text. 

Midjourney now offers a clever twist on its AI image software, using its powerful machine-learning algorithms to generate text descriptions of already-existing images. This makes detailed image captions trivially easy to make and could totally change the game for blind people when it comes to pictures. 

"Midjourney's new image-to-text generator is a major advancement for accessibility. This technology allows visually impaired individuals to experience visual content in imaginative ways by generating descriptions of images that would otherwise be inaccessible to them. This means that people who are blind or have low vision can now fully participate in visual content, such as social media, web articles, and even online shopping," Dan Trichter, co-founder of Accessibility Checker, told Lifewire via email.

Good Image Descriptions Are Crucial

If you post an image on Twitter or Mastodon, that image is invisible to anyone who cannot see it. The best practice is to describe the image and add that description as alt text (on the web, the image's 'alt' attribute), which associates the description with the image. This description can then be used by accessibility software. 
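As a rough illustration of how a description gets attached to an image on the web, here is a minimal Python sketch that builds an HTML image tag with the description in its alt attribute. The helper name `img_tag` and the file name are made up for this example; social apps handle this step for you behind their "add description" field.

```python
from html import escape


def img_tag(src: str, description: str) -> str:
    """Build an <img> tag whose alt attribute carries the description.

    img_tag is a hypothetical helper written for this illustration.
    """
    # Escape both values so quotes or angle brackets in the text
    # cannot break out of the attribute.
    safe_src = escape(src, quote=True)
    safe_alt = escape(description, quote=True)
    return f'<img src="{safe_src}" alt="{safe_alt}">'


# A screen reader would read the alt text aloud when it reaches the image.
print(img_tag("pasta.jpg", "A bowl of spaghetti with basil on a wooden table"))
```

A screen reader never looks at the pixels. It reads whatever text is stored in that attribute, which is why an empty or missing description leaves the image effectively blank.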

The iPhone's built-in screen reader, for example, speaks to the user to tell them what their finger is passing over right now, from buttons and other controls to text and, of course, image descriptions. 

"Having good image descriptions is crucial for accessibility because it provides an equal opportunity for everyone to understand and enjoy content. Without descriptions, those who are visually impaired would be excluded from visual information, limiting their access to important information, entertainment, and social interactions," says Trichter.

Of course, hardly anybody takes the time to add these descriptions. Even on Mastodon, where the software makes it easy to add them and, in many cases, even prompts you to do so, it's still an effort. And to be honest, it's an effort that humans should not have to make. That's precisely the kind of busywork that should be done by computers. Until now, though, they haven't been very good at it. 

If Midjourney's new image-to-text tool could easily be incorporated into social media apps, blog engines like WordPress, and even photo-library apps like the ones on our phones, then detailed descriptions could be added to every photo we upload to the internet automatically. And because text takes up way less storage space than images, the overhead is negligible. 
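To make that idea concrete, here is a small Python sketch of what an upload step might do: fill in a description only where the person posting has not written one, then compare the size of the text to the size of the image. Everything here is an assumption for illustration. `fake_describe` is a stand-in for a real image-to-text model, and the photo records and byte counts are invented; none of it reflects how Midjourney or any particular app actually works.

```python
from typing import Callable


def add_missing_descriptions(
    photos: list[dict], describe: Callable[[bytes], str]
) -> list[dict]:
    """Return copies of the photo records, filling in 'alt' where it is empty."""
    result = []
    for photo in photos:
        record = dict(photo)  # copy, so the caller's records are not modified
        if not (record.get("alt") or "").strip():
            # Only generate a description when a human has not supplied one.
            record["alt"] = describe(record["data"])
        result.append(record)
    return result


def fake_describe(image_bytes: bytes) -> str:
    # Hypothetical stand-in for an image-to-text model.
    return f"Placeholder description for a {len(image_bytes)}-byte image"


# Invented example uploads: one without a description, one with.
photos = [
    {"name": "pasta.jpg", "data": b"\x00" * 2_000_000, "alt": ""},
    {"name": "dog.jpg", "data": b"\x00" * 1_500_000, "alt": "A dog on a beach"},
]

described = add_missing_descriptions(photos, fake_describe)
for p in described:
    # Ratio of description size to image size, in bytes.
    overhead = len(p["alt"].encode("utf-8")) / len(p["data"])
    print(p["name"], "|", p["alt"], "|", f"{overhead:.6%} of image size")
```

Keeping a human-written description when one exists is a deliberate choice in this sketch: the person who took the photo usually knows best what matters in it.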

More AI Accessibility

Image captions can be handy for all kinds of things, not just screen readers. We're already used to being able to search our photo libraries for dogs, bikes, plants, and so on, and the iPhone's photo search, for example, seems to gain more depth every year. 

But imagine if this search could take the same kind of leap that AI has brought to image generation over the past year. If your images contained deep, detailed descriptions, you might never again have to scroll through thousands of images to find that amazing pasta dish you had that one time. And it's not just images. AI could help with accessibility in other ways too.

"Yes, AI can definitely help improve accessibility in other ways beyond image descriptions. For example, AI can be used to automatically generate captions for videos, which can be helpful for people who are deaf or hard of hearing. AI can also be used to improve speech recognition and natural language processing, which can benefit people who have difficulty typing or using a mouse," education counselor Johnson Adegoke told Lifewire via email. 

Someone editing photos on desktop and laptop computers.

Glenn Carstens-Peters / Unsplash

So far, we've mostly seen the worst aspects of AI in terms of processing or creating images, videos, and audio: harvesting copyrighted work without the permission of creators and using it to put those creators out of work. But the same kinds of AI tools, as we have seen, can have legitimately good uses too.

One possibility to hope for is that Apple, which has excellent accessibility tools across its platforms, will build something like this into its VoiceOver screen reader, generating image descriptions on the fly. 

The benefits of AI-driven accessibility are evident and profound. It's just a shame that we have to put up with all the rest of it to get the good stuff.
