Real-Time Subtitles and Translations May Be the Future of Video Chat

And maybe some comedy swearing

Key Takeaways

  • Navi uses SharePlay, and Apple’s built-in speech-to-text, to offer subtitles and translations in FaceTime.
  • It’s far from perfect but already good enough. 
  • Subtitles are great for accessibility.

Multicultural video call as seen through the computer screen.

Jasmin Merdan / Getty Images

Navi is an app that adds live subtitles, and real-time translations, to your FaceTime calls. 

The app uses SharePlay and built-in Speech Recognition to add subtitles and translations in 20 languages to your FaceTime calls. It’s an incredible use of SharePlay, which most of us regard as a gimmicky way to watch synced movies with people in other places. You might not need to fire your translator just yet, but an app that does this well could be insanely useful.

“I’m not getting the audio from the FaceTime call,” writes Navi developer Jordi Bruin on Twitter, “but using SharePlay to share it amongst the participants in the call.”


SharePlay is a new feature in iOS 15.1 and macOS 12.1 that lets you share and synchronize things in FaceTime calls. In the movie-watching example above, any participant can pause or play the movie while you all chat in the FaceTime call. The FaceTime video stays open in a small, floating, picture-in-picture panel, and each participant runs the app locally on their device. SharePlay's trick is to sync whatever is happening in these local apps, so everyone shares the experience, be it a movie, a Fitness+ workout, or a spreadsheet.

Navi uses the same tech, only the in-call app isn't a movie—it's a real-time translation engine. To use it, you launch the app while in a FaceTime call and tap the 'Turn On Subtitles' button. Then, other participants can also join the action and see live subtitles for the current speaker. If somebody is monologuing, their speech bubble grows and sticks around a little longer.
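
To make that flow concrete, here is a minimal sketch in Python. It is not Navi's code and uses no Apple APIs. It assumes, based on Bruin's description, that each device turns its own user's speech into text locally and that only the text is shared with the other participants, who translate it if needed. The names SharedSession, Participant, Caption, and toy_translate are all hypothetical, and the "translator" is just a tiny phrasebook lookup.

```python
class Caption:
    """A piece of recognized speech, shared as plain text."""
    def __init__(self, speaker, language, text):
        self.speaker = speaker
        self.language = language  # language code of the text, e.g. "en"
        self.text = text


class SharedSession:
    """Hypothetical stand-in for a SharePlay-style session: it relays
    each caption to every participant except the sender."""
    def __init__(self):
        self.participants = []

    def join(self, participant):
        self.participants.append(participant)
        participant.session = self

    def broadcast(self, sender, caption):
        for participant in self.participants:
            if participant is not sender:
                participant.receive(caption)


class Participant:
    def __init__(self, name, language, translate):
        self.name = name
        self.language = language
        self.translate = translate  # function(text, source, target) -> str
        self.session = None
        self.subtitles = []         # what this participant sees on screen

    def speak(self, recognized_text):
        # Assumption: the text comes from on-device speech recognition.
        caption = Caption(self.name, self.language, recognized_text)
        self.session.broadcast(self, caption)

    def receive(self, caption):
        text = caption.text
        if caption.language != self.language:
            text = self.translate(text, caption.language, self.language)
        self.subtitles.append(f"{caption.speaker}: {text}")


# Toy translator: a phrasebook lookup that falls back to the original text.
PHRASEBOOK = {("en", "es", "good morning"): "buenos días"}

def toy_translate(text, source, target):
    return PHRASEBOOK.get((source, target, text.lower()), text)


session = SharedSession()
alice = Participant("Alice", "en", toy_translate)
bruno = Participant("Bruno", "es", toy_translate)
session.join(alice)
session.join(bruno)

alice.speak("Good morning")
print(bruno.subtitles)
```

In this sketch, only Bruno gets a subtitle for Alice's greeting, translated into his language. Sharing text rather than audio keeps the messages small, but it also means any recognition mistake on the speaker's device is passed along to everyone else.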

A screenshot from the Navi app.

For the deaf, this could mean the difference between calling people or not. And for anyone, it means you can have useful conversations between people who don't share a language. 

Universal Text

The internet is built on text, and that’s great. It’s small and easy to create, read, and translate. It’s also simple to turn into synthesized speech. The result is that anyone from anywhere can participate in any conversation. Language is no barrier, and neither is deafness or any kind of blindness—as long as you’re using a device with good accessibility tools for impaired sight or hearing. 

But the spoken word is much harder to process. Speech-to-text dictation is impressive, but it’s only relatively recently that speech recognition has gotten good enough for general use. Apple’s Translate app is a good example. Introduced in iOS 14, it offers real-time audio translations. If we still went on foreign vacations, it would be perfect.

Now we use video more and more for work and to stay in touch with friends and family. No matter how we work in the future, the barrier to video calls has been thoroughly smashed. It’s now a common tool, but it lacks a lot of the finesse of written communication tools. 

Something like Navi, which offers real-time subtitles and translation, could be significant. Accessibility is one aspect, but the ability to converse with people whose language you do not speak opens up international business to a startling degree. 

Screenshots from the Navi app.


In Action

I tested Navi with app developer, author, and hearing-aid user Graham Bower. It's pretty good but not yet ready for critical tasks. Some of the transcriptions were comically bad and too vulgar to relate. As our conversation went on, though, it got a lot better at accurately recognizing his speech. That makes sense because the iOS dictation engine adapts to your voice over time. 

The translation also worked, although the quality of its translations depends on the accuracy of the input. 

It's easy to project this kind of technology into future Apple Glasses or whatever rumored AR/VR product Apple is working on this week.

"I can see this working in AR glasses," said Bower during our conversation. "Some people, even with normal hearing, prefer subtitles in movies. This would be like subtitles for real life."

While an impressive tech demo, Navi isn't there yet. For reliable business use, Apple's speech recognition, the first step before any translation, will have to get a lot more accurate. But speed-wise, it's fine, and the translations are as good as any.

But we're on the path now, and this kind of thing will only get better.
