The Bullet: Generative AI — The Good, the Bad, and the Scary

This week, let’s dive into artificial intelligence and, more specifically, an AI podcast created by Play.ht, a company focusing on AI-powered text-to-voice generation. Sound weird? It is, but it’s also fascinating.

Welcome to podcast.ai, “a weekly podcast that explores a new topic in depth, entirely generated by artificial intelligence.”

Here’s how it works: each week, the creators choose the two guests based on the most up-voted user suggestions. The debut episode featured Joe Rogan interviewing Steve Jobs, and if the developers continue the trend, next we’ll have Buddha and Einstein, followed by Donald Trump interviewing himself. Fun.

But why? Why does the world need an AI-generated podcast when there are already millions of podcasts out there covering every niche subject that people could possibly be interested in?

Simple answer: because the technology exists and people have enough time on their hands.

Formal answer: to improve generative AI’s capabilities and inspire more people to develop audio and visual AI tools.

Now let’s dive into how each podcast is made. Once the guests are selected, the episodes are rendered using Play.ht’s ultra-realistic voices, and transcripts are generated with fine-tuned language models. In essence, the AI scours the web for sound bites, clips, videos, and text (including a notable figure’s online biography). From these, it creates a persona and speech pattern, along with a tone and inflection akin to those of the person the AI is, for lack of a better word, impersonating.

No new technology starts out great. Just think of how painfully slow it was to develop a website back in 1994 (complete with the electronic beeps every millennial associates with starting up a connection) or how slowly Twitter was adopted when it first launched. But for any new technology to develop, it needs to be used, and that’s exactly what podcast.ai is aiming to achieve: better AI (specifically speech synthesis) through what appears to be a fun and silly podcast.

Play.ht states, “we wanted to push the boundaries of what is possible in the current state of speech synthesis; we wanted to create content that can inspire others to do the same.”

It then goes on to say, “we believe in a future where all content creation will be generated by AI but guided by humans.” A scarier sentiment, especially for those of us who are writers and artists.

While I think podcast.ai is a cool concept, and it was interesting to listen to the first episode (despite Jobs’ creepy laugh), the technology also unnerves me, much as deepfake technology does. If Play.ht is working towards being able to clone any voice with perfect resemblance, what does that mean for the world? As if “fake news” outcries weren’t enough, what evidence will we have to support our claims that we did NOT, in fact, say something potentially horrible, despite a recording of our voice saying exactly that?

I support anyone working towards pushing the envelope on what’s possible with today’s technology, and while an AI-generated podcast is a cool idea, I think it’s worth keeping an eye on how this technology evolves and ultimately what it will be used for, besides bringing back deceased legends for a 20-minute podcast.

All opinions expressed in this piece are the writer’s own and do not represent the views of KrASIA. Questions, concerns, or fun facts can be sent to [email protected].
