Why Human Transcription Still Beats AI

Artificial intelligence is taking over the world, it seems, though not in the post-apocalyptic way usually depicted in movies like The Terminator and The Matrix. AI is now in everything, from making calls to driving to shopping. Automated speech recognition (ASR) for transcription existed long before the current AI boom, but it is now reaping the benefits of the technology.

Google, Amazon, and Microsoft have ASR programs and platforms boosted by AI implementations. Several transcription companies are doing the same. As a result, AI-powered transcription solutions are saturating the market, and many companies are turning to ASR for their transcription needs. But is automated speech recognition really better? And does traditional transcription still work? Let’s talk about it.

The Undeniable Strength of AI Transcription Services

Now, I won’t sit here and deny that AI is an excellent piece of technology. It has many applications in various industries, and companies should rightfully put them to use. One of the main strengths of AI is that it can parse and process information at speeds unlike anything we’ve seen or developed before. That’s to be expected, since AI runs on specialized silicon and parallel processing at computational speeds that far outstrip anything our electrically charged natural grey-matter processors (AKA our brains) can ever achieve. That means AI can produce transcripts faster than human transcribers can.

But, like anything, there’s a time and a place for using AI, and transcription might not be the best use for it. The transcription process requires an exceptional degree of accuracy, considering the industries and situations that depend on it. Law enforcement work, legal discussions, courtroom hearings, corporate board meetings, academic and market research, medical transcription, and insurance reviews are just a few instances where transcription is vital. For all of them, accuracy is key.

The Weakness of AI Transcription Services

I know I’ve said this before, but this bears repeating: the consequences of inaccurate transcription are heavy, far-reaching, and unpredictable. Miscommunications, legal ramifications, loss of credibility, misinformation, operational errors, medical errors, negative financial consequences, damaged relationships, and time and resource waste are some potential effects of incorrect transcripts. 

For all its speed, AI still cannot transcribe with acceptably accurate results. 

AI transcription work is at the mercy of background noise, overlapping speakers, different accents and dialects, and poor audio file quality. It cannot identify nuance and utilize context to create a more accurate verbatim transcription — things that come naturally to experienced human transcriptionists. 

In a statement to IBM, Julia Hirschberg, a professor and chair at the Department of Computer Science at Columbia University, explained why voice recognition software isn’t as accurate as some may think. Hirschberg says, “The ability to recognize speech as well as humans do is a continuing challenge, since human speech, especially during spontaneous conversation, is extremely complex.” 

Speech Recognition Plus Human Transcription?

In a 2022 paper titled Human–machine collaboration in transcription, authors Miller, Jetté, Migüel, and Kokotov said that “Despite improvements, ASR performance is as yet imperfect, especially in more challenging conditions (e.g., multiple speakers, noise, nonstandard accents).”

Their discussion proposes a marriage of automated and human transcription with what they call the human-in-the-loop (HIL) approach. Some sources and experts in the transcription industry refer to this as “automated transcription with human intervention.” The authors are ostensibly on the side of automated transcription. 

However, even they had to admit there’s no point in combining ASR and human transcription if the transcript produced by the initial automated pass has too many errors. They further argue that the current metric for measuring ASR errors (word error rate, or WER) might not be sufficient to express the accuracy of automated transcripts or their effect on productivity. According to them, “It is, therefore, the case that improved WER does not always lead to increased productivity, and the inclusion of ASR in HIL may adversely affect productivity if it contains too many errors.”
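For readers unfamiliar with the metric, WER is commonly computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the ASR output, divided by the number of words in the reference. Here is a minimal sketch, assuming simple lowercase whitespace tokenization; the function name and sample sentences are illustrative and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: two of five words are wrong.
print(word_error_rate("the witness left at noon", "the witness left it new"))  # 0.4
```

Note that every wrong word counts the same in this calculation, whether it is a filler word or a name, a number, or a "not." That is one reason a single WER figure may say little about how much work a human editor still has to do.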

The Speed of Human Transcription

While ASR is undeniably faster than human transcription, the speed at which expert transcriptionists type is nothing to sneeze at, either. The average typing speed is about 40 words per minute (WPM). Entry-level transcribers, meanwhile, average 60 WPM, with more experienced transcriptionists reaching 80-100 WPM when they transcribe audio. Outliers, though few and far between, can go even faster.

The Ethical Cost of AI Transcription

Businesses want to save money while making more — that’s nothing new. However, there has to be a point where we draw the line between padding the corporate earnings report and considering the impact of business practices on other people. 

Case in point: automated transcription puts jobs at risk. There were over 51,000 transcriptionists employed in the United States in 2022, according to Zippia.com, and the transcription industry was valued at a little below $26 billion that same year. The continuous push for automation can severely impact the transcription industry and the transcribers currently employed. This is also nothing new, as workforce displacement has been a concern since the Industrial Revolution.

There might be a few ways to mitigate this concern, such as upskilling and changing roles from baseline transcription to editing and proofreading (as suggested by Miller et al.), management, and maintenance. The transition will be slow and painful but ultimately necessary if AI implementation is really the solution. 

However, the current state of automated transcription does not warrant that sort of industry upheaval. The best automated speech recognition software, even bolstered by AI, can only reach up to 86% accuracy. It doesn’t matter if we’re transcribing entertainment podcasts or life-altering court hearings — 86%, or 14 errors for every 100 words, is simply unacceptable. 
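To put that figure in perspective, the arithmetic can be sketched in a few lines. The 150-words-per-minute speaking rate and the one-hour recording are assumptions chosen purely for illustration, not figures from any source cited here, and the function name is hypothetical.

```python
def expected_errors(word_count: int, accuracy: float) -> int:
    """Approximate number of wrong words in a transcript at a given accuracy."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return round(word_count * (1.0 - accuracy))


# 86% accuracy over 100 words, the figure quoted above.
print(expected_errors(100, 0.86))  # 14

# Assumption: a one-hour recording spoken at roughly 150 words per minute.
words_in_hour = 60 * 150
print(expected_errors(words_in_hour, 0.86))  # 1260
```

Under those assumptions, a single hour of audio would leave well over a thousand words for someone to find and fix by hand.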

The Benefits of Traditional Transcription

We’ve covered accuracy in discussing the merits of traditional transcription compared to automated solutions, and make no mistake, accuracy is one of the most crucial aspects of the human vs. AI transcription argument. But it doesn’t end there; human transcription has various other benefits. Here are some of them: 

Ability to Adapt to Audio Issues, Context, Accents, and Dialects

Human speech is a dizzying amalgamation of sounds, almost all varying depending on region, country, culture, educational background, and other disparate factors despite belonging to the same language. It’s more like ordered chaos than a rigid system of communication. Even worse for ASRs, human speech is constantly evolving. Having an AI work on a recording with accents, terminologies, and speech patterns different from its training data is an exercise in futility. 

As mentioned before, humans are better at understanding humans. They also handle ambiguity better than ASR systems, since they can apply context to the situation or ask for clarification if necessary. That makes them far more reliable in exactly the situations where AI falters.

Handling Complex, Industry-specific Recordings

Transcribing content in specialized fields or industries that require domain-specific knowledge (e.g., legal, medical, law enforcement, business, academia) is often better performed by human transcribers who can understand and interpret the subject matter on a deeper, more instinctual level. 

AI also has problems with complex terminology. Law offices, for example, often use legalese and Latin phrases. An experienced legal transcriber will have no problems with that. AI not trained on that language will have nothing but problems, and the resulting transcript will be of poor quality. I still remember using voice recognition to type “in flagrante delicto” and getting “infle granted delicto” as a result.

Privacy and Security

Human-powered transcription service providers handle sensitive data daily. Most of them offer elevated security to protect their clients’ data, and there are regulatory measures like HIPAA and CJIS compliance that serve as further markers of a secure and trustworthy transcription company. Meanwhile, AI might not be as safe as others would like. Here’s a quick example.

In March 2023, ChatGPT was taken offline to fix a privacy leak that allowed users to see other users’ personal data, payment information, and chat histories. It took a few days from the initial report to the eventual fix, and OpenAI reported that only about 1.12% of users were affected by the error. In the same month, about 1.6 billion people were recorded using ChatGPT. That means about 18 million users had their information freely available to others.

For the transcription industry, that kind of data breach is unacceptable. Imagine HIPAA-protected patient information, classified business processes, and legal case strategies getting shown to other people.

Human Transcription Is Here to Stay

Despite the massive and undeniable strides in automation, traditional, human-powered transcription is still the best transcription option for any industry. AI might be fast, but it’s not accurate, and it comes with several risks that make the service more trouble than it’s worth. Don’t listen to the hype, and make the right choice for your transcription needs. 

Ditto Transcripts is a CJIS- and HIPAA-compliant, Denver, Colorado-based transcription company that provides fast, accurate, and affordable transcription services for law enforcement agencies of all sizes. Call (720) 287-3710 today for a free quote, and ask about our free five-day trial. Visit our website for more information about our transcription services.
