Automation Doesn’t Have All the Answers
- Is automated transcription faster than human transcription? Yes.
- Is automated transcription more accurate than human transcription? No, and it’s not even close!
Now that we’ve answered these two fundamental questions, determining which is the best transcription service for your needs depends, well, on your needs.
Artificial intelligence (AI) speech recognition software is gaining more attention with the popularity of systems like Siri, Alexa, and Google Home. These programs are fantastic for answering questions regarding directions or the day’s weather forecast. AI programs also perform simple commands such as turning appliances on and off and raising your garage door.
Have you ever asked Siri or Alexa a question they didn’t understand and couldn’t answer? That’s because speech recognition software isn’t as accurate as the human ear when transcribing complex information, especially when detail is paramount.
The reality is that artificial intelligence has not reached the point where it outperforms a human doing the same job. Many of the top researchers and leaders in the industry recognize this fact. The shortcomings of AI and speech recognition software explain why companies still require human transcription to meet specific needs.
AI vs. human transcription usage
Common uses for AI transcription
- Speed is more important than accuracy
- Only a rough draft is needed
- You have time to edit the transcribed document
- You are searching for keywords or specific quotes
Common uses for human transcription
- You work in legal, law enforcement, or health care, where accuracy is critical
- You are willing to pay for a top-quality product
- You have no time or desire to edit the document
- Your audio file has multiple speakers or difficult acoustics
AI Fails to Hit 99% Accuracy Rates
Your mom may think you’re perfect, but we can’t say the same for AI.
On the upside, software engineers are working hard to perfect autonomous software. However, AI transcription for detailed audio requirements can’t even begin to compete with the human brain when understanding intricate conversational details. That’s one reason why a software’s algorithm can be easily fooled, whereas human experience can figure out specific words or phrases.
Take, for example, a 1,500-word, two-person interview transcript. Completed by a human with a 99% accuracy rate, it contains no more than 15 errors. Most of these will be slight punctuation issues rather than major issues that affect the text’s meaning.
Compare this to the same 1,500-word document completed by AI at a 50% accuracy rate, which translates to 750 errors. That’s a Grand Canyon-sized difference!
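The arithmetic behind that comparison can be sketched in a few lines of Python. This is only an illustration: the function name `max_errors` is ours, and it assumes accuracy is measured per word, so each percentage point of inaccuracy is one wrong word per hundred.

```python
def max_errors(word_count: int, accuracy_percent: int) -> int:
    """Upper bound on errors for a transcript at a given accuracy rate.

    Assumes accuracy is counted per word (an assumption made for this sketch).
    """
    return word_count * (100 - accuracy_percent) // 100


human_errors = max_errors(1500, 99)  # 15
ai_errors = max_errors(1500, 50)     # 750

print(f"Human at 99%: up to {human_errors} errors")
print(f"AI at 50%: up to {ai_errors} errors")
print(f"That is {ai_errors // human_errors} times as many errors to fix")
```

Swap in your own word count and the accuracy rate a provider quotes to see how many corrections you may be signing up for.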
The MIT Technology Review explains that AI transcription shortcomings are due to relational reasoning. “Relational reasoning is the ability to consider relationships between different mental representations, such as objects, words, or ideas. This kind of reasoning is both crucial to human cognitive development and vital to solving just about any problem.”
Having a person do your transcripts means no assumptions are made about what the audio says. Only a human can differentiate between what they think they heard and what was actually in the audio.
Speech Software Best Recognizes a “General American” Accent
How does the “General American” accent sound? The best example may be a “Midwestern” accent, or an accent from the interior U.S. states.
Consider the thickest or heaviest accent you’ve encountered from someone who lives in the Northeastern or Southeastern U.S. Can you see why automated software programs have difficulty not only recognizing accents but also understanding how individuals use certain words and phrases?
And what about foreign accents? Individuals in some South American countries often have difficulty understanding a native of Spain, even though Spanish is the native language of both regions.
The Economist cited a recent study by Rachael Tatman of the University of Washington, who was curious about how AI handles accents. Tatman researched which accents fared best and worst with speech recognition software and devices. Of the five accents evaluated, Scottish speakers fared the worst, with more than 50 percent of their words transcribed incorrectly.
Interestingly, some researchers suggest that AI speech software is biased against some underrepresented groups.
“Having to adapt our way of speaking to interact with speech recognition technologies is a familiar experience for people whose first language is not English or who do not have conventionally American-sounding names. I have stopped using Siri because of it,” wrote Claudia Lopez Lloreda in Scientific American.
Developers are aware of this problem and continue to look for ways to improve the software. Nonetheless, the sheer number of worldwide accents creates several challenges for AI. We must accept that it could be years before speech recognition software can meet or exceed the accuracy rate of human transcription, regardless of the accent.
Transcription pricing comparisons are often misleading
Here’s an excellent example of AI transcription services admitting their shortcomings when accents and factors such as sound quality pose a challenge.
An online transcription service advertised automated transcription services for $0.10 per minute. However, if the speaker has an accent, the guaranteed accuracy rate slips to 60 percent.
The same company advertises manual (human) transcription at $0.80 per minute. However, add-ons for less-than-perfect audio files quickly escalate the price.
An independent review noted the following regarding their manual pricing model:
“These prices are for clean files of American speakers, with an additional charge of $0.50 a minute for speakers with an accent, a noisy background, or a poor audio quality file.”
Let’s take a closer look at this quote. For a 36-hour turnaround with manual transcription, you will pay a per-minute rate of $0.80. However, if you require strict verbatim transcription, you’ll pay an additional $0.25 per minute. Noisy audio and a speaker with an accent cost an additional $0.25 each. This particular service is also known to use transcribers located outside of the U.S.
It’s easy to see how low advertised prices can quickly increase!
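To see how the add-ons stack up, here is a rough sketch using the per-minute figures from the breakdown above. The 60-minute file length and all names in the code are assumptions for illustration only, not the service’s actual pricing tool. Prices are kept in cents to avoid rounding surprises.

```python
# Per-minute prices in cents, taken from the breakdown above.
BASE_CENTS = 80
ADD_ONS_CENTS = {
    "strict_verbatim": 25,
    "noisy_audio": 25,
    "accented_speaker": 25,
}


def per_minute_cents(add_ons=()):
    """Advertised base rate plus every add-on that applies."""
    return BASE_CENTS + sum(ADD_ONS_CENTS[name] for name in add_ons)


def total_dollars(minutes, add_ons=()):
    """Total cost in dollars for a file of the given length."""
    return minutes * per_minute_cents(add_ons) / 100


minutes = 60  # hypothetical file length
advertised = total_dollars(minutes)
with_add_ons = total_dollars(minutes, tuple(ADD_ONS_CENTS))

print(f"Advertised: ${advertised:.2f}")
print(f"With all three add-ons: ${with_add_ons:.2f}")
```

With all three add-ons, the per-minute rate climbs from $0.80 to $1.55, so the hypothetical one-hour file costs $93.00 rather than the $48.00 the advertised rate suggests.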
Pricing Note: We highly recommend you call and speak with a qualified individual at any transcription service company regarding specific pricing options. This way, once the audio file is reviewed, you’ll receive a more detailed and accurate quote.
In Health Care, Legal, and Law Enforcement, Accuracy Matters
Is accuracy important when a physician is making notes for your medical chart? You better believe it!
Recent articles evaluated whether or not the use of voice recognition software helped physicians become more productive.
The MGMA (Medical Group Management Association) and the AMA (American Medical Association) found that physician productivity dropped when AI software was used. Medical conversations, such as those between doctor and patient, are typically too complicated, and AI transcription cannot understand the thousands of unique words and phrases.
Legal depositions and law enforcement interviews are further examples of where AI transcription isn’t the best option. What if law enforcement personnel are interviewing someone suspected of committing a serious crime? Is accuracy important when transcribing the interview? It certainly is!
One Way or Another, You Pay for Accuracy
AI transcription isn’t close to keeping up with the accuracy standards needed for transcripts in the medical, legal, law enforcement, and financial industries.
Customers who try AI and then switch to using a transcription company like ours say that working with human transcriptionists is much easier. Indeed, a medical transcriptionist will ensure the utmost precision. Working with an actual human is more accurate and affordable in the long run, not to mention the quick turnaround a transcriptionist provides.
The Bottom Line on AI vs. Human Transcription
AI and speech recognition software have a long way to go before they can take over a human’s job. Depending on the audio quality, a well-trained transcriptionist will mishear far fewer words, if any, because of your accent or how softly you speak. One way to get a 99.9% accurate transcript is to give the audio to a human and leave the rest to them.
At Ditto Transcripts, we take pride in our ability to deliver a superior quality product at a reasonable price. It’s called value, and that’s what we strive to provide on every single transcript we type.