When we see people transcribing audio files and typing away at their keyboards, it’s easy to think, “This looks like an easy, cushy job. How hard can it be to type what you’re hearing?”
“Harder than some people think,” is the answer. Audio transcription can be a challenging career, with an ever-present demand for efficiency and precision. Mistakes in transcribing audio or video, no matter how small they seem, can cost valuable time, resources, and more. That said, it can also be a very rewarding career path.
In this article, we’ll outline how hard audio transcription can get and what to look for in a transcription company to help get your job done correctly, the first time.
The Process of Audio Transcription
Let’s briefly touch upon the audio transcription process. First, the client sends us the audio or video file(s) they need transcribed. Ditto receives the files and assigns them to the most suitable transcriber, who then uses a variety of transcription tools to create either a clean (edited) or verbatim transcript of the entire recording. The transcriptionist can then add timestamps, formatting, and other accessibility features to the text file according to the client’s requirements. Depending on the provider’s process, raw transcripts go through multiple rounds of quality control and error checking to ensure accuracy before being sent back to the client.
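To make the timestamping step more concrete, here is a minimal Python sketch that turns an offset in seconds into an [HH:MM:SS] label and prefixes it to a speaker-labeled segment. The bracketed format, the speaker-label style, and the helper names `format_timestamp` and `stamp_segment` are illustrative assumptions, not Ditto’s actual tooling; clients often request their own timestamp style.

```python
def format_timestamp(seconds: float) -> str:
    """Convert an offset in seconds to HH:MM:SS (fractions of a second are dropped)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def stamp_segment(start_seconds: float, speaker: str, text: str) -> str:
    """Prefix a transcript segment with a bracketed timestamp and a speaker label.

    The "[HH:MM:SS] Speaker: text" layout is an assumed example format.
    """
    return f"[{format_timestamp(start_seconds)}] {speaker}: {text}"


print(stamp_segment(83.4, "Speaker 1", "Let's get started."))
```

Truncating to whole seconds keeps the labels short; a provider that needs frame-accurate captions would keep the fractional part instead.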
Transcription plays a vital role in several business and government sectors. Law enforcement, legal, healthcare, academic, and business organizations use audio transcription services to speed up their documentation process, improve accessibility and storage, and create digitally searchable files that make their recordings easier to review.
Transcribing Audio Is Not As Easy As You Might Think
If all that still sounds easy, I encourage you to consider one thing: how fast people talk.
Estimates from the National Center for Voice and Speech and other sources indicate that the average English speaker talks at about 150 words per minute (wpm). Another, finer-grained way to measure speech rate is syllables per second (sps).
The average rate in the United States is 5.09 sps, according to Preply.com. Regional differences can push that number higher, with Minnesota averaging 5.34 sps. (Fun fact: The fastest speech rate ever recorded was 637 words per minute, by Steve Woodmore, an English comedian.)
Typing Speeds in Manual Transcription
Meanwhile, the average typing speed is about 40 wpm. Two-finger typists average around 27 wpm, while professional transcriptionists clock between 50 and 80 wpm. Upper limits can reach 100 wpm, and extreme outliers have recorded up to 300 wpm. None of these numbers, however, account for errors. The average error rate when transcribing audio is 6%, or roughly 1 out of every 17 words.
Even at the upper end of professional typing speeds, the math shows that each minute of audio takes about twice as long to transcribe under the best circumstances. One hour of audio at 150 wpm contains roughly 9,000 words; typing that at 80 wpm takes nearly two hours, and at a 6% error rate the result will contain about 540 errors. Once you factor in pauses for rest, review, and research, the time required goes up, although the error rate may come down.
Professional transcription with high accuracy often takes four hours for every hour of recorded audio or video. This 4:1 ratio is the accepted industry standard for converting audio to text.
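The back-of-the-envelope math in this section can be reproduced with a short Python sketch. It uses only the figures quoted above (150 wpm speech, a 6% error rate, and the cited typing speeds) and counts pure typing time. It deliberately leaves out the rest, review, and quality-control time that pushes real-world work toward the 4:1 benchmark. The function name `typing_estimate` is just an illustrative choice.

```python
SPEECH_WPM = 150      # average English speaking rate cited above
AUDIO_MINUTES = 60    # one hour of recorded audio
ERROR_RATE = 0.06     # average transcription error rate cited above


def typing_estimate(typing_wpm: float) -> tuple[float, int]:
    """Return (hours of pure typing, expected errors) for one hour of audio.

    Ignores pauses, research, and proofreading, so real jobs take longer.
    """
    words = SPEECH_WPM * AUDIO_MINUTES        # 9,000 words in the hour
    hours = words / typing_wpm / 60           # minutes of typing converted to hours
    errors = round(words * ERROR_RATE)        # errors at the average error rate
    return hours, errors


for wpm in (40, 80, 100):
    hours, errors = typing_estimate(wpm)
    print(f"{wpm} wpm: {hours:.2f} h of typing, ~{errors} errors")
```

Even at 100 wpm, an hour of audio needs an hour and a half of nonstop typing before any review begins.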
Other Challenges In Audio Transcription Services
The rate of speech isn’t the only problem with transcribing audio to text. The recordings themselves can have problems that make them difficult to transcribe. Here are some common issues transcribers face:
Audio Quality
Poor audio quality makes it difficult to create accurate transcripts. Distortion, echoes, and similar problems can all reduce the clarity of the speech. The most common causes are background noise, low-quality recording devices, compression or file format choices, and corrupted file transfers. This becomes an even bigger problem when the recording needs to be transcribed verbatim.
Terminology and Jargon
Transcription companies like Ditto Transcripts tend to specialize in specific fields, which makes our transcribers more familiar with industry jargon. Assigning a recording from an unfamiliar industry to a transcriber can hurt both accuracy and turnaround time.
One example is financial transcription, which is filled with acronyms and shorthand terms like EPS and E/P ratio.
The legal industry is also notorious for homophones and Latin terms that can confuse new transcriptionists. Examples include counsel vs. council, lien vs. lean, and cession vs. session, along with Latin terms such as certiorari (sur-shee-uh-rai-ry, which can be misheard as “Sure, see you, Harry” at a low enough volume) and in camera (which sounds like a common English phrase but means “in chambers”). The transcriptionist may need more time to complete such tasks, as they may have to look the terms up to ensure accuracy.
Multiple Speakers and Unintelligible Speech
Even good-quality audio recordings can be difficult to transcribe when many people are talking at the same time. The best transcription companies have ways to mitigate this issue. For example, here at Ditto, we use audio enhancements and fine-tuned playback options to better understand what each person is saying.
However, speakers still make mistakes, such as fumbling their phrases, which can make a recording difficult to transcribe even with the best audio equipment and under the best circumstances. Ultimately, some spoken words will be incomprehensible because there is too much interference.
Accents and Dialects
Human speech is inherently messy. A single language often contains hundreds of variations in pronunciation and word choice, some different enough to almost amount to a separate language. Transcribers unfamiliar with an accent or dialect will have difficulty adjusting to it, which can lead to inaccuracies and delayed results.
Is Automatic Transcription Software The Answer… Or Is It?
Automated audio transcription is the process of converting audio or video recordings to plain text using artificial intelligence (AI) or automatic speech recognition (ASR) technologies. This type of transcription leverages the speed of AI to produce lightning-fast audio and video transcripts. Additionally, automated transcription software and speech-to-text services are often cheaper than manual transcription. So, if humans have trouble producing fast transcripts, surely AI is the answer to that problem?
Not quite. Not even close.
Yes, AI is fast. However, that speed comes at the price of accuracy, which is a terrible trade-off no matter how you look at it. The whole point of recording audio or video for transcription is to produce accurate verbatim transcripts that can be easily searched and stored.
Humans, at least, can understand nuance and use context for more accurate results. If an experienced transcriptionist hears certiorari in a legal context, they will likely write it down correctly, even if it sounds like “tertiary.” Automated transcription tools simply barrel through the recording, transcribing what they think they heard without thought or consideration (because they’re incapable of both). Even with advances in natural language processing and machine learning, AI programs and platforms can only be as good as the data they were trained on.
The bottom line: AI is extremely susceptible to the challenges I mentioned above and is only about 86% accurate. Our transcriptionists, on the other hand, can provide transcripts and subtitle files with 99% accuracy.
If your goal is to make your content more accessible, then AI isn’t for you. Furthermore, precise transcripts are non-negotiable in industries like law enforcement, the legal field, and healthcare, where even seemingly minor errors can lead to significant negative consequences.
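To see what that accuracy gap means in practice, the short Python sketch below applies both percentages to the 9,000-word hour of audio described earlier. It assumes accuracy is measured per word, meaning the share of words transcribed correctly (one minus the word error rate). Vendors do not always define accuracy that way, so treat the output as an illustration rather than a benchmark.

```python
WORDS_PER_HOUR = 150 * 60  # 9,000 words at the 150 wpm average speaking rate


def expected_errors(accuracy: float, words: int = WORDS_PER_HOUR) -> int:
    """Number of words expected to be wrong, assuming per-word accuracy."""
    return round(words * (1 - accuracy))


print("AI (86% accurate):", expected_errors(0.86))
print("Human (99% accurate):", expected_errors(0.99))
```

Under that assumption, the difference works out to well over a thousand incorrect words per recorded hour.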
Best Transcription Tips to Improve Accuracy
Demand for transcription services grows every year, and both the companies that need transcripts and the transcriptionists who produce them can use several techniques to ensure high accuracy. To get the most out of audio transcription, companies and service providers can follow these steps:
Companies
- Use quality recording equipment
- Optimize audio settings
- Record using uncompressed/high-quality audio file formats
- Provide necessary context
- Provide clear instructions
- Spell out names and terms that have more than one possible spelling
- Talk at a moderate pace
- Avoid cross-talk whenever possible
- Record in a quiet environment
- Work with reputable manual transcription providers
Transcription Providers Like Ditto Transcripts
- Employ rigorous proofreading and editing guidelines
- Retain and train skilled transcriptionists
- Allow clients to choose their preferred text formats
- Delegate specific transcription tasks to the most appropriate and experienced transcribers
- Use quality playback equipment
- Utilize transcription software and tools like language correction programs, multiple monitors, ergonomic keyboards, foot pedals, etc.
- Use and maintain custom dictionaries for specific industries
- Employ feedback mechanisms
- Encourage collaboration and teamwork
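The custom-dictionary tip can be illustrated with a small, hypothetical Python sketch. A field-specific dictionary maps phrases that are commonly misheard to the term the speaker probably meant. Matches are flagged for a transcriber to confirm, not replaced automatically, because context still decides which word is right. The `LEGAL_DICTIONARY` entries come from the examples in this article, and the `flag_possible_mishearings` function is an assumption for illustration, not a description of any specific product.

```python
import re

# Hypothetical legal-industry dictionary: misheard phrase -> likely intended term.
# A real dictionary would be much larger and maintained per client or field.
LEGAL_DICTIONARY = {
    "sure, see you, harry": "certiorari",
    "tertiary": "certiorari",
}


def flag_possible_mishearings(text: str, dictionary: dict[str, str]) -> list[tuple[str, str]]:
    """Return (phrase found, suggested term) pairs for a human to review."""
    flags = []
    lowered = text.lower()
    for misheard, intended in dictionary.items():
        # Whole-word match so a phrase isn't flagged inside a longer word.
        pattern = r"\b" + re.escape(misheard) + r"\b"
        if re.search(pattern, lowered):
            flags.append((misheard, intended))
    return flags


draft = "Counsel filed a petition for a writ of tertiary with the Supreme Court."
for found, suggestion in flag_possible_mishearings(draft, LEGAL_DICTIONARY):
    print(f"Check '{found}': did the speaker say '{suggestion}'?")
```

Keeping a person in the loop matters here: in a non-legal recording, “tertiary” is usually exactly what the speaker said.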
The Benefits Of Transcription For Audio And Video Are Undeniable
Transcribing audio isn’t just for courtrooms and captioning entertainment content; many industries are realizing that now. Whether you’re running a law enforcement agency or a multinational corporation, transcription can help save you money, improve workflow efficiency, and provide a safe and secure way to store written documents transcribed from your recordings. All that’s needed is to choose the right transcription service provider, and you’re golden.
Ditto Transcripts is a HIPAA- and CJIS-compliant, Denver, Colorado-based transcription company that provides fast, accurate, and affordable transcription services to companies and agencies of all sizes. Call (720) 287-3710 today for a free quote, and ask about our free five-day trial. Visit our website for more information about our transcription services.