Tools to improve sound quality of audiobooks or podcasts

When I search for an audiobook, my usual choice is Amazon’s Audible (or our Polish Audioteka). This usually ensures good recording quality, and I have no concerns about copyright issues.

However, occasionally, I discover that the best available recording is found on LibriVox, YouTube, or a similar service with a mix of amateur and professional content. This is especially true when searching for not-so-popular books recorded in the not-so-mainstream language I want to practice.

Amateur recording: a list of typical sins

Producing high-quality audio requires a good microphone, a suitable setting, and technical knowledge. In amateur recordings, the voice acting is often terrific, yet the technical setup might be flawed. Some of the common issues are:

Background noise (air conditioning or computer fans)
Poor quality of microphone
Incorrect distance from the microphone
No postproduction that could have compensated for the above mistakes.

Idea: use AI tools to denoise audiobooks and improve quality

When I recorded my podcast a few years ago, the post-production was quite a manual process. I cannot remember any tool allowing any “Intelligent enhance” action that would produce good results back then.

But today, we have AI-powered voice synthesis systems like OpenAI text-to-speech or Azure AI Speech Studio, which produce crystal-clear AI-generated voices that sound like humans.

So, with those rapid technological advancements in mind, I decided to find out if there are any AI-powered tools for enhancing the human voice in flawed recordings.

Denoising audiobooks with AI tools

Enough talk. Let’s look at my test comparing a few tools that promise to do the job. I recommend headphones to hear nuances like the level of noise or how pleasant the text is to listen to.

Demo 1: Sir Walter Scott – Rob Roy

Demo 2: Jacob Abbott – William the Conqueror

My opinion about the quality of individual tools

I performed a blind test (on just one person, myself 😉) of the outputs above, listening on good noise-canceling headphones. Here’s my subjective rating on a 0-10 scale, where 0 is terrible audio quality and 10 is excellent:

Service used to denoise and enhance the fragment	My subjective rating of output quality (scale 0-10)
Adobe Enhance v2	9.0 (Great)
Auphonic	6.5 (Acceptable)
Riverside	4.0 (Below acceptable)
Cleanvoice	3.5 (Below acceptable)
Raw recording (no postprocessing)	1.0 (Bad and distracting)
Ffmpeg with simple high-pass and low-pass filter	0.0 (Bad and distracting)

Additional comments

Adobe’s Enhance Speech v2 is an online service dedicated to podcasters. It’s easy to use: drag and drop a file to the browser, one-click “enhance,” and download the result.
The free version allows processing only 30 minutes of audio, not enough for audiobooks. The subscription is €13/month, and even with that, it has a limit of 4 hours daily.
Cleanvoice AI has impressive demos. I tested it to see the quality it achieves. Pricing is relatively high; at about €1 per hour of recording, it’s suitable for publishers rather than people who’d like to improve the quality of content for a single listen.
Auphonic: while it’s behind Adobe’s model in quality, it’s a strong second place. Unlike Adobe Podcast, it offers an API and can be used in software and automated processes.
Riverside Magic Audio: I found the UI confusing and unreliable (some actions got stuck with the progress bar at 99%). I hardly knew what I was doing. The output is better than the raw recording, but with the samples I tried, it’s below my threshold of “good enough”.
FFmpeg: just as a point of reference, I also wanted to see if I could improve the raw, noisy recording with the open-source FFmpeg tool and some filters. I tried a few combinations, but the results were worse than the input audio. FFmpeg is great, but it’s not the tool for this job.
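For reference, here is a minimal sketch of what a simple high-pass plus low-pass chain in FFmpeg can look like, driven from Python. It uses FFmpeg's `highpass` and `lowpass` audio filters through the `-af` option. The cutoff frequencies (100 Hz and 8000 Hz) and the file names are assumptions chosen for illustration, not the exact settings I used in the test.

```python
import shlex


def build_ffmpeg_filter_command(src, dst, highpass_hz=100, lowpass_hz=8000):
    """Build an ffmpeg command that applies a high-pass and a low-pass filter.

    The default cutoff values are illustrative assumptions: the high-pass
    removes low rumble below `highpass_hz`, the low-pass removes hiss above
    `lowpass_hz`.
    """
    if highpass_hz >= lowpass_hz:
        raise ValueError("high-pass cutoff must be below the low-pass cutoff")
    # Filters in an -af chain are separated by commas and applied in order.
    audio_filter = f"highpass=f={highpass_hz},lowpass=f={lowpass_hz}"
    return ["ffmpeg", "-i", src, "-af", audio_filter, dst]


if __name__ == "__main__":
    # Hypothetical file names; print the command instead of running it.
    cmd = build_ffmpeg_filter_command("raw.mp3", "filtered.mp3")
    print(shlex.join(cmd))
    # To actually run it (requires ffmpeg on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```

Filters like these only cut frequency bands. They cannot separate the voice from noise that sits in the same band, which is likely why this approach did not help with my samples.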

Tools not tested but with some potential in this area:

I know that Audacity denoises audio quite well, but I excluded it from the test because it’s far from a “one-click enhance” experience. And it isn’t that easy to automate.
The iZotope RX series of tools has been on my radar for a long time, but I found them too expensive for my needs, so I haven’t tested them.

Summary

The technology to enhance audio quality to near-studio levels has advanced in recent years. I hope this trend continues, especially since there’s so much good content created by people who are not professionals in audio production.

In a few years, we might see features like “Enhance audio with AI” and “Enhance video with AI” in players. Or we might not, because I don’t know if there is enough commercial potential to do it.

Today, I think the Adobe Enhance Speech v2 model is a definite winner. The website has a startup-like feel, almost as if it were just a prototype. I also wish it had an API, was cheaper, and had no daily limits. But the quality of output is so good that it’s an obvious winner to me. I actually subscribed to this one to process some of the content I want to listen to 🙂
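To put the limits and prices mentioned in this post into numbers, here is a rough back-of-the-envelope sketch. The 4-hours-per-day limit, the €13 monthly subscription, and the "about €1 per hour" rate come from the comments above. The 12-hour audiobook length is a hypothetical example, and the function names are mine.

```python
import math

# Figures quoted in this post; treat them as approximate.
ADOBE_DAILY_LIMIT_HOURS = 4
ADOBE_MONTHLY_PRICE_EUR = 13
CLEANVOICE_PRICE_PER_HOUR_EUR = 1.0


def days_needed(audio_hours, daily_limit_hours=ADOBE_DAILY_LIMIT_HOURS):
    """Days required to process a recording under a daily processing limit."""
    return math.ceil(audio_hours / daily_limit_hours)


def per_hour_cost(audio_hours, rate_eur=CLEANVOICE_PRICE_PER_HOUR_EUR):
    """Total cost when a service charges per hour of recording."""
    return audio_hours * rate_eur


if __name__ == "__main__":
    book_hours = 12  # hypothetical audiobook length
    print("Days with a 4 h/day limit:", days_needed(book_hours))
    print("Flat subscription, EUR:", ADOBE_MONTHLY_PRICE_EUR)
    print("Per-hour pricing, EUR:", per_hour_cost(book_hours))
```

For a single book, the two pricing models can land close to each other. The flat subscription starts to pay off once you process several books a month, while the daily limit mostly costs you patience.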

If you know any other tools I missed or want to share your opinion, don’t be scared of my poor blog comment system. Leave a few words 😁 And good luck enhancing your content!