A team led by Google scientists has developed a machine-learning tool that can help detect and monitor health conditions by evaluating noises such as coughing and breathing. The artificial intelligence (AI) system, known as Health Acoustic Representations (HeAR) and trained on millions of audio clips of human sounds, might one day be used by physicians to diagnose diseases including COVID-19 and tuberculosis and to assess how well a person’s lungs are functioning.
The researchers, who reported the tool earlier this month in a preprint that has not yet been peer-reviewed, say it’s too early to tell whether HeAR will become a commercial product. For now, the plan is to give interested researchers access to the model so that they can use it in their investigations. “Our goal as part of Google Research is to spur innovation in this nascent field,” says Sujay Kakarmath, a product manager at Google in New York City who worked on the project.
Most AI tools being developed in this space are trained on audio recordings paired with health information about the person who made the sounds. For example, a clip might be labeled to indicate that the person had bronchitis at the time of the recording. The model then learns to associate features of the sounds with the label, in a training process called supervised learning.
Instead, the Google researchers used self-supervised learning, which relies on unlabeled data. Through an automated process, they extracted more than 300 million short sound clips of coughing, breathing, throat clearing, and other human sounds from publicly available YouTube videos.
Each clip was converted into a spectrogram, a visual representation of sound. The researchers then masked segments of each spectrogram so that the model would learn to predict the missing portions. This is similar to how the large language model that underlies the chatbot ChatGPT was taught to predict the next word in a sentence after being trained on myriad examples of human text. Using this method, the researchers created what they call a foundation model, which they say can be adapted for many tasks.
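The masking step described above can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration (not Google's actual pipeline): it hides random patches of a toy spectrogram, and the hidden cells become the reconstruction targets during self-supervised training.

```python
import numpy as np

def mask_spectrogram(spec, patch=4, mask_frac=0.3, rng=None):
    """Randomly hide square patches of a spectrogram.

    In masked self-supervised training, the model sees only `masked`
    and is trained to reconstruct the cells flagged in `mask`.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    masked = spec.copy()
    mask = np.zeros_like(spec, dtype=bool)
    n_rows, n_cols = spec.shape
    for r0 in range(0, n_rows, patch):
        for c0 in range(0, n_cols, patch):
            if rng.random() < mask_frac:
                masked[r0:r0 + patch, c0:c0 + patch] = 0.0
                mask[r0:r0 + patch, c0:c0 + patch] = True
    return masked, mask

# Toy "spectrogram": 16 frequency bins x 32 time frames
spec = np.abs(np.random.default_rng(1).normal(size=(16, 32)))
masked, mask = mask_spectrogram(spec)
# The training target is spec[mask]; the model never sees those values.
```

Because no human labels are needed, this procedure scales to hundreds of millions of clips, which is what makes the YouTube-derived training set usable.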
In the case of HeAR, the Google team adapted it to detect COVID-19, tuberculosis, and characteristics such as whether a person smokes. Because the model was trained on such a broad range of human sounds, the researchers needed only very limited data sets labeled with these diseases and characteristics to fine-tune it.
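This style of fine-tuning often amounts to training a small classifier on top of the foundation model's fixed embeddings. The sketch below assumes that setup; the 64-dimensional vectors are random stand-ins, since HeAR's real embeddings are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for frozen foundation-model embeddings of 200 audio clips
# (hypothetical 64-dimensional vectors, not real HeAR outputs).
X = rng.normal(size=(200, 64))
# Toy binary labels correlated with the first embedding dimension.
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# "Fine-tuning" here trains only a lightweight classifier head on the
# frozen embeddings, which is why a very limited labeled set suffices.
clf = LogisticRegression(max_iter=1000).fit(X[:150], y[:150])
accuracy = clf.score(X[150:], y[150:])
```

The key design point is that the expensive part (learning general audio representations) happened once, on unlabeled data; each disease-specific task then reuses it cheaply.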
On a scale where 0.5 represents a model that performs no better than a random prediction and 1 represents a model that makes an accurate prediction each time, HeAR scored 0.645 and 0.710 for COVID-19 detection, depending on which data set it was tested on — a better performance than existing models trained on speech data or general audio. For tuberculosis, the score was 0.739.
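The 0.5-to-1 scale described above corresponds to the area under the receiver operating characteristic curve (AUROC), a standard metric for this kind of classifier. A minimal illustration of its two endpoints, using scikit-learn:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1, 0, 1])  # toy ground-truth labels

# An uninformative model: identical scores for every example.
random_scores = np.array([0.5] * 6)
# A perfect ranker: every positive scored above every negative.
perfect_scores = np.array([0.1, 0.2, 0.9, 0.8, 0.3, 0.7])

auc_random = roc_auc_score(y_true, random_scores)    # 0.5: chance level
auc_perfect = roc_auc_score(y_true, perfect_scores)  # 1.0: always correct
```

On this scale, HeAR's reported 0.645 and 0.710 for COVID-19 and 0.739 for tuberculosis sit well above chance but short of a perfect classifier.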
The fact that the original training data were so diverse — with varying sound quality and human sources — also means that the results are generalizable, Kakarmath says.
Ali Imran, an engineer at the University of Oklahoma in Tulsa, says that the sheer volume of data used by Google lends significance to the research. “It gives us the confidence that this is a reliable tool,” he says.
Imran leads the development of an app named AI4COVID-19, which has shown promise at distinguishing COVID-19 coughs from other types of cough. His team plans to apply for approval from the US Food and Drug Administration (FDA) so that the app can eventually move to market; he is currently seeking funding to conduct the necessary clinical trials. So far, no FDA-approved tool provides diagnosis through sounds.
The field of health acoustics, or ‘audiomics,’ is promising, says Yael Bensoussan, a laryngologist at the University of South Florida in Tampa. “Acoustic science has existed for decades. What’s different is that now, with AI and machine learning, we have the means to collect and analyze a lot of data at the same time.” She co-leads a research consortium focused on exploring voice as a biomarker to track health.