Measuring respiratory function is clinically important: it informs the diagnosis of COPD and asthma, tracks disease progression, and helps clinicians make decisions about ventilatory support in diseases that affect respiration, such as amyotrophic lateral sclerosis (ALS). These measurements are generally made with a spirometer, which uses air flowing through a tube into a chamber to measure volume and airflow; to use it, a patient follows a clinician’s instructions for blowing or inhaling through a mouthpiece attached to the tube. To measure forced vital capacity, for example, the patient is asked to take a deep breath and blow into the tube as hard as they can until their lungs are completely empty. This “maneuver” is unpleasant, often induces coughing, and generally requires supervision to ensure that it is done correctly.
With the rise of telemedicine, driven in part by the COVID-19 pandemic, patients with conditions affecting respiration are increasingly evaluated remotely. A full assessment often requires a spirometer, but patients who need respiratory measurements don’t always have one at hand. In a recent article published in Amyotrophic Lateral Sclerosis and Frontotemporal Degeneration, we show that a key respiratory measurement, forced vital capacity (FVC), can be approximated by analyzing sustained phonations. A sustained phonation task is similar to the FVC task in that both ask the patient to inhale maximally. Instead of blowing into a tube, however, patients performing sustained phonation are asked to sustain the sound “ahhh” for as long as they can, until they run out of breath. Both tasks, when done correctly, reflect vital capacity, the amount of air a person can voluntarily exchange. Unlike spirometry, the audio recording requires no special equipment, just a smartphone or tablet.
In this study, we used our Speech Vitals app, downloaded onto participants’ personal smartphones, to collect daily sustained phonations of “ahhh” from a group of patients with ALS over 3 to 9 months. We used this data to build a model connecting measurements taken from the audio-recorded sustained phonation to FVC measured using a spirometer. We then validated this model on an independent dataset of sustained phonation and FVC measures from a separate study with different participants, in which ALS patients recorded sustained phonations at home.
The model is based on the premise that within-person changes in speech features can reflect changes in forced vital capacity. We observed this in the many ALS patients whose respiratory function declined over the course of the study. These declines, illustrated in the plot below, were reflected in both their spirometer FVC and their sustained phonation measures. The top row of plots shows that FVC, as measured by a spirometer (y-axis), declines over time (x-axis). The bottom row of plots shows that sustained phonation duration (y-axis) declines similarly over time (x-axis).
Building on this idea, we constructed a simple statistical model to predict FVC from sustained phonation (“Speech Measure” on the y-axes of the plots above). The model demonstrates the repeatable, fit-for-purpose procedure that we outlined in our last blog post. To avoid uninterpretable models that are overfit to coincidental patterns in the available data, we used only a few interpretable features that we know to be relevant to respiratory measurement, and we controlled for speaker demographics (for example, a speaker’s height and age). Models were built using leave-one-out cross-validation: we fit the model with each study participant left out in turn and verified it on that left-out person. We then validated the model on a different dataset. All of the numbers we report are out of sample.
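As a rough illustration of the leave-one-out procedure described above, here is a minimal Python sketch. It is not the model from the study: it assumes a single feature (phonation duration), leaves out the demographic controls, and runs on invented toy numbers. All names in it (`RECORDS`, `fit_line`, `leave_one_participant_out`) are hypothetical and chosen only for this example.

```python
# Toy records: (participant_id, phonation_duration_s, fvc_litres).
# These values are invented for illustration; they are not study data.
RECORDS = [
    ("p1", 18.0, 4.1), ("p1", 15.0, 3.6), ("p1", 12.0, 3.0),
    ("p2", 10.0, 2.6), ("p2", 8.0, 2.3), ("p2", 6.0, 1.8),
    ("p3", 22.0, 4.8), ("p3", 20.0, 4.4), ("p3", 17.0, 3.9),
]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_participant_out(records):
    """Return out-of-sample (participant, observed, predicted) triples.

    For each participant, the line is fit on everyone else's recordings
    and then used to predict that participant's FVC values.
    """
    results = []
    participants = sorted({pid for pid, _, _ in records})
    for held_out in participants:
        train = [(x, y) for pid, x, y in records if pid != held_out]
        test = [(x, y) for pid, x, y in records if pid == held_out]
        slope, intercept = fit_line([x for x, _ in train],
                                    [y for _, y in train])
        for x, y in test:
            results.append((held_out, y, slope * x + intercept))
    return results


def mean_absolute_error(results):
    """Average absolute gap between observed and predicted values."""
    return sum(abs(obs - pred) for _, obs, pred in results) / len(results)


predictions = leave_one_participant_out(RECORDS)
print(f"Out-of-sample MAE: {mean_absolute_error(predictions):.2f} L")
```

The key design choice is that the split is by participant rather than by recording: no recording from the held-out person is used to fit the line, so every prediction is made for someone the model has never seen.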
Estimates of FVC were accurate, even out of sample on this new dataset. The plot below shows the predicted vs. observed FVC scores on the test sample (the sample from the second study, which was not used for training the model). The correlation between the spirometry measurement and the model estimate was r = 0.80. We also found that the predicted FVC scores declined over time with disease progression.
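For readers who want to see how a correlation like this is computed, here is a short Python sketch of Pearson’s r between observed and predicted values. The function name `pearson_r` and the numbers are invented for illustration and are unrelated to the study data.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sums of squared deviations and of cross-products around the means.
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_o * var_p)


# Hypothetical spirometer readings and model estimates, in litres.
observed_fvc = [4.2, 3.1, 2.5, 3.8, 1.9]
predicted_fvc = [3.9, 3.3, 2.9, 3.5, 2.4]
print(f"r = {pearson_r(observed_fvc, predicted_fvc):.2f}")
```

Note that r measures how closely the two sets of values move together; it does not by itself show that the predictions are free of a systematic offset, so it is usually read alongside a plot of predicted vs. observed values.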
These results demonstrate that speech, in this case sustained phonation, can provide information not just about language processing and speech production, but about respiratory function as well. As we streamline our data collection, we expect this model to improve. We also expect to be able to estimate additional measures of respiratory function and to test them in a wide range of conditions affecting the respiratory and phonatory systems.