Speech Recognition Reassessed by A. Michael Noll

My new Garmin GPS navigation unit has made me reassess my previously negative opinion of automatic speech recognition. I am now impressed. But it has taken many decades for me to change my mind.

Back in the 1960s, when I was working in speech research at Bell Labs, speech recognition was in its infancy. Not only was the performance not very good, but also the applications were challenging to identify. A keyboard and knobs were far easier to use. Speech recognition a half-century ago required the largest computers that were then available – and they did not recognize speech in real time. Today’s speech recognition is much better and produces results in real time – and on devices we have in our cars or carry in our pockets. The technology has progressed significantly.

John R. Pierce (the famed father of Telstar) had written a paper “Whither Speech Recognition?” in the Journal of the Acoustical Society of America in 1969. He predicted a dismal fate for automatic speech recognition. I followed with my own paper taking a similarly skeptical view of automatic speech production.* I believed that graphical display of information was better than machines that spoke to us. But I did acknowledge that speech recognition might help a “driver to keep eyes on the road.”

We thought that imperfect automatic speech production would be more acceptable than imperfect speech recognition. That is because we believed that humans were better at understanding automatic synthesized speech than computers were at understanding human speech.

My Garmin represents the state of the art in both automatic speech recognition and production, as it not only recognizes speech but also creates synthetic-speech directions when navigating me along a route. Neither is perfect. Some pronunciations are comically wrong – and it will not recognize the names of some restaurants. Most of the time, it is great – but at other times, it is frustrating. Still, speech is much better, and safer, than attempting to touch the screen to enter data while driving.

Since my Garmin GPS unit sometimes will not recognize the correct pronunciation, I have to pronounce words incorrectly but in a way that it does recognize. My Garmin is making me conform to it, and I wonder whether we will over time have people with a Garmin accent!

I am told that the speech recognition by Google and Apple is very good. These systems send the speech to a remote cloud-based computer that has considerable processing power and speech-recognition software. But when using a computer, I still find it is easier to just type my request for information. Speaking to a computer, for me, just seems like more energy and effort. But I guess that if I used a smart phone, then speaking might be easier than typing on a small screen. However, at my old age, I am not smart enough for a smart phone!

*Noll, A. Michael, “Whither Speech Production?” The Journal of the Acoustical Society of America, Vol. 47, No. 6 (Part 2), June 1970, pp. 1614-1616.

Primary takeaways

Digital inequality shows larger impacts on youth academic performance as compared to time spent on screens.
Digital skills play a significant role in mediating unstructured online engagement (social media use, playing video games, browsing the web) and youth academic, social, and psychosocial development.
Unstructured online engagement and face-to-face social interaction are complementary and continuously interact to create and enhance youth capital outcomes.

A familiar story: concerns of screen time

Today’s discussions of adolescent well-being have coalesced around a clear narrative: teenagers spend too much time online, and their academic performance, mental health, and social lives are deteriorating as a result. A steady stream of academic papers, books, and op-eds, alongside a growing number of policy proposals––school phone bans, age-gated social media use, restrictive screen-time limits––rests on the same underlying claim, aligning with a contemporary, digitized version of the displacement hypothesis:

Screen time, particularly the unstructured, free-time spent on social media, gaming, watching video content, or browsing the web, is said to displace the productive face-to-face activities that build adolescents into capable adults.

The implied and often practiced solution is restriction. In response, this dissertation tested this claim directly, and placed it within the broader context of adolescence.

Across three years, I followed 653 Michigan adolescents from early through late adolescence: from grades 8 or 9 (survey one, 2019) to grades 11 or 12 (survey two, 2022). Notably, these students, studied over time, were part of a broader pooled sample of 5,825 students across the same eighteen high schools. The study window captured the year before and the year after the peak of the COVID-19 pandemic and related lockdown orders, functioning as an unprecedented stress test for theories of adolescent social, academic, and digital life and, importantly, as a benchmark to compare the effects of pandemic-related change and inequality to those effects from screen time alone.

Across four studies of adolescents, consisting of six cross-sectional and longitudinal analyses, findings are not consistent with the displacement narrative, nor the broader concerns about the time youth spend on screens.

Findings are, however, consistent with something the current public and (most) academic discussions have largely overlooked or ignored: the gaps and inequalities that determine whether adolescents can access and use the internet meaningfully in the first place.

What the displacement hypothesis overlooks

Displacement and related research and policy concerning the time young people spend online assumes a “zero-sum” model of adolescent day-to-day time. An hour online is an hour not spent studying, reading, sleeping, or interacting face-to-face (i.e., time spent on more productive or developmentally “better” activity).

Indeed, this makes sense logically. However, as an empirical claim, this model requires time spent online to behave differently from all other ways adolescents allocate time; it must produce uniquely negative outcomes and be inherently harmful across digital contexts, rather than showing the typical mix of trade-offs that accompanies, and is often overlooked in, any other social or developmental context.

Yet, online time does not differ from other youth activity. Instead, I find it has a mix of pros, cons, and even some “uniquely digital” benefits which youth utilize for social and academic gains. When I compared unstructured digital media use against traditional face-to-face interaction and activities, both produced similar patterns: some negative associations with academic outcomes, some null, and some positive.

Trade-offs within traditional face-to-face activity (for example, social time with friends and family, or time spent in after-school extracurriculars) are treated as ordinary developmental experiences that must be experienced for the betterment of development. The identical trade-offs involving digital time tend to be overlooked or ignored, and online engagement is perceived as altogether harmful.

A growing body of evidence, including this dissertation, does not support that distinction. Indeed, the developmental context is routinely misread, leaving out the context of the experiences and time spent on digital, as well as face-to-face activities, interactions, existing inequalities, and changes inherent to development. As such, I proposed a novel framework to understand these contexts:

Digital capital exchange

Rather than treating screen time as a unified harm, this dissertation advances an “exchange”-based framework, grounded in James Coleman’s theories of youth capital and digital inequality scholarship, particularly following Eszter Hargittai, Jan van Dijk, and Alexander van Deursen (see this list of all dissertation references for full works).

The core proposition is that adolescents’ online engagement is not an alternative to developmental activity but another, albeit modern, domain through which young people accumulate and mobilize online resources––particularly digital skills––that work alongside existing social networks and experiences to be exchanged for human capital (measured as: academic achievement, aspirations, STEM interest) and social capital (peer networks, community participation, extracurricular involvement).

Online time is not the mechanism; instead, it is digital skills that I find to be the most vital component in youth capital exchange and enhancement. Unstructured online engagement contributes to online skills; those skills, accumulated and mobilized alongside existing peer, family, and community networks, translate into the outcomes researchers and parents care about, i.e., academic achievement, aspirations, and face-to-face interaction and social networks.

This digital capital framework treats online and in-person contexts as complementary rather than antagonistic, and it situates adolescents’ digital lives within the structural conditions––connectivity quality, device reliability, autonomy of use––that determine whether exchange can occur at all.

Methods (in brief)

Paper-and-pencil surveys were administered to students in classrooms at two time-points: spring 2019 (N=2,876) and spring 2022 (N=2,949), across the same eighteen predominantly rural Michigan schools, grades 8–12. Official, nationally-ranked standardized reading, writing, and math test scores (PSAT 8/9, PSAT 10, SAT; College Board) were then anonymously linked to students’ survey responses with the help of participating districts.

Cross-sectional path analyses modeled pooled and wave-specific samples (pooled N=5,825); two-wave cross-lagged panel models tested reciprocal, longitudinal relationships on the 653 students who completed both surveys. Multi-group analyses of the cross-lagged panel models compared relationships between girls (N=345) and boys (N=308). All longitudinal models included time-invariant socioeconomic covariates as well as time-varying covariates to reduce omitted-variable bias.

Key findings: an overview

To summarize, to the best of my ability, eight chapters across 376 pages, I present two primary findings:

First: digital inequality predicted larger and more consistent declines in human capital than screen time did.

Unreliable home internet and technology maintenance problems––experiencing and/or dealing with broken or outdated devices and software, restrictive school-issued hardware, issues with connecting to or maintaining internet access––decreased youth GPA and standardized test achievement. And, these effect sizes were substantially larger than any negative direct effect from unstructured digital media use.

Across all four empirical studies, digital inequality emerged as the most substantial predictor of academic and developmental decline.

Second: digital skills mediated the relationship between online time and adolescent academic and social outcomes.

Unstructured digital media use, particularly online gaming and web browsing, predicted higher internet and social media skills for adolescents, which in turn predicted stronger academic achievement and self-efficacy (human capital), and social interaction and extracurricular participation (social capital). The positive indirect effect of screen time through skills offset or exceeded any small negative direct effects across several outcomes (supporting our existing peer-reviewed work: Hales & Hampton, 2025, and which you can read more about here).

These exchange processes were amplified when peer and family networks were modeled alongside digital skills, consistent with the premise that online and offline contexts operate together rather than in competition. The effect was not universal: social media skills amplified rather than offset a negative association with consistency of interest, one of the two subscales of grit. The exchange framework describes a contextual and conditional, domain-specific mechanism, not a blanket defense of time spent online.
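The arithmetic behind "offset or exceeded" can be sketched with the standard product-of-coefficients view of mediation, in which the indirect effect is the product of the two paths through the mediator and the total effect is the direct effect plus the indirect effect. The class name `MediationPaths` and all coefficient values below are hypothetical and chosen only for illustration; they are not estimates from the dissertation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediationPaths:
    a: float        # predictor -> mediator (e.g. online time -> digital skills)
    b: float        # mediator -> outcome (e.g. digital skills -> achievement)
    c_prime: float  # direct predictor -> outcome path, holding the mediator fixed

    @property
    def indirect(self) -> float:
        """Effect transmitted through the mediator."""
        return self.a * self.b

    @property
    def total(self) -> float:
        """Direct plus indirect effect."""
        return self.c_prime + self.indirect

    def indirect_offsets_direct(self) -> bool:
        """True when a negative direct path is cancelled or outweighed."""
        return self.c_prime < 0 and self.total >= 0


# Illustrative numbers only: a small negative direct path and a
# larger positive route through skills.
example = MediationPaths(a=0.30, b=0.20, c_prime=-0.04)
print(f"indirect={example.indirect:.3f} total={example.total:.3f}")
print("offset:", example.indirect_offsets_direct())

# The grit exception: if the route through the mediator is also negative,
# it amplifies the direct association instead of offsetting it.
amplified = MediationPaths(a=0.30, b=-0.10, c_prime=-0.04)
print("offset:", amplified.indirect_offsets_direct())
```

With these made-up values the first case has a positive total despite a negative direct path, while the second shows how a negative path from the mediator pushes the total further below the direct effect, which is the pattern described for consistency of interest.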

Implications

If digital inequality, and not screen time, is the primary predictor of adolescent academic and developmental decline, then the current policy emphasis on restriction is pointed at the wrong target. The quality of access and experience still warrants concern even with the broader adoption of digital devices across the United States. The evidence supports a different set of priorities.

Stable, reliable, fast home broadband should be treated as an educational prerequisite rather than a consumer amenity. Unreliable connectivity exerted larger downward pressure on human capital than any measure of screen time, and that pressure intensified during the pandemic-era reliance on digital infrastructure. Technology maintenance (device repair, replacement, technical support, and the flexibility to install software and explore the web autonomously) matters as much as initial access, and school-issued devices that restrict autonomous use appear to hinder skill accumulation rather than support it.

Restrictive parental mediation of internet use was negatively associated with grit and self-efficacy at magnitudes comparable to the positive contributions of face-to-face activity. This challenges the assumption that digital restriction functions protectively. Instructive mediation, teaching adolescents to verify information, navigate platforms critically, and mobilize online resources toward meaningful ends, is the posture the data supports.

Finally, the technical skill-building that occurs through gaming, self-directed exploration, and deep web use is skill-building, not wasted time. Closing the persistent gender gap in these domains likely requires legitimizing technical play for girls, rather than restricting it for everyone.

None of the above is an argument that screen time is benign. It is an argument that screen time is the wrong focus, particularly when studied mostly in isolation. Context matters substantially, whether that is time spent on other activities during adolescence, the period of adolescence itself, digital inequality, resources gained from such online use, or how all such factors interact. The factor that predicts whether a given adolescent can convert online engagement into capital outcomes is structural: access, infrastructure, skills, and the autonomy to use them. These factors are distributed unevenly, and their uneven distribution, not hours logged, is what separates adolescents who thrive from those who fall behind.

The full dissertation is available through Michigan State University’s ProQuest archive, or see the embedded full-text PDF below. I’m happy to share papers, preprints, or the underlying framework with anyone interested and working in this area––don’t hesitate to reach out via my contact form. Thanks for reading.

