Black Voices Come to AI

Google aims to teach artificial intelligence (AI) to understand Black dialects, African-American slang, hood language, or whatever you want to call it. Howard University and Google are collaborating to train speech recognition systems to understand Black dialects.

Howard and Google created Project Elevate Black Voices. The project's dataset includes over 600 hours of African-American dialects from 32 states to improve AI recognition of diverse Black speech.

Sounds Black to me!

African-Americans have a distinct linguistic style that varies by geographic region, level of education, upbringing, culture, and gender. People often describe this as someone who “sounds Black.”

Often, Black people are forced to abandon our natural way of speaking when engaging with voice recognition technology. Developers train these systems largely on white voices and dialects, causing the technology to struggle with recognizing Black speech. Similarly, facial recognition programs are primarily trained on white faces, making it difficult for AI to accurately identify Black faces. As a result, Black people are frequently misidentified and, in some cases, wrongly arrested.

African-Americans using voice recognition technology have had to alter their voice, pronunciation, and even vocabulary to sound more authentic, or white, if I may say so. Black people are forced to set aside their natural accents to be understood by voice products, because AI-driven technologies can easily misunderstand these linguistic nuances. As a result, Black people face obstacles when interacting with these technologies.

“African American English has been at the forefront of United States culture since almost the beginning of the country,” said Gloria Washington, Ph.D., Howard University researcher and co-principal investigator of Project Elevate Black Voices. “Voice assistant technology should understand different dialects of all African American English to truly serve not just African Americans, but also other persons who speak these unique dialects. It’s about time that we provide the best experience for all users of these technologies.”

The Howard African American English Dataset 1.0 will initially be available exclusively to researchers and institutions within HBCUs, with Howard University retaining ownership of the data. Howard wants to ensure that the data reflects the interests and needs of marginalized communities and is used appropriately. In particular, Howard seeks to benefit African American communities, whose linguistic practices have often been excluded or misrepresented in computational systems.