Using AI to study demographic representation in Indian TV

Storytelling is intrinsic to India’s rich cultural heritage, creating shared experiences across the country’s social plurality and linguistic diversity. Stories have always had a central place in informing, educating, and entertaining a growing audience across screens of all sizes, in new content genres, and on new platforms.

Storytelling that is not only relatable and relevant, but also equitable and representative of India’s vast demographic, has emerged as an important imperative for media producers and content creators. And with the largest viewership across segments, TV content is key to this, enjoying prominence, resonance and reach deep within people’s homes.

This is the driving force behind “Reflecting India: An intersectional and longitudinal analysis of popular scripted television from 2018 to 2022,” a new five-year longitudinal study led by the Geena Davis Institute on Gender in Media (GDI). Google Research has extended its AI-powered research support to the study, with the Signal Analysis and Interpretation Laboratory (SAIL) at the University of Southern California (USC) serving as the study’s academic advisor and the India Chapter of the International Advertising Association (IAA) as the media studies advisor.

This first-of-its-kind large-scale, multi-lingual study examined media content across five Indian languages – Bengali, Hindi, Kannada, Tamil, and Telugu – in 10 scripted television shows that were the most watched between 2018 and 2022, according to the Broadcast Audience Research Council (BARC), India. The sample included a variety of genres such as soap operas, thrillers and mythological dramas. The study sample was organized by GDI and IAA.

Google’s machine learning innovations in computer vision and the natural-language understanding capabilities of its large language models (LLMs) powered the multimodal analysis. Specifically, AI-enabled technology developed by Google Research MUSE (Media Understanding for Social Exploration) was used to infer the visual and intersectional attributes of perceived gender, perceived skin tone, and perceived age of the on-screen characters.

In addition, the dialogues were automatically transcribed using Google’s Universal Speech Model, a state-of-the-art automatic speech recognition model, and the language in the dialogue was analyzed with our LLMs, drawing on expertise from Project BINDI, which focuses on evaluating and mitigating undesirable biases in language models. The automatic language analysis complemented the visual analysis to draw multimodal insights.

This AI-backed analysis yielded a wealth of evidence that would have been impractical to collect manually. Automation also brought accuracy and consistency to the analysis and reduced human error. The technology processed over 430 hours of footage in less than 48 hours, at over 100 frames per second, cumulatively analyzing over 15 million frames, about 38 million face appearances, and nearly 2 million words using machine learning models, and delivering several data-driven insights.
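As a rough sanity check, the scale figures above can be related to one another with simple arithmetic. The sketch below is illustrative only: it treats the quoted figures as rounded lower bounds, and it assumes that the analyzed frames were sampled evenly from the footage and that the quoted frame rate is an aggregate processing rate, neither of which is stated in the study summary.

```python
# Back-of-envelope check of the reported scale figures.
# Inputs are the rounded figures quoted in the text; outputs are approximate.
FOOTAGE_HOURS = 430             # "over 430 hours of footage"
FRAMES_ANALYZED = 15_000_000    # "over 15 million frames"
FACE_APPEARANCES = 38_000_000   # "about 38 million face appearances"
PROCESSING_FPS = 100            # "over 100 frames per second" (assumed aggregate rate)

footage_seconds = FOOTAGE_HOURS * 3600

# Frames analyzed per second of footage (assumes even sampling).
sampled_fps = FRAMES_ANALYZED / footage_seconds

# Average number of face appearances per analyzed frame.
faces_per_frame = FACE_APPEARANCES / FRAMES_ANALYZED

# Wall-clock hours needed to process all frames at the quoted rate.
processing_hours = FRAMES_ANALYZED / PROCESSING_FPS / 3600

print(f"{sampled_fps:.1f} frames analyzed per second of footage")
print(f"{faces_per_frame:.1f} face appearances per frame")
print(f"{processing_hours:.1f} hours of processing at {PROCESSING_FPS} fps")
```

Under these assumptions the figures are mutually consistent: roughly ten frames analyzed per second of footage, between two and three faces per frame, and a processing time of about 42 hours, which sits within the reported 48-hour window.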

Key findings:

Female characters had more on-screen time than male characters: 55.8% for women compared with 44.2% for men. Bengali and Telugu shows gave female characters the highest proportion of screen time across all languages, approximately 59%.
While female names were mentioned more often in dialogue than male names, unique male names outnumbered unique female names. Perceived female names accounted for 55.6% of all name mentions in dialogue, but only 46.7% of the distinct names featured on the shows.
Young adults (18–33 years old) were seen on screen the most, accounting for 75.6% of all characters present on screen. Female characters over the age of 33 were on screen for less time than their male counterparts.
Characters with lighter skin tones were shown 8 times more on screen than characters with medium or dark skin tones. However, between 2018 and 2022, the screen time of characters with medium skin tones increased in proportion to a decrease in the screen time of characters with lighter skin tones.
When shown on screen, female characters tended to be younger and to have lighter skin tones than male characters. 70% of female characters on screen were between the ages of 18 and 32 and had lighter skin tones, compared with 52.9% of male characters, who represented a wider range of ages and skin tones.
Tamil and Telugu TV shows presented a wider range of skin tones: characters with darker skin tones occupied approximately 23% of screen time, compared with between 13% and 18% for characters with medium or dark skin tones in shows in the other languages.
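The finding on names separates two measures that are easy to conflate: the share of all name mentions and the share of distinct names. The minimal sketch below shows how the two can diverge. The data is a made-up toy example and the function `name_shares` is hypothetical; it is not the study's method or data.

```python
from collections import Counter

# Hypothetical toy data: one (name, perceived gender) pair per mention in dialogue.
mentions = [
    ("Asha", "female"), ("Asha", "female"), ("Asha", "female"),
    ("Meera", "female"), ("Meera", "female"),
    ("Ravi", "male"), ("Arjun", "male"), ("Vikram", "male"),
]

def name_shares(mentions, gender="female"):
    """Return (share of all mentions, share of distinct names) for one gender."""
    # Count every mention, so repeated names count each time they are spoken.
    mention_counts = Counter(g for _, g in mentions)
    mention_share = mention_counts[gender] / len(mentions)

    # Collect distinct names per gender, so each name counts once.
    unique_names = {}
    for name, g in mentions:
        unique_names.setdefault(g, set()).add(name)
    total_unique = sum(len(names) for names in unique_names.values())
    unique_share = len(unique_names.get(gender, set())) / total_unique

    return mention_share, unique_share

mention_share, unique_share = name_shares(mentions)
print(mention_share, unique_share)  # 0.625 0.4
```

In this toy example, two female names are repeated often and three male names are each mentioned once, so female names make up most of the mentions but a minority of the distinct names, the same shape as the pattern reported above.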

This study builds on our earlier joint studies on gender equity in Hollywood movies and 12 years of representation in US television shows, which were among the first to use AI to effectively study representation in media at scale. Expanding this work to the Indian context is a meaningful step towards fostering a global understanding of media representation and cross-cultural patterns. It speaks directly to Google’s Responsible AI approach of ensuring that foundational AI technologies, be they in vision, language, or audio, are not just English- and Western-centric, but work for a plethora of languages and visual mediums.

Neha Barjatya, Director, Marketing, Google India, commented, “Access to digital is crucial to opening up the gateways of information, opportunity and progress, and we’re committed to building products that empower everyone to use the internet with convenience and confidence. Bridging the digital gender gap is core to this, and we’re committed to doing our part to ensure women’s participation in the digital economy is equitable - be they as creators, innovators, or entrepreneurs. As an AI-first company, we’re delighted to power this groundbreaking study and join hands with GDI, SAIL and IAA to build a better understanding of representation in popular media, and inform the industry’s progression towards greater inclusiveness.”

Komal Singh, Google AI Research, Product Manager and Lead on MUSE, added, “AI advances in computer vision and natural language processing technologies provide us a powerful computational lens to study human-centric representation in mainstream media, at scale. We were fortunate to have this meaningful and novel opportunity to apply the tech towards studying Indian TV across a breadth of languages and timeframe. This work is an exemplar of Google’s responsible approach to AI at work: using AI technology along with partners in the service of encouraging more transparency, and equity for diverse communities.”