Aditya Yadavalli

I am a first-year Ph.D. student in the LeM🍋N Lab at UC San Diego, where I am advised by Prof. Alex Warstadt. Here, I work on using spoken language models (SLMs) as cognitive models to study how information is conveyed through lexical (textual) and non-lexical (speech) streams of human communication.

Previously, I was a Speech/NLP Engineer at Karya. There, I worked on a range of topics: human-LLM agreement across various tasks, building indigenous/endangered language resources, quality estimation of crowdsourced datasets, and assistive tools for crowdsource workers.

I used to volunteer at Masakhane, where I explored how current SLMs fail to generalise, especially to African languages and accents, and how we can make them better.

From 2021 to 2023, I was also a visiting researcher at Case Western Reserve University (CWRU). There, I collaborated with Prof. Vera Tobin to evaluate NLP models trained on Child-Directed Speech (CDS) to establish common mistakes that humans and NLP models make when acquiring a new (second) language.

I completed my B.Tech. (Hons) in Computer Science & M.S. by Research in Computational Linguistics at IIIT Hyderabad. For my M.S. thesis, I worked with Prof. Anil Kumar Vuppala at the Speech Processing Lab (SPL), exploring how closely related languages can be used to improve performance on low-resource languages.

When not at work, you can find me discussing cricket or picking up obscure trivia that no one cares about.

news

Nov 10, 2025	Paper titled ELR-1000: A Community-Generated Dataset for Endangered Indic Indigenous Languages accepted at IJCNLP-AACL 2025!
Apr 15, 2025	Accepted an offer to join the LeM🍋N Lab at UCSD as a Ph.D. student!
Oct 17, 2024	Paper titled PARIKSHA: A Large-Scale Investigation of Human-LLM Evaluator Agreement on Multilingual and Multi-Cultural Data accepted at EMNLP 2024!
Mar 28, 2024	Paper titled Speaking in Terms of Money: Financial Knowledge Acquisition through Speech Data Generation accepted at COMPASS 2024!
Mar 25, 2024	Paper titled “Akal Badi ya Bias”: An Exploration Study of Gender Bias in Hindi accepted at FAccT 2024!
Jan 27, 2024	Paper titled MunTTS: A Text-to-Speech System for Mundari accepted at ComputEL-7 workshop
Jan 27, 2024	Paper titled AccentFold: A Journey through African Accents for Zero-Shot ASR Adaptation to Target Accents accepted at EACL 2024 Findings
Oct 4, 2023	Paper titled AfriSpeech-200: Pan-African Accented Speech Dataset for Clinical and General Domain ASR accepted in TACL & will be presented at EMNLP 2023!
May 4, 2023	Paper titled SLABERT Talk Pretty One Day: Modeling Second Language Acquisition with BERT accepted at ACL 2023
May 4, 2023	Paper titled X-RiSAWOZ: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents accepted in ACL Findings!
Nov 11, 2022	Defended my MS thesis! Thanks to my panel – Dr. Kishore Prahallad and Prof. Chiranjeevi Yarra
Oct 3, 2022	Paper on “How Do Phonological Properties Affect Bilingual Automatic Speech Recognition?” accepted at IEEE SLT 2022
Sep 10, 2022	I will be visiting Incheon to attend Interspeech! Happy to meet you if you’ll be attending too.
Jul 1, 2022	I will be attending NAACL in person. If you are attending too, let’s catch up!
Jun 19, 2022	Submitted my MS Thesis for review
Jun 15, 2022	Paper on “Multi-Task End-to-End Model for Telugu Dialect and Speech Recognition” accepted in Interspeech 2022
Jun 1, 2022	Started working at Karya under the mentorship of Dr. Vivek Seshadri as a Speech/NLP Engineer.

selected publications

EMNLP
PARIKSHA: A Large-Scale Investigation of Human-LLM Evaluator Agreement on Multilingual and Multi-Cultural Data

Watts, Ishaan, Gumma, Varun, Yadavalli, Aditya, Seshadri, Vivek, Swaminathan, Manohar, and Sitaram, Sunayana

In Proc. EMNLP 2024

Evaluation of multilingual Large Language Models (LLMs) is challenging due to a variety of factors – the lack of benchmarks with sufficient linguistic diversity, contamination of popular benchmarks into LLM pre-training data and the lack of local, cultural nuances in translated benchmarks. In this work, we study human and LLM-based evaluation in a multilingual, multi-cultural setting. We evaluate 30 models across 10 Indic languages by conducting 90K human evaluations and 30K LLM-based evaluations and find that models such as GPT-4o and Llama-3 70B consistently perform best for most Indic languages. We build leaderboards for two evaluation settings - pairwise comparison and direct assessment and analyse the agreement between humans and LLMs. We find that humans and LLMs agree fairly well in the pairwise setting but the agreement drops for direct assessment evaluation especially for languages such as Bengali and Odia. We also check for various biases in human and LLM-based evaluation and find evidence of self-bias in the GPT-based evaluator. Our work presents a significant step towards scaling up multilingual evaluation of LLMs.
@article{Watts2024PARIKSHAAL,
  title = {PARIKSHA: A Large-Scale Investigation of Human-LLM Evaluator Agreement on Multilingual and Multi-Cultural Data},
  author = {Watts, Ishaan and Gumma, Varun and Yadavalli, Aditya and Seshadri, Vivek and Swaminathan, Manohar and Sitaram, Sunayana},
  journal = {In Proc. EMNLP},
  year = {2024}
}
EACL Findings
AccentFold: A Journey through African Accents for Zero-Shot ASR Adaptation to Target Accents

Owodunni, Abraham*, Yadavalli, Aditya*, Emezue, Chris*, Olatunji, Tobi*, and Mbataku, Clinton

In Proc. EACL Findings 2024

Despite advancements in speech recognition, accented speech remains challenging. While previous approaches have focused on modeling techniques or creating accented speech datasets, gathering sufficient data for the multitude of accents, particularly in the African context, remains impractical due to their sheer diversity and associated budget constraints. To address these challenges, we propose AccentFold, a method that exploits spatial relationships between learned accent embeddings to improve downstream Automatic Speech Recognition (ASR). Our exploratory analysis of speech embeddings representing 100+ African accents reveals interesting spatial accent relationships highlighting geographic and genealogical similarities, capturing consistent phonological and morphological regularities, all learned empirically from speech. Furthermore, we discover accent relationships previously uncharacterized by the Ethnologue. Through empirical evaluation, we demonstrate the effectiveness of AccentFold by showing that, for out-of-distribution (OOD) accents, sampling accent subsets for training based on AccentFold information outperforms strong baselines with a relative WER improvement of 4.6%. AccentFold presents a promising approach for improving ASR performance on accented speech, particularly in the context of African accents, where data scarcity and budget constraints pose significant challenges. Our findings emphasize the potential of leveraging linguistic relationships to improve zero-shot ASR adaptation to target accents.
@article{Owodunni2024AccentFoldAJ,
  title = {AccentFold: A Journey through African Accents for Zero-Shot ASR Adaptation to Target Accents},
  author = {Owodunni, Abraham* and Yadavalli, Aditya* and Emezue, Chris* and Olatunji, Tobi* and Mbataku, Clinton},
  journal = {In Proc. EACL Findings},
  year = {2024}
}
ACL
SLABERT Talk Pretty One Day: Modeling Second Language Acquisition with BERT

Yadavalli, Aditya*, Yadavalli, Alekhya*, and Tobin, Vera

In Proc. ACL 2023

Second language acquisition (SLA) research has extensively studied cross-linguistic transfer, the influence of linguistic structure of a speaker’s native language [L1] on the successful acquisition of a foreign language [L2]. Effects of such transfer can be positive (facilitating acquisition) or negative (impeding acquisition). We find that NLP literature has not given enough attention to the phenomenon of negative transfer. To understand patterns of both positive and negative transfer between L1 and L2, we model sequential second language acquisition in LMs. Further, we build a Multilingual Age Ordered CHILDES (MAO-CHILDES) – a dataset consisting of 5 typologically diverse languages, i.e., German, French, Polish, Indonesian, and Japanese – to understand the degree to which native Child-Directed Speech (CDS) [L1] can help or conflict with English language acquisition [L2]. To examine the impact of native CDS, we use the TILT-based cross-lingual transfer learning approach established by Papadimitriou and Jurafsky (2020) and find that, as in human SLA, language family distance predicts more negative transfer. Additionally, we find that conversational speech data shows greater facilitation for language acquisition than scripted speech data. Our findings call for further research using our novel Transformer-based SLA models and we would like to encourage it by releasing our code, data, and models.
@article{Yadavalli2023SLABERTTP,
  title = {SLABERT Talk Pretty One Day: Modeling Second Language Acquisition with BERT},
  author = {Yadavalli, Aditya* and Yadavalli, Alekhya* and Tobin, Vera},
  journal = {In Proc. ACL},
  year = {2023},
  volume = {abs/2305.19589}
}
Interspeech
Multi-Task End-to-End Model for Telugu Dialect and Speech Recognition

Yadavalli, Aditya, Mirishkar, Ganesh, and Vuppala, Anil Kumar

In Proc. Interspeech 2022

Conventional Automatic Speech Recognition (ASR) systems are susceptible to dialect variations within a language, thereby adversely affecting the ASR. Therefore, the current practice is to use dialect-specific ASRs. However, dialect-specific information or data is hard to obtain making it difficult to build dialect-specific ASRs. Furthermore, it is cumbersome to maintain multiple dialect-specific ASR systems for each language. We build a unified multi-dialect End-to-End ASR that removes the need for a dialect recognition block and the need to maintain multiple dialect-specific ASRs for three Telugu regional dialects: Telangana, Coastal Andhra, and Rayalaseema. We find that pooling the data and training a multi-dialect ASR benefits the low-resource dialect the most – an improvement of over 9.71% in relative Word Error Rate (WER). Subsequently, we experiment with multi-task ASRs where the primary task is to transcribe the audio and the secondary task is to predict the dialect. We do this by adding a Dialect ID to the output targets. Such a model outperforms naive multi-dialect ASRs by up to 8.24% in relative WER. Additionally, we test this model on a dialect recognition task and find that it outperforms strong baselines by 6.14% in accuracy.
@inproceedings{Aditya2022Interspeech,
  author = {Yadavalli, Aditya and Mirishkar, Ganesh and Vuppala, Anil Kumar},
  booktitle = {Proc. Interspeech},
  title = {Multi-Task End-to-End Model for Telugu Dialect and Speech Recognition},
  pages = {1387--1391},
  doi = {10.21437/Interspeech.2022-10739},
  year = {2022}
}