Publications

Conference Journal Workshop Preprint 🥇 Best Paper ♣ Equal Contribution ♠ Equal Mentorship

Preprints

OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always!

Jingdi Lei*, Varun Gumma*, Rishabh Bhardwaj*, Seok Min Lim, Chuan Li, Amir Zadeh, Soujanya Poria

Preprint (2025)

ABS PDF Data Code

The Role of Synthetic Data in Multilingual, Multi-Cultural AI Systems: Lessons from Indic Languages

Pranjal A. Chitale, Varun Gumma, Sanchit Ahuja, Prashant Kodali, Manan Uppadhyay, Deepthi Sudharsan, Sunayana Sitaram

Preprint (2025)

ABS PDF Data

HEALTH-PARIKSHA: Assessing RAG Models for Health Chatbots in Real-World Multilingual Settings

Varun Gumma, Anandhita Raghunath, Mohit Jain†, Sunayana Sitaram†

Preprint (2024)

ABS PDF

Published Papers

Beyond Metrics: Evaluating LLMs' Effectiveness in Culturally Nuanced, Low-Resource Real-World Scenarios

Millicent Ochieng, Varun Gumma, Sunayana Sitaram, Jindong Wang, Vishrav Chaudhary, Keshet Ronen, Kalika Bali, Jacki O'Neill

AfricaNLP (2025)

ABS PDF

Contamination Report for Multilingual Benchmarks

Sanchit Ahuja*, Varun Gumma*, Sunayana Sitaram

EvalEval (2024)

ABS PDF

PARIKSHA: A Large-Scale Investigation of Human-LLM Evaluator Agreement on Multilingual and Multi-Cultural Data

Ishaan Watts, Varun Gumma, Aditya Yadavalli, Vivek Seshadri, Manohar Swaminathan, Sunayana Sitaram

EMNLP (2024)

ABS PDF Code

🥇 Akal Badi ya Bias: An Exploratory Study of Gender Bias in Hindi Language Technology

Rishav Hada, Safiya Husain, Varun Gumma, Harshita Diddee, Aditya Yadavalli, Agrima Seth, Nidhi Kulkarni, Ujwal Gadiraju, Aditya Vashistha, Vivek Seshadri, Kalika Bali

ACM FAccT (2024)

ABS PDF

METAL: Towards Multilingual Meta-Evaluation

Rishav Hada*, Varun Gumma*, Mohamed Ahmed, Kalika Bali, Sunayana Sitaram

ACL Findings (2024)

ABS PDF Code

MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks

Sanchit Ahuja, Divyanshu Aggarwal, Varun Gumma, Ishaan Watts, Ashutosh Sathe, Millicent Ochieng, Rishav Hada, Prachi Jain, Mohamed Ahmed, Kalika Bali, Sunayana Sitaram

NAACL (2024)

ABS PDF Code

Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?

Rishav Hada, Varun Gumma, Adrian de Wynter, Harshita Diddee, Mohamed Ahmed, Monojit Choudhury, Kalika Bali, Sunayana Sitaram

ACL Findings (2024)

ABS PDF Code

MAFIA: Multi-Adapter Fused Inclusive Language Models

Prachi Jain*, Ashutosh Sathe*, Varun Gumma, Kabir Ahuja, Sunayana Sitaram

EACL (2024)

ABS PDF

MunTTS: A Text-to-Speech System for Mundari

Varun Gumma, Rishav Hada, Aditya Yadavalli, Pamir Gogoi, Ishani Mondal, Vivek Seshadri, Kalika Bali

ComputEL (2024)

ABS PDF Code

IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages

Jay Gala*, Pranjal A Chitale*, A K Raghavan, Varun Gumma, Sumanth Doddapaneni, Aswanth Kumar M, Janki Atul Nawale, Anupama Sujatha, Ratish Puduppully, Vivek Raghavan, Pratyush Kumar, Mitesh M Khapra, Raj Dabre, Anoop Kunchukuttan

TMLR (2023)

ABS PDF Code Models

PAMMELA: Policy Administration Methodology using Machine Learning

Varun Gumma, Barsha Mitra, Soumyadeep Dey, Pratik Shashikantbhai Patel*, Sourabh Suman*, Saptarshi Das, Jaideep Vaidya

SECRYPT (2022)

ABS PDF Code
