Harshita Diddee

PhD @ Carnegie Mellon University | hdiddee (at) andrew.cmu.edu

6715 Gates Hillman Center, Carnegie Mellon University

LTI PhD @ Carnegie Mellon University

Hello!

I am a third-year Ph.D. student at Carnegie Mellon University (LTI, School of Computer Science), where I am advised by Daphne Ippolito.

My vision is to make LLMs reliable for open-ended real-world tasks by improving measurement validity: the gap between appearing capable on an evaluation benchmark and succeeding at what users actually need.

I pursue this in two ways: (1) enabling users to anticipate hidden failures (our work on why this matters; our tool, which helps reveal where narrow, static evaluation benchmarks fall short and diverge in their conclusions about model competence for real-world use cases); and (2) reducing the cost of such failures by making opaque, verbose LLM outputs easier to verify before errors become consequential, for example, an incorrect tax return buried in a 100,000-token response (ongoing work at CMU and Adobe Research).

Last summer, I worked with Amazon Core Search to make retail LLM-as-Judges less brittle—addressing a measurement-validity failure mode where a judge can appear reliable while missing fundamental relevance signals (preprint forthcoming). Previously, as a Pre-Doctoral Research Fellow at Microsoft Research India advised by Kalika Bali, I studied text and speech LLM compression under extreme data scarcity (text; speech).

I earned my B.Tech. in Computer Science from Bharati Vidyapeeth’s College of Engineering in 2021, graduating as the Computer Science Department’s Best Outgoing Student.

My earliest formal education started about 20 years ago, when I began learning Odissi, an Indian classical dance originating in Odisha. Dashavatar or The Ten Primary Avatars of Vishnu and Battu are some recitals in this diversely rich art form (I am the dancer in the pink-purple costume :)). Consequently, I am also deeply invested in music, in any and every form or language (here is me singing a Hindi Bhajan).

News

May 2026 Joined Adobe Research in San Jose, CA as a Research Scientist Intern!
Jan 2026 We released BenchBrowser: a tool to help you build more trust in the validity of the evaluations you see for the capabilities you care about! Given your custom use case, BenchBrowser retrieves test cases from 20+ popular benchmarks so you can assess how your capability is being evaluated, and provides a workspace for checking whether such evaluations lead to consistent conclusions about whether a model is good for your task.
May 2025 Joined Amazon’s Core Search Team as an Applied Science Intern. Will be in Palo Alto, CA for the Summer! Let’s connect if you are interested in Data Selection and/or Language Model Personalization!
Jan 2025 Chasing Random: Instruction Selection Strategies Fail to Generalize accepted to NAACL Findings! Code released here!
Feb 2024 INMT-Lite accepted to LREC 2024! Code and Paper out!
Oct 2023 Two papers accepted to EMNLP 2023: MEGA and Fifty Shades of Bias: Normative Ratings of Gender Bias in GPT Generated English Text!
Apr 2023 Accepted an offer to join LTI @ Carnegie Mellon University for my PhD!
Nov 2022 I’ll be attending ALPS Winter School this January! Let’s chat if you are interested in referenceless NLG evaluations and Quality Estimation. You can check out my poster on our preliminary constraint review of QE here!
Nov 2022 CodeFed: Federated Speech Recognition for Low-Resource Code-Switching Detection has been accepted to ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)!
Oct 2022 Too Brittle To Touch: Comparing the Stability of Quantization and Distillation towards developing Low-Resource MT Models was accepted to WMT (Research Track): Check out the Preprint and Code! Headed to EMNLP to present it! Let’s chat if you’re interested in Data Quality Estimation, constrained generation, or anything Low-Resource :)
Aug 2022 Joining the organizing committee for The 2022 IEEE Spoken Language Technology Workshop, SLT 2022 Hackathon!
Jul 2022 Visiting Johns Hopkins University for the month as a part of JSALT’22: Contributing to the Speech Translation for Under-Resourced Languages Track
May 2022 Presenting our work A Collaborative Approach to Developing Language Technology Interventions for Endangered Languages and leading the panel (Panel A) at ComputEL-5, ACL’22

Selected Papers

  1. BenchBrowser: Retrieving Evidence for Evaluating Benchmark Validity
    Harshita Diddee, Gregory Yauney, Swabha Swayamdipta, and 1 more author
    Preprint Feb 2026
  2. Chasing Random: Instruction Selection Strategies Fail to Generalize
    Harshita Diddee and Daphne Ippolito
    In Findings of the Association for Computational Linguistics: NAACL 2025 Apr 2025
  3. NoveltyBench: Evaluating Creativity and Diversity in Language Models
    Yiming Zhang, Harshita Diddee, Susan Holm, and 5 more authors
    In Second Conference on Language Modeling Apr 2025