Harshita Diddee

PhD @ Carnegie Mellon University | hdiddee (at) andrew.cmu.edu

5509 Gates Hillman Center, Carnegie Mellon University

LTI PhD @ Carnegie Mellon University

Hello!

I am a first-year Ph.D. student at Carnegie Mellon University (LTI, School of Computer Science), where I am advised by Daphne Ippolito. I am interested in designing optimal compositions for training datasets, data quality estimation metrics, and alignment methods between human and automatic evaluation pipelines. Prior to this, I spent 2 incredible years at Microsoft Research, India as a Pre-Doctoral Research Fellow, where I was advised by Kalika Bali. While at MSR, I studied the compression of Large Language Models (text and speech) operating under extreme data sparsity. I also led the development of INMT-Lite, an interactive neural machine translation service aimed at improving the yield of data collection pipelines for extremely low-resource languages.

Previously, I have contributed to AI4Bharat’s Samanantar (guided by Dr. Mitesh M. Khapra, IIT Madras) and VisionAir (guided by Dr. Aakanksha Chowdhery, Google Brain; work featured by TensorFlow). I graduated from Bharati Vidyapeeth’s College of Engineering (B.Tech, Computer Science) as the Best Outgoing Student for the Class of 2021, as well as the Best Outgoing Student of the Department of Computer Science.

My earliest formal education started about 20 years ago when I began learning Odissi, an Indian Classical Dance originating in Odisha. Dashavatar, or The Ten Primary Avatars of Vishnu, and Battu are some recitals in this diversely rich art form (I am the dancer in the pink-purple costume :)). Consequently, I am also deeply invested in music (any and every form or language - here is me singing a Hindi Bhajan).

News

Feb 2024 INMT-Lite accepted to LREC 2024! Code out, Preprint out soon!
Oct 2023 Two papers accepted to EMNLP 2023: MEGA and Fifty Shades of Bias: Normative Ratings of Gender Bias in GPT Generated English Text!
Apr 2023 Accepted an offer to join LTI @ Carnegie Mellon University for my PhD!
Nov 2022 I’ll be attending ALPS Winter School this January! Let’s chat if you are interested in referenceless NLG evaluations and Quality Estimation. You can check out my poster on our preliminary constraint review of QE here!
Nov 2022 CodeFed: Federated Speech Recognition for Low-Resource Code-Switching Detection has been accepted to ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)!
Oct 2022 Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Resource MT Models was accepted to WMT (Research Track): Check out the Preprint and Code! Headed to EMNLP to present it! Let’s chat if you’re interested in Data Quality Estimation, constrained generation or anything Low-Resource :)
Aug 2022 Joining the organizing committee for The 2022 IEEE Spoken Language Technology Workshop, SLT 2022 Hackathon!
Jul 2022 Visiting Johns Hopkins University for the month as a part of JSALT’22: Contributing to the Speech Translation for Under-Resourced Languages Track
May 2022 Presenting our work A Collaborative Approach to Developing Language Technology Interventions for Endangered Languages and leading the panel (Panel A) at ComputEL-5, ACL’22

Selected Papers

  1. MEGA: Multilingual Evaluation of Generative AI
    Kabir Ahuja, Harshita Diddee, Rishav Hada, and 9 more authors
    In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Dec 2023
  2. Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Resource MT Models
    Harshita Diddee, Sandipan Dandapat, Monojit Choudhury, and 2 more authors
    Conference on Machine Translation 2022, Oct 2022