Harshita Diddee
PhD @ Carnegie Mellon University | hdiddee (at) andrew.cmu.edu
LTI PhD @ Carnegie Mellon University
Hello!
I am a second-year Ph.D. student at Carnegie Mellon University (LTI, School of Computer Science), where I am advised by Daphne Ippolito. I am interested in designing optimal compositions for training datasets (Q: How do we select useful subsets of data from existing datasets when an NLP practitioner defines “useful”?) and in data quality estimation metrics (Q: How do we quantify the utility of a subset of data? Q: How do we select datapoints to evaluate language models on subjective problems like general-purpose instruction following?).
I am on the lookout for internships for Summer 2025, so if any of these questions seem relevant, please reach out for a chat!
Prior to this, I spent 2 incredible years at Microsoft Research, India as a Pre-Doctoral Research Fellow, where I was advised by Kalika Bali. While at MSR, I studied the compression of Large Language Models (text and speech) operating under extreme data sparsity. I also led the development of INMT-Lite, an interactive neural machine translation service aimed at improving the yield of data collection pipelines for extremely low-resource languages. I have contributed to AI4Bharat’s Samanantar (guided by Dr. Mitesh M. Khapra, IIT Madras) and VisionAir (guided by Dr. Aakanksha Chowdhery, Google Brain; work featured by TensorFlow). I graduated from Bharati Vidyapeeth’s College of Engineering (B.Tech, Computer Science) as the Best Outgoing Student for the Class of 2021, as well as the Best Outgoing Student of the Department of Computer Science.
My earliest formal education started about 20 years ago when I began learning Odissi, an Indian classical dance originating in Odisha. Dashavatar (The Ten Primary Avatars of Vishnu) and Battu are some recitals in this diversely rich art form (I am the dancer in the pink-purple costume :)). Consequently, I am also deeply invested in music (any and every form or language; here is me singing a Hindi Bhajan).
News
Feb 2024 | INMT-Lite accepted to LREC 2024! Code and Paper out! |
---|---|
Oct 2023 | 2 papers accepted to EMNLP 2023: MEGA and Fifty Shades of Bias: Normative Ratings of Gender Bias in GPT Generated English Text! |
Apr 2023 | Accepted an offer to join LTI @ Carnegie Mellon University for my PhD! |
Nov 2022 | I’ll be attending ALPS Winter School this January! Let’s chat if you are interested in referenceless NLG evaluations and Quality Estimation. You can check out my poster on our preliminary constraint review of QE here! |
Nov 2022 | CodeFed: Federated Speech Recognition for Low-Resource Code-Switching Detection has been accepted to ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)! |
Oct 2022 | Too Brittle To Touch: Comparing the Stability of Quantization and Distillation towards developing Low-Resource MT Models was accepted to WMT (Research Track): Check out the Preprint and Code! Headed to EMNLP to present it! Let’s chat if you’re interested in Data Quality Estimation, constrained generation, or anything Low-Resource :) |
Aug 2022 | Joining the organizing committee for The 2022 IEEE Spoken Language Technology Workshop, SLT 2022 Hackathon! |
Jul 2022 | Visiting Johns Hopkins University for the month as a part of JSALT’22: Contributing to the Speech Translation for Under-Resourced Languages Track |
May 2022 | Presenting our work A Collaborative Approach to Developing Language Technology Interventions for Endangered Languages and leading the panel (Panel A) at ComputEL-5, ACL’22 |