Computational Linguist
PhD Student · Indiana University Bloomington
Dual-major PhD student in Computational Linguistics & Middle Eastern Languages and Cultures, minoring in Computer Science. Researching Arabic NLP, Natural Language Inference, and Mechanistic Interpretability.
I am a dual-major PhD student in Computational Linguistics and Middle Eastern Languages & Cultures at Indiana University Bloomington, with a minor in Computer Science. My research centers on Natural Language Inference (NLI), Arabic NLP pipelines, and, more recently, Mechanistic Interpretability for transformer-based language models.
I have built datasets and systems for ellipsis detection, dialect classification, sexism detection, and NER in Arabic financial news. I also developed Rasid, a 900M+ word Arabic Twitter corpus, and RogueTeX, a web-based LaTeX editor with Hugging Face cloud compilation.
Bridging formal linguistics with modern NLP to build robust, interpretable, and socially aware language systems.
Developing datasets, tools, and models for Arabic that cover dialect variation (NADI), morphology, ellipsis (ḥaḏf), named entity recognition in financial news (AMWAL), and discourse analysis.
Modeling entailment, contradiction, and pragmatic inference in Arabic and multilingual texts, with a focus on morphology, syntax, and discourse markers. Current work examines label drift in multilingual NLI under machine translation settings.
Applying mechanistic interpretability techniques to speech transformers to detect linguistic nativeness in Arabic. Investigating AI safety, explainability, and how internal model circuits encode linguistic features.
Building retrieval-augmented knowledge graphs that track entities and relations in the medical and financial domains. The current project constructs an ontology-based knowledge graph for Alzheimer's disease research.