Computational Linguist

Muhammad S. Abdo

PhD Student · Indiana University Bloomington

Dual-major PhD student in Computational Linguistics & Middle Eastern Languages and Cultures, minoring in Computer Science. Researching Arabic NLP, Natural Language Inference, and Mechanistic Interpretability.


NLP Researcher & Arabic Linguist

I am a dual-major PhD student in Computational Linguistics and Middle Eastern Languages & Cultures at Indiana University Bloomington, with a minor in Computer Science. My research centers on Natural Language Inference (NLI), Arabic NLP pipelines, and—more recently—Mechanistic Interpretability for transformer-based language models.

I have built datasets and systems for ellipsis detection, dialect classification, sexism detection, and NER in Arabic financial news. I also developed Rasid, a 900M+ word Arabic Twitter corpus, and RogueTeX, a web-based LaTeX editor with Hugging Face cloud compilation.


Areas of Focus

Bridging formal linguistics with modern NLP to build robust, interpretable, and socially aware language systems.

01 // Arabic NLP

Arabic NLP & Low-Resource Languages

Developing datasets, tools, and models for Arabic that cover dialect variation (NADI), morphology, ellipsis (ḥaḏf), named entity recognition in financial news (AMWAL), and discourse analysis.

Ellipsis · Dialect ID · NER · Arabic
02 // NLI

Natural Language Inference

Modeling entailment, contradiction, and pragmatic inference in Arabic and multilingual texts, with a focus on morphology, syntax, and discourse markers. Current work examines label drift in multilingual NLI when datasets are machine-translated.

Entailment · Pragmatics · Multilingual
03 // Interpretability

Mechanistic Interpretability

Applying mechanistic interpretability techniques to speech transformers to detect linguistic nativeness in Arabic. Investigating AI safety, explainability, and how internal model circuits encode linguistic features.

Transformers · AI Safety · Speech
04 // Knowledge Graphs

RAG & Knowledge Graphs

Building retrieval-augmented knowledge graphs that track entities and relations in the medical and financial domains. The current project constructs an ontology-based knowledge graph for Alzheimer's disease research.

RAG · Ontology · Alzheimer's

Selected Works

Journal 2025 📄 In Press
Ellipsis in Arabic: Using Machine Learning to Detect and Predict Elided Words
Muhammad S. Abdo, Damir Cavar, Billy Dickson, Attia Youseif
Arabic Linguistics, John Benjamins
Workshop 2025 ★ 1 cite
AMWAL: Named Entity Recognition for Arabic Financial News
Muhammad S. Abdo, Yash Hatekar, Damir Cavar
FinNLP-FNP-LLMFinLegal @ COLING 2025, Abu Dhabi
Conference 2024 ★ 12 cites
The Typology of Ellipsis: A Corpus for Linguistic Analysis and Machine Learning Applications
Damir Cavar, Louis V. Mompelat, Muhammad S. Abdo
SIGTYP 2024 @ EACL
Journal 2020 ★ 74 cites
How Do Arab Tweeters Perceive the COVID-19 Pandemic?
Basma A. Essam, Muhammad S. Abdo
Journal of Psycholinguistic Research

Recent Activity

May 2026
APSA MENA AI Roundtable: Participated in the American Political Science Association (APSA) roundtable, discussing the regional impacts, governance, and technical challenges of artificial intelligence in the MENA region.
May 2026
Completed Technical AI Safety Course (AI Safety Fellowship): Finished BlueDot Impact's technical AI safety curriculum, which focuses on neural network alignment and governance frameworks.
Apr 2026
Poster at NLP @ Michigan Day: Presented a mechanistic interpretability approach to detecting linguistic nativeness in Arabic speech transformers.
Apr 2026
Two Posters at MSLD 2026: Presented two posters at Midwest Speech and Language Days in Urbana-Champaign, on (1) label drift in multilingual NLI and (2) an ontology-based knowledge graph for Alzheimer's research.
Apr 2026
Best Abstract Award: Received the Best Abstract Award for graduate students at the 39th Annual Symposium on Arabic Linguistics (ASAL39), which I co-organized at Indiana University.
Mar 2026
New Publication: "Ellipsis in Arabic: Using Machine Learning to Detect and Predict Elided Words" officially published in John Benjamins' Arabic Linguistics.
Feb 2026
RogueTeX Launch: Released RogueTeX, a web-based LaTeX editor with Hugging Face cloud compilation and Supabase cloud storage. Open to the public at roguetex.app.