Curriculum Vitae

Muhammad S. Abdo

PhD Student · Computational Linguistics & Middle Eastern Languages and Cultures · Indiana University Bloomington
mabdo@iu.edu · Bloomington, Indiana, USA

Profile Summary

I am a dual-major PhD student in Computational Linguistics and Middle Eastern Languages & Cultures at Indiana University Bloomington, with a minor in Computer Science. My work connects Arabic linguistics, multilingual NLP, dataset construction, model evaluation, and mechanistic interpretability. I build Arabic and multilingual datasets, evaluate language and speech models, and develop research tools for linguistic and computational analysis. My current projects include multilingual NLI label drift under machine translation, mechanistic interpretability of Arabic speech transformers, Arabic ellipsis detection, Arabic financial NER, and ontology-grounded knowledge graph construction for Alzheimer's disease literature.

Research Highlights
NLI & Semantics
Multilingual Natural Language Inference
Study of translation-induced label drift across nine languages, with attention to entailment, contradiction, neutrality, modality, intensionality, conditionality, and comparative constructions.
Interpretability
Arabic Speech Transformers
Mechanistic interpretability work on native and non-native Arabic speech representations, using probing, representation analysis, and causal interventions to examine how models encode linguistic nativeness.
Arabic NLP
Ellipsis, Dialects, and Named Entities
Dataset and model development for Arabic ellipsis detection, Arabic dialect classification, and named entity recognition in Arabic financial news.
Knowledge Graphs
Biomedical Knowledge Graph Construction
Ontology-grounded knowledge graph construction for Alzheimer's disease literature using entity extraction, relation extraction, embeddings, and domain-specific normalization.
Education
2022-Present
PhD - Computational Linguistics and Middle Eastern Languages & Cultures
Indiana University Bloomington · Minor: Computer Science
Dual-major PhD student. Advisors: Dr. Sandra Kübler (Computational Linguistics) and Dr. Nader Morkus (MELC). Passed MELC qualifying exams with High Pass.
2021
MA - English Linguistics
Faculty of Al-Alsun, Ain Shams University, Cairo, Egypt
Thesis: Appraisal in Major and Bipolar Depression Patients' Narratives. Advisor: Dr. Nihal Nagi Sarhan.
2018
Postgraduate Certificate in Education
Faculty of Education, Monofiya University, Egypt
2013
BA - English Language and Literature
Faculty of Arts, Benha University, Egypt
Positions Held
2023-Present
Associate Instructor of Arabic
Department of Middle Eastern Languages & Cultures, Indiana University Bloomington
Summers 2024 and 2025
Arabic Instructor
Middlebury Language Schools
2022-2023
Graduate Assistant
Arabic Flagship Program, Indiana University Bloomington
2022
Business Communication Instructor
Faculty of Engineering, Cairo University, Egypt
2021
Translation Lecturer
Faculty of Arts, Port Said University, Egypt
2016-2022
English Instructor
AUC School of Continuing Education, British University in Egypt, AMIDEAST, Al-Azhar ELRC, and other institutions
Publications

Citations: 122 · h-index: 4 · i10-index: 3 (Google Scholar, May 2026)

Forthcoming
Diminutives in North African Varieties of Arabic
S. Davis, M. S. Abdo. In The Handbook of North African Arabic. Forthcoming, August 2026.
Journal Articles, Book Chapters, and Newsletter Contributions
Roundtable Discussion: AI in MENA Politics Research
C. Barnett, M. S. Abdo, T. Adely, C. Bianco, A. Elshehawy, R. Kubinec, M. Robbins. APSA MENA Newsletter, 9(1):55-73. American Political Science Association MENA Section. 2026.
Ellipsis in Arabic: Using Machine Learning to Detect and Predict Elided Words
M. S. Abdo, D. Cavar, B. Dickson, A. Youseif. Arabic Linguistics, 1(2):240-263. John Benjamins. 2025.
Allocution, Sentencing, and Viewers' Comments in YouTube-mediated Trials of Convicted Young Murderers: An Appraisal-Sentiment Analysis
A. A. El Attar, M. S. Abdo. Language and Semiotic Studies, 11(4):636-660. 2025.
Thus Spoke a Couple: A Corpus-Based Content Analysis of Spousal Duties Fatwas
M. S. Abdo, A. Omran, S. F. Hassan. Journal of Digital Islamicate Research, 1(1-2):37-65. 2024.
How Do Arab Tweeters Perceive the COVID-19 Pandemic?
B. A. Essam, M. S. Abdo. Journal of Psycholinguistic Research. 2020. [74 citations]
Public Perception of COVID-19's Global Health Crisis on Twitter until 14 Weeks after the Outbreak
M. S. Abdo, A. S. Alghonaim, B. A. Essam. Digital Scholarship in the Humanities. Oxford University Press. 2020. [22 citations]
Analyzing Judgment in Bipolar Depression Patients' Narratives Using Syntactic Patterns: A Corpus-Based Study
M. S. Abdo, A. Y. Ali, N. N. Sarhan. Egyptian Journal of Language Engineering, 6(1):1-11. 2019.
Conference and Workshop Papers
AMWAL: Named Entity Recognition for Arabic Financial News
M. S. Abdo, Y. Hatekar, D. Cavar. FinNLP-FNP-LLMFinLegal @ COLING 2025, Abu Dhabi. 2025.
The Typology of Ellipsis: A Corpus for Linguistic Analysis and Machine Learning Applications
D. Cavar, L. V. Mompelat, M. S. Abdo. SIGTYP 2024 @ EACL. 2024. [12 citations]
IUNADI at NADI 2023 Shared Task: Country-level Arabic Dialect Classification in Tweets
Y. Hatekar, M. S. Abdo. ArabicNLP 2023, pp. 665-669. 2023.
IUEXIST: Multilingual Pre-trained Language Models for Sexism Detection on Twitter in EXIST2023
Y. Hatekar, M. S. Abdo, S. Khanna, S. Kübler. CLEF 2023 Working Notes, pp. 950-958. 2023.
Selected Conference Presentations and Posters
Inside the Nativeness Axis: Mechanistic Interpretability of Native and Non-Native Arabic Speech Representations
M. S. Abdo. Poster presented at NLP @ Michigan Day 2026. Ann Arbor, Michigan. April 22, 2026.
Translation-Induced Label Drift across Nine Languages in Natural Language Inference
M. S. Abdo, E. Naske, J. P. Benavides, V. Shi, E. H. Attala e Paiva, H. Karkoutli, P. Artkaew, S. Ashraf, J. Wang, S. Kübler. Poster presented at Midwest Speech and Language Days 2026. Urbana-Champaign, Illinois. April 15-16, 2026.
Ontology-Grounded Knowledge Graph Construction for Alzheimer's Disease Literature Using Multi-Model Ensemble Embeddings
M. S. Abdo, D. Cavar. Poster presented at Midwest Speech and Language Days 2026. Urbana-Champaign, Illinois. April 15-16, 2026.
Natural Language Inference: Lost in Translation? Label Stability in Arabic Machine Translation
M. S. Abdo, S. Kübler. Presented at the 39th Annual Symposium on Arabic Linguistics. Bloomington, Indiana. March 27-29, 2026.
On the Diminutive Suffix -aaya in Egyptian (Cairene) Arabic
M. S. Abdo, S. Davis. Presented at the Sixth Arabic Linguistics Forum. University of Vienna, Austria. September 3-5, 2025.
AMWAL: Named Entity Recognition for Arabic Financial News
M. S. Abdo, Y. Hatekar, D. Cavar. Presented at FinNLP-FNP-LLMFinLegal @ COLING 2025. Abu Dhabi, UAE. January 19-24, 2025.
Comparing Apologies and Complaints in Egyptian Arabic: Native Speakers vs. Large Language Model-based Chatbots
M. S. Abdo, N. Sarhan. Presented at the 5th Biennial Arabic Applied Linguistics Conference. Michigan. October 19-20, 2024.
The Hoosier Ellipsis Corpus: Building a Corpus of Ellipsis for Arabic Natural Language Processing
M. S. Abdo, D. Cavar. Presented at Midwest Speech and Language Days 2024. Michigan. April 15-16, 2024.
Ellipsis in Arabic: Using Machine Learning to Detect and Predict Elided Words
M. S. Abdo, D. Cavar, B. Dickson. Presented at the 37th Annual Symposium on Arabic Linguistics. Long Island University, New York. February 23-25, 2024.
Selected Language Teaching Presentations
Collocation Use in Arabic Learners' and Native Speakers' Writings
A. Youseif, M. S. Abdo. ACTFL Annual Convention. Philadelphia, USA. 2024.
Alhijra Alshar'ia: A Corpus-based Analysis of Collocation Use in Arabic Learners' and Native Speakers' Writing
A. Youseif, M. S. Abdo. BATA 4th Annual International Conference. Al-Maktoum College, Scotland. 2024.
Artificial Intelligence for Arabic Language Teaching: Enhancing the Four Skills and Fostering Intercultural Competence
A. Youseif, M. S. Abdo. International Conference on Teaching Arabic as a Foreign Language. Université de Monastir, Tunisia. 2024.
Certificates and Training
2026
Technical AI Safety
BlueDot Impact
Completed technical AI safety training focused on frontier AI risk, neural network behavior, dangerous capability evaluation, control, monitoring, robustness, and governance-relevant safety frameworks.
Service and Awards
2026
Best Abstract Award - Graduate Students
39th Annual Symposium on Arabic Linguistics (ASAL39), Indiana University Bloomington
2026
Lead Organizer - ASAL39
39th Annual Symposium on Arabic Linguistics, Indiana University Bloomington
Organized venue logistics, hotel coordination, keynote communication, website materials, reviewer coordination, and event planning.
2025
Reviewer - COLING 2025
31st International Conference on Computational Linguistics
2024
Reviewer - LREC-COLING 2024
Joint International Conference on Computational Linguistics, Language Resources, and Evaluation
2023
Reviewer - LAW-XVII 2023
17th Linguistic Annotation Workshop
2023-Present
Journal Reviewer
Journal of Digital Islamicate Research
Selected Projects
Rasid: Arabic Twitter Corpus
Built and maintained a 900M+ word Arabic Twitter corpus for corpus linguistic and computational analysis of Arabic public discourse.
RogueTeX
Developed a web-based LaTeX editor with Hugging Face cloud compilation and Supabase cloud storage for browser-based academic writing workflows.
Arabic Ellipsis Detection
Developed datasets and machine learning approaches for detecting and predicting elided elements in Arabic.
Arabic Dialect Identification
Built models for country-level Arabic dialect classification in tweets as part of the NADI 2023 shared task.
Sexism Detection
Collaborated on multilingual sexism detection using pre-trained language models for EXIST 2023.
Technical and Linguistic Skills
NLP and Machine Learning
Transformers · Text Classification · Named Entity Recognition · Dialect Identification · Sentiment Analysis · Topic Modeling · Knowledge Graphs
Interpretability and Evaluation
Probing · Error Analysis · Representation Analysis · NLI Evaluation · Speech Models · AI Safety
Programming and Libraries
Python · Pandas · NumPy · scikit-learn · PyTorch · TensorFlow · Hugging Face
Corpus and Data Work
Corpus Design · Annotation · Web Scraping · Corpus Statistics · Data Visualization · Arabic Corpora
Tools and Methods
spaCy · NLTK · Gensim · Matplotlib · Jupyter · Unix · RAG
Languages
Arabic (Native) · English (Fluent) · Persian (Intermediate)