Profile Summary
I am a dual-major PhD student in Computational Linguistics and Middle Eastern Languages & Cultures at Indiana University Bloomington, with a minor in Computer Science. My research connects Arabic linguistics with multilingual NLP and mechanistic interpretability: I build Arabic and multilingual datasets, evaluate language and speech models, and develop research tools for linguistic and computational analysis. Current projects include multilingual NLI label drift under machine translation, mechanistic interpretability of Arabic speech transformers, Arabic ellipsis detection, Arabic financial NER, and ontology-grounded knowledge graph construction for Alzheimer's disease literature.
Publications
Citations: 122 · h-index: 4 · i10-index: 3 (Google Scholar, May 2026)
Forthcoming
Diminutives in North African Varieties of Arabic
S. Davis, M. S. Abdo. In The Handbook of North African Arabic. Forthcoming, August 2026.
Journal Articles, Book Chapters, and Newsletter Contributions
Roundtable Discussion: AI in MENA Politics Research
C. Barnett, M. S. Abdo, T. Adely, C. Bianco, A. Elshehawy, R. Kubinec, M. Robbins. APSA MENA Newsletter, 9(1):55-73. American Political Science Association MENA Section. 2026.
Ellipsis in Arabic: Using Machine Learning to Detect and Predict Elided Words
M. S. Abdo, D. Cavar, B. Dickson, A. Youseif. Arabic Linguistics, 1(2):240-263. John Benjamins. 2025.
Allocution, Sentencing, and Viewers' Comments in YouTube-mediated Trials of Convicted Young Murderers: An Appraisal-Sentiment Analysis
A. A. El Attar, M. S. Abdo. Language and Semiotic Studies, 11(4):636-660. 2025.
Thus Spoke a Couple: A Corpus-Based Content Analysis of Spousal Duties Fatwas
M. S. Abdo, A. Omran, S. F. Hassan. Journal of Digital Islamicate Research, 1(1-2):37-65. 2024.
How Do Arab Tweeters Perceive the COVID-19 Pandemic?
B. A. Essam, M. S. Abdo. Journal of Psycholinguistic Research. 2020. [74 citations]
Public Perception of COVID-19's Global Health Crisis on Twitter until 14 Weeks after the Outbreak
M. S. Abdo, A. S. Alghonaim, B. A. Essam. Digital Scholarship in the Humanities. Oxford University Press. 2020. [22 citations]
Analyzing Judgment in Bipolar Depression Patients' Narratives Using Syntactic Patterns: A Corpus-Based Study
M. S. Abdo, A. Y. Ali, N. N. Sarhan. Egyptian Journal of Language Engineering, 6(1):1-11. 2019.
Conference and Workshop Papers
AMWAL: Named Entity Recognition for Arabic Financial News
M. S. Abdo, Y. Hatekar, D. Cavar. FinNLP-FNP-LLMFinLegal @ COLING 2025, Abu Dhabi. 2025.
The Typology of Ellipsis: A Corpus for Linguistic Analysis and Machine Learning Applications
D. Cavar, L. V. Mompelat, M. S. Abdo. SIGTYP 2024 @ EACL. 2024. [12 citations]
IUNADI at NADI 2023 Shared Task: Country-level Arabic Dialect Classification in Tweets
Y. Hatekar, M. S. Abdo. ArabicNLP 2023, pp. 665-669. 2023.
IUEXIST: Multilingual Pre-trained Language Models for Sexism Detection on Twitter in EXIST2023
Y. Hatekar, M. S. Abdo, S. Khanna, S. Kübler. CLEF 2023 Working Notes, pp. 950-958. 2023.
Selected Conference Presentations and Posters
Inside the Nativeness Axis: Mechanistic Interpretability of Native and Non-Native Arabic Speech Representations
M. S. Abdo. Poster presented at NLP @ Michigan Day 2026. Ann Arbor, Michigan. April 22, 2026.
Translation-Induced Label Drift across Nine Languages in Natural Language Inference
M. S. Abdo, E. Naske, J. P. Benavides, V. Shi, E. H. Attala e Paiva, H. Karkoutli, P. Artkaew, S. Ashraf, J. Wang, S. Kübler. Poster presented at Midwest Speech and Language Days 2026. Urbana-Champaign, Illinois. April 15-16, 2026.
Ontology-Grounded Knowledge Graph Construction for Alzheimer's Disease Literature Using Multi-Model Ensemble Embeddings
M. S. Abdo, D. Cavar. Poster presented at Midwest Speech and Language Days 2026. Urbana-Champaign, Illinois. April 15-16, 2026.
Natural Language Inference: Lost in Translation? Label Stability in Arabic Machine Translation
M. S. Abdo, S. Kübler. Presented at the 39th Annual Symposium on Arabic Linguistics. Bloomington, Indiana. March 27-29, 2026.
On the Diminutive Suffix -aaya in Egyptian (Cairene) Arabic
M. S. Abdo, S. Davis. Presented at the Sixth Arabic Linguistics Forum. University of Vienna, Austria. September 3-5, 2025.
AMWAL: Named Entity Recognition for Arabic Financial News
M. S. Abdo, Y. Hatekar, D. Cavar. Presented at FinNLP-FNP-LLMFinLegal @ COLING 2025. Abu Dhabi, UAE. January 19-24, 2025.
Comparing Apologies and Complaints in Egyptian Arabic: Native Speakers vs. Large Language Model-based Chatbots
M. S. Abdo, N. Sarhan. Presented at the 5th Biennial Arabic Applied Linguistics Conference. Michigan. October 19-20, 2024.
The Hoosier Ellipsis Corpus: Building a Corpus of Ellipsis for Arabic Natural Language Processing
M. S. Abdo, D. Cavar. Presented at Midwest Speech and Language Days 2024. Michigan. April 15-16, 2024.
Ellipsis in Arabic: Using Machine Learning to Detect and Predict Elided Words
M. S. Abdo, D. Cavar, B. Dickson. Presented at the 37th Annual Symposium on Arabic Linguistics. Long Island University, New York. February 23-25, 2024.
Selected Language Teaching Presentations
Collocation Use in Arabic Learners' and Native Speakers' Writings
A. Youseif, M. S. Abdo. ACTFL Annual Convention. Philadelphia, USA. 2024.
Alhijra Alshar'ia: A Corpus-based Analysis of Collocation Use in Arabic Learners' and Native Speakers' Writing
A. Youseif, M. S. Abdo. BATA 4th Annual International Conference. Al-Maktoum College, Scotland. 2024.
Artificial Intelligence for Arabic Language Teaching: Enhancing the Four Skills and Fostering Intercultural Competence
A. Youseif, M. S. Abdo. International Conference on Teaching Arabic as a Foreign Language. Université de Monastir, Tunisia. 2024.
Selected Projects
Rasid: Arabic Twitter Corpus
Built and maintained a 900M+ word Arabic Twitter corpus for corpus-linguistic and computational analysis of Arabic public discourse.
RogueTeX
Developed a browser-based LaTeX editor for academic writing, with cloud compilation via Hugging Face and cloud storage via Supabase.
Arabic Ellipsis Detection
Developed datasets and machine learning approaches for detecting and predicting elided elements in Arabic.
Arabic Dialect Identification
Built models for country-level Arabic dialect classification in tweets as part of the NADI 2023 shared task.
Sexism Detection
Collaborated on multilingual sexism detection using pre-trained language models for EXIST 2023.