About | Muhammad S. Abdo

Biography

I am a dual-major PhD student in Computational Linguistics and Middle Eastern Languages & Cultures at Indiana University Bloomington, with a minor in Computer Science. My research centers on Natural Language Inference (NLI) and extends to NLP pipelines designed for real-world applications in Arabic.

I have built and evaluated datasets and systems for ellipsis detection, dialect classification, sexism detection, and named entity recognition in Arabic financial news. I have also studied public discourse on Arabic Twitter and analyzed language use in depression narratives. More recently, I have expanded my focus toward AI safety, explainability, and mechanistic interpretability — examining how transformer-based models internally represent linguistic features.

I developed Rasid, a 900M+ word Arabic Twitter corpus organized by year, month, and week — enabling longitudinal studies of Arabic language change. I also created RogueTeX, a web-based LaTeX editor with cloud compilation and storage, released publicly in early 2026.

mabdo@iu.edu

quick_facts.sh

# Location
echo "Bloomington, Indiana, USA"

# PhD Major 1
echo "Computational Linguistics"

# PhD Major 2
echo "Middle Eastern Languages & Cultures"

# Minor
echo "Computer Science"

# Languages
echo "Arabic (Native)"
echo "English (Fluent)"
echo "Persian (Intermediate)"

Affiliations

🏛️ Dept. of Linguistics
Indiana University Bloomington

🏛️ Dept. of MELC
Indiana University Bloomington

🏛️ Hamilton Lugar School
Global & International Studies, IU

Education

Academic Trajectory

Aug 2022 – Present

Dual-Major PhD — Computational Linguistics & Middle Eastern Languages & Cultures

Indiana University Bloomington · Minor in Computer Science

Research in Arabic NLP, NLI, and Mechanistic Interpretability. Advisors: Dr. Sandra Kübler (Linguistics) and Dr. Nader Morkus (MELC). Passed qualifying exams (MELC, High Pass, 2026).

Jan 2021

MA in English Linguistics

Ain Shams University (Al-Alsun), Cairo, Egypt

Thesis: "Analyzing Appraisal in Major and Bipolar Depression Patients' Narratives in Mental Health Forums: A Corpus-Based Study." Advisor: Dr. Nihal Nagi Sarhan.

Aug 2018

PGCE — Postgraduate Certificate in Education

Monofiya University, Egypt

Jun 2013

BA in English Language and Literature

Benha University, Egypt

Interests

Research Interests

Arabic NLP

Developing datasets, tools, and models for Arabic that cover dialect variation, morphology, and low-resource settings. Work includes ellipsis corpus creation, NER for financial news, and dialect identification.

MSA · Dialects · Corpus

Natural Language Inference

Modeling entailment, contradiction, and pragmatic inference in Arabic texts. Focus on morphological and syntactic cues, discourse markers, and cross-lingual label transfer.

Entailment · Pragmatics · Discourse

Mechanistic Interpretability

Probing transformer circuits to understand how speech and language models encode linguistic nativeness, syntactic structure, and semantic features. Committed to building interpretable, trustworthy AI.

AI Safety · Probing · Speech

Computational Pragmatics

Exploring appraisal theory, discourse analysis, and sentiment in clinical narratives, social media, and religious texts. Applying NLP to socially sensitive domains like mental health and Islamic discourse.

Appraisal · Sentiment · Social Media

Skills

Technical Expertise

NLP & ML

Transformers (HuggingFace) · PyTorch · scikit-learn · LSTM · RAG · Few-Shot Prompting · Fine-tuning

NLP Tasks

Dialect ID · NER · Ellipsis Recovery · Text Classification · Sentiment Analysis · NLI · Information Extraction

Programming

Python · Pandas · Unix / Bash · Jupyter · JavaScript · LaTeX

Data & Annotation

Corpus Design · Annotation Schemes · CQL (Corpus Query Language) · IAA Evaluation

Corpus Tools

LIWC · Stylo · ConceptNet · ArTenTen Corpus · Sketch Engine

Evaluation

F1 / MCC · Human-in-the-Loop · Bias & Fairness · Annotation Quality