Google Scholar - Twitter - Bluesky - LinkedIn - GitHub

I am a Lecturer (≈ US Assistant Professor) at the Computer Science Department of the University of Cape Town and part of the UCT NLP research group.

My current research is on low-resource NLP, data-efficient modelling, and linguistically informed interpretability. I am particularly interested in exploring the intersection of two areas where data efficiency is central: NLP for low-resource languages and developmentally plausible NLP.

I completed my PhD under Jan Buys at the University of Cape Town in South Africa. My thesis was on optimising subword segmentation for low-resource, morphologically complex languages. Previously, I obtained a master's in AI at the University of Amsterdam, supervised by Martha Lewis, and undergraduate degrees in Computer Science and Mathematical Statistics at Stellenbosch University.

Recent Reviewing (2024/2025): ACL ARR, NeurIPS, COLM, BlackBoxNLP, AfricaNLP

Teaching (2025): CSC3022F Machine Learning for 3rd years, CSC2042S Supervised Machine Learning for 2nd years, CSC1016S Java programming for 1st years, CSC4019Z Research Methods for Honours.

News

October 2025

May 2025 My students and collaborators are presenting a few workshop papers at NAACL and ACL.

December 2024 Our paper BabyLMs for isiXhosa: Data-Efficient Language Modelling in a Low-Resource Context is accepted to the Workshop on Language Models for Low-Resource Languages (LoResLM) at COLING 2025.

June 2024 I attended NAACL in Mexico City to present our paper A Systematic Analysis of Subwords and Cross-Lingual Transfer in Multilingual Translation.

May 2024 I attended LREC-COLING in Turin to give a talk on T2X and present a poster on NGLUEni.

May 2024 Our paper NGLUEni: Benchmarking and Adapting Pretrained Language Models for Nguni Languages won a best paper award at the AfricaNLP workshop co-located with ICLR 2024.