About
Google Scholar - Twitter - Bluesky - LinkedIn - GitHub

I am a Lecturer (≈ US Assistant Professor) at the Computer Science Department of the University of Cape Town and part of the UCT NLP research group.
My current research is on low-resource NLP, data-efficient modelling, and linguistically informed interpretability. I am particularly interested in exploring the intersection of two areas where data efficiency is central: NLP for low-resource languages and developmentally plausible NLP.
I completed my PhD under Jan Buys at the University of Cape Town in South Africa. My thesis was on optimising subword segmentation for low-resource, morphologically complex languages. Previously I obtained a master's in AI at the University of Amsterdam, supervised by Martha Lewis, and my undergraduate degrees in Computer Science and Mathematical Statistics at Stellenbosch University.
Recent Reviewing (2024/2025): ACL ARR, NeurIPS, COLM, BlackBoxNLP, AfricaNLP
Teaching (2025): CSC3022F Machine Learning for 3rd years, CSC2042S Supervised Machine Learning for 2nd years, CSC1016S Java programming for 1st years, CSC4019Z Research Methods for Honours.
News
October 2025
- Our paper The Learning Dynamics of Subword Segmentation for Morphologically Diverse Languages is accepted to AACL. I will be in Mumbai in December to present it.
- We released BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data [website]. I was responsible for curating the datasets for five South African languages: isiXhosa, isiZulu, Sesotho, Sepedi, and Afrikaans.
May 2025 My students and collaborators are presenting a few workshop papers at NAACL and ACL.
- Designing and Contextualising Probes for African Languages
  Wisdom Aduah, Francois Meyer
  AfricaNLP workshop @ ACL 2025
- Neural Morphological Tagging for Nguni Languages
  Cael Marquard, Simbarashe Mawere, Francois Meyer
  AfricaNLP workshop @ ACL 2025
- Benchmarking IsiXhosa Automatic Speech Recognition and Machine Translation for Digital Health Provision
  Abby Blocker, Francois Meyer, Ahmed Biyabani, Joyce Mwangama, Mohammed Ishaaq Datay, Bessie Malila
  Workshop on Patient-Oriented Language Processing (CL4Health) @ NAACL 2025
December 2024 Our paper BabyLMs for isiXhosa: Data-Efficient Language Modelling in a Low-Resource Context is accepted to the Workshop on Language Models for Low-Resource Languages (LoResLM) at COLING 2025.
June 2024 I attended NAACL in Mexico City to present our paper A Systematic Analysis of Subwords and Cross-Lingual Transfer in Multilingual Translation.
May 2024 I attended LREC-COLING in Turin to present a talk on T2X and a poster on NGLUEni.
- T2X: Addressing the Challenges of Low-Resource Agglutinative Data-to-Text Generation
  Francois Meyer and Jan Buys
  [dataset]
- NGLUEni: Benchmarking and Adapting Pretrained Language Models for Nguni Languages
  Francois Meyer, Haiyue Song, Abhisek Chakrabarty, Jan Buys, Raj Dabre and Hideki Tanaka
  [benchmark]
May 2024 Our paper NGLUEni: Benchmarking and Adapting Pretrained Language Models for Nguni Languages won a best paper award at the AfricaNLP workshop co-located with ICLR 2024.