Publications
Papers
- The Learning Dynamics of Subword Segmentation for Morphologically Diverse Languages
  Francois Meyer and Jan Buys
  AACL 2025
- BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data
  Jaap Jumelet, Abdellah Fourtassi, Akari Haga, Bastian Bunzeck, Bhargav Shandilya, Diana Galvan-Sosa, Faiz Ghifari Haznitrama, Francesca Padovani, Francois Meyer, Hai Hu, Julen Etxaniz, Laurent Prevot, Linyang He, María Grandury, Mila Marcheva, Negar Foroutan, Nikitas Theodoropoulos, Pouya Sadeghi, Siyuan Song, Suchir Salhan, Susana Zhou, Yurii Paniv, Ziyin Zhang, Arianna Bisazza, Alex Warstadt, Leshem Choshen
  arXiv October 2025
- Designing and Contextualising Probes for African Languages
  Wisdom Aduah and Francois Meyer
  AfricaNLP workshop @ ACL 2025
- Neural Morphological Tagging for Nguni Languages
  Cael Marquard, Simbarashe Mawere, Francois Meyer
  AfricaNLP workshop @ ACL 2025
- Benchmarking IsiXhosa Automatic Speech Recognition and Machine Translation for Digital Health Provision
  Abby Blocker, Francois Meyer, Ahmed Biyabani, Joyce Mwangama, Mohammed Ishaaq Datay, Bessie Malila
  Workshop on Patient-Oriented Language Processing (CL4Health) @ NAACL 2025
- BabyLMs for isiXhosa: Data-Efficient Language Modelling in a Low-Resource Context
  Alexis Matzopoulos, Charl Hendriks, Hishaam Mahomed and Francois Meyer
  Workshop on Language Models for Low-Resource Languages @ COLING 2025
- A Systematic Analysis of Subwords and Cross-Lingual Transfer in Multilingual Translation
  Francois Meyer and Jan Buys
  NAACL Findings 2024
- Triples-to-isiXhosa (T2X): Addressing the Challenges of Low-Resource Agglutinative Data-to-Text Generation
  Francois Meyer and Jan Buys
  LREC-COLING 2024 [data]
- NGLUEni: Benchmarking and Adapting Pretrained Language Models for Nguni Languages
  Francois Meyer, Haiyue Song, Abhisek Chakrabarty, Jan Buys, Raj Dabre and Hideki Tanaka
  LREC-COLING 2024 [data]
- SubMerge: Merging Equivalent Subword Tokenizations for Subword Regularized Models in Neural Machine Translation
  Haiyue Song, Francois Meyer, Raj Dabre, Hideki Tanaka, Chenhui Chu, Sadao Kurohashi
  EAMT 2024
- Subword Segmental Machine Translation: Unifying Segmentation and Target Sentence Generation
  Francois Meyer and Jan Buys
  Findings of ACL 2023
- Subword Segmental Language Modelling for Nguni Languages
  Francois Meyer and Jan Buys
  Findings of EMNLP 2022
- University of Cape Town’s WMT22 System: Multilingual Machine Translation for Southern African Languages
  Khalid N. Elmadani, Francois Meyer, and Jan Buys
  WMT 2022
- NLAPOST2021 1st Shared Task on Part-of-Speech Tagging for Nguni Languages
  Franziska Pannach, Francois Meyer, Edgar Jembere, Sibonelo Zamokuhle Dlamini
  Proceedings of the International Conference of the Digital Humanities Association of Southern Africa (DHASA) 2021
- Challenging Distributional Models with a Conceptual Network of Philosophical Terms
  Yvette Oortwijn, Jelke Bloem, Pia Sommerauer, Francois Meyer, Wei Zhou, and Antske Fokkens
  NAACL 2021
- Modelling Lexical Ambiguity with Density Matrices
  Francois Meyer and Martha Lewis
  CoNLL 2020
- The semantics of meaning: distributional approaches for studying philosophical text
  Francois Meyer, Yvette Oortwijn, Pia Sommerauer, Jelke Bloem, Arianna Betti, and Antske Fokkens
  Proceedings of the Network Institute Academy Assistants programme, 2019
- Learning Concept Embeddings from Temporal Data
  Francois Meyer, Brink van der Merwe, and Dirko Coetsee
  Journal of Universal Computer Science, 2018
Theses
- PhD thesis: Subword Segmental Neural Language Generation for Nguni Languages
  University of Cape Town, 2024
  Supervisor: Jan Buys
- MSc thesis: Lexical ambiguity with density matrices
  University of Amsterdam, 2020
  Supervisor: Martha Lewis
- BSc Honours thesis: Learning Concept Embeddings from Temporal Data
  University of Stellenbosch, 2017
  Supervisor: Brink van der Merwe