
Nicolas Ballier

Full Professor

nicolas.ballier@u-paris.fr

 

Bio

I am a Professor of English at Université Paris Cité. My background is in corpus linguistics (text and speech), and I have a growing interest in AI, data science, machine learning and NLP. I have taught at the Université de Rouen, Paris 13 and Université Paris Diderot, now called Université Paris Cité.

Personal data
nicolas.ballier AT u-paris DOT fr
Office 712 (7th floor)
Phone: +33 (0)1 57 27 58 74
Bâtiment Olympe de Gouges
Place Paul Ricoeur 75013 Paris
How to get there

Postal address:
Université Paris Cité
Nicolas Ballier – UFR EA
Service courrier – case 7046
Bâtiment Olympe de Gouges
27 rue Jean Antoine de Baif
75025 Paris cedex 13

Research Areas

Learner corpus research
Speech models, LLMs, SLMs
Neural machine translation (gender bias, XAI)
Digital humanities
Epistemology of linguistics (3rd revolution of grammatisation)

Recent (co-)supervised funded projects

PAPTAN (co-PI with Maria Zimina-Poirot)
This is our platform for Neural Machine Translation and NLP/AI experiments.

MAKE-NMTViz: associate PI on this Grenoble/Swansea project (2023-2024)
This project investigated machine translation interpretability using visualisation tools.
See for example our paper published in the TAL journal or the project paper we wrote for the EAMT conference.

DLLA: Deep Learning for Language Assessment (2022-2023)
PI for UPCité, co-PI: Helen Yannakoudakis (KCL). A joint project designed to investigate CEFR levels with keylogging data. See our LREC paper, which describes our KUPA-KEYS dataset, published on Hugging Face.

Neuroviz (2021-2022)
PI: Guillaume Wisniewski. A project designed to probe information flow in neural networks for translation.

SPECTRANS (2020-2022)
PI of this interdisciplinary project on specialised neural machine translation (SPECTRANS)
Project GitHub repository with the data

GIFRA
PI for UPCité for “Promoting fairness for under-represented languages in multilingual LLMs” (2025-2026)
This UPCité/UW collaboration is funded by a University of Washington (UW) Global Innovation Fund (GIF) Research Award.

Recent events

Speech and text LLMs revisited: under the hood investigation of multilingual models (Dec 2025)

The sound patterns of Whisper: an informal workshop on audio LLM response to speech stimuli

Deep Learning for Language Assessment closing event
NeuroViz and SPECTRANS for neural machine translation

Latest publications

Journal articles

Assessing the validity of syntactic alternations as criterial features of proficiency in L2 writings in English
Cyriel Mallart, Andrew Simpkin, Nicolas Ballier, Paula Lissón, Rémi Venant, Bernardo Stearns, Jen-Yu Li, Thomas Gaillat
Research Methods in Applied Linguistics, 2025, 4 (3), pp.100238. ⟨10.1016/j.rmal.2025.100238⟩
Exploring the cross-lingual influence of linguistic complexity in second language writing assessment
Sara Geremia, Thomas Gaillat, Nicolas Ballier, Andrew Simpkin
Assessing Writing, 2025, 66, pp.100951. ⟨10.1016/j.asw.2025.100951⟩

Conference papers

Assessing the statistical validity of multi-noun alternation metrics as features of L2-English proficiency
Thomas Gaillat, Cyriel Mallart, Andrew Simpkin, Nicolas Ballier, Rémi Venant, Bernardo Stearns, Jen-Yu Li, Paula Lissón
58th Annual Meeting of the Societas Linguistica Europaea, Université Bordeaux Montaigne, Aug 2025, Bordeaux, France
Predicting CEFR levels for learners of English with keylogging metrics, an exploratory study
Ahood Al Swar, Erin Pacquetet, Cyriel Mallart, Andrew J. Simpkin, Nicolas Ballier
CORIA-TALN 2025, Université d’Aix-Marseille and the CNRS research units (UMR) LIS and LPL, Jun 2025, Marseille, France. https://talnarchives.atala.org/ateliers/2025/DYNTAL/index.html
https://hal.science/hal-05146826/file/218.pdf
Actionability in CALL: linking proficiency prediction models to interpretable indicators
Thomas Gaillat, Cyriel Mallart, Andrew Simpkin, Rémi Venant, Nicolas Ballier, Bernardo Stearns, Jen-Yu Li, Paula Lissón
International Workshop on Foreign language learning and proficiency-rated reading materials: SLA research and AI methods supporting analysis and effective didactics in real-life education, Universität Tübingen, Mar 2025, Tübingen, Germany

Reports

“A language-learning analytics system” project DMP
Nicolas Ballier, Thomas Gaillat, Jen-Yu Li, Cyriel Mallart, Andrew Simpkin, Bernardo Stearns, Rémi Venant
Opidor. 2025, https://dmp.opidor.fr/plans/13498
https://hal.science/hal-04988173/file/A_language-learning_analytics_system_project_DMP-5.pdf

Preprints, Working Papers, …

Assessing the validity of new paradigmatic complexity measures as criterial features for proficiency in L2 writings in English
Cyriel Mallart, Andrew Simpkin, Nicolas Ballier, Paula Lissón, Rémi Venant, Jen-Yu Li, Bernardo Stearns, Thomas Gaillat
2025

# 2024

Gabriela González Sáez, Fabien Lopez, Mariam Nakhlé, James Turner, Nicolas Ballier, Marco Dinarelli, Emmanuelle Esperança-Rodier, Sui He, Caroline Rossi, Didier Schwab, Jun Yang: The MAKE-NMTViz Project: Meaningful, Accurate and Knowledge-limited Explanations of NMT Systems for Translators. EAMT (2) 2024: 12-13 PDF

Gabriela González Sáez, Mariam Nakhlé, James Turner, Fabien Lopez, Nicolas Ballier, Marco Dinarelli, Emmanuelle Esperança-Rodier, Sui He, Raheel Qader, Caroline Rossi, Didier Schwab, Jun Yang: Exploring NMT Explainability for Translators Using NMT Visualising Tools. EAMT (1) 2024: 396-410 PDF

Bernardo Stearns, Nicolas Ballier, Thomas Gaillat, Andrew Simpkin, John P. McCrae (2024) Evaluating the Generalisation of an Artificial Learner
Proceedings of the 13th Workshop on Natural Language Processing for Computer Assisted Language Learning, 199-208. PDF

Marco Dinarelli, Dimitra Niaouri, Fabien Lopez, Gabriela Gonzalez-Saez, Mariam Nakhlé, Emmanuelle Esperança-Rodier, Caroline Rossi, Didier Schwab* and Nicolas Ballier (2024) Context-Aware Neural Machine Translation Models Analysis And Evaluation Through Attention, TAL, n° 64, vol. 3, 67-91. Special issue on interpretability. PDF

Velentzas, G., Caines, A., Borgo, R., Pacquetet, E., Hamilton, C., Arnold, T., Nicholls, D., Gaillat, T., Ballier, N. and Yannakoudakis, H. (2024). Logging Keystrokes in Writing by English Learners. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 10725-10746). PDF

Nicolas Ballier, Taylor Arnold, Adrien Méli, Tori Thurston, Jean-Baptiste Yunès (2024) Whisper for L2 speech scoring, International Journal of Speech Technology. Springer.

Ballier, N. & Méli, A. (2024) Investigating Acoustic Correlates of Whisper Scoring for L2 Speech Using Forced Alignment with the Italian Component of the ISLE Corpus, NLP4CALL 2024, Rennes, 24-25 Oct 2024, 20-32, published in the ACL Anthology.

Ballier, N., Burin, L., Namdarzadeh, B., Ng, S., Wright, R. and Yunès, J.-B. (2024) Probing Whisper Predictions for French, English and Persian Transcriptions, 7th International Conference on Natural Language and Speech Processing, October 19-20, 2024, Trento, Italy, 129-138, published in the ACL Anthology.

# Selected communications

Tori Thurston (Fullerton) & Nicolas Ballier (2024) “Using Whisper to investigate learner pronunciations of English: Comparing LLM transcriptions with human perception of VOT”, 29 March, ALOES conference, Villetaneuse
https://eclla.univ-st-etienne.fr/fr/tout-l-agenda/annee-2023-2024/eclla-2024-21e-colloque-aloes.html

Ballier, N. & Yannakoudakis, H. (2022) Towards crowdsourcing research for learner keylogging data, LCR 2022, Padova, 22-24 Sept.
Chamoun, J. & Ballier, N. (2022) Automatic Analysis of Learner Essays based on Complexity Metrics using Machine Learning Algorithms, LCR 2022, Padova, 22-24 Sept.

Ballier, N. (2022) Faut-il former à ce que voit le réseau de neurones pour l’entraînement de la traduction ?, conference at Université libre de Bruxelles, Enseigner la traduction et l’interprétation à l’heure neuronale, 28-29 September 2022

Namdarzadeh, B. & Ballier, N. (2022) What Does Neural Machine Translation Learn? A Snapshot from Google Translate & DeepL (2021-February 2022), conference at Université libre de Bruxelles, Enseigner la traduction et l’interprétation à l’heure neuronale, 28-29 September 2022. https://tradital.ltc.ulb.be/medias/fichier/2022-colloque-tradital-programme-online_1660741236130-pdf

Ballier, N. (2022) Traduire les dislocations de l’oral avec la traduction neuronale. Le cas des dislocations à gauche dans le Corpus de Français Parlé Parisien (CFPP) des années 2000, TROL conference (Traduire l’oralité à l’ère de l’IA), Université de Turin, 5-6 December 2022

Publications on the ACL anthology

DBLP (computer science Digital Bibliography & Library Project):

LINK

Latest Open Access Publications on HAL (French Open Access Repository)

CV on HAL

https://cv.hal.science/nicolas-ballier?langChosen=fr

Check other publications on HAL

You will be redirected to the French open access repository HAL.