CV

Research Scientist and Medical Doctor building and evaluating large language models for high-stakes settings. My work spans post-training, alignment, adversarial evaluation, uncertainty/calibration, and production deployment of LLM systems. I have led end-to-end development of medical LLMs, safety evaluation pipelines, and an Epic-integrated clinical agent used by 3,000+ clinicians across 150,000+ clinical interactions. I bring a rare combination of clinical expertise, research leadership, and deep systems engineering experience across training, inference, and large-scale software infrastructure. I am especially interested in trustworthy reasoning, scalable oversight, and robust deployment of frontier models in real-world environments.

Education

Ph.D. in Computer Science, Université Catholique de Louvain, Brussels, 2026
- Defended March 2026
Doctor of Medicine (M.D.), Université Catholique de Louvain, Brussels, 2023
M.S. in Computer Science, Université d’Avignon, France, 2014
B.S. in Computer Science, Université d’Avignon, France, 2012

Experience

Postdoctoral Researcher — Stanford University (Remote / San Francisco, CA) · April 2026 – present

Designing and implementing medical benchmarks for LLMs
Red teaming multimodal frontier models with adversarial attacks

Research Scientist — Université Catholique de Louvain (Brussels) · October 2023 – March 2026

Trained Internist-7B, a 7B-parameter medical LLM based on Mistral; first model of its size to surpass 60% on MedQA. Public release: internistai/base-7b-v0.2
Designed and implemented alignment and safety evaluation pipelines including automated adversarial red teaming, hallucination suppression, self-critique, and metacognitive reasoning assessment for clinical reliability
Led end-to-end deployment of a clinical agentic system integrated with Epic EHR, covering model serving, inference orchestration, clinician-facing UI, safety guardrails, auditability, and monitoring; in production with 3,000+ clinical users across 150,000+ clinical interactions. Press: RTL Info
Visiting Researcher, Harvard University (LiGHT) — evaluated alignment techniques for clinical LLMs with practicing clinicians as part of the MOOVE initiative
Visiting Researcher, Cleveland Clinic — trained Qwen-based models with GRPO for safe de-identification of clinical documents while preserving semantic fidelity

Machine Learning Scientist (Consultant) — DeepSky (Remote / San Francisco, CA) · August – September 2023

Designed and curated a dataset for a foundational medical generative text model
Performed training with AWS SageMaker and PyTorch

Medical Doctor Clerkships — Cliniques Universitaires Saint-Luc (Brussels) · 2021 – 2023

Core rotations: Emergency Medicine, Nephrology, Geriatrics, Obstetrics, Pediatrics, General Surgery, Family Medicine, Pulmonology, Anesthesiology, Radiology
Medical thesis on computer-assisted diagnosis of rare kidney diseases in emergency departments

Lead Software Engineer — Tilted Phoques (Brussels) · 2017 – 2023

Built large-scale systems in C++, Python, C#, AWS, and Kubernetes, including networking infrastructure handling tens of thousands of real-time concurrent users
Low-level optimization in Assembly and security analysis using IDA Pro and WinDbg
Authored open-source Cyber Engine Tweaks — 4,600+ GitHub stars, 10M+ downloads

Software Engineer — Bethesda Softworks (Remote / Austin, TX) · 2016 – 2017

Designed an anti-cheat system from scratch in C++, Assembly, and Python for upcoming titles
Built backend systems for cheat reports and analytics on AWS using microservices and Lambda; contributed to the open-source AWS C++ SDK
Researched obfuscation, tamper detection, and code/memory integrity techniques

Software Engineer — ZeniMax Online Studios (Hunt Valley, MD) · 2014 – 2016

Designed load-balancing systems improving response time and capacity at the scale of hundreds of thousands of concurrent connections
Low-level optimization across memory management, networking, I/O, threading, and lock-free data structures
Built anti-cheat systems including server-side payload generation, data obfuscation, and debugger traps

Awards & Honors

2025 — Senior Area Chair Highlight, ACL 2025
2025 — Nature Communications feature
2025 — Health Data Agency Grant (€70,000; PI)
2025 — FSR Fellowship (Special Research Fund, Belgium) — PhD scholarship
2023 – 2027 — FSL Fellowship (Saint-Luc Fund, Belgium) — PhD scholarship
2023 — 2nd place, Mistral AI Hackathon (RAISE Summit)

Skills

Alignment & Safety — RLHF, scalable oversight, preference modeling, red teaming, interpretability workflows, adversarial robustness, uncertainty / calibration
Model Development — Python, PyTorch, HuggingFace (transformers, TRL), Axolotl, CUDA, C++, Assembly
Evaluation & Benchmarking — lm-eval-harness, custom clinical reasoning benchmarks, physician-in-the-loop evaluation pipelines
Infrastructure & Deployment — vLLM, SGLang, Kubernetes, Docker, MLflow, Weights & Biases, Epic EHR integration

Publications

A Methodology for Developing and Integrating Large Language Models into Electronic Health Records to Support Clinical Workflows

Griot, M. (2026). "A Methodology for Developing and Integrating Large Language Models into Electronic Health Records to Support Clinical Workflows." Doctoral Thesis, Université Catholique de Louvain.

Implémentation d’un chatbot dans le dossier patient informatisé

Griot, M., Irrthum, A., Vanderdonckt, J., & Yuksel, D. (2026). "Implémentation d'un chatbot dans le dossier patient informatisé." Actes de la journée d'étude sur l'utilisation des LLM à l'hôpital.

Pattern recognition or medical knowledge? The problem with multiple-choice questions in medicine

Griot, M., Vanderdonckt, J., Yuksel, D., & Hemptinne, C. (2025). "Pattern recognition or medical knowledge? The problem with multiple-choice questions in medicine." Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL).

Maxime Griot

Large language models lack essential metacognition for reliable medical reasoning

La régulation de l’utilisation de l’intelligence artificielle en milieu hospitalier

Physician in the Loop Design of Interactive Agents

A patient-in-the-loop approach to artificial intelligence in medicine

Implementation of large language models in electronic health records

A hybrid deployment model for generative artificial intelligence in hospitals

Impact of high-quality, mixed-domain data on the performance of medical language models

MetaMedQA benchmark