CV
Research Scientist and Medical Doctor building and evaluating large language models for high-stakes settings. My work spans post-training, alignment, adversarial evaluation, uncertainty/calibration, and production deployment of LLM systems. I have led end-to-end development of medical LLMs, safety evaluation pipelines, and an Epic-integrated clinical agent used by 3,000+ clinicians across 150,000+ clinical interactions. I combine clinical expertise and research leadership with deep systems engineering experience across training, inference, and large-scale software infrastructure. I am especially interested in trustworthy reasoning, scalable oversight, and robust deployment of frontier models in real-world environments.
Education
- Ph.D. in Computer Science, Université Catholique de Louvain, Brussels, 2026
- Doctor of Medicine (M.D.), Université Catholique de Louvain, Brussels, 2023
- M.S. in Computer Science, Université d’Avignon, France, 2014
- B.S. in Computer Science, Université d’Avignon, France, 2012
Experience
Postdoctoral Researcher — Stanford University (Remote / San Francisco, CA) · April 2026 – present
- Designing and implementing medical benchmarks for LLMs
- Red teaming multimodal frontier models with adversarial attacks
Research Scientist — Université Catholique de Louvain (Brussels) · October 2023 – March 2026
- Trained Internist-7B, a 7B-parameter medical LLM based on Mistral; first model of its size to surpass 60% on MedQA. Public release: internistai/base-7b-v0.2
- Designed and implemented alignment and safety evaluation pipelines including automated adversarial red teaming, hallucination suppression, self-critique, and metacognitive reasoning assessment for clinical reliability
- Led end-to-end deployment of a clinical agentic system integrated with Epic EHR (model serving, inference orchestration, clinician-facing UI, safety guardrails, auditability, and monitoring); in production with 3,000+ clinical users across 150,000+ clinical interactions. Press: RTL Info
- Visiting Researcher, Harvard University (LiGHT) — evaluated alignment techniques for clinical LLMs with practicing clinicians as part of the MOOVE initiative
- Visiting Researcher, Cleveland Clinic — trained Qwen-based models with GRPO for safe de-identification of clinical documents while preserving semantic fidelity
Machine Learning Scientist (Consultant) — DeepSky (Remote / San Francisco, CA) · August – September 2023
- Designed and curated a dataset for a foundational medical generative text model
- Trained the model with PyTorch on AWS SageMaker
Medical Doctor Clerkships — Cliniques Universitaires Saint-Luc (Brussels) · 2021 – 2023
- Core rotations: Emergency Medicine, Nephrology, Geriatrics, Obstetrics, Pediatrics, General Surgery, Family Medicine, Pulmonology, Anesthesiology, Radiology
- Medical thesis on computer-assisted diagnosis of rare kidney diseases in emergency departments
Lead Software Engineer — Tilted Phoques (Brussels) · 2017 – 2023
- Built large-scale systems in C++, Python, C#, AWS, and Kubernetes, including networking infrastructure handling tens of thousands of real-time concurrent users
- Performed low-level optimization in Assembly and security analysis using IDA Pro and WinDbg
- Authored open-source Cyber Engine Tweaks — 4,600+ GitHub stars, 10M+ downloads
Software Engineer — Bethesda Softworks (Remote / Austin, TX) · 2016 – 2017
- Designed an anti-cheat system from scratch in C++, Assembly, and Python for upcoming titles
- Built backend systems for cheat reports and analytics on AWS using microservices and Lambda; contributed to the open-source AWS C++ SDK
- Researched obfuscation, tamper detection, and code/memory integrity techniques
Software Engineer — ZeniMax Online Studios (Hunt Valley, MD) · 2014 – 2016
- Designed load-balancing systems improving response time and capacity at the scale of hundreds of thousands of concurrent connections
- Performed low-level optimization across memory management, networking, I/O, threading, and lock-free data structures
- Built anti-cheat systems including server-side payload generation, data obfuscation, and debugger traps
Awards & Honors
- 2025 — Senior Area Chair Highlight, ACL 2025
- 2025 — Nature Communications feature
- 2025 — Health Data Agency Grant (€70,000; PI)
- 2025 — FSR Fellowship (Special Research Fund, Belgium) — PhD scholarship
- 2023 – 2027 — FSL Fellowship (Saint-Luc Fund, Belgium) — PhD scholarship
- 2023 — 2nd place, Mistral AI Hackathon (RAISE Summit)
Skills
- Alignment & Safety — RLHF, scalable oversight, preference modeling, red teaming, interpretability workflows, adversarial robustness, uncertainty / calibration
- Model Development — Python, PyTorch, HuggingFace (transformers, TRL), Axolotl, CUDA, C++, Assembly
- Evaluation & Benchmarking — lm-eval-harness, custom clinical reasoning benchmarks, physician-in-the-loop evaluation pipelines
- Infrastructure & Deployment — vLLM, SGLang, Kubernetes, Docker, MLflow, Weights & Biases, Epic EHR integration
Publications
Griot, M. (2026). "A Methodology for Developing and Integrating Large Language Models into Electronic Health Records to Support Clinical Workflows." Doctoral Thesis, Université Catholique de Louvain.
Griot, M., Irrthum, A., Vanderdonckt, J., & Yuksel, D. (2026). "Implémentation d'un chatbot dans le dossier patient informatisé." Actes de la journée d'étude sur l'utilisation des LLM à l'hôpital.
Griot, M., Vanderdonckt, J., Yuksel, D., & Hemptinne, C. (2025). "Pattern recognition or medical knowledge? The problem with multiple-choice questions in medicine." Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL).
Griot, M., Hemptinne, C., Vanderdonckt, J., & Yuksel, D. (2025). "Large language models lack essential metacognition for reliable medical reasoning." Nature Communications.
Gobert, A., Rappe, M., & Griot, M. (2025). "La régulation de l'utilisation de l'intelligence artificielle en milieu hospitalier." In Droit hospitalier: décodage juridique au départ des réalités hospitalières.
Griot, M., Vanderdonckt, J., Yuksel, D., & Hemptinne, C. (2025). "Physician in the Loop Design of Interactive Agents." Engineering Interactive Computer Systems — EICS 2024 International Workshops.
Griot, M. F., & Walker, G. A. (2025). "A patient-in-the-loop approach to artificial intelligence in medicine." JAMA Network Open.
Griot, M., Vanderdonckt, J., & Yuksel, D. (2025). "Implementation of large language models in electronic health records." PLOS Digital Health.
Griot, M., Hemptinne, C., Vanderdonckt, J., & Yuksel, D. (2025). "A hybrid deployment model for generative artificial intelligence in hospitals." Machine Learning: Health.
Griot, M., Hemptinne, C., Vanderdonckt, J., & Yuksel, D. (2024). "Impact of high-quality, mixed-domain data on the performance of medical language models." Journal of the American Medical Informatics Association (JAMIA), 31(9), 1875.
Griot, M., Hemptinne, C., Vanderdonckt, J., & Yuksel, D. (2024). "MetaMedQA benchmark." Zenodo.