Slurm Cluster Orchestration on Kubernetes
Summary
Designed and implemented a containerized, autoscaling High Performance Computing (HPC) cluster solution on Kubernetes.
Master's candidate in High Performance Computing with expertise in designing and deploying robust solutions for distributed workload orchestration, monitoring, and analysis in complex Linux environments. Proven ability to optimize GPU performance, automate critical pipelines, and manage large-scale datasets; seeking impactful roles in HPC, MLOps, and distributed systems engineering.
R&D Intern
Paris / Guyancourt, Île-de-France, France
Summary
As an R&D Intern, Dorian designed and orchestrated machine learning workflows and automated critical data pipelines to improve research efficiency.
Highlights
Designed and orchestrated efficient CNN training workflows in PyTorch, processing datasets exceeding 500 GB for advanced R&D initiatives.
Automated complex data processing and job scheduling pipelines on Slurm with Python and Bash scripts, improving operational efficiency and reproducibility for research projects.
Streamlined research operations by developing robust automation scripts for Slurm, reducing manual intervention and accelerating experimental cycles for critical R&D projects.
Master of Computer Science
High Performance Computing (CHPS)
Courses
Advanced HPC Architectures, Distributed File Systems (Lustre), and I/O Optimization techniques.
Software Engineering principles applied to High Performance Computing systems.
Hands-on experimentation on the ROMEO supercomputer for complex projects.
Awarded By
Tensara
Ranked in the global top 50 out of 1,000+ participants in the Tensara GPU Benchmarking Challenge, demonstrating strong skills in GPU performance optimization, benchmarking, and profiling of compute-intensive algorithms.
C/C++, Python, Bash.
Kubernetes, Docker, Slurm.
Azure, AWS.
OpenMP, Nsight, Distributed Systems, HPC Architectures, I/O Optimization, Supercomputing, Workload Analysis, Performance Profiling, Benchmarking, GPU Optimization.
Git, CI/CD, Linux, Machine Learning (PyTorch, CNN), Data Processing, DAG Modeling, Software Engineering.
Summary
Developed a monitoring and analysis tool for High Performance Computing (HPC) workloads by modeling Slurm job dependencies as Directed Acyclic Graphs (DAGs).