Hi, I'm

Sumin Byeon

The data guy — building platforms for ML/LLM at scale.

I lead a machine learning data platform team at NAVER Cloud. Our mission is to develop and operate a large-scale, fault-tolerant ML data platform where engineers and researchers collaborate to build next-generation AI systems.

Talks

Selected conference talks and panels.

Projects

Selected work from recent roles and products.

NAVER Cloud

MLX Data Manager

A data management system designed for machine learning workloads, offering a Hugging Face-compatible interface for ease of adoption, along with data version control and lineage tracking.

Data, ML, Kotlin, Python
Details soon →
NAVER

Hand-written Font Generator

Given a sheet of handwritten paper, generate a font that resembles the handwriting.

ML, Kotlin
Coupang

Data Lake

Ingested and aggregated operational data from Kafka into S3 (ORC format) for analytics, ensuring integrity with deduplication and exactly-once delivery semantics.

Java, Kafka, S3