Research Scientist · Trust & Safety / AI Security

Hanbin Hong

Trust & Safety for multimodal models — data, training, and evaluation. Building LLM/VLM-powered video moderation systems, with automated labeling, large-scale datasets, and mid-/post-training.

Trust & Safety · Video moderation · Data & labeling · Mid-/post-training · Safety evaluation · Adversarial robustness

About

I work at the intersection of Trust & Safety, adversarial ML, and multimodal model training — building pipelines that move from research ideas to production-grade systems.

I care about measurable reliability: turning ambiguous policy requirements into datasets, training objectives (mid-/post-training), and evaluation suites that are reproducible, auditable, and fast to iterate.

  • Video moderation: end-to-end automation for review, labeling, and quality control.
  • Model security: prompt security, red-teaming, and adversarial robustness.
  • Certified robustness: provable guarantees & verification for safety-critical settings.

Background

  • Ph.D. (CSE), University of Connecticut (Sep 2022–Aug 2025).
  • Ph.D. (CS), Illinois Institute of Technology (2021–2022).
  • B.S., Honors Science (Physics), Xi’an Jiaotong University (2014–2018).

Research themes: prompt security, adversarial attacks/defenses, and certified robustness — now applied to large-scale multimodal moderation.

What I build

Video moderation at scale

LLM/VLM-assisted review pipelines for real-world policy enforcement and reliability.

Data & post-training

Automated labeling, large-scale dataset construction, and mid-/post-training for production models.

Security & evaluation

Prompt security, adversarial ML, and certified robustness — turning research into deployable evals and guardrails.

Selected publications

Full list on Google Scholar.

  • SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models — arXiv, 2025.
  • Towards Strong Certified Defense with Universal Asymmetric Randomization — IEEE CSF, 2026.
  • Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence — ACM CCS, 2024.
  • Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks — IEEE S&P, 2024.
  • UniCR: Universally Approximated Certified Robustness via Randomized Smoothing — ECCV, 2022.

Contact

Email: hanbin.hong1@bytedance.com · LinkedIn · GitHub · Scholar