Joanna Hong

AI Research Scientist
joanna2587@kaist.ac.kr

About Me

I am a Ph.D. graduate of the Integrated Vision Language Lab, where I worked under the supervision of Professor Yong Man Ro in the School of Electrical Engineering at the Korea Advanced Institute of Science and Technology (KAIST). I received a B.S. in Electrical Engineering from KAIST in 2019.

My main research areas are multi-modal representation learning and human interactive learning. I focus on integrating audio, video, and text modalities in human dialogue systems, especially lipreading, lip-to-speech synthesis, and audio-visual speech recognition. My research interests also extend to machine translation, speech enhancement, and speech separation using multi-modal representations.

I am proud to have received the Best Dissertation Award in the School of Electrical Engineering. My thesis focuses on human speech understanding through multimodal representation learning.

Here is my Curriculum Vitae for more information about me.


Work Experience

June 2023 - March 2024
Meta

Research Scientist Intern

  • Worked in the Meta Reality Labs Research Audio team under Anurag Kumar.
  • Conducted research on robust multi-modal audiovisual speech representation learning.
Sept 2017 - Feb 2018
Koh Young

AI Research Intern

  • Adjusted control parameters of screen printers using optimization algorithms based on machine learning.

Publications

Conferences

(* indicates equal contribution)

  • Intuitive Multilingual Audio-Visual Speech Recognition with a Single-Trained Model [link]
  • Joanna Hong, Se Jin Park, and Yong Man Ro
    Findings of the Association for Computational Linguistics: EMNLP, 2023

  • DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding [link]
  • Jeongsoo Choi*, Joanna Hong*, and Yong Man Ro
    IEEE/CVF International Conference on Computer Vision (ICCV), 2023

  • Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring [link]
  • Joanna Hong*, Minsu Kim*, Jeongsoo Choi, and Yong Man Ro
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

  • Lip-to-Speech Synthesis in the Wild with Multi-task Learning [link]
  • Minsu Kim*, Joanna Hong*, and Yong Man Ro
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

  • VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection [link]
  • Joanna Hong*, Minsu Kim, and Yong Man Ro
    European Conference on Computer Vision (ECCV), 2022

  • Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition [link]
  • Minsu Kim*, Joanna Hong*, Daehun Yoo, and Yong Man Ro
    Interspeech, 2022 (Oral)

  • SyncTalkFace: Talking Face Generation with Precise Lip-syncing via Audio-Lip Memory [link]
  • Se Jin Park, Minsu Kim, Joanna Hong, Jeongsoo Choi, and Yong Man Ro
    AAAI Conference on Artificial Intelligence (AAAI), 2022 (Oral)

  • Lip to Speech Synthesis with Visual Context Attentional GAN [link]
  • Minsu Kim, Joanna Hong, and Yong Man Ro
    Conference on Neural Information Processing Systems (NeurIPS), 2021

  • Multi-Modality Associative Bridging Through Memory: Speech Sound Recollected From Face Video [link]
  • Minsu Kim*, Joanna Hong*, Se Jin Park, and Yong Man Ro
    IEEE/CVF International Conference on Computer Vision (ICCV), 2021

  • Unsupervised Disentangling of Viewpoint and Residues Variations by Substituting Representations for Robust Face Recognition [link]
  • Minsu Kim, Joanna Hong, Junho Kim, Hong Joo Lee, and Yong Man Ro
    International Conference on Pattern Recognition (ICPR), 2021

  • Comprehensive Facial Expression Synthesis Using Human-Interpretable Language [link]
  • Joanna Hong, Jung Uk Kim, Sangmin Lee, and Yong Man Ro
    IEEE International Conference on Image Processing (ICIP), 2020

  • Face Tells Detailed Expression: Generating Comprehensive Facial Expression Sentence Through Facial Action Units [link]
  • Joanna Hong, Hong Joo Lee, Yelin Kim, and Yong Man Ro
    International Conference on Multimedia Modeling (MMM), 2020

Journals

  • Speech Reconstruction with Reminiscent Sound via Visual Voice Memory [link]
  • Joanna Hong, Minsu Kim, Se Jin Park, and Yong Man Ro
    IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2021

  • CroMM-VSR: Cross-Modal Memory Augmented Visual Speech Recognition [link]
  • Minsu Kim, Joanna Hong, Se Jin Park, and Yong Man Ro
    IEEE Transactions on Multimedia (TMM), 2021

Education

  • Ph.D. in Electrical Engineering
    Korea Advanced Institute of Science and Technology (KAIST)
    Feb 2020 - Feb 2024
  • M.S. in Electrical Engineering
    Korea Advanced Institute of Science and Technology (KAIST)
    Feb 2019 - Feb 2020
  • B.S. in Electrical Engineering
    Korea Advanced Institute of Science and Technology (KAIST)
    Sept 2014 - Feb 2019

Research Interests

  • Multi-modal
  • Audiovisual
  • Speech
  • Human Interactive AI
  • Machine Learning
  • Deep Learning

Honors

  • Best Ph.D. Dissertation Award (2024)
  • Representative of Image and Video Systems Lab. (2023)
  • Outstanding Teaching Assistant Award (2021)
  • National Government Fellowship (2014 - 2024)


Reviews

Conferences
  • CVPR (2024)
  • ECCV (2024)
  • ICASSP (2024)
  • ICLR (2024)
  • ICCV (2023)
  • ICML (2021, 2022, 2023, 2024)
  • NeurIPS (2020, 2022, 2023)
Journals
  • IEEE TASLP (2024)
  • IEEE TCSVT (2023, 2024)
  • IEEE TMM (2023)


Teaching

  • EE474 Introduction to Multimedia (2020, 2021, 2022, 2023)
  • EE837 Multimedia Processing and Learning (2022)
  • EE534 Pattern Recognition (2021)


Tech Stack

  • PyTorch
  • Python
  • TensorFlow
  • MATLAB

Languages

  • Korean
  • English