KAIST VICLab

Academic website of KAIST VICLab, advised by Prof. Munchurl Kim, Korea Advanced Institute of Science & Technology (KAIST), Korea.

Our research interests include deep-learning-based computer vision, computational image & video processing, image & video understanding, and 2D/3D video coding.

Email / Homepage / Contact / GitHub

Research

Our recent work focuses on computer vision research:

[1] in the fields of natural image and video restoration: (1) super-resolution, (2) frame interpolation, (3) SDR-to-HDR inverse tone mapping, (4) image inpainting, (5) depth estimation, (6) image deraining, (7) image dehazing, (8) video motion deblurring, (9) generative restoration of old photos;

[2] in the fields of 3D image/video reconstruction: (1) depth estimation, (2) optical flow estimation, (3) camera pose estimation, (4) dynamic neural radiance field (NeRF) and Gaussian splatting learning of video for novel view synthesis;

[3] in the fields of satellite images: (1) PAN sharpening, super-resolution and cloud removal of Electro-Optical (EO) images, (2) super-resolution, detection and classification of Synthetic Aperture Radar (SAR) image targets, (3) SAR-to-EO image-to-image translation learning, etc.

Some papers are highlighted.

	Bridging the Skeleton-Text Modality Gap: Diffusion-Powered Modality Alignment for Zero-shot Skeleton-based Action Recognition Jeonghyeok Do, Munchurl Kim ICCV, 2025 project page / arXiv TDSM introduces the first framework that applies diffusion models to implicitly align skeleton features with text prompts (action labels). It takes full advantage of the excellent text-image correspondence learning in the generative diffusion process, and can thus learn fused discriminative features in a unified latent space.
	PAN-Crafter: Learning Modality-Consistent Alignment for PAN-Sharpening Jeonghyeok Do, Sungpyo Kim, Geunhyuk Youk, Jaehyup Lee, Munchurl Kim ICCV, 2025 project page / arXiv PAN-Crafter proposes Modality-Adaptive Reconstruction (MARs), a unified reconstruction framework that enables robust learning from misaligned PAN-MS image pairs by dynamically generating both HRMS and PAN images.
	SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video Jongmin Park, Minh-Quan Viet Bui, Juan Luis Gonzalez Bello, Jaeho Moon, Jihyong Oh, Munchurl Kim CVPR, 2025 project page / arXiv COLMAP-free dynamic 3D Gaussian Splatting (3DGS) framework for high-quality reconstruction and fast rendering from monocular videos.
	ABBSPO: Adaptive Bounding Box Scaling and Symmetric Prior based Orientation Prediction for Detecting Aerial Image Objects Woojin Lee, Hyugjae Chang, Jaeho Moon, Jaehyup Lee, Munchurl Kim CVPR, 2025 project page / arXiv TBD.
	BiM-VFI: Bidirectional Motion Fields-Guided Frame Interpolation for Video with Non-uniform Motions Wonyong Seo, Jihyong Oh, Munchurl Kim CVPR, 2025 project page / arXiv TBD.
	U-Know-Diff-PAN: Uncertainty-aware Knowledge Distillation Diffusion Framework with Details Enhancement for PAN-Sharpening Sungpyo Kim, Jeonghyeok Do, Jaehyup Lee, Munchurl Kim CVPR, 2025 project page / arXiv TBD.
	MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting Sangwoon Kwak, Joonsoo Kim, Jun Young Jeong, Won-Sik Cheong, Jihyong Oh, Munchurl Kim CVPR, 2025 project page / arXiv TBD.
	MIVE: New Design and Benchmark for Multi-Instance Video Editing Samuel Teodoro, Agus Gunawan, Soo Ye Kim, Jihyong Oh, Munchurl Kim arXiv, 2024 project page / arXiv TBD.
	Diffusion-based Data Augmentation and Knowledge Distillation with Generated Soft Labels Solving Data Scarcity Problems of SAR Oil Spill Segmentation Jaeho Moon, Jeonghwan Yun, Jaehyun Kim*, Jaehyup Lee, Munchurl Kim arXiv, 2024 project page / arXiv TBD.
	C-DiffSET: Leveraging Latent Diffusion for SAR-to-EO Image Translation with Confidence-Guided Reliable Object Generation Jeonghyeok Do, Jaehyup Lee, Munchurl Kim arXiv, 2024 project page / arXiv C-DiffSET proposes the first framework to fine-tune a pretrained latent diffusion model (LDM) for SAR-to-EO image translation (SET) tasks, effectively leveraging its learned representations to overcome the scarcity of SAR-EO image pairs.
	SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition Jeonghyeok Do, Munchurl Kim ECCV, 2024 project page / arXiv SkateFormer proposes a partition-specific attention strategy (Skate-MSA) for skeleton-based action recognition that captures skeletal-temporal relations and reduces computational complexity.
	FMA-Net: Flow-Guided Dynamic Filtering and Iterative Feature Refinement with Multi-Attention for Joint Video Super-Resolution and Deblurring Geunhyuk Youk, Jihyong Oh, Munchurl Kim CVPR, 2024 (Oral Presentation) project page / arXiv TBD.
	From-Ground-To-Objects: Coarse-to-Fine Self-supervised Monocular Depth Estimation of Dynamic Objects with Ground Contact Prior Jaeho Moon, Juan Luis Gonzalez Bello, Byeongjun Kwon, Munchurl Kim CVPR, 2024 project page / arXiv Addresses the dynamic-object problem in self-supervised monocular depth estimation using a Ground Contact Prior.
	Novel View Synthesis with View-Dependent Effects from a Single Image Juan Luis Gonzalez Bello, Munchurl Kim CVPR, 2024 project page / arXiv TBD.
	DyBluRF: Dynamic Deblurring Neural Radiance Fields for Blurry Monocular Video Minh-Quan Viet Bui, Jongmin Park, Jihyong Oh, Munchurl Kim arXiv, 2023 project page / arXiv Dynamic deblurring NeRF framework for reconstructing dynamic scenes from blurry monocular video.
	ProNeRF: Learning Efficient Projection-Aware Ray Sampling for Fine-Grained Implicit Neural Radiance Fields Juan Luis Gonzalez Bello, Minh-Quan Viet Bui, Munchurl Kim IEEE Access project page / arXiv Efficient NeRF framework for fine-grained 3D scene reconstruction with few sampling points via projection-aware ray sampling.
	COMPASS: High-Efficiency Deep Image Compression with Arbitrary-scale Spatial Scalability Jongmin Park, Jooyoung Lee, Munchurl Kim ICCV, 2023 project page / arXiv The first NN-based spatially scalable image compression method to support arbitrary-scale spatial scalability.

This website's source code is borrowed from Jon Barron's website.