Computer Vision

2024.04.09 · Paper Review

Paper link: https://arxiv.org/abs/2304.02643

Introduction

LLMs pre-trained on web-scale datasets have revolutionized NLP through strong zero-shot and few-shot generalization. Such foundation models can generalize to tasks and data distributions beyond those seen during training. In vision, there are foundation models such as CLIP and ALIGN, which train text and image encoders with contrastive learning that aligns the two modalities. The goal of this paper is to build a foundation model for image segmentation, and to that end, promptabl..
