NCJRS Virtual Library


Exploring CLIP for Real World, Text-based Image Retrieval

NCJ Number: 309744
Author(s): Manal Sultan; Lia Jacobs; Abby Stylianou; Robert Pless
Date Published: September 2023
Length: 6 pages
Annotation

In this paper, researchers explore using Contrastive Language-Image Pretraining (CLIP) features for text-based image retrieval.

Abstract

In this paper, researchers consider the ability of CLIP features to support text-driven image retrieval. They find that there is a sweet spot of detail in the query text that gives the best results, and that words describing the "tone" of a scene (such as messy or dingy) are quite important in maximizing text-image similarity. Traditional image-based queries sometimes misalign with user intentions because they focus on irrelevant image components. To overcome this, the researchers explore the potential of text-based image retrieval, specifically using Contrastive Language-Image Pretraining (CLIP) models. CLIP models, trained on large datasets of image-caption pairs, offer a promising approach because they allow natural language descriptions to serve as more targeted queries. The authors explore the effectiveness of text-driven image retrieval based on CLIP features by evaluating image similarity for progressively more detailed queries. (Published Abstract Provided)
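Text-driven retrieval with CLIP-style features is usually framed as ranking images by the cosine similarity between one text embedding and each image embedding. The sketch below assumes the embeddings have already been produced by a CLIP text encoder and image encoder (not shown). The function names `rank_images` and `best_query_index`, the image IDs, and the 3-dimensional toy vectors are hypothetical illustrations, not the authors' code or data; real CLIP embeddings have hundreds of dimensions.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product of the two unit-normalized vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def rank_images(
    text_embedding: Sequence[float],
    image_embeddings: Dict[str, Sequence[float]],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: rank image IDs by similarity to one text query."""
    scores = [
        (image_id, cosine_similarity(text_embedding, emb))
        for image_id, emb in image_embeddings.items()
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)  # best match first
    return scores if top_k is None else scores[:top_k]


def best_query_index(
    image_embedding: Sequence[float],
    query_embeddings: Sequence[Sequence[float]],
) -> int:
    """Hypothetical helper: given queries ordered from least to most detailed,
    return the index of the one most similar to a fixed target image."""
    sims = [cosine_similarity(q, image_embedding) for q in query_embeddings]
    return max(range(len(sims)), key=sims.__getitem__)


if __name__ == "__main__":
    # Toy 3-d vectors standing in for CLIP image embeddings (hypothetical).
    images = {
        "hotel_room_a": [0.9, 0.1, 0.0],
        "hotel_room_b": [0.2, 0.9, 0.1],
        "street_scene": [0.0, 0.1, 0.9],
    }
    # Stands in for the embedding of a text query such as "a messy, dingy hotel room".
    query = [0.8, 0.3, 0.0]
    for image_id, score in rank_images(query, images):
        print(f"{image_id}: {score:.3f}")
```

Normalizing both sides keeps the score independent of embedding magnitude, which is why CLIP compares image and text features by cosine similarity. Calling `best_query_index` on one target image with embeddings of increasingly detailed descriptions is one simple way to look for the kind of "sweet spot" of detail the abstract describes.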