The AIDocumentLibraryChat project has been extended to include an image database that can be queried for images. It uses the LLava model of Ollama, which can analyze images. The image search uses embeddings with the PGVector extension of PostgreSQL.
Architecture
The AIDocumentLibraryChat project has this architecture:
The Angular front-end shows the upload and question features to the user. The Spring AI backend adjusts the model's image size, uses the database to store the data/vectors, and creates the image descriptions with the LLava model of Ollama.
The flow of image upload/analysis/storage looks like this:
The image is uploaded with the front-end. The back-end resizes it to a format the LLava model can process. The LLava model then generates a description of the image based on the provided prompt. The resized image and the metadata are stored in a relational table of PostgreSQL. The image description is then used to create embeddings. The embeddings are stored with the description in the PGVector database, with metadata to find the corresponding row in the PostgreSQL table. Then the image description and the resized image are shown in the front-end.
The flow of image questions looks like this:
The user can enter the question in the front-end. The backend converts the question to embeddings and searches the PGVector database for the nearest entries. An entry has the row ID of the image table with the image and the metadata. The image table data is queried, combined with the description, and shown to the user.
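Conceptually, the nearest-entry lookup works like the following sketch, which compares cosine distances between small toy vectors in plain Java. In the project, PGVector performs this search over the stored embeddings; the class, record, and vector values here are purely illustrative:

```java
import java.util.Comparator;
import java.util.List;

public class NearestNeighborSketch {

    // Cosine distance between two embedding vectors: 1 - cosine similarity.
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy "embeddings" standing in for the stored image descriptions.
        record Entry(String rowId, double[] embedding) {}
        var entries = List.of(
            new Entry("image-1", new double[] {0.9, 0.1, 0.0}),
            new Entry("image-2", new double[] {0.1, 0.9, 0.2}));
        double[] question = {0.8, 0.2, 0.1};

        // Pick the entry with the smallest cosine distance to the question vector.
        var nearest = entries.stream()
            .min(Comparator.comparingDouble(
                (Entry entry) -> cosineDistance(question, entry.embedding())))
            .orElseThrow();
        System.out.println(nearest.rowId()); // prints "image-1"
    }
}
```

The row ID returned by the distance search is then used to load the image row from the relational table, which is exactly the role of the metadata ID stored with each embedding.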
Backend
To run the PGVector database and the Ollama framework, the files runPostgresql.sh and runOllama.sh contain the Docker commands.
The backend needs these entries in application-ollama.properties:
# image processing
spring.ai.ollama.chat.model=llava:34b-v1.6-q6_K
spring.ai.ollama.chat.options.num-thread=8
spring.ai.ollama.chat.options.keep_alive=1s
The application needs to be built with Ollama support (property: 'useOllama') and started with the 'ollama' profile, and these properties must be activated to enable the LLava model and set a useful keep_alive. The num-thread setting is only needed if Ollama does not pick the right amount automatically.
The Controller
The ImageController contains the endpoints:
@RestController
@RequestMapping("rest/image")
public class ImageController {
  ...

  @PostMapping("/query")
  public List<ImageDto> postImageQuery(@RequestParam("query") String
      query, @RequestParam("type") String type) {
    var result = this.imageService.queryImage(query);
    return result;
  }

  @PostMapping("/import")
  public ImageDto postImportImage(@RequestParam("query") String query,
      @RequestParam("type") String type,
      @RequestParam("file") MultipartFile imageQuery) {
    var result =
      this.imageService.importImage(this.imageMapper.map(imageQuery, query),
        this.imageMapper.map(imageQuery));
    return result;
  }
}
The query endpoint contains the 'postImageQuery(…)' method that receives a form with the query and the image type and calls the ImageService to handle the request.
The import endpoint contains the 'postImportImage(…)' method that receives a form with the query (prompt), the image type, and the file. The ImageMapper converts the form to the ImageQueryDto and the Image entity and calls the ImageService to handle the request.
The Service
The ImageService looks like this:
@Service
@Transactional
public class ImageService {
  ...

  public ImageDto importImage(ImageQueryDto imageDto, Image image) {
    var resultData = this.createAIResult(imageDto);
    image.setImageContent(resultData.imageQueryDto().getImageContent());
    var myImage = this.imageRepository.save(image);
    var aiDocument = new Document(resultData.answer());
    aiDocument.getMetadata().put(MetaData.ID, myImage.getId().toString());
    aiDocument.getMetadata().put(MetaData.DATATYPE,
      MetaData.DataType.IMAGE.toString());
    this.documentVsRepository.add(List.of(aiDocument));
    return new ImageDto(resultData.answer(),
      Base64.getEncoder().encodeToString(resultData.imageQueryDto()
        .getImageContent()), resultData.imageQueryDto().getImageType());
  }

  public List<ImageDto> queryImage(String imageQuery) {
    var aiDocuments = this.documentVsRepository.retrieve(imageQuery,
        MetaData.DataType.IMAGE, this.resultSize.intValue())
      .stream().filter(myDoc -> myDoc.getMetadata()
        .get(MetaData.DATATYPE).equals(DataType.IMAGE.toString()))
      .sorted((myDocA, myDocB) ->
        ((Float) myDocA.getMetadata().get(MetaData.DISTANCE))
          .compareTo(((Float) myDocB.getMetadata().get(MetaData.DISTANCE))))
      .toList();
    var imageMap = this.imageRepository.findAllById(
        aiDocuments.stream().map(myDoc ->
          (String) myDoc.getMetadata().get(MetaData.ID)).map(myUuid ->
          UUID.fromString(myUuid)).toList())
      .stream().collect(Collectors.toMap(myDoc -> myDoc.getId(),
        myDoc -> myDoc));
    return imageMap.entrySet().stream().map(myEntry ->
        createImageContainer(aiDocuments, myEntry))
      .sorted((containerA, containerB) ->
        containerA.distance().compareTo(containerB.distance()))
      .map(myContainer -> new ImageDto(myContainer.document().getContent(),
        Base64.getEncoder().encodeToString(
          myContainer.image().getImageContent()),
        myContainer.image().getImageType())).limit(this.resultSize)
      .toList();
  }

  private ImageContainer createImageContainer(List<Document> aiDocuments,
      Entry<UUID, Image> myEntry) {
    return new ImageContainer(
      createIdFilteredStream(aiDocuments, myEntry)
        .findFirst().orElseThrow(),
      myEntry.getValue(),
      createIdFilteredStream(aiDocuments, myEntry).map(myDoc ->
        (Float) myDoc.getMetadata().get(MetaData.DISTANCE))
        .findFirst().orElseThrow());
  }

  private Stream<Document> createIdFilteredStream(List<Document> aiDocuments,
      Entry<UUID, Image> myEntry) {
    return aiDocuments.stream().filter(myDoc -> myEntry.getKey().toString()
      .equals((String) myDoc.getMetadata().get(MetaData.ID)));
  }

  private ResultData createAIResult(ImageQueryDto imageDto) {
    if (ImageType.JPEG.equals(imageDto.getImageType()) ||
        ImageType.PNG.equals(imageDto.getImageType())) {
      imageDto = this.resizeImage(imageDto);
    }
    var prompt = new Prompt(new UserMessage(imageDto.getQuery(),
      List.of(new Media(MimeType.valueOf(imageDto.getImageType()
        .getMediaType()), imageDto.getImageContent()))));
    var response = this.chatClient.call(prompt);
    var resultData = new
      ResultData(response.getResult().getOutput().getContent(), imageDto);
    return resultData;
  }

  private ImageQueryDto resizeImage(ImageQueryDto imageDto) {
    ...
  }
}
In the 'importImage(…)' method, the method 'createAIResult(…)' is called. It checks the image type and calls the 'resizeImage(…)' method to scale the image to a size that the LLava model supports. Then the Spring AI Prompt is created with the prompt text and the media with the image, media type, and the image byte array. Then the 'chatClient' calls the prompt, and the response is returned in the 'ResultData' record with the description and the resized image. Then the resized image is added to the image entity and the entity is persisted. Now the AI document is created with the embeddings, the description, and the image entity ID in the metadata. Then the ImageDto is created with the description, the resized image, and the image type and returned.
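The body of 'resizeImage(…)' is elided above. A minimal sketch of such aspect-ratio-preserving scaling with the JDK's java.awt classes could look like the following; the class name, the 672-pixel target side, and the use of java.awt instead of an imaging library are all assumptions, not the project's actual implementation:

```java
import java.awt.Graphics2D;
import java.awt.Image;
import java.awt.image.BufferedImage;

public class ResizeSketch {

    // Scale an image so its longest side does not exceed maxSide,
    // keeping the aspect ratio; small images pass through unchanged.
    static BufferedImage resize(BufferedImage source, int maxSide) {
        int width = source.getWidth();
        int height = source.getHeight();
        if (width <= maxSide && height <= maxSide) {
            return source; // already small enough for the model
        }
        double scale = (double) maxSide / Math.max(width, height);
        int newWidth = (int) Math.round(width * scale);
        int newHeight = (int) Math.round(height * scale);
        var target = new BufferedImage(newWidth, newHeight, BufferedImage.TYPE_INT_RGB);
        Graphics2D graphics = target.createGraphics();
        graphics.drawImage(source.getScaledInstance(newWidth, newHeight, Image.SCALE_SMOOTH),
            0, 0, null);
        graphics.dispose();
        return target;
    }

    public static void main(String[] args) {
        var source = new BufferedImage(4000, 2000, BufferedImage.TYPE_INT_RGB);
        var resized = resize(source, 672); // target side of 672 px is an assumption
        System.out.println(resized.getWidth() + "x" + resized.getHeight()); // prints "672x336"
    }
}
```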
In the 'queryImage(…)' method, the Spring AI documents with the lowest distances are retrieved and filtered for AI documents of image type in the metadata. The documents are then sorted for the lowest distance. Then the image entities with the metadata IDs of the Spring AI documents are loaded. That enables the creation of the ImageDtos with the matching documents and image entities. The image is provided as a Base64-encoded string. Together with the MediaType, that enables the simple display of the image in an IMG tag.
To display a Base64 PNG image you can use: ''
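The IMG tag markup itself is not preserved here. What the tag's 'src' attribute needs is a 'data:' URL, which can be built from the Base64 string the backend already returns; the class and method names in this sketch are illustrative, only the 'data:image/png;base64,' prefix is standard:

```java
import java.util.Base64;

public class DataUrlSketch {

    // Build a data URL that an <img src="..."> tag can display directly.
    static String toPngDataUrl(byte[] pngBytes) {
        return "data:image/png;base64," + Base64.getEncoder().encodeToString(pngBytes);
    }

    public static void main(String[] args) {
        byte[] fakePng = {(byte) 0x89, 'P', 'N', 'G'}; // stand-in for real PNG bytes
        System.out.println(toPngDataUrl(fakePng)); // prints "data:image/png;base64,iVBORw=="
    }
}
```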
Result
The UI result looks like this:
The application found the big airplane in the vector database using the embeddings. The second image was selected because of a similar sky. The search took only a fraction of a second.
Conclusion
The support of Spring AI for Ollama enables the use of the free LLava model. That makes the implementation of this image database easy. The LLava model generates good descriptions of the images that can be converted into embeddings for fast searching. Spring AI is missing support for the generate API endpoint; because of that, the parameter 'spring.ai.ollama.chat.options.keep_alive=1s' is required to avoid having old data in the context window. The LLava model needs GPU acceleration for productive use. LLava is only used on import, which means the creation of the descriptions can be done asynchronously. On a medium-powered laptop, the LLava model runs on the CPU and takes 5-10 minutes per image. Such a solution for image searching is a leap forward compared to previous implementations. With more GPU or CPU support for AI, such image search solutions will become much more popular.