# RAG chat: Using GPT vision model with RAG approach

This repository includes an optional feature that uses the GPT vision model to generate responses based on retrieved content. This feature is useful for answering questions based on the visual content of documents, such as photos and charts.

## How it works

When this feature is enabled, the following changes are made to the application:

* **Search index**: We added a new field to the Azure AI Search index to store the embedding returned by the multimodal Azure AI Vision API (while keeping the existing field that stores the OpenAI text embeddings).
* **Data ingestion**: In addition to the usual PDF ingestion flow, we also convert each PDF document page to an image, store that image with the filename rendered on top, and add its embedding to the index.
* **Question answering**: We search the index using both the text and multimodal embeddings (see the sketch below). We send both the text and the image to gpt-4o, and ask it to answer the question based on both kinds of sources.
* **Citations**: The frontend displays both image sources and text sources, to help users understand how the answer was generated.

For more details on how this feature works, read this blog post or watch this video.
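
To make the question-answering step above concrete, here is a minimal Python sketch of the kind of hybrid query that searches both embedding fields in one request. This is a sketch under assumptions, not the app's actual code: the endpoint, key, index name, and the field names `embedding` and `sourcepage` are placeholders (only `imageEmbedding` is named in this document), and it assumes the question has already been embedded twice, once with the OpenAI text embeddings API and once with the Azure AI Vision multimodal embeddings API.

```python
# Minimal sketch: hybrid search over both the text and image embedding fields.
# Field names, endpoint, key, and index name are assumptions/placeholders;
# adjust them to match your deployment's index schema.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",
    index_name="<your-index>",
    credential=AzureKeyCredential("<your-search-key>"),
)

question = "What does the chart about revenue growth show?"
text_vector = [...]   # text embedding of the question (OpenAI embeddings API)
image_vector = [...]  # multimodal embedding of the question (Azure AI Vision API)

results = search_client.search(
    search_text=question,  # keyword part of the hybrid query
    vector_queries=[
        VectorizedQuery(vector=text_vector, k_nearest_neighbors=3, fields="embedding"),
        VectorizedQuery(vector=image_vector, k_nearest_neighbors=3, fields="imageEmbedding"),
    ],
    top=3,
)

for doc in results:
    # Each hit points at a text chunk and its rendered page image; the app
    # passes both kinds of sources to gpt-4o when generating the answer.
    print(doc["sourcepage"])
```

The search service merges the results of the keyword query and both vector queries, so a single request can surface content that matches either the text embedding or the image embedding of the question.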

## Using the feature

### Prerequisites

### Deployment

1. **Enable GPT vision approach:**

    First, make sure you do not have integrated vectorization enabled, since it is currently incompatible with this feature:

    ```shell
    azd env set USE_FEATURE_INT_VECTORIZATION false
    ```

    Then set the environment variable that enables vision support:

    ```shell
    azd env set USE_GPT4V true
    ```

    When set, that flag will provision an Azure AI Vision resource and a gpt-4o model, upload image versions of the PDFs to Blob storage, upload embeddings of those images to a new `imageEmbedding` field in the index, and enable the vision approach in the UI.

2. **Clean old deployments (optional):** Run `azd down --purge` for a fresh setup.

3. **Start the application:** Execute `azd up` to build, provision, deploy, and initiate document preparation. (An optional verification sketch follows these steps.)

4. **Try out the feature:** *(GPT4V configuration screenshot)*

    * Access the developer options in the web app and select "Use GPT vision model".
    * New sample questions based on the sample financial document will show up in the UI.
    * Try out a question and see the answer generated by the GPT vision model.
    * Check the "Thought process" and "Supporting content" tabs.
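
If you want to confirm that the vision-specific pieces landed in the search index after `azd up`, a small sketch like the one below can help. The endpoint, admin key, and index name are placeholders; take the real values from your `azd` environment (`azd env get-values`). It simply checks that the `imageEmbedding` field described above exists.

```python
# Verification sketch (optional): check that the deployed search index contains
# the imageEmbedding field created when USE_GPT4V is enabled. Endpoint, key,
# and index name are placeholders taken from your azd environment.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient

index_client = SearchIndexClient(
    endpoint="https://<your-search-service>.search.windows.net",
    credential=AzureKeyCredential("<your-search-admin-key>"),
)

index = index_client.get_index("<your-index>")
field_names = [field.name for field in index.fields]
print("imageEmbedding present:", "imageEmbedding" in field_names)
```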