Image Upload


Functionality

This module uses an LLM to answer questions about an uploaded image. By default it uses the full conversation history (up to the model's token limit). It can use a summary of the history instead, provided the correct tool definition is supplied and the functionality is enabled in the configuration.

Input

An image with a valid extension plus a user query. To help trigger this module, state in the query that you need information from the uploaded image.
Example: "Please describe the content of the uploaded image."

Output

Answer to the question based on the image and conversation history.

Code Reference in the AI Module Template

ImageUpload

Configuration settings

Default Configuration

```json
{
    "languageModel": "AZURE_GPT_4o_2024_1120",
    "chatHistory": {
        "enabled": true,
        "maxMessages": 10,
        "useModelTokenLimit": true,
        "defaultTokenLimit": 4096,
        "percentOfMaxTokens": 0.5
    },
    "supportedExtensions": [
        ".jpg",
        ".jpe",
        ".jpeg",
        ".png",
        ".gif",
        ".bmp"
    ]
}
```
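
As a rough illustration of how the `supportedExtensions` list might be applied, the sketch below checks a file name against the default extensions. The function name `is_supported_image` is hypothetical, and case-insensitive matching is an assumption; the module itself may behave differently.

```python
from pathlib import Path

# Mirrors the "supportedExtensions" default from the configuration.
SUPPORTED_EXTENSIONS = [".jpg", ".jpe", ".jpeg", ".png", ".gif", ".bmp"]


def is_supported_image(filename: str, supported=SUPPORTED_EXTENSIONS) -> bool:
    """Return True if the file's extension is in the supported list.

    Assumption: the comparison ignores case, so "photo.PNG" is accepted.
    """
    return Path(filename).suffix.lower() in supported


if __name__ == "__main__":
    for name in ["chart.png", "scan.JPEG", "report.pdf"]:
        print(name, is_supported_image(name))
```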

General parameters

| Parameter | Description | Type | Default |
| --- | --- | --- | --- |
| languageModel | GPT model used | string | AZURE_GPT_4o_2024_1120 |
| chatHistory | Defines which part of the history is included in the LLM call | object | See chatHistory below |
| supportedExtensions | List of supported image extensions | list[string] | [".jpg", ".jpe", ".jpeg", ".png", ".gif", ".bmp"] |

chatHistory

| Parameter | Description | Type | Default |
| --- | --- | --- | --- |
| enabled | Whether to use the full conversation history. If false, the module attempts to use the conversation summary from the tool call. | bool | true |
| maxMessages | Maximum number of history messages to consider | integer | 10 |
| useModelTokenLimit | Whether to use the model's token limit | bool | true |
| defaultTokenLimit | Token limit used when the model's token limit is not available | integer | 4096 |
| percentOfMaxTokens | Upper limit on the tokens kept from history, as a fraction of the token limit | float | 0.5 |
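
To make the interaction of these settings more concrete, here is a minimal sketch of one way they could be combined. It is not the module's implementation: the function `select_history`, the whitespace-based `count_tokens` stand-in, and the order in which the limits are applied (message count first, then token budget, newest messages first) are all assumptions. The configuration keys follow the Default Configuration JSON.

```python
# Keys mirror the "chatHistory" object of the Default Configuration.
DEFAULT_CHAT_HISTORY = {
    "enabled": True,
    "maxMessages": 10,
    "useModelTokenLimit": True,
    "defaultTokenLimit": 4096,
    "percentOfMaxTokens": 0.5,
}


def count_tokens(text: str) -> int:
    """Hypothetical stand-in tokenizer: counts whitespace-separated words."""
    return len(text.split())


def select_history(messages, config=DEFAULT_CHAT_HISTORY,
                   model_token_limit=None, counter=count_tokens):
    """Pick the history messages to send, under the assumed semantics.

    Returns an empty list when full history is disabled; the caller would
    then fall back to the summary supplied in the tool call.
    """
    if not config["enabled"]:
        return []

    # Use the model's limit when requested and known, else the default.
    if config["useModelTokenLimit"] and model_token_limit is not None:
        limit = model_token_limit
    else:
        limit = config["defaultTokenLimit"]
    budget = int(limit * config["percentOfMaxTokens"])

    # Keep at most maxMessages of the most recent messages.
    max_messages = config["maxMessages"]
    recent = messages[-max_messages:] if max_messages > 0 else []

    # Walk from newest to oldest until the token budget is exhausted.
    kept, used = [], 0
    for message in reversed(recent):
        cost = counter(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

With the defaults and no known model limit, the budget in this sketch is 4096 × 0.5 = 2048 tokens, spread over at most the 10 most recent messages.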

Tool Definition

```json
{
    "type": "function",
    "function": {
        "name": "ImageUpload",
        "parameters": {
            "type": "object",
            "required": [
                "query",
                "conversation_history"
            ],
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Smart query based on the user request but augmented with the context of conversation if needed."
                },
                "image_name": {
                    "type": "string",
                    "description": "Name of the image file to be analyzed [Optional]."
                },
                "conversation_history": {
                    "type": "string",
                    "description": "Summary of the history of the conversation to provide context for the query."
                }
            }
        },
        "description": "This tool is designed to analyze, extract, and summarize information from images. It should be invoked whenever the user inquires about images."
    }
}
```
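
For illustration, the sketch below shows how the arguments of an `ImageUpload` tool call could be parsed and checked against the definition above. The function name `parse_image_upload_arguments` is hypothetical, and the assumption that the arguments arrive as a JSON string reflects common function-calling APIs rather than anything stated in this document.

```python
import json

# Taken from the "required" and "properties" entries of the tool definition.
REQUIRED_ARGUMENTS = ["query", "conversation_history"]
KNOWN_ARGUMENTS = {"query": str, "image_name": str, "conversation_history": str}


def parse_image_upload_arguments(raw_arguments: str) -> dict:
    """Parse a JSON argument string and verify required fields and types."""
    arguments = json.loads(raw_arguments)
    if not isinstance(arguments, dict):
        raise ValueError("tool arguments must be a JSON object")

    missing = [name for name in REQUIRED_ARGUMENTS if name not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")

    for name, value in arguments.items():
        expected = KNOWN_ARGUMENTS.get(name)
        if expected is not None and not isinstance(value, expected):
            raise ValueError(f"argument {name!r} must be a string")
    return arguments


if __name__ == "__main__":
    raw = json.dumps({
        "query": "Describe the chart in the uploaded image.",
        "image_name": "chart.png",  # optional
        "conversation_history": "The user uploaded a sales chart.",
    })
    print(parse_image_upload_arguments(raw)["query"])
```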