Image Upload

Functionality

This module uses an LLM to answer questions about an uploaded image. It uses the full conversation history (up to the model's token limit). It can also use a summary of the history instead, if a suitable tool definition is provided and the feature is enabled in the configuration.

Input

An image with a valid extension plus a user query. To help trigger this module, state in the query that you need information from the uploaded image.
Example: "Please describe the content of the uploaded image."

Output

Answer to the question based on the image and conversation history.

Reference in Code in AI Module Template

ImageUpload

Configuration settings

Default Configuration

json

{
    "languageModel": "AZURE_GPT_4o_2024_1120",
    "chatHistory": {
        "enabled": true,
        "maxMessages": 10,
        "useModelTokenLimit": true,
        "defaultTokenLimit": 4096,
        "percentOfMaxTokens": 0.5
    },
    "supportedExtensions": [
        ".jpg",
        ".jpe",
        ".jpeg",
        ".png",
        ".gif",
        ".bmp"
    ]
}

General parameters

Parameter	Description	Type	Default
`languageModel`	Language model to use	`string`	`AZURE_GPT_4o_2024_1120`
`chatHistory`	Defines which part of the history is included in the LLM call	`object`	See the chatHistory section
`supportedExtensions`	List of supported image extensions	`list[string]`	`[".jpg",".jpe",".jpeg",".png",".gif",".bmp"]`
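
As an illustration of how `supportedExtensions` might be applied, the following minimal sketch checks a file name against the default list. The helper name `is_supported_image` and the case-insensitive comparison are assumptions made for this example, not part of the module.

```python
from pathlib import Path

# Mirrors the default "supportedExtensions" list from the configuration.
SUPPORTED_EXTENSIONS = [".jpg", ".jpe", ".jpeg", ".png", ".gif", ".bmp"]


def is_supported_image(file_name: str, supported=SUPPORTED_EXTENSIONS) -> bool:
    """Return True if the file name ends in one of the supported extensions.

    Hypothetical helper: case-insensitive matching is an assumption.
    """
    allowed = {ext.lower() for ext in supported}
    return Path(file_name).suffix.lower() in allowed


if __name__ == "__main__":
    for name in ("diagram.PNG", "scan.jpeg", "report.pdf"):
        print(name, is_supported_image(name))
```

A file without an extension, or with one outside the list, is rejected by this check.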

chatHistory

Parameter	Description	Type	Default
`enabled`	Whether to use the full conversation history. If `false`, the module attempts to use the conversation summary from the tool call.	`bool`	`true`
`maxMessages`	Maximum number of history messages to consider	`integer`	`10`
`useModelTokenLimit`	Whether to use the model's token limit	`bool`	`true`
`defaultTokenLimit`	Token limit used if the model's token limit is not available	`integer`	`4096`
`percentOfMaxTokens`	Upper limit of tokens to keep from history, as a fraction of the token limit	`float`	`0.5`
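
The page does not spell out how these parameters interact, so the following is only one plausible reading, sketched under stated assumptions: the token budget is the model's limit (or `defaultTokenLimit` as a fallback) multiplied by `percentOfMaxTokens`, and the most recent messages are kept until either `maxMessages` or the budget is reached. The function names and the whitespace-based token counter are hypothetical stand-ins.

```python
from typing import Callable, List, Optional


def history_token_budget(chat_history: dict, model_token_limit: Optional[int]) -> int:
    """Compute the token budget for history (assumed: limit * percentOfMaxTokens)."""
    if chat_history["useModelTokenLimit"] and model_token_limit is not None:
        limit = model_token_limit
    else:
        # Fall back when the model limit is unknown or not to be used.
        limit = chat_history["defaultTokenLimit"]
    return int(limit * chat_history["percentOfMaxTokens"])


def select_history(
    messages: List[str],
    chat_history: dict,
    model_token_limit: Optional[int],
    count_tokens: Callable[[str], int],
) -> List[str]:
    """Keep the newest messages that fit within maxMessages and the token budget."""
    if not chat_history["enabled"]:
        return []  # assumed: the summary from the tool call is used instead
    max_messages = chat_history["maxMessages"]
    recent = messages[-max_messages:] if max_messages > 0 else []
    budget = history_token_budget(chat_history, model_token_limit)
    kept, used = [], 0
    for message in reversed(recent):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    config = {
        "enabled": True,
        "maxMessages": 10,
        "useModelTokenLimit": True,
        "defaultTokenLimit": 4096,
        "percentOfMaxTokens": 0.5,
    }
    # Whitespace splitting is a crude stand-in for a real tokenizer.
    def word_count(text: str) -> int:
        return len(text.split())

    print(history_token_budget(config, None))
    print(select_history(["hello there", "what is in the image?"], config, None, word_count))
```

With the default configuration and no known model limit, the budget in this sketch is half of `defaultTokenLimit`.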

Tool Definition

json

{
    "type": "function",
    "function": {
      "name": "ImageUpload",
      "parameters": {
        "type": "object",
        "required": [
          "query",
          "conversation_history"
        ],
        "properties": {
          "query": {
            "type": "string",
            "description": "Smart query based on the user request but augmented with the context of conversation if needed."
          },
          "image_name": {
            "type": "string",
            "description": "Name of the image file to be analyzed [Optional]."
          },
          "conversation_history": {
            "type": "string",
            "description": "Summary of the history of the conversation to provide context for the query."
          }
        }
      },
      "description": "This tool is designed to analyze, extract, and summarize information from images. It should be invoked whenever the user inquires about images."
    }
}
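
To make the schema concrete, here is a minimal sketch of checking the arguments of an `ImageUpload` tool call against the definition above. It assumes the arguments arrive as a JSON string, which is common for function-style tool calls but is not confirmed by this page; the function name `parse_image_upload_arguments` and the sample payload are hypothetical.

```python
import json

# Field names taken from the tool definition above.
REQUIRED_FIELDS = ("query", "conversation_history")
OPTIONAL_FIELDS = ("image_name",)


def parse_image_upload_arguments(raw_arguments: str) -> dict:
    """Parse tool-call arguments and check them against the schema.

    Hypothetical helper: the module's real validation may differ.
    """
    arguments = json.loads(raw_arguments)
    missing = [name for name in REQUIRED_FIELDS if name not in arguments]
    if missing:
        raise ValueError(f"Missing required argument(s): {', '.join(missing)}")
    for name in REQUIRED_FIELDS + OPTIONAL_FIELDS:
        # Every property in the schema is declared as a string.
        if name in arguments and not isinstance(arguments[name], str):
            raise TypeError(f"Argument '{name}' must be a string")
    return arguments


if __name__ == "__main__":
    sample = json.dumps(
        {
            "query": "Describe the chart in the uploaded image.",
            "image_name": "chart.png",
            "conversation_history": "The user uploaded a sales chart and asked about trends.",
        }
    )
    print(parse_image_upload_arguments(sample)["query"])
```

Because `conversation_history` is required, a call that omits it is rejected here, which matches its role as the summary used when full chat history is disabled.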