
Image input

Send images to a Copilot session as attachments. There are two ways to attach images:

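  • File attachment (type: "file"): provide an absolute path; the runtime reads the file from disk, converts it to base64, and sends it to the LLM.
  • Blob attachment (type: "blob"): provide base64-encoded data directly; useful when the image is already in memory (for example, screenshots, generated images, or data from an API).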

Overview

Diagram: sequence diagram showing the process described on this page.

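Concept | Description
--- | ---
File attachment | An attachment with type: "file" and an absolute path to an image on disk
Blob attachment | An attachment with type: "blob", base64-encoded data, and a mimeType; no disk I/O required
Automatic encoding | For file attachments, the runtime reads the image and converts it to base64 automatically
Automatic resizing | The runtime automatically resizes images, or reduces their quality, when they exceed model-specific limits
Vision capability | The model must have capabilities.supports.vision = true to process images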

Quick start: file attachments

Use the file attachment type to attach an image file to any message. The path must be the absolute path to the image on disk.

TypeScript
import { CopilotClient } from "@github/copilot-sdk";

const client = new CopilotClient();
await client.start();

const session = await client.createSession({
    model: "gpt-4.1",
    onPermissionRequest: async () => ({ kind: "approve-once" }),
});

await session.send({
    prompt: "Describe what you see in this image",
    attachments: [
        {
            type: "file",
            path: "/absolute/path/to/screenshot.png",
        },
    ],
});
Python
import asyncio

from copilot import CopilotClient, PermissionDecisionApproveOnce


async def main():
    client = CopilotClient()
    await client.start()

    session = await client.create_session(
        on_permission_request=lambda req, inv: PermissionDecisionApproveOnce(),
        model="gpt-4.1",
    )

    await session.send(
        "Describe what you see in this image",
        attachments=[
            {
                "type": "file",
                "path": "/absolute/path/to/screenshot.png",
            },
        ],
    )


# Top-level await is not valid in a regular Python script, so run the coroutine explicitly.
asyncio.run(main())
Go
package main

import (
    "context"
    copilot "github.com/github/copilot-sdk/go"
    "github.com/github/copilot-sdk/go/rpc"
)

func main() {
    ctx := context.Background()
    client := copilot.NewClient(nil)
    client.Start(ctx)

    session, _ := client.CreateSession(ctx, &copilot.SessionConfig{
        Model: "gpt-4.1",
        OnPermissionRequest: func(req copilot.PermissionRequest, inv copilot.PermissionInvocation) (rpc.PermissionDecision, error) {
            return &rpc.PermissionDecisionApproveOnce{}, nil
        },
    })

    path := "/absolute/path/to/screenshot.png"
    session.Send(ctx, copilot.MessageOptions{
        Prompt: "Describe what you see in this image",
        Attachments: []copilot.Attachment{
            &copilot.UserMessageAttachmentFile{
                DisplayName: "screenshot.png",
                Path:        path,
            },
        },
    })
}
.NET
using GitHub.Copilot;
using GitHub.Copilot.Rpc;

public static class ImageInputExample
{
    public static async Task Main()
    {
        await using var client = new CopilotClient();
        await using var session = await client.CreateSessionAsync(new SessionConfig
        {
            Model = "gpt-4.1",
            OnPermissionRequest = (req, inv) =>
                Task.FromResult(PermissionDecision.ApproveOnce()),
        });

        await session.SendAsync(new MessageOptions
        {
            Prompt = "Describe what you see in this image",
            Attachments = new List<UserMessageAttachment>
            {
                new UserMessageAttachmentFile
                {
                    Path = "/absolute/path/to/screenshot.png",
                    DisplayName = "screenshot.png",
                },
            },
        });
    }
}
Java
import com.github.copilot.sdk.CopilotClient;
import com.github.copilot.sdk.events.*;
import com.github.copilot.sdk.json.*;
import java.util.List;

public class ImageInputExample {
    public static void main(String[] args) throws Exception {
        try (var client = new CopilotClient()) {
            client.start().get();

            var session = client.createSession(
                new SessionConfig()
                    .setModel("gpt-4.1")
                    .setOnPermissionRequest(PermissionHandler.APPROVE_ALL)
            ).get();

            session.send(new MessageOptions()
                .setPrompt("Describe what you see in this image")
                .setAttachments(List.of(
                    new Attachment("file", "/absolute/path/to/screenshot.png", "screenshot.png")
                ))
            ).get();
        }
    }
}
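Because the runtime reads file attachments from disk, the path must be absolute. If your application starts from a relative path, you can resolve it before building the attachment. The sketch below is a hypothetical helper (make_file_attachment is not an SDK function) that uses only the standard library and produces a dict in the same shape as the Python quick-start example.

```python
import os


def make_file_attachment(path: str) -> dict:
    """Return a file attachment dict with an absolute path.

    Hypothetical helper, not part of the SDK. The dict mirrors the shape used
    in the Python quick-start example: {"type": "file", "path": ...}.
    """
    # Resolve relative paths against the current working directory, because the
    # runtime reads the file from disk and expects an absolute path.
    absolute = os.path.abspath(path)
    # Fail early on the client side instead of sending a path that cannot be read.
    if not os.path.isfile(absolute):
        raise FileNotFoundError(absolute)
    return {"type": "file", "path": absolute}
```

You would then pass the result in the attachments list, for example attachments=[make_file_attachment("screenshots/latest.png")], where the relative path is only an illustration.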

Quick start: blob attachments

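If you already have image data in memory (for example, a screenshot captured by your app or an image fetched from an API), use a blob attachment to send it directly without writing it to disk.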

TypeScript
import { CopilotClient } from "@github/copilot-sdk";

const client = new CopilotClient();
await client.start();

const session = await client.createSession({
    model: "gpt-4.1",
    onPermissionRequest: async () => ({ kind: "approve-once" }),
});

const base64ImageData = "..."; // your base64-encoded image
await session.send({
    prompt: "Describe what you see in this image",
    attachments: [
        {
            type: "blob",
            data: base64ImageData,
            mimeType: "image/png",
            displayName: "screenshot.png",
        },
    ],
});
Python
import asyncio

from copilot import CopilotClient, PermissionDecisionApproveOnce


async def main():
    client = CopilotClient()
    await client.start()

    session = await client.create_session(
        on_permission_request=lambda req, inv: PermissionDecisionApproveOnce(),
        model="gpt-4.1",
    )

    base64_image_data = "..."  # your base64-encoded image
    await session.send(
        "Describe what you see in this image",
        attachments=[
            {
                "type": "blob",
                "data": base64_image_data,
                "mimeType": "image/png",
                "displayName": "screenshot.png",
            },
        ],
    )


# Top-level await is not valid in a regular Python script, so run the coroutine explicitly.
asyncio.run(main())
Go
package main

import (
    "context"
    copilot "github.com/github/copilot-sdk/go"
    "github.com/github/copilot-sdk/go/rpc"
)

func main() {
    ctx := context.Background()
    client := copilot.NewClient(nil)
    client.Start(ctx)

    session, _ := client.CreateSession(ctx, &copilot.SessionConfig{
        Model: "gpt-4.1",
        OnPermissionRequest: func(req copilot.PermissionRequest, inv copilot.PermissionInvocation) (rpc.PermissionDecision, error) {
            return &rpc.PermissionDecisionApproveOnce{}, nil
        },
    })

    base64ImageData := "..." // your base64-encoded image
    mimeType := "image/png"
    displayName := "screenshot.png"
    session.Send(ctx, copilot.MessageOptions{
        Prompt: "Describe what you see in this image",
        Attachments: []copilot.Attachment{
            &copilot.UserMessageAttachmentBlob{
                Data:        base64ImageData,
                MIMEType:    mimeType,
                DisplayName: &displayName,
            },
        },
    })
}
.NET
using GitHub.Copilot;
using GitHub.Copilot.Rpc;

public static class BlobAttachmentExample
{
    public static async Task Main()
    {
        await using var client = new CopilotClient();
        await using var session = await client.CreateSessionAsync(new SessionConfig
        {
            Model = "gpt-4.1",
            OnPermissionRequest = (req, inv) =>
                Task.FromResult(PermissionDecision.ApproveOnce()),
        });

        var base64ImageData = "..."; // your base64-encoded image
        await session.SendAsync(new MessageOptions
        {
            Prompt = "Describe what you see in this image",
            Attachments = new List<UserMessageAttachment>
            {
                new UserMessageAttachmentBlob
                {
                    Data = base64ImageData,
                    MimeType = "image/png",
                    DisplayName = "screenshot.png",
                },
            },
        });
    }
}
Java
import com.github.copilot.sdk.CopilotClient;
import com.github.copilot.sdk.events.*;
import com.github.copilot.sdk.json.*;
import java.util.List;

public class BlobAttachmentExample {
    public static void main(String[] args) throws Exception {
        try (var client = new CopilotClient()) {
            client.start().get();

            var session = client.createSession(
                new SessionConfig()
                    .setModel("gpt-4.1")
                    .setOnPermissionRequest(PermissionHandler.APPROVE_ALL)
            ).get();

            var base64ImageData = "..."; // your base64-encoded image
            session.send(new MessageOptions()
                .setPrompt("Describe what you see in this image")
                .setAttachments(List.of(
                    new BlobAttachment()
                        .setData(base64ImageData)
                        .setMimeType("image/png")
                        .setDisplayName("screenshot.png")
                ))
            ).get();
        }
    }
}
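The blob examples leave the base64 string as a placeholder. If your image is held as raw bytes, the standard library's base64 module produces the required string. The helper below, make_blob_attachment, is hypothetical (not an SDK function); its keys simply mirror the Python blob example.

```python
import base64


def make_blob_attachment(image_bytes: bytes, mime_type: str, display_name: str) -> dict:
    """Build a blob attachment dict from raw image bytes held in memory.

    Hypothetical helper: the keys mirror the Python blob example
    ("type", "data", "mimeType", "displayName").
    """
    # b64encode returns bytes; the attachment expects a base64 *string*.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "blob",
        "data": encoded,
        "mimeType": mime_type,
        "displayName": display_name,
    }
```

Decoding to ASCII is safe because base64 output only contains ASCII characters.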

Supported formats

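Supported image formats include JPG, PNG, GIF, and other common image types. For file attachments, the runtime reads the image from disk and converts it as needed. For blob attachments, you provide the base64 data and MIME type directly. Use PNG or JPEG for best results, because these are the most widely supported formats.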

The model's capabilities.limits.vision.supported_media_types field lists the exact MIME types it accepts.
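Blob attachments need an explicit mimeType, and either kind of attachment is only useful if its type appears in supported_media_types. A minimal sketch, assuming you infer the MIME type from the file extension: the IMAGE_MIME_TYPES map and both function names are illustrative, not taken from the SDK, and the map covers only a few common raster formats (SVG is deliberately left out because it is excluded from image processing).

```python
import os
from typing import List, Optional

# Illustrative extension-to-MIME map for a few common raster formats (an
# assumption, not the runtime's own list). SVG is deliberately absent because
# SVG files are excluded from image processing.
IMAGE_MIME_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
}


def mime_type_for(path: str) -> Optional[str]:
    """Return the MIME type implied by a file name, or None if it is not in the map."""
    _, ext = os.path.splitext(path)
    return IMAGE_MIME_TYPES.get(ext.lower())


def is_accepted(path: str, supported_media_types: List[str]) -> bool:
    """True if the file's MIME type is in the model's supported_media_types list."""
    mime = mime_type_for(path)
    return mime is not None and mime in supported_media_types
```

An explicit map keeps the result independent of the operating system's MIME database; extend it if the model advertises other types.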

Automatic processing

The runtime automatically processes images to fit the model's constraints. No manual resizing is required.

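  • Images that exceed the model's dimension or size limits are automatically resized (preserving the aspect ratio) or have their quality reduced.
  • If an image still cannot meet the requirements after processing, it is skipped and not sent to the LLM.
  • The model's capabilities.limits.vision.max_prompt_image_size field gives the maximum image size in bytes.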

You can inspect these limits at runtime through the model capabilities object. For the best experience, use reasonably sized PNG or JPEG images.
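Although the runtime downsizes large images itself, you may want to know beforehand whether an image will be affected (for example, to warn a user that fine detail could be lost). This page does not say whether max_prompt_image_size is compared against the raw file or the base64 payload, so the sketch below provides both a raw-size check and the standard padded base64 length (4 characters per 3 input bytes). The function names are hypothetical, and vision_limits is assumed to be capabilities.limits.vision as a plain dict.

```python
from typing import Any, Mapping


def base64_length(raw_size: int) -> int:
    """Length of the padded base64 encoding of raw_size bytes (4 chars per 3 bytes)."""
    return 4 * ((raw_size + 2) // 3)


def exceeds_image_size_limit(size: int, vision_limits: Mapping[str, Any]) -> bool:
    """True if size (in bytes) is larger than max_prompt_image_size.

    vision_limits is assumed to be capabilities.limits.vision as a plain dict.
    If the field is absent, no byte limit is advertised and this returns False.
    """
    max_size = vision_limits.get("max_prompt_image_size")
    return max_size is not None and size > max_size
```

For a file on disk, os.path.getsize(path) gives the raw size; pass base64_length(raw_size) instead if you want the more conservative estimate.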

Vision model capabilities

Not all models support vision. Check the model's capabilities before sending images.

Capability fields

Field | Type | Description
--- | --- | ---
capabilities.supports.vision | boolean | Whether the model can process image input
capabilities.limits.vision.supported_media_types | string[] | MIME types the model accepts (for example, ["image/png", "image/jpeg"])
capabilities.limits.vision.max_prompt_images | number | Maximum number of images per prompt
capabilities.limits.vision.max_prompt_image_size | number | Maximum image size in bytes

Vision limits type

interface VisionCapabilities {
    vision?: {
        supported_media_types: string[];
        max_prompt_images: number;
        max_prompt_image_size: number; // bytes
    };
}
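A pre-flight check combining supports.vision and max_prompt_images might look like the sketch below. It assumes the capabilities object is available as nested dicts using the field names from the capability table; can_send_images is a hypothetical name, not an SDK function. Because vision is optional in the TypeScript type, missing limits are treated as "no limit advertised".

```python
from typing import Any, Mapping


def can_send_images(capabilities: Mapping[str, Any], image_count: int) -> bool:
    """Return True if the model advertises vision and image_count is within its limit.

    capabilities is assumed to be the model's capabilities object as nested dicts
    using the field names from the capability table (supports.vision,
    limits.vision.max_prompt_images).
    """
    supports = capabilities.get("supports") or {}
    if not supports.get("vision", False):
        # Non-vision model: attaching images would only waste tokens.
        return False
    # limits.vision is optional (vision?: in the TypeScript type), so default to {}.
    vision_limits = (capabilities.get("limits") or {}).get("vision") or {}
    max_images = vision_limits.get("max_prompt_images")
    return max_images is None or image_count <= max_images
```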

Receiving image results

When a tool returns an image (for example, a screenshot or a generated chart), the result contains an "image" content block with base64-encoded data.

Field | Type | Description
--- | --- | ---
type | "image" | Content block type discriminator
data | string | Base64-encoded image data
mimeType | string | MIME type (for example, "image/png")

These image blocks appear in the results of tool.execution_complete events. For the full event lifecycle, see the Streaming session events guide.
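To use such an image on the client side, filter the content blocks by their type discriminator and decode the data field. This is a minimal sketch: it assumes the tool result's content is available as a list of dicts with the fields from the table, and decode_image_blocks is a hypothetical helper. Where that list sits inside a tool.execution_complete event payload is not specified on this page.

```python
import base64
from typing import Any, Iterable, List, Mapping, Tuple


def decode_image_blocks(content: Iterable[Mapping[str, Any]]) -> List[Tuple[str, bytes]]:
    """Return (mimeType, raw bytes) for each "image" content block.

    content is assumed to be the list of content-block dicts from a tool result;
    exactly where that list sits inside a tool.execution_complete event is not
    specified here.
    """
    images = []
    for block in content:
        # "type" is the discriminator; skip text and other non-image blocks.
        if block.get("type") == "image":
            images.append((block["mimeType"], base64.b64decode(block["data"])))
    return images
```

The decoded bytes can then be written to a file opened in binary mode ("wb"), choosing an extension that matches the mimeType.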

Tips and limitations

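Tip | Details
--- | ---
Use PNG or JPEG directly | Avoids conversion overhead; these formats are sent to the LLM as-is
Keep images reasonably sized | Large images may have their quality reduced, which can lose important details
Use absolute paths for file attachments | The runtime reads files from disk; relative paths may not resolve correctly
Use blob attachments for in-memory data | If you already have base64 data (for example, screenshots or API responses), blobs avoid unnecessary disk I/O
Check for vision support first | Sending images to a non-vision model wastes tokens without providing any visual understanding
Multiple images are supported | Attach several images to one message, up to the model's max_prompt_images limit
SVG is not supported | SVG files are text-based and are excluded from image processing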
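If you have more images than max_prompt_images allows, one option (an assumption, not something this page prescribes) is to split them across several messages. The generic helper below, batch_attachments, is hypothetical; it only groups items and works with attachment dicts of either kind. Keep in mind that the model then sees each group in a separate turn, which may change how it compares images.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batch_attachments(attachments: Sequence[T], max_prompt_images: int) -> Iterator[List[T]]:
    """Yield consecutive groups of at most max_prompt_images attachments.

    Each group could then be sent as the attachments of its own message.
    """
    if max_prompt_images < 1:
        raise ValueError("max_prompt_images must be at least 1")
    for start in range(0, len(attachments), max_prompt_images):
        yield list(attachments[start:start + max_prompt_images])
```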

See also