Attachments And Multimodal Input

KsADK normalizes text, image, file, and uploaded attachment inputs before they reach a framework runner. This lets the local Web UI, the Responses API, Chat Completions, and ADK-style parts share one execution path.

Input Shapes

| Shape | Public protocol | Runtime behavior |
| --- | --- | --- |
| input_text | Responses | text content block |
| input_image | Responses | image URL or data URL |
| input_file | Responses | file data, file URL, file URI, or file ID |
| text | legacy/ADK-style part | text part |
| inlineData | legacy/ADK-style part | base64 attachment |
| fileData | legacy/ADK-style part | uploaded or referenced file |
| image_url | Chat Completions content block | image URL content |

Public examples should prefer Responses item names for /v1/responses and Chat Completions content blocks for /v1/chat/completions.
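
To make the table above concrete, here is a minimal sketch of how a content block could be classified by shape. The `SHAPE_KINDS` mapping and `canonical_kind` function are hypothetical names used only for illustration; they are not part of the KsADK API, and the real normalizer may work differently. Application code normally reads the normalized runner payload instead of doing this itself.

```python
# Hypothetical mapping from the public shape names in the table above
# to a coarse kind. Illustration only, not the KsADK normalizer.
SHAPE_KINDS = {
    "input_text": "text",
    "text": "text",
    "input_image": "image",
    "image_url": "image",
    "input_file": "file",
    "inlineData": "file",   # may still turn out to be an image by MIME type
    "fileData": "file",
}


def canonical_kind(block: dict) -> str:
    """Return 'text', 'image', 'file', or 'unknown' for one content block."""
    # Responses and Chat Completions blocks carry an explicit "type" field.
    block_type = block.get("type")
    if block_type in SHAPE_KINDS:
        return SHAPE_KINDS[block_type]
    # ADK-style parts are identified by which field is present.
    for key in ("text", "inlineData", "fileData"):
        if key in block:
            return SHAPE_KINDS[key]
    return "unknown"


print(canonical_kind({"type": "input_image", "image_url": "data:image/png;base64,..."}))
print(canonical_kind({"fileData": {"fileUri": "ksadk-upload://abc123"}}))
```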

Normalization Flow

flowchart LR
  Raw["client input"] --> Blocks["canonical content blocks"]
  Blocks --> Parts["normalized parts"]
  Parts --> Attach["attachments"]
  Attach --> Extract["attachment_results"]
  Extract --> Prompt["text prompt context"]
  Blocks --> Payload["runner payload"]
  Extract --> Payload

The runner receives both canonical content (input_content, input_messages) and structured attachment results (attachment_results, current_attachment_results).

Protocol Mapping

KsADK accepts several client representations, then maps them into the same internal attachment path:

| Client surface | Example field | Canonical meaning |
| --- | --- | --- |
| Responses | input[].content[].type=input_image | image input block |
| Responses | input[].content[].type=input_file | file input block |
| Chat Completions | messages[].content[].type=image_url | image input block |
| ADK-style part | inlineData | inline bytes with MIME type |
| ADK-style part | fileData | uploaded or referenced file |
| Local Web UI | ksadk-upload://... | runtime-local upload reference |

Application code should use the normalized runner payload, not branch on which public protocol the client used.

Responses Examples

Text plus image:

{
  "model": "my-agent",
  "input": [
    {
      "role": "user",
      "content": [
        {"type": "input_text", "text": "Describe this image."},
        {
          "type": "input_image",
          "image_url": "data:image/png;base64,..."
        }
      ]
    }
  ],
  "stream": false
}

Text plus file:

{
  "model": "my-agent",
  "input": [
    {
      "role": "user",
      "content": [
        {"type": "input_text", "text": "Summarize this note."},
        {
          "type": "input_file",
          "filename": "note.txt",
          "file_data": "data:text/plain;base64,..."
        }
      ]
    }
  ]
}
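
The `...` placeholders in the examples above stand for base64 payloads. The following sketch shows one way to build the "text plus file" body from raw bytes using only the Python standard library. The helper names `to_data_url` and `file_request` are assumptions made for this example; sending the body to `/v1/responses` is left to whatever HTTP client you use.

```python
import base64
import json
import mimetypes


def to_data_url(data: bytes, filename: str) -> str:
    """Encode raw bytes as a data URL, guessing the MIME type from the name."""
    mime_type, _ = mimetypes.guess_type(filename)
    mime_type = mime_type or "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def file_request(model: str, prompt: str, data: bytes, filename: str) -> dict:
    """Build a Responses-style body with one text block and one inline file."""
    return {
        "model": model,
        "input": [
            {
                "role": "user",
                "content": [
                    {"type": "input_text", "text": prompt},
                    {
                        "type": "input_file",
                        "filename": filename,
                        "file_data": to_data_url(data, filename),
                    },
                ],
            }
        ],
    }


body = file_request("my-agent", "Summarize this note.", b"hello", "note.txt")
print(json.dumps(body, indent=2))
```

The same `to_data_url` helper can fill the `image_url` field of an `input_image` block when given image bytes and a name such as `photo.png`.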

Upload Then Reference

The local Web UI can upload a file and receive a URI:

{
  "FileData": {
    "fileUri": "ksadk-upload://abc123",
    "displayName": "contract.pdf",
    "mimeType": "application/pdf",
    "sizeBytes": 102400
  }
}

A later request can reference it:

{
  "type": "input_file",
  "filename": "contract.pdf",
  "file_url": "ksadk-upload://abc123"
}
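
As a small illustration, the upload response can be turned into the reference block like this. The sketch assumes the response has exactly the shape shown above; `reference_block` is a hypothetical helper, not a KsADK function.

```python
def reference_block(upload_response: dict) -> dict:
    """Turn an upload response (shape assumed from the example above)
    into an input_file block that references the uploaded file."""
    file_data = upload_response["FileData"]
    return {
        "type": "input_file",
        "filename": file_data["displayName"],
        "file_url": file_data["fileUri"],
    }


upload_response = {
    "FileData": {
        "fileUri": "ksadk-upload://abc123",
        "displayName": "contract.pdf",
        "mimeType": "application/pdf",
        "sizeBytes": 102400,
    }
}
print(reference_block(upload_response))
```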

Attachment Results

Each extracted attachment result can include:

| Field | Meaning |
| --- | --- |
| display_name | file name shown to users |
| mime_type | detected or provided MIME type |
| transport | inline or reference |
| file_uri | reference URI when available |
| size_bytes | file size |
| kind | text, document, image, archive, or binary |
| status | ok, partial, or failed |
| warnings | extraction warnings |
| extraction_method | parser or OCR path used |
| text | extracted text, when available |
| text_excerpt | short display/search excerpt |

Text and document content may be truncated. Business code should handle missing or partial extraction results.
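
One way to handle missing or partial results defensively is sketched below. It assumes each result is a plain dict with the fields listed above, any of which may be absent; `usable_text` is a hypothetical helper name.

```python
def usable_text(results: list[dict]) -> tuple[list[str], list[str]]:
    """Split attachment results into usable text and user-facing notes."""
    texts, notes = [], []
    for result in results:
        name = result.get("display_name", "attachment")
        status = result.get("status")
        # Fall back to the excerpt when the full text is missing.
        text = result.get("text") or result.get("text_excerpt")
        if status == "failed" or not text:
            notes.append(f"{name}: no text extracted")
            continue
        if status == "partial" or result.get("warnings"):
            notes.append(f"{name}: extraction incomplete")
        texts.append(text)
    return texts, notes


results = [
    {"display_name": "a.txt", "status": "ok", "text": "alpha"},
    {"display_name": "b.pdf", "status": "partial", "text_excerpt": "beta",
     "warnings": ["truncated"]},
    {"display_name": "c.bin", "status": "failed"},
]
texts, notes = usable_text(results)
print(texts)
print(notes)
```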

Inline Data Versus References

KsADK supports two attachment transport styles:

| Transport | Example | Use when |
| --- | --- | --- |
| inline | file_data: "data:text/plain;base64,..." | small local examples, tests, direct API calls |
| reference | file_url: "ksadk-upload://..." | Web UI upload flows and larger local files |

Inline data is self-contained but increases request size. References keep the request small, but they are only meaningful to the runtime that created the upload reference. Public docs should not present ksadk-upload://... as a durable URL.

Display Content Versus Runner Content

The runtime keeps separate views of an uploaded file:

| View | Purpose |
| --- | --- |
| display_content | short user-visible message text and attachment names |
| prompt attachment text | extracted or summarized text added to the runner prompt |
| attachment_results | structured metadata, extraction status, warnings, and text excerpts |
| current_attachment_results | structured results only for files in the current user turn |

This separation avoids flooding the UI transcript with extracted document text while still giving the runner enough context to answer file-related questions. If your agent needs exact structured file metadata, read attachment_results instead of parsing the display text.

Supported Extraction Paths

| Attachment | Typical handling |
| --- | --- |
| text files | decode text using common encodings |
| PDF | native text extraction, with OCR fallback when available |
| DOCX/PPTX/XLSX/HTML | document-specific text extraction |
| images | OCR when OCR dependencies are available |
| ZIP | safe enumeration with path and size restrictions |
| binary files | metadata and warnings only |

ZIP handling is intentionally conservative: path traversal, nested archives, executable entries, large entries, and excessive total extracted size are blocked.

ZIP And Archive Boundaries

Archive handling is optimized for safe inspection, not arbitrary extraction. The runtime may list or extract text-like entries, but it can reject:

  • path traversal entries.
  • nested archives.
  • executable files.
  • very large entries.
  • archives whose total expanded size exceeds local limits.

Treat archive results as partial unless status is ok and no warnings are present. Agents should ask the user for a narrower file or a direct text export when archive extraction is incomplete.
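
The checks listed above can be sketched with the standard `zipfile` module. This is an illustration of the technique, not the KsADK implementation: the limits, suffix lists, and the `inspect_zip` name are all assumptions, and the real runtime limits are not documented here.

```python
import io
import zipfile
from pathlib import PurePosixPath

MAX_ENTRY_BYTES = 1_000_000    # assumed per-entry limit
MAX_TOTAL_BYTES = 10_000_000   # assumed total expanded-size limit
ARCHIVE_SUFFIXES = {".zip", ".tar", ".gz", ".7z", ".rar"}
EXECUTABLE_SUFFIXES = {".exe", ".dll", ".sh", ".bat"}


def inspect_zip(data: bytes) -> tuple[list[str], list[str]]:
    """Return (accepted entry names, warnings) without writing to disk."""
    accepted, warnings = [], []
    total = 0
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Normalize backslashes so Windows-style names are checked too.
            path = PurePosixPath(info.filename.replace("\\", "/"))
            suffix = path.suffix.lower()
            if path.is_absolute() or ".." in path.parts:
                warnings.append(f"path traversal: {info.filename}")
            elif suffix in ARCHIVE_SUFFIXES:
                warnings.append(f"nested archive: {info.filename}")
            elif suffix in EXECUTABLE_SUFFIXES:
                warnings.append(f"executable: {info.filename}")
            elif info.file_size > MAX_ENTRY_BYTES:
                warnings.append(f"entry too large: {info.filename}")
            elif total + info.file_size > MAX_TOTAL_BYTES:
                warnings.append(f"total size limit reached: {info.filename}")
            else:
                # file_size is the size declared in the archive header;
                # a hardened reader would also cap the bytes actually read.
                total += info.file_size
                accepted.append(info.filename)
    return accepted, warnings


# Build a small in-memory archive to exercise the checks.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("notes/readme.txt", "hi")
    zf.writestr("../evil.txt", "x")
    zf.writestr("inner.zip", "x")
    zf.writestr("run.exe", "x")

accepted, warnings = inspect_zip(buffer.getvalue())
print(accepted)
print(warnings)
```

A non-empty warnings list corresponds to the "treat as partial" case described above.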

Runner Payload Fields

Framework adapters receive these file-related fields:

| Field | Use when |
| --- | --- |
| input_content | preserving Responses-style multimodal blocks |
| input_messages | passing message-native state into LangGraph or a model client |
| attachments | using the effective file context for this turn |
| attachment_results | reading extracted text, OCR, or document metadata |
| current_attachments | checking files uploaded in the current user turn only |
| current_attachment_results | processing only newly uploaded files |
| has_current_files | deciding whether this turn should override prior file context |

For follow-up questions, attachment_results can include the most recent session attachment context even when the current turn has no new file. Use current_attachment_results when the workflow must react only to newly uploaded files.
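
A minimal sketch of branching on these fields follows. It assumes the payload is a plain dict and that `has_current_files` is a boolean; `describe_turn` is a hypothetical helper used only to show the distinction.

```python
def describe_turn(payload: dict) -> str:
    """Classify a turn by the file context available in the runner payload."""
    if payload.get("has_current_files"):
        # New uploads in this turn: react only to these files.
        count = len(payload.get("current_attachment_results", []))
        return f"new upload: {count} file(s)"
    if payload.get("attachment_results"):
        # No new files, but earlier session context was restored.
        return "follow-up on earlier files"
    return "no file context"


print(describe_turn({
    "has_current_files": True,
    "current_attachment_results": [{"display_name": "contract.pdf"}],
}))
print(describe_turn({
    "has_current_files": False,
    "attachment_results": [{"display_name": "contract.pdf"}],
}))
```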

Framework Adapter Notes

Framework adapters receive the same normalized attachment payload, but they may feed it to the framework differently:

| Adapter | Typical behavior |
| --- | --- |
| ADK | converts text and supported inline bytes into ADK Content / Part values |
| LangGraph | includes normalized messages, attachment context, and optional image blocks in state |
| LangChain | includes attachment context in prepared input or the replay prompt |
| custom hooks | receive structured payload fields and can choose the exact state shape |

If an application uses a custom hook, include the attachment fields you actually need:

def ksadk_prepare_state(payload: dict, session_context: dict) -> dict:
    return {
        "messages": payload.get("input_messages", []),
        "files": payload.get("current_attachment_results", []),
        "file_context": payload.get("attachment_results", []),
    }

Prefer current_attachment_results for workflows such as "process the file I just uploaded". Prefer attachment_results for follow-up questions such as "now summarize the second section".

Session Carryover

If the current user turn includes attachments, those attachments become the effective context. If a later turn has no new attachment, the runtime can restore the most recent attachment context from the same session.

flowchart TD
  A["user turn"] --> B{"has current files?"}
  B -->|yes| C["use current attachments"]
  B -->|no| D["restore recent session attachment context"]
  C --> E["runner payload"]
  D --> E

Use current_attachment_results when the user just uploaded a file. Use attachment_results when follow-up questions should keep working with the prior file.
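
The carryover rule in the flowchart can be modeled in a few lines. This is a toy model of the behavior described above, not the runtime's code: the `session` dict and its `recent_attachments` key are assumptions made for the example.

```python
def effective_attachments(current: list[dict], session: dict) -> list[dict]:
    """Return the attachment context for a turn, following the flowchart."""
    if current:
        # Current files win and become the new session context.
        session["recent_attachments"] = current
        return current
    # No new files: fall back to the most recent context, if any.
    return session.get("recent_attachments", [])


session: dict = {}
first_turn = effective_attachments([{"display_name": "contract.pdf"}], session)
follow_up = effective_attachments([], session)
print(first_turn == follow_up)
```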

Safety Rules

  • Validate file type and size before business processing.
  • Treat uploaded file content as untrusted user input.
  • Do not publish uploaded customer files, OCR text, or local file paths.
  • Prefer placeholder data in docs and tests.
  • Keep large binary fixtures out of the public repository unless reviewed.
  • Prefer file_data or ksadk-upload:// references for local processing; do not assume the runtime will fetch arbitrary remote file_url values.
  • Avoid logging extracted attachment text in CI or public issue templates.
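
Two of these rules, validating type and size and keeping extracted text out of logs, can be sketched as follows. The allow-list, the size limit, and both helper names are assumptions for illustration; choose values that fit your own application.

```python
ALLOWED_MIME_PREFIXES = ("text/", "application/pdf", "image/")  # assumed allow-list
MAX_BYTES = 5 * 1024 * 1024                                     # assumed size limit


def check_attachment(result: dict) -> list[str]:
    """Return a list of problems; an empty list means the file may be processed."""
    problems = []
    mime_type = result.get("mime_type") or ""
    if not mime_type.startswith(ALLOWED_MIME_PREFIXES):
        problems.append(f"unsupported type: {mime_type or 'unknown'}")
    size = result.get("size_bytes")
    if size is None or size > MAX_BYTES:
        problems.append("missing or excessive size")
    return problems


def log_summary(result: dict) -> str:
    """Build a log line from metadata only; extracted text is never included."""
    return (
        f"{result.get('display_name', 'attachment')} "
        f"({result.get('mime_type', 'unknown')}, "
        f"{result.get('size_bytes', '?')} bytes, "
        f"status={result.get('status', 'unknown')})"
    )


sample = {
    "display_name": "note.txt",
    "mime_type": "text/plain",
    "size_bytes": 12,
    "status": "ok",
    "text": "confidential body",
}
print(check_attachment(sample))
print(log_summary(sample))
```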