
Create and manage cached context for repeated queries against large payloads. Use caching when making multiple queries against the same large dataset or document: you pay for input tokens once when the cache is created, then only for output tokens on each subsequent query.

Usage

gemini_cache_create(
  userdata,
  ml = NULL,
  instructions = NULL,
  ttl_seconds = NULL,
  displayName = NULL
)

gemini_cache_delete(cache)

gemini_cache_get(cache)

gemini_cache_list()

gemini_chat_cached(
  prompt,
  cache,
  ml = NULL,
  max_think = FALSE,
  temp = 1,
  timeout = 60
)

.estTokenUsage(userdata)

Arguments

userdata

Character or list. Data to cache: either a JSON string or an R object (which will be serialized to JSON). Must be >= ~4096 estimated tokens. Use for large context such as documents, datasets, or conversation history.

ml

Character. Model ID to use. If NULL (default), uses ART_GEMINI_MODEL env var. The cache is tied to this model.

instructions

Character. System instruction for cached context. Default "You are a helpful agent." Sets behavioral context for all queries using this cache.

ttl_seconds

Numeric. Time-to-live in seconds. Default 43200 (12 hours); override the default via the ART_GEMINI_TTL_SEC environment variable. The cache is automatically deleted after this time.

displayName

Character. Human-friendly label for the cache (e.g., "artist-portfolio-2024"). For metadata only - use the returned name field for operations.

cache

Character. Cache name/ID returned from gemini_cache_create(). Format: alphanumeric string. Used for get/delete/chat operations.

prompt

Character. User message for cached chat. Must be a single string.

max_think

Logical. Enable extended reasoning (thinkingLevel = "high") for Gemini 3 models. Silently ignored for other models. Default FALSE.

temp

Numeric. Temperature setting (0-2). Default 1.

timeout

Numeric. Request timeout in seconds. Default 60.

Value

gemini_cache_create(): List with name, model, tokenUsage, and optional displayName.

gemini_cache_delete(): Invisible; errors on non-2xx.

gemini_cache_get(): List with cache metadata (name, model, tokenUsage, displayName).

gemini_cache_list(): List with cachedContents entries (each carries name/model/tokenUsage).

gemini_chat_cached(): Character model reply with attributes modelVersion/usageMetadata.
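
The reply's metadata travels as attributes; a quick sketch of reading them, assuming the attribute names above (the prompt and cache object are illustrative):

reply <- gemini_chat_cached("Summarize document 42.", cache = cache$name)
attr(reply, "modelVersion")
attr(reply, "usageMetadata")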

Details

Minimum cache size: Payloads must be at least ~4096 estimated tokens (calculated as nchar(json) / 4). Smaller payloads will error before the API call is made: "Payload too small for explicit caching (min 4096 est. tokens)."
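
The pre-check is easy to reproduce by hand. A minimal sketch of the heuristic, assuming jsonlite handles the serialization (the variable names are illustrative):

json <- jsonlite::toJSON(userdata, auto_unbox = TRUE)
est_tokens <- nchar(as.character(json)) / 4
est_tokens >= 4096  # must be TRUE, or gemini_cache_create() errors before any API call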

displayName: Optional human-friendly label set only at cache creation. It is returned by gemini_cache_get() and gemini_cache_list() but cannot be used for direct lookup; always use the cache name (ID) for operations, as in the sketch below.
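
To find a cache by its label anyway, list all entries and filter client-side. A sketch, assuming each cachedContents entry carries its displayName as described above:

caches <- gemini_cache_list()
hits <- Filter(
  function(x) identical(x$displayName, "artist-portfolio-2024"),
  caches$cachedContents
)
if (length(hits) > 0) hits[[1]]$name  # the ID that get/delete/chat expect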

TTL: Default is 12 hours (43200 seconds). Override via ttl_seconds parameter or ART_GEMINI_TTL_SEC environment variable.
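
For example, to shorten the default TTL for a whole session (the value shown is illustrative):

Sys.setenv(ART_GEMINI_TTL_SEC = "3600")  # 1-hour default for caches created afterwards
# or per cache, via the explicit argument:
cache <- gemini_cache_create(userdata, ttl_seconds = 3600)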

Functions

  • gemini_cache_create(): Create an explicit cache entry

  • gemini_cache_delete(): Delete a cached content entry

  • gemini_cache_get(): Get cached content metadata

  • gemini_cache_list(): List cached content entries

  • gemini_chat_cached(): Chat using a cached context

  • .estTokenUsage(): Estimate token usage for a cached payload

Examples

if (FALSE) { # \dontrun{
# Create a large payload (must be >= ~4096 tokens)
large_context <- list(
  documents = lapply(1:100, function(i) {
    list(id = i, content = paste(rep("Sample text content.", 50), collapse = " "))
  })
)

# Create cache with 10-minute TTL
cache <- gemini_cache_create(large_context, ttl_seconds = 600)

# Query against cached context
gemini_chat_cached("Summarize document 42.", cache = cache$name)

# Clean up when done
gemini_cache_delete(cache$name)
} # }