Introduction

Computer use is an advanced experimental feature that enables Claude to see your screen and control your computer through mouse movements, clicks, and keyboard input. This capability transforms Claude from a text-based assistant into an agent that can interact with any graphical application—browsers, desktop software, custom applications, and more.

This vignette provides a comprehensive deep dive into computer use, covering the agentic loop architecture, implementation patterns, safety considerations, and real-world applications. By the end, you’ll understand how to build robust computer control workflows and integrate them with other artclaude features.

Important: Computer use is an experimental beta feature. It gives Claude direct control over your system, which carries inherent risks. Always follow the safety guidelines in this vignette.

How Computer Use Works

Computer use follows an agentic loop pattern where Claude iteratively observes the screen, decides on actions, and executes them until a task is complete.

The Observation-Action Loop

┌─────────────────────────────────────────────────────────────────┐
│                     COMPUTER USE LOOP                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐      │
│  │  Screenshot  │───▶│   Claude     │───▶│   Execute    │      │
│  │  Function    │    │   Analyzes   │    │   Action     │      │
│  └──────────────┘    └──────────────┘    └──────────────┘      │
│         ▲                                       │               │
│         │                                       │               │
│         └───────────────────────────────────────┘               │
│                    New Screenshot                               │
│                                                                 │
│  Exit conditions:                                               │
│  - Task complete                                                │
│  - max_iter reached                                             │
│  - Error encountered                                            │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

The loop consists of these phases:

  1. Screenshot Capture: Your screenshot_fn captures the current screen state
  2. Visual Analysis: Claude receives the screenshot and analyzes the UI
  3. Decision Making: Claude determines what action to take
  4. Action Execution: Your action_fn executes the requested action
  5. Iteration: New screenshot is taken and the loop continues

This continues until Claude determines the task is complete, max_iter is reached, or an error occurs.
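The phases above can be sketched as a plain R loop. This is a simplified illustration under stated assumptions, not artclaude's actual implementation: `run_observe_act_loop()` and `decide_fn` (a stand-in for the model call) are hypothetical names introduced here.

```r
# Hypothetical sketch of an observe-decide-act loop; not artclaude internals.
# decide_fn stands in for the model call: it takes a screenshot path and
# returns either an action list or NULL when the task is done.
run_observe_act_loop <- function(screenshot_fn, decide_fn, action_fn,
                                 max_iter = 20L) {
  actions <- list()
  for (i in seq_len(max_iter)) {
    shot <- screenshot_fn()            # 1. capture the screen
    action <- decide_fn(shot)          # 2-3. analyze and decide
    if (is.null(action)) {
      return(list(actions = actions, iterations = i, status = "complete"))
    }
    ok <- tryCatch(
      {
        action_fn(action)              # 4. execute the action
        TRUE
      },
      error = function(e) FALSE
    )
    actions[[length(actions) + 1L]] <- action
    if (!ok) {
      return(list(actions = actions, iterations = i, status = "error"))
    }
  }
  list(actions = actions, iterations = max_iter, status = "max_iter")
}

# Dry run with stand-in functions: one click, one type, then done.
queue <- list(
  list(action = "left_click", coordinate = c(10L, 20L)),
  list(action = "type", text = "hi")
)
step <- 0L
demo <- run_observe_act_loop(
  screenshot_fn = function() "fake.png",
  decide_fn = function(path) {
    step <<- step + 1L
    if (step <= length(queue)) queue[[step]] else NULL
  },
  action_fn = function(action) invisible(NULL),
  max_iter = 5L
)
```

The three `status` values mirror the exit conditions listed in the diagram: task complete, max_iter reached, and error encountered.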

The claude_computer() Function

library(artclaude)

result <- claude_computer(
  prompt = "Open Notepad and type 'Hello, World!'",
  screenshot_fn = take_screenshot,
  action_fn = execute_action,
  display_size = c(1920L, 1080L),
  max_iter = 20L,
  echo = "all"
)

Parameters:

Parameter      Type        Description
prompt         character   The task for Claude to accomplish
screenshot_fn  function    Takes no arguments, returns path to screenshot
action_fn      function    Receives action list, executes the action
display_size   integer[2]  Screen dimensions c(width, height)
ml             character   Model ID (default: Opus 4.5)
max_iter       integer     Maximum iterations before stopping
sys_prompt     character   Optional system prompt
echo           character   Output mode: “none”, “output”, “all”

Returns:

A list containing:

  • response: Claude’s final response text
  • actions: List of all actions executed
  • iterations: Number of loop iterations completed
  • chat: The chat object (for session continuation)

Available Actions

Claude can request these actions through the computer tool:

Action        Description              Required Parameters
screenshot    Take a new screenshot    -
mouse_move    Move cursor to position  coordinate
left_click    Left mouse click         coordinate
right_click   Right mouse click        coordinate
double_click  Double left click        coordinate
type          Type text string         text
key           Press a key/combination  text
scroll_up     Scroll up                coordinate (optional)
scroll_down   Scroll down              coordinate (optional)

Coordinate System

Coordinates are provided as c(x, y) where:

  • (0, 0) is the top-left corner
  • x increases to the right
  • y increases downward
  • Values must fall within your display_size: 0 ≤ x < width and 0 ≤ y < height
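A bounds check along these lines can catch coordinates that fall outside the display before they reach the action function. This is a minimal sketch; `check_coordinate()` is a hypothetical helper, not part of artclaude, and it assumes zero-based pixel coordinates.

```r
# Hypothetical helper: check a c(x, y) coordinate against display_size.
# Assumes zero-based pixel coordinates with (0, 0) at the top-left corner.
check_coordinate <- function(coordinate, display_size) {
  stopifnot(length(coordinate) == 2L, length(display_size) == 2L)
  coordinate <- as.integer(round(unlist(coordinate)))
  display_size <- as.integer(display_size)
  in_bounds <- all(coordinate >= 0L) && all(coordinate < display_size)
  if (!in_bounds) {
    stop(sprintf(
      "Coordinate (%d, %d) is outside the %d x %d display",
      coordinate[1], coordinate[2], display_size[1], display_size[2]
    ), call. = FALSE)
  }
  coordinate
}

# A point in the middle of a 1920 x 1080 display passes the check
check_coordinate(c(960, 540), c(1920L, 1080L))
```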

Key Names

For the key action, use standard key names:

Category      Examples
Letters       a, b, c, …
Numbers       1, 2, 3, …
Special       enter, tab, escape, backspace, delete
Arrows        up, down, left, right
Modifiers     ctrl, alt, shift, command
Function      f1, f2, … f12
Combinations  ctrl+c, ctrl+v, alt+tab
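Since combinations arrive as a single string such as "ctrl+c", an action function usually has to split them into individual keys. The helper below is a hypothetical sketch in base R (`parse_key_combo()` is not an artclaude function) and assumes "+" is the only separator.

```r
# Hypothetical helper: split a key string such as "ctrl+c" into its parts.
parse_key_combo <- function(text) {
  stopifnot(is.character(text), length(text) == 1L, nzchar(text))
  # A lone "+" is the plus key itself, not a separator
  if (identical(text, "+")) {
    return("+")
  }
  keys <- trimws(strsplit(tolower(text), "+", fixed = TRUE)[[1]])
  keys[nzchar(keys)]
}

parse_key_combo("Ctrl+C")
parse_key_combo("enter")
```

Lower-casing the input is a design choice made here so that "Ctrl+C" and "ctrl+c" are treated the same way.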

Implementation Guide

Screenshot Function

The screenshot function must capture the current screen state and return the file path.

Using magick Package

take_screenshot_magick <- function() {
  path <- tempfile(fileext = ".png")

  # Capture screen using system tools, then process with magick
  # This is platform-specific
  if (.Platform$OS.type == "windows") {
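    # Windows: run a PowerShell script that copies the primary screen to a PNG
    ps_script <- sprintf('
            Add-Type -AssemblyName System.Windows.Forms
            Add-Type -AssemblyName System.Drawing
            $screen = [System.Windows.Forms.Screen]::PrimaryScreen.Bounds
            $bitmap = New-Object System.Drawing.Bitmap($screen.Width, $screen.Height)
            $graphics = [System.Drawing.Graphics]::FromImage($bitmap)
            $graphics.CopyFromScreen($screen.Location, [System.Drawing.Point]::Empty, $screen.Size)
            $bitmap.Save("%s")
            $graphics.Dispose()
            $bitmap.Dispose()
        ', path)
    # Run from a script file: a multi-line -Command argument loses its quoting
    script_file <- tempfile(fileext = ".ps1")
    writeLines(ps_script, script_file)
    system2("powershell", c("-NoProfile", "-ExecutionPolicy", "Bypass", "-File", shQuote(script_file)), stdout = FALSE)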
  } else if (Sys.info()["sysname"] == "Darwin") {
    # macOS: Use screencapture
    system2("screencapture", c("-x", path))
  } else {
    # Linux: Use scrot or similar
    system2("scrot", path)
  }

  path
}

Using reticulate with Python

take_screenshot_python <- function() {
  path <- tempfile(fileext = ".png")

  pyautogui <- reticulate::import("pyautogui")
  screenshot <- pyautogui$screenshot()
  screenshot$save(path)

  path
}

Using the screenshot Package

take_screenshot_pkg <- function() {
  path <- tempfile(fileext = ".png")

  if (requireNamespace("screenshot", quietly = TRUE)) {
    screenshot::screenshot(file = path)
  } else {
    stop("screenshot package not installed")
  }

  path
}
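Whichever capture method you use, it can help to confirm that the function really produced an image before the path is handed on. The wrapper below is a hypothetical sketch in base R (`is_png_file()` and `checked_screenshot_fn()` are not part of artclaude); it only checks the eight-byte PNG signature, not the image contents.

```r
# Hypothetical wrapper: fail fast if a screenshot function did not
# produce a readable PNG file.
PNG_SIGNATURE <- as.raw(c(0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a))

is_png_file <- function(path) {
  if (!is.character(path) || length(path) != 1L ||
      !file.exists(path) || dir.exists(path)) {
    return(FALSE)
  }
  # Compare the first eight bytes with the PNG file signature
  header <- readBin(path, what = "raw", n = 8L)
  identical(header, PNG_SIGNATURE)
}

checked_screenshot_fn <- function(screenshot_fn) {
  function() {
    path <- screenshot_fn()
    if (!is_png_file(path)) {
      stop("screenshot_fn did not return a path to a PNG file", call. = FALSE)
    }
    path
  }
}
```

Usage would look like `screenshot_fn = checked_screenshot_fn(take_screenshot_pkg)`, so a failed capture stops the run instead of sending a missing file.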

Action Function

The action function receives a list with the action details and executes it.
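Before wiring up real mouse and keyboard control, a dry-run action function that only records what it receives can be useful for testing. The sketch below is hypothetical (`make_recording_action_fn()` is not an artclaude function), and the action shapes are assumptions based on the Available Actions table.

```r
# Hypothetical dry-run action function: records actions instead of
# executing them, so nothing on the real desktop is touched.
make_recording_action_fn <- function() {
  recorded <- list()
  list(
    action_fn = function(action) {
      stopifnot(is.list(action), is.character(action$action))
      recorded[[length(recorded) + 1L]] <<- action
      invisible(NULL)
    },
    get_actions = function() recorded
  )
}

# Assumed action shapes: an `action` name plus `coordinate` or `text`
recorder <- make_recording_action_fn()
recorder$action_fn(list(action = "left_click", coordinate = c(400L, 300L)))
recorder$action_fn(list(action = "type", text = "Hello, World!"))
recorder$action_fn(list(action = "key", text = "ctrl+s"))
length(recorder$get_actions())
```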

Using reticulate with pyautogui

pyautogui runs on Windows, macOS, and Linux, which makes this a practical cross-platform approach:

execute_action_pyautogui <- function(action) {
  pyautogui <- reticulate::import("pyautogui")

  # Disable fail-safe for automated use (careful!)
  pyautogui$FAILSAFE <- FALSE

  switch(action$action,
    "screenshot" = {
      # Screenshot is handled by screenshot_fn
      NULL
    },
    "mouse_move" = {
      pyautogui$moveTo(
        action$coordinate[1],
        action$coordinate[2],
        duration = 0.25
      )
    },
    "left_click" = {
      pyautogui$click(
        action$coordinate[1],
        action$coordinate[2]
      )
    },
    "right_click" = {
      pyautogui$rightClick(
        action$coordinate[1],
        action$coordinate[2]
      )
    },
    "double_click" = {
      pyautogui$doubleClick(
        action$coordinate[1],
        action$coordinate[2]
      )
    },
    "type" = {
      # Use write for regular text, handling special characters
      pyautogui$write(action$text, interval = 0.02)
    },
    "key" = {
      # Handle key combinations (e.g., "ctrl+c") by splitting on "+"
      keys <- strsplit(action$text, "+", fixed = TRUE)[[1]]
      if (length(keys) > 1) {
        # Pass each key to hotkey() as a separate positional argument
        do.call(pyautogui$hotkey, as.list(keys))
      } else {
        pyautogui$press(action$text)
      }
    },
    "scroll_up" = {
      if (!is.null(action$coordinate)) {
        pyautogui$moveTo(action$coordinate[1], action$coordinate[2])
      }
      pyautogui$scroll(3)
    },
    "scroll_down" = {
      if (!is.null(action$coordinate)) {
        pyautogui$moveTo(action$coordinate[1], action$coordinate[2])
      }
      pyautogui$scroll(-3)
    }
  )

  invisible(NULL)
}

Using Windows-Specific API

execute_action_windows <- function(action) {
  # Requires user32.dll via Rcpp or similar
  # This is a conceptual example

  switch(action$action,
    "left_click" = {
      ps_script <- sprintf('
                Add-Type -TypeDefinition @"
                using System;
                using System.Runtime.InteropServices;
                public class Win32 {
                    [DllImport("user32.dll")]
                    public static extern bool SetCursorPos(int X, int Y);
                    [DllImport("user32.dll")]
                    public static extern void mouse_event(uint dwFlags, int dx, int dy, uint dwData, int dwExtraInfo);
                }
"@
                [Win32]::SetCursorPos(%d, %d)
                [Win32]::mouse_event(0x02, 0, 0, 0, 0)  # LEFTDOWN
                [Win32]::mouse_event(0x04, 0, 0, 0, 0)  # LEFTUP
            ', action$coordinate[1], action$coordinate[2])
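      # Run from a script file so the here-string and quotes survive intact
      script_file <- tempfile(fileext = ".ps1")
      writeLines(ps_script, script_file)
      system2("powershell", c("-NoProfile", "-ExecutionPolicy", "Bypass", "-File", shQuote(script_file)), stdout = FALSE)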
    }
    # ... other actions
  )
}

Case Study: Automated Data Entry

Let’s build a complete example that uses computer use to automate data entry into a web form.

Scenario

We need to enter artist information into a legacy web application that doesn’t have an API. Claude will navigate to the form, fill in the fields, and submit the data.

Implementation

library(artclaude)
library(reticulate)

# Initialize Python environment
use_virtualenv("computer-use-env")

# Screenshot function
take_screenshot <- function() {
  path <- tempfile(fileext = ".png")
  pyautogui <- import("pyautogui")
  screenshot <- pyautogui$screenshot()
  screenshot$save(path)
  path
}

# Action function with logging
execute_action <- function(action) {
  pyautogui <- import("pyautogui")

  # Log the action
  message(sprintf("[Action] %s", action$action))
  if (!is.null(action$coordinate)) {
    message(sprintf(
      "  Coordinates: (%d, %d)",
      action$coordinate[1], action$coordinate[2]
    ))
  }
  if (!is.null(action$text)) {
    message(sprintf("  Text: %s", action$text))
  }

  # Execute the action
  switch(action$action,
    "mouse_move" = pyautogui$moveTo(
      action$coordinate[1],
      action$coordinate[2],
      duration = 0.3
    ),
    "left_click" = pyautogui$click(
      action$coordinate[1],
      action$coordinate[2]
    ),
    "double_click" = pyautogui$doubleClick(
      action$coordinate[1],
      action$coordinate[2]
    ),
    "type" = pyautogui$write(action$text, interval = 0.03),
    # Split combinations such as "ctrl+a" so each key reaches hotkey() separately
    "key" = do.call(
      pyautogui$hotkey,
      as.list(strsplit(action$text, "+", fixed = TRUE)[[1]])
    ),
    "scroll_up" = pyautogui$scroll(3),
    "scroll_down" = pyautogui$scroll(-3)
  )

  invisible(NULL)
}

# Artist data to enter
artist_data <- list(
  name = "Claude Monet",
  birth_year = "1840",
  nationality = "French",
  movement = "Impressionism",
  famous_work = "Water Lilies"
)

# Build the prompt
prompt <- sprintf(
  "I need you to fill out the artist registration form visible on screen.

    Please enter the following information:
    - Name: %s
    - Birth Year: %s
    - Nationality: %s
    - Art Movement: %s
    - Famous Work: %s

    After filling all fields, click the Submit button.
    Wait for confirmation before reporting completion.",
  artist_data$name,
  artist_data$birth_year,
  artist_data$nationality,
  artist_data$movement,
  artist_data$famous_work
)

# Run computer use
result <- claude_computer(
  prompt = prompt,
  screenshot_fn = take_screenshot,
  action_fn = execute_action,
  display_size = c(1920L, 1080L),
  max_iter = 30L,
  sys_prompt = "You are a data entry assistant. Work carefully and verify each
                  field before moving to the next. If you encounter an error,
                  try to recover gracefully.",
  echo = "all"
)

# Review results
message("=== Task Complete ===")
message(sprintf("Iterations: %d", result$iterations))
message(sprintf("Actions executed: %d", length(result$actions)))
message(sprintf("Final response: %s", result$response))

Action History Analysis

# Analyze the action sequence
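action_summary <- lapply(result$actions, function(a) {
  list(
    action = a$action,
    coordinate = if (!is.null(a$coordinate)) {
      sprintf("(%d, %d)", as.integer(a$coordinate[1]), as.integer(a$coordinate[2]))
    } else {
      NA_character_
    },
    # Explicit NULL check: the %||% operator is only in base R from 4.4.0
    text = if (!is.null(a$text)) a$text else NA_character_
  )
})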

# Convert to data.table for analysis
action_dt <- data.table::rbindlist(action_summary)
print(action_dt)

# Count action types
action_dt[, .N, by = action]

Integration with Extended Thinking

For complex tasks, combine computer use with extended thinking to enable more sophisticated reasoning.

Complex Navigation Task

# Create a chat with both thinking and computer use
# Note: claude_computer handles the computer tool internally
# This example shows how to structure complex prompts

complex_prompt <- "
I need you to perform a multi-step workflow:

1. Open the web browser
2. Navigate to a search engine
3. Search for 'Impressionist painters list'
4. Find and click on a result from a museum website
5. Locate the first painter mentioned
6. Copy their name

Think through each step carefully before acting. If a page takes time to load,
wait and take a new screenshot. If you encounter popups or unexpected dialogs,
close them first.
"

# For complex reasoning, use Opus 4.5 (default)
result <- claude_computer(
  prompt = complex_prompt,
  screenshot_fn = take_screenshot,
  action_fn = execute_action,
  display_size = c(1920L, 1080L),
  max_iter = 50L, # More iterations for complex tasks
  sys_prompt = "You are an expert at navigating graphical user interfaces.
                  Take your time to analyze each screen carefully before acting.
                  Prefer precise clicks over approximate ones.",
  echo = "all"
)

Combining with Tool Use

You can use the returned chat object to continue the session with additional tools:

# After computer use, continue with structured output
chat <- result$chat

# Define a schema for the extracted information
extraction_schema <- ellmer::type_object(
  painter_name = ellmer::type_string("Name of the painter found"),
  source_url = ellmer::type_string("URL where the information was found"),
  confidence = ellmer::type_enum(
    values = c("high", "medium", "low"),
    description = "Confidence in the extraction accuracy"
  )
)

# Get structured output from the session
extracted <- chat$chat_structured(
  "Based on our session, provide the extracted painter information.",
  type = extraction_schema
)

print(extracted)

Safety and Best Practices

Isolation Strategies

Computer use gives Claude direct control over your system. Implement these safeguards:

Virtual Machine Isolation

# Run computer use in a VM and capture only the VM's console window
# This keeps Claude's actions isolated from your main system

# Example using a local VM viewed through vmconnect
vm_screenshot_fn <- function() {
  path <- tempfile(fileext = ".png")

  # Capture VM window (Windows example)
  system2("powershell", c(
    "-Command",
    sprintf('
            $hwnd = (Get-Process -Name "vmconnect").MainWindowHandle
            # ... capture specific window ...
        ')
  ))

  path
}

Container-Based Isolation

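# Use Docker with X11 forwarding for Linux
# Dockerfile: FROM ubuntu:22.04 with desktop environment

container_execute_action <- function(action) {
  # Action names are not pyautogui function names, so map them explicitly
  x <- as.integer(action$coordinate[1])
  y <- as.integer(action$coordinate[2])
  py_call <- switch(action$action,
    "mouse_move" = sprintf("pyautogui.moveTo(%d, %d)", x, y),
    "left_click" = sprintf("pyautogui.click(%d, %d)", x, y),
    "right_click" = sprintf("pyautogui.rightClick(%d, %d)", x, y),
    "double_click" = sprintf("pyautogui.doubleClick(%d, %d)", x, y)
  )
  if (is.null(py_call)) return(invisible(NULL))
  # Send the call to the containerized Python environment
  system2("docker", c(
    "exec", "claude-computer", "python", "-c",
    shQuote(paste("import pyautogui;", py_call))
  ))
}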

Network Restrictions

Limit what the controlled environment can access:

# Firewall rules for the computer use environment
# Allow only specific domains/IPs

# Example: Create isolated network namespace (Linux)
setup_network_isolation <- function() {
  system("ip netns add claude_sandbox")
  system("ip netns exec claude_sandbox ip link set lo up")
  # Configure limited network access
}

Credential Protection

Never expose credentials in computer use sessions:

# BAD: Credentials visible in prompt
prompt_bad <- "Log into the system with username admin and password secret123"

# GOOD: Pre-authenticated session
prompt_good <- "The browser is already logged in. Navigate to the dashboard."

# GOOD: Use environment variables accessed by action function
# The action function can handle authentication separately
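One way to keep a secret out of the prompt is to let Claude type a placeholder and have the action function swap in the real value. The wrapper below is a hypothetical sketch: `with_secret_substitution()`, the `{{APP_PASSWORD}}` token, and the `APP_PASSWORD` environment variable are all made-up names for illustration.

```r
# Hypothetical wrapper: replace a placeholder token in typed text with a
# secret read from an environment variable, so the secret never appears
# in the prompt. APP_PASSWORD and {{APP_PASSWORD}} are made-up names.
with_secret_substitution <- function(base_action_fn,
                                     placeholder = "{{APP_PASSWORD}}",
                                     env_var = "APP_PASSWORD") {
  function(action) {
    if (identical(action$action, "type") && !is.null(action$text) &&
        grepl(placeholder, action$text, fixed = TRUE)) {
      secret <- Sys.getenv(env_var, unset = NA)
      if (is.na(secret)) {
        stop(sprintf("Environment variable %s is not set", env_var),
             call. = FALSE)
      }
      # Substitute only in the copy passed on for execution
      action$text <- gsub(placeholder, secret, action$text, fixed = TRUE)
    }
    base_action_fn(action)
  }
}
```

The prompt would then say something like "type {{APP_PASSWORD}} into the password field". Note that a wrapped action function that logs `action$text` would still write the secret to the log, and an unmasked input field would show it in later screenshots.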

Monitoring and Intervention

Always monitor computer use sessions:

# Action function with safety checks
execute_action_safe <- function(action) {
  pyautogui <- reticulate::import("pyautogui")

  # Safety check: Don't allow clicks in certain regions
  if (!is.null(action$coordinate)) {
    # Example: Block taskbar clicks
    if (action$coordinate[2] > 1040) { # Bottom 40 pixels
      warning("Blocked click in taskbar region")
      return(invisible(NULL))
    }
  }

  # Safety check: Block dangerous key combinations
  if (action$action == "key") {
    dangerous <- c("alt+f4", "ctrl+alt+delete", "super")
    if (tolower(action$text) %in% dangerous) {
      warning(sprintf("Blocked dangerous key: %s", action$text))
      return(invisible(NULL))
    }
  }

  # Log all actions for audit
  log_action(action)

  # Execute, downgrading any error to a warning so the loop can continue
  tryCatch(
    {
      # ... actual execution ...
    },
    error = function(e) {
      warning(sprintf("Action failed: %s", e$message))
    }
  )
}

# Action logging
log_action <- function(action) {
  log_entry <- list(
    timestamp = Sys.time(),
    action = action$action,
    coordinate = action$coordinate,
    text = action$text
  )

  # Append to audit log
  log_file <- "computer_use_audit.json"
  existing <- if (file.exists(log_file)) {
    jsonlite::read_json(log_file)
  } else {
    list()
  }
  existing <- c(existing, list(log_entry))
  jsonlite::write_json(existing, log_file, auto_unbox = TRUE)
}
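Alongside region and key checks, an allowlist of action types and a cap on the number of executed actions can limit how much a session is able to do. This is a hypothetical sketch in base R; `guard_action_fn()` and its defaults are illustrative choices, not artclaude features.

```r
# Hypothetical guard: only pass through allowlisted action types and stop
# after a fixed number of executed actions.
guard_action_fn <- function(base_action_fn,
                            allowed = c("screenshot", "mouse_move",
                                        "left_click", "type", "key"),
                            max_actions = 100L) {
  executed <- 0L
  function(action) {
    if (!action$action %in% allowed) {
      warning(sprintf("Blocked action type: %s", action$action), call. = FALSE)
      return(invisible(NULL))
    }
    if (executed >= max_actions) {
      stop("Action budget exhausted", call. = FALSE)
    }
    # Count only actions that are actually passed on for execution
    executed <<- executed + 1L
    base_action_fn(action)
  }
}
```

It composes with the other wrappers, for example `action_fn = guard_action_fn(execute_action_safe, max_actions = 50L)`.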

Emergency Stop

Implement emergency stop mechanisms:

# Global flag for emergency stop
.computer_use_env <- new.env()
.computer_use_env$emergency_stop <- FALSE

# Check in action function
execute_action_with_stop <- function(action) {
  if (.computer_use_env$emergency_stop) {
    stop("Emergency stop activated", call. = FALSE)
  }

  # ... execute action ...
}

# Call this to stop (e.g., from Shiny UI or keyboard shortcut)
emergency_stop <- function() {
  .computer_use_env$emergency_stop <- TRUE
  message("EMERGENCY STOP ACTIVATED")
}

# Reset for next session
reset_emergency_stop <- function() {
  .computer_use_env$emergency_stop <- FALSE
}
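An in-session flag can be hard to flip while the same R session is busy running the loop. A sentinel file that any other process can create is one alternative. The sketch below is hypothetical: `with_stop_file()` and the file name `STOP_COMPUTER_USE` are made up for illustration, and it assumes the action function is called once per action.

```r
# Hypothetical file-based stop switch: creating the sentinel file from any
# other process makes the next action abort.
with_stop_file <- function(base_action_fn, stop_file = "STOP_COMPUTER_USE") {
  function(action) {
    if (file.exists(stop_file)) {
      stop("Emergency stop file detected: ", stop_file, call. = FALSE)
    }
    base_action_fn(action)
  }
}
```

To stop a running session, create the file in the session's working directory from another terminal or file manager, and delete it before the next run.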

Error Handling and Recovery

Graceful Error Handling

claude_computer_robust <- function(prompt, screenshot_fn, action_fn, ...) {
  # Wrapper with error recovery
  safe_action_fn <- function(action) {
    tryCatch(
      {
        action_fn(action)
      },
      error = function(e) {
        warning(sprintf("Action '%s' failed: %s", action$action, e$message))

        # Try recovery actions
        if (action$action %in% c("left_click", "right_click")) {
          # Maybe the window moved - try to refocus
          Sys.sleep(0.5)
          tryCatch(
            {
              action_fn(list(action = "key", text = "escape"))
              Sys.sleep(0.2)
              action_fn(action) # Retry
            },
            error = function(e2) {
              warning("Recovery failed")
            }
          )
        }
      }
    )
  }

  safe_screenshot_fn <- function() {
    tryCatch(
      {
        screenshot_fn()
      },
      error = function(e) {
        warning(sprintf("Screenshot failed: %s", e$message))
        Sys.sleep(1) # Wait and retry
        screenshot_fn()
      }
    )
  }

  claude_computer(
    prompt = prompt,
    screenshot_fn = safe_screenshot_fn,
    action_fn = safe_action_fn,
    ...
  )
}

Session Recovery

# Save session state for recovery
save_session_state <- function(result, path) {
  state <- list(
    iterations = result$iterations,
    actions = result$actions,
    last_response = result$response,
    timestamp = Sys.time()
  )
  saveRDS(state, path)
}

# Resume from saved state
resume_session <- function(state_path, new_prompt, screenshot_fn, action_fn, ...) {
  state <- readRDS(state_path)

  context_prompt <- sprintf(
    "Previous session completed %d iterations with %d actions.
         Last status: %s

         Continue with: %s",
    state$iterations,
    length(state$actions),
    state$last_response,
    new_prompt
  )

  claude_computer(
    prompt = context_prompt,
    screenshot_fn = screenshot_fn,
    action_fn = action_fn,
    ...
  )
}

Advanced Patterns

Multi-Monitor Support

# Handle multiple monitors
take_screenshot_monitor <- function(monitor = 1L) {
  function() {
    path <- tempfile(fileext = ".png")
    pyautogui <- reticulate::import("pyautogui")

    # Capture all monitors in one image
    monitors <- pyautogui$screenshot(allScreens = TRUE)
    # Crop to the requested `monitor` and save the result to `path`
    # (left as a stub here) ...

    path
  }
}

# Usage
result <- claude_computer(
  prompt = "...",
  screenshot_fn = take_screenshot_monitor(2L), # Second monitor
  action_fn = execute_action,
  display_size = c(2560L, 1440L) # Match monitor 2 resolution
)

Region-Specific Control

# Limit computer use to a specific screen region
create_region_screenshot_fn <- function(region) {
  # region = list(x = 100, y = 100, width = 800, height = 600)

  function() {
    path <- tempfile(fileext = ".png")
    pyautogui <- reticulate::import("pyautogui")

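    # reticulate is not attached here, so qualify tuple(); pass whole pixels
    screenshot <- pyautogui$screenshot(region = reticulate::tuple(
      as.integer(region$x), as.integer(region$y),
      as.integer(region$width), as.integer(region$height)
    ))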
    screenshot$save(path)

    path
  }
}

create_region_action_fn <- function(region, base_action_fn) {
  function(action) {
    # Translate coordinates to full screen
    if (!is.null(action$coordinate)) {
      action$coordinate[1] <- action$coordinate[1] + region$x
      action$coordinate[2] <- action$coordinate[2] + region$y
    }
    base_action_fn(action)
  }
}

# Usage: Control only a specific application window
app_region <- list(x = 100, y = 100, width = 800, height = 600)

result <- claude_computer(
  prompt = "...",
  screenshot_fn = create_region_screenshot_fn(app_region),
  action_fn = create_region_action_fn(app_region, execute_action),
  display_size = c(800L, 600L) # Match region size
)

Batch Processing

# Process multiple tasks sequentially
process_batch <- function(tasks, screenshot_fn, action_fn, ...) {
  results <- vector("list", length(tasks))

  for (i in seq_along(tasks)) {
    message(sprintf("=== Processing task %d of %d ===", i, length(tasks)))

    results[[i]] <- tryCatch(
      {
        claude_computer(
          prompt = tasks[[i]],
          screenshot_fn = screenshot_fn,
          action_fn = action_fn,
          ...
        )
      },
      error = function(e) {
        list(error = e$message, task = tasks[[i]])
      }
    )

    # Brief pause between tasks
    Sys.sleep(2)
  }

  results
}

# Usage
tasks <- list(
  "Open notepad and type 'Task 1 complete'",
  "Save the file as task1.txt",
  "Close notepad"
)

batch_results <- process_batch(
  tasks = tasks,
  screenshot_fn = take_screenshot,
  action_fn = execute_action,
  display_size = c(1920L, 1080L),
  max_iter = 15L
)
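After a batch run it is handy to see at a glance which tasks failed. The helper below is a hypothetical sketch in base R (`summarise_batch()` is not part of artclaude); it relies on the error branch of `process_batch()` returning a list with an `error` element and on successful results carrying `iterations`.

```r
# Hypothetical helper: tabulate which batch tasks failed. Relies on the
# error branch of process_batch() returning a list with an `error` element.
summarise_batch <- function(results) {
  failed <- vapply(results, function(r) !is.null(r$error), logical(1))
  data.frame(
    task = seq_along(results),
    status = ifelse(failed, "error", "ok"),
    iterations = vapply(
      results,
      function(r) {
        if (is.null(r$iterations)) NA_integer_ else as.integer(r$iterations)
      },
      integer(1)
    ),
    stringsAsFactors = FALSE
  )
}
```

Calling `summarise_batch(batch_results)` returns one row per task with its status and iteration count.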

Debugging Tips

Visual Debugging

# Save screenshots for debugging
debug_screenshot_fn <- function() {
  # Millisecond resolution keeps screenshots taken in the same second distinct
  timestamp <- format(Sys.time(), "%Y%m%d_%H%M%OS3")
  debug_dir <- "computer_use_debug"
  if (!dir.exists(debug_dir)) dir.create(debug_dir)

  path <- file.path(debug_dir, sprintf("screen_%s.png", timestamp))

  pyautogui <- reticulate::import("pyautogui")
  screenshot <- pyautogui$screenshot()
  screenshot$save(path)

  message(sprintf("Screenshot saved: %s", path))
  path
}

Action Replay

# Replay recorded actions for debugging
replay_actions <- function(actions, action_fn, delay = 1) {
  for (i in seq_along(actions)) {
    message(sprintf("Replaying action %d: %s", i, actions[[i]]$action))
    action_fn(actions[[i]])
    Sys.sleep(delay)
  }
}

# Usage: Replay a problematic session
replay_actions(result$actions, execute_action, delay = 2)

Verbose Logging

# Create verbose wrapper for debugging
verbose_action_fn <- function(base_fn) {
  function(action) {
    message(sprintf("\n[%s] Action: %s", Sys.time(), action$action))
    if (!is.null(action$coordinate)) {
      message(sprintf(
        "  Coordinate: (%d, %d)",
        action$coordinate[1], action$coordinate[2]
      ))
    }
    if (!is.null(action$text)) {
      message(sprintf("  Text: '%s'", action$text))
    }

    start <- Sys.time()
    result <- base_fn(action)
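    # Force seconds: a bare difftime picks its own units (secs, mins, ...)
    elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))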

    message(sprintf("  Completed in %.2f seconds", elapsed))
    result
  }
}

# Usage
result <- claude_computer(
  prompt = "...",
  screenshot_fn = take_screenshot,
  action_fn = verbose_action_fn(execute_action)
  # ... plus any other arguments (a bare `...` is not valid at top level)
)

Summary

Computer use enables powerful automation capabilities by allowing Claude to interact with graphical interfaces. Key takeaways:

  1. Agentic Loop: Computer use follows an observe-decide-act loop
  2. Implementation: Provide screenshot_fn and action_fn for your platform
  3. Safety First: Always use isolation, monitoring, and credential protection
  4. Error Handling: Implement robust error handling and recovery
  5. Integration: Combine with extended thinking and other tools for complex workflows

The combination of Claude’s visual understanding and reasoning capabilities with direct system control opens new possibilities for automation that goes beyond API integrations.

See Also

  • vignette("custom-tools") - Building custom tools for Claude
  • vignette("web-search") - Web search integration
  • vignette("code-execution") - Server-side code execution
  • Anthropic Computer Use Documentation