library(artclaude)
result <- claude_computer(
prompt = "Open Notepad and type 'Hello, World!'",
screenshot_fn = take_screenshot,
action_fn = execute_action,
display_size = c(1920L, 1080L),
max_iter = 20L,
echo = "all"
)
Introduction
Computer use is an advanced experimental feature that enables Claude to see your screen and control your computer through mouse movements, clicks, and keyboard input. This capability transforms Claude from a text-based assistant into an agent that can interact with any graphical application—browsers, desktop software, custom applications, and more.
This vignette provides a comprehensive deep dive into computer use, covering the agentic loop architecture, implementation patterns, safety considerations, and real-world applications. By the end, you’ll understand how to build robust computer control workflows and integrate them with other artclaude features.
Important: Computer use is an experimental beta feature. It gives Claude direct control over your system, which carries inherent risks. Always follow the safety guidelines in this vignette.
How Computer Use Works
Computer use follows an agentic loop pattern where Claude iteratively observes the screen, decides on actions, and executes them until a task is complete.
The Observation-Action Loop
┌─────────────────────────────────────────────────────────────────┐
│ COMPUTER USE LOOP │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Screenshot │───▶│ Claude │───▶│ Execute │ │
│ │ Function │ │ Analyzes │ │ Action │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ ▲ │ │
│ │ │ │
│ └───────────────────────────────────────┘ │
│ New Screenshot │
│ │
│ Exit conditions: │
│ - Task complete │
│ - max_iter reached │
│ - Error encountered │
│ │
└─────────────────────────────────────────────────────────────────┘
The loop consists of these phases:
- Screenshot Capture: Your screenshot_fn captures the current screen state
- Visual Analysis: Claude receives the screenshot and analyzes the UI
- Decision Making: Claude determines what action to take
- Action Execution: Your action_fn executes the requested action
- Iteration: A new screenshot is taken and the loop continues
This continues until Claude determines the task is complete or the maximum iterations are reached.
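The sketch below illustrates this observe-decide-act cycle in plain R. It is not the package's internal code, and ask_claude() is a made-up placeholder for the model call; claude_computer() handles all of this for you.
# Illustrative only: claude_computer() implements this loop internally.
run_loop_sketch <- function(prompt, screenshot_fn, action_fn, max_iter = 20L) {
  for (i in seq_len(max_iter)) {
    shot <- screenshot_fn()               # observe: capture the screen
    decision <- ask_claude(prompt, shot)  # decide: hypothetical model call
    if (isTRUE(decision$done)) {
      return(decision$response)           # exit: Claude reports completion
    }
    action_fn(decision$action)            # act: execute the requested action
  }
  warning("max_iter reached before the task was completed")
  invisible(NULL)
}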
The claude_computer() Function
Parameters:
| Parameter | Type | Description |
|---|---|---|
| prompt | character | The task for Claude to accomplish |
| screenshot_fn | function | Takes no arguments, returns path to screenshot |
| action_fn | function | Receives action list, executes the action |
| display_size | integer[2] | Screen dimensions c(width, height) |
| ml | character | Model ID (default: Opus 4.5) |
| max_iter | integer | Maximum iterations before stopping |
| sys_prompt | character | Optional system prompt |
| echo | character | Output mode: "none", "output", "all" |
Returns:
A list containing:
- response: Claude's final response text
- actions: List of all actions executed
- iterations: Number of loop iterations completed
- chat: The chat object (for session continuation)
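For example, after the quick-start call at the top of this vignette, you can inspect these components directly:
# Inspect the result of a computer use session
result$response            # Claude's final summary of the task
result$iterations          # number of loop passes that were needed
length(result$actions)     # how many actions were executed
result$actions[[1]]$action # e.g. "screenshot" or "left_click"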
Available Actions
Claude can request these actions through the computer tool:
| Action | Description | Required Parameters |
|---|---|---|
| screenshot | Take a new screenshot | - |
| mouse_move | Move cursor to position | coordinate |
| left_click | Left mouse click | coordinate |
| right_click | Right mouse click | coordinate |
| double_click | Double left click | coordinate |
| type | Type text string | text |
| key | Press a key/combination | text |
| scroll_up | Scroll up | coordinate (optional) |
| scroll_down | Scroll down | coordinate (optional) |
Coordinate System
Coordinates are provided as c(x, y) where:
- (0, 0) is the top-left corner
- x increases to the right
- y increases downward
- Values should match your display_size
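For example, a left click near the centre of a 1920x1080 display is delivered to your action_fn as a plain R list, the same structure used by the action functions later in this vignette:
# The action list received by action_fn for a centred left click
action <- list(
  action = "left_click",
  coordinate = c(960L, 540L)  # x = 960, y = 540 on a 1920x1080 screen
)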
Key Names
For the key action, use standard key names:
| Category | Examples |
|---|---|
| Letters | a, b, c, … |
| Numbers | 1, 2, 3, … |
| Special | enter, tab, escape, backspace, delete |
| Arrows | up, down, left, right |
| Modifiers | ctrl, alt, shift, command |
| Function | f1, f2, …, f12 |
| Combinations | ctrl+c, ctrl+v, alt+tab |
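A key combination arrives as a single text string; your action function is responsible for splitting it, as the pyautogui example later in this vignette does:
# A key-combination action as delivered to action_fn
action <- list(action = "key", text = "ctrl+s")
strsplit(action$text, "+", fixed = TRUE)[[1]]  # "ctrl" "s"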
Implementation Guide
Screenshot Function
The screenshot function must capture the current screen state and return the file path.
Using magick Package
take_screenshot_magick <- function() {
path <- tempfile(fileext = ".png")
# Capture screen using system tools, then process with magick
# This is platform-specific
if (.Platform$OS.type == "windows") {
# Windows: Use PowerShell
ps_script <- sprintf('
Add-Type -AssemblyName System.Windows.Forms
Add-Type -AssemblyName System.Drawing
$screen = [System.Windows.Forms.Screen]::PrimaryScreen.Bounds
$bitmap = New-Object System.Drawing.Bitmap($screen.Width, $screen.Height)
$graphics = [System.Drawing.Graphics]::FromImage($bitmap)
$graphics.CopyFromScreen($screen.Location, [System.Drawing.Point]::Empty, $screen.Size)
$bitmap.Save("%s")
', path)
system2("powershell", c("-Command", ps_script), stdout = FALSE)
} else if (Sys.info()["sysname"] == "Darwin") {
# macOS: Use screencapture
system2("screencapture", c("-x", path))
} else {
# Linux: Use scrot or similar
system2("scrot", path)
}
path
}
Using reticulate with Python
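If you already use reticulate, pyautogui can capture the screen directly. This mirrors the case-study implementation later in this vignette and assumes pyautogui is installed in the active Python environment:
take_screenshot_reticulate <- function() {
  path <- tempfile(fileext = ".png")
  pyautogui <- reticulate::import("pyautogui")
  # screenshot() returns a PIL Image; save it to a temporary PNG
  img <- pyautogui$screenshot()
  img$save(path)
  path
}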
Using the screenshot Package
take_screenshot_pkg <- function() {
path <- tempfile(fileext = ".png")
if (requireNamespace("screenshot", quietly = TRUE)) {
screenshot::screenshot(file = path)
} else {
stop("screenshot package not installed")
}
path
}
Action Function
The action function receives a list with the action details and executes it.
Using reticulate with pyautogui
This is the most robust cross-platform approach:
execute_action_pyautogui <- function(action) {
pyautogui <- reticulate::import("pyautogui")
# Disable fail-safe for automated use (careful!)
pyautogui$FAILSAFE <- FALSE
switch(action$action,
"screenshot" = {
# Screenshot is handled by screenshot_fn
NULL
},
"mouse_move" = {
pyautogui$moveTo(
action$coordinate[1],
action$coordinate[2],
duration = 0.25
)
},
"left_click" = {
pyautogui$click(
action$coordinate[1],
action$coordinate[2]
)
},
"right_click" = {
pyautogui$rightClick(
action$coordinate[1],
action$coordinate[2]
)
},
"double_click" = {
pyautogui$doubleClick(
action$coordinate[1],
action$coordinate[2]
)
},
"type" = {
# Use write for regular text, handling special characters
pyautogui$write(action$text, interval = 0.02)
},
"key" = {
# Handle key combinations (e.g., "ctrl+c")
if (stringr::str_detect(action$text, "\\+")) {
keys <- stringr::str_split(action$text, "\\+")[[1]]
# hotkey() expects each key as a separate argument
do.call(pyautogui$hotkey, as.list(keys))
} else {
pyautogui$press(action$text)
}
},
"scroll_up" = {
if (!is.null(action$coordinate)) {
pyautogui$moveTo(action$coordinate[1], action$coordinate[2])
}
pyautogui$scroll(3)
},
"scroll_down" = {
if (!is.null(action$coordinate)) {
pyautogui$moveTo(action$coordinate[1], action$coordinate[2])
}
pyautogui$scroll(-3)
}
)
invisible(NULL)
}
Using Windows-Specific API
execute_action_windows <- function(action) {
# Requires user32.dll via Rcpp or similar
# This is a conceptual example
switch(action$action,
"left_click" = {
ps_script <- sprintf('
Add-Type -TypeDefinition @"
using System;
using System.Runtime.InteropServices;
public class Win32 {
[DllImport("user32.dll")]
public static extern bool SetCursorPos(int X, int Y);
[DllImport("user32.dll")]
public static extern void mouse_event(uint dwFlags, int dx, int dy, uint dwData, int dwExtraInfo);
}
"@
[Win32]::SetCursorPos(%d, %d)
[Win32]::mouse_event(0x02, 0, 0, 0, 0) # LEFTDOWN
[Win32]::mouse_event(0x04, 0, 0, 0, 0) # LEFTUP
', action$coordinate[1], action$coordinate[2])
system2("powershell", c("-Command", ps_script), stdout = FALSE)
}
# ... other actions
)
}
Case Study: Automated Data Entry
Let’s build a complete example that uses computer use to automate data entry into a web form.
Scenario
We need to enter artist information into a legacy web application that doesn’t have an API. Claude will navigate to the form, fill in the fields, and submit the data.
Implementation
library(artclaude)
library(reticulate)
# Initialize Python environment
use_virtualenv("computer-use-env")
# Screenshot function
take_screenshot <- function() {
path <- tempfile(fileext = ".png")
pyautogui <- import("pyautogui")
screenshot <- pyautogui$screenshot()
screenshot$save(path)
path
}
# Action function with logging
execute_action <- function(action) {
pyautogui <- import("pyautogui")
# Log the action
message(sprintf("[Action] %s", action$action))
if (!is.null(action$coordinate)) {
message(sprintf(
" Coordinates: (%d, %d)",
action$coordinate[1], action$coordinate[2]
))
}
if (!is.null(action$text)) {
message(sprintf(" Text: %s", action$text))
}
# Execute the action
switch(action$action,
"mouse_move" = pyautogui$moveTo(
action$coordinate[1],
action$coordinate[2],
duration = 0.3
),
"left_click" = pyautogui$click(
action$coordinate[1],
action$coordinate[2]
),
"double_click" = pyautogui$doubleClick(
action$coordinate[1],
action$coordinate[2]
),
"type" = pyautogui$write(action$text, interval = 0.03),
"key" = pyautogui$press(action$text),
"scroll_up" = pyautogui$scroll(3),
"scroll_down" = pyautogui$scroll(-3)
)
invisible(NULL)
}
# Artist data to enter
artist_data <- list(
name = "Claude Monet",
birth_year = "1840",
nationality = "French",
movement = "Impressionism",
famous_work = "Water Lilies"
)
# Build the prompt
prompt <- sprintf(
"I need you to fill out the artist registration form visible on screen.
Please enter the following information:
- Name: %s
- Birth Year: %s
- Nationality: %s
- Art Movement: %s
- Famous Work: %s
After filling all fields, click the Submit button.
Wait for confirmation before reporting completion.",
artist_data$name,
artist_data$birth_year,
artist_data$nationality,
artist_data$movement,
artist_data$famous_work
)
# Run computer use
result <- claude_computer(
prompt = prompt,
screenshot_fn = take_screenshot,
action_fn = execute_action,
display_size = c(1920L, 1080L),
max_iter = 30L,
sys_prompt = "You are a data entry assistant. Work carefully and verify each
field before moving to the next. If you encounter an error,
try to recover gracefully.",
echo = "all"
)
# Review results
message("=== Task Complete ===")
message(sprintf("Iterations: %d", result$iterations))
message(sprintf("Actions executed: %d", length(result$actions)))
message(sprintf("Final response: %s", result$response))Action History Analysis
# Analyze the action sequence
action_summary <- lapply(result$actions, function(a) {
list(
action = a$action,
coordinate = if (!is.null(a$coordinate)) {
sprintf("(%d, %d)", a$coordinate[1], a$coordinate[2])
} else {
NA
},
text = if (is.null(a$text)) NA_character_ else a$text
)
})
# Convert to data.table for analysis
action_dt <- data.table::rbindlist(action_summary)
print(action_dt)
# Count action types
action_dt[, .N, by = action]
Integration with Extended Thinking
For complex tasks, combine computer use with extended thinking to enable more sophisticated reasoning.
Complex Navigation Task
# Create a chat with both thinking and computer use
# Note: claude_computer handles the computer tool internally
# This example shows how to structure complex prompts
complex_prompt <- "
I need you to perform a multi-step workflow:
1. Open the web browser
2. Navigate to a search engine
3. Search for 'Impressionist painters list'
4. Find and click on a result from a museum website
5. Locate the first painter mentioned
6. Copy their name
Think through each step carefully before acting. If a page takes time to load,
wait and take a new screenshot. If you encounter popups or unexpected dialogs,
close them first.
"
# For complex reasoning, use Opus 4.5 (default)
result <- claude_computer(
prompt = complex_prompt,
screenshot_fn = take_screenshot,
action_fn = execute_action,
display_size = c(1920L, 1080L),
max_iter = 50L, # More iterations for complex tasks
sys_prompt = "You are an expert at navigating graphical user interfaces.
Take your time to analyze each screen carefully before acting.
Prefer precise clicks over approximate ones.",
echo = "all"
)
Combining with Tool Use
You can use the returned chat object to continue the session with additional tools:
# After computer use, continue with structured output
chat <- result$chat
# Define a schema for the extracted information
extraction_schema <- ellmer::type_object(
painter_name = ellmer::type_string("Name of the painter found"),
source_url = ellmer::type_string("URL where the information was found"),
confidence = ellmer::type_enum(
values = c("high", "medium", "low"),
description = "Confidence in the extraction accuracy"
)
)
# Get structured output from the session
extracted <- chat$chat_structured(
"Based on our session, provide the extracted painter information.",
type = extraction_schema
)
print(extracted)
Safety and Best Practices
Isolation Strategies
Computer use gives Claude direct control over your system. Implement these safeguards:
Virtual Machine Isolation
# Run computer use in a VM via SSH/VNC
# This provides complete isolation from your main system
# Example using a local VM with RDP
vm_screenshot_fn <- function() {
path <- tempfile(fileext = ".png")
# Capture VM window (Windows example)
system2("powershell", c(
"-Command",
sprintf('
$hwnd = (Get-Process -Name "vmconnect").MainWindowHandle
# ... capture specific window ...
')
))
path
}
Container-Based Isolation
# Use Docker with X11 forwarding for Linux
# Dockerfile: FROM ubuntu:22.04 with desktop environment
container_execute_action <- function(action) {
# Send the action to the containerized Python environment.
# NOTE: this sketch only forwards the action name; a full implementation
# must map action names to pyautogui calls (e.g. "left_click" -> click())
# and serialize coordinates/text into the command as well.
cmd <- sprintf(
'docker exec claude-computer python -c "import pyautogui; pyautogui.%s()"',
action$action
)
system(cmd)
}
Network Restrictions
Limit what the controlled environment can access:
# Firewall rules for the computer use environment
# Allow only specific domains/IPs
# Example: Create isolated network namespace (Linux)
setup_network_isolation <- function() {
system("ip netns add claude_sandbox")
system("ip netns exec claude_sandbox ip link set lo up")
# Configure limited network access
}
Credential Protection
Never expose credentials in computer use sessions:
# BAD: Credentials visible in prompt
prompt_bad <- "Log into the system with username admin and password secret123"
# GOOD: Pre-authenticated session
prompt_good <- "The browser is already logged in. Navigate to the dashboard."
# GOOD: Use environment variables accessed by action function
# The action function can handle authentication separately
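One pattern, sketched here with a made-up placeholder convention ({{PASSWORD}}) and environment variable (APP_PASSWORD), keeps the secret out of the prompt by substituting it inside the action function:
# Sketch: the prompt asks Claude to type the literal placeholder
# "{{PASSWORD}}"; the real secret is substituted here, so it never has to
# be written into the prompt. Make sure downstream logging does not
# record the substituted text.
execute_action_with_secrets <- function(action) {
  if (identical(action$action, "type") && !is.null(action$text)) {
    action$text <- gsub("{{PASSWORD}}", Sys.getenv("APP_PASSWORD"),
                        action$text, fixed = TRUE)
  }
  execute_action(action)  # delegate to your regular action function
}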
Monitoring and Intervention
Always monitor computer use sessions:
# Action function with safety checks
execute_action_safe <- function(action) {
pyautogui <- reticulate::import("pyautogui")
# Safety check: Don't allow clicks in certain regions
if (!is.null(action$coordinate)) {
# Example: Block taskbar clicks
if (action$coordinate[2] > 1040) { # Bottom 40 pixels
warning("Blocked click in taskbar region")
return(invisible(NULL))
}
}
# Safety check: Block dangerous key combinations
if (action$action == "key") {
dangerous <- c("alt+f4", "ctrl+alt+delete", "super")
if (tolower(action$text) %in% dangerous) {
warning(sprintf("Blocked dangerous key: %s", action$text))
return(invisible(NULL))
}
}
# Log all actions for audit
log_action(action)
# Execute with timeout
tryCatch(
{
# ... actual execution ...
},
error = function(e) {
warning(sprintf("Action failed: %s", e$message))
}
)
}
# Action logging
log_action <- function(action) {
log_entry <- list(
timestamp = Sys.time(),
action = action$action,
coordinate = action$coordinate,
text = action$text
)
# Append to audit log
log_file <- "computer_use_audit.json"
existing <- if (file.exists(log_file)) {
jsonlite::read_json(log_file)
} else {
list()
}
existing <- c(existing, list(log_entry))
jsonlite::write_json(existing, log_file, auto_unbox = TRUE)
}
Emergency Stop
Implement emergency stop mechanisms:
# Global flag for emergency stop
.computer_use_env <- new.env()
.computer_use_env$emergency_stop <- FALSE
# Check in action function
execute_action_with_stop <- function(action) {
if (.computer_use_env$emergency_stop) {
stop("Emergency stop activated", call. = FALSE)
}
# ... execute action ...
}
# Call this to stop (e.g., from Shiny UI or keyboard shortcut)
emergency_stop <- function() {
.computer_use_env$emergency_stop <- TRUE
message("EMERGENCY STOP ACTIVATED")
}
# Reset for next session
reset_emergency_stop <- function() {
.computer_use_env$emergency_stop <- FALSE
}
Error Handling and Recovery
Graceful Error Handling
claude_computer_robust <- function(prompt, screenshot_fn, action_fn, ...) {
# Wrapper with error recovery
safe_action_fn <- function(action) {
tryCatch(
{
action_fn(action)
},
error = function(e) {
warning(sprintf("Action '%s' failed: %s", action$action, e$message))
# Try recovery actions
if (action$action %in% c("left_click", "right_click")) {
# Maybe the window moved - try to refocus
Sys.sleep(0.5)
tryCatch(
{
action_fn(list(action = "key", text = "escape"))
Sys.sleep(0.2)
action_fn(action) # Retry
},
error = function(e2) {
warning("Recovery failed")
}
)
}
}
)
}
safe_screenshot_fn <- function() {
tryCatch(
{
screenshot_fn()
},
error = function(e) {
warning(sprintf("Screenshot failed: %s", e$message))
Sys.sleep(1) # Wait and retry
screenshot_fn()
}
)
}
claude_computer(
prompt = prompt,
screenshot_fn = safe_screenshot_fn,
action_fn = safe_action_fn,
...
)
}
Session Recovery
# Save session state for recovery
save_session_state <- function(result, path) {
state <- list(
iterations = result$iterations,
actions = result$actions,
last_response = result$response,
timestamp = Sys.time()
)
saveRDS(state, path)
}
# Resume from saved state
resume_session <- function(state_path, new_prompt, screenshot_fn, action_fn, ...) {
state <- readRDS(state_path)
context_prompt <- sprintf(
"Previous session completed %d iterations with %d actions.
Last status: %s
Continue with: %s",
state$iterations,
length(state$actions),
state$last_response,
new_prompt
)
claude_computer(
prompt = context_prompt,
screenshot_fn = screenshot_fn,
action_fn = action_fn,
...
)
}
Advanced Patterns
Multi-Monitor Support
# Handle multiple monitors
take_screenshot_monitor <- function(monitor = 1L) {
function() {
path <- tempfile(fileext = ".png")
pyautogui <- reticulate::import("pyautogui")
# pyautogui$screenshot() has no monitor-selection argument; capture a
# region covering the target monitor, or use a multi-monitor capture
# library (e.g. Python's mss) to grab a specific display.
# Save the chosen monitor's image to `path` here...
path
}
}
# Usage
result <- claude_computer(
prompt = "...",
screenshot_fn = take_screenshot_monitor(2L), # Second monitor
action_fn = execute_action,
display_size = c(2560L, 1440L) # Match monitor 2 resolution
)
Region-Specific Control
# Limit computer use to a specific screen region
create_region_screenshot_fn <- function(region) {
# region = list(x = 100, y = 100, width = 800, height = 600)
function() {
path <- tempfile(fileext = ".png")
pyautogui <- reticulate::import("pyautogui")
screenshot <- pyautogui$screenshot(region = reticulate::tuple(
region$x, region$y, region$width, region$height
))
screenshot$save(path)
path
}
}
create_region_action_fn <- function(region, base_action_fn) {
function(action) {
# Translate coordinates to full screen
if (!is.null(action$coordinate)) {
action$coordinate[1] <- action$coordinate[1] + region$x
action$coordinate[2] <- action$coordinate[2] + region$y
}
base_action_fn(action)
}
}
# Usage: Control only a specific application window
app_region <- list(x = 100, y = 100, width = 800, height = 600)
result <- claude_computer(
prompt = "...",
screenshot_fn = create_region_screenshot_fn(app_region),
action_fn = create_region_action_fn(app_region, execute_action),
display_size = c(800L, 600L) # Match region size
)
Batch Processing
# Process multiple tasks sequentially
process_batch <- function(tasks, screenshot_fn, action_fn, ...) {
results <- vector("list", length(tasks))
for (i in seq_along(tasks)) {
message(sprintf("=== Processing task %d of %d ===", i, length(tasks)))
results[[i]] <- tryCatch(
{
claude_computer(
prompt = tasks[[i]],
screenshot_fn = screenshot_fn,
action_fn = action_fn,
...
)
},
error = function(e) {
list(error = e$message, task = tasks[[i]])
}
)
# Brief pause between tasks
Sys.sleep(2)
}
results
}
# Usage
tasks <- list(
"Open notepad and type 'Task 1 complete'",
"Save the file as task1.txt",
"Close notepad"
)
batch_results <- process_batch(
tasks = tasks,
screenshot_fn = take_screenshot,
action_fn = execute_action,
display_size = c(1920L, 1080L),
max_iter = 15L
)
Debugging Tips
Visual Debugging
# Save screenshots for debugging
debug_screenshot_fn <- function() {
timestamp <- format(Sys.time(), "%Y%m%d_%H%M%S")
debug_dir <- "computer_use_debug"
if (!dir.exists(debug_dir)) dir.create(debug_dir)
path <- file.path(debug_dir, sprintf("screen_%s.png", timestamp))
pyautogui <- reticulate::import("pyautogui")
screenshot <- pyautogui$screenshot()
screenshot$save(path)
message(sprintf("Screenshot saved: %s", path))
path
}
Action Replay
# Replay recorded actions for debugging
replay_actions <- function(actions, action_fn, delay = 1) {
for (i in seq_along(actions)) {
message(sprintf("Replaying action %d: %s", i, actions[[i]]$action))
action_fn(actions[[i]])
Sys.sleep(delay)
}
}
# Usage: Replay a problematic session
replay_actions(result$actions, execute_action, delay = 2)
Verbose Logging
# Create verbose wrapper for debugging
verbose_action_fn <- function(base_fn) {
function(action) {
message(sprintf("\n[%s] Action: %s", Sys.time(), action$action))
if (!is.null(action$coordinate)) {
message(sprintf(
" Coordinate: (%d, %d)",
action$coordinate[1], action$coordinate[2]
))
}
if (!is.null(action$text)) {
message(sprintf(" Text: '%s'", action$text))
}
start <- Sys.time()
result <- base_fn(action)
elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
message(sprintf(" Completed in %.2f seconds", elapsed))
result
}
}
# Usage
result <- claude_computer(
prompt = "...",
screenshot_fn = take_screenshot,
action_fn = verbose_action_fn(execute_action),
...
)
Summary
Computer use enables powerful automation capabilities by allowing Claude to interact with graphical interfaces. Key takeaways:
- Agentic Loop: Computer use follows an observe-decide-act loop
- Implementation: Provide screenshot_fn and action_fn for your platform
- Safety First: Always use isolation, monitoring, and credential protection
- Error Handling: Implement robust error handling and recovery
- Integration: Combine with extended thinking and other tools for complex workflows
The combination of Claude’s visual understanding and reasoning capabilities with direct system control opens new possibilities for automation that goes beyond API integrations.
See Also
- vignette("custom-tools") - Building custom tools for Claude
- vignette("web-search") - Web search integration
- vignette("code-execution") - Server-side code execution
- Anthropic Computer Use Documentation