Claude Code Desktop Automation: 7 Essential Ways to Control Your Browser in 2026


🤖
AI That Controls Your Screen.
Browser and desktop as extensions of an AI.

For official documentation, see the Anthropic Claude Code docs. MCP specifications are available at modelcontextprotocol.io.

This is the final article in the series. Here, I cover Claude Code desktop automation — the ability for a terminal-based AI to directly operate applications and websites on your screen. It sounds like science fiction, but it works today.

This article explains the two MCP approaches for Claude Code desktop automation, with practical techniques and real-world examples.


Two Approaches to Claude Code Desktop Automation

Desktop automation uses two distinct MCP servers, each with different strengths. Choosing the right one is critical for reliable results.

MCP          | Target       | Method                   | Best For
k-chrome     | Browser      | DevTools Protocol        | All web operations, fast and precise
Windows MCP  | Full desktop | Screenshot + coordinates | Non-browser application control

📊 MCP Selection Flowchart

graph TD
  Start[What to operate?] ==>|Website| Q1{k-chrome available?}
  Start ==>|Desktop App| WM[Windows MCP]
  Q1 ==>|Yes| KC[k-chrome MCP]
  Q1 ==>|No| WM2[Windows MCP fallback]
  KC ==> Done[Done]
  WM ==> SS[Screenshot]
  WM2 ==> SS
  SS ==> Coord[Find Coords]
  Coord ==> Click[Click / Type]
  Click ==> Done
  style KC fill:#27ae60,stroke:#fff,color:#fff
  style WM fill:#e67e22,stroke:#fff,color:#fff

🌐 Web Control: k-chrome first. The DevTools Protocol is fast and precise.
🖱️ App Control: Windows MCP, using a screenshot → coordinates → click workflow.
🔒 Safety Guards: automatic pause at password screens for manual entry.

k-chrome MCP — Browser Automation First Choice

k-chrome is a custom MCP server built on Chrome DevTools Protocol. It is the primary tool for Claude Code desktop automation involving web browsers.

Basic Operation Flow

# 1. Navigate to page
navigate("https://example.com")

# 2. Fill in a form (password fields are left for manual entry; see the security rules below)
fill_form({"#email": "user@example.com"})

# 3. Click a button
click("#submit-btn")

# 4. Take a screenshot for verification
screenshot()

# 5. Extract page content
get_content(selector="#result")

Claude Code assembles these operations from natural-language instructions, so you do not need to write CSS selectors or API calls yourself. Say “add this product to the cart” and it works out the steps and runs them.
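To make the flow above more concrete, here is a minimal sketch of how such tool calls could be translated into Chrome DevTools Protocol messages. The CDP method names (Page.navigate, Page.captureScreenshot, Runtime.evaluate) are real, but the mapping and the helper names cdp_message and to_cdp are assumptions for illustration, since k-chrome's internals are not described in this article.

```python
import itertools
import json

# Hypothetical helpers: k-chrome's real implementation is not shown in this article.
_ids = itertools.count(1)


def cdp_message(method: str, params: dict) -> str:
    """Serialize one DevTools Protocol command as a JSON string."""
    return json.dumps({"id": next(_ids), "method": method, "params": params})


def to_cdp(tool: str, arg: str = "") -> str:
    """Translate a high-level tool call into a CDP command (assumed mapping)."""
    if tool == "navigate":
        return cdp_message("Page.navigate", {"url": arg})
    if tool == "screenshot":
        return cdp_message("Page.captureScreenshot", {"format": "png"})
    if tool == "get_content":
        # Read the text of the first element matching the CSS selector.
        expr = f"document.querySelector({json.dumps(arg)})?.innerText ?? null"
        return cdp_message(
            "Runtime.evaluate", {"expression": expr, "returnByValue": True}
        )
    raise ValueError(f"unknown tool: {tool}")


if __name__ == "__main__":
    print(to_cdp("navigate", "https://example.com"))
    print(to_cdp("get_content", "#result"))
```

Each message would be sent over the browser's debugging WebSocket. The sketch only builds the payloads, which keeps it runnable without a browser.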

Key Advantage: Existing Session Reuse

k-chrome uses your everyday Chrome profile directly, so existing login sessions (Google, Amazon, and other services) are preserved. As long as a session is still valid, you do not need to re-authenticate: say “buy on Amazon” and the operation starts with your account already logged in.

JavaScript Execution for Advanced Operations

The evaluate() function runs JavaScript directly within the page context. This enables REST API calls, direct DOM manipulation, and interaction with complex SPAs. For instance, this very article was published using evaluate() to call the WordPress REST API. Furthermore, JavaScript access allows you to interact with application state that is not exposed through the visible UI.
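As an illustration of the REST API case, the sketch below builds a JavaScript expression that could be passed to evaluate() on a logged-in WordPress admin page. The endpoint /wp-json/wp/v2/posts and the X-WP-Nonce header are part of the WordPress REST API. The helper name build_publish_script is hypothetical, and the presence of wpApiSettings.nonce on the page is an assumption. This is not necessarily how this article itself was published.

```python
import json


def build_publish_script(title: str, content: str, status: str = "draft") -> str:
    """Return a JavaScript expression that creates a WordPress post via fetch()."""
    payload = json.dumps({"title": title, "content": content, "status": status})
    # Encode a second time so the JSON payload becomes a valid JS string literal.
    body_literal = json.dumps(payload)
    return (
        "fetch('/wp-json/wp/v2/posts', {"
        "method: 'POST', "
        "credentials: 'same-origin', "
        # Assumption: wpApiSettings.nonce is defined on the current admin page.
        "headers: {'Content-Type': 'application/json', "
        "'X-WP-Nonce': wpApiSettings.nonce}, "
        f"body: {body_literal}"
        "}).then(r => r.json())"
    )


if __name__ == "__main__":
    print(build_publish_script("Hello", "<p>First post</p>"))
```

The default status is "draft" so that a test run does not publish anything by accident.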

Windows MCP — Desktop Application Control

For applications outside the browser, Windows MCP is essential. This npm package provides full control over the Windows desktop environment.

Operation Flow

  1. Launch application: App(mode: "launch") opens the target app
  2. Capture screenshot: PowerShell captures the current screen
  3. Identify coordinates: AI analyzes the screenshot to determine click positions
  4. Execute operations: Click, Type, and Shortcut commands perform the actions

This approach replicates how a human looks at the screen and clicks. It is somewhat crude, but it works with almost any application that can be captured in a screenshot, even one without an API, so Claude Code desktop automation can handle most desktop software.
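The capture, identify, and click steps can be sketched as a small loop. The three callables below (capture, locate, click) are hypothetical stand-ins for the screenshot step, the AI's image analysis, and the Click command. They are not Windows MCP's actual API.

```python
from typing import Callable, Optional, Tuple

Point = Tuple[int, int]


def click_target(
    description: str,
    capture: Callable[[], bytes],
    locate: Callable[[bytes, str], Optional[Point]],
    click: Callable[[int, int], None],
    max_attempts: int = 3,
) -> Point:
    """Capture the screen, ask the locator for coordinates, then click them."""
    for _ in range(max_attempts):
        image = capture()  # a fresh screenshot on every attempt
        point = locate(image, description)
        if point is not None:
            click(*point)
            return point
    raise LookupError(f"could not find {description!r} on screen")
```

Passing the steps in as functions keeps the loop testable with fakes, without a real desktop session.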

Real Example: LINE Message Automation

Here is the step-by-step flow for sending a LINE message via the desktop app:

  1. Bring LINE PC app to foreground
  2. Take a screenshot to assess the current state
  3. Click the search bar at the identified coordinates
  4. Type the recipient’s name to search
  5. Open the chat from search results
  6. Type the message in the input field
  7. Show message to user for confirmation (mandatory)
  8. Click send after approval
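Steps 7 and 8 amount to a confirmation gate: nothing is sent until the user approves the exact text. A minimal sketch, where confirm and send are hypothetical callables supplied by the caller:

```python
from typing import Callable


def send_with_confirmation(
    recipient: str,
    message: str,
    confirm: Callable[[str], bool],
    send: Callable[[], None],
) -> bool:
    """Show the drafted message and send only after explicit approval."""
    preview = f"To: {recipient}\n{message}"
    if not confirm(preview):
        return False  # nothing is sent without approval
    send()
    return True
```

In an interactive session, confirm could be as simple as a function that prints the preview and reads a yes/no answer with input().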

Priority Rules for Claude Code Desktop Automation

Web operation needed
  → Use k-chrome MCP (first choice)
    → Fallback: Windows MCP screenshot + coordinate click

Desktop app operation needed
  → Windows MCP only

Document these rules in CLAUDE.md, and Claude Code will automatically select the appropriate MCP based on the situation.
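Claude Code applies these rules from the natural-language text in CLAUDE.md, not from code. Purely to make the priority order explicit, the same logic can be written as a small function. The target labels "web" and "desktop" and the server identifiers are illustrative assumptions.

```python
from typing import List


def choose_mcp(target: str, k_chrome_available: bool = True) -> List[str]:
    """Return the MCP servers to try, in priority order, for a given target."""
    if target == "web":
        # k-chrome first; Windows MCP is the screenshot-based fallback.
        if k_chrome_available:
            return ["k-chrome", "windows-mcp"]
        return ["windows-mcp"]
    if target == "desktop":
        return ["windows-mcp"]
    raise ValueError(f"unknown target: {target!r}")
```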

Best Practices and Precautions

Always Re-capture Coordinates

Window positions can change between operations. Using stale coordinates causes click misalignment. Therefore, take a fresh screenshot immediately before every click operation.
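One way to see why stale coordinates fail: a click position is only valid for the window geometry at the moment of capture. The sketch below records that geometry alongside the coordinates and flags any mismatch. All names here are hypothetical, and simply re-capturing before every click remains the simpler rule.

```python
from dataclasses import dataclass
from typing import Tuple

Rect = Tuple[int, int, int, int]  # left, top, width, height


@dataclass(frozen=True)
class Observation:
    """Coordinates found in a screenshot, plus the window rect at capture time."""

    window_rect: Rect
    point: Tuple[int, int]


def is_stale(obs: Observation, current_rect: Rect) -> bool:
    """Coordinates are stale once the window has moved or been resized."""
    return obs.window_rect != current_rect
```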

Handle IME (Input Method) Carefully

When typing in languages such as Japanese, the IME (input method editor) state can corrupt text input. Windows MCP’s Type tool bypasses the IME entirely, which makes text entry more reliable than SendKeys.

Fullscreen Apps Block Screenshots

Fullscreen applications (like games) prevent Windows MCP from capturing screenshots. Close any fullscreen apps before running Claude Code desktop automation.
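A blocked capture typically shows up as an all-black image, so it can be detected before the AI tries to read it. The following sketch checks decoded pixel values. The threshold and ratio defaults are arbitrary assumptions, and how the screenshot is decoded into (R, G, B) tuples is left to whatever image library you use.

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]


def looks_black(
    pixels: Iterable[Pixel], threshold: int = 8, ratio: float = 0.99
) -> bool:
    """Return True when nearly every pixel is near-black."""
    total = 0
    dark = 0
    for r, g, b in pixels:
        total += 1
        if max(r, g, b) <= threshold:
            dark += 1
    # An empty image is treated as "not black" so it is reported separately.
    return total > 0 and dark / total >= ratio
```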

Security Is Non-Negotiable

When a password prompt appears, human manual entry is mandatory. AI must never input credentials. Additionally, financial accounts are restricted to read-only operations — no write actions allowed.
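These two rules can be expressed as a check that runs before any action is executed. The sketch below is an assumption about how such a guard might look, not the author's setup: the Action fields, the example domain, and the set of write actions are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical example list; a real setup would name its own financial sites.
FINANCIAL_DOMAINS = {"bank.example.com"}
# Assumed definition of "write" actions for the read-only rule.
WRITE_ACTIONS = {"type", "fill_form", "submit"}


@dataclass(frozen=True)
class Action:
    kind: str             # e.g. "type", "click", "get_content"
    domain: str           # site the action targets
    field_type: str = ""  # HTML input type of the target field, if any


class ManualStepRequired(Exception):
    """Raised when a human must take over, for example to enter a password."""


def check_action(action: Action) -> None:
    """Raise before any action that the security rules forbid."""
    if action.field_type == "password" and action.kind in {"type", "fill_form"}:
        raise ManualStepRequired("password fields must be filled in by a human")
    if action.domain in FINANCIAL_DOMAINS and action.kind in WRITE_ACTIONS:
        raise PermissionError(f"{action.domain} is read-only for automation")
```

Raising an exception instead of returning a flag means an unchecked call cannot silently continue past a forbidden step.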

⚠️ Desktop Automation Troubleshooting Guide

🔴 Clicks are misaligned
→ Take a fresh screenshot immediately before each click. Window positions change between operations.

🔴 Text input is garbled
→ Use Windows MCP’s Type tool instead of SendKeys. It bypasses IME for reliable input.

🔴 Screenshot is black
→ A fullscreen app (game) is blocking capture. Close it before running automation.

🔴 AI stopped at login screen
→ This is expected. The security rule in CLAUDE.md requires manual password entry.

Complete Series Recap

  1. Introduction — Claude Code overview and installation
  2. Project Structure — Directory design and CLAUDE.md best practices
  3. MCP Servers — Custom MCP servers for external tool integration
  4. Skills — One-command custom automations
  5. Memory — Persistent context across conversations
  6. Scheduled Execution — Cron-like automated recurring tasks
  7. App Development — AI pair programming case study
  8. Desktop Automation — Browser and desktop application control

Claude Code is not a tool for “asking AI questions”; it is a tool for working alongside AI. With the right setup, you can automate a large share of your daily workflow, and combining MCP servers, skills, and memory widens that scope further.

If this series sparked your interest, start with the Introduction and try Claude Code for yourself. Taking that first step is all it takes to transform how you work with AI.
