For official documentation, see the Anthropic Claude Code docs. MCP specifications are available at modelcontextprotocol.io.
This is the final article in the series. Here, I cover Claude Code desktop automation — the ability for a terminal-based AI to directly operate applications and websites on your screen. It sounds like science fiction, but it works today.
This article explains the two MCP approaches for Claude Code desktop automation, with practical techniques and real-world examples.
Two Approaches to Claude Code Desktop Automation
Desktop automation uses two distinct MCP servers, each with different strengths. Choosing the right one is critical for reliable results.
| MCP | Target | Method | Best For |
|---|---|---|---|
| k-chrome | Browser | DevTools Protocol | All web operations, fast and precise |
| Windows MCP | Full desktop | Screenshot + coordinates | Non-browser application control |
📊 MCP Selection Flowchart
k-chrome MCP — Browser Automation First Choice
k-chrome is a custom MCP server built on Chrome DevTools Protocol. It is the primary tool for Claude Code desktop automation involving web browsers.
Basic Operation Flow
# 1. Navigate to page
navigate("https://example.com")
# 2. Fill in a form
fill_form({"#email": "user@example.com", "#password": "***"})
# 3. Click a button
click("#submit-btn")
# 4. Take a screenshot for verification
screenshot()
# 5. Extract page content
get_content(selector="#result")
Claude Code assembles these operations from natural-language instructions, so you do not need to write CSS selectors or API calls yourself. Say “add this product to the cart” and Claude Code works out the steps and runs them.
Key Advantage: Existing Session Reuse
k-chrome uses your everyday Chrome profile directly, so existing login sessions (Google, Amazon, and any other service you are signed in to) are preserved. As a result, you do not need to re-authenticate while those sessions remain valid. Say “buy on Amazon” and the operation starts with your account already logged in.
JavaScript Execution for Advanced Operations
The evaluate() function runs JavaScript directly within the page context. This enables REST API calls, direct DOM manipulation, and interaction with complex SPAs. For instance, this very article was published using evaluate() to call the WordPress REST API. Furthermore, JavaScript access allows you to interact with application state that is not exposed through the visible UI.
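To make the evaluate() pattern concrete, here is a minimal sketch that builds a JavaScript snippet which could be passed to evaluate() to create a draft post. The `/wp-json/wp/v2/posts` endpoint and the `X-WP-Nonce` header are standard WordPress. The helper name `build_wp_post_script` is hypothetical, and the snippet assumes the open page exposes `wpApiSettings.nonce`, which is only the case on pages that load WordPress's wp-api script. How k-chrome's evaluate() returns the result of an async expression is not covered in this article, so treat the return value as an assumption too.

```python
import json


def build_wp_post_script(title: str, content: str, status: str = "draft") -> str:
    """Return JavaScript that creates a post via the WordPress REST API.

    Hypothetical helper: the returned string is meant to be passed to
    evaluate() while a logged-in WordPress admin page is open.
    """
    # json.dumps (ASCII-escaped by default) yields a valid JavaScript object literal.
    payload = json.dumps({"title": title, "content": content, "status": status})
    return (
        "(async () => {\n"
        "  const res = await fetch('/wp-json/wp/v2/posts', {\n"
        "    method: 'POST',\n"
        "    credentials: 'same-origin',\n"
        "    headers: {\n"
        "      'Content-Type': 'application/json',\n"
        "      'X-WP-Nonce': wpApiSettings.nonce  // assumption: page loads wp-api\n"
        "    },\n"
        f"    body: JSON.stringify({payload})\n"
        "  });\n"
        "  return res.status;\n"
        "})()"
    )


script = build_wp_post_script("Hello", "Body text")
```

Because the request runs inside the page, it reuses the browser's existing session cookies, which is why no separate API credentials appear in the sketch.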
Windows MCP — Desktop Application Control
For applications outside the browser, Windows MCP is essential. This npm package provides full control over the Windows desktop environment.
Operation Flow
- Launch application: App(mode: "launch") opens the target app
- Capture screenshot: PowerShell captures the current screen
- Identify coordinates: AI analyzes the screenshot to determine click positions
- Execute operations: Click, Type, and Shortcut commands perform the actions
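The four steps above can be sketched as a single function. This is an illustration of the control flow only: `DesktopTools`, `launch_and_click`, and the callables they hold are hypothetical stand-ins, not the actual Windows MCP tool signatures.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Point = Tuple[int, int]


@dataclass
class DesktopTools:
    """Hypothetical stand-ins for the MCP tools; not the real Windows MCP API."""
    launch: Callable[[str], None]
    capture: Callable[[], bytes]                     # returns screenshot bytes
    locate: Callable[[bytes, str], Optional[Point]]  # model finds a described element
    click: Callable[[Point], None]


def launch_and_click(tools: DesktopTools, app: str, target: str) -> bool:
    """Launch an app, then capture, locate, and click a described element."""
    tools.launch(app)                    # 1. launch application
    shot = tools.capture()               # 2. capture screenshot
    point = tools.locate(shot, target)   # 3. identify coordinates
    if point is None:
        return False                     # element not found, so nothing is clicked
    tools.click(point)                   # 4. execute operation
    return True


# Usage with fakes that record what happened.
log = []
fake = DesktopTools(
    launch=lambda app: log.append(("launch", app)),
    capture=lambda: b"fake-image",
    locate=lambda shot, target: (120, 48) if target == "search bar" else None,
    click=lambda p: log.append(("click", p)),
)
ok = launch_and_click(fake, "notepad", "search bar")
```

Passing the tools in as callables keeps the flow testable without a real desktop session.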
This approach replicates how a human looks at the screen and clicks. It is somewhat crude, but it works with applications that expose no API, so Claude Code desktop automation can handle almost any software whose window can be captured in a screenshot.
Real Example: LINE Message Automation
Here is the step-by-step flow for sending a LINE message via the desktop app:
- Bring LINE PC app to foreground
- Take a screenshot to assess the current state
- Click the search bar at the identified coordinates
- Type the recipient’s name to search
- Open the chat from search results
- Type the message in the input field
- Show message to user for confirmation (mandatory)
- Click send after approval
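The mandatory confirmation step is the part of this flow most worth getting right. A minimal sketch of that gate, assuming hypothetical `confirm` and `send` callables supplied by the caller (neither is a LINE or Windows MCP API):

```python
from typing import Callable


def send_with_confirmation(
    recipient: str,
    message: str,
    confirm: Callable[[str, str], bool],
    send: Callable[[], None],
) -> bool:
    """Click send only after the user approves the exact recipient and draft."""
    if not confirm(recipient, message):
        return False   # user declined: the send button is never clicked
    send()
    return True


# Usage with fakes: one approved message, one declined.
sent = []
approved = send_with_confirmation(
    "Alice", "Running late", lambda r, m: True, lambda: sent.append("clicked")
)
declined = send_with_confirmation(
    "Alice", "Oops", lambda r, m: False, lambda: sent.append("clicked")
)
```

Showing both the recipient and the message in the confirmation guards against the two likeliest mistakes: the wrong chat and the wrong text.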
Priority Rules for Claude Code Desktop Automation
Web operation needed
→ Use k-chrome MCP (first choice)
→ Fallback: Windows MCP screenshot + coordinate click
Desktop app operation needed
→ Windows MCP only
Document these rules in CLAUDE.md, and Claude Code will automatically select the appropriate MCP based on the situation.
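Expressed as code, the priority rules amount to a small lookup. This is only a sketch of the decision logic; in practice the rules live as prose in CLAUDE.md, and the function name and the "web" / "desktop" labels are illustrative.

```python
from typing import List


def select_mcp(target: str) -> List[str]:
    """Return the MCP servers to try, in priority order, for a target kind."""
    if target == "web":
        return ["k-chrome", "Windows MCP"]   # Windows MCP is the fallback
    if target == "desktop":
        return ["Windows MCP"]
    raise ValueError(f"unknown target kind: {target!r}")


web_order = select_mcp("web")
desktop_order = select_mcp("desktop")
```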
Best Practices and Precautions
Always Re-capture Coordinates
Window positions can change between operations. Using stale coordinates causes click misalignment. Therefore, take a fresh screenshot immediately before every click operation.
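A short worked example shows how far a stale coordinate can miss. The window positions and button offset below are made-up numbers chosen for illustration.

```python
from typing import Tuple

Point = Tuple[int, int]


def to_screen(window_origin: Point, offset_in_window: Point) -> Point:
    """Convert a position inside a window to absolute screen coordinates."""
    return (window_origin[0] + offset_in_window[0],
            window_origin[1] + offset_in_window[1])


button_offset = (200, 150)                          # button position inside the window
stale_click = to_screen((100, 100), button_offset)  # from the old screenshot
fresh_click = to_screen((400, 250), button_offset)  # the window has since moved
miss = (fresh_click[0] - stale_click[0], fresh_click[1] - stale_click[1])
```

Here the stale click lands at (300, 250) while the button is now at (600, 400), a miss of 300 pixels horizontally and 150 vertically.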
Handle IME (Input Method) Carefully
When typing in languages like Japanese, IME state can cause text input errors. Windows MCP’s Type tool bypasses IME entirely, which makes text entry more reliable than SendKeys.
Fullscreen Apps Block Screenshots
Fullscreen applications (like games) prevent Windows MCP from capturing screenshots. Close any fullscreen apps before running Claude Code desktop automation.
Security Is Non-Negotiable
When a password prompt appears, human manual entry is mandatory. AI must never input credentials. Additionally, financial accounts are restricted to read-only operations — no write actions allowed.
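These two rules can be written down as a simple policy check. The sketch below is one hypothetical way to model them; the `Action` fields and category labels are assumptions for illustration, and in this series the rules are enforced through instructions in CLAUDE.md rather than through code like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str             # e.g. "read", "click", "type"
    target_type: str      # e.g. "password", "text", "button"
    domain_category: str  # e.g. "financial", "general"


def is_allowed(action: Action) -> bool:
    """Apply the two security rules: no credential entry, read-only finance."""
    if action.kind == "type" and action.target_type == "password":
        return False      # passwords are always entered by the human
    if action.domain_category == "financial" and action.kind != "read":
        return False      # financial accounts are read-only
    return True


type_password = is_allowed(Action("type", "password", "general"))
read_balance = is_allowed(Action("read", "text", "financial"))
click_transfer = is_allowed(Action("click", "button", "financial"))
type_search = is_allowed(Action("type", "text", "general"))
```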
⚠️ Desktop Automation Troubleshooting Guide
🔴 Clicks are misaligned
→ Take a fresh screenshot immediately before each click. Window positions change between operations.
🔴 Text input is garbled
→ Use Windows MCP’s Type tool instead of SendKeys. It bypasses IME for reliable input.
🔴 Screenshot is black
→ A fullscreen app (game) is blocking capture. Close it before running automation.
🔴 AI stopped at login screen
→ This is expected. The security rule in CLAUDE.md requires manual password entry.
Complete Series Recap
- Introduction — Claude Code overview and installation
- Project Structure — Directory design and CLAUDE.md best practices
- MCP Servers — Custom MCP servers for external tool integration
- Skills — One-command custom automations
- Memory — Persistent context across conversations
- Scheduled Execution — Cron-like automated recurring tasks
- App Development — AI pair programming case study
- Desktop Automation — Browser and desktop application control
Claude Code is not a tool for “asking AI questions”; it is a tool for working alongside AI. With the right setup, you can automate much of your daily workflow, and combining MCP servers, skills, and memory widens that scope further.
If this series sparked your interest, start with the Introduction and try Claude Code for yourself. Taking that first step is all it takes to transform how you work with AI.
