WeSearch

Webwright: A terminal is all you need for web agents

⚡ TL;DR · AI summary

Webwright is a terminal-native approach to web agents: the agent gets a terminal and a local workspace, and writes code that launches, inspects, and discards browser sessions, so it can run several sessions instead of holding a single one open. Each completed task yields a reusable program, while code, logs, screenshots, and outputs persist in the workspace. The system is reported to improve long-horizon browsing accuracy over traditional single-session agents, with 86.7% accuracy on Online-Mind2Web and a 60.8% Odysseys score cited in the source.

Original article: GitHub
Opening excerpt (first ~120 words)

Terminal-native web agents

A terminal is all you need for web agents. Webwright gives the model a terminal, a local workspace, and the freedom to write code that launches, inspects, and discards browser sessions. The output is not just a completed task, but a reusable program.

3 core modules · ~1K lines of harness code · 86.7% Online-Mind2Web accuracy · 60.8% Odysseys score

Paradigm shift

In Webwright, the agent can launch multiple browser sessions from the terminal. Traditional web agents keep one browser session alive and predict the next click, type, or scroll. Webwright separates the agent from that session: the browser can be launched, inspected, and discarded, while code, logs, screenshots, and outputs persist in the local workspace.
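To make the separation between a disposable session and a persistent workspace concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Webwright's code: `StubBrowser`, `browser_session`, `run_task`, and the `logs/` and `outputs/` layout are all hypothetical names, and the stub stands in for a real browser driver so the example runs without one.

```python
import json
import tempfile
from contextlib import contextmanager
from pathlib import Path


class StubBrowser:
    """Hypothetical stand-in for a real browser driver (not Webwright's API)."""

    def __init__(self, name: str):
        self.name = name
        self.closed = False

    def visit(self, url: str) -> dict:
        # A real driver would load the page; the stub only echoes the request.
        return {"session": self.name, "url": url, "title": f"stub page for {url}"}

    def close(self) -> None:
        self.closed = True


@contextmanager
def browser_session(workspace: Path, name: str):
    """Launch a disposable session; only the workspace log outlives it."""
    browser = StubBrowser(name)
    log = workspace / "logs" / f"{name}.jsonl"
    log.parent.mkdir(parents=True, exist_ok=True)
    try:
        yield browser, log
    finally:
        browser.close()  # the session is discarded, the log file stays


def run_task(workspace: Path, urls: list[str]) -> Path:
    """Visit each URL in its own session and persist results in the workspace."""
    results = []
    for i, url in enumerate(urls):
        with browser_session(workspace, f"session-{i}") as (browser, log):
            record = browser.visit(url)
            with log.open("a") as fh:
                fh.write(json.dumps(record) + "\n")
            results.append(record)
    out = workspace / "outputs" / "results.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(results, indent=2))
    return out


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        output = run_task(Path(tmp), ["https://example.com/a", "https://example.com/b"])
        print(output.read_text())
```

The script itself plays the role of the reusable program: rerunning `run_task` repeats the task with fresh sessions, and nothing in it depends on a browser that was left open. Swapping the stub for a real driver would only change `StubBrowser` and `browser_session`.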

Excerpt limited to ~120 words for fair-use compliance. The full article is on GitHub.
