China News .online

AI Gains Website Understanding Through Browser Integration – An Emerging Open Source Trend

20 April 2025

Source: https://technews.tw/2025/04/06/ai-agent-browser-use/

Manus, a widely discussed AI agent platform, has recently drawn considerable interest from developers for its ability to operate websites and carry out tasks automatically. Its popularity also drew attention to the open-source tool behind it, Browser Use, which quickly became a notable technology in the developer community.

In simple terms, Browser Use converts a website's front-end structure into a text format that AI models can process, so that a language model can not only read the page's data but also “understand” the site and perform actions such as clicking, typing, and navigating, much as a person would.
Gregor Žunič, co-founder of Browser Use, said that a post on X introducing the tool drew more than 2.4 million views, pushing Browser Use's daily downloads from 5,000 to 28,000 and sending the project up GitHub’s trending list.

Surprisingly, the technology began as a weekend experiment by two master's students, who built the first prototype in just four days.
The idea of AI operating websites automatically is not new, and many teams have attempted it. So what did Browser Use do differently to overcome the usual limitations and win over both the open-source community and the market?

AI agents are becoming a focal point of applied artificial intelligence, with numerous startups entering the field to let AI complete web tasks autonomously.
However, most current solutions still rely on vision-based methods: taking screenshots of the website, locating interface elements by their coordinates, and simulating human input. Although these approaches are relatively easy to implement, they tend to be brittle: a slight change in a site's layout (for example, a button that moves or features that are rearranged) can break a previously configured automation flow, interrupting tasks and raising maintenance costs.
Moreover, websites commonly employ anti-bot mechanisms such as blocking suspicious IP addresses, requiring CAPTCHA input, or forcing re-login, which adds further uncertainty to AI-driven execution.

Browser Use takes a different approach. Rather than relying on image recognition, it lets the AI “understand” the website by translating interactive elements (such as buttons, input fields, and dropdown menus) into structured, semantic text, so that a large language model can reason about the page's logic much as it reads natural language and decide on its own what to do next.
Because it does not depend on pixel coordinates, this approach avoids common failure modes of visual recognition, such as coordinate mismatches and breakage after layout changes, which significantly improves accuracy and stability.
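To make the idea concrete, here is a minimal sketch of turning a page's interactive elements into indexed text that a language model could read. It is not Browser Use's actual implementation: the class `InteractiveElementExtractor`, the function `describe_page`, the choice of tags and attributes, and the output format are all assumptions made for illustration, using only Python's standard library.

```python
from html.parser import HTMLParser

# Assumption: these tags count as "interactive" for this sketch.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
VOID_TAGS = {"input"}  # tags with no closing tag and no inner text
KEPT_ATTRS = ("type", "name", "placeholder", "href")


class InteractiveElementExtractor(HTMLParser):
    """Collects interactive elements and ignores layout-only markup."""

    def __init__(self):
        super().__init__()
        self.elements = []  # one dict per interactive element, in page order
        self._open = None   # element whose inner text is being collected

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        kept = {k: v for k, v in attrs if k in KEPT_ATTRS and v}
        element = {"tag": tag, "attrs": kept, "text": ""}
        self.elements.append(element)
        self._open = None if tag in VOID_TAGS else element

    def handle_data(self, data):
        if self._open is not None:
            self._open["text"] += data

    def handle_endtag(self, tag):
        if self._open is not None and self._open["tag"] == tag:
            self._open = None


def describe_page(html: str) -> str:
    """Render each interactive element as one indexed line of text."""
    parser = InteractiveElementExtractor()
    parser.feed(html)
    parser.close()
    lines = []
    for index, el in enumerate(parser.elements):
        attrs = " ".join(f'{k}="{v}"' for k, v in sorted(el["attrs"].items()))
        head = f"<{el['tag']} {attrs}>" if attrs else f"<{el['tag']}>"
        text = " ".join(el["text"].split())  # collapse whitespace
        lines.append(f"[{index}] {head}{text}")
    return "\n".join(lines)


PAGE = """
<div class="hero"><h1>Reports</h1>
  <form>
    <input type="text" name="email" placeholder="Email">
    <button type="submit">Log in</button>
  </form>
  <a href="/reports/q1.pdf">Download Q1 report</a>
</div>
"""

print(describe_page(PAGE))
```

The model then refers to elements by index (for example, "click element 1") rather than by screen position, so moving the button elsewhere in the layout does not change how it is addressed.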

Users only need to issue a task instruction, such as logging in to a website, downloading a report, or filling out a specific form, and Browser Use helps the AI parse the site's structure and complete the steps in sequence. It supports multi-tab operation, simulates mouse and keyboard actions, and can access local files for more complex tasks that require continuity.
The prototype took only four days to build. So how did the idea come about, and why did it catch the open-source community's attention?
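The "task in, sequence of operations out" flow described above can be sketched as a simple observe-decide-act loop. Everything here is hypothetical and stands in for the real components: `FakePage` replaces an actual browser tab, and `scripted_policy` replaces the language model that would normally choose each action from the page's text description. It does not use or reflect Browser Use's own API.

```python
from dataclasses import dataclass, field


@dataclass
class FakePage:
    """In-memory stand-in for a browser tab (hypothetical, for illustration)."""
    elements: list = field(default_factory=lambda: [
        {"tag": "input", "name": "email", "value": ""},
        {"tag": "button", "text": "Log in"},
    ])
    log: list = field(default_factory=list)

    def snapshot(self) -> str:
        # Text description of the page that a model would receive.
        return "\n".join(f"[{i}] {el}" for i, el in enumerate(self.elements))

    def apply(self, action: dict) -> None:
        # Actions address elements by index, not by screen coordinates.
        element = self.elements[action["index"]]
        if action["type"] == "input":
            element["value"] = action["text"]
        elif action["type"] != "click":
            raise ValueError(f"unsupported action: {action['type']}")
        self.log.append((action["type"], action["index"]))


def scripted_policy(task: str, snapshot: str, step: int):
    """Stand-in for the language model: returns the next action, or None when done."""
    plan = [
        {"type": "input", "index": 0, "text": "user@example.com"},
        {"type": "click", "index": 1},
    ]
    return plan[step] if step < len(plan) else None


def run_task(task: str, page: FakePage, policy, max_steps: int = 10) -> bool:
    """Observe the page, ask the policy for an action, apply it, and repeat."""
    for step in range(max_steps):
        action = policy(task, page.snapshot(), step)
        if action is None:
            return True  # the policy reports that the task is complete
        page.apply(action)
    return False  # step budget exhausted before completion


page = FakePage()
finished = run_task("Log in to the site", page, scripted_policy)
print(finished, page.log)
```

The `max_steps` cap is a design choice for the sketch: it keeps a confused policy from looping forever, which matters more once a real model is making the decisions.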

The idea originated with two data science master's students at ETH Zurich, Magnus Müller and Gregor Žunič, who met in 2024 at the campus innovation accelerator Student Project House.
Müller specializes in developing web crawlers and automation tools, while Žunič focuses on applying data science to practical tasks; the two began collaborating as soon as they met.

Žunič recalled that the initial idea came out of a casual lunchtime discussion: “We wanted to make something small for Hacker News and see what happens.” Within four days, they completed a minimum viable product (MVP) and published it on GitHub and Hacker News at the same time.
The project quickly rose to the top of both platforms, sparking intense interest from developers. Browser Use has since accumulated over 50,000 stars on GitHub, with more than 15,000 contributors.

Initially, Browser Use only provided an open-source version for developer self-deployment and customization.
However, after OpenAI launched its browser agent service 'Operator,' demand within the Browser Use community surged—many users requested a cloud-based, ready-to-use solution. The team quickly responded by launching their official online version priced at $30 per month.

The online service integrates features such as IP switching, CAPTCHA handling, and persistent login sessions, and supports running multiple tasks at once.
Users no longer need to handle backend configuration; they can deploy AI agent workflows directly on the platform. This shift turned Browser Use from an open-source tool into a potentially commercial AI automation platform.

According to a forecast by market research firm Research and Markets, the AI agent market will reach $42 billion by 2029.
Consulting firm Deloitte predicts that more than half of companies will adopt AI agent technology by 2027, a sign that the technology is becoming a key component of corporate digital transformation.

Amid this trend, Browser Use has also attracted investors' attention. Its $17 million seed round was led by Felicis partner Astasia Myers, with participation from investors including Paul Graham, Nexus Venture Partners, and A Capital.
Myers noted that the team's “open-source-first” strategy and its positioning in AI agent applications were key factors in the investment decision; she also highlighted the team's ability to execute as a significant draw for investors.

The team is actively developing voice control, task reruns, and automatic scheduling, and plans to launch an API so that developers can more easily integrate AI agents into their own products.
Žunič stated: "Tell your computer what you want it to do, and it will help complete the tasks." This sentence encapsulates precisely what they aim to achieve with AI.

(Reprinted from Startup Gathering; Image source: Browser Use)
