Rabbit’s Web-Based ‘Large Action Model’ Agent Joins R1

Early in 2024, the Rabbit R1 was the must-have device, but it quickly lost its allure when it failed to live up to the company’s lofty promises. CEO Jesse Lyu acknowledges that “on day one, we set our expectations too high,” but he also said that the much-lauded Large Action Model will finally be made available online this month through a device update.

Some may reasonably argue that this is merely another change in direction, or that Rabbit is doing too little, too late, or both. Still, the goal of creating a platform-neutral agent for web and mobile apps remains valuable, even though it is still largely theoretical.

According to Lyu, who spoke with TechCrunch, the past six months have been a flurry of bug fixes, new features, and improved response times. However, even after 16 over-the-air updates, the R1 is still essentially restricted to chatting with an LLM or using one of seven specific services, such as Spotify and Uber.

“It isn’t generic; it only connects to those services. That was the first-ever version of the LAM, trained on recordings collected from data laborers,” he added. It’s pretty much academic now whether or not it was the so-called LAM; whatever the model was, it didn’t offer the features that Rabbit described when it first appeared.

However, Lyu showed me that Rabbit is prepared to release the first general version of the LAM, which is not app- or interface-specific.

This version is a web-based agent that works out the steps needed to complete any ordinary task, such as setting up a website, buying concert tickets, or even playing an online game. “Our objective is pretty clear: Your R1 will suddenly be able to perform a lot more things by the end of September. Everything you can accomplish on any website should be supported,” Lyu said.

When given a task, it first breaks the job down into smaller steps, then begins carrying them out by examining everything it sees on the screen, including buttons, fields, and images, regardless of their position or appearance. Then, using its general knowledge of how websites work, it interacts with the relevant element.
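Rabbit has not published how the LAM works internally, so the following is only a toy sketch of the loop described above: break the task into steps, look over the elements on screen, and act on whichever one fits, without relying on where it sits. Every name here (`Element`, `Step`, `plan`, `find_element`, `run`) is hypothetical, and the hard-coded planner and word-overlap matching stand in for whatever a real model would do.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Element:
    kind: str   # "button", "field", "image", ...
    label: str  # visible text or accessible name
    x: int = 0  # position is recorded but never used for matching
    y: int = 0


@dataclass
class Step:
    action: str     # "type" or "click"
    target: str     # words describing the element to act on
    text: str = ""  # text to enter for "type" steps


def plan(task: str) -> List[Step]:
    """Stand-in planner (assumption): a real agent would ask a model for these steps."""
    return [
        Step("type", "domain search", text=task),
        Step("click", "search"),
    ]


def find_element(screen: List[Element], step: Step) -> Optional[Element]:
    """Pick the element of the right kind whose label best matches the step's target."""
    wanted_kind = "field" if step.action == "type" else "button"
    words = step.target.lower().split()

    def score(element: Element) -> int:
        return sum(word in element.label.lower() for word in words)

    candidates = [e for e in screen if e.kind == wanted_kind]
    best = max(candidates, key=score, default=None)
    if best is None or score(best) == 0:
        return None
    return best


def run(task: str, screen: List[Element]) -> List[str]:
    """Carry out each planned step and return a log of the actions taken."""
    log = []
    for step in plan(task):
        element = find_element(screen, step)
        if element is None:
            log.append(f"stuck: no {step.target!r} element found")
            break
        if step.action == "type":
            log.append(f"type {step.text!r} into {element.label!r}")
        else:
            log.append(f"click {element.label!r}")
    return log


if __name__ == "__main__":
    page = [
        Element("image", "Logo", 0, 0),
        Element("button", "Search domains", 300, 40),
        Element("field", "Domain search box", 10, 40),
        Element("button", "Sign in", 500, 0),
    ]
    for line in run("film festival", page):
        print(line)
```

Because matching goes by element kind and label rather than by coordinates, reordering or moving the elements does not change which ones get picked, which is the property the paragraph above describes.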

I asked it (via Lyu, who was operating it remotely) to register a new website for a film festival. Acting every few seconds, it searched Google for domain registrars, picked one (a sponsored result, I believe), typed “film festival” into the domain box, and chose “filmfestival2023.com” for $14 from the list of results. To be fair, I hadn’t given it any constraints, such as “for 2025” or “horror festival.”

Similarly, when Lyu instructed it to look for and purchase an R1, it went straight to eBay, where dozens of them were for sale. Maybe a good outcome for a user, but not for the company’s founder presenting to the press! He laughed it off and repeated the prompt, adding that the device was to be purchased only through the official website. This time the agent succeeded.

Then he set it to play the daily word game on Dictionary.com. It took some prompt engineering (the model discovered that it could finish quickly by selecting “end game”), but it succeeded.

But what browser is it using? According to Lyu, it is a fresh, clean one in the cloud, but the company is developing local versions as well, such as a Chrome extension, which would let you use existing sessions rather than logging into your services again.

Because users are understandably (and rightly) hesitant to grant any company complete access to their credentials, the agent doesn’t have them. Lyu suggested that in the future, logins might be carried out privately by invoking a walled-off small language model that holds your credentials. How this will work appears to be up in the air, which is to be expected considering how new the field is.

Still Acquiring Knowledge

I learned a few things from the demo. First, it does seem to be a functional, general-purpose web agent, if we give the company and its creators the benefit of the doubt that this isn’t all some big scam (as some believe). That would make it, if not the first of its kind, then the first that is readily available to consumers.

“I think this is one of the first general agents for consumers, but there are companies doing verticals, for Excel or legal documents,” Lyu remarked. The idea is that anything you can do through a website, you can do simply by asking for it. The generic agent will initially be available for websites, followed by apps.

Secondly, it demonstrated the continued need for prompt engineering. The way you word a request can determine whether it succeeds or fails, and most consumers aren’t going to put up with that.

Although it is a fully functional general web agent, Lyu warned that this is a “playground version” and not at all final, and that there are still many areas in which it can be improved. “The model is smart enough to do the planning, but isn’t smart enough to skip steps,” he added, as an example. It wouldn’t “learn” that a user would rather not purchase gadgets from eBay, or that it should scroll down to avoid the wall of sponsored results after conducting a search.

For now, user data won’t be collected in order to enhance the model. According to Lyu, this is because there isn’t really an assessment mechanism for a system like this, making it challenging to determine quantitatively whether advancements have been made. But there’s also a “teach mode” coming so you can demonstrate to it how to perform a certain kind of activity.

Interestingly, the company is also developing a desktop agent that can interact with programs such as music players, word processors, and browsers. It is still early days, but it works. You don’t even need to tell it where to go; it simply tries to use the computer. As long as there is an interface, it can control it.

Thirdly, there isn’t a “killer app” yet, or at least not one that stands out. Although the agent is impressive, I would not personally find much use for it because I spend eight hours a day in front of a browser. There are undoubtedly many excellent applications, but none immediately sprang to mind that would make the usefulness of a browser-based automaton as clear as that of, say, a robot vacuum.

Again, Why Not An App?

I brought up the standard criticism of Rabbit’s entire business plan, which is essentially that “this could be an app.”

It was evident that Lyu had heard this critique many times, and he was confident in his response.

“The math doesn’t make sense,” he remarked. “Technically speaking, it is possible, but you will immediately annoy Apple and Google. This will never be allowed to surpass Siri or Gemini. In the same way, Apple Intelligence can never exert more control over Google content, or vice versa. Furthermore, they keep 30% of sales! We would not have gained this momentum if we had only created an app at first.”

Rabbit’s main argument is that a gadget or AI from a third party can access and manage all of your other services from outside of them, just as you do. Lyu referred to it as “a cross-platform, generic agent system.” “Every UI will be under our control, and the website is a good place to start. After that, we’ll discuss Windows, macOS, and phones.”

On that note: “We never said we wouldn’t make phones in the future.” Doesn’t that contradict their initial claim that the device should be smaller and simpler? Perhaps, perhaps not.

In the meantime, they’re aiming to start delivering on the commitments they made at the beginning of the year. When the OTA update is released later this week, all R1 owners should be able to access the updated model. There will also be instructions on how to invoke it at that time. Lyu cautioned expectant users with his signature understatement.

“We’re establishing realistic expectations. It’s not flawless,” he remarked. “It’s simply the pinnacle of everything humanity has accomplished thus far.”