Palona goes vertical and launches Vision, Workflow: 4 key lessons for AI builders



Building an enterprise AI business on a "foundation of shifting sand" is the central challenge for founders today, according to executives at Palona.

Now the Palo Alto-based startup, led by former Google and Meta engineering veterans, is making a decisive vertical push into the restaurant and hospitality industry with today’s launch of Palona Vision and Palona Workflow.

The new offerings transform the company’s suite of multimodal agents into a real-time operating system for foodservice operations, spanning cameras, calls, conversations and coordinated task execution.

The news marks a strategic pivot for the company, which debuted in early 2025 with $10 million in seed funding to build emotionally intelligent sales agents for large direct-to-consumer brands.

Now, by focusing on a "multimodal native" approach for restaurants, Palona offers AI builders a blueprint for moving beyond "thin wrappers" to build deep systems that solve high-stakes, physical-world problems.

“You’re building a business on a foundation of sand: not quicksand, but shifting sand,” said co-founder and CTO Tim Howes, referring to the instability of today’s LLM ecosystem. “So we built an orchestration layer that lets us swap models based on performance, fluency and cost.”

VentureBeat recently spoke in person with Howes and co-founder and CEO Maria Zhang at (where else?) a restaurant in New York about the technical challenges and hard lessons of launching, growing, and pivoting.

The new offering: Vision and Workflow as a “digital GM”

For the end user (the restaurant owner or operator), the latest version of Palona is designed to function as an automated "best operations manager" that never sleeps.

Palona Vision uses existing in-store security cameras to analyze operational signals, such as queue length, table turnover, prep bottlenecks and cleanliness, without requiring new hardware.

It monitors front-of-house metrics like queue lengths, table turns and cleanliness, while simultaneously identifying back-of-house issues like prep slowdowns or station setup errors.
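
Palona has not published how Vision turns camera feeds into alerts. As a minimal sketch, assuming each camera frame has already been reduced to numeric metrics such as queue length or seconds per prep ticket, one simple approach is to alert only when a metric stays above a threshold for several consecutive readings, so a single noisy frame does not page a manager. The `SustainedThresholdAlert` class, the metric names, and the thresholds are all hypothetical.

```python
from collections import deque


class SustainedThresholdAlert:
    """Flags a metric only after it stays above `threshold` for `window` readings in a row.

    Hypothetical helper: metric names and thresholds are illustrative, not Palona's config.
    """

    def __init__(self, metric: str, threshold: float, window: int) -> None:
        self.metric = metric
        self.threshold = threshold
        # A bounded deque drops the oldest reading automatically.
        self.recent: deque = deque(maxlen=window)

    def observe(self, value: float) -> bool:
        self.recent.append(value)
        full = len(self.recent) == self.recent.maxlen
        # Alert only when the window is full and every reading breaches the threshold.
        return full and all(v > self.threshold for v in self.recent)


# Hypothetical front-of-house and back-of-house signals derived from camera frames.
monitors = {
    "queue_length": SustainedThresholdAlert("queue_length", threshold=8, window=3),
    "prep_seconds_per_ticket": SustainedThresholdAlert("prep_seconds_per_ticket", threshold=420, window=3),
}

readings = [
    {"queue_length": 9, "prep_seconds_per_ticket": 300},
    {"queue_length": 10, "prep_seconds_per_ticket": 450},
    {"queue_length": 11, "prep_seconds_per_ticket": 380},
]

for frame in readings:
    alerts = [name for name, monitor in monitors.items() if monitor.observe(frame[name])]
    if alerts:
        print("alert:", ", ".join(alerts))
```

Running the loop prints `alert: queue_length` on the third reading: the queue has been long for three straight samples, while prep time exceeded its threshold only once.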

Palona Workflow complements this by automating multi-step operational processes. This includes managing catering orders, opening and closing checklists, and meal preparation. By correlating Vision’s video signals with point-of-sale (POS) data and staffing levels, Workflow ensures consistent execution across multiple locations.
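
Correlating sources matters because each explains what the others cannot: a long line on camera means different things depending on how many orders the POS recorded and how many people were on shift. The sketch below is a hypothetical illustration of that cross-check, not Palona's logic; the `Interval` fields and the thresholds (an average queue of 6, and 12 orders per staff member per interval) are assumed values.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """One 15-minute window at one location; every field name here is hypothetical."""
    location: str
    avg_queue_length: float  # from camera analysis
    pos_orders: int          # from point-of-sale data
    staff_on_shift: int      # from the staffing schedule


def diagnose(iv: Interval, long_queue: float = 6.0, max_orders_per_staff: float = 12.0) -> str:
    """Explain a long line by cross-checking camera, POS, and staffing signals."""
    if iv.avg_queue_length < long_queue:
        return "ok"
    load = iv.pos_orders / max(iv.staff_on_shift, 1)  # guard against zero staff
    if load > max_orders_per_staff:
        return "understaffed"        # demand really does outpace the crew
    return "process_bottleneck"      # long line despite light load: look at the workflow


for iv in [
    Interval("downtown", avg_queue_length=3, pos_orders=40, staff_on_shift=4),
    Interval("uptown", avg_queue_length=9, pos_orders=60, staff_on_shift=4),
    Interval("airport", avg_queue_length=8, pos_orders=20, staff_on_shift=5),
]:
    print(iv.location, diagnose(iv))  # downtown ok / uptown understaffed / airport process_bottleneck
```

The same long queue leads to different actions: call in help at the busy location, or inspect the prep line at the quiet one.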

“Palona Vision is like giving every location a digital general manager,” said Shaz Khan, founder of Tono Pizzeria + Cheesesteaks, in a press release provided to VentureBeat. “It flags problems before they become serious and saves me hours each week.”

Going Vertical: Lessons in Domain Expertise

Palona’s journey began with a star-studded founding team. CEO Zhang was previously VP of Engineering at Google and CTO of Tinder, while co-founder Howes is the co-inventor of LDAP and a former CTO of Netscape.

Despite that pedigree, the team’s first year was a lesson in the need to focus.

Initially, Palona served fashion and electronics brands, creating "magician" and "surfer guy" personas to handle sales. However, the team quickly realized that the restaurant industry represented a unique, trillion-dollar opportunity that was "surprisingly recession-proof" but "plagued" by operational inefficiency.

"Advice to startup founders: don’t go multi-industry," Zhang warned.

By going vertical, Palona evolved from a "thin" chat layer into a "multi-sensory information pipeline" that processes vision, voice and text in tandem.

This clear focus gave Palona access to proprietary training data (such as prep manuals and call transcripts) while avoiding generic data scraping.

1. Build on “shifting sand”

To adapt to the reality of enterprise AI deployments in 2025, with new and improved models released almost every week, Palona has developed a patented orchestration layer.

Rather than being tied to a single vendor like OpenAI or Google, Palona’s architecture lets the team swap models on the fly based on performance and cost.

Palona uses a mix of proprietary and open-source models, including Gemini for computer vision testing and language-specific models for fluency in Spanish or Chinese.
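
Palona has not disclosed how its orchestration layer works. A minimal sketch of the idea, assuming every model sits behind a common adapter and carries a quality score from your own evaluations, is a registry that routes each request to the cheapest model that clears a quality bar for the task and language. The model names, scores, and prices are hypothetical, and the lambdas stand in for real vendor SDK calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ModelOption:
    name: str                   # hypothetical model identifier
    task: str                   # e.g. "vision" or "chat"
    languages: frozenset[str]   # languages the model handles well
    quality: float              # score from your own offline evals, 0..1
    cost_per_call: float        # illustrative dollars per call
    call: Callable[[str], str]  # adapter that hides the vendor SDK


class Orchestrator:
    """Routes each request to the cheapest registered model that clears a quality bar."""

    def __init__(self) -> None:
        self.registry: list[ModelOption] = []

    def register(self, option: ModelOption) -> None:
        # Swapping vendors means registering a new adapter; callers never change.
        self.registry.append(option)

    def pick(self, task: str, language: str, min_quality: float) -> ModelOption:
        candidates = [
            m for m in self.registry
            if m.task == task and language in m.languages and m.quality >= min_quality
        ]
        if not candidates:
            raise LookupError(f"no model for task={task!r}, language={language!r}")
        return min(candidates, key=lambda m: m.cost_per_call)

    def run(self, task: str, language: str, prompt: str, min_quality: float = 0.8) -> str:
        return self.pick(task, language, min_quality).call(prompt)


orch = Orchestrator()
orch.register(ModelOption("general-large", "chat", frozenset({"en", "es", "zh"}), 0.92, 0.010,
                          lambda p: f"[general-large] {p}"))
orch.register(ModelOption("spanish-tuned", "chat", frozenset({"es"}), 0.90, 0.002,
                          lambda p: f"[spanish-tuned] {p}"))

print(orch.run("chat", "es", "¿Tienen pizza sin gluten?"))       # cheaper Spanish model wins
print(orch.run("chat", "en", "Do you have gluten-free pizza?"))  # only general-large covers English
```

Because callers name only a task and a language, replacing a vendor means registering a new `ModelOption`; nothing upstream changes.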

For builders, the message is clear: never let your product’s core value depend on a single vendor.

2. From words to “world models”

The launch of Palona Vision represents a shift from understanding words to understanding the physical reality of a kitchen.

While many developers struggle to stitch together separate APIs, Palona’s new vision model turns existing in-store cameras into operational assistants.

The system identifies "cause and effect" in real time, recognizing whether a pizza is undercooked from its "pale beige" color or alerting a manager when a display case is empty.

"In words, physics doesn’t matter," Zhang explained. "But in reality, if I drop the phone, it always falls… we really want to understand what is happening in this world of restaurants."
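
Palona Vision relies on a learned vision model, and its internals are not public. Purely to illustrate the kind of visual cue Zhang describes, the toy heuristic below swaps in simple color analysis: it averages the crust pixels and classifies doneness by HLS lightness using Python's standard `colorsys` module. The lightness thresholds and sample colors are assumptions; a production system would need a trained model and calibration for each camera's lighting.

```python
import colorsys


def mean_rgb(pixels):
    """Average color of the crust pixels, each channel in 0-255."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))


def crust_doneness(pixels, pale_above=0.70, dark_below=0.30):
    """Toy heuristic: classify crust doneness by the HLS lightness of its mean color."""
    r, g, b = (c / 255 for c in mean_rgb(pixels))
    _, lightness, _ = colorsys.rgb_to_hls(r, g, b)
    if lightness > pale_above:
        return "undercooked"  # pale beige: not enough browning
    if lightness < dark_below:
        return "overcooked"   # very dark: likely burnt
    return "done"


pale = [(232, 214, 180)] * 4                # pale beige crust
golden = [(196, 130, 60), (200, 140, 70)]   # golden-brown crust
print(crust_doneness(pale), crust_doneness(golden))  # undercooked done
```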

3. The “Muffin” solution: custom memory architecture

One of the biggest technical hurdles Palona faced was memory management. In a restaurant context, memory is the difference between a frustrating interaction and a "magical" one in which the agent remembers a guest’s "usual" order.

The team initially used an unnamed open-source tool but found that it produced errors 30% of the time. "I think developers are always advised to turn off memory [on consumer AI products], because it’s guaranteed to mess everything up," Zhang warned.

To solve this problem, Palona built Muffin, a proprietary memory management system named as a nod to web "cookies." Unlike standard vector approaches, which struggle with structured data, Muffin is designed to handle four distinct layers:

  • Structured data: stable information such as delivery addresses or allergy information.

  • Slow-changing dimensions: loyalty preferences and favorite items.

  • Transitory and seasonal memories: adapting to changes, such as preferring cold drinks in July to hot chocolate in winter.

  • Regional context: default values such as time zones or language preferences.
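
Muffin’s internals are proprietary. The sketch below is one hypothetical way to keep the four layers separate, assuming a simple precedence rule: stable structured facts first, then unexpired seasonal memories, then the latest slow-changing preference, then regional defaults. `GuestMemory` and its methods are illustrative names, not Palona’s API.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class GuestMemory:
    """Hypothetical four-layer guest memory; names and precedence are illustrative."""
    # Layer 1: structured, stable facts such as addresses or allergies.
    structured: dict[str, str] = field(default_factory=dict)
    # Layer 2: slowly changing preferences, kept as dated history instead of overwritten.
    slow_changing: dict[str, list[tuple[date, str]]] = field(default_factory=dict)
    # Layer 3: transient or seasonal memories, each with an expiry date.
    transient: dict[str, tuple[str, date]] = field(default_factory=dict)
    # Layer 4: regional defaults such as time zone or language.
    regional: dict[str, str] = field(default_factory=dict)

    def set_preference(self, key: str, value: str, as_of: date) -> None:
        self.slow_changing.setdefault(key, []).append((as_of, value))

    def remember_until(self, key: str, value: str, expires: date) -> None:
        self.transient[key] = (value, expires)

    def lookup(self, key: str, today: date) -> Optional[str]:
        """Resolve a key across layers, most authoritative first."""
        if key in self.structured:
            return self.structured[key]
        if key in self.transient:
            value, expires = self.transient[key]
            if today <= expires:  # expired seasonal memories are ignored
                return value
        if key in self.slow_changing:
            # Most recent preference wins; older entries remain for auditing.
            return max(self.slow_changing[key], key=lambda entry: entry[0])[1]
        return self.regional.get(key)  # None if no layer knows the key


mem = GuestMemory(structured={"allergy": "peanuts"}, regional={"language": "en"})
mem.set_preference("usual_drink", "hot chocolate", as_of=date(2025, 1, 10))
mem.remember_until("usual_drink", "iced tea", expires=date(2025, 8, 31))
print(mem.lookup("usual_drink", date(2025, 7, 15)))  # iced tea (seasonal memory active)
print(mem.lookup("usual_drink", date(2025, 12, 1)))  # hot chocolate (seasonal memory expired)
```

Keeping slow-changing preferences as dated history, rather than overwriting them, borrows the data-warehousing idea of slowly changing dimensions and lets a seasonal override expire cleanly.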

The lesson for builders: If the best tool available isn’t tailored enough for your specific industry, you need to be willing to create your own.

4. Reliability through “GRACE”

In a kitchen, an AI error isn’t just a typo; it’s a wasted order or a safety risk. A recent incident at Stefanina’s Pizzeria in Missouri, where an AI hallucinated fake deals during a dinner rush, highlights how quickly brand trust can evaporate when safeguards are absent.

To prevent such chaos, Palona’s engineers follow its internal GRACE framework:

  • Guardrails: Strict limits on agent behavior to prevent unapproved promotions.

  • Red Teaming: proactive attempts to "break" the AI and identify potential hallucination triggers.

  • App Sec: Lock down third-party APIs and integrations with TLS, tokenization, and attack prevention.

  • Compliance: Base each response on verified and monitored menu data to ensure accuracy.

  • Escalation: Route complex interactions to a human manager before a guest receives incorrect information.
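
Three of the GRACE principles (guardrails, compliance and escalation) can be illustrated as a final check that runs before a reply reaches a guest. This sketch assumes the agent emits its draft in structured form, with quoted prices and promotions alongside the text; the menu, the promo code, and `check_reply` are hypothetical, and red teaming and app security are outside its scope.

```python
from dataclasses import dataclass

# Hypothetical source of truth: verified menu prices and approved promotions.
MENU_PRICES = {"margherita": 14.00, "pepperoni": 16.00, "garlic knots": 6.50}
APPROVED_PROMOS = {"TUESDAY2FOR1"}


@dataclass
class DraftReply:
    """What the agent wants to say, in structured form, before it reaches the guest."""
    text: str
    quoted_prices: dict[str, float]
    promos_mentioned: set[str]


def check_reply(draft: DraftReply) -> tuple[str, list[str]]:
    """Return ("send", []) if every claim matches verified data, else ("escalate", reasons)."""
    problems = []
    # Compliance: every quoted price must match the verified menu.
    for item, price in draft.quoted_prices.items():
        if item not in MENU_PRICES:
            problems.append(f"unknown item: {item}")
        elif abs(MENU_PRICES[item] - price) > 0.005:
            problems.append(f"price mismatch for {item}: quoted {price:.2f}, menu {MENU_PRICES[item]:.2f}")
    # Guardrails: no promotions beyond the approved list.
    for promo in sorted(draft.promos_mentioned - APPROVED_PROMOS):
        problems.append(f"unapproved promotion: {promo}")
    # Escalation: any failed check goes to a human manager instead of the guest.
    return ("escalate", problems) if problems else ("send", [])


ok = DraftReply("A margherita is $14.00.", {"margherita": 14.00}, set())
bad = DraftReply("Pepperoni is $9 with code HALFOFF!", {"pepperoni": 9.00}, {"HALFOFF"})
print(check_reply(ok))      # ('send', [])
print(check_reply(bad)[0])  # escalate
```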

Palona validates this reliability through large-scale simulation. "We’ve simulated a million ways to order pizza," said Zhang, describing how one AI plays the customer while another takes the order, with accuracy measured to weed out hallucinations.
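
Zhang describes using one AI as the customer and another as the order taker. The sketch below keeps that harness shape but swaps both models for deterministic stand-ins (a template-based customer and a naive keyword parser) so it runs offline; the menu, the phrasing templates, and all function names are hypothetical. The deliberately tricky "not a …" phrasing shows how the harness surfaces a systematic failure.

```python
import random

MENU = ["margherita", "pepperoni", "veggie"]
SIZES = ["small", "medium", "large"]
# Phrasings the simulated customer can use; the last one is deliberately tricky.
TEMPLATES = [
    "can I get a {size} {item}",
    "I'd like one {item}, {size} please",
    "{size} {item} for pickup",
    "not a {other}, a {size} {item}",
]


def simulated_customer(rng: random.Random) -> tuple[dict, str]:
    """Stand-in for the 'customer' model: picks a ground-truth order, then phrases it."""
    order = {"item": rng.choice(MENU), "size": rng.choice(SIZES)}
    other = rng.choice([m for m in MENU if m != order["item"]])
    utterance = rng.choice(TEMPLATES).format(other=other, **order)
    return order, utterance


def order_taker(utterance: str) -> dict:
    """Stand-in for the agent under test: a naive first-keyword-match parser."""
    words = utterance.replace(",", " ").split()
    item = next((w for w in words if w in MENU), None)
    size = next((w for w in words if w in SIZES), None)
    return {"item": item, "size": size}


def run_simulation(n: int, seed: int = 0) -> tuple[float, list]:
    """Play n simulated conversations and score the order taker against ground truth."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    failures = []
    for _ in range(n):
        truth, utterance = simulated_customer(rng)
        parsed = order_taker(utterance)
        if parsed != truth:
            failures.append((utterance, truth, parsed))
    return 1 - len(failures) / n, failures


accuracy, failures = run_simulation(10_000)
print(f"accuracy={accuracy:.3f} failures={len(failures)}")
print("example failure:", failures[0][0])
```

With this naive parser, every "not a …" request is misread because the first menu word wins: exactly the kind of systematic error a large simulated run exposes before a real guest hits it.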

The bottom line

With the launch of Vision and Workflow, Palona is betting that the future of enterprise AI lies not in general-purpose assistants but in specialized "operating systems" that can see, hear and think within a specific domain.

Unlike general-purpose AI agents, Palona’s system is designed to run restaurant workflows, not just respond to queries: it can remember customers, hear them order their "usual," and monitor restaurant operations to ensure the kitchen delivers that customer’s food in accordance with its internal processes and guidelines, flagging whenever something goes wrong or, crucially, is about to go wrong.

For Zhang, the goal is to let human operators focus on their craft: "If you make this delicious food… we’ll tell you what to do."


