← All case studies

Product/StartupAgentic AI

Letta buyer agent & custom eval platform

Shipped a Letta-based buyer agent with RAG over listing inventory, AWS deployment, and a custom eval platform for release gating.

CasaAmigo.ai · ~6 min read

At a glance

Eval suite size

[TBD: N cases]

Buyer-journey scenarios in the regression harness

Grounding fidelity

[TBD: %]

Responses with correct listing citations vs hallucinated inventory

Task success rate

[TBD: %]

End-to-end buyer journey completion in eval runs

P95 latency

[TBD: ms]

Conversational turn latency under production-like load

Problem

CasaAmigo needed a home-buying agent that could hold natural discovery conversations while staying grounded in live listing inventory. Early prototypes either felt generic or cited properties that were stale, wrong, or not in the catalog, which eroded trust in a high-stakes purchase journey.

The product team lacked systematic evals before promoting prompt, model, or retrieval changes to production. Without CI-style gates and post-deploy observability, each release carried avoidable risk to grounding quality and buyer task completion.

Approach

  1. Letta agent architecture used memory and tool patterns suited to multi-turn discovery, preference refinement, and handoff to listing detail.
  2. RAG over live inventory built retrieval, reranking, and citation-style grounding on AWS so answers tied to searchable listing records with acceptable conversational latency.
  3. A custom eval platform implemented task completion, grounding fidelity, and regression suites over representative buyer journeys with clear pass thresholds for promotion.
  4. Release gates and observability wired eval gates into the delivery pipeline and added monitoring hooks to catch grounding and latency regressions after deploy.

Results

[TBD: Insert validated metrics before publish.]

  • Buyer conversations consistently anchored to inventory-backed citations rather than free-form property invention.
  • Eval suites gave the founding team confidence to ship prompt and retrieval changes without manual-only QA marathons.
  • AWS deployment patterns supported scaling retrieval and agent serving as listing volume grew.