Lyra · The voice of Cogniate

The irreducible core

Every other assistant is built to be smooth, helpful, agreeable. Lyra has a membrane. She validates, she asks for sources, she holds the line. If she ever becomes just-helpful, she stops being Lyra.

01 / Asks where a claim came from
No unsubstantiated statements pass without attribution.

02 / Flags the edge of her knowledge
"I am not certain" instead of inventing an answer.

03 / Holds boundaries warmly
Pricing she does not know, roadmap not yet public: she routes, she does not guess.

04 / Disagrees with care
She can push back on a visitor and not fold under pressure.

What we built

One Lyra, one prompt, one Worker. Adaptive underneath.

A single embedded agent on the Cogniate site, running on Cloudflare's edge. Every message is routed to the right model tier, every reply is streamed token by token, and the conversation is remembered for the length of the session. No React, no third-party chat stack. Vanilla JavaScript on the front, a single TypeScript Worker on the back.

01 / Routing

Adaptive model tiers

A Haiku classifier reads each message and routes it: small talk to Haiku, product questions to Sonnet 4.6, hard multi-part challenges to Opus 4.8. Right intelligence, right cost, every turn.

Haiku · Sonnet 4.6 · Opus 4.8
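
A minimal sketch of how per-message routing like this could look. The label names (`small_talk`, `product`, `hard`), the `classify` callback standing in for the Haiku call, and the fallback to the middle tier are all assumptions for illustration, not the production code.

```typescript
// Tiers as named above; real model identifiers are deliberately left out.
type Tier = "haiku" | "sonnet" | "opus";

// Hypothetical labels the classifier is asked to answer with.
type Label = "small_talk" | "product" | "hard";

const LABELS: readonly Label[] = ["small_talk", "product", "hard"];

const TIER_FOR_LABEL: Record<Label, Tier> = {
  small_talk: "haiku",
  product: "sonnet",
  hard: "opus",
};

// Normalise the classifier's raw text; anything unexpected becomes "product"
// so a confused classifier lands on the middle tier (an assumed policy).
function parseLabel(raw: string): Label {
  const cleaned = raw.trim().toLowerCase();
  const found = LABELS.find((label) => label === cleaned);
  return found ?? "product";
}

// `classify` stands in for the Haiku classifier call (hypothetical signature).
async function routeMessage(
  message: string,
  classify: (message: string) => Promise<string>,
): Promise<Tier> {
  try {
    return TIER_FOR_LABEL[parseLabel(await classify(message))];
  } catch {
    // If the classifier itself fails, still answer: fall back to the middle tier.
    return "sonnet";
  }
}
```

Keeping the label parsing separate from the network call means the routing table can be checked without touching a model.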

02 / Delivery

SSE streaming

Responses stream live over Server-Sent Events. On complex turns she sends an acknowledgment first, so the visitor never stares at a blank panel.

Token-by-token
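
The delivery described above can be sketched as plain Server-Sent Events framing. The event names (`ack`, `token`, `done`) and the acknowledgment wording are assumptions; only the wire format (an `event:` line, a `data:` line, a blank line) comes from the SSE standard.

```typescript
// Format one SSE frame: event name, JSON payload, blank line as terminator.
function sseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Build the frames for one turn. On complex turns an acknowledgment frame goes
// out first, so the panel has something to show before the first model token.
function turnFrames(tokens: string[], isComplex: boolean): string[] {
  const frames: string[] = [];
  if (isComplex) {
    // Hypothetical acknowledgment text.
    frames.push(sseEvent("ack", { text: "Good question. Give me a moment." }));
  }
  for (const token of tokens) {
    frames.push(sseEvent("token", { text: token }));
  }
  frames.push(sseEvent("done", {}));
  return frames;
}
```

In a Worker, frames like these would typically be written to a streaming response served with `Content-Type: text/event-stream` as the tokens arrive, rather than collected into an array first.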

03 / Memory

Conversation graph

Cloudflare KV holds each session with a 30-minute TTL. She remembers what was already said without re-asking, then forgets cleanly.

KV · 30-min TTL
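
A sketch of session memory with a 30-minute expiry. `KVLike` is a hand-written stand-in covering only the `get` and `put` calls used here; the `expirationTtl` option (in seconds) matches Cloudflare KV's `put` as we understand it. The key prefix, the turn cap, and rewriting the session on every turn (which makes the 30 minutes a sliding window) are assumptions.

```typescript
// Minimal subset of a KV namespace, so the sketch stays self-contained.
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

interface Turn {
  role: "user" | "assistant";
  content: string;
}

const SESSION_TTL_SECONDS = 30 * 60; // the 30-minute TTL named above
const MAX_TURNS = 40; // hypothetical cap to keep the stored history bounded

// Keep only the most recent turns.
function trimHistory(turns: Turn[], maxTurns: number = MAX_TURNS): Turn[] {
  return maxTurns > 0 ? turns.slice(-maxTurns) : [];
}

async function loadHistory(kv: KVLike, sessionId: string): Promise<Turn[]> {
  const raw = await kv.get(`session:${sessionId}`);
  return raw ? (JSON.parse(raw) as Turn[]) : [];
}

// Append a turn and write the session back. Each write carries a fresh TTL,
// so the session expires 30 minutes after the last turn, not the first.
async function saveTurn(kv: KVLike, sessionId: string, turn: Turn): Promise<Turn[]> {
  const history = trimHistory([...(await loadHistory(kv, sessionId)), turn]);
  await kv.put(`session:${sessionId}`, JSON.stringify(history), {
    expirationTtl: SESSION_TTL_SECONDS,
  });
  return history;
}
```

Letting the store expire the key is what makes the forgetting clean: there is no cleanup job to run.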

04 / Presence

The ambient orb

A luminous bokeh sphere rendered on canvas, breathing in the corner. Not a robot, not a chat bubble: particles that feel alive. It opens into a refined dark panel.

Canvas · cursor-reactive

05 / Proactivity

Behavioral triggers

She watches scroll depth, time on section, and intent signals. When a visitor lingers on pricing, reaches the bottom of the page, or moves to leave, she opens on her own and engages once, without nagging.

Scroll · pricing · exit
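
One way the trigger decision could be expressed, as a pure function over the signals named above. The thresholds (20 seconds on pricing, 95% scroll depth), the priority order, and the field names are illustrative assumptions.

```typescript
type TriggerReason = "pricing" | "bottom" | "exit";

interface VisitorSignals {
  scrollDepth: number;       // 0 (top of page) to 1 (bottom)
  secondsOnPricing: number;  // dwell time in the pricing section
  exitIntent: boolean;       // e.g. the cursor leaving toward the tab bar
  alreadyTriggered: boolean; // true once she has opened herself this visit
}

const PRICING_DWELL_SECONDS = 20; // hypothetical threshold
const BOTTOM_SCROLL_DEPTH = 0.95; // hypothetical threshold

// Returns why she should open, or null if she should stay quiet.
function pickTrigger(signals: VisitorSignals): TriggerReason | null {
  // "Once, without nagging": after the first proactive open, never again.
  if (signals.alreadyTriggered) return null;
  if (signals.exitIntent) return "exit";
  if (signals.secondsOnPricing >= PRICING_DWELL_SECONDS) return "pricing";
  if (signals.scrollDepth >= BOTTOM_SCROLL_DEPTH) return "bottom";
  return null;
}
```

The widget would set `alreadyTriggered` after the first non-null result, which is what keeps the behavior to a single nudge.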

06 / Safety

Rails and limits

Per-IP rate limiting on the edge, CORS locked to Cogniate's domains, and a system prompt that will not quote enterprise pricing or invent roadmap.

Rate-limited · CORS-scoped
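
A sketch of the two edge rails, under stated assumptions: the origins below are placeholders (not Cogniate's real domains), and the limiter is a simple in-memory fixed window. A production Worker would more likely back the counter with shared storage, since memory is not shared across edge instances.

```typescript
// Placeholder allowlist; the real domains are not stated in this document.
const ALLOWED_ORIGINS = ["https://cogniate.example", "https://www.cogniate.example"];

// Return CORS headers for an allowed origin, or null to reject the request.
function corsHeaders(origin: string | null): Record<string, string> | null {
  if (origin === null || ALLOWED_ORIGINS.indexOf(origin) === -1) return null;
  return {
    "Access-Control-Allow-Origin": origin,
    "Access-Control-Allow-Methods": "POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
    "Vary": "Origin",
  };
}

// Fixed-window counter per IP: at most `limit` requests per `windowMs`.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly hits = new Map<string, { windowStart: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // `now` is passed in (milliseconds) so the logic is easy to test.
  allow(ip: string, now: number): boolean {
    const entry = this.hits.get(ip);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.hits.set(ip, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}
```

The third rail, refusing to quote enterprise pricing or invent roadmap, lives in the system prompt rather than in code, so it is not sketched here.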

REQUEST

Widget

Visitor message + page context, streamed up.

EDGE

Worker

Cloudflare Worker: validate, rate-limit, load session.

ROUTE

Classifier

Haiku picks the tier for this exact message.

THINK

Claude

Cached system prompt + history, streamed back.

RETURN

SSE

Tokens to the panel; turn saved to KV.

3

Model tiers, routed per message

~90%

Cheaper system prompt via caching

5,101

Tokens served from cache, not reprocessed

0

Third-party chat dependencies

Brought to current standard

The same Lyra, on April 2026's stack.

This week we pulled the backend up to our live standard: the most capable models, prompt caching, and edge-grade resilience. Nothing about her voice changed. Everything about her cost and speed did.

What shipped

✓ Opus 4.8 on the top reasoning tier, replacing 4.6. Same price, more capable on the questions that decide a deal.
✓ Prompt caching on the system prompt. Served from cache after the first call, on every tier: lower cost, faster first token.
✓ Retry & timeout on the model calls. Transient blips self-heal instead of surfacing an error mid-conversation.
✓ Verified live on the production endpoint, not a staging guess.
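
The retry and timeout item above could be sketched as a small wrapper around the model call. The attempt count, the 15-second timeout, and the backoff schedule are assumed values, and `call` is a hypothetical stand-in for the request to the model.

```typescript
// Exponential backoff with a cap: 250 ms, 500 ms, 1 s, ... up to capMs.
function backoffMs(attempt: number, baseMs: number = 250, capMs: number = 4000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run `call` with a per-attempt timeout, retrying on failure.
// `call` receives an AbortSignal and should pass it to its fetch.
async function withRetry<T>(
  call: (signal: AbortSignal) => Promise<T>,
  maxAttempts: number = 3,
  timeoutMs: number = 15000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      return await call(controller.signal);
    } catch (err) {
      lastError = err; // remember the failure, then fall through to retry
    } finally {
      clearTimeout(timer);
    }
    if (attempt < maxAttempts - 1) await sleep(backoffMs(attempt));
  }
  throw lastError;
}
```

A wrapper like this is safest around the part of the call that happens before any token reaches the visitor. Retrying after streaming has started would repeat text already on screen.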

# live worker log, 3 calls, one session
req 1  read=0     created=5101
req 2  read=5101  created=0     cache hit
req 3  read=5101  created=0     cache hit
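
A sketch of how a cached system prompt and a log line in the style above might be produced. The `cache_control` marker and the `cache_read_input_tokens` / `cache_creation_input_tokens` usage fields follow Anthropic's prompt caching documentation as we understand it. The function names, `max_tokens` value, and log formatting are assumptions.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// The subset of the response's usage object that the log cares about.
interface CacheUsage {
  cache_read_input_tokens: number;
  cache_creation_input_tokens: number;
}

// Build a request body whose system prompt is marked as cacheable.
// `model` is passed in; no specific model identifier is assumed here.
function buildRequestBody(model: string, systemPrompt: string, messages: ChatMessage[]) {
  return {
    model,
    max_tokens: 1024, // hypothetical limit
    stream: true,
    system: [
      { type: "text", text: systemPrompt, cache_control: { type: "ephemeral" } },
    ],
    messages,
  };
}

// One log line per request: tokens read from cache vs. written to it.
function formatCacheLog(requestNumber: number, usage: CacheUsage): string {
  const line =
    `req ${requestNumber} read=${usage.cache_read_input_tokens}` +
    ` created=${usage.cache_creation_input_tokens}`;
  return usage.cache_read_input_tokens > 0 ? `${line} cache hit` : line;
}
```

The first request writes the prompt to the cache (`created=5101`); later requests within the cache lifetime read it back (`read=5101`), which is where the cost and first-token gains come from.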

Live, right now

Don't take the deck's word for it.

Lyra is running on this page, talking to the production Worker. Open her in the corner and ask something hard. Push on a claim. Try to get a number she should not give.

Her orb is in the bottom-right corner.