The knowledge layer for specialized AI

Kinesthetic provides the infrastructure and ergonomic tooling to build agents that learn to improve themselves.

Request Early Access · Read the research →

See exactly how your AI behaves

Auditing a domain agent usually means reading a giant prompt and guessing what the model attended to. Kinesthetic makes the specification first-class and queryable: ask in plain language how a scenario is handled and get back the specific ground-truth artifacts that govern it, each with its scope, owner, and full text. For any production trace, you can see exactly which artifacts were retrieved into the context, and at which version, so an audit is a handful of artifacts to read, not a 40k-token prompt to reverse-engineer.

Retrieved GT · trace OF-2287

The ground-truth artifacts pulled into context for this trace. (The full trace, including inputs, tool calls, and output, is elided here; this is just what governed it.)

GT-114 · agent: contract-review · v2 · PR #190

Order-form auto-approval threshold

Order forms with a total contract value at or below $50,000 may be auto-approved without analyst review, provided all required fields are present and the counterparty is in good standing.

Applies to agent contract-review · all tenants (base rule) · owner M. Okafor

GT-073 · Approval routing & analyst assignment · agent

GT-009 · House style: approval notifications · shared
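To make the audit idea concrete, here is a minimal sketch of how a trace could pin the exact artifact versions it retrieved. The `Artifact` record, the `store` layout, and `audit_trace` are hypothetical names chosen for illustration, not Kinesthetic's API; only the GT-114 field values come from the card above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    artifact_id: str
    version: int
    title: str
    scope: str
    owner: str
    text: str


def audit_trace(retrieved, store):
    """Return the exact artifact versions a trace pulled into context.

    `retrieved` is a list of (artifact_id, version) pairs recorded on the
    trace; `store` maps those pairs to Artifact records (assumed layout).
    """
    return [store[ref] for ref in retrieved]


# Values copied from the GT-114 card above; the store itself is hypothetical.
gt_114_v2 = Artifact(
    artifact_id="GT-114",
    version=2,
    title="Order-form auto-approval threshold",
    scope="agent contract-review · all tenants (base rule)",
    owner="M. Okafor",
    text=(
        "Order forms with a total contract value at or below $50,000 may be "
        "auto-approved without analyst review, provided all required fields "
        "are present and the counterparty is in good standing."
    ),
)
store = {("GT-114", 2): gt_114_v2}

# Trace OF-2287 also retrieved GT-073 and GT-009; their text is not shown
# on the page, so only GT-114 is modeled here.
trace_of_2287 = [("GT-114", 2)]

for artifact in audit_trace(trace_of_2287, store):
    print(artifact.artifact_id, f"v{artifact.version}", artifact.owner)
```

Keying the store by (id, version) rather than by id alone is what lets an audit show the text that was in force when the trace ran, even after the artifact has changed.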

Scale knowledge past the context window

A prompt has a ceiling; a knowledge base doesn't. Kinesthetic moves your specification out of the context window and into a knowledge base (KB) the agent retrieves from: no input-token limit, no context rot as it grows. At inference time, our Knowledge Engine assembles a small, task-tailored context instead of stuffing in the whole corpus. The spec can keep growing as you refine and expand the behavior your agent can perform.
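As a rough illustration of assembling a small context from a larger KB, the sketch below ranks artifacts by word overlap with the task and keeps only what fits a token budget. The overlap scoring and whitespace token count are deliberately crude stand-ins, not the Knowledge Engine's actual retrieval method, and `assemble_context` is a hypothetical name. The artifact texts are taken from examples elsewhere on this page.

```python
def score(query, text):
    """Count distinct words shared by the query and the artifact text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def assemble_context(query, kb, token_budget):
    """Pick the most relevant artifacts that fit within the budget."""
    ranked = sorted(kb.items(), key=lambda item: score(query, item[1]), reverse=True)
    chosen, used = [], 0
    for artifact_id, text in ranked:
        cost = len(text.split())  # crude token estimate (assumption)
        if score(query, text) == 0 or used + cost > token_budget:
            continue  # skip irrelevant or over-budget artifacts
        chosen.append(artifact_id)
        used += cost
    return chosen


kb = {
    "GT-114": (
        "Order forms with a total contract value at or below $50,000 may be "
        "auto-approved without analyst review, provided all required fields "
        "are present and the counterparty is in good standing."
    ),
    "CAP-12": "Bulk export (CSV/JSON) is available on Pro and above.",
    "TONE-3": (
        "Apologize only when we actually got something wrong; otherwise "
        "acknowledge and go straight to the fix."
    ),
}

print(assemble_context("auto-approved order forms contract value", kb, token_budget=50))
```

The point of the sketch is the shape of the operation: the KB can hold any number of artifacts, while the context handed to the agent stays bounded by the budget.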

Improve from information in any form

Most improvement tooling only ingests data shaped for optimization: reward signals, labeled pairs. Kinesthetic takes whatever form your insight already arrives in: a plain-English correction, a batch of annotated traces, a failure-investigation report, a new feature spec. Coding agents are great for code, but they can't safely make these edits across a large body of instructions. Our agent, built specifically for this purpose, can directly resolve many tasks that would otherwise end up as tickets.

Kinesthetic agent · Agent: support-triage

Alex Liang

We keep telling Pro users bulk export isn't available — we shipped that months ago. And we open basically every reply with an apology, even when nothing went wrong.

▤ 12 transcripts · flagged

Kinesthetic agent

The same "no bulk export" message is written three different ways — "not available," "not supported yet," and a sales hand-off — so searching for "bulk export" only catches one of the three. And your note is really two fixes, not one: correct the capability wherever it's stale, and loosen the blanket apology. Staged across 4 artifacts:

CAP-12 · staged change

−Bulk export (CSV/JSON) is not available on any plan.

+Bulk export (CSV/JSON) is available on Pro and above.

PLAN-7 · Pro · staged change

−Export: single records only. Larger exports aren't supported yet.

+Export: single records and bulk (CSV/JSON).

REPLY-31 · staged change

−"We can't export everything at once yet — if you need a full CSV, our sales team can pull one for you."

+"You can export everything at once from Settings → Export (CSV or JSON)."

TONE-3 · staged change

−Open every reply to a complaint with an apology.

+Apologize only when we actually got something wrong; otherwise acknowledge and go straight to the fix.

4 artifacts · branch fix-bulk-export-and-tone

Give the agent optimal context

The Knowledge Engine gives the agent fewer, better tokens tailored to the task rather than the whole corpus, which lifts answer quality while cutting wasted agentic search. The engine also learns from its actions: from feedback on incorrect actions and examples of correct ones, it captures learnings and supplies them to your agent at inference.

It's delivered as a managed service: state-of-the-art retrieval and learning methods from frontier research, on tap. Your team doesn't have to keep solving the hard, general problems of running and improving a knowledge base; we handle those, so your people focus on the genuinely bespoke needs of your product. And because the agent gets context it can act on immediately, you can start capturing the gains of smaller and open-source models.

τ³-bench · banking · Action-check pass rate

Mistral Large 3 (non-reasoning): 28.2% baseline, 41.5% with Kinesthetic

GPT-OSS 120B (reasoning): 6.0% baseline, 15.0% with Kinesthetic

FinanceBench · Page-level retrieval F1

Financial filings QA (150 questions): 0.29 baseline, 0.55 with Kinesthetic

Harvey LAB · legal · Rubric pass rate

Legal work (44 held-out tasks): 0.35 baseline, 0.45 with Kinesthetic

Version-control your specification

Every change is a diff on specific artifacts, staged on a branch and merged through a pull request, with a PR note the system drafts from your conversation. Each artifact carries its own history you can trace back to the PR, author, and date that introduced it, so you always know who changed what, when, and why.

History · GT-114

PR #191 · Raised to $50k, clarified total value · A. Liang · Jun 8

PR #190 · Added SMB carve-out · M. Okafor · Apr 14

PR #142 · Seed threshold at $35k · M. Okafor · Nov 3

Validate before you ship

Before a change ships, run it against real sampled traces in the Playground and diff your branch's behavior against production: see exactly which inputs change and read the agent's reasoning on each. Safety checks then run over the whole branch: a behavioral diff that quantifies what moved, and a consistency check that flags contradictions or unspecified behavior across artifacts.

Replay · production vs. branch

OF-2287 · Northwind $42k · branch deviated

prod routed to analyst review on total value; branch auto-approved · open the full diff

OF-2291 · Globex $60k · matches production

OF-2304 · Initech $31k · matches production

⚑ 3 of 12 replays deviated from production · no inconsistencies found across artifacts
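The replay panel above can be approximated with a small behavioral diff. This is a sketch under stated assumptions: the rule is reduced to a single total-value threshold (the required-fields and good-standing conditions are ignored), production is assumed to use the $35k seed threshold and the branch the $50k value from the History panel, and `decide` and `behavioral_diff` are hypothetical helpers rather than Playground functions. Only the three traces shown on the page are replayed, not the full sample of 12.

```python
def decide(total_value, threshold):
    """Simplified GT-114: auto-approve at or below the threshold."""
    return "auto-approve" if total_value <= threshold else "analyst review"


def behavioral_diff(traces, prod_threshold, branch_threshold):
    """Replay each trace under both rule versions and mark deviations."""
    rows = []
    for trace_id, total_value in traces:
        prod = decide(total_value, prod_threshold)
        branch = decide(total_value, branch_threshold)
        rows.append((trace_id, prod, branch, prod != branch))
    return rows


# Trace IDs and amounts from the replay panel; thresholds are assumptions.
traces = [("OF-2287", 42_000), ("OF-2291", 60_000), ("OF-2304", 31_000)]
rows = behavioral_diff(traces, prod_threshold=35_000, branch_threshold=50_000)

deviated = [trace_id for trace_id, _, _, changed in rows if changed]
print(f"{len(deviated)} of {len(rows)} replays deviated: {deviated}")
```

Under these assumptions only OF-2287 changes outcome, which matches the panel: $42k sits between the two thresholds, while $60k and $31k fall on the same side of both.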

A clear authority gradient

Kinesthetic enforces a strict authority gradient. Human-authored ground truth is the single source of truth; everything the system derives from it (indexes, structures, the assembled context, distilled playbooks) is disposable and regenerable. When someone corrects the truth, the change flows downward and rebuilds the derived machinery, so nothing downstream ever hardens into a competing source of truth.

From ground truth to context at inference

Ground truth · authoritative

↓ a correction flows down, rebuilds the rest

Derived · regenerable (indexes · structures · assembled context · playbooks)

Ground truth + Derived → Knowledge Engine → context at inference → Agent

Both the ground truth and its derived structures feed the engine, which assembles the agent's context at inference.
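One way to picture the authority gradient is a store in which the derived layer is only ever a cache of the ground truth. The sketch below is illustrative: `KnowledgeBase`, `correct`, and the word-level inverted index are hypothetical, and a real derived layer would hold far richer structures. The example texts are the CAP-12 before and after lines shown earlier on this page.

```python
def build_index(truth):
    """Derive a word -> artifact-id index from the ground truth."""
    index = {}
    for artifact_id, text in truth.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(artifact_id)
    return index


class KnowledgeBase:
    def __init__(self, ground_truth):
        self._truth = dict(ground_truth)  # authoritative, human-authored
        self._index = None                # derived, disposable

    def correct(self, artifact_id, new_text):
        """Apply a human correction and discard everything derived."""
        self._truth[artifact_id] = new_text
        self._index = None

    def index(self):
        """Regenerate the derived index on demand from ground truth."""
        if self._index is None:
            self._index = build_index(self._truth)
        return self._index


kb = KnowledgeBase(
    {"CAP-12": "Bulk export (CSV/JSON) is not available on any plan."}
)
before = kb.index()

kb.correct("CAP-12", "Bulk export (CSV/JSON) is available on Pro and above.")
after = kb.index()

print("not" in before, "not" in after)
```

Because the index is never edited directly and is dropped on every correction, the stale "not available" wording cannot survive anywhere downstream of the corrected artifact.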

Generates your post-training data

Improving the agent and generating training data turn out to be the same activity. Every expert correction, and every trajectory a teacher actually got right, is captured as labeled, domain-grounded data, the exact policy data you'd train on. The loop that makes the agent better today is quietly accumulating proprietary data no foundation model has seen and no competitor can replicate, so you can decide later what genuinely needs frontier inference and what you'd rather distill into cheaper models, or weights you own.

A durable asset that survives model releases

Most of what teams invest in is perishable: prompts tuned to a specific model, fine-tuned weights, a hand-built harness. All of it depreciates the moment a new frontier model ships or the stack changes. Your ground-truth specification doesn't. It's model- and harness-agnostic knowledge, kept in plain language the agent reads at inference, and the model-specific machinery beneath it regenerates from that truth. So every correction is a permanent deposit that compounds across release cycles instead of resetting with them. This is a better place to keep and use your system's knowledge today, and the work is more worthwhile because it is an investment rather than rent.

What normally churns your knowledge plumbing

new model · harness update · reranker swap · tool-schema change · context window grows · new SDK

prompts · few-shots · fine-tunes · harness glue

ground-truth spec

untouched · derived layer auto-regenerates · compounds →

Each model upgrade, harness update, reranker swap, or tool change can invalidate hand-tuned prompts and fine-tunes. With our architecture, ground truth is unaffected and only the disposable machinery beneath it regenerates, at no human cost.

Multi-tenant, multi-agent by design

One specification can serve many agents and many tenants. Behavior is scoped: shared base rules that apply everywhere, per-agent rules, and per-tenant overrides that store only the difference from the base. The engine retrieves the base and the override together and reconciles them at inference, and you can view the resolved, effective behavior for any agent or tenant. So you share common ground truth across agents while customizing per customer, without forking the whole spec for each one.

One spec, scoped per tenant

base · GT-114 · total value ≤ $50k auto-approves

tenant: Globex · GT-220 · SMB ≤ $100k (stores only the diff)

resolved · Globex SMB → auto-approve ≤ $100k
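The base-plus-override resolution above can be sketched as a dictionary merge. This is a minimal illustration, not the engine's reconciliation logic: the rule is reduced to a per-segment threshold, `resolve` and `threshold_for` are hypothetical helpers, and "other-tenant" is a made-up tenant with no override. The GT-114 and GT-220 values come from the panel.

```python
def resolve(base, overrides, tenant):
    """Merge a tenant's stored diff on top of the shared base rule."""
    effective = dict(base)
    effective.update(overrides.get(tenant, {}))
    return effective


def threshold_for(effective, segment):
    """Look up the auto-approval limit, falling back to the default."""
    return effective.get(segment, effective["default"])


# GT-114: base rule, auto-approve at or below $50k.
base = {"default": 50_000}

# GT-220: the Globex override stores only the difference (SMB limit).
overrides = {"Globex": {"SMB": 100_000}}

globex = resolve(base, overrides, "Globex")
other = resolve(base, overrides, "other-tenant")  # hypothetical tenant

print(threshold_for(globex, "SMB"))
print(threshold_for(globex, "Enterprise"))
print(threshold_for(other, "SMB"))
```

Storing only the diff means a later change to the base threshold reaches every tenant automatically, except where a tenant has explicitly overridden that value.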

What it is

Product →

The interface for building, correcting, and shipping your agent's specification.

How it works

Research →

The Knowledge Engine and meta-distillation, evaluated across banking, finance, and legal agent benchmarks.

Why we built it

Thesis →

Why specification, not raw capability, is the failure mode that survives every frontier release.