Kinesthetic

The knowledge layer for specialized AI

Kinesthetic provides the infrastructure and tooling to ergonomically build agents that can learn how to improve themselves.

The Kinesthetic layer
An agent pipeline of user input, input context, agent harness, output, and annotations and evaluations, sitting above the Kinesthetic Knowledge Layer of interface, knowledge engine, and knowledge base that feed it.

User input / session context
  Give your model specific instructions tailored to this
Input Context
  Input fewer tokens
  Handle more inputs/tasks
  Auditable
Agent (Harness)
  Spend fewer tokens figuring out what to do
  Easier to make it work with cheaper/faster/OSS models
Output
  Better quality
Annotations & Evaluations
  Use these directly to improve agent

Kinesthetic Knowledge Layer
Kinesthetic Interface
  Chat about how your agent handles scenarios
  Let agent propose changes based on failure reports or new feature requests, safely
  Ergonomic for the whole team, not just engineering
Knowledge Engine
  Leverage more data with less structure
  Cleanly separate human-authored ground-truth from derived artifacts optimized for agent use
  Utilize the latest research methods as-a-service so you can focus on the product
Knowledge Base
  Actually scalable: no token limits or long-context hazards
  Structure agent knowledge as a first-class, versionable, living asset
  Build something that outlives the model release/harness design lifecycle

See exactly how your AI behaves

Auditing a domain agent usually means reading a giant prompt and guessing what the model attended to. Kinesthetic makes the specification first-class and queryable: ask in plain language how a scenario is handled and get back the specific ground-truth artifacts that govern it, each with its scope, owner, and full text. For any production trace, you can see exactly which artifacts were retrieved into the context, and at which version, so an audit is a handful of artifacts to read, not a 40k-token prompt to reverse-engineer.

Retrieved GT · trace OF-2287
The ground-truth artifacts pulled into context for this trace. (The full trace, including inputs, tool calls, and output, is elided here; this is just what governed it.)
GT-114 · agent: contract-review · v2 · PR #190
Order-form auto-approval threshold
Order forms with a total contract value at or below $50,000 may be auto-approved without analyst review, provided all required fields are present and the counterparty is in good standing.
Applies to agent contract-review · all tenants (base rule) · owner M. Okafor
GT-073 · Approval routing & analyst assignment · agent
GT-009 · House style: approval notifications · shared
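
To make the audit view concrete, here is a minimal Python sketch of what a per-trace audit record could look like. The `Artifact` and `TraceAudit` classes and their field names are hypothetical illustrations, not Kinesthetic's actual data model; the values are taken from the panel above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """One ground-truth artifact as an audit view might show it (hypothetical shape)."""
    artifact_id: str
    title: str
    scope: str
    version: str
    owner: str
    text: str


@dataclass(frozen=True)
class TraceAudit:
    """The artifacts retrieved into context for a single production trace."""
    trace_id: str
    retrieved: tuple[Artifact, ...]

    def summary(self) -> list[str]:
        # One line per artifact: what governed the trace, and at which version.
        return [
            f"{a.artifact_id} @ {a.version} ({a.scope}): {a.title}"
            for a in self.retrieved
        ]


gt_114 = Artifact(
    artifact_id="GT-114",
    title="Order-form auto-approval threshold",
    scope="agent: contract-review",
    version="v2",
    owner="M. Okafor",
    text=(
        "Order forms with a total contract value at or below $50,000 may be "
        "auto-approved without analyst review, provided all required fields "
        "are present and the counterparty is in good standing."
    ),
)

audit = TraceAudit(trace_id="OF-2287", retrieved=(gt_114,))
for line in audit.summary():
    print(line)
```

The point of the shape is that an audit reads a short list of versioned artifacts rather than a whole prompt.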

Scale knowledge past the context window

A prompt has a ceiling; a knowledge base doesn't. Kinesthetic moves your specification out of the context window and into a KB the agent retrieves from: no input-token limit, no context rot as it grows. At inference time, our knowledge engine assembles a small, task-tailored context instead of stuffing in the whole corpus. The spec can keep growing as you refine and expand the behaviors your agent performs.
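
As a rough illustration of "a small, task-tailored context", the Python sketch below picks the artifacts that best match a task and stops at a size budget. It assumes simple word overlap as the relevance score, which is a stand-in and not how the Kinesthetic engine retrieves; `assemble_context`, the word budget, and the artifact texts are all hypothetical placeholders.

```python
def tokenize(text: str) -> set[str]:
    """Lowercase words with surrounding punctuation stripped."""
    return {t for w in text.split() if (t := w.strip(".,:;()?!").lower())}


def assemble_context(task: str, kb: dict[str, str], max_words: int) -> list[str]:
    """Return ids of the best-matching artifacts that fit in the word budget."""
    task_terms = tokenize(task)
    scored = sorted(
        ((len(task_terms & tokenize(text)), art_id) for art_id, text in kb.items()),
        key=lambda pair: (-pair[0], pair[1]),  # best score first, then id
    )
    chosen: list[str] = []
    used = 0
    for score, art_id in scored:
        cost = len(kb[art_id].split())
        if score == 0 or used + cost > max_words:
            continue  # irrelevant, or would exceed the budget
        chosen.append(art_id)
        used += cost
    return chosen


# Placeholder texts for illustration only, not the real artifacts.
kb = {
    "GT-114": "Order forms at or below the auto-approval threshold skip analyst review.",
    "GT-073": "Approval routing assigns an analyst to order forms that need review.",
    "GT-009": "House style for approval notifications sent to customers.",
}

print(assemble_context("Can this order form skip analyst review?", kb, max_words=25))
```

The knowledge base can grow without changing the budget, because only the selected artifacts reach the model.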

Improve from information in any form

Most improvement tooling only ingests data shaped for optimization: reward signals, labeled pairs. Kinesthetic takes whatever form your insight already arrives in: a plain-English correction, a batch of annotated traces, a failure-investigation report, a new feature spec. Coding agents are great for code, but they can't safely make these edits across a large body of instructions. Our agent, built specifically for this purpose, can directly resolve many tasks that currently end up as tickets.

Kinesthetic agent · Agent: support-triage
Alex Liang
We keep telling Pro users bulk export isn't available — we shipped that months ago. And we open basically every reply with an apology, even when nothing went wrong.
12 transcripts · flagged
Kinesthetic agent

The same "no bulk export" message is written three different ways — "not available," "not supported yet," and a sales hand-off — so searching for "bulk export" only catches one of the three. And your note is really two fixes, not one: correct the capability wherever it's stale, and loosen the blanket apology. Staged across 4 artifacts:

CAP-12 · staged change
Bulk export (CSV/JSON) is not available on any plan.
+Bulk export (CSV/JSON) is available on Pro and above.
PLAN-7 · Pro · staged change
Export: single records only. Larger exports aren't supported yet.
+Export: single records and bulk (CSV/JSON).
REPLY-31 · staged change
"We can't export everything at once yet — if you need a full CSV, our sales team can pull one for you."
+"You can export everything at once from Settings → Export (CSV or JSON)."
TONE-3 · staged change
Open every reply to a complaint with an apology.
+Apologize only when we actually got something wrong; otherwise acknowledge and go straight to the fix.
4 artifacts · branch fix-bulk-export-and-tone

Give the agent optimal context

The Knowledge Engine provides fewer, better tokens tailored to the task instead of the whole corpus, which lifts answer quality while cutting wasted agentic search. Additionally, our engine learns from its actions. Using all feedback on incorrect actions and examples of correct actions, the engine captures learnings and provides them to your agent at inference.

It's delivered as a managed service: state-of-the-art retrieval and learning methods from frontier research, on tap. Your team doesn't have to keep solving the hard, general problems of running and improving a knowledge base; we handle those, so your people focus on the genuinely bespoke needs of your product. And because the agent gets context it can act on immediately, you can start capturing the gains of smaller and open-source models.

τ³-bench · banking · Action-check pass rate
Mistral Large 3 (non-reasoning): 28.2% baseline, 41.5% with Kinesthetic
GPT-OSS 120B (reasoning): 6.0% baseline, 15.0% with Kinesthetic
FinanceBench · Page-level retrieval F1
Financial filings QA (150 questions): 0.29 baseline, 0.55 with Kinesthetic
Harvey LAB · legal · Rubric pass rate
Legal work (44 held-out tasks): 0.35 baseline, 0.45 with Kinesthetic

Version-control your specification

Every change is a diff on specific artifacts, staged on a branch and merged through a pull request, with a PR note the system drafts from your conversation. Each artifact carries its own history you can trace back to the PR, author, and date that introduced it, so you always know who changed what, when, and why.

History · GT-114
PR #191 · Raised to $50k, clarified total value · A. Liang · Jun 8
PR #190 · Added SMB carve-out · M. Okafor · Apr 14
PR #142 · Seed threshold at $35k · M. Okafor · Nov 3
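
One way to picture per-artifact history is as an append-only list of revisions, each tied to the PR that introduced it. The Python sketch below assumes that shape; `Revision` and `ArtifactHistory` are hypothetical names, and only the metadata shown in the history panel is recorded.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Revision:
    """Metadata for one merged change to an artifact (hypothetical shape)."""
    pr: int
    note: str
    author: str
    date: str


@dataclass
class ArtifactHistory:
    artifact_id: str
    revisions: list[Revision] = field(default_factory=list)

    def merge(self, revision: Revision) -> None:
        # Append-only: earlier revisions are never rewritten, so they stay traceable.
        self.revisions.append(revision)

    def current(self) -> Revision:
        return self.revisions[-1]

    def log(self) -> list[str]:
        # Newest first, mirroring the history panel.
        return [
            f"PR #{r.pr} {r.note} ({r.author}, {r.date})"
            for r in reversed(self.revisions)
        ]


history = ArtifactHistory("GT-114")
history.merge(Revision(142, "Seed threshold at $35k", "M. Okafor", "Nov 3"))
history.merge(Revision(190, "Added SMB carve-out", "M. Okafor", "Apr 14"))
history.merge(Revision(191, "Raised to $50k, clarified total value", "A. Liang", "Jun 8"))

for entry in history.log():
    print(entry)
```

Because every revision keeps its PR, author, and date, "who changed what, when, and why" is a lookup rather than an investigation.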

Validate before you ship

Before a change ships, run it against real sampled traces in the Playground and diff your branch's behavior against production: see exactly which inputs change and read the agent's reasoning on each. Safety checks then run over the whole branch: a behavioral diff that quantifies what moved, and a consistency check that flags inconsistencies or unspecified behavior across artifacts.

Replay · production vs. branch
OF-2287 · Northwind $42k · branch deviated
prod routed to analyst review on total value; branch auto-approved
OF-2291 · Globex $60k · matches production
OF-2304 · Initech $31k · matches production
3 of 12 replays deviated from production · no inconsistencies found across artifacts
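
A behavioral diff of this kind can be sketched as running the same sampled inputs through the production and branch behavior and reporting where the decisions differ. In the Python sketch below, each behavior is reduced to a single dollar threshold. That simplification, the function names, and the two threshold values ($35k for production, $50k for the branch) are assumptions chosen to be consistent with the replay panel, not a description of the Playground's internals.

```python
from typing import Callable

Policy = Callable[[int], str]


def threshold_policy(max_auto_approve_usd: int) -> Policy:
    """Build a toy policy: auto-approve at or below the threshold."""
    def decide(total_value_usd: int) -> str:
        if total_value_usd <= max_auto_approve_usd:
            return "auto-approve"
        return "analyst-review"
    return decide


def behavioral_diff(
    traces: dict[str, int], prod: Policy, branch: Policy
) -> list[tuple[str, str, str]]:
    """Return (trace_id, prod_decision, branch_decision) where the two differ."""
    deviations = []
    for trace_id, total_value_usd in traces.items():
        before, after = prod(total_value_usd), branch(total_value_usd)
        if before != after:
            deviations.append((trace_id, before, after))
    return deviations


# Sampled traces from the panel: trace id -> total contract value in USD.
traces = {"OF-2287": 42_000, "OF-2291": 60_000, "OF-2304": 31_000}
prod = threshold_policy(35_000)    # assumed production threshold
branch = threshold_policy(50_000)  # assumed branch threshold

for trace_id, before, after in behavioral_diff(traces, prod, branch):
    print(f"{trace_id}: prod={before} branch={after}")
```

Only OF-2287 falls between the two thresholds, so it is the only one of these three traces reported as a deviation.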

A clear authority gradient

Kinesthetic enforces a strict authority gradient. Human-authored ground truth is the single source of truth; everything the system derives from it (indexes, structures, the assembled context, distilled playbooks) is disposable and regenerable. When someone corrects the truth, the change flows downward and rebuilds the derived machinery, so nothing downstream ever hardens into a competing source of truth.

From ground truth to context at inference
Ground truth · authoritative
↓ a correction flows down, rebuilds the rest
Derived · regenerable
indexes · structures · assembled context · playbooks
Knowledge Engine
context at inference
Agent
Both the ground truth and its derived structures feed the engine, which assembles the agent's context at inference.
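
The authority gradient can be illustrated with a toy knowledge base in which the derived index is never edited directly: every correction to ground truth discards the index and rebuilds it. This Python sketch is an assumption about the general pattern, not Kinesthetic's implementation; `KnowledgeBase` and its methods are hypothetical, and the word index stands in for whatever derived structures the real system maintains.

```python
class KnowledgeBase:
    """Ground truth is authoritative; the index is derived and disposable."""

    def __init__(self, ground_truth: dict[str, str]) -> None:
        self._ground_truth = dict(ground_truth)
        self._index = self._build_index()

    def _build_index(self) -> dict[str, set[str]]:
        # Derived artifact: word -> ids of the artifacts that mention it.
        index: dict[str, set[str]] = {}
        for art_id, text in self._ground_truth.items():
            for word in text.lower().split():
                index.setdefault(word.strip(".,"), set()).add(art_id)
        return index

    def correct(self, art_id: str, new_text: str) -> None:
        # The correction lands in ground truth, then flows downward:
        # the index is rebuilt from scratch rather than patched in place.
        self._ground_truth[art_id] = new_text
        self._index = self._build_index()

    def lookup(self, word: str) -> set[str]:
        return self._index.get(word.lower(), set())


kb = KnowledgeBase({"CAP-12": "Bulk export (CSV/JSON) is not available on any plan."})
print(kb.lookup("not"))   # the stale wording is indexed

kb.correct("CAP-12", "Bulk export (CSV/JSON) is available on Pro and above.")
print(kb.lookup("not"))   # no trace of the stale wording remains
print(kb.lookup("pro"))   # the corrected wording is indexed
```

Rebuilding instead of patching means no stale entry can survive a correction and start acting as a second source of truth.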

Generates your post-training data

Improving the agent and generating training data turn out to be the same activity. Every expert correction, and every trajectory a teacher actually got right, is captured as labeled, domain-grounded data, the exact policy data you'd train on. The loop that makes the agent better today is quietly accumulating proprietary data no foundation model has seen and no competitor can replicate, so you can decide later what genuinely needs frontier inference and what you'd rather distill into cheaper models, or weights you own.

A durable asset that survives model releases

Most of what teams invest in is perishable: prompts tuned to a specific model, fine-tuned weights, a hand-built harness. All of it depreciates the moment a new frontier model ships or the stack changes. Your ground-truth specification doesn't. It's model- and harness-agnostic knowledge, kept in plain language the agent reads at inference, and the model-specific machinery beneath it regenerates from that truth. So every correction is a permanent deposit that compounds across release cycles instead of resetting with them. The specification is a better place to keep and use your system's knowledge today, and the work you put into it is an investment rather than rent.

What normally churns your knowledge plumbing
new model · harness update · reranker swap · tool-schema change · context window grows · new SDK
prompts · few-shots · fine-tunes · harness glue
ground-truth spec
untouched · derived layer auto-regenerates · compounds →
Each model upgrade, harness update, reranker swap, or tool change can invalidate hand-tuned prompts and fine-tunes. With our architecture, ground truth is unaffected and only the disposable machinery beneath it regenerates, at no human cost.

Multi-tenant, multi-agent by design

One specification can serve many agents and many tenants. Behavior is scoped: shared base rules that apply everywhere, per-agent rules, and per-tenant overrides that store only the difference from the base. The engine retrieves the base and the override together and reconciles them at inference, and you can view the resolved, effective behavior for any agent or tenant. So you share common ground truth across agents while customizing per customer, without forking the whole spec for each one.

One spec, scoped per tenant
base · GT-114 · total value ≤ $50k auto-approves
tenant: Globex · GT-220 · SMB ≤ $100k (stores only the diff)
resolved · Globex SMB → auto-approve ≤ $100k
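
Scoped resolution can be sketched as layering overrides on top of a base rule, where each override carries only the fields it changes. The Python sketch below assumes a flat key-value representation of a rule; `Rule`, `resolve`, and the field names are hypothetical, and how the engine actually reconciles plain-language artifacts at inference is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    artifact_id: str
    scope: str    # e.g. "base" or "tenant:Globex" (hypothetical scope labels)
    fields: dict  # an override stores only the fields that differ from the base


def resolve(base: Rule, overrides: list[Rule]) -> tuple[dict, dict]:
    """Return the effective fields plus which artifact supplied each one."""
    effective = dict(base.fields)
    source = {key: base.artifact_id for key in base.fields}
    for rule in overrides:  # later overrides win
        for key, value in rule.fields.items():
            effective[key] = value
            source[key] = rule.artifact_id
    return effective, source


base = Rule(
    "GT-114",
    "base",
    {"auto_approve_max_usd": 50_000, "requires_all_fields": True},
)
globex_smb = Rule("GT-220", "tenant:Globex", {"auto_approve_max_usd": 100_000})

effective, source = resolve(base, [globex_smb])
print(effective)
print(source)
```

The override changes one field and inherits the rest, and the `source` map shows which artifact governs each part of the resolved behavior.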