Evidence infrastructure for high-stakes AI

Put every model decision on the record.

Inferify writes a tamper-evident evidence record for every inference your model makes: version, input fingerprint, output, confidence, and whether the model was inside its validated operating regime. When a decision gets questioned six months later, the proof already exists.

No integration needed to view the demo. Live data from a fictional radiology model.
Every record carries: model@version · sha256(input) · output + confidence · regime validity · signed timestamp
01 The blind spot

Your model has a failure mode it cannot see.

Validation certifies performance inside a fixed operating envelope. The moment an input drifts outside it (a new scanner resolution, a rare presentation, a model version you forgot was still deployed) your dashboards still read green. That gap is where unaccountable decisions live.

Inside the envelope: VALID, signed and audit-ready.
Outside the envelope: FLAGGED, with the reason and full trail.
02 Why now

Accountability is becoming a requirement, not a nicety.

AI is moving into decisions that get audited, disputed, and litigated. The regulatory frameworks now forming around high-stakes AI share one demand: show what the model did, and show it was operating as intended. Today that record is reconstructed by hand, months later, from scattered logs. Closing that gap is the opportunity.

FORCE 01

Regulation is arriving

Record-keeping and traceability obligations for high-risk AI in the EU, documentation expectations for AI and ML medical devices, model-risk governance in finance, and the push toward measurable, documented AI under emerging US frameworks.

FORCE 02

The stakes moved up

Models now decide diagnoses, credit, claims, and autonomous actions. When a single decision can be challenged, aggregate accuracy is not a defense. The individual decision has to be explainable on its own terms.

FORCE 03

Disputes follow deployment

Audits, lawsuits, and customer challenges arrive after the fact. The evidence has to already exist at the moment of the decision. You cannot manufacture a trustworthy record once the question is already being asked.

03 How it works

One line wraps every inference.

No pipeline rewrites. Wrap your prediction and Inferify fingerprints the input, records the output and confidence, and evaluates the regime signal before the response leaves your service.

STEP 01

Capture

The SDK hashes the input and logs the model version, output, and confidence. A structured record, not a log line you will parse later.
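
As a rough illustration of the capture step, the sketch below builds a structured record with a SHA-256 input fingerprint using only the Python standard library. The `EvidenceRecord` class, its field names, and `build_record` are assumptions made for this example; they are not the Inferify SDK or its schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class EvidenceRecord:
    # Hypothetical field names; the real record schema is not specified here.
    model: str
    input_hash: str
    output: dict
    confidence: float
    timestamp: str


def fingerprint(raw_input: bytes) -> str:
    """Return a SHA-256 fingerprint so the payload itself never enters the record."""
    return "sha256:" + hashlib.sha256(raw_input).hexdigest()


def build_record(model: str, raw_input: bytes, output: dict, confidence: float) -> EvidenceRecord:
    """Assemble one structured record for a single inference."""
    return EvidenceRecord(
        model=model,
        input_hash=fingerprint(raw_input),
        output=output,
        confidence=confidence,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )


record = build_record(
    model="novadx-v2.3.1",
    raw_input=b"<image bytes>",  # placeholder for the real input payload
    output={"pneumonia": 0.87, "normal": 0.13},
    confidence=0.87,
)
print(json.dumps(asdict(record), indent=2))
```

The point of the structure is that every field is addressable later without parsing free text, and the raw input is represented only by its hash.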

STEP 02

Validate the regime

Each inference is checked against the model's validated operating envelope. Inside is VALID; outside is FLAGGED with the reason.
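
A regime check of this kind can be pictured as a small function that returns a verdict and a reason. The sketch below is a simplified assumption: the `Envelope` fields, the `check_regime` function, and the 0.6 threshold are hypothetical, and the reason strings are borrowed from the flag reasons shown on the demo dashboard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Envelope:
    # Hypothetical envelope: the conditions the model was validated under.
    input_types: frozenset
    approved_versions: frozenset
    min_confidence: float


@dataclass(frozen=True)
class Verdict:
    regime: str             # "VALID" or "FLAGGED"
    reason: Optional[str]   # populated only when FLAGGED


def check_regime(envelope: Envelope, model: str, input_type: str, confidence: float) -> Verdict:
    """Return VALID only if every envelope condition holds; otherwise the first failing reason."""
    if model not in envelope.approved_versions:
        return Verdict("FLAGGED", "deprecated_model_version")
    if input_type not in envelope.input_types:
        return Verdict("FLAGGED", "out_of_distribution_input")
    if confidence < envelope.min_confidence:
        return Verdict("FLAGGED", "confidence_below_threshold")
    return Verdict("VALID", None)


envelope = Envelope(
    input_types=frozenset({"chest_xray_512"}),
    approved_versions=frozenset({"novadx-v2.3.1"}),
    min_confidence=0.6,  # illustrative threshold, not a recommended value
)
print(check_regime(envelope, "novadx-v2.3.1", "chest_xray_512", 0.91))
print(check_regime(envelope, "novadx-v2.3.1", "chest_xray_1024", 0.91))
```

A production envelope would likely be statistical rather than a set of exact matches; the sketch only shows the shape of the verdict.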

STEP 03

Export

One click turns any time window of records into a tamper-evident, SHA-256-chained package built for regulatory submission, audit, and legal discovery.

Instrument it in an afternoon.

Python today, TypeScript next. The capture call returns the regime verdict inline, so you can route, hold, or escalate a decision the moment it leaves the model.

predict.py
import inferify

verdict = inferify.capture(
    model="novadx-v2.3.1",
    input=xray_512,
    output={"pneumonia": 0.87, "normal": 0.13},
    confidence=0.87,
)
if verdict.regime != "VALID":
    escalate(verdict)   # reason: out_of_distribution
# record inf_a3f921 · sealed in 24ms
04 The evidence record

Anatomy of a single decision.

Not a log entry. A structured, signed artifact you can hand to a regulator, an auditor, or opposing counsel, and that they can verify was never altered.

inf_a3f921 · evidence record REGIME VALID
MODEL VERSION: novadx-v2.3.1
INPUT HASH: sha256:a3f9c2…
CONFIDENCE: 0.91
INPUT TYPE: chest_xray_512
RAW OUTPUT: pneumonia: 0.91
TIMESTAMP: 14:42:09 UTC
01 IN PROCESS

Your model

The capture call wraps your existing prediction. The SDK runs inside your service, in your environment.

02 EVALUATE

Fingerprint + verdict

Input is hashed, the regime is checked against the validated envelope, and a verdict is returned inline.

03 SEAL

Hash-chained store

Each record commits to the one before it. Any later alteration breaks the chain and is detectable.
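
The chaining idea can be shown in a few lines: each sealed record stores the hash of its predecessor, so editing any earlier record invalidates every hash after it. This is a minimal sketch, assuming JSON-serializable records and an all-zero starting hash; the `seal`, `digest`, and `verify` helpers and the record IDs are hypothetical, not Inferify's storage format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash for the first record in a chain


def digest(body: dict) -> str:
    """Hash a canonical JSON encoding so key order cannot change the result."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(canonical).hexdigest()


def seal(record: dict, prev_hash: str) -> dict:
    """Return a sealed copy of the record that commits to the previous record's hash."""
    body = dict(record, prev_hash=prev_hash)
    return dict(body, hash=digest(body))


def verify(chain: list) -> bool:
    """Recompute every hash and check that each link points at its predecessor."""
    prev = GENESIS
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body.get("prev_hash") != prev or digest(body) != entry.get("hash"):
            return False
        prev = entry["hash"]
    return True


chain = []
prev = GENESIS
for rec in [{"id": "inf_0001", "confidence": 0.91}, {"id": "inf_0002", "confidence": 0.42}]:
    sealed = seal(rec, prev)
    chain.append(sealed)
    prev = sealed["hash"]

print(verify(chain))            # chain is intact
chain[0]["confidence"] = 0.99   # simulate a later alteration
print(verify(chain))            # alteration is detected
```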

04 PROVE

Export package

Any time window exports as a signed, independently verifiable evidence package.
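
To make the export step concrete, the sketch below bundles the records in a time window and signs the bundle. It is an assumption-laden stand-in: it uses HMAC-SHA256 because that ships with the Python standard library, whereas a package meant to be verified by a third party would more plausibly carry an asymmetric signature so the verifier never holds the signing key. The function names and record fields are hypothetical.

```python
import hashlib
import hmac
import json


def _payload(records: list) -> bytes:
    """Canonical JSON encoding of the records being signed."""
    return json.dumps(records, sort_keys=True, separators=(",", ":")).encode()


def export_package(records: list, start: str, end: str, key: bytes) -> dict:
    """Bundle records whose timestamp falls in [start, end] and sign the bundle.

    Timestamps are compared as strings, which is only valid when every value
    uses the same ISO-8601 UTC format and precision.
    """
    window = [r for r in records if start <= r["timestamp"] <= end]
    signature = hmac.new(key, _payload(window), hashlib.sha256).hexdigest()
    return {"records": window, "signature": signature}


def verify_package(package: dict, key: bytes) -> bool:
    """Recompute the signature over the records and compare in constant time."""
    expected = hmac.new(key, _payload(package["records"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, package["signature"])


records = [
    {"id": "inf_0001", "timestamp": "2026-06-01T14:42:09+00:00"},
    {"id": "inf_0002", "timestamp": "2026-06-02T09:00:00+00:00"},
]
key = b"demo-key-not-for-production"
package = export_package(records, "2026-06-01T00:00:00+00:00", "2026-06-01T23:59:59+00:00", key)
print(len(package["records"]), verify_package(package, key))
```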

Raw inputs never have to leave your environment. The record carries a fingerprint, not the payload.
05 The dashboard

Every inference, accounted for.

The aggregate ledger: live model decisions, their regime verdicts, and exactly why anything was flagged. The data here is illustrative, from a fictional radiology model. In production, the dashboard runs on your own inference stream.

NovaDx / Inference Log (LIVE)
LOGGED TODAY: 14,902
IN WINDOW: 20
REGIME VALID: 15
FLAGGED: 5
WHY INFERENCES WERE FLAGGED (THIS WINDOW)
out_of_distribution_input: 2
confidence_below_threshold: 1
deprecated_model_version: 1
input_distribution_shift: 1
INFERENCE ID | TIME | MODEL | OUTPUT | CONF | REGIME
Each row links to its full evidence record: input hash, signed timestamp, and the reason for any flag.
06 When the record matters

The moment someone asks what your model did.

The cost of missing evidence is not abstract. It shows up as a specific request, on a specific deadline, that you either can answer instantly or spend weeks reconstructing.

A regulator asks you to show the model stayed within its validated use.
WITHOUT: Weeks reconstructing which inputs fell outside the envelope, from logs that were never built to prove it.
WITH: Every out-of-regime inference is already flagged, dated, and exportable as a signed package.
A denied claimant's lawyer requests every input behind the decision.
WITHOUT: Log archaeology under a discovery deadline, with no proof the records were not altered.
WITH: One record, tamper-evident, with the model version, fingerprint, output, and verdict.
A customer disputes an automated outcome and you have 30 days to respond.
WITHOUT: An engineer pulled off roadmap to trace one decision through the stack by hand.
WITH: Pull the record, see whether it was VALID or FLAGGED and why, and answer the same day.
An incident review needs to know which model version was live at 14:42.
WITHOUT: Guesswork across deploy logs, hoping the timestamps line up with the failure.
WITH: The exact version is stamped on every record, so the timeline is unambiguous.
07 Who it is for

Where a wrong decision has consequences.

Teams shipping models into workflows where any single decision may later have to be explained, audited, or defended.

Healthcare

Diagnostic and triage models facing FDA submission and clinical liability.

Fintech

Credit, fraud, and underwriting decisions under fair-lending and model-risk rules.

Insurance

Claims and pricing models that have to justify each call to a regulator.

Enterprise AI

Agentic and decision systems where a customer dispute needs a paper trail.

08 Why Inferify

An evidence layer, not another monitor.

Monitoring tells you how your model is doing in aggregate, over time. Logging tells you what happened, if you can find it. Neither produces a per-decision, regime-aware, tamper-evident record built to survive an audit. That is the gap Inferify fills.

Capability | Drift & performance monitoring | MLOps logging | Manual audit | Inferify
Per-decision record | aggregate only | unstructured | after the fact | ✓ native
Regime validity at decision time
Tamper-evident and signed
Built for audit and legal export: DIY; manual
One-line integration: agents
Aggregate model health over time: partial; complementary
Inferify runs alongside your monitoring stack. Keep watching aggregate health where you do today; Inferify owns the per-decision record that aggregate dashboards were never built to produce.
09 Security and compliance

Built for environments that cannot leak.

The teams that need evidence most are the ones with the strictest constraints. Inferify is designed so that adopting it never means relaxing them.

Fingerprint, not payload

Records store a SHA-256 hash of the input, not the raw data. Sensitive inputs stay where they already live.

Runs in your environment

The SDK runs in-process. VPC and on-prem deployment are first-class targets, so data does not have to transit to us.

Tamper-evident by design

A SHA-256 hash chain links every record. The export package is verifiable independently of Inferify.

Compliance-aware

Data residency and access controls are core to the design. SOC 2 Type II is on the roadmap as we move into production deployments.

10 Grounded in research

The regime signal is not a heuristic.

Inferify productizes peer-reviewed work on the structural limits of model evaluation: the formal reason whole classes of failure stay invisible to validation.

EIML @ ICML 2026
Ontological Closure and Structural Limits on Systemic AI
Entire classes of failure are invisible by construction when they fall outside an evaluation framework's bounded space.
PhilML @ ICML 2026
Beyond Accuracy: Epistemic Justification in Trustworthy ML
A model can be accurate for the wrong reasons; the gap between predictive success and warranted trust is measurable.

The regime validity signal is that research, productized into infrastructure.

11 The team

Built by the people who formalized the problem.

The two papers above are ours. Inferify is not a wrapper on someone else's idea; it is the productization of research we wrote on why model evaluation has structural blind spots.

Poojak Patel
Founder & CEO
  • First-author, CVPR 2026, presented in Denver. Top-tier computer vision research.
  • First-author oral, AIMLSystems 2025, from a Google Quantum AI collaboration.
  • Co-author of the ICML 2026 work Inferify is built on (EIML and PhilML).
  • Built a prior product to 15,000+ users across 20+ countries, unfunded.
  • Technical founder who ships the product end to end, from SDK to dashboard.
Maneth Perera
Founder
  • Aerospace engineering at UIUC: signal processing, radar imaging, controls.
  • UAV synthetic aperture radar research at MIT.
  • Co-author, EIML @ ICML 2026 and a Stanford PAI 2026 paper.
  • Systems-engineering depth in reliability and high-stakes imaging pipelines.
  • Owns the rigor that an evidence product lives or dies on.

Most teams selling AI accountability are repackaging monitoring. We are shipping the infrastructure version of our own published research on why those monitors miss what they miss.

12 Roadmap

Where this goes.

A clear path from a working demo to the system of record for high-stakes model decisions.

Now · live

The core record

  • Python SDK, one-line capture
  • Regime validity verdict
  • Evidence dashboard
  • Signed, hash-chained export
Next

Into production

  • TypeScript SDK
  • VPC and on-prem deployment
  • First paid pilots
  • Custom envelope definitions
Then

The standard

  • SOC 2 Type II
  • Regulator-ready templates (FDA, model risk)
  • Evidence API and webhooks
  • Cross-team audit workspace
13 Where we are

Early, and honest about it.

A founding team with the research credibility to own this category, a working demo, and the first design partners testing it on real workflows. We are raising to turn that into production deployments.

Founded: May 2026
Working demo: live since June 2026
Design partners in trial: 2
Raising now: pre-seed
14 Questions

The short version.

Inferify is not another logging layer. Logs are unstructured and prove nothing about whether a decision should have been trusted. Inferify produces a structured, signed record per inference, with a regime verdict, built to survive an audit or discovery.

One capture call around your existing prediction. No pipeline rewrite, no model retraining. The SDK runs in-process and the verdict comes back inline, typically in tens of milliseconds.

The capture path is designed to add single-digit to low tens of milliseconds and can run without blocking your response. The record is sealed in the background while your model returns as usual.
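
One common way to keep sealing off the request path is a queue drained by a background worker. The sketch below shows that pattern under stated assumptions: the `pending` queue, the `sealer` worker, and the placeholder model are hypothetical, and an in-memory list stands in for a durable store.

```python
import queue
import threading

sealed = []                 # stand-in for the durable, hash-chained store
pending = queue.Queue()     # records waiting to be sealed


def sealer() -> None:
    """Drain the queue and seal records off the request path."""
    while True:
        record = pending.get()
        if record is None:          # sentinel: stop the worker
            pending.task_done()
            break
        sealed.append(record)       # a real sealer would hash-chain and persist here
        pending.task_done()


def predict_and_capture(features: list) -> float:
    """Return the prediction immediately; sealing happens in the background."""
    score = sum(features) / len(features)   # placeholder for the real model call
    pending.put({"output": score})          # enqueue only, never wait on sealing
    return score


worker = threading.Thread(target=sealer, daemon=True)
worker.start()

result = predict_and_capture([0.8, 0.9, 1.0])
pending.join()              # demo only: wait so the sealed record can be inspected
print(result, sealed)
```

The trade-off with any asynchronous design is that a crash between enqueue and seal can lose a record, so a real implementation would need a durability story for the queue itself.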

If you do not yet have a validated envelope, Inferify still captures the full signed record and helps you define an envelope from your own historical inputs, so the regime signal becomes meaningful rather than guessed.
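
Deriving an envelope from historical inputs could be as simple as fitting a tolerance band per input feature. The sketch below uses a mean plus or minus k standard deviations band over one numeric feature; this is an illustrative choice, not Inferify's method, and the `fit_envelope` helper, the value of k, and the sample data are all assumptions.

```python
import statistics


def fit_envelope(history: list, k: float = 3.0) -> tuple:
    """Derive a (low, high) band for one numeric input feature from historical values."""
    mean = statistics.fmean(history)
    spread = statistics.pstdev(history)
    return (mean - k * spread, mean + k * spread)


def in_envelope(value: float, envelope: tuple) -> bool:
    """True when the value falls inside the fitted band."""
    low, high = envelope
    return low <= value <= high


# Hypothetical history: image widths in pixels seen during validation.
widths = [512, 512, 520, 508, 512, 516, 510, 512]
envelope = fit_envelope(widths)

print(in_envelope(512, envelope))    # a typical validated input
print(in_envelope(1024, envelope))   # a new scanner resolution
```

A band like this catches gross shifts such as a new scanner resolution; subtler distribution shift needs richer statistics than a single feature range.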

Any model that produces an output and a confidence or score can be captured, including agents. For agents, the record anchors each decision in a chain you can replay and defend.

Records are hash-chained, so each one commits to the one before it. Any alteration breaks the chain and is detectable, and the export package is verifiable independently of Inferify.

The record captures an input fingerprint, not the raw input. The design target is teams under strict data-residency and compliance constraints, so the sensitive payload stays where it already lives.

Stop reconstructing what your model did. Start proving it.

Six months of manual audit reconstruction, replaced by an evidence record that is already written, on every inference.