Maintenance in progress… will be live in a few days
Sprapp

Own your AI.
Build it, run it, sell it.

Sprapp builds the model and the engine. Our own 1.58-bit ternary LLM family runs on our own CPU inference engine — up to 58 tok/s, no GPU. Train a custom agent in minutes, run it private, and own what you build.

on-devicetok/s on CPU
on-devicemodels we own
$0to start

You rent intelligence. You should own it.

Hosted AI locks you in — vendor pricing, per-token fees forever, and nothing you actually own.

The GPU tax

Hosted APIs bill per token, forever. Costs scale linearly and never flatten, no matter how much you use them.

Vendor lock-in

You're tied to one vendor's roadmap, pricing, and limits — with no real way to fine-tune a closed model into your own.

No upside

You build on someone else's model and own nothing. Sprapp gives you a model you own, run cheap, and can monetize.

Own the model. Own the engine.

Sprapp builds its own 1.58-bit ternary LLM family — eeny, meeny, miny — and its own Rust CPU inference engine, serve-native, with hand-written AVX2 SIMD at up to 58 tok/s, no GPU. Train your own LoRA adapter in minutes, run it private on cheap CPU, and publish it to earn. Public chat runs on-device in your browser at $0.

Everything you need to own your AI

Model, engine, training, and marketplace — one stack, CPU-first, GPU optional.

Fast CPU inference

Our serve-native engine runs miny (360M, 1.58-bit ternary, 211 MB) at up to 58 tok/s on a plain CPU. OpenAI-compatible, with LoRA adapter hot-swap.

Models you own

eeny, meeny, and miny — a ternary LLM family we built from scratch. Own the weights and run them anywhere, no vendor in the loop.

On-device privacy

Public chat runs in your browser via WASM. Conversations never leave your device, and inference is $0 at any scale.

Train your own

Give it a persona, add examples (or auto-generate a dataset), fine-tune a LoRA adapter in minutes, and chat with it live.

Live, cited web

Agents pull real-time, cited facts from the web via Exa — grounded answers instead of stale guesses.

How it works

Train an agent, run it on CPU, own what you build.

1

Train your agent

Add a persona and a few examples — or auto-generate a dataset. We fine-tune a LoRA adapter on miny in minutes.

2

Run it on CPU

Your adapter hot-swaps into our serve-native engine — up to 58 tok/s, no GPU — or runs on-device in the browser at $0.

3

Own & monetize

Keep the .knl adapter, run it private, or publish it to the marketplace and earn from what you built.

One stack. Everything you need.

Inference, training, a marketplace, and a safety filter — all on models you own.

Fast Inference

Our serve-native engine serves miny on a plain CPU — up to 58 tok/s, GPU optional, OpenAI-compatible with LoRA hot-swap.

Train-your-own

Persona plus a few examples, or an auto-generated dataset, becomes a real LoRA adapter in minutes — graded with a report card.

Marketplace

Publish the agent you trained as a .knl adapter. Others use it, you earn — own it, subscribe, or pay per query.

Sprappy Filter

Pre-LLM threat scoring across 25 categories in under a millisecond — block prompt injection, jailbreaks, and PII at the edge.

On-device

Public chat runs in the browser via WASM — private, offline-capable, and $0 to serve at any scale.

Own the weights

eeny, meeny, miny — a 1.58-bit ternary family we built from scratch. Run them anywhere, no vendor lock-in.

Live, cited web

Agents ground answers in real-time web results via Exa, with sources — not stale, unverifiable guesses.

Sprapp is not a substitute for professional advice. Always consult qualified professionals for legal, medical, financial, or other critical decisions.

Simple, transparent pricing

Start free. Flat pricing — never per-token.

Free
$0 / forever
On-device chat and a starter agent
  • On-device chat — private, $0, unlimited
  • Our own models (eeny · meeny · miny)
  • Fast CPU inference — up to 58 tok/s, no GPU
  • Train 1 custom LoRA agent
  • Live, cited web answers (Exa)
  • Sprappy Filter — prompt threat scoring
  • OpenAI-compatible API access
  • Export your trained adapter (.knl)
  • Dark/light theme

* On-device chat runs entirely in your browser

Get started

Frequently asked questions

What is Sprapp?
Sprapp builds its own tiny AI models and the CPU engine that runs them. You get fast inference (up to 58 tok/s, no GPU), a studio to train your own LoRA agent in minutes, a marketplace to publish and earn, and Sprappy Filter to screen prompts — all on models you own.
Is it really free?
Public chat runs on-device in your browser at $0 — no signup, no per-token fees. Training and hosted inference start free, and paid tiers are flat, not per-token.
Do you store my conversations?
On-device chat never leaves your browser. For hosted inference we keep only what's needed to run the service — there's no per-token vendor in the middle.
What models does Sprapp use?
Our own 1.58-bit ternary family — eeny (999K), meeny (~6M), and miny (360M, 211 MB) — built from scratch and served by our own Rust engine, serve-native.
How does training work?
Give your agent a persona and a few examples (or auto-generate a dataset). We fine-tune a LoRA adapter on miny in minutes, grade it with a report card, and hot-swap it live into the engine. You keep the .knl adapter.
How is it so cheap?
1.58-bit ternary models run on commodity CPUs — no GPU tax. Cost is flat per box, so the more you use it the less each token costs. On-device runs at $0, and it's roughly 17× cheaper than GPT-4o and 26× cheaper than Claude at scale.

Own your AI.

Build the model, run it on CPU, train your own agent, and keep what you make. Free to start — public chat runs on-device at $0.

Start for free