The Technology Behind EdgePype
How we turn 5 examples into a production-ready AI model.
Task Compiler — Understanding Intent
When you describe your task, EdgePype's Task Compiler analyzes your description and examples to build a structured task specification. This spec defines what your model needs to learn: the domain, the tone, the expected input/output patterns, and the quality criteria. Think of it as a blueprint for your AI.
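The actual specification format is internal to EdgePype, but a minimal sketch of what such a structure might contain looks like this (all field names are illustrative assumptions, not EdgePype's real schema):

```typescript
// Illustrative sketch of a task specification. Field names are
// assumptions for this example, not EdgePype's internal schema.
interface TaskSpec {
  domain: string;            // e.g. "customer support"
  tone: string;              // e.g. "friendly, concise"
  inputPattern: string;      // what incoming messages look like
  outputPattern: string;     // what responses should look like
  qualityCriteria: string[]; // checks every output must satisfy
  seedExamples: { input: string; output: string }[];
}

const spec: TaskSpec = {
  domain: "customer support",
  tone: "friendly, concise",
  inputPattern: "a customer question about orders or refunds",
  outputPattern: "a short answer that includes a next step",
  qualityCriteria: ["stays on topic", "no invented policies"],
  seedExamples: [
    {
      input: "Where is my order?",
      output: "Check the tracking link in your confirmation email.",
    },
  ],
};
```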
Your 5 examples become the seed. The Task Compiler turns them into a comprehensive training plan.
Synthetic Data Engine — Quality at Scale
Training a good model requires far more diverse examples than anyone writes by hand. Our Synthetic Data Engine takes your seed examples and the task specification, then generates hundreds of high-quality training pairs using enterprise-grade AI models. Each generated pair is filtered for quality, relevance, and diversity. The result: your model sees a rich, varied dataset that covers edge cases you never thought of.
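The real filtering pipeline is more sophisticated, but the diversity check can be sketched as a near-duplicate filter: drop empty pairs, and drop any pair whose input is too similar to one already kept (word-level Jaccard similarity here is a stand-in for whatever metric the engine actually uses):

```typescript
// Hypothetical quality/diversity filter for generated training pairs.
type Pair = { input: string; output: string };

// Word-level Jaccard similarity between two strings.
function jaccard(a: string, b: string): number {
  const A = new Set(a.toLowerCase().split(/\s+/));
  const B = new Set(b.toLowerCase().split(/\s+/));
  const inter = Array.from(A).filter((w) => B.has(w)).length;
  return inter / (A.size + B.size - inter);
}

// Keep pairs that are non-empty and not near-duplicates of
// anything already accepted.
function filterPairs(pairs: Pair[], maxSim = 0.8): Pair[] {
  const kept: Pair[] = [];
  for (const p of pairs) {
    if (p.input.trim().length === 0 || p.output.trim().length === 0) continue;
    if (kept.some((k) => jaccard(k.input, p.input) > maxSim)) continue;
    kept.push(p);
  }
  return kept;
}
```

Rejected pairs are simply regenerated, so the filter trades raw volume for a dataset where every example adds new coverage.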
Optimized Training — Fast and Efficient
EdgePype fine-tunes state-of-the-art open-weight models using QLoRA on cloud GPUs. We handle the hyperparameter tuning, the quantization, and the optimization. Your model is compressed using importance-matrix quantization (imatrix) — preserving the knowledge that matters most for your task while keeping the file size small enough to run on a laptop.
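To make those knobs concrete, here is an illustrative configuration for such a run. Every field name and value is an assumption for the sketch, not EdgePype's actual pipeline settings:

```typescript
// Illustrative QLoRA fine-tune + imatrix quantization settings.
// Names and values are assumptions, not EdgePype's real config.
const trainingConfig = {
  baseModel: "an open-weight instruct model", // placeholder
  method: "qlora",      // 4-bit base weights, trainable low-rank adapters
  loraRank: 16,         // adapter capacity vs. size trade-off
  loraAlpha: 32,
  epochs: 3,
  quantization: {
    format: "gguf",
    scheme: "imatrix",       // importance-matrix-guided quantization
    calibrationSamples: 512, // data used to estimate which weights matter
  },
};
```

The imatrix step is what lets the compressed model stay small without losing the task-specific knowledge the fine-tune just added: weights the calibration data shows to be important are quantized less aggressively.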
Automated Evaluation — Trust the Results
Every model goes through automated evaluation before delivery. We test it against held-out examples, measure response quality, and benchmark inference speed. You get a quality score and performance metrics — no guessing whether your model works.
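The real evaluation harness is internal, but the shape of such a pass can be sketched as follows: run the model over held-out examples, score each response (a simple token-overlap metric stands in here for the actual quality measure), and time each generation:

```typescript
// Hypothetical evaluation pass: score responses against held-out
// expected outputs and measure average generation latency.
type Example = { input: string; expected: string };

// Fraction of expected words that appear in the actual response —
// a stand-in for EdgePype's real quality metric.
function overlapScore(expected: string, actual: string): number {
  const want = new Set(expected.toLowerCase().split(/\s+/));
  const got = new Set(actual.toLowerCase().split(/\s+/));
  const hit = Array.from(want).filter((w) => got.has(w)).length;
  return want.size === 0 ? 0 : hit / want.size;
}

async function evaluate(
  model: (input: string) => Promise<string>,
  heldOut: Example[],
): Promise<{ quality: number; msPerResponse: number }> {
  let total = 0;
  const start = Date.now();
  for (const ex of heldOut) {
    total += overlapScore(ex.expected, await model(ex.input));
  }
  return {
    quality: total / heldOut.length,
    msPerResponse: (Date.now() - start) / heldOut.length,
  };
}
```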
Run Anywhere — Your Device, Your Rules
Your trained model is yours. Chat with it in the browser using WebLLM (zero install, runs on your GPU). Download the GGUF file and run it with Ollama on Mac, Windows, or Linux. Embed it as a chat widget on your website. No API calls to our servers — your data stays on your device.
Browser Chat
WebLLM runs the model directly in your browser tab. No server, no data sent anywhere.
Local Download
GGUF format compatible with Ollama, llama.cpp, LM Studio, and more. Runs offline forever.
Widget Embed
Add an AI chat to your website with a single script tag. Powered by your custom model.
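For the widget path, the single script tag can be sketched like this. The script URL and `data-model` attribute are illustrative placeholders, not EdgePype's actual widget API:

```typescript
// Hypothetical helper that builds the one-line embed snippet.
// URL and attribute names are assumptions for this sketch.
function buildEmbedSnippet(modelId: string): string {
  return (
    `<script src="https://widget.example.com/edgepype.js" ` +
    `data-model="${modelId}" async></script>`
  );
}
```

Pasting the resulting tag into a page would load the widget script, which reads its model ID from the tag and boots the chat UI client-side.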
WebLLM — AI That Runs in Your Browser
EdgePype uses WebLLM to run your custom model directly in the browser via WebGPU. No plugins, no extensions, no server calls. Your model is converted to MLC format, downloaded once, and cached locally. Every conversation happens on your hardware at near-native speed.
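In code, loading and chatting with a WebLLM model looks roughly like this (this uses the real `@mlc-ai/web-llm` engine API; the model ID is a placeholder for the one EdgePype would supply for your custom model, and the snippet needs a WebGPU-capable browser to run):

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// First visit: downloads and caches the model, reporting progress.
// Later visits: loads straight from the local cache.
const engine = await CreateMLCEngine("your-custom-model-MLC", {
  initProgressCallback: (p) => console.log(p.text),
});

// All inference happens on-device; nothing leaves the browser.
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Summarize this support ticket." }],
});
console.log(reply.choices[0].message.content);
```

The OpenAI-style `chat.completions.create` interface means existing chat UI code can target the local engine with minimal changes.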
WebGPU Acceleration
Models run on your device's GPU via WebGPU, achieving near-native inference speeds directly in the browser.
One-Time Download
The model downloads once and is cached in IndexedDB. Every subsequent visit loads instantly from local storage.
Works Offline
Once cached, your model works without any internet connection. True offline AI, right in your browser tab.
Privacy Architecture — Zero Trust by Design
EdgePype is architecturally incapable of seeing your data after training. The model runs entirely on your machine. We cannot read your conversations, even if we wanted to. This is not a policy — it is a technical guarantee.
Zero Server Inference
After training, your model runs entirely on your device. EdgePype servers are never involved in your conversations.
Encrypted Storage
Model files are stored encrypted at rest on Cloudflare R2. Only your authenticated browser can download them.
Training Data Deleted
Your examples and generated training data are permanently deleted within 24 hours of model completion.
No Data Retention
We never store, log, or analyze your chat messages. They exist only in your browser's memory.
Built on Open Source
EdgePype is built on the shoulders of giants. We believe in the open-source AI ecosystem and contribute back where we can.