This blog post focuses on new features and improvements. For a comprehensive list, including bug fixes, please see the release notes.
Introducing Local Runners: Run Models on Your Own Hardware
Building AI models often starts locally. You experiment with architectures, fine-tune on small datasets, and validate ideas on your own machine. But the moment you want to test that model inside a real-world pipeline, things get complicated.
You usually have two options:
Upload the model to a remote cloud environment, even for early-stage testing
Build and expose your own API server, handling authentication, security, and infrastructure just to test locally
Neither path is ideal, especially when you're:
Working on personal or resource-limited projects
Developing models that need access to local files, OS-level tools, or restricted data
Managing edge or on-prem environments where the cloud isn't viable
Local Runners solve this problem.
They let you develop, test, and run models on your own machine while still connecting to Clarifai's platform. You don't need to upload your model to the cloud. You simply run it where it lives, whether that's your laptop, workstation, or server, and Clarifai takes care of routing, authentication, and integration.
Once registered, the Local Runner opens a secure connection to Clarifai's control plane. Any requests to your model's Clarifai API endpoint are securely routed to your local runner, processed, and returned. From a client's perspective, it works like any other model hosted on Clarifai, but behind the scenes it is running entirely on your machine.
Here's what you can do with Local Runners:
Streamlined model development
Develop and debug models without deployment overhead. Watch real-time traffic, inspect inputs, and test outputs interactively.
Leverage your own compute
If you have a powerful GPU or a custom setup, use it to serve models. Your machine does the heavy lifting while Clarifai handles the rest of the stack.
Private data and system-level access
Serve models that interact with local files, private APIs, or internal databases. With support for MCP (Model Context Protocol), you can securely expose local capabilities to agents without making your infrastructure public.
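To make the MCP idea concrete, here is a minimal sketch of the kind of local capability you might expose to agents. It uses the open-source `mcp` Python package (FastMCP); this is an assumption about tooling, and Clarifai's Local Runner integration may wrap MCP differently, so check the docs for the supported pattern.

```python
# Minimal MCP server sketch using the open-source `mcp` package (FastMCP).
# The tool below reads a local file that never leaves this machine; the
# "./notes" directory and tool name are hypothetical examples.
def make_server():
    # Imported inside the function so the sketch reads without the package installed.
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("local-files")

    @mcp.tool()
    def read_note(name: str) -> str:
        """Read a local text file and return its contents to the agent."""
        with open(f"./notes/{name}.txt", encoding="utf-8") as f:
            return f.read()

    return mcp

if __name__ == "__main__":
    make_server().run()  # serve the tool to connected agents
```

Because the file access happens inside the tool function, the data stays on your machine; only tool results cross the wire.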
Getting Started
Before starting a Local Runner, make sure you've done the following:
Built or downloaded a model – You can use your own model or pick a compatible one from a repository like Hugging Face. If you're building your own, check the documentation on how to structure it using the Clarifai-compatible project format.
Installed the Clarifai CLI – run
pip install --upgrade clarifai
Generated a Personal Access Token (PAT) – from your Clarifai account's settings page under "Security."
Created a context – this stores your local environment variables (such as user ID, app ID, and model ID) so the runner knows how to connect to Clarifai.
You can set up the context simply by logging in through the CLI, which will walk you through entering all the required values:
clarifai login
Starting the Runner
Once everything is set up, you can start your Local Dev Runner from the directory containing your model (or provide a path):
clarifai model local-runner [OPTIONS] [MODEL_PATH]
MODEL_PATH
is the path to your model directory. If you leave it blank, it defaults to the current directory. This command launches a local server that mimics a production Clarifai deployment, letting you test and debug your model live.
If the runner doesn't find an existing context or config, it will prompt you to generate one with default values. This will create:
A dedicated local compute cluster and nodepool.
An app and model entry in your Clarifai account.
A deployment and runner ID that ties your local instance to the Clarifai platform.
Once launched, it also auto-generates a client code snippet to help you test the model.
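The auto-generated snippet will contain the exact values for your account. As an illustration of the request path, here is a sketch that calls the model's public Clarifai REST endpoint with the standard library; the `YOUR_*` placeholders are assumptions, not real values, and Clarifai routes the request to your Local Runner behind the scenes.

```python
import json
import urllib.request

API_BASE = "https://api.clarifai.com/v2"

def build_request(user_id: str, app_id: str, model_id: str,
                  prompt: str, pat: str) -> urllib.request.Request:
    # Build a POST to the model's public endpoint; from the client's side this
    # looks identical to calling a cloud-hosted model.
    url = f"{API_BASE}/users/{user_id}/apps/{app_id}/models/{model_id}/outputs"
    body = json.dumps({"inputs": [{"data": {"text": {"raw": prompt}}}]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"Key {pat}", "Content-Type": "application/json"},
        method="POST",
    )

def predict(user_id: str, app_id: str, model_id: str, prompt: str, pat: str) -> dict:
    with urllib.request.urlopen(build_request(user_id, app_id, model_id, prompt, pat)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    result = predict("YOUR_USER_ID", "YOUR_APP_ID", "YOUR_MODEL_ID",
                     "Hello from my laptop!", "YOUR_PAT")
    print(result["outputs"][0]["data"])
```

The generated snippet from the CLI is the authoritative version; this sketch just shows that nothing about the call reveals the model is running locally.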
Local Runners give you the flexibility to build and test models exactly where your data and compute live, while still integrating with Clarifai's API, workflows, and platform features. Check out the full example and setup guide in the documentation here.
You can try Local Runners for free. There is also a $1/month Developer Plan for the first year, which lets you connect up to 5 Local Runners to the cloud API with unlimited runner hours.
Compute UI
- We've launched a new Compute Overview dashboard that gives you a clear, unified view of all your compute resources. From a single screen, you can now manage Clusters, Nodepools, Deployments, and the newly added Runners.
- This update also includes two major additions: Connect a Local Runner, which lets you run models directly on your own hardware with full privacy, and Connect your own cloud, which lets you integrate external infrastructure such as AWS, GCP, or Oracle for dynamic, cost-efficient scaling. It's now easier than ever to control where and how your models run.
- We've also redesigned the cluster creation experience to make provisioning compute even more intuitive. Instead of selecting each parameter step by step, you now get a unified, filterable view of all available configurations across providers such as AWS, GCP, Azure, Vultr, and Oracle. You can filter by region, instance type, and hardware specs, then select exactly what you need with full visibility into GPU, memory, CPU, and pricing. Once selected, you can spin up a cluster instantly with a single click.
Published New Models
We published the Gemma-3n-E2B and Gemma-3n-E4B models. The two variants are optimized for text-only generation and suited to different compute needs.
Gemma 3n is designed for real-world, low-latency use on devices such as phones, tablets, and laptops. These models leverage Per-Layer Embedding (PLE) caching, the MatFormer architecture, and conditional parameter loading.
You can run them directly in the Clarifai Playground or access them via our OpenAI-compatible API.
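Because the API is OpenAI-compatible, the standard `openai` client can be pointed at it. The base URL below follows Clarifai's documented convention and `MODEL` is a placeholder; both are assumptions you should verify against the docs before use.

```python
# Hedged sketch: calling a Clarifai-hosted model through the OpenAI-compatible
# endpoint. BASE_URL follows Clarifai's documented convention; MODEL is a
# placeholder for the model's Clarifai URL or ID.
BASE_URL = "https://api.clarifai.com/v2/ext/openai/v1"
MODEL = "YOUR_GEMMA_3N_MODEL_URL_OR_ID"

def chat(prompt: str, pat: str) -> str:
    # Imported inside the function so the sketch reads without the package installed.
    from openai import OpenAI

    client = OpenAI(base_url=BASE_URL, api_key=pat)
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    print(chat("Summarize Gemma 3n in one sentence.", "YOUR_PAT"))
```

Swapping only `base_url`, `api_key`, and `model` means existing OpenAI-based tooling can target these models with no other code changes.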
Token-Based Billing
We've started rolling out token-based billing for select models on our Community platform. This change aligns with industry standards and more accurately reflects the cost of inference, especially for large language models.
Token-based pricing will apply only to models running on Clarifai's default Shared compute in the Community. Models deployed on Dedicated compute will continue to be billed based on compute time, with no change. Legacy vision models will still follow per-request billing for now.
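For intuition on how token-based billing is computed, here is the arithmetic as a small sketch. The per-token rates are hypothetical, chosen only for illustration, and are not Clarifai's actual pricing.

```python
# Illustrative arithmetic only: the rates below are made up to show how a
# token-based bill is computed, not Clarifai's actual pricing.
def token_cost(input_tokens: int, output_tokens: int,
               input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Cost in dollars, with rates quoted per million tokens."""
    return (input_tokens * input_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000

# 1,200 prompt tokens and 350 completion tokens at hypothetical rates of
# $0.10 and $0.20 per million tokens:
cost = token_cost(1200, 350, 0.10, 0.20)
print(f"${cost:.6f}")  # → $0.000190
```

Unlike per-request billing, the bill scales with prompt and completion length rather than call count, which is why the change matters most for large language models.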
Playground
- The Playground page is now publicly accessible, with no login required. However, certain features remain available only to logged-in users.
- Added model descriptions and predefined prompt examples to the Playground, making it easier for users to understand model capabilities and get started quickly.
- Added Pythonic support in the Playground for consuming the new model specification.
- Improved the Playground user experience with enhanced inference parameter controls, restored model version selectors, and clearer error feedback.
Additional Changes
Python SDK: Added per-output token tracking, async endpoints, improved batch support, code validation, and build optimizations. Check all SDK updates here.
Platform Updates: Improved billing accuracy, added dynamic code snippets, UI tweaks to Community Home and Control Center, and better privacy defaults. Find all platform changes here.
Clarifai Organizations: Made invitations clearer, improved token visibility, and added persistent invite prompts for better onboarding. See full org improvements here.
Ready to start building?
With Local Runners, you can now serve models, MCP servers, or agents directly from your own hardware without uploading model weights or managing infrastructure. It's the fastest way to test, iterate, and securely run models from your laptop, workstation, or on-prem server. Read the documentation or watch the demo video to get started.