How KitOps Is Used 🛠️
KitOps is the market's only open source, standards-based packaging and versioning system designed for AI/ML projects. Using the OCI standard allows KitOps to be painlessly adopted by any organization using containers and enterprise registries today (see a partial list of compatible tools).
Organizations around the world are using KitOps as a "gate" in the handoff between development and production. This is often part of establishing golden paths and platform engineering around AI/ML projects.
Those who are concerned about end-to-end auditing of their model development - like those in regulated industries, or under the jurisdiction of the EU AI Act - extend KitOps usage to security and development use cases (see the Level 2 and Level 3 use cases below).
Level 1: Handoff From Development to Production 🤝
Organizations are having AI teams build a ModelKit for each version of the AI project that is going to staging, user acceptance testing (UAT), or production.
KitOps is ideally suited to CI/CD pipelines (e.g., using KitOps in a GitHub Action, Dagger module, or other CI/CD pipeline flow), triggered either manually by the model development team when they're ready to send the model to production, or automatically when a model or its artifacts are updated in their respective repositories.
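As a minimal sketch, the packaging step in such a pipeline might look like the following; the registry, repository, and tag names are placeholders, and the step assumes a Kitfile in the working directory and a runner that has already authenticated with kit login:

```sh
# Hypothetical CI step: package the project as a ModelKit and push it
# when a model version is ready to move toward staging or production.
kit pack . -t registry.example.com/ml-team/my-model:staging
kit push registry.example.com/ml-team/my-model:staging
```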
Security-conscious organizations often can't allow internal teams to use just any publicly available model from Hugging Face because:
- Their licenses would put the organization at risk
- They don't match the organization's security testing requirements
- Their provenance isn't understood
In these cases, teams may use a pipeline (with GitHub Actions, Dagger, or another tool) to pull models or sample datasets from Hugging Face, run them through a battery of tests, and then publish them as tamper-proof, signed ModelKits to their private container registry (a sketch of such a pipeline follows the list below).
This ensures that:
- Everyone has a library of safe, immutable, and signed ModelKits speeding development without compromising security
- Safe models can be deployed for development or production use cases
- Operations teams have all the assets and information they need for testing, deploying, auditing, and managing AI/ML projects
- Versioned AI packages are held in the same enterprise registry as other production assets, like containers, making them easier to find, secure, and audit
- Compliance teams have a catalogue of versioned models that can be used for EU AI Act or other regulatory reporting
- Organizations are protected against vendor shifts in their MLOps and Serving Infrastructure domains (this also gives them negotiating leverage with vendors)
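A minimal sketch of that kind of vetting pipeline is shown below; the model name, registry, scanning step, and signing tool are illustrative assumptions, not a prescribed setup:

```sh
# Hypothetical vetting pipeline (placeholder names throughout).
# 1. Pull the candidate model from Hugging Face into a working directory.
git clone https://huggingface.co/some-org/candidate-model ./candidate-model

# 2. Run the organization's security and license checks (modelscan is
#    shown as one example of a scanning step).
modelscan -p ./candidate-model

# 3. Package the vetted artifacts as a ModelKit (assumes a Kitfile in
#    the current directory) and push it to the private registry.
kit pack . -t registry.example.com/approved-models/candidate-model:1.0.0
kit push registry.example.com/approved-models/candidate-model:1.0.0

# 4. Sign the pushed ModelKit so downstream consumers can verify it
#    (cosign is shown as one example of an OCI artifact signing tool).
cosign sign registry.example.com/approved-models/candidate-model:1.0.0
```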
Get Started:
- Kit Dagger Modules: Kit Dagger modules make it easy to pack and selectively unpack ModelKits to speed pipelines.
- Kit GitHub Action: Our Kit GitHub Action is used to build hundreds of ModelKits every day as part of pipelines.
- Learn to pack and unpack ModelKits
- Create containers or Kubernetes deployments directly from ModelKits
This is where most organizations start using KitOps, but once they do, most continue on...
Level 2: Adding Security 🛡️
Some organizations want to scan their models either before they enter the development phase (ideal), or before they are promoted beyond development. The open source ModelScan project can help here.
After model development has been completed, the resulting ModelKit and its artifacts can again be scanned by something like ModelScan and only allowed to move forward if they pass. Using ModelKits here again ensures that a model that passes the scan is packaged and signed so that it cannot be tampered with on its way to production.
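A rough sketch of that gate might look like this; the registry and directory names are placeholders, and the exact kit unpack and modelscan options should be checked against their CLI references:

```sh
# Hypothetical promotion gate: unpack only the model from the ModelKit,
# scan it, and promote the ModelKit only if the scan passes. The signed
# ModelKit itself moves forward unchanged - nothing is re-packaged.
kit unpack registry.example.com/chatbot/legalchat:tuned --model -d ./scan-target
modelscan -p ./scan-target
```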
Even with this level of scrutiny, however, there remain some risks since the varying repositories and versioning of artifacts during development can invite accidental or malicious tampering. This leads us to Level 3 adoption...
Level 3: Storage for all AI Project Versions 💾
Organizations that are more mature in their handling of AI projects, or that are subject to extra scrutiny or regulations, extend the use of ModelKits to the development phase as well. This solves the problem of development artifacts being scattered across different storage locations.
This ensures that:
- The organization uses a standards-based storage (OCI 1.1 artifacts) for all their AI/ML project work
- Data and AI teams who don't work closely together can share artifacts even if they're using different development tools
- Development artifacts can't be accidentally or maliciously tampered with
This can be a lightweight process automated with ModelKit packing via the Kit CLI.
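For example, a milestone script along these lines could pack and push the current state of the project; the Kitfile below only illustrates the general shape (see the Kitfile reference for the exact schema), and the paths, names, and tag are placeholders:

```sh
# Hypothetical milestone script: describe the project's current artifacts
# in a minimal Kitfile, then pack and push a dev-tagged ModelKit.
cat > Kitfile <<'EOF'
manifestVersion: "1.0"
package:
  name: legalchat
model:
  name: legalchat
  path: ./models/legalchat.safetensors
datasets:
  - name: tuning-data
    path: ./data/tuning.parquet
code:
  - path: ./notebooks
EOF

kit pack . -t registry.example.com/chatbot/legalchat:dev-milestone-3
kit push registry.example.com/chatbot/legalchat:dev-milestone-3
```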
The beauty of KitOps' ModelKits is their flexibility and the fact that they fit into the standard tools and processes organizations have already built around OCI artifacts and registries.
Below are details on how we have used KitOps internally from development to production.
Using Tags 🏷️
We used ModelKit tag names to give team members a quick way to determine where an AI project was in its lifecycle and whether they needed to get involved. Since ModelKits are immutable but use content-addressable storage, two ModelKits with the same contents but different tags only result in one ModelKit in storage (with two tag "pointers").
- tuning: dataset is designed for model tuning
  - Used for ModelKits that contained only datasets. These were used almost exclusively by the data science team.
- validating: dataset is designed for model validation [1]
  - Used for ModelKits that contained only datasets. These were used almost exclusively by the data science team.
- trained: model has completed training phase [2]
  - ModelKit would include the model, plus training and validation datasets. Other assets were optional depending on the project.
- tuned: model has completed fine-tuning phase [2]
  - ModelKit would include the model, plus training and validation datasets. Other assets were optional depending on the project.
- challenger: model should be prepared to replace the current champion model
  - ModelKit would include any codebases and datasets that need to be deployed to production with the model. The idea was to create a package that could be deployed via our existing CI/CD pipelines (one for code, one for serialized models, one for datasets).
- champion: model is deployed in production
  - This is the same ModelKit as the challenger-tagged one, but with an updated tag to show what's in production. This just creates a second reference to the same ModelKit, not 2x the ModelKit storage.
- rollback: the model was the previous "champion" model, before it was replaced by the "challenger"
  - This is the same ModelKit as the champion-tagged one. We kept it so that we could quickly deploy it to production if there was a catastrophic failure with the current champion model.
- retired: the model is no longer appropriate for production usage
  - We kept old ModelKits so we could trace the history of development. When they were too old to be informative we used kit remove to get rid of them.
Here's how we implemented those tags and changes in our repo throughout the AI project development lifecycle.
Development 🧪
Our data scientists developed models in Jupyter notebooks, but saved their work as a ModelKit at each development milestone to facilitate collaboration across our teams. This was as simple as running the kit pack and kit push commands from their notebook.
For example, if the data scientist had completed fine-tuning of the model, our enterprise registry was GitLab (registry.gitlab.com), our repository was "chatbot", and the model was called "legalchat", then the commands would be:
kit pack . -t registry.gitlab.com/chatbot/legalchat:tuned
kit push registry.gitlab.com/chatbot/legalchat:tuned
Now everyone knew that the Legal Chat model had completed fine-tuning and they could quickly pull, run, or inspect the updated model, datasets, or code.
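For example, a teammate could pull the updated ModelKit and view its Kitfile to see exactly what it packages; this is a hedged sketch, with kit info shown as one way to do that (check the Kit CLI reference for the exact commands and flags):

```sh
# Pull the fine-tuned ModelKit locally and list what it contains
# (model, datasets, code) before deciding what to unpack.
kit pull registry.gitlab.com/chatbot/legalchat:tuned
kit info registry.gitlab.com/chatbot/legalchat:tuned
```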
Integration 🧑‍🍳
Once a new ModelKit completed training or tuning, our app team would pull the updated ModelKit:
kit pull registry.gitlab.com/chatbot/legalchat:tuned
The team would test their service integration with the updated model, paying special attention to the performance characteristics and watching for features that may have changed between model versions.
Testing 🍽️
Once the application team had confirmed the new model didn't change the behavior of the application, an engineer from outside the data science team would validate the model. To do this they would run the new model with the validation dataset included in the ModelKit, and compare their results with the results from the data science team. This engineer generally didn't need the codebases in the ModelKit so they would just unpack the model and datasets.
kit unpack registry.gitlab.com/chatbot/legalchat:tuned --model --datasets
This guaranteed that they were using the same dataset as the data science team to replicate the results, which is otherwise difficult because at most organizations, datasets aren't housed in a versioned repository and can easily (and silently) change during the course of model development.
If the model passed validation, the QA team would retag it from tuned to challenger to alert the SRE team that it should be readied for deployment.
kit tag registry.gitlab.com/chatbot/legalchat:tuned registry.gitlab.com/chatbot/legalchat:challenger
kit push registry.gitlab.com/chatbot/legalchat:challenger
Deployment 🚢
Adding the challenger tag to a ModelKit was the trigger for our DevOps group to know that it was ready to deploy to production. They would take the serialized model from the ModelKit and ready it for deployment via our pipelines. This might mean putting the model into a container, or it might mean using an init container, a sidecar, an entrypoint, or post-start hooks.
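For the simplest of those options (baking the model into a serving container), a rough sketch might look like this; the image name, Dockerfile location, and serving setup are placeholders rather than our actual pipeline:

```sh
# Hypothetical deployment prep: extract only the serialized model from
# the challenger ModelKit, then build it into a serving image.
kit unpack registry.gitlab.com/chatbot/legalchat:challenger --model -d ./build/model

# Build and push a serving image (assumes a Dockerfile under ./build
# that copies in ./build/model).
docker build -t registry.gitlab.com/chatbot/legalchat-serving:challenger ./build
docker push registry.gitlab.com/chatbot/legalchat-serving:challenger
```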
Once the model was deployed and validated in production, the ModelKit that was previously tagged champion would be retagged to rollback, and the ModelKit for the challenger would be retagged to champion. These changes were part of our deployment automation and ensured that we were always ready to quickly redeploy the rollback model in case something catastrophic happened in production with the current champion.
kit tag registry.gitlab.com/chatbot/legalchat:champion registry.gitlab.com/chatbot/legalchat:rollback
kit push registry.gitlab.com/chatbot/legalchat:rollback
kit tag registry.gitlab.com/chatbot/legalchat:challenger registry.gitlab.com/chatbot/legalchat:champion
kit push registry.gitlab.com/chatbot/legalchat:champion
If you use KitOps differently and want to share it, go ahead - we'd love to hear about it!