While I routinely work with both proprietary LLMs and open-weights models, my heart lies with models that are open in the fullest sense. Very early on, I noted that for foundation models, ‘open’ must comprehensively cover not just weights but also data, code, and the detailed recipes crucial for genuine reproducibility. Measured against this standard, few initiatives have captured my attention like Oumi Labs. This public-benefit corporation champions ‘unconditionally open’ foundation models and the collaborative tooling to build them, recently demonstrating their prowess with HallOumi, a claim-verification model that punches above its weight. Guiding this mission is CEO Manos Koukoumidis, whose journey through AI includes foundational research in edge intelligence at Princeton and MIT, pioneering NLP work at Microsoft – where he developed an LSTM-based ChatGPT precursor and an early RAG prototype back in 2016 – and scaling PaLM at Google Cloud with a 300-person team. What follows is a carefully edited excerpt from our conversation, delving into Oumi’s vision for making open models the standard for serious, production-grade AI.
Company Structure and Team
What is Oumi and what does the name stand for?
Oumi is an open source AI lab. The name stands for Open Universal Machine Intelligence, with the tagline “Let’s build better AI and open is the path forward.” It’s structured as a public benefit corporation (PBC), which means it’s for-profit but has a strong, legally binding mission to benefit the public.
What are “founding scholars” at Oumi?
Founding scholars are academics who were involved with Oumi in the very early days, some even before the company was officially incorporated. They have equity stakes and are more deeply involved than typical advisors. There are around 15 founding scholars, with more collaborators joining as the project grows.
Open Source Vision for AI
What do you mean by AI having a “Linux moment,” and what does “unconditionally open” mean?
When we say “truly open,” we adhere to the OSI standard, which requires open data, open code, and open weights. But we go beyond this to “open collaboration” – making it easy for others to reproduce, extend, and contribute to making models better. If something is open but people can’t push it forward, it doesn’t help much.
Just as Linux became the foundation for operating systems, AI models should become a common utility that anyone can build upon. The community needs all the necessary pieces to replicate and improve upon the work without barriers.
Why is this openness important for AI development?
AI has become the foundation not just for the tech industry, but for healthcare and all sciences. It would be a disservice not to make it a public utility that’s easy for anyone to leverage and contribute to. The foundation models should be a common utility that benefits everyone.
How does the current state of “open” models compare to your vision?
Currently, even the most open models like Llama, DeepSeek, and Alibaba’s models only provide open weights. While this is a great start and we’re grateful for these efforts, it’s not the full picture of what “open” should mean.
For the near term, we’ll likely see openness primarily in post-training rather than pre-training (which requires enormous resources). Pre-training massive models from scratch is currently prohibitive for smaller organizations, but there’s a huge opportunity for the open community to take existing open models and make them better through post-training collaboration.

Collaboration and Governance Models
How do you envision collaboration in open AI development to work effectively?
We need a standardized platform with standardized benchmarks where contributions can be validated and combined. When someone contributes an innovation in data curation, training methods, or other areas, it should be done on an end-to-end platform that captures all aspects of the work. Contributions that demonstrably improve performance can be combined to create better models.
AI is an incredibly complex field with diverse use cases and modalities; it truly requires “all hands on deck.” It’s not just model builders: data engineers working on pipelines and data cleaning can make valuable contributions too.
How do you address the signal-to-noise problem in open collaboration?
We focus on contributions that move the needle on benchmarks without regressions in other areas. When a contribution shows promise, many eyes look at it. As Linus’s Law puts it, “given enough eyeballs, all bugs are shallow” – or, for AI, all issues are shallow.
Rather than reviewing every single fork, we focus on results. If a recipe moves a trusted benchmark positively with no regressions (including safety), it bubbles up for consideration. You don’t need to review every contribution in depth if you focus on those that show clear, positive impact.
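To make that gate concrete, here is a minimal sketch in Python; the benchmark names, the scores, and the `accept_contribution` helper are hypothetical, not Oumi's actual review tooling.

```python
# Minimal sketch of a "no-regression" gate for community contributions.
# Benchmark names, scores, and this helper are hypothetical, not Oumi's tooling.

def accept_contribution(baseline: dict[str, float],
                        candidate: dict[str, float],
                        min_gain: float = 0.005) -> bool:
    """Accept a recipe only if it improves at least one trusted benchmark
    and regresses on none of them (safety included)."""
    if any(candidate[name] < baseline[name] for name in baseline):
        return False  # any regression, including on safety, disqualifies it
    return any(candidate[name] - baseline[name] >= min_gain for name in baseline)

# Example: a post-training recipe that lifts GSM8K without hurting safety.
baseline  = {"mmlu": 0.712, "gsm8k": 0.583, "safety": 0.940}
candidate = {"mmlu": 0.714, "gsm8k": 0.611, "safety": 0.941}
print(accept_contribution(baseline, candidate))  # True
```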
How do you handle potentially problematic contributions, like data with IP violations or unsafe model behaviors?
As Linus’s Law suggests, when development is done in the open, with many people scrutinizing the process and the data, it’s arguably safer. For AI, given enough eyeballs, all issues – safety, bias, IP – become more transparent and addressable.
A glass-box approach to how models are built and what data is used is crucial. If a contribution moves the needle positively, it will attract more scrutiny, which helps in vetting it for potential issues.
The Oumi Platform and Technical Capabilities
What does the Oumi platform provide to developers?
The Oumi platform enables experimentation with foundation models across different training types and model families. Think of it as the “DevOps layer” for foundation-model R&D. It includes:
- A unified API covering data curation, data synthesis, all types of training (pre-training and various post-training techniques like LoRA), and evaluation with academic or custom benchmarks.
- Flexible deployment options – run on your laptop or scale to the cloud with just a configuration change.
- Built-in pipelines to synthesize data with any LLM, score quality, and clean datasets.
- Extensible trainer and benchmark harness.
These capabilities address common needs in both academia and enterprise.
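As a rough sketch of what a unified, config-driven workflow can look like, consider the following illustration; the `JobConfig` schema and `launch` helper are assumptions invented for this example, not Oumi's documented API.

```python
# Illustrative config-driven workflow; the schema and launcher are hypothetical,
# not Oumi's documented API.
from dataclasses import dataclass, replace

@dataclass
class JobConfig:
    model: str    # base model to post-train
    dataset: str  # training data
    method: str   # e.g. "lora" for parameter-efficient fine-tuning
    target: str   # where to run: "local", "gcp", "aws", "slurm", ...

def launch(cfg: JobConfig) -> None:
    """Placeholder launcher: in a unified API, the same config runs anywhere,
    and only cfg.target changes between laptop and cluster."""
    print(f"Launching {cfg.method} run of {cfg.model} on {cfg.target}")

# Iterate locally, then scale out by changing a single field.
job = JobConfig("meta-llama/Llama-3.1-8B", "my_sft_data.jsonl", "lora", "local")
launch(job)
launch(replace(job, target="gcp"))
```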
How does the platform handle compute resources?
Users bring their own compute. You can run the platform on your laptop or deploy to AWS, GCP, Azure, Lambda, Together, RunPod, or HPCs by changing the deployment configuration. We’ve tested it on national lab HPCs with over 1,000 GPUs. For some collaborators, we provide compute, and our enterprise offering will include compute resources.
What data processing capabilities does Oumi provide?
We make it easy to synthesize new data or curate data using existing models. You can use any open or closed model through Oumi to handle batch inference for data synthesis. You can also use LLMs to rate data quality and clean it up.
While we optimize data loading and streaming for training, our current data-tooling focus is on synthesizing new data and creating or augmenting datasets with LLMs. These jobs can be scheduled on various cloud providers or on-premise clusters.
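As a toy illustration of the LLM-as-judge pattern described here, the sketch below filters a dataset by model-assigned quality scores; the `ask_llm` stub, the prompt, and the threshold are all made up for the example rather than taken from Oumi's pipelines.

```python
# Toy LLM-as-judge quality filter. ask_llm() is a stand-in for a real model call
# (local weights or a hosted API); wire it up before running.

def ask_llm(prompt: str) -> str:
    """Placeholder for batch inference through any open or closed model."""
    raise NotImplementedError("connect this to your model of choice")

def score_sample(sample: str) -> int:
    prompt = ("Rate this training sample from 1 (unusable) to 5 (excellent). "
              "Reply with a single digit.\n\n" + sample)
    reply = ask_llm(prompt).strip()
    return int(reply[0]) if reply[:1].isdigit() else 1  # unparseable -> low score

def clean_dataset(samples: list[str], min_score: int = 4) -> list[str]:
    """Keep only samples the judge model rates at or above the threshold."""
    return [s for s in samples if score_sample(s) >= min_score]
```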
Are you tied to a specific orchestration stack like Ray?
No. We don’t yet use Ray internally, but the platform is designed such that someone could integrate Ray for distributed training if they wish. You can schedule Oumi jobs on Ray or Slurm if that fits your infrastructure.
HallOumi: AI Claim Verification
What is HallOumi and why did you start with this project?
HallOumi is an “AI lie detector” or, more precisely, an AI claim-verification model. It checks that every part of an LLM’s answer is grounded in the provided context rather than hallucinated. It works for summaries, question answering, or any context-based LLM output.
It provides per-sentence confidence scores along with citations and explanations, pointing to the specific lines in the original document to check each claim against. It offers state-of-the-art quality on this task, significantly better than general-purpose models like GPT-4 or Gemini.
We started with HallOumi because hallucinations are a major blocker for enterprises adopting AI in production. It also served as an excellent test case for developing and validating the Oumi platform itself.
How is HallOumi being used?
We’ve seen interest from fintech firms using it for RAG scenarios and others wanting to cross-check whether articles they’ve written align with their notes. Some use it as a final QA pass when drafting articles, to flag ungrounded statements before publication. The inputs to HallOumi are simple: the context provided to the LLM, the original prompt, and the LLM’s generated response.
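To make that interface concrete, here is a hypothetical sketch of a claim-verification call with the inputs and outputs described above; the `verify` signature and `SentenceVerdict` fields are illustrative assumptions, not HallOumi's actual API.

```python
# Hypothetical claim-verification interface of the shape described above;
# not HallOumi's actual API.
from dataclasses import dataclass

@dataclass
class SentenceVerdict:
    sentence: str         # one sentence of the LLM's response
    supported: bool       # grounded in the context or not
    confidence: float     # verifier's confidence in the verdict, 0..1
    citations: list[int]  # lines in the context to check this claim against

def verify(context: str, prompt: str, response: str) -> list[SentenceVerdict]:
    """Placeholder: score each response sentence against the context and
    cite the lines a reviewer should check."""
    raise NotImplementedError("back this with a claim-verification model")

# Usage with the three inputs named above: context, prompt, and response.
# for v in verify(context, prompt, response):
#     if not v.supported:
#         print(f"Ungrounded ({v.confidence:.0%}), check lines {v.citations}: {v.sentence}")
```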
Open AI Safety Considerations
How do you respond to concerns that open models could lead to safety issues?
While I deeply respect figures like Hinton who have higher “P(doom)” estimates, I align more with Yann LeCun’s perspective. The current development approach at big labs focuses on racing each other rather than investing enough in safety.
The best way to address safety is to do it openly, with the community collectively working on it before AI becomes even more powerful. When development is done in the open, with many people scrutinizing the process, issues are more likely to be caught early.
What’s your approach to AI safety?
Even if the probability of catastrophic outcomes is small, the potential consequences mean we need to address it. Beyond concerns about AI going wrong independently, there’s the risk of it learning harmful behaviors from human data or being used by bad actors.
We need to build “protector AI” that can detect and prohibit misuse, and the best way is through open development since closed labs aren’t doing enough. Safety research and development must happen openly and collaboratively, now, not later.
Future Vision
What’s your vision for the future of open AI by 2030?
I’d like to see the majority of the enterprise AI market powered by open source AI that’s developed collectively and transparently. Ideally, this would be fully open source AI – with open data, open code, open weights, and open collaboration – enabling both enterprise and scientific applications.
Some argue that as AI models become proficient at coding and AI development (a “flywheel” effect), the need for “all hands on deck” might diminish. What’s your take?
It’s a reasonable point. My hope is that if such a flywheel develops, it’s built in the open. The open community is well-positioned to compete, especially in post-training, where innovation and novel ideas are paramount, often more so than just raw GPU power. The ability to rapidly iterate and prove concepts at a smaller scale, leveraging diverse global talent, is a strength of the open approach.
