Like many companies, commercial Python distributor Anaconda has pivoted into being an AI tools company, and in doing so is solidifying its role in this emerging ecosystem.
While building frontier models is sexy work, it requires lots of resources, and the payoff is far from certain. Anaconda offers another way: The company has just released its AI platform, which streamlines a lot of the grunt work in exploring this new realm.
At the most recent PyCon conference, we spoke with Peter Wang, a co-founder of Anaconda and currently its chief AI and innovation officer, to learn more about how Anaconda and its users are preparing for the AI age.
We chatted about the company’s AI incubator, the new AI platform and the role that open source should play in AI.
The interview has been edited for clarity and length.
A few years back, Anaconda launched an AI Incubator. It has since been decommissioned. But was it useful? What did the company learn?
If you think about it, 2023 was not that long ago, and yet seems like a lifetime ago. There were so many things happening, like every single day. It was going so fast.
Because it’s an exploding space, you have to just try things in order to actually learn what works and what doesn’t work. But at the same time, we’re a pretty old startup. We’ve been around for quite some time, and we have a real business and real customers, real products that we have to keep iterating on.
So how do we do all that while also staying abreast of what's real, amid the AI hype that was happening? The incubator explored a few areas that we felt were almost certainly going to be important.
One of those was decentralized AI and smaller models. Another was interpretability. One was trying to understand the legality of how models train on datasets.
And there’s [an investigation on] what the future of open source means when source isn’t the main thing anymore, and it’s actually all the data, while the data is not transparent, you know?
So people claim they have open source models where they only have open weights, but they’re not telling the source of the data. So it’s not actually open source, right?
There was a lot of collaboration and interchange of ideas with people internally in product engineering, as well as with some external folks. And so it was, unfortunately, a bit of a chaotic mess, but at the end of the day, it achieved a lot of what we wanted to achieve. We wanted to accelerate innovation in AI, and land those innovations into products.
How did you sell the incubator to the people you needed to sell it to?
I mean, I am on the board, and when we launched it, I was the CEO. So it was a pretty straightforward conversation. Our investors have a very long-term view. And they understand that with any business in any industry, you're going to have disruption and innovation cycles, and you have to say 'yes' to those things, right? And so this is just our time to say 'yes' to some of that stuff.
Recently, the company launched the Anaconda AI Platform, which Fast Company dubbed “the GitHub of enterprise open source development.” Tell us about that.
Actually, it’s more than that, right? It’s really easy for us to try to draw parallels to what we know what exists. And the entire theme through that particular conversation with Fast Company was the future of software doesn’t look like software. And it doesn’t just look like models either, right?
The future of these kinds of information systems is going to be a fusion, a combination of code and data. How do you manage all the code stuff and the deployment stuff, and how do you evaluate whether the performance is falling off? It's not that different from traditional machine learning [ML], yet the difference is that the scale is just much bigger. The kind of people creating these systems are not machine learning engineers or data scientists; they're random end users.
And so when the end users want to collaborate with each other, GitHub won’t suffice, because GitHub is for source code for devs, right? You can also use it, of course, to store large objects and binary objects, things like that.
But at the end of the day, it's really just about collaboration on source code, whereas the need we see in enterprises, and what we've seen from users, is to manage the collection of all these things: the code they're writing, the upstream open source dependencies, the models, the fine-tunes of the models.
There’s a million different models on Hugging Face. They already are a place to store all these models. GitHub is a place to store all this code, but where do you bring all this together to actually, you know, to cook, to bring these ingredients together and make the thing?
And then, once you have the artifact, once you have a deployed AI system, or an agent, or whatever you want to call it, how do you manage that? How do you make sure you can roll it back? How do you reproduce that result six months down the road?
None of the individual technical problems are impossible to solve. There are 50 different ways you could version a model, and there are 50 different ways you can take code and model together and run them on some cloud server. And that's exactly the problem, that there are 50 different ways to do each of these things.
So if you have an end user who does it one way, another end user who does it a different way, and a third user who does it a third way, what is IT supposed to do to support all these users? How do you, as someone who manages those users, provide a platform for all this stuff?
The AI Platform is not just about this feature or that feature. The overall vision of it is to have a place for the people who are responsible at the business level for the integrity of these systems, for monitoring them, and all that stuff.
And the AI Platform is really built on top of all the things we learned in building data science platforms and supporting data science workflows.
It also sounds a lot like the initial motivation for the 2012 launch of Anaconda (the company’s namesake Python platform).
Everyone pretty much knows Anaconda as the way you get Python and use it on a machine. But there's something most end users, like a student or a hobbyist, won't encounter until they're at a business: they can't just install random software packages. They have to deal with central IT. And so we built our enterprise products and our commercial products to be able to [do this].
Because even though we're more than 30 years on now since the invention of Linux and the giant explosion of open source in the '90s, most businesses still don't really have a good way of managing open source. It's still really hard for [enterprises] to deal with that.
I wanted to ask you about open source and AI. Do you see open source as essential for the development of AI?
There are two related but separate questions. One is the importance of AI being transparent and governable, so that we can see what went into the training and what code is computing the thing. You can achieve that without it necessarily being open source. You just need an ingredient label on the side of the box. You don't have to tell me exactly what percentage of what was mixed in, but you've got to at least tell me what is in my food.
I don’t think that’s a very exotic position to take. The motivations for it are almost too obvious to mention. If you actually use an AI system to make determinations about things of consequence, it almost should go without saying that we want to know what went into it.
So I think the requirement for transparency is an obvious one, a necessary one, a demand that will not go away. And the only reason any of this is controversial is that the extremely well-capitalized and wealthy companies that control and dominate the conversation right now would rather not talk about it.
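The "ingredient label" idea can be made concrete as a small, machine-readable disclosure attached to a model. This is a hypothetical sketch, not any real standard; the field names and the `is_transparent` check are illustrative assumptions.

```python
# Hypothetical "ingredient label" for an AI model: disclose what kinds of
# data went into training without revealing exact proportions or datasets.
# Field names are made up for illustration, not any real standard.
model_card = {
    "model": "example-7b",        # made-up model name
    "weights_open": True,         # open weights alone != open source
    "training_code_open": False,
    "data_disclosed": True,
    "training_data_sources": [    # the "ingredients", not the recipe
        "web crawl (general text)",
        "licensed code repositories",
        "synthetic instruction data",
    ],
}

def is_transparent(card: dict) -> bool:
    """Transparent in the sense argued above: the data sources are
    disclosed, even if the weights or code stay proprietary."""
    return card.get("data_disclosed", False) and bool(card.get("training_data_sources"))

print(is_transparent(model_card))  # prints True for this example card
```

By this rule, a model with open weights but an undisclosed training set would fail the check, which mirrors the distinction Wang draws between open weights and open source.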
Now, the second piece of it: Open source, if it's truly open source, meets the transparency argument, because you can inspect and see what went into it. I think it's fine to have proprietary AI models, as long as they meet that measure of transparency and accountability.
Now, why should we still demand to have open source models beyond just the transparency? For me, the reason is because this is such incredibly powerful technology, and it’s so impactful and it’s so early stage, the innovation surface is absolutely massive.
And one of the things that open source does is it allows everyone to go and explore and build wonderful things.
Back in, like, the '90s, when Microsoft and others were so against open source, the argument was always that it's socialists and communists who want open source. And us capitalists, we support markets.
I think that's completely backwards. Open source is actually the most pro-market thing you can do, because it's a marketplace of ideas, right?
For something as important and foundational and open as AI, we want as many smart people as possible able to pick this up and use this, and try to build things.
The n-squared effect of innovation, when you can turn the entire planet on, is so different than when it's just a few hundred thousand employees in the coastal cities of America. For me, that is the big open source argument. It's a very pro-human, pro-humanity argument. It's a pro-market argument. You get cheaper innovation, you get faster innovation, and you get more cost-effective innovation.
Do you have any lessons in running an open source AI company?
Yes, open source AI companies will not be the same as open source software companies.
Number one, open source software companies are exceptionally hard to do, and the ones that have really scaled are exceptionally rare.
The most important thing to think about if you want to build a truly scalable open source business, it almost has to be a platform. And the reason is because what open source does is it pre-commoditizes the innovation.
And that’s kind of weird, because usually innovation is a thing that people want to pay for, right? Like, you’re gonna pay for the next year’s model of the car because it’s faster, it’s better, it’s got some other fancy doohickeys on it. You pay for the next year’s version of some software because it’s got some fancy bells and whistles on it.
But with open source, what we've seen is that the world is able to innovate freely together, and so you really can't charge for that innovation in the same way. And the wisdom is that even if the innovation is commoditized and free, innovation usually isn't the whole story.

That might be the meat, but there's a lot of potatoes and vegetables and soft drinks and various other things that complement it. So if you want to be a successful open source business or open AI business, then you really have to think about how to complement that and monetize the complement.
At The New Stack, we were given the marching orders to use more AI. Do you have any advice for organizations, enterprises, startups? How do they get started?
There’s lots of people with lots of options for AI. With us, if you just go to anaconda.com, we have lots of different ways that you can use our stuff. If you download Anaconda, we have ways that you can just run AI models locally. So you can actually run them on your own machine, if you don’t want to send your data to the cloud. If you’re comfortable with using cloud services, we have AI coding assistance built into the Jupyter notebook.
We've also done other things, little experiments and fun things, with doing AI through PyScript. So you'd be in the browser doing serverless Python.
If you’re an enterprise and you want to use AI on prem or in a private way, without sending your data out to anyone else, then we have our AI enterprise platform. It’s precisely to give enterprises this flexibility of running AI on their own terms, on prem securely, or if they want to take advantage of the proprietary models running the cloud, they can harness that as well.
What we find is that many people have hybrid setups, where they have local stuff, they have on prem servers, and then they have cloud things. So they want to use all these things. They want a way to do this portably, and with our platform, you can do all of that, right?
That’s the difference that we have between us and many of the cloud vendors who might just have only the cloud-hosted solution.
Think about my point earlier about solving for the complement. Training some 400-billion-parameter model is very sexy, but lots of people, with lots of money, are doing that, and they're giving those models away for free. So maybe don't build a business trying to compete with them.
Many people don’t have the hardware it takes to run those models for inference or fine-tuning. We can go, we can quantize these, and provide a one-click way to fine-tune some of these things. We can give them golden paths. Do you want best practices on how to set up a RAG [retrieval-augmented generation]? Those are the kinds of things that all come together through this single, unified enterprise AI platform, and that’s complementary to the big, sexy innovation, which is, of course, the frontier model itself.
The post AI Needs Open Source: Q&A with Anaconda’s Peter Wang appeared first on The New Stack.
The co-founder of Python distributor Anaconda views open source as the best path forward to explore all the innovations inherent in AI.