Channel: Joab Jackson, Author at The New Stack

Q&A: Cockroach Labs’ Spencer Kimball on Distributing SQL


When the young programming whiz Spencer Kimball joined Google in 2004, the up-and-coming search company — like everyone else in Silicon Valley — sharded its databases to overcome storage limitations and performance latencies. But it was a stopgap solution at best.

“They sharded and that sharding was ugly,” Kimball said.

After Kimball had spent about a decade at Google, he, along with then-Google colleagues Ben Darnell and Peter Mattis, co-founded Cockroach Labs, with the idea of building a database system that could scale to Google proportions, while meeting the security and flexibility requirements for the enterprise.

And so they created CockroachDB, the first commercially available distributed SQL database that could also support transactions with full ACID compliance. Like the pesky bug it was named after, CockroachDB was designed to be as resilient as possible.

Don’t let the dapper outfits Kimball wears for public appearances fool you. He has major geek credentials: Kimball created the popular open source GNU Image Manipulation Program (GIMP) while in college. And so when we caught up with him at the company’s most recent annual user conference, RoachFest 2023, we got a great overview of how distributed databases have evolved over the past decade, and how CockroachDB came about.

This transcript has been edited for clarity and brevity.

Could you tell us the origin story of Cockroach Labs? 

The impetus for Cockroach was born during my career at Google. Google had a set of application requirements that are common today but were very unusual back then. Google was building data centers all over the world, and it had done so with true commodity hardware. At the time, everyone had a Sun Microsystems server to run Oracle; that’s how it worked in the dot-com era.

Google, because of its immense scale, started building its own indexes. It was a very different problem: you couldn’t put it on an Oracle database anyway. But when Google went beyond web search and started building products on top of it (AdWords was the first one), [the engineers] realized they needed databases. Not only did they need traditional databases, but they needed them at massive scale.

So Google was an extremely tech-heavy company and that led to quite a string of innovations.

The AdWords experiment started on MySQL because they wanted a relational database with transactions. So I walked right into that project. I had actually come from doing a dot-com myself, where we used Java very heavily. And when I got to Google, the first thing I was put onto was Java infrastructure.

A lot of what I worked on in that initial phase was building Java applications that worked like Google’s C++ operations in production. I wrote the Google servlet engine, which was a replacement for Tomcat, so they could run Java the way that they were running C++ binaries. That’s how I met Ben.

So the bottleneck for search was databases…

Google did it like everyone in the valley: they sharded and that sharding was ugly.

Ultimately, like any scalable solution, and this is as true for Cockroach as it was for the manual process of trying to partition MySQL: you have a scalable system in theory, but as you actually reach those levels of scale, you find problems you weren’t expecting.

Let’s say you have eight shards of MySQL and you have a bunch of Java AdWords frontend application servers, and they have all these connections starting to overload all the databases. So you got to figure out what to do there.

You’ve got hundreds of thousands of users, and they’re all logging in. We sharded based on the user ID. So if you have the ID, you know exactly where to go: shard seven, say.

But if you want to look up a secondary index, for example a username or an email address instead of a user ID, you don’t know where that is. So what they did was just query all the shards at once, in parallel. This was a big fan-out. Every database was getting literally millions of requests, and it was knocking over the database servers.
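The routing problem Kimball describes can be sketched in a few lines of Python. This is a hypothetical illustration, not Google’s actual code: the shard count and function names are invented, and real systems hash the key rather than taking a simple modulus.

```python
NUM_SHARDS = 8  # the early AdWords setup Kimball mentions

def shard_for_user(user_id: int) -> int:
    """A lookup by user ID touches exactly one shard."""
    return user_id % NUM_SHARDS

def shards_for_email(email: str) -> list[int]:
    """A lookup by a secondary attribute (email, username) gives no
    hint of the owning shard, so every shard must be queried in
    parallel: the fan-out that knocked over the database servers."""
    return list(range(NUM_SHARDS))
```

A primary-key read costs one query; a secondary-index read costs `NUM_SHARDS` queries, which is why the fan-out load grew with every resharding.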

I won’t go into the technical details, but it was a big project just to solve that problem. And we went to 32 shards. Eventually, they had 1,000. And that meant making every one of the MySQL servers bigger and bigger and bigger.

That AdWords experiment persisted for like 10 years. That architecture, however, was outlawed [at Google]. It was an anti-pattern: “You cannot do MySQL sharding ever.” That set the stage for Google to find other ways to build databases.

Everything they built could have a billion users. People didn’t have to log in to Google often, but they had cookies, right?

So until a cookie was cleared, you could know that a particular user had certain interests, did certain kinds of queries, had a search history, and so on. Maybe that cookie existed for a week or a month or something like that. And it would actually allow you to improve the search results for a particular user.

But that requires storage on a per-cookie basis. Now you got a billion names you’re trying to create some history for. So even if you wanted to shard MySQL, it’s not reasonable.

So they started building Bigtable.

I’ll tell you an interesting story. NoSQL existed for more than a decade without having transactions. And people thought that was OK. When Google released Bigtable, it didn’t have transactions, but it was elastically scalable. It was a great system for certain use cases where you really didn’t need transactions. The infrastructure team that built it had some of the smartest people at Google.

But when the AdWords team looked at Bigtable, they said, ‘Are you kidding? We’ve got 500 tables. We have transactions for everything. There’s money at stake. We cannot use Bigtable.’

So the infrastructure team added a form of transactions to Bigtable, and they called that Megastore.

At that same time, they realized that these data centers they were building were always going down. Bigtable was struggling with that. So Megastore actually used Paxos for consensus replication, and that was a big innovation.

That 10 years that I was at Google, and seeing all those different evolutions of very scalable database products, informed my outlook, as you can imagine.

So when Peter, Ben and I left Google, we started this company. Even though it was a small company, because we were infrastructure engineers as well as application engineers, we wanted Google-level infrastructure. Why build something that regressed 10 years?

We’ve seen all this improvement. We wanted to have the same capabilities. We wanted to make this system scale without all this extra work to do this manually.

Today CockroachDB appears to be marketed chiefly to system architects, rather than developers. Why is this the case?

We’ve struggled with that question, because there’s a lot more developers out there in the world. And if a developer wants to use Cockroach, they’re going to use it, right? They’ll start with it, and they can graduate into something valuable and successful.

However, it turns out that developers mostly care about familiarity and ease of use in the developer experience. They don’t worry so much about the problems that CockroachDB solves. And we’ve put a lot of effort into making Cockroach look like Postgres, a familiar standard.
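That Postgres familiarity is concrete: CockroachDB speaks the Postgres wire protocol, so a stock Postgres driver connects with nothing changed but the connection string. The sketch below is illustrative; the user, host and database names are made up, while 26257 is CockroachDB’s default SQL port.

```python
def cockroach_dsn(user: str, host: str, database: str) -> str:
    """Build a Postgres-style connection string for CockroachDB.

    CockroachDB listens on port 26257 by default (Postgres uses 5432);
    everything else is an ordinary Postgres DSN, so standard drivers
    such as psycopg2 or the JDBC Postgres driver work unmodified.
    """
    return f"postgresql://{user}@{host}:26257/{database}?sslmode=require"

# Example with hypothetical cluster details:
dsn = cockroach_dsn("maxroach", "localhost", "bank")
```

From the application’s point of view, the database looks like Postgres; the distribution machinery stays below the wire protocol.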

But CockroachDB is not a monolithic database. It doesn’t feel like one when you really get down into it. You might have to worry about different things. And if you really want it to be scalable across regions, you’ve got this latency, so you have to be careful about how you build your application.

There’s a complexity there. It’s not there when you just have a simple relationship between an application and your database. The beauty of Cockroach is all of these advanced capabilities for survivability and massive scalability and things like that. The developer is not trying to solve those problems. The extra complexity is just a cost. But for an architect, that is their job to solve these issues.

So you can see the quandary there.

Spencer Kimball

So what we realized is that Cockroach is actually interesting and incredibly aspirational for chief architects and CTOs at startups. They want to use Cockroach just like I did. When I was at Google, Cockroach would have been a slam dunk. It’s those use cases where the database absolutely cannot go down — those are customers that actually need Cockroach.

So we’ve actually gotten a lot more focused as a company. At the beginning, you want anyone who’s going to use your system to help you understand it and develop it. But now that we’ve gotten to the point where even the big enterprises are using Cockroach, that really helps us focus on who we try to talk to: it’s just exclusively the high-end market.

And that helps. If you try to talk to the high end and the low end at the same time, you really don’t have a good message for either one, because neither audience hears its own problems addressed.

And this has profound implications for our roadmap. Like, is it about free databases? Or is it about the features that matter to an enterprise, like crazy security, bulletproof resiliency, or the mind-blowing capability of running actively across cloud vendors?

Why is portability across clouds important? 

It’s one thing to be able to move between clouds, it’s another thing to be able to actively survive a cloud failure, or turn off a cloud. Like, that’s a pretty phenomenal feature of CockroachDB. And if we can talk about that, and make that a message, obviously, it’s going to appeal to chief architects, and even in some cases, the people that are responsible for the business strategy of these big companies.

So we’re actually looking at our mandate as more than just offering a database. The vendor-concentration risks that these enterprises used to face with Oracle were pretty bad. Now if all your products are with one cloud vendor, that cloud vendor can become such a source of lock-in that it can really affect your ability to negotiate.

So for various reasons, big companies are starting to have a greater and greater multicloud footprint. I think there’s an opportunity for these companies to take more control over their destiny from their cloud vendors.

We’re offering some degree of cloud portability and flexibility. Can you skate above the public cloud? Are you building your applications or your company in a way that you’re using the cloud vendors in the way they ought to be used — as providers and not necessarily as such a critical vendor that you’re locked into its orbit?

What do you see as emerging trends in the database space?

Certainly support for AI. And I think it’s a good time to be in infrastructure in general. AI is going to be an accelerant. Everyone’s going to reimagine their existing use cases, and many new things will be built. And I think that’ll be true in every business — certainly the ones that are already leaning into Cockroach. These are businesses that are tech inquisitive.

No matter what you’re building, you always need that system of record, an operational database, for like every single use case in the world. Just two years ago, when I did interviews, everyone asked me if Cockroach would do a crypto blockchain. No, that’s not our plan. But every single crypto company still needs an extremely scalable system of record.

AI is going to, by some estimates, add trillions of dollars to the global economy in the decade ahead. And that’s clearly going to involve a lot of additional data being written.

TNS editor-in-chief Heather Joslyn contributed to this post. 

The post Q&A: Cockroach Labs’ Spencer Kimball on Distributing SQL appeared first on The New Stack.

How watching Google evolve distributed transactional databases inspired three engineers to bring these innovations to the enterprise.

