
Snowflake Polaris Aims for Multiquery Engine Interoperability


SAN FRANCISCO — Apache Spark, Dremio, Python, Trino, and other big data analysis tools will all soon be able to more easily read and write Apache Iceberg tables, using the Iceberg REST API, thanks to a new initiative from cloud data services provider Snowflake.
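Any client that speaks the Iceberg REST protocol can talk to such a catalog. The sketch below is a hypothetical example using the pyiceberg Python library to show the general shape of connecting to an Iceberg REST catalog and reading a table; the endpoint URI, credentials, warehouse, and table names are placeholders, not Polaris specifics.

```python
# Minimal sketch: connect to a hypothetical Iceberg REST catalog with pyiceberg.
# The URI, credential, warehouse, and table names below are placeholders.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "polaris",  # local name for this catalog configuration
    **{
        "type": "rest",                                     # Iceberg REST catalog protocol
        "uri": "https://catalog.example.com/api/catalog",   # hypothetical endpoint
        "credential": "client-id:client-secret",            # hypothetical OAuth client credentials
        "warehouse": "analytics",                            # hypothetical warehouse name
    },
)

# Browse namespaces and read a table through the REST catalog.
print(catalog.list_namespaces())
table = catalog.load_table("sales.orders")   # hypothetical namespace.table
rows = table.scan().to_arrow()               # materialize a scan as a PyArrow table
print(rows.num_rows)
```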

At the Snowflake Data Cloud Summit, being held this week in San Francisco, the company unveiled the Polaris Catalog for Apache Iceberg, the open table format often used to implement data lakes and data lakehouses.

The idea is to integrate the fragmented world of big data query engines, according to the company. All too often an organization has to maintain multiple query engines for different sources of data, or move data from one location to another.

“We are bringing together a number of industry partners to make sure that we can give our mutual customers the choice to mix and match multiple query engines to coordinate read and write activity,” said Christian Kleinerman, Snowflake executive vice president of product.

By open sourcing the catalog, Snowflake is offering Iceberg users potential interoperability with Amazon Web Services (AWS), Confluent, Dremio, Google Cloud, Microsoft Azure, Salesforce, and others.

By routing all read and write operations through a single catalog, Polaris allows multiple engines to read and write the same table while maintaining atomicity.

Polaris Catalog implements Iceberg’s open REST API to maximize the number of query engines it can integrate with.

“A standardized catalog protocol for all engines unlocks multi-engine interoperability,” a group of Snowflake engineers explained in a recent blog post describing the technology.
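As an illustration of what a standardized catalog protocol buys an engine, the following sketch registers an Iceberg REST catalog in Spark through ordinary catalog configuration; the package coordinates, endpoint, credentials, and table names are assumptions for illustration rather than Polaris-specific values.

```python
# Minimal sketch: register a hypothetical Iceberg REST catalog in Spark.
# Package coordinates, endpoint, credentials, and names are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-rest-demo")
    # Hypothetical runtime coordinates; match them to your Spark/Iceberg versions.
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    # Register a catalog named "polaris" that speaks the Iceberg REST protocol.
    .config("spark.sql.catalog.polaris", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.polaris.type", "rest")
    .config("spark.sql.catalog.polaris.uri", "https://catalog.example.com/api/catalog")
    .config("spark.sql.catalog.polaris.credential", "client-id:client-secret")
    .config("spark.sql.catalog.polaris.warehouse", "analytics")
    .getOrCreate()
)

# Reads and writes go through the catalog, which coordinates atomic commits,
# so other engines pointed at the same catalog see a consistent table state.
spark.sql("CREATE NAMESPACE IF NOT EXISTS polaris.sales")
spark.sql("CREATE TABLE IF NOT EXISTS polaris.sales.orders (id BIGINT, amount DOUBLE) USING iceberg")
spark.sql("INSERT INTO polaris.sales.orders VALUES (1, 19.99)")
spark.sql("SELECT * FROM polaris.sales.orders").show()
```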

The Polaris Catalog will be hosted in Snowflake’s AI Data Cloud, though the software will also be released as open source so it can be self-hosted using containers.

Built on Apache Iceberg Open Tables

Launched in 2020, Apache Iceberg is quickly becoming the de facto open table format for several types of large-scale data analysis platforms, such as data lakes, lakehouses and data meshes.

Already, the technology is opening up new data-sharing possibilities: Confluent offers customers the ability to turn data streams into Apache Iceberg tables via Tableflow. Goldman Sachs’ open source data platform Legend will be able to work more easily with the Snowflake query engine. And Salesforce plans to use Polaris as part of its zero-copy data-sharing initiative.

Snowflake and Open Source

This is not Snowflake’s first foray into open source data analysis software. The company has invested heavily in the Iceberg Tables project, which allows Snowflake customers to use the Apache Iceberg format within Snowflake itself. It has released the Snowflake Arctic large language model, built for the enterprise, as open source. It also contributes to Streamlit, a popular Python project for rendering data scripts as web applications.

Snowflake will hold a webinar on July 23 to provide more details on how the catalog works.


