What is the best way to connect data across multiple sources while supporting large datasets, complex data structures, and real-time needs?
The growth and increasing complexity of enterprise data architectures have created a need to query multiple datasets at once, even when they reside in entirely different repositories owned by different departments.
Trino, formerly known as PrestoSQL, is a community-driven open source project that grew out of Presto, the engine originally developed at Facebook (now Meta) to query its very large Hadoop data warehouse. It has evolved to query both relational and non-relational data sources in federated architectures through a high-performance distributed engine that speaks ANSI SQL, so users work in the same standard SQL regardless of the underlying source. Starburst offers a commercial distribution of Trino with enterprise-level support.
Starburst addresses two common bottlenecks: data silos and slow data access. The company helps organizations make the most of Trino, the distributed analytics engine. At the core of both Starburst and Trino is a distributed SQL query engine that can query one or more data sources, so both rely on SQL statements to work. If you’re new to Presto, you may not know that there have been two different Presto branches (PrestoSQL and PrestoDB) ever since Presto’s creators left Facebook in late 2018.
Setting the Foundation:
PrestoSQL is now called Trino
PrestoDB is now called Presto
Starburst, as a member of the governing board of the Presto Foundation, will be working to harmonize the codebases and establish a Presto conformance program
If you use Starburst for Presto, nothing really changes
Use Cases for Trino and Starburst:
The primary use case driving Trino’s development is SQL-based analytics over data lakes and data warehouses, and data analytics remains the main reason organizations adopt it.
A business professional enters a query in SQL or through a user interface and expects the results to come back in a timely manner.
Querying disparate data in one system with the same SQL simplifies analytics that require a big-picture view across all of your datasets.
Trino can speed up data engineering while reducing the need for traditional ETL processes. This approach lets data professionals use standard SQL statements and work with many data sources from one system, without moving data to a central repository.
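In Trino, each connected source is exposed as a catalog, and tables are addressed as `catalog.schema.table`, so a single statement can join tables that live in different systems. The minimal sketch below only assembles such a statement in Python; the catalog, schema, table, and column names (`hive.sales.orders`, `postgresql.crm.customers`) are hypothetical and depend on how catalogs are configured on a given cluster. The optional `run` helper assumes the `trino` Python client is installed and a coordinator is reachable; its host, port, and user defaults are placeholders.

```python
# Sketch: assemble a federated Trino query from fully qualified table names.
# All catalog, schema, table, and column names are hypothetical examples.


def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified Trino table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


ORDERS = qualified("hive", "sales", "orders")            # e.g. files in a data lake
CUSTOMERS = qualified("postgresql", "crm", "customers")  # e.g. an operational database

# One statement joins both sources; no data is copied into a central store first.
FEDERATED_QUERY = f"""
SELECT c.region, sum(o.amount) AS revenue
FROM {ORDERS} AS o
JOIN {CUSTOMERS} AS c ON o.customer_id = c.id
GROUP BY c.region
""".strip()


def run(query: str, host: str = "localhost", port: int = 8080, user: str = "analyst"):
    """Execute a query on a Trino cluster.

    Assumes the `trino` client package is installed (pip install trino) and a
    coordinator is reachable at host:port; the defaults are placeholders.
    """
    from trino.dbapi import connect  # imported lazily so the sketch loads without it

    conn = connect(host=host, port=port, user=user)
    try:
        cur = conn.cursor()
        cur.execute(query)
        return cur.fetchall()
    finally:
        conn.close()
```

Because every table name is fully qualified, the connection does not need a default catalog or schema.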
What are the differences?
Presto was originally created to address slow queries on a 300 PB Hive data warehouse. The founders needed a SQL-based massively parallel processing (MPP) engine that would be user-friendly for people with coding skills and compatible with many databases, warehouses, and data lakes. They also wanted it to integrate easily with any BI tool, and aimed to help companies achieve fast, cost-efficient data access at massive scale.
The Starburst team, on the other hand, set out to solve the pain points of data access. Many BI leaders complain that data access is too slow, inflexible, and expensive. Through its software, Starburst aims to shorten the time to BI insights, eliminate the need to copy and move data, and provide enhanced security features such as fine-grained access controls and detailed security auditing.
What’s missing from Trino and Starburst?
Knowing that Trino and Starburst are primarily distributed query engines, it’s easy to see the gaps in functionality they leave for teams that build and manage data operations:
No data catalog
No version control
No SQL editor or auditable data logs
This is where Promethium steps in. It replaces manually written SQL statements with a combination of a graphical query builder and SQL generation technology that produces ANSI SQL. The auto-generated statement can be sent to Starburst or Trino in real time to run it and validate that it is correct.
The solution also needs to make it easy for Starburst and Trino users to find data sources, learn about the data they store, and understand how to join them.
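To make the idea of SQL generation concrete, here is a minimal Python sketch of how choices captured by a graphical builder could be rendered into an ANSI SQL string. It is not Promethium’s implementation or API: the `QuerySpec` and `Join` structures and the `to_sql` function are hypothetical, and a production generator would also quote identifiers and parameterize values. It uses the standard `FETCH FIRST n ROWS ONLY` clause, which Trino accepts, rather than the non-standard `LIMIT`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Join:
    """One join chosen in the builder (hypothetical structure)."""
    table: str           # fully qualified name, e.g. "postgresql.crm.customers"
    alias: str
    on: tuple[str, str]  # (left column, right column), alias-qualified


@dataclass
class QuerySpec:
    """What a graphical builder might record before generating SQL (hypothetical)."""
    table: str
    alias: str
    columns: list[str]
    joins: list[Join] = field(default_factory=list)
    filters: list[str] = field(default_factory=list)  # predicates, combined with AND
    limit: Optional[int] = None


def to_sql(spec: QuerySpec) -> str:
    """Render the spec as a single ANSI SQL SELECT statement."""
    lines = [
        f"SELECT {', '.join(spec.columns)}",
        f"FROM {spec.table} AS {spec.alias}",
    ]
    for join in spec.joins:
        left, right = join.on
        lines.append(f"JOIN {join.table} AS {join.alias} ON {left} = {right}")
    if spec.filters:
        lines.append("WHERE " + " AND ".join(spec.filters))
    if spec.limit is not None:
        # Standard row-limiting clause; Trino supports it alongside LIMIT.
        lines.append(f"FETCH FIRST {spec.limit} ROWS ONLY")
    return "\n".join(lines)


spec = QuerySpec(
    table="hive.sales.orders",
    alias="o",
    columns=["c.region", "o.amount"],
    joins=[Join("postgresql.crm.customers", "c", ("o.customer_id", "c.id"))],
    filters=["o.amount > 100"],
    limit=10,
)
print(to_sql(spec))
```

Keeping the builder’s state as plain data, separate from rendering, is one way to let the same spec drive both a live preview (with a small row limit) and the final query.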
Promethium creates a fast and easy workflow for building and executing a federated query that allows data professionals to:
1. Search, preview and validate that the data is a good fit by using the catalog
Data sources are connected to the catalog and are kept up-to-date automatically. Metadata from each data source is used to make each data source easy to understand without needing to be an expert on the data source.
Ideally, no data is moved to populate or update the catalog, and once the best-fit data is found in the catalog, it can be used in step two immediately, without switching to another tool.
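A catalog that describes data without moving it can be sketched as an index over metadata alone. The hypothetical `MetadataCatalog` below is fed rows shaped like Trino’s `information_schema.columns` (catalog, schema, table, column, data type) and answers keyword searches without ever reading table rows. It illustrates the idea under those assumptions and is not how Promethium’s catalog is built.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMeta:
    catalog: str
    schema: str
    table: str
    column: str
    data_type: str


class MetadataCatalog:
    """Hypothetical metadata-only catalog: it describes tables but never stores their rows."""

    def __init__(self) -> None:
        # (catalog, schema, table) -> list of column metadata
        self._tables: dict[tuple[str, str, str], list[ColumnMeta]] = defaultdict(list)

    def register(self, rows) -> None:
        """Add rows shaped like (table_catalog, table_schema, table_name, column_name, data_type)."""
        for row in rows:
            meta = ColumnMeta(*row)
            self._tables[(meta.catalog, meta.schema, meta.table)].append(meta)

    def columns(self, catalog: str, schema: str, table: str) -> dict[str, str]:
        """Return {column_name: data_type} for one table, to preview its structure."""
        return {c.column: c.data_type for c in self._tables.get((catalog, schema, table), [])}

    def search(self, term: str) -> list[str]:
        """Return qualified names of tables whose name or any column name contains term."""
        term = term.lower()
        hits = []
        for key, cols in self._tables.items():
            name = ".".join(key)
            if term in name.lower() or any(term in c.column.lower() for c in cols):
                hits.append(name)
        return sorted(hits)
```

Populating it could mean running `SELECT table_catalog, table_schema, table_name, column_name, data_type FROM <catalog>.information_schema.columns` for each catalog and passing the result rows to `register`; only metadata crosses the wire.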
2. Prepare and join the data from each data source using the no-code query builder
Step 2 starts with the data found in Step 1. Instead of writing SQL manually, the user constructs the query, including preparing and joining the tables, with a UI-driven data map builder rather than a SQL statement writer.
To boost productivity and shorten time to answers, the SQL statement is automatically generated and recommendations for joining tables are provided. At each step of preparing the data and joining the tables, the results can be previewed in real time.
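Join recommendations can be approximated with simple naming and type heuristics. The sketch below is an assumption for illustration, not Promethium’s recommendation logic: it suggests a column pair when both columns share a name, or when a column named `<table>_id` (using a naively singularized table name) points at an `id` column, and only when the two data types are compatible.

```python
INTEGER_TYPES = {"tinyint", "smallint", "integer", "bigint"}


def compatible(left_type: str, right_type: str) -> bool:
    """Treat identical types, or any two integer types, as joinable."""
    return left_type == right_type or {left_type, right_type} <= INTEGER_TYPES


def singular(name: str) -> str:
    """Very naive singularization: 'customers' -> 'customer'."""
    return name[:-1] if name.endswith("s") else name


def recommend_joins(left_cols: dict[str, str], right_table: str,
                    right_cols: dict[str, str]) -> list[tuple[str, str]]:
    """Suggest (left_column, right_column) join candidates.

    Columns are given as {name: data_type}; right_table may be fully qualified.
    """
    stem = singular(right_table.rsplit(".", 1)[-1])
    suggestions = []
    for left_name, left_type in left_cols.items():
        for right_name, right_type in right_cols.items():
            same_name = left_name == right_name
            foreign_key = right_name == "id" and left_name == f"{stem}_id"
            if (same_name or foreign_key) and compatible(left_type, right_type):
                suggestions.append((left_name, right_name))
    return suggestions
```

Real recommenders usually add evidence such as value overlap or declared keys; name matching alone is only a starting point.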
3. Execute the federated query with the federated query engine
Standard SQL query tools and BI solutions can’t perform federated queries on their own, so a fast federated query engine is needed. That engine must also make the query available to the tools and people who need it.
Common use cases include publishing a view for Tableau, Power BI, Qlik, Looker, or any other BI or visualization tool, or accessing the query for data science from a Jupyter notebook.
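Publishing a federated query for BI tools typically means saving it as a view. Trino supports `CREATE [OR REPLACE] VIEW ... AS query` in catalogs whose connector supports views (for example, Hive or Iceberg), and a BI tool connected through a JDBC or ODBC driver can then query the view like a table. The hypothetical helper below only builds that statement; the `hive.analytics` target is an assumed example, and the helper checks identifiers against a conservative pattern instead of quoting them.

```python
import re

# Conservative unquoted-identifier pattern; a production tool would quote identifiers instead.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def create_view_sql(catalog: str, schema: str, name: str, query: str,
                    replace: bool = True) -> str:
    """Wrap a SELECT statement in a Trino CREATE [OR REPLACE] VIEW statement."""
    for part in (catalog, schema, name):
        if not _IDENTIFIER.fullmatch(part):
            raise ValueError(f"unsupported identifier: {part!r}")
    verb = "CREATE OR REPLACE VIEW" if replace else "CREATE VIEW"
    return f"{verb} {catalog}.{schema}.{name} AS\n{query.strip()}"


print(create_view_sql(
    "hive", "analytics", "revenue_by_customer",
    "SELECT customer_id, sum(amount) AS revenue FROM hive.sales.orders GROUP BY customer_id",
))
```

Using `OR REPLACE` by default lets a regenerated query update the published view in place, so dashboards pointed at it keep working.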
Introducing the new Promethium Free Trial
Promethium’s trial experience has been enhanced to make it easier for users to prove the solution works for their situation.
Trial users can connect their own data with just a few clicks and start exploring and querying their data in minutes without any code or special skills. An intuitive user experience is combined with a new digital adoption assistant that guides users effortlessly to success.
The new free trial experience is available now.