Ask data teams how they fulfill data requests for business analytics and many will say, “We have a data catalog.” It’s true that the data catalog plays a role, but finding data is just one of many steps.
The data catalog does what it was originally intended to do: data discovery and governance for the data team. Through conversations with many data teams we have identified a common process for how data teams fulfill requests from the business, and where organizations need more than a data catalog.
Why do you need more than a data catalog for fast analytics?
1. Data discovery needs to be faster
The original purpose of the data catalog was to help data teams with data discovery and governance. The problem is that traditional data catalogs haven’t evolved to make data discovery faster.
When the business needs data to make decisions quickly or run ad-hoc analysis, the data catalog probably gives the business user more information than they need, because its content is better suited to governance than to answering business questions.
The speed of data discovery also depends on how fast the data catalog can be populated and how easy and intuitive it is to search for data.
Time that can be saved: 4 weeks.
2. Moving data slows everything down
The business has learned the power of delivering data-driven answers quickly and the importance of fast iteration. Business analysts see what they deliver as products and want to deliver those products continually, in an agile way.
Needing to move data to a data warehouse or data lake to make it available to the business for verification and analysis stands in the way of what’s important to the business: quick delivery of answers and fast iteration.
Imagine if, after finding data, there was no need to move it. Imagine if it could be verified, assembled, prepared and queried in place, so that it was ready for visualization without a single byte of data being moved. Imagine the time that could be saved and how much more productive data teams and business analysts could be.
Time that can be saved: 5 days.
3. Data teams need relief from manual data preparation
The data team needs relief from heavily manual data assembly and preparation. They need a solution that can intelligently assemble the data from across multiple sources. They need a solution that can find and suggest relationships to make joining data fast and easy. They need a solution that can prepare the data in place without needing to move it.
Time that can be saved: 2 months.
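To make the idea of finding and suggesting relationships concrete, here is a minimal sketch of one simple approach: comparing the distinct values of columns in two sources and proposing the pairs that overlap heavily as join candidates. The function name `suggest_joins`, the `min_overlap` threshold and the sample data are all assumptions made for illustration; this is not how any particular product works, and a real solution would also weigh data types, column names and data volumes.

```python
from itertools import product


def suggest_joins(left, right, min_overlap=0.5):
    """Rank column pairs by the share of distinct values they have in common.

    `left` and `right` map column names to lists of sample values
    (a hypothetical, simplified stand-in for two data sources).
    """
    suggestions = []
    for (lcol, lvals), (rcol, rvals) in product(left.items(), right.items()):
        lset, rset = set(lvals), set(rvals)
        if not lset or not rset:
            continue  # an empty column cannot be a join key
        # Overlap is measured against the smaller set of distinct values.
        overlap = len(lset & rset) / min(len(lset), len(rset))
        if overlap >= min_overlap:
            suggestions.append((lcol, rcol, round(overlap, 2)))
    # Strongest candidates first.
    return sorted(suggestions, key=lambda s: s[2], reverse=True)


# Illustrative sample data only.
orders = {"order_id": [1, 2, 3], "cust_id": ["a1", "b2", "b2"]}
customers = {"id": ["a1", "b2", "c3"], "region": ["EU", "US", "US"]}

print(suggest_joins(orders, customers))  # [('cust_id', 'id', 1.0)]
```

Even a crude heuristic like this shows why suggested relationships save time: the analyst reviews a short ranked list instead of inspecting every column by hand.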
4. Data queries should be fast to make and reusable
Two things are likely: the query won’t be right the first time, and someday someone else will need to query the same data.
The data team needs to be able to rapidly iterate on queries and make them available for data analytics. This isn’t possible when the process relies on moving data and on manual assembly and preparation of data.
A data catalog can’t tell you if someone has already created a query for the data. When there isn’t a catalog of available queries, it is inevitable that the next time someone needs the same data, the data team will waste time recreating the same query.
Time that can be saved: 4 days.
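As a rough illustration of what a catalog of available queries could look like, the sketch below stores each query under a fingerprint of its normalized text, so that a second request for the same query returns the existing entry instead of creating a duplicate. The `QueryCatalog` class, its methods and the sample SQL are hypothetical names invented for this example. The normalization is deliberately crude: it only collapses whitespace and letter case (which also lowercases string literals), so it will not recognize queries that are equivalent but written differently.

```python
import hashlib
import re


def fingerprint(sql):
    """Hash a query after collapsing whitespace, case and a trailing semicolon."""
    trimmed = sql.strip().rstrip(";").strip()
    normalized = re.sub(r"\s+", " ", trimmed).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class QueryCatalog:
    """Hypothetical in-memory catalog of reusable queries."""

    def __init__(self):
        self._entries = {}

    def register(self, sql, owner, description):
        """Return (entry, created); created is False if the query already exists."""
        key = fingerprint(sql)
        existing = self._entries.get(key)
        if existing is not None:
            return existing, False
        entry = {"sql": sql, "owner": owner, "description": description}
        self._entries[key] = entry
        return entry, True

    def search(self, term):
        """Find entries whose description mentions the term (case-insensitive)."""
        term = term.lower()
        return [e for e in self._entries.values() if term in e["description"].lower()]


catalog = QueryCatalog()
first, created_first = catalog.register(
    "SELECT region, SUM(amount) FROM sales GROUP BY region;",
    owner="ana",
    description="Revenue by region",
)
# The same query, typed differently by someone else, is detected as a duplicate.
second, created_second = catalog.register(
    "select region,  sum(amount)\nfrom sales group by region",
    owner="ben",
    description="Regional revenue",
)
print(created_first, created_second, second["owner"])  # True False ana
```

A searchable description per entry is the part that helps the next requester: they can find the existing query before anyone starts writing a new one.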
5. Visualizations need to be delivered quickly: not everyone needs a work of art
Have you ever been given a work of art when all you needed was a simple sketch? The perfection trap is a real time-waster. Often the business only needs a simple table, pie chart or bar chart, not a multi-visualization dashboard with rich animation and interactions.
People need to be given the tools to quickly create visualizations that will meet most needs before diving into complex BI tools. By delivering a simple visualization first, requests can be fulfilled faster and wasted time can be avoided. Save time and only deliver the work of art when the simple sketch is not enough.
What can be achieved with the right solution?
Having the right solution can speed up data analytics significantly. Imagine being able to reduce the average time to deliver data for analytics from three months or more to three minutes.
To give the business a speed advantage over the competition, the data team needs more than a data catalog. They need an intelligent data hub that can support every step in the data fulfillment process. If your organization is aiming to be more data-driven, or if the volume of data analytics requests exceeds the capacity of your team, it’s time to consider using more than a data catalog.
What next?
Get the Eckerson Group research report, When a Data Catalog is Not Enough. The report is built from interviews with industry experts and the knowledge and expertise of the Eckerson Group. The report includes how to solve the problem and a practical case study.
Learn more about Promethium with this product overview.
Find out how to reduce the amount of time spent on data discovery and preparation with this free report from 451.
Want to learn more about data catalogs and metadata management? Read, "What is a data catalog and what is a metadata management?"