Last month, we released the first edition of our Heroes of Data article, featuring 5 top data leaders from fast-growing scale-ups who gave their take on the 5 hottest data trends for 2022. We didn’t expect to get so much initial traction and such a warm welcome from the community – thanks to everyone who gave us input on the newsletter and tips on who we should include in upcoming articles!
In the second part of this trendspotting article, we once again asked 5 top data leaders to share their take on the question: “What trends will define the data landscape in 2022 and beyond?”
The Heroes of Data who share their insights in this second edition are:
Scroll down to get more insight on their knowledge and don’t forget to subscribe to get access to future updates! Know of anyone who should be featured? Feel free to reach out to us at firstname.lastname@example.org to give us the scoop!
These days, most trendy tech companies can boast both the people and the tools needed for making data-driven decisions at scale. They've hired data scientists, data analysts and data engineers, in addition to having built a data platform, A/B testing infrastructure and potentially, some machine learning pipelines. After years of investment, business value is (at best) finally being created as internal processes like fraud detection or lead generation are optimized. Product managers can also use data to decide what to build next and feature launches are evaluated based on their actual, tested impact.
But what I think many companies are now realizing is that the ROI of their data teams can potentially be improved by several orders of magnitude, by shifting these teams from acting as a supporting function to instead driving the development of customer-facing data products. These are products where the core value for the user is created from the data itself, and what’s so appealing about them is that they both improve retention and are difficult for others to copy, since they rely on users’ past data. Some famous examples include Spotify's Discover Weekly, Gmail's Smart Reply and Netflix's recommendation system.
While traditional cross-functional dev teams are the experts in regular product development, data people will usually be better able to spot data product opportunities. They will also be more familiar with problems like data availability, quality, processing, testing and monitoring. The challenge is to find a mode of collaboration that scales – some companies experiment with fully embedding data roles into product teams, while others create separate data innovation hubs.
Regardless of the approach taken, this movement represents a huge shift in mindset where data teams go from aiming to influence the business by optimizing processes and products, to actively taking part in developing the products themselves. I look forward to seeing more companies take this path in 2022 and beyond!
I see a trend of increased adoption of lakehouse architecture across data teams at different companies, something that we at Tibber are also in the process of implementing.
A common way of structuring a data warehouse when starting a business is to use a single database such as AWS Redshift or Google BigQuery. But as business and data needs grow, it often gets crowded. More and more compute resources are used just to load the raw data, and data science and business intelligence teams compete for the remaining resources. Adding a data lake to solve the problem can easily turn it into a data swamp, where it becomes hard just to find the right data.
Recently, we at Tibber decided to implement a lakehouse architecture, getting the best of both worlds and enabling us to use the right tool for the right purpose. A key component of the lakehouse architecture that allows us to work more efficiently is the clear separation of raw data from curated data: raw data is stored in object storage (such as S3), and only the curated data needed for analysis is stored in Redshift. Another helpful feature is a data catalog that keeps track of all our data regardless of where it is stored. Tools such as AWS Athena enable us to query data outside Redshift with SQL syntax, lowering the barrier for our data scientists and analysts to access data.
This solution offloads a lot of work from Redshift and frees up resources for analytics use cases, as well as enabling our data scientists to run more advanced ML algorithms. We also expect to achieve a much more reliable solution with enhanced data governance.
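As a rough illustration of the separation described above (and not Tibber's actual implementation), the sketch below models a tiny in-memory data catalog that sends raw datasets to object storage and curated datasets to the warehouse. The bucket name `example-lake`, the `warehouse.curated` schema, the dataset names and all class names are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    RAW = "raw"          # untouched source data
    CURATED = "curated"  # cleaned data, ready for analysis


@dataclass(frozen=True)
class Dataset:
    name: str
    layer: Layer
    location: str  # where the data physically lives


class Catalog:
    """Keeps track of every dataset regardless of where it is stored."""

    def __init__(self) -> None:
        self._datasets: dict[str, Dataset] = {}

    def register(self, name: str, layer: Layer) -> Dataset:
        # Raw data goes to object storage; only curated data goes to the warehouse.
        if layer is Layer.RAW:
            location = f"s3://example-lake/raw/{name}/"  # hypothetical bucket
        else:
            location = f"warehouse.curated.{name}"  # hypothetical schema
        dataset = Dataset(name, layer, location)
        self._datasets[name] = dataset
        return dataset

    def find(self, name: str) -> Dataset:
        # One lookup answers "where is this data?" for both storage systems.
        return self._datasets[name]

    def by_layer(self, layer: Layer) -> list[str]:
        return sorted(d.name for d in self._datasets.values() if d.layer is layer)


catalog = Catalog()
catalog.register("meter_readings_events", Layer.RAW)
catalog.register("daily_consumption", Layer.CURATED)

print(catalog.find("meter_readings_events").location)
print(catalog.find("daily_consumption").location)
```

In a real setup, the catalog would be a managed service and the locations would point at actual buckets and tables. The point of the sketch is only that a single lookup tells an analyst where a dataset lives, whichever system holds it.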
Delivering quicker business insights is becoming more popular than building advanced engineering solutions. Business stakeholders now understand the power of data and are keen on seeing the value from it as quickly as possible. Business leaders rarely care about how it is done, whether the engineers have built CI/CD pipelines, bought managed services or run their compute on K8s clusters, etc. – as long as concrete business value can be delivered from the data.
As a result of this, multiple companies have popped up and are building innovative products in the data ecosystem, catering even to very specific and smaller problem areas.
I see a trend of companies being willing to invest in multiple products to build their modern data stack rather than building a solid technical solution entirely in-house, be it using two different products just for data ingestion or a couple of BI tools catering to different audiences and use cases.
They don't mind having a dedicated solution for each of the pieces in their data stack, meaning a dedicated tool or two for things like data collection, ingestion, transformation, testing, lineage, reliability, visualizations, ad hoc analysis, reverse ETL, etc. What is more interesting is that the tools are starting to support each other, which makes it possible to build a fully operational modern data stack within a few months that is not limited to serving Business Intelligence, but is also well integrated with other use cases such as Operational Analytics, CRM activities, better marketing attribution and better customer engagement.
This growing trend of building data stacks mainly out of third-party tools is driven by a number of factors:
But it doesn't come without its own set of challenges:
It will be interesting to see if this trend will bring more technical debt or continue to deliver long-term net value.
The term “data-driven” has been firmly entrenched in the collective subconscious of modern business for a solid decade. The value of this approach is thought to be enormous, and its adoption has long since transformed from an ideal into an axiom. Yet in 2022, companies still struggle with interpreting this concept. Buzzwords abound both in the technical foundations of modern companies and in the higher echelons of business and management.
The most data-driven profession on the planet is that of the humble airline pilot. Surrounded by flight instruments presenting all types of metrics, pilots do their job of sitting and watching the machines work. Whenever a decision has to be made, it is informed only by data: from flight instruments, air traffic control on the radio, charts of weather conditions and mathematical short forms. The airline pilot often looks out their window and sees nothing but rain, darkness, clouds, or blue sky. Nothing to interpret, nothing for their gut feeling to hang onto – nothing to misguide them. And their own lives are on the line every single time – not bankruptcy, or a failed investment round, or a false step on the market – their lives. 99% of the time, the plane autopilots happily from airport to airport without a single intervention from the pilot. Typically, airline pilots do not question the nature of being data-driven, and more often than not, have a high degree of trust in the data the flight instruments present.
In 2022, we finally see more data tools, processes and paradigms (e.g. data mesh, self-serve analytics, decoupled ETL processes, metrics/metadata layers) being defined such that stakeholders (and shareholders) of companies can settle into their new roles of not trusting their gut. These updated data tools, processes and paradigms now have the power to change stakeholders’ decision-making entirely, ultimately providing the visceral understanding that being data-driven is not the same as being merely “informed”.
However, getting all the way to fully data-driven decision making requires implicit trust in the data, just like the airline pilot must trust their flight instruments. The data tools, processes and paradigms empower stakeholders to have more advanced analytics at their fingertips, but unfortunately do very little for their trust in the data.
Here, data quality is the missing component. It is therefore encouraging to see data quality slowly being turned into a measurable and observable quantity. Data quality is becoming a metric in and of itself that makes it possible to forget the ifs, ands and buts of how the insights came to be – just like it is for the airline pilot.
But this analogy extends beyond modern-day jobs. Ultimately, it’s about conveying stories about an event to a recipient, without being hindered on the way by a lack of trust. For millennia, the news of antelopes to hunt, lions on the hunt, a newborn child and other reasons for danger or celebration has been conveyed through drums beating in the deep jungle or smoke signals on the savannah. These signals were instinctually interpreted, just as the pilot instinctually interprets their flight instruments, without regard to how the information was formulated or how trustworthy it might be. In 2022, trust is finally becoming a measurable quantity and that is worth celebrating, because ultimately, it means one important thing: that stakeholders can do their jobs without bugging data engineers with annoying questions on Slack.
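To make the idea of data quality as a measurable quantity a little more concrete, here is a minimal sketch of two simple quality metrics: completeness (the share of required fields that are filled in) and freshness (whether the data was loaded recently enough). The function names, the sample rows and the 12-hour threshold are assumptions made for illustration; real data quality tooling tracks many more dimensions than these two.

```python
from datetime import datetime, timedelta, timezone


def completeness(rows: list[dict], required: list[str]) -> float:
    """Share of required fields that are present and not None."""
    total = len(rows) * len(required)
    if total == 0:
        return 1.0  # nothing to check, so nothing is missing
    filled = sum(
        1 for row in rows for field in required if row.get(field) is not None
    )
    return filled / total


def is_fresh(last_loaded: datetime, now: datetime, max_age: timedelta) -> bool:
    """True if the last load happened within the allowed age."""
    return now - last_loaded <= max_age


# Hypothetical sample data: two of the eight required values are missing.
rows = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": None},
    {"customer_id": None, "amount": 75.5},
    {"customer_id": 4, "amount": 10.0},
]
score = completeness(rows, ["customer_id", "amount"])

now = datetime(2022, 1, 2, 0, 0, tzinfo=timezone.utc)
last_loaded = datetime(2022, 1, 1, 18, 0, tzinfo=timezone.utc)
fresh = is_fresh(last_loaded, now, max_age=timedelta(hours=12))

print(f"completeness={score:.2f}, fresh={fresh}")
```

Numbers like these can be shown next to a dashboard, so that a stakeholder can see how far the data can be trusted, much as a pilot can see whether an instrument is working.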
New tools for the modern data stack are constantly emerging to simplify the lives of very busy data & analytics people. However, some manual work is harder to replace than other work. The work done by business analysts, data scientists and others on the business-facing side of the analytics team generally requires deep domain knowledge and is therefore hard to automate completely with tools.
In contrast, we have already seen tools like Fivetran and Airbyte do some heavy lifting for data engineers by seamlessly extracting and loading data into our data warehouses, where subsequent transformations have been boosted by our beloved dbt. Thanks to such tools, we actually don’t have a pure data engineer at Hedvig right now.
Having said that, one overarching trend I am seeing is a shift in focus from data engineering to data science and business intelligence. Going forward, I think we will see an increased focus on understanding the business and what problems to solve. More and more job ads for full stack data scientists will pop up, and analysts serving themselves with data, without having to depend on busy data engineers, will be the reality for more and more companies.
Here’s a quick recap of the trends discussed:
The trends brought forward by our Heroes of Data match the overarching trends we see in the industry and the community; the nature of data teams’ work is slowly shifting focus and becoming more product and business-oriented as data teams continue to adopt new technologies and third-party solutions that simplify many of their time-consuming tasks.
We hope you enjoyed these Heroes of Data insights. On a closing note, we have some exciting news to share: our first Heroes of Data meetup will take place in Stockholm on the 7th of June and we couldn’t be more excited! While spots are already fully booked, we will share key learnings in this newsletter, so stay tuned!
Liked the content and eager to learn more from the experience of seasoned data veterans? Don’t forget to subscribe to the Heroes of Data newsletter if you haven’t already!