At Validio, we’ve had the pleasure of speaking with 100+ modern data teams globally, many of them located in the Nordics. During our conversations, we’ve covered topics such as the requirements for data pipelines, the rationale for technology choices and the challenges involved in building out data infrastructure. One topic we’ve covered extensively is the technology preferences and tools that data teams use in their data stacks. We’ve taken a look at our notes, crunched some data, and now share our findings on the 20 most popular data engineering tools being used in the Nordics.
In addition, we’ll do a deep dive on the adoption of cloud data warehouses in the Nordic region. As noted by Matt Turck in the 2021 Machine Learning, AI and Data Landscape analysis, modern cloud data warehouses have unlocked an entire ecosystem of tools and companies:
“Today, cloud data warehouses (Snowflake, Amazon Redshift and Google BigQuery) and lakehouses (Databricks) provide the ability to store massive amounts of data in a way that’s useful, not completely cost-prohibitive and doesn’t require an army of very technical people to maintain. In other words, after all these years, it is now finally possible to store and process Big Data. That is a big deal, and has proven to be a major unlock for the rest of the data/AI space.” - Matt Turck, Partner at FirstMark Capital
The companies we’ve spoken with are primarily fast-growing scaleups and unicorns; hence a majority of the data teams we’ve spoken to have been able to build greenfield data stacks, without being entrenched in legacy on-prem systems needing migrations or retrofit integrations. In light of that, don’t be surprised if you don’t see tools such as Oracle database or Microsoft SQL Server on the list.
Let’s dig in!
Perhaps the most interesting thing about modern data tool usage in the Nordics is the prevalent usage of Airflow, dbt and BigQuery—all of which are being used by almost half of the teams we’ve spoken to.
Furthermore, there’s one tool close to the bottom of the list that might be considered an odd bird for the global data community, and that further suggests that the data is collected from the Nordics: the workflow orchestration tool Luigi. Two weeks ago Spotify’s engineering team published an article explaining why they’re switching their workflow orchestration tooling away from Luigi (which first started as an internal tool at Spotify). This spurred a lot of discussion within the data community. For example, Erik Bernhardsson, one of the main maintainers of Luigi while he was at Spotify, started a Twitter thread discussing why Luigi didn’t reach worldwide adoption like Airflow (which originates from Airbnb). Spotify is, by the way, not switching their data orchestration to Airflow, but to Flyte.
Looking at the list with different tooling categories in mind, Nordic scale-ups and unicorns have agreed on a set of category favorites:
One thing that may come as a surprise to some readers is to see Redshift down below in 10th place, especially considering that the AWS native data warehouse is often viewed as one of the first major tools responsible for ushering us into the era of cloud-native data infrastructure, or the Modern Data Stack as many call it today.
What’s more, the lead BigQuery has on its main competitors Snowflake and Redshift might also come as a surprise. Comparing our data with that from other parts of the world illustrates what we mean:
In other words, the data from our discussions with Nordic scale-ups clearly suggests that Nordic companies are adopting BigQuery, relative to Snowflake and Redshift, at a significantly higher rate than companies in other regions and the rest of the world.
Unless you’re a data professional in the Nordics with insight into multiple Nordic scale-ups, and the above statistics completely conform to your expectations, you might at this point question the validity of the data and the rigor of the data collection methodology (or lack thereof).
As Sahlin’s data suggests, Sweden-based companies (which serve as a representative sample of the Nordics) are indeed looking to hire BigQuery talent to a larger extent than companies in the US and the rest of the world.
In the same post, there are some noteworthy comments as to whether a few large companies (e.g. Spotify, IKEA and King, all of which are using BigQuery (3)) happened to be looking to fill multiple data roles when the job postings data was collected, potentially contributing to the large share of BigQuery postings (it turns out they accounted for slightly more than 10% of the postings). Given that the data from our surveyed scale-ups and unicorns and Sahlin’s job ad data point towards the same thing, we can conclude that the Nordics are indeed a BigQuery stronghold (4).
Clickbait headlines aside, is this a fluke or can we find structural reasons and a narrative behind the stats? Again, comments on Sahlin’s post on job postings offer some interesting points:
1. Nordic data talent were schooled at Nordic success stories such as Spotify, iZettle and King, where BigQuery was used
This one is Sahlin’s own hypothesis, where he suggests that data talent first worked at companies where BigQuery was the data warehouse of choice and later took senior positions at other companies, influencing the decision of which data tools to use. Prominent Swedish scale-ups like Spotify, iZettle (now Zettle after being acquired by PayPal) and King are all examples of heavy BigQuery users where data talent may have started their careers. (Although King’s parent company, Activision Blizzard, just got acquired by Microsoft - hello, Azure. Who would have thought you would need to live through a cloud migration when you’re already on cloud…)
2. Google has a strong local sales team
In the Nordic market, the local Google sales team seemingly has a good reputation, illustrated by comments such as:
As an outsider, one could speculate about what came first: the chicken or the egg? Is Google doubling down on the Nordics with a strong team allowing them to defend and grow their local market share, or has Google managed to gain a strong market position only because they have a strong team?
3. When using Google Cloud Platform (GCP), BigQuery comes out of the box
Which came first, the software engineer or the data engineer? Traditionally, in most companies, it’s the software engineer. If your software engineers are already on GCP, it’s not hard to imagine the account managers at Google upselling their existing accounts with their suite of data tools (including BigQuery) to data engineers, especially given the above anecdotal evidence of Google’s strong local team. Alternatively, data engineers may start to use BigQuery of their own accord, again, simply because their software engineer colleagues are already on GCP.
4. Flat and non-hierarchical companies in the Nordics adopt the community favorite
This one is our own hypothesis and something that struck us when we first saw the stats on the web communities of each of the data warehouses. Before taking a look at the community stats, we wanted to share a comment made about our CEO, Patrik, by one of our colleagues who recently moved to the Nordics (paraphrased): “Patrik doesn’t really interfere with our work, I haven’t had any other boss who has had so little to say about the work we do every day, and he is supposedly the big boss.”
This was naturally not a comment on Patrik’s competencies or leadership abilities, but rather a comment on the egalitarian work culture that Sweden and other Nordic countries are known for. An article published in The Local in 2019 discussed flat hierarchies in Sweden and interviewed a Nordic CEO: “You don’t need to be a group manager or a boss to be in charge [...] everybody can make a decision - as long as it aligns to the company’s plan and you take responsibility for it and inform everybody who would be affected by it.”
In other words, Nordic employees are encouraged to participate in and influence company decisions, supposedly to a greater extent than in other parts of the world, something we believe many Nordic citizens and expats in the area would agree with. (If you want to know more about flat hierarchies in the Nordics, we recommend taking a look at the article. A flat hierarchy doesn’t come without its own set of challenges.)
Let’s now change gears and get back to the community stats we mentioned earlier:
BigQuery being the clear favorite has a few implications:
By now, you’ve probably put two and two together and see where we’re going with this. If BigQuery is the community favorite with strong bottom-up, grassroots support, and Nordic companies encourage all employees regardless of tenure to participate in decision-making, wouldn’t it make sense that, when it comes to choosing technology, engineers would recommend the alternative they’ve heard so much about on the internet and from friends, and maybe even played around with in their spare time?
For anyone familiar with product-led growth (PLG) and community-led growth, what we’re essentially saying is that the Nordics have a work culture particularly well-suited for a PLG and community-led growth motion.
At Validio, we are building the next generation data quality validation and monitoring platform. As such, we expect solutions in this category to soon find themselves on these lists. Not only have we heard in the 100+ discussions we’ve had with modern data teams that data quality management is becoming a top priority, but companies such as GitLab are also publicly announcing data quality strategies and implementing specific OKRs.
What’s clear to us is that the number of tools out there won’t shrink any time soon. Regardless of what problem a new data tool aims to solve, it’s paramount that it integrates and plays nicely with existing tools, whether it’s data warehouses like BigQuery, Redshift and Snowflake or some ETL/ELT or workflow orchestration tool. This way, data teams can mix and match and pick the tools that best suit their needs. As Bessemer notes, new infrastructure providers need to work seamlessly with a company’s predominant tools if they are to achieve any real adoption.
Lastly, data engineering is not—and never has been—about any particular technology. Data engineering is about designing, building, and maintaining systems and data platforms that incorporate best-of-breed and fit-for-purpose technologies and practices in a cost-effective way. Tools need to provide value, the right type of deployment optionality and abstract complexity away from data engineering.