Friday, December 2, 2022
The Future of the Modern Data Stack in 2022 – Atlan

Featuring the 6 big ideas you should know from 2021

As the data world slowed down for the holidays, I got some downtime to step back and think about the last year. And I can’t help but think, wow, what a year it’s been!

Is it just me, or did data go through five years’ worth of change in 2021?

It’s partially COVID time, where a month feels like a day and a year at the same time. You’d blink, and suddenly there would be a new buzzword dominating Data Twitter. It’s also partially the deluge of VC money and crazy startup rounds, which added fuel to the year’s data fire.

With so much hype, it’s hard to know which trends are here to stay and which will disappear just as quickly as they arose.

This blog breaks down the six ideas you should know about the modern data stack going into 2022: the ones that exploded in the data world last year and don’t seem to be going away.

Future of the modern data stack in 2022: Data Mesh

You probably know this term by now, even if you don’t exactly know what it means. The idea of the “data mesh” came from two 2019 blogs by Zhamak Dehghani, Director of Emerging Technologies at Thoughtworks:

  1. How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
  2. Data Mesh Principles and Logical Architecture

Its core idea is that companies can become more data-driven by moving from centralized data warehouses and lakes to a “domain-oriented decentralized data ownership and architecture” driven by self-serve data and “federated computational governance”.

As you can see, the language around the data mesh gets complex fast, which is why there’s no shortage of “what actually is a data mesh?” articles.

The idea of the data mesh had been quietly growing since 2019, until suddenly it was everywhere in 2021. The Thoughtworks Technology Radar moved Data Mesh’s status from “Assess” to “Trial” in just one year. The Data Mesh Learning Community launched, and their Slack group got over 1,500 signups in 45 days. Zalando started doing talks about how it moved to a data mesh.

Soon enough, hot takes were flying back and forth on Twitter, with data leaders arguing over whether the data mesh is revolutionary or ridiculous.

In 2022, I think we’ll see a ton of platforms rebrand and offer their services as the “ultimate data mesh platform”. But the thing is, the data mesh isn’t a platform or a service that you can buy off the shelf. It’s a design concept with some wonderful principles like distributed ownership, domain-based design, data discoverability, and data product shipping standards, all of which are worth trying to operationalize in your organization.

So here’s my advice: As data leaders, it is important to stick to the first principles at a conceptual level, rather than buy into the hype that you’ll inevitably see in the market soon. I wouldn’t be surprised if some teams (especially smaller ones) can achieve the data mesh architecture through a fully centralized data platform built on Snowflake and dbt, while others will leverage the same principles to consolidate their “data mesh” across complex multi-cloud environments.

Future of the modern data stack in 2022: Metrics Layer

Metrics are critical to assessing and driving a company’s growth, but they’ve been struggling for years. They’re often split across different data tools, with different definitions for the same metric across different teams or dashboards.

In 2021, people finally started talking about how the modern data stack could fix this issue. It’s been called the metrics layer, metrics store, headless BI, and even more names than I can list here.
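To make the idea concrete, here is a minimal sketch in Python of what "one definition, many consumers" could look like. It is not any vendor's API: the `Metric` class, the `METRICS` registry, `query_metric`, and the `net_revenue` formula are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, float]


@dataclass(frozen=True)
class Metric:
    """A single shared definition of a metric (hypothetical structure)."""
    name: str
    description: str
    compute: Callable[[List[Row]], float]


# Hypothetical central registry: the one place where a metric is defined.
METRICS: Dict[str, Metric] = {
    "net_revenue": Metric(
        name="net_revenue",
        description="Order amount minus refunds",
        compute=lambda rows: sum(r["amount"] - r["refund"] for r in rows),
    ),
}


def query_metric(name: str, rows: List[Row]) -> float:
    """Every consumer (dashboard, notebook, ...) goes through this lookup."""
    return METRICS[name].compute(rows)


orders = [
    {"amount": 100.0, "refund": 0.0},
    {"amount": 250.0, "refund": 50.0},
]

# Two different "tools" asking for the same metric get the same answer,
# because neither of them re-implements the formula.
dashboard_value = query_metric("net_revenue", orders)
notebook_value = query_metric("net_revenue", orders)
```

The point of the sketch is only the shape: the formula lives in one registry entry, so changing the definition changes it for every consumer at once.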

It started in January, when Base Case proposed “Headless Business Intelligence”, a new approach to solving metrics problems. A couple of months later, Benn Stancil from Mode talked about the “missing metrics layer” in today’s data stack.

That’s when things really took off. Four days later, Mona Akmal and Aakash Kambuj from Falkon published articles about making metrics first-class citizens and the “modern metrics stack”.

Two days after that, Airbnb announced that it had been building a home-grown metrics platform called Minerva to solve this issue. Other prominent tech companies soon followed suit, including LinkedIn’s Unified Metrics Platform, Uber’s uMetric, and Spotify’s metrics catalog in their “new experimentation platform”.

Just when we thought this fervor had died down, Drew Banin (CPO and Co-Founder of dbt) opened a PR on dbt-core in October. He hinted that dbt would be incorporating a metrics layer into its product, and even included links to those foundational blogs by Benn and Base Case. The PR blew up and reignited the discussion around building a better metrics layer in the modern data stack.

Meanwhile, a bunch of early-stage startups have launched to compete for this space. Transform is probably the biggest name so far, but Metriql, Lightdash, Supergrain, and Metlo also launched this year. Some bigger names are also pivoting to compete in the metrics layer, such as GoodData’s foray into Headless BI.

I’m extremely excited about the metrics layer finally becoming a thing. A few months ago, George Fraser from Fivetran had an unpopular opinion that all metrics stores will evolve into BI tools. While I don’t fully agree, I do believe that a metrics layer that isn’t tightly integrated with BI is unlikely to ever become commonplace.

However, existing BI tools aren’t really incentivized to integrate an external metrics layer into their tools… which makes this a chicken-and-egg problem. Standalone metrics layers will struggle to encourage BI tools to adopt their frameworks, and will be forced to build BI themselves, like Looker was forced to years ago.

This is why I’m really excited about dbt announcing their foray into the metrics layer. dbt already has enough distribution to encourage at least the modern BI tools (e.g. Preset, Mode, Thoughtspot) to integrate deeply into the dbt metrics API, which may create competitive pressure for the larger BI players.

I also think that metrics layers are so deeply intertwined with the transformation process that intuitively this makes sense. My prediction is that we’ll see metrics become a first-class citizen in more transformation tools in 2022.

Future of the modern data stack in 2022: Reverse ETL

For years, ETL (Extract, Transform, Load) was how data teams populated their systems. First, they’d pull data from third-party systems, clean it up, and then load it into their warehouses. This was great because it kept data warehouses clean and orderly, but it also meant that it took forever to get data into warehouses. Sometimes, data teams just wanted to dump raw data into their systems and deal with it later.

That’s why many companies moved from ETL to ELT (Extract, Load, Transform) a few years ago. Instead of transforming data first, companies would send raw data into a data lake, then transform it later for a specific use case or problem.
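The difference between the two is really just the order of the steps. Here is a toy Python sketch of that ordering; the `extract` and `transform` functions, the in-memory `dict` standing in for a warehouse, and the table names are all made up for illustration.

```python
from typing import Dict, List

Record = Dict[str, str]
Warehouse = Dict[str, List[Record]]


def extract() -> List[Record]:
    """Stand-in for pulling rows from a third-party system."""
    return [{"email": " Ada@Example.com "}, {"email": "grace@example.com"}]


def transform(rows: List[Record]) -> List[Record]:
    """Clean-up step: trim whitespace and lowercase the email."""
    return [{"email": r["email"].strip().lower()} for r in rows]


def run_etl(warehouse: Warehouse) -> None:
    # ETL: transform first, so only clean data ever reaches the warehouse.
    warehouse["customers"] = transform(extract())


def run_elt(warehouse: Warehouse) -> None:
    # ELT: load the raw data as-is, then transform it inside the warehouse later.
    warehouse["raw_customers"] = extract()
    warehouse["customers"] = transform(warehouse["raw_customers"])


etl_warehouse: Warehouse = {}
run_etl(etl_warehouse)

elt_warehouse: Warehouse = {}
run_elt(elt_warehouse)
```

Both runs end with the same clean `customers` table, but only the ELT warehouse also keeps the raw copy, which is what lets teams "deal with it later".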

In 2021, we got another major evolution of this idea: reverse ETL. This concept first started getting attention in February, when Astasia Myers (Founding Enterprise Partner at Quiet Capital) wrote an article about the emergence of reverse ETL.

Since then, Hightouch and Census (both of which launched in December 2020) have set off a firestorm as they’ve battled to own the reverse ETL space. Census announced that it raised a $16 million Series A in February and published a series of benchmarking reports targeting Hightouch. Hightouch countered with three raises totaling $54.2 million in less than 12 months.

Hightouch and Census have dominated the reverse ETL discussion this year, but they’re not the only ones in the space. Other notable companies are Grouparoo, HeadsUp, Polytomic, Rudderstack, and Workato (who closed a $200m Series E in November). Seekwell even got acquired by Thoughtspot in March.

I’m pretty excited about everything that’s solving the “last mile” problem in the modern data stack. We’re now talking more about how to use data in daily operations than how to warehouse it, which is an incredible sign of how mature the fundamental building blocks of the data stack (warehousing, transformation, etc.) have become!

What I’m not so sure about is whether reverse ETL should be its own space or just be combined with a data ingestion tool, given how similar the fundamental capabilities of piping data in and out are. Players like Hevodata have already started offering both ingestion and reverse ETL services in the same product, and I believe that we might see more consolidation (or deeper go-to-market partnerships) in the space soon.
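For readers who haven’t seen reverse ETL in practice, here is a rough Python sketch of the "piping data out" direction, assuming the common description of reverse ETL as syncing modeled warehouse data back into an operational tool. `FakeCRM` is an in-memory stand-in rather than a real client library, and the table, field names, and `reverse_etl_sync` function are hypothetical.

```python
from typing import Dict, List

# Hypothetical modeled table that already lives in the warehouse.
warehouse_table: List[Dict[str, object]] = [
    {"email": "ada@example.com", "lifetime_value": 1200},
    {"email": "grace@example.com", "lifetime_value": 300},
]


class FakeCRM:
    """In-memory stand-in for an operational tool's API (not a real client)."""

    def __init__(self) -> None:
        self.contacts: Dict[str, Dict[str, object]] = {}

    def upsert_contact(self, email: str, fields: Dict[str, object]) -> None:
        # Create the contact if it is missing, then merge in the new fields.
        self.contacts.setdefault(email, {}).update(fields)


def reverse_etl_sync(rows: List[Dict[str, object]], crm: FakeCRM) -> int:
    """Push each warehouse row out to the operational tool, keyed by email."""
    synced = 0
    for row in rows:
        fields = {key: value for key, value in row.items() if key != "email"}
        crm.upsert_contact(str(row["email"]), fields)
        synced += 1
    return synced


crm = FakeCRM()
synced_count = reverse_etl_sync(warehouse_table, crm)
```

Structurally this is the mirror image of an ingestion job (read rows from one system, write them to another), which is the similarity the consolidation argument above rests on.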

Future of the modern data stack in 2022: Active Metadata & Third-Gen Data Catalogs

In the last couple of years, the debate around data catalogs was, “Are they obsolete?” And it would be easy to think the answer is yes. In a couple of famous articles, Barr Moses argued that data catalogs were dead, and Michael Kaminsky argued that we don’t need data dictionaries.

On the other hand, there’s never been so much buzz about data catalogs and metadata. There are so many data catalogs that Rohan from our team created a “catalog of catalogs”, which feels both ridiculous and completely necessary. So which is it: are data catalogs dead or stronger than ever?

This year, data catalogs got new life with the creation of two new concepts: third-generation data catalogs and active metadata.

At the beginning of 2021, I wrote an article on modern metadata for the modern data stack. I introduced the idea that we’re entering the third generation of data catalogs, a fundamental transformation from the prevalent old-school, on-premise data catalogs. These new data catalogs are built around diverse data assets, “big metadata”, end-to-end data visibility, and embedded collaboration.

This idea got amplified by a huge move Gartner made this year: scrapping its Magic Quadrant for Metadata Management Solutions and replacing it with the Market Guide for Active Metadata. In doing this, they introduced “active metadata” as a new category in the data space.

What’s the difference? Old-school data catalogs collect metadata and bring it into a siloed “passive” tool, aka the traditional data catalog. Active metadata platforms act as two-way platforms: they not only bring metadata together into a single store like a metadata lake, but also leverage “reverse metadata” to make metadata available in daily workflows.

Since the first time we wrote about third-generation catalogs, they’ve become part of the discourse around what it means to be a modern data catalog. We even saw the terms pop up in RFPs!

Snippet of an anonymized RFP

At the same time, VCs have been eager to invest in this new space. Metadata management has grown a ton with raises across the board, e.g. Collibra’s $250m Series G, Alation’s $110m Series D, and our $16m Series A at Atlan. Seed-stage companies like Stemma and Acryl Data also launched to build managed metadata solutions on existing open-source projects.

The data world will always be diverse, and that diversity of people and tools will always lead to chaos. I’m probably biased, given that I’ve dedicated my life to building a company in the metadata space. But I truly believe that the key to bringing order to the chaos that is the modern data stack lies in how we can use and leverage metadata to create the modern data experience.

Gartner summarized the future of this category in a single sentence: “The stand-alone metadata management platform will be refocused from augmented data catalogs to a metadata ‘anywhere’ orchestration platform.”

Where data catalogs in the 2.0 generation were passive and siloed, the 3.0 generation is built on the principle that context needs to be available wherever and whenever users need it. Instead of forcing users to go to a separate tool, third-gen catalogs will leverage metadata to improve existing tools like Looker, dbt, and Slack, finally making the dream of an intelligent data management system a reality.

While there’s been a ton of activity and funding in the space in 2021, I’m pretty sure we’ll see the rise of a dominant and truly third-gen data catalog (aka an active metadata platform) in 2022.

Future of the modern data stack in 2022: Data Teams as Product Teams

As the modern data stack goes mainstream and data becomes a bigger part of daily operations, data teams are evolving to keep up. They’re no longer “IT folks”, working separately from the rest of the company. But this raises the question: how should data teams work with the rest of the company? Too often, they get stuck in the “service trap”, fielding never-ending questions and requests for creating stats rather than generating insights and driving impact through data.

Emilie Schario’s iconic image on the reality of working on a data team. (Image from MDSCON 2021.)

In 2021, Emilie Schario from Amplify Partners, Taylor Murphy from Meltano, and Eric Weber from Stitch Fix talked about a way to break data teams out of this trap: rethinking data teams as product teams. They first explained this idea with a blog on Locally Optimistic, followed by great talks at conferences like MDSCON, dbt Coalesce, and Future Data.

A product isn’t measured on how many features it has or how quickly engineers can quash bugs; it’s measured on how well it meets customers’ needs. Similarly, data product teams should be focused on the users (i.e. data consumers throughout the company), rather than questions answered or dashboards built. This allows data teams to focus on experience, adoption, and reusability, rather than ad-hoc questions or requests.

This focus on breaking out of the service trap and reorienting data teams around their users really resonated with the data world this year. More people have started talking about what it means to build “data product teams”, including plenty of hot takes on who to hire and how to set goals.

Of all the hyped trends in 2021, this is the one I’m most bullish on. I believe that in the next decade, data teams will emerge as one of the most important teams in the organization fabric, powering the modern, data-driven companies at the forefront of the economy.

However, the reality is that data teams today are stuck in a service trap, and only 27% of their data projects are successful. I believe the key to fixing this lies in the concept of the “data product” mindset, where data teams focus on building reusable, reproducible assets for the rest of the team. This will mean investing in user research, scalability, data product shipping standards, documentation, and more.

Future of the modern data stack in 2022: Data Observability

This idea came out of “data downtime”, which Barr Moses from Monte Carlo first spoke about in 2019, saying, “Data downtime refers to periods of time when your data is partial, erroneous, missing or otherwise inaccurate”. It’s those emails you get the morning after a big project, saying “Hey, the data doesn’t look right…”

Data downtime has been a part of normal life on a data team for years. But now, with many companies relying on data for literally every aspect of their operations, it’s a huge deal when data stops working.

Yet everyone was just reacting to issues as they cropped up, rather than proactively preventing them. This is where data observability, the idea of “monitoring, tracking, and triaging of incidents to prevent downtime”, came in.
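As a rough illustration of what "proactively" can mean, here is a small Python sketch of two checks a monitor might run on a table: freshness and missing values. It is a toy under stated assumptions, not how any observability product works; `check_table`, its thresholds, and the `amount` column are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def check_table(
    rows: List[Row],
    loaded_at: datetime,
    now: datetime,
    column: str = "amount",
    max_age: timedelta = timedelta(hours=24),
    max_null_rate: float = 0.1,
) -> List[str]:
    """Return a list of human-readable issues; an empty list means healthy."""
    issues: List[str] = []

    # Freshness: has the table been loaded recently enough?
    if now - loaded_at > max_age:
        issues.append("data is stale")

    # Volume: an empty table is its own kind of downtime.
    if not rows:
        issues.append("table is empty")
        return issues

    # Completeness: share of rows where the column is missing or None.
    nulls = sum(1 for row in rows if row.get(column) is None)
    null_rate = nulls / len(rows)
    if null_rate > max_null_rate:
        issues.append(
            f"{column} null rate {null_rate:.0%} exceeds {max_null_rate:.0%}"
        )
    return issues


now = datetime(2022, 1, 2, 12, 0)

# Loaded 36 hours ago with half the values missing: two issues expected.
broken_issues = check_table(
    rows=[{"amount": 10.0}, {"amount": None}],
    loaded_at=datetime(2022, 1, 1, 0, 0),
    now=now,
)

# Loaded 2 hours ago with complete values: no issues expected.
healthy_issues = check_table(
    rows=[{"amount": 10.0}, {"amount": 12.5}],
    loaded_at=datetime(2022, 1, 2, 10, 0),
    now=now,
)
```

Run on a schedule, checks like these turn the "Hey, the data doesn’t look right…" email into an alert the data team sees first.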

I still can’t believe how quickly data observability has gone from being just an idea to a key part of the modern data stack. (Recently, it’s even started being called “data reliability” or “data reliability engineering”.)

The space went from being non-existent to hosting a bunch of companies, with a collective $200m of funding raised in 18 months. This includes Acceldata, Anomalo, Bigeye, Databand, Datafold, Metaplane, Monte Carlo, and Soda. People even started creating lists of new “data observability companies” to help keep track of the space.

I believe that in the past two years, data teams have realized that tooling to improve productivity is not a nice-to-have but a must-have. After all, data professionals are some of the most sought-after hires you’ll ever make, so they shouldn’t be wasting their time on troubleshooting pipelines.

So will data observability be a key part of the modern data stack in the future? Absolutely. But will data observability continue to exist as its own category, or will it be merged into a broader category (like active metadata or data reliability)? This is what I’m not so sure about.

Ideally, if you have all your metadata in one open platform, you should be able to leverage it for a variety of use cases (like data cataloging, observability, lineage and more). I wrote about that idea last year in my article on the metadata lake.

That being said, today, there’s a ton of innovation that these areas need independently. My sense is that we’ll continue to see fragmentation in 2022 before we see consolidation in the years to come.

Future of the modern data stack in 2022: Last thoughts

It may feel chaotic and crazy at times, but today is a golden age of data.

In the last eighteen months, our data tooling has grown exponentially. We all make a lot of fuss about the modern data stack, and for good reason: it’s so much better than what we had before. The earlier data stack was frankly as broken as broken could get, and this giant leap forward in tooling is exactly what data teams needed.

In my opinion, the next “delta” on the horizon for the data world is the modern data culture stack: the best practices, values, and cultural rituals that will help us diverse humans of data collaborate effectively and up our productivity as we tackle our new data stacks.

However, we can only think about working together better with data when we’ve nailed, well, working with data. We’re on the cusp of getting the modern data stack right, and we can’t wait to see what new developments and trends 2022 will bring!

This article was originally published on Towards Data Science.

Header picture: Mike Kononov on Unsplash


