Earlier this year, the LinkedIn 2020 Emerging Jobs Report listed the data engineer as number eight in its ranking of emerging jobs in the U.S., with a 33% annual hiring growth rate. More recently, Robert Half Technology’s Salary Guide ranked data engineering as the highest-paying IT job in 2021.
According to Robert Half, data engineering talent will continue to be compensated well above average as organizations rely heavily on “individuals who can transform large amounts of raw data into actionable information for strategy-setting, decision making and innovation.”
As the demand for data engineers increases, data teams need the right tools and resources to keep those engineers productive and effective, rather than having them spend the majority of their time maintaining existing systems, as a recent survey reported.
It’s important to acknowledge that the data engineering space is evolving quickly: new open-source projects and tools are released all the time, “best practices” are constantly changing, and data engineers are stretched thinner than ever as businesses increase their data demands. Not only that, but data engineers have recently seen firsthand the hybridization of the data lake and data warehouse, with modern warehouses blurring the distinction by separating compute from storage. This hybridization has turned the basic “ETL” pipelines of the past into more complex orchestrations that often both read from and write to warehouses. Additionally, as data scientist and data analyst colleagues operationalize their work, data engineers need to collaborate more closely with these functions and further empower them. With all of this pressure and constant change, it can be difficult to keep up with the space (or even enter it!).
From podcasts to blogs, below is a roundup of resources for both new data engineers and seasoned professionals looking to stay up to date with the latest in the world of data engineering.
The Data Engineering Podcast
It’s hard to beat the Data Engineering Podcast for relevance in the space. With a new episode each week, the pace is easy to follow. Keeping up with the subject matter, however, is another story. Topics range from companies discussing their data architecture and technical challenges to deep dives into open-source and paid offerings.
I’ve highlighted a few of the fascinating episodes below:
Presto Distributed SQL Engine: We’ve gotten so used to aggregating data to get it ready for querying that it’s illuminating to think about the pattern from the other side: federating query execution to query and combine data where it resides. Hearing the pros and cons of this approach adds another tool to a data engineer’s toolbox.
Jepsen: The legendary Jepsen project has been diving deep into how distributed systems are designed and built and, perhaps most importantly, how they deal with failure conditions. Although this sits deeper in the stack than most day-to-day work, getting more familiar with the distributed systems that we rely on and are building (as we tie together multiple disparate systems) is a boon.
Non-Profit Data Professional: The guest on this episode, a director of data infrastructure at a non-profit, shares how challenging it can be to work with a multitude of data sources and compliance changes all while on a strict budget — something many data professionals at today’s startups (or non-profits in this case) can relate to and learn something from.
Subscribing to this podcast can be a great way to dedicate some time each week to stay ahead of the curve.
Big Data at InfoQ
Maybe podcasts aren’t so much your style, or maybe you crave more than one a week. If you’re looking for more resources (in multiple formats including podcasts, articles, presentations, and more), InfoQ’s Big Data section can be a great place to dig in further. The company has been making a name for itself by building an editorial community of engineers and practitioners rather than journalists, and the content reflects this choice. The Big Data section covers a wide range of topics including AI, machine learning, and data engineering. While it’s not entirely focused on data engineering, it’s an informative way to learn more about adjacent spaces.
Awesome Big Data
I’m a big fan of the software community’s trend of creating GitHub repositories with curated lists of resources focused on a specific area (aka the awesome list). In my Clojure days, awesome-clojure was a great resource for looking up different database adapters or linters. Luckily, there is an “awesome” for Big Data, which includes subsections for Data Engineering and Public Datasets (those always seem to come in handy). The curation seemed a bit tighter before the project exploded in popularity, but fret not: even a giant list of public datasets (in the energy space, for example) is incredibly valuable.
The data engineering space is moving quickly, whether measured by hiring growth rate, business needs, or toolset evolution. The resources available are fortunately keeping pace. Although it’s easy to get lost in the day-to-day grind of projects, I’ve been able to rekindle enthusiasm and keep up with the space by taking a step back to learn about newer paradigms/technologies. With this list of resources, I hope you all find the jumping-off point to do the same!