Six Books that Have Shaped My Data Career | by Chad Isenberg | Mar, 2023

By Jessie Hobb On Mar 30, 2023

Great reads on modeling, processes, and leadership

A large wrapping bookshelf in a library. — Photo by Emil Widlund on Unsplash

At the very start of my journey in data, I thought I was going to be a data scientist, and my first foray into data was centered on studying statistics and linear algebra, not software engineering or database management. Fairly early in my career, however, I realized that I enjoyed building data assets more than reports or ML models. If you’re interested in those early days, how I grew my career, and advice for newcomers to data, take a look at my earlier article.

In this article, I want to focus on my on-again, off-again relationship with books and reading. A lifetime ago, I was pursuing an academic career in American and English literature. I specialized in 19th-century American literature, and as anyone who’s pursued advanced degrees in any field can tell you, I read a lot: fiction, poetry, non-fiction, non-fiction about fiction, non-fiction about non-fiction about fiction, and other permutations. I burned out spectacularly about a year into a PhD program, and my relationship with books ended for quite some time.

I returned to reading within the last few years, and my desire to grow and learn in the data space has been a huge part of that. I want to stress that reading books is not sufficient for getting into the field, getting a promotion, or getting your next job; there’s far more to building a career than accumulating knowledge. Still, I think there are two huge career benefits to reading books from your industry:

First, you do get those ideas, concepts, and knowledge in a way that’s different from learning on the job or even through watching someone else demonstrate a skill. Not only are you benefiting from the discipline the writer had to go through to create the work, but you’re benefiting from your own discipline in reading.
Second, reading is a social activity. In addition to the relationship you build between the authors and yourself, by reading a book, you have a touchpoint with the broader community. Even if you haven’t read any of the books below, you’ve probably at least heard of some of them. In much the same way that software frameworks drive alignment, so too can foundational literature in a field.

With that being said, let’s get into the books!

I’m not going to bury the lede. If you work in data, you at the very least need to be familiar with dimensional modeling concepts, and I personally don’t think there’s a better way than by going straight to the source: The Data Warehouse Toolkit by Ralph Kimball and Margy Ross. It’s possible that just having this on your bookshelf will make you a better data professional; it’s just that important.

I was completely naive to dimensional modeling when I first picked up this book as a data analyst, and it was transformational. I ended up reading it cover-to-cover (which I would not recommend; see below), and that’s when I knew I wanted to transition from reporting and analytics to modeling and building.

While the book is light on implementation specifics, the concepts are rock-solid. The simple 4-step design process is still shockingly effective, even decades later. Once you understand the fundamentals of dimensions, facts, and how they work together to describe business processes, you have a great set of tools to start solving business problems.
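For readers who haven’t met the process, the four steps are usually summarized as: select the business process, declare the grain, identify the dimensions, and identify the facts. Below is a minimal sketch of how those steps might play out, assuming a hypothetical retail sales process. Every table name, column name, and value is invented for illustration, and plain Python dictionaries stand in for warehouse tables.

```python
from collections import defaultdict

# Step 1: select the business process (hypothetical: retail sales).
# Step 2: declare the grain (one row per product per sales transaction line).

# Step 3: identify the dimensions, the descriptive context for each fact row.
dim_product = {
    1: {"name": "Espresso beans", "category": "Coffee"},
    2: {"name": "Green tea", "category": "Tea"},
}
dim_date = {
    20230301: {"month": "2023-03"},
    20230401: {"month": "2023-04"},
}

# Step 4: identify the facts, the numeric measurements at the declared grain.
fact_sales = [
    {"date_key": 20230301, "product_key": 1, "quantity": 2, "amount": 30.0},
    {"date_key": 20230301, "product_key": 2, "quantity": 1, "amount": 8.0},
    {"date_key": 20230401, "product_key": 1, "quantity": 1, "amount": 15.0},
]


def sales_by(dimension, key_column, attribute):
    """Sum the sales amount, grouped by one attribute of one dimension."""
    totals = defaultdict(float)
    for row in fact_sales:
        # Follow the foreign key from the fact row to its dimension row.
        group = dimension[row[key_column]][attribute]
        totals[group] += row["amount"]
    return dict(totals)


sales_by_category = sales_by(dim_product, "product_key", "category")
sales_by_month = sales_by(dim_date, "date_key", "month")
```

The same fact rows answer both questions; only the dimension you group by changes, which is much of the appeal of the approach.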

There’s an excellent reader’s guide from Holistics.io that I would encourage you to check out before diving in. I think that the entire book is worth reading, but when you read the later chapters is going to depend on the specific data modeling problems you encounter. And for that matter, while it’s no longer maintained, the Kimball Group Articles & Design Tips archive is still full of excellent ideas and approaches.

Several years ago, I was fortunate enough to participate in an 8-week data engineering workshop sponsored by 1904labs, a consultancy in the St. Louis area. At this point in my career, I had only been exposed to SQL and R, so this felt like a huge step forward. During the course, we got hands-on experience with Kafka, Scala, Spark, HBase, and Hive, and I was hooked. I wanted to know more about the platforms and underlying technologies upon which data assets were built.

Designing Data-Intensive Applications (DDIA) is perhaps the second-most recognized text in data, and for good reason. Kleppmann strikes an amazing balance between accessibility and depth. As someone who had minimal exposure to computer science, I found it pretty easy to follow along with his explanations of encodings, database design (and tradeoffs), and distributed architecture. At the same time, I got a real sense of how various data technologies work.

You don’t necessarily have to read the book cover-to-cover, but the book has a narrative structure that makes this reading strategy extremely effective. Even more than the technical details, I found the history of data technologies to be the most important takeaway; you get to understand the motivation behind the tools. Why do we have RDBMS? What problems do document stores solve? Why Spark and not Hadoop? Hands-on experience will give you the most insight into those answers, but DDIA comes in a close second.

I found The DevOps Handbook immensely motivating at a time when I was struggling professionally. I was in a traditional BI department, and while we were doing great work, I was hungry to lean into processes. I identified strongly with the DevOps movement’s roots in Lean and The Toyota Way: improvement methodologies I was introduced to during my time in healthcare.

“The Three Ways” (flow, feedback, and continuous learning and experimentation) are all broadly applicable across the software development space, with which data has become increasingly aligned. Whether you’re a developer or a manager, you’ll find yourself nodding your head at both pain points and solutions. Sections about trust and safety seem especially applicable to the modern data landscape, where some of our biggest challenges are in a) delivering high-quality data and b) getting our stakeholders to accept that the data is indeed high-quality!

The book is full of excellent case studies that illustrate both the rewards and challenges in undergoing a DevOps transformation. As you might expect, processes are important, but culture change is the actual goal. Getting broad business alignment on your value chain is difficult but powerful; ultimately, it’s about getting everyone on board to design, develop, deliver, and operate software solutions in the most efficient (and safest) way possible.

I started my obsession with process work here. At my BI developer job, we were using Redgate tools to manage some aspects of our development. At some point or another, I became fascinated with some tooling that we didn’t use, especially Flyway; there were some immediate pain points that applying version control to our database objects seemed to solve.

That led me to crack open Database Reliability Engineering, a concise but excellent book on how to apply site reliability engineering practices to databases. For me, the most valuable takeaway is applying the reliability mindset to data stores. How do you standardize databases with “golden images”? What do error budgets look like? What do proper database instrumentation, monitoring, and observability look like? In short, what really matters when it comes to your databases / data warehouses?
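To make the error-budget question a little more concrete: an error budget is commonly understood as the amount of unreliability an availability target leaves you. Here is a minimal sketch of that arithmetic, assuming a hypothetical 99.9% availability target over a 30-day window. The target, the window, and the function names are my own illustrative choices, not figures or code from the book.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability target over a window.

    `slo` is a fraction strictly between 0 and 1, e.g. 0.999 for "three nines".
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo)


def budget_remaining(slo: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Budget left after observed downtime; a negative value means it is overspent."""
    return error_budget_minutes(slo, window_days) - downtime_minutes


# Hypothetical numbers: a 99.9% target allows roughly 43 minutes per 30 days.
allowed = error_budget_minutes(0.999)
left_after_incident = budget_remaining(0.999, downtime_minutes=10.0)
```

Framed this way, a ten-minute outage stops being an abstract failure and becomes a measurable withdrawal from a shared budget, which is the kind of conversation the reliability mindset is meant to enable.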

What the authors do best is provide a vision for elevating the DBA role from a purely operational one to a combination of operations and enablement. Yes, you still need someone who can provide operational support to your data stores, but there’s a huge amount of value to be had in democratizing them. When individual development teams can safely and easily spin up “blessed” assets, you reduce their friction, and you free up your database team to further refine automation and processes, all of which increases safety.

It doesn’t hurt that Charity Majors has awesome stickers:

A pile of assorted stickers on top of an envelope.

As I’ve progressed in my career, I’ve become increasingly interested in processes and how they apply to technical solutions. Don’t get me wrong; technology is still really important to me, but I also want to see it in action. It’s one thing to explore a repository and play with a demo, and it’s another thing to see how a technology enables an organization to meet its goals.

To that end, I finally picked up this book after countless recommendations and endless praise, and all I can say is that it lives up to the hype. Team Topologies is, if nothing else, a case study on Conway’s Law, and the authors relentlessly drive toward systematizing it. I think their observations on team types and interactions are especially insightful and provide a roadmap for better projects and better software.

Data products in particular can suffer from silos and fragmentation. Data source owners, engineers, analysts, and business stakeholders typically operate in distinct enclaves, with insufficient contact to accommodate discoveries during development or evolving requirements. Thoughtfully arranging teams and designing the right interfaces could go a long way to reducing friction during the design, development, and eventual release of data assets.

In my previous career in healthcare, I ultimately found myself doing people management, which I found stressful and unfulfilling. My dissatisfaction is what motivated me to get into data a little over 5 years ago. In that time, I’ve learned and done more than I ever thought I would, and I’m really happy I made the transition. At this point in my career, I’ve slowed down on the constant learning; I’m able to recognize patterns, themes, and ways of working. Fairly recently, I’ve started focusing more on what my career looks like over the coming decades. I want my work to be interesting, but I also want it to be impactful.

Will Larson’s staff engineering site and related book are required reading for anyone with a serious interest in software, regardless of level (entry all the way to staff+) and path (individual contributor vs. management). The book does provide some structure, such as defining the staff archetypes, the importance of writing, and how to articulate your technical vision to key stakeholders; however, there are ample case studies from staff engineers from a variety of backgrounds, and I found these some of the most enjoyable reading.

To reiterate, regardless of whether or not you want to be a staff engineer, I think this book is important because it gets you thinking about technical leadership and impact. Early in your career, delivery and execution seem like the only things that matter. While these are important success criteria, they are far from sufficient to ensure that you and your team a) are doing meaningful work that drives the business forward and b) will be recognized and rewarded for that impact.
