5 Pieces of Knowledge You Must Learn to Take Your Data Science Game to the Next Level | by Murtaza Ali | Nov, 2022


It takes a healthy combination of skills to excel in this interdisciplinary field

Photo by Yiran Ding on Unsplash

“I really want to get into data science.”

This is a phrase thrown around pretty loosely these days. Around every corner, there’s another person who emphatically insists that they want to learn the skills necessary to become a data scientist. I admit there is a large demand for this skill set within the current job market, but even then, it feels excessive at times.

But hey, I’m game. You want to be a data scientist? Enough talk, then. Time to back it up with some knowledge.

What knowledge?

I’m so glad you asked.

You must learn Python, and specifically, Pandas

R lovers, bless their souls, will hate me for this one.

Look, I’m not saying there’s anything wrong with R (even though there are many things wrong with R), but I am not budging on this one.

If you want to excel at data science, as opposed to merely dabbling in it, you need to learn Python. And once you do that, you need to learn Pandas.

Pandas is the gold standard for data cleaning, processing, and analysis today, and there are a number of reasons for it:

  • Being a Python module, its syntax is readable, concise, and reasonable to learn.
  • It integrates seamlessly with the wide variety of other tasks Python’s versatility as a language allows — building web applications, working with the cloud, and general software engineering (see the section below), to name a few.
  • Because of Pandas’ existing popularity, tools are continually being developed to make it even better. Want evidence? Check this out [1].
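To make "cleaning, processing, and analysis" a little more concrete, here is a minimal sketch of what those three steps can look like in Pandas. The survey data and the column names (`language`, `hours_per_week`) are invented purely for illustration, and filling gaps with the median is just one reasonable choice among many.

```python
import pandas as pd

# Hypothetical survey data, invented purely for illustration.
raw = pd.DataFrame(
    {
        "language": ["Python", "python ", "R", None, "Python", "R"],
        "hours_per_week": [10, 12, 8, 5, None, 6],
    }
)

# Cleaning: drop rows with no language, then normalize spacing and case.
clean = raw.dropna(subset=["language"]).copy()
clean["language"] = clean["language"].str.strip().str.capitalize()

# Processing: fill the missing hours with the column median
# (an assumption for this sketch, not a universal rule).
clean["hours_per_week"] = clean["hours_per_week"].fillna(
    clean["hours_per_week"].median()
)

# Analysis: average weekly hours per language.
summary = clean.groupby("language")["hours_per_week"].mean()
print(summary)
```

A handful of lines covers all three steps, which is a large part of why the library is so pleasant to work with.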

Python gives you statistical tests, data manipulation, machine learning, and a host of other capabilities for free. And, perhaps most importantly of all, it has a dedicated and active community surrounding it that is always willing to help. At the highest scale, data science is a collaborative endeavor, and Python (specifically, Pandas) actively supports that collaboration.

R has a lot of debate surrounding its utility. And while not everyone loves Pandas, the controversy around it is noticeably quieter.

There’s a reason for that.

You must learn basic software engineering

I am taking a unique class as part of my PhD this quarter. One with a very deliberate title: Software Development for Data Scientists.

You don’t have to be comfortable with software development as a data scientist. But it will certainly help you find a job, as well as excel at it. Here are a few reasons as to why that is:

  • As a data scientist, you will work with a team of technologists.
  • As a data scientist, you will need a way to package and communicate your data analysis, insights, and models to the world at large. This will likely take the form of an app or system.
  • As a data scientist, you will write code or you will work with people who write code. As a result, you should understand best practices around programming.
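As a rough illustration of what "best practices around programming" can mean day to day, the sketch below shows a small function with type hints, a docstring, input validation, and a test that runs alongside it. The function `average_rating` and its 1-to-5 rating scale are hypothetical, chosen only to keep the example short.

```python
from statistics import mean


def average_rating(ratings: list[float]) -> float:
    """Return the mean of a non-empty list of ratings between 1 and 5."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("each rating must be between 1 and 5")
    return mean(ratings)


def test_average_rating() -> None:
    # Typical input.
    assert average_rating([4, 5, 3]) == 4
    # Bad input should fail loudly rather than return a misleading number.
    for bad in ([], [0, 3], [6]):
        try:
            average_rating(bad)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {bad!r}")


test_average_rating()
print("all checks passed")
```

None of this is sophisticated, but a teammate who picks up this code can see what it expects, what it returns, and how to check that it still works after a change.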

These are all skills which you will not necessarily learn in a standard introductory programming class (which will focus on the foundations of computer science), but you will learn in a software engineering class.

And it doesn’t have to be a class — you could self-learn online, or simply learn by exposure through working on projects (these are arguably more effective than taking a course).

But the point is this: by learning software engineering, you expand your data science skill set to the next level by giving yourself the ability to effectively share and develop the insights you gain from your work.

Why wouldn’t you want to do that?

You must learn practical statistics

This is admittedly an area I myself can improve upon. Statistics forms the theoretical foundation of data science, and you can only get so far without learning it.

However, how you learn statistics is important. Let me explain what I mean. I have a professor with a Bachelor’s in Mathematics, PhD in Computer Science, and minor in Statistics (as part of the PhD), all from two of the top technical universities in the world. For all intents and purposes, she’s a math whiz.

And yet, when she shifted to data science later in her career, she needed to teach herself all the statistics she used. All the formal statistics she learned was simply too deep and abstract — it wasn’t practical.

If you’re a little hesitant about delving back into mathematics, this is good news. To be successful as a data scientist, you don’t need a formal degree in mathematics, nor do you need to be an expert at abstract proofs. What you do need is conceptual knowledge of important statistical ideas in data science that you can then practically apply (e.g. designing user research studies, running hypothesis tests, using machine learning models effectively, etc.).
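Here is one sketch of what "practical" can look like for a hypothesis test: a permutation test for a difference in means, written with nothing but the standard library. The task-completion times for the two designs are invented for illustration, and the function name, resample count, and seed are my own assumptions. The idea is simply to shuffle the group labels many times and count how often a difference at least as large as the observed one shows up by chance.

```python
import random
from statistics import mean

# Hypothetical task-completion times in seconds, invented for illustration.
design_a = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5]
design_b = [10.8, 11.2, 10.5, 11.9, 10.9, 11.1]


def permutation_p_value(x, y, n_resamples=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in means by shuffling."""
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    extreme = 0
    for _ in range(n_resamples):
        # Reassign the group labels at random.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[: len(x)]) - mean(pooled[len(x):]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (n_resamples + 1)


p = permutation_p_value(design_a, design_b)
print(f"estimated p-value: {p:.4f}")
```

A small p-value here suggests the gap between the two designs would be unusual if the labels did not matter. You can reason about that without a single abstract proof.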

This isn’t necessarily an easy task, but it is a very doable one. There are a plethora of resources online at your disposal.

Best of luck.

You must learn to simplify and communicate technical topics

Insights from data are of little use if they cannot be understood and subsequently utilized by others to do good.

Let me say that once more.

Insights from data are of little use if they cannot be understood and subsequently utilized by others to do good.

I don’t care if you’ve written the world’s most advanced model that will change the fate of Planet Earth — if you can’t share it with people in a way that they understand, it will do little good.

You might wonder why. After all, if I can apply the model and obtain the results, isn’t that enough?

If only the world were that simple. One person alone cannot effect change on large scales; you will need to work with teams, stakeholders, people with money, people with power. You will need to convince them that your work is excellent and groundbreaking. You will need to take complex ideas in your data science work and make them understandable to folks who aren’t technical experts.

There are two parts to this:

  1. Effective communication in general.
  2. Breaking down complex, interrelated phenomena into their component parts.

There are ways to practice both. To learn to communicate well, you might consider taking a writing or speaking course. As for the second point, it’s harder to achieve formally; it’s a skill gained over time, through consistent practice explaining your ideas and work to other people. So do that.

But whatever you do, don’t take this point lightly. A million great ideas are no better than zero if you have no way to convey them.

You must learn to appreciate the non-technical

If you’re obsessed with numbers, calculations, and models — but ignorant of bias, ethics, and society — you shouldn’t be allowed anywhere near data science.

Effective data science goes beyond statistics and computer science. There is a third, essential component which is often overlooked: domain knowledge. The primary purpose of data science is to solve problems in a particular domain (e.g. biology, economics, sociology, political science, etc.). While you may be a master of numbers and programs, you most likely are not a master of the particular biases and subtleties of the field in question.

It is absolutely imperative to speak to domain experts when developing data solutions to various problems. Skipping this step is how you end up with inaccurate, biased models that have the potential to cause more harm than good [2].

Recap and Final Thoughts

Here’s a review of 5 must-have pieces of foundational knowledge for data science:

  1. Learn Python. Specifically, learn Pandas. It’s the gold standard for working with data.
  2. Learn basic software engineering. It’s one thing to write programs, but you take your marketability to a whole new level if you can engineer them.
  3. Learn practical statistics. This is the foundation of everything data science is, and you must learn it eventually. You can only fake it so much.
  4. Learn to simplify complex ideas and communicate them. People need to know what you’ve found from data. And you need to know how to tell them.
  5. Learn to appreciate the non-technical aspects of data science. It’s not all about the numbers. It never has been.

I can’t give you an exact blueprint to follow if you want to be a data scientist. It’s going to involve avid exploration, some trial and error, and likely a few failed attempts (much like data science itself, as it happens).

I can, however, give you a rundown of some skills that will be incredibly useful to you should you choose this path — and that’s exactly what I’ve done in the above article.

The rest is up to you.


