Techno Blender
Digitally Yours.

3 Powerful Takeaways from the National Workshop on Data Science Education | by Murtaza Ali | Jul, 2022



As the field flourishes, there is much we can do to benefit its countless students.

Photo by NASA on Unsplash

“Oh, I didn’t know you could get a PhD in Data Science. I didn’t think those departments existed yet.”

“Well, that’s true. Technically my department is Human-Centered Design and Engineering, but I focus mostly on data science and computer science problems, guided by my advisor.”

Last week, I attended the National Data Science Education Workshop, hosted by UC Berkeley — an academic leader in the field. While there, I had a collection of fascinating conversations with various folks who, in one way or another, were dedicated to the spread of data science.

Above is a short snippet of one such conversation. The professor who made this remark had a valid point — one which made the overall goal of the conference that much more significant.

As of now, there aren’t a huge number of data science departments out there. Most folks, even those studying data science as undergraduates, are housed in departments like computer science, information science, statistics, and so on. In a way, this is sensible, since data science itself lies at the intersection of many different fields.

That said, as the overall skill set becomes more coveted, academic institutions grow more inclined to “officially” teach it. Over the next few years, we can expect a great expansion in access to data science education at both the high school and collegiate levels.

As this expansion occurs, it is essential that we pay heed to how these programs are structured, learning from past mistakes in order to secure a successful future. In this article, I discuss three primary reflections I drew from attending the National Data Science Education Workshop, which I personally feel are of immense importance in designing data science curricula.

Let’s get into it.

The Point of an Intro Course: Theory vs. Practicality

The introductory data science course at UC Berkeley is called Data 8: Foundations of Data Science. It is one of the largest courses on campus, drawing students of all backgrounds — in particular because it assumes no programming experience. A collection of faculty across multiple departments spent many years designing this course to be as far-reaching and effective as possible; as such, it was the primary point of discussion during Day 1 of the workshop.

One of the key features of Data 8 is that it operates from a core philosophy which is promising to some, and devastating to others: effective data science can initially be taught without deep, theoretical knowledge of the underlying statistics. Data 8 teaches students to use computation as a tool to process and analyze large data sets, while introducing statistical concepts as needed along the way.

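This approach offers a range of benefits, chief among them that students with no programming or statistics background can start working with real data right away.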

That said, the paradigm is not without opposition. Many hardcore statisticians argue that this manner of teaching is dangerous, since it attempts to build an entire skill set on a foundation that is not there, akin to building the top of a pyramid before its base. In their view, such an approach has no chance of success.

While I can understand these sentiments (as someone who enjoys math myself), I think they’re a bit drastic. It’s not as if students will never learn the statistics and math underlying data science. The pyramid’s two halves are being built separately on the ground and will be put together later; it doesn’t matter if the top half is built first. Furthermore, the top half alone is enough to begin tackling real-world problems effectively. The simple fact is that the average data scientist need not know the subtleties of gradient descent, provided they can import a model from scikit-learn and apply it to their data set.
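To make that contrast concrete, here is a minimal sketch on made-up toy data (NumPy and scikit-learn assumed to be installed). It places a hand-written gradient descent loop for a one-variable linear fit next to the single `LinearRegression().fit` call a practitioner would typically reach for. One caveat: scikit-learn's `LinearRegression` solves least squares directly rather than by gradient descent; the point is only that the user of the library never has to see or tune the optimizer. The helper name `fit_line_gradient_descent` and its learning rate and step count are illustrative choices, not anything from the workshop.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data lying exactly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1


def fit_line_gradient_descent(x, y, lr=0.05, steps=2000):
    """Fit y ≈ w*x + b by minimizing mean squared error with plain gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        residual = w * x + b - y
        # Partial derivatives of mean((w*x + b - y)**2) with respect to w and b.
        grad_w = 2 * np.mean(residual * x)
        grad_b = 2 * np.mean(residual)
        # Step both parameters against their gradients.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# The "theory first" route: choose a loss, derive gradients, pick a learning rate.
w_gd, b_gd = fit_line_gradient_descent(x, y)

# The "import a model and apply it" route: one call, no optimizer details exposed.
model = LinearRegression().fit(x.reshape(-1, 1), y)
w_sk, b_sk = model.coef_[0], model.intercept_

print(f"gradient descent: w={w_gd:.4f}, b={b_gd:.4f}")
print(f"scikit-learn:     w={w_sk:.4f}, b={b_sk:.4f}")
```

Both routes should land on roughly w = 2 and b = 1; only the first forces the student to reason about learning rates and convergence before seeing a result.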

Of course, this is an open topic, and I mean not to push my opinion on you. Rather, I am presenting this point as an ongoing debate — one that I would encourage you to think about deeply if you plan to venture into (or are already swimming within) the world of data science education.
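For a concrete taste of the computation-first style described above, consider the bootstrap, a resampling technique that lets students estimate uncertainty by simulation instead of by a closed-form standard-error formula. It is our choice of illustration rather than something presented at the workshop, and the sketch below uses plain NumPy; the function name `bootstrap_mean_ci` and the sample values are hypothetical.

```python
import numpy as np


def bootstrap_mean_ci(sample, num_resamples=10_000, level=0.95, seed=0):
    """Approximate a confidence interval for the mean by resampling the data.

    No formula for the standard error is needed: we simulate many
    "alternative" samples and look at how much their means vary.
    """
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    # Each row is one resample: indices drawn with replacement, same size as the data.
    idx = rng.integers(0, len(sample), size=(num_resamples, len(sample)))
    resampled_means = sample[idx].mean(axis=1)
    # Keep the middle `level` share of the resampled means as the interval.
    tail = (1 - level) / 2 * 100
    low, high = np.percentile(resampled_means, [tail, 100 - tail])
    return float(low), float(high)


# Hypothetical data: eight made-up commute times, in minutes.
commute_minutes = [12, 15, 9, 20, 17, 11, 14, 18]
low, high = bootstrap_mean_ci(commute_minutes)
print(f"approximate 95% interval for the mean: ({low:.2f}, {high:.2f})")
```

The statistical idea (sampling variability) is introduced through a loop a beginner can read, and the theory behind why the interval works can come later.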

The Importance of Real-World Data

Oftentimes, teaching lends itself to the design of silly, made-up problems in an effort to teach students certain skills. We all dealt with some version of the following back in grade school:

Tommy goes to the store to purchase food for his 3 kids. Everyone wants to eat 3 watermelons for dinner, along with a watermelon agua fresca, each of which requires 2 watermelons to make. How many watermelons will Tommy need to purchase to satisfy everyone, including himself?

As we advance in our education through the years, the problems become more involved and perhaps less ridiculous, but one fact tends to persist: they are very much not real.

However, this need not be the case within data science. There are a multitude of publicly available data sets in the modern age, and there is no good reason you cannot incorporate them into your course. The presenters at the workshop placed great emphasis on this point.

By using real-world data, you automatically build deeper questions into your curriculum:

  • If the data isn’t in the right format for analysis, what do we do?
  • How can we do good in the world through the practice of data science?
  • What are the ethical implications of this study?
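As a small illustration of the first question, here is a sketch in Python with pandas (assumed to be installed). The table is entirely made up; it stands in for the kind of "wide" spreadsheet that public data often arrives as, with one column per year, numbers stored as text with thousands separators, and a missing-value marker. The code reshapes it to one row per observation and converts the text into real numbers.

```python
import pandas as pd

# Hypothetical raw table: one column per year, numbers stored as strings.
raw = pd.DataFrame({
    "site": ["North", "South"],
    "2020": ["1,200", "950"],
    "2021": ["1,350", "n/a"],
})

# Reshape from wide to long: one row per (site, year) observation.
tidy = raw.melt(id_vars="site", var_name="year", value_name="visitors")

# Year labels were column names (strings); make them integers.
tidy["year"] = tidy["year"].astype(int)

# Strip thousands separators, then parse; unparseable entries like "n/a" become NaN.
tidy["visitors"] = pd.to_numeric(
    tidy["visitors"].str.replace(",", "", regex=False), errors="coerce"
)

# Now ordinary analysis works; sum() skips the missing value by default.
totals_by_year = tidy.groupby("year")["visitors"].sum()
print(tidy)
print(totals_by_year)
```

Even this tiny example raises a real question for students: should the missing 2021 value be skipped, as `sum()` does here, or does silently dropping it bias the comparison between years?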

These then lead to important insights which might otherwise be lost on your students. By using real-world data from the get-go, you automatically start to address the extremely important but oft-overlooked question within any educational setting: why are we doing this?

Don’t kid yourself into thinking that this will by any means be easy. Finding the right data set for the specific learning goals you want your students to meet from week to week can be quite challenging. The professors at the workshop confessed to spending huge amounts of time on this seemingly simple task. You should be prepared to do the same.

Know, however, that it will be well worth it.

Ethics Should Not Be an “Aside”

Finally, as is to be expected in a modern push toward data science education, a portion of the workshop was dedicated to ethical discussions. However, the way that the presenters approached it caught my attention.

For context, the topic arose when some educators from Berkeley were discussing the expansion of their data science program in the years to come. They mentioned that initially, they had planned to require just one or two ethics courses as part of the curriculum.

But then, they realized that they were again making the fatal mistake of essentially making the ethics of the work an “aside,” when in reality it should be the foundation. In many ways, a number of the ethical issues we face in the technological world today resulted from similar, flawed paradigms.

So they went back to square one, and instead decided on a different approach. As they designed (and continue to design, for the program is still being built as I write this) their curriculum, they incorporated an ethical discussion into a portion of each syllabus, attuned to fit in with the specific topics of the class.

This is a much better approach, and one that all data scientists — especially those who teach others, whether formally or otherwise — can learn from. Discussing ethics consistently (every course) and specifically (tied directly to the course material) makes it an important topic for students of the field from the get-go.

By adopting this paradigm, we can ensure an ideal future for data science: one which does much good for society, but without irremediable collateral damage along the way.

Final Thoughts: What is responsible data practice?

To conclude, I’ll share some final thoughts from the workshop itself. Toward the end of the presentations, the speakers discussed the following overarching principles. They are left intentionally vague, and I am refraining from adding my own thoughts, because I think a fitting way to end this article is to encourage you to consider what each of the following means to you, specifically within the context of a data-centered world:

  • Understanding the world
  • Imagining what’s possible
  • Reflexivity about oneself
  • Sensitivity to ethical contexts
  • Orientations toward justice
  • Critique of unjust institutions

Don’t stop at reflection, of course. Incorporate these ideals into all aspects of your data science work — especially mentoring others. If we all do so, the future is bright.


