Browsing tag: Gianluca

A practical introduction to sequential feature selection | by Gianluca Malato | Feb, 2023

Jessie Hobb, Feb 16, 2023

A gentle dive into this unusual feature selection technique

[Photo by Robert Stump on Unsplash]

Feature selection is always a challenging task for data scientists. Identifying the right set of features is crucial for the success of a model. Several techniques judge a set of features by the performance it gives a model, and sequential feature selection is one of them.

Sequential feature selection is a supervised approach to feature selection. It makes use of a supervised model and it can be used to remove…
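
The greedy search behind the forward variant of sequential feature selection can be sketched in a few lines of plain Python. This is a minimal illustration, not the article's code: `forward_selection` and `toy_score` are hypothetical names, and the toy score stands in for what would normally be a cross-validated model metric. For real projects, scikit-learn ships `SequentialFeatureSelector` in `sklearn.feature_selection`, with a `direction` parameter for forward or backward selection.

```python
from typing import Callable, FrozenSet, List, Sequence


def forward_selection(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    k: int,
) -> List[str]:
    """Greedily add, one at a time, the feature that most improves `score`."""
    selected: List[str] = []
    remaining = list(features)
    while len(selected) < k and remaining:
        # Try each remaining feature together with the current subset; keep the best.
        best = max(remaining, key=lambda f: score(frozenset(selected + [f])))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy score (stand-in for model performance): per-feature relevance,
# with a penalty when the redundant pair x1/x2 is selected together.
RELEVANCE = {"x1": 0.5, "x2": 0.4, "x3": 0.3, "x4": 0.0}


def toy_score(subset: FrozenSet[str]) -> float:
    total = sum(RELEVANCE[f] for f in subset)
    if {"x1", "x2"} <= subset:
        total -= 0.25
    return total


chosen = forward_selection(list(RELEVANCE), toy_score, k=2)
print(chosen)  # ['x1', 'x3']
```

Ranking features one by one would pick x1 and x2; the sequential search skips x2 because, under this toy score, it adds little on top of x1. The backward variant works the other way round, starting from all features and removing one at a time.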

How is your data distributed? A practical introduction to the Kolmogorov-Smirnov test | by Gianluca Malato | Nov, 2022

Jessie Hobb, Nov 16, 2022

An introduction to the KS test for beginners

[Photo by papazachariasa on Pixabay]

Data scientists often need to assess the distribution of their data. We have already seen the Shapiro-Wilk test for normality, but what about non-normal distributions? Another test can help us here: the Kolmogorov-Smirnov test.

Data scientists often have to check which distribution their data come from. They work with samples and need to check whether these come from a normal distribution, a lognormal distribution,…
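
As a rough sketch of what the test measures, the one-sample Kolmogorov-Smirnov statistic is the largest vertical gap between the sample's empirical CDF and a reference CDF. The helper names below are hypothetical and only compute the statistic D; turning D into a p-value needs the Kolmogorov distribution, which `scipy.stats.kstest` handles for you.

```python
import math
from typing import Callable, Sequence


def ks_statistic(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """One-sample Kolmogorov-Smirnov statistic D = sup_x |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x: check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def uniform_cdf(x: float) -> float:
    """CDF of the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)


d_uniform = ks_statistic([0.1, 0.4, 0.7], uniform_cdf)
print(d_uniform)  # approximately 0.3
```

To compare against a specific normal distribution, pass `lambda x: normal_cdf(x, mu, sigma)` as the reference CDF. Note that if mu and sigma are estimated from the same sample, the standard KS critical values no longer apply exactly.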

A practical introduction to the Shapiro-Wilk test for normality | by Gianluca Malato | Nov, 2022

Jessie Hobb, Nov 8, 2022

How to assess the normality of a dataset in Python

[Image by author]

Data scientists often have to check whether data are normally distributed. One example is checking the normality of the residuals of a linear regression, which is needed to use the F-test correctly. Let's see how we can check the normality of a dataset.

Normality means that a particular sample has been generated from a Gaussian distribution. It doesn't have to be the standard normal distribution (with zero mean and unit variance).

There are several situations…
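
Computing the exact Shapiro-Wilk coefficients by hand is tedious, so the sketch below swaps in a close relative, the Shapiro-Francia statistic W': the squared correlation between the sorted sample and approximate expected normal order statistics (here via Blom's formula). It is an illustration of the idea, not the Shapiro-Wilk test itself; for the real test with a p-value, `scipy.stats.shapiro` is the usual choice. Values close to 1 are consistent with normality.

```python
from statistics import NormalDist, fmean
from typing import List, Sequence


def blom_scores(n: int) -> List[float]:
    """Blom's approximation of the expected standard-normal order statistics."""
    return [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def shapiro_francia_w(sample: Sequence[float]) -> float:
    """Squared correlation between the ordered sample and normal scores."""
    xs = sorted(sample)
    m = blom_scores(len(xs))
    mx, mm = fmean(xs), fmean(m)
    cov = sum((x - mx) * (q - mm) for x, q in zip(xs, m))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_m = sum((q - mm) ** 2 for q in m)
    return cov * cov / (var_x * var_m)


skewed = [1, 1, 1, 1, 2, 2, 3, 5, 10, 50]
print(shapiro_francia_w(skewed))  # well below 1 for this right-skewed sample
```

Because a correlation ignores shifts and rescaling, the statistic does not depend on the sample's mean or variance, which fits the point that the generating normal distribution need not be the standard one.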

3 easy hypothesis tests for the mean value | by Gianluca Malato | Oct, 2022

Jessie Hobb, Oct 26, 2022

3 easy ways to compare the mean value of a sample with other expected values

[Photo by Edge2Edge Media on Unsplash]

Data scientists and analysts often work with mean values and need to compare the mean of a sample with a known expected value or with the mean of another sample. Statistics gives us a powerful set of hypothesis tests for such tasks.

Let's say that we measure something like the height of Mount Everest. We know that it's 8848 meters. After we measure it, we get 8840 meters with a…
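
One common test for comparing a sample mean with a known value is the one-sample t-test. The sketch below only computes the t statistic and its degrees of freedom; the function name and the three measurements are hypothetical, not taken from the article. The p-value comes from Student's t distribution with n - 1 degrees of freedom, which `scipy.stats.ttest_1samp(sample, popmean)` computes directly.

```python
import math
from statistics import fmean, stdev
from typing import Sequence, Tuple


def one_sample_t(sample: Sequence[float], mu0: float) -> Tuple[float, int]:
    """Return the one-sample t statistic against mu0 and its degrees of freedom."""
    n = len(sample)
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    standard_error = stdev(sample) / math.sqrt(n)
    return (fmean(sample) - mu0) / standard_error, n - 1


# Hypothetical repeated measurements of Mount Everest's height, in meters.
t, dof = one_sample_t([8838, 8840, 8842], mu0=8848)
print(t, dof)
```

A large absolute t means the observed mean is many standard errors away from the expected value, which makes the null hypothesis "the true mean equals mu0" hard to maintain.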

Does your model beat the baseline? | by Gianluca Malato | Jul, 2022

Jessie Hobb, Jul 5, 2022

Let’s compare our model with a trivial baseline

[Picture by Pixabay: https://www.pexels.com/it-it/foto/lampadina-chiara-355948/]

Every time we train a model, we should check whether its performance beats some baseline, that is, a trivial model that doesn’t take the inputs into account. By comparing our model with a baseline, we can figure out whether it actually learns something.

A baseline model doesn’t use the features at all: it returns the same trivial, constant value for every prediction. For a regression…
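
For regression, one natural constant baseline predicts the mean of the training target for every test point. The sketch below compares that baseline with a model's predictions using mean squared error; the function names and numbers are hypothetical. scikit-learn offers the same idea as `DummyRegressor(strategy="mean")`, and `DummyClassifier(strategy="most_frequent")` for classification.

```python
from statistics import fmean
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between targets and predictions."""
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def mean_baseline_mse(y_train: Sequence[float], y_test: Sequence[float]) -> float:
    """MSE of a baseline that ignores the inputs and predicts the training mean."""
    constant = fmean(y_train)
    return mse(y_test, [constant] * len(y_test))


# Hypothetical data and model predictions.
y_train, y_test = [1.0, 2.0, 3.0], [2.0, 4.0]
model_pred = [2.5, 3.5]

baseline_error = mean_baseline_mse(y_train, y_test)
model_error = mse(y_test, model_pred)
print(baseline_error, model_error, model_error < baseline_error)
```

If the model's error is not clearly lower than the baseline's, the features are probably not helping much.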

Is your dataset imbalanced? Some techniques to see if a dataset is… | by Gianluca Malato | Jun, 2022

Jessie Hobb, Jun 28, 2022

Some techniques to see if a dataset is imbalanced

[Image by author]

Dealing with imbalanced datasets is always hard for a data scientist. Such datasets can cause trouble for our machine learning models if we don’t handle them properly, so it’s important to measure how imbalanced a dataset is before taking the proper precautions. In this article, I suggest some possible techniques.

We say that a classification dataset is imbalanced when some target classes have much lower frequencies than others.

Let’s see, for…
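
The article's own techniques are cut off in this excerpt, so the sketch below shows three common, generic ways to quantify imbalance; it is an assumption, not necessarily what the article uses. It computes class frequencies, the ratio between the largest and smallest class, and the Shannon entropy of the class distribution normalized by log(k), which is 1.0 for perfectly balanced classes and drops toward 0 as one class dominates. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence


def class_frequencies(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Relative frequency of each class."""
    n = len(labels)
    return {c: k / n for c, k in Counter(labels).items()}


def imbalance_ratio(labels: Sequence[Hashable]) -> float:
    """Size of the largest class divided by the size of the smallest one."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)


def normalized_entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy of the class distribution divided by log(k)."""
    freqs = list(class_frequencies(labels).values())
    k = len(freqs)
    if k < 2:
        return 0.0  # a single class: no balance to speak of
    h = -sum(p * math.log(p) for p in freqs)
    return h / math.log(k)


labels = ["a"] * 90 + ["b"] * 10
print(class_frequencies(labels), imbalance_ratio(labels), normalized_entropy(labels))
```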

Which models require normalized data? | by Gianluca Malato | Jun, 2022

Jessie Hobb, Jun 14, 2022

A brief overview of models that need pre-processed data

[Image by author]

Data pre-processing is an important part of every machine learning project. A very useful transformation to apply to data is normalization, and some models require it in order to work properly. Let’s see some of them.

Normalization is a general term related to the scaling of variables. Scaling transforms a set of variables into a new set of variables that have the same order of magnitude. It’s usually a linear transformation, so it doesn’t…
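
Two widely used scaling schemes are min-max scaling, which maps values to [0, 1], and standardization, which yields zero mean and unit variance. Both are linear (affine) maps of the form a * x + b. The sketch below uses hypothetical helper names; scikit-learn provides the same transformations as `MinMaxScaler` and `StandardScaler`.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def min_max_scale(xs: Sequence[float]) -> List[float]:
    """Map values linearly onto [0, 1]."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]  # constant column: nothing to scale
    return [(x - lo) / (hi - lo) for x in xs]


def standardize(xs: Sequence[float]) -> List[float]:
    """Shift to zero mean and rescale to unit (population) standard deviation."""
    mu, sigma = fmean(xs), pstdev(xs)
    if sigma == 0:
        return [0.0 for _ in xs]
    return [(x - mu) / sigma for x in xs]


print(min_max_scale([1.0, 2.0, 3.0]))
print(standardize([2.0, 4.0, 6.0]))
```

In a real pipeline, the scaling parameters (min and max, or mean and standard deviation) should be computed on the training set only and then reused on validation and test data.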

Which models are interpretable? A brief overview of some interpretable… | by Gianluca Malato | Jun, 2022

Jessie Hobb, Jun 7, 2022

A brief overview of some interpretable machine learning models

[Image by author]

Model explanation is an essential task in supervised machine learning. Explaining how a model represents information is crucial to understanding the dynamics that rule our data. Let’s see some models that are easy to interpret.

The role of data scientists is to extract information from raw data. They are neither engineers nor software developers. They dig into data and extract the gold from the mine.

Knowing what a model does and how…

How To Run A/B Tests. A simple guide on how to use statistics… | by Gianluca Malato | May, 2022

Jessie Hobb, May 24, 2022

A simple guide on how to use statistics to run A/B tests on proportions

[Image by author]

Online marketing and startup growth work better if you can continuously test different ideas. Statistics helps us when we have to perform A/B tests, and the results you can achieve with the proper analysis can give your project a great boost.

Companies often need to compare the results of one action with the results of another in order to identify the best-performing one. Generally speaking, there’s often the need to check which…
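
A standard way to compare two proportions, such as the conversion rates of variants A and B, is the two-proportion z-test with a pooled proportion under the null hypothesis that both rates are equal. The sketch below is a generic version of that test, not necessarily the article's exact procedure; the function name and the conversion counts are hypothetical. It relies on the normal approximation, so it assumes reasonably large sample sizes.

```python
import math
from statistics import NormalDist
from typing import Tuple


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Two-sided z-test for p_a == p_b; returns (z statistic, p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Under the null hypothesis both groups share one conversion rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: 100 of 1000 users convert on A, 130 of 1000 on B.
z, p = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.3f}, p = {p:.4f}")
```

With a significance level of 0.05, a p-value below that threshold would lead us to reject the hypothesis that the two variants convert at the same rate.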