Techno Blender
Digitally Yours.
Browsing Tag: Chawla

Seven Ways to Optimize Memory Usage in Pandas | by Avi Chawla

Simple tips to optimize the memory utilization in Pandas
Photo by Denise Jans on Unsplash
Designing and building real-world applicable machine learning models has always been of great interest to data scientists. This has inevitably led them to leverage optimized, efficient, and accurate methods at scale.
Optimization, both at the level of run-time and memory, plays a foundational role in sustainably delivering real-world and user-facing software solutions.
Optimization Categorization (Image by author)
In one of my earlier…
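The excerpt stops before listing the seven tips. As a minimal sketch of two widely used memory-saving techniques (not necessarily among the author's seven), the snippet below downcasts an integer column with `pd.to_numeric(..., downcast="integer")` and converts a low-cardinality text column to the `category` dtype, measuring the effect with `memory_usage(deep=True)`. The column names and random data are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 10,000 rows with a small-range integer column
# and a text column that repeats only three distinct values
rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame({
    "age": rng.integers(0, 100, size=n),                      # int64 by default
    "city": rng.choice(["Delhi", "Mumbai", "Pune"], size=n),  # text column
})

# deep=True also counts the memory held by the string values themselves
before = int(df.memory_usage(deep=True).sum())

# Downcast to the smallest signed integer type that can hold 0..99 (int8)
df["age"] = pd.to_numeric(df["age"], downcast="integer")

# Store each distinct string once, plus a compact integer code per row
df["city"] = df["city"].astype("category")

after = int(df.memory_usage(deep=True).sum())
print(f"{before:,} bytes -> {after:,} bytes")
```

Downcasting only helps when the values fit in the smaller type, and `category` pays off when a column has few distinct values relative to its length.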

A Simple Guide to Inplace Operations in Pandas | by Avi Chawla | Aug, 2022

Introduction to inplace operations in Pandas, exploring commonly supported methods and a common misconception
Photo by Sigmund on Unsplash
Inplace assignment operations are widely used for transforming Pandas DataFrames. As the name suggests, the core idea behind inplace assignment is to avoid creating a new DataFrame object with each successive modification and instead make changes to the original DataFrame itself.
Inplace and Standard Assignment Operation (Image by author)
Inplace assignment operations are especially…
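A minimal sketch of the difference between standard and inplace assignment, using `fillna()` on a hypothetical one-column DataFrame. The standard form returns a new object that you bind to a name and leaves the original untouched; the inplace form mutates `df` itself and returns `None`.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with one missing value
df = pd.DataFrame({"a": [1.0, np.nan, 3.0]})

# Standard assignment: fillna() returns a NEW DataFrame; df is untouched
df_filled = df.fillna(0)
missing_before = int(df["a"].isna().sum())  # still 1

# Inplace operation: df itself is modified and the method returns None
result = df.fillna(0, inplace=True)

print(missing_before)    # 1
print(result)            # None
print(df["a"].tolist())  # [1.0, 0.0, 3.0]
```

Because inplace methods return `None`, writing `df = df.fillna(0, inplace=True)` replaces `df` with `None`, a frequent source of bugs.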

A Step-By-Step Guide To Summarizing Audio Files in Python | by Avi Chawla | Aug, 2022

Speech summarization made easy
Photo by Daniel Schludi on Unsplash
As the name suggests, summarization is the process of generating a concise summary of a given piece of information. This information can appear as text, audio, video, pictures, etc. In other words, summarization is the process of selecting or generating relevant pieces of information that are representative of the entire input.
Building a data-driven summarization system is a common task in natural language processing due to its broad downstream applicability in…

5 String-Based Filtering Methods Every Pandas User Should Know | by Avi Chawla | Aug, 2022

Before I proceed with the popular methods in Pandas to filter data on string values, let’s understand how you can identify a column with a string data type.
By default, Pandas represents the data type of a string column as object. To determine the data type, you can use the dtype attribute of a series as follows:
Here, you should note that even if a single value in a series is a string, the whole column will be assigned the object dtype, just like a string column. For instance, let’s change the first value in col2 from 1 to “1”.
This time, the…
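The original code is not included in this excerpt, so here is a minimal reconstruction, assuming a DataFrame with columns `col1` (strings) and `col2` (integers); the values are hypothetical. It checks each column's `dtype`, then rebuilds `col2` with its first value changed from 1 to "1". Newer pandas releases may report a dedicated string dtype for `col1` instead of `object`, while a column mixing strings and integers is still stored as `object`.

```python
import pandas as pd

# Hypothetical DataFrame: col1 holds strings, col2 holds integers
df = pd.DataFrame({"col1": ["A", "B", "C"], "col2": [1, 2, 3]})

print(df["col1"].dtype)  # object in pandas 1.x/2.x; newer versions may show a string dtype

# Rebuild col2 with its first value changed from the integer 1 to the string "1"
before = df["col2"].dtype
df["col2"] = ["1", 2, 3]
after = df["col2"].dtype

print(before, "->", after)  # int64 -> object
```

The sketch replaces the whole column rather than setting a single cell with `df.loc[0, "col2"] = "1"`, because recent pandas versions warn about (or reject) writing a string into an integer column in place.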

10 Pandas Questions Asked a Decade Ago on StackOverflow That Are Still Relevant Today | by Avi Chawla | Jul, 2022

Iterating (also known as looping) means visiting every row in the DataFrame individually and performing some operation on it. Consider the DataFrame below:
In Pandas, you can iterate in three different ways: using range(len(df)), iterrows(), and itertuples(). I discussed different methods of iterating over a DataFrame in detail in the following blog post:
This question asks how to filter a DataFrame based on a condition. To understand popular filtering methods, consider the DataFrame below:
Some methods to filter a…
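The excerpt's DataFrames are not shown, so the sketch below uses a small hypothetical one to demonstrate the three iteration approaches named above, followed by a boolean-mask filter, one common way to select rows that satisfy a condition.

```python
import pandas as pd

# Hypothetical DataFrame (the article's own DataFrame is not shown in this excerpt)
df = pd.DataFrame({"name": ["Amy", "Bob", "Cat"], "score": [85, 42, 73]})

# 1. range(len(df)) with positional row access via iloc
total_range = 0
for i in range(len(df)):
    total_range += df.iloc[i]["score"]

# 2. iterrows() yields (index, row-as-Series) pairs
total_iterrows = 0
for idx, row in df.iterrows():
    total_iterrows += row["score"]

# 3. itertuples() yields namedtuples with one attribute per column
total_itertuples = 0
for row in df.itertuples(index=False):
    total_itertuples += row.score

# Filtering: keep only the rows where the condition is True
passed = df[df["score"] > 50]
print(passed["name"].tolist())  # ['Amy', 'Cat']
```

In practice, `itertuples()` is typically faster than `iterrows()`, and a boolean mask avoids the Python-level loop altogether.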

Introduction to PandaSQL: The Downsides No One Talks About | by Avi Chawla | Jul, 2022

Pandas + SQL = PandaSQL = A big mess
Photo by Joshua Hoehne on Unsplash
Both Structured Query Language (SQL) and Pandas are undoubtedly the go-to tools for Data Scientists for tabular data management, processing, and analysis.
While Pandas is a popular Python library for data analysis, SQL is an entire programming language of its own for interacting with databases, with applicability spanning various domains of computer science. One thing that stands out in common between them is that both are incredible…

Improve Your Data Science Workflow with Rolling Functions in Pandas | by Avi Chawla | Jul, 2022

A guide to Rolling Features in Pandas
Photo by Annie Spratt on Unsplash
Much of the tabular data analysis we see today is driven by popular series-based methods in Pandas, which take the entire data into account at once. These methods usually include evaluating a series’ distribution using value_counts(), determining the unique values using unique(), finding the distribution of one column grouped by the values in another column using groupby(), or generating a cross-tabulation of values from…
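To contrast series-wide methods with rolling ones, here is a minimal sketch on a hypothetical numeric series: `mean()` summarizes every value at once, whereas `rolling(window=3).mean()` computes one mean per window of three consecutive values. The first two positions are `NaN` because a full window is not yet available (by default, `min_periods` equals the window size).

```python
import pandas as pd

# Hypothetical daily sales figures (not from the article)
sales = pd.Series([10, 20, 30, 40, 50])

# Series-wide statistic: a single number computed over all values
overall_mean = sales.mean()  # 30.0

# Rolling statistic: the mean over a sliding window of 3 consecutive values
rolling_mean = sales.rolling(window=3).mean()
print(rolling_mean.tolist())  # [nan, nan, 20.0, 30.0, 40.0]
```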

20% of NumPy Functions that Data Scientists use 80% of the Time | by Avi Chawla | Jul, 2022

Who said you should know everything?
Photo by Austin Distel on Unsplash
NumPy (or Numerical Python) sits at the core of every Data Science and Machine Learning project. It is undoubtedly one of the most important libraries ever built in Python. Moreover, the whole data-driven ecosystem is, in one way or another, dependent upon NumPy and its core functions.
Given that the library holds wide applicability in industry and academia due to its unparalleled potential, acquaintance with its functions and syntax has become an utmost…

Conversational Sentiment Analysis on Audio Data | by Avi Chawla | Jul, 2022

Analyzing sentiment in speech
Photo by Towfiqu barbhuiya on Unsplash
Sentiment Analysis, also known as opinion mining, is a popular task in Natural Language Processing (NLP) due to its diverse industrial applications. When NLP techniques are applied specifically to textual data, the primary objective is to train a model that can classify a given piece of text into one of several sentiment classes. A high-level overview of a sentiment classifier is shown in the image below.
An overview of the Sentiment Analysis model…

Five Killer Optimization Techniques Every Pandas User Should Know | by Avi Chawla | Jul, 2022

A step towards data analysis run-time optimization
Photo by Brad Neathery on Unsplash
The motivation to design and build real-world applicable machine learning models has always driven Data Scientists to leverage optimized, efficient, and accurate methods at scale. Optimization plays a foundational role in sustainably delivering real-world and user-facing software solutions.
While I understand that not everyone is building solutions at scale, awareness about various optimization and time-saving techniques is nevertheless…