Make Your Tabular Data Stand Out via CLI With These Tips and Tricks | by Federico Trotta | Apr, 2023

By Jessie Hobb On Apr 13, 2023

Enhance readability: tips for displaying datasets in the CLI

A couple of days ago I wanted to help my father solve a problem. His need was to aggregate, filter, and display some data as fast as possible. Well…the truth is that he printed the data (something like 10 pages each time!!) and search the data by hand! I saw his difficulties and decided to help him immediately.

Nothing as difficult for someone who can analyze data as I am: the data was already in Excel format, so a Jupyter Notebook and Pandas were the perfect choices.

The problem is that I don’t work for my father. Also, we leave in different cities and see each other only every couple of weeks. So, I needed to give him a tool he could use with the following characteristics:

Simple usage.
Don’t make the PC explodes if he writes something wrong.

This is why I thought to create a small program that could be managed via CLI. What I wanted to create was easy: the user writes something via the command line. Then, the program shows the user all the data associated as Pandas would do, but in the terminal.

So, in this article, I’ll show you how we can display tabular data via CLI (Command Line Interface. That is: the terminal, in case you didn’t know). We’ll create a simple project to get you immediately hands-on Python and discuss the libraries I used.

So, first of all, let’s create some tabular data just for the sake of the exercise:

import pandas as pd# Create data
data = {"fruit":["banana", "apple", "pear", "orange"],
"color":["yellow", "red", "green", "orange"],
"weight(kg)":[0.3, 0.1, 0.1, 0.2]
}
# Transform data to data frame
df = pd.DataFrame(data)

So, these are our data:

The tabular data we created. Image by Author.

We have created some tabular data that contain information on some fruits, particularly: the name of the fruit, the color, and the weight in kilograms.

Now, to make it “more real”, we can save it into an Excel file like so:

# Save data frame to xlsx file
df.to_excel("fruit.xlsx")

NOTE:
This methodology of saving files that Pandas gives us is very useful.
For example, we can use it to convert CSV files into XLSX; We did it
in this article here.

Now, we’ll create a simple filter that doesn’t need Python if we can use Excel a little bit. The problem I faced was more complicated, but here we’re creating it simply on purpose: our scope is not to show that this method is better than another. Here we’re showing how we can display tabular data via CLI, and a simple example will do the job.

So, let’s say this is our problem: we want the user to write the name of a fruit and our program returns all the features of the fruit chosen. We also want the filter to be somehow “intelligent” so that if the user writes “pea” it will display the features related to “pear”.

To do so, in Pandas we can use the method str.contains(). Let’s try it in our Jupyter Notebook:

import pandas as pd# Import data
df = pd.read_excel("fruit.xlsx")
# Filter for pear
data_frame = df[df["fruit"].str.contains("pea")]
# Show filtered data
data_frame.head()

And we get:

The filtered data. Image by Author.

Read it carefully: we wrote “pea” as a typo on purpose to get sure Pandas returns the data anyway. And it does, as expected.

So, now we have to face another problem: the intervention of the user via CLI. As far as I know, we can use two different methods in these cases: we can use the input built-in function or we can use the library argparse.

In case you missed it, I’ve written an article on how we can use argparse in Data Science. Check it out here:

Now, in this case, I decided to use the input built-in function because I believe it’s easier to use, and in simple cases like these is a very good choice. In fact, this is the perfect choice if we just need to pass a string as an argument via CLI (you can read the documentation here).

We can use the input function like so:

# User input
fruit = input("filter the data for the kind of fruit: ")

Now, let’s see how this works and how it returns the data. This is the code we can use:

import pandas as pd# User input
fruit = input("filter the data for the kind of fruit: ")
# Import data
df = pd.read_excel("fruit.xlsx")
# Filter for user input
data_frame = df[df["fruit"].str.contains(fruit)]
# Print results
print(data_frame)

NOTE:
look at the difference of how we've pasted the arguments in the method
str.contains(). Aboved we've passed "pea" with quotes because we were
searching directly for a string.
In this case, insetead, we have passed "fruit" without quotes because
we have used "fruit" as a variable to invoke the input() function so it
has to be passed as is (with no quotes).

Now, let’s save it as fruit.py, move it to the folder where fruit.xlsx is located and let’s run it via the terminal:

Well, as we can see, everything works fine. But just one thing: can we improve the visualization? What if we’d like to display better the data as if we were on Pandas?

Well, the solution I found was to use the library tabulate (here’s the documentation).

So, let’s add tabulate to our code and see what happens:

import pandas as pd
from tabulate import tabulate# User input
fruit = input("filter the data for the kind of fruit: ")
# Import data
df = pd.read_excel("fruit.xlsx")
# Filter for user input
data_frame = df[df["fruit"].str.contains(fruit)]
# Print results
print(tabulate(data_frame, headers='keys', tablefmt='psql'))

And we get:

As we can see, the data are displayed in a “tabular way”, which is cleaner. Also, as we can see, the code correctly manages typos if we search for “pea” as we did in Jupyter before.

I hope this helps you if you need to display tabular data via CLI. If you have any other suggestions, please, let me know in the comments: I’m always open to improving and learning something new.

Enhance readability: tips for displaying datasets in the CLI

Nothing as difficult for someone who can analyze data as I am: the data was already in Excel format, so a Jupyter Notebook and Pandas were the perfect choices.

Simple usage.
Don’t make the PC explodes if he writes something wrong.

So, first of all, let’s create some tabular data just for the sake of the exercise:

import pandas as pd# Create data
data = {"fruit":["banana", "apple", "pear", "orange"],
"color":["yellow", "red", "green", "orange"],
"weight(kg)":[0.3, 0.1, 0.1, 0.2]
}
# Transform data to data frame
df = pd.DataFrame(data)

So, these are our data:

We have created some tabular data that contain information on some fruits, particularly: the name of the fruit, the color, and the weight in kilograms.

Now, to make it “more real”, we can save it into an Excel file like so:

# Save data frame to xlsx file
df.to_excel("fruit.xlsx")

NOTE:
This methodology of saving files that Pandas gives us is very useful.
For example, we can use it to convert CSV files into XLSX; We did it
in this article here.

To do so, in Pandas we can use the method str.contains(). Let’s try it in our Jupyter Notebook:

import pandas as pd# Import data
df = pd.read_excel("fruit.xlsx")
# Filter for pear
data_frame = df[df["fruit"].str.contains("pea")]
# Show filtered data
data_frame.head()

And we get:

The filtered data. Image by Author.

Read it carefully: we wrote “pea” as a typo on purpose to get sure Pandas returns the data anyway. And it does, as expected.

In case you missed it, I’ve written an article on how we can use argparse in Data Science. Check it out here:

We can use the input function like so:

# User input
fruit = input("filter the data for the kind of fruit: ")

Now, let’s see how this works and how it returns the data. This is the code we can use:

import pandas as pd# User input
fruit = input("filter the data for the kind of fruit: ")
# Import data
df = pd.read_excel("fruit.xlsx")
# Filter for user input
data_frame = df[df["fruit"].str.contains(fruit)]
# Print results
print(data_frame)

NOTE:
look at the difference of how we've pasted the arguments in the method
str.contains(). Aboved we've passed "pea" with quotes because we were
searching directly for a string.
In this case, insetead, we have passed "fruit" without quotes because
we have used "fruit" as a variable to invoke the input() function so it
has to be passed as is (with no quotes).

Now, let’s save it as fruit.py, move it to the folder where fruit.xlsx is located and let’s run it via the terminal:

Well, as we can see, everything works fine. But just one thing: can we improve the visualization? What if we’d like to display better the data as if we were on Pandas?

Well, the solution I found was to use the library tabulate (here’s the documentation).

So, let’s add tabulate to our code and see what happens:

import pandas as pd
from tabulate import tabulate# User input
fruit = input("filter the data for the kind of fruit: ")
# Import data
df = pd.read_excel("fruit.xlsx")
# Filter for user input
data_frame = df[df["fruit"].str.contains(fruit)]
# Print results
print(tabulate(data_frame, headers='keys', tablefmt='psql'))

And we get:

As we can see, the data are displayed in a “tabular way”, which is cleaner. Also, as we can see, the code correctly manages typos if we search for “pea” as we did in Jupyter before.

I hope this helps you if you need to display tabular data via CLI. If you have any other suggestions, please, let me know in the comments: I’m always open to improving and learning something new.

Read original article here

Denial of responsibility! Techno Blender is an automatic aggregator of the all world’s media. In each content, the hyperlink to the primary source is specified. All trademarks belong to their rightful owners, all materials to their authors. If you are the owner of the content and do not want us to publish your materials, please contact us by email – [email protected]. The content will be deleted within 24 hours.