How to Create Pivot Tables With pandas :

May 27, 2024 May 27, 2024

How to Create Pivot Tables With pandas
by:
blow post content copied from Real Python
click here to view original post

A pivot table is a data analysis tool that allows you to take columns of raw data from a pandas DataFrame, summarize them, and then analyze the summary data to reveal its insights.

Pivot tables allow you to perform common aggregate statistical calculations such as sums, counts, averages, and so on. Often, the information a pivot table produces reveals trends and other observations your original raw data hides.

Pivot tables were originally implemented in early spreadsheet packages and are still a commonly used feature of the latest ones. They can also be found in modern database applications and in programming languages. In this tutorial, you’ll learn how to implement a pivot table in Python using pandas’ DataFrame.pivot_table() method.

Before you start, you should familiarize yourself with what a pandas DataFrame looks like and how you can create one. Knowing the difference between a DataFrame and a pandas Series will also prove useful.

In addition, you may want to use the data analysis tool Jupyter Notebook as you work through the examples in this tutorial. Alternatively, JupyterLab will give you an enhanced notebook experience, but feel free to use any Python environment you wish.

The other thing you’ll need for this tutorial is, of course, data. You’ll use the Sales Data Presentation - Dashboards data, which is freely available for you to use under the Apache 2.0 License. The data has been made available for you in the sales_data.csv file that you can download by clicking the link below.

Get Your Code: Click here to download the free sample code you’ll use to create a pivot table with pandas.

This table provides an explanation of the data you’ll use throughout this tutorial:

Column Name	Data Type (PyArrow)	Description
`order_number`	`int64`	Order number (unique)
`employee_id`	`int64`	Employee’s identifier (unique)
`employee_name`	`string`	Employee’s full name
`job_title`	`string`	Employee’s job title
`sales_region`	`string`	Sales region employee works within
`order_date`	`timestamp[ns]`	Date order was placed
`order_type`	`string`	Type of order (Retail or Wholesale)
`customer_type`	`string`	Type of customer (Business or Individual)
`customer_name`	`string`	Customer’s full name
`customer_state`	`string`	Customer’s state of residence
`product_category`	`string`	Category of product (Bath Products, Gift Basket, Olive Oil)
`product_number`	`string`	Product identifier (unique)
`product_name`	`string`	Name of product
`quantity`	`int64`	Quantity ordered
`unit_price`	`double`	Selling price of one product
`sale_price`	`double`	Total sale price (`unit_price` × `quantity`)

As you can see, the table stores data for a fictional set of orders. Each row contains information about a single order. You’ll become more familiar with the data as you work through the tutorial and try to solve the various challenge exercises contained within it.

Throughout this tutorial, you’ll use the pandas library to allow you to work with DataFrames and the newer PyArrow library. The PyArrow library provides pandas with its own optimized data types, which are faster and less memory-intensive than the traditional NumPy types pandas uses by default.

If you’re working at the command line, you can install both pandas and pyarrow using python -m pip install pandas pyarrow, perhaps within a virtual environment to avoid clashing with your existing environment. If you’re working within a Jupyter Notebook, you should use !python -m pip install pandas pyarrow. With the libraries in place, you can then read your data into a DataFrame:

Python


>>> import pandas as pd

>>> sales_data = pd.read_csv(
...     "sales_data.csv",
...     parse_dates=["order_date"],
...     dayfirst=True,
... ).convert_dtypes(dtype_backend="pyarrow")

First of all, you used import pandas to make the library available within your code. To construct the DataFrame and read it into the sales_data variable, you used pandas’ read_csv() function. The first parameter refers to the file being read, while parse_dates highlights that the order_date column’s data is intended to be read as the datetime64[ns] type. But there’s an issue that will prevent this from happening.

In your source file, the order dates are in dd/mm/yyyy format, so to tell read_csv() that the first part of each date represents a day, you also set the dayfirst parameter to True. This allows read_csv() to now read the order dates as datetime64[ns] types.

With order dates successfully read as datetime64[ns] types, the .convert_dtypes() method can then successfully convert them to a timestamp[ns][pyarrow] data type, and not the more general string[pyarrow] type it would have otherwise done. Although this may seem a bit circuitous, your efforts will allow you to analyze data by date should you need to do this.

If you want to take a look at the data, you can run sales_data.head(2). This will let you see the first two rows of your dataframe. When using .head(), it’s preferable to do so in a Jupyter Notebook because all of the columns are shown. Many Python REPLs show only the first and last few columns unless you use pd.set_option("display.max_columns", None) before you run .head().

If you want to verify that PyArrow types are being used, sales_data.dtypes will confirm it for you. As you’ll see, each data type contains [pyarrow] in its name.

Note: If you’re experienced in data analysis, you’re no doubt aware of the need for data cleansing. This is still important as you work with pivot tables, but it’s equally important to make sure your input data is also tidy.

Tidy data is organized as follows:

Each row should contain a single record or observation.
Each column should contain a single observable or variable.
Each cell should contain an atomic value.

If you tidy your data in this way, as part of your data cleansing, you’ll also be able to analyze it better. For example, rather than store address details in a single address field, it’s usually better to split it down into house_number, street_name, city, and country component fields. This allows you to analyze it by individual streets, cities, or countries more easily.

In addition, you’ll also be able to use the data from individual columns more readily in calculations. For example, if you had columns room_length and room_width, they can be multiplied together to give you room area information. If both values are stored together in a single column in a format such as "10 x 5", the calculation becomes more awkward.

The data within the sales_data.csv file is already in a suitably clean and tidy format for you to use in this tutorial. However, not all raw data you acquire will be.

It’s now time to create your first pandas pivot table with Python. To do this, first you’ll learn the basics of using the DataFrame’s .pivot_table() method.

Get Your Code: Click here to download the free sample code you’ll use to create a pivot table with pandas.

Take the Quiz: Test your knowledge with our interactive “How to Create Pivot Tables With pandas” quiz. You’ll receive a score upon completion to help you track your learning progress:

Interactive Quiz

How to Create Pivot Tables With pandas

This quiz is designed to push your knowledge of pivot tables a little bit further. You won't find all the answers by reading the tutorial, so you'll need to do some investigating on your own. By finding all the answers, you're sure to learn some other interesting things along the way.

How to Create Your First Pivot Table With pandas

Now that your learning journey is underway, it’s time to progress toward your first learning milestone and complete the following task:

Calculate the total sales for each type of order for each region.

Read the full article at https://realpython.com/how-to-pandas-pivot-table/ »

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

May 27, 2024 at 07:30PM
Click here for more details...

=============================
The original post is available in Real Python by
this post has been published as it is through automation. Automation script brings all the top bloggers post under a single umbrella.
The purpose of this blog, Follow the top Salesforce bloggers and collect all blogs in a single place through automation.
============================

Python

Python Reader

How to Create Pivot Tables With pandas :

How to Create Your First Pivot Table With pandas

Read the full article at https://realpython.com/how-to-pandas-pivot-table/ »

Post a Comment