Python for Beginners: Select Specific Columns in Pandas Dataframe


While working with dataframes in Python, we often need only part of the data. For this, we select one or more columns, which may or may not be contiguous. I have already discussed how to select multiple columns in the pandas dataframe. This article discusses different ways to select specific columns in a pandas dataframe: by column name, by column position, and with the iloc and loc attributes.

Select Specific Columns in Pandas Dataframe Using Column Names

To select specific columns from the pandas dataframe using the column names, you can pass a list of column names to the indexing operator as shown below.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
print("The columns are:")
columns=df[["Maths", "Physics"]]
print(columns)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The columns are:
   Maths  Physics
0    100       80
1     80      100
2     90       80
3    100      100
4     90       90
5     80       70

In this example, we first converted a list of dictionaries to a dataframe using the DataFrame() function. Then, we selected the "Maths" and "Physics" columns from the dataframe using the list ["Maths", "Physics"].
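As a small aside, the type of the result depends on what is passed to the indexing operator. The following is a minimal sketch, assuming a hypothetical two-row dataframe rather than the one above. It suggests that a list of names gives back a dataframe, a single bare name gives back a series, and a name that is not a column raises a KeyError.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration.
df = pd.DataFrame({"Roll": [1, 2], "Maths": [100, 80], "Physics": [80, 100]})

# A list of names returns a DataFrame, even when the list has one element.
two_columns = df[["Maths", "Physics"]]
one_column_frame = df[["Maths"]]

# A bare name (no list) returns a Series instead.
one_column_series = df["Maths"]

# A name that is not a column raises a KeyError.
try:
    df[["Maths", "Biology"]]
    missing_raised = False
except KeyError:
    missing_raised = True

print(type(two_columns).__name__)
print(type(one_column_series).__name__)
print(missing_raised)
```

If the rest of your code expects a dataframe, passing a one-element list such as ["Maths"] keeps the result two-dimensional.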

Select Specific Columns in Pandas Dataframe Using the Column Positions

If you don’t know the column names and only have the positions of the columns, you can use the columns attribute of the pandas dataframe to select specific columns. For this, we will use the following steps.

  • First, we will get a list of column names from the dataframe using the columns attribute.
  • Then, we will extract the name of specific columns that we want to select. For this, we will use the list containing column names and list comprehension.
  • After obtaining the list of specific column names, we can use it to select specific columns in the dataframe using the indexing operator.

You can observe this in the following example.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
column_names=df.columns
required_indices=[0,2,3]
required_columns=[column_names[index] for index in required_indices]
print("The column names are:")
print(required_columns)
print("The columns are:")
columns=df[required_columns]
print(columns)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The column names are:
['Roll', 'Physics', 'Chemistry']
The columns are:
   Roll  Physics  Chemistry
0     1       80         90
1     2      100         90
2     3       80         70
3     4      100         90
4     5       90         80
5     6       70         70

In this example, we had to select the columns at positions 0, 2, and 3. For this, we created a variable required_indices with the list [0, 2, 3] as its value. Then, we used list comprehension and the Python indexing operator to get the column names at the specified positions from the column names returned by the columns attribute. We stored these column names in the required_columns variable. Finally, we used the indexing operator to select the specific columns from the dataframe.
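As a possible shortcut, the Index object returned by the columns attribute can itself be indexed with a list of positions, so the list comprehension may be skipped. The following is a minimal sketch, assuming a small hypothetical dataframe; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration.
df = pd.DataFrame({"Roll": [1, 2], "Maths": [100, 80],
                   "Physics": [80, 100], "Chemistry": [90, 90]})

required_indices = [0, 2, 3]

# Indexing the column Index with a list of positions returns a new Index
# holding the names found at those positions.
required_columns = df.columns[required_indices]

# The resulting Index can be passed to the indexing operator like a list of names.
columns = df[required_columns]

print(list(required_columns))
print(columns)
```

Both versions end with the same selection by name; this one only shortens the step that turns positions into names.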

Select Specific Columns in a Dataframe Using the iloc Attribute

The iloc attribute in a pandas dataframe is used to select rows or columns by their integer position. The iloc attribute of a dataframe returns an _iLocIndexer object. We can use this _iLocIndexer object to select columns from the dataframe. To select columns at specific positions using the iloc object, we will use the following syntax.

df.iloc[start_row:end_row, list_of_column_positions]

Here,

  • df is the input dataframe.
  • The start_row variable contains the start position of the rows that we want to include in the output.
  • The end_row variable contains the position just after the last row that we want to include in the output. As with Python slices, the row at position end_row itself is excluded.
  • The list_of_column_positions variable contains the position of specific columns that we want to select from the dataframe. 

As we want to select all the rows and specified columns, we will keep start_row and end_row empty. We will just pass the list containing the position of specific columns to the list_of_column_positions variable for selecting the columns from the dataframe as shown in the following example.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
print("The columns are:")
list_of_column_positions=[0,2,3]
columns=df.iloc[:,list_of_column_positions]
print(columns)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The columns are:
   Roll  Physics  Chemistry
0     1       80         90
1     2      100         90
2     3       80         70
3     4      100         90
4     5       90         80
5     6       70         70

In this example, we used the iloc attribute to select columns at positions 0, 2, and 3 in the dataframe.
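The same syntax can also restrict the rows. The following is a minimal sketch, assuming a hypothetical four-row dataframe. It illustrates that the end of a positional row slice is excluded and that negative positions count from the end.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration.
df = pd.DataFrame({"Roll": [1, 2, 3, 4], "Maths": [100, 80, 90, 100],
                   "Physics": [80, 100, 80, 100], "Chemistry": [90, 90, 70, 90]})

# Rows at positions 1 and 2 (position 3 is excluded), columns at positions 0 and 2.
subset = df.iloc[1:3, [0, 2]]
print(subset)

# Negative positions count from the end: -1 is the last column.
last_column = df.iloc[:, [-1]]
print(list(last_column.columns))
```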

Select Specific Columns in a Dataframe Using the loc Attribute

The loc attribute in a pandas dataframe is used to select rows or columns at any given index or column name respectively. The loc attribute of a dataframe returns a _LocIndexer object. We can use this _LocIndexer object to select columns from the dataframe using the column names. To select specific columns using the loc object, we will use the following syntax.

df.loc[start_row_index:end_row_index, list_of_column_names]

Here,

  • df is the input dataframe.
  • The start_row_index variable contains the start index of the rows that we want to include in the output.
  • The end_row_index variable contains the index of the last row that we want to include in the output. 
  • The list_of_column_names variable contains the name of specific columns that we want to select from the dataframe. 

As we want to select all the rows and specified columns, we will keep start_row_index and end_row_index empty. We will just pass the list of specific column names to list_of_column_names for selecting the columns from the dataframe as shown below.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
print("The columns are:")
columns=df.loc[:,["Maths", "Physics"]]
print(columns)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The columns are:
   Maths  Physics
0    100       80
1     80      100
2     90       80
3    100      100
4     90       90
5     80       70

In this example, we have selected specific columns from the dataframe using a list of column names and the loc attribute.
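The loc attribute can also restrict the rows by their index labels. The following is a minimal sketch, assuming a hypothetical four-row dataframe with the default integer index. Unlike positional slicing with iloc, a label slice with loc includes the end label.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration.
df = pd.DataFrame({"Roll": [1, 2, 3, 4], "Maths": [100, 80, 90, 100],
                   "Physics": [80, 100, 80, 100]})

# Rows labelled 1 through 3 (label 3 is included), columns selected by name.
subset = df.loc[1:3, ["Maths", "Physics"]]
print(subset)
```

With the default index, labels and positions happen to coincide, so the inclusive end is the main difference to watch for when switching between loc and iloc.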

Conclusion

In this article, we have discussed different ways to select specific columns in a pandas dataframe.

To learn more about Python programming, you can read this article on how to sort a pandas dataframe. You might also like this article on how to drop columns from a pandas dataframe.

I hope you enjoyed reading this article. Stay tuned for more informative articles.

Happy Learning!

The post Select Specific Columns in Pandas Dataframe appeared first on PythonForBeginners.com.


January 27, 2023 at 07:30PM