NumPy Array Slicing – Ultimate Guide : Tony Dexter
by: Tony Dexter
The post content below is copied from Finxter.
Python NumPy array slicing is used to extract parts of data from an array.
Array slicing is often used when working with NumPy. In this article, we will go over the methods of array slicing, from basic to more advanced techniques. We will use the np.array() function to create our array examples.
Before practicing any of the examples used in this article, please make sure the NumPy library is installed.
Recommended: How to Install NumPy?
We will need to import NumPy before using the code snippets given below:
import numpy as np
Slicing Arrays – The Basics
NumPy arrays are sliced with slice notation, which, like list slicing, takes three values:
- The start index,
- an end index and
- a step value (with a default of 1).
When you slice an array, you create a new view of that array: it shares the same data but has its own indices. Slicing by itself does not change the original array, but writing to the view does.
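The view behaviour described above can be checked with a short sketch. The names base, view and independent are only illustrative and are not part of the original article:

```python
import numpy as np

base = np.array([1, 2, 3, 4, 5, 6, 7])

# Basic slicing returns a view: no data is copied.
view = base[1:4]
print(np.shares_memory(base, view))  # True

# Writing through the view also changes the original array.
view[0] = 99
print(base)  # [ 1 99  3  4  5  6  7]

# An explicit copy is independent of the original.
independent = base[1:4].copy()
independent[0] = -1
print(base[1])  # 99
```

If you need a slice that can be modified safely, calling .copy() on it is the usual approach.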
The most common way to slice an array is with the syntax:
arr[start:stop:step]
The start index is included and the end index is excluded:
array_1 = np.array([1, 2, 3, 4, 5, 6, 7])
# [1 2 3 4 5 6 7]
print(array_1[1:4])
'''
[2 3 4]
'''
If you only specify a start index, the end index will default to the length of the array:
array_1 = np.array([1, 2, 3, 4, 5, 6, 7])
# [1 2 3 4 5 6 7]
print(array_1[1:])
'''
[2 3 4 5 6 7]
'''
If you only specify an end index, the start index will default to the beginning of the array:
array_1 = np.array([1, 2, 3, 4, 5, 6, 7])
# [1 2 3 4 5 6 7]
print(array_1[:5])
'''
[1 2 3 4 5]
'''
A step, as mentioned before, is by default set at 1, but it can also be defined:
array_1 = np.array([1, 2, 3, 4, 5, 6, 7])
# [1 2 3 4 5 6 7]
print(array_1[1:6:2])
'''
[2 4 6]
'''
We can also use negative index values:
array_1 = np.array([1, 2, 3, 4, 5, 6, 7])
# [1 2 3 4 5 6 7]
print(array_1[-6:-2])
'''
[2 3 4 5]
'''
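As a small addition to the basics above, the sketch below shows two related things: Python's built-in slice() object, which can stand in for the colon syntax, and a negative step, which walks the array backwards. The variable name middle is only illustrative:

```python
import numpy as np

array_1 = np.array([1, 2, 3, 4, 5, 6, 7])

# A slice object carries the same start, stop and step as the colon syntax.
middle = slice(1, 6, 2)
print(array_1[middle])   # [2 4 6]

# A negative step walks backwards through the array.
print(array_1[::-1])     # [7 6 5 4 3 2 1]

# Start at index 5, stop before index 1, moving two positions at a time.
print(array_1[5:1:-2])   # [6 4]
```

A slice object is handy when the same slice has to be reused on several arrays.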
Slicing 2D Arrays
In NumPy, two-dimensional arrays are made up of rows and columns. When slicing the 2D arrays the rows are considered as the first index and the columns as the second index.
Let’s have a look at the following example:
arr = np.array([[1,2,3], [4,6,8], [3,2,7], [5,7,9]])
print(arr)
'''
[[1 2 3]
 [4 6 8]
 [3 2 7]
 [5 7 9]]
'''
print(arr[2])
'''
[3 2 7]
'''
We first print out the entire 2D array. Then, with the second print() function, only a single index was passed, so the array at index 2 (row 3) is returned.
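Passing two indices separated by a comma addresses rows and columns at the same time. The following is a minimal sketch using the same array; the name block is only illustrative:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 6, 8], [3, 2, 7], [5, 7, 9]])

# Single element: row index 2, column index 1.
print(arr[2, 1])      # 2

# Sub-block: rows 1 and 2, columns 0 and 1 (end indices are excluded).
block = arr[1:3, 0:2]
print(block)
# [[4 6]
#  [3 2]]
print(block.shape)    # (2, 2)
```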
Slice Column
When working with NumPy arrays, slicing columns to select specific data points is often useful.
For example, suppose we have an array of employee data, including employees’ names, ages and job descriptions. We can use column slicing to select and output all the employees’ ages.
First, let’s create an array with some employee data:
employees = np.array([["John", 28, "Supervisor"],
                      ["Sarah", 39, "Manager"],
                      ["Kyle", 35, "CFO"],
                      ["Kate", 48, "CEO"]])
print(employees)
'''
[['John' '28' 'Supervisor']
 ['Sarah' '39' 'Manager']
 ['Kyle' '35' 'CFO']
 ['Kate' '48' 'CEO']]
'''
As we can see in the array above, the employee ages are shown in column 2 (index 1). Let’s slice that column from the array:
employees_ages = employees[:, 1]
print(employees_ages)
'''
['28' '39' '35' '48']
'''
In this example, the first index of the array (the rows) is passed before the comma and the second index (the columns) after it. The bare colon in [:, 1] is shorthand for [0:4, 1] here.
Notice that, as with every slice, the end index 4 is excluded, so rows 0 through 3 (all four rows) are selected.
- The first index of the array has been passed from its 0 index to its final index, meaning that all the rows are included.
- The second index of the array has been passed as its index of 1, meaning that column 2 is selected.
All values from column 2 in all rows have been sliced: ['28' '39' '35' '48']
In this example, the ages are returned as strings, because a NumPy array holds a single data type and the mixed input was stored as text. To convert the whole array to integers, simply pass it to np.int_() (employees_ages.astype(int) is a common alternative):
print(np.int_(employees_ages))
'''
[28 39 35 48]
'''
If we only need the first column of our employee dataset, we would use the following syntax:
employee_names = employees[:, 0]
'''
['John' 'Sarah' 'Kyle' 'Kate']
'''
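One detail worth knowing when slicing a column is the shape of the result. The sketch below, under the assumption that astype(int) is an acceptable way to do the integer conversion, contrasts an integer column index with a one-column slice. The names ages_1d and ages_2d are only illustrative:

```python
import numpy as np

employees = np.array([["John", 28, "Supervisor"],
                      ["Sarah", 39, "Manager"],
                      ["Kyle", 35, "CFO"],
                      ["Kate", 48, "CEO"]])

# An integer column index drops the column axis: the result is 1D.
ages_1d = employees[:, 1]
print(ages_1d.shape)   # (4,)

# A one-column slice keeps the column axis: the result is 2D.
ages_2d = employees[:, 1:2]
print(ages_2d.shape)   # (4, 1)

# Convert the text values to integers and flatten back to 1D.
print(ages_2d.astype(int).ravel())  # [28 39 35 48]
```

The 2D form is useful when the column has to be combined with other 2D arrays later on.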
Slice Multiple Columns
If you want to select multiple columns from a 2D NumPy array, you can do it by passing a list of column indices to the indexing operator.
For example, if we have an array like this:
array_2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
'''
[[1 2 3]
 [4 5 6]
 [7 8 9]]
'''
If we want to select the second and third columns, we would use the following code:
array_3 = array_2[:, [1, 2]]
print(array_3)
'''
[[2 3]
 [5 6]
 [8 9]]
'''
This would return a new array with shape (3, 2) containing the values [[2,3],[5,6],[8,9]].
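When the columns are adjacent, a plain slice selects the same data as an index list. As a rough sketch, the code below compares the two; the slice gives a view while the list gives a copy. The names by_slice and by_list are only illustrative:

```python
import numpy as np

array_2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

by_slice = array_2[:, 1:3]     # adjacent columns via a slice (a view)
by_list = array_2[:, [1, 2]]   # the same columns via an index list (a copy)

print(np.array_equal(by_slice, by_list))     # True
print(np.shares_memory(array_2, by_slice))   # True
print(np.shares_memory(array_2, by_list))    # False

# An index list can also pick non-adjacent columns or reorder them.
print(array_2[:, [2, 0]])
# [[3 1]
#  [6 4]
#  [9 7]]
```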
Slice with Condition
Let’s use our employee example:
employees = np.array([["John", 28, "Supervisor"],
                      ["Sarah", 39, "Manager"],
                      ["Kyle", 35, "CFO"],
                      ["Kate", 48, "CEO"]])
Now, let’s say we want to only select certain elements from the original array based on certain criteria.
For example, we only want to return the employee data of the employees aged between 25 and 45. We can do this by indexing with a boolean array:
# For our example we first need to change our values to integers
employees_ages = employees[:, 1]
ages_new = np.int_(employees_ages)
'''
[28 39 35 48]
'''
We will then use comparison operators to create a boolean array for each of these two criteria:
bool_arr = ages_new > 25
'''
[ True  True  True  True]
'''
bool_arr_2 = ages_new < 45
'''
[ True  True  True False]
'''
A neater way of doing this is to make a single boolean array that has True only if both criteria are met. For this, we will use np.logical_and(). It combines the two input arrays into a new array that only has True in positions where both of the inputs have a True in the corresponding position:
is_between_25_45 = np.logical_and(bool_arr, bool_arr_2)
'''
[ True  True  True False]
'''
We want the integer values between 25 and 45 as our output. We can achieve this all in a single line by passing the complete comparison expression to the subscript operator [] of the original array:
ages_25_45 = ages_new[np.logical_and(ages_new > 25, ages_new < 45)]
print(ages_25_45)
'''
[28 39 35]
'''
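The same boolean array can also be applied to the rows of the employee array to get the matching records rather than just the ages. The sketch below assumes the & operator as a shorthand for np.logical_and() and astype(int) for the conversion; the names ages and mask are only illustrative:

```python
import numpy as np

employees = np.array([["John", 28, "Supervisor"],
                      ["Sarah", 39, "Manager"],
                      ["Kyle", 35, "CFO"],
                      ["Kate", 48, "CEO"]])
ages = employees[:, 1].astype(int)

# & works element-wise on boolean arrays, like np.logical_and().
# The parentheses are required because & binds tighter than < and >.
mask = (ages > 25) & (ages < 45)
print(mask)                     # [ True  True  True False]

# Applying the mask to the 2D array keeps only the matching rows.
matching = employees[mask]
print(matching.shape)           # (3, 3)
print(matching[:, 0])           # ['John' 'Sarah' 'Kyle']
```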
Slice Assignment
Using slice assignment, we can overwrite specific elements in an array. Simply use the same slice notation on the left side of an assignment statement.
arr_1 = np.arange(10)
arr_1[2:5] = 100
print(arr_1)
'''
[  0   1 100 100 100   5   6   7   8   9]
'''
In this example, we used the np.arange() function to create our NumPy array. This creates an array of integers within the given range.
Note that when you assign a single value to a slice, that value is written to every element of the slice. Here each element of arr_1[2:5] is replaced with 100.
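A slice can also be assigned a sequence of values instead of a single value. As a minimal sketch (the values 70, 80, 90 and the name mismatch_error are made up for illustration), the sequence has to match the length of the slice:

```python
import numpy as np

arr_1 = np.arange(10)

# A single value is written to every position in the slice.
arr_1[2:5] = 100

# A sequence is assigned element by element and must match the slice length.
arr_1[7:10] = [70, 80, 90]
print(arr_1.tolist())   # [0, 1, 100, 100, 100, 5, 6, 70, 80, 90]

# Two values cannot fill a slice of three elements, so NumPy raises ValueError.
mismatch_error = None
try:
    arr_1[0:3] = [1, 2]
except ValueError as err:
    mismatch_error = err
    print("ValueError:", err)
```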
Compare normal slice syntax and slice assignment syntax:
- Slicing: b = a[0:2]. This creates a view of the slice of a and assigns it to b. Unlike list slicing, no data is copied.
- Slice assignment: a[0:2] = b. This replaces the slice of a with the contents of b.
Slice assignment on 2D arrays uses the comma notation to address the different axes of the array. You can thus replace whole rows or columns in a 2D array with slice assignment:
arr_x = np.array([[1,2,3],[4,5,6],[7,8,9]])
arr_x[0, :] = 100
print(arr_x)
'''
[[100 100 100]
 [  4   5   6]
 [  7   8   9]]
'''
arr_y = np.array([[1,2,3],[4,5,6],[7,8,9]])
arr_y[:, 0] = 100
print(arr_y)
'''
[[100   2   3]
 [100   5   6]
 [100   8   9]]
'''
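The same notation can target a rectangular block rather than a whole row or column. In the sketch below, the array arr_z and the assigned values are made up for illustration:

```python
import numpy as np

arr_z = np.zeros((3, 4), dtype=int)

# The row [7, 8] is repeated for every selected row of the block.
arr_z[1:, 1:3] = [7, 8]

# A column can be filled from a sequence with one value per row.
arr_z[:, 0] = [1, 2, 3]

print(arr_z)
# [[1 0 0 0]
#  [2 7 8 0]
#  [3 7 8 0]]
```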
Slice Every Other
If you would like to return every other element in an array, you can do so by slicing with a step of 2. You can specify the start index, end index and step size; the end index may be left out to run to the end of the array. The step size is the distance between the indices of consecutive selected elements, so a step of 2 skips one element each time.
For example, if we have an array of length 9:
array_5 = np.arange(9)
If we would like to return every other element starting from the second element, our code would look like this:
array_odd = array_5[1::2]
This code would return: [1 3 5 7]
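A few variations on the same idea are sketched below, including every other row of a 2D array. The name grid is only illustrative:

```python
import numpy as np

array_5 = np.arange(9)

print(array_5[::2])     # [0 2 4 6 8]  every other element from the start
print(array_5[1::2])    # [1 3 5 7]    every other element from index 1
print(array_5[1:6:2])   # [1 3 5]      the same, stopping before index 6

# On a 2D array a single slice applies to the rows.
grid = np.arange(12).reshape(4, 3)
print(grid[::2])
# [[0 1 2]
#  [6 7 8]]
```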
Slice with List
Finally, let’s have a look at array slicing using a list.
Assuming we have an array:
arr_6 = np.arange(10, 20)
'''
[10 11 12 13 14 15 16 17 18 19]
'''
We can use a list, which will in effect be a list of indices, to slice our array and create a new array reflecting the data at the given list of indices.
Let’s create our list:
filter_indices = [1,3,5,7]
Then use our list to slice the array:
arr_ind = arr_6[filter_indices]
The new array will then be:
[11 13 15 17]
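Unlike a start:stop:step slice, indexing with a list returns a copy, and the list may repeat or reorder indices. A short sketch follows; the name picked is only illustrative:

```python
import numpy as np

arr_6 = np.arange(10, 20)

# The result of list indexing is a copy, so changing it
# leaves the original array untouched.
picked = arr_6[[1, 3, 5, 7]]
picked[0] = -1
print(arr_6[1])             # 11

# Indices may repeat, be negative, or appear in any order.
print(arr_6[[0, 0, -1]])    # [10 10 19]
print(arr_6[[7, 1]])        # [17 11]
```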
To slice a 2D array with a list we can use the np.take() function, which takes elements from an array along an axis.
We use the following syntax:
np.take(arr, indices, axis)
Let’s look at the following example:
filter_indices = [1, 2]
arr_10 = np.array([[5, 10, 15, 20, 25],
                   [50, 100, 150, 200, 250],
                   [500, 1000, 1500, 2000, 2500]])
axis_zero = 0
print(np.take(arr_10, filter_indices, axis_zero))
'''
[[  50  100  150  200  250]
 [ 500 1000 1500 2000 2500]]
'''
axis_one = 1
print(np.take(arr_10, filter_indices, axis_one))
'''
[[  10   15]
 [ 100  150]
 [1000 1500]]
'''
If we use axis_zero = 0 and filter indices 1, 2 we get a new array consisting of data from the two rows at index 1 and 2.
If we use axis_one = 1 and filter indices 1, 2 we get a new array consisting of the two columns at index 1 and 2.
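For comparison, the same selections can be written with a list inside the square brackets. The sketch below checks that both forms agree on this example; the names rows_take, rows_index, cols_take and cols_index are only illustrative:

```python
import numpy as np

filter_indices = [1, 2]
arr_10 = np.array([[5, 10, 15, 20, 25],
                   [50, 100, 150, 200, 250],
                   [500, 1000, 1500, 2000, 2500]])

# Rows 1 and 2: np.take along axis 0 versus a list in the row position.
rows_take = np.take(arr_10, filter_indices, axis=0)
rows_index = arr_10[filter_indices, :]
print(np.array_equal(rows_take, rows_index))   # True

# Columns 1 and 2: np.take along axis 1 versus a list in the column position.
cols_take = np.take(arr_10, filter_indices, axis=1)
cols_index = arr_10[:, filter_indices]
print(np.array_equal(cols_take, cols_index))   # True
```

Which form to use is mostly a matter of taste; np.take() can be convenient when the axis is only known at run time.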
Well done! You have reached the end of this NumPy Array slicing tutorial. You should now be more familiar with NumPy Array slicing, which you may find very useful for extracting specific data from large data sets.
December 17, 2022 at 05:37PM
The original post is available in Finxter by Tony Dexter