Fancy Indexing with Python

NumPy

Introduction

Hello developers, today we’re diving into a technique known as “Fancy Indexing”. The name sounds fancy, but the idea is simple: you index a NumPy array with a list of positions or a boolean mask, which lets you select, rearrange, and modify data in a single expression.

Basics of Array Indexing

Before we dive deep into Fancy Indexing, let’s take a look at basic array indexing.

Standard Python List Indexing

Python lists are ordered collections that give us a straightforward way to store items. Let’s see an example:

fruits = ["apple", "banana", "cherry", "date"]
print(fruits[2]) # cherry

What just happened? We created a list called fruits and then accessed the third item using an index. Remember, in Python, indexing starts at 0, so [2] gets us the third item: cherry.

Ever wondered about getting a range of items? That’s where slicing shines:

print(fruits[1:3]) # ['banana', 'cherry']

NumPy Array Indexing

NumPy offers a fast, optimized array data structure that is well suited to large datasets (we discussed NumPy arrays in detail in a separate article).

import numpy as np

numbers = np.array([2, 4, 6, 8, 10])
print(numbers[3]) # 8

Looks familiar, doesn’t it? Just like our Python list, we accessed the fourth element of our numbers array, which is 8.

But here’s where NumPy arrays show their prowess. Let’s access multiple specific values:

print(numbers[[1, 3]]) # [4 8]

We grabbed the second and fourth elements together. We’ll see more of this powerful feature as we dive into Fancy Indexing.
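To make the contrast with plain lists concrete, here is a small sketch that reuses the fruits list from above. The names list_error, picked_from_list, fruit_array, and picked_from_array are introduced here for illustration only.

```python
import numpy as np

fruits = ["apple", "banana", "cherry", "date"]

# A plain Python list rejects a list of positions as an index.
try:
    fruits[[1, 3]]
    list_error = None
except TypeError as exc:
    list_error = type(exc).__name__

# With a list, picking several positions needs a loop or comprehension.
picked_from_list = [fruits[i] for i in [1, 3]]

# A NumPy array accepts the same list of positions directly.
fruit_array = np.array(fruits)
picked_from_array = fruit_array[[1, 3]]

print(list_error)         # TypeError
print(picked_from_list)   # ['banana', 'date']
print(picked_from_array)  # ['banana' 'date']
```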

What is Fancy Indexing?

Fancy Indexing is a NumPy feature that lets you access and modify multiple, possibly non-consecutive elements of an array at once, using either integer arrays or boolean arrays as the index. Basic indexing with a single integer or a slice can’t express that kind of selection.

Why is it so important? Because sometimes data isn’t linear or sequential. There are cases where you’ll want to access elements based on specific criteria or conditions. Fancy Indexing makes these tasks easy!

Let’s start with an example:

import numpy as np

data_array = np.array([10, 20, 30, 40, 50, 60])
selected_indices = [1, 3, 5]

print(data_array[selected_indices]) # [20 40 60]

See what happened? We directly accessed the second, fourth, and sixth elements of data_array using a list of indices (selected_indices). Instead of accessing them one by one, we got them all in one go!
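The index list does not have to be sorted or unique, and the result takes the shape of the index rather than the shape of the source array. The sketch below illustrates both points; reordered, index_grid, and gridded are names made up for this example.

```python
import numpy as np

data_array = np.array([10, 20, 30, 40, 50, 60])

# Indices may appear in any order, may repeat, and may be negative.
reordered = data_array[[5, 0, 0, -1]]

# A 2D index array produces a 2D result with the same shape as the index.
index_grid = np.array([[0, 1], [4, 5]])
gridded = data_array[index_grid]

print(reordered)  # [60 10 10 60]
print(gridded)
# [[10 20]
#  [50 60]]
```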

It’s important to note that Fancy Indexing returns copies of data, not views. What does this mean? Well, if you store the result of a fancy-indexed selection in a new variable and modify it, the original data remains unaffected.

data_array = np.array([10, 20, 30, 40, 50, 60])
selected_indices = [1, 3, 5]
subset = data_array[selected_indices]
subset[0] = 999

print(subset) # [999  40  60]
print(data_array) # [10 20 30 40 50 60]

Notice how changing a value in our subset didn’t alter the data_array.
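Two related details are worth checking for yourself. A basic slice shares memory with the original array while a fancy-indexed result does not, and assigning through a fancy index on the left-hand side does write into the original. Here is a minimal sketch using np.shares_memory; slice_view and fancy_copy are illustrative names.

```python
import numpy as np

data_array = np.array([10, 20, 30, 40, 50, 60])

slice_view = data_array[1:4]        # basic slicing returns a view
fancy_copy = data_array[[1, 2, 3]]  # fancy indexing returns a copy

print(np.shares_memory(data_array, slice_view))  # True
print(np.shares_memory(data_array, fancy_copy))  # False

# Assigning through a fancy index modifies the original array in place.
data_array[[0, 5]] = -1
print(data_array)  # [-1 20 30 40 50 -1]
```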

Types of Fancy Indexing

Integer Array Indexing

Integer array indexing uses lists or arrays of integer positions to pick elements. For a multi-dimensional array you pass one index array per axis, and NumPy pairs them up element by element.

import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
row_indices = [0, 1]
col_indices = [1, 2]

print(matrix[row_indices, col_indices]) # [2 6]

Here we fetched the elements at (0,1) and (1,2) from our matrix.
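Because the row and column lists are paired element by element, this form picks individual cells, not a rectangular block. If you want every combination of the chosen rows and columns, one option is np.ix_. The sketch below compares the two; rows, cols, paired, and block are names chosen for this example.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
rows = [0, 2]
cols = [1, 2]

# Paired indexing: elements at (0, 1) and (2, 2).
paired = matrix[rows, cols]

# np.ix_ builds an open mesh: rows 0 and 2 crossed with columns 1 and 2.
block = matrix[np.ix_(rows, cols)]

print(paired)  # [2 9]
print(block)
# [[2 3]
#  [8 9]]
```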

Boolean Indexing

Sometimes, we want data based on conditions rather than positions. Boolean Indexing is a way to filter data using conditions.

Let’s take an example:

data = np.array([15, 25, 35, 45])
condition = data > 30

print(data[condition]) # [35 45]

By setting a condition (data > 30), we managed to extract all numbers greater than 30 from our array.
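Conditions can also be combined. NumPy uses the element-wise operators & (and), | (or), and ~ (not) for this, and each comparison needs its own parentheses. A short sketch with the same data array, where in_range and outside are illustrative names:

```python
import numpy as np

data = np.array([15, 25, 35, 45])

# Element-wise "and": keep values strictly between 20 and 40.
in_range = (data > 20) & (data < 40)

# Element-wise "not": everything the first mask rejected.
outside = ~in_range

print(in_range)        # [False  True  True False]
print(data[in_range])  # [25 35]
print(data[outside])   # [15 45]
```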

Combined Indexing

Mixing and matching is not just for fashion; it’s also a data analyst’s secret sauce! You can use basic indexing (a single integer or a slice) on one axis and fancy indexing on another within the same expression.

Let’s take an example:

data_matrix = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])
print(data_matrix[2, [0, 2]]) # [35 45]

Here, we selected the third row and cherry-picked the first and third elements.
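The same idea works with a slice in place of the single row number. In this sketch, corner_columns and lower_rows are names made up for illustration.

```python
import numpy as np

data_matrix = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

# All rows (slice) combined with the first and third columns (fancy index).
corner_columns = data_matrix[:, [0, 2]]

# Rows from index 1 onward, with the columns in reversed order.
lower_rows = data_matrix[1:, [2, 0]]

print(corner_columns)
# [[ 5 15]
#  [20 30]
#  [35 45]]
print(lower_rows)
# [[30 20]
#  [45 35]]
```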

Examples

Let’s see some examples that cover common use cases:

Selecting Random Elements

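import numpy as np

data = np.array([10, 20, 30, 40, 50, 60, 70])
# Draw three distinct random positions, then use them as a fancy index.
random_indices = np.random.choice(len(data), 3, replace=False)

print(data[random_indices]) # e.g. [50 10 40]

With np.random.choice(), we drew three distinct random positions and used them as a fancy index to fetch the matching elements. The output changes on every run.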
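If you need a repeatable sample, one option is NumPy’s Generator API with a fixed seed. The following is a sketch, not the only way to do it: the seed value 42 is arbitrary, and rng and sample are names chosen for this example.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50, 60, 70])

# A seeded generator gives the same draw on every run (42 is an arbitrary seed).
rng = np.random.default_rng(seed=42)

# Draw three distinct positions, then fetch them with fancy indexing.
random_indices = rng.choice(len(data), size=3, replace=False)
sample = data[random_indices]

print(random_indices)
print(sample)
```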

Modifying Values with Conditions

grades = np.array([85, 90, 78, 92, 88, 76])
grades[grades < 80] = 0

print(grades) # [85 90  0 92 88  0]

Here, we transformed all grades below 80 to 0.
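Mask assignment changes the array in place. If you want to keep the original grades, two possible approaches are np.where, which builds a new array, or working on an explicit copy. In this sketch, zeroed and curved are illustrative names, and the 5-point bump is an invented rule used only for the demonstration.

```python
import numpy as np

grades = np.array([85, 90, 78, 92, 88, 76])

# np.where returns a new array: 0 where the condition holds, the grade otherwise.
zeroed = np.where(grades < 80, 0, grades)

# Work on a copy and update only the masked elements (hypothetical 5-point bump).
curved = grades.copy()
curved[curved < 80] += 5

print(zeroed)  # [85 90  0 92 88  0]
print(curved)  # [85 90 83 92 88 81]
print(grades)  # [85 90 78 92 88 76]
```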

Fetching Rows from a 2D Array

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
selected_rows = [0, 3]

print(matrix[selected_rows])
# [[ 1  2  3]
#  [10 11 12]]

With just a simple line, we’ve got the first and last rows of our matrix.
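Since the row list can be in any order, the same technique can reorder rows. A common pattern is to feed the result of np.argsort back in as the index. The small scores array below is invented for this sketch, as are the names order and sorted_rows.

```python
import numpy as np

# Hypothetical data: each row is (id, score).
scores = np.array([[3, 30], [1, 10], [2, 20]])

# Positions that would sort the rows by the first column.
order = np.argsort(scores[:, 0])

# Use those positions as a fancy index to reorder whole rows.
sorted_rows = scores[order]

print(order)  # [1 2 0]
print(sorted_rows)
# [[ 1 10]
#  [ 2 20]
#  [ 3 30]]
```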

Conditional Data Extraction with Context

sales_data = np.array([150, 230, 175, 290, 80, 210])
top_sellers = sales_data[sales_data > 200]

print(top_sellers) # [230 290 210]

With a single boolean mask, we’ve isolated the sales figures above 200.
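The mask gives you the values, but often you also want to know which items they belong to. One way is to turn the mask into positions with np.flatnonzero and reuse them on a parallel array. The product_names array here is hypothetical and is not part of the original data.

```python
import numpy as np

sales_data = np.array([150, 230, 175, 290, 80, 210])
# Hypothetical labels, one per entry in sales_data.
product_names = np.array(["A", "B", "C", "D", "E", "F"])

# Positions where the condition holds.
top_positions = np.flatnonzero(sales_data > 200)

# Reuse the same positions as a fancy index on the parallel array.
top_names = product_names[top_positions]

print(top_positions)  # [1 3 5]
print(top_names)      # ['B' 'D' 'F']
```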

Conclusion

In conclusion, we’ve discussed Fancy Indexing and its power in data analysis. It makes data manipulation both intuitive and efficient.

This tool can transform the way you interact with and understand datasets. But, as with all tools, its true potential shows best when applied to real data. We encourage you to dive into datasets, employ Fancy Indexing, and see the magic. Happy coding!
