
Find Unique Rows in a NumPy Array
Duplicate rows frequently need to be found and removed from datasets in data science and machine learning. NumPy, a popular Python library for numerical computation, offers several functions for manipulating arrays that help with this. In this tutorial, we'll go through how to find the unique rows in a NumPy array.
Installation and Setup
NumPy must first be installed using pip before it can be used in Python.
pip install numpy
Once installed, we can import the NumPy library in Python using the following statement:
import numpy as np
Syntax
The NumPy function that we will use to find unique rows in a NumPy array is np.unique(). The syntax of this function is as follows:
np.unique(arr, axis=0)
Here, arr is the NumPy array in which we want to find the unique rows, and axis is the axis along which to perform the uniqueness test. Note that axis defaults to None, in which case the array is flattened and the unique individual values are returned. Passing axis=0 explicitly tells np.unique() to compare whole rows instead.
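To see what the axis argument changes in practice, the short sketch below calls np.unique() on the same array with and without axis=0. The array and variable names are only illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [1, 2, 3]])

# Without axis, the array is flattened and the unique scalar values are returned.
unique_values = np.unique(arr)

# With axis=0, each row is treated as a single item and the unique rows are returned.
unique_rows = np.unique(arr, axis=0)

print(unique_values)  # 1-D array of the distinct numbers
print(unique_rows)    # 2-D array of the distinct rows
```

The first call returns a one-dimensional array of values, while the second keeps the row structure, which is what the rest of this tutorial relies on.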
Code Algorithm
Import the required library, NumPy.
Create a NumPy array with some duplicate rows using np.array().
Use the np.unique() function with axis=0 to find the unique rows and assign the result to a variable called unique_rows.
Finally, print the unique_rows array using the print() function.
Example
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6], [1, 2, 3]])
unique_rows = np.unique(arr, axis=0)
print(unique_rows)
Output
[[1 2 3]
 [4 5 6]]
Here we create a NumPy array arr that contains a duplicate row. We call np.unique() with axis=0 to find the unique rows, assign the result to unique_rows, and print it. The repeated row [1, 2, 3] appears only once in the output.
Example 2
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
unique_rows = np.unique(arr, axis=0)
print(unique_rows)
Output
[[1 2 3]
 [4 5 6]
 [7 8 9]]
This time the array arr has no duplicate rows. We again call np.unique() with axis=0, assign the result to unique_rows, and print it. Because every row is already unique, the output contains the same rows as the input.
Example 3
Suppose we have a NumPy array representing a dataset with some duplicate rows, and we want to find and remove these duplicate rows. The dataset and the code are given below:
import numpy as np
dataset = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [1, 2, 3, 4], [9, 10, 11, 12], [5, 6, 7, 8]])
unique_rows = np.unique(dataset, axis=0)
print(unique_rows)
Output
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
Here we create a NumPy array dataset that contains some duplicate rows. We call np.unique() with axis=0 to find the unique rows, assign the result to unique_rows, and print it. The output shows that the duplicate rows have been removed from the dataset.
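np.unique() returns the unique rows in sorted order, which may differ from the order in which they appear in the dataset. As a minimal sketch of one way to keep the original order and also count the duplicates, the example below uses the return_index and return_counts parameters of np.unique(). The sample data and the names first_idx, counts, and deduplicated are illustrative assumptions.

```python
import numpy as np

# Illustrative dataset whose rows are deliberately not in sorted order.
dataset = np.array([[5, 6, 7, 8],
                    [1, 2, 3, 4],
                    [5, 6, 7, 8],
                    [9, 10, 11, 12],
                    [1, 2, 3, 4]])

# return_index gives the position of the first occurrence of each unique row;
# return_counts gives how many times each unique row appears.
unique_rows, first_idx, counts = np.unique(
    dataset, axis=0, return_index=True, return_counts=True
)

# Sorting the first-occurrence indices restores the original row order.
deduplicated = dataset[np.sort(first_idx)]

print(deduplicated)
print(counts)
```

Indexing the dataset with the sorted first-occurrence positions keeps each row where it first appeared, which can matter when row order carries meaning, for example in time-ordered records.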
Applications
In data science and machine learning, it is frequently necessary to remove duplicate rows from a dataset, since repeated samples can skew results and contribute to overfitting. Finding the unique rows by hand is tedious and error-prone, especially for large arrays.
This is commonly achieved with the np.unique() function, which makes it simple to find and extract the unique rows from a NumPy array so that you can use them to build a new dataset free of duplicates.
It is important to remember that this approach has limits. With axis=0, np.unique() also accepts arrays with more than two dimensions, in which case it compares whole sub-arrays along the first axis, but it returns the result in sorted order rather than the original order, and the axis argument is not supported for arrays of object dtype. For datasets with more complex structures you should consider other approaches.
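As a small check of how np.unique() behaves on higher-dimensional input, the sketch below applies it with axis=0 to a three-dimensional array. The blocks array is a hypothetical example in which the first and last 2x2 blocks are identical.

```python
import numpy as np

# Hypothetical 3-D array: three 2x2 blocks, the first and last identical.
blocks = np.array([
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[1, 2], [3, 4]],
])

# With axis=0, each 2x2 block is compared as a whole.
unique_blocks = np.unique(blocks, axis=0)

print(unique_blocks.shape)
print(unique_blocks)
```

Here each element along the first axis is a complete block, so only the two distinct blocks remain in the result.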
Conclusion
This article covered how to find the unique rows in a NumPy array using Python. We showed how np.unique() with axis=0 locates and removes duplicate rows from a dataset and illustrated its use with a few examples. NumPy provides many other practical functions for manipulating arrays.