Credit goes to www.tutorialspoint.com

NumPy - Loading Arrays



Loading Arrays in NumPy

Loading arrays in NumPy means reading data from external files or other sources into NumPy arrays.

This lets you bring data stored in text files, binary files, or other formats into NumPy for analysis or manipulation. Following are the common methods used for loading arrays in NumPy −

  • Loading from Text Files: Use functions like np.loadtxt() or np.genfromtxt() to read data from text files.
  • Loading from Binary Files: Use the np.fromfile() function to read raw data from binary files.
  • Loading from .npy Files: Use the np.load() function to read data from files saved in NumPy's native binary format (.npy files).

Loading Arrays from Text Files

Loading arrays from text files in NumPy is a common operation for importing data stored in plain text files into NumPy arrays.

NumPy provides two functions for this, np.loadtxt() and np.genfromtxt(), which between them handle a range of text file formats and structures. They are described below −

Using np.loadtxt() Function

The np.loadtxt() function is used for reading data from a text file into a NumPy array.

This function is commonly used for loading structured data that is organized in a tabular format, such as CSV files or space-separated files. It is suitable for data files where each line contains a row of numbers, and all rows have the same number of columns. Following is the syntax −

numpy.loadtxt(fname, dtype=<type>, delimiter=<delimiter>, comments=<char>, skiprows=<num>, usecols=<cols>)

Where,

  • fname: Filename or file object to read.
  • dtype: Data type of the resulting array (default is float).
  • delimiter: String or character separating values (e.g., comma, space). By default, any whitespace separates values.
  • comments: String indicating the start of a comment (e.g., #).
  • skiprows: Number of rows to skip at the beginning of the file.
  • usecols: Indices of columns to read (e.g., [0, 2] to read the first and third columns).

Example

Assume you have a text file "data.txt" with the following content −

1 2 3
4 5 6
7 8 9

You can load this data into a NumPy array using the loadtxt() function as shown below −

import numpy as np

# Load data from a text file
array_from_text = np.loadtxt('data.txt')

print("Array loaded from text file:")
print(array_from_text)
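
The optional parameters listed above can be combined in a single call. The following is a minimal sketch, assuming a small comma-separated table with a header row and a comment line; the content is made up for illustration, and io.StringIO stands in for a file on disk −

```python
import io
import numpy as np

# Hypothetical CSV content; io.StringIO behaves like an open text file
csv_text = io.StringIO(
    "x,y,z\n"
    "# this comment line is ignored\n"
    "1.0,2.0,3.0\n"
    "4.0,5.0,6.0\n"
)

# Skip the header row, split on commas, keep the first and third columns
subset = np.loadtxt(
    csv_text,
    delimiter=",",
    comments="#",
    skiprows=1,
    usecols=[0, 2],
)

print("Selected columns:")
print(subset)
```

Here the result has two rows and two columns, because the header and comment lines are dropped and only columns 0 and 2 are kept.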

Using np.genfromtxt() Function

The np.genfromtxt() function is used to read data from text files into NumPy arrays. It is useful for handling more complex text file formats, including files with missing values, mixed data types, and irregular structures. Following is the syntax −

numpy.genfromtxt(fname, dtype=<type>, delimiter=<delimiter>, comments=<char>, skip_header=<num>, usecols=<cols>, filling_values=<value>, missing_values=<value>, converters=<dict>, encoding=<str>, names=<bool>)

Where,

  • fname: Filename or file object to read.
  • dtype: Data type of the resulting array. If not specified, defaults to float.
  • delimiter: String or character separating values (e.g., comma for CSV, space for space-separated).
  • comments: String indicating the start of a comment (e.g., #). Lines starting with this character are ignored.
  • skip_header: Number of lines to skip at the beginning of the file (useful for skipping headers).
  • usecols: Indices of columns to read. For example, [0, 2] will read only the first and third columns.
  • filling_values: Values to use for missing data. Can be a scalar or a dictionary mapping column indices to fill values.
  • missing_values: Values representing missing data in the file. Can be a scalar or a list of values.
  • converters: Dictionary of functions for converting columns to specific formats.
  • encoding: Encoding to use for reading the file (default is None, which uses the system default).
  • names: If True, the column names are read from the first line that follows the skipped header lines. The result is then a structured array whose fields can be accessed by name.

Example

In this example, we are loading the "data.txt" file into a NumPy array using the genfromtxt() function −

import numpy as np

# Load data from a text file
array = np.genfromtxt('data.txt')

print("Array loaded from text file:")
print(array)
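
The case where genfromtxt() differs most from loadtxt() is a file with missing entries. The following is a minimal sketch, assuming a hypothetical table in which one field is empty and another is marked "N/A"; the column names and the fill value of -1.0 are chosen only for illustration −

```python
import io
import numpy as np

# Hypothetical CSV content with a header row and two missing entries
text = io.StringIO(
    "id,height,weight\n"
    "1,170.5,65.0\n"
    "2,,72.5\n"
    "3,181.0,N/A\n"
)

table = np.genfromtxt(
    text,
    delimiter=",",
    names=True,            # take field names from the first line
    missing_values="N/A",  # treat this marker as missing, in addition to empty fields
    filling_values=-1.0,   # value substituted for missing entries
)

print(table.dtype.names)
print(table["height"])
print(table["weight"])
```

Because names=True is used, the result is a structured array, so each column is accessed by its field name rather than by a numeric index.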

Loading Arrays from Binary Files

Loading arrays from binary files in NumPy involves reading data that has been stored in a binary format, which is generally more efficient for storage and retrieval than text formats.

Binary files contain raw data, which must be interpreted correctly based on the expected format and data type. NumPy provides the np.fromfile() and np.load() functions to load arrays from binary files.

Using np.fromfile() Function

The np.fromfile() function is used to load raw binary data from a file into a NumPy array. A raw binary file stores no shape or data type information, so you must supply the correct data type yourself, and the result is always a one-dimensional array. Following is the syntax −

numpy.fromfile(file, dtype=<type>, count=-1, offset=0)

Where,

  • file: Filename or file object to read.
  • dtype: Data type of the resulting array (e.g., np.float32, np.int32).
  • count: Number of items to read. If -1, read all data.
  • offset: Number of bytes to skip at the beginning of the file.

Example

Assume you have a binary file "data.bin" that contains 32-bit float data. The file can be created using the following code −

import numpy as np

# Create a binary file with float data
data = np.array([1.1, 2.2, 3.3], dtype=np.float32)
data.tofile('data.bin')
print('File created!!')

Now, to read this binary file, use the following code −

import numpy as np

# Load data from a binary file
array = np.fromfile('data.bin', dtype=np.float32)

print("Array loaded from binary file:")
print(array)

Following is the output of the above code −

Array loaded from binary file:
[1.1 2.2 3.3]
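
The count and offset parameters make it possible to read only part of a raw file, and a two-dimensional array has to be reshaped after loading. The following is a minimal sketch, assuming a hypothetical file "matrix.bin" written to a temporary directory −

```python
import os
import tempfile
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name inside a temporary directory
    path = os.path.join(tmp_dir, "matrix.bin")
    matrix.tofile(path)

    # The raw file stores no shape, so the data comes back one-dimensional
    flat = np.fromfile(path, dtype=np.int32)
    restored = flat.reshape(2, 3)

    # Skip the first row (3 items of 4 bytes each) and read the next 3 items
    second_row = np.fromfile(
        path, dtype=np.int32, count=3, offset=3 * matrix.itemsize
    )

print(flat)
print(restored)
print(second_row)
```

Note that offset is given in bytes while count is given in items, which is why the offset is multiplied by the item size.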

Using np.load() Function for .npy Files

The np.load() function loads arrays from files in NumPy's native binary formats, .npy and .npz. The .npy format preserves the array's metadata, such as its shape and data type. The .npz format stores multiple arrays in a single archive, which is compressed only when it is written with np.savez_compressed().

Following is the syntax −

numpy.load(file, mmap_mode=None, allow_pickle=False, fix_imports=True, encoding='ASCII')

Where,

  • file: The filename or file object to read. This can be a .npy file (for single arrays) or a .npz file (for multiple arrays).
  • mmap_mode: If not None, the file is memory-mapped using the given mode, which allows parts of a large array to be read without loading the entire file into memory. Valid values are 'r+', 'r', 'w+', and 'c'.
  • allow_pickle: If True, allows loading object arrays saved with Python's pickle format. Be cautious with this option, as loading pickled data can execute arbitrary code and poses a security risk.
  • fix_imports: If True, tries to map Python 2 names to Python 3 names when loading pickled data that was saved under Python 2.
  • encoding: The encoding used to read Python 2 strings when loading Python 2 generated pickled files in Python 3. Only 'latin1', 'ASCII', and 'bytes' are allowed. Default is 'ASCII'.

Example: Loading .npy Files

Here, we are first saving an array to the ".npy" file format −

import numpy as np

# Create a NumPy array
array = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

# Save the array to a .npy file
np.save('data.npy', array)
print("Saved!!")

Now, we are loading the saved array from the ".npy" file using the load() function in NumPy −

import numpy as np

# Load the array from the .npy file
array = np.load('data.npy')

print("Array loaded from .npy file:")
print(array)

The output obtained is as shown below −

Array loaded from .npy file:
[[1 2 3]
 [4 5 6]]
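
The mmap_mode parameter described above is useful when only a small part of a large array is needed. The following is a minimal sketch, assuming a hypothetical file "big.npy" written to a temporary directory; the array size is arbitrary −

```python
import os
import tempfile
import numpy as np

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name inside a temporary directory
    path = os.path.join(tmp_dir, "big.npy")
    np.save(path, np.arange(1_000_000, dtype=np.int64))

    # Open the file read-only as a memory map instead of reading it all
    mapped = np.load(path, mmap_mode="r")
    is_memmap = isinstance(mapped, np.memmap)

    # Copy just a small slice into an ordinary in-memory array
    window = np.array(mapped[10:15])

    # Release the mapping before the temporary directory is removed
    del mapped

print(is_memmap)
print(window)
```

The slice is copied with np.array() so that it remains usable after the memory map is released.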

Example: Loading .npz Files

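The .npz format is used for saving multiple arrays into a single file. It creates a zip archive where each file inside is an ".npy" file, as shown in the following example. Note that np.savez() stores the arrays uncompressed; use np.savez_compressed() if the archive should be compressed −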

import numpy as np

# Save multiple arrays to a .npz file
array1 = np.array([1, 2, 3])
array2 = np.array([[4, 5, 6], [7, 8, 9]])
np.savez('data.npz', array1=array1, array2=array2)

# Load the arrays from the .npz file
data = np.load('data.npz')

# Access individual arrays using their keys
array1_loaded = data['array1']
array2_loaded = data['array2']

print("Array 1 loaded from .npz file:")
print(array1_loaded)

print("Array 2 loaded from .npz file:")
print(array2_loaded)

After executing the above code, we get the following output −

Array 1 loaded from .npz file:
[1 2 3]
Array 2 loaded from .npz file:
[[4 5 6]
 [7 8 9]]
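
When the archive should be compressed, np.savez_compressed() can be used in place of np.savez(). The following is a minimal sketch, assuming a hypothetical file "bundle.npz" in a temporary directory and made-up key names; it also opens the archive in a with statement so that the underlying file is closed afterwards −

```python
import os
import tempfile
import numpy as np

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name inside a temporary directory
    path = os.path.join(tmp_dir, "bundle.npz")

    # Write two arrays into one compressed archive under chosen key names
    np.savez_compressed(
        path,
        first=np.array([1, 2, 3]),
        second=np.zeros((2, 2)),
    )

    # The object returned for .npz files supports the with statement
    with np.load(path) as bundle:
        names = sorted(bundle.files)  # keys stored in the archive
        first = bundle["first"]       # arrays are read on access
        second = bundle["second"]

print(names)
print(first)
print(second)
```

The files attribute lists the keys available in the archive, which helps when the names used at save time are not known in advance.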