
Image by Author
CuPy is a Python library compatible with NumPy and SciPy arrays, designed for GPU-accelerated computing. By replacing NumPy syntax with CuPy, you can run your code on NVIDIA CUDA or AMD ROCm platforms. This lets you perform array-related tasks using GPU acceleration, which results in faster processing of larger arrays.
By swapping out just a few lines of code, you can take advantage of the massive parallel processing power of GPUs to significantly speed up array operations like indexing, normalization, and matrix multiplication.
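A minimal sketch of that drop-in pattern (an illustration, not CuPy's official API guidance: the backend alias `xp` is chosen at import time and falls back to NumPy when CuPy is unavailable):

```python
# Select the array backend: CuPy when a GPU stack is available, NumPy otherwise.
try:
    import cupy as xp  # GPU-backed, NumPy-compatible API
except ImportError:
    import numpy as xp  # CPU fallback with identical syntax

a = xp.arange(12, dtype=xp.float32).reshape(3, 4)

# The same calls work on either backend: indexing, normalization, matmul.
row = a[1]                       # indexing
normed = a / xp.linalg.norm(a)   # normalization
product = a @ a.T                # matrix multiplication

print(product.shape)  # (3, 3)
```

Code written this way runs unchanged on CPU and GPU, which is exactly the "swap a few lines" promise.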
CuPy also provides access to low-level CUDA features. It allows you to pass ndarrays to existing CUDA C/C++ programs via RawKernels, streamline performance with Streams, and call CUDA Runtime APIs directly.
You can install CuPy using pip, but before that, you have to find the correct CUDA version by running the nvcc --version command.
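For example (this assumes the CUDA toolkit is on your PATH; in a Colab notebook, prefix the command with `!`):

```shell
# Print the installed CUDA compiler version
nvcc --version
```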
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Company
Constructed on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation instruments, launch 11.8, V11.8.89
Construct cuda_11.8.r11.8/compiler.31833905_0
It seems that the current version of Google Colab uses CUDA version 11.8. Therefore, we will proceed to install the cupy-cuda11x package.
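The install itself is a single pip command:

```shell
# Install the CuPy wheel built for CUDA 11.x
pip install cupy-cuda11x
```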
If you are running an older CUDA version, I have provided a table below to help you determine the appropriate CuPy package to install.
Image from CuPy 12.2.0 documentation
After selecting the correct version, we will install the Python package using pip.
You can also use the conda command to automatically detect and install the correct version of the CuPy package if you have Anaconda installed.
conda install -c conda-forge cupy
In this section, we will compare the syntax of CuPy with NumPy; they are about 95% similar. Instead of using np, you will simply replace it with cp.
We will first create a NumPy and a CuPy array from a Python list. After that, we will calculate the norm of each vector.
import cupy as cp
import numpy as np
x = [3, 4, 5]
x_np = np.array(x)
x_cp = cp.array(x)
l2_np = np.linalg.norm(x_np)
l2_cp = cp.linalg.norm(x_cp)
print("Numpy: ", l2_np)
print("Cupy: ", l2_cp)
As we can see, both give the same result: √(3² + 4² + 5²) = √50 ≈ 7.071.
Numpy: 7.0710678118654755
Cupy: 7.0710678118654755
To convert a NumPy array to a CuPy array, you can simply use cp.asarray(X).
x_array = np.array([10, 22, 30])
x_cp_array = cp.asarray(x_array)
type(x_cp_array)
Or, use .get() to convert a CuPy array back to a NumPy array.
x_np_array = x_cp_array.get()
type(x_np_array)
In this section, we will compare the performance of NumPy and CuPy.
We will use time.time() to measure code execution time. Then, we will create a 3D NumPy array and perform some mathematical functions on it.
import time
# NumPy and CPU Runtime
s = time.time()
x_cpu = np.ones((1000, 100, 1000))
np_result = np.sqrt(np.sum(x_cpu**2, axis=-1))
e = time.time()
np_time = e - s
print("Time consumed by NumPy: ", np_time)
Time consumed by NumPy: 0.5474584102630615
Similarly, we will create a 3D CuPy array, perform the same mathematical operations, and time them for comparison.
# CuPy and GPU Runtime
s = time.time()
x_gpu = cp.ones((1000, 100, 1000))
cp_result = cp.sqrt(cp.sum(x_gpu**2, axis=-1))
e = time.time()
cp_time = e - s
print("\nTime consumed by CuPy: ", cp_time)
Time consumed by CuPy: 0.001028299331665039
To calculate the difference, we will divide the NumPy time by the CuPy time. It appears that we got more than a 500X performance boost while using CuPy.
diff = np_time/cp_time
print(f'\nCuPy is {diff:.2f}X faster than NumPy')
CuPy is 532.39X faster than NumPy
Note: To achieve more reliable results, it is recommended to conduct a few warm-up runs to minimize timing fluctuations.
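A small benchmarking helper along those lines (shown with NumPy so it runs anywhere; the `bench` function is an illustration, not part of either library, and for CuPy you would additionally call cp.cuda.Device().synchronize() before stopping the clock, since GPU kernels launch asynchronously):

```python
import time

import numpy as np


def bench(fn, warmup=3, runs=5):
    """Run fn a few times untimed, then return the best timed run."""
    for _ in range(warmup):
        fn()  # warm-up: fills caches, allocators, and compilation paths
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


x = np.ones((100, 100, 100))
t = bench(lambda: np.sqrt(np.sum(x**2, axis=-1)))
print(f"best of 5 runs: {t:.6f} s")
```

Taking the best of several runs after a warm-up gives a much more stable number than a single time.time() measurement.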
Beyond its speed advantage, CuPy offers advanced multi-GPU support, letting you harness the collective power of multiple GPUs.
Also, you can check out my Colab notebook if you want to compare the results.
In conclusion, CuPy provides a simple way to accelerate NumPy code on NVIDIA GPUs. By making just a few modifications to swap out NumPy for CuPy, you can experience order-of-magnitude speedups on array computations. This performance boost lets you work with much larger datasets and models, enabling more advanced machine learning and scientific computing.
Resources
Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master's degree in Technology Management and a bachelor's degree in Telecommunication Engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.