Learn how to speed up compute-intensive applications with the power of modern GPUs
The most common deep learning frameworks, such as TensorFlow and PyTorch, often rely on kernel calls to use the GPU for parallel computation and to accelerate neural networks. The best-known interface that lets developers program the GPU is CUDA, created by NVIDIA.
Parallel computing requires a completely different point of view from ordinary programming, but before you get your hands dirty, there are terms and concepts to learn.
Background
In the following figure, you see a classic set-up in which we have an input that is processed by the CPU one instruction at a time to generate an output. But how do we process multiple instructions at the same time? This is what we will try to understand in this article.
Terminology
- Process: an instance of a computer program that is being executed. A process runs on the CPU and is allocated its own space in RAM.
- Context: the collection of data describing a process (memory addresses, program state). It allows the processor to suspend the execution of a process and resume it later.
- Thread: a component of a process. Every process has at least one thread, called the main thread, which is the entry point of the program. A thread executes instructions. Within a process, multiple threads can coexist and share the memory allocated to that process; between processes there is no memory sharing.
- Round Robin: on a single-core processor, processes are executed under a Round Robin schedule, in which each process in turn gets a slice of CPU time. Switching from the execution of one process to another is called context switching.
Parallelism
In modern computing we can reduce the need for context switching by running different threads on different cores (for now, think of a core as a small processor). That's why we have multi-core devices!
But remember that in almost every process there are some instructions which should be performed sequentially and some others that can be computed simultaneously in parallel.
When you talk about parallelism, remember that there are 2 types of parallelism:
- Task Level: different tasks on the same or different data
- Data Level: same task on different data
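Data-level parallelism is exactly what GPU kernels exploit. As a preview (kernel syntax is covered later in this series), a minimal CUDA kernel that applies the same task to different elements of an array might look like this:

```cuda
// Each GPU thread performs the same task (add 1.0f) on a
// different element of the array: data-level parallelism.
__global__ void addOne(float *data, int n)
{
    // Compute a unique global index for this thread.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)          // guard threads that fall past the array end
        data[i] += 1.0f;
}
```

Thousands of these threads run at once, each one responsible for a single element of the data.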
At this point, be careful not to confuse parallelism with concurrency.
- Concurrency: a single processor interleaves the execution of several processes, and we only have the illusion of parallelism because the processor switches between them very quickly.
- Parallelism: true simultaneous execution on multiple processors or cores.
CPU, GPU and GPGPU
Graphics processing units (GPUs) can perform enormous numbers of operations in a short time, as long as those operations remain simple and largely similar to one another. The games industry was the launching market for GPUs; NVIDIA later broadened their reach with its CUDA platform. The popularity of GPUs has grown even more among developers, who can now run massively parallel computations with just a few lines of code.
CUDA allows us to use parallel computing for so-called general-purpose computing on graphics processing units (GPGPU), i.e. using GPUs for more general purposes besides 3D graphics.
Let’s summarize some basic differences between CPUs and GPUs.
GPU:
- low clock speed
- thousands of cores
- context switching is done by hardware (really fast)
- can switch between threads if one thread stalls
CPU:
- high clock speed
- few cores
- context switching is done by software (slow)
- switching between threads is comparatively expensive
Basic Steps of CUDA Programming
In the next articles, we are going to write parallel code. First, however, we must know the structure of a CUDA-based program; there are a few simple steps to follow.
- Initialization of data on CPU
- Transfer data from CPU to GPU
- Kernel launch (instructions on GPU)
- Transfer results back to CPU from GPU
- Reclaim the memory from both CPU and GPU
In such an environment we will call Host Code the code that is going to run on the CPU and Device Code the code that is going to run on the GPU.
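As a preview, the five steps above can be sketched in a minimal CUDA program (kernel syntax and error checking are covered later in this series; the `square` kernel here is just an illustrative example):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Device Code: runs on the GPU, one thread per element.
__global__ void square(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] = d[i] * d[i];
}

// Host Code: runs on the CPU and orchestrates the five steps.
int main(void)
{
    const int n = 8;
    float h[n];                                       // 1. initialize data on CPU
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float *d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float),
               cudaMemcpyHostToDevice);               // 2. transfer CPU -> GPU

    square<<<1, n>>>(d, n);                           // 3. kernel launch

    cudaMemcpy(h, d, n * sizeof(float),
               cudaMemcpyDeviceToHost);               // 4. transfer GPU -> CPU

    cudaFree(d);                                      // 5. reclaim GPU memory
                                                      //    (h lives on the CPU stack)
    for (int i = 0; i < n; ++i) printf("%g ", h[i]);
    printf("\n");
    return 0;
}
```

Notice how the host and device each have their own memory: nothing on the GPU is visible to the CPU until it is explicitly copied back.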
There is much more to see, but I prefer not to write it all down in one article because I think it would be more confusing than anything else.
Parallel programming underpins modern computing: it is how we cut down very long computation times, simply by putting more processors or more cores to work together. Unity is strength!
If you’re interested in understanding how GPUs work and you like programming close to the hardware, keep reading this series of articles I’m going to publish. Personally, I have found the study of CUDA extremely interesting: every line of code feels like a puzzle to be solved.
Marcello Politi