Tuesday, December 7, 2010

Open Source Software Helps Computers Use Graphics Chips for General Processing

Just in case you cannot read the picture above, from IEEE Computer magazine, December 2010.

An academic research team has developed an open source software tool that lets computers use the processing power of graphics processing units (GPUs) for purposes other than rendering and manipulating images.

North Carolina State University scientists are continuing to work on their optimizing compiler tool, which would let developers write simple application code without needing to know how to program specifically for GPUs. 

In essence, the tool takes an application and translates it into another program that does the same thing more efficiently on a GPU. Computers use GPUs to generate the complex, data-intensive graphics seen in games, virtual reality, data visualization, and other applications, explained North Carolina State associate professor Huiyang Zhou. 

GPUs are much better than CPUs at vector processing: handling multiple sets of numeric operations on large arrays of structured data in parallel, explained Nathan Brookwood, research fellow with market-analysis firm Insight 64. They can thus offer considerably more performance than CPUs. 
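The appeal of vector processing can be seen in a tiny host-side sketch (plain Python, not GPU code): a SAXPY-style operation, y[i] = a*x[i] + y[i], where every element is computed independently. On a GPU, each index could be handled by its own thread at the same time.

```python
def saxpy(a, x, y):
    # Each result touches only index i -- there are no cross-element
    # dependencies, which is what makes the loop trivially parallelizable
    # across thousands of GPU threads.
    return [a * xi + yi for xi, yi in zip(x, y)]

print(saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))  # [12.0, 24.0, 36.0]
```

The same independence is what the peak-performance numbers below depend on: a GPU only reaches its advertised Gflops when the work decomposes into many such independent operations.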

Generally, mainstream general-purpose CPUs offer a peak performance between 20 and 150 Gflops, while mainstream GPUs perform between 500 and 1,500 Gflops, he said. 

GPUs have been used to accelerate general programs for several years, and researchers are looking for ways to improve the process. The chips are fast in part because they process data in parallel. They thus must work with applications written to compute this way. 

A key challenge is managing parallelism so that the chips stay busy by efficiently handling the various threads they work with. Also challenging are the effective use of on-chip and off-chip memory, and the even distribution of off-chip memory accesses, to avoid overusing some memory controllers while leaving others idle. 

Zhou's tool compiles the application code optimally to both manage parallelism and efficiently utilize GPU memory. It enables a single memory access to load what would otherwise be multiple memory accesses, which makes data loading quicker and more efficient, thereby improving performance. 
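The benefit of folding several memory accesses into one can be illustrated with a simplified model (this is an assumed sketch, not the paper's actual transformation): memory is a list, and we count "transactions" needed to fetch four adjacent values one at a time versus in a single wide load.

```python
def load_scalar(mem, base, n, counter):
    # Fetch n adjacent elements one at a time: one transaction per element.
    out = []
    for i in range(n):
        counter[0] += 1
        out.append(mem[base + i])
    return out

def load_wide(mem, base, n, counter):
    # Fetch the same n elements as one wide access: a single transaction.
    counter[0] += 1
    return mem[base:base + n]

mem = list(range(100))
c1, c2 = [0], [0]
assert load_scalar(mem, 8, 4, c1) == load_wide(mem, 8, 4, c2)
print(c1[0], c2[0])  # 4 transactions vs 1
```

Fewer, wider transactions mean less time spent waiting on memory, which is where much of the performance improvement comes from.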

The tool then checks the code to determine if it can handle high-performance memory features such as memory coalescing, which lets a GPU efficiently handle memory-access requests from multiple threads in a single process. If so, the tool recompiles the code so that it can implement these features. It subsequently determines whether data can be reused within and across threads. If so, the tool stores data in on-chip shared memory for fast reuse.
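Memory coalescing hinges on the addresses that a group of threads touches in one step. A rough sketch (segment size of 128 bytes is an assumption for illustration; real hardware rules vary) counts how many memory segments a group of 32 threads hits: accesses within one segment can be merged into a single transaction.

```python
def segments_touched(addresses, seg_size=128):
    # Number of distinct memory segments touched in one access step.
    # The fewer the segments, the fewer transactions the hardware issues.
    return len({addr // seg_size for addr in addresses})

WARP = 32   # threads accessing memory together
ELEM = 4    # bytes per 32-bit element

coalesced = [tid * ELEM for tid in range(WARP)]        # thread i reads a[i]
strided   = [tid * 33 * ELEM for tid in range(WARP)]   # thread i reads a[33*i]

print(segments_touched(coalesced))  # 1 segment -> one transaction
print(segments_touched(strided))    # 32 segments -> 32 transactions
```

The contiguous pattern lands in a single segment and coalesces perfectly, while the strided pattern scatters every thread into its own segment.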

Finally, Zhou explained, the tool recompiles the code so that when the system accesses off-chip memory, the accesses are evenly and efficiently distributed among multiple memory controllers. 
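Why distribution across controllers matters can be shown with a simplified address-interleaving model (the 8-controller, 256-byte-line mapping here is an assumption, not the actual hardware scheme): consecutive memory lines rotate across controllers, so a poorly chosen stride can funnel every access to one controller.

```python
from collections import Counter

def controller_histogram(addresses, num_controllers=8, line=256):
    # Assumed interleaved mapping: consecutive 256-byte lines rotate
    # round-robin across the memory controllers.
    return Counter((addr // line) % num_controllers for addr in addresses)

# Sequential lines spread evenly: each of the 8 controllers serves 8 lines...
even = controller_histogram([i * 256 for i in range(64)])
# ...while a stride of num_controllers * line hits controller 0 every time.
skewed = controller_histogram([i * 256 * 8 for i in range(64)])

print(dict(even))    # {0: 8, 1: 8, ..., 7: 8}
print(dict(skewed))  # {0: 64}
```

The skewed pattern leaves seven controllers idle while one is saturated, which is exactly the imbalance the recompilation step is meant to avoid.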

Zhou said tests by his research team showed that programs automatically translated by the new tool operated more efficiently than those manually optimized. 

Brookwood said the North Carolina State tool works similarly to Nvidia's CUDA and AMD's ATI Stream Technology, proprietary applications that enable GPUs to accelerate general-computing operations. It appears that Zhou's technique optimizes code more aggressively, he noted. Also, because the approach is not proprietary, it works with any hardware platform.

According to Zhou, his approach is more efficient because it yields optimized code that is better at parallelism management and memory utilization.

He said he eventually plans to release his tool as open source software.
