Posts Tagged ‘CUDA’

Advanced data types with CUDA

Sunday, September 13th, 2009

Following the CUDA.NET 2.3.6 release, this article shows some of the more advanced constructs .NET offers developers who need advanced interoperability with native code.
As most of you know, CUDA.NET can copy many types of arrays and data types to GPU memory (through the various memcpy functions). These are based on well-defined data types, mostly numerical ones.

Consider the basic float type: the corresponding array is declared as float[] in C# (the syntax differs in other languages, but the principle is the same). In addition to these primitives (byte, short, int, long, float, double), there is also support for the vector data types CUDA supports, such as Float2, which is composed of two consecutive float elements.
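Here is a short C sketch of the layout that "two consecutive float elements" implies for a Float2 array on the native side. The struct float2_t and the function flatten_float2 are hypothetical illustrations, not CUDA or CUDA.NET declarations. CUDA's own float2 in vector_types.h also requests 8-byte alignment. The sketch assumes no padding between the two fields, which holds on mainstream compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of a two-component float vector (illustration only;
   not CUDA's own float2 declaration). */
typedef struct {
    float x;
    float y;
} float2_t;

/* Flatten n vectors into 2n floats. Because the components are stored
   back to back (x0 y0 x1 y1 ...), this is a plain byte copy. */
void flatten_float2(const float2_t *src, float *dst, size_t n) {
    /* Assumption: no padding between or after the two floats. */
    assert(sizeof(float2_t) == 2 * sizeof(float));
    memcpy(dst, src, n * sizeof(float2_t));
}
```

The same reasoning explains why a Float2 array can be copied to the GPU with no conversion. Its bytes are already the interleaved float sequence the device expects.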

What happens when you want to pass more complex data types that are not supported by CUDA.NET?

There are several techniques to achieve this, some more complex to employ than others; the right choice mostly depends on your expected usage.

1. Declaring a new copy function

That is always an option if you wish to extend the API. In this case, the developer declares a new copy function with the expected parameter types and calls it directly.

The following example illustrates this:

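using System.Runtime.InteropServices;

// This is a dummy, complex data type. Sequential layout (the C# default
// for structs) keeps the fields in declaration order for native code.
[StructLayout(LayoutKind.Sequential)]
struct Test
{
    public int value1;
    public float value2;
}

static class NativeMethods
{
    // A new copy function to use with CUDA, assuming Linux (libcuda.so).
    // CUResult and CUdeviceptr are CUDA.NET types; bytes is the total size
    // to copy, i.e. src.Length * Marshal.SizeOf(typeof(Test)).
    [DllImport("cuda")]
    public static extern CUResult cuMemcpyHtoD(CUdeviceptr dst, Test[] src, uint bytes);
}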

The declaration above defines a function that copies data from an array of Test objects to device memory.
However, declaring a new function for every data type is not always convenient.
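The last argument of that declaration is a byte count. The caller therefore passes the number of elements times the size of one element as native code sees it. Below is a C sketch of the native layout that Test is expected to have. It assumes default sequential marshaling, 4-byte int and float, and no padding. The helper test_array_bytes is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* C mirror of the C# Test struct, assuming default sequential layout,
   4-byte int and float, and no padding (true on mainstream ABIs). */
struct Test {
    int value1;   /* expected at offset 0 */
    float value2; /* expected at offset 4 */
};

/* Compile-time check (C11) that no padding was inserted. */
static_assert(sizeof(struct Test) == sizeof(int) + sizeof(float),
              "unexpected padding in struct Test");

/* Hypothetical helper: byte count to pass when copying n elements. */
size_t test_array_bytes(size_t n) {
    return n * sizeof(struct Test);
}
```

For the 100-element array used below, this gives 800 bytes.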

2. The dynamic, simpler way

.NET offers another way to expose .NET objects to native code without using “unsafe” code.

For this purpose, .NET provides the GCHandle structure. It gives fine-grained control over the .NET garbage collector: it can pin an object in memory so the collector cannot move it, and return the object's native address (an IntPtr in .NET).

Since all copy functions in CUDA.NET accept an IntPtr, this mechanism is a generic way to copy any data to the GPU. In fact, when a user calls one of the existing copy functions, the same process is performed internally.

Again, consider the Test structure defined earlier.

using System;
using System.Runtime.InteropServices;

// Getting a native pointer to an array
Test[] data = new Test[100];
// Fill in the array values...
GCHandle ptr = GCHandle.Alloc(data, GCHandleType.Pinned);
try
{
    IntPtr src = ptr.AddrOfPinnedObject();
    // Now copy data.Length * Marshal.SizeOf(typeof(Test)) bytes
    // to the GPU memory from this pointer...
    // ....
}
finally
{
    // When finished, don't forget to free the GCHandle!
    ptr.Free();
}

This is a simple way to expose complex .NET data types to CUDA and CUDA.NET for processing on the GPU.
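On the native side, a pinned IntPtr is only the address of a contiguous range of bytes. That is why one copy entry point can serve any blittable struct. The C sketch below shows this with mock_copy_htod, a hypothetical stand-in that copies into a host buffer playing the role of device memory. It is not a CUDA or CUDA.NET function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same layout as the C# Test struct (sequential, no padding assumed). */
struct Test {
    int value1;
    float value2;
};

/* Hypothetical stand-in for a host-to-device copy: like cuMemcpyHtoD, it
   only receives an untyped source address and a byte count, so it works
   for any element type as long as the caller computes the size. */
void mock_copy_htod(void *device_buf, const void *src, size_t bytes) {
    assert(device_buf != NULL && src != NULL);
    memcpy(device_buf, src, bytes);
}
```

The only type information that crosses the boundary is the byte count. A wrong Marshal.SizeOf-based count, not the pointer, is the usual source of bugs.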

In the next article we will present the new SizeT object we added for portability between 32 and 64 bit systems.

New CUDA.NET Release (2.3.6)

Sunday, September 13th, 2009

Hi Everyone,

We’ve just released the latest version (2.3.6) of CUDA.NET library.

Besides supporting the latest features of CUDA 2.3 (double-precision FFT, advanced memory allocation and more), we extended the runtime and graphics (DX/GL) APIs to better support 32- and 64-bit systems and improve portability.

Following this article, we will publish a series of articles presenting the new constructs we added and native interoperability, which is always a concern for .NET code with advanced application demands. They aim to show how to write code that is portable between systems and how to pass complex structures and data types to the GPU.

Enjoy!

CUDA.NET – Case studies, call for contribution

Friday, May 22nd, 2009

We are pleased to announce a call for contribution for case studies and customer stories using CUDA.NET to be presented in our web site.

We invite organizations, research institutes and individuals to tell us about their use of CUDA.NET for different purposes – developing a product, research in a variety of scientific fields and more.

Users who would like to contribute their story are invited to send their details to cuda.net@gass-ltd.co.il, and we will contact them soon.

Thank you for your cooperation.

CUDA.NET 2.2 released

Thursday, May 21st, 2009

We are happy to announce the release of CUDA.NET version 2.2.

This release aligns with the CUDA 2.2 API and features and provides further improvements to CUDA.NET.

A few of the additions and changes:

  • Supporting CUDA 2.2 API (zero copy etc.)
  • CUDA class supports all driver functions, adding a few missing texture functions to the API
  • Removing double precision FFT routines from CUFFT – the functions were there for future support, but are no longer available
  • Adding MSDN/CHM based documentation for the library
  • Extending the runtime API support to allow various memory copies and the latest 2.2 API

We hope you will all find this release useful.

You are invited to send us comments on your usage and on the library in general, to help us improve it.
You can send all that information to cuda.net@gass-ltd.co.il.

jCUDA 1.1 released

Friday, April 3rd, 2009

jCUDA version 1.1 has been released to the public. This version adds many improvements over the previous 1.0.1 release.

Additions:

  • Adding object-oriented support for CUDA, OpenGL and CUFFT functionality
  • Splitting FFT and CUDA native libraries to operate as standalone
  • Extending native interface to provide more functionality (NativeUtiles.getPointerSize method)

You may download the new release from: http://www.gass-ltd.co.il/en/products/jcuda/.

jCUDA 1.0.1 released

Friday, March 13th, 2009

We are pleased to announce the availability of jCUDA version 1.0.1 to the public.

New in this version:

  • Support for Windows operating system (XP/Vista) in 32/64 bit
  • Fixing issues with native layer

You may download it from: http://www.gass-ltd.co.il/en/products/jcuda.

CUDA.NET 2.1 Released

Thursday, February 5th, 2009

We are pleased to announce that a new version of CUDA.NET is out, following the release of CUDA 2.1.

The new release of CUDA.NET provides support for the new DirectX 10 interoperability API and the JIT compiler.


DirectX 10 interoperability

The new API by NVIDIA allows existing DirectX 10 applications to be integrated with CUDA, adding another level of computing, whether for post-processing, image processing or other computations.

DirectX 9 API is still supported.

JIT Compiler

NVIDIA now provides compiler support through the API. This makes it possible to generate CUDA kernel code at runtime and compile it on demand.

In addition, it allows kernel code to be shipped with an application and compiled on the target machine with a specific configuration: maximum register usage, target hardware support and more.

Fixes

This release of CUDA.NET 2.1 fixes an issue with the CUDAExecution class: running a computation on the GPU using the class and then calling the Clear method did not clear the parameter state. This issue is fixed in this release.

Security in Hoopoe

Monday, January 19th, 2009

Security is always a major part of cloud systems and requires great effort to design and develop.

The problem usually starts when users are given access to actual machines, so they can run applications directly on the operating system, whether Windows or Linux based.

Security models in Hoopoe

Hoopoe provides several features to overcome this problem.

Isolated user environment

Hoopoe provides each user with a unique, isolated environment. This way, only the user can access his or her files and computations, through the specific mechanism Hoopoe provides for file management and related operations.

Hiding the “metal”

Hoopoe hides the “metal” from the user, providing access only through a web service interface.
This limits the kinds of code a user can run.
There is no direct access to machines; instead, the user submits a task to Hoopoe for processing. After submission, the user waits for the task to finish and then copies the results back.

Independent data management

User data is managed by Hoopoe as files, either raw or compressed (using GZip).
Each buffer is then read in a fully managed (.NET) environment, reducing the risk posed by malformed or “bad” files.

Running computations

Hoopoe is meant to run computations, not to serve as an operating system. Accordingly, user tasks are compiled on demand for the platform they will be processed on (32- or 64-bit, or specific hardware support).

Computations run on the GPU itself, and the interaction with the GPU is limited to copying the relevant data, performing the computations and placing the results back in the appropriate buffer.

Using CUDA FFT from FORTRAN

Monday, January 19th, 2009

In this post we demonstrate how to call CUDA FFT routines (CUFFT) from a FORTRAN application, using the native CUDA interface and our bindings.

CUFFT usage

The CUFFT library by NVIDIA follows the conventions of the FFTW library for running FFTs.
For example, executing a 2D FFT over a 256×256 data set involves the following steps.

General GPU steps:

  1. Select the GPU device to work with
  2. Allocate enough device memory to store data
  3. Transfer input data to device

FFT steps:

  1. Create FFT plan with specific dimensions
  2. Execute FFT on device with input and output parameters
  3. Destroy FFT plan

After computing steps:

  1. Copy results back to CPU memory (RAM)
  2. Release device memory

Let’s code

General GPU steps

There are two ways to select the device we want to work with: the driver interface and the runtime interface.

Selecting a device with the CUDA driver interface is a bit more complicated but offers more flexibility.


      ! Initialize CUDA, default flags
      call cuInit(0)
      ! Get a reference to the 1st device in the system
      ! recognized by CUDA
      call cuDeviceGet(idev, 0)
      ! Now, create a new context and bind it to the
      ! device we got before
      call cuCtxCreate(ictx, 0, idev)

This code fragment corresponds to step 1 of the general GPU steps: it selects the first device in the system as the one to work with.

Device memory is allocated with the CUDA function cuMemAlloc.
For example:


      ! Allocate memory for an inx * iny array of single-precision
      ! complex elements: 4 bytes per float, 2 floats per element
      call cuMemAlloc(iptr, inx * iny * 4 * 2)

This maps to step 2 of the general GPU steps.

To copy memory from the CPU (host) to the GPU (device), we call cuMemcpyHtoD, meaning a Host->Device copy.
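The byte count passed to cuMemAlloc is the number of elements times the size of one element. Below is a small C sketch of that calculation. The names complex_f32 and complex_buffer_bytes are illustrative and are not part of CUDA. The sketch assumes a 4-byte float.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative single-precision complex value: real and imaginary parts
   stored back to back, so 4 * 2 = 8 bytes (the "* 4 * 2" factor above). */
typedef struct {
    float re;
    float im;
} complex_f32;

/* Bytes needed for an nx x ny array of single-precision complex values. */
size_t complex_buffer_bytes(size_t nx, size_t ny) {
    assert(sizeof(complex_f32) == 2 * sizeof(float));
    return nx * ny * sizeof(complex_f32);
}
```

For the 256×256 data set this gives 524288 bytes (512 KiB). The same count is used for the host-to-device and device-to-host copies.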


      ! Assume that data was declared as COMPLEX data(inx, iny)
      call cuMemcpyHtoD(iptr, data, inx * iny * 4 * 2)

This maps to step 3 of the general GPU steps.

With that, the data is prepared on the GPU and we are ready to run the FFT routine.

FFT steps

Using the CUFFT library is relatively easy, as the following example shows.


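      ! Create the FFT plan. The dimensions are fixed at this
      ! stage, so the plan can be reused later. cuFFT expects
      ! row-major (C) order while FORTRAN arrays are column-major,
      ! so the dimensions are given in reverse order (this only
      ! matters when inx differs from iny).
      ! The last parameter denotes the type of FFT to perform:
      ! Real->Complex, Complex->Real or Complex->Complex.
      ! The value 41 (0x29, CUFFT_C2C in cufft.h) represents
      ! Complex->Complex; FORTRAN has no 0x hex literals, and a
      ! named constant could be defined for this purpose.
      call cufftPlan2d(iplan, iny, inx, 41)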

This maps to step 1 of the FFT steps: creating an FFT plan.

Once we have the plan, we can simply execute the requested FFT and get back the results.
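Named constants are clearer than a bare number, as the comment suggests. The C sketch below mirrors the cufftType values and direction flags from NVIDIA's cufft.h. They are redeclared here under hypothetical MY_ names so the snippet compiles without the toolkit. The sketch also shows the index arithmetic behind dimension order. cuFFT assumes row-major (C) storage, while a FORTRAN array data(inx, iny) is column-major. The helper functions are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Values mirrored from cufft.h (cufftType and direction flags), renamed
   with a hypothetical MY_ prefix so this compiles without the toolkit. */
enum {
    MY_CUFFT_R2C = 0x2a, /* real -> complex */
    MY_CUFFT_C2R = 0x2c, /* complex -> real */
    MY_CUFFT_C2C = 0x29  /* complex -> complex (decimal 41) */
};
enum { MY_CUFFT_FORWARD = -1, MY_CUFFT_INVERSE = 1 };

/* Zero-based offset of FORTRAN element data(i, j) (1-based indices) in a
   column-major array with inx rows: the FIRST index varies fastest. */
size_t fortran_offset(size_t i, size_t j, size_t inx) {
    assert(i >= 1 && j >= 1 && i <= inx);
    return (i - 1) + (j - 1) * inx;
}

/* Zero-based offset of C element a[row][col] in a row-major array with
   ncols columns: the LAST index varies fastest. */
size_t c_offset(size_t row, size_t col, size_t ncols) {
    assert(col < ncols);
    return row * ncols + col;
}
```

FORTRAN data(i, j) occupies the same memory as C a[j-1][i-1] in an array with iny rows of inx elements. A row-major plan for it is therefore cufftPlan2d(plan, iny, inx, ...). For the square 256×256 data set in this post, both orders give the same plan.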


      ! Execute the FFT according to our plan. Specifying
      ! iptr for both input and output means an in-place FFT;
      ! the results can also be stored in a different buffer.
      ! The value -1 denotes the direction of the FFT, where
      ! -1 is forward and 1 is inverse.
      call cufftExecC2C(iplan, iptr, iptr, -1)

This maps to step 2 of the FFT steps.

Once the FFT has been executed and we have finished working with it, it is time to release the resources consumed by the FFT library.
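The direction flag chooses between forward (-1) and inverse (1), and cuFFT does not normalize either transform. A forward transform followed by an inverse one returns the input multiplied by the number of elements. If the data is ever transformed back, it has to be rescaled. The function normalize_after_inverse below is a hypothetical C helper, not part of CUFFT. It does that rescaling on interleaved single-precision complex data after it has been copied back to the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper. cuFFT transforms are unnormalized, so
   inverse(forward(x)) == (nx * ny) * x. This divides interleaved
   single-precision complex data (re, im, re, im, ...) of an nx x ny
   transform by nx * ny to restore the original scale. */
void normalize_after_inverse(float *data, size_t nx, size_t ny) {
    assert(nx > 0 && ny > 0);
    const float scale = 1.0f / (float)(nx * ny);
    for (size_t k = 0; k < 2 * nx * ny; k++)
        data[k] *= scale;
}
```

A forward-only transform, like the one in this post, needs no rescaling. The convention matters only for round trips.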


      ! Destroy the FFT plan
      call cufftDestroy(iplan)

This completes the FFT steps.

After computing steps

The GPU computations are now complete, so we can copy the results back to CPU memory for further processing.


      ! Use the Device->Host function to copy the
      ! computed data from GPU to CPU.
      call cuMemcpyDtoH(data, iptr, inx * iny * 4 * 2)

This maps to step 1 of the after-computing steps. After this copy, the data computed by the GPU is available in the “data” array.

Now we release the GPU resources used during the computation.


      ! Free the GPU memory we allocated previously
      call cuMemFree(iptr)
      ! Destroy the CUDA context. The driver releases it anyway
      ! when the process exits, but it's a good habit to do it
      ! explicitly.
      call cuCtxDestroy(ictx)

That's it: the code is complete, and we have used the GPU to compute an FFT.

Final words

This example showed how to compute FFTs on the GPU with NVIDIA's CUDA framework. The FFT is a very important tool for many applications and scientific computations, and the GPU can speed up FFT computations significantly compared to the CPU, often by a large factor for large transforms.

Compiling

If you use gfortran, g77, g95 or ifort under Linux, compile the above FORTRAN code by simply issuing the command:


gfortran fft.f cuda.o cufft.o -lcufft -lcuda

where gfortran can be replaced by your favoured compiler. The libraries libcufft.so and libcuda.so come with the NVIDIA CUDA Toolkit and driver, respectively, so they are present on any machine that has them installed. The files cuda.o and cufft.o contain the bridge code needed for FORTRAN-to-C communication.

Announcing Hoopoe – Cloud Services for GPU Computing

Sunday, January 18th, 2009

We are happy to introduce to you “Hoopoe”, a cloud solution for GPU computing.

Many of you may have expected such a service to become available at some point, and indeed it has.

Hoopoe provides a web service interface for communicating with the system. In the near future it will also provide machine-level access for running specific applications, as in regular CPU-based clouds.

Partial feature list of the system:

  • CUDA Support
  • Executing CUDA kernels, FFT and BLAS routines
  • OpenCL Support
  • Executing OpenCL kernels
  • Fully secure

Take a closer look at http://www.hoopoe-cloud.com. The system will be open for alpha testing very soon, so you are invited to register.