Posts Tagged ‘CUDA’

00 – Preface

Tuesday, August 3rd, 2010

The new CUDA.NET Tutorials category was created to collect and manage resources and materials for developers starting to work and develop with CUDA.NET library for various platforms.

The usual composition will be of articles on specific topics and gradually increasing complexity.

This post will include an additional Table of Contents for published articles as we go.

Table of Contents

  1. Preface

 

For any question or comment, please contact us through our email address: support (at) hoopoe-cloud.com.

CUDA.NET 3.0.0 Released

Monday, June 21st, 2010

Dear all,

We are happy to announce the release of CUDA.NET version 3.0.0.
This release provides support for latest CUDA 3.0 API and few more updates that will make programming with CUDA from .NET easier and faster.

Additions:

  • Support for CUDA 3.0 API
  • Added memset functions for CUDA class
  • Supporting new graphics interoperability functions
  • Improved generics support for memory operations
  • Added CUDAContextSynchronizer class

Improved memory operations
We employ GCHandle class to be used with generic memory copies in CUDA class. This method allows to work with every data type (existing vectors or user defined) natively in .NET. The implication is that now you can copy existing custom arrays of structures/classes (user data-types) to device with memory copy functions.

CUDAContextSynchronizer
This class was added to assist developers in multi-GPU and multi-threaded environments sharing the same device. It uses existing CUDA API to manipulate the context each thread is attached to and provides .NET means to synchronize between threads sharing the same device for different computations.
Find it under the Tools namespace, the documentation includes a description of how to use it.

We hope you will enjoy this release.
As always, please send us comments or suggestions to: support@hoopoe-cloud.com.

Vicodeo™ – Accelerated Video Decoding Library

Tuesday, May 11th, 2010

Dear all,

We are glad to introduce a new library for video decoding, Vicodeo™, featuring accelerated performance for faster than real-time decoding of H.264, MPEG-2 (and more) video streams – in managed environments (.NET / Java).

Video processing nowadays has become a computing intensive task. Being able to accelerate decoding and various processing tasks, opens the door for many types of applications and usage of video in life, from: high-quality films, security/surveillance cameras, live events, video conversations over the web and much more.

Our library provides many capabilities beyond real-time (+) decoding of 1080p (Full HD) streams:

  • Codec support: H.264, MPEG-2, VC-1 and more
  • Color space conversion from YUV 4:2:0 to RGB (accelerated)
  • Integrated parser for elementary/transport streams and video packets
  • Simple integration with DirectX or OpenGL
  • Faster than real-time decoding for 1080p even on low-end platforms
  • Optional immediate decoding of frames, without buffering
  • And more!

For more information: Video Decoding

GECCO 2010 – GPU Competition

Friday, January 15th, 2010

Dear all,

GECCO (GPUs for Genetic and Evolutionary Computation Conference) will take part this year between July 7th-11th, at Portland, Oregon, USA.

Rules and competition guidelines are published on the website provided by the link below.
Registration is open until June 4th, 2010.

Link to the competition GECCO 2010.

Thanks to Dr. Simon Harding, Memorial University, Canada, for the notes and update.

OpenCL.NET 1.0.48 Released

Monday, December 7th, 2009

Hello,

We are happy to announce the availability of the so long waiting OpenCL.NET 1.0.48 library.

This version aligns with OpenCL 1.0.48 standard, and fully conforms with latest NVIDIA drivers for OpenCL (and as well on supported platforms).

In brief, this release of the standard added few API functions and modified some, to truly allow heterogeneous computing on a single system. An application can query for the existence of multiple computing devices on the system, also by different vendors (recognize the CPU and a GPU as compute resources) regardless of the vendor. Such that consuming different computing resources can be transparent.

For further details about standard features and changes please consult Khronos website.

For OpenCL.NET page and download, click here.

As always, you are invited to contact us at: support@hoopoe-cloud.com.

World Cloud Computing Summit 2009

Monday, November 30th, 2009

The 2nd annual cloud computing summit is about to take place in Shfayim, Israel, between December 2-3, 2009.

Following last year success, the event will cover recent developments and progress in cloud technologies. Presenting with top-of-the-line companies active in this field, including (partial list): Amazon, Google, eBay, IBM, HP, Sun, RedHat and more.

Additional “hands-on” labs and workshops are offered during the event for participants that would like to learn more about cloud technologies and integration possibilities.

We are also presenting Hoopoe at the summit, for GPU Cloud Computing, and providing a workshop on GPU Computing in general and Hoopoe as well.

This event ends 2009 and symbolically the last decade, marking cloud computing as a major development that we are about to see more and more in the next years.

You are invited to join us during the event.
Agenda
Registration

CUDA.NET 2.3.7 Released

Monday, October 26th, 2009

Dear all,

We would like to announce for the release of CUDA.NET 2.3.7.

This version addresses various issues with runtime API and types. The change was in data types and structures compliance with the native wrapper of CUDA Runtime API, to support cross-platform environments operating in 32 or 64 bit mode. The structures now support the SizeT structure we introduced in the previous CUDA.NET release.

Link to the download page.

Please send us your comments and feedback.

OpenCL.NET Released

Wednesday, September 16th, 2009

Hello everyone,

We are happy to announce the immediate availability of OpenCL.NET for the public.
This library provides a .NET implementation and wrapping of the OpenCL interface for GPU computing (and general computing as well).

Currently, the library supports revision 1.0.43 of Khronos (being the latest version of the standard).

Users may test the library with NVIDIA released drivers for OpenCL, or on other architectures as OpenCL should be supported on (Intel, AMD CPU etc.).

The API in this release was adapted to be cross platform in mind, and code, using the new SizeT construct for transparent handling of 32/64 bit platforms.

In addition, there is only one version of the library conforming to all operating systems who support OpenCL, regardless of Windows, Linux or Mac.

For any question, request, bug report or else, please contact us at: support@hoopoe-cloud.com.

We hope you will find this library useful.

SizeT – .NET and native code

Tuesday, September 15th, 2009

Hi,

In this post I wanted to introduce you with a new construct we added to the latest release of CUDA.NET (2.3.6) and will be available with the published OpenCL.NET library.

The problem

.NET is a very fixed environment, defining well known types, such that an int is always 4 bytes long (32 bit) and a long is always 8 bytes long (64 bit).

This is not the case with native code, for developers of C/C++. Writing a program in 32 bit environment, will always yield 32 bit types, unless using specific directives to get 64 bit variables. When writing 64 bit programs, they do get access to 64 bit wide variables as primitives supported by the compiler.

This clearly creates a portability problem for code and applications written in 32 and 64 bit environments.

Another example, is pointer size, where in C/C++ environments, under 32 bit the pointer is 4 bytes wide (int) and under 64 bit systems it is 8 bytes wide (long). The .NET environment (through different languages) provides a simple construct to overcome this problem, namely the IntPtr object, which some of you may be familiar with.

Now, coming back to our domain, the runtime API (also the driver in a new CUDA 2.3 function) and OpenCL makes extensive use of the C/C++ size_t data type. This data type ensures for developers that under different environments they will get the maximum width of the supported data type, unsigned int for 32 bit systems and unsigned long for 64 bit systems.

Possible options

By means of the interoprating library (wrapper), such as CUDA.NET, it creates a problem, since the API should provide several versions of the function, one given an uint (to map to 32 bit with unsigned int C/C++ type), and ulong (to map against unsigned long in 64 bit C/C++ systems). Supplying such an interface to the user will have to force him a specific behavior and system, since in .NET, the uint is always 32 bit wide, and ulong is 64 bit wide, no matter what.

Another option can be to provide a unique, standalone interface, using the IntPtr object, since .NET takes care to make it 32 bit wide in 32 bit systems and 64 bit wide for 64 bit systems, dynamically, without user intervention.

But using the IntPtr and a very serious downside, it is not dynamic, once it’s value is set, it cannot be changed through simple arithmetic operators, like +,-,*,/ or else.

The solution

Exactly for this purpose we created the SizeT object (structure). First, it maps to the same name as it’s native counterpart (size_t) and second it provides the dynamic mechanisms we want for working with 32 or 64 bit systems transparently.

SizeT can serve just like any other basic primitive in .NET.
For example:

SizeT temp = 15;
uint value = (uint)temp;
ulong value2 = (ulong)temp;
temp = value;

Internally, the SizeT wraps the IntPtr object to provide the same dynamic capabilities under 32 and 64 bit platforms.
It can host the required .NET primitives (int, uint, long, ulong), so when programming, one will make a good habit for using the SizeT instead of other data types (working with the runtime CUDA API).

For OpenCL the interface was built from the first place to use SizeT in mind, as the OpenCL API uses only size_t data types for cross platform functions.

Advanced data types with CUDA

Sunday, September 13th, 2009

Following with CUDA.NET 2.3.6 release, this article is meant to show you so of the more advanced constructs .NET can offer developers willing to get advanced interoperability with native code.
As most of you ar familiar, CUDA.NET offers to copy many types of arrays and data types to the GPU memory (through the different memcpy functions). These are based on well defined data types, mostly for numerical purposes.

Consider a basic data type of float, the corresponding array is declared as: float[], in C# or otherwise in different languages, but the principle is the same. In addition to these primitives (byte, short, int, long, float, double) there is also support for vector data types that CUDA support, such as Float2, where it is composed of 2 consequtive float elements.

What happens when you want to pass more complex data types that are not supported by CUDA.NET?

In this case, there are several techniques to achieve this goal, some maybe more complex to empploy than others, and it mostly depend on your expected usage.

1. Declaring a new copy function

Well, that’s always an option if you wish to extend the API of functions. In such case, the developer declares a new copy function to use, with expected parameters and consumes it.

The following example can show a little more:

// This is a dummy, complex data type
struct Test
{
public int value1;
public float value2;
}

// Define a new copy function to use with CUDA, assuming running under Linux
[DllImport("cuda")]
public static extern CUResult cuMemcpyHtoD(CUdeviceptr dst, Test[] src, uint bytes);

The definition above is for a function, to use, capable of copying data from an array of Test objects to device memory.
But, it may not always be convenient.

2. The dynamic, simpler way

Well, .NET offers one more possibility to convert .NET objects into native representation, without using “unsafe” mechanisms.

For this purpose, there is an object called “GCHandle” to use. This object provides an advanced control over the garbage collector of .NET to lock objects in memory and get their native pointer (IntPtr in .NET).

Since all copy functions in CUDA.NET support the IntPtr data type, one can use this mechanism as a generic way to copy data to the GPU. In practice, when a user calls one of the existing copy functions, the exact process is performed.

Again, consider the Test structure we created before.

// Getting native handle from an array
Test[] data = new Test[100];
// Fill in the array values...
GCHandle ptr = GCHandle.Alloc(data, GCHandleType.Pinned);
IntPtr src = ptr.AddrOfPinnedObject();
// Now copy to the GPU memory from this pointer...
....
// When finished, don't forget to free the GCHandle!
ptr.Free();

This is a simple process for exposing complex .NET data types to CUDA and CUDA.NET to be processed by the GPU.

In the next article we will present the new SizeT object we added for portability between 32 and 64 bit systems.