Tuesday, 18 December 2012

AMD Newsletter


AMD Embedded Solutions | December 2012
AMD Embedded APU Solutions Guide

Browse AMD APU-based boards and systems in this just-published guide!

» Learn More

AMD Embedded Solutions is hiring!

Check out AMD Embedded Solutions job postings on LinkedIn.

» Learn More


This Engineering TV video explores Stony Brook University's new reality deck.

» Watch Now






   
 

The new AMD Embedded R-Series APU: Delivering exceptional performance in a power efficient platform

On May 21st of this year, AMD launched the AMD Embedded R-Series platform. This new Accelerated Processing Unit (APU) delivers high-performance processing coupled with a premium high-definition visual experience in a solution that is power efficient and compact. Many board and system partners have announced feature-rich AMD R-Series APU-based products and solutions over the last few months.

» Learn more about the AMD R-Series APUs.
» Learn more about boards and systems based on these APUs.

The new AMD Embedded G-T16R APU: Providing an ultra low-power option in the AMD Embedded G-Series family

On June 25th of this year, AMD introduced the AMD Embedded G-T16R Accelerated Processing Unit (APU), which is targeted at very low power, small form factor and cost-sensitive embedded designs that require a combination of x86 compatibility and graphics. AMD also extended the planned availability for the entire AMD Embedded G-Series processor family through 2017.

» Learn more about the AMD G-Series APUs.
» Learn more about boards and systems based on these APUs.


December Partner Focus: congatec conga-TFS

The conga-TFS, a COM Express Type 6 module, offers superb integrated graphics performance along with an excellent performance-per-watt ratio for demanding applications, including gaming, digital signage, server, information appliance, communications, industrial controllers and medical.
  • AMD Embedded R-Series APU
  • AMD A70 controller hub
  • SODIMM, 16 GB, DDR3, 2x, 1066/800
  • 7x PCI Express™
  • 4x SATA
  • 4x USB 3.0, 4x USB 2.0
  • High performance DirectX® 11 GPU supports OpenCL™ 1.1 and OpenGL 4.2
» Learn More

News
» AMD Announces Global Distribution Agreement with Symmetry Electronics for Embedded Products

» AMD Paves Ease-of-Programming Path to Heterogeneous System Architecture with New APP SDK 2.8 and Unified Developer Tool Suite

» New AMD Opteron 4300 and 3300 Series Processors Deliver Ideal Performance, Power and Price for Cloud Applications

Upcoming Events
AMD Embedded Seminar Series at RTECC (when signing up, make sure to mention that AMD invited you!):
» Santa Clara, CA – 1/24
» Dallas, TX – 3/19
» Austin, TX – 3/21

Embedded World 2013
» Nuremberg, Germany - 2/26-2/28

If you wish to set up a meeting with an AMD Embedded Solutions representative at any of these events, please email us at
embedded@amd.com

Blogs
» AMD Embedded R-Series APUs First to Provide Native Support for up to Four Connected DVI Displays

» What’s Catching my Eye? (Digital Signage)


Useful Links
» Newsletter Registration
» AMD Embedded Home
» AMD Embedded Developer Support Site
» AMD-Based Embedded Product Catalog

Saturday, 15 December 2012

CUDA: WEEK IN REVIEW

CUDA SPOTLIGHT

GPU-Accelerated Motion Graphics
This week’s Spotlight is on Steve Forde of Adobe. Steve is responsible for Adobe’s visual effects product line, including Adobe After Effects in Creative Suite 6, which offers a new GPU-accelerated 3D ray-traced compositing workflow capability.

Read our interview with Steve Forde.

CUDA NEWS

GTC 2013 Registration Is Open
Registration is open for the GPU Technology Conference (GTC), March 18-21, 2013, San Jose, California. GTC 2013 will deliver valuable content for scientists and researchers, and is expanding to include additional areas where the GPU is central to innovation, such as computer graphics, cloud graphics, game development and mobile computing. Secure your spot today. Special 10% discount code for newsletter readers: GM10CD
Registration
Sessions and Tutorials
Travel and Hotels
Call for Posters

Try Tesla K20
Speed up your application with NVIDIA Tesla K20 GPU Accelerators. Built on the Kepler compute architecture, Tesla K20 offers innovative technologies like Dynamic Parallelism and Hyper-Q to boost performance and power efficiency.
Special offer: Purchase a K20 GPU Accelerator by Jan. 27 and receive a free GTC 2013 pass (US only).
GPU test drive: Take a free and easy test drive to see how Tesla K20 can accelerate your code.

Right Around the Corner…
Register today for these interesting events coming up in the New Year:

CUDA in Chicago
Jan. 29-Feb. 1, 2013, Chicago, Illinois
Four-day course by Acceleware
Designed for programmers looking to develop skills in writing and optimizing applications that fully leverage the multi-core processing capabilities of GPUs.

Titan Users and Developers Workshop (West Coast)
Jan. 29-31, 2013, Santa Clara, Calif.
Three-day workshop by Oak Ridge Leadership Computing Facility (OLCF)
An intense hands-on training on Titan, the world’s fastest supercomputer. Topics will cover everything from utilization of Oak Ridge resources to advanced GPGPU programming techniques.

Signal Processing & Communications Algorithms Using GPU Computing in MATLAB
Jan. 31, 2013
Webinar by MathWorks
This webinar will teach you how to leverage the computing power of GPUs to accelerate signal processing and communications applications in MATLAB, with minimal programming effort.

GPU THESIS WATCH

Title: All-Pairs Shortest Path Algorithms Using CUDA
Author: Jeremy M. Kemp, Durham University
Advisor: Professor Iain Stewart
Dept: School of Engineering & Computing Sciences

CUDA JOB OF THE WEEK

The Honda Research Institute USA seeks talented candidates to conduct research on vision-based driver assistance systems. Requirements include strong skills in C/C++ and CUDA. Contact fulltime@honda-ri (dot) com (with job #P11F05 in subject line).

FROM THE BLOGOSPHERE

New on the Parallel Forall Blog:
How to Overlap Data Transfers in CUDA Fortran, by Greg Ruetsch
How to Optimize Data Transfers in CUDA Fortran, by Greg Ruetsch
How to Optimize Data Transfers in CUDA C++, by Mark Harris
(Subscribe to the Parallel Forall RSS feed)
New on the NVIDIA blog:
How Gaming PCs Can Help In the Battle Against AIDS, by George Millington
GPU Startup Story: Fuzzy Logix Brings Clarity to Analytics, by Gary Rainville

GPU MEETUPS

Find a GPU Meetup in your location, or start one up. Upcoming meetings include:
Paris, Dec. 18
New York, Dec. 20
Paris, Jan. 15
Brisbane, Jan. 24
New York, Jan. 24
Silicon Valley, Jan. 28

CUDA CALENDAR

4-Day CUDA Course (Acceleware)
Jan. 29-Feb. 1, 2013, Chicago, Illinois
Instructor: Dr. Kelly Goss

Titan Users and Developers Workshop (West Coast)
Jan. 29-31, 2013, Santa Clara, Calif.
Hands-on training on Titan, the world’s fastest supercomputer

Signal Processing & Communications Algorithms Using GPU Computing (Webinar)
Jan. 31, 2013
Instructor: Kirthi Devleker, MathWorks

HPC Advisory Council Stanford Conference
Feb. 7-8, 2013, Stanford, Calif.
Open to the public

4-Day CUDA Course - Oil & Gas (Acceleware)
March 12-15, 2013, Houston, Texas
Instructor: Dr. Kelly Goss

GPU Tech Conference (GTC 2013)
March 18-21, 2013, San Jose, Calif.
Call for Posters
Developer Tutorials
Session Samples
(To list an event, email: cuda_week_in_review@nvidia.com)

CUDA RESOURCES

GPU-Accelerated Apps

List of 200+ popular GPU-accelerated scientific and research applications.

CUDA Documentation

The new CUDA documentation site includes release notes, programming guides, manuals and code samples.

NVIDIA Tesla K20 and K20X

NVIDIA Tesla K20 and K20X GPU Accelerators are now available.

CUDA Education

NEW Coursera Course
NEW Udacity Course
NEW Book: CUDA Programming, by Shane Cook

NVIDIA Developer Forums

The new NVIDIA developer forums are now live. Join the new online community to learn from other developers and share your experience.

CUDA Consulting

Training, programming, and project development services are available from CUDA consultants around the world. To be considered for inclusion on list, email: cuda_week_in_review@nvidia.com (with CUDA Consulting in subject line).

GPU Computing on Twitter

For daily updates about GPU computing and parallel programming, follow @gpucomputing on Twitter.

Downloads

CUDA 5
CUDA 5 survey
Nsight
CARMA

Thursday, 13 December 2012

OpenCL Specification Versions


OpenCL 1.2

The OpenCL 1.2 specification, released on November 15, 2011, provides enhanced performance and functionality in response to requests from the developer community while retaining backward compatibility with OpenCL 1.0 and 1.1. New features in OpenCL 1.2 include seamless sharing of media and surfaces with DirectX® 9 and 11, enhanced image support, custom devices and kernels, device partitioning, and separate compilation and linking of objects.

OpenCL 1.1

OpenCL 1.1 adds significant new functionality, including:
  • Host-thread safety, enabling OpenCL commands to be enqueued from multiple host threads;
  • Sub-buffer objects to distribute regions of a buffer across multiple OpenCL devices;
  • User events to enable enqueued OpenCL commands to wait on external events;
  • Event callbacks that can be used to enqueue new OpenCL commands based on event state changes in a non-blocking manner;
  • 3-component vector data types;
  • Global work-offsets, which enable kernels to operate on different portions of the NDRange;
  • Memory object destructor callback;
  • Read, write and copy a 1D, 2D or 3D rectangular region of a buffer object;
  • Mirrored repeat addressing mode and additional image formats;
  • New OpenCL C built-in functions such as integer clamp, shuffle and asynchronous strided copies;
  • Improved OpenGL interoperability through efficient sharing of images and buffers by linking OpenCL event objects to OpenGL fence sync objects;
  • Optional features in OpenCL 1.0 have been brought into core OpenCL 1.1, including writes to a pointer of bytes or shorts from a kernel, and atomic operations on 32-bit integers in local or global memory.
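
The global work-offset item in the list above can be illustrated with a small model. The following is a minimal Python sketch, not OpenCL host code: it splits a one-dimensional NDRange into per-device chunks and shows the IDs a kernel would see when each chunk is launched with its own offset. The helper names `split_ndrange` and `global_ids` are hypothetical and exist only for this illustration.

```python
def split_ndrange(global_size, num_devices):
    """Split a 1D NDRange into (offset, size) chunks, one per device.

    Hypothetical helper: any remainder is spread over the first chunks.
    """
    base, extra = divmod(global_size, num_devices)
    chunks = []
    offset = 0
    for device in range(num_devices):
        size = base + (1 if device < extra else 0)
        chunks.append((offset, size))
        offset += size
    return chunks


def global_ids(offset, size):
    """IDs a kernel would observe for one chunk.

    With a global work-offset, get_global_id(0) runs from
    offset to offset + size - 1 instead of starting at 0.
    """
    return list(range(offset, offset + size))


# Ten work-items shared among three devices.
chunks = split_ndrange(10, 3)
covered = [i for off, size in chunks for i in global_ids(off, size)]
print(chunks)
print(covered)
```

Because every chunk keeps its position in the original index space, the same kernel source can index one shared data layout without any per-device index arithmetic.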

OpenCL 1.0

OpenCL (Open Computing Language) is the first open, royalty-free standard for general-purpose parallel programming of heterogeneous systems. OpenCL provides a uniform programming environment for software developers to write efficient, portable code for high-performance compute servers, desktop computer systems and handheld devices using a diverse mix of multi-core CPUs, GPUs, Cell-type architectures and other parallel processors such as DSPs.

Monday, 10 December 2012

OpenCL Studio 2.0 released


OpenCL Studio integrates OpenCL and OpenGL into a single development environment for high performance computing. The feature-rich editor, interactive scripting language and extensible plug-in architecture support the rapid development of complex parallel algorithms and accompanying visualizations. Version 2.0 now conforms to the Lua plug-in architecture and closely integrates the open-source libCL parallel algorithm library. A complete version of OpenCL Studio is freely available for download at www.opencldev.com, including instructional videos and technology showcases.

New CLOGS library with sort and scan primitives for OpenCL

CLOGS is a library for higher-level operations on top of the OpenCL C++ API. It is designed to integrate with other OpenCL code, including synchronization using OpenCL events. Currently only two operations are supported: radix sorting and exclusive scan. Radix sort supports all the unsigned integral types as keys, and all the built-in scalar and vector types suitable for storage in buffers as values. Scan supports all the integral types. It also supports vector types, which allows for limited multi-scan capabilities.
Version 1.0 of the library has just been released. The home page is http://clogs.sourceforge.net/
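
For readers unfamiliar with the two primitives, here is a minimal sequential reference in Python. It is a sketch of what an exclusive scan and a key-value radix sort compute, not CLOGS code and not its API; the function names and the 4-bit digit width are assumptions made for the illustration. The sort uses the scan to turn per-digit counts into output positions, which is also how the two primitives typically relate in parallel implementations.

```python
def exclusive_scan(values):
    """Return the running sums of everything before each element."""
    out = []
    running = 0
    for v in values:
        out.append(running)
        running += v
    return out


def radix_sort_pairs(keys, values, key_bits=32, radix_bits=4):
    """Stable LSD radix sort of unsigned integer keys with attached values."""
    buckets = 1 << radix_bits
    mask = buckets - 1
    for shift in range(0, key_bits, radix_bits):
        # Histogram of the current digit.
        counts = [0] * buckets
        for k in keys:
            counts[(k >> shift) & mask] += 1
        # The exclusive scan of the histogram gives each bucket's start.
        starts = exclusive_scan(counts)
        new_keys = [0] * len(keys)
        new_values = [None] * len(values)
        # Scatter in input order so equal digits keep their relative order.
        for k, v in zip(keys, values):
            digit = (k >> shift) & mask
            pos = starts[digit]
            new_keys[pos] = k
            new_values[pos] = v
            starts[digit] += 1
        keys, values = new_keys, new_values
    return keys, values


scanned = exclusive_scan([3, 1, 4, 1, 5])
sorted_keys, sorted_values = radix_sort_pairs(
    [170, 45, 75, 90, 2, 802, 2, 66],
    ["a", "b", "c", "d", "e", "f", "g", "h"],
)
print(scanned)
print(sorted_keys, sorted_values)
```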

OpenCL SDK for new Intel Core Processors


The Intel® SDK for OpenCL Applications now supports the OpenCL 1.1 full-profile on 3rd generation Intel® Core™ processors with Intel® HD Graphics 4000/2500. For the first time, OpenCL developers using Intel® architecture can utilize compute resources across both Intel® Processor and Intel HD Graphics. More information: http://software.intel.com/en-us/articles/vcsource-tools-opencl-sdk

VexCL: Vector expression template library for OpenCL

VexCL is a vector expression template library for OpenCL developed by the Supercomputer Center of the Russian Academy of Sciences. It has been created to ease C++-based OpenCL development. Multi-device (and multi-platform) computations are supported. The code is publicly available under the MIT license.
Main features:
  • Selection and initialization of compute devices according to an extensible set of device filters.
  • Transparent allocation of device vectors spanning multiple devices.
  • Convenient notation for vector arithmetic, sparse matrix-vector multiplication, reductions. All computations are performed in parallel on all selected devices.
  • Appropriate kernels for vector expressions are generated automatically the first time an expression is used.
Doxygen-generated documentation is available at http://ddemidov.github.com/vexcl/index.html.


SnuCL – OpenCL heterogeneous cluster computing

SnuCL is an OpenCL framework and freely available, open-source software developed at Seoul National University. It naturally extends the original OpenCL semantics to the heterogeneous cluster environment. The target cluster consists of a single host node and multiple compute nodes. They are connected by an interconnection network, such as Gigabit and InfiniBand switches. The host node contains multiple CPU cores and each compute node consists of multiple CPU cores and multiple GPUs. For such clusters, SnuCL provides an illusion of a single heterogeneous system for the programmer. A GPU or a set of CPU cores becomes an OpenCL compute device. SnuCL allows the application to utilize compute devices in a compute node as if they were in the host node. Thus, with SnuCL, OpenCL applications written for a single heterogeneous system with multiple OpenCL compute devices can run on the cluster without any modifications. SnuCL achieves both high performance and ease of programming in a heterogeneous cluster environment.
SnuCL consists of a runtime and a compiler. The SnuCL compiler is based on the OpenCL C compiler in the SNU-SAMSUNG OpenCL framework. Currently, the SnuCL compiler supports x86, ARM, and PowerPC CPUs, AMD GPUs, and NVIDIA GPUs.

Virtual OpenCL (VCL) Cluster Platform 1.14 released

The MOSIX group announces the release of the Virtual OpenCL (VCL) cluster platform version 1.14. This version includes the SuperCL extension that allows micro OpenCL programs to run efficiently on devices of remote nodes. VCL provides an OpenCL platform in which all the cluster devices are seen as if they are located in the hosting-node. This platform benefits OpenCL applications that can use many devices concurrently. Applications written for VCL benefit from the reduced programming complexity of a single computer, the availability of shared-memory, multi-threads and lower granularity parallelism, as well as concurrent access to devices in many nodes. With SuperCL, a programmable sequence of kernels and/or memory operations can be sent to remote devices in cluster nodes, usually with just a single network round-trip. SuperCL also offers asynchronous communication with the host, to avoid the round-trip waiting time, as well as direct access to distributed file-systems. The VCL package can be downloaded from mosix.org.

CLU Runtime and Code Generator

The Computing Language Utility (CLU) is a lightweight API designed to help programmers explore, learn, and rapidly prototype programs with OpenCL. This API reduces the complexity associated with initializing OpenCL devices, contexts, kernels, parameters and so on, while preserving the ability to drop down to the lower-level OpenCL API at will when programmers want to get their hands dirty. The CLU release includes an open source implementation along with documentation and samples that demonstrate how to use CLU in real applications. It has been tested on Windows 7 with Visual Studio.

AMD CodeXL: comprehensive developer tool suite for heterogeneous compute

AMD CodeXL is a new unified developer tool suite that enables developers to harness the benefits of CPUs, GPUs and APUs. It includes powerful GPU debugging, comprehensive GPU and CPU profiling, and static OpenCL™ kernel analysis capabilities, enhancing accessibility for software developers to enter the era of heterogeneous computing. AMD CodeXL is available for free, both as a Visual Studio® extension and a standalone user interface application for Windows® and Linux®.

AMD CodeXL increases developer productivity by helping developers identify programming errors and performance issues in their applications quickly and easily. Developers can now debug, profile and analyze their applications with a full system-wide view on AMD APUs, GPUs and CPUs.
The AMD CodeXL user group (requires registration) allows users to interact with the CodeXL team, provide feedback, get support and participate in the beta surveys.

Webinar: Portability, Scalability, and Numerical Stability in Accelerated Kernels

Seeing speedups of an accelerated application is great, but what does it take to build a codebase that will last for years and across architectures? In this webinar, John Stratton will cover some of the insights gained at the University of Illinois at Urbana-Champaign from experience with computer architecture, programming languages, and application development.
The webinar will offer three main conclusions including:
  1. Performance portability should be more achievable than many people think.
  2. The number one performance-limiting factor now and in the future will be parallel scalability.
  3. As much as we care about performance, general libraries that will last have to be reliable as well as fast.
Register at http://www.gputechconf.com/page/gtc-express-webinar.html



GPU Computing: Past, Present and Future 

http://developer.download.nvidia.com/CUDA/training/GTC_Express_David_Luebke_June2011.pdf

CUDA 5 Production Release Now Available

The CUDA 5 Production Release is now available as a free download at www.nvidia.com/getcuda.

This powerful new version of the pervasive CUDA parallel computing platform and programming model can be used to accelerate more applications using the following four (and many more) new features.
• CUDA Dynamic Parallelism brings GPU acceleration to new algorithms by enabling GPU threads to directly launch CUDA kernels and call GPU libraries.
• A new device code linker enables developers to link external GPU code and build libraries of GPU functions.
• NVIDIA Nsight Eclipse Edition enables you to develop, debug and optimize CUDA code all in one IDE for Linux and Mac OS.
• GPUDirect Support for RDMA provides direct communication between GPUs in different cluster nodes.
As a demonstration of the power of Dynamic Parallelism and device code linking, CUDA 5 includes a device-callable version of the CUBLAS linear algebra library, so threads already running on the GPU can invoke CUBLAS functions on the GPU.


CUDA 5: Everything You Need to Know (PDF)

http://developer.download.nvidia.com/GTC/cuda5-everything-you-need-to-know.pdf

Webinar: Learn How GPU-Accelerated Applications Benefit Academic Research


GPUs have become a cornerstone of computational research in high performance computing, with over 200 commonly used applications already GPU-enabled. Researchers across many domains, such as Computational Chemistry, Biology, Weather & Climate, and Engineering, are using GPU-accelerated applications to greatly reduce time to discovery by achieving results that were simply not possible before.
Join Devang Sachdev, Sr. Product Manager, NVIDIA for an overview of the most popular applications used in academic research and an account of success stories enabled by GPUs. Learn also about a complimentary program which allows researchers to easily try GPU-accelerated applications on a remotely hosted cluster or Amazon AWS cloud.
Register at http://www.gputechconf.com/page/gtc-express-webinar.html.

OpenCL CodeBench Eclipse Code Creation Tools

OpenCL CodeBench is a code creation and productivity tools suite designed to accelerate and simplify OpenCL software development. OpenCL CodeBench provides developers with automation tools for host code and unit test bench generation. Kernel code development on OpenCL is accelerated and enhanced through a language-aware editor delivering advanced incremental code analysis features. Programmers new to OpenCL can choose to be guided through an Eclipse wizard, while power users can leverage the command-line interface with XML-based configuration files. OpenCL CodeBench Beta is now available for Linux and Windows operating systems.

Sixth Workshop on General Purpose Processing Using GPUs (GPGPU6)

The Sixth Workshop on General Purpose Processing Using GPUs (GPGPU6) is held in conjunction with ASPLOS XVIII, Houston, TX, March 17, 2013.
Overview: The goal of this workshop is to provide a forum to discuss new and emerging general-purpose programming environments and platforms, as well as evaluate applications that have been able to harness the horsepower provided by these platforms. This year’s workshop is particularly interested in new heterogeneous GPU platforms. Papers are being sought on many aspects of GPUs, including (but not limited to):
  • GPU applications
  • GPU programming environments
  • GPU architectures
  • Multi-GPU systems
  • GPU compilation
  • GPU power/efficiency
  • GPU benchmarking/measurements
  • Heterogeneous GPU platforms
Submission Information: Authors should submit their papers using the ACM SIG Proceedings format in double-column style using the directions on the conference website at http://www.ece.neu.edu/groups/nucar/GPGPU/GPGPU6. Submitted papers will be evaluated based on originality, significance to topics, technical soundness, and presentation quality. At least one author must register and attend GPGPU to present the work. Accepted papers will be included in preliminary proceedings and distributed at the event. All papers will be made available at the workshop and will also be published in the ACM Conference Proceedings Series.

CfP: High Performance Computing Symposium

The 21st High Performance Computing Symposium (HPC 2013) is devoted to the impact of high performance computing and communications on computer simulations. Advances in multicore and many-core architectures, networking, high end computers, large data stores, and middleware capabilities are ushering in a new era of high performance parallel and distributed simulations. Along with these new capabilities come new challenges in computing and system modeling. The goal of HPC 2013 is to encourage innovation in high performance computing and communication technologies and to promote synergistic advances in modeling methodologies and simulation. It will promote the exchange of ideas and information between universities, industry, and national laboratories about new developments in system modeling, high performance computing and communication, and scientific computing and simulation.
Topics of interest include:
  • High performance/large scale application case studies
  • GPU for general purpose computations (GPGPU)
  • Multicore and many-core computing
  • Power aware computing
  • Cloud, distributed, and grid computing
  • Asynchronous numerical methods and programming
  • Hybrid system modeling and simulation
  • Large scale visualization and data management
  • Tools and environments for coupling parallel codes
  • Parallel algorithms and architectures
  • High performance software tools
  • Resilience at the simulation level
  • Component technologies for high performance computing
Submissions Due: 12/21/2012

Final CfP: Third Workshop on Parallel Computing and Optimization, PCO’13, Boston, USA

The Third Workshop on Parallel Computing and Optimization (PCO13) is held in conjunction with the IEEE IPDPS symposium, Boston, USA, May 24, 2013. Paper submission deadline is January 4, 2013.
The workshop on Parallel Computing and Optimization aims at providing a forum for scientific researchers and engineers on recent advances in the field of parallel or distributed computing for difficult combinatorial optimization problems, like 0-1 multidimensional knapsack problems and cutting stock problems, large scale linear programming problems, nonlinear optimization problems and global optimization problems. Emphasis will be placed on new techniques for the solution of these difficult problems like cooperative methods for integer programming problems and polynomial optimization methods. Aspects related to Combinatorial Scientific Computing (CSC) will also be treated. Finally, the use of new approaches in parallel computing like GPU or hybrid computing, peer to peer computing and cloud computing will be considered. Application to planning, logistics, manufacturing, finance, telecommunications and computational biology will be considered.
Please refer to the workshop webpage at http://conf.laas.fr/PCO13 for more details, and for submission instructions.

Wednesday, 5 December 2012

AMD Gaming Evolved Newsletter

Far Cry 3 Now Released
You are Jason Brody, a tourist stranded on a tropical island chain lost in a bloody conflict between psychotic warlords and indigenous rebels. Fighting to escape this beautiful but dangerous paradise, you’ll have to confront who you really are. Developed and published by Ubisoft, Far Cry 3 invites you on a journey through insanity, in which you’ll discover what you’re really made of, if you even live that long….

» Learn More   



Never Settle Bundle
This year’s best games on the fastest GPUs! And with a value of up to $170 USD, the NEVER SETTLE bundle is the biggest game promotion in the history of graphics cards.

» Learn More  


AMD Desktop Gaming Center

Find out more about AMD's gaming technologies and see for yourself why AMD is the leader in gaming platforms. Only AMD gives you high-performance processing and industry-leading graphics solutions making it the obvious choice for PC gaming. Visit the Newegg Desktop Gaming Center to learn more.

» Learn More

CUDA Webinars

Following the introduction of the Tesla K20 at this year’s Super Computing conference, we already have some great feedback from developers; here are just a few quotes.

Tesla K20 GPU is 2.3x faster than Tesla M2070, and no change was required in our code! - Senocak, Associate Professor in Boise State Univ

The K20 test cluster was an excellent opportunity for us to run Turbostream. Right out of the box, we saw a 2x speed up. - G. Pullan, Lecturer, University of Cambridge

Tesla K20 is very impressive. Our application runs 20x faster compared to a Sandy Bridge CPU. - A.Tumeo & O.Villa, Scientists, PNNL

We invite you to join us for new webinars about CUDA5 and the Tesla K20. During these live Webinars you will be able to get answers to your questions directly from the presenters. So don’t miss out and register today.

Inside Kepler Tesla K20 Family - World's Fastest and Most Efficient Accelerators
Presented by Julia Levites, NVIDIA and Stephen Jones, NVIDIA
Thursday, Dec 13, 2012 10am (PST) – Register Now

Best Practices for Deploying and Managing GPU Clusters
Presented by Dale Southard, NVIDIA
Wednesday, Dec 12, 2012 10am (PST) – Register Now

An Unlikely Symbiosis: Gaming and Super Computing
Presented by Sarah Tariq, NVIDIA
Tuesday, Dec 11, 2012 10am (PST) – Register Now

Introducing Fully Enabled Debugging of CUDA 5 Applications with Allinea DDT
Presented by Ian Lumb, Allinea Technologies
Wednesday, Dec 5, 2012 10am (PST) – Register Now

Friday, 30 November 2012

CUDA: WEEK IN REVIEW

CUDA: WEEK IN REVIEW, a news summary for the worldwide CUDA, GPGPU and parallel programming community.
CUDA TECH TIP: Need to measure GPU execution time of CUDA kernels and API calls? The most efficient and accurate run-time method is to use CUDA events. Learn more in this Parallel Forall blog post.

CUDA SPOTLIGHT

GPU-Accelerated Visual Effects
This week’s Spotlight is on Vladimir "Vlado" Koylazov, co-founder and head of software development at Chaos Group, developers of the popular V-Ray and V-Ray RT rendering software for artists and designers. Vlado comments: "The increased speed and interactivity enabled by GPU computing allows our users to work more efficiently than ever before."

Read our interview with Vlado Koylazov.

CUDA NEWS

Top Video Picks
Check out these timely presentations from the GPU Technology Theater at SC12:

Guest Speakers
Buddy Bland, ORNL: Titan: ORNL’s New Computer System for Science (19 mins)
Travis Oliphant, Continuum Analytics: Compiling Python to the GPU with Numba (20 mins)
John Urbanic, Pittsburgh SC: Bringing Supercomputing to the Masses with OpenACC (23 mins)
Wen-Mei Hwu, Univ. of Illinois: Kepler GPUs in Blue Waters (28 mins)

NVIDIA Speakers
Don Becker: CARMA: Developments in Power Efficient Computing (20 mins)
Bill Dally: The Road to Exascale (22 mins)
Mark Harris: New Features in CUDA 5 (26 mins)
Mark Ebersole: Intro to CUDA C/C++ (28 mins)
Ian Buck: CUDA: Past, Present and Future (30 mins)
Stephen Jones: Inside the Kepler Architecture (32 mins)

CUDA Documentation
Based on your feedback, NVIDIA has launched a brand new CUDA documentation site. It includes release notes, programming guides, manuals and code samples.

GPU THESIS WATCH

Title: Feasibility Study of the ‘Parareal’ Algorithm
Author: Allan S. Nielsen, Technical University of Denmark
Advisor: Dr. Allan P. Engsig Karup and Dr. Jan S. Hesthaven
Lab: GPUlab, DTU Informatics

CUDA JOB OF THE WEEK

NVIDIA is seeking talented CUDA Library Software Engineers to develop performance application libraries and benchmarks for next generation GPUs. These include CUFFT, CURAND and other numerical libraries.

FROM THE BLOGOSPHERE

New on the Parallel Forall Blog:
Thinking Parallel, Part II: Tree Traversal on the GPU, by Tero Karras
How to Query Device Properties and Handle Errors in CUDA C/C++, by Mark Harris
How to Query Device Properties and Handle Errors in CUDA Fortran, by Greg Ruetsch

GPU MEETUPS

Find a GPU Meetup in your location, or start one up. Upcoming meetings include:
New York, Nov. 29
Silicon Valley, Dec. 3
Perth, Dec. 5
Brisbane, Dec. 6
Boston, Dec. 14
Paris, Dec. 18

CUDA CALENDAR

Parallel Computing with GPUs and CUDA for Finance (NVIDIA)
Nov. 29, 2012, 5:30 pm, Baruch College, New York, New York
Note: An Introduction for Financial Services Developers

Parallel Computing Course (SagivTech)
Dec. 2-5, 2012, Ramat Gan, Israel

Parallel Computing with GPUs and CUDA for Finance (NVIDIA)
Dec. 3, 2012, 5:30 pm, Microsoft, London, UK
Note: An Introduction for Financial Services Developers

GPUs in the Cloud
Dec. 3-6, 2012, Taipei, Taiwan

4-Day CUDA Course, with Finance Focus (Acceleware)
Dec. 4-7, 2012, New York, New York
Instructor: Dr. Kelly Goss, Acceleware

Many-Core Developer Conference (UKMAC 2012)
Dec. 5, 2012, University of Bristol, UK

CUDA and OpenACC (HPC@LR)
Dec. 6, 2012, Montpellier, France
Note: HPC@LR is the HPC competency center for Languedoc-Roussillon

Debugging of CUDA 5 Apps with Allinea DDT (Webinar)
Dec. 5, 2012, 10:00 am pacific
By Ian Lumb, Allinea

An Unlikely Symbiosis: Gaming and Supercomputing (Webinar)
Dec. 11, 2012, 10:00 am pacific
By Sarah Tariq, NVIDIA

Best Practices for Deploying and Managing GPU Clusters (Webinar)
Dec. 12, 2012, 10:00 am pacific
By Dale Southard, NVIDIA

Getting Started with ArrayFire: 30-Minute Jump Start (Webinar)
Dec. 13, 2012, noon pacific
Sponsored by AccelerEyes

2013

Understanding Parallel Graph Algorithms (Webinar)
Jan. 10, 2013, 9:00 am pacific
By Duane Merrill and Michael Garland, NVIDIA

GPU Tech Conference (GTC 2013)
March 18-21, 2013, San Jose, Calif.
Call for Posters
Developer Tutorials
Session Samples
(To list an event, email: cuda_week_in_review@nvidia.com)

CUDA RESOURCES

NVIDIA Tesla K20 and K20X

NVIDIA Tesla K20 and K20X GPU Accelerators are now available.

GPU-Accelerated Apps

List of 200+ popular GPU-accelerated scientific and research applications.

CUDA Education

NEW Coursera Course
NEW Udacity Course
NEW Book: CUDA Programming, by Shane Cook

NVIDIA Developer Forums

The new NVIDIA developer forums are now live. Join the new online community to learn from other developers and share your experience.

CUDA Consulting

Training, programming, and project development services are available from CUDA consultants around the world. To be considered for inclusion on list, email: cuda_week_in_review@nvidia.com (with CUDA Consulting in subject line).

GPU Computing on Twitter

For daily updates about GPU computing and parallel programming, follow @gpucomputing on Twitter.

Downloads

CUDA 5
CUDA 5 survey
Nsight
CARMA

CUDA on the Web

CUDA Spotlights
CUDA Newsletters
CUDA Zone
GPU Test Drive
GPUComputing.net
GPGPU.org