Performance Analysis of the Multi-pass Transformation for Complex 3d-Stencils on GPUs

  1. S. Tabik
  2. L.F. Romero
  3. E.L. Zapata
Book:
Actas de las XXIV Jornadas de Paralelismo
  1. Guillermo Botella (coord.)
  2. Alberto A. Del Barrio (coord.)

Publisher: Limencop S.L.

ISBN: 978-84-695-8330-2

Year of publication: 2013

Pages: 422-427

Conference: Jornadas de Paralelismo (24. 2013. Madrid)

Type: Conference contribution

Abstract

Complex iterative 3D stencils based on a series of multiple simpler stencils with different computation intensities cannot be handled properly using standard techniques on the GPU. This work demonstrates that decomposing these kinds of stencils into a sequence of up to a specific number of simpler stencils and further optimizing each individual kernel provides the best overall performance. We focus on the family of PDE-based denoising methods, which can be reformulated as a sequence of multiple stencil-based tasks. The performance results and analysis show that there exists an optimal level of splitting-coalescence of these stencil-based tasks that reaches the best compromise between better use of fast memories and higher concurrency.
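
The split-versus-fused trade-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes a 1D nonlinear diffusion step with a Perona-Malik style flux as a stand-in for a PDE-based denoising method, and the names `flux`, `step_multipass`, and `step_fused` are hypothetical. The multi-pass form stores an intermediate array between two simple stencils; the fused form avoids that array but recomputes interior fluxes at neighbouring points.

```python
def flux(a, b):
    """Assumed example flux between two neighbouring samples (Perona-Malik style)."""
    d = b - a
    return d / (1.0 + d * d)


def step_multipass(u, dt=0.2):
    """One diffusion step as two simple stencils with a stored intermediate."""
    n = len(u)
    # Pass 1: flux across each interface i -> i+1, written to an intermediate array.
    f = [flux(u[i], u[i + 1]) for i in range(n - 1)]
    # Pass 2: divergence of the stored flux; the two end points are held fixed.
    out = list(u)
    for i in range(1, n - 1):
        out[i] = u[i] + dt * (f[i] - f[i - 1])
    return out


def step_fused(u, dt=0.2):
    """The same step as a single stencil with no intermediate array."""
    n = len(u)
    out = list(u)
    for i in range(1, n - 1):
        # Each interior flux is evaluated again by the neighbouring point.
        out[i] = u[i] + dt * (flux(u[i], u[i + 1]) - flux(u[i - 1], u[i]))
    return out


signal = [0.0, 0.1, 0.9, 1.0, 0.2, 0.0]
multipass_result = step_multipass(signal)
fused_result = step_fused(signal)
```

Both functions produce the same values; they differ only in how much intermediate data is written to memory and how much arithmetic is repeated, which is the balance the abstract refers to as the level of splitting-coalescence.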