CGRA4HPCA 2026 is co-located with IPDPS 2026 in New Orleans, USA — May 25th, 2026.
Introduction
With the end of Dennard scaling and the impending end of Moore's law, researchers are actively searching for alternative forms of computing to continue delivering better, faster, and less power-hungry systems. Today, several candidate architectures are emerging to fill this widening void, including quantum and neuromorphic computers. However, among the many proposed architectures, perhaps none is as salient an alternative as Coarse-Grained Reconfigurable Architectures/Arrays (CGRAs).
CGRAs belong to the programmable logic device family of architectures, providing reconfigurable Arithmetic Logic Units (ALUs) and a highly specialized yet versatile data path. This "coarsening" of reconfiguration allows CGRAs to achieve a significant reduction in power consumption and increase in operating frequency compared to FPGAs, while overcoming the expensive von Neumann overhead that traditional CPUs suffer from. In short, CGRAs strike a balance between the reconfigurability of FPGAs and the performance of CPUs, with power-consumption characteristics closer to custom ASICs.
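To make the "coarsening" concrete, the sketch below is a toy model (not any vendor's actual configuration interface): each processing element (PE) is configured with one word-level ALU operation, and a static route wires PE outputs to PE inputs, so reconfiguration happens at the granularity of whole operations rather than individual LUT bits as in an FPGA. All names (`PE`, `run_fabric`, the route format) are invented for illustration.

```python
# Toy CGRA model: coarse-grained configuration selects a whole ALU
# operation per PE; a static route describes the data path between PEs.
import operator

ALU_OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

class PE:
    """One processing element, configured with a single word-level ALU op."""
    def __init__(self, op):
        self.op = ALU_OPS[op]
    def fire(self, a, b):
        return self.op(a, b)

def run_fabric(config, routes, inputs):
    """Evaluate a statically routed dataflow graph.

    config: {pe_name: op_name}
    routes: {pe_name: (src_a, src_b)}, where each source is an input
            name or an earlier PE's name (listed in topological order).
    """
    values = dict(inputs)
    pes = {name: PE(op) for name, op in config.items()}
    for name, (src_a, src_b) in routes.items():
        values[name] = pes[name].fire(values[src_a], values[src_b])
    return values

# Map (a + b) * (c - d) onto three PEs.
out = run_fabric(
    config={"pe0": "add", "pe1": "sub", "pe2": "mul"},
    routes={"pe0": ("a", "b"), "pe1": ("c", "d"), "pe2": ("pe0", "pe1")},
    inputs={"a": 3, "b": 4, "c": 10, "d": 6},
)
print(out["pe2"])  # (3 + 4) * (10 - 6) = 28
```

Because the configuration word per PE is tiny compared to an FPGA bitstream for the same arithmetic, a real CGRA can reconfigure quickly and clock the fixed-function ALUs at a higher frequency; this sketch only illustrates the programming model, not those physical effects.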
CGRAs have a long research lineage dating back 25 years, but have recently garnered renewed interest in High-Performance Computing (HPC). Today, we see an explosion in the number of custom-built AI accelerators — many of which are CGRAs, such as those built by SambaNova or Cerebras. HPC centers are already including these CGRAs in their testbeds (e.g., Cerebras-1 at ORNL or EPCC).
This workshop provides a focused interdisciplinary forum for CGRA hardware researchers and HPC/distributed computing researchers from academia or industry to discuss state-of-the-art CGRA research for use in emerging HPC systems and Artificial Intelligence (AI).
Important Dates
| February 1, 2026 | Paper submission deadline |
| March 6, 2026 | Camera-ready due |
| May 25, 2026 | Workshop day — IPDPS 2026, New Orleans, USA |
Invited Speakers
Nachiket Kapre — University of Waterloo
AI Code Generation for Tenstorrent Silicon
Effectively programming CGRA-like AI accelerators demands deep expertise in compute organization, memory hierarchy, and data movement — a fundamentally different discipline from conventional multi-threaded software development. To meet the pace of a rapidly evolving model landscape and to deliver cost-competitive, high-performance solutions, leading silicon providers have relied heavily on manual kernel authoring and hand-tuning. This approach, while effective, creates significant engineering bottlenecks: long debug cycles, opaque performance traces, and delayed customer delivery.
The emergence of agentic coding frameworks opens a new frontier for offloading portions of the code-to-deployment pipeline to AI agents. However, naively treating these agents as drop-in compilers misses their true potential. The durable path forward is to ground agentic flows in the underlying mathematical structure of the problem: a formulation that generalizes across problem classes, minimizes token overhead, and produces correct-by-construction outputs.
We present results from deploying this principled agentic approach across several domains: automatic generation of elementwise and fused reduction kernels along with NoC-optimized data movement operators, compilation of Hugging Face models with pattern-matched dispatch to hand-optimized kernels, a sparse graph accelerator overlay, and EDA acceleration of open-source tools. In each case, agentic flows matched or exceeded the performance of internal tooling — and in several instances produced viable solutions where none previously existed. We discuss the lessons learned, the boundaries of what is tractable, and the architectural principles that made these results possible.
Workshop Program
CGRA4HPCA 2026 will be held in conjunction with IPDPS 2026 in New Orleans, USA, on May 25th, 2026.
| 1:30pm – 1:40pm | Opening remarks |
| 1:40pm – 2:10pm | Keynote 1: AI Code Generation for Tenstorrent Silicon Nachiket Kapre, University of Waterloo |
| 2:10pm – 2:30pm | Paper 1: Control-Flow Execution on CGRAs: A Comprehensive Survey of Architectural and Compilation Techniques Hisako Ito, Takuya Kojima, Hideki Takase, and Hiroshi Nakamura |
| 2:30pm – 3:00pm | Keynote 2: To be announced |
| 3:00pm – 3:15pm | Coffee break |
| 3:15pm – 3:35pm | Paper 2: FlowSpec: A Flexible and Scalable Simulation Framework for Coarse-Grained Spatial Architectures YoungNo Kim, Hyeonseo Kim, Eunseok Cho, San Htet Aung, and Jongeun Lee |
| 3:35pm – 4:05pm | Keynote 3: To be announced |
| 4:05pm – 4:25pm | Paper 3: Predication in Elastic CGRAs Omkar Bhilare, Omar Ragheb, Boma Adhi, Kentaro Sano, Jason Anderson, and Tomohiro Ueno |
| 4:25pm – 4:45pm | Paper 4: Compiler-Based Performance Results for Regular Application Kernels on the HPC CGRA HiPReP Markus Weinhardt |
| 4:45pm – 5:05pm | Paper 5: bitSMM: A bit-Serial Matrix Multiplication Accelerator Pedro Antunes and Artur Podobas |
| 5:05pm – 5:10pm | Concluding remarks |
Call for Papers
The call for papers is available to download here.
Topics of Interest
Topics include (but are not limited to):
- Novel high-performance CGRA architectures for HPC and AI, including energy-efficient architectures (asynchronous/clockless CGRAs, power optimizations, etc.)
- Parallel programming language support for CGRA architectures (e.g., OpenMP or CUDA/HIP for CGRAs)
- Compilation strategies, algorithms, and methods for mapping computations onto CGRAs
- Smart middleware and runtime systems for CGRAs, including multi-CGRA systems for HPC and AI
- Experience porting scientific kernels and applications to state-of-the-art CGRAs (weather/climate codes, CFD, MD, etc.)
- Use of CGRA frameworks (e.g., CGRA-ME and OpenCGRA) to generate and customize architectures
- Software-programmable CGRAs (e.g., Xilinx Versal ACAP)
- Processors with a tightly interconnected CGRA subsystem
- Machine Learning applications and case studies, including performance and power-efficiency comparisons between CPUs/GPUs and CGRAs
- Combination of CGRAs with other emerging post-Moore models (e.g., neuromorphic systems)
- New emerging CGRA-like architectures for Generative AI
- Case studies and evaluations of CGRAs for Generative AI
Paper Submission
We welcome full-length research papers on the topics of interest described above. Contributions should be unpublished and not under consideration at other venues.
- Maximum 8 single-spaced pages
- Double-column, 10-point font, 8.5×11 inch pages (IEEE conference style)
- Single-blind review process
- Accepted papers will be included in the workshop proceedings and submitted for inclusion in the IEEE Xplore Digital Library
We also welcome presentations on new and emerging CGRA technologies from industry and startups. Contact the organizers if you are interested in participating.
Organization
Organizers
- Artur Podobas — KTH, Sweden
- Kentaro Sano — RIKEN, Japan
- Jason Anderson — University of Toronto, Canada
- Tomohiro Ueno — RIKEN, Japan
Program Committee
- Boma Anantasatya Adhi — RIKEN
- Cheng Tan — Google / ASU
- Jens Domke — RIKEN CCS
- Lingli Wang — Fudan University
- Markus Weinhardt — HS Osnabrück
- Takuya Kojima — University of Tokyo
- Georgi Gaydadjiev — TU Delft
- Christian Hochberger — TU Darmstadt
- Omar Ragheb — Fujitsu