The Fourth International Workshop on Coarse-Grained Reconfigurable Architectures
for High-Performance Computing and AI

CGRA4HPCA 2026 is co-located with IPDPS 2026 in New Orleans, USA — May 25th, 2026.

Introduction

With the end of Dennard scaling and the impending termination of Moore's law, researchers are actively searching for alternative forms of computing to continue providing better, faster, and less power-hungry systems. Today, several potential architectures are emerging to fill this widening void, including quantum and neuromorphic computers. However, out of the many proposed architectures, perhaps none is as salient an alternative as Coarse-Grained Reconfigurable Architectures/Arrays (CGRAs).

CGRAs belong to the programmable logic device family of architectures, providing reconfigurable Arithmetic Logic Units (ALUs) and a highly specialized yet versatile data path. This "coarsening" of reconfiguration allows CGRAs to achieve a significant reduction in power consumption and an increase in operating frequency compared to FPGAs, while avoiding the expensive von Neumann overhead that traditional CPUs suffer from. In short, CGRAs strike a balance between the reconfigurability of FPGAs and the performance of CPUs, with power-consumption characteristics closer to those of custom ASICs.

CGRAs have a long research lineage dating back 25 years, and have recently garnered renewed interest in High-Performance Computing. Today, we see an explosion in the number of custom-built AI accelerators, many of which are CGRAs, such as those built by SambaNova or Cerebras. HPC centers are already including these CGRAs in their testbeds (e.g., Cerebras-1 at ORNL or EPCC).

This workshop provides a focused interdisciplinary forum for CGRA hardware researchers and HPC/distributed computing researchers from academia and industry to discuss state-of-the-art CGRA research for use in emerging HPC systems and Artificial Intelligence (AI).

Important Dates

February 1, 2026: Paper submission deadline
March 6, 2026: Camera-ready due
May 25, 2026: Workshop day (IPDPS 2026, New Orleans, USA)

Register at IPDPS →

Invited Speakers

Nachiket Kapre — University of Waterloo

AI Code Generation for Tenstorrent Silicon

Effectively programming CGRA-like AI accelerators demands deep expertise in compute organization, memory hierarchy, and data movement—a fundamentally different discipline from conventional multi-threaded software development. To meet the pace of a rapidly evolving model landscape and to deliver cost-competitive, high-performance solutions, leading silicon providers have relied heavily on manual kernel authoring and hand-tuning. This approach, while effective, creates significant engineering bottlenecks: long debug cycles, opaque performance traces, and delayed customer delivery.

The emergence of agentic coding frameworks opens a new frontier for offloading portions of the code-to-deployment pipeline to AI agents. However, naively treating these agents as drop-in compilers misses their true potential. The durable path forward is to ground agentic flows in the underlying mathematical structure of the problem: a formulation that generalizes across problem classes, minimizes token overhead, and produces correct-by-construction outputs.

We present results from deploying this principled agentic approach across several domains: automatic generation of elementwise and fused reduction kernels along with NoC-optimized data movement operators, compilation of Hugging Face models with pattern-matched dispatch to hand-optimized kernels, a sparse graph accelerator overlay, and acceleration of several open-source EDA tools. In each case, agentic flows matched or exceeded the performance of internal tooling, and in several instances produced viable solutions where none previously existed. We discuss the lessons learned, the boundaries of what is tractable, and the architectural principles that made these results possible.

Workshop Program

CGRA4HPCA 2026 will be held in conjunction with IPDPS 2026 in New Orleans, USA, on May 25th, 2026.

1:30pm – 1:40pm Opening remarks
1:40pm – 2:10pm Keynote 1: AI Code Generation for Tenstorrent Silicon
Nachiket Kapre, University of Waterloo
2:10pm – 2:30pm Paper 1: Control-Flow Execution on CGRAs: A Comprehensive Survey of Architectural and Compilation Techniques
Hisako Ito, Takuya Kojima, Hideki Takase, and Hiroshi Nakamura
2:30pm – 3:00pm Keynote 2: To be announced
3:00pm – 3:15pm Coffee break
3:15pm – 3:35pm Paper 2: FlowSpec: A Flexible and Scalable Simulation Framework for Coarse-Grained Spatial Architectures
YoungNo Kim, Hyeonseo Kim, Eunseok Cho, San htet Aung, and Jongeun Lee
3:35pm – 4:05pm Keynote 3: To be announced
4:05pm – 4:25pm Paper 3: Predication in Elastic CGRAs
Omkar Bhilare, Omar Ragheb, Boma Adhi, Kentaro Sano, Jason Anderson, and Tomohiro Ueno
4:25pm – 4:45pm Paper 4: Compiler-Based Performance Results for Regular Application Kernels on the HPC CGRA HiPReP
Markus Weinhardt
4:45pm – 5:05pm Paper 5: bitSMM: A bit-Serial Matrix Multiplication Accelerator
Pedro Antunes and Artur Podobas
5:05pm – 5:10pm Concluding remarks

Call for Papers

The call for papers is available to download here.

Topics of Interest

Topics include (but are not limited to):

Paper Submission

We welcome full-length research papers on the topics of interest described above. Contributions should be unpublished and not under consideration at other venues.

We also welcome presentations on new and emerging CGRA technologies from industry and startups. Contact the organizers if you are interested in participating.

Submit Your Paper →

Organization

Organizers

Program Committee