CGRA4HPCA 2026 is co-located with IPDPS 2026 in New Orleans, USA — May 25th, 2026.
Introduction
With the end of Dennard scaling and the impending end of Moore's law, researchers are actively searching for alternative forms of computing to continue delivering better, faster, and less power-hungry systems. Today, several candidate architectures are emerging to fill this widening void, including quantum and neuromorphic computers. However, among the many proposed architectures, perhaps none is as salient an alternative as Coarse-Grained Reconfigurable Architectures/Arrays (CGRAs).
CGRAs belong to the programmable logic device family of architectures, providing reconfigurable Arithmetic Logic Units (ALUs) and a highly specialized yet versatile data path. This "coarsening" of reconfiguration allows CGRAs to achieve a significant reduction in power consumption and increase in operating frequency compared to FPGAs, while overcoming the expensive von Neumann overhead that traditional CPUs suffer from. In short, CGRAs strike a balance between the reconfigurability of FPGAs and the performance of CPUs, with power-consumption characteristics closer to custom ASICs.
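To make the "coarsening" concrete, the sketch below is a toy model (not any vendor's actual configuration interface): each processing element (PE) is configured with one word-level ALU operation, and a static route wires PE outputs to PE inputs, so reconfiguration happens at the granularity of whole operations rather than individual LUT bits as in an FPGA. All names (`PE`, `run_fabric`, the route format) are invented for illustration.

```python
# Toy CGRA model: coarse-grained configuration selects a whole ALU
# operation per PE; a static route describes the data path between PEs.
import operator

ALU_OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

class PE:
    """One processing element, configured with a single word-level ALU op."""
    def __init__(self, op):
        self.op = ALU_OPS[op]
    def fire(self, a, b):
        return self.op(a, b)

def run_fabric(config, routes, inputs):
    """Evaluate a statically routed dataflow graph.

    config: {pe_name: op_name}
    routes: {pe_name: (src_a, src_b)}, where each source is an input
            name or an earlier PE's name (listed in topological order).
    """
    values = dict(inputs)
    pes = {name: PE(op) for name, op in config.items()}
    for name, (src_a, src_b) in routes.items():
        values[name] = pes[name].fire(values[src_a], values[src_b])
    return values

# Map (a + b) * (c - d) onto three PEs.
out = run_fabric(
    config={"pe0": "add", "pe1": "sub", "pe2": "mul"},
    routes={"pe0": ("a", "b"), "pe1": ("c", "d"), "pe2": ("pe0", "pe1")},
    inputs={"a": 3, "b": 4, "c": 10, "d": 6},
)
print(out["pe2"])  # (3 + 4) * (10 - 6) = 28
```

Because the configuration word per PE is tiny compared to an FPGA bitstream for the same arithmetic, a real CGRA can reconfigure quickly and clock the fixed-function ALUs at a higher frequency; this sketch only illustrates the programming model, not those physical effects.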
CGRAs have a long research lineage dating back 25 years, but have recently garnered renewed interest in High-Performance Computing (HPC). Today, we see an explosion in the number of custom-built AI accelerators — many of which are CGRAs, such as those built by SambaNova or Cerebras. HPC centers are already including these CGRAs in their testbeds (e.g., Cerebras-1 at ORNL or EPCC).
This workshop provides a focused interdisciplinary forum for CGRA hardware researchers and HPC/distributed computing researchers from academia or industry to discuss state-of-the-art CGRA research for use in emerging HPC systems and Artificial Intelligence (AI).
Important Dates
| February 1, 2026 | Paper submission deadline |
| March 6, 2026 | Camera-ready due |
| May 25, 2026 | Workshop day — IPDPS 2026, New Orleans, USA |
Invited Speakers
Nachiket Kapre — University of Waterloo
AI Code Generation for Tenstorrent Silicon
Effectively programming CGRA-like AI accelerators demands deep expertise in compute organization, memory hierarchy, and data movement — a fundamentally different discipline from conventional multi-threaded software development. To meet the pace of a rapidly evolving model landscape and to deliver cost-competitive, high-performance solutions, leading silicon providers have relied heavily on manual kernel authoring and hand-tuning. This approach, while effective, creates significant engineering bottlenecks: long debug cycles, opaque performance traces, and delayed customer delivery.
The emergence of agentic coding frameworks opens a new frontier for offloading portions of the code-to-deployment pipeline to AI agents. However, naively treating these agents as drop-in compilers misses their true potential. The durable path forward is to ground agentic flows in the underlying mathematical structure of the problem: a formulation that generalizes across problem classes, minimizes token overhead, and produces correct-by-construction outputs.
We present results from deploying this principled agentic approach across several domains: automatic generation of elementwise and fused reduction kernels along with NoC-optimized data movement operators, compilation of Hugging Face models with pattern-matched dispatch to hand-optimized kernels, a sparse graph accelerator overlay, and EDA acceleration of open-source tools. In each case, agentic flows matched or exceeded the performance of internal tooling — and in several instances produced viable solutions where none previously existed. We discuss the lessons learned, the boundaries of what is tractable, and the architectural principles that made these results possible.
Workshop Program
CGRA4HPCA 2026 will be held in conjunction with IPDPS 2026 in New Orleans, USA, on May 25th, 2026.
| 1:30pm – 1:40pm | Opening remarks |
| 1:40pm – 2:10pm | Keynote 1: AI Code Generation for Tenstorrent Silicon Nachiket Kapre, University of Waterloo |
| 2:10pm – 2:30pm | Paper 1: Control-Flow Execution on CGRAs: A Comprehensive Survey of Architectural and Compilation Techniques Hisako Ito, Takuya Kojima, Hideki Takase, and Hiroshi Nakamura |
| 2:30pm – 3:00pm | Keynote 2: To be announced |
| 3:00pm – 3:15pm | Coffee break |
| 3:15pm – 3:35pm | Paper 2: FlowSpec: A Flexible and Scalable Simulation Framework for Coarse-Grained Spatial Architectures YoungNo Kim, Hyeonseo Kim, Eunseok Cho, San Htet Aung, and Jongeun Lee |
| 3:35pm – 4:05pm | Keynote 3: To be announced |
| 4:05pm – 4:25pm | Paper 3: Predication in Elastic CGRAs Omkar Bhilare, Omar Ragheb, Boma Adhi, Kentaro Sano, Jason Anderson, and Tomohiro Ueno |
| 4:25pm – 4:45pm | Paper 4: Compiler-Based Performance Results for Regular Application Kernels on the HPC CGRA HiPReP Markus Weinhardt |
| 4:45pm – 5:05pm | Paper 5: bitSMM: A bit-Serial Matrix Multiplication Accelerator Pedro Antunes and Artur Podobas |
| 5:05pm – 5:10pm | Concluding remarks |
Call for Papers
The call for papers is available to download here.
Topics of Interest
Topics include (but are not limited to):
- Novel high-performance CGRA architectures for HPC and AI, including energy-efficient architectures (asynchronous/clockless CGRAs, power optimizations, etc.)
- Parallel programming language support for CGRA architectures (e.g., OpenMP or CUDA/HIP for CGRAs)
- Compilation strategies, algorithms, and methods for mapping computations onto CGRAs
- Smart middleware and runtime systems for CGRAs, including multi-CGRA systems for HPC and AI
- Experience porting scientific kernels and applications to state-of-the-art CGRAs (weather/climate codes, CFD, MD, etc.)
- Use of CGRA frameworks (e.g., CGRA-ME and OpenCGRA) to generate and customize architectures
- Software-programmable CGRAs (e.g., Xilinx Versal ACAP)
- Processors with a tightly interconnected CGRA subsystem
- Machine Learning applications and case studies, including performance and power-efficiency comparisons between CPUs/GPUs and CGRAs
- Combination of CGRAs with other emerging post-Moore models (e.g., neuromorphic systems)
- New emerging CGRA-like architectures for Generative AI
- Case studies and evaluations of CGRAs for Generative AI
Paper Submission
We welcome full-length research papers on the topics of interest described above. Contributions should be unpublished and not under consideration at other venues.
- Maximum 8 single-spaced pages
- Double-column, 10-point font, 8.5×11 inch pages (IEEE conference style)
- Single-blind review process
- Accepted papers will be included in the workshop proceedings and submitted for inclusion in the IEEE Xplore Digital Library
We also welcome presentations on new and emerging CGRA technologies from industry and startups. Contact the organizers if you are interested in participating.
Organization
Organizers
- Artur Podobas — KTH, Sweden
- Kentaro Sano — RIKEN, Japan
- Jason Anderson — University of Toronto, Canada
- Tomohiro Ueno — RIKEN, Japan
Program Committee
- Boma Anantasatya Adhi — RIKEN
- Cheng Tan — Google / ASU
- Jens Domke — RIKEN CCS
- Lingli Wang — Fudan University
- Markus Weinhardt — HS Osnabrück
- Takuya Kojima — University of Tokyo
- Georgi Gaydadjiev — TU Delft
- Christian Hochberger — TU Darmstadt
- Omar Ragheb — Fujitsu