Nowa: A Wait-Free Continuation-Stealing Concurrency Platform

Schmaus F, Pfeiffer N, Hönig T, Nolte J, Schröder-Preikschat W (2021)

Publication Language: English

Publication Type: Conference contribution

Publication year: 2021

Conference Proceedings Title: 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Event location: Portland, Oregon, US


DOI: 10.1109/IPDPS49936.2021.00044


Using parallelism efficiently on today’s multi- and many-core processors is an ongoing challenge. Scalability becomes more crucial than ever with the rapidly growing number of processing elements in many-core systems that operate in data centres and embedded domains. Scalability is often ensured by using fully-strict fork/join concurrency, which is the prevalent approach used by concurrency platforms like Cilk. The runtime systems employed by those platforms typically resort to lock-based synchronisation due to the complex interactions of data structures within the runtime. However, locking limits scalability severely. With the availability of commercial off-the-shelf systems with hundreds of logical cores, this is becoming a problem for an increasing number of systems.

This paper presents Nowa, a novel wait-free approach to arbitrating the plentiful concurrent strands managed by a concurrency platform’s runtime system. The wait-free approach is enabled by exploiting inherent properties of fully-strict fork/join concurrency, and hence is potentially applicable to every continuation-stealing runtime system of a concurrency platform. We have implemented Nowa and compared it with existing runtime systems, including Cilk Plus and Threading Building Blocks (TBB), which employ a lock-based approach. Our evaluation results show that the wait-free implementation improves performance by up to 1.64× compared to lock-based ones on a system with 256 hardware threads. On average, performance increased by 1.17×, and only one benchmark exhibited a performance regression. Compared against OpenMP tasks using Clang’s libomp, Nowa outperforms OpenMP by 8.68× on average.
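
For readers unfamiliar with the term, the fully-strict fork/join discipline mentioned in the abstract can be illustrated with a minimal C++ sketch. It is not Nowa's API and not taken from the paper: the names `fib_seq`, `fib_fork_join` and `kMaxForkDepth` are hypothetical, and `std::async` forks plain OS threads rather than using continuation stealing. The sketch only shows the structural property the abstract relies on, namely that every forked child is joined by the very function invocation that forked it.

```cpp
#include <cstdint>
#include <future>

// Sequential helper used once the fork depth limit is reached.
std::uint64_t fib_seq(unsigned n) {
    return n < 2 ? n : fib_seq(n - 1) + fib_seq(n - 2);
}

// Fully-strict fork/join: each child forked in this invocation is
// joined by this same invocation before it returns.
std::uint64_t fib_fork_join(unsigned n, unsigned depth = 0) {
    // Hypothetical cutoff (an assumption of this sketch) that bounds
    // the number of threads created by std::async.
    constexpr unsigned kMaxForkDepth = 3;
    if (n < 2 || depth >= kMaxForkDepth) {
        return fib_seq(n);
    }
    // Fork: the child may run in parallel with the rest of this function.
    std::future<std::uint64_t> child = std::async(
        std::launch::async,
        [n, depth] { return fib_fork_join(n - 1, depth + 1); });
    // The continuation of the fork is executed by the parent itself.
    const std::uint64_t right = fib_fork_join(n - 2, depth + 1);
    // Join: the parent waits for its own child before returning.
    return child.get() + right;
}
```

Calling `fib_fork_join(20)` returns 6765. In a continuation-stealing runtime such as the ones the abstract discusses, the worker would instead run the child immediately and leave the parent's continuation available for idle workers to steal; the sketch above does not model that scheduling.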

How to cite


Schmaus, F., Pfeiffer, N., Hönig, T., Nolte, J., & Schröder-Preikschat, W. (2021). Nowa: A Wait-Free Continuation-Stealing Concurrency Platform. In 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS). Portland, Oregon, US.


Schmaus, Florian, et al. "Nowa: A Wait-Free Continuation-Stealing Concurrency Platform." Proceedings of the 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Portland, Oregon 2021.
