Efficiency in ILP Processing by Using Orthogonality

Brand M, Hannig F, Tanase AP, Teich J (2017)

Publication Language: English

Publication Type: Conference contribution, Abstract of a poster

Publication year: 2017

Pages Range: 207

Conference Proceedings Title: 2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP)

Event location: Seattle, US

ISBN: 978-1-5090-4825-0

DOI: 10.1109/ASAP.2017.7995282


For the next generations of Processor-Arrays-on-Chip (e. g., coarse-grained reconfigurable or programmable arrays), which comprise hundreds to thousands of processing elements, it is very important to keep the on-chip configuration/instruction memories as small as possible. Hence, compilers must take the scarcity of available instruction memory into account and generate code that is as compact as possible [1]. However, Very Long Instruction Word (VLIW) processors have the well-known problem that compilers typically produce lengthy code. Much of this code is unnecessary because of unused Functional Units (FUs) or repeated operations of single FUs in instruction sequences. Techniques like software pipelining can improve the utilization of the FUs, yet at the risk of code explosion [2] due to the overlapped scheduling of multiple loop iterations or other control flow statements. This is where our proposed Orthogonal Instruction Processing (OIP) architecture (see Fig. 1) helps to reduce the code size of compute-intensive loop programs. In contrast to the lightweight VLIW processors used in arrays such as Tightly Coupled Processor Arrays (TCPAs) [4], the idea is to equip each FU with its own instruction memory, branch unit, and program counter, while the FUs still share the register files as well as the input and output signals. This enables a processor to execute a loop program orthogonally: each FU executes its own sub-program while exchanging data over the register files. The branch unit and its instruction format have to be changed slightly by adding to each instruction a counter that determines how often the instruction is repeated before the specified branch is executed. This allows instructions to be repeated without duplicating them in the code. Such processors have to be programmed carefully, e. g., to avoid data dependency problems while optimizing throughput.
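To make the code-size argument concrete, here is a minimal sketch that encodes one hypothetical 8-cycle schedule on three FUs (named `ld`, `add`, and `mul` purely for illustration) in two ways. The VLIW encoding needs one word per cycle with an explicit NOP in every idle slot. The OIP-style encoding gives each FU its own sub-program and collapses runs of identical operations into a single instruction with a repeat counter. The `OipInstr` format (an operation plus a repeat count, with the branch target left implicit) is an assumption made for this illustration, not the actual OIP instruction format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OipInstr:
    """Hypothetical OIP instruction: an operation plus a repeat counter.
    The branch target is left implicit (fall through to the next instruction)."""
    op: str
    repeat: int

def vliw_words(schedule, fus):
    """One VLIW word per cycle; every idle FU slot must hold an explicit NOP."""
    return [tuple(cycle.get(fu, "nop") for fu in fus) for cycle in schedule]

def oip_programs(schedule, fus):
    """One sub-program per FU; runs of the same operation become one instruction."""
    programs = {}
    for fu in fus:
        prog = []
        for cycle in schedule:
            op = cycle.get(fu, "nop")
            if prog and prog[-1].op == op:
                # Same op as in the previous cycle: bump the repeat counter.
                prog[-1] = OipInstr(op, prog[-1].repeat + 1)
            else:
                prog.append(OipInstr(op, 1))
        programs[fu] = prog
    return programs

def expand(prog):
    """Unroll an OIP sub-program back into its per-cycle operation stream."""
    return [instr.op for instr in prog for _ in range(instr.repeat)]

# Hypothetical 8-cycle schedule on three FUs: each cycle maps FU -> operation.
FUS = ["ld", "add", "mul"]
schedule = [{} for _ in range(8)]
for t in range(0, 2):
    schedule[t]["ld"] = "ld"
for t in range(0, 4):
    schedule[t]["add"] = "add"
for t in range(2, 8):
    schedule[t]["mul"] = "mul"

words = vliw_words(schedule, FUS)
progs = oip_programs(schedule, FUS)
print(len(words) * len(FUS), "VLIW op slots")                   # 24 VLIW op slots
print(sum(len(p) for p in progs.values()), "OIP instructions")  # 6 OIP instructions
```

In this toy case, 24 VLIW operation slots shrink to 6 OIP instructions. Each OIP instruction is wider, though, because it also carries the counter (and, in the real architecture, a branch field). That width is part of the overhead an analytical model has to weigh against the shorter programs.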
To solve the resource-constrained modulo scheduling problem that arises when programming such processors, we use techniques based on mixed integer linear programming [5], [3].
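For orientation, a common time-indexed 0-1 formulation of resource-constrained modulo scheduling for a fixed initiation interval II is sketched below. It is a generic textbook-style formulation, not necessarily the one used in [5], [3]. The symbols are assumptions introduced here: x_{i,t} = 1 if operation i starts in cycle t of a horizon T, s_i is its start time, λ_i its latency, E the set of data dependences (i, j) with iteration distance ω_{ij}, O_r the operations bound to resource r, and R_r the number of units of r. Treating each OIP FU as its own resource with R_r = 1 is likewise an assumption.

```latex
\begin{align}
\min\ & \sum_{i \in \mathcal{O}} s_i \\
\text{s.t.}\ & \sum_{t=0}^{T-1} x_{i,t} = 1 && \forall i \in \mathcal{O} \\
& s_i = \sum_{t=0}^{T-1} t \, x_{i,t} && \forall i \in \mathcal{O} \\
& s_j - s_i \ge \lambda_i - \mathit{II} \cdot \omega_{ij} && \forall (i,j) \in \mathcal{E} \\
& \sum_{i \in \mathcal{O}_r} \; \sum_{\substack{0 \le t < T \\ t \bmod \mathit{II} = k}} x_{i,t} \le R_r && \forall r,\ \forall k \in \{0, \dots, \mathit{II}-1\} \\
& x_{i,t} \in \{0,1\} && \forall i \in \mathcal{O},\ 0 \le t < T
\end{align}
```

The dependence constraint forces a consumer j to start at least λ_i cycles after its producer i, relaxed by II cycles per loop iteration that separates them. The modulo resource constraint ensures that, in every slot k of the repeating II-cycle pattern, no resource is oversubscribed. The smallest feasible II is typically found by solving the model for II = II_min, II_min + 1, … until a solution exists.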
Obviously, the modifications of the processor cause architectural overhead in the form of additional branch units and an increase in instruction memory compared to the lightweight VLIW processors. We therefore created an analytical model of both the lightweight VLIW processor and our proposed architecture to analyze this overhead. Following [7], the model gives an upper bound on the hardware costs and the memory consumption. We examined the HW costs of a lightweight VLIW processor with different instruction memory lengths m_VLIW and compared them to our OIP processor with varying instruction ratios IR and, thus, varying lengths m_OIP of each FU's instruction memory. In this examination, we covered processors containing ten FUs and averaged the HW costs over the instruction ratio. Figure 2 shows that the overhead is negligible as soon as program sizes can be reduced to 50 % (i. e., IR = 2), which is usually achieved by our compiler.
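The effect of IR on memory size can be illustrated with a deliberately simplified toy model. It is not the analytical model of [7] and ignores the logic cost of the extra branch units. It assumes that a VLIW word holds one operation slot per FU plus one branch field, and that each OIP memory entry holds an operation, a branch field, and a repeat counter. It also assumes m_OIP = ⌈m_VLIW / IR⌉. All field widths (32-bit operations, 8-bit branch and counter fields) are hypothetical. Only the FU count of ten is taken from the text.

```python
import math

def vliw_bits(m_vliw, n_fu, w_op, w_br):
    """Toy VLIW instruction memory: m_vliw words, each with one op slot
    per FU plus a single branch field."""
    return m_vliw * (n_fu * w_op + w_br)

def oip_bits(m_vliw, ir, n_fu, w_op, w_br, w_cnt):
    """Toy OIP instruction memory: every FU has its own memory of
    m_oip = ceil(m_vliw / ir) entries, each carrying an op, a branch
    field and a repeat counter (hypothetical encoding)."""
    m_oip = math.ceil(m_vliw / ir)
    return n_fu * m_oip * (w_op + w_br + w_cnt)

# Hypothetical widths in bits; only N_FU = 10 comes from the abstract.
N_FU, W_OP, W_BR, W_CNT = 10, 32, 8, 8
M_VLIW = 64

for ir in (1, 1.5, 2, 4):
    ratio = oip_bits(M_VLIW, ir, N_FU, W_OP, W_BR, W_CNT) / vliw_bits(M_VLIW, N_FU, W_OP, W_BR)
    print(f"IR = {ir}: OIP/VLIW memory = {ratio:.2f}")
```

With these made-up widths, the OIP memory is about 1.46 times the VLIW memory at IR = 1. It falls below the VLIW memory once IR exceeds n·(w_op + w_br + w_cnt) / (n·w_op + w_br) = 480 / 328 ≈ 1.46, and it is about 0.73 of the VLIW memory at IR = 2. Only the trend is meant to carry over; real numbers depend on the actual encodings and on the branch-unit logic that this sketch leaves out.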


How to cite


Brand, M., Hannig, F., Tanase, A.-P., & Teich, J. (2017, July). Efficiency in ILP Processing by Using Orthogonality. Poster presentation at The 28th Annual IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2017), Seattle, US.


Brand, Marcel, et al. "Efficiency in ILP Processing by Using Orthogonality." Presented at The 28th Annual IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2017), Seattle Ed. IEEE, 2017.
