Embedded Realtime Language Development Framework (ErLaDeF)

Internally funded project


Acronym: ErLaDeF

Start date: 01.01.2012

End date: 30.11.2014



Project details

Scientific Abstract

ErLaDeF is our test-bed for new programming language and compiler techniques. Our main focus is on building infrastructure that makes it easier to program parallel embedded systems with hard and soft real-time requirements.
We concentrate on hard real-time embedded systems because they are about to go massively parallel in the near future.
Real-time and embedded systems also face hard constraints on resource usage: for example, a task must complete within a fixed amount of time and stay within a guaranteed upper limit on the amount of memory it uses. We are developing different ways to manage this concurrency using a combination of strategies: simpler language features, automatic parallelization, libraries of parallel programming patterns, deep compiler analysis, model checking, and making compiler analysis fast enough for interactive use.

Runtime Parallelization of Programs

Our automatic parallelization efforts currently focus on dynamic parallelization. While a program is running, it is analyzed to find loops whose parallel execution would improve performance. Our current approach runs long-running loops three times. The first two runs analyze the memory accesses of the loop and can themselves run in parallel. The first run records, in a shared data structure, the loop iteration in which a write access happens for every memory address. This data structure needs no synchronization; the only guarantee required is that when two concurrent writes happen, one of the values is actually written to memory. In the second pass we check, for every memory access, whether it depends on one of the recorded write accesses. Since a write access is part of every data dependency, this finds all types of data dependencies. If no dependency is found, the loop is actually run in parallel; if dependencies are found, the loop is executed sequentially. The analysis passes can execute in parallel with a slightly modified sequential execution of the first loop iterations.
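The following C++ sketch illustrates this two-pass dependence test; the direct-mapped shadow table, its size, and the names record_write and depends are our assumptions for illustration, not the project's actual library:

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kSlots = std::size_t(1) << 20;  // shadow table size (power of two)
static std::size_t last_writer[kSlots];               // 0 = no write seen; else iteration + 1

static std::size_t slot(const void* addr) {
    // Direct-mapped hash; a collision can only report a spurious dependency,
    // which is safe: the loop then simply falls back to sequential execution.
    return (reinterpret_cast<std::uintptr_t>(addr) >> 3) & (kSlots - 1);
}

// Pass 1: record, per address, one iteration that wrote it. No locking is
// needed; if two iterations write the same slot concurrently, one of the
// two values survives, which is all the test below requires.
void record_write(const void* addr, std::size_t iteration) {
    last_writer[slot(addr)] = iteration + 1;
}

// Pass 2: an access in iteration i that touches an address written in a
// *different* iteration is a cross-iteration data dependency; any such
// dependency involves a write, so all dependency types are caught.
bool depends(const void* addr, std::size_t iteration) {
    std::size_t w = last_writer[slot(addr)];
    return w != 0 && w != iteration + 1;
}
```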

In 2013 we explored alternatives to polymorphism and inheritance that may be easier to analyze. We also examined alternative thread-synchronization methods, for example transactional memory, implicit synchronization, and remote procedure calls.

In 2014 we enhanced the analysis so that a loop can start running while the remainder of its iterations is analyzed to see whether it can be run in parallel. To let the sequential loop execute while its tail is analyzed, we had to instrument the sequential loop slightly. As a result, the loop runs only slightly slower if it cannot be parallelized, but if it is found to be parallelizable, the speedup is near-linear.
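One possible shape of this overlap is sketched below; the prefix length, the stubbed helpers, and the use of std::async are illustration-only assumptions, not the project's code:

```cpp
#include <cstddef>
#include <future>

// Stubs standing in for the library pieces sketched earlier.
static bool analyze_tail(std::size_t, std::size_t) { return true; }  // passes 1 + 2
static void run_instrumented(std::size_t) {}   // sequential iteration + access logging
static void run_parallel(std::size_t, std::size_t) {}

// Assumed overall shape: a short instrumented prefix runs sequentially
// while the rest of the iteration space is analyzed; on success the tail
// runs in parallel, otherwise execution stays sequential.
void run_loop(std::size_t n) {
    std::size_t prefix = n / 10;  // assumption: size of the sequential prefix
    auto parallelizable = std::async(std::launch::async, analyze_tail, prefix, n);
    for (std::size_t i = 0; i < prefix; ++i)
        run_instrumented(i);      // the slight instrumentation overhead
    if (parallelizable.get())
        run_parallel(prefix, n);  // near-linear speedup in the good case
    else
        for (std::size_t i = prefix; i < n; ++i)
            run_instrumented(i);  // fall back: only slightly slower overall
}
```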

Finally, we also created a new language that uses the above library for run-time parallelization. Any loop that the programmer has marked as a candidate for run-time parallelization is analyzed for constructs that the library cannot yet handle. If the loop is clean, code is generated that uses the library's macros.

Design Patterns for Parallel Programming

A library of parallel programming patterns allows a programmer to select well-known parallelization and inter-core communication strategies from a well-debugged library. We are researching which (communication) patterns actually exist and when they can be applied, and have collected over 30 different patterns for parallel communication. In 2013 we investigated mechanisms to automatically determine the implementation that best fits a given software and hardware environment. We also added a set of distributed channels with which cores can send data from one local memory to another. The distributed channels allow the library to be used to program modern Network-on-Chip (NoC) processors.
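To make the channel pattern concrete, here is a minimal bounded channel in C++; the class name and API are our assumptions. A distributed implementation on a NoC would back send/receive with message transfers between core-local memories (e.g. DMA) rather than the shared queue used here.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>

// A typed, bounded, blocking channel: the core communication pattern.
template <typename T>
class Channel {
public:
    explicit Channel(std::size_t capacity) : capacity_(capacity) {}

    void send(T value) {                       // blocks while the channel is full
        std::unique_lock<std::mutex> lock(m_);
        not_full_.wait(lock, [&] { return q_.size() < capacity_; });
        q_.push_back(std::move(value));
        not_empty_.notify_one();
    }

    T receive() {                              // blocks while the channel is empty
        std::unique_lock<std::mutex> lock(m_);
        not_empty_.wait(lock, [&] { return !q_.empty(); });
        T v = std::move(q_.front());
        q_.pop_front();
        not_full_.notify_one();
        return v;
    }

private:
    std::size_t capacity_;
    std::deque<T> q_;
    std::mutex m_;
    std::condition_variable not_empty_, not_full_;
};
```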

Script-based language for embedded systems (Pylon)

Pylon is a language that is close to scripting languages but is statically typed. A large part of the complexity that a programmer would normally have to deal with when creating an application is moved to the compiler, most notably type inference: the programmer does not have to think about types at all, because the compiler infers them by analyzing the expressions in the program (a statically checked form of duck typing). The language is also implicitly parallel: the programmer does not need expert knowledge to parallelize an application, as the compiler automatically decides what to run in parallel. Finally, the language is kept simple so that it remains easy to learn for novice programmers; for example, we kept the number of keywords small.
Any language construct that makes analyzing the program hard for a compiler has been omitted (pointer arithmetic, inheritance, etc.) and replaced by a simpler variant that can be analyzed easily. The current focus of this project is supporting the programmer in designing the code. Our previous programming-language research results, for example the alternatives to polymorphism and inheritance mentioned above, have been absorbed into Pylon. This allows us to report errors at compile time that other languages can only find at run-time.
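Since no Pylon syntax is shown here, the following C++ fragment merely illustrates the principle: the compiler infers types from expressions, so a type error is rejected at compile time where a dynamically typed script would only fail at run-time.

```cpp
#include <string>

int main() {
    auto n = 42;                  // type inferred as int, no annotation written
    auto s = std::string("42");   // type inferred as std::string
    // n = s;                     // rejected at compile time; a dynamically
                                  // typed language would only fail here at run-time
    return n;
}
```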

Interactive Program Analysis

To ensure that program design errors are caught early in the development cycle, it is necessary to find bugs while editing. This requires that any program analysis works at interactive speed. We are following two approaches to this. The first approach centers on algorithmic changes to program analysis problems. Making an analysis lazy means that only those parts of a program are examined that are pertinent to the question the compiler is currently asking. For example, if the compiler needs to know which functions access a certain object, it should not examine unrelated functions, classes, or packages. Making an analysis incremental means that a small change in the program should require only a small amount of work for the re-analysis. To achieve this, a program is split recursively into parts. For each part, we calculate the effect it would have during an execution of the program and save a symbolic representation of these effects. These representations can then be used, on the one hand, to find the errors that occur when two of the parts interact (concurrently or non-concurrently). On the other hand, we can deduce the effects that a bigger part of the program has during its execution by combining the effects of the smaller parts it consists of. This enables incremental analysis: changes in one place do not cause the whole program to be re-analyzed, because the symbolic representations of the effects of unchanged parts stay unchanged as well.
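A toy sketch of such effect summaries follows; the representation as plain read/write sets and the names Effects, combine, and conflict are our simplifications, and the project's actual data structures are certainly richer.

```cpp
#include <set>
#include <string>

// Symbolic summary of what a program part does to memory locations.
struct Effects {
    std::set<std::string> reads, writes;
};

// Combining child summaries yields the parent's summary, so an edit only
// requires recomputing summaries on the path from the changed part upward.
Effects combine(const Effects& a, const Effects& b) {
    Effects e = a;
    e.reads.insert(b.reads.begin(), b.reads.end());
    e.writes.insert(b.writes.begin(), b.writes.end());
    return e;
}

// Two parts conflict (e.g. a data race when they run concurrently) if one
// writes a location that the other reads or writes.
bool conflict(const Effects& a, const Effects& b) {
    for (const auto& w : a.writes)
        if (b.reads.count(w) || b.writes.count(w)) return true;
    for (const auto& w : b.writes)
        if (a.reads.count(w)) return true;
    return false;
}
```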

In 2013 our focus was twofold: first, we developed data structures that describe the effects of a part of a program both precisely and efficiently; second, we developed equally precise and efficient algorithms to create and use these data structures.

In 2014 we extended and modified this analysis framework to support large code bases, so that they can be analyzed and the analysis results kept for later use. This enables us to precisely analyze programs that use libraries: we first analyze the library itself, and then use the library's analysis results to obtain precise results for the program.

Our second approach to bringing compiler analysis up to interactive speed is to make the analysis itself parallel. In 2013 we continued to develop data-parallel formulations of basic compiler analyses and started to implement a generic data-parallel predicate-propagation framework. Its data-parallel forms are portably executable on many different multi-core architectures.
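As a hedged sketch of what data-parallel predicate propagation might look like: the graph representation, the bitset of facts, and the Jacobi-style sweeps below are our assumptions, not the project's framework.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// One bit per predicate; a node's facts are the union of its own facts
// and those flowing in from its predecessor nodes.
struct Node {
    std::vector<std::size_t> preds;  // indices of predecessor nodes
    unsigned facts = 0;              // bitset of predicates known to hold
};

// Jacobi-style sweeps: every node is recomputed independently from the
// previous round, so each sweep is data-parallel. The OpenMP pragma is
// ignored (and the code runs sequentially) if OpenMP is disabled.
void propagate(std::vector<Node>& g) {
    std::atomic<bool> changed{true};
    while (changed.exchange(false)) {
        std::vector<unsigned> next(g.size());
        #pragma omp parallel for
        for (std::size_t i = 0; i < g.size(); ++i) {
            unsigned f = g[i].facts;
            for (std::size_t p : g[i].preds) f |= g[p].facts;  // merge incoming facts
            if (f != g[i].facts) changed = true;               // another round needed
            next[i] = f;
        }
        for (std::size_t i = 0; i < g.size(); ++i) g[i].facts = next[i];
    }
}
```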

Object-oriented languages offer the possibility to dynamically allocate objects. The memory required for this is allocated at run-time. However, in contrast to desktop systems, embedded systems typically have very little memory. If the 'new' operator is used often in an embedded system (and embedded systems are now starting to be programmed in higher-level languages such as Java and C++ that include 'new'), memory can be exhausted at run-time, causing the embedded system to crash.

In 2014, we created an analysis that finds this problem at compile-time and reports it to the developer.
To detect memory exhaustion at compile-time, the analysis determines the lifetime of references to objects. If there are no more references to an object, the object can be removed from memory. Normally, such reference-counting schemes are performed at run-time; we, however, perform the reference counting at compile-time, in an interactive fashion. As a result, memory management errors can be found at compile-time instead of at run-time. Additionally, static reference counting increases a program's performance, as the reference counts do not have to be manipulated at run-time.

If it is statically determined that an object can be removed, the developer needs to insert a 'delete' statement at that point. With this explicit memory management, we are now able to statically determine a program's worst-case memory requirements. The whole analysis outlined above is integrated into Pylon and the predicate-propagation framework reported on previously. Note that the analysis is language-independent and can be applied to other languages as well (Java, C++, etc.); in that case, however, we cannot guarantee that the reference counts are computed correctly, as this requires Pylon's analyzability.
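The following hypothetical C++ fragment (Pylon code is not shown here) illustrates the idea: after the last use of an object, its static reference count reaches zero at a known program point, and the required 'delete' can be placed exactly there.

```cpp
#include <cstddef>

struct Buffer { std::size_t size; int* data; };

Buffer* make(std::size_t n) { return new Buffer{n, new int[n]{}}; }

int sum(const Buffer* b) {           // borrows b: the static count is unchanged
    int s = 0;
    for (std::size_t i = 0; i < b->size; ++i) s += b->data[i];
    return s;
}

int work() {
    Buffer* buf = make(16);          // static reference count of the object: 1
    int s = sum(buf);                // still 1 after the call returns
    // Last use of 'buf' is above: the static count drops to 0 here, so the
    // analysis can report that a 'delete' is required (and safe) at this point.
    delete[] buf->data;
    delete buf;
    return s;
}
```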

The ErLaDeF project is a contribution of the Chair of Computer Science 2 (Programming Systems) to the IZ ESI (Embedded Systems Initiative, http://www.esi-anwendungszentrum.de/).
