What is Observable Behavior & Related Issues
The term observable behavior, according to the standard, means the following:
— Accesses (reads and writes) to volatile objects occur strictly according to the semantics of the expressions in which they occur. In particular, they are not reordered with respect to other volatile accesses on the same thread.
— At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.
— The input and output dynamics of interactive devices shall take place in such a fashion that prompting output is actually delivered before a program waits for input. What constitutes an interactive device is implementation-defined.
The “as-if rule” is closely related: in short, any code transformation is allowed as long as it does not change the observable behavior of the program.
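As a rough illustration of the as-if rule, consider the sketch below. The function name `sum_to` is made up for this example. An optimizing compiler is free to replace the loop with a closed-form expression, or to evaluate a call with a constant argument at compile time, because the returned value is the same and nothing observable changes.

```cpp
#include <cstdint>

// Hypothetical helper: sums the integers 1..n with an explicit loop.
// Under the as-if rule a compiler may emit n * (n + 1) / 2 instead,
// because no volatile access, file output, or interactive I/O depends
// on how the result is computed.
std::uint64_t sum_to(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}
```

Whether a given compiler actually performs this transformation depends on the compiler and the optimization level; the rule only says it is permitted.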
The C++ standard precisely defines the observable behavior of every C++ program that does not fall into one of the following classes:
- ill-formed
- ill-formed, no diagnostic required
- implementation-defined behavior
- unspecified behavior
- undefined behavior
Ill-Formed: The program has syntax errors and/or diagnosable semantic errors. The compiler will tell you about them. The violated rules are written in the standard with either shall, shall not or ill-formed.
Ill-Formed, No Diagnostic Required (IFNDR): The compiler is not required to report these errors. The program is syntactically valid but contains semantic errors that, in general, the compiler cannot diagnose (for example, violations of the One Definition Rule across translation units). Such errors may be detected at link time; otherwise, executing the program results in undefined behavior.
Implementation-Defined Behavior: The C++ standard does not prescribe this behavior. Each implementation (the compiler and the runtime environment) chooses how to handle it and is required to document its choice. Examples include the size of an int and the rounding behavior of floating-point types.
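To see what a particular implementation chose, you can query it directly. The following is a minimal sketch; the function names `int_bits` and `float_rounding` are invented for illustration, and the values they return will differ between platforms.

```cpp
#include <climits>
#include <cstddef>
#include <limits>

// Number of bits in an int on this implementation. The exact value is
// implementation-defined, but the standard guarantees at least 16.
constexpr std::size_t int_bits() {
    return sizeof(int) * CHAR_BIT;
}

// Rounding style the implementation reports for float arithmetic.
constexpr std::float_round_style float_rounding() {
    return std::numeric_limits<float>::round_style;
}
```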
Unspecified Behavior: This refers to behavior that the standard allows to vary between different implementations, but unlike implementation-defined behavior, the compiler is not required to document how it chooses to act. For instance, the order in which function arguments are evaluated is unspecified.
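The argument-order case can be made visible with a small sketch. All names here (`call_log`, `record`, `add`, `demo_argument_order`) are hypothetical helpers written for this example.

```cpp
#include <string>

// Hypothetical log that remembers the order in which calls ran.
std::string& call_log() {
    static std::string log;
    return log;
}

int record(char tag, int value) {
    call_log() += tag;  // remember which call ran
    return value;
}

int add(int a, int b) { return a + b; }

// The result is always 3, but whether the log reads "ab" or "ba"
// is unspecified: the compiler may evaluate either argument first.
int demo_argument_order() {
    call_log().clear();
    return add(record('a', 1), record('b', 2));
}
```

Both outcomes are valid, so correct code must not depend on either one.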
Undefined Behavior (UB): This is a term for behavior that the C++ standard does not define at all, often resulting from code errors such as accessing out-of-bounds array elements or dereferencing null pointers. When a program exhibits undefined behavior, anything can happen, from seemingly normal operation to crashes or corrupt data.
Why does undefined behavior exist?
In essence, undefined behavior (UB) allows for high-performance, system-specific optimizations and maintains the language’s practicality and legacy support, at the cost of shifting the responsibility for avoiding it onto the programmer.
The concept of undefined behavior was not introduced by C++. It was already there in C.
For example, if you allocate an array in C without initializing it, its contents are indeterminate. In Java (a language that does not have UB), every element must be initialized to 0 (or some other specified value). This means the runtime must pass over the array (an O(n) operation), whereas C can hand back the memory without touching it. C can therefore be faster for such allocations.
If the code using the array is going to populate it anyway before reading, this is basically wasted effort for Java. But if the code reads first, you get predictable results in Java and unpredictable results in C.
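The same trade-off exists in C++, where the programmer picks between the two behaviors. The sketch below uses two made-up helper names, `make_zeroed` and `make_uninitialized`, to contrast them.

```cpp
#include <cstddef>
#include <memory>

// Value-initialization: every element is set to 0, which costs a pass
// over the array (the Java-like behavior).
std::unique_ptr<int[]> make_zeroed(std::size_t n) {
    return std::unique_ptr<int[]>(new int[n]());
}

// Default-initialization: the elements have indeterminate values.
// Reading one before writing to it is undefined behavior.
std::unique_ptr<int[]> make_uninitialized(std::size_t n) {
    return std::unique_ptr<int[]>(new int[n]);
}
```

With `make_uninitialized`, the caller is responsible for writing every element before reading it.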
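For example, consider a loop whose counter is a signed integer:
// Copies `size` ints from src to dst; the counter `i` is a signed int.
void copy_array(int* dst, const int* src, int size) {
    for (int i = 0; i < size; ++i) {
        dst[i] = src[i];
    }
}
Signed integer overflow is undefined behavior, so the compiler may assume that i never wraps around. It does not have to account for wraparound and can, for example, widen i to the pointer width or vectorize the loop more freely. Note that this reasoning applies only to signed types: with an unsigned counter such as size_t, overflow is well-defined (the value wraps modulo 2^N), so the compiler cannot make this assumption.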
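The flip side is that an overflow test written after the fact (such as checking whether a + b came out smaller than a) may be optimized away, since the compiler assumes the overflow never happened. One common approach, sketched here with a hypothetical helper named `checked_add` and assuming C++17 for std::optional, is to test before adding:

```cpp
#include <limits>
#include <optional>

// Hypothetical helper: returns a + b, or std::nullopt if the sum would
// not fit in an int. The check happens before the addition, so the
// overflowing expression is never evaluated.
std::optional<int> checked_add(int a, int b) {
    constexpr int max = std::numeric_limits<int>::max();
    constexpr int min = std::numeric_limits<int>::min();
    if (b > 0 && a > max - b) return std::nullopt;  // would overflow
    if (b < 0 && a < min - b) return std::nullopt;  // would underflow
    return a + b;
}
```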
How to avoid undefined behavior?
Using try-catch blocks will not work: UB is not about exceptions handled in the wrong way.
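The difference shows up with std::vector. In the sketch below, `at_throws` is a made-up helper name. The at() member function has defined behavior for a bad index, so the exception can be caught; operator[] with the same bad index is undefined behavior and throws nothing.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if reading v.at(index) threw std::out_of_range.
// at() has defined behavior for a bad index (it throws), so try-catch
// works here. v[index] with a bad index is undefined behavior instead:
// nothing is thrown and no catch block can be relied on to run.
bool at_throws(const std::vector<int>& v, std::size_t index) {
    try {
        (void)v.at(index);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```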
So what can you do to guard against undefined behavior?
- Turn on whatever warnings you can (-Wall, -Wextra, -Wpedantic) and treat them as errors (-Werror).
- Use a sanitizer; both g++ and clang offer several (for example, -fsanitize=undefined and -fsanitize=address).
- Follow coding best practices and naming guidelines.
- Understand the concepts behind the language.
- Practice contract-based programming (e.g., read the standard library documentation and respect the preconditions of the functions you call).
- Share the knowledge.
What is iterator invalidation?
Iterator invalidation refers to the loss of validity of an iterator due to certain modifications in the data structure it points to. In C++, iterators are often used to traverse containers like vectors, lists, maps, etc. If the container is modified (elements are added or removed, or its storage is reallocated), existing iterators may no longer point to the correct elements or may point to memory that is no longer part of the container. Using invalidated iterators can lead to undefined behavior.
Refer to this table for a full list of the operations that invalidate each container's iterators.
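A typical place where this bites is erasing elements while iterating over a std::vector. The sketch below shows one common way to stay safe; the function name `erase_evens` is made up for this example.

```cpp
#include <vector>

// Removes all even values. erase() invalidates the erased iterator and
// every iterator after it, so we continue from the iterator it returns
// rather than incrementing the old one.
void erase_evens(std::vector<int>& v) {
    for (auto it = v.begin(); it != v.end(); ) {
        if (*it % 2 == 0) {
            it = v.erase(it);  // valid iterator to the next element
        } else {
            ++it;
        }
    }
}
```

Writing ++it after v.erase(it) instead would use an invalidated iterator, which is undefined behavior.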