Debugging a memory leak - FreeRTOS
There is a reason dynamic memory allocation is forbidden by most safety-critical software development standards.
But what if a framework or BSP you are considering using on a low-cost COTS solution makes use of dynamic memory allocation in some of its components? The first step in assessing its suitability is probing heap memory usage.
On FreeRTOS, with a memory allocator based on the heap_4.c implementation, one can simply call xPortGetFreeHeapSize() to get the number of free bytes in the heap.
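The samples then have to reach the host for analysis. The sketch below is minimal and makes two assumptions. First, the firmware periodically prints a line of the form `HEAP <tick> <free bytes>`, for example from a low-priority task that prints `xTaskGetTickCount()` and `xPortGetFreeHeapSize()`. That format and the helper name `parse_heap_log` are hypothetical. Second, the goal is to produce the `time` and `ram` arrays that the analysis further down expects.

```python
import re

import numpy as np

# Hypothetical log format: one "HEAP <tick> <free bytes>" line per sample,
# interleaved with whatever else the firmware prints on the serial port.
HEAP_LINE = re.compile(r"^HEAP (\d+) (\d+)$")


def parse_heap_log(lines):
    """Return (time, ram) arrays built from the HEAP lines of a serial log."""
    ticks, free = [], []
    for line in lines:
        match = HEAP_LINE.match(line.strip())
        if match:  # skip unrelated output
            ticks.append(int(match.group(1)))
            free.append(int(match.group(2)))
    return np.array(ticks), np.array(free)


# Example with made-up samples; on real data, pass an open log file instead.
log = ["HEAP 0 20480", "boot ok", "HEAP 1000 20132"]
time, ram = parse_heap_log(log)
```

The parser ignores unrelated lines. This means the heap samples can share the serial port with the normal debug output.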
After running for some time and plotting the data, a memory leak shows up as a steady decrease in available memory, in discrete steps.
How do we find the offending allocations?
We first need to figure out the size of the allocations that are not freed.
In this case memory is constantly allocated and freed, which is another sign that this component is not suitable for safety-critical applications. Every offending allocation that is never freed appears as a step in the data, buried in noise from the non-offending allocations. Treating the non-offending allocations as noise, we can filter them out using a rolling mode. As a reminder, the mode in statistics is the most common value in a dataset. By computing it over a rolling window, we can separate the "steady state" (the baseline memory value of each window) from the noise.
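Here is a toy illustration of why the rolling mode removes transient allocations but keeps the leak. This is a minimal pure-NumPy sketch, and the trace is made up. `rolling_mode` is a hypothetical helper. Like `scipy.stats.mode`, it returns the smallest value on ties: `np.unique` sorts ascending and `np.argmax` picks the first maximum.

```python
import numpy as np


def rolling_mode(samples, window):
    """Most common value in each full window of `samples` (ties -> smallest value)."""
    modes = []
    for start in range(len(samples) - window + 1):
        values, counts = np.unique(samples[start:start + window], return_counts=True)
        modes.append(values[np.argmax(counts)])
    return np.array(modes)


# Made-up free-heap trace: a 1000-byte baseline, transient 64-byte allocations
# that are freed again (noise), and one 348-byte allocation that is never freed.
ram = np.array([1000, 936, 1000, 1000, 936, 1000, 652, 652, 588, 652, 652, 652])
baseline = rolling_mode(ram, 5)
steps = np.diff(baseline)
```

In this trace, `baseline` stays flat apart from a single drop. `steps` therefore contains only zeros and one -348 step, and the 64-byte dips do not show up at all.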
We can implement this easily with NumPy and SciPy in Python:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from scipy import stats
# ram: free heap samples (bytes), time: sample timestamps (ticks)
# TICKTOSEC: duration of one tick in seconds
# window size for rolling mode (samples)
# estimated from the width of the steps in the data
wSIZE = 40
ramSlideView = sliding_window_view(ram, wSIZE)
timeSlideView = sliding_window_view(time, wSIZE)
rollingTime = np.mean(timeSlideView, axis=1)
# keepdims=False (SciPy >= 1.9) returns one mode per window as a 1-D array
rollingMode, _ = stats.mode(ramSlideView, axis=1, keepdims=False)
slideModeDiffs = np.diff(rollingMode)
# allocations are negative diffs, keep only those
slideModeDiffs = slideModeDiffs[slideModeDiffs < 0]
# linear fit of the baseline (rolling mode) free memory over time
m, b = np.polyfit(rollingTime, rollingMode, 1)
# x-intercept of the linear fit (free memory reaches zero): -b/m, in ticks
print(f"Will run out of memory in {-b/m*TICKTOSEC/60/60/24:.2f} days.")
Which when plotted gives us:
By performing a linear fit on the baseline and calculating its x-intercept, we learn that we will run out of memory in 138.75 days. That might be acceptable for missile on-board software, but it is unacceptable in almost every other case.
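Here is the x-intercept arithmetic spelled out. If the fitted line is free(t) = m·t + b, with m in bytes per tick and b in bytes, free memory reaches zero at t = -b/m ticks. That value is measured from tick 0, which is boot if the timestamps are tick counts since boot. It is not measured from the end of the recording. The sketch below uses a hypothetical helper and made-up numbers.

```python
def days_until_exhaustion(m, b, tick_to_sec):
    """Time in days at which the fitted line free = m*t + b reaches zero.

    m: slope in bytes per tick (negative for a leak)
    b: intercept in bytes (free memory extrapolated to tick 0)
    tick_to_sec: seconds per tick, e.g. 0.001 for a 1 kHz tick
    """
    if m >= 0:
        return float("inf")  # no downward trend: the fit never reaches zero
    ticks_at_zero = -b / m  # x-intercept of the linear fit
    return ticks_at_zero * tick_to_sec / 60 / 60 / 24


# Made-up example: 40000 bytes free at tick 0, leaking 348 bytes per hour
# with a 1 kHz tick (3 600 000 ticks per hour).
days = days_until_exhaustion(-348 / 3_600_000, 40_000, 0.001)
```

This works out to 40000 / 348 ≈ 115 hours, or just under 5 days. You can check the result by hand that way.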
By printing out the diffs of the clean data (or plotting a histogram if there are many), we find that the offending allocations are 348 bytes each. Now we just need to find what allocates 348 bytes.
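When there are many diffs, counting them is quicker than reading them. This is a minimal sketch that assumes `diffs` is the output of `np.diff` on the rolling mode. The helper name `leak_step_sizes` is hypothetical. It keeps only the drops in free memory and returns step sizes ordered by how often they occur.

```python
import numpy as np


def leak_step_sizes(diffs, top=3):
    """Most common drops in the rolling-mode baseline, as (bytes, count) pairs.

    diffs: np.diff of the rolling mode (bytes); negative values are drops in
    free memory. Results are ordered from most to least frequent.
    """
    drops = -diffs[diffs < 0]  # positive byte counts
    sizes, counts = np.unique(drops, return_counts=True)
    order = np.argsort(counts, kind="stable")[::-1]
    return [(int(sizes[i]), int(counts[i])) for i in order[:top]]


# Made-up diffs: three 348-byte drops, one 64-byte drop, some flat segments
# and one small rise.
diffs = np.array([0, -348, 0, -348, -64, 0, -348, 12])
ranking = leak_step_sizes(diffs)
```

Here `ranking` is `[(348, 3), (64, 1)]`. A single size dominating the drops is a strong hint that one call site is leaking.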
The heap_4.c implementation of FreeRTOS already invokes a handy traceMALLOC( pvAddress, uiSize ) macro on every call to pvPortMalloc. By default the macro is defined as empty. We can define it ourselves (in FreeRTOSConfig.h) and point it to a function that logs the details of each allocation.
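Using the GCC built-in __builtin_return_address to walk up the stack frames, we can find out where pvPortMalloc was called from. printAlloc is called from within pvPortMalloc, so level 0 points back into pvPortMalloc and level 1 points to its caller, provided printAlloc is not inlined. The number of frames you need to walk back depends on your specific memory management setup, for example whether the component calls pvPortMalloc through a wrapper. GCC warns that nonzero levels can behave unpredictably on some targets, so sanity-check the reported addresses.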
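/* FreeRTOSConfig.h */
void printAlloc(void * pvAddress, size_t uiSize);
#define traceMALLOC( pvAddress, uiSize ) printAlloc( pvAddress, uiSize )
/* any .c file linked into the firmware */
#include <stdio.h>
void printAlloc(void * pvAddress, size_t uiSize) {
    printf("%u bytes allocated at %p by %p\n", (unsigned int) uiSize, pvAddress, __builtin_return_address(1));
}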
Finally, we can grep the serial output for allocations of the offending size. Then we use addr2line -pfiaC (with -e pointing at the firmware ELF file) to convert the caller addresses into function names and source lines. From there, we work out where a matching vPortFree should be added to fix the leak.
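For long logs, the grep and addr2line steps can be scripted. This is a minimal sketch under a few assumptions. Each log line has the form `<size in decimal> bytes allocated at <address> by <caller>`. The `%p` output format is implementation-defined, so the regex accepts any non-space token for the addresses. The helper names are hypothetical, and the example addresses and `firmware.elf` are placeholders.

```python
import re
import subprocess
from collections import Counter

# "<size> bytes allocated at <address> by <caller>", size printed in decimal.
ALLOC_LINE = re.compile(r"^(\d+) bytes allocated at (\S+) by (\S+)$")


def callers_of_size(lines, size):
    """Count the caller addresses of allocations of exactly `size` bytes."""
    callers = Counter()
    for line in lines:
        match = ALLOC_LINE.match(line.strip())
        if match and int(match.group(1)) == size:
            callers[match.group(3)] += 1
    return callers


def addr2line_cmd(elf_path, addresses, tool="addr2line"):
    """addr2line invocation: pretty-print, functions, inlines, addresses, demangle."""
    return [tool, "-pfiaC", "-e", elf_path, *addresses]


def resolve(elf_path, addresses, tool="addr2line"):
    """Run addr2line; for target firmware use the cross tool, e.g. arm-none-eabi-addr2line."""
    cmd = addr2line_cmd(elf_path, addresses, tool)
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


# Made-up log lines standing in for the captured serial output.
log = [
    "348 bytes allocated at 0x20001a40 by 0x8004c1d",
    "64 bytes allocated at 0x20001bb0 by 0x8002e31",
    "348 bytes allocated at 0x20001d10 by 0x8004c1d",
]
leaky = callers_of_size(log, 348)
cmd = addr2line_cmd("firmware.elf", list(leaky))
```

Counting callers, rather than just grepping, shows whether one call site or several account for the leaked allocations. A return address points just past the call instruction, so addr2line may report the line after the pvPortMalloc call.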
...or we give up on that component and follow the safety-critical software requirement of static memory allocation, as we should have done in the first place.
Looking for support with embedded development, avionics, radiation hardness assurance, or product prototyping projects, or any combination thereof? I am open for work!