I have been optimizing some Fortran code that involved searching for an integer value in an unordered array (we know the value occurs only once). Since there is no intrinsic procedure to accomplish this, I thought I’d try a couple of approaches to see which was fastest. The simple answer is that, in this case, brute force beats elegance, even when the target value is near the end of the array.
Download the full example from GitHub
Method 1: Brute force
Algorithm: use a do loop to iterate through the array until you find the target value. Exit the loop when you find the value.
do i = 1, num_elements if (array1(i) .eq. target_value) then loc = i exit endif end do
Results: CPU time required my PC: 0.048 sec
Method 2: forall
Algorithm: use a forall loop to compute the absolute value of the difference between the target value and each value in the array, and store the result in a temporary array. Then use the minloc intrinsic procedure to return the index of the minimum value of the temporary array.
forall (i=1:num_elements) array2(i) = abs(array1(i) - target_value) loc = minloc(array2, 1)
Results: CPU time 0.248 seconds
Method 3: Intrinsic Procedures
Algorithm: use a standard arithmetic operator to compute the difference between each element of the array the the target value, use the abs procedure to compute the absolute value, and use the minloc procedure to find the index of the minimum element.
loc = minloc(abs(array1 - target_value), 1)
Results: CPU time 0.172 seconds
Conclusions
Although the third method appears to be the most elegant, it is quite slow. A simple for loop is orders of magnitude faster, even when it is “penalized” by searching for a value near the end of the array. This is very different from Python (especially NumPy arrays), where it is generally much faster to use NumPy functions than to iterate over arrays.
Do you find that using different optimization flags changes the timing much?
I didn’t try changing the compiler flags. I am pretty sure the compiler is optimizing the do loop–otherwise, it would run much slower. It might be interesting to disable optimization altogether and see if the results change.
Trying with ifort on Linux, with `-O2` to `-O5` the time is the same, but with `-O1` or `-O0` the brute force loop is the fastest. Interestingly, the `forall` approach doesn’t even change with any optimization and is consistently the slowest.
I suspect that the built-in function minloc is the limiting factor for the second and third methods. Since this function is pre-built, compiler flags have no effect. Further, since the foreach construction tells the compiler that the loop can be performed in parallel, optimization flags have no effect on foreach. Compilers and processors have become so sophisticated that it can be counter-productive to manually optimize code. It’s best to write the code in a way that makes its purpose clear to the compiler.
Thank you!
and what if array doesn’t contain x at all? only the first one gives correct answer if you init loc to some ‘zero’ value.
How is this even possible? :O
Very helpful thanks, is there any difference seen between using the minloc() function and the findloc() function? (using intel fortran)
I will be using method 1 anyway if it’s that much faster!