Why is it that every time a developer has to optimize something in assembler, they have to re-learn the darn language?
Okay, you've optimized the, to use a technical term, crap out of that loop in C/C++. It no longer calls any functions whatsoever...and it still has lousy performance. So now you have to make the tough call to move to assembler. Doing so will instantly tie your application to a single compiler and architecture and make it that much harder to port. Any reasonable business person will look at the potential millions lost and decide that the product is "good enough" if that means going cross-platform faster, gaining an extra 20% market share sooner, and going back later to "fix" the problem.
Unfortunately, in this case, there is only one platform and the performance is abysmal. The program should be running at 10 times its current speed. And I've forgotten just about everything I knew about assembler in an attempt to block out some really bad memories of a previous life.