Under review

Overlay blend should be performed without a branch

Noam Gat, 9 years ago, updated 9 years ago, 3 comments

The overlay blend formula is currently:


( _dst > 0.5 ? (1.0-(1.0-2.0*(_dst-0.5))*(1.0-_src)) : (2.0*_dst*_src) )


But it can be implemented without a branch by using the step function (http://http.developer.nvidia.com/Cg/step.html):


lerp( (2.0*_dst*_src), (1.0-(1.0-2.0*(_dst-0.5))*(1.0-_src)), step(_dst, 0.5) )
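
A quick CPU-side check of the step() argument order, as a minimal sketch in Python. The step and lerp helpers below are hand-written stand-ins for the Cg intrinsics (an assumption for illustration, not Shader Forge code). Since step(a, x) is 1 when x >= a, the selector step(_dst, 0.5) is 1 when _dst <= 0.5, so it picks the opposite side from the `_dst > 0.5` ternary:

```python
def step(a, x):
    # Stand-in for Cg step(a, x): 1.0 when x >= a, otherwise 0.0.
    return 1.0 if x >= a else 0.0


def lerp(a, b, t):
    # Stand-in for Cg lerp(a, b, t): a + t * (b - a).
    return a + t * (b - a)


def overlay_branch(src, dst):
    # The ternary form quoted in the post.
    if dst > 0.5:
        return 1.0 - (1.0 - 2.0 * (dst - 0.5)) * (1.0 - src)
    return 2.0 * dst * src


def overlay_step_first_attempt(src, dst):
    # The lerp/step form with the selector written as step(dst, 0.5).
    multiply = 2.0 * dst * src
    screen = 1.0 - (1.0 - 2.0 * (dst - 0.5)) * (1.0 - src)
    return lerp(multiply, screen, step(dst, 0.5))


# With dst above 0.5 the ternary takes the "screen" side,
# while step(dst, 0.5) evaluates to 0 and selects "multiply".
print(overlay_branch(0.25, 0.75), overlay_step_first_attempt(0.25, 0.75))  # 0.625 0.375
```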


This is also discussed in this thread:

http://forum.unity3d.com/threads/overlay-blend-mode-shader.181134/#post-1344111


This is friendlier to GPUs, as it trades one branch instruction for two math instructions.

At the very least, it should be possible to choose between the two.


(Note: I checked the GLSL code generated for the compiled Shader Forge shader, and it does indeed contain a branch instruction.)

Correction, this is the correct 1:1 fix:


return saturate(lerp( (2.0*_dst*_src), (1.0-(1.0-2.0*(_dst-0.5))*(1.0-_src)), step(0.5, _dst) ));
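
To sanity-check a branchless form against the ternary, here is a minimal Python sketch that compares the two over a grid of inputs in [0, 1]. The helpers are stand-ins for the Cg intrinsics and the function names are hypothetical. The selector used here is step(0.5, dst), which mirrors the `_dst > 0.5` test; a selector on src would only agree with the ternary when src and dst fall on the same side of 0.5. At dst == 0.5 both sides evaluate to src, so `>` versus `>=` makes no difference there.

```python
def step(a, x):
    # Stand-in for Cg step(a, x): 1.0 when x >= a, otherwise 0.0.
    return 1.0 if x >= a else 0.0


def lerp(a, b, t):
    # Stand-in for Cg lerp(a, b, t): a + t * (b - a).
    return a + t * (b - a)


def saturate(x):
    # Stand-in for Cg saturate(x): clamp to [0, 1].
    return min(max(x, 0.0), 1.0)


def overlay_ternary(src, dst):
    # Branching form, as quoted at the top of the thread.
    if dst > 0.5:
        return 1.0 - (1.0 - 2.0 * (dst - 0.5)) * (1.0 - src)
    return 2.0 * dst * src


def overlay_branchless(src, dst):
    # lerp/step form with the selector on dst.
    multiply = 2.0 * dst * src
    screen = 1.0 - (1.0 - 2.0 * (dst - 0.5)) * (1.0 - src)
    return saturate(lerp(multiply, screen, step(0.5, dst)))


def max_difference(steps=16):
    # Largest absolute difference between the two forms on a
    # (steps + 1) x (steps + 1) grid covering [0, 1] x [0, 1].
    worst = 0.0
    for i in range(steps + 1):
        for j in range(steps + 1):
            src, dst = i / steps, j / steps
            diff = abs(overlay_ternary(src, dst) - overlay_branchless(src, dst))
            worst = max(worst, diff)
    return worst
```

For inputs already in [0, 1] the saturate call does not change the result; it only matters for out-of-range values. This checks the math only and says nothing about which form a given GPU compiler turns into a branch.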


I'm pretty sure the ternary was used because the step function was actually more expensive on some platforms, creating a branch regardless. It's possible that it's slower in general, though.

We saw a pretty clear improvement on Android GLES2 and were able to create branchless shaders. Perhaps it could be configurable?
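
One way a configurable option might look, sketched in Python purely for illustration: a generator function that emits either expression string depending on a flag. The function name `overlay_expr` and the `branchless` flag are hypothetical, and this is not how Shader Forge itself is implemented. The ternary string is the one quoted at the top of the thread; the branchless string is the lerp/step form with its selector on the destination operand.

```python
def overlay_expr(src: str, dst: str, branchless: bool) -> str:
    """Return an overlay blend expression for the given operand names.

    Hypothetical helper: `branchless` selects the lerp/step form,
    otherwise the ternary form from the original post is emitted.
    """
    multiply = f"(2.0*{dst}*{src})"
    screen = f"(1.0-(1.0-2.0*({dst}-0.5))*(1.0-{src}))"
    if branchless:
        # Selector mirrors the `dst > 0.5` test of the ternary.
        return f"saturate(lerp( {multiply}, {screen}, step(0.5, {dst}) ))"
    return f"( {dst} > 0.5 ? {screen} : {multiply} )"


print(overlay_expr("_src", "_dst", branchless=True))
print(overlay_expr("_src", "_dst", branchless=False))
```

Keeping both forms behind one switch would let each project pick whichever compiles better on its target platform, which seems to be the open question in this thread.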