Under review

Overlay blend should be performed without a branch

Noam Gat 9 years ago, updated 8 years ago, 3 comments

The overlay blend formula is currently


( _dst > 0.5 ? (1.0-(1.0-2.0*(_dst-0.5))*(1.0-_src)) : (2.0*_dst*_src) )


But it can be implemented without a branch by using the step function (http://http.developer.nvidia.com/Cg/step.html):


lerp( (2.0*_dst*_src), (1.0-(1.0-2.0*(_dst-0.5))*(1.0-_src)), step(_dst, 0.5) )


This is also discussed in this thread:

http://forum.unity3d.com/threads/overlay-blend-mode-shader.181134/#post-1344111


This is friendlier to GPUs, as it trades one branch instruction for two math instructions.

At the very least, it should be possible to choose between the two.


(Note: I checked the GLSL code generated from the compiled Shader Forge shader, and it does indeed contain a branch instruction.)

Correction, this is the correct 1:1 fix:


return saturate(lerp( (2.0*_dst*_src), (1.0-(1.0-2.0*(_dst-0.5))*(1.0-_src)), step(0.5, _dst) ));
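
To sanity-check that a step-based version really matches the branching formula, here is a minimal sketch in plain C. It assumes the selector tests _dst, as the original conditional does, and that both inputs lie in [0, 1]. The helpers step_f, lerp_f, saturate_f and abs_f are hypothetical C stand-ins for the Cg intrinsics, not part of Shader Forge, and overlay_max_diff is an illustrative name.

```c
/* C stand-ins for the Cg intrinsics step, lerp and saturate.
   These are assumptions of this sketch; on a GPU they are built-ins. */
static float step_f(float edge, float x) { return x >= edge ? 1.0f : 0.0f; }
static float lerp_f(float a, float b, float t) { return a + t * (b - a); }
static float saturate_f(float x) { return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x); }
static float abs_f(float x) { return x < 0.0f ? -x : x; }

/* The original overlay formula, with the conditional on dst. */
static float overlay_branch(float dst, float src) {
    return dst > 0.5f ? (1.0f - (1.0f - 2.0f * (dst - 0.5f)) * (1.0f - src))
                      : (2.0f * dst * src);
}

/* Step-based overlay: step_f(0.5, dst) is 1 when dst >= 0.5, so lerp_f
   picks the second expression there and the first one below 0.5. */
static float overlay_step(float dst, float src) {
    return saturate_f(lerp_f(2.0f * dst * src,
                             1.0f - (1.0f - 2.0f * (dst - 0.5f)) * (1.0f - src),
                             step_f(0.5f, dst)));
}

/* Largest absolute difference between the two versions over an
   (n+1) x (n+1) grid of dst/src values in [0, 1]; n must be > 0. */
static float overlay_max_diff(int n) {
    float worst = 0.0f;
    for (int i = 0; i <= n; ++i) {
        for (int j = 0; j <= n; ++j) {
            float dst = (float)i / (float)n;
            float src = (float)j / (float)n;
            float d = abs_f(overlay_branch(dst, src) - overlay_step(dst, src));
            if (d > worst) worst = d;
        }
    }
    return worst;
}
```

The two expressions agree at _dst = 0.5 (both reduce to _src), so it does not matter which side of the comparison includes the boundary. For inputs in [0, 1] the saturate should only absorb rounding error.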

Under review

I'm pretty sure the branch was used because the step function was actually more expensive on some platforms, creating a branch regardless. It's possible that it's slower in general, though.

We saw a pretty clear improvement on Android GLES2 and were able to create branchless shaders. Perhaps this could be made configurable?
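
One possible shape for a configurable version is a compile-time switch, sketched here in plain C. This is only an illustration, not how Shader Forge handles it: OVERLAY_USE_BRANCH is a hypothetical macro name, the helper functions are C stand-ins for the Cg intrinsics, and the selector is assumed to test _dst as in the original conditional.

```c
/* C stand-ins for the Cg intrinsics (assumptions of this sketch). */
static float step_f(float edge, float x) { return x >= edge ? 1.0f : 0.0f; }
static float lerp_f(float a, float b, float t) { return a + t * (b - a); }
static float saturate_f(float x) { return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x); }

/* Overlay blend with a compile-time choice of implementation.
   Define OVERLAY_USE_BRANCH (hypothetical name) to get the conditional
   version; leave it undefined to get the step/lerp version. */
static float overlay(float dst, float src) {
    float multiply = 2.0f * dst * src;
    float screen = 1.0f - (1.0f - 2.0f * (dst - 0.5f)) * (1.0f - src);
#ifdef OVERLAY_USE_BRANCH
    return saturate_f(dst > 0.5f ? screen : multiply);
#else
    return saturate_f(lerp_f(multiply, screen, step_f(0.5f, dst)));
#endif
}
```

In a shader generator the same choice could be made per target platform, so that platforms where step is known to compile to a branch anyway keep the conditional form.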