Under review
Overlay blend should be performed without a branch
The overlay blend formula is currently:
( _dst > 0.5 ? (1.0-(1.0-2.0*(_dst-0.5))*(1.0-_src)) : (2.0*_dst*_src) )
But it could be implemented without a branch by using the step function (http://http.developer.nvidia.com/Cg/step.html):
lerp( (2.0*_dst*_src), (1.0-(1.0-2.0*(_dst-0.5))*(1.0-_src)), step(_dst, 0.5) )
This is also discussed in this thread:
http://forum.unity3d.com/threads/overlay-blend-mode-shader.181134/#post-1344111
This is easier on GPUs, as it trades one branch instruction for two math instructions.
At the very least, it should be possible to choose between the two.
(Note: I checked the GLSL generated from the compiled Shader Forge shader, and it does indeed contain a branch instruction.)
Correction: this is the correct 1:1 replacement:
return saturate(lerp( (2.0*_dst*_src), (1.0-(1.0-2.0*(_dst-0.5))*(1.0-_src)), step(0.5, _dst) ));
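For anyone who wants to check this on the CPU, here is a minimal Python sketch (not Shader Forge code). The helpers `step`, `lerp` and `saturate` are hypothetical stand-ins that mirror the Cg intrinsics, where step(a, x) is 1 when x >= a, and the sketch assumes the selector is meant to be _dst, as in the original ternary. It compares the branching formula with the lerp/step version over a grid of values in [0, 1].

```python
def step(edge, x):
    """Stand-in for Cg step(a, x): 1.0 when x >= a, otherwise 0.0."""
    return 1.0 if x >= edge else 0.0


def lerp(a, b, t):
    """Stand-in for Cg lerp(a, b, t): a + t * (b - a)."""
    return a + t * (b - a)


def saturate(x):
    """Stand-in for Cg saturate(x): clamp x to [0, 1]."""
    return min(max(x, 0.0), 1.0)


def overlay_branch(src, dst):
    """The ternary formula from the original post."""
    if dst > 0.5:
        return 1.0 - (1.0 - 2.0 * (dst - 0.5)) * (1.0 - src)
    return 2.0 * dst * src


def overlay_lerp(src, dst, mask):
    """Evaluate both halves and pick one with a 0/1 mask."""
    multiply = 2.0 * dst * src
    screen = 1.0 - (1.0 - 2.0 * (dst - 0.5)) * (1.0 - src)
    return saturate(lerp(multiply, screen, mask))


def overlay_step(src, dst):
    """Branchless form: the mask is 1 when dst >= 0.5."""
    return overlay_lerp(src, dst, step(0.5, dst))


def max_difference(samples=21):
    """Largest absolute difference between the two forms on a grid."""
    worst = 0.0
    for i in range(samples):
        for j in range(samples):
            src = i / (samples - 1)
            dst = j / (samples - 1)
            diff = abs(overlay_branch(src, dst) - overlay_step(src, dst))
            worst = max(worst, diff)
    return worst
```

max_difference() should come out at floating-point rounding level rather than exactly zero, since lerp(a, b, 1) computes a + (b - a) and not b itself. Note that the argument order of step matters: step(_dst, 0.5) gives a mask of 1 when _dst <= 0.5, which swaps the two halves. None of this says anything about performance, which still has to be measured per platform.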
I'm pretty sure the branch was used because the step function was actually more expensive on some platforms, where it ends up as a branch regardless. It's possible that the branch is slower in general, though.
We saw a pretty clear improvement on Android GLES2 and were able to create branchless shaders. Perhaps this could be configurable?