Under review

Overlay blend should be performed without a branch

9 years ago • updated 9 years ago • 3 replies

The overlay blend formula is currently:

( _dst > 0.5 ? (1.0-(1.0-2.0*(_dst-0.5))*(1.0-_src)) : (2.0*_dst*_src) )
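
For reference, the formula above can be written as a piecewise function. The symbols s and d are shorthand introduced here (not in the original post) for _src and _dst, and the second line of the first case is just an algebraic simplification of the same expression:

```latex
\mathrm{overlay}(s, d) =
\begin{cases}
  1 - \bigl(1 - 2(d - 0.5)\bigr)(1 - s) = 1 - 2(1 - d)(1 - s), & d > 0.5 \\
  2\,d\,s, & d \le 0.5
\end{cases}
```

Both cases evaluate to s at d = 0.5, so the blend is continuous there and it does not matter which case the boundary value falls into.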

However, it can be implemented without a branch by using the step function (http://http.developer.nvidia.com/Cg/step.html):

lerp( (2.0*_dst*_src), (1.0-(1.0-2.0*(_dst-0.5))*(1.0-_src)), step(_dst, 0.5) )

This is also discussed in this thread:

http://forum.unity3d.com/threads/overlay-blend-mode-shader.181134/#post-1344111

This is generally friendlier to GPUs, as it trades one branch instruction for two math instructions.

At the very least, it should be possible to choose between the two.

(Note: I checked the GLSL code generated for the compiled Shader Forge shader, and it does indeed contain a branch instruction.)


Replies: 3

9 years ago

Correction, this is the correct 1:1 fix:

return saturate(lerp( (2.0*_dst*_src), (1.0-(1.0-2.0*(_dst-0.5))*(1.0-_src)), step(0.5, _dst) ));
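
As a sanity check, here is a minimal CPU-side sketch in Python that emulates the Cg semantics of step, lerp and saturate and compares the branching formula with a lerp/step version. Cg's step(a, x) returns 1 when x >= a, so a selector that reproduces the `_dst > 0.5` test is `step(0.5, _dst)`. The function names below are hypothetical helpers written for this illustration, not Shader Forge code, and the sketch only checks numerical equivalence; it says nothing about GPU performance.

```python
def step(edge, x):
    """Emulates Cg step(a, x): 1.0 where x >= a, else 0.0."""
    return 1.0 if x >= edge else 0.0


def lerp(a, b, w):
    """Emulates Cg lerp(a, b, w): returns a for w == 0 and b for w == 1."""
    return a + w * (b - a)


def saturate(x):
    """Emulates Cg saturate(x): clamps x to the range [0, 1]."""
    return min(max(x, 0.0), 1.0)


def overlay_branch(src, dst):
    """The branching overlay formula quoted in the original post."""
    if dst > 0.5:
        return 1.0 - (1.0 - 2.0 * (dst - 0.5)) * (1.0 - src)
    return 2.0 * dst * src


def overlay_branchless(src, dst):
    """lerp/step form; the selector tests dst, like the branch does."""
    low = 2.0 * dst * src                                   # used when dst < 0.5
    high = 1.0 - (1.0 - 2.0 * (dst - 0.5)) * (1.0 - src)    # used when dst >= 0.5
    return saturate(lerp(low, high, step(0.5, dst)))


def max_abs_difference(steps=16):
    """Largest difference between the two forms on a grid over [0, 1] x [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(
        abs(overlay_branch(s, d) - overlay_branchless(s, d))
        for s in grid
        for d in grid
    )
```

The only place where `>` and `>=` could disagree is dst == 0.5, and both sub-expressions evaluate to src there, so the two forms agree for all inputs in [0, 1]; saturate has no effect in that range.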


Under review

9 years ago

I'm pretty sure the branch was used because the step function was actually more expensive on some platforms and compiled to a branch anyway. It's possible that the current branching version is slower in general, though.


9 years ago

We saw a pretty clear improvement on Android GLES2 and were able to create branchless shaders. Perhaps this could be made configurable?

