If a static library has the property CUDA_RESOLVE_DEVICE_SYMBOLS enabled
it will now perform the device link step. The normal behavior is
to delay calling device link until the static library is consumed by
a shared library or an executable.
Previously we had a two issues when building cuda executables
that required separable compilation. The first was that we didn't
propagate FLAGS causing any -arch / -gencode flags to be dropped, and
secondly generators such as ninja would use the CXX language flags
instead of CUDA when the executable was mixed language.