Several factors affect inference speed, but if we assume that the kinds of cores and their enabling libraries were the only differences between manufacturers, Nvidia comes out about 2.95 times as fast at inference. The reason: competently written CUDA code can run on all the kinds of cores on an Nvidia GPU simultaneously (the ordinary CUDA cores plus the Tensor cores), whereas the default OpenCL path cannot. OpenCL uses only the CUDA cores, and even those only naively, as if they were ordinary general-purpose cores.
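To make that point concrete, below is a minimal CUDA sketch of the same matrix multiply written two ways; all sizes, kernel names, and launch parameters are illustrative assumptions, not code from any particular inference engine. matmul_naive maps one thread to one output element and runs on the CUDA cores alone, roughly what a generic OpenCL-style port achieves; matmul_wmma uses the WMMA API (nvcuda::wmma) so each warp feeds 16x16 tiles to the Tensor cores, a path reachable only through CUDA-side APIs. It needs an sm_70 or newer GPU and N divisible by 16.

    #include <cuda_runtime.h>
    #include <cuda_fp16.h>
    #include <mma.h>

    using namespace nvcuda;

    // Naive kernel: one thread per output element, CUDA cores only --
    // roughly what a generic, unported OpenCL code path achieves.
    __global__ void matmul_naive(const float* A, const float* B, float* C, int N) {
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < N && col < N) {
            float acc = 0.0f;
            for (int k = 0; k < N; ++k)
                acc += A[row * N + k] * B[k * N + col];
            C[row * N + col] = acc;
        }
    }

    // Tensor-core kernel: one warp per block, each block computes one
    // 16x16 tile of C through the WMMA fragment API.
    __global__ void matmul_wmma(const half* A, const half* B, float* C, int N) {
        int tileRow = blockIdx.y, tileCol = blockIdx.x;
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;
        wmma::fill_fragment(acc, 0.0f);
        for (int k = 0; k < N; k += 16) {
            wmma::load_matrix_sync(a, A + tileRow * 16 * N + k, N);
            wmma::load_matrix_sync(b, B + k * N + tileCol * 16, N);
            wmma::mma_sync(acc, a, b, acc);  // executed on Tensor cores
        }
        wmma::store_matrix_sync(C + tileRow * 16 * N + tileCol * 16, acc, N,
                                wmma::mem_row_major);
    }

    int main() {
        const int N = 1024;  // illustrative size, must be a multiple of 16
        float *A, *B, *C;
        half *Ah, *Bh;
        cudaMalloc(&A, N * N * sizeof(float));
        cudaMalloc(&B, N * N * sizeof(float));
        cudaMalloc(&C, N * N * sizeof(float));
        cudaMalloc(&Ah, N * N * sizeof(half));
        cudaMalloc(&Bh, N * N * sizeof(half));
        // A real engine would upload weights and activations here.
        dim3 threads(16, 16), tiles(N / 16, N / 16);
        matmul_naive<<<tiles, threads>>>(A, B, C, N);  // CUDA cores only
        matmul_wmma<<<tiles, 32>>>(Ah, Bh, C, N);      // Tensor cores
        cudaDeviceSynchronize();
        return 0;
    }

Compile with something like nvcc -arch=sm_70. A production engine would go further, tiling through shared memory and launching such kernels on concurrent streams so both kinds of cores stay busy at once, but the sketch shows where the headroom that a naive OpenCL port leaves on the table comes from.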
However, Metal on Apple M chips might not be as weak as, say, AMD, provided the AI inference software is specifically redeveloped for Metal, and provided we limit the comparison to Nvidia chips of a comparable wafer node with power consumption as low as Apple M's. Without that redevelopment for Metal, running on Apple boils down to the slow OpenCL path.
The difference between training and inference is mainly that training needs more memory and more runtime on the same GPU chips. Both training and inference profit from more cores, faster cores, and faster surrounding hardware, such as the CPU and memory bandwidth.