AI Inference on the Edge Network? Cloudflare

Mobile devices and other resource-constrained hardware may never be able to run large inference models locally.

So, what is one to do?

How about running them on an edge network such as Cloudflare? Latency from Cloudflare's edge is on the order of 50 ms from almost every major city in the world, so a device can offload inference to a nearby point of presence instead of a distant central data center.
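As a concrete sketch, Cloudflare exposes edge-hosted models through the Workers AI REST API (`POST /accounts/{account_id}/ai/run/{model}`). The account ID, API token, and model slug below are placeholders; check Cloudflare's current model catalog for valid names.

```python
# Hedged sketch: calling Cloudflare Workers AI over its REST API.
# Credentials and the model slug are placeholders, not working values.
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4/accounts"

def build_inference_request(account_id: str, api_token: str,
                            model: str, prompt: str) -> urllib.request.Request:
    """Build an authenticated POST request for the /ai/run endpoint."""
    url = f"{API_BASE}/{account_id}/ai/run/{model}"
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_inference_request(
    "YOUR_ACCOUNT_ID",                # placeholder
    "YOUR_API_TOKEN",                 # placeholder
    "@cf/meta/llama-3-8b-instruct",   # example model slug; verify in the catalog
    "Summarize edge inference in one sentence.",
)
# With real credentials, the call itself would be:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["result"]["response"])
```

Because the request is served from the nearest edge location, the round trip stays close to the ~50 ms figure above rather than adding a cross-continent hop.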
