Is it possible to apply post-training quantization for faster inference with tf-serving? It seems like the output is TFLite, which is not usable with tf-serving.
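For context, a minimal sketch of the post-training quantization path the question refers to, using the standard `tf.lite` converter API (the SavedModel path here is a hypothetical placeholder). The result is a `.tflite` flatbuffer, which TF Serving cannot load:

```python
import tensorflow as tf

# Load an existing SavedModel ("saved_model/" is a placeholder path).
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/")

# Enable post-training quantization (weights are quantized by default).
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# convert() produces a TFLite flatbuffer, not a SavedModel,
# which is why the result cannot be served with TF Serving.
tflite_model = converter.convert()

with open("model_quant.tflite", "wb") as f:
    f.write(tflite_model)
```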
-
We're working on it. In general, a lot of the work is in the operations/kernels, which need to be optimized for hardware-specific capabilities. That's what we've done with TFLite ops on mobile CPUs and some NPUs. See https://youtu.be/3JWRVx1OKQQ
-
Thanks.