Quantization Aware Training (QAT) Help!

My model quantized to 8 bits performs poorly when converted with the TensorFlow Lite converter.
The 16-bit version's accuracy is good, but it runs at roughly 2x the latency.
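
For reference, the 8-bit conversion path I'm using looks roughly like this (a minimal sketch with a toy placeholder model and random calibration data; my real model and dataset differ):

```python
import numpy as np
import tensorflow as tf

# Toy stand-in for the real model; only the conversion flow matters here.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])

def representative_dataset():
    # Calibration samples for activation ranges; replace with real inputs.
    for _ in range(100):
        yield [np.random.rand(1, 28, 28).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full-integer (int8) kernels for weights and activations.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_int8_model = converter.convert()

with open("model_int8.tflite", "wb") as f:
    f.write(tflite_int8_model)
```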

Using QAT in TensorFlow I get really good accuracy, but the inserted FakeQuantize layers add latency at inference time, so little is gained over the 16-bit model.
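
For context, this is roughly the QAT flow I'm following (a minimal sketch with the same toy model; the training step is omitted):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# quantize_model wraps the layers with FakeQuant ops so training learns
# weights that survive int8 rounding.
base_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])
qat_model = tfmot.quantization.keras.quantize_model(base_model)
qat_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# qat_model.fit(train_images, train_labels, epochs=1)  # fine-tune as usual

# Converting the fine-tuned model to TFLite should fold the FakeQuant
# nodes into int8 kernels rather than executing them at inference time.
converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_qat_model = converter.convert()
```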

Any thoughts or suggestions?