Enables TF32/BF16 Tensor Core fast paths in PyTorch via safe auto-detection, with auditable, reversible flag application and reproducible benchmarks. A reproducible performance protocol packaged as ...
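As a rough illustration of what "safe auto-detection" and "auditable, reversible flag application" could look like, here is a minimal sketch. It is not the package's actual implementation: the helper names `detect_fast_paths` and `tf32_enabled` and the audit-log format are hypothetical. It also assumes the classic `allow_tf32` backend flags and an NVIDIA GPU where TF32 requires compute capability 8.0 or higher.

```python
import contextlib

try:
    import torch
except ImportError:  # keep the sketch importable on machines without PyTorch
    torch = None


def detect_fast_paths():
    """Hypothetical helper: report which reduced-precision paths look usable."""
    if torch is None or not torch.cuda.is_available():
        return {"tf32": False, "bf16": False}
    major, _minor = torch.cuda.get_device_capability()
    # Assumption: TF32 Tensor Cores need compute capability >= 8.0 (Ampere).
    return {"tf32": major >= 8, "bf16": bool(torch.cuda.is_bf16_supported())}


@contextlib.contextmanager
def tf32_enabled(audit_log):
    """Hypothetical helper: enable TF32 only if detected, log it, undo on exit."""
    targets = []
    if detect_fast_paths()["tf32"]:
        targets = [
            ("torch.backends.cuda.matmul.allow_tf32",
             torch.backends.cuda.matmul, "allow_tf32"),
            ("torch.backends.cudnn.allow_tf32",
             torch.backends.cudnn, "allow_tf32"),
        ]
    # Snapshot the current values before touching anything.
    previous = [(label, obj, name, getattr(obj, name))
                for label, obj, name in targets]
    try:
        for label, obj, name, old in previous:
            setattr(obj, name, True)
            audit_log.append((label, old, True))  # (flag, old value, new value)
        yield
    finally:
        # Restore every flag to the value recorded in the snapshot.
        for _label, obj, name, old in previous:
            setattr(obj, name, old)


if __name__ == "__main__":
    log = []
    with tf32_enabled(log):
        pass  # run the workload or benchmark here
    print(detect_fast_paths(), log)
```

On a machine without a supported GPU the context manager changes nothing and the log stays empty, which is the "safe" part of the sketch. BF16 is not a global flag in PyTorch; it would typically be applied per region with `torch.autocast(device_type="cuda", dtype=torch.bfloat16)` once `detect_fast_paths()["bf16"]` is true.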