Export Your ML Model in ONNX Format


In this article, you will learn how to export models from PyTorch, scikit-learn, and TensorFlow/Keras to ONNX and compare PyTorch vs. ONNX Runtime inference on CPU for accuracy and speed.

Topics we will cover include:

  • Fine-tuning a ResNet-18 on CIFAR-10 and exporting it to ONNX.
  • Verifying numerical parity and benchmarking CPU latency between PyTorch and ONNX Runtime.
  • Converting scikit-learn and TensorFlow/Keras models to ONNX for portable deployment.

Without further delay, let’s begin.

Export Your ML Model in ONNX Format (Image by Author)

Introduction

When building machine learning models, training is only half the journey. Deploying those models reliably across different environments is where many projects slow down. This is where ONNX (Open Neural Network Exchange) becomes important. ONNX provides a common, framework-agnostic format that allows models trained in PyTorch, TensorFlow, or scikit-learn to be exported once and run anywhere.

In this tutorial, we will go step by step through the complete ONNX workflow. We will start by fine-tuning a model and saving the fine-tuned version in native PyTorch format as well as in ONNX format.

Once both versions are ready, we will compare their inference performance on CPU, focusing on two key aspects: accuracy and inference speed. This comparison will help you understand the practical tradeoffs between framework-native models and ONNX-based deployment.

Finally, we will also cover how to convert models trained with scikit-learn and TensorFlow into ONNX format, so you can apply the same deployment approach across different machine learning frameworks.

Exporting A Fine-Tuned PyTorch Model To ONNX

In this section, we will fine-tune a ResNet-18 model on the CIFAR-10 dataset for image classification. After training, we will save the fine-tuned model in the normal PyTorch format and also export it into ONNX format. Then we will run both versions on CPU and compare their inference results using accuracy and macro F1 score, along with inference speed.

Setting Up

First, we install the libraries we need for training, exporting, and benchmarking. We use PyTorch and TorchVision to fine-tune the model, the onnx package to validate the exported model, and ONNX Runtime to run ONNX inference on CPU.

We also install scikit-learn because it provides simple evaluation metrics like accuracy and F1 score.

Finally, we import all the required modules so we can train the model, export it, and measure performance.
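The commands and imports below are a minimal setup sketch; the package versions are not pinned, and the import list simply gathers everything used in the later steps.

```python
# Install the required packages (run once in your environment)
# pip install torch torchvision onnx onnxruntime scikit-learn

import time

import numpy as np
import torch
import torch.nn as nn
import torchvision
from torch.utils.data import DataLoader
from torchvision import transforms

import onnx
import onnxruntime as ort
from sklearn.metrics import accuracy_score, f1_score
```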

Loading CIFAR-10 And Building ResNet-18

Now we prepare the dataset and model.

The get_cifar10_loaders function loads CIFAR-10 and returns two DataLoaders: one for training and one for testing. We resize CIFAR-10 images from 32×32 to 224×224 because ResNet-18 is designed for ImageNet-sized inputs.

We also apply ImageNet normalization values so the pretrained ResNet weights work correctly. The training loader includes random horizontal flipping to add basic data augmentation.

The build_resnet18_cifar10 function loads a ResNet-18 model with ImageNet pretrained weights and replaces the final fully connected layer. ImageNet has 1000 classes, but CIFAR-10 has 10 classes, so we update the last layer to output 10 logits.
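The function names below come from this tutorial, but the bodies are a minimal sketch: details such as the batch size, data directory, and augmentation choices are illustrative assumptions rather than the exact code.

```python
def get_cifar10_loaders(batch_size=32, data_dir="./data"):
    # ImageNet normalization so the pretrained ResNet-18 weights see familiar statistics
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])

    train_tf = transforms.Compose([
        transforms.Resize(224),             # CIFAR-10 is 32x32; the pretrained ResNet-18 expects 224x224
        transforms.RandomHorizontalFlip(),  # light augmentation, training set only
        transforms.ToTensor(),
        normalize,
    ])
    test_tf = transforms.Compose([
        transforms.Resize(224),
        transforms.ToTensor(),
        normalize,
    ])

    train_set = torchvision.datasets.CIFAR10(data_dir, train=True, download=True, transform=train_tf)
    test_set = torchvision.datasets.CIFAR10(data_dir, train=False, download=True, transform=test_tf)

    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False)
    return train_loader, test_loader


def build_resnet18_cifar10(num_classes=10):
    # Start from ImageNet-pretrained weights, then swap the 1000-way head for a 10-way head
    model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model
```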

Quick Fine-Tuning

In this step, we do a small fine-tuning run to make the model adapt to CIFAR-10. This is not meant to be a full training pipeline. It is a fast demo training loop so we can later compare PyTorch inference vs. ONNX inference.

The quick_finetune_cifar10 function trains the model for a limited number of batches. It uses cross-entropy loss because CIFAR-10 is a multi-class classification task. It uses the Adam optimizer for quick learning. The loop runs through batches, performs a forward pass, calculates the loss, runs backpropagation, and updates model weights. At the end, it prints an average training loss so we can see that training happened.

After training, we save the model weights using torch.save(). This creates a .pth file, which is the standard PyTorch format for storing model parameters.
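Here is a minimal sketch of such a loop; hyperparameters like the learning rate, the batch limit, and the output filename resnet18_cifar10.pth are illustrative assumptions.

```python
def quick_finetune_cifar10(model, train_loader, max_batches=100, lr=1e-4, device="cpu"):
    # Short demo fine-tuning run: cross-entropy loss + Adam, limited to max_batches
    model.to(device).train()
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    total_loss, n_batches = 0.0, 0
    for i, (images, labels) in enumerate(train_loader):
        if i >= max_batches:
            break
        images, labels = images.to(device), labels.to(device)

        optimizer.zero_grad()
        loss = criterion(model(images), labels)  # forward pass + loss
        loss.backward()                          # backpropagation
        optimizer.step()                         # weight update

        total_loss += loss.item()
        n_batches += 1

    print(f"Average training loss over {n_batches} batches: {total_loss / max(n_batches, 1):.4f}")
    return model


# Train briefly, then save the weights in the standard PyTorch .pth format
train_loader, test_loader = get_cifar10_loaders()
model = build_resnet18_cifar10()
model = quick_finetune_cifar10(model, train_loader)
torch.save(model.state_dict(), "resnet18_cifar10.pth")
```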

Exporting To ONNX

Now we export the fine-tuned PyTorch model into ONNX format so it can be deployed and executed using ONNX Runtime.

The export_resnet18_cifar10_to_onnx function loads the model architecture again, loads the fine-tuned weights, and switches the model into evaluation mode using model.eval() so inference behaves consistently.

We also create a dummy input tensor with shape (1, 3, 224, 224). ONNX export needs this dummy input to trace the model graph and understand the input and output shapes.

Finally, torch.onnx.export() generates the .onnx file.
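A minimal sketch of this export step is shown below; the opset version, the input/output names, and the dynamic batch axis are illustrative choices rather than requirements.

```python
def export_resnet18_cifar10_to_onnx(weights_path="resnet18_cifar10.pth",
                                    onnx_path="resnet18_cifar10.onnx"):
    # Rebuild the architecture, load the fine-tuned weights, and switch to eval mode
    model = build_resnet18_cifar10()
    model.load_state_dict(torch.load(weights_path, map_location="cpu"))
    model.eval()

    # Dummy input used to trace the graph; the batch dimension is marked dynamic below
    dummy_input = torch.randn(1, 3, 224, 224)

    torch.onnx.export(
        model,
        dummy_input,
        onnx_path,
        input_names=["input"],
        output_names=["logits"],
        dynamic_axes={"input": {0: "batch"}, "logits": {0: "batch"}},
        opset_version=17,
    )
    return onnx_path


onnx_path = export_resnet18_cifar10_to_onnx()
```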

Benchmarking Torch CPU Vs. ONNX Runtime

In this final part, we evaluate both formats side by side. We keep everything on CPU so the comparison is fair.

The following function performs four major tasks:

  1. Load the PyTorch model on CPU.
  2. Load and validate the ONNX model.
  3. Check output similarity on one batch.
  4. Warm up and benchmark inference speed.

Then we run timed inference for a fixed number of batches:

  • We measure the time taken by PyTorch inference on CPU.
  • We measure the time taken by ONNX Runtime inference on CPU.
  • We collect predictions from both and compute accuracy and macro F1 score.

Finally, we print average latency per batch and show an estimated speedup ratio.
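A sketch of such a benchmarking function is shown below. It reuses the helpers from the earlier sketches and assumes the weight and ONNX file names, the input tensor name "input" chosen at export time, and an illustrative number of timed batches.

```python
def benchmark_torch_vs_onnx(weights_path, onnx_path, test_loader, num_batches=20):
    # 1. Load the fine-tuned PyTorch model on CPU
    torch_model = build_resnet18_cifar10()
    torch_model.load_state_dict(torch.load(weights_path, map_location="cpu"))
    torch_model.eval()

    # 2. Validate the ONNX file and create a CPU inference session
    onnx.checker.check_model(onnx.load(onnx_path))
    session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

    # 3. Parity check on one batch (this also serves as a warm-up for both backends)
    images, _ = next(iter(test_loader))
    with torch.no_grad():
        torch_out = torch_model(images).numpy()
    ort_out = session.run(None, {"input": images.numpy()})[0]
    print("Max absolute difference on one batch:", np.abs(torch_out - ort_out).max())

    # 4. Time both backends and collect predictions for accuracy / macro F1
    torch_time, ort_time = 0.0, 0.0
    y_true, torch_preds, ort_preds = [], [], []
    for i, (images, labels) in enumerate(test_loader):
        if i >= num_batches:
            break
        y_true.extend(labels.tolist())

        start = time.perf_counter()
        with torch.no_grad():
            logits = torch_model(images)
        torch_time += time.perf_counter() - start
        torch_preds.extend(logits.argmax(dim=1).tolist())

        start = time.perf_counter()
        ort_logits = session.run(None, {"input": images.numpy()})[0]
        ort_time += time.perf_counter() - start
        ort_preds.extend(ort_logits.argmax(axis=1).tolist())

    print(f"PyTorch : acc={accuracy_score(y_true, torch_preds):.4f}, "
          f"macro F1={f1_score(y_true, torch_preds, average='macro'):.4f}, "
          f"avg latency={torch_time / num_batches * 1000:.1f} ms/batch")
    print(f"ONNX RT : acc={accuracy_score(y_true, ort_preds):.4f}, "
          f"macro F1={f1_score(y_true, ort_preds, average='macro'):.4f}, "
          f"avg latency={ort_time / num_batches * 1000:.1f} ms/batch")
    print(f"Estimated speedup: {torch_time / ort_time:.2f}x")


benchmark_torch_vs_onnx("resnet18_cifar10.pth", "resnet18_cifar10.onnx", test_loader)
```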

As a result, we get a detailed report: the accuracy and macro F1 scores match between the two backends, while ONNX Runtime delivers faster inference on CPU.

Exporting Scikit-Learn And Keras Models To ONNX

In this section, we show how ONNX can also be used beyond deep learning frameworks like PyTorch. We will export a traditional scikit-learn model and a TensorFlow/Keras neural network into ONNX format. This demonstrates how ONNX acts as a common deployment layer across classical machine learning and deep learning models.

Exporting A Scikit-Learn Model To ONNX

We will now train a simple Random Forest classifier on the Iris dataset using scikit-learn and then export it to ONNX format for deployment.

Before conversion, we explicitly define the ONNX input type, including the input name, floating-point data type, dynamic batch size, and the correct number of input features, which ONNX requires to build a static computation graph.

We then convert the trained model, save the resulting .onnx file, and finally validate it to ensure the exported model is well-formed and ready for inference with ONNX Runtime.
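A minimal sketch of this workflow is shown below, assuming the skl2onnx converter (pip install skl2onnx); the Random Forest hyperparameters and the output filename are illustrative.

```python
import onnx
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# Train a simple Random Forest on the Iris dataset
X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X, y)

# Declare the ONNX input: name, float dtype, dynamic batch size, 4 features
initial_type = [("float_input", FloatTensorType([None, X.shape[1]]))]

# Convert the trained model, save the .onnx file, and validate it
onnx_model = convert_sklearn(clf, initial_types=initial_type)
with open("iris_random_forest.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

onnx.checker.check_model(onnx.load("iris_random_forest.onnx"))
print("scikit-learn model exported and validated.")
```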

Our model is now trained, converted, saved, and validated.

Exporting A TensorFlow/Keras Model To ONNX

We will now export a TensorFlow neural network to ONNX format to demonstrate how deep learning models trained with TensorFlow can be prepared for portable deployment.

The environment is configured to run on CPU with minimal logging to keep the process clean and reproducible. A simple fully connected Keras model is built using the Functional API, with a fixed input size and a small number of layers to keep the conversion straightforward.

An input signature is then defined so ONNX knows the expected input shape, data type, and tensor name at inference time. Using this information, the Keras model is converted into ONNX format and saved as a .onnx file.

Finally, the exported model is validated to ensure it is well-formed and ready to be executed using ONNX Runtime or any other ONNX-compatible inference engine.
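The sketch below assumes the tf2onnx converter (pip install tf2onnx); the layer sizes, input width, tensor name, opset, and the small random-data training run are illustrative assumptions rather than the article's exact code.

```python
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "-1"   # force CPU execution
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"    # keep TensorFlow logging quiet

import numpy as np
import tensorflow as tf
import tf2onnx
import onnx

# A small fully connected model built with the Keras Functional API
inputs = tf.keras.Input(shape=(20,), name="features")
x = tf.keras.layers.Dense(64, activation="relu")(inputs)
x = tf.keras.layers.Dense(32, activation="relu")(x)
outputs = tf.keras.layers.Dense(3, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# A tiny training run on random data so the exported weights are not fresh
X = np.random.rand(256, 20).astype("float32")
y = np.random.randint(0, 3, size=256)
model.fit(X, y, epochs=1, verbose=0)

# The input signature tells the converter the expected shape, dtype, and tensor name
spec = (tf.TensorSpec((None, 20), tf.float32, name="features"),)
onnx_model, _ = tf2onnx.convert.from_keras(model, input_signature=spec, opset=13)

with open("keras_dense.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

onnx.checker.check_model(onnx.load("keras_dense.onnx"))
print("Keras model exported and validated.")
```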

Our model is now trained, converted, saved, and validated.

Final Thoughts

ONNX provides a practical bridge between model training and real-world deployment by making machine learning models portable, framework-independent, and easier to optimize for inference.

By fine-tuning a PyTorch model, exporting it to ONNX, and comparing accuracy and CPU inference speed, we saw that ONNX can deliver the same predictive quality with improved performance.

It simplifies the path from experimentation to production and reduces friction when deploying models across different environments.

With this level of portability, performance, and consistency, it is worth asking: what more reason do you need to start using ONNX in your machine learning projects?
