We’re excited to announce Google AI Edge Torch, a direct path from PyTorch to the TensorFlow Lite (TFLite) runtime with great model coverage and CPU performance. TFLite already works with models written in Jax, Keras, and TensorFlow, and we are now adding PyTorch as part of a wider commitment to framework optionality.
This new offering is now available as part of Google AI Edge, a suite of tools with easy access to ready-to-use ML tasks, frameworks that enable you to build ML pipelines, and support for running popular LLMs and custom models, all on-device. This is the first in a series of blog posts covering Google AI Edge releases that will help developers build AI-enabled features and easily deploy them on multiple platforms.
AI Edge Torch launches in Beta today, featuring:
- Direct PyTorch integration
- Excellent CPU performance and initial GPU support
- Validated on over 70 models from torchvision, timm, torchaudio, and HuggingFace
- Support for >70% of core_aten operators in PyTorch
- Compatibility with the existing TFLite runtime, with no change to deployment code needed
- Support for Model Explorer visualization at multiple stages of the workflow.
A simple, PyTorch-centric experience
Google AI Edge Torch was built from the ground up to provide a great experience for the PyTorch community, with APIs that feel native and offer an easy conversion path.
import torch
import torchvision
import ai_edge_torch

# Initialize model
resnet18 = torchvision.models.resnet18().eval()

# Convert
sample_input = (torch.randn(4, 3, 224, 224),)
edge_model = ai_edge_torch.convert(resnet18, sample_input)

# Inference in Python
output = edge_model(*sample_input)

# Export to a TFLite model for on-device deployment
edge_model.export('resnet.tflite')
Under the hood, ai_edge_torch.convert() is integrated with TorchDynamo using torch.export – the PyTorch 2.x way to export PyTorch models into standardized model representations intended to run in different environments. Our current implementation supports more than 60% of core_aten operators, which we plan to increase considerably as we build towards a 1.0 release of ai_edge_torch. We've included examples showing PT2E quantization, the quantization approach native to PyTorch 2, to enable easy quantization workflows. We're excited to hear from the PyTorch community about ways to improve the developer experience when bringing innovation that starts in PyTorch to a wide set of devices.
Coverage & Performance
Prior to this launch, many developers were using community-provided paths such as ONNX2TF to enable PyTorch models on TFLite. Our goal in developing AI Edge Torch was to reduce developer friction, provide great model coverage, and continue our mission of delivering best-in-class performance on Android devices.
On coverage, our tests demonstrate significant improvements on the defined set of models over existing workflows, notably ONNX2TF.
On performance, our tests show performance consistent with the ONNX2TF baseline, while also showing meaningfully better performance than the ONNX runtime:
Below is detailed per-model performance on the subset of models covered by ONNX:
Figure: Inference latency per network compared to ONNX, measured on Pixel 8, fp32 precision, XNNPACK fixed to 4 threads to support reproducibility, average of 100 runs after a 20-iteration warm-up
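The measurement recipe in the caption (fixed thread count, warm-up iterations, then an averaged timing loop) is easy to reproduce with the TFLite Python API. A minimal sketch, using a trivial in-memory stand-in model since the benchmark models are not included here:

```python
import time
import numpy as np
import tensorflow as tf

# Stand-in model so the sketch is self-contained; any .tflite file works.
@tf.function(input_signature=[tf.TensorSpec([1, 256], tf.float32)])
def model_fn(x):
    return tf.nn.relu(x @ tf.ones([256, 256]))

converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [model_fn.get_concrete_function()])
interpreter = tf.lite.Interpreter(
    model_content=converter.convert(),
    num_threads=4)  # XNNPACK fixed to 4 threads, as in the figure
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"],
                       np.random.randn(1, 256).astype(np.float32))

for _ in range(20):        # warm-up iterations
    interpreter.invoke()

runs = 100
start = time.perf_counter()
for _ in range(runs):      # averaged timed runs
    interpreter.invoke()
avg_ms = (time.perf_counter() - start) / runs * 1e3
print(f"average latency: {avg_ms:.3f} ms")
```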
Early Adoption and Partnerships
Over the past few months, we've worked closely with early adoption partners including Shopify, Adobe, and Niantic to improve our PyTorch support. ai_edge_torch is already being used by the team at Shopify to perform on-device background removal for product images, and will be available in an upcoming release of the Shopify app.
Silicon partnerships & delegates
We've also worked closely with partners on hardware support across CPUs, GPUs, and accelerators – this includes Arm, Google Tensor G3, MediaTek, Qualcomm, and Samsung System LSI. Through these partnerships, we improved performance and coverage, and validated PyTorch-generated TFLite files on accelerator delegates.
We're also thrilled to co-announce Qualcomm's new TensorFlow Lite delegate, which is now openly available here for any developer to use. TFLite delegates are add-on software modules that help accelerate execution on GPUs and hardware accelerators. This new QNN delegate supports most models in our PyTorch Beta test set while providing support for a wide range of Qualcomm silicon, and offers significant average speedups relative to CPU (20x) and GPU (5x) by utilizing Qualcomm's DSPs and neural processing units. To make it easy to try out, Qualcomm has also recently launched its new AI Hub. The Qualcomm AI Hub is a cloud service that lets developers test TFLite models against a wide pool of Android devices, and provides visibility into the performance gains available on different devices using the QNN delegate.
What’s next?
In the coming months we will continue to iterate in the open, with releases expanding model coverage, improving GPU support, and enabling new quantization modes as we build towards a 1.0 release. In part 2 of this series, we'll take a deeper look at the AI Edge Torch Generative API, which enables developers to bring custom GenAI models to the edge with great performance.
We'd like to thank all of our early access customers for their valuable feedback, which helped us catch early bugs and ensure a smooth developer experience. We'd also like to thank our hardware partners, and the ecosystem contributors to XNNPACK, who have helped us improve performance across a wide range of devices. We'd also like to thank the broader PyTorch community for their guidance and support.
Acknowledgements
We'd like to thank all team members who contributed to this work: Aaron Karp, Advait Jain, Akshat Sharma, Alan Kelly, Arian Arfaian, Chun-nien Chan, Chuo-Ling Chang, Claudio Basille, Cormac Brick, Dwarak Rajagopal, Eric Yang, Gunhyun Park, Han Qi, Haoliang Zhang, Jing Jin, Juhyun Lee, Jun Jiang, Kevin Gleason, Khanh LeViet, Kris Tonthat, Kristen Wright, Lu Wang, Luke Boyer, Majid Dadashi, Maria Lyubimtseva, Mark Sherwood, Matthew Soulanille, Matthias Grundmann, Meghna Johar, Michael Levesque-Dion, Milad Mohammadi, Na Li, Paul Ruiz, Pauline Sho, Ping Yu, Pulkit Bhuwalka, Ram Iyengar, Sachin Kotwani, Sandeep Dasgupta, Sharbani Roy, Shauheen Zahirazami, Siyuan Liu, Vamsi Manchala, Vitalii Dziuba, Weiyi Wang, Wonjoo Lee, Yishuang Pang, Zoe Wang, and the StableHLO team.