Note
This tutorial can be used interactively with Google Colab! You can also click here to run the Jupyter notebook locally.
microTVM with TFLite Models¶
Author: Tom Gall
This tutorial is an introduction to working with microTVM and a TFLite model with Relay.
Install microTVM Python dependencies¶
TVM does not include a package for Python serial communication, so we must install one before using microTVM. We will also need TFLite to load models.
%%shell pip install pyserial==3.5 tflite==2.1
import os
# By default, this tutorial runs on x86 CPU using TVM's C runtime. If you would like
# to run on real Zephyr hardware, you must export the `TVM_MICRO_USE_HW` environment
# variable. Otherwise (if you are using the C runtime), you can skip installing
# Zephyr. It takes ~20 minutes to install Zephyr.
use_physical_hw = bool(os.getenv("TVM_MICRO_USE_HW"))
Install Zephyr¶
%%shell # Install west and ninja python3 -m pip install west apt-get install -y ninja-build # Install ZephyrProject ZEPHYR_PROJECT_PATH="/content/zephyrproject" export ZEPHYR_BASE=${ZEPHYR_PROJECT_PATH}/zephyr west init ${ZEPHYR_PROJECT_PATH} cd ${ZEPHYR_BASE} git checkout v2.7-branch cd .. west update west zephyr-export chmod -R o+w ${ZEPHYR_PROJECT_PATH} # Install Zephyr SDK ZEPHYR_SDK_VERSION=0.13.2 ZEPHYR_SDK_FILE="/content/zephyr-sdk-linux-setup.run" wget --no-verbose -O $ZEPHYR_SDK_FILE \ https://github.com/zephyrproject-rtos/sdk-ng/releases/download/v${ZEPHYR_SDK_VERSION}/zephyr-sdk-${ZEPHYR_SDK_VERSION}-linux-x86_64-setup.run chmod +x $ZEPHYR_SDK_FILE "$ZEPHYR_SDK_FILE" -- -d /content/zephyr-sdk --quiet # Install python dependencies python3 -m pip install -r "${ZEPHYR_BASE}/scripts/requirements.txt"
Import Python dependencies¶
import json
import tarfile
import pathlib
import tempfile
import numpy as np
import tvm
from tvm import relay
import tvm.contrib.utils
from tvm.contrib.download import download_testdata
model_url = "https://people.linaro.org/~tom.gall/sine_model.tflite"
model_file = "sine_model.tflite"
model_path = download_testdata(model_url, model_file, module="data")
tflite_model_buf = open(model_path, "rb").read()
Using the buffer, transform into a tflite model python object
try:
import tflite
tflite_model = tflite.Model.GetRootAsModel(tflite_model_buf, 0)
except AttributeError:
import tflite.Model
tflite_model = tflite.Model.Model.GetRootAsModel(tflite_model_buf, 0)
Print out the version of the model
Model Version: 3
Parse the python model object to convert it into a relay module and weights. It is important to note that the input tensor name must match what is contained in the model.
If you are unsure what that might be, this can be discovered by using
the visualize.py
script within the Tensorflow project.
See How do I inspect a .tflite file?
input_tensor = "dense_4_input"
input_shape = (1,)
input_dtype = "float32"
mod, params = relay.frontend.from_tflite(
tflite_model, shape_dict={input_tensor: input_shape}, dtype_dict={input_tensor: input_dtype}
)
Defining the target¶
Now we create a build config for relay, turning off two options and then calling relay.build which will result in a C source file for the selected TARGET. When running on a simulated target of the same architecture as the host (where this Python script is executed) choose “host” below for the TARGET, the C Runtime as the RUNTIME and a proper board/VM to run it (Zephyr will create the right QEMU VM based on BOARD. In the example below the x86 arch is selected and a x86 VM is picked up accordingly:
RUNTIME = tvm.relay.backend.Runtime("crt", {"system-lib": True})
TARGET = tvm.target.target.micro("host")
#
# Compiling for physical hardware
# When running on physical hardware, choose a TARGET and a BOARD that describe the hardware. The
# STM32F746 Nucleo target and board is chosen in the example below. Another option would be to
# choose the STM32F746 Discovery board instead. Since that board has the same MCU as the Nucleo
# board but a couple of wirings and configs differ, it's necessary to select the "stm32f746g_disco"
# board to generated the right firmware image.
#
if use_physical_hw:
boards_file = pathlib.Path(tvm.micro.get_microtvm_template_projects("zephyr")) / "boards.json"
with open(boards_file) as f:
boards = json.load(f)
BOARD = os.getenv("TVM_MICRO_BOARD", default="nucleo_l4r5zi")
SERIAL = os.getenv("TVM_MICRO_SERIAL", default=None)
TARGET = tvm.target.target.micro(boards[BOARD]["model"])
#
# For some boards, Zephyr runs them emulated by default, using QEMU. For example, below is the
# TARGET and BOARD used to build a microTVM firmware for the mps2-an521 board. Since that board
# runs emulated by default on Zephyr the suffix "-qemu" is added to the board name to inform
# microTVM that the QEMU transporter must be used to communicate with the board. If the board name
# already has the prefix "qemu_", like "qemu_x86", then it's not necessary to add that suffix.
#
# TARGET = tvm.target.target.micro("mps2_an521")
# BOARD = "mps2_an521-qemu"
Now, compile the model for the target:
with tvm.transform.PassContext(
opt_level=3, config={"tir.disable_vectorize": True}, disabled_pass=["AlterOpLayout"]
):
module = relay.build(mod, target=TARGET, runtime=RUNTIME, params=params)
# Inspecting the compilation output
# ---------------------------------
#
# The compilation process has produced some C code implementing the operators in this graph. We
# can inspect it by printing the CSourceModule contents (for the purposes of this tutorial, let's
# just print the first 10 lines):
c_source_module = module.get_lib().imported_modules[0]
assert c_source_module.type_key == "c", "tutorial is broken"
c_source_code = c_source_module.get_source()
first_few_lines = c_source_code.split("\n")[:10]
assert any(
l.startswith("TVM_DLL int32_t tvmgen_default_") for l in first_few_lines
), f"tutorial is broken: {first_few_lines!r}"
print("\n".join(first_few_lines))
# Compiling the generated code
# ----------------------------
#
# Now we need to incorporate the generated C code into a project that allows us to run inference on the
# device. The simplest way to do this is to integrate it yourself, using microTVM's standard output format
# (:doc:`Model Library Format` </dev/model_library_format>`). This is a tarball with a standard layout:
# Get a temporary path where we can store the tarball (since this is running as a tutorial).
fd, model_library_format_tar_path = tempfile.mkstemp()
os.close(fd)
os.unlink(model_library_format_tar_path)
tvm.micro.export_model_library_format(module, model_library_format_tar_path)
with tarfile.open(model_library_format_tar_path, "r:*") as tar_f:
print("\n".join(f" - {m.name}" for m in tar_f.getmembers()))
# Cleanup for tutorial:
os.unlink(model_library_format_tar_path)
# TVM also provides a standard way for embedded platforms to automatically generate a standalone
# project, compile and flash it to a target, and communicate with it using the standard TVM RPC
# protocol. The Model Library Format serves as the model input to this process. When embedded
# platforms provide such an integration, they can be used directly by TVM for both host-driven
# inference and autotuning . This integration is provided by the
# `microTVM Project API` <https://github.com/apache/tvm-rfcs/blob/main/rfcs/0008-microtvm-project-api.md>_,
#
# Embedded platforms need to provide a Template Project containing a microTVM API Server (typically,
# this lives in a file ``microtvm_api_server.py`` in the root directory). Let's use the example ``host``
# project in this tutorial, which simulates the device using a POSIX subprocess and pipes:
template_project_path = pathlib.Path(tvm.micro.get_microtvm_template_projects("crt"))
project_options = {} # You can use options to provide platform-specific options through TVM.
# Compiling for physical hardware (or an emulated board, like the mps_an521)
# --------------------------------------------------------------------------
# For physical hardware, you can try out the Zephyr platform by using a different template project
# and options:
#
if use_physical_hw:
template_project_path = pathlib.Path(tvm.micro.get_microtvm_template_projects("zephyr"))
project_options = {
"project_type": "host_driven",
"board": BOARD,
"serial_number": SERIAL,
"config_main_stack_size": 4096,
"zephyr_base": os.getenv("ZEPHYR_BASE", default="/content/zephyrproject/zephyr"),
}
# Create a temporary directory
temp_dir = tvm.contrib.utils.tempdir()
generated_project_dir = temp_dir / "generated-project"
generated_project = tvm.micro.generate_project(
template_project_path, module, generated_project_dir, project_options
)
# Build and flash the project
generated_project.build()
generated_project.flash()
// tvm target: c -keys=cpu -model=host
#define TVM_EXPORTS
#include "tvm/runtime/c_runtime_api.h"
#include "tvm/runtime/c_backend_api.h"
#include <math.h>
#include <stdbool.h>
#ifdef __cplusplus
extern "C"
#endif
TVM_DLL int32_t tvmgen_default_fused_nn_dense_add(void* args, int32_t* arg_type_ids, int32_t num_args, void* out_ret_value, int32_t* out_ret_tcode, void* resource_handle) {
- .
- ./codegen
- ./codegen/host
- ./codegen/host/src
- ./codegen/host/src/default_lib0.c
- ./codegen/host/src/default_lib1.c
- ./codegen/host/src/default_lib2.c
- ./executor-config
- ./executor-config/graph
- ./executor-config/graph/default.graph
- ./metadata.json
- ./parameters
- ./parameters/default.params
- ./src
- ./src/default.relay
Next, establish a session with the simulated device and run the computation. The with session line would typically flash an attached microcontroller, but in this tutorial, it simply launches a subprocess to stand in for an attached microcontroller.
with tvm.micro.Session(transport_context_manager=generated_project.transport()) as session:
graph_mod = tvm.micro.create_local_graph_executor(
module.get_graph_json(), session.get_system_lib(), session.device
)
# Set the model parameters using the lowered parameters produced by `relay.build`.
graph_mod.set_input(**module.get_params())
# The model consumes a single float32 value and returns a predicted sine value. To pass the
# input value we construct a tvm.nd.array object with a single contrived number as input. For
# this model values of 0 to 2Pi are acceptable.
graph_mod.set_input(input_tensor, tvm.nd.array(np.array([0.5], dtype="float32")))
graph_mod.run()
tvm_output = graph_mod.get_output(0).numpy()
print("result is: " + str(tvm_output))
result is: [[0.4443792]]