Our platform allows users to export model weights in ONNX format. When a training run is finished, users can request an export by going to the Productionize tab in the project, selecting Model Container Export, ticking the checkbox next to the model they would like to export, and clicking Export ONNX Model.
When the export is finished, users can download the ONNX weights by clicking the link next to the exported model in the models table.
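As a quick sanity check, the downloaded file can be validated with the onnx Python package, if it is available on your machine; model.onnx below is a placeholder for whatever file name you downloaded:

python -c "import onnx; onnx.checker.check_model('model.onnx')"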
Our Inference Server can be downloaded as an exported Docker image that can later be imported into a running Docker engine. The image contains all necessary dependencies and does not require an internet connection, so it can run locally on any isolated system. To download the server archive, users need to click the Download Inference Server button directly above the models table and allow pop-up windows to be displayed.
The server supports any model exported from the platform. A model is plugged into the server by mounting its weights file to the correct path when starting the container, as shown in the examples below.
Users can also mount input and output directories. While the server is running, it continuously scans the input location for new files, runs inference as soon as new files are detected, and saves the results to the output location.
After downloading the isolated server archive, you can import the image into your Docker environment:
gzip -cd isolated-server.tar.gz | docker load
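To confirm the image was loaded, list it by name (crowd/isolated-server, matching the run examples below):

docker image ls crowd/isolated-server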
The input directory should be mounted to /opt/code/input, and the output directory to /opt/code/output. The model weights file is mounted to /opt/code/model_weights.h5, as the examples below show.
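Before starting the container, create the host-side directories that will be mounted; the names below are examples, and any paths work as long as they match the -v flags:

mkdir -p input output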
Example of how to run the container for object detection:
docker run -it -v $(pwd)/input:/opt/code/input -v $(pwd)/output:/opt/code/output -v $(pwd)/object_detection_weights.h5:/opt/code/model_weights.h5 --network=host --gpus all crowd/isolated-server:latest /bin/bash -c "/opt/code/start-server.sh object-detection-tiled"
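Once the server is running, copying a file into the mounted input directory triggers inference; sample.jpg is a placeholder name, and the exact format of the results depends on the exported model:

cp sample.jpg input/
ls -l output/   # results appear here once inference completes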
Example of how to run the container for segmentation:
docker run -it -v $(pwd)/input:/opt/code/input -v $(pwd)/output:/opt/code/output -v $(pwd)/segmentation_weights.h5:/opt/code/model_weights.h5 --network=host --gpus all crowd/isolated-server:latest /bin/bash -c "/opt/code/start-server.sh segmentation"
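If you prefer not to keep an interactive shell attached, the same command can be run detached using standard Docker flags (this is plain Docker usage, not a platform requirement); the container name isolated-server is arbitrary:

docker run -d --name isolated-server -v $(pwd)/input:/opt/code/input -v $(pwd)/output:/opt/code/output -v $(pwd)/segmentation_weights.h5:/opt/code/model_weights.h5 --network=host --gpus all crowd/isolated-server:latest /bin/bash -c "/opt/code/start-server.sh segmentation"
docker logs -f isolated-server   # follow the server output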
Example of how to run the container for object detection over gRPC (in this mode the server accepts requests over the network, so the input and output mounts are not needed):
docker run -it -v $(pwd)/object_detection_weights.h5:/opt/code/model_weights.h5 --network=host --gpus all crowd/isolated-server:latest /bin/bash -c "/opt/code/start-server-grpc.sh object-detection"
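If the gRPC server exposes reflection (not guaranteed), grpcurl can confirm it is reachable; the port 50051 below is a placeholder, so check the server's startup output for the actual one:

grpcurl -plaintext localhost:50051 list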
System dependencies:
Installing Docker: https://docs.docker.com/engine/install/ubuntu/
Configuring NVIDIA GPU: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
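After installing both, you can verify that containers can access the GPU; this mirrors the sample workload from NVIDIA's install guide:

docker run --rm --gpus all ubuntu nvidia-smi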