openvino-fastapi-infer

Intel OpenVINO inference microservice (CPU/iGPU/NPU)

Overview
Primary hardware
Intel NUC / Arc / Core Ultra (amd64)
What it does

FastAPI wrapper around OpenVINO Runtime for image models, auto-selecting the best available device.

Why it saves time

Skip SDK installs and sample refactors; drop in a model and get /health and /infer endpoints immediately.
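Once the container is up, the two endpoints can be exercised with curl (port 8080 per the Dockerfile; image.jpg is a placeholder for any local image):

```shell
curl -fsS http://localhost:8080/health
curl -fsS -X POST -F "file=@image.jpg" http://localhost:8080/infer
```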

Get access

Use StreamDeploy to manage OTA updates, versioned configs, and rollbacks across fleets.

Dockerfile
ARG BASE_IMAGE=openvino/ubuntu22_runtime:2024.2.0
FROM ${BASE_IMAGE}
ENV DEBIAN_FRONTEND=noninteractive LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3-pip python3-venv curl ca-certificates && \
    rm -rf /var/lib/apt/lists/*
RUN pip3 install --no-cache-dir fastapi==0.111.0 "uvicorn[standard]==0.30.0" \
    pillow==10.3.0 numpy==1.26.4 python-multipart==0.0.9
WORKDIR /app
COPY entrypoint.sh /app/entrypoint.sh
RUN chmod +x /app/entrypoint.sh
ENV MODEL_XML=/models/model.xml \
    SERVICE_HOST=0.0.0.0 SERVICE_PORT=8080 \
    OV_DEVICE="AUTO"
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=5s --start-period=20s \
  CMD curl -fsS "http://localhost:${SERVICE_PORT}/health" || exit 1
ENTRYPOINT ["/app/entrypoint.sh"]
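A typical build-and-run flow might look like the following; the image tag and model directory are assumptions, and device passthrough flags depend on the host (iGPU is exposed via /dev/dri, Intel NPU via /dev/accel):

```shell
# Build the image (tag is an assumption).
docker build -t openvino-fastapi-infer .

# Mount an IR model at /models and publish the service port.
# For iGPU add --device /dev/dri; for NPU add --device /dev/accel.
docker run --rm -p 8080:8080 \
  -v "$PWD/models:/models:ro" \
  --device /dev/dri \
  -e OV_DEVICE=AUTO \
  openvino-fastapi-infer
```

With OV_DEVICE=AUTO, OpenVINO picks the best device it can see inside the container, so passing through /dev/dri or /dev/accel is what actually makes the iGPU/NPU candidates available.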
entrypoint.sh
#!/usr/bin/env bash
set -euo pipefail
cat > /app/server.py << 'PY'
import os, io
from fastapi import FastAPI, UploadFile, File
from fastapi.responses import JSONResponse
from PIL import Image
import numpy as np
from openvino import Core  # the openvino.runtime import path is deprecated

core = Core()
model_xml = os.getenv("MODEL_XML","/models/model.xml")
device = os.getenv("OV_DEVICE","AUTO")
model = core.read_model(model=model_xml)
compiled = core.compile_model(model=model, device_name=device)
input_name = compiled.input(0).get_any_name()

app = FastAPI()
@app.get("/health")
def health():
    return {"ok": True, "device": device, "model": os.path.basename(model_xml)}

@app.post("/infer")
async def infer(file: UploadFile = File(...)):
    img = Image.open(io.BytesIO(await file.read())).convert("RGB").resize((640, 640))
    x = np.asarray(img).transpose(2, 0, 1)[None].astype(np.float32) / 255.0
    res = compiled({input_name: x})
    # Iterating the result mapping yields output ports, not arrays,
    # so index by the model's output ports explicitly.
    return JSONResponse({"outputs": [res[out].tolist() for out in compiled.outputs]})
PY
exec python3 -m uvicorn server:app --host "${SERVICE_HOST}" --port "${SERVICE_PORT}"
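The preprocessing inside /infer converts an H x W x C uint8 image into a normalized 1 x C x H x W float tensor. The service does this with NumPy; the stdlib-only sketch below mirrors the same transpose and /255 scaling on a tiny 2x2 RGB image, purely to illustrate the layout change:

```python
def hwc_to_nchw_normalized(img):
    """Convert an H x W x C uint8 nested list to a 1 x C x H x W float list
    scaled to [0, 1], mirroring np.asarray(img).transpose(2, 0, 1)[None] / 255.0."""
    h, w, c = len(img), len(img[0]), len(img[0][0])
    chw = [[[img[y][x][ch] / 255.0 for x in range(w)] for y in range(h)]
           for ch in range(c)]
    return [chw]  # leading batch dimension of 1

# A 2x2 RGB "image": each pixel is [R, G, B].
pixels = [[[255, 0, 0], [0, 255, 0]],
          [[0, 0, 255], [255, 255, 255]]]
batch = hwc_to_nchw_normalized(pixels)
```

The result has shape 1 x 3 x 2 x 2: one batch entry, one plane per color channel, each plane holding the normalized values for that channel across the image.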