openvino-fastapi-infer
Intel OpenVINO inference microservice (CPU/iGPU/NPU)
amd64
OpenVINO, Intel, Inference, REST, amd64
Overview
Primary hardware
Intel NUC / Arc / Core Ultra (amd64)
What it does
A FastAPI wrapper around the OpenVINO Runtime for image models; the default AUTO device plugin picks the best available hardware (CPU, iGPU, or NPU) at model compile time.
Why it saves time
Skip SDK installs and sample-app refactors; drop an IR model into /models and the /health and /infer endpoints come up immediately (see the curl sketch below).
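With the container up on its default port 8080, both endpoints can be exercised directly with curl; the image path test.jpg is a placeholder.
# Liveness and device report
curl -fsS http://localhost:8080/health
# Multipart upload to the inference endpoint
curl -fsS -X POST -F "file=@test.jpg" http://localhost:8080/infer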
Get access
Use StreamDeploy to manage OTA updates, versioned configs, and rollbacks across fleets.
Dockerfile
ARG BASE_IMAGE=openvino/ubuntu22_runtime:2024.2.0
FROM ${BASE_IMAGE}
ENV DEBIAN_FRONTEND=noninteractive LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8
# The OpenVINO runtime images run as a non-root user by default; switch to root for package installs.
USER root
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3-pip python3-venv curl ca-certificates && \
    rm -rf /var/lib/apt/lists/*
RUN python3 -m pip install --no-cache-dir \
        fastapi==0.111.0 "uvicorn[standard]==0.30.0" pillow==10.3.0 numpy==1.26.4
WORKDIR /app
COPY entrypoint.sh /app/entrypoint.sh
RUN chmod +x /app/entrypoint.sh
ENV MODEL_XML=/models/model.xml \
    SERVICE_HOST=0.0.0.0 \
    SERVICE_PORT=8080 \
    OV_DEVICE=AUTO
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=5s --start-period=20s \
    CMD curl -fsS "http://localhost:${SERVICE_PORT}/health" || exit 1
ENTRYPOINT ["/app/entrypoint.sh"]
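A minimal build-and-run sketch against this Dockerfile; the image tag, the ./models host path, and the GPU override are illustrative assumptions, not requirements.
docker build -t openvino-fastapi-infer .
# Optionally pin a different OpenVINO base at build time
docker build --build-arg BASE_IMAGE=openvino/ubuntu22_runtime:2024.2.0 -t openvino-fastapi-infer .
# Mount an IR model and force the iGPU instead of AUTO (requires /dev/dri passthrough)
docker run --rm -p 8080:8080 \
  -v "$(pwd)/models:/models:ro" \
  -e OV_DEVICE=GPU \
  --device /dev/dri \
  openvino-fastapi-infer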
entrypoint.sh
#!/usr/bin/env bash
set -euo pipefail
cat > /app/server.py << 'PY'
import os, io
from fastapi import FastAPI, UploadFile, File
from fastapi.responses import JSONResponse
from PIL import Image
import numpy as np
from openvino.runtime import Core
core = Core()
model_xml = os.getenv("MODEL_XML", "/models/model.xml")
device = os.getenv("OV_DEVICE", "AUTO")  # AUTO lets OpenVINO pick CPU/iGPU/NPU
# Read and compile the model once at startup; requests reuse the compiled model.
model = core.read_model(model=model_xml)
compiled = core.compile_model(model=model, device_name=device)
input_name = compiled.input(0).get_any_name()
app = FastAPI()
@app.get("/health")
def health():
return {"ok": True, "device": device, "model": os.path.basename(model_xml)}
@app.post("/infer")
async def infer(file: UploadFile = File(...)):
img = Image.open(io.BytesIO(await file.read())).convert("RGB").resize((640,640))
x = np.asarray(img).transpose(2,0,1)[None].astype(np.float32)/255.0
res = compiled([x])
return JSONResponse({"outputs":[r.tolist() for r in res]})
PY
exec python3 -m uvicorn server:app --host "${SERVICE_HOST}" --port "${SERVICE_PORT}"
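If the model is not yet in OpenVINO IR format, one way to produce the expected /models/model.xml is the ovc converter bundled with the openvino pip package (OpenVINO 2023.2 and later); the ONNX filename here is a placeholder.
pip install openvino
# Convert ONNX to IR; writes model.xml plus the matching model.bin
ovc my_model.onnx --output_model models/model.xml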