FailureAI on Intel Arc Pro B70 — Qwen3.5 single-model checkpoint + Gemma lab prep v0.6 alpha

This update captures the hard-won findings from the Qwen3.5/B70/Open WebUI/Frigate session and packages the working reference scripts, systemd examples, test utilities and Gemma-lab next steps.

The key architectural win: the camera bridge no longer loads its own Qwen2.5-VL model. Open WebUI text chat and FailureAI camera analysis now share one Qwen3.5 service on port 8011.

Qwen3.5 production checkpoint Single model path Bridge uses HTTP API Qwen2.5 disabled Open WebUI text OK Frigate classify OK Open WebUI image chat experimental Gemma lab next on 8012

Download Qwen3.5 B70 v0.6 alpha kit Download Gemma26 v0.5 notes Download Frigate B70 kit SHA256

This page is a technical field log and reference kit. It intentionally includes failed attempts, ugly edge cases and the exact lessons learned so the next Gemma lab can start without repeating the same pain.

v0.6 alpha

Current stable path: Qwen/Qwen3.5-9B via Transformers XPU on B70.

Next lab: Gemma4 E4B sanity check on port 8012, then possibly Gemma4 26B A4B quant.

Do not break: port 8011 Qwen3.5 production and failureai-vision-bridge.service.

Current verified status

Qwen3.5 API

failureai-qwen35-openai-api.service is active and enabled. It serves failureai-qwen3.5-9b-b70 on port 8011.

Vision bridge

failureai-vision-bridge.service is active and enabled. It calls http://127.0.0.1:8011/failureai/vision/classify.

Old Qwen2.5 API

failureai-qwen-openai-api.service is inactive and disabled. Port 8010 should not be part of the normal production path.

Open WebUI image chat

Technically functional but not reliable enough to trust. Frigate classification is the stable image use case.

Known-good command output shape

systemctl is-active failureai-qwen35-openai-api.service   # active
systemctl is-active failureai-vision-bridge.service       # active
systemctl is-active failureai-qwen-openai-api.service     # inactive

systemctl is-enabled failureai-qwen35-openai-api.service  # enabled
systemctl is-enabled failureai-vision-bridge.service      # enabled
systemctl is-enabled failureai-qwen-openai-api.service    # disabled

Current architecture

Open WebUI
  -> http://192.168.42.99:8011/v1
  -> failureai-qwen3.5-9b-b70

Frigate camera event
  -> MQTT frigate/events
  -> FailureAI Vision Bridge
  -> /failureai/vision/classify on 8011
  -> Qwen/Qwen3.5-9B
  -> MQTT ai/frigate/classification
  -> Home Assistant automation / mobile notification / Agent DVR

Why this matters

The old bridge loaded Qwen/Qwen2.5-VL-3B-Instruct directly into XPU memory. That caused a split-brain setup: one model for Frigate and another for Open WebUI. The current version turns the bridge into a lightweight HTTP client and keeps one big model path loaded.

Before:
  bridge -> AutoProcessor + Qwen2_5_VLForConditionalGeneration -> XPU
  Open WebUI -> separate Qwen3.5 service

Now:
  bridge -> HTTP POST :8011/failureai/vision/classify
  Open WebUI -> HTTP POST :8011/v1/chat/completions
  one loaded model: Qwen/Qwen3.5-9B

Qwen3.5 B70 service

8011

OpenAI-compatible local API port.

19-20G

Typical B70 VRAM usage after Qwen3.5 is loaded.

60%

Approximate B70 memory utilization observed in the final checkpoint.

Important files

/opt/stack/llm-scaler-b70-lab/openai-api/qwen35_openai_api.py
/opt/stack/llm-scaler-b70-lab/scripts/start_qwen35_openai_api.sh
/etc/systemd/system/failureai-qwen35-openai-api.service

Useful health checks

curl -s http://127.0.0.1:8011/health | jq .

sudo ss -ltnp | grep ':8011' || true

docker exec llm-scaler-vllm-b70-lab bash -lc \
'ps -ef | grep -E "qwen35_openai_api|failureai_vision_bridge|uvicorn" | grep -v grep'

Open WebUI findings

Text chat

Open WebUI text chat works well enough through http://192.168.42.99:8011/v1. It is much more useful than the earlier Qwen2.5-VL chat path.

Image chat

Open WebUI image input was made technically functional, but Qwen3.5-9B remained unstable for free-form image captions. It often copied instructions, leaked English reasoning, echoed the prompt or focused on side details.

Fixed

OpenAI-style image_url / base64 images no longer need to go to the tokenizer as text.

Fixed

Argument list too long was avoided by using Python/urllib instead of huge shell curl payloads.

Still annoying

Free-form image captions are not reliable enough. Use Frigate classify for production vision tasks.

Test trap

/tmp/test_openwebui_image_small.py is hardcoded to a Transit crop. Use test_openwebui_image_any.py instead.

Errors found and fixed

- raw base64 reached tokenizer: 1285400 > 262144
- RuntimeError: nonzero is not supported for tensors with more than INT_MAX elements
- cleanup_answer missing / misplaced
- early cleanup truncated raw image answers to fragments like "The user asked"
- placeholder prompt text was copied as answer
- example prompt text was copied as answer

FailureAI Vision Bridge

The bridge is now a lightweight adapter that listens to Frigate MQTT events, creates/fetches snapshots/crops and calls the Qwen3.5 HTTP API.

Important files

/opt/stack/llm-scaler-b70-lab/scripts/failureai_vision_bridge.py
/opt/stack/llm-scaler-b70-lab/scripts/failureai_vision_bridge.qwen35-http-final-working.py
/etc/systemd/system/failureai-vision-bridge.service

Runtime values

MQTT input:   frigate/events
MQTT output:  ai/frigate/classification
Frigate URL:  http://192.168.42.99:5000
Vision URL:   http://127.0.0.1:8011/failureai/vision/classify
Model:        Qwen/Qwen3.5-9B

Classification classes

own_car
unknown_car
delivery_van
false_positive
person
animal

Person events use a Frigate high-confidence fastpath. Car/vehicle events use the Qwen3.5 classify endpoint. A vehicle under a cover should be unknown_car, not false_positive.

Gemma lab next

The next clean step is to add Gemma as a lab service without touching Qwen3.5 production.

8011 = current Qwen3.5 production
8012 = future Gemma4 lab

Do not replace 8011 directly.
Do not connect Gemma to Frigate until it passes text, image, Open WebUI and memory tests.

Why Gemma failed earlier

The previous Gemma attempt happened before the B70/XPU path was mature.
Gemma4 26B was too large for the old RTX 3060 12GB path.
Gemma4 E4B was light but not impressive enough at that time.
The Docker VM CPU type may have been generic kvm64, causing X86_V2 / NumPy / Open WebUI issues.
The bridge was not yet a clean HTTP client.

Suggested test order

1. Gemma4 E4B sanity check on port 8012
2. If E4B works, test Gemma4 26B A4B quantized
3. Compare with Llama 3.2 11B Vision, MiniCPM-V or InternVL if needed

Older Gemma26 notes from v0.5 alpha

The previous v0.5 page documented a Gemma4 26B-A4B Q4_K_M llama.cpp SYCL path. Those notes are kept as a separate download because they may still help the next Gemma lab.

SYCL / Level Zero saw the B70 as SYCL0.
--device SYCL0 was required; --device 0 failed.
--batch-size 4096 and --ubatch-size 4096 fixed one multimodal image crash.
--reasoning off and --reasoning-budget 0 gave cleaner message content.
--reasoning-format none was not good in that test because tags leaked into output.

Download older Gemma26 v0.5 alpha kit

Troubleshooting and exact commands

Full status check

echo "===== services ====="
systemctl is-active failureai-qwen35-openai-api.service
systemctl is-active failureai-vision-bridge.service
systemctl is-active failureai-qwen-openai-api.service || true

echo "===== enabled ====="
systemctl is-enabled failureai-qwen35-openai-api.service
systemctl is-enabled failureai-vision-bridge.service
systemctl is-enabled failureai-qwen-openai-api.service || true

echo "===== health ====="
curl -s http://127.0.0.1:8011/health | jq .

echo "===== xpu ====="
docker exec llm-scaler-vllm-b70-lab bash -lc 'xpu-smi stats -d 0 || true'

Image test with explicit path

docker exec -i llm-scaler-vllm-b70-lab bash -lc \
'python3 /tmp/test_openwebui_image_any.py /path/to/image.jpg "Mitä kuvassa näkyy? Vastaa yhdellä lauseella."'

Reboot test

After reboot, model_loaded: false is normal until the first request warms the model.

sudo reboot

# after reboot
curl -s http://127.0.0.1:8011/health | jq .
systemctl is-active failureai-qwen35-openai-api.service
systemctl is-active failureai-vision-bridge.service

Downloads

FailureAI Qwen3.5 B70 v0.6 alpha

New reference package with this chat's Qwen3.5 single-model checkpoint, Open WebUI image findings, test scripts, verification script, systemd example and Gemma-lab next notes.

Download Qwen3.5 v0.6 alpha

FailureAI Gemma26 B70 v0.5 alpha

Older Gemma26 llama.cpp SYCL notes and scripts. Useful for the next Gemma lab, not the current Qwen3.5 production path.

Download Gemma26 v0.5 alpha

Legacy bridge v0.4 alpha

Older Qwen-focused bridge package kept for reference. Current bridge should use the Qwen3.5 HTTP API.

Download bridge v0.4 alpha

Frigate B70 OpenVINO build kit

Original Frigate B70 / Battlemage / OpenVINO / VAAPI detector foundation kit.

Download Frigate B70 kit

Verify SHA256 Download notes

Command-line download

cd /opt
wget -O failureai-qwen35-b70-v0.6-alpha.zip \
  http://failurenetworks.net/failurefrib70/downloads/failureai-qwen35-b70-v0.6-alpha.zip

wget -O SHA256SUMS.txt \
  http://failurenetworks.net/failurefrib70/downloads/SHA256SUMS.txt

sha256sum -c SHA256SUMS.txt --ignore-missing
unzip failureai-qwen35-b70-v0.6-alpha.zip

Do not publish real RTSP passwords, MQTT credentials, private camera URLs or unsanitized logs.