FailureAI on Intel Arc Pro B70 — Qwen3.5 single-model checkpoint + Gemma lab prep v0.6 alpha
This update captures the hard-won findings from the Qwen3.5/B70/Open WebUI/Frigate session and packages the working reference scripts, systemd examples, test utilities and Gemma-lab next steps.
The key architectural win: the camera bridge no longer loads its own Qwen2.5-VL model. Open WebUI text chat and FailureAI camera analysis now share one Qwen3.5 service on port 8011.
This page is a technical field log and reference kit. It intentionally includes failed attempts, ugly edge cases and the exact lessons learned so the next Gemma lab can start without repeating the same pain.
Current stable path: Qwen/Qwen3.5-9B via Transformers XPU on B70.
Next lab: Gemma4 E4B sanity check on port 8012, then possibly Gemma4 26B A4B quant.
Do not break: port 8011 Qwen3.5 production and failureai-vision-bridge.service.
Current verified status
failureai-qwen35-openai-api.service is active and enabled. It serves failureai-qwen3.5-9b-b70 on port 8011.
failureai-vision-bridge.service is active and enabled. It calls http://127.0.0.1:8011/failureai/vision/classify.
failureai-qwen-openai-api.service is inactive and disabled. Port 8010 should not be part of the normal production path.
Technically functional but not reliable enough to trust. Frigate classification is the stable image use case.
Known-good command output shape
systemctl is-active failureai-qwen35-openai-api.service # active
systemctl is-active failureai-vision-bridge.service # active
systemctl is-active failureai-qwen-openai-api.service # inactive
systemctl is-enabled failureai-qwen35-openai-api.service # enabled
systemctl is-enabled failureai-vision-bridge.service # enabled
systemctl is-enabled failureai-qwen-openai-api.service # disabled
Current architecture
Open WebUI
-> http://192.168.42.99:8011/v1
-> failureai-qwen3.5-9b-b70
Frigate camera event
-> MQTT frigate/events
-> FailureAI Vision Bridge
-> /failureai/vision/classify on 8011
-> Qwen/Qwen3.5-9B
-> MQTT ai/frigate/classification
-> Home Assistant automation / mobile notification / Agent DVR
Why this matters
The old bridge loaded Qwen/Qwen2.5-VL-3B-Instruct directly into XPU memory. That caused a split-brain setup: one model for Frigate and another for Open WebUI. The current version turns the bridge into a lightweight HTTP client and keeps one big model path loaded.
Before:
bridge -> AutoProcessor + Qwen2_5_VLForConditionalGeneration -> XPU
Open WebUI -> separate Qwen3.5 service
Now:
bridge -> HTTP POST :8011/failureai/vision/classify
Open WebUI -> HTTP POST :8011/v1/chat/completions
one loaded model: Qwen/Qwen3.5-9B
Qwen3.5 B70 service
OpenAI-compatible local API port.
Typical B70 VRAM usage after Qwen3.5 is loaded.
Approximate B70 memory utilization observed in the final checkpoint.
Important files
/opt/stack/llm-scaler-b70-lab/openai-api/qwen35_openai_api.py
/opt/stack/llm-scaler-b70-lab/scripts/start_qwen35_openai_api.sh
/etc/systemd/system/failureai-qwen35-openai-api.service
Useful health checks
curl -s http://127.0.0.1:8011/health | jq .
sudo ss -ltnp | grep ':8011' || true
docker exec llm-scaler-vllm-b70-lab bash -lc \
'ps -ef | grep -E "qwen35_openai_api|failureai_vision_bridge|uvicorn" | grep -v grep'
Open WebUI findings
Text chat
Open WebUI text chat works well enough through http://192.168.42.99:8011/v1. It is much more useful than the earlier Qwen2.5-VL chat path.
Image chat
Open WebUI image input was made technically functional, but Qwen3.5-9B remained unstable for free-form image captions. It often copied instructions, leaked English reasoning, echoed the prompt or focused on side details.
OpenAI-style image_url / base64 images no longer need to go to the tokenizer as text.
Argument list too long was avoided by using Python/urllib instead of huge shell curl payloads.
Free-form image captions are not reliable enough. Use Frigate classify for production vision tasks.
/tmp/test_openwebui_image_small.py is hardcoded to a Transit crop. Use test_openwebui_image_any.py instead.
Errors found and fixed
- raw base64 reached tokenizer: 1285400 > 262144
- RuntimeError: nonzero is not supported for tensors with more than INT_MAX elements
- cleanup_answer missing / misplaced
- early cleanup truncated raw image answers to fragments like "The user asked"
- placeholder prompt text was copied as answer
- example prompt text was copied as answer
FailureAI Vision Bridge
The bridge is now a lightweight adapter that listens to Frigate MQTT events, creates/fetches snapshots/crops and calls the Qwen3.5 HTTP API.
Important files
/opt/stack/llm-scaler-b70-lab/scripts/failureai_vision_bridge.py
/opt/stack/llm-scaler-b70-lab/scripts/failureai_vision_bridge.qwen35-http-final-working.py
/etc/systemd/system/failureai-vision-bridge.service
Runtime values
MQTT input: frigate/events
MQTT output: ai/frigate/classification
Frigate URL: http://192.168.42.99:5000
Vision URL: http://127.0.0.1:8011/failureai/vision/classify
Model: Qwen/Qwen3.5-9B
Classification classes
own_car
unknown_car
delivery_van
false_positive
person
animal
Person events use a Frigate high-confidence fastpath. Car/vehicle events use the Qwen3.5 classify endpoint. A vehicle under a cover should be unknown_car, not false_positive.
Gemma lab next
The next clean step is to add Gemma as a lab service without touching Qwen3.5 production.
8011 = current Qwen3.5 production
8012 = future Gemma4 lab
Do not replace 8011 directly.
Do not connect Gemma to Frigate until it passes text, image, Open WebUI and memory tests.
Why Gemma failed earlier
- The previous Gemma attempt happened before the B70/XPU path was mature.
- Gemma4 26B was too large for the old RTX 3060 12GB path.
- Gemma4 E4B was light but not impressive enough at that time.
- The Docker VM CPU type may have been generic
kvm64, causing X86_V2 / NumPy / Open WebUI issues. - The bridge was not yet a clean HTTP client.
Suggested test order
1. Gemma4 E4B sanity check on port 8012
2. If E4B works, test Gemma4 26B A4B quantized
3. Compare with Llama 3.2 11B Vision, MiniCPM-V or InternVL if needed
Older Gemma26 notes from v0.5 alpha
The previous v0.5 page documented a Gemma4 26B-A4B Q4_K_M llama.cpp SYCL path. Those notes are kept as a separate download because they may still help the next Gemma lab.
- SYCL / Level Zero saw the B70 as
SYCL0. --device SYCL0was required;--device 0failed.--batch-size 4096and--ubatch-size 4096fixed one multimodal image crash.--reasoning offand--reasoning-budget 0gave cleaner message content.--reasoning-format nonewas not good in that test because tags leaked into output.
Troubleshooting and exact commands
Full status check
echo "===== services ====="
systemctl is-active failureai-qwen35-openai-api.service
systemctl is-active failureai-vision-bridge.service
systemctl is-active failureai-qwen-openai-api.service || true
echo "===== enabled ====="
systemctl is-enabled failureai-qwen35-openai-api.service
systemctl is-enabled failureai-vision-bridge.service
systemctl is-enabled failureai-qwen-openai-api.service || true
echo "===== health ====="
curl -s http://127.0.0.1:8011/health | jq .
echo "===== xpu ====="
docker exec llm-scaler-vllm-b70-lab bash -lc 'xpu-smi stats -d 0 || true'
Image test with explicit path
docker exec -i llm-scaler-vllm-b70-lab bash -lc \
'python3 /tmp/test_openwebui_image_any.py /path/to/image.jpg "Mitä kuvassa näkyy? Vastaa yhdellä lauseella."'
Reboot test
After reboot, model_loaded: false is normal until the first request warms the model.
sudo reboot
# after reboot
curl -s http://127.0.0.1:8011/health | jq .
systemctl is-active failureai-qwen35-openai-api.service
systemctl is-active failureai-vision-bridge.service
Downloads
New reference package with this chat's Qwen3.5 single-model checkpoint, Open WebUI image findings, test scripts, verification script, systemd example and Gemma-lab next notes.
Older Gemma26 llama.cpp SYCL notes and scripts. Useful for the next Gemma lab, not the current Qwen3.5 production path.
Older Qwen-focused bridge package kept for reference. Current bridge should use the Qwen3.5 HTTP API.
Original Frigate B70 / Battlemage / OpenVINO / VAAPI detector foundation kit.
Command-line download
cd /opt
wget -O failureai-qwen35-b70-v0.6-alpha.zip \
http://failurenetworks.net/failurefrib70/downloads/failureai-qwen35-b70-v0.6-alpha.zip
wget -O SHA256SUMS.txt \
http://failurenetworks.net/failurefrib70/downloads/SHA256SUMS.txt
sha256sum -c SHA256SUMS.txt --ignore-missing
unzip failureai-qwen35-b70-v0.6-alpha.zip
Do not publish real RTSP passwords, MQTT credentials, private camera URLs or unsanitized logs.