# Valet Vision Architecture Overview

Valet Vision is a flexible, open-source automation platform built on the Raspberry Pi. It enables precise control and observation of mobile devices through a combination of computer vision, hardware control, and network-accessible APIs.

---

## 📐 Architecture

At its core, **Valet Vision** runs an HTTP server on a Raspberry Pi. This server exposes a simple JSON-over-HTTP API to:

- Access the **camera feed**
- Simulate **virtual mouse**, **keyboard**, and **stylus** inputs
- Capture **screenshots** and stream **live MJPEG video**

All requests can be made locally or over the network. The API is platform-agnostic, allowing automation scripts to run:

- **Locally on the Valet** itself
- On another machine within the same network

🗺️ Architecture diagram: [valetnet.dev/overview](https://valetnet.dev/overview/)

---

## 🔌 USB Gadget Protocol

Valet Vision uses Linux’s **USB Gadget** protocol to emulate USB peripherals to a connected mobile device. This includes:

- Virtual **touch stylus**
- USB **keyboard** and **mouse**
- Optional **Ethernet gadget** mode to share or restrict the Pi’s network access with the mobile device

This allows both input simulation and network configuration of the attached phone or tablet, all via a single USB connection.

---

## 👁️ Vision + AI Capabilities

- Screenshots can be fetched as `image/jpeg` or `image/png`
- A **live video stream** is available as an MJPEG feed over HTTP

Valet Vision includes:

- **OpenCV** for computer vision and object detection
- **Tesseract OCR** for text recognition

These tools allow automation scripts to detect and act on visual UI elements in screenshots.

For more demanding AI workloads, you have options:

- Add the [Raspberry Pi AI Kit (AI accelerator)](https://www.raspberrypi.com/products/ai-kit/)
- Run ML inference remotely on a more powerful machine
- Use hosted services to offload heavy image processing

Importantly, **Valet Vision operates fully offline by default**—it does not require or depend on any cloud services.

---

## 📲 Push Button Module (PBM)

For full-device automation, Valet Vision can control hardware side buttons (e.g., power, volume up/down) via an optional **Push Button Module** (PBM):

- PBM actuators are **digitally controlled servos**
- Connected via **Dynamixel Protocol 2.0** over serial from the Pi
- Placement is flexible—PBM arms can be positioned on either side of the device

Support for PBM control via the HTTP API is on the roadmap; until then, the servos are driven directly over serial from the Pi (see the sketch below).
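For illustration only, the sketch below shows what commanding a single PBM servo over serial with the ROBOTIS DYNAMIXEL SDK could look like. The serial port, baud rate, servo ID, register addresses, and positions are assumptions for an X-series servo; this is not Valet Vision’s actual PBM code.

```python
# Illustrative sketch: press and release one PBM button arm using the
# DYNAMIXEL SDK (pip install dynamixel-sdk). The port, baud rate, servo ID,
# register addresses, and positions below are assumptions for an X-series servo.
import time
from dynamixel_sdk import PortHandler, PacketHandler

PORT = "/dev/ttyUSB0"        # assumed serial adapter on the Pi
BAUD = 57600                 # assumed baud rate
SERVO_ID = 1                 # assumed servo ID
ADDR_TORQUE_ENABLE = 64      # X-series control table address
ADDR_GOAL_POSITION = 116     # X-series control table address
REST_POSITION = 2048         # assumed "arm retracted" position
PRESS_POSITION = 2300        # assumed "button pressed" position

port = PortHandler(PORT)
packet = PacketHandler(2.0)  # Dynamixel Protocol 2.0
port.openPort()
port.setBaudRate(BAUD)

# Enable torque, hold the button for half a second, then retract the arm.
packet.write1ByteTxRx(port, SERVO_ID, ADDR_TORQUE_ENABLE, 1)
packet.write4ByteTxRx(port, SERVO_ID, ADDR_GOAL_POSITION, PRESS_POSITION)
time.sleep(0.5)
packet.write4ByteTxRx(port, SERVO_ID, ADDR_GOAL_POSITION, REST_POSITION)
port.closePort()
```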
🔗 More on the Dynamixel Protocol: [emanual.robotis.com](https://emanual.robotis.com/docs/en/dxl/protocol2/)

---

## 🧪 Developer Tools & Open Source

The software that powers Valet Vision is fully open source:

- Core Server: [checkbox-server on GitHub](https://github.com/tapsterbot/checkbox-server)
- Python Client: [checkbox-client-python](https://github.com/tapsterbot/checkbox-client-python)

### Example: Simulate a Tap

```bash
curl -X POST $HOST/api/touch/tap \
  -H "Content-Type: application/json" \
  -d '{"x": 0, "y": 0}'
```

---

## 🧩 Modular, Local-First Automation

Valet Vision is designed to be adaptable:

- Fully autonomous (all logic and inference on-device)
- Or controlled remotely from any machine on the same network
- No mandatory cloud infrastructure

This makes it ideal for labs, local QA environments, and regulated industries where network control and data locality matter.
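As a concrete illustration of this local-first workflow, the sketch below fetches a screenshot over the HTTP API, runs Tesseract OCR on it locally, and sends a tap back to the device. The host name, port, and `/api/screenshot` endpoint are assumptions made for the example; only the `/api/touch/tap` endpoint is taken from the tap example above.

```python
# Minimal local-first automation sketch (illustrative; endpoint names other
# than /api/touch/tap are assumptions). Requires: requests, pillow, pytesseract.
import io

import pytesseract
import requests
from PIL import Image

VALET = "http://valet.local:8000"  # assumed host and port of the Valet's HTTP server

# Fetch a screenshot of the attached device (endpoint name assumed).
resp = requests.get(f"{VALET}/api/screenshot", timeout=10)
screen = Image.open(io.BytesIO(resp.content))

# Run Tesseract OCR entirely on-device -- no cloud services involved.
text = pytesseract.image_to_string(screen)

# Act on what is visible on screen, e.g. tap a coordinate when a prompt appears.
if "Sign in" in text:
    requests.post(f"{VALET}/api/touch/tap",
                  json={"x": 540, "y": 1200},  # example coordinates
                  timeout=10)
```

The same script runs unchanged on the Valet itself or on any machine on the same network; only the host in `VALET` changes.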