Skip to main content

Juq405+top May 2026

| Step | Action | Result | |------|--------|--------| | 1️⃣ | Ported the existing TensorRT model to the compiler, enabling deterministic slicing of the network. | Baseline latency: 48 ms. | | 2️⃣ | Applied the +top overlay: switched to the top‑level scheduler and linked against the newest cuDNN v8.9 . | Latency dropped to 28 ms . | | 3️⃣ | Leveraged the top‑level monitoring to spot a rare spike (7 % of frames) where a particular slice stalled. Optimized that slice’s memory layout, gaining an extra 2 ms headroom. | Final latency: 26 ms , comfortably under the 30 ms target. | | 4️⃣ | Documented the whole stack under the tag juq405+top‑aetherbyte‑2026 . | Other teams now reuse the exact build without re‑tuning. |