Governance for AI Agents (Part 1): The Gap, the OWASP Agentic Top 10, and Where AGT Fits


🎯 TL;DR

You’ve built a single agent, or wired up a multi-agent orchestration with LangChain / AutoGen / CrewAI / Microsoft Agent Framework. It works.

Now answer this: how are you covering the OWASP Agentic Top 10? How do you prove to a regulator that the agent did only what it was allowed to do? Prompt engineering and content filters don’t reach the layer where actions happen.

The Agent Governance Toolkit (AGT), an open-source Microsoft project I work on, puts a sub-millisecond deterministic policy decision in front of every tool call, gives every agent a cryptographic identity, and produces a tamper-evident audit trail.

This post is Part 1 of a series: what the governance gap is, what the OWASP Agentic Top 10 actually contains, and where AGT sits in your stack.

Repo: github.com/microsoft/agent-governance-toolkit

If you’ve shipped anything with LangChain, AutoGen, CrewAI, or Microsoft Agent Framework recently, you’ve probably hit the same wall I did. The agent works. It plans, calls tools, remembers things. Then you try to put it somewhere it can actually do harm (touching a database, hitting a real API, running shell commands, talking to another agent), and you realise you have no good way to bound what it can do.

You have a model. You have prompts. You have a tool list. You don’t have a policy layer. So you do what we all do: stitch together if-statements, allowlists, regex filters on prompts, maybe a sandbox if you’re feeling fancy. It mostly works. Until it doesn’t.

flowchart LR
    U([User / Prompt]) --> M[LLM Planner<br/>non-deterministic]
    M -->|tool call| T{{No policy layer}}
    T -->|just runs| DB[(Database)]
    T -->|just runs| API[(Production API)]
    T -->|just runs| SH[/Shell /]
    T -->|just runs| A2[Other Agents]
    style T fill:#ffd6d6,stroke:#c0392b,stroke-width:2px,color:#000
    style M fill:#fff4c2,stroke:#b7950b,color:#000

The agent works. The governance doesn’t exist yet. That gap is what both the OWASP Agentic Top 10 and AI regulation are pointing at.
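To make the missing layer concrete, here is a minimal sketch of a deterministic, deny-by-default check sitting between the planner and its tools. This is not AGT's API: `PolicyGate`, `Decision`, `read_only_sql`, and the in-memory `audit_log` are all hypothetical names invented for illustration, and the SQL rule is deliberately naive.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


@dataclass
class PolicyGate:
    """Hypothetical deny-by-default gate in front of tool calls (not AGT's API)."""

    # Maps a tool name to a rule that inspects the call arguments.
    allowed_tools: dict[str, Callable[[dict[str, Any]], bool]]
    # Plain in-memory list; a real audit trail would need tamper evidence.
    audit_log: list[tuple[str, str, bool, str]] = field(default_factory=list)

    def decide(self, agent_id: str, tool: str, args: dict[str, Any]) -> Decision:
        rule = self.allowed_tools.get(tool)
        if rule is None:
            decision = Decision(False, "tool not in policy")
        elif not rule(args):
            decision = Decision(False, "arguments rejected by rule")
        else:
            decision = Decision(True, "allowed")
        # Every decision is recorded, whether it allows or denies the call.
        self.audit_log.append((agent_id, tool, decision.allowed, decision.reason))
        return decision

    def call(self, agent_id: str, tool: str, args: dict[str, Any],
             impl: Callable[..., Any]) -> Any:
        decision = self.decide(agent_id, tool, args)
        if not decision.allowed:
            raise PermissionError(f"{tool}: {decision.reason}")
        return impl(**args)


def read_only_sql(args: dict[str, Any]) -> bool:
    # Naive illustrative rule: only statements that start with SELECT pass.
    return str(args.get("query", "")).lstrip().lower().startswith("select")


gate = PolicyGate(allowed_tools={"sql": read_only_sql})

# The tool implementation only runs if the gate allows the call.
result = gate.call("agent-1", "sql", {"query": "SELECT 1"},
                   impl=lambda query: f"ran: {query}")
```

The point of the sketch is the shape, not the rules: the decision depends only on the tool name and arguments, never on the model's output text, so the same call always gets the same answer, and anything not listed is refused.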

Running FLUX.1 OmniControl on a Consumer GPU: A Docker Implementation tested on RTX 3060


🎯 TL;DR: Subject-Driven Image Generation on 12GB VRAM

Large AI models like FLUX.1-schnell typically require datacenter GPUs with 48GB+ VRAM. Problem: Most developers and hobbyists only have access to consumer RTX cards, which in most cases offer 6 to 12GB of VRAM (the exception being the expensive 4090/5090 cards, which go up to 32GB).

Solution: Using mmgp (Memory Management for GPU Poor) with Docker containerization enables FLUX.1 OmniControl to run on RTX 3060 12GB through 8-bit quantization, dynamic VRAM/RAM offloading, and selective layer loading. The implementation provides a Gradio web interface generating 512x512 images in ~10 seconds after initial model loading, with models persisting in system RAM to avoid reload overhead.

Technical Approach: Profile 3 configuration quantizes the T5 text encoder (8.8GB → ~4.4GB), pins the FLUX transformer (22.7GB) to reserved system RAM, and dynamically loads only active layers to VRAM during inference. Tested and validated on RTX 3060 12GB with 64GB system RAM running Windows 11 + WSL2 + Docker Desktop.
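The numbers above can be sanity-checked with a few lines of arithmetic. This is a rough sketch only: it uses the sizes quoted in this post, assumes the 8.8GB encoder figure refers to 16-bit weights (so 8-bit quantization halves it), and ignores the smaller components and activation memory.

```python
# Sizes quoted in the post, in GB.
T5_ENCODER_GB = 8.8
FLUX_TRANSFORMER_GB = 22.7
VRAM_GB = 12.0
SYSTEM_RAM_GB = 64.0

# Assumption: the encoder is stored in 16-bit, so 8-bit weights take half the space.
t5_int8_gb = T5_ENCODER_GB * 8 / 16

# Weights that must stay resident somewhere between inference runs.
resident_gb = t5_int8_gb + FLUX_TRANSFORMER_GB

fits_in_vram = resident_gb <= VRAM_GB
fits_in_ram = resident_gb <= SYSTEM_RAM_GB

print(f"T5 after 8-bit quantization: ~{t5_int8_gb:.1f} GB")
print(f"Resident weights: ~{resident_gb:.1f} GB")
print(f"Fits in {VRAM_GB:.0f} GB VRAM: {fits_in_vram}")
print(f"Fits in {SYSTEM_RAM_GB:.0f} GB system RAM: {fits_in_ram}")
```

Even after quantization the weights are more than double the card's VRAM, which is why they are pinned in system RAM and only the active layers are moved to the GPU.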

Complete Implementation: All code, Dockerfile, and setup instructions are available at github.com/Ricky-G/docker-ai-models/omnicontrol


Recently, I wanted to experiment with OmniControl, a subject-driven image generation model that extends FLUX.1-schnell with LoRA adapters for precise control over object placement. The challenge? The model requirements listed 48GB+ VRAM, and I only had an RTX 3060 with 12GB sitting in my workstation.

This is a common frustration in the AI development community. Research papers showcase impressive results on expensive datacenter hardware, but practical implementation on consumer GPUs requires significant engineering effort. Could I actually run this model locally without upgrading to an RTX 4090/5090 or paying for an Azure VM with an A100?

The answer turned out to be yes, with some clever memory management and containerization. This blog post walks through the complete process of dockerizing OmniControl to run efficiently on a 12GB consumer GPU.
