Note_Tech

All technological notes.



Guiding Principle & Guardrails



AI Guiding Principle


Guardrail


Types of AI Guardrails

User Input → [Input Guardrails] → AI Model → [Output Guardrails] → User
                                     ↓
                            [Action Guardrails]

Input Guardrails (Pre-processing)

Control what users are allowed to send into the AI.

Examples:

Example Scenario:

User tries to request illegal instructions → blocked before reaching the model
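The blocking step above can be sketched as a pre-processing check that runs before the prompt reaches the model. The pattern list here is illustrative only; a real deployment would use a maintained policy list or a moderation service.

```python
import re

# Illustrative banned-topic patterns (not a real policy list).
BANNED_PATTERNS = [
    re.compile(r"how to (make|build) (a )?(bomb|weapon)", re.IGNORECASE),
    re.compile(r"bypass .*security", re.IGNORECASE),
]

def check_input(prompt: str) -> tuple[bool, str]:
    """Return (allowed, reason). Runs before the prompt reaches the model."""
    for pattern in BANNED_PATTERNS:
        if pattern.search(prompt):
            return False, "blocked: disallowed request"
    return True, "ok"
```

If `check_input` returns `False`, the request never reaches the model at all, which is what distinguishes an input guardrail from output filtering.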


Output Guardrails (Post-processing)

Filter or modify the AI’s response before returning it.

Examples:

Example Scenario:

AI generates unsafe advice → system rewrites or blocks it
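A minimal post-processing filter for this scenario might look like the sketch below. The unsafe markers are hypothetical examples; production systems typically use a moderation model rather than substring matching.

```python
def filter_output(response: str) -> str:
    """Redact or block unsafe content before it is returned to the user."""
    UNSAFE_MARKERS = ["rm -rf /", "password="]  # illustrative markers only
    for marker in UNSAFE_MARKERS:
        if marker in response:
            return "[response withheld: unsafe content detected]"
    return response
```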


Policy-Based Guardrails

Rules defined by organizations or platforms.

Examples:

Used heavily in enterprise AI systems.
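One common pattern in enterprise systems is to express the policy as plain data, separate from code, so that non-engineers can review and audit it. The policy fields below are assumptions for illustration.

```python
# Hypothetical organizational policy expressed as data.
POLICY = {
    "allowed_topics": {"devops", "networking", "programming"},
    "max_response_tokens": 512,
    "pii_allowed": False,
}

def is_request_in_policy(topic: str) -> bool:
    """Check a classified request topic against the organization's policy."""
    return topic in POLICY["allowed_topics"]
```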


Contextual Guardrails

Limit AI behavior based on user role or scenario.

Examples:
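A contextual guardrail can be sketched as a role-to-capability map: the same assistant exposes different abilities depending on who is asking. The roles and capabilities below are hypothetical.

```python
# Hypothetical mapping from user role to permitted assistant capabilities.
ROLE_CAPABILITIES = {
    "viewer": {"summarize", "explain"},
    "developer": {"summarize", "explain", "generate_code"},
    "admin": {"summarize", "explain", "generate_code", "run_commands"},
}

def allowed_for_role(role: str, capability: str) -> bool:
    """Unknown roles get no capabilities by default (fail closed)."""
    return capability in ROLE_CAPABILITIES.get(role, set())
```

Failing closed for unknown roles is the important design choice here: a missing entry denies everything rather than allowing it.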


Tool / Action Guardrails (Critical for AI Agents)

Control what actions an AI agent can take.

Examples:
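For an agent, action guardrails are typically an allow-list of tools plus extra checks on dangerous arguments. The tool names and deny rules below are illustrative assumptions, not a specific framework's API.

```python
# Illustrative allow-list of tools the agent may call.
ALLOWED_TOOLS = {"search", "read_file", "run_shell"}

# Illustrative substrings treated as destructive shell input.
DESTRUCTIVE = ("rm -rf", "drop table", "shutdown")

def authorize_action(tool: str, argument: str) -> bool:
    """Gate every tool call: unknown tools and destructive shell args are denied."""
    if tool not in ALLOWED_TOOLS:
        return False
    if tool == "run_shell" and any(d in argument.lower() for d in DESTRUCTIVE):
        return False
    return True
```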


How Guardrails Are Implemented

Rule-Based Filters

ML-Based Moderation

Prompt Engineering

Define behavior via system instructions.

You are a DevOps assistant.
Do not suggest destructive commands.
Do not expose secrets.
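In code, the system instruction above is usually attached to every request, for example using the common chat-completion message convention (role/content dicts). The helper name is an assumption for illustration.

```python
# The guardrail text from the system prompt above, attached to every request.
SYSTEM_PROMPT = (
    "You are a DevOps assistant. "
    "Do not suggest destructive commands. "
    "Do not expose secrets."
)

def build_messages(user_prompt: str) -> list[dict]:
    """Prepend the system instruction so the model sees it on every turn."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]
```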

Human-in-the-Loop

Require human approval before critical or high-risk actions are executed.
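A minimal approval gate can be sketched as follows: high-risk actions are held for a human reviewer instead of executing immediately. The risk labels are hypothetical.

```python
# Hypothetical set of actions requiring human sign-off.
HIGH_RISK = {"delete_database", "rotate_credentials", "push_to_prod"}

def execute(action: str, approved_by_human: bool = False) -> str:
    """Low-risk actions run immediately; high-risk ones wait for approval."""
    if action in HIGH_RISK and not approved_by_human:
        return "pending human approval"
    return f"executed: {action}"
```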