Note_Tech

All technological notes.



Guiding Principle & Guardrails



AI Guiding Principle


Guardrail


Types of AI Guardrails

User Input → [Input Guardrails] → AI Model → [Output Guardrails] → User
                                     ↓
                            [Action Guardrails]

Input Guardrails (Pre-processing)

Control what users are allowed to send into the AI.

Examples:

Example Scenario:

User tries to request illegal instructions → blocked before reaching the model
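The blocking step above can be sketched as a pre-processing check that runs before the prompt reaches the model. The pattern list here is illustrative only; a real deployment would use a maintained policy list or a moderation service.

```python
import re

# Illustrative banned-topic patterns (not a real policy list).
BANNED_PATTERNS = [
    re.compile(r"how to (make|build) (a )?(bomb|weapon)", re.IGNORECASE),
    re.compile(r"bypass .*security", re.IGNORECASE),
]

def check_input(prompt: str) -> tuple[bool, str]:
    """Return (allowed, reason). Runs before the prompt reaches the model."""
    for pattern in BANNED_PATTERNS:
        if pattern.search(prompt):
            return False, "blocked: disallowed request"
    return True, "ok"
```

If `check_input` returns `False`, the request never reaches the model at all, which is what distinguishes an input guardrail from output filtering.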


Output Guardrails (Post-processing)

Filter or modify the AI’s response before returning it.

Examples:

Example Scenario:

AI generates unsafe advice → system rewrites or blocks it
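A minimal post-processing filter for this scenario might look like the sketch below. The unsafe markers are hypothetical examples; production systems typically use a moderation model rather than substring matching.

```python
def filter_output(response: str) -> str:
    """Redact or block unsafe content before it is returned to the user."""
    UNSAFE_MARKERS = ["rm -rf /", "password="]  # illustrative markers only
    for marker in UNSAFE_MARKERS:
        if marker in response:
            return "[response withheld: unsafe content detected]"
    return response
```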


Policy-Based Guardrails

Rules defined by organizations or platforms.

Examples:

Used heavily in enterprise AI systems.
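One common pattern in enterprise systems is to express the policy as plain data, separate from code, so that non-engineers can review and audit it. The policy fields below are assumptions for illustration.

```python
# Hypothetical organizational policy expressed as data.
POLICY = {
    "allowed_topics": {"devops", "networking", "programming"},
    "max_response_tokens": 512,
    "pii_allowed": False,
}

def is_request_in_policy(topic: str) -> bool:
    """Check a classified request topic against the organization's policy."""
    return topic in POLICY["allowed_topics"]
```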


Contextual Guardrails

Limit AI behavior based on user role or scenario.

Examples:
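A contextual guardrail can be sketched as a role-to-capability map: the same assistant exposes different abilities depending on who is asking. The roles and capabilities below are hypothetical.

```python
# Hypothetical mapping from user role to permitted assistant capabilities.
ROLE_CAPABILITIES = {
    "viewer": {"summarize", "explain"},
    "developer": {"summarize", "explain", "generate_code"},
    "admin": {"summarize", "explain", "generate_code", "run_commands"},
}

def allowed_for_role(role: str, capability: str) -> bool:
    """Unknown roles get no capabilities by default (fail closed)."""
    return capability in ROLE_CAPABILITIES.get(role, set())
```

Failing closed for unknown roles is the important design choice here: a missing entry denies everything rather than allowing it.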


Tool / Action Guardrails (Critical for AI Agents)

Control what actions an AI agent can take.

Examples:
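For an agent, action guardrails are typically an allow-list of tools plus extra checks on dangerous arguments. The tool names and deny rules below are illustrative assumptions, not a specific framework's API.

```python
# Illustrative allow-list of tools the agent may call.
ALLOWED_TOOLS = {"search", "read_file", "run_shell"}

# Illustrative substrings treated as destructive shell input.
DESTRUCTIVE = ("rm -rf", "drop table", "shutdown")

def authorize_action(tool: str, argument: str) -> bool:
    """Gate every tool call: unknown tools and destructive shell args are denied."""
    if tool not in ALLOWED_TOOLS:
        return False
    if tool == "run_shell" and any(d in argument.lower() for d in DESTRUCTIVE):
        return False
    return True
```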


How Guardrails Are Implemented

Rule-Based Filters

ML-Based Moderation

Prompt Engineering

Define behavior via system instructions.

You are a DevOps assistant.
Do not suggest destructive commands.
Do not expose secrets.
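In code, the system instruction above is usually attached to every request, for example using the common chat-completion message convention (role/content dicts). The helper name is an assumption for illustration.

```python
# The guardrail text from the system prompt above, attached to every request.
SYSTEM_PROMPT = (
    "You are a DevOps assistant. "
    "Do not suggest destructive commands. "
    "Do not expose secrets."
)

def build_messages(user_prompt: str) -> list[dict]:
    """Prepend the system instruction so the model sees it on every turn."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]
```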

Human-in-the-Loop

Require human approval before critical or high-risk actions are executed.
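A minimal approval gate can be sketched as follows: high-risk actions are held for a human reviewer instead of executing immediately. The risk labels are hypothetical.

```python
# Hypothetical set of actions requiring human sign-off.
HIGH_RISK = {"delete_database", "rotate_credentials", "push_to_prod"}

def execute(action: str, approved_by_human: bool = False) -> str:
    """Low-risk actions run immediately; high-risk ones wait for approval."""
    if action in HIGH_RISK and not approved_by_human:
        return "pending human approval"
    return f"executed: {action}"
```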