Safety models classify and moderate content based on custom policies you define.

OpenAI
GPT-OSS Safeguard 120B
gpt-oss-safeguard-120b
Parameters: 117B (5.1B active)
Context: 128K tokens
Strengths: Safety reasoning, bring-your-own-policy flexibility, full access to reasoning chains for debugging, configurable reasoning effort levels
Structured Outputs: Structured response formatting support
Best for: Content moderation, policy enforcement, LLM guardrails, and Trust & Safety labeling workflows
Configuration repo: tinfoilsh/confidential-gpt-oss-safeguard-120b
Safety Model: Classifies text against the custom safety policy you supply, rather than a fixed, built-in taxonomy.
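A bring-your-own-policy workflow typically sends the policy as the system message and the text to classify as the user message, then reads back a structured verdict. The sketch below shows that shape; the verdict field names (`violation`, `rationale`) and any endpoint details are illustrative assumptions, not confirmed specifics of this model's output format.

```python
import json

def build_messages(policy: str, content: str) -> list[dict]:
    """Package a custom safety policy and the text to classify as chat messages.

    The policy goes in the system role so the model applies it as its
    moderation rubric; the content to label goes in the user role.
    """
    return [
        {"role": "system", "content": policy},
        {"role": "user", "content": content},
    ]

def parse_verdict(raw: str) -> dict:
    """Parse a JSON verdict returned via structured outputs.

    The schema here (violation/rationale) is a hypothetical example of
    what you might request, not the model's fixed response format.
    """
    data = json.loads(raw)
    return {
        "violation": bool(data["violation"]),
        "rationale": data.get("rationale", ""),
    }

# Example: a one-rule policy and a candidate message.
policy = "Flag any content that solicits personal financial information."
messages = build_messages(policy, "Send me your card number to claim the prize.")
```

With an OpenAI-compatible client, `messages` would be passed to a chat-completions call with the model set to `gpt-oss-safeguard-120b`, and `parse_verdict` applied to the returned message content.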