
Tutorial 09: Custom Legal Skills, Guardrails & Agents (OpenAI)

Build custom skills for your firm's workflows, create compliance guardrails, and deploy multi-agent systems for complex legal tasks.

What You'll Learn

This tutorial shows you how to build custom legal skills, add safety checks (guardrails), and run multi-agent workflows with OpenAI. Some technical comfort is required.

Expert Level

Developer skills recommended. Estimated time: 120 minutes.

Learning Objectives

By the end of this tutorial, you will:

  • Understand OpenAI's architecture (Assistants, Custom GPTs, Moderation, custom guardrails)
  • Build custom legal skills for your firm's workflows
  • Create guardrails for quality control and compliance
  • Deploy multi-agent systems for complex legal tasks

Part 1: Understanding the OpenAI Stack

Architecture Overview

OPENAI STACK
|
+-- CUSTOM GPTs
|   +-- Custom instructions and knowledge files
|   +-- Persistent context for matter-specific workflows
|
+-- ASSISTANTS API
|   +-- Multi-turn conversations with tools
|   +-- File search and code interpreter
|   +-- Orchestration via threads and runs
|
+-- MODERATION & GUARDRAILS
|   +-- Moderation API: input/output content classification
|   +-- Custom wrappers: audit logging, compliance checks
|
+-- FUNCTION CALLING
|   +-- Connect to external tools and data sources
|
+-- ACTIONS (Custom GPTs)
    +-- OpenAPI-defined integrations for ChatGPT
| Component | Legal Application |
| --- | --- |
| Custom GPTs | Encode playbooks, review procedures, drafting standards |
| Assistants API | Multi-step workflows, document processing pipelines |
| Moderation API | Input/output content classification; block inappropriate content |
| Custom Wrappers | Audit logging, compliance checks, approval gates |
| Function Calling | Connect to document management, legal research tools |
| Orchestration | Parallelize document review, research tasks (your code coordinates) |

Part 2: Building Custom Legal Skills

What Are Skills in OpenAI Context?

Skills are specialized instructions and knowledge stored in Custom GPTs or Assistants. Unlike one-time prompts, skills persist and activate automatically within the configured context.

Skill Implementation via Custom GPT

Step 1: Create Custom GPT

  • Go to ChatGPT and create a new Custom GPT
  • Name it (e.g., "Contract Review - [Firm Name]")

Step 2: Write Instructions (SKILL.md equivalent)

# Contract Review Skill

## Purpose
This skill provides comprehensive contract review capabilities
aligned with [Firm Name]'s standard practices.

## Activation
Apply this skill when:
- User uploads a contract document
- User mentions "contract review" or similar
- User references specific contract types (NDA, MSA, SaaS, etc.)

## Process

### Step 1: Classification
Before analyzing, identify:
1. Contract Type: NDA, MSA, SaaS, License, Services, etc.
2. Our Role: Which party do we represent?
3. Counterparty Profile: Enterprise, mid-market, startup?
4. Deal Tier: Estimated value and strategic importance

### Step 2: Document Processing
- Read entire contract before providing analysis
- Note all defined terms and their definitions
- Identify governing law and dispute resolution
- Map clause structure and cross-references

### Step 3: Playbook Application
Apply positions from playbook:
- Compare each clause to standard position
- Identify deviations and assess severity
- Note missing required provisions

### Step 4: Risk Assessment
For each issue:
- Assign severity: RED | YELLOW | GREEN
- Explain practical business impact
- Consider interaction with other clauses

### Step 5: Redline Generation
For RED and YELLOW issues:
- Provide specific alternative language
- Reference clause library when applicable
- Explain rationale for changes

### Step 6: Output Generation
Structure response as:
1. Executive Summary (3-5 sentences)
2. Deal Parameters Table
3. Clause-by-Clause Analysis
4. Risk Score and Escalation Recommendation
5. Negotiation Priorities
6. Questions for Business Team

## Quality Requirements
- Never provide legal advice without qualification
- Flag any clause requiring jurisdictional verification
- Note when playbook doesn't cover specific terms
- Recommend escalation for deals over $500K

Step 3: Add Knowledge Files

Upload to Custom GPT or Assistant:

  • Playbook JSON or markdown
  • Clause library
  • Example outputs (good and bad)

Knowledge File Structure (for Custom GPT):

| File | Purpose | Format |
| --- | --- | --- |
| playbook.json | Standard positions, risk thresholds | JSON |
| clause-library.md | Approved language for redlines | Markdown |
| good-review-example.md | Calibration reference | Markdown |
| bad-review-example.md | Anti-pattern to avoid | Markdown |

Reference these in instructions: "Apply positions from playbook.json. Use clause-library.md for approved redline language."

Example: Playbook JSON (upload as knowledge file)

{
  "contract_types": {
    "SaaS_Customer": {
      "liability": {
        "standard": "12 months fees",
        "minimum": "total contract value",
        "carve_outs": ["indemnification", "data_breach", "confidentiality", "IP", "gross_negligence", "willful_misconduct"]
      },
      "indemnification": {
        "required_vendor": ["IP_infringement", "data_breach", "security_failure"],
        "acceptable_exclusions": ["customer_modifications", "third_party_components_with_notice"]
      },
      "data": {
        "ownership": "customer",
        "vendor_rights": "service_delivery_only",
        "prohibited_uses": ["AI_training", "analytics", "marketing", "sale"],
        "retention_limit": "30_days_post_termination"
      }
    }
  },
  "severity_matrix": {
    "RED": [
      "unlimited_customer_liability",
      "no_vendor_indemnity",
      "data_used_for_AI_training",
      "no_termination_for_convenience"
    ],
    "YELLOW": [
      "liability_cap_below_12_months",
      "narrow_indemnity_carveouts",
      "60_plus_day_termination_notice"
    ]
  }
}

Example: Clause Library (upload as knowledge file)

# Approved Clause Language Library
 
## Limitation of Liability
 
### Standard Mutual Cap
"EACH PARTY'S TOTAL LIABILITY ARISING OUT OF OR RELATED TO THIS
AGREEMENT SHALL NOT EXCEED THE FEES PAID OR PAYABLE BY CUSTOMER
IN THE TWELVE (12) MONTHS PRECEDING THE CLAIM."
 
### Uncapped Carve-Outs Addition
"THE FOREGOING LIMITATION SHALL NOT APPLY TO: (A) EITHER PARTY'S
INDEMNIFICATION OBLIGATIONS; (B) BREACH OF SECTION [DATA SECURITY];
(C) BREACH OF CONFIDENTIALITY OBLIGATIONS; (D) EITHER PARTY'S
GROSS NEGLIGENCE OR WILLFUL MISCONDUCT; OR (E) CUSTOMER'S PAYMENT
OBLIGATIONS."
 
## Data Ownership
 
### Customer Ownership Clause
"As between the parties, Customer retains all right, title, and
interest in and to Customer Data. Vendor acquires no rights in
Customer Data except the limited license granted herein."
 
### No AI Training Clause
"Vendor shall not use Customer Data or any derivatives thereof
to train, develop, or improve any machine learning model,
artificial intelligence system, or similar technology."

Step 4: Test and Calibrate

Run 3-5 real contracts through the GPT. Compare output to expert review. Refine instructions.

Calibration Notes (add to GPT instructions or a knowledge file):

  • Liability cap thresholds updated January 2026
  • New data processing requirements per GDPR changes
  • Updated AI/ML clause language required

Example: NDA Triage Skill

For high-volume NDA processing, create a separate Custom GPT with triage-specific instructions:

# NDA Triage Skill

## Purpose
Rapid classification of incoming NDAs for routing.

## Activation
Apply when user uploads NDA or requests "triage this NDA."

## Process
1. Identify NDA type (mutual/one-way, direction)
2. Scan for non-standard terms: non-solicit, jurisdiction, term length, carve-outs
3. Compare to our standard NDA template
4. Assign GREEN/YELLOW/RED with specific flagged items
5. Output: Triage result, flagged items, recommendation

## Standard Items (No Flag)
- Mutual obligations, 3-year term, standard exclusions, return/destruction

## Non-Standard (Flag for Review)
- Non-solicit, jurisdiction other than Delaware, term >5 years, broad carve-outs

Part 3: Building Compliance Guardrails

What Are Guardrails?

Guardrails are checks that run at specific points in the AI workflow. OpenAI provides the Moderation API for content classification; additional controls (audit logging, PII checks, approval gates) are implemented via custom wrappers or the OpenAI Guardrails Python framework (preview).

| Guardrail Type | Trigger Point | Implementation |
| --- | --- | --- |
| Input Moderation | Before prompt processing | Moderation API or custom wrapper |
| Output Moderation | After response | Moderation API or custom wrapper |
| Content Filters | Before/after | Custom logic; PII detection, privilege checks |
| Audit Logging | Via wrapper | Custom API wrapper around OpenAI calls |

Example: Privileged Document Guardrail

Purpose: Prevent the AI from outputting unauthorized changes to privileged documents.

Implementation Options:

  1. OpenAI Moderation API: Use for input/output content filtering
  2. Custom API Wrapper: Intercept requests/responses, log, validate
  3. Post-Processing Script: Run output through validation before use

Example: Audit Logging Wrapper (Conceptual)

// Conceptual: wrap OpenAI calls to log all legal AI interactions
async function legalAICall(prompt, options = {}) {
  const startTime = Date.now();
  const model = options.model || "gpt-4o"; // or current model
  const response = await openai.chat.completions.create({
    model,
    messages: [{ role: "user", content: prompt }],
    ...options
  });
  await auditLog.write({
    timestamp: new Date(),
    duration: Date.now() - startTime,
    promptHash: hash(prompt), // hash, not raw text, to avoid logging client content
    model, // log the model actually used, not just the requested option
    tokenCount: response.usage?.total_tokens
  });
  return response;
}

Additional Guardrail Use Cases

Citation Verification:

  • Add instruction: "For any legal citation, note that verification in Westlaw/Lexis is required."
  • Post-process output to flag citation patterns for human review
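The post-processing step can be a simple pattern scan. The sketch below is illustrative only: the regexes cover a few common U.S. citation formats and are not an exhaustive detector.

```javascript
// Conceptual: flag citation-like strings in model output for human verification.
// Illustrative patterns only (a few U.S. reporter and code formats), not exhaustive.
const CITATION_PATTERNS = [
  /\b\d+\s+U\.S\.\s+\d+\b/g,            // e.g., 410 U.S. 113
  /\b\d+\s+F\.(?:2d|3d|4th)\s+\d+\b/g,  // e.g., 123 F.3d 456
  /\b\d+\s+U\.S\.C\.\s+§+\s*\d+/g       // e.g., 17 U.S.C. § 107
];

function flagCitations(outputText) {
  const flagged = [];
  for (const pattern of CITATION_PATTERNS) {
    for (const match of outputText.matchAll(pattern)) {
      flagged.push({ citation: match[0], index: match.index });
    }
  }
  return flagged; // route to a reviewer queue; verify each in Westlaw/Lexis
}
```

Each flagged item carries its offset so a reviewer can jump to the citation in context.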

Confidentiality Check:

  • Use input moderation to detect potential client identifiers
  • Add instruction: "Do not include client names or matter identifiers in output."
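The input side can be screened the same way before any prompt reaches the API. A minimal sketch, assuming a hypothetical matter-number format (12345-001) and a list of client names to check; substitute your firm's actual identifier patterns.

```javascript
// Conceptual input screen: block prompts that appear to contain client identifiers.
// The matter-number pattern below is a hypothetical placeholder format.
const MATTER_ID = /\b\d{5}-\d{3}\b/g; // e.g., 12345-001 (placeholder format)

function screenPrompt(prompt, clientNames = []) {
  const hits = [...(prompt.match(MATTER_ID) ?? [])];
  for (const name of clientNames) {
    if (prompt.toLowerCase().includes(name.toLowerCase())) hits.push(name);
  }
  // Caller should block or redact when allowed === false
  return { allowed: hits.length === 0, hits };
}
```

Run this inside the same wrapper that does audit logging so every code path is covered.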

Privilege Protection:

  • Restrict which Custom GPTs can access privileged project folders (via integration design)
  • Log all document access for compliance review

Guardrail Implementation Checklist

Before deploying legal AI with guardrails:

  • Input moderation: Moderation API or custom wrapper blocks inappropriate content
  • Output verification: Instruction to "note that all citations require Westlaw/Lexis verification"
  • Audit logging: All API calls logged with timestamp, model, token count, prompt hash (no PII)
  • Confidentiality: Instructions prohibit client names and matter IDs in output
  • Approval gates: For privileged folders, require explicit user approval before document access
  • PII detection: Input/output scanned for identifiers; redact or block as needed
  • Rate limiting: Throttle high-volume calls to avoid cost spikes and abuse
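The rate-limiting item above can start as a simple concurrency cap around your wrapper. A minimal sketch, not a production rate limiter: no queue persistence, per-user quotas, or token budgeting.

```javascript
// Conceptual: cap concurrent OpenAI calls to contain cost spikes and abuse.
// Returns a function that wraps async work so at most `max` calls run at once.
function concurrencyLimit(max) {
  let active = 0;
  const queue = [];
  const next = () => {
    if (active >= max || queue.length === 0) return;
    active++;
    const { fn, resolve, reject } = queue.shift();
    fn().then(resolve, reject).finally(() => { active--; next(); });
  };
  return (fn) => new Promise((resolve, reject) => {
    queue.push({ fn, resolve, reject });
    next();
  });
}

// Usage: const limited = concurrencyLimit(3);
// await limited(() => legalAICall(prompt, options));
```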

OpenAI Moderation API Quick Reference

// Check user input before sending to model
const moderation = await openai.moderations.create({
  input: userPrompt
});
if (moderation.results[0].flagged) {
  return { error: "Content flagged. Please revise." };
}

Categories include harassment, hate, sexual, violence, self-harm, and others. See the Moderation API reference for the full list. For legal workflows, focus on preventing accidental data leakage rather than content moderation.


Part 4: Multi-Agent Systems

Understanding Multi-Agent Patterns

OpenAI supports multi-agent workflows via:

  • Assistants API: Multiple assistants with different instructions; your code coordinates runs across threads
  • Agents SDK: Handoffs between specialized agents (see OpenAI Agents SDK for JS/TS; openai-agents-python for Python)
  • Orchestration: Your code coordinates multiple API calls (Assistants or Chat Completions)

Example: Parallel Due Diligence Review

// Due Diligence Multi-Agent Workflow (Conceptual)
// Uses Assistants API: openai.beta.threads.runs.create
 
const reviewTasks = [
  { name: 'contract-reviewer', assistantId: 'asst_xxx', instructions: 'Apply customer agreement playbook' },
  { name: 'ip-reviewer', assistantId: 'asst_yyy', instructions: 'Apply IP agreement playbook' },
  { name: 'employment-reviewer', assistantId: 'asst_zzz', instructions: 'Apply employment agreement playbook' },
  { name: 'litigation-reviewer', assistantId: 'asst_www', instructions: 'Assess litigation exposure and reserves' }
];
 
// Create a thread per task (or reuse); start the runs in parallel
const results = await Promise.all(
  reviewTasks.map(async (task) => {
    const thread = await openai.beta.threads.create();
    const run = await openai.beta.threads.runs.create(thread.id, {
      assistant_id: task.assistantId,
      additional_instructions: task.instructions
    });
    return { task: task.name, threadId: thread.id, runId: run.id };
  })
);
 
// Poll each run to completion, then read each thread's messages to collect
// the reviewers' findings (retrieval code omitted in this sketch)
const findings = results.map((r) => `[${r.task}: final message from thread ${r.threadId}]`);
 
// Synthesize the combined findings via Chat Completions
const synthesis = await openai.chat.completions.create({
  model: "gpt-4", // or current model
  messages: [{
    role: "user",
    content: `Synthesize the following due diligence findings into: Executive Summary, Critical Issues, Risk Matrix, Recommended Deal Adjustments.\n\n${findings.join("\n\n")}`
  }]
});
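The polling step referenced in the comment above can be factored into a helper. A sketch assuming the `runs.retrieve(threadId, runId)` call shape used elsewhere in this tutorial; the interval and timeout defaults are arbitrary.

```javascript
// Conceptual: poll an Assistants run with backoff until it reaches a terminal state.
// `client` is assumed to expose client.beta.threads.runs.retrieve(threadId, runId).
const TERMINAL_STATUSES = new Set(["completed", "failed", "cancelled", "expired"]);

async function pollRun(client, threadId, runId, { intervalMs = 1000, maxMs = 120000 } = {}) {
  const deadline = Date.now() + maxMs;
  let wait = intervalMs;
  while (Date.now() < deadline) {
    const run = await client.beta.threads.runs.retrieve(threadId, runId);
    if (TERMINAL_STATUSES.has(run.status)) return run; // caller checks run.last_error
    await new Promise((resolve) => setTimeout(resolve, wait));
    wait = Math.min(wait * 2, 10000); // exponential backoff, capped at 10s
  }
  throw new Error(`Run ${runId} did not finish within ${maxMs}ms`);
}
```

Check `run.status` and `run.last_error` on the returned object before reading the thread's messages.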

Example: Research + Draft Workflow

// Legal Research + Drafting Multi-Step Workflow (Conceptual)
// For web search, see Responses API or built-in tools: platform.openai.com/docs/guides/tools
 
async function researchAndDraft(topic, jurisdiction, outputType) {
  // Stage 1: Research (add tools for web search; see Responses API or built-in tools)
  const research = await openai.chat.completions.create({
    model: "gpt-4", // or current model
    messages: [{
      role: "user",
      content: `Research ${topic} under ${jurisdiction} law. Provide comprehensive analysis with citations. Format as structured legal memorandum outline.`
    }]
  });
 
  // Stage 2: Draft (uses research output)
  const draft = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [
      { role: "user", content: `Research: ${research.choices[0].message.content}` },
      { role: "user", content: `Draft a ${outputType} addressing ${topic}. Include all relevant citations. Follow firm style guide.` }
    ]
  });
 
  // Stage 3: Review
  const review = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [{
      role: "user",
      content: `Review this draft for: 1) Legal accuracy 2) Citation completeness 3) Style compliance 4) Missing analysis. Draft: ${draft.choices[0].message.content}`
    }]
  });
 
  return { research, draft, review };
}

Part 5: Packaging Skills for Distribution

Custom GPT Sharing

  • Private: Only you can use
  • Organization: Share with ChatGPT Team/Enterprise
  • Public: Publish to GPT Store (use with caution for legal workflows)

Assistants API for Programmatic Use

For firm-wide deployment:

  1. Create Assistants via API with your skill instructions
  2. Store assistant IDs in your application
  3. Use threads for matter isolation
  4. Add file search for knowledge base

Skill Package Structure (Assistants API)

When packaging skills for programmatic use, organize assets as:

legal-contract-assistant/
├── instructions.md           # Main skill instructions (Assistant instructions)
├── playbook.json             # Standard positions, risk thresholds
├── clause-library.md         # Approved redline language
├── examples/
│   ├── good-review.md        # Calibration reference
│   └── bad-review.md         # Anti-pattern to avoid
└── README.md                 # Setup and usage notes

Distribution Checklist

Before sharing a Custom GPT or Assistant:

  • Instructions reviewed for accuracy and firm standards
  • Knowledge files (playbook, clause library) are current
  • No client names or matter IDs in any uploaded content
  • Sharing scope appropriate (Private vs Organization)
  • Version/date noted in GPT name or description

Security Considerations

  • Source verification: Only install skills from trusted sources
  • Code review: Review all integration code before deployment
  • No client data: Never include client data in skill files
  • Version control: Track changes to instructions
  • Access control: Limit who can modify firm GPTs/Assistants

Part 6: Security and Compliance

Skill and GPT Security

  • Source verification: Only install skills or GPTs from trusted sources
  • Code review: Review all integration code and API wrappers before deployment
  • No client data: Never include client names, matter IDs, or privileged content in GPT instructions or knowledge files
  • Version control: Track changes to instructions; note version/date in GPT name or description
  • Access control: Limit who can modify firm GPTs; use Organization sharing, not Public

Data Protection Patterns

API Wrapper for Audit + Sanitization (conceptual):

// Conceptual: sanitize output before returning to user
async function safeLegalResponse(prompt, options) {
  const response = await openai.chat.completions.create({
    model: "gpt-4", // or current model (e.g., gpt-4o)
    messages: [{ role: "user", content: prompt }],
    ...options
  });
  const content = response.choices[0].message.content;
  // Crude illustrative pattern (e.g., "ACME Corp"); use a real PII detector in production
  const sanitized = content.replace(/\b[A-Z]{2,}\s+[A-Z][a-z]+\b/g, "[REDACTED]");
  await auditLog.write({ /* ... */ });
  return { ...response, sanitizedContent: sanitized };
}

Compliance Requirements Checklist

  • Skills/GPTs reviewed by IT security
  • Guardrails tested in sandbox environment
  • Audit logging enabled and retention policy defined
  • Client data segregation verified (no cross-matter leakage)
  • Access controls configured for Custom GPTs and Assistants
  • Backup and recovery procedures documented


Part 7: Troubleshooting

Common Issues

| Issue | Cause | Fix |
| --- | --- | --- |
| Skill not activating | Instructions too long or vague | Put activation criteria at top; use clear "Apply when" section |
| Inconsistent output | No output format in instructions | Add "OUTPUT FORMAT" with required structure |
| Guardrail bypassed | Wrapper not applied to all code paths | Ensure all OpenAI calls go through wrapper |
| Assistants timeout | Long-running runs | Use polling with backoff; consider chunking work |
| Wrong model behavior | Conflicting instructions | Simplify; remove redundant or contradictory rules |
| Token limit exceeded | Large context + long output | Summarize intermediate steps; use multiple runs |
| Custom GPT ignores knowledge files | Files not uploaded or wrong format | Re-upload; use supported formats (JSON, MD, TXT); check file size limits |
| Moderation API false positives | Legal terminology flagged | Use custom wrapper to whitelist known terms; log for review |

Assistants API Debugging Tips

  • Check run.status and run.last_error for failed runs
  • Use include parameter to retrieve message content and tool calls
  • For function calling, verify tool definitions match expected schema
  • Log usage (prompt_tokens, completion_tokens) to tune context size
  • For file search: ensure files are attached to the assistant; verify vector store is built
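These checks can be bundled into a small helper that summarizes a retrieved run for your logs. A sketch; the field names follow the Runs API objects mentioned above (`status`, `last_error`, `usage`, `required_action`).

```javascript
// Conceptual: summarize a retrieved Assistants run for debugging/audit logs.
// `run` is the object returned by runs.retrieve.
function describeRun(run) {
  const summary = {
    status: run.status,
    error: run.last_error ? `${run.last_error.code}: ${run.last_error.message}` : null,
    promptTokens: run.usage?.prompt_tokens ?? null,
    completionTokens: run.usage?.completion_tokens ?? null
  };
  if (run.status === "requires_action") {
    // Function calling: list the tool calls awaiting output submission
    summary.pendingTools = run.required_action?.submit_tool_outputs?.tool_calls
      ?.map((call) => call.function?.name) ?? [];
  }
  return summary;
}
```

Log the summary (not the raw run, which may embed document content) to keep audit records lean.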

Custom GPT Debugging Tips

  • Test with minimal instructions first, then add complexity
  • If output is truncated, add "Provide complete response" or split into smaller requests
  • Verify knowledge file content is referenced in instructions (e.g., "Apply positions from playbook.json")

When to Use Custom GPTs vs Assistants API

| Use Case | Prefer | Reason |
| --- | --- | --- |
| Ad-hoc legal review | Custom GPT | No code; quick setup |
| Firm-wide workflow | Custom GPT (Org) | Easy distribution |
| Automated pipelines | Assistants API | Programmatic control |
| Multi-step orchestration | Assistants API | Threads, tools; or Agents SDK for handoffs |
| Audit/compliance requirements | Assistants API + wrapper | Full control over logging |

Example: Output Format Instruction

Add to Custom GPT or Assistant instructions for consistent structure:

## Output Format
 
Structure every contract review response as:
 
1. **Executive Summary** (3-5 sentences)
2. **Deal Parameters Table** (counterparty, type, term, value)
3. **Clause-by-Clause Analysis** (issue | severity | recommendation)
4. **Risk Score** (RED/YELLOW/GREEN) and escalation recommendation
5. **Negotiation Priorities** (must-have vs nice-to-have)
6. **Questions for Business Team**

Do This Now

  • Create a custom skill (Custom GPT) for one of your firm's review processes
  • Add at least one guardrail (audit logging or output verification)
  • Test a multi-step workflow for document processing
  • Document your skill so your team can use it
  • Consider packaging as shared Custom GPT for distribution


Quick Reference: OpenAI Commands

# Custom GPTs
chat.openai.com -> Create GPT -> Configure instructions and knowledge

# Assistants API
openai.beta.assistants.create({ instructions, model, tools })
openai.beta.threads.create()
openai.beta.threads.runs.create(thread_id, { assistant_id })

# Moderation
openai.moderations.create({ input: user_content })

