Consult7 MCP Server

Consult7 is a Model Context Protocol (MCP) server that enables AI agents to consult large context window models via OpenRouter for analyzing extensive file collections - entire codebases, document repositories, or mixed content that exceed the current agent's context limits.

Why Consult7?

Consult7 lets any MCP-compatible agent offload file analysis to large context models (up to 2M tokens). It is useful when:

The agent's current context is full
The task requires specialized model capabilities
You need to analyze a large codebase in a single query
You want to compare results from different models

"For Claude Code users, Consult7 is a game changer."

How it works

Consult7 collects files from the specific paths you provide (with optional wildcards in filenames), assembles them into a single context, and sends them to a large context window model along with your query. The result is directly fed back to the agent you are working with.
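
The collect-and-assemble step described above can be sketched as follows. This is a minimal illustration, not Consult7's actual implementation: the function name `assemble_context`, the `--- path ---` separator, and the `Query:` suffix are all assumptions made for the example.

```python
import glob
import os


def assemble_context(patterns, query):
    """Expand each pattern, read the matching files, and build one prompt string.

    Hypothetical helper: the separator format and prompt layout are assumptions.
    """
    parts = []
    for pattern in patterns:
        # sorted() keeps the file order stable between runs
        for path in sorted(glob.glob(pattern)):
            if os.path.isfile(path):
                with open(path, encoding="utf-8", errors="replace") as f:
                    parts.append(f"--- {path} ---\n{f.read()}")
    files_block = "\n\n".join(parts)
    return f"{files_block}\n\nQuery: {query}"
```

The returned string is what would be sent to the large context model in a single request.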

Example Use Cases

Quick codebase summary

Files: ["/Users/john/project/src/*.py", "/Users/john/project/lib/*.py"]
Query: "Summarize the architecture and main components of this Python project"
Model: "google/gemini-3-flash-preview"
Mode: "fast"

Deep analysis with reasoning

Files: ["/Users/john/webapp/src/*.py", "/Users/john/webapp/auth/*.py", "/Users/john/webapp/api/*.js"]
Query: "Analyze the authentication flow across this codebase. Think step by step about security vulnerabilities and suggest improvements"
Model: "anthropic/claude-opus-4.8"
Mode: "think"

Generate a report saved to file

Files: ["/Users/john/project/src/*.py", "/Users/john/project/tests/*.py"]
Query: "Generate a comprehensive code review report with architecture analysis, code quality assessment, and improvement recommendations"
Model: "google/gemini-2.5-pro"
Mode: "think"
Output File: "/Users/john/reports/code_review.md"
Result: Returns "Result has been saved to /Users/john/reports/code_review.md" instead of flooding the agent's context

Featured: Gemini 3.1 Models

Consult7 supports Google's Gemini 3.1 family, alongside Gemini 3 Flash:

Gemini 3.1 Pro (google/gemini-3.1-pro-preview) - Flagship reasoning model, 1M context
Gemini 3 Flash (google/gemini-3-flash-preview) - Ultra-fast model, 1M context
Gemini 3.1 Flash Lite (google/gemini-3.1-flash-lite-preview) - Ultra-fast lite model, 1M context

Quick mnemonics for power users:

gemt = Gemini 3.1 Pro + think (flagship reasoning)
gemf = Gemini 3 Flash + fast (ultra fast)
gptt = GPT-5.5 + think (latest GPT)
grot = Grok 4.20 + think (automatic reasoning)
oput = Claude Opus 4.8 + think (adaptive thinking)
ULTRA = Run GEMT, GPTT, GROT, and OPUT in parallel (4 frontier models)
FUSE = Fusion: a frontier panel deliberates and a judge synthesizes, in one call

These mnemonics make it easy to reference model+mode combinations in your queries.

Featured: Fusion (multi-model analysis)

Consult7 supports OpenRouter's Fusion (openrouter/fusion) — a single call where a panel of frontier models (Opus, GPT, Gemini Pro) answers your query in parallel and a judge model synthesizes their responses into one answer. Reach for it on hard questions where multiple perspectives help and the cost of being wrong outweighs a few extra completions.

Context: 128K — smaller than the 1M–2M single models, so it's best for hard questions on moderate input, not giant file bundles.
Mode → research depth: fast / mid / think map the panel's web-search/fetch budget to max_tool_calls of 2 / 8 / 16.
Mnemonic: FUSE = openrouter/fusion.

Trivial prompts are answered directly, without the panel; the panel fires only when the question warrants deliberation. Fusion is billed per panel run, so it costs more than a single-model call.
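
The mode-to-research-depth mapping for Fusion can be written down as a small lookup. The values come from the list above; the names `FUSION_MAX_TOOL_CALLS` and `fusion_tool_budget` are hypothetical and only serve the illustration.

```python
# Values taken from the text: fast / mid / think -> max_tool_calls 2 / 8 / 16.
FUSION_MAX_TOOL_CALLS = {"fast": 2, "mid": 8, "think": 16}


def fusion_tool_budget(mode):
    """Return the web-search/fetch budget for a Fusion call in the given mode."""
    try:
        return FUSION_MAX_TOOL_CALLS[mode]
    except KeyError:
        raise ValueError(f"mode must be one of {sorted(FUSION_MAX_TOOL_CALLS)}") from None
```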

Installation

Claude Code

Simply run:

claude mcp add -s user consult7 uvx -- consult7 your-openrouter-api-key

Claude Desktop

Add to your Claude Desktop configuration file:

{
  "mcpServers": {
    "consult7": {
      "type": "stdio",
      "command": "uvx",
      "args": ["consult7", "your-openrouter-api-key"]
    }
  }
}

Replace your-openrouter-api-key with your actual OpenRouter API key.

No installation required - uvx automatically downloads and runs consult7 in an isolated environment.

Command Line Options

uvx consult7 <api-key> [--test]

<api-key>: Required. Your OpenRouter API key
--test: Optional. Test the API connection

The model and mode are specified when calling the tool, not at startup.

Supported Models

Consult7 supports all 500+ models available on OpenRouter. Below are the flagship models with optimized dynamic file size limits:

Model	Context	Use Case
`openai/gpt-5.5`	1M	Latest GPT, balanced performance
`google/gemini-3.1-pro-preview`	1M	Flagship reasoning model
`google/gemini-3-flash-preview`	1M	Gemini 3 Flash, ultra fast
`google/gemini-3.1-flash-lite-preview`	1M	Ultra-fast lite model
`anthropic/claude-opus-4.8`	1M	Best quality, adaptive thinking
`anthropic/claude-sonnet-4.6`	1M	Excellent reasoning, fast
`anthropic/claude-haiku-4.5`	200k	Budget, very fast
`x-ai/grok-4.20`	2M	Automatic reasoning, huge context
`x-ai/grok-4.1-fast`	2M	Largest context window
`openrouter/fusion`	128k	Multi-model panel + judge (see Featured: Fusion)

Quick mnemonics:

gptt = openai/gpt-5.5 + think (latest GPT, deep reasoning)
gemt = google/gemini-3.1-pro-preview + think (Gemini 3.1 Pro, flagship reasoning)
grot = x-ai/grok-4.20 + think (Grok 4.20, automatic reasoning)
oput = anthropic/claude-opus-4.8 + think (Claude Opus, adaptive thinking)
opuf = anthropic/claude-opus-4.8 + fast (Claude Opus, no reasoning)
gemf = google/gemini-3-flash-preview + fast (Gemini 3 Flash, ultra fast)
ULTRA = call GEMT, GPTT, GROT, and OPUT in parallel (4 frontier models for maximum insight)
FUSE = openrouter/fusion (one call: a frontier panel deliberates, a judge synthesizes; mode sets web-research depth)
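
As an illustration, the mnemonics above could be expanded into model and mode pairs by a small wrapper like the one below. Whether Consult7 resolves mnemonics itself or leaves that to the calling agent is not stated here, so treat `MNEMONICS` and `expand` as hypothetical names. FUSE is left out because its mode is chosen per call.

```python
# Model IDs and modes are copied from the mnemonic list above.
MNEMONICS = {
    "gptt": ("openai/gpt-5.5", "think"),
    "gemt": ("google/gemini-3.1-pro-preview", "think"),
    "grot": ("x-ai/grok-4.20", "think"),
    "oput": ("anthropic/claude-opus-4.8", "think"),
    "opuf": ("anthropic/claude-opus-4.8", "fast"),
    "gemf": ("google/gemini-3-flash-preview", "fast"),
}

# ULTRA fans out to four separate consultations.
ULTRA = ("gemt", "gptt", "grot", "oput")


def expand(mnemonic):
    """Return a list of (model, mode) pairs for a mnemonic (case-insensitive)."""
    key = mnemonic.lower()
    if key == "ultra":
        return [MNEMONICS[m] for m in ULTRA]
    return [MNEMONICS[key]]
```

For example, `expand("ULTRA")` yields four pairs, each of which would become its own tool call.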

You can use any OpenRouter model ID (e.g., deepseek/deepseek-r1-0528). See the full model list. File size limits are automatically calculated based on each model's context window.

Performance Modes

fast: No reasoning - quick answers, simple tasks
mid: Moderate reasoning - code reviews, bug analysis
think: Maximum reasoning - security audits, complex refactoring

File Specification Rules

Absolute paths only: /Users/john/project/src/*.py
Wildcards in filenames only: /Users/john/project/*.py (not in directory paths)
Extension required with wildcards: *.py not *
Mix files and patterns: ["/path/src/*.py", "/path/README.md", "/path/tests/*_test.py"]
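
The rules above can be checked with a short validator. This is a sketch under stated assumptions: `validate_pattern` is a hypothetical name, the error strings are invented, and Consult7's own validation may be stricter or differ in detail. It uses `posixpath` because the examples in this document are POSIX-style paths.

```python
import posixpath


def validate_pattern(pattern):
    """Return None if the pattern follows the rules above, else an error string."""
    if not posixpath.isabs(pattern):
        return "path must be absolute"
    directory, filename = posixpath.split(pattern)
    if "*" in directory:
        return "wildcards are allowed in the filename only"
    if "*" in filename:
        # "*.py" splits into ("*", ".py"); a bare "*" has no extension
        _, ext = posixpath.splitext(filename)
        if not ext:
            return "a wildcard filename needs an extension"
    return None
```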

Common patterns:

All Python files: /path/to/dir/*.py
Test files: /path/to/tests/*_test.py or /path/to/tests/test_*.py
Multiple extensions: ["/path/*.js", "/path/*.ts"]

Automatically ignored: __pycache__, .env, secrets.py, .DS_Store, .git, node_modules

Size limits: Dynamic based on model context window (e.g., Grok 4.20: ~8MB, GPT-5.5: ~4MB)
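
The two example limits above are consistent with roughly 4 bytes of file content per token of context (2M tokens to ~8MB, 1M tokens to ~4MB). The sketch below encodes only that inferred ratio; the constant and function name are assumptions, and the real calculation may reserve room for the query and the response.

```python
# Assumption inferred from the examples in the text, not a documented constant.
BYTES_PER_TOKEN = 4


def estimated_size_limit_bytes(context_tokens):
    """Rough upper bound on total file size for a model's context window."""
    return context_tokens * BYTES_PER_TOKEN
```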

Tool Parameters

The consultation tool accepts the following parameters:

files (required): List of absolute file paths or patterns with wildcards in filenames only
query (required): Your question or instruction for the LLM to process the files
model (required): The LLM model to use (see Supported Models above)
mode (required): Performance mode - fast, mid, or think
output_file (optional): Absolute path to save the response to a file instead of returning it
- If the file already exists, the response is saved under a new name with an _updated suffix instead of overwriting it (e.g., report.md → report_updated.md)
- When specified, returns only: "Result has been saved to /path/to/file"
- Useful for generating reports, documentation, or analyses without flooding the agent's context
zdr (optional): Enable Zero Data Retention routing (default: false)
- When true, routes only to endpoints with ZDR policy (prompts not retained by provider)
- ZDR available: Gemini 3.1 Pro/Flash, Claude Opus 4.8, GPT-5, GPT-5.5
- Not available: Grok 4.20 (returns error)
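
The output_file naming rule described above can be sketched with pathlib. `resolve_output_path` is a hypothetical helper; what Consult7 does when the _updated file also exists is not specified here, so the sketch does not handle that case.

```python
from pathlib import Path


def resolve_output_path(output_file):
    """Return the path the response would be written to.

    If the requested file already exists, insert "_updated" before the extension.
    """
    path = Path(output_file)
    if path.exists():
        # report.md -> report_updated.md
        path = path.with_name(f"{path.stem}_updated{path.suffix}")
    return path
```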

Usage Examples

Via MCP in Claude Code

Claude Code will automatically use the tool with proper parameters:

{
  "files": ["/Users/john/project/src/*.py"],
  "query": "Explain the main architecture",
  "model": "google/gemini-3-flash-preview",
  "mode": "fast"
}

Via Python API

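import asyncio

from consult7.consultation import consultation_impl


async def main():
    # consultation_impl is awaited, so the call has to run inside an event loop
    result = await consultation_impl(
        files=["/path/to/file.py"],
        query="Explain this code",
        model="google/gemini-3-flash-preview",
        mode="fast",  # fast, mid, or think
        provider="openrouter",
        api_key="sk-or-v1-...",
    )
    print(result)


asyncio.run(main())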

Testing

# Test OpenRouter connection
uvx consult7 sk-or-v1-your-api-key --test

Uninstalling

To remove consult7 from Claude Code:

claude mcp remove consult7 -s user

Version History

v3.7.1

Surface mid-stream API errors: when OpenRouter sends an error as a streaming data chunk (after the initial 200), the call now returns that error message instead of a misleading "No content received".

v3.7.0

Added Fusion (openrouter/fusion) — a multi-model panel plus a judge in one call; mode maps to web-research depth (fast/mid/think → max_tool_calls 2/8/16). New FUSE mnemonic.
Upgraded Claude Opus 4.7 → 4.8 (1M context, adaptive thinking); oput/opuf now point to 4.8, and 4.7 is kept as a legacy ID.
The response footer now reports the call cost in USD (from OpenRouter usage accounting), e.g. cost: $0.0923.

v3.6.1

Toggle-reasoning footer now distinguishes mid vs think for adaptive models (Opus, Grok)
Friendlier error message when a model has no Zero Data Retention endpoint
output_file return now includes the metadata footer so callers can verify what ran

v3.6.0

Upgraded models: GPT-5.5, Claude Opus 4.7, Grok 4.20
Claude Opus 4.7 (1M context) uses adaptive thinking — reasoning.enabled=true
Grok 4.20 (2M context) uses automatic reasoning — reasoning.enabled=true
Updated mnemonics: gptt → GPT-5.5, oput/opuf → Claude Opus 4.7, grot → Grok 4.20
Legacy model IDs still supported

v3.5.0

Upgraded GPT-5.2 → GPT-5.4 (~1M context)

v3.4.0

Upgraded models: Gemini 3.1 Pro, Claude Opus 4.6, Claude Sonnet 4.6, Grok 4.1 Fast
Added new models: Claude Haiku 4.5, Gemini 3.1 Flash Lite
Updated mnemonics: gemt → Gemini 3.1 Pro, oput/opuf → Claude Opus 4.6
Legacy model IDs still supported

v3.3.0

Fixed GPT-5.2 thinking mode truncation issue (switched to streaming)
Added google/gemini-3-flash-preview (Gemini 3 Flash, ultra fast)
Updated gemf mnemonic to use Gemini 3 Flash
Added zdr parameter for Zero Data Retention routing

v3.2.0

Updated to GPT-5.2 with effort-based reasoning

v3.1.0

Added google/gemini-3-pro-preview (1M context, flagship reasoning model)
New mnemonics: gemt (Gemini 3 Pro), grot (Grok 4), ULTRA (parallel execution)

v3.0.0

Removed Google and OpenAI direct providers - now OpenRouter only
Removed |thinking suffix - use mode parameter instead (now required)
Clean mode parameter API: fast, mid, think
Simplified CLI from consult7 <provider> <key> to consult7 <key>
Better MCP integration with enum validation for modes
Dynamic file size limits based on model context window

v2.1.0

Added output_file parameter to save responses to files

v2.0.0

New file list interface with simplified validation
Reduced file size limits to realistic values

License

MIT
