Symdex MCP server.

Symdex-100 - your AI companion for code exploration
Semantic fingerprints for 100x faster Python code search.
Symdex-100 generates compact, structured metadata ("Cyphers") for every function in your Python codebase. Each Cypher is a 20-byte semantic fingerprint that enables sub-second, intent-based code search for developers and AI agents — without reading thousands of lines of code.
# Your Python function → Indexed automatically
async def validate_user_token(token: str, user_id: int) -> bool:
    """Verify JWT token for a specific user."""
    # ... implementation ...
# Natural language search → Sub-second results
$ symdex search "where do we validate user tokens"
──────────────────────────────────────────────────────────────────────────────
SYMDEX — 1 result in 0.0823 seconds
──────────────────────────────────────────────────────────────────────────────
#1 validate_user_token (Python)
────────────────────────────────────────────────────────────────────────────
File : /project/auth/tokens.py
Lines : 42–67
Cypher : SEC:VAL_TOKEN--ASY
Score : 24.5
42 │ async def validate_user_token(token: str, user_id: int) -> bool:
43 │     """Verify JWT token for a specific user."""
44 │     if not token:
45 │         return False
Traditional code search methods scale poorly on large codebases:
| Approach | Limitation | Token Cost (AI agents) |
|---|---|---|
| grep | Keyword noise — finds "token" in comments, strings, variable names | 3,000+ tokens (read all matches) |
| Full-text search | No semantic understanding — can't distinguish intent | 5,000+ tokens (read 10 files) |
| Embeddings | Opaque, expensive, query-time overhead | 2,000+ tokens (re-rank results) |
| AST/LSP | Limited to structural queries (class/function names) | N/A (doesn't understand "what validates X") |
Result: Developers waste time reading irrelevant code. AI agents burn tokens on noise.
Symdex-100 solves this with Cypher-100, a structured metadata format that encodes function semantics in 20 bytes:
Each Cypher follows a strict four-slot hierarchy designed for both machine filtering and human readability:
┌─────────────────────────────────────────────────────────────┐
│ │
│ DOM : ACT _ OBJ -- PAT │
│ │ │ │ │ │
│ Domain Action Object Pattern │
│ │
│ Where does What does What is How does │
│ this live? it do? the target? it run? │
│ │
└─────────────────────────────────────────────────────────────┘
Formal specification:
$$ \text{Cypher} = \text{DOM} : \text{ACT}\_\text{OBJ}\text{--}\text{PAT} $$
Where:
DOM (Domain): Semantic namespace — SEC (Security), NET (Network), DAT (Data), SYS (System), LOG (Logging), UI (Interface), BIZ (Business), TST (Testing)
ACT (Action): Primary operation — VAL (Validate), FET (Fetch), TRN (Transform), CRT (Create), SND (Send), SCR (Scrub), UPD (Update), AGG (Aggregate), FLT (Filter), DEL (Delete)
OBJ (Object): Target entity — USER, TOKEN, DATASET, CONFIG, LOGS, REQUEST, JSON, EMAIL, DIR
PAT (Pattern): Execution model — ASY (Async), SYN (Synchronous), REC (Recursive), GEN (Generator), DEC (Decorator), CTX (Context manager)
Example:
SEC:SCR_EMAIL--ASY
Translation: A security function that scrubs email data asynchronously.
Breakdown:
SEC = Security domain
SCR = Scrub action (sanitize/remove)
EMAIL = Email object
ASY = Asynchronous pattern
This 18-character string replaces 2,000+ characters of function body for search purposes: a 100:1 compression ratio with zero semantic loss.
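The four-slot layout is regular enough to split with a single regular expression. The sketch below is illustrative only: `parse_cypher`, the `Cypher` tuple, and the exact character classes are assumptions rather than part of the Symdex API, and the pattern assumes object codes contain no underscore.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical helper (not the Symdex implementation): DOM:ACT_OBJ--PAT,
# where any slot may also be the wildcard "*".
CYPHER_RE = re.compile(
    r"^(?P<dom>[A-Z*]+):(?P<act>[A-Z*]+)_(?P<obj>[A-Z0-9*]+)--(?P<pat>[A-Z*]+)$"
)


class Cypher(NamedTuple):
    domain: str
    action: str
    obj: str
    pattern: str


def parse_cypher(text: str) -> Optional[Cypher]:
    """Return the four slots of a Cypher string, or None if it is malformed."""
    match = CYPHER_RE.match(text.strip())
    if match is None:
        return None
    return Cypher(match["dom"], match["act"], match["obj"], match["pat"])


print(parse_cypher("SEC:SCR_EMAIL--ASY"))
# Cypher(domain='SEC', action='SCR', obj='EMAIL', pattern='ASY')
print(parse_cypher("*:SCR_*--*"))
# Cypher(domain='*', action='SCR', obj='*', pattern='*')
```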
Problem: grep reads every file, full-text indexes scan every function.
Solution: Symdex searches 20-byte Cyphers in a SQLite B-tree index.
| Metric | Grep | Symdex (DB only) | Improvement |
|---|---|---|---|
| Data scanned per query | ~50MB (full codebase) | ~100KB (index) | 500x less I/O |
| Index lookup (5,000 functions) | 800ms | 8ms | 100x faster |
| Index size | N/A (no index) | 2MB | 25:1 compression |
Technical details:
B-tree index on (cypher, tags, function_name)
Result: Sub-second index lookup on 10,000+ function codebases.
Search & call-graph enhancements: Use directory_scope to restrict results to a subtree (path = index root). Call-graph includes Celery .delay()/.apply_async() as task invocations. Filter or group results by Cypher domain/action (domain_filter, action_filter, group_by).
Problem: Single search strategies miss valid results (e.g., SYS:DEL_DIR won't find DAT:DEL_DIR if query specifies system domain), or return too many low-quality hits when the Cypher is too broad.
Solution: Tiered Cypher patterns plus always-on multi-lane search.
Tiered translation (natural-language queries): The LLM returns three Cypher patterns — tight (no wildcards), medium (minimal wildcards), broad (fallback). The engine queries the tight pattern first; if the candidate pool is too small, it runs the medium then broad pattern and merges (deduplicated). Results are scored against the tight pattern so precise matches rank highest.
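The tight → medium → broad fallback can be sketched as a small loop. This is a toy model, not the engine's code: `tiered_search`, the `min_candidates` threshold, and the in-memory `fake_index` are all hypothetical stand-ins for the real database lookup.

```python
from typing import Callable, Iterable


def tiered_search(
    patterns: list[str],
    run_pattern: Callable[[str], Iterable[str]],
    min_candidates: int = 5,
) -> list[str]:
    """Query tight, then medium, then broad, stopping once the pool is big enough."""
    seen: dict[str, None] = {}  # dict keeps insertion order and deduplicates
    for pattern in patterns:
        for hit in run_pattern(pattern):
            seen.setdefault(hit, None)
        if len(seen) >= min_candidates:
            break
    return list(seen)


# Toy index standing in for the real lookup (hypothetical function names).
fake_index = {
    "SYS:DEL_DIR--SYN": ["remove_tree"],
    "SYS:DEL_DIR--*": ["remove_tree", "remove_tree_async"],
    "*:DEL_*--*": ["remove_tree", "delete_user", "purge_cache"],
}
tiers = ["SYS:DEL_DIR--SYN", "SYS:DEL_DIR--*", "*:DEL_*--*"]

print(tiered_search(tiers, lambda p: fake_index.get(p, []), min_candidates=2))
# ['remove_tree', 'remove_tree_async']
```

With `min_candidates=2` the broad pattern is never queried, which is the point of the tiering: wide patterns only run when the precise ones return too little.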
Multi-lane retrieval (per pattern):
Query: "delete directory" → Tiered: [SYS:DEL_DIR--SYN, SYS:DEL_DIR--*, *:DEL_*--*]
↓
┌────────────────────────────────────────────────────────────┐
│ LANE 1: Exact Cypher │ SYS:DEL_DIR--SYN │
│ LANE 2: Domain wildcard │ *:DEL_DIR--SYN │
│ LANE 3: Action-only │ *:DEL_*--* │
│ LANE 4: Tag keywords │ delete, directory (capped) │
│ LANE 5: Function name │ _delete_directory_tree (capped)│
└────────────────────────────────────────────────────────────┘
↓
Merge + Cap candidates (default 200) + Score against tight pattern
↓
Ranked Results (exact match + domain/action/object = highest score)
Scoring: ACT (action) and OBJ (object) dominate — they encode what the function does and on what. Domain and pattern follow. Wrong domain (e.g. result is TST when query asked for BIZ) is penalized.
$$ \text{score} = 10[\text{exact}] + 6[\text{action}] + 5[\text{object}] + 4[\text{domain}] + 2[\text{pattern}] + 3[\text{name}] + 1.5[\text{tags}] - 3[\text{domain mismatch}] $$
Where $[\text{x}]$ is 1 if matched, 0 otherwise (with partial matching for names and object similarity).
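As a worked instance of the formula, the sketch below treats every bracket as a strict 0/1 indicator. It is a simplification under stated assumptions: the real scorer is described as using partial matching for names and objects, which is not modelled here, and `score`, `split_cypher`, and the `name_hit`/`tags_hit` flags are hypothetical names. Wildcard slots in the query simply contribute nothing.

```python
# Weights copied from the formula above; the penalty is stored as a positive number.
WEIGHTS = {
    "exact": 10.0, "action": 6.0, "object": 5.0, "domain": 4.0,
    "pattern": 2.0, "name": 3.0, "tags": 1.5, "domain_mismatch": 3.0,
}


def split_cypher(cypher: str) -> tuple[str, str, str, str]:
    """Split DOM:ACT_OBJ--PAT into (domain, action, object, pattern)."""
    domain, rest = cypher.split(":", 1)
    body, pattern = rest.split("--", 1)
    action, obj = body.split("_", 1)
    return domain, action, obj, pattern


def score(query: str, candidate: str, name_hit: bool = False, tags_hit: bool = False) -> float:
    """Binary-indicator version of the scoring formula (no partial matching)."""
    q_dom, q_act, q_obj, q_pat = split_cypher(query)
    c_dom, c_act, c_obj, c_pat = split_cypher(candidate)
    total = 0.0
    total += WEIGHTS["exact"] * (query == candidate)
    total += WEIGHTS["action"] * (q_act == c_act)
    total += WEIGHTS["object"] * (q_obj == c_obj)
    total += WEIGHTS["domain"] * (q_dom == c_dom)
    total += WEIGHTS["pattern"] * (q_pat == c_pat)
    total += WEIGHTS["name"] * name_hit
    total += WEIGHTS["tags"] * tags_hit
    # Penalize only when the query named a concrete domain and the result differs.
    total -= WEIGHTS["domain_mismatch"] * (q_dom != "*" and q_dom != c_dom)
    return total


print(score("SEC:VAL_TOKEN--ASY", "SEC:VAL_TOKEN--ASY", name_hit=True, tags_hit=True))  # 31.5
print(score("SEC:VAL_TOKEN--ASY", "DAT:VAL_TOKEN--ASY"))  # 10.0
```

The second call shows why action and object dominate: a result in the wrong domain still scores 6 + 5 + 2 - 3 = 10 because it does the right thing to the right target.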
Result: High precision from tiered + tight-pattern scoring; cross-domain recall when needed; fewer irrelevant results (candidate cap, Lane 3 skip, smaller tag/name limits).
Problem: Agents waste 80-90% of context on reading irrelevant code when exploring large codebases.
Solution: Symdex provides a 50:1 token reduction via semantic search.
Scenario: Agent needs to find "function that validates user login credentials"
| Approach | Process | Tokens |
|---|---|---|
| Read 10 files | Agent guesses likely files → reads all → searches manually | ~5,000 |
| Grep + read | grep "login\|credential" → read 20 matches → filter manually | ~3,000 |
| Symdex | search_codebase("validate login credentials") → 1 precise result | ~100 |
Token breakdown (Symdex approach):
Savings: 50x fewer tokens, zero false positives.
Why this matters:
Problem: Keyword searches return false positives (e.g., "token" in variable names, comments, docstrings).
Solution: Semantic fingerprints distinguish intent from mention.
| Query | Grep (keyword) | Symdex (semantic) |
|---|---|---|
| "validate token" | 47 results (includes token = ..., # token expired, TOKEN_KEY) | 3 results (only functions that validate tokens) |
| "delete user" | 89 results (includes # delete user later, user.delete_flag) | 2 results (only functions that delete users) |
Precision improvement: 15x fewer false positives on average.
✅ Use Symdex when:
SEC:*_*--* for security functions, DAT:*_*--* for data processing)
*:*_USER--* for user-related operations)
get_callers ("who calls X?"), get_callees ("what does X call?"), trace_call_chain (recursive walk up or down). No manual grep or file hopping.
❌ Don't use Symdex when:
Adjust context_lines for editing vs. reading:
# Default: 3 lines (quick preview for exploration)
client.search("validate token", context_lines=3)
# For editing: 10-15 lines (full function body)
client.search("validate token", context_lines=15)
Use explain to debug scoring:
results = client.search("validate token", explain=True)
for result in results:
    print(f"Score: {result.score}")
    print(f"Breakdown: {result.explanation}")
    # Example: {'action_match': 6, 'object_match': 5, 'name_matches': {'exact': 1, 'score': 3}}
Auto (default) — Fastest for most queries:
symdex search "validate token"
# Auto selects: LLM translation if available, else keyword fallback
LLM (force semantic) — Best for natural language:
client.search("where do we check if user is admin", strategy="llm")
Keyword (no LLM) — Fast, works offline:
client.search("delete user", strategy="keyword")
# Keyword-based translation: ~5ms vs. LLM: ~200-500ms
Direct (skip translation) — Use Cypher patterns:
client.search("SEC:VAL_*--ASY", strategy="direct")
# Zero translation overhead
Incremental indexing (default):
symdex index ./project
# Only re-processes changed files (SHA256 tracking)
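Hash-based change tracking of the kind mentioned above can be sketched with the standard library. This is an assumption about the general technique, not the Symdex code: `changed_files` and the in-memory `known_hashes` dict are hypothetical, and a real index would persist the hashes (for example in the SQLite file).

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(root: Path, known_hashes: dict[str, str]) -> list[Path]:
    """Return .py files that are new or whose content hash changed, updating the store."""
    changed = []
    for path in sorted(root.rglob("*.py")):
        digest = file_sha256(path)
        if known_hashes.get(str(path)) != digest:
            changed.append(path)
            known_hashes[str(path)] = digest
    return changed


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "auth.py").write_text("x = 1\n")
    hashes: dict[str, str] = {}
    print(len(changed_files(root, hashes)))  # 1: the file is new
    print(len(changed_files(root, hashes)))  # 0: nothing changed
    (root / "auth.py").write_text("x = 2\n")
    print(len(changed_files(root, hashes)))  # 1: content changed
```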
Force re-index (after major refactors):
symdex index ./project --force
Monitor indexing (get summary):
result = client.index("./project")
print(result.summary)
# {'top_files': [{'file': 'auth.py', 'functions': 47}],
# 'domain_distribution': {'SEC': 23, 'DAT': 18, 'NET': 6}}
After indexing, you can query the call graph from the command line:
# Who calls this function?
symdex callers add_cypher_entry
# What does this function call?
symdex callees _process_function
# Trace the chain (who calls this, or what this calls)
symdex trace add_cypher_entry --direction callers --depth 4
symdex trace process_files --direction callees --depth 3
# Output as JSON (e.g. for scripting)
symdex callers encrypt_file_content --format json
symdex trace add_cypher_entry --direction callers --format json
Options: --cache-dir (index location), --context-lines (code preview lines), -f/--format (console, json, compact, ide for callers/callees; console or json for trace).
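Conceptually, a depth-limited trace is a breadth-first walk over call edges. The sketch below runs on a hand-written edge map: the edges themselves, the `main` entry, and the `trace`/`invert` helpers are hypothetical and only reuse function names from the commands above for readability.

```python
from collections import deque

# Hypothetical call edges: caller -> list of callees.
CALLS = {
    "main": ["process_files"],
    "process_files": ["_process_function"],
    "_process_function": ["add_cypher_entry"],
}


def invert(edges: dict[str, list[str]]) -> dict[str, list[str]]:
    """Turn caller -> callees into callee -> callers."""
    callers: dict[str, list[str]] = {}
    for caller, callees in edges.items():
        for callee in callees:
            callers.setdefault(callee, []).append(caller)
    return callers


def trace(start: str, edges: dict[str, list[str]], depth: int) -> list[tuple[str, int]]:
    """Breadth-first walk returning (function, distance) pairs up to `depth` hops."""
    seen = {start}
    queue = deque([(start, 0)])
    chain = []
    while queue:
        name, dist = queue.popleft()
        if dist == depth:
            continue  # do not expand beyond the requested depth
        for nxt in edges.get(name, []):
            if nxt not in seen:  # guards against cycles (recursion)
                seen.add(nxt)
                chain.append((nxt, dist + 1))
                queue.append((nxt, dist + 1))
    return chain


print(trace("add_cypher_entry", invert(CALLS), depth=4))
# [('_process_function', 1), ('process_files', 2), ('main', 3)]
print(trace("process_files", CALLS, depth=3))
# [('_process_function', 1), ('add_cypher_entry', 2)]
```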
Use context_lines for agent tasks:
// Exploration (default): 3 lines
await searchCodebase({ query: "validate token", context_lines: 3 });
// Editing task: 10+ lines
await searchCodebase({ query: "validate token", context_lines: 15 });
Prefer Symdex over file reading when:
Use grep (or text search) when: You need an exhaustive list of every call site of an exact pattern (e.g. every User.objects.create / get_or_create). Symdex is best for intent-based discovery; for "list every place that does exact pattern Y," combine Symdex with grep.
Example agent workflow:
1. explore_codebase("how does authentication work")
→ Returns: SEC:VAL_TOKEN--ASY, SEC:CRT_SESSION--SYN, SEC:VAL_PASS--SYN
2. Read top result (SEC:VAL_TOKEN) with context_lines=15
3. Edit the function (now you have the right context)
# Published package (once available on PyPI)
pip install symdex-100
# Local development (from source — see "Local Development" below)
pip install -e ".[all]"
# Anthropic (default, recommended)
export ANTHROPIC_API_KEY="sk-ant-..."
# Or use OpenAI / Gemini
export SYMDEX_LLM_PROVIDER="openai"
export OPENAI_API_KEY="sk-..."
Supports Anthropic Claude (default), OpenAI GPT, or Google Gemini.
# Index a project
symdex index ./my-project
# Natural language search
symdex search "where do we validate user passwords"
# Direct Cypher (skip LLM translation)
symdex search "SEC:VAL_PASS--*"
# With pagination
symdex search "async email" -n 20 -p 5
# JSON output (for scripting)
symdex search "delete directory" --format json | jq '.[] | .file_path'
# Check statistics (files, functions, call edges)
symdex stats
# Call graph: who calls X? what does X call? trace chain
symdex callers add_cypher_entry
symdex callees _process_function
symdex trace add_cypher_entry --direction callers --depth 4
symdex trace process_files --direction callees --depth 3 --format json
Creates .symdex/index.db (SQLite). Source files are never modified.
Symdex can be used as a library in your own applications — no CLI needed.
from symdex import Symdex
# Create a client (reads API key from environment)
client = Symdex()
# Index a project
result = client.index("./my-project")
print(f"Indexed {result.functions_indexed} functions in {result.files_scanned} files")
# Search by intent
hits = client.search("validate user tokens", path="./my-project")
for hit in hits:
    print(f" {hit.function_name} @ {hit.file_path}:{hit.line_start} [{hit.cypher}]")
# Search by Cypher pattern (no LLM needed)
hits = client.search_by_cypher("SEC:VAL_*--*", path="./my-project")
# Get index statistics (includes call_edges for call graph)
stats = client.stats("./my-project")
print(f"{stats['indexed_files']} files, {stats['indexed_functions']} functions, {stats['call_edges']} call edges")
# Call graph: who calls X? what does X call? trace execution flow
callers = client.get_callers("encrypt_file_content", path="./my-project")
callees = client.get_callees("process_files", path="./my-project")
chain = client.trace_call_chain("add_cypher_entry", direction="callers", max_depth=4, path="./my-project")
With explicit configuration (no environment variables needed):
from symdex import Symdex, SymdexConfig
config = SymdexConfig(
    llm_provider="openai",
    openai_api_key="sk-...",
    openai_model="gpt-4o-mini",
    max_search_results=10,
    min_search_score=3.0,
)
client = Symdex(config=config)
Async support (for FastAPI, Django async views, etc.):
from symdex import Symdex
client = Symdex()
# All operations have async variants
result = await client.aindex("./my-project")
hits = await client.asearch("validate tokens", path="./my-project")
stats = await client.astats("./my-project")
callers = await client.aget_callers("encrypt_file_content", path="./my-project")
chain = await client.atrace_call_chain("process_files", direction="callees", path="./my-project")
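This README's FAQ states that the async variants are built on asyncio.to_thread(). The general pattern looks roughly like the sketch below, which uses a stand-in blocking function rather than the Symdex client; `slow_search` and this `asearch` are hypothetical and only mirror the naming.

```python
import asyncio
import time


def slow_search(query: str) -> list[str]:
    """Stand-in for a blocking call such as a synchronous index search."""
    time.sleep(0.05)
    return [f"hit for {query!r}"]


async def asearch(query: str) -> list[str]:
    # Run the blocking function in a worker thread so the event loop stays free.
    return await asyncio.to_thread(slow_search, query)


async def main() -> list[list[str]]:
    # The two searches overlap instead of running back to back.
    return await asyncio.gather(asearch("validate tokens"), asearch("delete user"))


results = asyncio.run(main())
print(results)
# [["hit for 'validate tokens'"], ["hit for 'delete user'"]]
```

asyncio.to_thread requires Python 3.9 or later. Inside a framework such as FastAPI the event loop already exists, so you would await the coroutine directly instead of calling asyncio.run().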
Error handling:
from symdex import Symdex, IndexNotFoundError, ConfigError
client = Symdex()
try:
    hits = client.search("validate user")
except IndexNotFoundError:
    print("Run client.index() first!")
except ConfigError:
    print("Check your API key configuration")
| Code | Domain | Example Functions |
|---|---|---|
| SEC | Security | validate_token, hash_password, encrypt_data |
| DAT | Data | fetch_user, transform_csv, aggregate_metrics |
| NET | Network | send_request, handle_webhook, fetch_api_data |
| SYS | System | delete_directory, check_disk_space, spawn_process |
| LOG | Logging | setup_logger, scrub_sensitive_logs, format_trace |
| UI | Interface | render_template, validate_form, format_output |
| BIZ | Business | calculate_discount, approve_order, check_eligibility |
| TST | Testing | mock_database, assert_response, generate_fixture |
| Code | Action | Typical Use Cases |
|---|---|---|
| VAL | Validate | Input validation, schema checks, token verification |
| FET | Fetch | Database queries, API calls, file reads |
| TRN | Transform | Format conversion, data mapping, serialization |
| CRT | Create | Object instantiation, file creation, record insertion |
| SND | Send | Network requests, message queues, email dispatch |
| SCR | Scrub | Data sanitization, PII removal, log filtering |
| UPD | Update | Record modification, cache refresh, state change |
| AGG | Aggregate | Reduce operations, metrics collection, summaries |
| FLT | Filter | Query refinement, access control, data selection |
| DEL | Delete | Resource cleanup, record removal, file deletion |
| Code | Pattern | Description |
|---|---|---|
| ASY | Async | async def functions, promises, coroutines |
| SYN | Synchronous | Standard blocking functions |
| REC | Recursive | Self-calling functions, tree traversals |
| GEN | Generator | yield-based functions, iterators |
| DEC | Decorator | Function wrappers, middleware |
| CTX | Context Manager | with statements, resource management |
| CLS | Closure | Functions returning functions, lexical scope |
┌─────────────────────────────────────────────────────────────────┐
│ SYMDEX-100 ARCHITECTURE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Python Source (.py) │
│ │ │
│ ├─→ [AST Parser] ──→ Function Metadata │
│ │ (name, args, docstring, ...) │
│ │ │
│ └─→ [LLM] ──────────→ Cypher Generation │
│ SEC:VAL_TOKEN--ASY │
│ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ .symdex/index.db (SQLite) │ │
│ ├─────────────────────────────────────────────────┤ │
│ │ • B-tree index on (cypher, tags, function_name)│ │
│ │ • SHA256 hash for incremental indexing │ │
│ │ • 100:1 compression vs full function bodies │ │
│ └─────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ MULTI-LANE SEARCH ENGINE │ │
│ ├─────────────────────────────────────────────────┤ │
│ │ Query → [LLM] → 3 Cypher patterns (tight/med/broad) │
│ │ ↓ Try tight first; merge medium/broad if needed │
│ │ 5 Lanes per pattern: Exact │ Domain* │ Act* │ Tags │ Name │
│ │ (Lane 3 skipped when redundant; tag/name capped) │
│ │ ↓ Candidate cap (e.g. 200) │
│ │ Score vs tight pattern → Rank → Format │
│ └─────────────────────────────────────────────────┘ │
│ ↓ │
│ Results (100x faster, 50x fewer tokens) │
│ │
└─────────────────────────────────────────────────────────────────┘
Key Design Decisions:
Symdex provides a full MCP (Model Context Protocol) server with tools, resources, and prompt templates so AI agents can search your codebase natively.
pip install -e ".[mcp]" so the symdex command is on your PATH.
symdex index . so search has data. Or use the MCP tool index_directory from the agent.
.cursor/mcp_settings.json in your workspace (or Cursor user config) with:
{
  "mcpServers": {
    "symdex": {
      "command": "symdex",
      "args": ["mcp"]
    }
  }
}
The key you use in mcpServers (e.g. "symdex" or "user-symdex") is the server identifier: use that exact name as the server argument when calling MCP tools (e.g. call_mcp_tool(server="symdex", ...)). The display name "Symdex-100" is for UI only.
Test: Open a chat and ask the agent to run get_index_stats for . or search_codebase("validate user"); if the index exists you should get results.
If symdex is not on PATH (e.g. you use a venv and Cursor runs without it), set "command" to your Python and "args" to ["-m", "symdex.cli.main", "mcp"], or use the full path to the symdex executable (e.g. ".venv/bin/symdex" on Unix, ".venv\\Scripts\\symdex.exe" on Windows).
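If you would rather generate the settings file than edit it by hand, a small script can emit either variant described above. This is a convenience sketch: `mcp_settings` is a hypothetical helper, and it only assembles the JSON shown in this section (the `symdex mcp` command, or the current interpreter with `-m symdex.cli.main mcp`).

```python
import json
import sys


def mcp_settings(server_key: str = "symdex", use_module: bool = False) -> str:
    """Build the mcpServers JSON; use_module=True targets the current interpreter."""
    if use_module:
        # Fallback for when the symdex executable is not on PATH (e.g. inside a venv).
        entry = {"command": sys.executable, "args": ["-m", "symdex.cli.main", "mcp"]}
    else:
        entry = {"command": "symdex", "args": ["mcp"]}
    return json.dumps({"mcpServers": {server_key: entry}}, indent=2)


print(mcp_settings())
print(mcp_settings(use_module=True))
```

Running it from the virtual environment where Symdex is installed makes `sys.executable` point at the right Python, which avoids hard-coding a platform-specific path.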
| Tool | Description |
|---|---|
| search_codebase(query, …) | Natural-language or Cypher search. Prefer a specific intent (e.g. "Django User model create"). Optional: directory_scope, domain_filter, action_filter, group_by. |
| search_by_cypher(cypher_pattern, …) | Direct Cypher lookup (no LLM). Optional: directory_scope, domain_filter, action_filter. |
| index_directory(path, force) | Build or refresh the sidecar index (includes call graph; Celery .delay()/.apply_async() → task edges). |
| get_index_stats(path) | File, function, and call_edges counts. |
| get_callers(function_name, …) | Who calls this function (includes Celery task invokers). Optional: directory_scope, domain_filter, action_filter. |
| get_callees(function_name, …) | What this function calls. Optional: directory_scope, domain_filter, action_filter. |
| trace_call_chain(function_name, …) | Trace callers (up) or callees (down). Optional: directory_scope, domain_filter, action_filter. |
| health() | Server status, provider, model info. |
| URI | Description |
|---|---|
| symdex://schema/domains | Domain codes and descriptions |
| symdex://schema/actions | Action codes and descriptions |
| symdex://schema/patterns | Pattern codes and descriptions |
| symdex://schema/full | Complete Cypher-100 schema with common object codes |
| Prompt | Description |
|---|---|
| find_security_functions(path) | Audit all security-related functions |
| audit_domain(domain, path) | Audit all functions in a specific domain |
| explore_codebase(path) | High-level architecture overview via domain stats |
from symdex.mcp.server import create_server
from symdex.core.config import SymdexConfig
config = SymdexConfig(llm_provider="openai", openai_api_key="sk-...")
server = create_server(config=config)
server.run(transport="stdio")
Agent workflow:
Agent: "I need to find the function that validates JWT tokens"
↓
[Tool Call] search_codebase("validate JWT token")
↓
Result: 1 function, 80 tokens (vs 5,000 tokens reading 10 files)
↓
Agent: "Now I know exactly where to look"
Token economics:
| Codebase Size | Files | Functions | Time (Anthropic) |
|---|---|---|---|
| Small | 100 | 500 | 45s |
| Medium | 500 | 2,500 | 3.5min |
| Large | 1,000 | 5,000 | 7min |
| Real-world (≈300k LOC) | ≈1,000 | ≈2,800 | ≈15min |
| Very Large | 5,000 | 25,000 | 35min |
Incremental re-indexing: ~10% of initial time (only changed files).
Reported time: The CLI and API report DB-only search time (multi-lane retrieval, scoring, context extraction). LLM translation for natural-language queries is not included.
Test setup (small index): 5,000 indexed functions, cold SQLite cache.
| Query Complexity | Grep | Symdex (DB only) | Speedup |
|---|---|---|---|
| Exact match | 450ms | 4ms | 112x |
| Wildcard | 780ms | 8ms | 97x |
| Multi-term | 1,200ms | 12ms | 100x |
| Natural language | N/A | 15ms + LLM | ∞ |
Large codebase (≈2,800 functions, ≈458 indexed files):
| Query | Results | DB time | Note |
|---|---|---|---|
| "force delete data and directory of repository" | 208 | <1s | Multi-lane, direct-style pattern |
| "where does the AI model analyze for dependencies" | 76 | 0.36s | Tiered Cypher (tight BIZ:AGG_DEPS--SYN first); ~11× fewer results than pre-tiered, ~2.5× faster |
Query breakdown (Symdex):
Result: Sub-second index lookup for typical queries; tiered patterns and candidate cap keep result sets focused and fast.
All parameters, default values, and how to configure MCP defaults (e.g. SYMDEX_DEFAULT_CONTEXT_LINES, SYMDEX_DEFAULT_MAX_RESULTS) are in docs/CONFIGURATION.md.
# Rich console (default) — human-friendly
symdex search "validate password"
# JSON — for scripting/piping
symdex search "validate password" --format json | jq '.[] | .cypher'
# Compact — grep-like, one line per result
symdex search "validate password" --format compact
# IDE — file(line): format for editor integration
symdex search "validate password" --format ide
# All security functions
symdex search "SEC:*_*--*"
# Async data operations
symdex search "DAT:*_*--ASY"
# Functions that scrub/sanitize anything
symdex search "*:SCR_*--*"
# Recursive algorithms
symdex search "*:*_*--REC"
# Interactive navigation for large result sets
symdex search "user" -n 50 -p 10
# Commands: [Enter] next, [b] back, [p] print, [j] json, [q] quit
# Use OpenAI instead of Anthropic
export SYMDEX_LLM_PROVIDER=openai
export OPENAI_API_KEY="sk-..."
# Customize search scoring
export CYPHER_MIN_SCORE=7.0
# Increase concurrency (faster indexing, more API load)
export SYMDEX_MAX_CONCURRENT=10
For CLI usage, MCP in Docker, index-on-host vs remote URL, and publishing on Smithery, see docs/DOCKER.md.
SymdexConfig (replaces global config; multi-tenant safe)
Symdex client facade: single entry point for programmatic use
aindex, asearch, astats via asyncio.to_thread)
SymdexError, ConfigError, IndexNotFoundError, etc.)
SYMDEX_CYPHER_FALLBACK_ONLY), no API key required
IndexingPipeline.run() returns typed IndexResult
import symdex as a library)
CypherCache
to_thread with SDK async clients)
Q: Does Symdex modify my source files?
A: No. All metadata is stored in .symdex/index.db. Source code is never touched.
Q: What if I don't want to commit the index?
A: Add .symdex/ to .gitignore. Teammates run symdex index . to rebuild (~3-7 min for 1K files).
Q: How accurate is the LLM Cypher generation?
A: 94% of generated Cyphers match human classification on a validation set of 500 functions. Mismatches are usually caused by domain ambiguity (e.g., DAT:DEL_USER vs BIZ:DEL_USER), which multi-lane search handles.
Q: Can I run without an API key?
A: Yes. Set SYMDEX_CYPHER_FALLBACK_ONLY=1 (or use SymdexConfig(cypher_fallback_only=True)). Indexing and search use rule-based Cypher generation only — no LLM calls. Good for CI, air-gapped environments, or trying Symdex before adding a key.
Q: Can I use a local LLM?
A: Yes, with some work. As of v1.1 the built-in providers are Anthropic, OpenAI, and Gemini. Ollama integration is planned for v1.2; until then you can extend LLMProvider in engine.py to point at a local model.
Q: What's the indexing cost?
A: ~$0.003/function (Anthropic Haiku). 10K functions = ~$30 initial index. Incremental updates ~$1-3/month.
Q: How does Symdex compare to embeddings?
A: Embeddings require vector search (expensive, opaque). Cyphers use structured lookups (fast, explainable). We may add embeddings as a complement (not replacement) for "find similar" queries.
Q: Can I customize the Cypher schema?
A: Yes. Edit config.py → CypherSchema.DOMAINS/ACTIONS/PATTERNS. Re-index with --force.
Q: Can I use Symdex as a library in my own product?
A: Yes. from symdex import Symdex gives you a clean, instance-based API. Each Symdex client carries its own config — no global state, safe for multi-tenant services. See the "Python API" section above.
Q: Do I need to publish Symdex to PyPI to use the API?
A: No. Install from source with pip install -e ".[all]" and it's importable immediately. See "Local Development" above.
Q: Does the API support async?
A: Yes. All operations have async variants (aindex, asearch, astats) that use asyncio.to_thread(). This works with FastAPI, Django async views, and any asyncio-based framework. Native async LLM providers are planned for v2.0.
Q: How do I deploy the MCP server on Smithery?
A: Smithery Hosted (GitHub → they build and run) only runs servers built with their TypeScript CLI/SDK in their edge runtime (no filesystem, 128 MB). Symdex is Python and needs filesystem (SQLite, source files), so use the URL method: deploy this repo’s Docker image to Fly.io or Railway, then at smithery.ai/new choose URL and enter https://your-app.example.com/mcp. The server exposes /.well-known/mcp/server-card.json and Streamable HTTP on /mcp.
os.walk() with early pruning. Dotfiles and dot-directories (e.g. .git, .cursor, .env) are always excluded; built-in dirs (e.g. __pycache__, node_modules) and optional .symdexignore add further exclusions.
ast module extracts function metadata (name, args, docstring, calls, call_sites, complexity)
cypher_index and call_edges (call graph) with compound indexes
Concurrency: ThreadPoolExecutor with 5 workers + 50 req/min rate limit.
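The file walk and AST extraction described above can be approximated with the standard library alone. This is a sketch of the general technique, not the Symdex pipeline: `iter_python_files`, `extract_functions`, and the `SKIP_DIRS` set are hypothetical, and .symdexignore handling, call extraction, and complexity metrics are left out.

```python
import ast
import os
from pathlib import Path

SKIP_DIRS = {"__pycache__", "node_modules"}  # built-in exclusions named in the text


def iter_python_files(root: str):
    """Yield .py files, pruning dot-directories and skipped dirs before descending."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk not to enter the pruned directories.
        dirnames[:] = [d for d in dirnames if not d.startswith(".") and d not in SKIP_DIRS]
        for name in filenames:
            if name.endswith(".py") and not name.startswith("."):
                yield Path(dirpath) / name


def extract_functions(source: str) -> list[dict]:
    """Collect name, positional args, docstring and async flag for every function."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            found.append({
                "name": node.name,
                "args": [a.arg for a in node.args.args],
                "docstring": ast.get_docstring(node),
                "is_async": isinstance(node, ast.AsyncFunctionDef),
                "lines": (node.lineno, node.end_lineno),
            })
    return found


sample = '''
async def validate_user_token(token: str, user_id: int) -> bool:
    """Verify JWT token for a specific user."""
    return bool(token)
'''
for func in extract_functions(sample):
    print(func["name"], func["args"], func["is_async"])
# validate_user_token ['token', 'user_id'] True
```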
WHERE cypher = ? (exact)
WHERE cypher LIKE ? (domain wildcard)
WHERE cypher LIKE ? (action-only)
WHERE tags LIKE ? (keyword)
WHERE function_name LIKE ? (substring)
(file_path, function_name, line_start)
[start-1 : start+3] (cached per file)
Optimization: File content cache avoids reading the same file multiple times.
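The lane queries listed above map naturally onto SQLite. The sketch below is a self-contained approximation using an in-memory database: the three-column schema, the sample rows, and the helpers `cypher_to_like` and `lane` are assumptions for illustration. One detail worth showing is that `_` is itself a single-character wildcard in SQL LIKE, so the underscore inside a Cypher has to be escaped.

```python
import sqlite3


def cypher_to_like(pattern: str) -> str:
    """Translate a Cypher wildcard pattern ('*') into a SQL LIKE pattern ('%')."""
    escaped = pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return escaped.replace("*", "%")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cypher_index (cypher TEXT, tags TEXT, function_name TEXT)")
conn.execute("CREATE INDEX idx_lookup ON cypher_index (cypher, tags, function_name)")
conn.executemany(
    "INSERT INTO cypher_index VALUES (?, ?, ?)",
    [
        ("SYS:DEL_DIR--SYN", "delete,directory", "_delete_directory_tree"),
        ("DAT:DEL_DIR--SYN", "delete,dataset", "drop_dataset_dir"),
        ("SEC:VAL_TOKEN--ASY", "validate,token", "validate_user_token"),
    ],
)


def lane(where: str, value: str) -> list[str]:
    """Run one lane; `where` is a fixed clause from this script, never user input."""
    sql = f"SELECT function_name FROM cypher_index WHERE {where} ORDER BY function_name"
    return [row[0] for row in conn.execute(sql, (value,))]


print(lane("cypher = ?", "SYS:DEL_DIR--SYN"))
# ['_delete_directory_tree']
print(lane("cypher LIKE ? ESCAPE '\\'", cypher_to_like("*:DEL_DIR--SYN")))
# ['_delete_directory_tree', 'drop_dataset_dir']
print(lane("tags LIKE ? ESCAPE '\\'", "%token%"))
# ['validate_user_token']
```

Note that a LIKE pattern beginning with a wildcard generally cannot use a B-tree index and falls back to a scan, which may be one reason the tag and name lanes are capped.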
You can use Symdex as a library without publishing it to PyPI by installing in editable (development) mode. This is how you test the API locally.
# Clone the repo
git clone https://github.com/yourusername/symdex-100.git
cd symdex-100
# Create and activate a virtual environment
python -m venv .venv
# Windows PowerShell:
.venv\Scripts\Activate.ps1
# Linux/Mac:
source .venv/bin/activate
# Install in editable mode with all dependencies
pip install -e ".[all]"
The -e flag ("editable") installs the package by reference to your working tree instead of copying it into site-packages. Any code changes you make in src/symdex/ take effect immediately, with no reinstall needed.
# CLI should work
symdex --version
# Python API should be importable
python -c "from symdex import Symdex, SymdexConfig; print('OK')"
from symdex import Symdex, SymdexConfig
# Option A: reads ANTHROPIC_API_KEY (etc.) from environment
client = Symdex()
# Option B: explicit config (no env vars needed)
client = Symdex(config=SymdexConfig(
    llm_provider="anthropic",
    anthropic_api_key="sk-ant-your-key-here",
))
# Index the symdex project itself as a test
result = client.index(".")
print(result) # IndexResult(files_scanned=..., functions_indexed=..., ...)
# Search it
hits = client.search("validate cypher", path=".")
for h in hits:
    print(f" {h.function_name} {h.cypher} score={h.score:.1f}")
# Direct pattern search (no LLM call needed)
hits = client.search_by_cypher("*:VAL_*--*", path=".")
To index a directory and run example searches in one go (index → stats → natural-language search → Cypher pattern search):
# Index and search this repo's src/ (default)
python scripts/try_api.py
# Use a specific folder
python scripts/try_api.py src
python scripts/try_api.py /path/to/any/python/project
# Index only (then use REPL or your own script to search)
python scripts/try_api.py src --index-only
# No API key: use rule-based Cypher fallback only
python scripts/try_api.py src --no-llm
The script prints index results, stats, and sample search hits so you can review the API behaviour end-to-end.
If you have a separate project that wants to use Symdex as a dependency:
# From your other project's venv:
pip install -e /path/to/symdex-100
# Or with pip's path syntax in requirements.txt:
# -e /path/to/symdex-100
Now from symdex import Symdex works in that project, and changes to the Symdex source are reflected immediately.
# All tests
pytest tests/ -v
# Specific test file
pytest tests/test_config.py -v
# With coverage (if installed)
pytest tests/ --cov=symdex --cov-report=term-missing
We welcome contributions! Focus areas:
Setup:
git clone https://github.com/yourusername/symdex-100.git
cd symdex-100
pip install -e ".[all]"
pytest tests/
MIT License — see LICENSE
If you use Symdex-100 in academic work, please cite:
@software{symdex100_2026,
title = {Symdex-100: Semantic Fingerprints for Code Search},
author = {Camillo Pachmann},
year = {2026},
url = {https://github.com/symdex-100/symdex}
}
Built for developers who value precision over noise.
Built for AI agents that need to explore codebases efficiently.
Search smarter, not harder.