A complete account of every quality dimension that separates a working MCP server from a well-built one, plus the skill that automates all of it.
Before MCP, an AI assistant only knew what it was trained on plus whatever you pasted into the chat. If you wanted it to check your database, call an API, or get a live weather reading, you were doing that yourself and copy-pasting the result. That workflow breaks at scale and breaks badly when the data changes fast.
MCP — Model Context Protocol — is an open protocol that lets AI clients call external tools and data sources directly. You write a server that exposes tools. AI clients discover and call those tools at inference time. The AI gets real, live data. You stop copy-pasting.
It's also not proprietary. Anthropic published the spec, and most major clients have picked it up — Claude, Cursor, Windsurf, Cline. Write one server and it runs in all of them.
Real examples already in production: the GitHub MCP lets Claude read and write code in your repos. The Stripe MCP lets it query payment data. Internal database MCPs at companies give their AI assistants direct access to proprietary data that never hit training. The pattern is the same everywhere — expose your data as tools, let the AI call them.
The protocol handles tool discovery, parameter validation, error responses, and streaming. You focus on writing the tools themselves.
I built seasons.kooexperience.com because I was tired of asking AI assistants about sakura bloom timing and getting responses based on averages from years ago. The actual bloom date shifts by weeks depending on the year. You need live data.
The MCP server wraps the JMA (Japan Meteorological Agency) and n-kishou APIs and exposes them as callable tools. Ask Claude when cherry blossoms will peak in Kyoto this year: it calls sakura.forecast, gets current data, and gives you a real answer.
Six data types: sakura and koyo forecasts from the Japan Meteorological Corporation (live, updated daily), short-range weather from JMA (live), and curated static datasets for flowers, fruit farms, and festivals — 1,700+ GPS-tagged locations in total.
When I first submitted to Smithery I got a 64. I could see the score but not the breakdown. That forced me to read the spec carefully and reverse-engineer what each dimension actually checks.
| Dimension | Points | What it checks |
|---|---|---|
| Tool descriptions | 12 | Every tool has a clear, verb-first description that names the resource and explains when to call it |
| Parameter descriptions | 11 | Every parameter has a description, type, and example where relevant — nothing left implicit |
| Annotations | 7 | Tools declare readOnlyHint, destructiveHint, openWorldHint — signals for how clients should treat the tool |
| Tool names (dot notation) | 5 | Tool names use domain.action format to form a navigable tree (e.g. sakura.forecast, not get_sakura_forecast) |
| Prompts | 5 | Server exposes at least one prompt that helps users understand how to interact with the server |
| Resources | 5 | Awarded automatically — Smithery's own tooltip says "Resources are always awarded full points" |
| Server metadata | 30 | Complete smithery.yaml with name, description, homepage, categories, and tags |
| Config UX | 25 | Config schema is minimal — no unnecessary required fields, sensible defaults for optional params |
| Server instructions | varies | The server sends an instructions string in initialize response — routing guidance for the AI |
| Static data & caching | bonus | Frequently-called data is cached with appropriate TTLs — reduces upstream API load and latency |
The server worked. All 12 tools returned correct data. But the quality score told a different story. What was actually wrong:
- Every tool name used a get_* prefix: no dot notation, no domain grouping
- No smithery.yaml beyond the bare minimum

Rewrote every tool description to be verb-first, specific, and useful. Added full parameter descriptions with types and examples. Added annotations to every tool using four constants (covered in section 8). Jumped 18 points in one pass.
The parameter descriptions alone were probably worth 8 of those points. I had 40+ parameters across 12 tools with no descriptions. That's a lot of missing signal.
Added an instructions string to the initialize response. This is the routing guide — it tells the AI when to use which tool, in what order, and for what kind of question. Wrote it like internal documentation for the model, not for humans.
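To make that concrete, here's an illustrative instructions string in the same spirit. The wording is invented, not the production text; only the tool names come from this server:

```typescript
// Illustrative server instructions (invented wording, not the production string).
// Written for the model: which tool to call, in what order, for what question.
const INSTRUCTIONS = [
  "This server provides live seasonal data for Japan.",
  "For cherry blossom timing, call sakura.forecast first for city-level dates,",
  "then sakura.spots to list specific parks with current bloom percentages.",
  "For autumn foliage, use koyo.forecast and koyo.spots the same way.",
  "Use weather.forecast for short-range weather only, not bloom predictions.",
].join(" ");
```

The string is passed once in the initialize response, so it costs nothing per call but shapes every routing decision the model makes.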
Also added two prompts: one for sakura season planning, one for koyo. These are pre-built interaction patterns that users can invoke directly from supported clients.
Wrote a complete smithery.yaml with name, description, homepage URL, categories (travel, weather, japan), tags, and a minimal config schema. The config UX dimension rewards you for keeping the setup friction low — no required API keys for a public read-only server means full marks there.
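A minimal smithery.yaml in that shape might look like the sketch below. Field values are illustrative and the exact schema should be checked against Smithery's current docs:

```yaml
# Illustrative sketch, not the production file. Check Smithery's docs for the
# current schema before copying.
name: japan-seasons-mcp
description: Live sakura, koyo, and weather forecasts for Japan as MCP tools.
homepage: https://seasons.kooexperience.com
categories:
  - travel
  - weather
  - japan
tags:
  - sakura
  - koyo
# Public read-only server: no API keys, so the config schema stays empty and
# the config UX dimension scores full marks.
```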
Stuck at 98 for longer than I'd like to admit. The fix was the last 2 points and the most non-obvious change in the whole process. Full story in the next section.
After stage 4, I was at 98. I knew I was missing 2 points somewhere in the tool naming dimension. I'd read the docs. I thought my names were fine. They followed the pattern, had underscores, were descriptive. Standard stuff.
I asked Grok to review the tool names. It suggested varying the verb prefixes — replace all get_ with a mix of list_, find_, and fetch_ depending on whether the tool retrieves a collection or a single item. That made semantic sense, so I tried it. Still 98.
Then I hovered the tooltip on the tool names section in the Smithery dashboard. The text read: "Tool names should form a navigable tree using dot-notation (e.g. admin.tools.list)".
That was it. Not underscores. Dots. The dimension is checking for a grouped, hierarchical naming scheme — the kind that lets a client build a tree view of your tools instead of a flat list of 12 similarly-named functions.
Before:
- get_sakura_forecast
- get_sakura_spots
- get_sakura_best_dates
- get_koyo_forecast
- get_koyo_spots
- get_koyo_best_dates
- get_kawazu_cherry
- get_weather_forecast
After:
- sakura.forecast
- sakura.spots
- sakura.best_dates
- koyo.forecast
- koyo.spots
- koyo.best_dates
- kawazu.cherry
- weather.forecast
Renamed all 12 tools. Resubmitted. 100.
Clients now see sakura.*, koyo.*, and weather.* as natural clusters, not a flat list of 12 equally-weighted names. The model can route more accurately because the naming itself is structured signal.
Here's what a tool registration looks like with the current SDK:
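The sketch below shows the pieces of one registration as plain data, so each part is visible. With @modelcontextprotocol/sdk this shape is what you hand to the server's tool-registration call; the schema here is hand-written JSON Schema rather than the SDK's zod helpers, and the handler body is illustrative:

```typescript
// Sketch of a tool registration, shown as plain data so each piece is visible.
// In the real SDK this shape is passed to the server's registration API.
const sakuraSpots = {
  name: "sakura.spots", // domain.action, not get_sakura_spots
  description:
    "Get live bloom percentages for cherry blossom spots across Japan. " +
    "Filter by prefecture. Call sakura.forecast first for city-level timing.",
  inputSchema: {
    type: "object",
    properties: {
      prefecture: {
        type: "string",
        description:
          "Optional prefecture name in English or Japanese to filter results. " +
          "Example: 'Tokyo', 'Kyoto', '東京'",
      },
    },
  },
  annotations: { readOnlyHint: true, openWorldHint: true }, // live upstream API
  handler: async (args: { prefecture?: string }) => ({
    content: [
      { type: "text", text: JSON.stringify({ filtered: args.prefecture ?? "all" }) },
    ],
  }),
};
```

Every scored dimension shows up in this one object: the dot-notation name, the verb-first description that names a sibling tool, the parameter description with examples, and the annotations.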
The description is what the AI reads to decide whether to call your tool and what to pass it. A bad description leads to wrong tool selection, wrong parameter choices, or the model hallucinating data instead of calling your tool at all. The score dimension rewards quality here because it correlates directly with usability.
Bad: "This tool provides information about sakura spots in Japan."

Good: "Get live bloom percentages for 1,012 cherry blossom spots across Japan. Filter by prefecture. Call sakura.forecast first for city-level timing, then use this for specific parks."
The same applies to parameters. For a parameter like prefecture, "Prefecture to filter by" is nearly useless. "Optional prefecture name in English or Japanese to filter results. Example: 'Tokyo', 'Kyoto', '東京'" tells the model the format, the optionality, and a concrete example. That's the difference between the model guessing and the model knowing.
Annotations are metadata attached to each tool that tells clients how to treat it. A client that knows a tool is read-only can call it without asking for user confirmation. A client that knows a tool is destructive should warn the user. These signals also flow through to logging, rate limiting, and audit trails.
I defined four constants and apply them to every tool. No guessing, no inconsistency:
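A sketch of those constants follows. READONLY and READONLY_EXTERNAL match the names used later in this post; the write-side pair is hypothetical, since this server has no write tools:

```typescript
// Annotation presets applied to every tool. READONLY and READONLY_EXTERNAL are
// the two japan-seasons-mcp uses; WRITE and DESTRUCTIVE are hypothetical here.
const READONLY = { readOnlyHint: true, destructiveHint: false, openWorldHint: false };
const READONLY_EXTERNAL = { readOnlyHint: true, destructiveHint: false, openWorldHint: true };
const WRITE = { readOnlyHint: false, destructiveHint: false, openWorldHint: true };
const DESTRUCTIVE = { readOnlyHint: false, destructiveHint: true, openWorldHint: true };
```

Defining them once means a new tool picks a constant instead of hand-writing three booleans, so the annotations stay consistent across all 12 tools.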
For japan-seasons-mcp, every tool uses either READONLY or READONLY_EXTERNAL. Tools that return cached static data (prefectures, flower types) use READONLY. Tools that make live API calls use READONLY_EXTERNAL — the openWorldHint: true flag is the signal.
One tool call without caching means one upstream API request. With 12 tools and potentially dozens of calls per session, that adds up fast. More importantly, most of the data doesn't change on a per-request basis — bloom percentages update once or twice a day, not on every query.
I defined TTL constants that match the actual update cadence of each data type:
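A sketch of that pattern is below. The object name TTL matches the one referenced in the evals later in this post, but the exact keys and durations here are assumptions matched to the cadences just described:

```typescript
// TTLs matched to how often each upstream dataset actually changes.
// Durations are illustrative assumptions, in milliseconds.
const TTL = {
  FORECASTS: 6 * 60 * 60 * 1000, // bloom data updates once or twice a day
  WEATHER: 1 * 60 * 60 * 1000,   // short-range weather: hourly is plenty
  STATIC: 24 * 60 * 60 * 1000,   // curated spot lists rarely change
} as const;
```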
The cache layer is a simple in-memory Map with timestamp-based invalidation. No Redis, no external dependency. For a server that runs in a single process and handles read-only data, that's all you need.
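A minimal sketch of that layer, assuming a getOrFetch helper like the one named in the evals:

```typescript
// In-memory cache: a Map keyed by request signature, invalidated by timestamp.
type Entry = { value: unknown; expiresAt: number };
const cache = new Map<string, Entry>();

async function getOrFetch<T>(
  key: string,
  ttlMs: number,
  fetchFn: () => Promise<T>,
): Promise<T> {
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) return hit.value as T; // fresh: skip upstream
  const value = await fetchFn();                                // stale/missing: refetch
  cache.set(key, { value, expiresAt: Date.now() + ttlMs });
  return value;
}
```

Each tool handler then wraps its upstream call, something like `getOrFetch("sakura:Kyoto", ttlMs, () => fetchSpots("Kyoto"))`, where the key and fetchSpots are hypothetical names for illustration.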
After going through the whole process manually, I wrote a Claude Code skill that captures every lesson. A skill is a markdown file that gives Claude a structured workflow — it loads into context and guides the session with specific steps, checklists, and patterns to follow.
The create-mcp skill has two paths: create a new server from scratch, or audit an existing one against the full checklist.
Both paths encode every fix covered in this post — dot notation naming, verb-first descriptions, annotation constants, TTL strategy, smithery.yaml structure — so you're not discovering them one at a time by watching your score move.
The audit step is where most of the value is. It runs through a checklist of every dimension — tool names (dot notation check), descriptions (verb-first check, max 2 sentence check), parameters (description present, example present), annotations (type assigned), server instructions (present and complete), smithery.yaml (all fields filled), prompts (at least one), resources (hosted registry check).
Installation is a single command from the skill's GitHub repo.
Once installed, invoke it in any Claude Code session with /create-mcp. It will ask whether you're creating or auditing, then walk you through the full workflow. Source and full documentation on GitHub.
Writing a checklist is easy. The test is whether it holds up on real tasks. I ran it against three independent scenarios — different APIs, one audit of a broken server — and graded the outputs against explicit assertions with a baseline for comparison.
| Scenario | Type | Pass rate | Time |
|---|---|---|---|
| CoinGecko crypto MCP | Create | 11/12 (92%) | 133s |
| Recipe MCP audit | Audit | 10/10 (100%) | 95s |
| OpenWeatherMap MCP | Create | 12/12 (100%) | 110s |
The one miss in the crypto eval was an ambiguous assertion about TTL constants — the grader read "TTL inside the handler body" as requiring the constant to be defined in the handler, not just used there. The actual code was correct: getOrFetch(key, TTL.PRICES, fn) called in every handler. Wording issue, not a skill issue.
The audit scenario is where the previous version of the skill had a real gap. It would identify all the problems and describe the fixes — but not always write package.json and the smithery config as output files. The revised skill makes this explicit: generate these files as output, not as a suggestion. The audit eval now verifies both files physically exist in the output directory.
The assertions were also tightened to catch placeholder output: smithery.yaml must contain no template values (no "author": "Your Name", no GitHub placeholder URLs), and prompt message text must name at least one tool by its dot notation name. The new skill passes both. The only useful assertion is one a bad output can't accidentally pass.
Publishing to Smithery is the highest-leverage directory because it scores your server, shows up in the most MCP clients, and has the most users. But the others are worth doing — they each have different audiences and some are crawled by AI search tools directly.
It depends on how your server runs.
Hosted servers (HTTP, deployed on Railway/Fly.io) don't need npm at all. Users connect directly to your URL — Smithery stores the endpoint and handles the connection. You push to GitHub, Railway deploys, users get the update automatically. japan-seasons-mcp works this way. No npm, no user installs.
Local servers (stdio, runs on the user's machine) need npm. When a user adds your server to Claude Desktop, the config looks like this:
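With a published package named my-weather-mcp (a placeholder), the claude_desktop_config.json entry looks roughly like this; the "weather" key is an arbitrary label the user chooses:

```json
{
  "mcpServers": {
    "weather": {
      "command": "npx",
      "args": ["my-weather-mcp"]
    }
  }
}
```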
Claude Desktop spawns npx my-weather-mcp as a subprocess whenever it starts. npx fetches your package from the npm registry and runs it — that's the entire install story from the user's side. No manual setup, no cloning, no build step. But it only works if the package is on npm.
To publish: run `npm login` once, then `npm publish --access public` from the package root.
Users get the latest version on each run — npx checks for updates automatically. Push a fix to npm and it rolls out without anyone touching their config.
Railway is the easiest way to get a hosted MCP live. Connect your GitHub repo and it deploys on every push — no infra to manage.
1. Set the start command to node dist/index.js
2. Deploy and copy the public URL Railway assigns (e.g. https://your-app.up.railway.app)

That URL is your MCP endpoint. Register it on Smithery once.
After that, every git push to main triggers a Railway redeploy. This is how japan-seasons-mcp ships updates: push the code, Railway deploys in under a minute, and everyone connecting via Smithery or the direct URL gets the new version on their next call. No version bumps, no user action required.
| Directory | Setup | Notes |
|---|---|---|
| Smithery | CLI | Highest traffic. Scores your server. Hosted registry gives you the free resources points. |
| mcp.so | UI | Submit via their web form. Simple listing, no scoring. Gets indexed by Google. |
| Glama | CLI | Submit via GitHub PR to their registry repo. Fast review, usually merged same day. |
| PulseMCP | UI | Web submission form. Categorized directory, good for niche discovery. |
| mcp.run | UI | Smaller community but active. Good for getting early feedback and stars. |
For Smithery specifically, if your server is hosted (not just directory-listed), you can push updates through the Smithery CLI.
For stdio servers (installed via npx), go to smithery.ai → Add Server → paste the GitHub URL. It picks up the smithery.yaml automatically.
Submitting to all five directories takes 30–60 minutes total. Each one wants roughly the same information — name, GitHub URL, a one-paragraph description — just in slightly different forms.
Building MCPs is straightforward. The SDK is well-documented, TypeScript support is solid, and the deploy story (Railway, Smithery hosted, or self-hosted) is mature enough. A working MCP server is an afternoon of work.
The gap between a working MCP and a well-built one comes down to a short list of specific things: dot notation naming, verb-first descriptions, parameter examples, annotation constants, server instructions, a complete smithery.yaml, at least one prompt, appropriate TTLs, minimal config surface area, and submitting to all five directories instead of just one.
This post covers all of them. The create-mcp skill automates most of them.
If you're building your own, the japan-seasons-mcp source is on GitHub and has all of these patterns in production. Feel free to use it as a reference.
Ask Claude about cherry blossom timing anywhere in Japan, and it will call sakura.forecast and give you a real answer with current bloom data.