MCP Explained Part 9: Preliminary Conclusions

The final part of the MCP Explained Series reflects on lessons from real LLM tests with the Blockscout MCP server, highlighting limits in reasoning, math reliability, and UI challenges that shape future improvements.


This section of the MCP Explained Series sums up what we learned while using the Blockscout MCP server with real LLM agents. It is perhaps more of an academic-style conclusion than a how-to guide.

We focus on current limits, why they appear, and which directions look promising for the next round of work.

Reliable Data ≠ Reliable Reasoning

Even with a careful server-side design, agents can still invent details. In several real tests, an agent read a list of transactions and then produced a non-existent transaction hash to pass into a tool. The MCP server returned an error (as it should), but the agent did not connect the dots that the hash was made up.

This shows a core truth: hallucinations are a reasoning failure, not a transport failure.

Math is Brittle Without Guardrails

Amounts from Blockscout are raw onchain numbers. Normalizing by decimals sounds trivial to a human, but LLMs often misconvert wei ↔ ETH or misapply decimal shifts.

A single slip turns results into numbers that are off by anywhere from ×10 to ×1,000,000, breaking trust in the analysis.
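As a concrete illustration, the conversion itself is trivial once it is done in code rather than in prose. The sketch below (function and field names are ours, not part of any existing tool) shows the decimal shift that agents routinely get wrong, and how a single off-by-three mistake inflates the result by ×1,000:

```python
from decimal import Decimal

def normalize_amount(raw_value: str, decimals: int) -> Decimal:
    """Convert a raw onchain integer amount into human-readable units.

    Using Decimal avoids the float rounding that would compound a wrong shift.
    """
    return Decimal(raw_value) / (Decimal(10) ** decimals)

print(normalize_amount("1230000000000000000", 18))  # 1.23 ETH -- correct
print(normalize_amount("1230000000000000000", 15))  # 1230 -- wrong by x1,000
```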

Chat UIs Clash with Exploratory Work

On simple prompts, the OpenAI and Anthropic agents behaved in a stable way. But simple prompts are not the real world. Real questions are exploratory: you open an address, skim transfers, jump to logs, follow a side path, and only then form a view.

Users often don’t know the path in advance, so they cannot describe a perfect step-by-step plan in one message.

In practice, agents then deliver partial answers that reflect whatever they happened to notice first; frustrated users switch back to Blockscout’s UI and explore manually, using their own analogue intelligence.

Long-Horizon Analysis Hits Context and Session Limits

Claude.ai and ChatGPT are general-purpose chat platforms. They are not optimized for long single tasks that call many tools and accumulate large intermediate structures.

When the context window fills up, the usual answer is “start a new chat,” which discards the working set.

Code agents have started to solve this with automatic compaction (summarizing with task focus) and project memory; these chat UIs do not yet provide the same frictionless loop.

Identifying Real Bottlenecks

Our cross-agent tests showed that outcome quality depends less on “did the server return JSON” and more on how the model plans and follows through (pagination, numeric conversions, chain selection, domain cues).

Failures were often strategy failures, not API failures.

Directions to Explore Next

These are not prescriptions for the current server but ideas for future experiments:

Typed, unit-safe fields

Where possible, return {raw, normalized, unit} triplets and mark conversions in the payload to reduce arithmetic errors inside the model. Keep responses compact to protect context.
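One possible shape for such a field, sketched here as a Python literal; the key names are illustrative and not part of the current Blockscout MCP schema:

```python
# Illustrative only: a unit-safe token amount as a server might return it.
token_amount = {
    "raw": "1230000000000000000",   # exact onchain integer, kept as a string
    "normalized": "1.23",           # already shifted by `decimals`
    "unit": "ETH",
    "decimals": 18,                 # included so the conversion is auditable
}
```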

Mini-tools for common math

Expose small helpers (amount normalization, basis-point math, average block time estimation) so the agent calls code instead of “doing math in prose.”
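A minimal sketch of what such a helper could look like as an MCP tool, written against the Python MCP SDK's FastMCP interface; the server name, tool names, and signatures are illustrative assumptions, not existing Blockscout tools:

```python
from decimal import Decimal

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("math-helpers")  # illustrative server name

@mcp.tool()
def normalize_token_amount(raw_value: str, decimals: int) -> str:
    """Shift a raw onchain integer amount by `decimals` and return a decimal string."""
    return str(Decimal(raw_value) / (Decimal(10) ** decimals))

@mcp.tool()
def apply_basis_points(amount: str, bps: int) -> str:
    """Scale `amount` by `bps` basis points (1 bps = 0.01%)."""
    return str(Decimal(amount) * Decimal(bps) / Decimal(10_000))

if __name__ == "__main__":
    mcp.run()
```

The point is not the arithmetic itself but where it happens: the model passes strings in and gets strings back, instead of reasoning about decimal shifts in free text.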

Prompt templates

The Model Context Protocol allows servers to publish reusable prompts. A library of templates for comprehensive blockchain analysis could be built and integrated into the MCP server.
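The Python MCP SDK exposes this capability through prompt registration; as a sketch, a reusable analysis template might be published like this (the prompt name and wording are placeholders, not a shipped feature of the Blockscout server):

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("blockscout-prompts")  # illustrative server name

@mcp.prompt()
def address_deep_dive(address: str, chain_id: str = "1") -> str:
    """A reusable template nudging the agent toward a structured workflow."""
    return (
        f"Analyze address {address} on chain {chain_id}. "
        "Start with the address summary, then page through recent transactions, "
        "normalize all token amounts by their decimals before comparing them, "
        "and finish with a short list of notable counterparties."
    )
```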

Code-based analysis for heavy workloads

For complex tasks that require processing large volumes of blockchain data, agents could write and execute small scripts (for example, in Python) that handle the heavy lifting and return only aggregated results.

Such scripts can query the MCP REST API in batches, analyze transactions or logs offline, and provide concise summaries back to the model, reducing context load and token cost.
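A rough sketch of that pattern, assuming a generic paginated endpoint; the base URL, path, parameters, and field names below are placeholders rather than the documented API:

```python
import requests

BASE_URL = "https://mcp.example.org/v1"  # placeholder, not the real endpoint

def fetch_all_transactions(address: str, chain_id: str = "1") -> list[dict]:
    """Page through a hypothetical transactions endpoint and collect every item."""
    items, cursor = [], None
    while True:
        params = {"address": address, "chain_id": chain_id}
        if cursor:
            params["cursor"] = cursor
        page = requests.get(f"{BASE_URL}/transactions", params=params, timeout=30).json()
        items.extend(page.get("items", []))
        cursor = page.get("next_cursor")
        if not cursor:
            break
    return items

def summarize(transactions: list[dict]) -> dict:
    """Reduce raw transactions to a few aggregates so only these reach the model."""
    total = len(transactions)
    failed = sum(1 for tx in transactions if tx.get("status") == "error")
    return {
        "total": total,
        "failed": failed,
        "success_rate": round(1 - failed / total, 4) if total else None,
    }

if __name__ == "__main__":
    txs = fetch_all_transactions("0x0000000000000000000000000000000000000000")
    print(summarize(txs))  # only this compact summary goes back into the chat context
```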

For general-purpose chat agents, this behavior could be encouraged through server instructions, guiding them to switch to script-based workflows when raw data volume becomes excessive.

For dedicated research agents, this idea can be built into the agent’s logic directly, with tools like write_file and execute, or even richer toolchains such as Foundry, enabling direct on-chain interaction through command-line utilities like cast.

A dedicated research agent

Build (or adapt) an agent with auto-compaction, reflection, and memory while continuing to use the same MCP surface so that server logic remains single-sourced.
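As a minimal sketch of the auto-compaction piece of such an agent, the loop below folds the oldest turns into a single task-focused note once the history exceeds a budget; the token estimate and the summarizer call are placeholders for whatever tokenizer and model the agent actually uses:

```python
TOKEN_BUDGET = 100_000  # placeholder context budget

def estimate_tokens(messages: list[dict]) -> int:
    """Very rough estimate; a real agent would use the model's tokenizer."""
    return sum(len(m["content"]) // 4 for m in messages)

def compact(messages: list[dict], summarize) -> list[dict]:
    """Fold older turns into one summary note instead of asking the user
    to start a new chat and lose the working set."""
    if estimate_tokens(messages) <= TOKEN_BUDGET:
        return messages
    head, tail = messages[:-10], messages[-10:]  # keep the most recent turns verbatim
    note = summarize(head)  # e.g. an LLM call with a task-focused summary prompt
    return [{"role": "system", "content": f"Summary of earlier work: {note}"}] + tail
```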

Closing thoughts

The MCP server already does the right things for data delivery: lean payloads, multichain routing, and clear pagination. These choices lower friction and make analysis cheaper in tokens.

The remaining gaps lie mostly in agent behavior: numeric reliability, long-horizon planning, and interactive exploration. Bridging those gaps likely requires a specialized research agent built on top of the same server surface, not a heavier server.

That is a good sign: our foundation is solid; next, we should teach the navigator to use it.