This post might interest you if you run local models with ollama, want to connect them with your notes, and care about reliability. The MCP server and scripts from these experiments are available here on GitHub.

MCP in a few words

Model Context Protocol (MCP) was introduced by Anthropic in November 2024 as an open-source, universal way for Large Language Models to interact with external tools. A model by itself is capable of much, but it is prone to hallucinating and, for example, cannot do maths reliably or remember a database of factual information (about a list of objects, clients, events…). It is therefore useful to augment the model with a set of tools that help it perform a variety of tasks.

🧩 Tool or resource?

In MCP, tools are actions, resources are data.

| Concept | Verb or noun? | Typical use | Example |
| --- | --- | --- | --- |
| Tool | Verb | Ask the model to do something | add(a, b), search_notes(...) |
| Resource | Noun | Let the model see some data / metadata | config://app, notes://list |

Use a tool when the model should do something: call an API, modify a file, write a note, run a search, compute a result. If it can be described with a verb (create, update, summarize, search), it is probably a tool.

Use a resource when the model should see something: read-only config, a directory listing, a static document, or a pre-computed index. If it can be described with a noun (config, notes list, embedding index), this is probably a resource.

FastMCP

The FastMCP framework is a Python package that helps you build, manage and interact with MCP servers.

Defining your tools and resources

For the examples in this section, assume we are working in a server.py Python file that we will launch as our FastMCP server.

from fastmcp import FastMCP
from fastmcp.resources import DirectoryResource
from pathlib import Path

mcp = FastMCP("demo-server")
BASE_DIR = Path("./data").resolve()

And at the end:

def main() -> None:
    mcp.run(transport="http", host="127.0.0.1", port=8000)

if __name__ == "__main__":
    main()

In MCP vocabulary, tools are functions the LLM can call to perform an external action: an API call, reading a file, creating a file, performing a mathematical operation. Here is a simple, straightforward tool:

@mcp.tool
def add(a: int, b: int) -> int:
    """Add two integers and return the result."""
    return a + b

It is a typed function with a well-written docstring (important for guiding the model later) and an @mcp.tool decorator.

Resources are an important new concept to understand in MCP: they expose data, information or files at a URI. In real-world use they are typically attached statically to the model's context rather than fetched dynamically at the model's discretion. The decorator defines the URI and accepts a MIME type that tells the host how to display or treat the content, as shown in the simple example below:

@mcp.resource("config://app", mime_type="application/json")
def app_config() -> dict:
    """Static application configuration."""
    return {
        "name": "Demo MCP Server",
        "version": "1.0.0",
        "env": "dev"
    }

Resources can be parameterized as shown here:

@mcp.resource("greetings://{name}")
def personalized_greeting(name: str) -> str:
    """Generates a personalized greeting for the given name."""
    return f"Hello, {name}! Welcome to the MCP server."

Here is an example of calling these simple tools in order (check the config, do an addition, call the greeting function) with llama3.2:3b-instruct-fp16, using the following prompt:

You have access to these MCP capabilities:
- Tool: add(a: int, b: int) -> int
- Resource: resource://config (JSON app configuration)
- Resource: resource://greetings/{name} (returns a greeting for the given name)
Please do the following, using the tools and resources instead of guessing:
1. Fetch the resource://config resource and briefly summarize the app’s name, version, and environment.
2. Use the add tool to compute 40 + 2, and tell me the result.
3. Fetch resource://greetings/Alice and include that greeting in your answer.
Finally, produce a short, friendly paragraph that:
- Mentions the configuration you found,
- States the result of 40 + 2,
- And naturally weaves in the greeting for Alice.
When EVERYTHING is done, send ONE final message that starts with the line:
END_OF_RUN

I ran this simple workflow over 50 runs for a few tool-calling models with ollama. The plot of runtime versus error rate versus number of steps below already shows how variable and unstable the runs are.

Even on this toy workflow, I saw runs that stopped early or hallucinated “I did X” while nothing was written. Some runs chose to use all tools available for whatever reason instead of following the workflow.

Prompt and model greatly influence error patterns. Tool calling isn't a solved problem, especially for small local models.
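
For reference, the numbers in the table below come from a small harness that repeats the workflow and records wall time, errors and which tools were actually called. The sketch here is only illustrative: the run_workflow callable stands in for the bridge loop described later and is not the actual benchmark code.

import statistics
import time
from typing import Callable, Set, Tuple

def benchmark(run_workflow: Callable[[str], Tuple[Set[str], bool]],
              model: str, n_runs: int = 50,
              required: Set[str] = frozenset({"add"})) -> dict:
    """Repeat one workflow and aggregate wall time, success rate and tool coverage."""
    walls, successes, coverage = [], [], []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        tools_called, errored = run_workflow(model)   # placeholder for the real loop
        walls.append((time.perf_counter() - t0) * 1000.0)   # milliseconds
        successes.append(not errored)
        coverage.append(required <= tools_called)
    return {
        "median_wall_ms": statistics.median(walls),
        "success_rate": sum(successes) / n_runs,
        "all_required_tools": sum(coverage) / n_runs,
    }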

Here is a compiled table of metrics on these simple tasks for a bunch of models that run on my graphics card (NVIDIA RTX 5070 Ti, with 16 GB VRAM):

| Model | Median wall (ms) | Success rate | All required tools | Notes |
| --- | --- | --- | --- | --- |
| llama3.2:3b-instruct-fp16 | 2 209 | 0.74 | 0.80 | Fast, but unstable / partial flows |
| llama3-groq-tool-use:8b | 1 856 | 1.00 | 0.00 | No errors… but doesn’t do the job |
| granite4:7b-a1b-h | 1 889 | 1.00 | 1.00 | Fast and reliably follows workflow |
| mistral-nemo:12b-instruct-2407-q8_0 | 17 356 | 1.00 | 1.00 | Slow but very reliable |
| qwen2.5-14b-lean:latest | 21 400 | 0.96 | 1.00 | Slow, but good on more complex flows |
| lukaspetrik/gemma3-tools:12b | 4 837 | 1.00 | 1.00 | Good generalist tool-caller |

In a word: granite4:7b-a1b-h is very fast and always reliable. If I need a fast and reliable agent for a simple flow, I'll think of this new IBM model for sure. llama3-groq-tool-use:8b reliably triggers no errors… but does not do what is needed, at all. Larger models (Mistral, Qwen, Gemma) were slower but very reliable, and they shine when the workflow is more complex.

Since a success here simply means 'the right tools were called', we could also use an LLM as a judge to compare the model's END_OF_RUN output with an ideal output. Skimming the outputs, most successful runs had no significant errors, at most a bit of creativity in formulation. Here are two example outputs:

qwen2.5-14b-lean:latest The application configuration indicates that this is a version 1.0 built by MyTeam in their specified environment. The computation of 40 + 2 using the add tool yields a result of 42. Alice has been warmly welcomed with the greeting “Hello, Alice! Welcome to the MCP server.” It’s great to see everything functioning smoothly and to have Alice join us on this journey through the capabilities of our system.

lukaspetrik/gemma3-tools:12b The application’s configuration reveals it’s version 1.0, authored by MyTeam. I was able to compute that 40 + 2 equals 42. And, as a friendly greeting, I’m including “Hello, Alice! Welcome to the MCP server.” It’s been a productive session!
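
To automate that comparison, an LLM-as-a-judge check could be bolted on: feed the END_OF_RUN message and a reference answer to a grading model and ask for PASS/FAIL. Here is a minimal sketch using the ollama Python client; the judge model and rubric are arbitrary choices of mine, not part of the benchmark above.

import ollama

def judge_output(candidate: str, reference: str, judge_model: str = "qwen2.5:14b") -> str:
    """Ask a judge model whether the candidate answer matches the reference."""
    prompt = (
        "You are grading an assistant's final answer against a reference.\n\n"
        f"Reference answer:\n{reference}\n\n"
        f"Candidate answer:\n{candidate}\n\n"
        "Reply PASS if the candidate states the same facts (config summary, "
        "40 + 2 = 42, the greeting for Alice), otherwise FAIL, then give a "
        "one-sentence justification."
    )
    response = ollama.chat(model=judge_model,
                           messages=[{"role": "user", "content": prompt}])
    return response["message"]["content"]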

Some of my custom tools for interacting with markdown notes

This code defines a resource based on my notes directory.

BASE_DIR.mkdir(parents=True, exist_ok=True)

notes_listing = DirectoryResource(
    uri="notes://list",
    path=BASE_DIR,
    name="Demo notes directory",
    description="Lists Markdown notes in ./data.",
    recursive=False,
)
mcp.add_resource(notes_listing)

This tool uses the read_text method of pathlib.Path to read the content of a file in the notes directory.

@mcp.tool
def read_note(relpath: str) -> str:
    """
    Read a UTF-8 text note from the demo directory.

    Example:
      {"relpath": "example.md"}
    """
    p = (BASE_DIR / relpath).resolve()

    if BASE_DIR not in p.parents and p != BASE_DIR:
        raise ValueError("Path escapes BASE_DIR")

    if not p.exists():
        raise FileNotFoundError(relpath)

    return p.read_text(encoding="utf-8", errors="ignore")

This tool performs a simple substring search over the notes, line by line, and returns at most max_results hits (10 by default), each snippet truncated to about 200 characters.

import re
from typing import List
from pydantic import BaseModel, Field

class NoteHit(BaseModel):
    path: str = Field(..., description="Path relative to BASE_DIR")
    line: int
    snippet: str

@mcp.tool
def search_notes(query: str, max_results: int = 10) -> List[NoteHit]:
    """
    Search all notes in BASE_DIR for a substring.
    """
    pattern = re.escape(query)
    rx = re.compile(pattern, re.IGNORECASE)

    hits: List[NoteHit] = []
    for p in BASE_DIR.rglob("*.md"):
        if len(hits) >= max_results:
            break
        with p.open("r", encoding="utf-8", errors="ignore") as f:
            for i, line in enumerate(f, start=1):
                if rx.search(line):
                    rel = p.relative_to(BASE_DIR).as_posix()
                    snip = line.strip()
                    if len(snip) > 200:
                        snip = snip[:197] + "…"
                    hits.append(NoteHit(path=rel, line=i, snippet=snip))
                    if len(hits) >= max_results:
                        break
    return hits

Running and calling a server

I am using ollama to run a local LLM and make it interact with the FastMCP tools. Here is the general architectural flow of how everything interacts in this experiment. I have defined tools and resources in a FastMCP server script (a few of them are shown above):

fastmcp run server.py:mcp --transport http --port 8000

I need to have ollama running, so let's launch it with the desired model:

ollama run llama3.2:3b-instruct-fp16 --verbose

Now it is time to run an MCP client that glues it all together:

flowchart LR
    subgraph Notes[📄 Markdown Notes]
    end

    subgraph Server[FastMCP Server]
        direction TB
        T1[tools]
        T2[resources]
    end

    subgraph Client[FastMCP Client]
    end

    subgraph Ollama[Ollama API]
    end

    LLMs[LLMs / Embeddings]

    %% Arrows to notes
    Notes -->|"reads"| Server
    Server -->|"creates"| Notes

    %% Bidirectional arrows
    Server <--> Client
    Client <--> Ollama
    Ollama --> LLMs
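
The actual bridge script lives in the repository; the sketch below only illustrates the glue loop, assuming FastMCP's Python Client and the tool-calling support in the ollama Python package (exact message formats vary between models and ollama versions):

import asyncio
import ollama
from fastmcp import Client

async def run_turn(user_prompt: str, model: str = "llama3.2:3b-instruct-fp16") -> str:
    async with Client("http://127.0.0.1:8000/mcp") as mcp_client:
        # Expose the MCP tools to ollama in its function-calling schema.
        tools = [{
            "type": "function",
            "function": {
                "name": t.name,
                "description": t.description or "",
                "parameters": t.inputSchema,
            },
        } for t in await mcp_client.list_tools()]

        messages = [{"role": "user", "content": user_prompt}]
        for _ in range(15):                      # hard cap to avoid infinite loops
            response = ollama.chat(model=model, messages=messages, tools=tools)
            message = response["message"]
            messages.append(message)
            if not message.get("tool_calls"):
                return message["content"]        # final natural-language answer
            for call in message["tool_calls"]:
                result = await mcp_client.call_tool(call["function"]["name"],
                                                    dict(call["function"]["arguments"]))
                # Feed the tool result back as a 'tool' message for the next step.
                messages.append({"role": "tool", "content": str(result)})
        return "Stopped after the step limit."

# Example: asyncio.run(run_turn("Compute 40 + 2 with the add tool."))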

To go beyond simple exact-word search, I also built an embedding index over chunked notes, which the model can query to perform a semantic search when needed:

flowchart LR

    Notes[(📝 Markdown notes)]

    Server["🧩 FastMCP Server
build_embed_index"] Vectors[(🧊 vectors.npy)] Ollama["🦙 Ollama API"] Model{{"🤖 bge-m3:latest
embedding model"}} %% Two stacked arrows Notes ↔ Server Notes -->|"read & chunk"| Server Server -->|"(optional) update"| Notes %% Embedding flow Server <-->|"embed chunks"| Ollama Model -->|"loaded and called"| Ollama Server -->|"write embeddings"| Vectors

Prompts, the overall workflow logic, the FastMCP client, retry loops and guardrails are launched via a Python script that follows the schema above:

python bridge_mcp_ollama.py

Experiments with creating and curating notes

The main problem: even with simple prompts, small local models were highly variable. Some runs worked well, others confidently hallucinated notes that never existed.

In notes-search experiments, a good run pulls its list of tools and resources, reads the relevant notes and outputs a resulting summary; a bad run hallucinates a note without having searched anything. In note-creation experiments (populating the notes sandbox with a variety of credible notes), a good run writes a few creative ideas into .md files in the folder; sometimes the model creatively hallucinates having created notes. I had trouble getting more than two or three notes written reliably. I will need to do more experiments with task flows, prompt semantics, nudging gates and various models before showing more quantitative results. The runs I have done so far are not conclusive.

What was useful
  • Setting up an overall loop to stop iterating after a maximum number of steps (for example 15 or 30) in case of infinite loops.
  • Prompting for an explicit end-of-task tag (here, END_OF_RUN) to avoid stopping too early. At first, my only rule was that an empty response ended the loop.
  • Guiding the model with a catalog that lists the available functions, tools and resources, so that a lost model can fetch the list of usable tools if needed.
  • Having a scaffolding of an example sequence of tools following a logical path, so that the model has an inkling of what to do and in which order.
  • Having a set of tracked "had to be called" tools that forces the model to keep looping as long as those tools have not been called (sketched in code after this list): for example WRITE_TOOLS = {"append_note"} when I wanted to populate the notes sandbox with AI-generated notes. This reduced hallucinated statements like "END_OF_RUN: I wrote 5 notes" when in fact nothing had been created.
Figure: the orange box shows a gate activation that detects the END_OF_RUN tag being returned too quickly and nudges the model towards synthetic note creation.
  • Having a model that is both reliable at tool calling (which tends to require a low temperature) and creative enough to output interesting text is a compromise hard to reach, but it can be circumvented by calling a model dedicated to creative writing and using the tool-calling model as an orchestrator of that content-creating tool.
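
Here is what the write gate from the list above can look like inside the bridge loop. It is only a sketch: WRITE_TOOLS matches the description above, while the nudge wording and function name are my own illustrative choices.

END_TAG = "END_OF_RUN"
WRITE_TOOLS = {"append_note"}      # tools that must have been called at least once

def gate_final_message(content: str, tools_called: set[str]) -> str | None:
    """Return None to accept the final message, or a corrective nudge to send back."""
    if not content.strip().startswith(END_TAG):
        return None                # not a final message, nothing to gate
    missing = WRITE_TOOLS - tools_called
    if missing:
        return ("You announced END_OF_RUN but the required tools "
                f"{sorted(missing)} were never called. Do not end the run yet: "
                "call append_note to actually create the notes, then finish.")
    return None

In the loop, a non-None return value is appended as a user message and iteration continues instead of stopping.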

Here is one of my system prompts exemplifying these different ideas:

{"role": "system",
	    "content":
"You have tools: list_resources, read_resource, catalog, search_notes, read_note, append_note, summarize_note, similar_notes, build_embed_index, refresh_list.\n"
"Strategy: call catalog(). call refresh_list() to register the notes listing resource. Call list_resources() and look for a URI like resource://notes-list. \n"
"Call read_resource on that URI to inspect the sandbox. append_note to write new notes.\n"
"ALWAYS produce a final natural-language answer ONLY AT THE END, starting the final message with the line 'END_OF_RUN'.\n"
"Until you reach END_OF_RUN, prefer issuing tool calls over plain text.\n"
"If more information is needed, call tools instead of asking the user.\n"
"Operator stance:\n"
"- Act-first. Prefer issuing tool calls over speculative text.\n"
"- If uncertain, choose a sensible default and proceed.\n"
"- Stop when the job is complete and you have created all required notes and the Operator Log.\n"},

In general, with the small local models I tried, there was a lot of variability of results even with simple prompts. Tool calling by small local models does not seem like a solved problem to me.

Good results come from choosing the right model for the job (orchestrator versus writer, for instance) and adding enough guardrails and explicit workflows.

I will keep iterating more on the subject, and will keep you updated once I have stronger quantitative results.