Monday, October 13, 2025

Here’s How I Built an MCP to Automate My Data Science Job


Image by Ideogram

 

Most of my days as a data scientist look like this:

  • Stakeholder: “Can you tell us how much we made in advertising revenue in the last month and what percentage of that came from search ads?”
  • Me: “Run an SQL query to extract the data and hand it to them.”
  • Stakeholder: “I see. What’s our revenue forecast for the next 3 years?”
  • Me: “Consolidate data from multiple sources, speak to the finance team, and build a model that forecasts revenue.”

Tasks like the above are ad hoc requests from business stakeholders. They take around 3–5 hours to complete and are usually unrelated to the core project I’m working on.

When data-related questions like these come in, they often force me to push the deadlines of existing projects or work extra hours to get the job done. And that’s where AI comes in.

Once AI models like ChatGPT and Claude became available, the team’s efficiency improved, as did my ability to respond to ad hoc stakeholder requests. AI dramatically reduced the time I spent writing code, generating SQL queries, and even collaborating with different teams for required information. Furthermore, after AI code assistants like Cursor were integrated with our codebases, the efficiency gains grew even larger. Tasks like the one I just described could now be completed twice as fast as before.

Recently, when MCP servers started gaining popularity, I thought to myself:

 

Can I build an MCP that automates these data science workflows further?

 

I spent two days building this MCP server, and in this article, I’ll break down:

  • The results and how much time I’ve saved with my data science MCP
  • Resources and reference materials used to create the MCP
  • The basic setup, APIs, and services I integrated into my workflow

 

Building a Data Science MCP

 
If you don’t already know what an MCP is, it stands for Model Context Protocol, a framework that lets you connect a large language model to external services.
This video is a good introduction to MCPs.

 

// The Core Problem

The problem I wanted to solve with my new data science MCP was:

How do I consolidate information that is scattered across various sources and generate results that can immediately be used by stakeholders and team members?

 

To accomplish this, I built an MCP with three components, as shown in the flowchart below:

 

Data Science MCP Flowchart
Image by Author | Mermaid

 

// Component 1: Query Bank Integration

As the knowledge base for my MCP, I used my team’s query bank, which contains questions, a sample query that answers each question, and some context about the relevant tables.

When a stakeholder asks me a question like this:

What percentage of advertising revenue came from search ads?

I no longer have to dig through multiple tables and column names to write a query. Instead, the MCP searches the query bank for a similar question, picks up context about the relevant tables, and adapts the stored queries to my specific question. All I need to do is call the MCP server, paste in my stakeholder’s request, and I get a relevant query in a few minutes.
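Under the hood, this lookup can be as simple as keyword overlap between the incoming question and the stored ones. Everything below, the bank entries, the scoring, and the helper names, is an illustrative sketch rather than my production code:

```python
# Toy query bank: each entry pairs a past question with a sample
# query and the table context that gets fed to the LLM.
QUERY_BANK = [
    {
        "question": "What percentage of advertising revenue came from search ads?",
        "query": "SELECT SUM(IF(channel = 'search', revenue, 0)) / SUM(revenue) FROM ad_revenue",
        "tables": ["ad_revenue"],
    },
    {
        "question": "How many video ad impressions did we serve last quarter?",
        "query": "SELECT COUNT(*) FROM ad_impressions WHERE format = 'video'",
        "tables": ["ad_impressions"],
    },
]

def keywords(text: str) -> set:
    """Lowercased words, with punctuation stripped and short tokens dropped."""
    return {s for w in text.split() if len(s := w.strip("?,.").lower()) > 3}

def find_similar(question: str) -> dict:
    """Return the bank entry whose question shares the most keywords."""
    q = keywords(question)
    return max(QUERY_BANK, key=lambda entry: len(q & keywords(entry["question"])))

match = find_similar("What share of ad revenue came from search ads last month?")
print(match["tables"])  # table context that seeds the generated SQL
```

In practice the matching would use embeddings or the LLM itself, but the idea is the same: retrieve a similar past question, then adapt its query and table context.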

 

// Component 2: Google Drive Integration

Product documentation is usually stored in Google Drive, whether it’s a slide deck, document, or spreadsheet.

I connected my MCP server to the team’s Google Drive so it has access to all our documentation across dozens of projects. This helps it quickly extract data and answer questions like:

Can you tell us how much we made in advertising revenue in the last month?

I also indexed these documents, extracting specific keywords and titles, so the MCP only has to scan the keyword list for a given query rather than reading hundreds of pages at once.

For example, if someone asks a question related to “mobile video ads,” the MCP first searches the document index to identify the most relevant files before looking through them.
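A minimal sketch of that document index, assuming the keywords per file have already been extracted (the titles and keywords here are made up):

```python
from collections import defaultdict

# Hypothetical Drive files and the keywords extracted from each
DOCS = {
    "Q3 Mobile Video Ads Campaign": ["q3", "mobile", "video", "ads", "campaign"],
    "Search Ads Revenue Report": ["search", "ads", "revenue", "report"],
    "Finance Forecast 2026": ["finance", "forecast", "2026"],
}

def build_index(docs):
    """Invert the doc -> keywords map into keyword -> docs."""
    index = defaultdict(set)
    for title, kws in docs.items():
        for kw in kws:
            index[kw].add(title)
    return index

def relevant_files(query, index):
    """Files whose keyword sets intersect the query terms."""
    terms = query.lower().split()
    return set().union(*(index.get(t, set()) for t in terms))

index = build_index(DOCS)
print(relevant_files("mobile video ads", index))
```

Only the files returned by the index lookup get opened and read in full, which is what keeps the MCP from trawling hundreds of pages per request.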

 

// Component 3: Local Document Access

This is the simplest component of the MCP: a local folder that the MCP searches through. I add or remove files as needed, which lets me layer my own context, information, and instructions on top of my team’s projects.
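Reduced to its essence, this component is a plain directory scan exposed as a tool. The file layout and matching rules below are illustrative:

```python
from pathlib import Path

def search_local_docs(folder, query):
    """Return paths of text/markdown files containing every query term."""
    terms = [t.lower() for t in query.split()]
    hits = []
    for path in Path(folder).rglob("*"):
        # Only plain-text notes; skip directories and binary files
        if not path.is_file() or path.suffix not in {".txt", ".md"}:
            continue
        text = path.read_text(errors="ignore").lower()
        if all(t in text for t in terms):
            hits.append(str(path))
    return sorted(hits)
```

Because the folder is local, updating the MCP’s context is as simple as dropping a file in or deleting one.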

 

Summary: How My Data Science MCP Works

 
Here’s an example of how my MCP currently works to answer ad hoc data requests:

  • A question comes in: “How many video ad impressions did we serve in Q3, and how much ad demand do we have relative to supply?”
  • The document retrieval MCP searches our project folder for “Q3,” “video,” “ad,” “demand,” and “supply,” and finds relevant project documents
  • It then retrieves specific details about the Q3 video ad campaign, its supply, and its demand from team documents
  • It searches the query bank for similar questions about ad serves
  • It uses the context gathered from the documents and the query bank to generate an SQL query about Q3’s video campaign
  • Finally, the query is passed to a separate MCP connected to Presto SQL, where it is executed automatically
  • I then gather the results, review them, and send them to my stakeholders

 

Implementation Details

 
Here is how I implemented this MCP:

 

// Step 1: Cursor Installation

I used Cursor as my MCP client. You can install Cursor from this link. It’s essentially an AI code editor that can access your codebase and use it to generate or modify code.

 

// Step 2: Google Drive Credentials

Almost all of the documents used by this MCP (including the query bank) were stored in Google Drive.

To give your MCP access to Google Drive, Sheets, and Docs, you’ll need to set up API access:

  1. Go to the Google Cloud Console and create a new project.
  2. Enable the following APIs: Google Drive, Google Sheets, Google Docs.
  3. Create credentials (an OAuth 2.0 client ID) and save them in a file called credentials.json.
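For reference, the OAuth client file you download in step 3 looks roughly like the fragment below; the exact field set comes from the Cloud Console download, so treat this shape as illustrative and the values as placeholders:

```json
{
  "installed": {
    "client_id": "YOUR_CLIENT_ID.apps.googleusercontent.com",
    "project_id": "your-project-id",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://oauth2.googleapis.com/token",
    "client_secret": "YOUR_CLIENT_SECRET",
    "redirect_uris": ["http://localhost"]
  }
}
```

Keep this file out of version control; it grants API access to whatever Drive account authorizes it.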

 

// Step 3: Set Up FastMCP

FastMCP is an open-source Python framework for building MCP servers. I followed this tutorial to build my first MCP server using FastMCP.

(Note: The tutorial uses Claude Desktop as the MCP client, but the steps apply equally to Cursor or any AI code editor of your choice.)

With FastMCP, you can create the MCP server with Google integration (sample code snippet below):

from fastmcp import FastMCP  # pip install fastmcp

mcp = FastMCP("team-data-assistant")

@mcp.tool()
def search_team_docs(query: str) -> str:
    """Search team documents in Google Drive"""
    # get_google_services() is a helper (defined elsewhere) that
    # authenticates and returns the Drive/Docs API clients
    drive_service, _ = get_google_services()
    # Your search logic here
    return f"Searching for: {query}"

 

// Step 4: Configure the MCP

Once your MCP is built, you can configure it in Cursor by navigating to Cursor’s Settings window → Features → Model Context Protocol. There you’ll see a section where you can add an MCP server. When you click on it, a file called mcp.json opens, where you can include the configuration for your new MCP server.

This is an example of what your configuration should look like:

{
  "mcpServers": {
    "team-data-assistant": {
      "command": "python",
      "args": ["path/to/team_data_server.py"],
      "env": {
        "GOOGLE_APPLICATION_CREDENTIALS": "path/to/credentials.json"
      }
    }
  }
}

 

After saving your changes to the JSON file, you can enable the MCP and start using it within Cursor.

 

Final Thoughts

 
This MCP server was a simple side project I decided to build to save time on my personal data science workflows. It’s not groundbreaking, but it solves my immediate pain point: spending hours answering ad hoc data requests that take away from the core projects I’m working on. I believe a tool like this merely scratches the surface of what’s possible with generative AI, and it reflects a broader shift in how data science work gets done.

The traditional data science workflow is moving away from:

  • Spending hours finding data
  • Writing code
  • Building models

The focus is shifting away from hands-on technical work: data scientists are now expected to look at the bigger picture and solve business problems. In some cases, we’re expected to weigh in on product decisions and step in as a product or project manager.

As AI continues to evolve, I believe the lines between technical roles will blur. What will remain relevant is the skill of understanding business context, asking the right questions, interpreting results, and communicating insights. If you’re a data scientist (or an aspiring one), there’s no question that AI will change the way you work.

You have two choices: you can either adopt AI tools and build solutions that shape this transformation for your team, or let others build them for you.
 
 

Natassha Selvaraj is a self-taught data scientist with a passion for writing. Natassha writes on everything data science-related, a true master of all data topics. You can connect with her on LinkedIn or check out her YouTube channel.
