How to Build a Lightweight Data Pipeline with Airtable and Python
Image by Editor | ChatGPT

 

Introduction

 
Airtable not only offers a flexible, spreadsheet-like interface for data storage and analysis, it also provides an API for programmatic interaction. In other words, you can connect it to external tools and technologies (for instance, Python) to build data pipelines or processing workflows, bringing your results back to your Airtable database (or simply “base”, in Airtable jargon).

This article demonstrates how to create a simple, ETL-like pipeline using the Airtable Python API. We will stick to the free tier, ensuring the process works without paid features.

 

Airtable Dataset Setup

 
While the pipeline built in this article can easily be adapted to a variety of datasets, for those new to Airtable and needing an Airtable project and stored dataset as a starting point, we recommend you follow this recent introductory tutorial to Airtable and create a tabular dataset called “Customers”, containing 200 rows and the following columns (see image):

 

Customers dataset/table in Airtable | Image by Author

 

Airtable-Python Data Pipeline

 
In Airtable, go to your user avatar (at the time of writing, it’s the circled avatar located in the bottom-left corner of the app interface) and select “Builder Hub”. In the new screen (see screenshot below), click on “Personal access tokens”, then on “Create token”. Give it a name, and make sure you add at least these two scopes: data.records:read and data.records:write. Likewise, select the base where your customers table is located in the “Access” section, so that your token has access configured for this base.

 

Creating an Airtable API token | Image by Author

 

Once the token has been created, copy it and store it carefully in a safe place, as it will be shown only once; we’ll need it later. The token starts with pat followed by a long alphanumeric code.

Another key piece of information we will need to build our Python-based pipeline that interacts with Airtable is the ID of our base. Return to your base in the Airtable web interface, and once there, you should see that its URL in the browser has a syntax like: https://airtable.com/app[xxxxxx]/xxxx/xxxx. The part we’re interested in copying is the app[xxxx] ID contained between two consecutive slashes (/): this is the base ID we will need.
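
If you would rather not copy the ID by hand, the pattern is simple enough to extract programmatically. Below is a small illustrative sketch (the extract_base_id helper is our own, not part of any Airtable library) that pulls the app… segment out of a pasted URL:

import re

# Illustrative helper (our own, not part of pyairtable):
# extract the "app..." base ID from an Airtable URL
def extract_base_id(url):
    match = re.search(r"airtable\.com/(app[A-Za-z0-9]+)", url)
    return match.group(1) if match else None

print(extract_base_id("https://airtable.com/appXXXXXXXXXXXXXX/tblYYYY/viwZZZZ"))  # appXXXXXXXXXXXXXX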

With this in hand, and assuming you already have a populated table called “Customers” in your base, we’re ready to start our Python program. I will be using a notebook to code it. If you are using an IDE, you may need to slightly change the part where the three Airtable environment variables are defined, having them read from a .env file instead (a minimal sketch of that variant follows the next code block). In this version, for simplicity and ease of illustration, we will define them directly in our notebook. Let’s start by installing the necessary dependencies:

!pip install pyairtable python-dotenv

 

Next, we define the Airtable environment variables. Notice that for the first two, you need to substitute the value with your actual access token and base ID, respectively:

import os
from dotenv import load_dotenv  # necessary only if reading variables from a .env file
from pyairtable import Api
import pandas as pd

PAT = "pat-xxx"      # your PAT (Personal Access Token) is pasted here
BASE_ID = "app-xxx"  # your Airtable base ID is pasted here
TABLE_NAME = "Customers"

api = Api(PAT)
table = api.table(BASE_ID, TABLE_NAME)
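
If you are following along in an IDE rather than a notebook, a minimal sketch of the .env variant mentioned earlier could look like this, assuming a .env file in your working directory with AIRTABLE_PAT and AIRTABLE_BASE_ID entries (those variable names are our own choice):

# Contents of the assumed .env file (two lines):
#   AIRTABLE_PAT=pat-xxx
#   AIRTABLE_BASE_ID=app-xxx

load_dotenv()  # loads the .env file into the process environment
PAT = os.getenv("AIRTABLE_PAT")
BASE_ID = os.getenv("AIRTABLE_BASE_ID")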

 

We’ve just set up an instance of the Python Airtable API and created a connection point to the customers table in our base. Now, this is how we read the entire dataset contained in our Airtable table and load it into a Pandas DataFrame. Just be careful to use the exact column names from the source table for the string arguments inside the get() method calls:

rows = []
for rec in table.all():  # pyairtable honors the 5 req/s limit and auto-retries on 429s
    fields = rec.get("fields", {})
    rows.append({
        "id": rec["id"],
        "CustomerID": fields.get("CustomerID"),
        "Gender": fields.get("Gender"),
        "Age": fields.get("Age"),
        "Annual Income (k$)": fields.get("Annual Income (k$)"),
        "Spending Score (1-100)": fields.get("Spending Score (1-100)"),
        "Income class": fields.get("Income Class"),
    })

df = pd.DataFrame(rows)
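
Note that table.all() pages through every record in the table. When you only need part of the data, pyairtable’s all() accepts options such as fields and max_records; the sketch below, for instance, fetches just two of the columns from the first 50 records (column names assume the Customers table described earlier):

# Fetch only two fields from the first 50 records
subset = table.all(fields=["CustomerID", "Age"], max_records=50)
print(len(subset), subset[0]["fields"])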

 

Once the data has been loaded, it’s time to apply a simple transformation. For brevity we will apply just one, but we could chain as many as needed, just as we usually do when preprocessing or cleaning datasets with Pandas. We will create a new binary attribute, called Is High Value, to flag high-value customers, i.e., those whose income and spending score are both high:

def high_value(row):
    try:
        return (row["Spending Score (1-100)"] >= 70) and (row["Annual Income (k$)"] >= 70)
    except TypeError:  # missing or non-numeric values are treated as not high value
        return False

df["Is High Value"] = df.apply(high_value, axis=1)
df.head()
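
Since both columns are numeric, the same flag can also be computed in vectorized form, which is generally faster than a row-wise apply() on larger tables. A minimal equivalent sketch, where coercion turns any missing or malformed values into NaN so their comparisons evaluate to False:

# Vectorized alternative: coerce to numeric and compare (NaN comparisons yield False)
spending = pd.to_numeric(df["Spending Score (1-100)"], errors="coerce")
income = pd.to_numeric(df["Annual Income (k$)"], errors="coerce")
df["Is High Value"] = (spending >= 70) & (income >= 70)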

 

Resulting dataset:

 

Airtable data transformation with Python and Pandas | Image by Author

 

Finally, it’s time to write the changes back to Airtable by incorporating the new data associated with the new column. There is a small caveat: we first need to manually create a new column named “High Value” in our Airtable customers table, with its type set to “Checkbox” (the equivalent of a binary categorical attribute). Once this blank column has been created, run the following code in your Python program, and the new data will be automatically added to Airtable!

updates = []
for _, r in df.iterrows():
    if pd.isna(r["id"]):
        continue  # skip rows without a valid Airtable record ID
    updates.append({
        "id": r["id"],
        "fields": {
            "High Value": bool(r["Is High Value"])
        }
    })

if updates:
    table.batch_update(updates)
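
pyairtable takes care of splitting batch_update() into request-sized chunks, so there is no need to slice the updates list manually. As a quick sanity check, you can re-read the table and count the flagged records, as in this small sketch (an unchecked checkbox simply doesn’t appear in a record’s fields, so a falsy get() means “not high value”):

# Count the records now flagged as high value
flagged = sum(
    1 for rec in table.all()
    if rec.get("fields", {}).get("High Value")
)
print(f"High-value customers: {flagged}")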

 

Time to return to Airtable and see what changed in our source customers table! If at first glance you see no changes and the new column still looks empty, don’t panic just yet: not many customers are labeled as “high value”, and you may need to scroll down a little to see some marked with a green tick:

 

Updated customers table | Image by Author

 

That’s it! You just built your own lightweight, ETL-like data pipeline based on a bidirectional interaction between Airtable and Python. Well done!

 

Wrapping Up

 
This article focused on showcasing data capabilities with Airtable, a versatile and user-friendly cloud-based platform for data management and analysis that combines features of spreadsheets and relational databases with AI-powered capabilities. In particular, we showed how to run a lightweight data transformation pipeline with the Airtable Python API that reads data from Airtable, transforms it, and loads it back to Airtable, all within the capabilities and limitations of Airtable’s free tier.
 
 

Iván Palomares Carrascosa is a leader, writer, speaker, and adviser in AI, machine learning, deep learning & LLMs. He trains and guides others in harnessing AI in the real world.
