
Image by Author
# Introduction
Excel remains relevant for data work, but a good portion of the time spent using it is purely mechanical. Tasks like combining files from multiple sources, tracking down duplicate records, reformatting inconsistent exports, and splitting a master sheet into separate files are not complex, but they're time-consuming and prone to human error.
These five Python scripts help automate these tasks. Each one is self-contained, configurable, and designed to work with messy real-world data.
You can find all the scripts on GitHub.
# Merging Multiple Excel Files
// The Pain Point
When consolidating data from multiple Excel or comma-separated values (CSV) files, the manual process of opening each file, copying the data, and pasting it into a master sheet is slow and prone to misalignment errors, especially when column orders differ between files.
// What the Script Does
This script scans a folder for .xlsx and .csv files, stacks all their data into a single unified sheet, and writes a clean merged output file. It can optionally add a source column so you always know which row originated from which file, and it handles mismatched column orders automatically.
// How It Works
The script uses pandas to read every file in a target directory, aligns columns by name rather than position, and concatenates everything into one DataFrame. A configurable add_source_column flag appends the original filename to each row. Column mismatches are logged so you can see whether some files had extra or missing fields. The output is written with openpyxl and includes a summary tab showing file-by-file row counts.
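The alignment step can be sketched as follows. This is a minimal illustration rather than the script itself: the function name `merge_frames`, the `source_file` column name, and the sample data are assumptions, and the frames are built in memory instead of being read from a folder.

```python
import pandas as pd


def merge_frames(frames, add_source_column=True):
    """Stack DataFrames by column name and report per-file missing columns.

    `frames` maps a filename to the DataFrame loaded from that file.
    """
    # Union of all column names, in first-seen order.
    all_columns = []
    for df in frames.values():
        for col in df.columns:
            if col not in all_columns:
                all_columns.append(col)

    parts, mismatches = [], {}
    for name, df in frames.items():
        missing = [c for c in all_columns if c not in df.columns]
        if missing:
            mismatches[name] = missing
        part = df.copy()
        if add_source_column:
            part["source_file"] = name  # hypothetical column name
        parts.append(part)

    # concat aligns on column names, so differing column order is harmless;
    # columns absent from a file are filled with NaN.
    merged = pd.concat(parts, ignore_index=True, sort=False)
    return merged, mismatches


# Illustrative data: same fields, different column order, one extra column.
frames = {
    "north.xlsx": pd.DataFrame({"id": [1, 2], "amount": [10.0, 20.0]}),
    "south.csv": pd.DataFrame({"amount": [30.0], "id": [3], "region": ["S"]}),
}
merged, mismatches = merge_frames(frames)
print(merged)
print(mismatches)
```

In practice the `frames` dictionary would likely be filled by calling `pd.read_excel` or `pd.read_csv` on each file found in the target directory.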
⏩ Get the Excel files merger script
# Finding and Flagging Duplicate Rows
// The Pain Point
Duplicate records are common in datasets that have been exported and re-imported across systems. Exact matches are easy to find, but near-duplicates (the same record with slightly different formatting or spacing) are harder to catch manually at scale.
// What the Script Does
This script scans an Excel file for duplicate rows based on columns you define, flags exact duplicates and near-duplicates through fuzzy matching on string fields, and writes an annotated output file highlighting every suspected duplicate group with color coding and a confidence score.
// How It Works
The script uses pandas for exact duplicate detection and RapidFuzz for fuzzy string matching on configurable key columns. Each row is assigned a duplicate group ID and a match confidence percentage. The output Excel file uses openpyxl formatting to highlight duplicate clusters. A separate summary sheet shows total duplicates found, broken down by match type.
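The grouping idea can be sketched as below. To keep the example dependency-free, it swaps in the standard library's `difflib.SequenceMatcher` as a stand-in for RapidFuzz, so the scores will differ from what the real script produces. The function names, the output column names, and the 90% threshold are all assumptions.

```python
from difflib import SequenceMatcher

import pandas as pd


def normalize(text):
    """Lowercase and collapse whitespace so formatting noise is ignored."""
    return " ".join(str(text).lower().split())


def flag_duplicates(df, key, threshold=90.0):
    """Add dup_group, match_type and confidence columns based on `key`."""
    raw = df[key].astype(str).tolist()
    keys = [normalize(v) for v in raw]
    n = len(keys)
    group = [0] * n  # 0 means "not part of any duplicate group"
    match_type = ["unique"] * n
    confidence = [0.0] * n
    next_group = 1

    for i in range(n):
        if group[i] != 0:
            continue  # already attached to an earlier row's group
        for j in range(i + 1, n):
            if group[j] != 0:
                continue
            # Similarity on a 0-100 scale (difflib stand-in, not RapidFuzz).
            score = 100.0 * SequenceMatcher(None, keys[i], keys[j]).ratio()
            if score < threshold:
                continue
            if group[i] == 0:
                group[i] = next_group
                match_type[i] = "original"
                next_group += 1
            group[j] = group[i]
            match_type[j] = "exact" if raw[i] == raw[j] else "fuzzy"
            confidence[j] = round(score, 1)

    out = df.copy()
    out["dup_group"] = group
    out["match_type"] = match_type
    out["confidence"] = confidence
    return out


# Illustrative data only.
records = pd.DataFrame(
    {"company": ["Acme Corp", "Acme Corp", "ACME Corp.", "Globex Inc"]}
)
flagged = flag_duplicates(records, "company")
print(flagged)
```

The pairwise loop is quadratic in the number of rows, which is fine for a sketch but would probably need blocking or a faster matcher on large sheets.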
⏩ Get the duplicate finder script
# Cleaning and Standardizing Messy Exported Data
// The Pain Point
Data exported from external systems often arrives inconsistently formatted, with mixed date formats, inconsistent capitalization, phone numbers with varying separators, and trailing whitespace. Cleaning this manually before any analysis adds up quickly.
// What the Script Does
This script applies a configurable set of cleaning rules to an Excel or CSV file. These include standardizing dates, trimming whitespace, fixing capitalization, normalizing phone numbers and postcodes, removing blank rows, and flagging cells that appear incorrect. It outputs a cleaned file and a change log showing exactly what was changed.
// How It Works
The script reads a configuration file that maps column names to cleaning operations: date_format, title_case, strip_whitespace, phone_normalize, remove_blank_rows, and others. Each operation is applied in sequence. A side-by-side change log is written to a second sheet in the output, showing original versus cleaned values for every modified cell. Nothing is silently discarded. If a value can't be parsed, it's flagged in a _clean_errors column.
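A minimal sketch of the config-driven approach might look like this. It covers only three of the operations named above, treats every cell as a string, and assumes for illustration that a valid phone number has exactly 10 digits. The `OPERATIONS` table, the `clean` function, and the change-log layout are hypothetical.

```python
import re

import pandas as pd


def phone_normalize(value):
    """Keep digits only; this sketch assumes valid numbers have 10 digits."""
    digits = re.sub(r"\D", "", value)
    if len(digits) != 10:
        raise ValueError("expected 10 digits")
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


# Operation name -> function applied to a single cell value.
OPERATIONS = {
    "strip_whitespace": lambda v: v.strip(),
    "title_case": lambda v: v.title(),
    "phone_normalize": phone_normalize,
}


def clean(df, config):
    """Apply per-column operations; return (cleaned frame, change log)."""
    out = df.copy()
    out["_clean_errors"] = ""
    changes = []
    for column, op_names in config.items():
        for idx in out.index:
            original = out.at[idx, column]
            value = original
            try:
                for name in op_names:  # operations run in the listed order
                    value = OPERATIONS[name](value)
            except ValueError as exc:
                # Keep the original value and record the problem instead.
                out.at[idx, "_clean_errors"] += f"{column}: {exc}; "
                continue
            if value != original:
                out.at[idx, column] = value
                changes.append({"row": idx, "column": column,
                                "original": original, "cleaned": value})
    return out, pd.DataFrame(changes)


# Illustrative data and config only.
raw = pd.DataFrame({
    "name": ["  alice SMITH ", "Bob Jones"],
    "phone": ["(555) 123-4567", "12345"],
})
config = {"name": ["strip_whitespace", "title_case"], "phone": ["phone_normalize"]}
cleaned, change_log = clean(raw, config)
print(cleaned)
print(change_log)
```

Unparseable values stay in place and are reported rather than dropped, which matches the "nothing is silently discarded" behavior described above.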
# Splitting One Sheet into Separate Files by Column Value
// The Pain Point
A master dataset often needs to be distributed as separate files, such as one per region, department, or category. Doing this manually involves filtering, copying, and saving repeatedly, with a high risk of mixing up data between files.
// What the Script Does
This script reads a single Excel sheet and splits it into separate output files, one per unique value in a specified column. Each output file contains only the rows for that value, with the original formatting preserved. Filenames are generated automatically from the column values. Optionally, it can send each file as an email attachment using a name-to-email mapping you provide.
// How It Works
The script groups the DataFrame by the target column using pandas, then writes each group to its own .xlsx file using openpyxl. A naming template, like Sales_Report_{value}_{date}.xlsx, lets you control the output filename format. Column headers, data types, and basic formatting are preserved in each output file. An optional email mode reads a CSV mapping of {value} → {email address} and sends each file via the Simple Mail Transfer Protocol (SMTP).
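The group-and-name step could be sketched like this. The function names, the filename sanitizing rule, and the ISO date stamp are assumptions, as are the `value` and `date` placeholder names in the template. The email mode is left out.

```python
import re
from datetime import date

import pandas as pd


def plan_split(df, column, template="Sales_Report_{value}_{date}.xlsx", today=None):
    """Return {filename: DataFrame}, one entry per unique value in `column`."""
    stamp = (today or date.today()).isoformat()
    plan = {}
    for value, group in df.groupby(column, sort=True):
        # Replace characters that are unsafe in filenames (assumed rule).
        safe = re.sub(r"[^A-Za-z0-9_-]+", "_", str(value)).strip("_")
        filename = template.format(value=safe, date=stamp)
        plan[filename] = group.reset_index(drop=True)
    return plan


def write_split(plan):
    """Write each planned file to disk (requires openpyxl to be installed)."""
    for filename, group in plan.items():
        group.to_excel(filename, index=False, engine="openpyxl")


# Illustrative data only; write_split(plan) would create the files.
sales = pd.DataFrame({
    "region": ["East", "West", "East", "North/South"],
    "sales": [100, 200, 150, 50],
})
plan = plan_split(sales, "region", today=date(2024, 1, 31))
for filename, group in plan.items():
    print(filename, len(group))
```

Separating the planning step from the writing step makes it easy to preview filenames before anything touches the disk. Two distinct values that sanitize to the same name would collide in this sketch, so a real script would need to handle that case.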
⏩ Get the sheet splitter script
# Generating a Summary Pivot Report from Raw Data
// The Pain Point
Producing a summary report from raw data (totals by category, monthly trends, or top performers) involves building pivot tables, formatting them, and copying results to a presentable layout. When the source data updates frequently, this process is repeated from scratch each time.
// What the Script Does
This script reads a raw data Excel file, builds configurable pivot summaries, and writes a formatted multi-tab summary report. Charts are generated and embedded in the output file. You can re-run it any time the source data changes.
// How It Works
A configuration file defines the date field, the value field, grouping columns, and specific aggregations to run. The script uses pandas for all aggregation logic and openpyxl with Matplotlib for chart generation. Each summary type is given its own tab. Conditional formatting highlights the highest and lowest values. The report is designed for on-demand regeneration, and running the script again overwrites the previous output cleanly.
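The aggregation side of this can be sketched as follows. The config keys, tab names, and sample data are assumptions, and the charting and conditional formatting steps are not shown.

```python
import pandas as pd

# Hypothetical config; the real script reads this from a configuration file.
CONFIG = {
    "date_field": "order_date",
    "value_field": "revenue",
    "group_by": ["category"],
    "aggregations": ["sum", "mean"],
}


def build_summaries(df, config):
    """Return {tab name: summary DataFrame} for the configured summaries."""
    value = config["value_field"]
    summaries = {}

    # Totals per grouping column(s), one output column per aggregation.
    summaries["by_group"] = (
        df.groupby(config["group_by"])[value]
        .agg(config["aggregations"])
        .reset_index()
    )

    # Monthly trend: bucket dates into calendar months, then total the values.
    months = pd.to_datetime(df[config["date_field"]]).dt.to_period("M").astype(str)
    summaries["monthly"] = (
        df.groupby(months)[value].sum().rename_axis("month").reset_index()
    )
    return summaries


def write_report(summaries, path):
    """Write one sheet per summary (requires openpyxl to be installed)."""
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        for tab, frame in summaries.items():
            frame.to_excel(writer, sheet_name=tab, index=False)


# Illustrative data only; write_report(summaries, "report.xlsx") would save it.
orders = pd.DataFrame({
    "order_date": ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-10"],
    "category": ["A", "B", "A", "A"],
    "revenue": [100.0, 50.0, 200.0, 100.0],
})
summaries = build_summaries(orders, CONFIG)
print(summaries["by_group"])
print(summaries["monthly"])
```

Because every summary is rebuilt from the raw data on each run, regenerating the report after a data refresh is just a matter of running the script again.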
⏩ Get the pivot report generator script
# Wrapping Up
These five scripts cover common Excel tasks that are easy to automate but tedious to perform manually. Choose whichever one addresses the most frequent task in your workflow and start there. Here is a quick overview:
| Script Name | Purpose | Key Features | Best Use Case |
|---|---|---|---|
| Excel Files Merger | Combine multiple Excel/CSV files | Column alignment, source tracking, summary sheet | Consolidating data from multiple sources |
| Duplicate Finder | Identify exact and fuzzy duplicates | Fuzzy matching, confidence scores, color highlighting | Cleaning datasets with repeated records |
| Data Cleaner | Standardize messy exported data | Formatting rules, normalization, change log | Preprocessing raw external data |
| Sheet Splitter | Split one sheet into multiple files | Auto file naming, grouping, optional email sending | Distributing reports by category/region |
| Pivot Report Generator | Create summary reports from raw data | Automated pivots, charts, multi-tab output | Recurring reporting and dashboards |
Happy automating!
Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she's working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.