The Problem

Hierarchical .xlsx exports are everywhere - and they are a pain to work with programmatically.

📉

Merged and sparse data

Reporting tools export grouped data where "Region" only appears once for a block of rows. Every row below is blank until the next group. You need flat, filled rows for SQL queries, pandas, or dashboards.

🔧

Manual cleanup is fragile

Opening the file and dragging fill-down by hand works once - but breaks the moment a new export lands. You need something repeatable in a script or data pipeline.

🔄

Pandas overhead for simple ETL

Pulling in pandas just to forward-fill a few columns and write to SQLite is a heavy dependency. xlfilldown does the job with only openpyxl, streaming rows in constant memory.

🔍

No audit trail

When something looks wrong downstream, you need to trace a row back to the original spreadsheet. Without preserved row numbers and stable hashes, debugging is guesswork.

⚠️

Hierarchical resets

When "Region" changes, "Country" and "City" should reset - not carry stale values from the previous group. Most simple ffill solutions get this wrong.

📦

No lightweight CLI tool

You want to run a single command in CI/CD or cron to ingest an .xlsx into SQLite, with fill-down, hashing, and row numbers - without writing a custom script every time.

See It in Action

Watch xlfilldown transform sparse hierarchical data into clean, filled rows - then load it into a database.

Interactive Walkthrough

Step through the fill-down process to see how each stage transforms the data.

Step 1 of 4

sales_report.xlsx - Sheet1

	A	B	C	D	E
1	Region	Country	City	Product	Revenue
2	EMEA	UK	London	Widget A	12500
3			Manchester	Widget B	8300
4		Germany	Berlin	Widget A	15200
5			Munich	Widget C	9100
6	APAC	Japan	Tokyo	Widget B	22400
7			Osaka	Widget A	11800
8		Australia	Sydney	Widget C	17600

xlfilldown

sales_filled.xlsx - Processed

	A	B	C	D	E
1	Region	Country	City	Product	Revenue
2	EMEA	UK	London	Widget A	12500
3	EMEA	UK	Manchester	Widget B	8300
4	EMEA	Germany	Berlin	Widget A	15200
5	EMEA	Germany	Munich	Widget C	9100
6	APAC	Japan	Tokyo	Widget B	22400
7	APAC	Japan	Osaka	Widget A	11800
8	APAC	Australia	Sydney	Widget C	17600

The command that does it all:

$ xlfilldown db \
    --infile sales_report.xlsx \
    --insheet "Sheet1" \
    --header-row 1 \
    --fill-cols '["Region","Country"]' \
    --db sales.db \
    --row-hash --excel-row-numbers

One command. Clean data.

xlfilldown reads your .xlsx, forward-fills the sparse columns you specify, and writes to SQLite or a new Excel file. No pandas. No manual steps. Just pipe it into your workflow.

# Python API - two lines to go from messy .xlsx to clean SQLite
from xlfilldown.api import ingest_excel_to_sqlite

summary = ingest_excel_to_sqlite(
    file="report.xlsx",
    sheet="Sheet1",
    header_row=1,
    fill_cols=["Region", "Country"],
    db="output.db",
    row_hash=True,
    excel_row_numbers=True,
)
print(f"Ingested {summary['rows_ingested']} rows")

Who Is This For?

Anyone dealing with grouped .xlsx exports that need to become flat, queryable data.

Data Analyst

Analysts cleaning monthly reports

You get a grouped .xlsx export from finance, HR, or sales every week. You need flat rows for pivot tables, SQL queries, or BI tools. xlfilldown fills the gaps and gives you a clean sheet or SQLite database in one command.

Data Engineer

Engineers building data pipelines

You need a lightweight, scriptable step in your ETL pipeline to normalize .xlsx inputs before loading into a warehouse. xlfilldown streams rows in constant memory, adds row hashes for deduplication, and needs only openpyxl as a dependency.

DevOps / Automation

Automation in CI/CD or cron

Drop a single CLI command into your pipeline. No Python script to maintain - just xlfilldown db with the right flags. Schedule it with cron, run it in a container, or trigger it from Airflow.

Local User

Running locally on your laptop

Install with pip or pipx, run the command, open the output. No server needed, no cloud uploads. Your data stays on your machine. Perfect for one-off cleanup or quick ad-hoc analysis.

Python Developer

Embedding in your app

Use the Python API directly. Two functions - ingest_excel_to_sqlite() and ingest_excel_to_excel() - with clear return summaries. Build upload endpoints, batch processors, or data quality checks on top.

QA / Audit

Audit trails and data validation

Every row carries its original Excel row number and a SHA-256 hash. Trace any database record back to the source spreadsheet line. Detect duplicates. Verify data integrity after transformation.

Features

Everything you need to go from messy .xlsx to clean, queryable data.

⬇️

Hierarchical fill-down

When a parent column changes, child columns reset automatically. Region > Country > City stays correct across group boundaries.

🔁

Independent fill mode

Pandas-style ffill where each column carries forward on its own. No resets, no dependencies between columns.

💾

SQLite & Excel output

Write to a SQLite database or a new .xlsx file. Same input flags for both. Choose your destination, xlfilldown handles the rest.

🔒

SHA-256 row hashes

Stable, deterministic hashes over all columns after filling. Use for deduplication, change detection, and audit trails.

📋

Excel row numbers

Preserve the original 1-based row number from the source file. Trace any output record back to the exact spreadsheet line.

⚡

Streaming & constant memory

Uses openpyxl read-only mode. Rows stream through without loading the entire sheet into memory. Handles large files efficiently.

🚫

Row filtering

Drop blank spacer rows or require specific columns to be non-null after filling. Clean output without post-processing.

📥

Raw ingest mode

Skip fill-down entirely. Just load the sheet as-is with optional audit columns. Useful for raw copies and baseline comparisons.

🛠️

Minimal dependencies

Only requires openpyxl. No pandas, no numpy, no heavy frameworks. Installs fast, stays lean.

Quick Start

From install to clean data in under a minute.

Install

Grab it from PyPI with pip or pipx. Python 3.9+ required.

$ pip install xlfilldown

Fill down to SQLite

Point at your .xlsx, name the columns to fill, and pick a database.

$ xlfilldown db \
    --infile report.xlsx \
    --insheet "Sheet1" \
    --header-row 1 \
    --fill-cols '["Region","Country"]' \
    --db output.db \
    --row-hash --excel-row-numbers

Or fill down to a new .xlsx

Same flags, different destination. Output a clean, filled spreadsheet.

$ xlfilldown xlsx \
    --infile report.xlsx \
    --insheet "Sheet1" \
    --header-row 1 \
    --fill-cols-letters A B \
    --outfile filled.xlsx

Use the Python API

Two functions, clear return values. Embed in Flask, FastAPI, Airflow, or any script.

from xlfilldown.api import ingest_excel_to_sqlite

summary = ingest_excel_to_sqlite(
    file="report.xlsx",
    sheet="Sheet1",
    header_row=1,
    fill_cols=["Region", "Country"],
    db="output.db",
)
# Returns: {table, columns, rows_ingested, row_hash, excel_row_numbers}

row_hash	excel_row	Region	Country	City	Product	Revenue
a3f1c8...	2	EMEA	UK	London	Widget A	12500
7b2e91...	3	EMEA	UK	Manchester	Widget B	8300
d4f5a2...	4	EMEA	Germany	Berlin	Widget A	15200
e8c3b7...	5	EMEA	Germany	Munich	Widget C	9100
1a9d4e...	6	APAC	Japan	Tokyo	Widget B	22400
c6f2b8...	7	APAC	Japan	Osaka	Widget A	11800
5e7a13...	8	APAC	Australia	Sydney	Widget C	17600