A Python library for filling down values in .xlsx spreadsheets (OOXML). Stream into SQLite or a new Excel sheet with forward-fill padding, preserved row numbers, and stable SHA-256 row hashes.
$ pip install xlfilldown
Hierarchical .xlsx exports are everywhere - and they are a pain to work with programmatically.
Reporting tools export grouped data where "Region" only appears once for a block of rows. Every row below is blank until the next group. You need flat, filled rows for SQL queries, pandas, or dashboards.
Opening the file and dragging fill-down by hand works once - but breaks the moment a new export lands. You need something repeatable in a script or data pipeline.
Pulling in pandas just to forward-fill a few columns and write to SQLite is a heavy dependency. xlfilldown does the job with only openpyxl, streaming rows in constant memory.
When something looks wrong downstream, you need to trace a row back to the original spreadsheet. Without preserved row numbers and stable hashes, debugging is guesswork.
When "Region" changes, "Country" and "City" should reset - not carry stale values from the previous group. Most simple ffill solutions get this wrong.
You want to run a single command in CI/CD or cron to ingest an .xlsx into SQLite, with fill-down, hashing, and row numbers - without writing a custom script every time.
Watch xlfilldown transform sparse hierarchical data into clean, filled rows - then load it into a database.
Step through the fill-down process to see how each stage transforms the data.
| A | B | C | D | E | |
|---|---|---|---|---|---|
| 1 | Region | Country | City | Product | Revenue |
| 2 | EMEA | UK | London | Widget A | 12500 |
| 3 | Manchester | Widget B | 8300 | ||
| 4 | Germany | Berlin | Widget A | 15200 | |
| 5 | Munich | Widget C | 9100 | ||
| 6 | APAC | Japan | Tokyo | Widget B | 22400 |
| 7 | Osaka | Widget A | 11800 | ||
| 8 | Australia | Sydney | Widget C | 17600 |
| A | B | C | D | E | |
|---|---|---|---|---|---|
| 1 | Region | Country | City | Product | Revenue |
| 2 | EMEA | UK | London | Widget A | 12500 |
| 3 | EMEA | UK | Manchester | Widget B | 8300 |
| 4 | EMEA | Germany | Berlin | Widget A | 15200 |
| 5 | EMEA | Germany | Munich | Widget C | 9100 |
| 6 | APAC | Japan | Tokyo | Widget B | 22400 |
| 7 | APAC | Japan | Osaka | Widget A | 11800 |
| 8 | APAC | Australia | Sydney | Widget C | 17600 |
xlfilldown reads your .xlsx, forward-fills the sparse columns you specify, and writes to SQLite or a new Excel file. No pandas. No manual steps. Just pipe it into your workflow.
# Python API - two lines to go from messy .xlsx to clean SQLite from xlfilldown.api import ingest_excel_to_sqlite summary = ingest_excel_to_sqlite( file="report.xlsx", sheet="Sheet1", header_row=1, fill_cols=["Region", "Country"], db="output.db", row_hash=True, excel_row_numbers=True, ) print(f"Ingested {summary['rows_ingested']} rows")
Anyone dealing with grouped .xlsx exports that need to become flat, queryable data.
You get a grouped .xlsx export from finance, HR, or sales every week. You need flat rows for pivot tables, SQL queries, or BI tools. xlfilldown fills the gaps and gives you a clean sheet or SQLite database in one command.
You need a lightweight, scriptable step in your ETL pipeline to normalize .xlsx inputs before loading into a warehouse. xlfilldown streams rows in constant memory, adds row hashes for deduplication, and needs only openpyxl as a dependency.
Drop a single CLI command into your pipeline. No Python script to maintain - just xlfilldown db with the right flags. Schedule it with cron, run it in a container, or trigger it from Airflow.
Install with pip or pipx, run the command, open the output. No server needed, no cloud uploads. Your data stays on your machine. Perfect for one-off cleanup or quick ad-hoc analysis.
Use the Python API directly. Two functions - ingest_excel_to_sqlite() and ingest_excel_to_excel() - with clear return summaries. Build upload endpoints, batch processors, or data quality checks on top.
Every row carries its original Excel row number and a SHA-256 hash. Trace any database record back to the source spreadsheet line. Detect duplicates. Verify data integrity after transformation.
Everything you need to go from messy .xlsx to clean, queryable data.
When a parent column changes, child columns reset automatically. Region > Country > City stays correct across group boundaries.
Pandas-style ffill where each column carries forward on its own. No resets, no dependencies between columns.
Write to a SQLite database or a new .xlsx file. Same input flags for both. Choose your destination, xlfilldown handles the rest.
Stable, deterministic hashes over all columns after filling. Use for deduplication, change detection, and audit trails.
Preserve the original 1-based row number from the source file. Trace any output record back to the exact spreadsheet line.
Uses openpyxl read-only mode. Rows stream through without loading the entire sheet into memory. Handles large files efficiently.
Drop blank spacer rows or require specific columns to be non-null after filling. Clean output without post-processing.
Skip fill-down entirely. Just load the sheet as-is with optional audit columns. Useful for raw copies and baseline comparisons.
Only requires openpyxl. No pandas, no numpy, no heavy frameworks. Installs fast, stays lean.
From install to clean data in under a minute.
Grab it from PyPI with pip or pipx. Python 3.9+ required.
$ pip install xlfilldown
Point at your .xlsx, name the columns to fill, and pick a database.
$ xlfilldown db \ --infile report.xlsx \ --insheet "Sheet1" \ --header-row 1 \ --fill-cols '["Region","Country"]' \ --db output.db \ --row-hash --excel-row-numbers
Same flags, different destination. Output a clean, filled spreadsheet.
$ xlfilldown xlsx \ --infile report.xlsx \ --insheet "Sheet1" \ --header-row 1 \ --fill-cols-letters A B \ --outfile filled.xlsx
Two functions, clear return values. Embed in Flask, FastAPI, Airflow, or any script.
from xlfilldown.api import ingest_excel_to_sqlite summary = ingest_excel_to_sqlite( file="report.xlsx", sheet="Sheet1", header_row=1, fill_cols=["Region", "Country"], db="output.db", ) # Returns: {table, columns, rows_ingested, row_hash, excel_row_numbers}
Tested and supported across Python and openpyxl versions.