Stop Writing Hardcoded SQL. Build Metadata-Driven Engines.
I see too many Data Engineers manually writing the same SELECT, LEFT JOIN, and WHERE clauses for every new pipeline. If you’re copy-pasting SQL, you aren’t engineering; you’re transcribing.
I’ve shifted my development workflow to a Metadata-Driven Approach using Jinja2.
By decoupling the Business Logic (JSON/YAML) from the Execution Logic (SQL Template), I’ve turned a repetitive chore into a scalable, declarative engine.
The Framework: Separation of Concerns
Instead of a static 200-line SQL file, I define the “What” in a configuration and let the Jinja template handle the “How.”
The Logic:
- The Config: Defines tables, join keys, and specific columns. (Notice: No
SELECT *—I optimize for I/O and compute cost). - The Template: A reusable Jinja2 string that enforces engineering standards across every query it generates.
Why the Top 1% of Engineers Build This Way:
- Zero Redundancy (DRY): If I need to change a join pattern or add a global filter, I update the template once. It propagates to every pipeline instantly.
- Cost Governance: By explicitly defining columns in the metadata, I prevent “silent” cost explosions in Snowflake or BigQuery that come from lazy
SELECT *statements. - Scalability: This is the “internal” logic that powers tools like dbt. Understanding these abstractions is the difference between a Junior dev and a Senior Architect.
If you want to build systems that don’t break every time a schema changes, start thinking in templates, not just scripts.
The Implementation
Install Jinja 2
pip install jinja2
Setup Configuration
jinja_config = [
{
"table":"project.bronze.table1 table1",
"select":"table1.*",
"where":""
},
{
"table":"project.bronze.table2 table2",
"select":"table2.col1,col2,col3",
"where":"",
"on":"table1.col = table2.col"
},
{
"table":"project.bronze.table3 table3",
"select":"table3.col1,col2,col3,col4",
"where":"",
"on":"table1.col = table3.col"
}
]
Generate And Render
jinja_str = """
SELECT
{% for config in configs %}
{% if loop.last %}
{{config.select}}
{% else %}
{{config.select}},
{%endif%}
{%endfor%}
FROM
{%for config in configs%}
{%if loop.first%}
{{config.table}}
{%endif%}
{%if not loop.first%}
LEFT JOIN {{config.table}} ON {{config.on}}
{%endif%}
{%endfor%}
{%for config in configs%}
{%if loop.first%}
{%if config.where != ""%}
WHERE
{%endif%}
{%endif%}
{%if not loop.last%}
{%if config.where != "" %}
{{config.where}}
AND
{%endif%}
{%endif%}
{%endfor%}
"""
render = jinja2.Template(jinja_str).render(configs=jinja_config)
print(render)
Final Product
SELECT
table1.*,
table2.col1,col2,col3,
table3.col1,col2,col3,col4
FROM
project.bronze.table1 table1
LEFT JOIN project.bronze.table2 table2 ON table1.col = table2.col
LEFT JOIN project.bronze.table3 table3 ON table1.col = table3.col
# Note : This query was not hardcoded , it was generated by the confuguration and rendering process.