Stop Writing Hardcoded SQL. Build Metadata-Driven Engines.

I see too many Data Engineers manually writing the same SELECT, LEFT JOIN, and WHERE clauses for every new pipeline. If you’re copy-pasting SQL, you aren’t engineering; you’re transcribing.

I’ve shifted my development workflow to a Metadata-Driven Approach using Jinja2.

By decoupling the Business Logic (JSON/YAML) from the Execution Logic (SQL Template), I’ve turned a repetitive chore into a scalable, declarative engine.

The Framework: Separation of Concerns

Instead of a static 200-line SQL file, I define the “What” in a configuration and let the Jinja template handle the “How.”

The Logic:

  • The Config: Defines tables, join keys, and specific columns. (Notice: No SELECT *—I optimize for I/O and compute cost).
  • The Template: A reusable Jinja2 string that enforces engineering standards across every query it generates.

Why the Top 1% of Engineers Build This Way:

  1. Zero Redundancy (DRY): If I need to change a join pattern or add a global filter, I update the template once. It propagates to every pipeline instantly.
  2. Cost Governance: By explicitly defining columns in the metadata, I prevent “silent” cost explosions in Snowflake or BigQuery that come from lazy SELECT * statements.
  3. Scalability: This is the “internal” logic that powers tools like dbt. Understanding these abstractions is the difference between a Junior dev and a Senior Architect.

If you want to build systems that don’t break every time a schema changes, start thinking in templates, not just scripts.

The Implementation

Install Jinja 2
pip install jinja2
Setup Configuration
jinja_config = [
    {
        "table":"project.bronze.table1 table1",
        "select":"table1.*",
        "where":""
    },
    {
        "table":"project.bronze.table2 table2",
        "select":"table2.col1,col2,col3",
        "where":"",
        "on":"table1.col = table2.col"
    },
    {
        "table":"project.bronze.table3 table3",
        "select":"table3.col1,col2,col3,col4",
        "where":"",
        "on":"table1.col = table3.col"
    }
]
Generate And Render
jinja_str = """
    SELECT 
        {% for config in configs %}
            {% if loop.last %}
                {{config.select}}
            {% else %}
                {{config.select}},
            {%endif%}
        {%endfor%}
    FROM 
        {%for config in configs%}
            {%if loop.first%}
                {{config.table}}
            {%endif%}
            {%if not loop.first%}
                LEFT JOIN {{config.table}} ON {{config.on}}
            {%endif%}
        {%endfor%}
        
    {%for config in configs%}
        {%if loop.first%}
             {%if config.where != ""%} 
                WHERE
             {%endif%}
        {%endif%}
        {%if not loop.last%}
            {%if config.where != "" %}
                {{config.where}}
                AND 
            {%endif%}
        {%endif%}
    {%endfor%}
"""

render = jinja2.Template(jinja_str).render(configs=jinja_config)
print(render)
Final Product
SELECT 
          table1.*,
                table2.col1,col2,col3,
                table3.col1,col2,col3,col4
FROM 
                project.bronze.table1 table1
                LEFT JOIN project.bronze.table2 table2 ON table1.col = table2.col
                LEFT JOIN project.bronze.table3 table3 ON table1.col = table3.col

# Note : This query was not hardcoded , it was generated by the confuguration and rendering process.