How to optimize your Snowflake query performance

Optimizing your Snowflake queries can go a long way in maximizing performance—not to mention reducing your costs. Here are some practical tips on how to write more efficient queries in Snowflake.

March 5, 2025

In the world of Snowflake, performance isn't just about speed; it's also about managing spend. Every inefficient query not only slows down results, it also drives up your Snowflake warehouse costs.

Unlike traditional databases, Snowflake’s consumption-based pricing model means that query inefficiency translates directly to higher bills. When you’re burning compute resources on poorly optimized queries, you’re essentially watching credits disappear.

In this guide, we’ll walk through practical techniques for Snowflake query optimization that will help you squeeze maximum performance from your Snowflake environment without maxing out your budget.

First things first: Understanding Snowflake's architecture

Before diving into optimization techniques, let's take a moment to understand how Snowflake actually processes your SQL queries.

Snowflake’s architecture separates storage from compute, utilizing a multi-cluster shared data approach. What this means in practice:

  • Your data lives in the storage layer (organized in micro-partitions)
  • Queries execute on virtual warehouses (the compute layer)
  • Metadata is managed by a services layer that coordinates everything
Snowflake's architecture helps us understand how queries work.

This architecture has serious implications for your Snowflake query performance. When a query runs, Snowflake has to determine which micro-partitions are relevant, load them into the compute layer, and then process the data.

Efficient query execution is crucial in Snowflake's architecture to ensure faster results and minimize resource usage. The efficiency of this operation depends heavily on how well your data is organized and how well your SQL queries are structured. The better Snowflake can prune irrelevant micro-partitions, the less data needs to be scanned, and the faster and cheaper your query execution.

How Snowflake manages resources and queries

Snowflake manages resources and queries through a multi-layered architecture that separates storage and compute. Because the two scale independently, teams can run concurrent workloads without competing for a single pool of resources. When a query is submitted, the services layer routes it to the query optimizer, which analyzes the request and devises the most efficient execution plan available.

Snowflake then allocates compute resources to execute the query, utilizing virtual warehouses that dynamically scale horizontally depending on the immediate workload. This approach enables parallel processing in a shared data architecture, reducing latency and improving overall query performance. By leveraging this architecture, Snowflake ensures that resources are used efficiently, and queries are executed as quickly as possible, providing a robust environment for data analytics.
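To make that concrete, here's a minimal sketch of what a multi-cluster warehouse definition can look like; the warehouse name, size, and cluster counts are illustrative, and multi-cluster warehouses require Enterprise edition or above:

```sql
-- Illustrative multi-cluster warehouse: scales out under concurrency, suspends when idle
CREATE WAREHOUSE IF NOT EXISTS analytics_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3        -- add clusters when queries start queuing
  SCALING_POLICY = 'STANDARD'
  AUTO_SUSPEND = 60            -- seconds of inactivity before suspending
  AUTO_RESUME = TRUE;
```

With settings along these lines, Snowflake spins clusters up and down for you, so concurrency spikes are absorbed without permanently paying for peak capacity.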

Monitoring and benchmarking your queries

Before you start optimizing your queries, it's important to have a plan for benchmarking their performance, because you can't optimize what you don't measure. Get a solid baseline of your current performance by deploying some of these practices.

Leveraging Snowflake's query history

```sql
SELECT
  QUERY_ID,
  QUERY_TEXT,
  WAREHOUSE_SIZE,
  EXECUTION_TIME/1000 AS EXECUTION_TIME_SECONDS,
  BYTES_SCANNED/(1024*1024*1024) AS GB_SCANNED,
  PARTITIONS_SCANNED,
  PARTITIONS_TOTAL
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE EXECUTION_STATUS = 'SUCCESS'
  AND WAREHOUSE_NAME = 'YOUR_WAREHOUSE'
ORDER BY START_TIME DESC
LIMIT 100;
```

This query reveals critical metrics like execution time, data scanned, and partition usage – all key indicators of query efficiency.

Using the query profile tool

For deeper analysis, Snowflake’s Query Profile tool provides a visual breakdown of execution time across different operations. Access it through the Snowflake web UI by clicking on a query ID in the history tab. The profile shows exactly where time and resources go during execution, making inefficiencies much easier to spot.

Pay special attention to:

  • Percentage of time spent on each operation
  • Data scanning metrics
  • Spilling to disk (a red flag for memory pressure)

Once you have baseline measurements for your most important or resource-intensive queries, you can objectively evaluate your optimization efforts.
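Spilling in particular is easy to miss when reviewing one profile at a time. A query along these lines, using the spill columns in the same ACCOUNT_USAGE.QUERY_HISTORY view, can surface the worst offenders across your account; treat the time window and limit as placeholders to adjust:

```sql
-- Find recent queries that spilled to local or remote storage (a sign of memory pressure)
SELECT
  QUERY_ID,
  WAREHOUSE_NAME,
  BYTES_SPILLED_TO_LOCAL_STORAGE  / POWER(1024, 3) AS gb_spilled_local,
  BYTES_SPILLED_TO_REMOTE_STORAGE / POWER(1024, 3) AS gb_spilled_remote
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE START_TIME >= DATEADD(day, -7, CURRENT_TIMESTAMP())
  AND (BYTES_SPILLED_TO_LOCAL_STORAGE > 0 OR BYTES_SPILLED_TO_REMOTE_STORAGE > 0)
ORDER BY gb_spilled_remote DESC, gb_spilled_local DESC
LIMIT 50;
```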

Identifying performance bottlenecks

Identifying performance bottlenecks in Snowflake is crucial for maintaining optimal query performance. Start by monitoring your query history and examining key statistics such as partitions scanned and rows fetched. These metrics can highlight areas where performance may be lagging.

Techniques for spotting slowdowns and problem areas include analyzing resource allocation, using monitoring tools, and leveraging query profiling. Snowflake’s query profiling tools provide a visual breakdown of execution times and resource usage, making it easier to pinpoint inefficiencies. By reviewing these metrics regularly, teams can address problem queries before they turn into bottlenecks, keeping the Snowflake environment efficient and responsive.
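As a starting point, one hedged approach is to rank recent queries by elapsed time and by how much of each table they actually scanned; the window and limit below are placeholders:

```sql
-- Rank last week's queries by elapsed time and scan ratio (scanned vs. total partitions)
SELECT
  QUERY_ID,
  WAREHOUSE_NAME,
  TOTAL_ELAPSED_TIME / 1000 AS elapsed_seconds,
  PARTITIONS_SCANNED,
  PARTITIONS_TOTAL,
  CASE WHEN PARTITIONS_TOTAL > 0
       THEN PARTITIONS_SCANNED / PARTITIONS_TOTAL END AS scan_ratio
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE EXECUTION_STATUS = 'SUCCESS'
  AND START_TIME >= DATEADD(day, -7, CURRENT_TIMESTAMP())
ORDER BY TOTAL_ELAPSED_TIME DESC
LIMIT 25;
```

Queries that are both slow and scanning most of their partitions are usually the best candidates for the clustering and filter-pushdown techniques covered below.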

Best practices for optimizing your SQL queries

Now that you understand Snowflake's architecture and have tools to measure performance, let's dive into specific techniques that can dramatically improve your query speeds while reducing compute costs. These optimizations focus on helping Snowflake efficiently prune data and minimize resource usage.

Filter pushdown: Your first line of defense

One of the most powerful optimization techniques is ensuring filter conditions are pushed down to the storage layer, reducing data volume and allowing Snowflake to skip irrelevant micro-partitions entirely.

```sql
-- Inefficient: Scans the entire table before filtering
SELECT * FROM large_table
WHERE DATE(timestamp_column) = CURRENT_DATE();

-- Optimized: Pushes the filter down
SELECT * FROM large_table
WHERE timestamp_column >= CURRENT_DATE()
  AND timestamp_column < DATEADD(day, 1, CURRENT_DATE());
```

The second query allows Snowflake to leverage metadata about partition contents to skip entire blocks of data.

JOIN optimization: Mind your order

The order and type of JOINs significantly impact performance. When joining tables of different sizes, put the larger table on the left side of the join:

```sql
-- Optimized join order (assuming facts is much larger than dimensions)
SELECT f.*, d.dimension_name
FROM facts f
JOIN dimensions d ON f.dimension_id = d.id
WHERE f.create_date > '2023-01-01';
```

Also consider using `EXISTS` instead of `IN` for better performance with subqueries:

```sql
-- Instead of this
SELECT * FROM customers
WHERE customer_id IN (SELECT customer_id FROM orders WHERE amount > 1000);

-- Use this
SELECT * FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o
              WHERE o.customer_id = c.customer_id
              AND o.amount > 1000);
```

Materialized views: Pre-compute for performance

For frequently accessed aggregations or complex joins, materialized views can dramatically improve performance:

```sql
CREATE MATERIALIZED VIEW daily_sales_mv AS
SELECT
  date_trunc('day', order_date) AS day,
  product_id,
  sum(quantity) AS units_sold,
  sum(amount) AS revenue
FROM orders
GROUP BY 1, 2;
```

Snowflake automatically maintains materialized views, refreshing them when the underlying data changes. This can significantly enhance the performance of subsequent queries by providing precomputed results, reducing the need for repetitive calculations.

Snowflake-specific query optimization tips

The tips above apply to SQL queries in general, but there are also optimizations that are specific to Snowflake.

Clustering keys: Guide Snowflake's data organization

Clustering keys tell Snowflake how to organize your data physically, which directly impacts query performance. Choose columns that:

  • Are frequently used in `WHERE` clauses
  • Have high cardinality (but not too high)
  • Are commonly used for range filters

```sql
ALTER TABLE orders CLUSTER BY (order_date, customer_id);
```

To assess your clustering effectiveness:

```sql
-- Which tables have a clustering key defined
SELECT
  TABLE_NAME,
  CLUSTERING_KEY
FROM SNOWFLAKE.ACCOUNT_USAGE.TABLES
WHERE TABLE_NAME = 'ORDERS';

-- Clustering depth and overlaps for a specific table
SELECT SYSTEM$CLUSTERING_INFORMATION('ORDERS');
```

Result caching: Don't compute twice

One of the most effective optimization strategies is leveraging result caching. Snowflake caches query results for 24 hours, so identical queries return cached results without consuming additional credits, a massive optimization opportunity.

However, be aware that any data changes invalidate relevant cache entries. To maximize cache hits:

  • Standardize query patterns
  • Avoid unnecessary columns in `SELECT` clauses
  • Be consistent with whitespace and capitalization
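To check whether the cache is actually doing the work while you test, you can toggle the USE_CACHED_RESULT session parameter; this quick sketch is for benchmarking sessions, not something to leave disabled in production:

```sql
-- Turn the result cache off to measure a true cold run...
ALTER SESSION SET USE_CACHED_RESULT = FALSE;

-- ...run the query you're benchmarking here, then turn caching back on
ALTER SESSION SET USE_CACHED_RESULT = TRUE;
```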

Search optimization service: Speed up point lookups

For tables where you frequently need to locate specific rows by equality conditions, Snowflake's search optimization service can deliver massive performance gains:

```sql
ALTER TABLE customers ADD SEARCH OPTIMIZATION;

-- Or for equality lookups on specific columns
ALTER TABLE customers ADD SEARCH OPTIMIZATION ON EQUALITY(email, customer_id);
```

This creates search index structures that make point lookups nearly instantaneous, though it does incur additional storage costs.

Query acceleration service: Help for complex analytics

For complex analytical queries that can't be further optimized through normal means, Snowflake's query acceleration service can provide a performance boost:

```sql
ALTER WAREHOUSE analytics_wh SET ENABLE_QUERY_ACCELERATION = TRUE;
```

Beyond the query: What else to do to maximize query performance

We've covered best practices for the queries themselves, but what about the things you can do outside of the queries you're writing?

Optimizing data loading and storage

Optimizing data loading and storage is crucial for improving query performance in Snowflake. Start by using optimized file formats such as Parquet or ORC, which are designed for efficient data storage and retrieval. Parallel loading can also significantly speed up the data ingestion process, allowing multiple files to be loaded simultaneously.

Snowpipe is another powerful tool for continuous data ingestion, enabling real-time data loading with minimal latency. Efficient data compression techniques can reduce the amount of data that needs to be scanned, further improving query performance. Bulk loading methods, such as using the `COPY` command, can handle large data volumes efficiently, reducing the time required for data loading.

Choosing the right data type for your queries is also essential. Proper data types can speed up search and retrieval functions, reducing the amount of data that needs to be transferred and lowering bandwidth demands and egress fees. By utilizing these techniques, teams can improve overall query performance and reduce costs, ensuring a more efficient and cost-effective Snowflake environment.
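As an illustration, a bulk load of Parquet files from a stage might look like the sketch below; the stage and table names are hypothetical:

```sql
-- Hypothetical bulk load: Parquet files from a stage into a target table
COPY INTO raw_orders
FROM @parquet_stage
FILE_FORMAT = (TYPE = 'PARQUET')
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;  -- map Parquet columns to table columns by name
```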

Choosing the right Snowflake warehouse size

Selecting the appropriate Snowflake warehouse size is a big part of balancing performance and cost. A right-sized warehouse ensures efficient query execution without incurring unnecessary expenses.

  1. Understand your workload: Analyze the types of queries you run. Large, complex queries may benefit from a larger warehouse, while smaller, frequent queries might only require a medium or small size.
  2. Monitor query performance: Use Snowflake's query history to track execution times and resource usage. If queries consistently run slowly, consider increasing warehouse size to improve query performance.
  3. Leverage auto-scaling: Enable auto-scaling to automatically adjust the number of compute resources based on demand, ensuring optimal performance without manual intervention.
  4. Implement auto-suspend: Set your warehouse to auto-suspend after a period of inactivity to save on costs. This prevents charges from accruing when the warehouse is not in use.
  5. Evaluate cost vs. performance: Continuously assess whether the benefits of a larger warehouse justify the increased costs. Regularly review your Snowflake usage and adjust warehouse size as needed to optimize query performance and resource utilization.

By carefully selecting and managing warehouse sizes, you can maximize performance in Snowflake while keeping costs under control.
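In SQL terms, points 2 through 4 above boil down to settings like these; the warehouse name and values are illustrative rather than recommendations:

```sql
-- Illustrative right-sizing: bump the size, keep auto-suspend tight, resume on demand
ALTER WAREHOUSE reporting_wh SET
  WAREHOUSE_SIZE = 'LARGE'
  AUTO_SUSPEND = 120      -- suspend after 2 minutes idle to stop credit burn
  AUTO_RESUME = TRUE;
```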

Common Snowflake query mistakes to avoid

Now that we've covered some querying best practices, let's take a quick look at some common query optimization mistakes and how to avoid them.

The dreaded full table scan

Queries that force Snowflake to scan all micro-partitions are budget killers. Watch for:

  • Functions wrapped around filtered columns
  • Implicit type conversions
  • Complex expressions in `WHERE` clauses

```sql
-- Will cause a full table scan
SELECT * FROM orders WHERE YEAR(order_date) = 2023;

-- Better approach
SELECT * FROM orders
WHERE order_date >= '2023-01-01' AND order_date < '2024-01-01';
```

SELECT * syndrome

Retrieving all columns when you only need a few forces Snowflake to process more data than necessary:

```sql
-- Inefficient
SELECT * FROM customers JOIN orders ON customers.id = orders.customer_id;

-- Optimized
SELECT
  c.id, c.name, c.email,
  o.order_id, o.amount
FROM customers c
JOIN orders o ON c.id = o.customer_id;
```

Warehouse sizing missteps

Too small, and your queries wait for resources. Too large, and you waste credits on idle capacity. Consider:

  • Using different warehouses for different workload types
  • Implementing auto-suspend and auto-resume
  • Monitoring warehouse utilization metrics

```sql
-- Check warehouse utilization over the last week
SELECT
  WAREHOUSE_NAME,
  COUNT(*) AS query_count,
  AVG(EXECUTION_TIME)/1000 AS avg_execution_seconds,
  SUM(EXECUTION_TIME)/1000 AS total_execution_seconds,
  SUM(CREDITS_ATTRIBUTED_COMPUTE) AS credits_consumed  -- compute credits attributed to each query
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE START_TIME >= DATEADD(day, -7, CURRENT_DATE())
GROUP BY WAREHOUSE_NAME
ORDER BY credits_consumed DESC;
```

Advanced query optimization strategies

Everything listed above is a great place to start when setting out to optimize your queries. If you're already doing most of them, though, or just want more optimization ideas, try these techniques.

Rewrite queries for parallel execution

Break complex queries into smaller parts using temporary tables:

```sql
-- Create temp tables for intermediate results
CREATE TEMPORARY TABLE stage_1 AS
SELECT customer_id, SUM(amount) AS total_spend
FROM orders
WHERE order_date >= DATEADD(year, -1, CURRENT_DATE())
GROUP BY customer_id;

-- Use the temp table in subsequent operations
SELECT
  c.segment,
  AVG(s.total_spend) AS avg_annual_spend
FROM customers c
JOIN stage_1 s ON c.id = s.customer_id
GROUP BY c.segment;
```

This approach allows Snowflake to optimize each step independently and potentially execute parts in parallel.

Use temporary tables for complex transformations

When performing multi-stage transformations, temporary tables can improve both performance and readability. The sketch below is illustrative; the product table and its columns are hypothetical:

```sql
-- Stage 1: Filter and join base tables (illustrative; column names are hypothetical)
CREATE TEMPORARY TABLE stage_1 AS
SELECT
  o.order_id,
  o.customer_id,
  o.order_date,
  p.product_name,
  p.category
FROM orders o
JOIN products p ON o.product_id = p.id
WHERE o.order_date >= DATEADD(month, -3, CURRENT_DATE());

-- Stage 2: Aggregate the staged result in a separate, simpler query
SELECT category, COUNT(DISTINCT customer_id) AS buyers
FROM stage_1
GROUP BY category;
```

Create dedicated virtual warehouses for different workloads

Not all queries are created equal. Consider creating separate warehouses for:

  • ETL processes (larger, scheduled)
  • Interactive analytics (medium, auto-scaling)
  • Dashboard queries (optimized for concurrent users)
  • Data science workloads (XL for periodic intensive work)

This approach prevents resource contention and allows for workload-specific optimization.
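A hedged sketch of what that separation might look like; the names and sizes are purely illustrative and should be tuned to your own workloads:

```sql
-- Illustrative workload isolation: one warehouse per workload type
CREATE WAREHOUSE IF NOT EXISTS etl_wh
  WAREHOUSE_SIZE = 'LARGE' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;

CREATE WAREHOUSE IF NOT EXISTS dashboard_wh
  WAREHOUSE_SIZE = 'SMALL' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE
  MIN_CLUSTER_COUNT = 1 MAX_CLUSTER_COUNT = 4;  -- scale out for concurrent dashboard users

CREATE WAREHOUSE IF NOT EXISTS data_science_wh
  WAREHOUSE_SIZE = 'XLARGE' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;
```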

Final thoughts

Query optimization in Snowflake isn't just a technical exercise—it's a direct lever for controlling costs and improving the responsiveness of your data platform.

The techniques covered here should give you a solid foundation for maximizing your Snowflake investment. Remember that optimization is an ongoing process that requires regular monitoring and reassessment as your data volumes and query patterns evolve.

By applying these strategies, you'll not only see faster query results but also more predictable and manageable Snowflake costs—a win-win for both users and the finance team.

Want a tool that helps maintain the quality of your data while also keeping an eye on your Snowflake spend? Try Metaplane for free! Data teams use Metaplane to monitor their entire data stack, get better visibility into their data, and increase trust across their organization.
