Compare commits

...

3 Commits

Author SHA1 Message Date
Cursor Agent 1669594c36 Add cross-references to best practices documentation in schema tuning guide
Co-authored-by: huanmingwong <huanmingwong@gmail.com>
2025-07-24 03:27:58 +00:00
Cursor Agent 9c88a29f43 Add related resources section to schema tuning documentation
Co-authored-by: huanmingwong <huanmingwong@gmail.com>
2025-07-24 03:25:19 +00:00
Cursor Agent 2c48136ee8 Revamp schema tuning documentation with comprehensive best practices
Co-authored-by: huanmingwong <huanmingwong@gmail.com>
2025-07-24 03:16:43 +00:00
1 changed files with 494 additions and 129 deletions

View File

@ -3,182 +3,547 @@ displayed_sidebar: docs
sidebar_position: 50
---
# Schema Tuning Recipes
# Schema Tuning Best Practices
This document provides practical tips and best practices for optimizing query performance in StarRocks through effective schema design and foundational table choices. By understanding how different table types, keys, and distribution strategies impact query execution, you can significantly improve both speed and resource efficiency. Use these guidelines to make informed decisions when designing schemas, selecting table types, and tuning your StarRocks environment for high-performance analytics.
This document provides comprehensive guidance for optimizing query performance in StarRocks through effective schema design and table configuration. By understanding how different table types, keys, distribution strategies, and indexing options impact query execution, you can significantly improve both speed and resource efficiency.
## Table of Contents
- [Table Type Selection](#table-type-selection)
- [Distribution and Partitioning](#distribution-and-partitioning)
- [Indexing Strategies](#indexing-strategies)
- [Colocate Tables](#colocate-tables)
- [Schema Design Patterns](#schema-design-patterns)
- [Schema Changes](#schema-changes)
- [Performance Optimization Tips](#performance-optimization-tips)
- [Related Resources](#related-resources)
## Table Type Selection
StarRocks supports four table types: Duplicate Key table, Aggregate table, Unique Key table, and Primary Key table. All of them are sorted by KEY.
StarRocks supports four table types, each optimized for different use cases. All tables are sorted by their KEY columns for efficient querying.
- `AGGREGATE KEY`: When records with the same AGGREGATE KEY is loaded into StarRocks, the old and new records are aggregated. Currently, Aggregate tables supports the following aggregate functions: SUM, MIN, MAX, and REPLACE. Aggregate tables support aggregating data in advance, facilitating business statements and multi-dimensional analyses.
- `DUPLICATE KEY`: You only need to specify the sort key for a DUPLICATE KEY table. Records with the same DUPLICATE KEY exist at the same time. It is suitable for analyses that do not involve aggregating data in advance.
- `UNIQUE KEY`: When records with the same UNIQUE KEY is loaded into StarRocks, the new record overwrites the old one. A UNIQUE KEY tables is similar to an Aggregate table with REPLACE function. Both are suitable for analyses involving constant updates.
- `PRIMARY KEY`: Primary Key tables guarantee the uniqueness of records, and allow you to perform realtime updating.
### Aggregate Key Tables
~~~sql
CREATE TABLE site_visit
(
siteid INT,
city SMALLINT,
username VARCHAR(32),
pv BIGINT SUM DEFAULT '0'
**Use Case**: Pre-aggregated analytics, OLAP scenarios, dimensional analysis
Aggregate Key tables automatically merge records with identical keys using specified aggregation functions. This is ideal for metrics and fact tables where you want to pre-compute aggregations.
**Supported Aggregation Functions**:
- `SUM`: Numerical summation
- `MIN`/`MAX`: Minimum/maximum values
- `REPLACE`: Latest value overwrites (similar to UNIQUE KEY)
- `REPLACE_IF_NOT_NULL`: Replace only if new value is not null
- `HLL_UNION`: HyperLogLog union for approximate distinct counting
- `BITMAP_UNION`: Bitmap union for exact distinct counting
```sql
-- Example: Website analytics table
CREATE TABLE site_analytics (
date_key DATE,
site_id INT,
city_id SMALLINT,
page_views BIGINT SUM DEFAULT '0',
unique_visitors BIGINT SUM DEFAULT '0',
bounce_rate DECIMAL(5,4) REPLACE DEFAULT '0.0000',
last_updated DATETIME REPLACE
)
AGGREGATE KEY(siteid, city, username)
DISTRIBUTED BY HASH(siteid);
CREATE TABLE session_data
(
visitorid SMALLINT,
sessionid BIGINT,
visittime DATETIME,
city CHAR(20),
province CHAR(20),
ip varchar(32),
browser CHAR(20),
url VARCHAR(1024)
)
DUPLICATE KEY(visitorid, sessionid)
DISTRIBUTED BY HASH(sessionid, visitorid);
CREATE TABLE sales_order
(
orderid BIGINT,
status TINYINT,
username VARCHAR(32),
amount BIGINT DEFAULT '0'
)
UNIQUE KEY(orderid)
DISTRIBUTED BY HASH(orderid);
CREATE TABLE sales_order
(
orderid BIGINT,
status TINYINT,
username VARCHAR(32),
amount BIGINT DEFAULT '0'
)
PRIMARY KEY(orderid)
DISTRIBUTED BY HASH(orderid);
~~~
## Colocate Table
To speed up queries, tables with the same distribution can use a common bucketing column. In that case, data can be joined locally without being transferred across the cluster during the `join` operation.
~~~sql
CREATE TABLE colocate_table
(
visitorid SMALLINT,
sessionid BIGINT,
visittime DATETIME,
city CHAR(20),
province CHAR(20),
ip varchar(32),
browser CHAR(20),
url VARCHAR(1024)
)
DUPLICATE KEY(visitorid, sessionid)
DISTRIBUTED BY HASH(sessionid, visitorid)
AGGREGATE KEY(date_key, site_id, city_id)
DISTRIBUTED BY HASH(site_id)
PROPERTIES(
"colocate_with" = "group1"
"replication_num" = "3"
);
~~~
```
For more information about colocate join and replica management, see [Colocate join](../../using_starrocks/Colocate_join.md)
### Duplicate Key Tables
## Flat table and star schema
**Use Case**: Raw data storage, event logging, time-series data
StarRocks supports star schema, which is more flexible in modelling than flat tables. You can create a view to replace flat tables during modelling and then query data from multiple tables to accelerate queries.
Duplicate Key tables store all records without aggregation, making them perfect for detailed analysis and data exploration.
Flat tables have the following drawbacks:
```sql
-- Example: User session tracking
CREATE TABLE user_sessions (
user_id BIGINT,
session_id VARCHAR(64),
event_time DATETIME,
page_url VARCHAR(1024),
user_agent VARCHAR(512),
referrer VARCHAR(1024),
ip_address VARCHAR(45),
country_code CHAR(2),
device_type VARCHAR(32)
)
DUPLICATE KEY(user_id, session_id, event_time)
DISTRIBUTED BY HASH(user_id)
PROPERTIES(
"replication_num" = "3"
);
```
- Costly dimension updates because a flat table usually contains a massive number of dimensions. Each time a dimension is updated, the entire table must be updated. The situation exacerbates as the update frequency increases.
- High maintenance cost because flat tables require additional development workloads, storage space, and data backfilling operations.
- High data ingestion cost because a flat table has many fields and an Aggregate table may contain even more key fields. During data loading, more fields need to be sorted, which prolongs data loading.
### Unique Key Tables
If you have high requirements on query concurrency or low latency, you can still use flat tables.
**Use Case**: Dimension tables, slowly changing dimensions, upsert scenarios
## Partition and bucket
Unique Key tables ensure only one record exists per key, with new records overwriting existing ones.
StarRocks supports two levels of partitioning: the first level is RANGE partition and the second level is HASH bucket.
```sql
-- Example: Customer dimension table
CREATE TABLE customers (
customer_id BIGINT,
email VARCHAR(255),
first_name VARCHAR(100),
last_name VARCHAR(100),
registration_date DATE,
status VARCHAR(20),
last_login DATETIME,
total_orders INT DEFAULT '0'
)
UNIQUE KEY(customer_id)
DISTRIBUTED BY HASH(customer_id)
PROPERTIES(
"replication_num" = "3"
);
```
- RANGE partition: RANGE partition is used to divide data into different intervals (can be understood as dividing the original table into multiple sub-tables). Most users choose to set partitions by time, which has the following advantages:
### Primary Key Tables
- Easier to distinguish between hot and cold data
- Be able to leverage StarRocks tiered storage (SSD + SATA)
- Faster to delete data by partition
**Use Case**: Real-time updates, transactional workloads, CDC scenarios
- HASH bucket: Divides data into different buckets according to the hash value.
Primary Key tables provide ACID guarantees and support real-time updates with better performance than Unique Key tables for frequent updates.
- It is recommended to use a column with a high degree of discrimination for bucketing to avoid data skew.
- To facilitate data recovery, it is recommended to keep the size of compressed data in each bucket between 100 MB to 1 GB. We recommend you configure an appropriate number of buckets when you create a table or add a partition.
- Random bucketing is not recommended. You must explicitly specify the HASH bucketing column when you create a table.
For comprehensive guidance on Primary Key table design, performance optimization, and best practices, see **[Primary Key Table Best Practices](../primarykey_table.md)**.
## Sparse index and bloomfilter index
```sql
-- Example: Real-time order tracking
CREATE TABLE orders_realtime (
order_id BIGINT,
customer_id BIGINT,
order_status VARCHAR(20),
order_amount DECIMAL(10,2),
created_at DATETIME,
updated_at DATETIME
)
PRIMARY KEY(order_id)
DISTRIBUTED BY HASH(order_id)
PROPERTIES(
"replication_num" = "3",
"enable_persistent_index" = "true"
);
```
StarRocks stores data in an ordered manner and builds sparse indexes at a granularity of 1024 rows.
## Distribution and Partitioning
StarRocks selects a fixed-length prefix (currently 36 bytes) in the schema as the sparse index.
Proper partitioning and bucketing strategies are fundamental to StarRocks performance. For comprehensive guidance on these topics, see our dedicated best practices guides:
When creating a table, it is recommended to place common filter fields at the beginning of the schema declaration. Fields with the highest differentiation and query frequency must be placed first.
- **[Partitioning Best Practices](../partitioning.md)** - Complete guide to time-based and composite partitioning strategies
- **[Bucketing Best Practices](../bucketing.md)** - In-depth comparison of Hash vs Random bucketing with practical examples
- **[Table Clustering](../table_clustering.md)** - Advanced clustering strategies for improved query performance
A VARCHAR field must placed at the end of a sparse index because the index gets truncated from the VARCHAR field. If the VARCHAR field appears first, the index may be less than 36 bytes.
### Partitioning Strategy
Use the above `site_visit` table as an example. The table has four columns: `siteid, city, username, pv`. The sort key contains three columns `siteid, city, username`, which occupy 4, 2, and 32 bytes respectively. So the prefix index (sparse index) can be the first 30 bytes of `siteid + city + username`.
StarRocks supports **RANGE partitioning** for the first level, typically by date/time columns.
In addition to sparse indexes, StarRocks also provides bloomfilter indexes, which are effective for filtering columns with high discrimination. If you want to place VARCHAR fields before other fields, you can create bloomfilter indexes.
**Benefits of Time-based Partitioning**:
- Efficient data lifecycle management
- Improved query performance through partition pruning
- Support for tiered storage (hot/cold data)
- Faster data deletion by dropping partitions
## Inverted Index
```sql
-- Example: Partitioned fact table
CREATE TABLE sales_fact (
transaction_date DATE,
store_id INT,
product_id INT,
quantity INT,
revenue DECIMAL(10,2)
)
DUPLICATE KEY(transaction_date, store_id, product_id)
PARTITION BY RANGE(transaction_date) (
PARTITION p202301 VALUES [('2023-01-01'), ('2023-02-01')),
PARTITION p202302 VALUES [('2023-02-01'), ('2023-03-01')),
PARTITION p202303 VALUES [('2023-03-01'), ('2023-04-01'))
)
DISTRIBUTED BY HASH(store_id, product_id) BUCKETS 32;
```
StarRocks adopts Bitmap Indexing technology to support inverted indexes that can be applied to all columns of the Duplicate Key table and the key column of the Aggregate table and Unique Key table. Bitmap Index is suitable for columns with a small value range, such as gender, city, and province. As the range expands, the bitmap index expands in parallel.
For detailed partitioning strategies including multi-tenant scenarios, granularity selection, and mixed granularity approaches, see **[Partitioning Best Practices](../partitioning.md)**.
## Materialized view (rollup)
### Bucketing Strategy
A rollup is essentially a materialized index of the original table (base table). When creating a rollup, only some columns of the base table can be selected as the schema, and the order of the fields in the schema can be different from that of the base table. Below are some use cases of using a rollup:
**HASH bucketing** distributes data across nodes for parallel processing.
- Data aggregation in the base table is not high, because the base table has fields with high differentiation. In this case, you may consider selecting some columns to create rollups. Use the above `site_visit` table as an example:
**Best Practices**:
- Choose high-cardinality columns for even distribution
- Avoid columns with severe data skew
- Target 100MB-1GB of compressed data per bucket
- Use multiple columns for better distribution if needed
~~~sql
site_visit(siteid, city, username, pv)
~~~
```sql
-- Good: High cardinality distribution
DISTRIBUTED BY HASH(user_id) BUCKETS 32;
`siteid` may lead to poor data aggregation. If you need to frequently calculate PVs by city, you can create a rollup with only `city` and `pv`.
-- Better: Multiple columns for even distribution
DISTRIBUTED BY HASH(user_id, product_id) BUCKETS 32;
~~~sql
ALTER TABLE site_visit ADD ROLLUP rollup_city(city, pv);
~~~
-- Avoid: Low cardinality causing skew
-- DISTRIBUTED BY HASH(country) BUCKETS 32; -- Don't do this
```
- The prefix index in the base table cannot be hit, because the way the base table is built cannot cover all the query patterns. In this case, you may consider creating a rollup to adjust the column order. Use the above `session_data` table as an example:
For comprehensive bucketing guidance including Random bucketing, operational guidelines, and trade-offs, see **[Bucketing Best Practices](../bucketing.md)**.
~~~sql
session_data(visitorid, sessionid, visittime, city, province, ip, browser, url)
~~~
## Indexing Strategies
If there are cases where you need to analyze visits by `browser` and `province` in addition to `visitorid`, you can create a separate rollup:
### Sparse Index Optimization
~~~sql
ALTER TABLE session_data
ADD ROLLUP rollup_browser(browser,province,ip,url)
DUPLICATE KEY(browser,province);
~~~
StarRocks automatically creates sparse indexes (every 1024 rows) using the first 36 bytes of the sort key.
## Schema change
**Key Principles**:
1. Place most selective filter columns first
2. Place frequently queried columns early
3. Put VARCHAR columns at the end (they truncate the index)
4. Order by query frequency and selectivity
There are three ways to change schemas in StarRocks: sorted schema change, direct schema change, and linked schema change.
```sql
-- Optimized column order for sparse index
CREATE TABLE user_events (
user_id BIGINT, -- High selectivity, frequent filter
event_date DATE, -- Common filter, good for partitioning
event_type VARCHAR(32), -- Moderate selectivity
session_id VARCHAR(64), -- High selectivity but VARCHAR
event_data JSON -- Low selectivity, analysis column
)
DUPLICATE KEY(user_id, event_date, event_type, session_id)
DISTRIBUTED BY HASH(user_id);
```
- Sorted schema change: Change the sorting of a column and reorder the data. For example, deleting a column in a sorted schema leads to data reorder.
### Bloom Filter Index
`ALTER TABLE site_visit DROP COLUMN city;`
Use Bloom Filter indexes for high-cardinality columns that need exact match filtering.
- Direct schema change: Transform the data instead of reordering it, for example, changing the column type or adding a column to a sparse index.
```sql
-- Create Bloom Filter index for VARCHAR columns
CREATE TABLE products (
product_id BIGINT,
product_name VARCHAR(255),
category VARCHAR(100),
brand VARCHAR(100),
sku VARCHAR(64)
)
DUPLICATE KEY(product_id)
DISTRIBUTED BY HASH(product_id)
PROPERTIES(
"bloom_filter_columns" = "product_name,sku"
);
```
`ALTER TABLE site_visit MODIFY COLUMN username varchar(64);`
### Bitmap Index
- Linked schema change: Complete changes without transforming data, for example, adding columns.
Bitmap indexes are effective for low-cardinality columns (typically < 10,000 distinct values).
`ALTER TABLE site_visit ADD COLUMN click bigint SUM default '0';`
```sql
-- Create bitmap index for categorical columns
ALTER TABLE user_events ADD INDEX idx_event_type (event_type) USING BITMAP;
ALTER TABLE customers ADD INDEX idx_country (country_code) USING BITMAP;
ALTER TABLE products ADD INDEX idx_category (category) USING BITMAP;
```
It is recommended to choose an appropriate schema when you create tables to accelerate schema changes.
### Inverted Index
Inverted indexes support full-text search and complex string operations.
```sql
-- Create inverted index for text search
CREATE TABLE documents (
doc_id BIGINT,
title VARCHAR(500),
content TEXT,
tags ARRAY<VARCHAR(50)>
)
DUPLICATE KEY(doc_id)
DISTRIBUTED BY HASH(doc_id)
PROPERTIES(
"inverted_index_columns" = "title,content"
);
```
## Colocate Tables
Colocate tables share the same bucketing strategy, enabling local joins without network shuffling.
**Benefits**:
- Dramatically faster join performance
- Reduced network I/O
- Lower resource consumption
```sql
-- Create colocated tables for efficient joins
CREATE TABLE orders (
order_id BIGINT,
customer_id BIGINT,
order_date DATE,
total_amount DECIMAL(10,2)
)
DUPLICATE KEY(order_id)
DISTRIBUTED BY HASH(customer_id) BUCKETS 32
PROPERTIES(
"colocate_with" = "customer_group"
);
CREATE TABLE order_items (
order_id BIGINT,
item_id BIGINT,
quantity INT,
unit_price DECIMAL(8,2)
)
DUPLICATE KEY(order_id, item_id)
DISTRIBUTED BY HASH(order_id) BUCKETS 32
PROPERTIES(
"colocate_with" = "customer_group" -- Same group as orders table
);
```
## Schema Design Patterns
For advanced table organization and clustering strategies, see **[Table Clustering Best Practices](../table_clustering.md)**.
### Star Schema vs. Flat Tables
**Star Schema Advantages**:
- Flexible modeling and easier maintenance
- Efficient dimension updates
- Better storage utilization
- Supports multiple fact tables sharing dimensions
**Flat Table Use Cases**:
- Ultra-low latency requirements
- High query concurrency
- Simple aggregation patterns
```sql
-- Star Schema Example
-- Fact table
CREATE TABLE sales_fact (
date_key INT,
customer_key INT,
product_key INT,
store_key INT,
quantity INT,
revenue DECIMAL(10,2),
cost DECIMAL(10,2)
)
AGGREGATE KEY(date_key, customer_key, product_key, store_key)
DISTRIBUTED BY HASH(customer_key, product_key);
-- Dimension tables
CREATE TABLE dim_customer (
customer_key INT,
customer_id VARCHAR(50),
customer_name VARCHAR(200),
segment VARCHAR(50),
region VARCHAR(50)
)
UNIQUE KEY(customer_key)
DISTRIBUTED BY HASH(customer_key);
```
### Materialized Views (Rollups)
Create rollups to optimize specific query patterns or improve aggregation performance.
```sql
-- Base table with detailed data
CREATE TABLE web_analytics (
timestamp DATETIME,
user_id BIGINT,
page_id INT,
session_id VARCHAR(64),
country VARCHAR(50),
device_type VARCHAR(20),
page_views BIGINT SUM DEFAULT '1',
time_spent INT SUM DEFAULT '0'
)
AGGREGATE KEY(timestamp, user_id, page_id, session_id, country, device_type)
DISTRIBUTED BY HASH(user_id);
-- Rollup for country-level analysis
ALTER TABLE web_analytics
ADD ROLLUP country_rollup (
timestamp,
country,
device_type,
page_views,
time_spent
) DUPLICATE KEY(timestamp, country, device_type);
-- Rollup for hourly aggregation
ALTER TABLE web_analytics
ADD ROLLUP hourly_rollup (
timestamp,
country,
page_views,
time_spent
) AGGREGATE KEY(timestamp, country);
```
## Schema Changes
StarRocks supports three types of schema changes with different performance characteristics:
### Linked Schema Change (Fastest)
Operations that don't require data transformation:
```sql
-- Add new columns (always added at the end)
ALTER TABLE customers ADD COLUMN loyalty_points INT DEFAULT '0';
ALTER TABLE customers ADD COLUMN preferences JSON;
```
### Direct Schema Change (Medium)
Operations requiring data transformation but not reordering:
```sql
-- Modify column types (compatible changes)
ALTER TABLE customers MODIFY COLUMN phone VARCHAR(20);
ALTER TABLE products MODIFY COLUMN price DECIMAL(12,2);
```
### Sorted Schema Change (Slowest)
Operations requiring data reordering:
```sql
-- Drop columns from sort key
ALTER TABLE sales_fact DROP COLUMN old_dimension;
-- Change sort key order (requires table recreation)
-- This is typically avoided in production
```
## Performance Optimization Tips
### 1. Column Ordering Strategy
```sql
-- Optimal column ordering
CREATE TABLE optimized_table (
-- 1. Primary filters (highest selectivity)
user_id BIGINT,
date_partition DATE,
-- 2. Secondary filters
category_id INT,
status TINYINT,
-- 3. Join keys
product_id BIGINT,
-- 4. Metrics/aggregatable columns
quantity INT,
revenue DECIMAL(10,2),
-- 5. VARCHAR columns (at the end)
description VARCHAR(1000),
notes TEXT
)
DUPLICATE KEY(user_id, date_partition, category_id, status)
DISTRIBUTED BY HASH(user_id);
```
### 2. Bucket Sizing Guidelines
```sql
-- Calculate optimal bucket count
-- Target: 100MB - 1GB compressed data per bucket
-- Formula: (Total_Data_Size_GB / Target_Size_Per_Bucket_GB)
-- For a 100GB table targeting 500MB per bucket:
DISTRIBUTED BY HASH(key_column) BUCKETS 200;
```
### 3. Partition Management
```sql
-- Dynamic partition for automated partition management
CREATE TABLE time_series_data (
event_time DATETIME,
metric_name VARCHAR(100),
metric_value DOUBLE
)
DUPLICATE KEY(event_time, metric_name)
PARTITION BY RANGE(event_time) ()
DISTRIBUTED BY HASH(metric_name)
PROPERTIES(
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.start" = "-30", -- Keep 30 days of history
"dynamic_partition.end" = "3", -- Pre-create 3 days ahead
"dynamic_partition.prefix" = "p",
"dynamic_partition.buckets" = "32"
);
```
### 4. Monitoring and Maintenance
```sql
-- Check table statistics
SHOW TABLE STATUS LIKE 'your_table_name';
-- Analyze partition sizes
SHOW PARTITIONS FROM your_table_name;
-- Check index usage
SHOW INDEX FROM your_table_name;
-- Optimize table after major changes
OPTIMIZE TABLE your_table_name;
```
## Related Resources
### Core Table Design Documentation
- **[Table Overview](../../table_design/StarRocks_table_design.md)** - Comprehensive guide to StarRocks table structure and design principles
- **[Table Types](../../table_design/table_types/table_types.md)** - Detailed comparison of Duplicate, Aggregate, Unique, and Primary Key tables
- **[Table Capabilities](../../table_design/table_types/table_capabilities.md)** - Feature comparison across different table types
### Distribution and Partitioning Best Practices
- **[Partitioning Best Practices](../partitioning.md)** - In-depth guide to time-based and composite partitioning strategies
- **[Bucketing Best Practices](../bucketing.md)** - Comprehensive comparison of Hash vs Random bucketing with practical examples
- **[Data Distribution](../../table_design/data_distribution/Data_distribution.md)** - Technical details on how data is distributed across nodes
### Indexing and Performance
- **[Indexes Overview](../../table_design/indexes/indexes.md)** - Complete guide to all index types in StarRocks
- **[Prefix Index and Sort Key](../../table_design/indexes/Prefix_index_sort_key.md)** - Detailed explanation of sparse indexes and sort key optimization
- **[Bloom Filter Index](../../table_design/indexes/Bloomfilter_index.md)** - Guide to creating and using Bloom Filter indexes
- **[Bitmap Index](../../table_design/indexes/Bitmap_index.md)** - Best practices for low-cardinality column indexing
- **[Inverted Index](../../table_design/indexes/inverted_index.md)** - Full-text search and string operation optimization
### Join Optimization
- **[Colocate Join](../../using_starrocks/Colocate_join.md)** - Complete guide to setting up and managing colocate joins for high-performance local joins
### Query Tuning Integration
- **[Query Tuning Introduction](./query_plan_intro.md)** - Overview of the complete query optimization process
- **[Query Profile Overview](./query_profile_overview.md)** - How to analyze query execution and identify bottlenecks
- **[Query Tuning Recipes](./query_profile_tuning_recipes.md)** - Symptom-driven diagnosis and performance optimization
- **[Query Hints](./query_hint.md)** - Using optimizer hints to guide query execution
### Specialized Table Types
- **[Primary Key Tables](../primarykey_table.md)** - Best practices for real-time update scenarios
- **[Table Clustering](../table_clustering.md)** - Advanced clustering strategies for improved query performance
### Operational Considerations
- **[Data Compression](../../table_design/data_compression.md)** - Optimizing storage and I/O performance
- **[Hybrid Tables](../../table_design/hybrid_table.md)** - Combining different storage strategies
## Conclusion
Effective schema design in StarRocks requires understanding your query patterns, data characteristics, and performance requirements. Start with these best practices:
1. **Choose the right table type** based on your use case
2. **Design optimal sort keys** with selective columns first
3. **Partition by time** for lifecycle management
4. **Distribute evenly** using high-cardinality columns
5. **Use appropriate indexes** for your query patterns
6. **Consider colocate tables** for frequent joins
7. **Monitor and adjust** based on actual performance
Regular monitoring and iterative optimization will help you achieve optimal performance for your specific workload.