[Doc] add automatic partitioning (#20577)
* add automatic partitioning Signed-off-by: hellolilyliuyi <hellolilyliuyi123@163.com> * add automatic partitioning Signed-off-by: hellolilyliuyi <hellolilyliuyi123@163.com> * add automatic partitioning Signed-off-by: hellolilyliuyi <hellolilyliuyi123@163.com> * add automatic partitioning Signed-off-by: hellolilyliuyi <hellolilyliuyi123@163.com> * add automatic partitioning Signed-off-by: hellolilyliuyi <hellolilyliuyi123@163.com> * add automatic partitioning Signed-off-by: hellolilyliuyi <hellolilyliuyi123@163.com> * add automatic partitioning Signed-off-by: hellolilyliuyi <hellolilyliuyi123@163.com> * add automatic partitioning Signed-off-by: hellolilyliuyi <hellolilyliuyi123@163.com> --------- Signed-off-by: hellolilyliuyi <hellolilyliuyi123@163.com>
This commit is contained in:
parent
5249bc5931
commit
e6f6ebc33b
|
|
@ -18,6 +18,7 @@
|
|||
+ [Data models](./table_design/Data_model.md)
|
||||
+ Data distribution
|
||||
+ [Data distribution](./table_design/Data_distribution.md)
|
||||
+ [Automatic partitioning](./table_design/automatic_partitioning.md)
|
||||
+ [Dynamic partitioning](./table_design/dynamic_partitioning.md)
|
||||
+ [Data compression](./table_design/data_compression.md)
|
||||
+ [Sort keys and prefix indexes](./table_design/Sort_key.md)
|
||||
|
|
|
|||
|
|
@ -168,155 +168,167 @@ If you intend to set the number of buckets, StarRocks 2.4 and later versions sup
|
|||
|
||||
### Create partitions
|
||||
|
||||
You can partition a table in threes ways by using `PARTITION BY RANGE` clause:
|
||||
Since version 3.0, StarRocks supports [automatic partitioning](./automatic_partitioning.md) during data loading. You no longer need to create a large number of partitions in advance. This on-demand partition creation method can help you reduce the O&M costs.
|
||||
|
||||
- Partition a table with the LESS THAN clause. For more information, see [CREATE TABLE](../sql-reference/sql-statements/data-definition/CREATE%20TABLE.md).
|
||||
If you need to create partitions in advance, you can use other partition creation methods, such as enabling dynamic partitioning and manually creating partitions.
|
||||
|
||||
```SQL
|
||||
PARTITION BY RANGE (k1, k2, ...)
|
||||
(
|
||||
PARTITION partition_name1 VALUES LESS THAN ("value1", "value2", ...),
|
||||
PARTITION partition_name2 VALUES LESS THAN ("value1", "value2", ...),
|
||||
PARTITION partition_name3 VALUES LESS THAN (MAXVALUE)
|
||||
)
|
||||
```
|
||||
- Dynamic partitioning
|
||||
|
||||
- Partition a table by specifying values of a fixed range. For more information, see CREATE TABLE.
|
||||
StarRocks supports [dynamic partitioning](./dynamic_partitioning.md), which can automatically manage the time to live (TTL) of partitions, such as partitioning new input data in tables and deleting expired partitions. This feature significantly reduces maintenance costs.
|
||||
|
||||
```SQL
|
||||
PARTITION BY RANGE (k1, k2, k3, ...)
|
||||
(
|
||||
PARTITION partition_name1 VALUES [("k1-lower1", "k2-lower1", "k3-lower1",...), ("k1-upper1", "k2-upper1", "k3-upper1", ...)],
|
||||
PARTITION partition_name2 VALUES [("k1-lower1-2", "k2-lower1-2", ...), ("k1-upper1-2", MAXVALUE, )],
|
||||
"k3-upper1-2", ...
|
||||
)
|
||||
```
|
||||
- Manually create partitions
|
||||
|
||||
- Partition a table by specifying START, END, and EVERY. You can create multiple partitions at a time by using this method. For more information, see CREATE TABLE.
|
||||
- Partition a table with the LESS THAN clause. For more information, see [CREATE TABLE](../sql-reference/sql-statements/data-definition/CREATE%20TABLE.md).
|
||||
|
||||
```SQL
|
||||
PARTITION BY RANGE (k1, k2, ...)
|
||||
(
|
||||
START ("value1") END ("value2") EVERY (INTERVAL value3 day)
|
||||
)
|
||||
```
|
||||
```SQL
|
||||
PARTITION BY RANGE (k1, k2, ...)
|
||||
(
|
||||
PARTITION partition_name1 VALUES LESS THAN ("value1", "value2", ...),
|
||||
PARTITION partition_name2 VALUES LESS THAN ("value1", "value2", ...),
|
||||
PARTITION partition_name3 VALUES LESS THAN (MAXVALUE)
|
||||
)
|
||||
```
|
||||
|
||||
#### Examples
|
||||
- Partition a table by specifying values of a fixed range. For more information, see [CREATE TABLE](../sql-reference/sql-statements/data-definition/CREATE%20TABLE.md).
|
||||
|
||||
The following examples demonstrate how to partition a table by specifying START, END, and EVERY.
|
||||
```SQL
|
||||
PARTITION BY RANGE (k1, k2, k3, ...)
|
||||
(
|
||||
PARTITION partition_name1 VALUES [("k1-lower1", "k2-lower1", "k3-lower1",...), ("k1-upper1", "k2-upper1", "k3-upper1", ...)],
|
||||
PARTITION partition_name2 VALUES [("k1-lower1-2", "k2-lower1-2", ...), ("k1-upper1-2", MAXVALUE, )],
|
||||
"k3-upper1-2", ...
|
||||
)
|
||||
```
|
||||
|
||||
- The data type of the partitioning column is DATE and you specify the time range of partitioning via START and END and define the time range via EVERY. Example:
|
||||
- Create multiple partitions at a time.
|
||||
|
||||
```SQL
|
||||
CREATE TABLE site_access (
|
||||
datekey DATE,
|
||||
site_id INT,
|
||||
city_code SMALLINT,
|
||||
user_name VARCHAR(32),
|
||||
pv BIGINT DEFAULT '0'
|
||||
)
|
||||
ENGINE=olap
|
||||
DUPLICATE KEY(datekey, site_id, city_code, user_name)
|
||||
PARTITION BY RANGE (datekey) (
|
||||
START ("2021-01-01") END ("2021-01-04") EVERY (INTERVAL 1 DAY)
|
||||
)
|
||||
DISTRIBUTED BY HASH(site_id) BUCKETS 10
|
||||
PROPERTIES ("replication_num" = "1"
|
||||
);
|
||||
```
|
||||
Partition a table by specifying START, END, and EVERY. You can create multiple partitions at a time by using this method. For more information, see [CREATE TABLE](../sql-reference/sql-statements/data-definition/CREATE%20TABLE.md).
|
||||
|
||||
The PARTITION BY RANGE clause in this example is equal to the following:
|
||||
```SQL
|
||||
PARTITION BY RANGE (k1, k2, ...)
|
||||
(
|
||||
START ("value1") END ("value2") EVERY (INTERVAL value3 day)
|
||||
)
|
||||
```
|
||||
|
||||
```SQL
|
||||
PARTITION BY RANGE (datekey) (
|
||||
PARTITION p20210101 VALUES [('2021-01-01'), ('2021-01-02')),
|
||||
PARTITION p20210102 VALUES [('2021-01-02'), ('2021-01-03')),
|
||||
PARTITION p20210103 VALUES [('2021-01-03'), ('2021-01-04'))
|
||||
)
|
||||
```
|
||||
**Examples**
|
||||
|
||||
- The data type of the partitioning column is DATE and you specify different EVERY clauses for different time ranges (which cannot overlap with each other). Example:
|
||||
The following examples demonstrate how to partition a table by specifying START, END, and EVERY.
|
||||
|
||||
```SQL
|
||||
CREATE TABLE site_access(
|
||||
datekey DATE,
|
||||
site_id INT,
|
||||
city_code SMALLINT,
|
||||
user_name VARCHAR(32),
|
||||
pv BIGINT DEFAULT '0'
|
||||
)
|
||||
ENGINE=olap
|
||||
DUPLICATE KEY(datekey, site_id, city_code, user_name)
|
||||
PARTITION BY RANGE (datekey)
|
||||
(
|
||||
START ("2019-01-01") END ("2021-01-01") EVERY (INTERVAL 1 YEAR),
|
||||
START ("2021-01-01") END ("2021-05-01") EVERY (INTERVAL 1 MONTH),
|
||||
START ("2021-05-01") END ("2021-05-04") EVERY (INTERVAL 1 DAY)
|
||||
)
|
||||
DISTRIBUTED BY HASH(site_id) BUCKETS 10
|
||||
PROPERTIES(
|
||||
"replication_num" = "1"
|
||||
);
|
||||
```
|
||||
- The data type of the partitioning column is DATE and you specify the time range of partitioning via START and END and define the time range via EVERY. Example:
|
||||
|
||||
The PARTITION BY RANGE clause in this example is equal to the following:
|
||||
```SQL
|
||||
CREATE TABLE site_access (
|
||||
datekey DATE,
|
||||
site_id INT,
|
||||
city_code SMALLINT,
|
||||
user_name VARCHAR(32),
|
||||
pv BIGINT DEFAULT '0'
|
||||
)
|
||||
ENGINE=olap
|
||||
DUPLICATE KEY(datekey, site_id, city_code, user_name)
|
||||
PARTITION BY RANGE (datekey) (
|
||||
START ("2021-01-01") END ("2021-01-04") EVERY (INTERVAL 1 DAY)
|
||||
)
|
||||
DISTRIBUTED BY HASH(site_id) BUCKETS 10
|
||||
PROPERTIES ("replication_num" = "1"
|
||||
);
|
||||
```
|
||||
|
||||
```SQL
|
||||
PARTITION BY RANGE (datekey) (
|
||||
PARTITION p2019 VALUES [('2019-01-01'), ('2020-01-01')),
|
||||
PARTITION p2020 VALUES [('2020-01-01'), ('2021-01-01')),
|
||||
PARTITION p202101 VALUES [('2021-01-01'), ('2021-02-01')),
|
||||
PARTITION p202102 VALUES [('2021-02-01'), ('2021-03-01')),
|
||||
PARTITION p202103 VALUES [('2021-03-01'), ('2021-04-01')),
|
||||
PARTITION p202104 VALUES [('2021-04-01'), ('2021-05-01')),
|
||||
PARTITION p20210501 VALUES [('2021-05-01'), ('2021-05-02')),
|
||||
PARTITION p20210502 VALUES [('2021-05-02'), ('2021-05-03')),
|
||||
PARTITION p20210503 VALUES [('2021-05-03'), ('2021-05-04'))
|
||||
)
|
||||
```
|
||||
The PARTITION BY RANGE clause in this example is equal to the following:
|
||||
|
||||
- The data type of the partitioning column is INT and you specify the value range of partitioning by using START and END and define the incremental value via EVERY. Example:
|
||||
> Note: Do not double quote the incremental value defined by EVERY.
|
||||
```SQL
|
||||
PARTITION BY RANGE (datekey) (
|
||||
PARTITION p20210101 VALUES [('2021-01-01'), ('2021-01-02')),
|
||||
PARTITION p20210102 VALUES [('2021-01-02'), ('2021-01-03')),
|
||||
PARTITION p20210103 VALUES [('2021-01-03'), ('2021-01-04'))
|
||||
)
|
||||
```
|
||||
|
||||
```SQL
|
||||
CREATE TABLE site_access (
|
||||
datekey INT,
|
||||
site_id INT,
|
||||
city_code SMALLINT,
|
||||
user_name VARCHAR(32),
|
||||
pv BIGINT DEFAULT '0'
|
||||
)
|
||||
ENGINE=olap
|
||||
DUPLICATE KEY(datekey, site_id, city_code, user_name)
|
||||
PARTITION BY RANGE (datekey) (START ("1") END ("5") EVERY (1)
|
||||
)
|
||||
DISTRIBUTED BY HASH(site_id) BUCKETS 10
|
||||
PROPERTIES ("replication_num" = "1"
|
||||
);
|
||||
```
|
||||
- The data type of the partitioning column is DATE and you specify different EVERY clauses for different time ranges (which cannot overlap with each other). Example:
|
||||
|
||||
The PARTITION BY RANGE clause in this example is equal to the following:
|
||||
```SQL
|
||||
CREATE TABLE site_access(
|
||||
datekey DATE,
|
||||
site_id INT,
|
||||
city_code SMALLINT,
|
||||
user_name VARCHAR(32),
|
||||
pv BIGINT DEFAULT '0'
|
||||
)
|
||||
ENGINE=olap
|
||||
DUPLICATE KEY(datekey, site_id, city_code, user_name)
|
||||
PARTITION BY RANGE (datekey)
|
||||
(
|
||||
START ("2019-01-01") END ("2021-01-01") EVERY (INTERVAL 1 YEAR),
|
||||
START ("2021-01-01") END ("2021-05-01") EVERY (INTERVAL 1 MONTH),
|
||||
START ("2021-05-01") END ("2021-05-04") EVERY (INTERVAL 1 DAY)
|
||||
)
|
||||
DISTRIBUTED BY HASH(site_id) BUCKETS 10
|
||||
PROPERTIES(
|
||||
"replication_num" = "1"
|
||||
);
|
||||
```
|
||||
|
||||
```SQL
|
||||
PARTITION BY RANGE (datekey) (
|
||||
PARTITION p2019 VALUES [('2019-01-01'), ('2020-01-01')),
|
||||
PARTITION p2020 VALUES [('2020-01-01'), ('2021-01-01')),
|
||||
PARTITION p202101 VALUES [('2021-01-01'), ('2021-02-01')),
|
||||
PARTITION p202102 VALUES [('2021-02-01'), ('2021-03-01')),
|
||||
PARTITION p202103 VALUES [('2021-03-01'), ('2021-04-01')),
|
||||
PARTITION p202104 VALUES [('2021-04-01'), ('2021-05-01')),
|
||||
PARTITION p20210501 VALUES [('2021-05-01'), ('2021-05-02')),
|
||||
PARTITION p20210502 VALUES [('2021-05-02'), ('2021-05-03')),
|
||||
PARTITION p20210503 VALUES [('2021-05-03'), ('2021-05-04'))
|
||||
)
|
||||
```
|
||||
The PARTITION BY RANGE clause in this example is equal to the following:
|
||||
|
||||
- After a table is created, you can use the [ALTER TABLE](../sql-reference/sql-statements/data-definition/ALTER%20TABLE.md) statement to add partitions for the table.
|
||||
```SQL
|
||||
PARTITION BY RANGE (datekey) (
|
||||
PARTITION p2019 VALUES [('2019-01-01'), ('2020-01-01')),
|
||||
PARTITION p2020 VALUES [('2020-01-01'), ('2021-01-01')),
|
||||
PARTITION p202101 VALUES [('2021-01-01'), ('2021-02-01')),
|
||||
PARTITION p202102 VALUES [('2021-02-01'), ('2021-03-01')),
|
||||
PARTITION p202103 VALUES [('2021-03-01'), ('2021-04-01')),
|
||||
PARTITION p202104 VALUES [('2021-04-01'), ('2021-05-01')),
|
||||
PARTITION p20210501 VALUES [('2021-05-01'), ('2021-05-02')),
|
||||
PARTITION p20210502 VALUES [('2021-05-02'), ('2021-05-03')),
|
||||
PARTITION p20210503 VALUES [('2021-05-03'), ('2021-05-04'))
|
||||
)
|
||||
```
|
||||
|
||||
```SQL
|
||||
ALTER TABLE site_access
|
||||
ADD PARTITIONS START ("2021-01-04") END ("2021-01-06") EVERY (INTERVAL 1 DAY);
|
||||
```
|
||||
- The data type of the partitioning column is INT and you specify the value range of partitioning by using START and END and define the incremental value via EVERY. Example:
|
||||
> **NOTE**
|
||||
>
|
||||
> Do not double quote the incremental value defined by EVERY.
|
||||
|
||||
```SQL
|
||||
CREATE TABLE site_access (
|
||||
datekey INT,
|
||||
site_id INT,
|
||||
city_code SMALLINT,
|
||||
user_name VARCHAR(32),
|
||||
pv BIGINT DEFAULT '0'
|
||||
)
|
||||
ENGINE=olap
|
||||
DUPLICATE KEY(datekey, site_id, city_code, user_name)
|
||||
PARTITION BY RANGE (datekey) (START ("1") END ("5") EVERY (1)
|
||||
)
|
||||
DISTRIBUTED BY HASH(site_id) BUCKETS 10
|
||||
PROPERTIES ("replication_num" = "1"
|
||||
);
|
||||
```
|
||||
|
||||
The PARTITION BY RANGE clause in this example is equal to the following:
|
||||
|
||||
```SQL
|
||||
PARTITION BY RANGE (datekey) (
|
||||
PARTITION p2019 VALUES [('2019-01-01'), ('2020-01-01')),
|
||||
PARTITION p2020 VALUES [('2020-01-01'), ('2021-01-01')),
|
||||
PARTITION p202101 VALUES [('2021-01-01'), ('2021-02-01')),
|
||||
PARTITION p202102 VALUES [('2021-02-01'), ('2021-03-01')),
|
||||
PARTITION p202103 VALUES [('2021-03-01'), ('2021-04-01')),
|
||||
PARTITION p202104 VALUES [('2021-04-01'), ('2021-05-01')),
|
||||
PARTITION p20210501 VALUES [('2021-05-01'), ('2021-05-02')),
|
||||
PARTITION p20210502 VALUES [('2021-05-02'), ('2021-05-03')),
|
||||
PARTITION p20210503 VALUES [('2021-05-03'), ('2021-05-04'))
|
||||
)
|
||||
```
|
||||
|
||||
- After a table is created, you can use the [ALTER TABLE](../sql-reference/sql-statements/data-definition/ALTER%20TABLE.md) statement to add partitions for the table.
|
||||
|
||||
```SQL
|
||||
ALTER TABLE site_access
|
||||
ADD PARTITIONS START ("2021-01-04") END ("2021-01-06") EVERY (INTERVAL 1 DAY);
|
||||
```
|
||||
|
||||
### Add a partition
|
||||
|
||||
|
|
|
|||
|
|
@ -0,0 +1,162 @@
|
|||
# Automatic partitioning
|
||||
|
||||
This topic describes how to create a table that supports automatic partitioning. This topic also describes the usage notes and limits of automatic partitioning.
|
||||
|
||||
## Introduction
|
||||
|
||||
To make partitions creation more easy to use and flexible, StarRocks supports the partitioning expression and automatic partitioning since version 3.0. You only need to specify a partition column of the DATE or DATETIME data type and a partition granularity (year, month, day, or hour) in the partition expression, which includes a time function. With this implicit partitioning method implemented by using a expression, you do not need to create a large number of partitions in advance. StarRocks automatically creates partitions when new data is written. It is recommended that you prioritize using automatic partitioning.
|
||||
|
||||
## Enable automatic partitioning
|
||||
|
||||
### Syntax
|
||||
|
||||
The `PARTITION BY` clause contains a function expression that specifies the partition granularity and partition column for automatic partitioning.
|
||||
|
||||
```SQL
|
||||
PARTITION BY date_trunc(<time_unit>,<partition_column_name>)
|
||||
...
|
||||
[PROPERTIES("partition_live_number" = "xxx")];
|
||||
```
|
||||
|
||||
Or
|
||||
|
||||
```SQL
|
||||
PARTITION BY time_slice(<partition_column_name>,INTERVAL N <time_unit>[, boundary]))
|
||||
...
|
||||
[PROPERTIES("partition_live_number" = "xxx")];
|
||||
```
|
||||
|
||||
### Parameters
|
||||
|
||||
- Functions: Currently, only the [date_trunc](../sql-reference/sql-functions/date-time-functions/date_trunc.md) and [time_slice](../sql-reference/sql-functions/date-time-functions/time_slice.md) functions are supported. If you use the function `time_slice`, you do not need to pass the `boundary` parameter. It is because in this scenario, the default and valid value for this parameter is `floor`, and the value can not be `ceil`.
|
||||
- `time_unit`: the partition granularity, which can be `hour`, `day`, `month` or `year`. The `week` partition granularity is not supported. If the partition granularity is `hour`, the partition column must be of the DATETIME data type and cannot be of the DATE data type.
|
||||
- `partition_column_name`: the name of the partition column. The partition type is RANGE, and therefore the partition column can only be of the DATE or DATETIME data type. Currently, you can specify only one partition column and multiple partition columns are not supported. If the `date_trunc` function is used, the partition column can be of the DATE or DATETIME data type. If the `time_slice` function is used, the partition column must be of the DATETIME data type. The partition column allows `NULL` values. If the partition column is of the DATE data type, the supported range is [0000-01-01~9999-12-31]. If the partition column is of the DATETIME data type, the supported range is [0000-01-01 01:01:019999-12-31 23:59:59].
|
||||
- `partition_live_number`: the number of the most recent partitions to be retained. "Recent" refers to sorting the partitions in chronological order, and then keeping the number of partitions that counted backwards, while deleting the rest of the partitions. StarRocks schedules tasks to manage the number of partitions, and the scheduling interval can be configured through the FE dynamic parameter `dynamic_partition_check_interval_seconds`, which defaults to 600 seconds (10 minutes).
|
||||
|
||||
## Examples
|
||||
|
||||
Example 1: Use the `date_trunc` function to create a table that supports automatic partitioning. Set the partition granularity to `day` and the partition column to `event_day`.
|
||||
|
||||
```SQL
|
||||
CREATE TABLE site_access (
|
||||
event_day DATETIME NOT NULL,
|
||||
site_id INT DEFAULT '10',
|
||||
city_code VARCHAR(100),
|
||||
user_name VARCHAR(32) DEFAULT '',
|
||||
pv BIGINT DEFAULT '0'
|
||||
)
|
||||
DUPLICATE KEY(event_day, site_id, city_code, user_name)
|
||||
PARTITION BY date_trunc('day', event_day)
|
||||
DISTRIBUTED BY HASH(event_day, site_id)
|
||||
PROPERTIES(
|
||||
"replication_num" = "1"
|
||||
);
|
||||
```
|
||||
|
||||
When the following two data rows are inserted, StarRocks automatically creates two partitions, `p20230226` and `p20230227`, whose ranges are [2023-02-26 00:00:00, 2023-02-27 00:00:00) and [2023-02-27 00:00:00, 2023-02-28 00:00:00) respectively.
|
||||
|
||||
```SQL
|
||||
-- insert two data rows
|
||||
INSERT INTO site_access VALUES
|
||||
("2023-02-26 20:12:04",002,"New York","Sam Smith",1),
|
||||
("2023-02-27 21:06:54",001,"Los Angeles","Taylor Swift",1);
|
||||
|
||||
-- view partitions
|
||||
SHOW PARTITIONS FROM site_access\G
|
||||
*************************** 1. row ***************************
|
||||
PartitionId: 135846228
|
||||
PartitionName: p20230226
|
||||
VisibleVersion: 2
|
||||
VisibleVersionTime: 2023-03-22 14:50:17
|
||||
VisibleVersionHash: 0
|
||||
State: NORMAL
|
||||
PartitionKey: event_day
|
||||
Range: [types: [DATETIME]; keys: [2023-02-26 00:00:00]; ..types: [DATETIME]; keys: [2023-02-27 00:00:00]; )
|
||||
DistributionKey: event_day, site_id
|
||||
Buckets: 6
|
||||
ReplicationNum: 1
|
||||
StorageMedium: HDD
|
||||
CooldownTime: 9999-12-31 23:59:59
|
||||
LastConsistencyCheckTime: NULL
|
||||
DataSize: 0B
|
||||
IsInMemory: false
|
||||
RowCount: 0
|
||||
*************************** 2. row ***************************
|
||||
PartitionId: 135846215
|
||||
PartitionName: p20230227
|
||||
VisibleVersion: 2
|
||||
VisibleVersionTime: 2023-03-22 14:50:17
|
||||
VisibleVersionHash: 0
|
||||
State: NORMAL
|
||||
PartitionKey: event_day
|
||||
Range: [types: [DATETIME]; keys: [2023-02-27 00:00:00]; ..types: [DATETIME]; keys: [2023-02-28 00:00:00]; )
|
||||
DistributionKey: event_day, site_id
|
||||
Buckets: 6
|
||||
ReplicationNum: 1
|
||||
StorageMedium: HDD
|
||||
CooldownTime: 9999-12-31 23:59:59
|
||||
LastConsistencyCheckTime: NULL
|
||||
DataSize: 0B
|
||||
IsInMemory: false
|
||||
RowCount: 0
|
||||
2 rows in set (0.00 sec)
|
||||
```
|
||||
|
||||
Example 2: Use the `date_trunc` function to create a table that supports automatic partitioning. Set the partition granularity to `month` and the partition column to `event_day`. Additionally, create some historical partitions before loading data and specify that the table only retains the most recent three partitions.
|
||||
|
||||
```SQL
|
||||
CREATE TABLE site_access(
|
||||
event_day DATETIME NOT NULL,
|
||||
site_id INT DEFAULT '10',
|
||||
city_code VARCHAR(100),
|
||||
user_name VARCHAR(32) DEFAULT '',
|
||||
pv BIGINT DEFAULT '0'
|
||||
)
|
||||
DUPLICATE KEY(event_day, site_id, city_code, user_name)
|
||||
PARTITION BY date_trunc('month', event_day)(
|
||||
START ("2022-06-01") END ("2022-12-01") EVERY (INTERVAL 1 month)
|
||||
)
|
||||
DISTRIBUTED BY HASH(event_day, site_id) BUCKETS 32
|
||||
PROPERTIES(
|
||||
"partition_live_number" = "3",
|
||||
"replication_num" = "1"
|
||||
);
|
||||
```
|
||||
|
||||
Example 3: Use the `time_slice` function to create a table that supports automatic partitioning. Set the partition granularity to 7 days and the partition column to `event_day`.
|
||||
|
||||
```SQL
|
||||
CREATE TABLE site_access(
|
||||
event_day DATETIME NOT NULL,
|
||||
site_id INT DEFAULT '10',
|
||||
city_code VARCHAR(100),
|
||||
user_name VARCHAR(32) DEFAULT '',
|
||||
pv BIGINT DEFAULT '0'
|
||||
)
|
||||
DUPLICATE KEY(event_day, site_id, city_code, user_name)
|
||||
PARTITION BY time_slice(event_day, INTERVAL 7 day)
|
||||
DISTRIBUTED BY HASH(event_day, site_id) BUCKETS 32
|
||||
PROPERTIES("replication_num" = "1");
|
||||
```
|
||||
|
||||
## Usage notes
|
||||
|
||||
- StarRocks automatically creates partitions and sets the start time and end time of the partitions based on the loaded data and the automatic partitioning rule configured at table creation. For example, if the value of the partition column for the data row is `2015-06-05`, and the partition granularity is `month`, then a partition named `p201506` is created with a range of [2015-06-01, 2015-07-01) rather than [2015-06-05, 2015-07-05).
|
||||
- For tables that support automatic partitioning, StarRocks sets the default maximum number of automatically created partitions to 4096, which can be configured by the FE parameter `max_automatic_partition_number`. This parameter can prevent you from accidentally creating too many partitions, such as when specifying a partition column of DATETIME type with a too-fine partition granularity like `hour`, which can generate a large number of partitions.
|
||||
- During data loading, StarRocks automatically creates some partitions based on the loaded data, but if the load job fails for some reason, the partitions that are automatically created by StarRocks cannot be automatically deleted.
|
||||
- Note that the `PARTITION BY` clause is only used to calculate the partition range for the loaded data and does not change the values of the data. For example, if the original data is `2023-02-27 21:06:54`, the function expression in `PARTITION BY date_trunc('day', event_day)` computes it as 2023-02-27 00:00:00 and infers that it belongs to the partition range [2023-02-27 00:00:00, 2023-02-28 00:00:00), but the data is still written as `2023-02-27 21:06:54`. If you want the written data's value to be the same as the start time of the partition range, you need to use the function specified in the `PARTITION BY` clause, such as `date_trunc`, on the `event_day` column when creating a load job.
|
||||
- The naming rules for automatic partitioning are consistent with the naming rules for dynamic partitioning.
|
||||
|
||||
## Limits
|
||||
|
||||
- Only the range partitioning type is supported, whereas the list partitioning type is not supported.
|
||||
- When a table that supports automatic partitioning is created, it is generally not recommended to create partitions in advance. If you need to create partitions in advance, you can create multiple partitions all at a time, as shown in the preceding Example 2. The statement in Example 2 has the following limits:
|
||||
- The granularity of the partitions created in advance must be consistent with that of the automatically created partitions.
|
||||
- When you configure automatic partitioning, you can only use the function `date_trunc` rather than `time_slice`.
|
||||
- The syntax for creating multiple partitions all at a time only supports an interval of `1`.
|
||||
- After a table that supports automatic partitioning is created, you can use `ALTER TABLE ADD PARTITION` to add partitions for that table. And the statement `ALTER TABLE ADD PARTITION` also has the above limits.
|
||||
- Currently, StarRocks's shared-data mode does not support this feature.
|
||||
- Currently, using CTAS to create a table that supports automatic partitioning is not supported.
|
||||
- Currently, using Spark Load to load data to tables that support automatic partitioning is not supported.
|
||||
- If you use automatic partitioning, you can only roll back your StarRocks cluster to version 2.5.4 or later.
|
||||
- To view the specific information of automatically created partitions, use the SHOW PARTITIONS FROM statement, rather than the SHOW CREATE TABLE statement.
|
||||
Loading…
Reference in New Issue