[fix chinese char in doc] (#22159)

Signed-off-by: evelynzhaojie <everlyn.zhaojie@gmail.com>
evelyn.zhaojie 2023-04-21 19:06:35 +08:00 committed by GitHub
parent 4fff6e89c0
commit 6d7cc91ff5
77 changed files with 254 additions and 275 deletions


@ -68,7 +68,7 @@ Learn more 👉🏻 [Introduction to StarRocks](https://www.starrocks.io/blog/in
</a>
</p>
StarRockss streamlined architecture is mainly composed of two modulesFrontend (FE) and Backend (BE). The entire system eliminates single points of failure through seamless and horizontal scaling of FE and BE, as well as replication of metadata and data.
StarRockss streamlined architecture is mainly composed of two modules: Frontend (FE) and Backend (BE). The entire system eliminates single points of failure through seamless and horizontal scaling of FE and BE, as well as replication of metadata and data.
<br>


@ -1,63 +1,65 @@

| Role | Responsibilities | Requirements |
| -----| ---------------- | ------------ |
| Contributor | Active participant in the community | Finish at least one contribution to the project in specific repos |
| Active Contributor | Active contributor in the community | Have 5 merged PRs or fixed major bugs |
| Committer | Contributions acceptance approval | Have proven track record of contributions and lead major development. Nominees must be approved by a majority vote of the TSC voting members. |
| Maintainer | A maintainer makes and approves technical design decisions. Define milestones and releases. Elect new community members. | Highly experienced committer. Nominees must be approved by a majority vote of the TSC voting members. |
**Note:** This document is a work in progress.
This doc outlines the various responsibilities of contributor roles in StarRocks.
## Contributor
Everyone who contributes can become a StarRocks contributor. The members will provide mentorship and guidance when new contributors need assistance.
### How to become a Contributor?
- 1 merged PR (pull request) in StarRocks' repositories
### As a Contributor, we expect you to:
- Actively participate in the development of the StarRocks project
- Attend community events such as meetups and hackathons
- Continuously learn about StarRocks-related technologies and share your knowledge with others
### Privileges
- Be recognized as a StarRocks Contributor
## Active Contributor
Active contributors are contributors who have made outstanding contributions and sustained commitment to StarRocks. They actively participate in the community by contributing code, improving docs, and helping others.
### How to become an Active Contributor?
- Have at least 5 merged PRs or have fixed major bugs
- Participate in at least 5 code reviews
- Actively engage with the community by attending online or offline meetups and participating in community discussions
### Responsibilities and privileges
- Join the community meetings and discussions
- Mentor and guide new contributors
- Be recognized as a StarRocks Active Contributor
## Committer
Committers are Contributors who have earned the ability to modify ("commit") source code, documentation, or other technical artifacts in a project's repository.
### How to become a Committer?
- Have a deep understanding of StarRocks' principles and future plans
- Have the ability to deal with various issues that arise in the project promptly
- Lead at least one major development, and write and revise related documents
- A contributor may become a Committer by a majority approval of the voting members of the TSC as defined in the Technical Charter.
### Responsibilities and privileges
- Mentor and guide other members in the community
- Ensure the continued health of the subproject
- Be granted write access to StarRocks repos (to be specified)
- Be recognized as a StarRocks Committer
## Maintainer
@ -65,18 +67,16 @@ Maintainers are a subset of Committers with additional responsibilities for driv
### How to become a PMC?
- In-depth understanding of StarRocks principles and a clear understanding of StarRocks' future plans
- Have the ability to deal with project issues promptly
- Lead project development and iterations, and steer the overall direction of the project
- A Committer may become a Maintainer by a majority approval of the existing Maintainers as defined in the Technical Charter.
### Responsibilities and privileges
- Mentor and guide other members in the community
- Ensure the continued health of the project, such as code quality and test coverage
- Make and approve technical design decisions
- Define milestones and releases
- Vote and promote new TSC members (Committers and Maintainers)
- Be recognized as a StarRocks Maintainer


@ -44,7 +44,7 @@ ADD SQLBLACKLIST "select count(\\*) from .+"
ADD SQLBLACKLIST "select count(distinct .+) from .+"
~~~
* Prohibit order by limit x, y1 <= x <=7, 5 <=y <=7:
* Prohibit order by limit x, y, 1 <= x <=7, 5 <=y <=7:
~~~sql
ADD SQLBLACKLIST "select id_int from test_all_type_select1 order by id_int limit [1-7], [5-7]"
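-- A hedged follow-up sketch (not part of the original example): list the current
-- blacklist rules with their IDs, then delete a rule by ID. The ID 1 is illustrative.
SHOW SQLBLACKLIST;
DELETE SQLBLACKLIST 1;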


@ -8,7 +8,7 @@
* TabletChecker (TC): A resident background thread that periodically scans all Tablets and decides whether to send tablets to TabletScheduler based on their status.
* TabletScheduler (TS): A resident background thread that processes tablets sent by TabletChecker, and also performs cluster replica balancing.
* TabletSchedCtx (TSC): A wrapper for a tablet. When a tablet is selected by TC, it is encapsulated as a TSC and sent to TS.
* Storage MediumStarRocks supports different storage media for partition granularity, including SSD and HDD. The scheduling of replicas varies for different storage media.
* Storage Medium: StarRocks supports different storage media for partition granularity, including SSD and HDD. The scheduling of replicas varies for different storage media.
## Status


@ -25,7 +25,7 @@ CREATE TABLE starrocks_audit_db__.starrocks_audit_tbl__ (
`user` VARCHAR(64) COMMENT "User who initiates the query",
`resourceGroup` VARCHAR(64) COMMENT "Resource group name",
`db` VARCHAR(96) COMMENT "Database that the query scans",
`state` VARCHAR(8) COMMENT "Query state (EOFERROK)",
`state` VARCHAR(8) COMMENT "Query state (EOF, ERR, OK)",
`errorCode` VARCHAR(96) COMMENT "Error code",
`queryTime` BIGINT COMMENT "Query latency in milliseconds",
`scanBytes` BIGINT COMMENT "Size of the scanned data in bytes",
@ -67,7 +67,7 @@ CREATE TABLE starrocks_audit_db__.starrocks_audit_tbl__ (
`user` VARCHAR(64) COMMENT "User who initiates the query",
`resourceGroup` VARCHAR(64) COMMENT "Resource group name",
`db` VARCHAR(96) COMMENT "Database that the query scans",
`state` VARCHAR(8) COMMENT "Query state (EOFERROK)",
`state` VARCHAR(8) COMMENT "Query state (EOF, ERR, OK)",
`errorCode` VARCHAR(96) COMMENT "Error code",
`queryTime` BIGINT COMMENT "Query latency in milliseconds",
`scanBytes` BIGINT COMMENT "Size of the scanned data in bytes",
@ -109,7 +109,7 @@ CREATE TABLE starrocks_audit_db__.starrocks_audit_tbl__
client_ip VARCHAR(32) COMMENT "Client IP address",
user VARCHAR(64) COMMENT "User who initiates the query",
db VARCHAR(96) COMMENT "Database that the query scans",
state VARCHAR(8) COMMENT "Query state (EOFERROK)",
state VARCHAR(8) COMMENT "Query state (EOF, ERR, OK)",
query_time BIGINT COMMENT "Query latency in milliseconds",
scan_bytes BIGINT COMMENT "Size of the scanned data in bytes",
scan_rows BIGINT COMMENT "Row count of the scanned data",
@ -147,7 +147,7 @@ CREATE TABLE starrocks_audit_db__.starrocks_audit_tbl__
client_ip VARCHAR(32) COMMENT "Client IP address",
user VARCHAR(64) COMMENT "User who initiates the query",
db VARCHAR(96) COMMENT "Database that the query scans",
state VARCHAR(8) COMMENT "Query state (EOFERROK)",
state VARCHAR(8) COMMENT "Query state (EOF, ERR, OK)",
query_time BIGINT COMMENT "Query latency in milliseconds",
scan_bytes BIGINT COMMENT "Size of the scanned data in bytes",
scan_rows BIGINT COMMENT "Row count of the scanned data",
@ -183,7 +183,7 @@ CREATE TABLE starrocks_audit_db__.starrocks_audit_tbl__
client_ip VARCHAR(32) COMMENT "Client IP address",
user VARCHAR(64) COMMENT "User who initiates the query",
db VARCHAR(96) COMMENT "Database that the query scans",
state VARCHAR(8) COMMENT "Query state (EOFERROK)",
state VARCHAR(8) COMMENT "Query state (EOF, ERR, OK)",
query_time BIGINT COMMENT "Query latency in milliseconds",
scan_bytes BIGINT COMMENT "Size of the scanned data in bytes",
scan_rows BIGINT COMMENT "Row count of the scanned data",


@ -61,9 +61,9 @@ This section describes privileges that are available on different objects.
| Object | Privilege | Description |
| --------------------------- | ----------------------------------------------------- | ---------------------------------------------- |
| CATALOGinternal catalog) | USAGE | Uses the internal catalog (default_catalog). |
| CATALOGinternal catalog) | CREATE DATABASE | Creates databases in the internal catalog. |
| CATALOGinternal catalog) | ALL | Has all the above privileges on the internal catalog. |
| CATALOG (internal catalog) | USAGE | Uses the internal catalog (default_catalog). |
| CATALOG (internal catalog) | CREATE DATABASE | Creates databases in the internal catalog. |
| CATALOG (internal catalog) | ALL | Has all the above privileges on the internal catalog. |
| CATALOG (external catalog) | USAGE | Uses an external catalog to view tables in it. |
| CATALOG (external catalog) | DROP | Deletes an external catalog. |
| CATALOG (external catalog) | ALL | Has all the above privileges on the external catalog. |


@ -103,7 +103,7 @@ A classifier matches a query only when one or all conditions of the classifier m
>
> If a query does not hit any classifiers, the default resource group `default_wg` is used. The resource limits of `default_wg` are as follows:
>
> - `cpu_core_limit`: 1 (<= v2.3.7) or the number of CPU cores in BE> v2.3.7)
> - `cpu_core_limit`: 1 (<= v2.3.7) or the number of CPU cores in BE (> v2.3.7)
> - `mem_limit`: 100%
> - `concurrency_limit`: 0
> - `big_query_cpu_second_limit`: 0
@ -139,7 +139,7 @@ classifier B (user='Alice', source_ip = '192.168.1.0/24')
-- Classifier C has fewer query types specified in it than Classifier D. Therefore, Classifier C has a higher degree of matching than Classifier D.
classifier C (user='Alice', query_type in ('select'))
classifier D (user='Alice', query_type in ('insert','select')
classifier D (user='Alice', query_type in ('insert','select'))
```
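Putting the classifier and limit pieces together, a hedged sketch of creating such a resource group (the group name, limit values, and classifier values are illustrative):
```SQL
CREATE RESOURCE GROUP rg_example
TO (user='Alice', query_type in ('select'), source_ip='192.168.1.0/24')
WITH (
    "cpu_core_limit" = "4",
    "mem_limit" = "20%",
    "concurrency_limit" = "10",
    "type" = "normal"
);
```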
## Isolate computing resources
@ -224,7 +224,7 @@ SHOW RESOURCE GROUPS;
Execute the following statement to query a specified resource group and its classifiers:
```SQL
SHOW RESOURCE GROUP group_name
SHOW RESOURCE GROUP group_name;
```
Example:


@ -6,7 +6,7 @@ StarRocks supports access to other data sources by using external tables. Extern
>
> Since v3.0, we recommend that you use catalogs to query Hive, Iceberg, and Hudi data. See [Hive catalog](../data_source/catalog/hive_catalog.md), [Iceberg catalog](../data_source/catalog/iceberg_catalog.md), and [Hudi catalog](../data_source/catalog/hudi_catalog.md).
From 2.5 onwards, StarRocks provides the Local Cache feature, which accelerates hot data queriers on external data sources. For more information, see [Local Cache](Block_cache.md)
From 2.5 onwards, StarRocks provides the Local Cache feature, which accelerates hot data queries on external data sources. For more information, see [Local Cache](Block_cache.md).
## MySQL external table
@ -94,7 +94,7 @@ insert into external_t values ('2020-10-11', 1, 1, 'hello', '2020-10-11 10:00:00
insert into external_t select * from other_table;
~~~
Parameters
Parameters:
* **EXTERNAL:** This keyword indicates that the table to be created is an external table.
* **host:** This parameter specifies the IP address of the leader FE node of the destination StarRocks cluster.
@ -472,7 +472,7 @@ The required parameters in `properties` are as follows:
* `resource`: the name of the JDBC resource used to create the external table.
* `table`the target table name in the database.
* `table`: the target table name in the database.
For supported data types and data type mapping between StarRocks and target databases, see [Data type mapping](External_table.md#Data type mapping).
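For reference, a hedged sketch of a JDBC external table that uses these two properties (the resource name, table names, and columns are illustrative):
~~~sql
CREATE EXTERNAL TABLE jdbc_external_tbl (
    id INT,
    name VARCHAR(64)
)
ENGINE = jdbc
PROPERTIES (
    "resource" = "jdbc_resource_example",
    "table" = "source_table"
);
~~~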
@ -696,13 +696,13 @@ select count(*) from profile_wos_p7;
### Configuration
* The path of the FE configuration file is `fe/conf`, to which the configuration file can be added if you need to customize the Hadoop cluster. For example: HDFS cluster uses a highly available nameservice, you need to put `hdfs-site.xml` under `fe/conf`. If HDFS is configured with viewfs, you need to put the `core-site.xml` under `fe/conf`.
* The path of the BE configuration file is `be/conf`, to which configuration file can be added if you need to customize the Hadoop cluster. For example, HDFS cluster using a highly available nameservice, you need to put `hdfs-site.xml` under `be/conf`. If HDFS is configured with viewfs, you need to put `core-site.xml` under `be/conf`.
* The machine where the BE is located needs to configure JAVA_HOME as a JDK environment rather than a JRE environment.
* Kerberos support:
1. To log in with `kinit -kt keytab_path principal` to all FE/BE machines, you need to have access to Hive and HDFS. The kinit command login is only good for a period of time and needs to be put into crontab to be executed regularly.
2. Put `hive-site.xml/core-site.xml/hdfs-site.xml` under `fe/conf`, and put `core-site.xml/hdfs-site.xml` under `be/conf`.
3. Add **Djava.security.krb5.conf:/etc/krb5.conf** to the **JAVA_OPTS/JAVA_OPTS_FOR_JDK_9** option of the **fe/conf/fe.conf** file. **/etc/krb5.conf** is the path of the **krb5.conf** file. You can adjust the path based on your operating system.
4. When you add a Hive resource, you must pass in a domain name to `hive.metastore.uris`. In addition, you must add the mapping between Hive/HDFS domain names and IP addresses in the **/etc/hosts** file*.*
* Configure support for AWS S3: Add the following configuration to `fe/conf/core-site.xml` and `be/conf/core-site.xml`.


@ -22,7 +22,7 @@ This change does not affect your download behavior or use of StarRocks. You can
> **NOTE**
>
> Replace `<sr_ver>` with the version number of the StarRocks installation package you want to download
> Replace `<sr_ver>` with the version number of the StarRocks installation package you want to download.
2. Load the debuginfo file when you perform GDB debugging.


@ -72,7 +72,7 @@ export PYTHON=/usr/bin/python3
`git clone https://github.com/StarRocks/starrocks.git`
**Install required tools for compilation**
**Install required tools for compilation**:
```bash
sudo apt update
@ -103,7 +103,7 @@ The first time compile needs to compile thirdparty, it will require some time.
### FE
FE development is simple because you can compile it in MacOS directly. Just enter `fe` folder and run the command `mvn install -DskipTests`
FE development is simple because you can compile it in MacOS directly. Just enter `fe` folder and run the command `mvn install -DskipTests`.
Then you can open the `fe` folder directly in IDEA, and everything will work.
@ -143,7 +143,7 @@ STARROCKS_GCC_HOME=/usr/
STARROCKS_THIRDPARTY=/root/starrocks/thirdparty
```
Notice: Be careful not to check `Include system environment variables`
Notice: Be careful not to check `Include system environment variables`.
<img src="../../assets/ide-4.png" alt="ide-4" style="zoom:50%;" />
@ -189,7 +189,7 @@ Then just run `./bin/start_be.sh` without any flag.
Of course, you can also use LLVM tools to develop BE.
Ubuntu LLVM installtion refer tohttps://apt.llvm.org/
Ubuntu LLVM installation: refer to https://apt.llvm.org/
Then use the command `CC=clang-15 CXX=clang++-15 ./build.sh` to compile BE, provided that your thirdparty libraries have been compiled with gcc.


@ -50,8 +50,8 @@ PROPERTIES ("key"="value", ...);
```plain text
PROPERTIES currently supports the following properties:
"type" = "full"indicates that this is a full update (default).
"timeout" = "3600"task timeout. The default is one day. The unit is seconds.
"type" = "full": indicates that this is a full update (default).
"timeout" = "3600": task timeout. The default is one day. The unit is seconds.
```
StarRocks does not support full database backup at present. You need to specify the tables or partitions to be backed up in `ON (...)`, and these tables or partitions will be backed up in parallel.
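For example, a hedged sketch of a backup job that uses these properties (the repository, snapshot label, table, and partition names are illustrative):
```sql
BACKUP SNAPSHOT example_db.snapshot_label_example
TO example_repo
ON (example_table PARTITION (p1, p2))
PROPERTIES ("type" = "full", "timeout" = "3600");
```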


@ -10,7 +10,7 @@ Both are variable-length data types. When you store data of the same length, VAR
Yes.
## Why do TXT files imported from Oracle still appear garbled after I set the character set to UTF-8
## Why do TXT files imported from Oracle still appear garbled after I set the character set to UTF-8?
To solve this problem, perform the following steps:
@ -18,7 +18,7 @@ To solve this problem, perform the following steps:
```Plain%20Text
file --mime-encoding origin.txt
origin.txtiso-8859-1
origin.txt: iso-8859-1
```
2. Run the `iconv` command to convert the character set of this file into UTF-8.
@ -59,7 +59,7 @@ ALTER DATABASE example_db SET DATA QUOTA 10T;
StarRocks 2.2 and later support updating specific fields in a table by using the primary key model. StarRocks 1.9 and later support updating all fields in a table by using the primary key model. For more information, see [Primary key model](../table_design/Data_model.md#primary-key-model) in StarRocks 2.2.
## How to swap the data between two tables or two partitions
## How to swap the data between two tables or two partitions?
Execute the SWAP WITH statement to swap the data between two tables or two partitions. The SWAP WITH statement is more secure than the INSERT OVERWRITE statement. Before you swap the data, check the data first and then see whether the data after the swapping is consistent with the data before the swapping.
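A minimal sketch of such a swap (the table names are illustrative):
```SQL
ALTER TABLE table1 SWAP WITH table2;
```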
@ -210,7 +210,7 @@ The speed to read disks for the first query relates to the performance of disks.
StarRocks supports single-node deployment, so you need to configure at least one BE. BEs must run on machines that support AVX2, so we recommend that you deploy BEs on machines with at least 8 cores and 16 GB of memory.
## How to set data permissions when I use Apache Superset to visualize the data in StarRocks
## How to set data permissions when I use Apache Superset to visualize the data in StarRocks?
You can create a new user account and then set the data permission by granting permissions on the table query to the user.
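For instance, a hedged sketch along those lines (the user name, password, and table names are illustrative, and the exact privilege syntax varies slightly across StarRocks versions):
```SQL
CREATE USER 'superset_user'@'%' IDENTIFIED BY 'password_example';
GRANT SELECT ON TABLE example_db.example_table TO USER 'superset_user'@'%';
```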


@ -14,7 +14,7 @@ StarRocks does not directly cache final query results. From v2.5 onwards, StarRo
In standard SQL, every calculation that includes an operand with a `NULL` value returns a `NULL`.
## Why is the query result incorrect after I enclose quotation marks around a value of the BIGINT data type for an equivalence query
## Why is the query result incorrect after I enclose quotation marks around a value of the BIGINT data type for an equivalence query?
### Problem description
@ -114,7 +114,7 @@ StarRocks is a distributed database, of which data stored in the underlying tabl
## Why is there a large gap in column efficiency between SELECT * and SELECT?
To solve this problem, check the profile and see MERGE details
To solve this problem, check the profile and see MERGE details:
- Check whether the aggregation on the storage layer takes up too much time.


@ -69,15 +69,15 @@ This package needs to be obtained through Aliyun mirror address.
**Solution:**
Please make sure that the mirror part of /etc/maven/settings.xml is all configured to be obtained through Aliyun mirror address.
Please make sure that the mirror part of `/etc/maven/settings.xml` is all configured to be obtained through Aliyun mirror address.
If it is, please change it to the following:
If it is, change it to the following:
<mirror>
<id>aliyunmaven</id>
<mirrorOf>central</mirrorOf>
<name>阿里云公共仓库</name>
<url>https//maven.aliyun.com/repository/public</url>
<name>aliyun public repo</name>
<url>https://maven.aliyun.com/repository/public</url>
</mirror>
## The meaning of parameter sink.buffer-flush.interval-ms in Flink-connector-StarRocks


@ -270,7 +270,7 @@ LOAD LABEL test_db.label_7
WITH BROKER
(
StorageCredentialParams
)
);
```
To load all data files from the `input` folder into `table1`, execute the following statement:
@ -286,7 +286,7 @@ LOAD LABEL test_db.label_8
WITH BROKER
(
StorageCredentialParams
)
);
```
### View a load job


@ -192,7 +192,7 @@ tar zxvf canal.deployer-$version.tar.gz -C /tmp/canal
~~~bash
## mysql serverId
canal.instance.mysql.slaveId = 1234
#position infoneed to change to your own database information
#position info, need to change to your own database information
canal.instance.master.address = 127.0.0.1:3306
canal.instance.master.journal.name =
canal.instance.master.position =
@ -201,7 +201,7 @@ canal.instance.master.timestamp =
#canal.instance.standby.journal.name =
#canal.instance.standby.position =
#canal.instance.standby.timestamp =
#username/passwordneed to change to your own database information
#username/password, need to change to your own database information
canal.instance.dbUsername = canal
canal.instance.dbPassword = canal
canal.instance.defaultDatabaseName =


@ -108,7 +108,7 @@ REVOKE USAGE_PRIV ON RESOURCE resource_name FROM user_identityREVOKE USAGE_PRIV
* Create resource
**For example**
**For example**:
~~~sql
-- yarn cluster mode


@ -76,9 +76,9 @@ The parameters related to `data_source_properties` and their descriptions are as
| **Parameter** | **Required** | **Description** |
| ------------------------------------------- | ------------ | ------------------------------------------------------------ |
| pulsar_service_url | Yes | The URL that is used to connect to the Pulsar cluster. Format: `"pulsar://ip:port"` or `"pulsar://service:port"`.Example: `"pulsar_service_url" = "pulsar://``localhost:6650``"` |
| pulsar_topic | Yes | Subscribed topic.Example"pulsar_topic" = "persistent://tenant/namespace/topic-name" |
| pulsar_topic | Yes | Subscribed topic. Example: "pulsar_topic" = "persistent://tenant/namespace/topic-name" |
| pulsar_subscription | Yes | Subscription configured for the topic.Example: `"pulsar_subscription" = "my_subscription"` |
| pulsar_partitionspulsar_initial_positions | No | `pulsar_partitions` : Subscribed partitions in the topic.`pulsar_initial_positions`: initial positions of partitions specified by `pulsar_partitions`. The initial positions must correspond to the partitions in `pulsar_partitions`. Valid values:`POSITION_EARLIEST` (Default value): Subscription starts from the earliest available message in the partition. `POSITION_LATEST`Subscription starts from the latest available message in the partition.Note:If `pulsar_partitions` is not specified, the topic's all partitions are subscribed.If both `pulsar_partitions` and `property.pulsar_default_initial_position` are specified, the `pulsar_partitions` value overrides `property.pulsar_default_initial_position` value.If neither `pulsar_partitions` nor `property.pulsar_default_initial_position` is specified, subscription starts from the latest available message in the partition.Example:`"pulsar_partitions" = "my-partition-0,my-partition-1,my-partition-2,my-partition-3", "pulsar_initial_positions" = "POSITION_EARLIEST,POSITION_EARLIEST,POSITION_LATEST,POSITION_LATEST"` |
| pulsar_partitions, pulsar_initial_positions | No | `pulsar_partitions` : Subscribed partitions in the topic.`pulsar_initial_positions`: initial positions of partitions specified by `pulsar_partitions`. The initial positions must correspond to the partitions in `pulsar_partitions`. Valid values:`POSITION_EARLIEST` (Default value): Subscription starts from the earliest available message in the partition. `POSITION_LATEST`: Subscription starts from the latest available message in the partition.Note:If `pulsar_partitions` is not specified, the topic's all partitions are subscribed.If both `pulsar_partitions` and `property.pulsar_default_initial_position` are specified, the `pulsar_partitions` value overrides `property.pulsar_default_initial_position` value.If neither `pulsar_partitions` nor `property.pulsar_default_initial_position` is specified, subscription starts from the latest available message in the partition.Example:`"pulsar_partitions" = "my-partition-0,my-partition-1,my-partition-2,my-partition-3", "pulsar_initial_positions" = "POSITION_EARLIEST,POSITION_EARLIEST,POSITION_LATEST,POSITION_LATEST"` |
Routine Load supports the following custom parameters for Pulsar.


@ -17,7 +17,7 @@
#### Range partitioning
* Reasonable range partitioning can reduce the amount of data for scanning. Taking a data management perspective, we normally choose “time” or “region” as range partition keys.
* Reasonable range partitioning can reduce the amount of data for scanning. Taking a data management perspective, we normally choose "time" or "region" as range partition keys.
* With dynamic partitioning, you can create partitions automatically at regular intervals (on a daily basis).
#### Hash partitioning


@ -18,7 +18,7 @@ SHOW VARIABLES LIKE '%time_zone%';
### Set variables
Variables can generally be set to take effect **globally** or **only on the current session**. When set to global, a new value will be used in subsequent new sessions without affecting the current session. When set to “current session only”, the variable will only take effect on the current session.
Variables can generally be set to take effect **globally** or **only on the current session**. When set to global, a new value will be used in subsequent new sessions without affecting the current session. When set to "current session only", the variable will only take effect on the current session.
A variable set by `SET var_name=xxx;` only takes effect for the current session. For example:
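Below is a hedged sketch of both forms, using `query_timeout` purely as an illustrative variable:
```SQL
-- Takes effect only in the current session.
SET query_timeout = 300;
-- Takes effect for subsequent new sessions.
SET GLOBAL query_timeout = 300;
```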
@ -131,8 +131,8 @@ SELECT /*+ SET_VAR
* streaming_preaggregation_mode
Used to specify the preaggregation mode for the first phase of GROUP BY. If the preaggregation effect in the first phase is not satisfactory, you can use the streaming mode, which performs simple data serialization before streaming data to the destinationValid values:
* `auto`The system first tries local preaggregation. If the effect is not satisfactory, it switches to the streaming mode. This is the default value.
Used to specify the preaggregation mode for the first phase of GROUP BY. If the preaggregation effect in the first phase is not satisfactory, you can use the streaming mode, which performs simple data serialization before streaming data to the destination. Valid values:
* `auto`: The system first tries local preaggregation. If the effect is not satisfactory, it switches to the streaming mode. This is the default value.
* `force_preaggregation`: The system directly performs local preaggregation.
* `force_streaming`: The system directly performs streaming.
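To try this variable for the current session only, a minimal sketch (the chosen value is illustrative):
```SQL
SET streaming_preaggregation_mode = 'force_streaming';
```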


@ -254,7 +254,7 @@ Release date: July 29, 2022
- The Primary Key model supports complete DELETE WHERE syntax. For more information, see [DELETE](../sql-reference/sql-statements/data-manipulation/DELETE.md#delete-and-primary-key-model).
- The Primary Key model supports persistent primary key indexes. You can choose to persist the primary key index on disk rather than in memory, significantly reducing memory usage. For more information, see [Primary Key model](../table_design/Data_model.md#how-to-use-it-3).
- Global dictionary can be updated during real-time data ingestionoptimizing query performance and delivering 2X query performance for string data.
- Global dictionary can be updated during real-time data ingestion, optimizing query performance and delivering 2X query performance for string data.
- The CREATE TABLE AS SELECT statement can be executed asynchronously. For more information, see [CREATE TABLE AS SELECT](../sql-reference/sql-statements/data-definition/CREATE%20TABLE%20AS%20SELECT.md).
- Support the following resource group-related features:
- Monitor resource groups: You can view the resource group of the query in the audit log and obtain the metrics of the resource group by calling APIs. For more information, see [Monitor and Alerting](../administration/Monitor_and_Alert.md#monitor-and-alerting).


@ -89,7 +89,7 @@ The following bugs are fixed:
- Memory leak caused by a materialized view QeProcessorImpl issue. [#15699](https://github.com/StarRocks/starrocks/pull/15699)
- The results of queries with `limit` are inconsistent. [#13574](https://github.com/StarRocks/starrocks/pull/13574)
- Memory leak caused by INSERT. [#14718](https://github.com/StarRocks/starrocks/pull/14718)
- Primary Key tables executes Tablet Migration[#13720](https://github.com/StarRocks/starrocks/pull/13720)
- Primary Key tables execute Tablet Migration. [#13720](https://github.com/StarRocks/starrocks/pull/13720)
- Broker Kerberos tickets timeout during Broker Load. [#16149](https://github.com/StarRocks/starrocks/pull/16149)
- The `nullable` information is inferred incorrectly in the view of a table. [#15744](https://github.com/StarRocks/starrocks/pull/15744)


@ -2,11 +2,11 @@
## Background
The window function is a special class of built-in functions. Similar to the aggregation function, it also does calculations on multiple input rows to get a single data value. The difference is that the window function processes the input data within a specific window, rather than using the “group by” method. The data in each window can be sorted and grouped using the over() clause. The window function **computes a separate value for each row**, rather than computing one value for each group. This flexibility allows users to add additional columns to the select clause and further filter the result set. The window function can only appear in the select list and the outermost position of a clause. It takes effect at the end of the query, that is, after the `join`, `where`, and `group by` operations are performed. The window function is often used to analyze trends, calculate outliers, and perform bucketing analyses on large-scale data.
The window function is a special class of built-in functions. Similar to the aggregation function, it also does calculations on multiple input rows to get a single data value. The difference is that the window function processes the input data within a specific window, rather than using the "group by" method. The data in each window can be sorted and grouped using the over() clause. The window function **computes a separate value for each row**, rather than computing one value for each group. This flexibility allows users to add additional columns to the select clause and further filter the result set. The window function can only appear in the select list and the outermost position of a clause. It takes effect at the end of the query, that is, after the `join`, `where`, and `group by` operations are performed. The window function is often used to analyze trends, calculate outliers, and perform bucketing analyses on large-scale data.
## Usage
Syntax of the window function
Syntax of the window function:
~~~SQL
function(args) OVER(partition_by_clause order_by_clause [window_clause])
@ -399,7 +399,7 @@ INSERT INTO test_tbl VALUES
(10, NULL);
~~~
Query data from this table, where `offset` is 2, which means traversing the previous two rows; `default` is 0, which means 0 is returned if no matching rows are found
Query data from this table, where `offset` is 2, which means traversing the previous two rows; `default` is 0, which means 0 is returned if no matching rows are found.
Output:
@ -496,14 +496,14 @@ Returns the value of the row that leads the current row by `offset` rows. This f
Data types that can be queried by `lead()` are the same as those supported by [lag()](#lag).
Syntax
~~~sql
LEAD(expr [IGNORE NULLS] [, offset[, default]])
OVER([<partition_by_clause>] [<order_by_clause>])
~~~
Parameters
Parameters:
* `expr`: the field you want to compute.
* `offset`: the offset. It must be a positive integer. If this parameter is not specified, 1 is the default.
@ -531,7 +531,7 @@ INSERT INTO test_tbl VALUES
(10, NULL);
~~~
Query data from this table, where `offset` is 2, which means traversing the subsequent two rows; `default` is 0, which means 0 is returned if no matching rows are found
Query data from this table, where `offset` is 2, which means traversing the subsequent two rows; `default` is 0, which means 0 is returned if no matching rows are found.
Output:
@ -589,7 +589,7 @@ For the first row, the value two rows forward is NULL and NULL is ignored becaus
Returns the maximum value of the specified rows in the current window.
Syntax
~~~SQL
MAX(expression) [OVER (analytic_clause)]
@ -642,7 +642,7 @@ where property in ('prime','square');
Returns the minimum value of the specified rows in the current window.
Syntax
Syntax:
~~~SQL
MIN(expression) [OVER (analytic_clause)]
@ -897,7 +897,7 @@ The execution order of clauses in a query with QUALIFY is evaluated in the follo
### SUM()
Syntax
Syntax:
~~~SQL
SUM(expression) [OVER (analytic_clause)]


@ -79,14 +79,14 @@ insert into bitmap_table1
select id, bitmap_union(id2) from bitmap_table2 group by id;
```
* id2's column type in source table is int, and the bitmap type is generated by to_bitmap().
* id2's column type in source table is INT, and the bitmap type is generated by to_bitmap().
```SQL
insert into bitmap_table1
select id, to_bitmap(id2) from table;
```
* id2's column type in source table is String, and the bitmap type is generated by bitmap_hash().
* id2's column type in source table is STRING, and the bitmap type is generated by bitmap_hash().
```SQL
insert into bitmap_table1
@ -109,9 +109,9 @@ select id, bitmap_hash(id2) from table;
### Example
The following SQL uses the pv_bitmap table above as an example:
The following SQL uses the `pv_bitmap` table above as an example:
Calculate the deduplicated value for user_id:
Calculate the deduplicated value for `user_id`:
```SQL
select bitmap_union_count(user_id)
@ -121,21 +121,21 @@ select bitmap_count(bitmap_union(user_id))
from pv_bitmap;
```
Calculate the deduplicated value of id:
Calculate the deduplicated value of `id`:
```SQL
select bitmap_union_int(id)
from pv_bitmap;
```
Calculate the retention of user_id:
Calculate the retention of `user_id`:
```SQL
select intersect_count(user_id, page, 'meituan') as meituan_uv,
intersect_count(user_id, page, 'waimai') as waimai_uv,
intersect_count(user_id, page, 'meituan', 'waimai') as retention -- 在 'meituan' 和 'waimai' 两个页面都出现的用户数
select intersect_count(user_id, page, 'game') as game_uv,
intersect_count(user_id, page, 'shopping') as shopping_uv,
intersect_count(user_id, page, 'game', 'shopping') as retention -- Number of users that access both the 'game' and 'shopping' pages
from pv_bitmap
where page in ('meituan', 'waimai');
where page in ('game', 'shopping');
```
## keyword


@ -51,7 +51,7 @@ select * from test order by id;
+------+----------+----------+------------+
~~~
Example 1Count the number of rows in table `test`.
Example 1: Count the number of rows in table `test`.
~~~Plain
select count(*) from test;
@ -62,7 +62,7 @@ Example 1Count the number of rows in table `test`.
+----------+
~~~
Example 2Count the number of values in the `id` column.
Example 2: Count the number of values in the `id` column.
~~~Plain
select count(id) from test;
@ -84,7 +84,7 @@ select count(category) from test;
+-----------------+
~~~
Example 4Count the number of distinct values in the `category` column.
Example 4: Count the number of distinct values in the `category` column.
~~~Plain
select count(distinct category) from test;
@ -95,7 +95,7 @@ select count(distinct category) from test;
+-------------------------+
~~~
Example 5Count the number of combinations that can be formed by `category` and `supplier`.
Example 5: Count the number of combinations that can be formed by `category` and `supplier`.
~~~Plain
select count(distinct category, supplier) from test;


@ -37,7 +37,7 @@ select * from test order by id;
+------+----------+----------+------------+
~~~
Example 1Count the number of distinct values in the `category` column.
Example 1: Count the number of distinct values in the `category` column.
~~~Plain
select multi_distinct_count(category) from test;
@ -48,7 +48,7 @@ select multi_distinct_count(category) from test;
+--------------------------------+
~~~
Example 2Count the number of distinct values in the `supplier` column.
Example 2: Count the number of distinct values in the `supplier` column.
~~~Plain
select multi_distinct_count(supplier) from test;


@ -104,7 +104,7 @@ The evaluation starts from the first condition.
- User 4 meets no condition and [0,0] is returned.
Example 2Calculate the percentage of users who have viewed commodity page on 2022-01-01 (action='pv') and placed an order on 2022-01-02action='buy').
Example 2: Calculate the percentage of users who have viewed commodity page on 2022-01-01 (action='pv') and placed an order on 2022-01-02 (action='buy').
```Plain Text
MySQL > select sum(r[1]),sum(r[2])/sum(r[1])


@ -75,7 +75,7 @@ Data type mapping between input value and return value:
4. Use this function to compute sum.
Example 1Calculate the total sales of each region.
Example 1: Calculate the total sales of each region.
```Plain Text
MySQL > SELECT region_num, sum(sales) from employees


@ -22,7 +22,7 @@ Returns a value of the BOOLEAN type.
1 is returned if `arr2` is a subset of `arr1`. Otherwise, 0 is returned.
If any of the two arrays is NULL NULL is returned.
If any of the two arrays is NULL, NULL is returned.
## Usage notes


@ -16,7 +16,7 @@ ARRAY array_generate([start,] end [, step])
- `start`: optional, the start value. It must be a constant or a column that evaluates to TINYINT, SMALLINT, INT, BIGINT, or LARGEINT. The default value is 1.
- `end`: required, the end value. It must be a constant or a column that evaluates to TINYINT, SMALLINT, INT, BIGINT, or LARGEINT.
- `step`: optional, the increment. It must be a constant or a column that evaluates to TINYINT, SMALLINT, INT, BIGINT, or LARGEINT. When `start` is less than `end`, the default value is 1. When `start` is greater than `end`, the default value is -1.
## Return value


@ -37,7 +37,7 @@ array_sortby(<lambda function>, array0 [, array1...])
- `array0`: the array you want to sort. It must be an array, array expression, or `null`. Elements in the array must be sortable.
- `array1`: the sorting array used to sort `array0`. It must be an array, array expression, or `null`.
- `lambda function`the lambda expression used to generate the sorting array.
- `lambda function`: the lambda expression used to generate the sorting array.
## Return value


@ -36,7 +36,7 @@ Here are the formats available:
%M | Month name in full
%m | Month name as a numeric value (00-12)
%p | AM or PM
%r | Time in 12 hourhh:mm:ss AM or PM
%r | Time in 12 hour (hh:mm:ss AM or PM)
%S | Seconds (00-59)
%s | Seconds (00-59)
%T | Time in 24 hour format (hh:mm:ss)


@ -7,12 +7,12 @@ Converts a UNIX timestamp into the required time format. The default format is `
Currently, `string_format` supports the following formats:
```plain text
%YYear e.g.20141900
%mMonth e.g.1209
%dDay e.g.1101
%HHour e.g.230112
%iMinute e.g.0511
%sSecond e.g.5901
%Y: Year e.g.: 2014, 1900
%m: Month e.g.: 12, 09
%d: Day e.g.: 11, 01
%H: Hour e.g.: 23, 01, 12
%i: Minute e.g.: 05, 11
%s: Second e.g.: 59, 01
```
Other formats are invalid and NULL will be returned.
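For instance, a hedged sketch of these format specifiers in use (the timestamp value is arbitrary):
```sql
SELECT from_unixtime(1682066795, '%Y-%m-%d %H:%i:%s');
```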


@ -18,7 +18,7 @@ DATETIME microseconds_add(DATETIME expr1,INT expr2);
## Return value
Returns a value of the DATETIME type. If the input value is of the DATE typethe hour, minute, and seconds parts are processed as `00:00:00`.
Returns a value of the DATETIME type. If the input value is of the DATE type, the hour, minute, and seconds parts are processed as `00:00:00`.
## Examples


@ -18,7 +18,7 @@ DATETIME microseconds_sub(DATETIME expr1,INT expr2);
## Return value
Returns a value of the DATETIME type. If the input value is of the DATE typethe hour, minute, and seconds parts are processed as `00:00:00`.
Returns a value of the DATETIME type. If the input value is of the DATE type, the hour, minute, and seconds parts are processed as `00:00:00`.
## Examples


@ -20,7 +20,7 @@ DATETIME seconds_sub(DATETIME|DATE date, INT seconds);
Returns a value of the DATETIME type.
If the input value is of the DATE typethe hour, minute, and seconds parts are processed as `00:00:00`.
If the input value is of the DATE type, the hour, minute, and seconds parts are processed as `00:00:00`.
## Examples


@ -14,7 +14,7 @@ BIGINT time_to_sec(TIME time)
## Parameters
`time`It must be of the TIME type。
`time`: It must be of the TIME type.
## Return value


@ -16,7 +16,7 @@ DATETIME timestamp(DATETIME|DATE expr);
## Return value
Returns a DATETIME value. If the input time is empty or does not exist, such as `2021-02-29`NULL is returned.
Returns a DATETIME value. If the input time is empty or does not exist, such as `2021-02-29`, NULL is returned.
## Examples


@ -12,7 +12,7 @@ json_string(json_object_expr)
## Parameters
- `json_object_expr`the expression that represents the JSON object. The object can be a JSON column, or a JSON object that is produced by a JSON constructor function such as PARSE_JSON.
- `json_object_expr`: the expression that represents the JSON object. The object can be a JSON column, or a JSON object that is produced by a JSON constructor function such as PARSE_JSON.
## Return value


@ -26,7 +26,7 @@ MAP transform_keys(any_map, lambda_func)
- `any_map`: the Map.
- `lambda_func`the Lambda expression you want to apply to `any_map`.
- `lambda_func`: the Lambda expression you want to apply to `any_map`.
## Return value


@ -26,7 +26,7 @@ MAP transform_values(any_map, lambda_func)
- `any_map`: the Map.
- `lambda_func`the Lambda expression you want to apply to `any_map`.
- `lambda_func`: the Lambda expression you want to apply to `any_map`.
## Return value


@ -16,7 +16,7 @@ PERCENTILE_APPROX_RAW(x, y);
- `x`: It can be a column or a set of values. It must evaluate to PERCENTILE.
- `y`: the percentile. The supported data type is DOUBLE. Value range: [0.0,1.0]
- `y`: the percentile. The supported data type is DOUBLE. Value range: [0.0,1.0].
## Return value


@ -19,13 +19,6 @@ MySQL > select char_length("abc");
+--------------------+
| 3 |
+--------------------+
MySQL > select char_length("中国");
+----------------------+
| char_length('中国') |
+----------------------+
| 2 |
+----------------------+
```
## keyword


@ -19,13 +19,6 @@ MySQL > select length("abc");
+---------------+
| 3 |
+---------------+
MySQL > select length("中国");
+------------------+
| length('中国') |
+------------------+
| 6 |
+------------------+
```
## keyword


@ -34,7 +34,7 @@ mysql> SELECT rtrim(' ab d ');
1 row in set (0.00 sec)
```
Example 2Remove specified characters from the end of the string.
Example 2: Remove specified characters from the end of the string.
```Plain Text
MySQL > SELECT rtrim("xxabcdxx", "x");


@ -27,13 +27,6 @@ MySQL > select split_part("hello world", " ", 2);
| world |
+-----------------------------------+
MySQL > select split_part("2019年7月8号", "月", 1);
+-----------------------------------------+
| split_part('2019年7月8号', '月', 1) |
+-----------------------------------------+
| 2019年7 |
+-----------------------------------------+
MySQL > select split_part("abca", "a", 1);
+----------------------------+
| split_part('abca', 'a', 1) |


@ -34,7 +34,7 @@ Example 2: Add `count(distinct )` to the SQL blacklist.
mysql> ADD SQLBLACKLIST "select count(distinct .+) from .+";
```
Example 3: Add `order by limit x, y1 <= x <=7, 5 <=y <=7` to the SQL blacklist.
Example 3: Add `order by limit x, y, 1 <= x <=7, 5 <=y <=7` to the SQL blacklist.
```Plain
mysql> ADD SQLBLACKLIST "select id_int from test_all_type_select1


@ -15,12 +15,12 @@ Note:
Description of the return parameters:
```plain text
1. Key Configuration item name
2. Value Configuration item value
3. Type Configuration item type
4. IsMutable Whether it can be set through the ADMIN SET CONFIG command
5. MasterOnly Whether it only applies to leader FE
6. Comment Configuration item description
1. Key: Configuration item name
2. Value: Configuration item value
3. Type: Configuration item type
4. IsMutable: Whether it can be set through the ADMIN SET CONFIG command
5. MasterOnly: Whether it only applies to leader FE
6. Comment: Configuration item description
```
## Examples


@ -94,12 +94,12 @@ Note:
Hub of Mysql type needs to specify the following parameters:
```plain text
hostmysql host
portmysql port
usermysql user
passwordmysql password
databasemysql database
tablemysql table
host: mysql host
port: mysql port
user: mysql user
password: mysql password
database: mysql database
table: mysql table
```
- When using the Broker type, import errors will be generated into a file and be written into a designated remote storage system through Broker. Please make sure that corresponding broker is already deployed in the remote system.


@ -80,7 +80,7 @@ Example 1: Set the replica status of tablet 10003 on BE 10001 to `bad`.
ADMIN SET REPLICA STATUS PROPERTIES("tablet_id" = "10003", "backend_id" = "10001", "status" = "bad");
```
Example 2: Set the replica status of tablet 10003 on BE 10001 to `ok`
Example 2: Set the replica status of tablet 10003 on BE 10001 to `ok`.
```SQL
ADMIN SET REPLICA STATUS PROPERTIES("tablet_id" = "10003", "backend_id" = "10001", "status" = "ok");


@ -123,8 +123,8 @@ GRANT IMPERSONATE ON USER <user_identity> TO USER <user_identity> [ WITH GRANT O
### Grant roles to roles or users
```SQL
GRANT <role_name> [,<role_name>, ...] TO ROLE <role_name>
GRANT <role_name> [,<role_name>, ...] TO USER <user_identity>
GRANT <role_name> [,<role_name>, ...] TO ROLE <role_name>;
GRANT <role_name> [,<role_name>, ...] TO USER <user_identity>;
```
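For example, a minimal sketch of granting an existing role to a user (the role and user names are illustrative):
```SQL
GRANT example_role TO USER 'jack'@'%';
```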
## References


@ -57,7 +57,7 @@ SHOW ALL AUTHENTICATION;
+---------------+----------+-------------------------+-------------------+
```
Example 3Display the authentication information of a specified user.
Example 3: Display the authentication information of a specified user.
```Plain
SHOW AUTHENTICATION FOR root;


@ -111,7 +111,7 @@ DISTRIBUTED BY HASH(id)
PROPERTIES("replicated_storage" = "true");
```
Implicitly assign and explicitly specify the values for the `AUTO_INCREMENT` column `number` in the table `test_tbl2`
Implicitly assign and explicitly specify the values for the `AUTO_INCREMENT` column `number` in the table `test_tbl2`.
```SQL
INSERT INTO test_tbl2 (id, number) VALUES (1, DEFAULT);


@ -370,7 +370,7 @@ Note:
Syntax:
```sql
DROP INDEX index_name
DROP INDEX index_name;
```
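In context, a hedged sketch of the full statement (the table and index names are illustrative):
```sql
ALTER TABLE example_table DROP INDEX example_index;
```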
### Swap


@ -32,13 +32,13 @@ This statement is a synchronous operation and requires you to have the `ALTER_PR
## Examples
Example 1Cancel the schema changes to `example_table` in the `example_db`database.
Example 1: Cancel the schema changes to `example_table` in the `example_db`database.
```SQL
CANCEL ALTER TABLE COLUMN FROM example_db.example_table;
```
Example 2Cancel rollup index changes to `example_table` in your current database.
Example 2: Cancel rollup index changes to `example_table` in your current database.
```SQL
CANCEL ALTER TABLE ROLLUP FROM example_table;


@ -53,7 +53,7 @@ Note:
If Spark is used for ETL, working_DIR and broker need to be specified. The instructions are as follows:
```plain text
working_dir: Directory used by ETL. It is required when spark is used as ETL resource. For examplehdfs://host:port/tmp/starrocks。
working_dir: Directory used by ETL. It is required when spark is used as ETL resource. For example: hdfs://host:port/tmp/starrocks.
broker: Name of broker. It is required when spark is used as ETL resource and needs to be configured beforehand by using `ALTER SYSTEM ADD BROKER` command.
broker.property_key: It is the property information needed to be specified when broker reads the intermediate files created by ETL.
```
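For reference, a hedged sketch of a Spark resource that sets these two properties (all names, addresses, and paths are illustrative):
```sql
CREATE EXTERNAL RESOURCE "spark_example"
PROPERTIES (
    "type" = "spark",
    "spark.master" = "yarn",
    "spark.hadoop.yarn.resourcemanager.address" = "127.0.0.1:9999",
    "spark.hadoop.fs.defaultFS" = "hdfs://127.0.0.1:10000",
    "working_dir" = "hdfs://127.0.0.1:10000/tmp/starrocks",
    "broker" = "broker0"
);
```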


@ -31,40 +31,40 @@ Syntax:
col_name col_type [agg_type] [NULL | NOT NULL] [DEFAULT "default_value"] [AUTO_INCREMENT]
```
**col_name**Column name.
**col_name**: Column name.
**col_type**Column type. Specific column information, such as types and ranges:
**col_type**: Column type. Specific column information, such as types and ranges:
- TINYINT1 byte): Ranges from -2^7 + 1 to 2^7 - 1.
- TINYINT (1 byte): Ranges from -2^7 + 1 to 2^7 - 1.
- SMALLINT (2 bytes): Ranges from -2^15 + 1 to 2^15 - 1.
- INT4 bytes): Ranges from -2^31 + 1 to 2^31 - 1.
- BIGINT8 bytes): Ranges from -2^63 + 1 to 2^63 - 1.
- LARGEINT16 bytes): Ranges from -2^127 + 1 to 2^127 - 1.
- FLOAT4 bytes): Supports scientific notation.
- DOUBLE8 bytes): Supports scientific notation.
- INT (4 bytes): Ranges from -2^31 + 1 to 2^31 - 1.
- BIGINT (8 bytes): Ranges from -2^63 + 1 to 2^63 - 1.
- LARGEINT (16 bytes): Ranges from -2^127 + 1 to 2^127 - 1.
- FLOAT (4 bytes): Supports scientific notation.
- DOUBLE (8 bytes): Supports scientific notation.
- DECIMAL[(precision, scale)] (16 bytes)
- Default value: DECIMAL(10, 0)
- precision: 1 ~ 38
- scale: 0 ~ precision
- Integer partprecision - scale
- Integer part: precision - scale
Scientific notation is not supported.
- DATE (3 bytes): Ranges from 0000-01-01 to 9999-12-31.
- DATETIME (8 bytes): Ranges from 0000-01-01 00:00:00 to 9999-12-31 23:59:59.
- CHAR[(length)]: Fixed length string. Range1 ~ 255. Default value: 1.
- CHAR[(length)]: Fixed length string. Range: 1 ~ 255. Default value: 1.
- VARCHAR[(length)]: A variable-length string. The default value is 1. Unit: bytes. In versions earlier than StarRocks 2.1, the value range of `length` is 1~65533. [Preview] In StarRocks 2.1 and later versions, the value range of `length` is 1~1048576.
- HLL (1~16385 bytes): For HLL type, there's no need to specify length or default value. The length will be controlled within the system according to data aggregation. HLL column can only be queried or used by [hll_union_agg](../../sql-functions/aggregate-functions/hll_union_agg.md), [Hll_cardinality](../../sql-functions/scalar-functions/hll_cardinality.md), and [hll_hash](../../sql-functions/aggregate-functions/hll_hash.md).
- BITMAP: Bitmap type does not require specified length or default value. It represents a set of unsigned bigint numbers. The largest element could be up to 2^64 - 1.
**agg_type**aggregation type. If not specified, this column is key column.
**agg_type**: aggregation type. If not specified, this column is key column.
If specified, it is value column. The aggregation types supported are as follows:
- SUM, MAX, MIN, REPLACE
- HLL_UNION (only for HLL type)
- BITMAP_UNION(only for BITMAP)
- REPLACE_IF_NOT_NULLThis means the imported data will only be replaced when it is of non-null value. If it is of null value, StarRocks will retain the original value.
- REPLACE_IF_NOT_NULL: This means the imported data will only be replaced when it is of non-null value. If it is of null value, StarRocks will retain the original value.
> NOTE
>
@ -81,7 +81,7 @@ This aggregation type applies ONLY to the aggregation model whose key_desc type
- **DEFAULT <default_value>**: Use a given value of the column data type as the default value. For example, if the data type of the column is VARCHAR, you can specify a VARCHAR string, such as beijing, as the default value, as presented in `DEFAULT "beijing"`. Note that default values cannot be any of the following types: ARRAY, BITMAP, JSON, HLL, and BOOLEAN.
- **DEFAULT (\<expr\>)**: Use the result returned by a given function as the default value. Only the [uuid()](../../sql-functions/utility-functions/uuid.md) and [uuid_numeric()](../../sql-functions/utility-functions/uuid_numeric.md) expressions are supported.
- **AUTO_INCREMENT**specifies an `AUTO_INCREMENT` column. The data types of `AUTO_INCREMENT` columns must be BIGINT. Auto-incremented IDs start from 1 and increase at a step of 1. For more information about `AUTO_INCREMENT` columns, see [AUTO_INCREMENT](../../sql-statements/auto_increment.md).
- **AUTO_INCREMENT**: specifies an `AUTO_INCREMENT` column. The data types of `AUTO_INCREMENT` columns must be BIGINT. Auto-incremented IDs start from 1 and increase at a step of 1. For more information about `AUTO_INCREMENT` columns, see [AUTO_INCREMENT](../../sql-statements/auto_increment.md).
### index_definition
@ -183,7 +183,7 @@ Optional value: mysql, elasticsearch, hive, jdbc (2.3 and later), iceberg, and h
### key_desc
Syntax
Syntax:
```SQL
key_type(k1[,k2 ...])
@ -407,9 +407,9 @@ If your StarRocks cluster has multiple data replicas, you can set different writ
The valid values of `write_quorum` are:
- `MAJORITY`Default value. When the **majority** of data replicas return loading success, StarRocks returns loading task success. Otherwise, StarRocks returns loading task failed.
- `ONE`When **one** of the data replicas returns loading success, StarRocks returns loading task success. Otherwise, StarRocks returns loading task failed.
- `ALL`When **all** of the data replicas return loading success, StarRocks returns loading task success. Otherwise, StarRocks returns loading task failed.
- `MAJORITY`: Default value. When the **majority** of data replicas return loading success, StarRocks returns loading task success. Otherwise, StarRocks returns loading task failed.
- `ONE`: When **one** of the data replicas returns loading success, StarRocks returns loading task success. Otherwise, StarRocks returns loading task failed.
- `ALL`: When **all** of the data replicas return loading success, StarRocks returns loading task success. Otherwise, StarRocks returns loading task failed.
> **CAUTION**
>
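As an illustration of how `write_quorum` is set, here is a hedged sketch (hypothetical table and column names) that relaxes the quorum to a single replica at table creation time:

```SQL
CREATE TABLE example_db.events (
    event_id BIGINT,
    event_time DATETIME
)
DUPLICATE KEY(event_id)
DISTRIBUTED BY HASH(event_id) BUCKETS 8
PROPERTIES (
    "replication_num" = "3",
    "write_quorum" = "ONE"   -- loading succeeds as soon as one replica acknowledges
);
```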
@ -527,7 +527,7 @@ PROPERTIES(
);
```
### Create a Duplicate Key table that uses Range partition, Hash bucketingand column-based storage, and set the storage medium and cooldown time
### Create a Duplicate Key table that uses Range partition, Hash bucketing, and column-based storage, and set the storage medium and cooldown time
LESS THAN

View File

@ -24,7 +24,7 @@ Syntax:
RECOVER PARTITION partition_name FROM [<db_name>.]<table_name>
```
Note
Note:
1. It can only recover meta-information that was deleted within a certain period. The default period is one day. (You can change it through the catalog_trash_expire_second parameter in fe.conf.)
2. If identical meta-information is created after the deletion, the previous one will not be recovered.
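A hedged usage sketch (the database, table, and partition names are hypothetical):

```sql
-- Recover a partition that was dropped from example_tbl less than a day ago
RECOVER PARTITION p20230401 FROM example_db.example_tbl;
```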

View File

@ -22,7 +22,7 @@ Run the following command to switch to a Hive catalog named `hive_metastore` in
SET CATALOG hive_metastore;
```
Run the following command to switch to the internal catalog `default_catalog` in the current session
Run the following command to switch to the internal catalog `default_catalog` in the current session:
```SQL
SET CATALOG default_catalog;

View File

@ -16,7 +16,7 @@ RETURNS ret_type
AGGREGATE: This parameter indicates that the created function is an aggregate function, otherwise it is a scalar function.
function_name: When creating the name of a function, you can include the name of the database. For example,`db1.my_func`
function_name: When creating the name of a function, you can include the name of the database. For example,`db1.my_func`.
arg_type: Argument type of the function, using the same types as those defined when creating a table. A variable-length argument can be represented by `, ...`. If a variable-length argument is added, its type is the same as that of the last non-variable argument.
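A hedged sketch of a function definition that follows the pieces described above; the class name, JAR URL, and property values are hypothetical and only illustrate the shape of the statement:

```sql
CREATE FUNCTION db1.my_func(string, string)
RETURNS string
PROPERTIES (
    "symbol" = "com.example.udf.MyFunc",        -- hypothetical implementation class
    "type"   = "StarrocksJar",
    "file"   = "http://example.com/my-udf.jar"  -- hypothetical JAR location
);
```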

View File

@ -384,7 +384,7 @@ The following parameters are supported:
Suppose that you want to load a 1-GB data file on which two materialized views are created into a StarRocks cluster whose average load speed is 10 MB/s and maximum number of concurrent instances allowed per task is 3. The amount of time required for the data load is approximately 102 seconds.
(1 x 1024 x 3)/(10 x 3) = 102second
(1 x 1024 x 3)/(10 x 3) = 102 (seconds)
For this example, we recommend that you set the timeout period to a value greater than 102 seconds.

View File

@ -417,7 +417,7 @@ PROPERTIES
"property.security.protocol" = "ssl",
-- The location of the CA certificate.
"property.ssl.ca.location" = "FILE:ca-cert",
-- If authentication is enabled for Kafka clients, you need to configure the following properties
-- If authentication is enabled for Kafka clients, you need to configure the following properties:
-- The location of the Kafka client's public key.
"property.ssl.certificate.location" = "FILE:client.pem",
-- The location of the Kafka client's private key.

View File

@ -49,7 +49,7 @@ There is actually no special syntax to identify self-join. The conditions on bot
We need to assign them different aliases.
Example
Example:
```sql
SELECT lhs.id, rhs.parent, lhs.c1, rhs.c2 FROM tree_data lhs, tree_data rhs WHERE lhs.id = rhs.parent;
@ -220,7 +220,7 @@ The HAVING clause does not filter row data in a table, but filters the results o
Generally speaking, HAVING is used with aggregate functions (such as COUNT(), SUM(), AVG(), MIN(), MAX()) and GROUP BY clauses.
Example
Example:
```sql
select tiny_column, sum(short_column)
@ -270,7 +270,7 @@ The size of the query result set needs to be limited because of the large amount
Instructions for use: The value of the LIMIT clause must be a numeric literal constant.
Example
Example:
```plain text
mysql> select tiny_column from small_table limit 1;
@ -305,7 +305,7 @@ The result set defaults to start at line 0, so offset 0 and no offset return the
Generally speaking, OFFSET clauses need to be used with ORDER BY and LIMIT clauses to be valid.
Example
Example:
```plain text
mysql> select varchar_column from big_table order by varchar_column limit 3;
@ -381,7 +381,7 @@ As a result, queries using UNION ALL operations are faster and consume less memo
You need to place the UNION operation in the subquery, then select from the subquery, and finally place the subquery and order by outside the subquery.
Example
Example:
```plain text
mysql> (select tiny_column from small_table) union all (select tiny_column from small_table);
@ -478,7 +478,7 @@ Subqueries are divided into irrelevant subqueries and related subqueries by rele
Uncorrelated subqueries support [NOT] IN and EXISTS.
Example
Example:
```sql
SELECT x FROM t1 WHERE x [NOT] IN (SELECT y FROM t2);
@ -492,7 +492,7 @@ SELECT x FROM t1 WHERE EXISTS (SELECT y FROM t2 WHERE y = 1);
Correlated subqueries support [NOT] IN and [NOT] EXISTS.
Example
Example:
```sql
SELECT * FROM t1 WHERE x [NOT] IN (SELECT a FROM t2 WHERE t1.y = t2.b);
@ -519,7 +519,7 @@ Example:
3. Correlated scalar subqueries. For example, output the highest salary information for each department.
```sql
SELECT name FROM table a WHERE salary = SELECT MAX(salary) FROM table b WHERE b.Department= a.Department;
SELECT name FROM table a WHERE salary = (SELECT MAX(salary) FROM table b WHERE b.Department= a.Department);
```
4. Scalar subqueries are used as arguments of ordinary functions.
@ -540,7 +540,7 @@ Convenient and easy to maintain, reducing duplication within queries.
It is easier to read and understand SQL code by abstracting the most complex parts of a query into separate blocks.
Example
Example:
```sql
-- Define one subquery at the outer level, and another at the inner level as part of the
@ -558,7 +558,7 @@ SQL operators are a series of functions used for comparison and are widely used
Arithmetic operators usually appear in expressions that consist of a left operand, the operator, and, in most cases, a right operand.
**+and-**can be used as a unit or as a 2-ary operator. When used as a unit operator, such as +1, -2.5 or -col_ name, which means the value is multiplied by +1 or -1.
**+ and -**: can be used as a unary or a binary operator. When used as a unary operator, such as +1, -2.5 or -col_name, it means the value is multiplied by +1 or -1.
So the unary operator + returns the value unchanged, and the unary operator - flips the sign of the value.
@ -568,7 +568,7 @@ Because--is interpreted as a comment in the following statement (when a user can
When + or - is a binary operator, such as 2+2, 3+1.5, or col1+col2, it means that the right value is added to or subtracted from the left value. Both left and right values must be numeric types.
**and/**represent multiplication and division, respectively. The operands on both sides must be data types. When two numbers are multiplied.
**\* and /**: represent multiplication and division, respectively. The operands on both sides must be numeric types.
Smaller operands may be promoted if needed (e.g., SMALLINT to INT or BIGINT), and the result of the expression will be promoted to the next larger type.
@ -576,9 +576,9 @@ For example, TINYINT multiplied by INT will produce a BIGINT type of result. Whe
If the user wants to convert the result of the expression to another type, it needs to be converted using the CAST function.
**%**Modulation operator. Returns the remainder of the left operand divided by the right operand. Both left and right operands must be integers.
**%**: Modulo operator. Returns the remainder of the left operand divided by the right operand. Both left and right operands must be integers.
**&|and ^**The bitwise operator returns the result of bitwise AND, bitwise OR, bitwise XOR operations on two operands. Both operands require an integer type.
**&, | and ^**: Bitwise operators that return the result of bitwise AND, bitwise OR, and bitwise XOR on the two operands. Both operands must be of an integer type.
If the types of the two operands of a bitwise operator are inconsistent, the operands of a smaller type are promoted to the operands of a larger type, and the corresponding bitwise operations are performed.
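The following constant-only query, runnable as-is, illustrates the arithmetic, modulo, and bitwise operators discussed above:

```sql
SELECT -2.5,      -- unary minus
       2 + 2,     -- addition: 4
       3 * 1.5,   -- multiplication: 4.5
       7 % 3,     -- modulo: 1
       5 & 3,     -- bitwise AND: 1
       5 | 3,     -- bitwise OR: 7
       5 ^ 3;     -- bitwise XOR: 6
```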
@ -604,7 +604,7 @@ Data type: Usually an expression evaluates to a numeric type, which also support
If you need to make sure the expression works correctly, you can use functions such as upper(), lower(), substr(), trim().
Example
Example:
```sql
select c1 from t1 where month between 1 and 6;
@ -632,7 +632,7 @@ select * from small_table where tiny_column in (1,2);
This operator is used to compare a value to a string. '_' matches a single character, and '%' matches multiple characters. The parameter must match the complete string. Typically, placing '%' at the end of the string is more practical.
Example
Example:
```plain text
mysql> select varchar_column from small_table where varchar_column like 'm%';
@ -668,7 +668,7 @@ OR: 2-ary operator that returns TRUE if one of the parameters on the left and ri
NOT: Unary operator that inverts the result of an expression. If the parameter is TRUE, the operator returns FALSE; if the parameter is FALSE, the operator returns TRUE.
Example
Example:
```plain text
mysql> select true and true;
@ -728,7 +728,7 @@ If you want to match the middle part, the front part of the regular expression c
The '|' operator is an alternation ('or') operator: the regular expression on only one side of '|' needs to match. The '|' operator and the regular expressions on both sides usually need to be enclosed in ().
Example
Example:
```plain text
mysql> select varchar_column from small_table where varchar_column regexp '(mi|MI).*';
@ -764,7 +764,7 @@ Simply add an AS alias clause after the table, column, and expression names in t
The AS keyword is optional, and users can specify the alias directly after the original name. If an alias or other identifier has the same name as an internal keyword, you need to enclose the name in a pair of backticks (`). Aliases are case-sensitive.
Example
Example:
```sql
select tiny_column as name, int_column as sex from big_table;

View File

@ -4,13 +4,13 @@
This statement is used to display the amount of data, the number of copies, and the number of statistical rows.
Syntax
Syntax:
```sql
SHOW DATA [FROM <db_name>[.<table_name>]]
```
Note
Note:
1. If the FROM clause is not specified, the amount of data and copies subdivided into each table in the current db will be displayed. Where the data volume is the total data volume of all replicas. The number of copies is the number of copies of all partitions of the table and all materialized views.
2. If the FROM clause is specified, the amount of data, the number of copies and the number of statistical rows subdivided into each materialized view under the table are displayed. Where the data volume is the total data volume of all replicas. The number of copies is the number of copies of all partitions corresponding to the materialized view. The number of statistical rows is the number of statistical rows of all partitions corresponding to the materialized view.
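Hedged usage sketches (the database and table names are hypothetical):

```sql
-- Data volume and replica count for every table in the current database
SHOW DATA;

-- Data volume, replica count, and row count for one table
SHOW DATA FROM example_db.example_tbl;
```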

View File

@ -118,7 +118,7 @@ The parameters in the return result are described as follows:
WHERE queryid = "921d8f80-7c9d-11eb-9342-acde48001122";
```
- Query export jobs that are in the `EXPORTING` state in the database `example_db` and specify to sort the export job records in the result set by `StartTime` in ascending order
- Query export jobs that are in the `EXPORTING` state in the database `example_db` and specify to sort the export job records in the result set by `StartTime` in ascending order:
```SQL
SHOW EXPORT FROM example_db

View File

@ -28,7 +28,7 @@ SHOW RESTORE [FROM <db_name>]
| Label | Name of the data snapshot. |
| Timestamp | Backup timestamp. |
| DbName | Name of the database that the RESTORE task belongs to. |
| State | Current state of the RESTORE task:<ul><li>PENDING: Initial state after submitting a job.</li><li>SNAPSHOTING: Executing the local snapshot.</li><li>DOWNLOAD: Submitting snapshot download task.</li><li>DOWNLOADING: Downloading the snapshot.</li><li>COMMITTo commit the downloaded snapshot.</li><li>COMMITTING: Committing the downloaded snapshot.</li><li>FINISHED: RESTORE task finished.</li><li>CANCELLED: RESTORE task failed or cancelled.</li></ul> |
| State | Current state of the RESTORE task:<ul><li>PENDING: Initial state after submitting a job.</li><li>SNAPSHOTING: Executing the local snapshot.</li><li>DOWNLOAD: Submitting snapshot download task.</li><li>DOWNLOADING: Downloading the snapshot.</li><li>COMMIT: To commit the downloaded snapshot.</li><li>COMMITTING: Committing the downloaded snapshot.</li><li>FINISHED: RESTORE task finished.</li><li>CANCELLED: RESTORE task failed or cancelled.</li></ul> |
| AllowLoad | If loading data is allowed during the RESTORE task. |
| ReplicationNum | Number of replicas to be restored. |
| RestoreObjs | The restored objects (tables and partitions). |

View File

@ -4,7 +4,7 @@
Displays all tables in the current database.
Syntax
Syntax
```sql
SHOW TABLES;

View File

@ -6,7 +6,7 @@ Spark load preprocesses the imported data through external spark resources, impr
Spark load is an asynchronous import method. Users need to create Spark-type import tasks through the MySQL protocol and view the import results through `SHOW LOAD`.
Syntax
Syntax
```sql
LOAD LABEL load_label
@ -57,7 +57,7 @@ INTO TABLE tbl_name
[WHERE predicate]
```
Note
Note
```plain text
file_path:
@ -76,22 +76,22 @@ PARTITION:
If this parameter is specified, only the specified partition will be imported, and the data outside the imported partition will be filtered out.
If not specified, all partitions of table will be imported by default.
NEGATIVE
NEGATIVE:
If this parameter is specified, it is equivalent to importing a batch of "negative" data. Used to offset the same batch of previously imported data.
This parameter is only applicable when the value column exists and the aggregation type of the value column is SUM only.
column_separator
column_separator:
Specifies the column separator in the import file. The default is \t.
If it is an invisible character, you need to prefix it with \\x and use hexadecimal to represent the separator.
For example, the separator \x01 of the Hive file is specified as "\\x01".
file_type
file_type:
Used to specify the type of imported file. Currently, only csv is supported.
column_list
column_list:
Used to specify the correspondence between the columns in the import file and the columns in the table.
When you need to skip a column in the import file, specify the column as a column name that does not exist in the table.
@ -101,8 +101,8 @@ Syntax:
SET:
If specify this parameter, you can convert a column of the source file according to the function, and then import the converted results into table. Syntax is column_name = expression
Only Spark SQL build_in functions are supported. Please refer to https://spark.apache.org/docs/2.4.6/api/sql/index.html
If you specify this parameter, you can convert a column of the source file according to a function, and then import the converted result into the table. The syntax is column_name = expression.
Only Spark SQL built-in functions are supported. Please refer to https://spark.apache.org/docs/2.4.6/api/sql/index.html.
Give a few examples to help understand.
Example 1: there are three columns "c1, c2, c3" in the table; the first two columns in the source file correspond to (c1, c2), and the sum of the last two columns corresponds to c3. Then columns (c1, c2, tmp_c3, tmp_c4) SET (c3 = tmp_c3 + tmp_c4) needs to be specified.
Example 2: there are three columns "year, month and day" in the table, and there is only one time column in the source file in the format of "2018-06-01 01:02:03".
@ -131,17 +131,17 @@ Syntax:
[PROPERTIES ("key"="value", ...)]
```
You can specify the following parameters
timeout specifies the timeout of the import operation. The default timeout is 4 hours. In seconds.
max_filter_ratiothe maximum allowable data proportion that can be filtered (for reasons such as non-standard data). The default is zero tolerance.
strict mode whether to strictly restrict the data. The default is false.
You can specify the following parameters:
timeout: specifies the timeout of the import operation. The default timeout is 4 hours. Unit: seconds.
max_filter_ratio: the maximum ratio of data that can be filtered out (for reasons such as non-standard data). The default is zero tolerance.
strict_mode: whether to strictly restrict the data. The default is false.
timezone: specifies the time zone of some functions affected by the time zone, such as strftime/alignment_timestamp/from_unixtime, etc. Please refer to the [time zone] document for details. If not specified, the "Asia/Shanghai" time zone is used.
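Putting the clauses above together, here is a hedged end-to-end sketch of a Spark load job; the HDFS path, resource name, and table and column names are hypothetical:

```sql
LOAD LABEL example_db.label_spark_demo
(
    DATA INFILE("hdfs://hdfs_host:8020/user/starrocks/input/file1.csv")
    INTO TABLE tbl1
    COLUMNS TERMINATED BY ","
    (c1, c2, tmp_c3, tmp_c4)
    SET (c3 = tmp_c3 + tmp_c4)
)
WITH RESOURCE 'my_spark_resource'
PROPERTIES (
    "timeout" = "3600",
    "max_filter_ratio" = "0.1",
    "strict_mode" = "false",
    "timezone" = "Asia/Shanghai"
);
```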
6. Import data format example
intTINYINT/SMALLINT/INT/BIGINT/LARGEINT1, 1000, 1234
floatFLOAT/DOUBLE/DECIMAL1.1, 0.23, .356
dateDATE/DATETIME2017-10-03, 2017-06-13 12:34:03。
int (TINYINT/SMALLINT/INT/BIGINT/LARGEINT): 1, 1000, 1234
float (FLOAT/DOUBLE/DECIMAL): 1.1, 0.23, .356
date (DATE/DATETIME): 2017-10-03, 2017-06-13 12:34:03.
(Note: for other date formats, you can use the strftime or time_format function to convert in the import command.)
string (CHAR/VARCHAR): "I am a student", "a"
NULL value: \N
@ -195,13 +195,13 @@ NULL value: \ N
Adele,1,1
Each column in the data file corresponds to each column specified in the import statement
Each column in the data file corresponds to each column specified in the import statement:
k1,tmp_k2,tmp_k3
The conversion is as follows:
1. k1: no conversion
2. k2is the sum of tmp_ k2 and tmp_k3
2. k2: the sum of tmp_k2 and tmp_k3
LOAD LABEL example_db.label6
(

View File

@ -64,7 +64,7 @@ Describes the data file that you want to load. The `data_desc` descriptor can in
-H "format: CSV | JSON"
-H "column_separator: <column_separator>"
-H "row_delimiter: <row_delimiter>"
-H "columns: <column1_name>[, <column2_name>... ]"
-H "columns: <column1_name>[, <column2_name>, ... ]"
-H "partitions: <partition1_name>[, <partition2_name>, ...]"
-H "jsonpaths: [ \"<json_path1>\"[, \"<json_path2>\", ...] ]"
-H "strip_outer_array: true | false"
@ -302,7 +302,7 @@ If you want to load only the data records whose values in the first column of `e
```Bash
curl --location-trusted -u root: -H "label:label4" \
-H "columns: col1, col2col3]"\
-H "columns: col1, col2, col3]"\
-H "where: col1 = 20180601" \
-T example4.csv -XPUT \
http://<fe_host>:<fe_http_port>/api/test_db/table4/_stream_load

View File

@ -13,7 +13,7 @@ The storage space used by HLL is determined by the distinct values in the hash v
In actual business scenarios, data volume and data distribution affect the memory usage of queries and the accuracy of the approximate result. You need to consider these two factors:
- Data volume: HLL returns an approximate value. A larger data volume results in a more accurate result. A smaller data volume results in larger deviation.
- Data distributionIn the case of large data volume and high-cardinality dimension column for GROUP BYdata computation will use more memory. HLL is not recommended in this situation. It is recommended when you perform no-group-by count distinct or GROUP BY on low-cardinality dimension columns.
- Data distribution: In the case of large data volume and high-cardinality dimension column for GROUP BY, data computation will use more memory. HLL is not recommended in this situation. It is recommended when you perform no-group-by count distinct or GROUP BY on low-cardinality dimension columns.
- Query granularity: If you query data at a large query granularity, we recommend you use the Aggregate Key table or materialized view to pre-aggregate data to reduce data volume.
For details about using HLL, see [Use HLL for approximate count distinct](../../../using_starrocks/Using_HLL.md) and [HLL](../data-definition/HLL.md).
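As a hedged end-to-end sketch (the table and column names are hypothetical), an HLL column is typically filled with `hll_hash` and queried with `hll_union_agg`:

```sql
CREATE TABLE page_uv_hll (
    page_id INT,
    visit_date DATE,
    visit_users HLL HLL_UNION            -- HLL value column
) ENGINE = OLAP
AGGREGATE KEY(page_id, visit_date)
DISTRIBUTED BY HASH(page_id) BUCKETS 8;

-- Load raw user IDs as HLL sketches (source_tbl is assumed to exist)
INSERT INTO page_uv_hll
SELECT page_id, visit_date, hll_hash(user_id) FROM source_tbl;

-- Approximate count distinct per page
SELECT page_id, hll_union_agg(visit_users) AS uv
FROM page_uv_hll
GROUP BY page_id;
```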

View File

@ -117,7 +117,7 @@ The following parameters apply only to the Flink DataStream reading method.
| scan.columns | No | STRING | The column that you want to read. You can specify multiple columns, which must be separated by a comma (,). |
| scan.filter | No | STRING | The filter condition based on which you want to filter data. |
Assume that in Flink you create a table that consists of three columns, which are `c1`、`c2`、`c3`. To read the rows whose values in the `c1` column of this Flink table are equal to `100`, you can specify two filter conditions `"scan.columns, "c1"` and `"scan.filter, "c1 = 100"`.
Assume that in Flink you create a table that consists of three columns, which are `c1`, `c2`, `c3`. To read the rows whose values in the `c1` column of this Flink table are equal to `100`, you can specify the two options `"scan.columns" = "c1"` and `"scan.filter" = "c1 = 100"`.
## Data type mapping between StarRocks and Flink

View File

@ -172,7 +172,7 @@ PROPERTIES (
);
~~~
Table 2
Table 2:
~~~SQL
CREATE TABLE `tbl2` (
@ -383,13 +383,13 @@ This API is implemented on the FE and can be accessed using `fe_host:fe_http_por
* Mark as Stable
`POST /api/colocate/group_stable?db_id=10005&group_id=10008`
`Return200`
`Return: 200`
* Mark as Unstable
`DELETE /api/colocate/group_stable?db_id=10005&group_id=10008`
`Return200`
`Return: 200`
3. Set the data distribution of a Group

View File

@ -15,16 +15,16 @@ Since the HLL algorithm involves a lot of mathematical knowledge, we will use a
* ...
* X=n, P(X=n)=(1/2)<sup>n</sup>
We use test A to construct randomized test B which is to do N independent repetitions of test A, generating N independent identically distributed random variables X<sub>1</sub>, X<sub>2</sub>, X<sub>3</sub>, ..., X<sub>N</sub>.Take the maximum value of the random variables as X<sub>max</sub>. Leveraging the great likelihood estimation, the estimated value of N is 2<sup>X<sub>max</sub></sup>
We use test A to construct randomized test B, which is to do N independent repetitions of test A, generating N independent identically distributed random variables X<sub>1</sub>, X<sub>2</sub>, X<sub>3</sub>, ..., X<sub>N</sub>. Take the maximum value of the random variables as X<sub>max</sub>. Leveraging maximum likelihood estimation, the estimated value of N is 2<sup>X<sub>max</sub></sup>.
<br/>
Now, we simulate the above experiment using the hash function on the given dataset:
* Test A: Calculate the hash value of dataset elements and convert the hash value to binary representation. Record the occurrence of bit=1, starting from the lowest bit of the binary.
* Test B: Repeat the Test A process for dataset elements of Test B. Update the maximum position “m” of the first bit 1 occurrence for each test;
* Test B: Repeat the Test A process for dataset elements of Test B. Update the maximum position "m" of the first bit 1 occurrence for each test;
* Estimate the number of non-repeating elements in the dataset as 2<sup>m</sup>.
In fact, the HLL algorithm divides the elements into K=2<sup>k</sup> buckets based on the lower k bits of the element hash. Count the maximum value of the first bit 1 occurrence from the k+1st bit as m<sub>1</sub>, m<sub>2</sub>,..., m<sub>k</sub>, and estimate the number of non-repeating elements in the bucket as 2<sup>m<sub>1</sub></sup>, 2<sup>m<sub>2</sub></sup>,..., 2<sup>m<sub>k</sub></sup>. The number of non-repeating elements in the data set is the summed average of the number of buckets multiplied by the number of non-repeating elements in the buckets: N = K(K/(2<sup>\-m<sub>1</sub></sup>+2<sup>\-m<sub>2</sub></sup>,..., 2<sup>\-m<sub>K</sub></sup>))
In fact, the HLL algorithm divides the elements into K=2<sup>k</sup> buckets based on the lower k bits of the element hash. Count the maximum position of the first bit 1 occurrence from the (k+1)-th bit as m<sub>1</sub>, m<sub>2</sub>,..., m<sub>K</sub>, and estimate the number of non-repeating elements in each bucket as 2<sup>m<sub>1</sub></sup>, 2<sup>m<sub>2</sub></sup>,..., 2<sup>m<sub>K</sub></sup>. The number of non-repeating elements in the data set is the number of buckets multiplied by the harmonic mean of the per-bucket estimates: N = K(K/(2<sup>\-m<sub>1</sub></sup>+2<sup>\-m<sub>2</sub></sup>+...+2<sup>\-m<sub>K</sub></sup>)).
<br/>
HLL multiplies the correction factor with the estimation result to make the result more accurate.
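For reference, combining the harmonic-mean estimate above with the correction factor (commonly written α<sub>K</sub>) gives the standard HyperLogLog estimator, restated here in LaTeX with the same notation (K buckets, m<sub>i</sub> the recorded maximum position in bucket i):

```latex
\hat{N} = \alpha_K \cdot K^2 \cdot \left( \sum_{i=1}^{K} 2^{-m_i} \right)^{-1}
```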

View File

@ -2,32 +2,32 @@
## Background
There are usually two ways to conduct accurate de-duplication analysis in StarRocks.
There are usually two ways to conduct accurate de-duplication analysis in StarRocks.
* Detail-based de-duplication: This is a traditional count distinct approach that is able to retain detailed data for flexible analysis. However, it consumes huge computational and storage resources and is not friendly enough to support scenarios involving large-scale datasets and query latency-sensitive de-duplication.
* Precomputation-based de-duplication: This approach is also recommended by StarRocks. In some scenarios, users only want to get the results after de-duplication and care less about detailed data. Such a scenario can be analyzed by precomputation, which is essentially using space for time and resonates with the core idea of the MOLAP aggregation model. It is to calculate in the process of data import, reducing the storage cost and the cost of on-site calculation during query. You can further reduce the size of datasets for on-site computation by shrinking RollUp dimension.
* Detail-based de-duplication: This is a traditional count distinct approach that is able to retain detailed data for flexible analysis. However, it consumes huge computational and storage resources and is not friendly enough to support scenarios involving large-scale datasets and query latency-sensitive de-duplication.
* Precomputation-based de-duplication: This approach is also recommended by StarRocks. In some scenarios, users only want to get the results after de-duplication and care less about detailed data. Such a scenario can be analyzed by precomputation, which is essentially using space for time and resonates with the core idea of the multidimensional OLAP (MOLAP) aggregation model. It is to calculate data in the process of data loading, reducing the storage cost and the cost of on-site calculation during query. You can further reduce the size of datasets for on-site computation by shrinking RollUp dimension.
## Traditional Count Distinct Calculation
StarRocks is implemented based on the MPP architecture that supports retaining detailed data when using count distinct calculation for accurate de-duplication t. However, because of the need for multiple data shuffles (transferring data between different nodes and calculating de-weighting) during query, it leads to a linear decrease in performance as the data volume increases.
StarRocks is implemented based on the MPP architecture that supports retaining detailed data when using count distinct calculation for accurate de-duplication. However, because of the need for multiple data shuffles (transferring data across nodes and calculating de-weighting) during query, it leads to a linear decrease in performance as the data volume increases.
In the following scenario, there is a table with columns (dt, page, user_id), and UV needs to be calculated from the detailed data.
| dt | page | user_id |
| :---: | :---: | :---:|
| 20191206 | xiaoxiang | 101 |
| 20191206 | waimai | 101 |
| 20191206 | xiaoxiang | 101 |
| 20191206 | waimai | 101 |
| 20191206 | xiaoxiang | 101 |
| 20191206 | waimai | 101 |
| 20191206 | game | 101 |
| 20191206 | shopping | 101 |
| 20191206 | game | 101 |
| 20191206 | shopping | 101 |
| 20191206 | game | 101 |
| 20191206 | shopping | 101 |
Count `uv` grouping by `page`
| page | uv |
| :---: | :---: |
| xiaoxiang | 1 |
| waimai | 2 |
| game | 1 |
| shopping | 2 |
```sql
select page, count(distinct user_id) as uv from table group by page;
@ -48,7 +48,7 @@ Given an array A with values in the range [0, n) (note: not including n), a bitm
## Advantages of bitmap de-duplication
1. Space advantage: Using one bit of a bitmap to indicate the existence of the corresponding subscript has a great space advantage. For example, for int32 de-duplication, the storage space required by a normal bitmap is only 1/32 of the traditional de-duplication. The implementation of Roaring Bitmap in StarRocks further significantly reduces storage usage through optimizing sparse bitmaps.
2. Time advantage: The bitmap de-duplication involves computation such as bit placement for a given subscript and counting the number of placed bitmaps, which are O(1) and O(n) operations respectively. The latter can be computed efficiently using clz, ctz and other instructions. In addition, bitmap de-duplication can be accelerated in parallel in the MPP execution engine, where each computing node computes a local sub-bitmap and uses the bitor operation to merge allsub-bitmaps into a final bitmap. Bitor operation is more efficient than sort-based or hash-based de-duplication in that it has no condition or data dependencies and supports vectorized execution.
2. Time advantage: The bitmap de-duplication involves computation such as setting the bit for a given subscript and counting the number of set bits, which are O(1) and O(n) operations respectively. The latter can be computed efficiently using clz, ctz and other instructions. In addition, bitmap de-duplication can be accelerated in parallel in the MPP execution engine, where each computing node computes a local sub-bitmap and uses the bit_or function to merge all sub-bitmaps into a final bitmap. bit_or is more efficient than sort-based or hash-based de-duplication in that it has no condition or data dependencies and supports vectorized execution.
Roaring Bitmap implementation, details can be found at: [specific paper and implementation](https://github.com/RoaringBitmap/RoaringBitmap)
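For the query side, here is a hedged query sketch against the `page_uv` table defined in the CREATE TABLE example that follows; it merges the per-row bitmaps and counts the distinct users for each page:

```sql
-- Exact distinct visitors per page, merging the per-row bitmaps
SELECT page_id, BITMAP_UNION_COUNT(visit_users) AS uv
FROM page_uv
GROUP BY page_id;
```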
@ -67,9 +67,9 @@ First, create a table with a BITMAP column, where `visit_users` is an aggregated
```sql
CREATE TABLE `page_uv` (
`page_id` INT NOT NULL COMMENT '页面id',
`visit_date` datetime NOT NULL COMMENT '访问时间',
`visit_users` BITMAP BITMAP_UNION NOT NULL COMMENT '访问用户id'
`page_id` INT NOT NULL COMMENT 'page ID',
`visit_date` datetime NOT NULL COMMENT 'access time',
`visit_users` BITMAP BITMAP_UNION NOT NULL COMMENT 'user ID'
) ENGINE=OLAP
AGGREGATE KEY(`page_id`, `visit_date`)
DISTRIBUTED BY HASH(`page_id`) BUCKETS 1

View File

@ -164,7 +164,7 @@ public class HiveMetaClient {
LOG.error("Failed to get hive client. {}", connectionException.getMessage());
} else if (connectionException != null) {
LOG.error("An exception occurred when using the current long link " +
"to access metastore. msg {}", messageIfError);
"to access metastore. msg: {}", messageIfError);
client.close();
} else if (client != null) {
client.finish();
@ -246,7 +246,7 @@ public class HiveMetaClient {
LOG.error("Failed to get hive client. {}", connectionException.getMessage());
} else if (connectionException != null) {
LOG.error("An exception occurred when using the current long link " +
"to access metastore. msg {}", connectionException.getMessage());
"to access metastore. msg: {}", connectionException.getMessage());
client.close();
} else if (client != null) {
client.finish();

View File

@ -64,7 +64,7 @@ public class Trino2SRFunctionCallTransformer {
private static void registerAllFunctionTransformer() {
registerAggregateFunctionTransformer();
registerArrayFunctionTransformer();
// todo support more function transform
// todo: support more function transform
}
private static void registerAggregateFunctionTransformer() {

View File

@ -103,7 +103,7 @@ The execution timeout(s) for a single case, default is 10 min. Once the case's r
With this parameter, the framework will just list the names of the cases that need to be executed, but will not actually execute them.
**`-a|--attr` [Optional]**
tag filters, formattag1,tag2...
tag filters, format: tag1,tag2...
**`--file_filter=` [Optional]**
The format of the values is a regular expression, and only test cases in files with filenames that match it will be executed.