commit e08e8d6b30 (parent 3638b102f4)
@@ -106,7 +106,7 @@ When you create your branch to work on the PR base it off of the `docusaurus` br
 #### Edit the nav for the version that you are working on
 
-The nav files are in [`versioned_sidebars/`](https://github.com/StarRocks/docs-site/tree/docusaurus/versioned_sidebars) (nav in Docusaurus is called **Sidebar**). If you are working on 3.1 then `versioned_sidebars/version-3.1`. This file contains both English and Chinese sidebars.
+The nav files are in [`sidebars/`](./sidebars.json) (nav in Docusaurus is called **Sidebar**) in each branch. This file contains both English and Chinese sidebars.
 
 > Note on file structure:
 >
@@ -37,7 +37,11 @@ Make sure that you have finished the following preparations:
 ## Integration
 
-Visit [https:///admin/query_engine/](https://localhost:10001/admin/query_engine/) and add a new query engine:
+Visit the following URL and add a new query engine:
+
+```Plain
+https://localhost:10001/admin/query_engine/
+```
 
 
@@ -6,7 +6,7 @@ displayed_sidebar: "English"
 [Apache Superset](https://superset.apache.org) is a modern data exploration and visualization platform. It uses [SQLAlchemy](https://github.com/StarRocks/starrocks/tree/main/contrib/starrocks-python-client/starrocks) to query data.
 
-Although [Mysql Dialect](https://superset.apache.org/docs/databases/mysql) can be used, it does not support `largeint`. So we developed [StarRocks Dialect](https://github.com/StarRocks/starrocks/tree/main/contrib/starrocks-python-client/starrocks/sqlalchemy).
+Although [Mysql Dialect](https://superset.apache.org/docs/databases/mysql) can be used, it does not support `largeint`. So we developed [StarRocks Dialect](https://github.com/StarRocks/starrocks/tree/main/contrib/starrocks-python-client/starrocks/).
 
 ## Environment
@@ -576,7 +576,7 @@ There are several ways to implement a Flink DataStream job according to the type
 [Flink CDC 3.0](https://nightlies.apache.org/flink/flink-cdc-docs-stable) framework can be used
 to easily build a streaming ELT pipeline from CDC sources (such as MySQL and Kafka) to StarRocks. The pipeline can synchronize whole databases, merged sharding tables, and schema changes from sources to StarRocks.
 
-Since v1.2.9, the Flink connector for StarRocks is integrated into this framework as [StarRocks Pipeline Connector](https://nightlies.apache.org/flink/flink-cdc-docs-stable/docs/connectors/starrocks). The StarRocks Pipeline Connector supports:
+Since v1.2.9, the Flink connector for StarRocks is integrated into this framework as [StarRocks Pipeline Connector](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.1/docs/connectors/pipeline-connectors/starrocks/). The StarRocks Pipeline Connector supports:
 
 - Automatic creation of databases and tables
 - Schema change synchronization
@@ -42,7 +42,7 @@ The synchronization process guarantees exactly-once semantics.
 a. The Flink SQL client executes the data loading statement `INSERT INTO SELECT` to submit one or more Flink jobs to the Flink cluster.
 
-b. The Flink cluster runs the Flink jobs to obtain data. The [Flink CDC connector](https://github.com/ververica/flink-cdc-connectors/blob/master/docs/content/quickstart/build-real-time-data-lake-tutorial.md) first reads full historical data from the source database, then seamlessly switches to incremental reading, and sends the data to flink-connector-starrocks.
+b. The Flink cluster runs the Flink jobs to obtain data. The Flink CDC connector first reads full historical data from the source database, then seamlessly switches to incremental reading, and sends the data to flink-connector-starrocks.
 
 c. flink-connector-starrocks accumulates data in mini-batches, and synchronizes each batch of data to StarRocks.
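
To make steps a through c concrete, here is a minimal sketch of what such a Flink SQL job could look like. The hosts, credentials, and table names are placeholders invented for this sketch, not values from the documentation being diffed.

```SQL
-- A minimal sketch, assuming placeholder hosts, credentials, and table names.
-- Source table: the Flink CDC connector reads full historical data first,
-- then seamlessly switches to incremental reading (step b).
CREATE TABLE mysql_orders (
    id BIGINT,
    amount DECIMAL(10, 2),
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector' = 'mysql-cdc',
    'hostname' = 'mysql-host',
    'port' = '3306',
    'username' = 'user',
    'password' = 'password',
    'database-name' = 'demo',
    'table-name' = 'orders'
);

-- Sink table: flink-connector-starrocks accumulates mini-batches and
-- synchronizes each batch to StarRocks (step c).
CREATE TABLE starrocks_orders (
    id BIGINT,
    amount DECIMAL(10, 2),
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector' = 'starrocks',
    'jdbc-url' = 'jdbc:mysql://fe-host:9030',
    'load-url' = 'fe-host:8030',
    'database-name' = 'demo',
    'table-name' = 'orders',
    'username' = 'user',
    'password' = 'password'
);

-- Step a: the data loading statement submitted from the Flink SQL client.
INSERT INTO starrocks_orders SELECT * FROM mysql_orders;
```
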
@@ -101,7 +101,7 @@ To synchronize data from MySQL, you need to install the following tools: SMT, Fl
 Starting taskexecutor daemon on host.
 ```
 
-2. Download [Flink CDC connector](https://github.com/ververica/flink-cdc-connectors/releases). This topic uses MySQL as the data source and therefore, `flink-sql-connector-mysql-cdc-x.x.x.jar` is downloaded. The connector version must match the [Flink](https://github.com/ververica/flink-cdc-connectors/releases) version. For detailed version mapping, see [Supported Flink Versions](https://ververica.github.io/flink-cdc-connectors/release-2.2/content/about.html#supported-flink-versions). This topic uses Flink 1.14.5 and you can download `flink-sql-connector-mysql-cdc-2.2.0.jar`.
+2. Download [Flink CDC connector](https://github.com/ververica/flink-cdc-connectors/releases). This topic uses MySQL as the data source and therefore, `flink-sql-connector-mysql-cdc-x.x.x.jar` is downloaded. The connector version must match the [Flink](https://github.com/ververica/flink-cdc-connectors/releases) version. This topic uses Flink 1.14.5 and you can download `flink-sql-connector-mysql-cdc-2.2.0.jar`.
 
 ```Bash
 wget https://repo1.maven.org/maven2/com/ververica/flink-sql-connector-mysql-cdc/2.2.0/flink-sql-connector-mysql-cdc-2.2.0.jar
@@ -302,7 +302,7 @@ The data is successfully loaded when the above result is returned.
 **Required**:<br/>
 **Default value**:<br/>
-**Description**: Stream Load parameters o control load behavior. For example, the parameter `sink.properties.format` specifies the format used for Stream Load, such as CSV or JSON. For a list of supported parameters and their descriptions, see [STREAM LOAD](../sql-reference/sql-statements/data-manipulation/STREAM LOAD.md).
+**Description**: Stream Load parameters to control load behavior. For example, the parameter `sink.properties.format` specifies the format used for Stream Load, such as CSV or JSON. For a list of supported parameters and their descriptions, see [STREAM LOAD](../sql-reference/sql-statements/data-manipulation/STREAM_LOAD.md).
 
 ### sink.properties.format
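
For illustration only, the sketch below shows how a `sink.properties.*` entry might appear in a Flink SQL sink definition; the connection values and table names are placeholders, and only the `sink.properties.*` keys are the point of the example.

```SQL
-- A minimal sketch with placeholder connection values.
CREATE TABLE starrocks_sink (
    id BIGINT,
    name STRING
) WITH (
    'connector' = 'starrocks',
    'jdbc-url' = 'jdbc:mysql://fe-host:9030',
    'load-url' = 'fe-host:8030',
    'database-name' = 'demo',
    'table-name' = 'example_table',
    'username' = 'user',
    'password' = 'password',
    -- Tell Stream Load to expect JSON-formatted rows.
    'sink.properties.format' = 'json',
    -- Another Stream Load parameter, forwarded to Stream Load unchanged.
    'sink.properties.strip_outer_array' = 'true'
);
```
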
@@ -288,7 +288,7 @@ Release date: November 10, 2022
 ### Improvements
 
-- The error message provides a solution when StarRocks fails to create a Routine Load job because the number of running Routine Load job exceeds the limit. [#12204]( https://github.com/StarRocks/starrocks/pull/12204)
+- The error message provides a solution when StarRocks fails to create a Routine Load job because the number of running Routine Load jobs exceeds the limit. [#12204](https://github.com/StarRocks/starrocks/pull/12204)
 - The query fails when StarRocks queries data from Hive and fails to parse CSV files. [#13013](https://github.com/StarRocks/starrocks/pull/13013)
 
 ### Bug Fixes
@@ -10,7 +10,7 @@ StarRocks starts to support the JSON data type since v2.2.0. This topic describe
 JSON is a lightweight data-interchange format that is designed for semi-structured data. JSON presents data in a hierarchical tree structure, which is flexible and easy to read and write in a wide range of data storage and analytics scenarios. JSON supports `NULL` values and the following data types: NUMBER, STRING, BOOLEAN, ARRAY, and OBJECT.
 
-For more information about JSON, visit the [JSON website](http://www.json.org/?spm=a2c63.p38356.0.0.50756b9fVEfwCd). For information about the input and output syntax of JSON, see JSON specifications at [RFC 7159](https://tools.ietf.org/html/rfc7159?spm=a2c63.p38356.0.0.14d26b9fcp7fcf#page-4).
+For more information about JSON, visit the [JSON website](https://www.json.org/json-en.html). For information about the input and output syntax of JSON, see JSON specifications at [RFC 7159](https://tools.ietf.org/html/rfc7159?spm=a2c63.p38356.0.0.14d26b9fcp7fcf#page-4).
 
 StarRocks supports both storage and efficient querying and analytics of JSON data. StarRocks does not directly store the input text. Instead, it stores JSON data in a binary format to reduce the cost of parsing and increase query efficiency.
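
As a brief, hedged illustration of the JSON data type described above — the table, column, and values here are invented for this sketch:

```SQL
-- Hypothetical table with a JSON column; all names and values are placeholders.
CREATE TABLE events (
    id BIGINT,
    payload JSON
) DUPLICATE KEY (id)
DISTRIBUTED BY HASH(id);

-- PARSE_JSON converts the input text into the binary JSON representation.
INSERT INTO events VALUES
    (1, PARSE_JSON('{"user": "alice", "tags": ["a", "b"], "active": true}'));

-- json_query extracts a value from the stored JSON by path.
SELECT json_query(payload, '$.user') FROM events;
```
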
@@ -126,7 +126,7 @@ PROPERTIES (
 * `password`: The password used to log in to the target database.
 
-* `jdbc_uri`: The URI that the JDBC driver uses to connect to the target database. The URI must conform to the target database's URI syntax. For the URI formats of common databases, see the official documentation of [MySQL](https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-jdbc-url-format.html), [Oracle](https://docs.oracle.com/en/database/oracle/oracle-database/21/jjdbc/data-sources-and-URLs.html#GUID-6D8EFA50-AB0F-4A2B-88A0-45B4A67C361E), [PostgreSQL](https://jdbc.postgresql.org/documentation/use/#connecting-to-the-database), and [SQL Server](https://docs.microsoft.com/en-us/sql/connect/jdbc/building-the-connection-url?view=sql-server-ver16).
+* `jdbc_uri`: The URI that the JDBC driver uses to connect to the target database. The URI must conform to the target database's URI syntax. For the URI formats of common databases, see the official documentation of [MySQL](https://dev.mysql.com/doc/connector-j/en/connector-j-reference-jdbc-url-format.html), [Oracle](https://docs.oracle.com/en/database/oracle/oracle-database/21/jjdbc/data-sources-and-URLs.html#GUID-6D8EFA50-AB0F-4A2B-88A0-45B4A67C361E), [PostgreSQL](https://jdbc.postgresql.org/documentation/use/#connecting-to-the-database), and [SQL Server](https://docs.microsoft.com/en-us/sql/connect/jdbc/building-the-connection-url?view=sql-server-ver16).
 
 > **NOTE**
 >
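
A minimal sketch of a JDBC resource that uses the properties discussed above; every value here is a placeholder rather than a recommendation:

```SQL
-- All values are placeholders; shown only to illustrate the properties above.
CREATE EXTERNAL RESOURCE jdbc_resource
PROPERTIES (
    "type" = "jdbc",
    "user" = "pg_user",
    "password" = "changeme",
    "jdbc_uri" = "jdbc:postgresql://db-host:5432/demo",
    "driver_url" = "https://repo1.maven.org/maven2/org/postgresql/postgresql/42.3.3/postgresql-42.3.3.jar",
    "driver_class" = "org.postgresql.Driver"
);
```
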
@@ -37,7 +37,11 @@ Querybook supports querying and visualizing both internal and external data in StarRocks
 ## Integration
 
-Visit [https:///admin/query_engine/](https://localhost:10001/admin/query_engine/) and add a new query engine.
+Visit the following URL and add a new query engine.
+
+```Plain
+https://localhost:10001/admin/query_engine/
+```
 
 
@@ -5,7 +5,7 @@ displayed_sidebar: "Chinese"
 # Superset support
 
 [Apache Superset](https://superset.apache.org) is a modern data exploration and visualization platform. It uses [SQLAlchemy](https://github.com/StarRocks/starrocks/tree/main/contrib/starrocks-python-client/starrocks) to query data.
 
-Although [Mysql Dialect](https://superset.apache.org/docs/databases/mysql) can be used, it does not support LARGEINT. So we developed [StarRocks Dialect](https://github.com/StarRocks/starrocks/tree/main/contrib/starrocks-python-client/starrocks/sqlalchemy).
+Although [Mysql Dialect](https://superset.apache.org/docs/databases/mysql) can be used, it does not support LARGEINT. So we developed [StarRocks Dialect](https://github.com/StarRocks/starrocks/tree/main/contrib/starrocks-python-client/starrocks/).
 
 ## Environment
@@ -425,7 +425,7 @@ DISTRIBUTED BY HASH(id);
 The [Flink CDC 3.0](https://nightlies.apache.org/flink/flink-cdc-docs-stable) framework makes it easy to build a streaming ELT pipeline from CDC sources (such as MySQL and Kafka) to StarRocks. The pipeline can synchronize whole databases, merged sharding tables, and schema changes from the sources to StarRocks.
 
-Since v1.2.9, the Flink connector provided by StarRocks has been integrated into this framework as [StarRocks Pipeline Connector](https://nightlies.apache.org/flink/flink-cdc-docs-stable/docs/connectors/starrocks). The StarRocks Pipeline Connector supports:
+Since v1.2.9, the Flink connector provided by StarRocks has been integrated into this framework as [StarRocks Pipeline Connector](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.1/docs/connectors/pipeline-connectors/starrocks/). The StarRocks Pipeline Connector supports:
 
 - Automatic creation of databases and tables
 - Schema change synchronization
@@ -37,7 +37,7 @@ StarRocks supports multiple ways to synchronize data from MySQL to StarRocks in real time
 2. **Synchronize data**
 
-   The Flink SQL client executes the data loading SQL statement (an `INSERT INTO SELECT` statement) to submit one or more long-running Flink jobs to the Flink cluster. The Flink cluster runs the Flink jobs, and the [Flink CDC connector](https://ververica.github.io/flink-cdc-connectors/master/content/快速上手/build-real-time-data-lake-tutorial-zh.html) first reads full historical data from the source database, then seamlessly switches to incremental reading and sends the data to flink-connector-starrocks, which finally accumulates the data in mini-batches and synchronizes it to StarRocks.
+   The Flink SQL client executes the data loading SQL statement (an `INSERT INTO SELECT` statement) to submit one or more long-running Flink jobs to the Flink cluster. The Flink cluster runs the Flink jobs, and the Flink CDC connector first reads full historical data from the source database, then seamlessly switches to incremental reading and sends the data to flink-connector-starrocks, which finally accumulates the data in mini-batches and synchronizes it to StarRocks.
 
 :::info
@@ -94,7 +94,7 @@ StarRocks supports multiple ways to synchronize data from MySQL to StarRocks in real time
 Starting taskexecutor daemon on host.
 ```
 
-2. **Download the [Flink CDC connector](https://github.com/ververica/flink-cdc-connectors/releases)**. This example uses MySQL as the data source, so flink-sql-connector-**mysql**-cdc-x.x.x.jar is downloaded. The connector version must support the corresponding Flink version. For the version mapping between the two, see [Supported Flink Versions](https://ververica.github.io/flink-cdc-connectors/release-2.2/content/about.html#supported-flink-versions). Because this topic uses Flink 1.14.5, you can use flink-sql-connector-mysql-cdc-2.2.0.jar.
+2. **Download the [Flink CDC connector](https://github.com/ververica/flink-cdc-connectors/releases)**. This example uses MySQL as the data source, so flink-sql-connector-**mysql**-cdc-x.x.x.jar is downloaded. The connector version must support the corresponding Flink version. Because this topic uses Flink 1.14.5, you can use flink-sql-connector-mysql-cdc-2.2.0.jar.
 
 ```Bash
 wget https://repo1.maven.org/maven2/com/ververica/flink-sql-connector-mysql-cdc/2.2.0/flink-sql-connector-mysql-cdc-2.2.0.jar
@@ -31,7 +31,7 @@ PROPERTIES (
 ### Prepare the AutoMQ Kafka environment and test data
 
-Deploy an AutoMQ Kafka cluster by following the AutoMQ [Quick Start](https://docs.automq.com/zh/docs/automq-s3kafka/VKpxwOPvciZmjGkHk5hcTz43nde), and make sure that AutoMQ Kafka and StarRocks can reach each other over the network.
+Deploy an AutoMQ Kafka cluster by following the AutoMQ [Quick Start](https://docs.automq.com/docs/automq-opensource/EvqhwAkpriAomHklOUzcUtybn7g), and make sure that AutoMQ Kafka and StarRocks can reach each other over the network.
 To quickly create a topic named example_topic in AutoMQ Kafka and write a test JSON record into it, follow these steps:
 
 #### **Create a topic**
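
Once the topic exists and contains test data, a Routine Load job along the following lines could consume it. This is a sketch under stated assumptions: the database, table, columns, and broker address are placeholders, not values from this document.

```SQL
-- Placeholder database, table, columns, and broker address.
CREATE ROUTINE LOAD demo.example_topic_load ON example_table
COLUMNS (id, name)
PROPERTIES (
    "format" = "json",
    "jsonpaths" = "[\"$.id\", \"$.name\"]"
)
FROM KAFKA (
    "kafka_broker_list" = "automq-host:9092",
    "kafka_topic" = "example_topic",
    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
);
```
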
@@ -288,7 +288,7 @@ displayed_sidebar: "Chinese"
 ### Improvements
 
-- Improved the error message returned when a Routine Load job fails to be created. [#12204]( https://github.com/StarRocks/starrocks/pull/12204)
+- Improved the error message returned when a Routine Load job fails to be created. [#12204](https://github.com/StarRocks/starrocks/pull/12204)
 - When a query on Hive fails to parse CSV data, an error is now reported directly. [#13013](https://github.com/StarRocks/starrocks/pull/13013)
 
 ### Bug Fixes
@@ -30,7 +30,7 @@ displayed_sidebar: "Chinese"
 - Optimized the garbage collection mechanism of shared-data clusters. Tables and partitions can now be compacted manually, which reclaims data on object storage more efficiently. [#39532](https://github.com/StarRocks/starrocks/issues/39532)
 - Supports reading complex data types such as ARRAY, MAP, and STRUCT from StarRocks and providing the data in Arrow format for the Flink connector to consume. [#42932](https://github.com/StarRocks/starrocks/pull/42932) [#347](https://github.com/StarRocks/starrocks-connector-for-apache-flink/pull/347)
 - Supports asynchronously populating the Data Cache during queries to reduce the impact of cache population on the performance of first-time queries. [#40489](https://github.com/StarRocks/starrocks/pull/40489)
-- The ANALYZE TABLE command for external tables supports collecting histogram statistics, which helps handle data skew effectively. See [CBO statistics]( https://docs.starrocks.io/zh/docs/3.2/using_starrocks/Cost_based_optimizer/#%E9%87%87%E9%9B%86-hiveiceberghudi-%E8%A1%A8%E7%9A%84%E7%BB%9F%E8%AE%A1%E4%BF%A1%E6%81%AF). [#42693](https://github.com/StarRocks/starrocks/pull/42693)
+- The ANALYZE TABLE command for external tables supports collecting histogram statistics, which helps handle data skew effectively. See [CBO statistics](https://docs.starrocks.io/zh/docs/3.2/using_starrocks/Cost_based_optimizer/#%E9%87%87%E9%9B%86-hiveiceberghudi-%E8%A1%A8%E7%9A%84%E7%BB%9F%E8%AE%A1%E4%BF%A1%E6%81%AF). [#42693](https://github.com/StarRocks/starrocks/pull/42693)
 - Lateral Join combined with [UNNEST](https://docs.starrocks.io/zh/docs/3.2/sql-reference/sql-functions/array-functions/unnest/) now supports LEFT JOIN. [#43973](https://github.com/StarRocks/starrocks/pull/43973)
 - The Query Pool supports configuring a memory spill threshold via the BE static parameter `query_pool_spill_mem_limit_threshold`. When the threshold is exceeded, queries can spill intermediate results to disk to lower memory usage and reduce the risk of OOM. [#44063](https://github.com/StarRocks/starrocks/pull/44063)
 - Supports creating asynchronous materialized views based on Hive views. [#45085](https://github.com/StarRocks/starrocks/pull/45085)
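
As a hedged sketch of the Lateral Join plus UNNEST LEFT JOIN improvement above — the table and data are invented for this sketch — a plain lateral join drops rows whose array is NULL or empty, while the LEFT JOIN form keeps them:

```SQL
-- Invented table; with LEFT JOIN, rows whose array is NULL or empty are kept.
CREATE TABLE t (
    id INT,
    tags ARRAY<STRING>
) DUPLICATE KEY (id)
DISTRIBUTED BY HASH(id);

SELECT t.id, u.unnest AS tag
FROM t LEFT JOIN UNNEST(t.tags) AS u ON TRUE;
```
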
@@ -10,7 +10,7 @@ displayed_sidebar: "Chinese"
 JSON is a lightweight data-interchange format. JSON data is semi-structured and supports a tree structure. It is clearly layered, flexible, and easy to read and process, and is widely used in data storage and analytics scenarios. JSON supports the following data types: NUMBER, STRING, BOOLEAN, ARRAY, and OBJECT, as well as NULL values.
 
-For more information about JSON, visit the [JSON website](http://www.json.org/?spm=a2c63.p38356.0.0.50756b9fVEfwCd). For the input and output syntax of JSON, see the JSON specification at [RFC 7159](https://tools.ietf.org/html/rfc7159?spm=a2c63.p38356.0.0.14d26b9fcp7fcf#page-4).
+For more information about JSON, visit the [JSON website](https://www.json.org/json-en.html). For the input and output syntax of JSON, see the JSON specification at [RFC 7159](https://tools.ietf.org/html/rfc7159?spm=a2c63.p38356.0.0.14d26b9fcp7fcf#page-4).
 
 StarRocks supports storing and efficiently querying and analyzing JSON data. StarRocks uses binary encoding to store JSON data instead of storing the input text directly, which lowers parsing costs and improves query efficiency.
@@ -137,7 +137,7 @@ The configuration items to be added for a resource vary with the Spark cluster configuration. Currently, Spar
 | type | Yes | The resource type. Set the value to `jdbc`. |
 | user | Yes | The username used to log in to the supported JDBC database (hereinafter referred to as the "target database"). |
 | password | Yes | The password used to log in to the target database. |
-| jdbc_uri | Yes | The JDBC URI used to connect to the target database. The URI must conform to the target database's URI syntax. For the URI formats of common databases, see the official documentation of [MySQL](https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-jdbc-url-format.html), [Oracle](https://docs.oracle.com/en/database/oracle/oracle-database/21/jjdbc/data-sources-and-URLs.html#GUID-6D8EFA50-AB0F-4A2B-88A0-45B4A67C361E), [PostgreSQL](https://jdbc.postgresql.org/documentation/head/connect.html), and [SQL Server](https://docs.microsoft.com/en-us/sql/connect/jdbc/building-the-connection-url?view=sql-server-ver16). |
+| jdbc_uri | Yes | The JDBC URI used to connect to the target database. The URI must conform to the target database's URI syntax. For the URI formats of common databases, see the official documentation of [MySQL](https://dev.mysql.com/doc/connector-j/en/connector-j-reference-jdbc-url-format.html), [Oracle](https://docs.oracle.com/en/database/oracle/oracle-database/21/jjdbc/data-sources-and-URLs.html#GUID-6D8EFA50-AB0F-4A2B-88A0-45B4A67C361E), [PostgreSQL](https://jdbc.postgresql.org/documentation/head/connect.html), and [SQL Server](https://docs.microsoft.com/en-us/sql/connect/jdbc/building-the-connection-url?view=sql-server-ver16). |
 | driver_url | Yes | The URL used to download the JDBC driver JAR package. The HTTP and file protocols are supported. For example, `https://repo1.maven.org/maven2/org/postgresql/postgresql/42.3.3/postgresql-42.3.3.jar` and `file:///home/disk1/postgresql-42.3.3.jar`. |
 | driver_class | Yes | The class name of the JDBC driver used by the target database. Common class names are as follows: <ul><li>MySQL: com.mysql.jdbc.Driver (MySQL 5.x and earlier) and com.mysql.cj.jdbc.Driver (MySQL 6.x and later)</li><li>SQL Server: com.microsoft.sqlserver.jdbc.SQLServerDriver</li><li>Oracle: oracle.jdbc.driver.OracleDriver</li><li>PostgreSQL: org.postgresql.Driver</li></ul> |