[Doc] Update Colocate Join Principles (backport #63153) (#63165)

Signed-off-by: 絵空事スピリット <wanglichen@starrocks.com>
Co-authored-by: 絵空事スピリット <wanglichen@starrocks.com>
2025-09-16 02:13:50 +00:00 · 2025-09-16 02:13:50 +00:00 · c0e4a1337b
parent 53754c69a8
commit c0e4a1337b
3 changed files with 2 additions and 5 deletions
--- a/docs/en/using_starrocks/Colocate_join.md
+++ b/docs/en/using_starrocks/Colocate_join.md
@@ -25,8 +25,7 @@ Bucket Seq is obtained by `hash(key) mod buckets`. Suppose a Table has 8 buckets
 In order to have the same data distribution, tables within the same CG must comply with the following.

 1. Tables within the same CG must have the identical  bucketing key (type, number, order) and the same number of buckets so that the data slices of multiple tables can be distributed and controlled one by one. The bucketing key is the columns specified in the table creation statement `DISTRIBUTED BY HASH(col1, col2, ...)`. The bucketing key determines which columns of data are Hashed into different Bucket Seqs. The name of the bucketing key can vary for tables within the same CG.The bucketing columns can be different in the creation statement, but the order of the corresponding data types in `DISTRIBUTED BY HASH(col1, col2, ...)` should be exactly the same .
-2. Tables within the same CG must have the same number of partition copies. If not, it may happen that a tablet copy has no corresponding copy in the partition of  the same BE.
-3. Tables within the same CG may have different numbers of partitions and different partition keys.
+2. Tables within the same CG may have different numbers of partitions and different partition keys.

 When creating a table, the CG is specified by the attribute `"colocate_with" = "group_name"` in the table PROPERTIES. If the CG does not exist, it means the table is the first table of the CG and called Parent Table. The data distribution of the Parent Table (type, number and order of split bucket keys, number of copies and number of split buckets) determines the CGS. If the CG exists, check whether the data distribution of the table is consistent with the CGS.

--- a/docs/ja/using_starrocks/Colocate_join.md
+++ b/docs/ja/using_starrocks/Colocate_join.md
@@ -25,8 +25,7 @@ Colocate Join は、同じ CGS を持つ一連のテーブルで CG を形成し
 同じデータ分布を持つために、同じ CG 内のテーブルは次のことを遵守する必要があります。

 1. 同じ CG 内のテーブルは、同一のバケッティングキー（タイプ、数、順序）と同じ数のバケットを持たなければなりません。これにより、複数のテーブルのデータスライスを一対一で分配および制御できます。バケッティングキーは、テーブル作成ステートメント `DISTRIBUTED BY HASH(col1, col2, ...)` で指定された列です。バケッティングキーは、データのどの列が異なるバケットシーケンスにハッシュされるかを決定します。同じ CG 内のテーブルでバケッティングキーの名前は異なる場合があります。作成ステートメントでバケッティング列が異なる場合がありますが、`DISTRIBUTED BY HASH(col1, col2, ...)` の対応するデータ型の順序は完全に同じである必要があります。
-2. 同じ CG 内のテーブルは、同じ数のパーティションコピーを持たなければなりません。そうでない場合、タブレットコピーが同じ BE のパーティションに対応するコピーを持たないことがあるかもしれません。
-3. 同じ CG 内のテーブルは、異なる数のパーティションと異なるパーティションキーを持つことができます。
+2. 同じ CG 内のテーブルは、異なる数のパーティションと異なるパーティションキーを持つことができます。

 テーブルを作成するとき、CG はテーブルプロパティ内の属性 `"colocate_with" = "group_name"` によって指定されます。CG が存在しない場合、それはテーブルが CG の最初のテーブルであり、親テーブルと呼ばれます。親テーブルのデータ分布（スプリットバケットキーのタイプ、数、順序、コピーの数、スプリットバケットの数）が CGS を決定します。CG が存在する場合、テーブルのデータ分布が CGS と一致しているかどうかを確認します。

--- a/docs/zh/using_starrocks/Colocate_join.md
+++ b/docs/zh/using_starrocks/Colocate_join.md
@@ -45,7 +45,6 @@ PROPERTIES(
 为了使得表能够有相同的数据分布，同一 CG 内的表必须满足下列约束：

 * 同一 CG 内的表的分桶键的类型、数量和顺序完全一致，并且桶数一致，从而保证多张表的数据分片能够一一对应地进行分布控制。分桶键，即在建表语句中 `DISTRIBUTED BY HASH(col1, col2, ...)` 中指定一组列。分桶键决定了一张表的数据通过哪些列的值进行 Hash 划分到不同的 Bucket Seq 下。同 CG 的表的分桶键的名字可以不相同，分桶列的定义在建表语句中的出现次序可以不一致，但是在 `DISTRIBUTED BY HASH(col1, col2, ...)` 的对应数据类型的顺序要完全一致。
-* 同一个 CG 内所有表的所有分区的副本数必须一致。如果不一致，可能出现某一个子表的某一个副本，在同一个 BE 上没有其他的表分片的副本对应。
 * 同一个 CG 内所有表的分区键，分区数量可以不同。

 同一个 CG 中的所有表的副本放置必须满足下列约束：