yan zhang
7b39acd554
[Enhancement] upgrade hudi-common and fix CVEs ( #59501 )
...
Upgrade hudi-common package so we can keep fixing latest CVEs.
Signed-off-by: yan zhang <dirtysalt1987@gmail.com>
2025-05-30 09:34:33 -07:00
Yixin Luo
d9a3c105f9
[BugFix] fix skip write txnlog when compaction fail ( #59508 )
...
Signed-off-by: luohaha <18810541851@163.com>
2025-05-30 10:58:52 +00:00
shuming.li
427a73f1a3
[BugFix] Fix partition fetching bug with partition_retention_condition ( #59466 )
...
Signed-off-by: shuming.li <ming.moriarty@gmail.com>
2025-05-30 17:45:50 +08:00
yiming
ace091afea
[BugFix] Shouldn't show process list for root user when admin protection enabled ( #59435 )
...
Signed-off-by: Dejun Xia <xiadejun@starrocks.com>
2025-05-30 14:19:23 +08:00
zhangqiang
8d0f2561f4
[Feature] Support enabling partition aggregation for tables created in older versions ( #59102 )
...
Signed-off-by: sevev <qiangzh95@gmail.com>
2025-05-30 14:03:44 +08:00
wyb
3ec633501e
[UT] Fix lake rollup job UT ( #59494 )
...
Signed-off-by: wyb <wybb86@gmail.com>
2025-05-30 13:47:08 +08:00
Yixin Luo
1f057a298b
[BugFix] fix object storage dir delete after load spill ( #59480 )
...
Signed-off-by: luohaha <18810541851@163.com>
2025-05-30 02:23:30 +00:00
yan zhang
416cab867b
[BugFix] add `numFiles` to hive common stats to avoid hive insert overwrite failure ( #59469 )
...
Signed-off-by: yan zhang <dirtysalt1987@gmail.com>
2025-05-30 09:24:30 +08:00
trueeyu
a86ab93f33
[Refactor] HiveDataSource will use page_cache_available instead of relying solely on configuration to determine whether to use page cache ( #59465 )
...
Why I'm doing:
The original implementation of get_capacity() acquired locks for all shards one by one, and then summed their capacities, which was inefficient. To reduce lock contention, I modified it to retrieve the total capacity directly from the SharedLRUCache directory entry, which avoids locking multiple shards.
PageCache is used by both external tables and internal tables, so I moved the file page_cache.h from storage/rowset to cache/object_cache.
HiveDataSource now calls the function page_cache_available instead of relying solely on configuration to determine whether to use the page cache.
What I'm doing:
HiveDataSource will use page_cache_available instead of relying solely on configuration to determine whether to use page cache
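The get_capacity() change described above can be pictured with a small sketch. This is not the StarRocks C++ code: the `Shard` and `ShardedCache` classes and both method names are hypothetical, and the sketch assumes only that each shard has its own lock and that the total capacity is known when the cache is created.

```python
import threading


class Shard:
    """One cache shard with its own lock (hypothetical, for illustration)."""

    def __init__(self, capacity: int) -> None:
        self.lock = threading.Lock()
        self.capacity = capacity


class ShardedCache:
    """Toy sharded cache; only capacity bookkeeping is modelled."""

    def __init__(self, total_capacity: int, num_shards: int) -> None:
        # Remember the total once, at construction time.
        self._total_capacity = total_capacity
        per_shard = total_capacity // num_shards
        self.shards = [Shard(per_shard) for _ in range(num_shards)]

    def get_capacity_by_summing(self) -> int:
        # Old approach: take every shard lock in turn and add up the capacities.
        total = 0
        for shard in self.shards:
            with shard.lock:
                total += shard.capacity
        return total

    def get_capacity(self) -> int:
        # New approach: read the stored total without touching any shard lock.
        return self._total_capacity


cache = ShardedCache(total_capacity=1024, num_shards=8)
print(cache.get_capacity_by_summing(), cache.get_capacity())
```

Both methods return the same number here; the difference is that the second one never contends with readers and writers that hold shard locks.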
Signed-off-by: trueeyu <lxhhust350@qq.com>
2025-05-29 15:37:12 -07:00
Dan Jing
d8cbd13840
[Enhancement] Add log for tablet id of max_tablet_rowset_num ( #59467 )
...
Signed-off-by: Dan Jing <jingdan@starrocks.com>
2025-05-29 10:51:51 -07:00
shuming.li
bc688db036
[Enhancement] [BugFix] Optimize warehouse metrics queries ( #59379 )
...
Signed-off-by: shuming.li <ming.moriarty@gmail.com>
2025-05-29 18:56:04 +08:00
yan zhang
0beffd3c16
[Enhancement] set `enable_rewrite_simple_agg_to_hdfs_scan` true by default ( #59462 )
...
Signed-off-by: yan zhang <dirtysalt1987@gmail.com>
2025-05-29 16:44:58 +08:00
zhangqiang
2cb69a1c6f
[BugFix] Fix aggregate publish bug ( #59464 )
...
Signed-off-by: sevev <qiangzh95@gmail.com>
2025-05-29 07:56:36 +00:00
Kevin Cai
2638c2aee5
[Enhancement] BE wait for at least one heartbeat from frontend during graceful exit ( #59428 )
...
Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
2025-05-28 21:19:33 -07:00
trueeyu
900dbea505
[Refactor] Use the storage page cache instead of the object cache to cache the decompressed data of external tables ( #59341 )
...
Signed-off-by: trueeyu <lxhhust350@qq.com>
2025-05-29 11:42:32 +08:00
Yixin Luo
c42c26d919
[Enhancement] pk compaction should not retry when apply timeout ( #59440 )
...
Signed-off-by: luohaha <18810541851@163.com>
2025-05-29 11:41:59 +08:00
satanson
260683ca89
[BugFix] analyzePlanWithExecStats throw NPE when Statement's OriginStatement is not set ( #59431 )
...
Signed-off-by: satanson <ranpanf@gmail.com>
2025-05-29 10:59:49 +08:00
gengjun-git
9c0277f110
[Enhancement] Use fastutil to optimize memory usage of HashMap in FE ( #58931 )
...
## Why I'm doing:
In a cluster with 5 million tablets (3 replicas, 500 tables, each table with 100 partitions, and each partition with 100 buckets), under no load (no checkpoint or tablet reporting), the jmap memory statistics are as follows:
```
$tail -n 10 jmap-raw1.txt
5409: 1 16 sun.reflect.generics.tree.VoidDescriptor (java.base@11.0.23)
5410: 1 16 sun.tools.attach.AttachProviderImpl (jdk.attach@11.0.23)
5411: 1 16 sun.tools.attach.HotSpotVirtualMachine$$Lambda$588/0x00002ad629a7f040 (jdk.attach@11.0.23)
5412: 1 16 sun.util.cldr.CLDRBaseLocaleDataMetaInfo (java.base@11.0.23)
5413: 1 16 sun.util.locale.provider.CalendarDataUtility$CalendarWeekParameterGetter (java.base@11.0.23)
5414: 1 16 sun.util.locale.provider.TimeZoneNameUtility$TimeZoneNameGetter (java.base@11.0.23)
5415: 1 16 sun.util.logging.internal.LoggingProviderImpl (java.logging@11.0.23)
5416: 1 16 sun.util.resources.LocaleData$LocaleDataStrategy (java.base@11.0.23)
5417: 1 16 sun.util.resources.cldr.provider.CLDRLocaleDataMetaInfo (jdk.localedata@11.0.23)
Total 212125537 12211076824
$ grep "java.lang.Long" jmap-raw1.txt
3: 75453717 1810889208 java.lang.Long (java.base@11.0.23)
476: 1 2072 [Ljava.lang.Long; (java.base@11.0.23)
$ grep HashMap jmap-raw1.txt | awk '{sum += $3} END {print sum}'
4983183896
```
Total memory usage is 12,211,076,824 Bytes ≈ 11.37 GB, of which the HashMap data structure itself (excluding stored key-value data) occupies 4,983,183,896 Bytes ≈ 4.64 GB, and Long occupies 1,810,889,208 Bytes ≈ 1.69 GB. This memory can be optimized using specific data structures.
After using Fastutil's Long2ObjectOpenHashMap, the total memory usage decreased to 8,621,562,088 Bytes ≈ 8.03 GB, a reduction of about 29%. The combined memory usage of HashMap and Long after optimization is 130,886,544 + 998,515,384 = 1,129,401,928 Bytes ≈ 1.05 GB, down from about 6.33 GB before. The effect is quite significant.
```
$ tail -n 10 jmap-fast1.txt
6071: 1 16 sun.reflect.generics.tree.VoidDescriptor (java.base@11.0.23)
6072: 1 16 sun.tools.attach.AttachProviderImpl (jdk.attach@11.0.23)
6073: 1 16 sun.tools.attach.HotSpotVirtualMachine$$Lambda$515/0x00002aef76e4f040 (jdk.attach@11.0.23)
6074: 1 16 sun.util.cldr.CLDRBaseLocaleDataMetaInfo (java.base@11.0.23)
6075: 1 16 sun.util.locale.provider.CalendarDataUtility$CalendarWeekParameterGetter (java.base@11.0.23)
6076: 1 16 sun.util.locale.provider.TimeZoneNameUtility$TimeZoneNameGetter (java.base@11.0.23)
6077: 1 16 sun.util.logging.internal.LoggingProviderImpl (java.logging@11.0.23)
6078: 1 16 sun.util.resources.LocaleData$LocaleDataStrategy (java.base@11.0.23)
6079: 1 16 sun.util.resources.cldr.provider.CLDRLocaleDataMetaInfo (jdk.localedata@11.0.23)
Total 94530553 8621562088
$ grep "java.lang.Long" jmap-fast1.txt
13: 5453606 130886544 java.lang.Long (java.base@11.0.23)
520: 1 2072 [Ljava.lang.Long; (java.base@11.0.23)
$ grep HashMap jmap-fast1.txt | awk '{sum += $3} END {print sum}'
998515384
```
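The figures quoted above can be rechecked with a few lines of arithmetic. The byte counts below are copied from the two jmap listings; the variable names and the `gib` helper are only for illustration, and "GB" in the text is taken to mean 2^30 bytes.

```python
GIB = 2 ** 30  # the text's "GB" figures match binary gigabytes

before_total = 12_211_076_824
before_hashmap = 4_983_183_896
before_long = 1_810_889_208

after_total = 8_621_562_088
after_hashmap = 998_515_384
after_long = 130_886_544


def gib(num_bytes: int) -> float:
    """Convert bytes to GiB, rounded to two decimals."""
    return round(num_bytes / GIB, 2)


combined_before = before_hashmap + before_long
combined_after = after_hashmap + after_long
reduction_pct = round(100 * (before_total - after_total) / before_total, 1)

print("total before/after (GiB):", gib(before_total), gib(after_total))
print("HashMap + Long before/after (GiB):", gib(combined_before), gib(combined_after))
print("total reduction (%):", reduction_pct)
```

The total drops by roughly 29 percent, and almost all of the saving comes from the HashMap and Long entries.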
Signed-off-by: gengjun-git <gengjun@starrocks.com>
2025-05-29 10:30:19 +08:00
shuming.li
36a70c3f3c
[Enhancement] [Doc] add materialized view indexes docs ( #59432 )
...
Signed-off-by: shuming.li <ming.moriarty@gmail.com>
Co-authored-by: 絵空事スピリット <wanglichen@starrocks.com>
2025-05-29 09:55:11 +08:00
SevenJ
3144f64a26
[Enhancement] add sql test for iceberg transform partition ( #59350 )
...
Signed-off-by: SevenJ <wenjun7j@gmail.com>
2025-05-29 01:51:54 +00:00
Rohit Satardekar
a6f22d062c
[Enhancement] Support current_timestamp(x) as column default value ( #58167 )
...
Signed-off-by: Rohit Satardekar <rohitrs1983@gmail.com>
2025-05-29 09:50:51 +08:00
zhangqiang
323078ba0a
[BugFix] Fix storage_size lost in partitions_meta table ( #59434 )
...
## Why I'm doing:
This PR (https://github.com/StarRocks/starrocks/pull/56234 ) added storage_size for lake tables and exposed it in the partition proc, but storage_size was not added to the `partitions_meta` table.
## What I'm doing:
Add storage_size to the `partitions_meta` table.
Signed-off-by: sevev <qiangzh95@gmail.com>
2025-05-28 09:57:10 -07:00
Kevin Cai
a59852d6af
[BugFix] stream load workflow check process exit status in a few points ( #59362 )
...
Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
2025-05-28 08:55:09 -07:00
Dan Roscigno
31fdd8f622
[Doc] remove description heading ( #59443 )
...
Signed-off-by: DanRoscigno <dan@roscigno.com>
2025-05-28 08:48:20 -07:00
shuming.li
daac44fb96
[Feature] (CNResourceProvider Part1) Introduce ComputeNode Resource and ResourceProvider ( #59358 )
...
Signed-off-by: shuming.li <ming.moriarty@gmail.com>
2025-05-28 19:07:05 +08:00
絵空事スピリット
4c384dc63b
[Doc] Fix Partition Merge syntax and example ( #59429 )
2025-05-28 18:38:58 +08:00
Zach
31f15b049d
[BugFix] Add special handling for information_schema.tables to support Tableau's use of SELECT aliases in HAVING clause ( #59406 )
...
Signed-off-by: Zac-saodiseng <3253345336@qq.com>
2025-05-28 18:38:41 +08:00
zhangqiang
a199a33f53
[Enhancement] Fully support aggregate publish version for lake table ( #59395 )
...
## Why I'm doing:
Previous PRs added support for aggregate publish for lake tables, but some jobs do not support it yet:
1. batch publish
2. alter job
## What I'm doing:
Enable aggregate publish for those jobs.
Signed-off-by: sevev <qiangzh95@gmail.com>
2025-05-28 16:44:20 +08:00
trueeyu
b7c39b9d5c
[Enhancement] Fix the cache select behavior when datacache is disabled ( #59410 )
...
Signed-off-by: trueeyu <lxhhust350@qq.com>
2025-05-28 16:26:44 +08:00
絵空事スピリット
664771f2ea
[Doc] SELECT supports FILTER ( #59417 )
2025-05-28 15:30:57 +08:00
Seaven
4b0829b668
[Feature][SPM] spm support auto capturer ( #58729 )
...
Signed-off-by: Seaven <seaven_7@qq.com>
2025-05-28 15:28:01 +08:00
Youngwb
f8628553de
[Enhancement] Support query iceberg metadata table for unified catalog ( #59412 )
...
Signed-off-by: Youngwb <yangwenbo_mailbox@163.com>
2025-05-28 15:19:45 +08:00
絵空事スピリット
d07c6ec1b1
[Doc] Update the cover page for Async MV references ( #59339 )
2025-05-28 15:17:51 +08:00
simo
e431486164
[Doc] Update security_integration.md ( #59348 )
...
Signed-off-by: simo <48942089+wangsimo0@users.noreply.github.com>
Signed-off-by: 絵空事スピリット <wanglichen@starrocks.com>
Co-authored-by: 絵空事スピリット <wanglichen@starrocks.com>
2025-05-28 15:17:41 +08:00
stdpain
69fc6bdecb
[Enhancement] Add SignalTimerGuard class for thread stack trace timeout monitoring ( #59380 )
...
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-05-28 14:47:28 +08:00
stdpain
e8b0cd792c
[Enhancement] fix unexpected function call in fill_dst_column ( #59411 )
...
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-05-28 13:52:57 +08:00
PengFei Li
a8ac566ff5
[BugFix] Don't archive task run history on fe follower ( #59393 )
...
Signed-off-by: PengFei Li <lpengfei2016@gmail.com>
2025-05-28 03:07:34 +00:00
zombee0
09f7993876
[BugFix] Don't return error when decoding min/max is not supported ( #59346 )
...
We do not support decoding the min/max values of some types, but we should not return
the error status to users.
On the main branch and branch-3.5 there is no bug; we just handle this error status
now to avoid future bugs.
On branch-3.4 and branch-3.3, there is a bug when the predicate is an in-filter
or rf-min-max and the data type is float/double.
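The idea of treating "cannot decode min/max" as "no statistics" rather than as an error can be sketched as follows. Everything here is hypothetical and is not the StarRocks implementation: the function names, the `SUPPORTED_TYPES` set, and the little-endian integer encoding are assumptions made only for the example.

```python
from typing import Optional, Tuple

# Hypothetical set of column types whose min/max bytes we know how to decode.
SUPPORTED_TYPES = {"int32", "int64"}


def decode_min_max(col_type: str, raw_min: bytes, raw_max: bytes) -> Optional[Tuple[int, int]]:
    """Return (min, max), or None when the type cannot be decoded."""
    if col_type not in SUPPORTED_TYPES:
        return None  # not an error: statistics are simply unavailable
    return (
        int.from_bytes(raw_min, "little", signed=True),
        int.from_bytes(raw_max, "little", signed=True),
    )


def may_contain(col_type: str, raw_min: bytes, raw_max: bytes, value) -> bool:
    """Decide whether a data block might hold `value` based on its min/max."""
    stats = decode_min_max(col_type, raw_min, raw_max)
    if stats is None:
        return True  # cannot prune, so the block must be read
    low, high = stats
    return low <= value <= high


one = (1).to_bytes(4, "little", signed=True)
ten = (10).to_bytes(4, "little", signed=True)
print(may_contain("int32", one, ten, 5))
print(may_contain("double", b"", b"", 3.5))
```

With this shape, an unsupported type costs a missed pruning opportunity but never surfaces an error to the user.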
Signed-off-by: zombee0 <ewang2027@gmail.com>
2025-05-27 19:50:48 -07:00
Kevin Cai
b3f037315d
[BugFix] shared-data stream load filtering out SHUTDOWN node ( #59349 )
...
Use DefaultSharedDataWorkerProvider to replace unavailable workers with their backup nodes in shared-data mode.
The shared-nothing part is done in #58357
Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
2025-05-27 19:47:25 -07:00
Hechem Selmi
c68c2fb732
[Enhancement] Push down limit to multi cast sink ( #59265 )
...
Signed-off-by: m-selmi <m.selmi@celonis.com>
2025-05-28 09:57:52 +08:00
PengFei Li
58569bb871
[Enhancement] Secondary replica supports to poll state from primary replica ( #57539 )
...
Signed-off-by: PengFei Li <lpengfei2016@gmail.com>
2025-05-28 09:28:26 +08:00
Zach
24619d6a48
[Enhancement] Replace exec with ast to load metadata safely ( #59377 )
...
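The commit title gives no detail, so the following is only a general illustration and assumes the change concerns Python's built-in `exec` and the standard `ast` module. The `metadata_text` string and the `try_load` helper are hypothetical. `ast.literal_eval` accepts only Python literals (strings, numbers, tuples, lists, dicts, and similar) and rejects anything that would run code.

```python
import ast

# Hypothetical metadata stored as a Python literal.
metadata_text = "{'name': 'example', 'version': (1, 0), 'tags': ['a', 'b']}"

# literal_eval parses the literal without executing any code.
metadata = ast.literal_eval(metadata_text)


def try_load(text: str):
    """Parse a literal, returning None for anything that is not a plain literal."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None


print(metadata["name"], metadata["version"])
print(try_load("__import__('os').getcwd()"))
```

A function call such as the one in the last line is rejected instead of being run, which is the safety property that `exec` lacks.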
Signed-off-by: Zac-saodiseng <3253345336@qq.com>
2025-05-27 17:29:30 +08:00
xiangguangyxg
6660eb6a50
[BugFix] Fix forgetting to set min version for showing single tablet info in shared-data mode ( #59373 )
...
Signed-off-by: xiangguangyxg <xiangguangyxg@gmail.com>
2025-05-27 16:47:57 +08:00
xiangguangyxg
66f1a78adb
[Doc] Change default value of max_replication_data_size_per_job_in_gb in data migration tool ( #59383 )
...
Signed-off-by: xiangguangyxg <xiangguangyxg@gmail.com>
2025-05-27 08:46:10 +00:00
shuming.li
4d30cd8c19
[BugFix] Fix mv refresh external table error ( #59369 )
...
Signed-off-by: shuming.li <ming.moriarty@gmail.com>
2025-05-27 16:21:46 +08:00
shuming.li
522f719bf6
[Tool] [BugFix] Fix building format-sdk compile bugs ( #59365 )
...
Signed-off-by: shuming.li <ming.moriarty@gmail.com>
2025-05-27 16:03:16 +08:00
Zach
d2620451d4
[BugFix] Fix table_type ( #59368 )
...
Signed-off-by: Zac-saodiseng <3253345336@qq.com>
2025-05-27 14:31:21 +08:00
Yixin Luo
2f1ec5e920
[BugFix] fix persistent index compatibility issue when migrate between different cpu arch ( #59219 )
...
Signed-off-by: luohaha <18810541851@163.com>
2025-05-26 22:58:33 -07:00
Binglin Chang
3c61005e17
[Feature] Add storage volume and staros support for GCS ( #58815 )
...
Signed-off-by: Binglin Chang <decstery@gmail.com>
2025-05-27 11:21:48 +08:00
wyb
ad3380ee61
[Enhancement] Support service principal authentication in azure blob file system ( #59308 )
...
Signed-off-by: wyb <wybb86@gmail.com>
2025-05-27 10:03:11 +08:00