Commit Graph

7241 Commits

Author SHA1 Message Date
qingzhongli 7df950e7d8
[UT] Fix sse_memcmp UT compilation error on aarch64 (#61569)
Fix sse_memcmp UT compilation error on aarch64.

## Why I'm doing:
```
[ 96%] Building CXX object test/CMakeFiles/starrocks_test_objs.dir/util/monotime_test.cpp.o
[ 96%] Building CXX object test/CMakeFiles/starrocks_test_objs.dir/util/mysql_row_buffer_test.cpp.o
/root/starrocks/be/test/util/memcmp_test.cpp: In member function 'virtual void starrocks::sse_memcmp_Test_Test::TestBody()':
/root/starrocks/be/test/util/memcmp_test.cpp:38:20: error: 'sse_memcmp2' was not declared in this scope
   38 |         int res2 = sse_memcmp2(c1, c2, 3);
      |                    ^~~~~~~~~~~
/root/starrocks/be/test/util/memcmp_test.cpp:46:20: error: 'sse_memcmp2' was not declared in this scope
   46 |         int res2 = sse_memcmp2(c1, c2, 3);
      |                    ^~~~~~~~~~~
/root/starrocks/be/test/util/memcmp_test.cpp:54:20: error: 'sse_memcmp2' was not declared in this scope
   54 |         int res2 = sse_memcmp2(c1, c2, 3);
      |                    ^~~~~~~~~~~
/root/starrocks/be/test/util/memcmp_test.cpp:62:20: error: 'sse_memcmp2' was not declared in this scope
   62 |         int res2 = sse_memcmp2(c1, c2, 3);
      |                    ^~~~~~~~~~~
/root/starrocks/be/test/util/memcmp_test.cpp:71:20: error: 'sse_memcmp2' was not declared in this scope
   71 |         int res2 = sse_memcmp2(c1, c2, 3);
      |                    ^~~~~~~~~~~
/root/starrocks/be/test/util/memcmp_test.cpp:80:20: error: 'sse_memcmp2' was not declared in this scope
   80 |         int res2 = sse_memcmp2(c1, c2, 3);
      |                    ^~~~~~~~~~~
/root/starrocks/be/test/util/memcmp_test.cpp:89:20: error: 'sse_memcmp2' was not declared in this scope
   89 |         int res2 = sse_memcmp2(c1, c2, 3);
      |                    ^~~~~~~~~~~
/root/starrocks/be/test/util/memcmp_test.cpp:98:20: error: 'sse_memcmp2' was not declared in this scope
   98 |         int res2 = sse_memcmp2(c1, c2, 3);
      |                    ^~~~~~~~~~~
make[2]: *** [test/CMakeFiles/starrocks_test_objs.dir/util/memcmp_test.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [test/CMakeFiles/starrocks_test_objs.dir/all] Error 2
make: *** [all] Error 2
```

Signed-off-by: qingzhongli <qingzhongli2018@gmail.com>
2025-08-04 15:45:17 +08:00
Yixin Luo 7dac2090e1
[Enhancement] reuse I/O when reading bundled tablet meta (#61413)
Signed-off-by: luohaha <18810541851@163.com>
2025-08-04 10:47:34 +08:00
Yixin Luo 75854adf72
[Enhancement] optimize tablet meta copy when enable file bundling (#61410)
Signed-off-by: luohaha <18810541851@163.com>
2025-08-04 10:47:21 +08:00
Kevin Cai a3a0a01140
[BugFix] fix file-prefix-map, remove the build_XXX part (#61540)
* -ffile-prefix-map=/build/starrocks/be/build_RELEASE=. -ffile-prefix-map=/build/starrocks/be=be
* before this fix, the source file list was:
```
  be/build_RELEASE/be/src/agent/agent_common.h
  be/build_RELEASE/be/src/agent/agent_server.cpp
  ...
  be/build_RELEASE/be/src/util/value_generator.h
  be/build_RELEASE/be/src/util/xxhash.h
``` 
  after this fix:
```
 ./be/src/agent/agent_common.h
 ./be/src/agent/agent_server.cpp
 ...
 ./be/src/util/value_generator.h
 ./be/src/util/xxhash.h
```

Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
2025-08-04 09:11:24 +08:00
zhangqiang d89a2f64f4
[Refactor] Change the data type of data_size column in the partitions_meta table to bigint. (#61251)
Signed-off-by: sevev <qiangzh95@gmail.com>
2025-08-02 11:32:38 +08:00
Kevin Cai 45c2310372
[BugFix] don't try to build MFV versions for the instructions turned off (#61532)
A CPU instruction set is turned off either because the target instruction set is not wanted or because the build machine does not support it. Respect the instruction switch.

Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
2025-08-02 11:02:34 +08:00
stdpain 94726f0973
[BugFix] Fix UAF in local-partition preagg (#61524)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-08-01 09:37:18 +00:00
stdpain 14fca55647
[BugFix] Fix local-passthrough cancel dead lock (#61487)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-07-31 20:02:39 +08:00
Kevin Cai ef04362a2f
[BugFix] properly handle orc reader decompress error (#61464)
Handle zstd decompression failures by throwing a runtime_error exception.
Fix the orc_scanner test file tpch_10k.orc.zstd, which is corrupted: replace it with a correct test file and update the related test cases.

Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
2025-07-31 16:35:18 +08:00
SevenJ a362c009bc
[UT] Fix iceberg trans ut (#61459)
Signed-off-by: SevenJ <wenjun7j@gmail.com>
2025-07-31 14:38:18 +08:00
stdpain 65fd661601
[Enhancement] add vectorized implementation for assign_data_with_nulls (#61454)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-07-31 14:25:52 +08:00
zombee0 ee8bea1c33
[Enhancement] use murmur3 hash to do bucket-aware execution for Iceberg (#61366)
Signed-off-by: zombee0 <ewang2027@gmail.com>
2025-07-31 10:05:17 +08:00
stdpain fce0346e97
[BugFix] Fix local-passthrough causing RPC to get stuck (#61427)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-07-30 16:21:23 +08:00
Fatih Çatalkaya 75b996b714
[BugFix] Set Content-Type to application/json when responding to stream load http requests (#61144)
Why I'm doing:
When sending a request to the /api/transaction/{begin,load,commit,...} endpoints, the content type is wrongly set to text/html instead of application/json.

What I'm doing:
Fixes #61130

Signed-off-by: Fatih Çatalkaya <fatih.catalkaya@yahoo.de>
Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
Co-authored-by: Kevin Cai <kevin.cai@celerdata.com>
2025-07-30 08:47:21 +08:00
Yixin Luo 768e03ec5e
[Enhancement] add idle time config for publish version thread pool (#61239)
Signed-off-by: luohaha <18810541851@163.com>
Signed-off-by: Yixin Luo <luoyixin6688@gmail.com>
Co-authored-by: 絵空事スピリット <wanglichen@starrocks.com>
2025-07-29 16:40:52 +00:00
Murphy af49488e6f
[UT] Fuzz test built-in functions with type coverage (#61303)
Signed-off-by: Murphy <mofei@starrocks.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-07-29 11:20:47 +00:00
Kevin Cai 942a77c5bb
[UT] fix incorrect use of evhttp in stream load unit test (#61390)
The test body should not create a separate evhttp_request;
it should use the input_buffer created in the evhttp_request initialized by evhttp_request_new().

Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
2025-07-29 18:43:01 +08:00
stdpain 04001e8618
[Enhancement] some minor optimizations for reading parquet files (#60551)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-07-29 18:12:53 +08:00
zhangqiang 2defcf0579
[BugFix] Fix nullptr exception during clone. (#61359)
Why I'm doing:
When clone and drop table run concurrently, the new_tablet being cloned may be dropped, which throws a null pointer exception.


Signed-off-by: sevev <qiangzh95@gmail.com>
2025-07-29 16:26:44 +08:00
Yixin Luo 0f1deef421
[BugFix] fix missing partition id in combine txnlog (#61207)
Signed-off-by: luohaha <18810541851@163.com>
2025-07-29 16:25:12 +08:00
Murphy c9ea6464fe
[BugFix] compile failure in clang (#61351)
Signed-off-by: Murphy <mofei@starrocks.com>
2025-07-29 13:07:50 +08:00
stdpain 70a7f618d5
[Refactor] Refactor scalar function registration to speed up compilation (#61358)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-07-29 09:45:15 +08:00
yan zhang d46937ed5c
[UT] Fix compilation and be-ut (#61347)
Signed-off-by: yan zhang <dirtysalt1987@gmail.com>
2025-07-29 09:21:40 +08:00
Hechem Selmi b7c2561dc0
[Enhancement] Avoid brpc communication when using local pass through (#60538)
Signed-off-by: m-selmi <m.selmi@celonis.com>
Signed-off-by: stdpain <drfeng08@gmail.com>
Co-authored-by: stdpain <drfeng08@gmail.com>
2025-07-28 11:08:19 +00:00
shuming.li ac5fc3f681
[UT] [BugFix] Fix unstable JITCacheTest tests (#61331)
2025-07-28 18:47:39 +08:00
starrocks-xupeng b0f5cbbbb1
[Enhancement] add segment write time in lake compaction (#60891)
Signed-off-by: starrocks-xupeng <xupeng@starrocks.com>
2025-07-28 17:35:11 +08:00
yan zhang b84d2051e4
[UT] disable parquet asan long running ut (#61334)
Signed-off-by: yan zhang <dirtysalt1987@gmail.com>
2025-07-28 15:58:36 +08:00
stdpain 9521bd8266
[Enhancement] Introduce RETURN_IF_DCHECK_XX_FAILED (#61315)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-07-28 14:08:11 +08:00
Jun-Seok Heo 774b9d0de3
[BugFix] fix the pruned column size to be the same as the unpruned one (#61271)
2025-07-28 12:06:20 +08:00
stdpain fc856ca330
[BugFix] Fix array_map crash when capturing const array columns (#61309)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-07-26 15:54:39 +08:00
duanyyyyyyy 561b82cd93
[BugFix] Fix a bug where agg_state_if does not handle streaming aggregation cases (#61084)
Signed-off-by: ‘duanyyyyyyy’ <yan.duan9759@gmail.com>
2025-07-26 12:50:47 +08:00
Kevin Cai b5cc684042
[UT] fix StarOSWorker AwsSDK cleanup issue (#61265)
Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
2025-07-25 13:49:44 +08:00
srlch cbb77d9883
[BugFix] Fix a bug where setting a null value for an auto_increment column rejects valid data in the same chunk (#61255)
Signed-off-by: srlch <linzichao@starrocks.com>
2025-07-25 12:48:28 +08:00
Evgeniy Zuikin 81ff271a80
[BugFix] Fix array column cloning during array comparison (#61036)
Signed-off-by: SHaaD94 <eugenzuy@gmail.com>
Signed-off-by: stdpain <drfeng08@gmail.com>
Signed-off-by: stdpain <34912776+stdpain@users.noreply.github.com>
Co-authored-by: stdpain <drfeng08@gmail.com>
Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com>
2025-07-25 11:06:42 +08:00
Murphy 2b69350d1b
[BugFix] fix hour_from_unixtime (#61206)
Signed-off-by: Murphy <mofei@starrocks.com>
2025-07-25 10:10:14 +08:00
Gavin 46601e16e4
[Enhancement] Disable the inline mode when writing data to datacache because it may cause a performance degradation. (#60530)
Signed-off-by: GavinMar <yangguansuo@starrocks.com>
2025-07-24 17:27:34 +08:00
eyes_on_me 4167aaf940
[BugFix] fix TableMetricsMgrTest (#61218)
Signed-off-by: silverbullet233 <3675229+silverbullet233@users.noreply.github.com>
2025-07-24 13:56:44 +08:00
zihe.liu 3107899823
[BugFix] Fix resource group cpu usage (#61177)
Signed-off-by: zihe.liu <ziheliu1024@gmail.com>
2025-07-23 19:37:10 +08:00
eyes_on_me d71cc3d2c7
[BugFix] reduce lock contention of TableMetricsManager (#58911)
Signed-off-by: silverbullet233 <3675229+silverbullet233@users.noreply.github.com>
2025-07-23 19:15:07 +08:00
eyes_on_me 6abb89573c
[BugFix] make scan behavior consistent on shared-data and shared-nothing (#61100)
Signed-off-by: silverbullet233 <3675229+silverbullet233@users.noreply.github.com>
2025-07-23 10:24:59 +08:00
satanson e91696fa1b
[BugFix] exclude some JIT-related files when STARROCKS_JIT_ENABLE=OFF (#61138)
Signed-off-by: satanson <ranpanf@gmail.com>
2025-07-22 16:13:37 +08:00
zihe.liu 2144db870c
[Enhancement] Use RangeDirectMapping to optimize hash join (#61124)
Signed-off-by: zihe.liu <ziheliu1024@gmail.com>
2025-07-22 15:55:54 +08:00
alexzorin 2dbfc1d516
[BugFix] set hit_count in vector index metrics (#61102)
Signed-off-by: Alex Zorin <alex@zorin.au>
2025-07-22 14:41:56 +08:00
starrocks-xupeng f3144b9a2e
[BugFix] fix cache possibly not being used after upgrading from 3.3 (#60973)
Signed-off-by: starrocks-xupeng <xupeng@starrocks.com>
2025-07-22 14:32:30 +08:00
stdpain 78558bcc07
[BugFix] Fix dictionary inconsistency in shared-data mode (#61006)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-07-22 14:21:27 +08:00
stdpain ebd73ed42c
[Enhancement] avoid reusing ByteBuffer when merging data in Java UDAF (#61054)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-07-21 15:39:22 +08:00
zihe.liu c2d4734377
[Refactor] Split join_hash_map into files (#61010)
Signed-off-by: zihe.liu <ziheliu1024@gmail.com>
2025-07-21 14:04:11 +08:00
srlch 4ac5ae833f
[Enhancement] Filter out keys using SstablePredicate for sstable after compaction (#60743)
Signed-off-by: srlch <linzichao@starrocks.com>
2025-07-21 09:33:53 +08:00
satanson f877782f08
[BugFix] Executable segments generated by JIT are not released when they are evicted from JIT cache (#61027)
Signed-off-by: satanson <ranpanf@gmail.com>
2025-07-18 16:43:09 +08:00
satanson b26637e0f5
[BugFix] disable jit in BE (#61060)
Signed-off-by: satanson <ranpanf@gmail.com>
2025-07-18 11:40:06 +08:00
yan zhang c72152f5a5
[Enhancement] support uuid type in postgres (#61021)
Signed-off-by: yan zhang <dirtysalt1987@gmail.com>
2025-07-18 10:11:25 +08:00
yan zhang f5f8e9bc2c
[Enhancement] support map type in UDAF (#60840)
Signed-off-by: yan zhang <dirtysalt1987@gmail.com>
2025-07-18 09:49:51 +08:00
stdpain 44e64daea0
[BugFix] Make Python UDF error reporting clearer (#61015)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-07-17 18:02:25 +08:00
zihe.liu 68e827d71a
[Refactor] Split JoinFunc to KeyConstructor and HashMapMethod (#60932)
Signed-off-by: zihe.liu <ziheliu1024@gmail.com>
2025-07-17 11:10:36 +08:00
zhangqiang c655814072
[BugFix] erase partition from partition_map when partition_ids is empty (#60842)
Signed-off-by: sevev <qiangzh95@gmail.com>
2025-07-17 10:31:44 +08:00
shuming.li 26f053fb8f
[UT] Fix broken stringCastBitmapFailed0 test (#60971)
Signed-off-by: shuming.li <ming.moriarty@gmail.com>
2025-07-16 14:57:56 +08:00
Seaven df099e035a
[Enhancement] compute unused columns in BE (#60462)
Signed-off-by: Seaven <seaven_7@qq.com>
2025-07-16 14:29:28 +08:00
Murphy a63d9820e5
[BugFix] fix the counter unit of OutputChunkBytes (#60940)
Signed-off-by: Murphy <mofei@starrocks.com>
2025-07-16 10:28:37 +08:00
stdpain d6a0b7413d
[BugFix] Fix wrong result extracted from relative URLs (#60926)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-07-16 09:36:48 +08:00
Murphy 2d2219ee42
[UT] mark as slow ut: testJsonColumnCompression (#60942)
Signed-off-by: Murphy <mofei@starrocks.com>
2025-07-15 18:55:19 +08:00
gengjun-git 5e5a9c972f
[BugFix] Change KEYWORD to WORD to comply with MySQL's standard definition (#60863)
Change KEYWORD to WORD to comply with MySQL's standard definition. https://dev.mysql.com/doc/refman/8.0/en/information-schema-keywords-table.html

Signed-off-by: gengjun-git <gengjun@starrocks.com>
2025-07-15 09:55:10 +08:00
wyb 61f12e7675
[Enhancement] Support parquet version in files unload (#60843)
Signed-off-by: wyb <wybb86@gmail.com>
2025-07-15 09:34:58 +08:00
Yixin Luo 951816f2d5
[Enhancement] add some metrics for aggregate publish version & compaction (#60747)
Signed-off-by: luohaha <18810541851@163.com>
Signed-off-by: Yixin Luo <luoyixin6688@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-14 19:32:12 +08:00
meegoo 9475e2abce
[BugFix] Fix a large number of base compactions blocking other compaction tasks (#60711)
Signed-off-by: meegoo <meegoo.sr@gmail.com>
2025-07-14 18:46:54 +08:00
Kevin Cai c186dc288c
[BugFix] efficiently handle error string truncation (#60878)
* truncate the string before converting it to a std::string

## Why I'm doing:

```
W0713 17:45:07.945505 41413 mem_hook.cpp:90] large memory alloc, query_id:78472d5d-414e-ebb9-3edf-18733d316fb4 instance: 00000000-0000-0000-0000-000000000000 acquire:4294955590 bytes, stack:
    @          0x2f9d0f2  malloc
    @          0x915b765  operator new()
    @          0x373a583  std::__cxx11::basic_string<>::_M_construct<>()
    @          0x373b014  starrocks::stream_load::OlapTableSink::_print_varchar_error_msg()
    @          0x373fa75  starrocks::stream_load::OlapTableSink::_validate_data()
    @          0x374782a  starrocks::stream_load::OlapTableSink::send_chunk()
    @          0x2f5a50b  starrocks::PlanFragmentExecutor::_open_internal_vectorized()
    @          0x2f5cd81  starrocks::PlanFragmentExecutor::open()
    @          0x2ed4ceb  starrocks::FragmentExecState::execute()
    @          0x2edb5a8  starrocks::FragmentMgr::exec_actual()
    @          0x30542cc  starrocks::ThreadPool::dispatch_thread()
    @          0x304d0ea  starrocks::Thread::supervise_thread()
    @     0x2b3316651ea5  start_thread
    @     0x2b331728c96d  __clone
    @              (nil)  (unknown)
```

Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
2025-07-14 18:31:28 +08:00
stdpain 8a035148f0
[BugFix] Fix inconsistent implementations in url_extract_parameter (#60873)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-07-14 18:24:30 +08:00
stdpain a24412587e
[BugFix] Fix BE crash when loading an OOM partition (#60778)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-07-14 10:09:25 +08:00
Alexey eeb2900f64
[BugFix] incorrect expression in zonemap filters (#60845)
Regression introduced in #53967.

The bug is present in 3.5 and was backported to 3.4; 3.3 does not have it.

`!(expr1 && expr2)` has to be transformed into `!expr1 || !expr2`,
not into `!expr1 && !expr2` as happened in the PR;

otherwise we apply an incorrect expression on top of a scalar column.

## Why I'm doing:

In #53967 we introduced zonemap filtering for struct columns,
but it contains a bug.

Failing query:
```
select x
from y
where x.field1[1].field2
```

```cpp
    // check subfield expr has only one child, and it's a SlotRef
    if (subfield_expr->children().size() != 1 && !subfield_expr->get_child(0)->is_slotref()) {
        return Status::InternalError("Invalid pattern for predicate");
    }
```

`subfield_expr->children().size() != 1` is `false`;
`!subfield_expr->get_child(0)->is_slotref()` is `true`, because the child is an array access.

So the whole expression is `false` and execution continues.
This is not what we expect, because we want:

```
// Rewrite ColumnExprPredicate which contains subfield expr and put subfield path into subfield_output
// For example, WHERE col.a.b.c > 5, a.b.c is subfields, we will rewrite it to c > 5
```
but `[1].field2` is still left over.

## What I'm doing:

fix expression logic

Signed-off-by: Aliaksei Dziomin <diominay@gmail.com>
2025-07-12 13:01:43 +08:00
Yakir Gibraltar 0fc909c8de
[Enhancement] Implement missing HDFS filesystem operations for spilling support (#59759)
Fixes #59757

Why I'm doing:
When using HDFS as a remote storage volume for spilling, StarRocks fails with NOT_IMPLEMENTED_ERROR because several critical filesystem operations were not implemented in the HDFS filesystem wrapper (HdfsFileSystem class). These operations are essential for the spilling workflow:

* Creating directories for spill data organization
* Checking if paths are directories vs files
* Deleting files and directories during cleanup
* Managing directory hierarchy for spill containers

Without these operations, queries that need to spill to HDFS storage volumes cannot function, severely limiting StarRocks' ability to handle large datasets when using HDFS as external storage.

What I'm doing:
This PR implements the missing HDFS filesystem operations required for spilling functionality:

Implemented Operations:
* delete_file() - Delete files from HDFS using hdfsDelete
* create_dir() - Create directories using hdfsCreateDirectory
* create_dir_if_missing() - Create directories if they don't exist (with an existence check)
* create_dir_recursive() - Create directories recursively (leverages HDFS native recursive creation)
* delete_dir() - Delete empty directories using hdfsDelete
* delete_dir_recursive() - Delete directories and all their contents recursively
* is_directory() - Check whether a path is a directory using hdfsGetPathInfo

Additional Improvements:
* Added a private helper method _is_directory() for internal directory type checking
* Fixed a bug in the hdfs_write_buffer_size assignment for upload options (it used __isset instead of the actual value)
* Added comprehensive test coverage, including a realistic spilling workflow simulation

Implementation Details:
* All operations properly handle HDFS connections through the existing HdfsFsCache infrastructure
* Robust error handling with meaningful error messages using get_hdfs_err_msg()
* Path existence validation before operations to provide clear error messages
* Directory vs. file type validation to prevent incorrect operations
* Follows existing code patterns and error handling conventions in the codebase
Fixes #59757

Signed-off-by: Yakir Gibraltar <yakir.g@taboola.com>
Signed-off-by: Yakir Gibraltar <yakirgb@gmail.com>
Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
Co-authored-by: Yakir Gibraltar <yakir.g@taboola.com>
Co-authored-by: Kevin Cai <caixh.kevin@gmail.com>
Co-authored-by: Kevin Cai <kevin.cai@celerdata.com>
2025-07-12 09:58:26 +08:00
liubotao d34b14224e
[Feature] add new function to_datetime and to_datetime_ntz (#60637)
Signed-off-by: liubotao <316945435@qq.com>
2025-07-11 22:07:01 +08:00
zhangqiang ebeb2ac7fd
[Enhancement] Support different compaction strategy for different table in shared-data mode (#60366)
Signed-off-by: sevev <qiangzh95@gmail.com>
2025-07-11 11:18:14 +08:00
Xu Bai 6bf1a340dc
[Feature] Implement parquet variant decoding (#60189)
2025-07-10 20:16:19 +08:00
Murphy d9b23cffb4
[Enhancement] function hour_from_unixtime (#60331)
Signed-off-by: Murphy <mofei@starrocks.com>
2025-07-10 16:16:51 +08:00
stephen b8d2a70930
[Enhancement] support collection array column ndv (#60623)
Signed-off-by: stephen <stephen5217@163.com>
2025-07-10 14:44:31 +08:00
Yixin Luo 1863e49d09
[BugFix] Add compact to LakeService_RecoverableStub and optimize the error messages returned by aggregation compaction (#60715)
Signed-off-by: luohaha <18810541851@163.com>
2025-07-10 11:16:25 +08:00
Gavin a7c8f86e47
[BugFix] Release the cache engine instances before the datacache is freed to clean some related resources in advance (#60745)
Signed-off-by: GavinMar <yangguansuo@starrocks.com>
2025-07-10 09:56:52 +08:00
shuming.li a45f155e86
[BugFix] Remove unused output_scale variable in expr (#60731)
Signed-off-by: shuming.li <ming.moriarty@gmail.com>
2025-07-09 11:57:54 +00:00
ruyliu b0d96c5c52
[BugFix] Cherry-Pick ORC-1525 bugfix from apache-orc (#60722)
2025-07-09 11:05:27 +00:00
srlch a04ba3afca
[Feature] Introduce predicate for sstable (#60645)
Signed-off-by: srlch <linzichao@starrocks.com>
2025-07-09 16:40:43 +08:00
zhangqiang 457f0a9e2e
[Enhancement] Add lake service stub cache (#60517)
Why I'm doing:
Since file_bundling was supported, brpc channels are created between CN nodes during each publish operation, which may affect publish performance.

What I'm doing:
Add a lake service stub cache to avoid creating brpc channels on each publish.


Signed-off-by: sevev <qiangzh95@gmail.com>
2025-07-09 08:50:08 +08:00
Yixin Luo 1b855d34a0
[Enhancement] Add garbage file checking to facilitate future comprehensive vacuum testing (#60639)
Signed-off-by: luohaha <18810541851@163.com>
2025-07-08 16:29:48 +08:00
starrocks-xupeng 0249856554
[BugFix] fix new compaction segments not being cleaned up by abort txn (#60673)
When partial compaction is not used, we still need to set the correct new segment info so that abort txn can clean up the new segments.

Signed-off-by: starrocks-xupeng <xupeng@starrocks.com>
2025-07-08 11:37:19 +08:00
stdpain a3def612b0
[BugFix] Fix memory/row count inaccuracies that can cause aggregation to get stuck (#60612)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-07-07 17:52:05 +08:00
TsukiokaKogane c6843040d8
[BugFix] fix short-circuit query core dump with out-of-order value columns in SQL (#60466)
Signed-off-by: TsukiokaKogane <cby141994@gmail.com>
2025-07-07 08:51:01 +00:00
stdpain 8b3965a337
[Enhancement] Make UDF URLs not have to use a specific ending (#60622)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-07-07 14:48:01 +08:00
Yixin Luo 0b93fb7c2a
[BugFix] fix cloud native pk index memory statistic leak (#60566)
Signed-off-by: luohaha <18810541851@163.com>
2025-07-07 14:37:21 +08:00
before-Sunrise d8981a85be
[BugFix] fix ngram_search use after free (#60608)
Signed-off-by: before-Sunrise <unclejyj@gmail.com>
2025-07-07 13:54:43 +08:00
Mesut Döner c1e47b2d3d
[BugFix] split_part function should not return null when delimiter is not matched (#56967)
Signed-off-by: stdpain <34912776+stdpain@users.noreply.github.com>
Signed-off-by: stdpain <drfeng08@gmail.com>
Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com>
Co-authored-by: stdpain <drfeng08@gmail.com>
2025-07-04 16:40:12 +08:00
Yixin Luo 3edbb4955c
[Tool] print bundle tablet meta proto as string (#60600)
Signed-off-by: luohaha <18810541851@163.com>
2025-07-04 06:56:02 +00:00
Murphy 74d21b3600
[Enhancement] Add expression filter counter to OLAP_SCAN for non-pushdown predicates (#60552)
2025-07-04 13:23:29 +08:00
Yixin Luo 74458bf52a
[Enhancement] datafile_gc support bundle tablet meta (#60507)
Signed-off-by: luohaha <18810541851@163.com>
2025-07-04 10:38:47 +08:00
Murphy 71793ad7b0
[Enhancement] improve crc64 for hashset performance (#60074)
Signed-off-by: Murphy <mofei@starrocks.com>
2025-07-04 10:24:21 +08:00
ThunderScar 58d34eb991
[Refactor] Remove class StringValue (#25002)
Signed-off-by: linyan <1870750355@qq.com>
Signed-off-by: stdpain <drfeng08@gmail.com>
Co-authored-by: linyan <1870750355@qq.com>
2025-07-04 09:33:43 +08:00
Wu Xueyang 7e662d4249
[BugFix] GC inverted index path after segment data deleted (#60390)
If rowset data is deleted by garbage collection, the inverted index will not be removed because path scanning ignores all of the directories under the tablet schema hash path.

What I'm doing:
Path scanning now also scans inverted index paths.

Signed-off-by: wuxueyang.wxy <wuxueyang.wxy@alibaba-inc.com>
2025-07-03 18:50:11 +08:00
yan zhang 7ff005aaac
[BugFix] fix min/max opt when all null values (#60545)
2025-07-03 17:55:47 +08:00
Murphy 46826ad18a
[Enhancement] test json compression (#60380)
Signed-off-by: Murphy <mofei@starrocks.com>
2025-07-03 17:42:48 +08:00
stdpain be1fc50ecd
[BugFix] Fix Exception when Java UDF output empty map (#60539)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-07-03 16:48:56 +08:00
srlch b34357a2cc
[BugFix] Make starrocks::LakeServiceImpl aware of submitted tasks that were never executed, so it sets a correct response and status (#59814)
Signed-off-by: srlch <linzichao@starrocks.com>
2025-07-03 15:50:13 +08:00
Murphy 461eaf4862
[Enhancement] extend the date cache to 2050 (#60533)
2025-07-03 12:30:10 +08:00
shuming.li 073e49d13f
[Feature] (Part1) Enhance observability for Warehouse CNGroup (#60343)
Signed-off-by: shuming.li <ming.moriarty@gmail.com>
2025-07-02 21:05:09 +08:00
yan zhang d3e9134902
[Refactor] sort out count/min/max opt prerequisite (#60515)
Signed-off-by: yan zhang <dirtysalt1987@gmail.com>
2025-07-02 10:49:49 +00:00
srlch e66031fc62
[Feature] Make record predicate available in rowset/segment read path (#60423)
Signed-off-by: srlch <linzichao@starrocks.com>
2025-07-02 17:32:52 +08:00
SevenJ 771e8fd8d7
[BugFix] replace thread-unsafe gmtime with gmtime_r (#60483)
Signed-off-by: SevenJ <wenjun7j@gmail.com>
2025-07-02 08:28:29 +00:00
JinYang 76f64e5ff6
[Enhancement] accelerate the crc32c calculation speed (#43433)
Signed-off-by: GoHalo <gohalo@163.com>
Signed-off-by: stdpain <34912776+stdpain@users.noreply.github.com>
Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com>
2025-07-02 13:59:48 +08:00
stdpain 20d587f0a0
[BugFix] Fix arrow flight crash when fetching from a nonexistent query_id (#60497)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-07-02 13:43:24 +08:00
stdpain de1f77e27c
[Enhancement] Fix issues reported by clang-tidy (#60480)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-07-02 09:40:36 +08:00
裸奔丶小馒头 31b8ff4251
[Feature] Support deleting all UDF jar caches at be startup (#41598)
Signed-off-by: changxin <streakxin@foxmail.com>
Signed-off-by: stdpain <34912776+stdpain@users.noreply.github.com>
Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com>
2025-07-01 18:09:04 +08:00
yan zhang 952db2da5f
[Enhancement] use lower_bound/upper_bound to optimize min/max (#60385)
Signed-off-by: yan zhang <dirtysalt1987@gmail.com>
2025-07-01 17:42:42 +08:00
satanson 0f3b2661b4
[Enhancement] Spill PartitionWise aggregation (#60216)
Signed-off-by: satanson <ranpanf@gmail.com>
2025-07-01 11:28:11 +08:00
Hongkun Xu 12f66bf17e
[Refactor] Fix spelling errors in variable names (#60433)
Signed-off-by: Hongkun Xu <xuhongkun666@163.com>
2025-06-30 19:52:09 +08:00
yan zhang 244ed71f5b
[Refactor] refactor bit packing code (#60434)
Signed-off-by: yan zhang <dirtysalt1987@gmail.com>
2025-06-30 19:47:11 +08:00
shuming.li 74d99cff15
[BugFix] Make information_schema.task_runs more compatible with null values (#60426)
Signed-off-by: shuming.li <ming.moriarty@gmail.com>
2025-06-30 16:15:04 +08:00
srlch 4be45d5c45
[Feature] Introduce Column Hash Predicate as a Record Predicate for data filtering (#59993)
Signed-off-by: srlch <linzichao@starrocks.com>
2025-06-30 10:12:44 +08:00
trueeyu 0ae63e4c18
[Refactor] Unify the local cache engine (#60110)
Signed-off-by: trueeyu <lxhhust350@qq.com>
2025-06-30 10:06:45 +08:00
cutiechi 3a708b5c62
[BugFix] Prevent approx_cosine_similarity from Returning NaN When Input Vector Norm is Zero (#60297)
Signed-off-by: cutiechi <superchijinpeng@gmail.com>
2025-06-29 14:52:27 +08:00
Yixin Luo 37c7c51ee8
[Refactor] enable skip pk preload by default (#60368)
Signed-off-by: luohaha <18810541851@163.com>
2025-06-27 12:18:42 +00:00
shuming.li a4ebb6b582
[BugFix] Fix information_schema.materialized_views table compatible bugs (#60374)
Signed-off-by: shuming.li <ming.moriarty@gmail.com>
2025-06-27 17:00:09 +08:00
stephen f0bbb1bb53
[Feature] Add comprehensive decimal256 support (#60207)
Signed-off-by: stephen <stephen5217@163.com>
2025-06-27 11:20:25 +08:00
yan zhang 8d2648039c
[Refactor] refactor bitpacking code (#60320)
Why I'm doing:
refactor bitpacking code for further improvement.

What I'm doing:
This PR does the following:

* Merge bit_packing.h and bit_packing.inline.h into bit_packing_default.h. This implementation uses templates and loop unrolling for acceleration, and it uses the namespace util::bitpacking_default instead of the class BitPacking.
* Rename bit_packing_simd.h to bit_packing_avx2.h, because it only uses AVX2 instructions.
* Move the Arrow bit packing code to bit_packing_arrow.h.
* Rename bit_packing_adaptor.h to bit_packing.h, which is the entry file.

So now we have the following files, and the entry file is bit_packing.h:

-rw-rw-r-- 1 zhangyan zhangyan  4861 Jun 26 14:09 bit_packing_arrow.h
-rw-rw-r-- 1 zhangyan zhangyan 19580 Jun 26 14:05 bit_packing_avx2.h
-rw-rw-r-- 1 zhangyan zhangyan 11541 Jun 26 14:03 bit_packing_default.h
-rw-rw-r-- 1 zhangyan zhangyan  1708 Jun 26 14:10 bit_packing.h

Signed-off-by: yan zhang <dirtysalt1987@gmail.com>
2025-06-27 10:20:26 +08:00
Mesut Döner fada758ad4
[Feature] Add strpos function (#57287)
Why I'm doing:
Trying to implement functions from the Good First Issue list.

What I'm doing:
Trino reference: `strpos(string, substring)`.

Fixes #52604

Signed-off-by: Mesut-Doner <mesutdonerng@gmail.com>
2025-06-27 10:01:10 +08:00
Kevin Cai fa25472968
[BugFix] fix incorrect message shown from spill dir configuration (#60339)
Report the correct configuration name when parse_conf_store_paths parses a configuration other than storage_root_path.

Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
2025-06-26 19:56:31 +08:00
xiangguangyxg f278793601
[Feature] Support marking data files as shared in tablet metadata and skipping to delete shared data files in vacuum, leaving them for full gc to clean up (#60140)
Signed-off-by: xiangguangyxg <xiangguangyxg@gmail.com>
2025-06-26 19:20:40 +08:00
Yixin Luo b395fdfd9d
[BugFix] Revert use file info instead of file path in bundle data file reader (#60220) (#60338)
Signed-off-by: luohaha <18810541851@163.com>
2025-06-26 11:07:35 +00:00
Seaven 88ac67846d
[BugFix] Fix prune unused predicate column bug (#60208)
Signed-off-by: Seaven <seaven_7@qq.com>
2025-06-26 17:16:46 +08:00
srlch 1cc2b266fc
[Enhancement] Introduce GC control by cluster snapshot info (#58909)
Signed-off-by: srlch <linzichao@starrocks.com>
2025-06-26 13:51:10 +08:00
shuming.li f85e05bcb8
[BugFix] Fix information_schema.task_runs schema scan bugs (#60296)
Signed-off-by: shuming.li <ming.moriarty@gmail.com>
2025-06-25 20:54:45 +08:00
Yixin Luo 74d6789f62
[Enhancement] support bundle file deletion for deleteTablets (#59966)
Signed-off-by: luohaha <18810541851@163.com>
2025-06-25 19:12:08 +08:00
Rohit Satardekar f3da9857c2
[Feature] support https for brpc connections (#53695)
Signed-off-by: Rohit Satardekar <rohitrs1983@gmail.com>
2025-06-25 18:27:25 +08:00
Yixin Luo 0fa0491f95
[Enhancement] remove useless dir create operations when load spill (#60282)
Signed-off-by: luohaha <18810541851@163.com>
2025-06-25 10:11:23 +00:00
Mesut-Doner da090d8c7c
[Feature] Add boolor function (#57414)
Signed-off-by: Mesut-Doner <mesutdonerng@gmail.com>
Signed-off-by: stdpain <drfeng08@gmail.com>
Co-authored-by: stdpain <drfeng08@gmail.com>
2025-06-25 18:03:20 +08:00
zhangqiang 9c9f55fe34
[BugFix] Fix some bugs in scenarios where file_bundling and alter operations intersect (#60091)
Signed-off-by: sevev <qiangzh95@gmail.com>
2025-06-25 14:06:36 +08:00
trueeyu 2ae06f73fb
[Refactor] Remove core arena mem allocator (#60221)
Branch-3.3 (PR #51263) already set the default value of config::chunk_reserved_bytes_limit to 0 without any performance issue, so we are finally removing the core arena memory allocator from the main branch.

What I'm doing:
Remove core arena mem allocator

Signed-off-by: trueeyu <lxhhust350@qq.com>
2025-06-25 13:21:52 +08:00
Wu Xueyang 43f432565f
[BugFix] Return error asap while executing ShortCircuitHybridScanNode. (#53060)
Signed-off-by: 枢木 <wuxueyang.wxy@alibaba-inc.com>
2025-06-25 11:07:44 +08:00
SevenJ 32ccd0e9b7
[BugFix] fix partition key shuffle error (#60072)
Signed-off-by: SevenJ <wenjun7j@gmail.com>
2025-06-25 09:44:01 +08:00
shuming.li 4dc3df2106
[Enhancement] Add more information for information_schema.task_runs and information_schema.materialized_views (#60054)
Signed-off-by: shuming.li <ming.moriarty@gmail.com>
2025-06-24 19:17:13 +08:00
Yixin Luo b81d60db99
[Enhancement] use file info instead of file path in bundle data file reader (#60220)
Why I'm doing:
When reading bundled data files, we should pass the file info instead of the file path, as the info may contain file size information. In some filesystem implementations, this avoids additional file size fetch requests.

What I'm doing:
This pull request modifies the FileSystem::new_random_access_file_w method in be/src/fs/fs.cpp to improve how RandomAccessFile objects are created by passing the entire FileInfo object instead of just its path attribute.

Changes to FileSystem::new_random_access_file_w:
Updated calls to new_random_access_file to use the full FileInfo object instead of only file_info.path. This ensures that all relevant file metadata is available during the creation of RandomAccessFile instances.
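The benefit can be illustrated with hypothetical types: when the caller already knows the file size, the reader never needs a separate size request. `FileInfo`, `SizeAwareFile` and `fetch_size_remotely` below are illustrative stand-ins, not StarRocks' classes:

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

// Hypothetical file metadata: the size is optional because not every caller knows it.
struct FileInfo {
    std::string path;
    std::optional<int64_t> size;
};

class SizeAwareFile {
public:
    explicit SizeAwareFile(FileInfo info) : _info(std::move(info)) {}

    // Returns the file size, issuing a (simulated) remote size request only
    // when the size was not supplied up front; the result is cached.
    int64_t get_size() {
        if (!_info.size.has_value()) {
            ++remote_size_requests;
            _info.size = fetch_size_remotely(_info.path);
        }
        return *_info.size;
    }

    int remote_size_requests = 0;

private:
    // Stand-in for a HEAD/stat call against object storage.
    static int64_t fetch_size_remotely(const std::string& /*path*/) { return 1 << 20; }

    FileInfo _info;
};
```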

Signed-off-by: luohaha <18810541851@163.com>
2025-06-24 19:11:32 +08:00
Siqi Ling 4b74e7d831
[Enhancement] Add transmitted bytes to FE Auditlog (#58346)
Signed-off-by: Siqi Ling <s.ling@celonis.com>
2025-06-24 14:48:06 +08:00
trueeyu e04a2c14df
[BugFix] Fix the bug in the PageHandle move assignment operator (#60206)
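The commit does not describe the bug itself. As a general reference, a move assignment operator for a handle that owns a cached page usually has to guard against self-move, release what it currently holds, and leave the source empty so that the page is released exactly once. `Handle` is a hypothetical type, not PageHandle:

```cpp
#include <utility>

// Hypothetical RAII handle that owns a heap-allocated "page" and releases it once.
class Handle {
public:
    Handle() = default;
    explicit Handle(int* page) : _page(page) {}
    ~Handle() { reset(); }

    Handle(Handle&& other) noexcept : _page(std::exchange(other._page, nullptr)) {}

    Handle& operator=(Handle&& other) noexcept {
        if (this != &other) {                              // guard against self-move
            reset();                                       // release what we currently hold
            _page = std::exchange(other._page, nullptr);   // steal and clear the source
        }
        return *this;
    }

    Handle(const Handle&) = delete;
    Handle& operator=(const Handle&) = delete;

    int* get() const { return _page; }

    static inline int release_count = 0;  // counts releases, for illustration only

private:
    void reset() {
        if (_page != nullptr) {
            delete _page;
            ++release_count;
            _page = nullptr;
        }
    }

    int* _page = nullptr;
};
```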
Signed-off-by: trueeyu <lxhhust350@qq.com>
2025-06-24 14:07:38 +08:00
shuming.li 310c23aa58
[Enhancement] Support last_day constant evaluation in FE and partition pruning (#59504)
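Constant-folding last_day requires the number of days in a month, including Gregorian leap years. The following is a minimal sketch of that arithmetic. The helper names are hypothetical and this is not the FE implementation:

```cpp
// Gregorian leap-year rule: divisible by 4, except centuries not divisible by 400.
bool is_leap_year(int year) {
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

// Last day of `month` (1..12) in `year`, i.e. the day-of-month that a folded
// last_day(date) would return for any date in that month.
int last_day_of_month(int year, int month) {
    static const int kDays[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    return (month == 2 && is_leap_year(year)) ? 29 : kDays[month - 1];
}
```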
Signed-off-by: shuming.li <ming.moriarty@gmail.com>
2025-06-24 13:39:06 +08:00
stdpain adedb6d506
[BugFix] Fix unsupported nestloop null-aware left anti join (#60119)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-06-24 09:35:52 +08:00
Yixin Luo 529dbb0b09
[BugFix] fix initial tablet meta read when only provide full path (#60132)
Signed-off-by: luohaha <18810541851@163.com>
2025-06-23 13:51:40 +08:00
Yixin Luo 882e816598
[Tool] add meta tool to print bundle tablet meta (#60093)
Signed-off-by: luohaha <18810541851@163.com>
2025-06-20 17:35:28 +08:00
Seaven ec24a2943c
[UT] fix case when expr ut error (#60107)
Signed-off-by: Seaven <seaven_7@qq.com>
2025-06-20 14:28:57 +08:00
zhangqiang 3a72bfe807
[Refactor] Refactor some BE log (#56928)
Signed-off-by: sevev <qiangzh95@gmail.com>
2025-06-20 13:49:35 +08:00
yan zhang 5943373fa9
[Enhancement] optimize count(1) for iceberg table (#60022)
Signed-off-by: yan zhang <dirtysalt1987@gmail.com>
2025-06-20 09:55:50 +08:00
trueeyu 54d4c7e365
[Refactor] Remove ObjectCache interface (#59942)
We have abstracted a local cache engine interface, so the object cache interface is no longer needed.

Signed-off-by: trueeyu <lxhhust350@qq.com>
2025-06-19 20:53:13 +08:00
Yixin Luo 3273d9ac10
[Refactor] rename aggregate/shared prefix to bundle (#60057)
Signed-off-by: luohaha <18810541851@163.com>
2025-06-19 10:30:36 +00:00
Rohit Satardekar d985de00f3
[BugFix] BE crash when invoking list_rowsets() in non-shared mode (#57462)
Fixes #57461

```
mysql> select * from TABLE(list_rowsets(24015, 10));
ERROR 1064 (HY000): Only works for tablets in the cloud-native table: BE:11001
```

Signed-off-by: Rohit Satardekar <rohitrs1983@gmail.com>
2025-06-19 16:09:45 +08:00
wyb 96340a0cef
[BugFix] Fix BE crash caused by filesystem is_symlink exception (#60028)
Signed-off-by: wyb <wybb86@gmail.com>
2025-06-19 15:22:00 +08:00
stephen 52db81f79f
[Enhancement] Optimize int256 division implementation (#59892)
Signed-off-by: stephen <stephen5217@163.com>
2025-06-19 15:14:03 +08:00
zhanghe e51bda89cd
[BugFix] Filesystem cache failure in concurrent scenarios. (#60053)
Signed-off-by: edwinhzhang <edwinhzhang@tencent.com>
2025-06-19 06:09:53 +00:00
Yixin Luo eb02c84ea9
[Enhancement] add touch_cache and get_numeric_statistics implementation to BundleSeekableInputStream (#60034)
Signed-off-by: luohaha <18810541851@163.com>
2025-06-19 12:05:58 +08:00
stephen 8926e97a30
[BugFix] Adapt JDBC type_checker for decimal256 (#60039)
Fix a BE crash with JDBC decimal columns.

Signed-off-by: stephen <stephen5217@163.com>
2025-06-19 09:26:47 +08:00
Seaven 4fb350a848
[BugFix] Fix case-when return collection type bugs (#59972)
Fixes #59921

Signed-off-by: Seaven <seaven_7@qq.com>
2025-06-19 08:48:41 +08:00
Alexey 16ff4bec20
[BugFix] fix window pre-aggregation for count(*) (#60003)
## Why I'm doing:

Fixes a regression introduced by #52466.
If the optimization is disabled with the flag
`enable_push_down_pre_agg_with_rank=false`,
the query executes correctly; otherwise the BE crashes.

Another workaround is to use a specific column name instead of `*`.

Some functions have empty arg_types, for example `count(*)`,
and the following code then crashes:
```
        TypeDescriptor arg_type = TypeDescriptor::from_thrift(fn.arg_types[0]);
```

```
PC: @         0x1330e5b6 std::vector<starrocks::TTypeNode, std::allocator<starrocks::TTypeNode> >::size() const
*** SIGSEGV (@0x10) received by PID 133245 (TID 0x7f7b2a631640) from PID 16; stack trace: ***
    @     0x7f7bf4e3fee8 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x99ee7)
    @         0x1cc76d69 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
    @     0x7f7bf4de8520 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x4251f)
    @         0x1330e5b6 std::vector<starrocks::TTypeNode, std::allocator<starrocks::TTypeNode> >::size() const
    @         0x188448e6 starrocks::TypeDescriptor::TypeDescriptor(std::vector<starrocks::TTypeNode, std::allocator<starrocks::TTypeNode> > const&, int*)
    @         0x13382799 starrocks::TypeDescriptor::from_thrift(starrocks::TTypeDesc const&)
    @         0x1837b490 starrocks::pipeline::LocalPartitionTopnContext::prepare_pre_agg(starrocks::RuntimeState*)
    @         0x1837ad8a starrocks::pipeline::LocalPartitionTopnContext::prepare(starrocks::RuntimeState*, starrocks::RuntimeProfile*)
    @         0x183b62fe starrocks::pipeline::LocalPartitionTopnSinkOperator::prepare(starrocks::RuntimeState*)
    @         0x1764e58f starrocks::pipeline::PipelineDriver::prepare(starrocks::RuntimeState*)
```

## What I'm doing:

Add validation of the parameter: if arg_types is empty, set the argument type to TYPE_UNKNOWN.
This is safe because, for the `count` function, the type is always overridden later:
```cpp
        if (fn.name.function_name == "count") {
            arg_type.type = TYPE_BIGINT;
        }
```

It is also safe for other functions: they are handled by the code below, which returns Status::InternalError instead of crashing the BE.

```cpp
        func = get_window_function(fn.name.function_name, arg_type.type, return_type.type, is_input_nullable,
                                   fn.binary_type, state->func_version());

        if (func == nullptr) {
            return Status::InternalError(strings::Substitute("Invalid window function plan: ($0, $1, $2, $3, $4, $5)",
                                                             fn.name.function_name, arg_type.type, return_type.type,
                                                             is_input_nullable, fn.binary_type, state->func_version()));
        }
```
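The guard can be sketched with simplified stand-in types. `LogicalType` and `FunctionDesc` are hypothetical and are not the thrift TFunction/TTypeDesc. An empty arg_types falls back to TYPE_UNKNOWN instead of being indexed, and `count` is still overridden to TYPE_BIGINT:

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the thrift function descriptor.
enum class LogicalType { TYPE_UNKNOWN, TYPE_INT, TYPE_BIGINT };

struct FunctionDesc {
    std::string function_name;
    std::vector<LogicalType> arg_types;  // empty for count(*)
};

// Resolve the argument type without indexing into an empty vector.
LogicalType resolve_window_arg_type(const FunctionDesc& fn) {
    LogicalType arg_type = fn.arg_types.empty() ? LogicalType::TYPE_UNKNOWN : fn.arg_types[0];
    if (fn.function_name == "count") {
        arg_type = LogicalType::TYPE_BIGINT;  // count's argument type is always overridden
    }
    return arg_type;
}
```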

This code was introduced in 3.4 as an optimization, so the fix is required on main, 3.6, 3.5, and 3.4.

Signed-off-by: Aliaksei Dziomin <diominay@gmail.com>
2025-06-18 10:51:19 +08:00
Kevin Cai 3eefd6cb8b
[BugFix] fix InternalService_RecoverableStub reset_channel for http protocol (#59973)
Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
2025-06-17 15:30:40 +08:00
Gavin 494aca6b97
[BugFix] Fix the `disk_device_id` function to acquire the device_id of a non-existent entry from its ancestor entries. (#59919)
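One way to get the device of a path that does not exist yet is to walk up to its nearest existing ancestor and read that entry's st_dev. This is a POSIX-only sketch under that assumption. `device_id_of` is a hypothetical helper, not StarRocks' disk_device_id:

```cpp
#include <sys/stat.h>

#include <filesystem>
#include <optional>

// Hypothetical helper: returns st_dev of `path`, or of its nearest existing
// ancestor when `path` itself does not exist (yet). POSIX only.
std::optional<dev_t> device_id_of(const std::filesystem::path& path) {
    std::filesystem::path current = std::filesystem::absolute(path);
    while (true) {
        struct stat st {};
        if (::stat(current.c_str(), &st) == 0) {
            return st.st_dev;
        }
        std::filesystem::path parent = current.parent_path();
        if (parent.empty() || parent == current) {
            return std::nullopt;  // reached the root without finding an existing entry
        }
        current = parent;
    }
}
```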
Signed-off-by: GavinMar <yangguansuo@starrocks.com>
2025-06-17 11:22:38 +08:00
Kevin Cai 9dc1a6931c
[BugFix] Fix InternalService_RecoverableStub race condition (#59933)
Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
2025-06-16 18:07:36 -07:00
zhangqiang ad08b125aa
[BugFix] Fix aggregate publish failed when lake table has rollup (#59873)
Signed-off-by: sevev <qiangzh95@gmail.com>
2025-06-16 17:04:32 +08:00
xiangguangyxg 8fee4930ad
[BugFix] Fix forgetting to set virtual buckets when BE is upgraded to the new version but FE is still on the old version (#59922)
Signed-off-by: xiangguangyxg <xiangguangyxg@gmail.com>
2025-06-16 14:34:55 +08:00
stephen 5d3a511498
[Feature] Support write and read decimal256 (#59778)
Signed-off-by: stephen <stephen5217@163.com>
2025-06-16 11:03:31 +08:00
wyb 84ecab9cdf
[Enhancement] Change time-related column type to datetime in FeTabletSchedules system table (#59813)
## Why I'm doing:

The `CREATE_TIME`, `SCHEDULE_TIME`, and `FINISH_TIME` columns are of type double (Unix timestamps in seconds):

```
mysql> select * from fe_tablet_schedules;
+----------+--------------+-----------+--------+----------+-----------+----------------+----------------+----------------+----------------+-----------+------------+-------------+----------------+-------------------------------------+
| TABLE_ID | PARTITION_ID | TABLET_ID | TYPE   | PRIORITY | STATE     | TABLET_STATUS  | CREATE_TIME    | SCHEDULE_TIME  | FINISH_TIME    | CLONE_SRC | CLONE_DEST | CLONE_BYTES | CLONE_DURATION | MSG                                 |
+----------+--------------+-----------+--------+----------+-----------+----------------+----------------+----------------+----------------+-----------+------------+-------------+----------------+-------------------------------------+
|    52033 |        52032 |     52036 | REPAIR | HIGH     | CANCELLED | DISK_MIGRATION | 1749609761.376 | 1749609762.384 | 1749609762.384 |        -1 |         -1 |           0 |              0 | there is no replica need to migrate |
```

## What I'm doing:

Change the column type from double to datetime:

```
mysql> select * from fe_tablet_schedules;
+----------+--------------+-----------+--------+----------+-----------+----------------+---------------------+---------------------+---------------------+-----------+------------+-------------+----------------+-------------------------------------+
| TABLE_ID | PARTITION_ID | TABLET_ID | TYPE   | PRIORITY | STATE     | TABLET_STATUS  |    CREATE_TIME      |  SCHEDULE_TIME      |    FINISH_TIME      | CLONE_SRC | CLONE_DEST | CLONE_BYTES | CLONE_DURATION | MSG                                 |
+----------+--------------+-----------+--------+----------+-----------+----------------+---------------------+---------------------+---------------------+-----------+------------+-------------+----------------+-------------------------------------+
|    52033 |        52032 |     52036 | REPAIR | HIGH     | CANCELLED | DISK_MIGRATION | 2025-06-11 10:42:41 | 2025-06-11 10:42:42 | 2025-06-11 10:42:42 |        -1 |         -1 |           0 |              0 | there is no replica need to migrate |
```
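The two outputs are consistent with each other: 1749609761.376 is 2025-06-11 02:42:41 UTC, which is 10:42:41 at UTC+8. The new rows therefore appear to be rendered in a UTC+8 time zone with the fractional seconds dropped. The following sketch performs the conversion in UTC. The helper name is hypothetical:

```cpp
#include <cmath>
#include <ctime>
#include <string>

// Hypothetical helper: render a Unix timestamp in seconds (as the old DOUBLE
// columns stored it) as 'YYYY-MM-DD HH:MM:SS' in UTC, dropping the fraction.
std::string unix_seconds_to_datetime_utc(double seconds) {
    const std::time_t t = static_cast<std::time_t>(std::floor(seconds));
    std::tm tm_utc{};
    gmtime_r(&t, &tm_utc);  // POSIX, thread-safe variant of std::gmtime
    char buf[32];
    std::strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S", &tm_utc);
    return buf;
}
```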

Signed-off-by: wyb <wybb86@gmail.com>
Co-authored-by: alvin <115669851+alvin-celerdata@users.noreply.github.com>
2025-06-13 10:17:00 -07:00
trueeyu a7fecb900c
[Refactor] Adjust some naming related to the Cache (#59877)
Interface of cache engine:
- LocalCache -> LocalCacheEngine
- RemoteCache -> RemoteCacheEngine

Implementation of local cache engine:
- StarCacheWrapper -> StarCacheEngine
- TODO: add LRUCacheEngine

Implementation of remote cache engine:
- PeerCacheWrapper -> PeerCacheEngine

Metrics:
- dummy_types.h -> cache_metrics.h

Signed-off-by: trueeyu <lxhhust350@qq.com>
2025-06-13 09:01:48 -07:00
stephen 2bd5b502b4
[Enhancement] Optimize int256 multiplication implementation (#59862) 2025-06-13 19:48:48 +08:00
stephen d386665964
[Feature] support int256 data type (#59808)
Signed-off-by: stephen <stephen5217@163.com>
2025-06-13 11:31:44 +08:00
trueeyu 4b80b35067
[Refactor] Refactor cache metrics (#59846)
Signed-off-by: trueeyu <lxhhust350@qq.com>
2025-06-13 10:13:55 +08:00
Kevin Cai 8d397ed3d6
[BugFix] fix MemoryScratchSinkOperator set_cancelled issue (#59810)
Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
2025-06-12 13:34:38 +08:00
starrocks-xupeng 5b144be9d7
[BugFix] fix cache select error after DDL (#59812)
Signed-off-by: starrocks-xupeng <xupeng@starrocks.com>
2025-06-12 04:30:26 +00:00
Rohit Satardekar 7a664b7e7a
[BugFix] StreamLoad not working for URL encoded chinese characters (#59722)
RestBaseAction.java redirectTo() processes the table name incorrectly; it should use resultUriObj.toASCIIString() instead of resultUriObj.toString().

Fixes #59721

```
curl -v -k --location-trusted -uroot -H "label:table1" -H "column_separator:," -T data.csv http://127.0.0.1:8030/api/mydb/testStreamLoad%E6%A1%8C/_stream_load
```
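In the URL above, `%E6%A1%8C` is the UTF-8 encoding of the Chinese character 桌. java.net.URI#toASCIIString() percent-encodes such non-ASCII characters, while toString() leaves them unencoded. The following hypothetical C++ helper approximates that encoding step. Passing ASCII through unchanged is a simplification:

```cpp
#include <string>

// Hypothetical helper: percent-encode every non-ASCII byte of a UTF-8 string,
// roughly what URI#toASCIIString() adds on top of toString().
std::string percent_encode_non_ascii(const std::string& utf8) {
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : utf8) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));  // ASCII: keep as-is
        } else {
            out.push_back('%');
            out.push_back(kHex[c >> 4]);
            out.push_back(kHex[c & 0x0F]);
        }
    }
    return out;
}
```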

Signed-off-by: Rohit Satardekar <rohitrs1983@gmail.com>
2025-06-11 09:10:12 -07:00
Yakir Gibraltar db09d4627d
[BugFix] Fix #59802 - incorrect HDFS write buffer size (#59803)
Why I'm doing:
Bug in HDFS upload operations where the write buffer size is incorrectly set to a boolean value (1) instead of the actual configured buffer size.

What I'm doing:
Fixed line 598 in be/src/fs/hdfs/fs_hdfs.cpp by changing the assignment from _options.upload->__isset.hdfs_write_buffer_size_kb (boolean flag) to _options.upload->hdfs_write_buffer_size_kb (actual buffer size value).
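Thrift-generated C++ structs keep an optional field's value in the field itself and record its presence in a separate `__isset` flag, so reading the flag yields `true` (1). The following is a simplified imitation of the bug and the fix. `UploadOptions` and the `default_kb` fallback are hypothetical:

```cpp
#include <cstdint>

// Simplified imitation of a Thrift-generated struct with one optional field:
// the value lives in the field, and __isset only records whether it was set.
struct UploadOptions {
    int64_t hdfs_write_buffer_size_kb = 0;
    struct Isset {
        bool hdfs_write_buffer_size_kb = false;
    } __isset;
};

// Buggy: returns the presence flag, so any configured size becomes 1 (KB).
int64_t write_buffer_kb_buggy(const UploadOptions& opts, int64_t default_kb) {
    return opts.__isset.hdfs_write_buffer_size_kb ? opts.__isset.hdfs_write_buffer_size_kb : default_kb;
}

// Fixed: checks the flag, then returns the actual configured value.
int64_t write_buffer_kb_fixed(const UploadOptions& opts, int64_t default_kb) {
    return opts.__isset.hdfs_write_buffer_size_kb ? opts.hdfs_write_buffer_size_kb : default_kb;
}
```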

Fixes #59802
2025-06-11 08:59:02 -07:00
zhangqiang 249ee8a7b9
[BugFix] Revert PR #59009 (#59815) 2025-06-11 23:33:10 +08:00
stdpain 42a52b46cb
[BugFix] Fix race condition in merge_isomorphic_profiles (#59809)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-06-11 21:00:12 +08:00
trueeyu 7dcd18f1cd
[UT] Fix the compile problem using --without-starcache (#59794)
Signed-off-by: trueeyu <lxhhust350@qq.com>
2025-06-11 14:45:17 +08:00
stephen f2ba44320c
[Feature] Create table supports decimal256 column type (#59651)
## What I'm doing:
Support creating tables with decimal precision in the range (38, 76], for example:
```
CREATE TABLE test_decimal (k1 int, k2 int, k3 int , d1 decimal(50, 24));
```
Decimal precision is now supported in the range [1, 76], and decimal scale in the range [0, 76].
Also, this patch removes some unused code.
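The following sketch validates the new ranges and picks the narrowest storage width. The 9/18/38/76 thresholds are the maximum number of full decimal digits that fit in signed 32/64/128/256-bit integers. The DECIMAL32/64/128 split and the `scale <= precision` check are assumptions of this sketch, not statements from this PR:

```cpp
#include <optional>

enum class DecimalStorage { DECIMAL32, DECIMAL64, DECIMAL128, DECIMAL256 };

// Hypothetical validator: returns the narrowest storage type for
// DECIMAL(precision, scale), or nothing if the pair is out of range.
std::optional<DecimalStorage> decimal_storage_for(int precision, int scale) {
    if (precision < 1 || precision > 76 || scale < 0 || scale > precision) {
        return std::nullopt;
    }
    if (precision <= 9) return DecimalStorage::DECIMAL32;
    if (precision <= 18) return DecimalStorage::DECIMAL64;
    if (precision <= 38) return DecimalStorage::DECIMAL128;
    return DecimalStorage::DECIMAL256;  // the new (38, 76] range
}
```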

Fixes https://github.com/StarRocks/starrocks/issues/59645

Signed-off-by: stephen <stephen5217@163.com>
2025-06-10 08:58:22 -07:00
PengFei Li 9e9cd96640
[Enhancement] Add failpoints for load (#59518)
Signed-off-by: PengFei Li <lpengfei2016@gmail.com>
2025-06-10 23:01:20 +08:00
PengFei Li 0171f81d01
[BugFix] Fix load failure caused by a single secondary replica failure (#59762)
Signed-off-by: PengFei Li <lpengfei2016@gmail.com>
2025-06-10 02:23:50 +00:00
Giorgio Pellero 9d0fc49bc5
[BugFix] do not set basic auth header if credentials not provided (#54960)
Why I'm doing:
While the FE already has this logic, the BE would still send an empty Authorization: Basic header to ElasticSearch even if credentials were not provided; this causes a 403 even if the ES cluster has no AuthN/AuthZ settings. This change allows using StarRocks with ElasticSearch external catalogs even with Basic Auth disabled.

What I'm doing:
Adds a check for empty credentials (both username and password) in the ElasticSearch scan reader, in which case the Authorization: Basic ... header is not set.
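Here is a minimal sketch of that check, reading "empty credentials" as both the username and the password being empty (an assumption). `basic_auth_header` is hypothetical. It returns no value when there is nothing to send, and otherwise the standard `Basic base64(user:password)` value:

```cpp
#include <optional>
#include <string>

// Standard Base64 (RFC 4648) with '=' padding.
std::string base64_encode(const std::string& in) {
    static const char kTable[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    size_t i = 0;
    while (i + 2 < in.size()) {  // full 3-byte groups
        const unsigned v = (static_cast<unsigned char>(in[i]) << 16) |
                           (static_cast<unsigned char>(in[i + 1]) << 8) |
                           static_cast<unsigned char>(in[i + 2]);
        out += kTable[(v >> 18) & 0x3F];
        out += kTable[(v >> 12) & 0x3F];
        out += kTable[(v >> 6) & 0x3F];
        out += kTable[v & 0x3F];
        i += 3;
    }
    const size_t rest = in.size() - i;
    if (rest == 1) {
        const unsigned v = static_cast<unsigned char>(in[i]) << 16;
        out += kTable[(v >> 18) & 0x3F];
        out += kTable[(v >> 12) & 0x3F];
        out += "==";
    } else if (rest == 2) {
        const unsigned v = (static_cast<unsigned char>(in[i]) << 16) |
                           (static_cast<unsigned char>(in[i + 1]) << 8);
        out += kTable[(v >> 18) & 0x3F];
        out += kTable[(v >> 12) & 0x3F];
        out += kTable[(v >> 6) & 0x3F];
        out += '=';
    }
    return out;
}

// Hypothetical helper: the Authorization header value, or nothing when no
// credentials are configured, so no empty "Basic" header is sent.
std::optional<std::string> basic_auth_header(const std::string& user, const std::string& password) {
    if (user.empty() && password.empty()) {
        return std::nullopt;
    }
    return "Basic " + base64_encode(user + ":" + password);
}
```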

Signed-off-by: Giorgio Pellero <giorgio.pellero@gmail.com>
Co-authored-by: alvin <115669851+alvin-celerdata@users.noreply.github.com>
2025-06-09 18:42:33 -07:00
Yixin Luo ccdcd58332
[BugFix] Fix multiple issues with bundled data file reading failures (#59720)
Why I'm doing:
In the previous PR #58923, we introduced the bundle data file feature.
In the current implementation, several issues may cause bundled data file reading failures:

- During multi-partition imports, bundled data files may be shared across partitions. Since data files cannot exist across partition directories, the system fails to locate the bundled files.
- Preloading initiated before bundled data files are fully persisted results in "file not found" errors.
- Incorrect usage of the bundled data file reading APIs in multiple code paths leads to data reading anomalies.

What I'm doing:
Fix #58316
This PR fixes the issues above.

Enhancements for Bundling Data Files:
- New Method for Checking Bundling Status: Added _is_data_file_bundle_enabled method in LakeTabletsChannel to encapsulate logic for determining whether data file bundling is enabled based on request parameters. ([[1]](https://github.com/StarRocks/starrocks/pull/59720/files#diff-cb0fbac7cd7174aa5fccc214e7e862f79a8e11a9666c0fa36a8205ab4d09fcb5R209-R210), [[2]](https://github.com/StarRocks/starrocks/pull/59720/files#diff-cb0fbac7cd7174aa5fccc214e7e862f79a8e11a9666c0fa36a8205ab4d09fcb5R668-R672))
- Partition-Based Bundling Context: Replaced the single BundleWritableFileContext with a partition-based mapping (_bwfile_ctxs_by_partition) to manage bundled files more effectively. ([[1]](https://github.com/StarRocks/starrocks/pull/59720/files#diff-cb0fbac7cd7174aa5fccc214e7e862f79a8e11a9666c0fa36a8205ab4d09fcb5L238-R241), [[2]](https://github.com/StarRocks/starrocks/pull/59720/files#diff-cb0fbac7cd7174aa5fccc214e7e862f79a8e11a9666c0fa36a8205ab4d09fcb5R710-R716))
- Delta Writer Updates: Modified _create_delta_writers to use partition-specific bundling contexts and updated the DeltaWriter builder to support bundled files. ([[1]](https://github.com/StarRocks/starrocks/pull/59720/files#diff-cb0fbac7cd7174aa5fccc214e7e862f79a8e11a9666c0fa36a8205ab4d09fcb5R710-R716), [[2]](https://github.com/StarRocks/starrocks/pull/59720/files#diff-cb0fbac7cd7174aa5fccc214e7e862f79a8e11a9666c0fa36a8205ab4d09fcb5L726-R737))

File Handling Improvements:
- File Access Updates: Replaced new_random_access_file with new_random_access_file_with_bundling in multiple locations to accommodate bundled file handling. ([[1]](https://github.com/StarRocks/starrocks/pull/59720/files#diff-a5433fc4c5ee849e918b1c2eb9974d5aecc02b3903155ed745b7c7ce6fa42461L192-R192), [[2]](https://github.com/StarRocks/starrocks/pull/59720/files#diff-3c073f242259e54f685bafc4fd19da86b21969be5de8ae5b81ca0a4ee451c325L43-R43))
- Skip Preload Logic: Enhanced DeltaWriterImpl to skip primary key preload when bundled files are present, ensuring correct handling of unflushed data. ([be/src/storage/lake/delta_writer.cppL616-R618](https://github.com/StarRocks/starrocks/pull/59720/files#diff-24e44403f5e4c75144f9645f2fcff77dcdd9269836bafd5b79d5b2d6589e49fdL616-R618))

Testing:
- New Test Case: Added test_write_bundling_file in LakeTabletsChannelTest to validate the functionality of data file bundling, including partition handling and transaction log verification. ([be/test/runtime/lake_tablets_channel_test.cppR444-R508](https://github.com/StarRocks/starrocks/pull/59720/files#diff-24bc5f1765c03c495d5cd72f1dff37877249a0f187b1b608e1010e3284b460a0R444-R508))

Miscellaneous:
- Comment Updates: Updated comments to reflect the new bundling terminology, replacing "shared data file" with "bundle data file." ([be/src/storage/lake/general_tablet_writer.cppL132-R132](https://github.com/StarRocks/starrocks/pull/59720/files#diff-3dc155c1f66ca23166df68c90b2dd007553dcdb5614809d09dc669af8d406593L132-R132))
- Additional Includes: Included <cstdint> in tablet_retain_info.h for consistency and completeness. ([be/src/storage/lake/tablet_retain_info.hR17](https://github.com/StarRocks/starrocks/pull/59720/files#diff-1249fb64d433d33db81344d35f81ac9e5d933a5151bf1dc2af76bb0152163b39R17))
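The partition-based mapping can be sketched as a map from partition id to a lazily created context, so that tablets of different partitions never share a bundle file. `BundleContexts`, `BundleFileContext` and the path layout are hypothetical stand-ins for `_bwfile_ctxs_by_partition` and BundleWritableFileContext. The sketch is not thread-safe:

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical per-partition bundle-writer context.
struct BundleFileContext {
    int64_t partition_id;
    std::string bundle_path;  // the bundle file lives under this partition's directory
};

class BundleContexts {
public:
    // Returns the context for `partition_id`, creating it on first use.
    BundleFileContext* get_or_create(int64_t partition_id) {
        auto it = _by_partition.find(partition_id);
        if (it == _by_partition.end()) {
            auto ctx = std::make_unique<BundleFileContext>(BundleFileContext{
                    partition_id, "/data/" + std::to_string(partition_id) + "/bundle.dat"});
            it = _by_partition.emplace(partition_id, std::move(ctx)).first;
        }
        return it->second.get();
    }

    size_t size() const { return _by_partition.size(); }

private:
    std::unordered_map<int64_t, std::unique_ptr<BundleFileContext>> _by_partition;
};
```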

Signed-off-by: luohaha <18810541851@163.com>
2025-06-09 14:30:42 -07:00
srlch 3c91db69cf
[BugFix] Fix compile problem for tablet_retain_info (#59736)
Signed-off-by: srlch <linzichao@starrocks.com>
2025-06-09 16:48:41 +08:00
zombee0 a63f481ed2
[BugFix]deal with RemoteFileNotFound when get_next (#59733)
Signed-off-by: zombee0 <ewang2027@gmail.com>
2025-06-09 15:36:31 +08:00
chen9t bf7d424706
[BugFix] Use correct parameter for pulsar_consumer.seek() (#34722)
According to the Pulsar C++ API, consumer.seek() has two overloads:

```cpp
Result seek(const MessageId& messageId);
Result seek(uint64_t timestamp);
```

We intended to call the first one, but passed the wrong argument here:

```cpp
if (initial_position == InitialPosition::LATEST || initial_position == InitialPosition::EARLIEST) {
    pulsar::InitialPosition p_initial_position = initial_position == InitialPosition::LATEST
                                                         ? pulsar::InitialPosition::InitialPositionLatest
                                                         : pulsar::InitialPosition::InitialPositionEarliest;
    result = _p_consumer.seek(p_initial_position);
    ...
}
```

Because pulsar::InitialPosition converts implicitly to an integer, this call resolves to the second overload, seek(uint64_t timestamp), instead.
This PR fixes the issue.
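The overload resolution can be reproduced with mock types shaped like the two seek overloads. These are not the real pulsar-client-cpp headers. The intended call would pass a MessageId, for example pulsar::MessageId::earliest() or pulsar::MessageId::latest():

```cpp
#include <cstdint>
#include <string>

namespace mock {
// Unscoped enum: its enumerators convert implicitly to integer types.
enum InitialPosition { InitialPositionLatest, InitialPositionEarliest };

struct MessageId {
    std::string label;
};

struct Consumer {
    std::string last_call;  // records which overload ran
    void seek(const MessageId& id) { last_call = "seek(MessageId " + id.label + ")"; }
    void seek(uint64_t timestamp) { last_call = "seek(timestamp " + std::to_string(timestamp) + ")"; }
};
}  // namespace mock
```

Passing `mock::InitialPositionEarliest` silently selects `seek(uint64_t)` with timestamp 1, because no conversion from the enum to `MessageId` exists.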

Signed-off-by: jiutianchen <chen9t@gmail.com>
2025-06-08 10:58:37 -07:00
Yufei Liu 0c4671e154
[BugFix] fix rpc closure unexpected delete (#19235) (#19239)
Signed-off-by: liuyufei9527 <liuyufei9527@gmail.com>
Signed-off-by: alvin <115669851+alvin-celerdata@users.noreply.github.com>
Co-authored-by: alvin <115669851+alvin-celerdata@users.noreply.github.com>
2025-06-07 17:42:12 -07:00
duanyyyyyyy bdabb3a9a4
[BugFix] Fix a bug that resource_group_mem_allocated_bytes used twice… (#59716)
In an earlier patch, I mistakenly copied the wrong metric name.

Signed-off-by: ‘duanyyyyyyy’ <yan.duan9759@gmail.com>
2025-06-06 14:49:25 -07:00
srlch 1678ad7f29
[Enhancement] Introduce TabletRetainInfo for vacuum (#59686)
Introduce the TabletRetainInfo class, which saves the information for the specified versions that the vacuum process uses to determine which files (data or metadata) can be deleted.

Signed-off-by: srlch <linzichao@starrocks.com>
2025-06-06 09:30:00 -07:00
zhangqiang 7b3811e534
[Refactor] Rename enable_partiton_aggregate to file_bundling (#59562)
Signed-off-by: sevev <qiangzh95@gmail.com>
2025-06-05 16:03:02 +08:00
Seaven 50121ff78a
[UT] update flat json encoding ut (#59609)
Signed-off-by: Seaven <seaven_7@qq.com>
2025-06-05 06:35:09 +00:00
xiangguangyxg 53218f7642
[Feature] Distributing data according to virtual buckets for dynamic tablet splitting and merging (#59433)
Signed-off-by: xiangguangyxg <xiangguangyxg@gmail.com>
2025-06-05 14:14:29 +08:00
Yixin Luo e6071ba02c
[Feature] support data file bundling (#58923) 2025-06-04 15:48:47 +00:00
Seaven b6ba36583b
[BugFix] Fix flat json set child encoding error (#59578)
Signed-off-by: Seaven <seaven_7@qq.com>
2025-06-04 19:33:17 +08:00
stdpain b4fcb4bfa4
[BugFix] Fix compilation errors in gcc-14 (#59589)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-06-04 19:17:33 +08:00
before-Sunrise eb60766312
[BugFix] keep lambdaFunction's isConst as before, array_contains fall back to non-const process when state is nullptr (#59577)
Signed-off-by: before-Sunrise <unclejyj@gmail.com>
2025-06-04 18:45:55 +08:00
Kevin Cai 76fb97f83e
[BugFix] defensive coding against empty partition (#59553)
Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
2025-06-04 17:28:34 +08:00
wyb fd8fcfe36a
[BugFix] Fix string to infinity float bug (#59574)
Signed-off-by: wyb <wybb86@gmail.com>
2025-06-04 05:57:51 +00:00
stdpain bb00de2af5
[Enhancement] support build aggregate runtime in-filters (#59288)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-06-04 10:00:02 +08:00
stdpain 6f74df696d
[BugFix] Fix invalid connections are fetched in the thrift connection pools (#59536)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-06-04 09:48:18 +08:00
trueeyu 105b2f6dfe
[Refactor] Use page cache instead of object cache to cache external file footer (#59537)
Why I'm doing:
LRUCacheModule and StarCacheModule are only the wrappers of object cache interfaces. PageCache is used to cache the decompressed or decoded page, and BlockCache is used to cache the raw block of external tables or shared-data tables.

What I'm doing:
Use page cache instead of object cache to cache external file footer

Signed-off-by: trueeyu <lxhhust350@qq.com>
2025-06-03 08:58:50 -07:00
xiangguangyxg 4b9f9e2a5a
[Feature] Change bucket number from physical partition level to materialized index level (#59441)
Why I'm doing:
This is a preliminary work of tablet splitting and merging.
Previously, the bucket number was defined at the physical partition level, so all materialized indexes in a physical partition had to have the same bucket number. After tablet splitting, however, different materialized indexes in a physical partition may have different bucket numbers, so the bucket number must move from the physical partition level to the materialized index level.

What I'm doing:
Change bucket number from physical partition level to materialized index level.

Because different materialized indexes in a physical partition can have different bucket numbers, the tablet sink can no longer calculate a single tablet index for each record that is distributed to different materialized indexes.
To solve this, we refactored the tablet sink. It now calculates one unified hash value for each record; when the record is distributed to a materialized index, its tablet index is calculated from that hash value and the number of tablets in that materialized index.
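The following sketches "hash once, map per index", assuming the tablet index is the hash modulo the index's tablet count (the PR only says it is derived from the hash and the tablet count). FNV-1a stands in for whatever hash the tablet sink actually uses, and `tablet_indexes_for_row` is hypothetical:

```cpp
#include <cstdint>
#include <string_view>
#include <vector>

// FNV-1a 64-bit, used here only as a deterministic example hash.
uint64_t fnv1a64(std::string_view key) {
    uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : key) {
        h ^= c;
        h *= 1099511628211ULL;  // FNV prime
    }
    return h;
}

// Hash the distribution key once, then map it into each materialized index
// using that index's own tablet count.
std::vector<uint32_t> tablet_indexes_for_row(std::string_view dist_key,
                                             const std::vector<uint32_t>& tablets_per_index) {
    const uint64_t hash = fnv1a64(dist_key);
    std::vector<uint32_t> result;
    result.reserve(tablets_per_index.size());
    for (uint32_t num_tablets : tablets_per_index) {
        result.push_back(static_cast<uint32_t>(hash % num_tablets));
    }
    return result;
}
```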

This PR only refactors the code. The next PR will remove num_bucket from OlapTablePartition and use tablets.size() of each OlapTableIndexTablets instead.

Fixes #59134

Signed-off-by: xiangguangyxg <xiangguangyxg@gmail.com>
2025-06-03 08:55:45 -07:00
before-Sunrise dde8dac9f7
[BugFix] fix LambdaFunction's isConst always return false (#59510)
Signed-off-by: before-Sunrise <unclejyj@gmail.com>
2025-06-03 13:39:47 +08:00
stdpain f224abe7b4
[BugFix] Avoid allocating memory in dontdump_unused_pages (#59538)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-06-03 13:36:17 +08:00
meegoo 3fb06e0ff1
[BugFix] Fix backend hang in fatal log (#59340)
dontdump_unused_pages() is invoked as a callback while a FATAL log is being written. LOG must not be used again to print logs at that point; doing so causes a deadlock.

Signed-off-by: meegoo <meegoo.sr@gmail.com>
2025-06-01 15:34:11 -07:00
AlgoLin 8c1a27573e
[BugFix] fix spelling errors in log output (#27822)
Signed-off-by: linenwei1 <linenwei1@jd.com>
2025-05-31 07:32:19 -07:00
trueeyu 5ddcdec836
[Enhancement] PageCache interface supports the evict_probability option (#59492)
Why I'm doing:
The interface of PageCache now supports the option evict_probability, which will be used by external tables.
For shared LRU cache, calculating memory usage is expensive. Temporarily disable the eviction probability's dependency on memory usage.
Remove the kept_in_memory option, which is currently unused. Use priority instead.

Signed-off-by: trueeyu <lxhhust350@qq.com>
2025-05-30 09:35:21 -07:00
Yixin Luo d9a3c105f9
[BugFix] fix skip write txnlog when compaction fail (#59508)
Signed-off-by: luohaha <18810541851@163.com>
2025-05-30 10:58:52 +00:00
zhangqiang 8d0f2561f4
[Feature] Support enabling partition aggregation for tables created in older versions (#59102)
Signed-off-by: sevev <qiangzh95@gmail.com>
2025-05-30 14:03:44 +08:00
Yixin Luo 1f057a298b
[BugFix] fix object storage dir delete after load spill (#59480)
Signed-off-by: luohaha <18810541851@163.com>
2025-05-30 02:23:30 +00:00
trueeyu a86ab93f33
[Refactor] HiveDataSource will use page_cache_available instead of relying solely on configuration to determine whether to use page cache (#59465)
Why I'm doing:

The original implementation of get_capacity() acquired locks for all shards one by one, and then summed their capacities, which was inefficient. To reduce lock contention, I modified it to retrieve the total capacity directly from the SharedLRUCache directory entry, which avoids locking multiple shards.
PageCache is used by both external and internal tables, so I moved page_cache.h from storage/rowset to cache/object_cache.
HiveDataSource will use func page_cache_available instead of relying solely on configuration to determine whether to use page cache.
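The get_capacity() change can be sketched as a sharded structure that mirrors the total capacity in an atomic, so that reading it takes no shard locks. `ShardedCapacity` is hypothetical and much simpler than SharedLRUCache:

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <vector>

class ShardedCapacity {
public:
    ShardedCapacity(size_t num_shards, size_t per_shard)
            : _shards(num_shards), _total(num_shards * per_shard) {
        for (auto& s : _shards) s.capacity = per_shard;
    }

    // Updates one shard under its own lock and keeps the atomic total in sync.
    void set_shard_capacity(size_t shard, size_t capacity) {
        std::lock_guard<std::mutex> guard(_shards[shard].mu);
        _total.fetch_add(capacity, std::memory_order_relaxed);
        _total.fetch_sub(_shards[shard].capacity, std::memory_order_relaxed);
        _shards[shard].capacity = capacity;
    }

    // Old approach: lock every shard in turn and sum.
    size_t get_capacity_locking() {
        size_t sum = 0;
        for (auto& s : _shards) {
            std::lock_guard<std::mutex> guard(s.mu);
            sum += s.capacity;
        }
        return sum;
    }

    // New approach: a single atomic load, no shard locks.
    size_t get_capacity() const { return _total.load(std::memory_order_relaxed); }

private:
    struct Shard {
        std::mutex mu;
        size_t capacity = 0;
    };
    std::vector<Shard> _shards;
    std::atomic<size_t> _total;
};
```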

What I'm doing:

HiveDataSource will use page_cache_available instead of relying solely on configuration to determine whether to use page cache

Signed-off-by: trueeyu <lxhhust350@qq.com>
2025-05-29 15:37:12 -07:00
Dan Jing d8cbd13840
[Enhancement] Add log for tablet id of max_tablet_rowset_num (#59467)
Signed-off-by: Dan Jing <jingdan@starrocks.com>
2025-05-29 10:51:51 -07:00
Kevin Cai 2638c2aee5
[Enhancement] BE wait for at least one heartbeat from frontend during graceful exit (#59428)
Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
2025-05-28 21:19:33 -07:00
trueeyu 900dbea505
[Refactor] Use the storage page cache instead of the object cache to cache the decompressed data of external tables (#59341)
Signed-off-by: trueeyu <lxhhust350@qq.com>
2025-05-29 11:42:32 +08:00
Yixin Luo c42c26d919
[Enhancement] pk compaction should not retry when apply timeout (#59440)
Signed-off-by: luohaha <18810541851@163.com>
2025-05-29 11:41:59 +08:00
zhangqiang 323078ba0a
[BugFix] Fix storage_size lost in partitions_meta table (#59434)
## Why I'm doing:
This PR (https://github.com/StarRocks/starrocks/pull/56234) added storage_size for lake tables and exposed it in the partition proc, but storage_size was not added to the `partitions_meta` table.

## What I'm doing:
Add storage_size to table `partitions_meta`

Signed-off-by: sevev <qiangzh95@gmail.com>
2025-05-28 09:57:10 -07:00
Kevin Cai a59852d6af
[BugFix] stream load workflow check process exit status in a few points (#59362)
Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
2025-05-28 08:55:09 -07:00
trueeyu b7c39b9d5c
[Enhancement] Fix the cache select behavior when datacache is disabled (#59410)
Signed-off-by: trueeyu <lxhhust350@qq.com>
2025-05-28 16:26:44 +08:00
stdpain 69fc6bdecb
[Enhancement] Add SignalTimerGuard class for thread stack trace timeout monitoring (#59380)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-05-28 14:47:28 +08:00
stdpain e8b0cd792c
[Enhancement] fix unexpected function call in fill_dst_column (#59411)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-05-28 13:52:57 +08:00
zombee0 09f7993876
[BugFix]Don't return error when decoding min/max is not supported (#59346)
We do not support decoding the min/max values of some types, but that should not
result in an error status being returned to users.
On the main branch and branch-3.5 there is no bug; we just handle this error status
as early as possible to avoid further bugs.
On branch-3.4 and branch-3.3 there is a bug when the predicate type is in-filter
or rf-min-max and the data type is float/double.
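In zone-map terms, unsupported min/max decoding can be treated as unknown statistics, which disables pruning instead of failing the query. Here is a hypothetical sketch; `MinMax` and `may_contain` are illustrative names:

```cpp
#include <optional>

// Decoded column statistics; absent when decoding is unsupported for the type.
struct MinMax {
    double min;
    double max;
};

// Returns true if a row group with these stats may contain values in [lo, hi].
bool may_contain(const std::optional<MinMax>& stats, double lo, double hi) {
    if (!stats.has_value()) {
        return true;  // unknown stats: keep the row group, never report an error
    }
    return !(stats->max < lo || stats->min > hi);
}
```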

Signed-off-by: zombee0 <ewang2027@gmail.com>
2025-05-27 19:50:48 -07:00
Hechem Selmi c68c2fb732
[Enhancement] Push down limit to multi cast sink (#59265)
Signed-off-by: m-selmi <m.selmi@celonis.com>
2025-05-28 09:57:52 +08:00
PengFei Li 58569bb871
[Enhancement] Secondary replica supports to poll state from primary replica (#57539)
Signed-off-by: PengFei Li <lpengfei2016@gmail.com>
2025-05-28 09:28:26 +08:00
Yixin Luo 2f1ec5e920
[BugFix] fix persistent index compatibility issue when migrate between different cpu arch (#59219)
Signed-off-by: luohaha <18810541851@163.com>
2025-05-26 22:58:33 -07:00
Binglin Chang 3c61005e17
[Feature] Add storage volume and staros support for GCS (#58815)
Signed-off-by: Binglin Chang <decstery@gmail.com>
2025-05-27 11:21:48 +08:00
wyb ad3380ee61
[Enhancement] Support service principal authentication in azure blob file system (#59308)
Signed-off-by: wyb <wybb86@gmail.com>
2025-05-27 10:03:11 +08:00
Drake Wang c9cca5fe14
[BugFix] Fix invalid max column unique id introduced by version compatibility for cloud-native table (#59190)
Signed-off-by: drake_wang <wxl250059@alibaba-inc.com>
2025-05-26 19:32:39 +08:00
stdpain 9820cf29dd
[BugFix] update azure install lib dirs (#59336)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-05-26 16:56:23 +08:00
zhangqiang 469ae2a077
[Enhancement] Report tablet version in advance if publish takes too long (#59009)
Signed-off-by: sevev <qiangzh95@gmail.com>
2025-05-26 16:19:33 +08:00
Zach 18f327f8b3
[Feature] Support Arrow Flight SQL V1 (#57956)
Signed-off-by: Zac-saodiseng <3253345336@qq.com>
Signed-off-by: Zach <3253345336@qq.com>
2025-05-26 14:22:43 +08:00
trueeyu e5a2424772
[Refactor] Refactor the page cache interface to make it reusable for external tables in the future. (#59269)
Signed-off-by: lxhhust350@qq.com <lxhhust350@qq.com>
Signed-off-by: trueeyu <lxhhust350@qq.com>
2025-05-26 14:00:21 +08:00
duanyyyyyyy 142741cc13
[Enhancement] Add metrics for connector scan pool (#59323)
Signed-off-by: ‘duanyyyyyyy’ <yan.duan9759@gmail.com>
2025-05-26 10:10:52 +08:00
Maxim Martynov f3b7fcadc2
[UT] Add tests for C++ ThreadLocalUUIDGenerator (#59267)
Why I'm doing:
Added some minimal tests before changing the behavior of ThreadLocalUUIDGenerator in #59107.

What I'm doing:
Added tests for the ThreadLocalUUIDGenerator C++ class that check UUID uniqueness and thread safety.

Signed-off-by: Martynov Maxim <martinov_m_s_@mail.ru>
2025-05-24 09:13:35 -07:00
Yaqi Zhang 2815b670a7
[UT] Fix flaky ThreadPool tests (#59304)
Why I'm doing:
We found that some ThreadPool tests are flaky.

What I'm doing:
This PR fixes the flaky tests by increasing idle_timeout. The presubmit tests were run 5 times and all passed.

Signed-off-by: Yaqi Zhang <y.zhang@celonis.com>
2025-05-23 14:33:41 -07:00
wyb 7055c8da30
[Enhancement] Support using native sdk to access azure blob storage in files() table function (#59059)
Signed-off-by: wyb <wybb86@gmail.com>
2025-05-23 17:05:27 +08:00
PengFei Li 4d974ecb03
[Enhancement] Add some load metrics for cloud-native (#59209)
Signed-off-by: PengFei Li <lpengfei2016@gmail.com>
2025-05-23 11:26:04 +08:00
zihe.liu af1925a58f
[BugFix] Fix runtime filter version check for compatibility (#59248)
Signed-off-by: zihe.liu <ziheliu1024@gmail.com>
2025-05-23 11:20:06 +08:00
Yixin Luo 0eafbda927
[BugFix] Fix persistent index losing data when loading a snapshot fails (#59247)
Signed-off-by: luohaha <18810541851@163.com>
2025-05-23 10:18:09 +08:00
SevenJ 77c0f736cd
[Feature] Implement INSERT INTO for Iceberg partition transform tables (#59024)
Signed-off-by: SevenJ <wenjun7j@gmail.com>
2025-05-23 10:17:44 +08:00
zihe.liu 0ccc3065bc
[BugFix] Fix result queue capacity of result sink (#59153)
Signed-off-by: zihe.liu <ziheliu1024@gmail.com>
2025-05-22 18:57:05 +08:00
zhangqiang 6d79c7b71e
[BugFix] Disable release data cache before BE core dump (#59227)
Signed-off-by: sevev <qiangzh95@gmail.com>
2025-05-22 06:29:46 +00:00
zombee0 8684006e55
[BugFix] Fix crash when a subfield of a struct appears in two predicates (#59216)
Signed-off-by: zombee0 <ewang2027@gmail.com>
2025-05-22 13:55:09 +08:00
stdpain 4ddd9dcd06
[Refactor] Refactor hash map/set variants (#59150)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-05-22 12:00:42 +08:00
trueeyu 381ef732cd
[Refactor] Rename some config names related to disk watermark. (#59167)
Signed-off-by: liverpoolai666@gmail.com <lxhhust350@qq.com>
2025-05-22 11:59:44 +08:00
satanson e1ec33194b
[Enhancement] Colocate/Bucket-shuffle intersect/union/except (#58782)
Signed-off-by: satanson <ranpanf@gmail.com>
2025-05-22 10:11:29 +08:00
wyb 3f7ad9764a
[Enhancement] Add azure blob file system in backend (#59061)
Signed-off-by: wyb <wybb86@gmail.com>
2025-05-22 10:06:59 +08:00
zombee0 c1a2d01aca
[Enhancement] Skip compressed data cache (#58927)
Signed-off-by: zombee0 <ewang2027@gmail.com>
2025-05-22 09:46:32 +08:00
zhangqiang 3d7969c949
[Enhancement] Submit combined_txn_log/aggregate_tablet_metadata writes to a thread pool to avoid blocking brpc threads (#58892)
The combined_txn_log and aggregated_tablet_metadata are written to remote storage directly in the BRPC threads, which may block the BRPC threads. This PR submits these tasks to a thread pool for execution to avoid blocking BRPC threads.

Signed-off-by: sevev <qiangzh95@gmail.com>
2025-05-22 08:57:25 +08:00
Yixin Luo fd6d17a660
[Enhancement] Enable tablet meta aggregation for PK tables (#59088)
Signed-off-by: luohaha <18810541851@163.com>
2025-05-21 19:47:03 +08:00
Hongkun Xu c73934557c
[Feature] Support tokenize function (#58965)
Signed-off-by: Hongkun Xu <xuhongkun666@163.com>
Co-authored-by: leorishdu <18771113323@163.com>
2025-05-21 13:08:01 +08:00
yan zhang 2538dc225a
[BugFix] Fix ASan misalignment in AVX512 (#59148)
Signed-off-by: yan zhang <dirtysalt1987@gmail.com>
2025-05-20 19:17:23 +08:00
stdpain af460c42bf
[Enhancement] Optimize reading Parquet null columns (#58675)
Signed-off-by: stdpain <drfeng08@gmail.com>
2025-05-20 18:03:51 +08:00
trueeyu a9f8f13da9
[Refactor] Modify some configuration names related to adaptive adjustment of disk cache and memory cache. (#59127)
Signed-off-by: trueeyu <lxhhust350@qq.com>
Signed-off-by: liverpoolai666@gmail.com <lxhhust350@qq.com>
2025-05-20 15:34:22 +08:00
wyb a8cb17b559
[Enhancement] Add azure blob uri class in backend (#59062)
Signed-off-by: wyb <wybb86@gmail.com>
2025-05-20 15:04:03 +08:00
wyb 5647801660
[Enhancement] Add azure blob credential in backend (#59066)
Signed-off-by: wyb <wybb86@gmail.com>
2025-05-20 11:00:26 +08:00