Fix sse_memcmp UT compilation error on aarch64.
## Why I'm doing:
```
[ 96%] Building CXX object test/CMakeFiles/starrocks_test_objs.dir/util/monotime_test.cpp.o
[ 96%] Building CXX object test/CMakeFiles/starrocks_test_objs.dir/util/mysql_row_buffer_test.cpp.o
/root/starrocks/be/test/util/memcmp_test.cpp: In member function 'virtual void starrocks::sse_memcmp_Test_Test::TestBody()':
/root/starrocks/be/test/util/memcmp_test.cpp:38:20: error: 'sse_memcmp2' was not declared in this scope
38 | int res2 = sse_memcmp2(c1, c2, 3);
| ^~~~~~~~~~~
/root/starrocks/be/test/util/memcmp_test.cpp:46:20: error: 'sse_memcmp2' was not declared in this scope
46 | int res2 = sse_memcmp2(c1, c2, 3);
| ^~~~~~~~~~~
/root/starrocks/be/test/util/memcmp_test.cpp:54:20: error: 'sse_memcmp2' was not declared in this scope
54 | int res2 = sse_memcmp2(c1, c2, 3);
| ^~~~~~~~~~~
/root/starrocks/be/test/util/memcmp_test.cpp:62:20: error: 'sse_memcmp2' was not declared in this scope
62 | int res2 = sse_memcmp2(c1, c2, 3);
| ^~~~~~~~~~~
/root/starrocks/be/test/util/memcmp_test.cpp:71:20: error: 'sse_memcmp2' was not declared in this scope
71 | int res2 = sse_memcmp2(c1, c2, 3);
| ^~~~~~~~~~~
/root/starrocks/be/test/util/memcmp_test.cpp:80:20: error: 'sse_memcmp2' was not declared in this scope
80 | int res2 = sse_memcmp2(c1, c2, 3);
| ^~~~~~~~~~~
/root/starrocks/be/test/util/memcmp_test.cpp:89:20: error: 'sse_memcmp2' was not declared in this scope
89 | int res2 = sse_memcmp2(c1, c2, 3);
| ^~~~~~~~~~~
/root/starrocks/be/test/util/memcmp_test.cpp:98:20: error: 'sse_memcmp2' was not declared in this scope
98 | int res2 = sse_memcmp2(c1, c2, 3);
| ^~~~~~~~~~~
make[2]: *** [test/CMakeFiles/starrocks_test_objs.dir/util/memcmp_test.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [test/CMakeFiles/starrocks_test_objs.dir/all] Error 2
make: *** [all] Error 2
```
Signed-off-by: qingzhongli <qingzhongli2018@gmail.com>
The CPU instruction set may be unavailable either because the target instruction set was not requested at build time or because the build machine does not support it. The code should respect the instruction-set switches.
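A minimal sketch of how the test could be guarded, assuming sse_memcmp2 behaves like memcmp and that SSE4.2 is the required instruction set; the header path and macro choice are assumptions, not the exact change:

```cpp
#include <gtest/gtest.h>

#include "util/memcmp.h" // assumed location of sse_memcmp2

#if defined(__x86_64__) && defined(__SSE4_2__)
// Only compiled when the target instruction set is actually enabled, so the
// test no longer breaks the aarch64 build.
TEST(MemCmpTest, SseMemcmp2Guarded) {
    const char* c1 = "abc";
    const char* c2 = "abd";
    EXPECT_LT(sse_memcmp2(c1, c2, 3), 0); // c1 < c2 at the last byte
}
#endif
```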
Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
Handle zstd decompression failure by throwing a runtime_error exception.
Fix orc_scanner's tpch_10k.orc.zstd test file, which was corrupted: replace it with a correct file and update the related test cases.
Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
Why I'm doing:
When sending a request to the /api/transaction/{begin,load,commit,...} endpoints, the content type is wrongly set to text/html instead of application/json.
What I'm doing:
Fixes #61130
Signed-off-by: Fatih Çatalkaya <fatih.catalkaya@yahoo.de>
Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
Co-authored-by: Kevin Cai <kevin.cai@celerdata.com>
The test body should not create a separate evhttp_request; it should use the input_buffer of the evhttp_request already initialized by evhttp_request_new().
Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
Why I'm doing:
When clone and drop table run concurrently, the new_tablet used during clone may already be dropped, causing a null pointer exception.
Signed-off-by: sevev <qiangzh95@gmail.com>
Regression introduced in #53967; present in 3.5 and backported to 3.4, while 3.3 does not have the bug.
`!(expr1 && expr2)` has to be transformed into `!expr1 || !expr2`, not into `!expr1 && !expr2` as the original PR did; otherwise we apply an incorrect expression on top of a scalar column.
## Why I'm doing:
In #53967 we introduced zonemap filtering for struct columns, but it contains a bug.
Failing query:
```
select x
from y
where x.field1[1].field2
```
```cpp
// check subfield expr has only one child, and it's a SlotRef
if (subfield_expr->children().size() != 1 && !subfield_expr->get_child(0)->is_slotref()) {
return Status::InternalError("Invalid pattern for predicate");
}
```
`subfield_expr->children().size() != 1` evaluates to `false`.
`!subfield_expr->get_child(0)->is_slotref()` evaluates to `true`, because the child is an array access.
So the whole condition is `false` and execution continues.
From here the code expects the pattern described in this comment:
```
// Rewrite ColumnExprPredicate which contains subfield expr and put subfield path into subfield_output
// For example, WHERE col.a.b.c > 5, a.b.c is subfields, we will rewrite it to c > 5
```
but the expression still contains `[1].field2`, so the rewrite is incorrect.
## What I'm doing:
Fix the expression logic (see the sketch below).
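A minimal sketch of the corrected check, using the same names as the snippet above:

```cpp
// Bail out unless the subfield expr has exactly one child AND that child is a
// SlotRef; i.e. !(expr1 && expr2) rewritten as !expr1 || !expr2.
if (subfield_expr->children().size() != 1 || !subfield_expr->get_child(0)->is_slotref()) {
    return Status::InternalError("Invalid pattern for predicate");
}
```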
Signed-off-by: Aliaksei Dziomin <diominay@gmail.com>
Fixes #59757
Why I'm doing:
When using HDFS as a remote storage volume for spilling, StarRocks fails with NOT_IMPLEMENTED_ERROR because several critical filesystem operations were not implemented in the HDFS filesystem wrapper (HdfsFileSystem class). These operations are essential for the spilling workflow:
- Creating directories for spill data organization
- Checking if paths are directories vs files
- Deleting files and directories during cleanup
- Managing directory hierarchy for spill containers
Without these operations, queries that need to spill to HDFS storage volumes cannot function, severely limiting StarRocks' ability to handle large datasets when using HDFS as external storage.
What I'm doing:
This PR implements the missing HDFS filesystem operations required for spilling functionality:
Implemented Operations:
- delete_file() - Delete files from HDFS using hdfsDelete
- create_dir() - Create directories using hdfsCreateDirectory
- create_dir_if_missing() - Create directories if they don't exist (with existence check)
- create_dir_recursive() - Create directories recursively (leverages HDFS native recursive creation)
- delete_dir() - Delete empty directories using hdfsDelete
- delete_dir_recursive() - Delete directories and all contents recursively
- is_directory() - Check if a path is a directory using hdfsGetPathInfo
Additional Improvements:
- Added private helper method _is_directory() for internal directory type checking
- Fixed bug in hdfs_write_buffer_size assignment for upload options (was using __isset instead of actual value)
- Added comprehensive test coverage including realistic spilling workflow simulation
Implementation Details:
- All operations properly handle HDFS connections through existing HdfsFsCache infrastructure
- Robust error handling with meaningful error messages using get_hdfs_err_msg()
- Path existence validation before operations to provide clear error messages
- Directory vs file type validation to prevent incorrect operations
- Follows existing code patterns and error handling conventions in the codebase
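For illustration, a rough sketch of how the directory operations can be built on the libhdfs C API; the helper shapes and error handling are simplified assumptions, while the real code goes through HdfsFsCache and StarRocks' Status utilities:

```cpp
#include <hdfs/hdfs.h> // libhdfs C API

// Illustrative helpers only, not the actual HdfsFileSystem methods.
static bool hdfs_is_directory(hdfsFS fs, const char* path) {
    hdfsFileInfo* info = hdfsGetPathInfo(fs, path);
    if (info == nullptr) {
        return false; // path does not exist (or lookup failed)
    }
    bool is_dir = (info->mKind == kObjectKindDirectory);
    hdfsFreeFileInfo(info, 1);
    return is_dir;
}

static int hdfs_create_dir_if_missing(hdfsFS fs, const char* path) {
    if (hdfsExists(fs, path) == 0) {
        return 0; // already exists
    }
    // hdfsCreateDirectory also creates missing parents, so it can back both
    // create_dir() and create_dir_recursive(); returns 0 on success, -1 on error.
    return hdfsCreateDirectory(fs, path);
}

static int hdfs_delete_path(hdfsFS fs, const char* path, bool recursive) {
    // Backs delete_file(), delete_dir() and delete_dir_recursive().
    return hdfsDelete(fs, path, recursive ? 1 : 0);
}
```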
Fixes #59757
Signed-off-by: Yakir Gibraltar <yakir.g@taboola.com>
Signed-off-by: Yakir Gibraltar <yakirgb@gmail.com>
Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
Co-authored-by: Yakir Gibraltar <yakir.g@taboola.com>
Co-authored-by: Kevin Cai <caixh.kevin@gmail.com>
Co-authored-by: Kevin Cai <kevin.cai@celerdata.com>
Why I'm doing:
After supporting file_bundling, we create brpc channels between CN nodes during each publish operation, which may affect publish performance.
What I'm doing:
Add a lake service stub cache to avoid creating brpc channels on each publish.
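As a rough illustration of the idea (the class and member names below are hypothetical, not the actual implementation): cache one stub per node endpoint so repeated publishes reuse the already-initialized brpc channel.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical cache shape: one lazily created stub per "host:port".
template <typename Stub>
class LakeServiceStubCache {
public:
    std::shared_ptr<Stub> get_or_create(const std::string& endpoint) {
        std::lock_guard<std::mutex> guard(_mutex);
        auto it = _stubs.find(endpoint);
        if (it != _stubs.end()) {
            return it->second; // reuse the existing channel/stub
        }
        auto stub = std::make_shared<Stub>(endpoint); // wraps brpc channel init
        _stubs.emplace(endpoint, stub);
        return stub;
    }

private:
    std::mutex _mutex;
    std::map<std::string, std::shared_ptr<Stub>> _stubs;
};
```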
Signed-off-by: sevev <qiangzh95@gmail.com>
When partial compaction is not used, we still need to set the correct new segment info so that aborting the txn can clean up the new segments.
Signed-off-by: starrocks-xupeng <xupeng@starrocks.com>
If rowset data is deleted by garbage collection, the inverted index will not be removed because path scanning ignores all of the directories under the tablet schema hash path.
What I'm doing:
Path scanning will scan inverted index paths.
Signed-off-by: wuxueyang.wxy <wuxueyang.wxy@alibaba-inc.com>
Why I'm doing:
refactor bitpacking code for further improvement.
What I'm doing:
This PR does the following:
- Merge bit_packing.h and bit_packing.inline.h into bit_packing_default.h. This implementation uses templates and loop unrolling for acceleration, and uses the namespace util::bitpacking_default instead of the class BitPacking.
- Rename bit_packing_simd.h to bit_packing_avx2.h, because it only uses AVX2 instructions.
- Move the arrow bit packing code to bit_packing_arrow.h.
- Rename bit_packing_adaptor.h to bit_packing.h; this is the entry file.
So we now have the following files, with bit_packing.h as the entry file:
```
-rw-rw-r-- 1 zhangyan zhangyan  4861 Jun 26 14:09 bit_packing_arrow.h
-rw-rw-r-- 1 zhangyan zhangyan 19580 Jun 26 14:05 bit_packing_avx2.h
-rw-rw-r-- 1 zhangyan zhangyan 11541 Jun 26 14:03 bit_packing_default.h
-rw-rw-r-- 1 zhangyan zhangyan  1708 Jun 26 14:10 bit_packing.h
```
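Presumably the entry file only selects an implementation; a hypothetical sketch of that dispatch is below (the include paths and the AVX2 namespace name are assumptions, only util::bitpacking_default is named in this description):

```cpp
// bit_packing.h (illustrative entry file): pick the AVX2 path when it is
// compiled in, otherwise fall back to the template/unroll default path.
#pragma once

#ifdef __AVX2__
#include "util/bit_packing_avx2.h"
namespace util {
namespace bitpacking = bitpacking_avx2; // assumed name of the AVX2 namespace
}
#else
#include "util/bit_packing_default.h"
namespace util {
namespace bitpacking = bitpacking_default;
}
#endif
```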
Signed-off-by: yan zhang <dirtysalt1987@gmail.com>
Why I'm doing:
Trying to implement functions from the Good First Issue list.
What I'm doing:
Trino reference: (image omitted)
Fixes #52604
Signed-off-by: Mesut-Doner <mesutdonerng@gmail.com>
Branch-3.3 (pr: #51263) has already set the default value of config::chunk_reserved_bytes_limit to 0, and there is no performance issue, so we finally removed the core memory allocator in the main branch.
What I'm doing:
Remove core arena mem allocator
Signed-off-by: trueeyu <lxhhust350@qq.com>
Why I'm doing:
When reading bundled data files, we should pass the file info instead of the file path, as the info may contain file size information. In some filesystem implementations, this avoids additional file size fetch requests.
What I'm doing:
This pull request modifies the FileSystem::new_random_access_file_w method in be/src/fs/fs.cpp to improve how RandomAccessFile objects are created by passing the entire FileInfo object instead of just its path attribute.
Changes to FileSystem::new_random_access_file_w:
Updated calls to new_random_access_file to use the full FileInfo object instead of only file_info.path. This ensures that all relevant file metadata is available during the creation of RandomAccessFile instances.
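A simplified sketch of why passing the whole FileInfo helps; the type and function names below are stand-ins, not the real declarations in be/src/fs:

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Stand-in for the real FileInfo: a path plus an optionally known size.
struct FileInfo {
    std::string path;
    std::optional<int64_t> size;
};

// Stand-in for an extra stat/HEAD request against the filesystem.
int64_t fetch_size_from_filesystem(const std::string& /*path*/) { return 0; }

// Path-only variant: the known size is lost, so it must be fetched again.
int64_t resolve_size(const std::string& path) {
    return fetch_size_from_filesystem(path);
}

// FileInfo variant: reuse the known size and skip the extra request.
int64_t resolve_size(const FileInfo& info) {
    return info.size ? *info.size : fetch_size_from_filesystem(info.path);
}
```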
Signed-off-by: luohaha <18810541851@163.com>
Fixes #57461
```
mysql> select * from TABLE(list_rowsets(24015, 10));
ERROR 1064 (HY000): Only works for tablets in the cloud-native table: BE:11001
```
Signed-off-by: Rohit Satardekar <rohitrs1983@gmail.com>
## Why I'm doing:
Fixes regression after #52466
If we disable the optimization with the flag `enable_push_down_pre_agg_with_rank=false`, the query executes correctly; otherwise it crashes.
Another workaround: use specific column names instead of `*`.
Some functions have empty arg_types, for example `count(*)`, and this code crashes:
```
TypeDescriptor arg_type = TypeDescriptor::from_thrift(fn.arg_types[0]);
```
```
PC: @ 0x1330e5b6 std::vector<starrocks::TTypeNode, std::allocator<starrocks::TTypeNode> >::size() const
*** SIGSEGV (@0x10) received by PID 133245 (TID 0x7f7b2a631640) from PID 16; stack trace: ***
@ 0x7f7bf4e3fee8 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x99ee7)
@ 0x1cc76d69 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
@ 0x7f7bf4de8520 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x4251f)
@ 0x1330e5b6 std::vector<starrocks::TTypeNode, std::allocator<starrocks::TTypeNode> >::size() const
@ 0x188448e6 starrocks::TypeDescriptor::TypeDescriptor(std::vector<starrocks::TTypeNode, std::allocator<starrocks::TTypeNode> > const&, int*)
@ 0x13382799 starrocks::TypeDescriptor::from_thrift(starrocks::TTypeDesc const&)
@ 0x1837b490 starrocks::pipeline::LocalPartitionTopnContext::prepare_pre_agg(starrocks::RuntimeState*)
@ 0x1837ad8a starrocks::pipeline::LocalPartitionTopnContext::prepare(starrocks::RuntimeState*, starrocks::RuntimeProfile*)
@ 0x183b62fe starrocks::pipeline::LocalPartitionTopnSinkOperator::prepare(starrocks::RuntimeState*)
@ 0x1764e58f starrocks::pipeline::PipelineDriver::prepare(starrocks::RuntimeState*)
```
## What I'm doing:
Add validation of the parameter: if arg_types is empty, set the type to TYPE_UNKNOWN.
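A sketch of that guard, using the same names as the snippets in this description:

```cpp
// count(*) has no argument types, so don't index fn.arg_types[0] blindly.
TypeDescriptor arg_type;
if (fn.arg_types.empty()) {
    arg_type.type = TYPE_UNKNOWN; // resolved later, e.g. count -> TYPE_BIGINT
} else {
    arg_type = TypeDescriptor::from_thrift(fn.arg_types[0]);
}
```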
This is safe because for the `count` function we always override the type later:
```cpp
if (fn.name.function_name == "count") {
arg_type.type = TYPE_BIGINT;
```
For other functions it is also safe: they are handled by the code below, which returns Status::InternalError instead of crashing the BE.
```cpp
func = get_window_function(fn.name.function_name, arg_type.type, return_type.type, is_input_nullable,
fn.binary_type, state->func_version());
if (func == nullptr) {
return Status::InternalError(strings::Substitute("Invalid window function plan: ($0, $1, $2, $3, $4, $5)",
fn.name.function_name, arg_type.type, return_type.type,
is_input_nullable, fn.binary_type, state->func_version()));
}
```
The code was introduced in 3.4 as an optimization, so the branches to fix are main, 3.6, 3.5, and 3.4.
Signed-off-by: Aliaksei Dziomin <diominay@gmail.com>
Why I'm doing:
Bug in HDFS upload operations where the write buffer size is incorrectly set to a boolean value (1) instead of the actual configured buffer size.
What I'm doing:
Fixed line 598 in be/src/fs/hdfs/fs_hdfs.cpp by changing the assignment from _options.upload->__isset.hdfs_write_buffer_size_kb (boolean flag) to _options.upload->hdfs_write_buffer_size_kb (actual buffer size value).
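In other words, the change is roughly the following; the local variable and the __isset guard around it are assumptions about the surrounding code:

```cpp
// Before: the __isset flag (0 or 1) was assigned as the buffer size.
// hdfs_write_buffer_size_kb = _options.upload->__isset.hdfs_write_buffer_size_kb;

// After: use the configured value, with __isset only guarding its presence.
int32_t hdfs_write_buffer_size_kb = 0;
if (_options.upload->__isset.hdfs_write_buffer_size_kb) {
    hdfs_write_buffer_size_kb = _options.upload->hdfs_write_buffer_size_kb;
}
```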
Fixes #59802
## What I'm doing:
Support creating tables with decimal precision in the range (38, 76].
For example:
```
CREATE TABLE test_decimal (k1 int, k2 int, k3 int , d1 decimal(50, 24));
```
Now we support a decimal precision in the range [1, 76] and a decimal scale in the range [0, 76].
Also, this patch removes some unused code.
Fixes #59645 (https://github.com/StarRocks/starrocks/issues/59645)
Signed-off-by: stephen <stephen5217@163.com>
Why I'm doing:
While the FE already has this logic, the BE would still send an empty Authorization: Basic header to ElasticSearch even if credentials were not provided; this causes a 403 even if the ES cluster has no AuthN/AuthZ settings. This change allows using StarRocks with ElasticSearch external catalogs even with Basic Auth disabled.
What I'm doing:
Adds a check for empty credentials (both username and password) in the ElasticSearch scan reader, in which case the Authorization: Basic ... header is not set.
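A sketch of that check; the helper names and the header map are simplified stand-ins for the real HTTP plumbing in the ES scan reader:

```cpp
#include <map>
#include <string>

std::string base64_encode(const std::string& s); // stand-in for a real encoder

void maybe_set_basic_auth(std::map<std::string, std::string>& headers,
                          const std::string& username, const std::string& password) {
    // With no credentials configured, send no Authorization header at all, so
    // an ES cluster without auth does not reject the request with 403.
    if (username.empty() && password.empty()) {
        return;
    }
    headers["Authorization"] = "Basic " + base64_encode(username + ":" + password);
}
```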
Signed-off-by: Giorgio Pellero <giorgio.pellero@gmail.com>
Co-authored-by: alvin <115669851+alvin-celerdata@users.noreply.github.com>
According to the Pulsar C++ API, there are two overloads of consumer.seek():
Result seek(const MessageId& messageId);
Result seek(uint64_t timestamp);
We intended to use the first one, but we pass the wrong parameter here:
```cpp
if (initial_position == InitialPosition::LATEST || initial_position == InitialPosition::EARLIEST) {
    pulsar::InitialPosition p_initial_position = initial_position == InitialPosition::LATEST
                                                         ? pulsar::InitialPosition::InitialPositionLatest
                                                         : pulsar::InitialPosition::InitialPositionEarliest;
    result = _p_consumer.seek(p_initial_position);
    ...
}
```
So it calls the second overload instead.
This PR fixes the issue.
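One way to hit the MessageId overload explicitly is sketched below; this illustrates the idea and is not necessarily the exact change:

```cpp
if (initial_position == InitialPosition::LATEST || initial_position == InitialPosition::EARLIEST) {
    // Pass a pulsar::MessageId so overload resolution cannot fall through to
    // seek(uint64_t timestamp) via an enum-to-integer conversion.
    const pulsar::MessageId& msg_id = initial_position == InitialPosition::LATEST
                                              ? pulsar::MessageId::latest()
                                              : pulsar::MessageId::earliest();
    result = _p_consumer.seek(msg_id);
}
```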
Signed-off-by: jiutianchen <chen9t@gmail.com>
Introduce a TabletRetainInfo class to save the necessary information for the specified versions; it is used in the vacuum process to determine which files (data or meta) can be deleted.
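A hypothetical sketch of the shape of such a class; the member names are illustrative, not the actual implementation:

```cpp
#include <cstdint>
#include <string>
#include <unordered_set>

// Records what must be kept for the specified versions so vacuum can decide
// which data/meta files are safe to delete.
class TabletRetainInfo {
public:
    void add_version(int64_t version) { _versions.insert(version); }
    void add_file(const std::string& file_name) { _files.insert(file_name); }

    bool contains_version(int64_t version) const { return _versions.count(version) > 0; }
    bool contains_file(const std::string& file_name) const { return _files.count(file_name) > 0; }

private:
    std::unordered_set<int64_t> _versions;
    std::unordered_set<std::string> _files;
};
```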
Signed-off-by: srlch <linzichao@starrocks.com>
Why I'm doing:
LRUCacheModule and StarCacheModule are only the wrappers of object cache interfaces. PageCache is used to cache the decompressed or decoded page, and BlockCache is used to cache the raw block of external tables or shared-data tables.
What I'm doing:
Use page cache instead of object cache to cache external file footer
Signed-off-by: trueeyu <lxhhust350@qq.com>
Why I'm doing:
This is a preliminary work of tablet splitting and merging.
Previously, the bucket number was at the physical partition level: all materialized indexes in a physical partition had to have the same bucket number. After tablet splitting, different materialized indexes in a physical partition could have different bucket numbers, so we need to change the bucket number from the physical partition level to the materialized index level.
What I'm doing:
Change bucket number from physical partition level to materialized index level.
Because different materialized indexes in a physical partition could have different bucket numbers, the tablet sink cannot calculate a single unified tablet index for each record to be distributed to the different materialized indexes.
To solve this, we refactor the tablet sink code. The tablet sink now calculates a unified hash value for each record; when a record is distributed to a materialized index, its tablet index is computed from that hash value and the tablet count of that materialized index (see the sketch below).
This PR only refactors the code. A follow-up PR will remove num_bucket in OlapTablePartition and use tablets.size() of each OlapTableIndexTablets instead.
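Roughly, the per-index mapping looks like this (simplified, with illustrative names):

```cpp
#include <cstdint>
#include <vector>

// One hash is computed per record; each materialized index maps that hash onto
// its own tablet list, whose size may differ from other indexes after splitting.
int64_t pick_tablet(uint64_t record_hash, const std::vector<int64_t>& index_tablet_ids) {
    return index_tablet_ids[record_hash % index_tablet_ids.size()];
}
```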
Fixes #59134
Signed-off-by: xiangguangyxg <xiangguangyxg@gmail.com>
dontdump_unused_pages() is invoked as a callback during the execution of a FATAL LOG, where LOG must not be used again to print logs; doing so causes a deadlock.
Signed-off-by: meegoo <meegoo.sr@gmail.com>
Why I'm doing:
The interface of PageCache now supports the option evict_probability, which will be used by external tables.
For shared LRU cache, calculating memory usage is expensive. Temporarily disable the eviction probability's dependency on memory usage.
Remove the kept_in_memory option, which is currently unused. Use priority instead.
Signed-off-by: trueeyu <lxhhust350@qq.com>
Why I'm doing:
The original implementation of get_capacity() acquired the locks of all shards one by one and then summed their capacities, which was inefficient. To reduce lock contention, it now retrieves the total capacity directly from the SharedLRUCache, which avoids locking multiple shards (see the sketch below).
PageCache is used by both external tables and internal tables, so the file page_cache.h is moved from storage/rowset to cache/object_cache.
HiveDataSource will use the function page_cache_available() instead of relying solely on configuration to determine whether to use the page cache.
What I'm doing:
HiveDataSource will use page_cache_available() instead of relying solely on configuration to determine whether to use the page cache.
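A sketch of the get_capacity() idea; the member names are assumptions:

```cpp
#include <cstddef>

// Illustrative: the total capacity is fixed at construction time, so it can be
// returned directly without taking any shard lock.
class SharedLRUCache {
public:
    explicit SharedLRUCache(size_t total_capacity) : _total_capacity(total_capacity) {}

    // Before: locked every shard and summed the per-shard capacities.
    // After: return the stored total.
    size_t get_capacity() const { return _total_capacity; }

private:
    const size_t _total_capacity;
};
```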
Signed-off-by: trueeyu <lxhhust350@qq.com>
## Why I'm doing:
This PR (https://github.com/StarRocks/starrocks/pull/56234) added storage_size for lake tables and added it to the partition proc, but storage_size was not added to the `partitions_meta` table.
## What I'm doing:
Add storage_size to the `partitions_meta` table.
Signed-off-by: sevev <qiangzh95@gmail.com>
We do not support decoding the min/max values of some types, but we should not return that error status to users.
On the main branch and branch-3.5 there is no bug; we just handle this error status early to avoid future bugs.
On branch-3.4 and branch-3.3 there is a bug when the predicate type is in-filter or rf-min-max and the data type is float/double.
Signed-off-by: zombee0 <ewang2027@gmail.com>
Why I'm doing:
Before changing the behavior of ThreadLocalUUIDGenerator in #59107, I added some minimal tests.
What I'm doing:
Added tests for ThreadLocalUUIDGenerator C++ class, checking UUID uniqueness and thread-safety.
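For illustration, a test of this kind could look like the sketch below; generate_uuid() is a hypothetical stand-in for the real generator API:

```cpp
#include <gtest/gtest.h>
#include <set>
#include <string>
#include <thread>
#include <vector>

std::string generate_uuid(); // hypothetical wrapper around ThreadLocalUUIDGenerator

TEST(ThreadLocalUUIDGeneratorTest, UniqueAcrossThreads) {
    constexpr int kThreads = 8;
    constexpr int kPerThread = 1000;
    std::vector<std::vector<std::string>> results(kThreads);
    std::vector<std::thread> threads;
    for (int t = 0; t < kThreads; ++t) {
        threads.emplace_back([&results, t] {
            for (int i = 0; i < kPerThread; ++i) {
                results[t].push_back(generate_uuid());
            }
        });
    }
    for (auto& th : threads) th.join();

    // Every UUID must be distinct, both within a thread and across threads.
    std::set<std::string> unique;
    for (const auto& v : results) unique.insert(v.begin(), v.end());
    EXPECT_EQ(static_cast<size_t>(kThreads * kPerThread), unique.size());
}
```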
Signed-off-by: Martynov Maxim <martinov_m_s_@mail.ru>
Why I'm doing:
We found that some ThreadPool tests are flaky.
What I'm doing:
This PR fixes the flaky tests by increasing the idle_timeout. Ran presubmit-tests 5 times, all passed.
Signed-off-by: Yaqi Zhang <y.zhang@celonis.com>
The combined_txn_log and aggregated_tablet_metadata are written to remote storage directly in the BRPC threads, which may block the BRPC threads. This PR submits these tasks to a thread pool for execution to avoid blocking BRPC threads.
Signed-off-by: sevev <qiangzh95@gmail.com>