Why I'm doing:
StarRocks supports spilling some intermediate data to disk or object storage when writing to a native table. This avoids writing too many small files under memory pressure.
The same issue also exists when writing to an external table. However, the spill procedure is currently heavily coupled with the native table and cannot be reused by external tables directly.
So it is necessary to introduce a separate module implementing the spill function that can easily be used by both native and external tables.
What I'm doing:
Introduce a load chunk spiller to handle the load and merge functions.
Refactor the spill memtable sink of native table based on the load chunk spiller.
Signed-off-by: GavinMar <yangguansuo@starrocks.com>
What I'm doing:
bits_function: fix the incorrect implementation
change the static DCHECK to dynamic argument validation for some functions
fix some type mapping errors in logical_type.cpp
Signed-off-by: Murphy <mofei@starrocks.com>
## Why I'm doing:
Currently, users can only configure the timeout for prepared transactions through the global FE configuration `prepared_transaction_default_timeout_second`. This approach lacks flexibility as it requires all transactions to use the same timeout value. Users need the ability to specify different timeout values for different transactions based on their specific requirements, especially in production environments where precise control over transaction lifecycle is crucial.
## What I'm doing:
This PR adds support for the `prepared_timeout` configuration in transaction stream load, allowing users to specify a timeout period for transactions from PREPARED to COMMITTED state. The implementation includes:
**Backend Changes:**
- Added `HTTP_PREPARED_TIMEOUT` constant in `be/src/http/http_common.h`
- Extended `StreamLoadContext` with `prepared_timeout_second` field
- Modified `TransactionMgr` to parse `prepared_timeout` HTTP header
- Updated `StreamLoadExecutor::prepare_txn` to pass timeout to FE
- Enhanced `TransactionState` with `preparedTimeoutMs` field and timeout detection logic
- Updated Thrift interface `TLoadTxnCommitRequest` with `prepared_timeout_second` field
**Frontend Changes:**
- Modified `TransactionLoadAction` to parse `prepared_timeout` parameter
- Updated `TransactionState` with `setPreparedTimeAndTimeout` method
- Enhanced `DatabaseTransactionMgr` and `GlobalTransactionMgr` to handle prepared timeout
- Updated transaction timeout detection logic in `TransactionState::isTimeout`
**Usage Example:**
```bash
# Begin transaction
curl --location-trusted -u root: -H "label:test_txn" -H "timeout:300" -H "db:test_db" -H "table:test_table" \
-XPOST http://fe_host:8030/api/transaction/begin
# Load data
curl --location-trusted -u root: -H "label:test_txn" -H "db:test_db" -H "table:test_table" \
-d '1' -XPUT http://fe_host:8030/api/transaction/load
# Prepare transaction with custom timeout (60 seconds)
curl --location-trusted -u root: -H "label:test_txn" -H "db:test_db" \
-H "prepared_timeout:60" -XPOST http://fe_host:8030/api/transaction/prepare
# Commit transaction
curl --location-trusted -u root: -H "label:test_txn" -H "db:test_db" \
-XPOST http://fe_host:8030/api/transaction/commit
# View transaction details including PreparedTime and PreparedTimeoutMs
SHOW TRANSACTION WHERE id = <transaction_id>;
+---------------+--------+---------------+-------------------+-------------------+---------------------+---------------------+---------------------+---------------------+---------------------+--------+--------------------+------------+-----------+-------------------+--------+
| TransactionId | Label | Coordinator | TransactionStatus | LoadJobSourceType | PrepareTime | PreparedTime | CommitTime | PublishTime | FinishTime | Reason | ErrorReplicasCount | ListenerId | TimeoutMs | PreparedTimeoutMs | ErrMsg |
+---------------+--------+---------------+-------------------+-------------------+---------------------+---------------------+---------------------+---------------------+---------------------+--------+--------------------+------------+-----------+-------------------+--------+
| 1633 | test_txn | BE: 127.0.0.1 | VISIBLE | BACKEND_STREAMING | 2025-08-03 11:02:54 | 2025-08-03 11:03:10 | 2025-08-03 11:03:14 | 2025-08-03 11:03:14 | 2025-08-03 11:03:14 | | 0 | [12237] | 300000 | 60000 | |
+---------------+--------+---------------+-------------------+-------------------+---------------------+---------------------+---------------------+---------------------+---------------------+--------+--------------------+------------+-----------+-------------------+--------+
```
**Documentation:**
- Updated `Stream_Load_transaction_interface.md` with `prepared_timeout` usage instructions
- Modified `SHOW_TRANSACTION.md` to document new `PreparedTime` and `PreparedTimeoutMs` fields
- Added version information indicating support from 4.0.0 onwards
The feature provides backward compatibility by using the FE configuration `prepared_transaction_default_timeout_second` as the default value when `prepared_timeout` is not specified.
Signed-off-by: PengFei Li <lpengfei2016@gmail.com>
Signed-off-by: 絵空事スピリット <wanglichen@starrocks.com>
Co-authored-by: 絵空事スピリット <wanglichen@starrocks.com>
Fix sse_memcmp UT compilation error on aarch64.
## Why I'm doing:
```
[ 96%] Building CXX object test/CMakeFiles/starrocks_test_objs.dir/util/monotime_test.cpp.o
[ 96%] Building CXX object test/CMakeFiles/starrocks_test_objs.dir/util/mysql_row_buffer_test.cpp.o
/root/starrocks/be/test/util/memcmp_test.cpp: In member function 'virtual void starrocks::sse_memcmp_Test_Test::TestBody()':
/root/starrocks/be/test/util/memcmp_test.cpp:38:20: error: 'sse_memcmp2' was not declared in this scope
38 | int res2 = sse_memcmp2(c1, c2, 3);
| ^~~~~~~~~~~
/root/starrocks/be/test/util/memcmp_test.cpp:46:20: error: 'sse_memcmp2' was not declared in this scope
46 | int res2 = sse_memcmp2(c1, c2, 3);
| ^~~~~~~~~~~
/root/starrocks/be/test/util/memcmp_test.cpp:54:20: error: 'sse_memcmp2' was not declared in this scope
54 | int res2 = sse_memcmp2(c1, c2, 3);
| ^~~~~~~~~~~
/root/starrocks/be/test/util/memcmp_test.cpp:62:20: error: 'sse_memcmp2' was not declared in this scope
62 | int res2 = sse_memcmp2(c1, c2, 3);
| ^~~~~~~~~~~
/root/starrocks/be/test/util/memcmp_test.cpp:71:20: error: 'sse_memcmp2' was not declared in this scope
71 | int res2 = sse_memcmp2(c1, c2, 3);
| ^~~~~~~~~~~
/root/starrocks/be/test/util/memcmp_test.cpp:80:20: error: 'sse_memcmp2' was not declared in this scope
80 | int res2 = sse_memcmp2(c1, c2, 3);
| ^~~~~~~~~~~
/root/starrocks/be/test/util/memcmp_test.cpp:89:20: error: 'sse_memcmp2' was not declared in this scope
89 | int res2 = sse_memcmp2(c1, c2, 3);
| ^~~~~~~~~~~
/root/starrocks/be/test/util/memcmp_test.cpp:98:20: error: 'sse_memcmp2' was not declared in this scope
98 | int res2 = sse_memcmp2(c1, c2, 3);
| ^~~~~~~~~~~
make[2]: *** [test/CMakeFiles/starrocks_test_objs.dir/util/memcmp_test.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [test/CMakeFiles/starrocks_test_objs.dir/all] Error 2
make: *** [all] Error 2
```
Signed-off-by: qingzhongli <qingzhongli2018@gmail.com>
The CPU instruction set may be off either because the target instruction set is not wanted or because the build machine does not support it. Respect the instruction-set switches.
Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
handle zstd decompress failure, throw runtime_error exception
fix the orc_scanner test file tpch_10k.orc.zstd, which is corrupted; replace it with a correct test file and update the related test cases.
Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
Why I'm doing:
When sending a request to the /api/transaction/{begin,load,commit,...} endpoints, the content type is wrongly set to text/html instead of application/json.
What I'm doing:
Fixes #61130
Signed-off-by: Fatih Çatalkaya <fatih.catalkaya@yahoo.de>
Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
Co-authored-by: Kevin Cai <kevin.cai@celerdata.com>
should not create a separate evhttp_request in the test body;
instead, leverage the input_buffer created in the evhttp_request initialized by evhttp_request_new()
Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
Why I'm doing:
When clone and drop table run concurrently, the new_tablet created during clone may be dropped, throwing a null exception.
Signed-off-by: sevev <qiangzh95@gmail.com>
regression introduced in #53967
present in 3.5, backported to 3.4
3.3 doesn't have the bug
`!(expr1 && expr2)` has to be transformed into `!expr1 || !expr2`,
not into `!expr1 && !expr2` as was done in that PR,
otherwise we use an incorrect expression on top of a scalar column
## Why I'm doing:
In #53967 we introduced zonemap filtering for struct columns, but it contains a bug.
Failing query:
```
select x
from y
where x.field1[1].field2
```
```cpp
// check subfield expr has only one child, and it's a SlotRef
if (subfield_expr->children().size() != 1 && !subfield_expr->get_child(0)->is_slotref()) {
return Status::InternalError("Invalid pattern for predicate");
}
```
`subfield_expr->children().size() != 1` evaluates to `false`
`!subfield_expr->get_child(0)->is_slotref()` evaluates to `true`, because the child is an array access
so the whole expression is `false` and execution continues
At this point the code expects a pure subfield path, per this comment:
```
// Rewrite ColumnExprPredicate which contains subfield expr and put subfield path into subfield_output
// For example, WHERE col.a.b.c > 5, a.b.c is subfields, we will rewrite it to c > 5
```
but the expression still contains `[1].field2`
## What I'm doing:
fix expression logic
Signed-off-by: Aliaksei Dziomin <diominay@gmail.com>
Fixes #59757
Why I'm doing:
When using HDFS as a remote storage volume for spilling, StarRocks fails with NOT_IMPLEMENTED_ERROR because several critical filesystem operations were not implemented in the HDFS filesystem wrapper (HdfsFileSystem class). These operations are essential for the spilling workflow:
- Creating directories for spill data organization
- Checking whether paths are directories or files
- Deleting files and directories during cleanup
- Managing the directory hierarchy for spill containers
Without these operations, queries that need to spill to HDFS storage volumes cannot function, severely limiting StarRocks' ability to handle large datasets when using HDFS as external storage.
What I'm doing:
This PR implements the missing HDFS filesystem operations required for spilling functionality:
Implemented Operations:
- `delete_file()` - delete files from HDFS using `hdfsDelete`
- `create_dir()` - create directories using `hdfsCreateDirectory`
- `create_dir_if_missing()` - create directories if they don't exist (with an existence check)
- `create_dir_recursive()` - create directories recursively (leverages HDFS native recursive creation)
- `delete_dir()` - delete empty directories using `hdfsDelete`
- `delete_dir_recursive()` - delete directories and all their contents recursively
- `is_directory()` - check whether a path is a directory using `hdfsGetPathInfo`
Additional Improvements:
- Added a private helper method `_is_directory()` for internal directory type checking
- Fixed a bug in the `hdfs_write_buffer_size` assignment for upload options (it was using `__isset` instead of the actual value)
- Added comprehensive test coverage, including a realistic spilling workflow simulation
Implementation Details:
- All operations properly handle HDFS connections through the existing `HdfsFsCache` infrastructure
- Robust error handling with meaningful error messages using `get_hdfs_err_msg()`
- Path existence validation before operations to provide clear error messages
- Directory vs. file type validation to prevent incorrect operations
- Follows existing code patterns and error handling conventions in the codebase
Fixes #59757
Signed-off-by: Yakir Gibraltar <yakir.g@taboola.com>
Signed-off-by: Yakir Gibraltar <yakirgb@gmail.com>
Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
Co-authored-by: Yakir Gibraltar <yakir.g@taboola.com>
Co-authored-by: Kevin Cai <caixh.kevin@gmail.com>
Co-authored-by: Kevin Cai <kevin.cai@celerdata.com>
Why I'm doing:
After supporting file_bundling, we create brpc channels between CN nodes during each publish operation, which may affect publish performance.
What I'm doing:
Add lake service stub cache to avoid creating brpc channels on each publish
Signed-off-by: sevev <qiangzh95@gmail.com>
When partial compaction is not used, we still need to set the correct new segment info so that aborting the txn can clean up the new segments.
Signed-off-by: starrocks-xupeng <xupeng@starrocks.com>
Why I'm doing:
If rowset data is deleted by garbage collection, the inverted index will not be removed, because path scanning ignores all of the directories under the tablet schema hash path.
What I'm doing:
Path scanning will scan inverted index paths.
Signed-off-by: wuxueyang.wxy <wuxueyang.wxy@alibaba-inc.com>
Why I'm doing:
refactor bitpacking code for further improvement.
What I'm doing:
This PR does:
merge bit_packing.h and bit_packing.inline.h => bit_packing_default.h. This implementation uses templates and unrolling for acceleration. Meanwhile, use the namespace util::bitpacking_default instead of the class BitPacking
rename bit_packing_simd.h to bit_packing_avx2.h, because it only uses avx2 instructions.
move the arrow bit packing code to bit_packing_arrow.h
rename bit_packing_adaptor.h to bit_packing.h; this is the entry file.
So right now we have the following files, and the entry file is bit_packing.h:
```
-rw-rw-r-- 1 zhangyan zhangyan  4861 Jun 26 14:09 bit_packing_arrow.h
-rw-rw-r-- 1 zhangyan zhangyan 19580 Jun 26 14:05 bit_packing_avx2.h
-rw-rw-r-- 1 zhangyan zhangyan 11541 Jun 26 14:03 bit_packing_default.h
-rw-rw-r-- 1 zhangyan zhangyan  1708 Jun 26 14:10 bit_packing.h
```
Signed-off-by: yan zhang <dirtysalt1987@gmail.com>
Why I'm doing:
trying to implement functions in Good First Issue list
What I'm doing:
Trino reference:
Fixes #52604
Signed-off-by: Mesut-Doner <mesutdonerng@gmail.com>
Branch-3.3 (pr: #51263) has already set the default value of config::chunk_reserved_bytes_limit to 0, and there is no performance issue, so we finally removed the core memory allocator in the main branch.
What I'm doing:
Remove core arena mem allocator
Signed-off-by: trueeyu <lxhhust350@qq.com>
Why I'm doing:
When reading bundled data files, we should pass the file info instead of the file path, as the info may contain file size information. In some filesystem implementations, this avoids additional file size fetch requests.
What I'm doing:
This pull request modifies the FileSystem::new_random_access_file_w method in be/src/fs/fs.cpp to improve how RandomAccessFile objects are created by passing the entire FileInfo object instead of just its path attribute.
Changes to FileSystem::new_random_access_file_w:
Updated calls to new_random_access_file to use the full FileInfo object instead of only file_info.path. This ensures that all relevant file metadata is available during the creation of RandomAccessFile instances.
Signed-off-by: luohaha <18810541851@163.com>
Fixes #57461
```
mysql> select * from TABLE(list_rowsets(24015, 10));
ERROR 1064 (HY000): Only works for tablets in the cloud-native table: BE:11001
```
Signed-off-by: Rohit Satardekar <rohitrs1983@gmail.com>