Init commit

Zhao Chun 2021-09-04 20:33:19 +08:00
commit 5fa55b8199
4381 changed files with 1070188 additions and 0 deletions

54
.gitignore vendored Normal file

@ -0,0 +1,54 @@
*.swp
*.pyc
be/output
be/build
be/build_Release
be/ut_build
output
docs/contents
docs/.temp
docs/.vuepress/dist
docs/node_modules
docs/build
docs/contents
gensrc/build
fe/fe-core/target
thirdparty/src
thirdparty/installed
*.so.tmp
.DS_Store
*.iml
core.*
extension/spark-doris-connector/.classpath
extension/spark-doris-connector/target
fe/log
custom_env.sh
ut_dir
log/
fe_plugins/*/target
fe_plugins/output
fe/mocked
fe/ut_ports
fe/*/target
dependency-reduced-pom.xml
#ignore eclipse project file & idea project file
.cproject
.project
.settings/
.idea/
/Default/
be/cmake-build*
be/.vscode
be/src/gen_cpp/*.cc
be/src/gen_cpp/*.cpp
be/src/gen_cpp/*.h
be/src/gen_cpp/opcode
be/ut_build_ASAN/
be/tags
#ignore vscode project file
.vscode
.*
!.gitignore

263
APACHE-LICENSE-2.0.txt Normal file

@ -0,0 +1,263 @@
Portions of this code are available here [https://github.com/apache/incubator-doris].
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
--------------------------------------------------------------------------------
This product includes code from Apache Doris (incubating), which includes the
following in its NOTICE file:
Apache Doris (incubating)
Copyright 2018-2021 The Apache Software Foundation
This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).
Based on source code originally developed by
Baidu (http://www.baidu.com/).
--------------------------------------------------------------------------------
This product includes code from Apache Impala, which includes the following in
its NOTICE file:
Apache Impala
Copyright 2019 The Apache Software Foundation
This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).
Portions of this software were developed at
Cloudera, Inc (http://www.cloudera.com/).
This product includes software developed by the OpenSSL
Project for use in the OpenSSL Toolkit (http://www.openssl.org/)
This product includes cryptographic software written by Eric Young
(eay@cryptsoft.com). This product includes software written by Tim
Hudson (tjh@cryptsoft.com).
This product includes software developed by the University of Chicago,
as Operator of Argonne National Laboratory.
Copyright (C) 1999 University of Chicago. All rights reserved.
--------------------------------------------------------------------------------
This product includes code from Apache Kudu, which includes the following in
its NOTICE file:
Apache Kudu
Copyright 2016 The Apache Software Foundation
This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).
Portions of this software were developed at
Cloudera, Inc (http://www.cloudera.com/).
This product includes software developed by the OpenSSL
Project for use in the OpenSSL Toolkit (http://www.openssl.org/)
This product includes cryptographic software written by Eric Young
(eay@cryptsoft.com). This product includes software written by Tim
Hudson (tjh@cryptsoft.com).

45
CODE_OF_CONDUCT.md Normal file

@ -0,0 +1,45 @@
# Contributor Covenant Code of Conduct
## Our Pledge
In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to make participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socioeconomic status, nationality, personal appearance, race, religion, or sexual identity and orientation.
## Our Standards
Examples of behavior that contributes to creating a positive environment include:
* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
* The use of sexualized language or imagery and unwelcome sexual attention or advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a professional setting
## Our Responsibilities
Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
## Scope
This Code of Conduct applies within all project spaces, and it also applies when an individual is representing the project or its community in public spaces. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at opensource@starrocks.com. All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate to the circumstances. The project team is obligated to maintain confidentiality about the reporter of an incident. Further details of specific enforcement policies may be posted separately.
Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant](https://www.contributor-covenant.org/), version 1.4.
For answers to common questions about this code of conduct, see [FAQ](https://www.contributor-covenant.org/faq).

12
CONTRIBUTING.md Normal file

@ -0,0 +1,12 @@
# Contributing to StarRocks
## Code of Conduct
The code of conduct is described in [`CODE_OF_CONDUCT.md`](CODE_OF_CONDUCT.md)
## Contributor License Agreement ("CLA")
In order to accept your pull request, we need you to submit a CLA. You
only need to do this once. All your contributions will be covered by the CLA.
Our CLA is based on the Apache CLAs.
Complete your CLA [here](https://cla-assistant.io/StarRocks/StarRocks)

93
LICENSE.txt Normal file

@ -0,0 +1,93 @@
Elastic License 2.0
URL: https://www.elastic.co/licensing/elastic-license
## Acceptance
By using the software, you agree to all of the terms and conditions below.
## Copyright License
The licensor grants you a non-exclusive, royalty-free, worldwide,
non-sublicensable, non-transferable license to use, copy, distribute, make
available, and prepare derivative works of the software, in each case subject to
the limitations and conditions below.
## Limitations
You may not provide the software to third parties as a hosted or managed
service, where the service provides users with access to any substantial set of
the features or functionality of the software.
You may not move, change, disable, or circumvent the license key functionality
in the software, and you may not remove or obscure any functionality in the
software that is protected by the license key.
You may not alter, remove, or obscure any licensing, copyright, or other notices
of the licensor in the software. Any use of the licensor's trademarks is subject
to applicable law.
## Patents
The licensor grants you a license, under any patent claims the licensor can
license, or becomes able to license, to make, have made, use, sell, offer for
sale, import and have imported the software, in each case subject to the
limitations and conditions in this license. This license does not cover any
patent claims that you cause to be infringed by modifications or additions to
the software. If you or your company make any written claim that the software
infringes or contributes to infringement of any patent, your patent license for
the software granted under these terms ends immediately. If your company makes
such a claim, your patent license ends immediately for work on behalf of your
company.
## Notices
You must ensure that anyone who gets a copy of any part of the software from you
also gets a copy of these terms.
If you modify the software, you must include in any modified copies of the
software prominent notices stating that you have modified the software.
## No Other Rights
These terms do not imply any licenses other than those expressly granted in
these terms.
## Termination
If you use the software in violation of these terms, such use is not licensed,
and your licenses will automatically terminate. If the licensor provides you
with a notice of your violation, and you cease all violation of this license no
later than 30 days after you receive that notice, your licenses will be
reinstated retroactively. However, if you violate these terms after such
reinstatement, any additional violation of these terms will cause your licenses
to terminate automatically and permanently.
## No Liability
*As far as the law allows, the software comes as is, without any warranty or
condition, and the licensor will not be liable to you for any damages arising
out of these terms or the use or nature of the software, under any kind of
legal claim.*
## Definitions
The **licensor** is the entity offering these terms, and the **software** is the
software the licensor makes available under these terms, including any portion
of it.
**you** refers to the individual or entity agreeing to these terms.
**your company** is any legal entity, sole proprietorship, or other kind of
organization that you work for, plus all organizations that have control over,
are under the control of, or are under common control with that
organization. **control** means ownership of substantially all the assets of an
entity, or the power to direct its management and policies by vote, contract, or
otherwise. Control can be direct or indirect.
**your licenses** are all the licenses granted to you for the software under
these terms.
**use** means anything you do with the software requiring one of your licenses.
**trademark** means trademarks, service marks, and similar rights.

34
README.md Normal file

@ -0,0 +1,34 @@
# StarRocks
StarRocks is a next-gen, MPP-based interactive database for all your analytics, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
## Technology
* Native vectorized SQL engine: StarRocks adopts vectorization technology to leverage the parallel computing power of the CPU, including SIMD instructions and cache affinity. This yields a 5-10x performance advantage over previous technologies.
* Simple architecture: StarRocks does not rely on any external systems. The simple architecture makes it easy to deploy, maintain, and scale out. It also provides high availability, reliability, fault tolerance, and scalability.
* Standard SQL: StarRocks supports ANSI SQL syntax (TPC-H and TPC-DS are fully supported). It is also compatible with the MySQL protocol, so various clients and BI software can be used to access StarRocks (see the sketch after this list).
* Smart query optimization: StarRocks can optimize complex queries through its CBO (Cost-Based Optimizer). With a better execution plan, data analysis efficiency is greatly improved.
* Real-time update: The update model of StarRocks can perform upsert/delete operations according to the primary key, and achieves efficient queries even under concurrent updates.
* Intelligent materialized view: Materialized views in StarRocks are automatically updated during data import and automatically selected when a query is executed.
* Convenient federated queries: StarRocks makes it easy to run interactive, ad-hoc analytic queries against data sources such as Hive, MySQL, and Elasticsearch.
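Because StarRocks speaks the MySQL protocol, any MySQL driver can issue ad-hoc queries against it. Below is a minimal sketch using the MySQL C API; the address 127.0.0.1:9030 and the empty root credentials are assumptions and should be replaced with the FE address and account of a real deployment.

```cpp
#include <mysql/mysql.h>
#include <cstdio>

int main() {
    MYSQL* conn = mysql_init(nullptr);
    if (conn == nullptr) {
        std::fprintf(stderr, "mysql_init failed\n");
        return 1;
    }
    // Connect to the FE query endpoint (host, port, and credentials are assumed values).
    if (mysql_real_connect(conn, "127.0.0.1", "root", "", nullptr, 9030, nullptr, 0) == nullptr) {
        std::fprintf(stderr, "connect failed: %s\n", mysql_error(conn));
        mysql_close(conn);
        return 1;
    }
    if (mysql_query(conn, "SHOW DATABASES") == 0) {
        MYSQL_RES* res = mysql_store_result(conn);
        if (res != nullptr) {
            // Print one database name per row.
            for (MYSQL_ROW row; (row = mysql_fetch_row(res)) != nullptr;) {
                std::printf("%s\n", row[0]);
            }
            mysql_free_result(res);
        }
    }
    mysql_close(conn);
    return 0;
}
```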
## Use cases
* StarRocks not only provides high-concurrency, low-latency point lookups, but also high-throughput ad-hoc analytic queries.
* StarRocks unifies batch data ingestion and near real-time streaming.
* Pre-aggregations, flat tables, and star and snowflake schemas are supported, and all run at enhanced speed.
* StarRocks combines serving and analytical requirements, and is easy to deploy, develop, and use.
## Install
Please refer to the [deploy](https://github.com/StarRocks/docs/blob/master/quick_start/deploy.md) guide.
## Links
* [StarRocks official site](https://www.starrocks.com) (WIP)
* [StarRocks Documentation](https://docs.starrocks.com) (WIP)
## LICENSE
Code in this repository is provided under the Elastic License 2.0. Some portions are available under open source licenses.
Please see our FAQ.

6
be/.gitignore vendored Normal file

@ -0,0 +1,6 @@
/*
!.gitignore
!benchmark/
!src/
!test/
!CMakeLists.txt

648
be/CMakeLists.txt Normal file

@ -0,0 +1,648 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
cmake_minimum_required(VERSION 3.5.1)
# set CMAKE_C_COMPILER; this must be set before the project command
if (DEFINED ENV{CC} AND DEFINED ENV{CXX})
set(CMAKE_C_COMPILER "$ENV{CC}")
set(CMAKE_CXX_COMPILER "$ENV{CXX}")
elseif (DEFINED ENV{STARROCKS_GCC_HOME})
# prefer GCC
set(CMAKE_C_COMPILER "$ENV{STARROCKS_GCC_HOME}/bin/gcc")
set(CMAKE_CXX_COMPILER "$ENV{STARROCKS_GCC_HOME}/bin/g++")
elseif (DEFINED ENV{STARROCKS_LLVM_HOME})
set(CMAKE_C_COMPILER "$ENV{STARROCKS_LLVM_HOME}/bin/clang")
set(CMAKE_CXX_COMPILER "$ENV{STARROCKS_LLVM_HOME}/bin/clang++")
else()
message(FATAL_ERROR "STARROCKS_GCC_HOME environment variable is not set")
endif()
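# Illustrative invocations only (the paths below are examples, not defaults):
#   CC=/usr/bin/gcc CXX=/usr/bin/g++ cmake ..
#   STARROCKS_GCC_HOME=/opt/gcc cmake ..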
project(starrocks CXX C)
# set CMAKE_BUILD_TYPE
if (NOT CMAKE_BUILD_TYPE)
set(CMAKE_BUILD_TYPE RELEASE)
endif()
string(TOUPPER ${CMAKE_BUILD_TYPE} CMAKE_BUILD_TYPE)
message(STATUS "Build type is ${CMAKE_BUILD_TYPE}")
# set CMAKE_BUILD_TARGET_ARCH
# using `lscpu | grep 'Architecture' | awk '{print $2}'` would only work on systems whose language is en_US.UTF-8, so use `uname -m` instead
execute_process(COMMAND bash "-c" "uname -m"
OUTPUT_VARIABLE
CMAKE_BUILD_TARGET_ARCH
OUTPUT_STRIP_TRAILING_WHITESPACE)
message(STATUS "Build target arch is ${CMAKE_BUILD_TARGET_ARCH}")
# Set dirs
set(BASE_DIR "${CMAKE_CURRENT_SOURCE_DIR}")
set(ENV{STARROCKS_HOME} "${BASE_DIR}/../")
set(BUILD_DIR "${CMAKE_CURRENT_BINARY_DIR}")
set(THIRDPARTY_DIR "$ENV{STARROCKS_THIRDPARTY}/installed/")
set(GENSRC_DIR "${BASE_DIR}/../gensrc/build/")
set(SRC_DIR "${BASE_DIR}/src/")
set(TEST_DIR "${CMAKE_SOURCE_DIR}/test/")
set(OUTPUT_DIR "${BASE_DIR}/output")
set(CONTRIB_DIR "${BASE_DIR}/../contrib/")
if (APPLE)
set(MAKE_TEST "ON")
else()
option(MAKE_TEST "ON for make unit test or OFF for not" OFF)
endif()
message(STATUS "make test: ${MAKE_TEST}")
option(WITH_MYSQL "Support access MySQL" ON)
option(WITH_GCOV "Build binary with gcov to get code coverage" OFF)
# Check gcc
if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
if (CMAKE_CXX_COMPILER_VERSION VERSION_LESS "4.8.2")
message(FATAL_ERROR "Need GCC version at least 4.8.2")
endif()
if (CMAKE_CXX_COMPILER_VERSION VERSION_GREATER "7.3.0")
message(STATUS "GCC version ${CMAKE_CXX_COMPILER_VERSION} is greater than 7.3.0, disable -Werror. Be careful with compile warnings.")
else()
# -Werror: compile warnings should be errors when using the toolchain compiler.
set(CXX_GCC_FLAGS "${CXX_GCC_FLAGS} -Werror")
endif()
elseif (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
elseif (NOT APPLE)
message(FATAL_ERROR "Compiler should be GNU")
endif()
set(PIC_LIB_PATH "${THIRDPARTY_DIR}")
if(PIC_LIB_PATH)
message(STATUS "defined PIC_LIB_PATH")
set(CMAKE_SKIP_RPATH TRUE)
set(Boost_USE_STATIC_LIBS ON)
set(Boost_USE_STATIC_RUNTIME ON)
set(LIBBZ2 ${PIC_LIB_PATH}/lib/libbz2.a)
set(LIBZ ${PIC_LIB_PATH}/lib/libz.a)
set(LIBEVENT ${PIC_LIB_PATH}/lib/libevent.a)
else()
message(STATUS "undefined PIC_LIB_PATH")
set(Boost_USE_STATIC_LIBS ON)
set(Boost_USE_STATIC_RUNTIME ON)
set(LIBBZ2 -lbz2)
set(LIBZ -lz)
set(LIBEVENT event)
endif()
# Compile generated source if necessary
message(STATUS "build gensrc if necessary")
execute_process(COMMAND make -C ${BASE_DIR}/../gensrc/
RESULT_VARIABLE MAKE_GENSRC_RESULT)
if(NOT ${MAKE_GENSRC_RESULT} EQUAL 0 AND NOT APPLE)
message(FATAL_ERROR "Failed to build ${BASE_DIR}/../gensrc/")
endif()
# Set Boost
set(Boost_DEBUG FALSE)
set(Boost_USE_MULTITHREADED ON)
set(BOOST_ROOT ${THIRDPARTY_DIR})
# suppressing the Boost version warning is supported since CMake 3.20
# https://cmake.org/cmake/help/latest/module/FindBoost.html
set(Boost_NO_WARN_NEW_VERSIONS ON)
if (NOT APPLE)
find_package(Boost 1.55.0 REQUIRED COMPONENTS thread regex filesystem system date_time program_options)
else()
find_package(Boost 1.55.0 COMPONENTS thread regex filesystem system date_time program_options)
endif()
include_directories(${Boost_INCLUDE_DIRS})
message(STATUS ${Boost_LIBRARIES})
set(GPERFTOOLS_HOME "${THIRDPARTY_DIR}/gperftools")
# Set all libraries
add_library(gflags STATIC IMPORTED)
set_target_properties(gflags PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libgflags.a)
add_library(glog STATIC IMPORTED)
set_target_properties(glog PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libglog.a)
add_library(re2 STATIC IMPORTED)
set_target_properties(re2 PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libre2.a)
add_library(pprof STATIC IMPORTED)
set_target_properties(pprof PROPERTIES IMPORTED_LOCATION
${GPERFTOOLS_HOME}/lib/libprofiler.a)
add_library(tcmalloc STATIC IMPORTED)
set_target_properties(tcmalloc PROPERTIES IMPORTED_LOCATION
${GPERFTOOLS_HOME}/lib/libtcmalloc.a)
add_library(protobuf STATIC IMPORTED)
set_target_properties(protobuf PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libprotobuf.a)
add_library(protoc STATIC IMPORTED)
set_target_properties(protoc PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libprotoc.a)
add_library(gtest STATIC IMPORTED)
set_target_properties(gtest PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libgtest.a)
add_library(gmock STATIC IMPORTED)
set_target_properties(gmock PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libgmock.a)
add_library(snappy STATIC IMPORTED)
set_target_properties(snappy PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libsnappy.a)
add_library(curl STATIC IMPORTED)
set_target_properties(curl PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libcurl.a)
add_library(lz4 STATIC IMPORTED)
set_target_properties(lz4 PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/liblz4.a)
add_library(thrift STATIC IMPORTED)
set_target_properties(thrift PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libthrift.a)
add_library(thriftnb STATIC IMPORTED)
set_target_properties(thriftnb PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libthriftnb.a)
if (WITH_MYSQL)
add_library(mysql STATIC IMPORTED)
set_target_properties(mysql PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libmysqlclient.a)
endif()
if (WITH_HDFS)
add_library(hdfs STATIC IMPORTED)
set_target_properties(hdfs PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libhdfs.a)
endif()
add_library(libevent STATIC IMPORTED)
set_target_properties(libevent PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libevent.a)
add_library(crypto STATIC IMPORTED)
set_target_properties(crypto PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libcrypto.a)
add_library(openssl STATIC IMPORTED)
set_target_properties(openssl PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libssl.a)
add_library(leveldb STATIC IMPORTED)
set_target_properties(leveldb PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libleveldb.a)
add_library(jemalloc STATIC IMPORTED)
set_target_properties(jemalloc PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib64/libjemalloc.a)
add_library(brotlicommon STATIC IMPORTED)
set_target_properties(brotlicommon PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib64/libbrotlicommon.a)
add_library(brotlidec STATIC IMPORTED)
set_target_properties(brotlidec PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib64/libbrotlidec.a)
add_library(brotlienc STATIC IMPORTED)
set_target_properties(brotlienc PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib64/libbrotlienc.a)
add_library(zstd STATIC IMPORTED)
set_target_properties(zstd PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib64/libzstd.a)
add_library(arrow STATIC IMPORTED)
set_target_properties(arrow PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib64/libarrow.a)
add_library(double-conversion STATIC IMPORTED)
set_target_properties(double-conversion PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib64/libdouble-conversion.a)
add_library(parquet STATIC IMPORTED)
set_target_properties(parquet PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib64/libparquet.a)
add_library(brpc STATIC IMPORTED)
set_target_properties(brpc PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib64/libbrpc.a)
add_library(rocksdb STATIC IMPORTED)
set_target_properties(rocksdb PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/librocksdb.a)
add_library(librdkafka_cpp STATIC IMPORTED)
set_target_properties(librdkafka_cpp PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/librdkafka++.a)
add_library(librdkafka STATIC IMPORTED)
set_target_properties(librdkafka PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/librdkafka.a)
add_library(libs2 STATIC IMPORTED)
set_target_properties(libs2 PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libs2.a)
add_library(bitshuffle STATIC IMPORTED)
set_target_properties(bitshuffle PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libbitshuffle.a)
add_library(roaring STATIC IMPORTED)
set_target_properties(roaring PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libroaring.a)
add_library(cctz STATIC IMPORTED)
set_target_properties(cctz PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libcctz.a)
add_library(benchmark STATIC IMPORTED)
set_target_properties(benchmark PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib64/libbenchmark.a)
add_library(benchmark_main STATIC IMPORTED)
set_target_properties(benchmark_main PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib64/libbenchmark_main.a)
add_library(fmt STATIC IMPORTED)
set_target_properties(fmt PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib64/libfmt.a)
add_library(ryu STATIC IMPORTED)
set_target_properties(ryu PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib64/libryu.a)
add_library(breakpad STATIC IMPORTED)
set_target_properties(breakpad PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libbreakpad_client.a)
find_program(THRIFT_COMPILER thrift ${CMAKE_SOURCE_DIR}/bin)
# Check whether functions are supported on this platform. All flags will be generated
# in gensrc/build/common/env_config.h.
# You can check platform-dependent functions here. Don't forget to add them
# to be/src/common/env_config.h.in
include(CheckFunctionExists)
check_function_exists(sched_getcpu HAVE_SCHED_GETCPU)
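# For illustration only: another platform check could be added the same way, e.g.
#   check_function_exists(fallocate HAVE_FALLOCATE)
# (HAVE_FALLOCATE is a hypothetical flag; it would also need an entry in be/src/common/env_config.h.in)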
# support passing cxx flags from the environment.
set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} $ENV{STARROCKS_CXX_COMMON_FLAGS}")
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} $ENV{STARROCKS_CXX_LINKER_FLAGS}")
# compiler flags that are common across debug/release builds
# -Wall: Enable all warnings.
# -Wno-sign-compare: suppress warnings for comparison between signed and unsigned
# integers
# -fno-strict-aliasing: disable optimizations that assume strict aliasing. This
# is unsafe to do if the code uses casts (which we obviously do).
# -Wno-unknown-pragmas: suppress warnings for unknown (compiler specific) pragmas
# -Wno-deprecated: gutil contains deprecated headers
# -Wno-vla: we use C99-style variable-length arrays
# -pthread: enable multithreaded malloc
# -DBOOST_DATE_TIME_POSIX_TIME_STD_CONFIG: enable nanosecond precision for boost
# -fno-omit-frame-pointer: keep the frame pointer in a register for functions
set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -Wall -Wno-sign-compare -Wno-unknown-pragmas -pthread -Wno-register")
set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -Wno-strict-aliasing -fno-omit-frame-pointer")
set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -std=gnu++17 -D__STDC_FORMAT_MACROS")
set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -Wno-deprecated -Wno-vla -Wno-comment")
set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -DBOOST_DATE_TIME_POSIX_TIME_STD_CONFIG")
set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -DBOOST_SYSTEM_NO_DEPRECATED -DBOOST_UUID_RANDOM_PROVIDER_FORCE_POSIX")
set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -Werror=return-type -Werror=switch")
# When LLVM is used, STARROCKS_GCC_HOME should be given so that the C++ standard library headers (e.g. the new string and list) can be found
if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
if (DEFINED ENV{STARROCKS_GCC_HOME})
set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} --gcc-toolchain=$ENV{STARROCKS_GCC_HOME}")
else()
message(WARNING "STARROCKS_GCC_HOME evnironment variable is not set, ")
endif()
endif()
if ("${CMAKE_BUILD_TARGET_ARCH}" STREQUAL "x86" OR "${CMAKE_BUILD_TARGET_ARCH}" STREQUAL "x86_64")
set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -msse4.2 -mavx2")
endif()
set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -Wno-attributes -DS2_USE_GFLAGS -DS2_USE_GLOG")
if (WITH_HDFS)
set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -DSTARROCKS_WITH_HDFS")
endif()
if (WITH_MYSQL)
set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -DSTARROCKS_WITH_MYSQL")
endif()
if (CMAKE_CXX_COMPILER_VERSION VERSION_GREATER 7.0)
set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -faligned-new")
endif()
# For any gcc builds:
# -g: Enable symbols for profiler tools
# -Wno-unused-local-typedefs: Do not warn for local typedefs that are unused.
set(CXX_GCC_FLAGS "${CXX_GCC_FLAGS} -g -Wno-unused-local-typedefs")
if (WITH_GCOV)
# generate flags for code coverage
set(CXX_GCC_FLAGS "${CXX_GCC_FLAGS} -fprofile-arcs -ftest-coverage")
endif()
# compress debug sections. https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html
set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -gz=zlib")
# For CMAKE_BUILD_TYPE=Debug
# -ggdb: Enable gdb debugging
# -gdwarf-4: Debug information is stored as DWARF 4 to be as compatible as possible
# -Werror: compile warnings should be errors when using the toolchain compiler.
# Only enable for debug builds because this is what we test in pre-commit tests.
set(CXX_FLAGS_DEBUG "${CXX_GCC_FLAGS} -ggdb -O0 -gdwarf-4")
# For CMAKE_BUILD_TYPE=Release
# -O3: Enable all compiler optimizations
# -DNDEBUG: Turn off dchecks/asserts/debug only code.
# -gdwarf-4: Debug information is stored as DWARF 4 to be as compatible as possible
set(CXX_FLAGS_RELEASE "${CXX_GCC_FLAGS} -O3 -gdwarf-4 -DNDEBUG")
SET(CXX_FLAGS_ASAN "${CXX_GCC_FLAGS} -ggdb3 -O0 -gdwarf-4 -fsanitize=address -DADDRESS_SANITIZER")
SET(CXX_FLAGS_LSAN "${CXX_GCC_FLAGS} -ggdb3 -O0 -gdwarf-4 -fsanitize=leak -DLEAK_SANITIZER")
# Set the flags to the undefined behavior sanitizer, also known as "ubsan"
# Turn on sanitizer and debug symbols to get stack traces:
SET(CXX_FLAGS_UBSAN "${CXX_FLAGS_RELEASE} -fsanitize=undefined")
# Ignore a number of noisy errors with too many false positives:
# TODO(zc):
# SET(CXX_FLAGS_UBSAN "${CXX_FLAGS_UBSAN} -fno-sanitize=alignment,function,vptr,float-divide-by-zero,float-cast-overflow")
# Don't enforce wrapped signed integer arithmetic so that the sanitizer actually sees the arithmetic issues
# Set the flags to the thread sanitizer, also known as "tsan"
# Turn on sanitizer and debug symbols to get stack traces:
SET(CXX_FLAGS_TSAN "${CXX_GCC_FLAGS} -O0 -ggdb3 -fsanitize=thread -DTHREAD_SANITIZER")
# Set compile flags based on the build type.
if ("${CMAKE_BUILD_TYPE}" STREQUAL "DEBUG")
SET(CMAKE_CXX_FLAGS ${CXX_FLAGS_DEBUG})
elseif ("${CMAKE_BUILD_TYPE}" STREQUAL "RELEASE" OR "${CMAKE_BUILD_TYPE}" STREQUAL "BCC")
SET(CMAKE_CXX_FLAGS ${CXX_FLAGS_RELEASE})
elseif ("${CMAKE_BUILD_TYPE}" STREQUAL "ASAN")
SET(CMAKE_CXX_FLAGS "${CXX_FLAGS_ASAN}")
elseif ("${CMAKE_BUILD_TYPE}" STREQUAL "LSAN")
SET(CMAKE_CXX_FLAGS "${CXX_FLAGS_LSAN}")
elseif ("${CMAKE_BUILD_TYPE}" STREQUAL "UBSAN")
SET(CMAKE_CXX_FLAGS "${CXX_FLAGS_UBSAN}")
elseif ("${CMAKE_BUILD_TYPE}" STREQUAL "TSAN")
SET(CMAKE_CXX_FLAGS "${CXX_FLAGS_TSAN}")
else()
message(FATAL_ERROR "Unknown build type: ${CMAKE_BUILD_TYPE}")
endif()
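# Illustrative usage: the build type is normally selected on the cmake command line, e.g.
#   cmake -DCMAKE_BUILD_TYPE=Release ..   (or DEBUG/ASAN/LSAN/UBSAN/TSAN)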
# Add flags that are common across build types
SET(CMAKE_CXX_FLAGS "${CXX_COMMON_FLAGS} ${CMAKE_CXX_FLAGS}")
message(STATUS "Compiler Flags: ${CMAKE_CXX_FLAGS}")
# Thrift requires these two definitions for some types that we use
add_definitions(-DHAVE_INTTYPES_H -DHAVE_NETINET_IN_H)
# Set include dirs
include_directories(
BEFORE
${SRC_DIR}/formats/orc/apache-orc/c++/include/
)
include_directories(
${SRC_DIR}/
${TEST_DIR}/
${GENSRC_DIR}/
${THIRDPARTY_DIR}/include
${GPERFTOOLS_HOME}/include
${THIRDPARTY_DIR}/include/thrift/
${THIRDPARTY_DIR}/include/event/
${THIRDPARTY_DIR}/include/breakpad/
)
set(WL_START_GROUP "-Wl,--start-group")
set(WL_END_GROUP "-Wl,--end-group")
# Set starrocks libraries
set(STARROCKS_LINK_LIBS
${WL_START_GROUP}
Agent
Common
Column
Env
Exec
Exprs
Formats
Gutil
Memory
Olap
Rowset
OlapFs
Runtime
Service
Udf
Util
StarRocksGen
Webserver
TestUtil
Tools
Geo
Plugin
${WL_END_GROUP}
)
# Set thirdparty libraries
set(STARROCKS_DEPENDENCIES
${STARROCKS_DEPENDENCIES}
${WL_START_GROUP}
rocksdb
librdkafka_cpp
librdkafka
libs2
snappy
${Boost_LIBRARIES}
thrift
thriftnb
glog
re2
pprof
lz4
libevent
curl
${LIBZ}
${LIBBZ2}
gflags
brpc
protobuf
openssl
crypto
leveldb
bitshuffle
roaring
double-conversion
jemalloc
brotlicommon
brotlidec
brotlienc
zstd
arrow
parquet
orc
cctz
fmt
ryu
breakpad
${WL_END_GROUP}
)
if (WITH_MYSQL)
set(STARROCKS_DEPENDENCIES ${STARROCKS_DEPENDENCIES}
mysql
)
endif()
if (WITH_HDFS)
set(STARROCKS_DEPENDENCIES ${STARROCKS_DEPENDENCIES}
hdfs
jvm
)
endif()
# Add all external dependencies. They should come after the starrocks libs.
# static link gcc's lib
set(STARROCKS_LINK_LIBS ${STARROCKS_LINK_LIBS}
${STARROCKS_DEPENDENCIES}
-static-libstdc++
-static-libgcc
)
if ("${CMAKE_BUILD_TYPE}" STREQUAL "BCC")
set(STARROCKS_LINK_LIBS ${STARROCKS_LINK_LIBS}
-Wl,--dynamic-linker=/lib64/ld-linux-x86-64.so.2
)
endif()
# Add sanitize static link flags or tcmalloc
if ("${CMAKE_BUILD_TYPE}" STREQUAL "DEBUG" OR "${CMAKE_BUILD_TYPE}" STREQUAL "RELEASE" OR "${CMAKE_BUILD_TYPE}" STREQUAL "BCC")
set(STARROCKS_LINK_LIBS ${STARROCKS_LINK_LIBS} tcmalloc)
elseif ("${CMAKE_BUILD_TYPE}" STREQUAL "ASAN")
set(STARROCKS_LINK_LIBS ${STARROCKS_LINK_LIBS} -static-libasan)
elseif ("${CMAKE_BUILD_TYPE}" STREQUAL "LSAN")
set(STARROCKS_LINK_LIBS ${STARROCKS_LINK_LIBS} -static-liblsan)
elseif ("${CMAKE_BUILD_TYPE}" STREQUAL "UBSAN")
set(STARROCKS_LINK_LIBS ${STARROCKS_LINK_LIBS} -static-libubsan tcmalloc)
elseif ("${CMAKE_BUILD_TYPE}" STREQUAL "TSAN")
set(STARROCKS_LINK_LIBS ${STARROCKS_LINK_LIBS} -static-libtsan)
else()
message(FATAL_ERROR "Unknown build type: ${CMAKE_BUILD_TYPE}")
endif()
set(STARROCKS_LINK_LIBS ${STARROCKS_LINK_LIBS}
-lrt -lbfd -liberty -lc -lm -ldl -rdynamic -pthread
)
# link gcov if WITH_GCOV is on
if (WITH_GCOV)
set(STARROCKS_LINK_LIBS ${STARROCKS_LINK_LIBS} -lgcov)
endif()
# Set libraries for test
set (TEST_LINK_LIBS ${STARROCKS_LINK_LIBS}
${WL_START_GROUP}
gmock
gtest
${WL_END_GROUP}
)
# Only build static libs
set(BUILD_SHARED_LIBS OFF)
if (${MAKE_TEST} STREQUAL "ON")
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fprofile-arcs -ftest-coverage -DGTEST_USE_OWN_TR1_TUPLE=0")
SET(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -fprofile-arcs -ftest-coverage -lgcov")
add_definitions(-DBE_TEST)
else()
# output *.a, *.so, *.dylib to output/tmp
set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY ${OUTPUT_DIR}/tmp/${CMAKE_BUILD_TYPE})
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${OUTPUT_DIR}/tmp/${CMAKE_BUILD_TYPE})
# output *.exe to output/lib
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${OUTPUT_DIR}/lib)
endif ()
add_subdirectory(${SRC_DIR}/agent)
add_subdirectory(${SRC_DIR}/common)
add_subdirectory(${SRC_DIR}/column)
add_subdirectory(${SRC_DIR}/formats)
add_subdirectory(${SRC_DIR}/env)
add_subdirectory(${SRC_DIR}/exec)
add_subdirectory(${SRC_DIR}/exprs)
add_subdirectory(${SRC_DIR}/gen_cpp)
add_subdirectory(${SRC_DIR}/geo)
add_subdirectory(${SRC_DIR}/gutil)
add_subdirectory(${SRC_DIR}/http)
add_subdirectory(${SRC_DIR}/storage)
add_subdirectory(${SRC_DIR}/runtime)
add_subdirectory(${SRC_DIR}/service)
add_subdirectory(${SRC_DIR}/testutil)
add_subdirectory(${SRC_DIR}/udf)
add_subdirectory(${SRC_DIR}/tools)
add_subdirectory(${SRC_DIR}/util)
add_subdirectory(${SRC_DIR}/plugin)
# Utility CMake function to make specifying tests and benchmarks less verbose
FUNCTION(ADD_BE_TEST TEST_NAME)
set(BUILD_OUTPUT_ROOT_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/")
# This gets the directory where the test is from (e.g. 'exprs' or 'runtime')
get_filename_component(DIR_NAME ${CMAKE_CURRENT_SOURCE_DIR} NAME)
get_filename_component(TEST_DIR_NAME ${TEST_NAME} PATH)
get_filename_component(TEST_FILE_NAME ${TEST_NAME} NAME)
ADD_EXECUTABLE(${TEST_FILE_NAME} ${TEST_NAME}.cpp)
TARGET_LINK_LIBRARIES(${TEST_FILE_NAME} ${TEST_LINK_LIBS})
SET_TARGET_PROPERTIES(${TEST_FILE_NAME} PROPERTIES COMPILE_FLAGS "-fno-access-control")
if (NOT "${TEST_DIR_NAME}" STREQUAL "")
SET_TARGET_PROPERTIES(${TEST_FILE_NAME} PROPERTIES RUNTIME_OUTPUT_DIRECTORY "${BUILD_OUTPUT_ROOT_DIRECTORY}/${TEST_DIR_NAME}")
endif()
ADD_TEST(${TEST_FILE_NAME} "${BUILD_OUTPUT_ROOT_DIRECTORY}/${TEST_NAME}")
ENDFUNCTION()
FUNCTION(ADD_BE_PLUGIN PLUGIN_NAME)
set(BUILD_OUTPUT_ROOT_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/")
get_filename_component(DIR_NAME ${CMAKE_CURRENT_SOURCE_DIR} NAME)
get_filename_component(PLUGIN_DIR_NAME ${PLUGIN_NAME} PATH)
get_filename_component(PLUGIN_FILE_NAME ${PLUGIN_NAME} NAME)
ADD_LIBRARY(${PLUGIN_FILE_NAME} SHARED ${PLUGIN_NAME}.cpp)
TARGET_LINK_LIBRARIES(${PLUGIN_FILE_NAME} ${STARROCKS_LINK_LIBS})
SET_TARGET_PROPERTIES(${PLUGIN_FILE_NAME} PROPERTIES COMPILE_FLAGS "-fno-access-control")
if (NOT "${PLUGIN_DIR_NAME}" STREQUAL "")
SET_TARGET_PROPERTIES(${PLUGIN_FILE_NAME} PROPERTIES RUNTIME_OUTPUT_DIRECTORY "${BUILD_OUTPUT_ROOT_DIRECTORY}/${PLUGIN_DIR_NAME}")
endif ()
ENDFUNCTION()
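# Illustrative usage (the target names below are hypothetical): a test or plugin is declared
# from its source path relative to the current directory, without the .cpp suffix, e.g.
#   ADD_BE_TEST(internal_service_test)
#   ADD_BE_PLUGIN(example_plugin)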
if (${MAKE_TEST} STREQUAL "ON")
add_subdirectory(test)
# The following commands should be removed after all tests are merged into a single binary.
add_subdirectory(${TEST_DIR}/exec)
add_subdirectory(${TEST_DIR}/http)
add_subdirectory(${TEST_DIR}/storage)
add_subdirectory(${TEST_DIR}/runtime)
add_subdirectory(${TEST_DIR}/util)
endif ()
# Install be
install(DIRECTORY DESTINATION ${OUTPUT_DIR})
install(DIRECTORY DESTINATION ${OUTPUT_DIR}/bin)
install(DIRECTORY DESTINATION ${OUTPUT_DIR}/conf)
install(FILES
${BASE_DIR}/../bin/common.sh
${BASE_DIR}/../bin/start_be.sh
${BASE_DIR}/../bin/stop_be.sh
${BASE_DIR}/../bin/show_be_version.sh
${BASE_DIR}/../bin/meta_tool.sh
PERMISSIONS OWNER_READ OWNER_WRITE OWNER_EXECUTE
GROUP_READ GROUP_WRITE GROUP_EXECUTE
WORLD_READ WORLD_EXECUTE
DESTINATION ${OUTPUT_DIR}/bin)
install(FILES
${BASE_DIR}/../conf/be.conf
${BASE_DIR}/../conf/hadoop_env.sh
DESTINATION ${OUTPUT_DIR}/conf)
install(DIRECTORY
${BASE_DIR}/../webroot/be/
DESTINATION ${OUTPUT_DIR}/www)

33
be/src/agent/CMakeLists.txt Normal file

@ -0,0 +1,33 @@
# This file is made available under Elastic License 2.0.
# This file is based on code available under the Apache license here:
# https://github.com/apache/incubator-doris/blob/master/be/src/agent/CMakeLists.txt
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
# where to put generated libraries
set(LIBRARY_OUTPUT_PATH "${BUILD_DIR}/src/agent")
# where to put generated binaries
set(EXECUTABLE_OUTPUT_PATH "${BUILD_DIR}/src/agent")
add_library(Agent STATIC
agent_server.cpp
heartbeat_server.cpp
task_worker_pool.cpp
utils.cpp
)
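# The resulting Agent archive is linked into the backend binary via STARROCKS_LINK_LIBS in be/CMakeLists.txt.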

216
be/src/agent/agent_server.cpp Normal file

@ -0,0 +1,216 @@
// This file is made available under Elastic License 2.0.
// This file is based on code available under the Apache license here:
// https://github.com/apache/incubator-doris/blob/master/be/src/agent/agent_server.cpp
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#include "agent/agent_server.h"
#include <filesystem>
#include <string>
#include "agent/task_worker_pool.h"
#include "common/logging.h"
#include "common/status.h"
#include "gutil/strings/substitute.h"
#include "storage/snapshot_manager.h"
using std::string;
using std::vector;
namespace starrocks {
AgentServer::AgentServer(ExecEnv* exec_env, const TMasterInfo& master_info)
: _exec_env(exec_env), _master_info(master_info) {
for (auto& path : exec_env->store_paths()) {
try {
string dpp_download_path_str = path.path + DPP_PREFIX;
std::filesystem::path dpp_download_path(dpp_download_path_str);
if (std::filesystem::exists(dpp_download_path)) {
std::filesystem::remove_all(dpp_download_path);
}
} catch (...) {
LOG(WARNING) << "std exception when remove dpp download path. path=" << path.path;
}
}
// The code to create workers of each type is the same, so we use a macro
// to make the code more readable.
#ifndef BE_TEST
#define CREATE_AND_START_POOL(type, pool_name) \
pool_name.reset(new TaskWorkerPool(TaskWorkerPool::TaskWorkerType::type, _exec_env, master_info)); \
pool_name->start();
#else
#define CREATE_AND_START_POOL(type, pool_name)
#endif // BE_TEST
CREATE_AND_START_POOL(CREATE_TABLE, _create_tablet_workers);
CREATE_AND_START_POOL(DROP_TABLE, _drop_tablet_workers);
// Both PUSH and REALTIME_PUSH type use _push_workers
CREATE_AND_START_POOL(PUSH, _push_workers);
CREATE_AND_START_POOL(PUBLISH_VERSION, _publish_version_workers);
CREATE_AND_START_POOL(CLEAR_TRANSACTION_TASK, _clear_transaction_task_workers);
CREATE_AND_START_POOL(DELETE, _delete_workers);
CREATE_AND_START_POOL(ALTER_TABLE, _alter_tablet_workers);
CREATE_AND_START_POOL(CLONE, _clone_workers);
CREATE_AND_START_POOL(STORAGE_MEDIUM_MIGRATE, _storage_medium_migrate_workers);
CREATE_AND_START_POOL(CHECK_CONSISTENCY, _check_consistency_workers);
CREATE_AND_START_POOL(REPORT_TASK, _report_task_workers);
CREATE_AND_START_POOL(REPORT_DISK_STATE, _report_disk_state_workers);
CREATE_AND_START_POOL(REPORT_OLAP_TABLE, _report_tablet_workers);
CREATE_AND_START_POOL(UPLOAD, _upload_workers);
CREATE_AND_START_POOL(DOWNLOAD, _download_workers);
CREATE_AND_START_POOL(MAKE_SNAPSHOT, _make_snapshot_workers);
CREATE_AND_START_POOL(RELEASE_SNAPSHOT, _release_snapshot_workers);
CREATE_AND_START_POOL(MOVE, _move_dir_workers);
CREATE_AND_START_POOL(UPDATE_TABLET_META_INFO, _update_tablet_meta_info_workers);
#undef CREATE_AND_START_POOL
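// For reference: outside BE_TEST builds, an invocation such as CREATE_AND_START_POOL(CLONE, _clone_workers)
// expands to resetting _clone_workers with a TaskWorkerPool of type CLONE and calling start() on it;
// in BE_TEST builds the macro expands to nothing.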
}
AgentServer::~AgentServer() {}
// TODO(lingbin): each task in the batch may have its own status, or FE must check and
// resend the request when something is wrong (BE may need some logic to guarantee idempotence).
void AgentServer::submit_tasks(TAgentResult& agent_result, const std::vector<TAgentTaskRequest>& tasks) {
Status ret_st;
// TODO: check whether master_info here is the same as that of the heartbeat rpc
if (_master_info.network_address.hostname == "" || _master_info.network_address.port == 0) {
Status ret_st = Status::Cancelled("Have not get FE Master heartbeat yet");
ret_st.to_thrift(&agent_result.status);
return;
}
for (auto task : tasks) {
VLOG_RPC << "submit one task: " << apache::thrift::ThriftDebugString(task).c_str();
TTaskType::type task_type = task.task_type;
int64_t signature = task.signature;
#define HANDLE_TYPE(t_task_type, work_pool, req_member) \
case t_task_type: \
if (task.__isset.req_member) { \
work_pool->submit_task(task); \
} else { \
ret_st = Status::InvalidArgument( \
strings::Substitute("task(signature=$0) has wrong request member", signature)); \
} \
break;
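// For reference: HANDLE_TYPE(TTaskType::CLONE, _clone_workers, clone_req) expands to a
// `case TTaskType::CLONE:` branch that submits the task to _clone_workers when
// task.__isset.clone_req is set, and records an InvalidArgument status otherwise.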
// TODO(lingbin): It is still too long; divide these task types into several categories.
switch (task_type) {
HANDLE_TYPE(TTaskType::CREATE, _create_tablet_workers, create_tablet_req);
HANDLE_TYPE(TTaskType::DROP, _drop_tablet_workers, drop_tablet_req);
HANDLE_TYPE(TTaskType::PUBLISH_VERSION, _publish_version_workers, publish_version_req);
HANDLE_TYPE(TTaskType::CLEAR_TRANSACTION_TASK, _clear_transaction_task_workers, clear_transaction_task_req);
HANDLE_TYPE(TTaskType::CLONE, _clone_workers, clone_req);
HANDLE_TYPE(TTaskType::STORAGE_MEDIUM_MIGRATE, _storage_medium_migrate_workers, storage_medium_migrate_req);
HANDLE_TYPE(TTaskType::CHECK_CONSISTENCY, _check_consistency_workers, check_consistency_req);
HANDLE_TYPE(TTaskType::UPLOAD, _upload_workers, upload_req);
HANDLE_TYPE(TTaskType::DOWNLOAD, _download_workers, download_req);
HANDLE_TYPE(TTaskType::MAKE_SNAPSHOT, _make_snapshot_workers, snapshot_req);
HANDLE_TYPE(TTaskType::RELEASE_SNAPSHOT, _release_snapshot_workers, release_snapshot_req);
HANDLE_TYPE(TTaskType::MOVE, _move_dir_workers, move_dir_req);
HANDLE_TYPE(TTaskType::UPDATE_TABLET_META_INFO, _update_tablet_meta_info_workers,
update_tablet_meta_info_req);
case TTaskType::REALTIME_PUSH:
case TTaskType::PUSH:
if (!task.__isset.push_req) {
ret_st = Status::InvalidArgument(
strings::Substitute("task(signature=$0) has wrong request member", signature));
break;
}
if (task.push_req.push_type == TPushType::LOAD_V2) {
_push_workers->submit_task(task);
} else if (task.push_req.push_type == TPushType::DELETE) {
_delete_workers->submit_task(task);
} else {
ret_st = Status::InvalidArgument(
strings::Substitute("task(signature=$0, type=$1, push_type=$2) has wrong push_type", signature,
task_type, task.push_req.push_type));
}
break;
case TTaskType::ALTER:
if (task.__isset.alter_tablet_req || task.__isset.alter_tablet_req_v2) {
_alter_tablet_workers->submit_task(task);
} else {
ret_st = Status::InvalidArgument(
strings::Substitute("task(signature=$0) has wrong request member", signature));
}
break;
default:
ret_st = Status::InvalidArgument(
strings::Substitute("task(signature=$0, type=$1) has wrong task type", signature, task_type));
break;
}
#undef HANDLE_TYPE
if (!ret_st.ok()) {
LOG(WARNING) << "fail to submit task. reason: " << ret_st.get_error_msg() << ", task: " << task;
// For now, all tasks in the batch share one status, so if any task
// failed to be submitted, we can only return an error to FE (even when some
// tasks have already been successfully submitted).
// However, FE does not check the return status of submit_tasks() currently,
// and it is not certain that FE will retry when something is wrong, so here we
// only print a warning log and go on (i.e. do not break the current loop),
// to ensure every task can be submitted once. It is OK for now, because
// ret_st can be an error only when it encounters a wrong task_type or
// req member in TAgentTaskRequest, which is basically impossible.
// TODO(lingbin): check the logic in FE again later.
}
}
ret_st.to_thrift(&agent_result.status);
}
void AgentServer::make_snapshot(TAgentResult& t_agent_result, const TSnapshotRequest& snapshot_request) {
string snapshot_path;
auto st = SnapshotManager::instance()->make_snapshot(snapshot_request, &snapshot_path);
if (!st.ok()) {
LOG(WARNING) << "fail to make_snapshot. tablet_id=" << snapshot_request.tablet_id
<< ", schema_hash=" << snapshot_request.schema_hash << ", msg=" << st.to_string();
} else {
LOG(INFO) << "success to make_snapshot. tablet_id=" << snapshot_request.tablet_id
<< ", schema_hash=" << snapshot_request.schema_hash << ", snapshot_path: " << snapshot_path;
t_agent_result.__set_snapshot_path(snapshot_path);
}
st.to_thrift(&t_agent_result.status);
t_agent_result.__set_snapshot_format(snapshot_request.preferred_snapshot_format);
t_agent_result.__set_allow_incremental_clone(true);
}
void AgentServer::release_snapshot(TAgentResult& t_agent_result, const std::string& snapshot_path) {
Status ret_st;
OLAPStatus err_code = SnapshotManager::instance()->release_snapshot(snapshot_path);
if (err_code != OLAP_SUCCESS) {
LOG(WARNING) << "failt to release_snapshot. snapshot_path: " << snapshot_path << ", err_code: " << err_code;
ret_st = Status::RuntimeError(strings::Substitute("fail to release_snapshot. err_code=$0", err_code));
} else {
LOG(INFO) << "success to release_snapshot. snapshot_path=" << snapshot_path << ", err_code=" << err_code;
}
ret_st.to_thrift(&t_agent_result.status);
}
void AgentServer::publish_cluster_state(TAgentResult& t_agent_result, const TAgentPublishRequest& request) {
Status status = Status::NotSupported("deprecated method(publish_cluster_state) was invoked");
status.to_thrift(&t_agent_result.status);
}
} // namespace starrocks

89
be/src/agent/agent_server.h Normal file

@ -0,0 +1,89 @@
// This file is made available under Elastic License 2.0.
// This file is based on code available under the Apache license here:
// https://github.com/apache/incubator-doris/blob/master/be/src/agent/agent_server.h
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#ifndef STARROCKS_BE_SRC_AGENT_AGENT_SERVER_H
#define STARROCKS_BE_SRC_AGENT_AGENT_SERVER_H
#include <memory>
#include <string>
#include <vector>
#include "gen_cpp/AgentService_types.h"
#include "runtime/exec_env.h"
namespace starrocks {
class TaskWorkerPool;
// Each method corresponds to one RPC from FE Master, see BackendService.
class AgentServer {
public:
explicit AgentServer(ExecEnv* exec_env, const TMasterInfo& master_info);
~AgentServer();
// Receive agent task from FE master
void submit_tasks(TAgentResult& agent_result, const std::vector<TAgentTaskRequest>& tasks);
// TODO(lingbin): make agent_result a pointer, because it will be modified.
void make_snapshot(TAgentResult& agent_result, const TSnapshotRequest& snapshot_request);
void release_snapshot(TAgentResult& agent_result, const std::string& snapshot_path);
// Deprecated
// TODO(lingbin): This method is deprecated, should be removed later.
void publish_cluster_state(TAgentResult& agent_result, const TAgentPublishRequest& request);
private:
DISALLOW_COPY_AND_ASSIGN(AgentServer);
// Not Owned
ExecEnv* _exec_env;
// Reference to the ExecEnv::_master_info
const TMasterInfo& _master_info;
std::unique_ptr<TaskWorkerPool> _create_tablet_workers;
std::unique_ptr<TaskWorkerPool> _drop_tablet_workers;
std::unique_ptr<TaskWorkerPool> _push_workers;
std::unique_ptr<TaskWorkerPool> _publish_version_workers;
std::unique_ptr<TaskWorkerPool> _clear_transaction_task_workers;
std::unique_ptr<TaskWorkerPool> _delete_workers;
std::unique_ptr<TaskWorkerPool> _alter_tablet_workers;
std::unique_ptr<TaskWorkerPool> _clone_workers;
std::unique_ptr<TaskWorkerPool> _storage_medium_migrate_workers;
std::unique_ptr<TaskWorkerPool> _check_consistency_workers;
// These 3 worker-pools do not accept tasks from the FE.
// They are triggered periodically and report to the FE master.
std::unique_ptr<TaskWorkerPool> _report_task_workers;
std::unique_ptr<TaskWorkerPool> _report_disk_state_workers;
std::unique_ptr<TaskWorkerPool> _report_tablet_workers;
std::unique_ptr<TaskWorkerPool> _upload_workers;
std::unique_ptr<TaskWorkerPool> _download_workers;
std::unique_ptr<TaskWorkerPool> _make_snapshot_workers;
std::unique_ptr<TaskWorkerPool> _release_snapshot_workers;
std::unique_ptr<TaskWorkerPool> _move_dir_workers;
std::unique_ptr<TaskWorkerPool> _recover_tablet_workers;
std::unique_ptr<TaskWorkerPool> _update_tablet_meta_info_workers;
};
} // end namespace starrocks
#endif // STARROCKS_BE_SRC_AGENT_AGENT_SERVER_H

180
be/src/agent/heartbeat_server.cpp Normal file
View File

@ -0,0 +1,180 @@
// This file is made available under Elastic License 2.0.
// This file is based on code available under the Apache license here:
// https://github.com/apache/incubator-doris/blob/master/be/src/agent/heartbeat_server.cpp
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#include "agent/heartbeat_server.h"
#include <thrift/TProcessor.h>
#include <ctime>
#include <fstream>
#include "common/status.h"
#include "gen_cpp/HeartbeatService.h"
#include "gen_cpp/Status_types.h"
#include "runtime/heartbeat_flags.h"
#include "service/backend_options.h"
#include "storage/storage_engine.h"
#include "storage/utils.h"
#include "util/debug_util.h"
#include "util/thrift_server.h"
using std::fstream;
using std::nothrow;
using std::string;
using std::vector;
using apache::thrift::transport::TProcessor;
namespace starrocks {
HeartbeatServer::HeartbeatServer(TMasterInfo* master_info) : _master_info(master_info), _epoch(0) {
_olap_engine = StorageEngine::instance();
}
void HeartbeatServer::init_cluster_id() {
_master_info->cluster_id = _olap_engine->effective_cluster_id();
}
void HeartbeatServer::heartbeat(THeartbeatResult& heartbeat_result, const TMasterInfo& master_info) {
// Print the heartbeat log only for every 12th heartbeat (roughly once per minute)
LOG_EVERY_N(INFO, 12) << "get heartbeat from FE."
<< "host:" << master_info.network_address.hostname
<< ", port:" << master_info.network_address.port << ", cluster id:" << master_info.cluster_id
<< ", counter:" << google::COUNTER;
// do heartbeat
Status st = _heartbeat(master_info);
st.to_thrift(&heartbeat_result.status);
if (st.ok()) {
heartbeat_result.backend_info.__set_be_port(config::be_port);
heartbeat_result.backend_info.__set_http_port(config::webserver_port);
heartbeat_result.backend_info.__set_be_rpc_port(-1);
heartbeat_result.backend_info.__set_brpc_port(config::brpc_port);
heartbeat_result.backend_info.__set_version(get_short_version());
}
}
Status HeartbeatServer::_heartbeat(const TMasterInfo& master_info) {
std::lock_guard<std::mutex> lk(_hb_mtx);
if (master_info.__isset.backend_ip) {
if (master_info.backend_ip != BackendOptions::get_localhost()) {
LOG(WARNING) << "backend ip saved in master does not equal to backend local ip" << master_info.backend_ip
<< " vs. " << BackendOptions::get_localhost();
std::stringstream ss;
ss << "actual backend local ip: " << BackendOptions::get_localhost();
return Status::InternalError(ss.str());
}
}
// Check cluster id
if (_master_info->cluster_id == -1) {
LOG(INFO) << "get first heartbeat. update cluster id";
// write and update cluster id
auto st = _olap_engine->set_cluster_id(master_info.cluster_id);
if (!st.ok()) {
LOG(WARNING) << "fail to set cluster id. status=" << st.get_error_msg();
return Status::InternalError("fail to set cluster id.");
} else {
_master_info->cluster_id = master_info.cluster_id;
LOG(INFO) << "record cluster id. host: " << master_info.network_address.hostname
<< ". port: " << master_info.network_address.port << ". cluster id: " << master_info.cluster_id;
}
} else {
if (_master_info->cluster_id != master_info.cluster_id) {
LOG(WARNING) << "ignore invalid cluster id: " << master_info.cluster_id;
return Status::InternalError("invalid cluster id. ignore.");
}
}
bool need_report = false;
if (_master_info->network_address.hostname != master_info.network_address.hostname ||
_master_info->network_address.port != master_info.network_address.port) {
if (master_info.epoch > _epoch) {
_master_info->network_address.hostname = master_info.network_address.hostname;
_master_info->network_address.port = master_info.network_address.port;
_epoch = master_info.epoch;
need_report = true;
LOG(INFO) << "master change. new master host: " << _master_info->network_address.hostname
<< ". port: " << _master_info->network_address.port << ". epoch: " << _epoch;
} else {
LOG(WARNING) << "epoch is not greater than local. ignore heartbeat. host: "
<< _master_info->network_address.hostname << " port: " << _master_info->network_address.port
<< " local epoch: " << _epoch << " received epoch: " << master_info.epoch;
return Status::InternalError("epoch is not greater than local. ignore heartbeat.");
}
} else {
// When the master FE restarts, the host and port remain the same, but the epoch is increased.
if (master_info.epoch > _epoch) {
_epoch = master_info.epoch;
need_report = true;
LOG(INFO) << "master restarted. epoch: " << _epoch;
}
}
if (master_info.__isset.token) {
if (!_master_info->__isset.token) {
_master_info->__set_token(master_info.token);
LOG(INFO) << "get token. token: " << _master_info->token;
} else if (_master_info->token != master_info.token) {
LOG(WARNING) << "invalid token. local_token:" << _master_info->token << ". token:" << master_info.token;
return Status::InternalError("invalid token.");
}
}
if (master_info.__isset.http_port) {
_master_info->__set_http_port(master_info.http_port);
}
if (master_info.__isset.heartbeat_flags) {
HeartbeatFlags* heartbeat_flags = ExecEnv::GetInstance()->heartbeat_flags();
heartbeat_flags->update(master_info.heartbeat_flags);
}
if (master_info.__isset.backend_id) {
_master_info->__set_backend_id(master_info.backend_id);
}
if (need_report) {
LOG(INFO) << "Master FE is changed or restarted. report tablet and disk info immediately";
_olap_engine->trigger_report();
}
return Status::OK();
}
AgentStatus create_heartbeat_server(ExecEnv* exec_env, uint32_t server_port, ThriftServer** thrift_server,
uint32_t worker_thread_num, TMasterInfo* local_master_info) {
HeartbeatServer* heartbeat_server = new (nothrow) HeartbeatServer(local_master_info);
if (heartbeat_server == NULL) {
return STARROCKS_ERROR;
}
heartbeat_server->init_cluster_id();
std::shared_ptr<HeartbeatServer> handler(heartbeat_server);
std::shared_ptr<TProcessor> server_processor(new HeartbeatServiceProcessor(handler));
string server_name("heartbeat");
*thrift_server =
new ThriftServer(server_name, server_processor, server_port, exec_env->metrics(), worker_thread_num);
return STARROCKS_SUCCESS;
}
} // namespace starrocks
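_heartbeat() above accepts a heartbeat from a different host/port only when the incoming epoch is strictly greater than the locally recorded one, and treats a larger epoch from the same host/port as a master restart. Here is a standalone sketch of that decision logic with simplified stand-in types (not the real TMasterInfo).

#include <cstdint>
#include <iostream>
#include <string>

// Illustrative state kept by the backend (not the real TMasterInfo).
struct SketchMaster {
    std::string host;
    int port = 0;
    int64_t epoch = 0;
};

// Accept a heartbeat only if (a) host/port are unchanged, or (b) the new
// epoch is strictly greater, which is how _heartbeat() distinguishes a real
// master change (or restart) from a stale heartbeat.
bool accept_heartbeat(SketchMaster& local, const SketchMaster& incoming, bool* need_report) {
    *need_report = false;
    if (local.host != incoming.host || local.port != incoming.port) {
        if (incoming.epoch <= local.epoch) {
            return false; // stale heartbeat from an old master, ignore it
        }
        local = incoming;    // record the new master
        *need_report = true; // trigger an immediate tablet/disk report
        return true;
    }
    if (incoming.epoch > local.epoch) {
        local.epoch = incoming.epoch; // same master restarted
        *need_report = true;
    }
    return true;
}

int main() {
    SketchMaster local{"fe1", 9020, 3};
    bool report = false;
    std::cout << accept_heartbeat(local, {"fe2", 9020, 2}, &report) << std::endl; // 0: stale epoch
    std::cout << accept_heartbeat(local, {"fe2", 9020, 4}, &report) << std::endl; // 1: master changed
    return 0;
}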

74
be/src/agent/heartbeat_server.h Normal file
View File

@ -0,0 +1,74 @@
// This file is made available under Elastic License 2.0.
// This file is based on code available under the Apache license here:
// https://github.com/apache/incubator-doris/blob/master/be/src/agent/heartbeat_server.h
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#ifndef STARROCKS_BE_SRC_AGENT_HEARTBEAT_SERVER_H
#define STARROCKS_BE_SRC_AGENT_HEARTBEAT_SERVER_H
#include <mutex>
#include "agent/status.h"
#include "gen_cpp/HeartbeatService.h"
#include "gen_cpp/Status_types.h"
#include "runtime/exec_env.h"
#include "storage/olap_define.h"
#include "thrift/transport/TTransportUtils.h"
namespace starrocks {
const uint32_t HEARTBEAT_INTERVAL = 10;
class StorageEngine;
class Status;
class ThriftServer;
class HeartbeatServer : public HeartbeatServiceIf {
public:
explicit HeartbeatServer(TMasterInfo* master_info);
virtual ~HeartbeatServer(){};
virtual void init_cluster_id();
// The master sends heartbeats to this server
//
// Input parameters:
// * master_info: The struct of master info, contains the host ip and port
//
// Output parameters:
// * heartbeat_result: The result of handling the heartbeat
virtual void heartbeat(THeartbeatResult& heartbeat_result, const TMasterInfo& master_info);
private:
Status _heartbeat(const TMasterInfo& master_info);
StorageEngine* _olap_engine;
// mutex to protect master_info and _epoch
std::mutex _hb_mtx;
// Not owned. Points to ExecEnv::_master_info
TMasterInfo* _master_info;
int64_t _epoch;
DISALLOW_COPY_AND_ASSIGN(HeartbeatServer);
}; // class HeartbeatServer
AgentStatus create_heartbeat_server(ExecEnv* exec_env, uint32_t heartbeat_server_port, ThriftServer** heart_beat_server,
uint32_t worker_thread_num, TMasterInfo* local_master_info);
} // namespace starrocks
#endif // STARROCKS_BE_SRC_AGENT_HEARTBEAT_SERVER_H

51
be/src/agent/status.h Normal file
View File

@ -0,0 +1,51 @@
// This file is made available under Elastic License 2.0.
// This file is based on code available under the Apache license here:
// https://github.com/apache/incubator-doris/blob/master/be/src/agent/status.h
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#ifndef STARROCKS_BE_SRC_AGENT_STATUS_H
#define STARROCKS_BE_SRC_AGENT_STATUS_H
namespace starrocks {
enum AgentStatus {
STARROCKS_SUCCESS = 0,
STARROCKS_ERROR = -1,
STARROCKS_TASK_REQUEST_ERROR = -101,
STARROCKS_FILE_DOWNLOAD_INVALID_PARAM = -201,
STARROCKS_FILE_DOWNLOAD_INSTALL_OPT_FAILED = -202,
STARROCKS_FILE_DOWNLOAD_CURL_INIT_FAILED = -203,
STARROCKS_FILE_DOWNLOAD_FAILED = -204,
STARROCKS_FILE_DOWNLOAD_GET_LENGTH_FAILED = -205,
STARROCKS_FILE_DOWNLOAD_NOT_EXIST = -206,
STARROCKS_FILE_DOWNLOAD_LIST_DIR_FAIL = -207,
STARROCKS_CREATE_TABLE_EXIST = -301,
STARROCKS_CREATE_TABLE_DIFF_SCHEMA_EXIST = -302,
STARROCKS_CREATE_TABLE_NOT_EXIST = -303,
STARROCKS_DROP_TABLE_NOT_EXIST = -401,
STARROCKS_PUSH_INVALID_TABLE = -501,
STARROCKS_PUSH_INVALID_VERSION = -502,
STARROCKS_PUSH_TIME_OUT = -503,
STARROCKS_PUSH_HAD_LOADED = -504,
STARROCKS_TIMEOUT = -901,
STARROCKS_INTERNAL_ERROR = -902,
STARROCKS_DISK_REACH_CAPACITY_LIMIT = -903,
};
} // namespace starrocks
#endif // STARROCKS_BE_SRC_AGENT_STATUS_H

File diff suppressed because it is too large

148
be/src/agent/task_worker_pool.h Normal file
View File

@ -0,0 +1,148 @@
// This file is made available under Elastic License 2.0.
// This file is based on code available under the Apache license here:
// https://github.com/apache/incubator-doris/blob/master/be/src/agent/task_worker_pool.h
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#ifndef STARROCKS_BE_SRC_TASK_WORKER_POOL_H
#define STARROCKS_BE_SRC_TASK_WORKER_POOL_H
#include <atomic>
#include <condition_variable>
#include <deque>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>
#include "agent/status.h"
#include "agent/utils.h"
#include "gen_cpp/AgentService_types.h"
#include "gen_cpp/HeartbeatService_types.h"
#include "storage/olap_define.h"
#include "storage/storage_engine.h"
namespace starrocks {
class ExecEnv;
class TaskWorkerPool {
public:
enum TaskWorkerType {
CREATE_TABLE,
DROP_TABLE,
PUSH,
REALTIME_PUSH,
PUBLISH_VERSION,
// Deprecated
CLEAR_ALTER_TASK,
CLEAR_TRANSACTION_TASK,
DELETE,
ALTER_TABLE,
// Deprecated
QUERY_SPLIT_KEY,
CLONE,
STORAGE_MEDIUM_MIGRATE,
CHECK_CONSISTENCY,
REPORT_TASK,
REPORT_DISK_STATE,
REPORT_OLAP_TABLE,
UPLOAD,
DOWNLOAD,
MAKE_SNAPSHOT,
RELEASE_SNAPSHOT,
MOVE,
RECOVER_TABLET,
UPDATE_TABLET_META_INFO
};
typedef void* (*CALLBACK_FUNCTION)(void*);
TaskWorkerPool(const TaskWorkerType task_worker_type, ExecEnv* env, const TMasterInfo& master_info);
virtual ~TaskWorkerPool();
// Start the task worker callback thread
virtual void start();
// Submit task to task pool
//
// Input parameters:
// * task: the task need callback thread to do
virtual void submit_task(const TAgentTaskRequest& task);
private:
bool _register_task_info(const TTaskType::type task_type, int64_t signature);
void _remove_task_info(const TTaskType::type task_type, int64_t signature);
void _spawn_callback_worker_thread(CALLBACK_FUNCTION callback_func);
void _finish_task(const TFinishTaskRequest& finish_task_request);
uint32_t _get_next_task_index(int32_t thread_count, std::deque<TAgentTaskRequest>& tasks, TPriority::type priority);
static void* _create_tablet_worker_thread_callback(void* arg_this);
static void* _drop_tablet_worker_thread_callback(void* arg_this);
static void* _push_worker_thread_callback(void* arg_this);
static void* _publish_version_worker_thread_callback(void* arg_this);
static void* _clear_transaction_task_worker_thread_callback(void* arg_this);
static void* _alter_tablet_worker_thread_callback(void* arg_this);
static void* _clone_worker_thread_callback(void* arg_this);
static void* _storage_medium_migrate_worker_thread_callback(void* arg_this);
static void* _check_consistency_worker_thread_callback(void* arg_this);
static void* _report_task_worker_thread_callback(void* arg_this);
static void* _report_disk_state_worker_thread_callback(void* arg_this);
static void* _report_tablet_worker_thread_callback(void* arg_this);
static void* _upload_worker_thread_callback(void* arg_this);
static void* _download_worker_thread_callback(void* arg_this);
static void* _make_snapshot_thread_callback(void* arg_this);
static void* _release_snapshot_thread_callback(void* arg_this);
static void* _move_dir_thread_callback(void* arg_this);
static void* _update_tablet_meta_worker_thread_callback(void* arg_this);
void _alter_tablet(TaskWorkerPool* worker_pool_this, const TAgentTaskRequest& alter_tablet_request,
int64_t signature, const TTaskType::type task_type, TFinishTaskRequest* finish_task_request);
AgentStatus _get_tablet_info(const TTabletId tablet_id, const TSchemaHash schema_hash, int64_t signature,
TTabletInfo* tablet_info);
AgentStatus _move_dir(const TTabletId tablet_id, const TSchemaHash schema_hash, const std::string& src,
int64_t job_id, bool overwrite, std::vector<std::string>* error_msgs);
// Reference to the ExecEnv::_master_info
const TMasterInfo& _master_info;
TBackend _backend;
std::unique_ptr<AgentUtils> _agent_utils;
std::unique_ptr<MasterServerClient> _master_client;
ExecEnv* _env;
// Protect task queue
std::mutex _worker_thread_lock;
std::condition_variable* _worker_thread_condition_variable;
std::deque<TAgentTaskRequest> _tasks;
uint32_t _worker_count = 0;
TaskWorkerType _task_worker_type;
CALLBACK_FUNCTION _callback_function;
static FrontendServiceClientCache _master_service_client_cache;
static std::atomic_ulong _s_report_version;
static std::mutex _s_task_signatures_lock;
static std::map<TTaskType::type, std::set<int64_t>> _s_task_signatures;
DISALLOW_COPY_AND_ASSIGN(TaskWorkerPool);
}; // class TaskWorkerPool
} // namespace starrocks
#endif // STARROCKS_BE_SRC_TASK_WORKER_POOL_H
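TaskWorkerPool keeps a mutex-guarded deque of TAgentTaskRequest and a set of callback threads woken through a condition variable. The toy pool below shows that producer/consumer shape using only standard C++; signature deduplication, task priorities, and the static report-version counter of the real class are omitted, and all names here are illustrative.

#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

class SketchWorkerPool {
public:
    explicit SketchWorkerPool(int worker_count) {
        for (int i = 0; i < worker_count; ++i) {
            _threads.emplace_back([this] { _worker_loop(); });
        }
    }

    ~SketchWorkerPool() {
        {
            std::lock_guard<std::mutex> lk(_mtx);
            _stopped = true;
        }
        _cv.notify_all();
        for (auto& t : _threads) t.join();
    }

    // Push a task into the guarded deque and wake one worker,
    // the same shape as TaskWorkerPool::submit_task().
    void submit_task(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lk(_mtx);
            _tasks.push_back(std::move(task));
        }
        _cv.notify_one();
    }

private:
    void _worker_loop() {
        while (true) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lk(_mtx);
                _cv.wait(lk, [this] { return _stopped || !_tasks.empty(); });
                if (_stopped && _tasks.empty()) return;
                task = std::move(_tasks.front());
                _tasks.pop_front();
            }
            task(); // run outside the lock
        }
    }

    std::mutex _mtx;
    std::condition_variable _cv;
    std::deque<std::function<void()>> _tasks;
    std::vector<std::thread> _threads;
    bool _stopped = false;
};

int main() {
    SketchWorkerPool pool(2);
    for (int i = 0; i < 4; ++i) {
        pool.submit_task([i] { std::cout << "task " << i << " done\n"; });
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    return 0;
}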

277
be/src/agent/utils.cpp Normal file
View File

@ -0,0 +1,277 @@
// This file is made available under Elastic License 2.0.
// This file is based on code available under the Apache license here:
// https://github.com/apache/incubator-doris/blob/master/be/src/agent/utils.cpp
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#include "agent/utils.h"
#include <rapidjson/document.h>
#include <rapidjson/rapidjson.h>
#include <rapidjson/stringbuffer.h>
#include <rapidjson/writer.h>
#include <cstdio>
#include <fstream>
#include <sstream>
#include "common/status.h"
using std::map;
using std::string;
using std::stringstream;
using std::vector;
using apache::thrift::TException;
using apache::thrift::transport::TTransportException;
namespace starrocks {
MasterServerClient::MasterServerClient(const TMasterInfo& master_info, FrontendServiceClientCache* client_cache)
: _master_info(master_info), _client_cache(client_cache) {}
AgentStatus MasterServerClient::finish_task(const TFinishTaskRequest& request, TMasterResult* result) {
Status client_status;
FrontendServiceConnection client(_client_cache, _master_info.network_address, config::thrift_rpc_timeout_ms,
&client_status);
if (!client_status.ok()) {
LOG(WARNING) << "Fail to get master client from cache. "
<< "host=" << _master_info.network_address.hostname
<< ", port=" << _master_info.network_address.port << ", code=" << client_status.code();
return STARROCKS_ERROR;
}
try {
try {
client->finishTask(*result, request);
} catch (TTransportException& e) {
LOG(WARNING) << "master client, retry finishTask: " << e.what();
client_status = client.reopen(config::thrift_rpc_timeout_ms);
if (!client_status.ok()) {
LOG(WARNING) << "Fail to get master client from cache. "
<< "host=" << _master_info.network_address.hostname
<< ", port=" << _master_info.network_address.port << ", code=" << client_status.code();
return STARROCKS_ERROR;
}
client->finishTask(*result, request);
}
} catch (TException& e) {
client.reopen(config::thrift_rpc_timeout_ms);
LOG(WARNING) << "Fail to finish_task. "
<< "host=" << _master_info.network_address.hostname
<< ", port=" << _master_info.network_address.port << ", error=" << e.what();
return STARROCKS_ERROR;
}
return STARROCKS_SUCCESS;
}
AgentStatus MasterServerClient::report(const TReportRequest& request, TMasterResult* result) {
Status client_status;
FrontendServiceConnection client(_client_cache, _master_info.network_address, config::thrift_rpc_timeout_ms,
&client_status);
if (!client_status.ok()) {
LOG(WARNING) << "Fail to get master client from cache. "
<< "host=" << _master_info.network_address.hostname
<< " port=" << _master_info.network_address.port << " code=" << client_status.code();
return STARROCKS_ERROR;
}
try {
try {
client->report(*result, request);
} catch (TTransportException& e) {
TTransportException::TTransportExceptionType type = e.getType();
if (type != TTransportException::TTransportExceptionType::TIMED_OUT) {
// if not TIMED_OUT, retry
LOG(WARNING) << "master client, retry finishTask: " << e.what();
client_status = client.reopen(config::thrift_rpc_timeout_ms);
if (!client_status.ok()) {
LOG(WARNING) << "Fail to get master client from cache. "
<< "host=" << _master_info.network_address.hostname
<< ", port=" << _master_info.network_address.port << ", code=" << client_status.code();
return STARROCKS_ERROR;
}
client->report(*result, request);
} else {
// TIMED_OUT exception. do not retry
// actually we don't care what FE returns.
LOG(WARNING) << "Fail to report to master: " << e.what();
return STARROCKS_ERROR;
}
}
} catch (TException& e) {
client.reopen(config::thrift_rpc_timeout_ms);
LOG(WARNING) << "Fail to report to master. "
<< "host=" << _master_info.network_address.hostname
<< ", port=" << _master_info.network_address.port << ", code=" << client_status.code();
return STARROCKS_ERROR;
}
return STARROCKS_SUCCESS;
}
AgentStatus AgentUtils::rsync_from_remote(const string& remote_host, const string& remote_file_path,
const string& local_file_path,
const std::vector<string>& exclude_file_patterns,
uint32_t transport_speed_limit_kbps, uint32_t timeout_second) {
int ret_code = 0;
std::stringstream cmd_stream;
cmd_stream << "rsync -r -q -e \"ssh -o StrictHostKeyChecking=no\"";
for (auto exclude_file_pattern : exclude_file_patterns) {
cmd_stream << " --exclude=" << exclude_file_pattern;
}
if (transport_speed_limit_kbps != 0) {
cmd_stream << " --bwlimit=" << transport_speed_limit_kbps;
}
if (timeout_second != 0) {
cmd_stream << " --timeout=" << timeout_second;
}
cmd_stream << " " << remote_host << ":" << remote_file_path << " " << local_file_path;
LOG(INFO) << "rsync cmd: " << cmd_stream.str();
FILE* fp = NULL;
fp = popen(cmd_stream.str().c_str(), "r");
if (fp == NULL) {
return STARROCKS_ERROR;
}
ret_code = pclose(fp);
if (ret_code != 0) {
return STARROCKS_ERROR;
}
return STARROCKS_SUCCESS;
}
std::string AgentUtils::print_agent_status(AgentStatus status) {
switch (status) {
case STARROCKS_SUCCESS:
return "STARROCKS_SUCCESS";
case STARROCKS_ERROR:
return "STARROCKS_ERROR";
case STARROCKS_TASK_REQUEST_ERROR:
return "STARROCKS_TASK_REQUEST_ERROR";
case STARROCKS_FILE_DOWNLOAD_INVALID_PARAM:
return "STARROCKS_FILE_DOWNLOAD_INVALID_PARAM";
case STARROCKS_FILE_DOWNLOAD_INSTALL_OPT_FAILED:
return "STARROCKS_FILE_DOWNLOAD_INSTALL_OPT_FAILED";
case STARROCKS_FILE_DOWNLOAD_CURL_INIT_FAILED:
return "STARROCKS_FILE_DOWNLOAD_CURL_INIT_FAILED";
case STARROCKS_FILE_DOWNLOAD_FAILED:
return "STARROCKS_FILE_DOWNLOAD_FAILED";
case STARROCKS_FILE_DOWNLOAD_GET_LENGTH_FAILED:
return "STARROCKS_FILE_DOWNLOAD_GET_LENGTH_FAILED";
case STARROCKS_FILE_DOWNLOAD_NOT_EXIST:
return "STARROCKS_FILE_DOWNLOAD_NOT_EXIST";
case STARROCKS_FILE_DOWNLOAD_LIST_DIR_FAIL:
return "STARROCKS_FILE_DOWNLOAD_LIST_DIR_FAIL";
case STARROCKS_CREATE_TABLE_EXIST:
return "STARROCKS_CREATE_TABLE_EXIST";
case STARROCKS_CREATE_TABLE_DIFF_SCHEMA_EXIST:
return "STARROCKS_CREATE_TABLE_DIFF_SCHEMA_EXIST";
case STARROCKS_CREATE_TABLE_NOT_EXIST:
return "STARROCKS_CREATE_TABLE_NOT_EXIST";
case STARROCKS_DROP_TABLE_NOT_EXIST:
return "STARROCKS_DROP_TABLE_NOT_EXIST";
case STARROCKS_PUSH_INVALID_TABLE:
return "STARROCKS_PUSH_INVALID_TABLE";
case STARROCKS_PUSH_INVALID_VERSION:
return "STARROCKS_PUSH_INVALID_VERSION";
case STARROCKS_PUSH_TIME_OUT:
return "STARROCKS_PUSH_TIME_OUT";
case STARROCKS_PUSH_HAD_LOADED:
return "STARROCKS_PUSH_HAD_LOADED";
case STARROCKS_TIMEOUT:
return "STARROCKS_TIMEOUT";
case STARROCKS_INTERNAL_ERROR:
return "STARROCKS_INTERNAL_ERROR";
default:
return "UNKNOWM";
}
}
bool AgentUtils::exec_cmd(const string& command, string* errmsg) {
// The exit status of the command.
uint32_t rc = 0;
// Redirect stderr to stdout to get error message.
string cmd = command + " 2>&1";
// Execute command.
FILE* fp = popen(cmd.c_str(), "r");
if (fp == NULL) {
std::stringstream err_stream;
err_stream << "popen failed. " << strerror(errno) << ", with errno: " << errno << ".\n";
*errmsg = err_stream.str();
return false;
}
// Get command output.
char result[1024] = {'\0'};
while (fgets(result, sizeof(result), fp) != NULL) {
*errmsg += result;
}
// Waits for the associated process to terminate and returns.
rc = pclose(fp);
if (rc == -1) {
if (errno == ECHILD) {
*errmsg += "pclose cannot obtain the child status.\n";
} else {
std::stringstream err_stream;
err_stream << "Close popen failed. " << strerror(errno) << ", with errno: " << errno << "\n";
*errmsg += err_stream.str();
}
return false;
}
// Get return code of command.
int32_t status_child = WEXITSTATUS(rc);
if (status_child == 0) {
return true;
} else {
return false;
}
}
bool AgentUtils::write_json_to_file(const map<string, string>& info, const string& path) {
rapidjson::Document json_info(rapidjson::kObjectType);
for (auto& it : info) {
json_info.AddMember(rapidjson::Value(it.first.c_str(), json_info.GetAllocator()).Move(),
rapidjson::Value(it.second.c_str(), json_info.GetAllocator()).Move(),
json_info.GetAllocator());
}
rapidjson::StringBuffer json_info_str;
rapidjson::Writer<rapidjson::StringBuffer> writer(json_info_str);
json_info.Accept(writer);
std::ofstream fp(path);
if (!fp) {
return false;
}
fp << json_info_str.GetString() << std::endl;
fp.close();
return true;
}
} // namespace starrocks
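MasterServerClient::report() above retries a transport error once after reopening the connection, but gives up immediately on TIMED_OUT. The standalone sketch below shows that retry policy with a placeholder exception type; it is only an illustration of the control flow, not the thrift client itself.

#include <iostream>
#include <stdexcept>
#include <string>

// Illustrative error type; the real code distinguishes thrift's
// TTransportException TIMED_OUT from other transport errors.
struct SketchTimeout : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Call `rpc` once and retry a single time on any failure except a timeout,
// mirroring the policy in report().
template <typename Fn>
bool call_with_one_retry(Fn rpc) {
    try {
        rpc();
        return true;
    } catch (const SketchTimeout& e) {
        std::cout << "timed out, not retrying: " << e.what() << "\n";
        return false;
    } catch (const std::exception& e) {
        std::cout << "retrying after: " << e.what() << "\n";
        try {
            rpc();
            return true;
        } catch (const std::exception& e2) {
            std::cout << "retry failed: " << e2.what() << "\n";
            return false;
        }
    }
}

int main() {
    int calls = 0;
    bool ok = call_with_one_retry([&] {
        if (++calls == 1) throw std::runtime_error("connection reset");
    });
    std::cout << "ok=" << ok << ", calls=" << calls << "\n"; // ok=1, calls=2
    return 0;
}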

97
be/src/agent/utils.h Normal file
View File

@ -0,0 +1,97 @@
// This file is made available under Elastic License 2.0.
// This file is based on code available under the Apache license here:
// https://github.com/apache/incubator-doris/blob/master/be/src/agent/utils.h
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#ifndef STARROCKS_BE_SRC_AGENT_UTILS_H
#define STARROCKS_BE_SRC_AGENT_UTILS_H
#include "agent/status.h"
#include "gen_cpp/FrontendService.h"
#include "gen_cpp/FrontendService_types.h"
#include "gen_cpp/HeartbeatService_types.h"
#include "runtime/client_cache.h"
namespace starrocks {
class MasterServerClient {
public:
MasterServerClient(const TMasterInfo& master_info, FrontendServiceClientCache* client_cache);
virtual ~MasterServerClient(){};
// Report the finished task to the master server
//
// Input parameters:
// * request: The information of the finished task
//
// Output parameters:
// * result: The result of reporting the task
virtual AgentStatus finish_task(const TFinishTaskRequest& request, TMasterResult* result);
// Report tasks/olap tablet/disk state to the master server
//
// Input parameters:
// * request: The information to report
//
// Output parameters:
// * result: The result of report task
virtual AgentStatus report(const TReportRequest& request, TMasterResult* result);
private:
DISALLOW_COPY_AND_ASSIGN(MasterServerClient);
// Not owned. Reference to the ExecEnv::_master_info
const TMasterInfo& _master_info;
FrontendServiceClientCache* _client_cache;
};
class AgentUtils {
public:
AgentUtils(){};
virtual ~AgentUtils(){};
// Use rsync to synchronize a folder from the remote agent to a local folder
//
// Input parameters:
// * remote_host: the host of remote server
// * remote_file_path: remote file folder path
// * local_file_path: local file folder path
// * exclude_file_patterns: patterns of files to exclude
// * transport_speed_limit_kbps: speed limit of the transport (KB/s)
// * timeout_second: timeout of the synchronization, in seconds
virtual AgentStatus rsync_from_remote(const std::string& remote_host, const std::string& remote_file_path,
const std::string& local_file_path,
const std::vector<std::string>& exclude_file_patterns,
const uint32_t transport_speed_limit_kbps, const uint32_t timeout_second);
// Print AgentStatus as string
virtual std::string print_agent_status(AgentStatus status);
// Execute shell cmd
virtual bool exec_cmd(const std::string& command, std::string* errmsg);
// Write a map to file by json format
virtual bool write_json_to_file(const std::map<std::string, std::string>& info, const std::string& path);
private:
DISALLOW_COPY_AND_ASSIGN(AgentUtils);
}; // class AgentUtils
} // namespace starrocks
#endif // STARROCKS_BE_SRC_AGENT_UTILS_H
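A short usage sketch for two of the helpers declared above, built against this header. The shell command, output path, and map values are placeholders, not anything prescribed by the codebase.

#include <iostream>
#include <map>
#include <string>

#include "agent/utils.h" // declares AgentUtils (see above)

int main() {
    starrocks::AgentUtils utils;

    // exec_cmd() runs a shell command and collects stdout+stderr into errmsg.
    std::string output;
    bool ok = utils.exec_cmd("echo hello", &output);
    std::cout << "ok=" << ok << ", output=" << output;

    // write_json_to_file() dumps a string map as a JSON object.
    std::map<std::string, std::string> info = {{"cluster_id", "123"}, {"token", "abc"}};
    utils.write_json_to_file(info, "/tmp/agent_info.json");
    return 0;
}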

38
be/src/codegen/starrocks_ir.h Normal file
View File

@ -0,0 +1,38 @@
// This file is made available under Elastic License 2.0.
// This file is based on code available under the Apache license here:
// https://github.com/apache/incubator-doris/blob/master/be/src/codegen/doris_ir.h
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#ifndef STARROCKS_BE_SRC_QUERY_CODGEN_STARROCKS_IR_H
#define STARROCKS_BE_SRC_QUERY_CODGEN_STARROCKS_IR_H
#ifdef IR_COMPILE
// For cross compiling to IR, we need functions decorated in specific ways. For
// functions that we will replace with codegen, we need them not inlined (otherwise
// we can't find the function by name). For functions where the non-codegen'd version
// is too long for the compiler to inline, we might still want to inline it since
// the codegen'd version is suitable for inlining.
// In the non-ir case (g++), we will just default to whatever the compiler thought
// best at that optimization setting.
#define IR_NO_INLINE __attribute__((noinline))
#define IR_ALWAYS_INLINE __attribute__((always_inline))
#else
#define IR_NO_INLINE
#define IR_ALWAYS_INLINE
#endif
#endif
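A hypothetical example of how the two macros above might decorate functions; `add_slowly` and `add_quickly` are made-up names, not part of the codebase, and the macro definitions are repeated so the sketch is self-contained.

#include <iostream>

// Same macros as above, repeated here so the sketch compiles on its own.
#ifdef IR_COMPILE
#define IR_NO_INLINE __attribute__((noinline))
#define IR_ALWAYS_INLINE __attribute__((always_inline))
#else
#define IR_NO_INLINE
#define IR_ALWAYS_INLINE
#endif

// Keep this symbol findable by name when cross-compiled to IR.
IR_NO_INLINE int add_slowly(int a, int b) { return a + b; }

// Keep this one eligible for inlining even if its body is long.
IR_ALWAYS_INLINE inline int add_quickly(int a, int b) { return a + b; }

int main() {
    std::cout << add_slowly(1, 2) + add_quickly(3, 4) << std::endl; // 10
    return 0;
}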

21
be/src/column/CMakeLists.txt Normal file
View File

@ -0,0 +1,21 @@
# This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
# where to put generated libraries
set(LIBRARY_OUTPUT_PATH "${BUILD_DIR}/src/column")
add_library(Column STATIC
array_column.cpp
column_helper.cpp
chunk.cpp
const_column.cpp
datum_convert.cpp
datum_tuple.cpp
field.cpp
fixed_length_column_base.cpp
fixed_length_column.cpp
nullable_column.cpp
schema.cpp
binary_column.cpp
object_column.cpp
decimalv3_column.cpp
)

413
be/src/column/array_column.cpp Normal file
View File

@ -0,0 +1,413 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#include "column/array_column.h"
#include "column/column_helper.h"
#include "column/fixed_length_column.h"
#include "gutil/bits.h"
#include "gutil/casts.h"
#include "gutil/strings/fastmem.h"
#include "util/mysql_row_buffer.h"
namespace starrocks::vectorized {
ArrayColumn::ArrayColumn(ColumnPtr elements, UInt32Column::Ptr offsets)
: _elements(std::move(elements)), _offsets(std::move(offsets)) {
if (_offsets->empty()) {
_offsets->append(0);
}
}
size_t ArrayColumn::size() const {
return _offsets->size() - 1;
}
const uint8_t* ArrayColumn::raw_data() const {
return _elements->raw_data();
}
uint8_t* ArrayColumn::mutable_raw_data() {
return _elements->mutable_raw_data();
}
size_t ArrayColumn::byte_size(size_t from, size_t size) const {
DCHECK_LE(from + size, this->size()) << "Range error";
return _elements->byte_size(_offsets->get_data()[from], _offsets->get_data()[from + size]) +
_offsets->Column::byte_size(from, size);
}
size_t ArrayColumn::byte_size(size_t idx) const {
return _elements->byte_size(_offsets->get_data()[idx], _offsets->get_data()[idx + 1]) +
sizeof(_offsets->get_data()[idx]);
}
void ArrayColumn::reserve(size_t n) {
_offsets->reserve(n + 1);
}
void ArrayColumn::resize(size_t n) {
_offsets->get_data().resize(n + 1, _offsets->get_data().back());
size_t array_size = _offsets->get_data().back();
_elements->resize(array_size);
}
void ArrayColumn::assign(size_t n, size_t idx) {
DCHECK(false) << "array column shouldn't call assign";
}
void ArrayColumn::append_datum(const Datum& datum) {
const auto& array = datum.get<DatumArray>();
size_t array_size = array.size();
for (size_t i = 0; i < array_size; ++i) {
_elements->append_datum(array[i]);
}
_offsets->append(_offsets->get_data().back() + array_size);
}
void ArrayColumn::append(const Column& src, size_t offset, size_t count) {
const auto& array_column = down_cast<const ArrayColumn&>(src);
const UInt32Column& src_offsets = array_column.offsets();
size_t src_offset = src_offsets.get_data()[offset];
size_t src_count = src_offsets.get_data()[offset + count] - src_offset;
_elements->append(array_column.elements(), src_offset, src_count);
for (size_t i = offset; i < offset + count; i++) {
size_t l = src_offsets.get_data()[i + 1] - src_offsets.get_data()[i];
_offsets->append(_offsets->get_data().back() + l);
}
}
void ArrayColumn::append_selective(const Column& src, const uint32_t* indexes, uint32_t from, uint32_t size) {
for (uint32_t i = 0; i < size; i++) {
append(src, indexes[from + i], 1);
}
}
void ArrayColumn::append_value_multiple_times(const Column& src, uint32_t index, uint32_t size) {
for (uint32_t i = 0; i < size; i++) {
append(src, index, 1);
}
}
void ArrayColumn::append_value_multiple_times(const void* value, size_t count) {
const Datum* datum = reinterpret_cast<const Datum*>(value);
const auto& array = datum->get<DatumArray>();
size_t array_size = array.size();
for (size_t c = 0; c < count; ++c) {
for (size_t i = 0; i < array_size; ++i) {
_elements->append_datum(array[i]);
}
_offsets->append(_offsets->get_data().back() + array_size);
}
}
bool ArrayColumn::append_nulls(size_t count) {
return false;
}
void ArrayColumn::append_default() {
_offsets->append(_offsets->get_data().back());
}
void ArrayColumn::append_default(size_t count) {
size_t offset = _offsets->get_data().back();
_offsets->append_value_multiple_times(&offset, count);
}
uint32_t ArrayColumn::serialize(size_t idx, uint8_t* pos) {
uint32_t offset = _offsets->get_data()[idx];
uint32_t array_size = _offsets->get_data()[idx + 1] - offset;
strings::memcpy_inlined(pos, &array_size, sizeof(array_size));
size_t ser_size = sizeof(array_size);
for (size_t i = 0; i < array_size; ++i) {
ser_size += _elements->serialize(offset + i, pos + ser_size);
}
return ser_size;
}
uint32_t ArrayColumn::serialize_default(uint8_t* pos) {
uint32_t array_size = 0;
strings::memcpy_inlined(pos, &array_size, sizeof(array_size));
return sizeof(array_size);
}
const uint8_t* ArrayColumn::deserialize_and_append(const uint8_t* pos) {
uint32_t array_size = 0;
memcpy(&array_size, pos, sizeof(uint32_t));
pos += sizeof(uint32_t);
_offsets->append(_offsets->get_data().back() + array_size);
for (size_t i = 0; i < array_size; ++i) {
pos = _elements->deserialize_and_append(pos);
}
return pos;
}
uint32_t ArrayColumn::max_one_element_serialize_size() const {
// TODO: performance optimization.
size_t n = size();
uint32_t max_size = 0;
for (size_t i = 0; i < n; i++) {
max_size = std::max(max_size, serialize_size(i));
}
return max_size;
}
uint32_t ArrayColumn::serialize_size(size_t idx) const {
uint32_t offset = _offsets->get_data()[idx];
uint32_t array_size = _offsets->get_data()[idx + 1] - offset;
uint32_t ser_size = sizeof(array_size);
for (size_t i = 0; i < array_size; ++i) {
ser_size += _elements->serialize_size(offset + i);
}
return ser_size;
}
void ArrayColumn::serialize_batch(uint8_t* dst, Buffer<uint32_t>& slice_sizes, size_t chunk_size,
uint32_t max_one_row_size) {
for (size_t i = 0; i < chunk_size; ++i) {
slice_sizes[i] += serialize(i, dst + i * max_one_row_size + slice_sizes[i]);
}
}
void ArrayColumn::deserialize_and_append_batch(std::vector<Slice>& srcs, size_t batch_size) {
reserve(batch_size);
for (size_t i = 0; i < batch_size; ++i) {
srcs[i].data = (char*)deserialize_and_append((uint8_t*)srcs[i].data);
}
}
size_t ArrayColumn::serialize_size() const {
return _offsets->serialize_size() + _elements->serialize_size();
}
uint8_t* ArrayColumn::serialize_column(uint8_t* dst) {
dst = _offsets->serialize_column(dst);
dst = _elements->serialize_column(dst);
return dst;
}
const uint8_t* ArrayColumn::deserialize_column(const uint8_t* src) {
src = _offsets->deserialize_column(src);
src = _elements->deserialize_column(src);
return src;
}
MutableColumnPtr ArrayColumn::clone_empty() const {
return create_mutable(_elements->clone_empty(), UInt32Column::create());
}
size_t ArrayColumn::filter_range(const Column::Filter& filter, size_t from, size_t to) {
DCHECK_EQ(size(), to);
uint32_t* offsets = reinterpret_cast<uint32_t*>(_offsets->mutable_raw_data());
uint32_t elements_start = offsets[from];
uint32_t elements_end = offsets[to];
Filter element_filter(elements_end, 0);
auto check_offset = from;
auto result_offset = from;
#ifdef __AVX2__
const uint8_t* f_data = filter.data();
constexpr size_t kBatchSize = /*width of AVX registers*/ 256 / 8;
const __m256i all0 = _mm256_setzero_si256();
while (check_offset + kBatchSize < to) {
__m256i f = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(f_data + check_offset));
uint32_t mask = _mm256_movemask_epi8(_mm256_cmpgt_epi8(f, all0));
if (mask == 0) {
// all no hit, pass
} else if (mask == 0xffffffff) {
// all hit, copy all
auto element_size = offsets[check_offset + kBatchSize] - offsets[check_offset];
memset(element_filter.data() + offsets[check_offset], 1, element_size);
if (result_offset != check_offset) {
DCHECK_LE(offsets[result_offset], offsets[check_offset]);
// Equivalent to the following code:
// ```
// uint32_t array_sizes[kBatchSize];
// for (int i = 0; i < kBatchSize; i++) {
// array_sizes[i] = offsets[check_offset + i + 1] - offsets[check_offset + i];
// }
// for (int i = 0; i < kBatchSize; i++) {
// offsets[result_offset + i + 1] = offsets[result_offset + i] + array_sizes[i];
// }
// ```
auto delta = offsets[check_offset] - offsets[result_offset];
memmove(offsets + result_offset + 1, offsets + check_offset + 1, kBatchSize * sizeof(offsets[0]));
for (int i = 0; i < kBatchSize; i++) {
offsets[result_offset + i + 1] -= delta;
}
}
result_offset += kBatchSize;
} else {
// Skip rows that are not hit; this reduces comparisons when the filter layout is sparse,
// like "00010001...", but is ineffective when the filter layout is dense.
auto zero_count = Bits::CountTrailingZerosNonZero32(mask);
auto i = zero_count;
while (i < kBatchSize) {
mask = zero_count < 31 ? mask >> (zero_count + 1) : 0;
auto array_size = offsets[check_offset + i + 1] - offsets[check_offset + i];
memset(element_filter.data() + offsets[check_offset + i], 1, array_size);
offsets[result_offset + 1] = offsets[result_offset] + array_size;
zero_count = Bits::CountTrailingZeros32(mask);
result_offset += 1;
i += (zero_count + 1);
}
}
check_offset += kBatchSize;
}
#endif
for (auto i = check_offset; i < to; ++i) {
if (filter[i]) {
DCHECK_GE(offsets[i + 1], offsets[i]);
uint32_t array_size = offsets[i + 1] - offsets[i];
memset(element_filter.data() + offsets[i], 1, array_size);
offsets[result_offset + 1] = offsets[result_offset] + array_size;
result_offset++;
}
}
auto ret = _elements->filter_range(element_filter, elements_start, elements_end);
DCHECK_EQ(offsets[result_offset], ret);
resize(result_offset);
return result_offset;
}
int ArrayColumn::compare_at(size_t left, size_t right, const Column& right_column, int nan_direction_hint) const {
const ArrayColumn& rhs = down_cast<const ArrayColumn&>(right_column);
size_t lhs_offset = _offsets->get_data()[left];
size_t lhs_size = _offsets->get_data()[left + 1] - lhs_offset;
const UInt32Column& rhs_offsets = rhs.offsets();
size_t rhs_offset = rhs_offsets.get_data()[right];
size_t rhs_size = rhs_offsets.get_data()[right + 1] - rhs_offset;
size_t min_size = std::min(lhs_size, rhs_size);
for (size_t i = 0; i < min_size; ++i) {
int res = _elements->compare_at(lhs_offset + i, rhs_offset + i, rhs.elements(), nan_direction_hint);
if (res != 0) {
return res;
}
}
return lhs_size < rhs_size ? -1 : (lhs_size == rhs_size ? 0 : 1);
}
void ArrayColumn::fvn_hash(uint32_t* seed, uint16_t from, uint16_t to) const {
// TODO: only used in shuffle.
DCHECK(false) << "If you use array element as join column, it should be implemented";
}
void ArrayColumn::crc32_hash(uint32_t* seed, uint16_t from, uint16_t to) const {
// TODO: only used in shuffle.
DCHECK(false) << "If you use array element as join column, it should be implemented";
}
void ArrayColumn::put_mysql_row_buffer(MysqlRowBuffer* buf, size_t idx) const {
DCHECK_LT(idx, size());
const size_t offset = _offsets->get_data()[idx];
const size_t array_size = _offsets->get_data()[idx + 1] - offset;
buf->begin_push_array();
Column* elements = _elements.get();
if (array_size > 0) {
elements->put_mysql_row_buffer(buf, offset);
}
for (size_t i = 1; i < array_size; i++) {
buf->separator(',');
elements->put_mysql_row_buffer(buf, offset + i);
}
buf->finish_push_array();
}
Datum ArrayColumn::get(size_t idx) const {
DCHECK_LT(idx + 1, _offsets->size()) << "idx + 1 should be less than offsets size";
size_t offset = _offsets->get_data()[idx];
size_t array_size = _offsets->get_data()[idx + 1] - offset;
DatumArray res(array_size);
for (size_t i = 0; i < array_size; ++i) {
res[i] = _elements->get(offset + i);
}
return Datum(res);
}
bool ArrayColumn::set_null(size_t idx) {
return false;
}
size_t ArrayColumn::element_memory_usage(size_t from, size_t size) const {
DCHECK_LE(from + size, this->size()) << "Range error";
return _elements->element_memory_usage(_offsets->get_data()[from], _offsets->get_data()[from + size]) +
_offsets->Column::element_memory_usage(from, size);
}
void ArrayColumn::swap_column(Column& rhs) {
ArrayColumn& array_column = down_cast<ArrayColumn&>(rhs);
_offsets->swap_column(*array_column.offsets_column());
_elements->swap_column(*array_column.elements_column());
}
void ArrayColumn::reset_column() {
Column::reset_column();
_offsets->resize(1);
_elements->reset_column();
}
const Column& ArrayColumn::elements() const {
return *(_elements.get());
}
ColumnPtr& ArrayColumn::elements_column() {
return _elements;
}
const UInt32Column& ArrayColumn::offsets() const {
return *_offsets;
}
UInt32Column::Ptr& ArrayColumn::offsets_column() {
return _offsets;
}
std::string ArrayColumn::debug_item(uint32_t idx) const {
DCHECK_LT(idx, size());
size_t offset = _offsets->get_data()[idx];
size_t array_size = _offsets->get_data()[idx + 1] - offset;
std::stringstream ss;
ss << "[";
for (size_t i = 0; i < array_size; ++i) {
if (i > 0) {
ss << ", ";
}
ss << _elements->debug_item(offset + i);
}
ss << "]";
return ss.str();
}
std::string ArrayColumn::debug_string() const {
std::stringstream ss;
for (size_t i = 0; i < size(); ++i) {
if (i > 0) {
ss << ", ";
}
ss << debug_item(i);
}
return ss.str();
}
} // namespace starrocks::vectorized
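The #ifdef __AVX2__ branch in filter_range() above classifies each 32-row block as all-drop, all-keep, or mixed with a single movemask. Below is a standalone sketch of that three-way fast path on a flat int column (simplified: there are no array offsets to rewrite), using only intrinsics that appear in the code above.

#include <cstdint>
#include <cstring>
#include <iostream>
#include <vector>
#ifdef __AVX2__
#include <immintrin.h>
#endif

// Filter `values` in place by a 0/1 byte filter. A 32-byte movemask tells us
// whether a block is all-drop, all-keep, or mixed, mirroring the fast path
// in ArrayColumn::filter_range().
size_t filter_in_place(std::vector<int>& values, const std::vector<uint8_t>& filter) {
    size_t n = values.size();
    size_t out = 0;
    size_t i = 0;
#ifdef __AVX2__
    const __m256i zero = _mm256_setzero_si256();
    while (i + 32 <= n) {
        __m256i f = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(filter.data() + i));
        uint32_t mask = _mm256_movemask_epi8(_mm256_cmpgt_epi8(f, zero));
        if (mask == 0) {
            // all dropped: nothing to copy
        } else if (mask == 0xffffffff) {
            // all kept: move the whole block at once
            memmove(values.data() + out, values.data() + i, 32 * sizeof(int));
            out += 32;
        } else {
            // mixed block: fall back to a scalar copy
            for (size_t j = 0; j < 32; ++j) {
                if (filter[i + j]) values[out++] = values[i + j];
            }
        }
        i += 32;
    }
#endif
    for (; i < n; ++i) {
        if (filter[i]) values[out++] = values[i];
    }
    values.resize(out);
    return out;
}

int main() {
    std::vector<int> v(100);
    std::vector<uint8_t> f(100, 0);
    for (int i = 0; i < 100; ++i) { v[i] = i; f[i] = (i % 3 == 0); }
    std::cout << filter_in_place(v, f) << " rows kept\n"; // 34
    return 0;
}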

233
be/src/column/array_column.h Normal file
View File

@ -0,0 +1,233 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include "column/column.h"
#include "column/fixed_length_column.h"
namespace starrocks::vectorized {
class ArrayColumn final : public ColumnFactory<Column, ArrayColumn> {
friend class ColumnFactory<Column, ArrayColumn>;
public:
using ValueType = void;
ArrayColumn(ColumnPtr elements, UInt32Column::Ptr offsets);
// Copy constructor
ArrayColumn(const ArrayColumn& rhs)
: _elements(rhs._elements->clone_shared()),
_offsets(std::static_pointer_cast<UInt32Column>(rhs._offsets->clone_shared())) {}
// Move constructor
ArrayColumn(ArrayColumn&& rhs) noexcept : _elements(std::move(rhs._elements)), _offsets(std::move(rhs._offsets)) {}
// Copy assignment
ArrayColumn& operator=(const ArrayColumn& rhs) {
ArrayColumn tmp(rhs);
this->swap_column(tmp);
return *this;
}
// Move assignment
ArrayColumn& operator=(ArrayColumn&& rhs) {
ArrayColumn tmp(std::move(rhs));
this->swap_column(tmp);
return *this;
}
~ArrayColumn() override = default;
bool is_array() const override { return true; }
const uint8_t* raw_data() const override;
uint8_t* mutable_raw_data() override;
// Return number of values in column.
size_t size() const override;
size_t type_size() const override { return sizeof(DatumArray); }
// Size of column data in memory (may be approximate). Zero, if could not be determined.
size_t byte_size() const override { return _elements->byte_size() + _offsets->byte_size(); }
size_t byte_size(size_t from, size_t size) const override;
// The byte size for serialization; for varchar, we need to add the length byte size
size_t byte_size(size_t idx) const override;
void reserve(size_t n) override;
void resize(size_t n) override;
// Assign the element at the specified idx to the column container content,
// and modify the column size accordingly.
void assign(size_t n, size_t idx) override;
// Appends one value at the end of column (column's size is increased by 1).
void append_datum(const Datum& datum) override;
// Append |count| elements from |src|, started from the offset |offset|, into |this| column.
// It's undefined behaviour if |offset+count| greater than the size of |src|.
// The type of |src| and |this| must be exactly matched.
void append(const Column& src, size_t offset, size_t count) override;
// This function appends data from src according to the input indexes. 'indexes' contains
// the row indexes of src.
// It reads a row index from indexes and appends the corresponding row to this column,
// handling indexes starting at input 'from' and appending 'size' rows.
// For example:
// input indexes: [5, 4, 3, 2, 1]
// from: 2
// size: 2
// This function will copy rows [3, 2] of src to this column.
void append_selective(const Column& src, const uint32_t* indexes, uint32_t from, uint32_t size) override;
void append_value_multiple_times(const Column& src, uint32_t index, uint32_t size) override;
// Append multiple `null` values into this column.
// Return false if this is a non-nullable column, i.e, if `is_nullable` return false.
bool append_nulls(size_t count) override;
// Append multiple strings into this column.
// Return false if the column is not a binary column.
bool append_strings(const std::vector<Slice>& strs) override { return false; }
// Copy |length| bytes from |buff| into this column and cast them as integers.
// The count of copied integers depends on |length| and the size of column value:
// - `int8_t` column: |length| integers will be copied.
// - `int16_t` column: |length| / 2 integers will be copied.
// - `int32_t` column: |length| / 4 integers will be copied.
// - ...
// |buff| must NOT be nullptr.
// Return
// - the count of copied integers on success.
// - -1 if this is not a numeric column.
size_t append_numbers(const void* buff, size_t length) override { return -1; }
// Append |*value| |count| times, this is only used when load default value.
void append_value_multiple_times(const void* value, size_t count) override;
// Append one default value into this column.
// NOTE:
// - for `NullableColumn`, the default value is `null`.
// - for `BinaryColumn`, the default value is empty string.
// - for `FixedLengthColumn`, the default value is zero.
// - for `ConstColumn`, the default value is the const value itself.
void append_default() override;
// Append multiple default values into this column.
void append_default(size_t count) override;
void remove_first_n_values(size_t count) override {}
// Sometimes (e.g. hash group-by on multiple columns)
// we need one buffer to hold temporary serialized data,
// so we need to know the max serialize_size over all column elements.
// The bad thing is that we cannot get the defined string length from the FE at query time.
uint32_t max_one_element_serialize_size() const override;
// Serialize one datum. The memory must be allocated from the mempool first.
uint32_t serialize(size_t idx, uint8_t* pos) override;
uint32_t serialize_default(uint8_t* pos) override;
void serialize_batch(uint8_t* dst, Buffer<uint32_t>& slice_sizes, size_t chunk_size,
uint32_t max_one_row_size) override;
// Deserialize one datum and append it to this column
const uint8_t* deserialize_and_append(const uint8_t* pos) override;
void deserialize_and_append_batch(std::vector<Slice>& srcs, size_t batch_size) override;
// One element serialize_size
uint32_t serialize_size(size_t idx) const override;
// The serialized byte size when serializing by directly copying the whole column data
size_t serialize_size() const override;
// Serialize whole column data to dst
// The return value is dst + column serialize_size
uint8_t* serialize_column(uint8_t* dst) override;
// Deserialize whole column from the src
// The return value is src + column serialize_size
// TODO(kks): validate the input src column data
const uint8_t* deserialize_column(const uint8_t* src) override;
// return new empty column with the same type
MutableColumnPtr clone_empty() const override;
size_t filter_range(const Filter& filter, size_t from, size_t to) override;
// Compares (*this)[left] and rhs[right]. Column rhs should have the same type.
// Returns negative number, 0, or positive number (*this)[left] is less, equal, greater than
// rhs[right] respectively.
//
// If one of element's value is NaN or NULLs, then:
// - if nan_direction_hint == -1, NaN and NULLs are considered less than everything else;
// - if nan_direction_hint == 1, NaN and NULLs are considered greater than everything else.
// For example, if nan_direction_hint == -1 is used by descending sorting, NaNs will be at the end.
//
// For non Nullable and non floating point types, nan_direction_hint is ignored.
int compare_at(size_t left, size_t right, const Column& right_column, int nan_direction_hint) const override;
// Compute the FNV hash, mainly used when shuffling column data
// Note: the shuffle hash function should be different from the Aggregate and Join hash map hash functions
void fvn_hash(uint32_t* seed, uint16_t from, uint16_t to) const override;
// Used by data loading to compute the tablet bucket
void crc32_hash(uint32_t* seed, uint16_t from, uint16_t to) const override;
// Push one row to MysqlRowBuffer
void put_mysql_row_buffer(MysqlRowBuffer* buf, size_t idx) const override;
std::string get_name() const override { return "array"; }
// Return the value of the n-th element.
Datum get(size_t idx) const override;
// return false if this is a non-nullable column.
// |idx| must be less than the size of the column.
bool set_null(size_t idx) override;
size_t memory_usage() const override { return _elements->memory_usage() + _offsets->memory_usage(); }
size_t shrink_memory_usage() const override {
return _elements->shrink_memory_usage() + _offsets->shrink_memory_usage();
}
size_t container_memory_usage() const override {
return _elements->container_memory_usage() + _offsets->container_memory_usage();
}
size_t element_memory_usage(size_t from, size_t size) const;
void swap_column(Column& rhs) override;
void reset_column() override;
const Column& elements() const;
ColumnPtr& elements_column();
const UInt32Column& offsets() const;
UInt32Column::Ptr& offsets_column();
bool is_nullable() const override { return false; }
// Only used to debug one item in this column
std::string debug_item(uint32_t idx) const override;
std::string debug_string() const override;
private:
ColumnPtr _elements;
// The offsets column stores the start position of every array element.
// Offsets store one extra entry to indicate the end position.
// For example, for the two arrays [1, 2, 3] and [4, 5, 6],
// there are three offsets (0, 3, 6).
UInt32Column::Ptr _offsets;
};
} // namespace starrocks::vectorized
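The offsets layout described in the comment above is easy to see with plain vectors. This standalone sketch flattens two arrays into one element buffer plus N + 1 offsets and walks them back out; it mirrors the encoding only, not the ArrayColumn API.

#include <cstdint>
#include <iostream>
#include <vector>

int main() {
    // Flattened representation of the two arrays [1, 2, 3] and [4, 5, 6]:
    // N arrays need N + 1 offsets, and offsets[i]..offsets[i+1] bounds array i.
    std::vector<int> elements = {1, 2, 3, 4, 5, 6};
    std::vector<uint32_t> offsets = {0, 3, 6};

    size_t num_arrays = offsets.size() - 1; // corresponds to ArrayColumn::size()
    for (size_t i = 0; i < num_arrays; ++i) {
        std::cout << "[";
        for (uint32_t j = offsets[i]; j < offsets[i + 1]; ++j) {
            if (j > offsets[i]) std::cout << ", ";
            std::cout << elements[j];
        }
        std::cout << "]\n";
    }
    return 0;
}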

437
be/src/column/binary_column.cpp Normal file
View File

@ -0,0 +1,437 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#include "column/binary_column.h"
#include <immintrin.h>
#include "column/bytes.h"
#include "common/logging.h"
#include "gutil/bits.h"
#include "gutil/casts.h"
#include "gutil/strings/fastmem.h"
#include "util/coding.h"
#include "util/hash_util.hpp"
#include "util/mysql_row_buffer.h"
#include "util/raw_container.h"
namespace starrocks::vectorized {
void BinaryColumn::append(const Column& src, size_t offset, size_t count) {
const auto& b = down_cast<const BinaryColumn&>(src);
const unsigned char* p = &b._bytes[b._offsets[offset]];
const unsigned char* e = &b._bytes[b._offsets[offset + count]];
_bytes.insert(_bytes.end(), p, e);
for (size_t i = offset; i < offset + count; i++) {
size_t l = b._offsets[i + 1] - b._offsets[i];
_offsets.emplace_back(_offsets.back() + l);
}
_slices_cache = false;
}
void BinaryColumn::append_selective(const Column& src, const uint32_t* indexes, uint32_t from, uint32_t size) {
const auto& src_column = down_cast<const BinaryColumn&>(src);
const auto& src_offsets = src_column.get_offset();
const auto& src_bytes = src_column.get_bytes();
size_t cur_row_count = _offsets.size() - 1;
size_t cur_byte_size = _bytes.size();
_offsets.resize(cur_row_count + size + 1);
for (size_t i = 0; i < size; i++) {
uint32_t row_idx = indexes[from + i];
uint32_t str_size = src_offsets[row_idx + 1] - src_offsets[row_idx];
_offsets[cur_row_count + i + 1] = _offsets[cur_row_count + i] + str_size;
cur_byte_size += str_size;
}
_bytes.resize(cur_byte_size);
auto* dest_bytes = _bytes.data();
for (uint32_t i = 0; i < size; i++) {
uint32_t row_idx = indexes[from + i];
uint32_t str_size = src_offsets[row_idx + 1] - src_offsets[row_idx];
strings::memcpy_inlined(dest_bytes + _offsets[cur_row_count + i], src_bytes.data() + src_offsets[row_idx],
str_size);
}
_slices_cache = false;
}
void BinaryColumn::append_value_multiple_times(const Column& src, uint32_t index, uint32_t size) {
auto& src_column = down_cast<const BinaryColumn&>(src);
auto& src_offsets = src_column.get_offset();
auto& src_bytes = src_column.get_bytes();
size_t cur_row_count = _offsets.size() - 1;
size_t cur_byte_size = _bytes.size();
_offsets.resize(cur_row_count + size + 1);
for (size_t i = 0; i < size; i++) {
uint32_t row_idx = index;
uint32_t str_size = src_offsets[row_idx + 1] - src_offsets[row_idx];
_offsets[cur_row_count + i + 1] = _offsets[cur_row_count + i] + str_size;
cur_byte_size += str_size;
}
_bytes.resize(cur_byte_size);
auto* dest_bytes = _bytes.data();
for (uint32_t i = 0; i < size; i++) {
uint32_t row_idx = index;
uint32_t str_size = src_offsets[row_idx + 1] - src_offsets[row_idx];
strings::memcpy_inlined(dest_bytes + _offsets[cur_row_count + i], src_bytes.data() + src_offsets[row_idx],
str_size);
}
_slices_cache = false;
}
bool BinaryColumn::append_strings(const std::vector<Slice>& strs) {
for (const auto& s : strs) {
const uint8_t* const p = reinterpret_cast<const Bytes::value_type*>(s.data);
_bytes.insert(_bytes.end(), p, p + s.size);
_offsets.emplace_back(_bytes.size());
}
_slices_cache = false;
return true;
}
// NOTE: this function should not be inlined. If it is inlined,
// append_strings_overflow will be about 30% slower.
template <size_t copy_length>
void append_fixed_length(const std::vector<Slice>& strs, Bytes* bytes, BinaryColumn::Offsets* offsets)
__attribute__((noinline));
template <size_t copy_length>
void append_fixed_length(const std::vector<Slice>& strs, Bytes* bytes, BinaryColumn::Offsets* offsets) {
size_t size = bytes->size();
for (const auto& s : strs) {
size += s.size;
}
size_t offset = bytes->size();
bytes->resize(size + copy_length);
for (const auto& s : strs) {
strings::memcpy_inlined(&(*bytes)[offset], s.data, copy_length);
offset += s.size;
offsets->emplace_back(offset);
}
bytes->resize(offset);
}
bool BinaryColumn::append_strings_overflow(const std::vector<Slice>& strs, size_t max_length) {
if (max_length <= 16) {
append_fixed_length<16>(strs, &_bytes, &_offsets);
} else if (max_length <= 32) {
append_fixed_length<32>(strs, &_bytes, &_offsets);
} else if (max_length <= 64) {
append_fixed_length<64>(strs, &_bytes, &_offsets);
} else if (max_length <= 128) {
append_fixed_length<128>(strs, &_bytes, &_offsets);
} else {
for (const auto& s : strs) {
const uint8_t* const p = reinterpret_cast<const Bytes::value_type*>(s.data);
_bytes.insert(_bytes.end(), p, p + s.size);
_offsets.emplace_back(_bytes.size());
}
}
_slices_cache = false;
return true;
}
bool BinaryColumn::append_continuous_strings(const std::vector<Slice>& strs) {
if (strs.empty()) {
return true;
}
size_t new_size = _bytes.size();
const uint8_t* p = reinterpret_cast<const uint8_t*>(strs.front().data);
const uint8_t* q = reinterpret_cast<const uint8_t*>(strs.back().data + strs.back().size);
_bytes.insert(_bytes.end(), p, q);
for (const Slice& s : strs) {
new_size += s.size;
_offsets.emplace_back(new_size);
}
DCHECK_EQ(_bytes.size(), new_size);
_slices_cache = false;
return true;
}
void BinaryColumn::append_value_multiple_times(const void* value, size_t count) {
const Slice* slice = reinterpret_cast<const Slice*>(value);
size_t size = slice->size * count;
_bytes.reserve(size);
const uint8_t* const p = reinterpret_cast<const uint8_t*>(slice->data);
const uint8_t* const pend = p + slice->size;
for (size_t i = 0; i < count; ++i) {
_bytes.insert(_bytes.end(), p, pend);
_offsets.emplace_back(_bytes.size());
}
_slices_cache = false;
}
void BinaryColumn::_build_slices() const {
DCHECK(_offsets.size() > 0);
_slices_cache = false;
_slices.clear();
_slices.reserve(_offsets.size() - 1);
for (int i = 0; i < _offsets.size() - 1; ++i) {
_slices.emplace_back(_bytes.data() + _offsets[i], _offsets[i + 1] - _offsets[i]);
}
_slices_cache = true;
}
void BinaryColumn::assign(size_t n, size_t idx) {
std::string value = std::string((char*)_bytes.data() + _offsets[idx], _offsets[idx + 1] - _offsets[idx]);
_bytes.clear();
_offsets.clear();
_offsets.emplace_back(0);
const uint8_t* const start = reinterpret_cast<const Bytes::value_type*>(value.data());
const uint8_t* const end = start + value.size();
for (int i = 0; i < n; ++i) {
_bytes.insert(_bytes.end(), start, end);
_offsets.emplace_back(_bytes.size());
}
_slices_cache = false;
}
//TODO(kks): improve this
void BinaryColumn::remove_first_n_values(size_t count) {
DCHECK_LE(count, _offsets.size() - 1);
size_t remain_size = _offsets.size() - 1 - count;
ColumnPtr column = cut(count, remain_size);
auto* binary_column = down_cast<BinaryColumn*>(column.get());
_offsets = std::move(binary_column->_offsets);
_bytes = std::move(binary_column->_bytes);
_slices_cache = false;
}
ColumnPtr BinaryColumn::cut(size_t start, size_t length) const {
auto result = create();
if (start >= size() || length == 0) {
return result;
}
size_t upper = std::min(start + length, _offsets.size());
size_t start_offset = _offsets[start];
// recompute the offsets
result->get_offset().resize(upper - start + 1);
// Always set offsets[0] to 0, in order to easily get element
result->get_offset()[0] = 0;
for (size_t i = start + 1, j = 1; i < upper + 1; ++i, ++j) {
result->get_offset()[j] = _offsets[i] - start_offset;
}
// copy value
result->_bytes.resize(_offsets[upper] - _offsets[start]);
strings::memcpy_inlined(result->_bytes.data(), _bytes.data() + _offsets[start], _offsets[upper] - _offsets[start]);
return result;
}
size_t BinaryColumn::filter_range(const Column::Filter& filter, size_t from, size_t to) {
auto start_offset = from;
auto result_offset = from;
uint8_t* data = _bytes.data();
#ifdef __AVX2__
const uint8_t* f_data = filter.data();
int simd_bits = 256;
int batch_nums = simd_bits / (8 * (int)sizeof(uint8_t));
__m256i all0 = _mm256_setzero_si256();
while (start_offset + batch_nums < to) {
__m256i f = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(f_data + start_offset));
uint32_t mask = _mm256_movemask_epi8(_mm256_cmpgt_epi8(f, all0));
if (mask == 0) {
// no rows hit, skip this batch
} else if (mask == 0xffffffff) {
// all rows hit, copy the whole batch
// copy data
uint32_t size = _offsets[start_offset + batch_nums] - _offsets[start_offset];
memmove(data + _offsets[result_offset], data + _offsets[start_offset], size);
// set offsets, try vectorized
uint32_t* offset_data = _offsets.data();
for (int i = 0; i < batch_nums; ++i) {
// TODO: performance - subtract the same offset from all entries?
offset_data[result_offset + i + 1] = offset_data[result_offset + i] +
offset_data[start_offset + i + 1] - offset_data[start_offset + i];
}
result_offset += batch_nums;
} else {
// Skip rows that are not hit. This reduces comparisons when the filter layout is sparse,
// like "00010001...", but is ineffective when the filter layout is dense.
uint32_t zero_count = Bits::CountTrailingZerosNonZero32(mask);
uint32_t i = zero_count;
while (i < batch_nums) {
mask = zero_count < 31 ? mask >> (zero_count + 1) : 0;
uint32_t size = _offsets[start_offset + i + 1] - _offsets[start_offset + i];
// copy data
memmove(data + _offsets[result_offset], data + _offsets[start_offset + i], size);
// set offsets
_offsets[result_offset + 1] = _offsets[result_offset] + size;
zero_count = Bits::CountTrailingZeros32(mask);
result_offset += 1;
i += (zero_count + 1);
}
}
start_offset += batch_nums;
}
#endif
for (auto i = start_offset; i < to; ++i) {
if (filter[i]) {
DCHECK_GE(_offsets[i + 1], _offsets[i]);
uint32_t size = _offsets[i + 1] - _offsets[i];
// copy data
memmove(data + _offsets[result_offset], data + _offsets[i], size);
// set offsets
_offsets[result_offset + 1] = _offsets[result_offset] + size;
result_offset++;
}
}
this->resize(result_offset);
return result_offset;
}
int BinaryColumn::compare_at(size_t left, size_t right, const Column& rhs, int nan_direction_hint) const {
const BinaryColumn& right_column = down_cast<const BinaryColumn&>(rhs);
return get_slice(left).compare(right_column.get_slice(right));
}
uint32_t BinaryColumn::max_one_element_serialize_size() const {
uint32_t max_size = 0;
auto prev_offset = _offsets[0];
for (size_t i = 0; i < _offsets.size() - 1; ++i) {
auto curr_offset = _offsets[i + 1];
max_size = std::max(max_size, curr_offset - prev_offset);
prev_offset = curr_offset;
}
return max_size + sizeof(uint32_t);
}
uint32_t BinaryColumn::serialize(size_t idx, uint8_t* pos) {
uint32_t binary_size = _offsets[idx + 1] - _offsets[idx];
uint32_t offset = _offsets[idx];
strings::memcpy_inlined(pos, &binary_size, sizeof(uint32_t));
strings::memcpy_inlined(pos + sizeof(uint32_t), &_bytes[offset], binary_size);
return sizeof(uint32_t) + binary_size;
}
uint32_t BinaryColumn::serialize_default(uint8_t* pos) {
uint32_t binary_size = 0;
strings::memcpy_inlined(pos, &binary_size, sizeof(uint32_t));
return sizeof(uint32_t);
}
void BinaryColumn::serialize_batch(uint8_t* dst, Buffer<uint32_t>& slice_sizes, size_t chunk_size,
uint32_t max_one_row_size) {
for (size_t i = 0; i < chunk_size; ++i) {
slice_sizes[i] += serialize(i, dst + i * max_one_row_size + slice_sizes[i]);
}
}
const uint8_t* BinaryColumn::deserialize_and_append(const uint8_t* pos) {
uint32_t string_size{};
strings::memcpy_inlined(&string_size, pos, sizeof(uint32_t));
pos += sizeof(uint32_t);
size_t old_size = _bytes.size();
_bytes.insert(_bytes.end(), pos, pos + string_size);
_offsets.emplace_back(old_size + string_size);
return pos + string_size;
}
void BinaryColumn::deserialize_and_append_batch(std::vector<Slice>& srcs, size_t batch_size) {
uint32_t string_size = *((uint32_t*)srcs[0].data);
_bytes.reserve(batch_size * string_size * 2);
for (size_t i = 0; i < batch_size; ++i) {
srcs[i].data = (char*)deserialize_and_append((uint8_t*)srcs[i].data);
}
}
uint8_t* BinaryColumn::serialize_column(uint8_t* dst) {
uint32_t bytes_size = _bytes.size() * sizeof(uint8_t);
encode_fixed32_le(dst, bytes_size);
dst += sizeof(uint32_t);
strings::memcpy_inlined(dst, _bytes.data(), bytes_size);
dst += bytes_size;
uint32_t offsets_size = _offsets.size() * sizeof(Offset);
encode_fixed32_le(dst, offsets_size);
dst += sizeof(uint32_t);
strings::memcpy_inlined(dst, _offsets.data(), offsets_size);
dst += offsets_size;
return dst;
}
const uint8_t* BinaryColumn::deserialize_column(const uint8_t* src) {
uint32_t bytes_size = decode_fixed32_le(src);
src += sizeof(uint32_t);
_bytes.resize(bytes_size);
strings::memcpy_inlined(_bytes.data(), src, bytes_size);
src += bytes_size;
uint32_t offsets_size = decode_fixed32_le(src);
src += sizeof(uint32_t);
_offsets.resize(offsets_size / sizeof(Offset));
strings::memcpy_inlined(_offsets.data(), src, offsets_size);
src += offsets_size;
return src;
}
void BinaryColumn::fvn_hash(uint32_t* hashes, uint16_t from, uint16_t to) const {
for (uint16_t i = from; i < to; ++i) {
hashes[i] = HashUtil::fnv_hash(_bytes.data() + _offsets[i], _offsets[i + 1] - _offsets[i], hashes[i]);
}
}
void BinaryColumn::crc32_hash(uint32_t* hashes, uint16_t from, uint16_t to) const {
// keep hash if _bytes is empty
for (uint16_t i = from; i < to && !_bytes.empty(); ++i) {
hashes[i] = HashUtil::zlib_crc_hash(_bytes.data() + _offsets[i], _offsets[i + 1] - _offsets[i], hashes[i]);
}
}
void BinaryColumn::put_mysql_row_buffer(MysqlRowBuffer* buf, size_t idx) const {
uint32_t start = _offsets[idx];
uint32_t len = _offsets[idx + 1] - start;
buf->push_string((const char*)_bytes.data() + start, len);
}
std::string BinaryColumn::debug_item(uint32_t idx) const {
std::string s;
auto slice = get_slice(idx);
s.reserve(slice.size + 2);
s.push_back('\'');
s.append(slice.data, slice.size);
s.push_back('\'');
return s;
}
} // namespace starrocks::vectorized
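// Illustrative sketch (not part of the original file): the per-element wire format used
// by BinaryColumn::serialize / deserialize_and_append above is a 4-byte length followed
// by the raw bytes. This stand-alone round trip mirrors that layout.
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

inline size_t encode_element(const std::string& s, uint8_t* pos) {
    uint32_t len = static_cast<uint32_t>(s.size());
    std::memcpy(pos, &len, sizeof(uint32_t));           // 4-byte length prefix
    std::memcpy(pos + sizeof(uint32_t), s.data(), len); // payload
    return sizeof(uint32_t) + len;
}

inline const uint8_t* decode_element(const uint8_t* pos, std::string* out) {
    uint32_t len = 0;
    std::memcpy(&len, pos, sizeof(uint32_t));
    out->assign(reinterpret_cast<const char*>(pos) + sizeof(uint32_t), len);
    return pos + sizeof(uint32_t) + len;
}

inline void element_wire_format_example() {
    uint8_t buf[64];
    size_t n = encode_element("starrocks", buf);
    std::string decoded;
    decode_element(buf, &decoded);
    assert(n == sizeof(uint32_t) + 9 && decoded == "starrocks");
}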

// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include "column/bytes.h"
#include "column/column.h"
#include "util/slice.h"
namespace starrocks::vectorized {
class BinaryColumn final : public ColumnFactory<Column, BinaryColumn> {
friend class ColumnFactory<Column, BinaryColumn>;
public:
using ValueType = Slice;
using Offset = uint32_t;
using Offsets = Buffer<uint32_t>;
using Bytes = starrocks::raw::RawVectorPad16<uint8_t>;
using Container = Buffer<Slice>;
// TODO(kks): when we create our own vector, we could let vector[-1] = 0,
// so we would not need to explicitly emplace_back a zero value.
BinaryColumn() { _offsets.emplace_back(0); }
BinaryColumn(Bytes bytes, Offsets offsets) : _bytes(std::move(bytes)), _offsets(std::move(offsets)) {
if (_offsets.empty()) {
_offsets.emplace_back(0);
}
};
// Copy constructor
// NOTE: do *NOT* copy |_slices|
BinaryColumn(const BinaryColumn& rhs) : _bytes(rhs._bytes), _offsets(rhs._offsets) {}
// Move constructor
// NOTE: do *NOT* copy |_slices|
BinaryColumn(BinaryColumn&& rhs) : _bytes(std::move(rhs._bytes)), _offsets(std::move(rhs._offsets)) {}
// Copy assignment
BinaryColumn& operator=(const BinaryColumn& rhs) {
BinaryColumn tmp(rhs);
this->swap_column(tmp);
return *this;
}
// Move assignment
BinaryColumn& operator=(BinaryColumn&& rhs) {
BinaryColumn tmp(std::move(rhs));
this->swap_column(tmp);
return *this;
}
~BinaryColumn() override {
if (!_offsets.empty()) {
DCHECK_EQ(_bytes.size(), _offsets.back());
} else {
DCHECK_EQ(_bytes.size(), 0);
}
}
bool low_cardinality() const override { return false; }
bool is_binary() const override { return true; }
const uint8_t* raw_data() const override {
if (!_slices_cache) {
_build_slices();
}
return reinterpret_cast<const uint8_t*>(_slices.data());
}
uint8_t* mutable_raw_data() override {
if (!_slices_cache) {
_build_slices();
}
return reinterpret_cast<uint8_t*>(_slices.data());
}
size_t size() const override { return _offsets.size() - 1; }
size_t type_size() const override { return sizeof(Slice); }
size_t byte_size() const override { return _bytes.size() * sizeof(uint8_t) + _offsets.size() * sizeof(Offset); }
size_t byte_size(size_t from, size_t size) const override {
DCHECK_LE(from + size, this->size()) << "Range error";
return (_offsets[from + size] - _offsets[from]) + size * sizeof(Offset);
}
size_t byte_size(size_t idx) const override { return _offsets[idx + 1] - _offsets[idx] + sizeof(uint32_t); }
Slice get_slice(size_t idx) const {
return Slice(_bytes.data() + _offsets[idx], _offsets[idx + 1] - _offsets[idx]);
}
// For n values, the offsets size is n + 1.
// For example, for the strings "I", "love", "you",
// the _bytes array is "Iloveyou"
// and the _offsets array is [0, 1, 5, 8]; see the sketch after this header.
void reserve(size_t n) override {
// It is hard to know the best reserve size for |_bytes|; an inaccurate reserve
// may hurt performance.
// _bytes.reserve(n * 4);
_offsets.reserve(n + 1);
_slices_cache = false;
}
// If you know the size of the byte array in advance, you can call this method.
// |n| is the number of strings and |byte_size| is the total length of the strings.
void reserve(size_t n, size_t byte_size) {
_offsets.reserve(n + 1);
_bytes.reserve(byte_size);
_slices_cache = false;
}
void resize(size_t n) override {
_offsets.resize(n + 1, _offsets.back());
_bytes.resize(_offsets.back());
_slices_cache = false;
}
void assign(size_t n, size_t idx) override;
void remove_first_n_values(size_t count) override;
void append(const Slice& str) {
_bytes.insert(_bytes.end(), str.data, str.data + str.size);
_offsets.emplace_back(_bytes.size());
_slices_cache = false;
}
void append_datum(const Datum& datum) override {
append(datum.get_slice());
_slices_cache = false;
}
void append(const Column& src, size_t offset, size_t count) override;
void append_selective(const Column& src, const uint32_t* indexes, uint32_t from, uint32_t size) override;
void append_value_multiple_times(const Column& src, uint32_t index, uint32_t size) override;
bool append_nulls(size_t count) override { return false; }
void append_string(const std::string& str) {
_bytes.insert(_bytes.end(), str.data(), str.data() + str.size());
_offsets.emplace_back(_bytes.size());
_slices_cache = false;
}
bool append_strings(const std::vector<Slice>& strs) override;
bool append_strings_overflow(const std::vector<Slice>& strs, size_t max_length) override;
bool append_continuous_strings(const std::vector<Slice>& strs) override;
size_t append_numbers(const void* buff, size_t length) override { return -1; }
void append_value_multiple_times(const void* value, size_t count) override;
void append_default() override {
_offsets.emplace_back(_bytes.size());
_slices_cache = false;
}
void append_default(size_t count) override {
_offsets.insert(_offsets.end(), count, _bytes.size());
_slices_cache = false;
}
uint32_t max_one_element_serialize_size() const override;
uint32_t serialize(size_t idx, uint8_t* pos) override;
uint32_t serialize_default(uint8_t* pos) override;
void serialize_batch(uint8_t* dst, Buffer<uint32_t>& slice_sizes, size_t chunk_size,
uint32_t max_one_row_size) override;
const uint8_t* deserialize_and_append(const uint8_t* pos) override;
void deserialize_and_append_batch(std::vector<Slice>& srcs, size_t batch_size) override;
uint32_t serialize_size(size_t idx) const override { return sizeof(uint32_t) + _offsets[idx + 1] - _offsets[idx]; }
size_t serialize_size() const override {
DCHECK_EQ(_bytes.size(), _offsets.back());
return byte_size() + sizeof(uint32_t) * 2; // _offsets size + _bytes size;
}
uint8_t* serialize_column(uint8_t* dst) override;
const uint8_t* deserialize_column(const uint8_t* src) override;
MutableColumnPtr clone_empty() const override { return create_mutable(); }
ColumnPtr cut(size_t start, size_t length) const;
size_t filter_range(const Column::Filter& filter, size_t start, size_t to) override;
int compare_at(size_t left, size_t right, const Column& rhs, int nan_direction_hint) const override;
void fvn_hash(uint32_t* hashes, uint16_t from, uint16_t to) const override;
void crc32_hash(uint32_t* hash, uint16_t from, uint16_t to) const override;
void put_mysql_row_buffer(MysqlRowBuffer* buf, size_t idx) const override;
std::string get_name() const override { return "binary"; }
Container& get_data() {
if (!_slices_cache) {
_build_slices();
}
return _slices;
}
const Container& get_data() const {
if (!_slices_cache) {
_build_slices();
}
return _slices;
}
Bytes& get_bytes() { return _bytes; }
const Bytes& get_bytes() const { return _bytes; }
Offsets& get_offset() { return _offsets; }
const Offsets& get_offset() const { return _offsets; }
Datum get(size_t n) const override { return Datum(get_slice(n)); }
size_t container_memory_usage() const override {
return _bytes.capacity() + _offsets.capacity() * sizeof(_offsets[0]) + _slices.capacity() * sizeof(_slices[0]);
}
size_t shrink_memory_usage() const override {
return _bytes.size() * sizeof(uint8_t) + _offsets.size() * sizeof(_offsets[0]) +
_slices.size() * sizeof(_slices[0]);
}
void swap_column(Column& rhs) override {
auto& r = down_cast<BinaryColumn&>(rhs);
using std::swap;
swap(_delete_state, r._delete_state);
swap(_bytes, r._bytes);
swap(_offsets, r._offsets);
swap(_slices, r._slices);
swap(_slices_cache, r._slices_cache);
}
void reset_column() override {
Column::reset_column();
// TODO(zhuming): shrink size if needed.
_bytes.clear();
_offsets.resize(1, 0);
_slices.clear();
_slices_cache = false;
}
void invalidate_slice_cache() { _slices_cache = false; }
std::string debug_item(uint32_t idx) const override;
std::string debug_string() const override {
if (empty()) {
return "[]";
}
std::stringstream ss;
ss << "[";
for (size_t i = 0; i + 1 < size(); ++i) {
ss << debug_item(i) << ", ";
}
ss << debug_item(size() - 1) << "]";
return ss.str();
}
private:
void _build_slices() const;
Bytes _bytes;
Offsets _offsets;
mutable Container _slices;
mutable bool _slices_cache = false;
};
using Offsets = BinaryColumn::Offsets;
} // namespace starrocks::vectorized
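// Illustrative sketch (not part of the original file): the _bytes/_offsets layout
// documented in this header, using only APIs declared above. It assumes the column
// headers and their dependencies are available, as in a backend unit test.
#include <cassert>
#include "column/binary_column.h"

inline void binary_column_layout_example() {
    auto col = starrocks::vectorized::BinaryColumn::create();
    col->append_string("I");
    col->append_string("love");
    col->append_string("you");
    // _bytes is "Iloveyou" and _offsets is [0, 1, 5, 8]: n strings need n + 1 offsets.
    assert(col->size() == 3);
    assert(col->get_offset()[3] == 8);
    assert(col->get_slice(1).size == 4); // "love"
}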

be/src/column/bytes.h
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include <stdint.h>
#include <vector>
#include "util/raw_container.h"
namespace starrocks::vectorized {
// Bytes is a special vector<uint8_t> in which the internal memory is always allocated with an additional 16 bytes,
// to make life easier with 128 bit instructions.
typedef starrocks::raw::RawVectorPad16<uint8_t> Bytes;
} // namespace starrocks::vectorized

be/src/column/chunk.cpp
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#include "column/chunk.h"
#include "column/column_helper.h"
#include "column/datum_tuple.h"
#include "column/fixed_length_column.h"
#include "gen_cpp/data.pb.h"
#include "gutil/strings/substitute.h"
#include "runtime/descriptors.h"
#include "util/coding.h"
namespace starrocks::vectorized {
Chunk::Chunk() {
_slot_id_to_index.init(4);
_tuple_id_to_index.init(1);
}
Chunk::Chunk(Columns columns, SchemaPtr schema) : _columns(std::move(columns)), _schema(std::move(schema)) {
// bucket size cannot be 0.
_cid_to_index.init(std::max<size_t>(1, columns.size() * 2));
_slot_id_to_index.init(std::max<size_t>(1, _columns.size() * 2));
_tuple_id_to_index.init(1);
rebuild_cid_index();
check_or_die();
}
// TODO: FlatMap doesn't support std::move
Chunk::Chunk(Columns columns, const butil::FlatMap<SlotId, size_t>& slot_map)
: _columns(std::move(columns)), _slot_id_to_index(slot_map) {
// when using _slot_id_to_index, we don't need to rebuild the cid index
_tuple_id_to_index.init(1);
}
// TODO: FlatMap doesn't support std::move
Chunk::Chunk(Columns columns, const butil::FlatMap<SlotId, size_t>& slot_map,
const butil::FlatMap<TupleId, size_t>& tuple_map)
: _columns(std::move(columns)), _slot_id_to_index(slot_map), _tuple_id_to_index(tuple_map) {
// when using _slot_id_to_index, we don't need to rebuild the cid index
}
void Chunk::reset() {
for (ColumnPtr& c : _columns) {
c->reset_column();
}
_delete_state = DEL_NOT_SATISFIED;
}
void Chunk::swap_chunk(Chunk& other) {
_columns.swap(other._columns);
_schema.swap(other._schema);
_cid_to_index.swap(other._cid_to_index);
_slot_id_to_index.swap(other._slot_id_to_index);
_tuple_id_to_index.swap(other._tuple_id_to_index);
std::swap(_delete_state, other._delete_state);
}
void Chunk::set_num_rows(size_t count) {
for (ColumnPtr& c : _columns) {
c->resize(count);
}
}
std::string Chunk::get_column_name(size_t idx) const {
DCHECK_LT(idx, _columns.size());
return _schema->field(idx)->name();
}
void Chunk::append_column(ColumnPtr column, const FieldPtr& field) {
DCHECK(_cid_to_index.seek(field->id()) == nullptr);
_cid_to_index[field->id()] = _columns.size();
_columns.emplace_back(std::move(column));
_schema->append(field);
check_or_die();
}
void Chunk::append_column(ColumnPtr column, SlotId slot_id) {
_slot_id_to_index[slot_id] = _columns.size();
_columns.emplace_back(std::move(column));
check_or_die();
}
void Chunk::update_column(ColumnPtr column, SlotId slot_id) {
_columns[_slot_id_to_index[slot_id]] = std::move(column);
check_or_die();
}
void Chunk::insert_column(size_t idx, ColumnPtr column, const FieldPtr& field) {
DCHECK_LT(idx, _columns.size());
_columns.emplace(_columns.begin() + idx, std::move(column));
_schema->insert(idx, field);
rebuild_cid_index();
check_or_die();
}
void Chunk::append_tuple_column(const ColumnPtr& column, TupleId tuple_id) {
_tuple_id_to_index[tuple_id] = _columns.size();
_columns.emplace_back(column);
check_or_die();
}
void Chunk::remove_column_by_index(size_t idx) {
DCHECK_LT(idx, _columns.size());
_columns.erase(_columns.begin() + idx);
if (_schema != nullptr) {
_schema->remove(idx);
rebuild_cid_index();
}
}
void Chunk::remove_columns_by_index(const std::vector<size_t>& indexes) {
DCHECK(std::is_sorted(indexes.begin(), indexes.end()));
for (int i = indexes.size(); i > 0; i--) {
_columns.erase(_columns.begin() + indexes[i - 1]);
}
if (_schema != nullptr && !indexes.empty()) {
for (int i = indexes.size(); i > 0; i--) {
_schema->remove(indexes[i - 1]);
}
rebuild_cid_index();
}
}
void Chunk::rebuild_cid_index() {
_cid_to_index.clear();
for (size_t i = 0; i < _schema->num_fields(); i++) {
_cid_to_index[_schema->field(i)->id()] = i;
}
}
size_t Chunk::serialize_size() const {
size_t size = 0;
for (const auto& column : _columns) {
size += column->serialize_size();
}
size += sizeof(uint32_t) + sizeof(uint32_t); // version + num rows
return size;
}
void Chunk::serialize(uint8_t* dst) const {
uint32_t version = 1;
encode_fixed32_le(dst, version);
dst += sizeof(uint32_t);
encode_fixed32_le(dst, num_rows());
dst += sizeof(uint32_t);
for (const auto& column : _columns) {
dst = column->serialize_column(dst);
}
}
size_t Chunk::serialize_with_meta(starrocks::ChunkPB* chunk) const {
chunk->clear_slot_id_map();
chunk->mutable_slot_id_map()->Reserve(static_cast<int>(_slot_id_to_index.size()) * 2);
for (const auto& kv : _slot_id_to_index) {
chunk->mutable_slot_id_map()->Add(kv.first);
chunk->mutable_slot_id_map()->Add(kv.second);
}
chunk->clear_tuple_id_map();
chunk->mutable_tuple_id_map()->Reserve(static_cast<int>(_tuple_id_to_index.size()) * 2);
for (const auto& kv : _tuple_id_to_index) {
chunk->mutable_tuple_id_map()->Add(kv.first);
chunk->mutable_tuple_id_map()->Add(kv.second);
}
chunk->clear_is_nulls();
chunk->mutable_is_nulls()->Reserve(_columns.size());
for (const auto& column : _columns) {
chunk->mutable_is_nulls()->Add(column->is_nullable());
}
chunk->clear_is_consts();
chunk->mutable_is_consts()->Reserve(_columns.size());
for (const auto& column : _columns) {
chunk->mutable_is_consts()->Add(column->is_constant());
}
DCHECK_EQ(_columns.size(), _tuple_id_to_index.size() + _slot_id_to_index.size());
size_t size = serialize_size();
chunk->mutable_data()->resize(size);
serialize((uint8_t*)chunk->mutable_data()->data());
return size;
}
Status Chunk::deserialize(const uint8_t* src, size_t len, const RuntimeChunkMeta& meta) {
_slot_id_to_index = meta.slot_id_to_index;
_tuple_id_to_index = meta.tuple_id_to_index;
_columns.resize(_slot_id_to_index.size() + _tuple_id_to_index.size());
uint32_t version = decode_fixed32_le(src);
DCHECK_EQ(version, 1);
src += sizeof(uint32_t);
size_t rows = decode_fixed32_le(src);
src += sizeof(uint32_t);
for (size_t i = 0; i < meta.is_nulls.size(); ++i) {
_columns[i] = ColumnHelper::create_column(meta.types[i], meta.is_nulls[i], meta.is_consts[i], rows);
}
for (const auto& column : _columns) {
src = column->deserialize_column(src);
}
size_t expected = serialize_size();
if (UNLIKELY(len != expected)) {
return Status::InternalError(
strings::Substitute("deserialize chunk data failed. len: $0, expected: $1", len, expected));
}
DCHECK_EQ(rows, num_rows());
return Status::OK();
}
std::unique_ptr<Chunk> Chunk::clone_empty() const {
return clone_empty(num_rows());
}
std::unique_ptr<Chunk> Chunk::clone_empty(size_t size) const {
if (_columns.size() == _slot_id_to_index.size()) {
return clone_empty_with_slot(size);
} else {
return clone_empty_with_schema(size);
}
}
std::unique_ptr<Chunk> Chunk::clone_empty_with_slot() const {
return clone_empty_with_slot(num_rows());
}
std::unique_ptr<Chunk> Chunk::clone_empty_with_slot(size_t size) const {
DCHECK_EQ(_columns.size(), _slot_id_to_index.size());
Columns columns(_slot_id_to_index.size());
for (size_t i = 0; i < _slot_id_to_index.size(); i++) {
columns[i] = _columns[i]->clone_empty();
columns[i]->reserve(size);
}
return std::make_unique<Chunk>(columns, _slot_id_to_index);
}
std::unique_ptr<Chunk> Chunk::clone_empty_with_schema() const {
int size = num_rows();
return clone_empty_with_schema(size);
}
std::unique_ptr<Chunk> Chunk::clone_empty_with_schema(size_t size) const {
Columns columns(_columns.size());
for (size_t i = 0; i < _columns.size(); ++i) {
columns[i] = _columns[i]->clone_empty();
columns[i]->reserve(size);
}
return std::make_unique<Chunk>(columns, _schema);
}
std::unique_ptr<Chunk> Chunk::clone_empty_with_tuple() const {
return clone_empty_with_tuple(num_rows());
}
std::unique_ptr<Chunk> Chunk::clone_empty_with_tuple(size_t size) const {
Columns columns(_columns.size());
for (size_t i = 0; i < _columns.size(); ++i) {
columns[i] = _columns[i]->clone_empty();
columns[i]->reserve(size);
}
return std::make_unique<Chunk>(columns, _slot_id_to_index, _tuple_id_to_index);
}
void Chunk::append_selective(const Chunk& src, const uint32_t* indexes, uint32_t from, uint32_t size) {
DCHECK_EQ(_columns.size(), src.columns().size());
for (size_t i = 0; i < _columns.size(); ++i) {
_columns[i]->append_selective(*src.columns()[i].get(), indexes, from, size);
}
}
size_t Chunk::filter(const Buffer<uint8_t>& selection) {
for (auto& column : _columns) {
column->filter(selection);
}
return num_rows();
}
size_t Chunk::filter_range(const Buffer<uint8_t>& selection, size_t from, size_t to) {
for (auto& column : _columns) {
column->filter_range(selection, from, to);
}
return num_rows();
}
DatumTuple Chunk::get(size_t n) const {
DatumTuple res;
res.reserve(_columns.size());
for (const auto& column : _columns) {
res.append(column->get(n));
}
return res;
}
size_t Chunk::memory_usage() const {
size_t memory_usage = 0;
for (const auto& column : _columns) {
memory_usage += column->memory_usage();
}
return memory_usage;
}
size_t Chunk::shrink_memory_usage() const {
size_t memory_usage = 0;
for (const auto& column : _columns) {
memory_usage += column->shrink_memory_usage();
}
return memory_usage;
}
size_t Chunk::container_memory_usage() const {
size_t container_memory_usage = 0;
for (const auto& column : _columns) {
container_memory_usage += column->container_memory_usage();
}
return container_memory_usage;
}
size_t Chunk::element_memory_usage(size_t from, size_t size) const {
DCHECK_LE(from + size, num_rows()) << "Range error";
size_t element_memory_usage = 0;
for (const auto& column : _columns) {
element_memory_usage += column->element_memory_usage(from, size);
}
return element_memory_usage;
}
size_t Chunk::bytes_usage() const {
return bytes_usage(0, num_rows());
}
size_t Chunk::bytes_usage(size_t from, size_t size) const {
DCHECK_LE(from + size, num_rows()) << "Range error";
size_t bytes_usage = 0;
for (const auto& column : _columns) {
bytes_usage += column->byte_size(from, size);
}
return bytes_usage;
}
#ifndef NDEBUG
void Chunk::check_or_die() {
if (_columns.empty()) {
CHECK(_schema == nullptr || _schema->fields().empty());
CHECK(_cid_to_index.empty());
CHECK(_slot_id_to_index.empty());
CHECK(_tuple_id_to_index.empty());
} else {
for (const ColumnPtr& c : _columns) {
CHECK_EQ(num_rows(), c->size());
}
}
if (_schema != nullptr) {
for (const auto& kv : _cid_to_index) {
ColumnId cid = kv.first;
size_t idx = kv.second;
CHECK_LT(idx, _columns.size());
CHECK_LT(idx, _schema->num_fields());
CHECK_EQ(cid, _schema->field(idx)->id());
}
}
}
#endif
std::string Chunk::debug_row(uint32_t index) const {
std::stringstream os;
os << "[";
for (size_t col = 0; col < _columns.size() - 1; ++col) {
os << _columns[col]->debug_item(index);
os << ", ";
}
os << _columns[_columns.size() - 1]->debug_item(index) << "]";
return os.str();
}
void Chunk::append(const Chunk& src, size_t offset, size_t count) {
DCHECK_EQ(num_columns(), src.num_columns());
const size_t n = src.num_columns();
for (size_t i = 0; i < n; i++) {
ColumnPtr& c = get_column_by_index(i);
c->append(*src.get_column_by_index(i), offset, count);
}
}
void Chunk::append_safe(const Chunk& src, size_t offset, size_t count) {
DCHECK_EQ(num_columns(), src.num_columns());
const size_t n = src.num_columns();
size_t cur_rows = num_rows();
for (size_t i = 0; i < n; i++) {
ColumnPtr& c = get_column_by_index(i);
if (c->size() == cur_rows) {
c->append(*src.get_column_by_index(i), offset, count);
}
}
}
void Chunk::reserve(size_t cap) {
for (auto& c : _columns) {
c->reserve(cap);
}
}
bool Chunk::has_const_column() const {
for (const auto& c : _columns) {
if (c->is_constant()) {
return true;
}
}
return false;
}
} // namespace starrocks::vectorized
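// Illustrative sketch (not part of the original file): basic Chunk usage with the
// slot-id based interface implemented above, including the append_selective index
// semantics. It assumes a unit-test context where the column headers are available.
#include <cassert>
#include "column/binary_column.h"
#include "column/chunk.h"

inline void chunk_usage_example() {
    using namespace starrocks::vectorized;
    auto col = BinaryColumn::create();
    col->append_string("a");
    col->append_string("b");
    col->append_string("c");

    Chunk chunk;
    int slot_id = 0;
    chunk.append_column(col, slot_id); // index the column by slot id
    assert(chunk.num_rows() == 3);

    // Copy rows 2 and 0 of |chunk| into an empty chunk with the same layout.
    auto dst = chunk.clone_empty();
    uint32_t indexes[] = {2, 0};
    dst->append_selective(chunk, indexes, /*from=*/0, /*size=*/2);
    assert(dst->num_rows() == 2);
}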

be/src/column/chunk.h
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include <runtime/types.h>
#include "butil/containers/flat_map.h"
#include "column/column.h"
#include "column/schema.h"
#include "common/global_types.h"
namespace starrocks {
class ChunkPB;
namespace vectorized {
class DatumTuple;
class Chunk {
public:
using ChunkPtr = std::shared_ptr<Chunk>;
Chunk();
Chunk(Columns columns, SchemaPtr schema);
Chunk(Columns columns, const butil::FlatMap<SlotId, size_t>& slot_map);
Chunk(Columns columns, const butil::FlatMap<SlotId, size_t>& slot_map,
const butil::FlatMap<TupleId, size_t>& tuple_map);
Chunk(Chunk&& other) = default;
Chunk& operator=(Chunk&& other) = default;
~Chunk() = default;
// Disallow copy and assignment.
Chunk(const Chunk& other) = delete;
Chunk& operator=(const Chunk& other) = delete;
// Remove all records and reset the delete state.
void reset();
bool has_rows() const { return num_rows() > 0; }
bool is_empty() const { return num_rows() == 0; }
bool has_columns() const { return !_columns.empty(); }
bool has_tuple_columns() const { return !_tuple_id_to_index.empty(); }
size_t num_columns() const { return _columns.size(); }
size_t num_rows() const { return _columns.empty() ? 0 : _columns[0]->size(); }
size_t num_tuple_columns() const { return _tuple_id_to_index.size(); }
// Resize the chunk to contain |count| rows.
// - If the current size is less than count, additional default values are appended.
// - If the current size is greater than count, the chunk is reduced to its first count elements.
void set_num_rows(size_t count);
void swap_chunk(Chunk& other);
const SchemaPtr& schema() const { return _schema; }
SchemaPtr& schema() { return _schema; }
const Columns& columns() const { return _columns; }
Columns& columns() { return _columns; }
// schema must exist.
std::string get_column_name(size_t idx) const;
// schema must exist and will be updated.
void append_column(ColumnPtr column, const FieldPtr& field);
void append_column(ColumnPtr column, SlotId slot_id);
void insert_column(size_t idx, ColumnPtr column, const FieldPtr& field);
void update_column(ColumnPtr column, SlotId slot_id);
void append_tuple_column(const ColumnPtr& column, TupleId tuple_id);
void remove_column_by_index(size_t idx);
// Remove multiple columns by their indexes.
// For simplicity and better performance, we assume |indexes| are all valid
// and sorted in ascending order; if not, unexpected columns may be removed (silently).
// |indexes| can be empty, in which case no column will be removed.
void remove_columns_by_index(const std::vector<size_t>& indexes);
// schema must exist.
const ColumnPtr& get_column_by_name(const std::string& column_name) const;
ColumnPtr& get_column_by_name(const std::string& column_name);
const ColumnPtr& get_column_by_index(size_t idx) const;
ColumnPtr& get_column_by_index(size_t idx);
const ColumnPtr& get_column_by_id(ColumnId cid) const;
ColumnPtr& get_column_by_id(ColumnId cid);
const ColumnPtr& get_tuple_column_by_id(TupleId tuple_id) const;
ColumnPtr& get_tuple_column_by_id(TupleId tuple_id);
// The caller must ensure the slot_id exists.
const ColumnPtr& get_column_by_slot_id(SlotId slot_id) const;
ColumnPtr& get_column_by_slot_id(SlotId slot_id);
void set_slot_id_to_index(SlotId slot_id, size_t idx) { _slot_id_to_index[slot_id] = idx; }
bool is_slot_exist(SlotId id) const { return _slot_id_to_index.seek(id) != nullptr; }
bool is_tuple_exist(TupleId id) const { return _tuple_id_to_index.seek(id) != nullptr; }
void reset_slot_id_to_index() { _slot_id_to_index.clear(); }
void set_columns(const Columns& columns) { _columns = columns; }
// The size needed to serialize the chunk meta and chunk data.
size_t serialize_size() const;
// Serialize chunk data and meta to ChunkPB.
// The return value is the serialized size of the chunk data.
size_t serialize_with_meta(starrocks::ChunkPB* chunk) const;
// Only serialize chunk data to dst.
// The serialized format (see the layout sketch after this header):
// version (4 bytes)
// num_rows (4 bytes)
// column 1 data
// column 2 data
// ...
// column n data
// Note: the caller must ensure the dst buffer is large enough.
void serialize(uint8_t* dst) const;
// Deserialize chunk by |src| (chunk data) and |meta| (chunk meta)
Status deserialize(const uint8_t* src, size_t len, const RuntimeChunkMeta& meta);
// Create an empty chunk with the same meta and reserve it to this chunk's number of rows.
// Tuple columns are not cloned.
std::unique_ptr<Chunk> clone_empty() const;
std::unique_ptr<Chunk> clone_empty_with_slot() const;
std::unique_ptr<Chunk> clone_empty_with_schema() const;
// Create an empty chunk with the same meta and reserve it to the specified size.
// Tuple columns are not cloned.
std::unique_ptr<Chunk> clone_empty(size_t size) const;
std::unique_ptr<Chunk> clone_empty_with_slot(size_t size) const;
std::unique_ptr<Chunk> clone_empty_with_schema(size_t size) const;
// Create an empty chunk with the same meta and reserve it to this chunk's number of rows.
std::unique_ptr<Chunk> clone_empty_with_tuple() const;
// Create an empty chunk with the same meta and reserve it to the specified size.
std::unique_ptr<Chunk> clone_empty_with_tuple(size_t size) const;
void append(const Chunk& src) { append(src, 0, src.num_rows()); }
// Append |count| rows from |src|, started from |offset|, to the |this| chunk.
void append(const Chunk& src, size_t offset, size_t count);
// Columns in a chunk may share the same column pointer.
// append_safe checks the size of every column in the destination chunk
// to ensure the same column is not appended repeatedly.
void append_safe(const Chunk& src) { append_safe(src, 0, src.num_rows()); }
// Columns in a chunk may share the same column pointer.
// append_safe checks the size of every column in the destination chunk
// to ensure the same column is not appended repeatedly.
void append_safe(const Chunk& src, size_t offset, size_t count);
// This function appends data from |src| according to the input |indexes|, which
// contain row indexes of |src|.
// It reads |size| indexes starting at position |from| and appends the corresponding rows
// to this chunk.
// For example:
// input indexes: [5, 4, 3, 2, 1]
// from: 2
// size: 2
// This copies rows [3, 2] of |src| to this chunk.
void append_selective(const Chunk& src, const uint32_t* indexes, uint32_t from, uint32_t size);
// Remove rows from this chunk according to the vector |selection|.
// The n-th row will be removed if selection[n] is zero.
// The size of |selection| must be equal to the number of rows.
// Return the number of rows after filter.
size_t filter(const Buffer<uint8_t>& selection);
// Return the number of rows after filter.
size_t filter_range(const Buffer<uint8_t>& selection, size_t from, size_t to);
// Return the data of n-th row.
// This method is relatively slow and mainly used for unit tests now.
DatumTuple get(size_t n) const;
void set_delete_state(DelCondSatisfied state) { _delete_state = state; }
DelCondSatisfied delete_state() const { return _delete_state; }
const butil::FlatMap<TupleId, size_t>& get_tuple_id_to_index_map() const { return _tuple_id_to_index; }
const butil::FlatMap<SlotId, size_t>& get_slot_id_to_index_map() const { return _slot_id_to_index; }
// Call `Column::reserve` on each column of this chunk, with |cap| passed as argument.
void reserve(size_t cap);
// Chunk memory usage, used for memory limit.
// Including container capacity size and element data size.
// 1. object column: (column container capacity * type size) + pointer element serialize data size
// 2. other columns: column container capacity * type size
size_t memory_usage() const;
// memory usage after shrink
size_t shrink_memory_usage() const;
// Column container memory usage
size_t container_memory_usage() const;
// Element memory usage that is not in the container, such as memory referenced by pointer.
size_t element_memory_usage() const { return element_memory_usage(0, num_rows()); }
// Element memory usage of |size| rows from |from| in chunk
size_t element_memory_usage(size_t from, size_t size) const;
// Chunk bytes usage, used for memtable data size statistic.
// Including element data size only.
// 1. object column: serialize data size
// 2. other columns: column container size * type size
size_t bytes_usage() const;
// Bytes usage of |size| rows from |from| in chunk
size_t bytes_usage(size_t from, size_t size) const;
bool has_const_column() const;
#ifndef NDEBUG
// check whether the internal state is consistent, abort the program if check failed.
void check_or_die();
#else
void check_or_die() {}
#endif
#ifndef NDEBUG
#define DCHECK_CHUNK(chunk_ptr) \
do { \
if ((chunk_ptr) != nullptr) { \
(chunk_ptr)->check_or_die(); \
} \
} while (false)
#else
#define DCHECK_CHUNK(chunk_ptr)
#endif
std::string debug_row(uint32_t index) const;
private:
void rebuild_cid_index();
Columns _columns;
std::shared_ptr<Schema> _schema;
butil::FlatMap<ColumnId, size_t> _cid_to_index;
// For compatibility
butil::FlatMap<SlotId, size_t> _slot_id_to_index;
butil::FlatMap<TupleId, size_t> _tuple_id_to_index;
DelCondSatisfied _delete_state = DEL_NOT_SATISFIED;
};
inline const ColumnPtr& Chunk::get_column_by_name(const std::string& column_name) const {
return const_cast<Chunk*>(this)->get_column_by_name(column_name);
}
inline ColumnPtr& Chunk::get_column_by_name(const std::string& column_name) {
size_t idx = _schema->get_field_index_by_name(column_name);
return _columns[idx];
}
inline const ColumnPtr& Chunk::get_column_by_slot_id(SlotId slot_id) const {
return const_cast<Chunk*>(this)->get_column_by_slot_id(slot_id);
}
inline ColumnPtr& Chunk::get_column_by_slot_id(SlotId slot_id) {
DCHECK(is_slot_exist(slot_id));
size_t idx = _slot_id_to_index[slot_id];
return _columns[idx];
}
inline const ColumnPtr& Chunk::get_column_by_index(size_t idx) const {
return const_cast<Chunk*>(this)->get_column_by_index(idx);
}
inline ColumnPtr& Chunk::get_column_by_index(size_t idx) {
DCHECK_LT(idx, _columns.size());
return _columns[idx];
}
inline const ColumnPtr& Chunk::get_column_by_id(ColumnId cid) const {
return const_cast<Chunk*>(this)->get_column_by_id(cid);
}
inline ColumnPtr& Chunk::get_column_by_id(ColumnId cid) {
DCHECK(!_cid_to_index.empty());
DCHECK(_cid_to_index.seek(cid) != nullptr);
return _columns[_cid_to_index[cid]];
}
inline const ColumnPtr& Chunk::get_tuple_column_by_id(TupleId tuple_id) const {
return const_cast<Chunk*>(this)->get_tuple_column_by_id(tuple_id);
}
inline ColumnPtr& Chunk::get_tuple_column_by_id(TupleId tuple_id) {
return _columns[_tuple_id_to_index[tuple_id]];
}
// Chunk meta used for runtime computation.
// Currently used in DataStreamRecvr to deserialize a Chunk.
struct RuntimeChunkMeta {
std::vector<TypeDescriptor> types;
std::vector<bool> is_nulls;
std::vector<bool> is_consts;
butil::FlatMap<SlotId, size_t> slot_id_to_index;
butil::FlatMap<TupleId, size_t> tuple_id_to_index;
};
} // namespace vectorized
} // namespace starrocks
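// Illustrative sketch (not part of the original file): the chunk data layout documented
// above is a 4-byte little-endian version, a 4-byte little-endian row count, and then
// each column's serialized bytes. A stand-alone header round trip:
#include <cassert>
#include <cstdint>

inline void put_fixed32_le(uint8_t* dst, uint32_t v) {
    dst[0] = static_cast<uint8_t>(v);
    dst[1] = static_cast<uint8_t>(v >> 8);
    dst[2] = static_cast<uint8_t>(v >> 16);
    dst[3] = static_cast<uint8_t>(v >> 24);
}

inline uint32_t get_fixed32_le(const uint8_t* src) {
    return static_cast<uint32_t>(src[0]) | (static_cast<uint32_t>(src[1]) << 8) |
           (static_cast<uint32_t>(src[2]) << 16) | (static_cast<uint32_t>(src[3]) << 24);
}

inline void chunk_header_layout_example() {
    uint8_t header[8];
    put_fixed32_le(header, 1);        // serialization version, currently always 1
    put_fixed32_le(header + 4, 4096); // number of rows in the chunk
    assert(get_fixed32_le(header) == 1);
    assert(get_fixed32_le(header + 4) == 4096);
    // ... column 1 data, column 2 data, ..., column n data would follow.
}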

be/src/column/column.h
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include <memory>
#include <string>
#include <type_traits>
#include "column/datum.h"
#include "column/vectorized_fwd.h"
#include "gutil/casts.h"
#include "storage/delete_condition.h" // for DelCondSatisfied
#include "util/slice.h"
namespace starrocks {
class MemPool;
class MysqlRowBuffer;
namespace vectorized {
class Column {
public:
// We append a fixed size to achieve a faster memory copy.
// The benchmark copies 350M rows whose total length is 2GB, with a max string length of 15.
// When size is 0, the string's actual size is copied.
// When size is another value, that fixed length is copied, which means more bytes are
// copied for each field.
// Benchmark results:
// size | time
// 0    | 8s036ms
// 16   | 3s485ms
// 32   | 4s630ms
// 64   | 5s127ms
// 128  | 5s899ms
// 256  | 8s210ms
// The results show that a fixed copy length up to 128 still speeds up column reads.
enum { APPEND_OVERFLOW_MAX_SIZE = 128 };
// Mutable operations cannot be applied to shared data when accessed concurrently.
using Ptr = std::shared_ptr<Column>;
// mutable means the data can be modified safely
using MutablePtr = std::unique_ptr<Column>;
virtual ~Column() = default;
// If true, this is a null literal column.
virtual bool only_null() const { return false; }
virtual bool is_nullable() const { return false; }
virtual bool has_null() const { return false; }
virtual bool is_null(size_t idx) const { return false; }
virtual bool is_numeric() const { return false; }
virtual bool is_constant() const { return false; }
virtual bool is_binary() const { return false; }
virtual bool is_decimal() const { return false; }
virtual bool is_date() const { return false; }
virtual bool is_timestamp() const { return false; }
virtual bool is_object() const { return false; }
virtual bool is_array() const { return false; }
virtual bool low_cardinality() const { return false; }
virtual const uint8_t* raw_data() const = 0;
virtual uint8_t* mutable_raw_data() = 0;
// Return number of values in column.
virtual size_t size() const = 0;
bool empty() const { return size() == 0; }
virtual size_t type_size() const = 0;
// Size of column data in memory (may be approximate). Zero if it could not be determined.
virtual size_t byte_size() const = 0;
virtual size_t byte_size(size_t from, size_t size) const {
DCHECK_LE(from + size, this->size()) << "Range error";
if (empty()) {
return 0;
}
return byte_size() * size / this->size();
}
// The byte size for serialization; for varchar, the length byte size must be added.
virtual size_t byte_size(size_t idx) const = 0;
virtual void reserve(size_t n) = 0;
virtual void resize(size_t n) = 0;
virtual void resize_uninitialized(size_t n) { resize(n); }
// Fill the column with |n| copies of the element at |idx|,
// modifying the column size accordingly.
virtual void assign(size_t n, size_t idx) = 0;
// Appends one value at the end of column (column's size is increased by 1).
virtual void append_datum(const Datum& datum) = 0;
virtual void remove_first_n_values(size_t count) = 0;
// Append |count| elements from |src|, starting from the offset |offset|, into |this| column.
// It's undefined behaviour if |offset + count| is greater than the size of |src|.
// The types of |src| and |this| must match exactly.
virtual void append(const Column& src, size_t offset, size_t count) = 0;
virtual void append(const Column& src) { append(src, 0, src.size()); }
// This function appends data from |src| according to the input |indexes|, which
// contain row indexes of |src|.
// It reads |size| indexes starting at position |from| and appends the corresponding rows
// to this column.
// For example:
// input indexes: [5, 4, 3, 2, 1]
// from: 2
// size: 2
// This copies rows [3, 2] of |src| to this column.
virtual void append_selective(const Column& src, const uint32_t* indexes, uint32_t from, uint32_t size) = 0;
// Append the element at |index| of |src| to this column |size| times.
virtual void append_value_multiple_times(const Column& src, uint32_t index, uint32_t size) = 0;
// Append multiple `null` values into this column.
// Return false if this is a non-nullable column, i.e., if `is_nullable` returns false.
[[nodiscard]] virtual bool append_nulls(size_t count) = 0;
// Append multiple strings into this column.
// Return false if the column is not a binary column.
[[nodiscard]] virtual bool append_strings(const std::vector<Slice>& strs) = 0;
// Like append_strings. To achieve higher performance, this function may read 16 bytes
// beyond the end of a slice, so the caller must make sure that such out-of-bounds reads
// cannot touch an invalid address.
[[nodiscard]] virtual bool append_strings_overflow(const std::vector<Slice>& strs, size_t max_length) {
return false;
}
// Like `append_strings` but the corresponding storage of each slice is adjacent to the
// next one's, the implementation can take advantage of this feature, e.g, copy the whole
// memory at once.
[[nodiscard]] virtual bool append_continuous_strings(const std::vector<Slice>& strs) {
return append_strings(strs);
}
// Copy |length| bytes from |buff| into this column and cast them as integers.
// The count of copied integers depends on |length| and the size of column value:
// - `int8_t` column: |length| integers will be copied.
// - `int16_t` column: |length| / 2 integers will be copied.
// - `int32_t` column: |length| / 4 integers will be copied.
// - ...
// |buff| must NOT be nullptr.
// Return
// - the count of copied integers on success.
// - -1 if this is not a numeric column.
[[nodiscard]] virtual size_t append_numbers(const void* buff, size_t length) = 0;
// Append |*value| |count| times; this is only used when loading a default value.
virtual void append_value_multiple_times(const void* value, size_t count) = 0;
// Append one default value into this column.
// NOTE:
// - for `NullableColumn`, the default value is `null`.
// - for `BinaryColumn`, the default value is empty string.
// - for `FixedLengthColumn`, the default value is zero.
// - for `ConstColumn`, the default value is the const value itself.
virtual void append_default() = 0;
// Append multiple default values into this column.
virtual void append_default(size_t count) = 0;
// Sometimes (e.g. hash group by on multiple columns), we need one buffer to hold
// temporary serialized data, so we need to know the maximum serialize size over all
// column elements.
// Unfortunately, we cannot get the string's defined length from the FE at query time.
virtual uint32_t max_one_element_serialize_size() const {
return 16; // For Non-string type, 16 is enough.
}
// Serialize one element. The memory must be allocated from the mempool first.
virtual uint32_t serialize(size_t idx, uint8_t* pos) = 0;
// Serialize the default value of the column.
// The behavior is consistent with append_default.
virtual uint32_t serialize_default(uint8_t* pos) = 0;
virtual void serialize_batch(uint8_t* dst, Buffer<uint32_t>& slice_sizes, size_t chunk_size,
uint32_t max_one_row_size) = 0;
// A dedicated serialization method used by HashJoin to combine multiple columns into a wide-key
// column, and it's only implemented by numeric columns right now.
// This method serializes its elements one by one into the destination buffer starting at
// (dst + byte_offset) with an interval between each element. It returns size of the data type
// (which should be fixed size) of this column if this column supports this method, otherwise
// it returns 0.
virtual size_t serialize_batch_at_interval(uint8_t* dst, size_t byte_offset, size_t byte_interval, size_t start,
size_t count) {
return 0;
};
// Deserialize one element and append it to this column.
virtual const uint8_t* deserialize_and_append(const uint8_t* pos) = 0;
virtual void deserialize_and_append_batch(std::vector<Slice>& srcs, size_t batch_size) = 0;
// One element serialize_size
virtual uint32_t serialize_size(size_t idx) const = 0;
// The serialized byte size when serializing by directly copying the whole column data.
virtual size_t serialize_size() const = 0;
// Serialize whole column data to dst
// The return value is dst + column serialize_size
virtual uint8_t* serialize_column(uint8_t* dst) = 0;
// Deserialize whole column from the src
// The return value is src + column serialize_size
// TODO(kks): validate the input src column data
virtual const uint8_t* deserialize_column(const uint8_t* src) = 0;
// return new empty column with the same type
virtual MutablePtr clone_empty() const = 0;
virtual MutablePtr clone() const = 0;
// clone column
virtual Ptr clone_shared() const = 0;
// REQUIRES: the size of |filter| equals the size of this column.
// Removes elements that don't match the filter.
using Filter = Buffer<uint8_t>;
inline size_t filter(const Filter& filter) {
DCHECK_EQ(size(), filter.size());
return filter_range(filter, 0, filter.size());
}
inline size_t filter(const Filter& filter, size_t count) { return filter_range(filter, 0, count); }
// FIXME: Many derived implementations assume |to| equals size().
virtual size_t filter_range(const Filter& filter, size_t from, size_t to) = 0;
// Compares (*this)[left] and rhs[right]. Column rhs should have the same type.
// Returns negative number, 0, or positive number (*this)[left] is less, equal, greater than
// rhs[right] respectively.
//
// If one of the elements is NaN or NULL, then:
// - if nan_direction_hint == -1, NaN and NULLs are considered less than everything else;
// - if nan_direction_hint == 1, NaN and NULLs are considered greater than everything else.
// For example, if nan_direction_hint == -1 is used for descending sorting, NaNs will be at the end.
//
// For non-nullable and non-floating-point types, nan_direction_hint is ignored.
// A stand-alone comparator sketch illustrating this contract follows this header.
virtual int compare_at(size_t left, size_t right, const Column& rhs, int nan_direction_hint) const = 0;
// Compute the FNV hash, mainly used when shuffling column data.
// Note: the shuffle hash function should differ from the Aggregate and Join hash map hash functions.
virtual void fvn_hash(uint32_t* seed, uint16_t from, uint16_t to) const = 0;
// Used by data loading to compute the tablet bucket.
virtual void crc32_hash(uint32_t* seed, uint16_t from, uint16_t to) const = 0;
// Push one row to MysqlRowBuffer
virtual void put_mysql_row_buffer(MysqlRowBuffer* buf, size_t idx) const = 0;
void set_delete_state(DelCondSatisfied delete_state) { _delete_state = delete_state; }
DelCondSatisfied delete_state() const { return _delete_state; }
virtual std::string get_name() const = 0;
// Return the value of the n-th element.
virtual Datum get(size_t n) const = 0;
// Return false if this is a non-nullable column.
// |idx| must be less than the size of the column.
[[nodiscard]] virtual bool set_null(size_t idx __attribute__((unused))) { return false; }
// Only used for debugging one item in this column
virtual std::string debug_item(uint32_t idx) const { return ""; }
virtual std::string debug_string() const { return std::string(); }
// memory usage includes container memory usage and element memory usage.
// 1. container memory usage: container capacity * type size.
// 2. element memory usage: element data size that is not in the container,
// such as memory referenced by pointer.
// 2.1 object column: element serialize data size.
// 2.2 other columns: 0.
virtual size_t memory_usage() const { return container_memory_usage() + element_memory_usage(); }
virtual size_t shrink_memory_usage() const = 0;
virtual size_t container_memory_usage() const = 0;
virtual size_t element_memory_usage() const { return element_memory_usage(0, size()); }
virtual size_t element_memory_usage(size_t from, size_t size) const { return 0; }
virtual void swap_column(Column& rhs) = 0;
virtual void reset_column() { _delete_state = DEL_NOT_SATISFIED; }
protected:
DelCondSatisfied _delete_state = DEL_NOT_SATISFIED;
};
// AncestorBase is the root class of the inheritance hierarchy.
// If Derived is a direct subclass of the root, then AncestorBase is just the Base class.
// If Derived is an indirect subclass of the root, Base is its parent class and
// AncestorBase must be the root class, because Derived needs some type information from
// AncestorBase to override the virtual methods, e.g. clone and clone_shared.
template <typename Base, typename Derived, typename AncestorBase = Base>
class ColumnFactory : public Base {
private:
Derived* mutable_derived() { return down_cast<Derived*>(this); }
const Derived* derived() const { return down_cast<const Derived*>(this); }
public:
template <typename... Args>
ColumnFactory(Args&&... args) : Base(std::forward<Args>(args)...) {}
// Mutable operations cannot be applied to shared data when accessed concurrently.
using Ptr = std::shared_ptr<Derived>;
// mutable means the data can be modified safely
using MutablePtr = std::unique_ptr<Derived>;
using AncestorBaseType = std::enable_if_t<std::is_base_of_v<AncestorBase, Base>, AncestorBase>;
template <typename... Args>
static Ptr create(Args&&... args) {
return Ptr(new Derived(std::forward<Args>(args)...));
}
template <typename... Args>
static MutablePtr create_mutable(Args&&... args) {
return MutablePtr(new Derived(std::forward<Args>(args)...));
}
template <typename T>
static Ptr create(std::initializer_list<T>&& arg) {
return Ptr(new Derived(std::forward<std::initializer_list<T>>(arg)));
}
template <typename T>
static MutablePtr create_mutable(std::initializer_list<T>&& arg) {
return MutablePtr(new Derived(std::forward<std::initializer_list<T>>(arg)));
}
typename AncestorBaseType::MutablePtr clone() const {
return typename AncestorBase::MutablePtr(new Derived(*derived()));
}
typename AncestorBaseType::Ptr clone_shared() const { return typename AncestorBase::Ptr(new Derived(*derived())); }
};
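// Usage sketch (illustrative only): a concrete column type derives from ColumnFactory so that
// create()/create_mutable() return properly typed smart pointers while clone()/clone_shared()
// return pointers typed as the root Column class, e.g. (see const_column.h in this commit):
//
//   class ConstColumn final : public ColumnFactory<Column, ConstColumn> { ... };
//
//   auto col = ConstColumn::create(data_column, 4);   // std::shared_ptr<ConstColumn>
//   auto copy = col->clone();                         // Column::MutablePtr holding a copy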
} // namespace vectorized
} // namespace starrocks


@ -0,0 +1,170 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include "column/column_helper.h"
#include "util/raw_container.h"
namespace starrocks {
namespace vectorized {
template <PrimitiveType Type>
class ColumnBuilder {
public:
using DataColumnPtr = typename RunTimeColumnType<Type>::Ptr;
using NullColumnPtr = NullColumn::Ptr;
using DatumType = RunTimeCppType<Type>;
ColumnBuilder() {
static_assert(!pt_is_decimal<Type>, "Decimal32/64/128 types are not supported");
_has_null = false;
_column = RunTimeColumnType<Type>::create();
_null_column = NullColumn::create();
reserve(config::vector_chunk_size);
}
ColumnBuilder(int precision, int scale) {
_has_null = false;
_column = RunTimeColumnType<Type>::create();
_null_column = NullColumn::create();
reserve(config::vector_chunk_size);
if constexpr (pt_is_decimal<Type>) {
static constexpr auto max_precision = decimal_precision_limit<DatumType>;
DCHECK(0 <= scale && scale <= precision && precision <= max_precision);
auto raw_column = ColumnHelper::cast_to_raw<Type>(_column);
raw_column->set_precision(precision);
raw_column->set_scale(scale);
}
}
ColumnBuilder(DataColumnPtr column, NullColumnPtr null_column, bool has_null)
: _column(column), _null_column(null_column), _has_null(has_null) {}
// Do-nothing ctor; members are initialized by derived classes.
explicit ColumnBuilder(void*) {}
void append(const DatumType& value) {
_null_column->append(DATUM_NOT_NULL);
_column->append(value);
}
void append(const DatumType& value, bool is_null) {
_has_null = _has_null | is_null;
_null_column->append(is_null);
_column->append(value);
}
void append_null() {
_has_null = true;
_null_column->append(DATUM_NULL);
_column->append_default();
}
ColumnPtr build(bool is_const) {
if (is_const && _has_null) {
return ColumnHelper::create_const_null_column(_column->size());
}
if (is_const) {
return ConstColumn::create(_column, _column->size());
} else if (_has_null) {
return NullableColumn::create(_column, _null_column);
} else {
return _column;
}
}
void reserve(int size) {
_column->reserve(size);
_null_column->reserve(size);
}
DataColumnPtr data_column() { return _column; }
protected:
DataColumnPtr _column;
NullColumnPtr _null_column;
bool _has_null;
};
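// Usage sketch (illustrative only): build a nullable INT column row by row and materialize it.
// build(false) returns the plain data column when no null was appended and a NullableColumn
// otherwise; build(true) wraps the data in a ConstColumn (or a const-null column if a null
// was appended).
//
//   ColumnBuilder<TYPE_INT> builder;
//   builder.append(1);                        // not null
//   builder.append(2, /*is_null=*/false);     // not null
//   builder.append_null();                    // null row
//   ColumnPtr col = builder.build(/*is_const=*/false);   // NullableColumn with 3 rows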
class NullableBinaryColumnBuilder : public ColumnBuilder<TYPE_VARCHAR> {
public:
using ColumnType = RunTimeColumnType<TYPE_VARCHAR>;
using Offsets = ColumnType::Offsets;
NullableBinaryColumnBuilder() : ColumnBuilder(nullptr) {
_column = ColumnType::create();
_null_column = NullColumn::create();
_has_null = false;
}
// Allocate enough room for the offsets and the null column, and
// reserve bytes_size bytes for Bytes. The sizes of offsets and
// null_column are deterministic, so exact memory can be allocated
// for them, but the bytes size is non-deterministic, so only a
// moderate amount is reserved. offsets needs no initialization
// (raw::make_room) because it is fully overwritten; null_column is
// zeroed out (resize), and only the slots corresponding to null
// elements are later marked with 1.
void resize(size_t num_rows, size_t bytes_size) {
_column->get_bytes().reserve(bytes_size);
auto& offsets = _column->get_offset();
raw::make_room(&offsets, num_rows + 1);
offsets[0] = 0;
_null_column->get_data().resize(num_rows);
}
// mark the i-th resulting element as null
void set_null(size_t i) {
_has_null = true;
Bytes& bytes = _column->get_bytes();
Offsets& offsets = _column->get_offset();
NullColumn::Container& nulls = _null_column->get_data();
offsets[i + 1] = bytes.size();
nulls[i] = 1;
}
void append_empty(size_t i) {
Bytes& bytes = _column->get_bytes();
Offsets& offsets = _column->get_offset();
offsets[i + 1] = bytes.size();
}
void append(uint8_t* begin, uint8_t* end, size_t i) {
Bytes& bytes = _column->get_bytes();
Offsets& offsets = _column->get_offset();
bytes.insert(bytes.end(), begin, end);
offsets[i + 1] = bytes.size();
}
// For concat and concat_ws, several columns are concatenated
// together into one string, so append must be invoked as many times
// as the number of involved columns; however, the offset is updated
// only once, so the append is split into append_partial and append_complete
// as follows.
void append_partial(uint8_t* begin, uint8_t* end) {
Bytes& bytes = _column->get_bytes();
bytes.insert(bytes.end(), begin, end);
}
void append_complete(size_t i) {
Bytes& bytes = _column->get_bytes();
Offsets& offsets = _column->get_offset();
offsets[i + 1] = bytes.size();
}
// move the current write position back by n bytes; used in concat_ws
void rewind(size_t n) {
Bytes& bytes = _column->get_bytes();
bytes.resize(bytes.size() - n);
}
NullColumnPtr get_null_column() { return _null_column; }
NullColumn::Container& get_null_data() { return _null_column->get_data(); }
// has_null = true means the resulting NullableColumn contains nulls.
void set_has_null(bool has_null) { _has_null = has_null; }
private:
};
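// Usage sketch (illustrative only) of the calling pattern described above for concat-like
// functions; num_rows, estimated_bytes, any_input_is_null() and parts_of() are hypothetical
// values/helpers supplied by the caller.
//
//   NullableBinaryColumnBuilder builder;
//   builder.resize(num_rows, estimated_bytes);
//   for (size_t row = 0; row < num_rows; ++row) {
//       if (any_input_is_null(row)) {
//           builder.set_null(row);
//           continue;
//       }
//       for (const Slice& part : parts_of(row)) {
//           builder.append_partial((uint8_t*)part.data, (uint8_t*)part.data + part.size);
//       }
//       builder.append_complete(row);
//   }
//   ColumnPtr result = builder.build(/*is_const=*/false);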
} // namespace vectorized
} // namespace starrocks

222
be/src/column/column_hash.h Normal file

@ -0,0 +1,222 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include "util/hash_util.hpp"
#include "util/slice.h"
#include "util/unaligned_access.h"
namespace starrocks::vectorized {
typedef unsigned __int128 uint128_t;
inline uint64_t umul128(uint64_t a, uint64_t b, uint64_t* high) {
auto result = static_cast<uint128_t>(a) * static_cast<uint128_t>(b);
*high = static_cast<uint64_t>(result >> 64u);
return static_cast<uint64_t>(result);
}
template <int n>
struct phmap_mix {
inline size_t operator()(size_t) const;
};
template <>
class phmap_mix<4> {
public:
inline size_t operator()(size_t a) const {
static constexpr uint64_t kmul = 0xcc9e2d51UL;
uint64_t l = a * kmul;
return static_cast<size_t>(l ^ (l >> 32u));
}
};
template <>
class phmap_mix<8> {
public:
// Very fast mixing (similar to Abseil)
inline size_t operator()(size_t a) const {
static constexpr uint64_t k = 0xde5fb9d2630458e9ULL;
uint64_t h;
uint64_t l = umul128(a, k, &h);
return static_cast<size_t>(h + l);
}
};
enum PhmapSeed { PhmapSeed1, PhmapSeed2 };
template <int n, PhmapSeed seed>
class phmap_mix_with_seed {
public:
inline size_t operator()(size_t) const;
};
template <>
class phmap_mix_with_seed<4, PhmapSeed1> {
public:
inline size_t operator()(size_t a) const {
static constexpr uint64_t kmul = 0xcc9e2d51UL;
uint64_t l = a * kmul;
return static_cast<size_t>(l ^ (l >> 32u));
}
};
template <>
class phmap_mix_with_seed<8, PhmapSeed1> {
public:
inline size_t operator()(size_t a) const {
static constexpr uint64_t k = 0xde5fb9d2630458e9ULL;
uint64_t h;
uint64_t l = umul128(a, k, &h);
return static_cast<size_t>(h + l);
}
};
template <>
class phmap_mix_with_seed<4, PhmapSeed2> {
public:
inline size_t operator()(size_t a) const {
static constexpr uint64_t kmul = 0xcc9e2d511d;
uint64_t l = a * kmul;
return static_cast<size_t>(l ^ (l >> 32u));
}
};
template <>
class phmap_mix_with_seed<8, PhmapSeed2> {
public:
inline size_t operator()(size_t a) const {
static constexpr uint64_t k = 0xde5fb9d263046000ULL;
uint64_t h;
uint64_t l = umul128(a, k, &h);
return static_cast<size_t>(h + l);
}
};
static uint32_t crc_hash_32(const void* data, int32_t bytes, uint32_t hash) {
uint32_t words = bytes / sizeof(uint32_t);
bytes = bytes % 4 /*sizeof(uint32_t)*/;
auto* p = reinterpret_cast<const uint8_t*>(data);
while (words--) {
hash = _mm_crc32_u32(hash, unaligned_load<uint32_t>(p));
p += sizeof(uint32_t);
}
while (bytes--) {
hash = _mm_crc32_u8(hash, *p);
++p;
}
// The lower half of the CRC hash has poor uniformity, so swap the halves
// for callers that only use the first several bits of the hash.
hash = (hash << 16u) | (hash >> 16u);
return hash;
}
static uint64_t crc_hash_64(const void* data, int32_t length, uint64_t hash) {
if (UNLIKELY(length < 8)) {
return crc_hash_32(data, length, hash);
}
uint64_t words = length / sizeof(uint64_t);
auto* p = reinterpret_cast<const uint8_t*>(data);
auto* end = reinterpret_cast<const uint8_t*>(data) + length;
while (words--) {
hash = _mm_crc32_u64(hash, unaligned_load<uint64_t>(p));
p += sizeof(uint64_t);
}
// Reduce the branch condition
p = end - 8;
hash = _mm_crc32_u64(hash, unaligned_load<uint64_t>(p));
return hash;
}
class SliceHash {
public:
// TODO: 0x811C9DC5 is not a prime number
static const uint32_t CRC_SEED = 0x811C9DC5;
std::size_t operator()(const Slice& slice) const {
return crc_hash_64(slice.data, static_cast<int32_t>(slice.size), CRC_SEED);
}
};
template <PhmapSeed>
class SliceHashWithSeed {
public:
std::size_t operator()(const Slice& slice) const;
};
template <>
class SliceHashWithSeed<PhmapSeed1> {
public:
static const uint32_t CRC_SEED = 0x811C9DC5;
std::size_t operator()(const Slice& slice) const {
return crc_hash_64(slice.data, static_cast<int32_t>(slice.size), CRC_SEED);
}
};
template <>
class SliceHashWithSeed<PhmapSeed2> {
public:
static const uint32_t CRC_SEED = 0x811c9dd7;
std::size_t operator()(const Slice& slice) const {
return crc_hash_64(slice.data, static_cast<int32_t>(slice.size), CRC_SEED);
}
};
#if defined(__SSE2__) && !defined(ADDRESS_SANITIZER)
// NOTE: This function may access up to 15 extra bytes beyond the end of p1 and p2.
// NOTE: typename T must be uint8_t or int8_t
template <typename T>
typename std::enable_if<sizeof(T) == 1, bool>::type memequal(const T* p1, size_t size1, const T* p2, size_t size2) {
if (size1 != size2) {
return false;
}
for (size_t offset = 0; offset < size1; offset += 16) {
uint16_t mask =
_mm_movemask_epi8(_mm_cmpeq_epi8(_mm_loadu_si128(reinterpret_cast<const __m128i*>(p1 + offset)),
_mm_loadu_si128(reinterpret_cast<const __m128i*>(p2 + offset))));
mask = ~mask;
if (mask) {
offset += __builtin_ctz(mask);
return offset >= size1;
}
}
return true;
}
#else
template <typename T>
typename std::enable_if<sizeof(T) == 1, bool>::type memequal(const T* p1, size_t size1, const T* p2, size_t size2) {
return (size1 == size2) && (memcmp(p1, p2, size1) == 0);
}
#endif
static constexpr uint16_t SLICE_MEMEQUAL_OVERFLOW_PADDING = 15;
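// Presumably SLICE_MEMEQUAL_OVERFLOW_PADDING exists so that callers can over-allocate slice
// storage compared through SliceEqual, since the SSE2 memequal above may read up to 15 bytes
// past the end of either buffer. Illustrative only:
//
//   buffer.reserve(payload_size + SLICE_MEMEQUAL_OVERFLOW_PADDING);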
class SliceEqual {
public:
bool operator()(const Slice& x, const Slice& y) const { return memequal(x.data, x.size, y.data, y.size); }
};
class SliceNormalEqual {
public:
bool operator()(const Slice& x, const Slice& y) const {
return (x.size == y.size) && (memcmp(x.data, y.data, x.size) == 0);
}
};
template <class T>
class StdHash {
public:
std::size_t operator()(T value) const { return phmap_mix<sizeof(size_t)>()(std::hash<T>()(value)); }
};
template <class T, PhmapSeed seed>
class StdHashWithSeed {
public:
std::size_t operator()(T value) const { return phmap_mix_with_seed<sizeof(size_t), seed>()(std::hash<T>()(value)); }
};
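// Usage sketch (illustrative only): these functors are drop-in Hash/KeyEqual parameters for
// standard-style hash maps, and the two seeds give two independent hash families when two
// different hash functions over the same keys are needed.
//
//   std::unordered_map<Slice, int, SliceHashWithSeed<PhmapSeed1>, SliceEqual> m1;
//   std::unordered_map<Slice, int, SliceHashWithSeed<PhmapSeed2>, SliceEqual> m2;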
} // namespace starrocks::vectorized


@ -0,0 +1,322 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#include "column/column_helper.h"
#include <runtime/types.h>
#include "column/array_column.h"
#include "common/config.h"
#include "gutil/casts.h"
#include "runtime/types.h"
#include "simd/simd.h"
namespace starrocks::vectorized {
NullColumnPtr ColumnHelper::one_size_not_null_column = NullColumn::create(1, 0);
NullColumnPtr ColumnHelper::one_size_null_column = NullColumn::create(1, 1);
NullColumnPtr ColumnHelper::s_all_not_null_column = nullptr;
void ColumnHelper::init_static_variable() {
ColumnHelper::s_all_not_null_column = NullColumn::create(config::vector_chunk_size, 0);
}
Column::Filter& ColumnHelper::merge_nullable_filter(Column* column) {
if (column->is_nullable()) {
auto* nullable_column = down_cast<NullableColumn*>(column);
auto nulls = nullable_column->null_column_data().data();
auto& sel_vec = (down_cast<UInt8Column*>(nullable_column->mutable_data_column()))->get_data();
// NOTE(zc): Must use uint8_t* to enable auto-vectorization.
auto selected = sel_vec.data();
size_t num_rows = sel_vec.size();
// we treat null(1) as false(0)
for (size_t i = 0; i < num_rows; ++i) {
selected[i] &= !nulls[i];
}
return sel_vec;
} else {
return (down_cast<UInt8Column*>(column))->get_data();
}
}
void ColumnHelper::merge_two_filters(const ColumnPtr column, Column::Filter* __restrict filter, bool* all_zero) {
if (column->is_nullable()) {
auto* nullable_column = as_raw_column<NullableColumn>(column);
// NOTE(zc): Must use uint8_t* to enable auto-vectorization.
auto nulls = nullable_column->null_column_data().data();
auto datas = (down_cast<UInt8Column*>(nullable_column->mutable_data_column()))->get_data().data();
auto num_rows = nullable_column->size();
// we treat null(1) as false(0)
for (size_t j = 0; j < num_rows; ++j) {
(*filter)[j] &= (!nulls[j]) & datas[j];
}
} else {
size_t num_rows = column->size();
auto datas = as_raw_column<UInt8Column>(column)->get_data().data();
for (size_t j = 0; j < num_rows; ++j) {
(*filter)[j] &= datas[j];
}
}
if (all_zero != nullptr) {
// filter has just been updated, so cache locality is good here.
// Note that we don't need to count zeros, only check whether any non-zero value exists.
// filter values are 0/1, so memchr can be used here.
*all_zero = (memchr(filter->data(), 0x1, filter->size()) == nullptr);
}
}
void ColumnHelper::merge_filters(const Columns& columns, Column::Filter* __restrict filter) {
DCHECK_GT(columns.size(), 0);
// All filters must be the same length, there is no const filter
for (int i = 0; i < columns.size(); i++) {
bool all_zero = false;
merge_two_filters(columns[i], filter, &all_zero);
if (all_zero) {
break;
}
}
}
void ColumnHelper::merge_two_filters(Column::Filter* __restrict filter, const uint8_t* __restrict selected,
bool* all_zero) {
uint8_t* data = filter->data();
size_t num_rows = filter->size();
for (size_t i = 0; i < num_rows; i++) {
data[i] = data[i] & selected[i];
}
if (all_zero != nullptr) {
*all_zero = (memchr(filter->data(), 0x1, num_rows) == nullptr);
}
}
size_t ColumnHelper::count_nulls(const starrocks::vectorized::ColumnPtr& col) {
if (!col->is_nullable()) {
return 0;
}
if (col->only_null()) {
return col->size();
}
const Buffer<uint8_t>& null_data = as_raw_column<NullableColumn>(col)->null_column_data();
// @Warn: be careful, this code must be rewritten if the NullColumn type changes!
return SIMD::count_nonzero(null_data);
}
size_t ColumnHelper::count_true_with_notnull(const starrocks::vectorized::ColumnPtr& col) {
if (col->only_null()) {
return 0;
}
if (col->is_constant()) {
bool is_true = ColumnHelper::get_const_value<TYPE_BOOLEAN>(col);
return is_true ? col->size() : 0;
}
if (col->is_nullable()) {
auto tmp = ColumnHelper::as_raw_column<NullableColumn>(col);
const Buffer<uint8_t>& null_data = tmp->null_column_data();
const Buffer<uint8_t>& bool_data = ColumnHelper::cast_to_raw<TYPE_BOOLEAN>(tmp->data_column())->get_data();
int null_count = SIMD::count_nonzero(null_data);
int true_count = SIMD::count_nonzero(bool_data);
if (null_count == col->size()) {
return 0;
} else if (null_count == 0) {
return true_count;
} else {
// In fact, null_count may differ from the real true_count, but that has no impact on callers
return null_count;
}
} else {
const Buffer<uint8_t>& bool_data = ColumnHelper::cast_to_raw<TYPE_BOOLEAN>(col)->get_data();
return SIMD::count_nonzero(bool_data);
}
}
size_t ColumnHelper::count_false_with_notnull(const starrocks::vectorized::ColumnPtr& col) {
if (col->only_null()) {
return 0;
}
if (col->is_constant()) {
bool is_true = ColumnHelper::get_const_value<TYPE_BOOLEAN>(col);
return is_true ? 0 : col->size();
}
if (col->is_nullable()) {
auto tmp = ColumnHelper::as_raw_column<NullableColumn>(col);
const Buffer<uint8_t>& null_data = tmp->null_column_data();
const Buffer<uint8_t>& bool_data = ColumnHelper::cast_to_raw<TYPE_BOOLEAN>(tmp->data_column())->get_data();
int null_count = SIMD::count_nonzero(null_data);
int false_count = SIMD::count_zero(bool_data);
if (null_count == col->size()) {
return 0;
} else if (null_count == 0) {
return false_count;
} else {
// In fact, null_count may differ from the real false_count, but that has no impact on callers
return null_count;
}
} else {
const Buffer<uint8_t>& bool_data = ColumnHelper::cast_to_raw<TYPE_BOOLEAN>(col)->get_data();
return SIMD::count_zero(bool_data);
}
}
ColumnPtr ColumnHelper::create_const_null_column(size_t chunk_size) {
auto nullable_column = NullableColumn::create(Int8Column::create(), NullColumn::create());
nullable_column->append_nulls(1);
return ConstColumn::create(nullable_column, chunk_size);
}
ColumnPtr ColumnHelper::create_column(const TypeDescriptor& type_desc, bool nullable) {
return create_column(type_desc, nullable, false, 0);
}
ColumnPtr ColumnHelper::create_column(const TypeDescriptor& type_desc, bool nullable, bool is_const, size_t size) {
auto type = type_desc.type;
if (VLOG_ROW_IS_ON) {
VLOG_ROW << "PrimitiveType " << type << " nullable " << nullable << " is_const " << is_const;
}
if (nullable && is_const) {
return ColumnHelper::create_const_null_column(size);
}
if (type == TYPE_NULL) {
if (is_const) {
return ColumnHelper::create_const_null_column(size);
} else if (nullable) {
return NullableColumn::create(BooleanColumn::create(), NullColumn::create());
}
}
ColumnPtr p;
switch (type_desc.type) {
case TYPE_BOOLEAN:
p = BooleanColumn::create();
break;
case TYPE_TINYINT:
p = Int8Column::create();
break;
case TYPE_SMALLINT:
p = Int16Column::create();
break;
case TYPE_INT:
p = Int32Column::create();
break;
case TYPE_BIGINT:
p = Int64Column::create();
break;
case TYPE_LARGEINT:
p = Int128Column::create();
break;
case TYPE_FLOAT:
p = FloatColumn::create();
break;
case TYPE_DOUBLE:
p = DoubleColumn::create();
break;
case TYPE_DECIMALV2:
p = DecimalColumn::create();
break;
case TYPE_DATE:
p = DateColumn::create();
break;
case TYPE_DATETIME:
p = TimestampColumn::create();
break;
case TYPE_TIME:
p = DoubleColumn::create();
break;
case TYPE_VARCHAR:
case TYPE_CHAR:
p = BinaryColumn::create();
break;
case TYPE_HLL:
p = HyperLogLogColumn::create();
break;
case TYPE_OBJECT:
p = BitmapColumn::create();
break;
case TYPE_DECIMAL32: {
p = Decimal32Column::create(type_desc.precision, type_desc.scale);
break;
}
case TYPE_DECIMAL64: {
p = Decimal64Column::create(type_desc.precision, type_desc.scale);
break;
}
case TYPE_DECIMAL128: {
p = Decimal128Column::create(type_desc.precision, type_desc.scale);
break;
}
case TYPE_PERCENTILE:
p = PercentileColumn ::create();
break;
case TYPE_ARRAY: {
auto offsets = UInt32Column::create();
auto data = create_column(type_desc.children[0], true);
p = ArrayColumn::create(std::move(data), std::move(offsets));
break;
}
case INVALID_TYPE:
case TYPE_NULL:
case TYPE_BINARY:
case TYPE_DECIMAL:
case TYPE_STRUCT:
case TYPE_MAP:
CHECK(false) << "unreachable path: " << type_desc.type;
return nullptr;
}
if (is_const) {
return ConstColumn::create(p);
}
if (nullable) {
return NullableColumn::create(p, NullColumn::create());
}
return p;
}
bool ColumnHelper::is_all_const(const Columns& columns) {
return std::all_of(std::begin(columns), std::end(columns), [](const ColumnPtr& col) { return col->is_constant(); });
}
using ColumnsConstIterator = Columns::const_iterator;
bool ColumnHelper::is_all_const(ColumnsConstIterator const& begin, ColumnsConstIterator const& end) {
for (auto it = begin; it < end; ++it) {
if (!(*it)->is_constant()) {
return false;
}
}
return true;
}
size_t ColumnHelper::compute_bytes_size(ColumnsConstIterator const& begin, ColumnsConstIterator const& end) {
size_t n = 0;
size_t row_num = (*begin)->size();
for (auto it = begin; it < end; ++it) {
ColumnPtr const& col = *it;
// const null column is neglected
if (col->only_null()) {
continue;
}
auto binary = ColumnHelper::get_binary_column(col.get());
if (col->is_constant()) {
n += binary->get_bytes().size() * row_num;
} else {
n += binary->get_bytes().size();
}
}
return n;
}
} // namespace starrocks::vectorized


@ -0,0 +1,236 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include <immintrin.h>
#include <runtime/types.h>
#include "column/const_column.h"
#include "column/type_traits.h"
#include "common/config.h"
#include "gutil/bits.h"
#include "gutil/casts.h"
#include "runtime/primitive_type.h"
namespace starrocks {
struct TypeDescriptor;
}
namespace starrocks::vectorized {
class ColumnHelper {
public:
static void init_static_variable();
// The input column is a nullable or non-nullable uint8 column.
// The result is a non-nullable uint8 column.
// For a nullable uint8 column, its null column and data column are merged.
// Used in ExecNode::eval_conjuncts
static Column::Filter& merge_nullable_filter(Column* column);
// Merge the column with the filter, and save the result back into the filter.
// `all_zero` indicates whether, after merging, the filter contains only zero values.
static void merge_two_filters(const ColumnPtr column, Column::Filter* __restrict filter, bool* all_zero = nullptr);
static void merge_filters(const Columns& columns, Column::Filter* __restrict filter);
static void merge_two_filters(Column::Filter* __restrict filter, const uint8_t* __restrict selected,
bool* all_zero = nullptr);
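// Worked example of the merge semantics (values are illustrative): given
//   filter          = {1, 1, 1, 1}
//   conjunct nulls  = {0, 1, 0, 0}
//   conjunct values = {1, 1, 0, 1}
// merge_two_filters(conjunct, &filter) leaves filter = {1, 0, 0, 1}:
// null rows are treated as false, and rows already filtered out stay filtered out.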
static size_t count_nulls(const ColumnPtr& col);
/**
* Count the rows that are not null and true;
* rows that are null or false are not counted.
* @param col must be a boolean column
*/
static size_t count_true_with_notnull(const ColumnPtr& col);
/**
* Count the rows that are not null and false;
* rows that are null or true are not counted.
* @param col must be a boolean column
*/
static size_t count_false_with_notnull(const ColumnPtr& col);
template <PrimitiveType Type>
static inline ColumnPtr create_const_column(const RunTimeCppType<Type>& value, size_t chunk_size) {
static_assert(!pt_is_decimal<Type>,
"Decimal column can not created by this function because of missing "
"precision and scale param");
auto ptr = RunTimeColumnType<Type>::create();
ptr->append_datum(Datum(value));
// @FIXME: BinaryColumn::get_data() calls build_slice(), which modifies the column's memory data,
// but that operation is not thread-safe and will crash under multi-threading (OLAP_SCANNER) when
// OLAP_SCANNER evaluates expressions.
// Calling get_data() when creating the ConstColumn is a short-term workaround.
ptr->get_data();
return ConstColumn::create(ptr, chunk_size);
}
// If the column is a const column, duplicate its data column to chunk_size rows
static ColumnPtr unpack_and_duplicate_const_column(size_t chunk_size, const ColumnPtr& column) {
if (column->is_constant()) {
ConstColumn* const_column = down_cast<ConstColumn*>(column.get());
const_column->data_column()->assign(chunk_size, 0);
return const_column->data_column();
}
return column;
}
static ColumnPtr unfold_const_column(const TypeDescriptor& type_desc, size_t size, const ColumnPtr& column) {
if (column->only_null()) {
auto col = ColumnHelper::create_column(type_desc, true);
[[maybe_unused]] bool ok = col->append_nulls(size);
DCHECK(ok);
return col;
} else if (column->is_constant()) {
ConstColumn* const_column = down_cast<ConstColumn*>(column.get());
const_column->data_column()->assign(size, 0);
return const_column->data_column();
}
return column;
}
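// Usage sketch (illustrative only): expanding a const input before row-wise processing.
//
//   ColumnPtr c = ColumnHelper::create_const_column<TYPE_INT>(7, /*chunk_size=*/1024);
//   // c->size() == 1024, but it physically stores a single value.
//   ColumnPtr flat = ColumnHelper::unpack_and_duplicate_const_column(1024, c);
//   // flat is the underlying INT column, now holding 1024 copies of 7.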
static ColumnPtr create_column(const TypeDescriptor& type_desc, bool nullable);
// If is_const is true, you must pass the size arg
static ColumnPtr create_column(const TypeDescriptor& type_desc, bool nullable, bool is_const, size_t size);
/**
* Cast a ColumnPtr to a specific typed ColumnPtr.
* Callers must make sure of the actual column type themselves.
*/
template <PrimitiveType Type>
static inline typename RunTimeColumnType<Type>::Ptr cast_to(const ColumnPtr& value) {
return std::static_pointer_cast<RunTimeColumnType<Type>>(value);
}
/**
* Cast a ColumnPtr to a specific typed raw Column*.
* Callers must make sure of the actual column type themselves.
*/
template <PrimitiveType Type>
static inline RunTimeColumnType<Type>* cast_to_raw(const ColumnPtr& value) {
return down_cast<RunTimeColumnType<Type>*>(value.get());
}
/**
* Cast a ColumnPtr to a specific column class Ptr.
* Callers must make sure of the actual column type themselves.
*/
template <typename Type>
static inline typename Type::Ptr as_column(ColumnPtr value) {
return std::static_pointer_cast<Type>(value);
}
/**
* Cast a ColumnPtr to a specific column class raw pointer.
* Callers must make sure of the actual column type themselves.
*/
template <typename Type>
static inline Type* as_raw_column(const ColumnPtr& value) {
return down_cast<Type*>(value.get());
}
template <PrimitiveType Type>
static inline RunTimeCppType<Type> get_const_value(const ColumnPtr& col) {
const ColumnPtr& c = as_raw_column<ConstColumn>(col)->data_column();
return cast_to_raw<Type>(c)->get_data()[0];
}
static Column* get_data_column(Column* column) {
if (column->is_nullable()) {
auto* nullable_column = down_cast<NullableColumn*>(column);
return nullable_column->mutable_data_column();
} else if (column->is_constant()) {
auto* const_column = down_cast<ConstColumn*>(column);
return const_column->mutable_data_column()->get();
} else {
return column;
}
}
static const Column* get_data_column(const Column* column) {
if (column->is_nullable()) {
auto* nullable_column = down_cast<const NullableColumn*>(column);
return nullable_column->data_column().get();
} else if (column->is_constant()) {
auto* const_column = down_cast<const ConstColumn*>(column);
return const_column->data_column().get();
} else {
return column;
}
}
static BinaryColumn* get_binary_column(Column* column) { return down_cast<BinaryColumn*>(get_data_column(column)); }
static bool is_all_const(const Columns& columns);
using ColumnsConstIterator = Columns::const_iterator;
static bool is_all_const(ColumnsConstIterator const& begin, ColumnsConstIterator const& end);
static size_t compute_bytes_size(ColumnsConstIterator const& begin, ColumnsConstIterator const& end);
template <typename T>
static size_t filter_range(const Column::Filter& filter, T* data, size_t from, size_t to) {
auto start_offset = from;
auto result_offset = from;
#ifdef __AVX2__
const uint8_t* f_data = filter.data();
size_t data_type_size = sizeof(T);
constexpr size_t kBatchNums = 256 / (8 * sizeof(uint8_t));
const __m256i all0 = _mm256_setzero_si256();
while (start_offset + kBatchNums < to) {
__m256i f = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(f_data + start_offset));
uint32_t mask = _mm256_movemask_epi8(_mm256_cmpgt_epi8(f, all0));
if (mask == 0) {
// all no hit, pass
} else if (mask == 0xffffffff) {
// all hit, copy all
memmove(data + result_offset, data + start_offset, kBatchNums * data_type_size);
result_offset += kBatchNums;
} else {
// Skip rows that are not hit; this reduces comparisons when the filter layout is sparse,
// like "00010001...", but is ineffective when the filter layout is dense.
int zero_count = Bits::CountTrailingZerosNonZero32(mask);
int i = zero_count;
while (i < kBatchNums) {
mask = zero_count < 31 ? mask >> (zero_count + 1u) : 0;
*(data + result_offset) = *(data + start_offset + i);
zero_count = Bits::CountTrailingZeros32(mask);
result_offset += 1;
i += (zero_count + 1);
}
}
start_offset += kBatchNums;
}
#endif
for (auto i = start_offset; i < to; ++i) {
if (filter[i]) {
*(data + result_offset) = *(data + i);
result_offset++;
}
}
return result_offset;
}
template <typename T>
static size_t filter(const Column::Filter& filter, T* data) {
return filter_range(filter, data, 0, filter.size());
}
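// Worked example (illustrative only; the filter is shown with brace-init for brevity):
// filter_range/filter compact the data in place and return the new logical size.
//
//   int32_t data[5] = {10, 20, 30, 40, 50};
//   Column::Filter keep = {1, 0, 1, 1, 0};
//   size_t n = ColumnHelper::filter(keep, data);   // data starts with {10, 30, 40}, n == 3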
static ColumnPtr create_const_null_column(size_t chunk_size);
static NullColumnPtr one_size_not_null_column;
static NullColumnPtr one_size_null_column;
static NullColumnPtr s_all_not_null_column;
};
} // namespace starrocks::vectorized

448
be/src/column/column_pool.h Normal file

@ -0,0 +1,448 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include <butil/thread_local.h>
#include <butil/time.h> // NOLINT
#include <atomic>
#include "common/compiler_util.h"
DIAGNOSTIC_PUSH
DIAGNOSTIC_IGNORE("-Wclass-memaccess")
#include <bvar/bvar.h>
DIAGNOSTIC_POP
#include "column/binary_column.h"
#include "column/const_column.h"
#include "column/decimalv3_column.h"
#include "column/fixed_length_column.h"
#include "column/object_column.h"
#include "common/config.h"
#include "common/type_list.h"
#include "gutil/dynamic_annotations.h"
namespace starrocks::vectorized {
// Before a thread exits, the corresponding thread-local column pool will be destroyed and the following
// bvars will be updated too. This is ok in the production environment, because no column pool exists
// in the main thread; in other words, accessing a bvar is safe when destroying a thread-local column pool.
// But this is NOT true when running unit tests, because unit tests usually run in the main thread and a local
// column pool will be created in the main thread. When destroying the column pool in the main thread, the
// bvars may already have been destroyed, so it's unsafe to update them when destroying a column pool.
// To work around this, we simply do not update bvars in unit tests.
inline bvar::Adder<uint64_t> g_column_pool_oversized_columns("column_pool", "oversized_columns");
inline bvar::Adder<int64_t> g_column_pool_total_local_bytes("column_pool", "total_local_bytes");
inline bvar::Adder<int64_t> g_column_pool_total_central_bytes("column_pool", "total_central_bytes");
#ifndef BE_TEST
#define UPDATE_BVAR(bvar_name, value) (bvar_name) << (value)
#else
#define UPDATE_BVAR(bvar_name, value) (void)(value) /* avoid compiler warning: unused variable */
#endif
template <typename T>
struct ColumnPoolBlockSize {
static const size_t value = 256;
};
template <>
struct ColumnPoolBlockSize<BinaryColumn> {
static const size_t value = 128;
};
template <typename T>
int64_t column_bytes(const T* col) {
static_assert(std::is_base_of_v<Column, T>, "must_derived_of_Column");
return col->memory_usage();
}
template <typename T>
void reset_column(T* col) {
static_assert(std::is_base_of_v<Column, T>, "must_derived_of_Column");
col->reset_column();
DCHECK(col->size() == 0);
DCHECK(col->delete_state() == DEL_NOT_SATISFIED);
}
// Returns the number of bytes freed to tcmalloc.
template <typename T>
size_t release_column_if_large(T* col, size_t limit) {
auto old_usage = column_bytes(col);
if (old_usage < limit) {
return 0;
}
if constexpr (std::is_same_v<BinaryColumn, T>) {
auto& bytes = col->get_bytes();
BinaryColumn::Bytes tmp;
tmp.swap(bytes);
} else {
typename T::Container tmp;
tmp.swap(col->get_data());
}
auto new_usage = column_bytes(col);
DCHECK_LT(new_usage, old_usage);
return old_usage - new_usage;
}
template <typename T>
size_t column_reserved_size(const T* col) {
if constexpr (std::is_same_v<BinaryColumn, T>) {
const auto& offsets = col->get_offset();
return offsets.capacity() > 0 ? offsets.capacity() - 1 : 0;
} else {
return col->get_data().capacity();
}
}
template <typename T, size_t NITEM>
struct ColumnPoolFreeBlock {
int64_t nfree;
int64_t bytes;
T* ptrs[NITEM];
};
struct ColumnPoolInfo {
size_t local_cnt = 0;
size_t central_free_items = 0;
size_t central_free_bytes = 0;
};
template <typename T>
class CACHELINE_ALIGNED ColumnPool {
static constexpr size_t kBlockSize = ColumnPoolBlockSize<T>::value;
using FreeBlock = ColumnPoolFreeBlock<T, kBlockSize>;
using DynamicFreeBlock = ColumnPoolFreeBlock<T, 0>;
class CACHELINE_ALIGNED LocalPool {
static_assert(std::is_base_of<Column, T>::value, "Must_be_derived_of_Column");
public:
explicit LocalPool(ColumnPool* pool) : _pool(pool) {
_curr_free.nfree = 0;
_curr_free.bytes = 0;
}
~LocalPool() {
auto freed_bytes = _curr_free.bytes;
if (_curr_free.nfree > 0 && !_pool->_push_free_block(_curr_free)) {
for (size_t i = 0; i < _curr_free.nfree; i++) {
ASAN_UNPOISON_MEMORY_REGION(_curr_free.ptrs[i], sizeof(T));
delete _curr_free.ptrs[i];
}
}
UPDATE_BVAR(g_column_pool_total_local_bytes, -freed_bytes);
_pool->_clear_from_destructor_of_local_pool();
}
inline T* get_object() {
if (_curr_free.nfree == 0) {
if (_pool->_pop_free_block(&_curr_free)) {
UPDATE_BVAR(g_column_pool_total_local_bytes, _curr_free.bytes);
} else {
return nullptr;
}
}
T* obj = _curr_free.ptrs[--_curr_free.nfree];
ASAN_UNPOISON_MEMORY_REGION(obj, sizeof(T));
auto bytes = column_bytes(obj);
_curr_free.bytes -= bytes;
UPDATE_BVAR(g_column_pool_total_local_bytes, -bytes);
return obj;
}
inline void return_object(T* ptr) {
if (UNLIKELY(column_reserved_size(ptr) > config::vector_chunk_size)) {
UPDATE_BVAR(g_column_pool_oversized_columns, 1);
delete ptr;
return;
}
auto bytes = column_bytes(ptr);
if (_curr_free.nfree < kBlockSize) {
ASAN_POISON_MEMORY_REGION(ptr, sizeof(T));
_curr_free.ptrs[_curr_free.nfree++] = ptr;
_curr_free.bytes += bytes;
UPDATE_BVAR(g_column_pool_total_local_bytes, bytes);
return;
}
if (_pool->_push_free_block(_curr_free)) {
ASAN_POISON_MEMORY_REGION(ptr, sizeof(T));
UPDATE_BVAR(g_column_pool_total_local_bytes, -_curr_free.bytes);
_curr_free.nfree = 1;
_curr_free.ptrs[0] = ptr;
_curr_free.bytes = bytes;
UPDATE_BVAR(g_column_pool_total_local_bytes, bytes);
return;
}
delete ptr;
}
inline void release_large_columns(size_t limit) {
size_t freed_bytes = 0;
for (size_t i = 0; i < _curr_free.nfree; i++) {
ASAN_UNPOISON_MEMORY_REGION(_curr_free.ptrs[i], sizeof(T));
freed_bytes += release_column_if_large(_curr_free.ptrs[i], limit);
ASAN_POISON_MEMORY_REGION(_curr_free.ptrs[i], sizeof(T));
}
_curr_free.bytes -= freed_bytes;
UPDATE_BVAR(g_column_pool_total_local_bytes, -freed_bytes);
}
static inline void delete_local_pool(void* arg) { delete (LocalPool*)arg; }
private:
ColumnPool* _pool;
FreeBlock _curr_free;
};
public:
static std::enable_if_t<std::is_default_constructible_v<T>, ColumnPool*> singleton() {
static ColumnPool p;
return &p;
}
template <bool AllocOnEmpty = true>
inline T* get_column() {
LocalPool* lp = _get_or_new_local_pool();
if (UNLIKELY(lp == nullptr)) {
return nullptr;
}
T* ptr = lp->get_object();
if (ptr == nullptr && AllocOnEmpty) {
ptr = new (std::nothrow) T();
}
return ptr;
}
inline void return_column(T* ptr) {
LocalPool* lp = _get_or_new_local_pool();
if (LIKELY(lp != nullptr)) {
reset_column(ptr);
lp->return_object(ptr);
} else {
delete ptr;
}
}
// Destroy some objects in the *central* free list.
// Returns the number of bytes freed to tcmalloc.
inline size_t release_free_columns(float free_ratio) {
free_ratio = std::min<float>(free_ratio, 1.0);
int64_t now = butil::gettimeofday_s();
std::vector<DynamicFreeBlock*> tmp;
if (now - _first_push_time > 3) {
// ^^^^^^^^^^^^^^^^ read without the lock, intentionally.
std::lock_guard<std::mutex> l(_free_blocks_lock);
int n = implicit_cast<int>(_free_blocks.size() * (1 - free_ratio));
tmp.insert(tmp.end(), _free_blocks.begin() + n, _free_blocks.end());
_free_blocks.resize(n);
}
size_t freed_bytes = 0;
for (DynamicFreeBlock* blk : tmp) {
freed_bytes += blk->bytes;
_release_free_block(blk);
}
UPDATE_BVAR(g_column_pool_total_central_bytes, -freed_bytes);
return freed_bytes;
}
// Reduce the memory usage of pooled columns whose memory usage is greater
// than or equal to |limit|.
inline void release_large_columns(size_t limit) {
LocalPool* lp = _local_pool;
if (lp) {
lp->release_large_columns(limit);
}
}
inline void clear_columns() {
LocalPool* lp = _local_pool;
if (lp) {
_local_pool = nullptr;
butil::thread_atexit_cancel(LocalPool::delete_local_pool, lp);
delete lp;
}
}
inline ColumnPoolInfo describe_column_pool() {
ColumnPoolInfo info;
info.local_cnt = _nlocal.load(std::memory_order_relaxed);
if (_free_blocks.empty()) {
return info;
}
std::lock_guard<std::mutex> l(_free_blocks_lock);
for (DynamicFreeBlock* blk : _free_blocks) {
info.central_free_items += blk->nfree;
info.central_free_bytes += blk->bytes;
}
return info;
}
private:
static inline void _release_free_block(DynamicFreeBlock* blk) {
for (size_t i = 0; i < blk->nfree; i++) {
T* p = blk->ptrs[i];
ASAN_UNPOISON_MEMORY_REGION(p, sizeof(T));
delete p;
}
free(blk);
}
inline LocalPool* _get_or_new_local_pool() {
LocalPool* lp = _local_pool;
if (LIKELY(lp != nullptr)) {
return lp;
}
lp = new (std::nothrow) LocalPool(this);
if (nullptr == lp) {
return nullptr;
}
std::lock_guard<std::mutex> l(_change_thread_mutex); //avoid race with clear_columns()
_local_pool = lp;
(void)butil::thread_atexit(LocalPool::delete_local_pool, lp);
_nlocal.fetch_add(1, std::memory_order_relaxed);
return lp;
}
inline void _clear_from_destructor_of_local_pool() {
_local_pool = nullptr;
// Do nothing if there are active threads.
if (_nlocal.fetch_sub(1, std::memory_order_relaxed) != 1) {
return;
}
std::lock_guard<std::mutex> l(_change_thread_mutex); // including acquire fence.
// Do nothing if there are active threads.
if (_nlocal.load(std::memory_order_relaxed) != 0) {
return;
}
// All threads exited and we're holding _change_thread_mutex to avoid
// racing with new threads calling get_column().
// Clear global free list.
_first_push_time = 0;
release_free_columns(1.0);
}
inline bool _push_free_block(const FreeBlock& blk) {
DynamicFreeBlock* p =
(DynamicFreeBlock*)malloc(offsetof(DynamicFreeBlock, ptrs) + sizeof(*blk.ptrs) * blk.nfree);
if (UNLIKELY(p == nullptr)) {
return false;
}
UPDATE_BVAR(g_column_pool_total_central_bytes, blk.bytes);
p->nfree = blk.nfree;
p->bytes = blk.bytes;
memcpy(p->ptrs, blk.ptrs, sizeof(*blk.ptrs) * blk.nfree);
std::lock_guard<std::mutex> l(_free_blocks_lock);
_first_push_time = _free_blocks.empty() ? butil::gettimeofday_s() : _first_push_time;
_free_blocks.push_back(p);
return true;
}
inline bool _pop_free_block(FreeBlock* blk) {
if (_free_blocks.empty()) {
return false;
}
_free_blocks_lock.lock();
if (_free_blocks.empty()) {
_free_blocks_lock.unlock();
return false;
}
DynamicFreeBlock* p = _free_blocks.back();
_free_blocks.pop_back();
_free_blocks_lock.unlock();
memcpy(blk->ptrs, p->ptrs, sizeof(*p->ptrs) * p->nfree);
blk->nfree = p->nfree;
blk->bytes = p->bytes;
free(p);
UPDATE_BVAR(g_column_pool_total_central_bytes, -blk->bytes);
return true;
}
private:
ColumnPool() { _free_blocks.reserve(32); }
~ColumnPool() = default;
static __thread LocalPool* _local_pool; // NOLINT
static std::atomic<long> _nlocal; // NOLINT
static std::mutex _change_thread_mutex; // NOLINT
mutable std::mutex _free_blocks_lock;
std::vector<DynamicFreeBlock*> _free_blocks;
int64_t _first_push_time = 0;
};
using ColumnPoolList =
TypeList<ColumnPool<Int8Column>, ColumnPool<UInt8Column>, ColumnPool<Int16Column>, ColumnPool<Int32Column>,
ColumnPool<UInt32Column>, ColumnPool<Int64Column>, ColumnPool<Int128Column>, ColumnPool<FloatColumn>,
ColumnPool<DoubleColumn>, ColumnPool<BinaryColumn>, ColumnPool<DateColumn>,
ColumnPool<TimestampColumn>, ColumnPool<DecimalColumn>, ColumnPool<Decimal32Column>,
ColumnPool<Decimal64Column>, ColumnPool<Decimal128Column>>;
template <typename T>
__thread typename ColumnPool<T>::LocalPool* ColumnPool<T>::_local_pool = nullptr; // NOLINT
template <typename T>
std::atomic<long> ColumnPool<T>::_nlocal = 0; // NOLINT
template <typename T>
std::mutex ColumnPool<T>::_change_thread_mutex{}; // NOLINT
template <typename T, bool AllocateOnEmpty = true>
inline T* get_column() {
static_assert(InList<ColumnPool<T>, ColumnPoolList>::value, "Cannot use column pool");
return ColumnPool<T>::singleton()->template get_column<AllocateOnEmpty>();
}
template <typename T>
inline void return_column(T* ptr) {
static_assert(InList<ColumnPool<T>, ColumnPoolList>::value, "Cannot use column pool");
ColumnPool<T>::singleton()->return_column(ptr);
}
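// Usage sketch (illustrative only): borrowing a pooled column and giving it back.
//
//   auto* col = get_column<Int32Column>();   // reuses a freed column when one is available
//   col->append(42);
//   // ... use col ...
//   return_column(col);                      // resets the column and returns it to the pool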
template <typename T>
inline void release_large_columns(size_t limit) {
static_assert(InList<ColumnPool<T>, ColumnPoolList>::value, "Cannot use column pool");
ColumnPool<T>::singleton()->release_large_columns(limit);
}
template <typename T>
inline size_t release_free_columns(float ratio) {
static_assert(InList<ColumnPool<T>, ColumnPoolList>::value, "Cannot use column pool");
return ColumnPool<T>::singleton()->release_free_columns(ratio);
}
// NOTE: this is an expensive routine, so it should not be called too often.
template <typename T>
inline ColumnPoolInfo describe_column_pool() {
static_assert(InList<ColumnPool<T>, ColumnPoolList>::value, "Cannot use column pool");
return ColumnPool<T>::singleton()->describe_column_pool();
}
// Used in tests.
template <typename T>
inline void clear_columns() {
ColumnPool<T>::singleton()->clear_columns();
}
template <typename T>
struct HasColumnPool : public std::bool_constant<InList<ColumnPool<T>, ColumnPoolList>::value> {};
namespace detail {
struct ClearColumnPool {
template <typename Pool>
void operator()() {
Pool::singleton()->clear_columns();
}
};
} // namespace detail
inline void TEST_clear_all_columns_this_thread() {
ForEach<ColumnPoolList>(detail::ClearColumnPool());
}
} // namespace starrocks::vectorized


@ -0,0 +1,87 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include "column/column_helper.h"
namespace starrocks {
namespace vectorized {
/**
* Wraps a column and supports:
* 1. automatic column type conversion.
* 2. unpacking complex columns.
* 3. expanding const column values.
*
* Examples:
* ColumnViewer<TYPE_INT> c = ColumnViewer(IntColumn);
* ColumnViewer<TYPE_INT> c = ColumnViewer(ConstColumn<IntColumn>);
* ColumnViewer<TYPE_INT> c = ColumnViewer(NullableColumn<IntColumn>);
* each of which provides the functions:
* int value(size_t row_idx);
* bool is_null(size_t row_idx);
*
* @tparam Type
*/
static inline size_t not_const_mask(const ColumnPtr& column) {
return !column->only_null() && !column->is_constant() ? -1 : 0;
}
static inline size_t null_mask(const ColumnPtr& column) {
return !column->only_null() && !column->is_constant() && column->is_nullable() ? -1 : 0;
}
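// Note on the masks above: they make value()/is_null() branch-free. For a const or only-null
// column the mask is 0, so `idx & mask` always reads element 0 (the single stored value or the
// one-size null column); otherwise the mask is all ones and `idx & mask` is simply idx.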
template <PrimitiveType Type>
class ColumnViewer {
public:
explicit ColumnViewer(const ColumnPtr& column)
: _not_const_mask(not_const_mask(column)), _null_mask(null_mask(column)) {
if (column->only_null()) {
_null_column = ColumnHelper::one_size_null_column;
_column = RunTimeColumnType<Type>::create();
_column->append_default();
} else if (column->is_constant()) {
auto v = ColumnHelper::as_raw_column<ConstColumn>(column);
_column = ColumnHelper::cast_to<Type>(v->data_column());
_null_column = ColumnHelper::one_size_not_null_column;
} else if (column->is_nullable()) {
auto v = ColumnHelper::as_raw_column<NullableColumn>(column);
_column = ColumnHelper::cast_to<Type>(v->data_column());
_null_column = ColumnHelper::as_column<NullColumn>(v->null_column());
} else {
_column = ColumnHelper::cast_to<Type>(column);
_null_column = ColumnHelper::one_size_not_null_column;
}
_data = _column->get_data().data();
_null_data = _null_column->get_data().data();
}
inline const RunTimeCppType<Type> value(const size_t idx) const { return _data[idx & _not_const_mask]; }
inline const bool is_null(const size_t idx) const { return _null_data[idx & _null_mask]; }
inline size_t size() const { return _column->size(); }
inline const NullColumnPtr& null_column() const { return _null_column; };
inline typename RunTimeColumnType<Type>::Ptr column() const { return _column; };
private:
// column ptr
typename RunTimeColumnType<Type>::Ptr _column;
NullColumnPtr _null_column;
// raw pointer
RunTimeCppType<Type>* _data;
NullColumn::ValueType* _null_data;
const size_t _not_const_mask;
const size_t _null_mask;
};
} // namespace vectorized
} // namespace starrocks


@ -0,0 +1,74 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#include "column/const_column.h"
#include <algorithm>
#include "column/column_helper.h"
#include "simd/simd.h"
#include "util/coding.h"
namespace starrocks::vectorized {
ConstColumn::ConstColumn(ColumnPtr data) : _data(std::move(data)), _size(0) {
DCHECK(!_data->is_constant());
}
ConstColumn::ConstColumn(ColumnPtr data, size_t size) : _data(std::move(data)), _size(size) {
DCHECK(!_data->is_constant());
}
void ConstColumn::append(const Column& src, size_t offset, size_t count) {
if (_size == 0) {
const auto& src_column = down_cast<const ConstColumn&>(src);
_data->append(*src_column.data_column(), 0, 1);
}
_size += count;
}
void ConstColumn::append_selective(const Column& src, const uint32_t* indexes, uint32_t from, uint32_t size) {
append(src, indexes[from], size);
}
void ConstColumn::append_value_multiple_times(const Column& src, uint32_t index, uint32_t size) {
append(src, index, size);
}
void ConstColumn::fvn_hash(uint32_t* hash, uint16_t from, uint16_t to) const {
DCHECK(_size > 0);
for (uint16_t i = from; i < to; ++i) {
_data->fvn_hash(&hash[i], 0, 1);
}
}
void ConstColumn::crc32_hash(uint32_t* hash, uint16_t from, uint16_t to) const {
DCHECK(false) << "Const column shouldn't call crc32 hash";
}
size_t ConstColumn::filter_range(const Column::Filter& filter, size_t from, size_t to) {
size_t count = SIMD::count_nonzero(&filter[from], to - from);
this->resize(from + count);
return from + count;
}
int ConstColumn::compare_at(size_t left, size_t right, const Column& rhs, int nan_direction_hint) const {
DCHECK(rhs.is_constant());
const auto& rhs_data = static_cast<const ConstColumn&>(rhs)._data;
return _data->compare_at(0, 0, *rhs_data, nan_direction_hint);
}
uint8_t* ConstColumn::serialize_column(uint8_t* dst) {
encode_fixed64_le(dst, _size);
dst += sizeof(size_t);
return _data->serialize_column(dst);
}
const uint8_t* ConstColumn::deserialize_column(const uint8_t* src) {
_size = decode_fixed64_le(src);
src += sizeof(size_t);
return _data->deserialize_column(src);
}
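// Serialized layout: an 8-byte little-endian replication count followed by the serialization of
// the single underlying element. For example, a ConstColumn holding int32 7 repeated 1024 times
// is encoded as [00 04 00 00 00 00 00 00 | <serialized one-row data column>].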
} // namespace starrocks::vectorized


@ -0,0 +1,225 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include "column/column.h"
#include "common/logging.h"
namespace starrocks::vectorized {
class ConstColumn final : public ColumnFactory<Column, ConstColumn> {
friend class ColumnFactory<Column, ConstColumn>;
public:
explicit ConstColumn(ColumnPtr data_column);
ConstColumn(ColumnPtr data_column, size_t size);
// Copy constructor
ConstColumn(const ConstColumn& rhs) : _data(rhs._data->clone_shared()), _size(rhs._size) {}
// Move constructor
ConstColumn(ConstColumn&& rhs) noexcept : _data(std::move(rhs._data)), _size(rhs._size) {}
// Copy assignment
ConstColumn& operator=(const ConstColumn& rhs) {
ConstColumn tmp(rhs);
this->swap_column(tmp);
return *this;
}
// Move assignment
ConstColumn& operator=(ConstColumn&& rhs) {
ConstColumn tmp(std::move(rhs));
this->swap_column(tmp);
return *this;
}
~ConstColumn() override = default;
bool is_nullable() const override { return _data->is_nullable(); }
bool is_null(size_t index) const override { return _data->is_null(0); }
bool only_null() const override { return _data->is_nullable(); }
bool has_null() const override { return _data->has_null(); }
bool is_constant() const override { return true; }
bool low_cardinality() const override { return false; }
const uint8_t* raw_data() const override { return _data->raw_data(); }
uint8_t* mutable_raw_data() override { return reinterpret_cast<uint8_t*>(_data->mutable_raw_data()); }
size_t size() const override { return _size; }
size_t type_size() const override { return _data->type_size(); }
size_t byte_size() const override { return _data->byte_size() + sizeof(_size); }
// const column has only one element
size_t byte_size(size_t from, size_t size) const override { return byte_size(); }
size_t byte_size(size_t idx) const override { return _data->byte_size(0); }
void reserve(size_t n) override {}
void resize(size_t n) override { _size = n; }
// This method resizes the underlying data column,
// because sometimes (e.g. in agg functions) we want to handle a const column as a normal data column
void assign(size_t n, size_t idx) override {
_size = n;
_data->assign(n, 0);
}
void remove_first_n_values(size_t count) override { _size = std::max<size_t>(1, _size - count); }
void append_datum(const Datum& datum) override {
if (_size == 0) {
_data->resize(0);
_data->append_datum(datum);
}
_size++;
}
void append(const Column& src, size_t offset, size_t count) override;
void append_selective(const Column& src, const uint32_t* indexes, uint32_t from, uint32_t size) override;
void append_value_multiple_times(const Column& src, uint32_t index, uint32_t size) override;
bool append_nulls(size_t count) override {
if (_data->is_nullable()) {
bool ok = true;
if (_size == 0) {
ok = _data->append_nulls(1);
}
_size += ok;
return ok;
} else {
return false;
}
}
bool append_strings(const std::vector<Slice>& strs) override { return false; }
size_t append_numbers(const void* buff, size_t length) override { return -1; }
void append_value_multiple_times(const void* value, size_t count) override {
if (_size == 0 && count > 0) {
_data->append_value_multiple_times(value, 1);
}
}
void append_default() override { _size++; }
void append_default(size_t count) override { _size += count; }
uint32_t serialize(size_t idx, uint8_t* pos) override { return _data->serialize(0, pos); }
uint32_t serialize_default(uint8_t* pos) override { return _data->serialize_default(pos); }
void serialize_batch(uint8_t* dst, Buffer<uint32_t>& slice_sizes, size_t chunk_size,
uint32_t max_one_row_size) override {
for (size_t i = 0; i < chunk_size; ++i) {
slice_sizes[i] += _data->serialize(0, dst + i * max_one_row_size + slice_sizes[i]);
}
}
const uint8_t* deserialize_and_append(const uint8_t* pos) override {
++_size;
if (_data->empty()) {
return _data->deserialize_and_append(pos);
}
// Note: we must update the pos
return pos + _data->serialize_size(0);
}
void deserialize_and_append_batch(std::vector<Slice>& srcs, size_t batch_size) override {
_size += batch_size;
if (_data->empty()) {
_data->deserialize_and_append((uint8_t*)srcs[0].data);
}
uint32_t serialize_size = _data->serialize_size(0);
// Note: we must update the pos
for (size_t i = 0; i < batch_size; i++) {
srcs[0].data = srcs[0].data + serialize_size;
}
}
uint32_t max_one_element_serialize_size() const override { return _data->max_one_element_serialize_size(); }
uint32_t serialize_size(size_t idx) const override { return _data->serialize_size(0); }
size_t serialize_size() const override { return _data->serialize_size() + sizeof(size_t); }
uint8_t* serialize_column(uint8_t* dst) override;
const uint8_t* deserialize_column(const uint8_t* src) override;
MutableColumnPtr clone_empty() const override { return create_mutable(_data->clone_empty(), 0); }
size_t filter_range(const Column::Filter& filter, size_t from, size_t to) override;
int compare_at(size_t left, size_t right, const Column& rhs, int nan_direction_hint) const override;
void fvn_hash(uint32_t* hash, uint16_t from, uint16_t to) const override;
void crc32_hash(uint32_t* hash, uint16_t from, uint16_t to) const override;
void put_mysql_row_buffer(MysqlRowBuffer* buf, size_t idx) const override { _data->put_mysql_row_buffer(buf, 0); }
std::string get_name() const override { return "const-" + _data->get_name(); }
ColumnPtr* mutable_data_column() { return &_data; }
const ColumnPtr& data_column() const { return _data; }
Datum get(size_t n __attribute__((unused))) const override { return _data->get(0); }
size_t memory_usage() const override { return _data->memory_usage() + sizeof(size_t); }
size_t shrink_memory_usage() const override { return _data->shrink_memory_usage() + sizeof(size_t); }
size_t container_memory_usage() const override { return _data->container_memory_usage(); }
size_t element_memory_usage() const override { return _data->element_memory_usage(); }
size_t element_memory_usage(size_t from, size_t size) const override {
// const column has only one element
return element_memory_usage();
}
void swap_column(Column& rhs) override {
auto& r = down_cast<ConstColumn&>(rhs);
_data->swap_column(*r._data);
std::swap(_delete_state, r._delete_state);
std::swap(_size, r._size);
}
void reset_column() override {
Column::reset_column();
_data->reset_column();
_size = 0;
}
std::string debug_item(uint32_t idx) const override {
std::stringstream ss;
ss << "CONST: " << _data->debug_item(0);
return ss.str();
}
std::string debug_string() const override {
std::stringstream ss;
ss << "CONST: " << _data->debug_item(0) << " Size : " << _size;
return ss.str();
}
private:
ColumnPtr _data;
size_t _size;
};
} // namespace starrocks::vectorized

135
be/src/column/datum.cpp Normal file

@ -0,0 +1,135 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#include "column/datum.h"
#include "gutil/strings/substitute.h"
#include "runtime/mem_pool.h"
#include "storage/types.h"
namespace starrocks::vectorized {
using strings::Substitute;
template <FieldType type>
Status datum_from_string(Datum* dst, const std::string& str) {
static TypeInfoPtr type_info = get_type_info(type);
typename CppTypeTraits<type>::CppType value;
if (type_info->from_string(&value, str) != OLAP_SUCCESS) {
return Status::InvalidArgument(Substitute("Failed to convert $0 to type $1", str, type));
}
dst->set(value);
return Status::OK();
}
Status datum_from_string(Datum* dst, FieldType type, const std::string& str, MemPool* mem_pool) {
switch (type) {
case OLAP_FIELD_TYPE_BOOL: {
static TypeInfoPtr type_info = get_type_info(OLAP_FIELD_TYPE_BOOL);
bool v;
auto st = type_info->from_string(&v, str);
if (st != OLAP_SUCCESS) {
return Status::InvalidArgument(Substitute("Failed to convert $0 to Bool type", str));
}
dst->set_int8(v);
return Status::OK();
}
case OLAP_FIELD_TYPE_TINYINT:
return datum_from_string<OLAP_FIELD_TYPE_TINYINT>(dst, str);
case OLAP_FIELD_TYPE_SMALLINT:
return datum_from_string<OLAP_FIELD_TYPE_SMALLINT>(dst, str);
case OLAP_FIELD_TYPE_INT:
return datum_from_string<OLAP_FIELD_TYPE_INT>(dst, str);
case OLAP_FIELD_TYPE_BIGINT:
return datum_from_string<OLAP_FIELD_TYPE_BIGINT>(dst, str);
case OLAP_FIELD_TYPE_LARGEINT:
return datum_from_string<OLAP_FIELD_TYPE_LARGEINT>(dst, str);
case OLAP_FIELD_TYPE_FLOAT:
return datum_from_string<OLAP_FIELD_TYPE_FLOAT>(dst, str);
case OLAP_FIELD_TYPE_DOUBLE:
return datum_from_string<OLAP_FIELD_TYPE_DOUBLE>(dst, str);
case OLAP_FIELD_TYPE_DATE:
return datum_from_string<OLAP_FIELD_TYPE_DATE>(dst, str);
case OLAP_FIELD_TYPE_DATE_V2:
return datum_from_string<OLAP_FIELD_TYPE_DATE_V2>(dst, str);
case OLAP_FIELD_TYPE_DATETIME:
return datum_from_string<OLAP_FIELD_TYPE_DATETIME>(dst, str);
case OLAP_FIELD_TYPE_TIMESTAMP:
return datum_from_string<OLAP_FIELD_TYPE_TIMESTAMP>(dst, str);
case OLAP_FIELD_TYPE_DECIMAL:
return datum_from_string<OLAP_FIELD_TYPE_DECIMAL>(dst, str);
case OLAP_FIELD_TYPE_DECIMAL_V2:
return datum_from_string<OLAP_FIELD_TYPE_DECIMAL_V2>(dst, str);
case OLAP_FIELD_TYPE_CHAR:
case OLAP_FIELD_TYPE_VARCHAR: {
/* Types that need memory allocated */
Slice slice;
slice.size = str.size();
if (mem_pool == nullptr) {
slice.data = (char*)str.data();
} else {
slice.data = reinterpret_cast<char*>(mem_pool->allocate(slice.size));
memcpy(slice.data, str.data(), slice.size);
}
// If type is OLAP_FIELD_TYPE_CHAR, strip its trailing '\0'
if (type == OLAP_FIELD_TYPE_CHAR) {
slice.size = strnlen(slice.data, slice.size);
}
dst->set_slice(slice);
break;
}
default:
return Status::NotSupported(Substitute("Type $0 not supported", type));
}
return Status::OK();
}
template <typename T, FieldType type>
std::string datum_to_string(const T& datum) {
using CppType = typename CppTypeTraits<type>::CppType;
static TypeInfoPtr type_info = get_type_info(type);
auto value = datum.template get<CppType>();
return type_info->to_string(&value);
}
std::string datum_to_string(const Datum& datum, FieldType type) {
if (datum.is_null()) {
return "null";
}
switch (type) {
case OLAP_FIELD_TYPE_BOOL:
case OLAP_FIELD_TYPE_TINYINT:
return datum_to_string<Datum, OLAP_FIELD_TYPE_TINYINT>(datum);
case OLAP_FIELD_TYPE_SMALLINT:
return datum_to_string<Datum, OLAP_FIELD_TYPE_SMALLINT>(datum);
case OLAP_FIELD_TYPE_INT:
return datum_to_string<Datum, OLAP_FIELD_TYPE_INT>(datum);
case OLAP_FIELD_TYPE_BIGINT:
return datum_to_string<Datum, OLAP_FIELD_TYPE_BIGINT>(datum);
case OLAP_FIELD_TYPE_LARGEINT:
return datum_to_string<Datum, OLAP_FIELD_TYPE_LARGEINT>(datum);
case OLAP_FIELD_TYPE_FLOAT:
return datum_to_string<Datum, OLAP_FIELD_TYPE_FLOAT>(datum);
case OLAP_FIELD_TYPE_DOUBLE:
return datum_to_string<Datum, OLAP_FIELD_TYPE_DOUBLE>(datum);
case OLAP_FIELD_TYPE_DATE:
return datum_to_string<Datum, OLAP_FIELD_TYPE_DATE>(datum);
case OLAP_FIELD_TYPE_DATE_V2:
return datum_to_string<Datum, OLAP_FIELD_TYPE_DATE_V2>(datum);
case OLAP_FIELD_TYPE_DATETIME:
return datum_to_string<Datum, OLAP_FIELD_TYPE_DATETIME>(datum);
case OLAP_FIELD_TYPE_TIMESTAMP:
return datum_to_string<Datum, OLAP_FIELD_TYPE_TIMESTAMP>(datum);
case OLAP_FIELD_TYPE_DECIMAL:
return datum_to_string<Datum, OLAP_FIELD_TYPE_DECIMAL>(datum);
case OLAP_FIELD_TYPE_DECIMAL_V2:
return datum_to_string<Datum, OLAP_FIELD_TYPE_DECIMAL_V2>(datum);
case OLAP_FIELD_TYPE_CHAR:
case OLAP_FIELD_TYPE_VARCHAR:
return datum_to_string<Datum, OLAP_FIELD_TYPE_VARCHAR>(datum);
default:
return "";
}
}
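// Usage sketch (illustrative only): round-tripping a value through Datum.
//
//   Datum d;
//   Status st = datum_from_string(&d, OLAP_FIELD_TYPE_INT, "123", nullptr);
//   std::string s = st.ok() ? datum_to_string(d, OLAP_FIELD_TYPE_INT) : "";   // "123"
//
// For CHAR/VARCHAR a MemPool should be passed so the slice owns copied bytes; with a null
// MemPool the slice merely references the caller's string buffer.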
} // namespace starrocks::vectorized

127
be/src/column/datum.h Normal file

@ -0,0 +1,127 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include <variant>
#include "runtime/date_value.h"
#include "runtime/decimalv2_value.h"
#include "runtime/timestamp_value.h"
#include "storage/decimal12.h"
#include "storage/hll.h"
#include "storage/uint24.h"
#include "util/int96.h"
#include "util/percentile_value.h"
#include "util/slice.h"
namespace starrocks {
class MemPool;
class Status;
class BitmapValue;
} // namespace starrocks
namespace starrocks::vectorized {
typedef __int128 int128_t;
typedef unsigned __int128 uint128_t;
class Datum;
using DatumArray = std::vector<Datum>;
class Datum {
public:
Datum() = default;
template <typename T>
Datum(T value) {
set(value);
}
int8_t get_int8() const { return get<int8_t>(); }
int16_t get_int16() const { return get<int16_t>(); }
uint24_t get_uint24() const { return get<uint24_t>(); }
int96_t get_int96() const { return get<int96_t>(); }
int32_t get_int32() const { return get<int32_t>(); }
int64_t get_int64() const { return get<int64_t>(); }
float get_float() const { return get<float>(); }
double get_double() const { return get<double>(); }
TimestampValue get_timestamp() const { return get<TimestampValue>(); }
DateValue get_date() const { return get<DateValue>(); }
const Slice& get_slice() const { return get<Slice>(); }
const int128_t& get_int128() const { return get<int128_t>(); }
const decimal12_t& get_decimal12() const { return get<decimal12_t>(); }
const DecimalV2Value& get_decimal() const { return get<DecimalV2Value>(); }
const DatumArray& get_array() const { return get<DatumArray>(); }
const HyperLogLog* get_hyperloglog() const { return get<HyperLogLog*>(); }
const BitmapValue* get_bitmap() const { return get<BitmapValue*>(); }
const PercentileValue* get_percentile() const { return get<PercentileValue*>(); }
void set_int8(int8_t v) { set<decltype(v)>(v); }
void set_uint8(uint8_t v) { set<decltype(v)>(v); }
void set_int16(int16_t v) { set<decltype(v)>(v); }
void set_uint16(uint16_t v) { set<decltype(v)>(v); }
void set_uint24(uint24_t v) { set<decltype(v)>(v); }
void set_int32(int32_t v) { set<decltype(v)>(v); }
void set_uint32(uint32_t v) { set<decltype(v)>(v); }
void set_int64(int64_t v) { set<decltype(v)>(v); }
void set_uint64(uint64_t v) { set<decltype(v)>(v); }
void set_int96(int96_t v) { set<decltype(v)>(v); }
void set_float(float v) { set<decltype(v)>(v); }
void set_double(double v) { set<decltype(v)>(v); }
void set_timestamp(TimestampValue v) { set<decltype(v)>(v); }
void set_date(DateValue v) { set<decltype(v)>(v); }
void set_int128(const int128_t& v) { set<decltype(v)>(v); }
void set_slice(const Slice& v) { set<decltype(v)>(v); }
void set_decimal12(const decimal12_t& v) { set<decltype(v)>(v); }
void set_decimal(const DecimalV2Value& v) { set<decltype(v)>(v); }
void set_array(const DatumArray& v) { set<decltype(v)>(v); }
void set_hyperloglog(HyperLogLog* v) { set<decltype(v)>(v); }
void set_bitmap(BitmapValue* v) { set<decltype(v)>(v); }
void set_percentile(PercentileValue* v) { set<decltype(v)>(v); }
template <typename T>
const T& get() const {
if constexpr (std::is_same_v<DateValue, T>) {
static_assert(sizeof(DateValue) == sizeof(int32_t));
return reinterpret_cast<const T&>(std::get<int32_t>(_value));
} else if constexpr (std::is_same_v<TimestampValue, T>) {
static_assert(sizeof(TimestampValue) == sizeof(int64_t));
return reinterpret_cast<const T&>(std::get<int64_t>(_value));
} else if constexpr (std::is_same_v<bool, T>) {
return reinterpret_cast<const T&>(std::get<int8_t>(_value));
} else if constexpr (std::is_unsigned_v<T>) {
return reinterpret_cast<const T&>(std::get<std::make_signed_t<T>>(_value));
} else {
return std::get<T>(_value);
}
}
template <typename T>
void set(T value) {
if constexpr (std::is_same_v<DateValue, T>) {
_value = value.julian();
} else if constexpr (std::is_same_v<TimestampValue, T>) {
_value = value.timestamp();
} else if constexpr (std::is_same_v<bool, T>) {
_value = (int8_t)value;
} else if constexpr (std::is_unsigned_v<T>) {
_value = (std::make_signed_t<T>)value;
} else {
_value = value;
}
}
bool is_null() const { return _value.index() == 0; }
void set_null() { _value = std::monostate(); }
private:
using Variant = std::variant<std::monostate, int8_t, int16_t, uint24_t, int32_t, int64_t, int96_t, int128_t, Slice,
decimal12_t, DecimalV2Value, float, double, DatumArray, HyperLogLog*, BitmapValue*,
PercentileValue*>;
Variant _value;
};
static const Datum kNullDatum{};
} // namespace starrocks::vectorized
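
Datum above is a tagged union built on std::variant: std::monostate stands in for NULL, and wrapper types such as DateValue and TimestampValue are stored through their underlying integer representation. A minimal standalone sketch of the same pattern using only the standard library (illustrative names, not the StarRocks API):

#include <cassert>
#include <cstdint>
#include <variant>

// Illustrative stand-in for a wrapper type stored via its underlying integer.
struct MiniDate {
    int32_t julian;
};

class MiniDatum {
public:
    MiniDatum() = default;

    bool is_null() const { return _value.index() == 0; }
    void set_null() { _value = std::monostate(); }

    // Wrapper types are stored as their underlying representation,
    // mirroring how Datum stores DateValue as an int32_t julian day.
    void set_date(MiniDate d) { _value = d.julian; }
    MiniDate get_date() const { return MiniDate{std::get<int32_t>(_value)}; }

    void set_int64(int64_t v) { _value = v; }
    int64_t get_int64() const { return std::get<int64_t>(_value); }

private:
    std::variant<std::monostate, int32_t, int64_t> _value;
};

int main() {
    MiniDatum d;
    assert(d.is_null());            // a default-constructed datum is NULL
    d.set_date(MiniDate{2459460});
    assert(!d.is_null());
    assert(d.get_date().julian == 2459460);
    d.set_null();
    assert(d.is_null());
    return 0;
}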


@ -0,0 +1,148 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#include "column/datum_convert.h"
#include "gutil/strings/substitute.h"
#include "runtime/mem_pool.h"
namespace starrocks::vectorized {
using strings::Substitute;
template <FieldType TYPE>
Status datum_from_string(TypeInfo* type_info, Datum* dst, const std::string& str) {
typename CppTypeTraits<TYPE>::CppType value;
if (type_info->from_string(&value, str) != OLAP_SUCCESS) {
return Status::InvalidArgument(Substitute("Failed to convert $0 to type $1", str, TYPE));
}
dst->set(value);
return Status::OK();
}
Status datum_from_string(TypeInfo* type_info, Datum* dst, const std::string& str, MemPool* mem_pool) {
const auto type = type_info->type();
switch (type) {
case OLAP_FIELD_TYPE_BOOL: {
bool v;
auto st = type_info->from_string(&v, str);
if (st != OLAP_SUCCESS) {
return Status::InvalidArgument(Substitute("Failed to convert $0 to Bool type", str));
}
dst->set_int8(v);
return Status::OK();
}
case OLAP_FIELD_TYPE_TINYINT:
return datum_from_string<OLAP_FIELD_TYPE_TINYINT>(type_info, dst, str);
case OLAP_FIELD_TYPE_SMALLINT:
return datum_from_string<OLAP_FIELD_TYPE_SMALLINT>(type_info, dst, str);
case OLAP_FIELD_TYPE_INT:
return datum_from_string<OLAP_FIELD_TYPE_INT>(type_info, dst, str);
case OLAP_FIELD_TYPE_BIGINT:
return datum_from_string<OLAP_FIELD_TYPE_BIGINT>(type_info, dst, str);
case OLAP_FIELD_TYPE_LARGEINT:
return datum_from_string<OLAP_FIELD_TYPE_LARGEINT>(type_info, dst, str);
case OLAP_FIELD_TYPE_FLOAT:
return datum_from_string<OLAP_FIELD_TYPE_FLOAT>(type_info, dst, str);
case OLAP_FIELD_TYPE_DOUBLE:
return datum_from_string<OLAP_FIELD_TYPE_DOUBLE>(type_info, dst, str);
case OLAP_FIELD_TYPE_DATE:
return datum_from_string<OLAP_FIELD_TYPE_DATE>(type_info, dst, str);
case OLAP_FIELD_TYPE_DATE_V2:
return datum_from_string<OLAP_FIELD_TYPE_DATE_V2>(type_info, dst, str);
case OLAP_FIELD_TYPE_DATETIME:
return datum_from_string<OLAP_FIELD_TYPE_DATETIME>(type_info, dst, str);
case OLAP_FIELD_TYPE_TIMESTAMP:
return datum_from_string<OLAP_FIELD_TYPE_TIMESTAMP>(type_info, dst, str);
case OLAP_FIELD_TYPE_DECIMAL:
return datum_from_string<OLAP_FIELD_TYPE_DECIMAL>(type_info, dst, str);
case OLAP_FIELD_TYPE_DECIMAL_V2:
return datum_from_string<OLAP_FIELD_TYPE_DECIMAL_V2>(type_info, dst, str);
case OLAP_FIELD_TYPE_DECIMAL32:
return datum_from_string<OLAP_FIELD_TYPE_DECIMAL32>(type_info, dst, str);
case OLAP_FIELD_TYPE_DECIMAL64:
return datum_from_string<OLAP_FIELD_TYPE_DECIMAL64>(type_info, dst, str);
case OLAP_FIELD_TYPE_DECIMAL128:
return datum_from_string<OLAP_FIELD_TYPE_DECIMAL128>(type_info, dst, str);
/* Types that need memory allocated */
case OLAP_FIELD_TYPE_CHAR:
case OLAP_FIELD_TYPE_VARCHAR: {
Slice slice;
slice.size = str.size();
if (mem_pool == nullptr) {
slice.data = (char*)str.data();
} else {
slice.data = reinterpret_cast<char*>(mem_pool->allocate(slice.size));
if (UNLIKELY(slice.data == nullptr)) {
return Status::InternalError("Mem usage has exceed the limit of BE");
}
memcpy(slice.data, str.data(), slice.size);
}
// If type is OLAP_FIELD_TYPE_CHAR, strip its trailing '\0'
if (type == OLAP_FIELD_TYPE_CHAR) {
slice.size = strnlen(slice.data, slice.size);
}
dst->set_slice(slice);
break;
}
default:
return Status::NotSupported(Substitute("Type $0 not supported", type));
}
return Status::OK();
}
template <FieldType TYPE>
std::string datum_to_string(TypeInfo* type_info, const Datum& datum) {
using CppType = typename CppTypeTraits<TYPE>::CppType;
auto value = datum.template get<CppType>();
return type_info->to_string(&value);
}
std::string datum_to_string(TypeInfo* type_info, const Datum& datum) {
if (datum.is_null()) {
return "null";
}
const auto type = type_info->type();
switch (type) {
case OLAP_FIELD_TYPE_BOOL:
case OLAP_FIELD_TYPE_TINYINT:
return datum_to_string<OLAP_FIELD_TYPE_TINYINT>(type_info, datum);
case OLAP_FIELD_TYPE_SMALLINT:
return datum_to_string<OLAP_FIELD_TYPE_SMALLINT>(type_info, datum);
case OLAP_FIELD_TYPE_INT:
return datum_to_string<OLAP_FIELD_TYPE_INT>(type_info, datum);
case OLAP_FIELD_TYPE_BIGINT:
return datum_to_string<OLAP_FIELD_TYPE_BIGINT>(type_info, datum);
case OLAP_FIELD_TYPE_LARGEINT:
return datum_to_string<OLAP_FIELD_TYPE_LARGEINT>(type_info, datum);
case OLAP_FIELD_TYPE_FLOAT:
return datum_to_string<OLAP_FIELD_TYPE_FLOAT>(type_info, datum);
case OLAP_FIELD_TYPE_DOUBLE:
return datum_to_string<OLAP_FIELD_TYPE_DOUBLE>(type_info, datum);
case OLAP_FIELD_TYPE_DATE:
return datum_to_string<OLAP_FIELD_TYPE_DATE>(type_info, datum);
case OLAP_FIELD_TYPE_DATE_V2:
return datum_to_string<OLAP_FIELD_TYPE_DATE_V2>(type_info, datum);
case OLAP_FIELD_TYPE_DATETIME:
return datum_to_string<OLAP_FIELD_TYPE_DATETIME>(type_info, datum);
case OLAP_FIELD_TYPE_TIMESTAMP:
return datum_to_string<OLAP_FIELD_TYPE_TIMESTAMP>(type_info, datum);
case OLAP_FIELD_TYPE_DECIMAL:
return datum_to_string<OLAP_FIELD_TYPE_DECIMAL>(type_info, datum);
case OLAP_FIELD_TYPE_DECIMAL_V2:
return datum_to_string<OLAP_FIELD_TYPE_DECIMAL_V2>(type_info, datum);
case OLAP_FIELD_TYPE_DECIMAL32:
return datum_to_string<OLAP_FIELD_TYPE_DECIMAL32>(type_info, datum);
case OLAP_FIELD_TYPE_DECIMAL64:
return datum_to_string<OLAP_FIELD_TYPE_DECIMAL64>(type_info, datum);
case OLAP_FIELD_TYPE_DECIMAL128:
return datum_to_string<OLAP_FIELD_TYPE_DECIMAL128>(type_info, datum);
case OLAP_FIELD_TYPE_CHAR:
case OLAP_FIELD_TYPE_VARCHAR:
return datum_to_string<OLAP_FIELD_TYPE_VARCHAR>(type_info, datum);
default:
return "";
}
}
} // namespace starrocks::vectorized


@ -0,0 +1,12 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include "column/datum.h"
#include "storage/types.h"
namespace starrocks::vectorized {
Status datum_from_string(TypeInfo* type_info, Datum* dst, const std::string& str, MemPool* mem_pool);
std::string datum_to_string(TypeInfo* type_info, const Datum& datum);
} // namespace starrocks::vectorized
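
A hedged usage sketch of the two helpers declared above. It assumes a StarRocks BE build environment where these headers compile; get_type_info and OLAP_FIELD_TYPE_INT are used the same way as in the datum code above, and the literal values are illustrative only:

// Sketch only: not part of this commit.
#include "column/datum_convert.h"
#include "storage/types.h"

void datum_convert_example() {
    using namespace starrocks;
    using namespace starrocks::vectorized;
    // get_type_info() is used the same way in datum.cpp above.
    auto type_info = get_type_info(OLAP_FIELD_TYPE_INT);
    Datum datum;
    // A MemPool is only required for CHAR/VARCHAR; fixed-length types can pass nullptr.
    Status st = datum_from_string(type_info.get(), &datum, "42", nullptr);
    if (st.ok()) {
        std::string s = datum_to_string(type_info.get(), datum); // expected "42"
        (void)s;
    }
}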


@ -0,0 +1,21 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#include "column/datum_tuple.h"
#include "column/schema.h"
namespace starrocks::vectorized {
int DatumTuple::compare(const Schema& schema, const DatumTuple& rhs) const {
CHECK_EQ(_datums.size(), schema.num_fields());
CHECK_EQ(_datums.size(), rhs._datums.size());
for (size_t i = 0; i < _datums.size(); i++) {
int r = schema.field(i)->type()->cmp(_datums[i], rhs[i]);
if (r != 0) {
return r;
}
}
return 0;
}
} // namespace starrocks::vectorized


@ -0,0 +1,40 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include <vector>
#include "column/datum.h"
namespace starrocks::vectorized {
class Schema;
class DatumTuple {
public:
DatumTuple() = default;
explicit DatumTuple(std::vector<Datum> datums) : _datums(std::move(datums)) {}
size_t size() const { return _datums.size(); }
void reserve(size_t n) { _datums.reserve(n); }
void append(const Datum& datum) { _datums.emplace_back(datum); }
const Datum& operator[](size_t n) const { return _datums[n]; }
const Datum& get(size_t n) const { return _datums.at(n); }
Datum& operator[](size_t n) { return _datums[n]; }
Datum& get(size_t n) { return _datums.at(n); }
int compare(const Schema& schema, const DatumTuple& rhs) const;
const std::vector<Datum>& datums() const { return _datums; }
private:
std::vector<Datum> _datums;
};
} // namespace starrocks::vectorized


@ -0,0 +1,87 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#include "column/decimalv3_column.h"
#include "column/fixed_length_column.h"
namespace starrocks::vectorized {
template <typename T>
DecimalV3Column<T>::DecimalV3Column(int precision, int scale) : _precision(precision), _scale(scale) {
DCHECK(0 <= _scale && _scale <= _precision && _precision <= decimal_precision_limit<T>);
}
template <typename T>
DecimalV3Column<T>::DecimalV3Column(int precision, int scale, size_t num_rows) : DecimalV3Column(precision, scale) {
this->resize(num_rows);
}
template <typename T>
bool DecimalV3Column<T>::is_decimal() const {
return true;
}
template <typename T>
bool DecimalV3Column<T>::is_numeric() const {
return true;
}
template <typename T>
void DecimalV3Column<T>::set_precision(int precision) {
this->_precision = precision;
}
template <typename T>
void DecimalV3Column<T>::set_scale(int scale) {
this->_scale = scale;
}
template <typename T>
int DecimalV3Column<T>::precision() const {
return _precision;
}
template <typename T>
int DecimalV3Column<T>::scale() const {
return _scale;
}
template <typename T>
MutableColumnPtr DecimalV3Column<T>::clone_empty() const {
return this->create_mutable(_precision, _scale);
}
template <typename T>
void DecimalV3Column<T>::put_mysql_row_buffer(MysqlRowBuffer* buf, size_t idx) const {
auto& data = this->get_data();
auto s = DecimalV3Cast::to_string<T>(data[idx], _precision, _scale);
buf->push_decimal(s);
}
template <typename T>
std::string DecimalV3Column<T>::debug_item(uint32_t idx) const {
auto& data = this->get_data();
return DecimalV3Cast::to_string<T>(data[idx], _precision, _scale);
}
template <typename T>
void DecimalV3Column<T>::crc32_hash(uint32_t* hash, uint16_t from, uint16_t to) const {
const auto& data = this->get_data();
// When decimal-v2 columns are used as distribution keys and users upgrade a
// decimal-v2 column to decimal-v3 via schema change, decimal128(27,9) is the only
// acceptable target type. The crc32_hash result on decimal128(27,9) must therefore
// stay compatible with decimal-v2 in order to keep the data layout consistent.
if constexpr (std::is_same_v<T, int128_t>) {
if (_precision == 27 && _scale == 9) {
for (uint16_t i = from; i < to; ++i) {
auto& decimal_v2_value = (DecimalV2Value&)(data[i]);
int64_t int_val = decimal_v2_value.int_value();
int32_t frac_val = decimal_v2_value.frac_value();
uint32_t seed = HashUtil::zlib_crc_hash(&int_val, sizeof(int_val), hash[i]);
hash[i] = HashUtil::zlib_crc_hash(&frac_val, sizeof(frac_val), seed);
}
return;
}
}
for (uint16_t i = from; i < to; ++i) {
hash[i] = HashUtil::zlib_crc_hash(&data[i], sizeof(T), hash[i]);
}
}
template class DecimalV3Column<int32_t>;
template class DecimalV3Column<int64_t>;
template class DecimalV3Column<int128_t>;
} // namespace starrocks::vectorized
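
The decimal-v2 compatibility branch above hashes the integer and fractional parts separately, chaining the seed between the two calls. A standalone sketch of that chained-hash idea, using an FNV-style stand-in for HashUtil::zlib_crc_hash (all names below are illustrative, not the StarRocks API):

#include <cstdint>
#include <cstring>

// Toy stand-in for a seeded hash such as HashUtil::zlib_crc_hash (FNV-1a style).
uint32_t seeded_hash(const void* data, size_t len, uint32_t seed) {
    const uint8_t* p = static_cast<const uint8_t*>(data);
    uint32_t h = seed ^ 2166136261u;
    for (size_t i = 0; i < len; ++i) {
        h = (h ^ p[i]) * 16777619u;
    }
    return h;
}

// Hash a decimal by its integer and fractional parts, chaining the seed,
// mirroring the decimal-v2-compatible path in DecimalV3Column::crc32_hash above.
uint32_t hash_decimal_parts(int64_t int_val, int32_t frac_val, uint32_t seed) {
    uint32_t h = seeded_hash(&int_val, sizeof(int_val), seed);
    return seeded_hash(&frac_val, sizeof(frac_val), h);
}

int main() {
    return hash_decimal_parts(123, 450000000, 0) != 0 ? 0 : 1;
}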


@ -0,0 +1,42 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include <runtime/decimalv3.h>
#include "column/column.h"
#include "column/fixed_length_column_base.h"
#include "util/decimal_types.h"
#include "util/mysql_row_buffer.h"
namespace starrocks::vectorized {
template <typename T>
class DecimalV3Column final : public ColumnFactory<FixedLengthColumnBase<T>, DecimalV3Column<DecimalType<T>>, Column> {
public:
DecimalV3Column() = default;
DecimalV3Column(int precision, int scale);
DecimalV3Column(int precision, int scale, size_t num_rows);
DecimalV3Column(DecimalV3Column const&) = default;
DecimalV3Column& operator=(DecimalV3Column const&) = default;
bool is_decimal() const override;
bool is_numeric() const override;
void set_precision(int precision);
void set_scale(int scale);
int precision() const;
int scale() const;
MutableColumnPtr clone_empty() const override;
void put_mysql_row_buffer(MysqlRowBuffer* buf, size_t idx) const;
std::string debug_item(uint32_t idx) const override;
void crc32_hash(uint32_t* hash, uint16_t from, uint16_t to) const override;
private:
int _precision;
int _scale;
};
} // namespace starrocks::vectorized

67
be/src/column/field.cpp Normal file

@ -0,0 +1,67 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#include "column/field.h"
#include "column/datum.h"
#include "storage/key_coder.h"
#include "storage/types.h"
#include "storage/vectorized/chunk_helper.h"
namespace starrocks {
namespace vectorized {
Field::Field(ColumnId id, const std::string& name, FieldType type, int precision, int scale, bool nullable)
: _id(id), _name(name), _type(get_type_info(type, precision, scale)), _is_nullable(nullable) {}
FieldPtr Field::with_type(const TypeInfoPtr& type) {
return std::make_shared<Field>(_id, _name, type, _is_nullable);
}
FieldPtr Field::with_name(const std::string& name) {
return std::make_shared<Field>(_id, name, _type, _is_nullable);
}
FieldPtr Field::with_nullable(bool is_nullable) {
return std::make_shared<Field>(_id, _name, _type, is_nullable);
}
FieldPtr Field::copy() const {
return std::make_shared<Field>(*this);
}
void Field::encode_ascending(const Datum& value, std::string* buf) const {
if (_short_key_length > 0) {
const KeyCoder* coder = get_key_coder(_type->type());
coder->encode_ascending(value, _short_key_length, buf);
}
}
void Field::full_encode_ascending(const Datum& value, std::string* buf) const {
const KeyCoder* coder = get_key_coder(_type->type());
coder->full_encode_ascending(value, buf);
}
FieldPtr Field::convert_to(FieldType to_type) const {
FieldPtr new_field = std::make_shared<Field>(*this);
new_field->_type = get_type_info(to_type);
new_field->_short_key_length = new_field->_type->size();
return new_field;
}
ColumnPtr Field::create_column() const {
return ChunkHelper::column_from_field(*this);
}
std::string Field::to_string() const {
std::stringstream os;
os << id() << ":";
os << name() << " ";
os << type()->type() << " ";
os << (is_nullable() ? "NULL" : "NOT NULL");
os << (is_key() ? " KEY" : "");
os << " " << aggregate_method();
return os.str();
}
} // namespace vectorized
} // namespace starrocks

98
be/src/column/field.h Normal file

@ -0,0 +1,98 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include <string>
#include "column/vectorized_fwd.h"
#include "storage/types.h"
namespace starrocks::vectorized {
class AggregateInfo;
class Datum;
class Field {
public:
Field(ColumnId id, const std::string& name, FieldType type, int precision, int scale, bool nullable);
Field(ColumnId id, const std::string& name, FieldType type, bool nullable)
: Field(id, name, type, -1, -1, nullable) {
DCHECK(type != OLAP_FIELD_TYPE_DECIMAL32);
DCHECK(type != OLAP_FIELD_TYPE_DECIMAL64);
DCHECK(type != OLAP_FIELD_TYPE_DECIMAL128);
DCHECK(type != OLAP_FIELD_TYPE_ARRAY);
}
Field(ColumnId id, const std::string& name, const TypeInfoPtr& type, bool is_nullable = true)
: _id(id), _name(name), _type(type), _is_nullable(is_nullable) {}
Field(const Field&) = default;
Field(Field&&) = default;
Field& operator=(const Field&) = default;
Field& operator=(Field&&) = default;
virtual ~Field() = default;
// return a copy of this field with the replaced type
FieldPtr with_type(const TypeInfoPtr& type);
// return a copy of this field with the replaced name
FieldPtr with_name(const std::string& name);
// return a copy of this field with the replaced nullability
FieldPtr with_nullable(bool is_nullable);
std::string to_string() const;
ColumnId id() const { return _id; }
const std::string& name() const { return _name; }
const TypeInfoPtr& type() const { return _type; }
bool is_nullable() const { return _is_nullable; }
FieldPtr copy() const;
bool is_key() const { return _is_key; }
void set_is_key(bool is_key) { _is_key = is_key; }
size_t short_key_length() const { return _short_key_length; }
void set_short_key_length(size_t n) { _short_key_length = n; }
// Encode the first |short_key_length| bytes.
void encode_ascending(const Datum& value, std::string* buf) const;
// Encode the full field.
void full_encode_ascending(const Datum& value, std::string* buf) const;
// Status decode_ascending(Slice* encoded_key, uint8_t* cell_ptr, MemPool* pool) const;
void set_aggregate_method(FieldAggregationMethod agg_method) { _agg_method = agg_method; }
starrocks::FieldAggregationMethod aggregate_method() const { return _agg_method; }
FieldPtr convert_to(FieldType to_type) const;
void add_sub_field(const Field& sub_field) { _sub_fields.emplace_back(sub_field); }
const Field& get_sub_field(int i) const { return _sub_fields[i]; }
ColumnPtr create_column() const;
private:
Field() = default;
ColumnId _id = 0;
std::string _name;
TypeInfoPtr _type = nullptr;
starrocks::FieldAggregationMethod _agg_method = OLAP_FIELD_AGGREGATION_NONE;
size_t _short_key_length = 0;
bool _is_nullable = true;
bool _is_key = false;
std::vector<Field> _sub_fields;
};
inline std::ostream& operator<<(std::ostream& os, const Field& field) {
os << field.to_string();
return os;
}
} // namespace starrocks::vectorized


@ -0,0 +1,26 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#include "column/fixed_length_column.h"
namespace starrocks::vectorized {
template class FixedLengthColumn<uint8_t>;
template class FixedLengthColumn<uint16_t>;
template class FixedLengthColumn<uint32_t>;
template class FixedLengthColumn<uint64_t>;
template class FixedLengthColumn<int8_t>;
template class FixedLengthColumn<int16_t>;
template class FixedLengthColumn<int32_t>;
template class FixedLengthColumn<int64_t>;
template class FixedLengthColumn<int96_t>;
template class FixedLengthColumn<int128_t>;
template class FixedLengthColumn<float>;
template class FixedLengthColumn<double>;
template class FixedLengthColumn<uint24_t>;
template class FixedLengthColumn<decimal12_t>;
template class FixedLengthColumn<DateValue>;
template class FixedLengthColumn<DecimalV2Value>;
template class FixedLengthColumn<TimestampValue>;
} // namespace starrocks::vectorized


@ -0,0 +1,22 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include "column/fixed_length_column_base.h"
namespace starrocks::vectorized {
template <typename T>
class FixedLengthColumn final : public ColumnFactory<FixedLengthColumnBase<T>, FixedLengthColumn<T>, Column> {
public:
using ValueType = T;
using Container = Buffer<ValueType>;
using SuperClass = ColumnFactory<FixedLengthColumnBase<T>, FixedLengthColumn<T>, Column>;
FixedLengthColumn() = default;
explicit FixedLengthColumn(const size_t n) : SuperClass(n) {}
FixedLengthColumn(const size_t n, const ValueType x) : SuperClass(n, x) {}
FixedLengthColumn(const FixedLengthColumn& src) : SuperClass((const FixedLengthColumnBase<T>&)(src)) {}
MutableColumnPtr clone_empty() const override { return this->create_mutable(); }
};
} // namespace starrocks::vectorized


@ -0,0 +1,268 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#include <gutil/strings/fastmem.h>
#include "column/column_helper.h"
#include "column/fixed_length_column.h"
#include "gutil/casts.h"
#include "runtime/large_int_value.h"
#include "storage/decimal12.h"
#include "storage/uint24.h"
#include "util/coding.h"
#include "util/hash_util.hpp"
#include "util/mysql_row_buffer.h"
#include "util/types.h"
namespace starrocks::vectorized {
template <typename T>
void FixedLengthColumnBase<T>::append(const Column& src, size_t offset, size_t count) {
const auto& num_src = down_cast<const FixedLengthColumnBase<T>&>(src);
_data.insert(_data.end(), num_src._data.begin() + offset, num_src._data.begin() + offset + count);
}
template <typename T>
void FixedLengthColumnBase<T>::append_selective(const Column& src, const uint32_t* indexes, uint32_t from,
uint32_t size) {
const T* src_data = reinterpret_cast<const T*>(src.raw_data());
size_t orig_size = _data.size();
_data.resize(orig_size + size);
for (size_t i = 0; i < size; ++i) {
_data[orig_size + i] = src_data[indexes[from + i]];
}
}
template <typename T>
void FixedLengthColumnBase<T>::append_value_multiple_times(const Column& src, uint32_t index, uint32_t size) {
const T* src_data = reinterpret_cast<const T*>(src.raw_data());
size_t orig_size = _data.size();
_data.resize(orig_size + size);
for (size_t i = 0; i < size; ++i) {
_data[orig_size + i] = src_data[index];
}
}
template <typename T>
size_t FixedLengthColumnBase<T>::filter_range(const Column::Filter& filter, size_t from, size_t to) {
auto size = ColumnHelper::filter_range<T>(filter, _data.data(), from, to);
this->resize(size);
return size;
}
template <typename T>
int FixedLengthColumnBase<T>::compare_at(size_t left, size_t right, const Column& rhs, int nan_direction_hint) const {
DCHECK_LT(left, _data.size());
DCHECK_LT(right, rhs.size());
DCHECK(dynamic_cast<const FixedLengthColumnBase<T>*>(&rhs) != nullptr);
T x = _data[left];
T y = down_cast<const FixedLengthColumnBase<T>&>(rhs)._data[right];
if constexpr (IsDate<T>) {
return x.julian() - y.julian();
} else if constexpr (IsTimestamp<T>) {
Timestamp v = x.timestamp() - y.timestamp();
// Implicitly converting Timestamp to int may give a wrong result.
if (v == 0) {
return 0;
} else {
return v > 0 ? 1 : -1;
}
} else {
// uint8/int8_t, uint16/int16_t, uint32/int32_t, int64, int128, float, double, Decimal, ...
if (x > y) {
return 1;
} else if (x < y) {
return -1;
} else {
return 0;
}
}
}
template <typename T>
uint32_t FixedLengthColumnBase<T>::serialize(size_t idx, uint8_t* pos) {
strings::memcpy_inlined(pos, &_data[idx], sizeof(T));
return sizeof(T);
}
template <typename T>
uint32_t FixedLengthColumnBase<T>::serialize_default(uint8_t* pos) {
ValueType value{};
strings::memcpy_inlined(pos, &value, sizeof(T));
return sizeof(T);
}
template <typename T>
void FixedLengthColumnBase<T>::serialize_batch(uint8_t* dst, Buffer<uint32_t>& slice_sizes, size_t chunk_size,
uint32_t max_one_row_size) {
for (size_t i = 0; i < chunk_size; ++i) {
slice_sizes[i] += serialize(i, dst + i * max_one_row_size + slice_sizes[i]);
}
}
template <typename T>
size_t FixedLengthColumnBase<T>::serialize_batch_at_interval(uint8_t* dst, size_t byte_offset, size_t byte_interval,
size_t start, size_t count) {
const size_t value_size = sizeof(T);
const auto& key_data = get_data();
uint8_t* buf = dst + byte_offset;
for (size_t i = start; i < start + count; ++i) {
strings::memcpy_inlined(buf, &key_data[i], value_size);
buf = buf + byte_interval;
}
return value_size;
}
template <typename T>
const uint8_t* FixedLengthColumnBase<T>::deserialize_and_append(const uint8_t* pos) {
T value{};
strings::memcpy_inlined(&value, pos, sizeof(T));
_data.emplace_back(value);
return pos + sizeof(T);
}
template <typename T>
void FixedLengthColumnBase<T>::deserialize_and_append_batch(std::vector<Slice>& srcs, size_t batch_size) {
raw::make_room(&_data, batch_size);
for (size_t i = 0; i < batch_size; ++i) {
strings::memcpy_inlined(&_data[i], srcs[i].data, sizeof(T));
srcs[i].data = srcs[i].data + sizeof(T);
}
}
template <typename T>
uint8_t* FixedLengthColumnBase<T>::serialize_column(uint8_t* dst) {
uint32_t size = byte_size();
encode_fixed32_le(dst, size);
dst += sizeof(uint32_t);
strings::memcpy_inlined(dst, _data.data(), size);
dst += size;
return dst;
}
template <typename T>
const uint8_t* FixedLengthColumnBase<T>::deserialize_column(const uint8_t* src) {
uint32_t size = decode_fixed32_le(src);
src += sizeof(uint32_t);
raw::make_room(&_data, size / sizeof(ValueType));
strings::memcpy_inlined(_data.data(), src, size);
src += size;
return src;
}
template <typename T>
void FixedLengthColumnBase<T>::fvn_hash(uint32_t* hash, uint16_t from, uint16_t to) const {
for (uint16_t i = from; i < to; ++i) {
hash[i] = HashUtil::fnv_hash(&_data[i], sizeof(ValueType), hash[i]);
}
}
// Must be kept consistent with RawValue::zlib_crc32
template <typename T>
void FixedLengthColumnBase<T>::crc32_hash(uint32_t* hash, uint16_t from, uint16_t to) const {
for (uint16_t i = from; i < to; ++i) {
if constexpr (IsDate<T> || IsTimestamp<T>) {
std::string str = _data[i].to_string();
hash[i] = HashUtil::zlib_crc_hash(str.data(), str.size(), hash[i]);
} else if constexpr (IsDecimal<T>) {
int64_t int_val = _data[i].int_value();
int32_t frac_val = _data[i].frac_value();
uint32_t seed = HashUtil::zlib_crc_hash(&int_val, sizeof(int_val), hash[i]);
hash[i] = HashUtil::zlib_crc_hash(&frac_val, sizeof(frac_val), seed);
} else {
hash[i] = HashUtil::zlib_crc_hash(&_data[i], sizeof(ValueType), hash[i]);
}
}
}
template <typename T>
void FixedLengthColumnBase<T>::put_mysql_row_buffer(MysqlRowBuffer* buf, size_t idx) const {
if constexpr (IsDecimal<T>) {
buf->push_decimal(_data[idx].to_string());
} else if constexpr (std::is_arithmetic_v<T>) {
buf->push_number(_data[idx]);
} else {
// date/datetime or something else.
std::string s = _data[idx].to_string();
buf->push_string(s.data(), s.size());
}
}
template <typename T>
void FixedLengthColumnBase<T>::remove_first_n_values(size_t count) {
size_t remain_size = _data.size() - count;
memmove(_data.data(), _data.data() + count, remain_size * sizeof(T));
_data.resize(remain_size);
}
template <typename T>
std::string FixedLengthColumnBase<T>::debug_item(uint32_t idx) const {
std::stringstream ss;
if constexpr (sizeof(T) == 1) {
// for bool, int8_t
ss << (int)_data[idx];
} else {
ss << _data[idx];
}
return ss.str();
}
template <>
std::string FixedLengthColumnBase<int128_t>::debug_item(uint32_t idx) const {
std::stringstream ss;
starrocks::operator<<(ss, _data[idx]);
return ss.str();
}
template <typename T>
std::string FixedLengthColumnBase<T>::debug_string() const {
std::stringstream ss;
ss << "[";
size_t size = _data.size();
for (size_t i = 0; i + 1 < size; ++i) {
ss << debug_item(i) << ", ";
}
if (size > 0) {
ss << debug_item(size - 1);
}
ss << "]";
return ss.str();
}
template <typename T>
std::string FixedLengthColumnBase<T>::get_name() const {
if constexpr (IsDecimal<T>) {
return "decimal";
} else if constexpr (IsDate<T>) {
return "date";
} else if constexpr (IsTimestamp<T>) {
return "timestamp";
} else if constexpr (IsInt128<T>) {
return "int128";
} else if constexpr (std::is_floating_point_v<T>) {
return "float-" + std::to_string(sizeof(T));
} else {
return "integral-" + std::to_string(sizeof(T));
}
}
template class FixedLengthColumnBase<uint8_t>;
template class FixedLengthColumnBase<uint16_t>;
template class FixedLengthColumnBase<uint32_t>;
template class FixedLengthColumnBase<uint64_t>;
template class FixedLengthColumnBase<int8_t>;
template class FixedLengthColumnBase<int16_t>;
template class FixedLengthColumnBase<int32_t>;
template class FixedLengthColumnBase<int64_t>;
template class FixedLengthColumnBase<int96_t>;
template class FixedLengthColumnBase<int128_t>;
template class FixedLengthColumnBase<float>;
template class FixedLengthColumnBase<double>;
template class FixedLengthColumnBase<uint24_t>;
template class FixedLengthColumnBase<decimal12_t>;
template class FixedLengthColumnBase<DateValue>;
template class FixedLengthColumnBase<DecimalV2Value>;
template class FixedLengthColumnBase<TimestampValue>;
} // namespace starrocks::vectorized
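
serialize_column/deserialize_column above use a length-prefixed layout: a little-endian uint32 byte count followed by the raw value bytes. A standalone sketch of that layout using only the standard library (illustrative names, not the StarRocks API; it assumes a little-endian host where the real code uses encode_fixed32_le):

#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Write a little-endian uint32 length prefix followed by the raw element bytes.
template <typename T>
uint8_t* serialize_fixed_column(const std::vector<T>& data, uint8_t* dst) {
    uint32_t size = static_cast<uint32_t>(data.size() * sizeof(T));
    std::memcpy(dst, &size, sizeof(uint32_t)); // assumes a little-endian host
    dst += sizeof(uint32_t);
    std::memcpy(dst, data.data(), size);
    return dst + size;
}

template <typename T>
const uint8_t* deserialize_fixed_column(std::vector<T>* data, const uint8_t* src) {
    uint32_t size = 0;
    std::memcpy(&size, src, sizeof(uint32_t));
    src += sizeof(uint32_t);
    data->resize(size / sizeof(T));
    std::memcpy(data->data(), src, size);
    return src + size;
}

int main() {
    std::vector<int64_t> in{1, 2, 3}, out;
    std::vector<uint8_t> buf(sizeof(uint32_t) + in.size() * sizeof(int64_t));
    serialize_fixed_column(in, buf.data());
    deserialize_fixed_column(&out, buf.data());
    assert(out == in);
    return 0;
}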


@ -0,0 +1,178 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include <utility>
#include "column/column.h"
#include "runtime/date_value.h"
#include "runtime/decimalv2_value.h"
#include "runtime/timestamp_value.h"
#include "util/raw_container.h"
namespace starrocks::vectorized {
template <typename T>
constexpr bool IsDecimal = false;
template <>
inline constexpr bool IsDecimal<DecimalV2Value> = true;
template <typename T>
constexpr bool IsDate = false;
template <>
inline constexpr bool IsDate<DateValue> = true;
template <typename T>
constexpr bool IsTimestamp = false;
template <>
inline constexpr bool IsTimestamp<TimestampValue> = true;
template <typename T>
class FixedLengthColumnBase : public ColumnFactory<Column, FixedLengthColumnBase<T>> {
friend class ColumnFactory<Column, FixedLengthColumnBase>;
public:
using ValueType = T;
using Container = Buffer<ValueType>;
FixedLengthColumnBase() = default;
explicit FixedLengthColumnBase(const size_t n) : _data(n) {}
FixedLengthColumnBase(const size_t n, const ValueType x) : _data(n, x) {}
FixedLengthColumnBase(const FixedLengthColumnBase& src) : _data(src._data.begin(), src._data.end()) {}
// Only used as the underlying type of other column types (e.g. DecimalV3Column). C++ lacks
// Go-style delegation for composite types, so we have to use inheritance to wrap the
// underlying type instead. When constructing a wrapper object, the wrapped object must be
// constructed first; the move constructor avoids an unnecessary, time-consuming copy.
FixedLengthColumnBase(FixedLengthColumnBase&& src) noexcept : _data(std::move(src._data)) {}
bool is_numeric() const override { return std::is_arithmetic_v<ValueType>; }
bool is_decimal() const override { return IsDecimal<ValueType>; }
bool is_date() const override { return IsDate<ValueType>; }
bool is_timestamp() const override { return IsTimestamp<ValueType>; }
const uint8_t* raw_data() const override { return reinterpret_cast<const uint8_t*>(_data.data()); }
uint8_t* mutable_raw_data() override { return reinterpret_cast<uint8_t*>(_data.data()); }
size_t type_size() const override { return sizeof(T); }
size_t size() const override { return _data.size(); }
size_t byte_size() const override { return _data.size() * sizeof(ValueType); }
size_t byte_size(size_t idx __attribute__((unused))) const override { return sizeof(ValueType); }
void reserve(size_t n) override { _data.reserve(n); }
void resize(size_t n) override { _data.resize(n); }
void resize_uninitialized(size_t n) override { raw::stl_vector_resize_uninitialized(&_data, n); }
void assign(size_t n, size_t idx) override { _data.assign(n, _data[idx]); }
void remove_first_n_values(size_t count) override;
void append(const T value) { _data.emplace_back(value); }
void append_datum(const Datum& datum) override { _data.emplace_back(datum.get<ValueType>()); }
void append(const Column& src, size_t offset, size_t count) override;
void append_selective(const Column& src, const uint32_t* indexes, uint32_t from, uint32_t size) override;
void append_value_multiple_times(const Column& src, uint32_t index, uint32_t size) override;
bool append_nulls(size_t count __attribute__((unused))) override { return false; }
bool append_strings(const std::vector<Slice>& slices __attribute__((unused))) override { return false; }
size_t append_numbers(const void* buff, size_t length) override {
const size_t count = length / sizeof(ValueType);
const T* const ptr = reinterpret_cast<const T*>(buff);
_data.insert(_data.end(), ptr, ptr + count);
return count;
}
void append_value_multiple_times(const void* value, size_t count) override {
_data.insert(_data.end(), count, *reinterpret_cast<const T*>(value));
}
void append_default() override { _data.emplace_back(ValueType()); }
void append_default(size_t count) override { _data.resize(_data.size() + count); }
uint32_t serialize(size_t idx, uint8_t* pos) override;
uint32_t serialize_default(uint8_t* pos) override;
uint32_t max_one_element_serialize_size() const override { return sizeof(ValueType); }
void serialize_batch(uint8_t* dst, Buffer<uint32_t>& slice_sizes, size_t chunk_size,
uint32_t max_one_row_size) override;
size_t serialize_batch_at_interval(uint8_t* dst, size_t byte_offset, size_t byte_interval, size_t start,
size_t count) override;
const uint8_t* deserialize_and_append(const uint8_t* pos) override;
void deserialize_and_append_batch(std::vector<Slice>& srcs, size_t batch_size) override;
uint32_t serialize_size(size_t idx) const override { return sizeof(ValueType); }
size_t serialize_size() const override { return byte_size() + sizeof(uint32_t); }
uint8_t* serialize_column(uint8_t* dst) override;
const uint8_t* deserialize_column(const uint8_t* src) override;
MutableColumnPtr clone_empty() const override { return this->create_mutable(); }
size_t filter_range(const Column::Filter& filter, size_t from, size_t to) override;
int compare_at(size_t left, size_t right, const Column& rhs, int nan_direction_hint) const override;
void fvn_hash(uint32_t* hash, uint16_t from, uint16_t to) const override;
void crc32_hash(uint32_t* hash, uint16_t from, uint16_t to) const override;
void put_mysql_row_buffer(MysqlRowBuffer* buf, size_t idx) const override;
std::string get_name() const override;
Container& get_data() { return _data; }
const Container& get_data() const { return _data; }
Datum get(size_t n) const override { return Datum(_data[n]); }
std::string debug_item(uint32_t idx) const override;
std::string debug_string() const override;
size_t container_memory_usage() const override { return _data.capacity() * sizeof(ValueType); }
size_t shrink_memory_usage() const override { return _data.size() * sizeof(ValueType); }
void swap_column(Column& rhs) override {
auto& r = down_cast<FixedLengthColumnBase&>(rhs);
std::swap(this->_delete_state, r._delete_state);
std::swap(this->_data, r._data);
}
void reset_column() override {
Column::reset_column();
_data.clear();
}
protected:
Container _data;
};
} // namespace starrocks::vectorized
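
The comment on the move constructor above describes wrapping an already-built base column by inheritance and moving it in rather than copying its buffer. A minimal standalone sketch of that pattern (illustrative names, not the StarRocks classes):

#include <utility>
#include <vector>

// Underlying container type.
class BaseColumn {
public:
    BaseColumn() = default;
    explicit BaseColumn(std::vector<int> data) : _data(std::move(data)) {}
    BaseColumn(BaseColumn&& src) noexcept : _data(std::move(src._data)) {}
    size_t size() const { return _data.size(); }
protected:
    std::vector<int> _data;
};

// The wrapper adds metadata (precision/scale) but reuses the base storage,
// moving the already-built base in to avoid copying its buffer.
class DecoratedColumn : public BaseColumn {
public:
    DecoratedColumn(BaseColumn&& base, int precision, int scale)
            : BaseColumn(std::move(base)), _precision(precision), _scale(scale) {}
    int precision() const { return _precision; }
    int scale() const { return _scale; }
private:
    int _precision;
    int _scale;
};

int main() {
    BaseColumn base(std::vector<int>{1, 2, 3});
    DecoratedColumn col(std::move(base), 27, 9);
    return col.size() == 3 ? 0 : 1;
}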

59
be/src/column/hash_set.h Normal file

@ -0,0 +1,59 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include "column/column_hash.h"
#include "util/phmap/phmap.h"
#include "util/phmap/phmap_dump.h"
namespace starrocks::vectorized {
template <typename T>
using HashSet = phmap::flat_hash_set<T, StdHash<T>>;
// By storing the hash value in the slice, we save the cost of
// 1. re-calculating the hash value of the slice;
// 2. touching the slice memory area, which may cause high memory-access latency.
// The tradeoff is an extra 8-byte hash value per slice. Since all slice data is
// allocated from a single memory pool (4K per allocation), the internal
// fragmentation can absorb these 8 bytes.
class SliceWithHash : public Slice {
public:
size_t hash;
SliceWithHash(const Slice& src) : Slice(src.data, src.size) { hash = SliceHash()(src); }
SliceWithHash(const uint8_t* p, size_t s, size_t h) : Slice(p, s), hash(h) {}
};
class HashOnSliceWithHash {
public:
std::size_t operator()(const SliceWithHash& slice) const { return slice.hash; }
};
class EqualOnSliceWithHash {
public:
bool operator()(const SliceWithHash& x, const SliceWithHash& y) const {
// By comparing the hash values first, we can avoid comparing the real data,
// which may touch another memory area and have bad cache locality.
return x.hash == y.hash && memequal(x.data, x.size, y.data, y.size);
}
};
using SliceHashSet = phmap::flat_hash_set<SliceWithHash, HashOnSliceWithHash, EqualOnSliceWithHash>;
using SliceNormalHashSet = phmap::flat_hash_set<Slice, SliceHash, SliceNormalEqual>;
template <typename T>
struct PhSetTraits {
using SetType = HashSet<T>;
};
template <>
struct PhSetTraits<Slice> {
using SetType = SliceNormalHashSet;
};
template <typename T>
using PhSet = typename PhSetTraits<T>::SetType;
} // namespace starrocks::vectorized
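
A standalone illustration of the SliceWithHash idea above (compute the hash once, cache it with the key, and compare hashes before bytes), using only the standard library; the names below are illustrative and this is not the StarRocks API:

#include <cassert>
#include <functional>
#include <string>
#include <unordered_set>

// Key that carries its own pre-computed hash, so the table never rehashes the bytes.
struct StringWithHash {
    std::string data;
    size_t hash;
    explicit StringWithHash(std::string s)
            : data(std::move(s)), hash(std::hash<std::string>()(data)) {}
};

struct HashOnStringWithHash {
    size_t operator()(const StringWithHash& s) const { return s.hash; }
};

struct EqualOnStringWithHash {
    bool operator()(const StringWithHash& x, const StringWithHash& y) const {
        // Compare the cached hashes first to skip touching the key bytes on most mismatches.
        return x.hash == y.hash && x.data == y.data;
    }
};

using StringHashSet = std::unordered_set<StringWithHash, HashOnStringWithHash, EqualOnStringWithHash>;

int main() {
    StringHashSet set;
    set.emplace(StringWithHash("starrocks"));
    assert(set.count(StringWithHash("starrocks")) == 1);
    assert(set.count(StringWithHash("doris")) == 0);
    return 0;
}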


@ -0,0 +1,315 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#include "column/nullable_column.h"
#include <gutil/strings/fastmem.h>
#include "column/column_helper.h"
#include "gutil/casts.h"
#include "simd/simd.h"
#include "util/mysql_row_buffer.h"
namespace starrocks {
namespace vectorized {
NullableColumn::NullableColumn(MutableColumnPtr&& data_column, MutableColumnPtr&& null_column)
: _data_column(std::move(data_column)), _has_null(false) {
DCHECK(!_data_column->is_constant() && !_data_column->is_nullable())
<< "nullable column's data column must be a single column";
DCHECK(!null_column->is_constant() && !null_column->is_nullable())
<< "nullable column's null column must be a single column";
ColumnPtr ptr = std::move(null_column);
_null_column = std::static_pointer_cast<NullColumn>(ptr);
_has_null = SIMD::count_nonzero(_null_column->get_data());
}
NullableColumn::NullableColumn(ColumnPtr data_column, NullColumnPtr null_column)
: _data_column(std::move(data_column)),
_null_column(std::move(null_column)),
_has_null(SIMD::count_nonzero(_null_column->get_data())) {
DCHECK(!_data_column->is_constant() && !_data_column->is_nullable())
<< "nullable column's data column must be a single column";
DCHECK(!_null_column->is_constant() && !_null_column->is_nullable())
<< "nullable column's null column must be a single column";
}
void NullableColumn::append_datum(const Datum& datum) {
if (datum.is_null()) {
append_nulls(1);
} else {
_data_column->append_datum(datum);
null_column_data().emplace_back(0);
DCHECK_EQ(_null_column->size(), _data_column->size());
}
}
void NullableColumn::append(const Column& src, size_t offset, size_t count) {
DCHECK_EQ(_null_column->size(), _data_column->size());
if (src.is_nullable()) {
const auto& c = down_cast<const NullableColumn&>(src);
DCHECK_EQ(c._null_column->size(), c._data_column->size());
_null_column->append(*c._null_column, offset, count);
_data_column->append(*c._data_column, offset, count);
_has_null = _has_null || SIMD::count_nonzero(&(c._null_column->get_data()[offset]), count);
} else {
_null_column->resize(_null_column->size() + count);
_data_column->append(src, offset, count);
}
DCHECK_EQ(_null_column->size(), _data_column->size());
}
void NullableColumn::append_selective(const Column& src, const uint32_t* indexes, uint32_t from, uint32_t size) {
DCHECK_EQ(_null_column->size(), _data_column->size());
uint32_t orig_size = _null_column->size();
if (src.is_nullable()) {
const auto& src_column = down_cast<const NullableColumn&>(src);
DCHECK_EQ(src_column._null_column->size(), src_column._data_column->size());
_null_column->append_selective(*src_column._null_column, indexes, from, size);
_data_column->append_selective(*src_column._data_column, indexes, from, size);
_has_null = _has_null || SIMD::count_nonzero(&_null_column->get_data()[orig_size], size);
} else {
_null_column->resize(orig_size + size);
_data_column->append_selective(src, indexes, from, size);
}
DCHECK_EQ(_null_column->size(), _data_column->size());
}
void NullableColumn::append_value_multiple_times(const Column& src, uint32_t index, uint32_t size) {
DCHECK_EQ(_null_column->size(), _data_column->size());
uint32_t orig_size = _null_column->size();
if (src.is_nullable()) {
const auto& src_column = down_cast<const NullableColumn&>(src);
DCHECK_EQ(src_column._null_column->size(), src_column._data_column->size());
_null_column->append_value_multiple_times(*src_column._null_column, index, size);
_data_column->append_value_multiple_times(*src_column._data_column, index, size);
_has_null = _has_null || SIMD::count_nonzero(&_null_column->get_data()[orig_size], size);
} else {
_null_column->resize(orig_size + size);
_data_column->append_value_multiple_times(src, index, size);
}
DCHECK_EQ(_null_column->size(), _data_column->size());
}
bool NullableColumn::append_nulls(size_t count) {
DCHECK_GT(count, 0u);
_data_column->append_default(count);
null_column_data().insert(null_column_data().end(), count, 1);
DCHECK_EQ(_null_column->size(), _data_column->size());
_has_null = true;
return true;
}
bool NullableColumn::append_strings(const std::vector<Slice>& strs) {
if (_data_column->append_strings(strs)) {
null_column_data().resize(_null_column->size() + strs.size(), 0);
return true;
}
DCHECK_EQ(_null_column->size(), _data_column->size());
return false;
}
bool NullableColumn::append_strings_overflow(const std::vector<Slice>& strs, size_t max_length) {
if (_data_column->append_strings_overflow(strs, max_length)) {
null_column_data().resize(_null_column->size() + strs.size(), 0);
return true;
}
DCHECK_EQ(_null_column->size(), _data_column->size());
return false;
}
bool NullableColumn::append_continuous_strings(const std::vector<Slice>& strs) {
if (_data_column->append_continuous_strings(strs)) {
null_column_data().resize(_null_column->size() + strs.size(), 0);
return true;
}
DCHECK_EQ(_null_column->size(), _data_column->size());
return false;
}
size_t NullableColumn::append_numbers(const void* buff, size_t length) {
size_t n;
if ((n = _data_column->append_numbers(buff, length)) > 0) {
null_column_data().insert(null_column_data().end(), n, 0);
}
DCHECK_EQ(_null_column->size(), _data_column->size());
return n;
}
void NullableColumn::append_value_multiple_times(const void* value, size_t count) {
_data_column->append_value_multiple_times(value, count);
null_column_data().insert(null_column_data().end(), count, 0);
}
size_t NullableColumn::filter_range(const Column::Filter& filter, size_t from, size_t to) {
auto s1 = _data_column->filter_range(filter, from, to);
auto s2 = _null_column->filter_range(filter, from, to);
update_has_null();
DCHECK_EQ(s1, s2);
return s1;
}
int NullableColumn::compare_at(size_t left, size_t right, const Column& rhs, int nan_direction_hint) const {
if (immutable_null_column_data()[left]) {
return rhs.is_null(right) ? 0 : nan_direction_hint;
}
if (rhs.is_nullable()) {
const NullableColumn& nullable_rhs = down_cast<const NullableColumn&>(rhs);
if (nullable_rhs.immutable_null_column_data()[right]) {
return -nan_direction_hint;
}
const auto& rhs_data = *(nullable_rhs._data_column);
return _data_column->compare_at(left, right, rhs_data, nan_direction_hint);
} else {
return _data_column->compare_at(left, right, rhs, nan_direction_hint);
}
}
uint32_t NullableColumn::serialize(size_t idx, uint8_t* pos) {
// Fast path: the column has no NULL values, so write a false null flag followed by the value.
if (!_has_null) {
strings::memcpy_inlined(pos, &_has_null, sizeof(bool));
return sizeof(bool) + _data_column->serialize(idx, pos + sizeof(bool));
}
bool null = _null_column->get_data()[idx];
strings::memcpy_inlined(pos, &null, sizeof(bool));
if (null) {
return sizeof(bool);
}
return sizeof(bool) + _data_column->serialize(idx, pos + sizeof(bool));
}
uint32_t NullableColumn::serialize_default(uint8_t* pos) {
bool null = true;
strings::memcpy_inlined(pos, &null, sizeof(bool));
return sizeof(bool);
}
size_t NullableColumn::serialize_batch_at_interval(uint8_t* dst, size_t byte_offset, size_t byte_interval, size_t start,
size_t count) {
_null_column->serialize_batch_at_interval(dst, byte_offset, byte_interval, start, count);
for (size_t i = start; i < start + count; i++) {
if (_null_column->get_data()[i] == 0) {
_data_column->serialize(i, dst + (i - start) * byte_interval + byte_offset + 1);
} else {
_data_column->serialize_default(dst + (i - start) * byte_interval + byte_offset + 1);
}
}
return _null_column->type_size() + _data_column->type_size();
}
void NullableColumn::serialize_batch(uint8_t* dst, Buffer<uint32_t>& slice_sizes, size_t chunk_size,
uint32_t max_one_row_size) {
for (size_t i = 0; i < chunk_size; ++i) {
slice_sizes[i] += serialize(i, dst + i * max_one_row_size + slice_sizes[i]);
}
}
const uint8_t* NullableColumn::deserialize_and_append(const uint8_t* pos) {
bool null;
memcpy(&null, pos, sizeof(bool));
pos += sizeof(bool);
null_column_data().emplace_back(null);
if (null == 0) {
pos = _data_column->deserialize_and_append(pos);
} else {
_has_null = true;
_data_column->append_default();
}
return pos;
}
void NullableColumn::deserialize_and_append_batch(std::vector<Slice>& srcs, size_t batch_size) {
for (size_t i = 0; i < batch_size; ++i) {
srcs[i].data = (char*)deserialize_and_append((uint8_t*)srcs[i].data);
}
}
uint8_t* NullableColumn::serialize_column(uint8_t* dst) {
dst = _null_column->serialize_column(dst);
return _data_column->serialize_column(dst);
}
const uint8_t* NullableColumn::deserialize_column(const uint8_t* src) {
src = _null_column->deserialize_column(src);
_has_null = SIMD::count_nonzero(_null_column->get_data());
return _data_column->deserialize_column(src);
}
// Note: the hash function must be kept consistent with RawValue::get_hash_value_fvn
void NullableColumn::fvn_hash(uint32_t* hash, uint16_t from, uint16_t to) const {
// fast path when _has_null is false
if (!_has_null) {
_data_column->fvn_hash(hash, from, to);
return;
}
auto null_data = _null_column->get_data();
uint32_t value = 0x9e3779b9;
while (from < to) {
uint16_t new_from = from + 1;
while (new_from < to && null_data[from] == null_data[new_from]) {
++new_from;
}
if (null_data[from]) {
for (uint16_t i = from; i < new_from; ++i) {
hash[i] = hash[i] ^ (value + (hash[i] << 6) + (hash[i] >> 2));
}
} else {
_data_column->fvn_hash(hash, from, new_from);
}
from = new_from;
}
}
void NullableColumn::crc32_hash(uint32_t* hash, uint16_t from, uint16_t to) const {
// fast path when _has_null is false
if (!_has_null) {
_data_column->crc32_hash(hash, from, to);
return;
}
auto null_data = _null_column->get_data();
// NULL is treated as 0 when computing the crc32 hash for data loading
static const int INT_VALUE = 0;
while (from < to) {
uint16_t new_from = from + 1;
while (new_from < to && null_data[from] == null_data[new_from]) {
++new_from;
}
if (null_data[from]) {
for (uint16_t i = from; i < new_from; ++i) {
hash[i] = HashUtil::zlib_crc_hash(&INT_VALUE, 4, hash[i]);
}
} else {
_data_column->crc32_hash(hash, from, new_from);
}
from = new_from;
}
}
void NullableColumn::put_mysql_row_buffer(MysqlRowBuffer* buf, size_t idx) const {
if (_has_null && _null_column->get_data()[idx]) {
buf->push_null();
} else {
_data_column->put_mysql_row_buffer(buf, idx);
}
}
} // namespace vectorized
} // namespace starrocks
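
NullableColumn::serialize above writes each row as a one-byte null flag followed by the value bytes only when the flag is 0. A standalone sketch of that per-row layout for a fixed-width int64 payload (illustrative names, not the StarRocks API):

#include <cassert>
#include <cstdint>
#include <cstring>

// Per-row layout used by NullableColumn::serialize above:
// [1-byte null flag][value bytes, present only when the flag is 0].
size_t serialize_nullable_int64(bool is_null, int64_t value, uint8_t* pos) {
    std::memcpy(pos, &is_null, sizeof(bool));
    if (is_null) {
        return sizeof(bool);
    }
    std::memcpy(pos + sizeof(bool), &value, sizeof(int64_t));
    return sizeof(bool) + sizeof(int64_t);
}

const uint8_t* deserialize_nullable_int64(const uint8_t* pos, bool* is_null, int64_t* value) {
    std::memcpy(is_null, pos, sizeof(bool));
    pos += sizeof(bool);
    if (*is_null) {
        return pos;
    }
    std::memcpy(value, pos, sizeof(int64_t));
    return pos + sizeof(int64_t);
}

int main() {
    uint8_t buf[16];
    size_t n = serialize_nullable_int64(false, 42, buf);
    bool null = true;
    int64_t v = 0;
    deserialize_nullable_int64(buf, &null, &v);
    assert(n == sizeof(bool) + sizeof(int64_t) && !null && v == 42);
    return 0;
}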


@ -0,0 +1,282 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include "column/fixed_length_column.h"
#include "common/logging.h"
namespace starrocks::vectorized {
using NullData = FixedLengthColumn<uint8_t>::Container;
using NullColumn = FixedLengthColumn<uint8_t>;
using NullColumnPtr = FixedLengthColumn<uint8_t>::Ptr;
using NullColumns = std::vector<NullColumnPtr>;
using NullValueType = NullColumn::ValueType;
static constexpr NullValueType DATUM_NULL = NullValueType(1);
static constexpr NullValueType DATUM_NOT_NULL = NullValueType(0);
class NullableColumn final : public ColumnFactory<Column, NullableColumn> {
friend class ColumnFactory<Column, NullableColumn>;
public:
NullableColumn(MutableColumnPtr&& data_column, MutableColumnPtr&& null_column);
NullableColumn(ColumnPtr data_column, NullColumnPtr null_column);
// Copy constructor
NullableColumn(const NullableColumn& rhs)
: _data_column(rhs._data_column->clone_shared()),
_null_column(std::static_pointer_cast<NullColumn>(rhs._null_column->clone_shared())),
_has_null(rhs._has_null) {}
// Move constructor
NullableColumn(NullableColumn&& rhs)
: _data_column(std::move(rhs._data_column)),
_null_column(std::move(rhs._null_column)),
_has_null(rhs._has_null) {}
// Copy assignment
NullableColumn& operator=(const NullableColumn& rhs) {
NullableColumn tmp(rhs);
this->swap_column(tmp);
return *this;
}
// Move assignment
NullableColumn& operator=(NullableColumn&& rhs) noexcept {
NullableColumn tmp(std::move(rhs));
this->swap_column(tmp);
return *this;
}
~NullableColumn() override = default;
bool has_null() const override { return _has_null; }
void set_has_null(bool has_null) { _has_null = _has_null | has_null; }
void update_has_null() {
const NullColumn::Container& v = _null_column->get_data();
const auto* p = v.data();
_has_null = (p != nullptr) && (nullptr != memchr(p, 1, v.size() * sizeof(v[0])));
}
bool is_nullable() const override { return true; }
bool is_null(size_t index) const override {
DCHECK_EQ(_null_column->size(), _data_column->size());
return _has_null && immutable_null_column_data()[index];
}
bool low_cardinality() const override { return false; }
const uint8_t* raw_data() const override { return _data_column->raw_data(); }
uint8_t* mutable_raw_data() override { return reinterpret_cast<uint8_t*>(_data_column->mutable_raw_data()); }
size_t size() const override {
DCHECK_EQ(_data_column->size(), _null_column->size());
return _data_column->size();
}
size_t type_size() const override { return _data_column->type_size() + _null_column->type_size(); }
size_t byte_size() const override { return byte_size(0, size()); }
size_t byte_size(size_t from, size_t size) const override {
DCHECK_LE(from + size, this->size()) << "Range error";
return _data_column->byte_size(from, size) + _null_column->Column::byte_size(from, size);
}
size_t byte_size(size_t idx) const override { return _data_column->byte_size(idx) + sizeof(bool); }
void reserve(size_t n) override {
_data_column->reserve(n);
_null_column->reserve(n);
}
void resize(size_t n) override {
_data_column->resize(n);
_null_column->resize(n);
}
void assign(size_t n, size_t idx) override {
_data_column->assign(n, idx);
_null_column->assign(n, idx);
}
void remove_first_n_values(size_t count) override {
_data_column->remove_first_n_values(count);
_null_column->remove_first_n_values(count);
}
void append_datum(const Datum& datum) override;
void append(const Column& src, size_t offset, size_t count) override;
void append_selective(const Column& src, const uint32_t* indexes, uint32_t from, uint32_t size) override;
void append_value_multiple_times(const Column& src, uint32_t index, uint32_t size) override;
bool append_nulls(size_t count) override;
bool append_strings(const std::vector<Slice>& strs) override;
bool append_strings_overflow(const std::vector<Slice>& strs, size_t max_length) override;
bool append_continuous_strings(const std::vector<Slice>& strs) override;
size_t append_numbers(const void* buff, size_t length) override;
void append_value_multiple_times(const void* value, size_t count) override;
void append_default() override { append_nulls(1); }
void append_default_not_null_value() {
_data_column->append_default();
_null_column->append(0);
}
void append_default(size_t count) override { append_nulls(count); }
uint32_t max_one_element_serialize_size() const override {
return sizeof(bool) + _data_column->max_one_element_serialize_size();
}
uint32_t serialize(size_t idx, uint8_t* pos) override;
uint32_t serialize_default(uint8_t* pos) override;
void serialize_batch(uint8_t* dst, Buffer<uint32_t>& slice_sizes, size_t chunk_size,
uint32_t max_one_row_size) override;
const uint8_t* deserialize_and_append(const uint8_t* pos) override;
void deserialize_and_append_batch(std::vector<Slice>& srcs, size_t batch_size) override;
uint32_t serialize_size(size_t idx) const override {
if (_null_column->get_data()[idx]) {
return sizeof(uint8_t);
}
return sizeof(uint8_t) + _data_column->serialize_size(idx);
}
size_t serialize_size() const override { return _data_column->serialize_size() + _null_column->serialize_size(); }
uint8_t* serialize_column(uint8_t* dst) override;
const uint8_t* deserialize_column(const uint8_t* src) override;
MutableColumnPtr clone_empty() const override {
return create_mutable(_data_column->clone_empty(), _null_column->clone_empty());
}
size_t serialize_batch_at_interval(uint8_t* dst, size_t byte_offset, size_t byte_interval, size_t start,
size_t count) override;
size_t filter_range(const Column::Filter& filter, size_t from, size_t to) override;
int compare_at(size_t left, size_t right, const Column& rhs, int nan_direction_hint) const override;
void fvn_hash(uint32_t* hash, uint16_t from, uint16_t to) const override;
void crc32_hash(uint32_t* hash, uint16_t from, uint16_t to) const override;
void put_mysql_row_buffer(MysqlRowBuffer* buf, size_t idx) const override;
std::string get_name() const override { return "nullable-" + _data_column->get_name(); }
NullData& null_column_data() { return _null_column->get_data(); }
const NullData& immutable_null_column_data() const { return _null_column->get_data(); }
Column* mutable_data_column() { return _data_column.get(); }
NullColumn* mutable_null_column() { return _null_column.get(); }
const Column& data_column_ref() const { return *_data_column; }
const ColumnPtr& data_column() const { return _data_column; }
ColumnPtr& data_column() { return _data_column; }
const NullColumnPtr& null_column() const { return _null_column; }
Datum get(size_t n) const override {
if (_has_null && _null_column->get_data()[n]) {
return Datum();
} else {
return _data_column->get(n);
}
}
bool set_null(size_t idx) override {
null_column_data()[idx] = 1;
_has_null = true;
return true;
}
size_t memory_usage() const override {
return _data_column->memory_usage() + _null_column->memory_usage() + sizeof(bool);
}
size_t shrink_memory_usage() const override {
return _data_column->shrink_memory_usage() + _null_column->shrink_memory_usage() + sizeof(bool);
}
size_t container_memory_usage() const override {
return _data_column->container_memory_usage() + _null_column->container_memory_usage();
}
size_t element_memory_usage(size_t from, size_t size) const override {
DCHECK_LE(from + size, this->size()) << "Range error";
return _data_column->element_memory_usage(from, size) + _null_column->element_memory_usage(from, size);
}
void swap_column(Column& rhs) override {
auto& r = down_cast<NullableColumn&>(rhs);
_data_column->swap_column(*r._data_column);
_null_column->swap_column(*r._null_column);
std::swap(_delete_state, r._delete_state);
std::swap(_has_null, r._has_null);
}
void reset_column() override {
Column::reset_column();
_data_column->reset_column();
_null_column->reset_column();
_has_null = false;
}
std::string debug_item(uint32_t idx) const override {
DCHECK(_null_column->size() == _data_column->size());
std::stringstream ss;
if (_null_column->get_data()[idx]) {
ss << "NULL";
} else {
ss << _data_column->debug_item(idx);
}
return ss.str();
}
std::string debug_string() const override {
DCHECK(_null_column->size() == _data_column->size());
std::stringstream ss;
ss << "[";
int size = _data_column->size();
for (int i = 0; i < size - 1; ++i) {
ss << debug_item(i) << ", ";
}
if (size > 0) {
ss << debug_item(size - 1);
}
ss << "]";
return ss.str();
}
private:
ColumnPtr _data_column;
NullColumnPtr _null_column;
bool _has_null;
};
} // namespace starrocks::vectorized

View File

@ -0,0 +1,299 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#include "column/object_column.h"
#include "gutil/casts.h"
#include "storage/hll.h"
#include "util/bitmap_value.h"
#include "util/mysql_row_buffer.h"
namespace starrocks::vectorized {
template <typename T>
size_t ObjectColumn<T>::byte_size(size_t from, size_t size) const {
DCHECK_LE(from + size, this->size()) << "Range error";
size_t byte_size = 0;
for (int i = 0; i < size; ++i) {
byte_size += _pool[from + i].serialize_size();
}
return byte_size;
}
template <typename T>
size_t ObjectColumn<T>::byte_size(size_t idx) const {
DCHECK(false) << "Don't support object column byte size";
return 0;
}
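// assign() below shrinks the column to the single element at `idx` and then
// duplicates that value until the column holds `n` copies of it.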
template <typename T>
void ObjectColumn<T>::assign(size_t n, size_t idx) {
_pool[0] = std::move(_pool[idx]);
_pool.resize(1);
_pool.reserve(n);
for (int i = 1; i < n; ++i) {
append(&_pool[0]);
}
_cache_ok = false;
}
template <typename T>
void ObjectColumn<T>::append(const T* object) {
_pool.emplace_back(*object);
_cache_ok = false;
}
template <typename T>
void ObjectColumn<T>::append(T&& object) {
_pool.emplace_back(std::move(object));
_cache_ok = false;
}
template <typename T>
void ObjectColumn<T>::remove_first_n_values(size_t count) {
size_t remain_size = _pool.size() - count;
for (int i = 0; i < remain_size; ++i) {
_pool[i] = std::move(_pool[count + i]);
}
_pool.resize(remain_size);
_cache_ok = false;
}
template <typename T>
void ObjectColumn<T>::append(const Column& src, size_t offset, size_t count) {
const auto& obj_col = down_cast<const ObjectColumn<T>&>(src);
for (int i = offset; i < count + offset; ++i) {
append(obj_col.get_object(i));
}
}
template <typename T>
void ObjectColumn<T>::append_selective(const starrocks::vectorized::Column& src, const uint32_t* indexes, uint32_t from,
uint32_t size) {
const auto& obj_col = down_cast<const ObjectColumn<T>&>(src);
for (int j = 0; j < size; ++j) {
append(obj_col.get_object(indexes[from + j]));
}
};
template <typename T>
void ObjectColumn<T>::append_value_multiple_times(const starrocks::vectorized::Column& src, uint32_t index,
uint32_t size) {
const auto& obj_col = down_cast<const ObjectColumn<T>&>(src);
for (int j = 0; j < size; ++j) {
append(obj_col.get_object(index));
}
};
template <typename T>
bool ObjectColumn<T>::append_strings(const vector<starrocks::Slice>& strs) {
_pool.reserve(_pool.size() + strs.size());
for (const Slice& s : strs) {
_pool.emplace_back(s);
}
_cache_ok = false;
return true;
}
template <typename T>
void ObjectColumn<T>::append_value_multiple_times(const void* value, size_t count) {
const Slice* slice = reinterpret_cast<const Slice*>(value);
_pool.reserve(_pool.size() + count);
for (int i = 0; i < count; ++i) {
_pool.emplace_back(*slice);
}
_cache_ok = false;
};
template <typename T>
void ObjectColumn<T>::append_default() {
_pool.emplace_back(T());
_cache_ok = false;
}
template <typename T>
void ObjectColumn<T>::append_default(size_t count) {
for (int i = 0; i < count; ++i) {
append_default();
}
}
template <typename T>
uint32_t ObjectColumn<T>::serialize(size_t idx, uint8_t* pos) {
DCHECK(false) << "Don't support object column serialize";
return 0;
}
template <typename T>
uint32_t ObjectColumn<T>::serialize_default(uint8_t* pos) {
DCHECK(false) << "Don't support object column serialize";
return 0;
}
template <typename T>
void ObjectColumn<T>::serialize_batch(uint8_t* dst, Buffer<uint32_t>& slice_sizes, size_t chunk_size,
uint32_t max_one_row_size) {
DCHECK(false) << "Don't support object column serialize batch";
}
template <typename T>
const uint8_t* ObjectColumn<T>::deserialize_and_append(const uint8_t* pos) {
DCHECK(false) << "Don't support object column deserialize and append";
return pos;
}
template <typename T>
void ObjectColumn<T>::deserialize_and_append_batch(std::vector<Slice>& srcs, size_t batch_size) {
DCHECK(false) << "Don't support object column deserialize and append";
}
template <typename T>
uint32_t ObjectColumn<T>::serialize_size(size_t idx) const {
DCHECK(false) << "Don't support object column byte size";
return 0;
}
template <typename T>
size_t ObjectColumn<T>::serialize_size() const {
// | count(4 bytes) | size(8 bytes) | object(size bytes) | size(8 bytes) | ...
return byte_size() + sizeof(uint32_t) + _pool.size() * sizeof(uint64_t);
}
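// Worked example of the layout above (hypothetical sizes): a column holding two
// objects whose serialized sizes are 10 and 20 bytes occupies
//   4 (count) + 8 + 10 + 8 + 20 = 50 bytes
// when written by serialize_column() below.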
template <typename T>
uint8_t* ObjectColumn<T>::serialize_column(uint8_t* dst) {
encode_fixed32_le(dst, _pool.size());
dst += sizeof(uint32_t);
for (int i = 0; i < _pool.size(); ++i) {
uint64_t actual = _pool[i].serialize(dst + sizeof(uint64_t));
encode_fixed64_le(dst, actual);
dst += sizeof(uint64_t);
dst += actual;
}
return dst;
}
template <typename T>
const uint8_t* ObjectColumn<T>::deserialize_column(const uint8_t* src) {
uint32_t count = decode_fixed32_le(src);
src += sizeof(uint32_t);
for (int i = 0; i < count; ++i) {
uint64_t size = decode_fixed64_le(src);
src += sizeof(uint64_t);
_pool.emplace_back(Slice(src, size));
src += size;
}
return src;
}
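// filter_range() keeps only the rows in [from, to) whose filter value is non-zero,
// compacting them toward `from`, then moves the untouched tail [to, old_size)
// forward and returns the new size.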
template <typename T>
size_t ObjectColumn<T>::filter_range(const Column::Filter& filter, size_t from, size_t to) {
size_t old_sz = size();
size_t new_sz = from;
for (auto i = from; i < to; ++i) {
if (filter[i]) {
std::swap(_pool[new_sz], _pool[i]);
new_sz++;
}
}
DCHECK_LE(new_sz, to);
if (new_sz < to) {
for (int i = to; i < old_sz; i++) {
std::swap(_pool[new_sz], _pool[i]);
new_sz++;
}
}
_pool.resize(new_sz);
return new_sz;
}
template <typename T>
int ObjectColumn<T>::compare_at(size_t left, size_t right, const starrocks::vectorized::Column& rhs,
int nan_direction_hint) const {
DCHECK(false) << "Don't support object column compare_at";
return 0;
}
template <typename T>
void ObjectColumn<T>::fvn_hash(uint32_t* hash, uint16_t from, uint16_t to) const {
std::string s;
for (int i = from; i < to; ++i) {
s.resize(_pool[i].serialize_size());
int32_t size = _pool[i].serialize(reinterpret_cast<uint8_t*>(s.data()));
hash[i] = HashUtil::fnv_hash(s.data(), size, hash[i]);
}
}
template <typename T>
void ObjectColumn<T>::crc32_hash(uint32_t* hash, uint16_t from, uint16_t to) const {
DCHECK(false) << "object column shouldn't call crc32_hash ";
}
template <typename T>
void ObjectColumn<T>::put_mysql_row_buffer(starrocks::MysqlRowBuffer* buf, size_t idx) const {
buf->push_null();
}
template <typename T>
void ObjectColumn<T>::_build_slices() const {
// TODO(kks): improve this
_buffer.clear();
_slices.clear();
// FIXME(kks): is the bitmap's own compression more effective than LZ4 compression?
// Do we really need to compress the bitmap here?
if constexpr (std::is_same_v<T, BitmapValue>) {
for (int i = 0; i < _pool.size(); ++i) {
_pool[i].compress();
}
}
size_t size = byte_size();
_buffer.resize(size);
_slices.reserve(_pool.size());
size_t old_size = 0;
for (int i = 0; i < _pool.size(); ++i) {
int32_t slice_size = _pool[i].serialize(_buffer.data() + old_size);
_slices.emplace_back(Slice(_buffer.data() + old_size, slice_size));
old_size += slice_size;
}
}
template <typename T>
MutableColumnPtr ObjectColumn<T>::clone() const {
auto p = clone_empty();
p->append(*this, 0, size());
return p;
}
template <typename T>
ColumnPtr ObjectColumn<T>::clone_shared() const {
auto p = clone_empty();
p->append(*this, 0, size());
return p;
}
template <typename T>
std::string ObjectColumn<T>::debug_item(uint32_t idx) const {
return "";
}
template <>
std::string ObjectColumn<BitmapValue>::debug_item(uint32_t idx) const {
return _pool[idx].to_string();
}
template class ObjectColumn<HyperLogLog>;
template class ObjectColumn<BitmapValue>;
template class ObjectColumn<PercentileValue>;
} // namespace starrocks::vectorized

View File

@ -0,0 +1,221 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include <memory>
#include "column/column.h"
#include "common/object_pool.h"
#include "util/bitmap_value.h"
namespace starrocks::vectorized {
//class Object {
// Object();
//
// Object(const Slice& s);
//
// void clear();
//
// size_t serialize_size() const;
// size_t serialize(uint8_t* dst) const;
//};
template <typename T>
class ObjectColumn final : public ColumnFactory<Column, ObjectColumn<T>> {
friend class ColumnFactory<Column, ObjectColumn>;
public:
using ValueType = T;
ObjectColumn() = default;
ObjectColumn(const ObjectColumn& column) { DCHECK(false) << "Can't copy construct object column"; }
ObjectColumn(ObjectColumn&& object_column) noexcept : _pool(std::move(object_column._pool)), _cache_ok(false) {}
void operator=(const ObjectColumn&) = delete;
ObjectColumn& operator=(ObjectColumn&& rhs) noexcept {
ObjectColumn tmp(std::move(rhs));
this->swap_column(tmp);
return *this;
}
~ObjectColumn() override = default;
bool is_object() const override { return true; }
const uint8_t* raw_data() const override {
_build_slices();
return reinterpret_cast<const uint8_t*>(_slices.data());
}
uint8_t* mutable_raw_data() override {
_build_slices();
return reinterpret_cast<uint8_t*>(_slices.data());
}
size_t size() const override { return _pool.size(); }
size_t type_size() const override { return sizeof(T); }
size_t byte_size() const override { return byte_size(0, size()); }
size_t byte_size(size_t from, size_t size) const override;
size_t byte_size(size_t idx) const override;
void reserve(size_t n) override { _pool.reserve(n); }
void resize(size_t n) override { _pool.resize(n); }
void assign(size_t n, size_t idx) override;
void append(const T* object);
void append(T&& object);
void append_datum(const Datum& datum) override { append(datum.get<T*>()); }
void remove_first_n_values(size_t count) override;
void append(const Column& src, size_t offset, size_t count) override;
void append_selective(const Column& src, const uint32_t* indexes, uint32_t from, uint32_t size) override;
void append_value_multiple_times(const Column& src, uint32_t index, uint32_t size) override;
bool append_nulls(size_t count) override { return false; }
bool append_strings(const std::vector<Slice>& strs) override;
size_t append_numbers(const void* buff, size_t length) override { return -1; }
// append from slice, called in SCAN_NODE to append default values
void append_value_multiple_times(const void* value, size_t count) override;
void append_default() override;
void append_default(size_t count) override;
uint32_t serialize(size_t idx, uint8_t* pos) override;
uint32_t serialize_default(uint8_t* pos) override;
void serialize_batch(uint8_t* dst, Buffer<uint32_t>& slice_sizes, size_t chunk_size,
uint32_t max_one_row_size) override;
const uint8_t* deserialize_and_append(const uint8_t* pos) override;
void deserialize_and_append_batch(std::vector<Slice>& srcs, size_t batch_size) override;
uint32_t serialize_size(size_t idx) const override;
size_t serialize_size() const override;
uint8_t* serialize_column(uint8_t* dst) override;
const uint8_t* deserialize_column(const uint8_t* src) override;
MutableColumnPtr clone_empty() const override { return this->create_mutable(); }
MutableColumnPtr clone() const override;
ColumnPtr clone_shared() const override;
size_t filter_range(const Column::Filter& filter, size_t from, size_t to) override;
int compare_at(size_t left, size_t right, const Column& rhs, int nan_direction_hint) const override;
void fvn_hash(uint32_t* seed, uint16_t from, uint16_t to) const override;
void crc32_hash(uint32_t* hash, uint16_t from, uint16_t to) const override;
void put_mysql_row_buffer(MysqlRowBuffer* buf, size_t idx) const override;
std::string get_name() const override { return std::string{"object"}; }
T* get_object(size_t n) const { return const_cast<T*>(&_pool[n]); }
Buffer<T*>& get_data() {
_build_cache();
return _cache;
}
const Buffer<T*>& get_data() const {
_build_cache();
return _cache;
}
Datum get(size_t n) const override { return Datum(get_object(n)); }
size_t shrink_memory_usage() const override { return _pool.size() * type_size() + byte_size(); }
size_t container_memory_usage() const override { return _pool.capacity() * type_size(); }
size_t element_memory_usage() const override { return byte_size(); }
size_t element_memory_usage(size_t from, size_t size) const override { return byte_size(from, size); }
void swap_column(Column& rhs) override {
auto& r = down_cast<ObjectColumn&>(rhs);
std::swap(this->_delete_state, r._delete_state);
std::swap(this->_pool, r._pool);
std::swap(this->_cache_ok, r._cache_ok);
std::swap(this->_cache, r._cache);
std::swap(this->_buffer, r._buffer);
std::swap(this->_slices, r._slices);
}
void reset_column() override {
Column::reset_column();
_pool.clear();
_cache_ok = false;
_cache.clear();
_slices.clear();
_buffer.clear();
}
std::vector<T>& get_pool() { return _pool; }
const std::vector<T>& get_pool() const { return _pool; }
std::string debug_item(uint32_t idx) const;
std::string debug_string() const override {
std::stringstream ss;
ss << "[";
size_t size = this->size();
for (size_t i = 0; i + 1 < size; ++i) {
ss << debug_item(i) << ", ";
}
// guard against an empty column
if (size > 0) {
ss << debug_item(size - 1);
}
ss << "]";
return ss.str();
}
private:
void _build_cache() const {
if (_cache_ok) {
return;
}
_cache.clear();
_cache.reserve(_pool.size());
for (int i = 0; i < _pool.size(); ++i) {
_cache.emplace_back(const_cast<T*>(&_pool[i]));
}
_cache_ok = true;
}
// Currently, only for data loading
void _build_slices() const;
private:
std::vector<T> _pool;
mutable bool _cache_ok = false;
mutable Buffer<T*> _cache;
// Only for data loading
mutable std::vector<Slice> _slices;
mutable std::vector<uint8_t> _buffer;
};
} // namespace starrocks::vectorized

84
be/src/column/schema.cpp Normal file
View File

@ -0,0 +1,84 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#include "column/schema.h"
#include <algorithm>
namespace starrocks::vectorized {
Schema::Schema(Fields fields) : _fields(std::move(fields)) {
auto is_key = [](const FieldPtr& f) { return f->is_key(); };
_num_keys = std::count_if(_fields.begin(), _fields.end(), is_key);
_build_index_map(_fields);
}
void Schema::append(const FieldPtr& field) {
_fields.emplace_back(field);
_name_to_index.emplace(field->name(), _fields.size() - 1);
_num_keys += field->is_key();
}
void Schema::insert(size_t idx, const FieldPtr& field) {
DCHECK_LT(idx, _fields.size());
_fields.emplace(_fields.begin() + idx, field);
_name_to_index.clear();
_num_keys += field->is_key();
_build_index_map(_fields);
}
void Schema::remove(size_t idx) {
DCHECK_LT(idx, _fields.size());
_num_keys -= _fields[idx]->is_key();
if (idx == _fields.size() - 1) {
_name_to_index.erase(_fields[idx]->name());
_fields.erase(_fields.begin() + idx);
} else {
_fields.erase(_fields.begin() + idx);
_name_to_index.clear();
_build_index_map(_fields);
}
}
void Schema::set_fields(Fields fields) {
_fields = std::move(fields);
auto is_key = [](const FieldPtr& f) { return f->is_key(); };
_num_keys = std::count_if(_fields.begin(), _fields.end(), is_key);
_build_index_map(_fields);
}
const FieldPtr& Schema::field(size_t idx) const {
DCHECK_GE(idx, 0);
DCHECK_LT(idx, _fields.size());
return _fields[idx];
}
std::vector<std::string> Schema::field_names() const {
std::vector<std::string> names;
names.reserve(_fields.size());
for (const auto& field : _fields) {
names.push_back(field->name());
}
return names;
}
FieldPtr Schema::get_field_by_name(const std::string& name) const {
size_t idx = get_field_index_by_name(name);
return idx == -1 ? nullptr : _fields[idx];
}
void Schema::_build_index_map(const Fields& fields) {
for (size_t i = 0; i < fields.size(); i++) {
_name_to_index.emplace(fields[i]->name(), i);
}
}
size_t Schema::get_field_index_by_name(const std::string& name) const {
auto p = _name_to_index.find(name);
if (p == _name_to_index.end()) {
return -1;
}
return p->second;
}
} // namespace starrocks::vectorized

79
be/src/column/schema.h Normal file
View File

@ -0,0 +1,79 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include <utility>
#include "column/field.h"
namespace starrocks::vectorized {
class Schema {
public:
Schema() = default;
explicit Schema(Fields fields);
size_t num_fields() const { return _fields.size(); }
size_t num_key_fields() const { return _num_keys; }
void reserve(size_t size) {
_fields.reserve(size);
_name_to_index.reserve(size);
}
void append(const FieldPtr& field);
void insert(size_t idx, const FieldPtr& field);
void remove(size_t idx);
void set_fields(Fields fields);
const FieldPtr& field(size_t idx) const;
const Fields& fields() const { return _fields; }
std::vector<std::string> field_names() const;
// return null if name not found
FieldPtr get_field_by_name(const std::string& name) const;
size_t get_field_index_by_name(const std::string& name) const;
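// Note: get_field_index_by_name returns (size_t)-1 when the name is not found.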
void convert_to(Schema* new_schema, const std::vector<FieldType>& new_types) const {
// fields
int num_fields = _fields.size();
new_schema->_fields.resize(num_fields);
for (int i = 0; i < num_fields; ++i) {
auto cid = _fields[i]->id();
auto new_type = new_types[cid];
if (_fields[i]->type()->type() == new_type) {
new_schema->_fields[i] = _fields[i]->copy();
} else {
new_schema->_fields[i] = _fields[i]->convert_to(new_type);
}
}
new_schema->_num_keys = _num_keys;
new_schema->_name_to_index = _name_to_index;
}
private:
void _build_index_map(const Fields& fields);
Fields _fields;
size_t _num_keys = 0;
std::unordered_map<std::string, size_t> _name_to_index;
};
inline std::ostream& operator<<(std::ostream& os, const Schema& schema) {
const Fields& fields = schema.fields();
os << "(";
if (!fields.empty()) {
os << *fields[0];
}
for (size_t i = 1; i < fields.size(); i++) {
os << ", " << *fields[i];
}
os << ")";
return os;
}
} // namespace starrocks::vectorized

292
be/src/column/type_traits.h Normal file
View File

@ -0,0 +1,292 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include "column/binary_column.h"
#include "column/decimalv3_column.h"
#include "column/nullable_column.h"
#include "column/object_column.h"
#include "runtime/primitive_type.h"
namespace starrocks {
namespace vectorized {
template <bool B, typename T>
struct cond {
static constexpr bool value = B;
using type = T;
};
template <typename Condition, typename... OtherConditions>
struct type_select {
using type = std::conditional_t<Condition::value, typename Condition::type,
typename type_select<OtherConditions...>::type>;
};
template <typename Condition>
struct type_select<Condition> {
using type = std::conditional_t<Condition::value, typename Condition::type, void>;
};
template <typename Condition, typename... OtherConditions>
using type_select_t = typename type_select<Condition, OtherConditions...>::type;
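// Usage sketch: type_select_t picks the type of the first condition whose value is
// true, e.g. type_select_t<cond<false, int>, cond<true, double>, cond<true, float>>
// resolves to double (and to void if no condition holds).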
template <typename T>
constexpr bool IsInt128 = false;
template <>
inline constexpr bool IsInt128<int128_t> = true;
template <typename T>
constexpr bool IsSlice = false;
template <>
inline constexpr bool IsSlice<Slice> = true;
template <typename T>
constexpr bool IsDateTime = false;
template <>
inline constexpr bool IsDateTime<TimestampValue> = true;
template <>
inline constexpr bool IsDateTime<DateValue> = true;
template <typename T>
constexpr bool IsObject = false;
template <>
inline constexpr bool IsObject<HyperLogLog> = true;
template <>
inline constexpr bool IsObject<BitmapValue> = true;
template <>
inline constexpr bool IsObject<PercentileValue> = true;
template <typename T>
using is_starrocks_arithmetic = std::integral_constant<bool, std::is_arithmetic_v<T> || IsDecimal<T>>;
template <typename T>
using is_sum_bigint = std::integral_constant<bool, std::is_integral_v<T> && !IsInt128<T>>;
// If isArithmeticPT is true, means this type support +,-,*,/
template <PrimitiveType primitive_type>
constexpr bool isArithmeticPT = true;
template <>
inline constexpr bool isArithmeticPT<TYPE_CHAR> = false;
template <>
inline constexpr bool isArithmeticPT<TYPE_VARCHAR> = false;
template <>
inline constexpr bool isArithmeticPT<TYPE_DATE> = false;
template <>
inline constexpr bool isArithmeticPT<TYPE_DATETIME> = false;
template <>
inline constexpr bool isArithmeticPT<TYPE_HLL> = false;
template <>
inline constexpr bool isArithmeticPT<TYPE_OBJECT> = false;
template <>
inline constexpr bool isArithmeticPT<TYPE_PERCENTILE> = false;
template <PrimitiveType primitive_type>
constexpr bool isSlicePT = false;
template <>
inline constexpr bool isSlicePT<TYPE_CHAR> = true;
template <>
inline constexpr bool isSlicePT<TYPE_VARCHAR> = true;
template <PrimitiveType primitive_type>
struct RunTimeTypeTraits {};
template <>
struct RunTimeTypeTraits<TYPE_BOOLEAN> {
using CppType = uint8_t;
using ColumnType = BooleanColumn;
};
template <>
struct RunTimeTypeTraits<TYPE_TINYINT> {
using CppType = int8_t;
using ColumnType = Int8Column;
};
template <>
struct RunTimeTypeTraits<TYPE_SMALLINT> {
using CppType = int16_t;
using ColumnType = Int16Column;
};
template <>
struct RunTimeTypeTraits<TYPE_INT> {
using CppType = int32_t;
using ColumnType = Int32Column;
};
template <>
struct RunTimeTypeTraits<TYPE_BIGINT> {
using CppType = int64_t;
using ColumnType = Int64Column;
};
template <>
struct RunTimeTypeTraits<TYPE_LARGEINT> {
using CppType = int128_t;
using ColumnType = Int128Column;
};
template <>
struct RunTimeTypeTraits<TYPE_FLOAT> {
using CppType = float;
using ColumnType = FloatColumn;
};
template <>
struct RunTimeTypeTraits<TYPE_DOUBLE> {
using CppType = double;
using ColumnType = DoubleColumn;
};
template <>
struct RunTimeTypeTraits<TYPE_DECIMALV2> {
using CppType = DecimalV2Value;
using ColumnType = DecimalColumn;
};
template <>
struct RunTimeTypeTraits<TYPE_DECIMAL32> {
using CppType = int32_t;
using ColumnType = Decimal32Column;
};
template <>
struct RunTimeTypeTraits<TYPE_DECIMAL64> {
using CppType = int64_t;
using ColumnType = Decimal64Column;
};
template <>
struct RunTimeTypeTraits<TYPE_DECIMAL128> {
using CppType = int128_t;
using ColumnType = Decimal128Column;
};
template <>
struct RunTimeTypeTraits<TYPE_NULL> {
using CppType = uint8_t;
using ColumnType = NullColumn;
};
template <>
struct RunTimeTypeTraits<TYPE_CHAR> {
using CppType = Slice;
using ColumnType = BinaryColumn;
};
template <>
struct RunTimeTypeTraits<TYPE_VARCHAR> {
using CppType = Slice;
using ColumnType = BinaryColumn;
};
template <>
struct RunTimeTypeTraits<TYPE_DATE> {
using CppType = DateValue;
using ColumnType = DateColumn;
};
template <>
struct RunTimeTypeTraits<TYPE_DATETIME> {
using CppType = TimestampValue;
using ColumnType = TimestampColumn;
};
template <>
struct RunTimeTypeTraits<TYPE_TIME> {
using CppType = double;
using ColumnType = DoubleColumn;
};
template <>
struct RunTimeTypeTraits<TYPE_HLL> {
using CppType = HyperLogLog*;
using ColumnType = HyperLogLogColumn;
};
template <>
struct RunTimeTypeTraits<TYPE_OBJECT> {
using CppType = BitmapValue*;
using ColumnType = BitmapColumn;
};
template <>
struct RunTimeTypeTraits<TYPE_PERCENTILE> {
using CppType = PercentileValue*;
using ColumnType = PercentileColumn;
};
template <PrimitiveType Type>
using RunTimeCppType = typename RunTimeTypeTraits<Type>::CppType;
template <PrimitiveType Type>
using RunTimeColumnType = typename RunTimeTypeTraits<Type>::ColumnType;
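// For example, RunTimeCppType<TYPE_INT> is int32_t and RunTimeColumnType<TYPE_INT>
// is Int32Column, per the specializations above.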
template <typename T>
struct ColumnTraits {};
template <>
struct ColumnTraits<bool> {
using ColumnType = BooleanColumn;
};
template <>
struct ColumnTraits<int8_t> {
using ColumnType = Int8Column;
};
template <>
struct ColumnTraits<int16_t> {
using ColumnType = Int16Column;
};
template <>
struct ColumnTraits<int32_t> {
using ColumnType = Int32Column;
};
template <>
struct ColumnTraits<int64_t> {
using ColumnType = Int64Column;
};
template <>
struct ColumnTraits<int128_t> {
using ColumnType = Int128Column;
};
template <>
struct ColumnTraits<float> {
using ColumnType = FloatColumn;
};
template <>
struct ColumnTraits<double> {
using ColumnType = DoubleColumn;
};
template <>
struct ColumnTraits<DecimalV2Value> {
using ColumnType = DecimalColumn;
};
template <>
struct ColumnTraits<Slice> {
using ColumnType = BinaryColumn;
};
template <>
struct ColumnTraits<DateValue> {
using ColumnType = DateColumn;
};
template <>
struct ColumnTraits<TimestampValue> {
using ColumnType = TimestampColumn;
};
} // namespace vectorized
} // namespace starrocks

View File

@ -0,0 +1,90 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include <memory>
#include <vector>
namespace starrocks {
class DecimalV2Value;
class HyperLogLog;
class BitmapValue;
class PercentileValue;
namespace vectorized {
class DateValue;
class TimestampValue;
typedef __int128 int128_t;
class Chunk;
class Field;
class Column;
class Schema;
struct RuntimeChunkMeta;
// We may change the Buffer implementation in the future.
template <typename T>
using Buffer = std::vector<T>;
class ArrayColumn;
class BinaryColumn;
template <typename T>
class FixedLengthColumn;
template <typename T>
class DecimalV3Column;
using ColumnPtr = std::shared_ptr<Column>;
using MutableColumnPtr = std::unique_ptr<Column>;
using Columns = std::vector<ColumnPtr>;
using MutableColumns = std::vector<MutableColumnPtr>;
using UInt8Column = FixedLengthColumn<uint8_t>;
using BooleanColumn = UInt8Column;
using Int8Column = FixedLengthColumn<int8_t>;
using Int16Column = FixedLengthColumn<int16_t>;
using Int32Column = FixedLengthColumn<int32_t>;
using UInt32Column = FixedLengthColumn<uint32_t>;
using Int64Column = FixedLengthColumn<int64_t>;
using UInt64Column = FixedLengthColumn<uint64_t>;
using Int128Column = FixedLengthColumn<int128_t>;
using DoubleColumn = FixedLengthColumn<double>;
using FloatColumn = FixedLengthColumn<float>;
using DateColumn = FixedLengthColumn<DateValue>;
using DecimalColumn = FixedLengthColumn<DecimalV2Value>;
using TimestampColumn = FixedLengthColumn<TimestampValue>;
using Decimal32Column = DecimalV3Column<int32_t>;
using Decimal64Column = DecimalV3Column<int64_t>;
using Decimal128Column = DecimalV3Column<int128_t>;
template <typename T>
constexpr bool is_decimal_column = false;
template <typename T>
constexpr bool is_decimal_column<DecimalV3Column<T>> = true;
template <typename ColumnType>
using DecimalColumnType = std::enable_if_t<is_decimal_column<ColumnType>, ColumnType>;
template <typename T>
class ObjectColumn;
using HyperLogLogColumn = ObjectColumn<HyperLogLog>;
using BitmapColumn = ObjectColumn<BitmapValue>;
using PercentileColumn = ObjectColumn<PercentileValue>;
using ChunkPtr = std::shared_ptr<Chunk>;
using ChunkUniquePtr = std::unique_ptr<Chunk>;
using SchemaPtr = std::shared_ptr<Schema>;
using Fields = std::vector<std::shared_ptr<Field>>;
using FieldPtr = std::shared_ptr<Field>;
using Filter = Buffer<uint8_t>;
using FilterPtr = std::shared_ptr<Filter>;
} // namespace vectorized
} // namespace starrocks

View File

@ -0,0 +1,36 @@
# This file is made available under Elastic License 2.0.
# This file is based on code available under the Apache license here:
# https://github.com/apache/incubator-doris/blob/master/be/src/common/CMakeLists.txt
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
# where to put generated libraries
set(LIBRARY_OUTPUT_PATH "${BUILD_DIR}/src/common")
add_library(Common STATIC
daemon.cpp
status.cpp
statusor.cpp
resource_tls.cpp
logconfig.cpp
configbase.cpp
minidump.cpp
)
# Generate env_config.h according to env_config.h.in
configure_file(${CMAKE_CURRENT_SOURCE_DIR}/env_config.h.in ${GENSRC_DIR}/common/env_config.h)

View File

@ -0,0 +1,78 @@
// This file is made available under Elastic License 2.0.
// This file is based on code available under the Apache license here:
// https://github.com/apache/incubator-doris/blob/master/be/src/common/compiler_util.h
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#ifndef STARROCKS_BE_SRC_COMMON_COMMON_COMPILER_UTIL_H
#define STARROCKS_BE_SRC_COMMON_COMMON_COMPILER_UTIL_H
// Compiler hint that this branch is likely or unlikely to
// be taken. Taken from the "What all programmers should know
// about memory" paper.
// example: if (LIKELY(size > 0)) { ... }
// example: if (UNLIKELY(!status.ok())) { ... }
#define CACHE_LINE_SIZE 64
#ifdef LIKELY
#undef LIKELY
#endif
#ifdef UNLIKELY
#undef UNLIKELY
#endif
#define LIKELY(expr) __builtin_expect(!!(expr), 1)
#define UNLIKELY(expr) __builtin_expect(!!(expr), 0)
#define PREFETCH(addr) __builtin_prefetch(addr)
/// Force inlining. The 'inline' keyword is treated by most compilers as a hint,
/// not a command. This should be used sparingly for cases when either the function
/// needs to be inlined for a specific reason or the compiler's heuristics make a bad
/// decision, e.g. not inlining a small function on a hot path.
#define ALWAYS_INLINE __attribute__((always_inline))
#define ALIGN_CACHE_LINE __attribute__((aligned(CACHE_LINE_SIZE)))
#ifdef __clang__
#define DIAGNOSTIC_PUSH _Pragma("clang diagnostic push")
#define DIAGNOSTIC_POP _Pragma("clang diagnostic pop")
#elif defined(__GNUC__)
#define DIAGNOSTIC_PUSH _Pragma("GCC diagnostic push")
#define DIAGNOSTIC_POP _Pragma("GCC diagnostic pop")
#elif defined(_MSC_VER)
#define DIAGNOSTIC_PUSH __pragma(warning(push))
#define DIAGNOSTIC_POP __pragma(warning(pop))
#else
#error("Unknown compiler")
#endif
#define PRAGMA(TXT) _Pragma(#TXT)
#ifdef __clang__
#define DIAGNOSTIC_IGNORE(XXX) PRAGMA(clang diagnostic ignored XXX)
#elif defined(__GNUC__)
#define DIAGNOSTIC_IGNORE(XXX) PRAGMA(GCC diagnostic ignored XXX)
#elif defined(_MSC_VER)
#define DIAGNOSTIC_IGNORE(XXX) __pragma(warning(disable : XXX))
#else
#define DIAGNOSTIC_IGNORE(XXX)
#endif
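// Typical usage of the diagnostic macros (a sketch; the warning name is
// illustrative and compiler-specific, e.g. MSVC expects a warning number):
//   DIAGNOSTIC_PUSH
//   DIAGNOSTIC_IGNORE("-Wunused-variable")
//   ...code that would trigger the warning...
//   DIAGNOSTIC_POP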
#endif

626
be/src/common/config.h Normal file
View File

@ -0,0 +1,626 @@
// This file is made available under Elastic License 2.0.
// This file is based on code available under the Apache license here:
// https://github.com/apache/incubator-doris/blob/master/be/src/common/config.h
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#ifndef STARROCKS_BE_SRC_COMMON_CONFIG_H
#define STARROCKS_BE_SRC_COMMON_CONFIG_H
#include "configbase.h"
namespace starrocks {
namespace config {
// cluster id
CONF_Int32(cluster_id, "-1");
// port on which ImpalaInternalService is exported
CONF_Int32(be_port, "9060");
// port for brpc
CONF_Int32(brpc_port, "8060");
// the number of bthreads for brpc, the default value is set to -1, which means the number of bthreads is #cpu-cores
CONF_Int32(brpc_num_threads, "-1");
// Declare a selection strategy for servers that have many IPs.
// Note that at most one IP should match this list.
// This is a semicolon-delimited list in CIDR notation, e.g. 10.10.10.0/24.
// If no IP matches this rule, one will be chosen randomly.
CONF_String(priority_networks, "");
////
//// tcmalloc gc parameter
////
// min memory for TCMalloc; when used memory is smaller than this, it is not returned to the OS
CONF_mInt64(tc_use_memory_min, "10737418240");
// free memory rate.[0-100]
CONF_mInt64(tc_free_memory_rate, "20");
// Bound on the total amount of bytes allocated to thread caches.
// This bound is not strict, so it is possible for the cache to go over this bound
// in certain circumstances. The maximum value of this flag is capped to 1GB.
// This value defaults to 1GB.
// If you suspect your application is not scaling to many threads due to lock contention in TCMalloc,
// you can try increasing this value. This may improve performance, at a cost of extra memory
// use by TCMalloc.
// reference: https://gperftools.github.io/gperftools/tcmalloc.html: TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES
// https://github.com/gperftools/gperftools/issues/1111
CONF_Int64(tc_max_total_thread_cache_bytes, "1073741824");
// process memory limit specified as number of bytes
// ('<int>[bB]?'), megabytes ('<float>[mM]'), gigabytes ('<float>[gG]'),
// or percentage of the physical memory ('<int>%').
// defaults to bytes if no unit is given.
// must be larger than 0; if larger than the physical memory size,
// it will be set to physical memory size.
CONF_String(mem_limit, "80%");
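// e.g. "21474836480", "20G" and "80%" are all accepted forms per the format
// described above (these values are only an illustration of the syntax).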
// the port heartbeat service used
CONF_Int32(heartbeat_service_port, "9050");
// the count of heart beat service
CONF_Int32(heartbeat_service_thread_count, "1");
// the count of thread to create table
CONF_Int32(create_tablet_worker_count, "3");
// the count of thread to drop table
CONF_Int32(drop_tablet_worker_count, "3");
// the count of thread to batch load
CONF_Int32(push_worker_count_normal_priority, "3");
// the count of thread to high priority batch load
CONF_Int32(push_worker_count_high_priority, "3");
// the count of thread to publish version
CONF_Int32(publish_version_worker_count, "8");
// the count of thread to clear transaction task
CONF_Int32(clear_transaction_task_worker_count, "1");
// the count of thread to delete
CONF_Int32(delete_worker_count, "3");
// the count of thread to alter table
CONF_Int32(alter_tablet_worker_count, "3");
// the count of thread to clone
CONF_Int32(clone_worker_count, "3");
// the count of thread to clone
CONF_Int32(storage_medium_migrate_count, "1");
// the count of thread to check consistency
CONF_Int32(check_consistency_worker_count, "1");
// the count of thread to upload
CONF_Int32(upload_worker_count, "1");
// the count of thread to download
CONF_Int32(download_worker_count, "1");
// the count of thread to make snapshot
CONF_Int32(make_snapshot_worker_count, "5");
// the count of thread to release snapshot
CONF_Int32(release_snapshot_worker_count, "5");
// the interval time(seconds) for agent report tasks signature to FE
CONF_mInt32(report_task_interval_seconds, "10");
// the interval time(seconds) for agent report disk state to FE
CONF_mInt32(report_disk_state_interval_seconds, "60");
// the interval time(seconds) for agent report olap table to FE
CONF_mInt32(report_tablet_interval_seconds, "60");
// the interval time(seconds) for agent report plugin status to FE
// CONF_Int32(report_plugin_interval_seconds, "120");
// the timeout(seconds) for alter table
// CONF_Int32(alter_tablet_timeout_seconds, "86400");
// the timeout(seconds) for make snapshot
// CONF_Int32(make_snapshot_timeout_seconds, "600");
// the timeout(seconds) for release snapshot
// CONF_Int32(release_snapshot_timeout_seconds, "600");
// the max download speed(KB/s)
CONF_mInt32(max_download_speed_kbps, "50000");
// download low speed limit(KB/s)
CONF_mInt32(download_low_speed_limit_kbps, "50");
// download low speed time(seconds)
CONF_mInt32(download_low_speed_time, "300");
// curl verbose mode
// CONF_Int64(curl_verbose_mode, "1");
// seconds to sleep for each time check table status
// CONF_Int32(check_status_sleep_time_seconds, "10");
// sleep time for one second
CONF_Int32(sleep_one_second, "1");
// sleep time for five seconds
CONF_Int32(sleep_five_seconds, "5");
// log dir
CONF_String(sys_log_dir, "${STARROCKS_HOME}/log");
CONF_String(user_function_dir, "${STARROCKS_HOME}/lib/udf");
// INFO, WARNING, ERROR, FATAL
CONF_String(sys_log_level, "INFO");
// TIME-DAY, TIME-HOUR, SIZE-MB-nnn
CONF_String(sys_log_roll_mode, "SIZE-MB-1024");
// log roll num
CONF_Int32(sys_log_roll_num, "10");
// verbose log
CONF_Strings(sys_log_verbose_modules, "");
// verbose log level
CONF_Int32(sys_log_verbose_level, "10");
// log buffer level
CONF_String(log_buffer_level, "");
// Pull load task dir
CONF_String(pull_load_task_dir, "${STARROCKS_HOME}/var/pull_load");
// the maximum number of bytes to display on the debug webserver's log page
CONF_Int64(web_log_bytes, "1048576");
// number of threads available to serve backend execution requests
CONF_Int32(be_service_threads, "64");
// key=value pair of default query options for StarRocks, separated by ','
CONF_String(default_query_options, "");
// If non-zero, StarRocks will output memory usage every log_mem_usage_interval'th fragment completion.
// CONF_Int32(log_mem_usage_interval, "0");
// Controls the number of threads to run work per core. It's common to pick 2x
// or 3x the number of cores. This keeps the cores busy without causing excessive
// thrashing.
CONF_Int32(num_threads_per_core, "3");
// if true, compresses tuple data in Serialize
CONF_Bool(compress_rowbatches, "true");
// compress ratio when shuffle row_batches in network, not in storage engine.
// If ratio is less than this value, use uncompressed data instead
CONF_mDouble(rpc_compress_ratio_threshold, "1.1");
// serialize and deserialize each returned row batch
CONF_Bool(serialize_batch, "false");
// interval between profile reports; in seconds
CONF_mInt32(status_report_interval, "5");
// Local directory to copy UDF libraries from HDFS into
CONF_String(local_library_dir, "${UDF_RUNTIME_DIR}");
// number of olap scanner thread pool size
CONF_Int32(doris_scanner_thread_pool_thread_num, "48");
// number of olap scanner thread pool size
CONF_Int32(doris_scanner_thread_pool_queue_size, "102400");
// number of etl thread pool size
CONF_Int32(etl_thread_pool_size, "8");
// number of etl thread pool size
CONF_Int32(etl_thread_pool_queue_size, "256");
// port on which to run StarRocks test backend
CONF_Int32(port, "20001");
// default thrift client connect timeout(in seconds)
CONF_Int32(thrift_connect_timeout_seconds, "3");
// broker write timeout in seconds
CONF_Int32(broker_write_timeout_seconds, "30");
// default thrift client retry interval (in milliseconds)
CONF_mInt64(thrift_client_retry_interval_ms, "100");
// max row count number for single scan range
CONF_mInt32(doris_scan_range_row_count, "524288");
// size of scanner queue between scanner thread and compute thread
CONF_mInt32(doris_scanner_queue_size, "1024");
// single read execute fragment row size
CONF_mInt32(doris_scanner_row_num, "16384");
// number of max scan keys
CONF_mInt32(doris_max_scan_key_num, "1024");
// the max number of push down values of a single column.
// if exceed, no conditions will be pushed down for that column.
CONF_mInt32(max_pushdown_conditions_per_column, "1024");
// return_row / total_row
CONF_mInt32(doris_max_pushdown_conjuncts_return_rate, "90");
// (Advanced) Maximum size of per-query receive-side buffer
CONF_mInt32(exchg_node_buffer_size_bytes, "10485760");
// insert sort threshold for sorter
// CONF_Int32(insertion_threadhold, "16");
// the block_size every block allocate for sorter
CONF_Int32(sorter_block_size, "8388608");
// push_write_mbytes_per_sec
CONF_Int32(push_write_mbytes_per_sec, "10");
CONF_mInt64(column_dictionary_key_ratio_threshold, "0");
CONF_mInt64(column_dictionary_key_size_threshold, "0");
// if true, output IR after optimization passes
// CONF_Bool(dump_ir, "false");
// if set, saves the generated IR to the output file.
//CONF_String(module_output, "");
// memory_limitation_per_thread_for_schema_change unit GB
CONF_mInt32(memory_limitation_per_thread_for_schema_change, "2");
// CONF_Int64(max_unpacked_row_block_size, "104857600");
CONF_mInt32(update_cache_expire_sec, "360");
CONF_mInt32(file_descriptor_cache_clean_interval, "3600");
CONF_mInt32(disk_stat_monitor_interval, "5");
CONF_mInt32(unused_rowset_monitor_interval, "30");
CONF_String(storage_root_path, "${STARROCKS_HOME}/storage");
// BE process will exit if the percentage of error disks reaches this value.
CONF_mInt32(max_percentage_of_error_disk, "0");
// CONF_Int32(default_num_rows_per_data_block, "1024");
CONF_mInt32(default_num_rows_per_column_file_block, "1024");
CONF_Int32(max_tablet_num_per_shard, "1024");
// pending data policy
CONF_mInt32(pending_data_expire_time_sec, "1800");
// inc_rowset expired interval
CONF_mInt32(inc_rowset_expired_sec, "1800");
// inc_rowset snapshot rs sweep time interval
CONF_mInt32(tablet_rowset_stale_sweep_time_sec, "1800");
// garbage sweep policy
CONF_Int32(max_garbage_sweep_interval, "3600");
CONF_Int32(min_garbage_sweep_interval, "180");
CONF_mInt32(snapshot_expire_time_sec, "172800");
CONF_mInt32(trash_file_expire_time_sec, "259200");
// check row nums for BE/CE and schema change. true means enabled, false means disabled.
CONF_mBool(row_nums_check, "true");
//file descriptors cache, by default, cache 16384 descriptors
CONF_Int32(file_descriptor_cache_capacity, "16384");
// minimum file descriptor number
// modify them upon necessity
CONF_Int32(min_file_descriptor_number, "60000");
CONF_Int64(index_stream_cache_capacity, "10737418240");
// CONF_Int64(max_packed_row_block_size, "20971520");
// Cache for storage page size
CONF_String(storage_page_cache_limit, "0");
// whether to disable page cache feature in storage
CONF_Bool(disable_storage_page_cache, "true");
CONF_mInt32(base_compaction_check_interval_seconds, "60");
CONF_mInt64(base_compaction_num_cumulative_deltas, "5");
CONF_Int32(base_compaction_num_threads_per_disk, "1");
CONF_mDouble(base_cumulative_delta_ratio, "0.3");
CONF_mInt64(base_compaction_interval_seconds_since_last_operation, "86400");
CONF_mInt32(base_compaction_write_mbytes_per_sec, "5");
// cumulative compaction policy: max delta file's size unit:B
CONF_mInt32(cumulative_compaction_check_interval_seconds, "1");
CONF_mInt64(min_cumulative_compaction_num_singleton_deltas, "5");
CONF_mInt64(max_cumulative_compaction_num_singleton_deltas, "1000");
CONF_Int32(cumulative_compaction_num_threads_per_disk, "1");
CONF_mInt64(cumulative_compaction_budgeted_bytes, "104857600");
// CONF_Int32(cumulative_compaction_write_mbytes_per_sec, "100");
// cumulative compaction skips recently published deltas in order to prevent
// compacting a version that might be queried (in case the query planning phase took some time).
// the following config set the window size
CONF_mInt32(cumulative_compaction_skip_window_seconds, "30");
CONF_mInt32(update_compaction_check_interval_seconds, "60");
CONF_Int32(update_compaction_num_threads_per_disk, "1");
CONF_Int32(update_compaction_per_tablet_min_interval_seconds, "120"); // 2min
// if compaction of a tablet failed, this tablet should not be chosen to
// compaction until this interval passes.
CONF_mInt64(min_compaction_failure_interval_sec, "120"); // 2 min
// Too many compaction tasks may run out of memory.
// This config is to limit the max concurrency of running compaction tasks.
// -1 means no limit, and the max concurrency will be:
// C = (cumulative_compaction_num_threads_per_disk + base_compaction_num_threads_per_disk) * dir_num
// set it to larger than C will be set to equal to C.
// This config can be set to 0, which means to forbid any compaction, for some special cases.
CONF_Int32(max_compaction_concurrency, "-1");
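// With the defaults above (1 cumulative + 1 base compaction thread per disk) and,
// say, 4 data dirs, C = (1 + 1) * 4 = 8 concurrent compaction tasks at most
// (the dir count here is only an example).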
// Threshold for logging compaction trace, in seconds.
CONF_mInt32(base_compaction_trace_threshold, "120");
CONF_mInt32(cumulative_compaction_trace_threshold, "60");
CONF_mInt32(update_compaction_trace_threshold, "20");
// Port to start debug webserver on
CONF_Int32(webserver_port, "8040");
// Number of webserver workers
CONF_Int32(webserver_num_workers, "48");
// Period to update rate counters and sampling counters in ms.
CONF_mInt32(periodic_counter_update_period_ms, "500");
// Used for mini Load. mini load data file will be removed after this time.
CONF_Int64(load_data_reserve_hours, "4");
// load error log will be removed after this time
CONF_mInt64(load_error_log_reserve_hours, "48");
CONF_Int32(number_tablet_writer_threads, "16");
// Automatically detect whether a char/varchar column to use dictionary encoding
// If the number of keys in a dictionary is greater than this fraction of the total number of rows
// turn off dictionary encoding. Only the first chunk is checked for this detection.
// Set to 1 to always use dictionary encoding.
CONF_Double(dictionary_encoding_ratio, "0.7");
// The minimum chunk size for dictionary encoding speculation
CONF_Int32(dictionary_speculate_min_chunk_size, "10000");
// The maximum amount of data that can be processed by a stream load
CONF_mInt64(streaming_load_max_mb, "10240");
// Some data formats, such as JSON, cannot be streamed.
// Therefore, it is necessary to limit the maximum amount of
// such data when using stream load to prevent excessive memory consumption.
CONF_mInt64(streaming_load_max_batch_size_mb, "100");
// the alive time of a TabletsChannel.
// If the channel does not receive any data till this time,
// the channel will be removed.
CONF_Int32(streaming_load_rpc_max_alive_time_sec, "1200");
// the timeout of a rpc to open the tablet writer in remote BE.
// short operation time, can set a short timeout
CONF_Int32(tablet_writer_open_rpc_timeout_sec, "60");
// Deprecated, use query_timeout instead
// the timeout of a rpc to process one batch in tablet writer.
// you may need to increase this timeout if using larger 'streaming_load_max_mb',
// or encounter 'tablet writer write failed' error when loading.
// CONF_Int32(tablet_writer_rpc_timeout_sec, "600");
// OlapTableSink sender's send interval, should be less than the real response time of a tablet writer rpc.
CONF_mInt32(olap_table_sink_send_interval_ms, "10");
// Fragment thread pool
CONF_Int32(fragment_pool_thread_num_min, "64");
CONF_Int32(fragment_pool_thread_num_max, "4096");
CONF_Int32(fragment_pool_queue_size, "2048");
//for cast
// CONF_Bool(cast, "true");
// Spill to disk when query
// Writable scratch directories, split by ";"
CONF_String(query_scratch_dirs, "${STARROCKS_HOME}");
// Control the number of disks on the machine. If 0, this comes from the system settings.
CONF_Int32(num_disks, "0");
// The maximum number of the threads per disk is also the max queue depth per disk.
CONF_Int32(num_threads_per_disk, "0");
// The read size is the size of the reads sent to os.
// There is a trade off of latency and throughout, trying to keep disks busy but
// not introduce seeks. The literature seems to agree that with 8 MB reads, random
// io and sequential io perform similarly.
CONF_Int32(read_size, "8388608"); // 8 * 1024 * 1024, Read Size (in bytes)
CONF_Int32(min_buffer_size, "1024"); // 1024, The minimum read buffer size (in bytes)
// For each io buffer size, the maximum number of buffers the IoMgr will hold onto
// With 1024B through 8MB buffers, this is up to ~2GB of buffers.
CONF_Int32(max_free_io_buffers, "128");
CONF_Bool(disable_mem_pools, "false");
// Whether to allocate chunk using mmap. If you enable this, you'd better to
// increase vm.max_map_count's value whose default value is 65530.
// you can do it as root via "sysctl -w vm.max_map_count=262144" or
// "echo 262144 > /proc/sys/vm/max_map_count"
// NOTE: When this is set to true, you must set chunk_reserved_bytes_limit
// to a relatively large number, or the performance will be very bad.
CONF_Bool(use_mmap_allocate_chunk, "false");
// Chunk Allocator's reserved bytes limit,
// Default value is 2GB. Increasing this variable can improve performance, but it will
// hold more free memory that cannot be used by other modules.
CONF_Int64(chunk_reserved_bytes_limit, "2147483648");
// The probing algorithm of partitioned hash table.
// Enable quadratic probing hash table
CONF_Bool(enable_quadratic_probing, "false");
// for pprof
CONF_String(pprof_profile_dir, "${STARROCKS_HOME}/log");
// for partition
// CONF_Bool(enable_partitioned_hash_join, "false")
CONF_Bool(enable_partitioned_aggregation, "true");
// for forward compatibility, will be removed later
CONF_mBool(enable_token_check, "true");
// to open/close system metrics
CONF_Bool(enable_system_metrics, "true");
CONF_mBool(enable_prefetch, "true");
// Number of cores StarRocks will use; this takes effect only when it is greater than 0.
// Otherwise, StarRocks will use all cores returned from "/proc/cpuinfo".
CONF_Int32(num_cores, "0");
// CONF_Bool(thread_creation_fault_injection, "false");
// Set this to encrypt and perform an integrity
// check on all data spilled to disk during a query
// CONF_Bool(disk_spill_encryption, "false");
// When BE starts, if there is a broken disk, the BE process will exit by default.
// Otherwise, the broken disk will be ignored.
CONF_Bool(ignore_broken_disk, "false");
// Writable scratch directories
CONF_String(scratch_dirs, "/tmp");
// If false and --scratch_dirs contains multiple directories on the same device,
// then only the first writable directory is used
// CONF_Bool(allow_multiple_scratch_dirs_per_device, "false");
// linux transparent huge page
CONF_Bool(madvise_huge_pages, "false");
// whether to use mmap to allocate memory
CONF_Bool(mmap_buffers, "false");
// max memory that can be allocated by the buffer pool
CONF_String(buffer_pool_limit, "80G");
// clean pages that can be held by the buffer pool
CONF_String(buffer_pool_clean_pages_limit, "20G");
// Sleep time in seconds between memory maintenance iterations
CONF_mInt64(memory_maintenance_sleep_time_s, "10");
// Alignment
CONF_Int32(memory_max_alignment, "16");
// write buffer size before flush
CONF_mInt64(write_buffer_size, "104857600");
// following 2 configs limit the memory consumption of load process on a Backend.
// eg: memory limit to 80% of mem limit config but up to 100GB(default)
// NOTICE(cmy): these default values are set very large because we don't want to
// impact the load performance when users upgrade StarRocks.
// Users should set these configs properly if necessary.
CONF_Int64(load_process_max_memory_limit_bytes, "107374182400"); // 100GB
CONF_Int32(load_process_max_memory_limit_percent, "30"); // 30%
CONF_Int64(compaction_mem_limit, "2147483648"); // 2G
// update interval of tablet stat cache
CONF_mInt32(tablet_stat_cache_update_interval_second, "300");
// result buffer cancelled time (unit: second)
CONF_mInt32(result_buffer_cancelled_interval_time, "300");
// the increased frequency of priority for remaining tasks in BlockingPriorityQueue
CONF_mInt32(priority_queue_remaining_tasks_increased_frequency, "512");
// sync tablet_meta when modifying meta
CONF_mBool(sync_tablet_meta, "false");
// default thrift rpc timeout ms
CONF_mInt32(thrift_rpc_timeout_ms, "5000");
// txn commit rpc timeout
CONF_mInt32(txn_commit_rpc_timeout_ms, "10000");
// If set to true, metric calculator will run
CONF_Bool(enable_metric_calculator, "true");
// max consumer num in one data consumer group, for routine load
CONF_mInt32(max_consumer_num_per_group, "3");
// the size of thread pool for routine load task.
// this should be larger than FE config 'max_concurrent_task_num_per_be' (default 5)
CONF_Int32(routine_load_thread_pool_size, "10");
// If set to true, an index loading failure will not cause BE to exit,
// and the tablet will be marked as bad, so that FE will try to repair it.
// CONF_Bool(auto_recover_index_loading_failure, "false");
// max external scan cache batch count, means cache max_memory_cache_batch_count * batch_size row
// default is 20; batch_size's default value is 1024, which means 20 * 1024 rows will be cached
CONF_mInt32(max_memory_sink_batch_count, "20");
// This configuration is used for the context gc thread schedule period
// note: unit is minute, default is 5min
CONF_mInt32(scan_context_gc_interval_min, "5");
// es scroll keep-alive
CONF_String(es_scroll_keepalive, "5m");
// HTTP connection timeout for es
CONF_Int32(es_http_timeout_ms, "5000");
// the max client cache number per each host
// There are a variety of client caches in BE, but currently we use the
// same cache size configuration.
// TODO(cmy): use different config to set different client cache if necessary.
CONF_Int32(max_client_cache_size_per_host, "10");
// Dir to save files downloaded by SmallFileMgr
CONF_String(small_file_dir, "${STARROCKS_HOME}/lib/small_file/");
// path gc
CONF_Bool(path_gc_check, "true");
CONF_Int32(path_gc_check_interval_second, "86400");
CONF_mInt32(path_gc_check_step, "1000");
CONF_mInt32(path_gc_check_step_interval_ms, "10");
CONF_mInt32(path_scan_interval_second, "86400");
// The following 2 configs limit the max usage of disk capacity of a data dir.
// If both of these 2 thresholds are reached, no more data can be written into that data dir.
// The percent of max used capacity of a data dir
CONF_mInt32(storage_flood_stage_usage_percent, "95"); // 95%
// The min bytes that should be left of a data dir
CONF_mInt64(storage_flood_stage_left_capacity_bytes, "1073741824"); // 1GB
// number of thread for flushing memtable per store
CONF_Int32(flush_thread_num_per_store, "2");
// config for tablet meta checkpoint
CONF_mInt32(tablet_meta_checkpoint_min_new_rowsets_num, "10");
CONF_mInt32(tablet_meta_checkpoint_min_interval_secs, "600");
// Maximum size of a single message body in all protocols
CONF_Int64(brpc_max_body_size, "2147483648");
// Max unwritten bytes in each socket, if the limit is reached, Socket.Write fails with EOVERCROWDED
CONF_Int64(brpc_socket_max_unwritten_bytes, "1073741824");
// max number of txns for every txn_partition_map in txn manager
// this is a self protection to avoid too many txns being saved in the manager
CONF_mInt64(max_runnings_transactions_per_txn_map, "100");
// tablet_map_lock shard size, the value is 2^n, n=0,1,2,3,4
// this is an enhancement for better performance when managing tablets
CONF_Int32(tablet_map_shard_size, "1");
CONF_String(plugin_path, "${STARROCKS_HOME}/plugin");
// txn_map_lock shard size, the value is 2^n, n=0,1,2,3,4
// this is an enhancement for better performance when managing txns
CONF_Int32(txn_map_shard_size, "128");
// txn_lock shard size, the value is 2^n, n=0,1,2,3,4
// this is an enhancement for better performance when committing and publishing txns
CONF_Int32(txn_shard_size, "1024");
// Whether to continue starting BE when loading a tablet from the header fails.
CONF_Bool(ignore_load_tablet_failure, "false");
// Whether to continue starting BE when loading a tablet from the header fails.
CONF_Bool(ignore_rowset_stale_unconsistent_delete, "false");
// The chunk size for vector query engine
CONF_Int32(vector_chunk_size, "4096");
// valid range: [0-1000].
// `0` will disable late materialization.
// `1000` will enable late materialization always.
CONF_Int32(late_materialization_ratio, "10");
// valid range: [0-1000].
// `0` will disable late materialization select metric type.
// `1000` will enable late materialization always select metric type.
CONF_Int32(metric_late_materialization_ratio, "1000");
// Max batched bytes for each transmit request
CONF_Int64(max_transmit_batched_bytes, "65536");
CONF_Int16(bitmap_max_filter_items, "30");
// valid range: [0-1000].
CONF_Int16(bitmap_max_filter_ratio, "1");
CONF_Bool(bitmap_filter_enable_not_equal, "false");
// Only 1 and 2 is valid.
// When storage_format_version is 1, use origin storage format for Date, Datetime and Decimal
// type.
// When storage_format_version is 2, DATE_V2, TIMESTAMP and DECIMAL_V2 will be used as
// storage format.
CONF_mInt16(storage_format_version, "2");
// do pre-aggregation if the effect is greater than the factor; factor range: [1-100].
CONF_Int16(pre_aggregate_factor, "80");
// enable generating a minidump on crash
CONF_Bool(sys_minidump_enable, "false");
// minidump dir (generated by google_breakpad)
CONF_String(sys_minidump_dir, "${STARROCKS_HOME}");
// max number of minidump files that may exist
CONF_mInt32(sys_minidump_max_files, "16");
// max size (in KB) of a single minidump file
CONF_mInt32(sys_minidump_limit, "20480");
// interval (seconds) for cleaning old minidumps
CONF_mInt32(sys_minidump_interval, "600");
// The maximum number of versions per tablet. If the
// number of versions exceeds this value, new write
// requests will fail.
CONF_Int16(tablet_max_versions, "1000");
CONF_mBool(enable_bitmap_union_disk_format_with_set, "false");
// yield PipelineDriver when the maximum number of chunks has been moved
// in the current execution round.
CONF_Int64(pipeline_yield_max_chunks_moved, "100");
// yield PipelineDriver when the maximum time (in nanoseconds) has been spent
// in the current execution round.
CONF_Int64(pipeline_yield_max_time_spent, "100000000");
} // namespace config
} // namespace starrocks
#endif // STARROCKS_BE_SRC_COMMON_CONFIG_H

332
be/src/common/configbase.cpp Normal file
View File

@ -0,0 +1,332 @@
// This file is made available under Elastic License 2.0.
// This file is based on code available under the Apache license here:
// https://github.com/apache/incubator-doris/blob/master/be/src/common/configbase.cpp
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#include <algorithm>
#include <cerrno>
#include <cstring>
#include <fstream>
#include <iostream>
#include <list>
#include <map>
#include <sstream>
#define __IN_CONFIGBASE_CPP__
#include "common/config.h"
#undef __IN_CONFIGBASE_CPP__
#include "common/status.h"
#include "gutil/strings/substitute.h"
namespace starrocks {
namespace config {
std::map<std::string, Register::Field>* Register::_s_field_map = nullptr;
std::map<std::string, std::string>* full_conf_map = nullptr;
Properties props;
// trim string
std::string& trim(std::string& s) {
// rtrim
s.erase(std::find_if(s.rbegin(), s.rend(), std::not1(std::ptr_fun<int, int>(std::isspace))).base(), s.end());
// ltrim
s.erase(s.begin(), std::find_if(s.begin(), s.end(), std::not1(std::ptr_fun<int, int>(std::isspace))));
return s;
}
// split a string into key and value at the first '='
void splitkv(const std::string& s, std::string& k, std::string& v) {
const char sep = '=';
size_t start = 0;
size_t end = 0;
if ((end = s.find(sep, start)) != std::string::npos) {
k = s.substr(start, end - start);
v = s.substr(end + 1);
} else {
k = s;
v = "";
}
}
// replace env variables
bool replaceenv(std::string& s) {
std::size_t pos = 0;
std::size_t start = 0;
while ((start = s.find("${", pos)) != std::string::npos) {
std::size_t end = s.find("}", start + 2);
if (end == std::string::npos) {
return false;
}
std::string envkey = s.substr(start + 2, end - start - 2);
const char* envval = std::getenv(envkey.c_str());
if (envval == nullptr) {
return false;
}
s.erase(start, end - start + 1);
s.insert(start, envval);
pos = start + strlen(envval);
}
return true;
}
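// Example (illustrative): with STARROCKS_HOME=/opt/starrocks, the value
// "${STARROCKS_HOME}/lib/small_file/" becomes "/opt/starrocks/lib/small_file/".
// An unset variable or a missing '}' makes replaceenv() return false, and the
// whole config value is then rejected by the caller.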
bool strtox(const std::string& valstr, bool& retval);
bool strtox(const std::string& valstr, int16_t& retval);
bool strtox(const std::string& valstr, int32_t& retval);
bool strtox(const std::string& valstr, int64_t& retval);
bool strtox(const std::string& valstr, double& retval);
bool strtox(const std::string& valstr, std::string& retval);
template <typename T>
bool strtox(const std::string& valstr, std::vector<T>& retval) {
std::stringstream ss(valstr);
std::string item;
T t;
while (std::getline(ss, item, ',')) {
if (!strtox(trim(item), t)) {
return false;
}
retval.push_back(t);
}
return true;
}
bool strtox(const std::string& valstr, bool& retval) {
if (valstr.compare("true") == 0) {
retval = true;
} else if (valstr.compare("false") == 0) {
retval = false;
} else {
return false;
}
return true;
}
template <typename T>
bool strtointeger(const std::string& valstr, T& retval) {
if (valstr.length() == 0) {
return false; // empty-string is only allowed for string type.
}
char* end;
errno = 0;
const char* valcstr = valstr.c_str();
int64_t ret64 = strtoll(valcstr, &end, 10);
if (errno || end != valcstr + strlen(valcstr)) {
return false; // bad parse
}
T tmp = retval;
retval = static_cast<T>(ret64);
if (retval != ret64) {
retval = tmp;
return false;
}
return true;
}
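// Example (illustrative): strtointeger("70000", int16_t&) parses 70000 into an
// int64_t, but the value does not survive the round-trip cast back to int16_t,
// so the previous value is restored and the function returns false.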
bool strtox(const std::string& valstr, int16_t& retval) {
return strtointeger(valstr, retval);
}
bool strtox(const std::string& valstr, int32_t& retval) {
return strtointeger(valstr, retval);
}
bool strtox(const std::string& valstr, int64_t& retval) {
return strtointeger(valstr, retval);
}
bool strtox(const std::string& valstr, double& retval) {
if (valstr.length() == 0) {
return false; // empty-string is only allowed for string type.
}
char* end = nullptr;
errno = 0;
const char* valcstr = valstr.c_str();
retval = strtod(valcstr, &end);
if (errno || end != valcstr + strlen(valcstr)) {
return false; // bad parse
}
return true;
}
bool strtox(const std::string& valstr, std::string& retval) {
retval = valstr;
return true;
}
// load conf file
bool Properties::load(const char* filename) {
// if filename is null, use the empty props
if (filename == nullptr) {
return true;
}
// open the conf file
std::ifstream input(filename);
if (!input.is_open()) {
std::cerr << "config::load() failed to open the file:" << filename << std::endl;
return false;
}
// load properties
std::string line;
std::string key;
std::string value;
line.reserve(512);
while (input) {
// read one line at a time
std::getline(input, line);
// remove left and right spaces
trim(line);
// ignore comments
if (line.empty() || line[0] == '#') {
continue;
}
// read key and value
splitkv(line, key, value);
trim(key);
trim(value);
// insert into file_conf_map
file_conf_map[key] = value;
}
// close the conf file
input.close();
return true;
}
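// Example (illustrative) of a conf file accepted by load():
//
//     # be.conf
//     sys_log_level = INFO
//     small_file_dir = ${STARROCKS_HOME}/lib/small_file/
//
// Each line is trimmed, empty lines and lines starting with '#' are skipped,
// and everything after the first '=' becomes the value.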
template <typename T>
bool Properties::get(const char* key, const char* defstr, T& retval) const {
const auto& it = file_conf_map.find(std::string(key));
std::string valstr = it != file_conf_map.end() ? it->second : std::string(defstr);
trim(valstr);
if (!replaceenv(valstr)) {
return false;
}
return strtox(valstr, retval);
}
template <typename T>
bool update(const std::string& value, T& retval) {
std::string valstr(value);
trim(valstr);
if (!replaceenv(valstr)) {
return false;
}
return strtox(valstr, retval);
}
template <typename T>
std::ostream& operator<<(std::ostream& out, const std::vector<T>& v) {
size_t last = v.size() - 1;
for (size_t i = 0; i < v.size(); ++i) {
out << v[i];
if (i != last) {
out << ", ";
}
}
return out;
}
#define SET_FIELD(FIELD, TYPE, FILL_CONFMAP) \
if (strcmp((FIELD).type, #TYPE) == 0) { \
if (!props.get((FIELD).name, (FIELD).defval, *reinterpret_cast<TYPE*>((FIELD).storage))) { \
std::cerr << "config field error: " << (FIELD).name << std::endl; \
return false; \
} \
if (FILL_CONFMAP) { \
std::ostringstream oss; \
oss << (*reinterpret_cast<TYPE*>((FIELD).storage)); \
(*full_conf_map)[(FIELD).name] = oss.str(); \
} \
continue; \
}
// init conf fields
bool init(const char* filename, bool fillconfmap) {
// load properties file
if (!props.load(filename)) {
return false;
}
// fill full_conf_map ?
if (fillconfmap && full_conf_map == nullptr) {
full_conf_map = new std::map<std::string, std::string>();
}
// set conf fields
for (const auto& it : *Register::_s_field_map) {
SET_FIELD(it.second, bool, fillconfmap);
SET_FIELD(it.second, int16_t, fillconfmap);
SET_FIELD(it.second, int32_t, fillconfmap);
SET_FIELD(it.second, int64_t, fillconfmap);
SET_FIELD(it.second, double, fillconfmap);
SET_FIELD(it.second, std::string, fillconfmap);
SET_FIELD(it.second, std::vector<bool>, fillconfmap);
SET_FIELD(it.second, std::vector<int16_t>, fillconfmap);
SET_FIELD(it.second, std::vector<int32_t>, fillconfmap);
SET_FIELD(it.second, std::vector<int64_t>, fillconfmap);
SET_FIELD(it.second, std::vector<double>, fillconfmap);
SET_FIELD(it.second, std::vector<std::string>, fillconfmap);
}
return true;
}
#define UPDATE_FIELD(FIELD, VALUE, TYPE) \
if (strcmp((FIELD).type, #TYPE) == 0) { \
if (!update((VALUE), *reinterpret_cast<TYPE*>((FIELD).storage))) { \
return Status::InvalidArgument(strings::Substitute("convert '$0' as $1 failed", VALUE, #TYPE)); \
} \
if (full_conf_map != nullptr) { \
std::ostringstream oss; \
oss << (*reinterpret_cast<TYPE*>((FIELD).storage)); \
(*full_conf_map)[(FIELD).name] = oss.str(); \
} \
return Status::OK(); \
}
Status set_config(const std::string& field, const std::string& value) {
auto it = Register::_s_field_map->find(field);
if (it == Register::_s_field_map->end()) {
return Status::NotFound(strings::Substitute("'$0' is not found", field));
}
if (!it->second.valmutable) {
return Status::NotSupported(strings::Substitute("'$0' is not supported to be modified", field));
}
UPDATE_FIELD(it->second, value, bool);
UPDATE_FIELD(it->second, value, int16_t);
UPDATE_FIELD(it->second, value, int32_t);
UPDATE_FIELD(it->second, value, int64_t);
UPDATE_FIELD(it->second, value, double);
// The other types are not thread safe to change dynamically.
return Status::NotSupported(
strings::Substitute("'$0' is of type '$1', which is not supported to be modified", field, it->second.type));
}
} // namespace config
} // namespace starrocks

128
be/src/common/configbase.h Normal file
View File

@ -0,0 +1,128 @@
// This file is made available under Elastic License 2.0.
// This file is based on code available under the Apache license here:
// https://github.com/apache/incubator-doris/blob/master/be/src/common/configbase.h
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#ifndef STARROCKS_BE_SRC_COMMON_CONFIGBASE_H
#define STARROCKS_BE_SRC_COMMON_CONFIGBASE_H
#include <stdint.h>
#include <map>
#include <string>
#include <vector>
namespace starrocks {
class Status;
namespace config {
class Register {
public:
struct Field {
const char* type = nullptr;
const char* name = nullptr;
void* storage = nullptr;
const char* defval = nullptr;
bool valmutable = false;
Field(const char* ftype, const char* fname, void* fstorage, const char* fdefval, bool fvalmutable)
: type(ftype), name(fname), storage(fstorage), defval(fdefval), valmutable(fvalmutable) {}
};
public:
static std::map<std::string, Field>* _s_field_map;
public:
Register(const char* ftype, const char* fname, void* fstorage, const char* fdefval, bool fvalmutable) {
if (_s_field_map == nullptr) {
_s_field_map = new std::map<std::string, Field>();
}
Field field(ftype, fname, fstorage, fdefval, fvalmutable);
_s_field_map->insert(std::make_pair(std::string(fname), field));
}
};
#define DEFINE_FIELD(FIELD_TYPE, FIELD_NAME, FIELD_DEFAULT, VALMUTABLE) \
FIELD_TYPE FIELD_NAME; \
static Register reg_##FIELD_NAME(#FIELD_TYPE, #FIELD_NAME, &FIELD_NAME, FIELD_DEFAULT, VALMUTABLE);
#define DECLARE_FIELD(FIELD_TYPE, FIELD_NAME) extern FIELD_TYPE FIELD_NAME;
#ifdef __IN_CONFIGBASE_CPP__
#define CONF_Bool(name, defaultstr) DEFINE_FIELD(bool, name, defaultstr, false)
#define CONF_Int16(name, defaultstr) DEFINE_FIELD(int16_t, name, defaultstr, false)
#define CONF_Int32(name, defaultstr) DEFINE_FIELD(int32_t, name, defaultstr, false)
#define CONF_Int64(name, defaultstr) DEFINE_FIELD(int64_t, name, defaultstr, false)
#define CONF_Double(name, defaultstr) DEFINE_FIELD(double, name, defaultstr, false)
#define CONF_String(name, defaultstr) DEFINE_FIELD(std::string, name, defaultstr, false)
#define CONF_Bools(name, defaultstr) DEFINE_FIELD(std::vector<bool>, name, defaultstr, false)
#define CONF_Int16s(name, defaultstr) DEFINE_FIELD(std::vector<int16_t>, name, defaultstr, false)
#define CONF_Int32s(name, defaultstr) DEFINE_FIELD(std::vector<int32_t>, name, defaultstr, false)
#define CONF_Int64s(name, defaultstr) DEFINE_FIELD(std::vector<int64_t>, name, defaultstr, false)
#define CONF_Doubles(name, defaultstr) DEFINE_FIELD(std::vector<double>, name, defaultstr, false)
#define CONF_Strings(name, defaultstr) DEFINE_FIELD(std::vector<std::string>, name, defaultstr, false)
#define CONF_mBool(name, defaultstr) DEFINE_FIELD(bool, name, defaultstr, true)
#define CONF_mInt16(name, defaultstr) DEFINE_FIELD(int16_t, name, defaultstr, true)
#define CONF_mInt32(name, defaultstr) DEFINE_FIELD(int32_t, name, defaultstr, true)
#define CONF_mInt64(name, defaultstr) DEFINE_FIELD(int64_t, name, defaultstr, true)
#define CONF_mDouble(name, defaultstr) DEFINE_FIELD(double, name, defaultstr, true)
#else
#define CONF_Bool(name, defaultstr) DECLARE_FIELD(bool, name)
#define CONF_Int16(name, defaultstr) DECLARE_FIELD(int16_t, name)
#define CONF_Int32(name, defaultstr) DECLARE_FIELD(int32_t, name)
#define CONF_Int64(name, defaultstr) DECLARE_FIELD(int64_t, name)
#define CONF_Double(name, defaultstr) DECLARE_FIELD(double, name)
#define CONF_String(name, defaultstr) DECLARE_FIELD(std::string, name)
#define CONF_Bools(name, defaultstr) DECLARE_FIELD(std::vector<bool>, name)
#define CONF_Int16s(name, defaultstr) DECLARE_FIELD(std::vector<int16_t>, name)
#define CONF_Int32s(name, defaultstr) DECLARE_FIELD(std::vector<int32_t>, name)
#define CONF_Int64s(name, defaultstr) DECLARE_FIELD(std::vector<int64_t>, name)
#define CONF_Doubles(name, defaultstr) DECLARE_FIELD(std::vector<double>, name)
#define CONF_Strings(name, defaultstr) DECLARE_FIELD(std::vector<std::string>, name)
#define CONF_mBool(name, defaultstr) DECLARE_FIELD(bool, name)
#define CONF_mInt16(name, defaultstr) DECLARE_FIELD(int16_t, name)
#define CONF_mInt32(name, defaultstr) DECLARE_FIELD(int32_t, name)
#define CONF_mInt64(name, defaultstr) DECLARE_FIELD(int64_t, name)
#define CONF_mDouble(name, defaultstr) DECLARE_FIELD(double, name)
#endif
// configuration properties load from config file.
class Properties {
public:
bool load(const char* filename);
template <typename T>
bool get(const char* key, const char* defstr, T& retval) const;
private:
std::map<std::string, std::string> file_conf_map;
};
extern Properties props;
// full configurations.
extern std::map<std::string, std::string>* full_conf_map;
bool init(const char* filename, bool fillconfmap = false);
Status set_config(const std::string& field, const std::string& value);
} // namespace config
} // namespace starrocks
#endif // STARROCKS_BE_SRC_COMMON_CONFIGBASE_H
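// Minimal usage sketch (illustrative, not part of the original header): load the
// conf file at startup, then change one mutable (CONF_m*) field at runtime. The
// path and the field name below are placeholders.
#include <iostream>
#include "common/configbase.h"
#include "common/status.h"

inline int config_usage_sketch() {
    // Parse the conf file and remember every resolved value in full_conf_map.
    if (!starrocks::config::init("/path/to/be.conf", /*fillconfmap=*/true)) {
        std::cerr << "failed to load be.conf" << std::endl;
        return 1;
    }
    // Only fields declared with CONF_m* may be changed at runtime; set_config on
    // a non-mutable field returns Status::NotSupported.
    starrocks::Status st = starrocks::config::set_config("path_gc_check_step", "500");
    std::cout << st.to_string() << std::endl;
    return 0;
}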

350
be/src/common/daemon.cpp Normal file
View File

@ -0,0 +1,350 @@
// This file is made available under Elastic License 2.0.
// This file is based on code available under the Apache license here:
// https://github.com/apache/incubator-doris/blob/master/be/src/common/daemon.cpp
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#include "common/daemon.h"
#include <gflags/gflags.h>
#include <gperftools/malloc_extension.h>
#include "column/column_helper.h"
#include "column/column_pool.h"
#include "common/config.h"
#include "common/minidump.h"
#include "exprs/bitmap_function.h"
#include "exprs/cast_functions.h"
#include "exprs/compound_predicate.h"
#include "exprs/decimal_operators.h"
#include "exprs/decimalv2_operators.h"
#include "exprs/encryption_functions.h"
#include "exprs/es_functions.h"
#include "exprs/grouping_sets_functions.h"
#include "exprs/hash_functions.h"
#include "exprs/hll_function.h"
#include "exprs/hll_hash_function.h"
#include "exprs/is_null_predicate.h"
#include "exprs/json_functions.h"
#include "exprs/like_predicate.h"
#include "exprs/math_functions.h"
#include "exprs/new_in_predicate.h"
#include "exprs/operators.h"
#include "exprs/percentile_function.h"
#include "exprs/string_functions.h"
#include "exprs/time_operators.h"
#include "exprs/timestamp_functions.h"
#include "exprs/utility_functions.h"
#include "geo/geo_functions.h"
#include "runtime/bufferpool/buffer_pool.h"
#include "runtime/exec_env.h"
#include "runtime/mem_tracker.h"
#include "runtime/memory/chunk_allocator.h"
#include "runtime/user_function_cache.h"
#include "runtime/vectorized/time_types.h"
#include "storage/options.h"
#include "util/cpu_info.h"
#include "util/debug_util.h"
#include "util/disk_info.h"
#include "util/logging.h"
#include "util/mem_info.h"
#include "util/network_util.h"
#include "util/starrocks_metrics.h"
#include "util/system_metrics.h"
#include "util/thrift_util.h"
#include "util/time.h"
namespace starrocks {
bool k_starrocks_exit = false;
class ReleaseColumnPool {
public:
explicit ReleaseColumnPool(double ratio) : _ratio(ratio) {}
template <typename Pool>
void operator()() {
_freed_bytes += Pool::singleton()->release_free_columns(_ratio);
}
size_t freed_bytes() const { return _freed_bytes; }
private:
double _ratio;
size_t _freed_bytes = 0;
};
void* tcmalloc_gc_thread(void* dummy) {
using namespace starrocks::vectorized;
const static float kFreeRatio = 0.5;
while (1) {
sleep(10);
#if !defined(ADDRESS_SANITIZER) && !defined(LEAK_SANITIZER) && !defined(THREAD_SANITIZER)
MallocExtension::instance()->MarkThreadBusy();
#endif
ReleaseColumnPool releaser(kFreeRatio);
ForEach<ColumnPoolList>(releaser);
LOG_IF(INFO, releaser.freed_bytes() > 0) << "Released " << releaser.freed_bytes() << " bytes from column pool";
auto* local_column_pool_mem_tracker = ExecEnv::GetInstance()->local_column_pool_mem_tracker();
if (local_column_pool_mem_tracker != nullptr) {
// Updating the MemTracker on every column allocation or release could hurt
// performance, so update it periodically here instead.
local_column_pool_mem_tracker->consume(g_column_pool_total_local_bytes.get_value() -
local_column_pool_mem_tracker->consumption());
}
auto* central_column_pool_mem_tracker = ExecEnv::GetInstance()->central_column_pool_mem_tracker();
if (central_column_pool_mem_tracker != nullptr) {
// Updating the MemTracker on every column allocation or release could hurt
// performance, so update it periodically here instead.
central_column_pool_mem_tracker->consume(g_column_pool_total_central_bytes.get_value() -
central_column_pool_mem_tracker->consumption());
}
#if !defined(ADDRESS_SANITIZER) && !defined(LEAK_SANITIZER) && !defined(THREAD_SANITIZER)
size_t used_size = 0;
size_t free_size = 0;
MallocExtension::instance()->GetNumericProperty("generic.current_allocated_bytes", &used_size);
MallocExtension::instance()->GetNumericProperty("tcmalloc.pageheap_free_bytes", &free_size);
size_t phy_size = used_size + free_size; // physical memory usage
if (phy_size > config::tc_use_memory_min) {
size_t max_free_size = phy_size * config::tc_free_memory_rate / 100;
if (free_size > max_free_size) {
MallocExtension::instance()->ReleaseToSystem(free_size - max_free_size);
}
}
MallocExtension::instance()->MarkThreadIdle();
#endif
}
return NULL;
}
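// Worked example (illustrative numbers): with tc_free_memory_rate = 20, if
// tcmalloc reports used_size = 8 GB and free_size = 3 GB, then phy_size = 11 GB
// and max_free_size = 2.2 GB, so roughly 0.8 GB of page-heap memory is returned
// to the operating system on that pass (assuming phy_size > tc_use_memory_min).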
void* memory_maintenance_thread(void* dummy) {
while (true) {
sleep(config::memory_maintenance_sleep_time_s);
ExecEnv* env = ExecEnv::GetInstance();
// ExecEnv may not have been created yet or this may be the catalogd or statestored,
// which don't have ExecEnvs.
if (env != nullptr) {
BufferPool* buffer_pool = env->buffer_pool();
if (buffer_pool != nullptr) buffer_pool->Maintenance();
// The process limit as measured by our trackers may get out of sync with the
// process usage if memory is allocated or freed without updating a MemTracker.
// The metric is refreshed whenever memory is consumed or released via a MemTracker,
// so on a system with queries executing it will be refreshed frequently. However
// if the system is idle, we need to refresh the tracker occasionally since
// untracked memory may be allocated or freed, e.g. by background threads.
if (env->process_mem_tracker() != nullptr && !env->process_mem_tracker()->is_consumption_metric_null()) {
env->process_mem_tracker()->RefreshConsumptionFromMetric();
}
}
}
return NULL;
}
/*
* this thread will calculate some metrics at a fixed interval (15 sec)
* 1. push bytes per second
* 2. scan bytes per second
* 3. max io util of all disks
* 4. max network send bytes rate
* 5. max network receive bytes rate
*/
void* calculate_metrics(void* dummy) {
int64_t last_ts = -1L;
int64_t lst_push_bytes = -1;
int64_t lst_query_bytes = -1;
std::map<std::string, int64_t> lst_disks_io_time;
std::map<std::string, int64_t> lst_net_send_bytes;
std::map<std::string, int64_t> lst_net_receive_bytes;
while (true) {
StarRocksMetrics::instance()->metrics()->trigger_hook();
if (last_ts == -1L) {
last_ts = MonotonicSeconds();
lst_push_bytes = StarRocksMetrics::instance()->push_request_write_bytes.value();
lst_query_bytes = StarRocksMetrics::instance()->query_scan_bytes.value();
StarRocksMetrics::instance()->system_metrics()->get_disks_io_time(&lst_disks_io_time);
StarRocksMetrics::instance()->system_metrics()->get_network_traffic(&lst_net_send_bytes,
&lst_net_receive_bytes);
} else {
int64_t current_ts = MonotonicSeconds();
long interval = (current_ts - last_ts);
last_ts = current_ts;
// 1. push bytes per second
int64_t current_push_bytes = StarRocksMetrics::instance()->push_request_write_bytes.value();
int64_t pps = (current_push_bytes - lst_push_bytes) / (interval == 0 ? 1 : interval);
StarRocksMetrics::instance()->push_request_write_bytes_per_second.set_value(pps < 0 ? 0 : pps);
lst_push_bytes = current_push_bytes;
// 2. query bytes per second
int64_t current_query_bytes = StarRocksMetrics::instance()->query_scan_bytes.value();
int64_t qps = (current_query_bytes - lst_query_bytes) / (interval == 0 ? 1 : interval);
StarRocksMetrics::instance()->query_scan_bytes_per_second.set_value(qps < 0 ? 0 : qps);
lst_query_bytes = current_query_bytes;
// 3. max disk io util
StarRocksMetrics::instance()->max_disk_io_util_percent.set_value(
StarRocksMetrics::instance()->system_metrics()->get_max_io_util(lst_disks_io_time, 15));
// update lst map
StarRocksMetrics::instance()->system_metrics()->get_disks_io_time(&lst_disks_io_time);
// 4. max network traffic
int64_t max_send = 0;
int64_t max_receive = 0;
StarRocksMetrics::instance()->system_metrics()->get_max_net_traffic(
lst_net_send_bytes, lst_net_receive_bytes, 15, &max_send, &max_receive);
StarRocksMetrics::instance()->max_network_send_bytes_rate.set_value(max_send);
StarRocksMetrics::instance()->max_network_receive_bytes_rate.set_value(max_receive);
// update lst map
StarRocksMetrics::instance()->system_metrics()->get_network_traffic(&lst_net_send_bytes,
&lst_net_receive_bytes);
}
sleep(15); // 15 seconds
}
return NULL;
}
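// Worked example (illustrative): if push_request_write_bytes grows from 100 MB
// to 160 MB between two passes 15 seconds apart, the thread publishes
// push_request_write_bytes_per_second = (160 MB - 100 MB) / 15 s = 4 MB/s.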
static void init_starrocks_metrics(const std::vector<StorePath>& store_paths) {
bool init_system_metrics = config::enable_system_metrics;
std::set<std::string> disk_devices;
std::vector<std::string> network_interfaces;
std::vector<std::string> paths;
for (auto& store_path : store_paths) {
paths.emplace_back(store_path.path);
}
if (init_system_metrics) {
auto st = DiskInfo::get_disk_devices(paths, &disk_devices);
if (!st.ok()) {
LOG(WARNING) << "get disk devices failed, stauts=" << st.get_error_msg();
return;
}
st = get_inet_interfaces(&network_interfaces);
if (!st.ok()) {
LOG(WARNING) << "get inet interfaces failed, stauts=" << st.get_error_msg();
return;
}
}
StarRocksMetrics::instance()->initialize(paths, init_system_metrics, disk_devices, network_interfaces);
if (config::enable_metric_calculator) {
pthread_t calculator_pid;
pthread_create(&calculator_pid, NULL, calculate_metrics, NULL);
}
}
void sigterm_handler(int signo) {
k_starrocks_exit = true;
}
int install_signal(int signo, void (*handler)(int)) {
struct sigaction sa;
memset(&sa, 0, sizeof(struct sigaction));
sa.sa_handler = handler;
sigemptyset(&sa.sa_mask);
auto ret = sigaction(signo, &sa, nullptr);
if (ret != 0) {
PLOG(ERROR) << "install signal failed, signo=" << signo;
}
return ret;
}
void init_signals() {
auto ret = install_signal(SIGINT, sigterm_handler);
if (ret < 0) {
exit(-1);
}
ret = install_signal(SIGTERM, sigterm_handler);
if (ret < 0) {
exit(-1);
}
}
void init_minidump() {
if (config::sys_minidump_enable) {
LOG(INFO) << "Minidump is enable";
Minidump::init();
} else {
LOG(INFO) << "Minidump is disable";
}
}
void init_daemon(int argc, char** argv, const std::vector<StorePath>& paths) {
// google::SetVersionString(get_build_version(false));
// google::ParseCommandLineFlags(&argc, &argv, true);
google::ParseCommandLineFlags(&argc, &argv, true);
init_glog("be", true);
LOG(INFO) << get_version_string(false);
init_thrift_logging();
CpuInfo::init();
DiskInfo::init();
MemInfo::init();
UserFunctionCache::instance()->init(config::user_function_dir);
Operators::init();
IsNullPredicate::init();
LikePredicate::init();
StringFunctions::init();
CastFunctions::init();
InPredicate::init();
MathFunctions::init();
EncryptionFunctions::init();
TimestampFunctions::init();
DecimalOperators::init();
DecimalV2Operators::init();
TimeOperators::init();
UtilityFunctions::init();
CompoundPredicate::init();
JsonFunctions::init();
HllHashFunctions::init();
ESFunctions::init();
GeoFunctions::init();
GroupingSetsFunctions::init();
BitmapFunctions::init();
HllFunctions::init();
HashFunctions::init();
PercentileFunctions::init();
vectorized::ColumnHelper::init_static_variable();
vectorized::date::init_date_cache();
pthread_t tc_malloc_pid;
pthread_create(&tc_malloc_pid, NULL, tcmalloc_gc_thread, NULL);
pthread_t buffer_pool_pid;
pthread_create(&buffer_pool_pid, NULL, memory_maintenance_thread, NULL);
LOG(INFO) << CpuInfo::debug_string();
LOG(INFO) << DiskInfo::debug_string();
LOG(INFO) << MemInfo::debug_string();
init_starrocks_metrics(paths);
init_signals();
init_minidump();
ChunkAllocator::init_instance(config::chunk_reserved_bytes_limit);
}
} // namespace starrocks

38
be/src/common/daemon.h Normal file
View File

@ -0,0 +1,38 @@
// This file is made available under Elastic License 2.0.
// This file is based on code available under the Apache license here:
// https://github.com/apache/incubator-doris/blob/master/be/src/common/daemon.h
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#ifndef STARROCKS_BE_SRC_COMMON_COMMON_DAEMON_H
#define STARROCKS_BE_SRC_COMMON_COMMON_DAEMON_H
#include <vector>
#include "storage/options.h"
namespace starrocks {
// Initialises logging, flags etc. Callers that want to override default gflags
// variables should do so before calling this method; no logging should be
// performed until after this method returns.
void init_daemon(int argc, char** argv, const std::vector<StorePath>& paths);
} // namespace starrocks
#endif

28
be/src/common/env_config.h.in Normal file
View File

@ -0,0 +1,28 @@
// This file is made available under Elastic License 2.0.
// This file is based on code available under the Apache license here:
// https://github.com/apache/incubator-doris/blob/master/be/src/common/env_config.h.in
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#pragma once
namespace starrocks {
#cmakedefine HAVE_SCHED_GETCPU @HAVE_SCHED_GETCPU@
}

37
be/src/common/global_types.h Normal file
View File

@ -0,0 +1,37 @@
// This file is made available under Elastic License 2.0.
// This file is based on code available under the Apache license here:
// https://github.com/apache/incubator-doris/blob/master/be/src/common/global_types.h
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#ifndef STARROCKS_BE_SRC_COMMON_COMMON_GLOBAL_TYPES_H
#define STARROCKS_BE_SRC_COMMON_COMMON_GLOBAL_TYPES_H
namespace starrocks {
// for now, these are simply ints; if we find we need to generate ids in the
// backend, we can also introduce separate classes for these to make them
// assignment-incompatible
typedef int TupleId;
typedef int SlotId;
typedef int TableId;
typedef int PlanNodeId;
}; // namespace starrocks
#endif

35
be/src/common/hdfs.h Normal file
View File

@ -0,0 +1,35 @@
// This file is made available under Elastic License 2.0.
// This file is based on code available under the Apache license here:
// https://github.com/apache/incubator-doris/blob/master/be/src/common/hdfs.h
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#ifndef STARROCKS_BE_SRC_COMMON_COMMON_HDFS_H
#define STARROCKS_BE_SRC_COMMON_COMMON_HDFS_H
// This is a wrapper around the hdfs header. When we are compiling to IR,
// we don't want to pull in the hdfs headers. We only need the headers
// for the typedefs which we will replicate here
// TODO: is this the cleanest way?
#ifdef IR_COMPILE
typedef void* hdfsFS;
typedef void* hdfsFile;
#else
#endif
#endif

166
be/src/common/logconfig.cpp Normal file
View File

@ -0,0 +1,166 @@
// This file is made available under Elastic License 2.0.
// This file is based on code available under the Apache license here:
// https://github.com/apache/incubator-doris/blob/master/be/src/common/logconfig.cpp
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#include <glog/logging.h>
#include <glog/vlog_is_on.h>
#include <cerrno>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <mutex>
#include "common/config.h"
#include "gutil/stringprintf.h"
#include "util/logging.h"
namespace starrocks {
static bool logging_initialized = false;
static std::mutex logging_mutex;
static bool iequals(const std::string& a, const std::string& b) {
unsigned int sz = a.size();
if (b.size() != sz) {
return false;
}
for (unsigned int i = 0; i < sz; ++i) {
if (tolower(a[i]) != tolower(b[i])) {
return false;
}
}
return true;
}
bool init_glog(const char* basename, bool install_signal_handler) {
std::lock_guard<std::mutex> logging_lock(logging_mutex);
if (logging_initialized) {
return true;
}
if (install_signal_handler) {
google::InstallFailureSignalHandler();
}
// don't log to stderr
FLAGS_stderrthreshold = 5;
// set glog log dir
FLAGS_log_dir = config::sys_log_dir;
// 0 means buffer INFO only
FLAGS_logbuflevel = 0;
// buffer log messages for at most this many seconds
FLAGS_logbufsecs = 30;
// set roll num
FLAGS_log_filenum_quota = config::sys_log_roll_num;
// set log level
std::string& loglevel = config::sys_log_level;
if (iequals(loglevel, "INFO")) {
FLAGS_minloglevel = 0;
} else if (iequals(loglevel, "WARNING")) {
FLAGS_minloglevel = 1;
} else if (iequals(loglevel, "ERROR")) {
FLAGS_minloglevel = 2;
} else if (iequals(loglevel, "FATAL")) {
FLAGS_minloglevel = 3;
} else {
std::cerr << "sys_log_level needs to be INFO, WARNING, ERROR, FATAL" << std::endl;
return false;
}
// set log buffer level
// default is 0
std::string& logbuflevel = config::log_buffer_level;
if (iequals(logbuflevel, "-1")) {
FLAGS_logbuflevel = -1;
} else if (iequals(logbuflevel, "0")) {
FLAGS_logbuflevel = 0;
}
// set log roll mode
std::string& rollmode = config::sys_log_roll_mode;
std::string sizeflag = "SIZE-MB-";
bool ok = false;
if (rollmode.compare("TIME-DAY") == 0) {
FLAGS_log_split_method = "day";
ok = true;
} else if (rollmode.compare("TIME-HOUR") == 0) {
FLAGS_log_split_method = "hour";
ok = true;
} else if (rollmode.substr(0, sizeflag.length()).compare(sizeflag) == 0) {
FLAGS_log_split_method = "size";
std::string sizestr = rollmode.substr(sizeflag.size(), rollmode.size() - sizeflag.size());
if (sizestr.size() != 0) {
char* end = NULL;
errno = 0;
const char* sizecstr = sizestr.c_str();
int64_t ret64 = strtoll(sizecstr, &end, 10);
if ((errno == 0) && (end == sizecstr + strlen(sizecstr))) {
int32_t retval = static_cast<int32_t>(ret64);
if (retval == ret64) {
FLAGS_max_log_size = retval;
ok = true;
}
}
}
} else {
ok = false;
}
if (!ok) {
std::cerr << "sys_log_roll_mode needs to be TIME-DAY, TIME-HOUR, SIZE-MB-nnn" << std::endl;
return false;
}
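// Example (illustrative): sys_log_roll_mode = "SIZE-MB-1024" selects size-based
// splitting with FLAGS_max_log_size = 1024 (MB); "TIME-DAY" and "TIME-HOUR"
// split the log by day and by hour respectively.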
// set verbose modules.
FLAGS_v = -1;
std::vector<std::string>& verbose_modules = config::sys_log_verbose_modules;
int32_t vlog_level = config::sys_log_verbose_level;
for (size_t i = 0; i < verbose_modules.size(); i++) {
if (verbose_modules[i].size() != 0) {
google::SetVLOGLevel(verbose_modules[i].c_str(), vlog_level);
}
}
google::InitGoogleLogging(basename);
logging_initialized = true;
return true;
}
void shutdown_logging() {
std::lock_guard<std::mutex> logging_lock(logging_mutex);
google::ShutdownGoogleLogging();
}
std::string FormatTimestampForLog(MicrosecondsInt64 micros_since_epoch) {
time_t secs_since_epoch = micros_since_epoch / 1000000;
int usecs = micros_since_epoch % 1000000;
struct tm tm_time;
localtime_r(&secs_since_epoch, &tm_time);
return StringPrintf("%02d%02d %02d:%02d:%02d.%06d", 1 + tm_time.tm_mon, tm_time.tm_mday, tm_time.tm_hour,
tm_time.tm_min, tm_time.tm_sec, usecs);
}
} // namespace starrocks

83
be/src/common/logging.h Normal file
View File

@ -0,0 +1,83 @@
// This file is made available under Elastic License 2.0.
// This file is based on code available under the Apache license here:
// https://github.com/apache/incubator-doris/blob/master/be/src/common/logging.h
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#ifndef IMPALA_COMMON_LOGGING_H
#define IMPALA_COMMON_LOGGING_H
// This is a wrapper around the glog header. When we are compiling to IR,
// we don't want to pull in the glog headers. Pulling them in causes linking
// issues when we try to dynamically link the codegen'd functions.
#ifdef IR_COMPILE
#include <iostream>
#define DCHECK(condition) \
while (false) std::cout
#define DCHECK_EQ(a, b) \
while (false) std::cout
#define DCHECK_NE(a, b) \
while (false) std::cout
#define DCHECK_GT(a, b) \
while (false) std::cout
#define DCHECK_LT(a, b) \
while (false) std::cout
#define DCHECK_GE(a, b) \
while (false) std::cout
#define DCHECK_LE(a, b) \
while (false) std::cout
// Similar to how glog defines DCHECK for release.
#define LOG(level) \
while (false) std::cout
#define VLOG(level) \
while (false) std::cout
#else
// GLOG defines this based on the system but doesn't check if it's already
// been defined. undef it first to avoid warnings.
// glog MUST be included before gflags. Instead of including them,
// our files should include this file instead.
#undef _XOPEN_SOURCE
// This is including a glog internal file. We want this to expose the
// function to get the stack trace.
#include <glog/logging.h>
#undef MutexLock
#endif
// Define VLOG levels. We want per-row info displayed less often than per-file info,
// which in turn is less often than per-query. For now per-connection is the same as per-query.
#define VLOG_CONNECTION VLOG(1)
#define VLOG_RPC VLOG(8)
#define VLOG_QUERY VLOG(1)
#define VLOG_FILE VLOG(2)
#define VLOG_ROW VLOG(10)
#define VLOG_PROGRESS VLOG(2)
#define VLOG_CONNECTION_IS_ON VLOG_IS_ON(1)
#define VLOG_RPC_IS_ON VLOG_IS_ON(2)
#define VLOG_QUERY_IS_ON VLOG_IS_ON(1)
#define VLOG_FILE_IS_ON VLOG_IS_ON(2)
#define VLOG_ROW_IS_ON VLOG_IS_ON(3)
#define VLOG_PROGRESS_IS_ON VLOG_IS_ON(2)
namespace starrocks {
class TUniqueId;
}
#define QUERY_LOG(level) LOG(level) << "[" << CurrentThread::query_id_string() << "] "
#endif

164
be/src/common/minidump.cpp Normal file
View File

@ -0,0 +1,164 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
//
#include "common/minidump.h"
#include <client/linux/handler/exception_handler.h>
#include <common/linux/linux_libc_support.h>
#include <glob.h>
#include <google_breakpad/common/minidump_format.h>
#include <signal.h>
#include <ctime>
#include <filesystem>
#include <fstream>
#include <iomanip>
#include <map>
#include <sstream>
#include <system_error>
#include <thread>
#include "common/config.h"
#include "util/logging.h"
namespace starrocks {
void Minidump::init() {
get_instance();
}
Minidump& Minidump::get_instance() {
// Singleton pattern to ensure only one instance generates minidumps.
static Minidump instance;
return instance;
}
/// Signal handler to write a minidump file outside of crashes.
void Minidump::handle_signal(int signal) {
//google_breakpad::ExceptionHandler::WriteMinidump(get_instance()._path, Minidump::dump_callback, NULL);
google_breakpad::MinidumpDescriptor descriptor(get_instance()._minidump_dir);
// Set size limit for generated minidumps
size_t size_limit = 1024 * static_cast<int64_t>(config::sys_minidump_limit);
descriptor.set_size_limit(size_limit);
google_breakpad::ExceptionHandler eh(descriptor, NULL, Minidump::dump_callback, nullptr, false, -1);
eh.WriteMinidump();
}
Minidump::Minidump() : _minidump(), _minidump_dir(config::sys_minidump_dir) {
// Clean old minidumps
check_and_rotate_minidumps(config::sys_minidump_max_files, _minidump_dir);
google_breakpad::MinidumpDescriptor descriptor(_minidump_dir);
// Set size limit for generated minidumps
size_t size_limit = 1024 * static_cast<int64_t>(config::sys_minidump_limit);
descriptor.set_size_limit(size_limit);
// Step 1: use breakpad to generate minidump caused by crash.
_minidump.reset(new google_breakpad::ExceptionHandler(descriptor, Minidump::filter_callback,
Minidump::dump_callback, NULL, true, -1));
// Step 2: write a minidump in reaction to SIGUSR1.
struct sigaction signal_action;
memset(&signal_action, 0, sizeof(signal_action));
sigemptyset(&signal_action.sa_mask);
signal_action.sa_handler = Minidump::handle_signal;
// kill -10 pid
sigaction(SIGUSR1, &signal_action, nullptr);
}
// This implementation for cleaning the oldest and malformed minidump files is adapted from Impala.
void Minidump::check_and_rotate_minidumps(int max_minidumps, const std::string& minidump_dir) {
if (max_minidumps <= 0) return;
// Search for minidumps. There could be multiple minidumps for a single second.
std::multimap<int, std::filesystem::path> timestamp_to_path;
// For example: 2b1af619-8a02-49d7-72652c8d-a9b32de1.dmp.
string pattern = minidump_dir + "/*.dmp";
glob_t result;
glob(pattern.c_str(), GLOB_TILDE, NULL, &result);
for (size_t i = 0; i < result.gl_pathc; ++i) {
const std::filesystem::path minidump_path(result.gl_pathv[i]);
std::error_code err;
bool is_file = std::filesystem::is_regular_file(minidump_path, err);
// std::filesystem::is_regular_file() calls stat() eventually, which can return errors, e.g. if the
// file permissions prevented access or the path was wrong (see 'man 2 stat' for
// details). In these cases we assume that the issue is out of our control and err on
// the safe side by keeping the minidump around, hoping it will aid in debugging the
// issue. The alternative, removing a ~2MB file, will probably not help much anyways.
if (err) {
LOG(WARNING) << "Failed to stat() file " << minidump_path << ": " << err;
continue;
}
if (is_file) {
std::ifstream stream(minidump_path.c_str(), std::ios::in | std::ios::binary);
if (!stream.good()) {
// Error opening file, probably broken, remove it.
LOG(WARNING) << "Failed to open file " << minidump_path << ". Removing it.";
stream.close();
// Best effort, ignore error.
std::filesystem::remove(minidump_path.c_str(), err);
continue;
}
// Read minidump header from file.
MDRawHeader header;
constexpr int header_size = sizeof(header);
stream.read((char*)(&header), header_size);
// Check for minidump header signature and version. We don't need to check for
// endianness issues here since the file was written on the same machine. Ignore the
// higher 16 bits of the version, as per a comment in the breakpad sources.
if (stream.gcount() != header_size || header.signature != MD_HEADER_SIGNATURE ||
(header.version & 0x0000ffff) != MD_HEADER_VERSION) {
LOG(WARNING) << "Found file in minidump folder, but it does not look like a "
<< "minidump file: " << minidump_path.string() << ". Removing it.";
std::filesystem::remove(minidump_path, err);
if (err) {
LOG(ERROR) << "Failed to delete file: " << minidump_path << "(error was: " << err << ")";
}
continue;
}
int timestamp = header.time_date_stamp;
timestamp_to_path.emplace(timestamp, minidump_path);
}
}
globfree(&result);
// Remove oldest entries until max_minidumps are left.
if (timestamp_to_path.size() <= max_minidumps) return;
int files_to_delete = timestamp_to_path.size() - max_minidumps;
DCHECK_GT(files_to_delete, 0);
auto to_delete = timestamp_to_path.begin();
for (int i = 0; i < files_to_delete; ++i, ++to_delete) {
std::error_code err;
std::filesystem::remove(to_delete->second, err);
if (!err) {
LOG(INFO) << "Removed old minidump file : " << to_delete->second;
} else {
LOG(ERROR) << "Failed to delete old minidump file: " << to_delete->second << "(error was: " << err << ")";
}
}
}
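// Example (illustrative): with sys_minidump_max_files = 16, if 20 valid *.dmp
// files are found, the 4 with the smallest time_date_stamp in their headers are
// removed and the 16 newest are kept.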
bool Minidump::dump_callback(const google_breakpad::MinidumpDescriptor& descriptor, void* context, bool succeeded) {
// Output minidump file path
if (succeeded) {
// Write message to stdout/stderr
const char msg[] = "Dump path: ";
const int msg_len = sizeof(msg) / sizeof(msg[0]) - 1;
const char* path = descriptor.path();
// We use breakpad's reimplementation of strlen to avoid calling into libc.
const int path_len = my_strlen(path);
// We use the linux syscall support methods from chromium here as per the
// recommendation of the breakpad docs to avoid calling into other shared libraries.
sys_write(STDOUT_FILENO, msg, msg_len);
sys_write(STDOUT_FILENO, path, path_len);
sys_write(STDOUT_FILENO, "\n", 1);
}
return succeeded;
}
bool Minidump::filter_callback(void* context) {
return true;
}
} // namespace starrocks

31
be/src/common/minidump.h Normal file
View File

@ -0,0 +1,31 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include <memory>
#include <string>
namespace google_breakpad {
class MinidumpDescriptor;
class ExceptionHandler;
} // namespace google_breakpad
namespace starrocks {
class Minidump {
public:
static void init();
private:
Minidump();
static Minidump& get_instance();
void check_and_rotate_minidumps(int, const std::string&);
static bool dump_callback(const google_breakpad::MinidumpDescriptor& descriptor, void* context, bool succeeded);
static void handle_signal(int signal);
static bool filter_callback(void* context);
std::unique_ptr<google_breakpad::ExceptionHandler> _minidump;
const std::string _minidump_dir;
};
} // namespace starrocks

83
be/src/common/object_pool.h Normal file
View File

@ -0,0 +1,83 @@
// This file is made available under Elastic License 2.0.
// This file is based on code available under the Apache license here:
// https://github.com/apache/incubator-doris/blob/master/be/src/common/object_pool.h
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#ifndef STARROCKS_BE_SRC_COMMON_COMMON_OBJECT_POOL_H
#define STARROCKS_BE_SRC_COMMON_COMMON_OBJECT_POOL_H
#include <mutex>
#include <vector>
#include "util/spinlock.h"
namespace starrocks {
// An ObjectPool maintains a list of C++ objects which are deallocated
// by destroying the pool.
// Thread-safe.
class ObjectPool {
public:
ObjectPool() = default;
~ObjectPool() { clear(); }
ObjectPool(const ObjectPool& pool) = delete;
ObjectPool& operator=(const ObjectPool& pool) = delete;
ObjectPool(ObjectPool&& pool) = default;
ObjectPool& operator=(ObjectPool&& pool) = default;
template <class T>
T* add(T* t) {
// TODO: Consider using a lock-free structure.
std::lock_guard<SpinLock> l(_lock);
_objects.emplace_back(Element{t, [](void* obj) { delete reinterpret_cast<T*>(obj); }});
return t;
}
void clear() {
std::lock_guard<SpinLock> l(_lock);
for (auto i = _objects.rbegin(); i != _objects.rend(); ++i) {
i->delete_fn(i->obj);
}
_objects.clear();
}
void acquire_data(ObjectPool* src) {
_objects.insert(_objects.end(), src->_objects.begin(), src->_objects.end());
src->_objects.clear();
}
private:
/// A generic deletion function pointer. Deletes its first argument.
using DeleteFn = void (*)(void*);
/// For each object, a pointer to the object and a function that deletes it.
struct Element {
void* obj;
DeleteFn delete_fn;
};
std::vector<Element> _objects;
SpinLock _lock;
};
} // namespace starrocks
#endif
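// Usage sketch (illustrative, not part of the original header): objects handed to
// the pool are deleted in reverse insertion order when clear() runs or when the
// pool itself is destroyed; no manual delete is needed.
#include <string>
#include <vector>
#include "common/object_pool.h"

inline void object_pool_usage_sketch() {
    starrocks::ObjectPool pool;
    auto* names = pool.add(new std::vector<std::string>());
    names->emplace_back("tablet_0");
    // Everything added above is released here (or in the pool's destructor).
    pool.clear();
}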

12
be/src/common/ownership.h Normal file
View File

@ -0,0 +1,12 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
namespace starrocks {
enum Ownership {
kTakesOwnership,
kDontTakeOwnership,
};
}

71
be/src/common/resource_tls.cpp Normal file
View File

@ -0,0 +1,71 @@
// This file is made available under Elastic License 2.0.
// This file is based on code available under the Apache license here:
// https://github.com/apache/incubator-doris/blob/master/be/src/common/resource_tls.cpp
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#include "common/resource_tls.h"
#include <pthread.h>
#include "common/logging.h"
#include "gen_cpp/Types_types.h"
namespace starrocks {
static pthread_key_t s_resource_key;
static bool s_is_init = false;
static void resource_destructor(void* value) {
TResourceInfo* info = (TResourceInfo*)value;
if (info != nullptr) {
delete info;
}
}
void ResourceTls::init() {
int ret = pthread_key_create(&s_resource_key, resource_destructor);
if (ret != 0) {
LOG(ERROR) << "create pthread key for resource failed.";
return;
}
s_is_init = true;
}
TResourceInfo* ResourceTls::get_resource_tls() {
if (!s_is_init) {
return nullptr;
}
return (TResourceInfo*)pthread_getspecific(s_resource_key);
}
int ResourceTls::set_resource_tls(TResourceInfo* info) {
if (!s_is_init) {
return -1;
}
TResourceInfo* old_info = (TResourceInfo*)pthread_getspecific(s_resource_key);
int ret = pthread_setspecific(s_resource_key, info);
if (ret == 0) {
// OK, now we delete old one
delete old_info;
}
return ret;
}
} // namespace starrocks

37
be/src/common/resource_tls.h Normal file
View File

@ -0,0 +1,37 @@
// This file is made available under Elastic License 2.0.
// This file is based on code available under the Apache license here:
// https://github.com/apache/incubator-doris/blob/master/be/src/common/resource_tls.h
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#ifndef STARROCKS_BE_SRC_COMMON_COMMON_RESOURCE_TLS_H
#define STARROCKS_BE_SRC_COMMON_COMMON_RESOURCE_TLS_H
namespace starrocks {
class TResourceInfo;
class ResourceTls {
public:
static void init();
static TResourceInfo* get_resource_tls();
static int set_resource_tls(TResourceInfo*);
};
} // namespace starrocks
#endif

209
be/src/common/status.cpp Normal file
View File

@ -0,0 +1,209 @@
// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file. See the AUTHORS file for names of contributors.
#include "common/status.h"
#include "gutil/strings/fastmem.h" // for memcpy_inlined
namespace starrocks {
// See Status::_state for details.
static const char g_moved_from_state[8] = {'\x00', '\x00', '\x00', '\x00', TStatusCode::INTERNAL_ERROR,
'\x00', '\x00', '\x00'};
inline const char* assemble_state(TStatusCode::type code, const Slice& msg, int16_t precise_code, const Slice& msg2) {
DCHECK(code != TStatusCode::OK);
const uint32_t len1 = msg.size;
const uint32_t len2 = msg2.size;
const uint32_t size = len1 + ((len2 > 0) ? (2 + len2) : 0);
auto result = new char[size + 7];
memcpy(result, &size, sizeof(size));
result[4] = static_cast<char>(code);
memcpy(result + 5, &precise_code, sizeof(precise_code));
memcpy(result + 7, msg.data, len1);
if (len2 > 0) {
result[7 + len1] = ':';
result[8 + len1] = ' ';
memcpy(result + 9 + len1, msg2.data, len2);
}
return result;
}
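// Layout of the heap block assembled above (illustrative summary):
//   bytes [0, 4)       uint32 length of the message text
//   byte  4            the TStatusCode, stored as a single char
//   bytes [5, 7)       int16 precise code (1 means "none")
//   bytes [7, 7 + len) message text: msg, optionally followed by ": " and msg2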
const char* Status::copy_state(const char* state) {
uint32_t size;
strings::memcpy_inlined(&size, state, sizeof(size));
auto result = new char[size + 7];
strings::memcpy_inlined(result, state, size + 7);
return result;
}
Status::Status(const TStatus& s) : _state(nullptr) {
if (s.status_code != TStatusCode::OK) {
if (s.error_msgs.empty()) {
_state = assemble_state(s.status_code, Slice(), 1, Slice());
} else {
_state = assemble_state(s.status_code, s.error_msgs[0], 1, Slice());
}
}
}
Status::Status(const PStatus& s) : _state(nullptr) {
TStatusCode::type code = (TStatusCode::type)s.status_code();
if (code != TStatusCode::OK) {
if (s.error_msgs_size() == 0) {
_state = assemble_state(code, Slice(), 1, Slice());
} else {
_state = assemble_state(code, s.error_msgs(0), 1, Slice());
}
}
}
Status::Status(TStatusCode::type code, const Slice& msg, int16_t precise_code, const Slice& msg2)
: _state(assemble_state(code, msg, precise_code, msg2)) {}
void Status::to_thrift(TStatus* s) const {
s->error_msgs.clear();
if (_state == nullptr) {
s->status_code = TStatusCode::OK;
} else {
s->status_code = code();
auto msg = message();
s->error_msgs.emplace_back(msg.data, msg.size);
s->__isset.error_msgs = true;
}
}
void Status::to_protobuf(PStatus* s) const {
s->clear_error_msgs();
if (_state == nullptr) {
s->set_status_code((int)TStatusCode::OK);
} else {
s->set_status_code(code());
auto msg = message();
s->add_error_msgs(msg.data, msg.size);
}
}
std::string Status::code_as_string() const {
if (_state == nullptr) {
return "OK";
}
switch (code()) {
case TStatusCode::OK:
return "OK";
case TStatusCode::CANCELLED:
return "Cancelled";
case TStatusCode::NOT_IMPLEMENTED_ERROR:
return "Not supported";
case TStatusCode::RUNTIME_ERROR:
return "Runtime error";
case TStatusCode::MEM_LIMIT_EXCEEDED:
return "Memory limit exceeded";
case TStatusCode::INTERNAL_ERROR:
return "Internal error";
case TStatusCode::THRIFT_RPC_ERROR:
return "Rpc error";
case TStatusCode::TIMEOUT:
return "Timeout";
case TStatusCode::MEM_ALLOC_FAILED:
return "Memory alloc failed";
case TStatusCode::BUFFER_ALLOCATION_FAILED:
return "Buffer alloc failed";
case TStatusCode::MINIMUM_RESERVATION_UNAVAILABLE:
return "Minimum reservation unavailable";
case TStatusCode::PUBLISH_TIMEOUT:
return "Publish timeout";
case TStatusCode::LABEL_ALREADY_EXISTS:
return "Label already exist";
case TStatusCode::END_OF_FILE:
return "End of file";
case TStatusCode::NOT_FOUND:
return "Not found";
case TStatusCode::CORRUPTION:
return "Corruption";
case TStatusCode::INVALID_ARGUMENT:
return "Invalid argument";
case TStatusCode::IO_ERROR:
return "IO error";
case TStatusCode::ALREADY_EXIST:
return "Already exist";
case TStatusCode::NETWORK_ERROR:
return "Network error";
case TStatusCode::ILLEGAL_STATE:
return "Illegal state";
case TStatusCode::NOT_AUTHORIZED:
return "Not authorized";
case TStatusCode::REMOTE_ERROR:
return "Remote error";
case TStatusCode::SERVICE_UNAVAILABLE:
return "Service unavailable";
case TStatusCode::UNINITIALIZED:
return "Uninitialized";
case TStatusCode::CONFIGURATION_ERROR:
return "Configuration error";
case TStatusCode::INCOMPLETE:
return "Incomplete";
case TStatusCode::DATA_QUALITY_ERROR:
return "Data quality error";
default: {
char tmp[30];
snprintf(tmp, sizeof(tmp), "Unknown code(%d): ", static_cast<int>(code()));
return tmp;
}
}
return std::string();
}
std::string Status::to_string() const {
std::string result(code_as_string());
if (_state == nullptr) {
return result;
}
result.append(": ");
Slice msg = message();
result.append(reinterpret_cast<const char*>(msg.data), msg.size);
int16_t posix = precise_code();
if (posix != 1) {
char buf[64];
snprintf(buf, sizeof(buf), " (error %d)", posix);
result.append(buf);
}
return result;
}
Slice Status::message() const {
if (_state == nullptr) {
return Slice();
}
uint32_t length;
memcpy(&length, _state, sizeof(length));
return Slice(_state + 7, length);
}
Status Status::clone_and_prepend(const Slice& msg) const {
if (ok()) {
return *this;
}
return Status(code(), msg, precise_code(), message());
}
Status Status::clone_and_append(const Slice& msg) const {
if (ok()) {
return *this;
}
return Status(code(), message(), precise_code(), msg);
}
const char* Status::moved_from_state() {
return g_moved_from_state;
}
bool Status::is_moved_from(const char* state) {
return state == moved_from_state();
}
} // namespace starrocks
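A short sketch (not in the source) that spells out the _state layout produced by assemble_state for one concrete status; the byte offsets follow the encoding documented in status.h.

#include "common/status.h"

namespace starrocks {
// "open failed" (11 bytes) + ": " + "no such file" (12 bytes) gives a
// 25-byte message, so _state holds
//   bytes 0..3 : length 25, byte 4 : IO_ERROR, bytes 5..6 : precise_code 2,
//   bytes 7..  : "open failed: no such file".
void status_layout_demo() {
    Status s = Status::IOError("open failed", 2, "no such file");
    // s.code()         == TStatusCode::IO_ERROR
    // s.precise_code() == 2
    // s.message()      == "open failed: no such file"
    // s.to_string()    == "IO error: open failed: no such file (error 2)"
    LOG(INFO) << s.to_string();
}
} // namespace starrocks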

316
be/src/common/status.h Normal file
View File

@ -0,0 +1,316 @@
// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file. See the AUTHORS file for names of contributors.
#pragma once
#include <string>
#include <vector>
#include "common/compiler_util.h"
#include "common/logging.h"
#include "gen_cpp/Status_types.h" // for TStatus
#include "gen_cpp/status.pb.h" // for PStatus
#include "util/slice.h" // for Slice
namespace starrocks {
class Status {
public:
Status() : _state(nullptr) {}
~Status() noexcept {
if (!is_moved_from(_state)) {
delete[] _state;
}
}
// copy c'tor makes copy of error detail so Status can be returned by value
Status(const Status& s) : _state(s._state == nullptr ? nullptr : copy_state(s._state)) {}
// move c'tor
Status(Status&& s) noexcept : _state(s._state) { s._state = moved_from_state(); }
// same as copy c'tor
Status& operator=(const Status& s) {
if (this != &s) {
Status tmp(s);
std::swap(this->_state, tmp._state);
}
return *this;
}
// move assign
Status& operator=(Status&& s) noexcept {
if (this != &s) {
Status tmp(std::move(s));
std::swap(this->_state, tmp._state);
}
return *this;
}
// "Copy" c'tor from TStatus.
Status(const TStatus& status); // NOLINT
Status(const PStatus& pstatus); // NOLINT
static Status OK() { return Status(); }
static Status Unknown(const Slice& msg, int16_t precise_code = 1, const Slice& msg2 = Slice()) {
return Status(TStatusCode::UNKNOWN, msg, precise_code, msg2);
}
static Status PublishTimeout(const Slice& msg, int16_t precise_code = 1, const Slice& msg2 = Slice()) {
return Status(TStatusCode::PUBLISH_TIMEOUT, msg, precise_code, msg2);
}
static Status MemoryAllocFailed(const Slice& msg, int16_t precise_code = 1, const Slice& msg2 = Slice()) {
return Status(TStatusCode::MEM_ALLOC_FAILED, msg, precise_code, msg2);
}
static Status BufferAllocFailed(const Slice& msg, int16_t precise_code = 1, const Slice& msg2 = Slice()) {
return Status(TStatusCode::BUFFER_ALLOCATION_FAILED, msg, precise_code, msg2);
}
static Status InvalidArgument(const Slice& msg, int16_t precise_code = 1, const Slice& msg2 = Slice()) {
return Status(TStatusCode::INVALID_ARGUMENT, msg, precise_code, msg2);
}
static Status MinimumReservationUnavailable(const Slice& msg, int16_t precise_code = 1,
const Slice& msg2 = Slice()) {
return Status(TStatusCode::MINIMUM_RESERVATION_UNAVAILABLE, msg, precise_code, msg2);
}
static Status Corruption(const Slice& msg, int16_t precise_code = 1, const Slice& msg2 = Slice()) {
return Status(TStatusCode::CORRUPTION, msg, precise_code, msg2);
}
static Status IOError(const Slice& msg, int16_t precise_code = 1, const Slice& msg2 = Slice()) {
return Status(TStatusCode::IO_ERROR, msg, precise_code, msg2);
}
static Status NotFound(const Slice& msg, int16_t precise_code = 1, const Slice& msg2 = Slice()) {
return Status(TStatusCode::NOT_FOUND, msg, precise_code, msg2);
}
static Status AlreadyExist(const Slice& msg, int16_t precise_code = 1, const Slice& msg2 = Slice()) {
return Status(TStatusCode::ALREADY_EXIST, msg, precise_code, msg2);
}
static Status NotSupported(const Slice& msg, int16_t precise_code = 1, const Slice& msg2 = Slice()) {
return Status(TStatusCode::NOT_IMPLEMENTED_ERROR, msg, precise_code, msg2);
}
static Status EndOfFile(const Slice& msg, int16_t precise_code = 1, const Slice& msg2 = Slice()) {
return Status(TStatusCode::END_OF_FILE, msg, precise_code, msg2);
}
static Status InternalError(const Slice& msg, int16_t precise_code = 1, const Slice& msg2 = Slice()) {
return Status(TStatusCode::INTERNAL_ERROR, msg, precise_code, msg2);
}
static Status RuntimeError(const Slice& msg, int16_t precise_code = 1, const Slice& msg2 = Slice()) {
return Status(TStatusCode::RUNTIME_ERROR, msg, precise_code, msg2);
}
static Status Cancelled(const Slice& msg, int16_t precise_code = 1, const Slice& msg2 = Slice()) {
return Status(TStatusCode::CANCELLED, msg, precise_code, msg2);
}
static Status MemoryLimitExceeded(const Slice& msg, int16_t precise_code = 1, const Slice& msg2 = Slice()) {
return Status(TStatusCode::MEM_LIMIT_EXCEEDED, msg, precise_code, msg2);
}
static Status ThriftRpcError(const Slice& msg, int16_t precise_code = 1, const Slice& msg2 = Slice()) {
return Status(TStatusCode::THRIFT_RPC_ERROR, msg, precise_code, msg2);
}
static Status TimedOut(const Slice& msg, int16_t precise_code = 1, const Slice& msg2 = Slice()) {
return Status(TStatusCode::TIMEOUT, msg, precise_code, msg2);
}
static Status TooManyTasks(const Slice& msg, int16_t precise_code = 1, const Slice& msg2 = Slice()) {
return Status(TStatusCode::TOO_MANY_TASKS, msg, precise_code, msg2);
}
static Status ServiceUnavailable(const Slice& msg, int16_t precise_code = -1, const Slice& msg2 = Slice()) {
return Status(TStatusCode::SERVICE_UNAVAILABLE, msg, precise_code, msg2);
}
static Status Uninitialized(const Slice& msg, int16_t precise_code = -1, const Slice& msg2 = Slice()) {
return Status(TStatusCode::UNINITIALIZED, msg, precise_code, msg2);
}
static Status Aborted(const Slice& msg, int16_t precise_code = -1, const Slice& msg2 = Slice()) {
return Status(TStatusCode::ABORTED, msg, precise_code, msg2);
}
static Status DataQualityError(const Slice& msg, int16_t precise_code = -1, const Slice& msg2 = Slice()) {
return Status(TStatusCode::DATA_QUALITY_ERROR, msg, precise_code, msg2);
}
static Status VersionAlreadyMerged(const Slice& msg, int16_t precise_code = -1, const Slice& msg2 = Slice()) {
return Status(TStatusCode::OLAP_ERR_VERSION_ALREADY_MERGED, msg, precise_code, msg2);
}
bool ok() const { return _state == nullptr; }
bool is_cancelled() const { return code() == TStatusCode::CANCELLED; }
bool is_mem_limit_exceeded() const { return code() == TStatusCode::MEM_LIMIT_EXCEEDED; }
bool is_thrift_rpc_error() const { return code() == TStatusCode::THRIFT_RPC_ERROR; }
bool is_end_of_file() const { return code() == TStatusCode::END_OF_FILE; }
bool is_not_found() const { return code() == TStatusCode::NOT_FOUND; }
bool is_already_exist() const { return code() == TStatusCode::ALREADY_EXIST; }
bool is_io_error() const { return code() == TStatusCode::IO_ERROR; }
bool is_not_supported() const { return code() == TStatusCode::NOT_IMPLEMENTED_ERROR; }
/// @return @c true iff the status indicates Uninitialized.
bool is_uninitialized() const { return code() == TStatusCode::UNINITIALIZED; }
// @return @c true iff the status indicates an Aborted error.
bool is_aborted() const { return code() == TStatusCode::ABORTED; }
/// @return @c true iff the status indicates an InvalidArgument error.
bool is_invalid_argument() const { return code() == TStatusCode::INVALID_ARGUMENT; }
// @return @c true iff the status indicates ServiceUnavailable.
bool is_service_unavailable() const { return code() == TStatusCode::SERVICE_UNAVAILABLE; }
bool is_data_quality_error() const { return code() == TStatusCode::DATA_QUALITY_ERROR; }
bool is_version_already_merged() const { return code() == TStatusCode::OLAP_ERR_VERSION_ALREADY_MERGED; }
// Convert into TStatus. Call this if 'status_container' contains an optional
// TStatus field named 'status'. This also sets __isset.status.
template <typename T>
void set_t_status(T* status_container) const {
to_thrift(&status_container->status);
status_container->__isset.status = true;
}
// Convert into TStatus.
void to_thrift(TStatus* status) const;
void to_protobuf(PStatus* status) const;
std::string get_error_msg() const {
auto msg = message();
return std::string(msg.data, msg.size);
}
/// @return A string representation of this status suitable for printing.
/// Returns the string "OK" for success.
std::string to_string() const;
/// @return A string representation of the status code, without the message
/// text or sub code information.
std::string code_as_string() const;
// This is similar to to_string, except that it does not include
// the stringified error code or sub code.
//
// @note The returned Slice is only valid as long as this Status object
// remains live and unchanged.
//
// @return The message portion of the Status. For @c OK statuses,
// this returns an empty string.
Slice message() const;
TStatusCode::type code() const {
return _state == nullptr ? TStatusCode::OK : static_cast<TStatusCode::type>(_state[4]);
}
int16_t precise_code() const {
if (_state == nullptr) {
return 0;
}
int16_t precise_code;
memcpy(&precise_code, _state + 5, sizeof(precise_code));
return precise_code;
}
/// Clone this status and add the specified prefix to the message.
///
/// If this status is OK, then an OK status will be returned.
///
/// @param [in] msg
/// The message to prepend.
/// @return A new Status object with the same state plus an additional
/// leading message.
Status clone_and_prepend(const Slice& msg) const;
/// Clone this status and add the specified suffix to the message.
///
/// If this status is OK, then an OK status will be returned.
///
/// @param [in] msg
/// The message to append.
/// @return A new Status object with the same state plus an additional
/// trailing message.
Status clone_and_append(const Slice& msg) const;
private:
const char* copy_state(const char* state);
// Indicates whether this Status was the rhs of a move operation.
static bool is_moved_from(const char* state);
static const char* moved_from_state();
Status(TStatusCode::type code, const Slice& msg, int16_t precise_code, const Slice& msg2);
private:
// OK status has a nullptr _state. Otherwise, _state is a new[] array
// of the following form:
// _state[0..3] == length of message
// _state[4] == code
// _state[5..6] == precise_code
// _state[7..] == message
const char* _state;
};
inline std::ostream& operator<<(std::ostream& os, const Status& st) {
return os << st.to_string();
}
// some generally useful macros
#define RETURN_IF_ERROR(stmt) \
do { \
const Status& _status_ = (stmt); \
if (UNLIKELY(!_status_.ok())) { \
return _status_; \
} \
} while (false)
#define RETURN_IF_STATUS_ERROR(status, stmt) \
do { \
status = (stmt); \
if (UNLIKELY(!status.ok())) { \
return; \
} \
} while (false)
#define EXIT_IF_ERROR(stmt) \
do { \
const Status& _status_ = (stmt); \
if (UNLIKELY(!_status_.ok())) { \
string msg = _status_.get_error_msg(); \
LOG(ERROR) << msg; \
exit(1); \
} \
} while (false)
/// @brief Emit a warning if @c to_call returns a bad status.
#define WARN_IF_ERROR(to_call, warning_prefix) \
do { \
const Status& _s = (to_call); \
if (UNLIKELY(!_s.ok())) { \
LOG(WARNING) << (warning_prefix) << ": " << _s.to_string(); \
} \
} while (0);
#define RETURN_CODE_IF_ERROR_WITH_WARN(stmt, ret_code, warning_prefix) \
do { \
const Status& _s = (stmt); \
if (UNLIKELY(!_s.ok())) { \
LOG(WARNING) << (warning_prefix) << ", error: " << _s.to_string(); \
return ret_code; \
} \
} while (0);
#define RETURN_IF_ERROR_WITH_WARN(stmt, warning_prefix) \
do { \
const Status& _s = (stmt); \
if (UNLIKELY(!_s.ok())) { \
LOG(WARNING) << (warning_prefix) << ", error: " << _s.to_string(); \
return _s; \
} \
} while (0);
} // namespace starrocks
#define RETURN_IF(cond, ret) \
do { \
if (cond) { \
return ret; \
} \
} while (0)
#define WARN_UNUSED_RESULT __attribute__((warn_unused_result))
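A usage sketch (not in the commit) showing how the Status factories compose with the RETURN_IF_ERROR and WARN_IF_ERROR macros above; open_file is a hypothetical helper.

#include "common/status.h"

namespace starrocks {
Status open_file(const std::string& path); // hypothetical, defined elsewhere

Status read_header(const std::string& path) {
    RETURN_IF_ERROR(open_file(path)); // propagate any non-OK status to the caller
    return Status::OK();
}

void load(const std::string& path) {
    // Log the failure and continue instead of propagating it.
    WARN_IF_ERROR(read_header(path), "failed to read header of " + path);
}
} // namespace starrocks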

51
be/src/common/statusor.cpp Normal file
View File

@ -0,0 +1,51 @@
// Copyright 2020 The Abseil Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "common/statusor.h"
#include <cstdlib>
#include <iostream>
#include <utility>
#include "common/status.h"
namespace starrocks {
BadStatusOrAccess::BadStatusOrAccess(starrocks::Status status) : status_(std::move(status)) {}
BadStatusOrAccess::~BadStatusOrAccess() = default;
const char* BadStatusOrAccess::what() const noexcept {
return "Bad StatusOr access";
}
const starrocks::Status& BadStatusOrAccess::status() const {
return status_;
}
namespace internal_statusor {
void Helper::HandleInvalidStatusCtorArg(starrocks::Status* status) {
const char* kMessage = "An OK status is not a valid constructor argument to StatusOr<T>";
*status = starrocks::Status::InternalError(kMessage);
}
void Helper::Crash(const starrocks::Status& status) {
std::cerr << "Attempting to fetch value instead of handling error " << status.to_string();
std::abort();
}
void ThrowBadStatusOrAccess(starrocks::Status status) {
throw starrocks::BadStatusOrAccess(std::move(status));
}
} // namespace internal_statusor
} // namespace starrocks

705
be/src/common/statusor.h Normal file
View File

@ -0,0 +1,705 @@
// Copyright 2020 The Abseil Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
// -----------------------------------------------------------------------------
// File: statusor.h
// -----------------------------------------------------------------------------
//
// An `starrocks::StatusOr<T>` represents a union of an `starrocks::Status` object
// and an object of type `T`. The `starrocks::StatusOr<T>` will either contain an
// object of type `T` (indicating a successful operation), or an error (of type
// `starrocks::Status`) explaining why such a value is not present.
//
// In general, check the success of an operation returning an
// `starrocks::StatusOr<T>` like you would an `starrocks::Status` by using the `ok()`
// member function.
//
// Example:
//
// StatusOr<Foo> result = Calculation();
// if (result.ok()) {
// result->DoSomethingCool();
// } else {
// LOG(ERROR) << result.status();
// }
#pragma once
#include <exception>
#include <initializer_list>
#include <new>
#include <string>
#include <type_traits>
#include <utility>
#include "common/status.h"
#include "common/statusor_internal.h"
namespace starrocks {
// BadStatusOrAccess
//
// This class defines the type of object to throw (if exceptions are enabled),
// when accessing the value of an `starrocks::StatusOr<T>` object that does not
// contain a value. This behavior is analogous to that of
// `std::bad_optional_access` in the case of accessing an invalid
// `std::optional` value.
//
// Example:
//
// try {
// starrocks::StatusOr<int> v = FetchInt();
// DoWork(v.value()); // Accessing value() when not "OK" may throw
// } catch (starrocks::BadStatusOrAccess& ex) {
// LOG(ERROR) << ex.status();
// }
class BadStatusOrAccess : public std::exception {
public:
explicit BadStatusOrAccess(starrocks::Status status);
~BadStatusOrAccess() override;
// BadStatusOrAccess::what()
//
// Returns the associated explanatory string of the `starrocks::StatusOr<T>`
// object's error code. This function only returns the string literal "Bad
// StatusOr access" for cases when evaluating general exceptions.
//
// The pointer of this string is guaranteed to be valid until any non-const
// function is invoked on the exception object.
const char* what() const noexcept override;
// BadStatusOrAccess::status()
//
// Returns the associated `starrocks::Status` of the `starrocks::StatusOr<T>` object's
// error.
const starrocks::Status& status() const;
private:
starrocks::Status status_;
};
// Returned StatusOr objects may not be ignored.
template <typename T>
class [[nodiscard]] StatusOr;
// starrocks::StatusOr<T>
//
// The `starrocks::StatusOr<T>` class template is a union of an `starrocks::Status` object
// and an object of type `T`. The `starrocks::StatusOr<T>` models an object that is
// either a usable object, or an error (of type `starrocks::Status`) explaining why
// such an object is not present. An `starrocks::StatusOr<T>` is typically the return
// value of a function which may fail.
//
// An `starrocks::StatusOr<T>` can never hold an "OK" status (an
// `starrocks::TStatusCode::OK` value); instead, the presence of an object of type
// `T` indicates success. Instead of checking for a `kOk` value, use the
// `starrocks::StatusOr<T>::ok()` member function. (It is for this reason, and code
// readability, that using the `ok()` function is preferred for `starrocks::Status`
// as well.)
//
// Example:
//
// StatusOr<Foo> result = DoBigCalculationThatCouldFail();
// if (result.ok()) {
// result->DoSomethingCool();
// } else {
// LOG(ERROR) << result.status();
// }
//
// Accessing the object held by an `starrocks::StatusOr<T>` should be performed via
// `operator*` or `operator->`, after a call to `ok()` confirms that the
// `starrocks::StatusOr<T>` holds an object of type `T`:
//
// Example:
//
// starrocks::StatusOr<int> i = GetCount();
// if (i.ok()) {
// updated_total += *i
// }
//
// NOTE: using `starrocks::StatusOr<T>::value()` when no valid value is present will
// throw an exception if exceptions are enabled or terminate the process when
// exceptions are not enabled.
//
// Example:
//
// StatusOr<Foo> result = DoBigCalculationThatCouldFail();
// const Foo& foo = result.value(); // Crash/exception if no value present
// foo.DoSomethingCool();
//
// A `starrocks::StatusOr<T*>` can be constructed from a null pointer like any other
// pointer value, and the result will be that `ok()` returns `true` and
// `value()` returns `nullptr`. Checking the value of pointer in an
// `starrocks::StatusOr<T>` generally requires a bit more care, to ensure both that a
// value is present and that value is not null:
//
// StatusOr<std::unique_ptr<Foo>> result = FooFactory::MakeNewFoo(arg);
// if (!result.ok()) {
// LOG(ERROR) << result.status();
// } else if (*result == nullptr) {
// LOG(ERROR) << "Unexpected null pointer";
// } else {
// (*result)->DoSomethingCool();
// }
//
// Example factory implementation returning StatusOr<T>:
//
// StatusOr<Foo> FooFactory::MakeFoo(int arg) {
// if (arg <= 0) {
// return starrocks::Status::InvalidArgument("Arg must be positive");
// }
// return Foo(arg);
// }
template <typename T>
class StatusOr : private internal_statusor::StatusOrData<T>,
private internal_statusor::CopyCtorBase<T>,
private internal_statusor::MoveCtorBase<T>,
private internal_statusor::CopyAssignBase<T>,
private internal_statusor::MoveAssignBase<T> {
template <typename U>
friend class StatusOr;
typedef internal_statusor::StatusOrData<T> Base;
public:
// StatusOr<T>::value_type
//
// This instance data provides a generic `value_type` member for use within
// generic programming. This usage is analogous to that of
// `optional::value_type` in the case of `std::optional`.
typedef T value_type;
// Constructors
// Constructs a new `starrocks::StatusOr` with an `starrocks::TStatusCode::UNKNOWN`
// status. This constructor is marked 'explicit' to prevent usages in return
// values such as 'return {};', under the misconception that
// `starrocks::StatusOr<std::vector<int>>` will be initialized with an empty
// vector, instead of an `starrocks::TStatusCode::UNKNOWN` error code.
explicit StatusOr();
// `StatusOr<T>` is copy constructible if `T` is copy constructible.
StatusOr(const StatusOr&) = default;
// `StatusOr<T>` is copy assignable if `T` is copy constructible and copy
// assignable.
StatusOr& operator=(const StatusOr&) = default;
// `StatusOr<T>` is move constructible if `T` is move constructible.
StatusOr(StatusOr&&) = default;
// `StatusOr<T>` is move assignable if `T` is move constructible and move
// assignable.
StatusOr& operator=(StatusOr&&) = default;
// Converting Constructors
// Constructs a new `starrocks::StatusOr<T>` from an `starrocks::StatusOr<U>`, when `T`
// is constructible from `U`. To avoid ambiguity, these constructors are
// disabled if `T` is also constructible from `StatusOr<U>.`. This constructor
// is explicit if and only if the corresponding construction of `T` from `U`
// is explicit. (This constructor inherits its explicitness from the
// underlying constructor.)
template <typename U,
std::enable_if_t<
std::conjunction<
std::negation<std::is_same<T, U>>, std::is_constructible<T, const U&>,
std::is_convertible<const U&, T>,
std::negation<internal_statusor::IsConstructibleOrConvertibleFromStatusOr<T, U>>>::value,
int> = 0>
StatusOr(const StatusOr<U>& other) // NOLINT
: Base(static_cast<const typename StatusOr<U>::Base&>(other)) {}
template <typename U,
std::enable_if_t<
std::conjunction<
std::negation<std::is_same<T, U>>, std::is_constructible<T, const U&>,
std::negation<std::is_convertible<const U&, T>>,
std::negation<internal_statusor::IsConstructibleOrConvertibleFromStatusOr<T, U>>>::value,
int> = 0>
explicit StatusOr(const StatusOr<U>& other) : Base(static_cast<const typename StatusOr<U>::Base&>(other)) {}
template <typename U,
std::enable_if_t<
std::conjunction<
std::negation<std::is_same<T, U>>, std::is_constructible<T, U&&>,
std::is_convertible<U&&, T>,
std::negation<internal_statusor::IsConstructibleOrConvertibleFromStatusOr<T, U>>>::value,
int> = 0>
StatusOr(StatusOr<U>&& other) // NOLINT
: Base(static_cast<typename StatusOr<U>::Base&&>(other)) {}
template <typename U,
std::enable_if_t<
std::conjunction<
std::negation<std::is_same<T, U>>, std::is_constructible<T, U&&>,
std::negation<std::is_convertible<U&&, T>>,
std::negation<internal_statusor::IsConstructibleOrConvertibleFromStatusOr<T, U>>>::value,
int> = 0>
explicit StatusOr(StatusOr<U>&& other) : Base(static_cast<typename StatusOr<U>::Base&&>(other)) {}
// Converting Assignment Operators
// Creates an `starrocks::StatusOr<T>` through assignment from an
// `starrocks::StatusOr<U>` when:
//
// * Both `starrocks::StatusOr<T>` and `starrocks::StatusOr<U>` are OK by assigning
// `U` to `T` directly.
// * `starrocks::StatusOr<T>` is OK and `starrocks::StatusOr<U>` contains an error
// code by destroying `starrocks::StatusOr<T>`'s value and assigning from
// `starrocks::StatusOr<U>'
// * `starrocks::StatusOr<T>` contains an error code and `starrocks::StatusOr<U>` is
// OK by directly initializing `T` from `U`.
// * Both `starrocks::StatusOr<T>` and `starrocks::StatusOr<U>` contain an error
// code by assigning the `Status` in `starrocks::StatusOr<U>` to
// `starrocks::StatusOr<T>`
//
// These overloads only apply if `starrocks::StatusOr<T>` is constructible and
// assignable from `starrocks::StatusOr<U>` and `StatusOr<T>` cannot be directly
// assigned from `StatusOr<U>`.
template <typename U,
std::enable_if_t<
std::conjunction<
std::negation<std::is_same<T, U>>, std::is_constructible<T, const U&>,
std::is_assignable<T, const U&>,
std::negation<internal_statusor::IsConstructibleOrConvertibleOrAssignableFromStatusOr<
T, U>>>::value,
int> = 0>
StatusOr& operator=(const StatusOr<U>& other) {
this->Assign(other);
return *this;
}
template <typename U,
std::enable_if_t<
std::conjunction<
std::negation<std::is_same<T, U>>, std::is_constructible<T, U&&>,
std::is_assignable<T, U&&>,
std::negation<internal_statusor::IsConstructibleOrConvertibleOrAssignableFromStatusOr<
T, U>>>::value,
int> = 0>
StatusOr& operator=(StatusOr<U>&& other) {
this->Assign(std::move(other));
return *this;
}
// Constructs a new `starrocks::StatusOr<T>` with a non-ok status. After calling
// this constructor, `this->ok()` will be `false` and calls to `value()` will
// crash, or produce an exception if exceptions are enabled.
//
// The constructor also takes any type `U` that is convertible to
// `starrocks::Status`. This constructor is explicit if and only if `U` is not of
// type `starrocks::Status` and the conversion from `U` to `Status` is explicit.
//
// REQUIRES: !Status(std::forward<U>(v)).ok(). This requirement is DCHECKed.
// In optimized builds, passing starrocks::Status::OK() here will have the effect
// of passing starrocks::TStatusCode::INTERNAL_ERROR as a fallback.
template <
typename U = starrocks::Status,
std::enable_if_t<
std::conjunction<std::is_convertible<U&&, starrocks::Status>,
std::is_constructible<starrocks::Status, U&&>,
std::negation<std::is_same<std::decay_t<U>, starrocks::StatusOr<T>>>,
std::negation<std::is_same<std::decay_t<U>, T>>,
std::negation<std::is_same<std::decay_t<U>, std::in_place_t>>,
std::negation<internal_statusor::HasConversionOperatorToStatusOr<T, U&&>>>::value,
int> = 0>
StatusOr(U&& v) : Base(std::forward<U>(v)) {}
template <
typename U = starrocks::Status,
std::enable_if_t<
std::conjunction<std::negation<std::is_convertible<U&&, starrocks::Status>>,
std::is_constructible<starrocks::Status, U&&>,
std::negation<std::is_same<std::decay_t<U>, starrocks::StatusOr<T>>>,
std::negation<std::is_same<std::decay_t<U>, T>>,
std::negation<std::is_same<std::decay_t<U>, std::in_place_t>>,
std::negation<internal_statusor::HasConversionOperatorToStatusOr<T, U&&>>>::value,
int> = 0>
explicit StatusOr(U&& v) : Base(std::forward<U>(v)) {}
template <
typename U = starrocks::Status,
std::enable_if_t<
std::conjunction<std::is_convertible<U&&, starrocks::Status>,
std::is_constructible<starrocks::Status, U&&>,
std::negation<std::is_same<std::decay_t<U>, starrocks::StatusOr<T>>>,
std::negation<std::is_same<std::decay_t<U>, T>>,
std::negation<std::is_same<std::decay_t<U>, std::in_place_t>>,
std::negation<internal_statusor::HasConversionOperatorToStatusOr<T, U&&>>>::value,
int> = 0>
StatusOr& operator=(U&& v) {
this->AssignStatus(std::forward<U>(v));
return *this;
}
// Perfect-forwarding value assignment operator.
// If `*this` contains a `T` value before the call, the contained value is
// assigned from `std::forward<U>(v)`; Otherwise, it is directly-initialized
// from `std::forward<U>(v)`.
// This function does not participate in overload resolution unless:
// 1. `std::is_constructible_v<T, U>` is true,
// 2. `std::is_assignable_v<T&, U>` is true,
// 3. `std::is_same_v<StatusOr<T>, std::remove_cvref_t<U>>` is false, and
// 4. Assigning `U` to `T` is not ambiguous:
// If `U` is `StatusOr<V>` and `T` is constructible and assignable from
// both `StatusOr<V>` and `V`, the assignment is considered bug-prone and
// ambiguous thus will fail to compile. For example:
// StatusOr<bool> s1 = true; // s1.ok() && *s1 == true
// StatusOr<bool> s2 = false; // s2.ok() && *s2 == false
// s1 = s2; // ambiguous, `s1 = *s2` or `s1 = bool(s2)`?
template <
typename U = T,
typename = typename std::enable_if<std::conjunction<
std::is_constructible<T, U&&>, std::is_assignable<T&, U&&>,
std::disjunction<std::is_same<std::remove_cv_t<std::remove_reference_t<U>>, T>,
std::conjunction<std::negation<std::is_convertible<U&&, starrocks::Status>>,
std::negation<internal_statusor::HasConversionOperatorToStatusOr<
T, U&&>>>>,
internal_statusor::IsForwardingAssignmentValid<T, U&&>>::value>::type>
StatusOr& operator=(U&& v) {
this->Assign(std::forward<U>(v));
return *this;
}
// Constructs the inner value `T` in-place using the provided args, using the
// `T(args...)` constructor.
template <typename... Args>
explicit StatusOr(std::in_place_t, Args&&... args);
template <typename U, typename... Args>
explicit StatusOr(std::in_place_t, std::initializer_list<U> ilist, Args&&... args);
// Constructs the inner value `T` in-place using the provided args, using the
// `T(U)` (direct-initialization) constructor. This constructor is only valid
// if `T` can be constructed from a `U`. Can accept move or copy constructors.
//
// This constructor is explicit if `U` is not convertible to `T`. To avoid
// ambiguity, this constructor is disabled if `U` is a `StatusOr<J>`, where `J`
// is convertible to `T`.
template <typename U = T,
std::enable_if_t<
std::conjunction<
internal_statusor::IsDirectInitializationValid<T, U&&>, std::is_constructible<T, U&&>,
std::is_convertible<U&&, T>,
std::disjunction<
std::is_same<std::remove_cv_t<std::remove_reference_t<U>>, T>,
std::conjunction<std::negation<std::is_convertible<U&&, starrocks::Status>>,
std::negation<internal_statusor::HasConversionOperatorToStatusOr<
T, U&&>>>>>::value,
int> = 0>
StatusOr(U&& u) // NOLINT
: StatusOr(std::in_place, std::forward<U>(u)) {}
template <
typename U = T,
std::enable_if_t<
std::conjunction<
internal_statusor::IsDirectInitializationValid<T, U&&>,
std::disjunction<
std::is_same<std::remove_cv_t<std::remove_reference_t<U>>, T>,
std::conjunction<
std::negation<std::is_constructible<starrocks::Status, U&&>>,
std::negation<internal_statusor::HasConversionOperatorToStatusOr<T, U&&>>>>,
std::is_constructible<T, U&&>, std::negation<std::is_convertible<U&&, T>>>::value,
int> = 0>
explicit StatusOr(U&& u) // NOLINT
: StatusOr(std::in_place, std::forward<U>(u)) {}
// StatusOr<T>::ok()
//
// Returns whether or not this `starrocks::StatusOr<T>` holds a `T` value. This
// member function is analogous to `starrocks::Status::ok()` and should be used
// similarly to check the status of return values.
//
// Example:
//
// StatusOr<Foo> result = DoBigCalculationThatCouldFail();
// if (result.ok()) {
// // Handle result
// } else {
// // Handle error
// }
[[nodiscard]] bool ok() const { return this->status_.ok(); }
// StatusOr<T>::status()
//
// Returns a reference to the current `starrocks::Status` contained within the
// `starrocks::StatusOr<T>`. If `starrocks::StatusOr<T>` contains a `T`, then this
// function returns `starrocks::Status::OK()`.
const Status& status() const&;
Status status() &&;
// StatusOr<T>::value()
//
// Returns a reference to the held value if `this->ok()`. Otherwise, throws
// `starrocks::BadStatusOrAccess` if exceptions are enabled, or is guaranteed to
// terminate the process if exceptions are disabled.
//
// If you have already checked the status using `this->ok()`, you probably
// want to use `operator*()` or `operator->()` to access the value instead of
// `value`.
//
// Note: for value types that are cheap to copy, prefer simple code:
//
// T value = statusor.value();
//
// Otherwise, if the value type is expensive to copy, but can be left
// in the StatusOr, simply assign to a reference:
//
// T& value = statusor.value(); // or `const T&`
//
// Otherwise, if the value type supports an efficient move, it can be
// used as follows:
//
// T value = std::move(statusor).value();
//
// The `std::move` on statusor instead of on the whole expression enables
// warnings about possible uses of the statusor object after the move.
const T& value() const&;
T& value() &;
const T&& value() const&&;
T&& value() &&;
// StatusOr<T>::operator*()
//
// Returns a reference to the current value.
//
// REQUIRES: `this->ok() == true`, otherwise the behavior is undefined.
//
// Use `this->ok()` to verify that there is a current value within the
// `starrocks::StatusOr<T>`. Alternatively, see the `value()` member function for a
// similar API that guarantees crashing or throwing an exception if there is
// no current value.
const T& operator*() const&;
T& operator*() &;
const T&& operator*() const&&;
T&& operator*() &&;
// StatusOr<T>::operator->()
//
// Returns a pointer to the current value.
//
// REQUIRES: `this->ok() == true`, otherwise the behavior is undefined.
//
// Use `this->ok()` to verify that there is a current value.
const T* operator->() const;
T* operator->();
// StatusOr<T>::value_or()
//
// Returns the current value if `this->ok() == true`. Otherwise constructs a
// value using the provided `default_value`.
//
// Unlike `value`, this function returns by value, copying the current value
// if necessary. If the value type supports an efficient move, it can be used
// as follows:
//
// T value = std::move(statusor).value_or(def);
//
// Unlike with `value`, calling `std::move()` on the result of `value_or` will
// still trigger a copy.
template <typename U>
T value_or(U&& default_value) const&;
template <typename U>
T value_or(U&& default_value) &&;
// StatusOr<T>::IgnoreError()
//
// Ignores any errors. This method does nothing except potentially suppress
// complaints from any tools that are checking that errors are not dropped on
// the floor.
void IgnoreError() const;
// StatusOr<T>::emplace()
//
// Reconstructs the inner value T in-place using the provided args, using the
// T(args...) constructor. Returns reference to the reconstructed `T`.
template <typename... Args>
T& emplace(Args&&... args) {
if (ok()) {
this->Clear();
this->MakeValue(std::forward<Args>(args)...);
} else {
this->MakeValue(std::forward<Args>(args)...);
this->status_ = starrocks::Status::OK();
}
return this->data_;
}
template <typename U, typename... Args,
std::enable_if_t<std::is_constructible<T, std::initializer_list<U>&, Args&&...>::value, int> = 0>
T& emplace(std::initializer_list<U> ilist, Args&&... args) {
if (ok()) {
this->Clear();
this->MakeValue(ilist, std::forward<Args>(args)...);
} else {
this->MakeValue(ilist, std::forward<Args>(args)...);
this->status_ = starrocks::Status::OK();
}
return this->data_;
}
private:
using internal_statusor::StatusOrData<T>::Assign;
template <typename U>
void Assign(const starrocks::StatusOr<U>& other);
template <typename U>
void Assign(starrocks::StatusOr<U>&& other);
};
// operator==()
//
// This operator checks the equality of two `starrocks::StatusOr<T>` objects.
template <typename T>
bool operator==(const StatusOr<T>& lhs, const StatusOr<T>& rhs) {
if (lhs.ok() && rhs.ok()) return *lhs == *rhs;
return lhs.status() == rhs.status();
}
// operator!=()
//
// This operator checks the inequality of two `starrocks::StatusOr<T>` objects.
template <typename T>
bool operator!=(const StatusOr<T>& lhs, const StatusOr<T>& rhs) {
return !(lhs == rhs);
}
//------------------------------------------------------------------------------
// Implementation details for StatusOr<T>
//------------------------------------------------------------------------------
// TODO(sbenza): avoid the string here completely.
template <typename T>
StatusOr<T>::StatusOr() : Base(Status::Unknown("")) {}
template <typename T>
template <typename U>
inline void StatusOr<T>::Assign(const StatusOr<U>& other) {
if (other.ok()) {
this->Assign(*other);
} else {
this->AssignStatus(other.status());
}
}
template <typename T>
template <typename U>
inline void StatusOr<T>::Assign(StatusOr<U>&& other) {
if (other.ok()) {
this->Assign(*std::move(other));
} else {
this->AssignStatus(std::move(other).status());
}
}
template <typename T>
template <typename... Args>
StatusOr<T>::StatusOr(std::in_place_t, Args&&... args) : Base(std::in_place, std::forward<Args>(args)...) {}
template <typename T>
template <typename U, typename... Args>
StatusOr<T>::StatusOr(std::in_place_t, std::initializer_list<U> ilist, Args&&... args)
: Base(std::in_place, ilist, std::forward<Args>(args)...) {}
template <typename T>
const Status& StatusOr<T>::status() const& {
return this->status_;
}
template <typename T>
Status StatusOr<T>::status() && {
return ok() ? Status::OK() : std::move(this->status_);
}
template <typename T>
const T& StatusOr<T>::value() const& {
if (!this->ok()) internal_statusor::ThrowBadStatusOrAccess(this->status_);
return this->data_;
}
template <typename T>
T& StatusOr<T>::value() & {
if (!this->ok()) internal_statusor::ThrowBadStatusOrAccess(this->status_);
return this->data_;
}
template <typename T>
const T&& StatusOr<T>::value() const&& {
if (!this->ok()) {
internal_statusor::ThrowBadStatusOrAccess(std::move(this->status_));
}
return std::move(this->data_);
}
template <typename T>
T&& StatusOr<T>::value() && {
if (!this->ok()) {
internal_statusor::ThrowBadStatusOrAccess(std::move(this->status_));
}
return std::move(this->data_);
}
template <typename T>
const T& StatusOr<T>::operator*() const& {
this->EnsureOk();
return this->data_;
}
template <typename T>
T& StatusOr<T>::operator*() & {
this->EnsureOk();
return this->data_;
}
template <typename T>
const T&& StatusOr<T>::operator*() const&& {
this->EnsureOk();
return std::move(this->data_);
}
template <typename T>
T&& StatusOr<T>::operator*() && {
this->EnsureOk();
return std::move(this->data_);
}
template <typename T>
const T* StatusOr<T>::operator->() const {
this->EnsureOk();
return &this->data_;
}
template <typename T>
T* StatusOr<T>::operator->() {
this->EnsureOk();
return &this->data_;
}
template <typename T>
template <typename U>
T StatusOr<T>::value_or(U&& default_value) const& {
if (ok()) {
return this->data_;
}
return std::forward<U>(default_value);
}
template <typename T>
template <typename U>
T StatusOr<T>::value_or(U&& default_value) && {
if (ok()) {
return std::move(this->data_);
}
return std::forward<U>(default_value);
}
template <typename T>
void StatusOr<T>::IgnoreError() const {
// no-op
}
} // namespace starrocks
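A small end-to-end sketch (not in the commit) of StatusOr in this codebase's style; parse_port is a hypothetical function used only to illustrate the value and error paths.

#include "common/statusor.h"

namespace starrocks {
// Hypothetical parser: returns the port on success, a Status on failure.
StatusOr<int> parse_port(const std::string& s) {
    if (s.empty()) {
        return Status::InvalidArgument("empty port string");
    }
    return std::stoi(s); // success path: the int is wrapped implicitly
}

void statusor_example() {
    auto port_or = parse_port("8030");
    if (!port_or.ok()) {
        LOG(WARNING) << port_or.status();
        return;
    }
    int port = *port_or;                          // checked access after ok()
    int fallback = parse_port("").value_or(9030); // default value on error
    (void)port;
    (void)fallback;
}
} // namespace starrocks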

361
be/src/common/statusor_internal.h Normal file
View File

@ -0,0 +1,361 @@
// Copyright 2020 The Abseil Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <type_traits>
#include <utility>
#include "common/status.h"
namespace starrocks {
template <typename T>
class [[nodiscard]] StatusOr;
namespace internal_statusor {
// Detects whether `U` has conversion operator to `StatusOr<T>`, i.e. `operator
// StatusOr<T>()`.
template <typename T, typename U, typename = void>
struct HasConversionOperatorToStatusOr : std::false_type {};
template <typename T, typename U>
void test(char (*)[sizeof(std::declval<U>().operator starrocks::StatusOr<T>())]);
template <typename T, typename U>
struct HasConversionOperatorToStatusOr<T, U, decltype(test<T, U>(0))> : std::true_type {};
// Detects whether `T` is constructible or convertible from `StatusOr<U>`.
template <typename T, typename U>
using IsConstructibleOrConvertibleFromStatusOr =
std::disjunction<std::is_constructible<T, StatusOr<U>&>, std::is_constructible<T, const StatusOr<U>&>,
std::is_constructible<T, StatusOr<U>&&>, std::is_constructible<T, const StatusOr<U>&&>,
std::is_convertible<StatusOr<U>&, T>, std::is_convertible<const StatusOr<U>&, T>,
std::is_convertible<StatusOr<U>&&, T>, std::is_convertible<const StatusOr<U>&&, T>>;
// Detects whether `T` is constructible or convertible or assignable from
// `StatusOr<U>`.
template <typename T, typename U>
using IsConstructibleOrConvertibleOrAssignableFromStatusOr =
std::disjunction<IsConstructibleOrConvertibleFromStatusOr<T, U>, std::is_assignable<T&, StatusOr<U>&>,
std::is_assignable<T&, const StatusOr<U>&>, std::is_assignable<T&, StatusOr<U>&&>,
std::is_assignable<T&, const StatusOr<U>&&>>;
// Detects whether direct initializing `StatusOr<T>` from `U` is ambiguous, i.e.
// when `U` is `StatusOr<V>` and `T` is constructible or convertible from `V`.
template <typename T, typename U>
struct IsDirectInitializationAmbiguous
: public std::conditional_t<std::is_same<std::remove_cv_t<std::remove_reference_t<U>>, U>::value,
std::false_type,
IsDirectInitializationAmbiguous<T, std::remove_cv_t<std::remove_reference_t<U>>>> {
};
template <typename T, typename V>
struct IsDirectInitializationAmbiguous<T, starrocks::StatusOr<V>>
: public IsConstructibleOrConvertibleFromStatusOr<T, V> {};
// Checks against the constraints of direct initialization, i.e. when
// `StatusOr<T>::StatusOr(U&&)` should participate in overload resolution.
template <typename T, typename U>
using IsDirectInitializationValid = std::disjunction<
// Short circuits if T is basically U.
std::is_same<T, std::remove_cv_t<std::remove_reference_t<U>>>,
std::negation<
std::disjunction<std::is_same<starrocks::StatusOr<T>, std::remove_cv_t<std::remove_reference_t<U>>>,
std::is_same<starrocks::Status, std::remove_cv_t<std::remove_reference_t<U>>>,
std::is_same<std::in_place_t, std::remove_cv_t<std::remove_reference_t<U>>>,
IsDirectInitializationAmbiguous<T, U>>>>;
// This trait detects whether `StatusOr<T>::operator=(U&&)` is ambiguous, which
// is equivalent to whether all the following conditions are met:
// 1. `U` is `StatusOr<V>`.
// 2. `T` is constructible and assignable from `V`.
// 3. `T` is constructible and assignable from `U` (i.e. `StatusOr<V>`).
// For example, the following code is considered ambiguous:
// (`T` is `bool`, `U` is `StatusOr<bool>`, `V` is `bool`)
// StatusOr<bool> s1 = true; // s1.ok() && *s1 == true
// StatusOr<bool> s2 = false; // s2.ok() && *s2 == false
// s1 = s2; // ambiguous, `s1 = *s2` or `s1 = bool(s2)`?
template <typename T, typename U>
struct IsForwardingAssignmentAmbiguous
: public std::conditional_t<std::is_same<std::remove_cv_t<std::remove_reference_t<U>>, U>::value,
std::false_type,
IsForwardingAssignmentAmbiguous<T, std::remove_cv_t<std::remove_reference_t<U>>>> {
};
template <typename T, typename U>
struct IsForwardingAssignmentAmbiguous<T, starrocks::StatusOr<U>>
: public IsConstructibleOrConvertibleOrAssignableFromStatusOr<T, U> {};
// Checks against the constraints of the forwarding assignment, i.e. whether
// `StatusOr<T>::operator=(U&&)` should participate in overload resolution.
template <typename T, typename U>
using IsForwardingAssignmentValid = std::disjunction<
// Short circuits if T is basically U.
std::is_same<T, std::remove_cv_t<std::remove_reference_t<U>>>,
std::negation<
std::disjunction<std::is_same<starrocks::StatusOr<T>, std::remove_cv_t<std::remove_reference_t<U>>>,
std::is_same<starrocks::Status, std::remove_cv_t<std::remove_reference_t<U>>>,
std::is_same<std::in_place_t, std::remove_cv_t<std::remove_reference_t<U>>>,
IsForwardingAssignmentAmbiguous<T, U>>>>;
class Helper {
public:
// Move type-agnostic error handling to the .cc.
static void HandleInvalidStatusCtorArg(Status*);
[[noreturn]] static void Crash(const starrocks::Status& status);
};
// Construct an instance of T in `p` through placement new, passing Args... to
// the constructor.
// This abstraction is here mostly for the gcc performance fix.
template <typename T, typename... Args>
void PlacementNew(void* p, Args&&... args) {
new (p) T(std::forward<Args>(args)...);
}
// Helper base class to hold the data and all operations.
// We move all this to a base class to allow mixing with the appropriate
// TraitsBase specialization.
template <typename T>
class StatusOrData {
template <typename U>
friend class StatusOrData;
public:
StatusOrData() = delete;
StatusOrData(const StatusOrData& other) {
if (other.ok()) {
MakeValue(other.data_);
MakeStatus();
} else {
MakeStatus(other.status_);
}
}
StatusOrData(StatusOrData&& other) noexcept {
if (other.ok()) {
MakeValue(std::move(other.data_));
MakeStatus();
} else {
MakeStatus(std::move(other.status_));
}
}
template <typename U>
explicit StatusOrData(const StatusOrData<U>& other) {
if (other.ok()) {
MakeValue(other.data_);
MakeStatus();
} else {
MakeStatus(other.status_);
}
}
template <typename U>
explicit StatusOrData(StatusOrData<U>&& other) {
if (other.ok()) {
MakeValue(std::move(other.data_));
MakeStatus();
} else {
MakeStatus(std::move(other.status_));
}
}
template <typename... Args>
explicit StatusOrData(std::in_place_t, Args&&... args) : data_(std::forward<Args>(args)...) {
MakeStatus();
}
explicit StatusOrData(const T& value) : data_(value) { MakeStatus(); }
explicit StatusOrData(T&& value) : data_(std::move(value)) { MakeStatus(); }
template <typename U, std::enable_if_t<std::is_constructible<starrocks::Status, U&&>::value, int> = 0>
explicit StatusOrData(U&& v) : status_(std::forward<U>(v)) {
EnsureNotOk();
}
StatusOrData& operator=(const StatusOrData& other) {
if (this == &other) return *this;
if (other.ok())
Assign(other.data_);
else
AssignStatus(other.status_);
return *this;
}
StatusOrData& operator=(StatusOrData&& other) {
if (this == &other) return *this;
if (other.ok())
Assign(std::move(other.data_));
else
AssignStatus(std::move(other.status_));
return *this;
}
~StatusOrData() {
if (ok()) {
status_.~Status();
data_.~T();
} else {
status_.~Status();
}
}
template <typename U>
void Assign(U&& value) {
if (ok()) {
data_ = std::forward<U>(value);
} else {
MakeValue(std::forward<U>(value));
status_ = Status::OK();
}
}
template <typename U>
void AssignStatus(U&& v) {
Clear();
status_ = static_cast<starrocks::Status>(std::forward<U>(v));
EnsureNotOk();
}
bool ok() const { return status_.ok(); }
protected:
// status_ will always be active after the constructor.
// We make it a union to be able to initialize exactly how we need without
// waste.
// Eg. in the copy constructor we use the default constructor of Status in
// the ok() path to avoid an extra Ref call.
union {
Status status_;
};
// data_ is active iff status_.ok()==true
struct Dummy {};
union {
// When T is const, we need some non-const object we can cast to void* for
// the placement new. dummy_ is that object.
Dummy dummy_;
T data_;
};
void Clear() {
if (ok()) data_.~T();
}
void EnsureOk() const {
if (!ok()) Helper::Crash(status_);
}
void EnsureNotOk() {
if (ok()) Helper::HandleInvalidStatusCtorArg(&status_);
}
// Construct the value (ie. data_) through placement new with the passed
// argument.
template <typename... Arg>
void MakeValue(Arg&&... arg) {
internal_statusor::PlacementNew<T>(&dummy_, std::forward<Arg>(arg)...);
}
// Construct the status (ie. status_) through placement new with the passed
// argument.
template <typename... Args>
void MakeStatus(Args&&... args) {
internal_statusor::PlacementNew<Status>(&status_, std::forward<Args>(args)...);
}
};
// Helper base classes to allow implicitly deleted constructors and assignment
// operators in `StatusOr`. For example, `CopyCtorBase` will explicitly delete
// the copy constructor when T is not copy constructible and `StatusOr` will
// inherit that behavior implicitly.
template <typename T, bool = std::is_copy_constructible<T>::value>
struct CopyCtorBase {
CopyCtorBase() = default;
CopyCtorBase(const CopyCtorBase&) = default;
CopyCtorBase(CopyCtorBase&&) = default;
CopyCtorBase& operator=(const CopyCtorBase&) = default;
CopyCtorBase& operator=(CopyCtorBase&&) = default;
};
template <typename T>
struct CopyCtorBase<T, false> {
CopyCtorBase() = default;
CopyCtorBase(const CopyCtorBase&) = delete;
CopyCtorBase(CopyCtorBase&&) = default;
CopyCtorBase& operator=(const CopyCtorBase&) = default;
CopyCtorBase& operator=(CopyCtorBase&&) = default;
};
template <typename T, bool = std::is_move_constructible<T>::value>
struct MoveCtorBase {
MoveCtorBase() = default;
MoveCtorBase(const MoveCtorBase&) = default;
MoveCtorBase(MoveCtorBase&&) = default;
MoveCtorBase& operator=(const MoveCtorBase&) = default;
MoveCtorBase& operator=(MoveCtorBase&&) = default;
};
template <typename T>
struct MoveCtorBase<T, false> {
MoveCtorBase() = default;
MoveCtorBase(const MoveCtorBase&) = default;
MoveCtorBase(MoveCtorBase&&) = delete;
MoveCtorBase& operator=(const MoveCtorBase&) = default;
MoveCtorBase& operator=(MoveCtorBase&&) = default;
};
template <typename T, bool = std::is_copy_constructible<T>::value&& std::is_copy_assignable<T>::value>
struct CopyAssignBase {
CopyAssignBase() = default;
CopyAssignBase(const CopyAssignBase&) = default;
CopyAssignBase(CopyAssignBase&&) = default;
CopyAssignBase& operator=(const CopyAssignBase&) = default;
CopyAssignBase& operator=(CopyAssignBase&&) = default;
};
template <typename T>
struct CopyAssignBase<T, false> {
CopyAssignBase() = default;
CopyAssignBase(const CopyAssignBase&) = default;
CopyAssignBase(CopyAssignBase&&) = default;
CopyAssignBase& operator=(const CopyAssignBase&) = delete;
CopyAssignBase& operator=(CopyAssignBase&&) = default;
};
template <typename T, bool = std::is_move_constructible<T>::value&& std::is_move_assignable<T>::value>
struct MoveAssignBase {
MoveAssignBase() = default;
MoveAssignBase(const MoveAssignBase&) = default;
MoveAssignBase(MoveAssignBase&&) = default;
MoveAssignBase& operator=(const MoveAssignBase&) = default;
MoveAssignBase& operator=(MoveAssignBase&&) = default;
};
template <typename T>
struct MoveAssignBase<T, false> {
MoveAssignBase() = default;
MoveAssignBase(const MoveAssignBase&) = default;
MoveAssignBase(MoveAssignBase&&) = default;
MoveAssignBase& operator=(const MoveAssignBase&) = default;
MoveAssignBase& operator=(MoveAssignBase&&) = delete;
};
[[noreturn]] void ThrowBadStatusOrAccess(starrocks::Status status);
} // namespace internal_statusor
} // namespace starrocks
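A standalone sketch (not in the commit) of the idiom the CopyCtorBase/MoveCtorBase bases implement: deleting a special member in an empty base makes the derived class's defaulted member implicitly deleted, so StatusOr<T> mirrors T's copyability without SFINAE on StatusOr itself.

#include <memory>
#include <type_traits>
#include "common/statusor_internal.h"

namespace {
template <typename T>
struct Holder : private starrocks::internal_statusor::CopyCtorBase<T> {
    Holder() = default;
    Holder(const Holder&) = default; // implicitly deleted when T is not copyable
};

static_assert(std::is_copy_constructible<Holder<int>>::value,
              "int is copyable, so Holder<int> keeps its copy constructor");
static_assert(!std::is_copy_constructible<Holder<std::unique_ptr<int>>>::value,
              "unique_ptr is move-only, so the base deletes the copy constructor");
} // namespace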

99
be/src/common/type_list.h Normal file
View File

@ -0,0 +1,99 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include <type_traits>
namespace starrocks::vectorized {
// A typelist is a type that represents a list of types and can be manipulated by a
// template metaprogram. It provides the operations typically associated with a list:
// iterating over the elements (types) in the list, adding elements, or removing elements.
// However, typelists differ from most run-time data structures, such as std::list,
// in that they don't allow mutation.
//
// reference: <<C++ Templates The Complete Guide (2nd edition)>>
template <typename... Elements>
class TypeList {};
/// IsEmpty
template <typename List>
class IsEmpty {
public:
static constexpr bool value = false;
};
template <>
class IsEmpty<TypeList<>> {
public:
static constexpr bool value = true;
};
/// Front
template <typename List>
class FrontT;
template <typename Head, typename... Tail>
class FrontT<TypeList<Head, Tail...>> {
public:
using Type = Head;
};
template <typename List>
using Front = typename FrontT<List>::Type;
/// PopFront
template <typename List>
class PopFrontT;
template <typename Head, typename... Tail>
class PopFrontT<TypeList<Head, Tail...>> {
public:
using Type = TypeList<Tail...>;
};
template <typename List>
using PopFront = typename PopFrontT<List>::Type;
/// PushFront
template <typename List, typename NewElement>
class PushFrontT;
template <typename... Elements, typename NewElement>
class PushFrontT<TypeList<Elements...>, NewElement> {
public:
using Type = TypeList<NewElement, Elements...>;
};
template <typename List, typename NewElement>
using PushFront = typename PushFrontT<List, NewElement>::Type;
/// PushBack
template <typename List, typename NewElement>
class PushBackT;
template <typename... Elements, typename NewElement>
class PushBackT<TypeList<Elements...>, NewElement> {
public:
using Type = TypeList<Elements..., NewElement>;
};
template <typename List, typename NewElement>
using PushBack = typename PushBackT<List, NewElement>::Type;
/// InList
template <typename Element, typename List>
class InList : public std::bool_constant<
std::disjunction<std::is_same<Front<List>, Element>, InList<Element, PopFront<List>>>::value> {};
template <typename Element>
class InList<Element, TypeList<>> : public std::false_type {};
/// ForEach
template <typename List, typename Func>
void ForEach(Func&& func) {
if constexpr (!IsEmpty<List>::value) {
func.template operator()<Front<List>>();
ForEach<PopFront<List>>(std::forward<Func>(func));
}
}
} // namespace starrocks::vectorized
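A usage sketch (not in the commit) exercising the TypeList operations defined above, both at compile time and through ForEach at run time.

#include <cstdint>
#include <iostream>
#include <type_traits>
#include "common/type_list.h"

namespace starrocks::vectorized {
using Numbers = TypeList<int8_t, int32_t, int64_t>;

// Compile-time checks of the list operations.
static_assert(std::is_same<Front<Numbers>, int8_t>::value, "");
static_assert(std::is_same<PushBack<Numbers, double>, TypeList<int8_t, int32_t, int64_t, double>>::value, "");
static_assert(InList<int64_t, Numbers>::value, "");
static_assert(!InList<float, Numbers>::value, "");

// Run-time iteration: ForEach invokes the functor once per element type.
struct PrintSize {
    template <typename T>
    void operator()() const {
        std::cout << sizeof(T) << '\n';
    }
};

inline void print_sizes() {
    ForEach<Numbers>(PrintSize()); // prints 1, 4, 8
}
} // namespace starrocks::vectorized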

56
be/src/common/utils.h Normal file
View File

@ -0,0 +1,56 @@
// This file is made available under Elastic License 2.0.
// This file is based on code available under the Apache license here:
// https://github.com/apache/incubator-doris/blob/master/be/src/common/utils.h
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#pragma once
#include <cstdint>
#include <string>
namespace starrocks {
struct AuthInfo {
std::string user;
std::string passwd;
std::string cluster;
std::string user_ip;
// -1 as unset
int64_t auth_code = -1;
};
template <class T>
void set_request_auth(T* req, const AuthInfo& auth) {
if (auth.auth_code != -1) {
// if auth_code is set, no need to set other info
req->__set_auth_code(auth.auth_code);
// user name and passwd are unused, but they are required fields,
// so they have to be set.
req->user = "";
req->passwd = "";
} else {
req->user = auth.user;
req->passwd = auth.passwd;
if (!auth.cluster.empty()) {
req->__set_cluster(auth.cluster);
}
req->__set_user_ip(auth.user_ip);
}
}
} // namespace starrocks
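A sketch (not in the commit) of how a caller would fill AuthInfo and hand it to set_request_auth; the request type is left as a template parameter because the Thrift request structs are not part of this excerpt.

#include "common/utils.h"

namespace starrocks {
// Works for any Thrift request exposing user, passwd, __set_auth_code,
// __set_cluster and __set_user_ip, as set_request_auth expects.
template <class TRequest>
void fill_auth_example(TRequest* req) {
    AuthInfo auth;
    auth.user = "root";
    auth.passwd = "";
    auth.user_ip = "127.0.0.1";
    // auth.auth_code stays -1, so the user/passwd branch is taken.
    set_request_auth(req, auth);
}
} // namespace starrocks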

44
be/src/env/CMakeLists.txt vendored Normal file
View File

@ -0,0 +1,44 @@
# This file is made available under Elastic License 2.0.
# This file is based on code available under the Apache license here:
# https://github.com/apache/incubator-doris/blob/master/be/src/env/CMakeLists.txt
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
# where to put generated libraries
set(LIBRARY_OUTPUT_PATH "${BUILD_DIR}/src/env")
# where to put generated binaries
set(EXECUTABLE_OUTPUT_PATH "${BUILD_DIR}/src/env")
set(EXEC_FILES
compressed_file.cpp
env_posix.cpp
env_util.cpp
env_stream_pipe.cpp
env_broker.cpp
env_memory.cpp)
if (WITH_HDFS)
set(EXEC_FILES ${EXEC_FILES}
env_hdfs.cpp
)
endif()
add_library(Env STATIC
${EXEC_FILES}
)

61
be/src/env/compressed_file.cpp vendored Normal file
View File

@ -0,0 +1,61 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#include "env/compressed_file.h"
#include "exec/decompressor.h"
namespace starrocks {
Status CompressedSequentialFile::read(Slice* result) {
size_t output_len = result->size;
size_t output_bytes = 0;
result->size = 0;
while (output_bytes == 0) {
Status st = _compressed_buff.read(_input_file.get());
if (!st.ok() && !st.is_end_of_file()) {
return st;
} else if (st.is_end_of_file() && _stream_end) {
return Status::OK();
}
uint8_t* output = reinterpret_cast<uint8_t*>(result->data);
Slice compressed_data = _compressed_buff.read_buffer();
size_t input_bytes_read = 0;
size_t output_bytes_written = 0;
DCHECK_GT(compressed_data.size, 0);
RETURN_IF_ERROR(_decompressor->decompress((uint8_t*)compressed_data.data, compressed_data.size,
&input_bytes_read, output, output_len, &output_bytes_written,
&_stream_end));
if (UNLIKELY(output_bytes_written == 0 && input_bytes_read == 0 && st.is_end_of_file())) {
return Status::InternalError(strings::Substitute("Failed to decompress. input_len:$0, output_len:$1",
compressed_data.size, output_len));
}
_compressed_buff.skip(input_bytes_read);
output_bytes += output_bytes_written;
}
result->size = output_bytes;
return Status::OK();
}
Status CompressedSequentialFile::skip(uint64_t n) {
raw::RawVector<uint8_t> buff;
buff.resize(n);
while (n > 0) {
Slice s(buff.data(), n);
Status st = read(&s);
if (st.ok()) {
n -= s.size;
} else if (st.is_end_of_file()) {
return Status::OK();
} else {
return st;
}
}
return Status::OK();
}
} // namespace starrocks
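A minimal read-to-string sketch built on the class above. The raw input file and the decompressor are assumed to have been created elsewhere (for example via Env and the Decompressor factory in exec/decompressor.h); read() returning a zero-sized slice signals end of stream.
#include <memory>
#include <string>
#include "env/compressed_file.h"
namespace starrocks {
Status read_all_decompressed(std::shared_ptr<SequentialFile> raw_file,
                             std::shared_ptr<Decompressor> decompressor, std::string* out) {
    CompressedSequentialFile file(std::move(raw_file), std::move(decompressor));
    char buff[64 * 1024];
    while (true) {
        Slice chunk(buff, sizeof(buff));
        RETURN_IF_ERROR(file.read(&chunk));  // read() shrinks chunk.size to the bytes produced
        if (chunk.size == 0) {
            break;  // decompressor drained and underlying file at EOF
        }
        out->append(chunk.data, chunk.size);
    }
    return Status::OK();
}
} // namespace starrocks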

79
be/src/env/compressed_file.h vendored Normal file
View File

@ -0,0 +1,79 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include "env/env.h"
#include "util/bit_util.h"
#include "util/raw_container.h"
namespace starrocks {
class Decompressor;
class CompressedSequentialFile final : public SequentialFile {
public:
CompressedSequentialFile(std::shared_ptr<SequentialFile> input_file, std::shared_ptr<Decompressor> decompressor,
size_t compressed_data_cache_size = 8 * 1024 * 1024LU)
: _filename("compressed-" + input_file->filename()),
_input_file(std::move(input_file)),
_decompressor(std::move(decompressor)),
_compressed_buff(BitUtil::round_up(compressed_data_cache_size, CACHELINE_SIZE)) {}
Status read(Slice* result) override;
Status skip(uint64_t n) override;
const std::string& filename() const override { return _filename; }
private:
// Used to store the compressed data read from |_input_file|.
class CompressedBuffer {
public:
explicit CompressedBuffer(size_t buff_size)
: _compressed_data(BitUtil::round_up(buff_size, CACHELINE_SIZE)), _offset(0), _limit(0) {}
inline Slice read_buffer() const { return Slice(&_compressed_data[_offset], _limit - _offset); }
inline Slice write_buffer() const { return Slice(&_compressed_data[_limit], _compressed_data.size() - _limit); }
inline void skip(size_t n) {
_offset += n;
assert(_offset <= _limit);
}
inline Status read(SequentialFile* f) {
if (_offset > 0) {
// Copy the bytes between the buffer's current offset and limit to the beginning of
// the buffer.
memmove(&_compressed_data[0], &_compressed_data[_offset], available());
_limit -= _offset;
_offset = 0;
}
if (_limit >= _compressed_data.size()) {
return Status::InternalError("reached the buffer limit");
}
Slice buff(write_buffer());
Status st = f->read(&buff);
if (st.ok()) {
if (buff.size == 0) return Status::EndOfFile("read empty from " + f->filename());
_limit += buff.size;
}
return st;
}
inline size_t available() const { return _limit - _offset; }
private:
raw::RawVector<uint8_t> _compressed_data;
size_t _offset;
size_t _limit;
};
std::string _filename;
std::shared_ptr<SequentialFile> _input_file;
std::shared_ptr<Decompressor> _decompressor;
CompressedBuffer _compressed_buff;
bool _stream_end = false;
};
} // namespace starrocks
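A simplified, self-contained model of CompressedBuffer's offset/limit bookkeeping (a plain std::vector instead of RawVector and a std::string standing in for the SequentialFile), illustrating the compact-then-append behavior of read() above:
#include <algorithm>
#include <cassert>
#include <cstring>
#include <string>
#include <vector>
struct ToyBuffer {
    std::vector<char> data;
    size_t offset = 0;  // first unconsumed byte
    size_t limit = 0;   // one past the last valid byte
    explicit ToyBuffer(size_t cap) : data(cap) {}
    size_t available() const { return limit - offset; }
    void skip(size_t n) {
        offset += n;
        assert(offset <= limit);
    }
    // Mirrors CompressedBuffer::read(): move the unconsumed bytes to the front,
    // then append fresh input into the free tail.
    size_t fill(const std::string& input, size_t pos) {
        if (offset > 0) {
            memmove(data.data(), data.data() + offset, available());
            limit -= offset;
            offset = 0;
        }
        size_t n = std::min(input.size() - pos, data.size() - limit);
        memcpy(data.data() + limit, input.data() + pos, n);
        limit += n;
        return n;
    }
};
int main() {
    ToyBuffer buf(8);
    std::string src = "abcdefghij";
    size_t pos = 0;
    pos += buf.fill(src, pos);  // buffer holds "abcdefgh"
    buf.skip(5);                // consume "abcde", "fgh" remains
    pos += buf.fill(src, pos);  // compacts "fgh" to the front, appends "ij"
    assert(buf.available() == 5);
    return 0;
}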

338
be/src/env/env.h vendored Normal file
View File

@ -0,0 +1,338 @@
// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).
//
// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license.
// (https://developers.google.com/open-source/licenses/bsd)
#pragma once
#include <memory>
#include <string>
#include "common/status.h"
#include "util/slice.h"
namespace starrocks {
class RandomAccessFile;
class RandomRWFile;
class WritableFile;
class SequentialFile;
struct WritableFileOptions;
struct RandomAccessFileOptions;
struct RandomRWFileOptions;
class Env {
public:
// Governs if/how the file is created.
//
// enum value | file exists | file does not exist
// -----------------------------+-------------------+--------------------
// CREATE_OR_OPEN_WITH_TRUNCATE | opens + truncates | creates
// CREATE_OR_OPEN | opens | creates
// MUST_CREATE | fails | creates
// MUST_EXIST | opens | fails
enum OpenMode { CREATE_OR_OPEN_WITH_TRUNCATE, CREATE_OR_OPEN, MUST_CREATE, MUST_EXIST };
Env() = default;
virtual ~Env() = default;
// Return a default environment suitable for the current operating
// system. Sophisticated users may wish to provide their own Env
// implementation instead of relying on this default environment.
static Env* Default();
// Create a brand new sequentially-readable file with the specified name.
// On success, stores a pointer to the new file in *result and returns OK.
// On failure stores NULL in *result and returns non-OK. If the file does
// not exist, returns a non-OK status.
//
// The returned file will only be accessed by one thread at a time.
virtual Status new_sequential_file(const std::string& fname, std::unique_ptr<SequentialFile>* result) = 0;
// Create a brand new random access read-only file with the
// specified name. On success, stores a pointer to the new file in
// *result and returns OK. On failure stores nullptr in *result and
// returns non-OK. If the file does not exist, returns a non-OK
// status.
//
// The returned file may be concurrently accessed by multiple threads.
virtual Status new_random_access_file(const std::string& fname, std::unique_ptr<RandomAccessFile>* result) = 0;
virtual Status new_random_access_file(const RandomAccessFileOptions& opts, const std::string& fname,
std::unique_ptr<RandomAccessFile>* result) = 0;
// Create an object that writes to a new file with the specified
// name. Deletes any existing file with the same name and creates a
// new file. On success, stores a pointer to the new file in
// *result and returns OK. On failure stores NULL in *result and
// returns non-OK.
//
// The returned file will only be accessed by one thread at a time.
virtual Status new_writable_file(const std::string& fname, std::unique_ptr<WritableFile>* result) = 0;
// Like the previous new_writable_file, but allows options to be
// specified.
virtual Status new_writable_file(const WritableFileOptions& opts, const std::string& fname,
std::unique_ptr<WritableFile>* result) = 0;
// Creates a new readable and writable file. If a file with the same name
// already exists on disk, it is deleted.
//
// Some of the methods of the new file may be accessed concurrently,
// while others are only safe for access by one thread at a time.
virtual Status new_random_rw_file(const std::string& fname, std::unique_ptr<RandomRWFile>* result) = 0;
// Like the previous new_random_rw_file, but allows options to be specified.
virtual Status new_random_rw_file(const RandomRWFileOptions& opts, const std::string& fname,
std::unique_ptr<RandomRWFile>* result) = 0;
// Returns OK if the path exists.
// NotFound if the named file does not exist,
// the calling process does not have permission to determine
// whether this file exists, or if the path is invalid.
// IOError if an IO Error was encountered
virtual Status path_exists(const std::string& fname) = 0;
// Store in *result the names of the children of the specified directory.
// The names are relative to "dir".
// Original contents of *results are dropped.
// Returns OK if "dir" exists and "*result" contains its children.
// NotFound if "dir" does not exist, the calling process does not have
// permission to access "dir", or if "dir" is invalid.
// IOError if an IO Error was encountered
virtual Status get_children(const std::string& dir, std::vector<std::string>* result) = 0;
// Iterate the specified directory and call given callback function with child's
// name. This function continues execution until all children have been iterated
// or callback function return false.
// The names are relative to "dir".
//
// The extra cost of the per-entry callback is acceptable: compared with returning all children
// in a given vector, this method is about 5% slower. However, this approach is more
// flexible and efficient in fulfilling other requirements.
//
// Returns OK if "dir" exists.
// NotFound if "dir" does not exist, the calling process does not have
// permission to access "dir", or if "dir" is invalid.
// IOError if an IO Error was encountered
virtual Status iterate_dir(const std::string& dir, const std::function<bool(const char*)>& cb) = 0;
// Delete the named file.
virtual Status delete_file(const std::string& fname) = 0;
// Create the specified directory.
// NOTE: It will return an error if the path already exists (not necessarily as a directory).
virtual Status create_dir(const std::string& dirname) = 0;
// Creates the directory if it is missing.
// Returns OK if it already exists or was successfully created.
virtual Status create_dir_if_missing(const std::string& dirname, bool* created = nullptr) = 0;
// Delete the specified directory.
// NOTE: The dir must be empty.
virtual Status delete_dir(const std::string& dirname) = 0;
// Synchronize the entry for a specific directory.
virtual Status sync_dir(const std::string& dirname) = 0;
// Checks if the file is a directory. Returns an error if it doesn't
// exist, otherwise writes true or false into 'is_dir' appropriately.
virtual Status is_directory(const std::string& path, bool* is_dir) = 0;
// Canonicalize 'path' by applying the following conversions:
// - Converts a relative path into an absolute one using the cwd.
// - Converts '.' and '..' references.
// - Resolves all symbolic links.
//
// All directory entries in 'path' must exist on the filesystem.
virtual Status canonicalize(const std::string& path, std::string* result) = 0;
virtual Status get_file_size(const std::string& fname, uint64_t* size) = 0;
// Store the last modification time of fname in *file_mtime.
virtual Status get_file_modified_time(const std::string& fname, uint64_t* file_mtime) = 0;
// Rename file src to target.
virtual Status rename_file(const std::string& src, const std::string& target) = 0;
// create a hard-link
virtual Status link_file(const std::string& /*old_path*/, const std::string& /*new_path*/) = 0;
};
struct RandomAccessFileOptions {
RandomAccessFileOptions() {}
};
// Creation-time options for WritableFile
struct WritableFileOptions {
// Call Sync() during Close().
bool sync_on_close = false;
// See OpenMode for details.
Env::OpenMode mode = Env::CREATE_OR_OPEN_WITH_TRUNCATE;
};
// Creation-time options for RWFile
struct RandomRWFileOptions {
// Call Sync() during Close().
bool sync_on_close = false;
// See OpenMode for details.
Env::OpenMode mode = Env::CREATE_OR_OPEN_WITH_TRUNCATE;
};
// A file abstraction for reading sequentially through a file
class SequentialFile {
public:
SequentialFile() = default;
virtual ~SequentialFile() = default;
// Read up to "result.size" bytes from the file.
// Copies the resulting data into "result.data".
//
// If an error was encountered, returns a non-OK status
// and the contents of "result" are invalid.
//
// Return OK if reached the end of file.
//
// REQUIRES: External synchronization
virtual Status read(Slice* result) = 0;
// Skip "n" bytes from the file. This is guaranteed to be no
// slower that reading the same data, but may be faster.
//
// If end of file is reached, skipping will stop at the end of the
// file, and Skip will return OK.
//
// REQUIRES: External synchronization
virtual Status skip(uint64_t n) = 0;
// Returns the filename provided when the SequentialFile was constructed.
virtual const std::string& filename() const = 0;
};
class RandomAccessFile {
public:
RandomAccessFile() = default;
virtual ~RandomAccessFile() = default;
// Read up to "result.size" bytes from the file.
// Copies the resulting data into "result.data".
//
// Return OK if reached the end of file.
virtual Status read(uint64_t offset, Slice* res) const = 0;
// Read "result.size" bytes from the file starting at "offset".
// Copies the resulting data into "result.data".
//
// If an error was encountered, returns a non-OK status.
//
// This method will internally retry on EINTR and "short reads" in order to
// fully read the requested number of bytes. In the event that it is not
// possible to read exactly 'length' bytes, an IOError is returned.
//
// Safe for concurrent use by multiple threads.
virtual Status read_at(uint64_t offset, const Slice& result) const = 0;
// Reads up to the "results" aggregate size, based on each Slice's "size",
// from the file starting at 'offset'. The Slices must point to already-allocated
// buffers for the data to be written to.
//
// If an error was encountered, returns a non-OK status.
//
// This method will internally retry on EINTR and "short reads" in order to
// fully read the requested number of bytes. In the event that it is not
// possible to read exactly 'length' bytes, an IOError is returned.
//
// Safe for concurrent use by multiple threads.
virtual Status readv_at(uint64_t offset, const Slice* res, size_t res_cnt) const = 0;
// Return the size of this file
virtual Status size(uint64_t* size) const = 0;
// Return name of this file
virtual const std::string& file_name() const = 0;
};
// A file abstraction for sequential writing. The implementation
// must provide buffering since callers may append small fragments
// at a time to the file.
// Note: To avoid user misuse, WritableFile's API should support only
// one of Append or PositionedAppend. We support only Append here.
class WritableFile {
public:
enum FlushMode { FLUSH_SYNC, FLUSH_ASYNC };
WritableFile() = default;
virtual ~WritableFile() = default;
// Append data to the end of the file
virtual Status append(const Slice& data) = 0;
// If possible, uses scatter-gather I/O to efficiently append
// multiple buffers to a file. Otherwise, falls back to regular I/O.
//
// For implementation specific quirks and details, see comments in
// implementation source code (e.g., env_posix.cc)
virtual Status appendv(const Slice* data, size_t cnt) = 0;
// Pre-allocates 'size' bytes for the file in the underlying filesystem.
// size bytes are added to the current pre-allocated size or to the current
// offset, whichever is bigger. In no case is the file truncated by this
// operation.
//
// On some implementations, preallocation is done without initializing the
// contents of the data blocks (as opposed to writing zeroes), requiring no
// IO to the data blocks.
virtual Status pre_allocate(uint64_t size) = 0;
virtual Status close() = 0;
// Flush all dirty data (not metadata) to disk.
//
// If the flush mode is synchronous, will wait for flush to finish and
// return a meaningful status.
virtual Status flush(FlushMode mode) = 0;
virtual Status sync() = 0;
virtual uint64_t size() const = 0;
// Returns the filename provided when the WritableFile was constructed.
virtual const std::string& filename() const = 0;
private:
// No copying allowed
WritableFile(const WritableFile&);
void operator=(const WritableFile&);
};
// A file abstraction for random reading and writing.
class RandomRWFile {
public:
enum FlushMode { FLUSH_SYNC, FLUSH_ASYNC };
RandomRWFile() = default;
virtual ~RandomRWFile() = default;
virtual Status read_at(uint64_t offset, const Slice& result) const = 0;
virtual Status readv_at(uint64_t offset, const Slice* res, size_t res_cnt) const = 0;
virtual Status write_at(uint64_t offset, const Slice& data) = 0;
virtual Status writev_at(uint64_t offset, const Slice* data, size_t data_cnt) = 0;
virtual Status flush(FlushMode mode, uint64_t offset, size_t length) = 0;
virtual Status sync() = 0;
virtual Status close() = 0;
virtual Status size(uint64_t* size) const = 0;
virtual const std::string& filename() const = 0;
};
} // namespace starrocks
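A minimal round-trip sketch over the interfaces declared above: create a file through the default (POSIX) Env, append to it, then read it back sequentially. The path is a placeholder supplied by the caller.
#include <memory>
#include <string>
#include "common/status.h"
#include "env/env.h"
namespace starrocks {
Status env_round_trip(const std::string& path) {
    Env* env = Env::Default();
    std::unique_ptr<WritableFile> wf;
    WritableFileOptions wopts;
    wopts.mode = Env::CREATE_OR_OPEN_WITH_TRUNCATE;
    RETURN_IF_ERROR(env->new_writable_file(wopts, path, &wf));
    RETURN_IF_ERROR(wf->append(Slice("hello env")));
    RETURN_IF_ERROR(wf->close());
    std::unique_ptr<SequentialFile> rf;
    RETURN_IF_ERROR(env->new_sequential_file(path, &rf));
    char buff[32];
    Slice chunk(buff, sizeof(buff));
    RETURN_IF_ERROR(rf->read(&chunk));  // chunk.size now holds the number of bytes read
    return chunk.size == 9 ? Status::OK() : Status::InternalError("unexpected read size");
}
} // namespace starrocks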

538
be/src/env/env_broker.cpp vendored Normal file
View File

@ -0,0 +1,538 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#include "env/env_broker.h"
#include <brpc/uri.h>
#include <chrono>
#include <memory>
#include <string>
#include <thread>
#include "env/env.h"
#include "gen_cpp/FileBrokerService_types.h"
#include "gen_cpp/TFileBrokerService.h"
#include "gutil/strings/substitute.h"
#include "runtime/broker_mgr.h"
#include "runtime/client_cache.h"
#include "runtime/exec_env.h"
#include "util/coding.h"
namespace starrocks {
using BrokerServiceClient = TFileBrokerServiceClient;
#ifdef BE_TEST
namespace {
TFileBrokerServiceClient* g_broker_client = nullptr;
}
void EnvBroker::TEST_set_broker_client(TFileBrokerServiceClient* client) {
g_broker_client = client;
}
const std::string& get_client_id(const TNetworkAddress& /*broker_addr*/) {
static std::string s_client_id("test_client_id");
return s_client_id;
}
#else
const std::string& get_client_id(const TNetworkAddress& broker_addr) {
return ExecEnv::GetInstance()->broker_mgr()->get_client_id(broker_addr);
}
#endif
inline BrokerServiceClientCache* client_cache() {
return ExecEnv::GetInstance()->broker_client_cache();
}
static Status to_status(const TBrokerOperationStatus& st) {
switch (st.statusCode) {
case TBrokerOperationStatusCode::OK:
return Status::OK();
case TBrokerOperationStatusCode::END_OF_FILE:
return Status::EndOfFile(st.message);
case TBrokerOperationStatusCode::NOT_AUTHORIZED:
return Status::IOError("No broker permission, " + st.message);
case TBrokerOperationStatusCode::DUPLICATE_REQUEST:
return Status::InternalError("Duplicate broker request, " + st.message);
case TBrokerOperationStatusCode::INVALID_INPUT_OFFSET:
return Status::InvalidArgument("Invalid broker offset, " + st.message);
case TBrokerOperationStatusCode::INVALID_ARGUMENT:
return Status::InvalidArgument("Invalid broker argument, " + st.message);
case TBrokerOperationStatusCode::INVALID_INPUT_FILE_PATH:
return Status::NotFound("Invalid broker file path, " + st.message);
case TBrokerOperationStatusCode::FILE_NOT_FOUND:
return Status::NotFound("Broker file not found, " + st.message);
case TBrokerOperationStatusCode::TARGET_STORAGE_SERVICE_ERROR:
return Status::InternalError("Broker storage service error, " + st.message);
case TBrokerOperationStatusCode::OPERATION_NOT_SUPPORTED:
return Status::NotSupported("Broker operation not supported, " + st.message);
}
return Status::InternalError("Unknown broker error, " + st.message);
}
template <typename Method, typename Request, typename Response>
static Status call_method(const TNetworkAddress& broker, Method method, const Request& request, Response* response,
int retry_count = 1, int timeout_ms = DEFAULT_TIMEOUT_MS) {
Status status;
TFileBrokerServiceClient* client;
#ifndef BE_TEST
BrokerServiceConnection conn(client_cache(), broker, timeout_ms, &status);
if (!status.ok()) {
LOG(WARNING) << "Fail to get broker client: " << status;
return status;
}
client = conn.get();
#else
client = g_broker_client;
#endif
while (true) {
try {
(client->*method)(*response, request);
return Status::OK();
} catch (apache::thrift::transport::TTransportException& e) {
#ifndef BE_TEST
RETURN_IF_ERROR(conn.reopen());
client = conn.get();
#endif
if (retry_count-- > 0) {
std::this_thread::sleep_for(std::chrono::seconds(1));
} else {
return Status::ThriftRpcError(e.what());
}
} catch (apache::thrift::TException& e) {
return Status::ThriftRpcError(e.what());
}
}
}
// This function will *NOT* return EOF status.
static Status broker_pread(void* buff, const TNetworkAddress& broker, const TBrokerFD& fd, int64_t offset,
int64_t* length) {
int64_t bytes_read = 0;
while (bytes_read < *length) {
TBrokerPReadRequest request;
TBrokerReadResponse response;
request.__set_version(TBrokerVersion::VERSION_ONE);
request.__set_fd(fd);
request.__set_offset(offset + bytes_read);
request.__set_length(*length - bytes_read);
RETURN_IF_ERROR(call_method(broker, &BrokerServiceClient::pread, request, &response));
if (response.opStatus.statusCode == TBrokerOperationStatusCode::END_OF_FILE) {
break;
} else if (response.opStatus.statusCode != TBrokerOperationStatusCode::OK) {
return to_status(response.opStatus);
} else if (response.data.empty()) {
break;
}
memcpy((char*)buff + bytes_read, response.data.data(), response.data.size());
bytes_read += static_cast<int64_t>(response.data.size());
}
*length = bytes_read;
return Status::OK();
}
static void broker_close_reader(const TNetworkAddress& broker, const TBrokerFD& fd) {
TBrokerCloseReaderRequest request;
TBrokerOperationStatus response;
request.__set_version(TBrokerVersion::VERSION_ONE);
request.__set_fd(fd);
Status st = call_method(broker, &BrokerServiceClient::closeReader, request, &response);
LOG_IF(WARNING, !st.ok()) << "Fail to close broker reader, " << st.to_string();
}
static Status broker_close_writer(const TNetworkAddress& broker, const TBrokerFD& fd, int timeout_ms) {
TBrokerCloseWriterRequest request;
TBrokerOperationStatus response;
request.__set_version(TBrokerVersion::VERSION_ONE);
request.__set_fd(fd);
Status st = call_method(broker, &BrokerServiceClient::closeWriter, request, &response, 1, timeout_ms);
if (!st.ok()) {
LOG(WARNING) << "Fail to close broker writer: " << st;
return st;
}
if (response.statusCode != TBrokerOperationStatusCode::OK) {
LOG(WARNING) << "Fail to close broker writer: " << response.message;
return to_status(response);
}
return Status::OK();
}
class BrokerRandomAccessFile : public RandomAccessFile {
public:
BrokerRandomAccessFile(const TNetworkAddress& broker, std::string path, const TBrokerFD& fd, int64_t size)
: _broker(broker), _path(std::move(path)), _fd(fd), _size(size) {}
~BrokerRandomAccessFile() override { broker_close_reader(_broker, _fd); }
// Return OK if reached end of file in order to be compatible with posix env.
Status read(uint64_t offset, Slice* res) const override {
int64_t length = static_cast<int64_t>(res->size);
Status st = broker_pread(res->data, _broker, _fd, static_cast<int64_t>(offset), &length);
if (st.ok()) {
res->size = length;
}
LOG_IF(WARNING, !st.ok()) << "Fail to read " << _path << ", " << st.message();
return st;
}
Status read_at(uint64_t offset, const Slice& res) const override {
int64_t length = static_cast<int64_t>(res.size);
Status st = broker_pread(res.data, _broker, _fd, static_cast<int64_t>(offset), &length);
if (!st.ok()) {
LOG(WARNING) << "Fail to read " << _path << ": " << st.message();
return st;
}
if (length < res.size) {
LOG(WARNING) << "Fail to read from " << _path << ", partial read expect=" << res.size << " real=" << length;
return Status::IOError("Partial read");
}
return Status::OK();
}
Status readv_at(uint64_t offset, const Slice* res, size_t res_cnt) const override {
for (size_t i = 0; i < res_cnt; i++) {
RETURN_IF_ERROR(read_at(offset, res[i]));
offset += res[i].size;
}
return Status::OK();
}
Status size(uint64_t* size) const override {
*size = _size;
return Status::OK();
}
const std::string& file_name() const override { return _path; }
private:
TNetworkAddress _broker;
std::string _path;
TBrokerFD _fd;
int64_t _size;
};
class BrokerSequentialFile : public SequentialFile {
public:
explicit BrokerSequentialFile(std::unique_ptr<RandomAccessFile> random_file) : _file(std::move(random_file)) {}
~BrokerSequentialFile() override = default;
Status read(Slice* result) override {
Status st = _file->read(_offset, result);
_offset += st.ok() ? result->size : 0;
return st;
}
Status skip(uint64_t n) override {
_offset += n;
return Status::OK();
}
const std::string& filename() const override { return _file->file_name(); }
private:
std::unique_ptr<RandomAccessFile> _file;
size_t _offset = 0;
};
class BrokerWritableFile : public WritableFile {
public:
BrokerWritableFile(const TNetworkAddress& broker, std::string path, const TBrokerFD& fd, size_t offset,
int timeout_ms)
: _broker(broker), _path(std::move(path)), _fd(fd), _offset(offset), _timeout_ms(timeout_ms) {}
~BrokerWritableFile() override { (void)BrokerWritableFile::close(); }
Status append(const Slice& data) override {
TBrokerPWriteRequest request;
TBrokerOperationStatus response;
request.__set_version(TBrokerVersion::VERSION_ONE);
request.__set_fd(_fd);
request.__set_offset(static_cast<int64_t>(_offset));
request.__set_data(data.to_string());
Status st = call_method(_broker, &BrokerServiceClient::pwrite, request, &response, 0, _timeout_ms);
if (!st.ok()) {
LOG(WARNING) << "Fail to append " << _path << ": " << st;
return st;
}
if (response.statusCode != TBrokerOperationStatusCode::OK) {
LOG(WARNING) << "Fail to append " << _path << ": " << response.message;
return to_status(response);
}
_offset += data.size;
return Status::OK();
}
Status appendv(const Slice* data, size_t cnt) override {
for (size_t i = 0; i < cnt; i++) {
RETURN_IF_ERROR(append(data[i]));
}
return Status::OK();
}
Status pre_allocate(uint64_t size) override { return Status::NotSupported("BrokerWritableFile::pre_allocate"); }
Status close() override {
if (_closed) {
return Status::OK();
}
Status st = broker_close_writer(_broker, _fd, _timeout_ms);
_closed = st.ok();
return st;
}
Status flush(FlushMode mode) override { return Status::OK(); }
Status sync() override {
LOG(WARNING) << "Ignored sync " << _path;
return Status::OK();
}
uint64_t size() const override { return _offset; }
const std::string& filename() const override { return _path; }
private:
TNetworkAddress _broker;
std::string _path;
TBrokerFD _fd;
size_t _offset;
bool _closed = false;
int _timeout_ms = DEFAULT_TIMEOUT_MS;
};
Status EnvBroker::new_sequential_file(const std::string& path, std::unique_ptr<SequentialFile>* file) {
std::unique_ptr<RandomAccessFile> random_file;
RETURN_IF_ERROR(new_random_access_file(path, &random_file));
*file = std::make_unique<BrokerSequentialFile>(std::move(random_file));
return Status::OK();
}
Status EnvBroker::new_random_access_file(const std::string& path, std::unique_ptr<RandomAccessFile>* file) {
return new_random_access_file(RandomAccessFileOptions(), path, file);
}
Status EnvBroker::new_random_access_file(const RandomAccessFileOptions& opts, const std::string& path,
std::unique_ptr<RandomAccessFile>* file) {
TBrokerOpenReaderRequest request;
TBrokerOpenReaderResponse response;
request.__set_path(path);
request.__set_clientId(get_client_id(_broker_addr));
request.__set_startOffset(0);
request.__set_version(TBrokerVersion::VERSION_ONE);
request.__set_properties(_properties);
Status st = call_method(_broker_addr, &BrokerServiceClient::openReader, request, &response);
if (!st.ok()) {
LOG(WARNING) << "Fail to open " << path << ": " << st;
return st;
}
if (response.opStatus.statusCode != TBrokerOperationStatusCode::OK) {
LOG(WARNING) << "Fail to open " << path << ": " << response.opStatus.message;
return to_status(response.opStatus);
}
// Get file size
uint64_t size;
RETURN_IF_ERROR(_get_file_size(path, &size));
*file = std::make_unique<BrokerRandomAccessFile>(_broker_addr, path, response.fd, size);
return Status::OK();
}
Status EnvBroker::new_writable_file(const std::string& path, std::unique_ptr<WritableFile>* file) {
return new_writable_file(WritableFileOptions(), path, file);
}
Status EnvBroker::new_writable_file(const WritableFileOptions& opts, const std::string& path,
std::unique_ptr<WritableFile>* file) {
if (opts.mode == CREATE_OR_OPEN_WITH_TRUNCATE) {
if (auto st = _path_exists(path); st.ok()) {
return Status::NotSupported("Cannot truncate a file by broker");
}
} else if (opts.mode == MUST_CREATE) {
if (auto st = _path_exists(path); st.ok()) {
return Status::AlreadyExist(path);
}
} else if (opts.mode == MUST_EXIST) {
return Status::NotSupported("Open with MUST_EXIST not supported by broker");
} else if (opts.mode == CREATE_OR_OPEN) {
if (auto st = _path_exists(path); st.ok()) {
return Status::NotSupported("Cannot write an already exists file through broker");
}
} else {
auto msg = strings::Substitute("Unsupported open mode $0", opts.mode);
return Status::NotSupported(msg);
}
TBrokerOpenWriterRequest request;
TBrokerOpenWriterResponse response;
request.__set_path(path);
request.__set_version(TBrokerVersion::VERSION_ONE);
request.__set_openMode(TBrokerOpenMode::APPEND);
request.__set_clientId(get_client_id(_broker_addr));
request.__set_properties(_properties);
Status st = call_method(_broker_addr, &BrokerServiceClient::openWriter, request, &response, 1, _timeout_ms);
if (!st.ok()) {
LOG(WARNING) << "Fail to open " << path << ": " << st;
return st;
}
if (response.opStatus.statusCode != TBrokerOperationStatusCode::OK) {
LOG(WARNING) << "Fail to open " << path << ": " << response.opStatus.message;
return to_status(response.opStatus);
}
*file = std::make_unique<BrokerWritableFile>(_broker_addr, path, response.fd, 0, _timeout_ms);
return Status::OK();
}
Status EnvBroker::new_random_rw_file(const std::string& path, std::unique_ptr<RandomRWFile>* file) {
return Status::NotSupported("EnvBroker::new_random_rw_file");
}
Status EnvBroker::new_random_rw_file(const RandomRWFileOptions& opts, const std::string& path,
std::unique_ptr<RandomRWFile>* file) {
return Status::NotSupported("BrokerEnv::new_random_rw_file");
}
Status EnvBroker::path_exists(const std::string& path) {
return _path_exists(path);
}
Status EnvBroker::_path_exists(const std::string& path) {
TBrokerCheckPathExistRequest request;
TBrokerCheckPathExistResponse response;
request.__set_properties(_properties);
request.__set_path(path);
request.__set_version(TBrokerVersion::VERSION_ONE);
RETURN_IF_ERROR(call_method(_broker_addr, &BrokerServiceClient::checkPathExist, request, &response));
if (response.opStatus.statusCode != TBrokerOperationStatusCode::OK) {
return to_status(response.opStatus);
}
return response.isPathExist ? Status::OK() : Status::NotFound(path);
}
Status EnvBroker::get_children(const std::string& dir, std::vector<std::string>* file) {
return Status::NotSupported("EnvBroker::get_children");
}
Status EnvBroker::iterate_dir(const std::string& dir, const std::function<bool(const char*)>& cb) {
std::vector<std::string> files;
RETURN_IF_ERROR(get_children(dir, &files));
for (const auto& f : files) {
if (!cb(f.c_str())) {
break;
}
}
return Status::OK();
}
Status EnvBroker::delete_file(const std::string& path) {
return _delete_file(path);
}
Status EnvBroker::_delete_file(const std::string& path) {
TBrokerDeletePathRequest request;
TBrokerOperationStatus response;
request.__set_version(TBrokerVersion::VERSION_ONE);
request.__set_path(path);
request.__set_properties(_properties);
Status st = call_method(_broker_addr, &BrokerServiceClient::deletePath, request, &response);
if (!st.ok()) {
LOG(WARNING) << "Fail to delete " << path << ": " << st.message();
return st;
}
st = to_status(response);
if (st.ok()) {
LOG(INFO) << "Deleted " << path;
} else {
LOG(WARNING) << "Fail to delete " << path << ": " << st.message();
}
return st;
}
Status EnvBroker::create_dir(const std::string& dirname) {
return Status::NotSupported("BrokerEnv::create_dir");
}
Status EnvBroker::create_dir_if_missing(const std::string& dirname, bool* created) {
return Status::NotSupported("BrokerEnv::create_dir_if_missing");
}
Status EnvBroker::delete_dir(const std::string& dirname) {
return Status::NotSupported("BrokerEnv::delete_dir");
}
Status EnvBroker::sync_dir(const std::string& dirname) {
return Status::NotSupported("BrokerEnv::sync_dir");
}
Status EnvBroker::is_directory(const std::string& path, bool* is_dir) {
TBrokerFileStatus stat;
RETURN_IF_ERROR(_list_file(path, &stat));
*is_dir = stat.isDir;
return Status::OK();
}
Status EnvBroker::canonicalize(const std::string& path, std::string* file) {
return Status::NotSupported("BrokerEnv::canonicalize");
}
Status EnvBroker::_get_file_size(const std::string& path, uint64_t* size) {
TBrokerFileStatus stat;
Status st = _list_file(path, &stat);
*size = stat.size;
return st;
}
Status EnvBroker::get_file_size(const std::string& path, uint64_t* size) {
return _get_file_size(path, size);
}
Status EnvBroker::get_file_modified_time(const std::string& path, uint64_t* file_mtime) {
return Status::NotSupported("BrokerEnv::get_file_modified_time");
}
Status EnvBroker::rename_file(const std::string& src, const std::string& target) {
return Status::NotSupported("BrokerEnv::rename_file");
}
Status EnvBroker::link_file(const std::string& old_path, const std::string& new_path) {
return Status::NotSupported("BrokerEnv::link_file");
}
Status EnvBroker::_list_file(const std::string& path, TBrokerFileStatus* stat) {
TBrokerListPathRequest request;
TBrokerListResponse response;
request.__set_version(TBrokerVersion::VERSION_ONE);
request.__set_fileNameOnly(true);
request.__set_isRecursive(false);
request.__set_path(path);
request.__set_properties(_properties);
RETURN_IF_ERROR(call_method(_broker_addr, &BrokerServiceClient::listPath, request, &response));
if (response.opStatus.statusCode == TBrokerOperationStatusCode::FILE_NOT_FOUND ||
response.opStatus.statusCode == TBrokerOperationStatusCode::NOT_AUTHORIZED) {
return Status::NotFound(path);
} else if (response.opStatus.statusCode == TBrokerOperationStatusCode::OK) {
if (response.files.size() != 1) {
return Status::InternalError(strings::Substitute("unexpected file list size=$0", response.files.size()));
}
swap(*stat, response.files[0]);
return Status::OK();
} else {
return to_status(response.opStatus);
}
}
} // namespace starrocks
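A minimal write sketch against EnvBroker as implemented above. The broker address, the properties (e.g. credentials for the remote storage) and the destination path are placeholders supplied by the caller.
#include <map>
#include <memory>
#include <string>
#include "env/env_broker.h"
#include "gen_cpp/Types_types.h"
namespace starrocks {
Status broker_write_example(const TNetworkAddress& broker_addr) {
    std::map<std::string, std::string> props = {{"username", "user"}, {"password", "passwd"}};
    EnvBroker env(broker_addr, props);
    std::unique_ptr<WritableFile> file;
    WritableFileOptions opts;
    opts.mode = Env::MUST_CREATE;  // the broker cannot truncate or append to an existing file
    RETURN_IF_ERROR(env.new_writable_file(opts, "hdfs://ns/user/test/output.dat", &file));
    RETURN_IF_ERROR(file->append(Slice("payload")));
    return file->close();
}
} // namespace starrocks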

83
be/src/env/env_broker.h vendored Normal file
View File

@ -0,0 +1,83 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include "env/env.h"
#include "gen_cpp/Types_types.h"
namespace starrocks {
const static int DEFAULT_TIMEOUT_MS = 10000;
class TBrokerFileStatus;
class TFileBrokerServiceClient;
class TNetworkAddress;
class EnvBroker : public Env {
public:
// FIXME: |timeout_ms| is unused now.
EnvBroker(const TNetworkAddress& broker_addr, std::map<std::string, std::string> properties,
int timeout_ms = DEFAULT_TIMEOUT_MS)
: _broker_addr(broker_addr), _properties(std::move(properties)), _timeout_ms(timeout_ms) {}
Status new_sequential_file(const std::string& path, std::unique_ptr<SequentialFile>* file) override;
Status new_random_access_file(const std::string& path, std::unique_ptr<RandomAccessFile>* file) override;
Status new_random_access_file(const RandomAccessFileOptions& opts, const std::string& path,
std::unique_ptr<RandomAccessFile>* file) override;
Status new_writable_file(const std::string& path, std::unique_ptr<WritableFile>* file) override;
Status new_writable_file(const WritableFileOptions& opts, const std::string& path,
std::unique_ptr<WritableFile>* file) override;
Status new_random_rw_file(const std::string& path, std::unique_ptr<RandomRWFile>* file) override;
Status new_random_rw_file(const RandomRWFileOptions& opts, const std::string& path,
std::unique_ptr<RandomRWFile>* file) override;
Status path_exists(const std::string& path) override;
Status get_children(const std::string& dir, std::vector<std::string>* file) override;
Status iterate_dir(const std::string& dir, const std::function<bool(const char*)>& cb) override;
Status delete_file(const std::string& path) override;
Status create_dir(const std::string& dirname) override;
Status create_dir_if_missing(const std::string& dirname, bool* created) override;
Status delete_dir(const std::string& dirname) override;
Status sync_dir(const std::string& dirname) override;
Status is_directory(const std::string& path, bool* is_dir) override;
Status canonicalize(const std::string& path, std::string* file) override;
Status get_file_size(const std::string& path, uint64_t* size) override;
Status get_file_modified_time(const std::string& path, uint64_t* file_mtime) override;
Status rename_file(const std::string& src, const std::string& target) override;
Status link_file(const std::string& old_path, const std::string& new_path) override;
#ifdef BE_TEST
static void TEST_set_broker_client(TFileBrokerServiceClient* client);
#endif
private:
Status _get_file_size(const std::string& params, uint64_t* size);
Status _path_exists(const std::string& path);
Status _delete_file(const std::string& path);
Status _list_file(const std::string& path, TBrokerFileStatus* stat);
TNetworkAddress _broker_addr;
std::map<std::string, std::string> _properties;
int _timeout_ms;
};
} // namespace starrocks
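A matching read sketch for the class declared above: read_at() either fills the whole buffer or returns an error, so sizing the buffer from get_file_size() yields the full file. The broker address and path are placeholders.
#include <memory>
#include <string>
#include "env/env_broker.h"
namespace starrocks {
Status broker_read_example(const TNetworkAddress& broker_addr, const std::string& path, std::string* out) {
    EnvBroker env(broker_addr, {});
    uint64_t size = 0;
    RETURN_IF_ERROR(env.get_file_size(path, &size));
    std::unique_ptr<RandomAccessFile> file;
    RETURN_IF_ERROR(env.new_random_access_file(path, &file));
    out->resize(size);
    return file->read_at(0, Slice(*out));  // fails with "Partial read" on short reads
}
} // namespace starrocks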

67
be/src/env/env_hdfs.cpp vendored Normal file
View File

@ -0,0 +1,67 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#include "env/env_hdfs.h"
#include "env/env.h"
#include "gutil/strings/substitute.h"
#include "util/hdfs_util.h"
namespace starrocks {
HdfsRandomAccessFile::HdfsRandomAccessFile(hdfsFS fs, hdfsFile file, std::string filename)
: _fs(fs), _file(file), _filename(std::move(filename)) {}
static Status read_at_internal(hdfsFS fs, hdfsFile file, const std::string& file_name, int64_t offset, Slice* res) {
auto cur_offset = hdfsTell(fs, file);
if (cur_offset == -1) {
return Status::IOError(
strings::Substitute("fail to get offset, file=$0, error=$1", file_name, get_hdfs_err_msg()));
}
if (cur_offset != offset) {
if (hdfsSeek(fs, file, offset)) {
return Status::IOError(strings::Substitute("fail to seek offset, file=$0, offset=$1, error=$1", file_name,
offset, get_hdfs_err_msg()));
}
}
size_t bytes_read = 0;
while (bytes_read < res->size) {
size_t to_read = res->size - bytes_read;
auto hdfs_res = hdfsRead(fs, file, res->data + bytes_read, to_read);
if (hdfs_res < 0) {
return Status::IOError(
strings::Substitute("fail to read file, file=$0, error=$1", file_name, get_hdfs_err_msg()));
} else if (hdfs_res == 0) {
break;
}
bytes_read += hdfs_res;
}
res->size = bytes_read;
return Status::OK();
}
Status HdfsRandomAccessFile::read(uint64_t offset, Slice* res) const {
RETURN_IF_ERROR(read_at_internal(_fs, _file, _filename, offset, res));
return Status::OK();
}
Status HdfsRandomAccessFile::read_at(uint64_t offset, const Slice& res) const {
Slice slice = res;
RETURN_IF_ERROR(read_at_internal(_fs, _file, _filename, offset, &slice));
if (slice.size != res.size) {
return Status::InternalError(
strings::Substitute("fail to read enough data, file=$0, offset=$1, size=$2, expect=$3", _filename,
offset, slice.size, res.size));
}
return Status::OK();
}
Status HdfsRandomAccessFile::readv_at(uint64_t offset, const Slice* res, size_t res_cnt) const {
// TODO: implement
return Status::InternalError("HdfsRandomAccessFile::readv_at not implement");
}
Status HdfsRandomAccessFile::size(uint64_t* size) const {
// TODO: implement
return Status::InternalError("HdfsRandomAccessFile::size not implement");
}
} // namespace starrocks
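readv_at() and size() above are still marked as TODO. A straightforward (if unoptimized) way to provide the scatter-read, mirroring what the broker and memory envs do, is to delegate to read_at() slice by slice. A sketch as a free helper, not the shipped implementation:
#include "env/env_hdfs.h"
namespace starrocks {
Status hdfs_readv_at_sketch(const HdfsRandomAccessFile& file, uint64_t offset, const Slice* res, size_t res_cnt) {
    for (size_t i = 0; i < res_cnt; ++i) {
        RETURN_IF_ERROR(file.read_at(offset, res[i]));  // read_at() fails unless the slice is fully read
        offset += res[i].size;
    }
    return Status::OK();
}
} // namespace starrocks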

33
be/src/env/env_hdfs.h vendored Normal file
View File

@ -0,0 +1,33 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include <hdfs/hdfs.h>
#include "env/env.h"
namespace starrocks {
// Class for reading a remote HDFS file.
// Note: this is currently not thread-safe.
class HdfsRandomAccessFile : public RandomAccessFile {
public:
HdfsRandomAccessFile(hdfsFS fs, hdfsFile file, std::string filename);
~HdfsRandomAccessFile() override = default;
Status read(uint64_t offset, Slice* res) const override;
Status read_at(uint64_t offset, const Slice& res) const override;
Status readv_at(uint64_t offset, const Slice* res, size_t res_cnt) const override;
Status size(uint64_t* size) const override;
const std::string& file_name() const override { return _filename; }
hdfsFile hdfs_file() const { return _file; }
private:
hdfsFS _fs;
hdfsFile _file;
std::string _filename;
};
} // namespace starrocks
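A minimal sketch of wiring HdfsRandomAccessFile to raw libhdfs handles. The namenode ("default" picks up fs.defaultFS from the loaded configuration) and the path are placeholders, error handling is deliberately thin, and note the class does not close the handles itself.
#include <fcntl.h>
#include <hdfs/hdfs.h>
#include <string>
#include "env/env_hdfs.h"
namespace starrocks {
Status hdfs_read_example(const std::string& path, std::string* out) {
    hdfsFS fs = hdfsConnect("default", 0);
    if (fs == nullptr) return Status::IOError("failed to connect to HDFS");
    hdfsFile file = hdfsOpenFile(fs, path.c_str(), O_RDONLY, 0, 0, 0);
    if (file == nullptr) {
        hdfsDisconnect(fs);
        return Status::IOError("failed to open " + path);
    }
    HdfsRandomAccessFile raf(fs, file, path);
    out->resize(16);
    Slice buff(*out);
    Status st = raf.read(0, &buff);  // short reads shrink buff.size
    out->resize(buff.size);
    hdfsCloseFile(fs, file);
    hdfsDisconnect(fs);
    return st;
}
} // namespace starrocks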

644
be/src/env/env_memory.cpp vendored Normal file
View File

@ -0,0 +1,644 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#include "env/env_memory.h"
#include <butil/files/file_path.h>
#include "util/raw_container.h"
namespace starrocks {
enum InodeType {
kNormal = 0,
kDir,
};
struct Inode {
Inode(InodeType t, std::string c) : type(t), data(std::move(c)) {}
InodeType type = kNormal;
std::string data;
};
using InodePtr = std::shared_ptr<Inode>;
class MemoryRandomAccessFile final : public RandomAccessFile {
public:
MemoryRandomAccessFile(std::string path, InodePtr inode) : _path(std::move(path)), _inode(std::move(inode)) {}
~MemoryRandomAccessFile() override = default;
Status read(uint64_t offset, Slice* res) const override {
const std::string& data = _inode->data;
if (offset >= data.size()) {
res->size = 0;
return Status::OK();
}
size_t to_read = std::min<size_t>(res->size, data.size() - offset);
memcpy(res->data, data.data() + offset, to_read);
res->size = to_read;
return Status::OK();
}
Status read_at(uint64_t offset, const Slice& result) const override {
const std::string& data = _inode->data;
if (offset + result.size > data.size()) {
return Status::IOError("Cannot read required bytes");
}
memcpy(result.data, data.data() + offset, result.size);
return Status::OK();
}
Status readv_at(uint64_t offset, const Slice* res, size_t res_cnt) const override {
const std::string& data = _inode->data;
size_t total_size = 0;
for (int i = 0; i < res_cnt; ++i) {
total_size += res[i].size;
}
if (offset + total_size > data.size()) {
return Status::IOError("Cannot read required bytes");
}
for (int i = 0; i < res_cnt; ++i) {
memcpy(res[i].data, data.data() + offset, res[i].size);
offset += res[i].size;
}
return Status::OK();
}
Status size(uint64_t* size) const override {
const std::string& data = _inode->data;
*size = data.size();
return Status::OK();
}
const std::string& file_name() const override { return _path; }
private:
std::string _path;
InodePtr _inode;
};
class MemorySequentialFile final : public SequentialFile {
public:
MemorySequentialFile(std::string path, InodePtr inode)
: _offset(0), _random_file(std::move(path), std::move(inode)) {}
~MemorySequentialFile() override = default;
Status read(Slice* res) override {
Status st = _random_file.read(_offset, res);
if (st.ok()) {
_offset += res->size;
}
return st;
}
const std::string& filename() const override { return _random_file.file_name(); }
Status skip(uint64_t n) override {
uint64_t size = 0;
CHECK(_random_file.size(&size).ok());
_offset = std::min(_offset + n, size);
return Status::OK();
}
private:
uint64_t _offset = 0;
MemoryRandomAccessFile _random_file;
};
class MemoryWritableFile final : public WritableFile {
public:
MemoryWritableFile(std::string path, InodePtr inode) : _path(std::move(path)), _inode(std::move(inode)) {}
Status append(const Slice& data) override {
_inode->data.append(data.data, data.size);
return Status::OK();
}
Status appendv(const Slice* data, size_t cnt) override {
for (size_t i = 0; i < cnt; i++) {
(void)append(data[i]);
}
return Status::OK();
}
Status pre_allocate(uint64_t size) override {
_inode->data.reserve(size);
return Status::OK();
}
Status close() override {
_inode = nullptr;
return Status::OK();
}
Status flush(FlushMode mode) override { return Status::OK(); }
Status sync() override { return Status::OK(); }
uint64_t size() const override { return _inode->data.size(); }
const std::string& filename() const override { return _path; }
private:
std::string _path;
InodePtr _inode;
};
class MemoryRandomRWFile final : public RandomRWFile {
public:
MemoryRandomRWFile(std::string path, InodePtr inode) : _path(std::move(path)), _inode(std::move(inode)) {}
Status read_at(uint64_t offset, const Slice& result) const override {
const std::string& data = _inode->data;
if (offset + result.size > data.size()) {
return Status::IOError("invalid offset or buffer size");
}
memcpy(result.data, &data[offset], result.size);
return Status::OK();
}
Status readv_at(uint64_t offset, const Slice* res, size_t res_cnt) const override {
for (size_t i = 0; i < res_cnt; i++) {
RETURN_IF_ERROR(read_at(offset, res[i]));
offset += res[i].size;
}
return Status::OK();
}
Status write_at(uint64_t offset, const Slice& data) override {
std::string& content = _inode->data;
if (offset + data.size > content.size()) {
content.resize(offset + data.size);
}
memcpy(&content[offset], data.data, data.size);
return Status::OK();
}
Status writev_at(uint64_t offset, const Slice* data, size_t data_cnt) override {
for (size_t i = 0; i < data_cnt; i++) {
(void)write_at(offset, data[i]);
offset += data[i].size;
}
return Status::OK();
}
Status flush(FlushMode mode, uint64_t offset, size_t length) override { return Status::OK(); }
Status sync() override { return Status::OK(); }
Status close() override {
_inode = nullptr;
return Status::OK();
}
Status size(uint64_t* size) const override {
*size = _inode->data.size();
return Status::OK();
}
const std::string& filename() const override { return _path; }
private:
std::string _path;
InodePtr _inode;
};
class EnvMemoryImpl {
public:
EnvMemoryImpl() {
// init root directory.
_namespace["/"] = std::make_shared<Inode>(kDir, "");
}
Status new_sequential_file(const butil::FilePath& path, std::unique_ptr<SequentialFile>* file) {
auto iter = _namespace.find(path.value());
if (iter == _namespace.end()) {
return Status::NotFound(path.value());
} else {
*file = std::make_unique<MemorySequentialFile>(path.value(), iter->second);
return Status::OK();
}
}
Status new_random_access_file(const butil::FilePath& path, std::unique_ptr<RandomAccessFile>* file) {
return new_random_access_file(RandomAccessFileOptions(), path, file);
}
Status new_random_access_file(const RandomAccessFileOptions& opts, const butil::FilePath& path,
std::unique_ptr<RandomAccessFile>* file) {
auto iter = _namespace.find(path.value());
if (iter == _namespace.end()) {
return Status::NotFound(path.value());
} else {
*file = std::make_unique<MemoryRandomAccessFile>(path.value(), iter->second);
return Status::OK();
}
}
template <typename DerivedType, typename BaseType>
Status new_writable_file(Env::OpenMode mode, const butil::FilePath& path, std::unique_ptr<BaseType>* file) {
InodePtr inode = get_inode(path);
if (mode == Env::MUST_EXIST && inode == nullptr) {
return Status::NotFound(path.value());
}
if (mode == Env::MUST_CREATE && inode != nullptr) {
return Status::AlreadyExist(path.value());
}
if (mode == Env::CREATE_OR_OPEN_WITH_TRUNCATE && inode != nullptr) {
inode->data.clear();
}
if (inode == nullptr && !path_exists(path.DirName()).ok()) {
return Status::NotFound("parent directory not exist");
}
if (inode == nullptr) {
assert(mode != Env::MUST_EXIST);
inode = std::make_shared<Inode>(kNormal, "");
_namespace[path.value()] = inode;
} else if (inode->type != kNormal) {
return Status::IOError(path.value() + " is a directory");
}
*file = std::make_unique<DerivedType>(path.value(), std::move(inode));
return Status::OK();
}
Status path_exists(const butil::FilePath& path) {
return get_inode(path) != nullptr ? Status::OK() : Status::NotFound(path.value());
}
Status get_children(const butil::FilePath& path, std::vector<std::string>* file) {
return iterate_dir(path, [&](const char* filename) -> bool {
file->emplace_back(filename);
return true;
});
}
Status iterate_dir(const butil::FilePath& path, const std::function<bool(const char*)>& cb) {
auto inode = get_inode(path);
if (inode == nullptr || inode->type != kDir) {
return Status::NotFound(path.value());
}
DCHECK(path.value().back() != '/' || path.value() == "/");
std::string s = (path.value() == "/") ? path.value() : path.value() + "/";
for (auto iter = _namespace.lower_bound(s); iter != _namespace.end(); ++iter) {
Slice child(iter->first);
if (!child.starts_with(s)) {
break;
}
// Get the relative path.
child.remove_prefix(s.size());
if (child.empty()) {
continue;
}
auto slash = (const char*)memchr(child.data, '/', child.size);
if (slash != nullptr) {
continue;
}
if (!cb(child.data)) {
break;
}
}
return Status::OK();
}
Status delete_file(const butil::FilePath& path) {
auto iter = _namespace.find(path.value());
if (iter == _namespace.end() || iter->second->type != kNormal) {
return Status::NotFound(path.value());
}
_namespace.erase(iter);
return Status::OK();
}
Status create_dir(const butil::FilePath& dirname) {
if (get_inode(dirname) != nullptr) {
return Status::AlreadyExist(dirname.value());
}
if (get_inode(dirname.DirName()) == nullptr) {
return Status::NotFound("parent directory not exist");
}
_namespace[dirname.value()] = std::make_shared<Inode>(kDir, "");
return Status::OK();
}
Status create_dir_if_missing(const butil::FilePath& dirname, bool* created) {
auto inode = get_inode(dirname);
if (inode != nullptr && inode->type == kDir) {
*created = false;
return Status::OK();
} else if (inode != nullptr) {
return Status::AlreadyExist(dirname.value());
} else if (get_inode(dirname.DirName()) == nullptr) {
return Status::NotFound("parent directory not exist");
} else {
*created = true;
_namespace[dirname.value()] = std::make_shared<Inode>(kDir, "");
return Status::OK();
}
}
Status delete_dir(const butil::FilePath& dirname) {
bool empty_dir = true;
RETURN_IF_ERROR(iterate_dir(dirname, [&](const char*) -> bool {
empty_dir = false;
return false;
}));
if (!empty_dir) {
return Status::IOError("directory not empty");
}
_namespace.erase(dirname.value());
return Status::OK();
}
Status is_directory(const butil::FilePath& path, bool* is_dir) {
auto inode = get_inode(path);
if (inode == nullptr) {
return Status::NotFound(path.value());
}
*is_dir = (inode->type == kDir);
return Status::OK();
}
Status get_file_size(const butil::FilePath& path, uint64_t* size) {
auto inode = get_inode(path);
if (inode == nullptr || inode->type != kNormal) {
return Status::NotFound("not exist or is a directory");
}
*size = inode->data.size();
return Status::OK();
}
Status rename_file(const butil::FilePath& src, const butil::FilePath& target) {
Slice s1(src.value());
Slice s2(target.value());
if (s2.starts_with(s1) && s2.size != s1.size) {
return Status::InvalidArgument("cannot make a directory a subdirectory of itself");
}
auto src_inode = get_inode(src);
auto dst_inode = get_inode(target);
if (src_inode == nullptr) {
return Status::NotFound(src.value());
}
auto dst_parent = get_inode(target.DirName());
if (dst_parent == nullptr || dst_parent->type != kDir) {
return Status::NotFound(target.DirName().value());
}
if (dst_inode != nullptr) {
if (src_inode->type == kNormal && dst_inode->type == kDir) {
return Status::IOError("target is an existing directory, but source is not a directory");
}
if (src_inode->type == kDir && dst_inode->type == kNormal) {
return Status::IOError("source is a directory, but target is not a directory");
}
// |src| and |target| refer to the same file
if (src_inode.get() == dst_inode.get()) {
return Status::OK();
}
if (dst_inode->type == kDir && !_is_directory_empty(target)) {
return Status::IOError("target is a nonempty directory");
}
}
_namespace[target.value()] = src_inode;
if (src_inode->type == kDir) {
std::vector<std::string> children;
Status st = get_children(src, &children);
LOG_IF(FATAL, !st.ok()) << st.to_string();
for (const auto& s : children) {
butil::FilePath src_child_path = src.Append(s);
butil::FilePath dst_child_path = target.Append(s);
st = rename_file(src_child_path, dst_child_path);
LOG_IF(FATAL, !st.ok()) << st.to_string();
}
}
_namespace.erase(src.value());
return Status::OK();
}
Status link_file(const butil::FilePath& old_path, const butil::FilePath& new_path) {
auto old_inode = get_inode(old_path);
auto new_inode = get_inode(new_path);
if (new_inode != nullptr) {
return Status::AlreadyExist(new_path.value());
}
if (old_inode == nullptr) {
return Status::NotFound(old_path.value());
}
if (get_inode(new_path.DirName()) == nullptr) {
return Status::NotFound(new_path.value());
}
_namespace[new_path.value()] = old_inode;
return Status::OK();
}
private:
// Prerequisite: |path| exists and is a directory.
bool _is_directory_empty(const butil::FilePath& path) {
bool empty_dir = true;
Status st = iterate_dir(path, [&](const char*) -> bool {
empty_dir = false;
return false;
});
CHECK(st.ok()) << st.to_string();
return empty_dir;
}
// Returns nullptr if |path| does not exist.
InodePtr get_inode(const butil::FilePath& path) {
auto iter = _namespace.find(path.value());
return iter == _namespace.end() ? nullptr : iter->second;
}
template <typename K, typename V>
using OrderedMap = std::map<K, V>;
OrderedMap<std::string, InodePtr> _namespace;
};
EnvMemory::EnvMemory() : _impl(new EnvMemoryImpl()) {}
EnvMemory::~EnvMemory() {
delete _impl;
}
Status EnvMemory::new_sequential_file(const std::string& path, std::unique_ptr<SequentialFile>* file) {
std::string new_path;
RETURN_IF_ERROR(canonicalize(path, &new_path));
return _impl->new_sequential_file(butil::FilePath(new_path), file);
}
Status EnvMemory::new_random_access_file(const std::string& path, std::unique_ptr<RandomAccessFile>* file) {
std::string new_path;
RETURN_IF_ERROR(canonicalize(path, &new_path));
return _impl->new_random_access_file(butil::FilePath(new_path), file);
}
Status EnvMemory::new_random_access_file(const RandomAccessFileOptions& opts, const std::string& path,
std::unique_ptr<RandomAccessFile>* file) {
std::string new_path;
RETURN_IF_ERROR(canonicalize(path, &new_path));
return _impl->new_random_access_file(opts, butil::FilePath(new_path), file);
}
Status EnvMemory::new_writable_file(const std::string& path, std::unique_ptr<WritableFile>* file) {
return new_writable_file(WritableFileOptions(), path, file);
}
Status EnvMemory::new_writable_file(const WritableFileOptions& opts, const std::string& path,
std::unique_ptr<WritableFile>* file) {
std::string new_path;
RETURN_IF_ERROR(canonicalize(path, &new_path));
return _impl->new_writable_file<MemoryWritableFile>(opts.mode, butil::FilePath(new_path), file);
}
Status EnvMemory::new_random_rw_file(const std::string& path, std::unique_ptr<RandomRWFile>* file) {
return new_random_rw_file(RandomRWFileOptions(), path, file);
}
Status EnvMemory::new_random_rw_file(const RandomRWFileOptions& opts, const std::string& path,
std::unique_ptr<RandomRWFile>* file) {
std::string new_path;
RETURN_IF_ERROR(canonicalize(path, &new_path));
return _impl->new_writable_file<MemoryRandomRWFile>(opts.mode, butil::FilePath(new_path), file);
}
Status EnvMemory::path_exists(const std::string& path) {
std::string new_path;
RETURN_IF_ERROR(canonicalize(path, &new_path));
return _impl->path_exists(butil::FilePath(new_path));
}
Status EnvMemory::get_children(const std::string& dir, std::vector<std::string>* file) {
std::string new_path;
file->clear();
RETURN_IF_ERROR(canonicalize(dir, &new_path));
return _impl->get_children(butil::FilePath(new_path), file);
}
Status EnvMemory::iterate_dir(const std::string& dir, const std::function<bool(const char*)>& cb) {
std::string new_path;
RETURN_IF_ERROR(canonicalize(dir, &new_path));
return _impl->iterate_dir(butil::FilePath(new_path), cb);
}
Status EnvMemory::delete_file(const std::string& path) {
std::string new_path;
RETURN_IF_ERROR(canonicalize(path, &new_path));
return _impl->delete_file(butil::FilePath(new_path));
}
Status EnvMemory::create_dir(const std::string& dirname) {
std::string new_path;
RETURN_IF_ERROR(canonicalize(dirname, &new_path));
return _impl->create_dir(butil::FilePath(new_path));
}
Status EnvMemory::create_dir_if_missing(const std::string& dirname, bool* created) {
std::string new_path;
RETURN_IF_ERROR(canonicalize(dirname, &new_path));
return _impl->create_dir_if_missing(butil::FilePath(new_path), created);
}
Status EnvMemory::delete_dir(const std::string& dirname) {
std::string new_path;
RETURN_IF_ERROR(canonicalize(dirname, &new_path));
return _impl->delete_dir(butil::FilePath(new_path));
}
Status EnvMemory::sync_dir(const std::string& dirname) {
return Status::OK();
}
Status EnvMemory::is_directory(const std::string& path, bool* is_dir) {
std::string new_path;
RETURN_IF_ERROR(canonicalize(path, &new_path));
return _impl->is_directory(butil::FilePath(new_path), is_dir);
}
Status EnvMemory::canonicalize(const std::string& path, std::string* file) {
if (path.empty() || path[0] != '/') {
return Status::InvalidArgument("Invalid path");
}
// fast path
if (path.find('.') == std::string::npos && path.find("//") == std::string::npos) {
*file = path;
if (file->size() > 1 && file->back() == '/') {
file->pop_back();
}
return Status::OK();
}
    // slow path
butil::FilePath file_path(path);
std::vector<std::string> components;
file_path.GetComponents(&components);
std::vector<std::string> normalized_components;
components.erase(components.begin());
for (auto& s : components) {
if (s == "..") {
if (!normalized_components.empty()) {
normalized_components.pop_back();
}
} else if (s == ".") {
continue;
} else {
normalized_components.emplace_back(std::move(s));
}
}
butil::FilePath final_path("/");
for (const auto& s : normalized_components) {
final_path = final_path.Append(s);
}
*file = final_path.value();
return Status::OK();
}
Status EnvMemory::get_file_size(const std::string& path, uint64_t* size) {
std::string new_path;
RETURN_IF_ERROR(canonicalize(path, &new_path));
return _impl->get_file_size(butil::FilePath(new_path), size);
}
Status EnvMemory::get_file_modified_time(const std::string& path, uint64_t* file_mtime) {
return Status::NotSupported("get_file_modified_time");
}
Status EnvMemory::rename_file(const std::string& src, const std::string& target) {
std::string new_src_path;
std::string new_dst_path;
RETURN_IF_ERROR(canonicalize(src, &new_src_path));
RETURN_IF_ERROR(canonicalize(target, &new_dst_path));
return _impl->rename_file(butil::FilePath(new_src_path), butil::FilePath(new_dst_path));
}
Status EnvMemory::link_file(const std::string& old_path, const std::string& new_path) {
std::string new_src_path;
std::string new_dst_path;
RETURN_IF_ERROR(canonicalize(old_path, &new_src_path));
RETURN_IF_ERROR(canonicalize(new_path, &new_dst_path));
return _impl->link_file(butil::FilePath(new_src_path), butil::FilePath(new_dst_path));
}
Status EnvMemory::create_file(const std::string& path) {
WritableFileOptions opts{.mode = CREATE_OR_OPEN};
std::unique_ptr<WritableFile> dummy;
return new_writable_file(opts, path, &dummy);
}
Status EnvMemory::append_file(const std::string& path, const Slice& content) {
WritableFileOptions opts{.mode = CREATE_OR_OPEN};
std::unique_ptr<WritableFile> f;
RETURN_IF_ERROR(new_writable_file(opts, path, &f));
return f->append(content);
}
Status EnvMemory::read_file(const std::string& path, std::string* content) {
std::unique_ptr<RandomAccessFile> f;
RETURN_IF_ERROR(new_random_access_file(path, &f));
uint64_t size = 0;
RETURN_IF_ERROR(f->size(&size));
raw::make_room(content, size);
Slice buff(*content);
return f->read_at(0, buff);
}
} // namespace starrocks

165
be/src/env/env_memory.h vendored Normal file
View File

@ -0,0 +1,165 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include <algorithm>
#include <cstring>
#include "env/env.h"
namespace starrocks {
// NOTE: Use with care. Currently this is only used in unit tests.
class StringRandomAccessFile final : public RandomAccessFile {
public:
explicit StringRandomAccessFile(std::string str) : _str(std::move(str)) {}
~StringRandomAccessFile() override = default;
Status read(uint64_t offset, Slice* res) const override {
if (offset >= _str.size()) {
res->size = 0;
return Status::OK();
}
size_t to_read = std::min<size_t>(res->size, _str.size() - offset);
memcpy(res->data, _str.data() + offset, to_read);
res->size = to_read;
return Status::OK();
}
Status read_at(uint64_t offset, const Slice& result) const override {
if (offset + result.size > _str.size()) {
return Status::InternalError("");
}
memcpy(result.data, _str.data() + offset, result.size);
return Status::OK();
}
Status readv_at(uint64_t offset, const Slice* res, size_t res_cnt) const override {
size_t total_size = 0;
for (int i = 0; i < res_cnt; ++i) {
total_size += res[i].size;
}
if (offset + total_size > _str.size()) {
return Status::InternalError("");
}
for (int i = 0; i < res_cnt; ++i) {
memcpy(res[i].data, _str.data() + offset, res[i].size);
offset += res[i].size;
}
return Status::OK();
}
Status size(uint64_t* size) const override {
*size = _str.size();
return Status::OK();
}
const std::string& file_name() const override {
static std::string s_name = "StringRandomAccessFile";
return s_name;
}
private:
std::string _str;
};
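// Example usage of StringRandomAccessFile (a minimal sketch for unit tests;
// assumes Slice can be constructed from a raw buffer and a length, as in
// util/slice.h):
// ```
// StringRandomAccessFile file("hello world");
// char buf[5];
// Slice res(buf, sizeof(buf));
// CHECK(file.read_at(0, res).ok()); // buf now holds "hello"
// ```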
class StringSequentialFile final : public SequentialFile {
public:
explicit StringSequentialFile(std::string str) : _offset(0), _random_file(std::move(str)) {}
~StringSequentialFile() override = default;
Status read(Slice* res) override {
Status st = _random_file.read(_offset, res);
if (st.ok()) {
_offset += res->size;
}
return st;
}
const std::string& filename() const override {
static std::string s_name = "StringSequentialFile";
return s_name;
}
Status skip(uint64_t n) override {
uint64_t size = 0;
CHECK(_random_file.size(&size).ok());
_offset = std::min(_offset + n, size);
return Status::OK();
}
uint64_t size() const {
        uint64_t sz = 0;
(void)_random_file.size(&sz);
return sz;
}
private:
uint64_t _offset = 0;
StringRandomAccessFile _random_file;
};
class EnvMemoryImpl;
class EnvMemory : public Env {
public:
EnvMemory();
~EnvMemory() override;
EnvMemory(const EnvMemory&) = delete;
void operator=(const EnvMemory&) = delete;
Status new_sequential_file(const std::string& url, std::unique_ptr<SequentialFile>* file) override;
Status new_random_access_file(const std::string& url, std::unique_ptr<RandomAccessFile>* file) override;
Status new_random_access_file(const RandomAccessFileOptions& opts, const std::string& url,
std::unique_ptr<RandomAccessFile>* file) override;
Status new_writable_file(const std::string& url, std::unique_ptr<WritableFile>* file) override;
Status new_writable_file(const WritableFileOptions& opts, const std::string& url,
std::unique_ptr<WritableFile>* file) override;
Status new_random_rw_file(const std::string& url, std::unique_ptr<RandomRWFile>* file) override;
Status new_random_rw_file(const RandomRWFileOptions& opts, const std::string& url,
std::unique_ptr<RandomRWFile>* file) override;
Status path_exists(const std::string& url) override;
Status get_children(const std::string& dir, std::vector<std::string>* file) override;
Status iterate_dir(const std::string& dir, const std::function<bool(const char*)>& cb) override;
Status delete_file(const std::string& url) override;
Status create_dir(const std::string& dirname) override;
Status create_dir_if_missing(const std::string& dirname, bool* created) override;
Status delete_dir(const std::string& dirname) override;
Status sync_dir(const std::string& dirname) override;
Status is_directory(const std::string& url, bool* is_dir) override;
Status canonicalize(const std::string& path, std::string* file) override;
Status get_file_size(const std::string& url, uint64_t* size) override;
Status get_file_modified_time(const std::string& url, uint64_t* file_mtime) override;
Status rename_file(const std::string& src, const std::string& target) override;
Status link_file(const std::string& old_path, const std::string& new_path) override;
Status create_file(const std::string& path);
Status append_file(const std::string& path, const Slice& content);
Status read_file(const std::string& path, std::string* content);
private:
EnvMemoryImpl* _impl;
};
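// Example usage of EnvMemory (a minimal sketch; assumes Slice is
// constructible from a C string, as in util/slice.h):
// ```
// EnvMemory env;
// CHECK(env.create_dir("/dir").ok());
// CHECK(env.append_file("/dir/a.txt", Slice("hello")).ok());
// std::string content;
// CHECK(env.read_file("/dir/a.txt", &content).ok()); // content == "hello"
// ```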
} // namespace starrocks

702
be/src/env/env_posix.cpp vendored Normal file
View File

@ -0,0 +1,702 @@
// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).
//
// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file. See the AUTHORS file for names of contributors
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>
#include <algorithm>
#include <cstring>
#include <memory>
#include "common/logging.h"
#include "env/env.h"
#include "gutil/gscoped_ptr.h"
#include "gutil/macros.h"
#include "gutil/port.h"
#include "gutil/strings/substitute.h"
#include "util/errno.h"
#include "util/file_cache.h"
#include "util/slice.h"
namespace starrocks {
using std::string;
using strings::Substitute;
// Close file descriptor when object goes out of scope.
class ScopedFdCloser {
public:
explicit ScopedFdCloser(int fd) : fd_(fd) {}
~ScopedFdCloser() {
int err;
RETRY_ON_EINTR(err, ::close(fd_));
if (PREDICT_FALSE(err != 0)) {
LOG(WARNING) << "Failed to close fd " << fd_;
}
}
private:
const int fd_;
};
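// Maps an errno value onto the matching Status category: ENOENT becomes
// NotFound, EEXIST becomes AlreadyExist, and everything else becomes IOError,
// keeping the strerror() text for context.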
static Status io_error(const std::string& context, int err_number) {
switch (err_number) {
case ENOENT:
return Status::NotFound(context, static_cast<int16_t>(err_number), std::strerror(err_number));
case EEXIST:
return Status::AlreadyExist(context, static_cast<int16_t>(err_number), std::strerror(err_number));
default:
return Status::IOError(context, static_cast<int16_t>(err_number), std::strerror(err_number));
}
}
static Status do_sync(int fd, const string& filename) {
if (fdatasync(fd) < 0) {
return io_error(filename, errno);
}
return Status::OK();
}
static Status do_open(const string& filename, Env::OpenMode mode, int* fd) {
int flags = O_RDWR;
switch (mode) {
case Env::CREATE_OR_OPEN_WITH_TRUNCATE:
flags |= O_CREAT | O_TRUNC;
break;
case Env::CREATE_OR_OPEN:
flags |= O_CREAT;
break;
case Env::MUST_CREATE:
flags |= O_CREAT | O_EXCL;
break;
case Env::MUST_EXIST:
break;
default:
return Status::NotSupported(strings::Substitute("Unknown create mode $0", mode));
}
int f;
RETRY_ON_EINTR(f, open(filename.c_str(), flags, 0666));
if (f < 0) {
return io_error(filename, errno);
}
*fd = f;
return Status::OK();
}
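// Reads exactly the bytes described by |res| starting at |offset|, retrying
// short reads and never passing more than IOV_MAX iovecs to preadv() in one
// call. If EOF is reached before all bytes are read, returns EndOfFile and,
// when |read_bytes| is non-null, stores the number of bytes actually read.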
static Status do_readv_at(int fd, const std::string& filename, uint64_t offset, const Slice* res, size_t res_cnt,
size_t* read_bytes) {
// Convert the results into the iovec vector to request
// and calculate the total bytes requested
size_t bytes_req = 0;
struct iovec iov[res_cnt];
for (size_t i = 0; i < res_cnt; i++) {
const Slice& result = res[i];
bytes_req += result.size;
iov[i] = {result.data, result.size};
}
uint64_t cur_offset = offset;
size_t completed_iov = 0;
size_t rem = bytes_req;
while (rem > 0) {
// Never request more than IOV_MAX in one request
size_t iov_count = std::min(res_cnt - completed_iov, static_cast<size_t>(IOV_MAX));
ssize_t r;
RETRY_ON_EINTR(r, preadv(fd, iov + completed_iov, iov_count, cur_offset));
if (PREDICT_FALSE(r < 0)) {
// An error: return a non-ok status.
return io_error(filename, errno);
}
if (PREDICT_FALSE(r == 0)) {
if (read_bytes != nullptr) {
*read_bytes = cur_offset - offset;
}
return Status::EndOfFile(
strings::Substitute("EOF trying to read $0 bytes at offset $1", bytes_req, offset));
}
if (PREDICT_TRUE(r == rem)) {
// All requested bytes were read. This is almost always the case.
return Status::OK();
}
DCHECK_LE(r, rem);
// Adjust iovec vector based on bytes read for the next request
ssize_t bytes_rem = r;
for (size_t i = completed_iov; i < res_cnt; i++) {
if (bytes_rem >= iov[i].iov_len) {
// The full length of this iovec was read
completed_iov++;
bytes_rem -= iov[i].iov_len;
} else {
// Partially read this result.
// Adjust the iov_len and iov_base to request only the missing data.
iov[i].iov_base = static_cast<uint8_t*>(iov[i].iov_base) + bytes_rem;
iov[i].iov_len -= bytes_rem;
break; // Don't need to adjust remaining iovec's
}
}
cur_offset += r;
rem -= r;
}
DCHECK_EQ(0, rem);
return Status::OK();
}
static Status do_writev_at(int fd, const string& filename, uint64_t offset, const Slice* data, size_t data_cnt,
size_t* bytes_written) {
    // Convert the input slices into the iovec vector for the request
    // and calculate the total bytes to write.
size_t bytes_req = 0;
struct iovec iov[data_cnt];
for (size_t i = 0; i < data_cnt; i++) {
const Slice& result = data[i];
bytes_req += result.size;
iov[i] = {result.data, result.size};
}
uint64_t cur_offset = offset;
size_t completed_iov = 0;
size_t rem = bytes_req;
while (rem > 0) {
// Never request more than IOV_MAX in one request.
size_t iov_count = std::min(data_cnt - completed_iov, static_cast<size_t>(IOV_MAX));
ssize_t w;
RETRY_ON_EINTR(w, pwritev(fd, iov + completed_iov, iov_count, cur_offset));
if (PREDICT_FALSE(w < 0)) {
// An error: return a non-ok status.
return io_error(filename, errno);
}
if (PREDICT_TRUE(w == rem)) {
            // All requested bytes were written. This is almost always the case.
rem = 0;
break;
}
        // Adjust the iovec vector based on bytes written for the next request.
ssize_t bytes_rem = w;
for (size_t i = completed_iov; i < data_cnt; i++) {
if (bytes_rem >= iov[i].iov_len) {
// The full length of this iovec was written.
completed_iov++;
bytes_rem -= iov[i].iov_len;
} else {
// Partially wrote this result.
// Adjust the iov_len and iov_base to write only the missing data.
iov[i].iov_base = static_cast<uint8_t*>(iov[i].iov_base) + bytes_rem;
iov[i].iov_len -= bytes_rem;
break; // Don't need to adjust remaining iovec's.
}
}
cur_offset += w;
rem -= w;
}
DCHECK_EQ(0, rem);
*bytes_written = bytes_req;
return Status::OK();
}
class PosixSequentialFile : public SequentialFile {
public:
PosixSequentialFile(string fname, FILE* f) : _filename(std::move(fname)), _file(f) {}
~PosixSequentialFile() override {
int err;
RETRY_ON_EINTR(err, fclose(_file));
if (PREDICT_FALSE(err != 0)) {
LOG(WARNING) << "Failed to close " << _filename << ", msg=" << errno_to_string(ferror(_file));
}
}
Status read(Slice* result) override {
size_t r;
STREAM_RETRY_ON_EINTR(r, _file, fread_unlocked(result->data, 1, result->size, _file));
if (r < result->size) {
if (feof(_file)) {
// We leave status as ok if we hit the end of the file.
// We need to adjust the slice size.
result->truncate(r);
} else {
// A partial read with an error: return a non-ok status.
return io_error(_filename, ferror(_file));
}
}
return Status::OK();
}
Status skip(uint64_t n) override {
if (fseek(_file, n, SEEK_CUR)) {
return io_error(_filename, errno);
}
return Status::OK();
}
const string& filename() const override { return _filename; }
private:
const std::string _filename;
FILE* const _file;
};
class PosixRandomAccessFile : public RandomAccessFile {
public:
PosixRandomAccessFile(std::string filename, int fd) : _filename(std::move(filename)), _fd(fd) {}
~PosixRandomAccessFile() override {
int res;
RETRY_ON_EINTR(res, close(_fd));
if (res != 0) {
LOG(WARNING) << "close file failed, name=" << _filename << ", msg=" << errno_to_string(errno);
}
}
Status read(uint64_t offset, Slice* res) const override {
uint64_t read_bytes = 0;
auto st = do_readv_at(_fd, _filename, offset, res, 1, &read_bytes);
if (!st.ok() && st.is_end_of_file()) {
res->size = read_bytes;
return Status::OK();
}
return st;
}
Status read_at(uint64_t offset, const Slice& result) const override {
return do_readv_at(_fd, _filename, offset, &result, 1, nullptr);
}
Status readv_at(uint64_t offset, const Slice* res, size_t res_cnt) const override {
return do_readv_at(_fd, _filename, offset, res, res_cnt, nullptr);
}
Status size(uint64_t* size) const override {
struct stat st;
auto res = fstat(_fd, &st);
if (res != 0) {
return io_error(_filename, errno);
}
*size = st.st_size;
return Status::OK();
}
const std::string& file_name() const override { return _filename; }
private:
std::string _filename;
int _fd;
};
class PosixWritableFile : public WritableFile {
public:
PosixWritableFile(std::string filename, int fd, uint64_t filesize, bool sync_on_close)
: _filename(std::move(filename)), _fd(fd), _sync_on_close(sync_on_close), _filesize(filesize) {}
~PosixWritableFile() override { WARN_IF_ERROR(close(), "Failed to close file, file=" + _filename); }
Status append(const Slice& data) override { return appendv(&data, 1); }
Status appendv(const Slice* data, size_t cnt) override {
size_t bytes_written = 0;
RETURN_IF_ERROR(do_writev_at(_fd, _filename, _filesize, data, cnt, &bytes_written));
_filesize += bytes_written;
return Status::OK();
}
Status pre_allocate(uint64_t size) override {
uint64_t offset = std::max(_filesize, _pre_allocated_size);
int ret;
RETRY_ON_EINTR(ret, fallocate(_fd, 0, offset, size));
if (ret != 0) {
if (errno == EOPNOTSUPP) {
LOG(WARNING) << "The filesystem does not support fallocate().";
} else if (errno == ENOSYS) {
LOG(WARNING) << "The kernel does not implement fallocate().";
} else {
return io_error(_filename, errno);
}
}
_pre_allocated_size = offset + size;
return Status::OK();
}
Status close() override {
if (_closed) {
return Status::OK();
}
Status s;
// If we've allocated more space than we used, truncate to the
// actual size of the file and perform Sync().
if (_filesize < _pre_allocated_size) {
int ret;
RETRY_ON_EINTR(ret, ftruncate(_fd, _filesize));
if (ret != 0) {
s = io_error(_filename, errno);
_pending_sync = true;
}
}
if (_sync_on_close) {
Status sync_status = sync();
if (!sync_status.ok()) {
LOG(ERROR) << "Unable to Sync " << _filename << ": " << sync_status.to_string();
if (s.ok()) {
s = sync_status;
}
}
}
int ret;
RETRY_ON_EINTR(ret, ::close(_fd));
if (ret < 0) {
if (s.ok()) {
s = io_error(_filename, errno);
}
}
_closed = true;
return s;
}
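    // On Linux, flush() uses sync_file_range() to start writeback of dirty
    // pages and, for FLUSH_SYNC, waits for it to complete; on other platforms
    // it falls back to fsync() when FLUSH_SYNC is requested.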
Status flush(FlushMode mode) override {
#if defined(__linux__)
int flags = SYNC_FILE_RANGE_WRITE;
if (mode == FLUSH_SYNC) {
flags |= SYNC_FILE_RANGE_WAIT_BEFORE;
flags |= SYNC_FILE_RANGE_WAIT_AFTER;
}
if (sync_file_range(_fd, 0, 0, flags) < 0) {
return io_error(_filename, errno);
}
#else
if (mode == FLUSH_SYNC && fsync(_fd) < 0) {
return io_error(_filename, errno);
}
#endif
return Status::OK();
}
Status sync() override {
if (_pending_sync) {
_pending_sync = false;
RETURN_IF_ERROR(do_sync(_fd, _filename));
}
return Status::OK();
}
uint64_t size() const override { return _filesize; }
const string& filename() const override { return _filename; }
private:
std::string _filename;
int _fd;
const bool _sync_on_close = false;
bool _pending_sync = false;
bool _closed = false;
uint64_t _filesize = 0;
uint64_t _pre_allocated_size = 0;
};
class PosixRandomRWFile : public RandomRWFile {
public:
PosixRandomRWFile(string fname, int fd, bool sync_on_close)
: _filename(std::move(fname)), _fd(fd), _sync_on_close(sync_on_close), _closed(false) {}
~PosixRandomRWFile() { WARN_IF_ERROR(close(), "Failed to close " + _filename); }
    Status read_at(uint64_t offset, const Slice& result) const override {
return do_readv_at(_fd, _filename, offset, &result, 1, nullptr);
}
Status readv_at(uint64_t offset, const Slice* res, size_t res_cnt) const override {
return do_readv_at(_fd, _filename, offset, res, res_cnt, nullptr);
}
Status write_at(uint64_t offset, const Slice& data) override { return writev_at(offset, &data, 1); }
Status writev_at(uint64_t offset, const Slice* data, size_t data_cnt) override {
size_t bytes_written = 0;
return do_writev_at(_fd, _filename, offset, data, data_cnt, &bytes_written);
}
Status flush(FlushMode mode, uint64_t offset, size_t length) override {
#if defined(__linux__)
int flags = SYNC_FILE_RANGE_WRITE;
if (mode == FLUSH_SYNC) {
flags |= SYNC_FILE_RANGE_WAIT_AFTER;
}
if (sync_file_range(_fd, offset, length, flags) < 0) {
return io_error(_filename, errno);
}
#else
if (mode == FLUSH_SYNC && fsync(_fd) < 0) {
return io_error(_filename, errno);
}
#endif
return Status::OK();
}
Status sync() override { return do_sync(_fd, _filename); }
Status close() override {
if (_closed) {
return Status::OK();
}
Status s;
if (_sync_on_close) {
s = sync();
if (!s.ok()) {
LOG(ERROR) << "Unable to Sync " << _filename << ": " << s.to_string();
}
}
int ret;
RETRY_ON_EINTR(ret, ::close(_fd));
if (ret < 0) {
if (s.ok()) {
s = io_error(_filename, errno);
}
}
_closed = true;
return s;
}
Status size(uint64_t* size) const override {
struct stat st;
if (fstat(_fd, &st) == -1) {
return io_error(_filename, errno);
}
*size = st.st_size;
return Status::OK();
}
const string& filename() const override { return _filename; }
private:
const std::string _filename;
const int _fd;
const bool _sync_on_close = false;
bool _closed = false;
};
class PosixEnv : public Env {
public:
~PosixEnv() override {}
Status new_sequential_file(const string& fname, std::unique_ptr<SequentialFile>* result) override {
FILE* f;
POINTER_RETRY_ON_EINTR(f, fopen(fname.c_str(), "r"));
if (f == nullptr) {
return io_error(fname, errno);
}
result->reset(new PosixSequentialFile(fname, f));
return Status::OK();
}
// get a RandomAccessFile pointer without file cache
Status new_random_access_file(const std::string& fname, std::unique_ptr<RandomAccessFile>* result) override {
return new_random_access_file(RandomAccessFileOptions(), fname, result);
}
Status new_random_access_file(const RandomAccessFileOptions& opts, const std::string& fname,
std::unique_ptr<RandomAccessFile>* result) override {
int fd;
RETRY_ON_EINTR(fd, open(fname.c_str(), O_RDONLY));
if (fd < 0) {
return io_error(fname, errno);
}
result->reset(new PosixRandomAccessFile(fname, fd));
return Status::OK();
}
Status new_writable_file(const string& fname, std::unique_ptr<WritableFile>* result) override {
return new_writable_file(WritableFileOptions(), fname, result);
}
Status new_writable_file(const WritableFileOptions& opts, const string& fname,
std::unique_ptr<WritableFile>* result) override {
int fd;
RETURN_IF_ERROR(do_open(fname, opts.mode, &fd));
uint64_t file_size = 0;
if (opts.mode == MUST_EXIST) {
RETURN_IF_ERROR(get_file_size(fname, &file_size));
}
result->reset(new PosixWritableFile(fname, fd, file_size, opts.sync_on_close));
return Status::OK();
}
Status new_random_rw_file(const string& fname, std::unique_ptr<RandomRWFile>* result) override {
return new_random_rw_file(RandomRWFileOptions(), fname, result);
}
Status new_random_rw_file(const RandomRWFileOptions& opts, const string& fname,
std::unique_ptr<RandomRWFile>* result) override {
int fd;
RETURN_IF_ERROR(do_open(fname, opts.mode, &fd));
result->reset(new PosixRandomRWFile(fname, fd, opts.sync_on_close));
return Status::OK();
}
Status path_exists(const std::string& fname) override {
if (access(fname.c_str(), F_OK) != 0) {
return io_error(fname, errno);
}
return Status::OK();
}
Status get_children(const std::string& dir, std::vector<std::string>* result) override {
result->clear();
DIR* d = opendir(dir.c_str());
if (d == nullptr) {
return io_error(dir, errno);
}
struct dirent* entry;
while ((entry = readdir(d)) != nullptr) {
result->push_back(entry->d_name);
}
closedir(d);
return Status::OK();
}
Status iterate_dir(const std::string& dir, const std::function<bool(const char*)>& cb) override {
DIR* d = opendir(dir.c_str());
if (d == nullptr) {
return io_error(dir, errno);
}
struct dirent* entry;
while ((entry = readdir(d)) != nullptr) {
// callback returning false means to terminate iteration
if (!cb(entry->d_name)) {
break;
}
}
closedir(d);
return Status::OK();
}
Status delete_file(const std::string& fname) override {
if (unlink(fname.c_str()) != 0) {
return io_error(fname, errno);
}
return Status::OK();
}
Status create_dir(const std::string& name) override {
if (mkdir(name.c_str(), 0755) != 0) {
return io_error(name, errno);
}
return Status::OK();
}
Status create_dir_if_missing(const string& dirname, bool* created = nullptr) override {
Status s = create_dir(dirname);
if (created != nullptr) {
*created = s.ok();
}
// Check that dirname is actually a directory.
if (s.is_already_exist()) {
bool is_dir = false;
RETURN_IF_ERROR(is_directory(dirname, &is_dir));
if (is_dir) {
return Status::OK();
} else {
return s.clone_and_append("path already exists but not a dir");
}
}
return s;
}
// Delete the specified directory.
Status delete_dir(const std::string& dirname) override {
if (rmdir(dirname.c_str()) != 0) {
return io_error(dirname, errno);
}
return Status::OK();
}
Status sync_dir(const string& dirname) override {
int dir_fd;
RETRY_ON_EINTR(dir_fd, open(dirname.c_str(), O_DIRECTORY | O_RDONLY));
if (dir_fd < 0) {
return io_error(dirname, errno);
}
ScopedFdCloser fd_closer(dir_fd);
if (fsync(dir_fd) != 0) {
return io_error(dirname, errno);
}
return Status::OK();
}
Status is_directory(const std::string& path, bool* is_dir) override {
struct stat path_stat;
if (stat(path.c_str(), &path_stat) != 0) {
return io_error(path, errno);
} else {
*is_dir = S_ISDIR(path_stat.st_mode);
}
return Status::OK();
}
Status canonicalize(const std::string& path, std::string* result) override {
        // NOTE: the buffer returned by realpath() is allocated with malloc()
        // and must be released with free() (see `man 3 realpath`), hence the
        // FreeDeleter on the unique_ptr below.
std::unique_ptr<char[], FreeDeleter> r(realpath(path.c_str(), nullptr));
if (r == nullptr) {
return io_error(strings::Substitute("Unable to canonicalize $0", path), errno);
}
*result = std::string(r.get());
return Status::OK();
}
Status get_file_size(const string& fname, uint64_t* size) override {
struct stat sbuf;
if (stat(fname.c_str(), &sbuf) != 0) {
return io_error(fname, errno);
} else {
*size = sbuf.st_size;
}
return Status::OK();
}
Status get_file_modified_time(const std::string& fname, uint64_t* file_mtime) override {
struct stat s;
if (stat(fname.c_str(), &s) != 0) {
return io_error(fname, errno);
}
*file_mtime = static_cast<uint64_t>(s.st_mtime);
return Status::OK();
}
Status rename_file(const std::string& src, const std::string& target) override {
if (rename(src.c_str(), target.c_str()) != 0) {
return io_error(src, errno);
}
return Status::OK();
}
Status link_file(const std::string& old_path, const std::string& new_path) override {
if (link(old_path.c_str(), new_path.c_str()) != 0) {
return io_error(old_path, errno);
}
return Status::OK();
}
};
// Default Posix Env
Env* Env::Default() {
static PosixEnv default_env;
return &default_env;
}
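// Example usage of the default Env (a minimal sketch; the path
// "/tmp/example.txt" is illustrative and assumes a writable location):
// ```
// std::unique_ptr<WritableFile> f;
// CHECK(Env::Default()->new_writable_file("/tmp/example.txt", &f).ok());
// CHECK(f->append(Slice("hello")).ok());
// CHECK(f->close().ok());
// ```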
} // end namespace starrocks

30
be/src/env/env_stream_pipe.cpp vendored Normal file
View File

@ -0,0 +1,30 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#include "env/env_stream_pipe.h"
#include "env/env.h"
#include "gutil/strings/substitute.h"
#include "runtime/stream_load/stream_load_pipe.h"
namespace starrocks {
StreamPipeSequentialFile::StreamPipeSequentialFile(std::shared_ptr<StreamLoadPipe> file) : _file(std::move(file)) {}
StreamPipeSequentialFile::~StreamPipeSequentialFile() {
_file->close();
}
Status StreamPipeSequentialFile::read(Slice* result) {
bool eof = false;
return _file->read(reinterpret_cast<uint8_t*>(result->data), &(result->size), &eof);
}
Status StreamPipeSequentialFile::read_one_message(std::unique_ptr<uint8_t[]>* buf, size_t* length) {
return _file->read_one_message(buf, length);
}
Status StreamPipeSequentialFile::skip(uint64_t n) {
return _file->seek(n);
}
} // namespace starrocks

29
be/src/env/env_stream_pipe.h vendored Normal file
View File

@ -0,0 +1,29 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include "env/env.h"
namespace starrocks {
class StreamLoadPipe;
// Since StreamLoadPipe does not implement the standard SequentialFile
// interface, StreamPipeSequentialFile wraps the pipe to expose it through
// the sequential read API.
class StreamPipeSequentialFile : public SequentialFile {
public:
explicit StreamPipeSequentialFile(std::shared_ptr<StreamLoadPipe> file);
~StreamPipeSequentialFile() override;
Status read(Slice* result) override;
Status read_one_message(std::unique_ptr<uint8_t[]>* buf, size_t* length);
Status skip(uint64_t n) override;
const std::string& filename() const override { return _filename; }
private:
std::shared_ptr<StreamLoadPipe> _file;
std::string _filename = "StreamPipeSequentialFile";
};
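// Example usage (a minimal sketch; the StreamLoadPipe instance is assumed to
// come from the stream load context elsewhere):
// ```
// std::shared_ptr<StreamLoadPipe> pipe = ...; // provided by the load context
// StreamPipeSequentialFile file(pipe);
// char buf[4096];
// Slice chunk(buf, sizeof(buf));
// RETURN_IF_ERROR(file.read(&chunk)); // chunk.size holds the bytes read
// ```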
} // namespace starrocks

56
be/src/env/env_util.cpp vendored Normal file
View File

@ -0,0 +1,56 @@
// This file is made available under Elastic License 2.0.
// This file is based on code available under the Apache license here:
// https://github.com/apache/incubator-doris/blob/master/be/src/env/env_util.cpp
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#include "env/env_util.h"
#include "env/env.h"
namespace starrocks {
namespace env_util {
Status open_file_for_write(Env* env, const std::string& path, std::shared_ptr<WritableFile>* file) {
return open_file_for_write(WritableFileOptions(), env, path, file);
}
Status open_file_for_write(const WritableFileOptions& opts, Env* env, const std::string& path,
std::shared_ptr<WritableFile>* file) {
std::unique_ptr<WritableFile> w;
RETURN_IF_ERROR(env->new_writable_file(opts, path, &w));
file->reset(w.release());
return Status::OK();
}
Status open_file_for_sequential(Env* env, const std::string& path, std::shared_ptr<SequentialFile>* file) {
std::unique_ptr<SequentialFile> r;
RETURN_IF_ERROR(env->new_sequential_file(path, &r));
file->reset(r.release());
return Status::OK();
}
Status open_file_for_random(Env* env, const std::string& path, std::shared_ptr<RandomAccessFile>* file) {
std::unique_ptr<RandomAccessFile> r;
RETURN_IF_ERROR(env->new_random_access_file(path, &r));
file->reset(r.release());
return Status::OK();
}
} // namespace env_util
} // namespace starrocks

48
be/src/env/env_util.h vendored Normal file
View File

@ -0,0 +1,48 @@
// This file is made available under Elastic License 2.0.
// This file is based on code available under the Apache license here:
// https://github.com/apache/incubator-doris/blob/master/be/src/env/env_util.h
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#pragma once
#include <memory>
#include <string>
#include "common/status.h"
namespace starrocks {
class Env;
class SequentialFile;
class RandomAccessFile;
class WritableFile;
struct WritableFileOptions;
namespace env_util {
Status open_file_for_write(Env* env, const std::string& path, std::shared_ptr<WritableFile>* file);
Status open_file_for_write(const WritableFileOptions& opts, Env* env, const std::string& path,
std::shared_ptr<WritableFile>* file);
Status open_file_for_sequential(Env* env, const std::string& path, std::shared_ptr<SequentialFile>* file);
Status open_file_for_random(Env* env, const std::string& path, std::shared_ptr<RandomAccessFile>* file);
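// Example usage (a minimal sketch; the path is illustrative):
// ```
// std::shared_ptr<WritableFile> file;
// RETURN_IF_ERROR(open_file_for_write(Env::Default(), "/tmp/example.txt", &file));
// ```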
} // namespace env_util
} // namespace starrocks

45
be/src/env/output_stream_wrapper.h vendored Normal file
View File

@ -0,0 +1,45 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021 StarRocks Limited.
#pragma once
#include "env/writable_file_as_stream_buf.h"
#include "env/writable_file_wrapper.h"
namespace starrocks {
//
// Wraps a WritableFile into a std::ostream. Note that there is no internal
// buffer in the stream.
//
// Example usage:
// #1. Write to file as std::ostream
// ```
// std::unique_ptr<WritableFile> f;
// Env::Default()->new_writable_file("a.txt", &f);
// OutputStreamWrapper wrapper(f.release(), kTakesOwnership);
// wrapper << "anything can be sent to std::ostream";
// ```
//
// #2. Serialize protobuf to file directly
// ```
// TabletMetaPB tablet_meta_pb;
//
// std::unique_ptr<WritableFile> f;
// Env::Default()->new_writable_file("a.txt", &f);
// OutputStreamWrapper wrapper(f.release(), kTakesOwnership);
// tablet_meta.SerializeToOStream(&wrapper);
// ```
//
class OutputStreamWrapper final : public WritableFileWrapper, public std::ostream {
public:
// If |ownership| is kDontTakeOwnership, |file| must outlive this OutputStreamWrapper.
explicit OutputStreamWrapper(WritableFile* file, Ownership ownership = kDontTakeOwnership)
            : WritableFileWrapper(file, ownership), std::ostream(nullptr), _stream_buf(this) {
rdbuf(&_stream_buf);
}
private:
WritableFileAsStreamBuf _stream_buf;
};
} // namespace starrocks

Some files were not shown because too many files have changed in this diff