Why I'm doing:
To address the CVE issues, we need to upgrade the Hadoop SDK from 3.3.6 to 3.4.0.
This upgrade introduces the AWS Java SDK v2, so we can delete the AWS SDK v1 dependency.
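For context, Hadoop 3.4.0's S3A connector is built on the AWS Java SDK v2 (the software.amazon.awssdk packages) instead of v1, which is why the v1 bundle can be dropped. Below is a minimal, hedged sketch of what v2-style client code looks like; it is not from this PR, and the bucket name and region are placeholders (credentials come from the default provider chain):

import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Response;

public class S3V2Example {
    public static void main(String[] args) {
        // AWS SDK v2 style: builder-based client, AutoCloseable, immutable requests.
        try (S3Client s3 = S3Client.builder().region(Region.US_EAST_1).build()) {
            ListObjectsV2Response resp = s3.listObjectsV2(
                    ListObjectsV2Request.builder()
                            .bucket("my-bucket")   // placeholder bucket name
                            .maxKeys(10)
                            .build());
            resp.contents().forEach(o -> System.out.println(o.key() + " " + o.size()));
        }
    }
}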
Signed-off-by: Smith Cruise <chendingchao1@126.com>
Why I'm doing:
Right now the hdfs scanner optimization for count(1) outputs a const column with the expected row count.
In extreme cases (large datasets), the number of chunks flowing through the pipeline becomes extremely large, and operator time and overhead time are not negligible.
Here is a profile of select count(*) from hive.hive_ssb100g_parquet.lineorder. To reproduce this extreme case, I changed the code to scale morsels by 20x and repeat row groups by 10x.
In the concurrency=1 case, the total time is 51s:
- OverheadTime: 25s37ms
- __MAX_OF_OverheadTime: 25s111ms
- __MIN_OF_OverheadTime: 24s962ms
- PullTotalTime: 12s376ms
- __MAX_OF_PullTotalTime: 13s147ms
- __MIN_OF_PullTotalTime: 11s885ms
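For intuition on why overhead dominates, here is a minimal sketch (not StarRocks code; chunk size 4096 is an assumed default) of how many chunks the current const-column approach pushes through the pipeline for this table:

// Minimal sketch, not StarRocks code: the existing count(1) optimization still
// materializes a constant column chunk by chunk, one chunk per ~CHUNK_SIZE rows.
public class ConstColumnCountSketch {
    static final int CHUNK_SIZE = 4096; // assumed default chunk size

    // Number of chunks the scanner emits for count(1) over `numRows` rows
    // when it outputs a const column instead of reading real data.
    static long chunksForConstColumn(long numRows) {
        return (numRows + CHUNK_SIZE - 1) / CHUNK_SIZE;
    }

    public static void main(String[] args) {
        long lineorderRows = 600_037_902L; // cardinality from the plan below
        System.out.println(chunksForConstColumn(lineorderRows)); // ~146,494 chunks for one scan
        // With morsels scaled 20x and row groups repeated 10x, the pipeline sees
        // far more chunks, so per-chunk operator/overhead time dominates the query.
    }
}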
What I'm doing:
Rewrite the count(1) query into a sum-like form, so each row group reader only emits one chunk (size = 1).
With this change the total time drops to 9s.
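Here is a minimal sketch of the idea, not the actual optimizer rule or scanner code (the RowGroup type and method names are hypothetical): each row group reader emits a single value carrying its row count, playing the role of the ___count___ slot in the rewritten plan below, and the aggregation sums those values, which is equivalent to count(1) over the whole table.

import java.util.List;

// Minimal sketch, not StarRocks code: count(1) is answered by summing
// per-row-group row counts, so each row group reader emits exactly one
// chunk of size 1 instead of numRows/chunkSize constant chunks.
public class CountAsSumSketch {
    record RowGroup(long numRows) {}

    // What each row group reader emits after the rewrite: one value per group.
    static long emitCountChunk(RowGroup group) {
        return group.numRows();
    }

    // The update/merge aggregation computes sum(___count___),
    // which equals count(1) over all row groups.
    static long countViaSum(List<RowGroup> groups) {
        return groups.stream().mapToLong(CountAsSumSketch::emitCountChunk).sum();
    }

    public static void main(String[] args) {
        List<RowGroup> groups = List.of(new RowGroup(1_000_000), new RowGroup(250_000));
        System.out.println(groups.size());       // only 2 chunks flow through the pipeline
        System.out.println(countViaSum(groups)); // 1250000, same answer as count(1)
    }
}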
The original plan looks like:
+----------------------------------+
| Explain String |
+----------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:18: count |
| PARTITION: UNPARTITIONED |
| |
| RESULT SINK |
| |
| 4:AGGREGATE (merge finalize) |
| | output: count(18: count) |
| | group by: |
| | |
| 3:EXCHANGE |
| |
| PLAN FRAGMENT 1 |
| OUTPUT EXPRS: |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 03 |
| UNPARTITIONED |
| |
| 2:AGGREGATE (update serialize) |
| | output: count(*) |
| | group by: |
| | |
| 1:Project |
| | <slot 20> : 1 |
| | |
| 0:HdfsScanNode |
| TABLE: lineorder |
| partitions=1/1 |
| cardinality=600037902 |
| avgRowSize=5.0 |
+----------------------------------+
And the rewritten plan looks like:
+-----------------------------------+
| Explain String |
+-----------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:18: count |
| PARTITION: UNPARTITIONED |
| |
| RESULT SINK |
| |
| 3:AGGREGATE (merge finalize) |
| | output: sum(18: count) |
| | group by: |
| | |
| 2:EXCHANGE |
| |
| PLAN FRAGMENT 1 |
| OUTPUT EXPRS: |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 02 |
| UNPARTITIONED |
| |
| 1:AGGREGATE (update serialize) |
| | output: sum(19: ___count___) |
| | group by: |
| | |
| 0:HdfsScanNode |
| TABLE: lineorder |
| partitions=1/1 |
| cardinality=1 |
| avgRowSize=1.0 |
+-----------------------------------+
Fixes #45242
Signed-off-by: yanz <dirtysalt1987@gmail.com>