From df147a4ee7cb6384a381e41048eb397d268a840e Mon Sep 17 00:00:00 2001
From: Copilot <198982749+Copilot@users.noreply.github.com>
Date: Wed, 20 Aug 2025 16:43:07 +0800
Subject: [PATCH] [Tool] Add comprehensive GitHub Copilot instructions for StarRocks (#62136)

Signed-off-by: Seaven
---
 .github/copilot-instructions.md | 258 ++++++++++++++++++++++++++++++++
 1 file changed, 258 insertions(+)
 create mode 100644 .github/copilot-instructions.md

diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
new file mode 100644
index 00000000000..b17fb433ae6
--- /dev/null
+++ b/.github/copilot-instructions.md
@@ -0,0 +1,258 @@
+# GitHub Copilot Instructions for StarRocks
+
+## Project Overview
+
+StarRocks is a high-performance, cloud-native analytical database system designed for real-time analytics and ad-hoc queries. It features a streamlined architecture with both shared-nothing and shared-data deployment modes, supporting sub-second query performance for complex analytical workloads.
+
+**Key Technologies:**
+- **Backend (BE)**: C++ - Core analytical engine, storage layer, and query execution
+- **Frontend (FE)**: Java - SQL parsing, query planning, metadata management, and coordination
+- **Java Extensions**: Java - External connectors and UDF framework
+- **Testing**: Python - Integration tests and SQL test framework
+
+## Architecture Components
+
+### Backend (be/) - C++
+The core analytical engine responsible for data storage, processing, and query execution:
+
+**Core Components:**
+- `be/src/exec/` - Query execution operators (scan, join, aggregate, etc.)
+- `be/src/storage/` - Storage engine (tablets, rowsets, segments, compaction)
+- `be/src/exprs/` - Expression evaluation and vectorized computation
+- `be/src/formats/` - Data format support (Parquet, ORC, CSV, JSON)
+- `be/src/runtime/` - Runtime services (memory management, load balancing, stream processing)
+- `be/src/connector/` - External data source connectors (Hive, Iceberg, Delta Lake)
+- `be/src/service/` - RPC services and BE coordination
+- `be/src/common/` - Shared utilities and common data structures
+
+**Performance Focus:**
+- Vectorized query execution
+- Columnar storage format
+- Memory-efficient algorithms
+- SIMD optimizations where applicable
+
+📋 **Note:** See `be/.cursorrules` for detailed backend component breakdown
+
+### Frontend (fe/) - Java
+SQL interface and query coordination layer:
+
+**Core Components:**
+- `fe/fe-core/src/main/java/com/starrocks/`
+  - `sql/` - SQL parser, analyzer, and AST
+  - `planner/` - Query planning and optimization (CBO)
+  - `catalog/` - Metadata management (tables, partitions, statistics)
+  - `scheduler/` - Query scheduling and execution coordination
+  - `load/` - Data loading coordination (Broker Load, Stream Load, etc.)
+  - `backup/` - Backup and restore functionality
+  - `privilege/` - Authentication and authorization
+  - `qe/` - Query execution coordination and session management
+- `fe/fe-common/` - Common frontend utilities
+- `fe/plugin-common/` - Plugin framework common components
+- `fe/spark-dpp/` - Spark data preprocessing integration
+- `fe/hive-udf/` - Hive UDF compatibility layer
+
+**Key Responsibilities:**
+- Parse and validate SQL statements
+- Generate optimized query plans using Cost-Based Optimizer (CBO)
+- Manage cluster metadata and coordination
+- Handle user sessions and security
+
+📋 **Note:** See `fe/.cursorrules` for detailed frontend component breakdown
+
+### Java Extensions (java-extensions/) - Java
+External connectivity and extensibility:
+
+**Data Source Connectors:**
+- `hive-reader/` - Apache Hive integration
+- `iceberg-metadata-reader/` - Apache Iceberg support
+- `hudi-reader/` - Apache Hudi integration
+- `paimon-reader/` - Apache Paimon support
+- `jdbc-bridge/` - JDBC connectivity for external databases
+- `odps-reader/` - Alibaba ODPS integration
+
+**Extension Framework:**
+- `udf-extensions/` - User-Defined Function framework
+- `common-runtime/` - Shared runtime for extensions
+- `hadoop-ext/` - Hadoop ecosystem integration
+
+📋 **Note:** See `java-extensions/.cursorrules` for detailed extensions breakdown
+
+### Additional Important Directories
+
+**Generated Sources (gensrc/):**
+- `gensrc/proto/` - Protocol buffer definitions
+- `gensrc/thrift/` - Thrift interface definitions
+- `gensrc/script/` - Code generation scripts
+
+**Testing Framework (test/):**
+- `test/sql/` - SQL test cases organized by functionality
+- `test/common/` - Common test utilities
+- `test/lib/` - Test libraries and helpers
+
+**Tools and Utilities:**
+- `tools/` - Diagnostic tools, benchmarks, and utilities
+- `bin/` - Binary executables and scripts
+- `conf/` - Configuration files and templates
+- `build-support/` - Build system support files
+- `docker/` - Docker build configurations
+
+**Other Key Directories:**
+- `thirdparty/` - External dependencies and patches
+- `fs_brokers/` - File system broker implementations
+- `webroot/` - Web UI static files
+- `format-sdk/` - Format SDK for data interchange
+
+## Coding Guidelines
+
+### C++ (Backend)
+```cpp
+// Use modern C++ features (C++17/C++20)
+// Follow Google C++ Style Guide conventions
+// Use RAII for resource management
+// Prefer smart pointers over raw pointers
+// Use const-correctness
+
+// Example: Vectorized processing pattern
+Status ColumnProcessor::process_batch(const ChunkPtr& chunk) {
+    const auto& column = chunk->get_column_by_name("column_name");
+    auto result_column = std::make_shared();
+
+    // Vectorized operation on entire column
+    for (size_t i = 0; i < chunk->num_rows(); ++i) {
+        // Process element
+    }
+
+    return Status::OK();
+}
+```
+
+### Java (Frontend)
+```java
+// Follow Java coding conventions
+// Use dependency injection where appropriate
+// Implement proper exception handling
+// Use builder patterns for complex objects
+// Follow existing naming conventions
+
+// Example: Query planning pattern
+public class ScanNodePlanner extends PlanFragment {
+    @Override
+    public PlanFragment visitLogicalScanOperator(
+            OptExpression optExpression, ExecPlan context) {
+        LogicalScanOperator scanOperator =
+            (LogicalScanOperator) optExpression.getOp();
+
+        // Create physical scan node
+        ScanNode scanNode = createScanNode(scanOperator);
+        return new PlanFragment(scanNode);
+    }
+}
+```
+
+
+## ⚠️ CRITICAL BUILD SYSTEM WARNING
+**DO NOT attempt to build or run unit tests (UT) for this project unless explicitly requested by the user.**
+
+The build system is extremely resource-intensive and time-consuming. Building the full project can take hours and requires significant system resources.
+
+**Specific commands and files to AVOID:**
+- `build.sh` - Main build script (extremely resource intensive)
+- `build-in-docker.sh` - Docker-based build
+- `run-be-ut.sh` / `run-fe-ut.sh` / `run-java-exts-ut.sh` - Unit test runners
+- `docker-compose` commands - Heavy resource usage
+- `Makefile*` - Make build files
+- `pom.xml` - Maven build files (for Java components)
+
+**Focus on code analysis and targeted changes instead of full builds.**
+
+## Important Guidelines
+
+### Pull Request Requirements
+
+**PR Title Format:**
+Must include category prefix:
+- `[BugFix]` - Bug fixes and error corrections
+- `[Feature]` - New features and capabilities
+- `[Enhancement]` - Improvements to existing functionality
+- `[Refactor]` - Code refactoring without functional changes
+- `[Test]` - Test-related changes
+- `[Doc]` - Documentation updates
+- `[Build]` - Build system and CI/CD changes
+- `[Performance]` - Performance optimizations
+
+**Example:** `[Feature] Add Apache Paimon table format support`
+
+
+### Code Review Focus Areas
+
+**Performance Considerations:**
+- Query execution efficiency
+- Memory usage patterns
+- Lock contention in concurrent scenarios
+- Network I/O optimization
+
+**Correctness Priorities:**
+- SQL standard compliance
+- Data type handling accuracy
+- Transaction consistency
+- Error handling completeness
+
+**Security Considerations:**
+- Input validation and sanitization
+- Authentication and authorization
+- Resource usage limits
+- Information leak prevention
+
+## Common Development Patterns
+
+### Adding New SQL Functions
+1. Define function signature in `fe/fe-core/src/main/java/com/starrocks/catalog/FunctionSet.java`
+2. Implement evaluation logic in `be/src/exprs/`
+3. Add comprehensive tests in `test/sql/test_functions/`
+
+### Adding New Data Source Connectors
+1. Implement connector interface in `java-extensions/`
+2. Add metadata reader and schema handling
+3. Integrate with query planner in `fe/fe-core/src/main/java/com/starrocks/connector/`
+4. Add integration tests
+
+### Query Optimization Improvements
+1. Analyze optimizer rules in `fe/fe-core/src/main/java/com/starrocks/sql/optimizer/rule/`
+2. Update cost model if needed in `fe/fe-core/src/main/java/com/starrocks/sql/optimizer/cost/`
+3. Add test cases in `test/sql/test_optimizer/`
+
+## Documentation References
+
+- **Contributing Guide**: [`CONTRIBUTING.md`](../CONTRIBUTING.md)
+- **Development Setup**: [StarRocks Documentation](https://docs.starrocks.io/docs/developers/)
+- **Architecture Overview**: [README.md](../README.md#architecture-overview)
+- **PR Template**: [`.github/PULL_REQUEST_TEMPLATE.md`](.github/PULL_REQUEST_TEMPLATE.md)
+
+## Quick Reference
+
+**Key File Extensions:**
+- `.cpp`, `.h`, `.cc` - C++ backend code
+- `.java` - Java frontend/extensions code
+- `.sql` - SQL test cases
+- `.py` - Python test scripts
+- `.proto` - Protocol buffer definitions
+- `.thrift` - Thrift interface definitions
+
+**Important Configuration:**
+- `conf/` - Runtime configuration templates
+- `gensrc/` - Auto-generated code from IDL definitions
+- `thirdparty/` - External dependencies
+
+**Testing Structure:**
+- `test/sql/` - SQL correctness tests organized by functionality
+- `be/test/` - C++ unit tests
+- `fe/fe-core/src/test/` - Java unit tests
+
+**Build System Files to Avoid:**
+- `build.sh` - Main build script (very resource intensive)
+- `build-in-docker.sh` - Docker-based build
+- `run-*-ut.sh` - Unit test runners
+- `Makefile*` - Make build files
+- `pom.xml` - Maven build files (for Java components)
+
+This project prioritizes **performance**, **correctness**, and **scalability**. When contributing, consider the impact on query performance and ensure changes maintain SQL standard compliance.
\ No newline at end of file
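
The "PR Title Format" rule in the added file can be checked mechanically. The following is a minimal sketch in Python, not part of the patch: the names `ALLOWED_PREFIXES` and `has_valid_prefix` are hypothetical, and the prefix list is copied from the file as written, which may not match what the project's real CI accepts.

```python
import re

# Category prefixes exactly as listed under "PR Title Format" in the added file.
# Assumption: this list is complete; the project's actual checks may allow more.
ALLOWED_PREFIXES = (
    "BugFix", "Feature", "Enhancement", "Refactor",
    "Test", "Doc", "Build", "Performance",
)

# A valid title starts with one bracketed category, then whitespace,
# then at least one non-space character of summary text.
_TITLE_RE = re.compile(r"^\[(%s)\]\s+\S" % "|".join(ALLOWED_PREFIXES))


def has_valid_prefix(title: str) -> bool:
    """Return True if the title starts with one of the listed category prefixes."""
    return _TITLE_RE.match(title) is not None


if __name__ == "__main__":
    for title in (
        "[Feature] Add Apache Paimon table format support",
        "[Tool] Add comprehensive GitHub Copilot instructions for StarRocks",
    ):
        print(title, "->", has_valid_prefix(title))
```

This commit's own subject uses `[Tool]`, which is not among the prefixes the file lists, so a checker built only from that list would reject it. The tuple would need extending if `[Tool]` is meant to be accepted.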