[Tool] Add comprehensive GitHub Copilot instructions for StarRocks (#62136)

Signed-off-by: Seaven <seaven_7@qq.com>
Copilot 2025-08-20 16:43:07 +08:00 committed by GitHub
parent 2f101c3d88
commit df147a4ee7
1 changed file with 258 additions and 0 deletions

.github/copilot-instructions.md (new file)

@@ -0,0 +1,258 @@
# GitHub Copilot Instructions for StarRocks
## Project Overview
StarRocks is a high-performance, cloud-native analytical database system designed for real-time analytics and ad-hoc queries. It features a streamlined architecture with both shared-nothing and shared-data deployment modes, supporting sub-second query performance for complex analytical workloads.
**Key Technologies:**
- **Backend (BE)**: C++ - Core analytical engine, storage layer, and query execution
- **Frontend (FE)**: Java - SQL parsing, query planning, metadata management, and coordination
- **Java Extensions**: Java - External connectors and UDF framework
- **Testing**: Python - Integration tests and SQL test framework
## Architecture Components
### Backend (be/) - C++
The core analytical engine responsible for data storage, processing, and query execution:
**Core Components:**
- `be/src/exec/` - Query execution operators (scan, join, aggregate, etc.)
- `be/src/storage/` - Storage engine (tablets, rowsets, segments, compaction)
- `be/src/exprs/` - Expression evaluation and vectorized computation
- `be/src/formats/` - Data format support (Parquet, ORC, CSV, JSON)
- `be/src/runtime/` - Runtime services (memory management, load balancing, stream processing)
- `be/src/connector/` - External data source connectors (Hive, Iceberg, Delta Lake)
- `be/src/service/` - RPC services and BE coordination
- `be/src/common/` - Shared utilities and common data structures
**Performance Focus:**
- Vectorized query execution
- Columnar storage format
- Memory-efficient algorithms
- SIMD optimizations where applicable
📋 **Note:** See `be/.cursorrules` for detailed backend component breakdown
### Frontend (fe/) - Java
SQL interface and query coordination layer:
**Core Components:**
- `fe/fe-core/src/main/java/com/starrocks/`
  - `sql/` - SQL parser, analyzer, and AST
  - `planner/` - Query planning and optimization (CBO)
  - `catalog/` - Metadata management (tables, partitions, statistics)
  - `scheduler/` - Query scheduling and execution coordination
  - `load/` - Data loading coordination (Broker Load, Stream Load, etc.)
  - `backup/` - Backup and restore functionality
  - `privilege/` - Authentication and authorization
  - `qe/` - Query execution coordination and session management
- `fe/fe-common/` - Common frontend utilities
- `fe/plugin-common/` - Plugin framework common components
- `fe/spark-dpp/` - Spark data preprocessing integration
- `fe/hive-udf/` - Hive UDF compatibility layer
**Key Responsibilities:**
- Parse and validate SQL statements
- Generate optimized query plans using Cost-Based Optimizer (CBO)
- Manage cluster metadata and coordination
- Handle user sessions and security
📋 **Note:** See `fe/.cursorrules` for detailed frontend component breakdown
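To make the flow above concrete, here is a minimal, hypothetical sketch of the FE query lifecycle (parse, analyze, optimize, coordinate). The `QueryLifecycleSketch` class, stage names, and record types are illustrative stand-ins only, not the actual classes under `fe/fe-core`:
```java
import java.util.function.Function;

public class QueryLifecycleSketch {

    // Toy stand-ins for the FE's internal representations.
    record ParsedStatement(String sql) {}
    record LogicalPlan(String description) {}
    record PhysicalPlan(String description, double estimatedCost) {}

    static ParsedStatement parse(String sql) {
        // The real FE builds an AST in sql/; here we only wrap the text.
        return new ParsedStatement(sql.trim());
    }

    static LogicalPlan analyzeAndRewrite(ParsedStatement stmt) {
        // Name resolution and semantic checks would happen at this stage.
        return new LogicalPlan("logical plan for: " + stmt.sql());
    }

    static PhysicalPlan optimize(LogicalPlan plan) {
        // The CBO enumerates alternatives and keeps the cheapest one.
        return new PhysicalPlan(plan.description() + " (physical)", 42.0);
    }

    public static void main(String[] args) {
        Function<String, PhysicalPlan> pipeline =
                sql -> optimize(analyzeAndRewrite(parse(sql)));
        System.out.println(pipeline.apply("SELECT 1"));
    }
}
```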
### Java Extensions (java-extensions/) - Java
External connectivity and extensibility:
**Data Source Connectors:**
- `hive-reader/` - Apache Hive integration
- `iceberg-metadata-reader/` - Apache Iceberg support
- `hudi-reader/` - Apache Hudi integration
- `paimon-reader/` - Apache Paimon support
- `jdbc-bridge/` - JDBC connectivity for external databases
- `odps-reader/` - Alibaba ODPS integration
**Extension Framework:**
- `udf-extensions/` - User-Defined Function framework
- `common-runtime/` - Shared runtime for extensions
- `hadoop-ext/` - Hadoop ecosystem integration
📋 **Note:** See `java-extensions/.cursorrules` for detailed extensions breakdown
### Additional Important Directories
**Generated Sources (gensrc/):**
- `gensrc/proto/` - Protocol buffer definitions
- `gensrc/thrift/` - Thrift interface definitions
- `gensrc/script/` - Code generation scripts
**Testing Framework (test/):**
- `test/sql/` - SQL test cases organized by functionality
- `test/common/` - Common test utilities
- `test/lib/` - Test libraries and helpers
**Tools and Utilities:**
- `tools/` - Diagnostic tools, benchmarks, and utilities
- `bin/` - Binary executables and scripts
- `conf/` - Configuration files and templates
- `build-support/` - Build system support files
- `docker/` - Docker build configurations
**Other Key Directories:**
- `thirdparty/` - External dependencies and patches
- `fs_brokers/` - File system broker implementations
- `webroot/` - Web UI static files
- `format-sdk/` - Format SDK for data interchange
## Coding Guidelines
### C++ (Backend)
```cpp
// Use modern C++ features (C++17/C++20)
// Follow Google C++ Style Guide conventions
// Use RAII for resource management
// Prefer smart pointers over raw pointers
// Use const-correctness
// Example: chunk-at-a-time (vectorized) processing pattern
Status ColumnProcessor::process_batch(const ChunkPtr& chunk) {
    const auto& column = chunk->get_column_by_name("column_name");
    auto result_column = column->clone_empty();
    // Walk the column values; production code prefers the vectorized
    // Column APIs over element-wise loops wherever possible.
    for (size_t i = 0; i < chunk->num_rows(); ++i) {
        // Compute the result for element i and append it to result_column.
    }
    return Status::OK();
}
```
### Java (Frontend)
```java
// Follow Java coding conventions
// Use dependency injection where appropriate
// Implement proper exception handling
// Use builder patterns for complex objects
// Follow existing naming conventions
// Example: query planning pattern (illustrative visitor method)
public class ScanNodePlanner {
    public PlanFragment visitLogicalScanOperator(
            OptExpression optExpression, ExecPlan context) {
        LogicalScanOperator scanOperator =
                (LogicalScanOperator) optExpression.getOp();
        // Translate the logical scan operator into a physical scan node.
        ScanNode scanNode = createScanNode(scanOperator);
        return new PlanFragment(scanNode);
    }
}
```
## ⚠️ CRITICAL BUILD SYSTEM WARNING
**DO NOT attempt to build or run unit tests (UT) for this project unless explicitly requested by the user.**
The build system is extremely resource-intensive and time-consuming. Building the full project can take hours and requires significant system resources.
**Specific commands and files to AVOID:**
- `build.sh` - Main build script (extremely resource intensive)
- `build-in-docker.sh` - Docker-based build
- `run-be-ut.sh` / `run-fe-ut.sh` / `run-java-exts-ut.sh` - Unit test runners
- `docker-compose` commands - Heavy resource usage
- `Makefile*` - Make build files
- `pom.xml` - Maven build files (for Java components)
**Focus on code analysis and targeted changes instead of full builds.**
## Important Guidelines
### Pull Request Requirements
**PR Title Format:**
Must include category prefix:
- `[BugFix]` - Bug fixes and error corrections
- `[Feature]` - New features and capabilities
- `[Enhancement]` - Improvements to existing functionality
- `[Refactor]` - Code refactoring without functional changes
- `[Test]` - Test-related changes
- `[Doc]` - Documentation updates
- `[Build]` - Build system and CI/CD changes
- `[Performance]` - Performance optimizations
**Example:** `[Feature] Add Apache Paimon table format support`
### Code Review Focus Areas
**Performance Considerations:**
- Query execution efficiency
- Memory usage patterns
- Lock contention in concurrent scenarios
- Network I/O optimization
**Correctness Priorities:**
- SQL standard compliance
- Data type handling accuracy
- Transaction consistency
- Error handling completeness
**Security Considerations:**
- Input validation and sanitization
- Authentication and authorization
- Resource usage limits
- Information leak prevention
## Common Development Patterns
### Adding New SQL Functions
1. Define function signature in `fe/fe-core/src/main/java/com/starrocks/catalog/FunctionSet.java`
2. Implement evaluation logic in `be/src/exprs/`
3. Add comprehensive tests in `test/sql/test_functions/`
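As a rough illustration of step 1 (declaring the new function so the analyzer can resolve it), here is a minimal, hypothetical sketch. The `FunctionRegistrationSketch` class and `FunctionSignature` record are illustrative stand-ins and do not mirror the actual `FunctionSet` API:
```java
import java.util.ArrayList;
import java.util.List;

public class FunctionRegistrationSketch {

    // Minimal stand-in for a builtin signature: name, argument types, return type.
    record FunctionSignature(String name, List<String> argTypes, String returnType) {}

    private static final List<FunctionSignature> REGISTRY = new ArrayList<>();

    // Step 1 equivalent: register the signature so SQL analysis can resolve calls to it.
    static void registerBuiltins() {
        REGISTRY.add(new FunctionSignature(
                "my_new_function", List.of("BIGINT", "BIGINT"), "BIGINT"));
    }

    public static void main(String[] args) {
        registerBuiltins();
        REGISTRY.forEach(sig ->
                System.out.println(sig.name() + sig.argTypes() + " -> " + sig.returnType()));
    }
}
```
The BE-side evaluation logic (step 2) then implements the same signature in `be/src/exprs/`, and the SQL tests in step 3 exercise both ends.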
### Adding New Data Source Connectors
1. Implement connector interface in `java-extensions/`
2. Add metadata reader and schema handling
3. Integrate with query planner in `fe/fe-core/src/main/java/com/starrocks/connector/`
4. Add integration tests
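As a sketch of steps 1 and 2, the snippet below outlines what a metadata reader for a new source might look like. The `MetadataReader` interface and `InMemoryMetadataReader` class are hypothetical; the real contracts live in `java-extensions/` and `fe/fe-core/src/main/java/com/starrocks/connector/`:
```java
import java.util.List;
import java.util.Map;

public class ConnectorSketch {

    // Step 2 equivalent: expose table names and schemas from the external system.
    interface MetadataReader {
        List<String> listTables(String database);
        Map<String, String> getTableSchema(String database, String table); // column -> type
    }

    // A trivial in-memory implementation standing in for a real catalog client.
    static class InMemoryMetadataReader implements MetadataReader {
        private final Map<String, Map<String, String>> tables;

        InMemoryMetadataReader(Map<String, Map<String, String>> tables) {
            this.tables = tables;
        }

        @Override
        public List<String> listTables(String database) {
            return List.copyOf(tables.keySet());
        }

        @Override
        public Map<String, String> getTableSchema(String database, String table) {
            return tables.getOrDefault(table, Map.of());
        }
    }

    public static void main(String[] args) {
        MetadataReader reader = new InMemoryMetadataReader(
                Map.of("orders", Map.of("id", "BIGINT", "amount", "DECIMAL(10,2)")));
        System.out.println(reader.getTableSchema("demo", "orders"));
    }
}
```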
### Query Optimization Improvements
1. Analyze optimizer rules in `fe/fe-core/src/main/java/com/starrocks/sql/optimizer/rule/`
2. Update cost model if needed in `fe/fe-core/src/main/java/com/starrocks/sql/optimizer/cost/`
3. Add test cases in `test/sql/test_optimizer/`
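To illustrate the shape of a rewrite rule (step 1), here is a self-contained, hypothetical sketch that prunes a `LIMIT 0` subtree. The `PlanNode` record and `Rule` interface are toy stand-ins, not the framework under `sql/optimizer/rule/`:
```java
import java.util.List;

public class OptimizerRuleSketch {

    // Minimal logical plan node standing in for an optimizer expression.
    record PlanNode(String op, long limit, List<PlanNode> children) {}

    interface Rule {
        boolean matches(PlanNode node);
        PlanNode apply(PlanNode node);
    }

    // Rewrite LIMIT 0 into an empty relation so downstream operators never run.
    static class PruneLimitZeroRule implements Rule {
        @Override
        public boolean matches(PlanNode node) {
            return node.op().equals("LIMIT") && node.limit() == 0;
        }

        @Override
        public PlanNode apply(PlanNode node) {
            return new PlanNode("EMPTY", 0, List.of());
        }
    }

    public static void main(String[] args) {
        PlanNode scan = new PlanNode("SCAN", -1, List.of());
        PlanNode plan = new PlanNode("LIMIT", 0, List.of(scan));
        Rule rule = new PruneLimitZeroRule();
        System.out.println(rule.matches(plan) ? rule.apply(plan) : plan);
    }
}
```
Real rules additionally feed the cost model (step 2) so that rewritten alternatives are compared on estimated cost.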
## Documentation References
- **Contributing Guide**: [`CONTRIBUTING.md`](../CONTRIBUTING.md)
- **Development Setup**: [StarRocks Documentation](https://docs.starrocks.io/docs/developers/)
- **Architecture Overview**: [README.md](../README.md#architecture-overview)
- **PR Template**: [`.github/PULL_REQUEST_TEMPLATE.md`](./PULL_REQUEST_TEMPLATE.md)
## Quick Reference
**Key File Extensions:**
- `.cpp`, `.h`, `.cc` - C++ backend code
- `.java` - Java frontend/extensions code
- `.sql` - SQL test cases
- `.py` - Python test scripts
- `.proto` - Protocol buffer definitions
- `.thrift` - Thrift interface definitions
**Important Configuration:**
- `conf/` - Runtime configuration templates
- `gensrc/` - Auto-generated code from IDL definitions
- `thirdparty/` - External dependencies
**Testing Structure:**
- `test/sql/` - SQL correctness tests organized by functionality
- `be/test/` - C++ unit tests
- `fe/fe-core/src/test/` - Java unit tests
**Build System Files to Avoid:**
- `build.sh` - Main build script (very resource intensive)
- `build-in-docker.sh` - Docker-based build
- `run-*-ut.sh` - Unit test runners
- `Makefile*` - Make build files
- `pom.xml` - Maven build files (for Java components)
This project prioritizes **performance**, **correctness**, and **scalability**. When contributing, consider the impact on query performance and ensure changes maintain SQL standard compliance.