[Tool] Add comprehensive GitHub Copilot instructions for StarRocks (#62136)

Signed-off-by: Seaven <seaven_7@qq.com>
Copilot 2025-08-20 16:43:07 +08:00 committed by GitHub
parent 2f101c3d88
commit df147a4ee7
1 changed file with 258 additions and 0 deletions

.github/copilot-instructions.md (new file)

@@ -0,0 +1,258 @@
# GitHub Copilot Instructions for StarRocks
## Project Overview
StarRocks is a high-performance, cloud-native analytical database system designed for real-time analytics and ad-hoc queries. It features a streamlined architecture with both shared-nothing and shared-data deployment modes, supporting sub-second query performance for complex analytical workloads.
**Key Technologies:**
- **Backend (BE)**: C++ - Core analytical engine, storage layer, and query execution
- **Frontend (FE)**: Java - SQL parsing, query planning, metadata management, and coordination
- **Java Extensions**: Java - External connectors and UDF framework
- **Testing**: Python - Integration tests and SQL test framework
## Architecture Components
### Backend (be/) - C++
The core analytical engine responsible for data storage, processing, and query execution:
**Core Components:**
- `be/src/exec/` - Query execution operators (scan, join, aggregate, etc.)
- `be/src/storage/` - Storage engine (tablets, rowsets, segments, compaction)
- `be/src/exprs/` - Expression evaluation and vectorized computation
- `be/src/formats/` - Data format support (Parquet, ORC, CSV, JSON)
- `be/src/runtime/` - Runtime services (memory management, load balancing, stream processing)
- `be/src/connector/` - External data source connectors (Hive, Iceberg, Delta Lake)
- `be/src/service/` - RPC services and BE coordination
- `be/src/common/` - Shared utilities and common data structures
**Performance Focus:**
- Vectorized query execution
- Columnar storage format
- Memory-efficient algorithms
- SIMD optimizations where applicable
📋 **Note:** See `be/.cursorrules` for detailed backend component breakdown
### Frontend (fe/) - Java
SQL interface and query coordination layer:
**Core Components:**
- `fe/fe-core/src/main/java/com/starrocks/`
  - `sql/` - SQL parser, analyzer, and AST
  - `planner/` - Query planning and optimization (CBO)
  - `catalog/` - Metadata management (tables, partitions, statistics)
  - `scheduler/` - Query scheduling and execution coordination
  - `load/` - Data loading coordination (Broker Load, Stream Load, etc.)
  - `backup/` - Backup and restore functionality
  - `privilege/` - Authentication and authorization
  - `qe/` - Query execution coordination and session management
- `fe/fe-common/` - Common frontend utilities
- `fe/plugin-common/` - Plugin framework common components
- `fe/spark-dpp/` - Spark data preprocessing integration
- `fe/hive-udf/` - Hive UDF compatibility layer
**Key Responsibilities:**
- Parse and validate SQL statements
- Generate optimized query plans using Cost-Based Optimizer (CBO)
- Manage cluster metadata and coordination
- Handle user sessions and security
📋 **Note:** See `fe/.cursorrules` for detailed frontend component breakdown
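To make the flow above concrete, here is a minimal, hypothetical sketch of the FE query lifecycle (parse, analyze, optimize, coordinate). The `QueryLifecycleSketch` class, stage names, and record types are illustrative stand-ins only, not the actual classes under `fe/fe-core`:
```java
import java.util.function.Function;

public class QueryLifecycleSketch {

    // Toy stand-ins for the FE's internal representations.
    record ParsedStatement(String sql) {}
    record LogicalPlan(String description) {}
    record PhysicalPlan(String description, double estimatedCost) {}

    static ParsedStatement parse(String sql) {
        // The real FE builds an AST in sql/; here we only wrap the text.
        return new ParsedStatement(sql.trim());
    }

    static LogicalPlan analyzeAndRewrite(ParsedStatement stmt) {
        // Name resolution and semantic checks would happen at this stage.
        return new LogicalPlan("logical plan for: " + stmt.sql());
    }

    static PhysicalPlan optimize(LogicalPlan plan) {
        // The CBO enumerates alternatives and keeps the cheapest one.
        return new PhysicalPlan(plan.description() + " (physical)", 42.0);
    }

    public static void main(String[] args) {
        Function<String, PhysicalPlan> pipeline =
                sql -> optimize(analyzeAndRewrite(parse(sql)));
        System.out.println(pipeline.apply("SELECT 1"));
    }
}
```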
### Java Extensions (java-extensions/) - Java
External connectivity and extensibility:
**Data Source Connectors:**
- `hive-reader/` - Apache Hive integration
- `iceberg-metadata-reader/` - Apache Iceberg support
- `hudi-reader/` - Apache Hudi integration
- `paimon-reader/` - Apache Paimon support
- `jdbc-bridge/` - JDBC connectivity for external databases
- `odps-reader/` - Alibaba ODPS integration
**Extension Framework:**
- `udf-extensions/` - User-Defined Function framework
- `common-runtime/` - Shared runtime for extensions
- `hadoop-ext/` - Hadoop ecosystem integration
📋 **Note:** See `java-extensions/.cursorrules` for detailed extensions breakdown
### Additional Important Directories
**Generated Sources (gensrc/):**
- `gensrc/proto/` - Protocol buffer definitions
- `gensrc/thrift/` - Thrift interface definitions
- `gensrc/script/` - Code generation scripts
**Testing Framework (test/):**
- `test/sql/` - SQL test cases organized by functionality
- `test/common/` - Common test utilities
- `test/lib/` - Test libraries and helpers
**Tools and Utilities:**
- `tools/` - Diagnostic tools, benchmarks, and utilities
- `bin/` - Binary executables and scripts
- `conf/` - Configuration files and templates
- `build-support/` - Build system support files
- `docker/` - Docker build configurations
**Other Key Directories:**
- `thirdparty/` - External dependencies and patches
- `fs_brokers/` - File system broker implementations
- `webroot/` - Web UI static files
- `format-sdk/` - Format SDK for data interchange
## Coding Guidelines
### C++ (Backend)
```cpp
// Use modern C++ features (C++17/C++20)
// Follow Google C++ Style Guide conventions
// Use RAII for resource management
// Prefer smart pointers over raw pointers
// Use const-correctness
// Example: chunk-at-a-time (vectorized) processing pattern
Status ColumnProcessor::process_batch(const ChunkPtr& chunk) {
    const auto& column = chunk->get_column_by_name("column_name");
    auto result_column = column->clone_empty();
    // Walk the column values; production code prefers the vectorized
    // Column APIs over element-wise loops wherever possible.
    for (size_t i = 0; i < chunk->num_rows(); ++i) {
        // Compute the result for element i and append it to result_column.
    }
    return Status::OK();
}
```
### Java (Frontend)
```java
// Follow Java coding conventions
// Use dependency injection where appropriate
// Implement proper exception handling
// Use builder patterns for complex objects
// Follow existing naming conventions
// Example: query planning pattern (illustrative visitor method)
public class ScanNodePlanner {
    public PlanFragment visitLogicalScanOperator(
            OptExpression optExpression, ExecPlan context) {
        LogicalScanOperator scanOperator =
                (LogicalScanOperator) optExpression.getOp();
        // Translate the logical scan operator into a physical scan node.
        ScanNode scanNode = createScanNode(scanOperator);
        return new PlanFragment(scanNode);
    }
}
```
## ⚠️ CRITICAL BUILD SYSTEM WARNING
**DO NOT attempt to build or run unit tests (UT) for this project unless explicitly requested by the user.**
The build system is extremely resource-intensive and time-consuming. Building the full project can take hours and requires significant system resources.
**Specific commands and files to AVOID:**
- `build.sh` - Main build script (extremely resource intensive)
- `build-in-docker.sh` - Docker-based build
- `run-be-ut.sh` / `run-fe-ut.sh` / `run-java-exts-ut.sh` - Unit test runners
- `docker-compose` commands - Heavy resource usage
- `Makefile*` - Make build files
- `pom.xml` - Maven build files (for Java components)
**Focus on code analysis and targeted changes instead of full builds.**
## Important Guidelines
### Pull Request Requirements
**PR Title Format:**
Must include category prefix:
- `[BugFix]` - Bug fixes and error corrections
- `[Feature]` - New features and capabilities
- `[Enhancement]` - Improvements to existing functionality
- `[Refactor]` - Code refactoring without functional changes
- `[Test]` - Test-related changes
- `[Doc]` - Documentation updates
- `[Build]` - Build system and CI/CD changes
- `[Performance]` - Performance optimizations
**Example:** `[Feature] Add Apache Paimon table format support`
### Code Review Focus Areas
**Performance Considerations:**
- Query execution efficiency
- Memory usage patterns
- Lock contention in concurrent scenarios
- Network I/O optimization
**Correctness Priorities:**
- SQL standard compliance
- Data type handling accuracy
- Transaction consistency
- Error handling completeness
**Security Considerations:**
- Input validation and sanitization
- Authentication and authorization
- Resource usage limits
- Information leak prevention
## Common Development Patterns
### Adding New SQL Functions
1. Define function signature in `fe/fe-core/src/main/java/com/starrocks/catalog/FunctionSet.java`
2. Implement evaluation logic in `be/src/exprs/`
3. Add comprehensive tests in `test/sql/test_functions/`
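As a rough illustration of step 1 (declaring the new function so the analyzer can resolve it), here is a minimal, hypothetical sketch. The `FunctionRegistrationSketch` class and `FunctionSignature` record are illustrative stand-ins and do not mirror the actual `FunctionSet` API:
```java
import java.util.ArrayList;
import java.util.List;

public class FunctionRegistrationSketch {

    // Minimal stand-in for a builtin signature: name, argument types, return type.
    record FunctionSignature(String name, List<String> argTypes, String returnType) {}

    private static final List<FunctionSignature> REGISTRY = new ArrayList<>();

    // Step 1 equivalent: register the signature so SQL analysis can resolve calls to it.
    static void registerBuiltins() {
        REGISTRY.add(new FunctionSignature(
                "my_new_function", List.of("BIGINT", "BIGINT"), "BIGINT"));
    }

    public static void main(String[] args) {
        registerBuiltins();
        REGISTRY.forEach(sig ->
                System.out.println(sig.name() + sig.argTypes() + " -> " + sig.returnType()));
    }
}
```
The BE-side evaluation logic (step 2) then implements the same signature in `be/src/exprs/`, and the SQL tests in step 3 exercise both ends.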
### Adding New Data Source Connectors
1. Implement connector interface in `java-extensions/`
2. Add metadata reader and schema handling
3. Integrate with query planner in `fe/fe-core/src/main/java/com/starrocks/connector/`
4. Add integration tests
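As a sketch of steps 1 and 2, the snippet below outlines what a metadata reader for a new source might look like. The `MetadataReader` interface and `InMemoryMetadataReader` class are hypothetical; the real contracts live in `java-extensions/` and `fe/fe-core/src/main/java/com/starrocks/connector/`:
```java
import java.util.List;
import java.util.Map;

public class ConnectorSketch {

    // Step 2 equivalent: expose table names and schemas from the external system.
    interface MetadataReader {
        List<String> listTables(String database);
        Map<String, String> getTableSchema(String database, String table); // column -> type
    }

    // A trivial in-memory implementation standing in for a real catalog client.
    static class InMemoryMetadataReader implements MetadataReader {
        private final Map<String, Map<String, String>> tables;

        InMemoryMetadataReader(Map<String, Map<String, String>> tables) {
            this.tables = tables;
        }

        @Override
        public List<String> listTables(String database) {
            return List.copyOf(tables.keySet());
        }

        @Override
        public Map<String, String> getTableSchema(String database, String table) {
            return tables.getOrDefault(table, Map.of());
        }
    }

    public static void main(String[] args) {
        MetadataReader reader = new InMemoryMetadataReader(
                Map.of("orders", Map.of("id", "BIGINT", "amount", "DECIMAL(10,2)")));
        System.out.println(reader.getTableSchema("demo", "orders"));
    }
}
```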
### Query Optimization Improvements
1. Analyze optimizer rules in `fe/fe-core/src/main/java/com/starrocks/sql/optimizer/rule/`
2. Update cost model if needed in `fe/fe-core/src/main/java/com/starrocks/sql/optimizer/cost/`
3. Add test cases in `test/sql/test_optimizer/`
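To illustrate the shape of a rewrite rule (step 1), here is a self-contained, hypothetical sketch that prunes a `LIMIT 0` subtree. The `PlanNode` record and `Rule` interface are toy stand-ins, not the framework under `sql/optimizer/rule/`:
```java
import java.util.List;

public class OptimizerRuleSketch {

    // Minimal logical plan node standing in for an optimizer expression.
    record PlanNode(String op, long limit, List<PlanNode> children) {}

    interface Rule {
        boolean matches(PlanNode node);
        PlanNode apply(PlanNode node);
    }

    // Rewrite LIMIT 0 into an empty relation so downstream operators never run.
    static class PruneLimitZeroRule implements Rule {
        @Override
        public boolean matches(PlanNode node) {
            return node.op().equals("LIMIT") && node.limit() == 0;
        }

        @Override
        public PlanNode apply(PlanNode node) {
            return new PlanNode("EMPTY", 0, List.of());
        }
    }

    public static void main(String[] args) {
        PlanNode scan = new PlanNode("SCAN", -1, List.of());
        PlanNode plan = new PlanNode("LIMIT", 0, List.of(scan));
        Rule rule = new PruneLimitZeroRule();
        System.out.println(rule.matches(plan) ? rule.apply(plan) : plan);
    }
}
```
Real rules additionally feed the cost model (step 2) so that rewritten alternatives are compared on estimated cost.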
## Documentation References
- **Contributing Guide**: [`CONTRIBUTING.md`](../CONTRIBUTING.md)
- **Development Setup**: [StarRocks Documentation](https://docs.starrocks.io/docs/developers/)
- **Architecture Overview**: [README.md](../README.md#architecture-overview)
- **PR Template**: [`.github/PULL_REQUEST_TEMPLATE.md`](./PULL_REQUEST_TEMPLATE.md)
## Quick Reference
**Key File Extensions:**
- `.cpp`, `.h`, `.cc` - C++ backend code
- `.java` - Java frontend/extensions code
- `.sql` - SQL test cases
- `.py` - Python test scripts
- `.proto` - Protocol buffer definitions
- `.thrift` - Thrift interface definitions
**Important Configuration:**
- `conf/` - Runtime configuration templates
- `gensrc/` - Auto-generated code from IDL definitions
- `thirdparty/` - External dependencies
**Testing Structure:**
- `test/sql/` - SQL correctness tests organized by functionality
- `be/test/` - C++ unit tests
- `fe/fe-core/src/test/` - Java unit tests
**Build System Files to Avoid:**
- `build.sh` - Main build script (very resource intensive)
- `build-in-docker.sh` - Docker-based build
- `run-*-ut.sh` - Unit test runners
- `Makefile*` - Make build files
- `pom.xml` - Maven build files (for Java components)
This project prioritizes **performance**, **correctness**, and **scalability**. When contributing, consider the impact on query performance and ensure changes maintain SQL standard compliance.