[Tool] Add comprehensive GitHub Copilot instructions for StarRocks (#62136)
Signed-off-by: Seaven <seaven_7@qq.com>
parent 2f101c3d88
commit df147a4ee7

@@ -0,0 +1,258 @@
# GitHub Copilot Instructions for StarRocks

## Project Overview

StarRocks is a high-performance, cloud-native analytical database system designed for real-time analytics and ad-hoc queries. It features a streamlined architecture with both shared-nothing and shared-data deployment modes, supporting sub-second query performance for complex analytical workloads.

**Key Technologies:**
- **Backend (BE)**: C++ - Core analytical engine, storage layer, and query execution
- **Frontend (FE)**: Java - SQL parsing, query planning, metadata management, and coordination
- **Java Extensions**: Java - External connectors and UDF framework
- **Testing**: Python - Integration tests and SQL test framework

## Architecture Components

### Backend (be/) - C++
The core analytical engine responsible for data storage, processing, and query execution:

**Core Components:**
- `be/src/exec/` - Query execution operators (scan, join, aggregate, etc.)
- `be/src/storage/` - Storage engine (tablets, rowsets, segments, compaction)
- `be/src/exprs/` - Expression evaluation and vectorized computation
- `be/src/formats/` - Data format support (Parquet, ORC, CSV, JSON)
- `be/src/runtime/` - Runtime services (memory management, load balancing, stream processing)
- `be/src/connector/` - External data source connectors (Hive, Iceberg, Delta Lake)
- `be/src/service/` - RPC services and BE coordination
- `be/src/common/` - Shared utilities and common data structures

**Performance Focus:**
- Vectorized query execution
- Columnar storage format
- Memory-efficient algorithms
- SIMD optimizations where applicable

📋 **Note:** See `be/.cursorrules` for detailed backend component breakdown

### Frontend (fe/) - Java
SQL interface and query coordination layer:

**Core Components:**
- `fe/fe-core/src/main/java/com/starrocks/`
  - `sql/` - SQL parser, analyzer, and AST
  - `planner/` - Query planning and optimization (CBO)
  - `catalog/` - Metadata management (tables, partitions, statistics)
  - `scheduler/` - Query scheduling and execution coordination
  - `load/` - Data loading coordination (Broker Load, Stream Load, etc.)
  - `backup/` - Backup and restore functionality
  - `privilege/` - Authentication and authorization
  - `qe/` - Query execution coordination and session management
- `fe/fe-common/` - Common frontend utilities
- `fe/plugin-common/` - Plugin framework common components
- `fe/spark-dpp/` - Spark data preprocessing integration
- `fe/hive-udf/` - Hive UDF compatibility layer

**Key Responsibilities:**
- Parse and validate SQL statements
- Generate optimized query plans using Cost-Based Optimizer (CBO)
- Manage cluster metadata and coordination
- Handle user sessions and security

📋 **Note:** See `fe/.cursorrules` for detailed frontend component breakdown

### Java Extensions (java-extensions/) - Java
External connectivity and extensibility:

**Data Source Connectors:**
- `hive-reader/` - Apache Hive integration
- `iceberg-metadata-reader/` - Apache Iceberg support
- `hudi-reader/` - Apache Hudi integration
- `paimon-reader/` - Apache Paimon support
- `jdbc-bridge/` - JDBC connectivity for external databases
- `odps-reader/` - Alibaba ODPS integration

**Extension Framework:**
- `udf-extensions/` - User-Defined Function framework
- `common-runtime/` - Shared runtime for extensions
- `hadoop-ext/` - Hadoop ecosystem integration

📋 **Note:** See `java-extensions/.cursorrules` for detailed extensions breakdown

### Additional Important Directories

**Generated Sources (gensrc/):**
- `gensrc/proto/` - Protocol buffer definitions
- `gensrc/thrift/` - Thrift interface definitions
- `gensrc/script/` - Code generation scripts

**Testing Framework (test/):**
- `test/sql/` - SQL test cases organized by functionality
- `test/common/` - Common test utilities
- `test/lib/` - Test libraries and helpers

**Tools and Utilities:**
- `tools/` - Diagnostic tools, benchmarks, and utilities
- `bin/` - Binary executables and scripts
- `conf/` - Configuration files and templates
- `build-support/` - Build system support files
- `docker/` - Docker build configurations

**Other Key Directories:**
- `thirdparty/` - External dependencies and patches
- `fs_brokers/` - File system broker implementations
- `webroot/` - Web UI static files
- `format-sdk/` - Format SDK for data interchange

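As a rough illustration of the layout described above, the following Python sketch maps a repository-relative path to the component that owns it. The function name `component_for_path` and the component labels are hypothetical and not part of the StarRocks repository; only the top-level directory names come from the lists above.

```python
from pathlib import PurePosixPath

# Top-level directory -> component label. The directory names come from the
# lists above; the labels themselves are illustrative assumptions.
TOP_LEVEL_COMPONENTS = {
    "be": "backend (C++)",
    "fe": "frontend (Java)",
    "java-extensions": "java extensions",
    "gensrc": "generated sources",
    "test": "sql/integration tests",
    "thirdparty": "third-party dependencies",
}


def component_for_path(path: str) -> str:
    """Return a component label for a repository-relative path."""
    parts = PurePosixPath(path).parts
    if not parts:
        return "unknown"
    # Only the first path segment decides the component.
    return TOP_LEVEL_COMPONENTS.get(parts[0], "other")
```

A helper like this could be used to decide which component-specific rules apply to a changed file before suggesting edits.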
## Coding Guidelines

### C++ (Backend)
```cpp
// Use modern C++ features (C++17/C++20)
// Follow Google C++ Style Guide conventions
// Use RAII for resource management
// Prefer smart pointers over raw pointers
// Use const-correctness

// Example: Vectorized processing pattern
Status ColumnProcessor::process_batch(const ChunkPtr& chunk) {
    const auto& column = chunk->get_column_by_name("column_name");
    auto result_column = std::make_shared<Column>();

    // Vectorized operation on entire column
    for (size_t i = 0; i < chunk->num_rows(); ++i) {
        // Process element
    }

    return Status::OK();
}
```

### Java (Frontend)
```java
// Follow Java coding conventions
// Use dependency injection where appropriate
// Implement proper exception handling
// Use builder patterns for complex objects
// Follow existing naming conventions

// Example: Query planning pattern
public class ScanNodePlanner extends PlanFragment {
    @Override
    public PlanFragment visitLogicalScanOperator(
            OptExpression optExpression, ExecPlan context) {
        LogicalScanOperator scanOperator =
                (LogicalScanOperator) optExpression.getOp();

        // Create physical scan node
        ScanNode scanNode = createScanNode(scanOperator);
        return new PlanFragment(scanNode);
    }
}
```

## ⚠️ CRITICAL BUILD SYSTEM WARNING
**DO NOT attempt to build or run unit tests (UT) for this project unless explicitly requested by the user.**

The build system is extremely resource-intensive and time-consuming. Building the full project can take hours and requires significant system resources.

**Specific commands and files to AVOID:**
- `build.sh` - Main build script (extremely resource intensive)
- `build-in-docker.sh` - Docker-based build
- `run-be-ut.sh` / `run-fe-ut.sh` / `run-java-exts-ut.sh` - Unit test runners
- `docker-compose` commands - Heavy resource usage
- `Makefile*` - Make build files
- `pom.xml` - Maven build files (for Java components)

**Focus on code analysis and targeted changes instead of full builds.**

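One way to enforce the warning above in tooling is to check a shell command against the listed entry points before running it. The Python sketch below is a minimal illustration under stated assumptions: the function name `is_heavy_command` is hypothetical, and the check only covers the scripts and the `docker-compose` command named above. `Makefile*` and `pom.xml` are left out because they are files rather than commands, and matching them by name would also flag harmless reads.

```python
import shlex
from pathlib import PurePosixPath

# Entry points copied from the "AVOID" list above.
HEAVY_ENTRY_POINTS = {
    "build.sh",
    "build-in-docker.sh",
    "run-be-ut.sh",
    "run-fe-ut.sh",
    "run-java-exts-ut.sh",
    "docker-compose",
}


def is_heavy_command(command: str) -> bool:
    """Return True if any token of the command names a listed build or UT entry point."""
    for token in shlex.split(command):
        # Compare only the final path segment so "./build.sh" also matches.
        if PurePosixPath(token).name in HEAVY_ENTRY_POINTS:
            return True
    return False
```

For example, `is_heavy_command("./build.sh --be")` reports a heavy command, while `is_heavy_command("git diff be/src/exec")` does not.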
## Important Guidelines

### Pull Request Requirements

**PR Title Format:**
The title must start with one of the following category prefixes:
- `[BugFix]` - Bug fixes and error corrections
- `[Feature]` - New features and capabilities
- `[Enhancement]` - Improvements to existing functionality
- `[Refactor]` - Code refactoring without functional changes
- `[Test]` - Test-related changes
- `[Doc]` - Documentation updates
- `[Build]` - Build system and CI/CD changes
- `[Performance]` - Performance optimizations

**Example:** `[Feature] Add Apache Paimon table format support`

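The title rule above can be checked mechanically. The following Python sketch is one possible check, not the project's actual CI logic: the name `is_valid_pr_title` is hypothetical, and the exact matching rule (case-sensitive prefix, a single space, then a non-empty description) is an assumption.

```python
import re

# Category prefixes copied from the list above.
PR_CATEGORIES = (
    "BugFix", "Feature", "Enhancement", "Refactor",
    "Test", "Doc", "Build", "Performance",
)

# Assumed shape: "[Category] " followed by a non-empty description.
_TITLE_RE = re.compile(r"^\[(%s)\] \S.*$" % "|".join(PR_CATEGORIES))


def is_valid_pr_title(title: str) -> bool:
    """Return True if the title starts with a known category prefix."""
    return _TITLE_RE.match(title) is not None
```

With this rule, the example title `[Feature] Add Apache Paimon table format support` passes, while a title with no prefix or with a prefix outside the list is rejected.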
### Code Review Focus Areas

**Performance Considerations:**
- Query execution efficiency
- Memory usage patterns
- Lock contention in concurrent scenarios
- Network I/O optimization

**Correctness Priorities:**
- SQL standard compliance
- Data type handling accuracy
- Transaction consistency
- Error handling completeness

**Security Considerations:**
- Input validation and sanitization
- Authentication and authorization
- Resource usage limits
- Information leak prevention

## Common Development Patterns

### Adding New SQL Functions
1. Define function signature in `fe/fe-core/src/main/java/com/starrocks/catalog/FunctionSet.java`
2. Implement evaluation logic in `be/src/exprs/`
3. Add comprehensive tests in `test/sql/test_functions/`

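The three steps above can be turned into a simple completeness check over the files a change touches. The Python sketch below is only an illustration: the function name `missing_function_steps` and the step labels are hypothetical, and the check assumes that a change covers a step whenever it touches any file under that step's path.

```python
# Path prefixes for the three steps listed above; the step labels are illustrative.
SQL_FUNCTION_STEPS = {
    "signature": "fe/fe-core/src/main/java/com/starrocks/catalog/FunctionSet.java",
    "implementation": "be/src/exprs/",
    "tests": "test/sql/test_functions/",
}


def missing_function_steps(changed_paths):
    """Return the step labels that no changed path covers, in the order listed above."""
    return [
        step
        for step, prefix in SQL_FUNCTION_STEPS.items()
        if not any(path.startswith(prefix) for path in changed_paths)
    ]
```

An empty result means every step has at least one matching file; a non-empty result names the steps that still need attention.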
### Adding New Data Source Connectors
1. Implement connector interface in `java-extensions/`
2. Add metadata reader and schema handling
3. Integrate with query planner in `fe/fe-core/src/main/java/com/starrocks/connector/`
4. Add integration tests

### Query Optimization Improvements
1. Analyze optimizer rules in `fe/fe-core/src/main/java/com/starrocks/sql/optimizer/rule/`
2. Update cost model if needed in `fe/fe-core/src/main/java/com/starrocks/sql/optimizer/cost/`
3. Add test cases in `test/sql/test_optimizer/`

## Documentation References

- **Contributing Guide**: [`CONTRIBUTING.md`](../CONTRIBUTING.md)
- **Development Setup**: [StarRocks Documentation](https://docs.starrocks.io/docs/developers/)
- **Architecture Overview**: [README.md](../README.md#architecture-overview)
- **PR Template**: [`.github/PULL_REQUEST_TEMPLATE.md`](../.github/PULL_REQUEST_TEMPLATE.md)

## Quick Reference

**Key File Extensions:**
- `.cpp`, `.h`, `.cc` - C++ backend code
- `.java` - Java frontend/extensions code
- `.sql` - SQL test cases
- `.py` - Python test scripts
- `.proto` - Protocol buffer definitions
- `.thrift` - Thrift interface definitions

**Important Configuration:**
- `conf/` - Runtime configuration templates
- `gensrc/` - Auto-generated code from IDL definitions
- `thirdparty/` - External dependencies

**Testing Structure:**
- `test/sql/` - SQL correctness tests organized by functionality
- `be/test/` - C++ unit tests
- `fe/fe-core/src/test/` - Java unit tests

**Build System Files to Avoid:**
- `build.sh` - Main build script (very resource intensive)
- `build-in-docker.sh` - Docker-based build
- `run-*-ut.sh` - Unit test runners
- `Makefile*` - Make build files
- `pom.xml` - Maven build files (for Java components)

This project prioritizes **performance**, **correctness**, and **scalability**. When contributing, consider the impact on query performance and ensure changes maintain SQL standard compliance.