[Tool] Add cursor rules and code organization instructions (#61224)
Co-authored-by: Cursor Agent <cursoragent@cursor.com>

Commit 7e506ec4e1 (parent 4f7485c9b4)

# StarRocks Project Cursor Rules

## Project Overview

StarRocks is an open-source, high-performance analytical database system designed for real-time analytics. This is a large-scale C++/Java project with a complex build system.

## ⚠️ IMPORTANT BUILD SYSTEM WARNING

**DO NOT attempt to build or run unit tests (UT) for this project unless explicitly requested by the user.**

The build system is extremely resource-intensive and time-consuming: building the full project can take hours and requires significant system resources.

## Code Organization

### Backend (be/)

**Language**: C++

**Purpose**: Core analytical engine and storage layer

- `be/src/exec/` - Query execution engine components
- `be/src/storage/` - Storage engine and data persistence
- `be/src/exprs/` - Expression evaluation and JIT compilation
- `be/src/formats/` - Data format parsers and serializers
- `be/src/runtime/` - Runtime components (batch write, stream load, memory management, etc.)
- `be/src/connector/` - External data source connectors
- `be/src/service/` - Core backend services
- `be/src/common/` - Common utilities and shared code

**📋 See `be/.cursorrules` for detailed backend component breakdown**

### Frontend (fe/)

**Language**: Java

**Purpose**: SQL parsing, query planning, and metadata management

- `fe/fe-core/` - Core frontend services (SQL parser, planner, catalog)
- `fe/fe-common/` - Common frontend utilities
- `fe/plugin-common/` - Plugin framework common components
- `fe/spark-dpp/` - Spark data preprocessing integration
- `fe/hive-udf/` - Hive UDF compatibility layer

**📋 See `fe/.cursorrules` for detailed frontend component breakdown**

### Java Extensions (java-extensions/)

**Language**: Java

**Purpose**: External connectors and extensions

- `java-extensions/hive-reader/` - Hive data reader
- `java-extensions/iceberg-metadata-reader/` - Apache Iceberg metadata reader
- `java-extensions/hudi-reader/` - Apache Hudi integration
- `java-extensions/paimon-reader/` - Apache Paimon reader
- `java-extensions/jdbc-bridge/` - JDBC connectivity bridge
- `java-extensions/hadoop-ext/` - Hadoop ecosystem integration
- `java-extensions/udf-extensions/` - UDF extension framework
- `java-extensions/common-runtime/` - Common runtime for Java extensions

**📋 See `java-extensions/.cursorrules` for detailed extensions breakdown**

### Generated Sources (gensrc/)

**Purpose**: IDL definitions and the scripts that generate code from them

- `gensrc/proto/` - Protocol buffer definitions
- `gensrc/thrift/` - Thrift interface definitions
- `gensrc/script/` - Code generation scripts

### Testing (test/)

**Language**: Python

**Purpose**: Integration and SQL testing framework

- `test/sql/` - SQL test cases organized by functionality
- `test/common/` - Common test utilities
- `test/lib/` - Test libraries and helpers

### Tools and Utilities

- `tools/` - Diagnostic tools, benchmarks, and utilities
- `bin/` - Binary executables and scripts
- `conf/` - Configuration files and templates
- `build-support/` - Build system support files
- `docker/` - Docker build configurations
- `docs/` - Project documentation

### Third-party Dependencies

- `thirdparty/` - External dependencies and patches
- `licenses/` - License files for dependencies

### Other Important Directories

- `fs_brokers/` - File system broker implementations
- `webroot/` - Web UI static files
- `format-sdk/` - Format SDK for data interchange

## Development Guidelines

1. **No Building**: Avoid running build commands (`build.sh`, `make`, etc.) unless specifically requested.
2. **No Unit Tests**: Do not execute unit test scripts (`run-be-ut.sh`, `run-fe-ut.sh`, etc.) unless specifically requested.
3. **Focus on Code Analysis**: Prioritize code reading, analysis, and small targeted changes.
4. **Language Awareness**:
   - Backend (be/) is C++: focus on performance and memory management.
   - Frontend (fe/) is Java: focus on SQL parsing and query planning.
   - Tests are Python: focus on SQL correctness and integration testing.

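The language split above can be made mechanical. This Java sketch maps a repository-relative path to its language and to the component rules file named in this document. `ComponentLookup` and its prefix table are hypothetical helpers, not part of the StarRocks tree. `test/` gets no rules file because this document does not name one for it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: maps a repository-relative path to its component's language and rules file. */
final class ComponentLookup {
    /** Language of a component and its detailed rules file (null when this document names none). */
    record Component(String language, String rulesFile) {}

    private static final Map<String, Component> BY_PREFIX = new LinkedHashMap<>();

    static {
        BY_PREFIX.put("be/", new Component("C++", "be/.cursorrules"));
        BY_PREFIX.put("fe/", new Component("Java", "fe/.cursorrules"));
        BY_PREFIX.put("java-extensions/", new Component("Java", "java-extensions/.cursorrules"));
        BY_PREFIX.put("test/", new Component("Python", null));
    }

    private ComponentLookup() {}

    /** Returns the owning component, or empty for paths outside these directories. */
    static Optional<Component> forPath(String path) {
        String normalized = path.replace('\\', '/'); // tolerate Windows separators
        return BY_PREFIX.entrySet().stream()
                .filter(e -> normalized.startsWith(e.getKey()))
                .map(Map.Entry::getValue)
                .findFirst();
    }
}
```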
## Pull Request Guidelines

### PR Title Format

PR titles must include a prefix to categorize the change:

- **[BugFix]** - Bug fixes and error corrections
- **[Enhancement]** - Improvements to existing functionality
- **[Feature]** - New features and capabilities
- **[Refactor]** - Code refactoring without functional changes
- **[Test]** - Test-related changes
- **[Doc]** - Documentation updates
- **[Build]** - Build system and CI/CD changes
- **[Performance]** - Performance optimizations

**Examples:**

- `[BugFix] Fix memory leak in column batch processing`
- `[Feature] Add support for Apache Paimon connector`
- `[Enhancement] Improve query optimizer for materialized views`

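The prefix rule is simple enough to check mechanically. This hypothetical Java sketch is not a StarRocks or GitHub tool. It accepts a title only if the title starts with one of the categories above, in exact case, followed by a space and a description.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical checker for the "[Category] description" PR title convention. */
final class PrTitleCheck {
    // Categories copied from the list above.
    static final List<String> CATEGORIES = List.of(
            "BugFix", "Enhancement", "Feature", "Refactor",
            "Test", "Doc", "Build", "Performance");

    // "[Word]", one or more spaces, then a description that starts with a non-space character.
    private static final Pattern TITLE = Pattern.compile("^\\[([A-Za-z]+)\\] +(\\S.*)$");

    private PrTitleCheck() {}

    /** Returns true if the title has a known category prefix and a non-empty description. */
    static boolean isValid(String title) {
        Matcher m = TITLE.matcher(title);
        return m.matches() && CATEGORIES.contains(m.group(1));
    }
}
```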
### Commit Message Template

Follow this structured format for all commit messages:

```
[Category] Brief description (50 chars or less)

Detailed explanation of what this commit does and why.
Wrap lines at 72 characters.

- Key change 1
- Key change 2
- Key change 3

Fixes: #issue_number (if applicable)
Closes: #issue_number (if applicable)
```

**Categories:** BugFix, Enhancement, Feature, Refactor, Test, Doc, Build, Performance

**Example:**

```
[Feature] Add Apache Iceberg table format support

Implement Iceberg connector to enable querying Iceberg tables
directly from StarRocks. This includes metadata reading,
partition pruning, and schema evolution support.

- Add IcebergConnector and IcebergMetadata classes
- Implement partition and file pruning optimizations
- Support Iceberg v1 and v2 table formats
- Add comprehensive unit tests

Closes: #12345
```

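The following is a hypothetical linter for the template. The template does not say whether the 50-character limit counts the `[Category]` prefix, so this sketch assumes it applies only to the description after the prefix. It checks only the subject, the blank second line, and the 72-character wrap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical linter for the commit message template. */
final class CommitMessageLint {
    private static final List<String> CATEGORIES = List.of(
            "BugFix", "Enhancement", "Feature", "Refactor",
            "Test", "Doc", "Build", "Performance");
    private static final Pattern SUBJECT = Pattern.compile("^\\[([A-Za-z]+)\\] (.+)$");
    private static final int MAX_DESCRIPTION = 50; // assumption: limit excludes the prefix
    private static final int MAX_BODY_LINE = 72;

    private CommitMessageLint() {}

    /** Returns human-readable problems; an empty list means the message passes. */
    static List<String> check(String message) {
        List<String> problems = new ArrayList<>();
        String[] lines = message.split("\n", -1);
        Matcher m = SUBJECT.matcher(lines[0]);
        if (!m.matches() || !CATEGORIES.contains(m.group(1))) {
            problems.add("subject must start with a known [Category] prefix");
        } else if (m.group(2).length() > MAX_DESCRIPTION) {
            problems.add("subject description exceeds " + MAX_DESCRIPTION + " characters");
        }
        if (lines.length > 1 && !lines[1].isEmpty()) {
            problems.add("second line must be blank");
        }
        for (int i = 2; i < lines.length; i++) {
            if (lines[i].length() > MAX_BODY_LINE) {
                problems.add("line " + (i + 1) + " exceeds " + MAX_BODY_LINE + " characters");
            }
        }
        return problems;
    }
}
```

Under this reading, the Java extensions example subject `[Feature] Add support for Kudu connector predicate pushdown` passes. Its description is 49 characters, even though the full line is longer.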
## Common File Extensions

- `.cpp`, `.h`, `.cc` - C++ source and headers (backend)
- `.java` - Java source files (frontend and extensions)
- `.proto` - Protocol buffer definitions
- `.thrift` - Thrift interface definitions
- `.sql` - SQL test cases and queries
- `.py` - Python test scripts

## Build System Files to Avoid

- `build.sh` - Main build script (very resource intensive)
- `build-in-docker.sh` - Docker-based build
- `run-*-ut.sh` - Unit test runners
- `Makefile*` - Make build files
- `pom.xml` - Maven build files (for Java components)

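An agent could check a command's target against this list before running it. This hypothetical sketch uses `java.nio.file.PathMatcher` globs that mirror the entries above. It matches on the file name only, so `fe/pom.xml` and `run-be-ut.sh` are both flagged.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;

/** Hypothetical guard that flags the build entry points listed above. */
final class BuildFileGuard {
    // Glob patterns taken from the "Build System Files to Avoid" list.
    private static final List<PathMatcher> MATCHERS = List.of(
                    "build.sh", "build-in-docker.sh", "run-*-ut.sh", "Makefile*", "pom.xml")
            .stream()
            .map(glob -> FileSystems.getDefault().getPathMatcher("glob:" + glob))
            .toList();

    private BuildFileGuard() {}

    /** Returns true if the file name (ignoring its directory) matches one of the patterns. */
    static boolean shouldAvoid(String path) {
        Path fileName = Path.of(path).getFileName();
        return fileName != null && MATCHERS.stream().anyMatch(m -> m.matches(fileName));
    }
}
```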
# StarRocks Frontend (fe/) Cursor Rules

## Overview

The frontend is the Java-based component responsible for SQL parsing, query planning, metadata management, and coordination. It serves as the brain of StarRocks, handling all SQL operations and coordinating distributed query execution.

## ⚠️ BUILD WARNING

**DO NOT attempt to build or run tests unless explicitly requested.** The build system is resource-intensive.

## Frontend Architecture

### Frontend Core (fe-core/)

The main frontend module containing all core database functionality:

#### Core SQL Processing

- `fe-core/src/main/java/com/starrocks/sql/` - SQL processing pipeline
  - `sql/parser/` - SQL parser (ANTLR-based)
  - `sql/analyzer/` - SQL semantic analysis and validation
  - `sql/ast/` - Abstract Syntax Tree definitions
  - `sql/optimizer/` - Cost-based query optimizer
  - `sql/plan/` - Physical query plan generation
  - `sql/spm/` - SQL Plan Management

#### Metadata Management

- `fe-core/src/main/java/com/starrocks/catalog/` - Metadata catalog system
  - `catalog/system/` - System tables and metadata
  - `catalog/mv/` - Materialized view metadata
  - `catalog/constraint/` - Table constraints management
  - `catalog/combinator/` - Catalog combinators

#### Query Execution

- `fe-core/src/main/java/com/starrocks/qe/` - Query execution engine
  - Core classes: `ConnectContext`, `StmtExecutor`, `DefaultCoordinator`
  - Session management: `SessionVariable`, `ConnectProcessor`
  - Query scheduling: `SimpleScheduler`, backend selectors
  - Result processing: `ShowExecutor`, `ResultReceiver`

#### Query Planning

- `fe-core/src/main/java/com/starrocks/planner/` - Physical query planning
  - `planner/stream/` - Stream processing plans

#### External Connectors

- `fe-core/src/main/java/com/starrocks/connector/` - External data source connectors
  - `connector/hive/` - Apache Hive integration
  - `connector/iceberg/` - Apache Iceberg support
  - `connector/hudi/` - Apache Hudi integration
  - `connector/jdbc/` - JDBC connectivity
  - `connector/elasticsearch/` - Elasticsearch connector
  - `connector/delta/` - Delta Lake support
  - `connector/kudu/` - Apache Kudu connector
  - `connector/odps/` - ODPS (MaxCompute) connector
  - `connector/paimon/` - Apache Paimon connector

#### Data Loading

- `fe-core/src/main/java/com/starrocks/load/` - Data ingestion framework
  - `load/loadv2/` - Load v2 implementation
  - `load/routineload/` - Routine/streaming load
  - `load/streamload/` - Stream loading
  - `load/batchwrite/` - Batch write operations
  - `load/pipe/` - Data pipeline management

#### Storage & Persistence

- `fe-core/src/main/java/com/starrocks/persist/` - Metadata persistence
- `fe-core/src/main/java/com/starrocks/journal/` - Write-ahead logging
- `fe-core/src/main/java/com/starrocks/meta/` - Metadata management

#### Cluster Management

- `fe-core/src/main/java/com/starrocks/system/` - System information service
- `fe-core/src/main/java/com/starrocks/server/` - Server components and table factories
- `fe-core/src/main/java/com/starrocks/ha/` - High availability
- `fe-core/src/main/java/com/starrocks/leader/` - Leader election
- `fe-core/src/main/java/com/starrocks/clone/` - Data replication

#### Security & Access Control

- `fe-core/src/main/java/com/starrocks/authentication/` - User authentication
- `fe-core/src/main/java/com/starrocks/authorization/` - Access control
- `fe-core/src/main/java/com/starrocks/credential/` - Credential management

#### Advanced Features

- `fe-core/src/main/java/com/starrocks/mv/` - Materialized views
- `fe-core/src/main/java/com/starrocks/scheduler/` - Task scheduling
- `fe-core/src/main/java/com/starrocks/statistic/` - Statistics collection
- `fe-core/src/main/java/com/starrocks/warehouse/` - Data warehouse management
- `fe-core/src/main/java/com/starrocks/lake/` - Lake storage format

#### Monitoring & Operations

- `fe-core/src/main/java/com/starrocks/monitor/` - System monitoring
- `fe-core/src/main/java/com/starrocks/metric/` - Metrics collection
- `fe-core/src/main/java/com/starrocks/http/` - HTTP API endpoints

### Other Frontend Modules

- `fe-common/` - Common frontend utilities and shared code
- `plugin-common/` - Plugin framework common components
- `spark-dpp/` - Spark data preprocessing integration
- `hive-udf/` - Hive UDF compatibility layer

## Development Guidelines

### Key Entry Points

- `fe-core/src/main/java/com/starrocks/qe/StmtExecutor.java` - Main statement execution
- `fe-core/src/main/java/com/starrocks/qe/ConnectContext.java` - Session context
- `fe-core/src/main/java/com/starrocks/server/GlobalStateMgr.java` - Global state management

### SQL Processing Flow

1. **Parser** (`sql/parser/`) - Parse SQL text to AST
2. **Analyzer** (`sql/analyzer/`) - Semantic analysis and validation
3. **Optimizer** (`sql/optimizer/`) - Cost-based optimization
4. **Planner** (`planner/`) - Generate physical execution plan
5. **Executor** (`qe/`) - Execute the plan

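The five stages can be pictured as a chain of typed transformations. This Java sketch is a toy model built on that assumption. `ToyPipeline` and its record types are hypothetical stand-ins, not the real fe-core classes. The analyzer stage only rejects blank input, as a placeholder for real semantic validation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Toy model of the five-stage flow; every type here is a hypothetical stand-in. */
final class ToyPipeline {
    record Ast(String sql) {}
    record AnalyzedStatement(Ast ast) {}
    record OptimizedPlan(AnalyzedStatement stmt) {}
    record ExecPlan(OptimizedPlan optimized) {}

    private ToyPipeline() {}

    /** Runs the stages in order and returns the names of the stages visited. */
    static List<String> run(String sql) {
        List<String> trace = new ArrayList<>();
        Function<String, Ast> parse = s -> { trace.add("parse"); return new Ast(s); };
        Function<Ast, AnalyzedStatement> analyze = a -> {
            trace.add("analyze");
            if (a.sql().isBlank()) {
                throw new IllegalArgumentException("empty statement"); // stands in for validation errors
            }
            return new AnalyzedStatement(a);
        };
        Function<AnalyzedStatement, OptimizedPlan> optimize = st -> { trace.add("optimize"); return new OptimizedPlan(st); };
        Function<OptimizedPlan, ExecPlan> plan = o -> { trace.add("plan"); return new ExecPlan(o); };
        Function<ExecPlan, String> execute = p -> { trace.add("execute"); return "ok"; };

        // Each stage accepts only the previous stage's output type, so the order is fixed at compile time.
        parse.andThen(analyze).andThen(optimize).andThen(plan).andThen(execute).apply(sql);
        return trace;
    }
}
```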
### Common Patterns

- Most core classes extend from `GsonSerializable` for persistence
- Use `ConnectContext.get()` to access the current session context
- Metadata operations go through `GlobalStateMgr.getCurrentState()`
- External connectors implement the `Connector` and `ConnectorMetadata` interfaces

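`ConnectContext.get()` works because the session context is bound to the thread that serves the request. Below is a minimal sketch of that thread-local pattern, using a hypothetical `SessionContext` class. The real `ConnectContext` carries far more state, and its method names may differ.

```java
/**
 * Hypothetical stand-in for the ConnectContext.get() pattern: the context is bound to the
 * worker thread, so deep call sites can read it without passing it through every signature.
 */
final class SessionContext {
    private static final ThreadLocal<SessionContext> CURRENT = new ThreadLocal<>();

    private final String user;

    SessionContext(String user) {
        this.user = user;
    }

    String user() {
        return user;
    }

    /** Binds this context to the calling thread. */
    void setThreadLocal() {
        CURRENT.set(this);
    }

    /** Returns the context bound to the calling thread, or null if none is bound. */
    static SessionContext get() {
        return CURRENT.get();
    }

    /** Clears the binding so a pooled thread does not leak the previous session. */
    static void remove() {
        CURRENT.remove();
    }
}
```

A request handler would typically call `setThreadLocal()` on entry and `SessionContext.remove()` in a `finally` block, because pooled threads otherwise keep the previous session bound.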
### Testing

- Unit tests are in `fe-core/src/test/`
- Integration tests use SQL files in the repository-level `test/sql/` directory
- Mock objects are in `fe-core/src/test/java/com/starrocks/utframe/`

## Contribution Guidelines

### PR Titles for Frontend Changes

Use appropriate prefixes for frontend-related PRs:

- `[BugFix] Fix SQL parser issue with complex expressions`
- `[Feature] Add materialized view automatic refresh`
- `[Enhancement] Improve connector metadata caching`
- `[Performance] Optimize query planner for large joins`

### Commit Message Examples for Frontend

```
[Feature] Add support for Apache Paimon connector

Implement Paimon connector in fe-core to enable querying
Paimon tables with full metadata integration.

- Add PaimonConnector and PaimonMetadata classes
- Implement schema evolution and partition pruning
- Add connector configuration and validation
- Include comprehensive unit tests

Closes: #12345
```
# StarRocks Java Extensions Cursor Rules

## Overview

Java Extensions provide connectivity to external data sources and extend StarRocks functionality through Java-based components. These extensions enable StarRocks to read from various external systems and provide extensibility through user-defined functions.

## ⚠️ BUILD WARNING

**DO NOT attempt to build or run tests unless explicitly requested.** The Maven build system can be resource-intensive.

## Java Extensions Architecture

### External Data Connectors

#### Hadoop Ecosystem

- `hadoop-ext/` - Hadoop ecosystem integration
  - Core Hadoop file system support
  - Hadoop configuration management
  - Security integration (Kerberos)

#### Data Lake Formats

- `hive-reader/` - Apache Hive data reader
  - Hive metastore integration
  - Hive table format support
  - Partition handling

- `hudi-reader/` - Apache Hudi integration
  - Copy-on-write and merge-on-read tables
  - Timeline and metadata handling
  - Incremental query support

- `iceberg-metadata-reader/` - Apache Iceberg metadata reader
  - Iceberg table format support
  - Snapshot and schema evolution
  - Partition and file pruning

- `paimon-reader/` - Apache Paimon reader
  - Paimon table format support
  - Real-time and batch data access
  - Schema evolution handling

#### NoSQL and Analytics

- `kudu-reader/` - Apache Kudu connector
  - Kudu table scanning
  - Predicate pushdown
  - Column pruning

- `odps-reader/` - ODPS (MaxCompute) reader
  - Alibaba Cloud MaxCompute integration
  - Table and partition access
  - Data type mapping

#### Connectivity

- `jdbc-bridge/` - JDBC connectivity bridge
  - Generic JDBC data source support
  - Connection pooling
  - Query pushdown capabilities

- `jni-connector/` - JNI connectors for C++ integration
  - Bridge between Java extensions and C++ backend
  - Memory management for cross-language calls
  - Type conversion utilities

### Runtime and Utilities

#### Core Runtime

- `common-runtime/` - Common runtime for Java extensions
  - Shared utilities and base classes
  - Configuration management
  - Logging and error handling

#### Development Tools

- `java-utils/` - Java utilities and helper classes
  - Common data structures
  - Utility functions
  - Helper methods for connector development

#### User-Defined Functions

- `udf-extensions/` - UDF extension framework
  - UDF registration and lifecycle management
  - Type system integration
  - Performance optimization

- `udf-examples/` - User-defined function examples
  - Sample UDF implementations
  - Best practices and patterns
  - Testing examples

### Dependencies

- `hadoop-lib/` - Hadoop library dependencies
  - Hadoop client libraries
  - Version management
  - Compatibility handling

## Development Guidelines

### Project Structure

- Each extension follows the Maven standard directory layout:
  - `src/main/java/` - Main source code
  - `src/test/java/` - Unit tests
  - `pom.xml` - Maven build configuration

### Key Interfaces

- `Connector` - Main connector interface
- `ConnectorMetadata` - Metadata operations
- `ConnectorScanRangeSource` - Data scanning
- `RemoteFileIO` - File I/O operations

### Common Patterns

- **Builder Pattern**: For configuration objects
- **Factory Pattern**: For creating connector instances
- **Template Method**: For common connector operations
- **Strategy Pattern**: For different data access strategies

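This hypothetical sketch covers the first two patterns: a builder that assembles an immutable `ReaderConfig`, and a factory that maps a connector type name to a `TableReader`. All names, the `hive`/`paimon` keys, and the 4096 default are illustrative assumptions, not StarRocks APIs.

```java
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

/** Hypothetical sketch of the builder and factory patterns. */
final class ReaderPatterns {
    /** Immutable configuration object assembled with a builder. */
    static final class ReaderConfig {
        final String table;
        final int batchSize;

        private ReaderConfig(Builder b) {
            this.table = Objects.requireNonNull(b.table, "table is required");
            this.batchSize = b.batchSize;
        }

        static Builder builder() {
            return new Builder();
        }

        static final class Builder {
            private String table;
            private int batchSize = 4096; // illustrative default, not a StarRocks value

            Builder table(String table) { this.table = table; return this; }
            Builder batchSize(int batchSize) { this.batchSize = batchSize; return this; }
            ReaderConfig build() { return new ReaderConfig(this); }
        }
    }

    /** Minimal reader abstraction handed out by the factory. */
    interface TableReader {
        String describe();
    }

    // Factory: connector type name -> constructor taking the shared config.
    private static final Map<String, Function<ReaderConfig, TableReader>> FACTORIES = Map.of(
            "hive", cfg -> () -> "hive:" + cfg.table + ":" + cfg.batchSize,
            "paimon", cfg -> () -> "paimon:" + cfg.table + ":" + cfg.batchSize);

    private ReaderPatterns() {}

    static TableReader create(String type, ReaderConfig cfg) {
        Function<ReaderConfig, TableReader> ctor = FACTORIES.get(type);
        if (ctor == null) {
            throw new IllegalArgumentException("unknown connector type: " + type);
        }
        return ctor.apply(cfg);
    }
}
```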
### Data Type Mapping

- Consistent mapping between external system types and StarRocks types
- Handle nullable vs non-nullable types appropriately
- Support for complex types (arrays, maps, structs) where applicable

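Here is a minimal sketch of type mapping, assuming the external side uses Hive-style type strings. The primitive table is a hypothetical subset (no `decimal`, `char`, or `struct`), not the mapping StarRocks ships. Nullability is left to the caller. The sketch handles `array<...>` and `map<...,...>` by recursion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Illustrative mapping from Hive-style type strings to StarRocks type names (hypothetical subset). */
final class TypeMapping {
    private static final Map<String, String> PRIMITIVES = Map.of(
            "boolean", "BOOLEAN",
            "tinyint", "TINYINT",
            "smallint", "SMALLINT",
            "int", "INT",
            "bigint", "BIGINT",
            "float", "FLOAT",
            "double", "DOUBLE",
            "string", "VARCHAR",
            "date", "DATE",
            "timestamp", "DATETIME");

    private TypeMapping() {}

    /** Maps a type, recursing into array<...> and map<...,...>; throws on unsupported types. */
    static String map(String externalType) {
        String t = externalType.trim().toLowerCase(Locale.ROOT);
        if (t.startsWith("array<") && t.endsWith(">")) {
            return "ARRAY<" + map(t.substring(6, t.length() - 1)) + ">";
        }
        if (t.startsWith("map<") && t.endsWith(">")) {
            List<String> kv = splitTopLevel(t.substring(4, t.length() - 1));
            if (kv.size() != 2) {
                throw new IllegalArgumentException("malformed map type: " + externalType);
            }
            return "MAP<" + map(kv.get(0)) + "," + map(kv.get(1)) + ">";
        }
        String mapped = PRIMITIVES.get(t);
        if (mapped == null) {
            throw new IllegalArgumentException("unsupported type: " + externalType);
        }
        return mapped;
    }

    /** Splits on commas that are not nested inside angle brackets. */
    private static List<String> splitTopLevel(String s) {
        List<String> parts = new ArrayList<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '<') {
                depth++;
            } else if (c == '>') {
                depth--;
            } else if (c == ',' && depth == 0) {
                parts.add(s.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(s.substring(start));
        return parts;
    }
}
```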
### Performance Considerations

- **Predicate Pushdown**: Push filters to external systems when possible
- **Column Pruning**: Only read required columns
- **Partition Pruning**: Skip unnecessary partitions
- **Parallel Processing**: Support parallel data reading
- **Memory Management**: Efficient memory usage for large datasets

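Partition pruning and column pruning both use metadata to decide what not to read. This hypothetical sketch works on in-memory metadata: partitions are keyed by a date, and rows are plain maps. Real connectors apply the same idea to file listings and column readers. Predicate pushdown, which hands the filter to the external system, is not shown.

```java
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch of partition pruning and column pruning on in-memory metadata. */
final class Pruning {
    /** A partition identified by its date value, with the files it contains. */
    record Partition(LocalDate day, List<String> files) {}

    private Pruning() {}

    /** Keeps partitions whose day is in [from, to]; the others are never opened. */
    static List<Partition> prunePartitions(List<Partition> all, LocalDate from, LocalDate to) {
        return all.stream()
                .filter(p -> !p.day().isBefore(from) && !p.day().isAfter(to))
                .collect(Collectors.toList());
    }

    /** Projects a row down to the columns the query actually references. */
    static Map<String, Object> pruneColumns(Map<String, Object> row, List<String> required) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String column : required) {
            if (row.containsKey(column)) {
                out.put(column, row.get(column)); // null values are kept: columns may be nullable
            }
        }
        return out;
    }
}
```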
### Error Handling

- Use StarRocks exception hierarchy
- Provide meaningful error messages
- Handle connection failures gracefully
- Implement retry mechanisms where appropriate

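One common way to implement retries "where appropriate" is exponential backoff limited to errors the caller classifies as transient. This is a hypothetical sketch. The `retryable` predicate, the attempt limit, and the doubling delay are assumptions to tune per connector.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/** Hypothetical retry helper with exponential backoff for transient failures. */
final class Retry {
    private Retry() {}

    /**
     * Calls {@code action} until it succeeds, the failure is not retryable, or
     * {@code maxAttempts} calls have been made. The delay doubles after each failure.
     */
    static <T> T withBackoff(Callable<T> action, Predicate<Exception> retryable,
                             int maxAttempts, Duration initialDelay) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delayMillis = initialDelay.toMillis();
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts || !retryable.test(e)) {
                    throw e; // give up: attempts exhausted or the error is not transient
                }
                Thread.sleep(delayMillis);
                delayMillis *= 2;
            }
        }
    }
}
```

Retries like this suit idempotent operations such as metadata reads or connection setup. Retrying a partially applied write can duplicate data.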
### Configuration

- Support both system-wide and per-table configuration
- Use consistent naming conventions for properties
- Provide sensible defaults
- Document all configuration options

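One way to support both system-wide and per-table configuration with sensible defaults is a layered lookup where the most specific value wins. This is a hypothetical sketch, and any property keys used with it are illustrative only.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical layered lookup: per-table value, then system-wide value, then default. */
final class LayeredConfig {
    private final Map<String, String> systemWide;
    private final Map<String, String> perTable;

    LayeredConfig(Map<String, String> systemWide, Map<String, String> perTable) {
        this.systemWide = Map.copyOf(systemWide);
        this.perTable = Map.copyOf(perTable);
    }

    /** Returns the most specific value for {@code key}, falling back to {@code defaultValue}. */
    String get(String key, String defaultValue) {
        return Optional.ofNullable(perTable.get(key))
                .or(() -> Optional.ofNullable(systemWide.get(key)))
                .orElse(defaultValue);
    }

    /** Integer variant; the default applies only when no layer defines the key. */
    int getInt(String key, int defaultValue) {
        String raw = get(key, null);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }
}
```

For example, suppose `batch_size=1024` is set system-wide and `batch_size=256` is set on one table. That table reads 256, other tables read 1024, and tables with neither setting fall back to the caller's default.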
### Testing

- Unit tests for core functionality
- Integration tests with external systems (when available)
- Mock external dependencies for reliable testing
- Performance benchmarks for critical paths

### Security

- Support authentication mechanisms of external systems
- Handle credentials securely
- Support encryption in transit
- Implement proper access control

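Handling credentials securely includes keeping them out of logs and error messages. This hypothetical sketch masks any property whose key contains `password`, `secret`, `token`, or `credential`. That key-naming convention is an assumption, so a real connector should mark sensitive properties explicitly.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that masks credential-like properties before they are logged. */
final class Redactor {
    // Assumed convention: any key containing one of these fragments is sensitive.
    private static final List<String> SENSITIVE = List.of("password", "secret", "token", "credential");

    private Redactor() {}

    /** Returns a key-sorted copy of {@code props} with sensitive values replaced by "***". */
    static Map<String, String> redact(Map<String, String> props) {
        Map<String, String> out = new TreeMap<>();
        props.forEach((key, value) -> {
            String lower = key.toLowerCase(Locale.ROOT);
            boolean sensitive = SENSITIVE.stream().anyMatch(lower::contains);
            out.put(key, sensitive ? "***" : value);
        });
        return out;
    }
}
```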
## Build System

- Root `pom.xml` manages all extensions
- Each extension has its own Maven module
- Shared dependencies managed at parent level
- Profiles for different build configurations

## Contribution Guidelines

### PR Titles for Java Extensions Changes

Use appropriate prefixes for Java extensions PRs:

- `[BugFix] Fix Hive partition metadata reading issue`
- `[Feature] Add Delta Lake deletion vector support`
- `[Enhancement] Improve JDBC connector connection pooling`
- `[Performance] Optimize Iceberg metadata caching`

### Commit Message Examples for Java Extensions

```
[Feature] Add support for Kudu connector predicate pushdown

Implement predicate pushdown optimization for Kudu connector
to reduce data transfer and improve query performance.

- Add predicate conversion from StarRocks to Kudu format
- Implement column pruning optimization
- Add support for complex predicate expressions
- Include integration tests with Kudu test cluster

Closes: #12345
```