[Tool] Add cursor rules and code organization instructions (#61224)

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Murphy 2025-07-24 18:15:11 +08:00 committed by GitHub
parent 4f7485c9b4
commit 7e506ec4e1
GPG Key ID: B5690EEEBB952194
3 changed files with 478 additions and 0 deletions

.cursorrules (new file, 158 lines added)

@@ -0,0 +1,158 @@
# StarRocks Project Cursor Rules
## Project Overview
StarRocks is an open-source, high-performance analytical database system designed for real-time analytics. This is a large-scale C++/Java project with a complex build system.
## ⚠️ IMPORTANT BUILD SYSTEM WARNING
**DO NOT attempt to build or run unit tests (UT) for this project unless explicitly requested by the user.**
The build system is extremely resource-intensive and time-consuming. Building the full project can take hours and requires significant system resources.
## Code Organization
### Backend (be/)
**Language**: C++
**Purpose**: Core analytical engine and storage layer
- `be/src/exec/` - Query execution engine components
- `be/src/storage/` - Storage engine and data persistence
- `be/src/exprs/` - Expression evaluation and JIT compilation
- `be/src/formats/` - Data format parsers and serializers
- `be/src/runtime/` - Runtime components (batch write, stream load, memory management, etc.)
- `be/src/connector/` - External data source connectors
- `be/src/service/` - Core backend services
- `be/src/common/` - Common utilities and shared code
**📋 See `be/.cursorrules` for detailed backend component breakdown**
### Frontend (fe/)
**Language**: Java
**Purpose**: SQL parsing, query planning, and metadata management
- `fe/fe-core/` - Core frontend services (SQL parser, planner, catalog)
- `fe/fe-common/` - Common frontend utilities
- `fe/plugin-common/` - Plugin framework common components
- `fe/spark-dpp/` - Spark data preprocessing integration
- `fe/hive-udf/` - Hive UDF compatibility layer
**📋 See `fe/.cursorrules` for detailed frontend component breakdown**
### Java Extensions (java-extensions/)
**Language**: Java
**Purpose**: External connectors and extensions
- `java-extensions/hive-reader/` - Hive data reader
- `java-extensions/iceberg-metadata-reader/` - Apache Iceberg metadata reader
- `java-extensions/hudi-reader/` - Apache Hudi integration
- `java-extensions/paimon-reader/` - Apache Paimon reader
- `java-extensions/jdbc-bridge/` - JDBC connectivity bridge
- `java-extensions/hadoop-ext/` - Hadoop ecosystem integration
- `java-extensions/udf-extensions/` - UDF extension framework
- `java-extensions/common-runtime/` - Common runtime for Java extensions
**📋 See `java-extensions/.cursorrules` for detailed extensions breakdown**
### Generated Sources (gensrc/)
**Purpose**: Auto-generated code from IDL definitions
- `gensrc/proto/` - Protocol buffer definitions
- `gensrc/thrift/` - Thrift interface definitions
- `gensrc/script/` - Code generation scripts
### Testing (test/)
**Language**: Python
**Purpose**: Integration and SQL testing framework
- `test/sql/` - SQL test cases organized by functionality
- `test/common/` - Common test utilities
- `test/lib/` - Test libraries and helpers
### Tools and Utilities
- `tools/` - Diagnostic tools, benchmarks, and utilities
- `bin/` - Binary executables and scripts
- `conf/` - Configuration files and templates
- `build-support/` - Build system support files
- `docker/` - Docker build configurations
- `docs/` - Project documentation
### Third-party Dependencies
- `thirdparty/` - External dependencies and patches
- `licenses/` - License files for dependencies
### Other Important Directories
- `fs_brokers/` - File system broker implementations
- `webroot/` - Web UI static files
- `format-sdk/` - Format SDK for data interchange
## Development Guidelines
1. **No Building**: Avoid running build commands (`build.sh`, `make`, etc.) unless specifically requested
2. **No Unit Tests**: Do not execute unit test scripts (`run-be-ut.sh`, `run-fe-ut.sh`, etc.) unless specifically requested
3. **Focus on Code Analysis**: Prioritize code reading, analysis, and small targeted changes
4. **Language Awareness**:
   - Backend (be/) is C++ - focus on performance and memory management
   - Frontend (fe/) is Java - focus on SQL parsing and query planning
   - Tests are Python - focus on SQL correctness and integration testing
## Pull Request Guidelines
### PR Title Format
PR titles must include a prefix to categorize the change:
- **[BugFix]** - Bug fixes and error corrections
- **[Enhancement]** - Improvements to existing functionality
- **[Feature]** - New features and capabilities
- **[Refactor]** - Code refactoring without functional changes
- **[Test]** - Test-related changes
- **[Doc]** - Documentation updates
- **[Build]** - Build system and CI/CD changes
- **[Performance]** - Performance optimizations
**Examples:**
- `[BugFix] Fix memory leak in column batch processing`
- `[Feature] Add support for Apache Paimon connector`
- `[Enhancement] Improve query optimizer for materialized views`
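A title can be checked against the prefix list mechanically. The following is a minimal sketch, not an official project tool: the function name `is_valid_pr_title` is hypothetical, and the exact rules it enforces (case-sensitive prefix, a single space after the bracket, a non-empty description) are assumptions that go beyond what is stated above.

```python
import re

# Prefixes copied from the list above.
ALLOWED_PREFIXES = (
    "BugFix", "Enhancement", "Feature", "Refactor",
    "Test", "Doc", "Build", "Performance",
)

# Assumed shape: "[Prefix] description" with exactly one space after
# the closing bracket and a description that is not empty.
_TITLE_RE = re.compile(r"^\[(%s)\] \S.*$" % "|".join(ALLOWED_PREFIXES))


def is_valid_pr_title(title):
    """Return True if the title starts with one of the allowed prefixes."""
    return _TITLE_RE.match(title) is not None
```

For example, `is_valid_pr_title("[BugFix] Fix memory leak in column batch processing")` passes, while a title with no prefix or with a prefix outside the list does not.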
### Commit Message Template
Follow this structured format for all commit messages:
```
[Category] Brief description (50 chars or less)

Detailed explanation of what this commit does and why.
Wrap lines at 72 characters.

- Key change 1
- Key change 2
- Key change 3

Fixes: #issue_number (if applicable)
Closes: #issue_number (if applicable)
```
**Categories:** BugFix, Enhancement, Feature, Refactor, Test, Doc, Build, Performance
**Example:**
```
[Feature] Add Apache Iceberg table format support

Implement Iceberg connector to enable querying Iceberg tables
directly from StarRocks. This includes metadata reading,
partition pruning, and schema evolution support.

- Add IcebergConnector and IcebergMetadata classes
- Implement partition and file pruning optimizations
- Support for Iceberg v1 and v2 table formats
- Add comprehensive unit tests

Closes: #12345
```
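The template's length rules can be linted with a few lines of Python. This is a sketch under stated assumptions: `check_commit_message` is a hypothetical helper, and it interprets "50 chars or less" as applying to the description after the `[Category]` prefix, which the template does not spell out.

```python
CATEGORIES = {"BugFix", "Enhancement", "Feature", "Refactor",
              "Test", "Doc", "Build", "Performance"}


def check_commit_message(message, subject_limit=50, wrap=72):
    """Return a list of problems found; an empty list means none."""
    problems = []
    lines = message.splitlines()
    subject = lines[0] if lines else ""

    # Subject line: "[Category] Brief description".
    category, sep, description = subject.partition("] ")
    if not (sep and category.startswith("[") and category[1:] in CATEGORIES):
        problems.append("subject needs a known [Category] prefix")
    elif len(description) > subject_limit:
        # Assumption: the limit covers the description, not the prefix.
        problems.append("description exceeds %d characters" % subject_limit)

    # Every line after the subject should fit within the wrap column.
    for number, line in enumerate(lines[1:], start=2):
        if len(line) > wrap:
            problems.append("line %d exceeds %d characters" % (number, wrap))
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every issue in one pass.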
## Common File Extensions
- `.cpp`, `.h`, `.cc` - C++ source and headers (backend)
- `.java` - Java source files (frontend and extensions)
- `.proto` - Protocol buffer definitions
- `.thrift` - Thrift interface definitions
- `.sql` - SQL test cases and queries
- `.py` - Python test scripts
## Build System Files to Avoid
- `build.sh` - Main build script (very resource-intensive)
- `build-in-docker.sh` - Docker-based build
- `run-*-ut.sh` - Unit test runners
- `Makefile*` - Make build files
- `pom.xml` - Maven build files (for Java components)
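The patterns above can be matched with Python's standard `fnmatch` module. A minimal sketch, assuming the patterns apply to the file name only and not to the directory part; `is_build_file` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns copied from the list above.
AVOID_PATTERNS = (
    "build.sh",
    "build-in-docker.sh",
    "run-*-ut.sh",
    "Makefile*",
    "pom.xml",
)


def is_build_file(path):
    """Return True if the file name matches one of the patterns to avoid."""
    name = PurePosixPath(path).name
    # fnmatchcase keeps the comparison case-sensitive on every platform.
    return any(fnmatchcase(name, pattern) for pattern in AVOID_PATTERNS)
```

With these patterns, `run-be-ut.sh` and `fe/pom.xml` are flagged, while an ordinary source file such as `be/src/exec/foo.cpp` is not.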

fe/.cursorrules (new file, 143 lines added)

@@ -0,0 +1,143 @@
# StarRocks Frontend (fe/) Cursor Rules
## Overview
The frontend is the Java-based component responsible for SQL parsing, query planning, metadata management, and coordination. It serves as the brain of StarRocks, handling all SQL operations and managing distributed query execution.
## ⚠️ BUILD WARNING
**DO NOT attempt to build or run tests unless explicitly requested.** The build system is resource-intensive.
## Frontend Architecture
### Frontend Core (fe-core/)
The main frontend module containing all core database functionality:
#### Core SQL Processing
- `fe-core/src/main/java/com/starrocks/sql/` - SQL processing pipeline
  - `sql/parser/` - SQL parser (ANTLR-based)
  - `sql/analyzer/` - SQL semantic analysis and validation
  - `sql/ast/` - Abstract Syntax Tree definitions
  - `sql/optimizer/` - Cost-based query optimizer
  - `sql/plan/` - Physical query plan generation
  - `sql/spm/` - SQL Plan Management
#### Metadata Management
- `fe-core/src/main/java/com/starrocks/catalog/` - Metadata catalog system
  - `catalog/system/` - System tables and metadata
  - `catalog/mv/` - Materialized view metadata
  - `catalog/constraint/` - Table constraints management
  - `catalog/combinator/` - Catalog combinators
#### Query Execution
- `fe-core/src/main/java/com/starrocks/qe/` - Query execution engine
  - Core classes: `ConnectContext`, `StmtExecutor`, `DefaultCoordinator`
  - Session management: `SessionVariable`, `ConnectProcessor`
  - Query scheduling: `SimpleScheduler`, backend selectors
  - Result processing: `ShowExecutor`, `ResultReceiver`
#### Query Planning
- `fe-core/src/main/java/com/starrocks/planner/` - Physical query planning
  - `planner/stream/` - Stream processing plans
#### External Connectors
- `fe-core/src/main/java/com/starrocks/connector/` - External data source connectors
  - `connector/hive/` - Apache Hive integration
  - `connector/iceberg/` - Apache Iceberg support
  - `connector/hudi/` - Apache Hudi integration
  - `connector/jdbc/` - JDBC connectivity
  - `connector/elasticsearch/` - Elasticsearch connector
  - `connector/delta/` - Delta Lake support
  - `connector/kudu/` - Apache Kudu connector
  - `connector/odps/` - ODPS (MaxCompute) connector
  - `connector/paimon/` - Apache Paimon connector
#### Data Loading
- `fe-core/src/main/java/com/starrocks/load/` - Data ingestion framework
  - `load/loadv2/` - Load v2 implementation
  - `load/routineload/` - Routine/streaming load
  - `load/streamload/` - Stream loading
  - `load/batchwrite/` - Batch write operations
  - `load/pipe/` - Data pipeline management
#### Storage & Persistence
- `fe-core/src/main/java/com/starrocks/persist/` - Metadata persistence
- `fe-core/src/main/java/com/starrocks/journal/` - Write-ahead logging
- `fe-core/src/main/java/com/starrocks/meta/` - Metadata management
#### Cluster Management
- `fe-core/src/main/java/com/starrocks/system/` - System information service
- `fe-core/src/main/java/com/starrocks/server/` - Server components and table factories
- `fe-core/src/main/java/com/starrocks/ha/` - High availability
- `fe-core/src/main/java/com/starrocks/leader/` - Leader election
- `fe-core/src/main/java/com/starrocks/clone/` - Data replication
#### Security & Access Control
- `fe-core/src/main/java/com/starrocks/authentication/` - User authentication
- `fe-core/src/main/java/com/starrocks/authorization/` - Access control
- `fe-core/src/main/java/com/starrocks/credential/` - Credential management
#### Advanced Features
- `fe-core/src/main/java/com/starrocks/mv/` - Materialized views
- `fe-core/src/main/java/com/starrocks/scheduler/` - Task scheduling
- `fe-core/src/main/java/com/starrocks/statistic/` - Statistics collection
- `fe-core/src/main/java/com/starrocks/warehouse/` - Data warehouse management
- `fe-core/src/main/java/com/starrocks/lake/` - Lake storage format
#### Monitoring & Operations
- `fe-core/src/main/java/com/starrocks/monitor/` - System monitoring
- `fe-core/src/main/java/com/starrocks/metric/` - Metrics collection
- `fe-core/src/main/java/com/starrocks/http/` - HTTP API endpoints
### Other Frontend Modules
- `fe-common/` - Common frontend utilities and shared code
- `plugin-common/` - Plugin framework common components
- `spark-dpp/` - Spark data preprocessing integration
- `hive-udf/` - Hive UDF compatibility layer
## Development Guidelines
### Key Entry Points
- `fe-core/src/main/java/com/starrocks/qe/StmtExecutor.java` - Main statement execution
- `fe-core/src/main/java/com/starrocks/qe/ConnectContext.java` - Session context
- `fe-core/src/main/java/com/starrocks/server/GlobalStateMgr.java` - Global state management
### SQL Processing Flow
1. **Parser** (`sql/parser/`) - Parse SQL text to AST
2. **Analyzer** (`sql/analyzer/`) - Semantic analysis and validation
3. **Optimizer** (`sql/optimizer/`) - Cost-based optimization
4. **Planner** (`planner/`) - Generate physical execution plan
5. **Executor** (`qe/`) - Execute the plan
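The five stages can be pictured as functions chained in order. The sketch below is a toy illustration in Python, not StarRocks code (the real pipeline is Java under `fe-core`): the in-memory `CATALOG` and `DATA`, the tiny `SELECT ... FROM ...` grammar, and every function name are hypothetical, and the optimizer stage is a placeholder that performs no rewrite.

```python
import re

# Hypothetical in-memory catalog and data, used only by this sketch.
CATALOG = {"t": ["a", "b", "c"]}
DATA = {"t": [{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}]}


def parse(sql):
    """1. Parser: SQL text -> AST (here, a plain dict)."""
    match = re.fullmatch(r"\s*SELECT\s+(.+?)\s+FROM\s+(\w+)\s*", sql, re.IGNORECASE)
    if match is None:
        raise ValueError("unsupported SQL: %r" % sql)
    columns = [c.strip() for c in match.group(1).split(",")]
    return {"select": columns, "from": match.group(2)}


def analyze(ast):
    """2. Analyzer: check that the table and columns exist."""
    table = ast["from"]
    if table not in CATALOG:
        raise ValueError("unknown table: " + table)
    for column in ast["select"]:
        if column not in CATALOG[table]:
            raise ValueError("unknown column: " + column)
    return ast


def optimize(ast):
    """3. Optimizer: placeholder; a real optimizer would rewrite the tree."""
    return ast


def plan(ast):
    """4. Planner: turn the tree into an ordered list of physical steps."""
    return [("scan", ast["from"]), ("project", ast["select"])]


def execute(steps):
    """5. Executor: run the steps against the in-memory rows."""
    rows = []
    for op, arg in steps:
        if op == "scan":
            rows = DATA[arg]
        elif op == "project":
            rows = [{c: row[c] for c in arg} for row in rows]
    return rows


def run(sql):
    """Chain the five stages in the order listed above."""
    return execute(plan(optimize(analyze(parse(sql)))))
```

Each stage consumes only the previous stage's output, which is why a semantic error such as an unknown column surfaces in the analyzer before any plan is built.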
### Common Patterns
- Most core classes extend from `GsonSerializable` for persistence
- Use `ConnectContext.get()` to access current session context
- Metadata operations go through `GlobalStateMgr.getCurrentState()`
- External connectors implement `Connector` and `ConnectorMetadata` interfaces
### Testing
- Unit tests are in `fe-core/src/test/`
- Integration tests use SQL files in `test/sql/` (at the repository root)
- Mock objects are in `fe-core/src/test/java/com/starrocks/utframe/`
## Contribution Guidelines
### PR Titles for Frontend Changes
Use appropriate prefixes for frontend-related PRs:
- `[BugFix] Fix SQL parser issue with complex expressions`
- `[Feature] Add materialized view automatic refresh`
- `[Enhancement] Improve connector metadata caching`
- `[Performance] Optimize query planner for large joins`
### Commit Message Examples for Frontend
```
[Feature] Add support for Apache Paimon connector

Implement Paimon connector in fe-core to enable querying
Paimon tables with full metadata integration.

- Add PaimonConnector and PaimonMetadata classes
- Implement schema evolution and partition pruning
- Add connector configuration and validation
- Include comprehensive unit tests

Closes: #12345
```


@@ -0,0 +1,177 @@
# StarRocks Java Extensions Cursor Rules
## Overview
Java Extensions provide connectivity to external data sources and extend StarRocks functionality through Java-based components. These extensions enable StarRocks to read from various external systems and provide extensibility through user-defined functions.
## ⚠️ BUILD WARNING
**DO NOT attempt to build or run tests unless explicitly requested.** The Maven build system can be resource-intensive.
## Java Extensions Architecture
### External Data Connectors
#### Hadoop Ecosystem
- `hadoop-ext/` - Hadoop ecosystem integration
  - Core Hadoop file system support
  - Hadoop configuration management
  - Security integration (Kerberos)
#### Data Lake Formats
- `hive-reader/` - Apache Hive data reader
  - Hive metastore integration
  - Hive table format support
  - Partition handling
- `hudi-reader/` - Apache Hudi integration
  - Copy-on-write and merge-on-read tables
  - Timeline and metadata handling
  - Incremental query support
- `iceberg-metadata-reader/` - Apache Iceberg metadata reader
  - Iceberg table format support
  - Snapshot and schema evolution
  - Partition and file pruning
- `paimon-reader/` - Apache Paimon reader
  - Paimon table format support
  - Real-time and batch data access
  - Schema evolution handling
#### NoSQL and Analytics
- `kudu-reader/` - Apache Kudu connector
  - Kudu table scanning
  - Predicate pushdown
  - Column pruning
- `odps-reader/` - ODPS (MaxCompute) reader
  - Alibaba Cloud MaxCompute integration
  - Table and partition access
  - Data type mapping
#### Connectivity
- `jdbc-bridge/` - JDBC connectivity bridge
  - Generic JDBC data source support
  - Connection pooling
  - Query pushdown capabilities
- `jni-connector/` - JNI connectors for C++ integration
  - Bridge between Java extensions and C++ backend
  - Memory management for cross-language calls
  - Type conversion utilities
### Runtime and Utilities
#### Core Runtime
- `common-runtime/` - Common runtime for Java extensions
  - Shared utilities and base classes
  - Configuration management
  - Logging and error handling
#### Development Tools
- `java-utils/` - Java utilities and helper classes
  - Common data structures
  - Utility functions
  - Helper methods for connector development
#### User-Defined Functions
- `udf-extensions/` - UDF extension framework
  - UDF registration and lifecycle management
  - Type system integration
  - Performance optimization
- `udf-examples/` - User-defined function examples
  - Sample UDF implementations
  - Best practices and patterns
  - Testing examples
### Dependencies
- `hadoop-lib/` - Hadoop library dependencies
  - Hadoop client libraries
  - Version management
  - Compatibility handling
## Development Guidelines
### Project Structure
- Each extension follows Maven standard directory layout
  - `src/main/java/` - Main source code
  - `src/test/java/` - Unit tests
  - `pom.xml` - Maven build configuration
### Key Interfaces
- `Connector` - Main connector interface
- `ConnectorMetadata` - Metadata operations
- `ConnectorScanRangeSource` - Data scanning
- `RemoteFileIO` - File I/O operations
### Common Patterns
- **Builder Pattern**: Used for configuration objects
- **Factory Pattern**: For creating connector instances
- **Template Method**: For common connector operations
- **Strategy Pattern**: For different data access strategies
### Data Type Mapping
- Consistent mapping between external system types and StarRocks types
- Handle nullable vs non-nullable types appropriately
- Support for complex types (arrays, maps, structs) where applicable
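A type mapping of this kind is often a lookup table plus a recursive rule for nested types. The sketch below is illustrative only: the table is a small hypothetical subset, not the mapping used by any actual StarRocks connector, and `map_type` and `map_column` are made-up names. Only arrays are handled as a complex type here.

```python
# Illustrative subset only; not the mapping of any real connector.
EXTERNAL_TO_STARROCKS = {
    "int": "INT",
    "bigint": "BIGINT",
    "double": "DOUBLE",
    "string": "VARCHAR",
    "boolean": "BOOLEAN",
}


def map_type(external_type):
    """Map an external type name to a StarRocks type name."""
    name = external_type.strip().lower()
    # Complex type: map the element type recursively.
    if name.startswith("array<") and name.endswith(">"):
        return "ARRAY<%s>" % map_type(name[len("array<"):-1])
    try:
        return EXTERNAL_TO_STARROCKS[name]
    except KeyError:
        raise ValueError("unsupported external type: %r" % external_type) from None


def map_column(name, external_type, nullable=True):
    """Carry nullability alongside the mapped type instead of dropping it."""
    return {"name": name, "type": map_type(external_type), "nullable": nullable}
```

Raising on an unknown type, rather than silently falling back to a string type, keeps mapping gaps visible during connector development.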
### Performance Considerations
- **Predicate Pushdown**: Push filters to external systems when possible
- **Column Pruning**: Only read required columns
- **Partition Pruning**: Skip unnecessary partitions
- **Parallel Processing**: Support parallel data reading
- **Memory Management**: Efficient memory usage for large datasets
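The first three items all reduce what the source hands back. The toy scan below shows each one as a single line of logic. It is a sketch in Python over a hypothetical in-memory `PARTITIONS` dict, not connector code; in a real connector the filtering would happen inside the external system rather than in a local loop.

```python
# Hypothetical external table, partitioned by month.
PARTITIONS = {
    "2024-01": [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}],
    "2024-02": [{"id": 3, "amount": 30}],
}


def scan(partitions, partition_keys=None, columns=None, predicate=None):
    """Yield rows, applying all three reductions at the source."""
    for key, rows in partitions.items():
        # Partition pruning: skip partitions the query cannot touch.
        if partition_keys is not None and key not in partition_keys:
            continue
        for row in rows:
            # Predicate pushdown: filter before the row is handed over.
            if predicate is not None and not predicate(row):
                continue
            # Column pruning: return only the requested columns.
            yield {c: row[c] for c in columns} if columns else dict(row)
```

For example, `list(scan(PARTITIONS, {"2024-01"}, ["id"], lambda r: r["amount"] > 20))` touches one partition and returns a single one-column row.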
### Error Handling
- Use StarRocks exception hierarchy
- Provide meaningful error messages
- Handle connection failures gracefully
- Implement retry mechanisms where appropriate
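One common shape for the retry item is a bounded loop with exponential backoff. The sketch below uses Python's built-in `ConnectionError` and `RuntimeError` as stand-ins; a real extension would use the StarRocks exception hierarchy instead, and `with_retry` and its parameters are hypothetical.

```python
import time


def with_retry(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying on ConnectionError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError as exc:
            if attempt == attempts:
                # Out of retries: surface a meaningful, chained error.
                raise RuntimeError(
                    "operation failed after %d attempts: %s" % (attempts, exc)
                ) from exc
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter is a design choice that lets tests replace the real delay with a no-op.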
### Configuration
- Support both system-wide and per-table configuration
- Use consistent naming conventions for properties
- Provide sensible defaults
- Document all configuration options
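Layered configuration with defaults maps naturally onto a lookup chain in which the most specific layer wins. A minimal sketch using Python's `collections.ChainMap`; the property names and default values are hypothetical and are not actual StarRocks options.

```python
from collections import ChainMap

# Hypothetical property names and defaults, for illustration only.
DEFAULTS = {"connector.fetch_size": 1000, "connector.timeout_ms": 30000}


def effective_config(system=None, table=None):
    """Resolve each property as: per-table, then system-wide, then default."""
    return dict(ChainMap(table or {}, system or {}, DEFAULTS))
```

For instance, a per-table `connector.fetch_size` overrides the system-wide value, and any property set in neither layer falls back to `DEFAULTS`.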
### Testing
- Unit tests for core functionality
- Integration tests with external systems (when available)
- Mock external dependencies for reliable testing
- Performance benchmarks for critical paths
### Security
- Support authentication mechanisms of external systems
- Handle credentials securely
- Support encryption in transit
- Implement proper access control
## Build System
- Root `pom.xml` manages all extensions
- Each extension has its own Maven module
- Shared dependencies managed at parent level
- Profiles for different build configurations
## Contribution Guidelines
### PR Titles for Java Extensions Changes
Use appropriate prefixes for Java extensions PRs:
- `[BugFix] Fix Hive partition metadata reading issue`
- `[Feature] Add Delta Lake deletion vector support`
- `[Enhancement] Improve JDBC connector connection pooling`
- `[Performance] Optimize Iceberg metadata caching`
### Commit Message Examples for Java Extensions
```
[Feature] Add support for Kudu connector predicate pushdown

Implement predicate pushdown optimization for Kudu connector
to reduce data transfer and improve query performance.

- Add predicate conversion from StarRocks to Kudu format
- Implement column pruning optimization
- Add support for complex predicate expressions
- Include integration tests with Kudu test cluster

Closes: #12345
```