[Tool] Add cursor rules and code organization instructions (#61224)

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-07-24 18:15:11 +08:00 · 2025-07-24 18:15:11 +08:00 · 7e506ec4e1
parent 4f7485c9b4
commit 7e506ec4e1
3 changed files with 478 additions and 0 deletions
--- a/.cursorrules
+++ b/.cursorrules
@@ -0,0 +1,158 @@
+# StarRocks Project Cursor Rules
+
+## Project Overview
+StarRocks is an open-source, high-performance analytical database system designed for real-time analytics. This is a large-scale C++/Java project with a complex build system.
+
+## ⚠️ IMPORTANT BUILD SYSTEM WARNING
+**DO NOT attempt to build or run unit tests (UT) for this project unless explicitly requested by the user.**
+The build system is extremely resource-intensive and time-consuming; a full build can take hours.
+
+## Code Organization
+
+### Backend (be/)
+**Language**: C++
+**Purpose**: Core analytical engine and storage layer
+- `be/src/exec/` - Query execution engine components
+- `be/src/storage/` - Storage engine and data persistence
+- `be/src/exprs/` - Expression evaluation and JIT compilation
+- `be/src/formats/` - Data format parsers and serializers
+- `be/src/runtime/` - Runtime components (batch write, stream load, memory management, etc.)
+- `be/src/connector/` - External data source connectors
+- `be/src/service/` - Core backend services
+- `be/src/common/` - Common utilities and shared code
+
+**📋 See `be/.cursorrules` for detailed backend component breakdown**
+
+### Frontend (fe/)
+**Language**: Java
+**Purpose**: SQL parsing, query planning, and metadata management
+- `fe/fe-core/` - Core frontend services (SQL parser, planner, catalog)
+- `fe/fe-common/` - Common frontend utilities
+- `fe/plugin-common/` - Plugin framework common components
+- `fe/spark-dpp/` - Spark data preprocessing integration
+- `fe/hive-udf/` - Hive UDF compatibility layer
+
+**📋 See `fe/.cursorrules` for detailed frontend component breakdown**
+
+### Java Extensions (java-extensions/)
+**Language**: Java
+**Purpose**: External connectors and extensions
+- `java-extensions/hive-reader/` - Hive data reader
+- `java-extensions/iceberg-metadata-reader/` - Apache Iceberg metadata reader
+- `java-extensions/hudi-reader/` - Apache Hudi integration
+- `java-extensions/paimon-reader/` - Apache Paimon reader
+- `java-extensions/jdbc-bridge/` - JDBC connectivity bridge
+- `java-extensions/hadoop-ext/` - Hadoop ecosystem integration
+- `java-extensions/udf-extensions/` - UDF extension framework
+- `java-extensions/common-runtime/` - Common runtime for Java extensions
+
+**📋 See `java-extensions/.cursorrules` for detailed extensions breakdown**
+
+### Generated Sources (gensrc/)
+**Purpose**: Auto-generated code from IDL definitions
+- `gensrc/proto/` - Protocol buffer definitions
+- `gensrc/thrift/` - Thrift interface definitions
+- `gensrc/script/` - Code generation scripts
+
+### Testing (test/)
+**Language**: Python
+**Purpose**: Integration and SQL testing framework
+- `test/sql/` - SQL test cases organized by functionality
+- `test/common/` - Common test utilities
+- `test/lib/` - Test libraries and helpers
+
+### Tools and Utilities
+- `tools/` - Diagnostic tools, benchmarks, and utilities
+- `bin/` - Binary executables and scripts
+- `conf/` - Configuration files and templates
+- `build-support/` - Build system support files
+- `docker/` - Docker build configurations
+- `docs/` - Project documentation
+
+### Third-party Dependencies
+- `thirdparty/` - External dependencies and patches
+- `licenses/` - License files for dependencies
+
+### Other Important Directories
+- `fs_brokers/` - File system broker implementations
+- `webroot/` - Web UI static files
+- `format-sdk/` - Format SDK for data interchange
+
+## Development Guidelines
+
+1. **No Building**: Avoid running build commands (`build.sh`, `make`, etc.) unless specifically requested
+2. **No Unit Tests**: Do not execute unit test scripts (`run-be-ut.sh`, `run-fe-ut.sh`, etc.) unless specifically requested
+3. **Focus on Code Analysis**: Prioritize code reading, analysis, and small targeted changes
+4. **Language Awareness**:
+   - Backend (be/) is C++ - focus on performance and memory management
+   - Frontend (fe/) is Java - focus on SQL parsing and query planning
+   - Integration tests (test/) are Python - focus on SQL correctness and end-to-end behavior
+
+## Pull Request Guidelines
+
+### PR Title Format
+PR titles must include a prefix to categorize the change:
+
+- **[BugFix]** - Bug fixes and error corrections
+- **[Enhancement]** - Improvements to existing functionality
+- **[Feature]** - New features and capabilities
+- **[Refactor]** - Code refactoring without functional changes
+- **[Test]** - Test-related changes
+- **[Doc]** - Documentation updates
+- **[Build]** - Build system and CI/CD changes
+- **[Performance]** - Performance optimizations
+
+**Examples:**
+- `[BugFix] Fix memory leak in column batch processing`
+- `[Feature] Add support for Apache Paimon connector`
+- `[Enhancement] Improve query optimizer for materialized views`
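+
+The title convention is simple enough to check mechanically. The following is a minimal sketch, not a tool that exists in the StarRocks repository or CI: the class name `PrTitleCheck` is hypothetical, the allowed prefixes are copied from the list above, and treating prefixes as case-sensitive is an assumption.
+
+```java
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+// Hypothetical PR title checker; not part of the StarRocks repository or CI.
+final class PrTitleCheck {
+    // Allowed prefixes, copied from the category list above.
+    static final List<String> CATEGORIES = List.of(
+            "BugFix", "Enhancement", "Feature", "Refactor",
+            "Test", "Doc", "Build", "Performance");
+
+    // "[Category] description": bracketed prefix, one space, then non-blank text.
+    private static final Pattern TITLE = Pattern.compile("^\\[([A-Za-z]+)\\] (\\S.*)$");
+
+    static boolean isValid(String title) {
+        Matcher m = TITLE.matcher(title);
+        // Case-sensitive match against the allowed list (an assumption).
+        return m.matches() && CATEGORIES.contains(m.group(1));
+    }
+
+    private PrTitleCheck() {}
+}
+```
+
+For example, `PrTitleCheck.isValid("[BugFix] Fix memory leak in column batch processing")` returns `true`, while `"[Bugfix] Fix memory leak"` and an unprefixed title are rejected.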
+
+### Commit Message Template
+Follow this structured format for all commit messages:
+
+```
+[Category] Brief description (50 chars or less)
+
+Detailed explanation of what this commit does and why.
+Wrap lines at 72 characters.
+
+- Key change 1
+- Key change 2
+- Key change 3
+
+Fixes: #issue_number (if applicable)
+Closes: #issue_number (if applicable)
+```
+
+**Categories:** BugFix, Enhancement, Feature, Refactor, Test, Doc, Build, Performance
+
+**Example:**
+```
+[Feature] Add Apache Iceberg table format support
+
+Implement Iceberg connector to enable querying Iceberg tables
+directly from StarRocks. This includes metadata reading,
+partition pruning, and schema evolution support.
+
+- Add IcebergConnector and IcebergMetadata classes
+- Implement partition and file pruning optimizations
+- Support for Iceberg v1 and v2 table formats
+- Add comprehensive unit tests
+
+Closes: #12345
+```
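+
+A rough lint for this template might look like the sketch below. It is illustrative only (`CommitMessageLint` is a hypothetical name, not an existing tool) and assumes the 50-character limit covers the whole subject line, category prefix included, which the example above satisfies.
+
+```java
+import java.util.ArrayList;
+import java.util.List;
+
+// Hypothetical lint for the commit message template; not an existing tool.
+// Assumes the 50-char limit applies to the whole subject, prefix included.
+final class CommitMessageLint {
+    static final int SUBJECT_MAX = 50;
+    static final int BODY_WRAP = 72;
+
+    // Returns human-readable problems; an empty list means the message passes.
+    static List<String> check(String message) {
+        List<String> problems = new ArrayList<>();
+        String[] lines = message.split("\n", -1);
+        String subject = lines[0];
+        if (subject.length() > SUBJECT_MAX) {
+            problems.add("subject is " + subject.length() + " chars (max " + SUBJECT_MAX + ")");
+        }
+        // The template separates subject and body with one blank line.
+        if (lines.length > 1 && !lines[1].isEmpty()) {
+            problems.add("line 2 must be blank");
+        }
+        for (int i = 2; i < lines.length; i++) {
+            if (lines[i].length() > BODY_WRAP) {
+                problems.add("line " + (i + 1) + " exceeds " + BODY_WRAP + " chars");
+            }
+        }
+        return problems;
+    }
+}
+```
+
+Run against the example message above, `check` returns an empty list; a 59-character subject line produces exactly one problem.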
+
+## Common File Extensions
+- `.cpp`, `.h`, `.cc` - C++ source and headers (backend)
+- `.java` - Java source files (frontend and extensions)
+- `.proto` - Protocol buffer definitions
+- `.thrift` - Thrift interface definitions
+- `.sql` - SQL test cases and queries
+- `.py` - Python test scripts
+
+## Build System Files to Avoid
+- `build.sh` - Main build script (very resource intensive)
+- `build-in-docker.sh` - Docker-based build
+- `run-*-ut.sh` - Unit test runners
+- `Makefile*` - Make build files
+- `pom.xml` - Maven build files (for Java components)
--- a/fe/.cursorrules
+++ b/fe/.cursorrules
@@ -0,0 +1,143 @@
+# StarRocks Frontend (fe/) Cursor Rules
+
+## Overview
+The frontend is the Java-based component responsible for SQL parsing, query planning, metadata management, and cluster coordination. It serves as the brain of StarRocks: it handles every SQL statement and coordinates distributed query execution on the backends.
+
+## ⚠️ BUILD WARNING
+**DO NOT attempt to build or run tests unless explicitly requested.** The build system is resource-intensive.
+
+## Frontend Architecture
+
+### Frontend Core (fe-core/)
+The main frontend module containing all core database functionality:
+
+#### Core SQL Processing
+- `fe-core/src/main/java/com/starrocks/sql/` - SQL processing pipeline
+  - `sql/parser/` - SQL parser (ANTLR-based)
+  - `sql/analyzer/` - SQL semantic analysis and validation
+  - `sql/ast/` - Abstract Syntax Tree definitions
+  - `sql/optimizer/` - Cost-based query optimizer
+  - `sql/plan/` - Physical query plan generation
+  - `sql/spm/` - SQL Plan Management
+
+#### Metadata Management
+- `fe-core/src/main/java/com/starrocks/catalog/` - Metadata catalog system
+  - `catalog/system/` - System tables and metadata
+  - `catalog/mv/` - Materialized view metadata
+  - `catalog/constraint/` - Table constraints management
+  - `catalog/combinator/` - Catalog combinators
+
+#### Query Execution
+- `fe-core/src/main/java/com/starrocks/qe/` - Query execution engine
+  - Core classes: `ConnectContext`, `StmtExecutor`, `DefaultCoordinator`
+  - Session management: `SessionVariable`, `ConnectProcessor`
+  - Query scheduling: `SimpleScheduler`, backend selectors
+  - Result processing: `ShowExecutor`, `ResultReceiver`
+
+#### Query Planning
+- `fe-core/src/main/java/com/starrocks/planner/` - Physical query planning
+  - `planner/stream/` - Stream processing plans
+
+#### External Connectors
+- `fe-core/src/main/java/com/starrocks/connector/` - External data source connectors
+  - `connector/hive/` - Apache Hive integration
+  - `connector/iceberg/` - Apache Iceberg support
+  - `connector/hudi/` - Apache Hudi integration
+  - `connector/jdbc/` - JDBC connectivity
+  - `connector/elasticsearch/` - Elasticsearch connector
+  - `connector/delta/` - Delta Lake support
+  - `connector/kudu/` - Apache Kudu connector
+  - `connector/odps/` - ODPS (MaxCompute) connector
+  - `connector/paimon/` - Apache Paimon connector
+
+#### Data Loading
+- `fe-core/src/main/java/com/starrocks/load/` - Data ingestion framework
+  - `load/loadv2/` - Load v2 implementation
+  - `load/routineload/` - Routine/streaming load
+  - `load/streamload/` - Stream loading
+  - `load/batchwrite/` - Batch write operations
+  - `load/pipe/` - Data pipeline management
+
+#### Storage & Persistence
+- `fe-core/src/main/java/com/starrocks/persist/` - Metadata persistence
+- `fe-core/src/main/java/com/starrocks/journal/` - Write-ahead logging
+- `fe-core/src/main/java/com/starrocks/meta/` - Metadata management
+
+#### Cluster Management
+- `fe-core/src/main/java/com/starrocks/system/` - System information service
+- `fe-core/src/main/java/com/starrocks/server/` - Server components and table factories
+- `fe-core/src/main/java/com/starrocks/ha/` - High availability
+- `fe-core/src/main/java/com/starrocks/leader/` - Leader election
+- `fe-core/src/main/java/com/starrocks/clone/` - Data replication
+
+#### Security & Access Control
+- `fe-core/src/main/java/com/starrocks/authentication/` - User authentication
+- `fe-core/src/main/java/com/starrocks/authorization/` - Access control
+- `fe-core/src/main/java/com/starrocks/credential/` - Credential management
+
+#### Advanced Features
+- `fe-core/src/main/java/com/starrocks/mv/` - Materialized views
+- `fe-core/src/main/java/com/starrocks/scheduler/` - Task scheduling
+- `fe-core/src/main/java/com/starrocks/statistic/` - Statistics collection
+- `fe-core/src/main/java/com/starrocks/warehouse/` - Warehouse (compute resource isolation) management
+- `fe-core/src/main/java/com/starrocks/lake/` - Shared-data (lake) mode support
+
+#### Monitoring & Operations
+- `fe-core/src/main/java/com/starrocks/monitor/` - System monitoring
+- `fe-core/src/main/java/com/starrocks/metric/` - Metrics collection
+- `fe-core/src/main/java/com/starrocks/http/` - HTTP API endpoints
+
+### Other Frontend Modules
+- `fe-common/` - Common frontend utilities and shared code
+- `plugin-common/` - Plugin framework common components
+- `spark-dpp/` - Spark data preprocessing integration
+- `hive-udf/` - Hive UDF compatibility layer
+
+## Development Guidelines
+
+### Key Entry Points
+- `fe-core/src/main/java/com/starrocks/qe/StmtExecutor.java` - Main statement execution
+- `fe-core/src/main/java/com/starrocks/qe/ConnectContext.java` - Session context
+- `fe-core/src/main/java/com/starrocks/server/GlobalStateMgr.java` - Global state management
+
+### SQL Processing Flow
+1. **Parser** (`sql/parser/`) - Parse SQL text to AST
+2. **Analyzer** (`sql/analyzer/`) - Semantic analysis and validation
+3. **Optimizer** (`sql/optimizer/`) - Cost-based optimization
+4. **Planner** (`planner/`) - Generate physical execution plan
+5. **Executor** (`qe/`) - Execute the plan
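+
+One way to picture this flow is as a chain of typed stages, each accepting only the previous stage's output. The sketch below is a deliberately simplified model: none of its types (`Ast`, `AnalyzedAst`, `OptimizedPlan`, `PhysicalPlan`, `Result`) exist in fe-core, and the real stages do far more than wrap their input.
+
+```java
+import java.util.function.Function;
+
+// Hypothetical, heavily simplified model of the five stages; not fe-core types.
+final class SqlFlowSketch {
+    record Ast(String sql) {}                          // sql/parser/: text to AST
+    record AnalyzedAst(Ast ast) {}                     // sql/analyzer/: names and types resolved
+    record OptimizedPlan(AnalyzedAst input) {}         // sql/optimizer/: cost-based rewrite
+    record PhysicalPlan(OptimizedPlan input) {}        // planner/: fragments for backends
+    record Result(PhysicalPlan plan, String status) {} // qe/: executed plan
+
+    static final Function<String, Ast> PARSE = Ast::new;
+    static final Function<Ast, AnalyzedAst> ANALYZE = AnalyzedAst::new;
+    static final Function<AnalyzedAst, OptimizedPlan> OPTIMIZE = OptimizedPlan::new;
+    static final Function<OptimizedPlan, PhysicalPlan> PLAN = PhysicalPlan::new;
+    static final Function<PhysicalPlan, Result> EXECUTE = p -> new Result(p, "OK");
+
+    // The compiler enforces the stage order: each stage accepts only the
+    // previous stage's output type.
+    static final Function<String, Result> PIPELINE =
+            PARSE.andThen(ANALYZE).andThen(OPTIMIZE).andThen(PLAN).andThen(EXECUTE);
+}
+```
+
+Swapping two stages in `PIPELINE` (for example, optimizing before analyzing) would fail to compile, which mirrors the strict ordering of the list above.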
+
+### Common Patterns
+- Most core classes extend from `GsonSerializable` for persistence
+- Use `ConnectContext.get()` to access current session context
+- Metadata operations go through `GlobalStateMgr.getCurrentState()`
+- External connectors implement `Connector` and `ConnectorMetadata` interfaces
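+
+The `ConnectContext.get()` pattern is typically backed by a thread-local variable that binds a session to the thread serving it. The sketch below illustrates that general mechanism with a hypothetical `SessionContext` class; it is an assumption about the idea, not a copy of the fe-core implementation. Restoring the previous binding in `finally` matters when threads are pooled and reused across sessions.
+
+```java
+import java.util.function.Supplier;
+
+// Hypothetical stand-in for the ConnectContext.get() pattern; not fe-core code.
+final class SessionContext {
+    // One slot per thread: each connection-handling thread sees only its own session.
+    private static final ThreadLocal<SessionContext> CURRENT = new ThreadLocal<>();
+
+    private final String user;
+
+    SessionContext(String user) {
+        this.user = user;
+    }
+
+    String user() {
+        return user;
+    }
+
+    // Returns the session bound to the calling thread, or null if none is bound.
+    static SessionContext get() {
+        return CURRENT.get();
+    }
+
+    // Binds ctx for the duration of body, then restores whatever was bound before,
+    // so a pooled thread never leaks one session into the next task.
+    static <T> T runAs(SessionContext ctx, Supplier<T> body) {
+        SessionContext previous = CURRENT.get();
+        CURRENT.set(ctx);
+        try {
+            return body.get();
+        } finally {
+            if (previous == null) {
+                CURRENT.remove();
+            } else {
+                CURRENT.set(previous);
+            }
+        }
+    }
+}
+```
+
+Outside `runAs`, `SessionContext.get()` returns `null` on that thread, so code relying on the pattern has to be reached from within a bound session.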
+
+### Testing
+- Unit tests are in `fe-core/src/test/`
+- Integration tests use SQL files in the repository-root `test/sql/` directory
+- Mock objects are in `fe-core/src/test/java/com/starrocks/utframe/`
+
+## Contribution Guidelines
+
+### PR Titles for Frontend Changes
+Use appropriate prefixes for frontend-related PRs:
+- `[BugFix] Fix SQL parser issue with complex expressions`
+- `[Feature] Add materialized view automatic refresh`
+- `[Enhancement] Improve connector metadata caching`
+- `[Performance] Optimize query planner for large joins`
+
+### Commit Message Examples for Frontend
+```
+[Feature] Add support for Apache Paimon connector
+
+Implement Paimon connector in fe-core to enable querying
+Paimon tables with full metadata integration.
+
+- Add PaimonConnector and PaimonMetadata classes
+- Implement schema evolution and partition pruning
+- Add connector configuration and validation
+- Include comprehensive unit tests
+
+Closes: #12345
+```
--- a/java-extensions/.cursorrules
+++ b/java-extensions/.cursorrules
@@ -0,0 +1,177 @@
+# StarRocks Java Extensions Cursor Rules
+
+## Overview
+Java Extensions provide connectivity to external data sources and extend StarRocks functionality through Java-based components. These extensions enable StarRocks to read from various external systems and provide extensibility through user-defined functions.
+
+## ⚠️ BUILD WARNING
+**DO NOT attempt to build or run tests unless explicitly requested.** The Maven build system can be resource-intensive.
+
+## Java Extensions Architecture
+
+### External Data Connectors
+
+#### Hadoop Ecosystem
+- `hadoop-ext/` - Hadoop ecosystem integration
+  - Core Hadoop file system support
+  - Hadoop configuration management
+  - Security integration (Kerberos)
+
+#### Data Lake Formats
+- `hive-reader/` - Apache Hive data reader
+  - Hive metastore integration
+  - Hive table format support
+  - Partition handling
+
+- `hudi-reader/` - Apache Hudi integration
+  - Copy-on-write and merge-on-read tables
+  - Timeline and metadata handling
+  - Incremental query support
+
+- `iceberg-metadata-reader/` - Apache Iceberg metadata reader
+  - Iceberg table format support
+  - Snapshot and schema evolution
+  - Partition and file pruning
+
+- `paimon-reader/` - Apache Paimon reader
+  - Paimon table format support
+  - Real-time and batch data access
+  - Schema evolution handling
+
+#### NoSQL and Analytics
+- `kudu-reader/` - Apache Kudu connector
+  - Kudu table scanning
+  - Predicate pushdown
+  - Column pruning
+
+- `odps-reader/` - ODPS (MaxCompute) reader
+  - Alibaba Cloud MaxCompute integration
+  - Table and partition access
+  - Data type mapping
+
+#### Connectivity
+- `jdbc-bridge/` - JDBC connectivity bridge
+  - Generic JDBC data source support
+  - Connection pooling
+  - Query pushdown capabilities
+
+- `jni-connector/` - JNI connectors for C++ integration
+  - Bridge between Java extensions and C++ backend
+  - Memory management for cross-language calls
+  - Type conversion utilities
+
+### Runtime and Utilities
+
+#### Core Runtime
+- `common-runtime/` - Common runtime for Java extensions
+  - Shared utilities and base classes
+  - Configuration management
+  - Logging and error handling
+
+#### Development Tools
+- `java-utils/` - Java utilities and helper classes
+  - Common data structures
+  - Utility functions
+  - Helper methods for connector development
+
+#### User-Defined Functions
+- `udf-extensions/` - UDF extension framework
+  - UDF registration and lifecycle management
+  - Type system integration
+  - Performance optimization
+
+- `udf-examples/` - User-defined function examples
+  - Sample UDF implementations
+  - Best practices and patterns
+  - Testing examples
+
+### Dependencies
+- `hadoop-lib/` - Hadoop library dependencies
+  - Hadoop client libraries
+  - Version management
+  - Compatibility handling
+
+## Development Guidelines
+
+### Project Structure
+- Each extension follows the standard Maven directory layout
+- `src/main/java/` - Main source code
+- `src/test/java/` - Unit tests
+- `pom.xml` - Maven build configuration
+
+### Key Interfaces
+- `Connector` - Main connector interface
+- `ConnectorMetadata` - Metadata operations
+- `ConnectorScanRangeSource` - Data scanning
+- `RemoteFileIO` - File I/O operations
+
+### Common Patterns
+- **Builder Pattern**: Used for configuration objects
+- **Factory Pattern**: For creating connector instances
+- **Template Method**: For common connector operations
+- **Strategy Pattern**: For different data access strategies
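+
+As an illustration of the Builder pattern for configuration objects, here is a hypothetical `ScanConfig`; its fields and the default batch size are invented for this example and do not come from any java-extensions module. Required values go through the builder's constructor, optional ones through chained setters, and validation happens before the immutable object is created.
+
+```java
+import java.util.Objects;
+
+// Hypothetical configuration object; fields and defaults are illustrative only.
+final class ScanConfig {
+    private final String tableName;
+    private final int batchSize;
+    private final boolean pushdownEnabled;
+
+    private ScanConfig(Builder b) {
+        this.tableName = b.tableName;
+        this.batchSize = b.batchSize;
+        this.pushdownEnabled = b.pushdownEnabled;
+    }
+
+    String tableName() { return tableName; }
+    int batchSize() { return batchSize; }
+    boolean pushdownEnabled() { return pushdownEnabled; }
+
+    static Builder builder(String tableName) {
+        return new Builder(tableName);
+    }
+
+    static final class Builder {
+        private final String tableName;         // required
+        private int batchSize = 4096;           // optional, illustrative default
+        private boolean pushdownEnabled = true; // optional
+
+        private Builder(String tableName) {
+            this.tableName = Objects.requireNonNull(tableName, "tableName");
+        }
+
+        Builder batchSize(int n) {
+            if (n <= 0) {
+                throw new IllegalArgumentException("batchSize must be > 0");
+            }
+            this.batchSize = n;
+            return this;
+        }
+
+        Builder pushdownEnabled(boolean on) {
+            this.pushdownEnabled = on;
+            return this;
+        }
+
+        ScanConfig build() {
+            return new ScanConfig(this);
+        }
+    }
+}
+```
+
+`ScanConfig.builder("orders").batchSize(1024).build()` yields a config with pushdown left at its default of `true`.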
+
+### Data Type Mapping
+- Consistent mapping between external system types and StarRocks types
+- Handle nullable vs non-nullable types appropriately
+- Support for complex types (arrays, maps, structs) where applicable
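+
+A minimal sketch of such a mapping, using Hive-style type names as the external side. The mapping table is an illustrative subset chosen for this example, not the authoritative one used by StarRocks connectors; `MAP` and `STRUCT` are left out to keep the parser short, and nullability is passed in separately rather than inferred.
+
+```java
+import java.util.Locale;
+import java.util.Map;
+
+// Illustrative subset only; not the mapping used by StarRocks connectors.
+final class TypeMappingSketch {
+    private static final Map<String, String> PRIMITIVES = Map.of(
+            "boolean", "BOOLEAN",
+            "tinyint", "TINYINT",
+            "smallint", "SMALLINT",
+            "int", "INT",
+            "bigint", "BIGINT",
+            "float", "FLOAT",
+            "double", "DOUBLE",
+            "string", "VARCHAR",
+            "date", "DATE",
+            "timestamp", "DATETIME");
+
+    // Maps one type, recursing into array<...> element types.
+    static String toStarRocks(String externalType) {
+        String t = externalType.trim().toLowerCase(Locale.ROOT);
+        if (t.startsWith("array<") && t.endsWith(">")) {
+            String element = t.substring("array<".length(), t.length() - 1);
+            return "ARRAY<" + toStarRocks(element) + ">";
+        }
+        String mapped = PRIMITIVES.get(t);
+        if (mapped == null) {
+            throw new IllegalArgumentException("unsupported external type: " + externalType);
+        }
+        return mapped;
+    }
+
+    // Nullability is carried separately from the type itself.
+    static String columnType(String externalType, boolean nullable) {
+        return toStarRocks(externalType) + (nullable ? " NULL" : " NOT NULL");
+    }
+}
+```
+
+For instance, `toStarRocks("array<array<int>>")` returns `ARRAY<ARRAY<INT>>`, and unknown types fail fast with an `IllegalArgumentException` instead of being silently coerced.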
+
+### Performance Considerations
+- **Predicate Pushdown**: Push filters to external systems when possible
+- **Column Pruning**: Only read required columns
+- **Partition Pruning**: Skip unnecessary partitions
+- **Parallel Processing**: Support parallel data reading
+- **Memory Management**: Efficient memory usage for large datasets
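+
+Predicate pushdown and column pruning both aim to reduce what crosses the boundary between the external system and StarRocks. The following in-memory sketch shows the idea with a hypothetical `PushdownSketch.scan` method; rows are modeled as maps purely for illustration, whereas a real reader would push the filter into the external system's own API or file format.
+
+```java
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.function.Predicate;
+
+// Hypothetical in-memory "external source"; not a java-extensions API.
+final class PushdownSketch {
+    static List<List<Object>> scan(List<Map<String, Object>> source,
+                                   List<String> requiredColumns,
+                                   Predicate<Map<String, Object>> pushedFilter) {
+        List<List<Object>> out = new ArrayList<>();
+        for (Map<String, Object> row : source) {
+            if (!pushedFilter.test(row)) {
+                continue; // predicate pushdown: rejected rows never leave the source
+            }
+            List<Object> projected = new ArrayList<>(requiredColumns.size());
+            for (String col : requiredColumns) {
+                projected.add(row.get(col)); // column pruning: copy only needed columns
+            }
+            out.add(projected);
+        }
+        return out;
+    }
+}
+```
+
+Filtering before projecting means rejected rows cost nothing beyond the predicate check, and unrequested columns such as large payloads are never copied.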
+
+### Error Handling
+- Use StarRocks exception hierarchy
+- Provide meaningful error messages
+- Handle connection failures gracefully
+- Implement retry mechanisms where appropriate
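+
+A minimal sketch of a retry mechanism with exponential backoff. `RetrySketch` is a hypothetical helper, not an existing java-extensions utility; the caller decides which exceptions count as transient, so errors such as bad credentials propagate immediately instead of being retried.
+
+```java
+import java.util.concurrent.Callable;
+import java.util.function.Predicate;
+
+// Hypothetical retry helper; not an existing java-extensions utility.
+final class RetrySketch {
+    // Calls `call` up to maxAttempts times, sleeping initialBackoffMs, then twice
+    // that, and so on between attempts. Exceptions rejected by isTransient, and
+    // the failure of the last attempt, are rethrown unchanged.
+    static <T> T withRetry(Callable<T> call, int maxAttempts, long initialBackoffMs,
+                           Predicate<Exception> isTransient) throws Exception {
+        if (maxAttempts < 1) {
+            throw new IllegalArgumentException("maxAttempts must be >= 1");
+        }
+        long backoffMs = initialBackoffMs;
+        for (int attempt = 1; ; attempt++) {
+            try {
+                return call.call();
+            } catch (Exception e) {
+                if (attempt >= maxAttempts || !isTransient.test(e)) {
+                    throw e;
+                }
+                Thread.sleep(backoffMs);
+                backoffMs *= 2; // double the wait after each failed attempt
+            }
+        }
+    }
+}
+```
+
+A production version would also cap the backoff and add jitter; both are omitted here for brevity.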
+
+### Configuration
+- Support both system-wide and per-table configuration
+- Use consistent naming conventions for properties
+- Provide sensible defaults
+- Document all configuration options
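+
+Per-table overrides on top of system-wide settings can be resolved with a simple lookup chain. In this sketch the class `LayeredConfig` and the `jdbc.*` property names in the usage note are hypothetical: a per-table value wins, then the system-wide value, then the documented default.
+
+```java
+import java.util.Map;
+import java.util.Optional;
+
+// Hypothetical layered property lookup; not an existing StarRocks class.
+final class LayeredConfig {
+    private final Map<String, String> systemProps;
+    private final Map<String, String> tableProps;
+
+    LayeredConfig(Map<String, String> systemProps, Map<String, String> tableProps) {
+        this.systemProps = Map.copyOf(systemProps);
+        this.tableProps = Map.copyOf(tableProps);
+    }
+
+    // Per-table value first, then system-wide value, then the caller's default.
+    String get(String key, String defaultValue) {
+        return Optional.ofNullable(tableProps.get(key))
+                .or(() -> Optional.ofNullable(systemProps.get(key)))
+                .orElse(defaultValue);
+    }
+
+    int getInt(String key, int defaultValue) {
+        String raw = get(key, null);
+        if (raw == null) {
+            return defaultValue;
+        }
+        try {
+            return Integer.parseInt(raw.trim());
+        } catch (NumberFormatException e) {
+            // A meaningful message names the offending property and value.
+            throw new IllegalArgumentException("property '" + key + "' is not an integer: " + raw, e);
+        }
+    }
+}
+```
+
+With system properties `jdbc.pool.size=8` and a table property `jdbc.pool.size=2`, `getInt("jdbc.pool.size", 4)` returns `2`; a key set nowhere falls back to its default.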
+
+### Testing
+- Unit tests for core functionality
+- Integration tests with external systems (when available)
+- Mock external dependencies for reliable testing
+- Performance benchmarks for critical paths
+
+### Security
+- Support authentication mechanisms of external systems
+- Handle credentials securely
+- Support encryption in transit
+- Implement proper access control
+
+## Build System
+- The parent `pom.xml` at the root of `java-extensions/` manages all extensions
+- Each extension has its own Maven module
+- Shared dependencies managed at parent level
+- Profiles for different build configurations
+
+## Contribution Guidelines
+
+### PR Titles for Java Extensions Changes
+Use appropriate prefixes for Java extensions PRs:
+- `[BugFix] Fix Hive partition metadata reading issue`
+- `[Feature] Add Delta Lake deletion vector support`
+- `[Enhancement] Improve JDBC connector connection pooling`
+- `[Performance] Optimize Iceberg metadata caching`
+
+### Commit Message Examples for Java Extensions
+```
+[Feature] Add Kudu connector predicate pushdown
+
+Implement predicate pushdown optimization for Kudu connector
+to reduce data transfer and improve query performance.
+
+- Add predicate conversion from StarRocks to Kudu format
+- Implement column pruning optimization
+- Add support for complex predicate expressions
+- Include integration tests with Kudu test cluster
+
+Closes: #12345
+```