# StarRocks Project Cursor Rules

## Project Overview
StarRocks is an open-source, high-performance analytical database system designed for real-time analytics. This is a large-scale C++/Java project with a complex build system.

## ⚠️ IMPORTANT BUILD SYSTEM WARNING
**DO NOT attempt to build or run unit tests (UT) for this project unless explicitly requested by the user.**
The build is resource-intensive: a full build can take hours and requires significant system resources.

## Code Organization

### Backend (be/)
**Language**: C++
**Purpose**: Core analytical engine and storage layer
- `be/src/exec/` - Query execution engine components
- `be/src/storage/` - Storage engine and data persistence
- `be/src/exprs/` - Expression evaluation and JIT compilation
- `be/src/formats/` - Data format parsers and serializers
- `be/src/runtime/` - Runtime components (batch write, stream load, memory management, etc.)
- `be/src/connector/` - External data source connectors
- `be/src/service/` - Core backend services
- `be/src/common/` - Common utilities and shared code

**📋 See `be/.cursorrules` for detailed backend component breakdown**

### Frontend (fe/)
**Language**: Java
**Purpose**: SQL parsing, query planning, and metadata management
- `fe/fe-core/` - Core frontend services (SQL parser, planner, catalog)
- `fe/fe-testing/` - Common test utilities
- `fe/fe-utils/` - Common utilities and helpers
- `fe/spark-dpp/` - Spark data preprocessing integration
- `fe/hive-udf/` - Hive UDF compatibility layer

**📋 See `fe/.cursorrules` for detailed frontend component breakdown**

### Java Extensions (java-extensions/)
**Language**: Java
**Purpose**: External connectors and extensions
- `java-extensions/hive-reader/` - Hive data reader
- `java-extensions/iceberg-metadata-reader/` - Apache Iceberg metadata reader
- `java-extensions/hudi-reader/` - Apache Hudi integration
- `java-extensions/paimon-reader/` - Apache Paimon reader
- `java-extensions/jdbc-bridge/` - JDBC connectivity bridge
- `java-extensions/hadoop-ext/` - Hadoop ecosystem integration
- `java-extensions/udf-extensions/` - UDF extension framework
- `java-extensions/common-runtime/` - Common runtime for Java extensions

**📋 See `java-extensions/.cursorrules` for detailed extensions breakdown**

### Generated Sources (gensrc/)
**Purpose**: Auto-generated code from IDL definitions
- `gensrc/proto/` - Protocol buffer definitions
- `gensrc/thrift/` - Thrift interface definitions
- `gensrc/script/` - Code generation scripts

### Testing (test/)
**Language**: Python
**Purpose**: Integration and SQL testing framework
- `test/sql/` - SQL test cases organized by functionality
- `test/common/` - Common test utilities
- `test/lib/` - Test libraries and helpers

### Tools and Utilities
- `tools/` - Diagnostic tools, benchmarks, and utilities
- `bin/` - Binary executables and scripts
- `conf/` - Configuration files and templates
- `build-support/` - Build system support files
- `docker/` - Docker build configurations
- `docs/` - Project documentation

### Third-party Dependencies
- `thirdparty/` - External dependencies and patches
- `licenses/` - License files for dependencies

### Other Important Directories
- `fs_brokers/` - File system broker implementations
- `webroot/` - Web UI static files
- `format-sdk/` - Format SDK for data interchange

## Development Guidelines

1. **No Building**: Do not run build commands (`build.sh`, `make`, etc.) unless explicitly requested
2. **No Unit Tests**: Do not execute unit test scripts (`run-be-ut.sh`, `run-fe-ut.sh`, etc.) unless explicitly requested
3. **Focus on Code Analysis**: Prioritize code reading, analysis, and small targeted changes
4. **Language Awareness**:
   - Backend (be/) is C++ - focus on performance and memory management
   - Frontend (fe/) is Java - focus on SQL parsing and query planning
   - Tests are Python - focus on SQL correctness and integration testing
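
The language split above can be sketched as a lookup keyed on a path's top-level directory. This is only an illustration: `COMPONENT_LANGUAGE` and `language_for` are hypothetical names, and the mapping covers just the directories whose language this file states.

```python
from pathlib import PurePosixPath
from typing import Optional

# Top-level directory -> primary language, as listed in Code Organization.
COMPONENT_LANGUAGE = {
    "be": "C++",
    "fe": "Java",
    "java-extensions": "Java",
    "test": "Python",
}

def language_for(path: str) -> Optional[str]:
    """Return the primary language for a repo-relative path, or None if unlisted."""
    parts = PurePosixPath(path).parts
    if not parts:
        return None
    return COMPONENT_LANGUAGE.get(parts[0])
```

A path under a directory with no stated language (for example `docs/`) returns `None` rather than a guess.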

## Pull Request Guidelines

### PR Title Format
PR titles must include a prefix to categorize the change:

- **[BugFix]** - Bug fixes and error corrections
- **[Enhancement]** - Improvements to existing functionality
- **[Feature]** - New features and capabilities
- **[Refactor]** - Code refactoring without functional changes
- **[Test]** - Test-related changes
- **[Doc]** - Documentation updates
- **[Build]** - Build system and CI/CD changes
- **[Performance]** - Performance optimizations

**Examples:**
- `[BugFix] Fix memory leak in column batch processing`
- `[Feature] Add support for Apache Paimon connector`
- `[Enhancement] Improve query optimizer for materialized views`
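
A minimal sketch of how the prefix rule above could be checked before opening a PR. It assumes the prefix is matched case-sensitively and is followed by a single space and a non-empty description; neither detail is stated in this file. `is_valid_pr_title` is a hypothetical helper, not part of the StarRocks tooling.

```python
import re

# Prefixes listed in the PR Title Format section.
PR_PREFIXES = ("BugFix", "Enhancement", "Feature", "Refactor",
               "Test", "Doc", "Build", "Performance")

# Assumed shape: "[Prefix] description" with exactly one space after the bracket.
_TITLE_RE = re.compile(r"\[(?:" + "|".join(PR_PREFIXES) + r")\] \S")

def is_valid_pr_title(title: str) -> bool:
    """Return True if the title begins with one of the listed prefixes."""
    return _TITLE_RE.match(title) is not None
```

For example, `is_valid_pr_title("[Feature] Add support for Apache Paimon connector")` is accepted, while a title with no prefix is rejected.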

### Commit Message Template
Follow this structured format for all commit messages:

```
[Category] Brief description (50 characters or fewer)

Detailed explanation of what this commit does and why.
Wrap lines at 72 characters.

- Key change 1
- Key change 2
- Key change 3

Fixes: #issue_number (if applicable)
Closes: #issue_number (if applicable)
```

**Categories:** BugFix, Enhancement, Feature, Refactor, Test, Doc, Build, Performance

**Example:**
```
[Feature] Add Apache Iceberg table format support

Implement Iceberg connector to enable querying Iceberg tables
directly from StarRocks. This includes metadata reading,
partition pruning, and schema evolution support.

- Add IcebergConnector and IcebergMetadata classes
- Implement partition and file pruning optimizations
- Support Iceberg v1 and v2 table formats
- Add comprehensive unit tests

Closes: #12345
```
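
The template's mechanical rules (category prefix, subject length, blank separator line, 72-character wrap) can be sketched as a small checker. This is an illustration under assumptions: the 50-character limit is taken to cover the whole subject line including the `[Category]` prefix, and `check_commit_message` is a hypothetical helper rather than an existing project script.

```python
from typing import List

# Categories listed in the Commit Message Template section.
CATEGORIES = ("BugFix", "Enhancement", "Feature", "Refactor",
              "Test", "Doc", "Build", "Performance")

def check_commit_message(message: str) -> List[str]:
    """Return a list of template violations; an empty list means none were found."""
    lines = message.splitlines()
    if not lines:
        return ["message is empty"]
    problems = []
    subject = lines[0]
    # The subject must open with "[Category] " using one of the listed names.
    if not subject.startswith(tuple(f"[{c}] " for c in CATEGORIES)):
        problems.append("subject does not start with a listed [Category]")
    if len(subject) > 50:
        problems.append("subject is longer than 50 characters")
    # A blank line separates the subject from the body.
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line is not blank")
    # Body lines wrap at 72 characters.
    for number, line in enumerate(lines[1:], start=2):
        if len(line) > 72:
            problems.append(f"line {number} is longer than 72 characters")
    return problems
```

The checker only covers layout; whether the description explains what the commit does and why still needs a human reader.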

## Common File Extensions
- `.cpp`, `.h`, `.cc` - C++ source and headers (backend)
- `.java` - Java source files (frontend and extensions)
- `.proto` - Protocol buffer definitions
- `.thrift` - Thrift interface definitions
- `.sql` - SQL test cases and queries
- `.py` - Python test scripts

## Build System Files to Avoid
Do not run these scripts or invoke these build files unless explicitly requested:

- `build.sh` - Main build script (very resource-intensive)
- `build-in-docker.sh` - Docker-based build
- `run-*-ut.sh` - Unit test runners
- `Makefile*` - Make build files
- `pom.xml` - Maven build files (for Java components)
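
One way to apply the list above is a guard that flags a shell command before it is run. The sketch below is hypothetical: `is_build_command` and `BLOCKED_SCRIPTS` are illustrative names, only the script patterns named in this file plus `make` are covered, and only the executable at the start of the command is inspected (after skipping a leading `bash` or `sh`).

```python
import fnmatch
import os
import shlex

# Script names and patterns taken from the list above.
BLOCKED_SCRIPTS = ("build.sh", "build-in-docker.sh", "run-*-ut.sh")

def is_build_command(command: str) -> bool:
    """Return True if the command would execute a listed build or UT script, or make."""
    tokens = shlex.split(command)
    # Skip a leading shell interpreter such as `bash` or `sh`.
    while tokens and os.path.basename(tokens[0]) in ("bash", "sh"):
        tokens = tokens[1:]
    if not tokens:
        return False
    name = os.path.basename(tokens[0])
    if name == "make":
        return True
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in BLOCKED_SCRIPTS)
```

Because only the executable is checked, reading a script (for example `cat build.sh`) is not flagged, which keeps code analysis unrestricted. Commands that reach the build indirectly, such as through Maven, are outside this sketch.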