# Symbol Conflict Detection - Implementation Guide

This guide explains the implementation details of the Symbol Conflict Detection System and how to integrate it into your development workflow.

## Architecture Overview

### System Components

```
┌─────────────────────────────────────────────────────────┐
│ Compiled Object Files (.o / .obj)                       │
│ (Created during cmake --build)                          │
└──────────────────┬──────────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────────────────┐
│ extract-symbols.sh                                      │
│ ├─ Scan object files in build/                          │
│ ├─ Use nm (Unix/macOS) or dumpbin (Windows)             │
│ ├─ Extract symbol definitions (skip undefined refs)     │
│ └─ Generate JSON database                               │
└──────────────────┬──────────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────────────────┐
│ symbol_database.json                                    │
│ ├─ Metadata (platform, timestamp, stats)                │
│ ├─ Conflicts array (symbols defined multiple times)     │
│ └─ Symbols dict (full mapping)                          │
└──────────────────┬──────────────────────────────────────┘
                   │
      ┌────────────┼────────────┐
      │            │            │
      ▼            ▼            ▼
  ┌──────────────────────────────────┐
  │ check-duplicate-symbols.sh       │
  │ └─ Parse JSON & report conflicts │
  └──────────────────────────────────┘
      │            │            │
      │            │            │
    [CLI]    [Pre-Commit]    [CI/CD]
```

## Script Implementation Details

### 1. extract-symbols.sh

**Purpose:** Extract all symbol definitions from object files

**Key Functions:**

#### Symbol Extraction (Unix/macOS)
```bash
nm -P <obj_file>  # Parse format: SYMBOL TYPE [VALUE] [SIZE]
```

Format:
- Column 1: Symbol name
- Column 2: Symbol type (T=text, D=data, R=read-only, etc.)
- Column 3: Address (if defined)
- Column 4: Size

Filtering logic:
1. Skip lines whose symbol name starts with a space
2. Skip symbols with `U` in the type column (undefined references)
3. Keep all other symbol types: T, D, R, B, C, etc.

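The filtering rules above can be sketched in Python as follows. This is a minimal illustration, not the code in `extract-symbols.sh`: the `Symbol` tuple, the `parse_nm_posix` function, and the sample `nm -P` output are all hypothetical names and data chosen for the example.

```python
from typing import Iterator, NamedTuple


class Symbol(NamedTuple):
    """One defined symbol (hypothetical record type for this sketch)."""
    name: str
    sym_type: str
    object_file: str


def parse_nm_posix(output: str, object_file: str) -> Iterator[Symbol]:
    """Yield defined symbols from text in `nm -P` format: NAME TYPE [VALUE] [SIZE]."""
    for line in output.splitlines():
        # Rule 1: skip blank lines and lines whose name column starts with a space.
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # not a NAME TYPE record (for example a "file:" header line)
        name, sym_type = fields[0], fields[1]
        # Rule 2: "U" marks an undefined reference, not a definition.
        if sym_type == "U":
            continue
        # Rule 3: every other type (T, D, R, B, C, ...) is kept.
        yield Symbol(name, sym_type, object_file)


# Made-up sample input; real output depends on the object file and platform.
sample = """\
_main T 0000000000000000 0000000000000040
FLAGS_rom D 0000000000000100 0000000000000008
_printf U
"""
symbols = list(parse_nm_posix(sample, "flags.cc.o"))
for symbol in symbols:
    print(symbol.name, symbol.sym_type, symbol.object_file)
```

Parsing from a string rather than invoking `nm` directly keeps the filter easy to test without a build directory.
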
#### Symbol Extraction (Windows)
```bash
dumpbin /symbols <obj_file>  # Parse the text dump of the COFF symbol table
```

Note: Windows extraction is less precise than on Unix; symbol types are approximated.

#### JSON Generation
Uses Python 3 for portability:
1. Read all extracted symbols from temp file
2. Group by symbol name
3. Identify conflicts (count > 1)
4. Generate structured JSON
5. Sort conflicts by count (most duplicated first)

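As a rough sketch of the grouping and conflict steps, assuming the records have already been read into `(symbol, type, object_file)` tuples: the function name `build_database` and its parameters are illustrative, and `total_symbols` is assumed here to count distinct symbol names. The field names follow the schema documented in the JSON Database Schema section.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def build_database(records, platform="Linux", build_dir="build", object_files_scanned=0):
    """Group (symbol, type, object_file) records and collect conflicts."""
    by_name = defaultdict(list)
    for name, sym_type, object_file in records:
        by_name[name].append({"object_file": object_file, "type": sym_type})

    # A conflict is any symbol name with more than one definition.
    conflicts = [
        {"symbol": name, "count": len(defs), "definitions": defs}
        for name, defs in by_name.items()
        if len(defs) > 1
    ]
    # Most duplicated first; ties are broken by name so the output is stable.
    conflicts.sort(key=lambda c: (-c["count"], c["symbol"]))

    return {
        "metadata": {
            "platform": platform,
            "build_dir": build_dir,
            "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "object_files_scanned": object_files_scanned,
            "total_symbols": len(by_name),  # assumption: distinct names
            "total_conflicts": len(conflicts),
        },
        "conflicts": conflicts,
        # Only conflicted symbols are stored, as described in the schema notes.
        "symbols": {c["symbol"]: c["definitions"] for c in conflicts},
    }


records = [
    ("FLAGS_rom", "D", "flags.cc.o"),
    ("FLAGS_rom", "D", "emu_test.cc.o"),
    ("main", "T", "main.cc.o"),
]
db = build_database(records, object_files_scanned=3)
print(json.dumps(db["conflicts"], indent=2))
```
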
**Performance Considerations:**
- Process all 4000+ object files sequentially
- `nm` is fast (~1 ms per file on macOS)
- Python JSON generation is <100 ms
- Total: ~2-3 seconds for typical builds

### 2. check-duplicate-symbols.sh

**Purpose:** Analyze symbol database and report conflicts

**Algorithm:**
1. Parse JSON database
2. Extract metadata and conflicts array
3. For each conflict:
   - Print symbol name
   - List all definitions with object files and types
4. Exit with code based on conflict count

**Output Formatting:**
- Colors for readability (RED for errors, GREEN for success)
- Structured output with proper indentation
- Fix suggestions (when the `--fix-suggestions` flag is passed)

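The reporting loop can be illustrated with a short Python sketch. It assumes the database layout from the JSON Database Schema section; the function name `report_conflicts`, the message wording, and the choice of exit code `1` for "conflicts found" are assumptions for the example rather than the script's confirmed behavior.

```python
import json

# ANSI color codes: red for errors, green for success.
RED, GREEN, RESET = "\033[0;31m", "\033[0;32m", "\033[0m"


def report_conflicts(db: dict) -> int:
    """Print every conflict and return a shell-style exit code."""
    conflicts = db.get("conflicts", [])
    if not conflicts:
        print(f"{GREEN}No symbol conflicts found{RESET}")
        return 0
    print(f"{RED}{len(conflicts)} symbol conflict(s) found{RESET}")
    for conflict in conflicts:
        print(f"  {conflict['symbol']} (defined {conflict['count']} times)")
        for definition in conflict["definitions"]:
            print(f"    - {definition['object_file']} (type {definition['type']})")
    return 1  # assumption: any conflict maps to exit code 1


db = json.loads("""
{
  "conflicts": [
    {
      "symbol": "FLAGS_rom",
      "count": 2,
      "definitions": [
        {"object_file": "flags.cc.o", "type": "D"},
        {"object_file": "emu_test.cc.o", "type": "D"}
      ]
    }
  ]
}
""")
exit_code = report_conflicts(db)
```

A wrapper script would pass `exit_code` to `sys.exit()` so that a pre-commit hook or CI job can act on it.
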
### 3. Pre-commit Hook (`.githooks/pre-commit`)

**Purpose:** Fast symbol check on changed files (not full extraction)

**Algorithm:**
1. Get staged changes: `git diff --cached`
2. Filter to `.cc`/`.h` files
3. Find matching object files in build directory
4. Use `nm` to extract symbols from affected objects only
5. Check for duplicates using `sort | uniq -d`

**Key Optimizations:**
- Only processes changed files, not entire build
- Quick `sort | uniq -d` instead of full JSON parsing
- Can be bypassed with `git commit --no-verify`
- Runs in <2 seconds

**Matching Logic:**
```
source file:  src/cli/flags.cc
object file:  build/CMakeFiles/*/src/cli/flags.cc.o
```

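To make the matching logic and the duplicate check concrete, here is a small Python sketch. The helper names (`object_pattern`, `matching_objects`, `duplicated`) and the `app.dir` target directory are hypothetical; the hook itself is a shell script, and `duplicated` only mirrors what `sort | uniq -d` reports.

```python
from collections import Counter
from fnmatch import fnmatch


def object_pattern(source_file: str, build_dir: str = "build") -> str:
    """Glob pattern for the object file CMake produces for a source file."""
    return f"{build_dir}/CMakeFiles/*/{source_file}.o"


def matching_objects(source_file, object_files, build_dir="build"):
    """Return the object files that correspond to one staged source file."""
    pattern = object_pattern(source_file, build_dir)
    return [obj for obj in object_files if fnmatch(obj, pattern)]


def duplicated(names):
    """Names that occur more than once, like `sort | uniq -d`."""
    return sorted(name for name, count in Counter(names).items() if count > 1)


# Hypothetical build tree with a single CMake target directory, app.dir.
objects = [
    "build/CMakeFiles/app.dir/src/cli/flags.cc.o",
    "build/CMakeFiles/app.dir/src/cli/other.cc.o",
]
print(matching_objects("src/cli/flags.cc", objects))
print(duplicated(["FLAGS_rom", "main", "FLAGS_rom"]))
```

Header files have no object file of their own, so a staged `.h` file matches nothing under this pattern.
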
### 4. test-symbol-detection.sh

**Purpose:** Validate the entire system

**Test Sequence:**
1. Check scripts are executable (chmod +x)
2. Verify build directory exists
3. Count object files (need > 0)
4. Run extract-symbols.sh (timeout: 2 minutes)
5. Validate JSON structure (required fields)
6. Run check-duplicate-symbols.sh
7. Verify pre-commit hook configuration
8. Display sample output

**Exit Codes:**
- `0` = All tests passed
- `1` = A test failed (the output names the failing test)

## Integration Workflows

### Development Workflow

```
1. Make code changes
   │
   ▼
2. Build project: cmake --build build
   │
   ▼
3. Pre-commit hook runs automatically
   │
   ├─ Fast check on changed files
   ├─ Warns if conflicts detected
   └─ Allow commit with --no-verify if intentional
   │
   ▼
4. Run full check before pushing (optional):
   ./scripts/extract-symbols.sh && ./scripts/check-duplicate-symbols.sh
   │
   ▼
5. Push to GitHub
```

### CI/CD Workflow

```
GitHub Push/PR
   │
   ▼
.github/workflows/symbol-detection.yml
   │
   ├─ Checkout code
   ├─ Setup environment
   ├─ Build project
   ├─ Extract symbols
   ├─ Check for conflicts
   ├─ Upload artifact (symbol_database.json)
   └─ Fail job if conflicts found
```

### First-Time Setup

```bash
# 1. Configure git hooks (one-time)
git config core.hooksPath .githooks

# 2. Make hook executable
chmod +x .githooks/pre-commit

# 3. Test the system
./scripts/test-symbol-detection.sh

# 4. Create initial symbol database
./scripts/extract-symbols.sh && ./scripts/check-duplicate-symbols.sh
```

## JSON Database Schema

```json
{
  "metadata": {
    "platform": "Darwin|Linux|Windows",
    "build_dir": "/path/to/build",
    "timestamp": "ISO8601Z format",
    "object_files_scanned": 145,
    "total_symbols": 8923,
    "total_conflicts": 2
  },
  "conflicts": [
    {
      "symbol": "FLAGS_rom",
      "count": 2,
      "definitions": [
        {
          "object_file": "flags.cc.o",
          "type": "D"
        },
        {
          "object_file": "emu_test.cc.o",
          "type": "D"
        }
      ]
    }
  ],
  "symbols": {
    "FLAGS_rom": [
      { "object_file": "flags.cc.o", "type": "D" },
      { "object_file": "emu_test.cc.o", "type": "D" }
    ]
  }
}
```

### Schema Notes
- `symbols` dict only includes conflicted symbols (keeps file size small)
- `conflicts` array is sorted by count (most duplicated first)
- `type` field indicates symbol kind (T/D/R/B/C/etc.); undefined references (`U`) are skipped during extraction and never appear
- Timestamps are UTC ISO 8601 for cross-platform compatibility

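A structural check along these lines can be sketched in Python. The function `validate_database` and the specific checks are illustrative assumptions based on the schema and notes above, not the validation performed by `test-symbol-detection.sh`.

```python
REQUIRED_METADATA = (
    "platform", "build_dir", "timestamp",
    "object_files_scanned", "total_symbols", "total_conflicts",
)


def validate_database(db: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for key in ("metadata", "conflicts", "symbols"):
        if key not in db:
            errors.append(f"missing top-level key: {key}")
    if errors:
        return errors  # the remaining checks need all three sections

    for field in REQUIRED_METADATA:
        if field not in db["metadata"]:
            errors.append(f"missing metadata field: {field}")

    # Conflicts should be ordered with the most duplicated symbol first.
    counts = [conflict["count"] for conflict in db["conflicts"]]
    if counts != sorted(counts, reverse=True):
        errors.append("conflicts are not sorted by count (descending)")

    for conflict in db["conflicts"]:
        if conflict["count"] != len(conflict["definitions"]):
            errors.append(f"count mismatch for {conflict['symbol']}")
        if conflict["symbol"] not in db["symbols"]:
            errors.append(f"{conflict['symbol']} missing from symbols dict")
    return errors


definitions = [
    {"object_file": "flags.cc.o", "type": "D"},
    {"object_file": "emu_test.cc.o", "type": "D"},
]
example = {
    "metadata": {
        "platform": "Linux",
        "build_dir": "/path/to/build",
        "timestamp": "2024-01-01T00:00:00Z",
        "object_files_scanned": 2,
        "total_symbols": 1,
        "total_conflicts": 1,
    },
    "conflicts": [{"symbol": "FLAGS_rom", "count": 2, "definitions": definitions}],
    "symbols": {"FLAGS_rom": definitions},
}
problems = validate_database(example)
print(problems or "database is valid")
```
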
## Symbol Types Reference

| Type | Name | Meaning | Common in |
|------|------|---------|-----------|
| T | Text | Function/code | .cc/.o |
| D | Data | Initialized variable | .cc/.o |
| R | Read-only | Const data | .cc/.o |
| B | BSS | Uninitialized data | .cc/.o |
| C | Common | Tentative definition | .cc/.o |
| U | Undefined | External reference | (skipped) |
| A | Absolute | Absolute symbol | (rare) |
| W | Weak | Weak symbol | (rare) |

## Troubleshooting Guide

### Extraction Fails with "No object files found"

**Cause:** Build directory not populated with .o files

**Solution:**
```bash
cmake --build build  # Build first
./scripts/extract-symbols.sh
```

### Extraction is Very Slow

**Cause:** A very large number of object files (4000+), or `nm` is slow on the underlying filesystem

**Solution:**
1. Ensure build is on fast SSD
2. Check system load: `top` or Activity Monitor
3. Run in foreground to see progress
4. Optional: Parallelize in future version

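### Symbol Not Appearing as Conflict

**Cause:** Symbol is weak (W type) or hidden/internal

**Solution:**
Check directly with nm. Object files are nested several directories below `build/CMakeFiles/`, so search recursively, and use `-A` so each line shows its file name:
```bash
find build -name '*.o' -exec nm -A {} + | grep symbol_name
```
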
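When reading the `nm` output by hand, the type letter usually explains why a symbol was not reported. The sketch below encodes the common GNU `nm` conventions (lowercase letters mark local symbols, `W`/`V` mark weak symbols); the function names `classify_symbol_type` and `can_conflict` are hypothetical, and the rule that only strong global definitions count as conflicts is an assumption about how the checker should behave, not a description of its current code.

```python
def classify_symbol_type(sym_type: str) -> str:
    """Map an nm type letter to a coarse linkage class (GNU nm conventions)."""
    if sym_type == "U":
        return "undefined"  # external reference, not a definition
    if sym_type in {"W", "w", "V", "v"}:
        return "weak"       # the linker may pick any one definition
    if sym_type == "u":
        return "unique"     # GNU extension: unique global symbol
    if sym_type == "C":
        return "common"     # tentative definition, merged by the linker
    if sym_type.islower():
        return "local"      # internal linkage, invisible to other objects
    return "global"         # strong external definition (T, D, R, B, A, ...)


def can_conflict(sym_type: str) -> bool:
    """Assumption: only strong global definitions collide at link time."""
    return classify_symbol_type(sym_type) == "global"


for letter in ("T", "t", "W", "C", "U"):
    print(letter, classify_symbol_type(letter), can_conflict(letter))
```
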
### Pre-commit Hook Not Running

**Cause:** Git hooks path not configured

**Solution:**
```bash
git config core.hooksPath .githooks
chmod +x .githooks/pre-commit
```

### Windows dumpbin Not Found

**Cause:** `dumpbin` is not on `PATH`, either because Visual Studio is not properly installed or because the shell is not a Developer Command Prompt

**Solution:**
```powershell
# Run from Visual Studio Developer Command Prompt
# or install Visual Studio with "Desktop development with C++"
```

## Performance Optimization Ideas

### Phase 1 (Current)
- Sequential symbol extraction
- Full JSON parsing
- Complete database generation

### Phase 2 (Future)
- Parallel object file processing (~4x speedup)
- Incremental extraction (only new/changed objects)
- Symbol caching (reuse between builds)

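One possible shape for the parallel step is sketched below, using a thread pool because the real work happens in `nm` subprocesses. Everything here is an assumption about a future design: `run_nm`, `extract_parallel`, the worker count, and the fake extractor used in the demo are illustrative only, and no speedup figure is implied by the sketch.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_nm(object_file: str) -> str:
    """Run `nm -P` on one object file and return its text output."""
    result = subprocess.run(
        ["nm", "-P", object_file], capture_output=True, text=True, check=True
    )
    return result.stdout


def extract_parallel(object_files, extract=run_nm, max_workers=4):
    """Map each object file to its extractor output, running extractions concurrently."""
    object_files = list(object_files)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each file with its output.
        return dict(zip(object_files, pool.map(extract, object_files)))


def fake_extract(path: str) -> str:
    """Stand-in extractor so the demo runs without nm or a build directory."""
    return f"{path}_init T 0 0\n"


results = extract_parallel(["a.o", "b.o", "c.o"], extract=fake_extract)
for path, output in results.items():
    print(path, output.strip())
```

Passing the extractor as a parameter keeps the concurrency logic testable without invoking `nm`.
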
### Phase 3 (Future)
- HTML report generation with source links
- Integration with IDE (clangd warnings)
- Automatic fix suggestions with patch generation

## Maintenance

### When to Run Extract

| Scenario | Command |
|----------|---------|
| After major rebuild | `./scripts/extract-symbols.sh` |
| Before pushing | `./scripts/extract-symbols.sh && ./scripts/check-duplicate-symbols.sh` |
| In CI/CD | Automatic (symbol-detection.yml) |
| Quick check on changes | Pre-commit hook (automatic) |

### Cleanup

```bash
# Remove symbol database (-f: no error if it does not exist)
rm -f build/symbol_database.json

# Clean temp files (if stuck)
rm -f build/.temp_symbols.txt build/.object_files.tmp
```

### Updating for New Platforms

To add support for a new platform:

1. Detect platform in `extract-symbols.sh`:
   ```bash
   case "${UNAME_S}" in
       NewOS*) IS_NEWOS=true ;;
   esac
   ```

2. Add extraction function:
   ```bash
   extract_symbols_newos() {
       local obj_file="$1"
       # Use platform-specific tool (e.g., readelf for new Unix variant)
       :  # no-op placeholder so the empty function body is valid bash
   }
   ```

3. Call appropriate function in main loop

## References

- **nm manual:** `man nm` or online docs
- **dumpbin:** Visual Studio documentation
- **Symbol types:** ELF specification (gabi10000.pdf)
- **ODR violations:** C++ standard, [basic.def.odr] (section 3.2 in C++11/14; numbered differently in later editions)