11 KiB
Symbol Conflict Detection - Implementation Guide
This guide explains the implementation details of the Symbol Conflict Detection System and how to integrate it into your development workflow.
Architecture Overview
System Components
┌─────────────────────────────────────────────────────────┐
│ Compiled Object Files (.o / .obj) │
│ (Created during cmake --build) │
└──────────────────┬──────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ extract-symbols.sh │
│ ├─ Scan object files in build/ │
│ ├─ Use nm (Unix/macOS) or dumpbin (Windows) │
│ ├─ Extract symbol definitions (skip undefined refs) │
│ └─ Generate JSON database │
└──────────────────┬──────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ symbol_database.json │
│ ├─ Metadata (platform, timestamp, stats) │
│ ├─ Conflicts array (symbols defined multiple times) │
│ └─ Symbols dict (full mapping) │
└──────────────────┬──────────────────────────────────────┘
│
┌────────────┼────────────┐
│ │ │
▼ ▼ ▼
┌──────────────────────────────────┐
│ check-duplicate-symbols.sh │
│ └─ Parse JSON & report conflicts │
└──────────────────────────────────┘
│ │ │
│ │ │
[CLI] [Pre-Commit] [CI/CD]
Script Implementation Details
1. extract-symbols.sh
Purpose: Extract all symbol definitions from object files
Key Functions:
Symbol Extraction (Unix/macOS)
nm -P <obj_file> # Parse format: SYMBOL TYPE [VALUE] [SIZE]
Format:
- Column 1: Symbol name
- Column 2: Symbol type (T=text, D=data, R=read-only, etc.)
- Column 3: Address (if defined)
- Column 4: Size
Filtering logic:
- Skip symbols with name starting with space
- Skip symbols with "U" in the type column (undefined)
- Keep symbols with types: T, D, R, B, C, etc.
Symbol Extraction (Windows)
dumpbin /symbols <obj_file> # Parse binary format output
Note: Windows extraction is less precise than Unix. Symbol types are approximated.
JSON Generation
Uses Python3 for portability:
- Read all extracted symbols from temp file
- Group by symbol name
- Identify conflicts (count > 1)
- Generate structured JSON
- Sort conflicts by count (most duplicated first)
Performance Considerations:
- Process all 4000+ object files sequentially
nmis fast (~1ms per file on macOS)- Python JSON generation is <100ms
- Total: ~2-3 seconds for typical builds
2. check-duplicate-symbols.sh
Purpose: Analyze symbol database and report conflicts
Algorithm:
- Parse JSON database
- Extract metadata and conflicts array
- For each conflict:
- Print symbol name
- List all definitions with object files and types
- Exit with code based on conflict count
Output Formatting:
- Colors for readability (RED for errors, GREEN for success)
- Structured output with proper indentation
- Fix suggestions (if --fix-suggestions flag)
3. Pre-commit Hook (.githooks/pre-commit)
Purpose: Fast symbol check on changed files (not full extraction)
Algorithm:
- Get staged changes:
git diff --cached - Filter to .cc/.h files
- Find matching object files in build directory
- Use
nmto extract symbols from affected objects only - Check for duplicates using
sort | uniq -d
Key Optimizations:
- Only processes changed files, not entire build
- Quick
sort | uniq -dinstead of full JSON parsing - Can be bypassed with
--no-verify - Runs in <2 seconds
Matching Logic:
source file: src/cli/flags.cc
object file: build/CMakeFiles/*/src/cli/flags.cc.o
4. test-symbol-detection.sh
Purpose: Validate the entire system
Test Sequence:
- Check scripts are executable (chmod +x)
- Verify build directory exists
- Count object files (need > 0)
- Run extract-symbols.sh (timeout: 2 minutes)
- Validate JSON structure (required fields)
- Run check-duplicate-symbols.sh
- Verify pre-commit hook configuration
- Display sample output
Exit Codes:
0= All tests passed1= Test failed (specific test prints which one)
Integration Workflows
Development Workflow
1. Make code changes
│
▼
2. Build project: cmake --build build
│
▼
3. Pre-commit hook runs automatically
│
├─ Fast check on changed files
├─ Warns if conflicts detected
└─ Allow commit with --no-verify if intentional
│
▼
4. Run full check before pushing (optional):
./scripts/extract-symbols.sh && ./scripts/check-duplicate-symbols.sh
│
▼
5. Push to GitHub
CI/CD Workflow
GitHub Push/PR
│
▼
.github/workflows/symbol-detection.yml
│
├─ Checkout code
├─ Setup environment
├─ Build project
├─ Extract symbols
├─ Check for conflicts
├─ Upload artifact (symbol_database.json)
└─ Fail job if conflicts found
First-Time Setup
# 1. Configure git hooks (one-time)
git config core.hooksPath .githooks
# 2. Make hook executable
chmod +x .githooks/pre-commit
# 3. Test the system
./scripts/test-symbol-detection.sh
# 4. Create initial symbol database
./scripts/extract-symbols.sh && ./scripts/check-duplicate-symbols.sh
JSON Database Schema
{
"metadata": {
"platform": "Darwin|Linux|Windows",
"build_dir": "/path/to/build",
"timestamp": "ISO8601Z format",
"object_files_scanned": 145,
"total_symbols": 8923,
"total_conflicts": 2
},
"conflicts": [
{
"symbol": "FLAGS_rom",
"count": 2,
"definitions": [
{
"object_file": "flags.cc.o",
"type": "D"
},
{
"object_file": "emu_test.cc.o",
"type": "D"
}
]
}
],
"symbols": {
"FLAGS_rom": [
{ "object_file": "flags.cc.o", "type": "D" },
{ "object_file": "emu_test.cc.o", "type": "D" }
]
}
}
Schema Notes:
symbolsdict only includes conflicted symbols (keeps file size small)conflictsarray is sorted by count (most duplicated first)typefield indicates symbol kind (T/D/R/B/U/etc.)- Timestamps are UTC ISO8601 for cross-platform compatibility
Symbol Types Reference
| Type | Name | Meaning | Common in |
|---|---|---|---|
| T | Text | Function/code | .cc/.o |
| D | Data | Initialized variable | .cc/.o |
| R | Read-only | Const data | .cc/.o |
| B | BSS | Uninitialized data | .cc/.o |
| C | Common | Tentative definition | .cc/.o |
| U | Undefined | External reference | (skipped) |
| A | Absolute | Absolute symbol | (rare) |
| W | Weak | Weak symbol | (rare) |
Troubleshooting Guide
Extraction Fails with "No object files found"
Cause: Build directory not populated with .o files
Solution:
cmake --build build # First build
./scripts/extract-symbols.sh
Extraction is Very Slow
Cause: 4000+ object files, or nm is slow on filesystem
Solution:
- Ensure build is on fast SSD
- Check system load:
toporActivity Monitor - Run in foreground to see progress
- Optional: Parallelize in future version
Symbol Not Appearing as Conflict
Cause: Symbol is weak (W type) or hidden/internal
Solution: Check directly with nm:
nm build/CMakeFiles/*/*.o | grep symbol_name
Pre-commit Hook Not Running
Cause: Git hooks path not configured
Solution:
git config core.hooksPath .githooks
chmod +x .githooks/pre-commit
Windows dumpbin Not Found
Cause: Visual Studio not properly installed
Solution:
# Run from Visual Studio Developer Command Prompt
# or install Visual Studio with "Desktop development with C++"
Performance Optimization Ideas
Phase 1 (Current)
- Sequential symbol extraction
- Full JSON parsing
- Complete database generation
Phase 2 (Future)
- Parallel object file processing (~4x speedup)
- Incremental extraction (only new/changed objects)
- Symbol caching (reuse between builds)
Phase 3 (Future)
- HTML report generation with source links
- Integration with IDE (clangd warnings)
- Automatic fix suggestions with patch generation
Maintenance
When to Run Extract
| Scenario | Command |
|---|---|
| After major rebuild | ./scripts/extract-symbols.sh |
| Before pushing | ./scripts/extract-symbols.sh && ./scripts/check-duplicate-symbols.sh |
| In CI/CD | Automatic (symbol-detection.yml) |
| Quick check on changes | Pre-commit hook (automatic) |
Cleanup
# Remove symbol database
rm build/symbol_database.json
# Clean temp files (if stuck)
rm -f build/.temp_symbols.txt build/.object_files.tmp
Updating for New Platforms
To add support for a new platform:
- Detect platform in
extract-symbols.sh:
case "${UNAME_S}" in
NewOS*) IS_NEWOS=true ;;
esac
- Add extraction function:
extract_symbols_newos() {
local obj_file="$1"
# Use platform-specific tool (e.g., readelf for new Unix variant)
}
- Call appropriate function in main loop
References
- nm manual:
man nmor online docs - dumpbin: Visual Studio documentation
- Symbol types: ELF specification (gabi10000.pdf)
- ODR violations: C++ standard section 3.2