yaze/docs/internal/testing/symbol-conflict-detection.md
2025-11-21 21:35:50 -05:00


# Symbol Conflict Detection System
## Overview
The Symbol Conflict Detection System catches duplicate symbol definitions, a class of **One Definition Rule (ODR) violation**, **before linking fails**. Developers see the conflict right after compilation instead of losing a full build cycle to a linker error.
**The Problem:**
- Developers accidentally define the same symbol in multiple translation units
- Errors only appear at link time (after 10-15+ minutes of compilation on some platforms)
- The error message is often cryptic: `symbol already defined in object`
- No early warning during development
**The Solution:**
- Extract symbols from compiled object files immediately after compilation
- Build a symbol database with conflict detection
- Pre-commit hook warns about conflicts before committing
- CI/CD job fails early if conflicts detected
- Fast analysis: <5 seconds for typical builds
## Quick Start
### Generate Symbol Database
```bash
# Extract all symbols and create database
./scripts/extract-symbols.sh
# Output: build/symbol_database.json
```
### Check for Conflicts
```bash
# Analyze database for conflicts
./scripts/check-duplicate-symbols.sh
# Output: List of conflicting symbols with file locations
```
### Combined Usage
```bash
# Extract and check in one command
./scripts/extract-symbols.sh && ./scripts/check-duplicate-symbols.sh
```
## Components
### 1. Symbol Extraction Tool (`scripts/extract-symbols.sh`)
Scans all compiled object files and extracts symbol definitions.
**Features:**
- Cross-platform support (macOS/Linux/Windows)
- Uses `nm` on Unix/macOS, `dumpbin` on Windows
- Generates JSON database with symbol metadata
- Skips undefined symbols (references only)
- Tracks symbol type (text, data, read-only)
**Usage:**
```bash
# Default: scan ./build directory, output to build/symbol_database.json
./scripts/extract-symbols.sh
# Custom build directory
./scripts/extract-symbols.sh /path/to/custom/build
# Custom output file
./scripts/extract-symbols.sh build symbols.json
```
**Output Format:**
```json
{
  "metadata": {
    "platform": "Darwin",
    "build_dir": "build",
    "timestamp": "2025-11-20T10:30:45.123456Z",
    "object_files_scanned": 145,
    "total_symbols": 8923,
    "total_conflicts": 2
  },
  "conflicts": [
    {
      "symbol": "FLAGS_rom",
      "count": 2,
      "definitions": [
        {
          "object_file": "flags.cc.o",
          "type": "D"
        },
        {
          "object_file": "emu_test.cc.o",
          "type": "D"
        }
      ]
    }
  ],
  "symbols": {
    "FLAGS_rom": [...]
  }
}
```
**Symbol Types:**
- `T` = Text/Code (function in `.text` section)
- `D` = Data (initialized global variable in `.data` section)
- `R` = Read-only (constant in `.rodata` section)
- `B` = BSS (uninitialized global in `.bss` section)
- `U` = Undefined (external reference, not a definition)
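As a rough illustration of how these type letters might feed into conflict detection, the sketch below treats only uppercase `T`, `D`, `R`, and `B` as candidates. GNU `nm` prints lowercase letters for local (internal-linkage) symbols and `W`/`V` for weak ones, neither of which collides at link time. The helper name `is_conflict_candidate` is hypothetical, and whether `extract-symbols.sh` filters exactly this way is an assumption.

```bash
# Hypothetical helper: succeed if an nm type letter denotes a strong,
# externally visible definition that could clash with another one.
is_conflict_candidate() {
  case "$1" in
    T|D|R|B) return 0 ;;  # global text, data, read-only data, BSS
    *)       return 1 ;;  # U (undefined), lowercase (local), W/V (weak), others
  esac
}

for letter in T D U t W; do
  if is_conflict_candidate "$letter"; then
    echo "$letter: candidate"
  else
    echo "$letter: ignored"
  fi
done
```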
### 2. Duplicate Symbol Checker (`scripts/check-duplicate-symbols.sh`)
Analyzes symbol database and reports conflicts in a developer-friendly format.
**Usage:**
```bash
# Check default database (build/symbol_database.json)
./scripts/check-duplicate-symbols.sh
# Specify custom database
./scripts/check-duplicate-symbols.sh /path/to/symbol_database.json
# Verbose output (show all symbols)
./scripts/check-duplicate-symbols.sh --verbose
# Include fix suggestions
./scripts/check-duplicate-symbols.sh --fix-suggestions
```
**Output Example:**
```
=== Duplicate Symbol Checker ===
Database: build/symbol_database.json
Platform: Darwin
Build directory: build
Timestamp: 2025-11-20T10:30:45.123456Z
Object files scanned: 145
Total symbols: 8923
Total conflicts: 2
CONFLICTS FOUND:
[1/2] FLAGS_rom (x2)
1. flags.cc.o (type: D)
2. emu_test.cc.o (type: D)
[2/2] g_global_counter (x2)
1. utils.cc.o (type: D)
2. utils_test.cc.o (type: D)
=== Summary ===
Total conflicts: 2
Fix these before linking!
```
**Exit Codes:**
- `0` = No conflicts found
- `1` = Conflicts detected
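Because the checker reports its result through the exit code, other scripts can branch on it directly. The wrapper below is a small sketch of that pattern; `report_check` is a hypothetical name, and it accepts any command so it can be tried without the real script present.

```bash
# Hypothetical wrapper: run a checker command and translate its exit status.
report_check() {
  if "$@"; then
    echo "OK: no symbol conflicts"
  else
    status=$?   # exit status of the command that was just run
    echo "FAIL: checker exited with status $status"
    return "$status"
  fi
}

# Intended use (assumes the script exists at this path):
#   report_check ./scripts/check-duplicate-symbols.sh build/symbol_database.json
```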
### 3. Pre-Commit Hook (`.githooks/pre-commit`)
Runs automatically before committing code (can be bypassed with `--no-verify`).
**Features:**
- Only checks changed `.cc` and `.h` files
- Fast analysis: ~2-3 seconds
- Warns about conflicts in affected object files
- Suggests common fixes
- Non-blocking (just a warning, doesn't fail the commit)
**Usage:**
```bash
# Automatically runs on git commit
git commit -m "Your message"
# Skip hook if needed
git commit --no-verify -m "Your message"
```
**Setup (first time):**
```bash
# Configure Git to use .githooks directory
git config core.hooksPath .githooks
# Make hook executable
chmod +x .githooks/pre-commit
```
**Hook Output:**
```
[Pre-Commit] Checking for symbol conflicts...
Changed files:
src/cli/flags.cc
test/emu_test.cc
Affected object files:
build/CMakeFiles/z3ed.dir/src/cli/flags.cc.o
build/CMakeFiles/z3ed_test.dir/test/emu_test.cc.o
Analyzing symbols...
WARNING: Symbol conflicts detected!
Duplicate symbols in affected files:
FLAGS_rom
- flags.cc.o
- emu_test.cc.o
You can:
1. Fix the conflicts before committing
2. Skip this check: git commit --no-verify
3. Run full analysis: ./scripts/extract-symbols.sh && ./scripts/check-duplicate-symbols.sh
Common fixes:
- Add 'static' keyword to make it internal linkage
- Use anonymous namespace in .cc files
- Use 'inline' keyword for function/variable definitions
```
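The mapping from changed source files to affected object files shown in the hook output could be sketched as follows. This is an illustration under stated assumptions, not the hook's actual code: the helper name `objects_for_source` is hypothetical, and it assumes object files keep the source-relative path plus a `.o` suffix somewhere under the build directory, as in the paths above.

```bash
# Hypothetical helper: list object files under a build directory that were
# compiled from a given source file.
#   $1 = build directory, $2 = source path relative to the repository root
objects_for_source() {
  find "$1" -type f -path "*/$2.o" 2>/dev/null
}
```

For example, `objects_for_source build src/cli/flags.cc` would be expected to print `build/CMakeFiles/z3ed.dir/src/cli/flags.cc.o` in the layout shown above. Headers produce no object file of their own, so a changed `.h` file yields no output with this approach.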
## Common Fixes for ODR Violations
### Problem: Global Variable Defined in Multiple Files
**Bad:**
```cpp
// flags.cc
ABSL_FLAG(std::string, rom, "", "Path to ROM");
// test.cc
ABSL_FLAG(std::string, rom, "", "Path to ROM"); // ERROR: Duplicate definition
```
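**Fix 1: Use `static` (internal linkage)**
This works for ordinary globals such as `g_global_counter`. It does not work for `ABSL_FLAG`: the macro expands to more than a single variable definition, so prefixing it with `static` does not compile, and Abseil registers flags by name, so two flags named `rom` linked into one binary still collide at startup. For flags, use Fix 3.
```cpp
// utils_test.cc
static int g_global_counter = 0;  // Now local to this file
```
**Fix 2: Use Anonymous Namespace**
```cpp
// utils_test.cc
namespace {
int g_global_counter = 0;
}  // namespace: the variable now has internal linkage
```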
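**Fix 3: Declare in Header, Define in One .cc**
```cpp
// flags.h
#include <string>
#include "absl/flags/declare.h"
ABSL_DECLARE_FLAG(std::string, rom);  // Declaration only
// flags.cc
#include "absl/flags/flag.h"
ABSL_FLAG(std::string, rom, "", "Path to ROM");  // The single definition
// test.cc
#include "flags.h"
// Read the flag with absl::GetFlag(FLAGS_rom); don't redefine it
```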
### Problem: Duplicate Function Definitions
**Bad:**
```cpp
// util.cc
void ProcessData() { /* ... */ }
// util_test.cc
void ProcessData() { /* ... */ } // ERROR: Already defined
```
**Fix 1: Make `inline`**
```cpp
// util.h
inline void ProcessData() { /* ... */ }
// util.cc and util_test.cc can include and use it
```
**Fix 2: Use `static`**
```cpp
// util.cc
static void ProcessData() { /* ... */ } // Internal linkage
```
**Fix 3: Use Anonymous Namespace**
```cpp
// util.cc
namespace {
void ProcessData() { /* ... */ }
} // Internal linkage
```
### Problem: Class Static Member Initialization
**Bad:**
```cpp
// widget.h
class Widget {
  static int instance_count;  // Declaration only
};
// widget.cc
int Widget::instance_count = 0;
// widget_test.cc (accidentally includes impl)
int Widget::instance_count = 0; // ERROR: Multiple definitions
```
**Fix: Define in Only One .cc**
```cpp
// widget.h
class Widget {
  static int instance_count;
};
// widget.cc (ONLY definition)
int Widget::instance_count = 0;
// widget_test.cc (only uses, doesn't redefine)
```
## Integration with CI/CD
### GitHub Actions Example
Add to `.github/workflows/ci.yml`:
```yaml
- name: Extract symbols
  if: success()
  run: |
    ./scripts/extract-symbols.sh build
    ./scripts/check-duplicate-symbols.sh
- name: Upload symbol report
  if: always()
  uses: actions/upload-artifact@v4  # v3 of this action is deprecated
  with:
    name: symbol-database
    path: build/symbol_database.json
### Workflow:
1. **Build completes** (generates .o/.obj files)
2. **Extract symbols** runs immediately
3. **Check for conflicts** analyzes database
4. **Fail job** if duplicates found
5. **Upload report** for inspection
## Performance Notes
### Typical Build Timings
| Operation | Time | Notes |
|-----------|------|-------|
| Extract symbols (145 obj files) | ~2-3s | macOS/Linux with `nm` |
| Extract symbols (145 obj files) | ~5-7s | Windows with `dumpbin` |
| Check duplicates | <100ms | JSON parsing and analysis |
| Pre-commit hook (5 changed files) | ~1-2s | Only checks affected objects |
### Optimization Tips
1. **Run only affected files in pre-commit hook** - Don't scan entire build
2. **Cache symbol database** - Reuse between checks if no new objects
3. **Parallel extraction** - Future enhancement for large builds
4. **Filter by symbol type** - Focus on data/text symbols, skip weak symbols
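Tip 2 (caching the database) could be implemented with a simple modification-time comparison. The sketch below is one possible approach, not something the existing scripts are known to do; `database_is_stale` is a hypothetical helper name.

```bash
# Hypothetical helper: succeed if the symbol database is missing or older
# than at least one object file, meaning it should be regenerated.
#   $1 = build directory, $2 = symbol database path
database_is_stale() {
  if [ ! -f "$2" ]; then
    return 0
  fi
  newer=$(find "$1" -type f \( -name '*.o' -o -name '*.obj' \) -newer "$2" | head -n 1)
  [ -n "$newer" ]
}

# Intended use (assumes the default paths from this document):
#   if database_is_stale build build/symbol_database.json; then
#     ./scripts/extract-symbols.sh
#   fi
```

Comparing timestamps avoids re-running `nm` over every object when nothing has been rebuilt, at the cost of missing changes that do not update file modification times.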
## Troubleshooting
### "Symbol database not found"
**Issue:** Script says database doesn't exist
```
Error: Symbol database not found: build/symbol_database.json
```
**Solution:** Generate it first
```bash
./scripts/extract-symbols.sh
```
### "No object files found"
**Issue:** Extraction found 0 object files
```
Warning: No object files found in build
```
**Solution:** Rebuild the project first
```bash
cmake --build build # or appropriate build command
./scripts/extract-symbols.sh
```
### "No compiled objects found for changed files"
**Issue:** Pre-commit hook can't find object files for changes
```
[Pre-Commit] No compiled objects found for changed files (might not be built yet)
```
**Solution:** This is normal if you haven't built yet. Just commit normally:
```bash
git commit -m "Your message"
```
### Symbol not appearing in conflicts
**Issue:** Manual review found duplicate, but tool doesn't report it
**Cause:** Symbol might be weak, or in template/header-only code
**Solution:** Check with `nm` directly:
```bash
# Object files are nested below CMakeFiles/<target>.dir/, so search recursively
find build -name '*.o' -exec nm -A {} + | grep symbol_name
```
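When raw `nm` output is too noisy to scan by eye, a small filter can list only the objects that define a given symbol, along with the type letter. The sketch below assumes `nm -A` style input, where each line starts with the file name followed by a colon; the helper name `definitions_of` is hypothetical. A `W` or `V` in the result would suggest the duplicate is weak, which may be why the tool does not report it.

```bash
# Hypothetical filter: from `nm -A` output on stdin, print "<object> <type>"
# for every definition of the symbol named in $1.
definitions_of() {
  awk -v sym="$1" '
    $NF == sym && $(NF - 1) != "U" {
      split($1, parts, ":")      # first field is "file:" or "file:address"
      print parts[1], $(NF - 1)
    }
  '
}

printf '%s\n' \
  'flags.cc.o:0000000000000000 D FLAGS_rom' \
  'emu_test.cc.o:                 U FLAGS_rom' \
  'emu_lib.cc.o:0000000000000010 W FLAGS_rom' |
  definitions_of FLAGS_rom
# prints:
#   flags.cc.o D
#   emu_lib.cc.o W
```

On macOS, C symbols usually carry a leading underscore in `nm` output, so the name passed in may need to be `_FLAGS_rom` there.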
## Future Enhancements
1. **Incremental checking** - Only re-scan changed object files
2. **HTML reports** - Generate visual conflict reports with source references
3. **Automatic fixes** - Suggest patches for common ODR patterns
4. **Integration with IDE** - Clangd/LSP warnings for duplicate definitions
5. **Symbol lifecycle tracking** - Track which symbols were added/removed per build
6. **Statistics dashboard** - Monitor symbol health over time
## References
- [C++ One Definition Rule (cppreference)](https://en.cppreference.com/w/cpp/language/definition)
- [Linker Errors (isocpp.org)](https://isocpp.org/wiki/faq/linker-errors)
- [GNU nm Manual](https://sourceware.org/binutils/docs/binutils/nm.html)
- [Windows dumpbin Documentation](https://learn.microsoft.com/en-us/cpp/build/reference/dumpbin-reference)
## Support
For issues or suggestions:
1. Check `.githooks/pre-commit` is executable: `chmod +x .githooks/pre-commit`
2. Verify git hooks path is configured: `git config core.hooksPath`
3. Run full analysis for detailed debugging: `./scripts/check-duplicate-symbols.sh --verbose`
4. Open an issue with the `symbol-detection` label