yaze/docs/internal/testing/symbol-conflict-detection.md
2025-11-21 21:35:50 -05:00

Symbol Conflict Detection System

Overview

The Symbol Conflict Detection System is designed to catch One Definition Rule (ODR) violations and symbol conflicts before linking fails. This prevents wasted time debugging linker errors and improves development velocity.

The Problem:

  • Developers accidentally define the same symbol in multiple translation units
  • Errors only appear at link time (after 10-15+ minutes of compilation on some platforms)
  • The error message is often cryptic: symbol already defined in object
  • No early warning during development

The Solution:

  • Extract symbols from compiled object files immediately after compilation
  • Build a symbol database with conflict detection
  • Pre-commit hook warns about conflicts before committing
  • CI/CD job fails early if conflicts are detected
  • Fast analysis: a few seconds for typical builds (about 2-3 seconds with nm, 5-7 seconds with dumpbin)
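
The detection step above can be sketched in a few lines of shell. This is a minimal illustration, not the actual logic of check-duplicate-symbols.sh: the function name find_duplicate_symbols and the "<object_file> <type> <symbol>" input format are assumptions made for the example. It only counts the definition types listed in this document (T, D, R, B), so weak symbols are ignored.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the yaze scripts).
# Input on stdin: one "<object_file> <type> <symbol>" record per line.
# Output: each symbol defined in more than one object file, followed by
# the object files that define it, sorted by symbol name.
find_duplicate_symbols() {
  awk '
    # Count a symbol once per object file, and only for definition types.
    $2 ~ /^[TDRB]$/ && !seen[$3 SUBSEP $1]++ {
      count[$3]++
      objs[$3] = objs[$3] " " $1
    }
    END {
      for (s in count)
        if (count[s] > 1) print s ":" objs[s]
    }
  ' | sort
}

# Example:
#   printf '%s\n' 'flags.cc.o D FLAGS_rom' 'emu_test.cc.o D FLAGS_rom' \
#     | find_duplicate_symbols
```

Keeping the detection as a pure text filter makes it independent of the tool that produced the symbol list (nm or dumpbin).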

Quick Start

Generate Symbol Database

# Extract all symbols and create database
./scripts/extract-symbols.sh

# Output: build/symbol_database.json

Check for Conflicts

# Analyze database for conflicts
./scripts/check-duplicate-symbols.sh

# Output: List of conflicting symbols with file locations

Combined Usage

# Extract and check in one command
./scripts/extract-symbols.sh && ./scripts/check-duplicate-symbols.sh

Components

1. Symbol Extraction Tool (scripts/extract-symbols.sh)

Scans all compiled object files and extracts symbol definitions.

Features:

  • Cross-platform support (macOS/Linux/Windows)
  • Uses nm on Unix/macOS, dumpbin on Windows
  • Generates JSON database with symbol metadata
  • Skips undefined symbols (references only)
  • Tracks symbol type (text, data, read-only)
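
As a rough sketch of the "skip undefined symbols" step on Unix-like systems, the filter below keeps only defined symbols from nm output. The function names are hypothetical and the real script may work differently. It assumes the default nm output format, where a defined symbol has three fields (address, type, name) and an undefined one has two.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `nm -g` output on stdin and print
# "<type> <name>" for every defined symbol. Undefined references
# (type U, no address column) are dropped.
filter_defined_symbols() {
  awk 'NF == 3 && $2 != "U" { print $2, $3 }'
}

# Hypothetical wrapper: list the defined global symbols of one object file.
list_defined_symbols() {
  nm -g "$1" | filter_defined_symbols
}

# Example:
#   list_defined_symbols build/CMakeFiles/z3ed.dir/src/cli/flags.cc.o
```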

Usage:

# Default: scan ./build directory, output to build/symbol_database.json
./scripts/extract-symbols.sh

# Custom build directory
./scripts/extract-symbols.sh /path/to/custom/build

# Custom output file
./scripts/extract-symbols.sh build symbols.json

Output Format:

{
  "metadata": {
    "platform": "Darwin",
    "build_dir": "build",
    "timestamp": "2025-11-20T10:30:45.123456Z",
    "object_files_scanned": 145,
    "total_symbols": 8923,
    "total_conflicts": 2
  },
  "conflicts": [
    {
      "symbol": "FLAGS_rom",
      "count": 2,
      "definitions": [
        {
          "object_file": "flags.cc.o",
          "type": "D"
        },
        {
          "object_file": "emu_test.cc.o",
          "type": "D"
        }
      ]
    }
  ],
  "symbols": {
    "FLAGS_rom": [...]
  }
}

Symbol Types:

  • T = Text/Code (function in .text section)
  • D = Data (initialized global variable in .data section)
  • R = Read-only (constant in .rodata section)
  • B = BSS (uninitialized global in .bss section)
  • U = Undefined (external reference, not a definition)

2. Duplicate Symbol Checker (scripts/check-duplicate-symbols.sh)

Analyzes the symbol database and reports conflicts in a developer-friendly format.

Usage:

# Check default database (build/symbol_database.json)
./scripts/check-duplicate-symbols.sh

# Specify custom database
./scripts/check-duplicate-symbols.sh /path/to/symbol_database.json

# Verbose output (show all symbols)
./scripts/check-duplicate-symbols.sh --verbose

# Include fix suggestions
./scripts/check-duplicate-symbols.sh --fix-suggestions

Output Example:

=== Duplicate Symbol Checker ===
Database: build/symbol_database.json
Platform: Darwin
Build directory: build
Timestamp: 2025-11-20T10:30:45.123456Z
Object files scanned: 145
Total symbols: 8923
Total conflicts: 2

CONFLICTS FOUND:

[1/2] FLAGS_rom (x2)
      1. flags.cc.o (type: D)
      2. emu_test.cc.o (type: D)

[2/2] g_global_counter (x2)
      1. utils.cc.o (type: D)
      2. utils_test.cc.o (type: D)

=== Summary ===
Total conflicts: 2
Fix these before linking!

Exit Codes:

  • 0 = No conflicts found
  • 1 = Conflicts detected
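
Because the checker reports its result through the exit code, other scripts can branch on it. The wrapper below is a minimal sketch under that assumption. run_symbol_check is a hypothetical name, and it accepts any command so it can be tried without a build.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run a checker command and report its result.
# Exit status 0 is treated as "no conflicts", anything else as a failure.
run_symbol_check() {
  if "$@"; then
    echo "no conflicts"
  else
    local status=$?
    echo "conflicts detected (exit $status)"
    return "$status"
  fi
}

# Example:
#   run_symbol_check ./scripts/check-duplicate-symbols.sh || exit 1
```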

3. Pre-Commit Hook (.githooks/pre-commit)

Runs automatically before committing code (can be bypassed with --no-verify).

Features:

  • Only checks changed .cc and .h files
  • Fast analysis: typically 1-3 seconds, depending on how many files changed
  • Warns about conflicts in affected object files
  • Suggests common fixes
  • Non-blocking (just a warning, doesn't fail the commit)
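
One way to map a changed source file to its object files is to search the build tree for paths ending in "<source path>.o", which matches the CMake layout shown in the hook output below. This is a sketch only: objects_for_source is a hypothetical helper, and the actual hook may resolve objects differently. Header files have no object of their own, so they would need separate handling.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list object files under a build tree that were
# compiled from the given source path, e.g. src/cli/flags.cc ->
# build/CMakeFiles/<target>.dir/src/cli/flags.cc.o
objects_for_source() {
  local build_dir=$1 source=$2
  find "$build_dir" -type f -path "*/${source}.o" | sort
}

# Example (staged files come from git; the loop is illustrative):
#   git diff --cached --name-only --diff-filter=ACM -- '*.cc' |
#     while read -r src; do objects_for_source build "$src"; done
```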

Usage:

# Automatically runs on git commit
git commit -m "Your message"

# Skip hook if needed
git commit --no-verify -m "Your message"

Setup (first time):

# Configure Git to use .githooks directory
git config core.hooksPath .githooks

# Make hook executable
chmod +x .githooks/pre-commit

Hook Output:

[Pre-Commit] Checking for symbol conflicts...
Changed files:
  src/cli/flags.cc
  test/emu_test.cc

Affected object files:
  build/CMakeFiles/z3ed.dir/src/cli/flags.cc.o
  build/CMakeFiles/z3ed_test.dir/test/emu_test.cc.o

Analyzing symbols...

WARNING: Symbol conflicts detected!

Duplicate symbols in affected files:
  FLAGS_rom
    - flags.cc.o
    - emu_test.cc.o

You can:
  1. Fix the conflicts before committing
  2. Skip this check: git commit --no-verify
  3. Run full analysis: ./scripts/extract-symbols.sh && ./scripts/check-duplicate-symbols.sh

Common fixes:
  - Add 'static' keyword to make it internal linkage
  - Use anonymous namespace in .cc files
  - Use 'inline' keyword for function/variable definitions

Common Fixes for ODR Violations

Problem: Global Variable Defined in Multiple Files

Bad:

// flags.cc
ABSL_FLAG(std::string, rom, "", "Path to ROM");

// test.cc
ABSL_FLAG(std::string, rom, "", "Path to ROM");  // ERROR: Duplicate definition

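Fix 1: Use static (internal linkage)

// utils_test.cc
static int g_global_counter = 0;  // Now local to this file

Fix 2: Use Anonymous Namespace

// utils_test.cc
namespace {
int g_global_counter = 0;
}  // namespace (internal linkage)

Note: Fixes 1 and 2 apply to ordinary globals such as g_global_counter. They do not work for ABSL_FLAG: the macro expands to more than one declaration, so it cannot simply be prefixed with static, and Abseil registers every flag by name at startup, so a second flag named rom is still rejected at run time even if its symbol is hidden. For flags, use Fix 3.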

Fix 3: Declare in Header, Define in One .cc

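// flags.h
#include <string>
#include "absl/flags/declare.h"
ABSL_DECLARE_FLAG(std::string, rom);  // Declaration only

// flags.cc
#include "absl/flags/flag.h"
ABSL_FLAG(std::string, rom, "", "Path to ROM");  // The single definition

// test.cc
#include "flags.h"
// Read it with absl::GetFlag(FLAGS_rom); don't redefine it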

Problem: Duplicate Function Definitions

Bad:

// util.cc
void ProcessData() { /* ... */ }

// util_test.cc
void ProcessData() { /* ... */ }  // ERROR: Already defined

Fix 1: Make inline

// util.h
inline void ProcessData() { /* ... */ }

// util.cc and util_test.cc include util.h and drop their own definitions

Fix 2: Use static

// util.cc
static void ProcessData() { /* ... */ }  // Internal linkage

Fix 3: Use Anonymous Namespace

// util.cc
namespace {
  void ProcessData() { /* ... */ }
}  // Internal linkage

Problem: Class Static Member Initialization

Bad:

// widget.h
class Widget {
  static int instance_count;  // Declaration only
};

// widget.cc
int Widget::instance_count = 0;

// widget_test.cc (accidentally includes impl)
int Widget::instance_count = 0;  // ERROR: Multiple definitions

Fix: Define in Only One .cc

// widget.h
class Widget {
  static int instance_count;
};

// widget.cc (ONLY definition)
int Widget::instance_count = 0;

// widget_test.cc (only uses, doesn't redefine)

Integration with CI/CD

GitHub Actions Example

Add to .github/workflows/ci.yml:

- name: Extract symbols and check for conflicts
  if: success()
  run: |
    ./scripts/extract-symbols.sh build
    ./scripts/check-duplicate-symbols.sh

- name: Upload symbol report
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: symbol-database
    path: build/symbol_database.json

Workflow:

  1. Build completes (generates .o/.obj files)
  2. Extract symbols runs immediately
  3. Check for conflicts analyzes database
  4. Fail job if duplicates found
  5. Upload report for inspection

Performance Notes

Typical Build Timings

Operation                            Time     Notes
Extract symbols (145 obj files)      ~2-3s    macOS/Linux with nm
Extract symbols (145 obj files)      ~5-7s    Windows with dumpbin
Check duplicates                     <100ms   JSON parsing and analysis
Pre-commit hook (5 changed files)    ~1-2s    Only checks affected objects

Optimization Tips

  1. Check only affected files in the pre-commit hook - Don't scan the entire build
  2. Cache symbol database - Reuse between checks if no new objects
  3. Parallel extraction - Future enhancement for large builds
  4. Filter by symbol type - Focus on data/text symbols, skip weak symbols
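
The caching tip can be approximated with a timestamp comparison: regenerate the database only when it is missing or some object file is newer than it. The sketch below assumes that rule. database_is_stale is a hypothetical helper, not something the scripts are known to provide.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (return 0) when the symbol database should
# be regenerated, i.e. it does not exist or at least one object file in
# the build tree has a newer modification time.
database_is_stale() {
  local build_dir=$1 db=$2
  [ -f "$db" ] || return 0
  [ -n "$(find "$build_dir" -type f \( -name '*.o' -o -name '*.obj' \) \
            -newer "$db" | head -n 1)" ]
}

# Example:
#   if database_is_stale build build/symbol_database.json; then
#     ./scripts/extract-symbols.sh
#   fi
```

Modification times are a cheap heuristic. They miss object files that were deleted since the last extraction, so a full re-scan is still the safe choice in CI.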

Troubleshooting

"Symbol database not found"

Issue: Script says database doesn't exist

Error: Symbol database not found: build/symbol_database.json

Solution: Generate it first

./scripts/extract-symbols.sh

"No object files found"

Issue: Extraction found 0 object files

Warning: No object files found in build

Solution: Rebuild the project first

cmake --build build  # or appropriate build command
./scripts/extract-symbols.sh

"No compiled objects found for changed files"

Issue: Pre-commit hook can't find object files for changes

[Pre-Commit] No compiled objects found for changed files (might not be built yet)

Solution: This is normal if you haven't built yet. Just commit normally:

git commit -m "Your message"

Symbol not appearing in conflicts

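Issue: Manual review found a duplicate, but the tool doesn't report it

Cause: The symbol might be weak, or come from template/header-only code

Solution: Check with nm directly. Object files sit several directories below build/CMakeFiles (for example build/CMakeFiles/z3ed.dir/src/cli/flags.cc.o), so search recursively; -A prefixes each line with its file name:

find build -name '*.o' -exec nm -A {} + | grep symbol_name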

Future Enhancements

  1. Incremental checking - Only re-scan changed object files
  2. HTML reports - Generate visual conflict reports with source references
  3. Automatic fixes - Suggest patches for common ODR patterns
  4. Integration with IDE - Clangd/LSP warnings for duplicate definitions
  5. Symbol lifecycle tracking - Track which symbols were added/removed per build
  6. Statistics dashboard - Monitor symbol health over time

Support

For issues or suggestions:

  1. Check .githooks/pre-commit is executable: chmod +x .githooks/pre-commit
  2. Verify git hooks path is configured: git config core.hooksPath
  3. Run full analysis for detailed debugging: ./scripts/check-duplicate-symbols.sh --verbose
  4. Open an issue with the symbol-detection label