backend-infra-engineer: Release v0.3.3 snapshot

This commit is contained in:
scawful
2025-11-21 21:35:50 -05:00
parent 3d71417f62
commit 476dd1cd1c
818 changed files with 65706 additions and 35514 deletions

View File

@@ -0,0 +1,377 @@
# Symbol Conflict Detection - Implementation Guide
This guide explains the implementation details of the Symbol Conflict Detection System and how to integrate it into your development workflow.
## Architecture Overview
### System Components
```
┌─────────────────────────────────────────────────────────┐
│ Compiled Object Files (.o / .obj) │
│ (Created during cmake --build) │
└──────────────────┬──────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ extract-symbols.sh │
│ ├─ Scan object files in build/ │
│ ├─ Use nm (Unix/macOS) or dumpbin (Windows) │
│ ├─ Extract symbol definitions (skip undefined refs) │
│ └─ Generate JSON database │
└──────────────────┬──────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ symbol_database.json │
│ ├─ Metadata (platform, timestamp, stats) │
│ ├─ Conflicts array (symbols defined multiple times) │
│ └─ Symbols dict (full mapping) │
└──────────────────┬──────────────────────────────────────┘
┌────────────┼────────────┐
│ │ │
▼ ▼ ▼
┌──────────────────────────────────┐
│ check-duplicate-symbols.sh │
│ └─ Parse JSON & report conflicts │
└──────────────────────────────────┘
│ │ │
│ │ │
[CLI] [Pre-Commit] [CI/CD]
```
## Script Implementation Details
### 1. extract-symbols.sh
**Purpose:** Extract all symbol definitions from object files
**Key Functions:**
#### Symbol Extraction (Unix/macOS)
```bash
nm -P <obj_file> # Parse format: SYMBOL TYPE [VALUE] [SIZE]
```
Format:
- Column 1: Symbol name
- Column 2: Symbol type (T=text, D=data, R=read-only, etc.)
- Column 3: Address (if defined)
- Column 4: Size
Filtering logic:
1. Skip symbols with name starting with space
2. Skip symbols with "U" in the type column (undefined)
3. Keep symbols with types: T, D, R, B, C, etc.
#### Symbol Extraction (Windows)
```bash
dumpbin /symbols <obj_file> # Parse binary format output
```
Note: Windows extraction is less precise than Unix. Symbol types are approximated.
#### JSON Generation
Uses Python3 for portability:
1. Read all extracted symbols from temp file
2. Group by symbol name
3. Identify conflicts (count > 1)
4. Generate structured JSON
5. Sort conflicts by count (most duplicated first)
**Performance Considerations:**
- Process all 4000+ object files sequentially
- `nm` is fast (~1ms per file on macOS)
- Python JSON generation is <100ms
- Total: ~2-3 seconds for typical builds
### 2. check-duplicate-symbols.sh
**Purpose:** Analyze symbol database and report conflicts
**Algorithm:**
1. Parse JSON database
2. Extract metadata and conflicts array
3. For each conflict:
- Print symbol name
- List all definitions with object files and types
4. Exit with code based on conflict count
**Output Formatting:**
- Colors for readability (RED for errors, GREEN for success)
- Structured output with proper indentation
- Fix suggestions (if --fix-suggestions flag)
### 3. Pre-commit Hook (`.githooks/pre-commit`)
**Purpose:** Fast symbol check on changed files (not full extraction)
**Algorithm:**
1. Get staged changes: `git diff --cached`
2. Filter to .cc/.h files
3. Find matching object files in build directory
4. Use `nm` to extract symbols from affected objects only
5. Check for duplicates using `sort | uniq -d`
**Key Optimizations:**
- Only processes changed files, not entire build
- Quick `sort | uniq -d` instead of full JSON parsing
- Can be bypassed with `--no-verify`
- Runs in <2 seconds
**Matching Logic:**
```
source file: src/cli/flags.cc
object file: build/CMakeFiles/*/src/cli/flags.cc.o
```
### 4. test-symbol-detection.sh
**Purpose:** Validate the entire system
**Test Sequence:**
1. Check scripts are executable (chmod +x)
2. Verify build directory exists
3. Count object files (need > 0)
4. Run extract-symbols.sh (timeout: 2 minutes)
5. Validate JSON structure (required fields)
6. Run check-duplicate-symbols.sh
7. Verify pre-commit hook configuration
8. Display sample output
**Exit Codes:**
- `0` = All tests passed
- `1` = Test failed (specific test prints which one)
## Integration Workflows
### Development Workflow
```
1. Make code changes
2. Build project: cmake --build build
3. Pre-commit hook runs automatically
├─ Fast check on changed files
├─ Warns if conflicts detected
└─ Allow commit with --no-verify if intentional
4. Run full check before pushing (optional):
./scripts/extract-symbols.sh && ./scripts/check-duplicate-symbols.sh
5. Push to GitHub
```
### CI/CD Workflow
```
GitHub Push/PR
.github/workflows/symbol-detection.yml
├─ Checkout code
├─ Setup environment
├─ Build project
├─ Extract symbols
├─ Check for conflicts
├─ Upload artifact (symbol_database.json)
└─ Fail job if conflicts found
```
### First-Time Setup
```bash
# 1. Configure git hooks (one-time)
git config core.hooksPath .githooks
# 2. Make hook executable
chmod +x .githooks/pre-commit
# 3. Test the system
./scripts/test-symbol-detection.sh
# 4. Create initial symbol database
./scripts/extract-symbols.sh && ./scripts/check-duplicate-symbols.sh
```
## JSON Database Schema
```json
{
"metadata": {
"platform": "Darwin|Linux|Windows",
"build_dir": "/path/to/build",
"timestamp": "ISO8601Z format",
"object_files_scanned": 145,
"total_symbols": 8923,
"total_conflicts": 2
},
"conflicts": [
{
"symbol": "FLAGS_rom",
"count": 2,
"definitions": [
{
"object_file": "flags.cc.o",
"type": "D"
},
{
"object_file": "emu_test.cc.o",
"type": "D"
}
]
}
],
"symbols": {
"FLAGS_rom": [
{ "object_file": "flags.cc.o", "type": "D" },
{ "object_file": "emu_test.cc.o", "type": "D" }
]
}
}
```
### Schema Notes:
- `symbols` dict only includes conflicted symbols (keeps file size small)
- `conflicts` array is sorted by count (most duplicated first)
- `type` field indicates symbol kind (T/D/R/B/U/etc.)
- Timestamps are UTC ISO8601 for cross-platform compatibility
## Symbol Types Reference
| Type | Name | Meaning | Common in |
|------|------|---------|-----------|
| T | Text | Function/code | .cc/.o |
| D | Data | Initialized variable | .cc/.o |
| R | Read-only | Const data | .cc/.o |
| B | BSS | Uninitialized data | .cc/.o |
| C | Common | Tentative definition | .cc/.o |
| U | Undefined | External reference | (skipped) |
| A | Absolute | Absolute symbol | (rare) |
| W | Weak | Weak symbol | (rare) |
## Troubleshooting Guide
### Extraction Fails with "No object files found"
**Cause:** Build directory not populated with .o files
**Solution:**
```bash
cmake --build build # First build
./scripts/extract-symbols.sh
```
### Extraction is Very Slow
**Cause:** 4000+ object files, or nm is slow on filesystem
**Solution:**
1. Ensure build is on fast SSD
2. Check system load: `top` or `Activity Monitor`
3. Run in foreground to see progress
4. Optional: Parallelize in future version
### Symbol Not Appearing as Conflict
**Cause:** Symbol is weak (W type) or hidden/internal
**Solution:**
Check directly with nm:
```bash
nm build/CMakeFiles/*/*.o | grep symbol_name
```
### Pre-commit Hook Not Running
**Cause:** Git hooks path not configured
**Solution:**
```bash
git config core.hooksPath .githooks
chmod +x .githooks/pre-commit
```
### Windows dumpbin Not Found
**Cause:** Visual Studio not properly installed
**Solution:**
```powershell
# Run from Visual Studio Developer Command Prompt
# or install Visual Studio with "Desktop development with C++"
```
## Performance Optimization Ideas
### Phase 1 (Current)
- Sequential symbol extraction
- Full JSON parsing
- Complete database generation
### Phase 2 (Future)
- Parallel object file processing (~4x speedup)
- Incremental extraction (only new/changed objects)
- Symbol caching (reuse between builds)
### Phase 3 (Future)
- HTML report generation with source links
- Integration with IDE (clangd warnings)
- Automatic fix suggestions with patch generation
## Maintenance
### When to Run Extract
| Scenario | Command |
|----------|---------|
| After major rebuild | `./scripts/extract-symbols.sh` |
| Before pushing | `./scripts/extract-symbols.sh && ./scripts/check-duplicate-symbols.sh` |
| In CI/CD | Automatic (symbol-detection.yml) |
| Quick check on changes | Pre-commit hook (automatic) |
### Cleanup
```bash
# Remove symbol database
rm build/symbol_database.json
# Clean temp files (if stuck)
rm -f build/.temp_symbols.txt build/.object_files.tmp
```
### Updating for New Platforms
To add support for a new platform:
1. Detect platform in `extract-symbols.sh`:
```bash
case "${UNAME_S}" in
NewOS*) IS_NEWOS=true ;;
esac
```
2. Add extraction function:
```bash
extract_symbols_newos() {
local obj_file="$1"
# Use platform-specific tool (e.g., readelf for new Unix variant)
}
```
3. Call appropriate function in main loop
## References
- **nm manual:** `man nm` or online docs
- **dumpbin:** Visual Studio documentation
- **Symbol types:** ELF specification (gabi10000.pdf)
- **ODR violations:** C++ standard section 3.2