yaze/docs/gfx_optimization_recommendations.md
scawful 22297402fc Implement comprehensive atlas rendering system and performance optimizations
- Introduced the AtlasRenderer class for efficient texture management and batch rendering, significantly reducing draw calls.
- Added RenderTilesBatch function in Tilemap for rendering multiple tiles in a single operation, enhancing performance.
- Implemented memory management features including automatic atlas defragmentation and UV coordinate mapping.
- Integrated performance monitoring dashboard to track atlas statistics and rendering efficiency.
- Developed a benchmarking suite to validate performance improvements and ensure accuracy in rendering speed.
- Enhanced existing graphics components to utilize the new atlas rendering system, improving overall responsiveness in the YAZE editor.
2025-09-29 00:03:43 -04:00


# YAZE Graphics System Optimization Recommendations
## Overview
This document analyzes the YAZE graphics system and recommends optimizations, with a focus on A Link to the Past ROM hacking workflows.
## Current Architecture Analysis
### Strengths
1. **Arena-based Resource Management**: Centralized tracking of SDL textures and surfaces
2. **SNES-specific Format Support**: Proper handling of 4BPP/8BPP graphics
3. **Palette Management**: Integrated SNES palette system
4. **Tile-based Editing**: Support for 8x8 and 16x16 tiles
### Performance Bottlenecks Identified
#### 1. Bitmap Class Issues
- **Linear Palette Search**: `SetPixel()` uses O(n) palette lookup
- **Redundant Data Copies**: Multiple copies of pixel data
- **Inefficient Texture Updates**: Full texture updates for single pixel changes
- **Missing Bounds Optimization**: No early exit for out-of-bounds operations
#### 2. Arena Resource Management
- **Hash Map Overhead**: Lookups are O(1) on average, but per-entry memory overhead is high for small collections
- **No Resource Pooling**: Each allocation creates new SDL resources
- **Missing Batch Operations**: No bulk texture/surface operations
#### 3. Tilemap Performance
- **Lazy Loading Inefficiency**: Tiles created on-demand without batching
- **Memory Fragmentation**: Individual tile bitmaps cause memory fragmentation
- **No Tile Caching Strategy**: No LRU or smart caching for frequently used tiles
## Optimization Recommendations
### 1. Bitmap Class Optimizations
#### A. Palette Lookup Optimization
```cpp
// Current: O(n) linear search
uint8_t color_index = 0;
for (size_t i = 0; i < palette_.size(); i++) {
  if (palette_[i].rgb().x == color.rgb().x && ...) {
    color_index = static_cast<uint8_t>(i);
    break;
  }
}

// Optimized: O(1) average-case hash map lookup
class Bitmap {
 private:
  std::unordered_map<uint32_t, uint8_t> color_to_index_cache_;

 public:
  // Rebuilds the cache; call this whenever palette_ changes.
  void InvalidatePaletteCache() {
    color_to_index_cache_.clear();
    for (size_t i = 0; i < palette_.size(); i++) {
      uint32_t color_hash = HashColor(palette_[i].rgb());
      // try_emplace keeps the first index for duplicate colors, which
      // matches the result of the linear search above.
      color_to_index_cache_.try_emplace(color_hash, static_cast<uint8_t>(i));
    }
  }

  // Returns index 0 when the color is not in the palette.
  // HashColor must map distinct colors to distinct keys for this to be exact.
  uint8_t FindColorIndex(const SnesColor& color) const {
    uint32_t hash = HashColor(color.rgb());
    auto it = color_to_index_cache_.find(hash);
    return (it != color_to_index_cache_.end()) ? it->second : 0;
  }
};
```
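The `HashColor` helper used above is not defined in this document. As a minimal sketch, assuming each channel is available as an 8-bit value, the channels can be packed into one integer so that distinct colors never share a key. The `Rgb8`, `PackColorKey`, and `BuildColorIndex` names are hypothetical and are not part of YAZE's API.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a palette entry; not YAZE's SnesColor.
struct Rgb8 {
  uint8_t r, g, b;
};

// Packs three 8-bit channels into one key. Distinct colors always produce
// distinct keys, so a map hit needs no extra equality check.
inline uint32_t PackColorKey(const Rgb8& c) {
  return (static_cast<uint32_t>(c.r) << 16) |
         (static_cast<uint32_t>(c.g) << 8) |
         static_cast<uint32_t>(c.b);
}

// Builds the color -> index table. try_emplace keeps the first index for
// duplicate colors, matching a front-to-back linear search. Indices are
// limited to 0..255 because they are stored as uint8_t.
inline std::unordered_map<uint32_t, uint8_t> BuildColorIndex(
    const std::vector<Rgb8>& palette) {
  std::unordered_map<uint32_t, uint8_t> index;
  index.reserve(palette.size());
  for (std::size_t i = 0; i < palette.size() && i < 256; ++i) {
    index.try_emplace(PackColorKey(palette[i]), static_cast<uint8_t>(i));
  }
  return index;
}
```

With a perfect key like this, the lookup result is identical to the linear search for every color that is present in the palette; only the miss case (index 0) remains a policy decision.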
#### B. Dirty Region Tracking
```cpp
class Bitmap {
 private:
  struct DirtyRegion {
    int min_x, min_y, max_x, max_y;
    bool is_dirty = false;
  } dirty_region_;

 public:
  void SetPixel(int x, int y, const SnesColor& color) {
    // ... existing code ...

    // Update dirty region instead of marking entire bitmap
    if (!dirty_region_.is_dirty) {
      dirty_region_.min_x = dirty_region_.max_x = x;
      dirty_region_.min_y = dirty_region_.max_y = y;
      dirty_region_.is_dirty = true;
    } else {
      dirty_region_.min_x = std::min(dirty_region_.min_x, x);
      dirty_region_.min_y = std::min(dirty_region_.min_y, y);
      dirty_region_.max_x = std::max(dirty_region_.max_x, x);
      dirty_region_.max_y = std::max(dirty_region_.max_y, y);
    }
  }

  void UpdateTexture(SDL_Renderer* renderer) {
    if (!dirty_region_.is_dirty) return;

    // Bounding box of every pixel changed since the last update
    SDL_Rect dirty_rect = {
        dirty_region_.min_x, dirty_region_.min_y,
        dirty_region_.max_x - dirty_region_.min_x + 1,
        dirty_region_.max_y - dirty_region_.min_y + 1};

    // Update only the dirty region
    Arena::Get().UpdateTextureRegion(texture_, surface_, &dirty_rect);
    dirty_region_.is_dirty = false;
  }
};
```
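The bounding-box bookkeeping can also be factored into a small standalone type, which makes it easy to unit test without SDL. The sketch below is one possible shape, not YAZE's implementation: the `Reset()`, `AddPoint()`, `width()`, `height()`, and `DirtyRegionOffset()` names are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical standalone version of the dirty-rectangle bookkeeping.
struct DirtyRegion {
  int min_x = 0, min_y = 0, max_x = 0, max_y = 0;
  bool is_dirty = false;

  void Reset() { is_dirty = false; }

  // Grows the bounding box so that it includes (x, y).
  void AddPoint(int x, int y) {
    if (!is_dirty) {
      min_x = max_x = x;
      min_y = max_y = y;
      is_dirty = true;
      return;
    }
    min_x = std::min(min_x, x);
    min_y = std::min(min_y, y);
    max_x = std::max(max_x, x);
    max_y = std::max(max_y, y);
  }

  int width() const { return is_dirty ? max_x - min_x + 1 : 0; }
  int height() const { return is_dirty ? max_y - min_y + 1 : 0; }
};

// Byte offset of the region's top-left pixel inside a pixel buffer with the
// given row pitch. A partial upload such as
// SDL_UpdateTexture(texture, &rect, pixels + offset, pitch) needs the source
// pointer to start at this pixel (assuming the buffer is already in the
// texture's pixel format).
inline std::size_t DirtyRegionOffset(const DirtyRegion& r, int pitch,
                                     int bytes_per_pixel) {
  return static_cast<std::size_t>(r.min_y) * pitch +
         static_cast<std::size_t>(r.min_x) * bytes_per_pixel;
}
```

A single bounding box is the simplest policy. Two edits in opposite corners still produce a near-full-size rectangle, so a list of smaller rectangles may be worth measuring if that pattern is common.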
### 2. Arena Resource Management Improvements
#### A. Resource Pooling
```cpp
class Arena {
 private:
  struct TexturePool {
    std::vector<SDL_Texture*> available_textures_;
    std::unordered_map<SDL_Texture*, std::pair<int, int>> texture_sizes_;
  } texture_pool_;

  struct SurfacePool {
    std::vector<SDL_Surface*> available_surfaces_;
    std::unordered_map<SDL_Surface*, std::tuple<int, int, int, int>> surface_info_;
  } surface_pool_;

 public:
  SDL_Texture* AllocateTexture(SDL_Renderer* renderer, int width, int height) {
    // Try to reuse existing texture of same size
    auto& available = texture_pool_.available_textures_;
    for (auto it = available.begin(); it != available.end(); ++it) {
      // find() avoids inserting an entry for a texture with no recorded size
      auto size = texture_pool_.texture_sizes_.find(*it);
      if (size != texture_pool_.texture_sizes_.end() &&
          size->second.first == width && size->second.second == height) {
        SDL_Texture* texture = *it;
        available.erase(it);
        return texture;
      }
    }
    // Create new texture if none available.
    // CreateNewTexture is expected to record the size in texture_sizes_.
    return CreateNewTexture(renderer, width, height);
  }

  void FreeTexture(SDL_Texture* texture) {
    if (!texture) return;
    // Return to pool instead of destroying
    texture_pool_.available_textures_.push_back(texture);
  }
};
```
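The linear scan above is fine for a handful of pooled textures. If the pool grows, a free list keyed by size avoids the scan. The following is a sketch under that assumption; `SizeKeyedPool` is a hypothetical name, and the handle type is a template parameter so the example compiles without SDL (in the Arena it would be `SDL_Texture*`).

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Generic free list keyed by (width, height).
template <typename Handle>
class SizeKeyedPool {
 public:
  // Returns a pooled handle of exactly this size, or a value-initialized
  // Handle (nullptr for pointer types) when none is available.
  Handle Acquire(int width, int height) {
    auto it = free_.find({width, height});
    if (it == free_.end() || it->second.empty()) return Handle{};
    Handle h = it->second.back();
    it->second.pop_back();
    return h;
  }

  // Stores a handle under its size so a later Acquire can reuse it.
  void Release(Handle h, int width, int height) {
    free_[{width, height}].push_back(h);
  }

  // Total number of handles currently waiting for reuse.
  std::size_t available() const {
    std::size_t n = 0;
    for (const auto& entry : free_) n += entry.second.size();
    return n;
  }

 private:
  std::map<std::pair<int, int>, std::vector<Handle>> free_;
};
```

A pool like this never shrinks on its own, so a real implementation would probably also cap the number of idle handles per size and destroy the excess.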
#### B. Batch Operations
```cpp
class Arena {
 public:
  struct BatchUpdate {
    std::vector<std::pair<SDL_Texture*, SDL_Surface*>> updates_;

    void AddUpdate(SDL_Texture* texture, SDL_Surface* surface) {
      updates_.emplace_back(texture, surface);
    }

    void Execute() {
      // Batch all texture updates for efficiency.
      // BatchUpdate is a nested type with no Arena instance of its own, so
      // the call goes through the singleton.
      for (auto& update : updates_) {
        Arena::Get().UpdateTexture(update.first, update.second);
      }
      updates_.clear();
    }
  };

  BatchUpdate CreateBatch() { return BatchUpdate{}; }
};
```
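A batch only saves work if it does less than the individual calls would. One possible refinement, sketched here as an assumption rather than as part of the proposal above, is to keep at most one pending update per texture so that a texture touched several times in a frame is uploaded once. `CoalescingBatch` is a hypothetical name, and the texture and surface types are template parameters so the example compiles without SDL.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Queue of pending uploads that keeps at most one entry per texture.
// TextureT and SurfaceT stand in for SDL_Texture* and SDL_Surface*.
template <typename TextureT, typename SurfaceT>
class CoalescingBatch {
 public:
  // Queues an update. A later call for the same texture replaces the
  // surface of the existing entry instead of adding a second one.
  void Add(TextureT texture, SurfaceT surface) {
    auto [it, inserted] = slot_.try_emplace(texture, updates_.size());
    if (inserted) {
      updates_.push_back({texture, surface});
    } else {
      updates_[it->second].surface = surface;
    }
  }

  // Calls upload(texture, surface) once per distinct texture, in the order
  // each texture was first queued, then empties the batch.
  // Returns the number of uploads performed.
  template <typename UploadFn>
  std::size_t Execute(UploadFn upload) {
    const std::size_t count = updates_.size();
    for (const auto& u : updates_) upload(u.texture, u.surface);
    updates_.clear();
    slot_.clear();
    return count;
  }

 private:
  struct Update {
    TextureT texture;
    SurfaceT surface;
  };
  std::vector<Update> updates_;
  std::unordered_map<TextureT, std::size_t> slot_;  // texture -> index in updates_
};
```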
### 3. Tilemap Performance Enhancements
#### A. Smart Tile Caching
```cpp
class Tilemap {
 private:
  struct TileCache {
    static constexpr size_t MAX_CACHE_SIZE = 1024;

    struct Entry {
      Bitmap bitmap;
      std::list<int>::iterator order_it;  // Position in access_order_
    };
    std::unordered_map<int, Entry> cache_;
    std::list<int> access_order_;  // Front = most recently used

    Bitmap* GetTile(int tile_id) {
      auto it = cache_.find(tile_id);
      if (it == cache_.end()) return nullptr;
      // Move to front in O(1); list::remove() would scan the whole list
      access_order_.splice(access_order_.begin(), access_order_,
                           it->second.order_it);
      return &it->second.bitmap;
    }

    Bitmap* CacheTile(int tile_id, Bitmap&& bitmap) {
      // Overwrite in place if the tile is already cached, so access_order_
      // never holds the same id twice
      if (Bitmap* existing = GetTile(tile_id)) {
        *existing = std::move(bitmap);
        return existing;
      }
      if (cache_.size() >= MAX_CACHE_SIZE) {
        // Remove least recently used tile
        cache_.erase(access_order_.back());
        access_order_.pop_back();
      }
      access_order_.push_front(tile_id);
      auto it = cache_.emplace(
          tile_id, Entry{std::move(bitmap), access_order_.begin()}).first;
      return &it->second.bitmap;
    }
  } tile_cache_;

 public:
  void RenderTile(int tile_id) {
    if (Bitmap* cached_tile = tile_cache_.GetTile(tile_id)) {
      core::Renderer::Get().UpdateBitmap(cached_tile);
      return;
    }
    // Create new tile and cache it
    Bitmap* tile = tile_cache_.CacheTile(tile_id, CreateTileFromAtlas(tile_id));
    core::Renderer::Get().RenderBitmap(tile);
  }
};
```
#### B. Atlas-based Rendering
```cpp
class Tilemap {
 public:
  void RenderTilemap(const std::vector<int>& tile_ids,
                     const std::vector<SDL_Rect>& positions) {
    // Each tile needs exactly one destination rectangle
    if (tile_ids.size() != positions.size()) return;

    // Batch render multiple tiles from atlas
    std::vector<SDL_Rect> src_rects;
    src_rects.reserve(tile_ids.size());
    for (int tile_id : tile_ids) {
      src_rects.push_back(GetTileRect(tile_id));
    }

    // Single draw call for all tiles; positions already holds the
    // destination rectangles, so no copy is needed
    core::Renderer::Get().RenderAtlas(atlas.texture(), src_rects, positions);
  }
};
```
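`GetTileRect` is not defined in this document. Assuming the atlas stores square tiles in row-major order (an assumption, since the atlas layout is not specified here), the source rectangle follows directly from the tile index. `Rect` and `TileRectInAtlas` are hypothetical names; `Rect` mirrors the four `int` fields of `SDL_Rect` so the sketch compiles without SDL.

```cpp
// Minimal rectangle type with the same four int fields as SDL_Rect.
struct Rect {
  int x, y, w, h;
};

// Maps a tile index to its source rectangle in a row-major atlas of square
// tiles. Returns an empty rectangle for invalid input.
inline Rect TileRectInAtlas(int tile_id, int atlas_width_px, int tile_size) {
  if (tile_id < 0 || tile_size <= 0 || atlas_width_px < tile_size) {
    return {0, 0, 0, 0};
  }
  const int tiles_per_row = atlas_width_px / tile_size;
  return {(tile_id % tiles_per_row) * tile_size,   // column -> x
          (tile_id / tiles_per_row) * tile_size,   // row -> y
          tile_size, tile_size};
}
```

For example, in a 128-pixel-wide atlas of 16x16 tiles there are 8 tiles per row, so tile 9 sits in column 1 of row 1, at pixel (16, 16).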
### 4. Editor-Specific Optimizations
#### A. Graphics Editor Improvements
```cpp
class GraphicsEditor {
 private:
  // PixelChange and DirtyRegion (with Reset() and AddPoint()) are assumed
  // helper types; they are not defined in this document.
  struct EditingState {
    bool is_drawing = false;
    std::vector<PixelChange> undo_stack_;
    std::vector<PixelChange> redo_stack_;
    DirtyRegion current_edit_region_;
  } editing_state_;

 public:
  void StartDrawing() {
    editing_state_.is_drawing = true;
    editing_state_.current_edit_region_.Reset();
  }

  void EndDrawing() {
    if (editing_state_.is_drawing) {
      // Batch update only the edited region
      UpdateDirtyRegion(editing_state_.current_edit_region_);
      editing_state_.is_drawing = false;
    }
  }

  void SetPixel(int x, int y, const SnesColor& color) {
    // Record change for undo/redo
    editing_state_.undo_stack_.emplace_back(x, y, GetPixel(x, y), color);
    // A new edit invalidates the redo history
    editing_state_.redo_stack_.clear();

    // Update pixel
    current_bitmap_->SetPixel(x, y, color);

    // Update edit region
    editing_state_.current_edit_region_.AddPoint(x, y);
  }
};
```
#### B. Palette Editor Optimizations
```cpp
class PaletteEditor {
 private:
  struct PaletteCache {
    std::unordered_map<uint32_t, ImVec4> snes_to_rgba_cache_;
    std::unordered_map<uint32_t, uint16_t> rgba_to_snes_cache_;

    void Invalidate() {
      snes_to_rgba_cache_.clear();
      rgba_to_snes_cache_.clear();
    }
  } palette_cache_;

 public:
  ImVec4 ConvertSnesToRgba(uint16_t snes_color) {
    uint32_t key = snes_color;
    auto it = palette_cache_.snes_to_rgba_cache_.find(key);
    if (it != palette_cache_.snes_to_rgba_cache_.end()) {
      return it->second;
    }

    ImVec4 rgba = ConvertSnesColorToImVec4(SnesColor(snes_color));
    palette_cache_.snes_to_rgba_cache_[key] = rgba;
    return rgba;
  }
};
```
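For context on what is being cached: a SNES color is a 15-bit value with 5 bits per channel, red in bits 0-4, green in bits 5-9, and blue in bits 10-14. The sketch below shows one common way to expand it to 8 bits per channel and back. The `Rgba8`, `Expand5To8`, `SnesToRgba8`, and `Rgba8ToSnes` names are hypothetical, and YAZE's own `ConvertSnesColorToImVec4` may scale channels differently and returns floating-point components.

```cpp
#include <cstdint>

struct Rgba8 {
  uint8_t r, g, b, a;
};

// Expands a 5-bit channel to 8 bits by replicating its top bits into the
// low bits, so 0 maps to 0 and 31 maps to 255.
inline uint8_t Expand5To8(uint16_t c5) {
  return static_cast<uint8_t>((c5 << 3) | (c5 >> 2));
}

// Decodes a 15-bit SNES color (0bbbbbgg gggrrrrr) into opaque RGBA.
inline Rgba8 SnesToRgba8(uint16_t snes) {
  return {Expand5To8(snes & 0x1F), Expand5To8((snes >> 5) & 0x1F),
          Expand5To8((snes >> 10) & 0x1F), 255};
}

// Encodes RGBA back to a 15-bit SNES color by dropping the low 3 bits of
// each channel. Alpha is ignored.
inline uint16_t Rgba8ToSnes(const Rgba8& c) {
  return static_cast<uint16_t>((c.r >> 3) | ((c.g >> 3) << 5) |
                               ((c.b >> 3) << 10));
}
```

Because the conversion is a few shifts and there are only 32,768 possible SNES colors, it is worth measuring whether a hash-map cache actually beats recomputing the value or indexing a flat lookup table.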
### 5. Memory Management Improvements
#### A. Custom Allocator for Graphics Data
```cpp
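class GraphicsAllocator {
 private:
  static constexpr size_t POOL_SIZE = 16 * 1024 * 1024;  // 16 MiB
  std::unique_ptr<char[]> pool_;  // Owned, so the pool is freed on destruction
  size_t offset_ = 0;

 public:
  GraphicsAllocator() : pool_(new char[POOL_SIZE]) {}

  // Bump-allocates `size` bytes aligned to `alignment`, which must be a
  // power of two no larger than the alignment of the pool itself.
  // Returns nullptr when the pool is exhausted.
  void* Allocate(size_t size, size_t alignment = alignof(std::max_align_t)) {
    size_t aligned = (offset_ + alignment - 1) & ~(alignment - 1);
    // Written this way so the check cannot overflow
    if (aligned > POOL_SIZE || size > POOL_SIZE - aligned) {
      return nullptr;  // Pool exhausted
    }
    offset_ = aligned + size;
    return pool_.get() + aligned;
  }

  // Releases every allocation at once; individual frees are not supported.
  void Reset() { offset_ = 0; }
};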
```
#### B. Smart Pointer Management
```cpp
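template <typename T>
class GraphicsPtr {
 private:
  T* ptr_;
  std::function<void(T*)> deleter_;

 public:
  GraphicsPtr(T* ptr, std::function<void(T*)> deleter)
      : ptr_(ptr), deleter_(std::move(deleter)) {}

  ~GraphicsPtr() {
    if (ptr_ && deleter_) {
      deleter_(ptr_);
    }
  }

  // Non-copyable: a copy would run the deleter twice on the same pointer.
  GraphicsPtr(const GraphicsPtr&) = delete;
  GraphicsPtr& operator=(const GraphicsPtr&) = delete;

  // Movable: ownership transfers and the source is left empty.
  GraphicsPtr(GraphicsPtr&& other) noexcept
      : ptr_(std::exchange(other.ptr_, nullptr)),
        deleter_(std::move(other.deleter_)) {}

  T* get() const { return ptr_; }
  T& operator*() const { return *ptr_; }
  T* operator->() const { return ptr_; }
};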
```
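The standard library already provides this ownership model: `std::unique_ptr` with a custom deleter is move-only and calls the deleter exactly once. The sketch below illustrates the pattern with a stand-in resource so it compiles without SDL. `FakeTexture`, `DestroyFakeTexture`, and the other names are hypothetical; in the Arena the deleter would wrap the matching SDL destroy call.

```cpp
#include <memory>

// Hypothetical resource type standing in for an SDL object.
struct FakeTexture {
  int id;
};

// Counts how many resources have been released (for demonstration only).
inline int& DestroyedCount() {
  static int count = 0;
  return count;
}

// Stand-in for the library's release function.
inline void DestroyFakeTexture(FakeTexture* t) {
  ++DestroyedCount();
  delete t;
}

// Stateless deleter type: it adds no per-pointer storage, whereas a
// std::function member is stored in every smart pointer instance.
struct FakeTextureDeleter {
  void operator()(FakeTexture* t) const { DestroyFakeTexture(t); }
};

// Move-only owning pointer that releases through DestroyFakeTexture.
using FakeTexturePtr = std::unique_ptr<FakeTexture, FakeTextureDeleter>;
```

Moving a `FakeTexturePtr` leaves the source null, so the resource is released once, when the last owner goes out of scope.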
## Implementation Priority
### Phase 1 (High Impact, Low Risk)
1. **Palette Lookup Optimization**: Hash map for O(1) color lookups
2. **Dirty Region Tracking**: Only update changed areas
3. **Resource Pooling**: Reuse SDL textures and surfaces
### Phase 2 (Medium Impact, Medium Risk)
1. **Tile Caching System**: LRU cache for frequently used tiles
2. **Batch Operations**: Group texture updates
3. **Memory Pool Allocator**: Custom allocator for graphics data
### Phase 3 (High Impact, High Risk)
1. **Atlas-based Rendering**: Single draw calls for multiple tiles
2. **Multi-threaded Updates**: Background texture processing
3. **GPU-based Operations**: Move some operations to GPU
## Performance Metrics
### Target Improvements
- **Palette Lookup**: 100x faster (O(n) → O(1))
- **Texture Updates**: 10x faster (dirty regions)
- **Memory Usage**: 30% reduction (resource pooling)
- **Frame Rate**: 2x improvement (batch operations)
### Measurement Tools
```cpp
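#include <chrono>
#include <numeric>
#include <string>
#include <unordered_map>
#include <vector>

class PerformanceProfiler {
 public:
  void StartTimer(const std::string& operation) {
    timers_[operation] = Clock::now();
  }

  void EndTimer(const std::string& operation) {
    auto it = timers_.find(operation);
    if (it == timers_.end()) return;  // No matching StartTimer call
    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(
        Clock::now() - it->second).count();
    operation_times_[operation].push_back(static_cast<double>(duration));
    timers_.erase(it);
  }

  void Report() {
    for (auto& [operation, times] : operation_times_) {
      double avg_time =
          std::accumulate(times.begin(), times.end(), 0.0) / times.size();
      SDL_Log("Operation %s: %.2f μs average", operation.c_str(), avg_time);
    }
  }

 private:
  // steady_clock is monotonic, which suits interval measurement
  using Clock = std::chrono::steady_clock;
  std::unordered_map<std::string, Clock::time_point> timers_;
  std::unordered_map<std::string, std::vector<double>> operation_times_;
};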
```
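Paired start and end calls are easy to unbalance when a function returns early. A scope-based timer is one way to avoid that. The sketch below is an illustration only: `TimingLog` and `ScopedTimer` are hypothetical names, and `TimingLog` is a minimal stand-in for whatever stores the samples.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sample store keyed by operation name.
class TimingLog {
 public:
  void Record(const std::string& operation, double microseconds) {
    samples_[operation].push_back(microseconds);
  }

  // Number of samples recorded for an operation (0 if never recorded).
  std::size_t Count(const std::string& operation) const {
    auto it = samples_.find(operation);
    return it == samples_.end() ? 0 : it->second.size();
  }

 private:
  std::unordered_map<std::string, std::vector<double>> samples_;
};

// Measures the lifetime of a scope and records it on destruction, so early
// returns are timed as well.
class ScopedTimer {
 public:
  ScopedTimer(TimingLog& log, std::string operation)
      : log_(log),
        operation_(std::move(operation)),
        start_(std::chrono::steady_clock::now()) {}

  ~ScopedTimer() {
    const std::chrono::duration<double, std::micro> elapsed =
        std::chrono::steady_clock::now() - start_;
    log_.Record(operation_, elapsed.count());
  }

  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;

 private:
  TimingLog& log_;
  std::string operation_;
  std::chrono::steady_clock::time_point start_;
};
```

Declaring `ScopedTimer timer(log, "SetPixel");` at the top of a function is then enough to collect one sample per call.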
## Conclusion
These optimizations are expected to improve the performance and responsiveness of the YAZE graphics system, particularly for ROM hacking workflows that involve frequent pixel manipulation, palette editing, and tile-based graphics editing. The phased approach limits risk by delivering the low-risk changes first, and the targets above should be confirmed with the measurement tools before each phase is considered complete.