Implement comprehensive atlas rendering system and performance optimizations
- Introduced the AtlasRenderer class for efficient texture management and batch rendering, significantly reducing draw calls.
- Added RenderTilesBatch function in Tilemap for rendering multiple tiles in a single operation, enhancing performance.
- Implemented memory management features including automatic atlas defragmentation and UV coordinate mapping.
- Integrated performance monitoring dashboard to track atlas statistics and rendering efficiency.
- Developed a benchmarking suite to validate performance improvements and ensure accuracy in rendering speed.
- Enhanced existing graphics components to utilize the new atlas rendering system, improving overall responsiveness in the YAZE editor.
421
docs/gfx_optimization_recommendations.md
Normal file
@@ -0,0 +1,421 @@
# YAZE Graphics System Optimization Recommendations

## Overview

This document analyzes the YAZE graphics system and recommends optimizations for it, specifically targeting A Link to the Past ROM hacking workflows.

## Current Architecture Analysis

### Strengths
1. **Arena-based Resource Management**: Efficient SDL resource pooling
2. **SNES-specific Format Support**: Proper handling of 4BPP/8BPP graphics
3. **Palette Management**: Integrated SNES palette system
4. **Tile-based Editing**: Support for 8x8 and 16x16 tiles

### Performance Bottlenecks Identified

#### 1. Bitmap Class Issues
- **Linear Palette Search**: `SetPixel()` uses O(n) palette lookup
- **Redundant Data Copies**: Multiple copies of pixel data
- **Inefficient Texture Updates**: Full texture updates for single pixel changes
- **Missing Bounds Optimization**: No early exit for out-of-bounds operations

#### 2. Arena Resource Management
- **Hash Map Overhead**: O(1) lookup but memory overhead for small collections
- **No Resource Pooling**: Each allocation creates new SDL resources
- **Missing Batch Operations**: No bulk texture/surface operations

#### 3. Tilemap Performance
- **Lazy Loading Inefficiency**: Tiles created on-demand without batching
- **Memory Fragmentation**: Individual tile bitmaps cause memory fragmentation
- **No Tile Caching Strategy**: No LRU or smart caching for frequently used tiles

## Optimization Recommendations

### 1. Bitmap Class Optimizations

#### A. Palette Lookup Optimization
```cpp
// Current: O(n) linear search
uint8_t color_index = 0;
for (size_t i = 0; i < palette_.size(); i++) {
  if (palette_[i].rgb().x == color.rgb().x && ...) {
    color_index = static_cast<uint8_t>(i);
    break;
  }
}

// Optimized: O(1) hash map lookup
class Bitmap {
 private:
  // Maps a packed color key to its palette index. HashColor() must return a
  // distinct key for every distinct color, because only the key is stored.
  std::unordered_map<uint32_t, uint8_t> color_to_index_cache_;

 public:
  // Rebuilds the cache; call whenever palette_ changes
  void InvalidatePaletteCache() {
    color_to_index_cache_.clear();
    for (size_t i = 0; i < palette_.size(); i++) {
      uint32_t color_hash = HashColor(palette_[i].rgb());
      // emplace keeps the first index for duplicate colors, which matches
      // the result of the linear search above
      color_to_index_cache_.emplace(color_hash, static_cast<uint8_t>(i));
    }
  }

  // Returns index 0 when the color is not in the palette
  uint8_t FindColorIndex(const SnesColor& color) const {
    uint32_t hash = HashColor(color.rgb());
    auto it = color_to_index_cache_.find(hash);
    return (it != color_to_index_cache_.end()) ? it->second : 0;
  }
};
```
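
The cache above depends on a `HashColor` helper that this document does not define. The following is a minimal sketch of one possible version, not the project's implementation. It assumes `rgb()` yields float channels in the range 0 to 255; the `Rgb` struct and the `ToByte` helper are hypothetical stand-ins introduced only for illustration. Packing each channel into its own byte gives every distinct 8-bit color a distinct key, which is what the cache needs to stay correct.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for the value returned by SnesColor::rgb().
// Assumption: channels are floats in the range [0, 255].
struct Rgb {
  float x, y, z;  // red, green, blue
};

// Clamp one channel to [0, 255] and round it to the nearest integer.
inline uint32_t ToByte(float channel) {
  if (channel <= 0.0f) return 0;
  if (channel >= 255.0f) return 255;
  return static_cast<uint32_t>(std::lround(channel));
}

// Pack the three channels into a 24-bit key laid out as 0xRRGGBB.
// Distinct 8-bit colors always produce distinct keys.
inline uint32_t HashColor(const Rgb& c) {
  return (ToByte(c.x) << 16) | (ToByte(c.y) << 8) | ToByte(c.z);
}
```

If the real channel range is 0 to 1 instead, the channels would need to be scaled by 255 before packing.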

#### B. Dirty Region Tracking
```cpp
class Bitmap {
 private:
  struct DirtyRegion {
    // Inclusive bounds; only meaningful while is_dirty is true
    int min_x = 0, min_y = 0, max_x = 0, max_y = 0;
    bool is_dirty = false;
  } dirty_region_;

 public:
  void SetPixel(int x, int y, const SnesColor& color) {
    // ... existing code ...

    // Update dirty region instead of marking entire bitmap
    if (!dirty_region_.is_dirty) {
      dirty_region_.min_x = dirty_region_.max_x = x;
      dirty_region_.min_y = dirty_region_.max_y = y;
      dirty_region_.is_dirty = true;
    } else {
      dirty_region_.min_x = std::min(dirty_region_.min_x, x);
      dirty_region_.min_y = std::min(dirty_region_.min_y, y);
      dirty_region_.max_x = std::max(dirty_region_.max_x, x);
      dirty_region_.max_y = std::max(dirty_region_.max_y, y);
    }
  }

  void UpdateTexture(SDL_Renderer* renderer) {
    if (!dirty_region_.is_dirty) return;

    // Bounds are inclusive, so width and height need the +1
    SDL_Rect dirty_rect = {
        dirty_region_.min_x, dirty_region_.min_y,
        dirty_region_.max_x - dirty_region_.min_x + 1,
        dirty_region_.max_y - dirty_region_.min_y + 1
    };

    // Update only the dirty region
    Arena::Get().UpdateTextureRegion(texture_, surface_, &dirty_rect);
    dirty_region_.is_dirty = false;
  }
};
```
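
The editor sketch later in this document calls `Reset()` and `AddPoint()` on a dirty region, but the struct above has neither. A stand-alone version with those two members might look like the sketch below. The `width()` and `height()` accessors are additions for illustration and do not appear elsewhere in this document.

```cpp
#include <algorithm>

// Illustrative stand-alone dirty rectangle with inclusive bounds.
struct DirtyRegion {
  int min_x = 0, min_y = 0, max_x = 0, max_y = 0;
  bool is_dirty = false;

  // Forget all recorded points.
  void Reset() { *this = DirtyRegion{}; }

  // Grow the rectangle so that it contains (x, y).
  void AddPoint(int x, int y) {
    if (!is_dirty) {
      min_x = max_x = x;
      min_y = max_y = y;
      is_dirty = true;
      return;
    }
    min_x = std::min(min_x, x);
    min_y = std::min(min_y, y);
    max_x = std::max(max_x, x);
    max_y = std::max(max_y, y);
  }

  // Size in pixels; zero while nothing has been recorded.
  int width() const { return is_dirty ? max_x - min_x + 1 : 0; }
  int height() const { return is_dirty ? max_y - min_y + 1 : 0; }
};
```

A single pixel edit therefore yields a 1x1 region, and a brush stroke yields the bounding box of every pixel it touched.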

### 2. Arena Resource Management Improvements

#### A. Resource Pooling
```cpp
class Arena {
 private:
  struct TexturePool {
    std::vector<SDL_Texture*> available_textures_;
    std::unordered_map<SDL_Texture*, std::pair<int, int>> texture_sizes_;
  } texture_pool_;

  struct SurfacePool {
    std::vector<SDL_Surface*> available_surfaces_;
    std::unordered_map<SDL_Surface*, std::tuple<int, int, int, int>> surface_info_;
  } surface_pool_;

 public:
  SDL_Texture* AllocateTexture(SDL_Renderer* renderer, int width, int height) {
    // Try to reuse existing texture of same size
    auto& available = texture_pool_.available_textures_;
    for (auto it = available.begin(); it != available.end(); ++it) {
      // find() instead of operator[], so a texture with no recorded size
      // is skipped rather than inserted with a default (0, 0) entry
      auto size = texture_pool_.texture_sizes_.find(*it);
      if (size != texture_pool_.texture_sizes_.end() &&
          size->second.first == width && size->second.second == height) {
        SDL_Texture* texture = *it;
        available.erase(it);
        return texture;
      }
    }

    // Create new texture if none available. CreateNewTexture is assumed to
    // record the new texture's size in texture_sizes_.
    return CreateNewTexture(renderer, width, height);
  }

  void FreeTexture(SDL_Texture* texture) {
    if (!texture) return;
    // Return to pool instead of destroying
    texture_pool_.available_textures_.push_back(texture);
  }
};
```
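
`AllocateTexture` above scans the whole free list on every call. One alternative, sketched here under the assumption that reuse always requires an exact size match, is to bucket free handles by `(width, height)` so that lookup cost depends on the number of distinct sizes rather than the number of pooled textures. `SizeKeyedPool` and its member names are hypothetical. The handle type is a template parameter so the sketch compiles without SDL; in the arena it would be `SDL_Texture*`.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Illustrative free list bucketed by exact (width, height).
template <typename Handle>
class SizeKeyedPool {
 public:
  // Returns a pooled handle of exactly this size, or `fallback` when the
  // bucket is empty (the caller would then create a new resource).
  Handle Acquire(int width, int height, Handle fallback) {
    auto it = free_.find(std::make_pair(width, height));
    if (it == free_.end() || it->second.empty()) return fallback;
    Handle handle = it->second.back();
    it->second.pop_back();
    return handle;
  }

  // Returns a handle to the pool under the size it was created with.
  void Release(int width, int height, Handle handle) {
    free_[std::make_pair(width, height)].push_back(handle);
  }

  // Total number of handles currently waiting for reuse.
  std::size_t FreeCount() const {
    std::size_t total = 0;
    for (const auto& bucket : free_) total += bucket.second.size();
    return total;
  }

 private:
  std::map<std::pair<int, int>, std::vector<Handle>> free_;
};
```

The pool never destroys anything, so a real arena would still need a policy for trimming buckets that grow too large.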

#### B. Batch Operations
```cpp
class Arena {
 public:
  struct BatchUpdate {
    std::vector<std::pair<SDL_Texture*, SDL_Surface*>> updates_;

    void AddUpdate(SDL_Texture* texture, SDL_Surface* surface) {
      updates_.emplace_back(texture, surface);
    }

    void Execute() {
      // Batch all texture updates for efficiency. A nested struct has no
      // implicit Arena instance, so go through the singleton accessor.
      for (auto& [texture, surface] : updates_) {
        Arena::Get().UpdateTexture(texture, surface);
      }
      updates_.clear();
    }
  };

  BatchUpdate CreateBatch() { return BatchUpdate{}; }
};
```

### 3. Tilemap Performance Enhancements

#### A. Smart Tile Caching
```cpp
class Tilemap {
 private:
  struct TileCache {
    static constexpr size_t MAX_CACHE_SIZE = 1024;
    std::unordered_map<int, Bitmap> cache_;
    std::list<int> access_order_;  // Front = most recently used

    Bitmap* GetTile(int tile_id) {
      auto it = cache_.find(tile_id);
      if (it != cache_.end()) {
        // Move to front of access order (list::remove is a linear scan)
        access_order_.remove(tile_id);
        access_order_.push_front(tile_id);
        return &it->second;
      }
      return nullptr;
    }

    // Stores the tile and returns a pointer to the cached copy
    Bitmap* CacheTile(int tile_id, Bitmap&& bitmap) {
      if (cache_.erase(tile_id) > 0) {
        // Replacing an existing entry: drop its old position so
        // access_order_ never holds duplicates
        access_order_.remove(tile_id);
      } else if (cache_.size() >= MAX_CACHE_SIZE) {
        // Remove least recently used tile
        int lru_tile = access_order_.back();
        access_order_.pop_back();
        cache_.erase(lru_tile);
      }

      access_order_.push_front(tile_id);
      return &cache_.emplace(tile_id, std::move(bitmap)).first->second;
    }
  } tile_cache_;

 public:
  void RenderTile(int tile_id) {
    Bitmap* cached_tile = tile_cache_.GetTile(tile_id);
    if (cached_tile) {
      core::Renderer::Get().UpdateBitmap(cached_tile);
      return;
    }

    // Create new tile and cache it
    Bitmap new_tile = CreateTileFromAtlas(tile_id);
    Bitmap* tile = tile_cache_.CacheTile(tile_id, std::move(new_tile));
    core::Renderer::Get().RenderBitmap(tile);
  }
};
```
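
The cache above calls `std::list::remove` on every hit, which walks the list. A common way to make hits constant time is to store each entry's list iterator in the hash map and relink the node with `std::list::splice`. The sketch below shows that technique in a generic form; `LruCache` and its member names are illustrative and not part of YAZE.

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>

// Illustrative LRU cache with O(1) average lookup, insert, and eviction.
template <typename Key, typename Value>
class LruCache {
 public:
  explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

  // Returns the cached value and marks it most recently used,
  // or nullptr on a miss.
  Value* Get(const Key& key) {
    auto it = index_.find(key);
    if (it == index_.end()) return nullptr;
    // splice relinks the node in O(1) and keeps iterators valid.
    items_.splice(items_.begin(), items_, it->second);
    return &it->second->second;
  }

  // Inserts or overwrites an entry, evicting the least recently used one
  // when the cache is full.
  void Put(const Key& key, Value value) {
    if (capacity_ == 0) return;
    if (Value* existing = Get(key)) {
      *existing = std::move(value);
      return;
    }
    if (items_.size() >= capacity_) {
      index_.erase(items_.back().first);
      items_.pop_back();
    }
    items_.emplace_front(key, std::move(value));
    index_[key] = items_.begin();
  }

  std::size_t size() const { return items_.size(); }

 private:
  using Item = std::pair<Key, Value>;
  std::size_t capacity_;
  std::list<Item> items_;  // Front = most recently used
  std::unordered_map<Key, typename std::list<Item>::iterator> index_;
};
```

For a tile cache, `Key` would be the tile id and `Value` the tile bitmap, which only needs to be movable.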

#### B. Atlas-based Rendering
```cpp
class Tilemap {
 public:
  void RenderTilemap(const std::vector<int>& tile_ids,
                     const std::vector<SDL_Rect>& positions) {
    // Each tile needs exactly one destination rectangle
    if (tile_ids.size() != positions.size()) return;

    // Batch render multiple tiles from atlas
    std::vector<SDL_Rect> src_rects;
    std::vector<SDL_Rect> dst_rects;
    src_rects.reserve(tile_ids.size());
    dst_rects.reserve(tile_ids.size());

    for (size_t i = 0; i < tile_ids.size(); ++i) {
      SDL_Rect src_rect = GetTileRect(tile_ids[i]);
      src_rects.push_back(src_rect);
      dst_rects.push_back(positions[i]);
    }

    // Single draw call for all tiles
    core::Renderer::Get().RenderAtlas(atlas.texture(), src_rects, dst_rects);
  }
};
```
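
`GetTileRect` is used above but not shown. Assuming square tiles numbered in row-major order from the top-left of the atlas (an assumption, since this document does not describe the atlas layout), the source rectangle is plain integer arithmetic. `TileRectInAtlas` and the `Rect` struct are hypothetical; `Rect` mirrors the `x, y, w, h` fields of `SDL_Rect` so the sketch compiles without SDL.

```cpp
// Same field order as SDL_Rect, defined locally to avoid an SDL dependency.
struct Rect {
  int x, y, w, h;
};

// Source rectangle of `tile_id` in an atlas that is `atlas_width_px` wide
// and holds square tiles of `tile_size_px`, numbered row by row.
// Returns an empty rectangle for invalid input.
inline Rect TileRectInAtlas(int tile_id, int atlas_width_px, int tile_size_px) {
  if (tile_id < 0 || tile_size_px <= 0 || atlas_width_px < tile_size_px) {
    return Rect{0, 0, 0, 0};
  }
  const int tiles_per_row = atlas_width_px / tile_size_px;
  const int column = tile_id % tiles_per_row;
  const int row = tile_id / tiles_per_row;
  return Rect{column * tile_size_px, row * tile_size_px,
              tile_size_px, tile_size_px};
}
```

For example, in a 128-pixel-wide atlas of 16x16 tiles there are 8 tiles per row, so tile 9 sits in column 1 of row 1.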

### 4. Editor-Specific Optimizations

#### A. Graphics Editor Improvements
```cpp
class GraphicsEditor {
 private:
  struct EditingState {
    bool is_drawing = false;
    std::vector<PixelChange> undo_stack_;
    std::vector<PixelChange> redo_stack_;
    DirtyRegion current_edit_region_;
  } editing_state_;

 public:
  void StartDrawing() {
    editing_state_.is_drawing = true;
    editing_state_.current_edit_region_.Reset();
  }

  void EndDrawing() {
    if (editing_state_.is_drawing) {
      // Batch update only the edited region
      UpdateDirtyRegion(editing_state_.current_edit_region_);
      editing_state_.is_drawing = false;
    }
  }

  void SetPixel(int x, int y, const SnesColor& color) {
    // Record change for undo/redo; a new edit makes the redo history stale
    editing_state_.undo_stack_.emplace_back(x, y, GetPixel(x, y), color);
    editing_state_.redo_stack_.clear();

    // Update pixel
    current_bitmap_->SetPixel(x, y, color);

    // Update edit region
    editing_state_.current_edit_region_.AddPoint(x, y);
  }
};
```
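
The editor above records `PixelChange` entries but does not show how they are replayed. The sketch below is one way the two stacks could interact, assuming each change stores the palette index before and after the edit. `PixelChange`'s fields and the `PixelHistory` class are hypothetical and only illustrate the stack discipline.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical record of one pixel edit.
struct PixelChange {
  int x, y;
  uint8_t old_index;  // Palette index before the edit
  uint8_t new_index;  // Palette index after the edit
};

// Illustrative undo/redo history for pixel edits.
class PixelHistory {
 public:
  // A fresh edit invalidates anything that could have been redone.
  void Record(const PixelChange& change) {
    undo_.push_back(change);
    redo_.clear();
  }

  // Returns the change to revert (the caller writes old_index back),
  // or nullopt when there is nothing to undo.
  std::optional<PixelChange> Undo() { return Transfer(undo_, redo_); }

  // Returns the change to reapply (the caller writes new_index),
  // or nullopt when there is nothing to redo.
  std::optional<PixelChange> Redo() { return Transfer(redo_, undo_); }

 private:
  // Pops the newest entry of `from` and pushes it onto `to`.
  static std::optional<PixelChange> Transfer(std::vector<PixelChange>& from,
                                             std::vector<PixelChange>& to) {
    if (from.empty()) return std::nullopt;
    PixelChange change = from.back();
    from.pop_back();
    to.push_back(change);
    return change;
  }

  std::vector<PixelChange> undo_;
  std::vector<PixelChange> redo_;
};
```

Grouping all changes of one brush stroke into a single undo step would be a natural extension, for example by recording a marker in `StartDrawing()`.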

#### B. Palette Editor Optimizations
```cpp
class PaletteEditor {
 private:
  struct PaletteCache {
    std::unordered_map<uint32_t, ImVec4> snes_to_rgba_cache_;
    std::unordered_map<uint32_t, uint16_t> rgba_to_snes_cache_;

    void Invalidate() {
      snes_to_rgba_cache_.clear();
      rgba_to_snes_cache_.clear();
    }
  } palette_cache_;

 public:
  ImVec4 ConvertSnesToRgba(uint16_t snes_color) {
    uint32_t key = snes_color;
    auto it = palette_cache_.snes_to_rgba_cache_.find(key);
    if (it != palette_cache_.snes_to_rgba_cache_.end()) {
      return it->second;
    }

    ImVec4 rgba = ConvertSnesColorToImVec4(SnesColor(snes_color));
    palette_cache_.snes_to_rgba_cache_[key] = rgba;
    return rgba;
  }
};
```
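
For context on what is being cached: a SNES color is a 15-bit value with red in bits 0 to 4, green in bits 5 to 9, and blue in bits 10 to 14. The sketch below converts between that layout and 8-bit channels. It is not the project's `ConvertSnesColorToImVec4`, which may scale channels differently; `Rgba8`, `Expand5To8`, `SnesToRgba8`, and `Rgba8ToSnes` are hypothetical names.

```cpp
#include <cstdint>

// Illustrative 8-bit-per-channel color.
struct Rgba8 {
  uint8_t r, g, b, a;
};

// Expand a 5-bit channel to 8 bits by replicating its top bits,
// so 0 maps to 0 and 31 maps to 255.
constexpr uint8_t Expand5To8(uint16_t c) {
  return static_cast<uint8_t>((c << 3) | (c >> 2));
}

// Unpack a 15-bit SNES color (0bbbbbgggggrrrrr) into opaque RGBA.
constexpr Rgba8 SnesToRgba8(uint16_t snes) {
  return Rgba8{Expand5To8(snes & 0x1F),
               Expand5To8((snes >> 5) & 0x1F),
               Expand5To8((snes >> 10) & 0x1F),
               255};
}

// Pack RGBA back into a SNES color, discarding the low 3 bits per channel.
constexpr uint16_t Rgba8ToSnes(Rgba8 c) {
  return static_cast<uint16_t>((c.r >> 3) | ((c.g >> 3) << 5) |
                               ((c.b >> 3) << 10));
}
```

Because the conversion is a handful of shifts, it is worth profiling before adding a cache in front of it: a hash lookup may cost more than the arithmetic it replaces.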

### 5. Memory Management Improvements

#### A. Custom Allocator for Graphics Data
```cpp
class GraphicsAllocator {
 private:
  static constexpr size_t POOL_SIZE = 16 * 1024 * 1024;  // 16MB
  std::unique_ptr<char[]> pool_;  // Owns the pool, so it is freed on destruction
  size_t offset_ = 0;

 public:
  GraphicsAllocator() : pool_(new char[POOL_SIZE]) {}

  void* Allocate(size_t size) {
    // Round the size up so every returned pointer stays aligned for any type
    constexpr size_t kAlign = alignof(std::max_align_t);
    const size_t aligned = (size + kAlign - 1) & ~(kAlign - 1);

    // Compare against the remaining space to avoid overflow in offset_ + size
    if (aligned < size || aligned > POOL_SIZE - offset_) {
      return nullptr;  // Pool exhausted
    }

    void* ptr = pool_.get() + offset_;
    offset_ += aligned;
    return ptr;
  }

  // Invalidates every pointer previously returned by Allocate()
  void Reset() { offset_ = 0; }
};
```

#### B. Smart Pointer Management
```cpp
template<typename T>
class GraphicsPtr {
 private:
  T* ptr_;
  std::function<void(T*)> deleter_;

 public:
  GraphicsPtr(T* ptr, std::function<void(T*)> deleter)
      : ptr_(ptr), deleter_(std::move(deleter)) {}

  // Single owner: a copy would run the deleter twice on the same pointer
  GraphicsPtr(const GraphicsPtr&) = delete;
  GraphicsPtr& operator=(const GraphicsPtr&) = delete;

  // Moving transfers ownership and leaves the source empty
  GraphicsPtr(GraphicsPtr&& other) noexcept
      : ptr_(other.ptr_), deleter_(std::move(other.deleter_)) {
    other.ptr_ = nullptr;
  }

  ~GraphicsPtr() {
    if (ptr_ && deleter_) {
      deleter_(ptr_);
    }
  }

  T* get() const { return ptr_; }
  T& operator*() const { return *ptr_; }
  T* operator->() const { return ptr_; }
};
```
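
The standard library already provides this ownership model: `std::unique_ptr` accepts a custom deleter type and is move-only by default. The sketch below shows the pattern with a hypothetical `FakeTexture` resource so it compiles without SDL. With SDL, the deleter body would call the matching destroy function, such as `SDL_DestroyTexture`. All names in the block are illustrative.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for an SDL resource such as SDL_Texture.
struct FakeTexture {
  int id;
};

// Counts destroyed textures so the deleter's effect can be observed.
inline int& DestroyedCount() {
  static int count = 0;
  return count;
}

// Stand-in for the library's destroy function.
inline void DestroyFakeTexture(FakeTexture* texture) {
  ++DestroyedCount();
  delete texture;
}

// Stateless deleter type; unique_ptr invokes it only for non-null pointers.
struct FakeTextureDeleter {
  void operator()(FakeTexture* texture) const { DestroyFakeTexture(texture); }
};

using TexturePtr = std::unique_ptr<FakeTexture, FakeTextureDeleter>;

// Creates a texture whose lifetime is tied to the returned pointer.
inline TexturePtr MakeTexture(int id) {
  return TexturePtr(new FakeTexture{id});
}
```

Compared with a `std::function` member, a stateless deleter type avoids a per-pointer callable and lets the compiler inline the destroy call.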

## Implementation Priority

### Phase 1 (High Impact, Low Risk)
1. **Palette Lookup Optimization**: Hash map for O(1) color lookups
2. **Dirty Region Tracking**: Only update changed areas
3. **Resource Pooling**: Reuse SDL textures and surfaces

### Phase 2 (Medium Impact, Medium Risk)
1. **Tile Caching System**: LRU cache for frequently used tiles
2. **Batch Operations**: Group texture updates
3. **Memory Pool Allocator**: Custom allocator for graphics data

### Phase 3 (High Impact, High Risk)
1. **Atlas-based Rendering**: Single draw calls for multiple tiles
2. **Multi-threaded Updates**: Background texture processing
3. **GPU-based Operations**: Move some operations to GPU

## Performance Metrics

### Target Improvements
- **Palette Lookup**: 100x faster (O(n) → O(1))
- **Texture Updates**: 10x faster (dirty regions)
- **Memory Usage**: 30% reduction (resource pooling)
- **Frame Rate**: 2x improvement (batch operations)

### Measurement Tools
```cpp
class PerformanceProfiler {
 private:
  using Clock = std::chrono::high_resolution_clock;
  std::unordered_map<std::string, Clock::time_point> timers_;
  std::unordered_map<std::string, std::vector<double>> operation_times_;

 public:
  void StartTimer(const std::string& operation) {
    timers_[operation] = Clock::now();
  }

  void EndTimer(const std::string& operation) {
    auto end = Clock::now();
    auto it = timers_.find(operation);
    if (it == timers_.end()) return;  // EndTimer without a matching StartTimer

    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(
        end - it->second).count();

    operation_times_[operation].push_back(static_cast<double>(duration));
  }

  void Report() {
    for (auto& [operation, times] : operation_times_) {
      if (times.empty()) continue;  // Avoid dividing by zero
      double avg_time = std::accumulate(times.begin(), times.end(), 0.0) / times.size();
      SDL_Log("Operation %s: %.2f μs average", operation.c_str(), avg_time);
    }
  }
};
```
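
Paired `StartTimer`/`EndTimer` calls are easy to unbalance when a function has several return paths. A scope-based timer that records its sample on destruction avoids that. The sketch below is one possible shape, assuming samples are kept in a plain map of operation name to microsecond durations; `ScopedTimer` and `TimingSamples` are hypothetical names.

```cpp
#include <chrono>
#include <ratio>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Illustrative sample store: operation name -> durations in microseconds.
using TimingSamples = std::unordered_map<std::string, std::vector<double>>;

// Records the lifetime of the object as one sample for `operation`.
class ScopedTimer {
 public:
  ScopedTimer(TimingSamples& samples, std::string operation)
      : samples_(samples),
        operation_(std::move(operation)),
        start_(Clock::now()) {}

  // The sample is appended when the enclosing scope exits.
  ~ScopedTimer() {
    const std::chrono::duration<double, std::micro> elapsed =
        Clock::now() - start_;
    samples_[operation_].push_back(elapsed.count());
  }

  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;

 private:
  // steady_clock never jumps backwards, which suits interval measurement.
  using Clock = std::chrono::steady_clock;

  TimingSamples& samples_;
  std::string operation_;
  Clock::time_point start_;
};
```

Usage is a single declaration at the top of the scope to be measured, for example `ScopedTimer timer(samples, "UpdateTexture");`.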

## Conclusion

These optimizations are expected to improve the performance and responsiveness of the YAZE graphics system, particularly for ROM hacking workflows that involve frequent pixel manipulation, palette editing, and tile-based graphics editing. The phased approach limits risk by delivering the high-impact, low-risk changes first and deferring the riskier ones to later phases.