2 changed files with 904 additions and 0 deletions

test/README.md (+144)
test/compare_models.py (+760)

test/README.md @@ -0,0 +1,144 @@

# C++ Tracker Model Testing Framework

This directory contains the testing framework for comparing C++ implementations of PyTorch models against their original Python counterparts.

## Overview

The primary goal is to ensure that the C++ models (`cimp` project) produce results that are numerically close to the Python models from the `pytracking` toolkit, given the same inputs and model weights.

The framework consists of:

1.  **C++ Test Program (`test_models.cpp`)**:
    *   Responsible for loading pre-trained model weights (from `exported_weights/`).
    *   Takes randomly generated input tensors (pre-generated by `generate_test_samples.cpp` and saved in `test/input_samples/`).
    *   Runs the C++ `Classifier` and `BBRegressor` models.
    *   Saves the C++ model output tensors to `test/output/`.

2.  **C++ Sample Generator (`generate_test_samples.cpp`)**:
    *   Generates a specified number of random input tensor sets for both the classifier and the bounding box regressor.
    *   Saves these input tensors into `test/input_samples/{classifier|bb_regressor}/sample_N/`, plus `test/input_samples/classifier/test_N/` for the classifier test features.
    *   This step is kept separate so that the Python comparison script can still run even if the C++ models have issues during their execution phase.

3.  **Python Comparison Script (`compare_models.py`)**:
    *   Loads the original Python models (using `DiMPTorchScriptWrapper`, which loads weights from `exported_weights/`).
    *   Loads the input tensors generated by `generate_test_samples.cpp` from `test/input_samples/`.
    *   Runs the Python models on these input tensors to get reference Python outputs.
    *   Loads the C++ model output tensors from `test/output/`.
    *   Performs a detailed, element-wise comparison between Python and C++ outputs.
    *   Calculates various error metrics (MAE, Max Error, L2 norms, Cosine Similarity, Pearson Correlation, Mean Relative Error); a minimal sketch of these computations appears after this list.
    *   Generates an HTML report (`test/comparison/report.html`) summarizing the comparisons, including per-sample statistics and error distribution plots (saved in `test/comparison/plots/`).
4.  **Automation Script (`run_full_comparison.sh`)**:
    *   Orchestrates the entire testing process:
        1.  Builds the C++ project (including `test_models` and `generate_test_samples`).
        2.  Runs `generate_test_samples` to create/update input data.
        3.  Runs `test_models` to generate C++ outputs.
        4.  Runs `compare_models.py` to perform the comparison and generate the report.
    *   Accepts the number of samples as an argument.
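The metrics listed above reduce to a handful of NumPy operations. The snippet below is a minimal, self-contained sketch of what `_compare_tensor_data` in `compare_models.py` computes per sample; the function name `summarize_pair` and the simplified zero-division guards are illustrative only (the real implementation also handles `None` tensors, shape mismatches, and constant tensors explicitly).

```python
import numpy as np
import torch

def summarize_pair(py_t: torch.Tensor, cpp_t: torch.Tensor, eps: float = 1e-9) -> dict:
    """Per-sample error metrics for one Python/C++ output pair (simplified)."""
    a = py_t.detach().cpu().numpy().astype(np.float32).ravel()
    b = cpp_t.detach().cpu().numpy().astype(np.float32).ravel()
    abs_diff = np.abs(a - b)
    return {
        "mae": float(abs_diff.mean()),                        # Mean Absolute Error
        "max_err": float(abs_diff.max()),                     # worst-case element difference
        "l2_py": float(np.linalg.norm(a)),                    # magnitude of the Python output
        "l2_cpp": float(np.linalg.norm(b)),                   # magnitude of the C++ output
        "l2_diff": float(np.linalg.norm(a - b)),              # magnitude of the difference
        "cos_sim": float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)),
        "pearson": float(np.corrcoef(a, b)[0, 1]),            # linear correlation
        "mre": float(np.mean(abs_diff / (np.abs(a) + eps))),  # Mean Relative Error
    }
```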
## Directory Structure

```
test/
├── input_samples/              # Stores input tensors generated by C++
│   ├── classifier/
│   │   ├── sample_0/
│   │   │   ├── backbone_feat.pt
│   │   │   └── ... (other classifier train inputs)
│   │   └── test_0/
│   │       ├── test_feat.pt
│   │       └── ... (other classifier test inputs)
│   └── bb_regressor/
│       └── sample_0/
│           ├── feat_layer2.pt
│           ├── feat_layer3.pt
│           └── ... (other bb_regressor inputs)
├── output/                     # Stores output tensors generated by C++ models
│   ├── classifier/
│   │   ├── sample_0/
│   │   │   └── clf_features.pt
│   │   └── test_0/
│   │       └── clf_feat_test.pt
│   └── bb_regressor/
│       └── sample_0/
│           ├── iou_pred.pt
│           └── ... (other bb_regressor outputs)
├── comparison/                 # Stores comparison results
│   ├── report.html             # Main HTML report
│   └── plots/                  # Error distribution histograms
├── test_models.cpp             # C++ program to run models and save outputs
├── generate_test_samples.cpp   # C++ program to generate input samples
├── compare_models.py           # Python script for comparison and report generation
├── run_full_comparison.sh      # Main test execution script
└── README.md                   # This file
```
## How to Add a New Model for Comparison

Let's say you want to add a new model called `MyNewModel` with both C++ and Python implementations.

**1. Export Python Model Weights:**
*   Ensure your Python `MyNewModel` can have its weights saved in a format loadable by both Python (e.g., `state_dict` or individual tensors) and C++ (LibTorch `torch::load`); a sketch follows this step.
*   Create a subdirectory `exported_weights/mynewmodel/` and save the weights there.
*   Document the tensor names and their corresponding model parameters in a `mynewmodel_weights_doc.txt` file within that directory (see the existing `classifier_weights_doc.txt` or `bb_regressor_weights_doc.txt` for examples). This is crucial for the `DiMPTorchScriptWrapper` if loading from individual tensors.
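As a sketch of step 1, the export can be as simple as dumping every entry of the model's `state_dict` to its own tensor file, together with a documentation text file listing names and shapes. The snippet below is only an illustration under that assumption; `MyNewModel`, the file layout, and the doc format are placeholders, and the on-disk format must ultimately match whatever the C++ side's weight loader expects.

```python
from pathlib import Path
import torch

def export_mynewmodel_weights(model: torch.nn.Module,
                              out_dir: str = 'exported_weights/mynewmodel') -> None:
    """Save each parameter/buffer as an individual .pt file and write a weights doc."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    doc_lines = []
    for name, tensor in model.state_dict().items():
        fname = name.replace('.', '_') + '.pt'
        # One plain CPU tensor per file; adapt if the C++ loader expects a different container.
        torch.save(tensor.detach().cpu(), out / fname)
        doc_lines.append(f"{fname}: {name} {tuple(tensor.shape)}")
    (out / 'mynewmodel_weights_doc.txt').write_text('\n'.join(doc_lines) + '\n')
```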

**2. Update C++ Code:**

*   **`generate_test_samples.cpp`**:
    *   Add functions to generate realistic random input tensors for `MyNewModel`.
    *   Define the expected input tensor names and shapes.
    *   Modify the `main` function to:
        *   Create a directory `test/input_samples/mynewmodel/sample_N/`.
        *   Call your new input generation functions.
        *   Save these input tensors (e.g., `my_input1.pt`, `my_input2.pt`) into the created directory using the `save_tensor` utility.
*   **`test_models.cpp`**:
    *   Include the header for your C++ `MyNewModel` (e.g., `cimp/mynewmodel/mynewmodel.h`).
    *   In the `main` function:
        *   Add a section for `MyNewModel`.
        *   Determine the absolute path to `exported_weights/mynewmodel/`.
        *   Instantiate your C++ `MyNewModel`, passing the weights directory.
        *   Loop through the number of samples:
            *   Construct paths to the input tensors in `test/input_samples/mynewmodel/sample_N/`.
            *   Load these input tensors using `load_tensor`. Ensure they are on the correct device (CPU/CUDA).
            *   Call the relevant methods of your C++ `MyNewModel` (e.g., `myNewModel.predict(...)`).
            *   Create an output directory `test/output/mynewmodel/sample_N/`.
            *   Save the output tensors from your C++ model (e.g., `my_output.pt`) to this directory using `save_tensor`. Remember to move outputs to CPU before saving if they are on CUDA.
*   **`CMakeLists.txt`**:
    *   If `MyNewModel` is a new static library (like `classifier` or `bb_regressor`), define its sources and add it as a library.
    *   Link `test_models` (and `generate_test_samples`, if it needs the new library) against the `MyNewModel` library and any other dependencies (such as LibTorch).
**3. Update Python Comparison Script (`compare_models.py`):**

*   **`ModelComparison.__init__` & `_init_models`**:
    *   If your Python `MyNewModel` needs to be loaded via `DiMPTorchScriptWrapper`, update the wrapper or add logic to load your model. You might need to add a new parameter like `mynewmodel_sd='mynewmodel'` to `DiMPTorchScriptWrapper` and handle its loading.
    *   Store the loaded Python `MyNewModel` instance (e.g., `self.models.mynewmodel`).
*   **Create a `compare_mynewmodel` method** (a minimal skeleton is sketched after this step):
    *   Create a new method, e.g., `def compare_mynewmodel(self):`.
    *   Print a starting message.
    *   Define input and C++ output directory paths: `Path('test') / 'input_samples' / 'mynewmodel'` and `Path('test') / 'output' / 'mynewmodel'`.
    *   Loop through `self.num_samples`:
        *   Initialize `current_errors = {}` for the current sample.
        *   Construct paths to input tensors for `MyNewModel` from `test/input_samples/mynewmodel/sample_N/`.
        *   Load these tensors using `self.load_cpp_tensor()`.
        *   Run the Python `MyNewModel` with these inputs to get `py_output_tensor`. Handle potential errors.
        *   Construct paths to C++ output tensors from `test/output/mynewmodel/sample_N/`.
        *   Load the C++ output tensor (`cpp_output_tensor`) using `self.load_cpp_tensor()`.
        *   Call `self._compare_tensor_data(py_output_tensor, cpp_output_tensor, "MyNewModel Output Comparison Name", i, current_errors)`. Use a descriptive name.
        *   If there are multiple distinct outputs from `MyNewModel` to compare, repeat the load and `_compare_tensor_data` calls for each.
        *   Store the results: `if current_errors: self.all_errors_stats[f"MyNewModel_Sample_{i}"] = current_errors`.
*   **`ModelComparison.run_all_tests`**:
    *   Call your new `self.compare_mynewmodel()` method.
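A minimal skeleton of such a method, modeled on `compare_classifier` in `compare_models.py`, is sketched below. The attribute `self.models.mynewmodel`, the `predict` call, and the filenames `my_input1.pt` / `my_output.pt` are placeholders for whatever your model actually exposes.

```python
def compare_mynewmodel(self):
    """Compare MyNewModel outputs between Python and C++ (sketch)."""
    print("\nComparing MyNewModel outputs...")
    input_dir = Path('test') / 'input_samples' / 'mynewmodel'
    cpp_output_dir = Path('test') / 'output' / 'mynewmodel'
    if not input_dir.exists() or not cpp_output_dir.exists():
        print(f"MyNewModel input or C++ output directory not found ({input_dir}, {cpp_output_dir}). Skipping.")
        return

    for i in tqdm(range(self.num_samples), desc="MyNewModel samples"):
        current_errors = {}
        my_input = self.load_cpp_tensor(input_dir / f'sample_{i}' / 'my_input1.pt', self.device)

        py_output = None
        if my_input is not None:
            try:
                with torch.no_grad():
                    py_output = self.models.mynewmodel.predict(my_input)  # placeholder API
            except Exception as e:
                print(f"ERROR: Python MyNewModel failed for sample {i}: {e}")

        cpp_output = self.load_cpp_tensor(cpp_output_dir / f'sample_{i}' / 'my_output.pt', self.device)
        self._compare_tensor_data(py_output, cpp_output, "MyNewModel Output", i, current_errors)
        if current_errors:
            self.all_errors_stats[f"MyNewModel_Sample_{i}"] = current_errors
```

Note that `generate_html_report` currently groups results by the sample-key prefix (`Clf_...`, `BBReg_...`), so a new prefix such as `MyNewModel_Sample_{i}` also needs a corresponding branch there to appear in the report.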

**4. Run the Tests:**
*   Execute `./test/run_full_comparison.sh <num_samples>`.
*   Check the console output and `test/comparison/report.html` for the results of `MyNewModel`.
## Key Considerations

*   **Tensor Naming and Paths:** Be consistent with tensor filenames and directory structures. The Python script relies on these conventions to find the correct files.
*   **Data Types and Devices:** Ensure tensors are of compatible data types (usually `float32`) and are on the correct device (CPU/CUDA) before model inference and before saving/loading. C++ outputs are saved from CPU.
*   **Error Handling:** Implement robust error handling in both C++ (e.g., for file loading and model errors) and Python (e.g., for tensor loading and Python model execution). The comparison script is designed to report "N/A" for metrics if tensors are missing or shapes mismatch, allowing other comparisons to proceed.
*   **`DiMPTorchScriptWrapper`:** If your Python model structure differs from DiMP's Classifier/BBRegressor, you might need to adapt `DiMPTorchScriptWrapper` or write a custom loader for your Python model if it is not already a `torch.jit.ScriptModule`. The current wrapper supports loading from a directory of named tensor files based on a documentation text file.
*   **`load_cpp_tensor` in Python:** This utility in `compare_models.py` attempts to robustly load tensors saved by LibTorch (which sometimes get wrapped as a `RecursiveScriptModule`). If you encounter issues loading your C++-saved tensors, inspect their structure and adapt this function if necessary. The C++ `save_tensor` function aims to save plain tensors. A reduced sketch of the fallback pattern follows this list.
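Stripped of the extraction heuristics, the load fallback used by `load_cpp_tensor` reduces to the pattern below. This is a simplified sketch, not the full implementation; the real function also probes module attributes and a no-argument `forward()` before falling back to the `state_dict`.

```python
import torch

def load_saved_tensor(path, device='cpu'):
    """Reduced sketch of the tensor-loading fallback in compare_models.py."""
    try:
        # Common case: a plain tensor file.
        return torch.load(str(path), map_location=device, weights_only=True)
    except RuntimeError:
        # Fall back to full unpickling; the file may contain a wrapped ScriptModule.
        obj = torch.load(str(path), map_location=device, weights_only=False)
        if isinstance(obj, torch.Tensor):
            return obj
        # Heuristic: return the first tensor found in the module's state_dict.
        for value in obj.state_dict().values():
            if isinstance(value, torch.Tensor):
                return value
        return None
```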

By following these steps, you can integrate new models into this testing framework to validate their C++ implementations.
test/compare_models.py @@ -0,0 +1,760 @@

|||
#!/usr/bin/env python3 |
|||
import os |
|||
import torch |
|||
import numpy as np |
|||
import glob |
|||
import matplotlib.pyplot as plt |
|||
from pathlib import Path |
|||
import sys |
|||
import json |
|||
from tqdm import tqdm |
|||
import inspect |
|||
|
|||
# Add the project root to path |
|||
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) |
|||
|
|||
# Import model wrappers |
|||
from pytracking.features.net_wrappers import DiMPTorchScriptWrapper |
|||
|
|||
class ModelComparison:
    def __init__(self, model_dir='exported_weights', num_samples=1000):
        self.model_dir = model_dir
        self.num_samples = num_samples
        self.device = 'cuda' if torch.cuda.is_available() else 'cpu'

        # Initialize comparison results
        self.comparison_dir = Path('test') / 'comparison'
        self.comparison_dir.mkdir(parents=True, exist_ok=True)
        self.plots_dir = self.comparison_dir / 'plots'  # plots_dir initialized here

        # Initialize models
        self._init_models()

    def _init_models(self):
        """Initialize Python models"""
        print("Loading Python models...")

        # Load DiMP components
        self.models = DiMPTorchScriptWrapper(
            model_dir=self.model_dir,
            device=self.device,
            backbone_sd='backbone',        # Directory with backbone weights
            classifier_sd='classifier',    # Directory with classifier weights
            bbregressor_sd='bb_regressor'  # Directory with bbox regressor weights
        )

    def compare_classifier(self):
        """Compare classifier model outputs between Python and C++"""
        print("\nComparing classifier outputs...")
        # Ensure paths are Path objects for consistency if not already
        input_dir_path = Path('test') / 'input_samples' / 'classifier'
        cpp_output_dir_path = Path('test') / 'output' / 'classifier'

        if not input_dir_path.exists() or not cpp_output_dir_path.exists():
            print(f"Classifier input or C++ output directory not found ({input_dir_path}, {cpp_output_dir_path}). Skipping.")
            return

        # Removed: train_errors = []
        # Removed: test_errors = []
        # self.all_errors_stats is initialized per test run.

        # Compare training samples
        print("\nClassifier - Comparing Training Samples...")
        for i in tqdm(range(self.num_samples), desc="Training samples"):
            current_errors = {}  # For this sample
            sample_dir = input_dir_path / f'sample_{i}'
            cpp_out_sample_dir = cpp_output_dir_path / f'sample_{i}'

            py_clf_feat = None
            cpp_clf_feat = None

            if not sample_dir.exists() or not cpp_out_sample_dir.exists():
                print(f"Warning: Skipping classifier train sample {i}, files not found at {sample_dir} or {cpp_out_sample_dir}.")
                # No explicit error assignment here; _compare_tensor_data will handle Nones
            else:
                feat_path = sample_dir / 'backbone_feat.pt'
                feat = self.load_cpp_tensor(feat_path, self.device)
                if feat is None:
                    print(f"Critical: Failed to load input tensor for {feat_path} for classifier train sample {i}.")
                    # feat is None, py_clf_feat will remain None
                else:
                    try:
                        with torch.no_grad():
                            py_clf_feat = self.models.classifier.extract_classification_feat(feat)
                    except Exception as e:
                        print(f"ERROR: Python model extract_classification_feat (train) failed for sample {i}: {e}")
                        # py_clf_feat remains None

                cpp_clf_feat_path = cpp_out_sample_dir / 'clf_features.pt'
                cpp_clf_feat = self.load_cpp_tensor(cpp_clf_feat_path, self.device)
                if cpp_clf_feat is None:
                    print(f"Warning: Failed to load C++ output tensor {cpp_clf_feat_path} for classifier train sample {i}.")
                    # cpp_clf_feat remains None

            self._compare_tensor_data(py_clf_feat, cpp_clf_feat, "Classifier Features Train", i, current_errors)
            if current_errors: self.all_errors_stats[f"Clf_Train_Sample_{i}"] = current_errors

        # Compare test samples
        print("\nClassifier - Comparing Test Samples...")
        for i in tqdm(range(self.num_samples), desc="Test samples"):
            current_errors = {}  # For this sample
            test_sample_input_dir = input_dir_path / f'test_{i}'
            cpp_test_out_sample_dir = cpp_output_dir_path / f'test_{i}'

            py_clf_feat_test = None
            cpp_clf_feat_test = None

            if not test_sample_input_dir.exists() or not cpp_test_out_sample_dir.exists():
                print(f"Warning: Skipping classifier test sample {i}, files not found at {test_sample_input_dir} or {cpp_test_out_sample_dir}.")
                # No explicit error assignment here
            else:
                test_feat_path = test_sample_input_dir / 'test_feat.pt'
                test_feat = self.load_cpp_tensor(test_feat_path, self.device)
                if test_feat is None:
                    print(f"Critical: Failed to load input tensor for {test_feat_path} for classifier test sample {i}.")
                    # test_feat is None, py_clf_feat_test remains None
                else:
                    try:
                        with torch.no_grad():
                            py_clf_feat_test = self.models.classifier.extract_classification_feat(test_feat)
                    except Exception as e:
                        print(f"ERROR: Python model extract_classification_feat (test) failed for sample {i}: {e}")
                        # py_clf_feat_test remains None

                cpp_clf_feat_test_path = cpp_test_out_sample_dir / 'clf_feat_test.pt'
                cpp_clf_feat_test = self.load_cpp_tensor(cpp_clf_feat_test_path, self.device)
                if cpp_clf_feat_test is None:
                    print(f"Warning: Failed to load C++ output tensor {cpp_clf_feat_test_path} for classifier test sample {i}.")
                    # cpp_clf_feat_test remains None

            self._compare_tensor_data(py_clf_feat_test, cpp_clf_feat_test, "Classifier Features Test", i, current_errors)
            if current_errors: self.all_errors_stats[f"Clf_Test_Sample_{i}"] = current_errors

        # Old stats and plotting code removed/commented below, now handled by HTML report
        # print("\nClassifier Comparison Statistics:")
        # if train_errors:
        #     print(f"  Training Features MAE: Mean={np.mean(train_errors):.4e}, Std={np.std(train_errors):.4e}")
        # if test_errors:
        #     print(f"  Test Features MAE: Mean={np.mean(test_errors):.4e}, Std={np.std(test_errors):.4e}")

        # self._generate_stats_and_plots(train_errors, "Classifier Training Features Error", self.plots_dir / "clf_train_feat_error_hist.png")
        # self._generate_stats_and_plots(test_errors, "Classifier Test Features Error", self.plots_dir / "clf_test_feat_error_hist.png")

    def compare_bb_regressor(self):
        """Compare bb_regressor model outputs between Python and C++"""
        print("\nComparing bb_regressor outputs...")
        input_dir = Path('test') / 'input_samples' / 'bb_regressor'
        cpp_output_dir = Path('test') / 'output' / 'bb_regressor'

        if not input_dir.exists() or not cpp_output_dir.exists():
            print(f"BB Regressor input or C++ output directory not found ({input_dir}, {cpp_output_dir}). Skipping.")
            return

        for i in tqdm(range(self.num_samples), desc="BB Regressor samples"):
            sample_dir = input_dir / f'sample_{i}'
            cpp_output_sample_dir = cpp_output_dir / f'sample_{i}'

            # Load input tensors for BB Regressor for this sample
            feat_layer2_path = sample_dir / 'feat_layer2.pt'
            feat_layer3_path = sample_dir / 'feat_layer3.pt'
            init_bbox_path = sample_dir / 'init_bbox.pt'
            proposals_path = sample_dir / 'proposals.pt'

            feat_layer2 = self.load_cpp_tensor(feat_layer2_path, self.device)
            feat_layer3 = self.load_cpp_tensor(feat_layer3_path, self.device)
            init_bbox = self.load_cpp_tensor(init_bbox_path, self.device)
            proposals = self.load_cpp_tensor(proposals_path, self.device)

            if any(t is None for t in [feat_layer2, feat_layer3, init_bbox, proposals]):
                print(f"Critical: Failed to load one or more BB Regressor input tensors for sample {i}. Skipping.")
                continue

            backbone_feat_tuple = (feat_layer2, feat_layer3)  # Define the tuple for clarity

            # Get IoU features from Python model
            # self.models.get_backbone_bbreg_feat calls self.bb_regressor.get_iou_feat
            with torch.no_grad():
                py_iou_feat = self.models.get_backbone_bbreg_feat({"layer2": feat_layer2, "layer3": feat_layer3})

            # Get modulation vectors
            squeezed_init_bbox = init_bbox
            if init_bbox is not None and init_bbox.dim() == 3 and init_bbox.shape[1] == 1:
                squeezed_init_bbox = init_bbox.squeeze(1)

            with torch.no_grad():
                # Pass original backbone features to get_modulation
                py_modulation = self.models.bb_regressor.get_modulation(backbone_feat_tuple, squeezed_init_bbox)

            # DEBUG: Print shapes
            print(f"Sample {i}: py_iou_feat[0] shape: {py_iou_feat[0].shape}, py_modulation[0] shape: {py_modulation[0].shape}")
            print(f"Sample {i}: py_iou_feat[1] shape: {py_iou_feat[1].shape}, py_modulation[1] shape: {py_modulation[1].shape}")

            # Predict IoU (Python model)
            py_iou_pred = None
            try:
                with torch.no_grad():
                    py_iou_pred = self.models.bb_regressor.predict_iou(py_modulation, py_iou_feat, proposals)
            except RuntimeError as e:
                print(f"WARNING: Python model self.models.bb_regressor.predict_iou failed for sample {i}: {e}")

            # Load C++ outputs
            cpp_iou_pred_path = cpp_output_sample_dir / 'iou_pred.pt'
            cpp_modulation_0_path = cpp_output_sample_dir / 'modulation_0.pt'
            cpp_modulation_1_path = cpp_output_sample_dir / 'modulation_1.pt'
            cpp_feat_0_path = cpp_output_sample_dir / 'iou_feat_0.pt'
            cpp_feat_1_path = cpp_output_sample_dir / 'iou_feat_1.pt'

            cpp_iou_pred = self.load_cpp_tensor(cpp_iou_pred_path, self.device)
            cpp_modulation_0 = self.load_cpp_tensor(cpp_modulation_0_path, self.device)
            cpp_modulation_1 = self.load_cpp_tensor(cpp_modulation_1_path, self.device)
            cpp_feat_0 = self.load_cpp_tensor(cpp_feat_0_path, self.device)
            cpp_feat_1 = self.load_cpp_tensor(cpp_feat_1_path, self.device)

            current_errors = {}  # Store errors for this sample for the HTML report

            # Compare IoU features (py_iou_feat vs cpp_feat_0/1)
            # _compare_tensor_data will handle None inputs appropriately
            py_iou_f0 = py_iou_feat[0] if py_iou_feat and len(py_iou_feat) > 0 else None
            py_iou_f1 = py_iou_feat[1] if py_iou_feat and len(py_iou_feat) > 1 else None
            self._compare_tensor_data(py_iou_f0, cpp_feat_0, "BBReg PyIoUFeat0 vs CppIoUFeat0", i, current_errors)
            self._compare_tensor_data(py_iou_f1, cpp_feat_1, "BBReg PyIoUFeat1 vs CppIoUFeat1", i, current_errors)

            # Compare modulation vectors (py_modulation vs cpp_modulation_0/1)
            py_mod_0 = py_modulation[0] if py_modulation and len(py_modulation) > 0 else None
            py_mod_1 = py_modulation[1] if py_modulation and len(py_modulation) > 1 else None
            self._compare_tensor_data(py_mod_0, cpp_modulation_0, "BBReg PyMod0 vs CppMod0", i, current_errors)
            self._compare_tensor_data(py_mod_1, cpp_modulation_1, "BBReg PyMod1 vs CppMod1", i, current_errors)

            # Compare final IoU prediction
            # _compare_tensor_data will handle None for py_iou_pred or cpp_iou_pred
            self._compare_tensor_data(py_iou_pred, cpp_iou_pred, "BBReg IoUPred", i, current_errors)

            if current_errors:  # Add to overall statistics if any comparisons were made/attempted
                self.all_errors_stats[f"BBReg_Sample_{i}"] = current_errors
            # Note: MAE accumulation for overall average needs to be selective based on valid comparisons.
            # For simplicity, we'll let the HTML report show NaNs for failed/skipped comparisons.

        if not self.all_errors_stats:  # Check if any BB regressor comparisons were made
            print("No BB Regressor comparisons were performed for this model type.")  # Clarified message
            # No plots or stats if nothing was compared for BB regressor
            return

        # The following old averaging and plotting is now handled by generate_html_report using all_errors_stats
        # print("\nBB Regressor Comparison Statistics:")
        # if iou_pred_errors:
        #     print(f"  IoU Prediction MAE: Mean={np.mean(iou_pred_errors):.4e}, Std={np.std(iou_pred_errors):.4e}")
        # if modulation_errors:
        #     print(f"  Modulation MAE: Mean={np.mean(modulation_errors):.4e}, Std={np.std(modulation_errors):.4e}")
        # if feat_errors:
        #     print(f"  IoU Feature MAE: Mean={np.mean(feat_errors):.4e}, Std={np.std(feat_errors):.4e}")

        # # Plots - these would need to be rethought with the new error structure
        # self._generate_stats_and_plots(iou_pred_errors, "BB Regressor IoU Prediction Error", self.plots_dir / "bbreg_iou_pred_error_hist.png")
        # self._generate_stats_and_plots(modulation_errors, "BB Regressor Modulation Error", self.plots_dir / "bbreg_modulation_error_hist.png")
        # self._generate_stats_and_plots(feat_errors, "BB Regressor IoU Feature Error", self.plots_dir / "bbreg_feature_error_hist.png")

    def generate_html_report(self):
        print("\nGenerating HTML report...")
        report_path = self.comparison_dir / "report.html"
        # plot_paths_dict = {}  # This variable was unused

        # Prepare data for the report: group by model and comparison type.
        # Structure:
        #   "Model_Type Component_Name": {
        #       "samples": {0: {"mae": X, "max_err": Y, "mean_py_val": Z, "std_abs_err": S, "plot_path": "..."}, 1: {...}},
        #       "overall_mae_mean": A, "overall_mae_std": B, "overall_max_err_mean": C,
        #   }
        report_data = {}

        for sample_key, comparisons in self.all_errors_stats.items():
            # sample_key examples: "Clf_Train_Sample_0", "Clf_Test_Sample_0", "BBReg_Sample_0"
            parts = sample_key.split("_")
            model_prefix = parts[0]  # Clf, BBReg
            sample_type_str = ""
            sample_idx = -1

            if model_prefix == "Clf":
                sample_type_str = parts[1]  # Train or Test
                sample_idx = int(parts[-1])
                model_name_key = f"Classifier {sample_type_str}"
            elif model_prefix == "BBReg":
                sample_idx = int(parts[-1])
                model_name_key = "BB Regressor"
            else:
                print(f"WARNING: Unknown sample key format in all_errors_stats: {sample_key}")
                continue

            for comparison_name, stats in comparisons.items():
                # comparison_name examples: "Classifier Features Train", "BBReg PyIoUFeat0 vs CppIoUFeat0"
                # Unpack all 11 metrics now
                mae, max_err, diff_arr, mean_py_val, std_abs_err, \
                    l2_py, l2_cpp, l2_diff, cos_sim, pearson, mre = stats

                full_comparison_key = f"{model_name_key} - {comparison_name}"

                if full_comparison_key not in report_data:
                    report_data[full_comparison_key] = {
                        "samples": {},
                        "all_maes": [],
                        "all_max_errs": [],
                        "all_mean_py_vals": [],
                        "all_std_abs_errs": [],  # Renamed from all_std_errs
                        "all_l2_py_vals": [],
                        "all_l2_cpp_vals": [],
                        "all_l2_diff_vals": [],
                        "all_cos_sim_vals": [],
                        "all_pearson_vals": [],
                        "all_mre_vals": []
                    }

                plot_filename = None
                if diff_arr is not None and len(diff_arr) > 0 and not np.all(np.isnan(diff_arr)):
                    plot_filename = f"{model_prefix}_{sample_type_str}_sample{sample_idx}_{comparison_name.replace(' ', '_').replace('/', '_')}_hist.png"
                    plot_abs_path = self.plots_dir / plot_filename
                    # Pass std_abs_err to plotting function
                    self._generate_single_plot(diff_arr, comparison_name, plot_abs_path, mean_py_val, std_abs_err, mae, max_err)

                report_data[full_comparison_key]["samples"][sample_idx] = {
                    "mae": mae,
                    "max_err": max_err,
                    "mean_py_val": mean_py_val,
                    "std_abs_err": std_abs_err,  # Renamed from std_err
                    "l2_py": l2_py,
                    "l2_cpp": l2_cpp,
                    "l2_diff": l2_diff,
                    "cos_sim": cos_sim,
                    "pearson": pearson,
                    "mre": mre,
                    "plot_path": plot_filename  # Store relative path for HTML
                }
                if not np.isnan(mae): report_data[full_comparison_key]["all_maes"].append(mae)
                if not np.isnan(max_err): report_data[full_comparison_key]["all_max_errs"].append(max_err)
                if not np.isnan(mean_py_val): report_data[full_comparison_key]["all_mean_py_vals"].append(mean_py_val)
                if not np.isnan(std_abs_err): report_data[full_comparison_key]["all_std_abs_errs"].append(std_abs_err)
                if not np.isnan(l2_py): report_data[full_comparison_key]["all_l2_py_vals"].append(l2_py)
                if not np.isnan(l2_cpp): report_data[full_comparison_key]["all_l2_cpp_vals"].append(l2_cpp)
                if not np.isnan(l2_diff): report_data[full_comparison_key]["all_l2_diff_vals"].append(l2_diff)
                if not np.isnan(cos_sim): report_data[full_comparison_key]["all_cos_sim_vals"].append(cos_sim)
                if not np.isnan(pearson): report_data[full_comparison_key]["all_pearson_vals"].append(pearson)
                if not np.isnan(mre): report_data[full_comparison_key]["all_mre_vals"].append(mre)

        # Calculate overall stats
        for comp_key, data in report_data.items():
            data["overall_mae_mean"] = np.mean(data["all_maes"]) if data["all_maes"] else float('nan')
            data["overall_mae_std"] = np.std(data["all_maes"]) if data["all_maes"] else float('nan')
            data["overall_max_err_mean"] = np.mean(data["all_max_errs"]) if data["all_max_errs"] else float('nan')
            data["overall_mean_py_val_mean"] = np.mean(data["all_mean_py_vals"]) if data["all_mean_py_vals"] else float('nan')
            data["overall_std_abs_err_mean"] = np.mean(data["all_std_abs_errs"]) if data["all_std_abs_errs"] else float('nan')  # Renamed
            data["overall_l2_py_mean"] = np.mean(data["all_l2_py_vals"]) if data["all_l2_py_vals"] else float('nan')
            data["overall_l2_cpp_mean"] = np.mean(data["all_l2_cpp_vals"]) if data["all_l2_cpp_vals"] else float('nan')
            data["overall_l2_diff_mean"] = np.mean(data["all_l2_diff_vals"]) if data["all_l2_diff_vals"] else float('nan')
            data["overall_cos_sim_mean"] = np.mean(data["all_cos_sim_vals"]) if data["all_cos_sim_vals"] else float('nan')
            data["overall_pearson_mean"] = np.mean(data["all_pearson_vals"]) if data["all_pearson_vals"] else float('nan')
            data["overall_mre_mean"] = np.mean(data["all_mre_vals"]) if data["all_mre_vals"] else float('nan')

        # HTML Generation
        # Note: the template below is a plain (non-f) string so the CSS braces need no
        # escaping; dynamic values are concatenated in explicitly.
        html_content = """
        <html>
        <head>
        <title>Model Comparison Report</title>
        <style>
            body { font-family: sans-serif; margin: 20px; }
            h1, h2, h3 { color: #333; }
            table { border-collapse: collapse; width: 90%; margin-bottom: 20px; }
            th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
            th { background-color: #f2f2f2; }
            .plot-container { margin-bottom: 30px; page-break-inside: avoid; }
            img { max-width: 100%; height: auto; border: 1px solid #ccc; }
            .nan { color: #999; font-style: italic; }
            .collapsible {
                background-color: #f2f2f2;
                color: #444;
                cursor: pointer;
                padding: 10px;
                width: 100%;
                border: none;
                text-align: left;
                outline: none;
                font-size: 1.1em;
                margin-top: 10px;
                margin-bottom: 5px;
            }
            .active, .collapsible:hover {
                background-color: #ddd;
            }
            .content {
                padding: 0 18px;
                display: none;
                overflow: hidden;
                background-color: #f9f9f9;
            }
            .metric-explanation { margin-bottom: 20px; padding: 10px; border: 1px solid #eee; background-color: #f9f9f9; }
            .metric-explanation dt { font-weight: bold; }
            .metric-explanation dd { margin-left: 20px; margin-bottom: 5px; }
        </style>
        </head>
        <body>
        <h1>Model Comparison Report</h1>
        <p>Number of samples per model component: """ + str(self.num_samples) + """</p>

        <div class="metric-explanation">
        <h3>Understanding the Metrics:</h3>
        <dl>
            <dt>Mean MAE (Mean Absolute Error)</dt>
            <dd><b>Calculation:</b> Average of the absolute differences between corresponding elements of the Python and C++ tensors (<code>mean(abs(py - cpp))</code>). The "Mean MAE" in the summary table is the average of these MAEs over all samples for a given comparison.</dd>
            <dd><b>Range & Interpretation:</b> 0 to ∞. Closer to 0 indicates better agreement. This metric shows the average magnitude of error.</dd>

            <dt>Std MAE (Standard Deviation of MAE)</dt>
            <dd><b>Calculation:</b> Standard deviation of the MAE values calculated for each sample within a comparison group.</dd>
            <dd><b>Range & Interpretation:</b> 0 to ∞. A smaller value indicates that the MAE is consistent across samples. A larger value suggests variability in agreement from sample to sample.</dd>

            <dt>Mean Max Error</dt>
            <dd><b>Calculation:</b> Average of the maximum absolute differences found between Python and C++ tensors for each sample (<code>mean(max(abs(py - cpp)))</code> over samples).</dd>
            <dd><b>Range & Interpretation:</b> 0 to ∞. Closer to 0 is better. Indicates the average of the worst-case discrepancies per sample.</dd>

            <dt>Mean Py Val (Mean Python Tensor Value)</dt>
            <dd><b>Calculation:</b> Average of the mean values of the Python reference tensors over all samples (<code>mean(mean(py_tensor_sample_N))</code>).</dd>
            <dd><b>Range & Interpretation:</b> Problem-dependent. Provides context about the typical magnitude of the Python model's output values.</dd>

            <dt>Mean Std Abs Err (Mean Standard Deviation of Absolute Errors)</dt>
            <dd><b>Calculation:</b> Average of the standard deviations of the absolute error arrays (<code>abs(py - cpp)</code>) for each sample. The "Err Std" in plot titles is this value for that specific sample.</dd>
            <dd><b>Range & Interpretation:</b> 0 to ∞. A smaller value indicates that the errors are concentrated around their mean (MAE), implying less spread in error magnitudes within a sample.</dd>

            <dt>Mean L2 Py (Mean L2 Norm of Python Tensor)</dt>
            <dd><b>Calculation:</b> Average of the L2 norms (Euclidean norm) of the flattened Python tensors over all samples.</dd>
            <dd><b>Range & Interpretation:</b> 0 to ∞. Represents the average magnitude or "length" of the Python output vectors.</dd>

            <dt>Mean L2 Cpp (Mean L2 Norm of C++ Tensor)</dt>
            <dd><b>Calculation:</b> Average of the L2 norms of the flattened C++ tensors over all samples.</dd>
            <dd><b>Range & Interpretation:</b> 0 to ∞. Represents the average magnitude of the C++ output vectors. Should be comparable to Mean L2 Py if models agree in scale.</dd>

            <dt>Mean L2 Diff (Mean L2 Norm of Difference)</dt>
            <dd><b>Calculation:</b> Average of the L2 norms of the flattened difference tensors (<code>py - cpp</code>) over all samples.</dd>
            <dd><b>Range & Interpretation:</b> 0 to ∞. Closer to 0 indicates better agreement. This is the magnitude of the average difference vector.</dd>

            <dt>Mean Cosine Sim (Mean Cosine Similarity)</dt>
            <dd><b>Calculation:</b> Average of the cosine similarities between the flattened Python and C++ tensors over all samples. Cosine similarity is <code>dot(py, cpp) / (norm(py) * norm(cpp))</code>.</dd>
            <dd><b>Range & Interpretation:</b> -1 to 1 (typically 0 to 1 for non-negative features). Closer to 1 indicates that the tensors point in the same direction (high similarity in terms of orientation, ignoring magnitude). Values near 0 suggest orthogonality, and near -1 suggest opposite directions.</dd>

            <dt>Mean Pearson Corr (Mean Pearson Correlation Coefficient)</dt>
            <dd><b>Calculation:</b> Average of the Pearson correlation coefficients between the flattened Python and C++ tensors over all samples. Measures linear correlation.</dd>
            <dd><b>Range & Interpretation:</b> -1 to 1. Closer to 1 indicates strong positive linear correlation. Closer to -1 indicates strong negative linear correlation. Closer to 0 indicates weak or no linear correlation.</dd>

            <dt>Mean MRE (Mean Relative Error)</dt>
            <dd><b>Calculation:</b> Average of the mean relative errors per sample, where relative error is <code>mean(abs(py - cpp) / (abs(py) + epsilon))</code>. Epsilon is a small value to prevent division by zero.</dd>
            <dd><b>Range & Interpretation:</b> 0 to ∞. Closer to 0 is better. This metric normalizes the absolute error by the magnitude of the Python reference values, useful for understanding error relative to signal strength.</dd>
        </dl>
        </div>
        """

        sorted_report_keys = sorted(report_data.keys())

        html_content += "<h2>Overall Comparison Statistics</h2><table><tr><th>Comparison Key</th><th>Mean MAE</th><th>Std MAE</th><th>Mean Max Error</th><th>Mean Py Val</th><th>Mean Std Abs Err</th><th>Mean L2 Py</th><th>Mean L2 Cpp</th><th>Mean L2 Diff</th><th>Mean Cosine Sim</th><th>Mean Pearson Corr</th><th>Mean MRE</th></tr>"
        for comp_key in sorted_report_keys:
            data = report_data[comp_key]
            html_content += f"""
            <tr>
                <td>{comp_key}</td>
                <td>{f"{data['overall_mae_mean']:.4e}" if not np.isnan(data['overall_mae_mean']) else 'N/A'}</td>
                <td>{f"{data['overall_mae_std']:.4e}" if not np.isnan(data['overall_mae_std']) else 'N/A'}</td>
                <td>{f"{data['overall_max_err_mean']:.4e}" if not np.isnan(data['overall_max_err_mean']) else 'N/A'}</td>
                <td>{f"{data['overall_mean_py_val_mean']:.4e}" if not np.isnan(data['overall_mean_py_val_mean']) else 'N/A'}</td>
                <td>{f"{data['overall_std_abs_err_mean']:.4e}" if not np.isnan(data['overall_std_abs_err_mean']) else 'N/A'}</td>
                <td>{f"{data['overall_l2_py_mean']:.4e}" if not np.isnan(data['overall_l2_py_mean']) else 'N/A'}</td>
                <td>{f"{data['overall_l2_cpp_mean']:.4e}" if not np.isnan(data['overall_l2_cpp_mean']) else 'N/A'}</td>
                <td>{f"{data['overall_l2_diff_mean']:.4e}" if not np.isnan(data['overall_l2_diff_mean']) else 'N/A'}</td>
                <td>{f"{data['overall_cos_sim_mean']:.4f}" if not np.isnan(data['overall_cos_sim_mean']) else 'N/A'}</td>
                <td>{f"{data['overall_pearson_mean']:.4f}" if not np.isnan(data['overall_pearson_mean']) else 'N/A'}</td>
                <td>{f"{data['overall_mre_mean']:.4e}" if not np.isnan(data['overall_mre_mean']) else 'N/A'}</td>
            </tr>
            """
        html_content += "</table>"

        for comp_key in sorted_report_keys:
            data = report_data[comp_key]
            html_content += f"<h2>Details for: {comp_key}</h2>"
            html_content += f"""<p>Overall Mean MAE: {f'{data["overall_mae_mean"]:.4e}' if not np.isnan(data['overall_mae_mean']) else 'N/A'}</p>"""

            html_content += "<table><tr><th>Sample Index</th><th>MAE</th><th>Max Error</th><th>Mean Py Val</th><th>Std Abs Err</th><th>L2 Py</th><th>L2 Cpp</th><th>L2 Diff</th><th>Cosine Sim</th><th>Pearson Corr</th><th>MRE</th><th>Error Distribution Plot</th></tr>"
            for sample_idx in sorted(data["samples"].keys()):
                sample_data = data["samples"][sample_idx]
                plot_path_html = f'./plots/{sample_data["plot_path"]}' if sample_data["plot_path"] else "N/A"
                img_tag = f'<img src="{plot_path_html}" alt="Error histogram">' if sample_data["plot_path"] else "N/A"
                html_content += f"""
                <tr>
                    <td>{sample_idx}</td>
                    <td>{f"{sample_data['mae']:.4e}" if not np.isnan(sample_data['mae']) else '<span class="nan">N/A</span>'}</td>
                    <td>{f"{sample_data['max_err']:.4e}" if not np.isnan(sample_data['max_err']) else '<span class="nan">N/A</span>'}</td>
                    <td>{f"{sample_data['mean_py_val']:.4e}" if not np.isnan(sample_data['mean_py_val']) else '<span class="nan">N/A</span>'}</td>
                    <td>{f"{sample_data['std_abs_err']:.4e}" if not np.isnan(sample_data['std_abs_err']) else '<span class="nan">N/A</span>'}</td>
                    <td>{f"{sample_data['l2_py']:.4e}" if not np.isnan(sample_data['l2_py']) else '<span class="nan">N/A</span>'}</td>
                    <td>{f"{sample_data['l2_cpp']:.4e}" if not np.isnan(sample_data['l2_cpp']) else '<span class="nan">N/A</span>'}</td>
                    <td>{f"{sample_data['l2_diff']:.4e}" if not np.isnan(sample_data['l2_diff']) else '<span class="nan">N/A</span>'}</td>
                    <td>{f"{sample_data['cos_sim']:.4f}" if not np.isnan(sample_data['cos_sim']) else '<span class="nan">N/A</span>'}</td>
                    <td>{f"{sample_data['pearson']:.4f}" if not np.isnan(sample_data['pearson']) else '<span class="nan">N/A</span>'}</td>
                    <td>{f"{sample_data['mre']:.4e}" if not np.isnan(sample_data['mre']) else '<span class="nan">N/A</span>'}</td>
                    <td>{img_tag}</td>
                </tr>
                """
            html_content += "</table>"

        html_content += """
        <script>
        var coll = document.getElementsByClassName("collapsible");
        var i;
        for (i = 0; i < coll.length; i++) {
            coll[i].addEventListener("click", function() {
                this.classList.toggle("active");
                var content = this.nextElementSibling;
                if (content.style.display === "block") {
                    content.style.display = "none";
                } else {
                    content.style.display = "block";
                }
            });
        }
        </script>
        </body></html>
        """

        with open(report_path, 'w') as f:
            f.write(html_content)
        print(f"HTML report generated at {report_path}")

    def _generate_single_plot(self, error_array, title, plot_path, mean_val, std_abs_err, mae, max_err):
        if error_array is None or len(error_array) == 0 or np.all(np.isnan(error_array)):
            # print(f"Skipping plot for {title} as error_array is empty or all NaNs.")
            return
        plt.figure(figsize=(8, 6))
        plt.hist(error_array, bins=50, color='skyblue', edgecolor='black')

        stats_text = f"Ref Mean: {mean_val:.3e} | MAE: {mae:.3e} | MaxErr: {max_err:.3e} | Err Std: {std_abs_err:.3e}"
        plt.title(f"{title}\n{stats_text}", fontsize=10)
        plt.xlabel("Error Value")
        plt.ylabel("Frequency")
        plt.grid(True, linestyle='--', alpha=0.7)
        try:
            plt.tight_layout()
            plt.savefig(plot_path)
        except Exception as e:
            print(f"ERROR: Failed to save plot {plot_path}: {e}")
        plt.close()

    def run_all_tests(self):
        self.all_errors_stats = {}  # Initialize/clear for the new run
        self.plots_dir.mkdir(parents=True, exist_ok=True)  # Ensure plots_dir exists
        self.compare_classifier()
        self.compare_bb_regressor()
        self.generate_html_report()
        print("All tests completed!")

    def load_cpp_tensor(self, path, device):
        path_str = str(path)  # Ensure path is a string
        try:
            # Attempt 1: Load as a plain tensor, assuming it's not a TorchScript module.
            # This is the most common and safest way to load tensors saved from PyTorch (Python or C++).
            tensor = torch.load(path_str, map_location=device, weights_only=True)
            # print(f"Successfully loaded tensor from {path_str} with weights_only=True")
            return tensor
        except RuntimeError as e_weights_only:
            # Handle cases where weights_only=True is not appropriate (e.g., TorchScript archives)
            if "TorchScript archive" in str(e_weights_only) or \
               "PytorchStreamReader failed" in str(e_weights_only) or \
               "weights_only" in str(e_weights_only):  # Broader check for weights_only issues
                # print(f"weights_only=True failed for {path_str} ({e_weights_only}). Trying weights_only=False.")
                try:
                    # Attempt 2: Load with weights_only=False.
                    loaded_obj = torch.load(path_str, map_location=device, weights_only=False)

                    if isinstance(loaded_obj, torch.Tensor):
                        # print(f"Successfully loaded tensor from {path_str} with weights_only=False.")
                        return loaded_obj

                    # Check for _actual_script_module for deeply nested tensors
                    elif hasattr(loaded_obj, '_actual_script_module') and hasattr(loaded_obj._actual_script_module, 'forward'):
                        # print(f"Found _actual_script_module in {path_str}, trying its forward().")
                        try:
                            potential_tensor = loaded_obj._actual_script_module.forward()
                            if isinstance(potential_tensor, torch.Tensor):
                                # print(f"Extracted tensor using _actual_script_module.forward() from {path_str}")
                                return potential_tensor
                        except Exception as e_deep_forward:
                            print(f"Warning: Calling _actual_script_module.forward() from {path_str} failed: {e_deep_forward}")

                    # General ScriptModule handling (RecursiveScriptModule or any object with forward)
                    elif isinstance(loaded_obj, torch.jit.RecursiveScriptModule) or hasattr(loaded_obj, 'forward'):
                        # print(f"Loaded a ScriptModule/object with forward from {path_str}. Attempting extraction.")

                        # Attempt 2a: Greedily find the first tensor attribute
                        for attr_name in dir(loaded_obj):
                            if attr_name.startswith('__'):
                                continue
                            try:
                                attr_val = getattr(loaded_obj, attr_name)
                                if isinstance(attr_val, torch.Tensor):
                                    # print(f"Extracted tensor from attribute '{attr_name}' of ScriptModule at {path_str}")
                                    return attr_val
                            except Exception:
                                pass  # Ignore errors from getattr

                        # Attempt 2b: Try calling forward() if it exists and no tensor attribute was found
                        if hasattr(loaded_obj, 'forward') and callable(loaded_obj.forward):
                            sig = inspect.signature(loaded_obj.forward)
                            if not sig.parameters:  # Only call if forward() takes no arguments
                                try:
                                    potential_tensor = loaded_obj.forward()
                                    if isinstance(potential_tensor, torch.Tensor):
                                        # print(f"Extracted tensor using forward() from ScriptModule at {path_str}")
                                        return potential_tensor
                                except Exception as e_forward:
                                    print(f"Warning: Calling forward() on ScriptModule from {path_str} failed: {e_forward}")

                        # Attempt 2c: Check state_dict
                        try:
                            sd = loaded_obj.state_dict()
                            # print(f"DEBUG: state_dict for {path_str}: {list(sd.keys())}")
                            if len(sd) == 1:
                                tensor_name = list(sd.keys())[0]
                                potential_tensor = sd[tensor_name]
                                if isinstance(potential_tensor, torch.Tensor):
                                    print(f"INFO: Extracted tensor '{tensor_name}' from single-entry state_dict of ScriptModule at {path_str}")
                                    return potential_tensor
                            elif len(sd) > 1:
                                # If multiple tensors, this is heuristic. Prefer known/simple names if possible.
                                # For now, just take the first one if it's a tensor.
                                for tensor_name, potential_tensor in sd.items():
                                    if isinstance(potential_tensor, torch.Tensor):
                                        print(f"INFO: Extracted tensor '{tensor_name}' (from multiple) from state_dict of ScriptModule at {path_str}")
                                        return potential_tensor
                                print(f"Warning: ScriptModule at {path_str} has multiple state_dict entries: {list(sd.keys())} but none were straightforwardly returned as the primary tensor.")
                            # else: state_dict is empty, or no tensors found above
                        except Exception as e_sd:
                            print(f"Warning: Error accessing/processing state_dict for ScriptModule at {path_str}: {e_sd}")

                        print(f"ERROR: Could not extract tensor from ScriptModule at {path_str} after trying attributes, forward(), and state_dict(). Dir: {dir(loaded_obj)}")
                        return None
                    else:
                        print(f"ERROR: Loaded object from {path_str} (with weights_only=False) is not a Tensor or recognized ScriptModule. Type: {type(loaded_obj)}.")
                        return None
                except Exception as e_load_false:
                    print(f"ERROR: weights_only=False also failed for {path_str}. Last error: {e_load_false}")
                    return None
            else:  # Some other error with weights_only=True
                print(f"ERROR: Loading tensor from {path_str} with weights_only=True failed with an unexpected error: {e_weights_only}")
                return None
        except Exception as e_generic:
            print(f"ERROR: A generic error occurred while loading tensor from {path_str}: {e_generic}")
            return None

    def _compare_tensor_data(self, tensor1, tensor2, name, sample_idx, current_errors):
        """Compare two tensors and record their error metrics in current_errors."""
        num_metrics = 11  # mae, max_err, diff_arr, mean_py_val, std_abs_err, l2_py, l2_cpp, l2_diff, cos_sim, pearson, mre
        nan_metrics_tuple = (
            float('nan'), float('nan'), [], float('nan'), float('nan'),  # Original 5
            float('nan'), float('nan'), float('nan'), float('nan'), float('nan'), float('nan')  # New 6
        )

        if tensor1 is None or tensor2 is None:
            py_mean = float('nan')
            py_l2 = float('nan')
            if tensor1 is not None:  # Python tensor exists
                t1_cpu_temp = tensor1.cpu().detach().numpy().astype(np.float32)
                py_mean = np.mean(t1_cpu_temp)
                py_l2 = np.linalg.norm(t1_cpu_temp.flatten())
            # If only tensor2 is None, we can't calculate the C++ L2 or any comparison metrics.
            # If only tensor1 is None, py_mean and py_l2 remain NaN.

            current_errors[name] = (
                float('nan'), float('nan'), [], py_mean, float('nan'),
                py_l2, float('nan'), float('nan'), float('nan'), float('nan'), float('nan')
            )
            print(f"Warning: Cannot compare '{name}' for sample {sample_idx}, one or both tensors are None.")
            return

        t1_cpu = tensor1.cpu().detach().numpy().astype(np.float32)
        t2_cpu = tensor2.cpu().detach().numpy().astype(np.float32)

        if t1_cpu.shape != t2_cpu.shape:
            print(f"Warning: Shape mismatch for '{name}' sample {sample_idx}. Py: {t1_cpu.shape}, Cpp: {t2_cpu.shape}. Skipping most comparisons.")
            current_errors[name] = (
                float('nan'), float('nan'), [], np.mean(t1_cpu), float('nan'),  # MAE, MaxErr, diff_arr, MeanPy, StdAbsErr
                np.linalg.norm(t1_cpu.flatten()), np.linalg.norm(t2_cpu.flatten()), float('nan'),  # L2Py, L2Cpp, L2Diff
                float('nan'), float('nan'), float('nan')  # CosSim, Pearson, MRE
            )
            return

        # All calculations from here assume shapes match and tensors are not None
        t1_flat = t1_cpu.flatten()
        t2_flat = t2_cpu.flatten()

        abs_diff_elements = np.abs(t1_cpu - t2_cpu)
        mae = np.mean(abs_diff_elements)
        max_err = np.max(abs_diff_elements)
        diff_arr_for_hist = abs_diff_elements.flatten()  # For histogram

        mean_py_val = np.mean(t1_cpu)
        std_abs_err = np.std(diff_arr_for_hist)

        l2_norm_py = np.linalg.norm(t1_flat)
        l2_norm_cpp = np.linalg.norm(t2_flat)
        l2_norm_diff = np.linalg.norm(t1_flat - t2_flat)

        # Cosine Similarity
        dot_product = np.dot(t1_flat, t2_flat)
        if l2_norm_py == 0 or l2_norm_cpp == 0:
            cosine_sim = float('nan')
        else:
            cosine_sim = dot_product / (l2_norm_py * l2_norm_cpp)

        # Pearson Correlation Coefficient
        if len(t1_flat) < 2:
            pearson_corr = float('nan')
        else:
            std_t1 = np.std(t1_flat)
            std_t2 = np.std(t2_flat)
            if std_t1 == 0 or std_t2 == 0:  # If either is constant
                if std_t1 == 0 and std_t2 == 0 and np.allclose(t1_flat, t2_flat):
                    pearson_corr = 1.0  # Both constant and identical
                else:
                    pearson_corr = float('nan')  # Otherwise, undefined or not meaningfully 1
            else:
                try:
                    corr_matrix = np.corrcoef(t1_flat, t2_flat)
                    if corr_matrix.ndim == 2:
                        pearson_corr = corr_matrix[0, 1]
                    else:  # Should be a scalar if inputs were effectively constant, already handled by std checks
                        pearson_corr = float(corr_matrix) if np.isscalar(corr_matrix) else float('nan')
                except Exception:
                    pearson_corr = float('nan')

        # Mean Relative Error (MRE)
        epsilon_rel_err = 1e-9  # Small epsilon to avoid division by zero and extreme values
        # Calculate relative error where abs(t1_cpu) is not zero (or very small).
        # For elements where t1_cpu is zero (or very small):
        #   - If t2_cpu is also zero (small), the error is small.
        #   - If t2_cpu is not zero, the relative error is infinite/large.
        # Using (abs(t1_cpu) + epsilon) in the denominator handles this.
        mean_rel_err = np.mean(abs_diff_elements / (np.abs(t1_cpu) + epsilon_rel_err))

        current_errors[name] = (
            mae, max_err, diff_arr_for_hist, mean_py_val, std_abs_err,
            l2_norm_py, l2_norm_cpp, l2_norm_diff, cosine_sim, pearson_corr, mean_rel_err
        )

        # Optional: print detailed error for specific high-error cases
        # if mae > 1e-4:
        #     print(f"High MAE for {name}, sample {sample_idx}: {mae:.6f}")

        # The function implicitly returns None as it modifies current_errors in place.
        # For direct use, if needed, it could return the tuple:
        # return (mae, max_err, diff_arr_for_hist, mean_py_val, std_abs_err, l2_norm_py, l2_norm_cpp, l2_norm_diff, cosine_sim, pearson_corr, mean_rel_err)

if __name__ == "__main__":
    # Parse command line arguments
    import argparse
    parser = argparse.ArgumentParser(description="Compare Python and C++ model implementations")
    parser.add_argument("--num-samples", type=int, default=1000, help="Number of test samples (default: 1000)")
    args = parser.parse_args()

    # Run comparison
    comparison = ModelComparison(num_samples=args.num_samples)
    comparison.run_all_tests()
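
# Example invocation (from the repository root, after run_full_comparison.sh has
# produced test/input_samples/ and test/output/):
#   python test/compare_models.py --num-samples 10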