Docs: Enhance HTML report metrics and add test framework README

Branch: master
Author: mht, 1 week ago
Commit: 2502097f00

Changed files:
  test/README.md          (+144)
  test/compare_models.py  (+760)

test/README.md  (+144)
@@ -0,0 +1,144 @@
# C++ Tracker Model Testing Framework
This directory contains the testing framework for comparing C++ implementations of PyTorch models against their original Python counterparts.
## Overview
The primary goal is to ensure that the C++ models (`cimp` project) produce results that are numerically close to the Python models from the `pytracking` toolkit, given the same inputs and model weights.
The framework consists of:
1. **C++ Test Program (`test_models.cpp`)**:
* Responsible for loading pre-trained model weights (from `exported_weights/`).
* Takes randomly generated input tensors (pre-generated by `generate_test_samples.cpp` and saved in `test/input_samples/`).
* Runs the C++ `Classifier` and `BBRegressor` models.
* Saves the C++ model output tensors to `test/output/`.
2. **C++ Sample Generator (`generate_test_samples.cpp`)**:
* Generates a specified number of random input tensor sets for both the classifier and bounding box regressor.
* Saves these input tensors into `test/input_samples/{classifier|bb_regressor}/sample_N/`, plus `test/input_samples/classifier/test_N/` for the classifier test features.
* This step is separated to allow the Python comparison script to run even if the C++ models have issues during their execution phase.
3. **Python Comparison Script (`compare_models.py`)**:
* Loads the original Python models (using `DiMPTorchScriptWrapper` which loads weights from `exported_weights/`).
* Loads the input tensors generated by `generate_test_samples.cpp` from `test/input_samples/`.
* Runs the Python models on these input tensors to get reference Python outputs.
* Loads the C++ model output tensors from `test/output/`.
* Performs a detailed, element-wise comparison between Python and C++ outputs.
* Calculates various error metrics (MAE, Max Error, L2 norms, Cosine Similarity, Pearson Correlation, Mean Relative Error); a short sketch of these computations follows this list.
* Generates an HTML report (`test/comparison/report.html`) summarizing the comparisons, including per-sample statistics and error distribution plots (saved in `test/comparison/plots/`).
4. **Automation Script (`run_full_comparison.sh`)**:
* Orchestrates the entire testing process:
1. Builds the C++ project (including `test_models` and `generate_test_samples`).
2. Runs `generate_test_samples` to create/update input data.
3. Runs `test_models` to generate C++ outputs.
4. Runs `compare_models.py` to perform the comparison and generate the report.
* Accepts the number of samples as an argument.
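For reference, the error metrics listed in step 3 boil down to a few NumPy expressions. The sketch below mirrors the calculations performed by `_compare_tensor_data` in `compare_models.py`; the function name and signature here are illustrative, and it assumes two same-shaped arrays holding the Python and C++ outputs:

```python
import numpy as np

def error_metrics(py: np.ndarray, cpp: np.ndarray, eps: float = 1e-9) -> dict:
    """Element-wise error metrics between a Python reference and a C++ output (same shape assumed)."""
    diff = np.abs(py - cpp)
    py_f, cpp_f = py.flatten(), cpp.flatten()
    l2_py, l2_cpp = np.linalg.norm(py_f), np.linalg.norm(cpp_f)
    return {
        "mae": float(diff.mean()),                        # mean absolute error
        "max_err": float(diff.max()),                     # worst single-element error
        "l2_py": float(l2_py),
        "l2_cpp": float(l2_cpp),
        "l2_diff": float(np.linalg.norm(py_f - cpp_f)),
        "cos_sim": float(py_f @ cpp_f / (l2_py * l2_cpp)) if l2_py > 0 and l2_cpp > 0 else float("nan"),
        "pearson": float(np.corrcoef(py_f, cpp_f)[0, 1]) if py_f.std() > 0 and cpp_f.std() > 0 else float("nan"),
        "mre": float((diff / (np.abs(py) + eps)).mean()), # mean relative error
    }
```

Cosine similarity and Pearson correlation are reported as NaN when a norm or standard deviation is zero, matching the behaviour of the comparison script.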
## Directory Structure
```
test/
├── input_samples/                # Stores input tensors generated by C++
│   ├── classifier/
│   │   ├── sample_0/
│   │   │   ├── backbone_feat.pt
│   │   │   └── ... (other classifier train inputs)
│   │   └── test_0/
│   │       ├── test_feat.pt
│   │       └── ... (other classifier test inputs)
│   └── bb_regressor/
│       └── sample_0/
│           ├── feat_layer2.pt
│           ├── feat_layer3.pt
│           └── ... (other bb_regressor inputs)
├── output/                       # Stores output tensors generated by C++ models
│   ├── classifier/
│   │   ├── sample_0/
│   │   │   └── clf_features.pt
│   │   └── test_0/
│   │       └── clf_feat_test.pt
│   └── bb_regressor/
│       └── sample_0/
│           ├── iou_pred.pt
│           └── ... (other bb_regressor outputs)
├── comparison/                   # Stores comparison results
│   ├── report.html               # Main HTML report
│   └── plots/                    # Error distribution histograms
├── test_models.cpp               # C++ program to run models and save outputs
├── generate_test_samples.cpp     # C++ program to generate input samples
├── compare_models.py             # Python script for comparison and report generation
├── run_full_comparison.sh        # Main test execution script
└── README.md                     # This file
```
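The comparison script derives every file path directly from this layout. For example, classifier training sample `i` is located as follows (mirroring the path construction in `compare_models.py`):

```python
from pathlib import Path

i = 0  # sample index
clf_train_input = Path('test') / 'input_samples' / 'classifier' / f'sample_{i}' / 'backbone_feat.pt'
clf_cpp_output = Path('test') / 'output' / 'classifier' / f'sample_{i}' / 'clf_features.pt'
```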
## How to Add a New Model for Comparison
Let's say you want to add a new model called `MyNewModel` with both C++ and Python implementations.
**1. Export Python Model Weights:**
* Ensure your Python `MyNewModel` can have its weights saved in a format loadable by both Python (e.g., `state_dict` or individual tensors) and C++ (LibTorch `torch::load`).
* Create a subdirectory `exported_weights/mynewmodel/` and save the weights there.
* Document the tensor names and their corresponding model parameters in a `mynewmodel_weights_doc.txt` file within that directory (see existing `classifier_weights_doc.txt` or `bb_regressor_weights_doc.txt` for examples). This is crucial for the `DiMPTorchScriptWrapper` if loading from individual tensors.
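One possible export flow is sketched below: iterate over the model's `state_dict`, save each parameter as an individual tensor file, and record the names and shapes in the documentation text file. This is a minimal illustration only; the exact serialization and doc-file format must follow the existing `classifier`/`bb_regressor` exports so that both `DiMPTorchScriptWrapper` and the C++ loader can read them, and the `export_weights` helper and filenames here are hypothetical.

```python
from pathlib import Path
import torch

def export_weights(model: torch.nn.Module, out_dir: str = 'exported_weights/mynewmodel'):
    """Save one file per named tensor and a simple name/shape doc file (illustrative layout)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    doc_lines = []
    for name, param in model.state_dict().items():
        tensor = param.detach().cpu()
        torch.save(tensor, out / f'{name}.pt')            # one file per named tensor
        doc_lines.append(f'{name}: {tuple(tensor.shape)}')
    (out / 'mynewmodel_weights_doc.txt').write_text('\n'.join(doc_lines) + '\n')
```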
**2. Update C++ Code:**
* **`generate_test_samples.cpp`**:
* Add functions to generate realistic random input tensors for `MyNewModel`.
* Define the expected input tensor names and shapes.
* Modify the `main` function to:
* Create a directory `test/input_samples/mynewmodel/sample_N/`.
* Call your new input generation functions.
* Save these input tensors (e.g., `my_input1.pt`, `my_input2.pt`) into the created directory using the `save_tensor` utility.
* **`test_models.cpp`**:
* Include the header for your C++ `MyNewModel` (e.g., `cimp/mynewmodel/mynewmodel.h`).
* In the `main` function:
* Add a section for `MyNewModel`.
* Determine the absolute path to `exported_weights/mynewmodel/`.
* Instantiate your C++ `MyNewModel`, passing the weights directory.
* Loop through the number of samples:
* Construct paths to the input tensors in `test/input_samples/mynewmodel/sample_N/`.
* Load these input tensors using `load_tensor`. Ensure they are on the correct device (CPU/CUDA).
* Call the relevant methods of your C++ `MyNewModel` (e.g., `myNewModel.predict(...)`).
* Create an output directory `test/output/mynewmodel/sample_N/`.
* Save the output tensors from your C++ model (e.g., `my_output.pt`) to this directory using `save_tensor`. Remember to move outputs to CPU before saving if they are on CUDA.
* **`CMakeLists.txt`**:
* If `MyNewModel` is a new static library (like `classifier` or `bb_regressor`), define its sources and add it as a library.
* Link `test_models` (and `generate_test_samples`, if it needs the new library) against the `MyNewModel` library and any other dependencies (such as LibTorch).
**3. Update Python Comparison Script (`compare_models.py`):**
* **`ModelComparison.__init__` & `_init_models`**:
* If your Python `MyNewModel` needs to be loaded via `DiMPTorchScriptWrapper`, update the wrapper or add logic to load your model. You might need to add a new parameter like `mynewmodel_sd='mynewmodel'` to `DiMPTorchScriptWrapper` and handle its loading.
* Store the loaded Python `MyNewModel` instance (e.g., `self.models.mynewmodel`).
* **Create a `compare_mynewmodel` method** (a full sketch follows this list):
* Create a new method, e.g., `def compare_mynewmodel(self):`.
* Print a starting message.
* Define input and C++ output directory paths: `Path('test') / 'input_samples' / 'mynewmodel'` and `Path('test') / 'output' / 'mynewmodel'`.
* Loop through `self.num_samples`:
* Initialize `current_errors = {}` for the current sample.
* Construct paths to input tensors for `MyNewModel` from `test/input_samples/mynewmodel/sample_N/`.
* Load these tensors using `self.load_cpp_tensor()`.
* Run the Python `MyNewModel` with these inputs to get `py_output_tensor`. Handle potential errors.
* Construct paths to C++ output tensors from `test/output/mynewmodel/sample_N/`.
* Load the C++ output tensor (`cpp_output_tensor`) using `self.load_cpp_tensor()`.
* Call `self._compare_tensor_data(py_output_tensor, cpp_output_tensor, "MyNewModel Output Comparison Name", i, current_errors)`. Use a descriptive name.
* If there are multiple distinct outputs from `MyNewModel` to compare, repeat the load and `_compare_tensor_data` calls for each.
* Store the results: `if current_errors: self.all_errors_stats[f"MyNewModel_Sample_{i}"] = current_errors`.
* **`ModelComparison.run_all_tests`**:
* Call your new `self.compare_mynewmodel()` method.
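A minimal sketch of such a method, added inside `ModelComparison` and following the same pattern as `compare_classifier`. The input/output filenames and the `predict` call are placeholders for whatever `MyNewModel` actually exposes:

```python
def compare_mynewmodel(self):
    """Compare MyNewModel outputs between Python and C++ (hypothetical example)."""
    print("\nComparing MyNewModel outputs...")
    input_dir = Path('test') / 'input_samples' / 'mynewmodel'
    cpp_output_dir = Path('test') / 'output' / 'mynewmodel'
    if not input_dir.exists() or not cpp_output_dir.exists():
        print(f"MyNewModel input or C++ output directory not found ({input_dir}, {cpp_output_dir}). Skipping.")
        return
    for i in tqdm(range(self.num_samples), desc="MyNewModel samples"):
        current_errors = {}
        sample_dir = input_dir / f'sample_{i}'
        cpp_out_dir = cpp_output_dir / f'sample_{i}'
        py_output = None
        my_input = self.load_cpp_tensor(sample_dir / 'my_input1.pt', self.device)
        if my_input is not None:
            try:
                with torch.no_grad():
                    py_output = self.models.mynewmodel.predict(my_input)  # placeholder API
            except Exception as e:
                print(f"ERROR: Python MyNewModel failed for sample {i}: {e}")
        cpp_output = self.load_cpp_tensor(cpp_out_dir / 'my_output.pt', self.device)
        self._compare_tensor_data(py_output, cpp_output, "MyNewModel Output", i, current_errors)
        if current_errors:
            self.all_errors_stats[f"MyNewModel_Sample_{i}"] = current_errors
```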
**4. Run the Tests:**
* Execute `./test/run_full_comparison.sh <num_samples>`.
* Check the console output and `test/comparison/report.html` for the results of `MyNewModel`.
## Key Considerations
* **Tensor Naming and Paths:** Be consistent with tensor filenames and directory structures. The Python script relies on these conventions to find the correct files.
* **Data Types and Devices:** Ensure tensors are of compatible data types (usually `float32`) and are on the correct device (CPU/CUDA) before model inference and before saving/loading. C++ outputs are saved from CPU.
* **Error Handling:** Implement robust error handling in both C++ (e.g., for file loading, model errors) and Python (e.g., for tensor loading, Python model execution). The comparison script is designed to report "N/A" for metrics if tensors are missing or shapes mismatch, allowing other comparisons to proceed.
* **`DiMPTorchScriptWrapper`:** If your Python model structure is different from DiMP's Classifier/BBRegressor, you might need to adapt `DiMPTorchScriptWrapper` or write a custom loader for your Python model if it's not already a `torch.jit.ScriptModule`. The current wrapper supports loading from a directory of named tensor files based on a documentation text file.
* **`load_cpp_tensor` in Python:** This utility in `compare_models.py` attempts to robustly load tensors saved by LibTorch (which sometimes get wrapped as `RecursiveScriptModule`). If you encounter issues loading your C++ saved tensors, you might need to inspect their structure and potentially adapt this function. The C++ `save_tensor` function aims to save plain tensors.
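If a C++ output refuses to load, a quick way to see what LibTorch actually wrote is to load the file directly and inspect the resulting object before adapting `load_cpp_tensor`. The path below is illustrative:

```python
import torch

path = 'test/output/mynewmodel/sample_0/my_output.pt'  # illustrative path
obj = torch.load(path, map_location='cpu', weights_only=False)
print(type(obj))
if isinstance(obj, torch.jit.RecursiveScriptModule):
    # Saved as a TorchScript module wrapping the tensor; list what it contains.
    print(list(obj.state_dict().keys()))
    print([n for n in dir(obj) if not n.startswith('__')][:20])
```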
By following these steps, you can integrate new models into this testing framework to validate their C++ implementations.

test/compare_models.py  (+760)
@@ -0,0 +1,760 @@
#!/usr/bin/env python3
import os
import torch
import numpy as np
import glob
import matplotlib.pyplot as plt
from pathlib import Path
import sys
import json
from tqdm import tqdm
import inspect
# Add the project root to path
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
# Import model wrappers
from pytracking.features.net_wrappers import DiMPTorchScriptWrapper
class ModelComparison:
def __init__(self, model_dir='exported_weights', num_samples=1000):
self.model_dir = model_dir
self.num_samples = num_samples
self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
# Initialize comparison results
self.comparison_dir = Path('test') / 'comparison'
self.comparison_dir.mkdir(parents=True, exist_ok=True)
self.plots_dir = self.comparison_dir / 'plots' # plots_dir initialized here
# Initialize models
self._init_models()
def _init_models(self):
"""Initialize Python models"""
print("Loading Python models...")
# Load DiMP components
self.models = DiMPTorchScriptWrapper(
model_dir=self.model_dir,
device=self.device,
backbone_sd='backbone', # Directory with backbone weights
classifier_sd='classifier', # Directory with classifier weights
bbregressor_sd='bb_regressor' # Directory with bbox regressor weights
)
def compare_classifier(self):
"""Compare classifier model outputs between Python and C++"""
print("\nComparing classifier outputs...")
# Ensure paths are Path objects for consistency if not already
input_dir_path = Path('test') / 'input_samples' / 'classifier'
cpp_output_dir_path = Path('test') / 'output' / 'classifier'
if not input_dir_path.exists() or not cpp_output_dir_path.exists():
print(f"Classifier input or C++ output directory not found ({input_dir_path}, {cpp_output_dir_path}). Skipping.")
return
# Removed: train_errors = []
# Removed: test_errors = []
# self.all_errors_stats is initialized per test run.
# Compare training samples
print("\nClassifier - Comparing Training Samples...")
for i in tqdm(range(self.num_samples), desc="Training samples"):
current_errors = {} # For this sample
sample_dir = input_dir_path / f'sample_{i}'
cpp_out_sample_dir = cpp_output_dir_path / f'sample_{i}'
py_clf_feat = None
cpp_clf_feat = None
if not sample_dir.exists() or not cpp_out_sample_dir.exists():
print(f"Warning: Skipping classifier train sample {i}, files not found at {sample_dir} or {cpp_out_sample_dir}.")
# No explicit error assignment here; _compare_tensor_data will handle Nones
else:
feat_path = sample_dir / 'backbone_feat.pt'
feat = self.load_cpp_tensor(feat_path, self.device)
if feat is None:
print(f"Critical: Failed to load input tensor for {feat_path} for classifier train sample {i}.")
# feat is None, py_clf_feat will remain None
else:
try:
with torch.no_grad():
py_clf_feat = self.models.classifier.extract_classification_feat(feat)
except Exception as e:
print(f"ERROR: Python model extract_classification_feat (train) failed for sample {i}: {e}")
# py_clf_feat remains None
cpp_clf_feat_path = cpp_out_sample_dir / 'clf_features.pt'
cpp_clf_feat = self.load_cpp_tensor(cpp_clf_feat_path, self.device)
if cpp_clf_feat is None:
print(f"Warning: Failed to load C++ output tensor {cpp_clf_feat_path} for classifier train sample {i}.")
# cpp_clf_feat remains None
self._compare_tensor_data(py_clf_feat, cpp_clf_feat, "Classifier Features Train", i, current_errors)
if current_errors: self.all_errors_stats[f"Clf_Train_Sample_{i}"] = current_errors
# Compare test samples
print("\nClassifier - Comparing Test Samples...")
for i in tqdm(range(self.num_samples), desc="Test samples"):
current_errors = {} # For this sample
test_sample_input_dir = input_dir_path / f'test_{i}'
cpp_test_out_sample_dir = cpp_output_dir_path / f'test_{i}'
py_clf_feat_test = None
cpp_clf_feat_test = None
if not test_sample_input_dir.exists() or not cpp_test_out_sample_dir.exists():
print(f"Warning: Skipping classifier test sample {i}, files not found at {test_sample_input_dir} or {cpp_test_out_sample_dir}.")
# No explicit error assignment here
else:
test_feat_path = test_sample_input_dir / 'test_feat.pt'
test_feat = self.load_cpp_tensor(test_feat_path, self.device)
if test_feat is None:
print(f"Critical: Failed to load input tensor for {test_feat_path} for classifier test sample {i}.")
# test_feat is None, py_clf_feat_test remains None
else:
try:
with torch.no_grad():
py_clf_feat_test = self.models.classifier.extract_classification_feat(test_feat)
except Exception as e:
print(f"ERROR: Python model extract_classification_feat (test) failed for sample {i}: {e}")
# py_clf_feat_test remains None
cpp_clf_feat_test_path = cpp_test_out_sample_dir / 'clf_feat_test.pt'
cpp_clf_feat_test = self.load_cpp_tensor(cpp_clf_feat_test_path, self.device)
if cpp_clf_feat_test is None:
print(f"Warning: Failed to load C++ output tensor {cpp_clf_feat_test_path} for classifier test sample {i}.")
# cpp_clf_feat_test remains None
self._compare_tensor_data(py_clf_feat_test, cpp_clf_feat_test, "Classifier Features Test", i, current_errors)
if current_errors: self.all_errors_stats[f"Clf_Test_Sample_{i}"] = current_errors
# Old stats and plotting code removed/commented below, now handled by HTML report
# print("\nClassifier Comparison Statistics:")
# if train_errors:
# print(f" Training Features MAE: Mean={np.mean(train_errors):.4e}, Std={np.std(train_errors):.4e}")
# if test_errors:
# print(f" Test Features MAE: Mean={np.mean(test_errors):.4e}, Std={np.std(test_errors):.4e}")
# self._generate_stats_and_plots(train_errors, "Classifier Training Features Error", self.plots_dir / "clf_train_feat_error_hist.png")
# self._generate_stats_and_plots(test_errors, "Classifier Test Features Error", self.plots_dir / "clf_test_feat_error_hist.png")
def compare_bb_regressor(self):
"""Compare bb_regressor model outputs between Python and C++"""
print("\nComparing bb_regressor outputs...")
input_dir = Path('test') / 'input_samples' / 'bb_regressor'
cpp_output_dir = Path('test') / 'output' / 'bb_regressor'
if not input_dir.exists() or not cpp_output_dir.exists():
print(f"BB Regressor input or C++ output directory not found ({input_dir}, {cpp_output_dir}). Skipping.")
return
for i in tqdm(range(self.num_samples), desc="BB Regressor samples"):
sample_dir = input_dir / f'sample_{i}'
cpp_output_sample_dir = cpp_output_dir / f'sample_{i}'
# Load input tensors for BB Regressor for this sample
feat_layer2_path = sample_dir / 'feat_layer2.pt'
feat_layer3_path = sample_dir / 'feat_layer3.pt'
init_bbox_path = sample_dir / 'init_bbox.pt'
proposals_path = sample_dir / 'proposals.pt'
feat_layer2 = self.load_cpp_tensor(feat_layer2_path, self.device)
feat_layer3 = self.load_cpp_tensor(feat_layer3_path, self.device)
init_bbox = self.load_cpp_tensor(init_bbox_path, self.device)
proposals = self.load_cpp_tensor(proposals_path, self.device)
if any(t is None for t in [feat_layer2, feat_layer3, init_bbox, proposals]):
print(f"Critical: Failed to load one or more BB Regressor input tensors for sample {i}. Skipping.")
continue
backbone_feat_tuple = (feat_layer2, feat_layer3) # Define the tuple for clarity
# Get IoU features from Python model
# self.models.get_backbone_bbreg_feat calls self.bb_regressor.get_iou_feat
with torch.no_grad():
py_iou_feat = self.models.get_backbone_bbreg_feat({"layer2": feat_layer2, "layer3": feat_layer3})
# Get modulation vectors
squeezed_init_bbox = init_bbox
if init_bbox is not None and init_bbox.dim() == 3 and init_bbox.shape[1] == 1:
squeezed_init_bbox = init_bbox.squeeze(1)
with torch.no_grad():
# Pass original backbone features to get_modulation
py_modulation = self.models.bb_regressor.get_modulation(backbone_feat_tuple, squeezed_init_bbox)
# DEBUG: Print shapes
print(f"Sample {i}: py_iou_feat[0] shape: {py_iou_feat[0].shape}, py_modulation[0] shape: {py_modulation[0].shape}")
print(f"Sample {i}: py_iou_feat[1] shape: {py_iou_feat[1].shape}, py_modulation[1] shape: {py_modulation[1].shape}")
# Predict IoU (Python model)
py_iou_pred = None
try:
with torch.no_grad():
py_iou_pred = self.models.bb_regressor.predict_iou(py_modulation, py_iou_feat, proposals)
except RuntimeError as e:
print(f"WARNING: Python model self.models.bb_regressor.predict_iou failed for sample {i}: {e}")
# Load C++ outputs
cpp_iou_pred_path = cpp_output_sample_dir / 'iou_pred.pt'
cpp_modulation_0_path = cpp_output_sample_dir / 'modulation_0.pt'
cpp_modulation_1_path = cpp_output_sample_dir / 'modulation_1.pt'
cpp_feat_0_path = cpp_output_sample_dir / 'iou_feat_0.pt'
cpp_feat_1_path = cpp_output_sample_dir / 'iou_feat_1.pt'
cpp_iou_pred = self.load_cpp_tensor(cpp_iou_pred_path, self.device)
cpp_modulation_0 = self.load_cpp_tensor(cpp_modulation_0_path, self.device)
cpp_modulation_1 = self.load_cpp_tensor(cpp_modulation_1_path, self.device)
cpp_feat_0 = self.load_cpp_tensor(cpp_feat_0_path, self.device)
cpp_feat_1 = self.load_cpp_tensor(cpp_feat_1_path, self.device)
current_errors = {} # Store errors for this sample for the HTML report
# Compare IoU features (py_iou_feat vs cpp_feat_0/1)
# _compare_tensor_data will handle None inputs appropriately
py_iou_f0 = py_iou_feat[0] if py_iou_feat and len(py_iou_feat) > 0 else None
py_iou_f1 = py_iou_feat[1] if py_iou_feat and len(py_iou_feat) > 1 else None
self._compare_tensor_data(py_iou_f0, cpp_feat_0, "BBReg PyIoUFeat0 vs CppIoUFeat0", i, current_errors)
self._compare_tensor_data(py_iou_f1, cpp_feat_1, "BBReg PyIoUFeat1 vs CppIoUFeat1", i, current_errors)
# Compare modulation vectors (py_modulation vs cpp_modulation_0/1)
py_mod_0 = py_modulation[0] if py_modulation and len(py_modulation) > 0 else None
py_mod_1 = py_modulation[1] if py_modulation and len(py_modulation) > 1 else None
self._compare_tensor_data(py_mod_0, cpp_modulation_0, "BBReg PyMod0 vs CppMod0", i, current_errors)
self._compare_tensor_data(py_mod_1, cpp_modulation_1, "BBReg PyMod1 vs CppMod1", i, current_errors)
# Compare final IoU prediction
# _compare_tensor_data will handle None for py_iou_pred or cpp_iou_pred
self._compare_tensor_data(py_iou_pred, cpp_iou_pred, "BBReg IoUPred", i, current_errors)
if current_errors: # Add to overall statistics if any comparisons were made/attempted
self.all_errors_stats[f"BBReg_Sample_{i}"] = current_errors
# Note: MAE accumulation for overall average needs to be selective based on valid comparisons
# For simplicity, we'll let the HTML report show NaNs for failed/skipped comparisons.
if not self.all_errors_stats: # Check if any BB regressor comparisons were made
print("No BB Regressor comparisons were performed for this model type.") # Clarified message
# No plots or stats if nothing was compared for BB regressor
return
# The following old averaging and plotting is now handled by generate_html_report using all_errors_stats
# print("\nBB Regressor Comparison Statistics:")
# if iou_pred_errors:
# print(f" IoU Prediction MAE: Mean={np.mean(iou_pred_errors):.4e}, Std={np.std(iou_pred_errors):.4e}")
# if modulation_errors:
# print(f" Modulation MAE: Mean={np.mean(modulation_errors):.4e}, Std={np.std(modulation_errors):.4e}")
# if feat_errors:
# print(f" IoU Feature MAE: Mean={np.mean(feat_errors):.4e}, Std={np.std(feat_errors):.4e}")
# # Plots - these would need to be rethought with the new error structure
# self._generate_stats_and_plots(iou_pred_errors, "BB Regressor IoU Prediction Error", self.plots_dir / "bbreg_iou_pred_error_hist.png")
# self._generate_stats_and_plots(modulation_errors, "BB Regressor Modulation Error", self.plots_dir / "bbreg_modulation_error_hist.png")
# self._generate_stats_and_plots(feat_errors, "BB Regressor IoU Feature Error", self.plots_dir / "bbreg_feature_error_hist.png")
def generate_html_report(self):
print("\nGenerating HTML report...")
report_path = self.comparison_dir / "report.html"
# plot_paths_dict = {} # This variable was unused
# Prepare data for the report: group by model and comparison type
report_data = {
# Structure:
# "Model_Type Component_Name": {
#     "samples": {0: {"mae": X, "max_err": Y, "mean_py": Z, "std_err": S, "plot_path": "..."}, 1: {...}},
#     "overall_mae_mean": A, "overall_mae_std": B, "overall_max_err_mean": C
# }
}
for sample_key, comparisons in self.all_errors_stats.items():
# sample_key examples: "Clf_Train_Sample_0", "Clf_Test_Sample_0", "BBReg_Sample_0"
parts = sample_key.split("_")
model_prefix = parts[0] # Clf, BBReg
sample_type_str = ""
sample_idx = -1
if model_prefix == "Clf":
sample_type_str = parts[1] # Train or Test
sample_idx = int(parts[-1])
model_name_key = f"Classifier {sample_type_str}"
elif model_prefix == "BBReg":
sample_idx = int(parts[-1])
model_name_key = "BB Regressor"
else:
print(f"WARNING: Unknown sample key format in all_errors_stats: {sample_key}")
continue
for comparison_name, stats in comparisons.items():
# comparison_name examples: "Classifier Features Train", "BBReg PyIoUFeat0 vs CppIoUFeat0"
# Unpack all 11 metrics now
mae, max_err, diff_arr, mean_py_val, std_abs_err, \
l2_py, l2_cpp, l2_diff, cos_sim, pearson, mre = stats
full_comparison_key = f"{model_name_key} - {comparison_name}"
if full_comparison_key not in report_data:
report_data[full_comparison_key] = {
"samples": {},
"all_maes": [],
"all_max_errs": [],
"all_mean_py_vals": [],
"all_std_abs_errs": [], # Renamed from all_std_errs
"all_l2_py_vals": [],
"all_l2_cpp_vals": [],
"all_l2_diff_vals": [],
"all_cos_sim_vals": [],
"all_pearson_vals": [],
"all_mre_vals": []
}
plot_filename = None
if diff_arr is not None and len(diff_arr) > 0 and not np.all(np.isnan(diff_arr)):
plot_filename = f"{model_prefix}_{sample_type_str}_sample{sample_idx}_{comparison_name.replace(' ', '_').replace('/', '_')}_hist.png"
plot_abs_path = self.plots_dir / plot_filename
# Pass std_abs_err to plotting function
self._generate_single_plot(diff_arr, comparison_name, plot_abs_path, mean_py_val, std_abs_err, mae, max_err)
report_data[full_comparison_key]["samples"][sample_idx] = {
"mae": mae,
"max_err": max_err,
"mean_py_val": mean_py_val,
"std_abs_err": std_abs_err, # Renamed from std_err
"l2_py": l2_py,
"l2_cpp": l2_cpp,
"l2_diff": l2_diff,
"cos_sim": cos_sim,
"pearson": pearson,
"mre": mre,
"plot_path": plot_filename # Store relative path for HTML
}
if not np.isnan(mae): report_data[full_comparison_key]["all_maes"].append(mae)
if not np.isnan(max_err): report_data[full_comparison_key]["all_max_errs"].append(max_err)
if not np.isnan(mean_py_val): report_data[full_comparison_key]["all_mean_py_vals"].append(mean_py_val)
if not np.isnan(std_abs_err): report_data[full_comparison_key]["all_std_abs_errs"].append(std_abs_err)
if not np.isnan(l2_py): report_data[full_comparison_key]["all_l2_py_vals"].append(l2_py)
if not np.isnan(l2_cpp): report_data[full_comparison_key]["all_l2_cpp_vals"].append(l2_cpp)
if not np.isnan(l2_diff): report_data[full_comparison_key]["all_l2_diff_vals"].append(l2_diff)
if not np.isnan(cos_sim): report_data[full_comparison_key]["all_cos_sim_vals"].append(cos_sim)
if not np.isnan(pearson): report_data[full_comparison_key]["all_pearson_vals"].append(pearson)
if not np.isnan(mre): report_data[full_comparison_key]["all_mre_vals"].append(mre)
# Calculate overall stats
for comp_key, data in report_data.items():
data["overall_mae_mean"] = np.mean(data["all_maes"]) if data["all_maes"] else float('nan')
data["overall_mae_std"] = np.std(data["all_maes"]) if data["all_maes"] else float('nan')
data["overall_max_err_mean"] = np.mean(data["all_max_errs"]) if data["all_max_errs"] else float('nan')
data["overall_mean_py_val_mean"] = np.mean(data["all_mean_py_vals"]) if data["all_mean_py_vals"] else float('nan')
data["overall_std_abs_err_mean"] = np.mean(data["all_std_abs_errs"]) if data["all_std_abs_errs"] else float('nan') # Renamed
data["overall_l2_py_mean"] = np.mean(data["all_l2_py_vals"]) if data["all_l2_py_vals"] else float('nan')
data["overall_l2_cpp_mean"] = np.mean(data["all_l2_cpp_vals"]) if data["all_l2_cpp_vals"] else float('nan')
data["overall_l2_diff_mean"] = np.mean(data["all_l2_diff_vals"]) if data["all_l2_diff_vals"] else float('nan')
data["overall_cos_sim_mean"] = np.mean(data["all_cos_sim_vals"]) if data["all_cos_sim_vals"] else float('nan')
data["overall_pearson_mean"] = np.mean(data["all_pearson_vals"]) if data["all_pearson_vals"] else float('nan')
data["overall_mre_mean"] = np.mean(data["all_mre_vals"]) if data["all_mre_vals"] else float('nan')
# HTML Generation
html_content = """
<html>
<head>
<title>Model Comparison Report</title>
<style>
body { font-family: sans-serif; margin: 20px; }
h1, h2, h3 { color: #333; }
table { border-collapse: collapse; width: 90%; margin-bottom: 20px; }
th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
th { background-color: #f2f2f2; }
.plot-container { margin-bottom: 30px; page-break-inside: avoid; }
img { max-width: 100%; height: auto; border: 1px solid #ccc; }
.nan { color: #999; font-style: italic; }
.collapsible {
background-color: #f2f2f2;
color: #444;
cursor: pointer;
padding: 10px;
width: 100%;
border: none;
text-align: left;
outline: none;
font-size: 1.1em;
margin-top: 10px;
margin-bottom: 5px;
}
.active, .collapsible:hover {
background-color: #ddd;
}
.content {
padding: 0 18px;
display: none;
overflow: hidden;
background-color: #f9f9f9;
}
.metric-explanation { margin-bottom: 20px; padding: 10px; border: 1px solid #eee; background-color: #f9f9f9; }
.metric-explanation dt { font-weight: bold; }
.metric-explanation dd { margin-left: 20px; margin-bottom: 5px; }
</style>
</head>
<body>
<h1>Model Comparison Report</h1>
<p>Number of samples per model component: {self.num_samples}</p>
<div class="metric-explanation">
<h3>Understanding the Metrics:</h3>
<dl>
<dt>Mean MAE (Mean Absolute Error)</dt>
<dd><b>Calculation:</b> Average of the absolute differences between corresponding elements of the Python and C++ tensors (<code>mean(abs(py - cpp))</code>). The "Mean MAE" in the summary table is the average of these MAEs over all samples for a given comparison.</dd>
<dd><b>Range & Interpretation:</b> 0 to &infin;. Closer to 0 indicates better agreement. This metric shows the average magnitude of error.</dd>
<dt>Std MAE (Standard Deviation of MAE)</dt>
<dd><b>Calculation:</b> Standard deviation of the MAE values calculated for each sample within a comparison group.</dd>
<dd><b>Range & Interpretation:</b> 0 to &infin;. A smaller value indicates that the MAE is consistent across samples. A larger value suggests variability in agreement from sample to sample.</dd>
<dt>Mean Max Error</dt>
<dd><b>Calculation:</b> Average of the maximum absolute differences found between Python and C++ tensors for each sample (<code>mean(max(abs(py - cpp)))</code> over samples).</dd>
<dd><b>Range & Interpretation:</b> 0 to &infin;. Closer to 0 is better. Indicates the average of the worst-case discrepancies per sample.</dd>
<dt>Mean Py Val (Mean Python Tensor Value)</dt>
<dd><b>Calculation:</b> Average of the mean values of the Python reference tensors over all samples (<code>mean(mean(py_tensor_sample_N))</code>).</dd>
<dd><b>Range & Interpretation:</b> Problem-dependent. Provides context about the typical magnitude of the Python model's output values.</dd>
<dt>Mean Std Abs Err (Mean Standard Deviation of Absolute Errors)</dt>
<dd><b>Calculation:</b> Average of the standard deviations of the absolute error arrays (<code>abs(py - cpp)</code>) for each sample. The "Err Std" in plot titles is this value for that specific sample.</dd>
<dd><b>Range & Interpretation:</b> 0 to &infin;. A smaller value indicates that the errors are concentrated around their mean (MAE), implying less spread in error magnitudes within a sample.</dd>
<dt>Mean L2 Py (Mean L2 Norm of Python Tensor)</dt>
<dd><b>Calculation:</b> Average of the L2 norms (Euclidean norm) of the flattened Python tensors over all samples.</dd>
<dd><b>Range & Interpretation:</b> 0 to &infin;. Represents the average magnitude or "length" of the Python output vectors.</dd>
<dt>Mean L2 Cpp (Mean L2 Norm of C++ Tensor)</dt>
<dd><b>Calculation:</b> Average of the L2 norms of the flattened C++ tensors over all samples.</dd>
<dd><b>Range & Interpretation:</b> 0 to &infin;. Represents the average magnitude of the C++ output vectors. Should be comparable to Mean L2 Py if models agree in scale.</dd>
<dt>Mean L2 Diff (Mean L2 Norm of Difference)</dt>
<dd><b>Calculation:</b> Average of the L2 norms of the flattened difference tensors (<code>py - cpp</code>) over all samples.</dd>
<dd><b>Range & Interpretation:</b> 0 to &infin;. Closer to 0 indicates better agreement. This is the magnitude of the average difference vector.</dd>
<dt>Mean Cosine Sim (Mean Cosine Similarity)</dt>
<dd><b>Calculation:</b> Average of the cosine similarities between the flattened Python and C++ tensors over all samples. Cosine similarity is <code>dot(py, cpp) / (norm(py) * norm(cpp))</code>.</dd>
<dd><b>Range & Interpretation:</b> -1 to 1 (typically 0 to 1 for non-negative features). Closer to 1 indicates that the tensors point in the same direction (high similarity in terms of orientation, ignoring magnitude). Values near 0 suggest orthogonality, and near -1 suggest opposite directions.</dd>
<dt>Mean Pearson Corr (Mean Pearson Correlation Coefficient)</dt>
<dd><b>Calculation:</b> Average of the Pearson correlation coefficients between the flattened Python and C++ tensors over all samples. Measures linear correlation.</dd>
<dd><b>Range & Interpretation:</b> -1 to 1. Closer to 1 indicates strong positive linear correlation. Closer to -1 indicates strong negative linear correlation. Closer to 0 indicates weak or no linear correlation.</dd>
<dt>Mean MRE (Mean Relative Error)</dt>
<dd><b>Calculation:</b> Average of the mean relative errors per sample, where relative error is <code>mean(abs(py - cpp) / (abs(py) + epsilon))</code>. Epsilon is a small value to prevent division by zero.</dd>
<dd><b>Range & Interpretation:</b> 0 to &infin;. Closer to 0 is better. This metric normalizes the absolute error by the magnitude of the Python reference values, useful for understanding error relative to signal strength.</dd>
</dl>
</div>
"""
sorted_report_keys = sorted(report_data.keys())
html_content += "<h2>Overall Comparison Statistics</h2><table><tr><th>Comparison Key</th><th>Mean MAE</th><th>Std MAE</th><th>Mean Max Error</th><th>Mean Py Val</th><th>Mean Std Abs Err</th><th>Mean L2 Py</th><th>Mean L2 Cpp</th><th>Mean L2 Diff</th><th>Mean Cosine Sim</th><th>Mean Pearson Corr</th><th>Mean MRE</th></tr>"
for comp_key in sorted_report_keys:
data = report_data[comp_key]
html_content += f"""
<tr>
<td>{comp_key}</td>
<td>{f"{data['overall_mae_mean']:.4e}" if not np.isnan(data['overall_mae_mean']) else 'N/A'}</td>
<td>{f"{data['overall_mae_std']:.4e}" if not np.isnan(data['overall_mae_std']) else 'N/A'}</td>
<td>{f"{data['overall_max_err_mean']:.4e}" if not np.isnan(data['overall_max_err_mean']) else 'N/A'}</td>
<td>{f"{data['overall_mean_py_val_mean']:.4e}" if not np.isnan(data['overall_mean_py_val_mean']) else 'N/A'}</td>
<td>{f"{data['overall_std_abs_err_mean']:.4e}" if not np.isnan(data['overall_std_abs_err_mean']) else 'N/A'}</td>
<td>{f"{data['overall_l2_py_mean']:.4e}" if not np.isnan(data['overall_l2_py_mean']) else 'N/A'}</td>
<td>{f"{data['overall_l2_cpp_mean']:.4e}" if not np.isnan(data['overall_l2_cpp_mean']) else 'N/A'}</td>
<td>{f"{data['overall_l2_diff_mean']:.4e}" if not np.isnan(data['overall_l2_diff_mean']) else 'N/A'}</td>
<td>{f"{data['overall_cos_sim_mean']:.4f}" if not np.isnan(data['overall_cos_sim_mean']) else 'N/A'}</td>
<td>{f"{data['overall_pearson_mean']:.4f}" if not np.isnan(data['overall_pearson_mean']) else 'N/A'}</td>
<td>{f"{data['overall_mre_mean']:.4e}" if not np.isnan(data['overall_mre_mean']) else 'N/A'}</td>
</tr>
"""
html_content += "</table>"
for comp_key in sorted_report_keys:
data = report_data[comp_key]
html_content += f"<h2>Details for: {comp_key}</h2>"
html_content += f"""<p>Overall Mean MAE: {f'{data["overall_mae_mean"]:.4e}' if not np.isnan(data['overall_mae_mean']) else 'N/A'}</p>"""
html_content += "<table><tr><th>Sample Index</th><th>MAE</th><th>Max Error</th><th>Mean Py Val</th><th>Std Abs Err</th><th>L2 Py</th><th>L2 Cpp</th><th>L2 Diff</th><th>Cosine Sim</th><th>Pearson Corr</th><th>MRE</th><th>Error Distribution Plot</th></tr>"
for sample_idx in sorted(data["samples"].keys()):
sample_data = data["samples"][sample_idx]
plot_path_html = f'./plots/{sample_data["plot_path"]}' if sample_data["plot_path"] else "N/A"
img_tag = f'<img src="{plot_path_html}" alt="Error histogram">' if sample_data["plot_path"] else "N/A"
html_content += f"""
<tr>
<td>{sample_idx}</td>
<td>{f"{sample_data['mae']:.4e}" if not np.isnan(sample_data['mae']) else '<span class="nan">N/A</span>'}</td>
<td>{f"{sample_data['max_err']:.4e}" if not np.isnan(sample_data['max_err']) else '<span class="nan">N/A</span>'}</td>
<td>{f"{sample_data['mean_py_val']:.4e}" if not np.isnan(sample_data['mean_py_val']) else '<span class="nan">N/A</span>'}</td>
<td>{f"{sample_data['std_abs_err']:.4e}" if not np.isnan(sample_data['std_abs_err']) else '<span class="nan">N/A</span>'}</td>
<td>{f"{sample_data['l2_py']:.4e}" if not np.isnan(sample_data['l2_py']) else '<span class="nan">N/A</span>'}</td>
<td>{f"{sample_data['l2_cpp']:.4e}" if not np.isnan(sample_data['l2_cpp']) else '<span class="nan">N/A</span>'}</td>
<td>{f"{sample_data['l2_diff']:.4e}" if not np.isnan(sample_data['l2_diff']) else '<span class="nan">N/A</span>'}</td>
<td>{f"{sample_data['cos_sim']:.4f}" if not np.isnan(sample_data['cos_sim']) else '<span class="nan">N/A</span>'}</td>
<td>{f"{sample_data['pearson']:.4f}" if not np.isnan(sample_data['pearson']) else '<span class="nan">N/A</span>'}</td>
<td>{f"{sample_data['mre']:.4e}" if not np.isnan(sample_data['mre']) else '<span class="nan">N/A</span>'}</td>
<td>{img_tag}</td>
</tr>
"""
html_content += "</table>"
html_content += """
<script>
var coll = document.getElementsByClassName("collapsible");
var i;
for (i = 0; i < coll.length; i++) {
coll[i].addEventListener("click", function() {
this.classList.toggle("active");
var content = this.nextElementSibling;
if (content.style.display === "block") {
content.style.display = "none";
} else {
content.style.display = "block";
}
});
}
</script>
</body></html>
"""
with open(report_path, 'w') as f:
f.write(html_content)
print(f"HTML report generated at {report_path}")
def _generate_single_plot(self, error_array, title, plot_path, mean_val, std_abs_err, mae, max_err):
if error_array is None or len(error_array) == 0 or np.all(np.isnan(error_array)):
# print(f"Skipping plot for {title} as error_array is empty or all NaNs.")
return
plt.figure(figsize=(8, 6))
plt.hist(error_array, bins=50, color='skyblue', edgecolor='black')
stats_text = f"Ref Mean: {mean_val:.3e} | MAE: {mae:.3e} | MaxErr: {max_err:.3e} | Err Std: {std_abs_err:.3e}"
plt.title(f"{title}\n{stats_text}", fontsize=10)
plt.xlabel("Error Value")
plt.ylabel("Frequency")
plt.grid(True, linestyle='--', alpha=0.7)
try:
plt.tight_layout()
plt.savefig(plot_path)
except Exception as e:
print(f"ERROR: Failed to save plot {plot_path}: {e}")
plt.close()
def run_all_tests(self):
self.all_errors_stats = {} # Initialize/clear for the new run
self.plots_dir.mkdir(parents=True, exist_ok=True) # Ensure plots_dir exists
self.compare_classifier()
self.compare_bb_regressor()
self.generate_html_report()
print("All tests completed!")
def load_cpp_tensor(self, path, device):
path_str = str(path) # Ensure path is a string
try:
# Attempt 1: Load as a plain tensor, assuming it's not a TorchScript module.
# This is the most common and safest way to load tensors saved from PyTorch (Python or C++).
tensor = torch.load(path_str, map_location=device, weights_only=True)
# print(f"Successfully loaded tensor from {path_str} with weights_only=True")
return tensor
except RuntimeError as e_weights_only:
# Handle cases where weights_only=True is not appropriate (e.g., TorchScript archives)
if "TorchScript archive" in str(e_weights_only) or \
"PytorchStreamReader failed" in str(e_weights_only) or \
"weights_only" in str(e_weights_only): # Broader check for weights_only issues
# print(f"weights_only=True failed for {path_str} ({e_weights_only}). Trying weights_only=False.")
try:
# Attempt 2: Load with weights_only=False.
loaded_obj = torch.load(path_str, map_location=device, weights_only=False)
if isinstance(loaded_obj, torch.Tensor):
# print(f"Successfully loaded tensor from {path_str} with weights_only=False.")
return loaded_obj
# Check for _actual_script_module for deeply nested tensors
elif hasattr(loaded_obj, '_actual_script_module') and hasattr(loaded_obj._actual_script_module, 'forward'):
# print(f"Found _actual_script_module in {path_str}, trying its forward().")
try:
potential_tensor = loaded_obj._actual_script_module.forward()
if isinstance(potential_tensor, torch.Tensor):
# print(f"Extracted tensor using _actual_script_module.forward() from {path_str}")
return potential_tensor
except Exception as e_deep_forward:
print(f"Warning: Calling _actual_script_module.forward() from {path_str} failed: {e_deep_forward}")
# General ScriptModule handling (RecursiveScriptModule or any object with forward)
elif isinstance(loaded_obj, torch.jit.RecursiveScriptModule) or hasattr(loaded_obj, 'forward'):
# print(f"Loaded a ScriptModule/object with forward from {path_str}. Attempting extraction.")
# Attempt 2a: Greedily find the first tensor attribute
for attr_name in dir(loaded_obj):
if attr_name.startswith('__'):
continue
try:
attr_val = getattr(loaded_obj, attr_name)
if isinstance(attr_val, torch.Tensor):
# print(f"Extracted tensor from attribute '{attr_name}' of ScriptModule at {path_str}")
return attr_val
except Exception:
pass # Ignore errors from getattr
# Attempt 2b: Try calling forward() if it exists and no tensor attribute was found
if hasattr(loaded_obj, 'forward') and callable(loaded_obj.forward):
sig = inspect.signature(loaded_obj.forward)
if not sig.parameters: # Only call if forward() takes no arguments
try:
potential_tensor = loaded_obj.forward()
if isinstance(potential_tensor, torch.Tensor):
# print(f"Extracted tensor using forward() from ScriptModule at {path_str}")
return potential_tensor
except Exception as e_forward:
print(f"Warning: Calling forward() on ScriptModule from {path_str} failed: {e_forward}")
# Attempt 2c: Check state_dict
try:
sd = loaded_obj.state_dict()
# print(f"DEBUG: state_dict for {path_str}: {list(sd.keys())}")
if len(sd) == 1:
tensor_name = list(sd.keys())[0]
potential_tensor = sd[tensor_name]
if isinstance(potential_tensor, torch.Tensor):
print(f"INFO: Extracted tensor '{tensor_name}' from single-entry state_dict of ScriptModule at {path_str}")
return potential_tensor
elif len(sd) > 1:
# If multiple tensors, this is heuristic. Prefer known/simple names if possible.
# For now, just take the first one if it's a tensor.
for tensor_name, potential_tensor in sd.items():
if isinstance(potential_tensor, torch.Tensor):
print(f"INFO: Extracted tensor '{tensor_name}' (from multiple) from state_dict of ScriptModule at {path_str}")
return potential_tensor
print(f"Warning: ScriptModule at {path_str} has multiple state_dict entries: {list(sd.keys())} but none were straightforwardly returned as the primary tensor.")
# else: state_dict is empty, or no tensors found above
except Exception as e_sd:
print(f"Warning: Error accessing/processing state_dict for ScriptModule at {path_str}: {e_sd}")
print(f"ERROR: Could not extract tensor from ScriptModule at {path_str} after trying attributes, forward(), and state_dict(). Dir: {dir(loaded_obj)}")
return None
else:
print(f"ERROR: Loaded object from {path_str} (with weights_only=False) is not a Tensor or recognized ScriptModule. Type: {type(loaded_obj)}.")
return None
except Exception as e_load_false:
print(f"ERROR: weights_only=False also failed for {path_str}. Last error: {e_load_false}")
return None
else: # Some other error with weights_only=True
print(f"ERROR: Loading tensor from {path_str} with weights_only=True failed with an unexpected error: {e_weights_only}")
return None
except Exception as e_generic:
print(f"ERROR: A generic error occurred while loading tensor from {path_str}: {e_generic}")
return None
def _compare_tensor_data(self, tensor1, tensor2, name, sample_idx, current_errors):
"""Compare two tensors and return error metrics."""
num_metrics = 11 # mae, max_err, diff_arr, mean_py_val, std_abs_err, l2_py, l2_cpp, l2_diff, cos_sim, pearson, mre
nan_metrics_tuple = (
float('nan'), float('nan'), [], float('nan'), float('nan'), # Original 5
float('nan'), float('nan'), float('nan'), float('nan'), float('nan'), float('nan') # New 6
)
if tensor1 is None or tensor2 is None:
py_mean = float('nan')
py_l2 = float('nan')
if tensor1 is not None: # Python tensor exists
t1_cpu_temp = tensor1.cpu().detach().numpy().astype(np.float32)
py_mean = np.mean(t1_cpu_temp)
py_l2 = np.linalg.norm(t1_cpu_temp.flatten())
# If only tensor2 is None, we can't calculate C++ l2 or comparison metrics
# If only tensor1 is None, py_mean and py_l2 remain NaN.
current_errors[name] = (
float('nan'), float('nan'), [], py_mean, float('nan'),
py_l2, float('nan'), float('nan'), float('nan'), float('nan'), float('nan')
)
print(f"Warning: Cannot compare '{name}' for sample {sample_idx}, one or both tensors are None.")
return
t1_cpu = tensor1.cpu().detach().numpy().astype(np.float32)
t2_cpu = tensor2.cpu().detach().numpy().astype(np.float32)
if t1_cpu.shape != t2_cpu.shape:
print(f"Warning: Shape mismatch for '{name}' sample {sample_idx}. Py: {t1_cpu.shape}, Cpp: {t2_cpu.shape}. Skipping most comparisons.")
current_errors[name] = (
float('nan'), float('nan'), [], np.mean(t1_cpu), float('nan'), # MAE, MaxErr, diff_arr, MeanPy, StdAbsErr
np.linalg.norm(t1_cpu.flatten()), np.linalg.norm(t2_cpu.flatten()), float('nan'), # L2Py, L2Cpp, L2Diff
float('nan'), float('nan'), float('nan') # CosSim, Pearson, MRE
)
return
# All calculations from here assume shapes match and tensors are not None
t1_flat = t1_cpu.flatten()
t2_flat = t2_cpu.flatten()
abs_diff_elements = np.abs(t1_cpu - t2_cpu)
mae = np.mean(abs_diff_elements)
max_err = np.max(abs_diff_elements)
diff_arr_for_hist = abs_diff_elements.flatten() # For histogram
mean_py_val = np.mean(t1_cpu)
std_abs_err = np.std(diff_arr_for_hist)
l2_norm_py = np.linalg.norm(t1_flat)
l2_norm_cpp = np.linalg.norm(t2_flat)
l2_norm_diff = np.linalg.norm(t1_flat - t2_flat)
# Cosine Similarity
dot_product = np.dot(t1_flat, t2_flat)
if l2_norm_py == 0 or l2_norm_cpp == 0:
cosine_sim = float('nan')
else:
cosine_sim = dot_product / (l2_norm_py * l2_norm_cpp)
# Pearson Correlation Coefficient
if len(t1_flat) < 2:
pearson_corr = float('nan')
else:
std_t1 = np.std(t1_flat)
std_t2 = np.std(t2_flat)
if std_t1 == 0 or std_t2 == 0: # If either is constant
if std_t1 == 0 and std_t2 == 0 and np.allclose(t1_flat, t2_flat):
pearson_corr = 1.0 # Both constant and identical
else:
pearson_corr = float('nan') # Otherwise, undefined or not meaningfully 1
else:
try:
corr_matrix = np.corrcoef(t1_flat, t2_flat)
if corr_matrix.ndim == 2:
pearson_corr = corr_matrix[0, 1]
else: # Should be a scalar if inputs were effectively constant, already handled by std checks
pearson_corr = float(corr_matrix) if np.isscalar(corr_matrix) else float('nan')
except Exception:
pearson_corr = float('nan')
# Mean Relative Error (MRE)
epsilon_rel_err = 1e-9 # Small epsilon to avoid division by zero and extreme values
# Calculate relative error where abs(t1_cpu) is not zero (or very small)
# For elements where t1_cpu is zero (or very small):
# - If t2_cpu is also zero (small), error is small.
# - If t2_cpu is not zero, relative error is infinite/large.
# Using (abs(t1_cpu) + epsilon) in denominator handles this.
mean_rel_err = np.mean(abs_diff_elements / (np.abs(t1_cpu) + epsilon_rel_err))
current_errors[name] = (
mae, max_err, diff_arr_for_hist, mean_py_val, std_abs_err,
l2_norm_py, l2_norm_cpp, l2_norm_diff, cosine_sim, pearson_corr, mean_rel_err
)
# Optional: print detailed error for specific high-error cases
# if mae > 1e-4:
# print(f"High MAE for {name}, sample {sample_idx}: {mae:.6f}")
# The function implicitly returns None as it modifies current_errors in place.
# For direct use, if needed, it could return the tuple:
# return (mae, max_err, diff_arr_for_hist, mean_py_val, std_abs_err, l2_norm_py, l2_norm_cpp, l2_norm_diff, cosine_sim, pearson_corr, mean_rel_err)
if __name__ == "__main__":
# Parse command line arguments
import argparse
parser = argparse.ArgumentParser(description="Compare Python and C++ model implementations")
parser.add_argument("--num-samples", type=int, default=1000, help="Number of test samples (default: 1000)")
args = parser.parse_args()
# Run comparison
comparison = ModelComparison(num_samples=args.num_samples)
comparison.run_all_tests()