# C++ Tracker Model Testing Framework

This directory contains the testing framework for comparing C++ implementations of PyTorch models against their original Python counterparts.

## Overview
The primary goal is to ensure that the C++ models (cimp project) produce results that are numerically close to the Python models from the pytracking toolkit, given the same inputs and model weights.
The framework consists of:
1. **C++ Test Program (`test_models.cpp`)**
   - Loads pre-trained model weights (from `exported_weights/`).
   - Takes randomly generated input tensors (pre-generated by `generate_test_samples.cpp` and saved in `test/input_samples/`).
   - Runs the C++ `Classifier` and `BBRegressor` models.
   - Saves the C++ model output tensors to `test/output/`.

2. **C++ Sample Generator (`generate_test_samples.cpp`)**
   - Generates a specified number of random input tensor sets for both the classifier and bounding box regressor.
   - Saves these input tensors into `test/input_samples/{classifier|bb_regressor}/sample_N/` and `test/input_samples/{classifier|bb_regressor}/test_N/` (for classifier test features).
   - This step is separated so that the Python comparison script can still run even if the C++ models fail during their execution phase.

3. **Python Comparison Script (`compare_models.py`)**
   - Loads the original Python models (using `DiMPTorchScriptWrapper`, which loads weights from `exported_weights/`).
   - Loads the input tensors generated by `generate_test_samples.cpp` from `test/input_samples/`.
   - Runs the Python models on these inputs to produce reference Python outputs.
   - Loads the C++ model output tensors from `test/output/`.
   - Performs a detailed, element-wise comparison between Python and C++ outputs.
   - Calculates various error metrics (MAE, Max Error, L2 norms, Cosine Similarity, Pearson Correlation, Mean Relative Error).
   - Generates an HTML report (`test/comparison/report.html`) summarizing the comparisons, including per-sample statistics and error distribution plots (saved in `test/comparison/plots/`).

4. **Automation Script (`run_full_comparison.sh`)**
   - Orchestrates the entire testing process:
     1. Builds the C++ project (including `test_models` and `generate_test_samples`).
     2. Runs `generate_test_samples` to create/update input data.
     3. Runs `test_models` to generate C++ outputs.
     4. Runs `compare_models.py` to perform the comparison and generate the report.
   - Accepts the number of samples as an argument.
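The error metrics listed above can be sketched in plain Python. This is a simplified illustration of the math, not the actual `compare_models.py` implementation, which operates on tensors:

```python
import math

def error_metrics(py_vals, cpp_vals):
    """Compute a few of the report's comparison metrics
    (MAE, max absolute error, cosine similarity, Pearson correlation)
    over two flat lists of floats."""
    assert len(py_vals) == len(cpp_vals) and py_vals
    n = len(py_vals)
    diffs = [abs(a - b) for a, b in zip(py_vals, cpp_vals)]
    mae = sum(diffs) / n
    max_err = max(diffs)
    # Cosine similarity of the two flattened vectors.
    dot = sum(a * b for a, b in zip(py_vals, cpp_vals))
    norm_py = math.sqrt(sum(a * a for a in py_vals))
    norm_cpp = math.sqrt(sum(b * b for b in cpp_vals))
    cosine = dot / (norm_py * norm_cpp) if norm_py and norm_cpp else float('nan')
    # Pearson correlation of the two vectors.
    mean_py, mean_cpp = sum(py_vals) / n, sum(cpp_vals) / n
    cov = sum((a - mean_py) * (b - mean_cpp) for a, b in zip(py_vals, cpp_vals))
    var_py = sum((a - mean_py) ** 2 for a in py_vals)
    var_cpp = sum((b - mean_cpp) ** 2 for b in cpp_vals)
    pearson = cov / math.sqrt(var_py * var_cpp) if var_py and var_cpp else float('nan')
    return {'mae': mae, 'max_err': max_err, 'cosine': cosine, 'pearson': pearson}
```

Identical outputs give `mae == 0` and `cosine == pearson == 1`; a systematic offset shows up in MAE but leaves Pearson correlation at 1, which is why the report includes several complementary metrics.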
## Directory Structure

```
test/
├── input_samples/ # Stores input tensors generated by C++
│   ├── classifier/
│   │   ├── sample_0/
│   │   │   ├── backbone_feat.pt
│   │   │   └── ... (other classifier train inputs)
│   │   └── test_0/
│   │       ├── test_feat.pt
│   │       └── ... (other classifier test inputs)
│   └── bb_regressor/
│       └── sample_0/
│           ├── feat_layer2.pt
│           ├── feat_layer3.pt
│           └── ... (other bb_regressor inputs)
├── output/ # Stores output tensors generated by C++ models
│   ├── classifier/
│   │   ├── sample_0/
│   │   │   └── clf_features.pt
│   │   └── test_0/
│   │       └── clf_feat_test.pt
│   └── bb_regressor/
│       └── sample_0/
│           ├── iou_pred.pt
│           └── ... (other bb_regressor outputs)
├── comparison/ # Stores comparison results
│   ├── report.html # Main HTML report
│   └── plots/ # Error distribution histograms
├── test_models.cpp # C++ program to run models and save outputs
├── generate_test_samples.cpp # C++ program to generate input samples
├── compare_models.py # Python script for comparison and report generation
├── run_full_comparison.sh # Main test execution script
└── README.md # This file
```
## How to Add a New Model for Comparison

Let's say you want to add a new model called `MyNewModel` with both C++ and Python implementations.
### 1. Export Python Model Weights

- Ensure your Python `MyNewModel` can have its weights saved in a format loadable by both Python (e.g., a `state_dict` or individual tensors) and C++ (LibTorch `torch::load`).
- Create a subdirectory `exported_weights/mynewmodel/` and save the weights there.
- Document the tensor names and their corresponding model parameters in a `mynewmodel_weights_doc.txt` file within that directory (see the existing `classifier_weights_doc.txt` or `bb_regressor_weights_doc.txt` for examples). This is crucial for `DiMPTorchScriptWrapper` when loading from individual tensors.
### 2. Update C++ Code

**`generate_test_samples.cpp`:**

- Add functions to generate realistic random input tensors for `MyNewModel`.
- Define the expected input tensor names and shapes.
- Modify the `main` function to:
  - Create a directory `test/input_samples/mynewmodel/sample_N/`.
  - Call your new input generation functions.
  - Save these input tensors (e.g., `my_input1.pt`, `my_input2.pt`) into the created directory using the `save_tensor` utility.
**`test_models.cpp`:**

- Include the header for your C++ `MyNewModel` (e.g., `cimp/mynewmodel/mynewmodel.h`).
- In the `main` function:
  - Add a section for `MyNewModel`.
  - Determine the absolute path to `exported_weights/mynewmodel/`.
  - Instantiate your C++ `MyNewModel`, passing the weights directory.
  - Loop through the number of samples:
    - Construct paths to the input tensors in `test/input_samples/mynewmodel/sample_N/`.
    - Load these input tensors using `load_tensor`. Ensure they are on the correct device (CPU/CUDA).
    - Call the relevant methods of your C++ `MyNewModel` (e.g., `myNewModel.predict(...)`).
    - Create an output directory `test/output/mynewmodel/sample_N/`.
    - Save your C++ model's output tensors (e.g., `my_output.pt`) to this directory using `save_tensor`. Remember to move outputs to CPU before saving if they are on CUDA.
**`CMakeLists.txt`:**

- If `MyNewModel` is a new static library (like `classifier` or `bb_regressor`), define its sources and add it as a library.
- Link `test_models` and `generate_test_samples` (if it needs the new library) against the `MyNewModel` library and any other dependencies (such as LibTorch).
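The CMake changes could look roughly like this. Target names and source paths are hypothetical; adapt them to how `classifier` and `bb_regressor` are already defined in the project's `CMakeLists.txt`:

```cmake
# Hypothetical sketch: adjust names/sources to match the existing targets.
add_library(mynewmodel STATIC
    cimp/mynewmodel/mynewmodel.cpp
)
target_link_libraries(mynewmodel PUBLIC ${TORCH_LIBRARIES})

# Link the test executables against the new library.
target_link_libraries(test_models PRIVATE mynewmodel)
target_link_libraries(generate_test_samples PRIVATE mynewmodel)
```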
### 3. Update the Python Comparison Script (`compare_models.py`)

**`ModelComparison.__init__` and `_init_models`:**

- If your Python `MyNewModel` needs to be loaded via `DiMPTorchScriptWrapper`, update the wrapper or add logic to load your model. You might need to add a new parameter like `mynewmodel_sd='mynewmodel'` to `DiMPTorchScriptWrapper` and handle its loading.
- Store the loaded Python `MyNewModel` instance (e.g., `self.models.mynewmodel`).

**Create a `compare_mynewmodel` method:**

- Create a new method, e.g., `def compare_mynewmodel(self):`.
- Print a starting message.
- Define input and C++ output directory paths: `Path('test') / 'input_samples' / 'mynewmodel'` and `Path('test') / 'output' / 'mynewmodel'`.
- Loop through `self.num_samples`:
  - Initialize `current_errors = {}` for the current sample.
  - Construct paths to input tensors for `MyNewModel` from `test/input_samples/mynewmodel/sample_N/`.
  - Load these tensors using `self.load_cpp_tensor()`.
  - Run the Python `MyNewModel` with these inputs to get `py_output_tensor`. Handle potential errors.
  - Construct paths to C++ output tensors from `test/output/mynewmodel/sample_N/`.
  - Load the C++ output tensor (`cpp_output_tensor`) using `self.load_cpp_tensor()`.
  - Call `self._compare_tensor_data(py_output_tensor, cpp_output_tensor, "MyNewModel Output Comparison Name", i, current_errors)`. Use a descriptive name.
  - If `MyNewModel` produces multiple distinct outputs, repeat the load and `_compare_tensor_data` calls for each.
  - Store the results: `if current_errors: self.all_errors_stats[f"MyNewModel_Sample_{i}"] = current_errors`.

**`ModelComparison.run_all_tests`:**

- Call your new `self.compare_mynewmodel()` method.
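The per-sample loop described above can be sketched generically. This is an illustration of the pattern, not the actual class method; `load_tensor`, `run_py_model`, and `compare` stand in for `self.load_cpp_tensor`, the Python model call, and `self._compare_tensor_data`:

```python
from pathlib import Path

def compare_model_samples(model_name, num_samples, load_tensor, run_py_model,
                          compare, all_errors_stats, root=Path('test')):
    """Generic per-sample comparison loop mirroring compare_mynewmodel."""
    in_dir = root / 'input_samples' / model_name
    out_dir = root / 'output' / model_name
    for i in range(num_samples):
        current_errors = {}
        py_input = load_tensor(in_dir / f'sample_{i}' / 'my_input1.pt')
        cpp_output = load_tensor(out_dir / f'sample_{i}' / 'my_output.pt')
        if py_input is None or cpp_output is None:
            continue  # missing files must not abort the remaining comparisons
        py_output = run_py_model(py_input)
        compare(py_output, cpp_output, f'{model_name} Output', i, current_errors)
        if current_errors:
            all_errors_stats[f'{model_name}_Sample_{i}'] = current_errors
    return all_errors_stats
```

Note the `continue` on missing tensors: one bad sample should degrade to an "N/A" entry rather than stop the whole run, matching the framework's error-handling philosophy.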
### 4. Run the Tests

- Execute `./test/run_full_comparison.sh <num_samples>`.
- Check the console output and `test/comparison/report.html` for the `MyNewModel` results.
## Key Considerations

- **Tensor Naming and Paths:** Be consistent with tensor filenames and directory structures. The Python script relies on these conventions to find the correct files.
- **Data Types and Devices:** Ensure tensors have compatible data types (usually `float32`) and are on the correct device (CPU/CUDA) before model inference and before saving/loading. C++ outputs are saved from CPU.
- **Error Handling:** Implement robust error handling in both C++ (e.g., for file loading and model errors) and Python (e.g., for tensor loading and Python model execution). The comparison script reports "N/A" for metrics when tensors are missing or shapes mismatch, allowing the other comparisons to proceed.
- **`DiMPTorchScriptWrapper`:** If your Python model structure differs from DiMP's `Classifier`/`BBRegressor`, you may need to adapt `DiMPTorchScriptWrapper` or write a custom loader if your model is not already a `torch.jit.ScriptModule`. The current wrapper supports loading from a directory of named tensor files based on a documentation text file.
- **`load_cpp_tensor` in Python:** This utility in `compare_models.py` attempts to robustly load tensors saved by LibTorch (which sometimes arrive wrapped as a `RecursiveScriptModule`). If you have trouble loading your C++-saved tensors, inspect their structure and adapt this function as needed. The C++ `save_tensor` function aims to save plain tensors.
By following these steps, you can integrate new models into this testing framework to validate their C++ implementations.