# C++ Tracker Model Testing Framework

This directory contains the testing framework for comparing C++ implementations of PyTorch models against their original Python counterparts.

## Overview

The primary goal is to ensure that the C++ models (`cimp` project) produce results that are numerically close to the Python models from the `pytracking` toolkit, given the same inputs and model weights.

The framework consists of:

1. **C++ Test Program (`test_models.cpp`)**:
    * Loads pre-trained model weights (from `exported_weights/`).
    * Takes randomly generated input tensors (pre-generated by `generate_test_samples.cpp` and saved in `test/input_samples/`).
    * Runs the C++ `Classifier` and `BBRegressor` models.
    * Saves the C++ model output tensors to `test/output/`.
2. **C++ Sample Generator (`generate_test_samples.cpp`)**:
    * Generates a specified number of random input tensor sets for both the classifier and the bounding box regressor.
    * Saves these input tensors into `test/input_samples/{classifier|bb_regressor}/sample_N/` and, for classifier test features, `test/input_samples/classifier/test_N/`.
    * This step is kept separate so that the Python comparison script can still run even if the C++ models fail during execution.
3. **Python Comparison Script (`compare_models.py`)**:
    * Loads the original Python models (via `DiMPTorchScriptWrapper`, which loads weights from `exported_weights/`).
    * Loads the input tensors generated by `generate_test_samples.cpp` from `test/input_samples/`.
    * Runs the Python models on these inputs to produce reference Python outputs.
    * Loads the C++ model output tensors from `test/output/`.
    * Performs a detailed, element-wise comparison between Python and C++ outputs.
    * Calculates error metrics (MAE, max error, L2 norms, cosine similarity, Pearson correlation, mean relative error); a sketch follows this list.
    * Generates an HTML report (`test/comparison/report.html`) summarizing the comparisons, including per-sample statistics and error distribution plots (saved in `test/comparison/plots/`).
4. **Automation Script (`run_full_comparison.sh`)**:
    * Orchestrates the entire testing process:
        1. Builds the C++ project (including `test_models` and `generate_test_samples`).
        2. Runs `generate_test_samples` to create/update input data.
        3. Runs `test_models` to generate C++ outputs.
        4. Runs `compare_models.py` to perform the comparison and generate the report.
    * Accepts the number of samples as an argument.
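For reference, the per-tensor metrics listed above can be computed along the following lines. This is a minimal sketch, not the actual implementation in `compare_models.py`; the function name and the flatten-then-compare approach are illustrative assumptions:

```python
import torch

def error_metrics(py_t: torch.Tensor, cpp_t: torch.Tensor) -> dict:
    """Illustrative versions of the metrics reported in the HTML report."""
    a = py_t.detach().float().flatten()
    b = cpp_t.detach().float().flatten()
    diff = a - b
    eps = 1e-8  # guards the relative error against division by zero
    return {
        "mae": diff.abs().mean().item(),
        "max_error": diff.abs().max().item(),
        "l2_python": a.norm().item(),
        "l2_cpp": b.norm().item(),
        "cosine_similarity": torch.nn.functional.cosine_similarity(
            a.unsqueeze(0), b.unsqueeze(0)).item(),
        "pearson": torch.corrcoef(torch.stack([a, b]))[0, 1].item(),
        "mean_relative_error": (diff.abs() / (a.abs() + eps)).mean().item(),
    }
```

Flattening both tensors keeps every metric shape-agnostic; a real implementation should check for shape mismatches first (the comparison script reports "N/A" in that case, as noted under Key Considerations below).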
## Directory Structure

```
test/
├── input_samples/              # Input tensors generated by C++
│   ├── classifier/
│   │   ├── sample_0/
│   │   │   ├── backbone_feat.pt
│   │   │   └── ...             (other classifier train inputs)
│   │   └── test_0/
│   │       ├── test_feat.pt
│   │       └── ...             (other classifier test inputs)
│   └── bb_regressor/
│       └── sample_0/
│           ├── feat_layer2.pt
│           ├── feat_layer3.pt
│           └── ...             (other bb_regressor inputs)
├── output/                     # Output tensors generated by C++ models
│   ├── classifier/
│   │   ├── sample_0/
│   │   │   └── clf_features.pt
│   │   └── test_0/
│   │       └── clf_feat_test.pt
│   └── bb_regressor/
│       └── sample_0/
│           ├── iou_pred.pt
│           └── ...             (other bb_regressor outputs)
├── comparison/                 # Comparison results
│   ├── report.html             # Main HTML report
│   └── plots/                  # Error distribution histograms
├── test_models.cpp             # C++ program to run models and save outputs
├── generate_test_samples.cpp   # C++ program to generate input samples
├── compare_models.py           # Python script for comparison and report generation
├── run_full_comparison.sh      # Main test execution script
└── README.md                   # This file
```

## How to Add a New Model for Comparison

Suppose you want to add a new model called `MyNewModel` with both C++ and Python implementations.

**1. Export Python Model Weights:**

* Ensure your Python `MyNewModel` can have its weights saved in a format loadable by both Python (e.g., a `state_dict` or individual tensors) and C++ (LibTorch's `torch::load`).
* Create a subdirectory `exported_weights/mynewmodel/` and save the weights there.
* Document the tensor names and their corresponding model parameters in a `mynewmodel_weights_doc.txt` file within that directory (see the existing `classifier_weights_doc.txt` or `bb_regressor_weights_doc.txt` for examples). This is crucial for `DiMPTorchScriptWrapper` when loading from individual tensors. A sketch of such an export follows this step.
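As referenced in step 1, the export could look roughly like this. It is a minimal sketch under stated assumptions: `export_weights`, `_TensorHolder`, the underscore-based file naming, and the doc-file line format are all illustrative, and wrapping each tensor in a tiny TorchScript module (readable by both `torch.jit.load` in Python and `torch::jit::load` in C++) is one common interop pattern, not necessarily what the existing exporters use:

```python
import os
import torch

class _TensorHolder(torch.nn.Module):
    """Tiny TorchScript-able module whose only state is one tensor."""
    def __init__(self, t: torch.Tensor):
        super().__init__()
        self.register_buffer("value", t)

    def forward(self) -> torch.Tensor:
        return self.value

def export_weights(model: torch.nn.Module, out_dir: str) -> None:
    """Save every parameter/buffer as an individual .pt file and record
    the file -> parameter-name mapping in a documentation text file."""
    os.makedirs(out_dir, exist_ok=True)
    doc_lines = []
    for name, tensor in model.state_dict().items():
        fname = name.replace(".", "_") + ".pt"
        holder = torch.jit.script(_TensorHolder(tensor.detach().cpu()))
        holder.save(os.path.join(out_dir, fname))
        doc_lines.append(f"{fname}: {name} {tuple(tensor.shape)}")
    with open(os.path.join(out_dir, "mynewmodel_weights_doc.txt"), "w") as f:
        f.write("\n".join(doc_lines) + "\n")
```

On the C++ side, a file saved this way can be opened with `torch::jit::load`, and the tensor retrieved from the loaded module (e.g., via its `value` attribute); verify this matches what your C++ loader actually expects.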
**2. Update C++ Code:**

* **`generate_test_samples.cpp`**:
    * Add functions to generate realistic random input tensors for `MyNewModel`.
    * Define the expected input tensor names and shapes.
    * Modify the `main` function to:
        * Create a directory `test/input_samples/mynewmodel/sample_N/`.
        * Call your new input-generation functions.
        * Save these input tensors (e.g., `my_input1.pt`, `my_input2.pt`) into the created directory using the `save_tensor` utility.
* **`test_models.cpp`**:
    * Include the header for your C++ `MyNewModel` (e.g., `cimp/mynewmodel/mynewmodel.h`).
    * In the `main` function:
        * Add a section for `MyNewModel`.
        * Determine the absolute path to `exported_weights/mynewmodel/`.
        * Instantiate your C++ `MyNewModel`, passing the weights directory.
        * Loop through the number of samples:
            * Construct paths to the input tensors in `test/input_samples/mynewmodel/sample_N/`.
            * Load these input tensors using `load_tensor`, ensuring they are on the correct device (CPU/CUDA).
            * Call the relevant methods of your C++ `MyNewModel` (e.g., `myNewModel.predict(...)`).
            * Create an output directory `test/output/mynewmodel/sample_N/`.
            * Save the output tensors from your C++ model (e.g., `my_output.pt`) to this directory using `save_tensor`. Remember to move outputs to CPU before saving if they are on CUDA.
* **`CMakeLists.txt`**:
    * If `MyNewModel` is a new static library (like `classifier` or `bb_regressor`), define its sources and add it as a library.
    * Link `test_models` (and `generate_test_samples`, if it needs the new library) against the `MyNewModel` library and any other dependencies (such as LibTorch).

**3. Update Python Comparison Script (`compare_models.py`):**

* **`ModelComparison.__init__` & `_init_models`**:
    * If your Python `MyNewModel` needs to be loaded via `DiMPTorchScriptWrapper`, update the wrapper or add logic to load your model. You may need to add a new parameter such as `mynewmodel_sd='mynewmodel'` to `DiMPTorchScriptWrapper` and handle its loading.
    * Store the loaded Python `MyNewModel` instance (e.g., `self.models.mynewmodel`).
* **Create a `compare_mynewmodel` method**:
    * Add a new method, e.g., `def compare_mynewmodel(self):`.
    * Print a starting message.
    * Define the input and C++ output directory paths: `Path('test') / 'input_samples' / 'mynewmodel'` and `Path('test') / 'output' / 'mynewmodel'`.
    * Loop through `self.num_samples`:
        * Initialize `current_errors = {}` for the current sample.
        * Construct paths to the input tensors for `MyNewModel` in `test/input_samples/mynewmodel/sample_N/`.
        * Load these tensors using `self.load_cpp_tensor()`.
        * Run the Python `MyNewModel` on these inputs to get `py_output_tensor`, handling potential errors.
        * Construct paths to the C++ output tensors in `test/output/mynewmodel/sample_N/`.
        * Load the C++ output tensor (`cpp_output_tensor`) using `self.load_cpp_tensor()`.
        * Call `self._compare_tensor_data(py_output_tensor, cpp_output_tensor, "MyNewModel Output Comparison Name", i, current_errors)`, using a descriptive name.
        * If `MyNewModel` produces multiple distinct outputs, repeat the load and `_compare_tensor_data` calls for each.
        * Store the results: `if current_errors: self.all_errors_stats[f"MyNewModel_Sample_{i}"] = current_errors`.
* **`ModelComparison.run_all_tests`**:
    * Call your new `self.compare_mynewmodel()` method.

**4. Run the Tests:**

* Execute `./test/run_full_comparison.sh <num_samples>`.
* Check the console output and `test/comparison/report.html` for the results of `MyNewModel`.

## Key Considerations

* **Tensor naming and paths:** Be consistent with tensor filenames and directory structure; the Python script relies on these conventions to find the correct files.
* **Data types and devices:** Ensure tensors have compatible data types (usually `float32`) and are on the correct device (CPU/CUDA) before model inference and before saving/loading. C++ outputs are saved from CPU.
* **Error handling:** Implement robust error handling in both C++ (e.g., file loading, model errors) and Python (e.g., tensor loading, Python model execution). The comparison script is designed to report "N/A" for metrics when tensors are missing or shapes mismatch, allowing the remaining comparisons to proceed.
* **`DiMPTorchScriptWrapper`:** If your Python model's structure differs from DiMP's Classifier/BBRegressor, you may need to adapt `DiMPTorchScriptWrapper` or write a custom loader, particularly if your model is not already a `torch.jit.ScriptModule`. The current wrapper supports loading from a directory of named tensor files based on a documentation text file.
* **`load_cpp_tensor` in Python:** This utility in `compare_models.py` attempts to robustly load tensors saved by LibTorch (which sometimes arrive wrapped as a `RecursiveScriptModule`). If you have trouble loading your C++-saved tensors, inspect their structure and adapt this function as needed; a sketch of the fallback approach appears at the end of this README. The C++ `save_tensor` function aims to save plain tensors.

By following these steps, you can integrate new models into this testing framework to validate their C++ implementations.
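Finally, as a concrete reference for the `load_cpp_tensor` consideration above, the fallback logic might look like the sketch below. This is hypothetical, not the actual function in `compare_models.py`; in particular, probing the loaded module's parameters and buffers is an assumption about how a LibTorch-saved tensor may surface inside a `RecursiveScriptModule`:

```python
import torch

def load_cpp_tensor(path: str, device: str = "cpu") -> torch.Tensor:
    """Load a tensor saved by LibTorch, tolerating TorchScript wrapping."""
    try:
        obj = torch.load(path, map_location=device)
        if isinstance(obj, torch.Tensor):
            return obj
    except Exception:
        pass  # likely a TorchScript archive; fall back to torch.jit.load
    module = torch.jit.load(path, map_location=device)
    # A tensor saved via LibTorch may surface as a parameter or a buffer
    # of the resulting RecursiveScriptModule.
    tensors = list(module.parameters()) + list(module.buffers())
    if tensors:
        return tensors[0]
    raise ValueError(f"No tensor found in {path}")
```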