# C++ Tracker Model Testing Framework

This directory contains the testing framework for comparing C++ implementations of PyTorch models against their original Python counterparts.

## Overview

The primary goal is to ensure that the C++ models (`cimp` project) produce results that are numerically close to the Python models from the `pytracking` toolkit, given the same inputs and model weights.

The framework consists of:

1. **C++ Test Program (`test_models.cpp`)**:
    * Loads pre-trained model weights (from `exported_weights/`).
    * Takes randomly generated input tensors (pre-generated by `generate_test_samples.cpp` and saved in `test/input_samples/`).
    * Runs the C++ `Classifier` and `BBRegressor` models.
    * Saves the C++ model output tensors to `test/output/`.
2. **C++ Sample Generator (`generate_test_samples.cpp`)**:
    * Generates a specified number of random input tensor sets for both the classifier and the bounding box regressor.
    * Saves these input tensors into `test/input_samples/{classifier|bb_regressor}/sample_N/` and, for classifier test features, `test/input_samples/classifier/test_N/`.
    * This step is kept separate so that the Python comparison script can still run even if the C++ models fail during execution.
3. **Python Comparison Script (`compare_models.py`)**:
    * Loads the original Python models (via `DiMPTorchScriptWrapper`, which loads weights from `exported_weights/`).
    * Loads the input tensors generated by `generate_test_samples.cpp` from `test/input_samples/`.
    * Runs the Python models on these inputs to produce reference Python outputs.
    * Loads the C++ model output tensors from `test/output/`.
    * Performs a detailed, element-wise comparison between Python and C++ outputs.
    * Calculates error metrics (MAE, max error, L2 norms, cosine similarity, Pearson correlation, mean relative error); a sketch follows this list.
    * Generates an HTML report (`test/comparison/report.html`) summarizing the comparisons, including per-sample statistics and error distribution plots (saved in `test/comparison/plots/`).
4. **Automation Script (`run_full_comparison.sh`)**:
    * Orchestrates the entire testing process:
        1. Builds the C++ project (including `test_models` and `generate_test_samples`).
        2. Runs `generate_test_samples` to create/update input data.
        3. Runs `test_models` to generate C++ outputs.
        4. Runs `compare_models.py` to perform the comparison and generate the report.
    * Accepts the number of samples as an argument.
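For reference, the per-tensor metrics listed above can be computed along the following lines. This is a minimal sketch, not the actual implementation in `compare_models.py`; the function name and the flatten-then-compare approach are illustrative assumptions:

```python
import torch

def error_metrics(py_t: torch.Tensor, cpp_t: torch.Tensor) -> dict:
    """Illustrative versions of the metrics reported in the HTML report."""
    a = py_t.detach().float().flatten()
    b = cpp_t.detach().float().flatten()
    diff = a - b
    eps = 1e-8  # guards the relative error against division by zero
    return {
        "mae": diff.abs().mean().item(),
        "max_error": diff.abs().max().item(),
        "l2_python": a.norm().item(),
        "l2_cpp": b.norm().item(),
        "cosine_similarity": torch.nn.functional.cosine_similarity(
            a.unsqueeze(0), b.unsqueeze(0)).item(),
        "pearson": torch.corrcoef(torch.stack([a, b]))[0, 1].item(),
        "mean_relative_error": (diff.abs() / (a.abs() + eps)).mean().item(),
    }
```

Flattening both tensors keeps every metric shape-agnostic; a real implementation should check for shape mismatches first (the comparison script reports "N/A" in that case, as noted under Key Considerations below).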
## Directory Structure

```
test/
├── input_samples/              # Input tensors generated by C++
│   ├── classifier/
│   │   ├── sample_0/
│   │   │   ├── backbone_feat.pt
│   │   │   └── ...             (other classifier train inputs)
│   │   └── test_0/
│   │       ├── test_feat.pt
│   │       └── ...             (other classifier test inputs)
│   └── bb_regressor/
│       └── sample_0/
│           ├── feat_layer2.pt
│           ├── feat_layer3.pt
│           └── ...             (other bb_regressor inputs)
├── output/                     # Output tensors generated by C++ models
│   ├── classifier/
│   │   ├── sample_0/
│   │   │   └── clf_features.pt
│   │   └── test_0/
│   │       └── clf_feat_test.pt
│   └── bb_regressor/
│       └── sample_0/
│           ├── iou_pred.pt
│           └── ...             (other bb_regressor outputs)
├── comparison/                 # Comparison results
│   ├── report.html             # Main HTML report
│   └── plots/                  # Error distribution histograms
├── test_models.cpp             # C++ program to run models and save outputs
├── generate_test_samples.cpp   # C++ program to generate input samples
├── compare_models.py           # Python script for comparison and report generation
├── run_full_comparison.sh      # Main test execution script
└── README.md                   # This file
```

## How to Add a New Model for Comparison

Suppose you want to add a new model called `MyNewModel` with both C++ and Python implementations.

**1. Export Python Model Weights:**

* Ensure your Python `MyNewModel` can have its weights saved in a format loadable by both Python (e.g., a `state_dict` or individual tensors) and C++ (LibTorch's `torch::load`).
* Create a subdirectory `exported_weights/mynewmodel/` and save the weights there.
* Document the tensor names and their corresponding model parameters in a `mynewmodel_weights_doc.txt` file within that directory (see the existing `classifier_weights_doc.txt` or `bb_regressor_weights_doc.txt` for examples). This is crucial for `DiMPTorchScriptWrapper` when loading from individual tensors. A sketch of such an export follows this step.
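As referenced in step 1, the export could look roughly like this. It is a minimal sketch under stated assumptions: `export_weights`, `_TensorHolder`, the underscore-based file naming, and the doc-file line format are all illustrative, and wrapping each tensor in a tiny TorchScript module (readable by both `torch.jit.load` in Python and `torch::jit::load` in C++) is one common interop pattern, not necessarily what the existing exporters use:

```python
import os
import torch

class _TensorHolder(torch.nn.Module):
    """Tiny TorchScript-able module whose only state is one tensor."""
    def __init__(self, t: torch.Tensor):
        super().__init__()
        self.register_buffer("value", t)

    def forward(self) -> torch.Tensor:
        return self.value

def export_weights(model: torch.nn.Module, out_dir: str) -> None:
    """Save every parameter/buffer as an individual .pt file and record
    the file -> parameter-name mapping in a documentation text file."""
    os.makedirs(out_dir, exist_ok=True)
    doc_lines = []
    for name, tensor in model.state_dict().items():
        fname = name.replace(".", "_") + ".pt"
        holder = torch.jit.script(_TensorHolder(tensor.detach().cpu()))
        holder.save(os.path.join(out_dir, fname))
        doc_lines.append(f"{fname}: {name} {tuple(tensor.shape)}")
    with open(os.path.join(out_dir, "mynewmodel_weights_doc.txt"), "w") as f:
        f.write("\n".join(doc_lines) + "\n")
```

On the C++ side, a file saved this way can be opened with `torch::jit::load`, and the tensor retrieved from the loaded module (e.g., via its `value` attribute); verify this matches what your C++ loader actually expects.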
**2. Update C++ Code:**

* **`generate_test_samples.cpp`**:
    * Add functions to generate realistic random input tensors for `MyNewModel`.
    * Define the expected input tensor names and shapes.
    * Modify the `main` function to:
        * Create a directory `test/input_samples/mynewmodel/sample_N/`.
        * Call your new input-generation functions.
        * Save these input tensors (e.g., `my_input1.pt`, `my_input2.pt`) into the created directory using the `save_tensor` utility.
* **`test_models.cpp`**:
    * Include the header for your C++ `MyNewModel` (e.g., `cimp/mynewmodel/mynewmodel.h`).
    * In the `main` function:
        * Add a section for `MyNewModel`.
        * Determine the absolute path to `exported_weights/mynewmodel/`.
        * Instantiate your C++ `MyNewModel`, passing the weights directory.
        * Loop through the number of samples:
            * Construct paths to the input tensors in `test/input_samples/mynewmodel/sample_N/`.
            * Load these input tensors using `load_tensor`, ensuring they are on the correct device (CPU/CUDA).
            * Call the relevant methods of your C++ `MyNewModel` (e.g., `myNewModel.predict(...)`).
            * Create an output directory `test/output/mynewmodel/sample_N/`.
            * Save the output tensors from your C++ model (e.g., `my_output.pt`) to this directory using `save_tensor`. Remember to move outputs to CPU before saving if they are on CUDA.
* **`CMakeLists.txt`**:
    * If `MyNewModel` is a new static library (like `classifier` or `bb_regressor`), define its sources and add it as a library.
    * Link `test_models` (and `generate_test_samples`, if it needs the new library) against the `MyNewModel` library and any other dependencies (such as LibTorch).

**3. Update Python Comparison Script (`compare_models.py`):**

* **`ModelComparison.__init__` & `_init_models`**:
    * If your Python `MyNewModel` needs to be loaded via `DiMPTorchScriptWrapper`, update the wrapper or add logic to load your model. You may need to add a new parameter such as `mynewmodel_sd='mynewmodel'` to `DiMPTorchScriptWrapper` and handle its loading.
    * Store the loaded Python `MyNewModel` instance (e.g., `self.models.mynewmodel`).
* **Create a `compare_mynewmodel` method**:
    * Add a new method, e.g., `def compare_mynewmodel(self):`.
    * Print a starting message.
    * Define the input and C++ output directory paths: `Path('test') / 'input_samples' / 'mynewmodel'` and `Path('test') / 'output' / 'mynewmodel'`.
    * Loop through `self.num_samples`:
        * Initialize `current_errors = {}` for the current sample.
        * Construct paths to the input tensors for `MyNewModel` in `test/input_samples/mynewmodel/sample_N/`.
        * Load these tensors using `self.load_cpp_tensor()`.
        * Run the Python `MyNewModel` on these inputs to get `py_output_tensor`, handling potential errors.
        * Construct paths to the C++ output tensors in `test/output/mynewmodel/sample_N/`.
        * Load the C++ output tensor (`cpp_output_tensor`) using `self.load_cpp_tensor()`.
        * Call `self._compare_tensor_data(py_output_tensor, cpp_output_tensor, "MyNewModel Output Comparison Name", i, current_errors)`, using a descriptive name.
        * If `MyNewModel` produces multiple distinct outputs, repeat the load and `_compare_tensor_data` calls for each.
        * Store the results: `if current_errors: self.all_errors_stats[f"MyNewModel_Sample_{i}"] = current_errors`.
* **`ModelComparison.run_all_tests`**:
    * Call your new `self.compare_mynewmodel()` method.

**4. Run the Tests:**

* Execute `./test/run_full_comparison.sh <num_samples>`.
* Check the console output and `test/comparison/report.html` for the results of `MyNewModel`.

## Key Considerations

* **Tensor naming and paths:** Be consistent with tensor filenames and directory structure; the Python script relies on these conventions to find the correct files.
* **Data types and devices:** Ensure tensors have compatible data types (usually `float32`) and are on the correct device (CPU/CUDA) before model inference and before saving/loading. C++ outputs are saved from CPU.
* **Error handling:** Implement robust error handling in both C++ (e.g., file loading, model errors) and Python (e.g., tensor loading, Python model execution). The comparison script is designed to report "N/A" for metrics when tensors are missing or shapes mismatch, allowing the remaining comparisons to proceed.
* **`DiMPTorchScriptWrapper`:** If your Python model's structure differs from DiMP's Classifier/BBRegressor, you may need to adapt `DiMPTorchScriptWrapper` or write a custom loader, particularly if your model is not already a `torch.jit.ScriptModule`. The current wrapper supports loading from a directory of named tensor files based on a documentation text file.
* **`load_cpp_tensor` in Python:** This utility in `compare_models.py` attempts to robustly load tensors saved by LibTorch (which sometimes arrive wrapped as a `RecursiveScriptModule`). If you have trouble loading your C++-saved tensors, inspect their structure and adapt this function as needed; a sketch of the fallback approach appears at the end of this README. The C++ `save_tensor` function aims to save plain tensors.

By following these steps, you can integrate new models into this testing framework to validate their C++ implementations.
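Finally, as a concrete reference for the `load_cpp_tensor` consideration above, the fallback logic might look like the sketch below. This is hypothetical, not the actual function in `compare_models.py`; in particular, probing the loaded module's parameters and buffers is an assumption about how a LibTorch-saved tensor may surface inside a `RecursiveScriptModule`:

```python
import torch

def load_cpp_tensor(path: str, device: str = "cpu") -> torch.Tensor:
    """Load a tensor saved by LibTorch, tolerating TorchScript wrapping."""
    try:
        obj = torch.load(path, map_location=device)
        if isinstance(obj, torch.Tensor):
            return obj
    except Exception:
        pass  # likely a TorchScript archive; fall back to torch.jit.load
    module = torch.jit.load(path, map_location=device)
    # A tensor saved via LibTorch may surface as a parameter or a buffer
    # of the resulting RecursiveScriptModule.
    tensors = list(module.parameters()) + list(module.buffers())
    if tensors:
        return tensors[0]
    raise ValueError(f"No tensor found in {path}")
```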