Published February 20, 2025 | Version 1.0.0
Dataset · Open Access

Decoding Wayfinding: Analyzing Wayfinding Processes in the Outdoor Environment

  • TU Wien

Description

How To Cite?

Alinaghi, N., Giannopoulos, I., Kattenbeck, M., & Raubal, M. (2025). Decoding wayfinding: analyzing wayfinding processes in the outdoor environment. International Journal of Geographical Information Science, 1–31. https://doi.org/10.1080/13658816.2025.2473599

Link to the paper: https://www.tandfonline.com/doi/full/10.1080/13658816.2025.2473599

 

Folder Structure

The folder named “submission” contains the following:

  1. “pythonProject”: This folder contains all the Python files and subfolders needed for analysis.
  2. ijgis.yml: This file lists all the Python libraries and dependencies required to run the code.

Setting Up the Environment

  1. Use the ijgis.yml file to create a Python project and environment. Ensure you activate the environment before running the code.
  2. The pythonProject folder contains several .py files and subfolders, each with specific functionality as described below.

Subfolders

1. Data_4_IJGIS

  • This folder contains the data used for the results reported in the paper.
  • Note: The data analysis described in the paper begins with the synchronization and cleaning of the recorded raw data; the published data is already synchronized and cleaned. This directory contains both the cleaned files and the merged files with features extracted from them. If you want to perform the segmentation and feature extraction yourself, run the respective Python files; otherwise, use the “merged_…csv” files directly as input for training.

2. results_[DateTime] (e.g., results_20240906_15_00_13)

  • This folder will be generated when you run the code and will store the output of each step.
  • The current folder contains results created during code debugging for the submission.
  • When you run the code, a new folder with fresh results will be generated.

Python Files

1. helper_functions.py

  • Contains reusable functions used throughout the analysis.
  • Each function includes a description of its purpose and the input parameters required.

2. create_sanity_plots.py

  • Generates scatter plots like those in Figure 3 of the paper.
  • The code was originally run on all 309 trials, but it can also be used to check the sample data provided.
  • Output: A .png file for each column of the raw gaze and IMU recordings, color-coded with logged events.
  • Usage: Run this file to create visualizations similar to Figure 3.
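As an illustration of what such a sanity plot can look like, here is a minimal matplotlib sketch with toy data; the column names and event encoding are placeholders, not the project's real schema:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# toy gaze-like signal with a logged-event column (illustrative, not the real schema)
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 200)
gaze_x = np.sin(t) + 0.1 * rng.standard_normal(200)
event = (t >= 5).astype(int)  # two logged events, used for color-coding

fig, ax = plt.subplots()
ax.scatter(t, gaze_x, c=event, cmap="viridis", s=8)
ax.set_xlabel("time [s]")
ax.set_ylabel("gaze_x")

out_png = os.path.join(tempfile.gettempdir(), "sanity_gaze_x.png")
fig.savefig(out_png)
plt.close(fig)
```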

3. overlapping_sliding_window_loop.py

  • Implements overlapping sliding window segmentation and generates plots like those in Figure 4.
  • Output:
    • Two new subfolders, “Gaze” and “IMU”, will be added to the Data_4_IJGIS folder.
    • Segmented files (default: 2–10 seconds with a 1-second step size) will be saved as .csv files.
    • A visualization of the segments, similar to Figure 4, will be automatically generated.
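The sliding-window idea can be sketched as follows; the timestamp column name and parameters are illustrative, and the actual script's I/O differs:

```python
import numpy as np
import pandas as pd

def sliding_windows(df, window_s, step_s=1.0, time_col="timestamp"):
    """Yield (start_time, segment) pairs: segments of length window_s seconds,
    advancing by step_s seconds (overlapping whenever step_s < window_s)."""
    t_min, t_max = df[time_col].min(), df[time_col].max()
    start = t_min
    while start + window_s <= t_max:
        mask = (df[time_col] >= start) & (df[time_col] < start + window_s)
        yield start, df[mask]
        start += step_s

# toy signal sampled at 10 Hz for 12 s
ts = np.arange(0, 12, 0.1)
df = pd.DataFrame({"timestamp": ts, "x": np.sin(ts)})

# 2 s windows with a 1 s step -> window starts at 0, 1, ..., 9
segments = list(sliding_windows(df, window_s=2.0))
```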

4. gaze_features.py & imu_features.py (Note: the IDT function implementation in gaze_features.py was updated on 19.03.2025.)

  • These files compute features as explained in Tables 1 and 2 of the paper, respectively.
  • They process the segmented recordings generated by the overlapping_sliding_window_loop.py.
  • Usage: To see how the features are calculated, run these files after the sliding-window segmentation; they compute the features from the segmented data.
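For readers unfamiliar with the dispersion-threshold idea behind the IDT function, here is a generic I-DT sketch; the thresholds and data layout are illustrative and are not the values used in gaze_features.py:

```python
import numpy as np

def idt_fixations(x, y, t, dispersion_thresh=1.0, min_duration=0.1):
    """Generic I-DT: find index ranges where gaze dispersion
    (x-range + y-range) stays below a threshold for at least min_duration."""
    def disp(a, b):
        return (x[a:b + 1].max() - x[a:b + 1].min()
                + y[a:b + 1].max() - y[a:b + 1].min())

    fixations = []
    i, n = 0, len(t)
    while i < n:
        # grow an initial window spanning at least min_duration
        j = i
        while j < n and t[j] - t[i] < min_duration:
            j += 1
        if j >= n:
            break
        if disp(i, j) <= dispersion_thresh:
            # extend the window while dispersion stays under the threshold
            while j < n and disp(i, j) <= dispersion_thresh:
                j += 1
            fixations.append((i, j - 1))
            i = j
        else:
            i += 1
    return fixations

# synthetic gaze at 100 Hz: 0.5 s fixation, a jump, then another 0.5 s fixation
t = np.arange(100) * 0.01
x = np.zeros(100)
x[50:] = 10.0
y = np.zeros(100)
fix = idt_fixations(x, y, t)  # -> [(0, 49), (50, 99)]
```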

5. training_prediction.py

  • This file contains the main machine learning analysis of the paper: all the code for training the model, evaluating it, and using it for inference on the “monitoring part”. It covers the following steps:
a. Data Preparation (corresponding to Section 5.1.1 of the paper)
  • Prepares the data according to the research question (RQ) described in the paper. Since this data was collected with several RQs in mind, we remove parts of the data that are not related to the RQ of this paper.
  • A function named plot_labels_comparison(df, save_path, x_label_freq=10, figsize=(15, 5)) in line 116 visualizes the data preparation results. Because this visualization is not used in the paper, the call is commented out; uncomment it if you want to see visually what has been changed compared to the original data.
b. Training/Validation/Test Split
  • Splits the data for machine learning experiments (an explanation can be found in Section 5.1.1. Preparation of data for training and inference of the paper).
  • Make sure that you follow the instructions in the code comments exactly.
  • Output: The split data is saved as .csv files in the results folder.
c. Machine and Deep Learning Experiments

This part contains three main code blocks:

  • MLP Network (Commented Out): This code was used for classification with the MLP network, and the results shown in Table 3 are from this code. If you wish to use this model, uncomment this block and comment out the following blocks accordingly.
  • XGBoost without Hyperparameter Tuning: If you want to run the code but do not want to spend time on the full training with hyperparameter tuning (as was done for the paper), just uncomment this part. This will give you a simple, untuned model with which you can achieve at least some results.
  • XGBoost with Hyperparameter Tuning: If you want to train the model the way we trained it for the analysis reported in the paper, use this block (the plots in Figure 7 are from this block). We ran this block with different feature sets and different segmentation files and created a simple bar chart from the saved results, shown in Figure 6.

Note: Please read the instructions for each block carefully to ensure that the code works smoothly. Regardless of which block you use, you will get classification results (in the form of scores) for unseen data. The way we empirically calculated the model's confidence threshold (explained in Section 5.2 of the paper, Part II: Decoding surveillance by sequence analysis) is given in lines 361 to 380 of this block.
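As a rough illustration of the tuned-model workflow, here is a minimal sketch that uses scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost, with a hypothetical parameter grid rather than the authors' settings:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# toy data standing in for the extracted gaze/IMU features
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},  # hypothetical grid
    cv=3,
    scoring="f1",
)
grid.fit(X_tr, y_tr)
score = grid.score(X_te, y_te)  # classification score on unseen data
```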

d. Inference (Monitoring Part)
  • Final inference is performed using the monitoring data. This step produces a .csv file containing inferred labels.
  • Figure 8 in the paper is generated using this part of the code.
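A minimal sketch of this inference step, using a stand-in classifier and an illustrative 0.6 confidence threshold; the real code reuses the model trained in the previous step and the empirically derived threshold:

```python
import os
import tempfile

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# stand-in model and monitoring data (illustrative only)
rng = np.random.RandomState(0)
X = rng.randn(60, 4)
y = (X[:, 0] > 0).astype(int)
clf = LogisticRegression().fit(X, y)

proba = clf.predict_proba(X)
out = pd.DataFrame({
    "label": clf.predict(X),
    "confidence": proba.max(axis=1),
})
# mark low-confidence inferences as unknown (-1); 0.6 is an illustrative threshold
out.loc[out["confidence"] < 0.6, "label"] = -1

csv_path = os.path.join(tempfile.gettempdir(), "inferred_labels.csv")
out.to_csv(csv_path, index=False)
```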

6. sequence_analysis.py

  • Performs analysis on the inferred data, producing Figures 9 and 10 from the paper.
  • This file reads the inferred data from the previous step and performs sequence analysis as described in Sections 5.2.1 and 5.2.2.
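To give a flavour of what sequence analysis on the inferred labels can look like, here is a tiny run-length example with hypothetical label names; the analysis in Sections 5.2.1 and 5.2.2 of the paper is more involved:

```python
from itertools import groupby

import pandas as pd

# hypothetical inferred labels as read from the previous step's .csv output
labels = pd.Series(["walk", "walk", "look", "look", "look", "walk"])

# collapse consecutive repeats into (label, run_length) pairs
runs = [(lab, sum(1 for _ in grp)) for lab, grp in groupby(labels)]
# runs == [("walk", 2), ("look", 3), ("walk", 1)]
```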

Licenses

The data is licensed under CC-BY; the code is licensed under MIT.

Files (10.8 GiB)

readme.pdf

  • 12.2 KiB — md5:e9ffefc4da4c6ad9477ff62d6981abb6
  • 56.5 KiB — md5:208568e24deaa3e759b85d0a0a25b288
  • 10.8 GiB — md5:412d038386bf06b56d5acaceb8784449