| |
Last updated on October 12, 2025. This conference program is tentative and subject to change.
Technical Program for Wednesday October 8, 2025
|
We-S1-T1 |
Hall F |
Deep Learning 6 |
Regular Papers - Cybernetics |
Chair: Zhang, Li | Royal Holloway, University of London |
Co-Chair: Li, Jun | Nanjing Normal University |
|
08:30-08:45, Paper We-S1-T1.1 | |
Asymmetric U-Net with Gaussian Splatting for Single-View 3D Reconstruction (I) |
|
Wang, Yifan | Southwest Petroleum University |
Xu, Zhirui | Southwest Petroleum University |
Zeng, Xianting | Southwest Petroleum University |
Zhou, Wenjun | Southwest Petroleum University |
Peng, Bo | Southwest Petroleum University |
Keywords: Image Processing and Pattern Recognition, Neural Networks and their Applications, Deep Learning
Abstract: With the rapid advancement of computer vision and graphics, high-quality single-view 3D reconstruction has become increasingly significant in applications such as autonomous driving, robotics, and virtual reality. Traditional methods often struggle with the inherent ill-posed nature of single-view reconstruction, leading to limited accuracy and reduced detail retention. To address these challenges, we introduce an innovative framework that integrates an asymmetric U-Net architecture with 3D Gaussian splatting for single-view 3D reconstruction. Furthermore, our approach incorporates a novel 3D smoothing filter that constrains the maximum frequency of the 3D representation, effectively mitigating high-frequency artifacts in out-of-distribution rendering. By synergistically combining implicit and explicit representations, our method leverages the strengths of both to enhance reconstruction efficiency and quality. Experiments conducted on the SRN-Cars dataset demonstrate that our framework outperforms existing methods in quantitative metrics and qualitative assessments, achieving higher reconstruction accuracy and smoother surfaces.
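The 3D smoothing filter described above caps the maximum frequency of the 3D representation to suppress high-frequency artifacts. As a hedged one-dimensional analogue (a generic Gaussian low-pass sketch, not the paper's 3D filter; the signal and sigma are illustrative):

```python
import numpy as np

def gaussian_lowpass(signal: np.ndarray, sigma: float) -> np.ndarray:
    """Smooth a 1-D signal with a normalized Gaussian kernel,
    attenuating frequency components above roughly 1/(2*pi*sigma)."""
    radius = int(4 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    return np.convolve(signal, kernel, mode="same")

# A low-frequency ramp plus a high-frequency sinusoidal artifact.
t = np.linspace(0.0, 1.0, 256)
clean = t
noisy = clean + 0.3 * np.sin(2 * np.pi * 60 * t)

smoothed = gaussian_lowpass(noisy, sigma=4.0)

# Compare errors on the interior (away from convolution edge effects).
err_noisy = np.abs(noisy[20:-20] - clean[20:-20]).mean()
err_smooth = np.abs(smoothed[20:-20] - clean[20:-20]).mean()
```

The kernel width directly sets the frequency cutoff: widening sigma removes more high-frequency content, mirroring the frequency-constraining role the abstract attributes to its 3D filter.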
|
|
08:45-09:00, Paper We-S1-T1.2 | |
DGKD: A Universal Knowledge Distillation Framework Based on Decoupling Gradients |
|
Yan, Fuhang | Beijing University of Posts and Telecommunications |
Niu, Tao | Beijing University of Posts and Telecommunications |
Teng, Yinglei | Beijing University of Posts and Telecommunications |
Keywords: Deep Learning, Machine Vision, Transfer Learning
Abstract: Knowledge distillation transfers knowledge from a complex network (teacher) to a lightweight network (student). Existing knowledge distillation methods typically employ a loss function comprising task and distillation losses, and they use a hyper-parameter to balance the two losses. However, in this paper, we observe an inconsistency between the gradient directions of these two losses, which introduces a trade-off between the two gradients, hindering the student from learning the full knowledge. To overcome this challenge, we propose a Universal Knowledge Distillation Framework Based on Decoupling Gradients (DGKD). DGKD breaks the trade-off by decoupling the gradients of the two losses and distributing them to separate branches of the student. Additionally, we introduce a Knowledge Interaction Module (KIM) and Inference-stage Simplifications to optimize our DGKD, enhancing its flexibility and simplicity. Extensive experiments validate the superiority of DGKD. For example, DGKD achieves a +2.80% accuracy improvement on CIFAR-100 for the VGG13-MN-V2 pair.
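The gradient trade-off the abstract describes arises from the conventional combined objective that DGKD sets out to improve. As a point of reference only (the standard task-plus-distillation loss, not DGKD itself; the logits, label, temperature T, and weight alpha are illustrative):

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kd_losses(student_logits, teacher_logits, label, T=4.0):
    """Standard knowledge distillation: a hard-label cross-entropy
    task loss plus a softened teacher-matching KL distillation loss."""
    p = softmax(student_logits)
    task = -np.log(p[label])                          # cross-entropy
    ps = softmax(student_logits / T)
    pt = softmax(teacher_logits / T)
    distill = np.sum(pt * (np.log(pt) - np.log(ps)))  # KL(teacher || student)
    return task, distill

student = np.array([1.0, 0.5, -0.2])
teacher = np.array([2.0, 1.0, -1.0])
task, distill = kd_losses(student, teacher, label=0)

alpha = 0.5  # the single balancing hyper-parameter the abstract refers to
total = alpha * task + (1 - alpha) * distill
```

Because both losses are folded into one scalar, their (possibly conflicting) gradients are summed with fixed weights; decoupling them onto separate branches is DGKD's departure from this baseline.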
|
|
09:00-09:15, Paper We-S1-T1.3 | |
GMP: A GNN-Enhanced Mamba-2 Neural Predictor for Efficient NAS |
|
Lu, Yupeng | East China Normal University |
Mao, Hongyan | East China Normal University |
Jiang, Ningkang | East China Normal University |
Keywords: Deep Learning, Neural Networks and their Applications, Computational Intelligence
Abstract: In recent years, predictor-based Neural Architecture Search (NAS) methods have gained significant attention due to their ability to rapidly estimate architecture properties, such as accuracy and latency. However, GNN-based predictor methods often suffer from convergence issues that lead to suboptimal performance, and Transformer-based predictors face quadratic computational complexity, which limits their efficiency. To address these challenges, we propose GMP, a GNN-Enhanced Mamba-2 neural predictor that efficiently learns the feature representations of architectures and accurately predicts architecture properties. Specifically, we leverage the novel Mamba-2 model, which employs an advanced State Space Duality (SSD) framework to achieve competitive performance with linear complexity. To adapt to neural prediction tasks and fully improve the predictive capabilities, we propose a bidirectional Mamba-2 module and design a flow-enhanced GNN to integrate with it. To the best of our knowledge, we are the first to apply such an SSM architecture to neural predictors. Extensive experiments conducted on the NAS-Bench-101, NAS-Bench-201, DARTS, and NNLQP datasets demonstrate the feasibility of our proposed method. Compared to state-of-the-art methods, our GMP not only achieves a significant speedup (7.47 times faster) but also delivers satisfactory performance. Our code is available at https://github.com/estar445/GMP.
|
|
09:15-09:30, Paper We-S1-T1.4 | |
SAMKD: Spatial-Aware Adaptive Masking Knowledge Distillation for Object Detection |
|
Zhang, Zhourui | Nanjing Normal University |
Li, Jun | Nanjing Normal University |
Li, Jiayan | Nanjing Normal University |
Xu, Jianhua | Nanjing Normal University |
Keywords: Deep Learning, Neural Networks and their Applications, Expert and Knowledge-Based Systems
Abstract: Most recent attention-guided feature masking distillation methods perform knowledge transfer via global teacher attention maps without delving into fine-grained clues. Instead, performing distillation at finer granularity is conducive to uncovering local details supplementary to global knowledge transfer and reconstructing comprehensive student features. In this study, we propose a Spatial-aware Adaptive Masking Knowledge Distillation (SAMKD) framework for accurate object detection. Different from previous feature distillation methods, which mainly perform single-scale feature masking, we develop a spatially hierarchical feature masking distillation scheme, such that object-aware locality is encoded during the coarse-to-fine distillation process for improved feature reconstruction. In addition, our spatial-aware feature distillation strategy is combined with a masking logit distillation scheme in which region-specific feature differences between the teacher and student networks are utilized to adaptively guide the distillation process. Thus, it can help the student model better learn from its teacher counterpart with improved knowledge transfer and a reduced gap. Extensive experiments on the detection task demonstrate the superiority of our method. For example, when FCOS is used as the teacher detector with a ResNet101 backbone, our method improves the student network from 35.3% to 38.8% mAP, outperforming state-of-the-art distillation methods including MGD, FreeKD and DMKD.
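Attention-guided feature masking, the family of methods this paper builds on, can be sketched generically (this is a minimal single-scale illustration of the idea, not SAMKD's hierarchical scheme; the feature shape and keep ratio are assumptions):

```python
import numpy as np

def attention_mask(feat: np.ndarray, keep_ratio: float = 0.5):
    """Derive a spatial attention map from a feature tensor
    (channel-wise mean of absolute activations) and zero out the
    least-attended locations, which the student must reconstruct."""
    attn = np.abs(feat).mean(axis=0)                 # (H, W) attention map
    k = int(keep_ratio * attn.size)
    thresh = np.sort(attn, axis=None)[::-1][k - 1]   # k-th largest value
    mask = (attn >= thresh).astype(feat.dtype)       # 1 = kept location
    return feat * mask, mask

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 4, 4))                    # C x H x W teacher features
masked, mask = attention_mask(feat, keep_ratio=0.5)
```

SAMKD's contribution, per the abstract, is applying such masking hierarchically across spatial scales rather than at one scale as above.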
|
|
09:30-09:45, Paper We-S1-T1.5 | |
Metal Artifact Removal for Cultural Relics Restoration: A Segmentation-Driven Deep Learning Framework on CT Images |
|
Li, Jianqiang | Beijing University of Technology |
He, Hang | Beijing University of Technology |
Xu, Xi | Beijing University of Technology |
Huang, Jing | The Palace Museum |
Keywords: Deep Learning, Neural Networks and their Applications, Image Processing and Pattern Recognition
Abstract: Metal artifacts frequently emerge in reconstructed computed tomography (CT) images of cultural relics, primarily due to the presence of metal elements within these relics during the scanning process. These artifacts degrade the quality of CT images, posing a challenge to cultural relic restoration efforts. Traditional metal artifact reduction (MAR) methods include projection-domain interpolation and iterative reconstruction. The former offers high computational efficiency but may introduce secondary artifacts, while the latter achieves high accuracy at the cost of increased computational complexity. Recent studies have employed CNN-based image segmentation models to automatically separate metal artifacts from anatomical structures in medical CT images. These models demonstrate superior accuracy compared to conventional approaches. However, compared to medical CT images, the CT images of cultural relics are more complex and blurred at the edges due to their age and the diversity of their constituent materials. This makes artifact removal in cultural relics a challenging task. To address this problem, we first constructed a dataset of CT images of Chinese cultural relics and annotated the cultural relic regions in each image. Secondly, we classified the images into five classes based on the shape of the artifact, then proposed the Annotation and Structural Variability Estimator (ASVE) to assess the complexity of each class for dividing the test and training sets. Finally, we proposed a CT image artifact removal framework for cultural relics, utilizing a domain-specific pre-trained encoder and a designed attention-based scSEConv Block to enhance the artifact awareness and boundary recovery capabilities of the U-Net. The experimental results showed that our model achieved an IoU of 0.8445 and a recall of 0.9179 on the test set, outperforming the other segmentation models compared.
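The IoU and recall figures reported above are the standard segmentation metrics; for reference, a minimal sketch of how they are computed from binary masks (the example masks are illustrative):

```python
import numpy as np

def iou_and_recall(pred: np.ndarray, target: np.ndarray):
    """Intersection-over-union and recall for binary segmentation masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    iou = inter / union if union else 1.0
    recall = inter / target.sum() if target.sum() else 1.0
    return iou, recall

# Toy 3x3 masks: prediction has one false positive and one false negative.
pred = np.array([[1, 1, 0], [0, 1, 0], [1, 0, 0]])
target = np.array([[1, 1, 0], [0, 1, 1], [0, 0, 0]])
iou, recall = iou_and_recall(pred, target)
```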
|
|
09:45-10:00, Paper We-S1-T1.6 | |
Deep Fusion Networks with Hybrid Attention Mechanisms for Voice Diabetes Detection |
|
Gangani, Abhishek | Royal Holloway, University of London |
Zhang, Li | Royal Holloway, University of London |
Panesar, Arjun | DDM Health Ltd |
Keywords: Computational Intelligence, Expert and Knowledge-Based Systems, Hybrid Models of Computational Intelligence
Abstract: About 50% of Type 2 diabetes patients develop diabetic neuropathy, which can damage nerves throughout the body, including those controlling the vocal cords, leading to issues such as vocal fold paralysis, hoarseness, or vocal strain. Therefore, this research pioneers automated diabetes diagnosis using voice and speech recordings, in an attempt to provide a non-invasive, easily accessible tool for early detection of diabetes and prediabetes conditions. Firstly, five speech/voice datasets have been generated, including audio recordings of the vowels (‘A’, ‘E’, ‘O’) and high- and low-speed counting of the numbers 1-20, provided by participants with and without diabetes conditions. Initially, Mel-frequency Cepstral Coefficient (MFCC) features are extracted from voice clips of diabetic and non-diabetic participants. A 1D Convolutional Neural Network (1D CNN) and a Bidirectional Long Short-Term Memory (BiLSTM) network incorporating various hybrid attention mechanisms are proposed for diabetes classification. Specifically, attention methods, i.e., the Squeeze and Excitation (SE) block, Convolutional Block Attention Module (CBAM), and Global Context (GC) block, as well as their hybrid strategies, i.e., SE+CBAM and SE+CBAM+GC, have been proposed to extract the most important acoustic features and are incorporated with the BiLSTM and 1D CNN, respectively. Moreover, within- and cross-dataset hybrid attention feature maps of the diverse resulting networks are constructed to further emphasize the most crucial disease-indicative patterns from a global perspective, leading to a best accuracy rate of 82%. The empirical studies suggest that speech analysis could be a viable method for the preliminary diagnosis of diabetes.
|
|
We-S1-T2 |
Hall N |
Neural Networks and Their Applications 1 |
Regular Papers - Cybernetics |
Chair: Tomioka, Yoichi | The University of Aizu |
Co-Chair: Shen, Jun | Anhui University |
|
08:30-08:45, Paper We-S1-T2.1 | |
Fusion Dynamic In-Context in Decoder for Event Causality Identification |
|
Mu, Lin | Anhui University |
Shen, Jun | Anhui University |
Ni, Li | Anhui University |
Keywords: Neural Networks and their Applications
Abstract: Event causality identification (ECI) aims to predict the causal relationships between events mentioned in text. Although existing prompt-based methods have achieved significant improvements on the ECI task, these methods face the static prompt challenge, i.e., the model encodes all examples within the prompt equally. In this paper, we propose a Fusion Dynamic In-Context in Decoder (FDICD) model. Specifically, FDICD is inspired by the in-context learning paradigm, encoding dynamic in-context examples separately and then fusing this information in the decoder. Additionally, we incorporate external knowledge to enhance the semantic representation of causal tokens in the prompt. We conducted extensive experiments on two widely used benchmarks, and the results demonstrate that FDICD achieves promising improvements over baseline methods.
|
|
08:45-09:00, Paper We-S1-T2.2 | |
Automatic Determination Method for Suspected Software Defects Based on Static Analysis |
|
Li, Xuejian | Anhui University |
Sun, Chunyang | Anhui University |
Zhu, Zhengguang | School of Computer Science and Technology, Anhui University, He |
Li, Zihan | Anhui University |
Keywords: Neural Networks and their Applications
Abstract: Deep learning techniques for software defect prediction improve the efficiency of large-scale defect detection in software projects. However, the prediction results typically include a certain proportion of false positives and omissions, requiring manual verification of the actual defects. This has led to growing research interest in developing methods to automatically determine the authenticity of suspected defects. However, existing automatic defect identification methods are limited, as they often fail to effectively distinguish true from false positives. In this paper, we propose an automatic method for verifying the prediction results of deep-learning-based software defect models, leveraging static analysis techniques. This approach effectively reduces the manual cost associated with verifying potential defects. It works by aggregating execution paths from program entry points to the locations of suspected defects. The method involves compiling and refining the execution path traces of potential defects, applying screening and prioritization for structured analysis. Symbolic execution is used to verify the suspected defects. In…
|
|
09:00-09:15, Paper We-S1-T2.3 | |
A Spatio-Temporal Graph Network Allowing Incomplete Trajectory Input for Pedestrian Trajectory Prediction |
|
Long, Juncen | Politecnico Di Milano |
Bardaro, Gianluca | Politecnico Di Milano |
Mentasti, Simone | Politecnico Di Milano |
Matteucci, Matteo | Politecnico Di Milano |
Keywords: Neural Networks and their Applications, AI and Applications, Deep Learning
Abstract: Pedestrian trajectory prediction is important in research on mobile robot navigation in environments with pedestrians. Most pedestrian trajectory prediction algorithms require complete historical trajectories as input. If a pedestrian is unobservable in any past frame, its historical trajectory becomes incomplete and the algorithm cannot predict its future trajectory. To address this limitation, we propose STGN-IT, a spatio-temporal graph network allowing incomplete trajectory input. STGN-IT is able to predict the future trajectories of pedestrians with incomplete historical trajectories. STGN-IT uses a spatio-temporal graph with an additional encoding method to represent the historical trajectories and observation states of pedestrians. Moreover, STGN-IT introduces static obstacles in the environment that may affect future trajectories as additional nodes to further improve prediction accuracy. A clustering algorithm is also applied in the construction of the spatio-temporal graphs. Experiments on public datasets show that STGN-IT outperforms state-of-the-art algorithms.
|
|
09:15-09:30, Paper We-S1-T2.4 | |
An Adaptive Memory Multi-Level Feature Graph Convolutional Network* |
|
Zhao, Xinbo | Hebei University |
Yang, Wenzhu | Hebei University |
Keywords: Neural Networks and their Applications, Application of Artificial Intelligence, Machine Learning
Abstract: Graph Convolutional Networks (GCNs) have excelled in skeleton-based action and gesture recognition due to their strong feature extraction capabilities. However, existing methods often overlook the aggregation of motion information from low-level to high-level features. To address this, we propose an Adaptive Memory Multi-level Feature Graph Convolutional Network (AMF-GCN), which preserves low-level features during propagation and progressively passes them to the output. We also introduce a Motion Capture Module (MCM) to capture intrinsic relationships between motion and spatial features, and a Dynamic Fusion Graph Convolution (DFGC) algorithm to efficiently transmit multi-level features while preserving low-level information. Additionally, a Motion Feature Enhancement Mechanism (MFEM) uses attention to highlight important features. Experimental results indicate that AMF-GCN demonstrates competitive performance compared to mainstream models, achieving 93.5% accuracy on NTU RGB+D (X-sub) and 97.5% on X-view, while delivering favorable results across multiple benchmark datasets.
|
|
09:30-09:45, Paper We-S1-T2.5 | |
CWTLNet: Ultra-Short-Term Cryptocurrency Forecasting with Wavelet-Enhanced Deep Architecture |
|
Du, Xingbang | Hokkaido University |
Cao, Yang | Hokkaido University |
Zhang, Enzhi | Hokkaido University |
Zhong, Rui | Hokkaido University |
Munetomo, Masaharu | Hokkaido University |
Keywords: Neural Networks and their Applications, Deep Learning, AI and Applications
Abstract: In this study, we propose a novel deep neural model, CWTLNet, specifically designed to address the unique characteristics of cryptocurrency trading: significant short-term volatility, around-the-clock operation, and susceptibility to sudden price jumps and drops. The architecture consists of two branches. The CWT branch employs a continuous wavelet transform (CWT) to extract time-frequency feature maps from the input time series. These feature maps are then processed by a stack of CWT Blocks, which are composed of residual structures with two Conv1D layers, allowing the model to capture local, ultra-short-term patterns in the data. The linear branch first decomposes the time series into its trend and residual components. Each component is modeled separately using linear models and subsequently recombined. The linear branch is particularly effective at capturing the periodic patterns embedded in the time series. Finally, the outputs of the two branches are fused through a gated mechanism with a temperature coefficient, enabling adaptive weighting of the branch outputs. Experimental results demonstrate that CWTLNet outperforms commonly used models, including LSTM, Transformer, and DLinear, on minute-level trading data for seven different cryptocurrencies. For example, the performance of CWTLNet on the Bitcoin dataset, in terms of Mean Squared Error (MSE) and Mean Absolute Error (MAE), shows improvements of at least 2.7% and 1.7%, respectively, compared to the aforementioned three models.
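The trend/residual split in the linear branch follows the moving-average decomposition popularized by DLinear. A minimal sketch of that generic decomposition (the window size and series are illustrative, and this is not CWTLNet's implementation):

```python
import numpy as np

def decompose(series: np.ndarray, window: int = 5):
    """Split a series into a moving-average trend and a residual,
    padding the edges by repeating the boundary values."""
    pad = window // 2
    padded = np.concatenate([np.full(pad, series[0]),
                             series,
                             np.full(pad, series[-1])])
    kernel = np.ones(window) / window
    trend = np.convolve(padded, kernel, mode="valid")
    residual = series - trend
    return trend, residual

# Toy price-like series: a drift plus a periodic component.
t = np.arange(64, dtype=float)
series = 0.1 * t + np.sin(2 * np.pi * t / 8)
trend, residual = decompose(series, window=9)
```

Each component would then be fed to its own linear model and the two forecasts recombined, as the abstract describes.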
|
|
09:45-10:00, Paper We-S1-T2.6 | |
CLEAR: A Clean-Label Backdoor Attack Via Representation-Guided Trigger Embedding |
|
Wu, Zhan | Nanjing University of Science and Technology |
Li, Haipeng | Nanjing University of Science and Technology |
Wu, Di | Nanjing University of Science and Technology |
Pang, Shuchao | Nanjing University of Science and Technology |
Keywords: Neural Networks and their Applications, Deep Learning, AI and Applications
Abstract: Recent studies have shown that although DNNs perform well on visual tasks, they are still vulnerable to clean-label backdoor attacks. As poisoned samples come from the target class, the model learns both original and trigger features, weakening the association between the trigger and the target label and thereby reducing the attack success rate (ASR). To address this issue, we propose a novel clean-label backdoor attack framework, named CLEAR, i.e., Clean-Label Embedding Attack with Representation-Guidance, which strengthens the correlation between triggers and target labels while maintaining high stealth. Specifically, the “benign-label push” mechanism perturbs clean samples selected from the target class (the samples designated for poisoning), pushing them away from their original class in the feature space, while the “target-label pull” mechanism pulls the poisoned sample representations closer to the target class through saliency-guided embedding. CLEAR consists of three key components: integrating a diffusion model with PGD to generate natural but semantically perturbed adversarial samples; selecting backdoor-susceptible samples near the decision boundary based on classification loss; and embedding triggers into high-saliency regions identified using a Feature Pyramid Network (FPN) combined with a local self-attention mechanism. Experimental results on CIFAR10 and GTSRB with ResNet18 and VGG16 demonstrate that CLEAR generally improves ASR while maintaining strong stealth.
|
|
We-S1-T3 |
Room 0.11 |
Emerging Theories, Algorithms and Applications in Soft Computing |
Special Sessions: Cyber |
Chair: Wang, Zitong | University of Aizu |
Co-Chair: Harada, Tomohiro | Saitama University |
Organizer: Pei, Yan | University of Aizu |
Organizer: Liu, Xiabi | Beijing Institute of Technology |
Organizer: Choo, Yun-Huoy | Universiti Teknikal Malaysia Melaka |
Organizer: Ohnishi, Kei | Kyushu Institute of Technology |
|
08:30-08:45, Paper We-S1-T3.1 | |
Enhancing Non-Dominated Sorting Genetic Algorithm III Using Chaotic Dynamics and Estimated Convergence Point (I) |
|
Wang, Zitong | University of Aizu |
Pei, Yan | University of Aizu |
Li, Jianqiang | Beijing University of Technology |
Keywords: Evolutionary Computation, Metaheuristic Algorithms, Optimization and Self-Organization Approaches
Abstract: In evolutionary computation, the search for algorithms capable of effectively solving multi-objective optimization problems remains at the forefront of research. Most existing multi-objective evolutionary algorithms (MOEAs) struggle with maintaining diversity and avoiding premature convergence when solving complex or high-dimensional optimization problems. This study proposes an innovative iteration of the Non-dominated Sorting Genetic Algorithm III (NSGA-III), which infuses chaotic dynamics alongside estimated convergence point strategies to enhance solution quality and diversity. Our research undertakes a comprehensive evaluation of this enhanced algorithm against a suite of benchmark problems. We compare it with its predecessor, the Non-dominated Sorting Genetic Algorithm II (NSGA-II), and variations incorporating chaotic dynamics or estimated convergence point strategies alone. The performance analysis results indicate that NSGA-III using chaotic dynamics and estimated convergence point strategies demonstrates superior performance over traditional and singly-enhanced MOEAs across most test functions. The statistical analysis strongly suggests that the dual enhancements embedded in the novel algorithm contribute significantly to its optimization capabilities.
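Chaotic dynamics are typically injected into evolutionary algorithms via a chaotic map whose iterates replace pseudo-random draws. The abstract does not state which map is used, so as an assumed illustration, the classic logistic map at r = 4 (fully chaotic regime), rescaled onto a search interval:

```python
import numpy as np

def logistic_map_sequence(n: int, x0: float = 0.371, r: float = 4.0):
    """Chaotic sequence in (0, 1) from the logistic map
    x_{k+1} = r * x_k * (1 - x_k); r = 4 gives fully chaotic behavior."""
    xs = np.empty(n)
    x = x0
    for i in range(n):
        x = r * x * (1.0 - x)
        xs[i] = x
    return xs

# Rescale the chaotic values onto a decision-variable range,
# e.g. to perturb or initialize candidate solutions.
low, high = -5.0, 5.0
seq = logistic_map_sequence(100)
samples = low + (high - low) * seq
```

Compared with uniform random sampling, chaotic sequences are deterministic yet non-repeating and ergodic over the interval, which is the property such enhancements exploit to improve population diversity.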
|
|
08:45-09:00, Paper We-S1-T3.2 | |
Balancing Exploration and Exploitation in Maximum Diffusion Reinforcement Learning Using Evolutionary Computation Algorithm (I) |
|
Zhao, Ying | The University of Aizu |
Pei, Yan | University of Aizu |
Keywords: Machine Learning, Evolutionary Computation, Neural Networks and their Applications
Abstract: Balancing exploration and exploitation is a fundamental challenge in reinforcement learning. In Maximum Diffusion Reinforcement Learning (MaxDiff RL), this balance is regulated by a temperature parameter that controls exploration, but optimizing it across different tasks is challenging. To tackle this issue, we propose a method that dynamically adjusts the temperature parameter via evolutionary algorithms during training. The training process is divided into multiple stages, where different optimization strategies are applied to adaptively evolve the temperature parameter, ensuring a proper balance between exploration and exploitation. We evaluate our method on two continuous control tasks in robotics, i.e., Swimmer and HalfCheetah. Experimental results demonstrate that our method outperforms baseline algorithms with randomly initialized or default temperature parameters, achieving faster convergence and higher cumulative rewards, particularly in tasks demanding greater exploration.
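Evolving a scalar temperature with an evolutionary algorithm can be sketched in its simplest form as a (1+1) evolution strategy on the observed return. This is a toy stand-in (the reward function, mutation scale, and step budget are assumptions; the paper's staged multi-strategy scheme and the actual MaxDiff RL objective are not reproduced):

```python
import random

def reward(temperature: float) -> float:
    """Stand-in for a training-stage return; peaks at a temperature
    of 0.7, which the tuner does not know in advance."""
    return -(temperature - 0.7) ** 2

def one_plus_one_es(steps: int = 200, sigma: float = 0.1, seed: int = 0):
    """(1+1)-ES: mutate the temperature with Gaussian noise and keep
    the mutant only if it improves the observed reward."""
    rng = random.Random(seed)
    temp = 2.0                      # deliberately poor initial value
    best = reward(temp)
    for _ in range(steps):
        cand = max(1e-3, temp + rng.gauss(0.0, sigma))
        r = reward(cand)
        if r > best:
            temp, best = cand, r
    return temp

tuned = one_plus_one_es()
```

Even this minimal strategy homes in on the high-reward temperature region, illustrating why evolutionary adjustment can beat a fixed or randomly initialized setting across tasks.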
|
|
09:00-09:15, Paper We-S1-T3.3 | |
How Reliable Is Validation Accuracy for Estimating Surrogate Model Generalization in Surrogate-Assisted Evolutionary Algorithms? (I) |
|
Hanawa, Yuki | Tokyo Metropolitan University |
Harada, Tomohiro | Saitama University |
Miura, Yukiya | Tokyo Metropolitan University |
Keywords: Evolutionary Computation, Metaheuristic Algorithms, Machine Learning
Abstract: Surrogate-assisted evolutionary algorithms (SAEAs) address expensive optimization problems using surrogate models to approximate costly evaluation functions. To develop effective SAEAs, evaluating the generalization performance of surrogate models during the search process is crucial. In practice, surrogate accuracy is typically estimated using validation accuracy computed from a holdout subset of the limited training data. This estimate is then used to guide parameter tuning or surrogate model selection. However, whether validation accuracy reliably reflects test accuracy on unseen data is unclear, especially when only a small training dataset is available in SAEAs. Despite its practical importance, the relationship between validation and test accuracies in SAEAs has not been systematically investigated. This study examines the reliability of validation accuracy as an indicator of surrogate model generalization. Toward this goal, we implement surrogate-assisted particle swarm optimization with a generation-based strategy and employ radial basis function (RBF) models, which are commonly used in recent SAEA research. Experiments are conducted on CEC 2015 benchmark problems to analyze the relationship between validation and test accuracies. The results reveal that validation accuracy does not consistently correlate with test accuracy, suggesting that it is not a reliable indicator of surrogate model generalization in SAEAs.
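The validation-versus-test question above can be made concrete with a tiny RBF surrogate: fit on a handful of "expensive" evaluations, hold out a few points for validation, then score on fresh unseen points. A hedged 1-D sketch (the objective, kernel width, and sample sizes are illustrative, far smaller than the paper's CEC 2015 setup):

```python
import numpy as np

def rbf_fit(X, y, gamma=1.0):
    """Fit an RBF interpolant by solving the kernel system K w = y."""
    K = np.exp(-gamma * (X[:, None] - X[None, :]) ** 2)
    return np.linalg.solve(K + 1e-8 * np.eye(len(X)), y)

def rbf_predict(Xtrain, w, Xnew, gamma=1.0):
    K = np.exp(-gamma * (Xnew[:, None] - Xtrain[None, :]) ** 2)
    return K @ w

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x)            # the "expensive" objective
X = rng.uniform(-2, 2, 20)             # the whole evaluation budget
train, val = X[:15], X[15:]            # holdout split, as in practice
w = rbf_fit(train, f(train))

val_err = np.abs(rbf_predict(train, w, val) - f(val)).mean()

test = rng.uniform(-2, 2, 200)         # unseen points (unavailable in SAEAs)
test_err = np.abs(rbf_predict(train, w, test) - f(test)).mean()
```

With only 5 validation points, `val_err` is a noisy estimate of `test_err`; the paper's finding is that this discrepancy can be large enough to make validation accuracy an unreliable proxy for generalization.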
|
|
09:15-09:30, Paper We-S1-T3.4 | |
A Hybrid A*D3QN Framework with Prior Knowledge and Multimodal Data Fusion for USV Path Planning (I) |
|
Peng, Huanxin | Dalian University of Technology |
Yang, Dequan | Dalian University of Technology |
Li, Xianneng | Dalian University of Technology |
Hu, Deqiang | Dalian University of Technology |
Yu, Yang | Dalian University of Technology |
Zhang, Zhongzhao | Dalian University of Technology |
Keywords: Application of Artificial Intelligence, Deep Learning
Abstract: Path planning is an essential task for the mission execution of unmanned surface vehicles (USVs). However, existing advanced techniques based on deep reinforcement learning (DRL) often suffer from low learning efficiency and insufficient environmental perception from single-sensor configurations. To address these issues, this paper proposes a hybrid framework named A*D3QN, which integrates the heuristic efficiency of the A* algorithm with the adaptive decision-making of a dueling double deep Q-network (D3QN), combined with multimodal data fusion for precise environment modeling. The proposed A*D3QN incorporates prior knowledge from the global paths generated by A* to initialize and guide the D3QN learning process. The prior knowledge, formatted as RL transition tuples, is used in the reward function design and the N-step prioritized experience replay, which significantly accelerates overall learning efficiency. Moreover, an improved D3QN architecture is designed to dynamically fuse visual data and navigation states via a cross-entropy attention mechanism, enabling multimodal perception in partially unknown environments. Extensive experiments across three scenarios with varying obstacle densities demonstrate that A*D3QN significantly outperforms state-of-the-art DRL baselines. Ablation studies further validate the necessity of each component.
|
|
We-S1-T4 |
Room 0.12 |
Fault Monitoring and Diagnosis |
Regular Papers - SSE |
Chair: Xia, Wei | Institute of Information Engineering, Chinese Academy of Sciences |
Co-Chair: Ghanduri, Fatima | University of Glasgow |
|
08:30-08:45, Paper We-S1-T4.1 | |
Plug-And-Play PLC-Based Monitoring and Outlier Detection for Inline Production Systems Via a Generalized Multi-Agent Approach |
|
Wagner, Cedric | Technical University of Munich |
Hujo-Lauer, Dominik | Chair of Automation and Information Systems, Technical University |
Vogel-Heuser, Birgit | Technical University of Munich |
Keywords: Fault Monitoring and Diagnosis, Manufacturing Automation and Systems, Cyber-physical systems
Abstract: Rapid technological advances in recent years have established Big Data, Industrie 4.0, the Internet of Things (IoT) and Artificial Intelligence (AI) as data-driven concepts that are increasingly being implemented and proving their value in real-world scenarios, transforming the future across all sectors, including manufacturing. Especially in the area of industrial automation, these fast-paced and transformative changes collide with a long-established industry. Despite the current state of the art in technology, the lifetime of equipment is measured in decades, with the need to provide long-term support even to outdated (legacy) systems in heterogeneous production environments. Incorporating changes in such systems is challenging and mostly driven by software due to its adaptability. This paper contributes a generalized, lightweight approach for integrating machine-level data collection and analysis with agent-based systems, aiming to facilitate the adoption of cyber-physical production systems (CPPS) in industry and to ease the integration with legacy control systems. As a use case for production performance optimization, the monitoring and detection of outliers for interconnected transportation systems has been defined. A key feature of the approach is that it utilizes only data from presence sensors. Thus, only minimal information about the system and no further configuration or development is required while maintaining full functionality. The generalizability and transferability of this approach have been validated using two representative laboratory production systems, demonstrating the potential of PLC-based data (pre-)processing and its scalability to multiple IEC 61131-3-conformant control applications.
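Outlier detection from presence sensors alone reduces to analyzing the times between sensor triggers. As an assumed minimal illustration (the timestamps, threshold, and z-score rule are illustrative, not the paper's agent-based implementation):

```python
import statistics

def transit_outliers(timestamps, threshold=3.0):
    """Flag inter-arrival (transit) times that deviate from the mean
    by more than `threshold` standard deviations."""
    transits = [b - a for a, b in zip(timestamps, timestamps[1:])]
    mu = statistics.fmean(transits)
    sd = statistics.pstdev(transits)
    return [i for i, t in enumerate(transits)
            if sd > 0 and abs(t - mu) > threshold * sd]

# Presence-sensor trigger times on a conveyor; one part is delayed ~10 s.
times = [0.0, 2.1, 4.0, 6.2, 8.1, 18.5, 20.4, 22.3]
anomalies = transit_outliers(times, threshold=2.0)
```

The appeal mirrors the paper's claim: no model of the transported product is needed, only the timing statistics of the presence signals already available on the PLC.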
|
|
08:45-09:00, Paper We-S1-T4.2 | |
A Distributed Framework for Financial Market Trend Prediction Using Hybrid Fuzzy Clustering and Hidden Markov Models |
|
Ghanduri, Fatima | University of Glasgow |
Anagnostopoulos, Christos | University of Glasgow |
Keywords: Distributed Intelligent Systems, Adaptive Systems, Fault Monitoring and Diagnosis
Abstract: Accurate financial forecasting demands scalable models that adapt to volatility in high-dimensional, temporally dynamic markets. Traditional approaches, such as centralized Hidden Markov Models (HMMs) and GARCH (Generalized Autoregressive Conditional Heteroskedasticity), struggle with real-time responsiveness and robustness. We propose HMM-DFC, a distributed framework that integrates Fuzzy C-Means (FCM) clustering with HMMs across temporally partitioned nodes to capture evolving market regimes. To enhance adaptability and consistency, we introduce two novel modules: Volatility-Aware Transition Refinement (VATR) for sharper regime transitions and Entropy-Based Node Regularization (EBNR) for stable cross-node synchronization. Extensive evaluation on S&P 500 data shows that HMM-DFC outperforms GARCH, centralized HMMs, and distributed FCM in volatility detection and clustering quality. The framework supports real-time, scalable forecasting for trading applications.
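The FCM component named above alternates soft-membership and centroid updates. A plain NumPy sketch of standard fuzzy C-means (the 2-D toy data stand in for per-node market features; this is the textbook algorithm, not the paper's distributed variant):

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=50, seed=0):
    """Textbook FCM: alternate membership and centroid updates.
    U[i, j] is the degree to which sample i belongs to cluster j."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)                 # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centers

# Two well-separated "regimes" in a 2-D feature space.
X = np.vstack([np.random.default_rng(1).normal(0, 0.1, (20, 2)),
               np.random.default_rng(2).normal(3, 0.1, (20, 2))])
U, centers = fuzzy_c_means(X)
labels = U.argmax(axis=1)
```

The soft memberships (rows of `U` sum to 1) are what let regime assignments change gradually as the market evolves, rather than flipping hard between clusters.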
|
|
09:00-09:15, Paper We-S1-T4.3 | |
FakeApp: A High-Precision Method for Domain Fronting Detection in Real Networks with Neuro-Symbolic Integration |
|
Xie, Yibo | Institute of Information Engineering, Chinese Academy of Sciences |
Gou, Gaopeng | Institute of Information Engineering, Chinese Academy of Sciences |
Xiong, Gang | Institute of Information Engineering, Chinese Academy of Sciences |
Li, Zhen | Institute of Information Engineering, Chinese Academy of Sciences |
Xia, Wei | Institute of Information Engineering, Chinese Academy of Sciences |
Keywords: Fault Monitoring and Diagnosis, Communications, Homeland Security
Abstract: Domain fronting is a covert communication technique that evades detection by connecting to legitimate domains to imitate normal network traffic. The imitation is flawed, however, so current detection methods usually treat domain fronting as abnormal traffic. These methods nevertheless show very low precision due to the normal-abnormal traffic imbalance in practice. In this paper, we find that domain fronting's imitation is limited to popular applications (apps), such as the Chrome and Firefox browsers. Thus, we mitigate the traffic imbalance by defining domain fronting detection as an app discrimination problem, rather than as the anomaly detection task of previous work. According to the revised definition, we propose FakeApp, a high-precision method for domain fronting detection in real networks. Using frequent item analysis, FakeApp first extracts the imitated app information from domain fronting tools as symbolic features. Then, through deep neural networks, it discriminates whether traffic belongs to genuine or spoofed apps. Finally, FakeApp integrates the symbolic features and neural networks to identify domain fronting in real networks. Evaluations over 2 million flows show that the precision of FakeApp is over 95%, far surpassing state-of-the-art methods on four domain fronting tools. These results also indicate that we have effectively mitigated the traffic imbalance issue.
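Frequent item analysis, the symbolic-feature step above, keeps the attribute values that recur across nearly all flows of one app. A minimal sketch (the flow attribute names and values here are entirely hypothetical, not FakeApp's real feature set):

```python
from collections import Counter

def frequent_items(flows, min_support=0.8):
    """Attribute/value pairs appearing in at least `min_support` of an
    app's flows form its symbolic fingerprint."""
    counts = Counter()
    for flow in flows:
        counts.update(flow.items())
    n = len(flows)
    return {k: v for (k, v), cnt in counts.items() if cnt / n >= min_support}

# Hypothetical per-flow attributes observed for one app.
chrome_flows = [
    {"alpn": "h2", "ext_order": "grease-sni-alpn", "cipher": "aes128"},
    {"alpn": "h2", "ext_order": "grease-sni-alpn", "cipher": "chacha20"},
    {"alpn": "h2", "ext_order": "grease-sni-alpn", "cipher": "aes128"},
]
fingerprint = frequent_items(chrome_flows, min_support=0.8)
```

Traffic claiming to be that app but missing the fingerprint's stable attributes is a candidate spoof, which the neural component then discriminates.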
|
|
09:15-09:30, Paper We-S1-T4.4 | |
A Fuzzy-Logic Framework for Humidity-Setpoint Sensitivity: Impacts on Latent and Sensible HVAC Energy Loads |
|
Safdari, Mojtaba | University of Guelph |
Keywords: Intelligent Power Grid, Infrastructure Systems and Services, Cyber-physical systems
Abstract: This study investigates how fuzzy-logic humidity thresholds influence HVAC energy of a residential house. A validated fuzzy controller is embedded in the Vertical City Weather Generator (VCWG v3.0.0) and subjected to a one-at-a-time sensitivity analysis on the four relative-humidity limits—Low-Low, Low, High and High-High—while all other inputs are held constant. Results show that sensible loads remain essentially unchanged regardless of threshold adjustments, whereas humidification and dehumidification respond strongly and in opposite directions to shifts at the dry and humid ends of the comfort band. Widening the allowable humidity range therefore offers a straightforward means to cut latent energy without affecting heating or cooling demand. The findings highlight which fuzzy thresholds matter most and provide practical guidance for tuning rule-based HVAC control in cold, dry climates.
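The one-at-a-time sensitivity loop described above can be sketched as follows. This toy reduces the paper's four thresholds to two (`low`, `high`) and uses an arbitrary latent-load proxy; the actual VCWG fuzzy controller is far more detailed.

```python
def latent_load(rh_series, low, high):
    """Toy latent-energy proxy: humidification effort below `low` and
    dehumidification effort above `high`, in arbitrary units."""
    humidify = sum(max(0.0, low - rh) for rh in rh_series)
    dehumidify = sum(max(0.0, rh - high) for rh in rh_series)
    return humidify + dehumidify

def oat_sensitivity(rh_series, base, name, deltas):
    """One-at-a-time analysis: perturb one threshold, hold the other fixed."""
    out = {}
    for d in deltas:
        p = dict(base)
        p[name] += d
        out[d] = latent_load(rh_series, p["low"], p["high"])
    return out

rh = [20, 35, 50, 65, 80]           # toy indoor relative-humidity samples (%)
base = {"low": 30, "high": 70}
print(oat_sensitivity(rh, base, "low", [-10, 0, 10]))
```

Lowering the dry-end threshold (widening the comfort band) reduces the latent load in this toy, mirroring the paper's qualitative finding.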
|
|
09:30-09:45, Paper We-S1-T4.5 | |
A Mamba-Based Real-Time DC Arc Fault Diagnosis |
|
Zhao, Ruxue | Shandong University of Science and Technology |
Feng, Wancheng | Shandong University of Science and Technology |
Tian, Chunpeng | Shandong University of Technology |
Keywords: Fault Monitoring and Diagnosis, Intelligent Power Grid, Smart Metering
Abstract: Arc fault diagnosis constitutes a critical challenge in fault detection, focusing on rapid and precise identification of arc-induced safety risks in power systems. Previous methodologies struggle to strike a balance between real-time performance and detection accuracy in arc fault diagnosis systems. To address this challenge, we propose a linear-time Mamba-based model enhanced with a Spatial Awareness Module (SAM), achieving real-time DC arc fault diagnosis. Specifically, our approach leverages a state-space model (SSM) framework and employs a hardware-aware parallel algorithm for efficiency. To further improve accuracy while maintaining the computational efficiency of the base model, we integrate a spatial awareness module to capture global features, enabling precise fault diagnosis. Experimental results demonstrate that our method achieves 96.72% accuracy with a 1.87 ms response time, making it highly suitable for industrial applications where rapid and reliable arc fault diagnosis is critical. This advancement holds significant promise for enhancing safety in industrial electrical systems.
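For readers unfamiliar with the SSM framework mentioned above, the sketch below shows the underlying linear-time recurrence in its simplest sequential form. Mamba makes the parameters input-dependent ("selective") and evaluates the recurrence with a hardware-aware parallel scan; none of that is shown here, and the matrices are toy values.

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Sequential form of a linear state-space model (SSM):
    x_t = A x_{t-1} + B u_t,  y_t = C x_t."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_t in u:
        x = A @ x + B * u_t          # state update
        ys.append(float(C @ x))      # readout
    return ys

A = np.array([[0.5]])                # toy 1-D state with decay 0.5
B = np.array([1.0])
C = np.array([1.0])
print(ssm_scan(A, B, C, [1.0, 0.0, 0.0]))  # impulse response decays by 0.5
```

The loop costs O(sequence length), which is the linear-time property the abstract refers to.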
|
|
09:45-10:00, Paper We-S1-T4.6 | |
Patch-Sfp: A Fail-Slow Detection Framework for Cloud Storage Systems |
|
Li, Jing | Civil Aviation University of China |
Wang, Tianhao | Civil Aviation University of China |
Ding, Jianli | Civil Aviation University of China |
Keywords: Fault Monitoring and Diagnosis, Large-Scale System of Systems, Discrete Event Systems
Abstract: With the development of cloud storage systems, "fail-slow" faults, during which drive I/O threads experience delayed responses and system performance stays consistently below expectations, have attracted increasing attention. This paper introduces Patch-sfp, a practical framework for detecting fail-slow in cloud storage systems. It employs LPM, a PatchTST-based prediction model, to forecast future drive-latency trends, and develops a latency-threshold design mechanism to accurately identify fail-slow events. Considering the varying loads of drives, a dynamic patching strategy is designed for drives under different working environments, enhancing LPM's ability to capture dependencies over time. In addition, a MoH-Attention mechanism is introduced to improve LPM's ability to learn the complex relationships among drive data. Experimental results show that Patch-sfp effectively detects fail-slow in cloud storage systems and outperforms existing fail-slow detection methods in performance, adaptability, and robustness.
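The PatchTST-style patching step can be sketched as below. Patch-sfp chooses the patch size dynamically from the drive's load; here it is just a fixed argument, and the latency values are invented.

```python
def make_patches(series, patch_len, stride):
    """Split a drive-latency series into (possibly overlapping) patches,
    as in PatchTST; each patch becomes one token for the predictor."""
    return [series[i:i + patch_len]
            for i in range(0, len(series) - patch_len + 1, stride)]

latencies = [1.1, 1.0, 1.2, 5.8, 6.1]   # toy per-interval latencies (ms)
print(make_patches(latencies, patch_len=3, stride=1))
```

Treating patches rather than single points as tokens lets the model compare latency trends across windows, which is what a fail-slow detector needs.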
|
|
We-S1-T5 |
Room 0.14 |
Image Processing and Pattern Recognition 3 |
Regular Papers - Cybernetics |
Chair: Zou, LongQuan | Shanghai University of International Business and Economics |
Co-Chair: Burguera, Antoni | Universitat De Les Illes Balears |
|
08:30-08:45, Paper We-S1-T5.1 | |
A Hybrid CNN-Transformer Model for Tomato Leaf Disease Classification Incorporating Gray-Aware Attention and RS-Convolutional Block Attention Mechanisms |
|
Zou, LongQuan | Shanghai University of International Business and Economics |
Yu, Lu | Shanghai University of International Business and Economics, Sha |
Huang, Xiaoyao | Shanghai University of International Business and Economics |
Li, Xinlei | Shanghai University of International Business and Economics |
Li, Qi | Shanxi Agricultural University |
Hu, Linqiang | Fudan University |
Keywords: Image Processing and Pattern Recognition, Machine Vision, Deep Learning
Abstract: Accurate classification of leaf diseases is crucial for plant health and effective crop management. Existing deep learning approaches are predominantly categorized into Convolutional Neural Network (CNN)-based and Vision Transformer (ViT)-based methods. However, inherent limitations in both approaches constrain further performance gains. CNNs excel at capturing local lesion details but struggle with long-range dependencies. In contrast, ViTs leverage self-attention to model global feature relationships but often overlook fine-grained local information. Moreover, most models rely solely on three-channel (RGB) inputs, underutilizing the texture details present in grayscale data. To address these problems, we propose a hybrid CNN-Transformer model enhanced with a novel Gray-Aware (GA) Attention Module and an improved Residual Convolutional Block Attention Module (Rs-CBAM). Specifically, the hybrid architecture effectively balances fine-grained detail preservation and global contextual understanding. GA strengthens texture representation, while Rs-CBAM further enhances attention to critical regions. Comparative experiments conducted on the PlantVillage and AI Challenger 2018 datasets demonstrate that our model outperforms existing models. Ablation studies further confirm the effectiveness of each proposed enhancement. Overall, the proposed approach provides a promising direction for fine-grained plant disease classification, and GA shows potential in broader image processing tasks requiring enhanced texture representation. The code is available on GitHub: https://github.com/Governeson/GA-CCB.
|
|
08:45-09:00, Paper We-S1-T5.2 | |
Image Grayscale Enhancement through Frame Accumulation with Shaped-Function Signal |
|
Yang, Xiao | Huazhong University of Science and Technology |
Song, Enmin | Huazhong University of Science and Technology |
Ma, Guangzhi | Huazhong University of Science and Technology |
Qiu, Wanyu | Hubei University of Economics |
Guo, Jia | Hubei University of Economics |
Keywords: Image Processing and Pattern Recognition, Machine Vision, Multimedia Computation
Abstract: Grayscale resolution constitutes a fundamental performance metric in digital imaging systems for computer vision applications such as natural light imaging or medical imaging. Although frame accumulation has been conventionally employed to enhance grayscale resolution, existing implementations demand stringent hardware capabilities and operational conditions, thereby restricting their practical applicability. This research introduces an innovative grayscale super-resolution technique that integrates frame accumulation with a shaped-function signal, mitigating analog-to-digital converter quantization errors. Specifically, our approach involves superimposing an auxiliary illumination with periodic luminance onto the original image signal. The composite signal is continuously sampled by an imaging sensor, with subsequent grayscale reconstruction achieved through an innovative algorithm developed in this research. The experimental results demonstrate that our method is effective in enhancing the quality of images captured by natural light digital cameras by improving grayscale resolution. Quantitative analysis reveals advancements across multiple image quality metrics, including structural similarity index (with +4% gains) and peak signal-to-noise ratio (with +3.5 dB gains). Furthermore, this methodology enhances operational convenience by addressing two key constraints in existing approaches: the requirement for precise waveform control of auxiliary illumination and the necessity for phase synchronization between auxiliary optical signals and image sensor sampling cycles. These features significantly facilitate the deployment of this method in practical application scenarios.
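The core dithering argument behind this technique can be demonstrated numerically. The sketch below is a simplification under stated assumptions (an ideal unit-step quantizer and a sawtooth auxiliary signal spanning exactly one quantization step), not the paper's reconstruction algorithm.

```python
import numpy as np

true_level = 12.3                    # scene brightness between two ADC codes
frames = 1000

def adc(x):
    return np.floor(x)               # ideal quantizer with unit steps

# Plain frame accumulation: every frame hits the same code word, so
# averaging cannot recover the fractional part.
plain = adc(np.full(frames, true_level)).mean()

# Superimpose a periodic sawtooth auxiliary signal spanning exactly one
# quantization step; averaging the accumulated frames then recovers the
# sub-quantum level.
aux = (np.arange(frames) % 100) / 100.0
enhanced = adc(true_level + aux).mean()

print(plain, round(float(enhanced), 3))   # 12.0 vs ~12.3
```

With the auxiliary signal, the fraction of frames that land on the upper code word equals the fractional brightness, so the average converges to the true sub-quantum level.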
|
|
09:00-09:15, Paper We-S1-T5.3 | |
SCCOME: Scene Change Capture and Optical Motion Estimation for Video Quality Assessment |
|
Liu, Tsung-Jung | National Chung Hsing University |
Liao, Hao-Shiang | National Chung Hsing University |
Liu, Kuan-Hsien | National Taichung University of Science and Technology |
Keywords: Image Processing and Pattern Recognition, Machine Vision, Neural Networks and their Applications
Abstract: The rapid growth of user-generated content (UGC) on social platforms has created a pressing need for effective outdoor video quality assessment. Evaluating video quality in uncontrolled environments is challenging due to the absence of reference videos and distortions caused by compression and transmission artifacts, limiting the applicability of traditional metrics. In this paper, we propose a novel no-reference video quality assessment model that reduces computational complexity by identifying frames with significant visual changes. Optical flow detection is then applied to these frames to capture perceptually important regions, enabling focused processing. Experiments on three public outdoor video quality databases—KoNViD-1k, LIVE-Qualcomm, and CVD2014—demonstrate the effectiveness of our method. Furthermore, ablation studies highlight the critical roles of frame selection and optical flow-based region analysis in improving model performance. The source code is available at https://github.com/Hsiang417/SCCOME.
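The frame-selection idea in the abstract (processing only frames with significant visual change) can be sketched with a simple mean-absolute-difference rule; the threshold and frames below are invented, and the paper's actual change-capture criterion may differ.

```python
import numpy as np

def select_keyframes(frames, threshold):
    """Keep frame 0 plus any frame whose mean absolute difference from the
    last kept frame exceeds `threshold` (a minimal scene-change capture)."""
    kept = [0]
    for i in range(1, len(frames)):
        if np.abs(frames[i] - frames[kept[-1]]).mean() > threshold:
            kept.append(i)
    return kept

frames = [np.zeros((4, 4)), np.zeros((4, 4)) + 0.01,
          np.ones((4, 4)), np.ones((4, 4))]
print(select_keyframes(frames, threshold=0.5))  # near-duplicates are skipped
```

Only the kept frames would then go through the (more expensive) optical-flow analysis, which is where the computational savings come from.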
|
|
09:15-09:30, Paper We-S1-T5.4 | |
Fully Automated SAM for Single-Source Domain Generalization in Medical Image Segmentation |
|
Zhuo, Huanli | Anhui University |
Ma, Leilei | Anhui University |
Zhao, Haifeng | Anhui University |
Zhou, Shiwei | Anhui University |
Sun, Dengdi | Anhui University |
Fu, Yanping | Anhui University |
Keywords: Image Processing and Pattern Recognition, Neural Networks and their Applications
Abstract: Although SAM-based single-source domain generalization models for medical image segmentation can mitigate the impact of domain shift on the model in cross-domain scenarios, these models still face two major challenges. First, the segmentation of SAM is highly dependent on domain-specific expert-annotated prompts, which prevents SAM from achieving fully automated medical image segmentation and therefore limits its application in clinical settings. Second, providing poor prompts (such as bounding boxes that are too small or too large) to the SAM prompt encoder can mislead SAM into generating incorrect mask results. Therefore, we propose the FA-SAM, a single-source domain generalization framework for medical image segmentation that achieves fully automated SAM. FA-SAM introduces two key innovations: an Auto-prompted Generation Model (AGM) branch equipped with a Shallow Feature Uncertainty Modeling (SUFM) module, and an Image-Prompt Embedding Fusion (IPEF) module integrated into the SAM mask decoder. Specifically, AGM models the uncertainty distribution of shallow features through the SUFM module to generate bounding box prompts for the target domain, enabling fully automated segmentation with SAM. The IPEF module integrates multiscale information from SAM image embeddings and prompt embeddings to capture global and local details of the target object, enabling SAM to mitigate the impact of poor prompts. Extensive experiments on publicly available prostate and fundus vessel datasets validate the effectiveness of FA-SAM and highlight its potential to address the above challenges.
|
|
09:30-09:45, Paper We-S1-T5.5 | |
Deep Learning of Image Global Signatures for Underwater Loop Closing Detection |
|
Burguera, Antoni | Universitat De Les Illes Balears |
Bonin-Font, Francisco | Universitat De Les Illes Balears |
Keywords: Image Processing and Pattern Recognition, Neural Networks and their Applications, Deep Learning
Abstract: Detecting whether two images depict the same location, totally or partially, from different viewpoints is a key challenge in mobile robotics known as Visual Loop Detection (VLD). VLD is an essential component of visual Simultaneous Localization and Mapping (SLAM), enabling robots to estimate their position while constructing a consistent map of the environment. Many existing approaches rely on global image descriptors or signatures such as Bag of Words (BoW), Vector of Locally Aggregated Descriptors (VLAD), and Hash-based Loop Closure (HALOC), which encode images into compact feature vectors for efficient comparison. Recent deep learning methods have surpassed these classical techniques but typically require large datasets and substantial computational resources, particularly during training. Addressing these challenges is crucial in underwater robotics, which is the focus of this work, as data collection is costly and onboard computational resources are often limited. This paper proposes a hybrid approach that combines deep learning with a signature-based method. Specifically, a triplet-based neural network is trained on HALOC descriptors to learn a discriminative embedding space where images of the same location are mapped closer together while different locations remain distinct. By operating directly on HALOC signatures, the model remains lightweight, thus being trainable on small datasets and involving low computational requirements. The proposal is experimentally evaluated with underwater imagery captured in coastal areas of Mallorca.
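The triplet objective used to shape the embedding space can be sketched as below. The 2-D vectors are toy stand-ins for the network's embeddings of HALOC signatures, and the margin value is an assumption.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss on global-signature embeddings: pull images of
    the same place together, push different places at least `margin` apart."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])     # toy embedding of the anchor image
p = np.array([0.1, 0.0])     # same place, nearby in embedding space
n = np.array([3.0, 0.0])     # different place, far away
print(triplet_loss(a, p, n))
```

A zero loss means the triplet is already separated by more than the margin; training only pushes on triplets that violate it.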
|
|
We-S1-T6 |
Room 0.16 |
Machine Learning 1 |
Regular Papers - Cybernetics |
Chair: Yairi, Takehisa | The University of Tokyo |
Co-Chair: Bhattacharyya, Shuvra | University of Maryland, College Park |
|
08:30-08:45, Paper We-S1-T6.1 | |
Deterministic and Stochastic Hybrid Modeling with Regularization |
|
Osaka, Akira | The University of Tokyo |
Takeishi, Naoya | The University of Tokyo |
Yairi, Takehisa | The University of Tokyo |
Keywords: Machine Learning, Neural Networks and their Applications, Deep Learning
Abstract: Hybrid modeling is an approach that integrates data-driven models with physics-based models to compensate for discrepancies between model predictions and real-world behaviors, enabling us to obtain accurate dynamical system models. Previous studies have pointed out that in the training phase, applying constraints on the data-driven parts, called regularizers, is necessary when the physics models include unidentified parameters. However, the appropriate formulation of such regularizers remains unclear. In this work, we conducted experiments to investigate effective ways of introducing regularizers. We found that we can strike a balance between parameter estimation and state prediction by utilizing correlational information between physics and whole models for regularization. Furthermore, we extended the hybrid modeling approach to stochastic differential equations (SDEs), proposing a novel SDE learning algorithm using the probability distributions for the regularizers. Our experimental results showed that the proposed approach improved parameter estimation accuracy and enabled us to acquire correct stochastic dynamics.
|
|
08:45-09:00, Paper We-S1-T6.2 | |
Interpreting and Enhancing Decisions in Autonomous Navigation: A Belief-Desire-Intention Reinforcement Learning (BDI-RL) Approach |
|
Perrusquia, Adolfo | Cranfield University |
Panda, Deepak Kumar | Cranfield University |
Guo, Weisi | Cranfield University |
Keywords: Machine Learning, AI and Applications, Application of Artificial Intelligence
Abstract: Explaining autonomy is becoming a crucial factor in the design of trustworthy autonomous platforms in both transport and smart living sectors. Interpretable reinforcement learning (RL) is an emerging research area that aims to explain why an autonomous platform adopts an action or set of actions. However, the state-of-the-art has focused on the design of explainable tools as independent modules that are not involved in the decision-making process of the RL agent. In this paper, we propose a novel belief-desire-intention RL (BDI-RL) approach that incorporates the explainable module as a belief model that enhances the learning capabilities of the RL agent as well as action interpretability. To this end, we combine the merits of the Dyna-Q algorithm as backbone RL model and belief maps as explainable element. The combined contribution of these models provides a robust model that emulates better the reasoning process of humans by leveraging beliefs and on-line agent-environment interactions. Simulation experiments are conducted in a grid environment of different sizes and obstacles. Comparisons are also provided to show the benefits of the proposed methodology.
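For readers unfamiliar with the Dyna-Q backbone named above, here is a minimal self-contained sketch on a toy 5-state chain world. The environment, hyperparameters, and episode counts are invented for illustration; the paper's belief-map extension is not shown.

```python
import random

random.seed(0)

# Toy deterministic chain: states 0..4, actions 0 (left) / 1 (right);
# reaching state 4 yields reward 1 and ends the episode.
N = 5

def step(s, a):
    s2 = max(0, min(N - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == N - 1 else 0.0), s2 == N - 1

Q = {(s, a): 0.0 for s in range(N) for a in (0, 1)}
model = {}                                  # learned model: (s, a) -> (s', r)

def eps_greedy(s, eps=0.1):
    if random.random() < eps:
        return random.choice((0, 1))
    # random tie-breaking so unexplored states do not stick to one action
    return max((0, 1), key=lambda a: (Q[(s, a)], random.random()))

def update(s, a, r, s2, alpha=0.5, gamma=0.9):
    Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, 0)], Q[(s2, 1)]) - Q[(s, a)])

for _ in range(30):                         # real-experience episodes
    s, done, steps = 0, False, 0
    while not done and steps < 200:
        a = eps_greedy(s)
        s2, r, done = step(s, a)
        update(s, a, r, s2)                 # direct RL from real experience
        model[(s, a)] = (s2, r)             # model learning
        for _ in range(10):                 # planning with simulated experience
            ps, pa = random.choice(list(model))
            ps2, pr = model[(ps, pa)]
            update(ps, pa, pr, ps2)
        s, steps = s2, steps + 1

print(round(Q[(3, 1)], 3))                  # close to 1: "right" is learned
```

The planning loop is what distinguishes Dyna-Q from plain Q-learning: simulated transitions replayed from the learned model propagate value far faster than real steps alone.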
|
|
09:00-09:15, Paper We-S1-T6.3 | |
FedDLFA: A Robust Defense Mechanism against Label-Flipping Attacks in Federated Learning |
|
Hu, Shiwen | Guangdong University of Foreign Studies |
Wang, Changji | Guangdong University of Foreign Studies |
Li, Yuan | Guangdong University of Foreign Studies |
Keywords: Machine Learning, AI and Applications, Application of Artificial Intelligence
Abstract: Federated learning effectively safeguards data privacy by enabling local model training across multiple clients. However, it remains susceptible to label flipping attacks, which can significantly degrade the global model's performance, even when the proportion of malicious clients is small. Existing defense methods often rely on assumptions regarding client data distribution or attacker proportions, which limits their effectiveness in real-world scenarios characterized by data heterogeneity and unknown attack scale. This paper introduces a novel and robust defense mechanism, FedDLFA. FedDLFA is grounded in a key insight: Due to the adversarial nature of the training objective, malicious clients exhibit significantly distinct neural activation patterns under standardized inputs compared to normal clients. FedDLFA extracts neural activation vectors from all client models, calculates their cosine similarity, constructs a similarity matrix, and applies clustering techniques to divide clients into two groups. Subsequently, it employs a density-size joint scoring mechanism to identify potential clusters of malicious clients. Experiments conducted on the MNIST, FMNIST, and CIFAR10 datasets, under both IID and Non-IID settings, demonstrate that FedDLFA achieves superior accuracy and effectively mitigates attack success rates compared to existing state-of-the-art methods.
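The similarity-then-cluster step described above can be sketched as follows. A tiny 2-means on the cosine-similarity rows stands in for the paper's clustering, the density-size scoring step is omitted, and the activation vectors are invented.

```python
import numpy as np

def split_clients(activations):
    """Group clients into two clusters from the cosine-similarity matrix
    of their activation vectors under standardized inputs."""
    A = np.asarray(activations, dtype=float)
    A /= np.linalg.norm(A, axis=1, keepdims=True)
    S = A @ A.T                                  # cosine-similarity matrix
    centers = S[[0, -1]].copy()                  # init from two extreme rows
    for _ in range(10):                          # tiny 2-means on the rows of S
        d = ((S[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        for k in (0, 1):
            if (labels == k).any():
                centers[k] = S[labels == k].mean(axis=0)
    return labels

# Hypothetical activation vectors: three benign clients, two label-flippers
acts = [[1.0, 0.1], [0.9, 0.0], [1.0, 0.0], [-1.0, 0.1], [-0.9, 0.0]]
print(split_clients(acts))
```

Because the flipped training objective drives activations in a consistently different direction, the two client groups separate cleanly in similarity space.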
|
|
09:15-09:30, Paper We-S1-T6.4 | |
Learning What Matters Now: A Dual‑Critic Context‑Aware RL Framework for Priority‑Driven Information Gain |
|
Panagopoulos, Dimitrios | Cranfield University |
Perrusquia, Adolfo | Cranfield University |
Guo, Weisi | Cranfield University |
Keywords: Machine Learning, Computational Intelligence, AI and Applications
Abstract: Autonomous systems operating in high‑stakes search‑and‑rescue (SAR) missions must continuously gather mission‑critical information while flexibly adapting to shifting operational priorities. We propose CA‑MIQ (Context‑Aware Max‑Information Q‑learning), a lightweight dual‑critic reinforcement learning (RL) framework that dynamically adjusts its exploration strategy whenever mission priorities change. CA‑MIQ pairs a standard extrinsic critic for task reward with an intrinsic critic that fuses state‑novelty, information‑location awareness, and real‑time priority alignment. A built‑in shift detector triggers transient exploration boosts and selective critic resets, allowing the agent to re‑focus after a priority revision. In a simulated SAR grid‑world, where experiments specifically test adaptation to changes in the priority order of information types the agent is expected to focus on, CA‑MIQ achieves nearly four times higher mission‑success rates than baselines after a single priority shift and more than three times better performance in multiple‑shift scenarios, achieving 100% recovery while baseline methods fail to adapt. These results highlight CA‑MIQ’s effectiveness in any discrete environment with piecewise‑stationary information‑value distributions.
|
|
09:30-09:45, Paper We-S1-T6.5 | |
Fast Learning and Robustness in Insect Olfactory Bio-Inspired Neural Networks: Neural Threshold Adaptation and Sparse Coding Strategies |
|
Vázquez Martín, Marcos | Madrid Autonomous University |
Rodriguez, Francisco B. | Universidad Autónoma De Madrid, Escuela Politécnica Superior, Gr |
Keywords: Machine Learning, Computational Life Science, Neural Networks and their Applications
Abstract: Fast learning remains a fundamental challenge in deep learning. Inspired by the biological insect olfactory system, we study fast learning in a model with structured random connectivity, adaptive thresholds, and sparsely activated neurons. In contrast to conventional random feature learning, our model regulates neuronal activity through dynamic threshold adaptation. Experimental results indicate that the model performs well with fewer training iterations while improving generalization and robustness compared to a traditional MLP. The model integrates principles from random feature learning and sparse coding, similar to the mechanisms observed in the insect olfactory system, particularly in the Antennal Lobe and Kenyon cells. These findings support that our Bio-Inspired Olfaction Neural Network (BONN) is a biologically plausible and computationally efficient alternative to fast learning in neural networks.
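The sparse-activation principle borrowed from Kenyon cells can be illustrated with a k-winners-take-all rule; this is a simple stand-in for the paper's dynamic threshold adaptation, with invented activations.

```python
import numpy as np

def kwta(activations, k):
    """k-winners-take-all: keep only the k strongest responses, zeroing
    the rest, to produce a Kenyon-cell-like sparse code."""
    out = np.zeros_like(activations, dtype=float)
    winners = np.argsort(activations)[-k:]
    out[winners] = activations[winners]
    return out

x = np.array([0.1, 0.9, 0.5, 0.3])
print(kwta(x, 2))       # only the two largest activations survive
```

Adapting the threshold (or k) per input keeps the population code at a fixed sparsity regardless of stimulus strength, which is the regulation mechanism the abstract alludes to.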
|
|
09:45-10:00, Paper We-S1-T6.6 | |
LIGA: A LIghtweight CNN Architecture Designed to Classify Popular Music Genres from the Amazonian Region |
|
Gomes, Claudio | UNIFAP |
Tsang, Ing Ren | Universidade Federal De Pernambuco |
Keywords: Multimedia Computation, Neural Networks and their Applications, Transfer Learning
Abstract: Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) can deliver outstanding performance when appropriately optimized for specific tasks. Although these architectures typically require substantial computational resources, lightweight CNN models may achieve comparable or even superior efficiency in well-defined and constrained scenarios. Therefore, the effectiveness of the approach depends on careful hyperparameter tuning and the deliberate selection of an architecture aligned with the characteristics of the target problem. This work presents LIGA, a LIghtweight CNN-based architecture designed to classify popular music Genres from the Amazonian region, including andean, brega, carimbó, cumbia, merengue, pasillo, salsa, and vaqueirada, originating from countries such as Bolivia, Brazil, Colombia, Ecuador, French Guiana, Peru, the Dominican Republic, and Venezuela. In addition to low computational resource usage and improved training speed, LIGA achieved higher precision and accuracy compared to the EfficientNet, MobileNet, ResNet, VGG, Xception, MobileViT, and MaxViT models.
|
|
We-S1-T7 |
Room 0.31 |
Human-Machine Cooperation and Systems 1 |
Regular Papers - HMS |
Chair: Okuda, Hiroyuki | Nagoya University |
Co-Chair: Richards, Dale | Thales UK |
|
08:30-08:45, Paper We-S1-T7.1 | |
ToolPlant: Tool-Based Natural Language Interface for Plant Simulation Models |
|
Zhao, Weikang | Institute of Automation, Chinese Academy of Sciences |
Hua, Jing | Institute of Automation, Chinese Academy of Sciences |
Wang, Xiujuan | Institute of Automation, Chinese Academy of Sciences |
Wang, Haoyu | Institute of Automation, Chinese Academy of Sciences |
Kang, Mengzhen | Institute of Automation, Chinese Academy of Sciences |
Keywords: Human-Computer Interaction, Human-Machine Cooperation and Systems, Human-Machine Interface
Abstract: Plant simulation models play a significant role in agricultural production and ecological research. However, traditional simulation software is often highly specialized and involves complex parameters, posing substantial entry barriers for non-expert users. With the emergence of large language models (LLMs) and advancements in their function calling capabilities, we propose an interactive simulation system that integrates LLM-based function calling with plant simulation models. This system uses the function-calling capabilities of large models (e.g., DeepSeek-V3) to translate natural language instructions into simulation parameters and commands for the GreenLab platform, thereby enabling interactive control of plant growth models. Furthermore, the system provides real-time visualization of simulation outcomes through a front-end 3D interface. Preliminary results demonstrate that this approach effectively lowers the usability threshold of plant simulation models, allowing non-expert users to conveniently conduct simulations and analyze results using natural language.
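As a sketch of the function-calling bridge described above, a tool exposed to the LLM might be declared as below. The tool name and parameter names are illustrative assumptions, not GreenLab's actual interface.

```python
import json

# Hypothetical tool declaration in the common "function calling" JSON format;
# the model fills `arguments` from a natural-language request such as
# "simulate maize for 20 growth cycles".
run_simulation_tool = {
    "type": "function",
    "function": {
        "name": "run_plant_simulation",
        "description": "Run a GreenLab plant-growth simulation and return results.",
        "parameters": {
            "type": "object",
            "properties": {
                "species": {"type": "string", "description": "Plant species name"},
                "growth_cycles": {"type": "integer", "minimum": 1},
                "visualize": {"type": "boolean", "default": True},
            },
            "required": ["species", "growth_cycles"],
        },
    },
}

print(json.dumps(run_simulation_tool, indent=2)[:60])
```

The schema is what lets the LLM translate free-form instructions into validated simulation parameters instead of free text.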
|
|
08:45-09:00, Paper We-S1-T7.2 | |
Online Phase Estimation of Human Oscillatory Motions Using Deep Learning |
|
Grotta, Antonio | Scuola Superiore Meridionale |
De Lellis, Francesco | University of Napoli, Federico II |
Keywords: Human-Machine Cooperation and Systems, Human Factors, Assistive Technology
Abstract: Accurately estimating the phase of oscillatory systems is essential for analyzing cyclic activities such as repetitive gestures in human motion. In this work we introduce a learning-based approach for online phase estimation in three-dimensional motion trajectories, using a Long Short-Term Memory (LSTM) network. A calibration procedure is applied to standardize trajectory position and orientation, ensuring invariance to spatial variations. The proposed model is evaluated on motion capture data and further tested in a dynamical system, where the estimated phase is used as input to a reinforcement learning (RL)-based control to assess its impact on the synchronization of a network of Kuramoto oscillators.
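To illustrate the kind of ground-truth phase label such a network can be trained against, the sketch below recovers the phase of a clean unit-frequency sinusoid analytically. The LSTM in the paper learns this mapping online from calibrated 3D trajectories; this arctangent construction is only a reference signal.

```python
import numpy as np

t = np.linspace(0, 4 * np.pi, 400)
x = np.sin(t)                        # one coordinate of a cyclic gesture

# Analytic phase label via arctan2 of the signal and its time derivative
# (for a unit-frequency sinusoid the derivative needs no rescaling);
# for x = sin(t) this recovers t modulo 2*pi.
phase = np.arctan2(x, np.gradient(x, t))

# wrapped error against the true phase
err = (phase - t + np.pi) % (2 * np.pi) - np.pi
print(float(np.abs(err).max()))      # small: finite-difference error only
```

Such analytically derived phases are a common way to label oscillatory training data when no external reference (e.g., a Kuramoto oscillator state) is available.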
|
|
09:00-09:15, Paper We-S1-T7.3 | |
Strategic Gazing to Enhance AMR-Pedestrian Interaction at Crossings |
|
Otsuka, Kohei | Nagoya University |
Ninomiya, Yuki | Nagoya University |
Okuda, Hiroyuki | Nagoya University |
Matsubayashi, Shota | Nagoya University |
Miwa, Kazuhisa | Nagoya University |
Suzuki, Tatsuya | Nagoya University |
Keywords: Human-Machine Cooperation and Systems, Human Factors, Human-Collaborative Robotics
Abstract: Smooth and safe interactions between pedestrians and Autonomous Mobile Robots (AMRs) are crucial for integrating robotic systems into shared environments. Previous studies on external Human-Machine Interfaces (eHMIs) often employed static information presentation, neglecting dynamic interaction contexts. This study investigates the effectiveness of gaze-based nudging—subtle, intuitive communication via gaze behavior—in facilitating pedestrian role selection (leader or follower) during perpendicular crossing interactions with AMRs. Virtual reality (VR) experiments using Unity and Cybershoes are conducted to evaluate two gaze patterns generated by the AMR: `Leader's gaze', a brief gaze directed toward pedestrians in the early stages of interactions, and `Follower's gaze', a gaze keeping track of pedestrians throughout the interaction until crossing completion. The impact of gaze timing (Early/Late) and initial positional relationships between pedestrians and AMRs (initial difference in TTCP) are systematically analyzed. The results indicate that gaze nudging significantly enhances pedestrians' subjective ratings of safety, smoothness, and understanding of the robot's intention compared to no-gaze conditions. Leader's gaze effectively encourages pedestrians to adopt the follower role under conditions favoring AMR priority (small or negative TTCP difference), whereas Follower's gaze promotes pedestrians' adoption of the leader role under conditions naturally favoring pedestrian priority (larger TTCP difference). Additionally, the effectiveness of gaze nudging strongly depends on interaction timing, with early-stage gaze presentations exhibiting greater influence. These findings confirm the potential of gaze-based nudging as a nonintrusive, context-sensitive strategy for pedestrian-AMR interactions, emphasizing the importance of precisely timed gaze presentations for facilitating pedestrians' natural and intuitive role selection.
|
|
09:15-09:30, Paper We-S1-T7.4 | |
Human-Machine Collaboration in Technical Drawing Analysis |
|
Banotra, Richa | McDermott International |
Dzhusupova, Rimma | McDermott International |
Bosch, Jan | Eindhoven University of Technology |
Holmström Olsson, Helena | Malmo University |
Keywords: Human-Machine Cooperation and Systems, Human-Machine Interaction, Information Systems for Design
Abstract: Interpreting complex engineering drawings requires substantial manual effort in industrial workflows, with engineers spending hundreds of hours verifying correlations between visual elements and structured specifications across thousands of documents. This challenge is particularly acute in the Engineering, Procurement and Construction (EPC) industries, where interpretation errors are propagated into costly procurement and construction mistakes. We propose a cybernetic systems approach using Large Multimodal Models (LMMs) as cognitive partners in engineering documentation workflows, enhancing human capabilities through intelligent assistance in verification and information extraction tasks. To validate this solution, we systematically evaluated five LMMs on 60 piping isometric drawings with varying template structures, measuring both optical character recognition accuracy and correlation capabilities between drawings and their associated Bills of Materials. The results demonstrated significant performance variation between systems, with Claude 3.5 Sonnet and GPT-4o achieving accuracy greater than 70%, while open source alternatives faced challenges with complex layouts and ambiguous visual information. These findings establish benchmarks for human-machine collaboration in engineering documentation processing and provide a framework for integrating intelligent systems into technical workflows across multiple industrial domains.
|
|
09:30-09:45, Paper We-S1-T7.5 | |
Self-Generative Requirements Engineering |
|
Peer, Jordan | Tel Aviv University |
Mordecai, Yaniv | Tel Aviv University |
Reich, Yoram | Tel Aviv University |
Keywords: Human-Machine Cooperation and Systems, Interactive Design Science and Engineering, Assistive Technology
Abstract: In the landscape of systems engineering, the integration of Generative Artificial Intelligence (GenAI) offers transformative potential for Requirements Engineering (RE). This study introduces Self-Generative Requirements Engineering (Self-GenRE), a human-AI reflexive methodology leveraging GenAI with humans in the loop to facilitate continuous requirement gathering improvement in the RE process. The methodology is demonstrated through two use cases: 1) a reflexive application of GenRE to itself, utilizing NLP4ReF – a GenRE tool designed to automate system requirements generation from a partial set of initial requirements, and 2) the application of the enhanced GenRE tool to an IoT project. Self-GenRE streamlines requirement gathering, demonstrating a significant reduction in requirements gathering time from days to minutes, while increasing previously overlooked requirements identification – 42% before and 71% after enhancement. These improvements illustrate the Self-GenRE's effectiveness, enabling GenRE tools to enhance themselves based on the system of interest (SoI) and its requirements, thereby continuously improving both the process and the GenRE tools, ensuring alignment with evolving project needs. Presently, modifying GenRE tools to adhere to new requirements is performed manually; however, we envision future automation of this process, paving the way for self-improving agentic generative RE. The Self-GenRE methodology, process, and tools were developed within a Model-Based Systems Engineering (MBSE) framework, facilitating efficient design modifications. This research highlights the efficacy of integrating GenAI methodologies and processes into systems engineering, paving the way for more robust and adaptable frameworks capable of addressing the evolving challenges in modern technology development and innovation.
|
|
09:45-10:00, Paper We-S1-T7.6 | |
Toward Automated Interdisciplinary Checks in EPC Projects: A Human-AI Collaborative Approach |
|
Banotra, Richa | McDermott International |
Yanez, Daniel | McDermott International |
Balamurugan, Nirmalkumar | McDermott International |
Dzhusupova, Rimma | McDermott International |
Bosch, Jan | Eindhoven University of Technology |
Holmström Olsson, Helena | Malmo University |
Keywords: Human-Machine Cooperation and Systems, Interactive Design Science and Engineering, Augmented Cognition
Abstract: Engineering, Procurement, and Construction (EPC) projects often encounter critical inefficiencies in Interdisciplinary Check (IDC) processes, where engineers manually validate design consistency across drawings and specifications, resulting in bottlenecks in large-scale projects. This paper introduces a proof-of-concept (PoC) human-AI cooperative framework for automating IDC workflows. The framework combines deep learning for document processing, heuristic algorithms for feature extraction, and a Large Multimodal Model (LMM) for cross-domain validation, while preserving engineering decision authority. The framework extracts key engineering data, links components across disciplines, and checks compliance with industry standards and client requirements. Testing on controlled data from a single EPC project (with 25 different engineering drawings) successfully identified all design discrepancies within the predefined validation scope, demonstrating notable workflow efficiency potential. As Phase 1 of a three-phase research cycle, this confirms technical feasibility before expanding to multiple projects and company-wide deployment, showing how human-AI collaboration can revolutionize engineering validation while preserving expert engineering oversight.
|
|
We-S1-T8 |
Room 0.32 |
Control of Uncertain Systems |
Regular Papers - SSE |
Chair: Michalek, Maciej Marcin | Poznan University of Technology |
Co-Chair: Wan, Yan | University of Texas at Arlington |
|
08:30-08:45, Paper We-S1-T8.1 | |
Prescribed-Time Stabilization for Uncertain Euler-Lagrangian Systems: A Cascade and Singularity-Free Design |
|
Zhou, Shuaiyu | Southeast University |
Wei, Yiheng | Southeast University |
He, Wangli | East China University of Science and Technology |
Cao, Jinde | Southeast University |
Keywords: Control of Uncertain Systems
Abstract: Prescribed-time stable systems frequently suffer from infinite gain issues that lead to practical infeasibility. This paper investigates the problem of stabilization for uncertain Euler-Lagrangian systems and proposes a singularity-free prescribed-time stabilization controller. Based on the time space deformation approach, the designed controller ensures singularity avoidance, effectively eliminating the existence of infinite gain. In addition, the considered Euler-Lagrangian systems are subject to unknown nonlinear functions and derivative-bounded external disturbances with unknown bounds. To achieve system stabilization under such uncertainties, this paper formulates multiple sliding manifolds with prescribed-time stability. The controller's effectiveness is validated through simulation studies on a rendezvous formation problem.
|
|
08:45-09:00, Paper We-S1-T8.2 | |
Error-Based ADRC Augmented with a Residual Compensator for Control Performance Improvement with Application to an Aerodynamic Plant |
|
Michalek, Maciej Marcin | Poznan University of Technology |
Debski, Karol | Poznan University of Technology |
Keywords: Control of Uncertain Systems, Adaptive Systems, System Modeling and Control
Abstract: Numerous practical applications have revealed the significant efficiency and high robustness of the Active Disturbance Rejection Control (ADRC) scheme in the tracking control of highly uncertain and perturbed dynamical systems. The error-based ADRC (eADRC) additionally reduces the information required about a reference trajectory, since the availability of the reference signal's time derivatives is not needed. The main limitation, however, of the conventional ADRC (also in the error form) applied with a linear extended state observer (LESO) stems from the need for a sufficiently wide observer bandwidth, which leads to large observer gains and, in the case of noisy feedback, can degrade control performance and excessively amplify high parasitic frequencies in the control signal. In this paper we propose to mitigate this limitation by augmenting the eADRC with an add-on mechanism called the residual compensation, which enables reducing the LESO bandwidth while keeping (or even improving) the resultant tracking control performance in both tracking accuracy and control cost. A presentation of the residual compensation idea for the error-based ADRC is followed by its experimental verification using a highly uncertain laboratory-scale aerodynamic plant.
|
|
09:00-09:15, Paper We-S1-T8.3 | |
Efficient Online Uncertainty Evaluation for Microgrid Systems |
|
Zhou, Siyu | The University of Texas at Arlington |
Wan, Yan | University of Texas at Arlington |
Jiang, Zimin | Stony Brook University |
Zhang, Peng | Stony Brook University |
Lin, Zongli | University of Virginia |
Shamash, Yacov A. | Stony Brook University |
Keywords: Control of Uncertain Systems, Large-Scale System of Systems, Intelligent Power Grid
Abstract: In this study, we present a method for online estimation of the mean performance output in microgrid systems subject to high-dimensional and dynamic uncertainties. We integrate an efficient Multivariate Probabilistic Collocation Method (MPCM) based sampling strategy with a Copula-based conditional probability distribution. This integrated method enables online evaluation of system outputs with high estimation accuracy and efficiency. The online evaluation algorithm is developed, and its theoretical analysis is provided. Real Time Digital Simulator (RTDS) experiments validate the method, demonstrating its feasibility for practical applications.
|
|
09:15-09:30, Paper We-S1-T8.4 | |
Undecidability of Context-Sensitive Field-Insensitive Analyses |
|
Galindo, Carlos | Universitat Politècnica De València |
Martín-Abellán, Carlos | Universitat Politècnica De València |
Silva, Josep | Universitat Politècnica De València |
Keywords: Control of Uncertain Systems, Quality and Reliability Engineering, Technology Assessment
Abstract: Data-dependence analyses (DDA) and data-flow analyses (DFA) are essential software engineering processes automatically carried out in compilers, debuggers, and, in general, in most program analyses for termination, parallelization, optimization, etc. The question "For which classes of programs are DDA and DFA decidable?" remains unanswered, even though it is of fundamental importance because it can pose limits to the algorithms that try to solve the DDA/DFA problem. It is a well-known result that DDA/DFA is undecidable for non-terminating programs. In 2000, Thomas Reps took a big step forward. He proved the undecidability of one specific DDA for programs with recursive functions and composite data structures, and he explained the important implications of that result. It was a major step on the path to proving the undecidability of other analyses. This paper uses Reps' results to prove that DDA and DFA are also undecidable for programs without composite data structures.
|
|
09:30-09:45, Paper We-S1-T8.5 | |
Data-Driven MPC with Data Selection for Flexible Cable-Driven Robotic Arms |
|
Liang, Huayue | Tsinghua University |
Chen, Yanbo | Tsinghua University |
Cheng, Hongyang | Tsinghua University |
Yu, Yanzhao | Tsinghua University |
Li, Shoujie | Tsinghua University |
Tan, Junbo | Tsinghua University |
Wang, Xueqian | Tsinghua University |
Zeng, Long | Tsinghua University |
Keywords: Control of Uncertain Systems, System Modeling and Control, Robotic Systems
Abstract: Flexible cable-driven robotic arms (FCRAs) offer dexterous and compliant motion, but the inherent properties of cables, such as resilience, hysteresis, and friction, often lead to particular difficulties in modeling and control. This paper proposes a model predictive control (MPC) method that relies exclusively on input-output data, without a physical model, to improve the control accuracy of FCRAs. First, we develop an implicit model based on input-output data and integrate it into an MPC optimization framework. Second, a data selection algorithm (DSA) is introduced to filter the data that best characterize the system, thereby reducing the solution time per step to approximately 4 ms, an improvement of nearly 80%. Lastly, the influence of hyperparameters on tracking error is investigated through simulation. The proposed method has been validated on a real FCRA platform, including five-point positioning accuracy tests, a five-point response tracking test, and trajectory tracking for letter drawing. The results demonstrate that the average positioning accuracy is approximately 2.070 mm. Moreover, compared to the PID method with an average tracking error of 1.418°, the proposed method achieves an average tracking error of 0.541°.
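The data selection step described in the abstract above, filtering the recorded data down to the samples that best characterize the system, can be illustrated with a simple nearest-neighbour filter: keep the k input-output pairs closest to the current operating point. This is an illustrative stand-in, not the paper's DSA; the function name and distance criterion are assumptions.

```python
import numpy as np

def select_data(inputs, outputs, current, k=50):
    """Keep the k recorded input-output pairs whose inputs lie closest
    (Euclidean distance) to the current operating point, so the implicit
    model is built from the most relevant data."""
    inputs = np.asarray(inputs, dtype=float)
    outputs = np.asarray(outputs)
    d = np.linalg.norm(inputs - np.asarray(current, dtype=float), axis=1)
    idx = np.argsort(d)[:k]          # indices of the k nearest samples
    return inputs[idx], outputs[idx]
```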
|
|
09:45-10:00, Paper We-S1-T8.6 | |
RL-Enhanced Disturbance-Aware MPC for Fast and Robust UAV Trajectory Tracking |
|
Shen, Haoxun | University of Pennsylvania |
Zhan, Junfei | University of Pennsylvania |
He, Tengjiao | Jinan University |
Keywords: Control of Uncertain Systems, System Modeling and Control, Robotic Systems
Abstract: This paper presents a robust Model Predictive Control (MPC) framework for trajectory tracking of unmanned aerial vehicles (UAVs), enhanced by a reinforcement learning (RL) policy for warm-start initialization and a sliding mode observer (SMO) for disturbance estimation. The proposed architecture addresses multifaceted performance degradation, including residual trajectory tracking errors, reduced control responsiveness, and slow early-stage convergence, primarily caused by external disturbances, model uncertainties, and time-varying environmental conditions. An enhanced adaptive super-twisting SMO is developed to estimate lumped disturbances in real time. These estimates are integrated into the MPC prediction model to compensate for mismatches between nominal dynamics and true system behavior. To accelerate control convergence, particularly in improving early stage performance, an offline RL warm-start policy is used to generate an initial control sequence for MPC. The overall framework preserves the constraint handling and predictive capabilities of conventional MPC, while significantly improving robustness, stability, and tracking accuracy. The simulation results validate the effectiveness of the proposed method in achieving reliable and precise trajectory tracking under challenging and uncertain operating scenarios.
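The disturbance-estimation idea can be illustrated with a minimal scalar super-twisting sliding mode observer for a single velocity channel v_dot = u + d. This is the generic textbook form, not the paper's adaptive variant; the gains and names are illustrative assumptions.

```python
import math

def super_twisting_observer(v_meas, u, dt, k1=2.0, k2=4.0):
    """Estimate a lumped disturbance d in  v_dot = u + d  from samples
    of v, using the super-twisting structure: a square-root injection
    on the observation error plus an integral (sign) term that
    reconstructs d. Returns the disturbance estimate at each step."""
    v_hat, d_hat, d_log = 0.0, 0.0, []
    for v, ui in zip(v_meas, u):
        e = v - v_hat                          # observation error
        sgn = (e > 0) - (e < 0)
        v_hat += dt * (ui + d_hat + k1 * math.sqrt(abs(e)) * sgn)
        d_hat += dt * k2 * sgn                 # integral term -> d estimate
        d_log.append(d_hat)
    return d_log
```

For a constant disturbance the estimate settles around the true value, with residual chattering of order k2*dt in this simple discretization.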
|
|
We-S1-T9 |
Room 0.51 |
Wearable Computing |
Regular Papers - HMS |
Chair: Miyake, Shota | Waseda University |
Co-Chair: Piscitelli, Alfonso | University of Salerno |
|
08:30-08:45, Paper We-S1-T9.1 | |
Relative Grip Strength Estimation Using Optical Sensors: A Lightweight Algorithm |
|
Miyake, Shota | Waseda University |
Tamaki, Emi | H2L, Inc |
Keywords: Wearable Computing
Abstract: This study proposes a lightweight algorithm to estimate relative grip strength using forearm muscle deformation measured by infrared optical sensors. Unlike conventional approaches relying on machine learning or electromyography (EMG), the proposed method estimates grip strength by calculating the absolute difference between sensor values in a relaxed and contracted state, thereby enabling computation without complex signal processing. To validate the method's generalizability across individuals, the algorithm was applied to data collected from seven participants. Each participant wore an optical sensor band (FirstVR) and a pressure-sensing glove and repeatedly gripped and released a cylindrical object while synchronized time-series data were recorded. Relative grip strength was computed using a moving average filter, and correlations with actual grip force values were analyzed. The results showed that the estimated grip strength values generally followed the trends of the measured force data, achieving a coefficient of determination (R²) above 0.5 for most participants. The findings indicate that the proposed method can provide a simple yet effective approach to grip strength estimation with minimal processing, which is potentially useful for wearable systems or real-time applications with limited computational resources.
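The core computation described above, an absolute difference between sensor readings and a relaxed-state baseline followed by a moving average filter, can be sketched in a few lines. The function name, normalization, and window size are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def relative_grip_strength(sensor, baseline, window=5):
    """Relative grip strength from optical-sensor readings:
    |reading - relaxed baseline|, smoothed with a moving average,
    then scaled to [0, 1] by the observed peak."""
    diff = np.abs(np.asarray(sensor, dtype=float) - baseline)
    kernel = np.ones(window) / window          # moving average filter
    smoothed = np.convolve(diff, kernel, mode="same")
    peak = smoothed.max()
    return smoothed / peak if peak > 0 else smoothed
```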
|
|
08:45-09:00, Paper We-S1-T9.2 | |
A Black-Box Positioning Solution for Effortless Integration in Wearable Navigation |
|
Vu, Anh Van | Korea Advanced Institute of Science and Technology |
Sung, Changmin | Korea Advanced Institute of Science and Technology |
Han, Dongsoo | Korea Advanced Institute of Science and Technology |
Keywords: Wearable Computing
Abstract: Accurate localization is critical for a wide range of applications, including data navigation, augmented reality, safety management, and healthcare. However, achieving accurate positioning remains challenging, especially in GPS-degraded environments. Pedestrian Dead Reckoning (PDR), which estimates position based on inertial sensor data, offers a promising alternative but typically demands substantial effort in algorithm design and system integration. This complexity often leads to prolonged development timelines and elevated costs. To address these challenges, we present the PDR Sensor (PDRS), a plug-and-play, black-box PDR module optimized for seamless integration into wearable and IoT platforms. PDRS integrates a microprocessor and an inertial measurement unit (IMU) within a compact, low-power module. It executes real-time sensor fusion onboard, providing position updates via standard interfaces (I²C, SPI, UART). Experimental evaluations demonstrate its high robustness, achieving a step detection error of just 0.16% over 1250 steps and a traveled distance error of 1.87% under diverse conditions. The system further exhibits low latency (2.5 ms) and modest power consumption (52.35 mA). Unlike conventional studies, the PDRS leverages a holistic co-design of hardware and software to enhance performance and integration aspects, simplifying adoption without requiring deep expertise. By lowering technical barriers, PDRS enables developers and researchers to prioritize innovation in IoT and wearable technologies, accelerating prototyping, reducing development time, and advancing industry progress. Future work aims to enhance accuracy, reduce power usage and module size, and integrate AI-driven techniques to better mitigate drift.
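The step-detection accuracy quoted above rests on the classic PDR building block: peak detection on the accelerometer magnitude. The module's internals are not published, so the following is only a generic sketch; the threshold and minimum peak spacing are assumptions.

```python
def count_steps(acc_mag, threshold=10.5, min_gap=10):
    """Count steps as local maxima of the accelerometer magnitude
    (m/s^2) that exceed a threshold, with a minimum sample gap between
    consecutive steps to reject bounce."""
    steps, last = 0, -min_gap
    for i in range(1, len(acc_mag) - 1):
        is_peak = acc_mag[i] > acc_mag[i - 1] and acc_mag[i] >= acc_mag[i + 1]
        if is_peak and acc_mag[i] > threshold and i - last >= min_gap:
            steps += 1
            last = i
    return steps
```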
|
|
09:00-09:15, Paper We-S1-T9.3 | |
Presentation of Eight Motion Direction to the Forearm Utilizing Synthesis of Stimuli Generated by Cloth Deformation |
|
Fukatsu, Haru | Nagoya University |
Yamaguchi, Takuma | Nagoya University |
Funabora, Yuki | Nagoya University |
Doki, Shinji | Nagoya University |
Keywords: Wearable Computing, Haptic Systems
Abstract: This paper presents a wearable haptic device that conveys eight motion directions to the forearm by utilizing cloth deformation. Inducing kinesthetic perception through wearable haptic devices is expected to have applications in VR, rehabilitation, and teleoperation. The demand for haptic devices made of flexible materials that can provide intuitive feedback has rapidly increased. In this paper, we developed a novel haptic device for the forearm that consists of thin artificial muscles and existing clothing. The cloth deformation caused by the contraction of the artificial muscles allows the wearer to perceive four directions of forearm motion, and the perception experiment showed 98.8% accuracy. Furthermore, experimental results showed that synthesized stimuli can extend the wearer's kinesthetic perception to eight directions. The findings provide important guidelines for the design of future haptic devices.
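The synthesis principle, driving two orthogonal cloth-deformation stimuli at once so the wearer perceives an intermediate direction, amounts to a vector sum of the two stimulus directions. A hypothetical sketch (actuator naming and the linear-sum model are assumptions):

```python
import math

def synthesized_direction(up, right):
    """Perceived direction (degrees, counter-clockwise from 'right')
    when an 'up' stimulus and a 'right' stimulus of the given
    intensities are applied together, modeled as a vector sum."""
    return math.degrees(math.atan2(up, right)) % 360.0
```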
|
|
09:15-09:30, Paper We-S1-T9.4 | |
Development of a Wearable Cyborg HAL for Functional Improvement of Twisting Movements through Coordinated Hip-Trunk Motion |
|
Matsuura, Mitsuki | University of Tsukuba |
Uehara, Akira | University of Tsukuba |
Sankai, Yoshiyuki | University of Tsukuba |
Kawamoto, Hiroaki | University of Tsukuba |
Keywords: Wearable Computing, Human-Computer Interaction, Assistive Technology
Abstract: Locomotive syndrome is a condition in which motor function during walking deteriorates due to motor unit disorders, increasing the risk of requiring nursing care. This is a serious problem that reduces the quality of life. In this study, we focused on cybernics treatment, a method in which a wearable cyborg facilitates voluntary movement in paralyzed body parts, thereby providing sensory feedback from the paralyzed periphery to induce neural plasticity. Facilitating movements that activate the muscles along the spiral line—a myofascial chain that spirals across the body—is expected to improve coordination between the lower limbs and the trunk in patients with impaired locomotion, thereby enhancing their walking function. This study aimed to develop a method for activating the muscle groups along the spiral line responsible for coordinated hip-trunk motion using a wearable cyborg hybrid assistive limbs (HAL) through cybernics treatment, and to confirm the feasibility of this method in assisting twisting movements through coordinated hip-trunk motion in a fundamental experiment. The system consists of a trunk twisting unit and a hip flexion unit. By mechanically linking these components and implementing a control system that synchronizes trunk twisting and hip flexion based on the wearer’s intended movement, the system enables coordinated hip-trunk motion assistance. We conducted a fundamental experiment on an able-bodied adult male. Our results confirmed the presence of assistive torque during trunk twisting and the synchronization between trunk rotation and lateral bending, thereby confirming the feasibility of assisting hip flexion and trunk twisting in accordance with the wearer’s movement intention.
|
|
09:30-09:45, Paper We-S1-T9.5 | |
A Hardware-Separated Standard Leads ECG Monitoring System |
|
Niu, Mengting | University of Science and Technology of China |
Lu, Guorui | Leiden University |
Liang, Zhen | Loughborough University |
Liu, Boyan | University of Science and Technology of China |
Cai, Xiaohui | University of Science and Technology of China |
Keywords: Wearable Computing, Human-Computer Interaction, Medical Informatics
Abstract: Since the invention of the electrocardiogram (ECG), significant advancements have been made in its acquisition methods. However, achieving both comfort during the measurement process and standardization remains a challenge. In this study, we present a novel signal routing method: by dividing the pathway into an on-body part and an environmental part, with the connection between the two enabled by physical contact, we are able to move the rigid hardware away from the human body. Based on this signal routing method, we design and implement an ECG signal acquisition system capable of measuring standard leads ECG. The Pearson correlation coefficient between the ECG signals acquired by our system with dry electrodes and the standard system with silver/silver-chloride (Ag/AgCl) electrodes reaches up to 0.98 in limb leads and up to 0.94 in chest leads, which demonstrates that our system has the potential to acquire the standard leads ECG signal.
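The agreement figures quoted above are per-lead Pearson correlation coefficients. For reference, such a score can be computed directly (the function name is an assumption):

```python
import numpy as np

def lead_agreement(ref_ecg, test_ecg):
    """Pearson correlation between the same ECG lead acquired by the
    reference Ag/AgCl system and by the dry-electrode system."""
    return float(np.corrcoef(ref_ecg, test_ecg)[0, 1])
```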
|
|
09:45-10:00, Paper We-S1-T9.6 | |
Evaluating Touch and Crown Interactions on a Linear QWERTY Soft Keyboard on Smartwatches |
|
Costagliola, Gennaro | University of Salerno |
De Rosa, Mattia | University of Salerno |
Fiumarella, Camilla | Università Degli Studi Di Salerno |
Fuccella, Vittorio | University of Salerno |
Piscitelli, Alfonso | University of Salerno |
Keywords: Wearable Computing, User Interface Design, Human-Computer Interaction
Abstract: Typing on a smartwatch presents significant challenges due to several factors, including the small size of these devices, the lack of tactile feedback, and the fat finger problem. This study presents an empirical analysis of two distinct text-entry methods for a one-line QWERTY keyboard: a traditional touch-based interaction and an input technique that leverages the rotary-crown, a feature commonly found on a substantial number of smartwatch devices. Interaction via the rotating crown enables text entry even when touch input is impractical—such as when wearing gloves, in wet conditions, or in cases of reduced precision like the 'fat finger' problem. By rotating the rotary-crown, the user can navigate to and select the desired character, which is then input by pressing the corresponding key to confirm the selection. The method was evaluated in comparison to the touch-interaction: the results demonstrated that participants achieved higher accuracy with the rotary-crown input, with a Total Error Rate of 9%; conversely, participants exhibited faster performance with the touch-interaction, achieving a typing speed of 9.3 words per minute.
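The two reported measures follow the standard text-entry definitions: words per minute computed with five characters per word, and Total Error Rate combining corrected and uncorrected errors. A sketch of those standard formulas (the study's exact counting procedure is not restated here):

```python
def words_per_minute(transcribed, seconds):
    """Entry speed: (|T| - 1) characters over the trial time,
    at 5 characters per word."""
    return (len(transcribed) - 1) / seconds * 60.0 / 5.0

def total_error_rate(correct, incorrect_fixed, incorrect_not_fixed):
    """Total Error Rate = (INF + IF) / (C + INF + IF): uncorrected
    plus corrected errors over all character-level actions."""
    return (incorrect_not_fixed + incorrect_fixed) / (
        correct + incorrect_not_fixed + incorrect_fixed)
```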
|
|
We-S1-T10 |
Room 0.90 |
Information Systems for Design and Engineering |
Regular Papers - HMS |
Chair: Roshinta, Trisna Ari | Budapest University of Technology and Economics |
Co-Chair: Leyli-abadi, Milad | IRT SystemX |
|
08:30-08:45, Paper We-S1-T10.1 | |
Enhancing Model Transparency with Causality-Aware Surrogate Frameworks in Explainable AI |
|
Roshinta, Trisna Ari | Budapest University of Technology and Economics |
Szűcs, Gábor | Budapest University of Technology and Economics |
Keywords: Ethics of AI and Pervasive Systems, Information Systems for Design, Networking and Decision-Making
Abstract: Explainable Artificial Intelligence (XAI) plays a crucial role in high-stakes decision-making by ensuring that machine learning models provide clear and trustworthy explanations. However, many existing interpretability methods, such as SHAP and Partial Dependence Plot (PDP), struggle to differentiate between correlation and causality. To overcome this challenge, we introduce a causality-aware surrogate modeling framework that improves the global interpretability of complex models. Our approach combines Probability of Sufficiency (PS), Probability of Necessity (PN), and Probability of Causality (PoC) with decision tree-based rule extraction to identify rules and features that have a direct causal impact on the target outcome. Experiments on the German Credit dataset reveal that certain rules exhibit strong sufficiency while showing weak necessity, highlighting key causal factors in loan approval decisions. Among these, savings and duration stand out as critical features for counterfactual reasoning. By ensuring that extracted rules capture true causal relationships rather than misleading correlations, our method enhances model transparency, trustworthiness, and counterfactual fundamentals.
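Under exogeneity and monotonicity, the probabilities of necessity and sufficiency for a binary feature have closed-form point estimates (the Tian-Pearl bounds collapse to point values). The following sketch illustrates that identification step for a single binary feature; the assumptions, and the function shape, are illustrative rather than the paper's full framework.

```python
def pn_ps(x, y):
    """Point estimates of the Probability of Necessity (PN) and
    Probability of Sufficiency (PS) for binary lists x (feature) and
    y (outcome), via the Tian-Pearl formulas:
        PN = (P(y|x=1) - P(y|x=0)) / P(y|x=1)
        PS = (P(y|x=1) - P(y|x=0)) / (1 - P(y|x=0))
    Valid only under exogeneity and monotonicity."""
    n1 = sum(x)
    n0 = len(x) - n1
    p1 = sum(yi for xi, yi in zip(x, y) if xi == 1) / max(n1, 1)
    p0 = sum(yi for xi, yi in zip(x, y) if xi == 0) / max(n0, 1)
    pn = (p1 - p0) / p1 if p1 > 0 else 0.0
    ps = (p1 - p0) / (1 - p0) if p0 < 1 else 0.0
    return pn, ps
```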
|
|
08:45-09:00, Paper We-S1-T10.2 | |
A Conceptual Framework for AI-Based Decision Systems in Critical Infrastructures |
|
Leyli-abadi, Milad | IRT SystemX |
Bessa, Ricardo | INESC TEC |
Viebahn, Jan | TenneT Transmission System Operator |
Boos, Daniel | Swiss Federal Railways |
Borst, Clark | Delft University of Technology |
Castagna, Alberto | EnliteAI |
Chavarriaga, Ricardo | Zurich University of Applied Sciences ZHAW |
Hassouna, Mohamed | Fraunhofer IEE |
Lemetayer, Bruno | RTE |
Leto, Giulia | Delft University of Technology |
Marot, Antoine | Company |
Meddeb, Maroua | IRT SystemX |
Meyer, Manuel | Flatland Association |
Schiaffonati, Viola | Politecnico Di Milano |
Schneider, Manuel | Flatland Association |
Waefler, Toni | University of Applied Sciences and Arts Northwestern Switzerland |
Yagoubi, Mouadh | IRT SystemX |
Keywords: Human-centered Learning, Human-Machine Interaction, Networking and Decision-Making
Abstract: The interaction between humans and AI in safety-critical systems presents a unique set of challenges that remain only partially addressed by existing frameworks. These challenges stem from the complex interplay of requirements for transparency, trust, and explainability, coupled with the necessity for robust and safe decision-making. A framework that holistically integrates human and AI capabilities while addressing these concerns is therefore needed, bridging the critical gaps in designing, deploying, and maintaining safe and effective systems. This paper proposes a holistic conceptual framework for critical infrastructures by adopting an interdisciplinary approach. It integrates traditionally distinct fields such as mathematics, decision theory, computer science, philosophy, psychology, and cognitive engineering and draws on specialized engineering domains, particularly energy, mobility, and aeronautics. Its flexibility is further demonstrated through a case study on power grid management.
|
|
09:00-09:15, Paper We-S1-T10.3 | |
Intelligent Chess Robot Tutor: A Low-Cost, Vision-Guided Multi-Level System with Real-Time Tracking and Advanced Gameplay Analysis |
|
Mdimegh, Iyed | National Institute of Applied Sciences and Technology (INSAT) |
Hamzaoui, Mohamed | National Institute of Applied Sciences and Technology (INSAT) |
Bouzidi, Malak | National Institute of Applied Sciences & Technology (INSAT) |
Kasmi, Rayen | National Institute of Applied Sciences and Technology (INSAT) |
Zghibi, Ahmed | National Institute of Applied Science and Technology |
Zouari, Fayez | National Institute of Applied Science and Technology (INSAT) |
Keywords: Human-centered Learning, Human-Computer Interaction, Human-Collaborative Robotics
Abstract: Chess is a highly popular board game that is still practiced by individuals of different ages, despite the rise of digital games. Aiming to update chess education without compromising its classical charm, this paper introduces Roboknight, a low-cost, modular robotic platform designed to promote interactive and fun chess learning among students of all proficiency levels. The platform integrates a 4-degree-of-freedom (4-DOF) robotic arm and a dual-power control system to provide robust operation. It employs computer vision techniques, including YOLO-based detection and FEN encoding, for real-time move detection and verification. A central logic module oversees the game flow, applying predefined rules and managing interactions between the vision module, the control module, and the chess engine. A web-based dashboard further adds to the potential for remote monitoring and performance analysis, thus enabling large-scale deployment in educational settings. Although the existing prototype delivers a functional and enjoyable learning experience, future upgrades will focus on multi-device synchronization, conversational tutoring using chatbot integration, and improved energy efficiency through smart power management.
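FEN encoding, which the abstract names as the interface between the vision module and the chess engine, run-length-encodes the empty squares of each rank. A minimal sketch of that encoding for one rank:

```python
def board_to_fen_rank(rank):
    """Encode one rank (list of 8 piece letters, '' for empty squares)
    into its FEN form, e.g. ['r','','','','k','','','r'] -> 'r3k2r'."""
    out, empties = "", 0
    for square in rank:
        if square == "":
            empties += 1                 # extend the run of empty squares
        else:
            if empties:
                out += str(empties)
                empties = 0
            out += square
    if empties:
        out += str(empties)
    return out
```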
|
|
09:15-09:30, Paper We-S1-T10.4 | |
Conceptualisation of Collaboration Awareness in Next-Generation Collaborative Environments |
|
Bravo, Crescencio | University of Castilla-La Mancha |
Molina, Ana I. | University of Castilla-La Mancha |
Gallardo, Jesús | University of Zaragoza |
Ortega-Cordovilla, Manuel | University of Castilla-La Mancha |
Keywords: Telepresence, User Interface Design, Human-Computer Interaction
Abstract: Society and organizations are undergoing a profound digital transformation, driven by emerging technological advances and new paradigms that address the growing demands of productive, commercial, service-related, and social activities. This evolution has led to the emergence of so-called Next-Generation Collaborative Environments, where providing awareness remains a critical factor for usability, efficiency and a satisfactory user experience. In this work, we analyse some significant awareness frameworks from the literature and propose a new conceptualisation tailored to these emergent environments, aimed at groupware designers. Our proposal is novel in both its scope and its non-linear approach to the awareness areas and dimensions it encompasses. This conceptual schema has been preliminarily validated through its application to the modelling of a diverse set of case studies.
|
|
09:30-09:45, Paper We-S1-T10.5 | |
Improved Fireworks Algorithm-Enhanced Single-Objective Hybrid Disassembly Line Balancing with Machine Wear Rates Considered (I) |
|
Feng, Yujie | LiaoNing Petrochemical University |
Guo, Xiwang | Liaoning Petrochemical University |
Wang, Jiacun | Monmouth University |
Tang, Ying | Rowan University |
Wang, Weitian | Montclair State University |
Hu, Bin | Kean University |
Gao, Claire | Livingston High School |
Wang, Jun | College of Computer Science and Technology Shenyang University O |
Keywords: Design Methods, Resilience Engineering, Human-Collaborative Robotics
Abstract: As the demand for disassembling end-of-life products grows, limitations in traditional disassembly line design, low efficiency, and high resource consumption become increasingly evident, particularly in large-scale disassembly tasks, where the cost of conventional remanufacturing rises and existing technologies fail to meet high-efficiency requirements. The integration of robots into disassembly lines is a promising solution to alleviate these issues. This work presents a multi-product hybrid disassembly line balancing problem that considers machine wear rates and establishes a mixed-integer programming model guided by profit maximization to address it. An improved fireworks algorithm is used in the proposed approach. The developed solution is compared with genetic and ant colony algorithms. Evaluation results and analysis demonstrate the competitive efficiency and stability of our approach.
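In a fireworks algorithm, the spark budget is divided so that better fireworks explode into more sparks. The paper's improved variant is not reproduced here; the following is the standard allocation rule for a minimization problem, with illustrative parameters.

```python
def spark_counts(fitness, m=20, eps=1e-12):
    """Standard fireworks-algorithm spark allocation: firework i gets
    m * (f_max - f_i) / sum_j (f_max - f_j) explosion sparks (at least
    one), so better (lower-fitness) solutions are searched harder."""
    f_max = max(fitness)
    weights = [f_max - f + eps for f in fitness]
    total = sum(weights)
    return [max(1, round(m * w / total)) for w in weights]
```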
|
|
09:45-10:00, Paper We-S1-T10.6 | |
Visual-Based Spatial Audio Generation System for Multi-Speaker Environments |
|
Liu, Xiaojing | Queen Mary University of London |
Gurelli, Ogulcan | Queen Mary University of London |
Wang, Yan | Xidian University |
Reiss, Joshua | Queen Mary University of London |
Keywords: Multimedia Systems
Abstract: In multimedia applications such as films and video games, spatial audio techniques are widely employed to enhance user experiences by simulating 3D sound: transforming mono audio into binaural formats. However, this process is often complex and labor-intensive for sound designers, requiring precise synchronization of audio with the spatial positions of visual components. To address these challenges, we propose a visual-based spatial audio generation system - an automated system that integrates YOLOv8-based face detection, monocular depth estimation, and spatial audio techniques. Notably, the system operates without requiring additional binaural dataset training. The proposed system is evaluated against an existing spatial audio generation system using objective metrics. Experimental results demonstrate that our method significantly improves spatial consistency between audio and video, enhances speech quality, and performs robustly in multi-speaker scenarios. By streamlining the audio-visual alignment process, the proposed system enables sound engineers to achieve high-quality results efficiently, making it a valuable tool for professionals in multimedia production.
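A minimal stand-in for the rendering stage: constant-power panning driven by the detected horizontal position of a face, with 1/distance gain from the estimated depth. Full binaural output would use HRTFs; this sketch only shows the geometry-to-audio mapping, and all names and formulas are simplifying assumptions.

```python
import numpy as np

def pan_mono_to_stereo(mono, x_norm, depth_m):
    """Place a mono signal in a stereo field: constant-power panning
    from the normalized horizontal position x_norm (0 = left, 1 =
    right) and simple 1/distance attenuation from depth in meters."""
    theta = x_norm * np.pi / 2                 # panning angle
    gain = 1.0 / max(depth_m, 1.0)             # distance attenuation
    signal = np.asarray(mono, dtype=float)
    left = np.cos(theta) * gain * signal
    right = np.sin(theta) * gain * signal
    return np.stack([left, right])
```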
|
|
We-S1-T11 |
Room 0.94 |
Smart Factories with Artificial Intelligence: Advancing Sustainable
Manufacturing & ALPS-CPS: Advances in Learning Paradigms for Smart
Cyber-Physical Systems |
Special Sessions: SSE |
Chair: Feroskhan, Mir | Nanyang Technological University Singapore |
Co-Chair: Wang, Jiacun | Monmouth University |
Organizer: Guo, Xiwang | Liaoning Petrochemical University |
Organizer: Wan, Yan | University of Texas at Arlington |
Organizer: Xie, Junfei | San Diego State University |
Organizer: Wang, Jiacun | Monmouth University |
|
08:30-08:45, Paper We-S1-T11.1 | |
Disassembly and Assembly Line Balancing Problem with Robot Movement Space Constraints Solved Using the Improved Parallel A2C Algorithm (I) |
|
Zeng, Wenjing | Liaoning Petrochemical University |
Guo, Xiwang | Liaoning Petrochemical University |
Wang, Jiacun | Monmouth University |
Qin, Shujin | Shangqiu Normal University |
Qi, Liang | Shandong University of Science and Technology |
Hu, Bin | Kean University |
Chen, Siqi | College of Computer Science and Technology Shenyang University O |
Wang, Jun | College of Computer Science and Technology Shenyang University O |
Keywords: Manufacturing Automation and Systems, Cyber-physical systems
Abstract: The disassembly and assembly line balancing problem (DALP) is a critical task in industrial production, involving the efficient organization of disassembly and assembly tasks to improve the productivity and flexibility of production lines. In practical applications, task allocation, robot movement, and workstation layout optimization are key factors affecting production efficiency. This study proposes an improved parallel advantage actor-critic algorithm to address DALP with space constraints due to robot movement. Considering the limitations of workstation space, this approach optimizes the robot's movement paths between workstations, reduces the cost of opening workstations, and optimizes task allocation strategies. To enhance the convergence speed and stability of the conventional parallel A2C algorithm, action space optimization and a greedy strategy are incorporated into the algorithm. Experimental results demonstrate that the improved parallel advantage actor-critic outperforms the A2C and AC algorithms in terms of efficiency and performance, particularly in handling disassembly tasks with space constraints, significantly improving the operational efficiency and economic benefits of the production line.
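The greedy strategy folded into the parallel A2C can be pictured as epsilon-greedy selection over a masked action space, where infeasible actions (e.g. those violating robot movement-space constraints) are excluded; this sketch is illustrative, not the paper's implementation:

```python
import numpy as np

def select_action(policy_probs, feasible_mask, epsilon, rng=None):
    """Pick a disassembly/assembly action: with probability 1 - epsilon
    take the policy's best feasible action (greedy); otherwise sample a
    feasible action uniformly. Actions failing the space-constraint
    mask are never selected."""
    if rng is None:
        rng = np.random.default_rng(0)
    masked = np.where(feasible_mask, policy_probs, 0.0)
    feasible = np.flatnonzero(feasible_mask)
    if rng.random() < epsilon:
        return int(rng.choice(feasible))  # exploratory feasible action
    return int(np.argmax(masked))         # greedy feasible action
```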
|
|
08:45-09:00, Paper We-S1-T11.2 | |
Solving the Circular Disassembly Line Balancing Problem in Shifts Considering Human Learning Effect Based on IMPALA Algorithm (I) |
|
Zhang, Haitao | Liaoning Petrochemical University |
Guo, Xiwang | Liaoning Petrochemical University |
Wang, Jiacun | Monmouth University |
Qin, Shujin | Shangqiu Normal University |
Hu, Bin | Kean University |
Qi, Liang | Shandong University of Science and Technology |
Chen, Siqi | College of Computer Science and Technology Shenyang University O |
Wang, Jun | College of Computer Science and Technology Shenyang University O |
Keywords: Manufacturing Automation and Systems, Cyber-physical systems
Abstract: Product disassembly is significant for recycling scrapped products and reducing environmental pollution and resource waste. The recovery, reuse, and recycling of industrial products is crucial in modern industry. Manual disassembly efficiency significantly impacts the disassembly line's overall effectiveness, particularly through workers' skill levels and learning efficiency. This paper proposes a multi-period personnel scheduling problem that considers worker learning effects. A mixed integer programming model for the disassembly line balancing problem is established to maximize disassembly profit. This problem is solved using a new reinforcement learning algorithm, the importance-weighted actor-learner architecture (IMPALA). The correctness and effectiveness of the proposed algorithm are verified through comparative experiments against IBM's well-known CPLEX optimizer and several popular peer algorithms.
|
|
09:00-09:15, Paper We-S1-T11.3 | |
An Improved Reinforcement Learning-Based UAV Obstacle Avoidance Framework Using PPO-CMA (I) |
|
Chen, Yuqi | Nanyang Technological University |
Gao, Junjie | Nanyang Technological University |
Deng, Yaosheng | Nanyang Technological University |
Feroskhan, Mir | Nanyang Technological University Singapore |
Keywords: Robotic Systems
Abstract: In recent years, the widespread adoption of unmanned aerial vehicles (UAVs) has increased the demand for advanced obstacle avoidance capabilities. Traditional path planning algorithms often exhibit low efficiency and poor adaptability, making them unsuitable for dynamic environments. To address these limitations, researchers have explored reinforcement learning (RL)-based approaches, with Proximal Policy Optimization (PPO) widely used for its stable policy updates and improved sample efficiency. However, PPO suffers from slow convergence, which limits its real-time applicability. To overcome this issue, this paper integrates the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) into PPO and proposes PPO-CMA, an enhanced algorithm for end-to-end UAV path planning. The PPO-CMA network, leveraging depth images and UAV pose feedback, generates continuous control actions, combining CMA-ES adaptive search with PPO policy optimization to improve convergence speed and learning efficiency. The proposed method is evaluated in the VisFly simulation environment, demonstrating significantly faster convergence compared to traditional PPO while ensuring accurate target-reaching and obstacle avoidance. Real-world experimental results further validate the effectiveness, robustness, and practical applicability of PPO-CMA for UAV navigation and motion planning.
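The CMA-ES ingredient can be illustrated with a stripped-down evolution-strategy generation; this sketch keeps an isotropic step size instead of adapting a full covariance matrix and omits the coupling to PPO, so it only conveys the population-search idea:

```python
import numpy as np

def es_generation(theta, fitness_fn, pop_size=20, elite_frac=0.25,
                  sigma=0.1, rng=None):
    """One generation of a simplified (mu, lambda) evolution strategy:
    sample perturbed parameter vectors, rank them by fitness, and
    recombine the elites. Full CMA-ES would additionally adapt a
    covariance matrix; an isotropic step size is kept for brevity."""
    rng = np.random.default_rng(rng)
    candidates = theta + sigma * rng.standard_normal((pop_size, theta.size))
    scores = np.array([fitness_fn(c) for c in candidates])
    n_elite = max(1, int(pop_size * elite_frac))
    elites = candidates[np.argsort(scores)[::-1][:n_elite]]
    return elites.mean(axis=0)  # recombined parameter vector

# Toy usage: climb toward a known target parameter vector.
target = np.ones(3)
fitness = lambda x: -np.sum((x - target) ** 2)
theta = np.zeros(3)
for g in range(40):
    theta = es_generation(theta, fitness, rng=g)
```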
|
|
09:15-09:30, Paper We-S1-T11.4 | |
Physics-Informed Split Extended Dynamic Mode Decomposition and Real-Time Sequential Action Control of Multirotors with Partially Known Dynamics (I) |
|
Kamath, Archit Krishna | Nanyang Technological University Singapore |
Nahavandi, Saeid | Swinburne University of Technology |
Anavatti, Sreenatha | University of New South Wales |
Feroskhan, Mir | Nanyang Technological University Singapore |
Keywords: Autonomous Vehicle, System Modeling and Control, Mechatronics
Abstract: This paper addresses the challenge of real-time control of multirotors subjected to partially known and unmodeled dynamics. A physics-informed Koopman operator framework is proposed, where the known physical dynamics and unknown residual effects are separated using a Strang splitting approach. The continuous-time Koopman operator is trained on physics-derived trajectories, while the discrete-time Koopman operator is learned from real-world trajectory data, enabling a data-efficient and globally linearizable model of the multirotor dynamics. The learned linear model is subsequently used to design a discrete-time Sequential Action Control (SAC) policy for real-time trajectory tracking. Experimental validation on a quadrotor platform tracking a lemniscate trajectory demonstrates that the proposed PI-EDMD-based SAC controller achieves superior tracking accuracy and up to 67% lower control energy consumption compared to baseline nonlinear SAC and LQR controllers. These results highlight the effectiveness of the proposed framework in enhancing both trajectory fidelity and actuation efficiency for multirotors.
|
|
09:30-09:45, Paper We-S1-T11.5 | |
A Physics-Informed Approach to Intelligent Actuator-Fault Diagnosis in Multirotor UAVs (I) |
|
T, Thanaraj | Nanyang Technological University, Singapore |
Kamath, Archit Krishna | Nanyang Technological University Singapore |
Feroskhan, Mir | Nanyang Technological University Singapore |
Keywords: Fault Monitoring and Diagnosis, Cyber-physical systems, Quality and Reliability Engineering
Abstract: Multirotor unmanned aerial vehicles (UAVs) frequently experience degraded control authority due to partial actuator faults, compromising mission reliability and safety. Purely data-driven fault diagnostic methods, although effective, typically demand extensive labelled datasets and lack direct interpretability. Physics-informed neural networks (PINNs), which incorporate physical laws directly into their learning process, offer a promising alternative by enabling data-efficient training and interpretable results. This study proposes a PINN for actuator fault diagnosis in quadrotor UAVs by embedding discrete Newton–Euler residuals within its loss function, ensuring predictions remain consistent with rigid-body dynamics. A quadrotor UAV is modelled in a high-fidelity simulation environment, and flight data from these simulations are collected for analysis. A sensitivity study is subsequently conducted, testing the PINN against varied fault magnitudes, fault intervals, and different training dataset sizes (100%, 50%, and 30%). The proposed PINN outperforms a similarly sized multilayer perceptron (MLP), reducing fault detection delay by approximately 20%, consistently achieving macro F1 scores above 0.90, and improving prediction accuracy in terms of R^2 and RMSE, even with limited labelled data.
|
|
09:45-10:00, Paper We-S1-T11.6 | |
Physics-Informed Neural Network Modelling of Kinematics for Flapping-Wing Mechanism in Aerial Robots (I) |
|
Teo, Wei Rui Clarence | Nanyang Technological University |
Sivakumar, Anush Kumar | Nanyang Technological University |
Feroskhan, Mir | Nanyang Technological University Singapore |
Keywords: Cyber-physical systems, System Modeling and Control, Mechatronics
Abstract: This study presents a data-driven and physics-informed learning framework for predicting the kinematics of an elastic-incorporated flapping-wing device. An experimental setup was developed where a flapping-wing device was designed and built to collect time-series kinematic data. A physics-informed neural network (PINN) was trained on the experimental dataset, achieving an RMSE of 0.3140 on the holdout set. In comparison, a multi-layer perceptron (MLP) of equivalent architecture yielded a higher RMSE of 0.426 and exhibited memorization. The incorporation of physics-based constraints into the PINN enhanced predictive accuracy and improved generalization, particularly in capturing the oscillatory behavior of the flapping system. These results demonstrate the effectiveness of PINNs in predicting the kinematics of an elastic-integrated flapping mechanism. Future directions include transforming the kinematic model to a full dynamic formulation, which would allow for precise control of flapping-wing prototypes with integrated elasticity.
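Both PINN papers above rest on the same loss structure: a data-fit term plus a weighted physics-residual penalty. A minimal sketch (the residual form and the weighting lambda are assumptions of this illustration):

```python
import numpy as np

def pinn_loss(pred, target, physics_residual, lam=1.0):
    """Physics-informed loss sketch: mean-squared data-fit error plus a
    weighted mean-squared penalty on the physics residual (e.g. discrete
    equations of motion evaluated on the network's predictions)."""
    data_term = np.mean((np.asarray(pred) - np.asarray(target)) ** 2)
    physics_term = np.mean(np.asarray(physics_residual) ** 2)
    return data_term + lam * physics_term
```

When the residual vanishes, the loss reduces to the ordinary supervised objective; increasing lam trades data fit for physical consistency.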
|
|
We-S1-T12 |
Room 0.95 |
System Modeling and Control & Conflict Resolution |
Special Sessions: SSE |
Chair: Fang, Liping | Toronto Metropolitan University |
Co-Chair: Flores, Huber | University of Tartu |
Organizer: Fang, Liping | Toronto Metropolitan University |
Organizer: Hipel, Keith | University of Waterloo |
|
08:30-08:45, Paper We-S1-T12.1 | |
Modeling Drone Deliveries Using Petri Nets: An Evaluation on Collision Recovery and Energy Efficiency |
|
Feitosa, Leonel | Federal University of Piauí (UFPI) |
Vandirleya, Barbosa | UFPI |
Silva, Luis Guilherme | Universidade Federal Do Piauí |
Fé, Iure | UFPI |
Martins Campos de Oliveira, Fabíola | Federal University of ABC |
Bittencourt, Luiz Fernando | Unicamp |
Flores, Huber | University of Tartu |
Silva, Francisco Airton | Federal University of Piauí |
Keywords: System Modeling and Control
Abstract: Drone-based goods delivery has emerged as a potentially viable solution and is seeing growing adoption. By operating through aerial routes, drones significantly reduce delivery times and expand operational reach. However, covering large areas requires prolonged flights, leading to high battery consumption and an increased risk of collisions, particularly in densely populated regions. This study presents a Stochastic Petri Net model to evaluate drone performance, focusing on metrics such as utilization, delivery rate, mean mission time, and drop probability. Additionally, energy consumption and carbon footprint metrics were investigated to assess the environmental impact of drone operations. The model incorporates factors such as strategic recharging points and collision probability, providing insights into drone performance under high-demand scenarios.
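Stochastic Petri nets of this kind are commonly evaluated by simulating exponentially timed transitions; the three-state drone cycle below is an illustrative stand-in for the paper's model (state names, rates, and the utilization metric are assumptions):

```python
import numpy as np

def simulate_drone_cycle(rates, horizon=10_000.0, seed=0):
    """Simulate a drone cycling idle -> flying -> charging -> idle with
    exponentially distributed sojourn times (as in a stochastic Petri
    net / CTMC). Returns utilization = fraction of time spent flying."""
    rng = np.random.default_rng(seed)
    nxt = {"idle": "flying", "flying": "charging", "charging": "idle"}
    state, t, flying = "idle", 0.0, 0.0
    while t < horizon:
        dwell = rng.exponential(1.0 / rates[state])
        if state == "flying":
            flying += min(dwell, horizon - t)  # clip at the horizon
        t += dwell
        state = nxt[state]
    return flying / horizon

util = simulate_drone_cycle({"idle": 2.0, "flying": 1.0, "charging": 0.5})
```

With these rates the mean sojourn times are 0.5, 1.0, and 2.0 time units, so utilization should be near 1.0 / 3.5 ≈ 0.29.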
|
|
08:45-09:00, Paper We-S1-T12.2 | |
Modeling and Non-Linear Attitude and Altitude Control for a New Design of Birotor UAV |
|
Lebid, Moussaab | Ecole Militaire Polytechnique |
Araar, Oualid | Ecole Militaire Polytechnique |
Bouzid, Yasser | Ecole Militaire Polytechnique |
Hadj Sadok, Brahim | Ecole Militaire Polytechnique |
Rezig, Chakib | Ecole Militaire Polytechnique |
Keywords: System Modeling and Control, Modeling of Autonomous Systems
Abstract: This paper presents the modeling and control of a tilt-birotor UAV, an emerging aerial platform that combines efficiency and agility. Departing from conventional multirotor configurations, the birotor drone utilizes two tiltable rotors to generate lift and directional control. This not only reduces the drone's weight but also allows better maneuverability, at the expense of higher nonlinearity and couplings in the drone model. To improve tracking performance and robustness, an attitude and altitude stabilization strategy, based on a nonlinear backstepping controller, is proposed in this work. To generate the low-level control inputs, the proposed controller is combined with both linear and nonlinear allocation strategies. Simulation results, conducted on the full nonlinear model of a V-shaped birotor design, confirm the viability of the proposed control strategy.
|
|
09:00-09:15, Paper We-S1-T12.3 | |
Robust Localization of Mobile Robots in Changing Environments Using Visual SLAM Enhanced with Semantic Information |
|
Seki, Yoshiaki | University of Tsukuba |
Kawamoto, Hiroaki | University of Tsukuba |
Uehara, Akira | University of Tsukuba |
Ohya, Akihisa | University of Tsukuba |
Yorozu, Ayanori | University of Tsukuba |
Keywords: Robotic Systems, Modeling of Autonomous Systems, System Modeling and Control
Abstract: Dynamic environments present a significant challenge for mobile robot localization, as changes in object configuration and human movement can lead to incorrect position estimates. Conventional methods struggle with these variations, often resulting in mismatches and localization failures. To address this issue, we propose a Visual SLAM-based localization method that integrates semantic information using an RGB-D camera. Our approach utilizes instance segmentation to enhance localization robustness in 3D environments. During map creation, feature points extracted from images in a static environment are assigned label IDs corresponding to object regions, embedding semantic information into the 3D map. During localization, feature matching is constrained by label consistency, ensuring that only points with identical labels are matched. Additionally, feature points associated with high-mobility objects, such as chairs or people, are excluded to prevent errors caused by environmental changes. Furthermore, during pose optimization, the information matrix is weighted according to semantic labels, giving higher importance to static objects and reducing the influence of dynamic ones. We validate the proposed method through real-world experiments in cluttered indoor environments with varying furniture arrangements and human presence. The results demonstrate that our approach significantly reduces mismatches and improves localization accuracy compared to conventional ORB-SLAM2. Unlike existing methods that update maps online, our approach maintains a fixed pre-constructed map, reducing inconsistencies over time. This study enhances localization robustness in changing environments by incorporating semantic constraints into Visual SLAM.
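The label-constrained matching step can be sketched as nearest-neighbour search restricted to identical semantic labels, with high-mobility classes dropped entirely; the descriptor format and the dynamic-label set are illustrative assumptions, not the paper's ORB-SLAM2 pipeline:

```python
import numpy as np

DYNAMIC_LABELS = {"person", "chair"}  # high-mobility classes (assumed set)

def label_constrained_matches(map_desc, map_lbl, q_desc, q_lbl):
    """Match each query feature to the nearest map descriptor sharing
    the same semantic label; features on dynamic objects are excluded
    so that environmental changes cannot introduce mismatches."""
    matches = []
    for i, (desc, lbl) in enumerate(zip(q_desc, q_lbl)):
        if lbl in DYNAMIC_LABELS:
            continue  # skip features on high-mobility objects
        candidates = [j for j, ml in enumerate(map_lbl) if ml == lbl]
        if not candidates:
            continue  # no map feature carries this label
        dists = [float(np.linalg.norm(np.asarray(map_desc[j]) - np.asarray(desc)))
                 for j in candidates]
        matches.append((i, candidates[int(np.argmin(dists))]))
    return matches
```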
|
|
09:15-09:30, Paper We-S1-T12.4 | |
An Integrated Framework for Exploring the Minimum Cost Conflict Mediation Path in Graph Model (I) |
|
Zhu, Yan | Sichuan University |
Dong, Yucheng | Sichuan University |
Zhang, Hengjie | Hohai University |
Fang, Liping | Toronto Metropolitan University |
Keywords: Conflict Resolution
Abstract: The inverse analysis of the graph model for conflict resolution (GMCR) determines the preferences required to ensure that a desired resolution is an equilibrium. However, existing studies have yet to examine the transition mechanism from the current state to the desired resolution. To address this issue, this study introduces an integrated minimum cost conflict mediation path (I-MCCMP) model within the GMCR framework. This model not only identifies the necessary preferences to establish the desired resolution as an equilibrium but also determines the optimal conflict mediation path to achieve it. To demonstrate its applicability, the proposed model is applied to the real-world Lake Gisborne Conflict.
|
|
09:30-09:45, Paper We-S1-T12.5 | |
A Multi-Objective Simulation-Based Optimization Framework for Multi-Agent Phased Evacuation Strategies in Fire (I) |
|
Golshani, Feze | Toronto Metropolitan University |
Fang, Liping | Toronto Metropolitan University |
Keywords: Conflict Resolution, Decision Support Systems
Abstract: This study proposes a multi-agent-based simulation and optimization framework to enhance phased evacuation strategies by integrating fire dynamics, evacuees' characteristics, and evacuation processes. Unlike conventional models, this framework probabilistically assesses the integrated effects of heat, asphyxiant gases, and irritant gases on the incapacitation of different evacuees. The Non-dominated Sorting Genetic Algorithm III (NSGA-III), coupled with a trained neural network, is employed to find optimal phased evacuation strategies considering the Total Evacuation Time (TET), congestion, and fire impact. To assess its effectiveness, the framework is applied to a fire scenario in an educational building, comparing simultaneous and phased evacuation strategies. Results demonstrate that the selected phased evacuation strategy significantly enhances evacuation efficiency, reducing TET, congestion, and fire impact by 14.8%, 33.3%, and 13.1%, respectively. These findings underscore the framework's potential for improving fire evacuation planning, providing a simulation-based approach to optimizing evacuation strategies and enhancing safety in fire emergencies.
|
|
09:45-10:00, Paper We-S1-T12.6 | |
Mathematical Representation and Analysis of the Longest-Valid-Chain Rule in Blockchain System Based on Social Choice Correspondence (I) |
|
Geng, Guangping | Institute of Science Tokyo |
Inohara, Takehiro | Institute of Science Tokyo |
Keywords: Conflict Resolution
Abstract: In this paper, we establish a mathematical representation of the Longest-Valid-Chain rule in the blockchain system based on the social choice correspondence. The "longest"-ness is defined by the resources used to create a new block, and the "valid"-ness is defined by the compatibility between protocols. Instead of a single individual or an entity making the decision, the Longest-Valid-Chain rule is an important method for decision-making in the blockchain-based decentralized organization. The Longest-Valid-Chain rule can transform a list of individual preferences into a social choice when the protocol fork occurs in the blockchain system. According to the analysis, we prove that the Longest-Valid-Chain rule is not Paretian, but is anonymous, neutral, and strongly monotonic.
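Operationally, the rule filters chains by protocol validity and then maximizes over the resources backing each chain; a minimal sketch (field names and tie-breaking are illustrative):

```python
def longest_valid_chain(chains, is_compatible):
    """Longest-Valid-Chain rule, sketched: discard chains whose protocol
    is incompatible ('valid'-ness), then pick the chain backed by the
    most block-creation resources ('longest'-ness). Ties are broken
    arbitrarily by max() here."""
    valid = [c for c in chains if is_compatible(c["protocol"])]
    if not valid:
        raise ValueError("no protocol-compatible chain")
    return max(valid, key=lambda c: c["resources"])
```

In social-choice terms, the `resources` field aggregates the individual participants' preferences expressed through where they commit their block-creation resources.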
|
|
We-S1-T13 |
Room 0.96 |
AI and Applications 2 |
Regular Papers - Cybernetics |
Chair: Rinaldi, Antonio Maria | University of Naples Federico II |
Co-Chair: Shamsi, Afshar | Concordia University |
|
08:30-08:45, Paper We-S1-T13.1 | |
ETAGE: Enhanced Test Time Adaptation with Integrated Entropy and Gradient Norms for Robust Model Performance |
|
Shamsi, Afshar | Concordia University |
Becirovic, Rejisa | UNSW Sydney |
Argha, Ahmadreza | UNSW Sydney |
Abbasnejad, Ehsan | Monash University |
Alinejad-Rokny, Hamid | UNSW Sydney |
Mohammadi, Arash | Concordia University |
Keywords: AI and Applications, Deep Learning, Machine Learning
Abstract: Test time adaptation (TTA) equips deep learning models to handle unseen test data that deviates from the training distribution, even when source data is inaccessible. While traditional TTA methods often rely on entropy as a confidence metric, its effectiveness can be limited, particularly in biased scenarios. Extending existing approaches like the Pseudo Label Probability Difference (PLPD), we introduce ETAGE, a refined TTA method that integrates entropy minimization with gradient norms and PLPD to enhance sample selection and adaptation. Our method prioritizes samples that are less likely to cause instability by excluding samples that combine high entropy with high gradient norms from adaptation, thus avoiding the overfitting to noise often observed in previous methods. Extensive experiments on the CIFAR-10-C and CIFAR-100-C datasets demonstrate that our approach outperforms existing TTA techniques, particularly in challenging and biased scenarios, leading to more robust and consistent model performance across diverse test scenarios. The codebase for ETAGE is available at https://github.com/afsharshamsi/ETAGE.
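The sample-selection rule can be sketched as a boolean mask over a test batch: samples combining high entropy with a high gradient norm are excluded from adaptation. The thresholds and the softmax-entropy form are assumptions of this illustration, not ETAGE's exact criterion:

```python
import numpy as np

def softmax(logits):
    """Numerically stable row-wise softmax."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def adaptation_mask(logits, grad_norms, ent_thresh, grad_thresh):
    """Exclude test samples whose predictions have BOTH high entropy
    and a high gradient norm: adapting on them tends to amplify noise.
    Returns True for samples kept for adaptation."""
    p = softmax(np.asarray(logits, dtype=float))
    entropy = -(p * np.log(p + 1e-12)).sum(axis=1)
    unstable = (entropy > ent_thresh) & (np.asarray(grad_norms) > grad_thresh)
    return ~unstable
```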
|
|
08:45-09:00, Paper We-S1-T13.2 | |
MMFformer: Multimodal Fusion Transformer Network for Depression Detection |
|
Haque, Md Rezwanul | University of Waterloo |
Islam, Md. Milon | University of Waterloo |
Raju, S M Taslim Uddin | University of Waterloo |
Hamdi, Altaheri | University of Waterloo |
Nassar, Lobna | American University of Ras Al Khaimah |
Karray, Fakhreddine | University of Waterloo |
Keywords: AI and Applications, Computational Intelligence, Neural Networks and their Applications
Abstract: Depression is a serious mental health illness that significantly affects an individual's well-being and quality of life, making early detection crucial for adequate care and treatment. Detecting depression is often difficult, as it is based primarily on subjective evaluations during clinical interviews. Hence, the early diagnosis of depression, thanks to the content of social networks, has become a prominent research area. The extensive and diverse nature of user-generated information poses a significant challenge, limiting the accurate extraction of relevant temporal information and the effective fusion of data across multiple modalities. This paper introduces MMFformer, a multimodal depression detection network designed to retrieve depressive spatio-temporal high-level patterns from multimodal social media information. A transformer network with residual connections captures spatial features from videos, and a transformer encoder is exploited to capture important temporal dynamics in audio. Moreover, the fusion architecture fuses the extracted features through late and intermediate fusion strategies to identify the most relevant intermodal correlations among them. Finally, the proposed network is assessed on two large-scale depression detection datasets, and the results clearly reveal that it surpasses existing state-of-the-art approaches, improving the F1-Score by 13.92% for the D-Vlog dataset and 7.74% for the LMVD dataset. The code is made available publicly at https://github.com/rezwanh001/Large-Scale-Multimodal-Depression-Detection.
|
|
09:00-09:15, Paper We-S1-T13.3 | |
Breaking Monocular Depth Estimation with DepthHack: A Black-Box 3D Physical Adversarial Attack for Autonomous Driving |
|
Sun, Yibo | Zhengzhou University |
Shi, Yucheng | Zhengzhou University |
Shi, Lei | Zhengzhou University |
Wei, Lin | Zhengzhou University |
Gao, Yufei | Zhengzhou University |
Li, Qiushi | China Mobile Online Services Company Limited |
Li, Wenwen | China Mobile Online Services Company Limited |
Keywords: AI and Applications, Deep Learning, Multimedia Computation
Abstract: Monocular depth estimation (MDE) is essential for autonomous driving, enabling 3D perception without costly LiDAR, but existing 2D white-box adversarial attacks against MDE lack robustness to viewpoint and real-world variations. We propose DepthHack, the first 3D black-box adversarial attack framework for MDE, which uses probabilistic sampling and score-based optimization to craft robust 3D adversarial textures. DepthHack ensures robustness across diverse weather conditions and viewpoints, achieving a mean depth estimation error of 8.88 m on Monodepth2 (Carla), outperforming HardBeat by 2.1% with only 60k queries. This work exposes MDE vulnerabilities and strengthens the security of autonomous systems. Experiments on 4 MDE models and 2 datasets validate its superior performance, efficiency, and cross-dataset generalization.
|
|
09:15-09:30, Paper We-S1-T13.4 | |
GSCL-KT: Improving Knowledge Tracing Via Intra-Group Similarity Contrastive Learning |
|
Li, Changlong | East China Normal University |
Wang, Su | East China Normal University |
Hu, Wenxin | East China Normal University |
Keywords: AI and Applications, Deep Learning, Representation Learning
Abstract: Knowledge tracing models have long grappled with the dual challenges of data sparsity and the limited ability to capture group learning patterns. Current contrastive learning paradigms in knowledge tracing (e.g., CL4KT framework) primarily employ sequence augmentation strategies to alleviate data scarcity constraints. However, such approaches frequently compromise semantic coherence during the augmentation process while failing to account for inherent similarity patterns within learner cohorts. To overcome these limitations, this paper introduces the GSCL-KT model (Group Similarity Contrastive Learning for Knowledge Tracing), which, for the first time, incorporates a group-similarity-aware contrastive learning mechanism into the knowledge tracing domain. Unlike traditional approaches that rely on manual data augmentation, GSCL-KT dynamically identifies positive and negative sample pairs from educationally homogeneous groups, enabling the discovery of group-level cognitive patterns while maintaining semantic coherence. The proposed model incorporates several advanced optimization strategies, including the Talking-Heads attention mechanism for fine-grained interaction modeling, the ContraNorm method for feature distribution regularization, and a correlation network enhanced by label dependencies. Experimental results on four real-world educational datasets demonstrate that GSCL-KT consistently outperforms existing baseline models, achieving the highest AUC and competitive performance across metrics.
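Group-based pairing reduces to a supervised-contrastive objective in which same-group embeddings act as positives; the sketch below assumes that pairing rule and a fixed temperature, which may differ from GSCL-KT's exact formulation:

```python
import numpy as np

def group_contrastive_loss(z, group_ids, tau=0.5):
    """Supervised-contrastive loss over learner embeddings: for each
    anchor, embeddings from the same group are positives and all other
    embeddings are negatives. No data augmentation is needed."""
    z = np.asarray(z, dtype=float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / tau
    g = np.asarray(group_ids)
    n, total, count = len(z), 0.0, 0
    for i in range(n):
        others = np.arange(n) != i
        pos = (g == g[i]) & others
        if not pos.any():
            continue  # anchor has no same-group partner
        exp_sim = np.exp(sim[i][others])
        total += -np.log(exp_sim[pos[others]] / exp_sim.sum()).mean()
        count += 1
    return total / max(count, 1)
```

When group labels align with embedding similarity, the loss is lower than under a shuffled grouping, which is the signal the model trains on.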
|
|
09:30-09:45, Paper We-S1-T13.5 | |
ArchERL: Evolutionary Reinforcement Learning Framework for Efficient Hardware Architecture Design without Domain Knowledge |
|
Huang, Yuwei | Southern University of Science and Technology |
Shi, Yuhui | Southern University of Science and Technology |
Keywords: AI and Applications, Evolutionary Computation, Machine Learning
Abstract: With the stagnation of Moore's Law scaling, efficient hardware architectures employing compute-in-memory paradigms have become increasingly crucial to sustain AI innovations. This motivates the development of high-throughput architectures with balanced energy-latency profiles through machine learning algorithms. However, for human-in-the-loop optimization methods, human labor is involved in most of the iterations, whereas for automated methods, either expert domain knowledge is required in the design or the search space is relatively small. To address these challenges, we propose ArchERL, an evolutionary reinforcement learning (ERL) framework that represents the first application of ERL to general hardware architecture design. Specifically, ArchERL tightly couples population-based evolutionary algorithm for global exploration with an actor-critic reinforcement learning module for prior-free policy refinement, and ArchERL employs periodic weight synchronization and gradient feedback between the two modules to achieve efficient collaborative search and rapid convergence. To evaluate the proposed method, extensive experiments are conducted in multiple simulated hardware environments, including the DRAM controller and DNN mapping. The results demonstrate that ArchERL achieves state-of-the-art performance and outperforms widely used baselines in both efficiency and effectiveness.
|
|
09:45-10:00, Paper We-S1-T13.6 | |
A Multi-Sector Approach to Retrieval Augmented Generation in Agriculture |
|
Benfenati, Domenico | University of Naples Federico II |
Rinaldi, Antonio Maria | University of Naples Federico II |
Russo, Cristiano | Department of Electrical Engineering and Information Technology |
Tommasino, Cristian | Department of Electrical Engineering and Information Technology |
Keywords: AI and Applications, Expert and Knowledge-Based Systems, Deep Learning
Abstract: Agriculture is a multidisciplinary domain that spans a wide range of sectors, including horticulture, animal husbandry, and water and land resource management. The intricate complexity of this field arises from the interconnectivity of its diverse sub-sectors, requiring advanced sector-specific expertise to address interdisciplinary challenges effectively. Traditional methods for integrating knowledge across sub-sectors are often limited in their ability to deliver precise and context-specific insights, particularly for complex, multi-sector inquiries. In this paper, we present a multi-sector framework for question-answering in agriculture. Our framework utilizes multiple Large Language Models, each of which is specialized in a sector of agriculture. By adopting the Mixture of Experts paradigm, our framework synthesizes the expertise of multiple Large Language Models to produce accurate and interdisciplinary answers to questions spanning multiple agricultural sectors. We have evaluated the performance of our approach using qualitative metrics. Our results demonstrate the effectiveness of our approach compared with a baseline Large Language Model question-answering strategy.
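The Mixture-of-Experts routing can be pictured as keyword-based dispatch to sector-specific models followed by a naive synthesis step; the expert names, the routing heuristic, and the concatenation are all illustrative assumptions, not the paper's LLM pipeline:

```python
def route_and_answer(question, experts, sector_keywords):
    """Dispatch a question to every sector expert whose keywords appear
    in it, then concatenate their answers as a placeholder synthesis.
    experts: dict mapping sector name -> callable(question) -> str."""
    q = question.lower()
    hits = [sector for sector, kws in sector_keywords.items()
            if any(k in q for k in kws)]
    if not hits:
        hits = list(experts)  # fall back to consulting all experts
    return " ".join(experts[s](question) for s in hits)
```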
|
|
We-S1-T14 |
Room 0.97 |
Exploring Shared and Cooperative Control Systems: Models, Patterns and
Assessment Methodologies 1 |
Special Sessions: HMS |
Chair: Varga, Balint | Karlsruhe Institute of Technology (KIT), Campus South |
Co-Chair: Mandischer, Nils | University of Augsburg |
Organizer: Varga, Balint | Karlsruhe Institute of Technology (KIT), Campus South |
Organizer: Jost, Céline | Paris 8 University |
Organizer: Mandischer, Nils | University of Augsburg |
Organizer: Flemisch, Frank | RWTH Aachen University/Fraunhofer |
Organizer: Pool, Daan Marinus | TU Delft |
Organizer: Carlson, Tom | University College London |
Organizer: Shen, Weiming | Huazhong University of Science and Technology |
Organizer: Liu, Peter X. | CARLETON UNIVERSITY |
|
08:30-08:45, Paper We-S1-T14.1 | |
Conjugated Capabilities: Interrelations of Elementary Human Capabilities and Their Implication on Human-Machine Task Allocation and Capability Testing Procedures (I) |
|
Mandischer, Nils | University of Augsburg |
Füller, Larissa | University of Augsburg |
Alles, Torsten | Institute for Quality Assurance in Prevention and Rehabilitation |
Flemisch, Frank | RWTH Aachen University/Fraunhofer |
Mikelsons, Lars | University of Augsburg |
Keywords: Human Performance Modeling, Human-Collaborative Robotics, Shared Control
Abstract: Human and automation capabilities are the foundation of every human-autonomy interaction and interaction pattern. Therefore, machines need to understand the capacity and performance of human activity, and adapt their own behavior accordingly. In this work, we address the concept of conjugated capabilities, i.e., capabilities that are dependent or interrelated and between which effort can be distributed. These may be used to overcome human limitations by shifting effort from a deficient capability to a conjugated capability with performative resources. For example, a limited arm's reach may be compensated for by tilting the torso forward. We analyze the interrelation between elementary capabilities within the IMBA standard to uncover potential conjugation, and show evidence in data from post-rehabilitation patients. From the conjugated capabilities, within the example application of stationary manufacturing, we create a network of interrelations. This graph enables a manifold of potential uses. We showcase its usage in optimizing IMBA test design to accelerate data recording, and discuss the implications of conjugated capabilities for task allocation between the human and an autonomy.
|
|
08:45-09:00, Paper We-S1-T14.2 | |
An Exploratory Study on Human-Robot Interaction Using Semantics-Based Situational Awareness (I) |
|
Ruan, Tianshu | University of Birmingham |
Ramesh, Aniketh | Extreme Robotics Lab, University of Birmingham |
Stolkin, Rustam | Extreme Robotics Lab, NCNR, University of Birmingham |
Chiou, Manolis | Queen Mary University of London |
Keywords: Human-Machine Interaction, Human-Collaborative Robotics, Human Factors
Abstract: In this paper, we investigate the impact of high-level semantics (evaluation of the environment) on Human-Robot Team (HRT) and Human-Robot Interaction (HRI) in the context of mobile robot deployments. Although semantics has been widely researched in AI, how high-level semantics can benefit the HRT paradigm is underexplored, often fuzzy, and intractable. We applied a semantics-based framework that could reveal different indicators of the environment (i.e. how much semantic information exists) in a mock-up disaster response mission. In such missions, semantics are crucial as the HRT should handle complex situations and respond quickly with correct decisions, where humans might have a high workload. Especially when human operators need to shift their attention between robots and other tasks, they will struggle to build Situational Awareness (SA) quickly. The experiment suggests that the presented semantics: 1) alleviate the perceived human operator's workload; 2) increase the operator's trust in the SA; and 3) help to reduce the reaction time in switching the Level of Autonomy (LoA) when needed. Additionally, we find that participants with higher trust in the system are encouraged by high-level semantics to use teleoperation mode more.
|
|
09:00-09:15, Paper We-S1-T14.3 | |
Acceptance of Haptic Shared Control Design Choices for Car Steering (I) |
|
Huijsing, Kelsey N. | TU Delft, Aerospace Engineering, Control & Simulation |
Pool, Daan Marinus | TU Delft |
van Paassen, Marinus M | Delft University of Technology |
Mulder, Max | Delft University of Technology |
Keywords: Haptic Systems, Shared Control, Human-Machine Cooperation and Systems
Abstract: Haptic shared control systems that support drivers by means of added torques on the steering wheel are often tuned heuristically. To allow for more systematic design, this paper focuses on the Four Design Choices Architecture (FDCA) and systematically analyzes its tuning with an offline simulation model of the driver's control behavior and neuromuscular system. These analyses indicated that within the FDCA the Level of Haptic Support (LoHS), a feedforward channel that supports the negotiation of upcoming curves, is a main contributor to joint system performance. In a driving simulator experiment, the adaptation to and acceptance of different LoHS levels were investigated. Driver acceptance was found to increase with increasing LoHS values up to 1. Objective metrics, including torque conflict (70% reduction), steering effort (81% reduction), steering wheel reversal rate, and lateral deviation, all improved, indicating that with the FDCA a high LoHS is both acceptable and, in fact, preferred.
|
|
09:15-09:30, Paper We-S1-T14.4 | |
Detecting Human Distraction in Manual Control (I) |
|
Li, Y. David | TU Delft, Aerospace Engineering, Control & Simulation |
Pool, Daan Marinus | TU Delft |
Mulder, Max | Delft University of Technology |
Keywords: Human Performance Modeling, Human-Machine Cooperation and Systems, Shared Control
Abstract: InceptionTime neural network models were trained to detect distractions in manual control tasks with pursuit and preview displays. Training and test data were collected in an experiment where ten participants were deliberately distracted from the primary control task using the Surrogate Reference Task (SuRT). Overall, distractions are easier to detect in pursuit tasks, with test accuracies of around 80% and 60% for pursuit and preview data, respectively. With preview, human controllers see the future target trajectory, which enables them to mitigate distraction effects. Unexpectedly, data with longer distractions from 'hard' SuRT tasks are more difficult to classify than 'easy' distractions; an effect attributed to differences in human behavior between the training and test data collection conditions. These results show clear opportunities for neural network models to detect distractions, in real-time, for increasing safety of human-operated vehicles.
|
|
09:30-09:45, Paper We-S1-T14.5 | |
Validation of a Grip Force Scheduled LPV Model of Time-Varying Neuromuscular Admittance (I) |
|
Palings, Rik | TU Delft, Aerospace Engineering, Control & Simulation |
Pool, Daan Marinus | TU Delft |
van Paassen, Marinus M | Delft University of Technology |
Mulder, Max | Delft University of Technology |
Keywords: Shared Control, Human Performance Modeling, Human-Machine Interface
Abstract: Haptic Shared Control (HSC) systems offer a means to support human drivers in the transition to fully-automated driving. Matching HSC system settings to drivers' time-varying neuromuscular system (NMS) dynamics requires real-time HSC adaptations. This paper presents an experimental validation of a previously proposed method for predicting drivers' time-varying neuromuscular admittance using an 'average' grip force scheduled linear parameter-varying (LPV) model. The quality of the LPV model predictions is compared to that of Recursive Least Squares (RLS) fits of an admittance model on the same data. Ten participants performed steering wheel manipulation tasks with steering wheel perturbations that needed to be kept within a certain displacement boundary by adapting their grip force. Time-invariant (TI) and time-varying (TV) boundary levels were used to, respectively, construct and validate the LPV model. Results show that the average relation between admittance and grip force that underlies the current LPV method varies too much between TV and TI tasks, hampering accurate admittance predictions. Compared to the quality-of-fit of 80-90% obtained with RLS on the TV data, the LPV model's predictions are insufficiently accurate and do not exceed 55% on average. An approach that enables individual rather than average LPV models to be constructed directly from TV experiment data needs to be pursued for HSC implementations.
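The RLS benchmark mentioned in the abstract can be illustrated with a generic recursive-least-squares estimator with a forgetting factor. This is a minimal sketch on a toy two-parameter regressor with invented names and data, not the authors' admittance model:

```python
import numpy as np

def rls_fit(phi, y, lam=0.99, delta=1e3):
    """Generic recursive least squares with forgetting factor `lam`.

    phi : (N, p) regressor matrix (e.g. lagged input/output samples)
    y   : (N,) measured output
    Returns the (N, p) parameter estimate trajectory over time.
    """
    n, p = phi.shape
    theta = np.zeros(p)
    P = delta * np.eye(p)               # large initial covariance = weak prior
    history = np.zeros((n, p))
    for k in range(n):
        x = phi[k]
        err = y[k] - x @ theta          # a-priori prediction error
        g = P @ x / (lam + x @ P @ x)   # Kalman-like gain
        theta = theta + g * err
        P = (P - np.outer(g, x) @ P) / lam
        history[k] = theta
    return history

# Track a slowly drifting two-parameter model y = a(t)*u1 + b*u2 (toy data)
rng = np.random.default_rng(0)
N = 2000
u = rng.standard_normal((N, 2))
a = np.linspace(1.0, 2.0, N)            # time-varying gain, drifting 1 -> 2
y = a * u[:, 0] + 0.5 * u[:, 1] + 0.01 * rng.standard_normal(N)
est = rls_fit(u, y)
print(est[-1])  # ≈ [2.0, 0.5]
```

With the forgetting factor below 1, the estimator discounts old samples and follows the drifting parameter, which is why RLS serves as a time-varying reference fit.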
|
|
09:45-10:00, Paper We-S1-T14.6 | |
RTART: A Signal Processing-Informed Neural Network for Cross-Individual Fault Diagnosis (I) |
|
He, Yiming | Huazhong University of Science and Technology |
Zhao, Chao | Huazhong University of Science and Technology |
Shen, Weiming | Huazhong University of Science and Technology |
Keywords: Systems Safety and Security
Abstract: Traditional intelligent models developed under the source device fault diagnosis (SDFD) framework have achieved high accuracy, but may produce unreliable diagnostic results in unseen individual diagnostic scenarios, as they ignore potential individual differences (such as assembly changes and noise interference). This paper proposes a refined trigonometric activation representation transformer (RTART), a signal processing-informed neural network (SPINN) tailored for cross-individual fault diagnosis (CIFD). By integrating the theory of the Sparse Short-Time Fourier Transform (DSTFT) and the Kolmogorov–Arnold Representation Theorem (KART) into the structural design of the Transformer, RTART enhances the extraction of periodic vibration features and frequency modulation diversity. Specifically, the original vibration signals are first segmented into regional patches by a one-dimensional convolutional patch (1-DCP) tokenizer and then input into a multi-scale region pruning (MSRP) module for global feature refinement, which aims to refine decision features to reduce the risk of overfitting due to individual differences. A trigonometric activation representation (TrigAR) module is developed to enhance the reliable feature expression of periodic vibration signals and improve model generalization. The proposed method is validated on a well-known public dataset using the CIFD benchmark. Compared with the most advanced methods, RTART has better generalization ability and achieves a cross-individual accuracy of over 95%.
|
|
We-S1-T15 |
Room 1.85 |
Smart Buildings, Smart Cities and Infrastructures |
Regular Papers - SSE |
Chair: Li, David | Yeshiva University |
Co-Chair: Shrivastav, Chinmay Satish | University of Modena and Reggio Emilia |
|
08:30-08:45, Paper We-S1-T15.1 | |
Scalable Object Geolocation in Traffic Camera Imagery Using 3D World Model |
|
Shrivastav, Chinmay Satish | University of Modena and Reggio Emilia |
Masola, Alessio | University of Modena and Reggio Emilia |
Cavicchioli, Roberto | University of Modena and Reggio Emilia |
Capodieci, Nicola | University of Modena and Reggio Emilia |
Burgio, Paolo | University of Modena and Reggio Emilia |
Keywords: Smart Buildings, Smart Cities and Infrastructures, Digital Twin, Intelligent Transportation Systems
Abstract: The rapid expansion of outdoor traffic camera systems requires efficient methods to accurately estimate the geolocation of objects within their scenes. We present an innovative and scalable framework that fully automates this process by combining easy-to-build 3D world modeling with real-world traffic camera imagery. First, using the Cesium plugin for Unreal Engine, we create detailed and scalable 3D representations of urban environments, leveraging publicly available, highly accurate 3D data. This results in globally curated 3D content, including terrain, imagery, and photogrammetry. The real-world traffic camera imagery is then matched within our model using state-of-the-art feature matching techniques. By estimating the homography between synthetic images from the 3D model and the real images from traffic cameras, we accurately determine the geolocation of observed objects within the scene. This approach not only enhances geolocation accuracy but also enables seamless scalability across diverse urban settings and camera deployments worldwide. Our method significantly reduces the manual effort required for traffic camera calibration, thus streamlining the deployment of intelligent transportation systems at scale. We demonstrate the high performance of our approach through experiments at an urban trial site with multiple smart city cameras and with publicly available cameras around the world. Additionally, we highlight the adaptability of our framework for a wide range of computer vision-based traffic analytics applications, including its potential for drone-based localization.
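The final pixel-to-geolocation step amounts to applying the estimated homography. A minimal numpy sketch, assuming a 3x3 matrix `H` has already been estimated from feature matches between the synthetic render and the camera frame (the toy matrix and coordinates below are invented for illustration):

```python
import numpy as np

def pixel_to_world(H, uv):
    """Map image pixels to ground-plane coordinates via a 3x3 homography.

    H  : (3, 3) pixel-to-world homography (e.g. into a local metric frame)
    uv : (N, 2) pixel coordinates
    """
    pts = np.hstack([uv, np.ones((len(uv), 1))])  # homogeneous pixels
    w = pts @ H.T
    return w[:, :2] / w[:, 2:3]                   # perspective divide

# Toy homography: pure scale + offset. A real H would come from matched
# features between the synthetic 3D-model image and the camera frame.
H = np.array([[0.1, 0.0, 5.0],
              [0.0, 0.1, 2.0],
              [0.0, 0.0, 1.0]])
print(pixel_to_world(H, np.array([[100.0, 40.0]])))  # maps pixel (100, 40) to world (15, 6)
```

In practice the homography would be fit robustly (e.g. with RANSAC over the feature matches) before being applied as above.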
|
|
08:45-09:00, Paper We-S1-T15.2 | |
Enhancing the Adversarial Attack Resilience of Smart Building Environment Control Based on Multi-Agent Deep Reinforcement Learning |
|
Chen, Hongjian | South China Normal University |
Sun, Duoyu | South China Normal University |
Yang, Huan | South China Normal University |
Keywords: Smart Buildings, Smart Cities and Infrastructures, Distributed Intelligent Systems, Cyber-physical systems
Abstract: With the widespread deployment of intelligent building management systems (BMS) in smart buildings, security threats posed by false data injection (FDI) attacks have become increasingly prominent. To address the proven vulnerability of Multi-Agent Deep Reinforcement Learning (MADRL)-driven BMS to FDI intrusions, this study proposes a QMIX framework enhanced by adversarial training. By incorporating adversarial samples crafted via the Fast Gradient Sign Method (FGSM) during training and introducing Noisy Network layers, the framework's robustness against black-box attacks is substantially improved. Experimental results demonstrate that compared to the baseline model without defense mechanisms, the proposed framework achieves significantly smaller performance degradation under black-box attacks across varying intensity levels, particularly under high-intensity attack scenarios, while maintaining stable control performance. Furthermore, we conduct white-box attack experiments to systematically assess model robustness, along with ablation studies quantifying the individual contributions of adversarial training and Noisy Networks to defense efficacy.
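FGSM crafts adversarial samples by stepping along the sign of the loss gradient with a small budget. A framework-agnostic sketch on a toy linear critic, with an analytic gradient; the values and the critic are invented for illustration and are not the paper's QMIX networks:

```python
import numpy as np

def fgsm_perturb(x, grad_x, eps):
    """FGSM: perturb input x by eps along the sign of the loss gradient."""
    return x + eps * np.sign(grad_x)

# Toy differentiable "critic": Q(x) = w.x; the attacker maximizes loss = -Q,
# so the gradient of the loss w.r.t. the observation x is simply -w.
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 1.0, 1.0])       # clean sensor observation
grad_loss_x = -w                     # analytic gradient of -w.x
x_adv = fgsm_perturb(x, grad_loss_x, eps=0.1)
print(x_adv)  # → [0.9 1.1 0.9]
```

Training on such perturbed observations (adversarial training) is what hardens the policy against black-box FDI-style input corruption.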
|
|
09:00-09:15, Paper We-S1-T15.3 | |
Collaborative Indoor Positioning: A Machine Learning Approach for Dynamic Environments with Autonomous Mobile Robots |
|
Moradbeikie, Azin | CITIN |
Amorin, Ivan | CITIN |
Azevedo, Rolando | CITIN |
Jesus, Cristiano | CITIN |
David, Beserra | EPITA |
Ivan Lopes, Sergio | CITIN |
Keywords: Smart Buildings, Smart Cities and Infrastructures, Distributed Intelligent Systems, Robotic Systems
Abstract: Current indoor positioning technologies face significant limitations in dynamic industrial environments, where metallic structures and signal interference degrade accuracy. While Bluetooth Low Energy (BLE) RSSI-based systems offer a cost-effective and energy-efficient solution, their performance is often hampered by path loss variability in dynamic environments. Usually, to improve BLE RSSI-based positioning accuracy, the number of infrastructure nodes (IN) may be increased. However, increasing the number of static infrastructure nodes (SIN) leads to increased complexity, cost, and network load. To address these challenges, we propose a novel collaborative indoor positioning method integrating autonomous mobile robots (AMRs) as mobile infrastructure nodes (MINs). Unlike traditional static deployments, our system leverages AMRs equipped with high-accuracy onboard positioning and BLE receivers to dynamically collect and correlate RSSI data while navigating the environment. This approach enables real-time adaptive localization by fusing MIN data with fixed SIN measurements using a machine learning-based fusion model (combining Neural Networks and Kernel Density Estimation).
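The Kernel Density Estimation side of the fusion model can be illustrated with a plain Gaussian KDE over per-anchor position hypotheses, picking the densest candidate. This numpy-only sketch uses invented hypothesis values and is not the authors' fusion pipeline:

```python
import numpy as np

def gaussian_kde(samples, query, bandwidth):
    """Plain Gaussian kernel density estimate (numpy-only)."""
    d = (query[:, None, :] - samples[None, :, :]) / bandwidth
    k = np.exp(-0.5 * np.sum(d**2, axis=2))
    dim = samples.shape[1]
    return k.mean(axis=1) / (bandwidth * np.sqrt(2 * np.pi)) ** dim

# Position hypotheses produced per BLE anchor (e.g. from MIN and SIN RSSI
# readings); the density peaks where independent hypotheses agree.
hyp = np.array([[2.0, 3.0], [2.1, 3.2], [1.9, 2.9], [7.0, 1.0]])  # one outlier
grid = np.array([[2.0, 3.0], [7.0, 1.0]])                          # candidates
dens = gaussian_kde(hyp, grid, bandwidth=0.5)
best = grid[np.argmax(dens)]
print(best)  # → [2. 3.]
```

The outlier hypothesis contributes little density anywhere, so the estimate is robust to a single bad RSSI-derived fix.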
|
|
09:15-09:30, Paper We-S1-T15.4 | |
A Framework for Indoor Map Layout Construction in Collaborative Positioning: Optimized Analysis of Reachability and Vertical Transitions |
|
Son, Kyuho | Korea Advanced Institute of Science and Technology |
Han, Dongsoo | Korea Advanced Institute of Science and Technology |
Keywords: Smart Buildings, Smart Cities and Infrastructures, System Architecture
Abstract: Collaborative indoor positioning requires maps with explicitly defined spatial connectivity. This paper presents a framework for constructing indoor map layouts using spatial entities and reachability structures. We introduce the Floor-Group-Area (FGA) Matrix Problem to visualize and optimize layout connectivity through column reordering. The framework exports 1-bpp OGMs and spatial metadata in JSON. In user evaluations, trained participants completed full layouts of a four-story museum in 20 minutes, with all computationally intensive algorithms running in a few seconds on a standard PC.
|
|
We-S1-BMI.WS |
Room 0.49&0.50 |
BMI Workshop - Paper Session 3: BCIs in Healthcare and Rehabilitation |
BMI Workshop |
Chair: Power, Sarah | Memorial University of Newfoundland |
|
08:30-08:45, Paper We-S1-BMI.WS.1 | |
Improving Continuous Grasp Force Decoding from EEG with Time-Frequency Regressors and Premotor-Parietal Network Integration |
|
Dangi, Parth | Indian Institute of Technology, Gandhinagar |
Meena, Yogesh | IIT Gandhinagar |
Keywords: Active BMIs, Other Neurotechnology and Brain-Related Topics, BMI Emerging Applications
Abstract: Brain-machine interfaces (BMIs) have significantly advanced neuro-rehabilitation by enhancing motor control. However, accurately decoding continuous grasp force remains a challenge, limiting the effectiveness of BMI applications for fine motor tasks. Current models tend to prioritise algorithmic complexity rather than incorporating neurophysiological insights into force control, which is essential for developing effective neural engineering solutions. To address this, we propose EEGForceMap, an EEG-based methodology that isolates signals from the premotor-parietal region and extracts task-specific components. We construct three distinct time-frequency feature sets, which are validated by comparing them with prior studies, and use them for force prediction with linear, non-linear, and deep learning-based regressors. The performance of these regressors was evaluated on the WAY-EEG-GAL dataset, which includes 12 subjects. Our results show that integrating the EEGForceMap approach with regressor models yields a 61.7% improvement in subject-specific conditions (R² = 0.815) and a 55.7% improvement in subject-independent conditions (R² = 0.785) over state-of-the-art kinematic decoder models. Furthermore, an ablation study confirms that each preprocessing step significantly enhances decoding accuracy. This work contributes to the advancement of responsive BMIs for stroke rehabilitation and assistive robotics by improving EEG-based decoding of dynamic grasp force.
|
|
08:45-09:00, Paper We-S1-BMI.WS.2 | |
Optimal EEG Channel Selection for Alzheimer’s Disease Detection: An Exhaustive Analysis |
|
Li, Taida | University of North Carolina, Charlotte |
Yan, Yujun | Dartmouth College |
Song, WenZhan | University of Georgia |
Zhang, Xiang | UNC Charlotte |
Keywords: BMI Emerging Applications, Other Neurotechnology and Brain-Related Topics
Abstract: Electroencephalography (EEG) has emerged as a highly accessible, affordable, and pervasive method for early detection and continuous monitoring of Alzheimer's Disease (AD). Existing studies usually employ all available EEG channels, although the importance of each individual channel for AD detection remains largely unexplored. This study presents a systematic evaluation of EEG channel selection through exhaustive analysis of all possible channel combinations (1–19 channels) to determine optimal configurations for AD detection. We performed 570 experiments (190 combinations × 3 seeds) using iterative forward selection. Our results demonstrate that: (1) temporal channels (particularly T5) show the highest single-channel predictive value (64.23% F1 for T5 alone); (2) an optimal 9-channel subset {T5, T6, F8, F4, Fp1, O1, P3, P4, F3} achieves peak performance (74.8% F1), outperforming the full 19-channel setup (71.42%); (3) a 5-channel subset {T5, T6, F8, F4, Fp1} obtains a competitive F1 of 73.8%, achieving a trade-off between fewer channels and higher performance. We also introduce a new concept, Marginal Utility (MU), which quantifies the change in AD detection performance when a specific EEG channel is added to the model's input, providing a metric for individual electrode significance. Our data-driven, exhaustive analysis ensures unbiased identification of the most diagnostically relevant channels for Alzheimer's disease. These findings also enable the development of efficient, clinically viable EEG systems for AD detection using 70% fewer channels without sacrificing diagnostic accuracy.
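Iterative forward selection with a Marginal-Utility-style gain can be sketched generically. The `score` function below is a toy stand-in for training and evaluating an AD classifier on a channel subset; the per-channel utility values are invented for illustration, not the paper's F1 scores:

```python
def forward_select(channels, score, k):
    """Greedy forward selection; returns the chosen subset and the
    per-step Marginal Utility (score gain when each channel is added)."""
    chosen, mu = [], []
    while len(chosen) < k:
        best_c, best_gain = None, -float("inf")
        base = score(chosen)
        for c in channels:
            if c in chosen:
                continue
            gain = score(chosen + [c]) - base   # MU of adding channel c
            if gain > best_gain:
                best_c, best_gain = c, gain
        chosen.append(best_c)
        mu.append(best_gain)
    return chosen, mu

# Toy stand-in for "train a model on this subset and return its F1":
# fake per-channel utilities with mildly diminishing returns.
UTIL = {"T5": 0.30, "T6": 0.20, "F8": 0.10, "O1": 0.05}
score = lambda s: sum(UTIL[c] for c in s) * (0.9 ** max(len(s) - 1, 0))
subset, mu = forward_select(list(UTIL), score, k=2)
print(subset)  # → ['T5', 'T6']
```

The MU trace makes the diminishing returns visible, which is the basis for preferring a small subset over the full montage.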
|
|
09:00-09:15, Paper We-S1-BMI.WS.3 | |
Teaching Mu and Beta Modulation During Ankle-Foot Dorsiflexion through Visual and Kinesthetic Neurofeedback-Based Motor Imagery Brain-Computer Interface |
|
Gonzalez-Cely, Aura Ximena | Federal University of Espirito Santo |
Varela Silva, Lucivanio | Edmond and Lily Safra International Institute of Neuroscience |
da Costa, Lucas Jose | Edmond and Lily Safra International Institute of Neuroscience |
Oliveira de Azevedo Dantas, André Felipe | Santos Dumont Institute |
Cunha do Espírito Santo, Caroline | Santos Dumont Institute |
Bastos-Filho, Teodiano Freire | Federal University of Espirito Santo |
Delisle Rodriguez, Denis | Federal University of Espirito Santo |
Keywords: Active BMIs, BMI Emerging Applications, Passive BMIs
Abstract: This study proposes a Brain-Computer Interface (BCI) based on Motor Imagery (MI) for ankle-foot dorsiflexion training, providing Functional Electrical Stimulation (FES) to the tibialis anterior muscle as a form of NeuroFeedback (NF). Riemannian geometry is utilized as a feature extraction technique for more reliable MI-based electroencephalography discrimination, also considering Passive Movement (PM) data. An individual with complete Spinal Cord Injury (SCI) tested the BCI during five sessions, each on a different day, achieving an accuracy of around 0.33, greater than the chance level of 0.25 for this four-class classification system. The mean BCI latency was lower than 240 ms. Significant relative power changes (R) were observed in the mu (8-12 Hz) band for MI of foot dorsiflexion, as well as negative R values centered around Cz, suggesting better MI performance after observing PM as a real visual guide. Notably, our BCI, calibrated with more reliable MI data, enhanced mu (8-12 Hz) rhythm modulation in the operation stage considerably more than high-beta (18-24 Hz) band modulation. These findings are relevant for advancing NF and BCI approaches to restore lower-limb motor functions, as well as to enhance neuroplasticity.
|
|
09:15-09:30, Paper We-S1-BMI.WS.4 | |
Electroencephalography Neuromarkers to Predict the Response of a Multisensory Virtual Reality Nature Immersion Intervention for Patients Diagnosed with Post-Traumatic Stress Disorder |
|
de Jesus Junior, Belmir Jose | Institut National De La Recherche Scientifique - INRS |
Soares Lopes, Marilia Karla | Université Du Québec - Institut National De La Recherche Scienti |
Perreault, Lea | Traumas Cote-Nord |
Roberge, Marie-Claude | Traumas Cote-Nord |
Oliveira, Alcyr | Federal University of Health Sciences of Porto Alegre |
Falk, Tiago H. | INRS-EMT |
Keywords: Other Neurotechnology and Brain-Related Topics, BMI Emerging Applications, Passive BMIs
Abstract: Immersive virtual reality (VR) applications are rapidly expanding across domains, including training, gaming, and healthcare. More recently, multisensory immersive experiences, including olfactory and haptic stimulation, have emerged and shown great promise, especially for interventions in well-being and mental health management. Multisensory experiences, however, are highly subjective (e.g., one subject may like certain smells while another does not), and recent results have suggested that some participants may not respond positively to the treatment. As multisensory VR interventions can be costly and time-consuming for both patients and clinicians, finding neuromarkers that predict intervention outcomes would be invaluable. Here, we take the first steps toward developing a neuromarker to predict the response to a multisensory nature immersion VR intervention. A pilot experiment was performed with twenty patients diagnosed with post-traumatic stress disorder. Potential neuromarkers are extracted from electroencephalography (EEG) signals measured with an instrumented VR headset. We show that some EEG patterns start to differ between responders and non-responders as early as the fourth session, i.e., one-third of the way into the intervention. This suggests that neuromarkers to predict the outcomes of a multisensory VR immersion intervention may exist. These markers could be used not only to save time and resources for clinicians and patients but also to promote precision treatment, where interventions are adjusted to each patient, maximizing success rates.
|
|
09:30-09:45, Paper We-S1-BMI.WS.5 | |
VR-Based Cognitive Training of Short-Term Working Memory in Older Adults |
|
Penaloza, Christian | Mirai Innovation Research Institute |
Valencia, Victor | Advanced Telecommunications Research Institute International |
Choudhury, Nusrat | National Research Council Canada |
Keywords: Other Neurotechnology and Brain-Related Topics, Passive BMIs, BMI Emerging Applications
Abstract: Cognitive decline in aging is a critical issue that often leads to long-term functional impairments, including dementia. Cognitive training aims to maintain or improve abilities such as working memory, attention, and motor skills. Working memory, in particular, significantly influences daily life activities and is known to gradually deteriorate with age. In this study, we propose a cognitive training paradigm that actively engages working memory within a virtual reality (VR) environment. We evaluated the effectiveness of this paradigm through a user study involving 20 older adults, specifically assessing its impact on short-term working memory. Experimental results revealed that, based on a standardized working memory test administered before and after the VR training, 60% of participants improved their performance, 20% maintained their initial performance, and 20% experienced a decline. Additionally, EEG analysis demonstrated that participants who improved exhibited significant differences in alpha-theta band power spectral density compared to those whose performance declined; these frequency bands are known indicators of working memory performance. Our findings suggest that VR-based cognitive training has the potential to enhance short-term working memory in older adults.
|
|
09:45-10:00, Paper We-S1-BMI.WS.6 | |
EEG-Based Cross-Subject Decoding of Motor Imagery for Affected-Hand Finger Tapping in Chronic Stroke Patients |
|
Jang, Yunjeong | The Catholic University of Korea |
Kim, Yun-Hee | Sungkyunkwan University School of Medicine |
Lee, Minji | The Catholic University of Korea |
Keywords: BMI Emerging Applications, Passive BMIs, Active BMIs
Abstract: This study presents a cross-subject electroencephalography (EEG) decoding framework for imagined finger tapping of the affected hand in chronic stroke patients. EEG from 11 participants was recorded using 27 channels during motor imagery of individual fingers. To capture functional connectivity, the weighted phase lag index was computed across six frequency bands. Due to high inter-subject variability, Kruskal–Wallis and Mann–Whitney U tests were used for two-stage feature selection. Selected features were classified using Random Forest, Support Vector Machine, and AdaBoost, with leave-one-subject-out validation. AdaBoost achieved the best performance (39.6% accuracy, 38.7% F1-score), exceeding chance level. Discriminative features were prominent in alpha and delta bands, reflecting activity in prefrontal and parietal-occipital areas. These results indicate that stroke patients retain task-relevant neural patterns, and the proposed method supports interpretable, scalable classification for motor imagery-based rehabilitation.
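The weighted phase lag index used for the connectivity features is computed from the imaginary part of the per-sample cross-spectrum. A numpy-only sketch with an FFT-based Hilbert transform; the signal parameters below are chosen for illustration, not taken from the study:

```python
import numpy as np

def analytic(x):
    """Analytic signal via FFT (numpy-only Hilbert transform)."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1
    h[1:(n + 1) // 2] = 2
    if n % 2 == 0:
        h[n // 2] = 1
    return np.fft.ifft(X * h)

def wpli(x, y):
    """Weighted phase lag index between two band-filtered signals:
    |mean(Im Sxy)| / mean(|Im Sxy|), with Sxy the per-sample cross-spectrum."""
    s = analytic(x) * np.conj(analytic(y))
    im = s.imag
    return abs(im.mean()) / (np.abs(im).mean() + 1e-12)

t = np.arange(0, 2, 1 / 250)                     # 2 s at 250 Hz
x = np.sin(2 * np.pi * 10 * t)                   # 10 Hz "alpha" source
y_lag = np.sin(2 * np.pi * 10 * t - np.pi / 4)   # consistent phase lag
print(round(wpli(x, y_lag), 2))  # → 1.0 (consistent nonzero lag)
```

Because wPLI discards zero-lag (real-part) coupling, it is insensitive to volume conduction, which is why it is a common choice for EEG functional connectivity.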
|
|
We-MPO |
Foyer F |
System Science and Engineering WiP Poster Session |
Work in Progress |
Chair: Pröstl Andrén, Filip | AIT Austrian Institute of Technology |
|
08:30-10:00, Paper We-MPO.1 | |
Improving Model Flexibility of Electrical Characteristic Prediction of A-IGZO-TFTs |
|
Bea, Khean Thye | National Taipei University of Technology |
Liao, JoAn | National Taipei University of Technology, Taipei Tech |
Chen, Yen Ting | National Taipei University of Technology |
Hu, Hsin-Hui | National Taipei University of Technology |
Chen, Yen-Lin | National Taipei University of Technology |
Cheng, Wai Khuen | Universiti Tunku Abdul Rahman |
Chen, Kun-Ming | National Nano Device Laboratories |
Keywords: Manufacturing Automation and Systems
Abstract: This paper presents a lightweight and flexible artificial neural network (ANN) model for predicting the transfer characteristics of amorphous Indium Gallium Zinc Oxide (a-IGZO) thin-film transistors (TFTs). Traditional TCAD simulations, while accurate, are computationally intensive and inflexible to rapid design variations. Prior efforts using variational autoencoders (VAEs) showed promise but were limited by rigid input formats that required retraining for any change in curve dimension or voltage range. To overcome this, we propose an ANN architecture that treats gate voltage as a dynamic input, enabling continuous and accurate predictions across a wide voltage spectrum without the need for retraining. Experimental evaluation through 5-fold cross-validation confirms the ANN's competitive performance, achieving an average R² score of 0.9898 and outperforming VAE models in flexibility and robustness. This approach offers a fast, accurate, and hardware-efficient alternative for a-IGZO TFT modeling and optimization, facilitating monolithic 3D (M3D) integration and next-generation displays.
|
|
08:30-10:00, Paper We-MPO.2 | |
Practical Considerations in Building a Simulation-Based Driver Monitoring Evaluation System: Focus on Virtual Driver Generation |
|
Oh, Cheonin | Electronics and Telecommunications Research Institute (ETRI) |
Kim, Woojin | Electronics and Telecommunications Research Institute |
Yoon, Daesub | ETRI |
Keywords: Intelligent Transportation Systems, Consumer and Industrial Applications, Quality and Reliability Engineering
Abstract: As Level 2–3 autonomous vehicles become more widespread, the reliable validation of Driver Monitoring Systems (DMS) has become essential to assess driver attention and readiness for control takeover. Conventional evaluation methods require real drivers to operate actual vehicles under test conditions, resulting in high costs and limited ability to replicate diverse real-world driving scenarios. In this study, we propose a simulation-based DMS evaluation framework that utilizes virtual drivers and synthetic driving scenarios. We aim to construct digital human models representing diverse demographics across race, gender, and age as well as driving behavior animation assets, using Unreal Engine and camera-based skeleton data. Our focus is on replicating realistic gaze, facial expressions, and driver behaviors in accordance with international DMS evaluation standards. To date, we have developed a pipeline for skeleton data collection and refinement and are in the process of building a driver behavior animation library. This approach is expected to enhance the efficiency and scalability of DMS evaluation, while improving realism and ensuring regulatory compliance.
|
|
08:30-10:00, Paper We-MPO.3 | |
Biogas Power Plant Operation Modeling Based on Statistical and Grey-Box Methods |
|
Reisz, Petra Alexandra | AIT Austrian Institute of Technology GmbH |
Strasser, Thomas | AIT Austrian Institute of Technology GmbH |
Keywords: Digital Twin, System Modeling and Control, Smart Buildings, Smart Cities and Infrastructures
Abstract: Biogas plants have been the subject of modeling and research in recent years. Such a plant is a system consisting of several components, the most important of which are the combined heat and power (CHP) unit that produces electricity and heat, the organic feed receiving area, the digester, and the gas storage. All of these components have separate, detailed models developed for different purposes. However, a system-level approach that provides a sufficient level of detail and captures the correlation between feeding, biogas production rate, storage level, and CHP output is missing. This work therefore presents the concept of an approach that combines a deeper system understanding with the processing of large amounts of data, rather than the popular fully machine-learning-based models. It outlines the methodology for establishing a relationship between CHP power and gas storage level in both forward and backward coupling. Possible physics-based models are presented that should serve as predictors between the storage level of the biogas power plant and the feedstock. Finally, the individual steps are linked into a system-level approach that captures the fact that the power generated affects the state of the system and vice versa. The models presented are under development and use a sophisticated mix of statistical, data-driven, and physical modeling methods. As input, they require recorded data that is minimal in detail and generally available for all biogas power plants. Together with the presented approach, the developed model should be able to serve as a digital twin of an arbitrary power plant that can be integrated into energy system models. Additionally, it should deliver quality indicators for the biogas produced at the respective plant. As such, the proposed methodology offers added value over existing approaches, which are often unidirectional or focused solely on individual components.
|
|
08:30-10:00, Paper We-MPO.4 | |
A Well-Being Oriented Integer Optimization Approach to Parallel Machine Scheduling with Flexible Working Hours |
|
Yaguchi, Takanobu | Mitsubishi Electric Corporation |
Kaieda, Hirokazu | Mitsubishi Electric Corporation |
Nakai, Atsuko | Mitsubishi Electric Corporation |
Iima, Hitoshi | Kyoto Institute of Technology |
Keywords: Manufacturing Automation and Systems, Consumer and Industrial Applications
Abstract: Job scheduling is essential for enhancing productivity in manufacturing environments, with traditional methods primarily focusing on maximizing throughput by minimizing tardiness or completion times. However, in societies facing population decline and aging workforces, there is a growing need to consider workers' well-being alongside productivity goals. Factory workers are increasingly affected by demographic shifts and technological innovations, which leads to diverse lifestyle needs that require flexible working arrangements. This paper proposes a well-being oriented integer optimization approach to parallel machine scheduling that supports flexible working styles while maintaining production efficiency. Our method introduces job processing priorities and employs dummy jobs to represent non-working periods, creating constraints that respect workers' diverse availability patterns. Numerical experiments demonstrate that our proposed method successfully creates schedules that minimize tardiness while accommodating workers' diverse working hour preferences, contributing to both productivity and factory worker well-being in manufacturing operations.
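The paper formulates the problem as integer optimization; the sketch below only illustrates the dummy-job idea, treating a worker's non-working period as a fixed blocked interval that real jobs must be placed around. The greedy single-machine placement, function names, and data are invented for illustration:

```python
def schedule(jobs, busy, horizon):
    """Greedy single-machine schedule around fixed non-working periods.

    jobs  : list of (name, duration), processed in priority order
    busy  : list of (start, end) "dummy jobs" = worker non-working periods
    Returns {name: (start, end)} placements within [0, horizon].
    """
    def free(s, e):
        # interval [s, e) must not overlap any dummy job
        return all(e <= bs or s >= be for bs, be in busy)

    placed, t = {}, 0
    for name, dur in jobs:
        s = t
        while s + dur <= horizon and not free(s, s + dur):
            # jump past the earliest dummy job blocking this placement
            s = min(be for bs, be in busy if s < be and bs < s + dur)
        placed[name] = (s, s + dur)
        t = s + dur
    return placed

# Worker unavailable from t=4 to t=6 (a flexible-hours gap as a dummy job):
# J2 cannot start at t=3, so it is pushed past the gap.
print(schedule([("J1", 3), ("J2", 2)], busy=[(4, 6)], horizon=24))
```

In the integer-optimization formulation, the same effect is obtained by adding the dummy jobs as pre-assigned tasks with fixed start times, so the solver's non-overlap constraints enforce the availability pattern.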
|
|
08:30-10:00, Paper We-MPO.5 | |
A Novel Approach to Resolve Inconsistency for Multi-Criteria Sorting with Heterogeneous Preferences by Considering Confidence Levels |
|
Li, Zhuolin | Dalian University of Technology |
Zhang, Zhen | Dalian University of Technology |
Keywords: Decision Support Systems
Abstract: In multi-criteria sorting (MCS) problems, decision makers often express indirect and potentially conflicting preferences, especially when these preferences come from multiple sources or take different forms. Effectively handling such inconsistency is essential for generating reliable sorting results in MCS problems. This paper introduces a novel approach tailored to resolving inconsistency in MCS problems with heterogeneous preferences. The proposed framework begins with a consistency checking model to detect conflicts in the provided heterogeneous preferences. If inconsistency is detected, a two-stage adjustment model is applied: the first stage minimizes the number of adjustments, while the second stage preserves high-confidence preferences wherever possible. Once the adjusted preferences are consistent, a sorting result determination model is used to assign alternatives to predefined categories, with an emphasis on maximizing discriminative power between categories.
|
|
08:30-10:00, Paper We-MPO.6 | |
A Real-Time Digital Twin Framework for the TIAGo Service Robot |
|
Herrmann, Malte | Otto-Von-Guericke Universität |
Hempel, Thorsten | Otto-Von-Guericke University |
Al-Hamadi, Ayoub | Otto-Von-Guericke University |
Keywords: Digital Twin, Cyber-physical systems, Robotic Systems
Abstract: The rapid development of humanoid robots to perform human-like tasks has introduced new possibilities in automation. In the context of advanced autonomous robotic operations, the potential damage due to unanticipated actions is a significant concern. To address this challenge, this paper presents a digital twin designed to replicate and monitor robotic operations in a virtual environment. Key features include the integration of manufacturer-provided kinematic models, real-time sensor fusion, and a user-friendly graphical interface to support reinforcement learning applications. The digital twin successfully mirrors the movements of a physical robot and records visual data for downstream machine learning tasks. Further, we integrated procedurally generated virtual environments, allowing dynamic scenario creation and enabling robots to train and adapt to diverse and unpredictable conditions without the risks associated with real-world testing. This paves the way toward transitioning our entire laboratory into a digital ecosystem, providing a safe and controlled environment where robots can operate, learn, and improve autonomously.
|
|
08:30-10:00, Paper We-MPO.7 | |
Metaethical Framework for Artificial Intelligence Arms Races and Cyber Influence Operations |
|
Del Fabbro, Olivier | ETH Zurich |
Meier, Raphael | Armasuisse S+T |
Christen, Patrik | FHNW |
Keywords: System Modeling and Control, Decision Support Systems, Cyber-physical systems
Abstract: This paper presents a metaethical framework that potentially allows one to conceptually grasp current arms races in artificial intelligence and the ethical problems connected to them, such as cyber influence operations. AI arms races do not slow down, but rather intensify the weaponisation of AI. Rather than solving ethical problems, the metaethical framework can thus first and foremost be considered a map, helping to navigate the complex reality underlying the arms races and ethical problems. In this sense, the metaethical framework should in the future be capable of evaluating different countermeasures such as deplatforming and watermarking.
|
|
08:30-10:00, Paper We-MPO.8 | |
Origin-Destination Extraction from Large-Scale Route Search Records for Tourism Trend Analysis |
|
Ge, Hangli | The University of Tokyo |
Dizhi, Huang | University of Tokyo |
Yang, Xiaojie | The University of Tokyo |
Lin, Lifeng | The University of Tokyo |
Hatano, Kazuma | The University of Tokyo |
Kawasaki, Takeshi | East Nippon Expressway Company Limited |
Koshizuka, Noboru | The University of Tokyo |
Keywords: Cyber-physical systems, Smart Buildings, Smart Cities and Infrastructures, Intelligent Transportation Systems
Abstract: This paper presents a novel method for transforming large-scale historical expressway route search records into a three-dimensional (3D) Origin-Destination (OD) map, enabling data compression, efficient spatiotemporal sampling and statistical analysis. The study analyzed over 380 million expressway route search logs to investigate online search behavior related to tourist destinations. Several expressway interchanges (ICs) near popular attractions, such as those associated with spring flower viewing, autumn foliage and winter skiing, are examined and visualized. The results reveal strong correlations between search volume trends and the duration of peak tourism seasons. This approach leverages cyberspace behavioral data as a leading indicator of physical movement, providing a proactive tool for traffic management and tourism planning.
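As a rough illustration of the OD-map construction described above, route search records can be binned by (origin IC, destination IC, time) and counted. The field names and toy records below are assumptions for illustration, not the paper's actual schema.

```python
from collections import Counter

def build_od_map(search_logs):
    """Aggregate route search records into a 3D OD map keyed by
    (origin IC, destination IC, time bin); field names are illustrative."""
    od_map = Counter()
    for rec in search_logs:
        od_map[(rec["origin_ic"], rec["dest_ic"], rec["search_hour"])] += 1
    return od_map

# Toy records standing in for expressway route search logs.
logs = [
    {"origin_ic": "Tokyo", "dest_ic": "Kawaguchiko", "search_hour": 9},
    {"origin_ic": "Tokyo", "dest_ic": "Kawaguchiko", "search_hour": 9},
    {"origin_ic": "Tokyo", "dest_ic": "Nikko", "search_hour": 10},
]
od = build_od_map(logs)
print(od[("Tokyo", "Kawaguchiko", 9)])  # 2
```

Sparse counters like this compress hundreds of millions of raw log lines into a structure that supports the spatiotemporal sampling and correlation analysis the paper describes.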
|
|
08:30-10:00, Paper We-MPO.9 | |
Grasp Planning for a Reconfigurable Soft Gripper Using Superquadrics and Reinforcement Learning in Simulation |
|
Vatsal, Vighnesh | TCS Research, Tata Consultancy Services Ltd |
George, Nijil | TCS Research, Tata Consultancy Services Ltd |
Lima, Rolif | Tata Consultancy Services |
Das, Kaushik | TCS Research |
Keywords: Robotic Systems, Soft Robotics, Consumer and Industrial Applications
Abstract: Grasping and manipulation remain fundamental challenges in the effective deployment of robotic systems in real-world applications. In retail and supermarket scenarios, soft robotic grippers enable safe and efficient material handling. However, existing grasp planners are designed for rigid or suction-based grippers. Soft grasping is more challenging in terms of planning, estimation and sensing due to deflections in the gripper material on contact with the target. We present a system for soft robotic grasping using a custom gripper with a reconfigurable wrist that leverages reinforcement learning to augment existing vision-based techniques to adapt to the target object's geometry. This system includes a hidden superquadrics module to guide the adaptation of the gripper's palm configuration. We evaluate this system in a PyBullet simulation environment and compare it with a baseline synergy-based grasp strategy. Ongoing and future work involves transferring this planner to our physical robotic platform and evaluation in retail stores.
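Superquadrics, which the system above uses to guide the gripper's palm configuration, are defined by an inside-outside function. The following is a minimal sketch of the standard formulation (not the paper's hidden module): a point is inside the shape when F < 1, on the surface when F = 1, and outside when F > 1.

```python
def superquadric_f(x, y, z, a=(1.0, 1.0, 1.0), e1=1.0, e2=1.0):
    """Inside-outside function of a superquadric with semi-axes a and
    shape exponents e1 (north-south) and e2 (east-west).
    F < 1: inside, F == 1: on the surface, F > 1: outside."""
    a1, a2, a3 = a
    xy = (abs(x / a1) ** (2.0 / e2) + abs(y / a2) ** (2.0 / e2)) ** (e2 / e1)
    return xy + abs(z / a3) ** (2.0 / e1)

# e1 = e2 = 1 gives an ellipsoid: the origin is inside, (2, 0, 0) is outside.
print(superquadric_f(0.0, 0.0, 0.0) < 1.0)  # True
print(superquadric_f(2.0, 0.0, 0.0) > 1.0)  # True
```

Fitting such parameters to a target object's point cloud yields a compact shape summary from which grasp poses can be derived.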
|
|
08:30-10:00, Paper We-MPO.10 | |
Design and Application of a GMV-PID Compensator for a Hierarchical-Type Control System |
|
Sugahara, Takahiro | Hiroshima University |
Wakitani, Shin | Hiroshima University |
Yamamoto, Toru | Hiroshima University |
Ochiiwa, Takashi | The Japan Steel Works, Ltd |
Tomiyama, Hideki | The Japan Steel Works, Ltd |
Keywords: System Modeling and Control, Consumer and Industrial Applications, Mechatronics
Abstract: Model-Based Development (MBD) has been widely used as an efficient product development method in industry. A controller designed using MBD may not achieve the desired control performance due to the effects of disturbances and model errors in the actual plant. A hierarchical-type control system has been proposed as the control system architecture for products developed through MBD. In this control structure, a compensator is introduced to suppress the effects of disturbances and model errors in the actual plant. This paper proposes a PID-type compensator (GMV-PID compensator) based on Generalized Minimum Variance Control (GMVC) as a method for designing compensators in the hierarchical-type control system. The proposed method is intended for application to plastic processing machinery such as injection molding machines and film production machines. As part of ongoing work, this paper presents preliminary experimental results obtained by applying the proposed method to a pilot-scale slider-crank system, which simulates the toggle-type clamping mechanism of an actual injection molding machine.
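For reference, a velocity-form discrete PID compensator can be sketched as follows. The gains here are placeholder values; in the paper's method they would come from the GMVC design.

```python
class DiscretePID:
    """Velocity-form discrete PID compensator. In the paper the gains
    come from a GMVC design; the values used below are placeholders."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.e1 = self.e2 = 0.0  # errors e[k-1], e[k-2]
        self.u = 0.0             # previous control output

    def step(self, e):
        # Incremental update: u[k] = u[k-1] + delta_u
        du = (self.kp * (e - self.e1)
              + self.ki * self.dt * e
              + self.kd * (e - 2.0 * self.e1 + self.e2) / self.dt)
        self.u += du
        self.e2, self.e1 = self.e1, e
        return self.u

pid = DiscretePID(kp=1.2, ki=0.5, kd=0.05, dt=0.01)
# With a constant error, later increments reduce to the integral term ki*dt*e.
outputs = [pid.step(1.0) for _ in range(3)]
```

The velocity form is convenient for compensators layered on top of a nominal MBD controller, since its output is naturally an incremental correction.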
|
|
08:30-10:00, Paper We-MPO.11 | |
Smart Control of Water-Fertilizer Integrated Regulation System Based on Deep Reinforcement Learning |
|
Liu, Jiamei | Zhejiang Sci-Tech University |
Chang, Fangle | Ningbo Global Innovation Center, Zhejiang University
Ma, Longhua | Information School, NingboTech University |
Xie, Lei | State Key Laboratory of Industrial Control Technology, Zhejiang |
Su, Hongye | Zhejiang University |
Keywords: System Modeling and Control, Decision Support Systems, Modeling of Autonomous Systems
Abstract: Water-fertilizer integrated regulation systems aim to improve crop yield, nutrient use efficiency (NUE), and water use efficiency (WUE). This study develops an intelligent control system with a cloud server network to centralize data management and processing. The system consists of a monitoring module, an intelligent cloud platform module, and a control terminal. In the monitoring module, a sensor network and phenotypic monitoring method are used to collect real-time crop and environment data. In the intelligent cloud platform module, a perception-computing-control integrated optimization framework was constructed to localize control tasks by analyzing perceived data, training predictive control models, and optimizing deep reinforcement learning algorithms. The control terminal comprises task scheduling, equipment control, and data collection drivers, as well as a deployment pipeline in the cloud. Our control model has increased crop productivity by nearly 9% and achieved water resource savings of nearly 15.6%. Multi-modal large models will be applied to adjust model parameters in different environments.
|
|
08:30-10:00, Paper We-MPO.12 | |
The Collision Avoidance with the Human Braking Sensation in Congested Moving Environments |
|
Osaki, Tomoki | Nagoya University |
Matsubayashi, Shota | Nagoya University |
Ninomiya, Yuki | Nagoya University |
Miwa, Kazuhisa | Nagoya University |
Terai, Hitoshi | Kindai University |
Keywords: Cooperative Systems and Control
Abstract: Recent technological advances have introduced autonomous mobile robots into public areas that were previously occupied only by pedestrians. When pedestrians and mobile robots coexist, a collision avoidance algorithm that allows the robots to reach their goals without causing collisions or significant delays is important. One approach is to incorporate human sensation into the avoidance algorithm, and a braking index representing human braking sensation in two-dimensional environments was developed. The goal of this study was to verify whether a collision avoidance algorithm based on the braking index could work efficiently in a congested environment using agent simulations. The results showed that agents with the proposed algorithm achieved smoother and safer movement than agents without it in a congested environment. Additionally, we found a traffic density-flow relationship between congestion and movement smoothness. This finding implies that this relationship could be applied to two-dimensional environments, such as shopping malls or airports.
|
|
08:30-10:00, Paper We-MPO.13 | |
Integrating Large Language Models into Data-Driven Frameworks for Smart Meter Analytics |
|
Gramegna, Filippo | Polytechnic University of Bari |
Bilenchi, Ivano | Polytechnic University of Bari |
Loseto, Giuseppe | LUM University |
Manco, Federico | Lutech S.p.A |
Mastrototaro, Gianpiero | Lutech S.p.A |
Scioscia, Floriano | Polytechnic University of Bari |
Ruta, Michele | Politecnico Di Bari |
Keywords: Smart Metering, Decision Support Systems, Intelligent Power Grid
Abstract: The evolution of metering technologies has enabled the collection of a vast amount of energy consumption data, offering new opportunities for more efficient energy management. While utility providers increasingly leverage machine learning and data visualization to simplify and optimize data analysis, current systems often present barriers in understanding collected data. This paper introduces a novel multi-agent architecture that integrates Large Language Models (LLMs) to enhance the interpretability of smart meter data. Developed within the Digital Enterprise initiative by Lutech S.p.A., the proposed framework enables natural language interactions and energy-related reports. An early evaluation has been conducted through a series of basic interaction tests, demonstrating the feasibility of the approach and its potential to improve data-driven decision-making.
|
|
08:30-10:00, Paper We-MPO.14 | |
Towards a Quantitative Holistic Cybersecurity Assessment of Energy Systems |
|
Herzog, Almut | Fraunhofer FIT |
Geiger, Alexander | Fraunhofer FIT |
Ulbig, Andreas | IAEW at RWTH Aachen University |
Keywords: Decision Support Systems, Cyber-physical systems, System Modeling and Control
Abstract: This work-in-progress paper presents a concept for a holistic cybersecurity analysis. The proposed approach is designed to enable a quantitative assessment of the cybersecurity posture of an energy system, based on knowledge of its assets and their interconnections. Identified vulnerabilities are reported and categorized, while corresponding mitigation suggestions are collected and validated. At present, cybersecurity assessments in this domain still rely heavily on expert judgment, leading to subjective evaluations that hinder comparability and objectivity. To address this, the proposed approach aims to systematize and automate the analysis process. This paper is published at an early stage to actively invite feedback and foster community involvement, as broad expert input is essential to improve the quality, robustness, and acceptance of the approach. This paper documents a research project aimed at providing grid operators with an analysis tool to support decision-making in the future.
|
|
08:30-10:00, Paper We-MPO.15 | |
Detecting Hidden Backdoors in Large Language Models |
|
Peechatt, Jibin Mathew | FHNW |
Schaaf, Marc | FHNW |
Christen, Patrik | FHNW |
Keywords: Homeland Security
Abstract: Large Language Models (LLMs) have revolutionised the field of Natural Language Processing (NLP) and are currently being integrated into more critical domains, raising concerns about hidden backdoors that could allow collecting user data or manipulating output. This paper investigates the possibility of hidden backdoors by analysing network traffic during local LLM usage. Two models, DeepSeek-R1 and Mistral, were tested to compare LLMs from different geopolitical and regulatory environments. Using Ollama, software for running LLMs locally, three experiments were performed: 1) monitoring TCP connections at the per-process level, 2) running the local LLM in a Docker container with full network isolation, and 3) monitoring all network traffic with Wireshark on a monitored Docker bridge. The results showed no external network communication during the experiments. Anomalies unrelated to a hidden backdoor were found, such as DeepSeek responding in Chinese to certain prompts even though the prompt was in English. In conclusion, our findings indicate that it is possible to locally isolate LLMs for critical usage, and that Docker-based network isolation could be a practical approach for detecting hidden backdoors in LLMs.
|
|
08:30-10:00, Paper We-MPO.16 | |
Agentic Hyperautomation: A Distributed Architecture for Scalable AI-Driven Workflows |
|
Tomasino, Arnaldo | Polytechnic University of Bari |
Ieva, Saverio | Polytechnic University of Bari |
Loseto, Giuseppe | LUM University |
Scioscia, Floriano | Polytechnic University of Bari |
Ruta, Michele | Politecnico Di Bari |
Ingianni, Angelo | Lutech S.p.A |
Minoia, Marco | Lutech S.p.A |
Genchi, Gianmarco | Lutech S.p.A |
Keywords: Distributed Intelligent Systems, Large-Scale System of Systems, System Architecture
Abstract: Hyperautomation aims to digitize end-to-end business processes, but most of the current solutions and platforms still rely on pre-defined workflows to carry out complex tasks. However, business process automation can greatly benefit from recent technological innovations at the intersection of Large Language Models (LLMs) and Multi-Agent systems (MAS). In this paper, we present an Agentic Artificial Intelligence (AI) framework where a central LLM-driven orchestrator dynamically plans and delegates tasks to specialized LLM agents, which in turn exploit tools to act on business platforms and other external systems. A case study based on document management illustrates how the approach is able to deal with complex requests involving source file parsing and report generation, leveraging an approach based on Retrieval Augmented Generation (RAG) to enable knowledge sharing among specialized agents. Finally, the proposed framework introduces a practical blueprint for scalable, explainable Agentic AI in enterprise hyperautomation environments.
|
|
We-KN3 |
Hall F |
Keynote 3 |
Keynote |
Chair: Strasser, Thomas | AIT Austrian Institute of Technology GmbH |
|
10:30-10:45, Paper We-KN3.1 | |
IEEE Lotfi A. Zadeh Award for Emerging Technologies |
|
Rahman, Saifur | Advanced Research Institute, Virginia Tech |
Keywords: AI and Applications
Abstract: The IEEE Lotfi A. Zadeh Award for Emerging Technologies recognizes individuals or teams for exceptional contributions to emerging technologies that have demonstrated significant impact, originality, and importance in recent years. This prestigious honor reflects IEEE’s commitment to advancing innovative technologies that shape the future and embodies the pioneering spirit of Lotfi A. Zadeh, the father of fuzzy logic. The award, which includes a bronze medal, certificate, and honorarium, will be presented by 2023 IEEE President Saifur Rahman and conferred to the awardee, Dimitar Filev, during this session.
|
|
10:45-11:45, Paper We-KN3.2 | |
Keynote Talk: Driving Innovation: AI in Automotive Engineering |
|
Filev, Dimitar | Ford Motor Company |
Keywords: AI and Applications, Application of Artificial Intelligence
Abstract: Artificial Intelligence (AI) is rapidly transforming the automotive industry. From enhancing vehicle performance and safety to streamlining manufacturing processes, AI-powered systems are driving significant advancements. This presentation, drawing on the speaker's 30+ years of experience in R&D, discusses the transformative role of AI in various automotive applications. We will explore the evolution of AI in the automotive sector, from early AI to deep learning and generative AI. We will examine the challenges and opportunities associated with integrating AI solutions. The presentation will highlight specific examples of AI applications in powertrain, vehicle systems, and autonomous driving. We will examine the lessons learned from applying data-driven AI techniques and discuss the potential of combining knowledge-based approaches with data-driven methods to further enhance AI's impact on the automotive industry.
|
|
We-PT2 |
Hall N |
Invited Talk |
Talk |
Chair: Pröstl Andrén, Filip | AIT Austrian Institute of Technology |
|
11:45-12:30, Paper We-PT2.1 | |
Beyond Smart: Architecting Trustworthy Ecosystems |
|
Neppel, Clara | IEEE Technology Centre GmbH |
Keywords: Trust in Autonomous Systems, System Architecture
Abstract: With autonomous systems playing critical roles across society and industry, their trustworthiness has become a foundational requirement—not only for technical reliability, but to ensure ethical, safe, and responsible deployment. As these systems increasingly make decisions that affect humans, trustworthiness is essential to earning acceptance, managing risk, and complying with evolving legal and societal expectations. This talk focuses on standards and frameworks for integrating trust into both smart system architectures and the broader cross-domain ecosystems that support their safe adoption and positive impact.
|
|
We-S2-T1 |
Hall F |
Deep Learning 7 |
Regular Papers - Cybernetics |
Chair: Sun, Haofeng | Beijing University of Posts and Telecommunications |
Co-Chair: Peng, Bo | Southwest Petroleum University |
|
11:45-12:00, Paper We-S2-T1.1 | |
Uncovering Critical Features for Deepfake Detection through the Lottery Ticket Hypothesis |
|
Al Amin, Lisan | University of Maryland, Baltimore County |
Hossain, Md. Ismail | North South University |
Nguyen, Thanh Thi | Monash University |
Jahan, Tasnim | United International University, Bangladesh |
Islam, Mahbubul | United International University, Bangladesh |
Quader, Faisal | University of Maryland, Baltimore County |
Keywords: Deep Learning, Neural Networks and their Applications, Machine Vision
Abstract: Recent advances in deepfake technology have created increasingly convincing synthetic media that poses significant challenges to information integrity and social trust. While current detection methods show promise, their underlying mechanisms remain poorly understood, and the large sizes of their models make them challenging to deploy in resource-limited environments. This study investigates the application of the Lottery Ticket Hypothesis (LTH) to deepfake detection, aiming to identify the key features crucial for recognizing deepfakes. We examine how neural networks can be efficiently pruned while maintaining high detection accuracy. Through extensive experiments with MesoNet, CNN-5, and ResNet-18 architectures on the OpenForensic and FaceForensics++ datasets, we find that deepfake detection networks contain winning tickets, i.e., subnetworks, that preserve performance even at substantial sparsity levels. Our results indicate that MesoNet retains accuracy at 80% sparsity on the OpenForensic dataset, with only 3,000 parameters. The results also show that our proposed LTH-based iterative magnitude pruning approach consistently outperforms one-shot pruning methods. Using Grad-CAM visualization, we analyze how pruned networks maintain their focus on critical facial regions for deepfake detection. Additionally, we demonstrate the transferability of winning tickets across datasets, suggesting potential for efficient, deployable deepfake detection systems.
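The LTH procedure the paper builds on can be sketched in a few lines: zero out the weights of smallest trained magnitude, then rewind the survivors to their initialization (the "winning ticket"). A toy, list-based sketch of one pruning round (the paper's iterative variant repeats this over several rounds):

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with smallest |w|;
    returns a binary keep-mask."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask

def lottery_ticket_round(init_weights, trained_weights, sparsity):
    """One LTH round: prune by trained magnitude, then rewind the
    surviving weights to their initial values (the 'winning ticket')."""
    mask = magnitude_prune(trained_weights, sparsity)
    return [w * m for w, m in zip(init_weights, mask)], mask

init = [0.5, -0.1, 0.8, 0.05]
trained = [1.0, -0.02, 2.0, 0.3]
ticket, mask = lottery_ticket_round(init, trained, sparsity=0.5)
print(mask)  # [1, 0, 1, 0]
```

In practice the mask is applied per tensor with a framework such as PyTorch, and the pruned network is retrained from the rewound weights before the next round.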
|
|
12:00-12:15, Paper We-S2-T1.2 | |
Runtime Safety Monitoring of Deep Neural Networks for Perception: A Survey |
|
Schotschneider, Albert | FZI Research Center for Information Technology |
Pavlitska, Svetlana | FZI Research Center for Information Technology |
Zöllner, Marius | Forschungszentrum Informatik |
Keywords: Deep Learning, Machine Learning, Neural Networks and their Applications
Abstract: Deep neural networks (DNNs) are widely used in perception systems for safety-critical applications, such as autonomous driving and robotics. However, DNNs remain vulnerable to various safety concerns, including generalization errors, out-of-distribution (OOD) inputs, and adversarial attacks, which can lead to hazardous failures. This survey provides a comprehensive overview of runtime safety monitoring approaches, which operate in parallel to DNNs during inference to detect these safety concerns without modifying the DNN itself. We categorize existing methods into three main groups: Monitoring inputs, internal representations, and outputs. We analyze the state-of-the-art for each category, identify strengths and limitations, and map methods to the safety concerns they address. In addition, we highlight open challenges and future research directions.
|
|
12:15-12:30, Paper We-S2-T1.3 | |
An Unsupervised Ultrasonic Speckle Displacement Tracking Model Via Knowledge Distillation and Curriculum Learning (I) |
|
He, Yuchuan | Southwest Petroleum University |
Dang, Jiachen | SouthWest Petroleum University |
Yang, Han | Southwest Petroleum University |
Peng, Bo | Southwest Petroleum University |
Keywords: Deep Learning, Image Processing and Pattern Recognition, Machine Learning
Abstract: In ultrasound elastography (USE), radio-frequency (RF) data retains high-frequency content, whereas B-mode (BM) data, obtained by transforming the RF signal, suffers substantial information loss, leading to suboptimal displacement fields in BM speckle tracking. However, RF data is often unavailable in clinical ultrasound systems, so reliable displacement estimation from BM sequences is essential for practical elastography. While existing deep-learning-based BM motion tracking methods provide usable displacement predictions, their accuracy and strain image quality remain limited, especially on real in vivo data. To address this limitation, we propose an unsupervised optical-flow tracking neural network, KDCLPWC-Net, which integrates knowledge distillation and curriculum learning to enhance speckle tracking performance on BM data. Specifically, a cross-modal knowledge distillation framework is introduced, with a parallel feature alignment module (FAM) that transfers multi-scale feature representations from the teacher network to improve axial displacement estimation in the student network.
|
|
12:30-12:45, Paper We-S2-T1.4 | |
FedDAPR: Federated Semi-Supervised Learning with Dynamic Aggregation and Prototype Retraining |
|
Tian, Wansheng | Beijing University of Posts and Telecommunications |
Tian, Hui | Beijing University of Posts and Telecommunications |
Wang, Jiawei | Beijing University of Posts and Telecommunications |
Sun, Haofeng | Beijing University of Posts and Telecommunications |
Keywords: Deep Learning, Representation Learning, Machine Learning
Abstract: By utilizing the large amount of unlabeled data among distributed clients, federated semi-supervised learning (FSSL) has become a new research topic. However, the knowledge discrepancies among diverse clients and non-independent and identically distributed (Non-IID) data pose significant challenges to the model performance and generalization in FSSL. To tackle these challenges, we propose a FSSL framework with the dynamic aggregation and prototype retraining method (FedDAPR). FedDAPR improves the performance and robustness of the global model by developing a dynamic aggregation scheme. Moreover, to enhance the degraded training performance caused by the Non-IID data, we propose a prototype-based retraining mechanism for the classifier of the global model. Experimental results on two benchmark datasets demonstrate the effectiveness of the proposed FedDAPR.
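Prototype-based retraining typically starts from per-class mean embeddings. Below is a minimal sketch of that building block only; FedDAPR's actual aggregation and retraining schemes are described in the paper.

```python
def class_prototypes(embeddings, labels):
    """Compute per-class prototypes as the mean of each class's
    embedding vectors, a common building block for prototype-based
    classifier retraining."""
    sums, counts = {}, {}
    for vec, y in zip(embeddings, labels):
        if y not in sums:
            sums[y] = [0.0] * len(vec)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], vec)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

embs = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
labels = [0, 0, 1]
protos = class_prototypes(embs, labels)
print(protos[0])  # [2.0, 0.0]
```

Under Non-IID data, retraining the classifier against such prototypes (rather than raw client data) is one way to reduce the bias each client's local distribution introduces.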
|
|
We-S2-T3 |
Room 0.11 |
Haptic Systems & Assistive Technology |
Regular Papers - HMS |
Chair: Shiraishi, Yuhki | Tsukuba University of Technology |
Co-Chair: Altamirano Cabrera, Miguel | Skolkovo Institute of Science and Technology Skoltech |
|
11:45-12:00, Paper We-S2-T3.1 | |
Detection of Cognitive and Physical Fatigue Using Physiological Signals |
|
Fava, Alessandra | University of Modena and Reggio Emilia |
Gabbi, Marta | University of Modena and Reggio Emilia |
Villani, Valeria | University of Modena and Reggio Emilia |
Sabattini, Lorenzo | University of Modena and Reggio Emilia |
Keywords: Assistive Technology, Biometrics and Applications,, Human Factors
Abstract: In recent decades, the detection of fatigue has been widely studied to improve safety and performance in various domains such as healthcare, transportation, and manufacturing. In fact, fatigue significantly affects cognitive and physical performance. There are numerous studies on the detection of fatigue, but most of them focus on binary classification (i.e., fatigue vs. resting state) or different levels of fatigue intensity, without distinguishing between specific fatigue types. This work aims to discriminate between cognitive and physical fatigue, and proposes the use of physiological signals to classify four distinct conditions: rest, cognitive fatigue, physical fatigue, and combined cognitive and physical fatigue. We analyze data from cardiac, eye, electrodermal, and electromyographic activity. We consider different feature selection methods, including correlation analysis, principal component analysis and the sequential forward floating search method. Ultimately, we classify the four conditions using state-of-the-art machine learning methods. The highest classification accuracy (86.823%) is obtained from a support vector machine method using a selection of the features extracted from cardiac and electromyographic sensors. This study highlights the potential for real-time fatigue classification, which can enhance the adaptability of automated systems to human needs, particularly in high-risk environments like transportation and healthcare. Furthermore, the findings suggest that fatigue monitoring can be effectively conducted with minimal sensor requirements, contributing to the design of more efficient wearable sensor systems.
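Of the feature selection methods mentioned, sequential forward selection is the simplest to sketch; the floating variant (SFFS) additionally allows dropping previously chosen features after each addition. A toy sketch with an illustrative additive score (the feature names and gains are assumptions, not the paper's data):

```python
def sequential_forward_select(features, score, k):
    """Greedy sequential forward selection: repeatedly add the feature
    that most improves score(subset) until k features are chosen. The
    floating variant (SFFS) would also allow removals after each add."""
    selected, remaining = [], list(features)
    while len(selected) < k and remaining:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy additive score over hypothetical sensor features (illustrative only).
gains = {"hr": 3.0, "emg": 2.0, "eda": 1.0, "eye": 0.5}

def score(subset):
    return sum(gains[f] for f in subset)

chosen = sequential_forward_select(gains, score, k=2)
print(chosen)  # ['hr', 'emg']
```

In practice the score would be cross-validated classifier accuracy rather than a fixed additive gain, which is what makes interactions between features matter.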
|
|
12:00-12:15, Paper We-S2-T3.2 | |
A Shoulder-Mounted Tactile Notification System for d/Deaf and Hard of Hearing Individuals: Toward Practical Use in Multi-Person Meetings |
|
Shiraishi, Yuhki | Tsukuba University of Technology |
Shitara, Akihisa | University of Tsukuba |
Yoneyama, Fumio | Tsukuba University of Technology |
Nakai, Yukiya | Alps Alpine Co., Ltd |
Kato, Nobuko | National University Corporation of Tsukuba University of Technology
Keywords: Assistive Technology, Haptic Systems, Wearable Computing
Abstract: Auditory directionality remains challenging for d/Deaf and hard of hearing (DHH) people. We evaluated a shoulder-mounted tactile system that vibrates bilaterally to signal sound direction and urgency in multi-person meetings. Eight DHH participants watched subtitled VR meetings with and without the device, pressing a trigger when their name or an alarm appeared and selecting the speaker afterward. The device increased reaction count for name calls and shortened reaction time for both name and alarm alerts (all p < 0.05), while speaker identification accuracy stayed high and unchanged. Post-experiment surveys confirmed strong demand for directional cues (p < 0.01) and alarm sound alerts (p < 0.05). Reflective Thematic Analysis of open-ended responses highlighted (i) the need to customize target sounds, (ii) meeting-focused use cases, and (iii) requests for refined vibration patterns and lighter design. These findings demonstrate the system’s value for timely notifications and emphasize directional feedback and customizability as critical to practical adoption.
|
|
12:15-12:30, Paper We-S2-T3.3 | |
EEG Study of the Influence of Imagined Temperature Sensations on Neuronal Activity in the Sensorimotor Cortex |
|
Belichenko, Anton | Skolkovo Institute of Science and Technology
Trinitatova, Daria | Skolkovo Institute of Science and Technology |
Nasibullina, Aigul | Skolkovo Institute of Science and Technology
Yakovlev, Lev | Skolkovo Institute of Science and Technology
Tsetserukou, Dzmitry | Skoltech |
Keywords: Brain-Computer Interfaces, Haptic Systems, Human-Computer Interaction
Abstract: Understanding the neural correlates of sensory imagery is crucial for advancing cognitive neuroscience and developing novel Brain-Computer Interface (BCI) paradigms. This study investigated the influence of imagined temperature sensations (ITS) on neural activity within the sensorimotor cortex. The experimental study involved the evaluation of neural activity using electroencephalography (EEG) during both real thermal stimulation (TS: 40 °C Hot, 20 °C Cold) applied to the participants' hand, and the mental temperature imagination (ITS) of the corresponding hot and cold sensations. The analysis focused on quantifying the event-related desynchronization (ERD) of the sensorimotor mu-rhythm (8-13 Hz). The experimental results revealed a characteristic mu-ERD localized over central scalp regions (e.g., C3) during both TS and ITS conditions. Although the magnitude of mu-ERD during ITS was slightly lower than during TS, this difference was not statistically significant (p>.05). However, ERD during both ITS and TS was statistically significantly different from the resting baseline (p<.001). These findings demonstrate that imagining temperature sensations engages sensorimotor cortical mechanisms in a manner comparable to actual thermal perception. This insight expands our understanding of the neurophysiological basis of sensory imagery and suggests the potential utility of ITS for non-motor BCI control and neurorehabilitation technologies.
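ERD is conventionally quantified as the percent change in band power relative to a resting baseline; a minimal sketch (the power values below are illustrative, not the study's data):

```python
def erd_percent(event_power, baseline_power):
    """Event-related (de)synchronization as percent power change
    relative to a resting baseline; negative values indicate ERD."""
    return (event_power - baseline_power) / baseline_power * 100.0

# Mu-band (8-13 Hz) power dropping from 10.0 at rest to 5.0 during
# imagery corresponds to a -50% ERD (numbers illustrative).
print(erd_percent(5.0, 10.0))  # -50.0
```

In an EEG pipeline, event and baseline powers would be band-power estimates over the mu band at a sensorimotor channel such as C3, averaged across trials before the comparison.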
|
|
We-S2-T5 |
Room 0.14 |
Image Processing and Pattern Recognition 4 |
Regular Papers - Cybernetics |
Chair: Lian, Tengchi | Hebei University |
Co-Chair: Kondo, Katsuya | Tottori University |
|
11:45-12:00, Paper We-S2-T5.1 | |
PowerOpsNet: Integrating YOLOv5-A and GCN for Real-Time Safety Behavior Recognition in Power Grid Environments |
|
Yan, Shenyu | South China Normal University |
Yu, Songsen | South China Normal University |
Keywords: Image Processing and Pattern Recognition, Deep Learning, Neural Networks and their Applications
Abstract: Ensuring operational safety in power grids is critical, particularly in substations where occlusion and small-object challenges are prevalent. We propose PowerOpsNet, a unified end-to-end framework for real-time unsafe behavior recognition. It combines an optimized YOLOv5-A detector—enhanced via Mosaic-8 augmentation, multi-scale features, and a dynamic focal loss—with a CNN-LSTM-GCN-based behavior recognition module. A benchmark dataset comprising 11 object types and 18 actions was collected from real substations. Experiments demonstrate 91.3% detection precision and 87.6% recognition accuracy, with inference times of 370 ms and 200 ms, respectively. PowerOpsNet supports real-time safety monitoring and is extensible to other high-risk domains.
|
|
12:00-12:15, Paper We-S2-T5.2 | |
Few-Shot Font Generation Via Adaptive Feature Extraction and Detail Enhancement |
|
Lian, Tengchi | Hebei University |
Yang, Fang | Hebei University |
Zhang, Chong | Hebei University |
Keywords: Image Processing and Pattern Recognition, Machine Vision
Abstract: Due to the complex structures of many characters, automatic font generation remains a challenging research task. Although existing methods achieve satisfactory performance, the generated characters still suffer from significant detail loss, such as stroke loss, stroke connection issues, or breakage, among others. To address this issue, we propose an approach based on Adaptive Feature Extraction (AFE) and the Detail Enhancement Module (DEM). AFE adaptively fuses local and global channel attention to enhance content feature extraction, enabling the network to focus on structurally important features, thereby improving character clarity and consistency. DEM first employs multi-scale dilated depthwise convolutions with feature aggregation to effectively enhance both local details and global contours in font structures. It then uses deformable convolutions to predict displacement maps and applies these maps to perform geometric feature deformation on the low-level feature maps from the content encoder. This approach improves style transfer accuracy while preserving structural consistency in fine-grained font generation. Experimental results demonstrate the superiority of our method, which can generate Chinese characters with rich details. The generated characters outperform those produced by existing methods. Furthermore, our approach can be extended to cross-lingual generation.
|
|
12:15-12:30, Paper We-S2-T5.3 | |
DC-BEV: Depth-Completed Bird's Eye View Representation for Multi-Modal 3D Object Detection |
|
Ning, Tong | University of Chinese Academy of Sciences |
Lu, Ke | University of Chinese Academy of Sciences |
Jiang, Xirui | University of Chinese Academy of Sciences |
Xue, Jian | University of Chinese Academy of Sciences |
Keywords: Image Processing and Pattern Recognition, Machine Vision, AI and Applications
Abstract: The bird's eye view (BEV) representation is essential for accurate 3D perception tasks (e.g., 3D object detection) in autonomous driving for its precise localization, scale consistency, and modality independence. However, traditional depth-prediction-based methods, which rely on predicted depth distributions derived from semantic image features for BEV transformation, face challenges such as ambiguous depth-prior information and dependence on predefined depth distribution types. To address these limitations, we propose a novel multi-modal 3D object detection method, Depth Completed BEV (DC-BEV), which leverages ground-truth sparse LiDAR depth to guide BEV transformations, significantly enhancing depth estimation accuracy. Specifically, we introduce a Multimodal-Depth Completion (MDC) mechanism, which enriches sparse LiDAR depth into dense depth maps by integrating semantic and geometric cues from images. Additionally, to mitigate gradient instability caused by inadequate implicit supervision, we present an Explicit Depth Supervision (EDS) mechanism that directly supervises depth predictions using a dedicated depth loss. Comprehensive experiments conducted on the nuScenes dataset demonstrate that DC-BEV achieves superior performance, notably improving detection accuracy through enhanced depth estimation quality and robust BEV representation.
|
|
12:30-12:45, Paper We-S2-T5.4 | |
Monocular Multiple People Tracking in Thermal Images Using Time-Series Depth Estimation |
|
Taki, Kimitaka | Graduate School of Sustainability Science, Tottori University |
Kondo, Katsuya | Tottori University |
Keywords: Image Processing and Pattern Recognition, Machine Vision, Application of Artificial Intelligence
Abstract: Multiple person tracking is generally achieved by analyzing the color texture, shape, motion, and other information in RGB video images. However, this approach is challenging in low-light or dark environments without adequate lighting, leading to reduced accuracy. Monocular depth estimation methods have limitations in accurately estimating depth and cannot effectively distinguish between front and rear figures during occlusion. In this paper, we focus on the motion information instead of the appearance information of the detected persons. Moreover, we utilize additional estimated depth information instead of only the detected two-dimensional position in the image. Experimental results show that the accuracy of tracking heat sources in an image can be improved by processing such as Kalman filtering of three-dimensional position information, which compensates for tracking breaks due to occlusion.
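The occlusion handling the abstract describes, Kalman filtering of position so a track can coast through gaps, can be sketched per coordinate axis with a constant-velocity filter (noise parameters below are placeholders, not values from the paper):

```python
class CVKalman1D:
    """Constant-velocity Kalman filter for one coordinate axis.

    During occlusion no measurement arrives, so only predict() runs and
    the track coasts on its last estimated velocity, bridging tracking
    breaks. q and r are placeholder noise values, not the paper's tuning.
    """

    def __init__(self, pos, dt=1.0, q=1e-3, r=0.25):
        self.x = [pos, 0.0]                 # state: [position, velocity]
        self.P = [[1.0, 0.0], [0.0, 1.0]]   # state covariance
        self.dt, self.q, self.r = dt, q, r

    def predict(self):
        dt = self.dt
        self.x = [self.x[0] + dt * self.x[1], self.x[1]]
        p00, p01 = self.P[0]
        p10, p11 = self.P[1]
        # P <- F P F^T + Q, with F = [[1, dt], [0, 1]] and diagonal Q
        self.P = [[p00 + dt * (p01 + p10) + dt * dt * p11 + self.q,
                   p01 + dt * p11],
                  [p10 + dt * p11, p11 + self.q]]
        return self.x[0]

    def update(self, z):
        s = self.P[0][0] + self.r                    # innovation covariance
        k0, k1 = self.P[0][0] / s, self.P[1][0] / s  # Kalman gain
        y = z - self.x[0]                            # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        p00, p01 = self.P[0]
        p10, p11 = self.P[1]
        # P <- (I - K H) P, with H = [1, 0]
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]


track = CVKalman1D(0.0)
for z in [0.0, 0.5, 1.0, 1.5, 2.0]:   # person moving ~0.5 units per frame
    track.predict()
    track.update(z)
coasted = [track.predict() for _ in range(3)]   # occluded: predict only
```

Running one such filter per axis of the estimated 3-D position keeps a plausible trajectory while the person is hidden, so the track can be re-associated when detections resume.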
|
|
12:45-13:00, Paper We-S2-T5.5 | |
A Positive Sample Encouraged Contrastive Learning Method for Source-Free Unsupervised Domain Adaptation |
|
Lu, Xiaoyu, Sean | Nanjing University of Science and Technology |
You, Yifei | Nanjing University of Science and Technology |
Yao, Siya | Zhejiang Gongshang University |
Huang, Bo | Nanjing University of Science and Technology |
Keywords: Transfer Learning, Machine Vision, Image Processing and Pattern Recognition
Abstract: Source-free Unsupervised Domain Adaptation (SFUDA) aims to utilize a pre-trained source model on unlabeled target domains without accessing source data. The emergence of SFUDA has improved the robustness of UDA methods and enables domain adaptation to perform well even in fields with security and privacy considerations. Self-training, which iteratively selects high-confidence target samples as pseudo-labeled samples to guide target model learning, is one approach to the domain gap problem in SFUDA. However, previous works only use the source domain model to generate the first round of pseudo labels before performing denoising tasks. This preprocessing is coarse, because it ignores the domain gap between the source and target domains. Given this domain gap, some incorrect pseudo labels may never be corrected by subsequent denoising. Inspired by the teacher-student model, this work proposes a method called Positive Sample Encouragement (PSE) that effectively addresses this issue in a coarse-to-fine manner. Specifically, we first conduct a contrastive learning task that enables the target model to effectively inherit from the source model, while allowing the source model to adapt to target data. Furthermore, we incorporate mutual information-based class balancing, which assigns high weights to high-confidence classes by learning pseudo-label feature distributions. Through extensive experiments on three widely used benchmarks, we demonstrate that our proposed method achieves competitive performance compared with state-of-the-art approaches.
|
|
We-S2-T6 |
Room 0.16 |
Machine Learning 2 |
Regular Papers - Cybernetics |
Chair: Bhattacharyya, Shuvra | University of Maryland, College Park |
Co-Chair: Ebrahimi, Masoud | Mälardalen University |
|
12:00-12:15, Paper We-S2-T6.2 | |
Towards Incorporating Social and Spatial Dependencies in Machine Learning Models for Crime Prediction |
|
Qi, Xiaowen | University of Maryland |
Nakamura, Kiminori | University of Maryland |
Bhattacharyya, Shuvra | University of Maryland, College Park |
Keywords: Machine Learning, Deep Learning, Neural Networks and their Applications
Abstract: In order to better dispatch police patrols, it is crucial to accurately predict crime hotspots. Considering crime prediction as a typical time series forecasting task (i.e., using crime history to predict future occurrences) has been shown to be effective. However, contextual features (e.g., demographics, economics, etc.) can also be introduced to improve the accuracy of the predictions as they relate to the formation of crime. Since such contextual features are usually considered to be static relative to the time scales of crime records, we first propose a parallel branch model with dedicated branches for each type of data so that temporal crime history and time-invariant contextual features can be processed coherently. Then, we incorporate both spatial and social dependencies into the model, considering that criminals may flee to neighboring areas and similar crime patterns may occur in areas with similar social functions. The experimental results confirm the effectiveness of our proposed model. In particular, our model achieves state-of-the-art performance with an accuracy of 75.3% and an AUC-ROC (area under the receiver operating characteristic curve) of 0.79.
|
|
12:15-12:30, Paper We-S2-T6.3 | |
LI-AMCU: Leveraging Isomorphism for Adaptive Motion Control Update |
|
Wang, Shengyi | University of Science and Technology of China |
Deng, GuoQing | Hefei Institutes of Physical Science, Chinese Academy of Sciences |
Zhang, Wen | Hefei Institutes of Physical Science, Chinese Academy of Sciences |
Keywords: Machine Learning, Machine Vision, Hybrid Models of Computational Intelligence
Abstract: Motion control of human musculoskeletal models has been playing an essential role in the field of biomechanics and computer animation simulation for the last decade. However, challenges remain, including the high computational complexity of the new task and the difficulty of ensuring the authenticity of the motion data mapping. In this paper, we propose LI-AMCU, a novel modeling framework for motion control. To achieve more efficient control in the deployment of various tasks, we combine reinforcement learning with a GAN-like framework and propose an update module in the control policy based on the latent space. To guarantee a more natural and realistic movement of the data-driven model, we propose a geometric transformation learning module based on isomorphism. Furthermore, experiments on the Human3.6M and LaFAN1 datasets demonstrate that LI-AMCU achieves state-of-the-art performance and has the robustness and generalizability to enable complex musculoskeletal models to perform different motion tasks. We also visualize the various motions of physics-based simulated characters. The implementation code will be released after acceptance.
|
|
12:30-12:45, Paper We-S2-T6.4 | |
Dynamic Zooming Strategy with Parameter Importance and Volume Enforcement for Parallel Optimization |
|
Yamaguchi, Yotaro | Osaka Institute of Technology |
Tanigaki, Yuki | Osaka Institute of Technology |
Keywords: Machine Learning, Metaheuristic Algorithms, Evolutionary Computation
Abstract: Black-box optimization (BBO) is a proven approach for solving complex real-world problems, such as hyperparameter optimization (HPO) in machine learning. HPO often involves expensive objective function evaluations, which limit the number of feasible evaluations. Accordingly, the demand for optimization methods with high parallel efficiency and fast convergence has increased. Sequential Uniform Design (SeqUD) satisfies these properties by iteratively applying uniform sampling and search space contraction (zooming) around the best solutions. However, in many HPO problems, only a few parameters strongly influence the objective function value, while others have little effect—a property known as low effective dimensionality (LED). Therefore, the uniform contraction utilized in SeqUD can lead to inefficient search for HPO problems. In this paper, we propose a new zooming strategy, referred to as Dynamic Importance-guided and Volume Enforcement (DIVE), for efficiently performing HPO with SeqUD. Numerical results on benchmark functions that emulate HPO characteristics show that SeqUD with DIVE outperforms CMA-ES, a major evolutionary algorithm, in both evaluation efficiency and parallel scalability under limited function evaluations.
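The zooming (contraction) step that SeqUD iterates, and that DIVE adapts per parameter, can be illustrated as follows; the per-parameter rates and the boundary handling are illustrative assumptions, not the paper's exact rule:

```python
def zoom(bounds, best, rates):
    """One zooming step: contract each parameter's interval around the
    current best solution. Plain SeqUD uses one rate for every parameter;
    contracting influential parameters faster is the importance-guided
    idea behind DIVE (the exact rule here is an illustrative assumption)."""
    new_bounds = []
    for (lo, hi), b, r in zip(bounds, best, rates):
        half = (hi - lo) * r / 2.0
        nlo, nhi = max(lo, b - half), min(hi, b + half)
        # preserve the target width by shifting when clipped at a boundary
        if nhi - nlo < 2.0 * half:
            if nlo == lo:
                nhi = min(hi, lo + 2.0 * half)
            else:
                nlo = max(lo, hi - 2.0 * half)
        new_bounds.append((nlo, nhi))
    return new_bounds


# Best point near the right edge of dimension 0: the window is shifted,
# not shrunk past the boundary.
box = zoom([(0.0, 1.0), (0.0, 1.0)], best=[0.9, 0.5], rates=[0.5, 0.5])
```

Each iteration then draws a new uniform design inside the contracted box, so evaluations concentrate where the best solutions were found.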
|
|
12:45-13:00, Paper We-S2-T6.5 | |
RARE: Robustness Assessment and Regularized Enhancement, a Study in Function Estimation and Symbolic Regression |
|
Ebrahimi, Masoud | Mälardalen University |
Alfalouji, Qamar | Align Technology Inc |
Keywords: Machine Learning, Neural Networks and their Applications, AI and Applications
Abstract: Robustness against noise and input perturbations is a crucial requirement for trustworthy machine learning (ML). We introduce Robustness Assessment and Regularized Enhancement (RARE), a model-agnostic framework that evaluates and improves robustness in regression tasks, including neural and symbolic approaches. RARE centers on a user-specified noise model (e.g., Gaussian or Laplace) to differentiate intended from unintended input variations, yet in practice generalizes to unseen noise. Its four components are: (i) a significant-pattern operator that synthesizes a single representative noise pattern per feature (worst-case in our tests, with best- or average-case also supported); (ii) a robustness assessor that measures the ratio of input deviation induced by significant patterns to output variation, analogous to estimating the inverse of local Lipschitz constants of the learned function; (iii) a data-augmentation module that increases training exposure to noise using the significant patterns; and (iv) a regularizer that embeds the metric in the loss to penalize low robustness. Because RARE targets the Lipschitz behaviour of the function under learning rather than the model's, it neither requires white-box access to model parameters nor assumes continuity or differentiability of the learning method. Consequently, RARE applies to off-the-shelf models such as Random Forests (RF) and even End-to-End Transformer-based Symbolic Models (E2E). Unlike adversarial methods, RARE does not require generating extensive synthetic data to improve robustness. Experiments showed that MLPs trained with the regularized loss exhibit 7-33% higher robustness under Gaussian noise and up to a 32.3% improvement under high-variance Laplace noise. Regularized CNNs outperform unregularized CNNs in low-noise conditions under Gaussian noise. A regularized symbolic equation learner converges in as few as 3,000 epochs in some cases, whereas the non-regularized one required 20,000 epochs. Finally, experimental results show that, compared to Meta's E2E, we achieve an average 66.83% reduction in fitting time, corresponding to an expected 3x speedup across all tested equations.
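The robustness assessor in (ii), the ratio of input deviation to output variation (an inverse local-Lipschitz estimate), reduces to a few lines for a single noise pattern; the toy regressor and patterns below are invented for illustration:

```python
def robustness(f, x, delta):
    """Ratio of input deviation to output variation for one significant
    noise pattern: large values mean a flat (robust) local response,
    analogous to the inverse local Lipschitz estimate described above."""
    dx = sum(d * d for d in delta) ** 0.5
    dy = abs(f([a + b for a, b in zip(x, delta)]) - f(x))
    return dx / dy if dy else float("inf")


# Illustrative regressor and per-feature worst-case patterns (not the paper's):
f = lambda v: 2.0 * v[0] + 0.1 * v[1]
r_feat0 = robustness(f, [1.0, 1.0], [0.1, 0.0])   # sensitive feature
r_feat1 = robustness(f, [1.0, 1.0], [0.0, 0.1])   # near-inert feature
```

Because the ratio depends only on input-output pairs, the same assessor works for non-differentiable learners such as random forests, which is the point the abstract makes about not needing white-box access.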
|
|
We-S2-T7 |
Room 0.31 |
Human-Machine Cooperation and Systems 2 |
Regular Papers - HMS |
Chair: Richards, Dale | Thales UK |
Co-Chair: Sitdhipol, Supawich | Chulalongkorn University |
|
11:45-12:00, Paper We-S2-T7.1 | |
Autonomous Sensor Management: Using Decision Strings Methodology |
|
Richards, Dale | Thales UK |
Glover, Timothy John | Loughborough University |
Knowles, James | Loughborough University |
Coombes, Matthew | Loughborough University |
Keywords: Human-Machine Cooperation and Systems, Human Factors, Interactive Design Science and Engineering
Abstract: Maintaining superiority on the battlefield is vital in ensuring mission success. The use of advanced technologies such as autonomy and Artificial Intelligence offers the ability for the human to let the machines do the heavy lifting of tasks, especially those that require rapid processing of complex information within a dynamic environment. This paper outlines the MASTER SOUP project, which demonstrates the use of AI to manage multiple sensors, whilst also introducing a new method to help the multidisciplinary design team better understand how the system deals with decisions that occur during the mission. The use of a method utilizing decision strings is outlined and discussed. This method was found to be beneficial to both Human Factors and AI engineers in designing the Human-Machine Teaming concept. The use of decision strings provides an intuitive methodology for identifying and deconstructing the nature of decisions within the Human-Machine Team. Further to this, it can be used for the benefit of all members of the design team, facilitating the elicitation of design requirements across multidisciplinary fields.
|
|
12:00-12:15, Paper We-S2-T7.2 | |
Spatial Language Likelihood Grounding Network for Bayesian Fusion of Human-Robot Observations |
|
Sitdhipol, Supawich | Chulalongkorn University |
Sukprasongdee, Waritwong | Chulalongkorn University |
Chuangsuwanich, Ekapol | Chulalongkorn University |
Tse, Rina | Chulalongkorn University |
Keywords: Human-Machine Cooperation and Systems, Human-Collaborative Robotics, Intelligence Interaction
Abstract: Fusing information from human observations can help robots overcome sensing limitations in collaborative tasks. However, an uncertainty-aware fusion framework requires a grounded likelihood representing the uncertainty of human inputs. This paper presents a Feature Pyramid Likelihood Grounding Network (FP-LGN) that grounds spatial language by learning relevant map image features and their relationships with spatial relation semantics. The model is trained as a probability estimator to capture aleatoric uncertainty in human language using three-stage curriculum learning. Results showed that FP-LGN matched expert-designed rules in mean Negative Log-Likelihood (NLL) and demonstrated greater robustness with lower standard deviation. Collaborative sensing results demonstrated that the grounded likelihood successfully enabled uncertainty-aware fusion of heterogeneous human language observations and robot sensor measurements, achieving significant improvements in human–robot collaborative task performance.
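The uncertainty-aware fusion the abstract refers to is, at its core, Bayes' rule over map locations, with FP-LGN supplying the grounded language likelihood; a toy discrete version (the grid size and likelihood values are invented for illustration):

```python
def fuse(prior, human_lik, sensor_lik):
    """Bayes' rule over discrete map cells: posterior is proportional to
    prior times the human-language likelihood times the robot-sensor
    likelihood. In the paper the language likelihood is learned by
    FP-LGN; the numbers below are invented for illustration."""
    post = [p * h * s for p, h, s in zip(prior, human_lik, sensor_lik)]
    z = sum(post)                    # normalizing constant
    return [p / z for p in post]


posterior = fuse([0.25, 0.25, 0.25, 0.25],   # uniform prior over 4 cells
                 [0.4, 0.3, 0.2, 0.1],       # human: "it is near cell 0"
                 [0.1, 0.2, 0.3, 0.4])       # robot sensor favors cell 3
```

A well-calibrated language likelihood matters here: an overconfident grounding would dominate the sensor evidence, which is why the network is trained as a probability estimator of aleatoric uncertainty.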
|
|
12:15-12:30, Paper We-S2-T7.3 | |
Unlocking Crowdsourcing in Technology Foresight: A Rapid Review |
|
Simões, Jonatas | Universidade Federal Do Rio De Janeiro |
Nóbrega, Lucas | Universidade Federal Do Rio De Janeiro |
Martinez, Luiz Felipe | Universidade Federal Do Rio De Janeiro |
Argôlo, Matheus | Universidade Federal Do Rio De Janeiro |
Barbosa, Carlos Eduardo | Universidade Federal Do Rio De Janeiro |
de Almeida, Marcos Antonio | UFRJ |
Souza, Jano | Federal University of Rio De Janeiro |
Keywords: Cooperative Work in Design, Human-Machine Cooperation and Systems, Human Factors
Abstract: Traditional methods for Technology Foresight often struggle to keep up with the fast changes in technology. This work looks at how crowdsourcing can improve Technology Foresight by using collective intelligence to spot trends and tackle complex problems. The goal of this research is to overcome the limits of traditional approaches by using online platforms and new technologies like big data and machine learning to make innovation more accessible. We examine how to apply crowdsourcing effectively in Technology Foresight, discussing its benefits, challenges, and real-world uses. We used a Rapid Review (RR) approach, combined with Large Language Models (LLMs), to refine how we select and analyze research. This method helped us find important insights and categorize applications in different areas. The findings show that crowdsourcing can speed up technology forecasting by bringing in diverse viewpoints and encouraging collaboration across fields. Examples include its use in urban planning, disaster management, and open innovation platforms. Furthermore, combining crowdsourcing with advanced technologies enhances its effectiveness in foresight processes. This work offers practical advice for organizations that want to use crowdsourcing in their innovation strategies and suggests future ways to integrate new technologies into Technology Foresight.
|
|
12:30-12:45, Paper We-S2-T7.4 | |
Analysis of Gait Pattern Changes During Use of Wearable Cyborg HAL Related to Gait Ability in an Individual with Neuromuscular Disease |
|
Namikawa, Yasuko | University of Tsukuba |
Sankai, Yoshiyuki | University of Tsukuba |
Uehara, Akira | University of Tsukuba |
Kawamoto, Hiroaki | University of Tsukuba |
Keywords: Assistive Technology, Medical Informatics, Human-Machine Cooperation and Systems
Abstract: Cybernics treatment using the Hybrid Assistive Limb (HAL) can improve gait abilities. Contrary to conventional evaluations that compare 2-min walk distances without wearing HAL between pre- and post-intervention, we assess gait data measured by HAL during gait assistance, which enables the observation of gait changes that accompany the intervention. To establish this novel evaluation approach, it is essential to examine the relationship between changes in gait ability without HAL and gait patterns during HAL-assisted walking. Focusing on one individual with a neuromuscular disease, this study analyzed and evaluated the relationship between changes in the 2-min walk distance and changes in gait patterns during HAL-assisted walking. Principal component analysis (PCA) was employed to characterize the changes in gait patterns during HAL wear, followed by the creation of an individual model that predicts the rate of change in the 2-min walk distance without HAL from those features using the eXtreme Gradient Boosting (XGBoost). The model was then interpreted using SHapley Additive exPlanations (SHAP) to analyze the contributions of each feature to the prediction. Analysis of nine HAL-assisted walking trials and corresponding 2-min walk distance measurements revealed that changes in the 2-min walk distance were associated with alterations in specific gait patterns during HAL-assisted walking: knee joint angles, knee joint torques generated by HAL, and trunk pitch angles representing anterior-posterior trunk tilting. These findings clarified important features related to changes in gait ability within longitudinal gait pattern changes during cybernics treatment and demonstrated the utility of our analysis method and HAL-measured data for evaluating individual gait pattern changes.
|
|
12:45-13:00, Paper We-S2-T7.5 | |
Investigating the Stability of Neuronal Dynamics in Low-Oxygen States to Model Cognitive Engagement |
|
Almeida Campelo Ferreira, Rafaela | University of Florida |
Beres, Szilard Laszlo | University of Florida |
Napoli, Nicholas Joseph | University of Florida |
Keywords: Human Performance Modeling, Human Factors, Human-Machine Interaction
Abstract: Cognitive impairment due to hypoxia significantly affects human performance, particularly during tasks that require sustained mental effort and engagement. Electroencephalography (EEG) has emerged as a valuable non-invasive tool for analyzing brain activity and cognitive performance. However, traditional EEG-based methods for human performance modeling, such as the Engagement Index and wavelet entropy, are limited in their ability to detect rapid and localized changes in brain dynamics that are critical for real-time modeling. This paper investigates Spectral Stability, an entropy-based method that captures fast, localized changes in brain activity by analyzing the hierarchical rankings of spectral intensities across EEG frequency bands. Using a data set collected under normoxic and hypoxic dual-task scenarios, we evaluate the ability of Spectral Stability to provide granular insight into oscillatory cognitive states in different regions of the brain. Our findings demonstrate that Spectral Stability reliably distinguishes between normoxic and hypoxic conditions across multiple brain regions, outperforming the Engagement Index in spatial specificity, while also correlating significantly with the Engagement Index, indicating shared underlying neural dynamics. These results highlight that Spectral Stability may provide additional key information within neuronal dynamics as a tool to advance cognitive state modeling and understanding real-time engagement variability under impairment.
|
|
We-S2-T8 |
Room 0.32 |
Securing Trust and Resilience in AI-Driven Autonomous Systems |
Special Sessions: SSE |
Chair: Homaifar, Abdollah | North Carolina A&T State University |
Co-Chair: Nahavandi, Saeid | Swinburne University of Technology |
Organizer: Kebria, Parham | North Carolina Agricultural and Technical State University |
Organizer: Homaifar, Abdollah | North Carolina A&T State University |
Organizer: Nahavandi, Saeid | Swinburne University of Technology |
|
11:45-12:00, Paper We-S2-T8.1 | |
GPS Spoofing Attack Detection in Autonomous Vehicles Using Adaptive DBSCAN (I) |
|
Mohammadi, Ahmad | North Carolina Agricultural and Technical State University |
Ahmari, Reza | North Carolina Agricultural and Technical State University |
Hemmati, Vahid | North Carolina Agricultural and Technical State University |
Owusu-Ambrose, Frederick | North Carolina Agricultural and Technical State University |
Mahmoud, Mahmoud Nabil | North Carolina Agricultural and Technical State University |
Kebria, Parham | North Carolina Agricultural and Technical State University |
Homaifar, Abdollah | North Carolina A&T State University |
Saif, Mehrdad | University of Windsor |
Keywords: Autonomous Vehicle, Intelligent Transportation Systems, Trust in Autonomous Systems
Abstract: As autonomous vehicles (AVs) become an essential component of modern transportation, they are increasingly vulnerable to threats such as GPS spoofing attacks. This study presents an adaptive detection approach utilizing a dynamically tuned Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, designed to adjust the detection threshold (ε) in real time. The threshold is updated based on the recursive mean and standard deviation of displacement errors between GPS and in-vehicle sensor data, but only at instances classified as non-anomalous. Furthermore, an initial threshold, determined from 120,000 clean data samples, ensures the capability to identify even subtle and gradual GPS spoofing attempts from the beginning. To assess the performance of the proposed method, five different subsets from the real-world Honda Research Institute Driving Dataset (HDD) are selected to simulate both large- and small-magnitude GPS spoofing attacks. The modified algorithm effectively identifies turn-by-turn, stop, overshoot, and multiple small biased spoofing attacks, achieving detection accuracies of 98.62±1%, 99.96±0.1%, 99.88±0.1%, and 98.38±0.1%, respectively. This work provides a substantial advancement in enhancing the security and safety of AVs against GPS spoofing threats.
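The recursive threshold update, a running mean and standard deviation of displacement errors maintained only over samples classified as non-anomalous, can be sketched with a Welford-style update (the multiplier k and the small seed set are assumptions; the paper seeds the statistics from 120,000 clean samples):

```python
import math


class AdaptiveThreshold:
    """Recursive mean/std of GPS-vs-onboard-sensor displacement errors,
    updated only on samples classified as non-anomalous, in the spirit of
    the adaptive epsilon described above. The multiplier k and the small
    seed set are assumptions, not the paper's configuration."""

    def __init__(self, clean_errors, k=3.0):
        self.n = len(clean_errors)
        self.mean = sum(clean_errors) / self.n
        self.m2 = sum((e - self.mean) ** 2 for e in clean_errors)
        self.k = k

    @property
    def eps(self):
        return self.mean + self.k * math.sqrt(self.m2 / self.n)

    def step(self, error):
        """Return True if `error` is flagged as a spoofing anomaly."""
        anomalous = error > self.eps
        if not anomalous:            # Welford update on clean samples only
            self.n += 1
            d = error - self.mean
            self.mean += d / self.n
            self.m2 += d * (error - self.mean)
        return anomalous


detector = AdaptiveThreshold([1.0, 1.2, 0.8, 1.0])
flag_attack = detector.step(10.0)   # large displacement error
flag_clean = detector.step(1.1)     # consistent with clean statistics
```

Freezing the statistics during flagged samples is what keeps a gradual spoofing bias from slowly inflating the threshold and hiding itself.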
|
|
12:00-12:15, Paper We-S2-T8.2 | |
An Experimental Study of Trojan Vulnerabilities in UAV Autonomous Landing (I) |
|
Ahmari, Reza | North Carolina Agricultural and Technical State University |
Mohammadi, Ahmad | North Carolina Agricultural and Technical State University |
Hemmati, Vahid | North Carolina Agricultural and Technical State University |
Mynuddin, Mohammed | North Carolina Agricultural and Technical State University |
Mahmoud, Mahmoud Nabil | North Carolina Agricultural and Technical State University |
Kebria, Parham | North Carolina Agricultural and Technical State University |
Homaifar, Abdollah | North Carolina A&T State University |
Saif, Mehrdad | University of Windsor |
Keywords: Autonomous Vehicle, Trust in Autonomous Systems, Robotic Systems
Abstract: This study investigates the vulnerabilities of autonomous navigation and landing systems in Urban Air Mobility (UAM) vehicles. Specifically, it focuses on Trojan attacks that target deep learning models, such as Convolutional Neural Networks (CNNs). Trojan attacks work by embedding covert triggers within a model’s training data. These triggers cause specific failures under certain conditions, while the model continues to perform normally in other situations. We assessed the vulnerability of Urban Autonomous Aerial Vehicles (UAAVs) using the DroNet framework. Our experiments showed a significant drop in accuracy, from 96.4% on clean data to 73.3% on data triggered by Trojan attacks. To conduct this study, we collected a custom dataset and trained models to simulate real-world conditions. We also developed an evaluation framework designed to identify Trojan-infected models. This work demonstrates the potential security risks posed by Trojan attacks and lays the groundwork for future research on enhancing the resilience of UAM systems.
|
|
12:15-12:30, Paper We-S2-T8.3 | |
Conflict-Free Flight Scheduling Using Strategic Demand Capacity Balancing for Urban Air Mobility Operations (I) |
|
Hemmati, Vahid | North Carolina Agricultural and Technical State University |
Ayalew, Yonas | North Carolina A&T State University |
Mohammadi, Ahmad | North Carolina Agricultural and Technical State University |
Ahmari, Reza | North Carolina Agricultural and Technical State University |
Kebria, Parham | North Carolina Agricultural and Technical State University |
Homaifar, Abdollah | North Carolina A&T State University |
Saif, Mehrdad | University of Windsor |
Keywords: System Modeling and Control, Conflict Resolution, Cooperative Systems and Control
Abstract: In this paper, we propose a conflict-free multi-agent flight scheduling that ensures robust separation in constrained airspace for Urban Air Mobility (UAM) operations application. First, we introduce Pairwise Conflict Avoidance (PCA) based on delayed departures, leveraging kinematic principles to maintain safe distances. Next, we expand PCA to multi-agent scenarios, formulating an optimization approach that systematically determines departure times under increasing traffic densities. Performance metrics, such as average delay, assess the effectiveness of our solution. Through numerical simulations across diverse multi-agent environments and real-world UAM use cases, our method demonstrates a significant reduction in total delay while ensuring collision-free operations. This approach provides a scalable framework for emerging urban air mobility systems.
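The kinematic core of Pairwise Conflict Avoidance, delaying one departure until arrival times at a shared waypoint are sufficiently separated, can be illustrated in a simplified 1-D form (the distances, speeds, and separation-to-time-gap conversion below are illustrative assumptions, not the paper's formulation):

```python
def departure_delay(d1, v1, d2, v2, sep):
    """Minimum departure delay for aircraft 2 so that arrivals at a shared
    waypoint are separated by at least `sep` meters along-track. This is a
    simplified 1-D stand-in for the paper's kinematic PCA formulation."""
    t1, t2 = d1 / v1, d2 / v2     # unconstrained arrival times at the waypoint
    gap = sep / v2                # time gap that yields `sep` spacing at speed v2
    if abs(t2 - t1) >= gap:
        return 0.0                # already conflict-free
    return (t1 + gap) - t2        # push aircraft 2 behind aircraft 1


# 1000 m at 10 m/s vs 900 m at 10 m/s with 300 m required separation:
delay = departure_delay(1000.0, 10.0, 900.0, 10.0, sep=300.0)
```

In the multi-agent setting, the optimization described in the abstract chooses a set of such delays jointly, keeping every pairwise constraint satisfied while minimizing metrics such as average delay.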
|
|
12:30-12:45, Paper We-S2-T8.4 | |
The Emergence of Deep Reinforcement Learning for Path Planning (I) |
|
Nguyen, Thanh Thi | Monash University |
Nahavandi, Saeid | Swinburne University of Technology |
Razzak, Imran | Mohamed Bin Zayed University of Artificial Intelligence |
Nguyen, Dung | The University of Queensland |
Pham, Nhat Truong | Sungkyunkwan University |
Nguyen, Quoc Viet Hung | Griffith University |
Keywords: Autonomous Vehicle, Modeling of Autonomous Systems, Trust in Autonomous Systems
Abstract: The increasing demand for autonomous systems in complex and dynamic environments has driven significant research into intelligent path planning methodologies. For decades, graph-based search algorithms, linear programming techniques, and evolutionary computation methods have served as foundational approaches in this domain. Recently, deep reinforcement learning (DRL) has emerged as a powerful method for enabling autonomous agents to learn optimal navigation strategies through interaction with their environments. This survey provides a comprehensive overview of traditional approaches as well as the recent advancements in DRL applied to path planning tasks, focusing on autonomous vehicles, drones, and robotic platforms. Key algorithms across both conventional and learning-based paradigms are categorized, with their innovations and practical implementations highlighted. This is followed by a thorough discussion of their respective strengths and limitations in terms of computational efficiency, scalability, adaptability, and robustness. The survey concludes by identifying key open challenges and outlining promising avenues for future research. Special attention is given to hybrid approaches that integrate DRL with classical planning techniques to leverage the benefits of both learning-based adaptability and deterministic reliability, offering promising directions for robust and resilient autonomous navigation.
|
|
12:45-13:00, Paper We-S2-T8.5 | |
A Novel Flight Modeling Framework for Unmanned Aircraft in Realistic Airspace Encounters (I) |
|
Zeleke, Lydia Asrat | North Carolina A&T State University |
Lartey, Benjamin | North Carolina A&T State University |
Nuhu, Abdul-Rauf | North Carolina Agricultural and Technical State University |
Ayalew, Yonas | North Carolina A&T State University |
Kebria, Parham | North Carolina Agricultural and Technical State University |
Homaifar, Abdollah | North Carolina A&T State University |
Keywords: Modeling of Autonomous Systems, Trust in Autonomous Systems, System Modeling and Control
Abstract: The integration of Unmanned Aircraft Systems (UAS) into the National Airspace System (NAS) requires robust encounter modeling tools to evaluate Detect-and-Avoid (DAA) systems. However, existing tools often lack the ability to model the unique dynamics of large UAS (lUAS) and small UAS (sUAS), and few are available as open-source solutions. In this paper, we introduce novel flight modeling concepts for lUAS and sUAS, developed within an open-source framework. For lUAS, we propose a hybrid modeling strategy that combines probabilistic manned aircraft models with UAS-specific performance constraints. A machine learning-based surrogate model is employed to streamline feasibility evaluation and enable the generation of realistic trajectories. The sUAS flight modeling technique enables mission-aware trajectory generation by incorporating geospatial data and customized control architecture for fixed-wing and multirotor configurations. These models capture a wide range of aircraft dynamics, offering the potential to generate versatile encounter datasets for DAA evaluation. By releasing them as open-source resources, we aim to encourage broader collaboration, inform regulatory development, and drive innovation in encounter modeling, all in support of the safe and effective integration of UAS into the NAS.
|
|
We-S2-T9 |
Room 0.51 |
Systems Safety and Security |
Regular Papers - HMS |
Chair: Colombo, Pietro | University of Insubria, Italy |
Co-Chair: Panda, Deepak Kumar | Cranfield University |
|
11:45-12:00, Paper We-S2-T9.1 | |
Efficient Enforcement of Fine-Grained Access Control in Sparkplug-Based Industrial Internet of Things |
|
Colombo, Pietro | University of Insubria, Italy |
Ferrari, Elena | University of Insubria, Italy |
Keywords: Systems Safety and Security
Abstract: Sparkplug is an emerging open-source software specification for Industrial Internet of Things (IIoT) systems, designed to favor data integration and device interoperability in an MQTT infrastructure. Although the security issues of IIoT systems can have significant safety implications, Sparkplug only provides basic security features and essential, coarse-grained access control (AC) mechanisms. Effective AC solutions for Sparkplug-based IIoT systems still need to be designed, and, given Sparkplug's increasing popularity and its recent definition as an ISO standard, this has become a crucial need. To fill this void, this paper proposes an approach to efficiently enforcing fine-grained AC in Sparkplug-based IIoT systems. In particular, we define a fine-grained discretionary AC model and a related reference monitor implementing an efficient enforcement mechanism. Early performance evaluations show a reasonably low time overhead.
|
|
12:00-12:15, Paper We-S2-T9.2 | |
Identification and Assessment of Causes of Urban Gas Accidents Via Text Mining and Bayesian Networks |
|
Guo, Peng | Northwestern Polytechnical University |
Ma, Yue | Northwestern Polytechnical University |
Zhao, Jing | Northwestern Polytechnical University |
Wei, Fuchuan | Northwestern Polytechnical University |
Keywords: Systems Safety and Security, Human Factors, Networking and Decision-Making
Abstract: Urban gas systems, as critical components of urban infrastructure, are increasingly exposed to complex safety risks due to ongoing urban expansion and the resulting rise in accident frequency, posing significant threats to public welfare. Traditional cause analysis methods based on expert judgment often lack objectivity and reproducibility. This paper proposes a data-driven framework for identifying and evaluating the causes of urban gas accidents using information extracted from accident reports. A TF-H text mining technique is employed to extract causal factors, while the Apriori algorithm reveals their interrelationships. A Bayesian network is then constructed to infer accident severity and assess the relative importance of each factor. Validation with real-world cases confirms the effectiveness of the proposed approach. Based on the results, targeted management strategies are recommended to support the prevention and mitigation of urban gas accidents.
|
|
12:15-12:30, Paper We-S2-T9.3 | |
Generative Adversarial Evasion and Out-Of-Distribution Detection for UAV Cyber-Attacks |
|
Panda, Deepak Kumar | Cranfield University |
Guo, Weisi | Cranfield University |
Keywords: Systems Safety and Security, Resilience Engineering, Information Visualization
Abstract: The increasing integration of UAVs into civilian airspace has amplified the urgency for resilient and intelligent intrusion detection system (IDS) frameworks, as traditional anomaly detection methods often struggle to detect novel threats. A common strategy is to treat unfamiliar attacks as out-of-distribution (OOD) samples; inadequate mitigation responses can then leave systems vulnerable, granting adversaries the capability to cause potential damage. Furthermore, conventional OOD detectors frequently fail to discriminate stealthy adversarial attacks from OOD samples. This paper proposes a conditional generative adversarial network (cGAN)-based framework specifically designed to craft stealthy adversarial attacks that effectively evade IDS mechanisms. Initially, we construct a robust multi-class classifier as the IDS, which distinguishes benign UAV telemetry data from known cyber-attack types, including Denial of Service (DoS), false data injection (FDI), man-in-the-middle (MiTM), and replay attacks. Leveraging this classifier, our proposed cGAN strategically perturbs known attack features, generating sophisticated adversarial samples engineered to evade detection through benign misclassification. The generated stealthy adversarial samples are then iteratively refined to maintain statistical similarity with out-of-distribution (OOD) samples while achieving a high attack success rate. To effectively detect these stealthy adversarial perturbations, a conditional variational autoencoder (CVAE) is implemented, using negative log-likelihood as a metric to distinguish adversarial samples from genuine OOD samples. Comparative analyses between CVAE-based regret analysis and traditional Mahalanobis distance-based detectors demonstrate that the CVAE's negative log-likelihood significantly outperforms in distinguishing stealthy adversarial attacks from OOD samples. Our findings highlight the necessity of advanced probabilistic modeling techniques to reliably detect, and adapt existing IDS against, novel generative-model-based stealthy cyber threats.
|
|
12:30-12:45, Paper We-S2-T9.4 | |
Inferring Smartphone Application Types Via Inaudible Charging Sound: A New Acoustic Side-Channel Attack |
|
Meng, Junchen | Zhejiang University of Technology |
Yang, Zhe | Zhejiang University of Technology |
Zhang, Xiao-Li | University of Science and Technology Beijing |
Zhu, Huaiyu | Zhejiang University |
Keywords: Systems Safety and Security, Environmental Sensing, Human Perception in Multimedia
Abstract: The security and privacy protection of mobile devices are critical issues. This study explores a novel acoustic side-channel attack that infers the category of applications (apps) being used by analyzing the inaudible sounds emitted by chargers during smartphone charging. Unlike traditional methods that require direct contact with or modification of the device, this approach offers widespread applicability and enhanced concealment. Through preprocessing, feature extraction, and model training, we conducted experiments using various smartphones, apps, and chargers. The results indicate that both Random Forest and Transformer models achieve nearly perfect performance in app type inference under random-split cross-validation, while the Transformer with end-to-end feature learning shows higher stability and generalizability under strict leave-one-out cross-validation, with an average accuracy of 63.52%. Finally, we provide further analysis of the effectiveness and efficiency of the proposed method, highlighting the potential threat of acoustic side-channel attacks and emphasizing the need for enhanced privacy protection and improved charging technologies.
|
|
12:45-13:00, Paper We-S2-T9.5 | |
VF-Mix: Variational Feature Mixing for Cross-Domain Face Anti-Spoofing |
|
Chen, Danwei | Nanjing University of Posts and Telecommunications |
Lin, Daoyang | Nanjing University of Posts and Telecommunications |
Keywords: Biometrics and Applications, Systems Safety and Security
Abstract: Face Anti-Spoofing (FAS) is essential for securing face recognition systems, but its practical deployment is severely hindered by the domain shift problem, where models trained on source domains perform poorly on unseen target domains. To address this critical challenge, this paper proposes VF-Mix, a framework whose novelty lies in the synergistic integration of multiple techniques to effectively balance feature invariance and discriminability. VF-Mix first utilizes a variational encoder to learn a regularized latent space. Within this space, cross-domain feature mixing and domain adversarial training work in concert to align distributions and learn generalized, domain-invariant cues. Crucially, supervised contrastive learning is simultaneously applied to enhance the class-separability of these features, ensuring that the push for invariance does not weaken discriminative power. This entire process is stabilized by a gradient alignment mechanism that harmonizes optimization signals from diverse source domains. Extensive experiments on standard benchmarks validate our approach. Under the challenging leave-one-out O&C&M→I protocol, VF-Mix achieves state-of-the-art performance, reducing the HTER to 3.75% and increasing the AUC to 99.34%, significantly outperforming prior methods. Ablation studies confirm that this synergy between components is critical for learning robust and discriminative features, presenting VF-Mix as an effective solution for advancing cross-domain face anti-spoofing.
|
|
We-S2-T10 |
Room 0.90 |
Visual Computing and Cognitive Modelling for Human-Machine and Social
Interaction & Humanized Crowd Computing |
Special Sessions: HMS |
Chair: Yu, Hui | University of Glasgow |
Co-Chair: Tang, Ying | Rowan University |
Organizer: Yu, Hui | University of Glasgow |
Organizer: Jian, Muwei | Shandong University of Finance and Economics |
Organizer: Tang, Ying | Rowan University |
Organizer: Wang, Jiacun | Monmouth University |
Organizer: EL Yacoubi, Mounîm A. | Institut Mines-Telecom / Telecom SudParis |
|
11:45-12:00, Paper We-S2-T10.1 | |
Open World Adaptive Pseudo Contrastive Learning for Generalized Category Discovery (I) |
|
Hao, Yiqing | Beijing Jiaotong University |
Wang, Xu | Beijing Jiaotong University |
Jin, Yi | Beijing Jiaotong University |
Wang, Tao | Beijing Jiaotong University |
Li, Yidong | Beijing Jiaotong University |
Yu, Hui | University of Glasgow |
Keywords: Visual Analytics/Communication
Abstract: In this work, we investigate the challenging task of Generalized Category Discovery (GCD). Given datasets collected from open-world scenarios comprising both labeled and unlabeled images, GCD aims to classify all unlabeled images while simultaneously identifying unlabeled novel categories. The fundamental challenge in GCD tasks stems from inherent annotation discrepancies between seen and novel classes within the dataset. The lack of reliable label supervision for novel classes in unlabeled data leads to significant disparities in the model's learning between old and novel classes, which is termed the bias risk. Recent advancements in GCD have employed the entropy maximization algorithm to alleviate the bias risk. However, they fail to provide debiased optimization for unlabeled data, leading to models that struggle to extract discriminative features from such data. To address these challenges, we propose an open-world pseudo-contrastive learning framework named OpcGCD. Our OpcGCD framework implements a dynamic category-wise threshold mechanism, which employs a parametric prototype classifier to generate debiased pseudo-labels for unlabeled samples. To facilitate the learning of discriminative feature representations, OpcGCD employs these debiased pseudo-labels in the formulation of a contrastive learning loss. Extensive evaluations conducted on multiple GCD benchmark datasets demonstrate the robustness and effectiveness of our approach.
|
|
12:00-12:15, Paper We-S2-T10.2 | |
AirVista-II: An Agentic System for Embodied UAVs Toward Dynamic Scene Semantic Understanding (I) |
|
Lin, Fei | Macau University of Science and Technology |
Tian, Yonglin | Institute of Automation, Chinese Academy of Sciences |
Zhang, Tengchao | Macau University of Science and Technology |
Huang, Jun | Macau University of Science and Technology |
Guan, Sangtian | Macau University of Science and Technology |
Wang, Fei-Yue | Institute of Automation, Chinese Academy of Sciences |
Keywords: Human-Machine Interaction, Human-Collaborative Robotics, Intelligence Interaction
Abstract: Unmanned Aerial Vehicles (UAVs) are increasingly important in dynamic environments such as logistics transportation and disaster response. However, current tasks often rely on human operators to monitor aerial videos and make operational decisions. This mode of human-machine collaboration suffers from significant limitations in efficiency and adaptability. In this paper, we present AirVista-II—an end-to-end agentic system for embodied UAVs, designed to enable general-purpose semantic understanding and reasoning in dynamic scenes. The system integrates agent-based task identification and scheduling, multimodal perception mechanisms, and differentiated keyframe extraction strategies tailored for various temporal scenarios, enabling the efficient capture of critical scene information. Experimental results demonstrate that the proposed system achieves high-quality semantic understanding across diverse UAV-based dynamic scenarios under a zero-shot setting.
|
|
12:15-12:30, Paper We-S2-T10.3 | |
Revisiting Multi-Modal Alignment: A Distribution View (I) |
|
Li, Weikai | Chongqing Jiaotong University |
Nan, Tian | Chongqing Jiaotong University |
Li, Yuan | IntelliCloud |
Tang, Ying | Rowan University |
Keywords: Cognitive Computing, Human-Computer Interaction, Multimedia Systems
Abstract: Current multi-modal large language models (MMLMs) primarily rely on instance-level feature statistics for cross-modal alignment. However, they commonly suffer from three inherent limitations: vulnerability to outlier perturbations, neglect of inter-feature covariance structures, and local-optimum trapping. These limitations stem from a critical oversight—existing approaches disregard the global statistical structure of multi-modal data, treating cross-modal alignment as isolated feature-level alignment rather than systematic distribution-level alignment. To address these issues, this paper proposes Layer-wise Covariance Alignment (LCA), which leverages distribution-level alignment for cross-modal alignment. The effectiveness of LCA is validated through parameter-efficient Low-Rank Adaptation (LoRA) on CLIP architectures. Experimental validation across 8 benchmarks demonstrates state-of-the-art performance, confirming the critical role of distribution-level alignment in overcoming sample-level optimization constraints for cross-modal learning.
|
|
12:30-12:45, Paper We-S2-T10.4 | |
KEMO: A Multi-Objective Thought Chain Distillation Based Model for Intraoperative Hazardous Prediction and Event Plan Generation (I) |
|
Hao, Sen | Beijing University of Technology |
Wu, Huimin | Capital Medical University, Beijing Anzhen Hospital |
Zhao, Qing | Beijing University of Technology |
Qi, Hongzhi | Beijing University of Technology |
Che, Shuyao | Beijing University of Technology |
Wang, Sheng | Capital Medical University, Beijing Anzhen Hospital |
Pei, Yan | University of Aizu |
Ouyang, Yinuo | Beijing University of Technology |
Li, Jianqiang | Beijing University of Technology |
Keywords: Human-Computer Interaction, Medical Informatics, Human-Machine Interaction
Abstract: Accurate prediction of intraoperative hazardous events and generation of effective intervention plans are critical to surgical safety, but face combined challenges of real-time performance, accuracy, and interpretability. Large language models have potential, but their high cost and susceptibility to hallucination limit their application in real-time clinical environments. Traditional multi-task learning models are efficient but knowledge-constrained, making it difficult to capture complex reasoning processes. To bridge this gap, this paper proposes KEMO, a multi-objective knowledge-enhanced distillation model. KEMO innovatively adopts a multi-objective chain-of-thought distillation framework that not only mimics the predictions of a teacher LLM but also explicitly transfers its structured reasoning process to a lightweight student model, improving the model's accuracy and interpretability by synergistically optimising three objectives: event prediction, reasoning alignment, and scenario generation. In addition, a knowledge-graph-based retrieval-augmented generation mechanism dynamically injects validated medical knowledge to enhance the accuracy and reliability of decision-making and reduce hallucination. Experimental results show that KEMO significantly outperforms traditional models of comparable size in intraoperative hazardous event prediction and intervention plan generation, achieving performance comparable to the large teacher model. KEMO thus effectively bridges the gap between large language models and actual clinical application, facilitating the transfer of large-model knowledge to real clinical deployment.
|
|
12:45-13:00, Paper We-S2-T10.5 | |
Capsule-ConvKAN: A Hybrid Neural Approach to Medical Image Classification (I) |
|
Pitukova, Laura | Technical University of Kosice |
Sinčák, Peter | Technical University of Kosice |
Kovács, László József | University of Miskolc |
|
We-S2-T12 |
Room 0.95 |
Invited Position and Review Papers on Emerging Trends in Systems, Man, and
Cybernetics |
Special Sessions: Cyber |
Chair: Strasser, Thomas | AIT Austrian Institute of Technology GmbH |
Co-Chair: Lai, Loi Lei | Guangdong University of Technology |
Organizer: Strasser, Thomas | AIT Austrian Institute of Technology GmbH |
Organizer: Eigner, György | Obuda University |
Organizer: Kovacs, Levente | Obuda University |
|
11:45-12:00, Paper We-S2-T12.1 | |
Memories of the Future: Systems, Human, and Cybernetic Aspects of the Emerging Post-AI World (I) |
|
Kreinovich, Vladik | University of Texas at El Paso |
Svitek, Miroslav | Czech Technical University in Prague |
Urenda, Julio | University of Texas at El Paso |
Kosheleva, Olga | University of Texas at El Paso |
Keywords: AI and Applications, Application of Artificial Intelligence, Expert and Knowledge-Based Systems
Abstract: While current machine-learning-based AI techniques have been spectacularly successful, their present applications still leave many important open questions – for example, how to make their results more reliable or, at least, how to gauge how reliable each AI recommendation is. In this paper, we argue that to fully answer these questions, we need to go beyond current AI techniques, and that in this development, systems-, human-, and cybernetics-based ideas not only naturally appear, they seem to provide a way to the desired answers.
|
|
12:00-12:15, Paper We-S2-T12.2 | |
A Brief Overview on Some Areas in Systems, Man and Cybernetics and Suggestions on Their Future (I) |
|
Lai, Qi Hong | University of Oxford |
Yuan, Yujie | Beijing Jiaotong University |
Lai, Chun Sing | Brunel University London |
Chen, Chunjie | SIAT, CAS |
Lai, Loi Lei | Guangdong University of Technology |
Keywords: AI and Applications, Cloud, IoT, and Robotics Integration
Abstract: The authors hope that this overview and suggestions will stimulate and contribute to further ongoing discussions and interesting research work and industrial applications in some fields of Systems, Man and Cybernetics.
|
|
12:15-12:30, Paper We-S2-T12.3 | |
Toward Autonomous Educational Support with Multi-Agent Systems (I) |
|
Hare, Ryan | Rowan University |
Tang, Ying | Rowan University |
Keywords: Agent-Based Modeling, Machine Learning
Abstract: Integrating artificial intelligence into educational technology presents great opportunities for automated educational systems. These systems could relieve teacher resources and support underperforming students. However, creating systems that are adaptive, scalable, and factually correct is resource intensive. Furthermore, there are many technologies that are prevalent, but lack systematic ways to integrate them into existing educational technologies. Building on reinforcement learning and large language models (LLMs), this paper introduces a multi-agent framework for adding both a reinforcement learning-based tutor and an LLM-driven peer to educational systems. The integrated architecture is unified with a central ontology, acting as a symbolic knowledge base and facilitating data transformation. We also detail a novel windowed experience sharing method for improving reinforcement learning training efficiency when dealing with similar environments and low-data situations. We present our architecture and simulated results to verify the reinforcement learning algorithm as an adaptive tutor, as well as the integration of an LLM-driven peer and educational outcomes from said integration.
|
|
12:30-12:45, Paper We-S2-T12.4 | |
Situated Intelligence and Social Coordination in Systems with Interacting Autonomous Agents (I) |
|
Kozma, Robert | University of Memphis, TN |
Rudas, Imre | Obuda University |
Kovacs, Levente | Obuda University |
Keywords: Neural Networks and their Applications, Computational Intelligence, Agent-Based Modeling
Abstract: Embodiment is a key aspect of human intelligence, related to our ability to identify the context of individual experiences at a given time, corresponding to the natural constraints represented by our body. Learning from higher cognitive functions and social coordination between humans can support building intelligent robot systems and facilitate harmonious human-machine interactions. This position paper provides an overview of neural structures and neural dynamics contributing to human cognitive functions, including multisensory integration, Gestalt formation, perception, and the building of sensory associations. Embodied cognitive principles are illustrated through the intentional action-perception cycle. The results are applied to the design of novel algorithms for brain-inspired cognitive robotics. Example scenarios include imitation learning and the emergence of dialogue patterns in social robotics settings.
|
|
We-S2-T13 |
Room 0.96 |
AI and Applications 3 |
Regular Papers - Cybernetics |
Chair: Kouzinopoulos, Charalampos S. | Maastricht University |
Co-Chair: Kim, Woo-Chan | Korea University |
|
11:45-12:00, Paper We-S2-T13.1 | |
KiC: Keyword-Inspired Cascade for Cost-Efficient Text Generation with LLMs |
|
Kim, Woo-Chan | Korea University |
Park, Ji-Hoon | Korea University |
Lee, Seong-Whan | Korea University |
Keywords: AI and Applications, Hybrid Models of Computational Intelligence, Deep Learning
Abstract: Large language models (LLMs) have demonstrated state-of-the-art performance across a wide range of natural language processing tasks. However, high-performing models are typically accessible only via APIs, incurring substantial inference costs. Cascade methods address this by initially employing a cheaper model and escalating to a stronger one only when necessary. Nevertheless, existing cascade approaches struggle to select a reliable representative response and assess the overall reliability of free-form outputs, as they rely on exact text matching. To overcome these limitations, we propose Keyword-inspired Cascade (KiC), a novel framework for cost-efficient free-form text generation. KiC identifies the most representative answer among multiple outputs from a weaker model and evaluates the semantic alignment of other responses with it. Based on the degree of alignment, KiC determines whether to accept the weaker model’s output or escalate to a stronger model. Experiments on three free-form text generation benchmarks show that KiC achieves 97.53% of GPT-4’s accuracy while reducing API costs by 28.81% on average, and even outperforms GPT-4 in a specific benchmark.
|
|
12:00-12:15, Paper We-S2-T13.2 | |
RaDL: Relation-Aware Disentangled Learning for Multi-Instance Text-To-Image Generation |
|
Park, Geon | Korea University |
Kim, Seon Bin | Korea University |
Jung, Gunho | Korea University |
Lee, Seong-Whan | Korea University |
Keywords: AI and Applications, Image Processing and Pattern Recognition, Deep Learning
Abstract: With recent advancements in text-to-image (T2I) models, effectively generating multiple instances within a single image prompt has become a crucial challenge. Existing methods, while successful in generating positions of individual instances, often struggle to account for relationship discrepancies and multi-attribute leakage. To address these limitations, this paper proposes the relation-aware disentangled learning (RaDL) framework. RaDL enhances instance-specific attributes through learnable parameters and generates relation-aware image features via Relation Attention, utilizing action verbs extracted from the global prompt. Through extensive evaluations on benchmarks such as COCO-Position, COCO-MIG, and DrawBench, we demonstrate that RaDL outperforms existing methods, showing significant improvements in positional accuracy, multi-attribute consideration, and the relationships between instances. Our results establish RaDL as a solution for generating images that respect both the relationships and the multiple attributes of each instance within a multi-instance image.
|
|
12:15-12:30, Paper We-S2-T13.3 | |
DMA-MCTS: Dynamic Memory-Augmented Monte-Carlo Tree Search for LLM Task Planning |
|
Wang, Jiakang | Institute of Computing Technology, Chinese Academy of Sciences |
Wang, Qi | Institute of Computing Technology, Chinese Academy of Sciences |
Li, Mengxian | Institute of Computing Technology, Chinese Academy of Sciences |
Li, Tingting | Institute of Computing Technology, Chinese Academy of Sciences |
Xu, Yongjun | Institute of Computing Technology, Chinese Academy of Sciences |
Keywords: AI and Applications, Machine Learning, Optimization and Self-Organization Approaches
Abstract: While Large Language Models (LLMs) show promise for task planning, their efficacy diminishes in complex, long-horizon tasks within dynamic, partially observable environments, primarily due to challenges in long-term reasoning and effective adaptation from experience. A key limitation of current approaches is the insufficient utilization of historical trajectory information. To overcome these challenges in the context of Partially Observable Markov Decision Process (POMDP) planning, this paper introduces DMA-MCTS (Dynamic Memory-Augmented Monte-Carlo Tree Search), a framework that integrates Monte Carlo Tree Search (MCTS) with LLMs, augmented by a novel dynamic memory and reflection system. The core technical contributions include: (1) a dual-layer semantic memory repository enabling efficient context-aware retrieval of past experiences; (2) a memory-enhanced UCT selection strategy biased by historical Q-values to guide search; and (3) a differentiated reflection mechanism employing LLMs to extract generalizable knowledge from both successful and failed trajectories. Comprehensive evaluations conducted on complex object rearrangement tasks within the VirtualHome simulator demonstrate that DMA-MCTS significantly outperforms relevant baselines, including standard LLM-MCTS approaches, in terms of task success rate, generalization capabilities, and planning efficiency. These results underscore the critical importance of integrating structured dynamic memory and systematic reflection mechanisms for developing highly adaptive and effective LLM-based agents capable of tackling long-horizon planning problems.
|
|
12:30-12:45, Paper We-S2-T13.4 | |
Learning Preference Distributions: A Label-Side Paradigm for Explainable Reward Models |
|
Chen, Shikai | Southeast University |
Yuan, Jin | Lenovo |
Zhang, Yang | Lenovo |
Shi, Zhongchao | Lenovo |
Fan, Jianping | Lenovo |
Geng, Xin | Southeast University |
Rui, Yong | Southeast University |
Keywords: AI and Applications, Machine Learning, Transfer Learning
Abstract: The reward model is a critical component in training powerful large language models. However, current methods largely overlook the inherent subjectivity and variability in human preferences. Typically, this issue is indirectly addressed through model-side approaches, such as ensemble methods to estimate uncertainty from multiple predictions or uncertainty-aware regression to predict mean and variance. These indirect approaches fail to capture the intrinsic distributional characteristics and inter-rater disagreements present in human judgments. In contrast, we propose a direct, label-side solution by explicitly modeling human preference distributions. We recover missing information from scalar ratings to construct meaningful distribution labels. By employing Label Distribution Learning (LDL), each dimension of the resulting multidimensional discrete distribution explicitly corresponds to a specific preference score, naturally reflecting the subjective and multidimensional nature of human evaluations. Our approach improves explainability and confidence estimation, while also enabling more effective data selection and sample-efficient test-time adaptation. Empirical results demonstrate that our method not only achieves state-of-the-art performance but also provides a robust framework for uncertainty quantification and nuanced preference modeling.
|
|
12:45-13:00, Paper We-S2-T13.5 | |
A Segmented Robot Grasping Perception Neural Network for Edge AI |
|
Bröcheler, Casper | Maastricht University |
Vroom, Thomas | Maastricht University |
Timmermans, Derrick | Maastricht University |
Akker, Alan van den | Maastricht University |
Tang, Guangzhi | Maastricht University |
Kouzinopoulos, Charalampos S. | Maastricht University |
Möckel, Rico | Maastricht University |
Keywords: AI and Applications, Machine Vision, Deep Learning
Abstract: Robotic grasping, the ability of robots to reliably secure and manipulate objects of varying shapes, sizes and orientations, is a complex task that requires precise perception and control. Deep neural networks have shown remarkable success in grasp synthesis by learning rich and abstract representations of objects. When deployed at the edge, these models can enable low-latency, low-power inference, making real-time grasping feasible in resource-constrained environments. This work implements Heatmap-Guided Grasp Detection, an end-to-end framework for the detection of 6-DoF grasp poses, on the GAP9 RISC-V System-on-Chip. The model is optimised using hardware-aware techniques, including input dimensionality reduction, model partitioning, and quantisation. Experimental evaluation on the GraspNet-1Billion benchmark validates the feasibility of fully on-chip inference, highlighting the potential of low-power MCUs for real-time, autonomous manipulation.
|
|
We-S2-T14 |
Room 0.97 |
Exploring Shared and Cooperative Control Systems: Models, Patterns and Assessment Methodologies 2 |
Special Sessions: HMS |
Chair: Varga, Balint | Karlsruhe Institute of Technology (KIT), Campus South |
Co-Chair: Mandischer, Nils | University of Augsburg |
Organizer: Varga, Balint | Karlsruhe Institute of Technology (KIT), Campus South |
Organizer: Jost, Céline | Paris 8 University |
Organizer: Mandischer, Nils | University of Augsburg |
Organizer: Flemisch, Frank | RWTH Aachen University/Fraunhofer |
Organizer: Pool, Daan Marinus | TU Delft |
Organizer: Carlson, Tom | University College London |
Organizer: Shen, Weiming | Huazhong University of Science and Technology |
Organizer: Liu, Peter X. | Carleton University |
|
11:45-12:00, Paper We-S2-T14.1 | |
Multi-View Reconstruction with Global Context for 3D Anomaly Detection (I) |
|
Sun, Yihan | Huazhong University of Science and Technology |
Cheng, Yuqi | Huazhong University of Science and Technology |
Cao, Yunkang | Huazhong University of Science and Technology |
Zhang, Yuxin | Huazhong University of Science and Technology |
Shen, Weiming | Huazhong University of Science and Technology |
Keywords: Human-Collaborative Robotics, Systems Safety and Security, Supervisory Control
Abstract: 3D anomaly detection is critical in industrial quality inspection. While existing methods achieve notable progress, their performance degrades in high-precision 3D anomaly detection due to insufficient global information. To address this, we propose Multi-View Reconstruction (MVR), a method that losslessly converts high-resolution point clouds into multi-view images and employs a reconstruction-based anomaly detection framework to enhance global information learning. Extensive experiments demonstrate the effectiveness of MVR, achieving 89.6% object-wise AU-ROC and 95.7% point-wise AU-ROC on the Real3D-AD benchmark.
|
|
12:00-12:15, Paper We-S2-T14.2 | |
Leveraging Learning Bias for Noisy Anomaly Detection (I) |
|
Zhang, Yuxin | Huazhong University of Science and Technology |
Cao, Yunkang | Huazhong University of Science and Technology |
Cheng, Yuqi | Huazhong University of Science and Technology |
Sun, Yihan | Huazhong University of Science and Technology |
Shen, Weiming | Huazhong University of Science and Technology |
Keywords: Human-Collaborative Robotics, Systems Safety and Security, Supervisory Control
Abstract: This paper addresses the challenge of fully unsupervised image anomaly detection (FUIAD), where training data may contain unlabeled anomalies. Conventional methods assume anomaly-free training data, but real-world contamination leads models to absorb anomalies as normal, degrading detection performance. To mitigate this, we propose a two-stage framework that systematically exploits inherent learning bias in models. The learning bias stems from: (1) the statistical dominance of normal samples, driving models to prioritize learning stable normal patterns over sparse anomalies, and (2) feature-space divergence, where normal data exhibit low intra-class consistency while anomalies display high diversity, leading to unstable model responses. Leveraging the learning bias, stage 1 partitions the training set into subsets, trains sub-models, and aggregates cross-model anomaly scores to filter a purified dataset. Stage 2 trains the final detector on this dataset. Experiments on the Real-IAD benchmark demonstrate superior anomaly detection and localization performance under different noise conditions. Ablation studies further validate the framework’s contamination resilience, emphasizing the critical role of learning bias exploitation. The model-agnostic design ensures compatibility with diverse unsupervised backbones, offering a practical solution for real-world scenarios with imperfect training data.
|
|
12:15-12:30, Paper We-S2-T14.3 | |
Recommending Level of Haptic Guidance Based on Sensory Reliability Using a Supplemental Interface for Underwater Robot Operation (I) |
|
Yamamoto, Keita | Nara Institute of Science and Technology |
Sato, Eito | Nara Institute of Science and Technology |
Wada, Takahiro | Nara Institute of Science and Technology |
Keywords: Haptic Systems, Human-Machine Interface, Shared Control
Abstract: Haptic shared control (HSC) is a control approach where both a human operator and an autonomous controller exert force on a shared physical control terminal of a robot. HSC allows the fluid combination of human intelligence and machine precision, improving task performance and reducing the human workload. When the autonomous controller is unreliable, however, frequent disagreement between human and machine inputs adversely affects task performance and operator workload. Previous research showed that allowing the human operator to adjust the strength of the haptic guidance can alleviate the problem. However, deciding when and by how much to adjust the haptic guidance strength remained a challenging task for a human operator. This study proposed an approach that utilized a grip mechanism for adjusting and recommending an appropriate strength. In the proposed approach, the human operator was in charge of deciding the haptic strength via the grip angle. The autonomous controller suggested the appropriate haptic strength based on its reliability by applying proportional control to the grip angle. An experiment with a simulated remotely operated vehicle (ROV) teleoperation task with HSC guidance showed that the proposed method is effective in aiding the adjustment of autonomous controller input strength, as well as in marginally reducing workload.
|
|
12:30-12:45, Paper We-S2-T14.4 | |
Biodynamic Feedthrough Models and Model-Based Cancellation for Touchscreen Dragging Inputs in Turbulence (I) |
|
McKenzie, Max | Delft University of Technology |
Pool, Daan Marinus | TU Delft |
Keywords: Human-Machine Interface, Human-Machine Interaction, Human-Machine Cooperation and Systems
Abstract: This paper applies model-based biodynamic feedthrough (BDFT) cancellation to a touchscreen dragging task during realistic vertical (heave) and lateral (sway) aircraft turbulence, to mitigate erroneous turbulence-induced inputs. One-size-fits-all (OSFA) BDFT models were used to model the influence of turbulence accelerations on finger position, achieving average quality-of-fits of 61% and 69% in the vertical and horizontal screen directions, respectively. On average, 27% of the touch input error variance was mitigated by these OSFA models, with individualized models providing only a marginal improvement (+4%). The application of OSFA models identified from a condition with equally-scaled turbulence in heave and sway (adjusted intensity) to the realistic turbulence condition did not significantly affect cancellation performance, indicating that BDFT models may not need to be adaptive to varying motion intensity. However, consistent with earlier work, BDFT dynamics were found to vary between vertical and horizontal finger movements, with vertical BDFT exhibiting lower stiffness and a higher static gain. On average, the linear BDFT-related component of touch input errors contributed 41% of the overall error variance, indicating that the current linear BDFT model may need to be extended to include nonlinear effects, such as varying finger friction.
|
|
12:45-13:00, Paper We-S2-T14.5 | |
Disentangling Coordinate Frames for Task Specific Motion Retargeting in Teleoperation Using Shared Control and VR Controllers (I) |
|
Grobbel, Max | FZI Forschungszentrum Informatik |
Flögel, Daniel | FZI Forschungszentrum Informatik |
Rigoll, Philipp | FZI Forschungszentrum Informatik |
Hohmann, Sören | KIT |
Keywords: Telepresence, Shared Control, Human-Machine Interface
Abstract: Task performance in teleoperation, measured by task completion time, still lags far behind that of humans conducting tasks directly. One major contributing factor is the human capability to perform transformations and alignments, which is directly influenced by the point of view and the motion retargeting strategy. In modern teleoperation systems, motion retargeting is usually implemented through a one-time calibration or by switching modes. Complex tasks, such as concatenated screwing, can be difficult because the operator has to align (e.g., mirror) rotational and translational input commands. Recent research has shown that separating translation and rotation leads to increased task performance. This work proposes a formal motion retargeting method that separates translational and rotational input commands. The method is then integrated into an optimal-control-based trajectory planner and demonstrated on a UR5e manipulator.
|
|
We-S3-T1 |
Hall F |
Deep Learning & Representation Learning 1 |
Regular Papers - Cybernetics |
Chair: Oliveira, Adriano L. I. | Universidade Federal De Pernambuco |
Co-Chair: Cao, Helin | University of Bonn |
|
14:00-14:15, Paper We-S3-T1.1 | |
SWA-SOP: Spatially-Aware Window Attention for Semantic Occupancy Prediction in Autonomous Driving |
|
Cao, Helin | University of Bonn |
Materla, Rafael | University of Bonn |
Behnke, Sven | University of Bonn |
Keywords: Deep Learning, Representation Learning, Machine Vision
Abstract: Perception systems in autonomous driving rely on sensors such as LiDAR and cameras to perceive the 3D environment. However, due to occlusions and data sparsity, these sensors often fail to capture complete information. Semantic Occupancy Prediction (SOP) addresses this challenge by inferring both occupancy and semantics of unobserved regions. Existing transformer-based SOP methods lack explicit modeling of spatial structure in attention computation, resulting in limited geometric awareness and poor performance in sparse or occluded areas. To this end, we propose Spatially-aware Window Attention (SWA), a novel mechanism that incorporates local spatial context into attention. SWA significantly improves scene completion and achieves state-of-the-art results on LiDAR-based SOP benchmarks. We further validate its generality by integrating SWA into a camera-based SOP pipeline, where it also yields consistent gains across modalities.
|
|
14:15-14:30, Paper We-S3-T1.2 | |
Second-Order Latent Factorization of Tensors Based on Tucker Decomposition for Spatio-Temporal Traffic Flow Data Completion |
|
Mi, Jiajia | Guangdong University of Technology |
Li, Weiling | Dongguan University of Technology |
Yuan, Huaqiang | Dongguan University of Technology |
Xie, Zhe | Dreame Technology Co., Ltd |
Liu, Dongning | Guangdong University of Technology |
Keywords: Representation Learning, Computational Intelligence in Information, Expert and Knowledge-Based Systems
Abstract: The efficiency of Intelligent Transport Systems (ITS) depends on high-quality traffic data; however, in real-world deployments, sensor failures, communication interruptions, and other issues often lead to missing data, which degrades ITS performance. Although the latent factorization of tensors (LFT) model has been widely used for missing-value completion of traffic data with complex spatio-temporal characteristics, its non-convex objective function makes it difficult for first-order optimization methods to approximate high-quality second-order stationary points, which limits the achievable completion accuracy. To address these issues, this paper proposes a Second-order Latent Factorization of Tensors based on Tucker Decomposition (SLTD), which combines Tucker decomposition with a second-order optimization strategy to improve completion accuracy and convergence stability, and solves the model efficiently via a Gauss-Newton approximation, significantly improving performance while keeping the computational cost low. Experimental results on real traffic datasets (average vehicle speed) from four cities verify the effectiveness of SLTD. The proposed model outperforms existing prevailing methods in terms of accuracy and provides a better solution for traffic data completion.
|
|
14:30-14:45, Paper We-S3-T1.3 | |
Robust Representation Learning for Time Series Via Decomposition and Fine-Grained Similarity-Guided Contrast |
|
Diao, Jianzhou | Beijing University of Posts and Telecommunications |
Zhu, Xinning | Beijing University of Posts and Telecommunications |
Wang, Ye | Beijing University of Posts and Telecommunications |
Hu, Zheng | Beijing University of Posts and Telecommunications |
Keywords: Representation Learning, Deep Learning, AI and Applications
Abstract: Self-supervised contrastive learning has demonstrated effectiveness in time series representation learning. However, existing methods still exhibit three major limitations: limited robustness to noise, missing values, and distribution shifts; reliance on a binary contrastive objective that overlooks similarity information between time series instances; and coarse-grained contrast on raw observations that fails to capture true underlying relationships between time series instances. To address these limitations, we propose a novel framework that achieves robust representation learning for time series through decomposition and fine-grained similarity-guided contrast. Specifically, we apply decomposition at the input stage to smooth noise and missing values, and the resulting disentangled trend-seasonal representations provide adaptability to distribution shifts. Furthermore, we introduce a similarity-guided contrastive loss that incorporates similarity information between instances. Additionally, our method enables fine-grained contrast through separate trend contrasting and seasonal contrasting. Extensive experiments on forecasting, anomaly detection, and classification tasks demonstrate that our framework achieves state-of-the-art performance. Further analyses validate its robustness and the effectiveness of its design.
|
|
14:45-15:00, Paper We-S3-T1.4 | |
Leveraging Transformer-Based Pretrained Embeddings for Reinforcement Learning in Cryptocurrency Trading |
|
Lima, Adriano | Universidade Federal De Pernambuco |
Zanchettin, Cleber | Universidade Federal De Pernambuco |
Oliveira, Adriano L. I. | Universidade Federal De Pernambuco |
Keywords: Representation Learning, Deep Learning, Transfer Learning
Abstract: ChronosRL is a hybrid zero-shot trading framework that plugs transformer-derived Chronos embeddings into deep reinforcement learning (DQN, PPO, A2C, RPPO), eliminating the handcrafted indicators that dominate crypto trading. For every daily bar, we pass a two-channel price–volume series through Chronos-small to obtain a 768-dimensional embedding capturing temporal patterns; the RL agent then decides Buy / Sell / Hold under a single-position constraint and a 5% stop-loss. Across six USDT pairs from July 1 to November 30, 2024, ChronosRL delivers a mean ROI of 102.8% (std 79.7%), surpassing Buy-and-Hold (98.4%) and a Bollinger-MA baseline (46.4%). The best configuration (DQN + Chronos) achieves a Sharpe ratio of 0.78 while operating zero-shot, demonstrating the power of pretrained temporal representations to generalise in volatile markets. These results suggest that merging foundation models with RL is a promising path toward data-driven trading strategies.
|
|
15:00-15:15, Paper We-S3-T1.5 | |
Sliced Wasserstein Discrepancy in Disentangling Representation and Adaptation Networks for Unsupervised Domain Adaptation |
|
Sol, Joel | University of Victoria |
Alijani, Shadi | University of Victoria |
Najjaran, Homayoun | University of Victoria |
Keywords: Representation Learning, Transfer Learning
Abstract: This paper introduces DRANet-SWD as a novel complete pipeline for disentangling content and style representations of images for unsupervised domain adaptation (UDA). The approach builds upon DRANet by incorporating the sliced Wasserstein discrepancy (SWD) as a style loss instead of the traditional Gram matrix loss. The potential advantages of SWD over the Gram matrix loss for capturing style variations in domain adaptation are investigated. Experiments using digit classification datasets and driving scenario segmentation validate the method, demonstrating that DRANet-SWD enhances performance. Results indicate that SWD provides a more robust statistical comparison of feature distributions, leading to better style adaptation. These findings highlight the effectiveness of SWD in refining feature alignment and improving domain adaptation tasks across these benchmarks. Our code can be found here.
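As background for the style loss discussed above: the sliced Wasserstein discrepancy compares two feature distributions by averaging one-dimensional Wasserstein distances over many random projection directions. The following NumPy sketch is illustrative only — the function name and parameters are invented here, and DRANet-SWD applies the idea to deep feature maps rather than raw point sets:

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=128, seed=0):
    """Approximate the sliced Wasserstein-2 discrepancy between two
    sample sets x, y of shape (n_samples, n_features) by averaging
    1-D Wasserstein distances over random projection directions."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Random unit directions on the sphere.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both sample sets onto each direction.
    xp = x @ theta.T  # shape (n_samples, n_projections)
    yp = y @ theta.T
    # In 1-D, the Wasserstein-2 distance reduces to comparing
    # sorted samples (the quantile functions).
    xp.sort(axis=0)
    yp.sort(axis=0)
    return np.sqrt(np.mean((xp - yp) ** 2))

# Identical distributions yield zero discrepancy.
x = np.random.default_rng(1).normal(size=(256, 8))
print(sliced_wasserstein(x, x))  # 0.0
```

Because each projection reduces the problem to a closed-form 1-D comparison, this statistic compares full distributions rather than only second moments, which is the advantage over a Gram-matrix style loss that the abstract investigates.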
|
|
We-S3-T2 |
Hall N |
Neural Networks and Their Applications 3 |
Regular Papers - Cybernetics |
Chair: Sun, Xinmiao | University of Science and Technology Beijing |
Co-Chair: Tomioka, Yoichi | The University of Aizu |
|
14:00-14:15, Paper We-S3-T2.1 | |
SmoothRot: Combining Channel-Wise Scaling and Rotation for Quantization-Friendly LLMs |
|
Czakó, Patrik | Obuda University |
Kertész, Gábor | Obuda University |
Szénási, Sándor | Obuda University |
Keywords: Neural Networks and their Applications, Deep Learning, Computational Intelligence
Abstract: We present SmoothRot, a novel post-training quantization technique to enhance the efficiency of 4-bit quantization in Large Language Models (LLMs). SmoothRot addresses the critical challenge of massive activation outliers, by integrating channel-wise scaling with Hadamard transformations. Our technique effectively transforms extreme outliers into quantization-friendly activations, significantly improving quantization accuracy. Experiments conducted on popular LLMs (LLaMA2 7B, LLaMA3.1 8B, and Mistral 7B) demonstrate that SmoothRot consistently reduces the performance gap between quantized and FP16 models by approximately 10-30% across language generation and zero-shot reasoning tasks, without introducing additional inference latency.
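The core mechanism the abstract describes — channel-wise scaling combined with an orthogonal Hadamard rotation, applied so the layer output is mathematically unchanged — can be illustrated in a few lines. This is a toy sketch, not the SmoothRot implementation: the scaling rule follows the SmoothQuant convention, and the `alpha` parameter and function names are assumptions for illustration:

```python
import numpy as np

def hadamard(n):
    """Normalized Hadamard matrix of size n (n must be a power of 2);
    H @ H.T == I, so it is an orthogonal rotation."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

def smooth_then_rotate(x, w, alpha=0.5):
    """Toy scale-then-rotate transform: per-channel scaling migrates
    activation outliers into the weights, and a Hadamard rotation
    spreads the remaining outliers across channels. The matrix
    product x @ w is preserved exactly."""
    # SmoothQuant-style per-channel scale (alpha balances the
    # difficulty between activations and weights).
    s = np.abs(x).max(axis=0) ** alpha / np.abs(w).max(axis=1) ** (1 - alpha)
    q = hadamard(x.shape[1])
    x_hat = (x / s) @ q             # smoothed, rotated activations
    w_hat = q.T @ (s[:, None] * w)  # compensating weight transform
    return x_hat, w_hat

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8)); x[:, 3] *= 50.0  # one outlier channel
w = rng.normal(size=(8, 4))
x_hat, w_hat = smooth_then_rotate(x, w)
# The layer output is unchanged, so only quantization error differs.
assert np.allclose(x @ w, x_hat @ w_hat)
```

Since `x_hat @ w_hat == x @ w` holds exactly in real arithmetic, any accuracy gain comes purely from `x_hat` and `w_hat` having flatter, more quantization-friendly value ranges than the originals.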
|
|
14:15-14:30, Paper We-S3-T2.2 | |
Multimodal Sensing and Machine Learning for Soft and Hard Texture Roughness Recognition Using Sliding Exploratory Procedures |
|
Guo, Quan | University of Bath |
Tronco Jurado, Ulises | University of Bath |
Martinez-Hernandez, Uriel | University of Bath |
Keywords: Neural Networks and their Applications, Hybrid Models of Computational Intelligence, Application of Artificial Intelligence
Abstract: Texture roughness perception is crucial for autonomous robots to perform manipulation, quality inspection, and material discrimination in unknown environments. This work proposes an approach to combine vibration and force data using the VibroTact sensor for texture roughness classification. Vibration and force data are first processed by CNN and ANN models, and then combined using a Bayesian framework. This approach is evaluated by recognizing 15 textures with different roughness (7 soft and 8 hard textures) using individual ANN and CNN models, and is compared against the Bayesian combination of both methods. Texture data is collected by mounting the VibroTact sensor on a robotic arm and using three sliding exploratory procedures (vertical, diagonal, and circular sliding). The texture roughness recognition results achieve 100% accuracy using the combined approach, which improves on the individual ANN and CNN models, whose accuracy ranges from 87.50% to 100%. The results also show that diagonal and vertical sliding are optimal for recognizing hard and soft textures, respectively. This approach demonstrates its potential for industrial robotics applications that require texture discrimination.
|
|
14:30-14:45, Paper We-S3-T2.3 | |
Frequency Mamba U-Net for MRI Brain Tumor Segmentation |
|
Wang, Wei | Dalian Minzu University |
Guo, Lixin | Dalian Minzu University |
Zhang, Muqing | Dalian Minzu University |
Zhang, Jianxin | Dalian Minzu University |
Keywords: Neural Networks and their Applications, Image Processing and Pattern Recognition
Abstract: Accurate segmentation of brain tumors is crucial for clinical diagnosis and treatment. In recent years, state space modeling (SSM) has demonstrated its potential in long-range dependency modeling. However, it still has limitations in 3D medical image applications. Moreover, segmentation accuracy is often affected by high-frequency noise and the complex details inherent in brain tumor images. To overcome these limitations, we propose a novel frequency mamba U-Net model, i.e., FMambaU-Net, which breaks through the bottleneck of SSM in 3D images, reduces noise and improves the robustness and accuracy of brain tumor segmentation by integrating improved Mamba and frequency domain operations in U-Net. Specifically, we embed a dynamic weighted (DW) Mamba module into the bottleneck of 3D U-Net, which enhances the processing of complex tumor morphology by prioritizing critical regions. Additionally, we design and integrate a 3D frequency domain fusion module (FDFM) into the skip connections, which uses 3D fast Fourier transform (FFT) to separate frequency components and assign lower weights to high frequencies, thereby reducing high-frequency noise and improving segmentation accuracy. Evaluations on the BraTS 2020 and 2021 datasets show that FMambaU-Net outperforms both the baseline and state-of-the-art methods. On BraTS 2020, the DSCs for enhanced tumor (ET), tumor core (TC), and whole tumor (WT) are 79.41%, 84.18%, and 91.62%, respectively, while on BraTS 2021, the DSCs are 88.59%, 91.30%, and 91.88%. These results demonstrate the effectiveness and competitiveness of FMambaU-Net in brain tumor segmentation.
|
|
14:45-15:00, Paper We-S3-T2.4 | |
C2-YOLOv8-Obb: A Novel QR Code Localization Model Integrating CBAM and CMUNeXt for Complex Environments |
|
Zhou, Xijie | University of Science and Technology Beijing |
Xu, Jianyuan | University of Science and Technology Beijing |
Sun, Xinmiao | University of Science and Technology Beijing |
Gokbayrak, Kagan | Hacettepe University |
Keywords: Neural Networks and their Applications, Image Processing and Pattern Recognition
Abstract: This paper introduces C2-YOLOv8-obb, a novel QR code localization model that enhances the YOLOv8-obb model through two key innovations: (1) the integration of a Convolutional Block Attention Module (CBAM) into the backbone feature extraction network, enabling the model to dynamically focus on critical QR code features under challenging conditions such as strong lighting or background clutter; and (2) the replacement of traditional convolution layers in both the backbone and neck with CMUNeXt Blocks, improving the global feature extraction capability of the model. Experimental results demonstrate that the proposed C2-YOLOv8-obb model outperforms traditional contour detection methods under strong lighting conditions and complex environments, achieving a 2.28 times improvement in the effective recognition range. Compared to state-of-the-art models such as YOLOv7, YOLOv7-tiny, and YOLOv8-obb, C2-YOLOv8-obb achieves significant improvements across all key metrics, including a recall and mAP@50 exceeding 99%, precision reaching 98%, and mAP@50:95 achieving 92.1%. These results validate the effectiveness of the proposed method.
|
|
15:00-15:15, Paper We-S3-T2.5 | |
Fault-Tolerant and Highly Efficient Vision Transformer Models with Approximate TMR Based on Low-Bit Quantization |
|
Ogawa, Kiyoto | The University of Aizu |
Saikawa, Yamato | The University of Aizu |
Tomioka, Yoichi | The University of Aizu |
Saito, Hiroshi | The University of Aizu |
Keywords: Neural Networks and their Applications, Image Processing and Pattern Recognition, AIoT
Abstract: From the perspective of real-time processing, such as autonomous driving, sudden failures could potentially lead to severe and even life-threatening accidents. In particular, hardware faults in AI models used for real-time decision-making can compromise safety. To prevent these accidents before they happen, it is crucial to detect failures promptly. Moreover, for small and power-constrained devices such as drones, it is essential to develop a fault-tolerant AI that is computationally efficient and minimizes memory usage and power consumption. Approximated Triple Modular Redundancy (TMR) with quantization has been proposed for convolutional layers, which enables fault-tolerant inference with reduced computational cost. However, sufficiently efficient and reliable fault-tolerant methods for quantized Vision Transformers based on Transformer blocks have not yet been developed. In this paper, we propose Approximated Dual Modular Redundancy (DMR) for fault detection and Approximated TMR for fault recovery, specifically designed for Vision Transformers. In our evaluation, the proposed Approximated TMR was applied to an 8-bit quantized Swin Transformer. The results show that it reduces the computational cost significantly compared to conventional TMR. Furthermore, it successfully detects faults when single-bit flips occur with a probability exceeding 1%. We also demonstrate that the proposed Approximated TMR maintains higher accuracy than existing methods, such as Ranger and Clipper, even under single-bit flip faults.
|
|
We-S3-T3 |
Room 0.11 |
Cooperative Systems and Control |
Regular Papers - SSE |
Chair: Hayashi, Naoki | The University of Osaka |
Co-Chair: Almeida Junior, Acarcio Gomes | UFPE |
|
14:00-14:15, Paper We-S3-T3.1 | |
Distributed Byzantine-Resilient Stochastic Optimization with Event-Triggered Communication |
|
Tanaka, Shota | The University of Osaka |
Hayashi, Naoki | The University of Osaka |
Inuiguchi, Masahiro | Osaka University |
Keywords: Cooperative Systems and Control
Abstract: We consider Byzantine-resilient distributed optimization with event-triggered communication. The proposed algorithm is designed to handle non-convex optimization problems in a network of agents where some agents may exhibit Byzantine behavior. Each normal agent has an estimate of a critical point of the global cost function and transmits the estimate to neighbors when the difference between the current and previously communicated values exceeds a predefined threshold. Normal agents then update their estimates by averaging the received values from a trusted set that is determined through the Iterative Outlier Scissor (IOS) procedure. By combining the event-triggered communication and the IOS filtering procedure, the proposed approach ensures resilience to Byzantine behavior and guarantees convergence even in adversarial settings.
|
|
14:15-14:30, Paper We-S3-T3.2 | |
Multiagent Safe Reinforcement Learning by Decentralized Event-Triggered Min-Max Optimization |
|
Otani, Shunsuke | The University of Osaka |
Hayashi, Naoki | The University of Osaka |
Inuiguchi, Masahiro | Osaka University |
Keywords: Cooperative Systems and Control
Abstract: This paper addresses decentralized event-triggered reinforcement learning with safety constraints. Each agent has an individual reward function and safety constraints that depend on the joint actions of agents. The objective is to maximize the team's long-term return while satisfying the safety constraints. We formulate the reinforcement learning problem as a nonconvex-concave min-max optimization problem and propose a decentralized policy gradient algorithm. Each agent has estimations for the optimal primal and dual solutions of the min-max optimization problem. Different from the existing decentralized reinforcement learning, the agents share these estimates only when the error exceeds a predefined threshold. We show that the estimates of agents converge to a neighborhood of a locally optimal solution while effectively reducing the communication overhead.
|
|
14:30-14:45, Paper We-S3-T3.3 | |
Decentralized Reinforcement Learning with Risk Aversion in Multi-Agent Systems |
|
Ishikawa, Daichi | The University of Osaka |
Ichino, Taisei | The University of Osaka |
Hayashi, Naoki | The University of Osaka |
Inuiguchi, Masahiro | Osaka University |
Keywords: Cooperative Systems and Control
Abstract: This study addresses risk-averse distributed reinforcement learning in multi-agent systems, where agents collaboratively optimize their policies with a risk index in the objective function. Unlike conventional reinforcement learning, which maximizes the expected cumulative reward, the proposed risk-averse reinforcement learning incorporates Conditional Value at Risk (CVaR) to account for rare but significant adverse events. We propose a risk-averse distributed reinforcement learning algorithm based on the policy gradient method, which enables agents to collaboratively update their policies by sharing information through a communication network. Through a theoretical analysis, we show that the proposed algorithm ensures agreement among agents on their estimated policies. Moreover, we show that the consensus value converges in a neighborhood of a locally optimal solution, which ensures that each agent's learned policy remains aligned with risk-aware optimization criteria.
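As background for the risk index above: the Conditional Value at Risk at level alpha is the expected return over the worst alpha-fraction of outcomes. The following self-contained sketch shows the standard sample-based estimate; the function name and discretization are illustrative assumptions, not the authors' estimator:

```python
import numpy as np

def cvar(returns, alpha=0.05):
    """Sample-based Conditional Value at Risk: the mean of the
    worst alpha-fraction of outcomes (lower tail of returns)."""
    returns = np.sort(np.asarray(returns, dtype=float))
    # Number of tail samples; at least one so the estimate is defined.
    k = max(1, int(np.ceil(alpha * len(returns))))
    return returns[:k].mean()

# The worst 10% of these ten outcomes is the single value -4.0.
samples = [-4.0, -1.0, 0.0, 1.0, 1.5, 2.0, 2.0, 3.0, 3.0, 4.0]
print(cvar(samples, alpha=0.10))  # -4.0
```

Because CVaR averages over the tail rather than taking only a quantile, optimizing it penalizes rare but severe outcomes, which is what makes it suitable as the risk-aversion term in the objective described above.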
|
|
14:45-15:00, Paper We-S3-T3.4 | |
Adaptive Time-Constrained Consensus Control for a Class of Disturbed Multi-Agent Systems |
|
Zhang, Shen | Qilu University of Technology (Shandong Academy of Sciences) |
Jin, Xiaozheng | Qilu University of Technology (Shandong Academy of Sciences) |
Fu, Jia | Qilu University of Technology (Shandong Academy of Sciences) |
Keywords: Cooperative Systems and Control, Adaptive Systems, Robotic Systems
Abstract: This paper addresses the robust time-constrained consensus control of a class of multi-agent systems (MASs) subject to nonlinear dynamics and external disturbances. Adaptive compensation technique is utilized to eliminate the negative effects of nonlinearities and disturbances. New distributed finite-time consensus control strategies are designed to ensure bounded consensus in MASs by leveraging adaptive compensation signals. The finite-time stability of the MAS is proved by utilizing the Lyapunov theorem. Finally, simulation outcomes involving multiple unmanned marine systems within a multiagent framework confirm the efficacy of the proposed approach.
|
|
15:00-15:15, Paper We-S3-T3.5 | |
CACCpo: Multi-Objective Particle Swarm Optimization for Collaborative Adaptive Cruise Control |
|
Almeida Junior, Acarcio Gomes | UFPE |
da Silva Filho, Abel Guilhermino | UFPE |
Campelo, Divanilson | Universidade Federal De Pernambuco |
Keywords: Cooperative Systems and Control, Autonomous Vehicle
Abstract: This study investigates the use of Multi-Objective Particle Swarm Optimization (MOPSO) to optimize the hyperparameters of Cooperative Adaptive Cruise Control (CACC) controllers, aiming to enhance performance in dynamic traffic scenarios. The proposed approach, named CACCpo, identifies hyperparameter combinations on Pareto fronts for a CACC controller based on the reduction of its accumulated error and overshoot, which are related to the system’s response time and stability, respectively. The proposed CACC controller, tuned with MOPSO, achieves an accumulated error of 21.21 and an overshoot of 6.38%, outperforming other controllers in speed and accuracy when responding to variations in the leader vehicle’s speed. In addition, it provides smoother acceleration and deceleration with minimal oscillations, ensuring a more stable and efficient system. Moreover, it maintains a constant safe distance between vehicles, which is essential for passenger safety and comfort.
|
|
We-S3-T4 |
Room 0.12 |
Intelligent Power Grid 1 |
Regular Papers - SSE |
Chair: Zhang, Ziqi | Nanjing University of Aeronautics and Astronautics |
Co-Chair: Ho, Tan-Jan | Chung-Yuan Christian University |
|
14:00-14:15, Paper We-S3-T4.1 | |
Multi-Agent DRL-Based Online Path Planning for UAV Power Tower Inspection with Travel Time Uncertainty |
|
Liu, Wei | South China University of Technology |
Wei, Feng-Feng | South China University of Technology |
Qiu, Wen-Jin | South China University of Technology |
Chen, Wei-Neng | South China University of Technology |
Keywords: Intelligent Power Grid, Control of Uncertain Systems
Abstract: Recent years have witnessed the increasing adoption of unmanned aerial vehicles (UAVs) for power grid inspection, as they gradually replace conventional hazardous manual operations. However, conventional metaheuristics and operations research-based path planning algorithms suffer from long computation times for large-scale problems, thus making them unsuitable for real-time multi-UAV scheduling under flight time and energy consumption uncertainty. To solve the multi-UAV online path planning problem, this paper proposes a multi-agent deep reinforcement learning (MADRL) algorithm that performs online path planning based on real-time environmental and UAV state information, with the goal of minimizing the total travel distance. We employ resource preservation and decision sharing to handle real-time cooperation under travel time uncertainty while preventing resource conflicts. We design a Safety Mask mechanism that constrains dangerous UAV actions to address energy consumption uncertainty. Experiments on instances with 40-400 towers show that our algorithm requires minimal computation time and generates higher-quality solutions compared to other baseline algorithms in large-scale tower scenarios.
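The Safety Mask idea of vetoing energy-infeasible actions can be sketched as a standard action-masking step before action selection. The function and the feasibility test below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def safe_action(action_scores, battery, costs):
    """Pick the best-scoring action whose energy cost (fly to the tower,
    inspect it, return to a charging point) fits the remaining battery.
    Infeasible actions are masked to -inf before the argmax."""
    feasible = costs <= battery
    if not feasible.any():
        raise RuntimeError("no feasible action: UAV must return to charge")
    masked = np.where(feasible, action_scores, -np.inf)
    return int(np.argmax(masked))

scores = np.array([0.9, 0.6, 0.3])    # policy preferences for 3 towers
costs = np.array([50.0, 20.0, 10.0])  # hypothetical energy cost per tower
safe_action(scores, battery=30.0, costs=costs)  # → 1 (tower 0 is too costly)
```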
|
|
14:15-14:30, Paper We-S3-T4.2 | |
Novel Exploration Strategy Via Uncertainty Decomposition with Application to Reinforcement-Learning-Based Optimal Power Flow |
|
Zhang, Ziqi | Nanjing University of Aeronautics and Astronautics |
Zhang, Chaohai | Nanjing University of Aeronautics and Astronautics |
Keywords: Intelligent Power Grid, Control of Uncertain Systems, Cyber-physical systems
Abstract: With the large-scale integration of renewable energy sources, the Optimal Power Flow (OPF) problem in modern power systems has become increasingly uncertain and complex, making it difficult for conventional methods to strike a balance between computational efficiency and optimization quality. While reinforcement learning (RL) holds potential in such high-dimensional and dynamic optimization scenarios, it still encounters major hurdles, including slow convergence, insufficient exploration, and a high risk of converging to local optima under uncertain conditions caused by renewable energy integration. To address these issues, this paper models and decouples the total uncertainty into the environmental-noise-induced uncertainty and the lack-of-knowledge-induced uncertainty. Then, based on the decomposition, this paper proposes (1) a novel uncertainty-driven exploration formula that converges better and (2) an uncertainty-aware experience replay buffer, which dynamically identifies and focuses on decision regions with high uncertainty to improve RL’s performance. Experimental results on a real 179-bus power system confirm that the proposed method improves exploration efficiency and yields higher overall returns.
|
|
14:30-14:45, Paper We-S3-T4.3 | |
A Low-Rank Enhanced Lightweight Multimodal LLM Framework for Efficient Edge Power Visual Detection |
|
Zhao, Chang | Institute of Computing Technology, Chinese Academy of Sciences |
Ji, Wen | Institute of Computing Technology, University of Chinese Academy |
Yang, Zheming | Institute of Computing Technology, Chinese Academy of Sciences |
Hu, Yunqing | Institute of Computing Technology, Chinese Academy of Sciences |
Zhang, Chang | Institute of Computing Technology, Chinese Academy of Sciences |
Xu, Jingce | State Grid Energy Research Institute, State Grid Corporation Of |
Ma, Hao | Xi'an Jiaotong University |
Guo, Ziyu | State Grid Hebei Electric Power Research Institute |
Zhang, Wancai | Nari Technology Co., Ltd |
Keywords: Intelligent Power Grid, Fault Monitoring and Diagnosis, System Architecture
Abstract: With the development of artificial intelligence, many visual detection methods have been applied to power systems. However, diverse tasks cause existing solutions to face low detection accuracy and resource constraints in edge power scenarios. In this paper, we propose a low-rank enhanced lightweight multimodal LLM framework for efficient edge power visual detection. The framework can effectively balance the model performance with the resource constraints of edge deployment by introducing multimodal LLM and fusing data augmentation strategies and low-rank optimization. First, to enhance the adaptability of diverse visual detection tasks, we design a diverse data augmentation strategy for multimodal LLM to solve the problem of insufficient power scene data. Then, we reduce the number of training parameters by a low-rank optimization technique to enable the model to run efficiently on resource-limited edge devices. Experimental results demonstrate that our proposed framework can improve the accuracy by more than 3% on average under different power visual detection tasks. It can also save 76.49% of graphics memory consumption and accelerate training by 81.88%.
|
|
14:45-15:00, Paper We-S3-T4.4 | |
Synthetic Data Generation for Wind Energy Forecasting: Comparison between Statistical and Deep Learning Models |
|
Klyagina, Olga | INESC TEC |
Xia, Weijie | Delft University of Technology |
Andrade, Ricardo | INESC TEC |
Vergara, Pedro P. | Delft University of Technology |
Bessa, Ricardo | INESC TEC |
Keywords: Intelligent Power Grid, Intelligent Green Production Systems
Abstract: This paper examines the effectiveness of various synthetic data generation methods for deterministic wind power forecasting. Specifically, this work evaluates four approaches—Gaussian Mixture Models (GMMs), t-Copula, DoppelGANger, and FCPFlow—by comparing the forecasting performance, measured using Mean Absolute Error and Root Mean Squared Error, of models trained on synthetic versus real datasets. Our findings indicate that statistical methods (such as GMM and t-Copula) achieve notably better performance under limited data availability. However, the deep generative model FCPFlow yields superior results when sufficient training data is available. These findings suggest that the choice of synthetic data generation method should be informed by the specific data availability context.
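For reference, the two comparison metrics named in the abstract can be implemented in a few lines (a minimal sketch; the toy numbers are arbitrary, not from the paper):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

# Toy normalized wind-power forecast vs. observations (arbitrary numbers):
mae([0.2, 0.5, 0.9], [0.3, 0.4, 0.6])   # ≈ 0.167
rmse([0.2, 0.5, 0.9], [0.3, 0.4, 0.6])  # ≈ 0.191
```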
|
|
15:00-15:15, Paper We-S3-T4.5 | |
Multi-Metric Adaptive Autoencoder for Smart Grid Measurement Incomplete Data Recovery |
|
Ding, Yuting | Southeast University |
Zheng, Jianyong | Southeast University |
Mei, Fei | Hohai University |
Lu, Jianchao | Southeast University |
Keywords: Intelligent Power Grid, Smart Metering, Infrastructure Systems and Services
Abstract: Incomplete data in smart grid measurement tools (SGMT) poses significant challenges for accurate monitoring and decision making. Traditional autoencoders with fixed loss functions struggle to handle heterogeneous corruption patterns—dense, low-amplitude noise and sparse, high-impact anomalies. This paper presents a multi-metric adaptive autoencoder (MMAS) designed for recovering incomplete SGMT data. By combining diverse L_p-norm autoencoders with an adaptive weighting mechanism and nonnegative, density-aware optimization, MMAS effectively captures both noise and anomaly patterns. Experimental results on real-world data show that MMAS consistently outperforms baseline methods in both recovery accuracy and downstream tasks. The proposed framework offers a robust and practical solution for SGMT data recovery under varying sparsity conditions.
|
|
15:15-15:30, Paper We-S3-T4.6 | |
Intelligent Robust Dynamic State Tracking of Power Systems under Outliers and Missing Data |
|
Ho, Tan-Jan | Chung-Yuan Christian University |
Tsai, Shao-Wei | Chung Yuan Christian University, Longtan, Taoyuan, Taiwan (R.O.C |
Keywords: Intelligent Power Grid, Smart Sensor Networks
Abstract: This paper presents new resilient state estimation approaches for dynamically tracking power system states with high accuracy under outliers/missing measurements. To effectively counter the adverse impact of non-Gaussian noises and lost data on power system state monitoring, the proposed methods are developed based on a synergy of a measurement pre-screening scheme, unscented Kalman filtering, and M-estimation with/without incorporating computational intelligence using a fuzzy tuning technique. Simulations using the IEEE 9-bus power system testbed demonstrate that the proposed methods can yield desirable robust state tracking results, and outperform some methods in the literature. Moreover, it is shown that the proposed intelligent method using fuzzy adaptation can achieve the best performance.
|
|
We-S3-T6 |
Room 0.16 |
Machine Learning 3 |
Regular Papers - Cybernetics |
Chair: Karray, Fakhreddine | University of Waterloo |
Co-Chair: Breslin, Robert | University of North Dakota |
|
14:00-14:15, Paper We-S3-T6.1 | |
Fast Domain Adaptation and Comparing KD Routines |
|
Picklo, Ian | University of North Dakota |
Breslin, Robert | University of North Dakota |
Neubert, Jeremiah | University of North Dakota |
Keywords: Transfer Learning, Machine Vision, Machine Learning
Abstract: Autonomous driving and advanced driver assistance systems have made significant advances in recent years. Dataset construction for road scenes can be time-consuming and cost-prohibitive, even more so for the thermal domain, which is underrepresented in publicly available datasets compared to visible-light domains. The proposed methodology cheaply produces high-quality thermal (IR) semantic segmentation networks using multiple rounds of knowledge distillation with different training domains and no IR labels. A trained visual-spectrum (RGB) teacher is used to train thermal (IR) and visual-thermal (RGBT) students. The experiment compares the use of intermediate training domains, such as RGBT, for training a thermal student. The method’s key contribution is demonstrating that the best results come from directly training the IR student with an RGB teacher, without intermediate training stages.
|
|
14:15-14:30, Paper We-S3-T6.2 | |
Nabla Fractional Algorithm for Distributed Resource Allocation Based on PID Protocol in Cooperative-Competitive Network |
|
Ni, Xintong | Southeast University |
Wei, Yiheng | Southeast University |
Li, Xiuxian | Tongji University |
Cao, Jinde | Southeast University |
Keywords: Optimization and Self-Organization Approaches, Complex Network
Abstract: The cooperative-competitive network has broad application prospects due to its advantages in aligning with practical production and life. This work, for the first time, combines the predictive capability of the proportional-integral-derivative (PID) protocol for error with the flexibility of an additional order parameter in fractional calculus to solve the distributed resource allocation problem in cooperative-competitive networks, and provides a discrete-time algorithm for the design. The analysis proves that the algorithm converges in the Mittag-Leffler sense to the optimal solution of the distributed resource allocation problem. Finally, numerical simulations demonstrate the effectiveness of the algorithm, along with a set of comparative simulations that validate the superiority of the proposed algorithm.
|
|
14:30-14:45, Paper We-S3-T6.3 | |
Advancing Human Activity Recognition with Meta-Learning for Continual Learning |
|
Ghedini, Cinara | Computer Science Division. Aeronautics Institute of Technology |
Silva, Anderson Anjos | Unicamp |
Colombini, Esther Luna | Unicamp |
Keywords: Machine Learning, Neural Networks and their Applications, Deep Learning
Abstract: Human Activity Recognition (HAR) is an evolving field with applications in health monitoring, smart environments, exercise tracking, and human-computer interaction. HAR systems require models to adapt to dynamic and evolving data distributions, a challenge that traditional machine learning approaches often struggle to address, resulting in performance degradation over time. This paper introduces a framework for evaluating the application of meta-learning in continual learning scenarios within HAR. In the proposed framework, we employ Online Aware Meta-Learning (OML) and Model-Agnostic Meta-Learning (MAML-Rep) in a continual learning HAR scenario. These methods are evaluated for their ability to retain prior knowledge while efficiently adapting to new activities and data, addressing key challenges such as catastrophic forgetting and data imbalance. The framework also integrates robust preprocessing techniques, including data augmentation, to manage dataset variability. Experimental results highlight OML's superior adaptability and performance across multiple HAR datasets, particularly in handling imbalanced data, validating the efficacy of meta-learning strategies for continual learning in HAR.
|
|
14:45-15:00, Paper We-S3-T6.4 | |
MultiSenseNet: A Multi-Scale Sensing Approach for Cyberbullying Detection on Social Media |
|
Lu, Xiaoyu, Sean | Nanjing University of Science and Technology |
Ma, Shulei | Nanjing University of Science and Technology |
Liu, Haoyue | Zhejiang Gongshang University |
Song, Chenhao | Nanjing University of Science and Technology |
Huang, Bo | Nanjing University of Science and Technology |
Keywords: Machine Learning, Neural Networks and their Applications, Knowledge Acquisition
Abstract: As social media usage proliferates, cyberbullying emerges as a severe social issue, with an increasing demand for effective detection methods. Current approaches suffer from three major limitations: (1) they often focus solely on textual or visual features while ignoring sentiment features; (2) they fail to effectively capture the attack patterns of offensive comments in cyberbullying sessions; and (3) existing methods typically use simple linear classification layers at the end of their models, making it difficult to discriminate between bullying and non-bullying sessions, especially when the differences are subtle. In this study, we first identify two primary attack patterns of cyberbullying: burst attacks, in which multiple offensive comments appear consecutively within a short time period; and high-density attacks, in which offensive comments appear throughout a comment sequence but may be interspersed with non-offensive ones. Then, to address these challenges, a novel cyberbullying detection method, Multi-Scale Sensing Network (MultiSenseNet), is proposed to capture both burst and high-density attacks through a multi-scale sliding window selector, integrate semantic and sentiment features, and replace linear layers with contrastive learning for enhanced discrimination.
|
|
15:00-15:15, Paper We-S3-T6.5 | |
Nesterov Acceleration Algorithm in Deep Learning Based on Proportional-Integral-Derivative Control |
|
Tao, Meng | Southeast University |
Wei, Yiheng | Southeast University |
Franceschelli, Mauro | University of Cagliari |
Cao, Jinde | Southeast University |
Keywords: Machine Learning, Optimization and Self-Organization Approaches
Abstract: This paper proposes a new optimization algorithm, named PIDNAG, which innovatively integrates a PID control strategy, comprising proportional, integral, and derivative components, into the Nesterov accelerated gradient method for solving convex optimization problems. From a control-theoretic perspective, we conduct a rigorous theoretical analysis showing that the proposed method not only guarantees convergence but also significantly accelerates the optimization process. Extensive experimental results demonstrate that PIDNAG achieves remarkable convergence performance across a variety of practical learning tasks, solidly validating its superior capability on convex optimization problems.
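The underlying idea of treating the gradient as a control error with proportional, integral, and derivative terms can be sketched on a 1-D quadratic. This toy PID-controlled gradient descent is not the paper's PIDNAG (the Nesterov acceleration component and all analysis are omitted), and the gains are illustrative:

```python
def pid_gradient_descent(grad, x0, lr=0.05, kp=1.0, ki=0.02, kd=0.3, iters=600):
    """Gradient descent whose step is a PID combination of the current
    gradient (P), its running sum (I), and its most recent change (D)."""
    x, s, g_prev = x0, 0.0, 0.0
    for _ in range(iters):
        g = grad(x)
        s += g                                     # integral of past gradients
        x -= lr * (kp * g + ki * s + kd * (g - g_prev))
        g_prev = g
    return x

# Minimize f(x) = 0.5 * (x - 3)^2, whose gradient is x - 3:
x_star = pid_gradient_descent(lambda x: x - 3.0, x0=0.0)  # close to 3.0
```

With these gains the integral and derivative terms act like the I and D channels of a controller driving the gradient (the "error") to zero.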
|
|
15:15-15:30, Paper We-S3-T6.6 | |
MDD-Net: Multimodal Depression Detection through Mutual Transformer |
|
Haque, Md Rezwanul | University of Waterloo |
Islam, Md. Milon | University of Waterloo |
Raju, S M Taslim Uddin | University of Waterloo |
Hamdi, Altaheri | University of Waterloo |
Nassar, Lobna | American University of Ras Al Khaimah |
Karray, Fakhreddine | University of Waterloo |
Keywords: Expert and Knowledge-Based Systems, Application of Artificial Intelligence, Hybrid Models of Computational Intelligence
Abstract: Depression is a major mental health condition that severely impacts the emotional and physical well-being of individuals. The simple nature of data collection from social media platforms has attracted significant interest in properly utilizing this information for mental health research. A Multimodal Depression Detection Network (MDD-Net), utilizing acoustic and visual data obtained from social media networks, is proposed in this work, where mutual transformers are exploited to efficiently extract and fuse multimodal features for efficient depression detection. The MDD-Net consists of four core modules: an acoustic feature extraction module for retrieving relevant acoustic attributes, a visual feature extraction module for extracting significant high-level patterns, a mutual transformer for computing the correlations among the generated features and fusing these features from multiple modalities, and a detection layer for detecting depression using the fused feature representations. Extensive experiments are performed using the multimodal D-Vlog dataset, and the findings reveal that the developed multimodal depression detection network surpasses the state-of-the-art by up to 17.37% in F1-Score, demonstrating the superior performance of the proposed system. The source code is accessible at https://github.com/rezwanh001/Multimodal-Depression-Detection.
|
|
We-S3-T7 |
Room 0.31 |
Human Perception in Multimedia & Design Methods |
Regular Papers - HMS |
Chair: Tanaka, Takayuki | Hokkaido University |
Co-Chair: Wang, Luming | Zhejiang University |
|
14:00-14:15, Paper We-S3-T7.1 | |
Learning Spectral Diffusion Prior for Hyperspectral Image Reconstruction |
|
Yu, Mingyang | East China Normal University |
Wu, Zhijian | East China Normal University |
Huang, Dingjiang | East China Normal University |
Keywords: Human Enhancements, Human Perception in Multimedia, Multimedia Systems
Abstract: Hyperspectral image (HSI) reconstruction aims to recover a 3D HSI from its degraded 2D measurements. Recently, great progress has been made with deep learning-based methods; however, these methods often struggle to accurately capture high-frequency details of the HSI. To address this issue, this paper proposes a Spectral Diffusion Prior (SDP) that is implicitly learned from hyperspectral images using a diffusion model. Leveraging the powerful ability of the diffusion model to reconstruct details, this learned prior can significantly improve performance when injected into an HSI model. To further improve the effectiveness of the learned prior, we also propose the Spectral Prior Injector Module (SPIM) to dynamically guide the model to recover the HSI details. We evaluate our method on two representative HSI methods: MST and BISRNet. Experimental results show that our method outperforms existing networks by about 0.5 dB, effectively improving the performance of HSI reconstruction.
|
|
14:15-14:30, Paper We-S3-T7.2 | |
EgoEvGesture: Gesture Recognition Based on Egocentric Event Camera |
|
Wang, Luming | Zhejiang University |
Shi, Hao | Zhejiang University |
Yin, Xiaoting | Zhejiang University |
Yang, Kailun | Hunan University |
Wang, Kaiwei | Zhejiang University |
Bai, Jian | Zhejiang University |
Keywords: Human Perception in Multimedia, Virtual and Augmented Reality Systems, Human-centered Learning
Abstract: Egocentric gesture recognition is a pivotal technology for enhancing natural human-computer interaction, yet traditional RGB-based solutions suffer from motion blur and illumination variations in dynamic scenarios. While event cameras show distinct advantages in handling high dynamic range with ultra-low power consumption, existing RGB-based architectures face inherent limitations in processing asynchronous event streams due to their synchronous frame-based nature. Moreover, from an egocentric perspective, event cameras record data that includes events generated by both head movements and hand gestures, thereby increasing the complexity of gesture recognition. To address this, we propose a novel network architecture specifically designed for event data processing, incorporating (1) a lightweight CNN with asymmetric depthwise convolutions to reduce parameters while preserving spatiotemporal features, (2) a plug-and-play state-space model as context block that decouples head movement noise from gesture dynamics, and (3) a parameter-free Bins-Temporal Shift Module (BTSM) that shifts features along bins and temporal dimensions to fuse sparse events efficiently. We further establish the EgoEvGesture dataset, the first large-scale dataset for egocentric gesture recognition using event cameras. Experimental results demonstrate that our method achieves 62.7% accuracy tested on unseen subjects with only 7M parameters, 3.1% higher than state-of-the-art approaches. Notable misclassifications in freestyle motions stem from high inter-personal variability and unseen test patterns differing from training data. Moreover, our approach achieved a remarkable accuracy of 97.0% on the DVS128 Gesture, demonstrating the effectiveness and generalization capability of our method on public datasets. The dataset and models are made available at https://github.com/3190105222/EgoEv_Gesture.
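The parameter-free shift idea behind a module like BTSM can be illustrated with a generic TSM-style temporal shift over a (time, channels) feature map. The quarter-split and zero-padding below are common conventions from shift modules in general, not the paper's exact design:

```python
import numpy as np

def temporal_shift(x, shift=1):
    """Parameter-free temporal shift (generic sketch, not the paper's BTSM):
    move the first quarter of feature channels forward in time, the second
    quarter backward, and leave the remaining channels in place.

    x: array of shape (time, channels).
    """
    t, c = x.shape
    q = c // 4
    out = x.copy()
    out[shift:, :q] = x[:-shift, :q]            # first quarter: forward in time
    out[:shift, :q] = 0                         # zero-pad the vacated steps
    out[:-shift, q:2 * q] = x[shift:, q:2 * q]  # second quarter: backward
    out[-shift:, q:2 * q] = 0
    return out

feats = np.arange(12.0).reshape(3, 4)  # 3 time bins, 4 channels
shifted = temporal_shift(feats)
```

Because the shift only re-indexes features, it mixes temporal context across bins without adding any learnable parameters.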
|
|
14:30-14:45, Paper We-S3-T7.3 | |
Annotation of Manga Reading Order by Scanpath Measurement |
|
Iwamoto, Yuma | The University of Electro-Communications |
Matsuno, Shogo | The University of Electro-Communications |
Hayashi, Katsuhiko | The University of Tokyo |
Kamigaito, Hidetaka | Nara Institute of Science and Technology |
Keywords: Human-Machine Interface, Human Perception in Multimedia, Virtual and Augmented Reality Systems
Abstract: In this paper, we propose a simple method for annotating reading order based on the measurement of manga reading behavior, which automatically estimates reading order by obtaining a scanpath from eye movements measured by an HMD with a built-in eye tracker. Our method reduces annotation effort and captures natural reading behavior, which is difficult to achieve with conventional GUI-based manual labeling or image-analysis-based methods for manga reading-order annotation. First, gaze samples captured with the HMD's built-in eye tracker are mapped to their corresponding panels, after which time-series clustering is applied. Then, we extract the longest fixation duration for every panel and sort the panels in ascending order of that duration to infer an individual reading sequence. When needed, multiple estimated sequences are integrated by voting to yield a consensus reading order for the page. We evaluated the method on 99 pages selected from 33 commercially published works, using recordings from five adult readers. The automatically inferred sequences were compared against expert manual annotations, achieving a mean Kendall's rank correlation coefficient of 0.82. This result indicates that the proposed method can support annotation tasks based on more natural reading behavior while maintaining estimation accuracy. At the same time, the observed variability among individual sequences highlights the importance of modeling reader-specific behaviors in future work.
|
|
14:45-15:00, Paper We-S3-T7.4 | |
A LLM-Based Supportive Dialogue Training System for Healthcare Workers |
|
Chen, Yong-Xiang | Chung Yuan Christian University |
Ho, Hsi-Wei | Chung Yuan Christian University |
Chang, Yu-Chiao | Chung Yuan Christian University |
Keywords: Design Methods
Abstract: Medical disputes have been increasing year by year, with poor communication and inadequate service attitudes being the main causes. Intern healthcare workers often face challenges due to a lack of communication experience, and implementing communication-skills training involves many practical limitations. This study proposes a framework that utilizes LLMs to enhance supportive-language dialogue training for healthcare workers, particularly in emotional support and empathy expression. Through LLM-based simulation combined with supportive psychotherapy, users can engage in conversations with simulated patients from diverse backgrounds and accumulate more experience. We expect this framework to train healthcare professionals more effectively.
|
|
15:00-15:15, Paper We-S3-T7.5 | |
THAT: Token-Wise High-Frequency Augmentation Transformer for Hyperspectral Pansharpening |
|
Jin, Hongkun | JPMorgan Chase |
Hongcheng, Jiang | UMKC, Umsystem |
Zhang, Zejun | University of Southern California |
Zhang, Yuan | The University of Adelaide |
Fu, Jia | KTH Royal Institute of Technology |
Li, Tingfeng | NEC Laboratories America, Inc |
Luo, Kai | University of Virginia |
Keywords: Environmental Sensing, Augmented Cognition, Design Methods
Abstract: Transformer-based methods have demonstrated strong potential in hyperspectral pansharpening by modeling long-range dependencies. However, their effectiveness is often limited by redundant token representations and a lack of multi-scale feature modeling. Hyperspectral images exhibit intrinsic spectral priors (e.g., abundance sparsity) and spatial priors (e.g., non-local similarity), which are critical for accurate reconstruction. From a spectral-spatial perspective, Vision Transformers (ViTs) face two major limitations: they struggle to preserve high-frequency components—such as material edges and texture transitions—and suffer from attention dispersion across redundant tokens. These issues stem from the global self-attention mechanism, which tends to dilute high-frequency signals and overlook localized details. To address these challenges, we propose the Token-wise High-frequency Augmentation Transformer (THAT), a novel framework designed to enhance hyperspectral pansharpening through improved high-frequency feature representation and token selection. Specifically, THAT introduces: (1) Pivotal Token Selective Attention (PTSA) to prioritize informative tokens and suppress redundancy; (2) a Multi-level Variance-aware Feed-forward Network (MVFN) to enhance high-frequency detail learning. Experiments on standard benchmarks show that THAT achieves state-of-the-art performance with improved reconstruction quality and efficiency. Code is available at https://github.com/kailuo93/THAT.
|
|
15:15-15:30, Paper We-S3-T7.6 | |
Reproduction of Human Locomotion Transitions Based on Motion Intention Using a Strategy-Tactics Framework |
|
Kitagawa, Masaki | Hokkaido University |
Tanaka, Takayuki | Hokkaido University |
Murai, Akihiko | National Institute of Advanced Industrial Science and Technology |
Kusaka, Takashi | Hokkaido University |
Keywords: Human Performance Modeling, Intelligence Interaction, Design Methods
Abstract: This paper proposes a hierarchical motion generation model that integrates individual intent into dynamic locomotion control, aiming to reproduce transitions between walking and running within a unified physical framework. The model consists of a strategy layer that reflects high-level objectives such as speed and energy efficiency, and a tactics layer that adjusts physical parameters such as leg stiffness, touchdown angle, and applied force. These layers interact through dynamically modulated control gains, allowing motion transitions to emerge naturally without explicit switching mechanisms. Simulations based on an extended Spring-Loaded Inverted Pendulum (SLIP) model demonstrate that differences in acceleration duration and energy supply result in diverse locomotion patterns. Notably, a transition from walking to running occurs when both acceleration intensity and additional energy surpass specific thresholds. The results highlight the model's capacity to capture hysteresis-like features in human gait transitions and emphasize the importance of coordinated tactical control. Future work will address integrating multiple tactical elements and experimental validation toward applications in assistive robotics and human movement analysis.
|
|
We-S3-T9 |
Room 0.51 |
Autonomous Vehicle 1 |
Regular Papers - SSE |
Chair: Bonilla Licea, Daniel | Mohammed VI Polytechnic University |
Co-Chair: Rao, Xinpei | University of Chinese Academy of Sciences |
|
14:00-14:15, Paper We-S3-T9.1 | |
Free-Space Optical Communication-Driven NMPC Framework for Multi-Rotor Aerial Vehicles in Structured Inspection Scenarios |
|
Silano, Giuseppe | Czech Technical University in Prague |
Bonilla Licea, Daniel | Mohammed VI Polytechnic University |
El Hammouti, Hajar | Mohammed VI Polytechnic University |
Saska, Martin | Czech Technical University in Prague |
Keywords: Autonomous Vehicle, Communications, Robotic Systems
Abstract: This paper introduces a Nonlinear Model Predictive Control (NMPC) framework for communication-aware motion planning of Multi-Rotor Aerial Vehicles (MRAVs) using Free-Space Optical (FSO) links. The scenario involves MRAVs equipped with body-fixed optical transmitters and Unmanned Ground Vehicles (UGVs) acting as mobile relays, each outfitted with fixed conical Field-of-View (FoV) receivers. The controller integrates optical connectivity constraints into the NMPC formulation to ensure beam alignment and minimum link quality, while also enabling UGV tracking and obstacle avoidance. The method supports both coplanar and tilted MRAV configurations. MATLAB simulations demonstrate its feasibility and effectiveness.
|
|
14:15-14:30, Paper We-S3-T9.2 | |
A Modified SG Algorithm for TDOA-Based Source Localization |
|
Rao, Xinpei | University of Chinese Academy of Sciences |
Liu, Yujing | Chinese Academy of Sciences |
Liu, Zhixin | Academy of Mathematics and Systems Science, CAS |
Keywords: Autonomous Vehicle
Abstract: In this paper, we consider the online source localization problem using time-difference-of-arrival (TDOA) measurements between a base station and a moving unmanned aerial vehicle (UAV). First, we transform the source localization problem into estimating an unknown parameter of a stochastic linear regression model. Then, we propose a modified stochastic gradient (SG) algorithm to estimate the unknown parameter, and establish sufficient conditions on the UAV’s positions to guarantee global convergence of the algorithm. We note that our convergence results are obtained without using the persistent excitation condition or the independent and identically distributed (i.i.d.) assumption on the data. Finally, numerical simulations are presented to validate the effectiveness of the proposed algorithm.
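The regression-then-SG pipeline the abstract outlines can be sketched with a generic normalized stochastic-gradient (LMS-style) estimator on synthetic data. The paper's specific modification and its convergence conditions under non-i.i.d. data are not reproduced here; all names and numbers below are illustrative:

```python
import numpy as np

def sg_estimate(phis, ys, step=0.5):
    """Estimate theta in y_k = phi_k^T theta + v_k with a normalized
    stochastic-gradient recursion over the data stream."""
    theta = np.zeros(phis.shape[1])
    for phi, y in zip(phis, ys):
        err = y - phi @ theta                          # innovation
        theta += step * err * phi / (1.0 + phi @ phi)  # normalized SG step
    return theta

rng = np.random.default_rng(0)
theta_true = np.array([2.0, -1.0])                      # unknown parameter
phis = rng.normal(size=(5000, 2))                       # regressor vectors
ys = phis @ theta_true + 0.01 * rng.normal(size=5000)   # noisy observations
theta_hat = sg_estimate(phis, ys)                       # close to theta_true
```

The normalization by 1 + phi^T phi keeps the step bounded, a common trick when regressor magnitudes vary along the trajectory.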
|
|
14:30-14:45, Paper We-S3-T9.3 | |
Mitigating Concept Drift in QoS Prediction for Teleoperation of Autonomous Vehicles Using Historic Data |
|
Su, Xiyan | Technical University of Munich |
Gao, Jianning | Technical University of Munich |
Ashri, Mahmoud | Technical University of Munich |
Diermeyer, Frank | Technical University Munich |
Keywords: Autonomous Vehicle, Communications, Intelligent Transportation Systems
Abstract: Teleoperation serves as the fallback solution for autonomous driving, but reliable teleoperation functions require a certain amount of mobile network resources, which cannot be guaranteed at all times. Therefore, predictive quality of service (pQoS) is introduced as a concept to increase the resilience of teleoperation. In this paper, based on a data measurement campaign, we propose a framework to predict two important network KPIs for teleoperation: uplink data rate and round-trip latency. Furthermore, we introduce a method to alleviate the performance degradation of machine-learning-based prediction models on previously unseen data due to concept drift by incorporating historic data into the prediction pipeline. Additionally, we introduce a critical-scenario-detection metric to evaluate prediction performance specifically for teleoperation.
|
|
14:45-15:00, Paper We-S3-T9.4 | |
Formal Safety and Robustness Verification of Nonlinear Vehicle Systems under Uncertainty Using Sum-Of-Squares Optimization |
|
Erz, Jannis | Karlsruhe Institute of Technology (KIT) |
Burton, Simon | Department of Computer Science, University of York |
Sax, Eric | Institute for Information Processing Technologies (ITIV), Karlsr |
Keywords: Autonomous Vehicle, Control of Uncertain Systems, Trust in Autonomous Systems
Abstract: Ensuring stability and robustness in highly automated vehicle (HAV) control systems is critical for guaranteeing the safety of the intended functionality (SOTIF) under real-world uncertainties inside a defined operational design domain (ODD). This paper presents a formal verification framework utilizing sum-of-squares (SOS) optimization to quantify and guarantee nonlinear system stability and robust performance in the presence of parametric uncertainties and environmental disturbances. We analyze a benchmark automated lane-following use case (UC) and verify polynomial system representations of the vehicle architecture against worst-case deviations using region of attraction (RoA) and robust positive invariant (RPI) sets. Simulation-based statistical validation using high-fidelity vehicle models confirms the effectiveness of the proposed method for formal robust control certification.
|
|
15:00-15:15, Paper We-S3-T9.5 | |
Vision-Based Covert Attack and Hybrid Adversary Detection for Autonomous Vehicles Using Generative Networks |
|
Moradi Sizkouhi, Amir Mohammad | Concordia University |
Selmic, Rastko | Concordia University |
Keywords: Autonomous Vehicle, Cyber-physical systems
Abstract: In this paper, we introduce Gen-VCA, a generative vision-based covert attack that specifically targets lane-keeping driver assistance systems. Gen-VCA drives the autonomous vehicle (AV) out of the lane by injecting a defective signal into the steering angle. Then, it uses a customized diffusion-based generative model to create realistic front-view road views, misleading the perception system into believing that the vehicle is correctly aligned with the center line. To detect such malicious manipulation of AVs, we propose an adversary detection system that integrates multi-scale forensic image analysis and deep discriminative neural networks. Experiments in real-world driving conditions reveal that Gen-VCA excels in novel center-aligned view synthesis of the road and induces significant lateral steering errors in AVs, while the detection system accurately detects these manipulations with minimal errors.
|
|
We-S3-T10 |
Room 0.90 |
Affective Computing 1 |
Regular Papers - HMS |
Chair: Gao, Ruoyu | Shanghai Jiao Tong University |
Co-Chair: Badica, Costin | Universitatea Din Craiova |
|
14:00-14:15, Paper We-S3-T10.1 | |
Emotion Recognition in Conversation Based on the Fine-Grained Multidimensional Emotion Representation Learning |
|
Gao, Ruoyu | Shanghai Jiao Tong University |
Wen, Xiaoyu | Hyperspace AI |
Liu, Gaofeng | Shanghai Jiao Tong University |
Huo, Hong | Shanghai Jiao Tong University |
Fang, Tao | Shanghai Jiao Tong University |
Keywords: Affective Computing
Abstract: Traditional emotion recognition in conversation (ERC) studies are usually designed to predict a fixed set of predetermined emotion categories. This limited supervision diminishes the expressive power of the data, resulting in failure to capture the complexity of human emotions in conversation. Learning from a well-designed fine-grained representation of emotions offers a promising alternative that utilizes a wider range of supervision. In this paper, the proposed Fine-grained Multidimensional Emotion Representation Learning (FMERL) framework integrates multitask learning and contrastive learning, and extends the valence-arousal-dominance (VAD) emotion representation from psychology to both continuous and discrete forms. Firstly, emotion features from the text, audio, and visual modalities are extracted. Then, the multimodal features are fused by a transformer-based model. The multitask learning module consists of three networks, the valence network, arousal network, and dominance network, which learn continuous fine-grained emotion representations from the fused multimodal features. Contrastive learning aligns the fused multimodal features with discrete fine-grained emotion representations derived through prompt engineering applied to a large language model. The transferability of contrastive learning enables FMERL to map the semantic information of emotion representations and fused multimodal features into a shared embedding space, thereby understanding their semantic relationships and enabling zero-shot learning for unseen emotion classes. Experimental results on the IEMOCAP and MELD datasets show that FMERL achieves state-of-the-art performance in emotion recognition and supports zero-shot learning.
|
|
14:15-14:30, Paper We-S3-T10.2 | |
Uncertainty-Aware Cross-Modal Knowledge Distillation with Prototype Learning for Multimodal Brain-Computer Interfaces |
|
Jang, Hyo-Jeong | Korea University |
Shin, Hye-Bin | Korea University |
Lee, Seong-Whan | Korea University |
Keywords: Affective Computing, Brain-Computer Interfaces
Abstract: Electroencephalography (EEG) is a fundamental modality for cognitive state monitoring in brain-computer interfaces (BCIs). However, it is highly susceptible to intrinsic signal errors and human-induced labeling errors, which lead to label noise and ultimately degrade model performance. To enhance EEG learning, multimodal knowledge distillation (KD) has been explored to transfer knowledge from visual models with rich representations to EEG-based models. Nevertheless, KD faces two key challenges: modality gap and soft label misalignment. The former arises from the heterogeneous nature of EEG and visual feature spaces, while the latter stems from label inconsistencies that create discrepancies between ground truth labels and distillation targets. This paper addresses semantic uncertainty caused by ambiguous features and weakly defined labels. We propose a novel cross-modal knowledge distillation framework that mitigates both modality and label inconsistencies. It aligns feature semantics through a prototype-based similarity module and introduces a task-specific distillation head to resolve label-induced inconsistency in supervision. Experimental results demonstrate that our approach improves EEG-based emotion regression and classification performance, outperforming both unimodal and multimodal baselines on a public multimodal dataset. These findings highlight the potential of our framework for BCI applications.
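Two of the ingredients named here, softened-label distillation and prototype-based similarity, have standard generic forms; the sketch below shows those textbook versions (Hinton-style KL distillation, cosine prototype matching), not the paper's task-specific distillation head:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; T > 1 softens the distribution."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def kd_kl_loss(teacher_logits, student_logits, T=2.0):
    """Standard softened-label distillation term: T^2-scaled KL
    divergence between teacher and student soft distributions.
    A generic stand-in, not the paper's formulation."""
    p = softmax(teacher_logits, T)   # teacher soft targets
    q = softmax(student_logits, T)   # student predictions
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))

def prototype_similarity(feature, prototypes):
    """Cosine similarity of one feature vector to each class prototype,
    the basic operation in prototype-based alignment (illustrative)."""
    f = feature / np.linalg.norm(feature)
    P = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return P @ f

# Hypothetical 3-class logits: the loss is zero iff the logits match.
loss = kd_kl_loss([2.0, 0.5, -1.0], [1.5, 0.8, -0.5])
```

Matching in a shared space via cosine similarity is what lets heterogeneous modalities (EEG and visual features) be compared at all, which is the motivation for the prototype module.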
|
|
14:30-14:45, Paper We-S3-T10.3 | |
MoRa: Multi-Graph Orthogonal Representation Adaptation Network for Cross-Subject EEG Emotion Recognition |
|
Wu, Mengqi | South China University of Technology |
Chen, C. L. Philip | University of Macau |
Zhang, Tong | South China University of Technology |
Keywords: Affective Computing, Brain-Computer Interfaces
Abstract: The redundancy of emotion-agnostic features and inter-subject variability significantly undermine the adaptability of EEG-based emotion recognition models. Previous studies have largely overlooked the influence of emotional responses at both intra- and inter-region levels across individuals. Consequently, the emotion-aware representations exhibit substantial redundancy and limited adaptability to cross-subject variability. To address these issues, this paper proposes a multi-graph orthogonal representation adaptation network (MoRa) that enhances the robustness of emotion-aware representations and mitigates inter-subject variability in cross-subject EEG emotion recognition. Specifically, the multi-graph module performs intra-region orthogonal decoupling of emotion-aware and emotion-agnostic information, reducing redundancy and enhancing consistency across dual emotional spaces. To reduce inter-region redundancy, the cross-region emotion refinement module integrates soft orthogonality and knowledge distillation to enhance collaborative information exchange across regions. Moreover, cross-subject experiments on the SEED and SEED-IV datasets demonstrate that MoRa achieves state-of-the-art performance in EEG emotion recognition.
|
|
14:45-15:00, Paper We-S3-T10.4 | |
InfoGA: Enhancing Generalized EEG Emotion Recognition Via Information-Aware Graph Augmentation |
|
Chen, Bianna | Nanjing University of Finance and Economics |
Chen, C. L. Philip | University of Macau |
Zhang, Tong | South China University of Technology |
Keywords: Affective Computing, Brain-Computer Interfaces, Biometrics and Applications,
Abstract: Limited EEG data and subject variability pose significant challenges to the generalization of EEG-based emotion recognition. Most existing approaches augment EEG data using deterministic methods, often neglecting to ensure both diversity and fidelity in the generated samples. This oversight leads to insufficient domain diversity and emotional semantic information for a generalized model independent of individuals. This paper proposes an Information-Aware Graph Augmentation (InfoGA) framework for generalized EEG emotion recognition. The graph uncertainty augmentation module augments both the connectivity and features of EEG graphs by modeling statistical uncertainty, enabling the model to simulate domain shifts and improve generalizability against subject variability. Additionally, two information-aware constraints are introduced to ensure diversity and fidelity in the augmented EEG graphs. The graph diversity constraint enriches the emotional knowledge of the augmented graphs, while the graph fidelity constraint preserves their emotional semantic fidelity by integrating consistency learning with supervised learning. Extensive experiments on three public EEG emotion datasets, i.e., SEED, SEED-IV, and SEED-V, demonstrate that InfoGA achieves superior generalizability compared to baseline methods.
|
|
15:00-15:15, Paper We-S3-T10.5 | |
Spatial-Temporal Transformer with Curriculum Learning for EEG-Based Emotion Recognition |
|
Lin, Xuetao | Beihang University |
Peng, Tianhao | Beihang University |
Dai, Peihong | Beihang University |
Liang, Yu | Beijing University of Technology |
Wu, Wenjun | Beihang University |
Keywords: Affective Computing, Brain-Computer Interfaces, Brain-based Information Communications
Abstract: EEG-based emotion recognition plays an important role in developing adaptive brain-computer communication systems, yet it faces two fundamental challenges in practical deployment: (1) effectively integrating non-stationary spatial-temporal neural patterns, and (2) robustly adapting to dynamic variations in emotional intensity in real-world scenarios. This paper proposes STT-CL, a spatial-temporal transformer integrated with curriculum learning. Our method introduces two core components: a spatial encoder that models inter-channel relationships, and a temporal encoder that uses a windowed attention mechanism, supporting the simultaneous extraction of spatial correlations and temporal dynamics from EEG signals. Complementing this architecture, a progressive intensity-aware curriculum learning strategy guides training from high-intensity to low-intensity emotional states through dynamic sample scheduling based on dual difficulty assessment. Comprehensive experiments on three benchmark datasets demonstrate state-of-the-art performance across different emotional intensity levels, and ablation studies confirm the necessity of both architectural components.
|
|
15:15-15:30, Paper We-S3-T10.6 | |
Systematic Features Selection in Content-Based Filtering Books Recommender System |
|
Lutan, Elena-Ruxandra | University of Craiova |
Badica, Costin | Universitatea Din Craiova |
Keywords: Affective Computing, Human Perception in Multimedia, Kansei (sense/emotion) Engineering
Abstract: In this paper, we propose a method for obtaining personalized book recommendations using a content-based filtering approach and three sets of book features that define the book metadata, in order to highlight their impact on the recommendation process. The recommender system is experimentally validated on four book datasets of different sizes, collected from the Goodreads website, a popular book social network, using our customized web scraper. Lastly, we propose three evaluation metrics, Coverage, Average Recommendations Similarity, and Relevance, and discuss our results.
|
|
We-S3-T11 |
Room 0.94 |
Large-Scale System of Systems |
Regular Papers - SSE |
Chair: Raz, Ali | George Mason University |
Co-Chair: Castellanos, Johanna | Federal University of Pernambuco |
|
14:00-14:15, Paper We-S3-T11.1 | |
Intent-Based Networking for Distributed Command-And-Control Systems: A Conceptual Framework with System of Systems |
|
Raz, Ali | George Mason University |
Hieb, Michael | George Mason University |
Maxwell, Dan | KadSci LLC |
Omelko, Vladimir | Air Force Research Laboratory |
Beckus, Andre | University of Central Florida |
Hilliard, Ryan | Air Force Research Laboratory |
Keywords: Large-Scale System of Systems, System Architecture, Distributed Intelligent Systems
Abstract: Modern operational concepts for System of Systems (SoS) and Command-and-Control (C2) systems demand flexible integration of distributed systems, with dynamic task allocations and reconfigurations based on an evolving operational environment, to achieve assigned missions and support a commander’s intent (CI). However, the complexity of the underlying systems and the stringent demands of operational needs make it very challenging to dynamically evolve the underlying SoS architecture, because it is typically pre-defined in a centralized manner. This paper leverages emerging concepts and principles from Intent-based Networking (IBN) in software-defined networks for dynamically creating multiple pathways to achieve CI with distributed complex systems. The paper describes work in progress towards a conceptual framework for Intent-based Orchestration of Distributed C2 (IBODC2) that first maps the C2 and SoS problem space to an IBN architecture and then defines the framework in terms of knowledge management, intent formulation, intent processing, and intent achievement. Initial technical approaches for implementing the IBODC2 framework are also presented; these include building a mission ontology for intent formulation, set-based design and Allen’s Interval Algebra for intent processing, and adapting SoS analytical methods for intent achievement. This paper describes the first steps towards expanding the applicability of IBN concepts from software-defined networks to complex physical systems, SoS, and C2 application domains.
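Allen's Interval Algebra, named here for intent processing, classifies how two time intervals relate; a minimal classifier over the seven base relations (the remaining six are their inverses) might look like the following. It is illustrative only, not the authors' implementation:

```python
def allen_relation(a, b):
    """Classify the Allen relation between closed intervals a=(a1,a2)
    and b=(b1,b2). Covers the 7 base relations; any inverse relation
    (after, met-by, contains, ...) falls through to the last branch."""
    a1, a2 = a
    b1, b2 = b
    if a2 < b1:
        return "before"                       # a ends before b starts
    if a2 == b1:
        return "meets"                        # a ends exactly where b starts
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1 and a2 < b2:
        return "starts"
    if a1 > b1 and a2 == b2:
        return "finishes"
    if a1 > b1 and a2 < b2:
        return "during"
    if a1 < b1 and b1 < a2 < b2:
        return "overlaps"
    return "inverse-of-base-relation"

# Two hypothetical C2 tasks that share a boundary in time:
print(allen_relation((1, 3), (3, 6)))  # -> "meets"
```

Composing such qualitative relations, rather than raw timestamps, is what makes interval algebra useful for reasoning about task orderings in intent processing.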
|
|
14:15-14:30, Paper We-S3-T11.2 | |
A Lightweight Parallel System for Industrial-Scale Molecular Dynamics Simulation on Sunway Supercomputer |
|
Meng, Xiaojuan | Qilu University of Technology (Shandong Academy of Sciences) |
Yan, Yunbo | Qilu University of Technology (Shandong Academy of Sciences) |
Li, Anbang | Qilu University of Technology |
Liu, Anjun | Jinan Institute of Supercomputer Technology |
Guo, Meng | Qilu University of Technology |
Keywords: Large-Scale System of Systems, System Architecture, System Modeling and Control
Abstract: The growing demand for high-fidelity, large-scale molecular dynamics (MD) simulations has posed significant performance challenges for traditional MD applications, especially on memory-constrained heterogeneous platforms. To address these challenges on the SW26010-Pro processor, we propose a parallel optimization framework for miniMD. The framework introduces a full-neighbor force model to eliminate atomic data dependencies and an integrated segmented prefetching and caching mechanism to enhance memory-bandwidth utilization. Meanwhile, we employ shared-bin neighbor-list reconstruction to reduce redundant accesses, coupled with an asynchronous master-slave parallel strategy to fully exploit the heterogeneous resources of the SW26010-Pro. Experimental evaluations demonstrate that our framework achieves up to 12.7× speedup on a single node and sustains 84% parallel efficiency when scaling to 399,360 cores in billion-atom simulations. The proposed techniques are broadly applicable to MD workloads with irregular memory access patterns and limited cache resources, offering a general solution path for high-performance simulations on heterogeneous architectures.
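A full-neighbor force model sums each atom's force over all of its neighbors rather than using the i<j half-neighbor shortcut, so no two loop iterations write the same accumulator. A serial Lennard-Jones sketch of that data-independence property (parameters illustrative, far simpler than the paper's optimized kernels):

```python
import numpy as np

def lj_forces_full_neighbor(pos, eps=1.0, sigma=1.0, rcut=2.5):
    """Force on each atom summed over *all* neighbors within rcut.
    Because there is no i<j shortcut, forces[i] is written only by
    iteration i -- the dependency-free property that a full-neighbor
    model trades extra flops for."""
    n = len(pos)
    forces = np.zeros_like(pos)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            rij = pos[i] - pos[j]
            r2 = float(rij @ rij)
            if r2 > rcut * rcut:
                continue
            inv_r6 = (sigma * sigma / r2) ** 3
            # LJ pair force along rij: 24*eps*(2*(s/r)^12 - (s/r)^6)/r^2
            forces[i] += 24.0 * eps * (2.0 * inv_r6 ** 2 - inv_r6) / r2 * rij
    return forces

# Two atoms at separation 1.5*sigma: attractive regime of the LJ potential.
pos = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
f = lj_forces_full_neighbor(pos)
```

Newton's third law still holds in the output (forces come out equal and opposite); the model simply recomputes each pair twice instead of sharing the result.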
|
|
14:30-14:45, Paper We-S3-T11.3 | |
Packet Delivery Performance in Satellite-Assisted LoRaWAN Networks Using Computational Geometry |
|
Bermudez, Esau | Federal University of Pernambuco |
Castellanos, Johanna | Federal University of Pernambuco |
Dantas, Jamilson | UFPE |
Maciel, Paulo | UFPE |
Keywords: Communications, Smart Sensor Networks
Abstract: Reliable signal reception and attenuation control are critical to improving the performance of low-power wide-area networks (LPWANs), especially in hybrid systems integrating LoRa and satellite communications. This study analyzes the effect of Graham's algorithm in hybrid LoRa-satellite networks using NEO-M8N GPS-enabled TTGO nodes deployed in a mesh topology in obstructed environments. Implementing the algorithm optimized inter-node connectivity, reduced signal attenuation, and improved satellite availability. A superior packet delivery ratio (PDR) was observed with the Graham algorithm. In addition, power consumption per minute was stabilized and battery life was extended. Through mathematical modeling and experimental validation, it is confirmed that Graham's algorithm improves the energy efficiency and robustness of the network, enhancing the performance of hybrid LPWAN systems.
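Graham's algorithm computes the convex hull of a planar point set, here the hull of the node positions. A compact variant (Andrew's monotone chain, which uses the same left-turn test as the classic Graham scan) over hypothetical node coordinates:

```python
def cross(o, a, b):
    """z-component of (a-o) x (b-o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Monotone-chain convex hull (Graham-scan family); returns hull
    vertices in counter-clockwise order, interior points discarded."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:                      # build lower hull left-to-right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):            # build upper hull right-to-left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]     # endpoints shared, drop duplicates

# Hypothetical node layout: the interior node (1, 1) is not a hull vertex.
hull = convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)])
```

In a deployment like the one described, the hull identifies the boundary nodes of the mesh, which is the geometric input to connectivity optimization.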
|
|
14:45-15:00, Paper We-S3-T11.4 | |
Beyond Compute Dependency: Strategic Trade-Offs in AI Chip Procurement and R&D |
|
Tang, Guoying | Northwestern Polytechnical University |
Zhang, Yali | Northwestern Polytechnical University |
Li, Liaoliao | Northwestern Polytechnical University |
Li, Wenjing | Northwestern Polytechnical University |
Keywords: Supply equipment
Abstract: The rapid development of large language models (LLM) has driven a significant surge in demand for computing power, but the high concentration in the supply market has exacerbated the risk of “computing power dependency.” Some LLM manufacturers have attempted to address this challenge through research and development (R&D) of artificial intelligence chips. This study constructs a duopoly game model to analyze the chip acquisition strategies of LLM manufacturers under supply chain risks. We compare two scenarios—where both LLM manufacturers procure chips (NN model) and where one manufacturer relies on R&D (NR model). The findings reveal that: (1) When the manufacturer’s competitor opts for R&D chips, if the performance of the competitor’s chips is at a low level, the manufacturer can experience a significant increase in price and profit by choosing to purchase chips; (2) When the competitor consistently opts for procurement, the LLM manufacturer should choose R&D over purchasing if their internally developed chips achieve moderate performance levels; (3) Customer sensitivity to supply disruptions increases LLM manufacturers’ stability preference, thereby enhancing the competitive advantages of R&D strategy; computational efficiency improvements can substitute part of the value brought by R&D strategy. These findings offer theoretical insights into how LLM manufacturers can balance cost, performance, and risk in chip strategy decisions.
|
|
15:00-15:15, Paper We-S3-T11.5 | |
Integrating Assurance and Risk Management of Complex Systems |
|
Haugen, Odd Ivar | DNV AS |
Keywords: Technology Assessment, Cyber-physical systems
Abstract: This paper examines the relationship between assurance, risk, and risk management in the context of complex safety-related systems. It introduces a nuanced understanding of assurance, arguing that for a complex system, confidence is justified by claims founded on system behaviour. It emphasises that knowledge is the cornerstone of assurance. The paper addresses the challenges inherent in complex safety-critical systems, specifically the epistemic and aleatory uncertainties. To address these uncertainties about the emergent behaviour, this paper proposes a systems approach based on the CESM metamodel (Composition, Environment, Structure, Mechanisms). Because risk is the 'effect of uncertainty on objectives' and assurance provides the 'grounds for justified confidence,' addressing the uncertainties of complex systems requires a more integrated assurance and risk management framework. This paper conceptualises the interplay between assurance and risk management through two models: the domain model and the control model. Assurance and risk management mutually depend on each other to reduce uncertainty and control risk levels. This work highlights the dual roles of assurance in risk management, acting as an epistemic actuator on the one side, and providing feedback about the strength of the justification on the other. Assurance and risk management play inseparable roles in ensuring safety in complex systems. By framing assurance as the primary means of tackling epistemic uncertainty within a broader risk management control structure, this work provides a robust conceptualisation for building justified confidence in system safety.
|
|
15:15-15:30, Paper We-S3-T11.6 | |
Stochastic Model for Analysis of Energy Availability in Multi-Microgrids |
|
Castellanos, Johanna | Federal University of Pernambuco |
Bermudez, Esau | Federal University of Pernambuco |
Maciel, Paulo | UFPE |
Keywords: Fault Monitoring and Diagnosis, Smart Buildings, Smart Cities and Infrastructures, Discrete Event Systems
Abstract: This paper presents a comprehensive Stochastic Petri Net (SPN) model for evaluating energy availability in multi-microgrid systems, integrating photovoltaic (PV) generation, battery storage, and distribution networks. The proposed framework addresses the inherent uncertainties of renewable energy sources (RES), dynamic load demands, and operational transitions between grid-connected and islanded modes. By leveraging SPNs, the model captures the probabilistic behavior of microgrid components, enabling precise assessment of steady-state and instantaneous availability metrics. A Reliability Block Diagram (RBD) further extends the analysis to cooperative microgrid networks, quantifying the collective ability to meet contractual energy commitments under shared surplus scenarios. Simulation results, validated using the CIGRE LV benchmark network, demonstrate the model’s efficacy in identifying reliability bottlenecks, optimizing maintenance strategies, and enhancing system resilience. The study highlights the SPN’s versatility in modeling complex interactions across residential, industrial, and commercial microgrids, providing actionable insights for sustainable energy system design.
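The availability figures such SPN/RBD models report build on the classical two-state repairable-component result A = MTTF / (MTTF + MTTR), composed through RBD series/parallel rules. A minimal numeric sketch with illustrative component rates, not the paper's SPN model:

```python
def steady_state_availability(mttf_h: float, mttr_h: float) -> float:
    """Two-state (up/down) Markov steady-state availability; the
    per-component building block that SPN/RBD analyses compose."""
    return mttf_h / (mttf_h + mttr_h)

def series_availability(avails):
    """RBD series composition: every block must be up."""
    a = 1.0
    for x in avails:
        a *= x
    return a

def parallel_availability(avails):
    """RBD parallel (redundant) composition: at least one block up."""
    u = 1.0
    for x in avails:
        u *= (1.0 - x)
    return 1.0 - u

# Hypothetical microgrid components (hours): PV inverter and battery bank.
a_pv = steady_state_availability(2000.0, 24.0)
a_bat = steady_state_availability(5000.0, 48.0)
print(series_availability([a_pv, a_bat]))   # both needed to serve the load
```

Series composition always lowers availability and parallel redundancy raises it, which is why the paper's cooperative surplus-sharing scenario (effectively a parallel structure across microgrids) improves collective reliability.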
|
|
We-S3-T12 |
Room 0.95 |
Innovative Perspectives on AI and Visual Analytics |
Special Sessions: HMS |
Chair: Huang, Weidong | University of Technology Sydney |
Co-Chair: Lin, Chun-Cheng | National Yang Ming Chiao Tung University |
Organizer: Huang, Weidong | University of Technology Sydney |
Organizer: Lin, Chun-Cheng | National Yang Ming Chiao Tung University |
Organizer: Bi, Chongke | Tianjin University |
Organizer: Dong, Alice Xiaodan | University of Technology Sydney |
|
14:00-14:15, Paper We-S3-T12.1 | |
Advancing AI-Enhanced Financial Security: A Review of Facial, Voice, and Medical Biometrics for Identity Verification (I) |
|
Chen, Zhicong | University of Technology Sydney |
Dong, Alice Xiaodan | University of Technology Sydney |
Peters, Gareth William | University of California Santa Barbara |
Chan, Jennifer | The University of Sydney |
Huang, Weidong | University of Technology Sydney |
Lin, Chun-Cheng | National Yang Ming Chiao Tung University |
Keywords: Information Visualization, Biometrics and Applications,
Abstract: Identity verification is a critical component of financial data and systems security and privacy preservation, and is required by regulatory guidelines to ensure compliance and to aid in fraud prevention. With advances in artificial intelligence (AI), deep learning, and statistical methods, financial institutions are increasingly adopting multifactor authentication (MFA) that incorporates biometric-based approaches to authenticate identities for access to accounts, records, or data, and to verify decision-making and confirm actions. This paper presents a systematic review of current methodologies that utilize facial recognition, voice biometrics, and other data for identity verification in financial institutions. We explore the effectiveness and challenges associated with these approaches, highlighting recent developments in AI-driven models, deep learning architectures, and statistical techniques. In addition, we discuss the integration of multimodal biometric data and the decision and access systems that are developed for MFA approaches to improve security and accuracy. This review offers insights into the future of biometric identity verification in the financial sector. Our findings suggest that the integration of multimodal data in financial applications could serve as a valuable avenue for future research and practical applications. In addition, investigating the role of biometric authentication in back-end systems is an important area worth further exploration.
|
|
14:15-14:30, Paper We-S3-T12.2 | |
BiFTVis: A Bidirectional Feature-Tracking Method for Visual Analytics of Flow Fields (I) |
|
Bi, Chongke | Tianjin University |
Gao, Xin | Tianjin University |
Pan, Peiru | Tianjin University |
Liu, Le | Northwestern Polytechnical University |
Deng, Liang | China Aerodynamics Research and Development Center |
Wang, Fang | China Aerodynamics Research and Development Center |
Keywords: Information Visualization, Visual Analytics/Communication
Abstract: Accurate analysis of time-dependent flow fields generated by numerical simulations requires effective interpretation of temporal information. Visualizations offer an exceptional ability to convey complex data and have been widely used for such analysis. However, current research mainly focuses on flow features at individual time steps, lacking exploration of how these features evolve over time, leading to insufficient investigation and understanding of fluid motion laws. To overcome this limitation, this study proposes BiFTVis, an interactive visualization framework that enhances the complete flow field analysis pipeline. The framework includes a volume rendering module that represents flow features extracted using the Q criterion method. An additional forward tracking module is provided to track feature events by utilizing a graph optimization-based feature tracking algorithm. A timeline-based visual encoding method has been designed to convey multiple feature events along the time axis, facilitating the examination of events in specific time periods. Furthermore, a novel radial layout glyph along with a Directed Acyclic Graph has been designed for encoding multi-facet feature attributes, enabling fast identification of various feature types and reverse tracking analysis of feature events in a backward tracking module. BiFTVis has been evaluated through a case study, demonstrating its effectiveness in promoting new insights into potential causes of feature events and avoiding certain feature event occurrences, ultimately enhancing flow field analysis.
|
|
14:30-14:45, Paper We-S3-T12.3 | |
Reinforcement Learning Approach for On-Ramp Exit Considering Vehicle Trajectories and Tasks (I) |
|
Wei, Wenyuan | CATARC Automotive Technology (Shanghai) Co., Ltd |
Zhang, Hao | TIANJIN UNIVERSITY of TECHNOLOGY, China Coal Yongcheng Energy De |
Zhao, Shuai | Zhongqi Zhilian Technology Co. Ltd |
Yang, Lu | Tianjin University of Technology |
Bi, Chongke | Tianjin University |
Wang, Yiquan | Tianjin University of Technology |
Tan, Yansong | Tianjin University of Technology |
Keywords: Networking and Decision-Making, Intelligence Interaction, Environmental Sensing,
Abstract: To address the challenges of accurately predicting vehicle trajectories and prioritizing driving tasks in complex situations such as lane changing or highway ramp merging for autonomous vehicles, this paper introduces a deep reinforcement learning (DRL) merging control method called DRLI-P (DRL for Trajectory Prediction and Task Importance Network Fusion). DRLI-P integrates an LSTM-based vehicle trajectory prediction network with a rule-based task importance network (TIN). The TIN assesses the importance of vehicle action rules and complex driving tasks, alleviating problems associated with sparse reward distribution in DRL and improving sampling efficiency. The LSTM uses historical driving data to predict vehicle trajectories and constructs a state space to mitigate slow training speeds and reduced sensitivity to single state parameter changes due to high state dimensionality in multi-vehicle scenarios. A multi-category weighted reward function is developed that focuses on critical driving features such as target distance, vehicle motion information, and trajectory predictions. The proposed merging control method is applied to three leading DRL algorithms: DDPG, TD3, and SAC, followed by simulation experiments in the CARLA environment. The results show that the DRLI-P method significantly improves the convergence speed and performance of all three algorithms, with the most notable improvement seen in the SAC algorithm, thereby increasing the safety, efficiency, and convenience of DRL algorithms for merging control.
|
|
14:45-15:00, Paper We-S3-T12.4 | |
Predicting Signals for Algorithmic Cryptocurrency Trading: A Hybrid Convolutional Neural Network – Gated Recurrent Unit (CNN-GRU) Architecture (I) |
|
Le, Thanh-Nhan | University of Technology Sydney |
Anaissi, Ali | University of Technology Sydney |
Huang, Weidong | University of Technology Sydney |
Hua, Jie | Shaoyang University |
Keywords: Medical Informatics, Visual Analytics/Communication
Abstract: This paper proposes a hybrid Convolutional Neural Network – Gated Recurrent Unit (CNN-GRU) architecture for producing algorithmic trading signals for the Binance Coin (BNB) cryptocurrency dataset. It uses the standalone CNN and GRU models as benchmarks for comparing both classification and trading performance. Results show that the classification performance of CNN-GRU is subpar compared to that of the standalone models; however, the model achieves the highest mean trading performance. Although the outcomes are financially promising, the paper has not explored algorithmic trading exhaustively, so the results are open to further improvement. Possible future work includes employing other methods for improving imbalanced classification, more feature engineering and testing with different timeframes, and a more involved approach to feature explainability.
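The conv-then-recurrent pattern behind a CNN-GRU hybrid can be sketched in plain NumPy: a 1-D convolution extracts local patterns from a feature series, and a GRU summarizes them over time. Shapes, weights, and the feature layout below are hypothetical, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_valid(x, kernels):
    """x: (T, C_in); kernels: (C_out, K, C_in). Valid 1-D convolution
    over time (the CNN front-end extracting local patterns), + ReLU."""
    c_out, k, _ = kernels.shape
    t_out = x.shape[0] - k + 1
    y = np.empty((t_out, c_out))
    for t in range(t_out):
        window = x[t:t + k]                                   # (K, C_in)
        y[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return np.maximum(y, 0.0)

def gru_forward(x, h0, Wz, Uz, Wr, Ur, Wh, Uh):
    """Plain GRU recurrence over the conv features (biases omitted
    for brevity); returns the final hidden state."""
    h = h0
    for t in range(x.shape[0]):
        z = 1.0 / (1.0 + np.exp(-(Wz @ x[t] + Uz @ h)))       # update gate
        r = 1.0 / (1.0 + np.exp(-(Wr @ x[t] + Ur @ h)))       # reset gate
        h_tilde = np.tanh(Wh @ x[t] + Uh @ (r * h))           # candidate
        h = (1.0 - z) * h + z * h_tilde
    return h

T, C_in, C_out, H = 32, 5, 8, 16          # hypothetical shapes
x = rng.standard_normal((T, C_in))        # e.g. OHLCV features per bar
feats = conv1d_valid(x, 0.1 * rng.standard_normal((C_out, 3, C_in)))
h = gru_forward(feats, np.zeros(H),
                *(0.1 * rng.standard_normal((H, d))
                  for d in (C_out, H, C_out, H, C_out, H)))
```

A final dense layer over `h` would emit the buy/hold/sell signal; in practice this would be written with a deep-learning framework, but the data flow is the same.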
|
|
15:00-15:15, Paper We-S3-T12.5 | |
Advances in AI-Driven Diagnostics for Traditional Chinese Medicine: A Review of Tongue, Face, and Pulse Analysis (I) |
|
Zhang, Siqi | University of Technology Sydney |
Dong, Alice Xiaodan | University of Technology Sydney |
Huang, Weidong | University of Technology Sydney |
Keywords: Information Visualization
Abstract: This paper provides a systematic review of how artificial intelligence (AI) techniques have been applied in the diagnosis of traditional Chinese medicine (TCM). It highlights future directions for integrating the three key diagnostic indicators, tongue morphology, facial complexion, and pulse waveforms, into a unified framework. We found that existing studies rarely address the fundamental tension between TCM’s dialectical reasoning and the data-driven optimization strategies typical of computer vision models. Beyond reviewing current computer vision applications in tongue, facial, and pulse diagnosis, this paper also highlights the deeper epistemological divide between TCM and AI methodologies.
|
|
15:15-15:30, Paper We-S3-T12.6 | |
Generative Street-View Using Satellite Images with Hallucination Reduction Via Semantic Constraining (I) |
|
Esan, Oluwatoni | Cranfield University |
Gugushvili, Mariam | Cranfield University |
Konain, Jean | Cranfield University |
Hanish, Uppal | Cranfield University |
Thomas, Prosser | Cranfield University |
Qiu, Minqing | Cranfield University |
Wisniewski, Mariusz | Cranfield University |
Xing, Yang | Cranfield University |
Guo, Weisi | Cranfield University |
Keywords: Information Visualization, Telepresence, Environmental Sensing,
Abstract: Autonomous navigation requires training data from diverse transport settings. Accurate street/ground-level representation is important for training autonomous driving systems and supports tourism planning, environmental protection, and a wide range of other sectors. Many parts of the inhabited world and most of the uninhabited world lack street-view imagery. Current street-image generation can transform satellite imagery into synthetic 3D images, but with a high level of hallucination. Here, we develop a Neural Gazetteer that integrates semantic narrative data (e.g., review comments and place attributes) to reduce hallucination in generative street-view images. Our novel workflow uses satellite imagery to extract a geometry projection of the area and then integrates semantic narrative data into a diffusion model to generate realistic street views. We perform a wide range of comparisons with ground truth for urban and rural areas to identify the performance of our approach at both the feature scale and the human-perception semantic scale.
|
|
We-S3-T13 |
Room 0.96 |
Artificial Intelligence of Things |
Regular Papers - Cybernetics |
Chair: Hu, Yue | Beijing University of Posts and Telecommunications |
Co-Chair: Roldán-Gómez, José | University of Zaragoza |
|
14:00-14:15, Paper We-S3-T13.1 | |
StrawDet-YOLO: A Real-Time and High Precision Object Detection for Strawberry |
|
Hu, Yue | Beijing University of Posts and Telecommunications |
Jiang, Xiankun | Beijing University of Posts and Telecommunications |
Guan, Jianfeng | Beijing University of Posts and Telecommunications |
Keywords: AIoT, AI and Applications, Application of Artificial Intelligence
Abstract: Accurate detection of strawberry diseases is critical for ensuring yield stability and fruit quality, yet existing detection methods struggle with fine-grained symptom variations and complex field conditions. To address these challenges, we propose StrawDet-YOLO, a lightweight yet high-precision detection model based on YOLOv8, tailored for strawberry disease recognition in real-world agricultural environments. First, we construct Strawberry-12, a new benchmark dataset containing 3,906 images and 16,107 annotations across 12 categories, including diseases, pests, and nutrient deficiencies. Second, we introduce the BFSA (Bilinear Fused Synergic Attention) module for refined spatial-channel attention fusion and the R-SCConv (Residual Spatial and Channel Convolution) module for robust multi-scale feature reconstruction, improving the model’s sensitivity to subtle lesions. Finally, we design a gradient-aware loss function, GWIoU (Gradient-Weighted IoU loss), to dynamically emphasize hard samples and enhance localization performance. Extensive experiments demonstrate that StrawDet-YOLO achieves a 7.7% precision improvement and superior mAP compared to YOLOv8n, while maintaining real-time inference at 0.4 ms per image. These results validate its practicality for deployment in smart farming and disease monitoring applications.
|
|
14:15-14:30, Paper We-S3-T13.2 | |
Efficient Cattle Tracking in Aerial Videos Using YOLO11n and Lightweight Algorithms |
|
de Holanda Leal, Ismael | Universidade Federal Do Piauí |
Ramos Gonçalves, Allan Jheyson | Universidade Federal Do Piauí |
da Rocha, Mauricio Benjamin | Universidade Federal Do Piauí |
Silva, Romuere Rodrigues Veloso e | Federal University of Piauí |
Rabelo, Ricardo A. L. | Federal University of Piauí |
Keywords: AIoT, AI and Applications, Application of Artificial Intelligence
Abstract: This paper presents a system for cattle detection and counting in aerial videos, integrating the YOLO11n object detection model with three tracking algorithms: Euclidean Distance, ByteTrack, and DeepSORT. The aim is to evaluate and compare these approaches’ performance and computational efficiency in real UAV-based monitoring scenarios. The YOLO11n model achieved high detection performance, with 86.5% precision, 97% recall, an F1-score of 93%, and a mAP@0.50 of 95.5%. Tracking performance was assessed using the MOTA, HOTA, and IDF1 metrics across four distinct videos. The Euclidean Distance-based approach, despite its simplicity, demonstrated competitive performance in all metrics, with HOTA consistently above 0.89 and IDF1 ranging from 0.857 to 0.923, highlighting its robustness and consistency. These results indicate that lightweight tracking algorithms are not only viable but, in some scenarios, preferable for real-time cattle monitoring using UAVs.
|
|
14:30-14:45, Paper We-S3-T13.3 | |
AIoT-Based Mobile Edge Computing Architecture for Remote Environments |
|
Ramos Gonçalves, Allan Jheyson | Universidade Federal Do Piauí |
de Holanda Leal, Ismael | Universidade Federal Do Piauí |
Leão, Erico | Federal University of Piauí |
Rabelo, Ricardo A. L. | Federal University of Piauí |
Keywords: AIoT, Application of Artificial Intelligence, Cloud, IoT, and Robotics Integration
Abstract: Artificial Intelligence of Things (AIoT) represents a technological revolution that integrates the processing of Artificial Intelligence (AI) with the Internet of Things (IoT). This work proposes an AIoT-based Mobile Edge Computing Architecture that enables the decentralization of on-demand decision-making in remote environments through hierarchical network management, emphasizing the edge and fog layers. The architecture applies AI to an airborne IoT network infrastructure to increase the efficiency of applications composed of low-power computing devices in terms of processing, communication, and energy, directing the appropriate use of AI models in a decentralized, on-demand manner to improve latency, bandwidth consumption, and energy usage. The study showed that the NCNN format offers the best energy efficiency and processing speed at the edge layer, processing up to 219 frames in real time with only 3.57 mAh and 80 ms per image. While fog computing reduced memory usage, it increased latency at distances over 40 meters, highlighting the need for hybrid strategies.
|
|
14:45-15:00, Paper We-S3-T13.4 | |
Adaptive Federated Learning-Based Architecture for Intrusion Detection in IoT/IIoT Environments |
|
García-Sáez, Luis Miguel | University of Castilla-La Mancha |
Ruiz-Villafranca, Sergio | University of Castilla-La Mancha |
Roldán-Gómez, José | University of Zaragoza |
Carrillo-Mondéjar, Javier | University of Zaragoza |
Martínez, José Luis | University of Castilla-La Mancha |
Keywords: AIoT, Cloud, IoT, and Robotics Integration, Machine Learning
Abstract: The rapid expansion of Internet of Things (IoT) and Industrial Internet of Things (IIoT) environments has led to an increase in the number of attacks and risks in these environments. This presents new cybersecurity challenges that require more advanced intrusion detection systems (IDS). However, IDS based on centralised Machine Learning (ML) face problems of scalability, latency, and privacy. In this context, Federated Learning (FL) offers a decentralised approach that allows multiple nodes to train models collaboratively without exposing sensitive data. This work presents a federated IDS tailored for IoT/IIoT environments and introduces FedWLA, an aggregation strategy that dynamically weights updates according to the quality and uncertainty of local data. The proposed architecture is evaluated on several cybersecurity-oriented IoT/IIoT traffic datasets widely used in these environments. It shows comparable and even superior performance to centralised methods, with an average F1-Score between 0.98 and 0.99 across the tests performed. Moreover, the proposed FedWLA strategy consistently outperforms other federated aggregation approaches, such as FedAvg and FedProx, particularly in heterogeneous scenarios. These results demonstrate the capability and potential of FL in intrusion detection, effectively leveraging the scalability and privacy advantages it offers.
|
|
15:00-15:15, Paper We-S3-T13.5 | |
FLIFRA: Hybrid Data Poisoning Attack Detection in Federated Learning for IoT Security |
|
Anley, Mulualem Bitew | University of Milan |
Genovese, Angelo | Università Degli Studi Di Milano |
Tesema, Tibebe Beshah | Addis Ababa University |
Piuri, Vincenzo | Università Degli Studi Di Milano |
Keywords: AIoT, Computational Intelligence
Abstract: The rapid expansion of IoT devices has transformed numerous industries by enabling extensive data collection and real-time analytics. Federated Learning (FL) offers a decentralized model training paradigm that ensures data privacy, making it particularly suitable for IoT environments. Yet, it remains vulnerable to poisoning attacks that can severely compromise model integrity, wherein malicious clients compromise the global model by injecting poisoned updates. Existing defenses, which focus primarily on global model performance, often fail to effectively integrate local anomaly detection with global weighting mechanisms, thus limiting their efficacy against such threats. Addressing this research gap, we propose FLIFRA (Federated Learning Isolation Forest with Robust Aggregation), a hybrid defense framework that combines client-side anomaly detection using Isolation Forest (iForest) with dynamic reputation-based robust aggregation at the server. This dual-layer approach filters out malicious updates before aggregation and adjusts client reputations to mitigate adversarial influence. Our evaluation of three cybersecurity datasets (CIC-IDS2018, BoT-IoT, and UNSW-NB15) under various intensities of poisoning (10%, 20%, 30%, and 40%) demonstrates that the proposed method outperforms the traditional aggregation schemes of FedAvg, Krum, Trimmed Mean, DRRA, and WeiDetect in the literature. In particular, our framework achieves higher detection accuracy, faster convergence, and improved stability, even in highly heterogeneous data environments.
|
|
15:15-15:30, Paper We-S3-T13.6 | |
ActionFi: Human Action Recognition Via Multimodal Feature Fusion of Restructured CSI and Optical Flow |
|
Zhang, Zheng | Inner Mongolia University |
Zhang, Junxing | Inner Mongolia University |
Keywords: AIoT, Multimedia Computation, Deep Learning
Abstract: WiFi sensing has attracted increasing attention from researchers due to its contactless and device-free characteristics. Multimodal methods based on WiFi CSI and images have achieved significant results in human action recognition. However, several issues remain to be addressed: the contribution of the subcarrier dimension in CSI is often ignored, the input data used for recognition has extremely asymmetric subcarrier and sample dimensions, and images contain many background factors unrelated to the recognition subject while being prone to leaking user privacy. To address these issues, we propose a novel multimodal action recognition method called ActionFi, which restructures the CSI into data blocks with an equal number of subcarriers and samples. Using optical flow images instead of traditional photos in multimodal learning eliminates irrelevant background factors while preserving essential action information. We use a dual-stream network with different types of CNNs as the main body to extract features from the restructured CSI and the optical flow images, and fuse the two sets of features to achieve accurate human action recognition. We conducted extensive experiments on the large open-source dataset MM-Fi; the results show that ActionFi achieves state-of-the-art performance on the evaluation metrics.
|
|
We-S3-T14 |
Room 0.97 |
Intelligent Medical Mechatronics Systems and Applications & Ethical and Normative Decision-Making in Neurotechnology: Safeguarding Human Autonomy and Collective Agency |
Special Sessions: HMS |
Chair: Kuo, Chung-Hsien | National Taiwan University |
Co-Chair: Wood, Guilherme | University of Graz |
Organizer: Kuo, Chung-Hsien | National Taiwan University |
Organizer: Liu, Yi-Hung | National Yang Ming Chiao Tung University |
|
14:00-14:15, Paper We-S3-T14.1 | |
Simulation and Control of an Exoskeleton for Lower Limbs Rehabilitation (I) |
|
Frascella, Simona | Polytechnic University of Bari |
Roccotelli, Michele | Polytechnic University of Bari |
Fanti, Maria Pia | Polytechnic of Bari, Italy |
Keywords: Human Enhancements, Human-Collaborative Robotics, Assistive Technology
Abstract: Being able to walk is one of the most important human abilities. With the increase in life expectancy, the disability rate is also rising, and research is extensively focusing on robotic devices to address this issue. These devices are now being applied in various fields for the assistance and rehabilitation of patients with different types of motor impairments. The aim of this article is to develop a lower limb exoskeleton model controlled using standard regulators in Simulink. After analyzing the construction of the model, the results of various simulations will be presented based on different desired response types and compared with the state of the art.
|
|
14:15-14:30, Paper We-S3-T14.2 | |
Implementation of a Location Feedback System for CPR Training with Resusci Anne (I) |
|
Zheng, Shun-Chia | Chang Gung University |
Lin, Wen-Yen | Chang Gung University |
Keywords: Medical Informatics, Human-Machine Interaction, User Interface Design
Abstract: An accelerometer-based pushing-location feedback mechanism for chest compression in CPR training was proposed previously. In this article, a system implementation of the proposed detection mechanism in the Resusci Anne manikin for CPR training is presented. The implementation not only realizes the previously proposed mechanism, so that feedback can be shown through graphical interfaces on a nearby mobile device in real time, but also resolves several issues, such as the position offset of the training manikin caused by the compression activities. With this implementation, the proposed mechanism is ready to be commercialized as a location feedback device for CPR training that goes beyond the basic requirements of the AHA (American Heart Association), so that trainees who take CPR training courses with this device have a higher chance of learning to deliver high-quality CPR when facing a cardiac emergency.
|
|
14:30-14:45, Paper We-S3-T14.3 | |
An EOG Guided P300 Flicker Display Paradigm for Improving the Driving Intuition of BCI Wheelchair Users (I) |
|
Nguyen, Phuc Thanh-Thien | National Taiwan University |
Nguyen, Dai-Dong | National Taiwan University |
Lin, Yi-Tseng | National Taiwan University of Science and Technology |
Kuo, Chung-Hsien | National Taiwan University |
Keywords: Assistive Technology, Human-Machine Interface, Brain-Computer Interfaces
Abstract: Most existing P300 brain-computer interface (BCI) based brain-controlled wheelchairs (BCWs) position visual stimulus panels in a manner that requires users to frequently shift their gaze between the control panel and the surrounding environment. This constant gaze shifting can reduce intuitiveness and compromise safety. To address this issue, this study introduces a translucent visual stimuli control panel placed directly within the user's field of vision. Visual stimuli representing movement commands (forward, backward, left, right, and stop) are projected onto this panel using a micro-projector positioned 35 cm away from the user. Notably, the location of these flashing stimuli dynamically aligns with the user’s eye gaze, which is detected through electrooculographic (EOG) signals. Consequently, the control interface naturally follows the user’s visual attention, enhancing both intuitiveness and operational safety. Classification of the P300 signals for wheelchair control is performed using a support vector machine (SVM), while a canonical correlation analysis spatial filter enhances the information transfer rate. Performance assessments involved 10 healthy subjects navigating “U”-shaped (5.7 m) and “S”-shaped (12.4 m) wheelchair trajectories. Experimental results confirmed that the eye-gaze-guided BCI approach significantly improves accuracy and user experience compared to traditional non-EOG-guided methods.
|
|
14:45-15:00, Paper We-S3-T14.4 | |
Tackling Neuroenchantment: A Multiperspective Approach (I) |
|
Wood, Guilherme | University of Graz |
Dolezal, Eugen | University of Vienna |
Berger, Lisa | University of Graz |
Zandonella, Petra | University of Graz |
Gremsl, Thomas | University of Graz |
Staudegger, Elisabeth | University of Graz |
Keywords: Brain-Computer Interfaces, Systems Safety and Security, Ethics of AI and Pervasive Systems
Abstract: We propose a unified framework to chart the different reasons for neurotechnology use and aid ethical decision making. Neurotechnologies are methods to record, analyze, and manipulate brain activity. They are essential tools for diagnostics and rehabilitation but are increasingly finding their way into the field of enhancement, which entails several challenges and risks, especially since the borders between rehabilitation and enhancement are not clear. Neuroenchantment, the persuasive power of neurotechnologies, can propagate wrong beliefs and hamper critical thinking, leading to inappropriate risk judgement. We propose a framework of two axes, named “discovery” and “recovery”, to depict specific technologies and allow comparisons among technologies at individual, social, and cultural levels. Recovery describes rehabilitative reasons for use and discovery describes enhancement-related reasons, whereby both concepts are associated with one another. The overlap of the two can result in a gray zone of ethical decision making. This danger can be better identified with our framework, offering implications for medical, therapeutic, and regulatory efforts.
|
|
15:00-15:15, Paper We-S3-T14.5 | |
A Wearable Low-Cost Photoplethysmography Acquisition Device for Continuous Heart Rate Monitoring |
|
Bhongade, Amit | Indian Institute of Technology Delhi |
Sharma, Bhanuj | Amity University |
Gandhi, Tapan Kumar | Indian Institute of Technology Delhi |
Keywords: Assistive Technology, Human-Machine Interface, Wearable Computing
Abstract: Photoplethysmography (PPG) is a non-invasive technique for detecting heart rate (HR) but is often hampered by noise, which affects its reliability in cardiac monitoring applications like heart rate variability (HRV) and blood pressure measurement. To overcome this, a Wearable LOw-cost PPG acquisition deVicE (WeLOVE) was developed, and variational mode decomposition combined with principal component analysis (VMD-PCA) is proposed to estimate HR and other cardiovascular parameters from PPG signals. The accuracy of WeLOVE was validated against the Equivital wireless physiological monitoring system (EQO2) under various breathing conditions. The mean absolute error (MAE) of HR between the PPG signal and ECG signal was 7.8±3.25 bpm during normal breathing and 8.86±4.45 bpm during slow breathing across subjects. The root mean square error (RMSE) was 13.46±5.42 bpm for normal breathing and 17.16±8.34 bpm for slow breathing across subjects. The average HR measured by ECG was 71.77±8.15 bpm for normal breathing and 68.85±9.5 bpm for slow breathing, while PPG readings were 65.61±6.34 bpm and 65.14±7.64 bpm, respectively, across subjects. These results demonstrate that the WeLOVE device, combined with the VMD-PCA method, offers significantly accurate heart rate (HR) measurements while effectively mitigating motion artifacts. The robustness of this approach ensures reliable performance even in dynamic conditions, making it particularly well-suited for real-time applications in cardiac monitoring. This capability is essential for continuous health tracking, wearable medical devices, and remote patient monitoring, where precise and artifact-resistant HR estimation is crucial for timely and effective clinical decision-making.
|
|
15:15-15:30, Paper We-S3-T14.6 | |
Visual-Textual Embedding Fusion for Lung CT Retrieval: A Deep Learning Approach to Semantic Gap Reduction |
|
Hattab, Mahbouba | ISITCom, University of Sousse, Tunisia |
Maalel, Ahmed | University of Manouba, National School of Computer Sciences, RIA |
Keywords: Medical Informatics, Information Visualization
Abstract: In recent years, Content-Based Medical Image Retrieval (CBMIR) has gained prominence as a tool for clinical decision support by enabling radiologists to retrieve visually similar cases. However, the persistent semantic gap between low-level visual features and high-level clinical semantics limits retrieval effectiveness. We propose a multi-modal CBMIR framework that reduces the semantic gap by fusing visual and textual features from the LIDC-IDRI dataset. Our dual-branch architecture combines a CNN for lung CT image features with a transformer encoder for radiology reports, integrating both in a shared embedding space via deep metric learning. Retrieval is performed using FAISS, with results showing improved Precision@K and mAP over unimodal methods. The proposed fusion strategy enhances both retrieval accuracy and interpretability in lung nodule assessment.
|
|
We-S3-BMI.WS |
Room 0.49&0.50 |
BMI Workshop - Paper Session 4: Advances in BCIs 2 |
BMI Workshop |
Chair: A. P., Vinod | Singapore Institute of Technology |
|
14:00-14:15, Paper We-S3-BMI.WS.1 | |
Quantifying the Risk of Private Information Leakage in the Metaverse with EEG-Instrumented Virtual Reality Headsets (I) |
|
Jaberi, Mina | Institut National De La Recherche Scientifique |
Bouchard, Stéphane | Université Du Québec En Outaouais |
Falk, Tiago H. | INRS-EMT |
Keywords: Passive BMIs, BMI Emerging Applications, Other Neurotechnology and Brain-Related Topics
Abstract: As virtual reality (VR) technologies become more immersive and metaverse applications burgeon, modern headsets are increasingly becoming equipped with sensors capable of capturing multiple neurophysiological signals. While these signals can be used to measure, in real time, quality-of-experience metrics that can enhance interactivity and user engagement, they may also introduce novel privacy risks by unintentionally leaking sensitive personal attributes. In this paper, we explore the extent to which electroencephalography (EEG) signals, recorded during an immersive VR memory task, can be used to infer users' private information, such as age, biological sex, and identity. We employ both classical machine learning models with hand-crafted features and end-to-end deep learning approaches. Our findings demonstrate that EEG-based features can, indeed, leak information about biological sex, age, and user identity, with end-to-end models obtaining the best performance. Feature importance ranking and deep neural network saliency maps were used to provide explainability of the neural patterns used by the models. We conclude with recommendations on how these findings can also be used to help secure future metaverse applications.
|
|
14:15-14:30, Paper We-S3-BMI.WS.2 | |
Enhancing Cross-Task Learning-Based Multiclass Motor Imagery Classification in Brain Computer Interfaces Using Conditional Domain Adversarial Network |
|
K M, Devika | Singapore Institute of Technology |
Parashiva, Praveen Kumar | Singapore Institute of Technology |
A. P., Vinod | Singapore Institute of Technology |
Keywords: Active BMIs, BMI Emerging Applications, Other Neurotechnology and Brain-Related Topics
Abstract: Motor imagery (MI)-based Brain-Computer Interface (BCI) systems decode the neuronal patterns during motor movement imagination tasks and induce neuroplasticity to assist patients with stroke rehabilitation. Deep learning methods achieve better generalization but require a large dataset, for which longer calibration sessions are needed. Transfer learning methods learn the shift in data distribution from the source to the target domain without the need for a large dataset. Existing cross-subject and cross-session transfer learning methods learn the shift in input data distribution alone, ignoring the label distribution. This work proposes a cross-task transfer learning approach to learn new MI tasks by realigning the parameters of the deep learning model. The parameters of a pre-trained EEGNet model, trained on binary MI tasks, are re-aligned to decode a new MI task using a Conditional Domain Adversarial Network (CDAN), while improving generalizability across existing and new MI tasks. The proposed method is evaluated on the BCI Competition IV 2a dataset and compared with the popular fine-tuning approach for transfer learning in deep learning. The classification results achieved using the proposed CDAN-based cross-task transfer learning approach generalize better than conventional training and fine-tuning. The proposed method achieved an improvement in classification accuracy of around 6% for learning a third MI task from a pre-trained binary MI task classifier using EEGNet. The proposed CDAN-based cross-task transfer learning approach can significantly reduce the calibration sessions needed to learn new MI tasks, as it learns the shift in both input and output label distributions due to new MI tasks. Further, the proposed method can be scaled to learn dexterous MI tasks and kinematics-related information from MI tasks.
|
|
14:30-14:45, Paper We-S3-BMI.WS.3 | |
EEGScaler: A Deep Learning Network to Scale EEG Electrode and Samples for Hand Motor Imagery Speed Decoding |
|
Parashiva, Praveen Kumar | Singapore Institute of Technology |
Gangadharan K, Sagila | Singapore Institute of Technology |
A. P., Vinod | Singapore Institute of Technology |
Keywords: BMI Emerging Applications, Passive BMIs, Active BMIs
Abstract: Motor Imagery (MI)-based Brain-Computer Interface (MI-BCI) systems induce neuroplasticity, promoting rehabilitation in stroke-affected patients. Decoding kinematics information such as speed from unilateral hand MI tasks provides more natural control of BCI systems. However, decoding speed-related information from unilateral MI tasks is challenging due to the significant spatial overlap of neuronal sources and the inherently low spatial resolution of EEG. To address this, we propose EEGScaler, an end-to-end deep learning framework designed to decode slow vs. fast MI tasks by adaptively scaling EEG samples and electrodes with high discriminative value. The work proposes electrode-scaling and sample-scaling blocks to learn the importance of electrodes (i.e., spatial) and samples (i.e., temporal) in decoding speed from MI tasks. Further, spatiotemporal features are extracted using temporal and depth-wise convolution filters. In this work, subject-independent data is used to learn the weights of EEGScaler, and subject-specific data is then used to fine-tune the weights of the proposed electrode-scaling and sample-scaling blocks. The proposed method is implemented on 14 healthy subjects’ EEG data to classify slow vs. fast MI tasks performed with their dominant hand. The cross-validated subject-specific classification accuracy achieved using the proposed method outperformed existing methods by ~7%. EEGScaler is a novel end-to-end learning model designed to assign importance to spatial and temporal information in EEG via electrode-scaling and sample-scaling blocks, respectively. Decoding kinematics such as the speed of MI tasks increases the degrees of freedom in BCI systems, paving the way for more intuitive and efficient neurorehabilitation applications. This advancement has the potential to improve motor rehabilitation strategies by enabling more precise and adaptive BCI-driven therapy tailored to individual recovery needs.
|
|
14:45-15:00, Paper We-S3-BMI.WS.4 | |
Learnable Frequency-Weighting Layer for Improving EEG-Based BCI Performance |
|
Ding, Tsujen | National Tsing Hua University |
Hsu, Hui-Yu | National Tsing Hua University |
Kuo, Po-Chih | National Tsing Hua University |
Zai-Fu, Yao | National Tsing Hua University |
Pan, Cheng-Yu | National Tsing Hua University |
Pan, Bo-Yu | Soochow University, Taiwan |
Keywords: BMI Emerging Applications, Passive BMIs, Other Neurotechnology and Brain-Related Topics
Abstract: This paper presents a learnable frequency-weighting layer for enhancing EEG-based Brain-Computer Interface (BCI) performance, particularly in higher-order cognitive tasks. The proposed layer, integrating wavelet transform with convolution operations, selectively emphasizes task-relevant frequency components while suppressing those associated with noise or unrelated brain activities. We evaluated our method on two datasets—a local dataset focusing on spatial perspective-taking tasks and the Cog-BCI dataset involving the N-back task, a standard paradigm for assessing working memory load. Empirical results indicate that incorporating the learnable frequency-weighting layer into commonly used deep learning models (EEGNet-V1, EEGNet-V4, ShallowConvNet, and DeepConvNet) consistently yields significant improvements in classification accuracy. Notably, our approach demonstrates effectiveness in tasks exhibiting similar brain-wave patterns, underscoring the adaptability of the method to a wide range of EEG-based BCI applications. Overall, this study provides a robust and generalizable approach to improving EEG-based BCI systems, enhancing their practicality in real-world cognitive applications.
|
|
15:00-15:15, Paper We-S3-BMI.WS.5 | |
Sense-It: Towards New Types of BCI-Based Sensorimotor Neurofeedback for Motor Rehabilitation (I) |
|
Yousefi, Mathilde | University of Lorraine |
Frey, Jérémy | Qualya |
Rimbert, Sébastien | INRIA |
Herrera Altamira, Gabriela | Université De Lorraine, LORIA |
Fleck, Stephanie | Université De Lorraine |
Keywords: BMI Emerging Applications, Active BMIs, Other Neurotechnology and Brain-Related Topics
Abstract: Combined with conventional rehabilitation of the upper limbs, kinesthetic motor imagery (KMI) is a promising direction in post-stroke neurorehabilitation. Unfortunately, KMI is a complex mental task, as it provides no feedback, making it impossible for people to self-assess, correct, and improve their performance and, consequently, their rehabilitation. To overcome this difficulty, we have designed Sense-IT, a deformable interface coupled with a gamified BCI that provides tangible multisensory feedback (visual and kinesthetic) to allow for a supportive and differentiated rehabilitation program according to patients' needs and preferences. This original double-blind, mixed-design study (N=36) compares the effects of three feedback modalities (visual, kinesthetic via Sense-IT, bimodal) on user experience and motor area stimulation. Although neurophysiological results did not reveal significant differences in contralateral motor cortex activation across the three feedback modalities, subjective measures highlighted the added value of tangible and multimodal feedback. Participants reported that the kinesthetic feedback provided by Sense-IT enhanced their engagement and task comprehension. Overall, these findings support the relevance of combining visual and kinesthetic neurofeedback to improve user experience and emphasize the importance of delivering reliable and meaningful feedback in BCI-based motor rehabilitation.
|
|
15:15-15:30, Paper We-S3-BMI.WS.6 | |
EEG-Based Neural Representation and Decoding of Imagined Phonemes |
|
Zhu, Ziyue | University College London |
Patel, Rishan | University College London |
Kelly, Michael-Merlin | University College London |
Garrison-Hooks, Emmanuel | University College London |
Cho, Youngjun | University College London |
Carlson, Tom | University College London |
Keywords: Active BMIs, BMI Emerging Applications, Other Neurotechnology and Brain-Related Topics
Abstract: Speech impairments caused by severe diseases and injuries are serious health problems. The deprivation of communication ability can significantly decrease the quality of daily life. Inner speech is a natural mental activity used by many healthy and disabled people with intact cognitive ability. With the help of Brain-Computer Interfaces (BCIs), imagined speech can be decoded as semantic output or downstream commands for external devices, showing great potential for intuitive neural interface control. While many works have demonstrated promising results with invasive brain recordings, it remains a challenge for non-invasive BCIs to realize reliable speech decoding due to the trade-off between safety and signal quality. In this study, we explored the feasibility of, and the neural mechanism behind, a non-invasive brain recording technique based on electroencephalogram (EEG) during speech imagery with 6 English phonemes. We found significant time-frequency representation differences between /b/ and /u:/ and showed the feasibility of pair-wise imagined phoneme classification with Filter Bank Common Spatial Pattern. We also demonstrated the EEG neural representation of phonemes in the latent space and how they are separated. Our results suggest that, for naive BCI users, subject-specific phoneme pairs yield the best performance. Finally, we discuss the challenges and potential optimization directions for future EEG-based speech BCIs.
|
|
We-S4-T1 |
Hall F |
Deep Learning & Representation Learning 2 |
Regular Papers - Cybernetics |
Chair: Wang, Yifan | Southwest Petroleum University |
Co-Chair: Cao, Helin | University of Bonn |
|
16:00-16:15, Paper We-S4-T1.1 | |
MMNet: A Multi-Scale Multimodal Model for End-To-End Grouping of Fragmented UI Elements |
|
Zhang, Liuzhou | Central China Normal University |
Wang, Yuanlei | Hunan University |
Zhao, Yuqi | Central China Normal University |
Tian, Shuangshuang | Central China Normal University |
Keywords: Representation Learning, Image Processing and Pattern Recognition, Application of Artificial Intelligence
Abstract: Graphical User Interface (GUI) designs often result in fragmented elements, leading to inefficient and redundant code when automatically converted. This paper presents MMNet, a novel end-to-end model for grouping these fragmented elements, leveraging multimodal feature representations and advanced retention mechanisms to improve grouping accuracy. MMNet uses UI sequence prediction, enhanced by large multimodal models, and a multi-scale retention mechanism to build a UI encoder. This approach captures temporal dependencies and multi-scale features, improving multimodal representation learning. To address the lack of fragmented UI element datasets, we constructed a new dataset and enriched its visual information using advanced multi-modal large models. Given the complex nature of UI design prototypes, it remains challenging for models to effectively learn the relationships between different modalities. We have adopted a multi-scale retention mechanism to further refine the relationship modeling between UI elements. Evaluated on our dataset of 71,851 UI elements, MMNet outperformed three state-of-the-art deep learning methods, demonstrating its effectiveness and innovation. The open-source code and datasets are available at https://anonymous.4open.science/r/MMNet-2343.
|
|
16:15-16:30, Paper We-S4-T1.2 | |
OC-SOP: Enhancing Vision-Based 3D Semantic Occupancy Prediction by Object-Centric Awareness |
|
Cao, Helin | University of Bonn |
Behnke, Sven | University of Bonn |
Keywords: Image Processing and Pattern Recognition, Representation Learning, Deep Learning
Abstract: Autonomous driving perception faces significant challenges due to occlusions and incomplete scene data in the environment. To overcome these issues, the task of semantic occupancy prediction (SOP) is proposed, which aims to jointly infer both the geometry and semantic labels of a scene from images. However, conventional camera-based methods typically treat all categories equally and primarily rely on local features, leading to suboptimal predictions—especially for dynamic foreground objects. To address this, we propose Object-Centric SOP (OC-SOP), a framework that integrates high-level object-centric cues extracted via a detection branch into the semantic occupancy prediction pipeline. This object-centric integration significantly enhances the prediction accuracy for foreground objects and achieves state-of-the-art performance among all categories on SemanticKITTI.
|
|
16:30-16:45, Paper We-S4-T1.3 | |
CF-ViT: Cross-Feature Vision Transformer for Improving Feature Learning on Tiny Datasets |
|
Meng, Chunlei | Fudan University |
Lin, Wei | Fudan University |
Yang, Jiacheng | Jihua Laboratory |
Liu, Yi | Fudan University |
Zhang, Hongda | Fudan University |
Chen, Yuning | Fudan University |
Liu, Bowen | Fudan University |
Zhou, Ziqing | Fudan University |
Ouyang, Chun | Fudan University |
Gan, Zhongxue | Fudan University |
Wu, Dunzhao | Jiangling Motors Corporation, Ltd |
Nie, Zhihua | Fudan University |
Keywords: Machine Vision, Image Processing and Pattern Recognition, Representation Learning
Abstract: Efficient feature learning is considered indispensable for maximizing the representation of scarce information in tiny datasets. However, existing methods are often unable to fully exploit local features and contextual dependencies when dealing with tiny datasets. To overcome this shortcoming, a Cross-Feature Vision Transformer (CF-ViT) was proposed, which decouples local feature refinement from global context modeling and leverages the complementary strengths of CNNs and Transformers. Specifically, a Cross-Scale Fusion (CSF) module was introduced to integrate features from multiple scales, ensuring that cross-scale information is globally embedded. In addition, a Feature Enhancement and Reorganization (FER) module was incorporated into CF-ViT, whereby Transformer outputs are reorganized into 2D feature maps for convolution-based detail enhancement to thoroughly exploit local information. Extensive experiments have demonstrated that CF-ViT consistently surpasses baselines across 4 tiny datasets, reaching a 96.87% (KSDD) Top-1 accuracy with only 29.19 million parameters and 2.67 billion FLOPs. Moreover, a Top-1 accuracy of 85.03% is attained on a real-world tiny dataset of wood surface defect detection, exceeding all baselines. These findings underscore the effectiveness and generalization capability of CF-ViT in capturing fine-grained local details and global context, offering a promising and deployable solution for vision tasks in tiny datasets.
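The FER module's reorganization of Transformer outputs into 2-D feature maps can be sketched shape-wise as below; this is an assumed reading of the abstract (the convolution-based enhancement itself is omitted and all names are hypothetical):

```python
import numpy as np

def tokens_to_feature_map(tokens, height, width):
    """Reorganize flattened Transformer tokens (B, N, C) into 2-D feature maps
    (B, C, H, W) so convolution-based detail enhancement can follow."""
    b, n, c = tokens.shape
    assert n == height * width, "token count must equal H * W"
    return tokens.reshape(b, height, width, c).transpose(0, 3, 1, 2)

x = np.arange(2 * 16 * 8, dtype=float).reshape(2, 16, 8)  # B=2, N=16, C=8
fmap = tokens_to_feature_map(x, 4, 4)                     # (2, 8, 4, 4)
```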
|
|
16:45-17:00, Paper We-S4-T1.4 | |
3A-YOLO: New Real-Time Object Detectors with Triple Discriminative Awareness and Coordinated Representations |
|
Wu, Xuecheng | Xi'an Jiaotong University |
Xue, Junxiao | Zhejiang Lab |
Fu, Liangyu | School of Software, Northwestern Polytechnical University |
Nie, Jiayu | Xi'an Jiaotong University |
Huang, Danlei | Xi’an Jiaotong University |
Yin, Xinyi | Zhengzhou University |
Hu, Tingqi | Zhengzhou University |
Keywords: Machine Vision, Neural Networks and their Applications, Deep Learning
Abstract: Recent research on general real-time object detectors (e.g., the YOLO series) has demonstrated the effectiveness of attention mechanisms for elevating model performance. Nevertheless, existing methods often neglect to deploy hierarchical attention mechanisms in a unified manner to construct a more discriminative YOLO head enriched with more useful intermediate features. To address these gaps, this work leverages multiple attention mechanisms to hierarchically enhance the triple discriminative awareness of the YOLO detection head and to complementarily learn coordinated intermediate representations, resulting in a new series of detectors, 3A-YOLO, for general scenarios. Specifically, taking YOLOv4 and YOLOv8 as baselines, we first propose a new detection head, denoted the TDA-YOLO Module, which enhances the representation learning capability of scale-awareness, spatial-awareness, and task-awareness in a unified manner. Secondly, we steer the intermediate features to jointly learn the inter-channel relationships and precise positional information. Finally, we perform neck network improvements followed by introducing various tricks to boost the adaptability of our 3A-YOLO. Extensive experiments across COCO and VOC benchmarks indicate the effectiveness of our detectors. Ablation studies provide further insights into the critical design choices driving the performance enhancements.
|
|
17:00-17:15, Paper We-S4-T1.5 | |
Cascade-YOLO: Enhancing Small Object Detection in Remote Sensing Imagery (I) |
|
Zeng, Xianting | Southwest Petroleum University |
Wang, Yifan | Southwest Petroleum University |
Zhou, Wenjun | Southwest Petroleum University |
Zhang, Quan | Southwest Petroleum University |
Liu, Yangyi | Intelligent Policing Key Laboratory of Sichuan Province |
Keywords: Deep Learning, Image Processing and Pattern Recognition, Neural Networks and their Applications
Abstract: Small object detection in remote sensing images suffers from fine detail loss in shallow backbones and semantic confusion in deep layers under complex backgrounds. To address this, we present Cascade-YOLO, a lightweight detector that fuses cascaded residual learning with spatial-context modeling. Its Cascade Branch Residual Module (CBRM) uses triple-branch, pinwheel-shaped convolutions with residual skips to expand local receptive fields while preserving feature diversity. Meanwhile, its Cascade Group Spatial Context-Aware Module (CGSCM) applies grouped operations to hierarchically aggregate spatial cues and suppress background interference. A joint Normalized Wasserstein Distance and CIoU loss further enhances its scale robustness. Compared to existing YOLO variants, Cascade-YOLO better retains small object details and reduces false positives in cluttered scenes. On the NWPU VHR-10 dataset, it attains 93.8% mAP50 with just 6.43M parameters and 18.9 GFLOPs—1.1% higher accuracy and 71% less computation than YOLOv11-m—and runs at 256 FPS on an NVIDIA RTX 4070, 39% faster than FFCA-YOLO. These results demonstrate its superior accuracy–efficiency trade-off for real-time remote sensing.
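The Normalized Wasserstein Distance term of the joint loss, as commonly defined for Gaussian box representations in the tiny-object-detection literature, can be sketched as follows (the constant `c` and the exact pairing with CIoU used by Cascade-YOLO are assumptions):

```python
import math

def wasserstein2(box_a, box_b):
    """Squared 2-Wasserstein distance between two boxes (cx, cy, w, h), each
    modeled as an axis-aligned 2D Gaussian N(center, diag((w/2)^2, (h/2)^2))."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    return ((ax - bx) ** 2 + (ay - by) ** 2
            + (aw / 2 - bw / 2) ** 2 + (ah / 2 - bh / 2) ** 2)

def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance in (0, 1]; c is a dataset-dependent
    scaling constant (the value here is illustrative)."""
    return math.exp(-math.sqrt(wasserstein2(box_a, box_b)) / c)
```

Unlike IoU-based terms, NWD stays informative for tiny boxes with little or no overlap, which is why it is often mixed with CIoU for scale robustness.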
|
|
We-S4-T2 |
Hall N |
Neural Networks and Their Applications 4 |
Regular Papers - Cybernetics |
Chair: Lemmel, Julian | TU Wien |
Co-Chair: Kita, Eisuke | Nagoya University |
|
16:00-16:15, Paper We-S4-T2.1 | |
A Novel Medical Image Reconstruction Framework for 3D Chest CT Based on a Single 2D Anteroposterior X-Ray Image |
|
Tang, Tao | Southwest University of Science and Technology |
Cai, Wenjie | University of Science and Technology of China |
Yang, Bin | Southwest University of Science and Technology |
Huang, Jun | Southwest University of Science and Technology |
Zhou, Ying | Mianyang Central Hospital |
Liu, Zhiqin | Southwest University of Science and Technology |
Wang, Qingfeng | Southwest University of Science and Technology |
Keywords: Neural Networks and their Applications, Image Processing and Pattern Recognition, Deep Learning
Abstract: X-rays are widely used in clinical practice due to their low radiation exposure and cost. However, 2D imaging can result in overlapping anatomical structures. In contrast, CT scans generate 3D images, effectively addressing this limitation. Nevertheless, CT scans also have drawbacks, such as high radiation, high cost, and the inability to be performed within ICU settings. In this paper, we propose the AP2CT-GAN framework, which aims to reconstruct chest CT images from a single anteroposterior chest X-ray. The framework incorporates the Feature Enhancement Connection (FEC) and the Feature Dimension Converter (FDC) to enhance critical features and capture global contextual information, along with a proposed Dual-Consistency Loss function to ensure that the reconstructed CT images maintain a high level of structural and textural consistency with the ground truth. Experimental results demonstrate that AP2CT-GAN outperforms existing methods, offering a low-radiation, low-cost CT imaging solution with valuable potential applications in resource-limited regions and ICU settings.
|
|
16:15-16:30, Paper We-S4-T2.2 | |
Discrimination of Brown Spot Leaf Disease Based on Color Extraction from Leaf Image |
|
Yuasa, Nao | Nagoya University |
Minamisawa, Daisuke | Nagoya University |
Takai, Ayana | Nagoya University |
Oirase, Masaya | Nagoya University |
Kita, Eisuke | Nagoya University |
Keywords: Neural Networks and their Applications, Image Processing and Pattern Recognition, Hybrid Models of Neural Networks, Fuzzy Systems, and Evolutionary Computing
Abstract: Due to the development of the convolutional neural network technique, image discrimination techniques are very powerful tools in several applications. However, it takes a relatively long time to train a convolutional neural network, and the model size is sometimes large. In a previous study, the authors presented an image discrimination technique using the Red, Green and Blue (RGB) frequency distribution extracted from the images, instead of the images themselves. Using the RGB frequency distribution makes it possible to define the discrimination model with a decision tree model rather than a neural network, so the model size is small and the training time is relatively short. However, in addition to RGB data, HSV data can also be extracted from images. The aim of this study is to discuss the effectiveness of HSV data for image discrimination. The discrimination of brown spot leaf disease on tomato leaves is considered as the numerical example. For evaluating the HSV frequency distribution as the explanatory variable, three algorithms are presented and compared. HSV frequency distributions are evaluated from the images and then used as the explanatory variables for the discrimination problem. The objective variable is the health status of the leaf: ``health'' or ``sick''. Random Forest, CatBoost, and XGBoost are used for defining the discrimination model. Numerical results revealed that the use of HSV data, instead of RGB data, improves the discrimination accuracy.
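As a toy illustration of the HSV frequency-distribution features described above (synthetic pixel patches; the bin count and all names are assumptions, not the paper's settings), a histogram extractor might look like:

```python
import colorsys

def hsv_histogram(pixels, bins=16):
    """Build H, S, V frequency distributions (normalized histograms) from a
    list of (r, g, b) pixels in 0..255; returns a flat vector of 3*bins."""
    hist = [0.0] * (3 * bins)
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        for channel, value in enumerate((h, s, v)):
            idx = min(int(value * bins), bins - 1)  # clamp value == 1.0
            hist[channel * bins + idx] += 1
    n = len(pixels)
    return [count / n for count in hist]

# Toy example: a mostly-green "leaf" patch vs. a patch with brown "spots".
green = [(30, 160, 40)] * 50
brown = [(120, 70, 30)] * 50
features_healthy = hsv_histogram(green)
features_sick = hsv_histogram(brown + green[:10])
```

A tree model such as Random Forest would then be trained on such feature vectors, one per leaf image.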
|
|
16:30-16:45, Paper We-S4-T2.3 | |
Online Fine-Tuning of Carbon Emission Predictions Using Real-Time Recurrent Learning for State Space Models |
|
Lemmel, Julian | TU Wien |
Kranzl, Manuel | Datenvorsprung GmbH |
Lamine, Adam | PyrosomaAI |
Neubauer, Philipp | Pyrosoma AI |
Grosu, Radu | TUW |
Neubauer, Sophie | Pyrosoma AI |
Keywords: Neural Networks and their Applications, Machine Learning, Deep Learning
Abstract: This paper introduces a new approach for fine-tuning the predictions of structured state space models (SSMs) at inference time using real-time recurrent learning. While SSMs are known for their efficiency and long-range modeling capabilities, they are typically trained offline and remain static during deployment. Our method enables online adaptation by continuously updating model parameters in response to incoming data. We evaluate our approach for linear-recurrent-unit SSMs using a small carbon emission dataset collected from embedded automotive hardware. Experimental results show that our method consistently reduces prediction error online during inference, demonstrating its potential for dynamic, resource-constrained environments.
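For intuition, here is a minimal RTRL sketch for a scalar linear recurrent unit, far simpler than the paper's structured SSMs (the learning rate, dimensions, and names are assumptions): sensitivities of the hidden state with respect to the parameters are propagated forward alongside the state and used for online gradient steps during inference.

```python
def rtrl_online(xs, targets, a=0.5, b=1.0, c=1.0, lr=0.002):
    """Online fine-tuning of a scalar linear recurrent unit h_t = a*h_{t-1} + b*x_t,
    y_t = c*h_t, via real-time recurrent learning: s_a = dh/da and s_b = dh/db
    are updated forward in time, so no backpropagation through time is needed."""
    h, s_a, s_b = 0.0, 0.0, 0.0
    errors = []
    for x, tgt in zip(xs, targets):
        # Forward sensitivity recursions (chain rule through h_{t-1}).
        s_a = h + a * s_a
        s_b = x + a * s_b
        h = a * h + b * x
        y = c * h
        err = y - tgt
        errors.append(err * err)
        # Online gradient step on the instantaneous squared error.
        a -= lr * 2.0 * err * c * s_a
        b -= lr * 2.0 * err * c * s_b
    return a, b, errors
```

On data generated by a fixed teacher recurrence, the prediction error should shrink as the parameters adapt online.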
|
|
16:45-17:00, Paper We-S4-T2.4 | |
Data Enhancement for Long-Tailed Tasks: Diffusion Model with Optimized Quality Filter |
|
You, Jiawei | Dalian Maritime University |
Zhai, Yichuan | Dalian Maritime University Student |
Mengke, Li | Shenzhen University |
Lu, Yang | Xiamen University |
Keywords: Neural Networks and their Applications, Machine Vision, Machine Learning
Abstract: In the field of long-tail learning, the uneven distribution of datasets leads to a significant decrease in the model's accuracy for tail classes. Data augmentation is an effective way to alleviate the long-tail problem, but most existing augmentation methods lack diversity, yielding limited gains in tail-class information. Because diffusion models can generate images of high quality, rich detail, and diversity by gradually denoising images, we propose an augmentation method based on the diffusion model called Diffusion-VagMix (DVM). Built on the diffusion model, this method uses an Optimized Quality Filter (OptiFilter) to filter out low-quality generated images and perform further data augmentation. Our approach effectively addresses the issues of insufficient samples in tail classes and uneven quality of the generated images by augmenting the dataset with high-quality generated images. This strategy improves the accuracy of tail classes and enhances the model's overall performance. Our DVM method can be combined with other long-tail learning methods to obtain further improvements. The source code is available at https://github.com/Alert-M/Diffusion-VagMix.
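A hedged sketch of the filter-then-mix pipeline described above (the quality scores, threshold, and mixup blending are stand-ins for the paper's OptiFilter and VagMix components; all names and values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def quality_filter(images, scores, threshold=0.5):
    """Keep only generated images whose quality score passes a (hypothetical)
    threshold -- a stand-in for the paper's OptiFilter."""
    return images[scores >= threshold]

def mixup(real, generated, alpha=0.4):
    """Blend a real tail-class image with a filtered generated one, mixup-style."""
    lam = rng.beta(alpha, alpha)
    return lam * real + (1.0 - lam) * generated

gen = rng.random((4, 8, 8, 3))           # 4 generated "images"
scores = np.array([0.9, 0.2, 0.7, 0.4])  # synthetic quality scores
kept = quality_filter(gen, scores)       # low-quality samples dropped
augmented = mixup(rng.random((8, 8, 3)), kept[0])
```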
|
|
17:00-17:15, Paper We-S4-T2.5 | |
RMNA: ROI-Mixup and Neighborhood Alignment for Cross-Domain Plant Disease Classification |
|
Li, Boheng | Hosei University |
Iyatomi, Hitoshi | Hosei University |
Keywords: Transfer Learning, Deep Learning, Neural Networks and their Applications
Abstract: Many studies have demonstrated the effectiveness of machine learning for plant disease classification. However, large-scale studies have shown that performance degrades significantly on images from unseen environments, because data diversity is insufficient relative to the variability of disease symptoms. Although various techniques, including domain adaptation, have been proposed, many still struggle under substantial domain shift. To address this problem, we propose ROI-mixup and neighborhood alignment (RMNA), an unsupervised domain adaptation method based on transductive learning, in which unlabeled target-domain (test-domain) data are fully accessible during training. Specifically, the proposed ROI-mixup mixes ROI and non-ROI tokens from source and target images to reduce background interference, thereby enhancing the extraction of key plant disease features. Neighborhood alignment exploits feature similarity and nearest-neighbor relationships to optimize feature-space alignment, improving intra-class consistency.
|
|
We-S4-T3 |
Room 0.11 |
Cooperative Systems and Control & Adaptive Systems |
Regular Papers - SSE |
Chair: Li, Zhen | Institute of Information Engineering, Chinese Academy of Sciences |
Co-Chair: Wang, Shouguang | Zhejiang Gongshang University |
|
16:00-16:15, Paper We-S4-T3.1 | |
GDHP-SAF: A Gradient-Driven Heterogeneous Mixed-Precision Acceleration Framework for Industrial-Scale Multiphase Flow Simulations |
|
Yan, Yunbo | Qilu University of Technology (Shandong Academy of Sciences) |
Liu, Anjun | Jinan Institute of Supercomputer Technology |
Meng, Xiaojuan | Qilu University of Technology (Shandong Academy of Sciences) |
Li, Anbang | Qilu University of Technology |
Raikov, Aleksandr | Jinan Institute of Supercomputing Technology |
Guo, Meng | Qilu University of Technology |
Keywords: System Architecture, Adaptive Systems, System Modeling and Control
Abstract: The lattice Boltzmann method (LBM) is widely used to simulate multiphase flows in chemical, energy, and industrial applications. However, simulating large-scale interfacial phenomena (e.g., droplet collision and bubble coalescence) poses a major challenge in balancing GPU memory usage and computational precision. To overcome this, we propose the Gradient-Driven Heterogeneous Mixed-Precision Acceleration Framework (GDHP-SAF). The framework dynamically assigns double-precision (FP64) computation to high-gradient regions to ensure accurate interfacial-tension modeling, while applying single precision (FP32) with Kahan summation for numerical stability in low-gradient regions. GDHP-SAF is integrated into the Palabos platform, with a dynamic precision-sensing module, a three-tier mixed-precision computation pipeline, and optimized compressed storage with a double-buffered communication protocol. In benchmarks involving 141 million lattice nodes, the framework reduces memory consumption by 40% and delivers a 30x speedup. This work coordinates dynamic precision control with hierarchical
|
|
16:15-16:30, Paper We-S4-T3.2 | |
Safe Spacecraft Guidance Using Adaptive Domain Randomization and Relaxed Control Barrier Functions |
|
Tammam, Abdulla | City, University of London |
Aouf, Nabil | Cranfield University |
Keywords: Control of Uncertain Systems, System Modeling and Control, Robotic Systems
Abstract: Autonomous spacecraft guidance for proximity operations can present significant challenges in uncertain environments, particularly in terms of safety and robustness. This paper proposes a novel deep reinforcement learning (DRL)-based guidance system that integrates Adaptive Domain Randomization (ADR) and relaxed Control Barrier Functions (CBFs) to enhance adaptability and enforce safety constraints. ADR systematically expands the range of environmental parameters during training, enabling the agent to generalize effectively to diverse conditions. CBFs provide a theoretical safety guarantee, ensuring stable maneuvering even in failure scenarios. The proposed approach is evaluated in a simulated spacecraft proximity operation under nominal and degraded conditions, including actuator breakdown and sensor degradation. Results demonstrate that the proposed DRL agent configuration outperforms the baseline DRL approach, maintaining a lower relative position error and angular error.
|
|
16:30-16:45, Paper We-S4-T3.3 | |
Heterogeneous UAV-UGV Collaboration for Dynamic Environment Surveillance and Rendezvous Charging Missions |
|
Lee, Yu-Cheng | National Cheng Kung University |
Liu, Yen-Chen | National Cheng Kung University |
Keywords: Cooperative Systems and Control, Modeling of Autonomous Systems, Robotic Systems
Abstract: This study develops a novel UAV-UGV collaborative system for persistent surveillance of dynamic environments. The main tasks are independently assigned to the UAVs and UGVs, while sharing a rendezvous-based charging mission. The UAVs focus on AoI-based persistent coverage, and the UGVs are responsible for covering signal sources in the environment, serving not only as mobile charging stations, but also as active participants in environmental sensing. The system incorporates physical constraints such as endurance limits and velocity bounds to ensure feasibility in real-world scenarios. A sequential rendezvous scheduling algorithm is presented, which dynamically adjusts the time window bounds between UAVs and UGVs, enabling conflict-free charging for multiple UAVs. Results demonstrate that the algorithm exhibits high flexibility and robustness across various UAV–UGV configurations. It effectively supports systems with a UAV-to-UGV ratio greater than 2 and performs well in most scenarios.
|
|
16:45-17:00, Paper We-S4-T3.4 | |
Social Attraction Mutation with Noise Scheduling in Genetic Algorithms |
|
Domonkos, Márk | ELTE Eötvös Loránd University, Faculty of Informatics, Departmen |
Gyöngyössy, Natabara Máté | ELTE Eötvös Loránd University, Faculty of Informatics |
Botzheim, Janos | Eötvös Loránd University |
Keywords: Adaptive Systems
Abstract: Evolutionary algorithms offer a versatile toolset for various problems, from engineering to economics. In this paper, we present a modification to our mutation operator called Social Attraction Mutation (SAM). In our previous publication, we concluded that SAM converges quickly but is too greedy. As a solution to this problem, we add noise to SAM with different schedules via a linear combination. As baseline methods for the comparison we used Gaussian Mutation, our previously developed SAM, and a combination of the two, where SAM was used as a pre-evolutionary operator followed by Gaussian Mutation. We ran tests using four continuous benchmark problems and two classical control problems from Gymnasium. The results obtained on the continuous benchmark problems show that the introduction of noise significantly enhances SAM. On the other hand, on the control problems, where the task is to find the best possible weight combinations for the controller neural network, we obtained mixed results.
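A hedged sketch of the idea, combining attraction toward the current best individual with linearly scheduled Gaussian noise (the attraction coefficient, schedule endpoints, and names are hypothetical, not the paper's exact operator):

```python
import random

def social_attraction_mutation(individual, best, noise_sigma, attraction=0.5):
    """Pull each gene toward the best individual (social attraction), then add
    scheduled Gaussian noise -- a linear combination of exploitation and noise."""
    return [
        gene + attraction * (best_gene - gene) + random.gauss(0.0, noise_sigma)
        for gene, best_gene in zip(individual, best)
    ]

def linear_noise_schedule(generation, max_generations,
                          sigma_start=1.0, sigma_end=0.0):
    """Linearly anneal the mutation noise from sigma_start down to sigma_end,
    so early generations explore and later ones exploit."""
    frac = generation / max_generations
    return sigma_start + frac * (sigma_end - sigma_start)
```

With the noise annealed to zero, the operator reduces to a pure pull toward the best individual, which is exactly the greedy behaviour the scheduling is meant to counteract early on.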
|
|
17:00-17:15, Paper We-S4-T3.5 | |
Smart Contract Vulnerability Detection Via Fusion of Sequence and Graph Features |
|
Li, Haikuo | Chinese Academy of Sciences |
Xiong, Gang | Institute of Information Engineering, Chinese Academy of Sciences |
Yang, Chen | Chinese Academy of Sciences |
Yue, Juwei | Chinese Academy of Sciences |
Chen, Ziqian | University of Chinese Academy of Sciences |
Gou, Gaopeng | Institute of Information Engineering, Chinese Academy of Sciences |
Li, Zhen | Institute of Information Engineering, Chinese Academy of Sciences |
Keywords: Fault Monitoring and Diagnosis, System Modeling and Control, Homeland Security
Abstract: Smart contracts control critical financial assets on blockchains, with potential weaknesses risking substantial losses. Thus, smart contract vulnerability detection is essential for maintaining blockchain ecosystem stability. Traditional methods depend extensively on expert-driven patterns, resulting in poor scalability. Although deep learning-based approaches have made significant progress, they still suffer from issues such as inflexible representations, insufficient feature modalities, and limited model capabilities. In this paper, we propose FSGDec, a novel smart contract vulnerability detection framework that fuses sequential information and structural features at the bytecode level. Firstly, an efficient node embedding method is developed for contract control flow graphs, flexibly processing node sequences and incorporating node-specific semantic information associated with weaknesses. Then, by modeling node features as time series signals, an adaptive graph wave network is introduced to automatically capture vulnerability-related structural features. Finally, a classifier is deployed to perform bug detection utilizing the extracted graph-level features that integrate semantic information. Evaluated on two real-world smart contract datasets, the experimental results demonstrate that FSGDec achieves superior performance compared to state-of-the-art baselines.
|
|
We-S4-T4 |
Room 0.12 |
Intelligent Power Grid 2 |
Regular Papers - SSE |
Chair: Mignoni, Nicola | Politecnico Di Bari |
Co-Chair: Qin, Ciyu | ETH Zürich |
|
16:00-16:15, Paper We-S4-T4.1 | |
Preventive N-1 Redispatch under Multi-Energy and Multi-Period Constraints: An Operator-Oriented Framework |
|
Qin, Ciyu | ETH Zürich |
Gjorgiev, Blazhe | ETH Zürich |
Stankovski, Andrej | ETH Zürich |
Sansavini, Giovanni | ETH Zürich |
Keywords: Intelligent Power Grid, System Modeling and Control, Cyber-physical systems
Abstract: Redispatch is essential in power system operation to maintain security and resolve congestion. However, redispatch models often do not fully capture the operational constraints of diverse generation resources and operator preferences. This paper proposes a multi-period, multi-energy-source preventive N-1 redispatch model. The model adjusts a day-ahead market-cleared dispatch to account for N-1 security while respecting generator (thermal, hydro, battery storage, wind, and solar) and grid constraints. To reflect practical operator preferences, the model minimises both redispatch costs and the number of redispatch actions. An adaptive constraint relaxation mechanism is included to address infeasibility under stressed conditions while preserving system security. The proposed model captures key operational features of modern power systems and supports decision-making in both cost-focused and operator-aligned scenarios. A modified IEEE 118-bus system is used to demonstrate the model's effectiveness in enhancing system security, reducing redispatch costs, and utilising flexibility resources under realistic conditions.
|
|
16:15-16:30, Paper We-S4-T4.2 | |
Optimal Shadow-Aware Dynamic Solar Panel Orientation in Dual-Axis Agro-Voltaic Systems for Smart Energy-Agriculture Integration |
|
Askari Noghani, Saba | Polytechnic of Bari |
Mignoni, Nicola | Politecnico Di Bari |
Carli, Raffaele | Politecnico Di Bari |
Dotoli, Mariagrazia | Politecnico Di Bari |
Raisch, Joerg | Technische Universitaet Berlin |
Keywords: Intelligent Power Grid, System Modeling and Control, Decision Support Systems
Abstract: Agro-voltaic (AV) systems enable the simultaneous use of land for both agriculture and solar energy generation, offering a promising solution to land-use conflicts. However, the shadows cast by solar panels present a significant challenge: while they can protect crops from excessive heat and reduce water loss, they may also limit sunlight exposure, reducing crop yield. This paper presents a method for controlling shadow placement on farmland by dynamically adjusting the azimuth and elevation angles of dual-axis solar panels. A geometric formulation is introduced to model the shadow of a rectangular panel on the ground, followed by an optimization framework that restricts the shadow to a predefined zone while ensuring minimum solar radiation within that zone over the course of a day. The proposed approach balances the goals of maximizing solar energy capture and achieving precise shadow control, using an iterative refinement method. The system is evaluated using real-world solar data from a clear day in Bari, Italy. The results demonstrate the system's practical potential for effectively managing shading in real-world AV applications, thus enabling sustainable integration of energy generation and farming activities.
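The geometric core of the approach, projecting panel corners onto the ground along the sun ray, can be sketched as follows (the sign and azimuth conventions are assumptions, and the optimization layer that steers the shadow is omitted):

```python
import math

def shadow_vertices(panel_corners, sun_azimuth_deg, sun_elevation_deg):
    """Project 3-D panel corners onto the ground plane z = 0 along the sun ray.
    Azimuth is measured clockwise from north (+y axis); elevation is the sun's
    angle above the horizon. Each corner's shadow shifts horizontally by
    z / tan(elevation) in the direction opposite the sun."""
    az = math.radians(sun_azimuth_deg)
    el = math.radians(sun_elevation_deg)
    drop = 1.0 / math.tan(el)  # horizontal shadow length per unit height
    return [
        (x - z * drop * math.sin(az), y - z * drop * math.cos(az))
        for (x, y, z) in panel_corners
    ]

# A 2 m x 1 m panel held flat 1.5 m above ground, sun due south at 45 degrees:
corners = [(0, 0, 1.5), (2, 0, 1.5), (2, 1, 1.5), (0, 1, 1.5)]
shadow = shadow_vertices(corners, sun_azimuth_deg=180.0, sun_elevation_deg=45.0)
```

An optimizer would then adjust the panel's azimuth and elevation so these projected vertices stay inside the predefined shadow zone.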
|
|
16:30-16:45, Paper We-S4-T4.3 | |
Green Energy and Latency Aware Computation Intensive Machine Learning Task Offloading in Carbon-Neutral Edge Computing |
|
Ahmmed, Tahsin | Green University of Bangladesh |
Hasnat Zaman, Waliyel | Green University of Bangladesh |
Islam Rimon, Md. Saiful | Green University of Bangladesh |
Roy, Palash | University of Dhaka |
Razzaque, Md. Abdur | University of Dhaka |
Fortino, Giancarlo | University of Calabria |
Savaglio, Claudio | Italian National Research Council (CNR)-ICAR, University of Cala |
Hassan, Mohammad Mehedi | King Saud University |
Keywords: Intelligent Green Production Systems, Smart Sensor Networks, Cyber-physical systems
Abstract: The growing demand for computation-intensive artificial intelligence (AI) and machine learning (ML) applications necessitates carbon-neutral edge computing to enhance resource efficiency, reduce energy consumption, and promote sustainability in Industrial Internet of Things (IIoT) systems. However, reducing service latency and energy consumption while ensuring execution accuracy and a predictable carbon footprint and its associated cost remains a critical research challenge. Existing works in the literature experience significant challenges for task offloading due to a lack of edge collaboration and ineffective management of Carbon Emission Rights (CER) credits. In this paper, we have developed an optimization framework leveraging Mixed Integer Linear Programming (MILP), namely GRELMON, to jointly minimize service latency and energy consumption while maximizing task accuracy in carbon-neutral collaborative edge and cloud computing for IIoT environments. Moreover, a carbon emission forecasting model using a hybrid deep learning approach is also developed to prevent unnecessary CER purchases. The experimental results demonstrate that GRELMON outperforms state-of-the-art methods by reducing latency and energy consumption while improving the accuracy of the execution of ML tasks.
|
|
16:45-17:00, Paper We-S4-T4.4 | |
A Stochastic Performance Model and a Sensitivity Analysis of a Battery Swapping Station System for Electric Vehicles |
|
Silva, Jonatas | Universidade Federal De Pernambuco |
Almeida, Vinícius | Centro De Informática, Universidade Federal De Pernambuco |
Santana, Marcelo | Universidade Federal De Pernambuco, Centro De Informática |
Dantas, Renata | Federal Institute of Pernambuco - IFPE |
da Silva, Daliton | Universidade Federal De Pernambuco |
Maciel, Paulo | UFPE |
Keywords: Electric Vehicles and Electric Vehicle Supply Equipment, System Modeling and Control, Infrastructure Systems and Services
Abstract: This paper presents a stochastic performance model for evaluating battery swapping station (BSS) systems for electric vehicles (EV). The approach uses Stochastic Petri Nets (SPN) and discrete event simulation to model interactions among vehicles, battery swapping infrastructure, and demand distribution. A modular framework simulates a real-world case study in Recife, Brazil. The Average Probability of Not Finding a Battery (APNFB) is evaluated under various operational scenarios. A sensitivity analysis highlights key parameters affecting system performance: battery discharge time, charge time, and the number of motorcycles. These parameters significantly impact battery availability and reliability, emphasizing the need for informed infrastructure planning and design. The study illustrates the value of stochastic modeling for understanding BSS system performance under uncertainty and supports strategic decision-making for future deployments.
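A stripped-down discrete-event sketch of the BSS dynamics and the APNFB metric (exponential inter-arrival times and all parameter values are illustrative assumptions, not the paper's SPN model):

```python
import heapq
import random

def simulate_bss(n_batteries=4, charge_time=60.0, mean_interarrival=20.0,
                 horizon=10_000.0, seed=1):
    """Discrete-event sketch of a battery swapping station: vehicles arrive with
    exponential inter-arrival times, swap a charged battery if one is on the
    shelf, and the depleted battery recharges for `charge_time`. Returns the
    Average Probability of Not Finding a Battery (APNFB)."""
    rng = random.Random(seed)
    charged = n_batteries            # fully charged batteries on the shelf
    ready_events = []                # min-heap of recharge completion times
    t, arrivals, misses = 0.0, 0, 0
    while t < horizon:
        t += rng.expovariate(1.0 / mean_interarrival)
        # Batteries that finished charging before this arrival become available.
        while ready_events and ready_events[0] <= t:
            heapq.heappop(ready_events)
            charged += 1
        arrivals += 1
        if charged > 0:
            charged -= 1
            heapq.heappush(ready_events, t + charge_time)  # depleted one charges
        else:
            misses += 1              # vehicle finds no charged battery
    return misses / arrivals

apnfb = simulate_bss()
```

Sweeping `n_batteries`, `charge_time`, or the arrival rate reproduces the kind of sensitivity analysis the paper performs on its SPN model.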
|
|
17:00-17:15, Paper We-S4-T4.5 | |
Topology Reconstruction of Low Voltage Grids Using Genetic Algorithms |
|
Lima, David | INESC TEC |
Sampaio, Gil | INESC TEC |
Keywords: Smart Metering, Smart Buildings, Smart Cities and Infrastructures
Abstract: The topology of low-voltage (LV) distribution grids, including line and cable characteristics, is often partially known or inaccurately documented by grid operators, hindering the effective integration and management of Distributed Energy Resources (DERs). This paper presents a data-driven method to reconstruct LV grid topologies using only voltage measurements from customers' smart meters. The approach relies on an adapted genetic algorithm (GA) that iteratively explores candidate configurations, guided by a score function that evaluates both the physical plausibility of estimated line impedances and their consistency with noisy voltage data, which is progressively corrected throughout the process, i.e., the method also filters out errors affecting the initial measurements. The method requires no prior information on grid connectivity and demonstrates robustness to measurement noise, making it well suited for real-world deployment.
|
|
17:15-17:30, Paper We-S4-T4.6 | |
A Robust Phase Mapping Approach Using the Mahalanobis-Wasserstein Distance |
|
Lima, David | INESC TEC |
Sampaio, Gil | INESC TEC |
Rocha, Conceição | INESC TEC |
Viana, João | INESC TEC |
Gouveia, Clara | INESC TEC |
Keywords: Smart Metering, Smart Buildings, Smart Cities and Infrastructures
Abstract: The integration of Distributed Energy Resources (DERs) into low-voltage (LV) distribution grids poses significant challenges for grid management, particularly regarding the need for accurate information on the connection phases of installations to ensure proper load balancing and to enhance hosting capacity. This paper presents a novel voltage-based phase mapping approach using the Mahalanobis-Wasserstein (MW) distance — a metric that exploits voltage time series data to accurately assign users to their corresponding phases without requiring additional hardware or prior knowledge of the grid’s topology. The proposed method demonstrates strong resilience to missing data, a frequent issue in real-world deployments, and incorporates a confidence score to quantify the reliability of the phase assignments.
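A minimal sketch of voltage-based phase assignment using a plain 1-D Wasserstein distance (the paper's Mahalanobis weighting, missing-data handling, and confidence score are omitted; the synthetic voltage profiles are assumptions):

```python
import numpy as np

def wasserstein_1d(a, b):
    """Empirical 1-D Wasserstein distance between two equally long series:
    the mean absolute difference of their sorted values."""
    return np.mean(np.abs(np.sort(a) - np.sort(b)))

def assign_phase(meter_voltage, phase_refs):
    """Assign a smart meter to the reference phase with the smallest distance."""
    dists = [wasserstein_1d(meter_voltage, ref) for ref in phase_refs]
    return int(np.argmin(dists)), dists

rng = np.random.default_rng(3)
# Three phases with different loading, hence different voltage-drop profiles.
refs = [230.0 - drop * rng.random(500) + rng.normal(0.0, 0.1, 500)
        for drop in (1.0, 3.0, 6.0)]
meter = refs[1] + rng.normal(0.0, 0.2, 500)  # a meter on phase 1, noisier
phase, dists = assign_phase(meter, refs)
```

The distance to the meter's true phase stays small even under extra measurement noise, which is the property the full MW metric builds on.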
|
|
We-S4-T5 |
Room 0.14 |
Application of Artificial Intelligence 6 |
Regular Papers - Cybernetics |
Chair: Virvou, Maria | Department of Informatics University of Piraeus Piraeus, Greece |
Co-Chair: Gao, Haihua | Institute of Information Engineering, Chinese Academy of Sciences |
|
16:00-16:15, Paper We-S4-T5.1 | |
Multimodal Integration of MRI and Genetic Information for Glioblastoma Survival Prediction |
|
Kang, Hanbeen | Korea University |
Kang, Bogyeong | Korea University |
Lim, Minjoo | Korea University |
Kam, Tae-Eui | Korea University |
Keywords: Application of Artificial Intelligence, Neural Networks and their Applications, Deep Learning
Abstract: Glioblastoma (GBM) remains a brain tumor with extremely poor prognosis, necessitating precise survival prediction to guide personalized treatment planning. Each MRI modality highlights distinct biological features of GBM, while genetic information provides crucial context for understanding the tumor and its progression. This complementary information offers a more comprehensive understanding of GBM. However, existing survival prediction methods, based on either statistical approaches or deep learning models, often fail to capture the intricate relationships between multimodal MRI data and genetic markers, posing significant challenges to effective integration. To address these challenges, we propose a novel framework that integrates multimodal MRI and genetic information through a tailored fusion approach reflecting the distinct biological characteristics of each modality. Our method integrates multimodal MRI data and genetic information through a modality-aware fusion approach, which preserves modality-specific features and adjusts feature representations to incorporate genetic information. This design achieves superior performance by modeling modality-specific features and cross-modal interactions while incorporating genetic information to refine the feature representation for survival prediction. As a result, the framework supports clinicians in making informed, personalized treatment decisions, ultimately enhancing patient outcomes.
|
|
16:15-16:30, Paper We-S4-T5.2 | |
LLMPEx: Automatic Extraction of ABAC Policies from Natural Language Documents Using LLMs |
|
Gao, Haihua | Institute of Information Engineering, Chinese Academy of Sciences |
Keywords: Application of Artificial Intelligence, Neural Networks and their Applications, Machine Learning
Abstract: In the domain of enterprise security, the management and implementation of access control policies are critical for safeguarding sensitive information and maintaining system integrity. Current security standard documents and internal authorization specifications typically adopt unstructured formats with highly specialized terminology, creating challenges for non-specialist personnel such as system administrators, who struggle to translate complex technical terms and logical rules into Attribute-Based Access Control (ABAC) policies due to insufficient professional expertise. These challenges often lead to misinterpretations and conversion errors that significantly undermine policy accuracy and effectiveness. To address this issue, we introduce LLMPEx, an innovative framework that leverages the semantic capabilities of large language models (LLMs) to automatically convert Natural Language Access Control Policies (NLACPs) into ABAC policies. LLMPEx integrates three core modules: a text classification module for identifying Access Control Policy (ACP) statements, an entity recognition module for extracting entity information from natural language texts, and a policy generation module that uses LLMs to automatically extract ABAC policies based on the texts and the identified entity information. The experimental results validate the effectiveness of LLMPEx in both text recognition and policy generation tasks, demonstrating that it enhances reliability in ABAC policy creation while reducing manual effort and potential human errors.
|
|
16:30-16:45, Paper We-S4-T5.3 | |
An Artificial Intelligence Approach to Automatically Generate Cantonese Meeting Minutes for E-Government |
|
Li, Pinyan | Macao Polytechnic University |
Hoi, Lap Man | Macao Polytechnic University |
Wang, Yapeng | Macao Polytechnic University |
Im, Sio Kei | Macao Polytechnic University |
Keywords: Application of Artificial Intelligence, Neural Networks and their Applications, Machine Learning
Abstract: As artificial intelligence (AI) technology enters the world of acoustics, many projects that require audio processing have been automated and no longer require much manual work. In particular, the use of the latest AI technologies to recognize speech and identify speakers makes it possible to generate meeting minutes entirely by machine. In the southeastern region of China (for example, Hong Kong and Macao), Cantonese is the official language. Internal government meetings are usually conducted in Cantonese, and the executive department requests that the minutes be submitted as soon as possible after the meeting. In addition to the content of the speeches, the minutes must also include the identities of the corresponding speakers. During periods of intensive meetings on new policy releases, our interpreters faced great pressure to produce the exhausting meeting minutes. Due to the presence of local terms and personal names, even state-of-the-art large language models (LLMs) do not fully suffice. Therefore, we propose a novel approach to solve such problems, built on a three-tier software architecture: the data tier (data processing), the service tier (AI models and web services), and the application tier (user interfaces). The implementation uses modern AI models (OpenAI’s Whisper and NVIDIA’s TitaNet) and a dataset we created (Cantonese Policy Address, CPA). Training results (a word error rate of 33.81% and an equal error rate of 0.54) and validation results (confusion matrix up to 97%) show that our proposed approach improves automatic recognition precision, thus helping people understand the spirit of the meeting more effectively and quickly.
|
|
16:45-17:00, Paper We-S4-T5.4 | |
Towards Autonomous Design of UAV Path Planning Algorithms Via DeepSeek |
|
Wei, Wenhong | Dongguan University of Technology |
Li, Mingzhou | Dongguan University of Technology |
Li, Qingxia | Dongguan City University |
Keywords: Application of Artificial Intelligence, Swarm Intelligence, Computational Intelligence
Abstract: Path planning is a critical component in UAV mission execution. While traditional optimization algorithms are mature and effective, they typically rely on expert knowledge and manual tuning, resulting in high development barriers and limited adaptability. With the growing capabilities of large language models (LLMs) in natural language understanding and code generation, we explore the autonomous generation of UAV path planning algorithms using DeepSeek, a general-purpose LLM developed in China. We propose a prompt-driven framework that guides DeepSeek to generate both structural descriptions and implementation code for optimization algorithms. A simulation environment is built to evaluate the generated strategies in terms of feasibility, logical consistency, and baseline performance, with comparisons to classical approaches. Experimental results show that DeepSeek is capable of producing executable and modular optimization strategies, demonstrating strong potential for intelligent algorithm design. Instead of aiming to surpass existing state-of-the-art methods, this work focuses on reducing development costs and enhancing accessibility in control-oriented algorithm design. Our study presents a novel perspective on automated algorithm generation and highlights the practical applicability of LLMs in robotics and intelligent control.
|
|
17:00-17:15, Paper We-S4-T5.5 | |
Which AI and How Trusted by Human Energy Stakeholders? Comparing Generative AI ChatGPT with Domain Specific AI-ENERGIA-SYS |
|
Tsihrintzis, George A. | University of Piraeus |
Sarmas, Elissaios | Decision Support Systems Laboratory, School of Electrical & Com |
Marinakis, Vangelis | Decision Support Systems Laboratory, School of Electrical & Com |
Panagoulias, Dimitrios P. | Department of Informatics University of Piraeus Piraeus, Greece |
Tsihrintzi, Evangelia-Aikaterini | Department of Informatics and Telecommunications National and Kap |
Virvou, Maria | Department of Informatics University of Piraeus Piraeus, Greece |
Keywords: Application of Artificial Intelligence, Intelligent Internet Systems, Knowledge Acquisition
Abstract: Artificial Intelligence (AI) has been growing significantly in recent years, driven by machine learning, deep learning, and, most recently, Generative AI and Large Language Models (LLMs) such as ChatGPT. All AI tools promise to improve decision-making in many domains, including complex ones such as energy. Accordingly, energy-specific AI systems that incorporate domain expertise now exist alongside general-purpose LLMs that integrate knowledge on a plethora of domains, including energy, based on their training from the internet. However, AI systems can produce errors due to their probabilistic nature, and many users, like energy stakeholders, lack the training to use them effectively. Which AI do we mean, and how well is it trusted? It is worth exploring differences among AI systems. By employing the VIRTSI model, this paper compares the trust states of energy stakeholders in domain-specific AI systems versus general-purpose generative AI. For the purposes of the comparison, we use a previously developed domain-specific AI-based energy system entitled AI-ENERGIA-SYS versus ChatGPT 4.0 by OpenAI. VIRTSI is a rigorous computational model of the dynamics of human trust states, spanning from overtrust to distrust, based on user modelling, and it quantifies the efficiency of the interaction in VIRTSI-adapted confusion matrices. The findings reveal that ChatGPT is persuasive and user-friendly to stakeholders, but its lack of domain awareness and explainability leads to overtrust, risking decision quality. In contrast, AI-ENERGIA-SYS, though more complex, supports better trust calibration: when trusted, it is usually correct, but it is not trusted as frequently as ChatGPT. The results suggest that future combinations of energy-specific AI systems with general-purpose generative AI could provide improved trust dynamics and more effective decision support.
|
|
We-S4-T6 |
Room 0.16 |
Machine Learning 4 |
Regular Papers - Cybernetics |
Chair: Zhang, Zhiyuan | Singapore Management University |
Co-Chair: Tian, Yibin | Shenzhen University |
|
16:00-16:15, Paper We-S4-T6.1 | |
VDGPG: A Virtual Data-Guided Prompt Generation Framework for Incremental Learning with Application to Wafer Defect Detection |
|
Liu, Bingwen | Chongqing Normal University |
Tian, Yibin | Shenzhen University |
Chai, Shanglei | Shenzhen University |
Zhang, Zhiyuan | Singapore Management University |
Zeng, Zhi | Chongqing Normal University |
Keywords: Machine Learning, Image Processing and Pattern Recognition, Computational Intelligence
Abstract: Although convolutional neural networks have been widely used for wafer defect detection in semiconductor manufacturing, they typically rely on static offline datasets to train models. These models show strong reliability when detecting known defect types but struggle with unknown ones, posing challenges in model adaptation and leading to high maintenance costs. Incremental Learning (IL) offers a solution that allows models to continuously adapt to new types of defects without accessing full historical data. This paper introduces the Virtual Data-Guided Prompt Generation (VDGPG) framework, a novel IL approach for wafer defect detection that integrates prompt-guided learning and dual-branch virtual data generation. Specifically, VDGPG assigns task-specific prompt vectors to individual attention heads, using a channel attention gating mechanism and similarity computation to select prompt vectors from a prompt pool. This enables the model to focus more effectively on the relevant features for each category of defects. The dual-branch virtual data generation module generates diverse virtual samples, with a special emphasis on contour edge generation, which guides the model to learn features of potential new categories proactively. Experiments using the public WM-811K dataset demonstrate that VDGPG achieves significant performance improvements in wafer defect detection over existing IL methods.
|
|
16:15-16:30, Paper We-S4-T6.2 | |
Text-Based Entity Matching for Entity Resolution and Data Fusion Applied to Person Descriptions |
|
Nausner, Jan | Austrian Institute of Technology |
Hurst, Jakob | Austrian Institute of Technology |
Hubner, Michael | Austrian Institute of Technology |
Keywords: Expert and Knowledge-Based Systems, Machine Learning, Computational Intelligence
Abstract: Text-based entity matching facilitates interoperability between heterogeneous systems by aligning textual person descriptions. We propose an entity matching methodology that integrates rule-based feature extraction, similarity measures, and supervised machine learning classifiers, rigorously evaluated on a person matching problem. We constructed a feature space by extracting domain-specific person attributes from text via a combination of string similarity scores and similarities of term frequency-inverse document frequency (TF-IDF) embeddings. Next, we evaluated multiple supervised classification models, including Multi-Layer Perceptron, Random Forest, and XGBoost, to determine their effectiveness. For evaluation, we created a new domain-specific entity matching dataset named Real Scenario Text-based Person Matching (RSTPM) and assessed the person matching performance of all models in terms of classification metrics and computational cost. In addition, we studied the classification impact of the various features. The proposed approach achieves an increase of 27.47 percentage points (from 55.41% to 82.88%) in F1-score compared to the baseline and a total accuracy of 92.14%, demonstrating significant improvements in textual person matching whilst exhibiting only a moderate increase in computational demand.
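A minimal sketch of the kind of features described above (the helper names and toy corpus are hypothetical; this is not the paper's RSTPM pipeline), combining a string-similarity score with a TF-IDF cosine similarity between two person descriptions:

```python
import math
from collections import Counter
from difflib import SequenceMatcher

def string_sim(a, b):
    # Character-level similarity in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def tfidf_cosine(doc_a, doc_b, corpus):
    # Cosine similarity of smoothed TF-IDF vectors over a small corpus.
    n = len(corpus)
    df = Counter(t for doc in corpus for t in set(doc.lower().split()))
    def vec(doc):
        tf = Counter(doc.lower().split())
        return {t: c * math.log((1 + n) / (1 + df[t])) for t, c in tf.items()}
    va, vb = vec(doc_a), vec(doc_b)
    dot = sum(va[t] * vb.get(t, 0.0) for t in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy person descriptions; each pair yields a feature vector that a
# supervised classifier (MLP, Random Forest, XGBoost, ...) would consume.
corpus = ["tall man red jacket", "short woman blue coat", "tall man blue coat"]
a, b = "tall man red jacket", "tall man blue coat"
features = [string_sim(a, b), tfidf_cosine(a, b, corpus)]
assert 0.0 < features[0] < 1.0 and 0.0 < features[1] < 1.0
```

In the paper's setting, such per-attribute similarity features would be stacked into the feature space on which the classifiers are trained.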
|
|
16:30-16:45, Paper We-S4-T6.3 | |
Feature Selection Method Based on Enhanced Fata Morgana Algorithm with Collaborative Search |
|
Zhang, Guodao | Hangzhou Dianzi University |
Li, Xin | Hangzhou Dianzi University |
Li, Yupeng | Hangzhou Dianzi University |
Pan, Xiaotian | HangZhou Dianzi University |
Lin, Luning | Hangzhou Dianzi University |
Chen, Xinyue | Institute of Intelligent Media Computing, Hangzhou Dianzi Univer |
Keywords: Metaheuristic Algorithms, Optimization and Self-Organization Approaches, Machine Learning
Abstract: The Fata Morgana algorithm (FATA), with its combination of virtual and real search mechanisms and outstanding global optimization ability, has demonstrated strong performance on various optimization problems. This paper introduces it into the field of feature selection and proposes an enhanced version, named EFATA, which improves the ability to escape local optima and the global search capability by incorporating a collaborative search strategy inspired by swarm foraging. In the experiments, a Support Vector Machine (SVM) is used to guide the feature selection process, enabling the effective identification of key features across multiple UCI datasets. Experimental results show that EFATA is superior to FATA and other comparison algorithms in terms of classification accuracy and F-score, fully validating the superiority and robustness of EFATA in feature selection tasks.
|
|
16:45-17:00, Paper We-S4-T6.4 | |
A Novel Feature Selection Method Based on Enhanced-CDRIME and Multi-Classifiers for Pain Assessment |
|
Zhang, Guodao | Hangzhou Dianzi University |
Li, Leqi | HangZhou Dianzi University |
Li, Yupeng | Hangzhou Dianzi University |
Pan, Xiaotian | HangZhou Dianzi University |
Yang, Sufang | The Fifth Affiliated Hospital of Wenzhou Medical University( Lis |
Zhou, JianWei | Department of Anesthesia, the Fifth Affiliated Hospital of Wenzh |
Wang, Chuanguang | The Fifth Affiliated Hospital of Wenzhou Medical University |
Keywords: Optimization and Self-Organization Approaches, Metaheuristic Algorithms, Machine Learning
Abstract: Multi-source physiological signals are valuable for quantitative pain assessment, but their high dimensionality, nonlinearity, and strong inter-variable correlations present challenges for clinical decision-making. To improve diagnostic accuracy and efficiency, we propose a novel feature selection framework based on an enhanced Rime optimization algorithm (b-CDRIME), which incorporates a co-adaptive hunting strategy and a dispersed foraging strategy. The framework discretizes the original continuous CDRIME algorithm by introducing a V-binary coding strategy and constructs a multi-objective weighted fitness function that integrates classification accuracy, feature dimensionality, and AUC, combined with a dynamic feedback mechanism from the classifier to achieve efficient feature screening and optimized classification performance. On a public multi-source physiological signal pain dataset, we use 10 classical classifiers to evaluate our method's feature selection and performance improvement. The comparative experimental results show that the pain assessment model constructed with b-CDRIME and XGBoost achieves an average classification accuracy of 81.6% while significantly reducing feature dimensionality. These findings validate the strong performance and application potential of this approach for processing high-dimensional, multi-source pain physiological signals.
|
|
We-S4-T7 |
Room 0.31 |
Information Visualization |
Regular Papers - HMS |
Chair: Li, Yang | Nanjing University |
Co-Chair: Lin, Ping-I | National Chi Nan University |
|
16:00-16:15, Paper We-S4-T7.1 | |
Using 3D Heatmaps to Visualize the Gaze Distributions of Observers Watching a Moving Subject |
|
Iwasaki, Fuyuko | Tottori University |
Hioki, Shouta | Tottori University |
Yoneda, Shunsuke | Tottori University |
Inoue, Michiko | Tottori University |
Nishiyama, Masashi | Tottori University |
Keywords: Human Perception in Multimedia, Information Visualization, Affective Computing
Abstract: We propose a method to visualize the measured gaze distribution of observers asked to perceive the dynamism of a subject's movements in a sports video. This visualization method uses a three-dimensional heatmap on the surface of a human body model. An existing method generates the heatmap using gaze measurements on a body surface in a still image. However, this method does not handle changes over time in a subject's posture in a video. Furthermore, this method does not visualize gaze in the region surrounding the subject's body. Our method calculates the angle between the gaze direction and vertex position to visualize the gaze distribution on the body surface and surrounding regions. Experimental results demonstrated that our method visualizes not only the gaze distribution on the surface region but also that in the surrounding region. We also verified that it is possible to visualize the gaze distribution over a subject's movements in a video without depending on changes in posture using a standard human body model.
|
|
16:15-16:30, Paper We-S4-T7.2 | |
PALM: PAnoramic Learning Map Integrating Learning Analytics and Curriculum Map for Scalable Insights across Courses |
|
Ozaki, Mahiro | Kyushu University |
Chen, Li | Osaka Kyoiku University |
Naganuma, Shotaro | Kyushu University |
Švábenský, Valdemar | Masaryk University |
Okubo, Fumiya | Kyushu University |
Shimada, Atsushi | Kyushu University |
Keywords: Human-centered Learning, Information Visualization, Interactive Design Science and Engineering
Abstract: This study proposes and evaluates the PAnoramic Learning Map (PALM), a learning analytics (LA) dashboard designed to address the scalability challenges of LA by integrating curriculum-level information. Traditional LA research has predominantly focused on individual courses or learners and often lacks a framework that considers the relationships between courses and the long-term trajectory of learning. To bridge this gap, PALM was developed to integrate multilayered educational data into a curriculum map, enabling learners to intuitively understand their learning records and academic progression. We conducted a system evaluation to assess PALM's effectiveness in two key areas: (1) its impact on students’ awareness of their learning behaviors, and (2) its comparative performance against existing systems. The results indicate that PALM enhances learners' awareness of study planning and reflection, particularly by improving perceived behavioral control through the visual presentation of individual learning histories and statistical trends, which clarify the links between learning actions and outcomes. Although PALM requires ongoing refinement as a system, it received significantly higher evaluations than existing systems in terms of visual appeal and usability. By serving as an information resource with previously inaccessible insights, PALM enhances self-regulated learning and engagement, representing a significant step beyond conventional LA toward a comprehensive and scalable approach.
|
|
16:30-16:45, Paper We-S4-T7.3 | |
How Do Visualization Choices Affect Human Decision-Making? a Case Study in Air Traffic Control |
|
Lyu, Wenying | Delft University of Technology |
Borst, Clark | Delft University of Technology |
van Paassen, Marinus M | Delft University of Technology |
Mulder, Max | Delft University of Technology |
Keywords: Human-Machine Interface, Information Visualization, Human Factors
Abstract: Air traffic control is advancing digitalization by developing advanced decision-support systems, where the way information is presented to operators plays a central role in shaping performance. However, the effects of different visual representations within these systems on human decision-making remain not fully understood. In this study, we compared two Conflict Detection and Resolution (CD&R) tools: the Highly Interactive Problem Solver (HIPS) and the Solution Space Diagram (SSD). Although both systems are grounded in the same control problem, they differ in how they represent the control constraints that define conflict conditions and feasible responses. Through a human-in-the-loop experiment under low- and high-traffic conditions, we analyzed how these differences influence decision-making. Results showed that, particularly in low-density traffic, HIPS enabled quicker responses, fewer commands, and smaller safety margins, whereas SSD, despite receiving more favorable subjective ratings, led to greater variability in actions. These findings suggest that visualization significantly impacts decision-making consistency and efficiency. However, in highly complex environments, overall effectiveness may depend more on operators’ ability to shift and adapt decision-making patterns facilitated by the interface than on specific visual elements.
|
|
16:45-17:00, Paper We-S4-T7.4 | |
In-Depth Financial Analysis of a Quantum-Inspired Weighted Multi-Objective Portfolio Model with Visual Observations |
|
Jiang, Yu-Chi | National University of Tainan |
Lin, Ping-I | National Chi Nan University |
Kuo, Shu-Yu | National Yunlin University of Science & Technology |
Chou, Yao-Hsin | National Chi Nan University |
Keywords: Information Visualization, Information Systems for Design and Marketing
Abstract: Quantum-inspired evolutionary computation integrates quantum mechanics with classical optimization techniques, offering innovative solutions to complex real-world problems and gaining increasing attention. Among these problems, portfolio optimization is a critical one. Enhancing real-world applicability requires further consideration of fund allocation alongside various investor risk preferences. However, traditional methods face challenges in handling high-dimensional, multi-objective optimization, particularly in maintaining solution diversity and offering interpretable results. To address this challenge, this study proposes a multi-objective weighted portfolio model (MoWPM) incorporating trend ratio-based evaluation. A multi-objective quantum-inspired tabu search (MoQTS) is designed for MoWPM, leveraging superposition and enhanced entanglement mechanisms to explore the Pareto front. MoQTS enables the generation of high-quality solutions tailored to various risk levels. In-depth empirical analysis shows that MoQTS and MoWPM outperform equal-weighted approaches, demonstrating robustness across various performance indicators and through statistical validation. Experiments against classical multi-objective algorithms, supported by clear visualizations, underscore the strong potential of quantum-inspired techniques for practical financial optimization.
|
|
17:00-17:15, Paper We-S4-T7.5 | |
Authentic 3D Structure Preserved Surround View System for Automobile Driving Assistance |
|
Li, Jiaheng | Nanjing University |
Low, Chin Sheng | Nanjing University |
Cao, Jinghao | Nanjing University |
Du, Sidan | Nanjing University |
Li, Yang | Nanjing University |
Keywords: Information Visualization, Multimedia Systems
Abstract: The automation and intellectualization of vehicles have become a development trend in recent years. Compared to Autonomous Driving systems, Advanced Driver Assistance Systems (ADAS) offer a more developed and practical choice. As an ADAS, the 3D surround view system can provide drivers with panoramic environmental information around the vehicle. However, traditional methods face issues such as distortion of close-range obstacles and ghosting in texture stitching. In this article, we propose an authentic 3D structure preserved surround view system based on depth information surface reconstruction. We introduce a novel dual-layer surface of foreground and background to restore structural information of close-range obstacles. In addition, we propose an innovative texture acquisition method to address the ghosting problem in stitching. Finally, we evaluate our method on both simulated and real datasets collected by ourselves, concluding that our approach can better provide drivers with information about the surroundings of their vehicles. More detailed material is available at https://nju-ee.github.io/Autonomous_Driving_Research_Group.page/surround/.
|
|
We-S4-T8 |
Room 0.32 |
Consumer and Industrial Applications |
Regular Papers - SSE |
Chair: Meng, Lin | Ritsumeikan University |
Co-Chair: Meena, Yogesh | IIT Gandhinagar |
|
16:00-16:15, Paper We-S4-T8.1 | |
SD-ProtoNet: Few-Shot Steel Defect Detection and Classification Via Meta-Learning |
|
Wu, Shengbo | Fudan University |
Zhou, YuHan | Fudan University |
Yuan, Yijie | Fudan University |
Chen, Xiong | Fudan University |
Keywords: Consumer and Industrial Applications
Abstract: In industrial settings, the detection of surface defects in steel materials, such as scratches, inclusions, and crazing, is crucial for quality control. However, practical production often faces challenges such as the acquisition of low-resolution (LR) images and a limited number of defect samples. Traditional deep learning methods are prone to overfitting or poor generalization due to insufficient data. To address these issues, this paper proposes SD-ProtoNet, a steel surface defect classification method tailored for industrial scenarios and based on the LR-ProtoNet model. SD-ProtoNet is a meta-learning approach for steel defect detection that incorporates a Feature Affine (FA) layer into the feature extraction block to enhance the feature representation capability of low-resolution images. Additionally, it utilizes a Brownian Distance Covariance (BDC) metric block to capture the joint distribution and nonlinear relationships of defect features, thereby improving classification performance under few-shot conditions. Experiments conducted on the NEU-DET dataset validate the effectiveness of the proposed model in low-resolution steel defect detection tasks. Our method achieves better detection accuracy than traditional methods under few-shot, low-resolution conditions, reaching 78.77% and 88.28% in the 5-way 1-shot and 5-way 5-shot settings, respectively.
|
|
16:15-16:30, Paper We-S4-T8.2 | |
Enhanced Biogas Production Prediction Using BiogasNET with BiogasGAN |
|
Geng, Yingrui | Ritsumeikan University |
Wang, Zenghui | University of South Africa |
Deng, Mingcong | Tokyo University of Agriculture and Technology |
Meng, Lin | Ritsumeikan University |
Keywords: Consumer and Industrial Applications, Control of Uncertain Systems, Modeling of Autonomous Systems
Abstract: Biogas is a sustainable energy source produced through anaerobic digestion (AD), which converts organic waste into methane and carbon dioxide. Accurate prediction of biogas yield is essential for stable and efficient operation. However, this task is difficult due to the nonlinear dynamics of AD systems and frequent missing values in sensor data. Here, we propose a two-stage framework. First, we introduce BiogasGAN, a generative adversarial network designed to impute missing values in multivariate time series data. It reconstructs incomplete sensor records while preserving temporal and cross-variable relationships. Second, we present BiogasNET, a hybrid deep learning model that combines convolutional layers, long short-term memory (LSTM) units, and attention mechanisms to forecast biogas production from imputed data. We evaluate our framework on real-world biogas plant datasets. Experimental results show that BiogasNET achieves state-of-the-art performance, with RMSE and MAE as low as 0.029 and 0.022, respectively. Ablation studies confirm the value of each model component, and comparisons with conventional machine learning methods highlight its robustness. Overall, our approach provides an effective and practical solution for biogas yield prediction in real-world environments.
|
|
16:30-16:45, Paper We-S4-T8.3 | |
Federated Learning with Diversified Model Ensemble for Industrial Preventive Maintenance |
|
Pan, Yuchen | Shanghai Jiao Tong University |
Bai, Yang | Shanghai Jiao Tong University |
Zhang, Zunpu | Shanghai Baosight Software Co., Ltd |
Chen, Lixing | Shanghai Jiao Tong University |
Keywords: Consumer and Industrial Applications, Distributed Intelligent Systems, Fault Monitoring and Diagnosis
Abstract: Preventive maintenance is crucial in the industrial production process, enabling the prediction of potential faults and minimizing manufacturing losses. Industrial systems consist of distributed heterogeneous end devices, which pose significant challenges for establishing generalized and effective solutions for preventive maintenance. Federated learning (FL) has emerged as a promising paradigm for training comprehensive models in industrial edge environments while protecting data privacy. However, model heterogeneity among devices presents a significant challenge in the application of FL to real-world industrial scenarios, hindering the aggregation and sharing of knowledge among edge models. In this paper, we address the issue of model heterogeneity by leveraging the concept of multi-model ensemble. We propose a novel framework, Federated Learning with Diversified Model Ensemble (FedDME), which collects knowledge of heterogeneous models from edge devices and integrates them through knowledge distillation. FedDME copes with model heterogeneity and reduces communication costs, improving FL efficiency. The proposed approach is implemented within defect detection tasks related to preventive maintenance. The results demonstrate that our method achieves over 10% accuracy improvement compared with the state-of-the-art solution and exhibits robust adaptability to real-world industrial environments.
|
|
16:45-17:00, Paper We-S4-T8.4 | |
Federated Hard Example Mining for Defect Detection under Statistical Heterogeneity |
|
Pan, Yuchen | Shanghai Jiao Tong University |
Bai, Yang | Shanghai Jiao Tong University |
Chen, Lixing | Shanghai Jiao Tong University |
Zhang, Zunpu | Shanghai Baosight Software Co., Ltd |
Keywords: Consumer and Industrial Applications, Distributed Intelligent Systems, Fault Monitoring and Diagnosis
Abstract: Industrial defect detection is crucial for the production process, as it identifies potential faults and minimizes manufacturing losses. Manufacturing operations are often distributed across multiple production lines and sites, with diverse defect samples scattered across edge devices. This necessitates the development of robust techniques to construct comprehensive detection models through collaborative learning among edge nodes. Federated learning (FL) has emerged as a promising paradigm for training such models in industrial edge environments while preserving data privacy. However, the presence of biased production conditions across edge nodes introduces statistical heterogeneity, which poses a significant challenge to the practical application of FL and adversely impacts the performance of aggregated global models. In this paper, we address the issue of statistical heterogeneity by leveraging hard sample mining and pseudo data generation. We propose a novel framework, namely Federated Hard Example Mining and Generation (FedHEM), which collects class prototypes and hard features from local clients and generates corresponding pseudo samples to transfer that knowledge. FedHEM can cope with statistical heterogeneity while balancing generalization ability with the learning of specific classes. The proposed framework is evaluated on benchmark datasets and industrial defect detection tasks. Experimental results demonstrate that FedHEM achieves over 10% accuracy improvement compared to state-of-the-art data-heterogeneous methods and exhibits robust adaptability to real-world industrial environments.
|
|
17:00-17:15, Paper We-S4-T8.5 | |
SkeySpot: Automating Service Key Detection for Digital Electrical Layout Plans in the Construction Industry |
|
Dosi, Dhruv | IIT Gandhinagar |
Meena, Rohit | I-Diary IT Solutions Private Limited |
Rajpura, Param | IIT Gandhinagar |
Meena, Yogesh | IIT Gandhinagar |
Keywords: Consumer and Industrial Applications, Enterprise Information Systems
Abstract: Legacy floor plans, often preserved only as scanned documents, remain essential resources for architecture, urban planning, and facility management in the construction industry. However, the lack of machine-readable floor plans renders large-scale interpretation both time-consuming and error-prone. Automated symbol spotting offers a scalable solution by enabling the identification of service key symbols directly from floor plans, supporting workflows such as cost estimation, infrastructure maintenance, and regulatory compliance. This work introduces a labelled Digitised Electrical Layout Plans (DELP) dataset comprising 45 scanned electrical layout plans annotated with 2,450 instances across 34 distinct service key classes. A systematic evaluation framework is proposed using pretrained object detection models for the DELP dataset. Among the models benchmarked, YOLOv8 achieves the highest performance with a mean Average Precision (mAP) of 82.5%. Using YOLOv8, we develop SkeySpot, a lightweight, open-source toolkit for real-time detection, classification, and quantification of electrical symbols. SkeySpot produces structured, standardised outputs that can be scaled up for interoperable building information workflows, ultimately enabling compatibility across downstream applications and regulatory platforms. By lowering dependency on proprietary CAD systems and reducing manual annotation effort, this approach makes the digitisation of electrical layouts more accessible to small and medium-sized enterprises (SMEs) in the construction industry, while supporting broader goals of standardisation, interoperability, and sustainability in the built environment.
|
|
17:15-17:30, Paper We-S4-T8.6 | |
Systematic Mapping on Positioning of Labeled Industrial Objects |
|
Torres de Oliveira, Murilo | Instituto Federal De Educação, Ciência E Tecnologia De São Paulo |
Beletti Ferreira, Alexandre | Federal Institute of Technology |
Nakamoto, Francisco Yastami | Federal Institute of Education, Science and Technology of São Pa |
Keywords: Quality and Reliability Engineering, Consumer and Industrial Applications, Manufacturing Automation and Systems
Abstract: This paper presents a systematic mapping of the literature on identifying the position and orientation of labeled parts in industrial processes. As automation and industrial processes evolve, new challenges emerge in ensuring the accuracy of part identification, particularly in dynamic manufacturing environments. This work examines 41 papers selected from the following sources: IEEE Xplore, ACM Digital Library, and Scopus. The analysis follows a mapping protocol designed to identify key trends in the field. The findings underscore the need for advancements in computer vision for real-time systems, with a particular emphasis on improving the detection of small objects and integrating vision systems with existing equipment on automated production lines.
|
|
We-S4-T9 |
Room 0.51 |
Autonomous Vehicle 2 |
Regular Papers - SSE |
Chair: Huang, Yo-Ping | National Taipei University of Technology |
Co-Chair: Brühl, Tim | Karlsruhe Institute of Technology |
|
16:00-16:15, Paper We-S4-T9.1 | |
Distributed Processing of Deep Learning Models for Multi-Sensor Fusion |
|
Rauch, Robert | Technical University of Kosice |
Levorato, Marco | UC Irvine |
Gazda, Juraj | Technical University of Kosice |
Keywords: Autonomous Vehicle, Intelligent Transportation Systems
Abstract: Connected Autonomous Vehicles (CAVs) are equipped with an array of sensors generating substantial data streams whose real-time analysis often exceeds the onboard computational capabilities. While offloading these computational tasks to edge servers is an established solution, such an approach is increasingly challenging as multiple sensor streams need to be fused and analyzed, and thus transferred over capacity-limited and volatile wireless channels. To address this challenge, in this paper we propose a ``split computing'' framework. Compared to existing solutions that focus on individual sensor streams (and most commonly cameras), our framework is designed for sensor fusion neural models, and specifically camera and LiDAR fusion for semantic segmentation. Our proposed framework optimizes data transfer by eliminating pooling indices in favor of sending LiDAR point cloud indices only to the final network block. We remark how sensor fusion is instrumental to guarantee robust operations in a broad range of conditions. By compressing the data to be transported over the channel, our approach reduces offloading latency, better utilizes CAV computational resources compared to full offloading schemes and decreases channel load in congested urban network scenarios. Our experimental results demonstrate up to 62.74% improvement in task execution latency for sensor fusion models, with at most 6.61% performance trade-off due to compression.
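The latency benefit claimed for split computing can be illustrated with a back-of-the-envelope model. All numbers below are invented for illustration, not taken from the paper: the vehicle runs the network "head" locally and transmits only a compressed intermediate tensor instead of raw sensor data.

```python
# Toy latency model: full offloading of raw sensor data vs. split computing.
def offload_latency(data_mb, bandwidth_mbps, server_ms):
    # Transmit everything, then the server runs the whole network.
    return data_mb * 8.0 / bandwidth_mbps * 1000.0 + server_ms

def split_latency(head_ms, feat_mb, bandwidth_mbps, tail_ms):
    # Run the head on the vehicle, send a small feature tensor, finish on edge.
    return head_ms + feat_mb * 8.0 / bandwidth_mbps * 1000.0 + tail_ms

bw = 50.0  # Mbit/s on a shared wireless link (illustrative)
full = offload_latency(data_mb=12.0, bandwidth_mbps=bw, server_ms=30.0)
split = split_latency(head_ms=15.0, feat_mb=1.5, bandwidth_mbps=bw, tail_ms=25.0)
print(f"full={full:.0f} ms, split={split:.0f} ms")  # prints: full=1950 ms, split=280 ms
```

The transmission term dominates on constrained links, which is why compressing the fused intermediate representation pays off.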
|
|
16:15-16:30, Paper We-S4-T9.2 | |
Kinematic Temporal VAE for Generalized Pedestrian Prediction |
|
Li, Dongchen | Waseda University |
Lin, Zhimao | Waseda IPS |
Hu, Jinglu | Waseda University |
Keywords: Autonomous Vehicle, Intelligent Transportation Systems, Robotic Systems
Abstract: Pedestrian trajectory prediction is a crucial research topic in artificial intelligence application scenarios such as autonomous driving and robotics. In these scenarios, the autonomous vehicle or robot must interact cautiously with humans to avoid accidents. Over the past decade, researchers have continuously proposed high-performance pedestrian trajectory prediction methods by leveraging the powerful tools of artificial intelligence. In particular, spatial-temporal feature based methods have been successfully applied. However, one potential issue with spatial-temporal features has been overlooked. Due to the sensitivity inherent in pedestrian dataset collection, the diversity of spatial features is far less than that of temporal features. Therefore, most spatial-temporal feature based methods tend to overfit to scenario features, resulting in unstable performance across different scenarios. In our work, a Kinematic Temporal Conditional Variational Autoencoder (KT-VAE) that emphasizes the importance of temporal features, along with a reliable spatial post-processing method, is proposed. In KT-VAE, the spatial features are compressed instead of the temporal features to ensure that the model focuses more on the temporal continuity of pedestrian kinematics. This approach enables the VAE to better capture the temporal continuity and dynamic characteristics of pedestrian motion, while avoiding the scenario overfitting that can result from insufficient spatial features. Through experiments, KT-VAE maintains stability across different scenarios in cross-validation and demonstrates competitive performance in practical applications.
|
|
16:30-16:45, Paper We-S4-T9.3 | |
Human Driver Modeling Via Control-Based Approaches: PID and MPC Using Bayesian Optimization for Driver Adaptation |
|
Schmees, Steffen | University of Kaiserslautern-Landau |
Heidinger, Jan | University of Lübeck |
Gödker, Markus | University of Lübeck |
Bernhardt, Lukas | University of Lübeck |
Franke, Thomas | University of Lübeck |
Görges, Daniel | University of Kaiserslautern |
Keywords: Autonomous Vehicle, Intelligent Transportation Systems, System Modeling and Control
Abstract: Although interpretable controllers are widely used in vehicle systems, they have received limited attention as models of human driving behavior. This study explores whether two such control strategies, a preview-augmented Proportional-Integral-Derivative (PID) controller and a constraint-based Model Predictive Control (MPC) framework, can model human longitudinal driving behavior when adapted via Bayesian optimization. A structured dataset of human driving behavior, recorded with participants in a driving simulator, was used to train and evaluate both controllers across acceleration, deceleration, and cruising scenarios. MPC achieved lower overall deviation and more consistent performance across trials. These findings highlight the potential of combining interpretable control architectures with data-driven parameter adaptation to model human driving behavior effectively.
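A minimal longitudinal PID speed controller, without the preview term the paper adds, can serve as a sketch of the first control strategy. Gains, acceleration limits, and the point-mass plant below are illustrative assumptions, not the paper's fitted parameters.

```python
# Minimal PID speed controller on point-mass longitudinal dynamics, as a
# stand-in for the paper's preview-augmented PID driver model.
def simulate_pid(kp, ki, kd, target=20.0, steps=200, dt=0.1):
    v, integ, prev_err = 0.0, 0.0, target
    for _ in range(steps):
        err = target - v
        integ += err * dt
        deriv = (err - prev_err) / dt
        # Saturate commanded acceleration to a plausible vehicle range.
        accel = max(-3.0, min(3.0, kp * err + ki * integ + kd * deriv))
        v += accel * dt  # point-mass longitudinal dynamics
        prev_err = err
    return v

v_final = simulate_pid(kp=0.8, ki=0.01, kd=0.1)
print(round(v_final, 2))  # settles near the 20 m/s target
```

In the paper's setting, Bayesian optimization would tune `kp`, `ki`, `kd` so the simulated speed trace matches a recorded human one.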
|
|
16:45-17:00, Paper We-S4-T9.4 | |
Logic-Based Knowledge Awareness for Autonomous Agents in Continuous Spaces |
|
Ghosh, Arabinda | Max Planck Institute for Software Systems |
Salamati, Mahmoud | Max Planck Institute for Software Systems |
Soudjani, Sadegh | Newcastle University |
Keywords: Cyber-physical systems, Autonomous Vehicle, System Modeling and Control
Abstract: This paper presents a step towards a formal controller design method for autonomous agents based on knowledge awareness to improve decision-making. Our approach is to first create an organized repository of information (a knowledge base) for autonomous agents which can be accessed and then translated into temporal specifications. Secondly, to develop a controller with formal guarantees that meets a combination of mission-specific objective and the specification from the knowledge base, we utilize an abstraction-based controller design (ABCD) approach, capable of managing both nonlinear dynamics and temporal requirements. Unlike the conventional offline ABCD approach, our method dynamically updates the controller whenever the knowledge base prompts changes in the specifications. A three-dimensional nonlinear car model navigating an urban road scenario with traffic signs and obstacles is considered for validation. Results show the effectiveness of the method in guiding the autonomous agents to the target while complying with the knowledge base and the mission-specific objective.
|
|
17:00-17:15, Paper We-S4-T9.5 | |
Rainy-nuScenes: A Data Partition for Benchmarking Contaminated Vehicle Cameras through Rain |
|
Eberhardt, Tim Dieter | PhD Candidate KIT Porsche AG |
Klingler, Matthias | Dr. Ing. H.c. F. Porsche Aktiengesellschaft |
Brühl, Tim | Karlsruhe Institute of Technology |
Schwager, Robin | Dr. Ing. H.c. F. Porsche AG |
Sohn, Tin Stribor | Dr. Ing. H.c. F. Porsche AG |
Schneider, Stefan-Alexander | Kempten University of Applied Sciences |
Stork, Wilhelm | Karlsruhe Institute of Technology (KIT) |
Keywords: Trust in Autonomous Systems, Intelligent Transportation Systems, Autonomous Vehicle
Abstract: Adverse weather conditions, particularly rainfall, present substantial challenges to camera-based perception systems in Advanced Driver Assistance Systems (ADAS). Unlike human drivers, camera sensors are more vulnerable to visibility degradation caused by raindrops, which can impair essential functions such as object and lane detection. In this paper, we introduce Rainy-nuScenes, a novel extension of the widely-used nuScenes dataset, specifically designed for benchmarking water droplet detection and segmentation in automotive camera images. The dataset comprises 762 annotated images containing over 1,700 labeled water droplets, enabling a detailed analysis of their spatial distribution and geometric characteristics. We conduct a comparative study between Rainy-nuScenes and related datasets, including WoodScape, emphasizing key differences in droplet coverage, distribution patterns, and annotation strategies. Furthermore, we evaluate various camera defisheye techniques—such as linear and orthographic projections—in conjunction with U-Net [1] based convolutional neural networks (CNNs) trained on both Rainy-nuScenes and fisheye-derived WoodScape images [2]. Our experiments show that the orthographic defisheye approach significantly improves the robustness and generalization capabilities of segmentation models. Rainy-nuScenes serves as a comprehensive benchmark for advancing ADAS algorithms in adverse weather, contributing to the development of safer and more reliable autonomous systems. The data and code are available at: https://github.com/timdietereberhardt/rainynuscenes
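The linear and orthographic defisheye projections compared in the abstract can be sketched as radial mapping functions from the incidence angle to the image radius: a rectilinear model r = f·tan(θ) and an orthographic model r = f·sin(θ). Focal length and angles below are illustrative, not from the paper.

```python
# Radial projection sketches for two defisheye (unwarping) models.
import math

def linear_radius(f, theta):
    # Rectilinear (perspective) projection: r = f * tan(theta)
    return f * math.tan(theta)

def orthographic_radius(f, theta):
    # Orthographic projection: r = f * sin(theta), bounded by f
    return f * math.sin(theta)

f = 1.0
for deg in (10, 40, 70):
    th = math.radians(deg)
    print(deg, round(linear_radius(f, th), 3), round(orthographic_radius(f, th), 3))
```

The comparison shows why the two differ at the periphery: tan(θ) diverges toward 90°, while sin(θ) stays bounded, compressing peripheral content instead of stretching it.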
|
|
17:15-17:30, Paper We-S4-T9.6 | |
HAF-Net: Hierarchical Attention Fusion Network for Multimodal Image Fusion |
|
Kshetrimayum, Satchidanand | National Taipei University of Technology |
Huang, Yo-Ping | National Taipei University of Technology |
Keywords: Intelligent Transportation Systems
Abstract: Multimodal medical image fusion plays a vital role in enhancing diagnostic accuracy by integrating complementary information from different imaging modalities, such as PET, SPECT, and MRI. However, existing deep learning-based fusion methods often suffer from limited detail preservation and inefficient attention modeling across spatial and channel dimensions. In this work, we propose a novel framework called hierarchical attention fusion network (HAF-Net) for robust and high-quality medical image fusion. The proposed model incorporates a hierarchical feature aggregation (HFA) module to extract scale-adaptive features, and a residual attention convolution (RAC) block to enhance fine-grained details using gradient-aware spatial and frequency-domain information. Furthermore, a multispectral frequency-aware channel attention (MFCA) mechanism is introduced to capture discriminative features across multiple frequency bands, and a cross-interaction attention module (CIAM) is designed to jointly model spatial-channel relationships. An adaptive fusion weighting (AFW) strategy is employed to dynamically combine multi-scale features based on their contextual relevance. Extensive experiments on standard PET/MRI and SPECT/MRI datasets demonstrate that HAF-Net achieves superior performance compared to state-of-the-art fusion methods. The results validate the effectiveness of the proposed modules in preserving structural integrity and enhancing detail in fused medical images.
|
|
We-S4-T10 |
Room 0.90 |
Affective Computing 2 |
Regular Papers - HMS |
Chair: Badica, Costin | Universitatea Din Craiova |
Co-Chair: Mukaeda, Takayuki | Yokohama National University |
|
16:00-16:15, Paper We-S4-T10.1 | |
CoDD: Convenient-Oriented Depression Detection from Few-Channel EEG with Channel Reconstruction |
|
Xu, Zihua | South China University of Technology |
Chen, C. L. Philip | University of Macau |
Zhang, Tong | South China University of Technology |
Keywords: Affective Computing, Brain-Computer Interfaces, Wearable Computing
Abstract: Major depressive disorder (MDD) is characterized by imbalanced connectivity between brain regions, rather than simply increased or decreased activity in a specific region. Current studies on depression detection using EEG predominantly utilize whole-brain signal (full-channel). However, due to the cost and convenience limitations of full-channel devices, portable miniaturized EEG devices with only a few electrodes (few-channel) are more suitable for widespread use in daily scenarios. Depression detection from few-channel EEG data is challenging because these devices can only capture EEG signals from a portion of the brain regions. To address the challenge, we propose a novel Convenience-Oriented Depression Detection model (CoDD). The model aims to enhance the performance of MDD detection using few-channel EEG by leveraging knowledge from full-channel EEG. Specifically, we capture prior whole-brain connectivity patterns from full-channel data to reconstruct missing channels in few-channel EEG data, supplementing the few-channel data with critical encodings and cooperative relationships related to depression. In addition, we use full-channel data to guide the process of information mining and depression detection to ensure that the training process conforms to the original data distribution. Experiments conducted on the MODMA and PRED+CT datasets demonstrate that the model achieves SOTA performance in depression detection under few-channel device conditions, validating the effectiveness of the proposed technique.
|
|
16:15-16:30, Paper We-S4-T10.2 | |
Multimodal Sentiment Analysis Based on Uncertain Missing Patterns |
|
Xu, Bingbing | Wuhan University of Science and Technology |
Yang, Juan | Wuhan University of Science and Technology |
Keywords: Affective Computing, Kansei (sense/emotion) Engineering
Abstract: Multimodal sentiment analysis has shown great potential for applications in many fields. However, existing models typically handle missing modalities or missing data separately. In practice, these two types of missingness are random and often coexist. When both modalities and data are missing, the model input becomes more incomplete and imbalanced. This not only reduces the model's generalization ability, but also leads to unreliable results and potential errors in critical tasks. In addition, many methods focus only on handling the missing data itself, while ignoring weight allocation during fusion. In this paper, we propose a method for Multimodal Sentiment Analysis Based on Uncertain Missing Patterns (MUMP). It combines a multimodal Transformer backbone, a missing-generation module, and a dynamic gate feature fusion module. The missing-generation module treats modality absence as an extreme case of missing data and leverages missing-pattern classification and generation cues to complete the modality features.
|
|
16:30-16:45, Paper We-S4-T10.3 | |
DENL: Dynamic Emotion Neural Link for Efficient Emotion Recognition in Conversations |
|
Cao, Yukun | ShangHai University of Electric Power |
He, Yongcheng | Shanghai University of Electric Power |
Gu, Niu | Shanghai University of Electric Power |
Keywords: Affective Computing, Kansei (sense/emotion) Engineering
Abstract: The Emotion Recognition in Conversations (ERC) task requires models to precisely capture subtle emotional nuances within contextual environments. Presently, the correlation between utterances and emotions is relatively weak, meaning the same utterance might express entirely different emotions. To tackle this challenge, pre-trained language models (PLMs) are typically employed, either through full-parameter fine-tuning or parameter-efficient tuning and learning (PETL) methods. However, these methods incur high computational costs. To address the weak correlation between utterances and emotions under limited computational resources, we propose a feature-task dynamic emotion neural link (DENL), which refines emotional feature representation at a low computational cost and rapidly adapts to ERC tasks. At the feature-learning layer, we embed multiple specialized Emotion Dual-Rank Adapters (EDRA) in parallel, coupled with a gradient-aware dynamic gating mechanism (DeepGate), to avoid propagation costs through the backbone network during backpropagation, thus creating a low-cost emotion neural link to capture emotional features. At the task-learning layer, we utilize an Emotion Space Intervention (ESI) approach, employing a low-rank subspace to manipulate portions of the hidden representations of utterance embeddings, thus guiding the model to rapidly adapt to ERC tasks. Experimental results demonstrate that DENL not only improves the accuracy of fine-grained emotion classification on three benchmark ERC datasets, but also significantly reduces computational cost and parameter size.
|
|
16:45-17:00, Paper We-S4-T10.4 | |
Self-Supervised rU-Net with Spectrum Branch: A Novel Framework for Subject-Independent Emotion Recognition Based on Peripheral Physiological Signals |
|
Chen, Jiaming | Fudan University |
You, Lifeng | Greater Bay Area Institute of Precision Medicine |
Dang, Ting | The University of Melbourne |
Liu, Xiao | Fudan University |
Zhang, Hongtao | Greater Bay Area Institute of Precision Medicine |
Keywords: Affective Computing, Kansei (sense/emotion) Engineering, Human Perception in Multimedia
Abstract: Frequency-domain features of peripheral physiological signals are vital for emotion recognition. However, existing end-to-end network architectures rarely extract them efficiently. To address this limitation, we propose a multimodal rU-Net model incorporating time-frequency information fusion. Specifically, the spectrum is integrated as a parallel branch alongside the time-domain branch for feature extraction. A fusion module enables direct frequency domain feature extraction and subsequent time-frequency fusion. By utilizing the rU-Net encoder with multimodal signal channels, our approach processes skin temperature (SKT), electrodermal activity (EDA), and photoplethysmography (PPG) data simultaneously, thus preventing model inflation from encoder stacking. The approach was validated on the CASE and DEAP datasets using leave-one-subject-out (LOSO) cross-validation. In 3-class classification, the best accuracies for valence (V) and arousal (A) were 69.36% and 71.34%, respectively, while in 2-class classification they were 70.06% and 70.29%, respectively. This work offers valuable insights and a novel approach for future research in emotion recognition based on peripheral physiological signals collected by non-EEG wearable devices.
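The spectrum branch presumably consumes a frequency-domain representation of each physiological channel. As a stdlib-only illustration (not the paper's implementation, which would use an FFT), a naive DFT magnitude spectrum:

```python
# Naive O(N^2) DFT magnitude spectrum, stdlib only, for illustration.
import cmath
import math

def dft_magnitude(x):
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]

# One full sine cycle over 8 samples: energy lands in bins k=1 and k=7.
signal = [math.sin(2 * math.pi * t / 8) for t in range(8)]
spectrum = dft_magnitude(signal)
print([round(m, 3) for m in spectrum])  # → [0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0]
```

Feeding such a spectrum to a parallel branch gives the network direct access to band-wise energy that a time-domain branch must otherwise learn implicitly.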
|
|
17:00-17:15, Paper We-S4-T10.5 | |
Domain-Adaptive Emotion Estimation through Lightweight Fine-Tuning |
|
Hayashi, Ryogo | Yokohama National University |
Mukaeda, Takayuki | Yokohama National University |
Tomihama, Keigo | Yokohama National University |
Tsuge, Yukiko | Ajinomoto AGF, INC |
Sasajima, Yuko | Ajinomoto AGF, INC |
Kumao, Toshio | Ajinomoto AGF, INC |
Minami, Shigenobu | MIRUWS Co. Ltd |
Shima, Keisuke | Yokohama National University |
Keywords: Affective Computing, Kansei (sense/emotion) Engineering, Human-Machine Interface
Abstract: This study proposes a domain-adaptive fine-tuning method for transferring emotion estimation models trained on video viewing tasks to food consumption tasks. Only the electrocardiogram signal was used as the input, and the model estimated the valence and arousal based on Russell's circumplex model of affect. In the evaluation experiment, biosignals and subjective ratings were collected from nine participants during both video viewing and food consumption tasks, and the model performance was compared under different fine-tuning situations. The results demonstrated that fine-tuning specifically tailored to the food consumption task improved the estimation accuracy, indicating that effective model adaptation is feasible even with a limited amount of data. Furthermore, fine-tuning using the video-viewing task showed a certain degree of effectiveness, indicating the potential for cross-domain transferability and mitigation of individual differences.
|
|
17:15-17:30, Paper We-S4-T10.6 | |
HBRA: Multimodal Heterogeneous Graph Learning for Bidirectional Marital Recommendations in Aging Societies |
|
Zhang, Fenghao | Qingdao University |
Cheng, Zesheng | College of Computer Science and Technology, Qingdao University, |
Jin, Zi | Qingdao Menaul School |
Keywords: Affective Computing, Multi-User Interaction, Human Factors
Abstract: Addressing the inefficiency of marital matching in aging societies, this study proposes HBRA, a bidirectional recommendation framework leveraging heterogeneous graph learning to model asymmetric preferences and enhance compatibility analysis. The framework integrates behavioral and textual data through a unified graph architecture, overcoming limitations of unidirectional recommendation systems and fragmented feature fusion. By introducing spectral graph theory and meta-path-guided message passing, HBRA achieves noise-robust preference modeling while eliminating traditional neural network training through closed-form matrix decomposition, reducing computational complexity by 99.5%. Evaluated on the novel “FCWR” dataset (built from real-world matchmaking records) and the Speed Dating benchmark, HBRA outperforms 24 state-of-the-art models, improving NDCG@2 by 25.8% and training efficiency by three orders of magnitude. Parameter analysis reveals text-based lifestyle keywords as the dominant matching factor, surpassing traditional attributes like age and geography. This work provides a scalable, interpretable solution for intelligent matchmaking platforms, directly addressing demographic challenges posed by population aging through data-driven compatibility optimization.
|
|
We-S4-T11 |
Room 0.94 |
Systems Science and Engineering: Decision Aid and Knowledge Issues &
Frontier of AI for Smart, Responsible and Sustainable Manufacturing |
Special Sessions: SSE |
Chair: Abel, Marie-Hélène | Sorbonne Universités, Université De Technologie De Compiègne, CNRS UMR 7253 Heudiasyc |
Co-Chair: Saad, Ines | University of Jules Vernes ESC Amiens |
Organizer: Abel, Marie-Hélène | Sorbonne Universités, Université De Technologie De Compiègne, CN |
Organizer: Hammami, Omar | ENSTA |
Organizer: Saad, Ines | University of Jules Vernes ESC Amiens |
|
16:00-16:15, Paper We-S4-T11.1 | |
How Well Do LLMs Predict Prerequisite Skills? Zero-Shot Comparison to Expert-Defined Concepts (I) |
|
Le, Ngoc Luyen | Université De Technologie De Compiègne |
Abel, Marie-Hélène | Sorbonne Universités, Université De Technologie De Compiègne, CN |
Keywords: Decision Support Systems, Technology Assessment
Abstract: Prerequisite skills - foundational competencies required before mastering more advanced concepts - are important for supporting effective learning, assessment, and skill-gap analysis. Traditionally curated by domain experts, these relationships are costly to maintain and difficult to scale. This paper investigates whether large language models (LLMs) can predict prerequisite skills in a zero-shot setting, using only natural language descriptions and without task-specific fine-tuning. We introduce ESCO-PrereqSkill, a benchmark dataset constructed from the ESCO taxonomy, comprising 3,196 skills and their expert-defined prerequisite links. Using a standardized prompting strategy, we evaluate 13 state-of-the-art LLMs, including GPT-4, Claude 3, Gemini, LLaMA 4, Qwen2, and DeepSeek, across semantic similarity, BERTScore, and inference latency. Our results show that models such as LLaMA4-Maverick, Claude-3-7-Sonnet, and Qwen2-72B generate predictions that closely align with expert ground truth, demonstrating strong semantic reasoning without supervision. These findings highlight the potential of LLMs to support scalable prerequisite skill modeling for applications in personalized learning, intelligent tutoring, and skill-based recommender systems.
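The semantic-similarity scoring mentioned in the abstract is commonly implemented as cosine similarity between embedding vectors of the predicted and expert-defined skills; the vectors below are toy stand-ins for real sentence embeddings, and the pairing is hypothetical.

```python
# Cosine similarity between a predicted-skill embedding and the expert
# ground-truth embedding (toy 3-dimensional vectors for illustration).
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

pred = [0.9, 0.1, 0.3]  # embedding of the LLM's predicted prerequisite
gold = [0.8, 0.2, 0.4]  # embedding of the expert-defined prerequisite
print(round(cosine(pred, gold), 3))
```

Scores near 1.0 indicate the prediction is semantically close to the expert label even when the surface wording differs, which is the point of scoring with embeddings rather than exact string match.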
|
|
16:15-16:30, Paper We-S4-T11.2 | |
A DRL Approach for Teleoperated Driving in 6G Network Digital Twin Framework (I) |
|
Marvulli, Michele | Polytechnic University of Bari |
Gassi, Giuseppe | Polytechnic University of Bari |
Ali, Wasim | Polytechnic University of Bari |
Volpe, Gaetano | Polytechnic University of Bari |
Mangini, Agostino Marcello | Polytechnic of Bari |
Fanti, Maria Pia | Polytecnic of Bari, Italy |
Keywords: Autonomous Vehicle, Communications, Digital Twin
Abstract: In the age of intelligent transportation systems and smart cities, teleoperated driving aims to bridge the gap between human and fully autonomous driving. However, the reliability of teleoperated driving is heavily dependent on the quality of the cellular networks, a limitation that could be addressed by 6G networks, which aim to provide ultra-low latency and high reliability. This study proposes an integrated simulator for teleoperated driving by utilizing Deep Reinforcement Learning (DRL) in a framework of 6G and Network Digital Twin. The presented simulation framework combines different tools (i.e., SUMO, OMNeT++, and Simu5G) to model realistic traffic and network dynamics. In addition, the Random Forest algorithm is used for the coverage prediction system and maintaining stable connectivity, and a DRL model optimizes vehicle routing by balancing path length and signal coverage. A case study is simulated considering the city of Bari (Italy). The framework demonstrates robust communication between teleoperated vehicles and 6G Digital Twin infrastructure.
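The DRL objective balances path length against signal coverage. A non-learned sketch of the same trade-off is a Dijkstra search whose edge costs blend physical length with a coverage penalty; the graph, weights, and `alpha` parameter below are all hypothetical.

```python
# Coverage-aware shortest path: edge cost = length * (1 + alpha * (1 - coverage)).
import heapq

def coverage_aware_path(graph, src, dst, alpha=0.5):
    """graph: {node: [(neighbor, length_m, coverage_0_to_1), ...]}"""
    dist, prev = {src: 0.0}, {}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, length, cov in graph.get(u, []):
            nd = d + length * (1.0 + alpha * (1.0 - cov))
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return path[::-1]

g = {
    "A": [("B", 100, 0.2), ("C", 120, 0.9)],
    "B": [("D", 100, 0.2)],
    "C": [("D", 110, 0.9)],
}
print(coverage_aware_path(g, "A", "D", alpha=1.0))  # → ['A', 'C', 'D']
```

With `alpha=0` the search reverts to pure shortest path (A-B-D here); raising `alpha` steers the route onto better-covered roads, which is the trade-off the DRL agent learns end to end.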
|
|
16:30-16:45, Paper We-S4-T11.3 | |
EPreS: A Methodology of Real-Time Energy Consumption Prediction for Automotive Spray-Painting System (I) |
|
Wu, Wei | Chongqing University |
Zhang, Xiangfei | Chongqing University |
Wang, Yang | Chongqing University |
Dong, Ke | Chongqing University |
Li, Congbo | State Key Laboratory of Mechanical Transmission, Chongqing Unive |
Keywords: Intelligent Green Production Systems
Abstract: The spray-painting system in automotive manufacturing consumes a large amount of energy in concentrated bursts, exposing the process to potential overload risks. It becomes imperative to dynamically predict the energy consumption of spray-painting systems to facilitate preventive and security measures. However, the monitoring data derived from the spray-painting system exhibits high dimensionality and non-linearity, posing challenges to prediction accuracy. Therefore, this paper proposes a methodology framework of real-time energy consumption prediction for the automotive spray-painting system, termed ePreS. First, a hybrid feature extraction approach is designed to cope with multi-domain data concerning device, process, production, and multiple energy sources, thus reducing model complexity and elevating training efficiency. Second, a deep learning model CNN-BiLSTM-Attention (CBA) is proposed for the real-time energy consumption prediction, while the coati optimization algorithm (COA) is employed for network structure optimization. Finally, a real-world case study is implemented in an electric vehicle painting workshop, with a smart energy management system module developed, to verify the effectiveness and superiority of the proposed method. This study is expected to serve as a tool for practitioners to meet similar requirements and spark new ideas for future research.
|
|
16:45-17:00, Paper We-S4-T11.4 | |
Applying Large Language Models As Hybrid-Algorithm Experts for Job Shop Scheduling Problems* (I) |
|
Zhang, Zhiyan | Tongji University |
Ji, Zimo | Tongji University |
Wang, Haixuan | Tongji University |
Wang, Junkai | Tongji University |
Keywords: Manufacturing Automation and Systems, Intelligent Green Production Systems, Decision Support Systems
Abstract: Efficient production scheduling of aviation components is essential for optimizing manufacturing equipment and resource utilization. Traditional scheduling requires experts to develop complex models and optimization algorithms, making the process time-consuming and expertise-dependent. This paper proposes LLM-Scheduling, a multi-agent framework where each agent, powered by a large language model (LLM), autonomously executes scheduling tasks based on natural language inputs. The framework integrates retrieval-augmented generation (RAG) to provide relevant domain knowledge and employs a hybrid-search strategy to enhance optimization performance. Experimental results demonstrate that LLM-Scheduling outperforms conventional approaches in most cases, achieving more efficient scheduling and improved adaptability in dynamic manufacturing environments. Moreover, the framework shows promise in enhancing energy efficiency by optimizing machine utilization and reducing idle time. These findings suggest that LLM-driven scheduling can significantly contribute to smarter and more sustainable manufacturing processes.
|
|
17:00-17:15, Paper We-S4-T11.5 | |
Dynamic Time Series Segmentation for Health Monitoring of Hybrid Systems |
|
Hatte, Léonie | LAAS-CNRS, Université De Toulouse |
Ribot, Pauline | LAAS-CNRS, Université De Toulouse |
Chanthery, Elodie | LAAS-CNRS, Université De Toulouse |
Keywords: Modeling of Autonomous Systems, Discrete Event Systems, Fault Monitoring and Diagnosis
Abstract: Monitoring and diagnosing complex, real-world, industrial hybrid systems require accurate and up-to-date models that can adapt to evolving system behaviors. Such systems, characterized by both continuous and discrete dynamics, are best represented by hybrid models. In this article, we present the segmentation step of HyMED (Hybrid Model Enrichment for Diagnosis), a model-based health monitoring and diagnosis method that monitors hybrid systems and automatically updates the system model if necessary. HyMED uses noisy multivariate time series data to dynamically update models, addressing unanticipated degradations and faults. A key feature of HyMED is its online and passive segmentation step (ODS), which enables robust detection of system mode changes in complex, nonlinear time series. Unlike traditional segmentation methods, ODS dynamically determines its segmentation hyperparameters through an automatic parameter selection process. ODS guarantees adaptability without the need for manual adjustment. The effectiveness of HyMED's segmentation method is demonstrated through a case study on an engine timing system, where its performance is compared to the offline method implemented in the Ruptures library.
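The Ruptures library used as the offline baseline implements classical change-point detection. As a rough illustration of what such an offline segmentation does (not the authors' ODS method), here is a pure-NumPy binary segmentation for mean shifts; the synthetic signal and the number of breakpoints are illustrative assumptions:

```python
import numpy as np

def seg_cost(x, a, b):
    """Sum of squared deviations from the mean on segment [a, b)."""
    seg = x[a:b]
    return float(((seg - seg.mean()) ** 2).sum()) if b > a else 0.0

def binary_segmentation(x, n_bkps):
    """Offline binary segmentation: greedily split at the point that
    most reduces the total within-segment cost."""
    bkps = [0, len(x)]
    for _ in range(n_bkps):
        best = None
        for i in range(len(bkps) - 1):
            a, b = bkps[i], bkps[i + 1]
            base = seg_cost(x, a, b)
            for t in range(a + 2, b - 1):
                gain = base - seg_cost(x, a, t) - seg_cost(x, t, b)
                if best is None or gain > best[0]:
                    best = (gain, t)
        bkps = sorted(bkps + [best[1]])
    return bkps[1:-1]

# Piecewise-constant signal with noise; true change points at 100 and 200.
rng = np.random.default_rng(0)
sig = np.concatenate([rng.normal(0, 0.5, 100),
                      rng.normal(3, 0.5, 100),
                      rng.normal(-1, 0.5, 100)])
cps = binary_segmentation(sig, 2)
```

Ruptures exposes the same idea through fitted estimator objects (e.g. its Binseg and Pelt detectors) with pluggable cost models.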
|
|
We-S4-T13 |
Room 0.96 |
Biometric Systems and Bioinformatics |
Regular Papers - Cybernetics |
Chair: Kamiya, Tohru | Kyushu Institute of Technology |
Co-Chair: Gu, Boyuan | Glasgow College, University of Electronic Science and Technology of China |
|
16:00-16:15, Paper We-S4-T13.1 | |
SEMG-DGCN Directed Graph Convolutional Network for Rehabilitation Action Difficulty Assessment Based on SEMG |
|
Tian, Yuxuan | Tsinghua University |
Li, Zhuangzhuang | Tsinghua University |
Wu, Ji | Tsinghua University |
Chen, Yuepeng | Beijing University of Posts and Telecommunications |
Feng, Xuefeng | Ningbo University |
Chenyi, Guo | Tsinghua University |
Ma, Ye | Ningbo University |
Liu, Dongwei | Zhejiang University of Finance and Economics |
Ning, Jian | Beijing Sino-SMAFIT Technology Co., Ltd |
Keywords: Biometric Systems and Bioinformatics, Deep Learning, Expert and Knowledge-Based Systems
Abstract: Against the backdrop of an aging population and the high prevalence of chronic diseases, the demand for rehabilitation medical services has surged. However, traditional rehabilitation action difficulty assessment relies on expert experience, suffering from strong subjectivity and low reliability. Existing assessment methods struggle to cover the full-body kinematic characteristics, and existing models lack directed modeling of action difficulty relationships and fail to effectively capture muscle synergy. To address this, this study constructs a 16-channel full-body sEMG dataset, sEmg-Human-594, which includes 594 rehabilitation actions. This study proposes an sEMG-DGCN assessment method based on Directed Graph Convolutional Network (DGCN), which integrates 11-dimensional expert-annotated difficulty criteria to derive difficulty labels, employs directed graphs to model difficulty relationships between actions, and introduces an Anatomically Constrained Spatiotemporal Attention mechanism. Experimental results show that the sEMG-DGCN model achieves an accuracy of 93.61% in difficulty relationship classification, significantly outperforming comparative models. Ablation experiments verify the effectiveness of the attention mechanism, providing a new pathway for rehabilitation action difficulty assessment.
|
|
16:15-16:30, Paper We-S4-T13.2 | |
A Generative Strategy for Target-Oriented Molecular Design: Integrating Docking Scores with Drug-Likeness Constraints |
|
Yang, Zhicheng | Wuhan University of Science and Technology |
Zhang, Xiaolong | Wuhan University of Science and Technology |
Lin, Xiaoli | Wuhan University of Science and Technology |
Keywords: Deep Learning, Biometric Systems and Bioinformatics
Abstract: With the growing application of deep learning in drug design, efficiently generating molecules with desirable drug-like properties and target specificity remains a key challenge. Existing methods often rely on complex fusion of molecular and protein features, leading to high data demands and training costs. To address this, a novel molecule generation model based on multi-constraint optimization is proposed, incorporating docking scores, Lipinski’s Rule of Five, and other pharmacological constraints to guide generation. The model is built on a Bidirectional Gated Recurrent Unit (BiGRU) and improves training efficiency through cross-entropy loss optimization. Experimental results show that the proposed approach achieves superior performance in molecular validity, novelty, and uniqueness, and generates compounds with lower binding energies to target proteins, significantly enhancing the practicality and scalability of molecular design.
|
|
16:30-16:45, Paper We-S4-T13.3 | |
Classification of Respiratory Sounds Based on Hybrid Convolutional Recurrent Neural Network |
|
Asatani, Naoki | Kyushu Institute of Technology |
Kamiya, Tohru | Kyushu Institute of Technology |
Mabu, Shingo | Yamaguchi University |
Kido, Shoji | The University of Osaka |
Keywords: Image Processing and Pattern Recognition, Application of Artificial Intelligence, Biometric Systems and Bioinformatics
Abstract: Nearly 8 million people die from respiratory diseases every year. Therefore, to reduce the number of deaths caused by these diseases, early detection and early treatment are required as global issues, and several techniques have been proposed to date. Currently, the ICBHI (International Conference on Biomedical and Health Informatics) 2017 Challenge Dataset has been released for research on respiratory sound analysis, and respiratory sound classification methods using this dataset have been proposed worldwide. The authors also proposed a respiratory sound classification method using an improved CRNN (Convolutional Recurrent Neural Network), which is a combination of CNN and RNN (Recurrent Neural Network) with some modifications. However, it was still difficult to classify by image features alone due to noise such as voice in the respiratory sound data. To overcome this problem, we try to classify breath sounds automatically using a deep learning model that considers the features of the raw breath sound data. To extract the sound features of the raw respiratory sound data, we use a 1D-CRNN, which is a 1D-CNN reconstruction of an improved CRNN proposed in a previous study of ours. Then, it is combined with the deep features obtained by our previous improved CRNN (2D-CRNN) for the final classification. The proposed method achieves an AUC (Area Under Curve) of 0.92, sensitivity of 0.75, specificity of 0.86, and ICBHI score of 0.80 based on the ROC (Receiver Operating Characteristic) analysis, which are the highest values compared to the other methods under the same experimental conditions.
|
|
16:45-17:00, Paper We-S4-T13.4 | |
A U-Net and Transformer Paralleled Network for Robust Blood Pressure Estimation Based on CWT-Transformed PPG Images |
|
Gu, Boyuan | Glasgow College, University of Electronic Science and Technology |
Tang, Jiahao | Shandong University |
Tang, Yunhan | Glasgow College, University of Electronic Science and Technology |
Sun, Haiyang | University of Electronic Science and Technology of China |
Xie, Changting | University of Electronic Science and Technology of China |
Keywords: Neural Networks and their Applications, Deep Learning, Biometric Systems and Bioinformatics
Abstract: This paper proposes a parallel U-Net and Transformer network for blood pressure (BP) estimation using Photoplethysmography (PPG) signals transformed into time-frequency images using CWT. It combines the U-Net's excellent capability of extracting local features with the Transformer's advantage in capturing long-range dependencies when processing sequential data. Specifically, the PPG signal is first segmented into 5-second windows. These segments are then transformed into 2-D images using continuous wavelet transform (CWT), incorporating both the real and imaginary components of the signal fragments. These images are used as inputs to the deep-learning network. For continuous BP signals, the peaks of the waveform are extracted, and the 5-second average of systolic blood pressure (SBP) and diastolic blood pressure (DBP) are computed to serve as the labels and references. The effectiveness and robustness of the proposed model are evaluated on a dataset constructed by us consisting of 16 subjects under 3 different conditions (Sit, Ex and Db), and the results are compared with traditional PPG-based BP estimation methods. Experimental results demonstrate an average mean absolute error (MAE) of 6.99 mmHg for SBP and 7.27 mmHg for DBP. Specifically, our approach achieved a BHS Grade A on the sit dataset and a BHS Grade C on both the Ex and Db datasets. Cross dataset validation is further conducted to prove the effectiveness of our model.
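The CWT step that turns a 5-second PPG window into a 2-D time-frequency image can be sketched with a complex Morlet wavelet. The sampling rate, frequency grid, and synthetic 1.2 Hz pulse below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def morlet_cwt(x, fs, freqs, w0=6.0):
    """Scalogram of x via a complex Morlet wavelet, shape (len(freqs), len(x))."""
    n = len(x)
    out = np.empty((len(freqs), n), dtype=complex)
    for i, f in enumerate(freqs):
        scale = w0 * fs / (2 * np.pi * f)        # wavelet scale in samples
        m = int(min(10 * scale, n))              # truncated support
        t = (np.arange(m) - m // 2) / scale
        wav = np.exp(1j * w0 * t) * np.exp(-t**2 / 2) / np.sqrt(scale)
        out[i] = np.convolve(x, np.conj(wav)[::-1], mode="same")
    return out

fs = 125                                # assumed PPG sampling rate
t = np.arange(5 * fs) / fs              # one 5-second window, as in the paper
ppg = np.sin(2 * np.pi * 1.2 * t)       # synthetic 1.2 Hz (72 bpm) pulse wave
freqs = np.linspace(0.5, 5.0, 46)
scalogram = np.abs(morlet_cwt(ppg, fs, freqs))
peak_freq = freqs[scalogram.mean(axis=1).argmax()]
```

The paper additionally keeps the real and imaginary CWT components as separate image channels before feeding the network; the magnitude above is only for inspection.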
|
|
We-S4-T14 |
Room 0.97 |
Human-Collaborative Robotics |
Regular Papers - HMS |
Chair: Siebinga, Olger | Delft University of Technology |
Co-Chair: Kerbl, Tobias | Technische Universität München |
|
16:00-16:15, Paper We-S4-T14.1 | |
A Model of the Sidewalk Salsa |
|
Siebinga, Olger | Delft University of Technology |
Keywords: Human-Collaborative Robotics, Human-Machine Interaction, Human-Machine Cooperation and Systems
Abstract: When two pedestrians approach each other on the sidewalk head-on, they sometimes engage in an awkward interaction, both deviating to the same side (repeatedly) to avoid a collision. This phenomenon is known as the sidewalk salsa. Although well known, no existing model describes how this "dance" arises. Such a model must capture the nuances of individual interactions between pedestrians that lead to the sidewalk salsa. Therefore, it could be helpful in the development of mobile robots that frequently participate in such individual interactions, for example, by informing robots in their decision-making. Here, I present a model based on the communication-enabled interaction framework capable of reproducing the sidewalk salsa. The model assumes pedestrians have a deterministic plan for their future movements and a probabilistic belief about the movements of another pedestrian. Combined, the plan and belief result in a perceived risk that pedestrians try to keep below a personal threshold. In simulations of this model, the sidewalk salsa occurs in a symmetrical scenario. At the same time, it shows behavior comparable to observed real-world pedestrian behavior in scenarios with initial position offsets or risk threshold differences. Two other scenarios provide support for a hypothesis from literature stating that cultural norms --in the form of a biased belief about on which side others will pass (i.e. deviating to the left or right)-- contribute to the occurrence of the sidewalk salsa. Thereby, the proposed model provides insight into how the sidewalk salsa arises.
|
|
16:15-16:30, Paper We-S4-T14.2 | |
Human-Robot Synchronization with Virtual Reality for Dual-Arm Robot Teleoperation |
|
Lo, Jia-Hsun | National Taiwan University |
Guo, Cheng-Ming | National Taiwan University |
Huang, Han-Pang | National Taiwan University |
Keywords: Human-Collaborative Robotics, Human-Machine Interface, Virtual/Augmented/Mixed Reality
Abstract: This study develops a dual-arm robot teleoperation system via virtual reality. The objective is to make user control the dual-arm robot through hands and robot vision via HoloLens2. The gestures of the user’s arm and hand are captured and rendered in Unity environment. To map the robot movement with the user’s hand pose, incremental motion mapping and orientation joint control are proposed, and the joint angles are solved through damped least square inverse kinematics. The robotic hands are also designed as end-effectors for multiple tasks. The simulation and experimental results show that the motion of end-effectors matches the user’s hand trajectory, which is synchronized with the user’s movement. The user also completes tasks in several scenarios with the utilities of robot teleoperation. The system constructed in this study combines the actions of humans and robots, promoting multiple robotics research in the fields of imitation learning and remote control.
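The damped least-squares inverse kinematics the abstract mentions is a standard technique. Below is a minimal NumPy sketch on a hypothetical planar 2-link arm; the link lengths, damping factor, and target are illustrative assumptions, not values from the paper:

```python
import numpy as np

def dls_ik(q0, target, lengths, lam=0.1, iters=200):
    """Damped least-squares IK for a planar 2-link arm."""
    q = np.array(q0, dtype=float)
    l1, l2 = lengths
    for _ in range(iters):
        # Forward kinematics: end-effector position
        p = np.array([l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1]),
                      l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])])
        e = target - p
        if np.linalg.norm(e) < 1e-8:
            break
        # Geometric Jacobian of the 2-link chain
        J = np.array([[-l1 * np.sin(q[0]) - l2 * np.sin(q[0] + q[1]), -l2 * np.sin(q[0] + q[1])],
                      [ l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1]),  l2 * np.cos(q[0] + q[1])]])
        # DLS update: dq = J^T (J J^T + lam^2 I)^(-1) e
        q += J.T @ np.linalg.solve(J @ J.T + lam**2 * np.eye(2), e)
    return q

q = dls_ik([0.3, 0.3], np.array([1.0, 0.8]), (1.0, 1.0))
```

The damping term keeps the update bounded near singular arm configurations, which is why DLS is preferred over the plain pseudoinverse in teleoperation.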
|
|
16:30-16:45, Paper We-S4-T14.3 | |
Gaze Matters: Eye Contact Detection in Unscripted Human-Robot Interaction Scenarios |
|
Hempel, Thorsten | Otto-Von-Guericke University |
Jung, Magnus | Otto-Von-Guericke University |
Strazdas, Dominykas | Otto-Von-Guericke University |
Al-Hamadi, Ayoub | Otto-Von-Guericke University |
Keywords: Human-Collaborative Robotics, Intelligence Interaction, Human-Machine Interface
Abstract: In human-robot interaction (HRI), eye contact is a crucial mechanism in nonverbal communication, yet its detection from the robot’s perspective remains largely underexplored. This paper presents a user study in a realistic collaborative HRI scenario with unscripted tasks to evaluate the performance of current eye contact detection models under realistic conditions. As part of this study, a new dataset of 5,888 manually annotated visual engagement instances was created to reflect the complexity of real-world interactions and to validate our previous work on NITEC, a large-scale dataset for eye contact detection. The results indicate that models trained on the NITEC dataset perform best with an AUC of 0.74 compared to other models, highlighting the importance of training on diverse and unconstrained data for robust eye contact recognition in HRI. These findings highlight both the technical challenges and the potential for more human-centric robotic systems.
|
|
16:45-17:00, Paper We-S4-T14.4 | |
Safe and Efficient SSM Collaborative Strategy Integrating Human Head and Hands Tracking with a RGB-D Camera |
|
Lettera, Gaetano | Marche Polytechnic University |
Callegari, Massimo | Marche Polytechnic University |
Scoccia, Cecilia | Marche Polytechnic University |
Keywords: Human-Collaborative Robotics, Systems Safety and Security, Supervisory Control
Abstract: In the evolving Industry 5.0 landscape, human-robot collaboration (HRC) is crucial to improve safety and productivity in shared industrial workspaces. This paper presents a cost-effective solution to track the human skeleton and identify the body parts that are most exposed to collision risk during a HRC task, i.e. the head and the hands, using a RGB-D camera. The depth images are processed through artificial intelligence (AI) algorithms, ensuring an accurate estimation of human motion and proximity to the robot. Furthermore, a robot control module has been developed to modulate the robot's speed based on the risk assessment calculated in real time, implementing a collaborative Speed and Separation Monitoring (SSM) scenario. To improve the efficiency of the workcell, the research proposes a multiplication coefficient to compute the minimum protective separation distance suggested by the ISO 10218-2, which weights the actual risk based on the biomechanical limits of the human part involved, according to the maximum permissible forces provided by the standard. Experimental results in a real robotic work cell demonstrate that the proposed system can achieve safety performance comparable to traditional motion capture systems, such as the OptiTrack; meanwhile it does not require operators to wear additional equipment and offers the advantages of reduced costs and a simpler setup, which is more affordable and adaptable to a wider range of industrial environments.
|
|
17:00-17:15, Paper We-S4-T14.5 | |
Perceptual Data Visualization for Remote Assistance of Automated Vehicles: A Design Study for Perception Modification |
|
Kerbl, Tobias | Technische Universität München |
Isildar, Tarik | Technical University of Munich |
El Alami, Yassine | Technical University of Munich |
Diermeyer, Frank | Technical University Munich |
Keywords: Human-Collaborative Robotics, Telepresence, User Interface Design
Abstract: Teleoperation is emerging as a crucial fallback for automated vehicles when facing unforeseen edge cases that exceed their capabilities. This work focuses on the remote assistance concept Perception Modification, where a remote operator resolves perception-related issues (e.g., neglecting false-positive detections) by modifying the vehicle’s environmental model without assuming full control. Effective perceptual data visualization is essential, enabling the remote operator to achieve sufficient situational awareness to identify perception-related issues while maintaining an acceptable mental workload. However, for Perception Modification it remains unclear which perceptual data should be visualized and how. This research implements and evaluates three visualization approaches tailored to Perception Modification. The conducted online user study shows a preference for integrating perceptual data into a single view, though further development is needed for real-world deployment.
|
|
We-S4-T15 |
Room 1.85 |
System Architecture |
Regular Papers - SSE |
Chair: Li, Anbang | Qilu University of Technology |
Co-Chair: Oliveira, Cayo | Universidade Federal De Pernambuco |
|
16:00-16:15, Paper We-S4-T15.1 | |
High-Performance FPGA-Based System for Diabetic Retinopathy Classification |
|
Shi, Fengtao | Tiangong University |
Xue, Yongjiang | Tiangong University |
Zhang, Jinzhu | Hebei University of Technology |
Song, Qingzeng | Tiangong University |
Keywords: System Architecture, Cyber-physical systems
Abstract: Diabetic retinopathy (DR), a major complication of diabetes, presents significant diagnostic challenges, including limited medical resources, insufficient personnel, and time-consuming examination procedures. To address these issues, this study proposes a high-performance FPGA-based intelligent DR classification system. By integrating an innovative MIL-ViT network architecture with FQ-ViT quantization technology, we achieve efficient real-time diagnosis on the Xilinx ZCU102 platform. The system demonstrates classification accuracies of 85.5% and 90.8% on the APTOS2019 and RFMiD2020 datasets, respectively. Through hardware optimization, we reduce single-image inference time to below 20 ms, meeting clinical real-time requirements. Notably, our dedicated acceleration architecture achieves an energy efficiency of 36.91 GOPs/W, a significant improvement over existing solutions. The complete algorithm-to-hardware co-design keeps quantization accuracy loss below 4% while achieving full-stack optimization.
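The FQ-ViT quantization the system deploys is out of scope for a program note, but the symmetric uniform post-training quantization such schemes build on is easy to illustrate. The bit width and weight distribution below are assumptions:

```python
import numpy as np

def quantize(w, bits=8):
    """Symmetric uniform post-training quantization of a weight tensor."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax          # one scale per tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                        # dequantized weights

rng = np.random.default_rng(3)
w = rng.normal(0.0, 0.1, 1000)              # hypothetical weight tensor
w_hat = quantize(w)
# Worst-case error is bounded by half a quantization step
rel_err = float(np.abs(w - w_hat).max() / np.abs(w).max())
```

Keeping this error small per layer is what makes sub-4% end-to-end accuracy loss plausible after quantizing a full network.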
|
|
16:15-16:30, Paper We-S4-T15.2 | |
A Redundancy Detection Framework for Distributed Tables in Data Mesh Environments |
|
Oliveira, Cayo | Universidade Federal De Pernambuco |
Matos, Rubens | IFSE |
Araujo, Jean | Universidade Federal Do Agreste De Pernambuco |
Dantas, Jamilson | UFPE |
Keywords: System Architecture, Distributed Intelligent Systems, Quality and Reliability Engineering
Abstract: As organizations adopt Data Mesh to decentralize data ownership, domain teams gain autonomy to manage and evolve their data products. While this fosters architectural flexibility, it also increases the risk of table duplication across distributed environments, affecting governance and storage efficiency. This paper introduces a graph-based methodology to detect such structural redundancies by modeling data architectures as directed graphs and applying subgraph isomorphism algorithms. A user-guided validation step confirms whether identified similarities correspond to actual duplications. The method supports both synthetic and benchmark-based scenarios — including an adapted TPC-DS schema — and enables algorithmic evaluation through execution time (ET), accuracy (ACC), and success frequency (SF). As a case study, two algorithms were tested: VF2 and a hybrid approach called Node Match, which uses degree-based filtering as a pre-step before applying VF2. Node Match demonstrated superior cost-efficiency. A Python-based tool was developed to implement the methodology, enabling graph simulation, interactive validation, and automated metric reporting.
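The degree-based pre-filtering idea behind Node Match can be sketched in plain Python: prune candidate node assignments by out-degree before the exhaustive VF2-style backtracking search. The toy lineage graphs and the induced-directed-subgraph semantics below are illustrative assumptions:

```python
def subgraph_isomorphic(pattern, target):
    """Induced directed subgraph isomorphism with a degree pre-filter.
    Graphs are dicts mapping node -> set of successor nodes."""
    nodes = list(pattern)
    # Pre-filter (Node Match idea): a pattern node can only map onto a
    # target node with at least the same out-degree.
    cand = {u: [v for v in target if len(target[v]) >= len(pattern[u])]
            for u in nodes}
    if any(not c for c in cand.values()):
        return False

    def consistent(u, v, mapping):
        # Edges between u and already-mapped pattern nodes must match exactly.
        return all((w in pattern[u]) == (x in target[v]) and
                   (u in pattern[w]) == (v in target[x])
                   for w, x in mapping.items())

    def search(i, mapping):
        if i == len(nodes):
            return True
        u = nodes[i]
        for v in cand[u]:
            if v not in mapping.values() and consistent(u, v, mapping):
                mapping[u] = v
                if search(i + 1, mapping):
                    return True
                del mapping[u]
        return False

    return search(0, {})

# Toy lineage graphs: `dup` (a 3-table chain) occurs inside `arch`.
arch = {"raw": {"clean"}, "clean": {"agg", "rep"}, "agg": set(), "rep": set()}
dup = {"a": {"b"}, "b": {"c"}, "c": set()}
found = subgraph_isomorphic(dup, arch)
```

The pre-filter costs one pass over the nodes but can eliminate whole branches of the search, which is consistent with the cost-efficiency advantage reported for Node Match over bare VF2.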
|
|
16:30-16:45, Paper We-S4-T15.3 | |
YOLO-BT: A High-Performance Framework for Accurate Brain Tumor Detection and Localization |
|
Liu, Jiehan | Zhejiang University |
Wan, Huagen | Zhejiang University |
Li, Jiajia | University of Shanghai for Science and Technology |
Keywords: System Architecture, System Modeling and Control, Modeling of Autonomous Systems
Abstract: Brain tumors pose a significant health risk to humans. For proper diagnosis and efficient treatment planning, early detection of brain tumors is crucial. Nevertheless, current automated detection techniques face several challenges, including insufficient robustness in complex environments, difficulties in recognizing boundaries, and limitations in identifying small tumors. To solve these problems, this paper proposes a novel YOLO-BT architecture that integrates the inverted Residual Mobile Block (iRMB), the Spatial and Channel Reconstruction Convolution (ScConv) module, and the Wise-IoU loss function into YOLOv9. Our experimental results demonstrate that on the Br35H dataset, YOLO-BT achieved a 1.6% improvement in mAP@0.5 compared to YOLOv9, while on the Axial-T1CE-2-Class dataset, YOLO-BT exhibited a 2.2% improvement over YOLOv9. This demonstrates that YOLO-BT is an effective and practical solution for brain tumor detection.
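Wise-IoU builds a dynamic focusing weight on top of the plain IoU between predicted and ground-truth boxes. The base quantity is simple to compute; the boxes below are arbitrary examples, not values from the paper:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

score = iou((0, 0, 10, 10), (5, 5, 15, 15))  # quarter-overlapping boxes
```

Wise-IoU then scales the resulting 1 - IoU loss by an outlier-aware weight so that low-quality boxes do not dominate the gradients, which is especially relevant for the small tumors the paper targets.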
|
|
16:45-17:00, Paper We-S4-T15.4 | |
Acquisition of Interpretable Domain Information During Brain MR Image Harmonization for Content-Based Image Retrieval |
|
Abe, Keima | Hosei University |
Muraki, Hayato | Hosei University |
Tomoshige, Shuhei | Hosei University |
Oishi, Kenichi | Johns Hopkins University School of Medicine |
Iyatomi, Hitoshi | Hosei University |
Keywords: System Architecture, Technology Assessment
Abstract: Medical images like MR scans often show domain shifts across imaging sites due to scanner and protocol differences, which degrade machine learning performance in tasks such as disease classification. Domain harmonization is thus a critical research focus. Recent approaches encode brain images x into a low-dimensional latent space z, then disentangle it into z_u (domain-invariant) and z_d (domain-specific), achieving strong results. However, these methods often lack interpretability—an essential requirement in medical applications—leaving practical issues unresolved. We propose Pseudo-Linear-Style Encoder Adversarial Domain Adaptation (PL-SE-ADA), a general framework for domain harmonization and interpretable representation learning that preserves disease relevant information in brain MR images. PL-SE-ADA includes two encoders f_E and f_SE to extract z_u and z_d, a decoder to reconstruct the image f_D, and a domain predictor g_D. Beyond adversarial training between the encoder and domain predictor, the model learns to reconstruct the input image x by summing reconstructions from z_u and z_d, ensuring both harmonization and informativeness. Compared to prior methods, PL-SE-ADA achieves equal or better performance in image reconstruction, disease classification, and domain recognition. It also enables visualization of both domain-independent brain features and domain-specific components, offering high interpretability across the entire framework.
|
|
17:00-17:15, Paper We-S4-T15.5 | |
ScalaCrypt: A Secure Multi-Core Parallel System Architecture for High-Throughput Encrypted Data Transmission |
|
Li, Anbang | Qilu University of Technology |
Qin, Yixin | Qilu University of Technology |
Meng, Xiaojuan | Qilu University of Technology (Shandong Academy of Sciences) |
Yan, Yunbo | Qilu University of Technology (Shandong Academy of Sciences) |
Wang, Jiaxiang | Jinan Institute of Supercomputing Technology |
Guo, Meng | Qilu University of Technology |
Keywords: System Architecture, System Modeling and Control, Communications
Abstract: Long-distance RDMA technology leverages FPGA-accelerated protocol conversion to achieve low-latency, high-bandwidth transmission across data centers. However, end-to-end security for encrypted data requires tight coordination with transmission efficiency. RDMA's reliance on lossless networks and microsecond-level latency demands hardware-level encryption implementation, as traditional software-based encryption or single-core hardware encryption cannot meet real-time and low-latency requirements. To address this, this paper proposes an FPGA-based high-throughput scalable AES-GCM system architecture designed to overcome the core bottlenecks of traditional hardware encryption solutions in throughput, resource efficiency, and dynamic scalability. By integrating multi-core parallel processing, a master-slave core collaborative scheduling mechanism, and dynamic channel scaling technology, the architecture achieves 12.6 Gbps encryption throughput with a single 16-core module on Xilinx Kintex UltraScale FPGAs, delivering a power efficiency of 6.3 Gbps/W—a 34x improvement over CPU/GPU heterogeneous solutions. Furthermore, the architecture reduces LUT resource consumption by 20.54% and 28.46% compared to implementations on Microchip PolarFire and Lattice ECP5 FPGAs, respectively. Through physical isolation and zero-copy key distribution techniques, it ensures data security in multi-tenant environments while significantly enhancing the safety of long-distance RDMA over 10 Gbps links.
|
|
We-S4-BMI.WS |
Room 0.49&0.50 |
BMI Workshop - Paper Session 5: Active BCIs |
BMI Workshop |
Chair: Dehais, Frederic | ISAE-SUPAERO |
|
16:00-16:15, Paper We-S4-BMI.WS.1 | |
Brainwave-Based TAN Authentication: An SSVEP BCI for Secure Web Transactions (I) |
|
Cantürk, Atilla | Rhine-Waal University of Applied Sciences |
Volosyak, Ivan | Rhine-Waal University of Applied Sciences |
Keywords: Active BMIs, BMI Emerging Applications, Other Neurotechnology and Brain-Related Topics
Abstract: This paper presents a user-friendly SSVEP-based Brain–Computer Interface (BCI) system for secure transaction authentication using a 6-digit ternary Transaction Authentication Number (TAN). The system eliminates training and minimizes cognitive load via a single-stimulus design and provides a novel kind of BCI application. A filter bank canonical correlation analysis (FBCCA) based algorithm enables accurate classification without prior calibration. Sixteen participants completed five authentication trials each using a web-based donation task. All participants performed at least one successful authentication, with nine participants (56%) achieving perfect accuracy across all trials. The system reached an average classification accuracy of 94.6% and a mean authentication time of 17.2 seconds. Compared to a recent cVEP-based TAN system, this reflects a 35.6 percentage point improvement in accuracy and a 61.9% reduction in authentication time. Participants reported low mental and physical strain, which indicates high usability. Future work includes transitioning to dry electrodes and embedded real-time processing to support deployment in real-world applications.
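FBCCA extends plain canonical correlation analysis by running it on several band-pass filtered sub-bands and combining the weighted correlations. The core CCA step, correlating EEG against sine/cosine reference signals at each candidate frequency, can be sketched in NumPy (the sampling rate, channel count, frequencies, and noise level below are synthetic assumptions):

```python
import numpy as np

def cca_corr(X, Y):
    """Largest canonical correlation between the column spaces of X and Y."""
    Qx, _ = np.linalg.qr(X - X.mean(0))
    Qy, _ = np.linalg.qr(Y - Y.mean(0))
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0]

def ssvep_detect(eeg, fs, freqs, n_harm=2):
    """Pick the stimulus frequency whose harmonic sine/cosine references
    correlate best with the multichannel EEG."""
    t = np.arange(eeg.shape[0]) / fs
    scores = []
    for f in freqs:
        Y = np.column_stack([fn(2 * np.pi * f * h * t)
                             for h in range(1, n_harm + 1)
                             for fn in (np.sin, np.cos)])
        scores.append(cca_corr(eeg, Y))
    return freqs[int(np.argmax(scores))]

# Synthetic 2-channel EEG with a 10 Hz SSVEP component buried in noise.
rng = np.random.default_rng(1)
fs = 250
t = np.arange(int(fs * 2.0)) / fs
sig = np.sin(2 * np.pi * 10 * t)
eeg = np.column_stack([sig + rng.normal(0, 1.0, t.size),
                       0.8 * sig + rng.normal(0, 1.0, t.size)])
detected = ssvep_detect(eeg, fs, [8.0, 10.0, 12.0, 15.0])
```

The filter-bank part then repeats this scoring on band-pass filtered copies of the EEG and sums the squared correlations with decaying weights, which is what lets the method run without per-user calibration.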
|
|
16:15-16:30, Paper We-S4-BMI.WS.2 | |
Enhancing Motor Imagery Decoding with Environmental Context During Robot Control |
|
Simonetto, Piero | University of Padova |
Toniolo, Sebastiano | University of Padova |
Tortora, Stefano | Intelligent Autonomous System Lab, Department of Information Eng |
Menegatti, Emanuele | University of Padua |
Tonin, Luca | University of Padova |
Keywords: Active BMIs, BMI Emerging Applications
Abstract: Motor imagery (MI) is a fundamental brain-machine interface (BMI) paradigm in which users learn how to modulate their brain signals to voluntarily and selectively activate specific areas of the sensorimotor cortex. The self-paced nature of this kinesthetic imagination makes MI well-suited for several human-robot interaction (HRI) scenarios. However, the performance of MI decoding is highly dependent on both the user’s expertise and the quality of the decoding algorithm. To address these challenges, we propose a method that integrates environmental data from robotic sensors to enhance the MI decoding process. Preliminary experiments indicate that this approach improves decoding accuracy and the overall performance of the brain-driven system, opening new opportunities for research on how machines can enhance the usability of BMI systems.
|
|
16:30-16:45, Paper We-S4-BMI.WS.3 | |
Steady-State Motion Visual Evoked Potentials with 3D Stimuli in a VR-Based BCI (I) |
|
Scheppink, Hanneke | Rhine-Waal University of Applied Sciences |
Cortés Navarro, María del Carmen | Hochschule Rhein-Waal |
Volosyak, Ivan | Rhine-Waal University of Applied Sciences |
Keywords: Active BMIs, BMI Emerging Applications, Other Neurotechnology and Brain-Related Topics
Abstract: This study investigated the potential of steady-state motion visual evoked potential (SSMVEP) using 3D stimuli as a preliminary step towards developing a brain-computer interface (BCI) for command selection in a virtual reality (VR) environment. Several movement patterns and stimulus shapes were examined to determine the optimal stimulus configurations for eliciting strong and robust SSMVEP responses. The aim was to identify the most effective stimulus and SSMVEP movement to improve user comfort, interaction and control accuracy for future immersive VR applications presented in a head-mounted display (HMD). The proposed shapes, cube and diamond, yielded comparable performance in the steady-state visual evoked potential (SSVEP) condition; however, for the zooming and rotating movements, the cube achieved higher average accuracies. In the frequency spectrum, the average signal-to-noise ratio (SNR) values of the diamonds were comparable to those of the cubes. Subjective results showed that participants had a preference for the rotating diamonds.
|
|
16:45-17:00, Paper We-S4-BMI.WS.4 | |
Canine EEG in Natural Settings: Auditory Evoked Potentials During Free Movement Using a Miniature Wireless System |
|
Schreiner, Leonhard | G.tec Medical Engineering GmbH |
Sieghartsleitner, Sebastian | G.tec Medical Engineering GmbH |
Fummo, Marco | G.tec Medical Engineering GmbH |
Guger, Christoph | G.tec |
Keywords: BMI Emerging Applications, Other Neurotechnology and Brain-Related Topics, Active BMIs
Abstract: This study presents a non-invasive methodology for recording electroencephalography (EEG) in awake, unrestrained dogs using a compact, wireless 8-channel system. EEG data were collected from two dogs of different breeds during an auditory oddball paradigm without sedation. Eight gold cup electrodes were symmetrically positioned on the scalp, and recordings were sampled at 250 Hz. Signal quality was validated using power spectral density analysis. Auditory evoked potentials (AEPs) reliably differentiated target from non-target tones, with a clear N100 response observed. In both dogs, the largest amplitudes were recorded over the right lateral temporal region. These findings demonstrate the feasibility of rapid, stress-free EEG acquisition in dogs and other non-human animals and support the use of mobile neurotechnology in naturalistic settings. This approach advances canine cognitive neuroscience and contributes more broadly to the study of brain activity in ecologically valid conditions across species.
|
|
17:00-17:15, Paper We-S4-BMI.WS.5 | |
A Study on the Effectiveness of Complementary Labels to Update Decoders for MI-BCI Systems |
|
Kanda, Takuya | NTT Corporation |
Isezaki, Takashi | NTT Corporation |
Okitsu, Kengo | NTT |
Keywords: Active BMIs
Abstract: The real-time update of decoders during Motor Imagery-based Brain-Computer Interface (MI-BCI) operation is a critical technological challenge. MI classification features exhibit variability not only across subjects but also during usage, necessitating model tuning during operation. Due to the non-stationarity of EEG signals, MI-BCIs require adapting the model before each session. However, data collection for tuning is often expensive and time-consuming. To address this challenge, weakly supervised learning with complementary labels has been proposed as a means to reduce the cost of training data collection while allowing subject-specific tuning. Despite its potential, this method has not yet been applied to MI classification, and its effectiveness (in terms of learnability and performance improvement) remains unclear. This study investigates the feasibility of applying complementary label-based updates to MI-BCI models using open MI datasets. The results demonstrated a significant accuracy improvement in 3-class classification, highlighting the potential for efficient real-time model adaptation.
|
|
17:15-17:30, Paper We-S4-BMI.WS.6 | |
Multi-Direction Imagined Hand Movement Classification Using EEG-Based Brain Computer Interface |
|
Gangadharan K, Sagila | Singapore Institute of Technology |
Parashiva, Praveen Kumar | Singapore Institute of Technology |
A. P., Vinod | Singapore Institute of Technology |
Keywords: BMI Emerging Applications, Active BMIs, Passive BMIs
Abstract: Non-invasive decoding of imagined movements has significant potential in developing neuro-prosthetics and assistive technology for people with motor disabilities. While there has been extensive research on classifying bilateral movement imaginations, classification of the kinematics associated with movement imaginations of a unilateral limb is still an open research problem. Decoding the movement imaginations of a single limb can enhance human-computer interaction by enabling more natural and intuitive movement control. This work aims to classify the direction of imagined hand movement in a four-directional motor imagery (MI) task. Electroencephalogram (EEG) signals were recorded from 14 healthy subjects while they imagined center-out movement of their right hand in four orthogonal directions (right, left, up, and down). A Wavelet Phase Common Spatial Pattern (WPCSP) method is proposed to extract useful features from EEG to decode the imagined movement directions. The proposed method extracts informative features from the instantaneous phase signal using Common Spatial Patterns, which are then classified using a linear discriminant classifier, resulting in a mean binary direction classification accuracy of 68.54±7.5% and a four-class direction classification accuracy of 38.48±8.1% across the 14 healthy subjects. The results highlight the significance of phase-based features in decoding imagined kinematics of unilateral limb movements. This outcome is a step towards achieving higher degrees of freedom of movement and enhancing the efficacy of rehabilitation strategies.
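The proposed WPCSP builds on Common Spatial Patterns (CSP). Below is a minimal sketch of plain two-class CSP via a generalized eigendecomposition; the authors' variant applies this to wavelet-derived instantaneous phase signals, whereas this sketch uses raw synthetic trials with assumed array shapes:

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(X1, X2, n_pairs=1):
    """Two-class CSP. X1, X2: (n_trials, n_channels, n_samples) arrays.
    Returns (2*n_pairs, n_channels) spatial filters that maximise the
    variance of one class relative to the other."""
    C1 = np.mean([np.cov(t) for t in X1], axis=0)
    C2 = np.mean([np.cov(t) for t in X2], axis=0)
    # Generalised eigenproblem C1 w = lambda (C1 + C2) w; eigenvalues
    # near 0 favour class 2, eigenvalues near 1 favour class 1.
    vals, vecs = eigh(C1, C1 + C2)  # ascending eigenvalue order
    picks = np.r_[np.arange(n_pairs), np.arange(len(vals) - n_pairs, len(vals))]
    return vecs[:, picks].T

def log_var_features(W, X):
    """Log of normalised variance of filtered trials -> classifier input."""
    Z = np.einsum('fc,ncs->nfs', W, X)
    v = Z.var(axis=2)
    return np.log(v / v.sum(axis=1, keepdims=True))

# Tiny synthetic check: class 1 is strong in channel 0, class 2 in channel 1.
rng = np.random.default_rng(0)
X1 = rng.standard_normal((20, 3, 200)); X1[:, 0] *= 5.0
X2 = rng.standard_normal((20, 3, 200)); X2[:, 1] *= 5.0
W = csp_filters(X1, X2)
F1, F2 = log_var_features(W, X1), log_var_features(W, X2)
```

In the paper, features of this kind are then fed to a linear discriminant classifier for direction decoding.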
|
|
We-Online |
Online Room |
Online Session HMS |
Regular Papers - HMS |
|
08:30-17:30, Paper We-Online.1 | |
SDRNet: Joint Modeling of Static and Dynamic Representations for Breast Ultrasound Diagnosis |
|
Song, Ziyang | Shanghai Jiao Tong University |
Li, Yiming | Shanghai Jiao Tong University |
Zhu, Ying | Ruijin Hospital, Shanghai Jiao Tong University School of Medicin |
Xu, Yi | Shanghai Jiao Tong University |
Keywords: Medical Informatics
Abstract: Breast ultrasound (BUS) diagnosis requires synergistic analysis of static images (capturing 2D features like microcalcifications and boundaries) and dynamic videos (revealing 3D structural dynamics such as ductal infiltration and fluid mobility), particularly for complex lesions including intraductal carcinomas and phyllodes tumors. Existing deep learning methods predominantly focus on single modalities, limiting diagnostic accuracy. To address this, we propose SDRNet, a dual-branch framework integrating three innovations: 1) a Medical Prior-Guided Sampling (MPGS) module that automatically selects diagnostically critical video keyframes; 2) cross-modal attention fusion enabling bidirectional alignment of static spatial details with spatiotemporal video patterns; and 3) Conditional Layer Normalization (CLN) that injects static anatomical context into video feature learning, enhancing fine-grained cross-modal alignment. A temporal attention module further prioritizes diagnostically salient video segments. Evaluated on a BUS dataset containing challenging dual-modality cases, SDRNet achieves state-of-the-art performance (92.6% AUC, 87.0% accuracy and 88.2% malignant F1-score), outperforming single-modality models by 2.8% in AUC and late-fusion baselines by 2.6%.
|
|
08:30-17:30, Paper We-Online.2 | |
DP-GaussTalk: Dual-Path Audio-Driven Feature Fusion for 3D Gaussian-Based Talking Head Synthesis |
|
Chen, Zhiwei | Xinjiang University |
Yu, Yinfeng | Xinjiang University |
Wang, Liejun | Xinjiang University |
Keywords: Multimedia Systems, Human Perception in Multimedia, Virtual/Augmented/Mixed Reality
Abstract: Audio-driven 3D talking head generation is a key research topic in artificial intelligence, with applications in virtual assistants, film production, and online education. However, current methods face challenges such as precise audio-visual synchronization, detailed facial expression reconstruction, and real-time rendering efficiency. To address these issues, we propose DP-GaussTalk, a novel framework for generating audio-driven 3D Gaussian-based talking heads. Our method uses a WavLM-based audio feature extraction network to obtain multi-scale audio representations and incorporates a Temporal Audio Compressor (TACo) for feature refinement, enabling more precise deformation of the 3D Gaussian model. In addition, we introduce a dual-path cross-attention mechanism to enhance the alignment of audio and 3D visual features, improving lip-sync accuracy and facial detail fidelity. We also include a HyperRestore module to improve the visual quality of the synthesized video. Experimental results on multiple benchmark datasets demonstrate that DP-GaussTalk outperforms existing methods.
|
|
08:30-17:30, Paper We-Online.3 | |
Personalized Electrical Muscle Stimulation for Precise Weight Perception in Virtual Reality: A Machine Learning Approach |
|
Vrontos, Apostolos | Institute of Industrial Engineering and Ergonomics, RWTH Aachen |
Zolnouri, Reza | Institute of Industrial Engineering and Ergonomics, RWTH Aachen |
Nitsch, Verena | Institute of Industrial Engineering and Ergonomics, RWTH Aachen |
Mertens, Alexander | Institute of Industrial Engineering and Ergonomics, RWTH Aachen |
Brandl, Christopher | Fraunhofer Institute for Communication, Information Processing A |
Keywords: Haptic Systems, Human-Machine Interaction, Virtual/Augmented/Mixed Reality
Abstract: Electrical muscle stimulation (EMS) can create compelling weight sensations in virtual reality by activating antagonist muscles, prompting users to exert effort against induced contractions. This experiment investigates personalizing EMS amplitudes to precisely make a 2 kg object feel 1 kg heavier. This task is challenging due to high individual variability in sensory, motor, and pain thresholds, as well as physical characteristics. We applied nine personalized EMS intensity levels to the triceps and extensor carpi ulnaris muscles of 75 participants, who then compared the perceived weight of the stimulated arm against a 3 kg reference. A machine learning model was trained on the collected data to predict the perceived weight difference from user-specific features. The best-performing XGBoost model achieved a classification accuracy of 77%. Feature analysis revealed that the applied EMS amplitude was the most influential predictor, supplemented by individual motor and pain thresholds, body mass index, and forearm length. This ML-based personalization is a significant advancement for creating adaptive haptic feedback, with applications in immersive VR, physical rehabilitation, and physiotherapy.
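As a rough illustration of the prediction setup (a sketch, not the study's pipeline or data), the following trains a gradient-boosted classifier on synthetic stand-in features named after the reported predictors. scikit-learn's GradientBoostingClassifier substitutes for XGBoost, and the generative rule for the labels is entirely made up:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 600
# Stand-in features named after the reported predictors (units hypothetical):
# EMS amplitude, motor threshold, pain threshold, BMI, forearm length.
X = np.column_stack([
    rng.uniform(5, 30, n),    # ems_amplitude
    rng.uniform(3, 10, n),    # motor_threshold
    rng.uniform(10, 35, n),   # pain_threshold
    rng.uniform(18, 32, n),   # bmi
    rng.uniform(22, 30, n),   # forearm_length
])
# Made-up generative rule (NOT the study's data): the "felt 1 kg heavier"
# label depends mostly on amplitude relative to the motor threshold.
y = (X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 2, n) > 5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```

The study's feature-importance analysis corresponds to inspecting `clf.feature_importances_` after fitting.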
|
|
08:30-17:30, Paper We-Online.4 | |
Facial Action Unit Detection Based on Differential Attention and Spatial-Temporal Interactive Fusion |
|
Li, Xiuzhen | Qilu University of Technology (Shandong Academy of Sciences) |
Zhang, Peng | Qilu University of Technology, Shandong Computer Science Center |
Zhao, Wei | Qilu University of Technology (Shandong Academy of Sciences) |
Li, Jian | Qilu University of Technology(Shandong Academy of Sciences) |
Wang, Fuqiang | Qilu University of Technology, Shandong Computer Science Center |
Li, Ye | Qilu University of Technology (Shandong Academy of Sciences) |
Wu, Xiaoming | Qilu University of Technology, Shandong Computer Science Center |
Keywords: Human-Computer Interaction, Human-Machine Interface, Affective Computing
Abstract: Facial action unit (AU) is one of the objective representations of facial behaviors. Since the changes in AU are subtle, it is challenging to capture the local regions related to AU, and utilizing the spatial-temporal information becomes crucial for AU detection. However, the relationship between AUs in different frames and the complementary information in spatial-temporal features are not well explored. In this paper, we propose an AU detection model based on differential attention (DA) and spatial-temporal interactive fusion (STIF). The DA module is designed to enhance the local spatial features related to AU with dynamic temporal information. By interactively fusing the spatial-temporal features, the STIF module is proposed to capture the complementary information and obtain more robust representations of AU features. Afterwards, we further learn the dependencies between AUs by introducing an AU relationship learning module. Compared with other state-of-the-art methods, our method achieves competitive performance on two widely-used datasets, BP4D and DISFA, with average F1 scores of 65.1% and 65.4%, respectively. The model contains only 11.7M parameters, which is relatively lightweight for AU detection.
|
|
08:30-17:30, Paper We-Online.5 | |
PB-NeRF: Part-Based Neural Radiance Fields for Dynamic Human Rendering |
|
Jiang, Ming | Guilin University of Electronic Technology |
Li, Jiawei | Guilin University of Electronic Technology |
Zhang, Hao | Central South University |
Lu, Yao | Guilin University of Electronic Technology |
Tang, Yan | Central South University |
Keywords: Virtual/Augmented/Mixed Reality, Virtual and Augmented Reality Systems, Human Enhancements
Abstract: Dynamic human rendering with Neural Radiance Fields (NeRF) has made significant progress, especially under part-based rendering paradigms. However, due to the non-rigid nature of human skin and clothing, sampling points often drift, leading to blurred transitions at the junctions of segmented regions. To address this issue, we propose PB-NeRF, a novel framework that introduces a consistency loss to enforce bi-directional consistency in sampling point transitions. Additionally, we design a dedicated module that extracts pose features of individual body parts from keyframes to guide the rendering of occluded regions. Experimental results show that PB-NeRF effectively reduces artifacts in part-based human rendering and significantly improves overall rendering quality. Our method not only accelerates training on the ZJU-MoCap and MonoCap datasets but also produces high-fidelity images, demonstrating both its efficiency and robustness.
|
|
08:30-17:30, Paper We-Online.6 | |
M2ST-Net: Human-Object Interaction Recognition Using a Multi-Stream Multi-Feature Spatial-Temporal Network |
|
Wu, Bohong | Sun Yat-Sen University |
Gao, Qing | Sun Yat-Sen University |
Lai, Yuanchuan | Sun Yat-Sen University, School of Electronics and Information Te |
Huiwen, Zhang | Pudu Robotics |
Keywords: Human-Machine Interaction, Human-Computer Interaction, Visual Analytics/Communication
Abstract: Human-robot interaction (HRI) relies on accurate human-object interaction (HOI) recognition to enable collaborative behaviors. While HOI recognition has been considered for enabling intuitive robot interactions, video-based implementations face challenges in handling complex object pose variations and temporal dynamics. This work proposes a Multi-stream Multi-feature Spatial-Temporal Network (M²STNet) that synergistically integrates visual, geometric, and global relational features. The framework leverages Faster R-CNN-extracted ROI features for visual perception, while dedicated geometric and global encoders respectively capture human-object spatial dynamics and entity positional relationships. Evaluations on MPHOI-72, CAD-120, and Bimanual Actions demonstrate superior performance over existing methods. HRI validation on dual-arm robots confirms practical applicability of our methods.
|
|
08:30-17:30, Paper We-Online.7 | |
Speech Emotion Recognition Based on Multi-Scale and Adaptive Feature Fusion |
|
Zhang, Wenhao | Qilu University of Technology (Shandong Academy of Sciences) |
Zhang, Peng | Qilu University of Technology, Shandong Computer Science Center |
Zhang, Jianan | Qilu University of Technology (Shandong Academy of Sciences) |
Li, Xiuzhen | Qilu University of Technology (Shandong Academy of Sciences) |
Zhang, Jianqiang | Qilu University of Technology, Shandong Computer Science Center |
Zhao, Wei | Qilu University of Technology (Shandong Academy of Sciences) |
Li, Ye | Qilu University of Technology (Shandong Academy of Sciences) |
Keywords: Affective Computing, Multimedia Systems, Human-Machine Interaction
Abstract: Speech emotion recognition (SER) aims to predict emotional states from speech data. Current neural network based SER methods typically adopt a fixed attention granularity, which limits their ability to capture multi-scale features and interactive information. In this paper, we propose a multi-scale and adaptive feature fusion (MAFF) model for SER tasks. First, feature extraction with multi-scale convolutional neural networks is designed to extract key shallow features with different granularities. Then, an adaptive feature fusion network (AFF-Net) is developed to enhance the feature representation capability by exploiting the interactions between local and non-local features. It consists of two branches: the efficient self-attention (ESA) branch for capturing fine-grained local features and the global information estimation (GIE) branch for modeling non-local information. Experiments on the IEMOCAP and RAVDESS datasets show weighted accuracies of 74.63% and 85.62%, respectively, which are superior to the performance of other state-of-the-art methods.
|
|
08:30-17:30, Paper We-Online.8 | |
Fine-Grained Action Recognition Using Cross-Modal Attention Network for Human-Robot Sign Language Interaction |
|
Hu, Jing | Sun Yat-Sen University |
Gao, Qing | Sun Yat-Sen University |
Cheng, Xianfeng | Sun Yat-Sen University |
Xuerui, Li | University at Buffalo |
Ju, Zhaojie | University of Portsmouth |
Keywords: Human-Machine Interaction, Human-Computer Interaction, Visual Analytics/Communication
Abstract: With the growing demand for barrier-free communication among the deaf and mute, human-robot sign language interaction has gradually gained attention as an auxiliary tool. Action recognition serves as a crucial information source for robots to understand human behavior, enabling robots to recognize signs and achieve natural interaction with deaf people through it. However, existing HRI technologies based on action recognition predominantly focus on coarse-grained human movements, failing to capture and respond to nuanced actions in real-world scenarios. Additionally, multi-modal action recognition often employs early or late fusion methods to integrate various modalities, lacking the exploration of relationships between modalities, resulting in the loss of some correlated information. To enable robots to adapt to diverse scenarios for a more nuanced understanding of human behaviors, we propose a fine-grained action recognition framework using Cross-Modal Attention network (CMA) based on RGB and skeleton. Firstly, holistic features including face, hand, and body are extracted by a pose estimator, effectively representing intricate human actions. Subsequently, to fully leverage the extracted fine-grained features, skeleton is represented as heatmap volumes. Finally, a Cross-Attention Interaction (CAI) module is designed to explore the intrinsic connections between RGB and skeleton, facilitating mutual learning of their respective advantageous features in the deep layers of feature extraction, thereby achieving information interaction. Simultaneously, HRI experiments are conducted on the large-scale fine-grained action dataset, WLASL2000. In this HRI system, the robotic arm responds by performing sign language aligned with the human actions identified by CMA, showcasing the practicality and effectiveness of our proposed model in real-world scenarios.
|
|
08:30-17:30, Paper We-Online.9 | |
SignRobot: Sign Language Recognition for Robot Interaction Based on Dual-Stream Multi-Fusion with Frame Enhancement Network (I) |
|
Hu, Jing | Sun Yat-Sen University |
Gao, Qing | Sun Yat-Sen University |
Lai, Yuanchuan | Sun Yat-Sen University, School of Electronics and Information Te |
Zhang, Yang | Shenzhen Technology University |
Ju, Zhaojie | University of Portsmouth |
Keywords: Human-Machine Interaction, Human-Computer Interaction, Visual Analytics/Communication
Abstract: Deaf and hard-of-hearing individuals often rely on signs for communication, but limited translation resources restrict their daily needs. Robots equipped with sign language recognition and interaction capabilities can assist in bridging this gap. To enhance sign language translation resources, we design a robotic system called SignRobot, which accurately recognizes and responds to signs. To improve recognition performance, we develop a sign language recognition network based on dual-stream multi-fusion and frame enhancement (MFE-Net), using RGB and heatmaps as inputs. Specifically, a frame enhancement module with parallel spatial and motion guidance is introduced to emphasize key spatial regions and movement changes. By employing a multi-fusion strategy, we achieve feature-level interaction and adaptive late fusion between modalities, improving accuracy and robustness. Experimental results show that MFE-Net surpasses state-of-the-art methods on the PHOENIX14 and PHOENIX14-T datasets. Additionally, our SignRobot successfully demonstrates sign recognition and robotic responses in sign language, representing a promising advancement in robot-assisted communication for deaf people.
|
|
08:30-17:30, Paper We-Online.10 | |
GBA-Net: A Multi-Task Medical Image Classification Model Based on Global Biaxial Attention |
|
Li, Haoyang | South China Normal University |
Qiu, Guansheng | South China Normal University |
Zeng, Xiyin | The Hong Kong University of Science and Technology (Guangzhou) |
Liu, Siwei | University of Aberdeen |
Keywords: Medical Informatics
Abstract: Medical image classification is critical in computer-aided diagnosis. However, existing methods suffer from high computational complexity and struggle to efficiently fuse multi-scale features. Extracting global representations while preserving local details remains challenging, and class imbalance further limits performance. To address these issues, we propose GBA-Net, a multi-scale feature fusion framework based on a ResNet-50 backbone. GBA-Net leverages Global Biaxial Attention (GBA) to enhance feature extraction, efficiently modeling both local and global information by integrating multi-scale features along spatial and channel dimensions while reducing computational cost. Additionally, we introduce Balanced Softmax Focal Loss (BSFL), which dynamically adjusts class weights based on class frequency and hard-to-classify samples, improving classification accuracy for minority classes and enhancing overall model robustness. Experimental results show that GBA-Net effectively captures comprehensive spatial information while alleviating class imbalance, achieving significant classification accuracy improvements across six benchmark datasets from four medical imaging modalities, with gains of up to 15.75%.
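The BSFL described above combines two known ingredients: balanced softmax, which adjusts logits by log class priors so rare classes are not drowned out, and a focal term that down-weights well-classified samples. One plausible combination is sketched below (an assumption for illustration, not necessarily the authors' exact formula):

```python
import numpy as np

def balanced_softmax_focal_loss(logits, labels, class_counts, gamma=2.0):
    """Balanced softmax adds log class priors to the logits; the focal
    factor (1 - p_y)^gamma down-weights samples the model already
    classifies confidently. (Assumed formulation, not the paper's.)"""
    z = logits + np.log(np.asarray(class_counts, dtype=float))
    z = z - z.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    p_y = p[np.arange(len(labels)), labels]
    return np.mean((1.0 - p_y) ** gamma * -np.log(np.clip(p_y, 1e-12, 1.0)))
```

With identical logits, a minority-class sample then incurs a larger loss than a majority-class one, which is the intended rebalancing effect.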
|
|
08:30-17:30, Paper We-Online.11 | |
Improving Accuracy of Contactless Palmprint Recognition Using Gabor Filters and Local Binary Pattern for Highly Secure Biometric Authentication Applications |
|
Zayan, Mubashwira | North South University |
Abdur Rahman, Hafiz | North South University |
Oeshi, Farhana Akter | North South University |
Keywords: Biometrics and Applications, Human-Machine Interaction
Abstract: Authentication systems face growing challenges in the era of AI. To address this challenge, this paper proposes a highly accurate and reliable authentication method using the user's palmprint. Our method is based on pattern identification from palmprints, utilizing a blend of sophisticated image processing methods and machine learning algorithms, particularly Gabor filters and Local Binary Pattern (LBP) for feature extraction. To improve our results, we also employed Principal Component Analysis (PCA) for dimensionality reduction and a Support Vector Machine (SVM) for classification. Our results show that the model performs excellently across all classes, with minimal misclassifications. In terms of correctly detecting palmprints, our approach achieves a very high accuracy of about 99%.
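The feature-extraction stages named here (Gabor filtering followed by LBP histograms, ahead of PCA and an SVM) can be sketched with self-contained NumPy/SciPy code. Kernel sizes and filter parameters below are illustrative choices, not the paper's settings:

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(ksize=21, sigma=4.0, theta=0.0, lam=10.0, gamma=0.5):
    """Real part of a Gabor filter: a Gaussian envelope times a cosine
    carrier, oriented at angle theta (parameters are illustrative)."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    env = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
    return env * np.cos(2 * np.pi * xr / lam)

def lbp_histogram(img):
    """8-neighbour Local Binary Pattern codes, returned as a normalised
    256-bin histogram usable as a texture feature vector."""
    c = img[1:-1, 1:-1]
    code = np.zeros(c.shape, dtype=int)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        n = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code += (n >= c).astype(int) << bit
    hist = np.bincount(code.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

# Gabor response of a (random stand-in) palm image, then LBP features;
# the resulting vector would feed PCA and an SVM downstream.
img = np.random.default_rng(1).random((32, 32))
feat = lbp_histogram(convolve2d(img, gabor_kernel(), mode='same'))
```

In a full pipeline, one histogram per Gabor orientation would typically be concatenated before PCA and SVM classification.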
|
|
08:30-17:30, Paper We-Online.12 | |
Adding Multi-Scale Priors for 3D Pose Estimation |
|
Dang, Jia | Xi'an Jiaotong University |
Li, Tianyi | Xi'an Jiaotong University |
Liu, Yuehu | Xi'an Jiaotong University |
Keywords: Human-Machine Interaction, Human-Machine Cooperation and Systems
Abstract: Recently solutions based on human prior information have been introduced to estimate 3D human pose from 2D keypoint sequence. These methods aim to learn human skeleton knowledge by analyzing joint connections within the body structure. However, we observe that previous methods cannot capture motion integrity and effectively model action correlation, resulting in the lack of smoothness and coherence in the estimated pose sequence. Therefore, we propose to use multi-scale priors for 3D pose estimation, namely Body-Scale Prior (BSP) and Action-Scale Prior (ASP). These modules take advantage of the body structure and action semantics, to enhance the network's capability of understanding human pose. BSP models human body knowledge by constructing the skeleton joint structure and relationship between frames, while ASP learns high-dimensional action semantics by constructing topological encoding of the action sequence. By fusing prior features from these two scales, our method enables simultaneous modeling of skeleton joints and integral actions. Extensive experiments are conducted on the most popular benchmark: Human3.6M. The results show that our model achieves better performance in comparison to state-of-the-art methods.
|
|
08:30-17:30, Paper We-Online.13 | |
The Influence of Educational Chatbots with Social Cues on Students’ Learning Experience (I) |
|
Ma, Tiancong | HK PolyU |
Saionji, Ayaka | South China University of Technology |
Keywords: Human-Computer Interaction, Human-centered Learning
Abstract: This study proposes an educational chatbot with high-level social cues that conveys emotions, facial expressions, and body movements through avatar animation, and applies it to educational practice. The experiment recruited 104 middle school students, divided into a control group (taught by traditional teachers) and two experimental groups (chatbots using high social cues and low social cues respectively). After 10 classes over 5 days, we compared the learning effects and satisfaction of students in each group. The results show that chatbots with high social cues significantly improve learning results and user satisfaction. Interview results indicate that chatbots improve students’ learning experience, allowing them to learn flexibly, reduce stress, save time, and maintain motivation. These findings provide valuable experience for the design of educational chatbots and promote the realization of transformative learning outcomes.
|
|
08:30-17:30, Paper We-Online.14 | |
Auto-Visualization Driven Visual Analytics for Elderly Care Institutions |
|
Zhu, Yongjie | Southwest University of Science and Technology |
Wang, Song | Southwest University of Science and Technology |
Zuo, Laipan | Southwest University of Science and Technology |
Wu, Tong | SouthWest University of Science and Technology |
Keywords: User Interface Design, Interactive and Digital Media, Information Visualization
Abstract: To tackle unstructured, scattered information and gaps between user needs and available data in elderly care information management, this study created SeniorAdvisor. It uses BERT-based classification, chart decision trees, and exploratory data analysis (EDA) for automated visualization. By blending user needs with EDA insights via iterative visualization and custom coding, it recommends fitting institutions, boosting efficiency and satisfaction. Case studies confirm that SeniorAdvisor enhances information management and meets elderly care needs, presenting innovative approaches for elderly care institutions.
|
|
08:30-17:30, Paper We-Online.15 | |
WMTMNet: An Automatic Sleep Staging Model Based on Wavelet Convolution and Multimodal Bilinear Pooling |
|
Tian, Maohui | Chongqing University of Posts and Telecommunications |
Tian, Yin | Chongqing University of Posts and Telecommunications |
Keywords: Biometrics and Applications, Brain-Computer Interfaces, Human-Machine Interaction
Abstract: Sleep staging is a key component of sleep assessment and is of great value for disease diagnosis. Although existing research performs well in terms of accuracy, it still falls short in the interpretability of algorithmic decisions and results. This "black box" problem is particularly concerning in high-stakes applications such as medical decision-making. This paper therefore proposes an interpretable hybrid deep learning model that incorporates expert knowledge of sleep stage classification. First, we employ a Morlet-wavelet-based kernel layer to separate features of different frequencies in the sleep data; this layer is designed according to the visual analysis principles that human experts apply to polysomnography (PSG) recordings. Next, we use multi-scale grouped convolutions to extract and reduce the dimensionality of the time series corresponding to different frequency bands. Subsequently, a Transformer encoder is applied to model contextual information in the time series. Finally, a multimodal bilinear pooling module is employed.
|
|
08:30-17:30, Paper We-Online.16 | |
FlexSecure: Enhancing Flexibility and Security of Shared Terminals in Real-Time Collaborative Programming Environments |
|
Fang, Bicheng | Tongji University |
Wang, Mingjie | Tongji University |
Lu, Chengbin | Tongji University |
Jiang, Jinfeng | Tongji University |
Wang, Liyou | Tongji University |
Zhao, Bowei | Tongji University |
Gao, Zhen | Tongji University |
Fan, Hongfei | Tongji University |
Zhao, Shengjie | Tongji University |
Keywords: Human-Computer Interaction, Multi-User Interaction
Abstract: Real-time collaborative programming is an emerging technology that supports a team of programmers to concurrently view and edit source code documents, with the benefits of enhancing team productivity and reducing project cost. Shared terminal is one crucial component of real-time collaborative programming environments, which facilitates interactive and instant peer support in debugging scenarios. In this study, we propose a novel approach named FlexSecure to address two major challenges in existing shared terminals. FlexSecure supports unconstrained and flexible shared terminal sessions that allow any collaborator to initiate, and meanwhile, preserves the local security of the initiator by incorporating fine-grained permission control to prevent risky command execution. Prototype implementation has validated the feasibility of FlexSecure, and user evaluation has demonstrated its effectiveness and satisfactory performance.
|
|
08:30-17:30, Paper We-Online.17 | |
YOLO-Map: Enhanced Boundary Feature Extraction and Small Target Detection for Problematic Maps |
|
Xu, Yan | Qilu University of Technology (Shandong Academy of Sciences) |
Zhang, Zhenqiang | Qilu University of Technology (Shandong Academy of Sciences) |
Li, Chuantao | Qilu University of Technology (Shandong Academy of Sciences), Sh |
Geng, Liting | Shandong Computer Science Center (National Supercomputer Center |
Liu, Yue | Qilu University of Technology |
Li, Jintao | Qilu University of Technology |
Wang, Chunxiao | Qilu University of Technology (Shandong Academy of Sciences), Sh |
Zhao, Zhigang | Qilu University of Technology (Shandong Academy of Sciences) |
Keywords: Visual Analytics/Communication, Multimedia Systems, Information Visualization
Abstract: The unique nature of map data poses challenges for detecting key error regions in problematic maps, particularly the extraction of discontinuous boundary features and the neglect of small-target information. To address these issues, we propose a lightweight problematic-map detection algorithm called YOLO-Map. First, to ensure complete and interference-resistant edge extraction, we design a Dual-branch Attention Convolution Module (DACM), which exploits the synergy of two branches to accurately identify national boundary regions. Next, a Multi-Path Feature Aggregation (MPFA) module adopts a bidirectional adaptive fusion strategy that strengthens recursive connections among multi-scale features and improves target localization accuracy. In addition, we propose a Global Context Fusion Module (GCFM), which enhances small-target feature representation through a multi-branch collaborative attention mechanism. Experimental results show that YOLO-Map achieves 87.3% accuracy (mAP@.5) on the CME dataset, outperforming many larger models.
|
|
08:30-17:30, Paper We-Online.18 | |
Multi-Modal Adaptive Synchronous Learning for Dermatosis Diagnosis |
|
Dong, Ziyu | Zhengzhou University |
Zhang, Shibo | Institute of Computing Technology, Chinese Academy of Sciences |
Yang, Xiaodong | Institute of Computing Technology, Chinese Academy of Sciences |
Qin, Xin | Institute of Computing Technology, Chinese Academy of Sciences |
Keywords: Multimedia Systems, Medical Informatics
Abstract: Skin cancer is one of the most prevalent cancers worldwide. Early diagnosis of skin lesions is essential for improving survival rates. Multimodal learning with skin lesion images and clinical metadata has shown promise in boosting diagnostic performance. However, imbalanced multimodal learning remains a major challenge. The image modality, containing more decisive information, often dominates the learning process, suppressing effective learning from the metadata. Existing methods often overlook the inherent disparity in decisive information between modalities and tend to overemphasize less informative ones, leading to suboptimal performance. Moreover, the widespread use of one-hot encoding for categorical metadata creates sparse representations, which complicate weight updates in early network layers and further suppress metadata learning. To address these issues, we propose a Multimodal Adaptive Synchronous Learning (MASL) method. Specifically, we replace conventional one-hot encoding with a Dense Token Embedding (DTE) derived from a learnable embedding matrix. In addition, we introduce a Prior-Guided Adaptive Parameter Freezing (PGAPF) mechanism, which leverages the prior decision confidence ratio between modalities to adjust the multimodal learning process. Experiments on PAD-UFES-20 and ISIC 2019 demonstrate that MASL improves diagnostic performance and promotes more balanced multimodal learning, achieving 1–2% gains across key evaluation metrics, including AUC, accuracy, sensitivity, and specificity.
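The Dense Token Embedding idea replaces sparse one-hot metadata vectors with rows of a learnable matrix: the lookup is mathematically identical to a one-hot matrix product, but yields dense, trainable representations. A minimal NumPy sketch with hypothetical sizes and category codes:

```python
import numpy as np

rng = np.random.default_rng(0)
n_categories, d = 10, 8                        # hypothetical sizes
E = rng.normal(0.0, 0.02, (n_categories, d))   # learnable embedding matrix

cat_ids = np.array([3, 7, 3])                  # e.g. lesion-site codes
dense = E[cat_ids]                             # dense tokens, shape (3, d)

# Equivalent sparse route: one-hot rows times the same matrix. The direct
# lookup avoids the sparse gradients that hamper early-layer weight updates.
one_hot = np.eye(n_categories)[cat_ids]        # shape (3, n_categories)
via_one_hot = one_hot @ E
```

In training, only the looked-up rows of `E` receive gradient, so every update is dense in the embedding dimension rather than spread thin over a sparse one-hot input layer.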
|
|
08:30-17:30, Paper We-Online.19 | |
GM-SAM: Edge-Aware Sonar Image Segmentation by Gradient-Enhanced and Multi-Perspective Fusion |
|
Liu, Yue | Qilu University of Technology |
Li, Chuantao | Qilu University of Technology (Shandong Academy of Sciences), Sh |
Zhang, Zhenqiang | Qilu University of Technology (Shandong Academy of Sciences) |
Xu, Yan | Qilu University of Technology (Shandong Academy of Sciences) |
Lv, Jialiang | Qilu University of Technology (Shandong Academy of Sciences) |
Wang, Chunxiao | Qilu University of Technology (Shandong Academy of Sciences), Sh |
Zhao, Zhigang | Qilu University of Technology (Shandong Academy of Sciences) |
Keywords: Human-Machine Interaction, Visual Analytics/Communication, Virtual/Augmented/Mixed Reality
Abstract: Sonar imaging, due to its ability to penetrate water and certain obstacles, has become a pivotal technology in marine domains. The complexities of underwater environments, combined with the low resolution and noise interference inherent in sonar images, present significant challenges in sonar image segmentation. To address these issues, we propose a novel approach, GM-SAM, which integrates denoising and multi-scale boundary feature extraction into the Segment Anything Model (SAM). Initially, the Multi-Perspective Feature Module (MPFM) employs parallel attention mechanisms and channel-enhanced convolutions to reduce computational complexity, suppress noise, and capture both local details and global features. Following this, the Gradient-Enhanced Transformation Module (GETM) leverages gradient-based operations to extract edge features, enhancing the model's sensitivity to boundaries. Finally, the Sonar-Fusion Adapter module integrates task-specific knowledge from MPFM with general features from SAM. Experimental results demonstrate that GM-SAM significantly outperforms existing methods, achieving superior Dice scores and effectively segmenting valid targets.
|
|
08:30-17:30, Paper We-Online.20 | |
MGPSyn: Molecular Graph Pretraining Enhanced Synergistic Drug Combination Prediction |
|
Liang, Xiaoyi | Tongji University |
Zhu, Hongming | Tongji University |
Zhu, Xiaoli | Tongji University |
Mao, Dongsheng | Tongji University |
Liu, Qin | Tongji University |
Keywords: Medical Informatics, Biometrics and Applications
Abstract: Identifying synergistic anticancer drug combinations is a key challenge due to the vast number of potential drug pairs and the high cost of experimental screening. Recent advances leverage deep learning and graph neural networks (GNNs) to learn molecular representations directly from drug structures, but the limited number of unique drugs in available datasets restricts the effectiveness of such models. In this paper, we propose a framework that addresses this issue through a DDI-enhanced Deep Graph InfoMax pretraining strategy. Our approach captures structural and relational drug features by jointly maximizing local–global mutual information and incorporating drug–drug interaction prediction. A multi-level interaction model with semantic attention further enhances feature fusion, leading to improved synergy prediction performance. We compared our method with five advanced methods on two public datasets. The results demonstrate that the proposed method exhibits superior generalization ability.
|
|
08:30-17:30, Paper We-Online.21 | |
CFANet: A Cross-Feature Attention Network for Enhanced EEG Motor Imagery Decoding |
|
Zhu, Liang | Yunnan University |
Zhu, Weina | Yunnan University |
Keywords: Brain-Computer Interfaces, Biometrics and Applications, Assistive Technology
Abstract: Motor imagery (MI) EEG signal decoding plays a crucial role in brain-computer interfaces (BCIs), especially for applications in rehabilitation and assistive technologies. However, the decoding of MI-EEG signals remains a challenging task due to the high inter-subject variability, complex temporal dynamics, and the difficulty in capturing multi-domain feature interactions. This paper proposes a novel approach, CFANet, a cross-feature attention network designed to address these challenges. CFANet integrates a Cross Feature Enhancement (CFE) module and hybrid attention mechanisms to effectively capture the interactions between different feature domains and enhance the integration of both local and global features. Experimental results on three publicly available EEG datasets (BCIC-IV-2A, BCIC-IV-2B, and HGD) show that CFANet outperforms several state-of-the-art models in terms of classification accuracy and Cohen's Kappa. Ablation studies further validate the significant contributions of the CFE module and attention mechanisms to the model’s overall performance. Additionally, Grad-CAM visualizations demonstrate CFANet's ability to focus on critical brain regions during motor imagery tasks, providing insights into its decision-making process. These results highlight CFANet’s potential for practical BCI applications, while suggesting avenues for further improving its real-time performance and cross-user adaptability.
|
|
08:30-17:30, Paper We-Online.22 | |
Topic Recognition and Fine-Grained Access Control for Text Paragraphs Based on Large Language Models |
|
Sa, HaiPing | Xinjiang University |
Yang, Fei | Xinjiang University |
Wang, Xiaoli | Xinjiang University |
Keywords: Systems Safety and Security, Multi-User Interaction, Human-Computer Interaction
Abstract: With the rapid advance of informatization and big data, data privacy protection has become a pressing concern across industries. However, existing text access control mechanisms mainly adopt coarse-grained policies at the document or section level and lack the flexibility to dynamically tailor content to user permissions. Moreover, identifying and managing sensitive information usually relies on manual annotation, which is labor-intensive, error-prone, and unscalable. To address these limitations, this work focuses on paragraph-level semantic understanding and fine-grained access control, and proposes a method for dynamically matching user permissions with paragraph topics. It combines large language model (LLM)-based document semantic parsing with paragraph topic extraction, and couples topic annotation with a topic-based access control (TBAC) policy to dynamically match user permissions against paragraph topics, enabling differentiated content masking and dynamic presentation, and safeguarding data security when authorized users access sensitive information.
|
|
08:30-17:30, Paper We-Online.23 | |
EGSTalker: Real-Time Audio-Driven Talking Head Generation with Efficient Gaussian Deformation |
|
Zhu, Tianheng | Xinjiang University |
Yu, Yinfeng | Xinjiang University |
Wang, Liejun | Xinjiang University |
Sun, Fuchun | Tsinghua University |
Zheng, Wendong | Tianjin University of Technology |
Keywords: Multimedia Systems
Abstract: This paper presents EGSTalker, a real-time audio-driven talking head generation framework based on 3D Gaussian Splatting (3DGS). Designed to enhance both speed and visual fidelity, EGSTalker requires only 3–5 minutes of training video to synthesize high-quality facial animations. The framework comprises two key stages: static Gaussian initialization and audio-driven deformation. In the first stage, a multi-resolution hash triplane and a Kolmogorov-Arnold Network (KAN) are used to extract spatial features and construct a compact 3D Gaussian representation. In the second stage, we propose an Efficient Spatial-Audio Attention (ESAA) module to fuse audio and spatial cues, while KAN predicts the corresponding Gaussian deformations. Extensive experiments demonstrate that EGSTalker achieves rendering quality and lip-sync accuracy comparable to state-of-the-art methods, while significantly outperforming them in inference speed. These results highlight EGSTalker’s potential for real-time multimedia applications.
|
|
08:30-17:30, Paper We-Online.24 | |
WL-GS: A Gaussian Splatting Method for Unconstrained Photo Collection under Resource-Limited Conditions |
|
Wang, Hongfei | Xinjiang University |
Zeng, Sisi | Xinjiang University |
Yu, Qing | Xinjiang University |
Keywords: Human-Machine Interaction, Virtual/Augmented/Mixed Reality, Information Visualization
Abstract: Neural Radiance Fields (NeRFs) have demonstrated impressive performance in novel view synthesis from unconstrained image collections. However, their substantial computational overhead and limited rendering efficiency hinder practical deployment. In contrast, 3D Gaussian Splatting (3DGS) enhances training speed and rendering quality through explicit scene representation but suffers from high memory consumption and poor scalability in in-the-wild settings. To address these limitations, we introduce WL-GS, a novel 3D reconstruction framework that adopts an appearance-independent modeling strategy alongside a visibility mask to effectively suppress transient occlusions. Dynamic appearance features are reintegrated into the Gaussian primitives, while a parabolically constrained growth mechanism and a multi-dimensional score-driven densification strategy are employed to control redundancy in point generation. Experimental results show that WL-GS significantly improves reconstruction fidelity while reducing memory usage compared to baseline 3DGS methods. The proposed framework provides a scalable and resource-efficient solution for novel view synthesis, making it well-suited for deployment on platforms with limited computational capabilities.
|
|
08:30-17:30, Paper We-Online.25 | |
CEDS: A Container Escape Detection System Based on Filesystem Isolation Boundaries |
|
Xie, Jing | University of Chinese Academy of Sciences |
Zhang, Tianshu | University of Chinese Academy of Sciences |
Zhang, Weijuan | Institute of Information Engineering, Chinese Academy of Science |
Fang, Junhao | University of Chinese Academy of Sciences |
Yang, Bowen | Institute of Information Engineering, Chinese Academy of Science |
Fu, YuXia | Institute of Information Engineering, Chinese Academy of Science |
Jia, Xiaoqi | Institute of Information Engineering, University of Chinese Acad |
Huang, Qingjia | Institute of Information Engineering, Chinese Academy of Science |
Keywords: Systems Safety and Security, Design Methods
Abstract: Container technology is becoming increasingly important in cloud computing due to its efficiency and agility, but it also introduces new security risks. In particular, container escape attacks exploiting inherent vulnerabilities in container components have emerged as a primary threat to container security due to their pervasive nature and severe impact. However, existing container escape detection methods mainly rely on known attack patterns and pay insufficient attention to exploits targeting container components. In this work, we propose CEDS, a system based on container filesystem isolation boundaries to detect escape attacks caused by container component vulnerabilities in real time. We first establish an attack model by analyzing 16 container component exploits. Then, we propose a method to identify isolation boundaries by analyzing mount namespaces and container filesystem hierarchies, and subsequently detect abnormal cross-boundary file operations at the kernel level via system call monitoring. Finally, we implement a prototype of CEDS with eBPF. Experimental results demonstrate that, compared with the existing baseline methods, CEDS can effectively detect container escape attacks with minimal performance overhead.
|
|
08:30-17:30, Paper We-Online.26 | |
CogMAS: A Cognitively-Grounded Multi-Agent Framework for Explainable and Consistent Open-Ended Student Response Scoring |
|
Fang, Yixuan | South China Normal University |
Xu, ZiShan | South China Normal University |
Guo, Yifu | South China Normal University |
Lu, YuQuan | South China Normal University |
Yang, Huan | South China Normal University |
Keywords: Human-centered Learning, Assistive Technology, Companion Technology
Abstract: Automated scoring of open-ended questions continues to face significant challenges in modeling student cognition, ensuring scoring consistency, and providing interpretability. Although large language models (LLMs) have demonstrated substantial potential, single-model architectures exhibit structural limitations in handling complex reasoning and cognitive alignment. To address these issues, we propose CogMAS, a Cognitively-Grounded Multi-Agent Scoring Framework that incorporates three types of agents, i.e., student agents, teacher agents, and evaluator agents, to enable multidimensional and interpretable scoring of open-ended responses. CogMAS leverages Bloom's taxonomy to construct a mapping between questions and cognitive dimensions, guiding teacher agents to perform dimension-aware scoring. A dual-stage semantic retrieval module is introduced to provide contextually relevant exemplars. Evaluator agents are responsible for detecting explanation path biases and deriving high-confidence reasoning chains and final scores. Teacher agents are further trained using Direct Preference Optimization (DPO) to improve the quality and consistency of scoring explanations. High-confidence score–explanation pairs are stored in a retrievable memory module to support continuous optimization in future tasks. Experiments on three public open-ended question scoring datasets demonstrate that CogMAS achieves state-of-the-art performance in both scoring accuracy and consistency, validating its effectiveness and generalizability.
|
|
08:30-17:30, Paper We-Online.27 | |
EEG-IvNet: A Framework for Predicting Involvement in UAV Operator Training from EEG Signals |
|
Zhu, Yi | Nanjing Audit University |
Wang, Shasha | Nanjing Audit University |
Chen, Nannan | Nanjing Audit University |
Xiang, Fengtao | National University of Defense Technology |
Wang, Chang | National University of Defence Technology |
Keywords: Virtual/Augmented/Mixed Reality, Cognitive Computing, Virtual and Augmented Reality Systems
Abstract: In modern aviation and logistics, the professional competence and operational skills of unmanned aerial vehicle (UAV) operators are critical to mission success and overall efficiency. Accordingly, effective UAV training, which ensures high levels of involvement, is paramount in enhancing operators’ learning and operational performance. To address the lack of theoretical research on electroencephalography (EEG)-based involvement prediction for UAV training, this study proposes EEG-IvNet, a deep learning framework designed to predict involvement from EEG signals. The model integrates convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and an attention mechanism to classify involvement states. EEG data were collected from participants performing UAV monitoring tasks in controlled experiments and used as input to the model. The results demonstrate that EEG-IvNet outperformed the benchmark CNN+LSTM model in classifying involvement states. Furthermore, this study offers insights into the neural basis of involvement, providing strong theoretical and algorithmic support for incorporating VR technology in UAV training. These findings not only suggest potential improvements in training outcomes but also highlight broader applicability in intelligent aviation, virtual education, and other related domains.
|
|
08:30-17:30, Paper We-Online.28 | |
GNN-ReBeL: Enhancing Neural Belief Representations for Imperfect-Information Games |
|
Zeng, Weijun | Central China Normal University |
Li, Yinghao | China Agricultural University
Chen, Xiaosi | Central China Normal University |
Chang, Zijie | Hongyi Honor College, Wuhan University, 430072, Wuhan, China |
Ge, Fei | Central China Normal University |
Keywords: Human-Machine Interaction, Networking and Decision-Making, Human-Computer Interaction
Abstract: AlphaZero has demonstrated the effectiveness of integrating reinforcement learning (RL) with search methods in perfect-information games such as chess and Go. However, extending this paradigm to imperfect-information games is challenging due to the presence of hidden information. Counterfactual Regret Minimization (CFR) offers theoretical convergence guarantees in such settings, and recent advances in Recursive Belief-based Learning (ReBeL) introduced a unified RL+search framework that transforms imperfect-information games into perfect-information games over public belief states (PBS). Despite its theoretical appeal, ReBeL's implementation based on multilayer perceptrons (MLPs) is limited in its ability to capture the PBS. In this paper, we propose GNN-ReBeL, a novel method that leverages graph neural networks (GNNs) to explicitly model the strategic relationships between information states by representing the PBS as graph-structured data. This graph-based representation enhances information propagation within the belief space while preserving conver…
|
|
08:30-17:30, Paper We-Online.29 | |
An Investigation of Subgame Depth in ReBeL: Impact on Convergence and Performance in Imperfect-Information Games |
|
Zeng, Weijun | Central China Normal University |
Li, Yinghao | China Agricultural University
Chen, Xiaosi | Central China Normal University |
Chang, Zijie | Hongyi Honor College, Wuhan University, 430072, Wuhan, China |
Ge, Fei | Central China Normal University |
Keywords: Human-Computer Interaction, Human-Machine Interaction, Networking and Decision-Making
Abstract: The integration of reinforcement learning with search algorithms has revolutionized AI performance in perfect-information games, with AlphaZero demonstrating superhuman abilities in chess, shogi, and Go. Counterfactual Regret Minimization (CFR) has been the traditional approach for imperfect-information games, but requires extensive precomputation of complete game strategies. Recursive Belief-based Learning (ReBeL) represents a breakthrough by combining self-play reinforcement learning with real-time search in imperfect-information settings, enabling superhuman performance in poker and other games with hidden information. A critical hyperparameter in ReBeL's algorithm is the depth of subgames used during search. While deeper subgames potentially capture more strategic information, they also increase computational complexity. This paper presents a systematic investigation of the impact of subgame depth on ReBeL's performance, convergence rate, and computational requirements. Through extensive experiments on the benchmark imperfect-information game Liar's Dice, we demonstrate that the optimal subgame depth involves a non-trivial trade-off between solution quality and computational efficiency. Our findings provide practical guidance for implementing ReBeL in various game domains and contribute to the broader understanding of search depth parameterization in reinforcement learning algorithms for imperfect-information games.
|
|
08:30-17:30, Paper We-Online.30 | |
KGCE: Knowledge-Augmented Dual-Graph Evaluator for Cross-Platform Educational Agent Benchmarking with Multimodal Language Models |
|
Zixian, Liu | Central China Normal University |
Liu, Sihao | Central China Normal University |
Yuqi, Zhao | Central China Normal University |
Keywords: Human-Machine Interaction, Augmented Cognition, Human-Computer Interaction
Abstract: With the rapid adoption of multimodal large language models (MLLMs) in autonomous agents, cross-platform task execution capabilities in educational settings have gained significant attention. However, existing benchmark frameworks remain clearly deficient in supporting cross-platform tasks in educational environments, especially when handling school-specific software (such as the 小雅 intelligent assistant, 华时夏子, etc.), where agents fail due to a lack of knowledge of the structural details of such private-domain software. Moreover, current evaluation methods rely heavily on coarse-grained metrics such as goal orientation or trajectory matching, making it challenging to capture an agent's detailed execution and efficiency on complex tasks. To address these issues, we propose KGCE (Knowledge-Augmented Dual-Graph Evaluator for Cross-Platform Educational Agent Benchmarking with Multimodal Language Models), a novel benchmarking platform that integrates knowledge-base augmentation with a dual-graph evaluation framework. We first construct a dataset comprising 104 education-related tasks…
|
|
08:30-17:30, Paper We-Online.31 | |
CNRel: Candidate Prompt Enhancement and Noise Filtering Relational Triple Extraction Framework Based on Large Language Models |
|
Li, Wei | University of Chinese Academy of Sciences |
Xie, Pan | China United Network Communications Group Company LTD |
Zhao, Chenbin | China United Network Communications Group Company LTD |
Li, Hui | Institute of Information Engineering, Chinese Academy of Science |
Li, Liangxiong | Institute of Information Engineering, Chinese Academy of Science |
Ge, Jingguo | University of Chinese Academy of Sciences |
Keywords: Intelligence Interaction, Human-Machine Interaction, Ethics of AI and Pervasive Systems
Abstract: Relational Triple Extraction (RTE) focuses on extracting triples from sentences, a crucial task in the automatic construction of knowledge graphs. Large Language Models (LLMs) can automatically extract triples from text given appropriate instructions or fine-tuning. However, due to the bias between LLM training data and inference data, previous LLM-based triple extraction methods overlook much potentially valuable knowledge and lack noise filtering, which greatly limits the capability of RTE models. To address these challenges, we propose the Candidate Prompt Enhancement and Noise Filtering Relational Triple Extraction Framework Based on Large Language Models (CNRel), which combines a small pre-trained language model with LLMs. Specifically, we first utilize a candidate entity pair extraction and filtering block, based on a small pre-trained language model, to extract and refine all possible entity pairs in the text, ensuring the capture of as much valuable information as possible. Then, a fine-tuned LLM such as LLaMA is used to predict the relationships between the candidate entity pairs and extract as many triples as possible. Finally, a Noise Filter block filters the extracted triples through LLMs and removes erroneous triples, which greatly improves the precision of the RTE model. Experiments on several public datasets show that CNRel achieves state-of-the-art performance among all previous mainstream relational triple extraction methods, and we conduct extensive ablation experiments to reveal the contribution of each component to the overall performance.
|
|
08:30-17:30, Paper We-Online.32 | |
Video-Guided Global Contrast Learning for Multimodal Sentiment Analysis |
|
Zhang, Enze | Shandong Normal University |
Zhao, YiRan | Shandong Normal University |
Qiao, Xin | Shandong Normal University |
Li, Lun | Shandong Normal University |
Keywords: Affective Computing, Information Visualization, Intelligence Interaction
Abstract: Multimodal sentiment analysis is a fundamental research domain dedicated to predicting a speaker's sentiment inclination by leveraging features derived from textual, visual, and acoustic modalities. Multimodal fusion is a pivotal issue in multimodal sentiment analysis. Existing studies commonly handle the three modal features on an equal footing or explore the interactions among different modalities with text at the core; they overlook the global contextual information spanning across videos. To address this limitation, we put forward a Video-Guided Global Contrast Learning (VGGCL) approach. This method is designed to capture abundant global context features by delving into both intra-video and cross-video context interactions. To further mitigate error accumulation and interference, we have created a cross-video library that takes into account emotional relevance and video similarity to retrieve effective video sources. Moreover, we introduce a contrastive learning scheme grounded in global context to ease the inconsistency between the global context and individual modalities within different feature spaces. We conducted experiments on the CMU-MOSEI, CMU-MOSI, and IEMOCAP datasets. The results validate the effectiveness of our video-guided global contrast learning framework. Notably, our proposed framework outperforms all the baseline methods.
|
|
08:30-17:30, Paper We-Online.33 | |
MaskCrossKD: Mask Cross Knowledge Distillation for Rotated Object Detection in Aerial Images |
|
Zhao, Zhenbo | Harbin Engineering University |
Dong, Hongbin | University of Harbin Engineering |
Zhang, Xiaoping | China Academy of Chinese Medical Sciences |
Keywords: Environmental Sensing, Human-Machine Interaction, Human-Machine Cooperation and Systems
Abstract: Knowledge distillation has been widely applied as an effective model compression technique across various visual tasks. Currently, distillation methods suitable for object detection are typically realized through feature imitation in horizontal bounding box scenarios. In this paper, we propose an effective mask-based distillation scheme, MaskCrossKD. First, random pixels in the student features are masked, and then the mask is distilled through pixel reconstruction. The mask restricts the feature space of the student model, encouraging it to generate complete features that are consistent with the teacher model. Second, we introduce a simple distillation temperature control strategy, which dynamically adjusts the temperature to control the difficulty level of tasks during the student model’s learning process. Additionally, to adapt to rotated object detection, we propose a Rotational Distillation Loss (OKDL). This method avoids the discontinuity of rotated bounding box boundaries by mapping rotated boxes to a Gaussian distribution and designing a loss function based on the Gaussian Wasserstein distance. Experiments conducted on several public datasets across different models show significant improvements in detection performance compared to the teacher model. On the DOTA dataset, MaskCrossKD improved the average precision of FCOS ResNet-50 from 70.70% to 75.47% under the 1X strategy, and that of Oriented R-CNN ResNet-50 from 73.40% to 77.82%.
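The abstract does not spell out the Rotational Distillation Loss beyond its use of a Gaussian mapping and the Gaussian Wasserstein distance; the following is only an illustrative sketch of that idea (all function names are hypothetical, not the authors' code). A rotated box (cx, cy, w, h, θ) is mapped to a 2D Gaussian whose mean is the box center and whose covariance encodes size and orientation, and two boxes are then compared via the 2-Wasserstein distance:

```python
import numpy as np

def box_to_gaussian(cx, cy, w, h, theta):
    """Map a rotated box (center, size, angle in radians) to a 2D Gaussian."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    S = np.diag([w ** 2 / 4.0, h ** 2 / 4.0])  # half-extents set the spread
    return np.array([cx, cy]), R @ S @ R.T

def sqrtm_2x2(M):
    """Principal square root of a 2x2 symmetric positive-definite matrix."""
    s = np.sqrt(np.linalg.det(M))
    return (M + s * np.eye(2)) / np.sqrt(np.trace(M) + 2.0 * s)

def gwd(box1, box2):
    """2-Wasserstein distance between the Gaussians of two rotated boxes."""
    m1, S1 = box_to_gaussian(*box1)
    m2, S2 = box_to_gaussian(*box2)
    r1 = sqrtm_2x2(S1)
    cross = sqrtm_2x2(r1 @ S2 @ r1)
    d2 = np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross)
    return np.sqrt(max(d2, 0.0))
```

Unlike angle-regression losses, this distance varies smoothly as a box rotates through the angular boundary, which is the discontinuity the abstract refers to.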
|
|
08:30-17:30, Paper We-Online.34 | |
Cabin-HMR: Single-View Multi-Person Human Mesh Estimation in Cabin Spaces |
|
Rui, Zhengheng | Southeast University |
Huang, Buzhen | Southeast University |
Wang, Ziyazhuo | Southeast University |
Wang, Di | Huawei |
Wang, Yangang | Southeast University |
Keywords: Human Perception in Multimedia, Human-centered Learning, Human-Computer Interaction
Abstract: Severe occlusion remains a fundamental challenge in single-view human pose estimation, particularly in cabin environments where strong distortion further exacerbates the ill-posed nature of the problem. Leveraging the abundance of 2D pose data, we can impose structured modeling of the human body to infer plausible estimates for occluded regions, thereby guiding the reconstruction of a complete human body mesh. To address this issue, we propose Cabin-HMR, a novel method for multi-person 3D pose reconstruction from a single view. Our approach effectively incorporates human structural priors derived from 2D poses and sitting postures to infer the most plausible full-body pose under occlusion. Furthermore, by integrating depth information as a corrective signal for local image patches, our method significantly mitigates the impact of camera distortion in cabin environments. To enhance generalization to diverse and complex seated postures, we construct a large-scale dataset comprising paired 2D and 3D sitting pose annotations collected from synchronized multi-view camera systems in vehicle interiors. Experimental results demonstrate that Cabin-HMR achieves robust performance across various scenarios, particularly excelling in cabin environments where occlusion and distortion are prevalent.
|
|
08:30-17:30, Paper We-Online.35 | |
Efficient Range-Based Top-K Spatial Dataset Search |
|
Lu, Ziyu | Nanjing University of Posts and Telecommunications |
Dai, Hua | Nanjing University of Post &Telecommunications |
Sun, Jie | Nanjing University of Posts and Telecommunications |
Lu, Bing Hui | Nanjing University of Posts and Telecommunications |
Li, Zhangchen | Nanjing University of Posts and Telecommunications, China |
Yang, Geng | Nanjing University of Post and Telecommunication |
Keywords: Information Systems for Design
Abstract: As the number of open spatial datasets continues to grow, so does the demand for efficiently identifying spatial datasets that align with users' specific requirements. This has given rise to a variety of spatial dataset search requirements, including range-based spatial dataset search. In this paper, we propose an efficient range-based top-k spatial dataset search scheme for spatial information retrieval based on the quadtree-based region-dataset inverted index (QRDI-index). A relevance measurement between a spatial dataset and a search range is presented first, which is used to rank candidate results. To support efficient search processing, the QRDI-index is designed, which combines the inverted index, quadtree, and spatial datasets. Using the index, we propose an efficient search processing algorithm that filters out all but the minimal set of tree nodes in the QRDI-index, narrowing the search space to these nodes. Experimental results on three real-world spatial data repositories validate the accuracy and efficiency of the proposed search scheme.
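The QRDI-index itself is not detailed in the abstract; as a minimal sketch of the underlying quadtree pruning idea only (the cell representation and function names are assumptions), a range query needs to visit just the quadtree cells that intersect it, and disjoint subtrees are pruned wholesale:

```python
def quad_children(x0, y0, x1, y1):
    """Split a bounding box into its four quadrant children."""
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    return [(x0, y0, mx, my), (mx, y0, x1, my),
            (x0, my, mx, y1), (mx, my, x1, y1)]

def overlapping_cells(cell, query, depth):
    """Collect quadtree cells at the given depth that intersect the query
    range; only these cells' inverted lists would need to be scanned."""
    x0, y0, x1, y1 = cell
    qx0, qy0, qx1, qy1 = query
    if x1 <= qx0 or qx1 <= x0 or y1 <= qy0 or qy1 <= y0:
        return []  # disjoint: prune the whole subtree
    if depth == 0:
        return [cell]
    out = []
    for child in quad_children(x0, y0, x1, y1):
        out.extend(overlapping_cells(child, query, depth - 1))
    return out
```

In an inverted-index setting, each surviving cell would map to the list of datasets overlapping it, so relevance scoring runs only over those candidates.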
|
|
08:30-17:30, Paper We-Online.36 | |
Ocular Feature Extraction for Eye Movement Analysis and Neurological Dysfunction Diagnosis |
|
Wang, Ziqi | Zhejiang University |
Bi, Jing | Beijing University of Technology |
Zhao, Xiaomeng | Beijing Neurorient Technology Co., Ltd |
Zhang, Junqi | Beijing University of Technology |
Zhai, Jiahui | Beijing University of Technology |
Ma, Hongyao | Beijing University of Technology |
Cui, Jinglei | Beijing University of Technology |
Cui, Rong | Beijing University of Technology |
Zheng, Zhipeng | Beijing University of Technology |
Tang, Yuanchen | Xuanwu Hospital, Capital Medical University |
Liang, Jiantao | Xuanwu Hospital, Capital Medical University |
Zhou, Kai | Beijing Neurorient Technology Co., Ltd |
Keywords: Biometrics and Applications, Human-Machine Cooperation and Systems
Abstract: Neurological dysfunction encompasses a variety of diseases resulting from neural damage. Accurate assessment of neurological function is critical for diagnosis and the development of effective treatment plans. A significant number of patients with neurological disorders exhibit ocular abnormalities. Analyzing ocular status through eye movement capture plays a pivotal role in understanding various neurological dysfunctions. However, current methods of analyzing ocular status for neurological function assessments lack precision and objectivity, often relying heavily on physicians' subjective judgment. This work proposes the Ocular-enhanced Face Keypoints Network (OFKNet), a facial keypoint detection model based on deep convolutional neural networks. OFKNet employs ConvNeXt as its backbone network and introduces a multi-scale input enhancement strategy. Additionally, a region enhancement module based on MobileNetV3 is designed to optimize features in the canthus area. Multiscale feature fusion and channel weighting are achieved through an improved Path Aggregation Network and Squeeze-and-Excitation modules. To validate OFKNet's accuracy, we compared it with state-of-the-art models, including MediaPipe FaceLandmarker, InsightFace, Dlib68, and Dlib81, using a patient dataset we collected. Experimental results demonstrate that OFKNet outperforms existing models, particularly in calibration accuracy around the eyes. By monitoring eye movements in real-time, OFKNet ensures high-precision extraction of key points in each frame, accurately reflecting changes in patients' ocular movements.
|
|
08:30-17:30, Paper We-Online.37 | |
EchoFall: An Indoor Acoustic Fall Detection and Severity Analysis Method |
|
Zhou, Rui | University of Electronic Science and Technology of China |
Liu, Chenxu | University of Electronic Science and Technology of China |
Sun, Jiajun | University of Electronic Science and Technology of China |
Li, Songlin | University of Electronic Science and Technology of China |
Keywords: Human Perception in Multimedia, Human-centered Learning, Human-Computer Interaction
Abstract: Fall detection is of significant importance for healthcare. However, such systems still suffer from confusion with fall-like motions, domain dependence, and a lack of severity analysis. In this paper, we propose an acoustic fall detection and severity analysis method, EchoFall, which leverages inaudible acoustic signals. To distinguish various falls from daily motions, EchoFall captures the Doppler effect induced by motion, generates a Doppler Frequency Spectrogram (DFS), and employs a Residual Network (ResNet) to identify sudden and soft falls. To achieve domain independence, we devise a feature disentanglement network to separate the motion features and the domain features in the DFS, allowing the classifier to take only the motion features to identify falls. To analyze the severity of falls, EchoFall estimates the distance from the target person to the acoustic device, monitors the status of the person after the fall, and identifies the severity of the fall. Extensive evaluations demonstrate that EchoFall achieves fall detection independent of users, locations, and environments, and can identify falls as fatal, moderate, or minor, facilitating appropriate rescues.
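EchoFall's DFS pipeline is described only at a high level; a rough sketch of how a Doppler frequency spectrogram can be derived from a received inaudible tone (the carrier frequency, window sizes, and function names here are illustrative assumptions, not the paper's parameters) is to down-convert around the known carrier and take a magnitude STFT, so motion-induced frequency shifts appear over time:

```python
import numpy as np

def doppler_spectrogram(signal, fs, carrier, win=1024, hop=256):
    """Down-convert the received signal around the known carrier, then take a
    windowed magnitude FFT per frame; rows are time, columns are Doppler bins."""
    t = np.arange(len(signal)) / fs
    baseband = signal * np.exp(-2j * np.pi * carrier * t)  # shift carrier to 0 Hz
    window = np.hanning(win)
    frames = []
    for start in range(0, len(baseband) - win + 1, hop):
        seg = baseband[start:start + win] * window
        frames.append(np.abs(np.fft.fftshift(np.fft.fft(seg))))
    return np.array(frames)  # shape: (num_frames, win)
```

A body moving toward the device would concentrate energy in positive Doppler bins, and away from it in negative bins; the DFS images fed to the ResNet encode exactly this asymmetry over time.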
|
|
08:30-17:30, Paper We-Online.38 | |
Stochastic Algorithm-Based Estimation of Slow and Fast Processes of Sensorimotor Adaptation During Locomotion* |
|
Singh, Rajat Emanuel | Northwestern College |
Hill, Christopher Mark | Louisiana State University |
Iqbal, Kamran | University of Arkansas at Little Rock |
Keywords: Human Performance Modeling, Human-centered Learning
Abstract: Sensorimotor adaptation involves concurrent slow and fast processes, commonly modeled using a dual-state framework. This study employed a dual-state Kalman filter, incorporating both deterministic (interior point algorithm) and stochastic optimization methods, including genetic algorithm (GA), particle swarm optimization (PSO), and differential evolution algorithm (DEA). Five models were developed to estimate parameters and internal states (slow and fast) during a visual feedback treadmill walking task, using knee kinematics data from eleven participants. PSO achieved the highest accuracy in both full-state and internal-state estimations. GA showed the lowest full-state accuracy but performed well in internal-state estimation. DEA ranked second in full-state accuracy but struggled with internal states, while the interior point method had lower full-state accuracy but outperformed DEA in internal-state estimation. Notably, internal-state accuracy did not always align with full-state performance. Overall, stochastic methods outperformed the deterministic approach in estimating dual-state Kalman filter models for sensorimotor adaptation.
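The dual-state framework referenced above is the standard two-state model of motor adaptation, in which a slow process (high retention, slow learning) and a fast process (low retention, fast learning) both update from the same performance error. A minimal simulation sketch follows; the retention/learning parameter values are illustrative defaults, not the values fitted in the study:

```python
import numpy as np

def dual_state_adaptation(perturbation, A_s=0.99, B_s=0.05, A_f=0.6, B_f=0.3):
    """Simulate the two-state model: x_{t+1} = A * x_t + B * e_t, with shared
    error e_t = perturbation_t - (x_s + x_f). Returns the slow and fast states."""
    n = len(perturbation)
    xs = np.zeros(n + 1)  # slow state: high retention (A_s), slow learning (B_s)
    xf = np.zeros(n + 1)  # fast state: low retention (A_f), fast learning (B_f)
    for t in range(n):
        e = perturbation[t] - (xs[t] + xf[t])  # performance error on trial t
        xs[t + 1] = A_s * xs[t] + B_s * e
        xf[t + 1] = A_f * xf[t] + B_f * e
    return xs, xf
```

Fitting A and B (and the state noise, when wrapped in a Kalman filter) to kinematic data is exactly the estimation problem the deterministic and stochastic optimizers in the study are compared on.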
|
|
08:30-17:30, Paper We-Online.39 | |
Deep Multi-Feature Hash Networks for Image-Text Retrieval |
|
Dong, Zheng | Qilu University of Technology (Shandong Academy of Sciences) |
Ruijia, Zhang | Qilu University of Technology (Shandong Academy of Sciences) |
Lu, Qin | Qilu University of Technology (Shandong Academy of Sciences) |
Keywords: Human-Computer Interaction, Multi-User Interaction, Information Systems for Design
Abstract: In recent years, cross-modal image-text retrieval has gained significant attention due to its ability to efficiently retrieve semantically relevant information from large-scale multimedia data. The rise of deep learning has provided new perspectives for addressing the heterogeneity challenge in cross-modal retrieval. However, existing hashing methods often struggle to balance efficient retrieval with fine-grained semantic alignment and global semantic understanding, thereby limiting retrieval accuracy. To tackle these challenges, this paper proposes a novel deep multi-feature hashing network (DMFHN), designed to achieve both efficient and fine-grained cross-modal retrieval through compact binary hash codes. The core of DMFHN lies in the synergy between the feature optimization encoder and a bidirectional GRU. The feature optimization encoder integrates self-attention mechanisms with depthwise separable convolutions to effectively capture both global dependencies and local details in images. Specifically, the module first employs multi-head self-attention to model the global contextual information of an image, then utilizes depthwise separable convolutions to extract crucial local features. By fusing global relationships with local details, it generates a more expressive image representation. Meanwhile, the bidirectional GRU enhances textual features by capturing sequential dependencies and contextual semantics within the text. Additionally, we design a cross-modal feature fusion strategy that dynamically integrates image and text representations, further improving fine-grained semantic expressiveness. Through a well-optimized hashing function, DMFHN constrains and quantizes multi-modal features into compact hash codes, ensuring high retrieval efficiency while maintaining strong cross-modal semantic consistency. 
Finally, experiments conducted on MIRFLICKR-25K and NUS-WIDE, two real-world datasets, demonstrate that DMFHN achieves state-of-the-art performance in image-text retrieval, significantly outperforming existing mainstream methods.
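The quantization step the abstract describes — constraining fused multi-modal features into compact binary hash codes — can be sketched in a few lines. This is an illustrative sign-based binarization with Hamming-distance matching, not the authors' DMFHN code; the toy feature vectors are invented:

```python
import numpy as np

def quantize_to_hash(features: np.ndarray) -> np.ndarray:
    """Quantize real-valued fused features into binary hash codes
    by taking the sign of each dimension (+1/-1 convention)."""
    return np.where(features >= 0, 1, -1)

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance between two +/-1 codes: number of differing bits."""
    return int(np.sum(a != b))

# toy fused image and text features, quantized to 8-bit codes
img_code = quantize_to_hash(np.array([0.3, -1.2, 0.7, -0.1, 0.9, -0.4, 0.2, -0.8]))
txt_code = quantize_to_hash(np.array([0.5, -0.9, 0.6, 0.2, 1.1, -0.3, -0.7, -0.5]))
print(hamming_distance(img_code, txt_code))  # → 2 (small distance: semantically close)
```

Retrieval then reduces to ranking gallery codes by Hamming distance, which is what makes hashing methods fast at scale.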
|
|
08:30-17:30, Paper We-Online.40 | |
RCIM: Relation-Driven Collaborative and Integrative Multimodal Knowledge Graph Completion |
|
Ruijia, Zhang | Qilu University of Technology (Shandong Academy of Sciences) |
Dong, Zheng | Qilu University of Technology (Shandong Academy of Sciences) |
Lu, Qin | Qilu University of Technology (Shandong Academy of Sciences) |
Keywords: Human-Machine Interaction, Information Systems for Design, Human-Collaborative Robotics
Abstract: Knowledge graph completion (KGC) aims to learn high-quality representations of entities and relations in order to predict missing entities in knowledge graphs (KGs). Multimodal knowledge graph completion (MMKGC) enhances entity representations by incorporating multimodal information such as images. However, existing methods struggle to effectively filter out irrelevant features and fail to fully exploit relational dependencies, which limits their reasoning ability. To address these issues, we propose a Relation-driven Collaborative and Integrative Multimodal knowledge graph completion model (RCIM-KGC), which leverages relation-driven mechanisms to enhance multimodal knowledge representations. Specifically, an adaptive Relation-Driven Collaborative Module (RDCM) selects and integrates visual and structural information based on relation importance to optimize entity representations. Meanwhile, a Relation-Driven Integrative Module (RDIM) scores neighboring entities to guide the aggregation of multimodal neighborhood information, thereby improving reasoning performance. Experimental results…
|
|
08:30-17:30, Paper We-Online.41 | |
A Vision-Based Medication Monitoring and Advisory System to Detect Adherence from Video Streams |
|
Wang, Anthony | Stratford Preparatory |
Greer, Ross | University of California, Merced |
Keywords: Human-Machine Cooperation and Systems, Assistive Technology, Human-Computer Interaction
Abstract: A live medication monitoring and advisory system to track medicine intake is a potentially viable solution to enforce a medication schedule. In this paper, we present a system that uses an indoor mounted camera to detect live medicine intake behavior using the video-based attention architecture TimeSformer. We present an algorithm for processing an entire video to predict moments of medicine-taking using repeated application of this model on overlapping windowed segments. We trained and evaluated our model on real-world data from 100 videos with varied lighting, clothing, medicine intake style, and medicine bottles. Our model achieved an AUC of 99.49% on a precision-recall curve, highlighting its practicality and potential for reducing overdoses and enforcing prescription schedules. We make our code publicly available at: https://github.com/arw5902/medicine.
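The windowed-inference procedure described above — repeatedly applying a clip-level classifier to overlapping segments and merging the per-window scores — can be sketched generically. The `score_fn` stub below stands in for the TimeSformer model, which is not reproduced here; window and stride sizes are invented:

```python
import numpy as np

def sliding_window_predictions(n_frames, window, stride, score_fn):
    """Run a clip-level classifier over overlapping windows and average
    the scores from every window that covers each frame."""
    scores = np.zeros(n_frames)
    counts = np.zeros(n_frames)
    for start in range(0, max(n_frames - window, 0) + 1, stride):
        s = score_fn(start, start + window)  # clip-level probability
        scores[start:start + window] += s
        counts[start:start + window] += 1
    counts[counts == 0] = 1  # frames not covered by any window keep score 0
    return scores / counts

# toy score_fn: pretend frames 30-60 contain the medicine-taking event
fake = lambda s, e: 1.0 if (s < 60 and e > 30) else 0.0
per_frame = sliding_window_predictions(100, window=16, stride=8, score_fn=fake)
```

Averaging overlapping windows smooths out spurious single-window detections at the cost of some temporal localization sharpness.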
|
|
08:30-17:30, Paper We-Online.42 | |
Reinforcement Learning with Attention-Based Multi-Scale Convolution Networks for Imperfect Information Games |
|
Yuan, Weilin | National University of Defense Technology |
Hu, Zhenzhen | National University of Defense Technology |
Jiaxing, Chen | National University of Defense Technology |
Chen, Shaofei | National University of Defense Technology |
Zhao, Weiwei | National University of Defense Technology |
Lu, Lina | National University of Defense Technology |
Keywords: Environmental Sensing, Networking and Decision-Making
Abstract: Game theory plays a pivotal role in addressing Imperfect Information Games (IIGs), having given rise to the development of a suite of superhuman agents. However, constructing information sets and game trees in game-theoretic approaches restricts the transferability of strategies. Recently, Reinforcement Learning (RL) has been introduced from perfect information games to IIGs, owing to its generalization capabilities. Yet, the challenges of efficient feature abstraction and stable learning within the RL paradigm persist. To address these challenges, we explore a stable actor-critic training framework with knowledge distillation based on supervised learning to enhance proximal policy optimization in strategy learning (MCA-PPO). For feature extraction, we engineer a multi-scale convolutional neural network that amalgamates local feature information across various receptive field sizes, unearthing the latent information embedded within potential feature sequences. We refine the network architecture further by integrating it with an attention mechanism, enabling it to discern relationships between feature channels and to autonomously identify the essence of multi-scale features. Regarding strategy learning, we employ the knowledge distillation technique to mitigate the instability inherent in proximal policy optimization, thereby hastening strategic learning. Finally, we conducted a systematic and comprehensive analysis of the MCA-PPO algorithm through a series of experiments on Texas Hold'em poker, showing the efficacy and transferability of the MCA-PPO framework.
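The knowledge-distillation component can be illustrated with the standard temperature-softened KL objective between a teacher policy and a student policy. This is a generic sketch of the distillation technique, not the MCA-PPO loss itself; logits and temperature are invented:

```python
import numpy as np

def softmax(logits, temp=1.0):
    """Numerically stable softmax with a temperature parameter."""
    z = logits / temp
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, temp=2.0):
    """KL(teacher || student) on temperature-softened action distributions,
    the standard knowledge-distillation objective."""
    p_t = softmax(np.asarray(teacher_logits, dtype=float), temp)
    p_s = softmax(np.asarray(student_logits, dtype=float), temp)
    return float(np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))))

# the loss vanishes when the policies agree and grows as they diverge
same = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
diff = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

Adding such a term to the policy-gradient objective anchors the student to a stable reference policy, which is the stabilizing effect the abstract attributes to distillation.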
|
|
08:30-17:30, Paper We-Online.43 | |
PGQE: Pose-Guided Query Enhancement for Person Re-Identification |
|
Wang, Ning | NanJing University |
Lu, Sanglu | State Key Laboratory for Novel Software Technology, NanJing Univ |
Xie, Lei | NanJing University |
Keywords: Human-Centered Transportation, Human Enhancements, Human-Machine Interaction
Abstract: Person re-identification (Re-ID) aims to retrieve images of the same individual from a gallery captured by disjoint camera views. A major challenge lies in learning robust and discriminative representations under varying human poses and cluttered backgrounds. Recent approaches based on Generative Adversarial Networks (GANs) attempt to mitigate these issues by augmenting training data via pose or style transfer. However, despite generating visually plausible samples, such methods often introduce redundant or low-quality data, which can hinder feature learning, slow down convergence, and lead to overfitting. In this paper, we propose PGQE, a Pose-Guided Query Enhancement framework that synthesizes pose-normalized query images during inference, thereby avoiding the drawbacks of GAN-based data augmentation in training. Motivated by the observation that queries with cleaner backgrounds and canonical poses yield better matching performance, PGQE leverages high-quality gallery samples to extract target poses and guide image generation. To ensure the quality and discriminability of the generated queries, we impose two constraints: identity consistency with the source image and pose consistency with the extracted target pose. Extensive experiments on Market-1501, CUHK03, and MSMT17 demonstrate that PGQE significantly improves Re-ID accuracy and outperforms several state-of-the-art methods.
|
|
08:30-17:30, Paper We-Online.44 | |
An Efficient Policy-Hiding Access Control Scheme for Data Sharing in Mobile Health Applications |
|
Wang, Fengling | Guang'an Vocational and Technical College |
Tian, Qiufa | Guangxi Normal University |
Yang, Mingzhi | Guilin University of Technology |
Chen, Fenghua | Guangxi Normal University |
Song, Weitong | Guilin University of Technology |
Chen, Zhuhui | Guangxi Normal University |
Keywords: Systems Safety and Security
Abstract: Ciphertext-Policy Attribute-Based Encryption (CP-ABE) is pivotal to mobile health security. Research on traditional CP-ABE schemes has predominantly concentrated on the security and efficiency of ciphertext data sharing, whereas investigations into key distribution remain insufficient. Moreover, access policies associated with user attributes in traditional schemes are typically transmitted in plaintext, posing a risk of user privacy leakage. Furthermore, there is limited research on user tracing and attribute revocation in traditional schemes following data leakage incidents. To address these issues, this paper proposes a multi-secret sharing CP-ABE access control scheme tailored for mobile health. The scheme integrates the CTM-SS multi-secret sharing algorithm with a multi-authority architecture to achieve efficient distributed data sharing. The access policy is fully concealed using a hash function, and a user identity mapping table is constructed. In the event of data leakage, the regulator uses the identity mapping table to trace malicious users and revoke their attributes. Simulation results demonstrate that the proposed scheme exhibits strong security and performance characteristics and is suitable for real-world mobile healthcare environments.
|
|
08:30-17:30, Paper We-Online.45 | |
Adaptive Multi-Frequency Attention Network for Human Motion Prediction |
|
Shang, Jianbo | Dongguan University of Technology |
Ren, Ziliang | Dongguan University of Technology |
Zhang, Qieshi | Shenzhen Institutes of Advanced Technology, Chinese Academy of S |
Zhang, Fuyong | School of Computer Science and Technology, Dongguan University O |
Zhao, Tiezhu | School of Computer Science and Technology, Dongguan University O |
Keywords: Human-Computer Interaction, Intelligence Interaction, Human-Collaborative Robotics
Abstract: Human motion prediction aims to forecast future human motion sequences based on historical motion sequences. Existing deterministic methods typically predict only a single future sequence, ignoring the inherent stochasticity and diversity of human motion. To address this limitation, we propose a stochastic human motion prediction network to achieve diverse motion prediction. Specifically, in the latent space, we introduce an adaptive multi-frequency attention module and a graph convolutional module. The graph convolutional network module encodes the sequence into a continuous latent representation, while the adaptive multi-frequency attention module captures multi-frequency information from historical motion sequences and selects important frequency components for the encoder and decoder to improve prediction accuracy. Additionally, a set of motion queries and semantic latent directions are introduced to further enhance the diversity of prediction results. Extensive experiments demonstrate that our method outperforms existing state-of-the-art methods in both prediction accuracy and diversity.
|
|
08:30-17:30, Paper We-Online.46 | |
Re-Parameterization Convolution Spiking Neural Network for Object Detection |
|
Zhou, Jun | Dongguan University of Technology |
Ren, Ziliang | Dongguan University of Technology |
Zhang, Qieshi | Shenzhen Institutes of Advanced Technology, Chinese Academy of S |
Kudayberdievna, Kadyrkulova Kyial | Kyrgyz State Technical University Named after I.Razzakov, Bishke |
Aizharkyn, Taalaybekova | Kyrgyz State Technical University Named after I.Razzakov, Bishke |
Keywords: Intelligence Interaction, Human-Computer Interaction
Abstract: As the third generation of neural networks, Spiking Neural Networks (SNNs) have biological plausibility and low-power advantages over Artificial Neural Networks (ANNs). However, applying SNNs to object detection presents challenges in achieving both high detection accuracy and fast processing speed. To overcome these problems, we propose Re-parameterization SpikeYOLO (RepSpikeYOLO) for high-performance and energy-efficient object detection. Our design revolves around the network architecture and the SNN residual block. Foremost, SNNs are difficult to train, mainly owing to the complex dynamics of their neurons and non-differentiable spike operations. We design a YOLO architecture that addresses this problem by training the SNN with surrogate gradients. Second, object detection is more sensitive to gradient vanishing or exploding when training deep SNNs. To address this challenge, we design a new SNN residual block, which can effectively extend the depth of the directly-trained network with low power consumption. The proposed approach is validated on both the COCO and PASCAL VOC datasets, where our model achieves performance comparable to an ANN with the same architecture. On the COCO dataset, we obtain 54% mAP@50 and 33.7% mAP@50:95, which are +3.9% and +3.7% higher than the prior state-of-the-art SNN, respectively. On the PASCAL VOC dataset, we achieve 75.1% mAP@50, which is +21.05% higher than the prior state-of-the-art SNN.
|
|
08:30-17:30, Paper We-Online.47 | |
Feast of Fireworks: Immersive VR Relaxation with Motion Capture Interaction |
|
Yao, Hongjun | Southeast University |
Ding, Ding | Southeast University |
Nie, Cheng | Southeast University |
Liu, Zicheng | Southeast University |
Xu, Xiangyu | Southeast University |
Keywords: Virtual/Augmented/Mixed Reality
Abstract: In modern society, the prevalence of psychological disorders is gradually increasing among the population, especially among young people. Traditional psychological relaxation methods and counseling services struggle to meet the growing demand. This study proposes a virtual reality (VR) relaxation system that integrates multiple relaxation methods, aiming to alleviate users' anxiety. The system features natural interaction, multiple activity modules designed based on different relaxation theories (such as Expressive Writing, Attention Restoration Theory, and Catharsis Theory), and a reward module, providing users with an immersive experience. The empirical study recruited 36 undergraduate and master students aged 18-25. Through comparative experiments, evaluations were conducted by measuring long-term and short-term anxiety, immersion, and system usability. The results show that the VR relaxation system is significantly effective in reducing users' anxiety levels, and its long-term effect is better than that of Progressive Muscle Relaxation (PMR). Additionally, the result of immersion and system usability indicate that the system offers a satisfying and user-friendly virtual experience.
|
|
08:30-17:30, Paper We-Online.48 | |
Multimodal Large Language Model-Action Unit Approach for Mixed Emotion Descriptors |
|
Kim, Cheolhee | Korea Institute of Science and Technology |
Ji, Seungyeon | Korea Institute of Science and Technology |
Han, Kyungreem | Korea Institute of Science and Technology |
Keywords: Affective Computing, Human-Computer Interaction, Cognitive Computing
Abstract: Facial Expression Recognition (FER) allows computers to identify emotional expressions depicted on a human face. While recent vision-language models have demonstrated remarkable performance across various single-emotion FER tasks, often exceeding human-level performance, they usually fail on mixed-emotion cases. This study describes a Multimodal Large Language Model (MLLM) approach, Emo-AU (Action Unit) LLM, for mixed emotion detection using the FERPlus dataset. The Emo-AU LLM uses a cross-modal attention mechanism that relates AU and visual features to circumvent the limitations of current visual encoders, which rely on coarse facial features. Our model achieved an accuracy of 98.30% for single-emotion detection, and it obtained accuracies of 89.76% (major emotion: labeled by more of the 10 annotators) and 82.42% (minor emotion: labeled by fewer) for the two-emotion cases. The model also explains the reasoning behind its predictions, a benefit of its large language model backbone. This study lays the foundations for understanding facial expressions in context—considering the surrounding situation and social interaction, rather than relying solely on the human face in real and generated images/videos.
|
|
08:30-17:30, Paper We-Online.49 | |
SocializeChat: A GPT-Based AAC Tool Grounded in Personal Memories to Support Social Communication (I) |
|
Xiang, Wei | Zhejiang University |
Xu, Yunkai | Pennsylvania State University |
Fang, Yuyang | Zhejiang University |
Teng, Zhuyu | Zhejiang University |
Jiang, Zhaoqu | Zhejiang University |
Hu, BeiJia | ZheJiang University |
Yang, Jinguo | Zhejiang University |
Keywords: Assistive Technology, Human-Computer Interaction
Abstract: Elderly people with speech impairments often face challenges in engaging in meaningful social communication, particularly when using Augmentative and Alternative Communication (AAC) tools that primarily address basic needs. Moreover, effective chats often rely on personal memories, which are hard to extract and reuse. We introduce SocializeChat, an AAC tool that generates sentence suggestions by drawing on users' personal memory records. By incorporating topic preference and interpersonal closeness, the system reuses past experience and tailors suggestions to different social contexts and conversation partners. SocializeChat not only leverages past experiences to support interaction, but also treats conversations as opportunities to create new memories, fostering a dynamic cycle between memory and communication. A user study shows its potential to enhance the inclusivity and relevance of AAC-supported social interaction.
|
|
08:30-17:30, Paper We-Online.50 | |
Advancing Meniscus Tear Diagnosis in Knee MRI: MRNet+ |
|
Lam, Jerry H. Y. | The Chinese University of Hong Kong |
Tsang, Colin S. C. | The Chinese University of Hong Kong |
Sum, K. W. | The Chinese University of Hong Kong |
Keywords: Medical Informatics
Abstract: This paper explores the use of deep learning techniques in the field of medical image analysis, specifically focusing on knee MRI scans. With data pre-processing techniques, we have developed a model, MRNet+, that improves the area under the curve (AUC) for MRI diagnosis in detecting knee meniscus tears. Pretrained parameters and data augmentation techniques are utilized to enhance model performance. The result has shown an improvement in AUC compared to the original MRNet. Our proposed model, MRNet+, has achieved an AUC of 88.49%. This work demonstrates the potential of data pre-processing techniques and deep learning in improving diagnostic tools for medical imaging and suggests future directions for applying AI to healthcare.
|
|
08:30-17:30, Paper We-Online.51 | |
QwenGrasp: Human-Robot Interactive 6-DoF Target-Oriented Grasping with Large Vision-Language Model |
|
Chen, Xinyu | Southern University of Science and Technology |
Yang, Jian | Southern University of Science and Technology |
Zhao, Qi | Southern University of Science and Technology |
He, Zonghan | Southern University of Science and Technology |
Yang, Haobin | Southern University of Science and Technology |
Shi, Yuhui | Southern University of Science and Technology |
Keywords: Human-Collaborative Robotics, Human-Machine Cooperation and Systems, Human-Computer Interaction
Abstract: Human-robot interactive target-oriented grasping in unstructured environments, guided by natural language, is crucial for enabling intelligent robotic arms to perform tasks safely and efficiently. However, it remains a challenge for robot arms to comprehend human instructions and execute corresponding grasping actions. In this paper, we propose QwenGrasp, a novel system that uses a large vision-language model to align workspace images with textual instructions. This alignment enables QwenGrasp to perform accurate 6-DoF grasping on the specified target object. Additionally, we introduce Masked REGNet, which incorporates target-object location information into the network to generate precise grasp poses and ensure high grasp quality. Through extensive real-world experiments, QwenGrasp achieves over 90% success across six diverse instruction types. The results highlight QwenGrasp’s ability to understand human intent and execute precise grasping actions. Notably, it outperforms other target-oriented methods in both performance and instruction comprehension. Even when given vague descriptions, directional cues, or complex instructions, QwenGrasp reliably identifies and grasps the correct object. An ablation study further confirms the importance of each component, with all contributing significantly to robust and high-quality grasping.
|
|
08:30-17:30, Paper We-Online.52 | |
Denoising Near-Infrared Spectroscopy Signal |
|
Yin, Yue | Guilin University of Electronic Technology |
Li, Runze | Guilin University of Electronic Technology |
Hu, Yan | School of Artificial Intelligence, Guilin University of Electron |
Chaddad, Ahmad | Guilin University of Electronic Technology |
Keywords: Brain-Computer Interfaces, Brain-based Information Communications, Human-Computer Interaction
Abstract: Functional near-infrared spectroscopy (fNIRS) is a non-invasive method for detecting brain activity; however, noise, including motion artifacts and systemic interference, significantly impacts signal quality. To identify an effective denoising technique, we evaluated seven popular methods on a public dataset: the Savitzky-Golay (SG) filter, spline interpolation, multivariate disturbance filtering (MDF), traditional band-pass filtering, coefficient of variation (CV) analysis, correlation-based signal improvement (CBSI), and time derivative distribution repair (TDDR). Experimental results indicate that the SG filter offers the highest signal-to-noise ratio (SNR) of 27.65 and excels at removing high-frequency noise such as spikes through averaging. However, the CV method provides the highest contrast enhancement, with a contrast-to-noise ratio (CNR) of 19.41. Importantly, spline interpolation balances signal contrast and primary signal extraction effectively, so we consider it the best method overall. These findings offer valuable guidance for selecting appropriate filters for denoising fNIRS signals. The code and results are available at https://github.com/AIPMLab/Signal_SMC.
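Savitzky-Golay denoising, which this comparison found strongest on SNR, fits a low-order polynomial over a sliding window. A minimal SciPy sketch on a synthetic fNIRS-like signal (the signal, noise level, and filter parameters here are invented for illustration, not the paper's settings):

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 500)
clean = np.sin(2 * np.pi * 0.2 * t)                 # slow hemodynamic-like oscillation
noisy = clean + 0.3 * rng.standard_normal(t.size)   # spike-like high-frequency noise

# Savitzky-Golay: least-squares polynomial fit in each sliding window
denoised = savgol_filter(noisy, window_length=51, polyorder=3)

def snr_db(signal, estimate):
    """SNR in dB of an estimate relative to the known clean signal."""
    noise = signal - estimate
    return 10 * np.log10(np.sum(signal**2) / np.sum(noise**2))

improved = snr_db(clean, denoised) > snr_db(clean, noisy)  # filtering raises SNR
```

A wider window suppresses more noise but also attenuates genuine fast transients, which is the trade-off behind the paper's comparison across filters.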
|
|
08:30-17:30, Paper We-Online.53 | |
Enhancing Path Prediction with Eye Movement Data: Deep Learning Applications in Advanced Driver Assistance Systems and Autonomous Vehicles |
|
Alsanwy, Shehab | Deakin University |
Qazani, Mohammad Reza Chalak | Deakin University |
Shajari, Arian | Deakin University |
Nahavandi, Saeid | Swinburne University of Technology |
Asadi, Houshyar | Deakin University |
Keywords: Human-Centered Transportation, Human-Machine Interaction, Human-Machine Cooperation and Systems
Abstract: Vehicle trajectory prediction plays a pivotal role in enhancing advanced driver assistance systems (ADAS) and autonomous vehicles (AVs), crucial for collision avoidance, path planning, and traffic management. Traditional models often fail to account for variations in driver behaviour, such as eye movement patterns, which can substantially influence trajectory predictions. Our research presents an advanced trajectory prediction model that integrates Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks with both eye movement data and traditional vehicle dynamic information. The accuracy of these models was evaluated in a simulated environment designed to mimic real-world driving conditions, capturing extensive data on vehicle dynamics, including position, rotation, acceleration, speed, and eye movement patterns. Data collection was rigorously conducted with 17 drivers, each using a driving simulator running the Euro Truck Simulator software. The models were implemented and validated using Python 3.9 and Google Colab, chosen for their effectiveness in handling deep learning tasks. Our findings demonstrate that the inclusion of eye movement data alongside vehicle dynamics enhances the accuracy of trajectory predictions, significantly reducing Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE), compared to models based solely on vehicle dynamics. This improvement not only bolsters the precision of trajectory predictions for ADAS and AV systems but also significantly elevates their safety and operational efficiency.
|
|
08:30-17:30, Paper We-Online.54 | |
Military Aircraft Target Detection Using Enhanced YOLOv11 and Super-Resolution Algorithm |
|
Zhou, Yu | Nanjing University of Science and Technology |
Wang, Jun | Nanjing University of Science and Technology |
Bo, Yuming | Nanjing University of Science and Technology |
Yang, Huanyu | Nanjing University of Science and Technology |
Liu, Peiyao | Nanjing University of Science and Technology |
Yang, Lijun | Nanjing University of Science and Technology |
Keywords: Human-Machine Interaction, Human-Machine Cooperation and Systems, Human-Machine Interface
Abstract: Military aircraft target detection remains a critical challenge in modern defense systems, particularly for remote sensing imagery with complex environmental interference. Existing approaches often exhibit limitations in maintaining high detection fidelity across varying resolutions and cluttered backgrounds. To overcome these constraints, this study presents SR-YOLOv11, a multi-stage framework integrating Super-Resolution (SR) reconstruction and hierarchical feature optimization. Initially, the Enhanced Deep Super-Resolution (EDSR) network is deployed to refine input image quality, ensuring precise preservation of critical aircraft morphological features. Subsequently, the YOLOv11 architecture is systematically enhanced through three key innovations: 1) replacement of the native C3k2 module with a C3k2_AdditiveBlock to amplify discriminative feature learning; 2) integration of an Adaptive Downsampling (ADown) layer for computationally efficient multi-scale context aggregation; 3) implementation of an auxiliary detection head mechanism with cross-layer feature fusion, significantly boosting localization accuracy for occluded targets. Comprehensive evaluations on the MAR20 dataset demonstrate the framework's superiority, achieving 98.7% mAP@50 and 80.9% mAP@(50:95). The proposed architecture demonstrates enhanced robustness in complex environments while maintaining real-time processing efficiency, validating its operational viability in aerial surveillance scenarios.
|
|
08:30-17:30, Paper We-Online.55 | |
SIEP-YOLO: Small Target Cluster Detection in Aerial Images |
|
Liu, Peiyao | Nanjing University of Science and Technology |
Wang, Jun | Nanjing University of Science and Technology |
Hu, Jinpeng | Chongqing Construction Engineering Group Corporation Limited |
Yang, Huanyu | Nanjing University of Science and Technology |
Zhu, Peng | Nanjing University of Science and Technology |
Bo, Yuming | Nanjing University of Science and Technology |
Keywords: Human Enhancements, Human-Machine Interaction, Information Visualization
Abstract: Accurate detection of small object clusters in modern security surveillance systems is critical for threat prevention and response, directly impacting monitoring efficiency and risk management efficacy. However, prevailing object detection algorithms struggle with high-density small target scenarios, suffering from slow inference speeds, insufficient precision, frequent false positives, and high miss rates. These limitations undermine real-time reliability and responsiveness to emergencies. To address these challenges, we propose SIEP-YOLO, an enhanced YOLOv11-based model integrating three novel components: the SDI-iAFF feature fusion module, EUCB upsampling module, and a specialized small object detection layer P2. This architecture boosts representational capacity and detection performance while simplifying computational complexity, achieving a balance between high accuracy and lightweight design. Experimental results demonstrate that SIEP-YOLO outperforms the original YOLOv11 by 3.2% and 2.0% in mAP@0.5 and mAP@(0.5:0.95), respectively, on benchmark datasets. The proposed model thus emerges as a superior solution for small object cluster detection, enabling more efficient and reliable surveillance in complex environments.
|
|
08:30-17:30, Paper We-Online.56 | |
FedGA: Federated Learning Via Gradient Adaptive Aggregation (I) |
|
Hu, Changfeng | Hangzhou Dianzi University |
Tan, Min | Hangzhou Dianzi University |
Gao, Zhigang | China Jiliang University |
Han, Tingting | Hangzhou Dianzi University |
Kuang, Zhenzhong | Hangzhou Dianzi University |
Keywords: Cognitive Computing, Human-Machine Interaction, Intelligence Interaction
Abstract: In modern life, the rapid proliferation of Internet of Things (IoT) devices has made them indispensable tools for data collection and analysis across various domains. However, growing concerns over data ownership and privacy have hindered effective data sharing among IoT devices, leading to the persistent challenge of data silos. Federated Learning (FL) has emerged as a promising solution to this problem by enabling collaborative model training without direct data exchange. Despite its potential, FL faces two critical limitations: severe catastrophic forgetting of historical knowledge and inefficient average aggregation. To address these challenges, this paper proposes FedGA, an innovative FL framework that leverages cosine similarity-based weighted aggregation to enhance model convergence speed. Furthermore, FedGA incorporates a mechanism to memorize historical models, thereby significantly alleviating catastrophic forgetting. Extensive experiments on three public datasets validate the effectiveness of FedGA, demonstrating its superior performance in both accuracy and training efficiency compared to state-of-the-art methods. The results highlight FedGA's capability to overcome the key shortcomings of existing FL approaches, making it a robust solution for practical IoT applications.
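Cosine-similarity-based weighted aggregation can be sketched as follows. This is a generic illustration that weights each client update by its alignment with the mean update; it is not the authors' FedGA implementation, and the reference direction and clipping choices are assumptions:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two flattened update vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def weighted_aggregate(reference, client_updates):
    """Weight each client update by its (non-negative) cosine similarity
    to a reference direction, then renormalize: clients whose gradients
    align with the consensus contribute more to the global update."""
    sims = np.array([max(cosine(u, reference), 0.0) for u in client_updates])
    if sims.sum() == 0:
        sims = np.ones_like(sims)  # fall back to plain averaging
    weights = sims / sims.sum()
    return sum(w * u for w, u in zip(weights, client_updates))

# two aligned clients and one adversarial/divergent client
clients = [np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([-1.0, 0.0])]
reference = np.mean(clients, axis=0)
agg = weighted_aggregate(reference, clients)  # divergent client is down-weighted
```

Compared with plain FedAvg, the misaligned client receives zero weight here, which illustrates why similarity-weighted aggregation can converge faster under heterogeneous clients.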
|
| |