SMC 2024 Program | Tuesday October 8, 2024


TuAT1	MR01
Cognitive and Affective Computing 1	Regular Papers - Cybernetics
Chair: Liping, Wang	Zhejiang University of Technology

08:45-09:05, Paper TuAT1.3
Parquet-Based CTR Model Training in Production Environment

Liu, Zhibing	Institute of Information Engineering, CAS; School of Cyber Secur
Guo, Jinrong	JD.com
Zhou, Biyu	Institute of Information Engineering, Chinese Academy of Sciences
Xiaokun, Zhu	JD.com
Yongjun, Bao	JD.com
Han, Jizhong	Institute of Information Engineering, Chinese Academy of Science
Hu, Songlin	Institute of Information Engineering, Chinese Academy of Sciences
Keywords: Deep Learning, Neural Networks and their Applications, Application of Artificial Intelligence Abstract: CTR models play an important role in modern recommendation systems. Most recommendation models in industrial scenarios are trained with TensorFlow. However, we observe that TFRecord, the native key-value data format in TensorFlow, is not the best choice for CTR training: the keys take up to 54% of the storage space in TFRecord-formatted training data. To overcome this, we introduce Apache Parquet, a column-oriented data format, into CTR tasks to improve space efficiency. To make the scheme usable in a production environment, we further provide high-performance implementations of Parquet-based training. First, GPU data preprocessing replaces the original Spark-based solution to generate Parquet training data and accelerate preprocessing. Second, we modify the data loader in TensorFlow to consume Parquet training data efficiently. Experimental results show that the preprocessed Criteo dataset is 95.29% smaller than its TFRecord counterpart and that data preprocessing time is reduced by 99.6%. Without any loss in model performance, we speed up training by 1.45x. Our scheme has been applied to our internal business and has obtained similar performance benefits.
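The storage argument in the abstract above (keys repeated in every key-value record versus stored once per column) can be illustrated with a toy sketch. It uses JSON as a stand-in serialization and does not reproduce the actual TFRecord or Parquet encodings. The feature names and sample values are hypothetical.

```python
import json

def row_oriented_size(records):
    """Bytes used when every record carries its own feature names (key-value style)."""
    return sum(len(json.dumps(r).encode("utf-8")) for r in records)

def column_oriented_size(records):
    """Bytes used when each feature name is stored once, followed by its column of values."""
    columns = {key: [r[key] for r in records] for key in records[0]}
    return len(json.dumps(columns).encode("utf-8"))

# Hypothetical CTR-like samples: a click label plus a few feature ids.
records = [
    {"label": i % 2, "user_id": 1000 + i, "item_id": 2000 + i, "category_id": i % 7}
    for i in range(1000)
]

row_bytes = row_oriented_size(records)
col_bytes = column_oriented_size(records)
print(f"row-oriented: {row_bytes} bytes, column-oriented: {col_bytes} bytes")
```

The gap in this toy comes only from repeated keys. Real columnar formats also apply per-column encoding and compression, which this sketch does not model.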

09:05-09:25, Paper TuAT1.4
A Multi-Scale Qiantang River Tide Prediction Model Based on Multi-Period Decoupling

Liping, Wang	Zhejiang University of Technology
Jin, Hao	Zhejiang University of Technology
Qicang, Qiu	Zhejiang Lab
Hui, Wang	Zhejiang University of Technology
Keywords: Neural Networks and their Applications, AI and Applications, Application of Artificial Intelligence Abstract: To enhance the accuracy of tidal bore height prediction in the Qiantang River, this paper addresses the limitations of one-dimensional, single-scale time series in terms of representational capacity by proposing an MPCSM model that integrates multi-period decomposition with cross-scale fusion for tide height forecasting. The model initially employs the Maximum Overlap Discrete Wavelet Transform to decompose the original time series data into multiple periodic components. Subsequently, it leverages multi-frequency channel attention and gating mechanisms to ascertain the significance of each frequency component and optimize prediction weights. For prediction, cross-scale iterative forecasting is utilized to capture both long-term and short-term characteristics of tidal height data, accompanied by a designed loss function computation strategy that adapts to cross-scale prediction errors. At the Qiantang River Zhakou station, compared to current mainstream prediction models, the average absolute error across different forecast time spans has been reduced by 9.26%. The research findings can serve as a reference for tidal bore prediction in the Qiantang River and other similar basins.

09:25-09:45, Paper TuAT1.5
Enhancing Session-Based Recommendation Via Inter-Session Similar Intent Modeling and Graph Neural Networks

Li, Yunhan	Inner Mongolia University
An, Chunyan	Inner Mongolia University
Yang, Conghao	Inner Mongolia University
Wang, Mingyuan	Inner Mongolia University
Keywords: Neural Networks and their Applications, AI and Applications, Deep Learning Abstract: Session-based recommendation (SBR) is a challenging task that aims to make item recommendations based on anonymized user session data. Mainstream SBR efforts focus on modeling information within a session and do not use information from other sessions. Although some works try to use information from other sessions, many limitations remain, and how to model that information is still an open problem. To overcome these limitations, we propose a new method for learning similar intentions between sessions, aiming to better model the recommendation information contained in other sessions. Specifically, we contribute a new model named ISIM-GNN that learns and integrates three levels of information simultaneously: (i) In the intra-session representation learning layer, we represent the session as a session graph and model it using a gated graph neural network. (ii) In the global item embedding learning layer, we use the graph attention mechanism to propagate and aggregate relevant item information from other sessions on the global graph. (iii) In the inter-session similar intent learning layer, we employ both "hard similarity" and "soft similarity" to select similar sessions, and use the attention mechanism to conduct session-level aggregation on the selected similar sessions to make better use of the inter-session collaboration information. Experiments on three real-world datasets show a significant performance improvement of our approach compared to state-of-the-art work.


TuAT2	MR02
Deep Learning and Neural Networks 4	Regular Papers - Cybernetics
Chair: Yang, Yisheng	Xiamen University Malaysia

08:45-09:05, Paper TuAT2.3
AutoASD: An Automated Architecture Search for Detecting Insidious Malicious Traffic Behaviour in APT Attacks with Assorted Features

Liu, Xinyu	Sichuan University
Zhong, Zhentian	Sichuan University
Li, Xiaohui	Sichuan University
Xiang, Huifang	Sichuan University
Keywords: Deep Learning, Transfer Learning, Machine Learning Abstract: It has been acknowledged that the risks posed by Advanced Persistent Threats (APTs) are critical. These attacks can allow cybercriminals to remotely manipulate infected devices and steal sensitive data. To effectively combat APT attacks, it is crucial to employ multidimensional analysis techniques that can predict their impact and detect lateral infiltration behavior. This paper presents an approach called AutoASD, which utilizes Neural Architecture Search (NAS), Deep Learning (DL), and Transfer Learning (TL) to identify various types of malicious traffic in APT attacks. AutoASD analyzes data at multiple granularities to classify different types of malware traffic and enhance classification accuracy. It leverages feature extraction and a pre-trained high-performance backbone network as the seed network, and employs parameter remapping to adjust the depth, width, and kernel size to create a super network. The aim of using NAS is to improve the real-time performance and accuracy of the system. In experiments using MobileNetV2, AutoASD demonstrated superior performance in APT malicious traffic classification, particularly for attacks with small sample sizes.

09:05-09:25, Paper TuAT2.4
EDAW: Enhanced Knowledge Distillation and Adaptive Pseudo Label Weights for Continual Named Entity Recognition

Sheng, Yubin	Central South University
Zhang, Zuping	Central South University
Tang, Panrui	Central South University
Huang, Bo	Central South University
Xiao, Yao	XinJiang University
Keywords: Knowledge Acquisition, Neural Networks and their Applications, Deep Learning Abstract: Continual Learning for Named Entity Recognition (CL-NER) is designed to train models capable of adapting to evolving data by continuously introducing new entity types. This approach is crucial in dynamic environments where data evolves, such as social media, healthcare, and legal documents, necessitating the model to retain the memory of previously learned entity types while learning to identify new ones. However, due to the neural network's tendency to acquire new knowledge and forget old knowledge in continual learning and the unique non-entity type annotations in NER tasks, CL-NER faces severe catastrophic forgetting and semantic drift issues. In this paper, we propose Enhanced Knowledge Distillation and Entropy-based Adaptive Pseudo Label Weights (EDAW) to address the catastrophic forgetting and semantic drift issues in CL-NER. Specifically, we develop an enhanced knowledge distillation method that combines Kullback-Leibler divergence and feature cosine discrepancy. This method effectively minimizes the variance in output probability distributions and aligns the internal feature spaces between new and old models, thus reducing catastrophic forgetting. Additionally, we propose an entropy-based adaptive pseudo label weight method that allows the model to assign different weights to pseudo labels with varying certainties during training, effectively alleviating semantic drift and error accumulation caused by erroneous re-labeling of pseudo labels. Notably, this study pioneers the inclusion of a Chinese dataset in CL-NER, enhancing the model's robustness and demonstrating its efficacy in a multilingual context. Experiments on fourteen CL-NER settings across four public NER datasets show that EDAW improves average Micro-F1 and Macro-F1 scores by 3.44% and 3.72%, respectively, over existing state-of-the-art(SOTA) methods. 
We make our code available at: https://github.com/livosr/EDAW/tree/master
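The two ingredients named in the abstract above, a distillation loss combining Kullback-Leibler divergence with a feature cosine discrepancy, and an entropy-based weight for pseudo labels, can be sketched as follows. The exact formulas used by EDAW are not given in the abstract, so the combination rule, the `alpha` coefficient, and the `1 - normalized entropy` weighting are assumptions made for illustration only.

```python
import math

def kl_divergence(p_old, p_new, eps=1e-12):
    """KL(p_old || p_new) between two discrete probability distributions."""
    return sum(p * math.log((p + eps) / (q + eps)) for p, q in zip(p_old, p_new))

def cosine_discrepancy(f_old, f_new):
    """1 - cosine similarity between old-model and new-model feature vectors."""
    dot = sum(a * b for a, b in zip(f_old, f_new))
    norm = math.sqrt(sum(a * a for a in f_old)) * math.sqrt(sum(b * b for b in f_new))
    return 1.0 - dot / norm

def distillation_loss(p_old, p_new, f_old, f_new, alpha=1.0):
    """Assumed combination: output-level KL term plus a weighted feature-level cosine term."""
    return kl_divergence(p_old, p_new) + alpha * cosine_discrepancy(f_old, f_new)

def entropy_weight(probs, eps=1e-12):
    """Assumed weighting: confident (low-entropy) pseudo labels get weights near 1,
    uncertain (high-entropy) ones get weights near 0."""
    entropy = -sum(p * math.log(p + eps) for p in probs)
    return 1.0 - entropy / math.log(len(probs))

# A confident old-model prediction receives a larger pseudo-label weight than an uncertain one.
print(entropy_weight([0.9, 0.05, 0.05]), entropy_weight([0.4, 0.3, 0.3]))
```

The authors' repository linked above is the reference for the actual loss definitions. This sketch only shows the shape of the idea.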

09:25-09:45, Paper TuAT2.5
Evolutionary Neural Architecture Search for 3D Point Cloud Analysis

Yang, Yisheng	Xiamen University Malaysia
Du, Guodong	Harbin Institute of Technology (Shenzhen)
Toa, Chean Khim	Xiamen University Malaysia
Tang, Ho-Kin	Harbin Institute of Technology (Shenzhen)
Goh, Sim Kuan	Xiamen University Malaysia
Keywords: Multimedia Computation, Neural Networks and their Applications, Computational Intelligence Abstract: Neural architecture search (NAS) automates neural network design by using optimization algorithms to navigate architecture spaces, reducing the burden of manual architecture design. While NAS has achieved success, applying it to emerging domains, such as analyzing unstructured 3D point clouds, remains underexplored due to the data lying in non-Euclidean spaces, unlike images. This paper presents Success-History-based Self-adaptive Differential Evolution with a Joint Point Interaction Dimension Search (SHSADE-PIDS), an evolutionary NAS framework that encodes discrete deep neural network architectures to continuous spaces and performs searches in the continuous spaces for efficient point cloud neural architectures. Comprehensive experiments on challenging 3D segmentation and classification benchmarks demonstrate SHSADE-PIDS's capabilities. It discovered highly efficient architectures with higher accuracy, significantly advancing prior NAS techniques. For segmentation on SemanticKITTI, SHSADE-PIDS attained 64.51% mean IoU using only 0.55M parameters and 4.5GMACs, reducing overhead by over 22-26X versus other top methods. For ModelNet40 classification, it achieved 93.4% accuracy with just 1.31M parameters, surpassing larger models. SHSADE-PIDS provided valuable insights into bridging evolutionary algorithms with neural architecture optimization, particularly for emerging frontiers like point cloud learning.


TuAT3	MR03
Autonomous Systems and Robotics
Chair: Disimino, Giuseppe	APPLICA Srl

08:05-08:25, Paper TuAT3.1
Matching Input and Output Devices and Physical Disabilities for Human-Robot Workstations

Weidemann, Carlo Benedikt	RWTH Aachen University
Mandischer, Nils	University of Augsburg
Corves, Burkhard	RWTH Aachen University
Keywords: Human-Collaborative Robotics, Design Methods, Assistive Technology Abstract: As labor shortages rise at an alarming rate, it is imperative to enable all people to work, particularly people with disabilities and elderly people. Robots are often used as a universal tool to assist people with disabilities. However, universal design fails for such human-robot workstations. We mitigate the challenges of selecting an individualized set of input and output devices by matching the devices required by the work process to individual disabilities, adhering to the Convention on the Rights of Persons with Disabilities passed by the United Nations. The objective is to facilitate economically viable workstations with just the required devices, thereby lowering the overall cost of corporate inclusion and of workplace redesign. Our work focuses on developing an efficient approach to filter input and output devices based on a person's disabilities, resulting in a tailored list of usable devices. The methodology enables an automated assessment of devices compatible with specific disabilities defined in the International Classification of Functioning, Disability and Health. In a mock-up, we showcase the synthesis of input and output devices from disabilities, thereby providing a practical tool for selecting devices for individuals with disabilities.

08:25-08:45, Paper TuAT3.2
Design and Implementation of a Cobot Arm System for Ladder Stitch (I)

Disimino, Giuseppe	APPLICA Srl
Mangini, Agostino Marcello	Polytechnic of Bari
Fanti, Maria Pia	Polytechnic of Bari, Italy
Keywords: Human-Collaborative Robotics, Design Methods Abstract: While automation is widespread in tailoring, high-end fashion brands still rely on skilled manual labor for intricate stitching. However, finding skilled workers is challenging due to a lack of new talent entering the field. The stitches require both speed and precision. In response to the declining availability of skilled artisans, this study explores leveraging cobot technology to replicate traditional manual techniques. The research investigates the development and implementation of a system integrating cobots to execute precise stitching tasks. Key aspects include designing infrastructure to support cobots and implementing advanced techniques for fabric manipulation and needle guidance. The study also examines the adaptation of cobot technology to replicate the "Ladder Stitch" technique, traditionally performed by skilled artisans. By blending traditional craftsmanship with modern technology, this research aims to address the shortage of skilled labor in the tailoring industry.

08:45-09:05, Paper TuAT3.3
Exploring Capability-Based Control Distributions of Human-Robot Teams through Capability Deltas: Formalization and Implications (I)

Mandischer, Nils	University of Augsburg
Usai, Marcel	Fraunhofer FKIE
Flemisch, Frank	RWTH Aachen University/Fraunhofer
Mikelsons, Lars	University of Augsburg
Keywords: Human-Collaborative Robotics, Assistive Technology, Shared Control Abstract: The implicit assumption that human and autonomous agents have certain capabilities is omnipresent in modern teaming concepts. However, none of these concepts formalizes the capabilities in a flexible and quantifiable way. In this paper, we propose Capability Deltas, which establish a quantifiable basis for crafting autonomous assistance systems in which one agent takes the leader role and the other the supporter role. We derive the quantification of human capabilities from an established assessment and documentation procedure used in the occupational inclusion of people with disabilities. This allows us to quantify the delta, or gap, between a team's current capability and a requirement established by a work process. The concept is then extended to the multi-dimensional capability space, which allows us to formalize compensation behavior and assess the actions required of the autonomous agent.


TuAT5	MR05
Autonomous and Intelligent Vehicles 1
Chair: Kshetrimayum, Satchidanand	National Taipei University of Technology

08:45-09:05, Paper TuAT5.3
Trajectory Planning for UAV Transportation Systems Using RRT*-Informed NMPC

Kang, Junjie	York University
Shan, Jinjun	York University
Keywords: Intelligent Transportation Systems, Autonomous Vehicle, Robotic Systems Abstract: This paper presents a novel trajectory planning approach for two typical aerial transportation systems: UAV-slung-load and flying inverted pendulum. By integrating Rapidly-exploring Random Trees* (RRT*) into Nonlinear Model Predictive Control (NMPC), the proposed method enhances motion planning, enabling effective navigation in complex environments while ensuring stability and safety. Simulation results demonstrate the approach's capability to overcome local minima and generate feasible trajectories, highlighting its potential to advance trajectory planning in UAV transportation systems.

09:05-09:25, Paper TuAT5.4
Attention-Based Few-Shot Food Classification Using Prototypical Networks (I)

Kshetrimayum, Satchidanand	National Taipei University of Technology
Huang, Yo-Ping	National Taipei University of Technology
Keywords: Intelligent Transportation Systems, Consumer and Industrial Applications, Manufacturing Automation and Systems Abstract: In the era of rapidly advancing technology, food classification has emerged as a pivotal application across various domains including health monitoring, dietary assessment, and culinary innovation. However, efficiently categorizing food items remains a challenge, particularly in scenarios with limited labeled data. This paper introduces a novel approach for few-shot food classification using Prototypical Networks with ResNet-50 and an attention mechanism as embedding network. Leveraging the inherent capability of Prototypical Networks to learn from scarce examples, our method demonstrates exceptional adaptability and accuracy in classifying food items. Through extensive experimentation on the Food-101 dataset, employing various CNN architectures, our findings underscore the effectiveness of our approach. In particular, ResNet-50 integrated with the attention mechanism surpasses other architectures, achieving superior classification accuracies of 91.5% and 95.2% for 1-shot and 5-shot learning scenarios, respectively. This integrated approach showcases the potential of Prototypical Networks in addressing the challenges of limited labeled data in food classification tasks, marking a significant advancement in the field.
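The classification rule behind Prototypical Networks, which the abstract above relies on, can be sketched in a few lines: each class prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype. The two-dimensional embeddings and class names below are hypothetical. In the paper the embeddings would come from the ResNet-50 plus attention network, which is not modeled here.

```python
import math

def class_prototypes(support):
    """support maps label -> list of embedding vectors; returns label -> mean vector."""
    prototypes = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        prototypes[label] = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return prototypes

def classify(query, prototypes):
    """Assign the query embedding to the class whose prototype is closest (Euclidean)."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))

# Hypothetical 2-way 2-shot episode with toy 2-D embeddings.
support = {
    "pizza": [[1.0, 0.0], [0.8, 0.2]],
    "sushi": [[0.0, 1.0], [0.2, 0.8]],
}
prototypes = class_prototypes(support)
print(classify([0.7, 0.3], prototypes))
```

Training additionally turns the negative distances into class probabilities with a softmax. Nearest-prototype assignment gives the same predicted label.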

09:25-09:45, Paper TuAT5.5
Weighted Fuzzy Rough Sets Feature Selection for High Dimensional Classification Problems (I)

Khabusi, Simon Peter	National Taipei University of Technology
Huang, Yo-Ping	National Taipei University of Technology
Vu, Van Phong	Ho Chi Minh City University of Technology and Education
Keywords: Intelligent Transportation Systems, Decision Support Systems, System Modeling and Control Abstract: Feature selection holds significant importance in knowledge mining as it plays a pivotal role in selecting and preserving the most informative features within a dataset while discarding irrelevant, redundant, or noisy attributes. This process contributes to enhancing model performance, reducing computational complexity, and refining interpretability, thus facilitating more accurate and efficient data analysis. In high-dimensional datasets, the necessity for feature selection becomes more pronounced due to the heightened risk of encountering the curse of dimensionality. Therefore, this study proposes a weighted fuzzy rough quickreduct (FRQR) feature selection approach employing feature weights to handle the equal situation problem inherent in FRQR. The proposed method is evaluated on ten publicly available datasets with feature sizes ranging from 2000 to 15154. The selected features are used to train and test random forest candidate models whose estimates are then combined according to the posterior probabilities by Bayesian Model Averaging (BMA). The performance of the model on the selected features is evaluated on four performance metrics. The essentiality of the selected features is further determined by comparing the model classification performance achieved on the non-selected features and all the dataset features. The results indicate competitiveness in the performance metric values achieved on selected features over the other two feature categories affirming the efficacy of the proposed method.


TuAT6	MR06
Autonomous Systems and Robotics 3
Chair: Chen, Yu-Xuan	National Chung Cheng University

08:45-09:05, Paper TuAT6.3
Is Shared Autonomous Driving Worth Promoting? Based on the Heterogeneity of Consumer Green Preference

Li, Wenjing	Northwestern Polytechnical University
Zhang, Yali	Northwestern Polytechnical University
Keywords: Autonomous Vehicle, Decision Support Systems, Intelligent Transportation Systems Abstract: To address human-related problems in ride-hailing services, such as high prices and frequent conflicts between drivers and consumers, China and other countries are actively promoting the development of shared autonomous driving. We explore the stimulative impact of subsidy initiatives on shared autonomous driving, focusing on consumers' green preferences. By constructing a Hotelling model of competition between AV and HV service platforms, we analyze the optimal solutions under three scenarios (no subsidy, subsidized AV platform, and subsidized consumers) and conduct a sensitivity analysis. We find that subsidizing AV platforms can reduce their service prices and enhance their market competitiveness, while subsidizing consumers may increase service prices without necessarily increasing market share. Policy makers should adjust subsidies as AV services develop, and may gradually phase out incentives once the industry matures. At that stage, AV platforms need to improve their technology and management to achieve sustainable development.

09:05-09:25, Paper TuAT6.4
Integrating End-To-End and Modular Driving Approaches for Online Corner Case Detection in Autonomous Driving

Kaljavesi, Gemb	Technical University of Munich
Su, Xiyan	Technical University of Munich
Diermeyer, Frank	Technical University of Munich
Keywords: Autonomous Vehicle, Intelligent Transportation Systems, Fault Monitoring and Diagnosis Abstract: Online corner case detection is crucial for ensuring safety in autonomous vehicles. Current autonomous driving approaches can be categorized into modular approaches and end-to-end approaches. To leverage the advantages of both, we propose a method for online corner case detection that integrates an end-to-end approach into a modular system. The modular system takes over the primary driving task, and the end-to-end network runs in parallel as a secondary system. The disagreement between the two systems is then used for corner case detection. We implement this method on a real vehicle and evaluate it qualitatively. Our results demonstrate that end-to-end networks, known for their superior situational awareness, can effectively contribute to corner case detection when used as secondary driving systems. These findings suggest that such an approach holds potential for enhancing the safety of autonomous vehicles.
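The idea of using disagreement between a primary and a secondary driving system as a corner case signal can be sketched as below. The abstract does not say which outputs are compared or how disagreement is measured. Comparing planned waypoints by mean Euclidean distance against a fixed threshold is an assumption chosen for illustration, and all names and values are hypothetical.

```python
import math

def trajectory_disagreement(modular_path, end_to_end_path):
    """Mean Euclidean distance (in meters) between corresponding planned waypoints."""
    distances = [math.dist(a, b) for a, b in zip(modular_path, end_to_end_path)]
    return sum(distances) / len(distances)

def is_corner_case(modular_path, end_to_end_path, threshold_m=1.0):
    """Flag a potential corner case when the two planners disagree by more than threshold_m."""
    return trajectory_disagreement(modular_path, end_to_end_path) > threshold_m

# Hypothetical plans: the modular stack drives straight, the end-to-end network swerves left.
modular = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
end_to_end = [(0.0, 0.0), (5.0, 1.5), (10.0, 3.0)]
print(is_corner_case(modular, end_to_end))
```

A fixed threshold is the simplest possible decision rule. A deployed system would likely need calibration against driving data, which is outside the scope of this sketch.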

09:25-09:45, Paper TuAT6.5
High-Precision Vehicle Positioning Technology by Combining Vehicle Images and Satellite Maps (I)

Chen, Yu-Xuan	National Chung Cheng University
Lin, Huei-Yung	National Taipei University of Technology
Keywords: Autonomous Vehicle, Intelligent Transportation Systems Abstract: With continuous advances in technology, the demand for precise vehicle positioning has grown significantly. Although it is now possible to capture driving scenes and record driving paths using a dashcam, standard civilian GPS typically has accuracy errors ranging from 3 to 5 meters. This level of precision is generally not sufficient for rapidly evolving ADAS (Advanced Driver Assistance Systems). In this paper, we present a high-precision vehicle positioning technique based on satellite maps and image data. Instead of using expensive LiDAR sensors, the proposed approach utilizes lane detection, semantic segmentation, and geolocation to extract environmental features from images. The A* algorithm is then adopted to refine the driving trajectory and improve vehicle positioning accuracy. Furthermore, we establish an image dataset containing satellite maps and latitude/longitude coordinate information of various road scenes. The code and datasets are made publicly available at https://github.com/M610415018/M610415018-Paper.


TuAT7	MR07
Online - AI Applications 1
Chair: Xu, Jin	Shenyang Aerospace University

08:05-08:25, Paper TuAT7.1
Intrusion Detection System Based on FastICA and Multi-Grained Cascaded Forest

Fei, Jiahui	Nanjing University of Science and Technology
Zhang, Shuangquan	School of Cyber Science and Engineering
Lian, Zhichao	Nanjing University of Science and Technology
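Keywords: AIoT, Application of Artificial Intelligence, Machine Learning Abstract: With advances in big data, microprocessors, and other applications, the Internet of Things (IoT) has developed considerably. Because they lack the necessary security defense mechanisms, IoT devices are easily targeted and controlled by attackers, who can manipulate large numbers of IoT devices to launch DDoS attacks on the network infrastructure of a country or region, causing severe economic losses and social security risks. Deep-learning-based intrusion detection methods usually rely on large numbers of high-quality training instances, which makes them difficult to apply to network traffic that lacks sufficient labeled data. Traditional machine learning (ML) methods have limited ability to extract and represent features from high-dimensional data, which makes it challenging to discover the underlying structure and patterns in the data. To address these problems, this paper proposes an intrusion detection system (IDS) that combines a fast independent component analysis (FastICA) module with a multi-grained cascade forest (GcForest). By leveraging Fa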

08:25-08:45, Paper TuAT7.2
Efficient One-Shot Pruning of Large Language Models with Low-Rank Approximation

Xu, Yangyan	Institute of Information Engineering, Chinese Academy of Science
Cao, Cong	Institute of Information Engineering, Chinese Academy of Science
Yuan, Fangfang	Institute of Information Engineering, Chinese Academy of Science
Mi, Rongxin	National Computer Network Emergency Response Technical Team/Coor
Sun, Nannan	Institute of Information Engineering, Chinese Academy of Science
Wang, Dakui	Institute of Information Engineering, Chinese Academy of Science
Liu, Yanbing	Institute of Information Engineering, Chinese Academy of Science
Keywords: Representation Learning, Deep Learning, Machine Learning Abstract: Model pruning, as an effective method for compressing large language models (LLMs), has recently attracted considerable attention in the field of natural language processing. However, existing LLM pruning methods have two main drawbacks: (1) Iterative pruning for LLMs with over a billion parameters requires retraining, which leads to significant pruning costs. (2) LLM pruning is formalized as a weight reconstruction problem that necessitates second-order information, incurring expensive computations. To address these issues, we propose a novel pruning method named Eplra (efficient one-shot pruning of large language models with low-rank approximation), which efficiently identifies sparse networks in LLMs. Specifically, we design a novel pruning metric based on input activations for the rapid one-shot compression of LLMs. We first incorporate input activations into the calculation of weight importance to promote precise pruning of low-priority weights. Then, we perform local weight comparisons across each output of linear layers to induce uniform sparsity. Next, we extend Eplra to semi-structured pruning patterns to accommodate various acceleration scenarios. Finally, we employ low-rank parametrized update matrices to fine-tune the pruned model, facilitating a swift recovery of model performance. Experimental results on various language benchmark datasets demonstrate that Eplra outperforms state-of-the-art methods.
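The first two steps in the abstract above (an activation-aware importance score, then comparisons local to each output of a linear layer) can be sketched as follows. The abstract does not give Eplra's actual metric. The score `|weight| * L2 norm of the matching input feature` used here is an assumed stand-in, and the function names, the tiny weight matrix, and the calibration inputs are hypothetical.

```python
import math

def activation_norms(inputs):
    """L2 norm of each input feature over the calibration samples (rows are samples)."""
    n_features = len(inputs[0])
    return [math.sqrt(sum(row[j] ** 2 for row in inputs)) for j in range(n_features)]

def prune_per_output(weights, inputs, sparsity=0.5):
    """Zero the lowest-scoring weights within each output row, so every row ends up
    with the same sparsity. The score is an assumed |w| * activation-norm product."""
    norms = activation_norms(inputs)
    pruned = []
    for row in weights:
        scores = [abs(w) * n for w, n in zip(row, norms)]
        n_prune = int(len(row) * sparsity)
        drop = set(sorted(range(len(row)), key=lambda j: scores[j])[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned

# One output neuron with four inputs; the second input feature has large activations.
weights = [[0.5, -0.1, 0.3, 0.2]]
calibration_inputs = [[1.0, 10.0, 1.0, 0.1]]
print(prune_per_output(weights, calibration_inputs))
```

In this toy case, magnitude-only pruning would remove the weight -0.1, while the activation-aware score keeps it because its input feature is large. The semi-structured patterns and low-rank fine-tuning mentioned in the abstract are not covered.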

08:45-09:05, Paper TuAT7.3
Video-Based Examination Anomaly Action Recognition Via Channel-Temporal Model

Peng, Qin	Central China Normal University
Yao, Huaxiong	Central China Normal University
Liu, Xinyu	Central China Normal University
Keywords: Image Processing and Pattern Recognition, Neural Networks and their Applications, Deep Learning Abstract: With rapid technological advancements in computer vision, the recognition of abnormal behavior during examinations has transitioned from human observation to computer-assisted recognition. Although traditional 2D Convolutional Neural Networks (CNNs) excel in computational efficiency, they fall short in precisely capturing the temporal dynamics that are crucial for comprehensive video analysis. Conversely, 3D CNN-based methods demonstrate promising performance in temporal modeling but impose substantial computational demands and deployment costs. To overcome these challenges, this paper introduces an Examination Anomaly Action Recognition Network named ReTANet. It incorporates cross-channel temporal modeling to capture temporal features within videos. It also employs Multi-Scale Channel Attention to enrich feature representation and extract channel and spatial information, thereby enhancing recognition accuracy without significantly increasing computational complexity or the number of model parameters. Furthermore, this paper introduces the Examination Anomaly Action Dataset, also named the ExamGuard Dataset (EGD), to facilitate model training and evaluation. Our model demonstrates superior performance compared to existing mainstream action recognition algorithms on the HMDB-51 dataset. Ablation studies conducted on the UCF-101 dataset show the effectiveness and significance of the proposed module.

09:05-09:25, Paper TuAT7.4
KGNet: A Legal Knowledge Enhancement and GlobalPointer Triple Extraction Network

Li, Jinchen	Inner Mongolia University of Finance and Economics
Li, Yanling	Inner Mongolia Normal University
Ge, Fengpei	Beijing University of Posts and Telecommunications
Xingxing, Wang	Inner Mongolia Normal University
Keywords: AI and Applications, Application of Artificial Intelligence, Deep Learning Abstract: Extracting entity relations is vital in legal artificial intelligence, as it automates the mining of triple data from vast legal texts. Current methods struggle to accurately identify legal named entity boundaries and to extract overlapping relation triples from legal texts. We present KGNet, a model developed to address these issues. Our approach introduces a Word Information Generator based on BMES tagging, combined with the Fusionformer module. This design enhances the incorporation of legal domain knowledge into text representations, improving the accuracy of entity recognition. Additionally, we utilize the GlobalPointer decoder, which redefines and decomposes relation triples, thus resolving the issue of overlapping entities. Performance evaluations on a specially constructed judicial document dataset show that KGNet achieves an F1 score of 66.7%, representing an average improvement of 15.3% over baseline models. These results confirm the effectiveness of KGNet in enhancing legal document processing.

09:25-09:45, Paper TuAT7.5
Research on Task Assignment of Firefighting UAVs Based on E-CARGO Model (I)

Xu, Jin	Shenyang Aerospace University
Xiang, Zhiyu	Shenyang Aerospace University
Zhang, Senyue	Shenyang Aerospace University
Sun, Yue	Shenyang Aerospace University
Gao, Beihang	Shenyang Aerospace University
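Keywords: Adaptive Systems, Cooperative Systems and Control Abstract: Firefighting UAV technology has become one of the core tools of modern firefighting operations, and the scale and complexity of the missions it carries out continue to grow. In the face of this development, it has become increasingly important to find an effective method to ensure that UAVs are assigned the most suitable tasks. In this study, we use the Environment-Class, Agent, Role, Group, and Object (E-CARGO) model to systematically analyze the firefighting drone task assignment (FDTA) problem, and we introduce an enhanced whale optimization algorithm (EWOA) to optimize path planning in the FDTA problem. Finally, simulation experiments conducted over diverse terrain demonstrate the efficiency and rapid-response capability of the improved algorithm under different workloads and environmental conditions.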


TuAT8	MR08
Online - Affective and Cognitive Computing 1	Regular Papers - Cybernetics
Chair: Yuan, Desen	ASR Microelectronics Co., Ltd.; University of Electronic Science and Technology of China

08:05-08:25, Paper TuAT8.1
RPID: Boosting Transferability of Adversarial Attacks on Vision Transformers

Wang, Shujuan	Nanjing University of Science and Technology
Wang, Panpan	Nanjing University of Science and Technology
Sun, Yunpeng	Nanjing University of Science and Technology
Lian, Zhichao	Nanjing University of Science and Technology
Li, Shuohao	National University of Defense Technology
Keywords: Image Processing and Pattern Recognition, Machine Learning, Deep Learning Abstract: Vision Transformers (ViTs) have achieved excellent performance on many computer vision tasks, which has drawn many researchers' attention to their adversarial robustness. As a kind of black-box attack, transfer-based attacks usually use adversarial examples generated by a surrogate model to attack structurally different models. Such attacks are practical and pose a threat to the application of ViTs in security-critical areas. Existing transfer-based attacks against ViTs suffer from weak adversarial transferability and noticeable perceptibility. In this work, we propose a method called Reduce Regional Perturbation Interaction and Differentiated (RPID) attack, which employs two strategies to produce adversarial examples: reducing the correlation between regional perturbations and adding differentiated perturbations. Extensive experiments demonstrate that our proposed method improves the transferability of the baseline methods for adversarial attacks against ViTs while maintaining stealthiness.

08:25-08:45, Paper TuAT8.2
LESaET: Low-Dimensional Embedding Method for Link Prediction Combining Self-Attention and Enhanced-TuckER

Ding, Lichao	Qilu University of Technology (Shandong Academy of Sciences)
Zhao, Jing	Qilu University of Technology (Shandong Academy of Sciences)
Lu, Kai	Qilu University of Technology (Shandong Academy of Sciences)
Hao, Zenghao	Qilu University of Technology
Keywords: Knowledge Acquisition, Representation Learning, Neural Networks and their Applications Abstract: Knowledge graphs (KGs) provide a structured representation of the real world through entity-relation triples. However, current KGs are often incomplete, typically containing only a small fraction of all possible facts. Completing them involves inferring missing content from existing information, a task known as link prediction. Existing link prediction methods struggle to control the dimensionality of embedding vectors or suffer from overly complex models. To tackle these challenges, we introduce the Low-Dimensional Embedding Method for Link Prediction Combining Self-Attention and Enhanced-TuckER (LESaET). LESaET leverages both self-attention mechanisms and tensor factorization to learn expressive context-enhanced representations of KGs. Specifically, LESaET employs the multi-head self-attention mechanism of the Transformer as an encoder to capture the mutual information between entities and relationships, and utilizes Enhanced-TuckER as a decoder, ultimately achieving expressive low-dimensional embeddings for link prediction tasks. LESaET demonstrates competitive performance compared to advanced methods on standard datasets.

08:45-09:05, Paper TuAT8.3
Towards Adversarial Robustness in Blind Image Quality Assessment with Soft Thresholding Norm

Yuan, Desen	ASR Microelectronics Co., Ltd.; University of Electronic Science
Wang, Lei	University of Electronic Science and Technology of China
Keywords: Multimedia Computation, Deep Learning, Media Computing Abstract: In this study, we address the issue of adversarial robustness within the context of Blind Image Quality Assessment (BIQA), an area of heightened importance due to the inherent susceptibility of Deep Neural Networks (DNNs) to adversarial assaults. Current approaches primarily rely on adversarial training, which, despite its efficacy, imposes a significant computational burden. Our research proposes an alternative strategy known as the Soft Thresholding Norm (ST Norm). This approach counters the 'feature shift' phenomenon, identified by a substantial Euclidean Distance Statistics (EDS) between original and adversarial features, through the imposition of sparse constraints on potential features following batch normalization. This novel method offers several advantages: it reduces the Lipschitz constant yielding smoother models, seamlessly integrates with existing models, and boasts inherent denoising capabilities, thereby effectively mitigating the impact of adversarial perturbations. Results suggest that our approach achieves robustness comparable to adversarial training but with significantly less computational overhead. Moreover, it consistently outperforms other adversarial defense strategies on BIQA datasets, highlighting its practical effectiveness in enhancing adversarial robustness. This study underscores the potential of the Soft Thresholding Norm within the realm of IQA tasks, positioning it as a resource-efficient alternative to traditional adversarial training methodologies.

09:05-09:25, Paper TuAT8.4
Efficient Nearest Neighbor Prompt-Based Learning for Few-Shot NER in Manufacturing

Chen, JiaXin	Shenyang Aerospace University
Wang, Peiyan	Shenyang Aerospace University
Keywords: Application of Artificial Intelligence, Knowledge Acquisition Abstract: NER tasks in manufacturing usually lack sufficient labeled data. To tackle this issue, this paper presents an effective NN-PLM framework for few-shot NER in manufacturing, which introduces a simple enhancement of the prompt-based learning model using nearest neighbor retrieval. We retrieve the morphologically similar characters for each character to be predicted and then rectify the prediction. Moreover, we use supervised contrastive learning (SCL) and instance weighting to obtain better semantic representations of multi-category characters. Compared with the best baseline, our NN-PLM achieves an average F1 score improvement of 7.12% across all few-shot settings in manufacturing.

09:25-09:45, Paper TuAT8.5
MJR: Multi-Head Joint Reasoning on Language Models for Question Answering

Li, Shunhao	South China Normal University
Chen, Jiale	South China Normal University
Yan, Enliang	South China Normal University
Zhan, Choujun	South China Normal University
Wang, Fu Lee	Hong Kong Metropolitan University
Hao, Tianyong	South China Normal University
Keywords: Deep Learning, Neural Networks and their Applications, Expert and Knowledge-Based Systems Abstract: Language Models (LMs) have achieved impressive success in various question answering (QA) tasks but have shown limited performance on structured reasoning. Recent research suggests that Knowledge Graphs (KGs) can augment text data by providing a structured background to enhance the reasoning capabilities of LMs. However, how to integrate and reason over KG representations and language context remains an open question. In this work, we propose MJR, a novel model that integrates the encoded representations of LMs and graph neural networks through multiple layers of feature interaction operations. Subsequently, the fused feature representations of the two modalities are fed into a multi-head representation fusion module to comprehensively capture semantic and graph structure information, thereby enhancing language understanding and reasoning capabilities. In addition, we investigate the performance and applicability of different types of large language models as text encoders in the question-answering task. We evaluate our model on three common datasets: CommonsenseQA, OpenBookQA, and MedQA-USMLE. The results demonstrate the advancements of MJR over existing LM, LM+KG, and LLM models in reasoning for question answering.


TuAT9	MR09
AI Applications 8	Regular Papers - Cybernetics
Chair: Liu, Shanwen	College of Computer Science, Sichuan Normal University

08:45-09:05, Paper TuAT9.3
Robotic Manipulator Motion Planning Based on Global Path Guidance Reinforcement Learning in Dynamic Obstacle Environment

Liu, Shixian	Chinese Academy of Sciences
Zhang, Jinhan	Institute of Automation, Chinese Academy of Sciences
Shanlin, Zhong	Institute of Automation, Chinese Academy of Sciences
Chen, Jiahao	Institute of Automation, Chinese Academy of Sciences
Zhengyu, Liu	Institute of Automation, Chinese Academy of Sciences
Wu, Wei	Institute of Automation, Chinese Academy of Sciences
Keywords: Application of Artificial Intelligence, Deep Learning, Machine Learning Abstract: Friendly robots have extremely important application prospects in many fields. However, in unstructured environments, the interaction between a manipulator and a dynamic environment faces high uncertainty caused by random intrusion into the workspace and computational complexity brought by the multi-dimensional action space. Therefore, we propose a hierarchical planning algorithm based on global path guidance reinforcement learning to solve this problem at the decision and planning level. Specifically, the global path planning algorithm first produces a global reference path that ensures the target can be reached. Then the reference path is decomposed into consecutive local targets, which are combined with the objective function of reinforcement learning as local constraints. Finally, the reinforcement learning local planner generates the action of the manipulator based on the observed information. The simulation results show that our method is superior to the standard off-policy reinforcement learning algorithm in terms of learning speed and accuracy, which demonstrates the effectiveness of our algorithm.

09:05-09:25, Paper TuAT9.4
MFFDR: An Advanced Multi-Branch Feature Fusion and Dynamic Reconstruction Framework for Enhancing Adversarial Robustness

Liu, Shanwen	College of Computer Science, Sichuan Normal University
Guo, Rongzuo	Sichuan Normal University
Zhang, Xianchao	Sichuan Normal University
Keywords: Application of Artificial Intelligence, Deep Learning, Machine Learning Abstract: Deep Neural Networks (DNNs) are highly susceptible to adversarial noise, which can lead to erroneous predictions. In high-stakes scenarios, such as autonomous driving and medical diagnosis, DNN inaccuracies can have dire consequences. To address this issue, Adversarial Training (AT) has been widely adopted as an effective defense method. However, our analysis reveals two critical flaws in the traditional AT approach that hinder its adversarial robustness: (1) it focuses only on a subset of robust features during the training process. This narrow focus limits the model's ability to learn and perceive a diverse range of features. (2) it tends to overlook potential cues in non-robust features that could help the model make correct predictions. These cues, referred to as "positive activations" for simplicity, contain valuable information that can enhance the model's perception and understanding of the input data. To this end, we propose a novel plug-and-play framework called Multi-branch Feature Fusion and Dynamic Reconstruction (MFFDR), which leverages multi-branch attention mechanisms to enhance the model's perception of robust features and enrich the diversity of learned features. Moreover, we employ a dynamic weighting strategy to reconstruct non-robust features in order to utilize the positive activations embedded within them. Extensive experiments demonstrate that our method significantly improves the model's adversarial robustness and outperforms previous state-of-the-art methods.

09:25-09:45, Paper TuAT9.5
BTP-CAResNet: An Encrypted Traffic Classification Method Based on Byte Transfer Probability and Coordinate Attention Mechanism

Li, Junhao	Qilu University of Technology (Shandong Academy of Sciences)
Zhang, Wei	Qilu University of Technology (Shandong Academy of Sciences)
Shi, Huiling	Qilu University of Technology (Shandong Academy of Sciences)
Keywords: Application of Artificial Intelligence, AI and Applications, Neural Networks and their Applications Abstract: With the extensive application of network traffic encryption technology, the accurate and efficient classification of encrypted traffic has become a critical need for network management. Deep learning has become the predominant method for traffic classification, primarily involving the transformation of network traffic into grayscale images and their subsequent classification using Convolutional Neural Networks (CNNs). However, traditional grayscale image generation methods are plagued with issues of redundant and lost information, and conventional channel attention mechanisms are still insufficient in capturing key traffic features, collectively hindering the enhancement of classification performance. To tackle these issues, this paper introduces a classification method based on Byte Transfer Probability and Coordinate Attention Mechanism in Residual Network (BTP-CAResNet). This method, on the foundation of the classic ResNet architecture, incorporates a new grayscale image generation method that utilizes Byte Transfer Probability, effectively overcoming the deficiencies of traditional approaches. Additionally, this paper integrates a Coordinate Attention Mechanism into the ResNet model, which effectively overcomes the limitations of traditional channel attention mechanisms and further improves the performance of traffic classification. Experimental validation on the ISCX VPN-nonVPN dataset demonstrates that, compared to previous CNN-based methods, the method proposed in this paper exhibits superior performance in key metrics such as accuracy, precision, recall, and F1 score. It provides a new perspective for traffic classification based on convolutional neural networks.


TuAT10	MR10
Big Data and Intelligent Systems
Chair: Li, Wei	University of Chinese Academy of Sciences

08:05-08:25, Paper TuAT10.1
SIKGC: Structural Information Prompt Based Knowledge Graph Completion with Large Language Models

Li, Wei	University of Chinese Academy of Sciences
Ge, Jingguo	University of Chinese Academy of Sciences
Feng, Weihua	Zhengzhou Tobacco Research Institute of CNTC, China
Zhang, Lei	Institute of Information Engineering, Chinese Academy of S
Li, Liangxiong	Institute of Information Engineering, Chinese Academy of Science
Wu, Bingzhen	Institute of Information Engineering, Chinese Academy of Science
Keywords: AI and Applications, Big Data Computing, Deep Learning Abstract: Knowledge Graph Completion (KGC) aims to enrich and complete the knowledge graph by discovering missing information from existing fact triples. However, existing KGC methods often overlook the utilization of structured knowledge within the knowledge base. In this paper, we propose a novel Large Language Model-based Knowledge Graph Completion framework, called SIKGC, which builds structural information prompts to assist knowledge graph completion tasks. Specifically, we arrange the triples in the knowledge graph as sequences of text. By fusing the descriptions of entities, relations and their structural information as task-aware prompts, we input such prompts into large language models and regard the responses as prediction tasks. The experimental results on various public datasets show that the proposed method outperforms all baseline methods on the three knowledge completion tasks and attains state-of-the-art results in triple classification. We also demonstrate that fine-tuning smaller large language models (e.g., Baichuan2-13B, LLaMA2-13B, ChatGLM3-6B) with relevant data markedly enhances their KGC capabilities, allowing them to significantly outperform GPT-4.

08:25-08:45, Paper TuAT10.2
RealDriftGenerator: A Novel Approach to Generate Concept Drift in Real World Scenario

Lin, Borong	Xi'an Jiaotong-Liverpool University
Huang, Chao	University of Southampton
Zhu, Xiaohui	Xi'an Jiaotong-Liverpool University
Jin, Nanlin	Xi'an Jiaotong-Liverpool University
Keywords: Big Data Computing, Machine Learning Abstract: Concept drift refers to changes over time in the probability distribution that generates the data in a data stream environment. In recent years, there has been increasing interest in drift detection models. However, due to the lack of labeled concept drift datasets, most researchers tend to use synthetic drift data generators for model training. These generators only have relatively simple feature distributions, which fail to capture the complexity found in real-world scenarios. This paper introduces a real-scenario concept drift label generator (RealDriftGenerator). This generator aims to preserve the complexity and temporal correlation of real-world scenarios while generating concept drifts with user-defined drift positions and drift widths. The validation results show that the temporal correlation coefficients of RealDriftGenerator are significantly higher than those of benchmark drift generators. Additionally, the ability of RealDriftGenerator to capture the complexity of real-world scenarios is 20% higher than that of benchmark drift generators (measured by model performance). The source code of RealDriftGenerator has been published at https://github.com/sniperrifle71/realDriftGenerator.

08:45-09:05, Paper TuAT10.3
An Agent-Based Model of Opinion Dynamics with Hierarchical Thinking

Ou, Lizhen	National University of Defense Technology
Yao, Yiping	National University of Defense Technology
Luo, Jiao	National University of Defense Technology
Tang, Wenjie	National University of Defense Technology
Keywords: Agent-Based Modeling, Artificial Social Intelligence Abstract: Opinion dynamics studies the principles governing the evolution of collective opinion, offering valuable insights into social phenomena and the forecasting of group behavior. However, existing opinion dynamics models often overlook the impact of both opinion climate and cognitive capacities on interactive behaviors, causing simulation outcomes to diverge from real-world observations. To address this gap, we propose a novel opinion dynamics model based on hierarchical thinking to describe opinion evolution on social networks. Individuals are classified into different levels according to their cognitive abilities. They act with bounded rationality at their respective levels to optimize both the promotion of personal opinions and the avoidance of cyberbullying. Through simulation analysis, we identify the crucial role of users with high levels of hierarchical thinking. They can discern the opinion climate and articulate their opinions, acting as bridges in the evolution of public opinion. Their opinions can reach the bounded confidence range of more people, thereby enabling polarization to shift to consensus under the same conditions. Furthermore, this effect is independent of individuals' inherent attributes, which is more in line with real-life scenarios.


TuAT11	MR11
Brain-Machine Interfaces (BMIs) 2	Regular Papers - Cybernetics
Chair: Shukla, Rishabh	Indian Institute of Technology Jammu

08:25-08:45, Paper TuAT11.2
RoPAR: Enhancing Adversarial Robustness with Progressive Image Aggregation and Reordering Noise

An, Jong-Hyun	Korea University
Hong, Jung-Ho	Korea University
Kim, Hee-Dong	Korea University
Lee, Seong-Whan	Korea University
Keywords: Deep Learning, Application of Artificial Intelligence, Neural Networks and their Applications Abstract: Adversarial attacks mislead deep neural network classifiers with slight perturbations, underscoring the necessity for the development of robust defenses to ensure the secure and responsible use of artificial intelligence. Recent research has shown that diffusion-based adversarial purification methods have emerged as a promising defense technique, but often suffer from computational inefficiencies and suboptimal results. To address these issues, we propose RoPAR, an innovative approach that enhances robustness against adversarial attacks by aggregating purified images at intermediate steps of the diffusion process. Our method improves model robustness while reducing the required diffusion steps. We also introduce a technique for reordering Gaussian noise to minimize semantic information loss while removing adversarial perturbations. These enhancements significantly reduce the number of function evaluations from 200 to 6, achieving a robust accuracy of 92.39% against preprocessor-blind PGD attacks on CIFAR-10, a 2.29 percentage point improvement over state-of-the-art. Moreover, our method demonstrates its effectiveness in real-world scenarios, achieving 87.46% accuracy on CIFAR-10C.

08:45-09:05, Paper TuAT11.3
Bridging the Gap: Creating Authentic Biometric Templates for Secure Authentication Systems

Shukla, Rishabh	Indian Institute of Technology Jammu
Kaur, Harkeerat	Indian Institute of Technology Jammu
Echizen, Isao	National Institute of Informatics Tokyo
Keywords: Biometric Systems and Bioinformatics Abstract: Fingerprints serve as a primary means of identifying individuals. However, employing fingerprints in online mode poses a significant privacy and security risk, since they are susceptible to several forms of attack. In response, we propose an innovative approach that converts the original fingerprint into a secure template that may be retained and utilized for authentication purposes. The new templates bear a resemblance to the original human fingerprints and ensure privacy by possessing the characteristic of non-invertibility. This study presents a method for generating highly authentic fingerprint templates that make it possible to revoke and cancel a stolen fingerprint. Throughout the training and testing phases, we utilized the dataset derived from the Vikriti-ID fingerprint. The collection has 25,000 distinct fingerprint samples, divided into five classes, with each class containing 5,000 samples. In the testing phase, overall performance was evaluated using matching metrics including EER and AUC.

09:05-09:25, Paper TuAT11.4
PromotiCon: Prompt-Based Emotion Controllable Text-To-Speech Via Prompt Generation and Matching

Lee, Ji-Eun	Korea University
Kim, Seung-Bin	Korea University
Cho, Deok-Hyeon	Korea University
Lee, Seong-Whan	Korea University
Keywords: Deep Learning, Application of Artificial Intelligence, Neural Networks and their Applications Abstract: Text-to-speech (TTS) technologies have recently expanded to incorporate natural language prompts for user-friendly control of speech styles, driven by significant advancements in language models. Traditional prompt-based TTS research, however, typically requires large-scale prompt generation that often necessitates costly human annotations. To address this challenge, we propose PromotiCon, a system that leverages prompts generated without human annotations to control emotions in speech. Our model utilizes abundant prompts generated using a large language model. Additionally, we propose an emotion distance-based prompt-speech matching method to appropriately pair the generated prompts with the most resembling speech data. To enhance speaker adaptation, we introduce a semi-supervised approach that allows the joint utilization of multi-speaker data without emotion labels. As a result, our system facilitates zero-shot emotional speech synthesis. Our experimental results confirm the effectiveness of our approach. Audio samples are available at https://promoticon.github.io/.

09:25-09:45, Paper TuAT11.5
CHBaR: Conditional Hilbert Schmidt Bottleneck As Regularization for Adversarial Robustness

Jung, Seung-Wook	Korea University
Hong, Jung-Ho	Korea University
Kim, Hee-Dong	Korea University
Lee, Seong-Whan	Korea University
Keywords: Deep Learning, Application of Artificial Intelligence, Neural Networks and their Applications Abstract: Adversarial attacks pose a significant threat to security-critical applications by deliberately deceiving model predictions. Numerous works attempt to create robust models by encoding useful information into intermediate representations. However, these representations still contain too much information about the training data, which hinders improving the robustness of the model. To mitigate this issue, we propose a novel approach, CHBaR, that incorporates class-conditioned information into intermediate representations. The class-conditioned information plays the role of weight components which are multiplied with the intermediate representations to produce class-conditioned representations. We utilize an attribution-based explanation method to obtain this class-conditioned information. As a result, the weight components emphasize class-relevant features by highlighting relevant information from the target class. This weighting process easily integrates the target class without complex computations and conceals useless representations, thus enhancing model predictions by masking features unrelated to the class. Extensive experiments demonstrate the effectiveness of our proposed method in enhancing adversarial robustness. In particular, on the SVHN dataset, our proposed method shows an improvement of 6.98 percentage points over the baseline model under the PGD40 adversarial attack with the TRADES training setting.


TuAT12	MR12
Haptic and Human-Computer Interaction 6	Regular Papers - HMS
Chair: Panagopoulos, Dimitrios	Cranfield University

08:25-08:45, Paper TuAT12.2
GRUI: A Novel Gesture Recognition Utilizing UWB Sensor and IMU

Lee, Dongjae	Korea University
Yoo, Kyeonghyun	Korea University
Jung, Wooyong	Korea University
Kim, Hwangnam	Korea University
Keywords: Human-Computer Interaction, Intelligence Interaction, Human-Machine Cooperation and Systems Abstract: With recent advancements in sensor and artificial intelligence technologies, the reliability of gesture recognition has significantly improved, prompting various industrial fields to adopt this technology. However, most gesture recognition systems rely on optical methods because of their high accuracy, despite requiring complex and computationally intensive processes. Moreover, these systems are associated with high construction costs and are susceptible to environmental factors. This paper introduces a novel gesture recognition system, which effectively tracks and estimates gestures using cost-effective ultra-wideband (UWB) sensors and inertial measurement units (IMUs). The system acquires gesture position data through UWB sensors and includes essential data processing steps such as the detection and removal of abnormal data via the IMU, data smoothing with a Kalman filter, and data normalization and scaling. Notably, normalization and scaling are achieved by converting the position data into grayscale images, ensuring the consistency of data features and enhancing gesture recognition accuracy across diverse users. The proposed system employs a convolutional neural network (CNN) model to estimate gestures from these images. Comparative analyses demonstrate that the proposed system exhibits superior gesture classification performance compared to systems utilizing a long short-term memory (LSTM) model and those employing the same CNN model without the aforementioned data processing steps. Therefore, this system is not only cost-effective but also efficiently tracks and estimates gestures, offering significant improvements over existing methods.

08:45-09:05, Paper TuAT12.3
Generating Explanations for Autonomous Robots Using Assumption-Alignment Tracking

Cao, Xuan	Brigham Young University
Crandall, Jacob	Brigham Young University
Goodrich, Michael	Brigham Young University
Keywords: Human-Machine Interaction, Intelligence Interaction Abstract: As the techniques of autonomous robots advance, there is an increasing demand for robots to provide explanations for their behavior. There are two commonly used explanation types. The first type emphasizes that a robot's policy is the best (or only) option that satisfies a specific property produced by its decision-making algorithms. The second explanation type is used when a robot fails and describes the cause of an error state that led to the failure. This paper proposes a new explanation type derived from a robot's proficiency self-assessment. The proposed explanation type not only supplements the first explanation type under typical operating conditions but also includes the second explanation type when the robot fails. The proposed explanation type is based on assumption-alignment tracking (AAT), a novel method for robot proficiency self-assessment. AAT provides three pieces of information for explanation generation: (1) an assessment of the veracity of the assumptions on which the robot's generators rely; (2) a proficiency assessment measured by the probability that the robot will successfully accomplish its task; (3) a counterfactual proficiency assessment computed by hypothetically varying assumptions. The information provided by AAT fits the situation awareness-based framework for explainable artificial intelligence. Examples of generated explanations are demonstrated using a simulated robot setting up a table with different blocks.

09:05-09:25, Paper TuAT12.4
Selective Exploration and Information Gathering in Search and Rescue Using Hierarchical Learning Guided by Natural Language Input

Panagopoulos, Dimitrios	Cranfield University
Perrusquia, Adolfo	Cranfield University
Guo, Weisi	Cambridge University
Keywords: Human-centered Learning, Human-Machine Interaction, Human-Machine Cooperation and Systems Abstract: In recent years, robots and autonomous systems have become increasingly integral to our daily lives, offering solutions to complex problems across various domains. Their application in search and rescue (SAR) operations, however, presents unique challenges. Comprehensively exploring the disaster-stricken area is often infeasible due to the vastness of the terrain, transformed environment, and the time constraints involved. Traditional robotic systems typically operate on predefined search patterns and lack the ability to incorporate and exploit ground truths provided by human stakeholders, which can be the key to speeding up the learning process and enhancing triage. Addressing this gap, we introduce a system that integrates social interaction via large language models (LLMs) with a hierarchical reinforcement learning (HRL) framework. The proposed system is designed to translate verbal inputs from human stakeholders into actionable RL insights and adjust its search strategy. By leveraging human-provided information through LLMs and structuring task execution through HRL, our approach not only bridges the gap between autonomous capabilities and human intelligence but also significantly improves the agent's learning efficiency and decision-making process in environments characterised by long horizons and sparse rewards.

09:25-09:45, Paper TuAT12.5
Sensor System for Real-Time Classification of Manual Construction Tasks with Power Tools for Exoskeleton Control

Leudesdorff, Bent	Fraunhofer IPA
Salazar Strümpler, Lydia Rebeca	Fraunhofer IPA
Dobosz, Thomas	Institute of Industrial Manufacturing and Management, University
Maufroy, Christophe	Fraunhofer Institute for Manufacturing Engineering and Automatio
Schneider, Urs	Institute of Industrial Manufacturing and Management, University
Bauernhansl, Thomas	Institute of Industrial Manufacturing and Management, University
Keywords: Human-Machine Cooperation and Systems, Assistive Technology, Human-Machine Interaction Abstract: Work-related musculoskeletal disorders (WMSD) continue to be a significant cause of work incapacity. Exoskeletons have the potential to prevent these disorders, with passive exoskeletons already proving their usefulness but also displaying limitations. However, there is currently a lack of suitable methods to control active exoskeletons, which offer additional advantages like supporting the user just when needed. This paper proposes a sensor system and a method to classify different activities based on the kinematic and activation signals of the power tools in use. First, requirements and thresholds for the sensor system and its signals are derived from representative activities in the construction environment. The paper then introduces a sensor system that collects the tool's kinematic and activation signals. The sensor system consists of an inertial measurement unit (IMU), a pressure sensor, and a WiFi-capable micro-controller to stream the data. Based on the signals, a threshold-based algorithm capable of identifying six predefined activities is presented. The paper presents a suitable test course based on a real situation in the construction industry to evaluate the proposed sensor system and algorithm. With the developed test course, a study is conducted and the activities are classified based on the signals of the sensor system. The results demonstrate that the defined activities can be distinguished based on the kinematic and activation signals of the power tools.


TuAT13	Foyer
2P - AI Applications
Chair: Yagi, Naomi	University of Hyogo

08:05-08:25, Paper TuAT13.1
CiRA CORE: A Low Code Platform That Makes AI Work for Industry 4.0

Loo, ChuKiong	University of Malaya
Boonsang, Siridech	King Mongkut’s Institute of Technology
Sasisaowapak, Thanyathep	King Mongkut’s Institute of Technology
Chuwongin, Santhad	King Mongkut’s Institute of Technology
Tongloy, Teerawat	King Mongkut’s Institute of Technology
Nahavandi, Saeid	Swinburne University of Technology
Wong, Kok Wai	Murdoch University
Keywords: Application of Artificial Intelligence, Cloud, IoT, and Robotics Integration, AIoT Abstract: CiRA CORE is a central hub designed to connect AI technology creation with practical application, making it easier to work with ROS (Robot Operating System) and link different systems through a user-friendly drag-and-drop interface. This approach removes the need for extensive coding, making the platform accessible to those with minimal programming experience. CiRA CORE offers a comprehensive suite of features for AI development and robot control, including algorithm creation, AI model training, and device integration commonly used in industrial settings. It supports tasks like image recognition and facilitates data storage, labeling, and integration with other systems for data-driven AI development. Overall, CiRA CORE aims to democratize AI development and robot control, simplifying AI development for Industry 4.0 applications, and leading to increased efficiency, reduced costs, and improved safety in industrial processes. This paper reports the progress of the CiRA CORE training modules funded by the SMCS TEAM Program Award. The project has completed the design of a 6-axis robot 3D training kit and simulation models for CiRA CORE training modules. The next steps involve developing 3D-printed robots and training materials. The main goal is to democratize advanced robotics and AI by simplifying integration through a visual, node-based programming interface. This approach reduces the need for complex coding, making these technologies accessible to users with limited programming experience. This initiative aims to foster widespread adoption in business and industrial settings, aligning with IEEE SMC's mission to promote professional growth and innovation in robotics and AI.

08:25-08:45, Paper TuAT13.2
Towards an Optimal Design: What Can We Recommend to Elon Musk

Ceberio, Martine	The University of Texas at El Paso
Kosheleva, Olga	University of Texas at El Paso
Kreinovich, Vladik	University of Texas at El Paso
Nguyen, Hung T.	New Mexico State University
Keywords: Consumer and Industrial Applications, Large-Scale System of Systems, Manufacturing Automation and Systems Abstract: Elon Musk's successful "move fast and break things" strategy is based on the fact that in many cases, we do not need to satisfy all the usual constraints to be successful. By sequentially trying smaller numbers of constraints, he finds the smallest number of constraints that are still needed to succeed, and using this smaller number of constraints leads to a much cheaper (and thus more practical) design. In this strategy, Musk relies on his intuition, which, like all intuitions, sometimes works and sometimes doesn't. To replace this intuition, we propose an algorithm that minimizes the worst-case cost of finding the smallest number of constraints.

08:45-09:05, Paper TuAT13.3
Development of Tracking System for Swallowing Movement Using Optical Flow

Yagi, Naomi	University of Hyogo
Nishihara, Ryosuke	University of Hyogo
Kawamura, Naoko	Himeji Dokkyo University
Maezawa, Hitoshi	Kansai Medical University
Kashioka, Hideki	National Institute of Information and Communications Technology
Hirata, Masayuki	Osaka University
Yanagida, Toshio	National Institute of Information and Communications Technology
Sakai, Yoshitada	Kobe University
Hata, Yutaka	University of Hyogo
Keywords: AI and Applications, Application of Artificial Intelligence, Computational Intelligence Abstract: The population of Japan is aging at a speed unparalleled in other countries, and countermeasures against population aging and the worsening of disease in people with disabilities have become urgent issues. Pneumonia and aspiration pneumonia are the leading causes of death. Swallowing function is said to decline from around the age of 40, so it is important to maintain it and limit deterioration as much as possible. The gold standard for evaluating swallowing function is the swallowing contrast test; however, X-ray exposure prevents repeated testing. In addition, the Repetitive Saliva Swallowing Test (RSST), a screening test, is difficult to perform as a self-check. Therefore, in this study, we develop a system for self-evaluating swallowing ability in order to keep swallowing function healthy. The system applies optical flow and the artificial intelligence tool DeepLabCut. As a result, we were able to visualize the movement of the larynx during swallowing.

09:05-09:25, Paper TuAT13.4
The Improved Mango Plant Detection Model Based on Attention Module Mechanism

Sung, Wen-Tsai	National Chin-Yi University of Technology
Isa, Indra Griha Tofik	National Chin-Yi University of Technology
Keywords: AIoT, Computational Intelligence, Soft Computing, Socio-Economic Cybernetics Abstract: Agriculture is one of the sources of income a region can rely on to support its economy. Traditional agriculture relies primarily on human performance and observation, resulting in greater production costs and, subsequently, higher selling prices. Artificial intelligence-based technology can be used to reduce production costs, increase productivity, and provide consumer convenience. An indicator that is easy to interpret in measuring the quality and optimization of plant growth is the visualization of the condition of the leaves. The artificial intelligence technique that can be implemented in this regard is the object detection model. However, the challenge is the complex, multi-object, and multi-intersection condition of the leaves, which causes the model to be less optimal in conducting classification and detection tasks regarding whether the leaf condition is good or not. A YOLOv7 model will be employed in order to detect leaf quality, whether in an “optimal” or “not optimal” condition. To enhance the model's performance by improving accuracy through feature extraction enhancement, YOLOv7 will be integrated with the attention module, called the convolutional block attention module (CBAM). The case study in this research is detecting a mango plant which is one of the plants that can provide a high economic impact and the object observed is the mango plant leaf. Several previous studies related to the implementation of attention modules in object detection include the improved pest-YOLO for real-time pest detection by combining YOLOv3 with efficient channel attention (ECA) and a transformer encoder. The ECA module and transformer encoder were integrated into the backbone and neck block systems of YOLO [1]. 
The lightweight YOLO model was combined with SE-CSPGhostnet by improving the backbone block, which employs squeeze-and-excitation networks (SENet) and a convolution technique consisting of regular convolution and ghost convolution [2]. A highlighted improvement of YOLOv7 over previous versions of YOLO is the Extended Efficient Layer Aggregation Network (E-ELAN). YOLOv7's learning ability is enhanced by using this network while maintaining the transition layer's architecture. E-ELAN enhance

09:25-09:45, Paper TuAT13.5
AI-Enhanced Web Form Development: Tackling Accessibility Barriers with Generative Technologies

Saraswathi, Pradeep Kumar	Salesforce
Keywords: Assistive Technology, User Interface Design, Companion Technology Abstract: Web forms play a pivotal role in digital interfaces but frequently pose significant accessibility challenges. This paper explores the main barriers to creating accessible web forms and investigates how generative AI technologies can provide solutions. We highlight core issues such as accurate labeling, keyboard navigation, error management, focus control, visual design factors, placeholder text usage, assistive technology compatibility, handling of complex inputs, responsive design, cognitive load reduction, and ongoing testing. For each of these challenges, we assess its effect on accessibility and present innovative AI-driven strategies. Our findings illustrate how AI can streamline the development process by automating label generation, improving tab indexing, enhancing real-time error detection, refining focus control, offering contrast improvement suggestions, and simulating interactions with assistive technologies. We conclude that incorporating generative AI into web form development can markedly improve accessibility, making digital experiences more inclusive for users of all abilities. This not only supports compliance with legal and ethical standards but also fosters a more inclusive online environment, enhancing user satisfaction and overall experience.


TuAPSR	Room T14
Poster Presentation - Session 1	Poster Session

08:05-09:45, Paper TuAPSR.1
UAVs for Sustainable Palm Oil Production: An Ant Colony Approach to Efficient Path Planning

Lai, Weng Kin	Tunku Abdul Rahman University of Management and Technology
Chen, Pak Hen	Tunku Abdul Rahman University of Management and Technology
Lim, Li Li	Tunku Abdul Rahman University of Management and Technology
Lee, Patrick Sheng Siang	AONIC
Keywords: Application of Artificial Intelligence, Swarm Intelligence, AI and Applications Abstract: The production of palm oil on a commercial scale is labour intensive with many of its processes handled by humans. In some countries, there can be as many as 500,000 plantation workers in the palm oil sector involved in labour intensive work in large plantations. However, such dependence on humans for low skill manual work has led to many problems. Unmanned aerial vehicles (UAVs) have been seen as a possible alternative to support some of these processes that require low skills in the palm oil industry. However, the flying time of the UAVs is finite and hence it is important to maximize the number of palm trees that it can service. In this paper, an Ant Colony System (ACS) with a novel path constructor was used to identify good flight paths for UAVs in large palm oil plantations to help improve the efficiency for some of the agricultural activities. Good results were obtained for various data sets especially when compared with the standard ACS as well as those by the human experts.
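
For orientation, the standard Ant Colony System that the abstract uses as its baseline can be sketched as follows. This is a minimal illustration of the textbook ACS state-transition and local pheromone rules, not the paper's novel path constructor, which the abstract does not describe. The function names, the parameter values (`beta`, `q0`, `rho`, `tau0`) and the toy distances are all assumptions.

```python
import random

def choose_next(current, unvisited, tau, eta, beta=2.0, q0=0.9, rng=random):
    """Standard ACS pseudo-random proportional rule for the next node (tree)."""
    scores = {j: tau[(current, j)] * eta[(current, j)] ** beta for j in unvisited}
    if rng.random() < q0:
        # Exploitation: follow the edge with the best pheromone/heuristic score.
        return max(scores, key=scores.get)
    # Biased exploration: sample an edge with probability proportional to its score.
    r = rng.random() * sum(scores.values())
    acc = 0.0
    for j, s in scores.items():
        acc += s
        if r <= acc:
            return j
    return j  # fallback for floating-point round-off

def local_update(tau, edge, rho=0.1, tau0=0.01):
    """ACS local pheromone update applied to an edge right after an ant uses it."""
    tau[edge] = (1.0 - rho) * tau[edge] + rho * tau0

# Toy example (hypothetical distances between a UAV base 0 and two palm trees).
dist = {(0, 1): 5.0, (0, 2): 2.0}
tau = {edge: 0.5 for edge in dist}           # initial pheromone, arbitrary value
eta = {edge: 1.0 / d for edge, d in dist.items()}  # heuristic: inverse distance

nxt = choose_next(0, [1, 2], tau, eta, q0=1.0)  # q0=1.0 forces pure exploitation
local_update(tau, (0, nxt))
```

With `q0=1.0` the choice is deterministic and picks the closer tree; lower values of `q0` trade exploitation for exploration.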

08:05-09:45, Paper TuAPSR.2
Incremental Learning Algorithms for Broad Learning System with Node and Input Addition

Chen, Guang-Ze	University of Macau
Jin, Junwei	Henan University of Technology
Sun, Hai-Wei	University of Macau
Chen, C. L. Philip	University of Macau
Keywords: Computational Intelligence, AI and Applications, Machine Learning Abstract: The Broad Learning System (BLS) has been established as an effective flat network alternative to Deep Neural Networks (DNNs), delivering high efficiency while achieving competitive accuracy. Despite its advantages, the incremental learning methods of BLS face challenges in stability and computation when expanding with new nodes or input. We introduce two novel incremental learning algorithms based on factorization updates for BLS that optimize node and input additions to overcome these limitations. Our node addition algorithm utilizes QR decomposition and Cholesky factorization, using the update of the Cholesky factor instead of pseudo-inverse computations. For input addition, we propose an iterative Cholesky factor update algorithm. Our algorithms demonstrate not only faster computation compared to the existing BLS but also improved testing accuracy on the MNIST or Fashion-MNIST dataset. This work presents a significant step forward in the practical application and scalability of BLS in various data-dense environments.
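
The idea of updating a Cholesky factor instead of recomputing a pseudo-inverse can be illustrated with a generic block extension. The sketch below is not the authors' algorithm (the abstract also mentions QR decomposition, which is omitted here); it only shows that when new columns `H` (new nodes) are appended to a feature matrix `A`, the factor of the ridge Gram matrix `A.T A + lam I` can be extended rather than rebuilt. All names and the toy data are assumptions.

```python
import math

def cholesky(G):
    """Lower-triangular L with L L^T = G, for symmetric positive definite G."""
    n = len(G)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = G[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i, row in enumerate(L):
        y.append((b[i] - sum(row[k] * y[k] for k in range(i))) / row[i])
    return y

def gram(A, B):
    """A^T B for matrices stored as lists of rows (one row per sample)."""
    return [[sum(ra[i] * rb[j] for ra, rb in zip(A, B)) for j in range(len(B[0]))]
            for i in range(len(A[0]))]

def ridge_gram(A, lam):
    """A^T A + lam * I."""
    G = gram(A, A)
    for i in range(len(G)):
        G[i][i] += lam
    return G

def extend_cholesky(L, A, H, lam):
    """Extend the factor of A^T A + lam I to the factor for the widened matrix [A | H]."""
    n, m = len(L), len(H[0])
    cross = gram(A, H)  # A^T H, shape (n, m)
    # Row j of C solves L c = (column j of A^T H), so that L C^T = A^T H.
    C = [forward_solve(L, [cross[i][j] for i in range(n)]) for j in range(m)]
    # Schur complement H^T H + lam I - C C^T, factored on its own.
    S_in = ridge_gram(H, lam)
    for i in range(m):
        for j in range(m):
            S_in[i][j] -= sum(a * b for a, b in zip(C[i], C[j]))
    S = cholesky(S_in)
    return [row + [0.0] * m for row in L] + [C[i] + S[i] for i in range(m)]

# Toy check: the extended factor matches a factorization computed from scratch.
A = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0], [1.5, 1.0]]
H = [[0.2], [1.0], [-0.5], [2.0]]
lam = 0.1
L_ext = extend_cholesky(cholesky(ridge_gram(A, lam)), A, H, lam)
L_ref = cholesky(ridge_gram([ra + rh for ra, rh in zip(A, H)], lam))
max_diff = max(abs(a - b) for ra, rb in zip(L_ext, L_ref) for a, b in zip(ra, rb))
```

Only the new rows of the factor are computed, so the cost depends on the number of added columns rather than on refactoring the whole Gram matrix.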

08:05-09:45, Paper TuAPSR.3
RTS-DETR: Efficient Real-Time DETR for Small Object Detection

Li, Wenqiang	Qilu University of Technology (Shandong Academy of Sciences)
Li, Aimin	Qilu University of Technology
Li, Zhiyao	Qilu University of Technology (Shandong Academy of Sciences)
Kong, Xiaotong	Qilu University of Technology (Shandong Academy of Sciences)
Zhang, Yuechen	Qilu University of Technology (Shandong Academy of Sciences)
Keywords: Deep Learning, AI and Applications Abstract: In recent years, DETRs, object detection models based on the Transformer architecture, have played a huge role in various fields. However, the DETR series models are not satisfactory in small object detection, mainly because of the huge amount of calculation in DETR, the loss of much feature information in the feature fusion stage, and the low tolerance of small objects to Intersection over Union (IoU). In order to solve the above problems, we propose a near real-time detection model, RTS-DETR. In this paper, we revisit RT-DETR, which effectively handles multi-scale features by decoupling intra-scale interactions and cross-scale fusion, but at the cost of losing a lot of positive local information. To this end, we have improved the efficient hybrid encoder. We propose a new positional encoding method that enables the hybrid encoder to more accurately convert the input feature sequence into a high-dimensional representation, and propose a new feature fusion module to enhance the model's ability to capture local features. Furthermore, in order to improve the tolerance of small objects to IoU, we combine Normalized Wasserstein Distance (NWD) with Shape-IoU to optimize the model. This method more accurately takes into account the shape and size of objects, thereby improving detection accuracy. Our model achieves an accuracy of 38.8% (in terms of mAP_{@0.5}) on the widely used VisDrone dataset, which improves the accuracy by 2.5% compared to RT-DETR with ResNet-18 as the backbone network.
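
The low tolerance of small objects to IoU, and why a Wasserstein-based similarity helps, can be seen in a small numeric sketch. The code below implements plain IoU and the commonly used form of Normalized Wasserstein Distance, in which each box `(cx, cy, w, h)` is treated as a 2D Gaussian. The normalizing constant `c` is dataset dependent and the value used here is only illustrative. How the paper combines NWD with Shape-IoU is not stated in the abstract and is not reproduced.

```python
import math

def iou(a, b):
    """IoU of two axis-aligned boxes given as (cx, cy, w, h)."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union

def nwd(a, b, c=12.8):
    """Normalized Wasserstein Distance between boxes modelled as Gaussians.

    c is an assumed, dataset-dependent scale constant.
    """
    w2 = math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
                   + ((a[2] - b[2]) / 2) ** 2 + ((a[3] - b[3]) / 2) ** 2)
    return math.exp(-w2 / c)

# The same 4-pixel shift applied to a tiny box and to a large box.
small_gt, small_pred = (10.0, 10.0, 4.0, 4.0), (14.0, 10.0, 4.0, 4.0)
large_gt, large_pred = (100.0, 100.0, 40.0, 40.0), (104.0, 100.0, 40.0, 40.0)

small_iou, large_iou = iou(small_gt, small_pred), iou(large_gt, large_pred)
small_nwd, large_nwd = nwd(small_gt, small_pred), nwd(large_gt, large_pred)
```

For the tiny box the shift drives IoU to zero while the large box keeps a high IoU; NWD gives both pairs the same non-zero similarity, which is the smoother signal a small-object loss needs.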

08:05-09:45, Paper TuAPSR.4
Synergizing Internal and External Knowledge: Prompt Engineering for Efficient and Effective Large Language Model Reasoning

Lu, Gewei	Shanghai Jiao Tong University
He, Chaofan	Shanghai Jiao Tong University
Shen, Liping	Shanghai Jiao Tong University
Keywords: Application of Artificial Intelligence, Deep Learning, Knowledge Acquisition Abstract: Large language models (LLMs), such as ChatGPT, have demonstrated remarkable capability in question answering but face challenges when it comes to knowledge-based reasoning, such as limited training data and hallucination. To address these challenges, integrating LLMs with knowledge graphs (KGs) has emerged as a promising solution. However, the cost associated with training and inference of LLMs is high. Our method integrates the Retrieval-Augmented Generation (RAG) paradigm, incorporating relevant information from KGs alongside the question to enhance LLMs' reasoning process without training. Moreover, we propose a novel concept of self-knowledge motivation to reduce the overhead of inference, which prompts LLMs to integrate retrieved information with their internal knowledge for reasoning before seeking additional queries to KGs. Experimental results showcase improvements in answer accuracy and a reduction in LLMs' API calls compared to the latest published state-of-the-art (SOTA) method employing an identical paradigm, underscoring the efficiency and effectiveness of our method.

08:05-09:45, Paper TuAPSR.5
Try-Then-Eval: Equipping an LLM-Based Agent with a Two-Phase Mechanism to Solve Computer Tasks

Cao, Thanh-Duy	Ho Chi Minh University of Science, VNU-HCM
Nguyen, Phong Phu	University of Science - VNUHCM
Le, Vy	University of Information Technology
Nguyen, Long	University of Science, Ho Chi Minh City, Vietnam
Nguyen, Vu	University of Science, Vietnam National University
Keywords: Application of Artificial Intelligence, Computational Intelligence, Neural Networks and their Applications Abstract: Building an autonomous intelligent agent capable of carrying out web automation tasks from descriptions in natural language offers a wide range of applications, including software testing, virtual assistants, and task automation in general. However, recent studies addressing this problem often require the manual construction of prior human demonstrations. In this paper, we approach the problem by leveraging the idea of reinforcement learning (RL) with a two-phase mechanism to form an agent that uses LLMs to automate computer tasks without relying on human demonstrations. We evaluate our LLM-based agent using the MiniWob++ dataset of web-based application tasks, showing that our approach achieves an 85% success rate without prior demonstrations. The results also demonstrate the agent's capability of self-improvement through training.

08:05-09:45, Paper TuAPSR.6
Decrease the Prompt Uncertainty: Adversarial Prompt Learning for Few-Shot Text Classification

Weng, Jinta	School of Cyber Security, University of Chinese Academy of Scien
Zhang, Zhaoguang	Guangzhou University
Jing, Yaqi	National Computer Network Emergency Response Technical Team/Coor
Niu, Chenxu	China
Huang, Heyan	School of Computer Science and Technology, Beijing Institute Of
Hu, Yue	School of Cyber Security, University of Chinese Academy of Scien
Keywords: Artificial Social Intelligence, AI and Applications, Machine Learning Abstract: With few-shot learning abilities, pre-trained language models (PLMs) have achieved remarkable success in classification tasks. However, recent studies have shown that the performance of PLMs is sensitive to the choice of prompt and to the instability of the prompt-based learning process. To address this challenge, we explore adding appropriate adversarial-training perturbations and integrate the global knowledge of the full-parameter fine-tuned PLM. Specifically, we propose an adversarial prompt learning model (ATPET) and ATPET with fine-tuning (ATPETFT), which incorporates fine-tuning knowledge into the ATPET prompt learning process. Through extensive experiments on several few-shot classification tasks and challenging data settings, we demonstrate that our methods consistently improve the robustness while maintaining the effectiveness of PLMs.

08:05-09:45, Paper TuAPSR.7
Enhancing Autofocus Performance through Predictive Motion-Targeting and Self-Attention in a Deep Reinforcement Learning Framework

Wei, Xiaolin	Chongqing University
Yang, Ruilong	Chongqing University
Wu, Xing	Chongqing University
Wang, Chengliang	Chongqing University
Wang, Haidong	Southwest Hospital of Army Medical University
Wang, Hongqian	Southwest Hospital of Army Medical University
Tang, Tao	Chongqing University
Keywords: Image Processing and Pattern Recognition, AI and Applications, Neural Networks and their Applications Abstract: In focusing tasks on moving targets, traditional methods that rely on maximizing contrast struggle to capture moving objects due to insufficient focusing speed. Deep learning-based methods have attempted to directly predict the optimal focal length for the target; however, due to low prediction accuracy, they often lead to out-of-focus situations when capturing moving objects. In recent years, some approaches have utilized reinforcement learning to automatically explore focal length adjustment patterns, thus achieving better results than traditional methods. However, these approaches have not considered the motion characteristics of the targets, leading to a need for further improvement in focusing performance. To overcome these limitations, we introduce a motion-based feature and deep reinforcement learning-driven autofocus algorithm named MF-DRLAF for moving targets. This novel method tracks the object, predicts its motion state through feature extraction, and uses deep reinforcement learning to dynamically adjust the focus. We utilize a self-attention mechanism to adaptively learn various motion patterns and employ a feature pool structure to enhance processing efficiency. Experiments and real-world testing on a Google Pixel3 demonstrate that our approach significantly enhances autofocus performance on moving objects, highlighting its potential for broader imaging applications. This approach offers a promising direction for future development in autofocus technology.

08:05-09:45, Paper TuAPSR.8
Fractional Order Controller Design for LFC of Two-Area Interconnected Power System with Time Delay Based on IMC Approach

K, Gnaneshwar	PDPM IIITDM Jabalpur
Padhy, Prabin Kumar	PDPM IIITDM Jabalpur
Keywords: System Modeling and Control, Intelligent Power Grid, Control of Uncertain Systems Abstract: Load frequency control (LFC) of a two-area connected electric power system is vital for maintaining grid stability and reliability by matching power generation with load demand. Thus, this work proposes an analytical approach for designing a fractional order (FO) controller to regulate the LFC of a two-area connected electrical power system with time delay. First, the interconnected electrical power system is accurately modelled as an FO system with time delay. Then, the FO controller is designed using the internal model control (IMC) technique, where a low-pass filter (LPF) is considered to mitigate the effect of the disturbances. The designed FO controller involves a single tuning parameter, which is determined analytically using gain crossover frequency criteria. Disturbance and parametric uncertainty analyses have been carried out to assess the efficacy of the proposed method under variation of the tuning parameter. Then, the frequency and tie-line power fluctuations are estimated under nominal and parametric uncertainty conditions. Also, its performance has been compared to recent state-of-the-art techniques for precise efficacy analysis.
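
For readers unfamiliar with IMC, the generic design structure can be written as below. This is the textbook IMC form, not the paper's specific fractional-order derivation. The symbols are assumptions: $G_m$ is the plant model, $G_m^{+}$ collects its non-invertible part (such as the time delay), $G_m^{-}$ is the invertible remainder, and $f$ is the low-pass filter that carries the single tuning parameter.

```latex
\begin{aligned}
G_m(s) &= G_m^{-}(s)\,G_m^{+}(s), \\
Q(s)   &= \left[G_m^{-}(s)\right]^{-1} f(s), \\
C(s)   &= \frac{Q(s)}{1 - G_m(s)\,Q(s)} .
\end{aligned}
```

Here $Q$ is the IMC controller and $C$ is the equivalent controller in a conventional unity-feedback loop; in the integer-order case a typical filter choice is $f(s) = 1/(\lambda s + 1)$ with tuning parameter $\lambda$.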

08:05-09:45, Paper TuAPSR.9
SELus: Towards Spatio-Temporal Modeling and Quantitative Evaluation for Cyber-Physical Systems

Zhang, Quanguo	East China Normal University
Liu, Jing	East China Normal University
Liu, Mingxing	Nuclear Power Institute of China
Huang, Yanhong	East China Normal University
Hou, Rongbin	Nuclear Power Institute of China
Shi, Jianqi	East China Normal University
Keywords: System Modeling and Control, Cyber-physical systems, Modeling of Autonomous Systems Abstract: Synchronous languages are routinely used to model safety-critical control systems. In recent years, they have gradually been applied to cyber-physical systems (CPS), which emphasise high levels of correctness and safety. They are based on the assumption that the system reacts instantaneously to input events and can compute the output before the next input event, so they are well suited for expressing temporal logic. However, they lack effective constructs for expressing spatial properties in CPS. Moreover, spatio-temporal properties in CPS are indispensable, requiring not only qualitative analysis but also quantitative analysis. Therefore, we propose SELus, a new synchronous language based on Lustre, to provide the capability of modeling spatio-temporal properties in CPS, enabling the representation of spatial topological relationships and the performance of quantitative analysis on them. To formally verify the SELus model, we introduce a set of mapping rules to transform the SELus model into a Ptolemy II model. The resulting model is used in Ptolemy II to perform quantitative analysis of the SELus model. Experiments are conducted on a lane-changing system, showcasing the usability and effectiveness of our language.

08:05-09:45, Paper TuAPSR.10
Wheeled Mobile Robots on Rough Terrains As Stochastic Nonholonomic Systems

Gzenda, Vaughn	Carleton University
Chhabra, Robin	Carleton University
Keywords: Control of Uncertain Systems, Modeling of Autonomous Systems, Robotic Systems Abstract: In this paper, we investigate the motion of wheeled mobile robots on rough terrains modeled as noisy nonholonomic constraints. Such constraints are the natural extension of ideal nonholonomic constraints when the Stratonovich process is directly introduced in the constraint equations. The resulting stochastic model can capture motion on rough surfaces, random deformation in the wheel-ground contact, or stochastic loss/gain of traction. We study a differential robot with ideal noisy and affine noisy constraints, where each case models a certain aspect of motion on rough terrains. We then investigate their corresponding stochastic dynamics and the propagation of mean and covariance through Monte-Carlo simulations. The proposed model for roving rough terrains has the potential to serve as the stochastic model employed in model-based motion planning, pose estimation, and control of rover systems. The main challenge will be dealing with the nonlinear appearance of the noise and its feedback in the equations of motion.

08:05-09:45, Paper TuAPSR.11
Energy-Efficient Hybrid Model Predictive Trajectory Planning for Autonomous Electric Vehicles

Ding, Fan	Monash University
Luo, Xuewen	Monash University
Li, Gaoxuan	Monash University
Tew, Hwa Hui	Monash University Malaysia
Loo, Junn Yong	Monash University Malaysia
Chor, Wai Tong	Tunku Abdul Rahman University of Management and Technology
Bakibillah, A. S. M.	Tokyo Institute of Technology
Zhao, Ziyuan	I2R, A*STAR
Tao, Zhiyu	National Science Library, Chinese Academy of Sciences; Departmen
Keywords: Autonomous Vehicle, System Modeling and Control, Modeling of Autonomous Systems Abstract: To tackle the twin challenges of limited battery life and lengthy charging durations in electric vehicles (EVs), this paper introduces an Energy-efficient Hybrid Model Predictive Planner (EHMPP), which employs an energy-saving optimization strategy. EHMPP focuses on refining the design of the motion planner to be seamlessly integrated with the existing automatic driving algorithms, without additional hardware. It has been validated through simulation experiments on the Prescan, CarSim, and Matlab platforms, demonstrating that it can increase passive recovery energy by 11.74% and effectively track motor speed and acceleration at optimal power. To sum up, EHMPP not only aids in trajectory planning but also significantly boosts energy efficiency in autonomous EVs.

08:05-09:45, Paper TuAPSR.12
A Novel Information-Theoretic Metric for Evaluating LiDAR Setups of Autonomous Vehicles

Hafemann, Philipp	Technical University Munich
Song, Xulin	Technical University Munich
Brecht, David	Technical University of Munich
Keywords: Autonomous Vehicle, Modeling of Autonomous Systems, Intelligent Transportation Systems Abstract: The sensor configuration of an autonomous vehicle (AV) is determined in the early development phase when specific perception algorithms are not yet available. Therefore, approaches based on synthetic raw data are necessary to evaluate different configurations. One sensor type used in AV is LiDAR, but developers should carefully consider the amount and placement of the sensors due to their high costs. In this contribution, we propose the Omni-Lidar Evaluation Score (OLES), a novel metric to evaluate different LiDAR configurations based on their simulated raw data. Our OLES metric combines information-theoretic quantities with coverage-based metrics, considering both the spatial coverage and the uniformity of a LiDAR point cloud distribution. We show the need for a new metric and provide details on implementing OLES using the open-source simulator CARLA. We demonstrate the effectiveness of our new metric in a simulation study and highlight its usefulness in the early phases of vehicle development. This research provides a means to evaluate the quality of LiDAR configurations and provides a basis for further optimizing sensor setups for AVs.
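
The two ingredients named in the abstract, spatial coverage and uniformity of the point distribution, can be illustrated on a voxel grid. The sketch below is not the OLES definition, which the abstract does not give. It simply computes the fraction of occupied voxels and the normalized Shannon entropy of the per-voxel point counts. The function name, voxel size, grid shape and toy point clouds are all assumptions.

```python
import math
from collections import Counter

def coverage_and_entropy(points, voxel_size, grid_shape):
    """Return (coverage, normalized entropy) of a point cloud on a voxel grid.

    coverage: fraction of voxels in the region of interest hit by at least one point.
    entropy:  Shannon entropy of the per-voxel point distribution, divided by
              log(number of voxels) so that a perfectly uniform cloud scores 1.0.
    """
    counts = Counter()
    for x, y, z in points:
        idx = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        if all(0 <= i < n for i, n in zip(idx, grid_shape)):
            counts[idx] += 1  # points outside the region of interest are ignored
    total_voxels = grid_shape[0] * grid_shape[1] * grid_shape[2]
    n_points = sum(counts.values())
    if n_points == 0 or total_voxels < 2:
        return 0.0, 0.0
    coverage = len(counts) / total_voxels
    entropy = -sum((c / n_points) * math.log(c / n_points) for c in counts.values())
    return coverage, entropy / math.log(total_voxels)

# Two hypothetical 4-point clouds on a 2 x 2 x 1 grid of 1 m voxels.
spread = [(0.5, 0.5, 0.5), (1.5, 0.5, 0.5), (0.5, 1.5, 0.5), (1.5, 1.5, 0.5)]
clustered = [(0.1, 0.1, 0.5), (0.2, 0.3, 0.5), (0.4, 0.2, 0.5), (0.3, 0.4, 0.5)]

spread_cov, spread_ent = coverage_and_entropy(spread, 1.0, (2, 2, 1))
clustered_cov, clustered_ent = coverage_and_entropy(clustered, 1.0, (2, 2, 1))
```

A sensor setup whose points spread evenly over the region scores high on both quantities, while a setup that concentrates its returns in one corner scores low on both.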

08:05-09:45, Paper TuAPSR.13
The Eco-Label Strategy of Green Manufacture under the Influence of Consumers’ Intrinsic Preferences

Hou, Yingjie	Northwestern Polytechnical University
Guo, Peng	Northwestern Polytechnical University
Zhao, Jing	Northwestern Polytechnical University
Keywords: Consumer and Industrial Applications Abstract: Considering two eco-label strategies, self-label and certification-label, we construct a duopoly competition model that encompasses both green product and ordinary product manufacturing enterprises. By investigating the optimal eco-label standards, we explore product pricing and profits for enterprises facing green-sensitive consumers and price-sensitive consumers. Then we analyze the optimal eco-label selection for green enterprises in different preference markets. Research indicates that the green quality standards and product prices under certification labels are invariably higher than those under self-label. However, the choice of eco-label by enterprises is influenced by consumers' individual intrinsic preferences: in price-sensitive markets, enterprises tend to adopt self-label; in green-sensitive markets, when the value of consumers' individual intrinsic preferences is below a certain threshold, enterprises will prioritize certification labels. Additionally, the profits of enterprises in green-sensitive markets are generally higher than those in price-sensitive markets, so enterprises should highlight the advantages of green quality and guide consumers to prefer green attributes more when formulating promotional strategies.

08:05-09:45, Paper TuAPSR.14
AutoForma: A Large Language Model-Based Multi-Agent for Computer-Automated Design

Liao, JianXing	Shenzhen Institute for Advanced Study, University of Electronic
Xu, Junyan	University of Electronic Science and Technology of China
He, Sicheng	University of Electronic Science and Technology of China
Chen, Zeke	UESTC
Yu, Shui	Shen Zhen Institute for Advanced Study, UESTC
Li, Yun	Shenzhen Institute for Advanced Study, University of Electronic
Keywords: Consumer and Industrial Applications, System Architecture Abstract: With the proliferation of artificial intelligence, Computer-Aided Design (CAD) is being transformed into Computer-Automated Design (CAutoD), and the advent of Large Language Models (LLMs) introduces new opportunities for CAutoD. This study develops AutoForma, an LLM-based multi-agent system, for automatic conversion from natural language descriptions to 3D models. By harnessing the comprehension capabilities of LLMs, AutoForma streamlines the CAutoD workflow by efficiently translating design intents into precise models in CAD. Through a comprehensive set of evaluations, AutoForma is shown to automate a variety of design tasks, particularly the generation of non-standard parts that meet specific requirements, with higher efficiency and accuracy than using an LLM such as GPT-4 alone.

08:05-09:45, Paper TuAPSR.15
Hybrid Data-Mechanism Modeling for Tire Response Dynamics in Estimating Tire–Road Friction Coefficient

Lu, Jiaxing	Tongji University
Cheng, Liangzhu	Dongfeng Automotive Technology Center
Liang, Jun	Dongfeng Automotive Technology Center
Wang, Nian	Dongfeng Motor Corporation
Li, Bin	College of Electronic and Information Engineering, Tongji Univer
Zhang, Lin	Tongji University
Chen, Hong	Tongji University
Keywords: System Modeling and Control, Electric Vehicles and Electric Vehicle Supply Equipment, Autonomous Vehicle Abstract: Advanced control and safety systems are crucial for electric vehicles, and the accurate estimation of the tire-road friction coefficient (TRFC) is crucial for developing effective safety control strategies. The hybrid data-mechanism model (HDMM), introduced in this paper, addresses the performance challenges posed by the inaccuracies of physical models and the limited interpretability of data-driven models in tire force estimation for TRFC estimation. Tire dynamics often exhibit transient responses, while mechanism-based models (MBM) typically reflect steady-state characteristics. Neglecting transient characteristics leads to a decrease in model accuracy. A neural network is used to learn the transient response characteristics of tire dynamics. These characteristics are then integrated with the steady-state tire forces from MBM to estimate the lateral and vertical forces acting on the wheel. The estimated tire forces serve as virtual measurements to calibrate parameters in the TRFC estimator, based on the Unscented Kalman Filter (UKF). During real-world vehicle tests, the proposed method reduced the Mean Error (ME) in lateral and vertical forces by 1271.85 N and 996.7 N, respectively, compared to the estimated tire forces from MBM. Additionally, the estimated TRFC converged to the reference value approximately 40 ms earlier than the result from the MBM, with an estimated deviation within 0.1.

08:05-09:45, Paper TuAPSR.16
FLSTAGCN: Traffic Flow Prediction Based on Federated Learning and Attention Graph Convolutional Network

Shi, Lei	Zhengzhou University, School of Cyber Science and Engineering
Yuan, Shaohua	Zhengzhou University
Lian, Huijuan	Zhengzhou University
Gao, Yufei	Zhengzhou University
Wei, Lin	Zhengzhou University
Wang, Qilong	Zhengzhou University
Keywords: Intelligent Transportation Systems, Distributed Intelligent Systems, Smart Buildings, Smart Cities and Infrastructures Abstract: Traffic flow prediction plays a pivotal role in helping governments and companies accurately forecast changes in vehicle volume, consequently enhancing transportation efficiency and facilitating vehicle travel. Presently, the majority of traffic flow prediction methods rely on centralized learning strategies, which entail the transmission of substantial data and may jeopardize user privacy. To address this issue, we propose a Federated Learning-based Attention Graph Convolutional Network (FLSTAGCN) algorithm for traffic flow prediction. Firstly, we develop a Spatial-Temporal Attention Graph Convolutional Network (STAGCN) method that employs an attention mechanism to proficiently extract spatial-temporal features from traffic flow data, augmenting the model's learning capabilities. Subsequently, within the aggregation mechanism of Federated Learning, we devise a bespoke optimal selection protocol to enhance training accuracy and reduce communication costs in traffic flow prediction scenarios. Finally, we integrate Federated Learning with STAGCN and utilize the optimal selection protocol to designate participants for transmitting optimal parameters. Experimental results substantiate that our approach outperforms advanced deep learning approaches in terms of traffic flow prediction performance while ensuring the privacy and security of traffic data.
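
The general shape of selective aggregation in federated learning can be sketched as follows. The selection criterion used here (keep the `k` clients with the lowest local validation loss, then take a sample-weighted average of their parameters) is an assumption made for illustration. The abstract does not specify the paper's optimal selection protocol. The client records and field names are hypothetical.

```python
def select_and_aggregate(updates, k):
    """Average the parameters of the k clients with the lowest validation loss.

    updates: list of dicts with keys
        "weights"   - flat list of model parameters from one client
        "n_samples" - number of local training samples (aggregation weight)
        "val_loss"  - local validation loss (hypothetical selection criterion)
    Only the selected clients would need to transmit parameters to the server.
    """
    chosen = sorted(updates, key=lambda u: u["val_loss"])[:k]
    total = sum(u["n_samples"] for u in chosen)
    dim = len(chosen[0]["weights"])
    return [sum(u["weights"][i] * u["n_samples"] for u in chosen) / total
            for i in range(dim)]

# Three hypothetical clients (for example, three road-side data holders).
updates = [
    {"weights": [1.0, 2.0], "n_samples": 100, "val_loss": 0.2},
    {"weights": [3.0, 4.0], "n_samples": 300, "val_loss": 0.1},
    {"weights": [10.0, 10.0], "n_samples": 50, "val_loss": 0.9},
]
global_weights = select_and_aggregate(updates, k=2)
```

Dropping the worst-performing client keeps its poorly fitted parameters out of the global model and removes one upload from the communication round.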

08:05-09:45, Paper TuAPSR.17
Steering Control Considering Motion Sickness and Vehicle Performance Via DDPG Algorithm and 6-DoF-SVC Model

Kawakami, Uta	The University of Electro-Communications
Sawada, Kenji	The University of Electro-Communications
Keywords: Autonomous Vehicle, Decision Support Systems, Adaptive Systems Abstract: Autonomous driving demands sophisticated control systems that optimize safety, performance, passenger comfort, and fuel efficiency. This study proposes a steering control system that integrates the Deep Deterministic Policy Gradient (DDPG) for speed planning with a novel feedback mechanism based on Subjective Vertical Conflict (SVC) in the reward function. Using simulations in MATLAB and Simulink, we evaluate the system's performance across various thresholds of SVC, examining its impact on ride comfort, fuel efficiency, and vehicle behavior during lane changes. Results reveal a trade-off relationship between ride comfort and fuel efficiency, with lower SVC thresholds generally improving comfort but potentially increasing steering input. Additionally, excessively low SVC thresholds degrade target-reaching performance and lengthen lane change distances, highlighting the need for careful parameter tuning. Overall, our findings demonstrate the potential of reinforcement learning-based steering control systems to optimize multiple evaluation criteria simultaneously while emphasizing the importance of balancing trade-offs in autonomous driving scenarios.

08:05-09:45, Paper TuAPSR.18
Robust Controller for Varying Speed Autonomous Ground Vehicles Considering System Uncertainties and Road Conditions

Rahim, Md Abdur	Deakin University
Arogbonlo, Adetokunbo	Deakin University
Pappu, Mohammad Rokonuzzaman	Deakin University
Abu Alqumsan, Ahmad	Deakin University
Keywords: Autonomous Vehicle, Control of Uncertain Systems, System Modeling and Control Abstract: This paper presents a novel robust path-tracking controller for autonomous ground vehicles. Environmental and vehicle factors such as variation in road conditions and varying speed can adversely affect the path-tracking capability of autonomous ground vehicles. A polytopic linear parameter varying model for autonomous ground vehicles that accounts for system uncertainties with varying speeds and road conditions is formulated. Then, an H_∞-based robust path-tracking controller is developed using this model to minimise the vehicle's lateral velocity, heading error, and slip angle. Simulation results comparing the proposed controller with a conventional robust controller are presented. The findings show that the proposed controller performs well and is more effective than the conventional robust controller.

08:05-09:45, Paper TuAPSR.19
Safety Verification of Advanced Driver Assistance Systems Using Hybrid Automaton Reachability

Liu, Lu	Huazhong University of Science and Technology
Sun, Qi	Huazhong University of Science and Technology
Yang, Liren	Huazhong University of Science and Technology
Li, Yahui	Huazhong University of Science and Technology
Zhou, Chunjie	Huazhong University of Science and Technology
Keywords: Autonomous Vehicle, Modeling of Autonomous Systems, Cooperative Systems and Control Abstract: Advanced driver assistance systems (ADAS) are effectively raising the level of vehicular automation, and it is critical to ensure their functional safety. While existing analysis mainly focuses on individual applications of ADAS, safety violations in the overall system can be found by extensive road tests, which are not only costly in terms of time and money but also lack a formal safety guarantee. This is because tests may not cover all driving scenarios, especially the ones that involve discrete mode switching. In this paper, we focus on the longitudinal vehicle motion and provide a pipeline to perform safety verification for all the related ADAS applications. To that end, we specify safety constraints and boundaries for a vehicle’s longitudinal cruising and collision avoidance and validate a longitudinal dynamic model against the high-fidelity simulation software CarSim. Then we define hybrid automata to describe the closed-loop system composed of the vehicle dynamics and the ADAS. Finally, by computing the reachable sets of the hybrid automata and comparing them with the specified safety boundaries, the ADAS is verified. Numerical experiments demonstrate the efficacy of the proposed approach.

08:05-09:45, Paper TuAPSR.20
Multi-Segment Fusion-Enhanced Spatial-Temporal Graph Convolutional Network for Traffic Flow Prediction (I)

Zhang, Wei	Chongqing University of Posts and Telecommunications
Tang, Peng	Southwest University
Keywords: Intelligent Transportation Systems Abstract: Accurate traffic flow prediction can assist in traffic management, route planning, and congestion mitigation, which holds significant importance in enhancing the efficiency and reliability of intelligent transportation systems (ITS). However, existing traffic flow prediction models suffer from limitations in capturing the complex spatial-temporal dependencies within traffic networks. In order to address this issue, this study proposes a multi-segment fusion-enhanced spatial-temporal graph convolutional network (MS-STGCN) for traffic flow prediction with the following three-fold ideas: a) building a unified spatial-temporal graph convolutional framework based on Tensor M-product, which captures the spatial-temporal patterns simultaneously; b) incorporating hourly, daily, and weekly components to model the multi-temporal properties of traffic flows, respectively; c) fusing the outputs of the three components by an attention mechanism to obtain the final traffic flow prediction results. The results of experiments conducted on two traffic flow datasets demonstrate that the proposed MS-STGCN outperforms the state-of-the-art models.


TuK3N	HALL C&D
Keynote 3 Chairperson: Prof. Andreas AI-Based Convoying of Leader-Follower Autonomous Vehicles


TuBT1	MR01
Computational Intelligence and Soft Computing 2	Regular Papers - Cybernetics
Chair: Yu, Baijiang	South China University of Technology

11:00-11:20, Paper TuBT1.1
Incremental Evolution of Three Degree-Of-Freedom Arachnid Gaits

Parker, Gary	Connecticut College
Isak, Manan Basil Masaru	Connecticut College
O'Connor, Jim	Connecticut College
Keywords: Evolutionary Computation, Computational Intelligence, Application of Artificial Intelligence Abstract: In this research, we evolve gaits for an arachnid-inspired robot. The method used is an expansion upon previous research on the incremental evolution of gaits for hexapod robots with two degrees of freedom per leg, which we now apply to a more complex, eight-legged robot with three degrees of freedom per leg. Incremental evolution handles gait generation for legged robots in two discrete increments. The first increment uses a cyclic genetic algorithm to learn the activations (pulse instructions to the servos) required for each leg to perform a single-leg cycle. This learning program takes into account the way each leg is mounted on the body and the range of movement provided by the three servos on each leg to produce a smooth, straight, and efficient leg cycle. The second increment uses a genetic algorithm to select the best combination of leg cycles for each leg and to learn the timing to execute each leg cycle to coordinate them all together into a single gait. In this work, we learn the gait incrementally in a simulation and transfer the final gaits to the real robot to confirm the method’s viability.

11:20-11:40, Paper TuBT1.2
Individual-Level Dominant Exemplar Selection for Particle Swarm Optimization

Wang, Hu-Long	Nanjing University of Information Science and Technology
Duan, Danting	Key Laboratory of Media Audio & Video, Communication University
Yang, Qiang	Nanjing University of Information Science and Technology
Gao, Xu-Dong	Nanjing University of Information Science and Technology
Xu, Peilan	Nanjing University of Information Science and Technology
Lin, Xin	Nanjing University of Information Science and Technology
Lu, Zhen-Yu	Nanjing University of Information Science and Technology
Zhang, Jun	Hanyang University
Keywords: Swarm Intelligence, Evolutionary Computation, Computational Intelligence Abstract: Leading exemplars play significant roles in updating particles to seek optimal solutions for Particle Swarm Optimization (PSO). Along this line, this paper devises an Individual-level Dominant Exemplar Selection (IDES) framework for PSO, giving rise to a new PSO variant named IDESPSO. Specifically, instead of using their own personally best positions and the globally best position of the entire swarm to update particles, IDES first randomly chooses two different exemplars for each particle from all personally best positions. Then, it compares the two selected exemplars with the personally best position of this particle. Based on the comparison results, different updating strategies are utilized to update different particles. This method notably enriches the variety among the chosen leading exemplars, thereby substantially bolstering the updating diversity of particles. Under IDES, this paper further develops seven selection strategies to help IDESPSO pick promising exemplars for particles to evolve. Specifically, the seven selection schemes are the roulette wheel selection, the tournament selection, and five hybridizations of the two basic models. A series of experiments have been undertaken on the universally used CEC2014 problem suite to compare IDESPSO with the seven selection schemes and two classic PSOs. The empirical results show that IDESPSO paired with any one of the seven selection methods markedly outperforms the two classical PSO variants, highlighting its significant performance.
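The two basic selection schemes named in the abstract above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the minimization convention, the roulette weighting (worst fitness minus own fitness) and the way the second exemplar is drawn from the remaining candidates are all assumptions made for the example.

```python
import random


def tournament_select(fitness, rng=random, k=2):
    """Draw k distinct candidates at random and return the index of the
    one with the lowest fitness (minimization is assumed)."""
    candidates = rng.sample(range(len(fitness)), k)
    return min(candidates, key=lambda i: fitness[i])


def roulette_select(fitness, rng=random):
    """Return an index with probability proportional to (worst - f + eps),
    so that lower (better) fitness values receive a larger share."""
    worst = max(fitness)
    weights = [worst - f + 1e-12 for f in fitness]
    return rng.choices(range(len(fitness)), weights=weights, k=1)[0]


def pick_two_exemplars(pbest_fitness, select, rng=random):
    """Pick two different exemplar indices from the personal-best list.

    The second exemplar is selected among the remaining candidates, which
    guarantees the two indices differ (an assumption of this sketch)."""
    if len(pbest_fitness) < 3:
        raise ValueError("need at least three personal bests")
    first = select(pbest_fitness, rng)
    rest = [i for i in range(len(pbest_fitness)) if i != first]
    second = rest[select([pbest_fitness[i] for i in rest], rng)]
    return first, second


if __name__ == "__main__":
    rng = random.Random(0)
    pbest_fitness = [5.0, 1.0, 3.0, 4.0, 2.0]  # hypothetical values
    print(pick_two_exemplars(pbest_fitness, tournament_select, rng))
    print(pick_two_exemplars(pbest_fitness, roulette_select, rng))
```

The comparison of the two exemplars with the particle's own personal best, and the update strategies that follow from it, are not specified in the abstract and are therefore left out of the sketch.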

11:40-12:00, Paper TuBT1.3
EARL-Light: An Evolutionary Algorithm-Assisted Reinforcement Learning for Traffic Signal Control

Chen, JingYuan	South China University of Technology
Wei, Feng-Feng	South China University of Technology
Chen, Tai-You	South China University of Technology
Hu, Xiao-Min	Guangdong University of Technology
Jeon, Sang-Woon	Hanyang University
Wang, Yang	Northwestern Polytechnical University
Chen, Wei-Neng	South China University of Technology
Keywords: Evolutionary Computation, Computational Intelligence, Machine Learning Abstract: Traffic signal control (TSC) problems have received increasing attention with the development of the smart city. Reinforcement learning (RL) models TSC as a Markov decision process and learns the timing relationship of traffic scheduling from massive historical data. Due to the uncertainty and mutability of TSC problems, existing RL methods face bottlenecks in diversity and are easily trapped in local optima. To alleviate this predicament, this paper combines evolutionary optimization and RL to propose an evolutionary algorithm-assisted reinforcement learning (EARL-Light) method for TSC problems. EARL-Light is a population-based algorithm, in which one individual represents a policy and a population of individuals is evolved to search for near-optimal policies. The diversified search ability of evolutionary optimization can help the algorithm escape local optima for global optimization, and the rapid gradient-based learning of RL can achieve fast convergence. Extensive experiments on seven real-world traffic datasets demonstrate that EARL-Light achieves shorter travel time with fast convergence.

12:00-12:20, Paper TuBT1.4
Evolutionary Reinforcement Learning with Double Replay Buffers for UAV Online Target Tracking

Yu, Baijiang	South China University of Technology
Wei, Feng-Feng	South China University of Technology
Hu, Xiao-Min	Guangdong University of Technology
Jeon, Sang-Woon	Hanyang University
Luo, Wenjian	Harbin Institute of Technology, Shenzhen
Chen, Wei-Neng	South China University of Technology
Keywords: Evolutionary Computation, Computational Intelligence, Application of Artificial Intelligence Abstract: Target tracking has broad applications such as disaster relief, and unmanned aerial vehicles (UAVs) have been universally applied in target tracking in recent years. Due to its strong responsiveness to deceptive reward signals and its diverse exploration, evolutionary reinforcement learning (ERL) is a more noteworthy option for training UAVs than common reinforcement learning. However, because ERL contains too many agents, its training efficiency is unsatisfactory. To address this shortcoming, this paper proposes an evolutionary reinforcement learning with double replay buffers (ERLDRB) for the UAV online target tracking problem. Firstly, considering the energy consumption and the possible delay of feedback signals to the UAV, a more realistic model of the UAV online target tracking problem is designed. Then, based on the problem formulation, ERLDRB utilizes a double experience replay buffer technique to increase learning efficiency in the training stage, which can better solve the real-world UAV online target tracking problem. Simulation results show that ERLDRB outperforms multiple contrasting algorithms on the designed model.

12:20-12:40, Paper TuBT1.5
Matrix-Based Ant Colony System for Traveling Salesman Problem

Li, Xu	South China University of Technology
Li, Jian-Yu	South China University of Technology
Chen, Chun-Hua	South China University of Technology
Zhan, Zhi-Hui	South China University of Technology
Kwong, Sam Tak Wu	Lingnan University
Zhang, Jun	Hanyang University
Keywords: Evolutionary Computation, Swarm Intelligence, Computational Intelligence Abstract: The ant colony system (ACS) algorithm, as an important evolutionary computation (EC) algorithm, has demonstrated significant advantages in solving complex optimization problems. However, traditional EC algorithms and the traditional ACS algorithm often face the challenge of slow computational speed when dealing with large-scale problems. In recent years, matrix-based EC approaches have been proposed to accelerate computation and have obtained promising results in dealing with large-scale problems. However, most existing matrix-based EC algorithms are designed for continuous optimization problems, while the matrix-based approach integrated with ACS, which would be efficient for solving large-scale discrete optimization problems, has not attracted enough attention. Therefore, in this paper, we propose a matrix-based ACS (MACS) algorithm and apply it to solve the traveling salesman problem (TSP). MACS is an innovative improvement over the traditional ACS algorithm, utilizing matrix operations to let ants select cities and update pheromones in parallel. Experimental results show that the MACS algorithm significantly accelerates computation while maintaining remarkable problem-solving ability on large-scale TSP.

12:40-13:00, Paper TuBT1.6
Building Consensus in Group Decision-Making with Intuitionistic Reciprocal Preference Relations: An Analysis of Various Protocols of Information Granularity Distribution

González-Quesada, Juan Carlos	University of Granada
Cabrerizo, Francisco Javier	University of Granada (Q1818002F)
Herrera Viedma, Enrique	University of Granada (Spain)
Pedrycz, Witold	University of Alberta
Keywords: Fuzzy Systems and their applications, Computational Intelligence Abstract: On the one hand, to model experts' preferences in group decision-making, intuitionistic reciprocal preference relations have widely been used because they allow for accommodating hesitation degrees, which are inherent to all decision-making processes. On the other hand, an optimization of information granularity distribution has recently been applied to establish consensus during group decision-making processes. Concretely, a symmetric and uniform distribution of information granularity has been considered for intuitionistic reciprocal preference relations. However, there exist other protocols of information granularity distribution that could be used. Therefore, we aim to analyze all the information granularity distribution protocols and determine their effectiveness in building consensus through intuitionistic reciprocal preference relations. The performance of the different protocols is discussed by conducting some numerical experiments that help provide insights into the effectiveness of the protocols to build consensus.


TuBT2	MR02
Deep Learning and Neural Networks 5	Regular Papers - Cybernetics
Chair: Raju, S M Taslim Uddin	University of Waterloo

11:00-11:20, Paper TuBT2.1
CMA-BP: A Clustered Multi-Task Learning and Branch Attention Based Branch Predictor

Ming, Li	University of Electronic Science and Technology of China
Rucong, Xu	University of Electronic Science and Technology of China
Zhang, Hexu	University of Electronic Science and Technology of China
Li, Lin	Qingdao Agriculture University
Li, Yun	Shenzhen Institute for Advanced Study, University of Electronic
Keywords: Neural Networks and their Applications, Machine Learning, AI and Applications Abstract: Branch prediction stands as a key bottleneck in enhancing CPU performance, as evidenced by an average of around 10 mispredicted hard-to-predict (H2P) branches per benchmark in SPEC 2017 under current neural network methods. To improve on this, this paper proposes a Clustered Multi-Task Learning and Branch Attention Mechanism-Based Branch Predictor (CMA-BP). Clustered multi-task learning enhances model generalization, and branch attention extracts preferences of different branches for global history. Thus, CMA-BP efficiently aggregates branches with similar features, reducing training complexity. Experimental results show that CMA-BP significantly outperforms existing predictors in both accuracy and the number of parameters required. By advancing the state-of-the-art in branch prediction, this work has important implications for future high-performance computer architecture design.

11:20-11:40, Paper TuBT2.2
RS-DETR: An Improved DETR for High-Resolution Remote Sensing Image Object Detection

Cao, Feng	Shanxi University
Wang, Ruoyu	Shanxi University
Li, Deyu	Shanxi University
Hu, ZhiGuo	Shanxi University
Keywords: Deep Learning, Machine Learning, Neural Networks and their Applications Abstract: High-resolution remote sensing image object detection is an important research area in remote sensing information processing and has substantial practical applications. This domain presents unique challenges, including variable object scales, complex backgrounds, prevalent small objects, and densely arranged items, distinguishing it from traditional object detection in natural images. This paper proposes a novel object detection algorithm (RS-DETR), which builds upon the DETR framework and integrates the Swin Transformer. The algorithm features a dual-branch structure in its feature extraction module, markedly improving detection accuracy, especially for objects of varying scales. The addition of the GAM convolutional attention mechanism allows the model to concentrate more effectively on relevant regions, minimizing background complexities. Moreover, we have included the scale-invariant intersection over union (SIoU) loss function to enhance the precise localization of closely packed objects. To demonstrate the efficacy of the algorithm, RS-DETR was applied to the HRSC2016 and NWPU VHR-10 datasets. The results show average detection accuracies of 86.1% and 57.9% on these datasets, respectively, outperforming the baseline models by 1.1% and 0.9%, respectively.

11:40-12:00, Paper TuBT2.3
TransUAAE-CapGen: Caption Generation from Histopathological Patches through Transformer and UNet-Based Adversarial Autoencoder

Raju, S M Taslim Uddin	University of Waterloo
Mohammad, Abdul Raqeeb	University of Waterloo
Islam, Md. Milon	University of Waterloo
Karray, Fakhreddine	University of Waterloo
Keywords: Deep Learning, Neural Networks and their Applications, Machine Learning Abstract: Captioning Whole Slide Images (WSIs) for pathological analysis is an essential but not extensively explored aspect of computer-aided pathological diagnosis. Challenges arise from insufficient datasets and the effectiveness of model training. Generating automatic caption reports for various gastric adenocarcinoma images is another challenge. In this paper, we introduce a hybrid method referred to as TransUAAE-CapGen to generate histopathological captions from WSI patches. The TransUAAE-CapGen architecture consists of a hybrid UNet-based Adversarial Autoencoder (AAE) for feature extraction and a transformer for caption generation. The hybrid UNet-based AAE extracts complex tissue properties from histopathological patches, transforming them into low-dimensional embeddings. The embeddings are then fed into the transformer to generate concise captions. Our proposed method is validated using the PatchGastricADC22 dataset. The TransUAAE-CapGen model provides the best estimated accuracy of BLEU-4 = 86.8%, METEOR = 59.6%, ROUGE = 89.3%, and CIDEr = 7.72%. Experimental analysis indicates that the TransUAAE-CapGen architecture outperforms the traditional LSTM-based model for the caption generation task. Our findings reveal that the proposed architecture can effectively generate accurate and precise reports for medical image analysis.

12:00-12:20, Paper TuBT2.4
Learned Image Compression with Transformer-CNN Mixed Structures and Spatial Checkerboard Context

Ji, Kexin	Hohai University
Keywords: Deep Learning, Machine Vision, Image Processing and Pattern Recognition Abstract: Learning-based image compression techniques that combine current Transformer models with checkerboard context models have shown excellent rate-distortion performance. However, the mixed structure still has room for optimization in terms of redundant information and decoding efficiency, while the checkerboard context model has redundancy in capturing correlations between latent representations. To solve these problems, we propose an innovative framework that combines a mixed Transformer-CNN structure with a checkerboard context model. Specifically, we introduce a "Checkerboard Channel-wise Entropy Module" to improve the coding efficiency of utilizing contexts through a two-channel decoding method with checkerboard contexts. Then, we propose the "In-slice Odd-even Context", which improves the handling of spatially redundant information by introducing a checkerboard context model into the original mixed structure with channel contexts and global contexts, thereby adding additional spatial contexts. Extensive experimental results demonstrate that our proposed method outperforms JPEG, BPG and previous learned image compression methods on the Kodak dataset.

12:20-12:40, Paper TuBT2.5
Multi-Kernel Broad Learning System Based on Elastic-Net with Random Fourier Features

Zhang, Qihuai	Beijing Normal University
Zhao, Xiaojie	Beijing Normal University
Keywords: Machine Learning, Neural Networks and their Applications Abstract: The Broad Learning System (BLS) features a simple yet efficient network structure, with its core being the fast and random generation of hidden layers; however, this generation method not only fails to effectively capture the nonlinear characteristics in the task, but also generates certain 'redundant nodes', which can negatively affect its learning capabilities. In this study, we propose an improved version of BLS, named KEFBLS, aimed at enhancing the feature extraction capability of the hidden layer through the integration of multi-kernel technology and network sparsification strategies, complemented by deeper feature extraction using random Fourier features. KEFBLS first combines polynomial and wavelet kernels to boost the nonlinear mapping capabilities of data; then, it applies the elastic-net method to refine the BLS objective function, removing low-impact hidden layer nodes to reduce redundancy and create a more streamlined network; finally, KEFBLS employs random Fourier features to map the processed hidden layers, further enhancing the network's feature extraction capabilities and constructing a new learning model. Our experimental results on three UCI regression datasets demonstrate that KEFBLS surpasses other methods in terms of learning efficiency and model performance.

12:40-13:00, Paper TuBT2.6
SFAM-Net: A Novel Dual-Branch Network Based on Spectral Feature and Attention Machine for Building Change Detection in Remote Sensing Imagery

Li, Jiequn	Taiyuan University of Technology
He, Zhisen	Taiyuan University of Technology
Lv, Yanfang	Taiyuan University of Technology
Yan, Chen	Taiyuan University of Technology
Wang, XingKui	Taiyuan University of Technology
Keywords: Neural Networks and their Applications, Deep Learning, Machine Vision Abstract: Deep learning techniques have significantly advanced change detection in remote sensing imagery. However, building change detection presents challenges due to the varied appearance of buildings and the complexity of scenes in remote sensing images. Current deep learning-based methods encounter three primary issues. Firstly, CNN-based approaches struggle to model the global contextual information essential for analyzing remote sensing building images, while Transformer-based methods may inadvertently degrade local features. Secondly, traditional attention mechanisms fall short in effectively modeling spatial and spectral features. Thirdly, certain channel attention methods extract excessive redundant information. To address these challenges, this study proposes SFAM-Net, a two-branch hybrid architecture. Our approach initially employs orthogonal methods to minimize redundant information extracted from channels and spaces. Subsequently, we leverage the parallel structure of convolutions and visual transformers to enhance image representation, integrating local features and global representations through cross-attention to better coordinate building and background features. In the CNN and Transformer branches, we adopt spatial-spectral feature coordination and spectral multi-head attention coordination strategies to improve performance in complex scenes. Additionally, we introduce a novel loss function combining edge and center guidance, focusing on changing image edges and centers to enhance sensitivity and accuracy in change area detection. Extensive experiments on the widely used LEVIR-CD and WHU-CD datasets validate the effectiveness and efficiency of our network.


TuBT4	MR04
BMI - Recent Advances of Brain-Computer Interfaces (Chair: Ivan Volosyak, Co-Chairs: Vinod A. Prasad)	BMI Workshop Papers
Chair: Cantürk, Atilla	Rhine-Waal University of Applied Sciences

11:00-11:20, Paper TuBT4.1
A Browser-Driven cVEP-Based BCI Web Speller (I)

Cantürk, Atilla	Rhine-Waal University of Applied Sciences
Spieker, Kathrin	Rhine-Waal University of Applied Sciences
Volosyak, Ivan	Rhine-Waal University of Applied Sciences
Keywords: Active BMIs, BMI Emerging Applications, Other Neurotechnology and Brain-Related Topics Abstract: By utilizing brain signals, the Brain-Computer Interface (BCI) enables non-muscular communication. In recent decades, BCI systems, particularly speller interfaces, have offered a variety of graphical user interfaces (GUIs). Many attempts have been made to improve the system’s user-friendliness and speller speed. In this paper we present a web-based BCI speller based on Code-Modulated Visual Evoked Potentials (cVEPs), which can be accessed at https://bci-lab.hochschule-rhein-waal.de/en/cvepspeller/. As a result, this web speller is now available to BCI researchers worldwide free of charge, and can be used with a variety of their own classifier applications. This web speller was successfully tested with the majority of modern web browsers. In the three-step web speller, each character can be selected by navigating through three distinct web interface screens. In this study the web-based cVEP speller was tested and compared to our “local speller” (incorporating the GUI and the signal processing in one application) by seven subjects, who were asked to spell the words “BCI_LAB” and “KLEVE”. All subjects were able to perform the spelling tasks, with a mean accuracy of 95.02% and an average Information Transfer Rate (ITR) of 42.55 bits/min, compared to our “local speller” with a mean accuracy of 93.81% and an average ITR of 48.58 bits/min, respectively. The results showed similar values, confirming the suitability of the suggested web speller for the representation of the cVEP stimuli.
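For readers unfamiliar with the ITR figures quoted above, the sketch below computes the widely used Wolpaw definition of information transfer rate. The abstract does not state which ITR definition, number of targets or selection time the authors used, so the formula choice, the function names and the parameters here are assumptions for illustration only.

```python
import math


def wolpaw_bits_per_selection(n_targets, accuracy):
    """Bits conveyed by one selection under the Wolpaw ITR definition:
    log2(N) + P*log2(P) + (1-P)*log2((1-P)/(N-1))."""
    if n_targets < 2:
        raise ValueError("n_targets must be at least 2")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    p = accuracy
    bits = math.log2(n_targets)
    if p > 0.0:
        bits += p * math.log2(p)  # the term vanishes in the limit p -> 0
    if p < 1.0:
        bits += (1.0 - p) * math.log2((1.0 - p) / (n_targets - 1))
    return bits


def itr_bits_per_min(n_targets, accuracy, seconds_per_selection):
    """Scale bits per selection to bits per minute."""
    return wolpaw_bits_per_selection(n_targets, accuracy) * 60.0 / seconds_per_selection


if __name__ == "__main__":
    # Hypothetical values: 4 targets, 95% accuracy, 2 s per selection.
    print(round(itr_bits_per_min(4, 0.95, 2.0), 2))
```

Under this definition the rate depends on the selection time as well as the accuracy, so a higher accuracy does not by itself imply a higher ITR.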

11:20-11:40, Paper TuBT4.2
Decoding Speed and Direction of Imagined Hand Movement from EEG-BCI

Gangadharan K, Sagila	Indian Institute of Technology Palakkad
A. P., Vinod	Singapore Institute of Technology
Keywords: BMI Emerging Applications, Active BMIs Abstract: Motor imagery-based Brain-Computer Interface (BCI) systems that decode imagined movements from the non-invasive electroencephalogram (EEG) are of significant importance, as they can enhance neurorehabilitation and human-computer interaction. Decoding the kinematic parameters of imagined movement is essential to realize BCI systems with higher degrees of freedom of movement and precise control over external effectors. In this work, we propose an efficient algorithm to decode imagined bi-directional movements of the hand at two different speeds, slow and fast, from EEG-based BCI. EEG is recorded from fourteen healthy subjects while they imagine slow and fast movement of their right hand towards the right or left direction. Wavelet-Common Spatial Pattern (WCSP) features and Movement Related Cortical Potential (MRCP) features are extracted from the EEG, and a subset of these features is further identified based on the subject-specificity of discriminative subbands and channels. Selected WCSP and MRCP features are then concatenated and used to decode the imagined slow and fast bi-directional movements. Binary classification of the speed-direction pairs resulted in an average classification accuracy of 68.1% across fourteen subjects. To our knowledge, this is the first work addressing decoding of imagined speed and direction of hand movements using EEG-BCI.

11:40-12:00, Paper TuBT4.3
A Study on the Efficacy of an Online Neurofeedback Game Using Consumer-Grade EEG on Enhancing Attention Skills in Stroke and Mild Cognitive Impairment Patients

T.A., Suhail	Indian Institute of Technology Palakkad, Kerala
R., Subasree	Department of Neurology, National Institute of Mental Health And
A. P., Vinod	Singapore Institute of Technology
Keywords: BMI Emerging Applications, Other Neurotechnology and Brain-Related Topics Abstract: The burden of cognitive impairment is increasing worldwide and the rehabilitation of patients with such disorders is a major concern. Neurofeedback training (NFT) is emerging as a promising non-pharmacological intervention for enhancing cognition in healthy individuals and patients with cognitive deficits. In this paper, we examine the efficacy of a real-time neurofeedback game using a four-channel consumer-grade Electroencephalography (EEG) system in enhancing overt and covert attention skills in Stroke and Mild Cognitive Impairment (MCI) patients. The game is based on navigating a car on a computer screen using the player’s attention level computed from EEG signals streamed in real time. The proposed NFT is conducted across 18 patients (10 Stroke and 8 MCI). The attention score significantly improved from the first session to the final session by 1.25-12.55% in the stroke group and 1.84-44.14% in the MCI group. The experimental results demonstrate that the proposed neurofeedback game using consumer-grade EEG is an effective tool for enhancing overt and covert attention skills in patients with cognitive impairment. To our knowledge, this is the first EEG-Brain Computer Interface study for improving overt and covert attention in a cross-sectional cohort of stroke and MCI patients.

12:00-12:20, Paper TuBT4.4
Auto-Adaptive Model for Longitudinal Motor Imagery Decoding in Amyotrophic Lateral Sclerosis

Patel, Rishan	University College London
Bryson, Barney	University College London
Jiang, Dai	University College London
Demosthenous, Andreas	University College London
Keywords: Other Neurotechnology and Brain-Related Topics, Active BMIs, BMI Emerging Applications Abstract: Amyotrophic Lateral Sclerosis (ALS) patients have been a grossly misrepresented end user group when developing coadaptive algorithms for Brain Computer Interfaces (BCI). Researchers have attributed this issue to the difficulty posed by disease progression in patients with ALS. This non-stationarity reduces accuracy over time. This paper introduces an online model usable for a BCI that relies on ALS patient data. The automatic coadaptive model effectively decodes 3-class motor imagery (MI) of the left hand, right hand and rest while adapting to address non-stationarities of Electroencephalography (EEG) caused by various factors over the study duration. We adapt the Filter Bank Common Spatial Pattern (FBCSP) algorithm and show that it could enable above 70% detection of hand MI in ALS end users longitudinally, a result that previously lacked evidence. The evaluation results demonstrate that the model achieves an average accuracy of 72.6% over a 1–2 month period of usage involving 8 ALS patients. This work presents the first auto-adaptive model with ALS patient EEG data, providing a stronger incentive for further investigation by setting benchmark models on longitudinal datasets and contributing to the solution of multiple challenges in this field.

12:20-12:40, Paper TuBT4.5
Post-Training Quantization in Brain-Computer Interfaces Based on Event-Related Potential Detection (I)

Cecotti, Hubert	California State University, Fresno
Dhaliwal, Dalvir	California State University, Fresno
Singh, Hardip	California State University, Fresno
Meena, Yogesh	IIT Gandhinagar
Keywords: Other Neurotechnology and Brain-Related Topics, BMI Emerging Applications, Active BMIs Abstract: Post-training quantization (PTQ) is a technique used to optimize and reduce the memory footprint and computational requirements of machine learning models. It has been used primarily for neural networks. For Brain-Computer Interfaces (BCI) that are fully portable and usable in various situations, it is necessary to provide approaches that are lightweight for storage and computation. In this paper, we propose the evaluation of post-training quantization on state-of-the-art approaches in brain-computer interfaces and assess its impact on accuracy. We evaluate the performance of the single-trial detection of event-related potentials representing one major BCI paradigm. The area under the receiver operating characteristic curve drops from 0.861 to 0.825 with PTQ when applied on both spatial filters and the classifier, while reducing the size of the model by a factor of about 15. The results support the conclusion that PTQ can substantially reduce the memory footprint of the models while keeping roughly the same level of accuracy.
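
As a rough illustration of the general idea behind post-training quantization, the sketch below applies symmetric per-tensor 8-bit quantization to a small list of weights. It is a generic textbook scheme, not the quantization procedure evaluated in the paper; the function names and the example weights are assumptions made for illustration only.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> integer codes in [-127, 127] plus one scale."""
    max_abs = max(abs(w) for w in weights)
    # A single scale maps the largest magnitude onto code 127 (fallback avoids division by zero).
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, e.g. coefficients of a spatial filter or linear classifier.
weights = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
# Round-to-nearest keeps each restored weight within half a quantization step of the original.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Storing the integer codes and one scale instead of full-precision floats is what shrinks the model; the rounding error introduced at this step is the source of the accuracy drop that such an evaluation measures.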

12:40-13:00, Paper TuBT4.6
Towards Effective Deep Neural Network Approach for Multi-Trial P300-Based Character Recognition in Brain-Computer Interfaces

Shukla, Praveen	Indian Institute of Technology Gandhinagar
Cecotti, Hubert	California State University, Fresno
Meena, Yogesh	IIT Gandhinagar
Keywords: Passive BMIs, BMI Emerging Applications, Other Neurotechnology and Brain-Related Topics Abstract: Brain-computer interfaces (BCIs) enable direct interaction between users and computers by decoding brain signals. This study addresses the challenges of detecting P300 event-related potentials in electroencephalograms (EEGs) and integrating these P300 responses for character spelling, particularly within oddball paradigms characterized by uneven P300 distribution, low target probability, and poor signal-to-noise ratio (SNR). This work proposes a weighted ensemble spatio-sequential convolutional neural network (WE-SPSQ-CNN) to improve classification accuracy and SNR by mitigating signal variability for character identification. We evaluated the proposed WE-SPSQ-CNN on dataset II from the BCI Competition III, achieving P300 classification accuracies of 69.7% for subject A and 79.9% for subject B across fifteen epochs. For character recognition, the model achieved average accuracies of 76.5%, 87.5%, and 94.5% with five, ten, and fifteen repetitions, respectively. Our proposed model outperformed state-of-the-art models in the five-repetition setting and delivered comparable performance in the ten- and fifteen-repetition settings.


TuBT5
Autonomous and Intelligent Vehicles 2	Regular Papers - SSE
Chair: Wang, Xinxin	Chang'an University

11:00-11:20, Paper TuBT5.1
A Deep Reinforcement Learning Based Cooperative Adaptive Cruise Control for Connected and Autonomous Vehicle by Considering Preceding Vehicle Properties

Wang, Xinxin	Chang'an University
Gong, Siyuan	Chang'an University
Zeng, Lu	Chang'an University
Ding, Yukun	Chang'an University
Yin, Jiakai	Chang'an University
Keywords: Intelligent Transportation Systems, Cooperative Systems and Control, Distributed Intelligent Systems Abstract: Deep reinforcement learning (DRL) based cooperative adaptive cruise control (CACC) of connected and autonomous vehicles (CAVs) shows great potential in improving controllers’ adaptability to real-world dynamic traffic conditions. However, most DRL-based CACCs primarily focus on fixed predecessor following (PF) information flow topology (IFT) while ignoring the heterogeneity of the vehicle. In reality, even though most CACCs separate the platoon by human-driven vehicle (HDV) while considering the HDV as the leading vehicle of the platoon, the heterogeneity of the preceding vehicle with different motion properties (i.e. HDV and CAV) still exists. The adaptability of the DRL-based CACC may be degraded if the above heterogeneity is ignored. To fill this research gap, a DRL-based CACC with preceding vehicle properties is proposed in this study. Specifically, it first designs three typical sub-controllers by considering the applicable IFTs and the properties of preceding vehicles. The proposed DRL-based CACC is established according to the aforementioned designs with the objectives of safety, efficiency, smoothness, and comfort. To enhance the efficiency of transforming acquired data into the learning experience during the exploration process of the DRL algorithm, a periodic update deep deterministic policy gradient (DDPG) algorithm is proposed. The results demonstrate the necessity of considering the preceding vehicle properties in designing the corresponding controller for a specific IFT.

11:20-11:40, Paper TuBT5.2
A Hierarchical Heterogeneous IoT Time Series Data Index for NVM

Cai, Tao	Jiangsu University
Lei, Tian-Le	Jiangsu University
Niu, Dejiao	Jiangsu University
Dai, Jianfei	Jiangsu University
Huang, Zeyu	Jiangsu University
Ni, Qiangqiang	Jiangsu University
Keywords: System Architecture Abstract: The index plays an important role in the performance of IoT time series data storage systems. However, current indexes designed for HDD or SSD cannot adapt to the characteristics of IoT time series data or effectively leverage the performance advantages of NVM. A new hierarchical heterogeneous index is designed for IoT time series storage systems based on NVM. The structure is given first, in which a group consists of several data blocks used to manage IoT time series data. An ordered construction strategy is designed for the skip list, exploiting the sustained generation of IoT time series data and creating the index for each IoT time series data block group. Meanwhile, a compression and reconstruction strategy for the skip list is given to effectively utilize NVM. Then, a TS-Radix tree is presented to index IoT time series data block groups by the temporal characteristics of IoT time series data. Finally, a prototype is implemented and evaluated with YCSB-TS. The results show that this new index can effectively improve the throughput of random and range queries by up to 262.4%, surpassing the performance of InfluxDB, OpenTSDB, and KairosDB.

11:40-12:00, Paper TuBT5.3
Cooperative Control for Multiple DC-DC Converters of Li-Ion Battery Systems

Li, Heng	Central South University
Le, Chen	Central South University
Zhu, Ren	Central South University
Yu, Haiya	Central South University
Peng, Hui	Central South University
Keywords: Electric Vehicles and Electric Vehicle Supply Equipment Abstract: With the rapid development of Autonomous Rail Rapid Transit technology, which uses lithium-ion batteries as its main power source, research on battery charging technology has become particularly important. At present, charging schemes for lithium-ion batteries mainly take two forms: a single high-power charger, or multiple low-power charging modules in parallel. However, the single high-power charging scheme suffers from high cost and low efficiency, and the traditional parallel charging method lacks an effective module management strategy, resulting in unbalanced load between modules during charging, which affects charging efficiency and safety. To address these shortcomings of current research, this paper proposes a parallel charging scheme of multiple DC-DC modules based on cooperative control. By designing a closed-loop control system with a current inner loop and a voltage outer loop, precise control and cooperative operation of the parallel modules are realized. The experimental results show that the scheme not only improves the charging efficiency but also ensures the stability and safety of the charging process, providing an effective and reliable solution for lithium-ion battery charging of the intelligent rail train.

12:00-12:20, Paper TuBT5.4
Predictive Set-Point Modulation Control of Lithium-Ion Battery Storage System for Autonomous Rail Rapid Transit

Li, Heng	Central South University
Yu, Haiya	Central South University
Zhu, Ren	Central South University
Le, Chen	Central South University
Peng, Hui	Central South University
Keywords: Intelligent Transportation Systems Abstract: With rubber wheels instead of steel wheels and no need to be guided by steel rails, Autonomous Rail Rapid Transit (ART) is gradually coming into people's lives. However, ART still occupies existing lanes, and its relatively small station spacing means that it needs to be started and stopped frequently, all of which can lead to fluctuations in DC bus voltage during ART operation, making it difficult for loads (such as motors, air conditioners, and sensors) to operate properly. Therefore, this paper proposes the use of predictive set-point modulation to suppress DC bus voltage fluctuations. The predictive set-point modulation method is able to predict the direction of DC bus voltage changes in advance, and then adjust the voltage preset value to balance the fluctuation of the output voltage, optimizing the closed-loop system's transient dynamic performance. Moreover, since lithium-ion batteries have high power and high energy consumption, using only one DC-DC circuit can reduce system reliability and cost. Therefore, we propose to use parallel DC-DC modules to balance the excessive power of the battery and verify the feasibility of the proposed method through simulation experiments. The experiments show that the proposed method can effectively suppress the DC bus voltage fluctuation and improve the system reliability.

12:20-12:40, Paper TuBT5.5
An Urban Electric Vehicle Charging System Via Hybrid Heterogeneous Modes

Zhang, Keyang	School of Cyber Science and Engineering, Wuhan University
Liu, Yueheng	Wuhan Cyber Security Association
Liu, Shuohan	Qilu University of Technology
Gao, Junqiao	School of Cyber Science and Engineering, Wuhan University
Cao, Yue	School of Cyber Science and Engineering, Wuhan University, China
Ahmad, Naveed	School of Management, Northwestern Polytechnical University
Zhang, Xu	University of East Anglia
Keywords: Electric Vehicles and Electric Vehicle Supply Equipment Abstract: Electric Vehicle (EV) is regarded as the optimal alternative to traditional fuel-powered vehicles. However, the exponential surge in EV charging demand poses challenges in charging infrastructure planning and charging behavior management. The efficacy of traditional Grid-to-Vehicle (G2V) charging mode, which obtains power from the grid, is curtailed by the limited number and uneven distribution of charging facilities, inevitably leading to charging congestion. Instead, the concept of Vehicle-to-Vehicle (V2V) charging has emerged as a spatio-temporally flexible charging mode, which expands EV's role from consumer to provider, forming V2V pairs and utilizing urban Parking Lots (PLs) as charging locations. In this paper, we propose a Hybrid Heterogeneous Modes (HHM)-based EV charging optimization scheme in urban settings. Building upon the G2V mode, we introduce synchronous and asynchronous V2V charging modes as optimization strategies, integrating and exploiting the unique advantages of each mode. We also utilize global EV charging scheduling and comprehensively take four dimensions into consideration (energy trading cost, travel cost, waiting time cost and loss cost), thus achieving flexible mode selection, V2V pairing, and designated charging locations. Simulations confirm the effectiveness of the proposed scheme in reducing total charging costs, optimizing user service experiences, and improving charging facility utilization rates.

12:40-13:00, Paper TuBT5.6
How to Drive - an Ability-Based Description of Autonomous, Remote and Human Driving

Pfab, Florian	Technical University of Munich
Gehrke, Nils	Technical University of Munich
Diermeyer, Frank	Technical University Munich
Keywords: Autonomous Vehicle, Trust in Autonomous Systems, Fault Monitoring and Diagnosis Abstract: The development of autonomous and remote-operated driving systems requires extensive stakeholder analyses, requirement engineering, and formalized system descriptions. This is necessary to guarantee the success of the final product after the expensive and time-consuming development phase. To integrate a formalized description of the required abilities of the system, ability graphs have been proposed in the literature. To date, however, ability graphs have only been used to model less complicated driver assistance systems. This work aims to introduce the value of an ability graph-based description of complex driving systems. This is achieved by successfully demonstrating and discussing a method for constructing a holistic ability graph capable of describing the entirety of abilities required for any driving system.


TuBT6	MR06
Assurance and Reliability	Regular Papers - SSE
Chair: Ueno, Makoto	Japan Aerospace Exploration Agency

11:00-11:20, Paper TuBT6.1
A Reliability Framework for Proactive-Tolerance Reed-Solomon Storage Systems

Li, Jing	Civil Aviation University of China
Zhou, Zhenrui	Civil Aviation University of China
Ding, Jianli	Civil Aviation University of China
Keywords: Quality and Reliability Engineering, System Architecture Abstract: Reed-Solomon coding has been widely adopted to protect data storage against failures in storage systems. Recently, proactive fault tolerance has emerged to offer added protection for data storage in coping with increased device failures. Reliability is critical for storage systems. Due to complex system states and fault tolerance patterns, it is very intricate to analyze the reliability of proactive-tolerance Reed-Solomon storage systems. In this paper, a reliability framework is proposed, which combines event-driven simulation with a mathematical model to analyze the reliability of proactive-tolerance Reed-Solomon storage systems. A Monte Carlo simulation model simulates storage system operation according to the failure parameters of different subsystems generated from statistical data. A mathematical model is designed to analyze the probability of data loss under certain concurrent failures. The proposed framework models the impact of various device failures (including permanent device failures, transient failures and correlated failures), network bandwidth, and both accurate predictions and false alarms of proactive fault tolerance on the reliability. The proposed reliability framework can be adapted to different Reed-Solomon codes and system configurations, offering versatility in its application and assisting system designers in optimizing trade-offs and comparing schemes, which is beneficial for system design and operation.
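
To make the pairing of simulation and a closed-form model concrete, the following minimal sketch estimates the probability that a stripe protected by a Reed-Solomon code with n devices and m parity devices loses data, first analytically and then by Monte Carlo sampling. It assumes independent device failures with a fixed probability p and ignores repair, correlated failures, bandwidth, and proactive prediction, all of which the proposed framework models. The function names and parameter values are hypothetical and are not taken from the paper.

```python
import math
import random


def loss_probability(n, m, p):
    """Probability that more than m of n devices fail, each independently with probability p."""
    return sum(
        math.comb(n, j) * p**j * (1 - p) ** (n - j)
        for j in range(m + 1, n + 1)
    )


def simulate_loss(n, m, p, trials, seed=0):
    """Monte Carlo estimate of the same probability using a seeded generator."""
    rng = random.Random(seed)
    losses = 0
    for _ in range(trials):
        failed = sum(1 for _ in range(n) if rng.random() < p)
        if failed > m:  # a code with m parity devices tolerates at most m concurrent failures
            losses += 1
    return losses / trials


# Hypothetical configuration: 6 data + 3 parity devices, 10% failure probability per device.
analytic = loss_probability(9, 3, 0.1)
estimate = simulate_loss(9, 3, 0.1, trials=20000)
```

Comparing the two numbers is a basic sanity check: the simulated estimate should approach the analytic value as the number of trials grows.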

11:20-11:40, Paper TuBT6.2
Correlation between Entropy of Competency Performance and Training Period in Airline Captain Upgrade Training

Ueno, Makoto	Japan Aerospace Exploration Agency
Yamada, Kento	Japan Aerospace Exploration Agency
Matsuda, Takeshi	Japan Airlines Co., Ltd
Ikeshita, Harumi	Japan Airlines Co., Ltd
Kyoya, Yuta	Japan Airlines Co., Ltd
Keywords: Consumer and Industrial Applications, Quality and Reliability Engineering, System Modeling and Control Abstract: Airline pilot training is shifting from traditional quantity-based (hourly-based) training to competency-based training (CBT), and assessing non-technical skills in CBT is complex and essential. Additionally, the captain upgrade training takes more than one year; therefore, a shorter upgrade period with adequate competency is expected. In this study, for captain upgrade training, the information entropy is estimated from the probability distribution of the count data of assessment marker (AM) flags raised per flight, which is raised when each competency component needs improvement. It is found that the information entropy time history calculated from raw data converges into that predicted by the Software Reliability Growth Model (SRGM), which has been found in previous studies as a model to represent the time history of AM flags. Moreover, the local maximum of the entropy time history during a specific period from the start of training correlates with the length of the applicant’s training period. When entropies for each competency are separately calculated, the competencies related to the training period length can be discussed. These results are expected to help estimate the captain upgrade training period and to acquire insight into the training status, including quantitative assessment of non-technical skills.
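
The entropy estimate described above can be sketched as the Shannon entropy of the empirical distribution of flag counts per flight. The snippet below is a minimal illustration under that assumption; the function name and the sample counts are hypothetical, and the paper's exact estimator and its link to the SRGM are not reproduced here.

```python
import math
from collections import Counter


def shannon_entropy(counts_per_flight):
    """Entropy in bits of the empirical distribution of flag counts per flight."""
    n = len(counts_per_flight)
    freq = Counter(counts_per_flight)  # how many flights had 0 flags, 1 flag, ...
    return -sum((c / n) * math.log2(c / n) for c in freq.values())


# Hypothetical window of four flights with 0, 0, 1 and 2 assessment-marker flags raised.
window = [0, 0, 1, 2]
entropy_bits = shannon_entropy(window)
```

Evaluating this quantity over a sliding window of flights would give an entropy time history of the kind the abstract refers to.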

11:40-12:00, Paper TuBT6.3
A New Transmission Cause and Effect Analysis (TCEA) Approach to Risk Management for Non-Healthcare Context: A Case Study on COVID-19

Kerk, Yi Wen	Universiti Kebangsaan Malaysia
Tay, Kai Meng	Universiti Malaysia Sarawak
Jong, Chian Haur	University of Technology Sarawak
Chai, Chee Shee	Universiti Malaysia Sarawak
Lim, Chee Peng	Deakin University
Keywords: Quality and Reliability Engineering, System Modeling and Control Abstract: Leveraging the concept of Failure Mode and Effect Analysis (FMEA), we propose a simple and systematic approach, namely Transmission Cause and Effect Analysis (TCEA), to achieve a defined goal of reducing transmission risk through effective preventive and control actions in real-world environments. Specifically, the transmission risk of an infectious disease (e.g., COVID-19) is perceived as a combination of the presence of a transmission agent (e.g., SARS-CoV-2 virus or its variants) and the requisite factors that lead to infection of humans and the associated aftermath of infection. TCEA adopts a causal map to represent all possible transmission risks via a brainstorming process. Next, appropriate preventive and control actions associated with each transmission risk are identified. Similar to FMEA, a Risk Priority Number model with Severity, Occurrence, and Detection ratings is adopted for analysis, prioritization, and decision-making. To demonstrate the usefulness of TCEA, a real-world case study on COVID-19 is conducted. The empirical results indicate that TCEA provides a simple, systematic and easy-to-implement approach to effectively analyze and manage transmission risks of COVID-19 in non-healthcare workplaces.
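
In conventional FMEA, the Risk Priority Number is the product of the Severity, Occurrence, and Detection ratings, and items are ranked by that product. The sketch below shows this conventional calculation applied to transmission risks; the rating scales TCEA actually uses may differ, and the class name, risk names, and ratings are hypothetical examples.

```python
from dataclasses import dataclass


@dataclass
class TransmissionRisk:
    name: str
    severity: int    # rating of how serious the consequence is
    occurrence: int  # rating of how likely the transmission path is
    detection: int   # rating of how hard the risk is to detect (higher = harder)

    @property
    def rpn(self):
        """Conventional FMEA Risk Priority Number: S x O x D."""
        return self.severity * self.occurrence * self.detection


def prioritize(risks):
    """Return risks sorted from highest to lowest RPN."""
    return sorted(risks, key=lambda r: r.rpn, reverse=True)


# Hypothetical workplace risks with made-up ratings.
risks = [
    TransmissionRisk("shared pantry surfaces", 6, 4, 5),
    TransmissionRisk("crowded meeting room", 8, 7, 6),
    TransmissionRisk("visitor check-in desk", 5, 3, 2),
]
ranked = prioritize(risks)
```

Risks at the top of the ranked list would be the first candidates for preventive and control actions.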

12:00-12:20, Paper TuBT6.4
Compressed Sensing Signal Reconstruction for Real-Time Machine Vision Systems

Lu, Yu	Shenzhen Technology University
Cai, Pufan	Shenzhen University
Yu, Jingying	Shenzhen Technology University
Shi, Shijie	Shenzhen Technology University
Li, Meng	Shenzhen Technology University
Ge, Huilin	Jiangsu University Science and Technology
Fu, Xianghua	Shenzhen Technology University
Keywords: Autonomous Vehicle, Quality and Reliability Engineering, System Modeling and Control Abstract: The advancement of machine vision systems necessitates efficient and accurate signal reconstruction methods to enhance real-time perception and decision-making capabilities. This paper introduces a Generalized Backtracking Regularization Adaptive Matching Pursuit (GBRAMP) algorithm, designed to reconstruct signals within machine vision systems using compressed sensing techniques. The GBRAMP algorithm improves upon existing methods by incorporating regularization for enhanced atom selection and a backtracking approach to accurately estimate sparsity, addressing the limitations of traditional convex optimization, greedy, and Bayesian reconstruction algorithms. The paper provides a comparative analysis of the GBRAMP algorithm against other prominent reconstruction techniques. Experimental results validate the GBRAMP algorithm's improved performance in terms of both reconstruction accuracy and computational speed, making it a competitive solution for the next generation of machine vision systems.

12:20-12:40, Paper TuBT6.5
Static Code Analysis of IEC 61131-3 ST Programs Via Symbolic Execution

Zhao, Mengyan	East China Normal University
Huang, Yanhong	East China Normal University
Shi, Jianqi	East China Normal University
Chen, Yinghao	East China Normal University
Yang, Yang	East China Normal University
Keywords: Quality and Reliability Engineering, Manufacturing Automation and Systems Abstract: A Programmable Logic Controller (PLC) is essentially a domain-specific computer used to control physical equipment and is widely used in industrial control fields. It plays a crucial role in automating complex processes for industrial automation systems, requiring high reliability as code vulnerabilities can potentially lead to disasters. Therefore, vulnerability detection in PLC programs is of significant importance. However, the availability of tools supporting vulnerability detection in PLC programming languages is limited. This paper attempts to improve industrial security from the perspective of code security and proposes a static code analysis approach specifically designed for IEC 61131-3 Structured Text (ST) programs. This approach uses structural pattern matching and symbolic execution technology to identify program defects and improve quality by detecting problematic code structures and potential issues early in the development process, thereby reducing the debugging effort required during development. Considering the characteristic of periodic loop execution in PLCs, we introduce the loop unwinding technique to collect constraints from subsequent execution cycles for detection purposes. Based on the aforementioned approach, we implement a static code analysis tool, ST-Checker, and perform a series of evaluations. The experimental results show that this method is feasible and can detect potential defects that existing PLC compilers cannot detect, improving the precision of defect detection with data dependencies.

12:40-13:00, Paper TuBT6.6
Latent Vector Autoregressive Modeling with Maximum Predicted Variance for Dynamic Process Monitoring

Chen, Shumei	Lingnan University
Qin, Joe	Lingnan University, Hong Kong
Keywords: System Modeling and Control, Quality and Reliability Engineering, Manufacturing Automation and Systems Abstract: In this paper, we define reduced-dimensional predictors in latent dynamic systems, in contrast to the traditional full-dimensional predictor models. Then the estimation of the new latent vector autoregressive model is developed with an objective to maximize the predicted variance for a given number of latent variables. A new dynamic predictive monitoring index that accounts for variations in the prediction residual and the predictor is developed. The residuals are modeled with a subsequent principal component analysis and a comprehensive monitoring method is developed to detect abnormal situations in industrial and operational systems. The new algorithm is tested on a simple closed-loop control system and the revamped Tennessee Eastman simulated process to show its effectiveness compared to other state-of-the-art methods.


TuBT7	MR07
Online - AI Applications 2

11:00-11:20, Paper TuBT7.1
MFM: Multimodal Sentiment Analysis Based on Modal Focusing Model

Sun, Shuangyang	Hohai University
Xu, Guoyan	Hohai University
Lu, Sijun	Hohai University
Keywords: Deep Learning, Neural Networks and their Applications Abstract: Multimodal sentiment analysis integrates various modalities of information to collectively inform decision-making processes. Previous studies often treat different modal features equally or emphasize textual information as the primary consideration. However, when the modalities in a sample contain different sentiment information, these methods may not handle the situation effectively. To solve this problem, we propose a multimodal sentiment analysis model focusing on each modality (MFM). In this paper, we separately integrate each modality as a primary modality interacting with other secondary modal information so that each modality can play a leading role. In addition, we use a shared mask in modal interaction to capture important information in the secondary modality related to the primary modality, and improve the effectiveness of the information interaction process. The model is evaluated against baseline models using the MOSI and MOSEI multimodal sentiment analysis datasets. The experimental results show that the model achieves better performance, thereby validating its effectiveness in multimodal sentiment analysis tasks.

11:20-11:40, Paper TuBT7.2
Virtual Feature Generation and Learning for Federated Domain Generalization

Kong, Wenkang	Beijing Jiaotong University
Huang, Jun	South China Normal University
Liang, Zhuoming	South China Normal University
Keywords: Transfer Learning, Representation Learning, Information Assurance and Intelligence Abstract: Federated Domain Generalization (FedDG) aims to learn a generalizable model from source domains while preserving data privacy. However, several challenges exist: (1) effective knowledge transfer across diverse modalities like text or audio has not been achieved; (2) the presence of irrelevant or malicious source domains can cause negative transfer due to the lack of data quality governance; (3) existing methods use Shapley Value for contribution calculation, which has high computational complexity and is only suitable for scenarios with a small number of clients. To address these issues, we propose FedVFGL, a novel FedDG framework. FedVFGL introduces a global objective with personalized feature calibration. Its key innovation is a dynamic aggregation scheme that adjusts each domain's contribution based on relevance and reliability. Theoretical analysis demonstrates that this reweighting can improve generalization bounds over traditional FedDG. FedVFGL is a versatile, robust algorithm that integrates with various federated techniques and mitigates data poisoning. Extensive experiments demonstrate FedVFGL's superior performance over state-of-the-art FedDG methods, and its complementary benefits when combined with existing approaches. The FedVFGL source code is publicly available at: https://github.com/scnu-kevinkong/Fedvfgl.

11:40-12:00, Paper TuBT7.3
Entity Label-Guided Graph Fusion Multi-Modal Named Entity Recognition

Ding, Guohui	Shenyang Aerospace University
Tang, Wenjing	Shenyang Aerospace University
Yuan, Zhaoyi	Shenyang Aerospace University
Keywords: Multimedia Computation, Neural Networks and their Applications, Deep Learning Abstract: Multimodal Named Entity Recognition (MNER) is a task that leverages multimodal information (such as text and images) to identify named entities within social media text. Traditional MNER methods primarily rely on simple interactions between text annotations and visual features, thus overlooking the specific correspondence between text and visual objects. Additionally, irrelevant visual noise may interfere with the final recognition results. Therefore, this paper proposes an Entity Label-guided Graph Fusion Multi-modal Named Entity Recognition (ELGF) approach, which utilizes pre-defined entity label information from input text as a bridge. Firstly, entity label detection tasks are employed to obtain entity label information. Then, the entity label information is utilized as a bridge between the two modalities to construct a multimodal interaction graph. This graph is inputted into a graph neural network, where attention and gate mechanisms are applied to interactively fuse multimodal information. Finally, a Conditional Random Field (CRF) decoding is used to predict the final MNER label sequence. Extensive experimental results demonstrate that compared to mainstream methods, the proposed model achieves competitive recognition accuracy on public datasets.

12:00-12:20, Paper TuBT7.4
Incorporating Heterophily into Graph Neural Networks for Graph Classification

Yang, Jiayi	Tongji University
Medya, Sourav	UIC
Ye, Wei	Tongji University
Keywords: Representation Learning, Deep Learning, Machine Learning Abstract: Graph Neural Networks (GNNs) often assume strong homophily for graph classification, seldom considering heterophily, which means connected nodes tend to have different class labels and dissimilar features. In real-world scenarios, graphs may have nodes that exhibit both homophily and heterophily. Failing to generalize to this setting makes many GNNs underperform in graph classification. In this paper, we address this limitation by identifying three effective designs and develop a novel GNN architecture called IHGNN (Incorporating Heterophily into Graph Neural Networks). These designs include the combination of integration and separation of the ego- and neighbor-embeddings of nodes, adaptive aggregation of node embeddings from different layers, and differentiation between different node embeddings for constructing the graph-level readout function. We empirically validate IHGNN on various graph datasets and demonstrate that it outperforms the state-of-the-art GNNs for graph classification.

12:20-12:40, Paper TuBT7.5
Sensor-Based Authentication on Smartphones Via Integrating Auxiliary Information

Wang, Yingjie	Wuhan University
Ruimin, Hu	Wuhan University
Keywords: Biometric Systems and Bioinformatics, Deep Learning, AI and Applications Abstract: With the widespread use of smartphones, more and more private information is stored on the phone, and the loss or theft of the phone can lead to data leakage, theft of property, and other problems. Traditional active authentication methods based on passwords, faces, fingerprints, etc. authenticate only once at login, which still leaves some hidden dangers. Authentication methods based on behavioral biometrics such as walking gait, touch screen, keystroke, etc. can provide implicit and continuous authentication services, and thus have been widely studied recently. Considering the limited computational resources of smart devices, we first introduce the state-space model with linear model complexity, i.e., Mamba, to extract deep feature representations from raw signal sequences in the sensor-based authentication task. Then a series of auxiliary information from different domains is proposed and fused with the deep features to enhance the consistent semantic information and ultimately improve the discriminative ability of the model. Our model A-Mamba, i.e., Auxiliary-Mamba, is evaluated on two public datasets. The experimental results show that our model outperforms existing approaches.

12:40-13:00, Paper TuBT7.6
TS-DETR: A Small Object Detection Model in Autonomous Driving Systems

Niu, Yifan	University of Electronic Science and Technology of China
Feng, Chenglin	University of Electronic Science and Technology of China
Zeng, Tiantian	University of Electronic Science and Technology of China
Xiao, Kai	Brookfield RPS Real Property Solutions
Wu, Shaozhi	Yangtze Delta Region Institute (Quzhou), University of Electroni
Liu, Xingang	Yangtze Delta Region Institute (Quzhou), University of Electroni
Su, Han	Yangtze Delta Region Institute (Quzhou), University of Electroni
Gong, Jiechuan	Yangtze Delta Region Institute (Quzhou) of the University of Ele
Keywords: Deep Learning, Machine Learning, Machine Vision Abstract: With the rapid advancement of autonomous driving technology, the demand for faster and more accurate object detection frameworks has become increasingly urgent. Recently, many deep learning-based object detectors have demonstrated impressive performance in real-time driving applications. However, the detection of small objects such as traffic signs remains challenging due to the complex nature of these objects. This paper proposes a transformer-based detector TS-DETR to improve the accuracy of small object detection in autonomous driving systems. We introduce a constrained decoder structure to focus the model's attention on the predicted boxes. Additionally, a content-aware deformable cross-attention mechanism is proposed to obtain more comprehensive attention weights. Experimental results on challenging public datasets such as TT100K and CCTSDB2021 demonstrate that our approach achieves significant performance improvements with only a minimal increase in the number of parameters compared to existing algorithms.

12:40-13:00, Paper TuBT7.7
A Case Study on Blockchain-Based Anonymous Reviewer Incentive Token (BARIT)

Shrestha, Pratiksha	Miami University, Ohio
Bhunia, Suman	Miami University, Ohio
Carvalho, Arthur	Miami University, Ohio
Anderson, Chad	Miami University, Ohio
Lee, Gabe	Miami University, Ohio
Keywords: Information Systems for Design and Marketing, Human-Machine Cooperation and Systems, Human Performance Modeling Abstract: Peer review is an integral part of academic publication necessary to maintain high standards and novelty of published research. Despite its importance, peer reviewers are rarely provided incentives, leading to journals having difficulty finding reviewers inclined to accept invitations and submit reviews on time. This paper proposes a Blockchain-based Anonymous Reviewer Incentive Token (BARIT) to incentivize peer reviewers. BARIT introduces flexible incentive schemes to provide both recognition and tangible benefits for the reviewers’ contribution while preserving the anonymity of reviewers. Using blockchain technology to record reward tokens ensures their permanence and acceptance across different publishers. Incentive models are designed to encourage the involvement of researchers as reviewers, reduce reviewer refusal rates, and prompt the timely submission of review reports.


TuBT8	MR08
Online - AIoT 1
Chair: Ren, Senlin	Beijing Information Science and Technology University

11:00-11:20, Paper TuBT8.1
Using Large Language Models to Integrate Virtual Students in Computerized Learning Platforms

Arevalillo-Herráez, Miguel	Universitat De València
Ayesh, Aladdin	University of Aberdeen
Rezakhanlou, Houman	Universitat De València
Arnau, David	Universitat De València
Keywords: Human-Machine Interaction, Human-Machine Cooperation and Systems, Human-Computer Interaction Abstract: Recent Large Language Models (LLMs) demonstrate problem-solving capabilities suitable for educational use. This paper investigates using LLMs to create virtual agents that mimic student behavior and interact with learning platforms. Testing a modest-sized LLM on an Intelligent Tutoring System for word problem-solving revealed that the LLM could fully solve 92% of single-step problems, although its performance decreased to 14% when attempting more complex problems.

11:20-11:40, Paper TuBT8.2
Inverse Stereo Matching Supervised Dense Point Cloud Reconstruction for Scenes

Zong, Ze	Soochow University
Xie, Jie	Soochow University
Zhang, Jin	Soochow University
Wu, Cheng	Soochow University
Keywords: Machine Learning, Machine Vision, Deep Learning Abstract: The reconstruction of dense point clouds is an important foundation for downstream applications, such as object detection, semantic classification and surface reconstruction. Current methods focus on dense point cloud reconstruction for objects but neglect the whole scene. To address this issue, the reconstruction of dense point clouds supervised by inverse stereo matching (IS-Dense) is proposed. In detail, the Transformer model is first used to extract deep features from the point clouds. Second, point cloud features are expanded through the base upsampler. Finally, the point cloud coordinates are obtained following the feature expansion. Due to the uneven distribution of point clouds in the whole scene, some gaps and anomalies are present in the data. Therefore, a point location refinement module supervised by inverse stereo matching is designed to solve this problem. For this module, the key is to utilize the reconstructed dense point clouds and the right image to estimate the left image. Supervised by real left images, the reconstructed dense point clouds are precise and evenly distributed. The experimental results demonstrate the superiority of the proposed method over current methods, especially for the whole scene.

11:40-12:00, Paper TuBT8.3
VTLL: Visual-Texture Multi-Modal Fusion for Table Structure Recognition Based on Logic Location Regression

Fang, Jun	Nanjing University of Science and Technology
Zhang, Chongyang	Nanjing University of Science and Technology
Keywords: Deep Learning, Neural Networks and their Applications, AI and Applications Abstract: Tables play a crucial role in both documents and daily life, thus sparking significant interest in the research of automatic table structure recognition (TSR). Recent methods primarily achieve recognition by predicting markup sequences or adjacency relationships between table cells. However, these methods have some limitations in applications. The former requires additional computation of heuristic rules to recover the structure of the table, which not only increases the computational complexity but also may affect the recognition accuracy. The latter relies on huge training data and inefficient decoders, which not only raises the cost but also limits its application in real-time or large-scale data processing scenarios. At the same time, they often ignore the significance of cell logical location and fail to effectively utilize the rich text information that naturally exists in the table. In this paper, we propose a new framework called VTLL to solve the problem. The method extracts visual and textual features to perform adaptive feature fusion, and we also introduce cascading regressors to predict from the fused features multiple times, while combining intra-cell and inter-cell losses. As for the regression head, we use a mixed context aggregator to understand the inter-cell relationships. We evaluated the proposed method on several public datasets; experimental results demonstrate that VTLL performs better.

12:00-12:20, Paper TuBT8.4
An Efficient Multi-Layer Indexing Method on Blockchain for Multimodal Data Querying

Jia, Haoyu	Qilu University of Technology
Wu, Xiaoming	Qilu University of Technology, Shandong Computer Science Center
Shanshan, Liu	Qilu University of Technology (Shandong Academy of Sciences)
Yuan, Qile	Qilu University of Technology (Shandong Academy of Sciences)
Qi, Bei	Qilu University of Technology (Shandong Academy of Sciences)
Liu, Xiangzhi	Shandong Computer Science Center (National Supercomputer Center
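Keywords: Information Assurance and Intelligence, Computational Intelligence in Information, Cybernetics for Informatics Abstract: In the era of the digital society, the generation frequency of multimodal data is increasing rapidly. Blockchain, recognized as a trustworthy distributed database technology, provides trusted storage and efficient management for multimodal data. However, blockchain systems only support queries that use the transaction hash value as the key and cannot directly exploit the content features of multimodal data, which leads to generally low query efficiency. To address this problem, this paper introduces an efficient multi-layer indexing method on blockchain for multimodal data querying. It establishes an effective mapping between on-chain and off-chain data through a verifiable on-chain and off-chain collaborative storage architecture. The paper also proposes a Multi-layer Bitmap Block Index (MBBI) and a Cuckoo Merkle Tree (C-MT) to optimize the query process. Experimental results demonstrate that this method not only ensures the consistency and integrity of on-chain metadata and off-chain data, but also significantly improves the efficiency of multimodal data querying.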

12:20-12:40, Paper TuBT8.5
Improved Imbalance Resilience in Continual Multi-Label Classification with Adaptive Margin Spiking Neural Networks

Mishra, Sourav	Indian Institute of Science, Bangalore
Dora, Shirin	Loughborough University
Sundaram, Suresh	Indian Institute of Science
Keywords: Machine Learning, Neural Networks and their Applications, Deep Learning Abstract: Multi-label learning and continual multi-label learning are crucial challenges in machine learning, particularly in handling complex data with multiple overlapping labels over time. Recent research works try to tackle the effect of data imbalance, as it makes multi-label learning more challenging. This work introduces an adaptive margin spiking neural network (AM-SNN) architecture coupled with a novel imbalance-sensitive loss function designed to enhance robustness against class imbalance in these settings. AM-SNN employs two output layers: one for predictions and another for margin values, with a unique loss function leveraging cosine similarity between predictions and ground truths to improve confidence in the model's predictions. Experiments show that AM-SNNs trained with the proposed loss function outperform state-of-the-art loss functions on metrics such as the imbalance-weighted F1 score and the F1 score for the most imbalanced class on several multi-label learning datasets. In continual multi-label learning, AM-SNNs surpass Bipolar SNNs and the CIFDM benchmark on large datasets - Birds, Human, and Eukaryote.

12:40-13:00, Paper TuBT8.6
Combining Deep Learning and Expert Rules for Smart Contract Vulnerability Detection

Ren, Senlin	Beijing Information Science and Technology University
Yang, Jun	Beijing Information Science and Technology University
Gu, Xiguo	Beijing Information Science and Technology University
Zheng, Liwei	Beijing Information Science and Technology University
Cui, Zhanqi	Beijing Information Science and Technology University
Keywords: Quality and Reliability Engineering Abstract: Smart contracts usually hold a large amount of digital assets, which can cause substantial losses if these contracts have vulnerabilities. Thus, it is essential to adequately detect possible vulnerabilities in smart contracts before deployment. There are many types of vulnerabilities in smart contracts, and different detection methods have their own unique advantages: some vulnerabilities may be more suitable for expert rule-based methods, while others are more suitable for deep learning-based methods. A single detection method usually fails to fully use its ability to detect vulnerabilities. To address the above problems, we propose a composite approach named CDE-VD (Combining Deep Learning and Expert Rule for Smart Contract Vulnerability Detection) to improve the performance of vulnerability detection. The method divides smart contract samples into deep learning-prone samples and expert rule-prone samples by classifying them before detection, and extracts expert rule features to train the smart contract detection method classifier to predict the category of the samples under analysis, then selects the suitable method for detection. The experimental results show that CDE-VD outperforms single detection methods in vulnerability detection. Compared with the SOTA method MANDO, CDE-VD achieves average improvements of 3.22%, 2.32%, 9.25%, and 6.54% in terms of the Accuracy, Precision, Recall, and F_1-score for five categories of vulnerabilities such as access control and time manipulation, respectively, which indicates that category prediction of the smart contract samples could improve vulnerability detection performance.


TuBT9	MR09
AI Applications 9	Regular Papers - Cybernetics
Chair: Qin, Nianping	Sichuan University

11:00-11:20, Paper TuBT9.1
CPR: Mitigating Large Language Model Hallucinations with Curative Prompt Refinement

Shim, Jung-Woo	Korea University
Ju, Yeong-Joon	Korea University
Park, Ji-Hoon	Korea University
Lee, Seong-Whan	Korea University
Keywords: Application of Artificial Intelligence, Deep Learning, Information Assurance and Intelligence Abstract: Recent advancements in large language models (LLMs) highlight their fluency in generating responses to diverse prompts. However, these models sometimes generate plausible yet incorrect "hallucinated" facts, undermining trust. A frequent but often overlooked cause of such errors is the use of poorly structured or vague prompts by users, leading LLMs to base responses on assumed rather than actual intentions. To mitigate hallucinations induced by these ill-formed prompts, we introduce Curative Prompt Refinement (CPR), a plug-and-play framework for curative prompt refinement that 1) cleans ill-formed prompts, and 2) generates additional informative task descriptions to align the intention of the user and the prompt using a fine-tuned small language model. When applied to language models, we discover that CPR significantly increases the quality of generation while also mitigating hallucination. Empirical studies show that prompts with CPR applied achieve over a 90% win rate over the original prompts without any external knowledge.

11:20-11:40, Paper TuBT9.2
MVESF: Multi-View Enhanced Semantic Fusion for Controllable Text-To-Image Generation

Zhang, Hai	East China Normal University
Cao, Guitao	East China Normal University
Wang, Xinke	East China Normal University
Quan, Jiahao	East China Normal University
Keywords: Application of Artificial Intelligence, Deep Learning, Machine Vision Abstract: Text-to-image (T2I) synthesis aims to generate images that are semantically consistent with texts. Currently, existing methods merely use the shallow semantics of the text to crudely guide image generation. They cannot fully integrate rich textual semantics with image features, leading to an inability to finely control image generation through text. To address this issue, we propose a GAN-based method named MVESF (Multi-View Enhanced Semantic Fusion), which enhances the semantic fusion of text and images from multiple perspectives for fine-grained controllable text-to-image synthesis. In multi-view, we introduce Multi-domain Semantic Guidance, Local Semantic Attention, and Visual-textual Consistency Loss to enhance the semantic fusion of text and images in image generation, image discrimination, and image supervision, respectively. Our method promotes the consistent alignment between text and images, allowing for fine-grained variations in the generated images when subtle changes are made to the input text, without affecting unrelated regions. Extensive experiments have demonstrated the effectiveness of our approach.

11:40-12:00, Paper TuBT9.3
Channel Independent Attention Network for Electricity Theft Detection

Qin, Nianping	Sichuan University
Zhou, Yao	Sichuan University
Keywords: Application of Artificial Intelligence, Deep Learning, AI and Applications Abstract: Electricity theft severely impairs the economic benefits to businesses and endangers public safety. In recent years, deep learning models for automated electricity theft detection have been advancing rapidly. However, the pattern of electricity usage data exhibits complex dependencies in cases of electricity theft, posing significant challenges for accurate detection. To cope with this difficulty, we propose a Channel Independent Attention Network (CIAN) for electricity theft detection. In particular, electricity consumption data is first processed by a sequence patchification strategy, which enables dependency capturing over suitable time intervals. A parallel structure is then constructed, where a self-attention based channel independent encoder and a mix convolution module cooperate to learn global and local feature patterns, respectively. Lastly, the features are fused to predict whether the electricity consumption sequence data involves electricity theft. Experimental results on a real-world electricity dataset demonstrate that the proposed method surpasses other competitive electricity theft detection approaches across various evaluation metrics.

12:00-12:20, Paper TuBT9.4
Tibetan-Chinese Machine Translation Enhanced on Cross-Lingual Pre-Trained Model

Zhou, Mingjun	School of Information Science and Technology, Tibet University
Gesang, Quzong	Tibet University
Qun, Nuo	Tibet University
Nyima, Tashi	Tibet University
Rinchen, Dongrub	Tibet University
Keywords: Application of Artificial Intelligence, AI and Applications, Deep Learning Abstract: Tibetan-Chinese machine translation has become a focal point of interest within the Tibetan community due to its importance for effective communication and cultural preservation. Although technological advancements have been made, current Tibetan-Chinese translation systems still fall short of satisfactory performance. Moreover, evaluating these systems using a publicly available and reliable benchmark dataset remains problematic. To address these challenges, we propose an enhanced Tibetan-Chinese machine translation framework. This framework includes data filtering techniques and utilizes the semantic knowledge of the state-of-the-art Chinese cross-lingual model, CINO. After preprocessing, we trained our model using the Transformer architecture and found that our approach significantly improves translation performance. Additionally, we have developed a high-quality, expert-reviewed test dataset to support community evaluations of translation systems. The details of our experimental model and the test dataset can be accessed at https://github.com/UTibetNLP/Ti-ChTrans.

12:20-12:40, Paper TuBT9.5
Multi-Objective Defect Detection Method for Transmission Lines Based on Improved YOLOv8

Jingdong, Wang	Northeast Electric Power University
Ding, Xu	School of Computer Science, Northeast Electric Power University,
Fanqi, Meng	Northeast Electric Power University
Zhou, Lina	Beijing Institute of Computer Technology and Application
Keywords: Application of Artificial Intelligence, Deep Learning, Image Processing and Pattern Recognition Abstract: Traditional detection methods have low efficiency and accuracy in power inspection because the key component targets to be inspected by unmanned aerial vehicle (UAV) are numerous, differ greatly in shape, and are captured in images of limited quality. To address this problem, a transmission line abnormal target detection method based on improved YOLOv8 and super-resolution reconstruction is proposed. Firstly, the super-resolution reconstruction algorithm is used to reconstruct the abnormal image to improve the clarity and enrich the characteristic information contained in the image. On this basis, the improved YOLOv8 network is used to detect the defects in the inspection image. The CBAM attention mechanism is fused in the Bottleneck part of the C2f module to strengthen the model's ability to locate the target. In order to further improve the detection ability of small targets in patrol inspection, a small target detection layer is added to make the network pay more attention to the detection of small targets. Finally, in order to be deployed to edge devices, the original convolution in the network is replaced with the lightweight convolution GhostConv to reduce the number of model parameters. The experimental results show that the proposed method can accurately detect abnormal defects of transmission line components on the basis of improving the quality of inspection images. mAP is improved by 3.7% and the number of model parameters and calculation amount are greatly reduced, which reflects the effectiveness of the algorithm, and it has stronger extraction ability and robustness for subtle defect targets, meeting the detection requirements of power inspection.

12:40-13:00, Paper TuBT9.6
A Near-Imperceptible Disambiguating Approach Via Verification for Generative Linguistic Steganography

Yan, Ruiyi	Beijing Institute of Technology
Song, Tian	School of Cyberspace Science and Technology, Beijing Institute O
Yang, Yating	School of Cyberspace Science and Technology, Beijing Institute O
Keywords: AI and Applications, Application of Artificial Intelligence, Media Computing Abstract: Generative linguistic steganography aims to embed information into natural language texts to achieve covert transmission. However, in most current approaches based on subword-supporting language models, the extraction process relies on tokenizing steganographic texts into tokens, which can cause segmentation ambiguity and ultimately lead to false results or extraction failures. Although several countermeasures (disambiguation methods) have been proposed, they are based on removing tokens from candidate pools, which renders them incompatible with preserving imperceptibility and potentially incurs safety risks. To avoid this, we focus on tackling segmentation ambiguity with near-integrity of candidate pools. In this paper, we propose a near-imperceptible disambiguating approach via verification for generative linguistic steganography. First, this paper presents an all-case extraction method to obtain the possible true extracted results. Further, length verification and checksum verification are presented to filter out wrong extracted results caused by segmentation ambiguity. Experiments show that our disambiguating approach outperforms the existing disambiguating approaches on various criteria, including about 23.49% higher embedding capacity, about 23.46% higher imperceptibility and about 5.73% anti-steganalysis capacity of steganographic texts.
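
Illustrative note (not from the paper): the idea of filtering candidate extractions by length verification and checksum verification can be sketched as below. The framing used here is purely an assumption made for illustration: a hypothetical 8-bit length header, the message bits, and a 32-bit CRC computed with Python's standard zlib.crc32. The abstract does not specify the actual header layout or checksum, and the names frame, verify and disambiguate are hypothetical.

```python
import zlib

LEN_BITS = 8    # hypothetical header width (assumption, not from the abstract)
CRC_BITS = 32   # full CRC-32, chosen only for this sketch

def checksum(bits: str) -> str:
    """CRC-32 of the bit string, rendered as 32 binary digits."""
    return format(zlib.crc32(bits.encode("ascii")), f"0{CRC_BITS}b")

def frame(message_bits: str) -> str:
    """Sender side: length header + message + checksum."""
    header = format(len(message_bits), f"0{LEN_BITS}b")
    return header + message_bits + checksum(message_bits)

def verify(candidate: str):
    """Return the message bits if the candidate passes both checks, else None."""
    if len(candidate) < LEN_BITS + CRC_BITS:
        return None
    declared = int(candidate[:LEN_BITS], 2)
    body = candidate[LEN_BITS:]
    # Length verification: the declared length must match the bits present.
    if len(body) != declared + CRC_BITS:
        return None
    message, tag = body[:declared], body[declared:]
    # Checksum verification: the recomputed checksum must match the stored one.
    return message if checksum(message) == tag else None

def disambiguate(candidates):
    """Keep only the candidate extractions that pass both verifications."""
    return [m for m in (verify(c) for c in candidates) if m is not None]

secret = "1011001"
good = frame(secret)
too_long = good + "0"                                   # fails the length check
corrupted = good[:LEN_BITS] + "0" + good[LEN_BITS + 1:]  # fails the checksum check
survivors = disambiguate([too_long, corrupted, good])
```

In this sketch each alternative tokenization of the stego text would yield one candidate bit string, and only candidates consistent with both the declared length and the checksum survive.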


TuBT10	MR10
Deep Learning and Neural Networks 2
Chair: Zimmermann, Grasielli Barreto	Graduate Program in Computer Science (PPGIa) - Pontifical Catholic University of Parana (PUCPR)

11:00-11:20, Paper TuBT10.1
Improve Deep Learning Autofocus with Depth Information Supervision and Current Focal Distance Cues

Wei, Xiaolin	Chongqing University
Yang, Ruilong	Chongqing University
Wu, Xing	Chongqing University
Wang, Chengliang	Chongqing University
Wang, Haidong	Southwest Hospital of Army Medical University
Wang, Hongqian	Southwest Hospital of Army Medical University
Tang, Tao	Chongqing University
Keywords: Neural Networks and their Applications, Machine Vision, Machine Learning Abstract: Traditional autofocus methods search for the optimal focal distance (FD) by evaluating image quality from focal stacks, resulting in time-consuming focusing processes. Recently, deep learning has been adopted for single-shot autofocus methods, which can predict the optimal FD directly from a single input image. However, these methods often suffer from low prediction accuracy due to the lack of global features and structured global supervisory information, as they rely solely on the image’s region of interest (ROI) as input and a single value for supervision. We propose a deep learning network named MPFS, which takes a full-frame photograph as input and uses the optimal focal distance per pixel for supervision. These enhancements address the issues of missing global features and insufficient supervisory information. Additionally, the network integrates current camera focal distance information to mitigate the scale ambiguity caused by the lack of absolute scale information. To validate the effectiveness of the proposed method, we designed an experiment using a dataset annotated with optimal FD per pixel. Experimental results on this dataset indicate that our approach achieves a 0.22 decrease in the Mean Absolute Error (MAE) metric compared to the state-of-the-art models, with improvements of 0.02 and 0.004 in d1 and d2 metrics.

11:20-11:40, Paper TuBT10.2
Identification of Diseases in Greenhouse Tomato Cultivation: A New Dataset and Baseline Results

Zimmermann, Grasielli Barreto	Graduate Program in Computer Science (PPGIa) - Pontifical Cathol
Pellenz, Marcelo Eduardo	Graduate Program in Computer Science (PPGIa) - Pontifical Cathol
Souza Britto Jr, Alceu	Pontifical Catholic University of Parana (PUCPR)
Yandre, Costa	State University of Maringá
Keywords: Neural Networks and their Applications, Machine Learning, Image Processing and Pattern Recognition Abstract: Plant diseases are one of the factors that compromise food production goals, and the characteristics and climate of each production region influence them. Tomatoes are one of the world's most consumed vegetables and are widely affected by various diseases. However, tomato cultivation in greenhouses allows its continuous production. In this context, this research work focuses on the problem of identifying diseases in tomato cultivation scenarios in greenhouses. For this study, we created new datasets with two image sizes: the Tomato Leaf Image Dataset (TLID) with image sizes of 256x256 pixels and 15,256 images, and the Patch Tomato Leaf Image Dataset (PTLID) with patch sizes of 32x32 pixels and 227,218 images. Both datasets comprise seven classes, including four types of diseases, two combinations of diseases on the same leaf, and the healthy leaf. Machine Learning techniques have been widely used to identify plant diseases. This work presents two machine learning methods tested with both datasets. In the proposed models, we combine three convolutional neural networks, a customized CNN, VGG19, and Resnet50, and two voting classification methods using Hard and Soft decisions. The evaluation performed on the datasets showed that when the patches are used, the results improve significantly, reaching an accuracy of 90.48%. This technique makes it possible to identify the stage of the disease.
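
Illustrative note (not from the paper): the difference between Hard and Soft voting over several classifiers can be sketched as follows. The probability values are invented for illustration, three classes are used instead of the seven in the datasets, and the function names are hypothetical. Hard voting takes the majority of each model's top class, while soft voting averages the per-class probabilities before choosing.

```python
from collections import Counter

def hard_vote(probabilities):
    """Each model votes for its most probable class; the majority class wins."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in probabilities]
    # Counter.most_common resolves ties by first-seen order.
    return Counter(votes).most_common(1)[0][0]

def soft_vote(probabilities):
    """Average class probabilities across models, then pick the highest mean."""
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    mean = [sum(p[c] for p in probabilities) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)

# Made-up softmax outputs of three models for one image (rows sum to 1).
model_probs = [
    [0.55, 0.45, 0.00],  # model A narrowly prefers class 0
    [0.51, 0.49, 0.00],  # model B narrowly prefers class 0
    [0.05, 0.95, 0.00],  # model C strongly prefers class 1
]

hard_label = hard_vote(model_probs)  # two of three votes go to class 0
soft_label = soft_vote(model_probs)  # class 1 has the highest mean probability
```

The example is chosen so the two rules disagree: two weakly confident models outvote one confident model under hard voting, whereas soft voting lets the confident model dominate.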

11:40-12:00, Paper TuBT10.3
A GCN-Based Trip Recommendation Method Incorporating Reverse Effect

Luan, Wenjing	Shandong University of Science and Technology
Wang, Xueyao	Shandong University of Science and Technology
Qi, Liang	Shandong University of Science and Technology
Liu, Kun	Shandong University of Science and Technology
Guo, Xiwang	Liaoning Petrochemical University
Keywords: Deep Learning, Machine Learning, Big Data Computing Abstract: In location-based services (LBS), providing accurate trip recommendations is a challenge due to the diverse trip preferences of users and the complexity of their transfer behaviors. Previous trip recommendation studies neglect the effect of the following POIs on the previously visited ones, called a reverse effect. To address this problem, this study builds a Graph-convolutional-network-based Double-layer Bidirectional Trip Recommendation (GDB-TR) model. It utilizes a heterogeneous graph to model users’ check-in trajectories, incorporating spatial and temporal information. Subsequently, subgraphs are extracted, and adjacency matrices are constructed to represent the relationships within each subgraph. By fusing these matrices through a neural network, GDB-TR obtains vector representations for both points of interest (POIs) and POI categories. The core of GDB-TR is a double-layer bidirectional neural network. This network comprises two layers: one for describing POIs and the other for POI categories. Bidirectional computation is conducted between the initial and destination nodes, with a forward computation capturing the influence of preceding POIs or categories on following ones, and reverse computation evaluating this influence in the opposite direction. Finally, we conduct experiments on five popular real-world datasets and the results show the superiority of GDB-TR over existing baseline models, measured by F1 and pairs-F1 metrics.

12:00-12:20, Paper TuBT10.4
Heterogeneous Space Fusion and Dual-Dimension Attention: A New Paradigm for Speech Enhancement

Zheng, Tao	Xinjiang University
Wang, Liejun	Xinjiang University
Yu, Yinfeng	Xinjiang University
Keywords: Deep Learning, Neural Networks and their Applications, Image Processing and Pattern Recognition Abstract: Self-supervised learning has demonstrated impressive performance in speech tasks, yet there remains ample opportunity for advancement in the realm of speech enhancement research. In addressing speech tasks, confining the attention mechanism solely to the temporal dimension poses limitations in effectively focusing on critical speech features. Taking into account the aforementioned issues, our study introduces a novel speech enhancement framework, HFSDA, which skillfully integrates heterogeneous spatial features and incorporates a dual-dimension attention mechanism to significantly enhance speech clarity and quality in noisy environments. By leveraging self-supervised learning embeddings in tandem with Short-Time Fourier Transform (STFT) spectrogram features, our model excels at capturing both high-level semantic information and detailed spectral data, enabling a more thorough analysis and refinement of speech signals. Furthermore, we employ the innovative Omni-dimensional Dynamic Convolution (ODConv) technology within the spectrogram input branch, enabling enhanced extraction and integration of crucial information across multiple dimensions. Additionally, we refine the Conformer model by enhancing its feature extraction capabilities not only in the temporal dimension but also across the spectral domain. Extensive experiments on the VCTK-DEMAND dataset show that HFSDA is comparable to existing state-of-the-art models, confirming the validity of our approach.

12:20-12:40, Paper TuBT10.5
Safely Knowledge Transfer from Source Models Via an Iterative Pruning Based Learning Approach

Lu, Xiaoyu, Sean	Nanjing University of Science and Technology
Zhang, Jianan	Nanjing University of Science and Technology
Yao, Siya	Zhejiang Gongshang University
Huang, Bo	Nanjing University of Science and Technology
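Keywords: Transfer Learning, Neural Networks and their Applications, Deep Learning Abstract: Transfer learning has become a key technique in deep learning, widely adopted in industry and academia to develop customized models, especially for solving specific downstream tasks. Despite the benefits of transfer learning, target models can easily inherit the defects of source models during the learning process, such as vulnerability to backdoor attacks and adversarial attacks. Therefore, this work proposes a novel method, the Iterative Pruning based Learning Approach (IPLA), which reduces the inheritance of potential defects during the transfer learning process. To reduce susceptibility to attacks and improve the robustness of the target model, IPLA evaluates the importance of the weights in the source model, retains those that are critical to the target task, and then prunes the redundant weights through an iterative pruning process. Experiments are conducted on 4 datasets with 2 backbone source models. The results show that the proposed method achieves satisfactory performance.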

12:40-13:00, Paper TuBT10.6
Learning Node-Pair Insertion for the Pickup and Delivery Problem with Time Windows (I)

Fang, Zhanhong	Sun Yat-Sen University
Chen, Jinbiao	Sun Yat-Sen University
Zhang, Zizhen	Sun Yat-Sen University
Su, Dawei	Sun Yat-Sen University
Keywords: Neural Networks and their Applications, Deep Learning, Optimization and Self-Organization Approaches Abstract: Pickup and Delivery Problem with Time Windows (PDPTW) is a prevalent research direction in modern logistics transportation. In this challenging problem, customers are divided into pickup nodes and delivery nodes, and vehicles must first serve each pickup node before proceeding to its corresponding delivery node. Moreover, the hard time window constraint presents an obstacle for the existing learning-to-construct methods. Hence, this paper proposes a novel learning-to-construct approach based on node-pair insertion to address the complex time window constraint. It involves predicting the insertion point for the next node pair within the current partial solution and ensuring constraint adherence. We enhance the context information for the decoder to produce better solutions. The experimental results verify that the proposed approach can construct high-quality solutions in a very short period of time.


TuBT11	MR11
Image Processing and Pattern Recognition 1	Regular Papers - Cybernetics
Chair: He, Hanxian	Monash University

11:00-11:20, Paper TuBT11.1
Domain Adaptive Lung Nodule Detection in X-Ray Image

Zhao, Haifeng	Anhui University
Jiang, Lixiang	Anhui University
Ma, Leilei	Anhui University
Sun, Dengdi	Anhui University
Fu, Yanping	Anhui University
Keywords: Deep Learning, Image Processing and Pattern Recognition, Transfer Learning Abstract: Medical images from different healthcare centers exhibit varied data distributions, posing significant challenges for adapting lung nodule detection due to the domain shift between training and application phases. Traditional unsupervised domain adaptive detection methods often struggle with this shift, leading to suboptimal outcomes. To overcome these challenges, we introduce a novel domain adaptive approach for lung nodule detection that leverages mean teacher self-training and contrastive learning. First, we propose a hierarchical contrastive learning strategy to refine nodule representations and enhance the distinction between nodules and background. Second, we introduce a nodule-level domain-invariant feature learning (NDL) module to capture domain-invariant features through adversarial learning across different domains. Additionally, we propose a new annotated dataset of X-ray images to aid in advancing lung nodule detection research. Extensive experiments conducted on multiple X-ray datasets demonstrate the efficacy of our approach in mitigating domain shift impacts.

11:20-11:40, Paper TuBT11.2
PWPH: Proactive Deepfake Detection Method Based on Watermarking and Perceptual Hashing

Li, Jian	Qilu University of Technology(Shandong Academy of Sciences)
Li, Shuanshuan	Qilu University of Technology (Shandong Academy of Sciences)
Ma, Bin	Qilu University of Technology(Shandong Academy of Sciences)
Wang, Chunpeng	Qilu University of Technology(Shandong Academy of Sciences)
Zhou, Linna	Beijing University of Posts and Telecommunications
Wang, Fei	Qilu University of Technology
Wang, Yule	Qilu University of Technology
Yang, Miaomiao	Qilu University of Technology （Shandong Academy of Scienc
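Keywords: Image Processing and Pattern Recognition, Information Assurance and Intelligence, Neural Networks and their Applications Abstract: The spread of Deepfake technology has made it harder to distinguish real faces from fake ones. Although detection methods exist, most of them are passive forensics and face challenges in generalizability and transferability. Some recent studies attempt to protect original images by embedding invisible information in advance. However, shortcomings remain in image quality and information robustness as a result of the information embedding, i.e., watermarking. We therefore exploit the robustness of perceptual hash coding and combine it with information hiding techniques to propose a proactive Deepfake detection solution, referred to as PWPH in this paper. Our method is simple and efficient: first, an image containing a face is divided into two parts, the FA (face area) and the NFA (non-face area). A perceptual hash code is generated from the non-face area (NFA). The hash code is then embedded into the FA as a watermark. In the extraction stage, we use the same method as the encoder to retrieve the embedded watermark from the FA. The watermark ...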

11:40-12:00, Paper TuBT11.3
FUIT: Improving Semantic Consistency in Unpaired Image Translation Via High and Low Frequency Guidance

Du, Mingxin	Beijing University of Posts and Telecommunications
Luo, Juanjuan	Beijing University of Posts and Telecommunications
Keywords: Image Processing and Pattern Recognition, Machine Vision, Deep Learning Abstract: With the development of generative models, unpaired image translation has made remarkable progress. It aims to translate an image from one domain to another while retaining the content information of the input image. However, existing methods still suffer from semantic inconsistency during the translation process. In this paper, we propose a novel unpaired image translation framework to improve semantic consistency via high and low frequency guidance. Our key idea is to divide images into high and low frequency components. Based on the frequency representation of images, we design a new high-frequency encoding module that retains content semantic information from the original domain. We also present a novel style consistency regularization on low-frequency components, which enhances the style consistency of output images. Our approach achieves state-of-the-art performance on multiple unpaired image translation tasks. Extensive experiments show that our method significantly improves semantic consistency, retaining content semantics from the original images while generating style-harmonious images.

12:00-12:20, Paper TuBT11.4
Sensitive Image Classification by Vision Transformers

He, Hanxian	Monash University
Wilson, Campbell	Monash University
Nguyen, Thanh Thi	Monash University
Dalins, Janis	Australian Federal Police
Keywords: Application of Artificial Intelligence, Image Processing and Pattern Recognition, Deep Learning Abstract: When it comes to classifying child sexual abuse images, managing similar inter-class correlations and diverse intra-class correlations poses a significant challenge. Vision transformer models, unlike conventional deep convolutional network models, leverage a self-attention mechanism to capture global interactions among contextual local elements. This allows them to navigate through image patches effectively, avoiding incorrect correlations and reducing ambiguity in attention maps, thus proving their efficacy in computer vision tasks. Rather than directly analyzing child sexual abuse data, we constructed two datasets: one comprising clean and pornographic images and another with three classes, which additionally include images indicative of pornography, sourced from Reddit and Google Open Images data. In our experiments, we also employ an adult content image benchmark dataset. These datasets served as a basis for assessing the performance of vision transformer models in pornographic image classification. In our study, we conducted a comparative analysis between various popular vision transformer models and traditional pre-trained ResNet models. Furthermore, we compared them with established methods for sensitive image detection such as attention and metric learning based CNN and Bumble. The findings demonstrated that vision transformer networks surpassed the benchmark pre-trained models, showcasing their superior classification and detection capabilities in this task.

12:20-12:40, Paper TuBT11.5
An Industrial Scene Text Detection with Spectral Domain Enhancement and Graph Fourier Mapping

Xiao, Wocheng	South China University of Technology
Liang, Lingyu	South China University of Technology
Huang, Shuangping	South China University of Technology
Keywords: Image Processing and Pattern Recognition, Deep Learning, Neural Networks and their Applications Abstract: Text detection is a task of great significance in different scenarios, with wide applications in downstream tasks such as text recognition and text retrieval. A variety of text detection methods have been proposed to solve this problem in natural scenes and have achieved good results. However, these methods cannot achieve satisfactory performance in industrial scenes because of interference caused by the industrial environment, such as image noise, background material texture, and low contrast. To deal with these challenges, we first propose a contour modeling algorithm based on graph Fourier transform mapping to represent arbitrarily shaped text contours. A refined fast Fourier convolutional network module capable of spectral-domain sensing is also introduced to enhance text features and suppress interference. Based on these two components, we construct a novel network to achieve accurate industrial scene text detection. Quantitative evaluations are conducted on the benchmark datasets MPSC and IcText; experimental results show that our method obtains state-of-the-art detection accuracy.

12:40-13:00, Paper TuBT11.6
Multi-Stream GCN and CNN for Skeleton-Based Action Recognition

Taniyev, Kenzhebek	Nazarbayev University
Zhaksylyk, Tomiris	Nazarbayev University
Anh Tu, Nguyen	Nazarbayev University
Keywords: Deep Learning, Image Processing and Pattern Recognition, Application of Artificial Intelligence Abstract: This paper presents two novel approaches for improving skeleton-based action recognition using Graph Convolutional Networks (GCN) and Convolutional Neural Networks (CNN). In the first approach, we combine GCN and CNN streams that process position and velocity features to improve classification accuracy. In the second approach, we use GCN as an embedding layer for support network CNN to extract features from skeleton data, which significantly improves recognition accuracy. Our experiments on the JHMDB dataset demonstrate that our approaches outperform state-of-the-art methods while using significantly fewer parameters. Additionally, we extended our evaluation to the Kinetics-400 dataset, where our methods showed comparable results with considerably lower model complexity. Our work contributes to the development of more efficient and robust action recognition models.


TuBT12	MR12
Haptic and Human-Computer Interaction 7	Regular Papers - HMS
Chair: Kirchhoff, Miriam	University of Tübingen

11:00-11:20, Paper TuBT12.1
Unsupervised Few-Shot Adaptive Re-Learning for EEG-Based Motor Imagery Classification

Ng, Han Wei	Nanyang Technological University
Guan, Cuntai	Nanyang Technological University
Keywords: Brain-Computer Interfaces, Human-Machine Interaction, Human-Machine Interface Abstract: To address the effect of both intra- and inter-subject variability in EEG-based motor-imagery classification, adaptive schemes have been proposed whereby the pre-trained model is fine-tuned on additional target data. However, collection and labelling of additional target data is resource intensive. Furthermore, there may exist significant signal variations across time, especially across multiple recording sessions, which can result in model performance deterioration following adaptation. This introduces another challenging problem for training classifiers, as they are typically unable to automatically determine which data is suitable for fine-tuning purposes, leading to some subjects facing performance drops even after adaptation. To address the data scarcity and variability in adaptive performance, we propose a novel machine-relearning Siamese architecture which utilizes a few samples of unlabeled evaluation data to perform data-efficient model re-learning. Machine re-learning optimally selects a sub-section of previously known data to update the model parameters. This is implemented via the use of a comparative contrastive loss between the Gaussian distributions of target and known data to perform data selection. The highest subject-independent performance achieved an average (N=54) accuracy of 86.63% (±11.79%) without additional data and 88.23% (±10.36%) when utilizing a single unlabeled supplementary data sample for two-class motor imagery. The previous best accuracy on this dataset is 85.90% (±11.20%) using the best-known method in the literature. Therefore, significantly superior adaptation performance can be achieved while utilizing a smaller amount of information from the target subject by reducing EEG feature variability in the training and fine-tuning sets. Code may be found at: https://github.com/NgHanWei/EEG_Relearning

11:20-11:40, Paper TuBT12.2
Closed-Loop Phase Selection in EEG-TMS Using Bayesian Optimization

Kirchhoff, Miriam	University of Tübingen
Humaidan, Dania	Hertie-Institute for Clinical Brain Research
Ziemann, Ulf	University of Tübingen
Keywords: Brain-Computer Interfaces, Human-Machine Interaction, Medical Informatics Abstract: Research on transcranial magnetic stimulation (TMS) combined with encephalography feedback (EEG-TMS) has shown that the phase of the sensorimotor mu rhythm is predictive of corticospinal excitability. Thus, if the subject-specific optimal phase is known, stimulation can be timed to be more efficient. In this paper, we present a closed-loop algorithm to determine the phase linked to the highest excitability with as few trials as possible. We used Bayesian optimization with different configurations as an automated, online search tool in an EEG-TMS simulation experiment. From a sample of 38 healthy participants (25 f, 18 m), we selected all participants with a significant single-subject phase effect (n = 5) for the simulation. We then simulated 1000 experimental sessions per participant where we used Bayesian optimization to find the optimal phase. We tested two objective functions: Fitting a sinusoid in Bayesian linear regression, or Gaussian Process regression. We additionally tested adaptive sampling using a knowledge gradient as the acquisition function compared with random sampling. We evaluated the algorithm’s performance in a fast optimization (100 trials) and a long-term optimization (1000 trials). We found that for fast optimization, the Bayesian linear regression in combination with adaptive sampling gives the best results with a mean phase location accuracy of 79 % after 100 trials. With either sampling approach, Bayesian linear regression performs better than Gaussian Process regression in the fast optimization. In the long-term optimization, Bayesian regression with random sampling shows the best trajectory, with a rather steep improvement and good final performance of 87 % mean phase location accuracy. In summary, we could show the suitability of closed-loop Bayesian optimization for phase selection. We show that we can increase the speed and accuracy by using prior knowledge about the expected function shape compared with traditional Bayesian optimization with Gaussian Process regression.
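The sinusoid objective mentioned in this abstract is linear in its coefficients, which a short sketch can make concrete. Writing the response as y = a·cos(φ) + b·sin(φ) + c, the posterior mean under a zero-mean Gaussian prior is a ridge-regularized least-squares solution, and the phase of highest response is atan2(b, a). The function names, the `ridge` value, and the synthetic data are assumptions for illustration; the acquisition function and the authors' actual priors are not reproduced here.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_sinusoid(phases, responses, ridge=1e-6):
    """Posterior-mean coefficients (a, b, c) for y = a cos(p) + b sin(p) + c.

    `ridge` is the ratio of prior precision to noise precision (assumed).
    """
    feats = [[math.cos(p), math.sin(p), 1.0] for p in phases]
    gram = [[sum(f[i] * f[j] for f in feats) + (ridge if i == j else 0.0)
             for j in range(3)] for i in range(3)]
    rhs = [sum(f[i] * y for f, y in zip(feats, responses)) for i in range(3)]
    return solve_linear(gram, rhs)


def optimal_phase(a, b):
    """Phase in [0, 2*pi) at which the fitted sinusoid peaks."""
    return math.atan2(b, a) % (2 * math.pi)


# Synthetic, noise-free responses peaking at 1.0 rad (illustrative only).
phases = [2 * math.pi * k / 12 for k in range(12)]
responses = [1.0 + 0.5 * math.cos(p - 1.0) for p in phases]
a, b, c = fit_sinusoid(phases, responses)
best = optimal_phase(a, b)
```

Because only three coefficients are estimated, a fit of this kind needs far fewer trials than a nonparametric model, which matches the abstract's point about using prior knowledge of the expected function shape.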

11:40-12:00, Paper TuBT12.3
Low-Complexity RR Interval-Based Cardiac Dysrhythmia Indicator for Wearable ECG Devices

Palagani, Yellappa	Stanford University
Gonuguntla, Venkateswarlu	Symbiosis International (Deemed University)
Keywords: Human-Machine Interface, Wearable Computing, Assistive Technology Abstract: Cardiac dysrhythmias pose a significant risk of sudden cardiac death, necessitating prompt detection and treatment. While wearable ECG devices offer real-time monitoring capabilities, current methods for identifying dysrhythmias are often time-consuming. In this paper, we present a novel low-complexity RR interval (RRI)-based Cardiac Dysrhythmia Indicator (CDI) aimed at efficiently detecting dysrhythmias. The CDI architecture comprises two principal components: the Slow-Fast-Normal Indicator Circuit (SFNIC) and the Normal-Abnormal Indicator Circuit (NAIC). SFNIC monitors dysrhythmias at each RRI, while NAIC evaluates dysrhythmia status at regular intervals, indicating normal or abnormal states every 64 RRIs (64 bpm is near to normal pulse rate). Through extensive simulations conducted using 0.18 μm CMOS processes, with mixed-signal circuits operating within the (0.9-1.8) volt range, our design demonstrates successful recognition of various ECG rhythms. These findings suggest that the proposed CDI design holds promise for integration into wearable ECG devices, enabling real-time dysrhythmia detection and timely intervention.
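The two-stage structure described in this abstract (a per-beat indicator and a 64-beat window indicator) can be sketched in software. The abstract describes a mixed-signal circuit and gives no thresholds, so the cut-offs below assume the common clinical convention of under 60 bpm as slow and over 100 bpm as fast; the window rule (any non-normal beat marks the window abnormal) and all names are likewise hypothetical.

```python
def classify_rri(rri_s, slow_above=1.0, fast_below=0.6):
    """Label one RR interval (seconds) as slow, fast, or normal.

    Assumed thresholds: RRI > 1.0 s is under 60 bpm, RRI < 0.6 s is
    over 100 bpm. The source does not state the circuit's thresholds.
    """
    if rri_s > slow_above:
        return "slow"
    if rri_s < fast_below:
        return "fast"
    return "normal"


def window_status(rris, window=64, max_abnormal=0):
    """Report normal/abnormal once per complete window of RR intervals."""
    statuses = []
    for start in range(0, len(rris) - window + 1, window):
        block = rris[start:start + window]
        abnormal = sum(classify_rri(r) != "normal" for r in block)
        statuses.append("abnormal" if abnormal > max_abnormal else "normal")
    return statuses


# Two windows: the second contains one fast beat.
result = window_status([0.8] * 64 + [0.8] * 63 + [0.4])
```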

12:00-12:20, Paper TuBT12.4
Self-Checkout Product Detection with Occlusion Layer Prediction and Intersection Weighting

Zhongling, Liu	Fujitsu
Shi, Ziqiang	Fujitsu R&D Center, Co. Ltd
Liu, Rujie	Fujitsu Research & Development Center
Liu, Liu	FRDC
Yamamoto, Takuma	Fujitsu Limited
Uchida, Daisuke	Fujitsu Limited
Keywords: Human-Machine Interaction, Design Methods, Assistive Technology Abstract: Automatic self-checkout based on computer vision is gaining popularity in the retail industry because of its convenience for customers and its manpower savings. Retail product detection is therefore vitally important in the automatic checkout process. Product detection based on a single camera remains challenging because of (1) customers holding a variety of different products in one or both hands, (2) product appearance, and (3) intentional fraudulent checkout practices. In this paper, we introduce a third branch on ordinary detectors to predict the occlusion layer of a product and then adopt occlusion layer aware non-max suppression (OLA-NMS) to suppress false positives while maintaining the detection rate. Furthermore, an IoU-activate loss is adopted, which incorporates location information into the classification loss. Our third contribution is a large-scale collection of retail checkout images for self-checkout monitoring (SCOM), since there is no dataset or benchmark available for retail product detection under occlusion. Experiments are conducted on the SCOM dataset to demonstrate the effectiveness of the proposed method.

12:20-12:40, Paper TuBT12.5
Modeling Operator Involvement in Automated Production: A Case Study of the Functional Resonance Analysis Method

Wilch, Jan	Technical University of Munich
Vogel-Heuser, Birgit	Technical University of Munich
Grabbe, Niklas	Technical University of Munich, Chair of Ergonomics
Bengler, Klaus	Chair of Ergonomics, Technical University of Munich
Posch, Magdalena	Technical University of Munich, Institute of Automation and Info
Keywords: Human-Machine Cooperation and Systems, Resilience Engineering, Human Factors Abstract: Faults in the technical process of automated production systems (aPS) are challenging to detect, report, and recover accurately. Missing physical equipment, lacking detection mechanisms in the control software, and insufficient training and experience of human operators lead to an often delayed and imprecise detection of a fault’s root cause. Since fault detection and recovery typically rely heavily on human involvement, there is also a risk of operators performing only partial repairs or inadvertently introducing new faults. Thereby, even slight deviations in human-machine interactions may affect process outcomes unpredictably, which so far cannot be modelled in field-level automation. The Functional Resonance Analysis Method (FRAM) is well-suited to address functional relations in such complex socio-technical systems and accurately represent resource requirements, preconditions, control, and time constraints, making it a promising tool to methodically investigate aPS faults. Yet, the FRAM has barely been applied in this domain. This paper thus introduces an approach relying on the FRAM to methodically derive alarm conditions and system extensions as a foundation for broader uses to investigate in future work. The FRAM-based approach was shown to reduce uncaught faults, required operator interventions, and thus unplanned downtime, in an experiment with two participants on a lab-sized demonstrator machine.

12:40-13:00, Paper TuBT12.6
Annotation-Based Semantic Conflict Prevention in Real-Time Collaborative Programming: Approach, Techniques, Prototype, and User Study

Wang, Mingjie	Tongji University
Fang, Bicheng	Tongji University
Jiang, Jinfeng	Tongji University
Fan, Hongfei	Tongji University
Keywords: Human-Computer Interaction, Multi-User Interaction Abstract: Real-time collaborative programming environments support a group of programmers, who are geographically distributed, to concurrently view and edit a shared set of source code in a real-time fashion. However, this emerging technology has not been widely applied yet, and one critical challenge is the semantic conflict during real-time collaboration. Existing conflict prevention approaches are limited in supporting individual source code files only, with pre-programmed rules that might be inflexible. To address these challenges, we propose a novel semantic conflict prevention approach based on source code annotations, which allow programmers to manually, conveniently and flexibly apply fine-grained conflict prevention rules during real-time collaboration. The proposed approach and techniques have been successfully implemented in a prototype system. User studies and performance evaluations have demonstrated the technical feasibility of the proposed approach and techniques, as well as the prototype's satisfactory efficiency and usability.


TuBT13	Foyer
Evolutionary and Heuristic Computation	Regular Papers - Cybernetics


TuBPSR	Room T14
Poster Presentation - Session 2	Poster Session

11:00-13:00, Paper TuBPSR.1
ICU-TGNN: A Hybrid Multitask Transformer and Graph Neural Network Model for Predicting Clinical Outcomes of Patients in the ICU

Shi, Tongyue	Peking University
Haowei, Xu	School of Artificial Intelligence, OPtics and ElectroNics (iOPEN
Ma, Jun	Peking University
Kong, Guilan	Peking University
Keywords: Biometric Systems and Bioinformatics, Hybrid Models of Computational Intelligence, Application of Artificial Intelligence Abstract: Predicting clinical outcomes for patients in the intensive care units (ICUs) is crucial for physicians to assess clinical risk and provide timely and appropriate interventions. Current models often fail to predict multiple clinical outcomes simultaneously, and patient similarity has not been fully utilized to improve prediction accuracy. Additionally, electronic health record (EHR) data in the ICUs often contain observations recorded at irregular time intervals, which existing prediction methods do not effectively model. To address these challenges, we propose the ICU-TGNN model. This model combines time attention-based Transformer and graph neural network (GNN) architectures to leverage temporal and relational patient data. The Transformer component analyzes time series EHR data to capture dynamic patient states over time, while the GNN component exploits the relational structure among patients based on comorbidities to enhance model generalizability. We evaluated the ICU-TGNN model on the eICU-CRD dataset, demonstrating its superior capability in predicting general clinical outcomes such as ICU mortality and length of ICU stay. Our findings highlight ICU-TGNN’s capability to provide accurate outcome predictions by effectively handling the complexity of ICU data, thereby holding great potential to optimize patient management and improve clinical outcomes.

11:00-13:00, Paper TuBPSR.2
Mitigating Exposure Bias in Score-Based Generation of Molecular Conformations

Wang, Sijia	Soochow University
Wang, Chen	Soochow University
Zhao, Zhenhao	Soochow University
Zhang, Ji-Qiang	Ningxia University
Cai, Weiran	Soochow University
Keywords: Biometric Systems and Bioinformatics, Deep Learning, Application of Artificial Intelligence Abstract: Molecular conformation generation poses a significant challenge in the field of computational chemistry. Recently, Diffusion Probabilistic Models (DPMs) and Score-Based Generative Models (SGMs) have been used effectively because of their capacity to generate accurate conformations far beyond conventional physics-based approaches. However, the discrepancy between training and inference gives rise to a critical problem known as exposure bias. While this issue has been extensively investigated in DPMs, the existence of exposure bias in SGMs and its effective measurement remain unresolved, which hinders the use of bias compensation methods for SGMs, with ConfGF and Torsional Diffusion as representatives. In this work, we propose a method for measuring exposure bias in SGMs used for molecular conformation generation, which confirms the significant presence of exposure bias in these models and measures its magnitude. This enables bias compensation algorithms developed for DPMs to be adapted to SGMs. Experimental results show that introducing the compensation method Input Perturbation, originally used for training DPMs, into SGM-based molecular conformation models can significantly improve both the accuracy and diversity of the generated conformations. Using the IP-enhanced Torsional Diffusion model, we achieve new state-of-the-art performance on the GEOM-Drugs dataset and are on par on GEOM-QM9. We provide the code publicly at https://github.com/jia-975/torsionalDiff-ip.

11:00-13:00, Paper TuBPSR.3
FedCGSU: Client Grouping Based on Similar Uncertainty for Non-IID Federated Learning

Liu, Hesheng	Jiangsu University
Feng, Li	Jiangsu University
Mei, Muyu	Jiangsu University
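Keywords: Swarm Intelligence, Deep Learning, Computational Intelligence Abstract: Federated learning (FL) is an approach that allows resource-limited nodes to collaborate without sharing their data. However, the computing performance of local devices, non-independent and identically distributed (Non-IID) data, and limited communication resources inevitably reduce the convergence speed and accuracy of the model. Obtaining a federated learning model with superior performance under Non-IID data is therefore an urgent need. In this paper, we propose FedCGSU, a client-grouping federated learning method based on similar uncertainty in local client distributions. FedCGSU groups clients by exploiting the similarity of their local distributions, thereby reducing the impact of weight divergence among clients with different distributions. Loss values and local data volume are then combined in the aggregation method, which reduces the impact of computationally weak devices on the global convergence speed. Experiments were conducted on 3 public datasets, and the results show that the FedCGSU method, in improving accuracy and accelerating convergence ...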

11:00-13:00, Paper TuBPSR.4
NQNR: News Recommendation Method Based on News Quality-Aware Modeling

Xu, Baojie	Qilu University of Technology (Shandong Academy of Sciences)
Yang, Zhenyu	Qilu University of Technology (Shandong Academy of Sciences)
Huang, Yan	Qilu University of Technology (Shandong Academy of Sciences)
Hu, Wenyue	Qilu University of Technology (Shandong Academy of Sciences)
Zhang, Zhibo	Qilu University of Technology
Keywords: Deep Learning, Neural Networks and their Applications, Application of Artificial Intelligence Abstract: Personalized News Recommendation (PNR) can enhance user experience by alleviating information overload. Traditional news recommendation methods consider all clicking behaviors as user interests, resulting in biased user modeling that fails to accurately capture user interests. In addition, some methods reduce the impact of low-quality news at the representation level by simply filtering it through the attention mechanism. However, this only implicitly models the news in the interaction sequence without specifically considering the quality of each news item, and thus has very limited effect in identifying noise. To address these issues, this paper proposes a News Recommendation method based on News Quality-aware modeling (NQNR). We attempt to explicitly model the news in the click sequence and candidate ranking one by one to visually assess the quality of each news item. Specifically, we design a detection module to detect whether the input news is low-quality news. Then, by reducing the influence of low-quality news in user modeling and candidate ranking, user interests are modeled more accurately, while recommendations of such news are reduced for users. In addition, to capture the similarity of vectors more accurately, we also design a similarity computation method based on the multiple attention mechanism in the detection module. Experiments on a large real-world Microsoft News Dataset (MIND) show that our model significantly outperforms previous models. Our code is posted at the following URL: https://github.com/xxbbjj/NQNR-.

11:00-13:00, Paper TuBPSR.5
Uncertainty-Based Continual Learning for Neural Networks with Low-Rank Variance Matrices

Rao, Xuan	Beijing Normal University
Zhao, Bo	Beijing Normal University
Liu, Derong	Southern University of Science and Technology
Keywords: Deep Learning, Neural Networks and their Applications, Machine Learning Abstract: Bayesian inference has provided continual learning (CL) with an elegant framework in which past experiences and new knowledge are constantly consolidated into the posterior. Typical approaches rely on Bayesian neural networks whose parameters are updated by variational inference, namely, maximizing the evidence lower bound of the log-likelihood. In this paper, we discuss the effects of local reparameterization on the optimization of such networks in the context of CL. The empirical results show that it not only increases the inference speed of neural networks, but also enhances the CL performance in some scenarios. Additionally, motivated by the observation that variance matrices have low-rank structures, we propose d-tied variational continual learning (d-tied-VCL) to improve the parameter efficiency of variational continual learning (VCL). Experiments on random classification, permuted MNIST, and split CIFAR100 show that even VCL with rank-1 variance matrices achieves competitive performance.

11:00-13:00, Paper TuBPSR.6
A Collaborative Heterogeneous Graph Neural Network for Personalized News Recommendation

Shi, Chenglong	Qilu University of Technology (Shandong Academy of Sciences)
Geng, Haibin	Qilu University of Technology (Shandong Academy of Sciences)
Jiang, Wenfeng	Qilu University of Technology (Shandong Academy of Sciences)
Li, Jinbao	Qilu University of Technology (Shandong Academy of Sciences)
Wu, Chao	Qilu University of Technology (Shandong Academy of Sciences)
Liu, Song	Qilu University of Technology (Shandong Academy of Sciences)
Keywords: Deep Learning, Neural Networks and their Applications, Application of Artificial Intelligence Abstract: Personalized news recommendation is the process of predicting the relevance of news to users and recommending news to users to fulfill their information needs. However, existing news recommendation methods extract semantic information from users and candidate news separately, ignoring semantic interaction information between users and candidate news. Furthermore, previous models only use the same node types for message passing, ignoring the different characteristics and topology of different node types. In addition, existing methods learn news representations through text representations, ignoring semantic correlation information between entity relationships and texts. To solve these problems, we propose a personalized news recommendation model named CoHG. In our model, we design a collaborative fusion module to obtain semantic interaction information through interaction between user history news and candidate news. Then, we design a heterogeneous gated graph neural network that maps different node types into the same space to extract higher-order information in user graphs for message passing. Moreover, we design an enhanced relevant attention module to enhance the semantic correlation information of text content by aggregating text representation and entity representation into a unified representation. Finally, we conducted experiments on the MIND and Adressa datasets to compare with other baseline models.

11:00-13:00, Paper TuBPSR.7
A GCN-Based Model for Next POI Recommendation with Fusion of Global and Local Information

Luan, Wenjing	Shandong University of Science and Technology
Liu, Kaixuan	Shandong University of Science and Technology
Qi, Liang	Shandong University of Science and Technology
Liu, Kun	Shandong University of Science and Technology
Guo, Xiwang	Liaoning Petrochemical University
Keywords: Neural Networks and their Applications, Deep Learning, Representation Learning Abstract: With the popularity of location-based services, point of interest (POI) recommendation has become a hot research topic. Current research mainly focuses on the analysis of personal check-in trajectories to obtain user preferences. However, a user’s check-in data is generally sparse, and it is difficult to make accurate recommendations by using only the user’s local information. Additionally, the public’s check-in behavior may exhibit common visiting patterns, and incorporating global check-in information is beneficial for enhancing the learning of individual user preferences. Therefore, we propose a GCN-based model with Fusion of Global and Local information (GFGL) for next POI recommendation. The model obtains global information such as spatial distance, social relationships, and transition probabilities from all users’ visit trajectories, and utilizes a graph convolution network (GCN) to learn multi-dimensional global information. It considers the asymmetry of transition probability graphs and the symmetry of spatial distance and social relationship graphs, conducting different operations on these two types of graphs. Next, we fuse global information with user local information through the user context information embedding module. In addition, a long short-term memory (LSTM) model is employed to dynamically perceive the transitions of POI categories and the sequential behaviors of users. A transformer model is utilized to explore the relationships between non-adjacent visits in trajectories and to understand the overall preferences of a user. Extensive experiments on two real-world datasets demonstrate the superiority of GFGL over state-of-the-art methods in next POI recommendation.

11:00-13:00, Paper TuBPSR.8
Beyond Label: Cold-Start Self-Training for Cross-Domain Semantic Text Similarity

Huang, Bo	Central South University
Liu, Jiasong	Central South University
Liu, Xin	Central South University
Chen, Cui	Central South University
Zhang, Zuping	Central South University
Keywords: Deep Learning, Representation Learning, Transfer Learning Abstract: In Natural Language Processing (NLP), understanding the semantic connection between two texts, a semantic text similarity (STS) task, is a significant challenge. This challenge is particularly evident in resource-restricted and cross-domain settings, where traditional methods face high data labeling costs. We propose an innovative method, designated "Cold-Start Self-Training", that reduces reliance on large labeled datasets for STS tasks in resource-restricted settings. This method utilizes dual-view pooling to extract semantic similarity information from unlabeled data and generates pseudo-labeled data to fine-tune the cross-encoder model. Dual-view pooling combines different pooling results of the same text to evaluate semantic similarity without additional model tuning, simplifying the self-training process. Experimental results show that our method significantly improves the cross-encoder model's performance on STS tasks in the medical domain. Our findings provide new strategies for cross-domain STS tasks, challenging the traditional reliance on extensive labeled data. We also validate the potential of unsupervised pre-trained models for cross-domain tasks, offering theoretical and practical support for complex challenges.
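The dual-view pooling step in this abstract can be illustrated with a small sketch. The abstract does not say which two pooling operations are combined, so this sketch assumes mean pooling and max pooling over token embeddings and averages the two cosine similarities; the thresholds in `pseudo_label` and every function name are hypothetical.

```python
import math


def mean_pool(tokens):
    """Average the token embeddings dimension by dimension."""
    return [sum(col) / len(tokens) for col in zip(*tokens)]


def max_pool(tokens):
    """Take the per-dimension maximum over token embeddings."""
    return [max(col) for col in zip(*tokens)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def dual_view_similarity(tokens_a, tokens_b):
    """Average the cosine similarity over two pooled views (assumed)."""
    views = (mean_pool, max_pool)
    return sum(cosine(p(tokens_a), p(tokens_b)) for p in views) / len(views)


def pseudo_label(score, high=0.9, low=0.3):
    """Keep only confident pairs; thresholds are illustrative."""
    if score >= high:
        return 1.0
    if score <= low:
        return 0.0
    return None  # ambiguous pair, not used for fine-tuning


same = dual_view_similarity([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
different = dual_view_similarity([[1.0, 0.0]], [[0.0, 1.0]])
```

Pairs that receive a label of 1.0 or 0.0 would form the pseudo-labeled set for fine-tuning a cross-encoder, while ambiguous pairs are discarded.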

11:00-13:00, Paper TuBPSR.9
Optimizing Linux Scheduling Based on Global Runqueue with SCX

Tang, Qinan	Xiamen University
Gao, Xing	Xiamen University
Li, Guilin	Xiamen University
Lin, Juncong	Xiamen University
Keywords: System Architecture Abstract: In Linux kernel version 6.6, the Earliest Eligible Virtual Deadline First (EEVDF) scheduler was introduced as the new default scheduler. However, due to its high computational complexity, it may not be suitable for all application scenarios. In cases where there are a large number of short-term tasks or frequent task communication, EEVDF can incur excessive context switch overhead and performance degradation. To address these issues, we propose a lightweight scheduling strategy named SRAND. Leveraging the programmable scheduling framework `sched_ext` and employing BPF technology, we have implemented a strategy based on global and local run queues. This strategy utilizes a five-level BPF mapped queue to partition tasks with different virtual runtimes. Tasks with smaller virtual runtimes are placed into a FIFO-type global queue first, so that they are scheduled with priority. Additionally, we monitor CPU idle states and allocate tasks in a timely manner, enhancing task responsiveness while reducing scheduling complexity. It is worth noting that our strategy integrates user-space scheduling policies into the kernel via an eBPF program loader, thus eliminating the need for kernel code modifications. By implementing the SRAND strategy, we have observed significant improvements compared to Linux's default scheduling strategy EEVDF. Specifically, our proposed strategy averages an 11.83% reduction in process context switch time and an overall performance improvement of 7.02% in stress tests, while maintaining satisfactory load balancing.

11:00-13:00, Paper TuBPSR.10
MorphoMove: Bi-Modal Path Planner with MPC-Based Path Follower for Multi-Limb Morphogenetic UAV

Mustafa, Muhammad Ahsan	Skolkovo Institute of Science and Technology
Yaqoot, Yasheerah	Skolkovo Institute of Science and Technology
Martynov, Mikhail	Skolkovo Institute of Science and Technology
Karaf, Sausar	Skolkovo Institute of Science and Technology
Tsetserukou, Dzmitry	Skoltech
Keywords: Robotic Systems, System Modeling and Control, Digital Twin Abstract: This paper discusses developments for a multi-limb morphogenetic UAV, MorphoGear, that is capable of both aerial flight and ground locomotion. A hybrid path planning algorithm based on the A* strategy has been developed, enabling seamless transition between air-to-ground navigation modes, thereby enhancing robot's mobility in complex environments. Moreover, precise path following is achieved during ground locomotion with a Model Predictive Control (MPC) architecture for its novel walking behaviour. Experimental validation was conducted in the Unity simulation environment utilizing Python scripts to compute control values. The algorithm's performance is validated by the Root Mean Squared Error (RMSE) of 0.91 cm and a maximum error of 1.85 cm, as demonstrated by the results. These developments highlight the adaptability of MorphoGear in navigation through cluttered environments, establishing it as a usable tool in autonomous exploration, both aerial and ground-based.
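
The two path-following figures quoted in the abstract above (RMSE and maximum error) can be computed from a reference path and the followed path. The sketch below is only an illustration: it assumes both paths are sampled at the same instants as 2D points, and the function name `tracking_errors` is hypothetical, not taken from the paper.

```python
import math

def tracking_errors(reference, actual):
    """Return (rmse, max_error) of the point-wise Euclidean deviations.

    Assumes `reference` and `actual` hold the same number of (x, y) points,
    sampled at matching time steps (an assumption made for this sketch).
    """
    deviations = [math.dist(r, a) for r, a in zip(reference, actual)]
    rmse = math.sqrt(sum(d * d for d in deviations) / len(deviations))
    return rmse, max(deviations)

# Hypothetical data: the follower is 5 units off at the first sample only.
rmse, max_error = tracking_errors([(0.0, 0.0), (0.0, 0.0)],
                                  [(3.0, 4.0), (0.0, 0.0)])
```

RMSE summarizes the typical deviation over the whole path, while the maximum error bounds the worst single sample, which is why the two are usually reported together.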

11:00-13:00, Paper TuBPSR.11
State-Of-Charge Estimation of Lithium-Ion Battery Switched Balance System

Li, Heng	Central South University
Wang, Shunli	Central South University
Zhu, Ren	Central South University
Zhao, Xiang	Central South University
Peng, Hui	Central South University
Fan, Yunsheng	Central South University
Zhang, Rui	Changsha University
Keywords: Intelligent Power Grid Abstract: This paper explores the estimation of the State of Charge (SoC) of lithium-ion batteries. Currently, the majority of research efforts focus on the SoC estimation of individual lithium-ion batteries. However, in practical scenarios, lithium-ion batteries are commonly connected with balancing circuits to address battery imbalances. Upon activation of the equalization circuit, the battery's system dynamics transition to a new mode. Therefore, it is difficult for classical SoC estimation algorithms to accurately estimate the real SoC value. In this paper, we employ a switched system methodology to estimate the battery's SoC. We describe the switched system of the Thevenin equivalent circuit model of a lithium-ion battery using a switched resistance balance circuit. Then we use the method of nonlinear switching observer to analyze the convergence and divergence. Finally, we set up an experimental platform and verify the performance of the observer through several sets of experiments.

11:00-13:00, Paper TuBPSR.12
Simulation and Control of Slope Bottlenecks Based on Cellular Automata in Mixed Traffic Flow

Zhang, Fengqi	Shandong University of Science and Technology
Qi, Liang	Shandong University of Science and Technology
Luan, Wenjing	Shandong University of Science and Technology
Liu, Kun	Shandong University of Science and Technology
Guo, Xiwang	Liaoning Petrochemical University
Keywords: Intelligent Green Production Systems, Autonomous Vehicle, System Modeling and Control Abstract: Traffic congestion frequently occurs on slope segments of highways, which are typical bottlenecks. With the development of connected and autonomous vehicle (CAV) technology, CAVs and human-driven vehicles (HDVs) will co-exist on the road. This work studies a slope bottleneck on a highway in mixed traffic scenarios and proposes a cellular-automata-based traffic flow model for slope bottlenecks that incorporates CAV platooning. A novel traffic flow control strategy for slope bottlenecks is proposed based on variable speed limit (VSL) and vehicle platooning. First, the strategy divides the upstream section of the slope bottleneck into two zones, one for VSL and one for vehicle platooning. Speed restrictions within the VSL zone reduce the inflow of vehicles into the vehicle platooning zone, creating low traffic density there. In the vehicle platooning zone, a hybrid vehicle platooning method for mixed scenarios is proposed. The experimental results demonstrate that our strategy effectively enhances traffic flow at the slope bottleneck, thereby mitigating traffic congestion.

11:00-13:00, Paper TuBPSR.13
Research on Optimization and Scheduling of MPTCP Data Networks in GNSS Network Reference Stations

Hao, Fengqi	Key Laboratory of Computing Power Network and Information Secur
Wang, Yuxin	Qilu University of Technology
Hoiio, Kong	City University of Macau
Ma, Dexin	Qingdao Agricultural University, Qingdao 266109, China
Zhao, Xiyuan	Qilu University of Technology
Keywords: Communications Abstract: The Global Navigation Satellite System Network Reference Stations (GNSS-NRS) are pivotal in modern positioning and navigation applications. However, GNSS-NRS face significant communication challenges, including instability during rapid user movement and inconsistent coverage by different operators, which often results in data transmission interruptions and delays. Moreover, the conventional Multipath Transmission Control Protocol (MPTCP) scheduling strategy does not accommodate the diversity of the data types managed by GNSS-NRS. This default approach treats all data uniformly, even under deteriorating network conditions. To overcome these limitations, this paper introduces a 'Multiple Reception' strategy based on MPTCP that ensures timely and reliable GNSS data transmission. This novel strategy differentiates between observational and control data based on their real-time criticality and applies tailored transmission techniques. Under adverse conditions, it utilizes multiple transmission paths and selectively receives control data, thus prioritizing the transfer of critical information. We conducted simulation experiments using MiniNet to compare the 'Multiple Reception' method against conventional and redundant strategies, demonstrating that it significantly reduces data loss rates in GNSS-NRS.

11:00-13:00, Paper TuBPSR.14
Multi-UAV Distributed Collaborative Path Planning Based on NTVPPSO

Liu, Guohui	Southwest University
Wang, Yujun	Southwest University
Dong, XingXiang	Southwest University
Ran, Kemeng	Southwest University
Keywords: Cooperative Systems and Control, Distributed Intelligent Systems, Modeling of Autonomous Systems Abstract: Multi-UAV cooperative path planning (MUCPP) is one of the core issues of UAV swarms. When the number of UAVs is large or the mission is complex, problems such as poor scalability and slow convergence will be faced. This paper proposes a multi-UAV distributed hierarchical collaborative path planning method based on an improved particle swarm optimization algorithm. In order to speed up the convergence and ensure the global optimality of the results, a coefficient nonlinear time-varying parallel particle swarm optimization algorithm (NTVPPSO) is designed in this method. This algorithm splits the population into multiple sub-populations. Each particle in the sub-population is updated in parallel using non-linear time-varying coefficients, and information is exchanged between the sub-populations. In order to enhance scalability, a distributed framework with a two-layer structure is designed. First, each UAV plans its own path to determine the collaborative goal. Then the collaborative path planning of multiple UAVs is completed sequentially based on heuristic priorities. The simulation results show that the proposed method can converge quickly and the collaborative path tends to be optimal, and is a feasible distributed solution to the MUCPP problem.

11:00-13:00, Paper TuBPSR.15
An Affine-Based Maneuver Control Method for Multi-Agent Cooperative Transportation System Over Switching Formations

Liu, Tianqi	Institute of Automation, Chinese Academy of Sciences
Ai, Xiaolin	Institute of Automation, Chinese Academy of Sciences
Pu, Zhiqiang	Institute of Automation, Chinese Academy of Sciences
Feng, Lv	J-Elephant Robot Company
Keywords: Cooperative Systems and Control, Distributed Intelligent Systems, System Modeling and Control Abstract: This paper addresses the maneuver control problem for multi-agent cooperative transportation systems (MACTSs) with double-integrator dynamics over switching formations. A switching formation consists of a pair of elements: the switching directed graphs and the corresponding formation configurations. In real cooperative transportation scenarios, varying interaction relationships necessitate distinct system configurations. However, it is challenging to design a control law when both the communication topology and the system configuration evolve. Drawing inspiration from advances on switching topologies, we use the characteristics of the stress matrix to build a new linear matrix inequality (LMI) and design a novel class of distributed affine-based controllers. Global convergence is achieved as long as the feedback gain matrix and the switching signal satisfy three specific conditions. We also give the corresponding algorithm to calculate the required control parameters. A Lyapunov function is constructed to demonstrate the system's stability, and a detailed simulation example is provided at the end of the paper to validate the method's efficacy.

11:00-13:00, Paper TuBPSR.16
Dynamic and Cooperative Multi-Agent Task Allocation: Enhancing Nash Equilibrium through Learning

Ribeiro da Costa, Antonio	University Rey Juan Carlos
Moreno, Daniel	University Rey Juan Carlos
Lujak, Marin	University Rey Juan Carlos
Rossetti, Rosaldo	University of Porto
Kokkinogenis, Zafeiris	University of Porto
Keywords: Cooperative Systems and Control, Cyber-physical systems, Modeling of Autonomous Systems Abstract: This paper proposes the Dynamic and Cooperative Multi-Agent Task Allocation (DC-MATA) problem, focusing on individually rational agents in a cooperative organization, which allocate dynamically changing tasks over time. DC-MATA aims at dynamically improving Nash equilibrium over time through learning in this context. Task utilities evolve dynamically, and learning, conducted in rounds, optimizes agents' task selection order to enhance system performance. Our proposed DC-MATA solution approach assigns agents to tasks with highest utility over time and tends towards the Nash equilibrium that aligns with agents' self-interest while narrowing the gap to the system optimum. We propose a priority-sensitive reward function and four action sampling algorithms (ε-greedy, ε-decay, Adapted Simulated Annealing, and Prior Sequence-Aware Sampling - PSAS) leveraging a Markov decision process (MDP) framework. Simulation experiments on our newly proposed GitHub benchmark instances confirm robust performance, facilitating efficient task allocation in the DC-MATA scenario.
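
Of the four action sampling algorithms named in the abstract above, ε-greedy and ε-decay are standard and can be sketched generically. The code below is the textbook form, not the authors' implementation: the function names, the exponential decay schedule, and its default constants are all assumptions made for illustration.

```python
import random

def epsilon_greedy(values, epsilon, rng):
    """Explore with probability epsilon, otherwise take the best-valued action."""
    if rng.random() < epsilon:
        # Exploration: any action index, uniformly at random.
        return rng.randrange(len(values))
    # Exploitation: index of the highest estimate (first one on ties).
    return max(range(len(values)), key=lambda a: values[a])

def decayed_epsilon(round_index, start=1.0, floor=0.05, rate=0.9):
    """Hypothetical epsilon-decay schedule: exponential decay, clipped at `floor`."""
    return max(floor, start * rate ** round_index)

rng = random.Random(0)          # seeded so the run is repeatable
estimates = [0.1, 0.9, 0.3]     # illustrative per-action value estimates
greedy_choice = epsilon_greedy(estimates, 0.0, rng)
```

With `epsilon` fixed the sampler is plain ε-greedy; feeding it `decayed_epsilon(round_index)` each learning round gives an ε-decay variant that explores heavily at first and settles on the greedy action later.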

11:00-13:00, Paper TuBPSR.17
A Deep Learning Approach to High Accuracy Driver Identification Using Physiological Signals with Optimal Driver Pool Size (I)

Ao, Guo	Nagoya University
Huang, Zhiying	Hosei University
Meng, Yuang	Hosei University
Ma, Jianhua	Hosei University Japan
Keywords: Biometric Systems and Bioinformatics, Deep Learning Abstract: If drivers can be correctly recognized from their physiological signals, personalized services such as dynamic adjustments to the seat, backrest, and headrest can be provided during driving to ensure comfort and safety. However, such services require high-accuracy identification of drivers, i.e., an identification accuracy of more than 99%. To determine whether such high accuracy can be achieved using drivers' physiological signals, we propose identifying only a limited group of drivers for a particular vehicle. Specifically, we built two deep learning models from three common physiological signals. To achieve high-accuracy identification, we adjusted the size of the set of drivers to be identified (the driver pool). Our results show that when the maximum driver pool sizes are 5 and 2, the identification accuracies reach 95% and 99%, respectively.

11:00-13:00, Paper TuBPSR.18
Modularity Maximization-Incorporated Nonnegative Tensor RESCAL Decomposition for Dynamic Community Detection (I)

Fang, Hao	Southwest University
Wang, Qu	Southwest University
Hu, Qicong	Southwest University
Hao, Wu	Southwest University
Keywords: Complex Network, Machine Learning, Knowledge Acquisition Abstract: Dynamic community detection is crucial for elucidating the temporal evolution of social structures, information dissemination, and interactive behaviors within complex networks. Nonnegative matrix factorization provides an efficient framework for identifying communities in static networks but falls short in depicting temporal variations in community affiliations. To solve this problem, this paper proposes a Modularity maximization-incorporated Nonnegative Tensor RESCAL Decomposition (MNTD) model for dynamic community detection. This method serves two primary functions: a) nonnegative tensor RESCAL decomposition extracts latent community structures in different time slots, highlighting the persistence and transformation of communities; and b) an initial community structure is incorporated into the modularity maximization algorithm, facilitating more precise community segmentation. Comparative analysis on real-world datasets shows that MNTD is superior to state-of-the-art dynamic community detection methods in the accuracy of community detection.

11:00-13:00, Paper TuBPSR.19
How to Reduce Loss of Personnel Arrangement? A Group MultiRole Assignment Perspective (I)

Ke, Xintong	School of Computer Science and Technology Guangdong University O
Lin, Xuewei	School of Computer Science and Technology Guangdong University O
Zhu, Haibin	Nipissing University
Liu, Dongning	Guangdong University of Technology
Keywords: Cooperative Systems and Control, Adaptive Systems, Decision Support Systems Abstract: Most project development processes are iterative and can be divided into multiple tasks. One person can take on multiple tasks, and one task can be assigned to multiple people. The many-to-many personnel allocation method greatly improves the efficiency of the project and saves the cost of the project. There will be two different losses in this allocation plan: the tasks undertaken by personnel are too discrete and the personnel are easily distracted when undertaking important tasks. This paper first formally models the project personnel allocation problem through the Group Multi Role Assignment (GMRA) model. Then two new constraint formulas were proposed to extend the GMRA model to reduce the loss of personnel allocation and the necessary and sufficient conditions of the extended method were proved. Subsequently, two large-scale simulation experiments were carried out to compare and demonstrate the differences between the expanded new method and the original model, and to explore the sufficient and necessary conditions to increase the speed of finding feasible solutions for the new method. Using the improved model for arranging personnel of development projects not only enables efficient many-to-many allocation but also helps reduce a lot of hidden losses in the project process.

11:00-13:00, Paper TuBPSR.20
Solving the Bank Credit Decision Problem Via Revised Group Multi-Role Assignment (I)

Zeng, Kaizhe	Guangdong University of Technology
Liu, Weijian	Guangdong University of Technology
Zhu, Haibin	Nipissing University
Liu, Dongning	Guangdong University of Technology
Keywords: Decision Support Systems, Cooperative Systems and Control Abstract: The Bank Credit Decision-Making Problem (BCDMP) is one of the main issues that bank operations need to face. To maximize the bank's profit and obtain the best possible loan plan, this article suggests converting BCDMP into a Many-to-Many Assignment Problem, which can be specified by Group Multi-Role Assignment (GMRA). GMRA is a sub-model of E-CARGO. With the revised GMRA, the relationship between enterprises and loans is converted into the relationship between agents and roles, and a multi-dimensional, multi-index evaluation method is used to evaluate the matching degrees between enterprises and loans. We use the Entropy Weight Method (EWM) and the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) to obtain each enterprise's score for a loan from four indicators: profits, inventory, turnover ability, and credit. We then consider the impact of different loan interest rates on bank profits, obtain enterprise scores under different loan interest rates, and use linear programming to solve the problem, achieving good results (the bank achieved a profit margin of 5.43176% via the revised GMRA). Author keywords: E-CARGO, The Bank Credit Decision-Making Problem (BCDMP), Group Multi-Role Assignment (GMRA), Revised Group Multi-Role Assignment (RGMRA), Entropy Weight Method (EWM), Technique for Order Preference by Similarity to Ideal Solution (TOPSIS).
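
The scoring step described in the abstract above combines two standard techniques, which can be sketched as follows. This is the textbook form of EWM and TOPSIS, not the authors' code. It assumes every indicator is strictly positive and benefit-type (larger is better), that at least one indicator varies across enterprises, and that not all enterprises are identical. The function names and the sample figures are hypothetical.

```python
import math

def entropy_weights(matrix):
    """Entropy Weight Method: indicators that vary more receive more weight."""
    m, n = len(matrix), len(matrix[0])
    diversities = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        shares = [x / total for x in column]
        # Shannon entropy scaled to [0, 1]; 0 * ln(0) is treated as 0.
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(m)
        diversities.append(1.0 - entropy)
    norm = sum(diversities)
    return [d / norm for d in diversities]

def topsis_scores(matrix, weights):
    """TOPSIS closeness in [0, 1]: 1 is the ideal row, 0 the anti-ideal row."""
    n = len(matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n)]
                for row in matrix]
    best = [max(row[j] for row in weighted) for j in range(n)]
    worst = [min(row[j] for row in weighted) for j in range(n)]
    scores = []
    for row in weighted:
        d_best, d_worst = math.dist(row, best), math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores

# Illustrative rows: profits, inventory, turnover ability, credit (made-up values).
enterprises = [
    [120.0, 30.0, 4.0, 0.70],
    [200.0, 45.0, 6.0, 0.90],
    [80.0, 20.0, 3.0, 0.60],
]
weights = entropy_weights(enterprises)
scores = topsis_scores(enterprises, weights)
```

In this made-up data the second enterprise is best on every indicator and the third is worst on every indicator, so they sit at the two ends of the score range. Cost-type indicators would need to be inverted before either step.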


TuTU	MR05
“eud1w” (Extended Reality and Intelligent Robotics for Cognitive Wellness (XRICog): Innovating Mental Health, Workload Management, and Decision-Making through XR and AI-Enhanced Social Robots)


TuK4N	HALL C&D
Keynote 4 Chairperson: Prof. Haibin Prof. Dr. S. Joe Qin


TuCT1	MR01
Computational Intelligence and Soft Computing 3	Regular Papers - Cybernetics
Chair: Du, Guodong	Harbin Institute of Technology, Shenzhen

15:00-15:20, Paper TuCT1.1
Anomaly-Free Prior Guided Knowledge Distillation for Industrial Anomaly Detection

Li, Gang	Qilu University of Technology
Chen, Tianjiao	Qilu University of Technology
Li, Min	Qilu University of Technology
Han, Delong	Qilu University of Technology
Zhou, Mingle	Qilu University of Technology
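Keywords: Machine Vision, Application of Artificial Intelligence, Deep Learning Abstract: In industrial manufacturing, visual anomaly detection works by detecting and preventing production anomalies. Anomaly detection methods based on knowledge distillation show promising performance in handling the unpredictability and diversity of anomalies. However, they lack effective guidance from anomaly-free priors when processing anomalous features, and they underuse multi-scale features in the segmentation scoring stage, which leads to suboptimal detection results. To alleviate these problems, we propose Anomaly-Free Prior Guided Knowledge Distillation (APG) for industrial anomaly detection. First, it filters anomalous features by training a denoising target network with a knowledge distillation structure. At the same time, we propose a prior-aware propagation module (P3M), which extracts more effective anomaly-free features by imposing constraints on anomalous features. Second, we propose a multi-scale prior-guided fusion module (MPGFM), which uses the anomaly-free features of the target network as priors ...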

15:20-15:40, Paper TuCT1.2
FGSNet: A Finer-Grained Siamese Network for Industrial Few-Shot Anomaly Detection

Han, Delong	Qilu University of Technology
Xu, Luo	Qilu University of Technology
Zhou, Mingle	Qilu University of Technology
Li, Gang	Qilu University of Technology
Li, Min	Qilu University of Technology
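Keywords: Machine Vision, Application of Artificial Intelligence, Deep Learning Abstract: Anomaly detection usually relies on a large number of training samples, yet anomalous samples are relatively rare and hard to obtain in everyday industrial scenarios. In the few-shot anomaly detection (FSAD) setting, how to make better use of the fine-grained features of a few images is a key question. To address these issues, this paper proposes a Finer-Grained Siamese Network (FGSNet). FGSNet targets the FSAD setting and consists of two stages. The first stage extracts finer-grained features and coarsely aligns the image features, and it incorporates an efficient fine-grained feature fusion module (F3M) to enhance the feature representation. In addition, we design a deep supervision loss to improve fine-grained information extraction. The second stage refines and denoises the fused features and then performs a detailed alignment for the subsequent feature distribution modeling. Compared with most existing FSAD models, which use single-class training and single-model evaluation ...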

15:40-16:00, Paper TuCT1.3
Self-Supervised Generative Pre-Trained Model with a Learnable Mask Network for Industrial Time Series Prediction

Wang, Chenze	Tongji University
Wang, Han	Tongji University
Liu, Qing	Tongji University
Keywords: Deep Learning, Application of Artificial Intelligence, Representation Learning Abstract: Industrial time series prediction (ITSP) is an indispensable part of predictive control in modern industry. Recently, supervised deep learning-based methods have provided solutions with sufficient annotated data. However, there is massive unlabeled data with complex temporal features in modern industrial production, resulting in poor performance of these methods. To address this problem, a self-supervised generative pre-trained model with a learnable mask network (SSGPM-LMN) is proposed in this paper. First, the multivariate time series are made into patches channel-independently. Then, these patches are fed into a Transformer encoder with the learnable mask-reconstruction paradigm, drawing mask indices with high temporal features by calculating the cosine similarity in low-dimensional feature space to better learn general representations. Furthermore, a two-step fine-tuning strategy, including linear probing and full fine-tuning, is adopted for various downstream scenarios. Finally, extensive experimental results on case studies of ITSP and transfer learning indicate that our SSGPM-LMN achieves superior performance.

16:00-16:20, Paper TuCT1.4
Impacts of Darwinian Evolution on Pre-Trained Deep Neural Networks

Du, Guodong	Harbin Institute of Technology, Shenzhen
Jiang, Runhua	Xiamen University Malaysia
Yang, Senqiao	Harbin Institute of Technology, Shenzhen
Li, Haoyang	Xiamen University Malaysia
Chen, Wei	Harbin Institute of Technology, Shenzhen
Li, Keren	Shenzhen University
Goh, Sim Kuan	Xiamen University Malaysia
Tang, Ho-Kin	Harbin Institute of Technology (Shenzhen)
Keywords: Computational Intelligence, Deep Learning, Evolutionary Computation Abstract: Darwinian evolution of the biological brain is documented through multiple lines of evidence, although the modes of evolutionary changes remain unclear. Drawing inspiration from the evolved neural systems (e.g., visual cortex), deep learning models have demonstrated superior performance in visual tasks, among others. While the success of training deep neural networks has been relying on back-propagation (BP) and its variants to learn representations from data, BP does not incorporate the evolutionary processes that govern biological neural systems. This work proposes a neural network optimization framework based on evolutionary theory. Specifically, BP-trained deep neural networks for visual recognition tasks obtained from the ending epochs are considered the primordial ancestors (initial population). Subsequently, the population evolved with differential evolution. Extensive experiments are carried out to examine the relationships between Darwinian evolution and neural network optimization, including the correspondence between datasets, environment, models, and living species. The empirical results show that the proposed framework has positive impacts on the network, with reduced over-fitting and an order of magnitude lower time complexity compared to BP. Moreover, the experiments show that the proposed framework performs well on deep neural networks and big datasets.

16:20-16:40, Paper TuCT1.5
Conflict-Free Genetic Algorithm with Nash Equilibrium Seeking for Game-Based Battery Swapping Station Recommendation

Sun, Changlong	South China University of Technology
Xu, Xinxin	Ocean University of China
Chen, Chun-Hua	South China University of Technology
Hong, Jun	South China University of Technology
He, Zhenan	Sichuan University
Yu, Dengxiu	Northwestern Polytechnical University
Kwong, Sam Tak Wu	Lingnan University
Zhan, Zhi-Hui	South China University of Technology
Keywords: Evolutionary Computation, Computational Intelligence Abstract: The rapid growth of electric vehicles (EVs) has led to significant challenges in providing efficient and sustainable charging solutions. This paper addresses the battery swapping station (BSS) recommendation problem by proposing a novel conflict-free genetic algorithm (CFGA) integrated with a Nash equilibrium seeking (NES) approach to identify optimal Nash equilibrium (ONE) solutions to such a non-cooperative optimization problem. The CFGA employs specialized crossover and mutation operators to generate offspring that satisfy the constraints of the problem, ensuring that each EV decides a unique battery swap strategy without conflict. Firstly, an order crossover operator is proposed to preserve the order of genes in the chromosomes. Secondly, a replacement and exchange mutation operator is proposed to enhance mutation diversity. The resulting optimal solution is then used as the initial strategy for the NES, which iteratively converges to the ONE. The proposed CFGA with NES algorithm is evaluated under both small-scale and large-scale cases, demonstrating its effectiveness in achieving a balance between costs for EVs and utilization for BSSs. The study’s findings have practical implications for the smart grid and EV integration, offering a robust method for optimizing EV infrastructure and operations.
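
The abstract above mentions an order crossover operator whose offspring stay conflict-free. The paper's exact operator is not described here, so the sketch below shows only a simplified variant of the classic order crossover (OX) on permutations, under the assumption that a chromosome is a permutation in which no gene may repeat. The function name and the left-to-right filling rule are illustrative choices.

```python
def order_crossover(parent_a, parent_b, start, end):
    """Simplified order crossover on two permutations of the same genes.

    The slice parent_a[start:end] is copied in place; every other position is
    filled, left to right, with the remaining genes in the order in which they
    appear in parent_b. The child is always a valid permutation (no repeats).
    """
    size = len(parent_a)
    child = [None] * size
    child[start:end] = parent_a[start:end]
    kept = set(parent_a[start:end])
    remaining = [gene for gene in parent_b if gene not in kept]
    free_positions = [k for k in range(size) if not (start <= k < end)]
    for position, gene in zip(free_positions, remaining):
        child[position] = gene
    return child

child = order_crossover([1, 2, 3, 4, 5], [5, 4, 3, 2, 1], 1, 3)
```

Because each gene is placed exactly once, no repair step is needed after crossover, which is the property a conflict-free encoding relies on.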

16:40-17:00, Paper TuCT1.6
Transforming GP-CNN Tree Search into Trainable Architectures for Image Classification

Sun, Feng	Shenzhen Institute for Advanced Study, University of Electronic
Ke, Yan	Shenzhen Institute for Advanced Study, University of Electronic
Gong, Yue-Jiao	South China University of Technology
Li, Yun	Shenzhen Institute for Advanced Study, University of Electronic
Keywords: Evolutionary Computation, Computational Intelligence, Image Processing and Pattern Recognition Abstract: Data-efficient image classification poses a challenge in achieving effectiveness with limited data, as evidenced by the current methods based on convolutional neural networks (CNNs) and genetic programming (GP). Existing works employing these two methods encounter limitations, such as a lack of flexibility and an inability to effectively explore the latent features of the data. To tackle these challenges, this paper introduces a genetic programming method for data-efficient image recognition, leveraging novel function sets, terminal sets, and program structures. This method transforms tree-based data structures in GP into trainable CNN architectures. Further, by employing block structures instead of single operations in the search space, the search space is reduced and the stability of the search structures enhanced. Comparative experiments with state-of-the-art neural network methods and GP-based methods on data-efficient classification datasets validate the GP-CNN method offering higher performance.


TuCT2	MR02
Deep Learning and Neural Networks 6	Regular Papers - Cybernetics
Chair: Zhang, Mengkang	Dalian University of Technology

15:00-15:20, Paper TuCT2.1
Ellipsis Resolution: Generative Adversarial Networks and Fine-Tuning with Large-Scale Models for Improved Performance

Zhang, Mengkang	Dalian University of Technology
Xu, Xiujuan	Dalian University of Technology
Zhao, Xiaowei	Dalian University of Technology
Keywords: Deep Learning Abstract: Ellipsis phenomenon is prevalent, particularly in languages like Chinese, and can be observed across various domains, including everyday conversations, literary works, and product evaluations. Such ellipsis poses challenges for machines, affecting the performance of natural language processing tasks such as machine translation and comprehension. Current approaches to tackle ellipsis primarily involve fine-tuning pre-trained language models like BERT. This approach encompasses two subtasks: 1) detecting ellipsis positions, i.e., identifying locations where ellipsis occurs within a sentence, and 2) completing the elliptical content by predicting the missing information based on the detected positions. This paper proposes a novel model that combines fine-tuned BERT with Generative Adversarial Networks (GANs), incorporating a sampler between the generator and discriminator. The proposed method simultaneously utilizes the generator, discriminator, and sampler to locate and complete ellipsis. Experimental results demonstrate that our approach achieves an F1 score improvement of 0.03 in ellipsis position detection and an EM (Exact Match) improvement of 0.1 in ellipsis content completion compared to the baseline method.

15:20-15:40, Paper TuCT2.2
Fabric Defect Detection Based on Hybrid Attention Transformer and Improved Cascade R-CNN

Yao, Li	Donghua University
Song, Simeng	Donghua University
Wan, Yan	Donghua University
Keywords: Deep Learning, Neural Networks and their Applications, Image Processing and Pattern Recognition Abstract: Various defects arise during textile production, making fabric defect inspection essential for production and quality management in the textile industry. However, fabric defect detection techniques face challenges due to defects with disparate aspect ratios, high foreground-background similarity, and tiny sizes. Based on these issues, we propose a fabric defect detection method that combines an improved Cascade R-CNN (SPCNet) network and Super-Resolution reconstruction technology. Firstly, defective images are reconstructed using a hybrid attention Transformer (HAT), enhancing texture details and edge information. We design a new multi-stage defect detection model SPCNet to identify fabric defects. The architecture includes a feature extraction module based on Switchable Atrous Convolution (SAC). SAC can obtain a larger receptive field for better detection of tiny defects. Path Aggregation Network (PANet) is introduced to improve the recognition of scale-unbalanced defects. In addition, Cascade RPN (C-RPN) is adopted to fully use deep and shallow features. To solve the issue of imbalanced defect classes, we adopt Class-aware Sampling (CAS) strategy. Soft-NMS is used to reduce the false deletion of defective feature detection boxes. Comparative experimental results demonstrate that the defect detection method combining HAT and SPCNet can significantly raise the overall recognition rate of multi-class fabric defects, exceeding the performance of other current methods.

15:40-16:00, Paper TuCT2.3
DHFusion: Deep Hypergraph Convolutional Network for Multi-Modal Physiological Signals Fusion

Shen, Yuanhang	South China University of Technology
Chen, C. L. Philip	South China University of Technology
Zhang, Tong	South China University of Technology
Keywords: Multimedia Computation, Deep Learning Abstract: Multi-modal physiological signal fusion integrates multiple heterogeneous physiological signals to characterize human physiological activities, and is a basic research topic in biomedical signal processing. Most research on multi-modal physiological signal fusion is based on heterogeneous graph fusion networks (HGFNs), which ignore the high-order correlations between multi-modal physiological signals. Moreover, most HGFNs suffer from the over-smoothing problem, causing failures in deeper networks. To address these issues, this paper proposes a deep hypergraph convolutional network called DHFusion for multi-modal physiological signal fusion. Specifically, DHFusion designs a deep hypergraph convolution (DHGCN) layer to effectively integrate multiple physiological signals and obtain multi-modal features. DHFusion then introduces a JK readout layer that enables multi-layer DHGCN to capture deep multi-modal features based on high-order correlations. DHFusion effectively addresses the over-smoothing of HGFNs and performs well in representing deep multi-modal features. Experimental comparisons with state-of-the-art methods on two benchmark datasets demonstrate the effectiveness of the proposed method.

16:00-16:20, Paper TuCT2.4
Internet Outages During Times of Conflict

Oliveira, Luiz Felipe	IFRJ
Ballantyne, Rob	Simon Fraser University
Souza, Jano	Federal University of Rio De Janeiro
Trajkovic, Ljiljana	Simon Fraser University
Keywords: Machine Learning, Deep Learning, Complex Network Abstract: We analyze datasets collected by the Center for Applied Internet Data Analysis (CAIDA) and the Réseaux IP Européens (RIPE) sites to demonstrate the efficacy of machine learning models in predicting Internet outages and disruptions that affect and hinder network access to essential services and humanitarian aid in times of conflict. Custom datasets for specific geographic areas are created using data filtering. A sliding time window transformation was applied to design models that accurately predict disruptions. Model performance is evaluated based on the prediction accuracy using Internet Outage Detection and Analysis (IODA) and RIPE datasets.
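As an illustration of the sliding time window transformation mentioned in the abstract above, the following minimal sketch turns a time series into fixed-width windows paired with the label of the next time step. The function name, the window width, and the toy data are assumptions made for this example; the abstract does not specify the authors' actual windowing scheme.

```python
from typing import List, Sequence, Tuple


def sliding_windows(
    series: Sequence[float], labels: Sequence[int], width: int
) -> Tuple[List[List[float]], List[int]]:
    """Pair each window of `width` past samples with the label that follows it.

    Hypothetical helper: the window layout is an assumption, not the paper's.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if len(series) != len(labels):
        raise ValueError("series and labels must have the same length")

    features: List[List[float]] = []
    targets: List[int] = []
    # The window covers samples [end - width, end); the target is the label at `end`.
    for end in range(width, len(series)):
        features.append(list(series[end - width:end]))
        targets.append(labels[end])
    return features, targets


if __name__ == "__main__":
    # Toy signal and outage flags (made-up values for illustration only).
    signal = [1.0, 2.0, 3.0, 4.0, 5.0]
    outage = [0, 0, 1, 0, 1]
    X, y = sliding_windows(signal, outage, width=3)
    print(X, y)
```

Each row of the resulting feature list can then be passed to a classifier, so a model sees only past samples when predicting a disruption at the next step.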

16:20-16:40, Paper TuCT2.5
Unveiling the Black Box: Differential Cryptanalysis with XAI

Goi, Yue-Tian	Monash University Malaysia
Leong, Shu-Min	Monash University Malaysia
Phan, Raphael	Monash University
Lai, Shangqi	CSIRO's Data61
Salagean, Ana	Loughborough University
Keywords: Machine Learning, Deep Learning, Neural Networks and their Applications Abstract: At CRYPTO’19, Gohr[1] presented ResNet-based neural distinguishers (ND) for the round-reduced SPECK32/64 cipher. However, due to the black-box use of such deep learning models, it is hard for humans to understand why these distinguishers work, impeding advancements in cryptanalytic knowledge. In this work, we aim to effectively adapt eXplainable Artificial Intelligence (XAI) techniques, notably Local Interpretable Model-Agnostic Explanations (LIME) and Shapley Additive Explanations (SHAP), to gain a detailed understanding of the important features useful in Gohr’s neural distinguishers.

16:40-17:00, Paper TuCT2.6
CMACC: Cross-Modal Adversarial Contrastive Learning in Visual Question Answering Based on Co-Attention Network

Yi, Zepu	Huazhong University of Science and Technology
Lu, Songfeng	Huazhong University of Science and Technology
Tang, Xueming	Huazhong University of Science and Technology
Wu, Junjun	Huazhong University of Science and Technology
Zhu, Jianxin	Huazhong University of Science and Technology
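Keywords: Neural Networks and their Applications, Deep Learning, Machine Vision Abstract: Visual question answering (VQA) presents an important challenge at the intersection of computer vision and natural language processing. To better understand the relationship between questions and images, our model introduces an approach named CMACC (Cross-Modal Adversarial Contrastive learning based Co-attention). CMACC effectively integrates image and text information through a co-attention mechanism that employs cross-modal adversarial contrastive learning. Adversarial learning aligns the latent feature distributions of text and images, while contrastive learning brings together multi-modal samples that describe the same context, thereby enhancing the model's adaptability to multi-modal inputs. To improve robustness and promote the learning of more generalized feature representations, contrastive learning is incorporated as a component of the loss function, explicitly emphasizing the perceived differences between similar and dissimilar features. In addition, advanced data augmentation techniques are integrated to further enhance the model's ... to different ...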


TuCT3	MR03
Cybersecurity and Assurance	Regular Papers - HMS
Chair: Soniya, Soniya	University of Western Ontario

15:00-15:20, Paper TuCT3.1
A Fast and Safe Neuromorphic Approach for Obstacle Avoidance of Unmanned Aerial Vehicle

Wan, Zhong	Defense Innovation Institute
Zhang, Xiangyu	Defense Innovation Institute
Xiao, Xun	National University of Defense Technology
Zhao, Jingyue	Defense Innovation Institute
Tie, Junbo	National University of Defense Technology
Chen, Renzhi	Defense Innovation Institute
Xu, Shi	Defense Innovation Institute
Zhang, Guangda	Defense Innovation Institute
Wang, Lei	Defense Innovation Institute
Dai, Huadong	Defense Innovation Institute
Keywords: Brain-based Information Communications, Systems Safety and Security, Supervisory Control Abstract: Obstacle avoidance is a crucial task in unmanned aerial vehicles (UAV) motion planning. The accuracy and consistency of real-time visual information affect the generation of obstacle avoidance commands, raising higher safety demands for obstacle avoidance. The neuromorphic computing-based obstacle avoidance solution can address these challenges. Dynamic vision sensors (DVS) exhibit low latency, low power consumption, and high dynamic range as novel neuromorphic sensors. Spiking neural networks (SNN) also leverage the same mechanism to efficiently process asynchronous and sparse event data generated by DVS, offering latency and energy efficiency advantages. Additionally, the optimal estimation method effectively mitigates the impact of noise and interference within the system, reducing the influence of errors on the algorithm and enhancing safety. Based on these considerations, this paper proposes a fast and safe obstacle avoidance framework. DVS is used to acquire event data from the environment, and a hardware-compatible lightweight SNN is employed to extract dynamic obstacle position information from the data. Compared to baseline methods, this approach reduces latency by 85%. Furthermore, two estimation methods are used to predict the movement of obstacles, ensuring flight safety by generating different UAV obstacle avoidance actions based on confidence intervals, even in the presence of obstacle information errors and omissions.

15:20-15:40, Paper TuCT3.2
Detection of Low Rate DDoS with Adversarial Attacks by Enhanced Generative Adversarial Networks

Hsiao, Tzu-Yi	National Taipei University of Technology
Yang, Shih-Hsuan	National Taipei University of Technology
Keywords: Systems Safety and Security, Networking and Decision-Making Abstract: The Internet of Things (IoT) is vulnerable to cybersecurity attacks because it is typically designed with minimal security considerations. The Low-Rate Distributed Denial-of-Service attacks (LR-DDoS) evade detection by injecting a small portion of malicious packets that persistently lurk within the IoT network. Recently, adversarial LR-DDoS attacks by machine learning approaches further complicate the protection. In this study, an improved detection algorithm based on a generative adversarial network (GAN) is proposed against LR-DDoS. To learn the temporal characteristics, LR-DDoS time-series features are taken as a part of the generator’s input and LSTM (Long Short-Term Memory) modules are incorporated. Two discriminators with adaptive weighting are employed, one for conventional LR-DDoS attacks and the other for the adversarial attacks, to generate plausible LR-DDoS samples. The generated packets are used to train the detection algorithm of an Intrusion Detection System (IDS). Simulation results show that the proposed method achieves an accuracy of 94% in detecting adversarial LR-DDoS packets, which outperforms the state-of-the-art methods by at least 10%.

15:40-16:00, Paper TuCT3.3
AODPFL: An Adaptive Optimization Method for Differentially Private Federated Learning

Qiu, Mengxing	Hebei University
Liang, Xiaoyan	Hebei University
Du, Ruizhong	Hebei University
Keywords: Systems Safety and Security Abstract: Federated learning addresses the issues of data silos and privacy to some extent by training models locally on client devices, only uploading model parameters, and aggregating them on the central server. However, attackers can still infer private data information from the uploaded parameters. To solve this issue, differential privacy technology is introduced. The key to differential privacy lies in gradient clipping and noise addition. However, the traditional gradient clipping method often faces the gradient distortion issue and will be ineffective if the noise is large enough. Regarding noise addition, commonly used fixed or fixed decay rate noise scale settings often overlook the characteristics of gradients during training, which might lead to adding improper noise to gradients. To address these issues, we propose an adaptive optimization method for differentially private federated learning (AODPFL). Specifically, we adopt a strategy of clipping the gradient by grouping, effectively reducing gradient distortion. We design an adaptive clipping threshold based on the gradient changes during training to improve model accuracy under large noise conditions. We design a noise scale decay method with a dynamic decay rate to allocate the privacy budget more reasonably and inject appropriate noise into gradients. Experimental results show that compared to other gradient clipping and noise addition methods, our method achieves higher accuracy under the same privacy budget.
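For readers unfamiliar with the two steps the abstract above identifies as the key to differential privacy, the sketch below shows the conventional baseline: clip a gradient to a fixed L2 norm, then add Gaussian noise scaled to that norm. This is the traditional fixed-threshold scheme the abstract contrasts against, not AODPFL's grouped clipping or adaptive noise decay; the function names and the `noise_multiplier` parameter are assumptions made for illustration.

```python
import math
import random
from typing import List


def clip_by_l2_norm(grad: List[float], clip_norm: float) -> List[float]:
    """Scale the gradient down so its L2 norm is at most `clip_norm`."""
    norm = math.sqrt(sum(g * g for g in grad))
    # Gradients already inside the norm ball are left unchanged.
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def add_gaussian_noise(
    grad: List[float], clip_norm: float, noise_multiplier: float, rng: random.Random
) -> List[float]:
    """Add zero-mean Gaussian noise with std = noise_multiplier * clip_norm."""
    sigma = noise_multiplier * clip_norm
    return [g + rng.gauss(0.0, sigma) for g in grad]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demo is repeatable
    raw = [3.0, 4.0]        # toy gradient with L2 norm 5
    clipped = clip_by_l2_norm(raw, clip_norm=1.0)
    private = add_gaussian_noise(clipped, clip_norm=1.0, noise_multiplier=0.5, rng=rng)
    print(clipped, private)
```

With a fixed `clip_norm`, every large gradient is rescaled by the same rule regardless of training stage, which is the distortion issue the abstract attributes to the traditional method.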

16:00-16:20, Paper TuCT3.4
Are ViTs Weak against Patch Attacks?

Soniya, Soniya	University of Western Ontario
Munz, Phil	TorjAI
Narayan, Apurva	University of Western Ontario
Keywords: Systems Safety and Security Abstract: Nowadays, vision transformers (ViTs) are one of the most prominent state-of-the-art models for vision-based tasks. ViTs are being used widely in many safety-critical applications ranging from health care to automotive. However, these widespread deployments also bring the risk of adversarial attacks on ViTs to the forefront. Thus, understanding the vulnerability of vision transformers against possible adversarial attacks is necessary before deployment in real-time scenarios. Adversarial patch attacks represent a practical threat to the viability of ViT-based real-world applications. This study delves into the vulnerability of vision transformers to such attacks, exploring both single and multi-patch adversarial attacks to gauge the robustness of vision transformers across benchmark datasets, including CIFAR-10, CIFAR-100, Tiny ImageNet, and ImageNet-1k. Experimentally, our findings reveal that poly-multi patch attacks constitute formidable adversarial threats, with vision transformers exhibiting greater vulnerability to poly-multi attacks than single, mono, and split-multi attacks. Additionally, we qualitatively elucidate the impact of patch location on the efficacy of adversarial attacks, providing insights into the factors influencing their effectiveness. Through this study, we aim to enhance our understanding of vision transformers’ susceptibility to adversarial patch attacks, contributing to developing strategies for strengthening their security and resilience in real-world applications.

16:20-16:40, Paper TuCT3.5
KI-Mix: Enhancing Cyber Threat Detection in Incomplete Supervision Setting through Knowledge-Informed Pseudo-Anomaly Generation

Yang, Gang	National University of Defense Technology
Wu, Bo	National University of Defense Technology
Fan, Linna	National University of Defense Technology
Tao, Xia	National University of Defense Technology
He, Jun	National University of Defense Technology
Keywords: Systems Safety and Security, Networking and Decision-Making Abstract: Data-driven methodologies have exhibited remarkable performance in identifying various cyber threats. However, obtaining well-labeled training samples is enormously expensive and often challenging when tackling practical cyber-security problems, due to the cost and difficulties in data annotation. To address this issue, we propose KI-Mix, a novel pseudo-anomaly generation algorithm for cyber threat detection on the basis of the limited labeled anomalies and a large volume of unlabeled data. In a nutshell, KI-Mix incorporates security domain knowledge into data interpolation to capture more labeled data to facilitate semi-supervised detection of cyber anomalies. We compare the performance of KI-Mix with several commonly applied augmentation techniques, such as Mixup and CutMix, to evaluate its effectiveness in limited annotated data settings. Through extensive experiments on five security datasets covering various aspects of network threats, we demonstrate that KI-Mix outperforms other methods with equivalent baseline models. Notably, KI-Mix is model-agnostic, enabling any data-driven threat detection model to handle incomplete supervision problems in real-world cyber threat detection.

16:40-17:00, Paper TuCT3.6
SEASHA3: Secure and Efficient Encryption for the IoT Data by Replacing Subkeys of AES with SHA3

Soni, Ankush	BITS Pilani Goa Campus
Sahay, Sanjay K.	BITS Pilani, Goa, India
Keywords: Systems Safety and Security Abstract: Advanced Encryption Standard (AES) is one of the most widely used ciphers for data encryption. However, recent studies have pointed out some weaknesses in the subkeys used in AES. Also, the sequential subkey generation process of AES seems to be inefficient for resource-constrained IoT devices. Therefore, in this paper, we study how secure and efficient subkeys generated by the Secure Hash Algorithm-3 (SHA3) are for the encryption of IoT data when they simply replace the AES subkeys. For this purpose, we used the statistical test suite provided by the National Institute of Standards and Technology and found that SHA-3 generated subkeys are highly random and that the scheme is significantly efficient for encrypting IoT data. However, because symmetric keys are used repeatedly, the encryption is significantly (approximately 1300 times) more efficient only for data of up to 2 MB. From the experimental analysis, we find that SEASHA3 generates subkeys that are random, non-linear, and irreversible, making it more secure and efficient than traditional AES subkey generation.


TuCT4	MR04
BMI - Movements, Tones and Music Detection with Brain-Computer Interfaces (Chair: Yaoping Hu)	BMI Workshop Papers
Chair: Yang, Shiau Ru	NYCU

15:00-15:20, Paper TuCT4.1
Adaptive Integrating General and Personalized Features for Enhanced Decoding of Motor Imagery EEG Signals Via HyperNet-Based Module

Kim, Si-Hyun	Korea University
Kim, Sung-Jin	Korea University
Lee, Dae-Hyeok	Korea University
Kwak, Heon-Gyu	Korea University
Lee, Seong-Whan	Korea University
Keywords: Active BMIs Abstract: Brain-computer interface (BCI) technology enables communication between humans and devices by reflecting users' status and intentions. Electroencephalography (EEG) signals are utilized to capture brain electrical activity with no surgical operation. When conducting motor imagery (MI), one of the endogenous BCI paradigms, the users imagine the movement of muscles used when performing a certain movement without actual physical movement. However, not all subjects show outstanding classification performance in decoding MI-based EEG signals. We propose a novel method that utilizes the weights of the pre-trained model to generate personalized weights, effectively combining the general MI features with the personalized features. We used 5-fold cross-validation to evaluate performance and conducted the experiments with 3 different pre-trained models (Top-3, Top-5, and Top-7). We compared the performance of our proposed method with the baseline and with full fine-tuning. Compared with the baseline, our proposed method improved the average accuracy in all pre-trained models, by 0.123, 0.138, and 0.143, respectively. Compared with full fine-tuning, the average accuracies of our proposed method were the highest in all pre-trained models, and the differences in the average accuracies were 0.012, 0.019, and 0.009, respectively. Hence, we demonstrated the possibility of improving the precision and effectiveness of EEG-based systems by reflecting the individual differences in EEG signals among subjects with low classification accuracy.

15:20-15:40, Paper TuCT4.2
Recognition of Mandarin Tones in Spoken Sentences from Brain Dynamics Using 2DCNN for Brain-Computer Interfaces

Yang, Shiau Ru	NYCU
Jen-Tzung, Chien	Department of Electronics and Electrical Engineering, National Y
Chen-Yi, Lee	National Yang Ming Chiao Tung University
Keywords: Brain-Computer Interfaces, Brain-based Information Communications, Human-Machine Interface Abstract: Direct speech brain-computer interface (DS-BCI) technology facilitates mind-reading through brain signals, fostering direct communication and control mechanisms. The applications of DS-BCI are extensive, transcending language communication to encompass fields like healthcare and assistive technologies. Given that Mandarin is the predominant tonal language globally, investigation of the tonal characteristics of this language is imperative for DS-BCI advancement. This study collected electroencephalogram (EEG) data from subjects articulating Mandarin sentences, which were then synchronized with corresponding audio recordings. Each word was segmented at the vocalization onset, with a capture window extending 300 milliseconds prior and 200 milliseconds post-articulation. We further engineered a multi-layered CNN architecture for EEG tone recognition. The rational asymmetry (RASM) feature extraction technique proved instrumental in reducing data dimensionality and enhancing model training, culminating in optimal accuracy to identify Tones 1 through 4 at 25.55%, 30.42%, 35.13%, and 27.04%, respectively. This research advances the field by employing complete sentences for EEG-based Mandarin tone recognition.

15:40-16:00, Paper TuCT4.3
Decoding Musical Timbre Perception from Single-Trial EEG Data

Satkunarajah, Praveena	Memorial University of Newfoundland
Power, Sarah	Memorial University of Newfoundland
Zendel, Benjamin	Memorial University of Newfoundland
Keywords: Passive BMIs, Other Neurotechnology and Brain-Related Topics Abstract: Many users of hearing aids report challenges when listening to music. In the future, it may be possible to develop hearing aids with electrodes that monitor brain activity in real time and adapt the filters on the hearing aid to match the volitions of the user. In music, this could mean amplifying the sound of the instrument the listener wants to hear. One of the first steps in this research is to determine whether a machine learning algorithm can identify which instrument an individual is listening to based only on a brief EEG signal. To test this possibility, participants were presented with a series of brief tones that varied in timbre (Trombone, Clarinet, Cello, Piano and Pure Tone) while their ongoing EEG was recorded from 73 electrodes. Linear Discriminant Analysis (LDA) was used. We investigated four different sets of features: Raw EEG, ERP-based features, Harmonics-based features and Regularity-based features. The Raw EEG based classifier performed significantly above chance (37%) when attempting to distinguish between responses to different musical instruments for 5-way classification. More advanced classification algorithms or different features may be able to better distinguish between tones with a musical timbre.

16:00-16:20, Paper TuCT4.4
Brain-Computer Interface System Based on Common Spatial Patterns for Inner Speech Recognition from Electroencephalography Signal by Using a Convolutional Neural Networks

Abdalla, Hussna	University Putra Malaysia, Malaysia & Sudan International Univer
Al-Haddad, Syed Abdul Rahman	Universiti Putra Malaysia
Bin Basri, Hamidon	University Putra Malaysia
Aris, Ishak	Universiti Putra Malaysia
Yusof, Abdul Hanif Khan	University Putra Malaysia
Neyaz, Hisham	University Putra Malaysia
Keywords: Passive BMIs, BMI Emerging Applications, Other Neurotechnology and Brain-Related Topics Abstract: Inner speech recognition is a modern advancement in brain-computer interfaces (BCIs) that facilitates direct communication between the computer and the brain. It is particularly beneficial for individuals who face communication issues due to speech disability. In this research, an end-to-end brain-computer interface system was developed that uses Electroencephalography (EEG) signals to recognize inner speech straight from the brain. The study utilized an open-access dataset of four Spanish words recorded from ten participants. The proposed system involved preprocessing to clean and enhance the signal. Common Spatial Pattern (CSP) was utilized to extract features that improve the discrimination between the selected four words. A Convolutional Neural Network (CNN) deep learning model was used to enhance the inner speech recognition performance from spatial features. The results show that the model can decode inner speech with an average accuracy of 81.40% for the unseen dataset and 89.80% for the entire dataset, indicating that the proposed method outperforms previous works that used the same dataset.

16:20-16:40, Paper TuCT4.5
Can Quasi-Movements Be Used As a Model of the BCI Based on Attempted Movements?

Yashin, Artem	Moscow State University of Psychology and Education
Vasilyev, Anatoly N.	National Research Center Kurchatov Institute
Shevtsova, Yulia	Moscow State University of Psychology and Education
Shishkin, Sergei L.	NRC "Kurchatov Institute"
Keywords: Brain-Computer Interfaces, Human-Computer Interaction, Assistive Technology Abstract: Brain-computer interfaces (BCIs) based on motor imagery (imagined movements, IM) are among the most common BCIs for the rehabilitation of paralyzed patients. However, it is possible that attempted movements (AM) would be a more effective alternative to IM. Unlike IM, AM are difficult to study outside of clinical practice. Nikulin et al. (2008) suggest that quasi-movements (QM) could help model AM in healthy participants without immobilizing interventions. QM result from the amplitude reduction of an overt movement, which leads to the practical absence of electromyography (EMG) response. The performance of QM may have features that distance QM from AM. Here, we examined the compatibility of QM with a saccade task, which modelled visual interaction with the outside world during the practical use of a BCI. In a study involving 24 volunteers, we used electroencephalography (EEG) and EMG, and conducted an extensive survey of the participants. We expected that, compared to IM, QM in the dual-task condition would be easier and less tiring and would be accompanied by greater event-related desynchronization (ERD) of the sensorimotor rhythms. Our hypotheses were based on the assumption that, like AM and unlike IM, QM is a more external task, and so is more compatible with the saccade task. We reproduced the effect of greater ERD for QM in the dual-task condition but did not find any significant difference between the difficulty or tediousness of QM and IM. Nevertheless, the survey data gave us important insights into the challenges participants faced when performing QM. Despite EMG values similar to IM, the feeling of muscle tension experienced by the participants correlated with mean EMG values. The main challenge for participants in performing QM was to make movements without amplitude. Performing QM conflicted with the illusion of movement that was supposed to accompany them: without proprioceptive feedback, participants doubted the reality of QM. Our results can be used to improve the procedure of QM training, which should bring QM closer to genuine movement attempts in the eyes of participants.


TuCT5
Autonomous Systems and Robotics 1
Chair: Ndiaye, Mouhamet Latyr	University of La Rochelle

15:00-15:20, Paper TuCT5.1
Autonomous Structural Inspection with an Octocopter UAV: Integration of 3D LiDAR and Gimbal-Mounted Camera

Son, Ji-Hwan	Electronics and Telecommunications Research Institute
Kim, Deokyun	Electronics and Telecommunications Research Institute
Cha, Jihun	Electronics and Telecommunications Research Institute
Keywords: Autonomous Vehicle, Robotic Systems, Consumer and Industrial Applications Abstract: This paper presents an autonomous inspection system that utilizes an octocopter UAV to enhance the structural safety assessments of challenging environments such as bridges, high-rise buildings, and dams. The system features an octocopter equipped with a rotating 3D LiDAR for comprehensive environmental data collection, a three-axis gimbal-mounted camera for high-resolution imaging, and an onboard computer for data processing and control. Our algorithms are designed to recognize environmental structures, identify planar surfaces, and calculate optimal gimbal poses for detailed inspections. The UAV autonomously navigates and performs these tasks, ensuring precise structural assessments. We validate the effectiveness of this system through experiments with the octocopter UAV.

15:20-15:40, Paper TuCT5.2
Networked Systems Diagnostics: A Fusion of Failure Mode and Effects Analysis and a Delphi Expert Study

Ren, Yongxu	Friedrich-Alexander-Universität Erlangen-Nürnberg
Deichsel, Felix	Friedrich-Alexander-Universität Erlangen-Nürnberg
Hopf, Valentin	Friedrich-Alexander-Universität Erlangen-Nürnberg
Seiler, Jürgen	Friedrich-Alexander-Universität Erlangen-Nürnberg
Kaup, André	Friedrich-Alexander-Universität Erlangen-Nürnberg
Beckerle, Philipp	FAU Erlangen-Nürnberg
Keywords: Robotic Systems, Fault Monitoring and Diagnosis Abstract: Networked devices, especially those comprising multiple identical devices, are extensively utilized in industrial scenarios. However, their complexity poses unique challenges in diagnostic processes, demanding efficient methodologies to identify and assess risks. The application of Failure Mode and Effects Analysis (FMEA) for analyzing complex systems, especially those consisting of networked devices, appears to be limited, particularly in identifying critical risk factors. In this paper, we propose a novel pipeline to diagnose networked systems by fusing FMEA with a Delphi (expert) Study. Our approach leverages the collective knowledge of a group of experts through a structured Delphi Study enabling them to contribute individually but also to interact. We demonstrate the applicability of our approach through a case study involving a system of networked mobile robots. Based on a Fault Tree Analysis (FTA), Risk Priority Numbers (RPN) of minimal fault tree cut sets are calculated to identify the most critical mechatronic failures. Our findings show that our methodology provides an RPN ranking that closely aligns with expert insights, highlighting its efficacy in accurately assessing risk in complex networked systems.
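As background for the Risk Priority Numbers mentioned in the abstract above, the sketch below computes the classic FMEA RPN as severity × occurrence × detection (each scored 1 to 10) and ranks failure modes by it. How the authors aggregate RPNs over minimal fault tree cut sets is not stated in the abstract, so this covers only the per-failure-mode calculation; the class, the function, and the example failure modes and scores are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FailureMode:
    """One FMEA row; the scores follow the common 1-10 scales (an assumption here)."""
    name: str
    severity: int
    occurrence: int
    detection: int

    def rpn(self) -> int:
        """Classic Risk Priority Number: severity * occurrence * detection."""
        for score in (self.severity, self.occurrence, self.detection):
            if not 1 <= score <= 10:
                raise ValueError("FMEA scores must lie between 1 and 10")
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes: Iterable[FailureMode]) -> List[FailureMode]:
    """Return the failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda mode: mode.rpn(), reverse=True)


if __name__ == "__main__":
    # Made-up failure modes and scores for a mobile robot, for illustration only.
    modes = [
        FailureMode("wheel encoder drift", severity=7, occurrence=3, detection=4),
        FailureMode("battery connector loose", severity=9, occurrence=2, detection=2),
        FailureMode("wireless link dropout", severity=5, occurrence=5, detection=5),
    ]
    for mode in rank_by_rpn(modes):
        print(mode.name, mode.rpn())
```

A ranking produced this way can be compared against an expert-derived ordering, which is the kind of alignment the abstract reports.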

15:40-16:00, Paper TuCT5.3
A Deep Reinforcement Learning Approach for Route Planning of Autonomous Vehicles

Paparella, Francesco	Polytechnic University of Bari
Olivieri, Giuseppe	Polytechnic University of Bari
Volpe, Gaetano	Polytechnic University of Bari
Mangini, Agostino Marcello	Polytechnic of Bari
Fanti, Maria Pia	Polytecnic of Bari, Italy
Keywords: Autonomous Vehicle, Intelligent Transportation Systems Abstract: Urban autonomous driving has the potential to enhance both the safety and efficiency of transportation, even in complex traffic conditions. However, new services and approaches are necessary to manage Autonomous Vehicles in real traffic. This paper introduces a novel approach to optimize routing in urban settings using Deep Reinforcement Learning (DRL) techniques. A modular DRL architecture is proposed to obtain a route that minimizes the length of the paths, minimizes the number of turns during travel, and selects the dedicated lanes. The proposed DRL architecture is implemented in a case study where the agents are trained in a simulation environment for the city center of Bari, a city in Southern Italy.

16:00-16:20, Paper TuCT5.4
Refining Sensitive Document Classification: Introducing an Enhanced Dataset Proposal

Ndiaye, Mouhamet Latyr	University of La Rochelle
Hamdi, Ahmed	University of La Rochelle
Mokhtari, Amdjed	OODRIVE
Ghamri-Doudane, Yacine	La Rochelle Université
Keywords: Enterprise Information Systems, Service Systems and Organizations Abstract: The need for document exchange between people, companies and government increases every day. Consequently, safeguarding documents against potential attackers becomes increasingly crucial. Several attacks have been reported over the past years and the risk of document leaks is more present nowadays. To prevent data violations, we need tools to determine the sensitivity degree of documents, which allows us to guarantee that only authorized people have access to them and to adapt strategies to sensitivity levels. To achieve this, deep learning techniques have shown good performance in document classification and therefore in sensitivity identification. Such approaches require sufficiently large resources to learn robust models. However, due to the sensitive nature of documents, public datasets for research in this context are lacking. In this paper, we experiment with Large Language Models (LLMs) to generate a multi-domain dataset of business documents. Utilizing a two-step generation process, we employ several prompting strategies across six language models to create a first dataset of documents classified into 4 sensitivity classes: Public, Internal, Confidential and Restricted. We then relied on human experts to review and validate the annotations generated for a sample of documents. Their insights were instrumental in generating the final dataset by identifying the most effective prompting strategies and the top-performing LLMs in both English and French for sensitive document generation. The generated dataset has been tested on two robust baselines.

16:20-16:40, Paper TuCT5.5
Collaborative Scheduling Optimization Method for Multi-Stage Automobile Engine Hybrid Flow Shop (I)

Zhai, Hewang	Chongqing University
Li, Congbo	Chongqing University, Chongqing
Wu, Wei	Chongqing University
Xiong, Maokun	Chongqing University
Yang, Miao	Chongqing University
Keywords: Decision Support Systems Abstract: The production of automobile engines primarily concerns three workshops: casting, machining and assembly. Stagnation in any of these stages will have a detrimental impact on the operation of downstream production, resulting in increased costs and production delays. Hence, it is imperative to establish a rational scheduling system for the whole production. Accordingly, this study proposes a collaborative scheduling optimization method for the multi-stage hybrid flow shop of automobile engines. Initially, the collaborative production relationships between the workshops are examined. Subsequently, an optimization model of collaborative scheduling for the multi-stage hybrid flow shop is formulated, focusing on reducing collaborative production costs and minimizing the maximum completion time. Furthermore, considering the discrete nature of the scheduling problem, the study explores encoding/decoding strategies based on random key techniques to bridge the gap between continuous algorithms and discrete problems. Consequently, a distributed reference vector guided evolutionary algorithm (DRVEA) is introduced to solve the model. Finally, the effectiveness and superiority of the proposed method are validated through a case study using practical data from an automobile engine enterprise.

16:40-17:00, Paper TuCT5.6
Isolation Forest Backward Particle Swarm Optimization Algorithm and Its Application to Control Problems (I)

Luan, Po-Chien	National Cheng Kung University
Kuo, Ping-Huan	National Chung Cheng University
Cho, Kuan-Ting	National Cheng Kung University
Lee, Chao-Chi	National Cheng Kung University
Huang, Wei-Hsiang	National Cheng Kung University
Chen, Yen-Ming	National Cheng Kung University
Li, Tzuu-Hseng S.	National Cheng Kung University
Keywords: System Modeling and Control, Decision Support Systems, Cooperative Systems and Control Abstract: Premature convergence is a critical issue in Particle Swarm Optimization (PSO). Weak global search capability causes particles to become trapped in local minima at an early stage of the learning process. Several studies over the past decade have been dedicated to solving this problem. This paper proposes a new algorithm that combines Isolation Forest and Particle Swarm Optimization, called Isolation Forest Backward Particle Swarm Optimization (IFB-PSO). The proposed learning scheme helps particles escape from local minima. A particle jumps backward to a targeted position when it has been trapped for more than a specified number of iterations. The destination is selected by Isolation Forest to give the backward particle a promising future. IFB-PSO is evaluated on a classic benchmark suite, the cart-pole problem, and the mountain car problem. Experimental results show that IFB-PSO achieves competitive results on the benchmark suite with different dimensions and on the two control problems in comparison with 11 well-known optimization algorithms. The behavior of backward particles is also analyzed to inspect the utility and efficiency of the backward process.


TuCT6	MR06
Complex and Cooperative Systems
Chair: Ishihara, Shinji	Hitachi, Ltd

15:00-15:20, Paper TuCT6.1
Controlling Autonomous Machines at Construction Sites with Heterogeneous Moving Objects by Model Predictive Control

Ishihara, Shinji	Hitachi, Ltd
Ohtsuka, Toshiyuki	Kyoto University
Keywords: Cooperative Systems and Control, System Modeling and Control, Autonomous Vehicle Abstract: This study examines how to properly control each autonomous machine at a construction site where human-driven construction machines, an autonomous excavator, and an autonomous truck coexist. We proposed to utilize Model Predictive Control (MPC) to control two different control targets, an excavator and a truck, using the same method in a unified manner. By taking advantage of MPC's ability to handle constraint conditions explicitly, we proposed a control method that allows each moving object to operate safely without contact. In order to control each autonomous machine with MPC, the future behaviors of the other machines are needed. In this study, a simple but effective model was used to make predictions for human-operated construction machinery. On the other hand, for autonomous machines, we proposed a method to achieve efficient cooperative behavior by utilizing the calculation results of the MPC of the other machine. The effectiveness of the proposed method was confirmed by numerical simulations.

15:20-15:40, Paper TuCT6.2
Co-Estimation of SOC and Parameters of Supercapacitors Based on a Switched Model

Chen, Xiaoyang	Central South University
Li, Heng	Central South University
Zhu, Ren	Central South University
Peng, Hui	Central South University
Fan, Yunsheng	Central South University
Zhang, Rui	Changsha University
Keywords: Cooperative Systems and Control Abstract: To ensure optimal functionality of the supercapacitor management system in practical applications, accurate and robust state of charge (SOC) estimation is crucial, particularly to account for aging effects and varying operating conditions. This paper proposes a switched system-based approach for the co-estimation of SOC and parameters of supercapacitors coupled with balancing resistor circuits. A switched model incorporating an equivalent circuit model is developed to accommodate the activation of equalization within a series-connected supercapacitor pack. The method combines a modified recursive least squares (RLS) algorithm with a switching sliding mode observer (SMO) for real-time parameter adaptation and SOC estimation. Experimental verification under multi-balancing charging scenarios demonstrates significant enhancements in accuracy and robustness compared to traditional methods employing fixed model configurations and parameters.

15:40-16:00, Paper TuCT6.3
Static and Time-Extended 1-On-1 Multi-Agent Task Allocation with Implements and Limited Autonomy

Lujak, Marin	University Rey Juan Carlos
Gutiérrez-Cejudo, Jorge	University Rey Juan Carlos
Salvatore, Alessio	University of Rome Tor Vergata
Giordani, Stefano	University of Rome Tor Vergata
Fernandez, Alberto	University Rey Juan Carlos
Keywords: Cooperative Systems and Control, Cyber-physical systems, Modeling of Autonomous Systems Abstract: In this paper, we propose and formulate the static and time-extended one-on-one multi-agent task allocation problem with implements and limited autonomy (MATAILA). The objective is to assign tasks to a team of agents statically or over a receding time horizon, while minimizing the overall multi-agent team's cost of performing the tasks and the penalty cost for unaccomplished tasks, all while maintaining sufficient battery level across the team. The basis of the studied problem is the (static one-on-one) axial 3-index assignment problem with extensions on the time horizon and agents' autonomy. Time-extended MATAILA is a computationally expensive problem that we simplify by a static MATAILA, which focuses only on the tasks pending in the present period and is myopic towards the tasks appearing in the future. We compare the performance of the proposed models in scenarios where all tasks are known a priori. We analyze the performance and scalability of the two approaches experimentally in simulations and show their efficiency in dynamically changing scenarios.

16:00-16:20, Paper TuCT6.4
Cooperative Search and Track of Rogue Drones Using Multiagent Reinforcement Learning

Valianti, Panayiota	University of Cyprus
Malialis, Kleanthis	University of Cyprus
Kolios, Panayiotis	University of Cyprus
Ellinas, Georgios	University of Cyprus
Keywords: Cooperative Systems and Control Abstract: This work considers the problem of intercepting rogue drones targeting sensitive critical infrastructure facilities. While current interception technologies focus mainly on the jamming/spoofing tasks, the challenges of effectively locating and tracking rogue drones have not received adequate attention. Solving this problem and integrating with recently proposed interception techniques will enable a holistic system that can reliably detect, track, and neutralize rogue drones. Specifically, this work considers a team of pursuer UAVs that can search, detect, and track multiple rogue drones over a sensitive facility. The joint search and track problem is addressed through a novel multiagent reinforcement learning scheme to optimize the agent mobility control actions that maximize the number of rogue drones detected and tracked. The performance of the proposed system is investigated under realistic settings through extensive simulation experiments with varying numbers of agents, demonstrating both its performance and scalability.

16:40-17:00, Paper TuCT6.9
How to Avoid Involution in Graduate Selection? A Perspective Based on Group Role Assignment (I)

Huang, WenXin	Guangdong University of Technology
Zhu, Haibin	Nipissing University
Liu, Dongning	Guangdong University of Technology
Keywords: Cooperative Systems and Control Abstract: The phenomena of "involution" and "lying flat" in today's society vividly portray the social pressure and development difficulties faced by the younger generation. More and more college students are competing for limited employment resources, which leads to increasingly serious involution. However, involution does not significantly increase the final benefits of those involved, and it brings many negative effects. As involution intensifies, how to select the most suitable employees and optimize team performance has become a major challenge for enterprises. This paper formalizes the problem by extending the Group Role Assignment (GRA) model. Subsequently, we conduct a large-scale social simulation experiment. Weights and thresholds are introduced to explore what evaluation standards companies should set so that personal potential value and the performance of management trainees are weighted reasonably. In this case, the impact of the individual's final assessment score is the most balanced, thereby achieving optimal group performance. Through this method, companies can clarify assessment standards, recruit the most suitable employees, and avoid graduate involution.

16:40-17:00, Paper TuCT6.10
CNN Distance Estimation Based on Received Signal Strength Indicator (I)

Chu, Hung-Chi	Chaoyang University of Technology
Lai, Hong-Cheng	Chaoyang University of Technology
Wang, Mu-Yen	Chaoyang University of Technology
Wu, Shih-Jung	Chaoyang University of Technology
Keywords: Communications Abstract: With the advancement of technology, an increasing number of applications rely on positioning services. While GPS technology is well-developed for outdoor environments, providing accurate location information indoors remains crucial. Distance estimation is a vital component of indoor positioning. However, the typical path loss model (PLM) has limitations in complex indoor settings. This article proposes using deep learning methods to enhance distance estimation. By collecting Wi-Fi signals from reference points and utilizing a 1D Convolutional Neural Network (1DCNN) model for distance classification, we can reduce outliers. In addition, as the number of classification categories increases moderately, the distance estimation error can be effectively reduced. Experimental results show that when the number of distance classification categories is set to 3, 6 and 12, the average distance estimation errors are 0.379 meters, 0.113 meters and 0.058 meters, respectively.
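Illustrative note on the path loss model: the abstract above refers to the typical path loss model (PLM) as the conventional way of turning RSSI into distance. A minimal sketch of the standard log-distance form, RSSI(d) = RSSI(d0) - 10 n log10(d / d0), is given below. The reference RSSI of -40 dBm at 1 m and the path loss exponent n = 2.0 are assumed values chosen for illustration, not figures from the paper.

```python
import math


def rssi_at(distance_m, rssi_ref_dbm=-40.0, path_loss_exp=2.0, ref_dist_m=1.0):
    """Predicted RSSI (dBm) at a distance under the log-distance path loss model."""
    return rssi_ref_dbm - 10.0 * path_loss_exp * math.log10(distance_m / ref_dist_m)


def distance_from_rssi(rssi_dbm, rssi_ref_dbm=-40.0, path_loss_exp=2.0, ref_dist_m=1.0):
    """Invert the model: estimate distance (m) from a measured RSSI (dBm)."""
    return ref_dist_m * 10 ** ((rssi_ref_dbm - rssi_dbm) / (10.0 * path_loss_exp))


# With the assumed parameters, a reading 20 dB below the reference maps to 10 m.
print(distance_from_rssi(-60.0))  # 10.0
```

Because distance depends exponentially on RSSI, a few dB of indoor fading can shift the estimate by meters, which is the limitation the abstract points to.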


TuCT7	MR07
Online - AI Applications 3	Regular Papers - Cybernetics
Chair: Zhao, Xu	Institute of Software, Chinese Academy of Sciences

15:00-15:20, Paper TuCT7.1
MedX-Net: Hierarchical Transformer with Large Kernel Convolutions for 3D Medical Image Segmentation

Lu, Lin	Qilu University of Technology (Shandong Academy of Sciences)
Zou, Qingzhi	Qilu University of Technology (ShanDong Academy of Sciences)
Keywords: Image Processing and Pattern Recognition, Deep Learning, Machine Vision Abstract: Due to the exceptional performance of Transformers in 2D medical image segmentation, recent work has also introduced them into 3D medical segmentation tasks. For instance, Swin UNETR and other hierarchical Transformers have reintroduced prior knowledge from several convolutional networks, further enhancing the volume segmentation capabilities of models. The efficacy of these hybrid methodologies is primarily attributed to the substantial quantity of parameters and the non-local self-attention mechanism with a large receptive field. We argue that the behavior of these methods' large receptive fields can be simulated with fewer parameters by using depth-wise convolutions with large kernels. In this manuscript, we introduce a lightweight volume segmentation model called MedX-Net, which uses convolutional network modules to simulate hierarchical Transformers for robust volume segmentation. Firstly, inspired by the hierarchical Transformer module of Swin UNETR, we investigate large-kernel depth-wise convolutions with different sizes to achieve a reduced model parameter count while maintaining a large global receptive field. Secondly, we replace the multi-layer perceptron (MLP) in the hierarchical Transformer module with an Inverted Bottleneck with Depthwise Convolution Enhancement (DWCE) to improve model performance with fewer activation and normalization layers, further reducing the parameter count. We validate the effectiveness and efficiency of our model for volume segmentation on three public datasets: Synapse, BTCV and ACDC. On the Synapse dataset, compared to Swin UNETR, our model achieves an improvement from 83.48% to 87.21% in Dice score. Compared to the result of 86.57% achieved by nnFormer, our model achieves superior performance while reducing the model parameter count by 64%.
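Illustrative note on the Dice score: the results above are reported as Dice scores. For readers unfamiliar with the metric, the sketch below computes the standard Dice coefficient, 2|A ∩ B| / (|A| + |B|), for two flattened binary masks. The function name, the toy masks, and the convention of returning 1.0 for two empty masks are assumptions for illustration. The paper's own evaluation pipeline is not described in the abstract.

```python
def dice_score(pred, target):
    """Dice coefficient between two equal-length binary masks (flattened voxels)."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example: 2 overlapping voxels, 3 foreground voxels in each mask.
pred = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
print(round(dice_score(pred, target), 4))  # 0.6667
```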

15:20-15:40, Paper TuCT7.2
Mutual a Priori Based Completion for Low Overlapping Point Clouds Registration

Liu, Zhiyong	Nanjing University of Science and Technology
Liu, Yazhou	Nanjing University of Science and Technology
Keywords: Machine Vision, Image Processing and Pattern Recognition, Machine Learning Abstract: This work presents a new registration method that is especially designed for low-overlapping partial point clouds. Based on the assumption that the partial point clouds to be registered belong to the same target, the proposed Mutual a Priori based Completion (MPC) method uses the partial point clouds to be registered as the a priori for each other to expand each individual point cloud. The main challenge of this mutual reference approach is that partial clouds without spatial alignment cannot provide a reliable completion reference for each other. Based on mutual information maximization, a progressive completion structure is developed to perform the alignment in the spatial, feature and completion spaces, respectively. Experiments on public datasets show encouraging results. Especially for the ultra-low overlap cases, compared with the state-of-the-art (SOTA) models, the size of overlapping regions can be increased by about 14.0%, and the rotation and translation errors can be reduced by 30.8% and 57.7%, respectively.

15:40-16:00, Paper TuCT7.3
Rapid Maize Seedling Detection Based on Receptive-Field and Cross-Dimensional Information Interaction

Hao, Fengqi	Qilu University of Technology
Zhu, Shulei	Qilu University of Technology
Ma, Dexin	Qingdao Agricultural University, Qingdao 266109, China
Dong, Xiangjun	Qilu University of Technology
Hoiio, Kong	City University of Macau
Liu, Xia	Maize Research Institute, Shandong Academy of Agricultural Scien
Mu, Chunhua	Maize Research Institute, Shandong Academy of Agricultural Scien
Keywords: Machine Vision, Deep Learning, Application of Artificial Intelligence Abstract: The seedling stage is important in the growth and development of maize, and it is also a critical period that affects maize yield and quality. Accurately recognizing this stage of maize is challenging, as current maize seedling detection methods struggle with small sizes and complex environments. This paper proposes a rapid maize seedling detection method that emphasizes spatial feature extraction in the receptive field and cross-dimensional interaction, named receptive field triplet attention YOLO (RFT-YOLO). First, we design a convolutional module for cross-dimensional information interaction within the receptive field. Second, we introduce an advanced selective fusion network, which boosts the multi-scale fusion capability of small object features while reducing the GFLOPs of the model. Additionally, we employ Inner-IoU to diminish sensitivity to positional biases of small objects and expedite the convergence of the loss function. Finally, we create a dataset of maize seedlings to conduct our experiments. Experimental results demonstrate that RFT-YOLO significantly enhances performance, reducing the number of parameters by 35.3% and GFLOPs by 14.24% compared to the baseline model. Moreover, the mean average precision (mAP) improves from 89.6% to 91.5%. These improvements confirm the model's effectiveness in detecting maize seedlings.

16:00-16:20, Paper TuCT7.4
Improving Class Imbalanced Few-Shot Image Classification Via an Edge-Restrained SinDiffusion Method

Zhao, Xu	Institute of Software, Chinese Academy of Sciences
Dong, Hongwei	Institute of Software, Chinese Academy of Sciences
Wu, Yuquan	Institute of Software, Chinese Academy of Sciences
Keywords: Machine Vision, Neural Networks and their Applications, Multimedia Computation Abstract: Class imbalance significantly hampers recognition accuracy in image classification and is particularly prevalent in special field datasets. Traditional methodologies addressing imbalance, including techniques like oversampling, have shown limited effectiveness. Recently, the advanced generative capabilities of diffusion models have garnered attention, with subsequent developments like SinDiffusion enabling new possibilities for rare class sample generation. However, the outputs of these models often significantly deviate from the original targets. Building on this foundation, our approach incorporates edge constraints to ensure newly generated samples not only differ from originals but also retain essential target features. Experiments validate the effectiveness of our method, highlighting the potential to mitigate class imbalance challenges in classification tasks.

16:20-16:40, Paper TuCT7.5
GOAT: Learning Multi-Body Dynamics Using Graph Neural Network with Restraints

Yang, Sheng	Chinese Academy of Sciences
Jia, Daixi	Institute of Software, Chinese Academy of Sciences
Chen, Lipeng	Institute of Software, Chinese Academy of Sciences
Li, Kunyu	Institute of Software, Chinese Academy of Sciences
Wu, Fengge	Institute of Software, Chinese Academy of Sciences
Zhao, Junsuo	Institute of Software, Chinese Academy of Sciences
Keywords: Deep Learning, Machine Learning, Application of Artificial Intelligence Abstract: Accurately simulating physical processes is an extremely challenging task, but the rapid development of machine learning and the availability of large datasets have made Graph Neural Networks (GNNs) a powerful tool for effectively simulating physical systems. Currently, GNN-based methods are primarily used in simple scenarios such as the free fall and collision of objects, fluid flow, and gravitational interactions among atoms. However, in complex industrial environments, there are always intricate interference factors such as friction, bearing connections, and torque affecting the motion between objects. Consequently, GNN-based methods largely fail to solve practical physical problems related to complex multi-body dynamics. In this paper, to address the current lack of multi-body dynamics datasets in this field, we first introduce a multi-body dynamics dataset comprising eight different scenarios, each embodying distinct physical principles. Furthermore, we explore the structure of Graph Neural Simulators (GNSs) and physical priors and propose an efficient novel model, the Graph Neural Network with Restraints (GOAT), that can directly learn the relationships between systems from multi-body trajectories, thereby enhancing performance. Our results show significant improvements compared to other state-of-the-art baselines, demonstrating strong generalization capabilities and data efficiency.


TuCT8	MR08
Online - AIoT 2
Chair: Feng, Zhengqian	SHANDONG SCICOM Information and Economy Research Institute CO., Ltd

15:00-15:20, Paper TuCT8.1
Classification of Table Cells Based on LLM Prompts

Liu, Mengjie	Hefei University of Technology
Bu, Chenyang	Hefei University of Technology
Bai, Shengxing	Hefei University of Technology
Dong, Bingbing	Hefei University of Technology
Wu, Xindong	Hefei University of Technology
Keywords: Artificial Social Intelligence, Application of Artificial Intelligence, AI and Applications Abstract: Tables, as an important means of data storage, are widely used in spreadsheets, web tables, and PDFs. By integrating information from table data with knowledge retrieved from an external knowledge base, and examining the correspondences between cell values in the table and instances in the knowledge base, we can extract knowledge from the table to augment and enrich the knowledge base. To achieve this goal, we first need to classify table cells based on their roles in the layout of document data. Due to the diverse structures arising from the arrangements of rows and columns, as well as the complexity of content resulting from concise data storage, current automation techniques heavily rely on stylistic features of table cells, such as font or color. Moreover, these methods are rarely experimented with or validated on tables without style features. Recent literature indicates that large language models (LLMs) demonstrate a basic ability to understand the structure and content of tables in tasks such as table judgment reasoning. Even without extensive feature inputs or pre-training, LLMs still show comparable results to machine learning and deep learning in these tasks. Therefore, this paper attempts to apply LLMs to table cell classification without using other stylistic features. We have designed a 4-component prompt paradigm (Classification Definition, Instruction, Table, Completion), representing respectively the classification definition, task instructions, table data, and result output. We conduct experiments on three datasets CIUS, SAUS, and DEEX for table cell classification with one-shot learning. Our experimental results show that with the assistance of LLMs, better results can be achieved without utilizing stylistic features.

15:20-15:40, Paper TuCT8.2
Knowledge Distillation Using Global Fusion and Feature Restoration for Industrial Defect Detectors

Feng, Zhengqian	SHANDONG SCICOM Information and Economy Research Institute CO
Yue, XiYao	Qilu University of Technology
Li, Wang	SHANDONG SCICOM Information and Economy Research Institute CO
Zhou, MingLe	Qilu University of Technology (Shandong Academy of Sciences)
Han, Delong	Qilu University of Technology
Li, Gang	Qilu University of Technology
Keywords: Neural Networks and their Applications, Machine Vision, Deep Learning Abstract: Deep learning technology has been widely applied in industrial quality inspection tasks to improve the accuracy of defect detection, object recognition, and classification. However, in the task of object detection, both feature-based and regression-based traditional knowledge distillation methods impose overly strict constraints on the student model. To address this issue, this article proposes Global Fusion and Feature Restoration Knowledge Distillation (FRD). FRD integrates contextual information into the channel through a Global Fusion Module (GFM), and further utilizes attention mechanisms to adaptively focus on the distillation region after separating the foreground and background. FRD also uses Mask Feature Restoration (MFR) to mask and restore a portion of student features, improving the learnability of the model. At the same time, FRD adopts a comprehensive approach of multiple loss superposition constraints, rather than simply using MSE losses to imitate the features of teachers. Experiments show that FRD can effectively improve model performance in industrial defect detection tasks. On the aluminum surface defect dataset, FRD increased the mAP of RetinaNet-Res50 from 57.7% to 61.7%. We also confirmed the effectiveness of FRD for general object detection on the COCO dataset. On a randomly selected COCO subset containing 4000 images, FRD increased the mAP of RetinaNet-Res50 from 21.5% to 43.6%.

15:40-16:00, Paper TuCT8.3
An Adaptive Residual Coordinate Attention-Based Network for Hat and Mask Wearing Detection in Kitchen Environments

Zhao, Ying	Qilu University of Technology (Shandong Academy of Sciences), Ji
Wu, Xiaoming	Qilu University of Technology, Shandong Computer Science Center
Liu, Xiangzhi	Shandong Computer Science Center (National Supercomputer Center
Chen, Hao	Qilu University of Technology (Shandong Academy of Sciences), Ji
Qi, Bei	Qilu University of Technology (Shandong Academy of Sciences)
Dong, Huomin	Qilu University of Technology (Shandong Academy of Sciences)
Keywords: Deep Learning, Machine Vision, Image Processing and Pattern Recognition Abstract: In order to ensure food safety, personnel are required to wear hats and masks during food handling processes. To accurately detect the wearing status of kitchen staff, the ARP-YOLO model is proposed. Firstly, images are obtained from multiple kitchens and angles to construct a dataset reflecting the wearing status of hats and masks. To simulate more complex kitchen environments, Gaussian noise is added to the data and lighting conditions are adjusted for data augmentation. Lighting conditions in the kitchen can affect detection, causing the same target to exhibit different shapes and features under different lighting conditions, leading to missed detections. To address the above issues, we propose ARCA (Adaptive-Residual-Coordinate-Attention), which uses residual connections to strengthen attention to important features while preserving original features, and employs adaptive convolution reduction to reduce module parameters. To improve target localization accuracy, P2 detection layers are added in the Neck to obtain more accurate target position information. The ARP-YOLO model demonstrates significant improvements over the baseline model, with a 13.6% increase in Recall, allowing for more effective target capture and reduced missed detection risk. Additionally, mAP@0.5 has increased by 10%, enhancing target localization accuracy. The F1-Score has also increased by 7.4%, better balancing the relationship between Precision and Recall. To validate the effectiveness of the model, comparative experiments with other models are conducted, showing that the ARP-YOLO model's Recall and Average Precision (AP) are higher than those of other models.

16:00-16:20, Paper TuCT8.4
A Multi-Objective Binary Differential Evolution Operator for Feature Selection

Dehnad, Parastoo	Tabriz University
Asilian Bidgoli, Azam	Wilfrid Laurier University
Rahnamayan, Shahryar	Brock University
Keywords: Evolutionary Computation, Metaheuristic Algorithms, Machine Learning Abstract: Feature selection is a pivotal component of machine learning and data analysis that optimizes model performance by eliminating irrelevant and redundant features and addresses the challenges associated with the “curse of dimensionality” and interpretability. In this context, we present feature selection as a multi-objective binary optimization task with the dual aim of maximizing classification accuracy while minimizing the number of selected features. In order to address this optimization challenge, we introduce the Multi-objective Binary Differential Evolution algorithm (MOBDE). It’s worth noting that DE is originally an extremely powerful real-value coded algorithm, and to make it binary, set-based operators must replace vector-based operations. Optimization in binary space is deemed more suitable than real-value optimization because a many-to-one mapping can waste the efforts of the optimizer when solving a binary problem like feature selection. The algorithm leverages a partial opposition-based binary operator to generate diverse solutions to enhance its exploration within the search space. Additionally, it incorporates a majority voting mechanism as a local search strategy to bolster the algorithm’s exploitation capabilities. Results from experimentation on eleven datasets underscore the efficiency of MOBDE, which outperforms the widely recognized NSGA-II method in terms of the hypervolume (HV) performance metric and in minimizing the number of selected features. The proposed algorithm and its experimental outcomes are comprehensively detailed and analyzed, offering valuable insights into its efficacy for feature selection tasks.
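Illustrative note on the two-objective formulation: the abstract above frames feature selection as maximizing classification accuracy while minimizing the number of selected features. The sketch below shows only the generic Pareto-dominance test and non-dominated filtering for that pair of objectives. It is not the MOBDE operator itself, and the function names and sample values are hypothetical.

```python
def dominates(a, b):
    """True if solution a dominates b; each is (accuracy, n_features).

    Higher accuracy is better, fewer features is better.
    """
    acc_a, k_a = a
    acc_b, k_b = b
    no_worse = acc_a >= acc_b and k_a <= k_b
    strictly_better = acc_a > acc_b or k_a < k_b
    return no_worse and strictly_better


def pareto_front(solutions):
    """Keep the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


# Hypothetical (accuracy, number of selected features) pairs.
candidates = [(0.90, 10), (0.88, 5), (0.85, 8), (0.90, 12)]
print(pareto_front(candidates))  # [(0.9, 10), (0.88, 5)]
```

Metrics such as hypervolume are then computed over the resulting non-dominated set.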

16:20-16:40, Paper TuCT8.5
Wheat-YOLO: A Real-Time and High Precision Object Detection for Wheat (I)

Xu, Jin	Shenyang Aerospace University
Sun, Yue	Shenyang Aerospace University
Zhang, Senyue	Shenyang Aerospace University
Sun, Dongdong	Shenyang Aerospace University
Xiang, Zhiyu	Shenyang Aerospace University
Keywords: Adaptive Systems, Smart Buildings, Smart Cities and Infrastructures Abstract: In wheat detection, the wheat is located in a complex environment: weeds and dried leaves hinder detection, and shadows in the images, wheat plants obscuring each other, and other phenomena reduce detection accuracy. At the same time, for the small object detection problem, most detection algorithms improve detection accuracy at the cost of detection speed and cannot balance the two well. To address the above issues, this paper proposes an improved YOLO detection algorithm, which aims to improve the detection accuracy of small targets in complex environments while minimising the cost in detection speed. Firstly, two attention mechanisms for small target detection are added to the backbone of YOLOv8; secondly, a target detection head with unified attention is used in the head part, which improves the expressive power of the detection head without any computational expense; and finally, the Inner-MPDIoU function, which combines MPDIoU with the Inner idea, is used as the localisation loss in the loss function. Extensive experiments on publicly available datasets show that the proposed network improves on the original YOLOv8 in both speed and accuracy, enabling a better balance between detection accuracy and speed.

16:40-17:00, Paper TuCT8.6
A Sequential Pattern Mining Approach for Situation-Aware Human Activity Projection (I)

D'Aniello, Giuseppe	University of Salerno
Falcone, Roberto	University of Salerno
Gaeta, Matteo	University of Salerno
Rehman, Zia Ur	University of Salerno
Fortino, Giancarlo	University of Calabria
Keywords: Other Neurotechnology and Brain-Related Topics Abstract: Human activity prediction has become increasingly prevalent in a plethora of time-critical applications. To realize accurate identification and prediction of human behaviour, we propose a situation-aware wearable computing system. A wearable computing system has the capability to perceive, comprehend and project situations by analyzing human behavioral patterns in different environments. In particular, this work proposes a situation-aware human activity prediction (SA-HAP) approach based on sequential pattern mining that aims to anticipate future activities and tailor its responses according to situations by analyzing frequent sequential patterns and their correlations to understand how these situations are interrelated. The approach not only improves prediction accuracy but also provides the foundation for a more informed decision-making process, as the projected situations can be explained using the identified behavioral patterns. The approach is compared with other traditional techniques for activity prediction (LSTM and HMM), achieving better performance on the Extrasensory dataset.


TuCT9	MR09
Online - AI Applications 10
Chair: Li, Li	Beijing Information Science and Technology University

15:00-15:20, Paper TuCT9.1
NRPP: A Learning Graph Representation Approach for Network Robustness Prediction

Huang, Wenli	Sichuan Normal University
Chen, Liang	Sichuan Normal University
Zhang, Shuai	The Tianjin Normal University
Li, Junli	Sichuan Normal University
Keywords: Complex Network, Machine Learning Abstract: In the field of modern network science, robustness is a key factor in evaluating the characteristics of complex networks. Connectivity robustness and controllability robustness are two important measures. They refer to a network's ability to maintain connectivity and controllability during malicious attacks or random failures. Traditional methods for assessing network robustness, which typically involve time-consuming attack simulations, often suffer from limited accuracy and computational inefficiency. Thus, this paper proposes a simple yet effective method, NRPP, for predicting network robustness using a learning graph representation. This method transforms local nodal information into a graph representation and extracts multi-scale features. Extensive experiments on undirected synthetic networks show: 1) NRPP effectively combines node sorting (NR) with pyramid pooling (PP) to obtain graph-level vector representations, improving network robustness predictions. Ablation experiments validate the necessity of node sorting and pyramid pooling. 2) Experimental results demonstrate that NRPP outperforms three state-of-the-art CNN-based models in predicting network robustness.

15:20-15:40, Paper TuCT9.2
ALGIN: Adaptive Local and Global Interests Network for Click-Through Rate Prediction

Xiang, Yingjia	School of Big Data and Software Engineering, Chongqing Universit
Lu, Xuanyu	China University of Geosciences, Wuhan, China
Zhou, Wei	Chongqing University
Wen, Junhao	Chongqing University
Keywords: Intelligent Internet Systems, Application of Artificial Intelligence, Knowledge Acquisition Abstract: User behavior sequence modeling represents users' interests and preferences, which has a crucial impact on click-through rate (CTR) prediction models and recommendation systems. In practical scenarios, multiple types of interactive behaviors usually constitute users' complex interests and preferences. However, existing CTR prediction models based on multi-type interactive behavior modeling have two limitations: (1) they cannot handle noise in users' global interests, resulting in the inability to model accurate global interest preferences; (2) they struggle to depict users' local interest patterns and cannot model the complex patterns embedded in multi-behavior sequences. To overcome these limitations, an Adaptive Local and Global Interests Network For Click-Through Rate Prediction (ALGIN) is proposed, which effectively models global and local interests within multi-behavior interaction sequences through a unified framework. Firstly, ALGIN employs a Fourier Global Interest Modeling module to eliminate noise in the behavioral sequence; secondly, ALGIN designs a Multi-Scale Local Interest Extraction module to capture behavioral patterns within multi-type interactive sequences; finally, an Item-driven Interest Selector is used to integrate users' local interests with their global interests. Experiments on multiple public datasets demonstrate that the ALGIN model outperforms the current state-of-the-art (SOTA), and ablation experiments confirm the effectiveness of each module within the model.

15:40-16:00, Paper TuCT9.3
Data Importance Measurement Based on Sampling Region Information for Oversampling

Liu, Xuanxuan	Qingdao University
Guo, Li	Qingdao University
Chen, Long	University of Macau
Keywords: Machine Learning Abstract: Sampling is commonly employed to tackle the classification of imbalanced data, with the Synthetic Minority Oversampling Technique (SMOTE) being the most widely used sampling method. In recent years, multiple variations of SMOTE, including clustering- and SMOTE-based oversampling methods, have been proposed. Nevertheless, these methods often neglect to measure the importance of minority class samples and, in some cases, generate noisy, boundary, or overlapping samples. To solve these problems, we propose a novel importance measurement based on sampling region information (SRI) in minority class samples by using clustering. The measurement focuses more on important samples. Based on the measurement, we also propose an oversampling method called KRISMOTE. The method not only reduces noise, but also clarifies the classification boundaries, effectively enhancing the classification performance of imbalanced data. Experimental results over the publicly available KEEL dataset demonstrate that the proposed KRISMOTE method outperforms other popular oversampling algorithms.
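
For readers unfamiliar with the baseline this abstract builds on, the following is a minimal sketch of the standard SMOTE interpolation step only. It is not the KRISMOTE method or its SRI importance measurement, which the abstract does not specify. The function names `k_nearest` and `smote_sample`, the toy points, and the choice of `k` are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def k_nearest(sample: Point, others: List[Point], k: int) -> List[Point]:
    """Return the k points in `others` closest to `sample` (Euclidean distance)."""
    return sorted(others, key=lambda p: math.dist(sample, p))[:k]


def smote_sample(minority: List[Point], k: int, rng: random.Random) -> List[float]:
    """Create one synthetic minority point by plain SMOTE interpolation.

    A base point is drawn at random, one of its k nearest minority
    neighbours is drawn at random, and the synthetic point is placed at a
    random position on the segment joining the two.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    base = rng.choice(minority)
    neighbours = k_nearest(base, [p for p in minority if p is not base], k)
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # interpolation factor in [0, 1)
    return [b + gap * (n - b) for b, n in zip(base, neighbour)]


# Hypothetical toy minority class: the four corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
rng = random.Random(0)
synthetic = [smote_sample(minority, k=2, rng=rng) for _ in range(5)]
```

Because every synthetic point lies on a segment between two existing minority points, plain SMOTE treats all minority samples as equally useful; an importance measurement such as the one the abstract proposes would presumably bias which base points are chosen.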

16:00-16:20, Paper TuCT9.4
A Semantic Verifier for Optimizing Small-Scale Large Language Models on Reasoning Tasks

Bai, Yu	Shenyang Aerospace University
Li, Jun	Shenyang Aerospace University
Cai, Fang	Stanford University
Liu, Yuting	Shenyang Aerospace University
Keywords: Knowledge Acquisition, Representation Learning, Deep Learning Abstract: Large language models (LLMs) with more than 100 billion parameters have revolutionized various tasks related to natural language processing and have had a profound impact in the field of artificial intelligence. However, deploying LLMs in the real world could also result in increased production costs. Small-scale Large Language Models (SLLMs), which are smaller, compact LLMs with fewer than 10 billion parameters, could significantly reduce production costs compared to LLMs. However, they typically perform less effectively than LLMs in general. Although In-context learning prompting has successfully enhanced the capabilities of SLLMs, the construction of prompts requires a certain level of human expertise. In this study, we explore enhancing SLLMs in emulating the performance of LLMs in reasoning tasks at a minimal cost, without any prompts provided by humans. We employ two SLLMs and incorporate a ranking model based on a Semantic Verifier between them to facilitate reasoning tasks. Experiments conducted on four publicly available datasets for reasoning tasks demonstrate that our approach effectively enhances the inference performance of SLLMs, and it achieves new state-of-the-art results.

16:20-16:40, Paper TuCT9.5
Novel Post-Training Structure-Agnostic Weight Pruning Technique for Deep Neural Networks

Abdi Reyhan, Zahra	Brock University
Rahnamayan, Shahryar	Brock University
Asilian Bidgoli, Azam	Wilfrid Laurier University
Keywords: Deep Learning, Evolutionary Computation, Machine Learning Abstract: Deep neural networks (DNNs) have shown exceptional performance in various domains, leading to their widespread adoption. However, the necessity to deploy DNNs on resource-constrained devices calls for improved model efficiency. Accordingly, DNN pruning has emerged as a critical technique in the field of machine learning, offering significant improvements in computational efficiency and model simplicity. This paper introduces an innovative post-training pruning approach for DNNs that requires no retraining and utilizes multi-objective optimization to achieve substantial sparsity rates while preserving significant accuracy levels. The proposed method transforms the post-training weight pruning challenge into a two-variable, bi-objective optimization problem. The optimizer finds the optimal minimum and maximum threshold values and sets the real-valued weights between these two thresholds to zero. The task- and model-independence of the proposed framework makes it applicable across various models, tasks, and datasets without constraints on the number of weights. The approach provides a decision-maker, where users can select the best strategy within resource constraints to achieve their desired accuracy. To assess our method, we evaluated the pruning of the RESNET50 model on the CIFAR10 and CIFAR100 benchmark datasets. On CIFAR10, by reducing 70% of the weights within the optimal threshold values, the network’s accuracy only decreases by 0.1. Similarly, on CIFAR100, an appropriate weight range was selected, resulting in a 65% reduction in weights while maintaining a negligible 0.1 decrease in network accuracy. This demonstrates the effectiveness of the optimization in achieving significant model size reduction without compromising performance on large DNNs.
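
The thresholding step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `prune_between`, the inclusive bounds, and the example weights and thresholds are all hypothetical, and the bi-objective optimizer that would search for the two thresholds is not shown.

```python
from typing import List, Tuple


def prune_between(
    weights: List[float], t_min: float, t_max: float
) -> Tuple[List[float], float]:
    """Zero every weight w with t_min <= w <= t_max.

    Returns the pruned weights and the resulting sparsity (fraction of
    zeros). Inclusive bounds are an assumption made for this sketch.
    """
    if t_min > t_max:
        raise ValueError("t_min must not exceed t_max")
    pruned = [0.0 if t_min <= w <= t_max else w for w in weights]
    sparsity = sum(1 for w in pruned if w == 0.0) / len(pruned) if pruned else 0.0
    return pruned, sparsity


# Hypothetical flattened weights and candidate thresholds.
weights = [-0.8, -0.05, 0.02, 0.4, -0.01, 0.09]
pruned, sparsity = prune_between(weights, t_min=-0.06, t_max=0.1)
print(pruned, sparsity)
```

In a bi-objective setting, each candidate pair (`t_min`, `t_max`) would be scored on two axes, the sparsity returned here and the accuracy of the pruned network, so that a user can pick a trade-off from the resulting set of candidates.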

16:40-17:00, Paper TuCT9.6
Vulnerability Detection by Sequential Learning of Program Semantics Via Graph Attention Networks

Li, Li	Beijing Information Science and Technology University
Han, Qihong	Beijing Information Science and Technology University
Cui, Zhanqi	Beijing Information Science and Technology University
Keywords: Quality and Reliability Engineering, Decision Support Systems, Fault Monitoring and Diagnosis Abstract: Vulnerability detection is a crucial aspect of protecting software systems from cyber attacks. However, some types of vulnerabilities are difficult to detect and require analyzing the source code from multiple views. To address this, we propose a general and easily extensible framework, SGVD (Sequential Graph Attention Networks for Vulnerability Detection). SGVD consists of a sequential module that uses the GAT to learn the semantic representations of the code and a novel Fused-Prediction module that extracts useful features from the multi-view source code. We evaluated this framework on a dataset that includes two large-scale open-source C projects. The experiments showed that SGVD outperformed the existing advanced graph learning vulnerability detection tools Devign and ReGVD, with an average increase of 12.25% in Accuracy, 13.65% in Precision, 12.04% in F1 score, and 9.14% in Recall.


TuCT10	MR10
Machine Vision and Perception 1
Chair: Li, Wei	Beijing Jiaotong University

15:00-15:20, Paper TuCT10.1
SMDNet: A Pulmonary Nodule Classification Model Based on Positional Self-Supervision and Multi-Direction Attention

Xiong, Jun	Chongqing University
Wang, Chengliang	Chongqing University
Wu, Xing	Chongqing University
Wang, Peng	Southwest Hospital of Army Medical University
Wang, Haidong	Southwest Hospital of Army Medical University
Wang, Hongqian	Southwest Hospital of Army Medical University
Keywords: Machine Vision, Biometric Systems and Bioinformatics Abstract: Accurate classification of pulmonary nodules holds importance in the early diagnosis of lung cancer. Unlike 2D models, 3D models can simultaneously utilize multiple slices as input to capture features. However, 3D models face challenges in capturing nodule features in different directions and discerning feature differences in various positions of computed tomography (CT). We introduce a pulmonary nodule classification model, SMDNet. Firstly, a multi-direction attention is proposed to capture nodule features from sagittal, coronal, and axial axes. Secondly, distinct labels are assigned to the cubes at different cropping positions from CT for binary classification to capture local differences. Besides, gradient boosting decision tree (GBDT) is employed to combine shallow features with deep features to improve accuracy. Comparative experimental results on the largest publicly available dataset of pulmonary nodules, LIDC-IDRI, showed that SMDNet achieves a 4.81% improvement in accuracy under identical data processing.

15:20-15:40, Paper TuCT10.2
Sim-To-Real Domain Adaptation for Deformation Classification

Sol, Joel	University of Victoria
Fayyad, Jamil	University of Victoria
Alijani, Shadi	University of Victoria
Najjaran, Homayoun	University of British Columbia
Keywords: Transfer Learning, Deep Learning, Machine Vision Abstract: Deformation detection is vital for enabling accurate assessment and prediction of structural changes in materials, ensuring timely and effective interventions to maintain safety and integrity. Automating deformation detection through computer vision is crucial for efficient monitoring, but it faces significant challenges in creating a comprehensive dataset of both deformed and non-deformed objects, which can be difficult to obtain in many scenarios. In this paper, we introduce a novel framework for generating controlled synthetic data that simulates deformed objects. This approach allows for the realistic modeling of object deformations under various conditions. Our framework integrates an intelligent adapter network that facilitates sim-to-real domain adaptation, enhancing classification results requiring limited real data from deformed objects. We conduct experiments on domain adaptation and classification tasks and demonstrate that our framework improves sim-to-real classification results compared to the simulation baseline. Our code is available here.

15:40-16:00, Paper TuCT10.3
FP3Seg: Point Cloud Panoptic Segmentation Via LiDAR-Camera Fusion and Progressive Decoder

Dai, Xianyou	Zhejiang University of Technology
Weng, Libo	Zhejiang University of Technology
Gao, Fei	Zhejiang University of Technology
Keywords: Deep Learning, Machine Vision, Transfer Learning Abstract: Point cloud panoptic segmentation is a 3D scene perception task that provides a holistic solution for both semantic and instance segmentation. The sparsity and lack of texture features in LiDAR point cloud, coupled with the relatively narrow field of view of camera, make multi-modal fusion challenging. In this paper, we propose a novel multi-modal fusion based point cloud panoptic segmentation method, named FP3Seg, with main contributions including Hybrid Domain Adaptive Fusion (HDAF) module and Progressive Decoder. HDAF employs learnable weights to adaptively fuse multi-modal features in both the channel and spatial domains. Through knowledge distillation, FP3Seg extends the benefits of multi-modal fusion beyond the camera field of view. Progressive Decoder embeds semantic and instance information into the input of panoptic decoder, assisting the decoder in understanding the distinctions between stuff and thing classes. The proposed method is benchmarked on the SemanticKITTI test set, achieving 57.7% PQ, showing a 1.7% improvement over baseline. Experimental results demonstrate that FP3Seg possesses advantages over single-modal approaches in multiple aspects, especially for the segmentation of thing classes.

16:00-16:20, Paper TuCT10.4
OneDConv: Generalized Convolution for Transform-Invariant Representation

Weng, Haohan	South China University of Technology
Chen, C. L. Philip	South China University of Technology
Ke, Yi	South China University of Technology
Haiqi, Liu	South China University of Technology
Zhang, Tong	South China University of Technology
Keywords: Machine Vision, Representation Learning, Image Processing and Pattern Recognition Abstract: Convolutional Neural Networks (CNNs) have exhibited great power in various vision tasks. However, the lack of a transform-invariant property limits their further application in complicated real-world scenarios. In this work, we propose a novel generalized one-dimension convolutional operator (OneDConv), which dynamically transforms the convolution kernels based on the input features in a computationally and parametrically efficient manner. The proposed operator can extract transform-invariant features naturally. It improves the robustness and generalization of convolution without sacrificing performance on common images. The proposed OneDConv operator can substitute for the vanilla convolution. Thus, it can readily be incorporated into popular convolutional architectures, supporting end-to-end training. Empirical evaluations on popular benchmarks reveal OneDConv's superior performance over the standard convolution and competitive models in handling canonical and distorted images.

16:20-16:40, Paper TuCT10.5
Monocular Ranging Based on Camera Pose in Visual Train Positioning (I)

Li, Wei	Beijing Jiaotong University
Chai, Ming	Beijing Jiaotong University
Liu, Hongjie	Beijing Jiaotong University
Lv, Jidong	Beijing Jiaotong University
Meiling, Xie	BJTU
Keywords: Machine Vision, AI and Applications, Deep Learning Abstract: Real-time and accurate positioning is the key to ensuring the safe operation of trains. Traditional train positioning technology relies on track-side equipment and suffers from problems such as high construction and maintenance costs and difficulties in obtaining the initial train positions. With the rapid development of enabling technologies such as AI and image recognition, visual perception-based autonomous train positioning technology has attracted widespread attention in recent years. To address the problem of large measurement errors due to camera pose changes in train visual positioning, this article proposes a monocular ranging method based on visual beacon imaging size correction. A kilometer post is used as a visual beacon, and a deep neural network is used to detect the visual beacon and extract its feature points’ pixel coordinates. The PnP algorithm is used to estimate the camera pose, and the visual beacon imaging size is corrected using the camera pose angle. The average relative ranging error is stabilized at 2% to 3%, and the ranging effect is improved by about 60%, which effectively reduces the ranging error of the monocular ranging model based on known references.

16:40-17:00, Paper TuCT10.6
Advancing ESWL Outcome Predictions in X-Ray and CT through Sophisticated Feature Selection (I)

Kobayashi, Soya	University of Hyogo
Fujita, Daisuke	University of Hyogo
Shibutani, Hironobu	Ishikawa Hospital
Gohara, Shinsuke	Ishikawa Hospital
Kobashi, Syoji	University of Hyogo
Keywords: Machine Vision, Image Processing and Pattern Recognition, AI and Applications Abstract: Ureteral stones, a prevalent type of urinary stone, form within the ureter, causing severe pain and hematuria. Extracorporeal shock wave lithotripsy (ESWL) and transurethral lithotripsy (TUL) are the primary treatments. Although ESWL is less invasive and requires shorter hospital stays, its lower success rate compared to TUL often necessitates additional treatments, increasing both the physical and financial burdens on patients. This study aims to enhance the accuracy of predicting ESWL outcomes by using advanced machine learning methods to analyze both CT and X-ray images together with clinical findings. In particular, X-ray images, which have been used less frequently in past studies, are anticipated to provide detailed analytical features. Addressing the limitations of current predictive methods, which often suffer from excessive irrelevant features and low interpretability, this study introduces three sophisticated feature selection approaches: P-value, AUC, and SHAP value. These approaches aim to enhance both model interpretability and predictive performance. Several machine learning algorithms were evaluated, including decision tree, K-nearest neighbors, random forest, logistic regression, SVM, AdaBoost, and Gaussian NB. Results from 139 subjects with ESWL show that logistic regression and SVM perform best, with SHAP-based feature selection improving outcomes, as demonstrated by the increase in the AUC from 0.791 to 0.845 for logistic regression and from 0.750 to 0.764 for SVM. Furthermore, the study decreases the number of features used in the model from 55 to 19, simplifying the prediction process. However, features extracted from X-ray images showed limited effectiveness.


TuCT11	MR11
Image Processing and Pattern Recognition 2	Regular Papers - Cybernetics
Chair: Yang, Jie	University of Technology Sydney

15:00-15:20, Paper TuCT11.1
Medical Visual Prompting (MVP): A Unified Framework for Versatile and High-Quality Medical Image Segmentation

Chen, Yulin	School of Automation, Guangdong University of Technology, Guangz
Huang, Guoheng	Guangdong University of Technology, School of Computer Science A
Huang, Kai	Guangdong University of Technology, School of Computer Science A
Lin, Zijin	School of Computer Science and Technology, Guangdong University
Zhong, Guo	Guangdong University of Foreign Studies, School of Information S
Luo, Shenghong	University of Macau
Deng, Jie	Department of Otorhinolaryngology, the First Affiliated Hospital
Zhou, Jian	Sun Yat-Sen University Cancer Center, Guangdong Key Laboratory O
Keywords: Application of Artificial Intelligence, Image Processing and Pattern Recognition, Deep Learning Abstract: Accurate segmentation of lesion regions is crucial for clinical diagnosis and treatment across various diseases. While deep convolutional networks have achieved satisfactory results in medical image segmentation, they face challenges such as the loss of lesion shape information due to continuous convolution and downsampling, as well as the high cost of manually labeling lesions with varying shapes and sizes. To address these issues, we propose a novel Medical Visual Prompting (MVP) framework that leverages pre-training and prompting concepts from Natural Language Processing (NLP). The framework utilizes three key components: Super-Pixel Guided Prompting (SPGP) for superpixelating the input image, Image Embedding Guided Prompting (IEGP) for freezing patch embedding and merging with superpixels to provide visual prompts, and Adaptive Attention Mechanism Guided Prompting (AAGP) for pinpointing prompt content and efficiently adapting all layers. By integrating SPGP, IEGP, and AAGP, the MVP framework enables the segmentation network to better learn shape prompting information and facilitates mutual learning across different tasks. Extensive experiments conducted on five datasets demonstrate the superior performance of the proposed method in various challenging medical image tasks while simplifying single-task medical segmentation models. This novel framework offers improved performance with fewer parameters and holds significant potential for accurate segmentation of lesion regions in various medical tasks, making it clinically valuable.

15:20-15:40, Paper TuCT11.2
MemADet: A Representative Memory Bank Approach for Industrial Image Anomaly Detection

Li, Min	Qilu University of Technology
He, Jinghui	Qilu University of Technology
Zuobin, Ying	City University of Macau
Li, Gang	Qilu University of Technology
Zhou, Mingle	Qilu University of Technology
Keywords: Machine Vision, Image Processing and Pattern Recognition, Deep Learning Abstract: In the field of industrial production, anomaly detection is crucial for ensuring product quality and maintaining production efficiency. With the continuous advancement of computer vision technology, it has shown tremendous potential in industrial applications. However, the scarcity of labeled anomaly samples in real-world operating environments poses significant challenges for traditional anomaly detection techniques. To address this, we propose MemADet, a novel anomaly detection model that employs an unsupervised approach and leverages a representative memory bank. It employs a dynamic decision mechanism to control the representativeness of the features stored in the memory bank, and employs a weighted anomaly score calculation mechanism to further enhance the performance of image anomaly detection. Our evaluations indicate that MemADet performs robustly in industrial image anomaly detection across three datasets, with particularly notable detection accuracy on the MVTec AD dataset. Its efficacy is further validated by competitive results on two additional datasets, highlighting its consistent and effective performance in various settings.

15:40-16:00, Paper TuCT11.3
Cross-Modality Disentangled Information Bottleneck Strategy for Multimodal Sentiment Analysis

Deng, Zhengnan	The School of Computer Science and Technology, Guangdong Univers
Huang, Guoheng	Guangdong University of Technology, School of Computer Science A
Zhong, Guo	Guangdong University of Foreign Studies, School of Information S
Yuan, Xiaochen	Macao Polytechnic University
Huang, Lian	The Department of Applied Electronics, Guangdong Mechanical And
Pun, Chi-Man	University of Macau
Keywords: Multimedia Computation, Neural Networks and their Applications, Deep Learning Abstract: Multimodal Sentiment Analysis (MSA) is a pivotal area of current research that utilizes diverse information carriers, such as videos containing multiple modalities, to understand the user's sentiment. With the success of multimodal fusion techniques, many fusion strategies have been proposed to obtain a favorable multimodal joint representation for MSA. However, existing studies rarely consider the problem of redundant information within each modality. As a result, the joint representation may contain much redundant information from different modalities, limiting the accuracy of sentiment prediction. In this work, we propose a Cross-Modality Disentangled Information Bottleneck Strategy (CMDIBS), which consists of a Cross-Modality Knowledge Awareness (CMKA) module and a Multimodal Disentangled Information Bottleneck (MDIB) mechanism. Specifically, the CMKA module encourages interactions among different modalities to learn the sentiment embedding relevant to the predicted goals. The MDIB mechanism aims to maximize the mutual information (MI) between the multimodal joint representation and the predicted label, and maximize the MI between the style embedding with the label and the input data, while constraining the MI between the multimodal joint representation and the style embedding to obtain a succinct and efficient multimodal joint representation. Experimental results on the benchmark datasets CMU-MOSI and CMU-MOSEI indicate that the proposed method surpasses existing approaches and attains SOTA performance.

16:00-16:20, Paper TuCT11.4
Enhanced Slicing Prototype and Hybrid Metric Transformer for Few-Shot Medical Image Classification

Wang, Bo	East China Normal University
Wang, Hailing	East China Normal University
Cao, Guitao	East China Normal University
Keywords: Image Processing and Pattern Recognition, Deep Learning, AI and Applications Abstract: As one of the most popular neural network modules, Transformer plays a key role in many fundamental deep learning models such as few-shot medical image segmentation, which aims to segment the target objects in query under the condition of a few annotated support images. Most previous works strive to mine more semantically effective information from the support to match with the corresponding objects in query. Traditional models generally input the whole image into the deep neural network to obtain the feature representation, and use only one measurement method to improve efficiency. If the objects show large intra-class diversity, the discrepancy between query and support images is ignored. To solve this problem, we propose an enhanced slicing prototype and multidimensional metric mechanism to address the inefficiency of existing few-shot learning methods in medical image classification. Instead of inputting the whole image into the deep neural network, our proposed model segments the image into slices and then uses a transformer-based self-attention mechanism to generate enhanced feature vectors. A hybrid metric is then used to measure similarity between features by calculating the distance between the support set and query set slice prototypes to improve efficiency. Experiments demonstrate that our model achieves better classification results on mini-MedMNIST, a few-shot medical image dataset constructed from the MedMNIST dataset.

16:20-16:40, Paper TuCT11.5
Dual Contrastive Learning with Mutual Correction for Semi-Supervised Medical Image Segmentation

Zhu, Jiazhe	Tongji University
He, Lianghua	Tongji University
Keywords: Image Processing and Pattern Recognition, Deep Learning Abstract: Semi-supervised learning has garnered considerable attention from researchers due to its capacity to utilize extensive amounts of unlabeled data, thus reducing the reliance of deep learning models on annotated datasets. However, in medical image segmentation, this method still encounters challenges such as suboptimal pseudo-labeling and insufficient feature extraction because of the confirmation bias problem induced by erroneously fitting unlabeled data. To tackle these challenges, we propose a novel approach that integrates correction modules and contrastive learning. First, our method exploits the difference in the output predictions from two different decoders and employs two rectification losses in the inconsistent regions for labeled and unlabeled data respectively, which mitigates the confirmation bias problem. Additionally, we incorporate two uncertainty-guided pixel-prototype contrastive learning modules, which are designed to perceive complete sample distribution information and optimize the features of pixels with low-uncertainty pseudo labels. Both modules complement each other and enable the encoder to generate class-discriminative features, thereby enhancing the final segmentation performance. Finally, extensive experiments are conducted on the two widely used medical image datasets to demonstrate the effectiveness of our method.

16:40-17:00, Paper TuCT11.6
Shedding New Light on Traditional Image Clustering: A Non-Deep Approach with Competitive Performance and Interpretability

Yang, Jie	University of Technology Sydney
Lin, Sheng-Ku	University of Technology Sydney
Keywords: Image Processing and Pattern Recognition, Machine Learning Abstract: Image clustering, a fundamental task in computer vision, entails grouping images into distinct categories based on their intrinsic properties and similarities. Traditional (non-deep) image clustering models often struggle to achieve high accuracy due to variations in pose, illumination, or occlusion within image datasets, which frequently lead to multi-modal clusters. In recent years, deep neural networks, with their robust representation learning capabilities, have demonstrated considerable accuracy in image clustering tasks. However, the high computational costs and lack of interpretability of deep models have limited their practical application. In this paper, we introduce the MaxFeature Torque Clustering (MFTC) model, a non-deep approach designed as a transitional solution that bridges the gap between traditional and deep image clustering models. MFTC stands out for its accuracy, outperforming conventional image clustering methods, and provides greater interpretability, an aspect often lacking in deep models. Across six publicly available image datasets, the non-deep MFTC model achieved accuracy comparable to or better than previous state-of-the-art (SOTA) deep image clustering models. The codes are available.


TuCT12	MR12
Haptic and Human-Computer Interaction 8	Regular Papers - HMS
Chair: Abekawa, Naotoshi	Nippon Telegraph and Telephone Corporation

15:00-15:20, Paper TuCT12.1
A Computational Mechanism for Forming Arm Motor Memories Differentiated by Dynamic Gaze States

Abekawa, Naotoshi	Nippon Telegraph and Telephone Corporation
Gomi, Hiroaki	Nippon Telegraph and Telephone Corporation
Keywords: Human Performance Modeling, Human-Computer Interaction Abstract: Various kinds of natural reaching behaviors, such as reaching for a cup or learning a new tennis shot, require adaptive changes in response to error signals. Since reaches are usually coordinated with preceding eye movements to a reach target, learning to reach often occurs under coordinated eye-hand movements. To understand the computational mechanism of eye-hand coordination, we examined whether/how preceding eye movements contribute to the formation of motor memory for reaching. In the experiment, participants were asked to perform sequential eye and hand movements toward a visual target (i.e., eye-then-reach) and to learn two opposing visuomotor perturbations (i.e., clockwise and counterclockwise visuomotor rotations), which normally prevent learning due to memory interference. We found that two visuomotor maps for reaching could be learned simultaneously when each was uniquely associated with preceding static and dynamic gaze states. This gaze-dependent learning effect also attenuated as the pause between eye and hand movements increased. These results demonstrate that the brain encodes reaching memories in tight association with the recent history of preceding dynamic gaze states.

15:20-15:40, Paper TuCT12.2
Maximizing Disagreement and Polarization in Social Media Networks Using Double Deep Q-Learning

Zareer, Mohamed	Concordia University
Selmic, Rastko	Concordia University
Keywords: Human-Machine Interaction, Networking and Decision-Making, Human-Computer Interaction Abstract: In this paper, we consider reinforcement learning (RL) techniques to systematically analyze and enhance the levels of disagreement and polarization within social media ecosystems. The proposed methodology employs a Double Deep Q-Learning algorithm to strategically identify individuals within the network. This identification process is aimed at selecting agents for takeover and control, thereby orchestrating a scenario that culminates in the maximization of disagreement and polarization within the network. The social media network is modeled by an asynchronous and synchronous expressed and private opinion dynamics model. The model incorporates a dual-state update mechanism: a synchronous update process for the state representing an individual's private opinion and an asynchronous update process for the state that reflects the individual's publicly expressed opinion. The RL agent's observational capacity is limited to the expressed opinions of individuals and the quantifiable metric of their followers or connections. The proposed model is analyzed for varying topologies and convergence conditions. Simulations are provided to illustrate the results.

15:40-16:00, Paper TuCT12.3
CGCNNet: A Spatial-Temporal Information Confidence Guided GCN Network for Human Pose Estimation

Sun, Yuechao	Beijing University of Technology
Kong, Dehui	Beijing University of Technology
Li, Jinghua	Beijing University of Technology
Li, Qianxing	Beijing University of Technology
Yin, Baocai	Beijing University of Technology
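Keywords: Human-Computer Interaction, Human Perception in Multimedia Abstract: Video-based human pose estimation works with video sequences that provide rich information and typically show notable pose changes between frames. Researchers pay close attention to this variability in video because it has a significant impact on the final results. Given the lack of interpretability of deep learning techniques, existing learning-based methods mainly account for this variability implicitly. To make full use of the variability of the input information, this paper proposes a method that combines explicit and implicit consideration, designed as a spatial-temporal information confidence guided graph convolutional network (CGCNNet), which explicitly considers the confidence of joint sequences in the temporal domain and implicitly considers confidence in the spatial domain. Specifically, from the perspective of the temporal domain, CGCNNet explicitly measures the contribution of each frame in the sequence to the predicted pose of the target frame, guides the fusion of inter-frame information, and partially alleviates the image-diversity-induced ...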

16:00-16:20, Paper TuCT12.4
MmHPE: Human Pose Estimation Based on Point Cloud from Millimeter-Wave Radar

Lai, Jiale	South China University of Technology
Tian, Jiake	South China University of Technology
Zou, Yi	South China University of Technology
Song, Xianfeng	South China University of Technology
Liu, Fangming	Peng Cheng Laboratory
Li, Dacheng	Gosuncn Technology Group Co., Ltd
Keywords: Human-Computer Interaction, Biometrics and Applications Abstract: Rehabilitation therapy involving repetitive exercises targeting specific human joints under the supervision of a doctor is crucial for patients with movement disorders. Yet the cost of commuting and the demand for medical resources are inconvenient for patients. Human-computer interaction can provide remote rehabilitation guidance for patients at home through human pose estimation technology, but privacy concerns with optical sensors and the cost and discomfort of wearable sensors have hindered progress in this field. To address the above challenges, we propose mmHPE, an innovative 3D human pose estimation framework that uses millimeter-wave (mmWave) radar sensors. It first processes the raw data captured from radar sensors to generate a spatiotemporal sequence point cloud dataset. Afterward, we create a Convolutional Neural Network (CNN) that is linked to a Bidirectional Long Short-Term Memory (Bi-LSTM) network. Moreover, a multi-head attention mechanism is employed to boost the network’s performance and to accurately estimate the locations of human skeletons. Ultimately, the 21 points with the corresponding human pose position are successfully reconstructed. We investigate the mmHPE framework’s feasibility and cross-domain stability in different home environments in real-world scenarios. This innovation offers patients a convenient and privacy-conscious solution for their rehabilitation training needs.

16:20-16:40, Paper TuCT12.5
The ATTUNE Model for Artificial Trust towards Human Operators

Petousakis, Giannis	University of Manchester
Cangelosi, Angelo	University of Manchester
Stolkin, Rustam	Extreme Robotics Lab, NCNR, University of Birmingham
Chiou, Manolis	Queen Mary University of London
Keywords: Human-Machine Interaction, Human-Collaborative Robotics, Human Performance Modeling Abstract: This paper presents a novel method to quantify Trust in HRI. It proposes an HRI framework for the estimation of the Robot Trust towards the Human, in the context of a narrow and specified task. The framework produces a real-time estimation of an AI agent’s Artificial Trust towards a Human partner interacting with a mobile teleoperation robot. The approach for the framework is based on principles drawn from Theory of Mind, including information about the human state, action, and intent. The framework is used to create the ATTUNE model for Artificial Trust Towards Human Operator. The model uses metrics on the operator's state of attention, navigational intent, actions and performance to quantify the Trust towards them. The model is tested on a pre-existing dataset that includes recordings (ROSbags) of a human trial in a simulated disaster response scenario. The performance of ATTUNE is evaluated through a qualitative and quantitative analysis. The results of the analyses provide insight into the next stages of the research and help refine the proposed approach.

16:40-17:00, Paper TuCT12.6
Exploring Diverse Augmentation Strategies for Imagined Speech Detection in Brain-Computer Interface Using Machine Learning Models

Mohan, Anand	IIT Roorkee
Anand, Rs	Indian Institute of Technology Roorkee
Keywords: Human-Machine Interaction, Cognitive Computing, Brain-based Information Communications Abstract: Imagined speech refers to the inner articulation of words without vocalization. Augmentation techniques in Electroencephalography (EEG)-imagined speech datasets are essential to compensate for limited data, enhance model generalization, and ensure robust performance. They mitigate overfitting and improve the effectiveness of models trained on these datasets. This work demonstrates the application of diverse data augmentation techniques in imagined speech tasks, where individuals mentally simulate speech without vocalization. The proposed augmentation methods, including overlapping, time warping + overlapping, Gaussian noise, Fourier transform surrogate (FTSurrogate), time reverse, channel shuffle, and time warping + jitter, are compared against a no-augmentation baseline and aim to enhance the robustness and generalization of machine learning models. Leveraging these techniques, several classification experiments using a variety of supervised learning algorithms to distinguish imagined speech patterns are conducted. This work highlights the importance of data augmentation and machine learning (ML) in imagined speech tasks. This work has applications in cognitive neuroscience and assistive rehabilitation technologies. Our multi-layer perceptron (MLP) results are promising. The results are analyzed in terms of accuracy, F1 score and kappa.
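
As a rough illustration of four of the augmentation types named in this abstract (Gaussian noise, time reverse, channel shuffle, overlapping windows), the following minimal Python sketch operates on a toy trial stored as a list of channels. The function names, window width, step, and noise level are assumptions made for illustration only; the abstract does not give the authors' implementation or parameters.

```python
import random

def add_gaussian_noise(trial, sigma, rng):
    """Add zero-mean Gaussian noise to every sample of every channel."""
    return [[x + rng.gauss(0.0, sigma) for x in channel] for channel in trial]

def time_reverse(trial):
    """Reverse each channel along the time axis."""
    return [channel[::-1] for channel in trial]

def channel_shuffle(trial, rng):
    """Return the same channels in a random order."""
    shuffled = list(trial)
    rng.shuffle(shuffled)
    return shuffled

def overlapping_windows(trial, width, step):
    """Cut one trial into windows of `width` samples that start every `step` samples."""
    n_samples = len(trial[0])
    return [[channel[start:start + width] for channel in trial]
            for start in range(0, n_samples - width + 1, step)]

# Toy trial: 2 channels x 6 samples (hypothetical values, not EEG data).
rng = random.Random(0)
trial = [[0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
         [5.0, 4.0, 3.0, 2.0, 1.0, 0.0]]

windows = overlapping_windows(trial, width=4, step=2)  # two windows sharing 2 samples
noisy = add_gaussian_noise(trial, sigma=0.1, rng=rng)
reversed_trial = time_reverse(trial)
shuffled_trial = channel_shuffle(trial, rng)
```

Each function returns a new trial and leaves the input untouched, so augmented copies can be appended to the training set alongside the originals.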


TuCT13	Foyer
Decision Support and Expert Systems	Special Sessions: SSE


TuDT1	MR01
Image Processing and Pattern Recognition 5	Regular Papers - Cybernetics
Chair: Lin, Wei	Fudan University

17:30-17:50, Paper TuDT1.1
TF-Net: Triple Fusion Net for Medical Image Segmentation

Lin, Wei	Fudan University
Meng, Chunlei	Fudan University
Zhang, Hongda	Fudan University
Liu, Bowen	Fudan University
Xie, Yi	Fudan University
Dong, Xinyang	Fudan University
Ouyang, Chun	Fudan University
Gan, Zhongxue	Fudan University
Keywords: Image Processing and Pattern Recognition, Neural Networks and their Applications, Deep Learning Abstract: Lesion segmentation plays a crucial role in various medical image analyses, which not only improves the efficiency in clinical diagnosis but also assists in detecting early symptoms of various diseases. Most existing studies focus on directly extracting lesion information from specific types of medical images with pre-trained weights, often neglecting the underlying topological and pathological causes which lead to these lesions. Furthermore, they fail to capture general anatomical features among lesions, which are related to the distribution of lesions, and thus the models generalize poorly across different medical datasets. Inspired by these insights, we propose a Triple Fusion Net (TF-Net), a network structure divided into three branches: left, middle and right. The left and right branches are designed to extract lesion features and associated topological style features within various medical images, respectively. These features are further fused and modeled in the middle branch. The proposed triple-branch structure for feature fusion effectively learns multi-feature information and improves the performance of TF-Net. Our experiments validate various feature fusion methods in the middle branch, including channel-wise concatenation, element-wise addition, attention gate, and transformer encoder block. Without using pre-trained weights in our network, the transformer encoder block performs best on some tasks of DDR and surpasses other pre-trained models. Channel concatenation exhibits performance close to other pre-trained models on IDRiD, Kvasir-Seg and TN3K. Attention gate fusion also shows competitive results in thyroid ultrasound segmentation. Our approach, leveraging a unique network structure and four different feature fusion methods, demonstrates remarkable generality across a spectrum of medical image segmentation tasks.

17:50-18:10, Paper TuDT1.2
LANet: Luminance-Aware Network for SDRTV-To-HDRTV Translation

Zhao, Jianru	University of Chinese Academy of Sciences
Zhang, Hua	Institute of Information Engineering, Chinese Academy of Science
Keywords: Image Processing and Pattern Recognition, Deep Learning, Neural Networks and their Applications Abstract: High dynamic range (HDR) media resources, providing more details and higher contrast compared to standard dynamic range (SDR) media resources, cannot be captured by standard digital cameras directly due to sensor limitations. However, HDR cameras are excessively expensive, and most existing media resources are still in SDR. With the development of HDR display devices, there is an intense demand to transform the existing SDR media resources into the HDR version. In this work, we propose a deep neural network, LANet, to achieve standard dynamic range television (SDRTV) to high dynamic range television (HDRTV) translation. Specifically, we propose a new luminance-aware branch with self-attention mechanisms to selectively and adaptively process different regions in SDR frames that existing methods have overlooked. Experimental results show that the proposed LANet achieves state-of-the-art performance in quantitative comparisons and that the visual quality of the reconstructed HDR frames has been significantly improved.

18:10-18:30, Paper TuDT1.3
RATLIP: Generative Adversarial CLIP Text-To-Image Synthesis Based on Recurrent Affine Transformations

Lin, Chengde	Guilin University of Electronic Technology
Lu, Xijun	Guilin University of Electronic Technology
Chen, Guangxi	Guilin University of Electronic Technology
Keywords: Deep Learning, Image Processing and Pattern Recognition Abstract: Synthesizing high-quality photorealistic images with textual descriptions as a condition is very challenging. Generative Adversarial Networks (GANs), the classical model for this task, frequently suffer from low consistency between image and text descriptions and insufficient richness in synthesized images. Recently, conditional affine transformations (CAT), such as conditional batch normalization and instance normalization, have been applied to different layers of GAN to control content synthesis in images. CAT is a multi-layer perceptron that independently predicts data based on batch statistics between neighboring layers, with global textual information unavailable to other layers. To address this issue, we first model CAT and a recurrent neural network (RAT) to ensure that different layers can access global information. We then introduce shuffle attention between RAT to mitigate the characteristic of information forgetting in recurrent neural networks. Moreover, both our generator and discriminator utilize the powerful pre-trained model, CLIP, which has been extensively employed for establishing associations between text and images through the learning of multimodal representations in latent space. The discriminator utilizes CLIP’s ability to comprehend complex scenes to accurately assess the quality of the generated images. Extensive experiments have been conducted on the CUB, Oxford, and CelebA-tiny datasets to demonstrate the superiority of the proposed model over current state-of-the-art models.


TuDT2	MR02
Deep Learning and Neural Networks 7	Regular Papers - Cybernetics
Chair: Bi, Jing	Beijing University of Technology

17:30-17:50, Paper TuDT2.1
Multi-Indicator Water Quality Prediction Using Multimodal Bottleneck Fusion and iTransformer with Attention

Bi, Jing	Beijing University of Technology
Li, Yibo	Beijing University of Technology
Zhang, Xuan	Beijing University of Technology
Yuan, Haitao	Beihang University
Wang, Ziqi	Beijing University of Technology
Zhang, Jia	Southern Methodist University
Zhou, Mengchu	New Jersey Institute of Technology
Keywords: Application of Artificial Intelligence, Neural Networks and their Applications, Expert and Knowledge-Based Systems Abstract: Water quality prediction methods forecast short- or long-term trends in water quality, providing proactive advice for water pollution prevention and control. Existing water quality prediction methods only consider the historical data of single-type or multi-type water quality. However, meteorology and other factors also have a significant impact on water quality indicators. Therefore, considering only the historical water quality data is insufficient. Unlike existing studies, this work proposes a hybrid water quality prediction model called CMI to solve the above problem. Before prediction, CMI incorporates a multimodal fusion mechanism of water quality time series and remote sensing images of meteorological rainfall. Moreover, CMI integrates the model of ConvNeXt V2 and a multimodal bottleneck transformer to extract image features for fusing the time series and images. Furthermore, it utilizes an emerging model of iTransformer to realize prediction with the fused features. Experimental results with real-life water quality time series and remotely sensed rainfall images demonstrate that CMI outperforms other state-of-the-art fusion algorithms, and the water quality prediction accuracy with fused meteorological data is 13% higher on average than that with only water quality time series.

17:50-18:10, Paper TuDT2.2
Temporal MLP Bridges the Gap between Embedding and Attention for Multivariate Time Series Forecasting

Xie, Zhinan	ShanghaiTech University
Zheng, Qi	Tongji University
Zhang, Yaying	Tongji University
Keywords: Deep Learning, Machine Learning, Representation Learning Abstract: Multivariate time series forecasting is crucial across various applications. In recent years, numerous studies have adopted embedding layers and Attention mechanisms to extract the intricate spatio-temporal features of time series. This involves directly transmitting the concatenated embeddings into the Attention mechanism. However, they generally overlook the importance of sending the integrated information in the embeddings into the Attention mechanism in a more appropriate way. To address this, we propose an intuitive network model with Temporal MLP Bridging the gap between Embedding and Attention (TMBEA). Specifically, we explore a light-weight bridge with simple Multi-Layer Perceptrons (MLPs) fusing features along the temporal dimension, processing the embeddings before feeding them into the canonical Attention networks, which helps the embeddings better align with the subsequent Attention networks. Experiments on real-world traffic and air pollutant concentration datasets demonstrate the efficiency of the model. Further studies also show the capacity of the bridge to improve the robustness of the model.

18:10-18:30, Paper TuDT2.3
Point Cloud Completion Method Assisted by Projected Image

Lin, Shujin	Sun Yat-Sen University
Li, ZhaoWen	Sun Yat-Sen University
Wu, Runxun	Sun Yat-Sen University
Zhou, Fan	Sun Yat-Sen University
Keywords: Deep Learning Abstract: The image-guided point cloud completion task aims to utilize image information to address the uncertainties in point cloud completion inference. Although 2D image data is simpler to acquire than 3D data, this approach remains ineffective in scenarios with occlusions, where image data cannot be reliably obtained as a reference. Therefore, we propose a point cloud completion model assisted by projected image data, which addresses the limitations of acquiring 2D images by constructing projected images of the point cloud. Extensive experiments demonstrate that our proposed method enhances the quality of point cloud completion and outperforms other advanced methods.


TuDT3	MR03
Haptic and Human-Computer Interaction 5	Special Sessions: HMS
Chair: Wang, Hanying	Northeastern University

17:30-17:50, Paper TuDT3.1
Automatic Recognition of Social Engagement for Children with Autism Spectrum Disorder (I)

Wang, Xinming	Harbin Institute of Technology (Shenzhen)
Zhang, Xiangdong	Center of Medical Prenatal Diagnosis, Lishui Maternity and Child
Wang, Zhiyong	Harbin Institute of Technology, Shenzhen
Nie, Wei	Harbin Institute of Technology (Shenzhen)
Zhang, Hanlin	School of Mechanical and Automation, State Key Laboratory of Rob
Xu, Xiu	Children's Hospital of Fudan University
Liu, Honghai	Shanghai Jiao Tong University
Keywords: Human-Machine Interaction, Affective Computing, Cognitive Computing Abstract: Estimating children’s engagement levels improves the understanding of their social behaviors, since engagement reflects their devotion to social interaction with others. This paper proposes an automatic method to recognize children’s engagement levels in a triadic social interaction context. First, an overall metric function containing behavior, cognition, and affective dimensions is proposed to estimate children’s multidimensional engagement levels. Then, the automatic feature extraction method based on gaze estimation, facial expression recognition, pose estimation, and object recognition models is illustrated to extract features to compute the engagement levels. Videos of 24 children, including 13 children with autism spectrum disorder (ASD), in triadic social interaction were collected for the engagement recognition experiment and cross-group analysis. The experimental results validate the effectiveness of the proposed automatic feature extraction method compared to human observations. Cross-group analyses revealed significant differences in affective engagement between children with ASD and typically developing (TD) children.

17:50-18:10, Paper TuDT3.2
Harmonizing Human Insights and AI Precision: Hand in Hand for Advancing Knowledge Graph Task (I)

Wang, Shurong	Zhejiang University
Zhang, Yufei	Zhejiang University
Huang, Xuliang	Zhejiang University
Wang, Hongwei	Zhejiang University
Keywords: Human-Machine Interaction, Human-Computer Interaction, Human-Machine Cooperation and Systems Abstract: Knowledge graph embedding (KGE) has caught significant interest for its effectiveness in knowledge graph completion (KGC), specifically link prediction (LP), with recent KGE models cracking the LP benchmarks. Despite the rapidly growing literature, insufficient attention has been paid to the cooperation between humans and AI on KG. However, humans' capability to analyze graphs conceptually may further improve the efficacy of KGE models with semantic information. To this effect, we carefully designed a human-AI team (HAIT) system dubbed KG-HAIT, which harnesses the human insights on KG by leveraging fully human-designed ad-hoc dynamic programming (DP) on KG to produce human insightful feature (HIF) vectors that capture the subgraph structural feature and semantic similarities. By integrating HIF vectors into the training of KGE models, notable improvements are observed across various benchmarks and metrics, accompanied by accelerated model convergence. Our results underscore the effectiveness of human-designed DP in the task of LP, emphasizing the pivotal role of collaboration between humans and AI on KG. We open avenues for further exploration and innovation through KG-HAIT, paving the way towards more effective and insightful KG analysis techniques.

18:10-18:30, Paper TuDT3.3
Lexicographic Multi-Objective Order Picking Optimization for Robotic Mobile Fulfillment Systems (I)

Wang, Hanying	Northeastern University
Zhao, Ziyan	Northeastern University
Liang, Jiaqi	Polytechnique Montréal
Li, Xingyang	Northeastern University
Liu, Shixin	Northeastern University
Keywords: Human-Machine Cooperation and Systems Abstract: In light of advancements in artificial intelligence, the Internet of Things, and mechatronics, robots are increasingly integrated into e-commerce warehouses to enable smart order picking solutions and foster intelligent automation. A robot-assisted order picking process revolutionizes the traditional labor-intensive person-to-goods order picking technology, leading to a goods-to-person (G2P) smart warehouse. Within it, robots transport pods to predefined picking stations, where human pickers retrieve the requested goods from these pods to fulfill customer orders. The allocation of pods to robots and the scheduling of picking operations are key optimization issues in G2P order picking systems. Although they play a key role in improving operational efficiency, existing research has paid limited attention to their joint optimization. This study considers a lexicographic multi-objective optimization problem to shorten the order picking cycles under the premise of optimizing the picking efficiency evaluated by makespan. We build a mixed integer program for the newly proposed problem and develop a matheuristic algorithm by integrating a commodity-order model into a metaheuristic algorithm to solve it. Experimental results show that the proposed method can significantly shorten the total order picking cycles while keeping the minimum makespan. It outperforms a recent state-of-the-art algorithm. This work emphasizes the importance of joint optimization within G2P smart warehouses and reveals the high potential of the proposed method to be used in practice.
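
The lexicographic ordering described in this abstract (makespan first, total order picking cycle second) can be illustrated with a small Python sketch over an enumerated set of candidate plans. The `Plan` type, the plan names, and all numeric values are hypothetical; the paper itself solves a mixed integer program with a matheuristic, which this sketch does not attempt to reproduce.

```python
from typing import NamedTuple

class Plan(NamedTuple):
    name: str           # hypothetical label for a candidate schedule
    makespan: float     # primary objective: completion time of the last picking task
    total_cycle: float  # secondary objective: sum of order picking cycles

def lexicographic_best(plans):
    """Minimize makespan first, then minimize total cycle among the makespan-optimal plans."""
    best_makespan = min(p.makespan for p in plans)
    tied = [p for p in plans if p.makespan == best_makespan]
    return min(tied, key=lambda p: p.total_cycle)

# Hypothetical candidates: C has the shortest cycles but a worse makespan, so it is
# excluded before the secondary objective is ever compared.
candidates = [
    Plan("A", makespan=10.0, total_cycle=42.0),
    Plan("B", makespan=10.0, total_cycle=37.0),
    Plan("C", makespan=12.0, total_cycle=20.0),
]
chosen = lexicographic_best(candidates)
```

The key property is that the secondary objective never trades off against the primary one: a plan can only win on total cycle if it already attains the minimum makespan.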


TuDT5
Autonomous Systems and Robotics 2
Chair: Huang, Jie	Wuhan University

17:30-17:50, Paper TuDT5.1
System-Of-Systems, Operations Research and Robotics Swarms Coupling: It’s about Time and It’s Running Out

Omar, Hammami	ENSTA PARIS
Keywords: System Architecture, Robotic Systems, Large-Scale System of Systems Abstract: System-of-systems theory, modeling and design methodologies offer an adequate framework to support the strong emergence of heterogeneous swarms of unmanned systems. The NATO Architecture Framework (NAF) provides the capabilities and system views needed for this objective. This establishes the junction between system of systems and robotics. These two fields, so far decoupled, gain considerably from a unified and seamless view of heterogeneous robotics swarm research issues. This paper advocates the need to enhance the identification and listing of Operations Research problems in the framework of heterogeneous robotic swarms in System-of-Systems, allowing a path for consistent research targets.

17:50-18:10, Paper TuDT5.2
An Air-Ground Cooperative Real-Time Delivery Scheme Based on Joint Scheduling

Huang, Jie	Wuhan University
Liu, Yueheng	Wuhan Cyber Security Association
Cao, Yue	School of Cyber Science and Engineering, Wuhan University, China
Zhang, Xu	University of East Anglia
Lin, Hai	Wuhan University
Song, Yujie	Wuhan University
Chen, Zhuo	Wuhan University
Keywords: Cooperative Systems and Control, Autonomous Vehicle, System Modeling and Control Abstract: With the development of driverless technology, unmanned ground vehicles (UGVs) and unmanned aerial vehicles (UAVs) have been widely used in the realm of commodity delivery. In scenarios with high timeliness requirements for commodity delivery, the mobile unmanned retail mode has attracted considerable attention. However, the traditional retail mode solely focuses on presetting routes for delivery, and fails to deal with real-time changes in customers' orders with the assistance of UAVs and UGVs. In this paper, we propose an air-ground cooperative real-time delivery scheme based on joint scheduling. Specifically, UGVs deliver commodities to customers based on spatial-temporal costs (e.g., delivery distance and order urgency). The UAV serves to replenish UGVs with commodity resources, achieving a balance between regional resource consumption and UAV resource replenishment based on joint scheduling. Finally, experimental results show that our scheme outperforms other baseline schemes in terms of customers' average waiting time, UGVs' average delivery delay time, and UAV total flight distance.

18:10-18:30, Paper TuDT5.3
A Diverse Group Trading Strategy Portfolio Optimization Algorithm Based on Network Modularity (I)

Chideme, Kudakwashe	National Kaohsiung University of Science and Technology
Chen, Chun-Hao	National Kaohsiung University of Science and Technology
Hong, Tzung-Pei	National University of Kaohsiung
Keywords: Consumer and Industrial Applications, Decision Support Systems Abstract: Strategically allocating capital and safeguarding investors from potential adverse conditions are important challenges in finance. Building upon Markowitz's Modern Portfolio Theory (MPT), which advocates diversifying investments across uncorrelated assets, we extend this principle to encompass diversified trading strategies. Previous works in portfolio optimization have explored various assets and methodologies, including machine learning and evolutionary algorithms. The Group Trading Strategy Portfolio (GTSP) framework, which utilizes the grouping genetic algorithm (GGA) to optimize portfolios of trading strategies, has also been proposed. However, challenges persist in the GTSP framework, such as the absence of a mechanism in fitness evaluation to ensure dissimilarity among trading strategies within the same group, leading to the inclusion of highly correlated trading strategies in the portfolios generated. Additionally, in the GTSP framework, strategies rely on a single stock series, which poses a risk, as the underlying risk profile remains uniform across strategies, limiting diversification opportunities. To overcome these challenges, we integrate the GTSP approach with network theory, specifically modularity, to create a model we call GTSP-Modu. By introducing diverse underlying assets, ensuring similarity within groups, and redesigning the fitness function, the proposed approach outperforms the previous approach in terms of risk-adjusted returns in experiments on real datasets.


TuDT6	MR06
Ethics, Assurance, and Security
Chair: Jourabchi Amirkhizi, Parisa	Faculty of Design, Tabriz Islamic Art University

17:30-17:50, Paper TuDT6.1
Fault Detection Filtering for Unmanned Surface Vehicles with Markov Switching and Random Packet Losses

Hu, Meng-Jie	China University of Mining and Technology
Park, Ju H.	Yeungnam University
Cheng, Jun	Guangxi Normal University
Keywords: Fault Monitoring and Diagnosis, Autonomous Vehicle, System Modeling and Control Abstract: This paper explores the dissipative fault detection filtering (FDF) issue for unmanned surface vehicles (USVs) within a network environment, considering switching channels and random occurrence of packet losses. Specifically, a multi-channel transmission mechanism is adopted to enhance the reliability of the system, and the switching channel is orchestrated by a Markov chain. The random intermittent data dropouts in each channel between the system and FDF modeled as a Bernoulli process are considered. Leveraging the Lyapunov theory and slack matrix technique, switching-channel-dependent FDF is crafted to detect faults for USVs and ensure that the augmented system keeps stochastic stability while attaining a strictly dissipative performance. Finally, an illustrative example is presented to validate the effectiveness of the proposed method for USVs.

17:50-18:10, Paper TuDT6.2
Resilience Engineering for Industry 5.0 Cyber-Physical Systems: Dynamic Adaptation and Fault Tolerance

Jourabchi Amirkhizi, Parisa	Faculty of Design, Tabriz Islamic Art University
Pedrammehr, Siamak	Deakin University
Pakzad, Sajjad	Faculty of Design, Tabriz Islamic Art University
Nahavandi, Saeid	Swinburne University of Technology
Arogbonlo, Adetokunbo	Deakin University
Asadi, Houshyar	Deakin University
Keywords: Fault Monitoring and Diagnosis, Adaptive Systems, Cyber-physical systems Abstract: Industry 5.0 signifies a transformative integration of cyber-physical systems (CPS) into industrial processes, promising efficiency and innovation but also posing challenges in resilience and reliability. Resilience engineering offers a pivotal framework to address these challenges. However, existing approaches reveal significant gaps, including the need for more adaptive and resilient strategies, a comprehensive understanding of interconnectedness, heightened awareness of vulnerabilities, and integration of advanced technologies. Motivated by these gaps, our research aims to contribute to the development of robust industrial systems in the digital era. We offer a comprehensive review of resilience engineering principles within Industry 5.0 CPS and propose a novel model for integration, emphasizing system resilience, reliability, and sustainability. Our study sets the stage for future research in refining the proposed model, exploring additional applications, and addressing emerging challenges. By embracing resilience engineering, organizations can navigate the complexities of Industry 5.0, ensuring reliability, safety, and sustainability in an interconnected world.

18:10-18:30, Paper TuDT6.4
Web User Profiling Using Fuzzy Signatures and Browser Fingerprinting (I)

Aliberti, Luca	University of Salerno
Apicella, Francesco	Evolution Group
D'Aniello, Giuseppe	University of Salerno
Flammini, Francesco	Mälardalen University
Gaeta, Matteo	University of Salerno
Salzano, Simone	University of Salerno
Keywords: Homeland Security, Enterprise Information Systems, Consumer and Industrial Applications Abstract: Accurately identifying and profiling users is one of the primary challenges of many modern web applications. This paper presents an approach for user profiling that utilizes Fuzzy User Signatures combined with browser fingerprinting techniques. Our approach analyzes users' web domain visit frequencies and categories to determine their preferences and behaviors. Fuzzy User Signatures provide a condensed representation of user activities, enabling a framework for assessing user similarity. This method can significantly improve web navigation experiences by allowing for personalized content and product recommendations. The approach has been evaluated on a dataset comprising users' web activities combined with browser fingerprints, achieving good overall performance.


TuDT7	MR07
Online - AI Applications 4
Chair: Qiao, Gaofei	Inner Mongolia University

17:30-17:50, Paper TuDT7.1
MECNet: Multi-Scale Exposure-Consistency Learning Via Fourier Transform for Exposure Correction

Qiao, Gaofei	Inner Mongolia University
Zhang, Zhibin	Inner Mongolia University
He, Liqiang	Geomechanica Inc
Keywords: Image Processing and Pattern Recognition, Deep Learning, Neural Networks and their Applications Abstract: In the real world, due to various challenging lighting conditions such as low light, underexposure, and overexposure, captured images often exhibit undesirable appearances. Given that images with different exposure levels require different correction processes, a single neural network struggles to produce satisfactory results. We propose a coarse-to-fine exposure correction model for learning exposure consistency representation to address underexposure and overexposure issues. Building upon the bilateral activation mechanism, we introduce the Fourier transform to capture global information and fuse it with locally extracted information through convolution to achieve superior feature representation. Additionally, we employ Laplacian pyramids to decompose the source image into different spatial frequency bands, then the image details are enhanced by denoising high-frequency layers. Experimental results on the MSEC and SICE datasets demonstrate the superiority of our proposed method over current state-of-the-art approaches. Our code will be made available on GitHub.
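
The decomposition of an image into spatial frequency bands mentioned in this abstract can be sketched with a simplified Laplacian-style pyramid in plain Python. This is only an illustration under stated assumptions: it uses 2x2 average pooling and nearest-neighbour upsampling in place of the Gaussian filtering of a classical Laplacian pyramid, assumes image sides divisible by 2 at every level, and does not represent the MECNet code.

```python
def downsample(img):
    """Average each 2x2 block (assumes an even number of rows and columns)."""
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
             for c in range(0, len(img[0]), 2)]
            for r in range(0, len(img), 2)]

def upsample(img):
    """Nearest-neighbour upsampling: repeat every pixel in a 2x2 block."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def subtract(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def build_pyramid(img, levels):
    """Return [detail_0, ..., detail_{levels-1}, low_frequency_residual]."""
    bands, current = [], img
    for _ in range(levels):
        low = downsample(current)
        bands.append(subtract(current, upsample(low)))  # high-frequency detail at this scale
        current = low
    bands.append(current)
    return bands

def reconstruct(bands):
    """Invert build_pyramid by upsampling and adding the detail bands back."""
    current = bands[-1]
    for band in reversed(bands[:-1]):
        current = add(upsample(current), band)
    return current

# Toy 4x4 "image" with hypothetical intensity values.
image = [[1.0, 2.0, 3.0, 4.0],
         [5.0, 6.0, 7.0, 8.0],
         [9.0, 8.0, 7.0, 6.0],
         [5.0, 4.0, 3.0, 2.0]]
bands = build_pyramid(image, levels=2)
restored = reconstruct(bands)
```

Because each detail band stores exactly what the upsampled low-resolution image loses, the bands can be processed separately (for example, denoising only the high-frequency layers) and then recombined.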

17:50-18:10, Paper TuDT7.2
An Optimal Multi-Path Power Routing and Transmission Scheduling Approach for Peer-To-Peer Power Trading in Energy Internet

Maya, Neethu	Indian Institute of Science
Sundararajan, Narasimman	Nanyang Technological University
Sundaram, Suresh	Indian Institute of Science
Keywords: Intelligent Power Grid, Smart Buildings, Smart Cities and Infrastructures Abstract: Energy Internet exploits a network of sparsely interconnected prosumers and maximizes the Peer-to-Peer (P2P) power exchanges by routing power through intermediate peers. A novel graph theory-based deterministic approach for P2P power routing that aids efficient distance-based prosumer matching while addressing limited connectivity challenges is presented here. The proposed Optimal-Walk Multi-path Power Routing (OMPR) approach consists of two steps. First, an optimal-walk connectivity algorithm calculates the shortest path between all the nodes. Second, leveraging this data, the Multi-path Power Routing (MPR) algorithm identifies optimal paths for handling multiple simultaneous P2P exchanges. In the MPR algorithm, the power scheduling and routing are formulated and solved as a Multi-path Power Scheduling Optimization (MPSO) problem, for which linear and nonlinear programming formulations are compared. Performance evaluation of the OMPR approach demonstrates the algorithm's scalability and capability to handle complex scenarios efficiently, including rerouting upon connectivity disruptions. The linear MPSO formulation within the OMPR facilitates real-time implementation compared to nonlinear MPSO by solving an 18-walk P2P exchange in a 100-node community in 0.18 seconds.
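
The first step described in this abstract, computing shortest paths between all nodes of a sparsely connected prosumer network, can be sketched with the standard Floyd-Warshall algorithm. This is a stand-in chosen for illustration: the abstract does not specify how the optimal-walk connectivity algorithm works, and the node indices, line lengths, and function names below are assumptions.

```python
import math

def all_pairs_shortest_paths(n, edges):
    """Floyd-Warshall on an undirected weighted graph with nodes 0..n-1.

    Returns (dist, nxt): dist[i][j] is the shortest distance and nxt[i][j] is the
    next hop on a shortest path from i to j (None if j is unreachable).
    """
    dist = [[math.inf] * n for _ in range(n)]
    nxt = [[None] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        nxt[i][i] = i
    for u, v, w in edges:
        if w < dist[u][v]:
            dist[u][v] = dist[v][u] = w
            nxt[u][v] = v
            nxt[v][u] = u
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt

def route(nxt, src, dst):
    """Rebuild the node sequence from src to dst, or [] if there is no path."""
    if nxt[src][dst] is None:
        return []
    path = [src]
    while src != dst:
        src = nxt[src][dst]
        path.append(src)
    return path

# Hypothetical 4-prosumer community: the direct line 0-3 is longer than the
# detour through intermediate peers 1 and 2.
lines = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (0, 3, 5.0)]
dist, nxt = all_pairs_shortest_paths(4, lines)
path_0_to_3 = route(nxt, 0, 3)
```

After a connectivity disruption, removing the affected line from `lines` and recomputing the tables yields the rerouted paths.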

18:10-18:30, Paper TuDT7.3
A Survey of Applications for Anomaly Detection in the IoT Methods, New Perspectives, and Future

Wang, Yingxiang	Sichuan Normal University
Guo, Rongzuo	Sichuan Normal University
Min, Peng	Sichuan Normal University
Keywords: Fault Monitoring and Diagnosis, System Architecture Abstract: In recent years, with the rapid increase in the popularity of cellular Internet of Things (IoT) devices and the sharp increase in the number of end users, ensuring the stability and reliability of IoT systems has become an important challenge. In this context, anomaly detection techniques provide solutions for IoT applications. This paper reviews anomaly detection research applied in the field of the IoT in recent years (mainly from 2018 to 2023) from a technical perspective. First, the causes and basic types of IoT anomalies are introduced to provide a better understanding of the importance of anomaly detection. Second, we focus on research progress in machine learning and edge computing, and propose a general workflow for anomaly detection in the IoT based on edge computing, which we call ECADW. Furthermore, the challenges of anomaly detection in the IoT are discussed, and future research perspectives are outlined. It is hoped that this review can help researchers to better understand the research direction of this topic and choose interesting anomaly detection techniques.


TuDT8	MR08
Online - Manufacturing Automation and Systems	Regular Papers - Cybernetics
Chair: Zhou, Ziyang	Huizhou University

17:30-17:50, Paper TuDT8.1
A Robust Method for Camera Calibration in Noisy Settings Based on Genetic Programming

Casado, Ricardo	Universidade Federal De São Carlos
Tronco, Mário Luiz	São Paulo University
Pedrino, Emerson	Federal University of Sao Carlos
Keywords: Evolutionary Computation, Image Processing and Pattern Recognition, Machine Vision Abstract: In this article, we introduce a novel camera calibration method using genetic programming, to calibrate cameras in noisy environments. Traditional calibration methods, such as those of Tsai and Zhang, extensively use the pinhole camera model, and are less accurate in the presence of noise. In this work, high precision calibration is achieved using pseudo linear genetic programming. Instead of the pinhole camera model, pseudo linear genetic programming generates mathematical functions which allow for far greater precision in the calibration process than classical methods, regardless of the environmental conditions. The method has several challenges such as identification of suitable calibration functions and creation of an extensive training database. However, the method provides advantages in terms of better results quality and practicality, as it eliminates the necessity of the intrinsic camera parameters. The results illustrate that this methodology is far superior in comparison to the current state-of-the-art technique, Zhang's widely used method, with a 20× improvement in calibration accuracy.

17:50-18:10, Paper TuDT8.2
DocDeshadower: Frequency-Aware Transformer for Document Shadow Removal

Zhou, Ziyang	Huizhou University
Lei, Yingtie	University of Macau
Chen, Xuhang	Huizhou University
Luo, Shenghong	University of Macau
Zhang, Wenjun	Tp-Link International Shenzhen Co., Ltd., Shenzhen, China
Pun, Chi-Man	University of Macau
Wang, Zhen	Huizhou University
Keywords: Image Processing and Pattern Recognition, Multimedia Computation, Machine Vision Abstract: Shadows in scanned documents pose significant challenges for document analysis and recognition tasks due to their negative impact on visual quality and readability. Current shadow removal techniques, including traditional methods and deep learning approaches, face limitations in handling varying shadow intensities and preserving document details. To address these issues, we propose DocDeshadower, a novel multi-frequency Transformer-based model built upon the Laplacian Pyramid. By decomposing the shadow image into multiple frequency bands and employing two critical modules, the Attention-Aggregation Network for low-frequency shadow removal and the Gated Multi-scale Fusion Transformer for global refinement, DocDeshadower effectively removes shadows at different scales while preserving document content. Extensive experiments demonstrate DocDeshadower's superior performance compared to state-of-the-art methods, highlighting its potential to significantly improve document shadow removal techniques. The code is available at https://github.com/leiyingtie/DocDeshadower.

18:10-18:30, Paper TuDT8.3
HCU-Net: An Efficient Fully Convolutional Neural Network for Thyroid Nodules Ultrasound Image Segmentation

Ren, Mengxi	Wuhan University of Science and Technology
Liu, Jun	Wuhan University of Science and Technology
Keywords: Deep Learning, AI and Applications Abstract: Thyroid nodules, which can manifest as solitary or multiple growths, are among the most prevalent diseases with a high incidence rate. When examining thyroid nodules, ultrasound is typically the first step. Ultrasound image segmentation of thyroid nodules can assist physicians in diagnosis. However, the following three factors limit the development of thyroid nodule segmentation: (1) most existing studies optimize the segmentation of single nodules, yet the incidence of multiple nodules is higher, so the segmentation of multiple nodules is also very important; (2) ultrasound images of thyroid nodules are difficult to segment because of their low resolution and fuzzy boundaries; (3) thyroid nodule ultrasound datasets are small, and on such small datasets attention mechanisms may be prone to overfitting, resulting in poorer generalization performance. Therefore, attention-based models may not be suitable for the segmentation of thyroid nodules. Hence, we propose Hybrid Convolutional U-Net (HCU-Net), a simple and strong fully convolutional neural network for the segmentation of single or multiple thyroid nodules. HCU-Net incorporates a large receptive field to ensure accurate and stable segmentation. Specifically, we design a Hybrid Branch Block to enhance the network’s feature extraction capability for single or multiple nodules in ultrasound images by integrating multi-scale features. Furthermore, we add Enhanced ConvNeXt Blocks to the U-Net’s final layer, enabling the extraction of global contextual information to improve segmentation performance. Experimental results on two publicly available datasets show that our proposed method can effectively mitigate these issues. In addition, compared to attention-based models, our network is simpler and more suitable for small datasets such as ultrasound images.


TuDT9	MR09
AI Applications 11	Regular Papers - Cybernetics
Chair: Yang, Jun	Beijing Information Science and Technology University

17:30-17:50, Paper TuDT9.1
Issue Title Generation: How Far Can Large Language Models Go?

Yang, Jun	Beijing Information Science and Technology University
Liu, Shifan	Beijing Information Science and Technology University
He, Qifan	Beijing Information Science and Technology University
Xie, Songcheng	Beijing Information Science and Information Technology Universit
Cui, Zhanqi	Beijing Information Science and Technology University
Keywords: Application of Artificial Intelligence, AI and Applications, Expert and Knowledge-Based Systems Abstract: In open-source software and platforms, developers utilize issues to record software failures or propose new features. The title of an issue, which is a mandatory field, should accurately describe the core content in a concise way. However, developers often face challenges in crafting high-quality issue titles due to insufficient experience or limited proficiency. As a result, researchers have proposed several methods for automatically generating issue titles, but these typically rely on constructing large datasets to train models. Recently, Large Language Models (LLMs) have exhibited exceptional performance across a variety of general tasks, suggesting significant potential for issue title generation. Initial experiments indicate that the direct application of LLMs fails to yield satisfactory results. Therefore, we propose a method named LBITG (LLMs-Based Issue Title Generation). LBITG enhances the effectiveness of LLMs by providing contextual information through four types of prompts, which include an example prompt and a label prompt. These prompts serve as guidance for LLMs, thereby further improving their performance. Experimental results demonstrate that LBITG can significantly enhance the quality of issue titles generated by LLMs without any training. In the within-project scenario, LBITG achieves a minimum improvement of 111.29% in ROUGE, 104.54% in BLEU, and 188.48% in METEOR compared to iTAPE, and achieves performance comparable to that of the SOTA method iTiger. Moreover, LBITG demonstrates superior performance in the cross-project scenario, where it outperforms iTiger by 25.33%, 30.14%, and 27.29% in terms of ROUGE-1, BLEU-1, and METEOR, respectively.

17:50-18:10, Paper TuDT9.2
Learnable Filter with Decoupling Fusion Method for Sequential Recommendation

Long, Hua	Chongqing University of Technology
Huang, BingWen	Chongqing University of Technology
Lu, Jiaqiang	Chongqing University of Technology
Keywords: Application of Artificial Intelligence, Deep Learning, Big Data Computing Abstract: The aim of sequential recommendation is to predict the next possible interaction item in a sequence by interpreting the dynamic preferences derived from users' historical interactions. However, existing methods encounter challenges in effectively leveraging side information that could offer a more precise depiction of user preferences. Furthermore, the presence of noise in the user visit sequence, caused by accidental clicks, hampers the prediction of the subsequent item, thereby undermining the effectiveness of the recommendation. To address these issues, we propose a novel model, LFDF-SR. This model combines a Learnable Filter with Decoupling Fusion Method for Sequential Recommendation. It uses an encoder-decoder architecture to model the interactive items and their corresponding side information separately. To minimize noise, the encoder component incorporates a learnable stacked filter layer to refine item embeddings. In order to characterize side information more efficiently, we integrate the heterogeneous side information from different sources into a holistic feature representation, which is then embedded through a linear layer. The decoder component then acquires fused embeddings through a decoupled fusion method of side information, and subsequently merges the denoised item embeddings and fused embeddings using cross-attention to generate the final representation of each item. Extensive experiments on three publicly available real-world datasets demonstrate that our model outperforms nine baseline methods in recommendation performance. These results highlight the advantages of LFDF-SR in handling noise and effectively utilizing side information, showcasing its potential to enhance sequential recommendation performance.

18:10-18:30, Paper TuDT9.3
AlpaCream: An Effective Method of Data Selection on Alpaca

Li, Yijie	Minzu University of China
Sun, Yuan	Minzu University of China; Minority Languages Branch, National La
Keywords: AI and Applications, Deep Learning, Machine Learning Abstract: Instruction Fine-Tuning (IFT) optimizes Large Language Models (LLMs) to enhance the comprehension and execution of user instructions through the use of extensive instruction datasets. However, these datasets are often voluminous, repetitive, and contain a substantial proportion of low-quality data, necessitating effective selection. Traditional data selection methods are plagued by challenges such as insufficient diversity, unclear selection criteria, biases from external LLMs, and excessive resource consumption. This paper proposes AlpaCream, a novel instruction data selection methodology designed for industrial application that aligns with expert insights and ensures maximum diversity of the data. Our approach involves initially categorizing the instruction data into dense clusters using a topic model, and then employing a quality assessment model to isolate high-quality instruction data subsets based on their categorizations. The quality of the selected data is then further enhanced through prompting. Finally, the augmented data is used to fine-tune the base LLM, yielding a model with strong instruction-following capability. The method achieves an average performance increase of 31.3% to 39.8% on the Alpaca 52k dataset while using just 5.76% of the total instruction data. Moreover, AlpaCream surpasses other models developed through alternative instruction fine-tuning data selection methods.


TuDT10	MR10
Fuzzy Systems and Hybrid Models	Regular Papers - Cybernetics
Chair: Elleuch, Souhir	University of Sfax

17:30-17:50, Paper TuDT10.1
Verifying Robustness of Neural Networks with Abstract Features

Li, Xuejian	Anhui University
Xia, Hantao	Anhui University
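Keywords: Neural Networks and their Applications, Hybrid Models of Neural Networks, Fuzzy Systems, and Evolutionary Computing, Assurance Abstract: Neural networks are vulnerable to attack because they are easily affected by small perturbations of their inputs. Ensuring stability and robustness in high-security scenarios is therefore essential. Many methods aim to verify the robustness of neural networks by defining robust regions. However, these methods cannot represent feature regions fully accurately, which leads to imprecise robustness bounds. In this paper, we propose a method that captures abstract features based on abstract interpretation theory and then verifies the robustness of neural networks. First, samples are divided into decision states and non-decision states, where a decision state corresponds to the true output classification in the abstract output domain. Then, we iterate from the abstract output domain back to the input domain to obtain the abstract features associated with the decision states. Experimental results show that, compared with strategies that define robust regions, the resulting abstract features are effective and, in verifying robustness bounds, offer higher …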

17:50-18:10, Paper TuDT10.2
Efficient PSO Coupled with a Local Search Heuristic for Radio Resource Allocation in V2X Communications

Elleuch, Souhir	University of Sfax
Ibtissem, Brahmi	University of Sfax
Monia, Hamdi	University of Sfax
Faouzi, Zarai	University of Sfax
Keywords: Metaheuristic Algorithms, Hybrid Models of Neural Networks, Fuzzy Systems, and Evolutionary Computing, Swarm Intelligence Abstract: Research on cooperative intelligent traffic issues has enhanced ground transportation's efficiency, safety, and comfort. The present work addresses the resource allocation problem in V2X communications. We propose a new hybrid metaheuristic for a resource allocation scheme to maximize the system's total sum rate. This method is based on the particle swarm optimization (PSO) algorithm and a newly proposed local search. We try to leverage the strengths and mitigate the weaknesses of both algorithms. The standard PSO algorithm has issues converging to optimal solutions because it lacks exploitation abilities. The local search aims to expand the search space vertically, allowing for a more balanced approach and addressing global exploration and local exploitation. We compared the proposed approach to the PSO and the Ant Colony Optimization (ACO) algorithms. The simulation was conducted using the MATLAB software platform. The results demonstrate that the algorithms proposed in this article significantly improve the system throughput and access rate of vehicular user equipment (VUEs) while ensuring the data rate of cellular user equipment (CUEs). The results demonstrate the superiority of the proposed scheme.

18:10-18:30, Paper TuDT10.3
Joint Optimization of Recursive Graph Encoding and CNN Model Based on Heuristic Algorithms

Zhang, Hongda	Fudan University
Gan, Zhongxue	Fudan University
Liu, Yi	Fudan University
Lin, Wei	Fudan University
Liu, Bowen	Fudan University
Meng, Chunlei	Fudan University
Ouyang, Chun	Fudan University
Keywords: Hybrid Models of Neural Networks, Fuzzy Systems, and Evolutionary Computing, Heuristic Algorithms, Neural Networks and their Applications Abstract: Peripheral waveform analysis (PWA), which is generally used to reveal hidden health status information from peripheral pulse signals, typically involves three procedures: signal preprocessing, feature extraction, and pattern classification. With the advancement of data-driven deep neural network methodologies, feature extraction and pattern classification have progressively converged into end-to-end neural networks, where the final layer of the network is equivalent to conventional pattern classifiers. However, the performance of deep learning models heavily relies on the quality of the dataset, rendering signal preprocessing a crucial component. This study proposes a framework that integrates signal preprocessing, feature extraction, and pattern classification into a unified learning approach using heuristic algorithms, enabling the automatic discovery of optimal data encoding methods and their corresponding models. Initially, the search space is defined based on parameters relevant to signal preprocessing, and a fitness function is constructed using a CNN. Subsequently, the optimal combination of data preprocessing and CNN is determined through the heuristic algorithm Particle Swarm Optimization (PSO). The proposed method was evaluated on a dataset comprising authentic clinical cases of type 2 diabetes screening involving approximately 200 volunteers. The model derived from this framework demonstrates the capability to effectively discriminate between healthy volunteers and those with diabetes, achieving the highest accuracy of 93.6%. Compared to state-of-the-art algorithms, the proposed model was shown to be competitive in both accuracy and time cost.


TuDT11	MR11
Image Processing and Pattern Recognition 3	Regular Papers - Cybernetics
Chair: Cong, Shan	Harbin Engineering University

17:30-17:50, Paper TuDT11.1
A Cross-Modal Interactive Memory Network Based on Fine-Grained Medical Feature Extraction for Radiology Report Generation

Ma, Xitong	Qilu University of Technology (Shandong Academy of Sciences)
Kuang, Yuansen	Qilu University of Technology(Shandong Academy of Sciences)
Yuan, Lin	Qilu University of Technology (Shandong Academy of Sciences)
Tian, Cheng	Qilu University of Technology (Shandong Academy of Sciences)
Zeng, YiJie	Qilu University of Technology(Shandong Academy of Sciences)
Liu, Song	Qilu University of Technology (Shandong Academy of Sciences)
Keywords: Image Processing and Pattern Recognition, Deep Learning, Neural Networks and their Applications Abstract: Radiology report generation is an essential task in the medical field, which aims to automate the generation of medical terminology descriptions of radiology images. However, this task currently suffers from several problems: 1) existing methods need to manually build knowledge graphs or templates (consuming time and effort) to introduce medical or prior knowledge to assist in report generation; 2) previous models cannot handle the problem of data bias well (anomaly reports and anomaly descriptions make up only a tiny portion of the dataset), causing the models to easily neglect the learning of anomaly descriptions; 3) existing approaches cannot robustly supervise the model, resulting in incomplete and inconsistent reports being generated. To address these issues, we propose a cross-modal interactive memory network based on fine-grained medical feature extraction. In our model, we design a cross-modal interactive memory network to automatically store and remember the required medical text knowledge and use this medical knowledge to help generate reports. Furthermore, we design an abnormal medical knowledge enhancement module that strengthens the learning of abnormal fine-grained knowledge by letting disease topics and their states interact with text features. In addition, we design a cross-modal joint semantic loss unit to reduce semantic differences between different features and improve the visual representation ability of the model. We evaluated our model on the MIMIC-CXR and IU-Xray datasets and compared it with other baseline models.

17:50-18:10, Paper TuDT11.2
A Novel 3D Medical Image Segmentation Model Using Improved SAM

Kuang, Yuansen	Qilu University of Technology(Shandong Academy of Sciences)
Ma, Xitong	Qilu University of Technology (Shandong Academy of Sciences)
Zhao, Jing	Qilu University of Technology (Shandong Academy of Sciences)
Wang, Guangchen	Qilu University of Technology (Shandong Academy of Sciences)
Zeng, YiJie	Qilu University of Technology(Shandong Academy of Sciences)
Liu, Song	Qilu University of Technology (Shandong Academy of Sciences)
Keywords: Image Processing and Pattern Recognition, Deep Learning, Neural Networks and their Applications Abstract: 3D medical image segmentation is an essential task in the medical image field, which aims to segment organs or tumours into different labels. A number of issues exist with the current 3D medical image segmentation task: existing models cannot simultaneously obtain the space correlation and depth correlation of 3D slices; previous models suffer from local detail loss of positional embedding in 3D images; previous approaches often blur boundaries when segmenting 3D images. To address these shortcomings, we propose a 3D medical image segmentation model named TPM-SAM. In our model, we design a twin-channel image encoder to simultaneously capture the space correlation and depth correlation of 3D slices through a multi-head attention mechanism and improved adapters. Furthermore, we design a prompt encoding generator, which divides the volumetric image into small blocks and better captures the local detail information. In addition, we introduce a multi-layer aggregation decoder by employing U-Net with multi-level skip connections to reduce the blurring of boundaries in processing 3D images. Finally, we evaluated our model on the KiTS21 and LiTS17 datasets and compared it with other baseline models.

18:10-18:30, Paper TuDT11.3
DDASR: Domain-Distance Adapted Super-Resolution Reconstruction of MR Brain Images

Cong, Shan	Harbin Engineering University
Cui, Kailong	Qingdao Innovation and Development Center, Harbin Engineering Un
Yang, Yuzun	Harbin Engineering University
Zhou, Yang	Harbin Medical University Cancer Hospital
Wang, Xinxin	Harbin Medical University Cancer Hospital
Luo, Haoran	Harbin Engineering University
Zhang, Yichi	Fudan University
Yao, Xiaohui	Harbin Engineering University
Keywords: Image Processing and Pattern Recognition, Biometric Systems and Bioinformatics, Computational Life Science Abstract: High-detail and fast magnetic resonance imaging (MRI) sequences are in high demand in clinical settings, as inadequate imaging information can lead to diagnostic difficulties. MR image super-resolution (SR) is a promising way to address this issue, but its performance is limited due to the practical difficulty of acquiring paired low- and high-resolution (LR and HR) images. Most existing methods generate these pairs by down-sampling HR images, a process that often fails to capture complex degradations and domain-specific variations. In this study, we propose a domain-distance adapted SR framework (DDASR), which includes two stages: the domain-distance adapted down-sampling network (DSN) and the GAN-based super-resolution network (SRN). The DSN incorporates characteristics from unpaired LR images during the down-sampling process, enabling the generation of domain-adapted LR images. Additionally, we present a novel GAN with enhanced attention U-Net and multi-layer perceptual loss. The proposed approach yields visually convincing textures and successfully restores outdated MRI data from the ADNI1 dataset, outperforming state-of-the-art SR approaches in both perceptual and quantitative evaluations. Code is available at https://github.com/Yaolab-fantastic/DDASR.


TuDT12	MR12
Haptic and Human-Computer Interaction 9
Chair: Cao, Yukun	Shanghai University of Electric Power

17:30-17:50, Paper TuDT12.1
FRD-DST: A Fine-Grained Relation-Driven Model for Dialogue State Tracking

Cao, Yukun	Shanghai University of Electric Power
Chen, Ming	Shanghai University of Electric Power
Li, Jingjing	Shanghai University of Electric Power
Liu, Yuanmin	Shanghai University of Electric Power
Wang, Tianhao	Shanghai University of Electric Power
Keywords: Human-Computer Interaction, Networking and Decision-Making, Augmented Cognition Abstract: Dialogue State Tracking (DST) aims to convert dialogue history into dialogue states of slot-value pairs. Many existing studies usually utilize deep neural networks to learn the representation of dialogues and slots. However, these studies usually do not adequately consider the fine-grained relationships between each word and slot in a dialogue. Meanwhile, due to the complexity of the task, there needs to be more textual and semantic diversity in dialogues. To address these challenges, we propose a fine-grained relation-driven model (FRD-DST) to synthesize the dialogues and the connections between words and slots. In this approach, each word and slot of information in a dialogue context is constructed as a dialogue word-slot heterograph, and a relationship aggregation network is used to capture the fine-grained features between them. Meanwhile, to complement the contextual association features that the relational aggregation network may not adequately capture, we use a conditional random field (CRF) to capture the dialogue contextual association features after synonym replacement. The two feature information sets are fused in a hidden space, which can create new features by mixing the hidden states of different texts to create new semantic variants, thus enhancing the diversity of dialogue texts and semantics. The experimental results show that FRD-DST achieves state-of-the-art DST performance on the MultiWOZ 2.1 and MultiWOZ 2.4 datasets compared to existing DST methods.

17:50-18:10, Paper TuDT12.2
Computational Modeling of Mental Health Checkup with Response-Based Characterization Using a Smart Mirror

Noguchi, Taiga	Tsukuba University
Hirokawa, Masakazu	NEC Corporation
Doki, Shotaro	Tsukuba University
Suzuki, Kenji	University of Tsukuba
Keywords: Human-Computer Interaction, User Interface Design, Human-Machine Interface Abstract: In this study, we developed a smart mirror device for continuous mental health checkup based on individual response characteristics through simple and short dialogue interaction on a daily basis. Early detection and intervention have a predominant impact on remission in many cases of mental illness, and daily monitoring is essential for early detection. However, existing methods require time-consuming and burdensome measurements and rely on information obtained from patients' experiences, communication, and physical findings for diagnosis. We propose a mental health checkup system that uses supervised learning to model a physician's evaluation based on the response characteristics of individuals during short dialogue interaction with the smart mirror. A validation experiment with 10 participants was conducted for two weeks, and the results showed that mental health checkups by a physician were successfully reproduced with the proposed algorithm. In conclusion, this study demonstrates the potential of the smart mirror for mental health checkup in daily life. It provides a potentially more objective and non-invasive method for early detection and monitoring, which could contribute to gradual improvements in mental healthcare.

18:10-18:30, Paper TuDT12.3
Ontology-Driven Reinforcement Learning for Personalized Student Support (I)

Hare, Ryan	Rowan University
Tang, Ying	Rowan University
Keywords: Human-Machine Cooperation and Systems, Human-centered Learning, Assistive Technology Abstract: In the search for more effective education, there is a widespread effort to develop better approaches to personalize student education. Unassisted, educators often do not have time or resources to personally support every student in a given classroom. Motivated by this issue, and by recent advancements in artificial intelligence, this paper presents a general-purpose framework for personalized student support, applicable to any virtual educational system such as a serious game or an intelligent tutoring system. To fit any educational situation, we apply ontologies for their semantic organization, combining them with data collection considerations and multi-agent reinforcement learning. The result is a modular system that can be adapted to any virtual educational software to provide useful personalized assistance to students.

Technical Program for Tuesday October 8, 2024