SMC 2024 Program | Wednesday October 9, 2024


WeAT1	MR01
Cybernetics and Quantum Systems 3	Regular Papers - Cybernetics
Chair: Sheng, Taoran	University of Texas at Arlington

08:45-09:05, Paper WeAT1.3
Fine-Grained Derivative-Free Simultaneous Optimistic Optimization with Local Gaussian Process

Song, Junhao	East China Normal University
Zhang, Yangwenhui	East China Normal University
Qian, Hong	East China Normal University
Keywords: Machine Learning, Computational Intelligence, Evolutionary Computation Abstract: Derivative-free optimization has achieved remarkable success across a variety of applications where the explicit formulation of an objective function is inaccessible. Learning an accurate surrogate model from solutions and their function values is crucial for derivative-free optimization. Methods for constructing global surrogate models, such as Bayesian optimization (BO), encounter the challenge of high learning cost, which impairs optimization efficiency. Splitting the entire search domain into smaller regions, a series of domain partition methods are proposed, like simultaneous optimistic optimization (SOO). It has demonstrated notable effectiveness in derivative-free optimization but still has room for improvement due to its relatively coarse-grained partition strategy. To this end, this paper proposes a fine-grained simultaneous optimistic optimization (FGSOO) method with local Gaussian process. Specifically, FGSOO designs a fine-grained partition strategy to endow SOO with the capability of cross-height comparison, and utilizes local Gaussian process to make nodes' potential more representative, so as to reduce the required number of solutions for learning surrogate models. Compared with BO, FGSOO reduces the learning cost. Meanwhile, compared with SOO, FGSOO could avoid unnecessary partition. The experimental results on real-world tasks, such as trajectory optimization and molecule substructure optimization, verify that FGSOO surpasses the compared methods in improving efficiency while maintaining effectiveness.

09:05-09:25, Paper WeAT1.4
Domain Knowledge Based Weakly Self-Supervised Human Activity Recognition with Wearables

Sheng, Taoran	University of Texas at Arlington
Huber, Manfred	The University of Texas at Arlington
Keywords: Machine Learning, Application of Artificial Intelligence, Deep Learning Abstract: Recognizing different types of human activities from wearable sensor-based data remains a challenging research topic in ubiquitous computing, despite the availability of embedded sensors in smartphones and wearable devices. The lack of labeled data poses a significant hurdle for existing human activity recognition (HAR) systems that heavily rely on supervised methods. In this paper, we propose a novel weakly self-supervised approach consisting of two stages. Firstly, our model leverages the inherent nature of human activities to project the data into an embedding space, grouping similar activities together. Secondly, the model is fine-tuned using similarity information in a few-shot learning fashion, enhancing the embedding's discriminative power. This enables downstream classification or clustering tasks to benefit from the learned embeddings. We evaluate our framework on three benchmark datasets and demonstrate its effectiveness. Our approach achieves comparable performance to pure supervised techniques applied directly to fully labeled datasets, thereby aiding in identifying and categorizing underlying human activities. The results highlight the potential of our approach to improve clustering algorithms for activity recognition tasks in real-world scenarios with limited labeled data.

09:25-09:45, Paper WeAT1.5
UG-STNN: A Spatial-Temporal Neural Network Based on Unsupervised Graph Representation Module for Traffic Flow Prediction

Zhang, Enwei	Qingdao University
Cheng, Zesheng	College of Computer Science and Technology, Qingdao University,
Wang, Tiankuan	Faculty of Computer Science, University of Alberta, Edmonton, Ca
Liu, Weidong	Menual School, Qingdao
Keywords: Machine Learning Abstract: Accurate and efficient traffic flow prediction helps to build an intelligent transportation system and improve the travel experience in daily life. In this study, a new Spatial-Temporal Neural Network Based on Unsupervised Graph Representation Module (UG-STNN) is proposed to improve the graph convolution module, which uses unsupervised learning to extract features in spatial dimensions, and it can learn the structural and feature information in the graph better. Our UG-STNN uses fewer convolutional layers to reduce the number of parameters, decrease the complexity of the model, and improve performance and accuracy. From the experimental results of UG-STNN on different test datasets, the model can approach or even achieve better prediction results compared with other models, which well illustrates the accuracy and stability of the UG-STNN model.


WeAT2	MR02
Entertainment and Media Computing
Chair: Zhao, Hongtian	Xinjiang University

08:05-08:25, Paper WeAT2.1
ReMark: Reversible Lexical Substitution-Based Text Watermarking

Jiang, Ziyu	Sichuan University
Wang, Hongxia	Sichuan University
Keywords: Information Assurance and Intelligence, Multimedia Computation, Application of Artificial Intelligence Abstract: Neural-based natural language watermarking (NLW) shows promise for generating context-aware lexical substitutions, minimizing semantic loss in watermarked text. However, existing works confront two primary challenges: 1) the reliance and sensitivity on textual context during substitutes generation hinders text reversibility, and 2) strict synchronization constraints on the generation order of substitutes from both original and watermarked text blocks out some suitable substitutes, limiting watermark capacity. This paper puts forward a reversible neural NLW approach with improved capacity and text quality. Specifically, we construct a novel lexical substitution system (LSS), utilizing prompt learning for candidates generation and comprehensive assessment features for candidates ranking. A reversible watermarking scheme is then presented by ingeniously screening recoverable positions and enabling multi-bit substitutions via the proposed LSS. Experiments validate that our method achieves complete reversibility while enhancing watermark payload and text fidelity compared to prior arts.

08:25-08:45, Paper WeAT2.2
Improving Multi-View Vehicle Identification in Complex Scenes Using Robust Deep Neural Networks

Zhao, Hongtian	Xinjiang University
Keywords: Deep Learning, Multimedia Computation, AI and Applications Abstract: Vehicle re-identification is crucial for intelligent transportation systems and traffic management, enabling the matching of vehicles across diverse camera captures. The primary challenge lies in the significant variation in appearance and background due to different camera angles, which complicates the retrieval of consistent vehicle images from a database. The variability directly impacts the effectiveness of re-identification techniques. This paper proposes a novel learning approach that leverages a view-consistent triplet loss framework and region segmentation to address the challenges of pose variation and background complexity in vehicle re-identification due to multi-view imaging. Specifically, the consistency of segmentation area distribution is used to estimate view consistency, and then triplets are selected based on it. Concentrating on distinguishing features between sample pairs that are likely to be confused, our approach markedly enhances model robustness in scenarios involving multiple perspectives. Experimental evaluations on the Veri776 dataset demonstrate that the proposed method surpasses several state-of-the-art techniques across various metrics and shows exceptional performance in recognizing samples with complex viewpoints, thus validating the efficacy of our approach.

08:45-09:05, Paper WeAT2.3
Multimodal Mutual Learning with Online Knowledge Distillation for Dim Object Recognition in Aerial Images

Li, Zixing	National University of Defense Technology
Lan, Zhen	National University of Defense Technology
Yan, Chao	Nanjing University of Aeronautics and Astronautics
Xiang, Xiaojia	National University of Defense Technology
Tang, Dengqing	National University of Defense Technology
Keywords: Deep Learning, Multimedia Computation Abstract: Deep learning methods have shown promise in various visual tasks such as object recognition. However, achieving robust and accurate performance in dim object recognition for remote sensing images remains challenging in the field of computer vision. This challenge can be attributed to factors such as cluttered backgrounds, varying observing angles, and limited availability of labeled data. In contrast, the human brain exhibits robust and efficient recognition of sensitive targets. To leverage the strengths of both computer calculation and human cognition, we propose a multimodal mutual learning with online knowledge distillation method (MMOKD) for object recognition. Our approach enables simultaneous training and mutual learning between modalities, where each modality serves as both a teacher and a student. A series of experiments are conducted to verify the potential of multimodal learning for object recognition. The results demonstrate that our approach not only enhances the robustness of multimodal fusion model, but also improves the accuracy of visual modality.


WeAT3	MR03
Wearable Computing
Chair: Ting, Wei-Lun	Department of Electronic Engineering, National Taipei University of Technology

08:05-08:25, Paper WeAT3.1
Real-Time Pedestrian Dead Reckoning for IoT Based Platform-Independent Positioning System

Vu, Anh Van	Korea Advanced Institute of Science and Technology
Nguyen, Thanh Minh	Korea Advanced Institute of Science and Technology
Sung, Changmin	Korea Advanced Institute of Science and Technology
Han, Dongsoo	Korea Advanced Institute of Science and Technology
Keywords: Wearable Computing Abstract: This paper introduces a real-time pedestrian dead reckoning (PDR) algorithm for Internet of Things (IoT) devices. It is motivated by a practical challenge encountered during the development and operation of a positioning system named KAILOS, which provides positioning services via a smartphone application. Its dependence on smartphones subjects data collection to constraints imposed by operating systems. To mitigate this issue, we designed and incorporated a dedicated IoT device into the system, enabling full access to sensing data. This invention demands developing and porting a PDR algorithm into the IoT device to address the communication bottleneck with a remote positioning server. Our goal was to process a large amount of data from motion sensors instantly on the device to produce precise positioning information without forwarding all of the data to the server. To this end, we propose a new approach for detecting steps using acceleration differential in our PDR. Additionally, a dynamic gyroscope bias update strategy is included to enhance the capability of heading estimation. These advancements not only enhance the accuracy of the PDR algorithm but also facilitate its implementation on IoT devices. We practically deployed the PDR algorithm into our IoT hardware platform called Kailos Tag (K-Tag). Through extensive experiments conducted both indoors and outdoors, we found that our real-time PDR outperformed the conventional methods. It reduces step detection errors (SDE) to approximately 1.6%, travel distance errors (TDE) to below 1.8%, and end/start errors (E/SE) to about 3.2 m regardless of environment. Moreover, it enables an average positioning latency of 2.49 ms, while consuming only 20% of CPU usage and 8.6% of total power consumption.

08:25-08:45, Paper WeAT3.2
FedBChain: A Blockchain-Enabled Federated Learning Framework for Improving DeepConvLSTM with Comparative Strategy Insights

Li, Gaoxuan	Monash University
Lim, Chern Hong	Monash University Malaysia
Ma, Qiyao	Sichuan University
Tang, Xinyu	Monash University
Tew, Hwa Hui	Monash University Malaysia
Ding, Fan	Monash University
Luo, Xuewen	Monash University
Keywords: Wearable Computing, Human-centered Learning, Systems Safety and Security Abstract: Recent research in the field of Human Activity Recognition has shown that an improvement in prediction performance can be achieved by reducing the number of LSTM layers. However, this kind of enhancement is only significant on monolithic architectures, and when it runs on large-scale distributed training, data security and privacy issues will be reconsidered, and its prediction performance is unknown. In this paper, we introduce a novel framework: FedBChain, which integrates the federated learning paradigm based on a modified DeepConvLSTM architecture with a single LSTM layer. This framework performs comparative tests of prediction performance on three different real-world datasets based on three different hidden layer units (128, 256, and 512) combined with five different federated learning strategies, respectively. The results show that our architecture has significant improvements in Precision, Recall and F1-score compared to the centralized training approach on all datasets with all hidden layer units for all strategies: FedAvg strategy improves on average by 4.54%, FedProx improves on average by 4.57%, FedTrimmedAvg improves on average by 4.35%, Krum improves by 4.18% on average, and FedAvgM improves by 4.46% on average. Based on our results, it can be seen that FedBChain not only improves in performance, but also guarantees the security and privacy of user data compared to centralized training methods during the training process. The code for our experiments is publicly available (https://github.com/Glen909/FedBChain).

08:45-09:05, Paper WeAT3.3
Development of Smart Mask System Integrated with Alert Detection and Vital-Sign Measurement (I)

Ting, Wei-Lun	Department of Electronic Engineering, National Taipei University
Hsiao, Chun-Chieh	Lunghwa University of Science and Technology
Lee, Ren-Guey	National Taipei University of Technology
Keywords: Wearable Computing, Environmental Sensing, Assistive Technology Abstract: This research focuses on enhancing worker safety in environments with high levels of TVOC gases, such as toluene, a neurotoxin. Traditional cartridge replacement methods, based on fixed intervals, lack precision and pose risks. We propose a novel approach using the SGP30 sensor to monitor cartridge effectiveness, aligned with PEL-TWA regulations. Our system triggers alerts when sensor readings exceed 10 ppm for toluene, 35 ppm for carbon monoxide, and 5000 ppm for carbon dioxide. Additionally, we incorporate physiological monitoring via Photoplethysmography (PPG) using the AFE4404 sensor to assess Heart Rate (HR) and Heart Rate Variability (HRV), alongside respiration monitoring with the D6F-P0010AM2 airflow sensor. Data from these sensors is transmitted via low-power Bluetooth to a mobile APP, enabling real-time monitoring of the wearer’s condition. In case of cartridge failure or abnormal physiological readings, the APP triggers vibrations on the mask and automatically sends an SOS to the employer’s server via UDP protocol, ensuring immediate intervention. This system aims to enhance occupational safety by reducing the risk of accidents and safeguarding worker well-being.
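
The alert rule stated in the abstract above can be illustrated with a minimal sketch. Only the three ppm limits come from the abstract; the dictionary layout, the gas names used as keys, and the function name `gases_over_limit` are hypothetical and are not taken from the authors' system.

```python
# Alert thresholds in ppm, as stated in the abstract.
# The key names and data layout are illustrative assumptions.
THRESHOLDS_PPM = {
    "toluene": 10.0,
    "carbon_monoxide": 35.0,
    "carbon_dioxide": 5000.0,
}


def gases_over_limit(readings_ppm):
    """Return the sorted names of gases whose reading exceeds its threshold.

    A gas missing from `readings_ppm` is treated as 0 ppm (an assumption).
    """
    return sorted(
        gas
        for gas, limit in THRESHOLDS_PPM.items()
        if readings_ppm.get(gas, 0.0) > limit
    )


if __name__ == "__main__":
    sample = {"toluene": 12.0, "carbon_monoxide": 20.0, "carbon_dioxide": 5000.0}
    print(gases_over_limit(sample))
```

The comparison is strict because the abstract says alerts fire when readings "exceed" the limits, so a reading exactly at a limit does not raise an alert in this sketch.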


WeAT4	MR04
Human-Centered Design and Systems	Regular Papers - HMS


WeAT5	MR05
Cyber-Physical Systems and Robotics 1	Regular Papers - SSE
Chair: George, Nijil	TCS Research, Tata Consultancy Services Ltd

08:05-08:25, Paper WeAT5.1
Visual Anomaly Detection with Self-Attention and Separate Memory Bank

Hattori, Kosaburo	Ritsumeikan University
Ishibashi, Ryuto	Ritsumeikan University
Kaneko, Hayata	Ritsumeikan University
Meng, Lin	Ritsumeikan University
Izumi, Tomonori	Ritsumeikan University
Keywords: Robotic Systems, Soft Robotics, Manufacturing Automation and Systems Abstract: Declining birthrate and aging populations are progressing all over the world. This has led to labor shortage, making visual inspections more challenging in various industries. Recently, visual anomaly detection methods using deep learning have been proposed to solve these problems. However, they are computationally expensive and difficult to infer in real-time, even in a GPU environment. In addition, while they detect structural anomalies (e.g., scratches and stains), logical anomalies (e.g., mis-position and mis-number) cannot be detected. This work proposes an anomaly detection method to detect both structural and logical anomalies with high speed by improving PatchCore. The proposal applies self-attention mechanism for the intermediate layer of the pre-trained Convolutional Neural Networks (CNN) model. Self-attention mechanism enables the model to understand the relationships between image features and detect logical anomalies. In addition, the global and local features are extracted from the intermediate layer of the pretrained CNN model and stored in Separate Memory Bank (SMB). SMB leads to improving AUROC, which represents accuracy, by calculating features for each feature type. It also avoids unnecessary upsampling and reduces the dimensionality, thus improving inference speed. Experiments validate the proposed method and compare previous anomaly detection methods. Experiments evaluate the performance of the proposal for the CAD-SD dataset and MVTec LOCO dataset, which contains structural and logical anomalies. For Co-occurrence dataset, the experimental results show that the proposal achieves 98.5% (improving 2.2%) for AUROC and 16.1 (improving 66.6%) for FPS compared to the state-of-the-art method. Also, the experimental results show that the proposal achieves 82.8% (improving 0.9%) for MVTec-LOCO dataset. Hence, the proposal can contribute to the efficiency and automation of manufacturing, medical, and other fields.

08:25-08:45, Paper WeAT5.2
HyperSurf: Quadruped Robot Leg Capable of Surface Recognition with GRU and Real-To-Sim Transferring

Satsevich, Sergei	Skolkovo Institute of Science and Technology
Savotin, Yaroslav	Skolkovo Institute of Science and Technology
Belov, Danil	Skolkovo Institute of Science and Technology
Pestova, Elizaveta	Skolkovo Institute of Science and Technology
Erkhov, Artem	Skolkovo Institute of Science and Technology
Khabibullin, Batyr	Skolkovo Institute of Science and Technology
Bazhenov, Artem	Skolkovo Institute of Science and Technology
Kovalev, Vyacheslav	Skolkovo Institute of Science and Technology
Fedoseev, Aleksey	Skolkovo Institute of Science and Technology
Tsetserukou, Dzmitry	Skoltech
Keywords: Mechatronics, Adaptive Systems, Robotic Systems Abstract: This paper introduces a system of data collection acceleration and real-to-sim transferring for surface recognition on a quadruped robot. The system features a mechanical single-leg setup capable of stepping on various easily interchangeable surfaces. Additionally, it incorporates a GRU-based Surface Recognition System, inspired by the system detailed in the Dog-Surf paper [1]. This setup facilitates the expansion of dataset collection for model training, enabling data acquisition from hard-to-reach surfaces in laboratory conditions. Furthermore, it opens avenues for transferring surface properties from reality to simulation, thereby allowing the training of optimal gaits for legged robots in simulation environments using a pre-prepared library of digital twins of surfaces. Moreover, enhancements have been made to the GRU-based Surface Recognition System, allowing for the integration of data from both the quadruped robot and the single-leg setup. The dataset and code have been made publicly available.

08:45-09:05, Paper WeAT5.3
System for Autonomous Management of Retail Shelves Using an Omnidirectional Dual-Arm Robot with a Novel Soft Gripper

George, Nijil	TCS Research, Tata Consultancy Services Ltd
Saha, Somdeb	Tata Consultancy Services
Parab, Shubham	Tata Consultancy Services
Vakharia, Vismay	Tata Consultancy Services
Lima, Rolif	Tata Consultancy Services
Vatsal, Vighnesh	TCS Research, Tata Consultancy Services Ltd
Das, Kaushik	TCS Research
Keywords: Robotic Systems, Soft Robotics, Consumer and Industrial Applications Abstract: Managing shelves in retail stores includes restocking, rearrangement and replenishment of products. As these are some of the most labor-intensive activities, there has been widespread demand from retailers for automation in this domain. However, major challenges still remain in perception, navigation and manipulation while implementing an autonomous robotic system for this purpose. We present a system aimed at addressing some of these challenges through novel approaches. In terms of perception, we have developed a transformer-based local anomaly detection algorithm that can identify misplaced items without the need for a central database. Navigation of the omnidirectional mobile base is performed through stereo vision and LiDAR sensors. Finally, identifying grasping and manipulation as one of the key shortcomings of present robotic systems in this domain, we have developed a customized soft robotic gripper targeted at retail objects. It has compliant cable-driven fingers, and a palm configuration that can be adapted in real-time based on the target object's geometry. Coupled with a conventional two-fingered gripper in a dual-arm setup, this system is equipped to handle most objects encountered in a retail setting. We describe the underlying hardware and algorithms for each component of the system, evaluating their individual performance. We then evaluate the whole system in a mock retail setup, demonstrating promising results for autonomous management of shelves.

09:05-09:25, Paper WeAT5.4
Jam-Absorption Driving with Data Assimilation

Li, Siyu	The University of Tokyo
Nishi, Ryosuke	Tottori University
Yanagisawa, Daichi	The University of Tokyo
Nishinari, Katsuhiro	The University of Tokyo
Keywords: Intelligent Transportation Systems, Autonomous Vehicle Abstract: This paper introduces a data assimilation (DA) framework based on the extended Kalman filter-cell transmission model, designed to assist jam-absorption driving (JAD) operation to alleviate sag traffic congestion. To ascertain and demonstrate the effectiveness of the DA framework for JAD operation, in this paper, we initially investigated its impact on the motion and control performance of a single absorbing vehicle. Numerical results show that the DA framework effectively mitigated underestimated or overestimated control failures of JAD caused by misestimation of key parameters (e.g., free flow speed and critical density) of the traffic flow fundamental diagram. The findings suggest that the proposed DA framework can reduce control failures and prevent significant declines and deteriorations in JAD performance caused by changes in traffic characteristics, e.g., weather conditions or traffic composition.


WeAT6	MR06
Infrastructure Systems and Services 1	Regular Papers - SSE
Chair: Walter, Marcelo Luis	PUC-PR

08:05-08:25, Paper WeAT6.1
Satellite Image and Tree Canopy Height Analysis Using Machine Learning on Google Earth Engine with Carbon Stock Estimation

Loo, ChuKiong	University of Malaya
Wang, Huang Han	University of Malaya
Keywords: Smart Buildings, Smart Cities and Infrastructures Abstract: This research presents a comprehensive investigation into the dynamics of forested ecosystems using advanced geospatial techniques and machine learning applications, focusing on the University of Malaya study area. The study aims to contribute crucial data for informed decision-making aligned with sustainable development goals. It encompasses canopy height estimation, aboveground biomass density prediction, and carbon stock estimation. Machine learning algorithms, including Random Forest, Gradient Boost Tree Regression, and Support Vector Machine, are employed for canopy height estimation. Their performance is evaluated with and without Principal Component Analysis using metrics such as Root Mean Squared Error and R-squared. Results, summarized in Table 1 and Table 2, highlight the variability in canopy height predictions across different models and feature selection methods. The research explores challenges associated with GEDI Aboveground Biomass Density data, emphasizing spatial variability in model performance across different strata. Results, detailed in Table 3, underscore the importance of tailoring the model, especially in areas characterized by high biomass canopy forests. The integration of Aboveground Biomass Density data with tree cover datasets forms the basis for aboveground carbon stock estimation. Carbon stock is calculated considering forest area, land-use types, and specific carbon content factors. Findings, presented in Table 5, reveal a spectrum of aboveground carbon stock estimates, reflecting the complexity of the University of Malaya study area. This research advances remote sensing and machine learning in forestry and environmental monitoring. Its insights support informed decision-making and policy formulation.

08:25-08:45, Paper WeAT6.2
Evaluation of Machine Learning Models in a Smart Water Metering System

Walter, Marcelo Luis	PUC-PR
Ribeiro, Juliano	Pontifícia Universidade Católica - PUC-PR
Nunes, Leonardo Reis	Sumersoft Tecnologia
Nodari, Alexandre Luis	PUCPR
Pellenz, Marcelo Eduardo	Graduate Program in Computer Science (PPGIa) - Pontifical Cathol
Scalabrin, Edson Emilio	Pontifícia Universidade Católica Do Paraná
Tramontini, Ramon	PUC Parana
Keywords: Smart Metering, Smart Buildings, Smart Cities and Infrastructures, Fault Monitoring and Diagnosis Abstract: The integration of AI with IoT heralds the era of AIoT (Artificial Intelligence of Things). It represents a transformative approach in technology and opens up a new opportunity for deploying machine learning models in embedded devices that face resource constraints and operate on the edge of networks. Central to this study is the implementation of computational vision techniques for digit recognition, evaluating various machine learning models, particularly in the context of smart metering. The selected models were converted from GPU-equipped workstations to ESP32-S3 microcontroller-based low-end devices. Through a series of experiments using ESP32-S3 development kits, the MNIST database, and TensorFlow Lite, we explore the effectiveness of these models in smart metering applications, focusing on accuracy, inference times, and the challenges in model conversion. The findings demonstrate the feasibility of executing machine learning inferences on low-end devices with high accuracy in smart meter contexts. However, challenges such as model size limitations, processing speed, conversion difficulties, and potential accuracy loss were noted. Not all models were viable for conversion to TensorFlow Lite. Simpler models like LeNet5 emerged as effective solutions for smart metering applications, balancing size, accuracy, and latency. This work offers practical insights for researchers and engineers looking to implement machine learning in AIoT and smart metering environments, highlighting the trade-offs and considerations for effective deployment.

08:45-09:05, Paper WeAT6.3
A Neighborhood Reconstruction-Based Cyber Attack Detection Method for Smart Grid Security

Ren, Wanwan	Central South University
Peng, Jun	Central South University
Li, Shuo	Changsha University of Science and Technology
Zhang, Rui	Changsha University
Rong, Jieqi	Central South University
Li, Heng	Central South University
Keywords: Smart Buildings, Smart Cities and Infrastructures Abstract: The integration of advanced communication and information technologies in smart grids has led to enhanced efficiency and reliability but also introduced security vulnerabilities, prompting the need for robust cyber attack detection methods. Traditional approaches struggle to capture evolving attack patterns and handle high-dimensional data, highlighting the necessity for more sophisticated approaches. A neighborhood reconstruction-based smart grid attack detection scheme based on subgraphs is proposed. By leveraging Graph Neural Networks (GNNs), the challenge of capturing complex interdependencies among grid nodes is addressed. This approach employs unsupervised learning principles, training the model solely on normal data and utilizing the reconstruction error of node features to detect attacks. Additionally, by subgraph sampling and feature suppression, the model's ability to utilize neighborhood information is enhanced, thereby further improving detection effectiveness. Simulation results on IEEE 30-bus and IEEE 118-bus power system demonstrate the feasibility of the method, achieving a detection accuracy of 96.67% and 97.46%, respectively.
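
The detection principle named in the abstract above, flagging a node when the reconstruction error of its features is large, can be sketched as follows. This is only an illustration under stated assumptions: the GNN that produces the reconstruction is not shown, the mean-squared-error measure is assumed, and the threshold rule in `calibrate_threshold` (a margin times the largest error observed on normal data) is a hypothetical choice, not the authors' procedure.

```python
def reconstruction_error(features, reconstructed):
    """Mean squared error between a node's features and their reconstruction."""
    if len(features) != len(reconstructed):
        raise ValueError("feature vectors must have the same length")
    squared = [(f - r) ** 2 for f, r in zip(features, reconstructed)]
    return sum(squared) / len(squared)


def calibrate_threshold(normal_errors, margin=1.5):
    """Hypothetical rule: margin times the largest error seen on normal data."""
    return margin * max(normal_errors)


def is_attack(features, reconstructed, threshold):
    """Flag a node whose reconstruction error exceeds the threshold."""
    return reconstruction_error(features, reconstructed) > threshold


if __name__ == "__main__":
    threshold = calibrate_threshold([0.1, 0.2])
    print(is_attack([1.0, 2.0], [1.0, 4.0], threshold))
    print(is_attack([1.0, 2.0], [1.0, 2.1], threshold))
```

Because the model is trained only on normal data, measurements that deviate from normal behaviour should reconstruct poorly, which is what the threshold comparison relies on.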


WeAT7	MR07
Online - AI Applications 5
Chair: Zou, Zhiyuan	Wuhan Textile University

08:05-08:25, Paper WeAT7.1
A Multi-Lead Electrocardiogram Signal Classification Method Based on Temporal and Multi-View Contrastive Learning

Li, Luyao	Qilu University of Technology (Shandong Academy of Sciences)
Liu, Hui	Qilu University of Technology (Shandong Academy of Sciences)
Zhou, Shuwang	Shandong Artificial Intelligence Institute, Qilu University of T
Liu, Zhaoyang	Shandong Artificial Intelligence Institute, Qilu University of T
Shu, Minglei	Shandong Artificial Intelligence Institute, Qilu University of T
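Keywords: Application of Artificial Intelligence, Biometric Systems and Bioinformatics, Neural Networks and their Applications Abstract: The rise of wearable devices has led to the generation of large amounts of unlabeled electrocardiogram (ECG) data. Making effective use of these data has become a challenge. One way to address this problem is contrastive learning. However, most existing contrastive learning methods are based mainly on data augmentation and use augmented ECG signals for comparison. These methods have certain drawbacks. While augmented ECG signals may help highlight certain features, they may also mask important variations in the original signal, leading to the loss of important information. Moreover, relying solely on augmented data may cause the model to depend too heavily on specific variation patterns and to overlook the true features present in the original signal, resulting in poor generalization and robustness on downstream tasks. To address this problem, we propose a multi-lead ECG signal classification method based on temporal and multi-view contrastive learning. The method exploits the temporal invariance of ECG signals and the original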

08:25-08:45, Paper WeAT7.2
XWCoDe: XGBoost with Weighted Code Dependency for Requirements-To-Code Traceability Link Recovery

Zou, Zhiyuan	Wuhan Textile University
Wang, Bangchao	School of Computer Science and Artificial Intelligence, Wuhan Te
Deng, Yang	School of Computer Science and Artificial Intelligence, Wuhan Te
Wan, Hongyan	School of Computer Science and Artificial Intelligence, Wuhan Te
An, Zhiquan	Wuhan Textile University
Cao, Yukun	Wuhan Textile University, School of Computer Science and Artific
Keywords: Machine Learning Abstract: Information Retrieval (IR), Machine Learning (ML), and Deep Learning (DL) have become mainstream methods for traceability link recovery. However, IR-based methods face the challenge of low precision, while DL-based methods require large-scale training data to achieve better performance. In this paper, we propose a novel model, XWCoDe, which applies XGBoost combined with a weighted code dependency strategy to the traceability link recovery domain. In order to refine the initial candidate links generated by the XGBoost model, the strategy only modifies low-confidence candidate links and pioneers the use of the graph embedding technology node2vec to calculate the importance of each code dependency relationship. The experimental results show that the average F1 score of XWCoDe on 4 datasets and 9 training/testing ratios is 12.93% higher than the state-of-the-art method DF4RT.

08:45-09:05, Paper WeAT7.3
Robotic Crop Disease Monitoring Using Neural Network-Based Prediction and Weighted Path Planning

Sutton, Jacob	University of North Florida
Dutta, Ayan	University of North Florida
Kreidl, O. Patrick	University of North Florida
Boloni, Ladislau	University of Central Florida
Roy, Swapnoneel	University of North Florida
Keywords: Application of Artificial Intelligence, Deep Learning, Computational Intelligence Abstract: Disease control is paramount in modern agriculture to ensure optimal yield. Monitoring the spread of crop diseases is crucial for effective control measures. Traditional methods involve uniform pesticide spraying across entire fields, which can be inefficient and environmentally harmful. In this paper, we propose an intelligent solution employing mobile robots equipped with predictive AI techniques for disease monitoring and targeted intervention. These robots strategically visit select locations within the field, guided by a convolutional and recurrent neural network model trained on limited data to predict disease spread. We introduce a novel weighted path planning algorithm to optimize robot movement within the field considering disease risk and battery constraints. Our approach is implemented in the WaterBerry benchmark, an open-source platform for agricultural robotics. Experimental results demonstrate the efficacy of our technique, showcasing improved prediction accuracy and operational efficiency compared to baseline methods.

09:05-09:25, Paper WeAT7.4
Optimal Barcode Representation for NLP Embeddings

Sinha, Soumen	Mahindra University
Asilian Bidgoli, Azam	Wilfrid Laurier University
Rahnamayan, Shahryar	Brock University
Keywords: Deep Learning, Evolutionary Computation, Computational Intelligence in Information Abstract: Using binary representations of embeddings instead of real-valued features is a promising avenue for memory savings and faster operations in various machine learning models. In this paper, we explore a barcode representation for text embeddings derived from BERT, which is optimized using the Coordinate Search algorithm. These binary embeddings provide a compact representation of text, thereby reducing memory and computational demands, which is especially advantageous in resource-intensive large-scale text processing tasks. We introduce a novel optimal threshold technique, coupled with the Coordinate Search algorithm, to transform continuous BERT embeddings into binary barcodes, thereby enabling effective Natural Language Processing while sustaining computational efficiency. The optimal barcode representations have been applied in Natural Language Processing applications, showcasing their potential for text representation. Through an extensive series of experiments on various NLP tasks encompassing diverse datasets, we comprehensively evaluate our approach, comparing it against a spectrum of thresholding techniques. The binary embeddings obtained with optimal thresholds outperform traditional binarization methods in terms of accuracy. The proposed method for generating binary representations is versatile, being independent of the model, data, and task, making it applicable across various machine learning applications.
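
The thresholding step described in this abstract can be illustrated with a minimal sketch. It only shows how a real-valued vector becomes a binary barcode given per-dimension thresholds, and how two barcodes are compared by Hamming distance. The function names, the toy vectors, and the all-zero thresholds are assumptions made for illustration; the Coordinate Search optimization of the thresholds and the BERT model itself are not reproduced here.

```python
def binarize(embedding, thresholds):
    """Map a real-valued vector to a 0/1 barcode using per-dimension thresholds."""
    return [1 if x > t else 0 for x, t in zip(embedding, thresholds)]


def hamming_distance(a, b):
    """Count the positions at which two equal-length barcodes differ."""
    return sum(x != y for x, y in zip(a, b))


# Toy 4-dimensional "embeddings" (made-up numbers, not real BERT outputs).
emb_a = [0.8, -0.2, 0.1, 0.5]
emb_b = [0.7, 0.3, -0.4, 0.6]

# Hypothetical thresholds; in the paper these would be the optimized values.
thresholds = [0.0, 0.0, 0.0, 0.0]

code_a = binarize(emb_a, thresholds)
code_b = binarize(emb_b, thresholds)
dist = hamming_distance(code_a, code_b)
```

Each barcode needs one bit per dimension instead of a floating-point value, and comparison reduces to counting differing bits, which is where the memory and speed benefits mentioned above would come from.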

09:25-09:45, Paper WeAT7.5
Disparity Map-Crack Detection: Combining Disparity Map Feature into Binary Segmentation for Accurate Crack Detection

Liu, Yang	Qingdao University
Yuan, Genji	Qingdao University
Li, Jianbo	Qingdao University
Keywords: Intelligent Transportation Systems Abstract: To address the limitations of crack detection methods relying solely on RGB images, we propose an innovative approach that incorporates disparity maps as an additional data source. This integration with RGB images aims to enhance crack detection performance. However, the generation of disparity maps is susceptible to image noise and matching errors, leading to inaccurate or mismatched disparity values that may impede precise ground crack detection. To mitigate this challenge, we apply a disparity transformation technique to refine the estimated disparity map, improving the differentiation of crack regions. Additionally, we employ a feature fusion method based on a connected low-loss subspace. This approach adaptively assigns feature weights to facilitate the complementary fusion of disparity and RGB features. Furthermore, the decoder includes a multi-scale feature alignment module that uses the fused encoder features to align each layer in the decoding process. This preserves image details and local features, enhancing the overall detection accuracy. Extensive testing experiments demonstrate a significant breakthrough in crack detection performance, achieving an Intersection over Union (IoU) of 80.48%. Our approach sets the benchmark in crack detection, effectively leveraging multi-source information, mitigating disparity map noise, and enhancing feature fusion.
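
For readers unfamiliar with the reported metric, the following is a minimal sketch of Intersection over Union (IoU) for binary segmentation masks. The masks are toy data invented for illustration and the function name is an assumption; the sketch says nothing about how the authors computed their 80.48% figure beyond the standard definition of the metric.

```python
def iou(pred, target):
    """IoU of two binary masks given as equal-sized lists of rows of 0/1."""
    pairs = [(p, t) for row_p, row_t in zip(pred, target)
             for p, t in zip(row_p, row_t)]
    intersection = sum(1 for p, t in pairs if p and t)
    union = sum(1 for p, t in pairs if p or t)
    # By convention here, two empty masks are treated as a perfect match.
    return intersection / union if union else 1.0


# Toy 2x3 masks: 1 marks a "crack" pixel, 0 marks background.
predicted = [[1, 1, 0],
             [0, 1, 0]]
ground_truth = [[1, 0, 0],
                [0, 1, 1]]

score = iou(predicted, ground_truth)
```

In this toy case two pixels are marked in both masks and four pixels are marked in at least one, so the score is 0.5.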


WeAT8	MR08
Biometric and Bioinformatics Systems


WeAT9	MR09
Deep Learning and Neural Networks 13	Regular Papers - Cybernetics
Chair: Xin, Sida	Academy of Military Sciences

08:05-08:25, Paper WeAT9.1
Adaptive Graph Spatial Temporal Fourier-Enhanced Transformer Networks for Traffic Prediction

Hu, Jun	Hunan University
He, Xiaolong	Hunan University
Keywords: Deep Learning, Neural Networks and their Applications, Machine Learning Abstract: Traffic prediction is an important component of intelligent transportation systems as it plays a key role in route planning and traffic management. However, traffic flow series exhibit complex spatial-temporal correlations and nonlinear traffic patterns, which makes accurate traffic prediction challenging. Current methods struggle to model the overall trend of traffic flow series and are unable to utilize dynamic information about spatial dependencies. In this paper, we propose adaptive graph spatial temporal Fourier-enhanced transformer networks (ASTFETN) to tackle the above traffic prediction problems. ASTFETN adopts an encoder-decoder architecture in which the encoder and decoder are both composed of multiple spatial-temporal blocks to capture dynamic spatial and nonlinear temporal correlations. Furthermore, a transformer attention layer captures the relationships between historical and future time steps. Experiments on two datasets, METR-LA and PEMS-BAY, demonstrate that ASTFETN outperforms the state-of-the-art baselines.

08:25-08:45, Paper WeAT9.2
Multi-Objective Evolutionary Neural Architecture Search for Liquid State Machine

Xin, Sida	Academy of Military Sciences
Chen, Renzhi	Defense Innovation Institute
Xiao, Xun	National University of Defense Technology
Li, Yuan	National University of Defense Technology
Wang, Lei	Defense Innovation Institute
Keywords: Evolutionary Computation, Neural Networks and their Applications, Machine Learning Abstract: Liquid State Machine (LSM) is a brain-inspired computational model that has proven highly effective in various applications, owing to its intrinsic capability to process spatiotemporal information and its minimal training complexity. However, the performance of LSMs significantly depends on the design of their network architecture, which is overly reliant on existing human experience. Furthermore, as the network scale increases, the computing resources required for deployment and operation also increase, so we regarded the network design as a multi-objective problem. To address these challenges, we introduced an effective surrogate-assisted multi-objective evolutionary neural architecture search algorithm that balanced the accuracy and network scale. Our approach utilized parameter sensitivity analysis followed by the upper confidence bound algorithm to reduce the search space. Experimental results demonstrate that we successfully reduced the dimensions of the search space by 11% and the size of the entire search space by 75%. Compared to the state-of-the-art, our approach offered better trade-off solutions, such as a solution that reduced network scale by 32.5% while maintaining the same accuracy, and another that improved accuracy by 1.4% without changing the network scale. Furthermore, the knee point reduced network scale by 25% and simultaneously increased accuracy by 0.7%. The source code can be accessed at https://github.com/XinSida/MOENAS-PSA.
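
The trade-off between accuracy and network scale mentioned in this abstract rests on Pareto dominance. The sketch below shows only that relation and a brute-force non-dominated filter, assuming both objectives are minimized (error rate and network size). The candidate values and function names are hypothetical, and the surrogate model, sensitivity analysis, and evolutionary search from the paper are not represented.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one.

    Both objectives are assumed to be minimized.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Return the points that no other point dominates (brute force, O(n^2))."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidate architectures as (error rate, number of neurons).
candidates = [(0.10, 500), (0.12, 300), (0.11, 600), (0.15, 300), (0.09, 900)]

front = pareto_front(candidates)
```

Here (0.11, 600) is dominated by (0.10, 500) and (0.15, 300) by (0.12, 300), so three trade-off solutions remain. A knee point, as referred to in the abstract, would be chosen from such a front.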

08:45-09:05, Paper WeAT9.3
Semantic Consistency Based Dual-Asymmetric Discrete Online Hashing for Multi-View Streaming Data Retrieval

Jing, Chen	Beijing University of Posts and Telecommunications
Zu, Yunxiao	Beijing University of Posts and Telecommunications
Hou, Bin	Beijing University of Posts and Telecommunications
Sang, Xinzhu	Beijing University of Posts and Telecommunications
Liu, Meiru	Beijing University of Posts and Telecommunications
Keywords: Big Data Computing, Machine Learning, Multimedia Computation Abstract: Multi-view online hashing has received much attention due to its huge potential in the area of large-scale multimedia retrieval. However, some issues remain, e.g., how to alleviate catastrophic forgetting, how to adequately extract high-level semantic information from multi-view streaming data and improve the discrimination of hash models, and how to effectively optimize the binary constraint problem. In this paper, we propose a novel Semantic Consistency based Dual-asymmetric Discrete Online Hashing method, SC-DDOH for short. It adopts dual-asymmetric distance-based similarity supervision to retain the similarities between the new data chunk and the database. To extract efficient high-level semantic information, online semantic consistent supervision is introduced to mine the semantically related information from word embedding labels. Moreover, an efficient discrete iterative optimization algorithm is introduced to directly learn hash codes in the Hamming space. Experimental results on three large-scale multi-view datasets demonstrate the superiority of SC-DDOH over the state-of-the-art baselines.


WeAT10	MR10
Image Processing and Pattern Recognition 4	Regular Papers - Cybernetics
Chair: Yuzhe, Wang	Dalian Minzu University

08:05-08:25, Paper WeAT10.1
EMPA-YOLO: A Lightweight Real-Time Weed Detection Method Suitable for Natural

Yuzhe, Wang	Dalian Minzu University
Jiahao, Chen	Dalian Minzu University
Xiaodong, Duan	Dalian Minzu University
Li, Zhuohui	Dalian Minzu University
Keywords: Application of Artificial Intelligence, Image Processing and Pattern Recognition, Machine Vision Abstract: Weed detection is crucial for the healthy growth of crops, yet existing detection models struggle to perform high-accuracy real-time detection of weeds in natural environments on edge computing devices. This paper introduces EMPA-YOLO, a model designed for rapid, accurate, real-time weed detection on low-performance edge computing devices. It incorporates an efficient multi-scale convolutional structure, C3EMSC, and a lightweight, adaptive weight subsampling layer, LAWDS, into YOLOv5s. Additionally, a logical distillation algorithm, AlignSoftTarget, is proposed for knowledge distillation. Validation on a mixed dataset of crops and weeds showed that EMPA-YOLO improved mAP50 by 11.2%, reduced parameter count by 4.8M, decreased computational load by 8.7GFLOPs, and increased inference frame rate by 58% compared to the original YOLOv5s algorithm. When compared to YOLOv3, YOLOv5s, YOLOv6, YOLOv8s, and RT-DETR, inference speed improved by 89.2%, 60.8%, 77.5%, 76.5%, and 94.7%, respectively, with mean accuracy enhancements of 7.3%, 11.2%, 16.7%, 0.9%, and 1.2%. Real-world testing on edge computing devices met real-time detection requirements, proving its efficacy and practicality in weed detection. Keywords—lightweight model; YOLO; model compression; knowledge distillation; edge computing

08:25-08:45, Paper WeAT10.2
A Multi-Stream Structure-Enhanced Network for Mesh Denoising

Yang, Yutian	South China University of Technology
Liang, Lingyu	South China University of Technology
Yan, Jie	South China University of Technology
Xu, Yong	South China University of Technology
Keywords: Neural Networks and their Applications, Image Processing and Pattern Recognition, Application of Artificial Intelligence Abstract: Triangular meshes provide an efficient representation of 3D shapes. Various applications such as 3D simulation suffer from degradation in geometric quality. This paper proposes a novel Multi-stream Structure-Enhanced Network (MSE-Net) based on graph convolutional networks. The network uses multi-scale features in addition to vertex positions to guide face normal filtering, which better preserves geometric features during the denoising process. In contrast to former methods that filter vertex coordinates and face normals separately, MSE-Net fuses additional structural features, such as the face area, the inner product between the face normal and vertex normals, and the interior angles of the face, to guide the updating of face normals and vertex positions, utilizing the inherent structural characteristics of meshes. Our method achieves state-of-the-art performance on several publicly available datasets, demonstrating its effectiveness.

08:45-09:05, Paper WeAT10.3
Object Detection Approaches to Identifying Hand Images with High Forensic Values

Nguyen, Thanh Thi	Monash University
Wilson, Campbell	Monash University
Khan, Imad	Monash University
Dalins, Janis	Australian Federal Police
Keywords: AI and Applications, Image Processing and Pattern Recognition, Neural Networks and their Applications Abstract: Forensic science plays a crucial role in legal investigations, and the use of advanced technologies, such as object detection based on machine learning methods, can enhance the efficiency and accuracy of forensic analysis. Human hands are unique and can leave distinct patterns, marks, or prints that can be utilized for forensic examinations. This paper compares various machine learning approaches to hand detection and presents the application results of employing the best-performing model to identify images of significant importance in forensic contexts. We fine-tune YOLOv8 and vision transformer-based object detection models on four hand image datasets, including the 11k hands dataset with our own bounding boxes annotated by a semi-automatic approach. Two YOLOv8 variants, i.e., YOLOv8 nano (YOLOv8n) and YOLOv8 extra-large (YOLOv8x), and two vision transformer variants, i.e., DEtection TRansformer (DETR) and Detection Transformers with Assignment (DETA), are employed for the experiments. Experimental results demonstrate that the YOLOv8 models outperform DETR and DETA on all datasets. The experiments also show that YOLOv8 approaches result in superior performance compared with existing hand detection methods, which were based on YOLOv3 and YOLOv4 models. Applications of our fine-tuned YOLOv8 models for identifying hand images (or frames in a video) with high forensic values produce excellent results, significantly reducing the time required by forensic experts. This implies that our approaches can be implemented effectively for real-world applications in forensics or related fields.


WeAT11	MR11
Image Processing and Pattern Recognition 7
Chair: Komiya, Daiki	Kanagawa University

08:05-08:25, Paper WeAT11.1
Low-Rank Tensor-Based Two-Dimensional Projection Learning for Feature Extraction

Liang, Xiaojia	South China Normal University
Wu, Yue	South China Normal University
Xiao, Xiaolin	South China Normal University
Keywords: Machine Learning, Image Processing and Pattern Recognition Abstract: Recently, Low-Rank Matrix (LRM)-based feature extraction methods have drawn increasing attention since they can extract robust features when the data are corrupted. However, these algorithms require a matrix-to-vector transformation to handle Two-Dimensional (2D) images, through which the spatial structure residing in 2D images is ignored. To solve this problem, we propose a Low-Rank Tensor-based 2D Projection learning model (LRT-2DP) to extract features directly from 2D images as well as to reduce dimensionality. In essence, LRT-2DP embraces the global self-expressiveness property to denoise the corrupted data, from which a 2D projection basis is learned for robust feature extraction. The proposed LRT-2DP can be efficiently optimized with an alternative optimization scheme. Extensive experiments on image feature extraction have demonstrated the superiority of LRT-2DP compared to state-of-the-art methods.

08:25-08:45, Paper WeAT11.2
EE-MVSNet: Deep Learning-Based Cascaded High-Precision Multi-View Stereo Network with ECA and EVC (I)

Zhang, Ziyi	Zhejiang University of Technology
Kong, Changfei	Zhejiang University of Technology
Mao, Jiafa	Zhejiang University of Technology
Cheng, Xu	Norwegian University of Science and Technology
Chan, Sixian	Zhejiang University of Technology
Keywords: Deep Learning, Image Processing and Pattern Recognition, Multimedia Computation Abstract: Multi-view stereo (MVS) has emerged as a pivotal algorithm in 3D reconstruction, garnering significant research attention over the past several decades. While recent coarse-to-fine methods have demonstrated promising results in enhancing the reconstruction quality of traditional algorithms, they often neglect the crucial aspect of feature layer refinement. Additionally, these methods face the challenge of low-cost feature matching. To address these limitations, we propose a novel learning-based MVS framework (EE-MVSNet). Firstly, we propose a novel approach incorporating an explicit visual center (EVC) module within the feature pyramid network (FPN), strengthening the adjustment within feature layers and improving model accuracy. Furthermore, we introduce the ECA+3DCNN module, which utilizes channel attention to alleviate the problem of low-cost feature matching. Finally, our model achieves competitive performance through extensive experimentation on the DTU dataset, showcasing its high-quality 3D reconstruction.

08:45-09:05, Paper WeAT11.3
A Classification Method for “kawaii” Images Using Semantic Interpretation (I)

Komiya, Daiki	Kanagawa University
Akiyoshi, Masanori	Kanagawa University
Keywords: Representation Learning, Image Processing and Pattern Recognition, Application of Artificial Intelligence Abstract: This paper proposes a classification method for five types of images represented by the word “kawaii”. “Kawaii images” do not have a fixed concept or object, which makes it difficult to classify them simply using conventional methods such as a Convolutional Neural Network or a Support Vector Machine. Our previous study extracted color and shape features from such images, achieving a classification accuracy of 70.2%. However, this approach did not handle the semantic content of the images. In this study, a classification method based on the constituent elements of “kawaii images” is used, resulting in a classification accuracy of 71.4%. Additionally, the experiment suggests that the method may, to some extent, reflect similarities with human recognition.


WeAT12	MR12
Affective and Cognitive Computing	Regular Papers - HMS
Chair: Liu, Muxuan	Ochanomizu University

08:05-08:25, Paper WeAT12.1
Dual-Domain Attention Based Adaptive Graph Convolutional Network for EEG Emotion Recognition

Xu, Tie	South China University of Technology
Zhang, Tong	South China University of Technology
Chen, Bianna	South China University of Technology
Chen, C. L. Philip	University of Macau
Keywords: Affective Computing, Brain-Computer Interfaces Abstract: The asymmetry of emotional responses is observed in electroencephalogram (EEG) of different frequency bands across various spatial brain regions in neuroscience research. Many prior works have primarily emphasized the dependencies among channels in the spatial domain, neglecting the dynamic interaction of EEG in both spatial and frequency domains, which may limit the performance of EEG emotion recognition. To address these issues, we propose the dual-domain attention based adaptive graph convolutional network (DDA-AGCN) for EEG emotion recognition. Specifically, we propose the lightweight dual-domain attention mechanism (DDA) based on random vector similarity measurement and the squeeze-excitation technique to capture important characteristics in the channel and frequency domain respectively. Furthermore, the adaptive graph convolutional network (AGCN) is utilized to adaptively filter and refine low signal-to-noise ratio EEG data, while also learning the dynamic connectivity patterns among important EEG channels and extracting higher-level abstract features for emotion recognition tasks. To validate the effectiveness of the proposed method, experimental comparisons were conducted on SEED, SEED-IV, and MPED. The experimental results show that our method achieves highly competitive classification performance compared to existing methods. Moreover, under fair comparison, the DDA demonstrates better performance and computational efficiency than self-attention.

08:25-08:45, Paper WeAT12.2
Transfer Learning for Emotion Recognition across Depression Patients and Healthy Subjects with Data Alignment and Selection

Jiang, Chao	Shanghai University
Dai, Yingying	Shanghai University of Electric Power
Chen, Xi	The School of Communication and Information Engineering, Shangha
Tang, Yingying	Shanghai Mental Health Center, Shanghai Jiao Tong University Sch
Li, Yingjie	Shanghai University
Keywords: Affective Computing, Brain-Computer Interfaces, Cognitive Computing Abstract: Identifying and understanding emotions is crucial, particularly when considering a range of subjects. This study introduces a novel approach to emotion recognition between depression patients and healthy subjects, amalgamating data alignment (DA) and subject selection (SS) within the transfer learning framework. Three methodologies for DA are explored, including Riemannian alignment (RA), Euclidean alignment (EA), and correlation alignment (CORAL). The Jensen-Shannon (JS) divergence is employed to gauge the similarity between target and source subjects to select potential training datasets. Through conducting cross-subject experiments within depression and healthy cohorts, and employing diverse models, experimental outcomes evince notable enhancements in recognition efficacy facilitated by this combined DA and SS transfer learning paradigm. Moreover, the study demonstrates that despite cognitive challenges in emotion recognition among individuals with depression and those without the disorder, skillful design enables the utilization of data from healthy individuals and trained algorithms to greatly enhance emotion recognition in depression patients, resulting in significant benefits.
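
The subject selection step in this abstract relies on the Jensen-Shannon divergence. The sketch below computes it for discrete distributions and ranks candidate source subjects by their divergence from a target subject. The distributions, the subject names, and the idea of summarizing each subject as a two-bin histogram are assumptions made purely for illustration; how the paper derives distributions from EEG data is not specified here.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint support)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical per-subject feature histograms (each sums to 1).
target = [0.6, 0.4]
sources = {"source_1": [0.5, 0.5], "source_2": [0.9, 0.1]}

# Rank source subjects from most to least similar to the target.
ranked = sorted(sources, key=lambda name: js_divergence(target, sources[name]))
```

Unlike the Kullback-Leibler divergence, the JS divergence is symmetric and always finite, which makes it convenient as a similarity score between subjects.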

08:45-09:05, Paper WeAT12.3
Do Feature Representations from Different Language Models Affect Accuracy of Brain Encoding Models' Predictions?

Liu, Muxuan	Ochanomizu University
Kobayashi, Ichiro	Ochanomizu University
Keywords: Brain-based Information Communications, Cognitive Computing, Human Perception in Multimedia Abstract: We investigate the impact of feature representations derived from different language models on brain encoding models, which are designed to predict brain states from linguistic stimuli. This study aims to determine whether the variances in the feature representations of language models, originating from their distinct encoder/decoder architectures, training data quality and quantity, and parameter sizes, affect their predictive accuracy on brain states. By examining how these feature representations influence brain encoding models, we identify specific brain regions where the predictability of brain activity is consistently influenced across various models, thereby uncovering similarities in their predictive effectiveness.


WeAT13	Room T13
Brain-Machine Interfaces (BMIs)	Regular Papers - SSE


WeK5N	HALL C&D
Keynote 5 Chairperson: Prof. Gina Tang Organoids Prof. Huijun Gao, Harbin Institute of Technology


WeBT1	MR01
Cybernetics and Quantum Systems 4
Chair: Shen, Yixiang	Shanghai University

11:00-11:20, Paper WeBT1.1
Universum Based Class-Specific Self-Set Broad Learning System for Software Defect Prediction

Tang, LeQi	South China University of Technology
Huang, Sen	South China University of Technology
Chen, Wuxing	South China University of Technology
Bi, Jichao	Zhejiang University
Zhou, Shan	Technology and Engineering Center for Space Utilization, Chinese
Yang, Kaixiang	South China University of Technology
Keywords: Machine Learning, Artificial Social Intelligence, Hybrid Models of Neural Networks, Fuzzy Systems, and Evolutionary Computing Abstract: In today's rapidly evolving field of information technology, software plays a key role in a wide variety of applications, thus increasing the need for accurate and fast prediction of software defects. Despite the growing interest in software defect prediction, the prevalent prediction methods have paid little attention to the class imbalance embedded in them. To address this problem, we introduce the universum-based Class-Specific Broad learning system (UCSSBLS). UCSSBLS synthesises the prior information in the classification model by computing the mean of two samples from different classes. At the same time universum-based BLS tries to use this information to create a third plane between the two planes of symmetry. Based on the a priori information in the training data and different class distributions, we adaptively modify the penalty parameters to fit the imbalanced class distributions, and the simulation experiments demonstrate that our proposed method is effective in the software defect prediction problem with imbalanced distributions.

11:20-11:40, Paper WeBT1.2
S-ADMM: Optimizing Machine Learning on Imbalanced Datasets Based on ADMM

Shen, Yixiang	Shanghai University
Lei, Yongmei	Shanghai University
Qiu, Qinnan	Shanghai University
Keywords: Machine Learning Abstract: Nowadays, addressing large-scale machine learning problems using high-performance computing (HPC) clusters has gained significant importance. The alternating direction method of multipliers (ADMM) is widely used in machine learning for solving optimization problems on clusters. However, ADMM's performance is greatly affected by imbalanced datasets on the HPC cluster. In this paper, we propose the distributed shunt ADMM with a new adaptive penalty method based on the hybrid MPI/OpenMP programming model (S-ADMM). The proposed shunt strategy chooses different sub-problem optimization algorithms to improve the accuracy with the imbalanced datasets. Additionally, we design a novel adaptive penalty parameter method and improve a sub-problem optimization algorithm for S-ADMM. The adaptive penalty parameter method accelerates the algorithm's convergence and the sub-problem optimization algorithm improves the training efficiency of ADMM. Moreover, S-ADMM reduces communication cost by exchanging parameters among nodes using the MPI and saves calculation time by parallel computation within nodes via OpenMP threads. For the SVM classification problem, experiments conducted on the Tianhe-2 supercomputing platform show that S-ADMM has competitive running efficiency and up to 43% accuracy improvement compared to existing distributed ADMM implemented with pure MPI or MPI/OpenMP on imbalanced datasets.

11:40-12:00, Paper WeBT1.3
A Boosting Framework for Financial Distress Prediction Based on Imbalanced Data

Dan, Zhao	Zhejiang University
Keywords: Machine Learning, Soft Computing, Socio-Economic Cybernetics, Application of Artificial Intelligence Abstract: This study introduces a boosting framework for financial distress prediction, specifically designed for imbalanced data, and incorporates robust business logic to enhance interpretability. The framework employs a clustering algorithm to group data samples based on corporate governance features and then determines the optimal number of clusters using a unique validation measure. An oversampling method is applied post-clustering, followed by a base prediction algorithm to predict financial distress for each cluster. The empirical analysis is conducted using imbalanced sample data from Chinese listed companies, with feature data at time t − m (where m = 1, 2, 3) and the Special Treatment (ST) status at time t used to train the model. The aim is to predict the occurrence of financial distress m years into the future. The results demonstrate that the proposed boosting framework outperforms the base model in terms of prediction accuracy on imbalanced data.

12:00-12:20, Paper WeBT1.4
Optimising Horizons in Model Predictive Control for Motion Cueing Algorithms Using Reinforcement Learning

Al-serri, Sari	Deakin University
Qazani, Mohammad Reza Chalak	Deakin University
Mohamed, Shady	Senior Research Fellow, Deakin University
Arogbonlo, Adetokunbo	Deakin University
Al-ashmori, Mohammed	Deakin University
Lim, Chee Peng	Deakin University
Nahavandi, Saeid	Swinburne University of Technology
Asadi, Houshyar	Deakin University
Keywords: Machine Learning, Metaheuristic Algorithms Abstract: This paper explores the application of driving simulators across multiple sectors, highlighting the challenges associated with refining motion cueing algorithms (MCA) through model predictive control (MPC). Through these platforms, drivers can simulate the sensation of motion. The implementation of MPC-based MCA, while advantageous for its precision in controlling motion simulations, encounters significant hurdles such as the requirement for highly accurate system models and the extensive parameter tuning needed for each specific control scenario. These issues create a critical gap in achieving optimal simulation fidelity and efficiency with lower computational time, necessitating a novel approach to improve the MCA domain. Addressing these challenges, the study pioneers the use of Deep Q-Network (DQN), a reinforcement learning (RL) technique, to optimise the horizons of MPC within the MCA domain. This innovation is significant as it introduces, for the first time, a method to dynamically adjust MPC-based MCA horizons using DQN, which learns through continuous interaction with the simulation environment. This approach is set to overcome the limitations of traditional meta-heuristic optimisation methods, such as the Grasshopper Optimisation Algorithms (GOA) and Butterfly Optimisation Algorithms (BOA), by offering a more flexible and adaptable solution. The overarching goal of this research is to minimise the system's cost function by maximising a reward function that encompasses key performance metrics such as specific force sensation, angular velocity, linear displacement, linear velocity, and angular displacement. By integrating DQN into the MPC-based MCA environment, this study demonstrates a faster computational running time and improves the precision and efficiency of the simulations. This innovative approach enhances the efficiency of the horizon determination process, showcasing promising implications for the MCA domain's advancement.

12:20-12:40, Paper WeBT1.5
Hybrid Quantum-Inspired Evolutionary Neural Networks for Intrusion Detection System

Kuo, Shu-Yu	National Taiwan University
Shen, Jyun-Yi	National Chi Nan University
Liu, Chia-Lin	National Chi Nan University
Chou, Yao-Hsin	National Chi Nan University
Keywords: Quantum Cybernetics, Hybrid Models of Neural Networks, Fuzzy Systems, and Evolutionary Computing, Hybrid Models of Computational Intelligence Abstract: Quantum-inspired evolutionary algorithms harness quantum properties to optimize the search process within classical computers, efficiently addressing complex and challenging problems. This study first proposes an intrusion detection system (IDS) based on a hybrid model using quantum-inspired evolutionary neural networks. The model integrates a deep neural network (DNN) and a global best-guided quantum-inspired tabu search algorithm (GQTS). To safeguard against potential threats, an IDS is deployed to monitor network or system traffic and detect malicious attacks. Anomaly detection, a pivotal aspect of IDS, aims to establish a normal model to respond effectively to unknown abnormal attacks. The experiment utilizes the latest dataset, CICIDS2017, which is generated based on realistic background traffic. During the training phase, GQTS selects valid features from the dataset and optimizes the hyperparameters of the DNN setting automatically, significantly contributing to improving accuracy and reducing the false negative rate. The results highlight that the proposed hybrid model decreases computational complexity through feature selection and enhances model accuracy via suitable hyperparameter optimization compared to other state-of-the-art methods. The proposed model demonstrates great potential over alternative structures.

12:40-13:00, Paper WeBT1.6
Fine-Grained Speech Sentiment Analysis in Chinese Psychological Support Hotlines Based on Large-Scale Pre-Trained Model (I)

Chen, Zhonglong	Beijing University of Technology
Changwei, Song	Beijing University of Technology
Chen, Yining	Beijing University of Technology
Li, Jianqiang	Beijing University of Technology
Fu, Guanghui	Sorbonne University
Tong, Yongsheng	Peking University Huilongguan Clinical Medical School
Zhao, Qing	Beijing University of Technology
Keywords: Deep Learning, Machine Learning Abstract: Suicide and suicidal behaviors remain significant challenges for public policy and healthcare. In response, psychological support hotlines have been established worldwide to provide immediate help to individuals in mental crises. The effectiveness of these hotlines largely depends on accurately identifying callers' emotional states, particularly underlying negative emotions indicative of increased suicide risk. However, the high demand for psychological interventions often results in a shortage of professional operators, highlighting the need for an effective speech emotion recognition model. This model would automatically detect and analyze callers' emotions, facilitating integration into hotline services. Additionally, it would enable large-scale data analysis of psychological support hotline interactions to explore psychological phenomena and behaviors across populations. Our study utilizes data from the Beijing psychological support hotline, the largest suicide hotline in China. We analyzed speech data from 105 callers containing 20,630 segments and categorized them into 11 types of negative emotions. We developed a negative emotion recognition model and a fine-grained multi-label classification model using a large-scale pre-trained model. Our experiments indicate that the negative emotion recognition model achieves a maximum F1-score of 76.96%. However, it shows limited efficacy in the fine-grained multi-label classification task, with the best model achieving only a 41.74% weighted F1-score. We conducted an error analysis for this task, discussed potential future improvements, and considered the clinical application possibilities of our study. All code is publicly available at: https://github.com/czl0914/psy_hotline_analysis.


WeBT2	MR02
Deep Learning and Neural Networks - 14
Chair: Yi, Chaoxiong	National University of Defense Technology

11:00-11:20, Paper WeBT2.1
HMO: Host Memory Optimization for Model Inference Acceleration on Edge Devices

Yi, Chaoxiong	National University of Defense Technology
Jian, Songlei	National University of Defense Technology
Tan, Yusong	National University of Defense Technology
Zhang, Yusen	National University of Defense Technology
Keywords: Machine Learning, Deep Learning, AI and Applications Abstract: Deep learning (DL) is characterized by its demanding computational and memory requirements, which creates a significant challenge when deploying on edge devices. These devices often have limited computational capabilities and constrained resources. Most existing methods primarily focus on model-level techniques, such as model pruning or parameter quantization, to reduce model size and computation for accelerating inference. Considering the prevalent programming paradigms in DL, we propose a host memory optimization method, namely HMO, which can be integrated into DL programming framework, e.g., PyTorch, to improve the inference efficiency of DL models without modifying any model code. We particularly focus on memory optimization for intermediate variables in inference, aiming to enhance inference speed while maintaining a lower memory footprint. HMO involves a single profiling of inference to gather memory statistics about intermediate variables. These statistics are then used to guide subsequent inference. Additionally, we incorporate huge pages in operating systems to improve the memory access performance of HMO. Our experimental results show that HMO can achieve an average inference latency optimization ratio of 20.13% compared with native PyTorch on six typical DL image representation models while effectively managing memory usage. Importantly, this is achieved without compromising model accuracy.

11:20-11:40, Paper WeBT2.2
Contrastive Learning-Based User Identification with Limited Data on Smart Textiles

Zhang, Yunkang	University of Science and Technology of China
Wu, Ziyu	University of Science and Technology of China
Liang, Zhen	University of Science and Technology of China
Xie, Fangting	University of Science and Technology of China
Wan, Quan	University of Science and Technology of China
Zhao, Mingjie	University of Science and Technology of China
Cai, Xiaohui	University of Science and Technology of China
Keywords: Deep Learning, Transfer Learning, Neural Networks and their Applications Abstract: Pressure-sensitive smart textiles are widely applied in the fields of healthcare, sports monitoring, and intelligent homes. The integration of devices embedded with pressure sensing arrays is expected to enable comprehensive scene coverage and multi-device integration for smart home environments. However, the implementation of identity recognition, a fundamental function in this context, relies on extensive device-specific datasets due to variations in pressure distribution across different devices. To address this challenge, we propose a novel user identification method based on contrastive learning. We design two parallel branches to facilitate user identification on both new and existing devices respectively, employing supervised contrastive learning in the feature space to promote domain unification. When encountering new devices, extensive data collection efforts are not required; instead, user identification can be achieved using limited data consisting of only a few simple postures. Through experimentation with two 8-subject pressure datasets (BedPressure and ChrPressure), our proposed method demonstrates the capability to achieve user identification across 12 sitting scenarios using only a dataset containing 2 postures. Our average recognition accuracy reaches 79.05%, representing an improvement of 2.62% over the best baseline model.

11:40-12:00, Paper WeBT2.3
A Joint Multi-Dimensional Fine-Grained Pruning Method for Deep Neural Network

Chen, Cong	Nanjing University of Aeronautics and Astronautics
Zhang, Tong	Nanjing University of Aeronautics and Astronautics
Zhu, Kun	Nanjing University of Aeronautics and Astronautics
Keywords: Deep Learning, Machine Learning, Neural Networks and their Applications Abstract: Existing deep neural network (DNN) pruning methods can be classified into two main categories: structured pruning and weight pruning. Structured pruning is a representative model compression technology of DNN to reduce the storage and computation requirements and accelerate inference, which mainly includes filter pruning and channel pruning. However, they both belong to coarse-grained methods, which can only decide whether to prune a whole filter or channel or not and provide limited decision space. On the other hand, structured stripe-wise pruning has finer granularity than filter pruning, and shape-wise pruning also has finer granularity than channel pruning. These two fine-grained methods are related to two dimensions: rows and columns from the general matrix multiplication (GEMM) perspective of convolution operations. Considering that combining pruning decisions in finer granularity from multiple dimensions will produce a larger solution space, in this paper we propose a joint multi-dimensional fine-grained pruning scheme (JFP) for DNN compression, which simultaneously prunes elements in filters and channels. Extensive experiments on the CIFAR-10 dataset demonstrate that: (1) JFP achieves stabler pruning ratios compared to stripe-wise pruning; and (2) JFP effectively compresses DNN parameters and reduces calculation amount while maintaining the accuracy compared with counterparts.

12:00-12:20, Paper WeBT2.4
PSA-Swin Transformer: Image Classification on Small-Scale Datasets

Shao, Chao	Xinjiang University
Jiang, Shaochen	Xinjiang University
Li, Yongming	Xinjiang University
Keywords: Deep Learning, AI and Applications, Image Processing and Pattern Recognition Abstract: This paper introduces PSA-Swin Transformer, a novel framework for image classification on small-scale datasets, highlighting the challenges of training effective models in resource-constrained environments. Recognizing the limitations of current deep learning methods that rely heavily on large-scale datasets and extensive pre-training, we propose an approach to handle small datasets. Our model is able to effectively handle smaller data volumes without pre-training weights. The key to our approach is the introduction of an Efficient Positional Embedding (EPE) module, which improves parameter utilization and network expressiveness through a grouped convolutional architecture and shuffling operations for dynamic information exchange. In addition, we integrate the Polarized Self-Attention (PSA) module, which addresses the complexity of learning element-specific attention by combining polarized filtering with augmentation techniques. Through a series of experiments on the Mini-Imagenet dataset, PSA-Swin Transformer demonstrates decent performance, especially in environments where high-quality annotated data is scarce or costly to acquire. Our results are expected to lead to advances in areas where efficient and accurate image classification using limited resources is required.

12:20-12:40, Paper WeBT2.5
Deep Reinforcement Learning-Based Strategies for Truck Platooning at Highway On-Ramps

Wang, An	Shandong University of Science and Technology
Qi, Liang	Shandong University of Science and Technology
Luan, Wenjing	Shandong University of Science and Technology
Liu, Kun	Shandong University of Science and Technology
Guo, Xiwang	Liaoning Petrochemical University
Keywords: Deep Learning, Application of Artificial Intelligence, Machine Learning Abstract: The development of Connected and Automated Trucks (CATs) provides a new opportunity for the freight industry to enhance fuel efficiency, increase traffic flow, and improve safety through platooning. Particularly at highway on-ramps, how to effectively form CAT platoons is a key research topic. In the process of CAT platooning, the timing, location, and speed of CAT merging significantly impact safety and energy consumption. Thus, this study proposes a hierarchical merging strategy, aimed at achieving effective autonomous CAT platooning at highway on-ramps by considering the interference of human-driven vehicles (HDVs). Specifically, we employ a model-free deep reinforcement learning method that guides the CAT merging process by exploring optimal driving behaviors. It ensures the safety and efficiency of the CAT merging process. In addition, we use a real vehicle dynamics model in simulation. The proposed strategy can handle the variation of the CATs’ initial positions and speeds at on-ramps, as well as disturbances caused by HDVs on the highway mainline. The effectiveness of the proposed strategy has been validated through simulations. The results show that the proposed strategy can effectively coordinate CAT platooning at highway on-ramps.

12:40-13:00, Paper WeBT2.6
Modeling Hydrodynamic Diffusion Processes Using Spatio-Temporal Deep Neural Networks with Environmental Physical-Coupled Constraints (I)

Jia, Lei	University of Aizu
Yen, Neil	University of Aizu
Pei, Yan	University of Aizu
Keywords: Deep Learning Abstract: The simulation and analysis of complex spatiotemporal systems are crucial for expressing and solving chaotic dynamical systems such as those in Earth and environmental sciences. Understanding and computing physical processes, reactions, or substance transport typically relies on control equations. This paper aims to explore a novel research paradigm by enhancing the physical network coupling structure to construct predictive models for fluid dynamics systems, simulating spatiotemporal dynamical processes of substance transformation in the domain of environmental physics. In particular, when addressing problems involving the non-homogeneous 2D fluid dynamics equations, the characteristic parameters of the physical processes were redefined. This was achieved by encoding hard boundary conditions and designing appropriate neural network architectures to mitigate over-fitting issues during the prediction of parameterized dynamical systems. Comparative experiments involving five benchmark physics-informed neural network methods emphasize the significant improvement in capturing time-varying features and prediction accuracy brought by the proposed approach. Through various water cycle scenarios, the model’s estimation ability for diffusion fields is validated, focusing on analyzing the influence of data errors and sample size on the computational results of this deep neural network. Notably, the proposed method exhibits higher robustness to outlier observations under extreme conditions.


WeBT3	MR03
Cognitive and Affective Computing
Chair: Gu, Jiaqi	GuangXi University for Nationalities

11:00-11:20, Paper WeBT3.1
Precise Knowledge Enhancement Via CBR Framework for Empathetic Dialogue Generation

Gu, Ziyin	Chinese Academy of Sciences
Zhu, Qingmeng	Science & Technology on Integrated Information System Laboratory
He, Hao	Chinese Academy of Sciences
Yu, ZhiPeng	ISCAS
Keywords: Affective Computing, Human-Machine Interaction Abstract: Empathetic dialogue systems are designed to capture emotions in conversations and provide appropriate emotional responses. Previous research has indicated that integrating specific knowledge into empathetic dialogue systems can enhance the overall effectiveness of generating empathetic responses. Nevertheless, existing methods for knowledge-enhanced empathetic dialogue generation lack a focus on the precise selection of knowledge enhancement configurations for this specific task. To address this, we propose a Case-Based Reasoning (CBR) framework called CBR-KNOWLEDGE for autonomously selecting precise knowledge enhancement configurations tailored to specific empathetic dialogue contexts. Firstly, CBR-KNOWLEDGE establishes a case base that mirrors the overall quality of empathetic dialogues generated under various knowledge enhancement configurations. Subsequently, CBR-KNOWLEDGE employs an innovative text representation method, integrating an additional representation for words with noteworthy emotional impact. This approach facilitates the retrieval of analogous empathetic dialogues, enabling the reuse of their knowledge enhancement configurations to determine a new knowledge enhancement configuration. Ultimately, CBR-KNOWLEDGE employs this precise knowledge enhancement configuration for the purpose of empathetic dialogue generation. Experimental results demonstrate that CBR-KNOWLEDGE effectively enhances the performance of the empathetic dialogue generation task.

11:20-11:40, Paper WeBT3.2
Two-Stage Multi-Modal Prompt Tuning for Few-Shot Sentiment Analysis

Gu, Jiaqi	GuangXi University for Nationalities
Niu, Hao	Gengchi Technology Co., Ltd
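Keywords: Affective Computing, Multimedia Systems Abstract: Few-shot multimodal sentiment analysis (MSA) is a crucial task in the field of vision-language understanding and plays a key role in various application domains (e.g., interaction, e-commerce promotion, and social media analysis). Recently, with progress in pre-trained language models, previous works have mainly used a combination of a pre-trained language model and a visual encoder, and adopted prompt learning to generalize pre-trained language models to MSA tasks. However, there are vision-language pre-trained models (VLPMs) designed specifically for vision-language tasks, such as visual question answering. There has been little exploration of VLPMs and their prompt learning methods for multimodal sentiment analysis. Therefore, our work fills this gap by proposing Two-Stage Multi-Modal Prompt tuning (TSMMP) for few-shot sentiment analysis based on VLPMs. TSMMP consists of two stages of prompt tuning. In the first stage, we encode the image and text separately and then feed them ...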

11:40-12:00, Paper WeBT3.3
Bi-VLA: Vision-Language-Action Model-Based System for Bimanual Robotic Dexterous Manipulations

Gbagbe, Koffivi Fidele	Skolkovo Institute of Science and Technology
Altamirano Cabrera, Miguel	Skolkovo Institute of Science and Technology Skoltech
Alabbas, Ali	Skolkovo Institute of Science and Technology
Alyounes, Oussama	Skolkovo Institute of Science and Technology Skoltech
Lykov, Artem	Skolkovo Institute of Science and Technology
Tsetserukou, Dzmitry	Skoltech
Keywords: Human-Computer Interaction, Affective Computing, Human-Machine Interface Abstract: This research introduces the Bi-VLA (Vision-Language-Action) model, a novel system designed for bimanual robotic dexterous manipulation that seamlessly integrates vision for scene understanding, language comprehension for translating human instructions into executable code, and physical action generation. We evaluated the system’s functionality through a series of household tasks, including the preparation of a desired salad upon human request. Bi-VLA demonstrates the ability to interpret complex human instructions, perceive and understand the visual context of ingredients, and execute precise bimanual actions to prepare the requested salad. We assessed the system’s performance in terms of accuracy, efficiency, and adaptability to different salad recipes and human preferences through a series of experiments. Our results show a 100% success rate in generating the correct executable code by the Language Module, a 96.06% success rate in detecting specific ingredients by the Vision Module, and an overall success rate of 83.4% in correctly executing user-requested tasks.

12:00-12:20, Paper WeBT3.4
Self-Attention Residual Connection and Graph Neural Hawkes Bilayer Model for Session-Based Recommendation

Li, Huan	Dongguan University of Technology
Chen, Senpeng	Dongguan University of Technology
Wei, Wenhong	Dongguan University of Technology
Dong, Ani	Dongguan City University
Li, Qingxia	Dongguan City University
Keywords: Cognitive Computing, Intelligence Interaction, Human-Machine Interaction Abstract: Session-based recommendation aims to make recommendations for anonymous users based on limited session data. However, traditional session-based recommendation methods fail to capture complex item transitions and simply represent the user’s last clicked item as a short-term preference, neglecting the global sequential information of the session. This approach struggles to consider transitions between contexts and cannot accurately capture the user’s true intentions. To address these issues, this paper proposes a session recommendation method based on self-attention residual connections and graph neural Hawkes (SRGNH). This method introduces a dual-layer network structure consisting of graph neural self-attention residual connection layers and graph neural Hawkes layers, designed to learn users’ long-term and short-term preferences, respectively. SRGNH employs a Gated Graph Neural Network (GGNN) to capture complex interactions between nodes, obtaining latent vectors for each item. It incorporates self-attention networks and residual connections to effectively utilize low-level inspired information for capturing users’ long-term preferences. The graph neural Hawkes layer combines the Hawkes process with GGNN to capture the relationship between user item clicks over continuous time, accurately representing users’ short-term preferences. To better represent user intent, we linearly combine users’ long-term and short-term preferences in the end. Experimental results demonstrate that the proposed SRGNH outperforms other recommendation models on the Diginetica, Yoochoose1/64, and Yoochoose1/4 datasets.

12:20-12:40, Paper WeBT3.5
Investigation of Correspondence between Learner Sensory Processing Sensitivity and Different Avatars in Online Lectures (I)

Riese, Sean Mirai	Japan Advanced Institute of Science and Technology
Koich, Ota	Japan Advanced Institute of Science
Gu, Wen	Center for Innovative Distance Education and Research, Japan Adv
Hasegawa, Shinobu	Japan Advanced Institute of Science and Technology
Keywords: Affective Computing Abstract: People with Highly Sensitive Person (HSP) characteristics such as "depth of processing," "overstimulation," "emotional reactivity and empathy," and "sensitivity to subtleties" sometimes face challenges due to their high Sensory Processing Sensitivity (SPS) to the environment. To investigate the differences in SPS among learners and the impact of different avatars on video presentation in online lectures, whose use increased with COVID-19, we surveyed 20 participants who studied SDGs instruction videos presented with four different avatars. Analysis of their SPS responses using the HSPS-J19 self-assessment tool revealed that participants' SPS scores followed a normal distribution, indicating individual differences in SPS. In addition, a few correlations were found between HSPS-J19 scores and participants' impressions of and motivation toward the avatar presentation. Furthermore, the cluster analysis results indicated that the HSP tendency group was more effective in applying appropriate avatars. Based on these results, we designed an online lecture support environment that allows the control of other people's videos as environmental stimuli. This research focuses on an unexplored area and enhances online lectures for HSPs: HSP statistically applies to 15% to 20% of the population, and supporting high SPS learners is socially significant in the post-COVID-19 era.

12:40-13:00, Paper WeBT3.6
Teacher-To-Teacher: Harmonizing Dual Expertise into a Unified Speech Emotion Model (I)

Singkul, Sattaya	SpeeChance Co., Ltd
Yuenyong, Sumeth	Mahidol University
Wongpatikaseree, Konlakorn	Mahidol University
Keywords: Affective Computing, Human-Computer Interaction, Human-Machine Interaction Abstract: This paper introduces the Teacher-to-Teacher (T2T) framework, a novel approach in speech emotion recognition (SER) specifically tailored for the Thai language. Leveraging the dual expertise of the Wav2Vec and Wav2Vec2 models, the T2T framework utilizes unsupervised and self-supervised learning knowledge to effectively address the unique challenges posed by tonal languages. By integrating these two powerful models into a unified SER framework, T2T enhances its capability to process and interpret nuanced emotional cues in speech, achieving superior performance compared to traditional SER methods. Evaluated across three major datasets (ThaiSER, EMOLA, and MU), the framework demonstrates significant improvements in unweighted accuracy and F1-score. Innovations such as emotional clustering representation and targeted emotional representation contribute to its high precision in detecting and differentiating subtle emotional states. Additionally, the integration of a fine-tuned teacher module aligns these advancements with practical SER applications, further increasing the framework's accuracy and sensitivity in real-world scenarios. The successful implementation of the T2T framework opens new avenues for enhancing SER technologies in other low-resource languages and extends its applicability to real-time processing applications, thereby advancing the field of computational emotion recognition.


WeBT4	MR04
BMI - Novel Algorithms and Privacy-Preserving Brain-Computer Interfaces (Chair: Dongrui Wu)	BMI Workshop Papers
Chair: Meng, Lubin	Huazhong University of Science and Technology

11:00-11:20, Paper WeBT4.1
Knowledge-Data Fusion Based Source-Free Semi-Supervised Domain Adaptation for Seizure Subtype Classification

Peng, Ruimin	Huazhong University of Science and Technology
An, Jiayu	Huazhong University of Science and Technology
Wu, Dongrui	Huazhong University of Science and Technology
Keywords: Brain-Computer Interfaces, Assistive Technology, Brain-based Information Communications Abstract: Electroencephalogram (EEG)-based seizure subtype classification enhances clinical diagnosis efficiency. Source-free semi-supervised domain adaptation (SF-SSDA), which transfers a pre-trained model to a new dataset with no source data and limited labeled target data, can be used for privacy-preserving seizure subtype classification. This paper considers two challenges in SF-SSDA for EEG-based seizure subtype classification: 1) How to effectively fuse both raw EEG data and expert knowledge in classifier design? 2) How to align the source and target domain distributions for SF-SSDA? We propose a Knowledge-Data Fusion based SF-SSDA approach, KDF-MutualSHOT, for EEG-based seizure subtype classification. In source model training, KDF uses Jensen-Shannon Divergence to facilitate mutual learning between a feature-driven Decision Tree-based model and a data-driven Transformer-based model. To adapt KDF to a new target dataset, an SF-SSDA algorithm, MutualSHOT, is developed, which features a consistency-based pseudo-label selection strategy. Experiments on the public TUSZ and CHSZ datasets demonstrated that KDF-MutualSHOT outperformed other supervised and source-free domain adaptation approaches in cross-subject seizure subtype classification.

11:20-11:40, Paper WeBT4.2
Multi-Type Privacy Protection in EEG-Based Brain-Computer Interfaces (I)

Meng, Lubin	Huazhong University of Science and Technology
Jiang, Xue	Huazhong University of Science and Technology
Jia, Tianwang	Huazhong University of Science and Technology
Wu, Dongrui	Huazhong University of Science and Technology
Keywords: Other Neurotechnology and Brain-Related Topics, BMI Emerging Applications Abstract: A brain-computer interface (BCI) enables direct communication between the brain and an external device. Electroencephalogram (EEG) is the preferred input signal in non-invasive BCIs, due to its convenience and low cost. EEG-based BCIs have been successfully used in many applications, such as neurological rehabilitation, text input, and games. However, EEG signals inherently carry rich personal information, necessitating privacy protection. This paper demonstrates that multiple types of private information (user identity, gender, and BCI-experience) can be easily inferred from EEG data, posing a serious privacy threat to BCIs. To address this issue, we design perturbations to convert the original EEG data into privacy-protected EEG data, which conceal the private information while maintaining the primary BCI task performance. Experimental results demonstrated that the privacy-protected EEG data can significantly reduce the classification accuracy of user identity, gender and BCI-experience, while leaving the classification accuracy of the primary BCI task almost unaffected, enabling user privacy protection in EEG-based BCIs.

11:40-12:00, Paper WeBT4.3
Exploring Spatial Information for EEG-Based User Authentication: A ShallowNet Approach

Lee, Chaehyun	Gwangju Institute of Science and Technology
Kang, Sunghyun	Gwangju Institute of Science and Technology
Kim, Heegyu	Gwangju Institute of Science and Technology
Jun, Sung	Gwangju Institute of Science and Technology
Keywords: Other Neurotechnology and Brain-Related Topics Abstract: As biometric authentication has become commonplace in recent years, electroencephalography (EEG)-based user authentication research has gained significant attention from researchers. The practicality of enhancing authentication requires users to be classified effectively with fewer channels, which necessitates research on spatial information across brain regions to identify optimal channels for authentication protocols. While previous research has examined different acquisition protocols, comprehensive studies that have investigated the influence of spatial information from various brain regions are lacking. Therefore, in this study, a convolutional neural network (CNN)-based deep learning model was trained using EEG signals obtained from one of the most promising protocols, the steady-state visual evoked potential (SSVEP) experiment. By using a pre-trained model, the effect of spatial information specific to each brain region was examined by classifying data masked for each respective region. Through this study, we confirmed that the amount of information used to classify users is distributed more in the occipital and parietal regions than in the frontal and temporal regions. These insights suggest that focusing on channels in the occipital and parietal regions may be important in SSVEP-based user authentication research.

12:00-12:20, Paper WeBT4.4
EEG-TCF2Net: A Novel Deep Interval Type-2 Fuzzy Model for Decoding SSVEP in Brain-Computer Interfaces

Contreras Cabrera, Marcelo	University of Alberta
Christian, Flores Vega	Universidad De Ingeniería Y Tecnología
Andreu-Perez, Javier	University of Essex
Keywords: Active BMIs, BMI Emerging Applications, Other Neurotechnology and Brain-Related Topics Abstract: The Steady-State Visual Evoked Potential (SSVEP) is a robust method for creating a fast Brain-Computer Interface (BCI); however, the time window of Electroencephalography (EEG) trials has to be reduced to improve the BCI's speed. This reduction leads to a decrease in the Signal-to-noise ratio (SNR), making it more difficult to classify these signals accurately. Conversely, combining Fuzzy Neural Block (FNB) that includes Type-1 Fuzzy (T1F) in deep learning architecture has improved classification accuracy over data obtained in noisy environments. However, T1F has limitations in accurately modeling uncertainty and handling complex systems compared to Interval Type-2 Fuzzy (IT2F), which is particularly suitable for applications where robustness, adaptability, and accuracy are crucial. In this work, we proposed a deep learning framework that integrates the FNB using IT2F called FNB-IT2F. It is included parallel to the linear and final layers to assess their effectiveness. Thus, this study presents a unification of EEG-TCNet-LSTM with FNB-IT2F, which we call EEG-TCNet-LSTM-FNB-IT2F (i.e. EEG-TCF2Net). Our results reported a maximum recognition accuracy of 51.0% to 76.5% using the proposed method of EEG-TCF2Net in a subject-independent classification across all 10 subjects for 0.2 to 0.5 s time window. Overall, including FNB-IT2F in this deep learning architecture outperformed those without it, as well as baseline methods such as Filter-Bank Canonical Correlation Analysis (FBCCA) and Task-related component analysis (TRCA).

12:20-12:40, Paper WeBT4.5
An Unsupervised Clustering and Markov Chain-Based Approach for Assessing Performance During Online User Training for Mental Imagery EEG-BCIs

Ivanov, Nicolas	University of Toronto
Chau, Tom	University of Toronto
Keywords: Brain-Computer Interfaces, Human Performance Modeling, Human-centered Learning Abstract: Brain-computer interfaces (BCIs) have many potential applications for individuals with physical disabilities; however, their usage is limited due to unreliable performance. While users can improve performance via training, the effectiveness of current training approaches may be limited by inaccurate performance assessments and confusing feedback. Herein, we render recently proposed user performance assessments for mental imagery electroencephalography (EEG)-BCIs conducive to online deployment. The approach uses K-means clustering to segment the EEG signal space into pattern states and then models transitions between these states using Markov chains. A metric, taskDistinct, uses the Markov chain steady-state distributions to measure user ability to produce task-specific EEG patterns. The objective of this work was twofold: first, to assess the sensitivity of the adjusted metrics to performance variations throughout a session; and second, to examine the influence of the number of pattern states in the models on this sensitivity. To meet these objectives, we performed pseudo-online analyses where the taskDistinct metric was computed with various numbers of pattern states throughout simulated data collection sessions. Analysis revealed significant positive correlations between the adapted taskDistinct metric and other performance metrics. Additionally, the metric sensitivity to performance changes was not significantly affected by the number of pattern states. The results indicate that the adapted Markov chain-based metrics could be used for assessing performance in online user training for mental imagery EEG-BCIs.
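
The Markov-chain step described in the abstract above can be illustrated with a minimal sketch. It assumes the EEG windows have already been mapped to integer pattern-state labels (for example by K-means, which is not shown here). The function names are hypothetical, and the total variation distance used to compare two tasks is only a stand-in for illustration; it is not the paper's definition of taskDistinct.

```python
def transition_matrix(states, k):
    """Estimate a row-stochastic transition matrix from labels in 0..k-1."""
    counts = [[0.0] * k for _ in range(k)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1.0
    matrix = []
    for row in counts:
        total = sum(row)
        # A state with no observed outgoing transition gets a uniform row
        # so that every row still sums to one (an assumption of this sketch).
        matrix.append([c / total for c in row] if total else [1.0 / k] * k)
    return matrix


def steady_state(matrix, iters=1000, tol=1e-12):
    """Approximate the steady-state distribution by power iteration.

    Power iteration may fail to converge for periodic chains.
    """
    k = len(matrix)
    pi = [1.0 / k] * k
    for _ in range(iters):
        nxt = [sum(pi[i] * matrix[i][j] for i in range(k)) for j in range(k)]
        if max(abs(a - b) for a, b in zip(pi, nxt)) < tol:
            return nxt
        pi = nxt
    return pi


def distribution_gap(p, q):
    """Total variation distance between two distributions (illustrative only)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical pattern-state sequences for two mental imagery tasks.
task_a = [0, 0, 1, 0, 0, 1, 0, 0, 0, 1]
task_b = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
pi_a = steady_state(transition_matrix(task_a, 2))
pi_b = steady_state(transition_matrix(task_b, 2))
gap = distribution_gap(pi_a, pi_b)
```

A larger gap means the two tasks spend their time in more clearly different pattern states, which is the intuition behind measuring task-specific EEG patterns from steady-state distributions.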

12:40-13:00, Paper WeBT4.6
Dual Kalman Filter Based on Maximum Correntropy Criterion for Adaptive Decoding in Brain-Machine Interface

Cai, Yuxuan	Sun Yat-Sen University
Liu, Xi	Sun Yat-Sen University
Keywords: Brain-Computer Interfaces Abstract: Brain-Machine Interface (BMI) assists paralyzed patients in restoring motor functions by controlling neuroprosthetic devices through intentions captured from brain activities. Traditional neural signal decoding algorithms typically rely on Gaussian noise assumptions and static functional relationships between neural activities and movements, overlooking the common presence of non-Gaussian noise in the neural encoding process and the dynamic variations in neural tuning over time. To address these challenges, this study introduces a novel decoding algorithm that integrates maximum correntropy criterion (MCC) with dual Kalman filters. By incorporating MCC, the algorithm enhances the filter's robustness to non-Gaussian noise by capturing higher-order statistical properties of information. Additionally, a pair of mutually communicative dual filters are used to estimate states and periodically update parameters. Tested within a simulated Skinner box experiment, our algorithm exhibits superior performance in tracking the dynamic variations of neural tuning under non-Gaussian noise conditions compared to traditional Kalman filter with fixed tuning parameters and dual Kalman filter without MCC incorporation, and provides a new solution for BMI applications requiring long-term stable operation.


WeBT5	MR05
Cyber-Physical Systems and Robotics 2
Chair: Li, Yanmei	Ningxia University

11:00-11:20, Paper WeBT5.1
Accelerometer-Based Pushing Location Feedback Mechanism for CPR Training

Chao, Chien-Yu	Chang Gung University
Yen, Chun-Han	Chang Gung University
Lin, Wen-Yen	Chang Gung University
Keywords: Mechatronics Abstract: CPR training equipment with a complete feedback mechanism is key to helping people learn to deliver high-quality CPR in emergencies, and hence to improving the survival rate of people who suffer cardiac arrest. However, on some advanced CPR training models, such as Resusci Anne, only pushing depth and frequency feedback are provided; no Resusci Anne is equipped with a pushing location feedback system. In this study, an accelerometer-based sensing mechanism for Resusci Anne is proposed to provide feedback on the offset distance and direction from the correct pushing location. With an accelerometer mounted at the center of the correct pushing location, the offset distances and directions of the actual pushing locations on Resusci Anne can be derived from the inclination and rotation angles converted from the accelerometer's measured data. In a preliminary test, the system successfully provided pushing location information during CPR training, and can hence help people learn high-quality CPR.
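
As a rough illustration of how inclination and rotation angles can be obtained from a three-axis accelerometer, the sketch below uses the standard gravity-vector tilt formulas. It assumes a quasi-static reading dominated by gravity and a sensor z-axis normal to the chest plate; the function name and axis convention are hypothetical. The mapping from these angles to an offset distance is specific to the paper and is not reproduced here.

```python
import math


def tilt_angles(ax, ay, az):
    """Return (inclination, rotation) in degrees from one accelerometer sample.

    inclination: angle between the sensor z-axis and the measured gravity vector.
    rotation:    direction of the tilt within the sensor x-y plane.
    """
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    if norm == 0.0:
        raise ValueError("zero acceleration vector")
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos_incl = max(-1.0, min(1.0, az / norm))
    inclination = math.degrees(math.acos(cos_incl))
    rotation = math.degrees(math.atan2(ay, ax))
    return inclination, rotation


# Hypothetical reading: the plate is tilted toward the sensor's +x axis.
incl, rot = tilt_angles(1.0, 0.0, 1.0)
```

In this assumed convention, the rotation angle indicates the direction in which the plate is tilted and the inclination grows as the tilt increases. During real compressions the reading also contains motion acceleration, so some filtering would be needed.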

11:20-11:40, Paper WeBT5.2
Stackelberg-Nash Game-Theoretic Formation Path Planning for Multi-Agent Interactions

Su, Xuqi	Shanghai Jiao Tong University
Yang, Zhaohui	Shanghai Jiao Tong University
Huang, Fengling	Shanghai Jiao Tong University
Zhou, Pingfang	Shanghai Jiao Tong University
Chen, Jianguo	Chinese Academy of Sciences, University of Chinese Academy of Sc
Jing, Yuhao	University of New South Wales
Keywords: Robotic Systems, Modeling of Autonomous Systems Abstract: As technologies advance, multi-agent systems have been used in various domains. In a multi-agent formation path planning system, a number of aspects including formation maintenance, collision avoidance, control consumption, reaching the destination and the interactions among agents need to be considered together. To solve this, we model multi-agent formation path planning in the Stackelberg-Nash game (SNG) framework to address the mentioned issues. In the SNG, the leader can adopt a global perspective and make multi-step predictions to plan a formation path that avoids obstacles. However, the followers can only perform myopic planning due to their limited capabilities. We introduce the natural logarithm barrier functions penalizing the collision constraints and we utilize the first-order condition to approximate the optimality condition, serving as a consideration term for all followers' potential reactions. We employ the receding horizon algorithm (RHA) to plan the formation path for the leader and followers. Case studies in an environment with obstacles demonstrate the superior performance of our SNG framework. Compared with the Nash-game and Non-game frameworks, our framework outperforms in maintaining standard formation and effective control consumption. This work contributes to the discourse on agent-based coordination and offers scalable, efficient strategies in complex environments.
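
The natural-logarithm barrier mentioned above can be illustrated with a minimal sketch. It shows only the generic form of a log barrier on a distance constraint. The constraint shape (`distance > safe_distance`), the weight, and the function name are assumptions, not the paper's formulation.

```python
import math


def log_barrier_penalty(distance, safe_distance, weight=1.0):
    """Penalty that grows without bound as distance approaches safe_distance."""
    margin = distance - safe_distance
    if margin <= 0.0:
        # Constraint violated or active: treat the point as infeasible.
        return math.inf
    return -weight * math.log(margin)


# Hypothetical values: an agent approaching an obstacle with a 1.0 m safety radius.
far = log_barrier_penalty(2.0, 1.0)    # comfortable clearance
near = log_barrier_penalty(1.1, 1.0)   # close to the boundary
```

Adding such a term to a planning cost keeps the optimizer inside the collision-free region, because the penalty rises sharply near the constraint boundary.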

11:40-12:00, Paper WeBT5.3
Teleoperated Omni-Directional Dual Arm Mobile Manipulation Robotic System with Shared Control for Retail Store

Lima, Rolif	Tata Consultancy Services
Saha, Somdeb	Tata Consultancy Services
George, Nijil	TCS Research, Tata Consultancy Services Ltd
Vakharia, Vismay	Tata Consultancy Services
Parab, Shubham	Tata Consultancy Services
Gaonkar, Sahil	Tata Consultancy Services
Vatsal, Vighnesh	TCS Research, Tata Consultancy Services Ltd
Das, Kaushik	TCS Research
Keywords: System Architecture, Robotic Systems, Consumer and Industrial Applications Abstract: The swiftly expanding retail sector is increasingly adopting autonomous mobile robots empowered by artificial intelligence and machine learning algorithms to gain an edge in the competitive market. However, these autonomous robots encounter challenges in adapting to the dynamic nature of retail products, often struggling to operate autonomously in novel situations. In this study, we introduce an omni-directional dual-arm mobile robot specifically tailored for use in retail environments. Additionally, we propose a tele-operation method that enables shared control between the robot and a human operator. This approach utilizes a Virtual Reality (VR) motion capture system to capture the operator's commands, which are then transmitted to the robot located remotely in a retail setting. Furthermore, the robot is equipped with heterogeneous grippers on both manipulators, facilitating the handling of a wide range of items. We validate the efficacy of the proposed system through testing in a mockup of retail environment, demonstrating its ability to manipulate various commonly encountered retail items using both single and dual-arm coordinated manipulation techniques.

12:00-12:20, Paper WeBT5.4
MARLander: A Local Path Planning for Drone Swarms Using Multiagent Deep Reinforcement Learning

Tareke, Demetros Aschalew	Intelligent Space Robotics Laboratory, Skolkovo Institute of Sci
Peter, Robinroy	Skolkovo Institute of Science and Technology
Karaf, Sausar	Skoltech Institute of Science and Technology
Fedoseev, Aleksey	Skolkovo Institute of Science and Technology
Tsetserukou, Dzmitry	Skoltech
Keywords: Robotic Systems, Cooperative Systems and Control, System Modeling and Control Abstract: Achieving safe and precise landings for a swarm of drones poses a significant challenge, primarily owing to the limitations of conventional control and planning methods. This paper presents the implementation of multi-agent deep reinforcement learning (MADRL) techniques for the precise landing of a drone swarm at relocated target locations. The system is trained in a realistic simulated environment with a maximum linear velocity of 3 m/s in training spaces of 4 m³ and deployed utilizing Crazyflie drones with a Vicon indoor localization system. The experimental results revealed that the proposed approach achieved a landing accuracy of 2.26 cm on stationary and 3.93 cm on moving platforms, surpassing a baseline method that combines a Proportional–integral–derivative (PID) controller with an Artificial Potential Field (APF). This research highlights drone landing technologies that eliminate the need for analytical centralized systems, potentially offering scalability and revolutionizing applications in logistics, safety, and rescue missions.

12:20-12:40, Paper WeBT5.5
Pointing Frame Estimation with Audio-Visual Time Series Data for Daily Life Service Robots

Nakagawa, Hikaru	Ritsumeikan University
Hasegawa, Shoichi	Ritsumeikan University
Hagiwara, Yoshinobu	Soka University
Taniguchi, Akira	Ritsumeikan University
Taniguchi, Tadahiro	Kyoto University
Keywords: Robotic Systems Abstract: Daily life support robots in the home environment interpret the user's pointing and understand the instructions, thereby increasing the number of instructions accomplished. This study aims to improve the estimation performance of pointing frames by using speech information when a person gives pointing or verbal instructions to the robot. The estimation of the pointing frame, which represents the moment when the user points, can help the user understand the instructions. Therefore, we perform pointing frame estimation using a time series model, utilizing the user's speech, images, and speech-recognized text observed by the robot. In our experiments, we set up realistic communication conditions, such as speech containing everyday conversation, non-upright posture, actions other than pointing, and reference objects outside the robot's field of view. The results showed that adding speech information improved the estimation performance, especially for the Transformer model with Mel-Spectrogram as a feature. This study is expected to lead to applications in object localization and action planning in 3D environments by robots in the future. The project website is https://emergentsystemlabstudent.github.io/PointingImgEst/.

12:40-13:00, Paper WeBT5.6
An Improved MOEA/D with Adaptive Operator for Stereoscopic Warehouse Management System (I)

Li, Yanmei	Ningxia University
Qu, Xinran	Dalian University of Technology
Tao, Yongting	Ningxia University
Shiduo, Ning	Dalian University of Technology
Chen, Yanzhou	National University of Singapore
Shi, Yanjun	Dalian University of Technology
Keywords: Adaptive Systems, Decision Support Systems Abstract: Stereoscopic warehouses improve logistics efficiency and reduce costs through automation, making it necessary to optimize their storage management systems. This paper focuses on the problem of stacker picking delay in the storage management system of a stereoscopic warehouse, aiming to improve storage and production efficiency through technology and process optimization. Firstly, a path planning model is established to optimize the path length and response speed. Then in order to enhance the global search capability and make the algorithm jump out of the local optimum, the MOEA/D algorithm is adapted to incorporate the adaptive Lévy flight operator into it. The superiority of the improved MOEA/D (IMOEA/D) is verified through a case study.
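
A Lévy flight step of the kind such an operator builds on can be sketched with Mantegna's algorithm, a common way to draw heavy-tailed steps. The adaptive rule used in the paper is not described in the abstract, so this sketch covers only the basic step. The stability index `beta = 1.5`, the scale factor, and all function names are assumptions.

```python
import math
import random


def mantegna_sigma(beta):
    """Scale of the numerator Gaussian in Mantegna's algorithm."""
    numerator = math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
    denominator = math.gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0)
    return (numerator / denominator) ** (1.0 / beta)


def levy_step(beta=1.5, rng=random):
    """Draw one heavy-tailed step: mostly small moves, occasionally a long jump."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1.0 / beta)


def levy_perturb(solution, scale=0.01, beta=1.5, rng=random):
    """Perturb each decision variable of a candidate solution by a Lévy step."""
    return [x + scale * levy_step(beta, rng) for x in solution]


rng = random.Random(0)  # seeded generator so the sketch is repeatable
candidate = levy_perturb([0.2, 0.5, 0.8], rng=rng)
```

The occasional long jumps are what let a search move away from a local optimum, which is the motivation the abstract gives for adding the operator to MOEA/D.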


WeBT6	MR06
Infrastructure Systems and Services 2
Chair: Liu, Hongyu	Southern University of Science and Technology

11:00-11:20, Paper WeBT6.1
A Distributed Method for State of Charge Estimation for Supercapacitors Pack

Li, Heng	Central South University
Zhang, Yulin	Central South University
Zhuo, Shilong	Central South University
Peng, Hui	Central South University
Keywords: Smart Sensor Networks Abstract: Supercapacitors, leveraging their distinctive characteristics and advantages, have evolved into efficient energy storage solutions. State of Charge (SOC) is a crucial parameter for supercapacitors, and the estimation of SOC for individual supercapacitor cells has been extensively researched. In practical applications, it is common to assemble hundreds or even thousands of cells to form a supercapacitors pack, particularly in fields like electric vehicles and electric buses. Therefore, estimating the SOC for the supercapacitors pack becomes imperative. In response to the demands for supercapacitors pack SOC (SOC_pack) estimation and wireless management, this paper proposes a distributed method. After modeling the supercapacitor cells, the definition of SOC_pack is introduced. The SOC of a cell in the definition is estimated based on Kalman filter. The proposed distributed method relies on wireless communication and computational updates between cells. Through iterative processes, it ultimately converges to the estimated SOC_pack. Finally, we conducted simulation experiments to analyze the performance of the proposed method under various communication conditions, thereby validating its effectiveness and robustness.
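
The iterative, neighbor-to-neighbor update described above resembles average consensus, which can be sketched as follows. This is an illustration only. The abstract does not state how SOC_pack is defined or how cells are connected, so taking the pack value as the mean of the cell SOCs, the ring topology, the step size, and the function name are all assumptions.

```python
def consensus_average(values, neighbors, step=0.3, iterations=200):
    """Each node repeatedly moves toward its neighbors' values.

    With symmetric links and a small enough step, every node converges
    to the average of the initial values without a central coordinator.
    """
    state = list(values)
    for _ in range(iterations):
        # Synchronous update: every node uses the previous round's values.
        state = [
            x + step * sum(state[j] - x for j in neighbors[i])
            for i, x in enumerate(state)
        ]
    return state


# Hypothetical per-cell SOC estimates and a four-cell ring of wireless links.
cell_soc = [0.9, 0.7, 0.8, 0.6]
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
estimates = consensus_average(cell_soc, ring)
```

After enough rounds, every cell holds approximately the same value, the mean of the initial estimates, using only exchanges with its direct neighbors.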

11:20-11:40, Paper WeBT6.2
Time-Domain-Agnostic Contactless Fingerprinting Localization Via LoRa Frequency-Hopping

Lu, Yijing	China University of Mining and Technology
Lin, Tao	China University of Mining and Technology
Lan, Rixia	China University of Mining and Technology
Yin, Yuqing	China University of Mining and Technology
Chen, Pengpeng	China University of Mining and Technology
Keywords: Smart Sensor Networks Abstract: LoRa technology provides new potentials for long-range localization. Unlike previous works which attach a device to a person for active localization, this paper presents a cross-temporal domain device-free fingerprinting localization system based on LoRa technology. The rationale of this work is that the person standing on different positions can induce different multipaths, and the challenge is to extract locations from receiving signals over time. Through careful mathematical analysis, we observe the key factors that can characterize the location features and then propose a novel fingerprinting construction method leveraging frequency-hopping to expand the locations’ resolution. Considering the temporal instability of the signal, we establish a domain adversarial-based localization model for position estimation. Extensive experiments have been conducted to evaluate our design, and results indicate that the fingerprinting construction approach can well express the location diversity even across different time and the designed model achieves decimeter-level localization in long-range indoor and multipath environments.

11:40-12:00, Paper WeBT6.3
Leak Detection and Localization in Water Distribution Networks Via Online Change-Point Detection and Leak Sensitivity Modeling

Liu, Hongyu	Southern University of Science and Technology
Jiang, Jie	China University of Petroleum (Beijing)
Zhang, Xinchen	The University of Hong Kong
Ding, Yulong	Southern University of Science and Technology
Yang, Lili	Southern University of Science and Technology
Yang, Shuang-Hua	University of Reading
Keywords: Fault Monitoring and Diagnosis, Smart Sensor Networks, Cyber-physical systems Abstract: Efficient leak detection and localization in water distribution networks (WDNs) are essential for mitigating disastrous losses from leak incidents. Current unsupervised learning-based methods excel in detecting single leaks but struggle with multi-leak scenarios, limiting their practical applicability. Additionally, leak localization methods relying on simulation optimization can be computationally intensive for large-scale WDNs. Meanwhile, purely data-driven approaches struggle to leverage network topology and hydraulic characteristics, hindering accurate leak localization. To address these challenges, we propose a leak detection and localization framework that employs online change-point detection and leak sensitivity modeling to achieve real-time detection and localization in multi-leak scenarios. For leak detection, the framework utilizes an unsupervised One-dimensional Convolutional Auto-encoder (1D-CAE) to reconstruct the pressure data collected from sensors deployed over a WDN. The residuals between the reconstructed values and the observed values, also known as reconstruction errors, are then analyzed using a Sequentially Discounting Normalized Maximum Likelihood (SDNML) change-point detector. Equipped with a customized scoring function designed for robust multi-leak detection, this detector computes anomaly scores in real-time at each time step, facilitating effective leak detection. Furthermore, we simulate leaks at different nodes of the WDN to quantify their impact on the sensor nodes using pressure residuals-based leak characteristic factors. By comparing reconstruction errors with these factors, we identify the nodes that best match the current leak event, achieving precise leak localization. An evaluation on the L-Town dataset demonstrates the superior performance of our method for multi-leak detection and localization.

12:00-12:20, Paper WeBT6.4
Rapid Object Detection and Localization through R-Tree Based Multi-Sensor Fusion for Inspection Robots

Wang, Dengshuo	Beijing University of Chemical Technology
Huo, Dongjie	Beijing University of Chemical Technology
Zhang, Dong	Beijing University of Chemical Technology
Keywords: Smart Sensor Networks, Robotic Systems, Autonomous Vehicle Abstract: The detection and localization of objects are critical for inspection robots. Camera images facilitate rapid object detection but are limited in providing accurate object localization. Conversely, LiDAR point clouds can provide precise object localization but lack semantic information. To solve this problem, this work proposes a multi-sensor data fusion-based method for object detection and localization. The rich semantic information of camera images and the precise location of LiDAR point clouds are combined. An R-tree spatial indexing algorithm is used to match and process data swiftly. Accurate object localization can be achieved by using the proposed method based on the fused data. Moreover, the method leverages point cloud data to refine confidence, significantly reducing the miss rate of detection in glaring conditions. Simulations and experiments are conducted to compare the proposed method with other localization algorithms. The results show that it significantly outperforms them in terms of accuracy.

12:20-12:40, Paper WeBT6.5
NO-GAT: Neighbor Overlay-Induced Graph Attention Network (I)

Wei, Tiqiao	Southwest University
Yuan, Ye	Southwest University
Keywords: Smart Sensor Networks Abstract: Graph neural networks (GNNs) have garnered significant attention due to their ability to represent graph data. Among various GNN variants, graph attention network (GAT) stands out since it is able to dynamically learn the importance of different nodes. However, present GATs heavily rely on the smoothed node features to obtain the attention coefficients rather than graph structural information, which fails to provide crucial contextual cues for node representations. To address this issue, this study proposes a neighbor overlay-induced graph attention network (NO-GAT) with the following two-fold ideas: a) learning favorable structural information, i.e., overlaid neighbors, outside the node feature propagation process from an adjacency matrix; b) injecting the information of overlaid neighbors into the node feature propagation process to compute the attention coefficient jointly. Empirical studies on graph benchmark datasets indicate that the proposed NO-GAT consistently outperforms state-of-the-art models.

12:40-13:00, Paper WeBT6.6
Exploring Automated Feature Engineering for Energy Consumption Forecasting with AutoML (I)

Alkhulaifi, Nasser	University of Nottingham
Bowler, Alexander L.	University of Leeds
Pekaslan, Direnc	University of Nottingham
Triguero, Isaac	University of Nottingham
Watson, Nicholas J.	University of Leeds
Keywords: Smart Buildings, Smart Cities and Infrastructures Abstract: Machine learning methods are widely used to predict energy consumption, aiming to enhance efficiency and support environmental goals. However, developing these models is traditionally time-consuming and expert-dependent. While Automated Machine Learning (AutoML) has emerged as a valuable approach to streamlining machine learning pipelines, including appropriate preprocessing and learning algorithms, it may still require human experts. These experts might be needed to generate new, interpretable features that could significantly improve model performance. This is particularly relevant in complex settings such as the energy domain, where deep learning's automatic feature extraction lacks interpretability. To address this challenge, this exploratory work introduces an automated feature engineering method tailored for energy forecasting problems. It involves generating a comprehensive set of features that can be fed into AutoML, thereby reducing the need for domain knowledge in feature engineering. The proposed method has been validated using eleven datasets from various energy domains, including residential buildings, renewable energy, and regional energy consumption, with state-of-the-art AutoML methods, namely H2O, TPOT, AutoGluon, and FLAML. The results demonstrate a noticeable reduction in prediction errors across all the examined datasets.


WeBT7	MR07
Online - AI Applications 6
Chair: Yu, Bihui	Shenyang Institute of Computing Technology, Chinese Academy of Sciences & University of Chinese Academy of Sciences

11:00-11:20, Paper WeBT7.1
SAM-Wav2lip++: Enhancing Behavioral Realism in Synthetic Agents through Audio-Driven Speech and Action Refinement

Yu, Bihui	Shenyang Institute of Computing Technology, Chinese Academy of S
Liu, Dawei	Shenyang Institute of Computing Technology, Chinese Academy of S
Shi, Huiyang	University of Chinese Academy of Sciences
Chang, Guiyong	Shenyang Institute of Computing Technology, Chinese Academy of S
Wei, Jingxuan	Shenyang Institute of Computing Technology, Chinese Academy of S
Sun, Linzhuang	Shenyang Institute of Computing Technology, Chinese Academy of S
Tian, Songtao	Tsinghua University
Bu, Liping	Shenyang Institute of Computing Technology, Chinese Academy of S
Keywords: AI and Applications, Application of Artificial Intelligence, Artificial Social Intelligence Abstract: Digital human generation is a forward-looking field in technology. Despite significant progress in the generation of speaking facial videos, many challenges remain unaddressed. Issues such as unnatural head movements, distorted expressions, artifacts in generated videos, and uncoordinated limb movements persist. Most current efforts are focused on specific individuals, with enhancements often limited to head movements without further advancing the overall behavioral actions of digital humans. In this context, we introduce a new dataset, CFMD, and a novel model, SAM-Wav2lip++, capable of generating consistent, audio-synchronized lip and behavior action videos from a single reference image of any identity. This work features three main innovative components: (1) a contrastive lip-sync discriminator for precise lip synchronization, (2) a generator for the synthesis of sound-action consistency, and (3) the SAM module for facial refinement operations. Through extensive experiments and user studies, our results demonstrate that our model can synthesize digital human videos of impressively high perceptual quality that accurately sync lip movements and behavioral actions with the input audio, substantially outperforming state-of-the-art baselines in the evaluations.

11:20-11:40, Paper WeBT7.2
MHN-GCN: A Cancer Driver Gene Prediction Method for Multi-Omics Data

Zheng, Xuedong	Shenyang Aerospace University
Wang, Dinglun	Shenyang Aerospace University
Lv, Xuenan	Shenyang Aerospace University
Keywords: Biometric Systems and Bioinformatics, Application of Artificial Intelligence, Computational Life Science Abstract: Identifying cancer driver genes is crucial for studying targeted therapies. Due to the complexity of cancer, single-omics data often fall short in elucidating cancer genes. Therefore, this paper proposes a novel cancer gene prediction method, MHN-GCN, which integrates data from gene expression, DNA methylation, mutations, and copy number variations to predict cancer genes accurately. By comparing with currently popular single-omics data prediction methods and multi-omics data prediction methods, it was found that MHN-GCN performed well in most metrics. Subsequently, bioinformatics analysis was conducted on the predicted genes to explore these suspicious genes' functions and potential mechanisms, providing new research directions. The MHN-GCN proposed in this paper provides valuable insights for predicting cancer driver genes.

11:40-12:00, Paper WeBT7.3
Integrating Graph Neural Networks with Multi-Head Attention for Multi-Task Learning in Session-Based Recommendation

Long, Hua	Chongqing University of Technology
Lu, Jiaqiang	Chongqing University of Technology
Huang, BingWen	Chongqing University of Technology
Keywords: AI and Applications, Deep Learning, Application of Artificial Intelligence Abstract: Session-based recommendation (SBR) endeavors to forecast the subsequent interaction item of an anonymous user, relying on their brief historical sequence of interactions. Due to the short-term nature of the session, the information available is limited. Most SBR algorithms mainly rely on current session information and ignore global context information. Therefore, previous approaches often suffer from data sparsity issues, leading to suboptimal performance. To this end, we propose a novel model, namely IGM-ML, which leverages global context as implicit feedback information via a graph neural network. Adjustable weight parameters are then utilized to control global context information for recommendation tasks. Meanwhile, we design multi-head attention to learn the current explicit item transfer information to obtain a local session-level representation. Given that the global implicit session representation and the local explicit session representation are learned using different approaches, we maximize the mutual information between them by introducing contrastive learning as an auxiliary task that bridges the two. Then, we utilize a multi-task learning approach to simultaneously optimize the recommendation task and the auxiliary task. Experimental results show that IGM-ML outperforms other state-of-the-art methods on three benchmark datasets.

12:00-12:20, Paper WeBT7.4
Spatial-Temporal Motion Compensation for Learned Video Compression

Huang, Qian	Hohai University
Liu, Wenting	Hohai University
Lu, Hao	Hohai University
Wang, Yiming	Hohai University
Keywords: Deep Learning, Multimedia Computation, Machine Vision Abstract: In the past few years, learned video compression has received increasing attention. However, most current learned methods rely on bilinear warping operations in motion compensation, which are equivalent to a low-pass filtering operation and introduce reconstruction distortion. To address this problem, we propose a spatial-temporal motion compensation for learned video compression (STMC-LVC). STMC-LVC uses a spatial-temporal motion compensation network (STMC-Net) to fully consider the spatial-temporal correlation between successive frames, performing accurate motion compensation. Specifically, STMC-Net mainly consists of the initial feature prediction module and the fusion module. The initial feature prediction module predicts reference features based on DCN to provide sufficient information for motion compensation. The fusion module obtains temporal attention information by calculating the similarity between features, performs spatial attention operations, and finally performs spatial-temporal fusion, further improving the robustness of motion compensation. In addition, STMC-LVC uses a conditional coding framework. We use a concatenation of the current feature and predicted feature as context to explore spatial-temporal correlations in feature space. Experimental results show that our method effectively improves video compression. Specifically, our model achieves average bitrate savings of 33.86% and 58.70% over x265 (veryslow) in terms of PSNR and MS-SSIM, respectively.

12:20-12:40, Paper WeBT7.5
Exploring Social Decision Models through Quantum Fuzzy Approaches

Botelho, Cecilia Silva da Costa	Federal University of Pelotas
Buss, Juliano Strelow	Federal University of Pelotas
Santos, Helida	Federal University of Rio Grande
Lucca, Giancarlo	Catholic University of Pelotas
Cruz, Anderson	Federal University of Rio Grande Do Norte
Yamin, Adenauer	Federal University of Pelotas
Reiser, Renata	Federal University of Pelotas
Keywords: Quantum Cybernetics, Fuzzy Systems and their applications Abstract: This study introduces an innovative quantum fuzzy approach for modeling and simulating complex decision-making processes, utilizing quantum circuits to express membership degrees as unitary transformations. We contrast Quantum Fuzzy Computing with Classical Computing, highlighting advancements in modeling the prisoner's dilemma. Using the Qiskit framework, we demonstrate how quantum fuzzy algorithms, featuring connectives such as 'exclusive or' and 'arithmetic means', enable detailed modeling of multi-agent relationships via multidimensional quantum registers, showcasing potential advancements in this hybrid research area.

12:40-13:00, Paper WeBT7.6
Medical Multi-Choice Question Answering with Retrieval Augmented Language Models

Liu, Yujie	Shanghai University
Keywords: Medical Informatics, Biometrics and Applications Abstract: Medical multi-choice question-answering (Medical MCQA) is an emerging topic with great practical importance for diagnosis and treatment. However, this task is under-explored due to the scarcity of data, whose annotation cost is relatively high, demanding domain-specific knowledge. Some methods attempt to expand data scale through mitigation strategies based on large-scale unsupervised corpora. Despite their promising results, pre-trained language models, lacking medical knowledge, fail to make accurate predictions for professional questions about medicine. To address this, we applied the Retrieval-Augmented method to Medical MCQA and proposed a new framework based on interactive retrieval augmentation. This framework consists of two parts for adapting knowledge to the medical domain. Firstly, the dynamic interplay between inquiries and contextual nuances is learned in the data-rich domain through QA modeling of language models. This procedural knowledge and these implicit semantic associations embedded in vocabulary representations are adapted to the medical domain for better understanding. Secondly, an adaptive retrieval network is designed to inject knowledge into the model. Our method has shown superior performance on multiple Medical MCQA datasets compared to baseline models, effectively addressing the challenges posed by data scarcity and domain specialization of Medical MCQA. Analysis showcases that our model has a better ability to retrieve and comprehend medical details.


WeBT8	MR08
Online - Complex and Cooperative Systems
Chair: Lu, Kai	Qilu University of Technology (Shandong Academy of Sciences)

11:00-11:20, Paper WeBT8.1
Detection Method of Teaching Discourse Richness Based on Prompt Learning and Pre-Trained Language Model

Liu, Shuhua	Northeast Normal University
Guo, Yiwei	Northeast Normal University
Yang, Shihao	Northeast Normal University
Yang, Fengqin	Northeast Normal University
Keywords: Application of Artificial Intelligence, Deep Learning, AI and Applications Abstract: Teaching discourse can not only introduce the class into different teaching episodes, but also interact emotionally with students through different instructions. Therefore, the richness of teaching discourse is a very important evaluation criterion. This paper proposes ProTDR, a teaching discourse richness detection model based on prompt learning, which converts the data of downstream tasks into natural language form by setting different kinds of prompt templates. Without changing the structure of the backbone model, prompt learning can help the pre-trained model better judge the teaching discourse category. We conduct a series of experiments on a Chinese DialogId dataset which consists of teacher utterance transcripts in real classrooms. Compared with current mainstream pre-trained models, the proposed model achieves state-of-the-art performance on the DialogId dataset. Finally, we give a test sample of the ProTDR model on a real high school classroom lesson, and visually demonstrate the feasibility and effectiveness of the proposed model.

11:20-11:40, Paper WeBT8.2
Optimizing Click-Through Rate Prediction: A Model Utilizing Multi-Attention Fusion Hashing Algorithms

Li, Xinge	Tianjin University of Technology
Wang, Fayu	Tianjin University of Technology
Chen, Hongtao	Tianjin University of Technology
Keywords: AI and Applications, Application of Artificial Intelligence, Deep Learning Abstract: Click-through rate prediction plays a crucial role in contemporary recommender systems, which aim to refine recommended items by predicting the click-through rate to enhance the overall recommendation effectiveness. However, existing click-through rate prediction models still do not fully leverage both long-term and short-term user behavior information. Consequently, this paper proposes a novel click-through rate prediction model, named Interest Evolutionary Network with Hash and Multiple Attention Network (IEHAN). The model categorizes user behaviors into long-term and short-term categories, employing a hash sampling method to model long-term user behaviors. By directly filtering behavioral items with the same hash signature as the target item from the entire sequence, the IEHAN model effectively models long-term user interests, thereby reducing information loss. Additionally, the model employs the multi-head attention mechanism and the GRU model with an attention update gate to integrate the relevance of the user's interest state and the target item, thereby enhancing the impact of relevant interests on interest evolution and shaping short-term user interest. Experimental results demonstrate that the IEHAN model achieves higher AUC and lower LOSS evaluation metrics on the Taobao dataset and the Books subset of the Amazon dataset, surpassing other existing models and exhibiting superior prediction accuracy.

11:40-12:00, Paper WeBT8.3
JTEA: Implementing Semi-Supervised Entity Alignment Using Joint Teaching Strategies

Lu, Kai	Qilu University of Technology (Shandong Academy of Sciences)
Zhao, Jing	Qilu University of Technology (Shandong Academy of Sciences)
Ding, Lichao	Qilu University of Technology (Shandong Academy of Sciences)
Hao, Zenghao	Qilu University of Technology
Keywords: Knowledge Acquisition, Representation Learning, Neural Networks and their Applications Abstract: Entity alignment (EA) aims to identify equivalent entities in different knowledge graphs or data sources and establish correspondences between them. In recent years, embedding-based methods have made significant progress in the field of entity alignment, yet the shortage of adequate training data (i.e., seed entity pairs) continues to be a primary challenge. Traditional semi-supervised methods typically address this issue by generating potential mappings (i.e., pseudo-mappings) for unlabeled entities. However, these methods often suffer from the influence of erroneous potential mappings or overlook the uncertainty of potential mappings. Therefore, we propose a novel semi-supervised entity alignment method called JTEA. This method combines seed entity pairs with potential mappings through end-to-end joint training, providing strong guidance for model training. In constructing potential mappings, we propose a bidirectional matching strategy (BMS), which generates more detailed potential mappings by jointly adjusting decisions in different directions and uses a joint matching confidence score to characterize their uncertainty. Furthermore, we introduce a matching diversity alignment selection method, aiming to screen out more reliable entity alignments to further improve the performance and stability of the model. We conducted extensive experiments on the benchmark dataset to validate the effectiveness of the JTEA method.

12:00-12:20, Paper WeBT8.4
CASSTIMP: Cascaded Architecture for Symptom Status Tracking with Inquiry-Aware Attention and Multi-Perception Pooling

Yu, Haowen	Fudan University
Li, Mingcheng	Fudan University
Zhang, Lihua	Fudan University
Keywords: Biometric Systems and Bioinformatics, Application of Artificial Intelligence, Deep Learning Abstract: Symptom status tracking poses a significant challenge due to the intricate nature of symptom identification and inference from medical doctor-patient dialogues. Numerous prior studies in this domain have relied on approaches involving multi-label classification and multi-task learning. Multi-label classification methods typically consider symptoms and statuses within a unified label space. Nevertheless, this approach frequently results in sparse predictions, eroding semantic relationships among labels and causing instability in prediction outcomes. In contrast, multi-task learning segregates symptom prediction and status prediction into separate tasks, thereby improving performance relative to conventional multi-label classification methods. Nonetheless, despite these advancements, the imbalance in training task weights persists, leading to suboptimal performance. To tackle these challenges, we employ a cascaded model structure rooted in the Question-Answering (QA) paradigm in this study. Our approach utilizes dialogue content to create context-inquiry pairs and introduces two novel modules: inquiry-aware attention and multi-perception pooling. Inquiry-aware attention enhances the contextual relationship between inquiries and dialogues, while multi-perception pooling extracts diverse semantics from the dialogue. The experimental results unequivocally demonstrate our method's efficiency, surpassing state-of-the-art techniques in symptom status tracking and indicating its superior effectiveness.

12:20-12:40, Paper WeBT8.5
SMGNN: Semantic Multi-Connected Graph Neural Network for Traffic Flow Prediction

Yao, Xin-Wei	Zhejiang University of Technology
Li, Wei-Cai	Zhejiang University of Technology
Li, Xiang-Yang	University of Science and Technology of China
Zhang, Xiao-Li	Zhejiang University of Technology
Yao, Zhong-Hua	Zhejiang Institute of Communications
Li, Qiang	Zhejiang University of Technology
Keywords: AI and Applications, Big Data Computing, Deep Learning Abstract: Traffic flow prediction, as one of the problems of spatial correlation analysis of time series, has been extensively studied. The extraction and fusion of effective spatio-temporal features are crucial for achieving high-precision traffic flow prediction. Traditionally, the adjacency graph designed based on the neighboring nodes of real-world road networks has been indispensable for learning spatial features. However, this single connected component graph structure is prone to the phenomenon of over-smoothing, leading to homogenization of the learned spatial features. Addressing this challenge, this paper proposes a novel Semantic Multi-connected Graph Neural Network (SMGNN) aimed at mitigating the homogeneity of spatial features and effectively modeling spatio-temporal interactions. Firstly, considering the existence of several nodes in large-scale road networks with similar traffic flow variation patterns, we semantically connect these nodes to construct multi-connected semantic spatial graphs (MSSG), replacing the traditionally used neighboring node graph in conventional graph neural networks. Correspondingly, we design a novel graph neural network architecture that cyclically fuses dynamic scale spatio-temporal features from MSSG using an improved Dynamic Spatial Graph Attention (DSGA) module. Secondly, to achieve a more effective representation, we design an Inverted Temporal Attention (ITA) module to supplement static scale temporal features. Furthermore, we introduce a Multi-dimensional and Multi-scale Feature Extraction (MMFE) module to fuse spatio-temporal features at various scales within different receptive fields. Extensive experiments conducted on real-world datasets verify the effectiveness of our proposed method, which significantly outperforms various baseline models.

12:40-13:00, Paper WeBT8.6
User Authentication Using Typing Habits in Virtual Reality

Chen, Li	Chongqing University
Chen, Hengxin	Chongqing University
Keywords: Biometrics and Applications, Human-Computer Interaction Abstract: Text input has a wide range of application scenarios in virtual reality. The integration of physical keyboards into virtual environments facilitates more straightforward and efficient text input. However, while the user is immersed in a virtual environment utilizing a physical keyboard for textual interaction, the user's information (e.g., passwords) is vulnerable to potential attackers. This paper explores the feasibility of utilizing hand tracking data for implicit authentication in virtual reality as users type fixed text. We create an undirected graph from hand key-points to model the user's typing habits. We collected hand tracking data from 51 participants twice, two weeks apart. They used a QWERTY keyboard to input text in a virtual space. A spatial-temporal graph convolutional network was employed to extract the user's typing habits, resulting in an equal error rate of 0.048 in the case of internal spoofing attacks and 0.055 in the case of external spoofing attacks.
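
For readers unfamiliar with the metric reported above, the equal error rate (EER) is the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of one common way to estimate it from match scores; the function name `equal_error_rate` and the toy scores are hypothetical, and the paper's own evaluation procedure may differ.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER by sweeping thresholds over all observed scores.

    A sample is accepted when its score is >= the threshold.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    thresholds = sorted(set(genuine) | set(impostor))
    best_gap, eer = None, None
    for t in thresholds:
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine users rejected
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy similarity scores (hypothetical): higher means "more likely the enrolled user".
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]

eer = equal_error_rate(genuine_scores, impostor_scores)
print(eer)
```

In this toy example the two error rates meet at a threshold of 0.6, where one of four genuine samples is rejected and one of four impostor samples is accepted.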


WeBT9	MR09
AI, AIoT and AI Applications	Regular Papers - Cybernetics
Chair: Cai, Yitong	Institute of Information Engineering, Chinese Academy of Sciences

11:00-11:20, Paper WeBT9.1
Fractal: Facilitating Robust Encrypted Traffic Classification Using Data Augmentation and Contrastive Learning

Cai, Yitong	Institute of Information Engineering, Chinese Academy of Science
Li, Shu	Institute of Information Engineering, Chinese Academy of S
Zhang, Hongfei	Institute of Information Engineering, Chinese Academy of Science
Liu, Yuyi	Institute of Information Engineering, Chinese Academy of Science
Du, Meijie	Institute of Information Engineering, Chinese Academy of Science
Fang, Binxing	Harbin Institute of Technology
Keywords: Application of Artificial Intelligence, Neural Networks and their Applications, Deep Learning Abstract: Encrypted traffic classification using deep learning models based on packet length sequences has shown promising results. However, in real-world network conditions, network-induced phenomena such as packet loss, packet retransmission, and packet disorder are prevalent, leading to a decline in performance. To address this challenge, we propose Fractal, a novel approach designed to enhance existing deep learning models by integrating data augmentation and contrastive learning, thereby facilitating robust encrypted traffic classification under various network conditions. Specifically, Fractal employs three data augmentations to simulate different network conditions, generating diverse packet length sequences from the same flow. Contrastive learning is then leveraged to distill robust features from these augmented sequences. Fractal enables deep learning models to discern the intrinsic patterns of each flow, regardless of the variance in packet length sequences caused by network-induced phenomena. Our comprehensive evaluations demonstrate that Fractal enhances the classification performance of deep learning models under different network conditions, achieving a 23% increase in accuracy and a 15% improvement in F1-score.

11:20-11:40, Paper WeBT9.2
Enhancing Sentiment Analysis in E-Commerce through the Integration of GPT-3.5 Embeddings and BiLSTM Networks

Wang, Mengling	Jianghan University
Hou, Qun	Jianghan University
Peng, Ao	Jianghan University
Keywords: Deep Learning, AI and Applications, Expert and Knowledge-Based Systems Abstract: As e-commerce continues to grow, comprehending consumer emotions has become crucial, directly influencing product sales and corporate reputation. Yet, the high dimensionality and complexity of e-commerce platform comment data pose significant challenges to conventional sentiment analysis techniques. In response, this paper presents an innovative sentiment analysis model for e-commerce reviews, leveraging GPT-3.5 embeddings, convolutional neural networks (CNN), bidirectional long short-term memory (BiLSTM) networks, and Self-Attention mechanisms. Initially, GPT-3.5 is employed to transform input text into dynamic word vectors, thereby capturing subtle semantic distinctions. Subsequently, the CNN layer is utilized to extract textual emotional features, while the Self-Attention mechanism enhances the model's sensitivity to subtle emotional shifts. The BiLSTM layer further captures bidirectional (backward and forward) contextual information, offering a holistic grasp of textual sentiment. Emotional features are ultimately classified using the Softmax function. Relative to existing models, our proposed model exhibits superior performance in e-commerce dataset experiments, demonstrating notable enhancements in accuracy, precision, recall, and F1 scores. This validation underscores our solution's effectiveness in conducting sentiment analysis within the e-commerce domain.

11:40-12:00, Paper WeBT9.3
SMGC-SBERT: A Multi-Feature Fusion Chinese Short Text Similarity Computation Model Based on Optimised SBERT

Zhao, Shuo	Qilu University of Technology (Shandong Academy of Sciences)
Gu, Qiliang	Qilu University of Technology (Shandong Academy of Sciences)
Zhang, Jianqiang	Shandong Branch of China Mobile Communication Group Design Insti
Song, Gongpeng	Shandong Branch of China Mobile Communication Group Design Insti
Lu, Qin	Qilu University of Technology (Shandong Academy of Sciences)
Keywords: Deep Learning, Artificial Social Intelligence, Machine Learning Abstract: Chinese short text similarity computation is a pivotal task in natural language processing and has attracted significant attention. However, existing models are limited in handling intricate semantic relationships: they have difficulty discerning subtle semantic nuances in text, integrate diverse levels of semantic information inadequately, and struggle to capture polysemous meanings accurately. To address these issues, this paper introduces an innovative Chinese short text similarity computation model, SMGC-SBERT. This model addresses the shortcomings of existing models by employing a multi-module fusion strategy, thereby enabling a more precise measurement of semantic similarity between texts. First, the model incorporates SAT embedding to acquire phrase-level semantic information and leverages the MS-BERT model to encode text, improving the model's comprehension of textual polysemy and obtaining a richer semantic representation. Then, the fusion of module features, including multi-branch convolutional networks and mix pooling, enables the extraction of textual features at varied levels, bolstering the model's representational capacity. Additionally, to further reduce overfitting risks while improving accuracy and other performance measures, a multi-layer feature adjustment network is utilized for short text similarity calculation. The experimental results show the superiority of the SMGC-SBERT model over other neural network models, demonstrating significant improvements on both the Chinese-SNLI and CCKS2018_Task3 Chinese short text datasets.

12:00-12:20, Paper WeBT9.4
MFFLEN: Multi-Label Text Classification Based on Multi-Feature Fusion and Label Embedding

Gu, Qiliang	Qilu University of Technology (Shandong Academy of Sciences)
Zhao, Shuo	Qilu University of Technology (Shandong Academy of Sciences)
Zhang, Jianqiang	Shandong Branch of China Mobile Communication Group Design Insti
Song, Gongpeng	Shandong Branch of China Mobile Communication Group Design Insti
Lu, Qin	Qilu University of Technology (Shandong Academy of Sciences)
Keywords: Deep Learning, Artificial Social Intelligence, Machine Learning Abstract: To address the challenges associated with insufficiently extracting and utilizing features at different levels, overlooking the connection between label meanings and text, and facing problems of over-compression or information loss when extracting global information using recurrent neural networks in the field of multi-label text categorization, this paper introduces an innovative model known as MFFLEN (Multi-Feature Fusion and Label Embedding Neural Network). First, a back-translated enhanced label set is constructed by back-translated splicing enhancement of the original label set. This set, together with the text, is then input into the embedding layer, which consists of the pre-trained bert-base-Chinese model, thus establishing the initial connection between the text and the labels within the same vector space. Then, to comprehensively extract multi-level semantic features, the model uses a convolutional layer to extract local features and an embedding layer to extract sentence-level features. A bidirectional attention embedded GRU (BAE-GRU) layer is used to extract hybrid fine-grained features, which are then fed into the attention layer to further extract hybrid labeled features based on labeling information. Finally, these three different types of features are fused and multi-label text classification results are obtained using a classifier. The experiments show that the MFFLEN model achieves 73.82% and 88.44% macro-F1 and 88.00% and 88.86% micro-F1 on the two datasets CAIL 2018 Small and CAIL 2018 Split, respectively, outperforming the other baseline models.

12:20-12:40, Paper WeBT9.5
IRS-Enabled Interference Elimination and Fairness Enhancement in D2D Communication Networks

Dai, Xin	Beijing Information Science & Technology University
Chen, Xin	Beijing Information Science and Technology University
Jiao, Libo	Beijing Information Science and Technology University
Han, Bingjie	Beijing University of Technology
Pan, Yizheng	Beijing Information Science & Technology University
Yin, Tong	Beijing Information Science & Technology University
Keywords: Cloud, IoT, and Robotics Integration, Intelligent Internet Systems, Optimization and Self-Organization Approaches Abstract: In order to cope with the increasing data traffic, we try to enable Intelligent Reflecting Surface (IRS) interference elimination in Device-to-Device (D2D) communication networks to improve the Signal Interference Noise Ratio (SINR). We build the system model and divide the original problem into two subproblems: IRS reflection parameter adjustment and IRS allocation. We use the Cross Entropy Global Optimization Method (CEGOM) to solve the first subproblem. For the second subproblem, in order to ensure the fairness of user rates and avoid user starvation, we propose a heuristic algorithm based on the Max-Min Fairness Method (MMFM). Simulation results demonstrate the superiority of the proposed algorithms, which improve Jain's fairness index by 70%, 72% and 13% and reduce the blocking probability by 92%, 90% and 82%, respectively, when compared to the random, shortest distance, and traditional MMFM strategies.
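
Jain's fairness index, the fairness metric cited above, has a standard closed form: for n user rates x_i it equals (sum of x_i)^2 / (n * sum of x_i^2), ranging from 1/n (one user gets everything) to 1 (all users equal). The short sketch below only illustrates that formula; the function name and the example rate vectors are hypothetical and do not come from the paper's simulations.

```python
def jain_fairness_index(rates):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2)."""
    if not rates:
        raise ValueError("rates must be non-empty")
    total = sum(rates)
    squares = sum(r * r for r in rates)
    if squares == 0:
        raise ValueError("at least one rate must be non-zero")
    return total * total / (len(rates) * squares)


# Hypothetical rate allocations for four D2D users.
equal_share = jain_fairness_index([2.0, 2.0, 2.0, 2.0])  # perfectly fair
starved = jain_fairness_index([8.0, 0.0, 0.0, 0.0])      # three users starved

print(equal_share, starved)
```

With four users, the starved allocation yields the lower bound 1/4, which is why avoiding user starvation directly raises the index.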

12:40-13:00, Paper WeBT9.6
A Novel Federated Learning System with Privacy Protection and Blockchain Consensus Incentive Mechanisms in Cloud-Edge Collaboration Scenarios

Liu, Longyi	Shenyang Institute of Computing Technology, Chinese Academy of S
Hu, Yi	Shenyang Institute of Computing Technology, Chinese Academy of S
Zhao, Yanqing	Shenyang Institute of Computing Technology, Chinese Academy of S
Zhang, Xiyang	Shenyang Institute of Computing Technology, Chinese Academy of S
Ma, Yongze	Shenyang Institute of Computing Technology, Chinese Academy of S
Chang, Guiyong	Shenyang Institute of Computing Technology, Chinese Academy of S
Keywords: Cloud, IoT, and Robotics Integration, Machine Learning, Information Assurance and Intelligence Abstract: The data-driven transformation of industrial intelligence is becoming a key focus for the future of smart manufacturing. However, due to concerns about data privacy, the industry often faces a prevalent "data island" problem. Traditional federated learning ensures data privacy and usability to some extent, yet its client/server (C/S) architecture is susceptible to poisoning attacks and denial-of-service threats, and it lacks consensus mechanisms and incentives among participants to ensure fairness in learning. This paper integrates federated learning into blockchain consensus protocols, proposing an FRConsensus algorithm based on model evaluation and stake election, which overcomes issues of passive participation and resource wastage during model training. It also introduces model watermarking and ECC public-key encryption mechanisms to secure parameter transmission. The experimental results demonstrate that the proposed federated learning system achieves superior model convergence and is more effective in defending against poisoning attacks. In comparison to mainstream federated learning mechanisms based on consensus algorithms, it reduces the election latency by 20%, shortens the consensus latency by 8%, and increases the throughput by 12%.


WeBT10	MR10
Image Processing and Pattern Recognition 6	Regular Papers - Cybernetics
Chair: Wang, Chengkun	Inner Mongolia University

11:00-11:20, Paper WeBT10.1
Enhancing Training Stability in Generative Adversarial Networks Via Penalty Gradient Normalization

Xia, Tian	Tohoku University
Su, Qinglang	Macao University of Tourism
Keywords: Deep Learning, Image Processing and Pattern Recognition, Machine Learning Abstract: In the evolution of the generative modeling domain, Generative Adversarial Networks (GANs) have emerged as focal points of scholarly attention. In this study, we introduce a novel normalization technique called penalty gradient normalization (PGN) to address the inherent training instability observed in GANs. This instability often arises due to sharp gradient variations within the network. Unlike conventional approaches such as gradient penalty and spectral normalization, our proposed PGN method selectively imposes a constraint on the gradient norm within the discriminator function. By doing so, we enhance the discriminator’s capacity to discern subtle variations in generated samples. Empirical investigations across diverse datasets reveal that GANs, when trained with penalty gradient normalization, exhibit outstanding performance compared to existing methods in terms of both Frechet Inception Distance and Inception Score.

11:20-11:40, Paper WeBT10.2
Dynamic Attention Fusion Decoder for Speech Recognition

Zhang, Jiang	Xinjiang University
Wang, Liejun	Xinjiang University
Xu, Miaomiao	Xinjiang University
Keywords: Image Processing and Pattern Recognition, Deep Learning, Neural Networks and their Applications Abstract: Speech recognition serves as the foundation for human-computer interaction. To attain more accurate speech recognition results, the models for speech recognition are becoming increasingly sophisticated, demanding a greater volume of training data. In situations where data and computational resources are limited, the rapid development of usable speech recognition models becomes crucial for the future of this field. Currently, mainstream speech recognition models rely on attention mechanisms. These models employ cross-attention during decoding to address the relationship between the encoded speech input and the textual representation of the output. This mechanism allows the model to focus on different parts of one sequence while generating elements of another sequence, facilitating a better understanding of their relationships. However, stacking numerous layers of cross-attention is required to increase the model's expressive power. Yet, too many layers can impact model training and inference speed, and necessitate more data for training. We introduce a dynamic attention fusion speech recognition decoder designed for small datasets to address this issue. The decoder utilizes enhanced positional information during the decoding process to query the positional correspondence between the output text and the input speech encoding. This approach eliminates the need for extensive stacking of cross-attention mechanisms. Subsequently, a dynamic fusion module integrates these correspondences with the original decoding information. This method effectively establishes improved correspondences between the input and output, eliminating the need for stacking additional decoder layers and thereby reducing the model's parameter count. Our model achieved Character Error Rates (CER) of 4.60%, 12.67%, and 7.06% on the Aishell1, Primewords, and Free ST Chinese Mandarin Corpus datasets, respectively. Meanwhile, on the Uyghur dataset, our model attained a Word Error Rate (WER) of 4.23%. These results outperformed the baseline systems. The error rates decreased by 0.05%, 0.23%, 0.21%, and 1.52% on four datasets respectively compared to the baseline system. Additionally, our model's parameter count also decreased by 6.7%.
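
The CER and WER figures above are conventionally defined as the Levenshtein edit distance between the reference and the hypothesis, divided by the reference length, counted in characters or words respectively. The sketch below illustrates that standard definition only; the helper names and the sample strings are hypothetical and unrelated to the paper's datasets or scoring tools.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by the reference length."""
    if not ref_tokens:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


# Character-level scoring: one substituted character out of six.
cer = error_rate(list("今天天气很好"), list("今天天汽很好"))
# Word-level scoring: one inserted word against a three-word reference.
wer = error_rate("the cat sat".split(), "the cat sat down".split())

print(cer, wer)
```

Because the denominator is the reference length, insertions can push the rate above 100% for very poor hypotheses.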

11:40-12:00, Paper WeBT10.3
Concrete Structural Crack Damage Classification Using Nonlinear Dimension Reduction and Broad Learning System

Wang, Bingshu	Northwestern Polytechnical University, Taicang Campus
Lin, Jia	Northwestern Polytechnical University
Zhuang, Xiaodong	Northwestern Polytechnical University
Zhang, Guanghui	Shandong University
Chen, C. L. Philip	University of Macau
Keywords: Neural Networks and their Applications, Image Processing and Pattern Recognition, Machine Vision Abstract: Concrete structural crack damage classification is of importance for road safety. This paper proposes a new method based on broad neural network for crack damage classification in concrete structures. It includes three stages. Firstly, a pre-trained deep neural network is used to extract the features from crack images. Secondly, principal component analysis is used to project the retrieved features from high dimensions to low dimensions. Thirdly, broad learning system is employed to predict the classification using the low-dimensional features. Experimental results demonstrate that this method reduces the model’s training time and improves classification accuracy.

12:00-12:20, Paper WeBT10.4
EFFDet: A Crack Detector Via Boundary Preservation and Cross-Attention Integration

Gao, Linhua	Zhejiang University of Technology
Weng, Libo	Zhejiang University of Technology
Gao, Fei	Zhejiang University of Technology
Keywords: Deep Learning, Image Processing and Pattern Recognition, Machine Vision Abstract: The complexity of scenes and the topology of cracks make road crack detection a challenging task. Compared to other semantic segmentation tasks, this mission places a greater demand on the network's ability to preserve detailed boundary information. To address this, a novel road crack detection network architecture EFFDet is proposed in this paper. Firstly, we redesign the encoding-decoding module based on large-scale convolutional kernels and attention mechanisms to reduce the loss of detailed information caused by downsampling. Secondly, the Cross Attention module is proposed to integrate more precise details into the output of the decoding layer. In comparative experiments on four datasets, CRACK500, Volker, CrackLS315 and DeepCrack, EFFDet achieves ODS values of 0.7434, 0.6758, 0.6449 and 0.8708, respectively. The experimental results show that EFFDet demonstrates stronger detection capabilities in road crack detection.

12:20-12:40, Paper WeBT10.5
EFD-MVSNet: Enhanced Feature Distinctiveness for Multi-View Stereo

Wang, Chengkun	Inner Mongolia University
Zhang, Zhibin	Inner Mongolia University
He, Liqiang	Geomechanica Inc
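Keywords: Image Processing and Pattern Recognition, Deep Learning, Neural Networks and their Applications Abstract: In recent years, applying the concept of plane sweeping to deep-learning-based multi-view stereo (MVS) has achieved significant progress. However, challenges such as texture scarcity, weak textures, and reflective scenes can cause errors in depth estimation, leaving room for improvement in the accuracy and completeness of the reconstructed 3D point clouds. In this paper, we delve into the nature of MVS as a one-to-many feature matching task and introduce the Enhanced Feature Distinctiveness MVSNet (EFD-MVSNet), an innovative depth estimation network. Our focus is on enhancing feature distinctiveness while reducing the computational cost associated with plane sweeping and improving feature matching accuracy. We propose an efficient Feature Refinement Convolution (FRC) to compress conventional convolution and enhance feature contrast. In addition, we introduce an Aggregated Linear Transformer (ALT) that naturally fits the one-to-many MVS task, thereby achieving linear complexity within and across images…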

12:40-13:00, Paper WeBT10.6
Adaptive Domain-Enhanced Transfer Learning for Welding Defect Classification

Dai, Dan	University of Warwick
Mohan, Anand	University of Warwick
Franciosa, Pasquale	University of Warwick
Zhang, Tong	South China University of Technology
Chen, C. L. Philip	South China University of Technology
Ceglarek, Darek	University of Warwick
Keywords: Transfer Learning, Image Processing and Pattern Recognition, Application of Artificial Intelligence Abstract: The integration of Intelligent Welding Systems (IWS) in smart manufacturing leverages advancements in sensors, robotics, and artificial intelligence to optimize welding processes. However, in industry practice, we still face challenges such as the lack of sufficient data for every manufacturing task, the costs associated with welding data annotation quality, and the risk of knowledge forgetting during the continual welding process. To tackle these issues, we developed an Adaptive Domain-Enhanced Transfer Learning (ADETL) framework that integrates self-supervised and continual learning strategies. This framework is adept at using incremental and unlabeled data for pre-training, in which we analyze the parameter space and the loss landscape and make the model understand the behaviour of knowledge transfer from diverse source domains. The ADETL framework improves the performance of defect classification, offering a promising solution to the challenges inherent in automatic, continuous welding operations.


WeBT11	MR11
Cognitive and Affective Computing 2	Regular Papers - Cybernetics
Chair: Semba, Shogo	The University of Aizu

11:00-11:20, Paper WeBT11.1
Boosting MLPs on Graphs Via Distillation in Resource-Constrained Environment

Yang, Zhihui	Hohai University
Qu, Zhihao	Hohai University
Jia, Ninghui	Hohai University
Hu, Shihong	Hohai University
Ye, Baoliu	Nanjing University
Zeng, Deze	China University of Geosciences
Keywords: Neural Networks and their Applications, Deep Learning, Application of Artificial Intelligence Abstract: Graph Neural Networks (GNNs) have emerged as a powerful technique across various applications, due to their effective message-passing mechanism. However, their deployment is constrained by limited computational resources, energy concerns, and low-latency processing requirements. While existing works employ logit-based knowledge distillation from GNNs to guide Multilayer Perceptrons (MLPs) training, these methods may lead to reduced accuracy and compromised robustness. These drawbacks arise from two primary factors: the insufficient exploitation of the rich information embedded within the graph structures and the inherent susceptibility of MLPs to noisy data. To tackle these issues, we propose a Mixed Multi-order Knowledge Distillation (MMKD) method, which combines the GNN's logits with hidden layer information through the multi-order distillation to improve the accuracy of the MLP. Moreover, we employ both raw data and perturbed data as input, enhancing the density of knowledge extraction as well as the MLPs' generalization. Extensive experiments across seven benchmark datasets verify the superior performance of our approach in terms of effectiveness and robustness. In comparison with the baseline, our approach achieves an accuracy improvement of up to 8.68% in typical GNN tasks.

11:20-11:40, Paper WeBT11.2
RobustLayoutLM: Leveraging Optimized Layout with Additional Modalities for Improved Document Understanding

Wang, Bowen	Shanghai University
Wei, Xiao	Shanghai University
Keywords: Neural Networks and their Applications, Application of Artificial Intelligence, Deep Learning Abstract: Pre-training methods have become the mainstream in document understanding, involving self-supervised learning on large-scale unlabeled documents to learn rich feature representations, followed by supervised fine-tuning on a smaller labeled document dataset. Recently emerged pre-training models for document understanding all use layout information as an important feature; however, their performance suffers significantly when the layout information is incorrect. In this paper, we introduce RobustLayoutLM, which utilizes the optimized layout information generated by our proposed Uniform XY Cut algorithm. A Visual-Graph Parser is also introduced to integrate image and graph information after the transformer feature fusion, minimizing the impact of erroneous layout information. Experimental results show that our RobustLayoutLM achieves competitive or better results on multiple standard VrDU benchmarks and outperforms previous methods in the face of incorrect layout information.

11:40-12:00, Paper WeBT11.3
MDTF: Multimodal Rumor Detection with Dual Augmentation at Textual and Feature Levels

Mao, Shun	University of Chinese Academy of Sciences
Sui, Jie	University of Chinese Academy of Sciences
Keywords: Deep Learning, Neural Networks and their Applications, AI and Applications Abstract: While social media facilitates communication, it also contributes to the widespread dissemination of rumors, posing significant societal harm. Although numerous detection methods have been proposed to address online rumors, most lack efficient feature fusion for multimodal rumor detection, leading to suboptimal detection performance. Therefore, we introduce the Multimodal Rumor Detection with Dual Augmentation at Textual and Feature Levels model (MDTF), which utilizes feature extraction modules to separately extract textual and visual features. The model employs a cross-attention mechanism to enhance visual features and incorporates a gradual feature fusion approach. Considering the potential signal interference and feature coverage issues from feature fusion, we implement residual connections to mitigate these effects. Additionally, due to the limited data volume in rumor detection datasets, we employ a text data augmentation module to generate a substantial amount of text based on images, thereby enriching the textual dataset. To evaluate the performance of the MDTF model, experiments were conducted on the Twitter dataset, demonstrating that the MDTF model surpasses the most advanced models in terms of accuracy and F1 score.

12:00-12:20, Paper WeBT11.4
A Battery-Powered Wild Animal Tracking Device Using a PTZ Camera and Deep Learning

Semba, Shogo	The University of Aizu
Saito, Hiroshi	The University of Aizu
Tomioka, Yoichi	The University of Aizu
Kohira, Yukihide	The University of Aizu
Keywords: Neural Networks and their Applications, Application of Artificial Intelligence Abstract: In this paper, we propose a battery-powered wild animal tracking device using a Pan-Tilt-Zoom (PTZ) camera and deep learning. The proposed tracking device detects wild animals using YOLOv5 and tracks the detected wild animals using DeepSort. In addition, the proposed tracking device tracks over a wide range by controlling a PTZ camera according to the movement direction of the detected wild animals. In the experiment, we developed a prototype of the proposed tracking device and conducted a field test of the developed prototype. We confirmed and discussed cases where the developed prototype could and could not track wild animals. The energy consumption of the developed prototype during tracking was 526.03 J in the daytime and 730.99 J at nighttime.

12:20-12:40, Paper WeBT11.5
CSTformer: Cross Spatio-Temporal Transformer for Multivariate Time Series Forecasting

Cai, Tao	Jiangsu University
Wu, Haixiang	Jiangsu University
Niu, Dejiao	Jiangsu University
Li, Lei	Jiangsu University
Keywords: Deep Learning, Neural Networks and their Applications, Application of Artificial Intelligence Abstract: Transformer-based models have shown remarkable success in Multivariate Time Series Forecasting (MTSF). Previous methods apply Attention mechanisms to capture temporal and spatial (variable) dependencies separately. However, they struggle to model intricate local spatiotemporal correlations. To address this limitation and enhance performance, we propose CSTformer, a novel Transformer-based model that enables capturing Cross Spatio-Temporal (CST) dependency for MTSF. In CSTformer, through a Variate Compete Linear Attention (VCLA) mechanism, each variable efficiently achieves specific CST features, which compete against the backdrop of the local multivariate. Additionally, we develop a Mixture of Latent (MoL) module to provide adaptive predictions for variables with varying degrees of CST dependencies. Our experimental results on nine benchmarks indicate that, compared with the state-of-the-art method, CSTformer yields a 2.7% relative improvement.

12:40-13:00, Paper WeBT11.6
Reinforcement-Learning-Based Collision Avoidance Strategy for Autonomous Vehicles to Multiple Two-Wheelers at Un-Signalized Obstructed Intersections

Zhang, Delei	Shandong University of Science and Technology
Qi, Liang	Shandong University of Science and Technology
Luan, Wenjing	Shandong University of Science and Technology
Liu, Kun	Shandong University of Science and Technology
Liu, Kun	Shandong University of Science and Technology
Keywords: Neural Networks and their Applications, Application of Artificial Intelligence, Deep Learning Abstract: Two-Wheelers (TWs) often occupy lanes illegally and exceed speed limits, which may lead to traffic accidents. This work uses deep reinforcement learning to design driving strategies for Autonomous Vehicles (AVs) to avoid collision and reduce injury of TW riders with irregular riding behaviors at un-signalized occluded intersections. First, collision-avoidance behaviors of TWs such as bikes, e-bikes, and motorcycles are modeled respectively. Its state spaces integrate a safe avoidance range of AVs, a new position of AVs after adopting the maximum deceleration and maximum steering angle, a predicted acceleration, and information on AVs and other vehicles to improve the performance of driving decisions. At the same time, a reward function is designed based on the injury of TW riders and the driving safety and comfort of AVs. Secondly, a reinforcement learning model for autonomous driving strategy is constructed. Finally, Soft Actor-Critic is used to train the model, and the randomness policy is used to help AVs flexibly deal with the uncertain behaviors of TW riders and realize the balance between exploring unknown behaviors and using existing information. The simulation results show that compared with an autonomous emergency braking system, the injury of the riders using the driving strategy is reduced by 18.02% on average; compared with a risk-aware high-level decision strategy, the injury is reduced by 41.24% on average.


WeBT12	MR12
Networked Systems and Decision Making
Chair: Liu, Xuwang	Henan University

11:00-11:20, Paper WeBT12.1
Deep Reinforcement Learning for Collaborative Inference in Local-Serverless Edge Computing

Chen, Dehua	Guilin University of Electronic Technology
Dong, QingHe	Guilin University of Electronic Technology
He, Qian	Guilin University of Electronic Technology
Jiang, Bingcheng	Guilin University of Electronic Technology
Keywords: Networking and Decision-Making Abstract: High-dimensional parametric models and large-scale mathematical computations limit the efficiency of IoT devices. The emergence of serverless edge computing provides a solution where users can deploy models as serverless functions and delegate provisioning and scaling to the platform. However, the resource-constrained nature of edge resources leads to their inefficiency or inability to serve large neural networks. Therefore, this paper emphasizes collaborative inference between devices and serverless edge computing for DNN models. Specifically, we consider the partitioning and offloading of DNN models and propose a DRL-based strategy to learn the joint decision of model partitioning and function memory type selection to achieve a cost-optimal service that meets the SLO requirements. In addition, we introduce a simulated annealing algorithm in deep reinforcement learning that focuses on exploration in the early stage and empirical value in the later stage. Finally, experimental results show that our algorithm can satisfy various SLOs at a low service cost and outperform three benchmarking strategies.

11:20-11:40, Paper WeBT12.2
IoT-Based Smart Home System Integrated with Deep Learning on the FPGA Development Board (I)

Sung, Guo-Ming	National Taipei University of Technology
Kuo, Fan-Ning	National Taipei University of Technology
Lin, Chih-Yu	National Taipei University of Technology
Tseng, Chwan-Lu	National Taipei University of Technology
Chou, Jen-Hsiang	National Taipei University of Technology
Tung, Li-Fen	National Taipei University of Technology
Keywords: Networking and Decision-Making, Ethics of AI and Pervasive Systems, Assistive Technology Abstract: This study proposes an Internet of Things (IoT)-based smart home system that sends and receives packets through the RS232 protocol and processes them using deep learning. A Field-Programmable Gate Array (FPGA) development board serves as a transceiver, operating a universal serial bus (USB) interface and a Wi-Fi module. The proposed system comprises a built-in wireless transceiver, a set of sensors, a development board running a deep learning algorithm, an MQTT communication protocol, and a terminal device controller. The objective is to implement an IoT-based smart home system with wireless data transmission. Node-RED is used to develop a comprehensive smart home system on the server side for IoT applications, facilitating data access, data processing, and terminal device control. The aim is to achieve automatic regulation and ensure comfortable indoor temperatures. In experiments, the root mean squared error difference between actual and predicted temperatures was approximately 0.4426 °C. After evaluation experiments with the FPGA development board, an application-specific integrated circuit (ASIC) based on the TSMC 0.18-μm CMOS process was used. Simulation results indicate that the chip area is approximately 1.186 × 1.188 mm², and the dynamic power consumption is approximately 8.1674 mW at a power supply of 1.8 V and operating frequencies of 50 and 5 MHz.

11:40-12:00, Paper WeBT12.3
Online Product Presale Pricing Strategies Considering Consumer Anticipated Regret Behavior (I)

Liu, Xuwang	Henan University
Wei, Yujia	Henan University
Qi, Wei	Henan University
Guo, Xiwang	Liaoning Petrochemical University
Wang, Jiacun	Monmouth University
Tang, Ying	Rowan University
Keywords: Networking and Decision-Making, Human Factors, Human-centered Learning Abstract: Online product pricing represents a critical research domain within the platform economy, with the two-stage hybrid pre-sale model emerging as a prevalent mode of online sales. Within the realm of online purchasing, the pre-sale approach exerts a profound influence on consumer behavior and product pricing, while concurrently, consumers' idiosyncratic irrational behavior also significantly shapes product pricing. Based on the two-stage hybrid pre-sale model, this study explores the impact of anticipated regret on consumers' purchasing decision-making processes. Through the formulation of a pertinent product pricing model, the investigation delves into the determination of optimal pricing, sales volume, and profit margins for online products. Moreover, it scrutinizes the ramifications of anticipated regret on consumer behavior and the consequential implications for optimal pricing strategies. Theoretical deliberations and numerical analyses reveal that a lower sensitivity coefficient of high-price regret corresponds to heightened firm profitability, whereas an elevated sensitivity coefficient of out-of-stock regret correlates with enhanced firm profitability. Furthermore, mitigating waiting costs during the pre-sale phase is identified as a means to bolster firm profits. These insights furnish valuable guidance for platform enterprises and merchants endeavoring to refine their product pricing strategies.

12:00-12:20, Paper WeBT12.4
Product Line Pricing and Assortment Optimization Considering Consumer Search Cost (I)

Qi, Wei	Henan University
Zhang, Bangchen	Henan University
Liu, Xuwang	Henan University
Guo, Xiwang	Liaoning Petrochemical University
Wang, Jiacun	Monmouth University
Tang, Ying	Rowan University
Keywords: Human Factors, Networking and Decision-Making, Human-centered Learning Abstract: Addressing the practical challenge of a monopolistic enterprise selling multiple products on an online platform within a specific sales cycle, and taking into account the impact of consumers' search costs, this study employs the Multinomial Logit (MNL) model to simulate customers' decision-making processes. The analysis demonstrates that the profit function exhibits concavity concerning market share, leading to the derivation of theoretically optimal solutions for product pricing, market share allocation, and overall firm profitability. Subsequently, optimal decisions regarding product pricing, market share distribution, and enterprise profit are identified. Furthermore, by considering the constraints inherent in real-world sales scenarios, this research offers a strategic product selection sequence to maximize the enterprise's total revenue. The findings of this study offer valuable theoretical insights to guide enterprises in making informed decisions regarding product line design.
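The Multinomial Logit (MNL) choice rule mentioned in the abstract above can be illustrated with a minimal sketch. The utility form used here (quality minus price minus a common search cost) and all numeric values are assumptions for illustration only; the paper's actual utility specification is not given in this program.

```python
import math


def mnl_choice_probabilities(utilities, outside_utility=0.0):
    """Return (product shares, no-purchase share) under the standard MNL rule."""
    # Shift by the maximum utility for numerical stability; the shares are unchanged.
    shift = max(max(utilities), outside_utility)
    exp_products = [math.exp(u - shift) for u in utilities]
    exp_outside = math.exp(outside_utility - shift)
    denom = exp_outside + sum(exp_products)
    return [e / denom for e in exp_products], exp_outside / denom


# Hypothetical two-product example: utility = quality - price - search cost.
qualities = [5.0, 4.0]
prices = [3.0, 2.5]
search_cost = 0.5  # assumed to be identical for every product
utilities = [q - p - search_cost for q, p in zip(qualities, prices)]

shares, no_purchase = mnl_choice_probabilities(utilities)
print(shares, no_purchase)
```

In this sketch a higher search cost lowers every product's utility equally, which shifts share toward the no-purchase option.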

12:20-12:40, Paper WeBT12.5
Optimal Pricing of Product Considering Consumer Regret Psychology (I)

Liu, Xuwang	Henan University
Liu, Tong	Henan University
Qi, Wei	Henan University
Guo, Xiwang	Liaoning Petrochemical University
Wang, Jiacun	Monmouth University
Tang, Ying	Rowan University
Keywords: Networking and Decision-Making, Human Factors, Human-centered Learning Abstract: This paper investigates the application of skimming pricing in sales strategies for businesses, considering the influence of regret psychology on consumer decision-making. It introduces two types of regret psychology—high-price regret and out-of-stock regret—into the consumer utility function, developing a two-stage consumer decision-making model. The paper then addresses optimal pricing and profit strategies for dynamic pricing and commitment pricing approaches. Additionally, it compares and analyzes the advantages and disadvantages of these models. Finally, it examines how high-price regret and out-of-stock regret affect optimal profits. The study suggests that dynamic pricing is optimal when consumers are more sensitive to high-price regret, while commitment pricing is preferable for maximizing profits when consumers are more sensitive to out-of-stock regret. Moreover, it finds that the lower the consumer high-price regret coefficient, the higher the optimal profit for enterprises. For businesses selling “continuous demand” commodities, optimal profit increases with consumer out-of-stock regret coefficient, whereas for businesses selling “fault-line demand” commodities, optimal profit decreases with this coefficient. This research can provide a theoretical model for AI to analyze consumer behavior while considering consumer regret psychology, and then help enterprises to provide decision support for product pricing and revenue management.

12:40-13:00, Paper WeBT12.6
Channel Attention Network for Feature Fusion of Ultrasonic and IMU Signals in Cross-Subject Gait Recognition (I)

Huang, Xujia	Harbin Institute of Technology, Shenzhen
Zhou, Zixiang	Harbin Institute of Technology, Shenzhen
Xu, Xinglong	Harbin Institute of Technology, Shenzhen
Wang, Zhiyong	Harbin Institute of Technology, Shenzhen
Keywords: Biometrics and Applications, Networking and Decision-Making, Wearable Computing Abstract: In traditional human-machine interface research, a single sensing mode is often adopted, focusing on individual human motion classification experiments. However, the performance often declines when applied across individuals. Ultrasonic and inertial sensing, as mainstream technologies in human-machine interfaces, can respectively detect the muscle states and limb information of the human body, offering potential complementary benefits. The fusion of ultrasonic (US) and Inertial Measurement Unit (IMU) signals is expected to significantly enhance the generalization ability of physiological signals in motion intention recognition. This paper presents a method that fuses ultrasound and IMU signals, and designs a Convolutional Neural Network with Channel Attention Module (IU-CAM-CNN) for gait classification experiments. The experimental results demonstrate that introducing channel attention modules effectively enhances the model's ability to extract features from fused signals, resulting in a classification accuracy of 90% for cross-individual movements. In contrast, traditional LDA and CNN networks achieve only 22% and 47% classification accuracy, respectively. This method provides an effective solution for cross-individual movement intention recognition.


WeBT13	Room T13
Human Factors and Wearable Computing
Chair: He, Tianjia	Kyushu University

11:00-11:20, Paper WeBT13.1
Robot Arm Autonomous Grasping Based on Improved TRRT Algorithm

Yin, Xiong	East China Jiaotong University
Cao, Yulian	Beijing University of Posts and Telecommunications
Cheng, Ming	Hubei Central China Technology Development of Electric Power Co
Liu, Wei	Hubei Central China Technology Development of Electric Power Co
Wang, Xiaoming	East China Jiaotong University
Yao, Daojin	ECJTU
Keywords: Ethics of AI and Pervasive Systems, Human-Collaborative Robotics, Human-Machine Interaction Abstract: To address the inability of traditional robotic arms to grasp objects and avoid obstacles autonomously, an improved TRRT algorithm is proposed. First, a depth camera is attached to the end of the robot arm, the GRCNN network model is employed to analyze the input objects for grasping, and the grasp pose of the robot arm is output. Second, the TRRT algorithm is improved by integrating a pose constraint and using a Bézier curve to smooth the path. Simulation and prototype experiments are carried out, and the experimental results demonstrate that the improved algorithm can autonomously grasp and plan motions while avoiding obstacles.

11:20-11:40, Paper WeBT13.2
Development of a Support System for Situation Awareness and Decision of Beginner Navigators Using Voice Information

Nishizaki, Chihiro	Tokyo University of Marine Science and Technology
Hamanaka, Shunsuke	Tokyo University of Marine Science and Technology
Keywords: Human Factors Abstract: Marine accidents often result from errors in situation awareness (SA) among navigators. Despite the ongoing unmanned ship projects worldwide, the implementation of unmanned systems on many ships is expected to take a considerable amount of time. Therefore, the support technology for shipboard navigators and maritime autonomous surface ship’s operators will continue to be crucial in the future. Additionally, to address the challenges posed by the shortage and aging of seafarers, supporting the SA of beginner navigators is essential. In this study, we propose a support system for enhancing the SA of beginner navigators using voice information. This system aims to encourage the gathering of information on target ships and, if necessary, prompt actions against target ships. The results of simulator experiments with cadets, using the Situation Awareness Global Assessment Technique demonstrated the effectiveness of the proposed support system in enhancing SA and decision-making among beginner navigators through voice information. However, to enhance the practicality of the system, it is essential to adjust the timing, contents, and frequency of the voice information according to the skill levels of beginner navigators.

11:40-12:00, Paper WeBT13.3
The Proxemic Influence on Trust in Triadic Human-Robot Interaction: Insights for Tele-Operative Sonography Assessment in Human-In-The-Loop Systems

Toomey, Nicole Gwenith	Deakin University's Institute for Intelligent Systems Research An
Kebria, Parham M.	The Institute for Intelligent Systems Research and Innovation, D
Nahavandi, Darius	Deakin University
Skvarc, David	School of Psychology, Deakin University
Mohamed, Shady	Senior Research Fellow, Deakin University
Rahimzadeh, Ghazal	Deakin University, Iisri
Keywords: Human-Machine Interaction, Telepresence, Human-Collaborative Robotics Abstract: As robots become more prevalent in society and applied in various workplace sectors, individuals must have an appropriate amount of trust that aligns with robots’ or automated systems' actual capabilities, facilitating optimal and safe human-robot interaction. Appropriately calibrated trust levels can enhance robots’ safe and successful adoption into our society and their unique applied environments. The current research aims to assess individuals' self-reported trust levels in a triadic human-robot-human interaction concerning a collaborative haptically enabled “sonography” style robot (having tele-operative capabilities) to assess moderators of trust unique to this specific domain. The objectives of the current research are to identify participants' trust levels in a triadic interaction focusing on the operator's proxemic location while operating the robot (1) and to compare self-reported trust levels across conditions suggested by the literature to have an influence (2). A repeated measures ANOVA revealed a significant association between the replicated traditional sonography assessment and participants possessing higher trust levels than all robot-related conditions. Further, participants had greater trust for the smooth and slow-functioning robot than the non-smooth functioning robot. Lastly, the current study's findings suggest that, compared to the other robot-related conditions, the experimenter's location operating the tele-operative robot does not significantly influence participants' trust levels. Future research should consider exploring humans’ qualitative perceptions of their interactions with sonography robots and whether trust can be more accurately calibrated over time. Doing so may assist in developing an in-depth understanding of the discrepancies between human-human interaction and human-robot interactions unique to this setting.

12:00-12:20, Paper WeBT13.4
Optimizing Motion Completion with Unconstrained Human Skeleton Structure Learning

He, Tianjia	Kyushu University
Yang, Tianyuan	Kyushu University
Konomi, Shin'ichi	Kyushu University
Keywords: Human Performance Modeling, Cognitive Computing, Human-Machine Interface Abstract: Completing a motion sequence based on sparse key-frames remains a challenging task. The limited grasp of the human skeleton's spatial structure and the complexity of handling sparsely distributed motion sequences pose challenges for traditional interpolation algorithms, hindering their ability to generate authentic and smooth results. Recent progress utilizes Graph Convolutional Networks (GCN) to analyze human skeleton data, yielding promising results. In this paper, we introduce an improved Attention-Based Graph Convolutional Network framework for motion completion that tackles two key challenges: modeling correlations across indirectly connected joints and modeling correlations across frames in motion sequences with diverse sparsity. The method is designed to augment the learning capability of the GCN-based model without being constrained by the inherent human skeleton structure. Furthermore, this design can be concurrently applied to scenarios involving the completion of both single-person and multi-person motions. Experimental results on public human action datasets NTU RGB-D affirm the Spatio-Temporal Attention-Based Graph Convolutional Network's ability to generate smooth and authentic motion results.

12:20-12:40, Paper WeBT13.5
Exploring a Fatigue Compensation Approach for Joint Angle Regression through sEMG and NIRS (I)

Wang, Yan	Harbin Institute of Technology
Cao, Ruikai	Harbin Institute of Technology, Shenzhen
Lyu, Jiahao	Harbin Institute of Technology, Shenzhen
Sheng, Yixuan	Harbin Institute of Technology, Shenzhen
Keywords: Human-Computer Interaction, Wearable Computing, Biometrics and Applications Abstract: Under conditions of muscle fatigue, the efficacy of using electromyographic signals for decoding joint angles significantly deteriorates, highlighting the importance of optimizing decoding under fatigue. In this context, we combined Near-Infrared Spectroscopy (NIRS) signals with surface electromyography (sEMG) data, utilizing an improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN) to enhance adaptive noise reduction. This approach enabled the extraction of Intrinsic Mode Functions reflecting muscle activity characteristics from complex physiological signals. Through correlation analysis, redundant components were eliminated. We proposed a methodology that adeptly captures intricate features and long-term dependencies in time-series data while preserving sensitivity to short-term dynamics, thereby enhancing the precision of joint angle decoding. Following fatigue induction in subjects, the inclusion of near-infrared compensation led to a variable reduction in RMSE values, all of which remained below 5. Furthermore, the correlation coefficient R was enhanced, never falling below 93%. Consequently, our findings demonstrate that integrating near-infrared spectroscopy features into the model exhibits fatigue resistance, thereby improving the precision of joint angle predictions under conditions of fatigue to a certain degree.

12:40-13:00, Paper WeBT13.6
Risk-Aware Shared Control for Teleoperation of Automated Vehicles in Dynamic Environments (I)

Brecht, David	Technical University of Munich
Diermeyer, Frank	Technical University Munich
Keywords: Shared Control, Supervisory Control, Human-Collaborative Robotics Abstract: Teleoperation technology aims to support automated vehicles in situations where no solution to the present scenario can be found. In these situations, decision making is handled by a remote human operator. To overcome safety impairments caused by latencies in data transmission and reduced situational awareness of the remote operator, this work proposes a shared control framework that assists the operator. To enable teleoperation in dynamic environments, risk assessment and prediction methods are integrated into the framework. As a proof of concept, two system implementations are shown. One implementation uses the time-to-collision metric to enhance teleoperation safety in presence of dynamic objects. The second implementation integrates an open-source occlusion awareness module that allows to consider risk arising from traffic participants potentially emerging from occluded areas. The risk and criticality of situations with dynamic objects are reduced by both implementations.
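The time-to-collision metric named in the abstract above can be sketched for the simplest one-dimensional car-following case. This is an illustrative assumption, not the authors' implementation: the function names, the constant-speed model, and the 2-second threshold are all hypothetical.

```python
import math


def time_to_collision(gap_m, ego_speed_mps, lead_speed_mps):
    """Time until the ego vehicle reaches a lead object at constant speeds.

    Returns math.inf when the gap is not closing.
    """
    closing_speed = ego_speed_mps - lead_speed_mps
    if closing_speed <= 0.0:
        return math.inf
    return gap_m / closing_speed


def is_critical(ttc_s, threshold_s=2.0):
    """Flag a situation as critical; the threshold is a hypothetical value."""
    return ttc_s < threshold_s


ttc = time_to_collision(gap_m=20.0, ego_speed_mps=10.0, lead_speed_mps=5.0)
print(ttc, is_critical(ttc))
```

A shared control layer could use such a flag to limit the operator's commanded speed, although how the paper's framework acts on the metric is not stated in this program.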


WeBPSR	Room T14
Poster Presentation - Session 3	Poster Session

11:00-13:00, Paper WeBPSR.1
Development of a Navigation System Using MY VISION for Visually Impaired People - A Method to Guide the Direction of Travel

Koike, Yuki	Kyushu Institute of Technology
Tanjo, Yui	Kyushu Institute of Technology
Keywords: Transfer Learning, Neural Networks and their Applications, Image Processing and Pattern Recognition Abstract: The number of visually impaired people in Japan as of 2016 was 312,000, and 70.4% of them have experienced accidents such as falls or collisions while going out. In addition, while a white walking cane is commonly used to move around when going out, there is a problem of not being able to recognize a wide range of areas. Many studies have been conducted to support the movement of visually impaired people using camera images, and many methods have been proposed to support walking on sidewalks and crossing intersections. However, when considering walking on a sidewalk, there are cases where the sidewalk is broken by a side street intersecting the roadway, and few studies have focused on such a break in the sidewalk. In many cases, there are no traffic signals or pedestrian crossings at such breaks in the sidewalk, making them dangerous places for people with vision difficulties. In this study, we propose a walking assistance method at a break in the sidewalk using images obtained by MY VISION (a Magic eYe of a Visually Impaired for Safety and Independent actiON) and deep learning. MY VISION is a system that analyzes videos obtained from a camera attached to the user's body and provides useful visual information, functioning as a virtual eye for visually impaired people. The proposed method provides a model for recognizing the sidewalk environment and a model for guiding a user to the center of the sidewalk in order to guide the visually impaired people safely. Experiments were conducted to verify the accuracy of each model, and the effectiveness of the proposed method was shown.

11:00-13:00, Paper WeBPSR.2
A Bilayered Decomposition Technique for Handling Complex Constrained Multi-Objective Optimization Problems

Yasuda, Yusuke	Tokyo Metropolitan University
Tamura, Kenichi	Tokyo Metropolitan University
Yasuda, Keiichiro	Tokyo Metropolitan University
Keywords: Evolutionary Computation, Metaheuristic Algorithms, Heuristic Algorithms Abstract: Constrained multi-objective optimization problems (CMOPs) are prevalent in real-world applications, dealing with multi-objective and constraint functions. Multiobjective evolutionary algorithm based on decomposition with differential evolution (MOEA/D-DE) has proven effective for unconstrained multi-objective optimization problems with complex Pareto front (PF). However, in CMOPs, the conflicting nature between objectives and constraints often makes it challenging to appropriately manage constraints while ensuring convergence, diversity, and feasibility of the solution set towards the PF. To address this issue, this paper proposes a bilayered decomposition technique. The proposed algorithm treats constraints as an additional objective function, separately decomposing the objective space and the constraint violation space. This method allows for the simultaneous attainment of convergence, diversity, and feasibility in the solution set while efficiently exploring the PF. Experimental results demonstrate that, within the MOEA/D-DE framework, our algorithm efficiently navigates complex PFs on challenging problems such as LIR-CMOPs and DAS-CMOPs, matching the performance of state-of-the-art constrained multi-objective evolutionary algorithms.

11:00-13:00, Paper WeBPSR.3
Multi-Scale HSV Color Feature Embedding for High-Fidelity NIR-To-RGB Spectrum Translation

Zhai, Huiyu	Hunan University of Science and Technology
Chen, Mo	Hong Kong Baptist University
Yang, Xingxing	Hong Kong Baptist University
Kang, Guosheng	Hunan University of Science and Technology
Keywords: Image Processing and Pattern Recognition, Neural Networks and their Applications, Deep Learning Abstract: The NIR-to-RGB spectral domain translation is a formidable task due to the inherent spectral mapping ambiguities within NIR inputs and RGB outputs. Thus, existing methods fail to reconcile the tension between maintaining texture detail fidelity and achieving diverse color variations. In this paper, we propose a Multi-scale HSV Color Feature Embedding Network (MCFNet) that decomposes the mapping process into three sub-tasks, including NIR texture maintenance, coarse geometry reconstruction, and RGB color prediction. Thus, we propose three critical modules for each corresponding sub-task: the Texture Preserving Block (TPB), the HSV Color Feature Embedding Module (HSV-CFEM), and the Geometry Reconstruction Module (GRM). These modules contribute to our MCFNet by methodically tackling spectral translation through a series of escalating resolutions, progressively enriching images with color and texture fidelity in a scale-coherent fashion. The proposed MCFNet demonstrates substantial performance gains over the NIR image colorization task. The code is available at: https://github.com/AlexYangxx/MCFNet.

11:00-13:00, Paper WeBPSR.4
Deep Spectral Clustering Via Joint Spectral Embedding and Kmeans

Guo, Wengang	Tongji University
Ye, Wei	Tongji University
Keywords: Image Processing and Pattern Recognition, Representation Learning, Deep Learning Abstract: Spectral clustering is a popular clustering method. It first maps data into the spectral embedding space and then uses Kmeans to find clusters. However, the two decoupled steps prohibit joint optimization for the optimal solution. In addition, it needs to construct the similarity graph for samples, which suffers from the curse of dimensionality when the data are high-dimensional. To address these two challenges, we introduce Deep Spectral Clustering (DSC), which consists of two main modules: the spectral embedding module and the greedy Kmeans module. The former module learns to efficiently embed raw samples into the spectral embedding space using deep neural networks and power iteration. The latter module improves the cluster structures of Kmeans on the learned spectral embeddings by a greedy optimization strategy, which iteratively reveals the direction of the worst cluster structures and optimizes embeddings in this direction. To jointly optimize spectral embeddings and clustering, we seamlessly integrate the two modules and optimize them in an end-to-end manner. Experimental results on seven real-world datasets demonstrate that DSC achieves state-of-the-art clustering performance.
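For orientation, the classical two-step pipeline that the abstract above contrasts with DSC (spectral embedding followed by Kmeans) can be sketched as follows. This shows the decoupled baseline only, not DSC itself; the Gaussian similarity graph, the `sigma` parameter, and the farthest-point initialization are assumptions chosen to keep the example small and deterministic.

```python
import numpy as np


def spectral_embedding(X, n_clusters, sigma=1.0):
    """Embed samples using eigenvectors of the symmetric normalized Laplacian."""
    # Dense Gaussian similarity graph (this is the step that scales poorly).
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues are returned in ascending order
    U = vecs[:, :n_clusters]     # eigenvectors of the smallest eigenvalues
    return U / np.linalg.norm(U, axis=1, keepdims=True)


def kmeans(Z, n_clusters, n_iter=50):
    """Plain Lloyd iterations with deterministic farthest-point initialization."""
    centers = [Z[0]]
    for _ in range(1, n_clusters):
        dist = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq, axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = Z[labels == k].mean(axis=0)
    return labels


# Two well-separated synthetic blobs of 10 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (10, 2)),
               rng.normal(0.0, 0.3, (10, 2)) + 10.0])
labels = kmeans(spectral_embedding(X, n_clusters=2), n_clusters=2)
print(labels)
```

Because the embedding is fixed before Kmeans runs, the clustering step cannot improve the embedding, which is the decoupling the abstract describes.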

11:00-13:00, Paper WeBPSR.5
Multi-Volumetric Feature-Based Brain Age Prediction Using sMRI and Graph Neural Networks

Kumar, Suraj	Indian Institute of Technology Guwahati
Gupta, Navin	Indian Institute of Technology Guwahati, Assam, India
Keywords: Image Processing and Pattern Recognition, Deep Learning, Machine Learning Abstract: There are noticeable studies where Graph Neural Networks (GNNs) have been utilized effectively in the field of neuroimaging, mostly on functional magnetic resonance imaging (fMRI), and their implementation on structural MRI (sMRI) has not been investigated much. GNNs perform better than traditional deep learning methods in capturing the intricate brain's anatomical attributes and the relationships between the brain regions of interest (ROIs). This is because of GNN's capacity to efficiently capture intricate information by leveraging the structural information of the underlying graph. Additionally, GNNs have relatively fewer message-passing stages, which makes them computationally efficient. This study presents a GNN-based framework for brain age prediction at the ROI level by utilizing volumetric information of grey matter (GM) and white matter (WM). The dataset comprises 232 T1-weighted sMRI scans (98 males, 134 females) of healthy controls obtained from ADNI. The brain was segmented into 56 ROIs using the LPBA40 brain atlas in CAT12. Subsequently, GM and WM volumes were extracted from these ROIs, and anatomical graphs were constructed based on GM volume information. These graphs served as inputs for different GNN models, including GCN, GraphSage, GAT, and GIN layers-based models. These models have been trained to predict brain age, and the model's effectiveness evaluation demonstrates that the GCN-based GNN model outperforms other models, with a Mean Absolute Error (MAE) value of 5.06 and Pearson's correlation coefficient (PCC) of 0.76.

11:00-13:00, Paper WeBPSR.6
A No Reference Deep Quality Assessment Index for 3D Colored Meshes

Ibork, Zaineb	Ibn Tofail University
Nouri, Anass	Ibn Tofail University
Lezoray, Olivier	University of Caen Normandy
Charrier, Christophe	University of Caen Normandie
Raja, Touahni	Ibn Tofail University
Keywords: Image Processing and Pattern Recognition, Neural Networks and their Applications, Deep Learning Abstract: The advent of 3D data has revolutionized various industries, from architecture and engineering to healthcare and entertainment, enabling more precise simulations and realistic visualizations. However, 3D data is susceptible to noise and loss during generation and transmission, making quality assessment crucial for ensuring accuracy and usability. While existing literature addresses quality assessment for 3D point clouds and meshes separately, a gap exists in assessing the quality of 3D colored meshes due to the lack of reference datasets. This paper proposes an approach for No Reference 3D Colored Mesh Visual Quality Assessment (CMVQA), building on previous work in 3D uncolored meshes quality assessment. Our approach combines geometric and color features with spatial domain features extracted from mesh projections. Through extensive experiments and comparison with full-reference metrics, including image quality metrics, our proposed approach demonstrates superior performance.

11:00-13:00, Paper WeBPSR.7
3D Cellular Segmentation of Live Embryos Via Topologically and Biologically Boundary-Aware Semi-Supervised Learning

Li, Zelin	City University of Hong Kong
Huang, Zhaoke	City University of Hong Kong
Yan, Hong	City University of Hong Kong
Keywords: Image Processing and Pattern Recognition, Deep Learning, Computational Life Science Abstract: 3D cellular segmentation of fluorescence images of live embryos is a fundamental step in the analysis of embryonic developmental processes. However, existing fully CNN-based supervised learning methods for this task often suffer from non-robust biological constraints and insufficient training data. Previous work on cellular segmentation did not consider topological information and biological constraints. In this paper, we propose a novel semi-supervised method using topological loss and biologically boundary-aware synthetic (labeled and unlabeled mixed) ground truth for 3D cellular segmentation of live embryos. The topological loss function guides the model in extracting features in a latent space. Semi-supervised learning and synthetic biologically boundary-aware datasets improve the accuracy of inner and outer membrane recognition on a large number of unlabeled images. Experimental results and evaluation on 1472 live embryo images show that our method outperforms existing deep-learning models. This method can be adapted to images of live embryos of other animals.

11:00-13:00, Paper WeBPSR.8
Neural Operator-Based Framework for Time Efficient Denoising of Displacement Fields in Ultrasound Elastography

Zhu, Yihong	Southwest Petroleum University
Peng, Bo	Southwest Petroleum University
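Keywords: Image Processing and Pattern Recognition, Deep Learning, Neural Networks and their Applications Abstract: In ultrasound elastography, noise in the measured displacement field has long been a key factor affecting the quality of strain or elasticity distribution reconstruction. Existing denoising algorithms based on partial differential equations (PDEs) can remove noise effectively to some extent, but they are limited by slow solution speed. To address this challenge, we introduce a neural operator-based framework for denoising the displacement fields measured in ultrasound elastography. By using a neural operator to learn a general solution operator of the denoising equation, the framework aims to achieve fast and accurate denoising of displacement fields. Experiments on simulated data and tissue-mimicking phantom data show that the method performs comparably to traditional methods across various metrics. In particular, the proposed method is approximately 30 to 577 times faster than the finite difference method (FDM) across various data dimensions, making it an efficient solution for fast and scalable data processing in ultrasound elastography.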

11:00-13:00, Paper WeBPSR.9
PSAM: Prompt-Based Segment Anything Model Adaption for Medical Image Segmentation

Wen, Chengyi	Tongji University
He, Lianghua	Tongji University
Keywords: Image Processing and Pattern Recognition, Deep Learning, Application of Artificial Intelligence Abstract: In the current landscape where large models are increasingly becoming the norm for task solving, maximizing the utilization of these models has emerged as a focal point of research. The Segment Anything Model (SAM), an eminent large-scale image segmentation model using a new task, model, and dataset, has garnered recognition for its efficacy across different scenarios. However, the effectiveness of SAM is hindered in the medical domain due to the scarcity of available medical images, leading to suboptimal training and inadequate adaptation of its feature extractor to medical imagery. In this work, we propose PSAM, which builds upon SAM to explore a new research paradigm of customizing large-scale models to meet the demands of medical image segmentation tasks. This is achieved through a two-fold strategy: Firstly, we incorporate parallel feature extraction branches into SAM, guided by task-specific prompts derived from CLIP, enhancing its ability to extract relevant features. Secondly, we introduce an enhanced, visually task-friendly adapter mechanism, which effectively injects medical knowledge into SAM’s ViT image encoder for facilitating adaptive task execution in medical image scenarios. Our experimental findings demonstrate the effectiveness of PSAM in accurately segmenting medical images, underscoring its potential as a valuable tool in the medical imaging domain.

11:00-13:00, Paper WeBPSR.10
Vision Transformer Based Hash Coding for Efficient Image and Audio Retrieval with Global and Local Equilibrium Distance Constraints

Liu, Ye	Sun Yat-Sen University
Pan, Yan	Sun Yat-Sen University
Yin, Jian	Sun Yat-Sen University
Keywords: Image Processing and Pattern Recognition, Deep Learning, Multimedia Computation Abstract: With the application of efficient retrieval in information systems and retrieval augmented generation with vector database for large language models, hash coding algorithms have made progress in recent years. The rise of transformer technology in the field of deep learning has brought the possibility to further improve the effect of hash coding algorithms. We introduce the vision transformer framework to both images and audios, and propose a novel approach for the tasks of multi-label image retrieval and audio event retrieval. In the proposed hash coding model, global and local equilibrium distance constraints are integrated, so that the hash codes for images can be better obtained through the global hash centers and local similar samples. In order to realize end-to-end training and hash code generation for audios, we adopt the adapter of mel spectrogram, thus the proposed approach can be simply converted and applied to audio hash coding. Comparative experiments verify that better results can be achieved on multiple image and audio datasets.

11:00-13:00, Paper WeBPSR.11
Discriminative Visual-Semantic Collaborative Online Decomposition Hashing for Streaming Image Data Retrieval

Jing, Chen	Beijing University of Posts and Telecommunications
Zu, Yunxiao	Beijing University of Posts and Telecommunications
Weihai, Li	Beijing University of Posts and Telecommunications
Sang, Xinzhu	Beijing University of Posts and Telecommunications
Liu, Meiru	Beijing University of Posts and Telecommunications
Xu, Mengying	Beijing University of Posts and Telecommunications
Keywords: Image Processing and Pattern Recognition, Media Computing Abstract: Online hashing has gained significant interest for its tremendous promise in handling large-scale streaming image data. However, there are still several problems to be solved. Firstly, many existing methods struggle to fully utilize previous knowledge and fail to effectively mitigate catastrophic forgetting. Secondly, current online hashing methods lack targeted guidance for hash function learning, resulting in weak discriminative ability. Thirdly, many current methods adopt ineffective optimization methods to learn hash codes. This paper proposes a Discriminative Visual-semantic Collaborative Online Decomposition Hashing method, abbreviated as DVsCODH. It includes two steps: hash code learning and hash function learning. In the first step, DVsCODH adopts an online pairwise similarity supervision to guide hash code learning, and simultaneously introduces a visual-semantic collaborative online decomposition strategy to adaptively fuse visual and semantic information. Through incremental online learning, DVsCODH is capable of extracting complete similarity, visual and high-level semantic information of the database. In the second step, DVsCODH leverages database-wide similarity to supervise hash function online learning. Furthermore, we propose an effective online discrete optimization algorithm that directly generates hash codes, thereby enhancing model training efficiency. Experimental results on three public image datasets demonstrate the excellent retrieval performance of DVsCODH.

11:00-13:00, Paper WeBPSR.12
Can Students Understand AI Decisions Based on Variables Extracted Via AutoML?

Tang, Liang	University of Illinois at Urbana Champaign
Bosch, Nigel	University of Illinois at Urbana Champaign
Keywords: Human Factors, Human-centered Learning, Human-Computer Interaction Abstract: In computer-based education, understanding student data is essential for students, teachers, researchers, and others to adapt to insights gained from analyses (e.g., AI predictions of student outcomes). However, one important question is: how well can students make sense of the data we present? And what factors influence the interpretability of those data? This study assessed students’ perceptions of predictive variables (i.e., “features”) used in machine learning models for predicting student outcomes; in particular, we explored features crafted by experts versus those extracted by methods for automatic machine learning (i.e., AutoML). Our results indicated a meaningful difference in students’ interpretability perceptions between the expert and AutoML features across two diverse datasets; additionally, features derived from timing and scoring data were more interpretable than those from interaction (e.g., keystroke) data. Other potential explanations for interpretability differences, including statistical methods, repeated exposure, and lexical familiarity, had relatively minimal impact on interpretability.

11:00-13:00, Paper WeBPSR.13
A Digital Twin-Based Distributed Method for the SOC Estimation of Li-Ion Battery Pack

Li, Heng	Central South University
Zhuo, Shilong	Central South University
Zhang, Yulin	Central South University
Peng, Hui	Central South University
Keywords: Digital Twin Abstract: In the current era, a Li-ion battery pack, typically comprised of multiple cells, can offer higher voltage and output power. This plays a crucial role in various applications, including electric vehicles and energy storage. Accurately estimating the battery pack's state of charge (SOC) is crucial to offer users a clearer understanding of the battery status and to alleviate range anxiety. In the industry, it's common practice to precisely estimate the SOC for each cell, enabling an accurate assessment of the battery pack's overall SOC. However, most current methods for estimating the SOC in battery packs are centralized. In such cases, a problem with estimating the SOC of a single cell can greatly impact the overall SOC estimation of the entire battery pack. Likewise, if centralized equipment encounters issues, the SOC estimation for the entire battery pack is likely to be interrupted. This paper presents a distributed method for estimating battery pack SOC, utilizing a digital twin-based simulation platform. Each node that measures the SOC of a cell is regarded as an agent that can communicate. Through communication among agents, each agent can converge to a reliable battery pack SOC estimation. In the event of a sudden issue arising in the SOC estimation of a cell, the proposed method can still uphold a dependable estimate of the battery pack's SOC, thereby bolstering the overall robustness of the SOC estimation system for the entire battery pack.
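
The idea of agents converging to a shared estimate through neighbor communication can be illustrated with a standard average-consensus iteration. This is only a sketch under stated assumptions, not the method of the paper: it assumes the pack SOC is taken as the mean of the cell SOCs, the agents communicate over a fixed undirected ring, and no fault handling is included. All names and values are hypothetical.

```python
def consensus_step(values, neighbors, step):
    """One synchronous update: each agent moves toward its neighbors' values."""
    return [
        v + step * sum(values[j] - v for j in neighbors[i])
        for i, v in enumerate(values)
    ]

def run_consensus(values, neighbors, step=0.3, iterations=200):
    """Repeat the update; on a connected undirected graph with
    step < 1 / (max degree), every agent approaches the initial average."""
    for _ in range(iterations):
        values = consensus_step(values, neighbors, step)
    return values

# Hypothetical per-cell SOC estimates held by four agents (fractions of full charge).
cell_soc = [0.80, 0.76, 0.82, 0.78]

# Hypothetical ring topology: each agent talks only to its two neighbors.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

estimates = run_consensus(cell_soc, ring)
print(estimates)
```

No agent ever sees all four cell values, yet each ends up holding approximately the same pack-level number, which is the property a distributed scheme relies on when there is no central estimator.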

11:00-13:00, Paper WeBPSR.14
Graph-Based Ensemble Learning for Enhanced Fault Localization in Microservices

Chen, Ruibo	Beihang University
Peng, Fang	Big Data Center, State Grid Corporation of China
Xin, Ji	Big Data Center of State Grid Corporation of Chin
Nan, Xiang	State Grid Nanjing Power Supply Company
Kui, Zhang	Beihang University
Lou, Yihua	TravelSky Technology Limited
Pu, Yanjun	Zhongguancun Laboratory
Wu, Wenjun	Beihang University
Keywords: Fault Monitoring and Diagnosis, Large-Scale System of Systems, Service Systems and Organizations Abstract: As microservices architectures become increasingly prevalent, they introduce significant operational challenges due to the complexities in service interactions and fault propagation. These architectures often conceal the origins of faults due to intricate inter-service communications, making fault localization both critical and challenging. Addressing these difficulties, this paper introduces a novel fault localization method that leverages synergies between domain prior knowledge, ensemble learning, and graph-based modeling. Our approach models microservices as a graph, with services as nodes and their interactions as edges, illuminating complex dependencies and enhancing the depth of data analysis. The method integrates expert knowledge with a unique blend of multi-class decision trees and strategy models derived from a knowledge base, enabling effective detection of diverse patterns and anomalies. Additionally, a meta-learner refines the outputs from base models using a weighted decision-making process, significantly improving the accuracy and robustness of fault detection. Compared to traditional models, including graph neural networks, our approach substantially reduces model complexity and enhances adaptability to evolving service patterns. It demonstrates superior scalability and real-time processing capabilities, offering a robust solution to the challenges of fault localization in dynamic microservice environments.

11:00-13:00, Paper WeBPSR.15
Power Asymmetry in Basic Hierarchical Graph Model for Conflict Resolution

Xie, Hui	National University of Defense Technology
Ge, Bingfeng	National University of Defense Technology
Huang, Yuming	National University of Defense Technology
Liu, Zihui	National University of Defense Technology
Hou, Zeqiang	National University of Defense Technology
Wei, Wanying	National University of Defense Technology
Keywords: Conflict Resolution, Decision Support Systems, Cooperative Systems and Control Abstract: The hierarchical graph model for conflict resolution (HGMCR) serves as a powerful tool for analyzing multiple interrelated conflicts, in which decision makers (DMs) at different levels are brought together and assumed to have symmetrical power. In some hierarchical conflicts, however, the DMs may vary in power and influence the ultimate conflict resolution to different extents. Accordingly, this paper aims to introduce power asymmetry into the basic HGMCR (BHGMCR) to resolve more complex hierarchical conflict problems. Initially, power dynamics are defined to capture the preference relations of DMs under power asymmetry. Then, matrix-based BHGMCR modeling under power asymmetry is presented, followed by the expansion of four classic stability definitions. Finally, a case study on carbon emission conflict is used to demonstrate that the proposed approach can handle real-world hierarchical conflicts. The intervention of government power can promote carbon reform from a global perspective.

11:00-13:00, Paper WeBPSR.16
Optimization of Post-Disaster Road Network Repair Strategy Considering Road Recovery Level

Sun, Zhiyuan	Chang'an University, Xi'an, China
Mu, Chen	Chang'an University, Xi'an, China
Liu, Shumei	Chang'an University
Wang, Jiapei	Chang'an University, Xi'an, China
Zou, Yuyang	Chang'an University, Xi'an, China
Keywords: Decision Support Systems, System Modeling and Control Abstract: Existing studies on post-disaster road network repair strategies have ignored the impact of different levels of road damage and recovery on the efficiency of network repair. To solve this issue, this study integrates the construction material distribution (CMD) with the repair crew scheduling and routing problem (RCSRP), and determines the level of road recovery through the CMD. Then, a bi-level optimization model is proposed with network performance resilience and recovery speed resilience as the optimization objectives. A two-stage optimization algorithm (TSOA) composed of a genetic algorithm with an improved coding method (ICM-GA) and the Frank-Wolfe algorithm (FW) is then employed to solve this model. Finally, the effectiveness of the model and algorithm is validated through simulation experiments. The results indicate that, under given material and time constraints, the optimal repair strategy proposed in this study outperforms the repair strategy without considering road recovery level by 17.51% and 5.42% in terms of network performance resilience and recovery speed resilience, respectively. This demonstrates the positive significance of considering road recovery level in formulating road network repair strategies. Besides, this strategy can be applied to optimize the configuration of workstation count for different-scale networks.

11:00-13:00, Paper WeBPSR.17
HemSeg-200: A Voxel-Annotated Dataset for Intracerebral Hemorrhages Segmentation in Brain CT Scans (I)

Changwei, Song	Beijing University of Technology
Zhao, Qing	Beijing University of Technology
Li, Jianqiang	Beijing University of Technology
Yue, Xin	Beijing University of Technology
Gao, Ruoyun	Beijing University of Technology
Wang, Zhaoxuan	Beijing University of Technology
Gao, An	Tianjin Medical University Cancer Institute and Hospital
Fu, Guanghui	Sorbonne University
Keywords: Medical Informatics Abstract: Acute intracerebral hemorrhage is a life-threatening condition that demands immediate medical intervention. Intraparenchymal hemorrhage (IPH) and intraventricular hemorrhage (IVH) are critical subtypes of this condition. Clinically, when such hemorrhages are suspected, immediate CT scanning is essential to assess the extent of the bleeding and to facilitate the formulation of a targeted treatment plan. While current research in deep learning has largely focused on qualitative analyses, such as identifying subtypes of cerebral hemorrhages, there remains a significant gap in quantitative analysis crucial for enhancing clinical treatments. Addressing this gap, our paper introduces a dataset comprising 222 CT annotations, sourced from the RSNA 2019 Brain CT Hemorrhage Challenge and meticulously annotated at the voxel level for precise IPH and IVH segmentation. This dataset was utilized to train and evaluate seven advanced medical image segmentation algorithms, with the goal of refining the accuracy of segmentation for these hemorrhages. Our findings demonstrate that this dataset not only furthers the development of sophisticated segmentation algorithms but also substantially aids scientific research and clinical practice by improving the diagnosis and management of these severe hemorrhages. Our dataset and codes are available at https://github.com/songchangwei/3DCT-SD-IVH-ICH

11:00-13:00, Paper WeBPSR.18
Protein-Protein Interaction Prediction Models Based on Graph Neural Networks (I)

Li, Yapeng	Chongqing University of Posts and Telecommunications
Yang, Jie	Chongqing University of Posts and Telecommunications
Chen, Yuwen	Chongqing Institute of Green and Intelligent Technology, Chinese
Keywords: Digital Twin Abstract: Protein-protein interactions (PPIs) are the foundation for numerous biological processes within cells, which are crucial for understanding cellular signaling networks, disease mechanisms, and drug development. Recently, numerous artificial intelligence (AI)-based approaches have emerged for predicting PPIs. Nevertheless, existing AI-based approaches either partially or loosely consider these relationships and mechanisms by a non-end-to-end learning framework, resulting in sub-optimal feature extractions and fusions for prediction. To address this issue, this paper proposes an end-to-end graph neural network model for protein-protein interaction prediction, termed the Knowledge Graph Fused Graph Neural Network (KGF-GNN). First, a protein-associated network (PAN) is constructed by comprehensively exploiting protein-associated relationships and mechanisms among drugs, diseases, ribonucleic acid, protein structures, etc. Then, a graph neural network (GNN) is built to extract both the topological and semantic features from the PAN. Second, the observed interactions between proteins are constructed into a PPI network, and another GNN is built to extract the hidden topological features within the PPI network. Third, a multi-layer perceptron is designed to fuse the various extracted features by end-to-end learning. With such designs, the feature extractions and fusions of PPIs are guaranteed to be comprehensive and optimal for prediction. Finally, by conducting extensive experiments on real PPI datasets, we demonstrate that our KGF-GNN can accurately predict PPIs and significantly outperform state-of-the-art models.

11:00-13:00, Paper WeBPSR.19
Diverse Transformation-Augmented Graph Tensor Convolutional Network for Dynamic Graph Representation Learning (I)

Wang, Ling	Chongqing University of Posts and Telecommunications
Huang, Yixiang	Southwest University
Hao, Wu	Southwest University
Keywords: Large-Scale System of Systems Abstract: A dynamic graph (DG) is frequently adopted to describe the evolving interactions between nodes in real-world applications such as device communication networks. Temporal patterns are the natural characteristics of a DG and are also the key to representation learning. However, most of the existing dynamic GCN models consist of static GCN and sequence modules, resulting in the separation of spatiotemporal information and the inability to effectively capture the complex temporal patterns in a DG. To solve this problem, this study proposes a Diverse Transformation-Augmented Graph Tensor Convolutional Network (DTGTCN) with three-fold ideas: a) leveraging the tensor M-product to formulate the unified graph tensor convolution network (GTCN) without separate representation of spatiotemporal information; b) introducing three transformation schemes into GTCN to model complex temporal patterns for aggregating temporal information; c) building the ensemble of diverse transformation schemes to obtain high representation capacity. Empirical studies on four DGs emerging from communication networks demonstrate that owing to diverse transformation, the proposed DTGTCN significantly outperforms state-of-the-art models in addressing the task of link weight estimation.


WeG1D	HALL C&D
Keynote 6 (Guest Speaker): The Role of Telepresence in Future Space Missions
Chairperson: Prof Ishak Aris


WeCT1	MR01
Cybernetics and Quantum Systems 5
Chair: Zhao, Huarong	Jiangnan University

15:00-15:20, Paper WeCT1.1
An Algorithmic Framework for Constructing Multiple Decision Trees by Evaluating Their Combination Performance Throughout the Construction Process

Tajima, Keito	Waseda University
Ichijo, Naoki	Waseda University
Nakahara, Yuta	Waseda University
Shimada, Koshi	Waseda University
Matsushima, Toshiyasu	Waseda University
Keywords: Machine Learning Abstract: Predictions using a combination of decision trees are known to be effective in machine learning. Typical ideas for constructing a combination of decision trees for prediction are bagging and boosting. Bagging independently constructs decision trees without evaluating their combination performance and averages them afterward. Boosting constructs decision trees sequentially, only evaluating a combination performance of a new decision tree and the fixed past decision trees at each step. Therefore, neither method directly constructs nor evaluates a combination of decision trees for the final prediction. When the final prediction is based on a combination of decision trees, it is natural to evaluate the appropriateness of the combination when constructing them. In this paper, we propose a new algorithmic framework that constructs decision trees simultaneously and evaluates their combination performance throughout the construction process. Our framework repeats two procedures. In the first procedure, we construct new candidates of combinations of decision trees to find a proper combination of decision trees. In the second procedure, we evaluate each combination performance of decision trees under some criteria and select a better combination. To confirm the performance of the proposed framework, we experiment with synthetic and benchmark data.

15:20-15:40, Paper WeCT1.2
Data-Driven Dynamic Event-Triggered Sliding-Mode Heading Control for Unmanned Surface Vehicles with Uncertainties

Zhao, Huarong	Jiangnan University
Shan, Jinjun	York University
Li, Xing	Dongguan University of Technology
Yu, Hongnian	Built Environment, Edinburgh Napier University
Keywords: Cybernetics for Informatics Abstract: This paper investigates a data-driven dynamic event-triggered sliding mode heading control problem for unmanned surface vehicles with uncertain dynamics models. First, a virtual sensor is introduced to establish a compact dynamic linearization model for the unmanned surface vehicle. Then, a dynamic event-triggered scheme is developed to alleviate the communication burden. Moreover, a sliding mode surface is designed, and a data-driven dynamic event-triggered sliding mode heading control approach is formulated. Finally, rigorous mathematical proofs are given, and several simulations demonstrate the effectiveness and superiority of the proposed method compared to existing approaches.

15:40-16:00, Paper WeCT1.3
BINN-DT: Towards Better Interpretability of Multidimensional Decision Rules Via Bivariate Nonlinear Node Decision Trees

Arai, Satoshi	Yokohama National University
Shirakawa, Shinichi	Yokohama National University
Nagao, Tomoharu	Yokohama National University
Keywords: Machine Learning, Heuristic Algorithms, Expert and Knowledge-Based Systems Abstract: In the practical application of machine learning, the opaqueness of models often poses significant challenges. While decision trees are known for balancing representability with interpretability, enabling humans to understand decision rules, their interpretability decreases as the complexity of the task increases and the tree size expands, making it difficult to trace and interpret the decision flow. In this paper, we introduce a new variant of decision tree called Bivariate Nonlinear Node Decision Tree (BINN-DT), designed to enhance the interpretability of decision trees. BINN-DT selects bivariate features at each node and utilizes nonlinear splitters to learn the data splitting rules. Additionally, each node visualizes the relationship between the data distribution and split boundaries through a two-dimensional map using the selected bivariate features. Our experiments compared the proposed BINN-DT method with traditional univariate decision trees. The results demonstrate that our approach not only maintains classification accuracy but also produces more compact models. BINN-DT clearly depicts the entire decision boundaries of a model as a tree-structured collection of two-dimensional maps with the bivariate feature axes selected from the entire features. Our method significantly improves the interpretability of models by producing more compact models than the traditional decision trees, without sacrifice of accuracy.

16:00-16:20, Paper WeCT1.4
MVOD: A Multi-View Outlier Detection Method with Single-Feature View Augmentation

Zhu, ZhaoWei	Zhejiang Sci-Tech University
Lei, Yun	Huawei Technologies Ltd. (huawei.com)
Xu, JiaWei	Central South University
Gui, Ning	Central South University
Chen, Zhu	Dalian Power Plant of China Huaneng, China
Li, Dongdong	Dalian Power Plant of China Huaneng, China
Keywords: Machine Learning Abstract: Outlier detection identifies rare items, events, or observations in data analysis and has critical applications in many fields. In most such applications, datasets are high-dimensional. To reduce the impact of the "curse of dimensionality", many such applications decompose the entire feature space into different subspaces with two or more "relevant" features for deviations of interest. Those approaches often ignore the case for subspaces with a single feature. Due to the low dimension and high data density, it is sometimes sufficient to identify univariate outliers. Thus, this paper proposes a multi-view outlier detection algorithm, MVOD, to ensemble outlier detection from three views: single feature view, local view, and global view. More specifically, we design a general outlier score function based on the quantities of information to evaluate the strength of the data distribution structure. Then, the outlier score for each point from different views is normalized and combined to reduce representational bias under different views. Extensive experiments are carried out on ten public benchmark datasets with ten state-of-the-art baselines. Experimental results show that MVOD is significantly better than those baselines in terms of AUC_ROC.

16:20-16:40, Paper WeCT1.5
Multiobjective Quantum-Inspired Tabu Search for Trend Ratio-Based Portfolio Optimization

Kuo, Shu-Yu	National Taiwan University
Tong, Yong Feng	National Chi Nan University
Shen, Jyun-Yi	National Chi Nan University
Young, Alvin	National Chi Nan University
Jiang, Yu-Chi	Princeton University
Lai, Yun-Ting	National Chi Nan University
Chang, Ming-Ho	National Chi Nan University
Chou, Yao-Hsin	National Chi Nan University
Keywords: Quantum Cybernetics, Metaheuristic Algorithms, Soft Computing, Socio-Economic Cybernetics Abstract: Quantum-inspired optimization (QIO) has garnered attention for attempting to retain quantum benefits on classical computers, thereby improving search efficiency in solving complex optimization problems. Portfolio optimization is one of the complicated real-world applications that concerns conflicting objectives of profit and risk simultaneously, making it a bi-objective problem. This study exploits the advantage of the QIO to propose a multi-objective quantum-inspired tabu search algorithm (MoQTS) for constructing the Pareto front (PF) for portfolio optimization based on the innovative bi-objective trend ratio model. MoQTS initially employs the superposition encoding mechanism and Q-gate to search for potential areas quickly while maintaining the memory of PF information. Then, the entanglement move expands the search direction along with the current PF with more diversity. This study provides exhaustive search results to examine the completeness of optimal solutions in PF. The experimental results demonstrate that MoQTS exhibits competitive performance compared to classical methods across various metrics, including inverted generational distance (IGD), hypervolume (HV), and others. MoQTS shows significant potential in generating the PF using fewer computational resources.
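
As background for the Pareto front (PF) mentioned in this abstract, the sketch below filters a set of candidate portfolios down to the non-dominated ones when profit is maximized and risk is minimized. It is a plain exhaustive filter, not MoQTS and not the trend ratio model; the `(profit, risk)` tuples are hypothetical values used only for illustration.

```python
def dominates(a, b):
    """Return True if portfolio a dominates b.

    Each portfolio is a (profit, risk) pair: profit is maximized and risk is
    minimized. a dominates b when it is no worse on both objectives and
    strictly better on at least one.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better

def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical candidate portfolios as (profit, risk) pairs.
portfolios = [
    (0.10, 0.20),
    (0.08, 0.10),
    (0.07, 0.15),  # dominated by (0.08, 0.10)
    (0.12, 0.30),
    (0.10, 0.25),  # dominated by (0.10, 0.20)
]

front = pareto_front(portfolios)
print(front)
```

The filter compares every pair of candidates, so its cost grows quadratically with the number of portfolios, which is one reason metaheuristics are used to approximate the front on larger problems.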

16:40-17:00, Paper WeCT1.6
Synthesis of Decoherence-Free Modes in Linear Quantum Passive Systems Via Robust Pole Placement (I)

Miao, Zibo	Harbin Institute of Technology
Pan, Yu	Zhejiang University
Gao, Qing	Beihang University
Keywords: Quantum Cybernetics Abstract: In this paper we extend our previous research on the coherent observer-based pole placement approach to study the synthesis of robust decoherence-free (DF) modes for linear quantum passive systems, which is aimed at the preservation of quantum information. In particular, DF modes can be generated by placing the poles on the imaginary axis via a coherent feedback design scheme, and these modes can further be simultaneously made robust against perturbations to the system parameters by minimizing the condition number associated with imaginary poles. We develop explicit algebraic conditions for the existence of such a coherent quantum controller, with the corresponding design procedure provided. Examples are given to illustrate the process of tuning the DF modes towards perfect robustness via the proposed pole placement technique.


WeCT2	MR02
Complex and Cooperative Systems 1
Chair: Wu, Hao	Dongguan University of Technology

15:00-15:20, Paper WeCT2.1
Unbridled Icarus: A Survey of the Potential Perils of Image Inputs in Multimodal Large Language Model Security

Fan, Yihe	TongJi University
Cao, Yuxin	National University of Singapore
Zhao, Ziyu	Beijing University of Technology
Liu, Ziyao	Nanyang Technological University
Li, Shaofeng	Peng Cheng Laboratory
Keywords: Machine Learning, Deep Learning, AI and Applications Abstract: Multimodal Large Language Models (MLLMs) demonstrate remarkable capabilities that increasingly influence various aspects of our daily lives, constantly defining the new boundary of Artificial General Intelligence (AGI). Image modalities, enriched with profound semantic information and a more continuous mathematical nature compared to other modalities, greatly enhance the functionalities of MLLMs when integrated. However, this integration serves as a double-edged sword, providing attackers with expansive vulnerabilities to exploit for highly covert and harmful attacks. The pursuit of reliable AI systems like powerful MLLMs has emerged as a pivotal area of contemporary research. In this paper, we endeavor to demonstrate the multifaceted risks associated with the incorporation of image modalities into MLLMs. Initially, we delineate the foundational components and training processes of MLLMs. Subsequently, we construct a threat model, outlining the security vulnerabilities intrinsic to MLLMs. Moreover, we analyze and summarize existing scholarly discourses on MLLMs' attack and defense mechanisms, culminating in suggestions for future research on MLLM security. Through this comprehensive analysis, we aim to deepen the academic understanding of MLLM security challenges and propel forward the development of trustworthy MLLM systems.

15:20-15:40, Paper WeCT2.2
Influence Distribution for Misinformation Containment under Competitive Activation Models

Gu, Ming	South China University of Technology
Chen, Wei-Neng	South China University of Technology
Hu, Xiao-Min	Guangdong University of Technology
Jeon, Sang-Woon	Hanyang University
Keywords: Heuristic Algorithms, Complex Network Abstract: The widespread adoption of social networks facilitates the dissemination of authentic information while also accelerating the spread of misinformation, such as rumors. The propagation of positive information can enhance user awareness and mitigate the hazards of misinformation. The misinformation containment (MC) problem aims to identify a set of k nodes that initiate the spread of positive information, maximizing its influence while minimizing the hazards of misinformation. The greedy approach, which employs extensive Monte Carlo simulations to estimate influence, is time-consuming and can only prioritize either propagation or containment, but not both. This paper studies the MC problem under competitive activation models. Based on geometric models of probability, we calculate the approximate probabilities of nodes being activated by positive information and misinformation at various times. Taking into account the two-hop theory, we propose a consistent and efficient computational method to assess node influence distribution from the perspectives of propagation and containment. This method strikes a balance between propagation and containment, surpassing degree centrality, further informing a heuristic solution to the MC problem. The heuristic solution’s overall performance surpasses that of greedy approaches, which can only prioritize one aspect. Experiments on real-world networks demonstrate that our approach effectively balances the propagation of positive information and misinformation containment with low time complexity.

15:40-16:00, Paper WeCT2.3
Dual-GTF: Enhancing Community Detection with Dual-Scale Game Theoretic Framework

Shi, Haodong	University of Electronic Science and Technology of China
Kang, Zhao	University of Electronic Science and Technology of China
Yan, Ke	University of Electronic Science and Technology of China
Liu, Sicong	University of Electronic Science and Technology of China
Xin, Yichen	University of Electronic Science and Technology of China
Keywords: Complex Network, Agent-Based Modeling, Computational Intelligence in Information Abstract: Community detection is a crucial task in complex network analysis, and existing game-theoretic community detection methods struggle to achieve a balance between local and global competition and cooperation. This paper introduces a Dual-scale Game Theoretic Framework (Dual-GTF) for community detection. First, a new modularity function is proposed, enabling each node to incorporate additional expansion information at the community scale by collaborating closely with the individual scale modularity function. Subsequently, leveraging cooperative and competitive strategies informed by realistic empiricism, this paper integrates community-scale information to facilitate effective role division within communities. Evaluations on real-world datasets demonstrate that this approach, Dual-GTF, achieves superior convergence and performance results in most cases.

16:00-16:20, Paper WeCT2.4
SEMDR: A Semantic-Aware Dual Encoder Model for Legal Judgment Prediction with Legal Clue Tracing

Liu, Pengjie	Southern University of Science and Technology
Zhang, Wang	Southern University of Science and Technology
Ding, Yulong	Southern University of Science and Technology
Zhang, Xuefeng	Northeastern University
Yang, Shuang-Hua	Southern University of Science and Technology
Keywords: Application of Artificial Intelligence, Representation Learning, AI and Applications Abstract: Legal Judgment Prediction (LJP) aims to form legal judgments based on the criminal fact description. However, researchers struggle to classify confusing criminal cases, such as robbery and theft, which requires LJP models to distinguish the nuances between similar crimes. Existing methods usually design handcrafted features to pick up necessary semantic legal clues to make more accurate legal judgment predictions. In this paper, we propose a Semantic-Aware Dual Encoder Model (SEMDR), which designs a novel legal clue tracing mechanism to conduct fine-grained semantic reasoning between criminal facts and instruments. Our legal clue tracing mechanism is built from three reasoning levels: 1) Lexicon-Tracing, which aims to extract criminal facts from criminal descriptions; 2) Sentence Representation Learning, which contrastively trains language models to better represent confusing criminal facts; 3) Multi-Fact Reasoning, which builds a reasons graph to propagate semantic clues among fact nodes to capture the subtle differences among criminal facts. Our legal clue tracing mechanism helps SEMDR achieve state-of-the-art performance on the CAIL2018 dataset and shows its advantage in few-shot scenarios. Our experiments show that SEMDR has a strong ability to learn more uniform and distinguished representations for criminal facts, which helps to make more accurate predictions on confusing criminal cases and reduces the model uncertainty when making judgments. All code will be released via GitHub.

16:20-16:40, Paper WeCT2.5
Pre-Training and Fine-Tuning for Efficient Routing in Opportunistic Networks

Hao, Jia	Inner Mongolia University
Wu, XiaoRui	Inner Mongolia University
Seah, Winston	Victoria University of Wellington
Zhang, Feng	Shanxi University
Xu, Gang	Inner Mongolia University
Keywords: Complex Network, Optimization and Self-Organization Approaches, Application of Artificial Intelligence Abstract: In opportunistic networks, it is a challenge to find the best relay node instead of blindly selecting from among available nodes to forward messages and effectively transmit them to their destinations. By comparing the similarity between nodes, the next hop node that is most similar to the destination node is found in time. Existing node similarity-based opportunistic network routing algorithms only calculate similarity between nodes according to nodes’ properties that can be directly obtained from the network, such as the historical meeting record between nodes. However, this superficial similarity calculation is inadequate to describe the inherent dynamic nature of opportunistic networks at both spatial and temporal levels, and ignores the salient movement characteristics of nodes, resulting in poor routing performance. Therefore, this paper proposes an opportunistic network routing strategy based on a pre-training and fine-tuning (PTFT) model. Firstly, an autoencoder is added to the graph neural network model to encode node movement behavior. Then, a graph neural network model pre-trained in large-scale opportunistic network scenarios is applied to learn potential features of nodes through fine-tuning. Finally, we calculate the similarity between nodes based on the node potential features, thereby assisting the nodes to achieve an efficient routing decision. The simulation results show that our PTFT-based algorithm not only has superior performance compared to traditional routing algorithms, but also learns faster than other machine learning-based routing algorithms.

16:40-17:00, Paper WeCT2.6
Link Prediction for Dynamic Weighted Graph Via Adaptive Nonnegative Tensor CP Decomposition (I)

Wu, Hao	Dongguan University of Technology
Li, Weiling	Dongguan University of Technology
Keywords: Complex Network, Evolutionary Computation, Knowledge Acquisition Abstract: A dynamic weighted graph is frequently encountered in real-world industrial applications like the Internet of Things, and can be modeled as a Third-order Incomplete (ToI) tensor. Correspondingly, each element of the ToI tensor represents an observed link of the dynamic weighted graph. A nonnegative tensor CP decomposition (NTC)-based link prediction model has proven to be efficient in predicting the missing links of a dynamic weighted graph. However, the learning objective of existing NTC models is usually built via a standard Euclidean distance, which restricts model prediction ability due to its low generalization. To address this issue, this paper presents an Adaptive Nonnegative Tensor CP decomposition (ANTC) model with two ideas: a) adopting the β-divergence to build the learning objective for improving the generalization of the model; and b) implementing hyper-parameter self-adaptation via a differential evolutionary algorithm. Empirical studies on four dynamic weighted graphs generated by a real application illustrate that the proposed ANTC model achieves higher prediction accuracy and computational efficiency than state-of-the-art predictors in predicting the missing links.
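
Editorial illustration (not from the paper): the abstract above replaces the Euclidean objective with a β-divergence. The sketch below shows the standard textbook element-wise β-divergence and a sum over observed tensor entries. The function names, the dictionary layout keyed by (i, j, k), and the restriction to strictly positive values are assumptions made here for illustration; the authors' actual objective and update rules may differ.

```python
import math


def beta_divergence(x, y, beta):
    """Standard element-wise beta-divergence d_beta(x | y) for positive scalars.

    Assumption: x and y are strictly positive so every branch is defined.
    """
    if x <= 0 or y <= 0:
        raise ValueError("x and y must be positive")
    if beta == 1:
        # Limit beta -> 1: generalized Kullback-Leibler divergence.
        return x * math.log(x / y) - x + y
    if beta == 0:
        # Limit beta -> 0: Itakura-Saito divergence.
        return x / y - math.log(x / y) - 1
    # General case; beta = 2 gives half the squared Euclidean distance.
    num = x**beta + (beta - 1) * y**beta - beta * x * y ** (beta - 1)
    return num / (beta * (beta - 1))


def objective(observed, predicted, beta):
    """Sum the divergence over observed entries only.

    `observed` and `predicted` are hypothetical dicts keyed by (i, j, k).
    """
    return sum(beta_divergence(v, predicted[idx], beta) for idx, v in observed.items())
```

With β = 2 the divergence reduces to (x - y)²/2, so the Euclidean objective mentioned in the abstract is recovered as a special case of this family.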


WeCT3	MR03
Assistive and Companion Technology 1
Chair: Bautz, Julian	University of the Bundeswehr Munich

15:00-15:20, Paper WeCT3.1
Immediate After-Effects of Maintaining Light Touch Using Electrical Muscle Stimulation to Index Finger Extensor Muscles on Postural Control

Shindo, Masato	NTT Corporation
Aoki, Ryosuke	NTT Corporation
Keywords: Assistive Technology, Haptic Systems, Human-Computer Interaction Abstract: In postural control, the light touch effect stabilizes posture by adding sensory cues when a part of the body, such as the fingertips, lightly touches a support surface. However, maintaining light touch can be difficult depending on individual characteristics and task difficulty. To address this challenge, we previously proposed a system that utilizes electrical muscle stimulation (EMS) on the index finger extensor muscle to maintain light touch. The present study aims to investigate the immediate after-effects of sustained light touch through the EMS-based system on postural control. Participants performed an eyes-closed single-leg stance without touch in the pre- and post-tests and performed the same task with their index finger lightly touching a handrail between the two tests (LT test). During the LT test, the EMS group (n=10) lightly touched the handrail while EMS was administered, and the control group (n=10) performed the light touch without EMS. The postural sway parameters and electromyography were compared within each group between pre- and post-tests. The results showed a significant difference in the center of pressure sway in the EMS group, indicating decreased postural sway immediately after the LT test. This suggests that the sustained light touch with EMS enabled users to perceive more detailed sensory cues and facilitated the subsequent retention of postural control. Moreover, the postural stability was strongly associated with pelvic stability.

15:20-15:40, Paper WeCT3.2
Posture Prediction in Response to Seat Angle Change with Smart Chair

Esumi, Tsubasa	Kyushu Institute of Technology
Takemura, Noriko	Kyushu Institute of Technology
Keywords: Human-Computer Interaction, Assistive Technology, Human-Machine Interface Abstract: Long-term desk work with poor posture causes physical disabilities. To solve this problem, we developed a smart chair with a controllable seat angle in our previous research. In order to control the seat angle and guide the user to the correct posture using this smart chair, it is necessary to investigate how the posture changes in response to the seat angle change. In this study, we first constructed a dataset by capturing posture data from 30 subjects while changing the seat angle of the smart chair. We then used this dataset to train a model based on a graph neural network to predict posture in response to seat angle changes. In the evaluation experiment, we compared the model's prediction performance depending on the training data and the body part.

15:40-16:00, Paper WeCT3.3
Sign Language Recognition and Translation Methods Promote Sign Language Education: A Review (I)

Zou, Jingchen	Beijing University of Technology
Li, Jianqiang	Beijing University of Technology
Tang, Jing	Beijing University of Technology
Huang, Yuning	Beijing University of Technology
Ding, Shujie	Beijing University of Technology
Xu, Xi	Beijing University of Technology
Keywords: Visual Analytics/Communication, Assistive Technology Abstract: Sign language recognition and translation (SLRT) aims to convert sign language into textual representation, which holds significant importance for the deaf community. Sign language possesses complex and diverse grammatical structures, with each sign language having distinct motion trajectories and gesture variations, making SLRT a complex research domain. In recent years, numerous researchers have proposed different modeling approaches, achieving significant advancements through the utilization of large language models. In this survey, we systematically review the developmental trajectory of SLRT, encompassing an introduction to key technical approaches at each stage and the latest research progress. Through a comprehensive examination of these methods, valuable insights are provided for future research and practical applications. Lastly, we identify the existing limitations of current methods and propose potential avenues for future research.

16:00-16:20, Paper WeCT3.4
Terrain Modeling for Control of Lower Limb Prostheses and Exoskeletons Using Low-Cost Wearable Sensors (I)

Yang, Yunfang	Nankai University
Han, Jianda	Nankai University
Huo, Weiguang	Nankai University
Keywords: Assistive Technology, Human-Machine Interaction Abstract: The development of powered lower-limb prostheses and exoskeletons (LLPE) for assisting individuals in activities of daily living has been gaining increasing interest in the robotic community. To assist wearers walking on various terrains in daily environments, accurate gait-mode recognition and seamless transition of control strategies are crucial for these devices. Due to the high diversity of terrains, such capabilities are usually subject to terrain conditions, making terrain detection an essential issue for LLPE control. In this paper, we propose a method for terrain detection and modeling aimed at reconstructing terrains rather than merely classifying them to provide richer information for LLPE control, including online 2D terrain information and the relative foot position. The implementation of the proposed method relies on a sensor group consisting of a low-cost single-point laser sensor and two inertial measurement units (IMUs) to simultaneously and continuously capture lower limb kinematic features and terrain features. A time-varying Kalman filter is employed to fuse these features, facilitating rapid and accurate modeling of different terrains such as level ground, stair ascent/descent, and ramp ascent/descent. The performance of the proposed method was evaluated via experiments with two healthy subjects. The results show that the reconstructed terrains can provide accurate information for LLPE control, demonstrating the effectiveness and adaptability of the proposed method.

16:20-16:40, Paper WeCT3.5
Adaptive Mission Planning: Evaluation of a Hybrid Cognitive Mixed-Initiative Planning Assistant in Manned-Unmanned Teaming Operations (I)

Maier, Siegfried	University of the Bundeswehr Munich
Kiam, Jane Jean	Universität Der Bundeswehr München
Schulte, Axel	Bundeswehr University Munich
Keywords: Human-Machine Cooperation and Systems, Assistive Technology, Cognitive Computing Abstract: This paper examines the integration of cognitive mixed-initiative assistance via a Planning Assistance Agent in the context of Manned-Unmanned Teaming operations. The aim is to enhance mission planning, replanning, and execution in complex military air operations. The agent employs a hybrid Mixed-Initiative-Planning approach to autonomously adjust and optimize mission plans in real time, with the objective of reducing pilot workload, increasing situational awareness, and maintaining high mission success rates without increasing the risk of losses of unmanned assets. The agent integrates current environmental data and tactical situation changes into its planning processes, thereby closing the so-called "cognitive loop". This is made possible by the use of sophisticated algorithms and planning problem modeling languages. The effectiveness of the Agent was evaluated through the participation of German Air Force pilots in both static and dynamic mission simulations. The dynamic simulations were conducted in a fully integrated Manned-Unmanned-Teaming fighter simulator, while the static missions required the pilots to create mission plans on a separate workstation. The simulations assessed the impact of the Agent on mission success and pilot performance under varying levels of assistance. The results demonstrated that situation-adapted assistance, which allows for dynamic and autonomous tactical adjustments by the Agent, most effectively enhances operational performance and pilot engagement without overwhelming the pilot or causing the pilot to over-rely on automated systems.

16:40-17:00, Paper WeCT3.6
Using Situational Awareness and Situative Criticality for Adaptive Planning Assistance in MUM-T Missions (I)

Bautz, Julian	University of the Bundeswehr Munich
Schwerd, Simon	University of the Bundeswehr Munich
Schulte, Axel	Bundeswehr University Munich
Keywords: Assistive Technology, Human-Machine Interaction, Human-Machine Cooperation and Systems Abstract: This study examines the effect of adaptive assistance in online mission planning for military aircraft pilots in Manned-Unmanned-Teaming (MUM-T) missions. We evaluated two adaptive approaches to select from three levels of assistance in a cockpit simulator experiment. Both approaches used the criticality of the situation to choose an assistance level, with one approach additionally integrating an eye-tracking-based measure of Situational Awareness (SA). Based on these triggers, the adaptive system used different levels of assistance for mission planning support. We compared these two adaptive conditions to a baseline setup without assistance and evaluated performance, workload, usability, and subjective SA. Performance was assessed by flight data and mission completion times, while workload, usability, and SA were assessed with standard questionnaires. Both assistance systems exceeded the non-assisted baseline in terms of mission time, workload management, SA, and usability. Pilots supported by adaptive assistance integrating both criticality and SA showed improvements in subjective SA over the criticality-only condition while demonstrating faster mission completion. The findings suggest that adaptive assistance, particularly when incorporating SA, can enhance pilot performance in MUM-T operations.


WeCT4	MR04
Resilience Engineering	Regular Papers - Cybernetics


WeCT5	MR05
Cyber-Physical Systems and Robotics 3
Chair: Wang, Hangyu	Institute of Information Engineering, Chinese Academy of Sciences

15:00-15:20, Paper WeCT5.1
Multi-User Computation Offloading in Mobile Edge Computing with Hybrid Whale Optimization

Bi, Jing	Beijing University of Technology
Li, Ning	Beijing University of Technology
Yuan, Haitao	Beihang University
Zhang, Jia	Southern Methodist University
Zhou, Mengchu	New Jersey Institute of Technology
Keywords: Cyber-physical systems, Infrastructure Systems and Services, Smart Buildings, Smart Cities and Infrastructures Abstract: With the increasing amount of data and the need for real-time processing, Mobile Edge Computing (MEC) is growing rapidly, driving the shift from traditional cloud computing to distributed edge architectures. When offloading these applications with large amounts of data on mobile devices, a lot of computing and storage resources and high energy consumption are required. Yet, mobile devices’ computing power, resource storage, and battery power are often limited and cannot meet these needs. To solve a computation offloading problem for joint optimization of time, cost, and energy, this work proposes an improved hybrid algorithm called Chaos and Lévy flights-based Whale Optimization Algorithm (CLWOA) to solve the multi-user offloading problem in an MEC-Cloud system. Each task is offloaded to local processors of mobile devices, edge servers, and cloud servers in proportion to jointly minimize the completion time, energy consumption, and total cost. Finally, compared with the whale optimization algorithm, Lévy flight whale optimization algorithm, refined whale optimization algorithm, and chaos-based whale optimization algorithm, CLWOA reduces the weighted cost by 1.89%, 0.31%, 0.19%, and 0.42%, respectively.

15:20-15:40, Paper WeCT5.2
Energy and Time-Optimized Task Scheduling with Simulated-Annealing-Based Firefly Algorithm in Hybrid Cloud Edge Computing

Bi, Jing	Beijing University of Technology
Zhou, Xinmin	Beijing University of Technology
Yuan, Haitao	Beihang University
Zhang, Jia	Southern Methodist University
Zhou, Mengchu	New Jersey Institute of Technology
Keywords: Cyber-physical systems, Smart Buildings, Smart Cities and Infrastructures, Decision Support Systems Abstract: In a cloud-edge system, data analysis, processing, and storage can be performed in edge servers, avoiding transferring data to more distant cloud servers. This greatly improves the efficiency of data processing, saves network bandwidth and cloud resources, and reduces operating and maintenance costs. However, task scheduling remains a challenge. It is difficult to schedule tasks for joint optimization of the total energy consumption and completion time of a task sequence within a limited time in a resource-constrained cloud-edge system. This work proposes an improved Simulated-Annealing-based Firefly Algorithm with Linear position update, called SAFAL for short. SAFAL incorporates a simulated annealing mechanism and an efficient position update strategy into the firefly algorithm, enabling fireflies to find the optimal solution more quickly and avoid getting trapped in local optima. SAFAL adopts a probabilistic mapping operator to map the position of each firefly to a task scheduling sequence, thus linking the firefly space and the task space. Several test instances in cloud-edge systems are designed to validate the superiority of SAFAL over the firefly algorithm, simulated annealing, and firefly algorithm with a self-adaptive strategy. Results show that the weighted cost of total energy consumption and completion time of SAFAL is reduced by 16.32%, 17.62%, and 14.21%, respectively, with 20 tasks.

15:40-16:00, Paper WeCT5.3
Unified Industrial Cyber-Physical Systems Modeling and Performance Analysis under Cyber-To-Physical Attacks

Wang, Yifan	Zhejiang University
Zhou, Chenchen	Zhejiang University
Cao, Yi	Zhejiang University
Zhang, Xuefeng	Northeastern University
Shuang-Hua, Yang	Zhejiang University
Keywords: Cyber-physical systems, System Modeling and Control, Fault Monitoring and Diagnosis Abstract: An increasing number of factories are transitioning to industrial cyber-physical systems (iCPSs), which pose the threat of cyberattacks. To simulate and analyze the dynamic behavior of iCPS under cyber attacks, a unified iCPS model that integrates cyber and physical systems with profound interactions is indispensable. On the basis of a generic CPS architecture, we systematically model cyber and physical systems as discrete state-space equations, and simulate packet loss and delay in communication using Markov chains. Models for denial-of-service (DoS) attacks and false data injection (FDI) attacks have been delineated and applied to the unified iCPS model. Ultimately, the experimental results corroborate the feasibility and efficacy of the unified iCPS model. Furthermore, recommendations and guidelines are proposed to bolster the cybersecurity of bilateral control iCPSs.

16:00-16:20, Paper WeCT5.4
MICABAC: Multidimensional Industrial Control Attribute-Based Access Control Model

Wang, Hangyu	Institute of Information Engineering, Chinese Academy of Science
Lv, Fei	Institute of Information Engineering, Chinese Academy of Science
Chen, Yuqi	ShanghaiTech University
Si, Shuaizong	Institute of Information Engineering, Chinese Academy of Science
Pan, Zhiwen	Institute of Information Engineering, Chinese Academy of Science
Sun, Degang	Computer Network Information Center, Chinese Academy of Sciences
Sun, Limin	University of Chinese Academy of Sciences, Institute of Informat
Keywords: Cyber-physical systems, Consumer and Industrial Applications, Homeland Security Abstract: As the Industrial Control System (ICS) increasingly merges with the Internet, security threats from internal users and external hackers have been increasing. These challenges are further intensified by industrial control devices and protocols, which render traditional access control models inadequate for tackling the intricacies of ICS. We identify attributes that are optimally aligned with the specific needs of the ICS environment and propose the Multidimensional Industrial Control Attribute-Based Access Control Model (MICABAC) as a customized solution. The MICABAC model significantly improves access control security and offers finer granularity by selecting and evaluating the required attributes within various ICSs. We validated MICABAC in two real-world ICS environments: the Gas Pipe Network System (GPNS) and the Computer Numerical Control (CNC) machine tool. Experiments indicate that by integrating MICABAC into the existing system, the maximum delay for access requests is 63.83 ms. In terms of accuracy in defending against malicious attacks, the GPNS achieves 96.49% and the CNC reaches 94.86%. Finally, we discuss the advantages and limitations of MICABAC and explore potential directions for future research.

16:20-16:40, Paper WeCT5.5
A Compact-Dynamic Graph Convolutional Network for Spatiotemporal Signal Recovery (I)

Gao, Pengcheng	Southwest University
Gao, Zicheng	Southwest University
Yuan, Ye	Southwest University
Keywords: Cyber-physical systems Abstract: High-quality spatiotemporal signals are vitally important for real application scenarios like energy management, traffic planning and cyber security. Due to uncontrollable factors like abrupt sensor breakdowns or communication faults, the spatiotemporal signal collected by sensors is always incomplete. A dynamic graph convolutional network (DGCN) is effective for spatiotemporal signal recovery. However, it adopts a static GCN and a sequence neural network to explore the spatial and temporal patterns separately. Such separated two-step processing couples the spatial and temporal patterns only loosely, thereby failing to capture the complex inner spatiotemporal correlation. To address this issue, this paper proposes a Compact-Dynamic Graph Convolutional Network (CDGCN) for spatiotemporal signal recovery with the following two-fold ideas: a) leveraging the tensor M-product to build a unified tensor graph convolution framework, which considers both spatial and temporal patterns simultaneously; and b) constructing a differential smoothness-based objective function to reduce the noise interference in the spatiotemporal signal, thereby further improving the recovery accuracy. Experiments on real-world spatiotemporal datasets demonstrate that the proposed CDGCN significantly outperforms the state-of-the-art models in terms of recovery accuracy.

16:40-17:00, Paper WeCT5.6
Transformer-Based Model Sea Temperature and Salinity Prediction Using Satellite Remote Sensing and Argo Data Fusion (I)

Liao, Yan-Jhen	National Taipei University
Zhan, Cheng-Han	National Taipei University
Chang, Yue-Shan	National Taipei University
Keywords: Cyber-physical systems, Distributed Intelligent Systems, Modeling of Autonomous Systems Abstract: Due to the intensification of global warming, changes in sea temperature and salinity have profound impacts on ecosystems and climate. Accurately predicting marine environmental changes has thus become an important issue. However, the spatial limitations inherent in Argo floats and satellite remote sensing data pose challenges to prediction. This study integrates data collected from Argo floats and satellite remote sensing, employing a Transformer-based model to predict sea temperature and salinity at different times, latitudes, and depths below 50 meters beneath the sea surface, aiming to supplement the shortcomings of Argo float data and satellite remote sensing data and achieve comprehensive marine environmental prediction. The results of the study demonstrate that the proposed Transformer-based model performs well in sea temperature prediction, with a mean absolute error (MAE) of 0.489℃, a root mean square error (RMSE) of 0.809℃, and a coefficient of determination (R²) of 0.983. Regarding salinity prediction, the model also exhibits excellent performance, with an MAE of 0.167 psu, an RMSE of 0.229 psu, and an R² of 0.805. Compared to other models, the model proposed in this study shows superior performance in predicting sea temperature and salinity, providing an important tool and reference for future marine resource management and climate change research.
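
Editorial illustration (not from the paper): the abstract above reports MAE, RMSE, and the coefficient of determination. The sketch below shows the standard definitions of these three metrics for two equal-length lists of numbers. The function names and the sample values in the usage note are assumptions for illustration only and are unrelated to the reported results.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average of |truth - prediction|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared residual."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: y_true is not constant, so SS_tot is non-zero.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

For example, with hypothetical values y_true = [1, 2, 3] and y_pred = [1, 2, 4], MAE is 1/3, RMSE is the square root of 1/3, and R² is 0.5. MAE and RMSE carry the unit of the predicted quantity (℃ or psu above), while R² is unitless.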


WeCT6	MR06
Infrastructure Systems and Services 3
Chair: Li, Dandan	Renmin University of China

15:00-15:20, Paper WeCT6.1
A Novel Trajectory Analysis Framework for Detecting Corruption Involving Government Vehicles

Li, Dandan	Renmin University of China
Xu, Wei	Renmin University of China
Li, Qian	Renmin University of China
Keywords: Decision Support Systems, Smart Buildings, Smart Cities and Infrastructures Abstract: Corruption involving government vehicles is a long-standing public concern that hampers government operations, economic growth, and social harmony. However, the inherent complexity and pervasive uncertainty pose significant challenges for corruption detection. With the advent of information and communication technology, GPS trajectory data has been extensively collected, providing vehicle location histories. In this study, we propose an innovative method to detect corruption by classifying vehicle trajectories and geographically mining anomalous locations using the Gaussian Mixture Model (GMM), offering valuable insights for policymakers. Initially, we introduce stay points to reject noise points and condense the trajectory into actionable spatial data. Following this, we employ temporal partitioning and a speed threshold to create a quaternion-based spatial-temporal-speed trajectory, serving as the classifier's input data. We then apply the GMM to discern patterns and anomalous trajectories within the stay point data. Based on the cluster information, we design a three-tiered (top-middle-bottom) location corruption monitoring mechanism derived from probability theory to detect corruption indicators near sensitive landmarks. In practice, we present a case study using real-world data from a border city in China. The results demonstrate the potential of our method in detecting corruption involving government vehicles, offering valuable insights for policymakers and law enforcement agencies.

15:20-15:40, Paper WeCT6.2
Mobility and Privacy-Aware Computation Offloading with Energy Harvesting in MEC-Enabled Networks

Bi, Jing	Beijing University of Technology
Niu, Siyu	Beijing University of Technology
Yuan, Haitao	Beihang University
Zhai, Jiahui	Beijing University of Technology
Zhang, Jia	Southern Methodist University
Zhou, Mengchu	New Jersey Institute of Technology
Keywords: Distributed Intelligent Systems, Smart Sensor Networks, Infrastructure Systems and Services Abstract: Many new IoT applications have emerged with the fast evolution of 5G and the Internet of Things (IoT). These applications place higher demands on network energy consumption and processing capabilities. Mobile edge computing (MEC) significantly enhances execution efficiency, while energy harvesting (EH) modules further augment the operational features of IoT devices. However, existing studies mainly concentrate on energy consumption and latency problems, often neglecting issues about user mobility and potential privacy leakage within the MEC environment. Therefore, optimizing computation offloading and resource allocation for MEC-enabled IoT networks is essential. This work proposes an innovative architecture with EH for collaborative computing between multiple mobile devices (MDs) and MEC servers. To tackle the problem, this work also proposes an advanced hybrid algorithm named Self-adaptive Bat Optimizer with Genetic operations and individual update of Grey wolf optimizer (SBG2). With SBG2, this work aims to minimize the energy consumption of MDs while providing user mobility and privacy protection. Simulation experiments show that SBG2 reduces energy consumption by 79.15%, 93.20%, and 89.58%, respectively, compared to the other three typical algorithms.

15:40-16:00, Paper WeCT6.3
Water Quality Anomaly Detection with Dual Sliding Windows and Convolutional LSTM

Bi, Jing	Beijing University of Technology
Yuan, Ming	Beijing University of Technology
Yuan, Haitao	Beihang University
Wang, Ziqi	Beijing University of Technology
Qiao, Junfei	Beijing University of Technology
Keywords: Smart Buildings, Smart Cities and Infrastructures, Decision Support Systems, Smart Sensor Networks Abstract: Water pollution is continuously increasing in water ecosystems across all continents. Surface water sensors can record data on water quality indicators at regular intervals, and the associated water quality sequences show abnormal trends when extreme weather or unusual industrial discharges occur. Therefore, governments can take timely actions to minimize damage and protect the water environment by detecting these abnormal trends promptly. However, current methods make it difficult to interpret different correlations among water quality parameters effectively. To solve this problem, this work proposes a parameter correlation-aware anomaly detection model, which integrates Dual sliding windows, Convolutional LSTM, and a Deep neural network with dropout, called DCLD for short. First, DCLD designs dual sliding windows to capture local and global patterns within the sequence of water quality. Second, DCLD adopts a stacked long short-term memory with a convolutional neural network to capture complex features and long-term dependencies in the time series. Third, DCLD uses a deep neural network incorporating the dropout algorithm to extract abstract features, which mitigates overfitting risks and enhances the model's generalization capacity. Finally, DCLD is evaluated with two real-world water quality datasets, and its anomaly detection accuracy is improved by 5.41% and 0.79% on average over its peers.
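
The dual sliding window idea mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give window sizes or the exact pairing rule, so the function name `dual_windows`, the trailing-window layout, and the sizes below are assumptions made for illustration only, not the authors' implementation.

```python
from typing import List, Tuple


def dual_windows(
    series: List[float], local_size: int, global_size: int
) -> List[Tuple[List[float], List[float]]]:
    """Pair each time step with a short (local) and a long (global) trailing window.

    Hypothetical helper: the window layout is an assumption, not taken from the paper.
    """
    if not 0 < local_size <= global_size:
        raise ValueError("expected 0 < local_size <= global_size")
    pairs = []
    # Start once a full global window is available so both windows are complete.
    for end in range(global_size, len(series) + 1):
        global_win = series[end - global_size:end]
        local_win = series[end - local_size:end]
        pairs.append((local_win, global_win))
    return pairs


# Made-up readings with one spike, only to show the shapes produced.
readings = [7.1, 7.0, 7.2, 7.1, 9.8, 7.0]
pairs = dual_windows(readings, local_size=2, global_size=4)
```

Each pair could then be fed to a downstream model so that a spike is judged against both its immediate neighbors and a longer context.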

16:00-16:20, Paper WeCT6.4
Research on Urban Metro Network Recovery Strategy Based on Resilience Theory: A Case Study of Qingdao Metro Network

Shao, Zhiguo	Qingdao University of Technology
Chen, Jie	Qingdao University of Technology
Li, Mengdi	Tongji University
Tang, Hongxia	Qingdao University of Technology
Keywords: Quality and Reliability Engineering, Infrastructure Systems and Services, Intelligent Transportation Systems Abstract: The operation of the subway system is highly susceptible to natural disasters. To ensure the safe operation of the metro, this paper applies complex network theory and resilience theory to the Qingdao metro network, taking network efficiency as the evaluation index of resilience. It puts forward four attack strategies to study the robustness of the metro network under different attacks, and then restores the damaged network to study its recovery ability. The results show that deliberate attack strategies have greater destructive power than the random attack strategy. Among them, the station-degree attack strategy has the greatest impact on the recovery capability of the metro network. Among the recovery strategies, the inter-degree centrality-based recovery strategy is most effective in scenarios 1 and 2, while the station-degree-based recovery strategy performs best in scenarios 3 and 4. This suggests the need for different recovery strategies depending on the type of attack scenario.
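
As a rough illustration of the evaluation index named in this abstract, the sketch below computes the standard global network efficiency (the mean of inverse shortest-path lengths over all ordered station pairs) and applies one degree-based removal. The toy three-station graph, the helper names, and the tie-breaking rule are assumptions; the paper's four attack strategies and its recovery procedures are not reproduced here.

```python
from collections import deque
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def network_efficiency(graph: Graph) -> float:
    """Mean of 1/d(i, j) over ordered node pairs; unreachable pairs contribute 0."""
    n = len(graph)
    if n < 2:
        return 0.0
    total = 0.0
    for source in graph:
        # Breadth-first search gives hop distances in an unweighted network.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(1.0 / d for d in dist.values() if d > 0)
    return total / (n * (n - 1))


def remove_station(graph: Graph, station: str) -> Graph:
    """Return a copy of the graph without the given station and its links."""
    return {node: nbrs - {station} for node, nbrs in graph.items() if node != station}


def degree_attack(graph: Graph) -> Graph:
    """Remove the highest-degree station (ties broken alphabetically, an assumption)."""
    target = max(sorted(graph), key=lambda node: len(graph[node]))
    return remove_station(graph, target)


# Toy line A - B - C, not the Qingdao network.
toy = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
before = network_efficiency(toy)
after = network_efficiency(degree_attack(toy))
```

On this toy line, removing the middle station disconnects the two ends, so the efficiency drops from 5/6 to 0.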

16:20-16:40, Paper WeCT6.5
Graph Attention Transformer with Dilated Causal Convolution and Laplacian Eigenvectors for Long-Term Water Quality Prediction

Bi, Jing	Beijing University of Technology
Chen, Danqing	Beijing University of Technology
Yuan, Haitao	Beihang University
Keywords: Decision Support Systems, Smart Buildings, Smart Cities and Infrastructures, Smart Sensor Networks Abstract: Water quality prediction has emerged as a prominent research problem in recent years, which entails a spatiotemporal prediction task. However, several challenges are associated with water quality prediction: 1) water quality time series exhibit complex nonlinear relationships, making prediction challenging; 2) water quality sensors are distributed across river networks, leading to strong spatial dependencies in water quality prediction; 3) current methods like traditional machine learning methods have poor accuracy in long-term water quality prediction. To solve these problems, this work proposes a spatiotemporal prediction model named Graph Attention Transformer with Dilated Causal Convolution and Laplacian Eigenvectors (GTDL). First, dilated causal convolution is adopted to extract temporal features of input sequences. Second, a transformer with Laplacian eigenvectors is designed to extract spatial dependencies of river networks. Third, residual connections are utilized at the output stage to enhance the accuracy of the long-term prediction. Finally, GTDL is evaluated with two real-world water quality datasets and experimental results prove that GTDL outperforms other baseline methods regarding the prediction accuracy. Specifically, compared with five state-of-the-art prediction models, GTDL improves the prediction accuracy by 11.3%-63.5% and 6%-52.4% on two datasets, respectively.
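
The dilated causal convolution used for temporal feature extraction can be sketched in a few lines. This is a generic single-channel version with zero padding on the left, written under the assumption of a plain list input; the kernel weights and dilation below are illustrative and are not taken from GTDL.

```python
from typing import List


def dilated_causal_conv1d(
    x: List[float], weights: List[float], dilation: int
) -> List[float]:
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation], treating x[<0] as 0.

    The output at time t depends only on current and past inputs (causal).
    """
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # indices before the start act as zero padding
                acc += w * x[idx]
        out.append(acc)
    return out


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
result = dilated_causal_conv1d(signal, weights=[1.0, 1.0], dilation=2)
# result is [1.0, 2.0, 4.0, 6.0, 8.0]: each output adds the value two steps back.
```

Stacking such layers with growing dilation widens the receptive field without looking at future samples.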

16:40-17:00, Paper WeCT6.6
MDS-GNN: Mutual Dual-Stream Graph Neural Network on Incomplete Graphs (I)

Yuan, Peng	Chongqing University of Posts and Telecommunications
Tang, Peng	Southwest University
Keywords: Smart Sensor Networks Abstract: Graph Neural Networks (GNNs) have emerged as powerful tools for analyzing and learning representations from graph-structured data. A crucial prerequisite for the outstanding performance of GNNs is the availability of complete graph information, i.e., node features and graph structure, which is frequently unmet in real-world scenarios since graphs are often incomplete due to various uncontrollable factors. Existing approaches only focus on dealing with either incomplete features or incomplete structure, which leads to performance loss inevitably. To address this issue, this study proposes a mutual dual-stream graph neural network (MDS-GNN), which implements a mutual benefit learning between features and structure. Its main ideas are as follows: a) reconstructing the missing node features based on the initial incomplete graph structure; b) generating an augmented global graph based on the reconstructed node features, and propagating the incomplete node features on this global graph; and c) utilizing contrastive learning to make the dual-stream process mutually benefit from each other. Extensive experiments on six real-world datasets demonstrate the effectiveness of our proposed MDS-GNN on incomplete graphs.


WeCT7	MR07
Online - AI Applications 7
Chair: Dong, Qinghui	Institute of Automation, Chinese Academy of Sciences

15:00-15:20, Paper WeCT7.1
Collaborative Resource Allocation for Blockchain-Enabled Internet of Things with Multi-Agent Deep Reinforcement Learning

Lu, Xinyu	Inner Mongolia University of Technology
Wan, Jianxiong	Inner Mongolia University of Technology
Li, Leixiao	Inner Mongolia University of Technology
Liu, Chuyi	Inner Mongolia University of Technology
Si, Xiaowei	Inner Mongolia University of Technology
Keywords: Agent-Based Modeling, Intelligent Internet Systems, Deep Learning Abstract: Mobile Edge Computing (MEC) reduces service latency and enhances Quality of Service (QoS) by offloading tasks to the wireless network edge. However, the rapid growth of task offloading and the associated data transmission security challenges deserve further investigation. This study proposes a blockchain-MEC hybrid solution where mobile devices process tasks and engage in block mining to boost system utility. The objective is to maximize the accumulated reward in the blockchain-MEC system by optimizing offloading decisions, channel selection, transmission power, computing resources, and block intervals. A Markov Decision Process is formulated to model the optimization problem which is solved via a multi-agent deep reinforcement learning (MADRL) algorithm. The results of the simulation demonstrate that our approach is more effective than the baseline method.

15:20-15:40, Paper WeCT7.2
BSS-CFFMA: Cross-Domain Feature Fusion and Multi-Attention Speech Enhancement Network Based on Self-Supervised Embedding

Mattursun, Alimjan	Xinjiang University
Wang, Liejun	Xinjiang University
Yu, Yinfeng	Xinjiang University
Keywords: Deep Learning, Representation Learning, Image Processing and Pattern Recognition Abstract: Speech self-supervised learning (SSL) representation has achieved state-of-the-art (SOTA) performance in multiple downstream tasks. However, its application in speech enhancement (SE) tasks remains immature, offering opportunities for improvement. In this study, we introduce a novel cross-domain feature fusion and multi-attention speech enhancement network, termed BSS-CFFMA, which leverages self-supervised embeddings. BSS-CFFMA comprises a multi-scale cross-domain feature fusion (MSCFF) block and a residual hybrid multi-attention (RHMA) block. The MSCFF block effectively integrates cross-domain features, facilitating the extraction of rich acoustic information. The RHMA block, serving as the primary enhancement module, utilizes three distinct attention modules to capture diverse attention representations and estimate high-quality speech signals. We evaluate the performance of the BSS-CFFMA model through comparative and ablation studies on the VoiceBank-DEMAND dataset, achieving SOTA results. Furthermore, we select three types of data from the WHAMR! dataset, a collection specifically designed for speech enhancement tasks, to assess the capabilities of BSS-CFFMA in tasks such as denoising only, dereverberation only, and simultaneous denoising and dereverberation. This study marks the first attempt to explore the effectiveness of self-supervised embedding-based speech enhancement methods in complex tasks encompassing dereverberation and simultaneous denoising and dereverberation.

15:40-16:00, Paper WeCT7.3
EK-CPSG: Enhancing Confusing Charge Prediction with Criminal Charge Definition

Zhang, Yifei	Inner Mongolia Normal University
Sa, Rina	Inner Mongolia Normal University
Li, Yanling	Inner Mongolia Normal University
Ge, Fengpei	Beijing University of Posts and Telecommunications
Yu, Haiqing	Inner Mongolia Normal University
Wang, Sukun	Inner Mongolia Normal University
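Keywords: Application of Artificial Intelligence, Neural Networks and their Applications, Deep Learning Abstract: The task of charge prediction aims to determine the final charges against defendants by analyzing the fact descriptions in legal cases. Existing methods mainly use the fact descriptions provided in datasets to guide this task. However, the criminal charge definitions (CCD) in laws and regulations are a rich source of additional information that can effectively distinguish confusing charges, and existing models do not take them into account. In this work, we propose a model called Charge Prediction Sequence Graph with External Knowledge (EK-CPSG), which integrates information about criminal charge definitions (CCD). The method builds on the charge prediction model CP-KG, which overcomes the problems caused by data imbalance and incomplete extraction of key elements. First, we use an attention mechanism to effectively combine the CCD encoded by TextCNN with the case facts encoded by Bi-GRU. Second, we focus on the most relevant elements of the case facts and the CCD. Third, to evaluate EK-CPSG, we construct CCs-8 from the CAIL2018 dataset and ...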

16:00-16:20, Paper WeCT7.4
Fusion Data on Fuzzy Modality: From Algebraic Interpretations to Quantum Simulations Via Qiskit Platform

Buss, Juliano Strelow	Federal University of Pelotas
Novack, Bruna Camily Domingues	Federal University of Pelotas
Botelho, Cecilia Silva da Costa	Federal University of Pelotas
Yamin, Adenauer	Federal University of Pelotas
Reiser, Renata	Federal University of Pelotas
Lucca, Giancarlo	Catholic University of Pelotas
Santos, HelidaSS	Federal University of Rio Grande
Cruz, Anderson	Federal University of Rio Grande Do Norte
Keywords: Fuzzy Systems and their applications Abstract: This study sits at the intersection of three important areas: Modal Logic (ML), Fuzzy Logic (FL), and Quantum Computing (QC), leveraging their main features. On the one hand, QC can handle complex data more efficiently by taking advantage of quantum mechanics concepts. On the other hand, ML and FL allow us to express uncertainties by mathematically modeling the imprecision of natural language. Therefore, we first provide an algebraic model to interpret modal operators on fuzzy logic, and then we represent them in a quantum computing environment. In addition, we present some case studies simulating the fuzzy modal connectives in the Qiskit platform, aiming to better understand the evolution of quantum circuits.

16:20-16:40, Paper WeCT7.5
HDD4DBP: A Large-Scale Multi-Modal Benchmark on Driving Behavior Prediction

Dong, Qinghui	Institute of Automation, Chinese Academy of Sciences
Zhang, Zhang	Institute of Automation, Chinese Academy of Sciences
Chang, Yubo	Institute of Automation, Chinese Academy of Sciences
Chen, Wentao	Institute of Automation, Chinese Academy of Sciences
Wang, Liang	Institute of Automation, Chinese Academy of Sciences
Keywords: Human-Centered Transportation, Human-Machine Interaction, Human Perception in Multimedia Abstract: Driving behavior prediction (DBP) plays an important role in autonomous driving. Accurately anticipating driving behaviors (e.g., left turn, right turn) of ego-vehicles 3-5 seconds before the maneuvers actually occur can help AI pilots plan safer trajectories or inform drivers of possible dangers. However, current DBP benchmark datasets, e.g., the Brain4Cars, are restricted by the limited amount of data samples and the small number of behavior categories. Thus, there is an urgent need for new larger-scale benchmarks to help boost the study of DBP. In this work, based on the annotations in the HRI Driving Dataset (HDD), we extract 5000+ clips from untrimmed multi-modal data sequences to form a new dataset, termed the HDD4DBP, as a convincing large-scale testbed for evaluating various DBP methods. Furthermore, inspired by the success of transformers in modeling long-range dependence in sequences, we build a strong baseline with the vision transformer (ViT) backbone for predicting driving behaviors. Compared to previous representative baselines, our strong baseline achieves a large performance gain on the HDD4DBP. Moreover, simply fine-tuning the pre-trained strong baseline obtains state-of-the-art performance on the Brain4Cars dataset, which further validates the benefits of the HDD4DBP. The dataset and source code will be released.

16:40-17:00, Paper WeCT7.6
CE: Knowledge Tracking Model Uncertainty Assessment Method under Trusted Artificial Intelligence

Bai, Ji ping	Ocean University of China
Kang, Bo	Ocean University of China
Hu, Kexin	Ocean University of China
Zheng, Qi	Ocean University of China
Wang, Xiaodong	Ocean University of China
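Keywords: Trust in Autonomous Systems, Technology Assessment Abstract: Trustworthy artificial intelligence is becoming a new focus in the field of artificial intelligence, and research on trustworthiness standards is essential for improving the trustworthiness of AI. Although numerous methods have been explored for evaluating the trustworthiness of AI, simple and intuitive evaluation methods are still lacking. This paper focuses on the field of knowledge tracing and, combining various clustering techniques, proposes a trustworthiness evaluation method based on uncertainty calculation. Validated using clustering, the Monte Carlo method, and correlation analysis, our method effectively examines the trustworthiness of multiple prominent knowledge tracing models. Using open datasets and a virtual dataset based on item response theory (IRT), our method achieves commendable performance at low cost, thereby providing a reference for research on incorporating trustworthiness into knowledge tracing models. Finally, we summarize model errors and areas for improvement, and provide ...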


WeCT8	MR08
Online - Agent-Based and Autonomous Systems	Regular Papers - Cybernetics
Chair: Lv, Shaomei	Jinan University

15:00-15:20, Paper WeCT8.1
DialogNTM: Context Reconstruction in Multi-Turn Dialogue Generation Using Neural Turing Machines

Zhao, Haohao	Northeast Forestry University
Liu, Meiling	Northeast Forestry University
Zhou, Jiyun	Lieber Institute, Johns Hopkins University
Fu, Kaiquan	South Dakota State University
Keywords: Deep Learning, Machine Learning, Neural Networks and their Applications Abstract: In actual conversational scenarios, we can often determine which parts of the previous dialogue are more critical based on the current inquiry. However, existing contextual modeling methods often encode the query sentence and the dialogue history in a unified manner, which fails to effectively highlight the inference effect of the query sentence. Moreover, these methods typically process the dialogue history only at the information extraction level, neglecting the treatment of the context itself. In this paper, we propose a novel conversational context modeling technique called DialogNTM. Guided by the query sentence, the technique can effectively eliminate redundant information by reconstructing the representation of the context. Specifically, we modify the memory and input flow of the Neural Turing Machine (NTM) to encode contextual information in memory and guide the read, write, and erase operations of memory through the query sentence. This design simulates the human brain's dynamic retrieval and renewal mechanism of previous memories when dealing with current problems. We have conducted extensive experiments on three publicly available datasets to verify the effectiveness of the DialogNTM model. Compared to the benchmark model, DialogNTM showed significant performance improvements ranging from 11% to 73% across multiple automated evaluation metrics (3.52% to 8.68% in absolute terms).

15:20-15:40, Paper WeCT8.2
Content Popularity Prediction in Edge Computing Networks Based on Elastic Federated Learning

Lv, Shaomei	Jinan University
Zhou, Jipeng	Jinan University
Keywords: Cloud, IoT, and Robotics Integration, Machine Learning Abstract: This paper focuses on the problem of content popularity prediction in edge computing networks. To enable small base stations (SBSs) to obtain accurate prediction results while protecting user privacy, we propose a novel popularity prediction strategy based on elastic federated learning. First, we propose a cluster segmentation method based on region preference similarity, aggregating users from similar regions into the same cluster by considering content similarity and region distance similarity. This fine-grained segmentation method enables each SBS to better provide personalized services to users in specific groups. Second, we adopt a comprehensive approach that considers multiple contextual features to train the Variational Auto-Encoder (VAE) model in each SBS and learn its latent representations to improve the accuracy and reliability of model prediction. Finally, we construct a personalized model for each SBS based on elastic federated learning. Specifically, we achieve personalized learning and prevalence prediction for each SBS by dynamically adjusting its learning process and comprehensively considering the differences between global and local models. Experimental results show that our approach improves the cache hit rate by up to 28.10% compared to existing strategies.

15:40-16:00, Paper WeCT8.3
Understanding Driving Risks Via Prompt Learning

Chang, Yubo	Institute of Automation, Chinese Academy of Sciences
Lyu, Fan	New Laboratory of Pattern Recognition, State Key Laboratory of M
Zhang, Zhang	Institute of Automation, Chinese Academy of Sciences
Wang, Liang	Institute of Automation, Chinese Academy of Sciences
Keywords: Deep Learning, Image Processing and Pattern Recognition, Application of Artificial Intelligence Abstract: Understanding driving risks is crucial for enhancing driving safety. It is a challenging task to evaluate driving risks in various complex driving scenarios. Inspired by prompt-based learning, we propose an end-to-end approach for identifying the highest-risk object in the current driving scenario based on a learnable risk pool. Specifically, a method based on key-value pair matching is designed to build a memory system for learning a collection of risk prototypes. Extensive experiments on the DRAMA dataset show that the proposed method achieves an improvement of 18.6% in Mean-IOU and 3.0% in B4 score compared to the state-of-the-art (SOTA) methods, which indicates that our method can effectively localize risky objects and accurately describe the driving scenes.


WeCT9	MR09
Brain-Machine Interfaces (BMIs) 1	Regular Papers - Cybernetics
Chair: Wang, Zhichao	Beijing Jiaotong University

15:00-15:20, Paper WeCT9.1
Innovating Educational Assessment: A Hybrid TCN-LSTM Model for Knowledge Tracing

Zheng, Siyu	Beijing Normal University
Xiong, Qingyun	Beijing Normal University
Li, Yutong	Beijing Normal University
Han, Tianli	Beijing Normal University
Guo, Junqi	Beijing Normal University
Keywords: Deep Learning, Application of Artificial Intelligence, Neural Networks and their Applications Abstract: With the rapid development of Internet technology and artificial intelligence, artificial intelligence technology based on deep learning has created new application prospects in the education industry, promoting the innovation and progress of the education system and facilitating the transformation from traditional teaching to intelligent education. In this transformation process, knowledge tracing technology is a key tool for assessing student learning behavior. Knowledge tracing aims to trace and assess students' understanding and mastery of individual knowledge points in real time by constructing models from students' previous answers. With the continuous advancement of deep learning technology in recent years, knowledge tracing has evolved into a mainstream approach for modelling students' mastery of knowledge in educational assessment, predominantly employing Recurrent Neural Networks (RNNs). To address the challenges RNNs face in handling long-term data dependencies, this paper proposes a hybrid knowledge tracing model that combines Temporal Convolutional Networks (TCN) and Long Short-Term Memory (LSTM). The proposed model is tested and evaluated on three public datasets: ASSISTments2009, ASSISTments2017, and Statics2011. Compared with existing classical methods, the proposed model shows a significant improvement in the Area Under Curve (AUC) and Accuracy (ACC), which verifies the effectiveness of the proposed method.

15:20-15:40, Paper WeCT9.2
MIG: Addressing the Cold-Start Problem in Task Recommendations through Enhanced Meta Embeddings

Wang, Zhichao	Beijing Jiaotong University
Ma, Yixuan	Beijing Jiaotong University
Keywords: Deep Learning, Representation Learning, Application of Artificial Intelligence Abstract: With the rapid expansion of freelance workers and tasks, online labor market platforms face a significant challenge with the cold-start problem, which makes it very difficult to effectively match new workers with suitable tasks. To solve this challenge, this paper presents a novel solution termed the Meta-learning ID Embedding Generator (MIG). MIG addresses the cold-start problem in task recommendation systems by efficiently learning suitable ID embeddings for new workers. MIG consists of an initial embedding generator for generating the initial ID embedding, alongside two adaptors designed to iteratively refine this embedding on the worker’s competence and interest. The efficacy of this method has been assessed using authentic data sourced from Freelancer.com, a leading online labor marketplace. The empirical findings demonstrate its superiority over state-of-the-art methods when addressing the needs of two challenging user segments: newcomers to the platform and long-inactive users whose bidding records are sparse. MIG can seamlessly integrate into existing task recommendation systems, thereby enhancing their effectiveness, particularly in cold start scenarios.

15:40-16:00, Paper WeCT9.3
Confidence Elicitation Improves Selective Generation in Black-Box Large Language Models

Liu, Sha	University of Chinese Academy of Sciences ; Computer Network Inf
Yue, Zhaojuan	Computer Network Information Center, Chinese Academy of Scienc
Li, Jun	Computer Network Information Center
Keywords: Deep Learning, AI and Applications, Neural Networks and their Applications Abstract: Large language models (LLMs) exhibit impressive capabilities across various domains in natural language processing. However, LLMs can produce fictional content, which we refer to as hallucinations, and this makes them unreliable. An important research topic is how to make LLMs accurately express confidence in their answers, so that they can refrain from outputting, or regenerate output, in cases of low-confidence predictions. This facilitates the application of LLMs in high-stakes areas. Currently, research on eliciting calibrated confidence from LLMs is still insufficient. Additionally, methods for estimating uncertainty in responses based on internal parameters of LLMs become unavailable, as many existing LLMs are black boxes served via APIs. Therefore, we analyze the existing confidence elicitation methods and propose COVO, a new confidence elicitation method that allows black-box LLMs to output their confidence levels by letting the LLM itself judge whether the answer comes from a reliable source. Our method does not require external knowledge and has high computational efficiency. Experiments show that COVO achieves better calibration and effectively reduces hallucinations in LLMs through selective generation. Additionally, the confidence scores enhance the reliability of the LLMs' responses.
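
Selective generation, as described in this abstract, amounts to withholding an answer when the elicited confidence is low. The sketch below shows only that thresholding step and the resulting coverage and accuracy trade-off. It assumes a numeric confidence score is already available; the function names, the threshold, and the sample records are hypothetical, and the COVO elicitation procedure itself is not reproduced.

```python
from typing import List, Optional, Tuple


def selective_generate(answer: str, confidence: float, threshold: float) -> Optional[str]:
    """Return the answer only when confidence reaches the threshold, else abstain."""
    return answer if confidence >= threshold else None


def coverage_and_accuracy(
    records: List[Tuple[float, bool]], threshold: float
) -> Tuple[float, float]:
    """Given (confidence, is_correct) pairs, report the share answered and its accuracy."""
    kept = [correct for conf, correct in records if conf >= threshold]
    coverage = len(kept) / len(records) if records else 0.0
    accuracy = sum(kept) / len(kept) if kept else 0.0
    return coverage, accuracy


# Made-up records for illustration, not experimental data.
records = [(0.9, True), (0.8, True), (0.4, False), (0.3, True)]
cov, acc = coverage_and_accuracy(records, threshold=0.5)
```

Raising the threshold typically lowers coverage while, if the confidence scores are well calibrated, raising accuracy on the answers that remain.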

16:00-16:20, Paper WeCT9.4
ART-Net: An Attention-Based Hybrid ResNet-Transformer Network for 12-Lead ECG Signal Classification

Liu, Kun	Shandong University of Science and Technology
Yang, Ruiping	Shandong University of Science and Technology
Qi, Liang	Shandong University of Science and Technology
Luan, Wenjing	Shandong University of Science and Technology
Zhang, Zhen	Shandong University of Science and Technology
Keywords: Deep Learning, Application of Artificial Intelligence, Neural Networks and their Applications Abstract: Electrocardiogram (ECG) signal classification is an important task in healthcare as it plays a vital role in early prevention and diagnosis of cardiovascular diseases. In this work, we propose an attention-based hybrid ResNet-Transformer network (ART-Net) for 12-lead ECG signal classification. It comprises a stacked multi-scale attention-based ResNet and a self-attention-based Transformer. First, ECG signals are divided into several signal segments of the same length. Then multi-scale features are extracted from the signal segments by the attention-based ResNet, and attention mechanisms are used to adjust the weights of different channel features based on their importance. Next, the multi-scale features from the same ECG signal are integrated in chronological order as input to the Transformer network. Finally, the Transformer extracts and fuses contextual information through the self-attention mechanism and captures the correlation between beats at different positions. The experimental results on CPSC2018 indicate that our model outperforms three state-of-the-art methods, achieving 85.27% accuracy, 86.01% sensitivity, and 85.59% specificity.

16:20-16:40, Paper WeCT9.5
BTVD-BERT: A Bilingual Domain-Adaptation Pre-Trained Model for Textural Vulnerability Descriptions

Wang, Ziyuan	Hebei University
Liang, Xiaoyan	Hebei University
Du, Ruizhong	Hebei University
Zhou, Xin	Hebei University
Keywords: Deep Learning, Application of Artificial Intelligence, Transfer Learning Abstract: Textural Vulnerability Descriptions (TVD) refers to the natural language description of a vulnerability in databases like the National Vulnerability Database (NVD) and the China National Vulnerability Database (CNVD), which typically provides a concise summary of critical vulnerability details. To facilitate the understanding of domain-specific terms in TVD and the accurate extraction of information, we introduce a bilingual domain-adaptation pre-trained model called BTVD-BERT, aimed at enhancing the model's capability to process and understand vulnerability descriptions in both Chinese and English. We explore three questions. First, what is the impact of catastrophic forgetting on the model? Second, how should the dataset be proportioned to best enhance the model's generalization across both Chinese and English? Third, can adding extra task-related metric information to the original dataset produce a higher-quality dataset, and does training on it further improve model performance? We study these questions from a data engineering perspective and conduct numerous ablation experiments to answer them. The experimental results indicate that catastrophic forgetting adversely affects the model, causing it to forget previously acquired knowledge while better retaining more recently obtained information. By training on a mix of Chinese and English data, we were able to mitigate the impact of catastrophic forgetting on the model to some extent. Optimizing data proportions and improving data quality can effectively enhance the overall performance of the model. This study not only enhances the identification and analysis of security vulnerabilities but also offers new perspectives and empirical support for research on multilingual domain-adaptive models.

16:40-17:00, Paper WeCT9.6
Distingusic: Distinguishing Synthesized Music from Human

Yong, Zi Qian	Monash University
Leong, Shu-Min	Monash University Malaysia
Rajanala, Sailaja	Monash University Malaysia
Pal, Arghya	Monash University
Phan, Raphael	Monash University
Keywords: Artificial Life, Deep Learning, AI and Applications Abstract: In this paper we focus on a problem that is increasingly plaguing the music industry, to a large extent due to the proliferation of generative AI models that enable the generation of new realistic and indistinguishable content for diverse modalities: text, image, audio, video. We address this problem from the perspective of audio watermarking; to the best of our knowledge, this is the first watermarking-based approach to the problem of distinguishing realistic songs synthesized by generative AI models from real songs sung by humans. In more detail, our approach utilizes the SHA-256 hash function, Singular Value Decomposition (SVD) and Discrete Wavelet Transform (DWT) for robust audio watermarking of synthesized songs. Before embedding, the audio is subjected to an attack phase to pinpoint less vulnerable regions for QR watermark placement. During the embedding process, the audio chunks first undergo a 1-level DWT, and then the resulting approximation coefficients go through SVD. Additionally, the watermarked array is subjected to SHA-256 hashing for collision-resistant conciseness, and the result is subsequently embedded into the singular values of the audio. Experimental findings demonstrate the superiority of our method over existing audio watermarking approaches under various signal attack scenarios.


WeCT10	MR10
Machine Vision and Perception 3
Chair: Selladurai, Sathiyamoorthy	Carleton University

15:00-15:20, Paper WeCT10.1
A Novel Reinforcement Learning Multi-Objective Community Detection Algorithm with Epsilon-Gradient-Greedy Strategy

Wei, Wenhong	Dongguan University of Technology
Meng, Yi	Dongguan University of Technology
Li, Qingxia	Dongguan City University
Keywords: Application of Artificial Intelligence, Deep Learning, Evolutionary Computation Abstract: Accurately categorizing communities within a social network is a crucial aspect of community detection, carrying significant practical relevance. To achieve a higher quality of community division, we incorporate reinforcement learning to learn the distribution characteristics of community nodes during the iteration process of the algorithm, which guides nodes to migrate to other communities and enhances the algorithm's global search capability. A new epsilon-gradient-greedy strategy obtains each node's gradient information relative to its neighborhood and achieves higher performance in local search. To speed up the adaptability of the algorithm at the beginning of the iteration and to alleviate the resolution limitation imposed by modularity optimization, this paper employs a triangular subnetwork-based weight assignment method for balancing the weights of each edge in the network. Experimental results on real-world and synthetic network datasets demonstrate that our method's community identification precision outperforms recent community detection algorithms, exhibiting higher accuracy, higher resolution, and adaptability to various network characteristics and structural changes.

15:20-15:40, Paper WeCT10.2
DOARS: Dynamic Objects Aware RGB-D SLAM with Neural Rendering

Li, Chao	Tongji University
Yao, Chenpeng	Tongji University
Liu, Chengju	Tongji University
Chen, Qijun	Tongji University
Keywords: Neural Networks and their Applications, Representation Learning, Machine Vision Abstract: Simultaneous Localization and Mapping (SLAM) refers to robots estimating their motion state and mapping their surroundings using sensors in unknown environments. In dynamic scenes, occlusions and variations in disparity may lead to significant errors in feature matching and erroneous pose estimation. For this reason, a novel method based on neural rendering is designed to address this problem. It employs semantic segmentation to extract masks of dynamic objects, using a purely static background to construct photometric and geometric loss functions. Additionally, joint feature-based pose estimation is integrated into the initialization during the tracking process, enhancing the network convergence speed and bolstering the resilience against motion blur. In the mapping phase, a keyframe addition strategy that compares repaired images is proposed to refine the scene representation. This method has been validated on the TUM Dataset and compared with existing traditional dynamic visual SLAM and neural rendering-based SLAM methods. The results reveal that it has doubled the mapping capability in highly dynamic scenes, demonstrating that it can significantly improve localization accuracy compared with other approaches.

15:40-16:00, Paper WeCT10.3
DASwin-T: A Preprocessing Framework for Capturing Enhanced Details and Textures with Local Transformation

Guan, Ziyan	Chongqing University
Liao, Kai	Guizhou University
Ding, Ziyan	Northwestern Polytechnical University
Keywords: Machine Vision, Deep Learning Abstract: Since the introduction of Transformer models into Computer Vision (CV), remarkable progress has been made in image classification tasks. The Vision Transformer (ViT) captures the dependencies and correlations of each part of the image well using a global modeling-oriented mechanism. However, the ViT framework can hardly avoid ignoring effective local features and texture information in global modeling, and the self-attention mechanism between embeddings cannot obtain the degree of contribution of local regions relative to the global context. In this study, we propose an image preprocessor, DASwin-T, tailored to amplify the details and texture information of images before model learning. Images undergo the Detail and Texture Enhancement (DTE) module, which employs two distinct pathways for extracting edge features and texture features. Additionally, we partition the image into patches and utilize the Affine Transformation (AF) module to assess the importance of local information within each patch. Through this pipeline of image preprocessing, we succeed in enhancing meaningful but less readily learned information in images. Extensive experiments demonstrate that our approach surpasses state-of-the-art Vision Transformers and efficient CNN models in terms of performance.

16:00-16:20, Paper WeCT10.4
DLE: Document Illumination Correction with Dynamic Light Estimation

Quan, Jiahao	East China Normal University
Wang, Hailing	East China Normal University
Wu, Chunwei	East China Normal University
Cao, Guitao	East China Normal University
Keywords: Machine Vision, Deep Learning, Image Processing and Pattern Recognition Abstract: Document images captured through mobile devices in natural environments are often affected by various types of illumination degradation. The degradation diminishes the clarity and readability of document images, thereby complicating their application to OCR downstream tasks. Existing methods typically address only one or a limited number of degradation types and do not consider the diversity of image degradation types. Additionally, these methods typically involve a pre-trained fixed sub-network to estimate background light or shadows, which lacks flexibility and adaptability. To overcome these challenges, this study proposes a novel framework named DLE, which comprises a two-loop generative adversarial network and a multi-modal discriminator. Specifically, to improve the quality of image representation, a mask extractor is embedded before the image input generator. This forces the model to focus on the distinct features in the image, enhancing the representation of illumination anomalous and degraded regions. The mask extractor generates a luminance mask to evaluate the difference in illumination between the input and target images. Subsequently, the consistency loss computation incorporates a dynamic optimization of the mask extractor, strengthening its ability to estimate the illumination degradation part. Moreover, a pre-trained visual-language model is introduced into the multi-modal discriminator, leveraging its robust cross-modal alignment capability to improve the semantic consistency of the generated images with the preset input text. Extensive experiments demonstrate that our approach achieves the SOTA performance in terms of edit distance (ED) and character error rate (CER).

16:20-16:40, Paper WeCT10.5
Towards 3D-Denser Ultrasound Image Simulation from 2D CT-Scan for Ultrasound-Guided Percutaneous Nephrolithotomy Training

Selladurai, Sathiyamoorthy	Carleton University
Sainsbury, Ben	Marion Surgical
Watterson, James	University of Ottawa
Hibbert, Rebecca	Mayo Clinic
Satheesh B, Anila	Indian Institute of Technology Madras
Thittai, Arun	Indian Institute of Technology Madras
Rossa, Carlos	Carleton University
Keywords: Image Processing and Pattern Recognition, Machine Vision, Computational Intelligence Abstract: Virtual reality (VR) simulation can improve the outcomes of percutaneous nephrolithotomy (PCNL) - a surgery to extract kidney stones using ultrasound (US) or fluoroscopy image guidance. These simulators almost exclusively employ fluoroscopy, and no commercial VR simulator is available for US-guided PCNL (usPCNL). In this paper, we propose the first step towards developing a usPCNL simulator that integrates a volumetric US model of the patient’s anatomy derived from parallel 2D computed tomography (CT) scans. A critical challenge in US image generation from CT scans is that the limited spatial resolution of CT slices may lead to inaccuracies in the simulated US images. The proposed algorithm interpolates successive CT images to create an augmented dataset with increased spatial resolution. Each CT slice is then converted into a US image based on principles of linear acoustics and spatial impulse response. These images are then combined to form two different volumetric US images, one derived from the original sparse CT scans, and one created with the augmented data. From these volumetric US images, new images can be formed along arbitrary imaging planes not captured in the original CT data. The obtained simulated images are compared with their corresponding real US images acquired experimentally, and further evaluated quantitatively using normalized root mean square error (NRMSE) and Dice similarity coefficient (DSC). The results reveal an NRMSE of 0.235 ± 0.051 and a DSC of 0.9139 ± 0.062, showcasing a close resemblance between simulated and actual ultrasound images. Additionally, we show that denser CT scan data leads to a 25% improvement in image quality based on peak signal-to-noise ratio compared to the original dataset. This initial work lays the foundation for the development of the usPCNL simulator, which could potentially have significant benefits for training and enhancing skills in this medical procedure.
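As a reading aid for the two evaluation metrics named in this abstract, the following is a minimal Python sketch. The abstract does not say how NRMSE is normalized, so dividing by the reference intensity range is an assumption, and the function names `nrmse` and `dice` are hypothetical rather than taken from the paper.

```python
import math


def nrmse(reference, estimate):
    """Root mean square error divided by the reference range.

    Range normalization is an assumption; other conventions divide by
    the mean or the standard deviation of the reference instead.
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    span = max(reference) - min(reference)
    if span == 0:
        raise ValueError("reference has zero range")
    return math.sqrt(mse) / span


def dice(mask_a, mask_b):
    """Dice similarity coefficient of two binary masks: 2|A∩B| / (|A| + |B|)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have equal length")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Two empty masks are treated as a perfect match by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

Images would be flattened to one-dimensional pixel lists before calling either function; lower NRMSE and higher DSC both indicate closer agreement.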

16:40-17:00, Paper WeCT10.6
FP-GCN: A Novel Feature Pyramid Graph Convolutional Network for Skeleton-Based Action Recognition (I)

Ke, Chengyuan	Zhejiang University of Technology
Liu, Sheng	Zhejiang University of Technology
Ke, Zhenghao	Zhejiang University of Technology
Feng, Yuan	Zhejiang University of Technology
Keywords: Deep Learning, Machine Vision Abstract: For skeleton-based action recognition, the aggregation of features among human skeletal joints is a critical factor, which influences recognition accuracy in graph convolutional networks. Existing methods often neglect the extraction of skeletal structure features at different scales, which limits the ability of the model to understand actions. To address this issue, we propose a novel Feature Pyramid Graph Convolutional Network (FP-GCN) that enhances the representational capability of the model by capturing the multi-scale spatial features of the skeleton sequence. In detail, we propose an attention-based graph pooling module that effectively contracts the skeleton to multiple lower-order sub-graphs, which serve as spatial representations of the skeleton at corresponding levels. The original skeleton and these sub-graphs are combined to form the feature pyramid, where joints of each level span the same semantic space. Additionally, we introduce a graph unpooling module to restore the pooled sub-graphs to their original topology. Moreover, we adopt a multi-loss strategy across different spatial scales, encouraging the model to learn more comprehensive skeletal features. Finally, we validate our proposed model on three large-scale datasets, achieving the highest accuracy compared to state-of-the-art methods. We conduct numerous comparative experiments to verify the effectiveness of the modules.


WeCT11	MR11
Evolutionary and Heuristic Computation 2	Regular Papers - Cybernetics
Chair: Meselhi, Mohamed	University of New South Wales Canberra

15:00-15:20, Paper WeCT11.1
Augmenting Particle Swarm Optimization with Simulated Annealing and Dimensional Learning for UAVs Path Planning

Wei, Jie	Dongguan University of Technology
Zhang, Yuhui	Dongguan University of Technology
Wei, Wenhong	Dongguan University of Technology
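Keywords: Swarm Intelligence, Evolutionary Computation, Metaheuristic Algorithms Abstract: To alleviate the premature convergence commonly faced by traditional particle swarm optimization (PSO) and to enhance the algorithm's global search capability in UAV path planning, this paper proposes a simulated annealing and dimensional learning enhanced particle swarm optimization algorithm (SDPSO). First, the learning factors and inertia weight are dynamically adjusted during the search process to achieve a balance between global exploration and local exploitation. Subsequently, the simulated annealing (SA) algorithm is adopted in the early search stage to help the algorithm escape local optima and enhance its ability to discover the global optimal solution, while maintaining the fast convergence characteristic of PSO. Furthermore, to remedy the particle oscillation that arises during the search process, SDPSO embeds a dimensional learning strategy (DLS), which enables every dimension of each particle to learn useful information from the global best position. Experimental results show that incorporating the SA algorithm in the first 30 iterations not only enhances the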
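The abstract above names two generic ingredients, a dynamically adjusted inertia weight and a simulated annealing step, without giving formulas. The Python sketch below illustrates one common form of each: a linearly decreasing inertia weight and the Metropolis acceptance rule. The schedule, the default bounds 0.9 and 0.4, and the function names are assumptions for illustration, not the SDPSO authors' settings.

```python
import math
import random


def inertia_weight(iteration, max_iterations, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight (an assumed schedule).

    Large early values favor global exploration; small late values
    favor local exploitation.
    """
    fraction = iteration / max_iterations
    return w_max - (w_max - w_min) * fraction


def sa_accept(current_cost, candidate_cost, temperature, rng=random):
    """Metropolis acceptance rule for a minimization problem.

    Better candidates are always accepted; worse ones are accepted with
    probability exp(-delta / temperature), which lets the search leave
    a local optimum while the temperature is still high.
    """
    if candidate_cost <= current_cost:
        return True
    if temperature <= 0:
        return False
    delta = candidate_cost - current_cost
    return rng.random() < math.exp(-delta / temperature)
```

In a PSO loop, `sa_accept` would typically be consulted only during the early iterations, matching the abstract's description of applying SA in the early search stage.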

15:20-15:40, Paper WeCT11.2
An Inverse Modeling Constrained Multi-Objective Evolutionary Algorithm Based on Decomposition

Farias, Lucas Rodolfo Celestino	Universidade Federal De Pernambuco
Ribeiro Araújo, Aluizio Fausto	Universidade Federal De Pernambuco
Keywords: Evolutionary Computation, Metaheuristic Algorithms Abstract: This paper introduces the inverse modeling constrained multi-objective evolutionary algorithm based on decomposition (IM-C-MOEA/D) for addressing constrained real-world optimization problems. Our research builds upon the advancements made in evolutionary computing-based inverse modeling, and it strategically bridges the gaps in applying inverse models based on decomposition to problem domains with constraints. The proposed approach is experimentally evaluated on diverse real-world problems (RWMOP1-35), showing superior performance to state-of-the-art constrained multi-objective evolutionary algorithms (CMOEAs). The experimental results highlight the robustness of the algorithm and its applicability in real-world constrained optimization scenarios.

15:40-16:00, Paper WeCT11.3
An Evolutionary Framework with Improved Variance-Stabilized Multi-Objective Proximal Policy Optimization and NSGA-II

Bi, Jing	Beijing University of Technology
Yue, Caiheng	Beijing University of Technology
Yuan, Haitao	Beihang University
Zhai, Jiahui	Beijing University of Technology
Zhang, Jia	Southern Methodist University
Zhou, Mengchu	New Jersey Institute of Technology
Keywords: Swarm Intelligence, Metaheuristic Algorithms, Evolutionary Computation Abstract: Multi-objective optimization algorithms are essential for addressing real-world challenges characterized by conflicting objectives. Although conventional algorithms are effective in exploring solution spaces and generating non-dominated solutions, their solution quality and dynamic adaptability to true Pareto fronts need to be improved. This work proposes a multi-objective algorithm that integrates Non-dominated Sorting Genetic Algorithm II (NSGA-II) and Multi-Objective Reinforcement Learning (N-MORL). N-MORL consists of an upstream and a downstream component. In the upstream component, this work improves the Variance-stabilized Multi-objective Proximal Policy Optimization (VMPPO) for enhanced convergence stability by adjusting its iteration mechanism. Additionally, this work optimizes variance networks and action sampling to balance exploration and exploitation, which improves experience sampling efficiency. This work adopts high-quality solution sets yielded by MORL as the initial solution set for the downstream NSGA-II, guiding the exploration space and increasing the number of solutions. High-quality initial solutions significantly accelerate the iterative convergence of N-MORL. N-MORL provides both the quality and the number of solutions, better covering or approaching the true Pareto front. Experimental results with five benchmark multi-objective functions demonstrate that N-MORL outperforms three other multi-objective evolutionary algorithms in producing high-quality solutions within the same number of iterations.

16:00-16:20, Paper WeCT11.4
Feature Selection Method Based on an Improved PSO Algorithm with Multilayered Update Strategies

Cao, Feng	Shanxi University
Lu, Zheng	Shanxi University
Li, Deyu	Shanxi University
Zheng, Jianxing	Shanxi University
Keywords: Swarm Intelligence, Metaheuristic Algorithms, Machine Learning Abstract: Feature selection (FS) is a crucial preprocessing method for improving feature set quality. Particle swarm optimization (PSO) is effective and simple to implement and has minimal parameter requirements, making it highly suitable for FS challenges. However, PSO can suffer from premature convergence and difficulty escaping local optima. To address this issue, our paper proposes a novel PSO-based feature selection technique featuring a multilevel updating strategy (MLUSPSO). This method starts with a unique classification approach in which the particle swarm is divided into three groups: elite, medium, and weak. This division is based on each particle's fitness and exploratory abilities, thereby fostering diversity in each group. Next, we define specific updating strategies for the feature subsets chosen by these particle groups. Notably, we devise a new correlation-based strategy to update weak-class particles, enhancing their exploratory potential in the feature space through the incorporation of correlation data. Comparative tests revealed that MLUSPSO outperforms other high-dimensional classification feature selection methods by delivering feature subsets with superior classification accuracy and fewer features.

16:20-16:40, Paper WeCT11.5
Farmland Segmentation and Multiple Agricultural Machines Cooperative Scheduling Method under Sudden Disasters

Dou, Guiping	Xinjiang University
Jia, Liruizhi	Xinjiang University
Liu, Shengquan	Xinjiang University
Zhang, Cheng	Xinjiang University
Liu, Yuan	Xinjiang University
Kong, Bo	Xinjiang University
Keywords: Heuristic Algorithms, Deep Learning, Swarm Intelligence Abstract: With the rapid development of smart farm models, multi-machine collaboration has become an important trend. However, in the face of sudden disasters, traditional multi-machine collaborative operation suffers from problems such as unclear operation area allocation, simplistic path planning, and scheduling algorithms that easily fall into local optima, resulting in low efficiency and an inability to meet the requirements of emergency tasks. To solve these problems, a collaborative scheduling method for multiple agricultural machines based on farmland segmentation is proposed. First, the large-scale farmland is segmented based on the multi-machine task requirements, and the location information of the segmented sub-regions is obtained. Second, to minimize the turning time in the headland, coverage path planning is carried out within the field to obtain the best coverage angle and the shortest operation time. Finally, the optimal simulated annealing algorithm is used to obtain the task assignment of agricultural machinery that minimizes the maximum working time. The simulation results show that the optimization effect is achieved.

16:40-17:00, Paper WeCT11.6
An Evolutionary Framework for Large-Scale Constrained Optimization

Meselhi, Mohamed	University of New South Wales Canberra
Hamza, Noha	UNSW
Elsayed, Saber	University of New South Wales Canberra
Essam, Daryl	University of New South Wales Canberra
Sarker, Ruhul	University of New South Wales Canberra
Keywords: Evolutionary Computation, Metaheuristic Algorithms, Optimization and Self-Organization Approaches Abstract: In recent decades, large-scale optimization has received significant research attention; however, most of these studies have not considered problems with functional constraints. The introduction of constraints significantly amplifies the difficulty of solving optimization problems. Given the prevalence of high-dimensional constrained optimization problems in real-world applications, a critical need has emerged for an in-depth exploration of this research domain. This paper presents a novel framework that can tackle complex, large-scale constrained optimisation problems. The framework incorporates a decomposition method that leverages interactions among decision variables, employing a contribution-based strategy to prioritize subproblems that have more substantial influence on enhancing solution quality. Furthermore, the framework integrates constraint consensus to mitigate constraint violations throughout the search process. The proposed algorithm is evaluated on a test suite of constrained overlapping problems, revealing its superior performance when compared to other state-of-the-art algorithms.


WeCT12	MR12
Affective and Cognitive Computing and Information Management
Chair: Guo, Xiaopeng	Peking University

15:00-15:20, Paper WeCT12.1
Multi-Level Semi-Coupled Dictionary Learning for Enhanced Person Re-Identification

Zhao, Hongtian	Xinjiang University
Keywords: Multimedia Systems, Biometrics and Applications, Human-Machine Interaction Abstract: Person Re-ID is concerned with crafting a resilient feature matching framework that effectively associates depictions of the same person across disparate camera perspectives. In this work, we unveil a multi-level feature matching framework for person re-identification that capitalizes on both local and global visual cues. The framework stratifies the re-identification challenge into two concurrent pipelines. Within these, this paper introduces a pair of novel semi-coupled dictionary learning (SCDL) algorithms for the cultivation of distinct, yet complementary, feature representations: (1) The Local SCDL (L-SCDL) algorithm focuses on region-level descriptors, mapping them to a mutual feature space that diminishes the influence of camera-specific variations. (2) The Global SCDL (G-SCDL) algorithm, conversely, is designed to harmonize global appearance attributes, thereby enhancing the robustness of image-level descriptor matching. These two complementary components synergistically exploit both global and local visual cues, enhancing the discriminative ability of our approach. Extensive experiments conducted on three challenging datasets demonstrate the superior performance of the proposed person re-identification framework in comparison to several mainstream approaches.

15:20-15:40, Paper WeCT12.2
Enhancing Sound Source Localization in Blind Soccer: Analyzing Head Movement Strategies and Localization Abilities for Static and Dynamic Auditory Cues Via a Virtual Spatial Acoustic System

Tsuji, Ayumu	Waseda University
Aihara, Shimpei	Japan Institute of Sports Sciences
Tanaka, Shotaro	Waseda University
Iwata, Hiroyasu	Waseda University
Keywords: Human Performance Modeling, Information Visualization, Virtual and Augmented Reality Systems Abstract: In the domain of blind soccer, proficient sound localization, utilizing auditory cues, is essential for player performance. This study explores the capacity of individuals to accurately localize static and dynamic auditory cues in an environment that allows free neck movement, and investigates the strategies of head movement that contribute to optimal sound localization. Through enhancements to our virtual reality system, we conducted comprehensive verification tests to assess these abilities. Our analysis reveals that sighted players with experience in blind soccer tend to augment their sound localization capability for static sources through more pronounced and frequent head movements. Conversely, visually impaired players exhibit superior localization accuracy for both types of sound sources with minimal, yet efficient, head movements. A critical observation from this study is the differential performance in localizing dynamic sound sources; sighted players, despite active head movements, were unable to match the localization precision of their visually impaired peers. This disparity underscores the necessity for sighted players to refine their head movement techniques for improved sound localization. Importantly, this research also introduces a comparative analysis of simultaneous static and dynamic sound source localization, highlighting the complexities and adaptive strategies required for effective auditory perception in blind soccer.

15:40-16:00, Paper WeCT12.3
Mixture-Of-Experts Based Structure-Context Fusion for Programming Knowledge Tracing

Guo, Xiaopeng	Peking University
Nie, Chang	China Tower Corporation Limited
Liu, Jun	China Tower Corporation Limited
Shu, Maojing	Peking University
Huang, Zhijie	Peking University
Sun, Jun	Peking University
Keywords: Human Performance Modeling, Cognitive Computing, Human-Computer Interaction Abstract: Programming knowledge tracing (PKT) is an essential yet challenging task that aims to assess students' proficiency in programming skills by tracking their historical performance on programming assignment tasks. Previous approaches typically focus on either the structural or the contextual view of the code to derive corresponding representations for modeling PKT. Unfortunately, a single-view code representation may fail to efficiently capture the subtle differences in student-submitted code, leading to less effective performance. In this paper, we emphasize the importance of both the structural and contextual views of student-submitted code and formulate PKT as a multi-view fusion task. Guided by two common principles in multi-view learning, namely complementarity and consensus, we propose an efficient mixture-of-experts based structure-context fusion method for PKT. First, to adhere to the principle of complementarity, our method presents a MoE based fusion scheme that seeks to exploit the complementary information present in both the structural and contextual representations through individual experts collaborating in parallel. Simultaneously, a gating mechanism is employed to dynamically assign appropriate weights to each expert, facilitating an 'optimal' aggregation of structural and contextual representations. Second, in pursuit of consensus, we align the knowledge hidden state derived from each representation type towards a consensus representation. This alignment strategy captures essential common patterns across different views, facilitating the learning of coherent representations that are unified and less sensitive to view-specific noise, thus enhancing the generalization capacity of our method. We conduct extensive experiments to verify the effectiveness of our method, and the results demonstrate that our proposal achieves better performance compared to the currently popular methods.
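The gating idea in this abstract, where a gate assigns a weight to each expert and the expert outputs are aggregated, can be illustrated with a minimal Python sketch of generic softmax gating. This is not the authors' architecture: in practice the gate scores and expert outputs would come from learned networks over the structural and contextual code representations, and the names `softmax` and `fuse` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax that turns gate scores into weights."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(expert_outputs, gate_scores):
    """Weighted sum of expert output vectors using softmax gate weights.

    expert_outputs: one equal-length vector per expert.
    gate_scores: one raw score per expert (assumed to come from a gate).
    """
    if len(expert_outputs) != len(gate_scores):
        raise ValueError("need exactly one gate score per expert")
    weights = softmax(gate_scores)
    dim = len(expert_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, expert_outputs))
        for i in range(dim)
    ]
```

With equal gate scores the experts contribute equally; raising one score shifts the fused vector towards that expert's output.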

16:00-16:20, Paper WeCT12.4
Effects of Sensor Setup Time and Comfort on User Experience in Physiological Computing

Novak, Vesna	University of Cincinnati
Kaya, Robert	University of Wyoming
Erion, Collyn	University of Wyoming
Hossain, Mohammad Sohorab	University of Cincinnati
Clapp, Joshua	University of Wyoming
Keywords: Affective Computing, Human Factors, Human-Computer Interaction Abstract: Physiological sensors are commonly applied for user state monitoring and consequent machine behavior adaptation in applications such as rehabilitation and intelligent cars. While more accurate user state monitoring is known to lead to better user experience, increased accuracy often requires more sensors or more complex sensors. The increased setup time and discomfort involved in the use of such sensors may itself worsen user experience. To examine this effect, we conducted a study where 72 participants interacted with a computer-based multitasking scenario whose difficulty was periodically adapted – ostensibly based on data from either a remote eye tracker or a lab-grade “wet” electroencephalography sensor. Deception was used to ensure consistent difficulty adaptation accuracies, and user experience was measured with the Intrinsic Motivation Inventory, NASA Task Load Index, and an ad-hoc scale. We found few user experience differences between the eye tracker and electroencephalography sensor - while one interaction effect was noted, it was small, and there were no other differences. This result is at first surprising and seems to indicate that comfort and setup time are not major factors for laboratory-based user experience evaluations of such technologies. However, the result is likely due to a suboptimal study protocol where each participant interacted with only one sensor. In future work, we will use an alternate protocol to further explore the effects of user comfort and setup time on user experience.

16:20-16:40, Paper WeCT12.5
SOS-1K: A Fine-Grained Suicide Risk Classification Dataset for Chinese Social Media Analysis (I)

Qi, Hongzhi	Beijing University of Technology
Liu, Hanfei	Beijing University of Technology
Li, Jianqiang	Beijing University of Technology
Zhao, Qing	Beijing University of Technology
Zhai, Wei	Beijing University of Technology
Luo, Dan	Wuhan University
He, Tianyu	Wuhan University
Liu, Shuo	Wuhan University
Yang, Bing Xiang	Wuhan University
Fu, Guanghui	Sorbonne University
Keywords: Cognitive Computing, Affective Computing, Human-Machine Interaction Abstract: On social media, users frequently express personal emotions, a subset of which may indicate potential suicidal tendencies. The implicit and varied forms of expression in internet language complicate accurate and rapid identification of suicidal intent on social media, thus creating challenges for timely intervention efforts. The development of deep learning models for suicide risk detection is a promising solution, but there is a notable lack of relevant datasets, especially in the Chinese context. To address this gap, this study presents a Chinese social media dataset designed for fine-grained suicide risk classification, focusing on indicators such as expressions of suicide intent, methods of suicide, and urgency of timing. Seven pre-trained models were evaluated on two tasks: binary classification of high versus low suicide risk, and fine-grained suicide risk classification on a scale of 0 to 10. In our experiments, deep learning models show good performance in distinguishing between high and low suicide risk, with the best model achieving an F1 score of 88.39%. However, the results for fine-grained suicide risk classification were still unsatisfactory, with a best weighted F1 score of 50.89%. To address the issues of data imbalance and limited dataset size, we investigated both traditional and advanced large language model based data augmentation techniques, demonstrating that data augmentation can enhance model performance by up to 4.65 percentage points in F1 score. Notably, the Chinese MentalBERT model, which was pre-trained on psychological domain data, shows superior performance in both tasks. This study provides valuable insights for automatic identification of suicidal individuals, facilitating timely psychological intervention on social media platforms. The source code and data are publicly available at: https://github.com/HongzhiQ/FineGrainedSuicideDetection.
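Because the fine-grained result is reported as a weighted F1 score, a small Python sketch of that metric may help when reading the numbers. It uses the usual definition, per-class F1 averaged with weights proportional to class support; whether the authors computed it in exactly this way is an assumption, and the function name `weighted_f1` is hypothetical.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        f1 = 2 * tp / denominator if denominator else 0.0
        score += (count / total) * f1
    return score
```

On an imbalanced label set, frequent classes dominate this average, which is one reason data imbalance matters for the fine-grained task described above.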

16:40-17:00, Paper WeCT12.6
Improved Chinese Few-Shot Relation Extraction Using Large Language Model for Data Augmentation and Prototypical Network (I)

Xu, Haoran	Tongji University
Lou, Kechen	Zhejiang University of Technology
Chan, Sixian	Zhejiang University of Technology
Keywords: Multimedia Systems, Design Methods, Networking and Decision-Making Abstract: Chinese few-shot relation extraction aims to effectively identify relationships between entities in text using limited data, which is crucial for information extraction and knowledge construction and assists in extracting critical information from medical cases to support personalized rehabilitation treatment plans. However, due to the limited number of samples, existing methods struggle to capture sufficient relational features from limited data, resulting in poor extraction performance. Therefore, we propose an Improved Chinese Few-Shot Relation Extraction Using Large Language Model for Data Augmentation and Prototypical Network to address this issue. Specifically, we establish the Chinese few-shot relation extraction datasets DUIE Few and SanWen Few. Notably, we introduce a framework based on large language models for dataset augmentation, which effectively alleviates the problem of feature extraction due to insufficient data and improves task performance. Finally, we present baselines for prototype networks, Siamese networks, and the CFSRE model based on relational category information. Experimental results show that the CFSRE model improves accuracy, recall, and F1 score under few-shot conditions, particularly as the sample size decreases. In summary, the method we propose demonstrates promising results in Chinese few-shot relation extraction tasks and holds the potential to advance medical rehabilitation research.


WeCT13	Room T13
2P - Autonomous Systems and Robotics	2-Page Abstracts
Chair: Yu, Gwo-Ruey	National Chung Cheng University

15:00-15:20, Paper WeCT13.1
Novel Gmapping Using Quantum-Behaved Particle Swarm Optimization

Yu, Gwo-Ruey	National Chung Cheng University
Hsu, H.-H.	Delta Electronics, Inc
Keywords: Robotic Systems, Soft Robotics, Modeling of Autonomous Systems Abstract: This paper proposes quantum-behaved particle swarm optimization (QPSO) combined with Gmapping simultaneous localization and mapping (SLAM). In quantum mechanics, the positions and velocities of particles cannot be determined simultaneously due to the uncertainty principle. Therefore, in the QPSO algorithm, the positions of particles must be randomly initialized. Every particle is described by the probability density function of a wave function, which helps particles find optimal solutions. The QPSO sampling method, combined with Gmapping, helps particles explore a larger solution space. The scan matching algorithm is used as the fitness function, and Gmapping focuses on the data of particles with high fitness values. As a result, the error in building a digital map can be reduced, which can decrease the risk of crashing into obstacles.
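For orientation, the position update commonly given for QPSO in the literature is reproduced below in LaTeX. The abstract does not state which variant the authors use, so the form and all symbols here are assumptions: $P_{i,j}$ is the personal best of particle $i$ in dimension $j$, $G_j$ the global best, $m_j$ the mean of the personal bests over $N$ particles, and $\beta$ the contraction-expansion coefficient.

```latex
\begin{aligned}
p_{i,j}(t) &= \varphi\, P_{i,j}(t) + (1-\varphi)\, G_j(t), \qquad \varphi \sim U(0,1),\\
m_j(t) &= \frac{1}{N}\sum_{i=1}^{N} P_{i,j}(t),\\
x_{i,j}(t+1) &= p_{i,j}(t) \pm \beta\,\bigl|m_j(t) - x_{i,j}(t)\bigr|\,\ln\frac{1}{u}, \qquad u \sim U(0,1).
\end{aligned}
```

The sign in the last line is chosen at random with equal probability. There is no velocity term: each new position is sampled around the attractor $p_{i,j}$, which is how the probabilistic description in the abstract turns into a sampling rule.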

15:20-15:40, Paper WeCT13.2
Analyzing the Correlation between Sleep Duration and Heart Rate Variability Index Using ECG Big Data

Yuda, Emi	Tohoku University
Yoshida, Yutaka	Tohoku University
Keywords: Cybernetics for Informatics, Computational Life Science, Big Data Computing Abstract: The purpose of this research is to clarify the influence on the lives and health of older adults. The influence of sleep time on heart rate variability (HRV) in older adults was investigated using ECG and 3-axis acceleration data collected from the Allostatic State Mapping by Ambulatory ECG Repository (ALLSTAR) big database. The subjects were older adults (over 65 y.o.), and the data were recorded from 2019 to March 2021. The number of subjects and mean age by year were 2019 (n=23547, 76 ± 6 y.o.), 2020 (n=25402, 76 ± 6 y.o.), and 2021 (n=6205, 76 ± 6 y.o.). Lying rate (%) was calculated from the 3-axis acceleration data, and subjects were grouped into long and short sleep time groups. HRV indices were calculated from 1-day ECG data. As a result, RRI increased (P<0.001), and the low-frequency power of ULF, VLF, and LF decreased significantly or showed a decreasing trend in the long sleep time group compared with the short sleep time group. A decrease in LF/HF was also observed. These results suggest that long hours of sleep may decrease HRV in older adults with low daily activity.

15:40-16:00, Paper WeCT13.3
A Taxonomic Classification and Identification System for Robots: Abstract

Isaka, Satoru	Vision Del Mar, LLC
Keywords: Robotic Systems, Modeling of Autonomous Systems, Trust in Autonomous Systems Abstract: This is an abstract article to report a new research result on the taxonomy of robots, introducing a general taxonomic classification and identification system with a coding scheme that assigns taxonomic identifiers to robots and autonomous systems. While systematics and taxonomies have been extensively studied in biology, there is no general taxonomy that can identify and classify man-made systems in general. Machines are increasingly becoming complex, diverse, and most importantly autonomous, and as they pervasively co-exist with humans and animals in nature, the absence of a unified classification system presents significant challenges in safety, communication, and effective development of future machines. The key contributions of this research are three-fold. First on analysis: it examines in depth the current industry standards and academic literature and identifies major issues in building a taxonomy. Second on synthesis: it establishes a general principle to classify broad classes of machines and builds a coding scheme for consistent naming, identification, and classification. Third on applications: it provides exemplifying use cases to demonstrate the utility of the proposed system. The result is a general taxonomic classification system that categorizes machines not just by their physical attributes, but from the principled perspectives of autonomy, ecology, and anthropogenic factors. This interdisciplinary approach offers a robust, inclusive taxonomic framework that encompasses a broad spectrum of machines, including software and learning systems.

16:00-16:20, Paper WeCT13.4
Adaptive Neural Networked Based Impedance Control of Mobile Manipulators with Uncertainties

Li, Bo-Nian	National Cheng Kung University
Yeh, Hua-Hsuan	National Cheng Kung University
Liu, Yen-Chen	National Cheng Kung University
Keywords: Robotic Systems, System Modeling and Control, Control of Uncertain Systems Abstract: This paper introduces a novel impedance controller for mobile manipulators that ensures safe interaction with uncertain environments without requiring force information. Dynamics uncertainties are compensated by employing adaptive radial basis function neural networks (RBFNN). Redundancy of mobile manipulators is managed to perform subtasks alongside tracking control without compromising performance. Stability analysis guarantees the uniform ultimate boundedness of tracking errors during interaction with the environment and asymptotic convergence to zero in free motion. Numerical simulation is provided to validate the efficacy of the proposed method.

16:20-16:40, Paper WeCT13.5
SMCS TEAM: Open Course on Cyber Physical Systems Foundation and Design for Unmanned Aerial Vehicles (UAVs)

Wan, Yan	University of Texas at Arlington
Fu, Shengli	University of North Texas
Xie, Junfei	San Diego State University
Lu, Kejie	University of Puerto Rico at Mayaguez
Keywords: Cyber-physical systems, Autonomous Vehicle, Control of Uncertain Systems Abstract: This abstract describes the project funded by the IEEE SMCS on Transforming Educational Assets and Materials (TEAM) in Systems, Man, and Cybernetics. The project develops an open course on Cyber Physical Systems (CPS) Foundation and Design for Unmanned Aerial Vehicles (UAVs). The course will be available to the public and serve the needs of researchers, students, and professionals who are interested in conducting UAV-related research. The open course contains integrated modules on control, communication and networking, computing, and artificial intelligence (AI) to provide trainees with the comprehensive knowledge needed for UAVs. The course is self-paced and contains quizzes in each module to help students assess the quality of their learning and to allow course designers to evaluate the effectiveness of the course materials for continuous improvement. The open course promotes CPS, which is an SMCS technical field. It will also attract students and professionals to the SMC community.

16:40-17:00, Paper WeCT13.6
Dynamic Capacitated Vehicle Routing Problem with Stochastic Requests Using Deep Reinforcement Learning

Tang, Kaiqiang	Nanjing University
Fu, Huiqiao	Nanjing University
Liu, Jiasheng	Nanjing University
Deng, Guizhou	Southwest University of Science and Technology
Lu, Yuanyang	Nanjing University
Chen, Chunlin	Nanjing University
Keywords: Autonomous Vehicle, Intelligent Transportation Systems Abstract: Delivery tasks are usually dynamic, and much information about customer and order requests is disclosed over time. In this work, we propose a deep reinforcement learning (DRL) model based on a dynamic attention network for Dynamic Capacitated Vehicle Routing Problem (CVRP) with Stochastic Requests (DCVRPSR), which extends the attention model from the original static CVRP task to a dynamic CVRP task with stochastic requests. With the dynamic encoder-decoder architecture, our proposed DRL model can track the changes in customer disclosure status in real time. Experimental results show that the proposed DRL model outperforms the state-of-the-art traditional algorithms LKH and OR-Tools in computational speed and solution quality.


WeCPSR	Room T14
Poster Presentation - Session 4	Poster Session

15:00-17:00, Paper WeCPSR.1
Towards Personalized Federated Learning Via Comprehensive Knowledge Distillation

Wang, Pengju	Chinese Academy of Sciences
Liu, Bochao	Chinese Academy of Sciences
Guo, Weijia	Chinese Academy of Sciences
Li, Yong	Chinese Academy of Sciences
Ge, Shiming	Chinese Academy of Sciences
Keywords: Machine Learning, Deep Learning Abstract: Federated learning is a distributed machine learning paradigm designed to protect data privacy. However, data heterogeneity across various clients results in catastrophic forgetting, where the model rapidly forgets previous knowledge while acquiring new knowledge. To address this challenge, personalized federated learning has emerged to customize a personalized model for each client. However, the inherent limitation of this mechanism is its excessive focus on personalization, potentially hindering the generalization of those models. In this paper, we present a novel personalized federated learning method that uses global and historical models as teachers and the local model as the student to facilitate comprehensive knowledge distillation. The historical model represents the local model from the last round of client training, containing historical personalized knowledge, while the global model represents the aggregated model from the last round of server aggregation, containing global generalized knowledge. By applying knowledge distillation, we effectively transfer global generalized knowledge and historical personalized knowledge to the local model, thus mitigating catastrophic forgetting and enhancing the general performance of personalized models. Extensive experimental results demonstrate the significant advantages of our method.
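
The two-teacher distillation idea in this abstract can be illustrated with a small sketch. The abstract does not give the loss function, so everything below is an assumption: a cross-entropy term on the true label plus two temperature-softened KL terms (one toward the global model, one toward the historical model), with hypothetical weights `alpha` and `beta` and a hypothetical `temperature`. It is not the authors' formulation.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, global_logits, historical_logits,
                      label, alpha=0.5, beta=0.5, temperature=2.0):
    """Assumed loss: cross-entropy plus two teacher-to-student KL terms.

    alpha, beta, and temperature are illustrative hyperparameters,
    not values taken from the paper.
    """
    cross_entropy = -math.log(softmax(student_logits)[label])
    student_soft = softmax(student_logits, temperature)
    kd_global = kl_divergence(softmax(global_logits, temperature), student_soft)
    kd_hist = kl_divergence(softmax(historical_logits, temperature), student_soft)
    return cross_entropy + alpha * kd_global + beta * kd_hist
```

When the student already matches both teachers, the two KL terms vanish and only the cross-entropy term remains; any disagreement with either teacher increases the loss.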

15:00-17:00, Paper WeCPSR.2
Balanced Subgoals Generation in Hierarchical Reinforcement Learning

Tong, Sifeng	Soochow University
Liu, Quan	School of Computer Science and Technology, Soochow University, S
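Keywords: Machine Learning, Deep Learning, AI and Applications Abstract: Hierarchical reinforcement learning (HRL) has shown potential in scaling reinforcement learning methods. However, it struggles when scaling to tasks with sparse extrinsic rewards, where it suffers from inefficient exploration and severe non-stationarity. In this paper, we propose a novel HRL method in which we design two measures for subgoals: balance and potential. By generating new subgoals and extending them toward more promising regions, we improve exploration efficiency while maintaining balance in the process. Finally, leveraging these two measures, we adopt an active exploration policy that avoids introducing low-level intrinsic rewards, which naturally avoids the non-stationarity problem. Our experiments show that our method outperforms current state-of-the-art HRL baselines on MuJoCo tasks with sparse rewards.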

15:00-17:00, Paper WeCPSR.3
A Multiform Evolutionary Approach with Enhanced Diversity and Opposition-Based Learning for Multi-Objective Feature Selection

Li, GaoHui	Sun Yat-Sen University
Chen, Zefeng	Sun Yat-Sen University
Zhou, Yuren	Sun Yat-Sen University
Keywords: Evolutionary Computation, Machine Learning Abstract: Recently, there has been extensive research on multi-objective feature selection (FS). In general, minimizing the number of selected features and maximizing classification performance are two primary objectives of most multi-objective FS tasks. However, due to complex interactions among features, a subset with poor classification performance does not necessarily imply uselessness; for instance, some features, when combined with others, can significantly enhance classification performance. Therefore, not only the classification performance of feature subsets but also their diversity in the search space should be considered. Motivated by this, this paper proposes an evolutionary multitasking based optimization framework, named multi-objective feature selection with enhanced diversity and opposition-based learning. Firstly, it utilizes two different forms of single-objective FS tasks to provide feature subsets with better classification performance and greater diversity for the multi-objective FS task at hand, thereby promoting evolutionary search in multi-objective FS tasks. Then, to explore more feature combinations, a novel crossover operator utilizing opposition-based learning is proposed to search for more promising feature subsets. Additionally, a duplicate solution handling approach is proposed to maintain population diversity. Experimental results on ten classification datasets demonstrate that the proposed method outperforms five state-of-the-art FS methods in most cases.

15:00-17:00, Paper WeCPSR.4
A Dynamic Weight Optimization Strategy Based on Momentum Method

Wei, Lin	Zhengzhou University
Wang, Qilong	Zhengzhou University
Lian, Huijuan	Zhengzhou University
Shi, Lei	Zhengzhou University, School of Cyber Science and Engineering
Yuan, Shaohua	Zhengzhou University
Keywords: Machine Learning, Deep Learning, Neural Networks and their Applications Abstract: Federated learning is an emerging machine learning framework, which is commonly used in distributed machine learning due to its characteristic of “data immutable model motion”. In practical scenarios, the data samples and hardware conditions of clients are highly heterogeneous. Traditional simple aggregation can cause the global model to unintentionally favor certain clients, leaving a significant performance gap in the global model between vulnerable groups and groups with richer training resources. This paper proposes Dynamic Momentum-based Federated Learning (DMFL) to address this issue. In each round, it dynamically adjusts the client aggregation weights based on historical performance and current-round losses. Experimental results show that DMFL can improve the effectiveness of the overall model while reducing the variance of the client accuracy distribution. Compared to existing baselines, the proposed algorithm achieves superior fairness.

15:00-17:00, Paper WeCPSR.5
Prediction of Ship Operation Time at Bulk Cargo Terminals Using Stacking Ensemble Learning

Zhang, Wei	Shandong University of Science and Technology
Qi, Liang	Shandong University of Science and Technology
Zhang, Weili	Qinggang International Co., LTD
Xue, Song	Qinggang International Co., LTD
Guo, Xiwang	Liaoning Petrochemical University
Keywords: Machine Learning, Big Data Computing, Application of Artificial Intelligence Abstract: Ship operation time is a crucial factor in developing berth plans. While most existing research focuses on container terminals, few scholars have examined bulk terminals due to the unique nature of their cargo and complex operational processes. This work proposes a berth characteristic classification-based prediction method (BCCPM) to predict the operation time of ships in bulk cargo terminals. Firstly, recognizing both the similarities and differences among berths in bulk cargo terminals, this paper introduces a berth clustering method based on K-means (BCMK) to group berths with similar operational traits. Then, to overcome the limitations of individual machine learning models in prediction, a stacking ensemble learning approach (SELA) is proposed for predicting ship operation time in various types of berths. Experiments are conducted on the real operation data from Qingdao dry bulk cargo terminal, China. The results show that SELA outperforms single machine learning models in terms of prediction accuracy and generalization. Moreover, BCCPM effectively captures operational nuances of different berths, resulting in a 2-hour reduction in MAE and a 6% decrease in MAPE compared to the overall prediction method.

15:00-17:00, Paper WeCPSR.6
PTGFI: A Prompt-Based Two-Stage Generative Framework for Function Name Inference

Wang, Menglu	Qilu University of Technology (Shandong Academy of Sciences)
Han, Xiaohui	Qilu University of Technology (Shandong Academy of Sciences)
Wang, Peipei	Qilu University of Technology (Shandong Academy of Sciences)
Zuo, Wenbo	Qilu University of Technology (Shandong Academy of Sciences)
Keywords: Deep Learning, Machine Learning, Artificial Social Intelligence Abstract: In the field of cybersecurity, analyzing malicious software or programs is crucial for preventing network attacks. Malicious code often exists in a stripped binary form to thwart analysis, presenting challenges for analysts. This study investigates inferring function names from stripped binary to aid security researchers in analyzing malicious code. We propose PTGFI, a Prompt-based Two-stage Generative framework for Function name Inference. The PTGFI framework transforms the task of inferring function names into a two-stage semantic generation problem. By capturing function descriptions of assembly functions and introducing prompt learning, effective inference of function names is achieved. In experiments, PTGFI outperforms the state-of-the-art model by 2.96% in precision. Moreover, ablation studies demonstrate the effectiveness of advanced components within the PTGFI framework. We further validate the utility and reliability of function names generated by the PTGFI framework through case studies.

15:00-17:00, Paper WeCPSR.7
Rescue Operators’ Perspectives on KIRETT Wearable Technology: A Qualitative Study

Nadeem, Mubaris	University of Siegen
Zenkert, Johannes	University of Siegen
Bender, Lisa	University of Siegen
Weber, Christian	University of Siegen
Fathi, Madjid	Institute of Based System & Knowledge Management
Keywords: Medical Informatics, Wearable Computing, User Interface Design Abstract: In emergencies, treatment needs to be fast, accurate and patient-specific. For instance, in emergency scenarios, obstacles like treatment environments and medical difficulties can lead to bad outcomes for patients. Additionally, a drastic change in vital signs can force paramedics to switch to a different treatment while treating the patient in order to save the patient’s life. The KIRETT (engl.: ‘Artificial intelligence in rescue operations’) demonstrator is developed to provide a rescue operator with a wrist-worn device that enables treatment recommendations (with the help of a knowledge graph) together with situation detection models to improve the emergency treatment of a patient. This paper aims to provide a qualitative evaluation of the two-day testing in the KIRETT project, with a focus on knowledge graphs, knowledge fusion, and user-experience design (UX design).

15:00-17:00, Paper WeCPSR.8
Real-Time Person-Following Robot: Front-Following Using Human Motion Prediction

Wang, Ansheng	The University of Tokyo
Makino, Yasutoshi	The University of Tokyo
Shinoda, Hiroyuki	The University of Tokyo
Keywords: Human-Collaborative Robotics, Human-centered Learning, Human-Machine Interaction Abstract: Many studies have been conducted on companion robots that follow behind a human leader; however, this strategy puts the robot out of sight of the person it is accompanying. To stay within sight, the robot needs to follow the leader from a different position. This paper presents a front-following system for an autonomous mobile robot using a Kinect sensor. Research effort is concentrated on control of the robot, which walks in front of the human leader. For a general human-following system, especially for front-following, both localization of the robot and the prediction of the human’s motion and state position are necessary. However, the framework proposed in this study uses a machine-learning-based prediction system to direct the robot ahead of the human without the need for robot localization. The proposed human motion prediction neural network can predict the 3D coordinates of a human walking behind the robot and, when combined with a proportional-integral–differential controller to control robot movement, enables accurate following for turning angles up to 100°. Since robot localization is not required, only one Kinect sensor is needed. The front-following system is validated via both simulation and real-time experiments, demonstrating overall success in front-following for wide and narrow spaces.

15:00-17:00, Paper WeCPSR.9
An Incremental Remaining Useful Life Prediction Method Based on Wasserstein GAN and Knowledge Distillation

He, Xiaorui	Tongji University
Ding, Chen	Tongji University
Qiao, Fei	Tongji University
Shi, Jiaxuan	Tongji University
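Keywords: Fault Monitoring and Diagnosis Abstract: Accurate and timely estimation of the remaining useful life (RUL) of equipment under different operating conditions helps to maintain equipment proactively and to prevent failures that may cause economic losses and casualties. This paper proposes a task-incremental RUL prediction method based on Wasserstein GAN with gradient penalty and knowledge distillation (WGAN-KD) to achieve accurate and fast prediction. WGAN-KD develops a dual old-task retention model to ensure the retention of old tasks while facilitating the acquisition of new tasks. To evaluate the performance of WGAN-KD, several experiments are conducted on rolling bearings under different working conditions. The experimental results show that WGAN-KD outperforms the compared incremental learning methods in accuracy under different working conditions. Moreover, compared with batch learning methods, it improves training efficiency while maintaining high prediction accuracy.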

15:00-17:00, Paper WeCPSR.10
Deep Attention Driven Reinforcement Learning (DAD-RL) for Autonomous Decision-Making in Dynamic Environment

Chowdhury, Jayabrata	Indian Institute of Science, Bangalore
Shivaraman, Venkataramanan	Indian Institute of Science, Bangalore
Dangi, Sumit	Indian Institute of Science Education and Research, Bhopal
Sundaram, Suresh	Indian Institute of Science
Baliyarasimhuni, Sujit	IISER Bhopal
Keywords: Autonomous Vehicle, Intelligent Transportation Systems Abstract: Autonomous Vehicle (AV) decision-making in urban environments is inherently challenging due to the dynamic interactions with surrounding vehicles. The varying importance of these interactions for AV safety necessitates adaptive approaches. In this work, we propose the Deep Attention Driven Reinforcement Learning (DAD-RL) method, which dynamically incorporates the significance of surrounding vehicles into the state space representation for RL-based AV decision-making. Our approach introduces an AV-centric spatio-temporal attention encoding mechanism that learns the dynamic interactions with different surrounding vehicles. Since contextual information is also important, the context encoder extracts features from context maps. The spatio-temporal representations combined with contextual encoding provide a comprehensive state representation. The resulting model is trained using the Soft-Actor Critic (SAC) algorithm. Evaluation on SMARTS urban benchmarking scenarios without traffic signals demonstrates that DAD-RL outperforms recent state-of-the-art methods. Furthermore, an ablation study underscores the importance of both the context-encoder and spatio-temporal attention encoder in achieving superior performance.

15:00-17:00, Paper WeCPSR.11
Hybrid Variable Neighborhood Search Algorithm for the Multi-Objective Distributed Permutation Flowshop Scheduling Problem with Sequence-Dependent Setup Times

She, Mingzhe	Tongji University
Qiao, Fei	Tongji University
Ma, Yumin	Tongji University
Wang, Junkai	Tongji University
Ai, Jiakang	Tongji University
Liu, Juan	Tongji University
Keywords: Intelligent Green Production Systems, Manufacturing Automation and Systems Abstract: This paper addresses the multi-objective distributed permutation flowshop scheduling problem with sequence-dependent setup times (MODPFSP_SDST), whose optimization objectives are makespan, the total energy consumption and the noise emission. An effective hybrid variable neighborhood search (HVNS) algorithm is proposed for solving this problem. HVNS utilizes a probabilistic matrix to store structural information of the promising solution, and then employs matrix-based search operators to execute variable neighborhood search. Additionally, the objective-oriented local intensification search is integrated to further enhance solution quality. Simulations and comparisons demonstrate the effectiveness of HVNS in solving the MODPFSP_SDST.

15:00-17:00, Paper WeCPSR.12
An eBPF-Empowered Congestion Control System with Delay Requirements

Pan, Wenqi	Tongji University
Xu, Yuedong	Fudan University
Wang, Chenhao	Fudan University
Wu, Jun	Fudan University
Keywords: Intelligent Transportation Systems, Adaptive Systems Abstract: The rapid development of new communication applications such as virtual reality and video conferencing has brought new challenges to congestion control algorithms. In particular, these applications have specific requirements in terms of delay. Meeting specific delay requirements without high throughput loss is difficult, especially in dynamic networks. In addition, it is important that the proposed congestion control algorithms can be easily deployed. In this paper, we propose a congestion control algorithm, namely TD-BBR, to meet the delay requirements of different applications. TD-BBR is built on BBR and can adapt to various network environments without high throughput loss. We employ an online algorithm based on the recursive least squares method to predict future bandwidth. We design a simple and effective algorithm to adjust the congestion window (CWND) to meet the specific delay requirements according to the predicted bandwidth and the distance between the current delay and the target delay. We implement a real congestion control system through extended Berkeley Packet Filter (eBPF) technology and have deployed it in the Linux kernel without recompiling the kernel. Extensive experiments show that TD-BBR can effectively meet different delay requirements in most cases, decrease the 95th percentile delay, and avoid high throughput loss compared to other congestion control algorithms.
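
For readers unfamiliar with recursive least squares (RLS), the following is a minimal sketch of online prediction with a forgetting factor. The abstract does not describe TD-BBR's feature design, so the model here is an assumption chosen for illustration: the next bandwidth sample is regressed on the previous sample plus a bias term. The class name, the helper `predict_next_bandwidth`, and the parameters `forgetting` and `delta` are hypothetical and are not taken from the paper.

```python
class RecursiveLeastSquares:
    """Online least squares for y ~ w . x with exponential forgetting."""

    def __init__(self, dim, forgetting=0.98, delta=1000.0):
        self.lam = forgetting
        self.w = [0.0] * dim
        # P tracks the inverse of the weighted input correlation matrix.
        self.P = [[delta if i == j else 0.0 for j in range(dim)]
                  for i in range(dim)]

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, y):
        """Incorporate one (x, y) observation; return the a priori error."""
        n = len(x)
        Px = [sum(self.P[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = self.lam + sum(x[i] * Px[i] for i in range(n))
        gain = [v / denom for v in Px]
        err = y - self.predict(x)
        self.w = [wi + g * err for wi, g in zip(self.w, gain)]
        # P <- (P - gain * x^T P) / lambda
        xP = [sum(x[i] * self.P[i][j] for i in range(n)) for j in range(n)]
        self.P = [[(self.P[i][j] - gain[i] * xP[j]) / self.lam
                   for j in range(n)] for i in range(n)]
        return err


def predict_next_bandwidth(samples, forgetting=0.98):
    """Assumed model: b[t] ~ w0 * b[t-1] + w1, fitted online by RLS."""
    rls = RecursiveLeastSquares(dim=2, forgetting=forgetting)
    for prev, cur in zip(samples, samples[1:]):
        rls.update([prev, 1.0], cur)
    return rls.predict([samples[-1], 1.0])
```

A forgetting factor below 1 discounts older samples, which is the usual reason to prefer RLS over a batch fit when the underlying signal drifts over time.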

15:00-17:00, Paper WeCPSR.13
Multitask Routing Design for Load-Balancing Satellite Networks

Wang, Xiaoling	Tongji University
Fan, Zheng	Tongji University
Zhao, Shibing	Tongji University
Deng, Qi	Tongji University
Kang, Qi	Tongji University
Dou, Xiao	Tongji University
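Keywords: Intelligent Transportation Systems Abstract: Owing to their transmission and low-latency characteristics, space-air-ground (SAG) integrated networks have attracted wide attention. To improve quality of service, we need to consider how to plan the routes of multiple communication tasks at the same time. Based on multitask optimization, we propose a novel framework that deeply mines and exploits evolutionary experience to improve routing efficiency. In the proposed framework, the population of each task is divided into two parts, namely a better-performing subpopulation and a worse-performing subpopulation. Then, a random exchange strategy is designed for the former, and an inter-task knowledge transfer strategy is proposed for the latter to explore more promising regions and improve population diversity. Experimental results show that the proposed framework can solve many communication tasks simultaneously and allocate network resources reasonably.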

15:00-17:00, Paper WeCPSR.14
Generative AI Impact on the Future of Work: Insights from Software Development

da Conceição Lima, Caroline	Universidade Federal Do Rio De Janeiro
Salazar, Herbert	Universidade Federal Do Rio De Janeiro
Lima, Yuri	UFRJ
Barbosa, Carlos Eduardo	Universidade Federal Do Rio De Janeiro
Argôlo, Matheus	Universidade Federal Do Rio De Janeiro
Lyra, Alan	UFRJ
Souza, Jano	Federal University of Rio De Janeiro
Keywords: Technology Assessment Abstract: Recent Artificial Intelligence advancements raise concerns about the future of work, particularly technological unemployment. Studies show automation's impact, but tools like ChatGPT disrupt even traditionally secure professions like programming. In this study, we reevaluate AI's effects using a method to assess the impact of Generative AI technologies on occupations to understand the potential effects of Generative AI systems on software development work. Valuable insights were obtained by gathering the view of a group of workers, primarily composed of developers who are starting their careers, regarding the impact of these technologies on the tasks they perform to provide a comprehensive understanding of the implications of Generative AI for software development. Results show that all programming tasks performed by these workers would experience some impact by Generative AI -- 65% of the tasks being considerably impacted, 12% moderately impacted, and 18% minimally impacted. This analysis highlights the substantial influence of Generative AI technologies on software development, mainly affecting those in the early stages of their career. The results of this work contribute to the academic community with valuable information. Policymakers can also use this information, as this work provides a comprehensive view of the impacts of Generative AI on software developers, considering their direct impact on job tasks.

15:00-17:00, Paper WeCPSR.15
Integrating Private and Accountable Threshold Signature into Ciphertext-Policy Attribute-Based Encryption Supporting Collaboration Decryption

Chen, Meixin	East China Normal University
Keywords: Technology Assessment, Communications, Cooperative Systems and Control Abstract: Cloud services are becoming increasingly popular in social life. Collaboration in the cloud is critical for teams in different locations and organizations. However, the current collaborative decryption scheme requires all members to be online in order to decrypt, which causes offline members to affect the progress of the project. In this paper, we propose a more flexible and reliable collaborative decryption solution for team collaboration scenarios in cloud services. Our solution, based on the Threshold Accountable Private Signature (TAPS) framework, offers higher security, practicality, and lower overhead. Unlike the existing scheme, our approach allows for decryption when the number of online members reaches a certain threshold. Additionally, our framework can track individual members instead of all members, ensuring continuity of collaboration. To demonstrate the practicality and efficiency of our framework, we implement and verify its usefulness through experiments using the Rust programming language.

15:00-17:00, Paper WeCPSR.16
Joint Awareness and Congestion Control in Vehicular Networks

Chen, Ying	Shandong University of Science and Technology
Zhang, Fuxin	Shandong University of Science and Technology
Keywords: Communications, Intelligent Transportation Systems Abstract: In vehicular networks, vehicles frequently broadcast their state information so that neighbors can track their movement. A large number of vehicles access the shared channel resources to broadcast their state information, which may lead to channel congestion. Existing channel congestion solutions mainly focus on the state of the media access control layer to optimize channel resource utilization, without considering the impact of the physical layer on successful packet reception. In this article, we propose a packet reception model to represent the successful reception rate of state packets under the influence of interference signals and noise, and design an application-specific utility function. This utility considers the states of the physical layer and the media access control layer to balance reducing physical layer interference and meeting application requirements. We then propose a distributed joint power and rate control algorithm that allocates channel resources on demand to meet the safety application requirements of each vehicle. The simulation results show that our work can effectively prevent channel congestion and improve the safety performance of vehicles in different driving scenarios.

15:00-17:00, Paper WeCPSR.17
Cross-Domain Transfer Learning Using Attention Latent Features for Multi-Agent Trajectory Prediction

Loh, Jia Quan	Monash University Malaysia
Ding, Fan	Monash University
Luo, Xuewen	Monash University
Tew, Hwa Hui	Monash University Malaysia
Loo, Junn Yong	Monash University Malaysia
Ding, Ze Yang	Monash University Malaysia
Susilawati, Susilawati	Monash University Malaysia
Tan, Chee Pin	Monash University
Keywords: Intelligent Transportation Systems, Autonomous Vehicle, Smart Buildings, Smart Cities and Infrastructures Abstract: With the advancements of sensor hardware, traffic infrastructure and deep learning architectures, trajectory prediction of vehicles has established a solid foundation in intelligent transportation systems. However, existing solutions are often tailored to specific traffic networks at particular time periods. Consequently, deep learning models trained on one network may struggle to generalize effectively to unseen networks. To address this, we propose a novel spatial-temporal trajectory prediction framework that performs cross-domain adaptation on the attention representation of a Transformer-based model. A graph convolutional network is also integrated to construct dynamic graph feature embeddings that accurately model the complex spatial-temporal interactions between the multi-agent vehicles across multiple traffic domains. The proposed framework is validated on two case studies involving the cross-city and cross-period settings. Experimental results show that our proposed framework achieves superior trajectory prediction and domain adaptation performance over the state-of-the-art models.

15:00-17:00, Paper WeCPSR.18
MGS-Net: Fusing Global and Local Feature Enhancements for Healthcare Education Management of Myasthenia Gravis Using Speech Data (I)

Tang, Jing	Beijing University of Technology
Li, Jianqiang	Beijing University of Technology
Zou, Jingchen	Beijing University of Technology
Huang, Yuning	Beijing University of Technology
Ding, Shujie	Beijing University of Technology
Zhao, Linna	Beijing University of Technology
Xu, Xi	Beijing University of Technology
Keywords: Medical Informatics, Assistive Technology Abstract: Myasthenia gravis (MG) is a neurological disease that is difficult to diagnose and requires long-term management. The progression of this disease is reflected to some extent in changes in speech, such as hoarseness and articulation disorders. However, it is difficult for general neurologists to grasp the diagnostic patterns of such rare diseases, especially in underdeveloped regions. As an emerging field, speech-based intelligent diagnostic assistance provides a safe, non-invasive, and convenient solution for healthcare education management. To this end, we firstly constructed a novel Chinese speech dataset of myasthenia gravis patients (MGCS). Then we proposed a network named Myasthenia Gravis Speech Net (MGS-Net) for the classification of myasthenia gravis pathological speech, which is mainly composed of two blocks: the Local Feature Enhancement (LFE) block and the Feedforward Dense (FFD) block. The LFE block extracts temporal local features using a sliding window approach, while the FFD block captures the global representation of the data. Compared to existing methods, our pipeline achieves an accuracy of 98.75% and a recall rate of 99.17%. We validated the effectiveness of existing acoustic feature sets in pathological speech classification of MG, which will provide an important tool for health education management of neurological diseases.

15:00-17:00, Paper WeCPSR.19
Sustainable Energy Planning for Community Microgrids Considering Economic, Environmental, and Resilience Factors (I)

Uddin, Moslem	The University of New South Wales
Mo, Huadong	University of New South Wales
Dong, Daoyi	Australian National University
Keywords: System Modeling and Control Abstract: This study presents a framework for sustainable energy planning of community microgrids (MGs), integrating optimal design and decision-support tools. A rural community in New South Wales, Australia, is considered as a case study for this investigation. The proposed microgrid framework is evaluated based on economic viability, environmental sustainability, and community resilience. The economic analysis reveals an attractive net present cost of 3.26 million over the MG’s 25-year lifetime, with a competitive levelized cost of energy of 0.196 per kWh. The environmental impact assessment quantifies a significant reduction of 394.429 tonnes of CO2-equivalent greenhouse gas emissions annually through the integration of 200 kW of solar photovoltaic and 258 kW of wind turbines. The resilience assessment demonstrates a high energy reliability with zero unmet loads facilitated by backup systems and decision-making tools. The findings contribute to the field of sustainable energy planning by providing a comprehensive and integrated approach that addresses the complex interplay of economic, environmental, and resilience factors in the context of community MGs.


WeDT1	MR01
Cybernetics and Quantum Systems 6
Chair: Kaji, Hirotaka	Toyota Motor Corporation

17:30-17:50, Paper WeDT1.1
Graph Attention Networks for Invisible Attack Identification in Smart Grids

Jiang, Fu	Central South University
Wu, Hui	Central South University
Tang, Yihan	Central South University
Liu, Weirong	Central South University
Ren, Wanwan	Central South University
Yang, Yingze	Central South University
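Keywords: Machine Learning Abstract: With the integration of advanced communication and sensor technologies, traditional power grids are transitioning to automated smart grids, which also heightens the risk of cyberattacks. This paper proposes a spatio-temporal graph attention network to improve invisible attack detection in smart grids. First, graph theory and grid knowledge are used for modeling in order to learn the graph structure and spatial features. Second, hidden features are extracted with a graph attention mechanism to predict the future behavior of nodes. Third, a graph deviation score is computed from the node deviations learned end-to-end, and this score serves as the criterion for identifying attacks. Simulation results show that our method identifies attacks more accurately than baseline methods.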

17:50-18:10, Paper WeDT1.2
User Clustering for Pairwise Comparison with Missing Values Using Bradley-Terry Model and Consensus Clustering

Kaji, Hirotaka	Toyota Motor Corporation
Keywords: Machine Learning Abstract: As a method of evaluating user preferences for items such as products and designs, the pairwise comparison method, where the user chooses which of two items is preferable, is often employed. Since the preference for items varies among users, clustering methods such as k-means can be applied to capture these tendencies. However, as the number of items increases, the number of comparisons also increases. Therefore, missing values occur when users do not answer due to fatigue, or when experimenters reduce the number of comparisons in advance. The net-win vector, which counts the number of wins for each item against others, has robustness for clustering with missing values. In this paper, we propose a multiple imputation method for clustering pairwise comparison data. Our approach combines the net-win vector with missing-value imputation using the Bradley-Terry model and consensus clustering to integrate multiple clustering results. Furthermore, we attempt to improve performance by iteration between missing value imputation and clustering. The effectiveness of the proposed method is demonstrated through synthetic and benchmark problems.
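The imputation idea in the abstract above can be illustrated with a minimal sketch. It assumes a standard Bradley-Terry formulation, where item i is preferred over item j with probability p_i / (p_i + p_j), and fits the strengths with a common minorization-maximization update. The function names, the toy data, and the choice of fitting scheme are illustrative assumptions, not the author's implementation.

```python
def fit_bradley_terry(wins, n_items, iters=200):
    """Estimate Bradley-Terry strengths from observed comparisons.

    wins[(i, j)] is the number of times item i was preferred over item j.
    Pairs that were never compared are simply absent (missing values).
    Hypothetical helper for illustration only.
    """
    p = [1.0] * n_items
    for _ in range(iters):
        new_p = []
        for i in range(n_items):
            # Total number of wins recorded for item i.
            w_i = sum(c for (a, _b), c in wins.items() if a == i)
            denom = 0.0
            for j in range(n_items):
                if j == i:
                    continue
                n_ij = wins.get((i, j), 0) + wins.get((j, i), 0)
                if n_ij:
                    denom += n_ij / (p[i] + p[j])
            new_p.append(w_i / denom if denom > 0 else p[i])
        total = sum(new_p)
        p = [x / total for x in new_p]  # normalize so strengths sum to 1
    return p


def win_probability(p, i, j):
    """Model probability that item i is preferred over item j."""
    return p[i] / (p[i] + p[j])


# Toy data: items 0 and 2 were never compared directly (a missing value).
observed = {(0, 1): 3, (1, 0): 1, (1, 2): 3, (2, 1): 1}
strengths = fit_bradley_terry(observed, n_items=3)
imputed = win_probability(strengths, 0, 2)  # estimate for the missing pair
```

In this toy case the missing comparison between items 0 and 2 is filled in from the fitted strengths; a multiple-imputation scheme would sample outcomes from such probabilities several times and cluster each completed dataset.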

18:10-18:30, Paper WeDT1.3
Quantum Robust Control for Time-Varying Noises Based on Adversarial Learning (I)

Ji, Haotian	University of Science and Technology of China
Kuang, Sen	University of Science and Technology of China
Dong, Daoyi	Australian National University
Chen, Chunlin	Nanjing University
Keywords: Quantum Cybernetics Abstract: Time-varying noises are one of the reasons that make it difficult for quantum systems to complete control tasks. How to quantify the influence of time-varying noises on control results and how to design a control law that can resist time-varying noises are two important problems. In this paper, the adversarial learning is introduced into quantum control and the loss function under the worst-case noise is used as a way to quantify the impact of time-varying noises on control performance. We utilize the Gradient Ascent Pulse Engineering (GRAPE) technique to search the worst-case noise and meanwhile offer a strategy to improve the robustness of the control law. Simulation experiments on a two-qubit system and a four-qubit system show that the found noises indeed can act as worst-case noises. Furthermore, the optimized control laws demonstrate good robustness to time-varying noises in state preparation tasks.


WeDT2	MR02
Big Data and Analytics
Chair: Nomura, Ryo	Waseda University

17:30-17:50, Paper WeDT2.1
Information Spectrum Approach to Binary Hypothesis Testing with Unknown Parameters

Nomura, Ryo	Waseda University
Keywords: Big Data Computing, Machine Learning, Computational Intelligence in Information Abstract: In the field of information theory, the optimum hypothesis testing exponent, which is defined as the maximum exponent of the type II error probability under the condition that the type I error probability is smaller than or equal to some constant, has been analyzed in several settings. In particular, information spectrum methods, which are among the efficient techniques in information theory, have been applied to the hypothesis testing problem and have succeeded in giving the general formula of the optimum hypothesis testing exponent. Recently, two typical extensions of the binary hypothesis testing setting have received much attention. One is the case where we do not know the probability distributions. The other is the hypothesis testing problem in the presence of noise. However, information spectrum methods have not yet been applied to hypothesis testing in these two directions. Hence, in this paper we develop information spectrum methods to treat hypothesis testing in these settings. In particular, we first consider hypothesis testing with noise and show the optimum exponent of the type II error probability under the condition that the type I error probability is smaller than or equal to some constant. Then, we extend this result to the case where the probability distributions of data are unknown.

17:50-18:10, Paper WeDT2.2
Rumor Detection Based on Macro-Micro Public Opinion Modeling and Sentiment Scores

Li Li, Li Li	Chongqing University
Hu, Jingyang	University
Fu, Shihao	Chongqing University
Xin, Xing	Chongqing Academy of Metrology and Quality Inspection
Zhou, Wei	Chongqing University
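Keywords: Artificial Social Intelligence, Big Data Computing, Computational Intelligence in Information Abstract: Research on how information changes during news propagation has begun to attract attention, but uncovering the characteristics of actual rumor propagation remains challenging. Existing propagation-based rumor detection methods mainly learn graph structure information or time-series structure as propagation features. However, these strategies do not fully consider which features should be addressed at the macro level and which at the micro level. Moreover, the fundamental manifestation of news propagation, public opinion, has not received sufficient attention. We propose a Macro-Micro Public Opinion (MMPO) modeling approach to address these gaps. This approach enables a better understanding of news propagation on social media platforms. First, we incorporate post sentiment as a crucial feature. Second, we study how micro-level propagation affects sub-posts. Next, we examine the temporal changes in public opinion from a macro perspective. Finally, we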

18:10-18:30, Paper WeDT2.3
Online Anomaly Detection for Streaming Data in the Presence of Missing Values (I)

Xu, Xinyao	Tianjin University of Technology
Liu, Mengna	Tianjin University of Technology
Cheng, Xu	Norwegian University of Science and Technology
Zhang, Jianhua	Tianjin University of Technology
Song, Lei	Chinese Academy of Sciences
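Keywords: Big Data Computing Abstract: Online anomaly detection is a key area of data analysis, especially for handling dynamic data streams and addressing the challenge of concept drift. Although current online anomaly detection methods have achieved significant breakthroughs, creating a system that can learn continuously and effectively in the presence of missing data remains a formidable challenge. In this paper, we introduce an autoencoder-based online deep anomaly detection model that addresses both missing data and concept drift. The model features a lightweight module specifically designed for efficient missing-value handling. In addition, it integrates an adaptive model pool to manage the time-varying concept drift common in dynamic data streams. This flexible and dynamic management mechanism enables the model to adapt to changes in the data stream, thereby maintaining strong anomaly detection performance under various conditions. On high-dimensional datasets affected by concept drift, the 10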


WeDT3	MR03
Assistive and Companion Technology 2	Regular Papers - HMS
Chair: Noguchi, Takahiro	Tokyo Denki University

17:30-17:50, Paper WeDT3.1
Automatic Classification of Subjective Time Perception Using Multi-Modal Physiological Data of Air Traffic Controllers

Aust, Till	University of Konstanz
Balta, Eirini	Panteion University of Social and Political Sciences
Vatakis, Argiro	Panteion University of Social and Political Sciences
Hamann, Heiko	University of Konstanz
Keywords: Assistive Technology, Companion Technology, Human-Machine Cooperation and Systems Abstract: In high-pressure environments where human individuals must simultaneously monitor multiple entities, communicate effectively, and maintain intense focus, the perception of time becomes a critical factor influencing performance and well-being. One indicator of well-being can be the person's subjective time perception. In our project ChronoPilot, we aim to develop a device that modulates human subjective time perception. In this study, we present a method to automatically assess the subjective time perception of air traffic controllers, a group often faced with demanding conditions, using their physiological data and eleven state-of-the-art machine learning classifiers. The physiological data consist of photoplethysmogram, electrodermal activity, and temperature data. We find that the support vector classifier works best, with an accuracy of 79%, and that electrodermal activity provides the most descriptive biomarker. These findings are an important step towards closing the feedback loop of our ChronoPilot device to automatically modulate the user's subjective time perception. These technological advancements may promise improvements in task management, stress reduction, and overall productivity in high-stakes professions.

17:50-18:10, Paper WeDT3.2
Support Vector Machine and Random Forest Evaluation for Motion Intention Detection Using Surface EMG

Noguchi, Takahiro	Tokyo Denki University
Inoue, Jun	Tokyo Denki University
Keywords: Human-Machine Interface, Assistive Technology, Human-Machine Cooperation and Systems Abstract: Although the world's population is growing, the birth rate is currently declining and is expected to continue declining, resulting in an ageing population. Electric wheelchairs enhance social activities by expanding the mobility range of older people. However, the proportion of accidents involving electric mobility devices is high among people aged 70 years and above. This is because of decreased sensory and cognitive speeds and motor function. To compensate for these factors, we focused on surface electromyograms (sEMG) that are output before the start of movement. Using this signal, we applied machine learning to estimate the distance and direction of movement of a hand before the wheelchair control operation is completed. Multiple features, including frequency and magnitude sum, were determined from the measured sEMG, and two types of machine learning, support vector machine (SVM) and random forest (RF), were used to identify the distance and direction of movement. Three types of movement distances and four types of movement directions were identified. The feature importance was calculated, and the features contributing to the distance and direction discrimination were determined and compared for each machine learning method. The discrimination rate using RF was higher than that using SVM. The feature contribution and each participant's score were calculated for each feature. We focused on the feature contributions and the muscles used by each participant. In the RF model, the anterior deltoid was important for identifying the direction of movement, while the middle deltoid was important for identifying the movement distance. The findings of this study indicate that the features utilized in SVM for each subject exhibited less variation than those employed in RF. This suggests the potential for developing a model capable of generalizing across multiple participants. In future studies, we will focus on the features contributing to discrimination and create a classifier that can discriminate using a single model.

18:10-18:30, Paper WeDT3.3
Development of SNS Applications to Control Cyberbullying with Nudge Using Multilingual Translation

Hirano, Taichi	University of Tsukuba
Tanaka, Fumihide	University of Tsukuba
Keywords: Interactive and Digital Media, Assistive Technology, Networking and Decision-Making Abstract: Cyberbullying on social networking sites can cause harm by increasing the likelihood of suicidal thoughts and attempts and should be controlled. To control cyberbullying in the long term, it is necessary to change the attitudes and behaviors of individuals who engage in harmful communication. In the current study, we focused on positive affect (PA) and negative affect (NA), which are related to attitudes and behaviors. We developed two SNS applications (X-type and VRChat-type) that incorporate nudge using Multilingual Translation (NMT), which translates cyberbullying words into positive words in other languages. The process also exposes users to positive reactions from imaginary third parties who have seen these positive words. In this paper, we describe the development of two types of SNS applications and report the initial results of a pilot study to explore their potential utility.


WeDT4	MR04
Technology Assessment	Regular Papers - SSE


WeDT5	MR05
Cyber-Physical Systems and Robotics 4	Special Sessions: SSE
Chair: Yue, Yuangan	University of Science and Technology Beijing

17:30-17:50, Paper WeDT5.1
Vibration Control Using a Robust Input Shaper Via Extended Kalman Filter-Incorporated Residual Neural Network (I)

Yang, Weiyi	University of Chinese Academy of Sciences
Shuai, Li	University of Oulu, Technology Research Center of Finland (VTT)
Luo, Xin	Chinese Academy of Sciences
Keywords: System Modeling and Control, Robotic Systems, Soft Robotics Abstract: With the rapid development of industry, there is a growing concern over the vibration control challenges associated with flexible structures and underactuated systems. Input shaping technology enables stable performance for high-speed motion in industrial motion systems. However, existing input shapers commonly suffer from ineffective control performance because they ignore observation errors. To address this critical issue, this paper proposes an Extended Kalman Filter-incorporated Residual Neural Network-based input Shaping (ERS) model for vibration control. Its main ideas are two-fold: a) adopting an extended Kalman filter to address a vertical flexible beam's model errors; and b) adopting a residual neural network cascaded with the extended Kalman filter to eliminate the remaining observation errors. Detailed experiments on a real dataset collected from a vertical flexible beam demonstrate that the proposed ERS model achieves significantly better vibration control performance than several state-of-the-art models.

17:50-18:10, Paper WeDT5.2
ADRC Parameter Tuning of Flapping-Wing Robot Motor Based on Improved Lion Swarm Optimization (I)

Yue, Yuangan	University of Science and Technology Beijing
Wang, Zheqi	University of Science and Technology Beijing
Zou, Yao	University of Science and Technology Beijing
He, Wei	University of Science and Technology Beijing
Keywords: Robotic Systems Abstract: Flapping wing robots (FWRs) mainly use a crank-rod structure to transform the rotational motion of a permanent magnet synchronous motor (PMSM) into the flapping motion of the wings so as to generate lift and thrust. In this scenario, the motor load varies periodically and the external disturbances are intense. Active disturbance rejection control (ADRC) offers rapid response and strong anti-interference ability, which makes it well suited to driving the motor of a flapping wing robot. However, ADRC has many parameters, and the physical meanings of some of them are unclear, making it difficult to tune. To solve this problem, this paper proposes a New Lion Swarm Optimization algorithm (NLSO) that searches for the optimal ADRC parameters in the optimization space. Because traditional Lion Swarm Optimization algorithms are prone to converging to local optima, this paper adjusts the cub search strategy and adds a stray lion swarm to prevent the algorithm from falling into local optima and to improve convergence accuracy, and verifies the result on standard test functions. Finally, this paper applies NLSO to the parameter tuning of ADRC in a PMSM speed control system and compares it, through simulation, with traditional Lion Swarm Optimization and with a PMSM speed control system based on a PI controller. The results show that, compared to traditional lion swarm optimization algorithms, NLSO achieves better parameter tuning of ADRC, improves the response speed and anti-interference ability of the PMSM, and enables the PMSM to better adapt to its role as the driving motor for FWRs.

18:10-18:30, Paper WeDT5.3
An Improved Adaptive Moment Estimation Algorithm for the Industrial Robot Calibration (I)

Chen, Tinghui	Southwest University
Shuai, Li	University of Oulu, Technology Research Center of Finland (VTT)
Keywords: Robotic Systems, Soft Robotics, Adaptive Systems Abstract: Within the context of intelligent manufacturing, industrial robots have a pivotal function. Nonetheless, extended operational periods cause a decline in their absolute positioning accuracy, preventing them from meeting high-precision requirements. To address this issue, this paper presents a novel robot calibration algorithm that combines an adaptive and momental bound algorithm with decoupled weight decay (AdaModW), which rests on three ideas: a) adopting an adaptive moment estimation (Adam) algorithm to achieve a high convergence rate, b) introducing a hyperparameter into the Adam algorithm to define the length of memory, effectively addressing the issue of the abnormal learning rate, and c) introducing a weight decay coefficient to improve its generalization. Numerous experiments on an HRS-JR680 industrial robot show that the presented algorithm significantly outperforms state-of-the-art algorithms in robot calibration performance. Thus, in light of its reliability, this algorithm provides an efficient way to address robot calibration concerns.


WeDT6	MR06
Infrastructure Systems and Services 4
Chair: David, Beserra	EPITA

17:30-17:50, Paper WeDT6.1
Raspberry Pi Single-Board Computers: Cost/Performance Relationship Over Time

David, Beserra	EPITA
Endo, Patricia Takako	Universidade De Pernambuco
Clinckx, Louis	EPITA
Clement, Thomas	EPITA
Guisse, Boubacar	EPITA
Maugras, Alexandre	EPITA
Keywords: Infrastructure Systems and Services, Distributed Intelligent Systems, System Architecture Abstract: This study examines how the cost/performance ratio has changed within the Raspberry Pi family of computers, specifically the Raspberry Pi B and Raspberry Pi Zero lines. Building on previous analyses, our investigation covers all generations of the Raspberry Pi B and Zero lines available until January 2024. Prices are adjusted to the 2012 dollar value, aligning with the inaugural launch of the Raspberry Pi. The findings show a performance increase of around 229 times over an 11-year period, coupled with a notable decline in the cost per unit of performance. The impact of the dollar's depreciation since 2012 further accentuates these trends.

17:50-18:10, Paper WeDT6.2
Model Predictive Control with Recursive Multi-Step Input Convex Lipschitz Neural Networks: An Application to Smart Buildings (I)

Scarabaggio, Paolo	Politecnico Di Bari
Mignoni, Nicola	Politecnico Di Bari
Jan, Jantzen	Samso Energy Academy
Carli, Raffaele	Politecnico Di Bari
Dotoli, Mariagrazia	Politecnico Di Bari
Keywords: Smart Buildings, Smart Cities and Infrastructures, Intelligent Green Production Systems, System Modeling and Control Abstract: Model Predictive Control (MPC) is an optimal control technique that employs a dynamic model of the controlled process and an optimization algorithm to determine the control strategy. Nevertheless, the cost and effort required to create and maintain dynamical models are often high, and solving the resulting optimal control problem can be computationally complex. In recent years, data-driven modeling has become an attractive alternative to approximate the behavior of dynamical systems, with the aim of alleviating these issues. However, using such models for model-based control can be challenging due to their typically nonlinear and nonconvex nature. To address these issues, we propose a recursive multi-step learning-based dynamical modeling framework to capture the temporal behavior of dynamic systems. We take advantage of Input Convex Lipschitz Neural Networks, which are explicitly designed to be convex and continuous with respect to their inputs. We further show that these mathematical properties hold in a multi-step dynamical modeling framework. The proposed approach is evaluated in a real-life MPC experiment conducted in a smart building in the Samso Marina, Denmark. We show that the proposed approach keeps the internal temperature within comfort constraints while minimizing heating/cooling energy consumption.


WeDT7	MR07
Online - Autonomous Systems and Robotics
Chair: He, Jiaxing	Chongqing University of Technology

17:30-17:50, Paper WeDT7.1
News Session Recommendation Based on Long and Short-Term User Interest Representation and Multi-View Learning

He, Jiaxing	Chongqing University of Technology
Li, Liang	Chongqing University of Technology
Mao, Xingbin	Chongqing University of Technology
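Keywords: Deep Learning, Neural Networks and their Applications, Artificial Social Intelligence Abstract: Against the backdrop of the current surge in news information, personalized news recommendation is important for helping users quickly access content of interest. Existing methods often overlook the diversity of users' reading behavior and the variability of news topics within a session. They tend to assume that all news read or clicked in a session is attributable to one primary interest, whereas in reality each news session may cover a variety of news reflecting different interests that correspond to different topics. This paper introduces a news session recommendation method that leverages long-term and short-term representations of user interest and incorporates multi-view learning, treating the click behavior of anonymous users as sessions. Initially, a multi-scale convolutional neural network is used to capture, from multiple perspectives, the features of the news articles clicked by users. Subsequently, a GRU-based multi-channel interest disentangler is carefully designed to identify the latent interests in each clicked news item, thereby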

17:50-18:10, Paper WeDT7.2
FPRes-Net: Feature Pyramid-Based Residual Network for Alzheimer's Disease Diagnosis

Zhang, Yueheng	Qufu Normal University
Zhang, Xiaoshuang	Qufu Normal University
Li, Yaozu	Qufu Normal University
Wu, Jinfeng	Qufu Normal University
Liu, Jin-Xing	University of Health and Rehabilitation Sciences
Zheng, Chunhou	Anhui University
Cui, Xinchun	Qufu Normal University; University of Health and Rehabilitation S
Keywords: Deep Learning, Neural Networks and their Applications, Biometric Systems and Bioinformatics Abstract: Alzheimer's disease (AD) is a neurodegenerative disorder that progresses in a slow and irreversible manner. Although many computer-aided methods have been used to diagnose AD, the issue of underutilization of detailed information and features persists. In this study, we propose a new AD diagnostic network (FPRes-Net) that can fully learn the rich information of 3D MRI slices by extracting multi-scale features and feature fusion. Firstly, in order to fully extract multi-scale information, a network structure combining ResNet-50 with feature pyramids was designed. Next, a feature fusion method was designed to reduce noise and increase the importance of important features. Finally, a visually interpretable method called Gradient-weighted Class Activation Mapping (Grad-CAM) was introduced to visualize important feature regions in AD diagnosis. Experimental analysis was conducted on the publicly accessible ADNI-1 dataset, and our proposed FPRes-Net model performed better than other advanced research methods, with an accuracy rate of 99.5%. Our proposed model can be effectively used for clinical diagnosis of AD.

18:10-18:30, Paper WeDT7.3
An Attention-Enhanced Text Error Detection Model Based on Potential Miswritten Term Pre-Check Feature in Manufacturing

Xu, Nan	Shenyang Aerospace University
Wang, Peiyan	Shenyang Aerospace University
Keywords: Application of Artificial Intelligence Abstract: Manufacturing specifications contain a large number of terms, which makes general domain text error detection methods ineffective in manufacturing. Existing text error detection methods have a limited utilization of term dictionary, and the challenge lies in the inability to determine whether a word not in the term dictionary is an out-of-vocabulary word or a miswritten word. To tackle this challenge, we introduce a Miswritten Term Pre-check feature Matrix (MTP-Matrix), which leverages n-gram similarity matching method to pre-check potential miswritten terms. Besides, we capture the information about error positions and types of miswritten terms through adding Attention-enhanced Network with Local Feature (ANLF), which enhances the representation of miswritten terms. Experimental results demonstrate that the proposed MTP-Matrix and ANLF can improve the F1 values of all baselines. The highest F1 value reaches 76.84%, and the F1 value of term error detection increases by 6.09%.
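The pre-check idea in the abstract above, matching an unknown word against a term dictionary by n-gram similarity, can be illustrated with a minimal sketch. It assumes character bigrams and a Dice coefficient; the function names, the threshold, and the toy dictionary are illustrative assumptions, and the paper's MTP-Matrix may be constructed differently.

```python
def char_ngrams(text, n=2):
    """Return the set of character n-grams in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a, b, n=2):
    """Dice coefficient over character n-gram sets, in [0, 1]."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def precheck(word, term_dictionary, threshold=0.5, n=2):
    """Flag a word as a potential miswritten term.

    Returns (closest_term, score) when the word is not in the dictionary
    but is similar enough to one of its terms; otherwise returns None.
    The threshold value is an arbitrary choice for this sketch.
    """
    if word in term_dictionary:
        return None
    best = max(term_dictionary, key=lambda t: ngram_similarity(word, t, n))
    score = ngram_similarity(word, best, n)
    return (best, score) if score >= threshold else None


# Hypothetical term dictionary for illustration.
terms = ["bearing", "flange", "gasket"]
suspect = precheck("bearng", terms)   # close to "bearing", so it is flagged
unknown = precheck("widget", terms)   # dissimilar to every term, so None
```

A word that passes the threshold is a candidate miswritten term, while a word far from every dictionary entry is more plausibly a genuine out-of-vocabulary word.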

18:10-18:30, Paper WeDT7.4
Enhancing GPT-3.5 for Knowledge-Based VQA with In-Context Prompt Learning and Image Captioning

Yang, Yuling	Institute of Information Engineering, Chinese Academy of Science
Cao, Cong	Institute of Information Engineering, Chinese Academy of Science
Yuan, Fangfang	Institute of Information Engineering, Chinese Academy of Science
Zeng, Shuai	Institute of Automation, Chinese Academy of Sciences
Wang, Dakui	Institute of Information Engineering, Chinese Academy of Science
Liu, Yanbing	Institute of Information Engineering, Chinese Academy of Science
Keywords: Application of Artificial Intelligence, Machine Vision, Machine Learning Abstract: Traditional visual question answering (VQA) often falls short as merely relying on image information is insufficient to answer given questions. Therefore, Knowledge-Based Visual Question Answering (KB-VQA) has emerged. Typically, KB-VQA involves first retrieving knowledge from external knowledge bases, then using the retrieved knowledge in conjunction with the understanding of visual content for joint reasoning to predict answers. However, current models often suffer from weak visual perception capabilities when processing image information. Additionally, due to the incompleteness of external knowledge bases, retrieved knowledge may contain noise or even irrelevant information. Moreover, the re-embedding of knowledge text features during the model's reasoning process may deviate from the original meanings in the knowledge base. To address these challenges, we propose a method for Knowledge-Based Visual Question Answering (KB-VQA) using GPT-3.5, leveraging image captions and in-context prompts. We utilize an advanced captioning model to convert images into accurate textual representations, enhancing the large language model's understanding of visual information. Moreover, we eliminate the need for additional knowledge bases by directly employing GPT-3.5 as a knowledge base for knowledge retrieval and generate logically consistent text during inference to predict answers. Furthermore, we enhance GPT-3.5's question-answering capability for VQA through in-context prompt learning. Experiments on the public OK-VQA dataset demonstrate the superior performance of our model.

18:10-18:30, Paper WeDT7.5
Named Entity Recognition of Chinese Diabetes Based on Multi-Feature Fusion and Multi-Head Attention Mechanism

Geng, Xueliang	Qilu University of Technology
Wang, Shihua	Qilu University of Technology
Gao, Tianle	Qilu University of Technology
Zhang, Li	Qilu University of Technology
Jing, Ming	Big Data Institute
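Keywords: Neural Networks and their Applications, Knowledge Acquisition, Deep Learning Abstract: Named entity recognition (NER) for Chinese diabetes texts is the foundation of Chinese diabetes medical data processing. However, the structure of Chinese diabetes medical data is complex, with problems such as easily confused entity types and fuzzy entity boundaries, which pose difficulties and challenges for Chinese diabetes named entity recognition. Current mainstream NER models typically use a single text feature and fail to make full use of the characteristics of the text, resulting in poor entity recognition performance. Therefore, this paper proposes a Chinese diabetes named entity recognition method based on multi-feature fusion and a multi-head attention mechanism. It uses the RoBERTa-wwm pre-trained model to obtain character vectors from Chinese diabetes medical texts and enhances their semantic features by combining them with Chinese radical-level structural feature vectors. The resulting vectors are processed by a BiGRU module to extract global features and by a CNN module to extract multi-scale local features. After fusing these features,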

18:10-18:30, Paper WeDT7.6
Distributed and Adaptive Message Dissemination for Vehicle Platooning in Hybrid Traffic

Li, Xiang	Shandong University of Science and Technology
Zhang, Fuxin	Shandong University of Science and Technology
Keywords: Communications, Intelligent Transportation Systems Abstract: In vehicular networks, vehicles in a platoon rely on the dissemination of beacons to perceive the status of neighboring vehicles and then apply a control law to maintain a constant inter-vehicle distance. Vehicle platooning communication has stringent high-reliability and low-latency requirements. In this paper, we focus on resource allocation strategies for vehicle platoons in hybrid scenarios. First, we propose a two-dimensional Markov model to describe the channel contention (back-off) processes for platoons and individual vehicles. Based on the model, we then derive the transmission probabilities of beacons and event messages for platoons and individual vehicles. Finally, we design a distributed and adaptive beacon control scheme to determine the optimal beacon rates for vehicles in the platoons. The simulation results demonstrate that our approach substantially enhances platooning communication reliability and ensures optimal performance for vehicle safety applications.

18:10-18:30, Paper WeDT7.7
Towards Assessing Generative AI Based Empathic Systems (I)

Ayesh, Aladdin	University of Aberdeen
Arevalillo-Herráez, Miguel	Universitat De València
Zoghlami, Asma	Acutest (Part of Trustmarque)
Keywords: Intelligence Interaction, Human-Machine Interaction, Cognitive Computing Abstract: Recent advances in generative AI, epitomized by Large Language Models (LLMs), have demonstrated remarkable capabilities in generating human-like text and understanding contextual nuances. In parallel, advances in domestic applications of AI continue to break new ground, bringing to the forefront the need to address the impact of such systems, especially on users' well-being, through socially aware interactive intelligent systems. This paper takes one step forward in this endeavour by examining the potential of LLM-based systems to generate empathy in their responses. In doing so, we identified through initial tests the key factors in designing empathy-triggering responses. We then designed a set of experiments to test systematically the impact each of these factors would have on triggering empathy. We analysed the results against the degree of expressed empathy and its perceived quality.


WeDT8	MR08
Image Processing and Pattern Recognition	Regular Papers - Cybernetics


WeDT9	MR09
Deep Learning and Neural Networks - 8	Regular Papers - Cybernetics
Chair: Zhou, Yue	Nanjing Forestry University

17:30-17:50, Paper WeDT9.1
Enhanced Tree Branch Segmentation in Urban Environments Using a Dual-Encoder Model with Graph Reasoning Decoder Block

Zhou, Yue	Nanjing Forestry University
Wang, Hancong	Nanjing Forestry University
Liu, Yanyi	Nanjing Forestry University
Wu, Yin	Nanjing Forestry University
Keywords: Deep Learning, Machine Vision, Machine Learning Abstract: Urban greening trees play a significant role in optimizing the environment, and the segmentation of branches can effectively help researchers assess the growth status of trees. This paper proposed an innovative RGB image-based model to segment tree branches within urban landscapes. The proposed model employed a dual-encoder architecture, including a basic encoder and an edge encoder. These components were designed with specialized blocks adept at extracting critical semantic features and edge information, enhancing the model’s ability to segment fine branches. A graph reasoning decoder block with attention-based feature fusion was proposed to capture the semantic associations between regions incorporating edge information. Moreover, the elastic interaction-based loss function, a groundbreaking loss function, was introduced to ensure that the segmentation of the fine branches should be achieved smoothly and consistently. Upon evaluation against a public urban street tree dataset, the precision, recall, IoU, and accuracy of tree branch segmentation are 94.39%, 93.16%, 88.27%, and 98.78%, respectively, achieving the best result among all the tested models. This performance demonstrates the impact of deep learning on enhancing urban greening and sustainable development in effective tree management.

17:50-18:10, Paper WeDT9.2
A Novel Framework Combining VSL and Vehicle Platooning for Freeway Bottleneck

Lu, Tong	Shandong University of Science and Technology
Qi, Liang	Shandong University of Science and Technology
Luan, Wenjing	Shandong University of Science and Technology
Liu, Kun	Shandong University of Science and Technology
Guo, Xiwang	Liaoning Petrochemical University
Talukder, Qurra	Shandong University of Science and Technology
Keywords: Machine Learning, Deep Learning, Cloud, IoT, and Robotics Integration Abstract: Freeway bottlenecks caused by traffic incidents contribute significantly to large-scale traffic congestion. Traditional strategies, including variable speed limit (VSL) and ramp metering, are commonly used for freeway traffic congestion management. Recently, vehicle platooning has become a promising way to alleviate traffic bottlenecks. This work proposes a novel framework that combines VSL and vehicle platooning for freeway bottlenecks, referred to as VSL-VP, in mixed traffic of connected and autonomous vehicles (CAVs) and human-driven vehicles (HDVs). First, the upstream road of a bottleneck is divided into two segments, called the former and the latter ones. VSL limits vehicle speed at the former segment, thereby reducing inflow traffic to the latter one. Then, deep reinforcement learning is employed for CAV platooning at the latter segment, where low traffic flow density and large car-following distance create conditions for smooth lane changes and platoon formation of CAVs. Simulation results demonstrate that VSL-VP significantly enhances the bottleneck throughput and reduces traffic congestion at elevated levels of CAV penetration rates.

18:10-18:30, Paper WeDT9.3
Architecture and Aggregation Strategies of Federated Broad Learning System: A Feasibility Study

Yang, Xueyue	South China University of Technology
Liu, Zhulin	South China University of Technology
Chen, C. L. Philip	University of Macau
Keywords: Cloud, IoT, and Robotics Integration, Neural Networks and their Applications, AI and Applications Abstract: This study investigates the feasibility of incorporating the broad learning (BL) model in federated learning. Traditional deep learning-based federated learning encounters challenges such as excessive communication volume and extended training time. To address these issues, federated learning with the broad learning model has attracted much attention. We provide a detailed discussion on server-side aggregation for BL models, including the initialization process and three feasible aggregation approaches for single-round aggregation. Then, ablation studies are conducted to assess the suitable BL model architectures within the context of federated learning. Furthermore, we perform a comparative analysis between these approaches and existing federated learning schemes to evaluate their advantages and limitations. Our research suggests that federated learning with BL models is highly feasible in certain scenarios and can effectively tackle the issues of transmission efficiency. Finally, we analyze our findings and propose further investigation to improve algorithms for more efficient performance of federated broad learning systems in the future.
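
The abstract above does not specify its three aggregation approaches, so the following is only a generic illustration of single-round server-side aggregation: a FedAvg-style average of client weight vectors, weighted by local sample counts. The function name `aggregate_weights`, the flat-list weight representation, and the choice of sample-count weighting are all assumptions made for this sketch and should not be read as the authors' method.

```python
def aggregate_weights(client_weights, client_sizes):
    """FedAvg-style aggregation (an assumed stand-in, not the paper's scheme).

    client_weights: list of equal-length lists of floats, one per client
                    (e.g. the output-layer weights of each local model).
    client_sizes:   number of local training samples held by each client.
    Returns the sample-count-weighted average of the client vectors.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client weight vector")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(client_weights[0])
    aggregated = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all client weight vectors must share one length")
        share = size / total  # contribution of this client to the average
        for i, value in enumerate(weights):
            aggregated[i] += share * value
    return aggregated


if __name__ == "__main__":
    # Two hypothetical clients holding 1 and 3 samples respectively.
    print(aggregate_weights([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

Because each client uploads its weights once and the server combines them in a single pass, a scheme of this shape needs only one communication round, which is the setting the abstract describes.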


WeDT10	MR10
Machine Vision and Perception 4
Chair: Zhang, Shaobo	College of Computer Science and Technology, Zhejiang University of Technology

17:30-17:50, Paper WeDT10.1
SSP: A Simple and Safe Prompt Engineering for Visual Generation

Cheng, Weijin	University of Electronic Science and Technology of China
Liu, Jianzhi	UESTC
Jiao, Ziyun	University of Electronic Science and Technology of China
Deng, Jiawen	University of Electronic Science and Technology of China
Ren, Fuji	University of Electronic Science and Technology of China
Keywords: Machine Vision Abstract: Prompt engineering aims to adapt an AI foundation model at the token level without updating weights. Recently, with the development of visual models, many researchers have begun to study how prompt engineering can improve visual generation quality. However, while those studies mainly aim to improve visual quality, they overlook safety factors in prompts. We find that adding specific camera descriptions not only mitigates these safety issues but also enhances visual quality. Consequently, we propose a simple and safe prompt engineering method (SSP) to improve visual generation quality by providing optimal camera descriptions. Specifically, we create a dataset of original prompts from multiple datasets. To select the optimal camera, we design an optimal camera matching approach and implement a classifier that automatically matches original prompts to cameras. Appending camera descriptions to original prompts generates optimized prompts for further visual generation. Experiments demonstrate that SSP improves semantic consistency by an average of 16% compared to other methods and safety metrics by 35.8%.

17:50-18:10, Paper WeDT10.2
Towards Consistent Object Detection Via LiDAR-Camera Synergy

Luo, Kai	Changsha University of Science and Technology
Wu, Hao	Changsha University of Science and Technology
Yi, Kefu	Changsha University of Science and Technology
Yang, KaiLun	Hunan University
Hao, Wei	Changsha University of Science and Technology
Hu, Rongdong	Changsha Intelligent Driving Institute
Keywords: Machine Vision, Image Processing and Pattern Recognition Abstract: As human-machine interaction continues to evolve, the capacity for environmental perception is becoming increasingly crucial. Integrating the two most common types of sensory data, images and point clouds, can enhance detection accuracy. Currently, no existing model can detect an object's position in both point clouds and images while also determining the correspondence between the two detections. This information is invaluable for human-machine interactions, offering new possibilities for their enhancement. In light of this, this paper introduces an end-to-end Consistency Object Detection (COD) algorithm framework that requires only a single forward inference to simultaneously obtain an object's position in both point clouds and images and establish their correlation. Furthermore, to assess the accuracy of the object correlation between point clouds and images, this paper proposes a new evaluation metric, Consistency Precision (CP). To verify the effectiveness of the proposed framework, an extensive set of experiments has been conducted on the KITTI and DAIR-V2X datasets. The study also explores how the proposed consistency detection method performs on images when the calibration parameters between images and point clouds are disturbed, compared to existing post-processing methods. The experimental results demonstrate that the proposed method exhibits excellent detection performance and robustness, achieving end-to-end consistency detection. The source code will be made publicly available at https://github.com/xifen523/COD.

18:10-18:30, Paper WeDT10.3
HCMLP: A Highly Condensed All-MLP Architecture for Extended Long-Term Human Motion Prediction (I)

Zhang, Shaobo	College of Computer Science and Technology, Zhejiang University
Liu, Sheng	Zhejiang University of Technology
Gao, Fei	Zhejiang University of Technology
Feng, Yuan	Zhejiang University of Technology
Keywords: Deep Learning, Machine Vision Abstract: Accurate human motion prediction has significant potential in various artificial intelligence applications. To accommodate the demands of applications such as autonomous driving on mobile devices, it is essential to utilize models that are both lightweight and capable of performing extended-duration predictions to ensure the system remains swift and reliable. To address these challenges, we present HCMLP, a highly condensed all-MLP architecture designed for lightweight efficiency, enabling extended long-term predictions without compromising performance. This method simultaneously captures the spatial correlations between pose joints and the temporal dynamics of each joint by employing distinct but parallel spatial and temporal MLPs. Then, a Dynamic Aggregation component dynamically assimilates the spatial and temporal correlations. Finally, a channel MLP combines and refines these spatio-temporal features for enhanced prediction accuracy. Our experiments on the Human3.6M, AMASS, and 3DPW datasets reveal that HCMLP surpasses the performance of current state-of-the-art methods in short-term, long-term, and particularly extended long-term predictions, while using the fewest parameters. Code will be available at https://github.com/alanzhangv123/HCMLP.


WeDT11	MR11
Cybernetics and Quantum Systems 2	Regular Papers - Cybernetics
Chair: Yang, Linyao	Zhejiang Lab

17:30-17:50, Paper WeDT11.1
Clustering-Based Co-Evolutionary Crow Search Algorithm for Feature Selection in High-Dimensional Data

Li, Huan	Dongguan University of Technology
Chen, ZhiPeng	Dongguan University of Technology
Wei, Wenhong	Dongguan University of Technology
Keywords: Application of Artificial Intelligence, Evolutionary Computation, Heuristic Algorithms Abstract: Present-day data collection techniques can generate thousands or even more features in a dataset. However, an excessive number of redundant features can negatively impact the learning speed and classification performance of machine learning models. Selecting important features from high-dimensional data is a challenge. To address this challenge, we propose a cooperative evolutionary algorithm based on feature clustering to partition the feature subspace for feature selection. First, a clustering method based on feature similarity is employed to partition the subspaces at a lower computational cost. Then, an initialization strategy based on feature-label correlation is proposed to accelerate the convergence of the population. Finally, to reduce the dimensionality of the feature subset while ensuring the classification efficiency of the algorithm, a mutation operator based on the optimal individual is introduced to obtain higher-quality solutions. The algorithm is applied to 14 classic datasets and compared with 7 advanced algorithms. Experimental results demonstrate the algorithm’s ability to obtain good feature subsets.

17:50-18:10, Paper WeDT11.2
Open-Set Entity Alignment Using Large Language Models with Retrieval Augmentation

Yang, Linyao	Zhejiang Lab
Chen, Hongyang	Zhejiang Lab
Wang, Xiao	Institute of Automation, Chinese Academy of Sciences
Lv, Yisheng	Institute of Automation, Chinese Academy of Sciences
Tian, Yonglin	Institute of Automation, Chinese Academy of Sciences
Dai, Xingyuan	Institute of Automation, Chinese Academy of Sciences
Wang, Fei-Yue	Institute of Automation, Chinese Academy of Sciences
Keywords: AI and Applications, Knowledge Acquisition, Expert and Knowledge-Based Systems Abstract: Recent years have witnessed remarkable advancements in entity alignment, which endeavors to identify entities that represent the same real-world objects across different knowledge graphs (KGs). Nonetheless, prevailing approaches predominantly operate within closed-domain scenarios, rendering them inadequate for handling unmatchable entities. To address this challenge, we propose a retrieval augmented large language model framework (RALLM) that leverages the reasoning capacities of large language models (LLMs) to achieve open-set entity alignment, which not only enables the identification of equivalent entities for matchable entities but also addresses the identification of unmatchable ones. Specifically, we propose a novel retrieval augmentation method that leverages both textual and structural information of entities to retrieve potential equivalent candidates. Subsequently, we employ an iterative process to prompt the LLM to discern the equivalence between the retrieved candidate entity and the entity requiring alignment. To mitigate issues related to many-to-one alignment prediction and enhance alignment efficacy, we devise a memory mechanism to store highly confident aligned entity pairs and provide reminders to the LLM when a candidate entity has been matched. Our experimental findings underscore the superior performance of RALLM, highlighting the potential of LLMs in facilitating open-set entity alignment tasks.

18:10-18:30, Paper WeDT11.3
CMIX: Causal Value Decomposition for Cooperative Multi-Agent Reinforcement Learning

Yao, Dunqi	University of Chinese Academy of Sciences; Institute of Software
Sun, Chuxiong	Institute of Software, Chinese Academy of Sciences
Li, Kai	Institute of Software, Chinese Academy of Sciences
Zhou, Kaijie	Institute of Software, Chinese Academy of Sciences
Li, Hanyu	CASIC Research Institute of Intelligent Decision Engineering
Wang, Rui	Institute of Software, Chinese Academy of Sciences
Liu, Lixiang	Institute of Software, Chinese Academy of Sciences
Keywords: Machine Learning Abstract: Value decomposition plays a pivotal role in ensuring effective credit assignment within Multi-Agent Reinforcement Learning (MARL), particularly in cooperative multi-agent tasks where agents are limited to accessing team rewards only. However, existing methods treat the mixing network as a black box, implicitly assuming that neural networks can autonomously extract important information and achieve rational credit assignment during policy learning. This approach not only lacks interpretability but may also prove inefficient in complex scenarios. To enhance the interpretability and rationality of value decomposition, we propose an innovative approach called "Causal Value Decomposition" (CMIX). CMIX employs causal inference-based models, introducing a set of metrics beyond environmental rewards to enhance robustness and model interpretability. Specifically, CMIX establishes intricate relational structures among agents in complex environments and leverages causal relationships between agents and their surroundings to address the credit assignment challenge in MARL. By employing do-calculus, CMIX accurately measures the impact of each agent's actions on environmental states, precisely determining their contribution to the collective reward. This approach not only enhances the interpretability of existing black-box models but also improves the accuracy of credit assignment in multi-agent systems. Moreover, CMIX exhibits high scalability and complements existing value decomposition techniques. Its effectiveness and scalability have been rigorously tested across various settings, including MPE, LBF, and SMAC environments.


WeDT12	MR12
Complex, Cooperative Systems and Big Data
Chair: Jin, Yi	Beijing Jiaotong University

17:30-17:50, Paper WeDT12.1
AutoEDA: Iterative Data Focusing and Exploratory Analysis Based on Attribute Frequency

Wu, Tong	Southwest University of Science and Technology
Wang, Song	Southwest University of Science and Technology
Peng, Xin	Southwest University of Science and Technology
Keywords: Information Visualization, Visual Analytics/Communication Abstract: The paper proposes an automated data exploration and analysis method based on Attribute Frequency Statistical Feature Ratio (AFSFR). It integrates AutoVis and Data Preprocessing Methods to design and develop AutoEDA-Segment. Addressing the Concentrate on Field Sequences (CFS) problem in data exploration and analysis, this study employs various classification models and combines AFSFR with field type and the Elbow Inflection Point (EIP) of index features to design a field type identification and field value assessment method. For evaluating the effectiveness of focus analysis, the approach provides clustering visualization effects and an analysis scheme based on Field Type Search Tree (FST) and cluster comparison profiles, using a custom CFS approach. Additionally, to enhance the value of focused subset analysis data, the approach introduces a Parallel Coordinates-Based Data Filter (PCF), forming an EDA feedback loop to achieve Iterative Exploratory Data Analysis for User-inferred Cognition (IEDA-UC). Finally, we engaged graduate students with varying levels of experience in visualization research for collaboration and discussion, validating the effectiveness and feasibility of the approach using structured data from Kaggle.

17:50-18:10, Paper WeDT12.2
Toward a Computer-Supported Remote Serendipity: Understanding the Role of Collaboration in Remote Innovation (I)

Coutinho, Aline	Universidade Federal Do Rio De Janeiro
Barbosa, Carlos Eduardo	Universidade Federal Do Rio De Janeiro
de Almeida, Marcos Antonio	UFRJ
Souza, Jano	Federal University of Rio De Janeiro
Keywords: Human Factors, Cooperative Work in Design, Team Performance and Training Systems Abstract: Since 2016, the demand for remote work has grown by nearly 400%, with 3.5 million remote job vacancies posted, and it is expected to continue to grow in the coming years. One of the challenges in advancing remote work is fostering innovation, particularly serendipity, in a remote workforce. Such fortunate discoveries can pave the way for technological advancements, new business strategies, or even scientific revolutions. However, creating moments of serendipity in a remote work environment is a significant and ever-evolving challenge. This work therefore aims to understand the possibilities of serendipity that collaboration tools used in remote work can support. The methodology used for this work was a Rapid Review. First, we explore the factors related to serendipity in physical offices, for which we identified twenty-three elements. Next, we compiled 38 strategies for remote work that were found to promote serendipity, which we organized in a framework for better observation. Our findings can serve as a starting point for designing new tools and identifying existing software and tools that already play a supportive role.

18:10-18:30, Paper WeDT12.3
Enhancing Point Cloud Sampling Quality with Dual-Branch Fusion Networks (I)

Jin, Yi	Beijing Jiaotong University
Wang, Xu	Beijing Jiaotong University
Hu, Mengxia	China Automotive Engineering Research Institute Co., Ltd
Yu, Hui	University of Portsmouth
Li, Yidong	Beijing Jiaotong University
Wang, Tao	Beijing Jiaotong University
Feng, Songhe	Beijing Jiaotong University
Lang, Congyan	Beijing Jiaotong University
Keywords: Visual Analytics/Communication Abstract: Task-oriented point cloud sampling methods have attracted considerable attention for their ability to adaptively select important point sets based on downstream tasks, achieving an excellent balance between data simplification and task performance. However, existing task-oriented sampling models, primarily based on single-branch designs, struggle to fully extract features from input point clouds that comprehensively reflect multi-dimensional key information, thus limiting their sampling performance. In this paper, we introduce a dual-branch sampling network, named DBS-NET, which conducts crucial point sampling from both the global and local importance perspectives separately before merging them, thereby preserving multi-dimensional key information of the input data during the sampling process. Qualitative and quantitative experimental results demonstrate the competitive performance of DBS-NET on the classification benchmark task.


WeDT13	Room T13
2P - Decision Support and Expert Systems	2-Page Abstracts
Chair: Hüsing, Elodie Elisabeth Corinna	RWTH Aachen University

17:30-17:50, Paper WeDT13.1
Consensus Reaching Model for 2-Rank Group Decision Making with Personalized Individual Semantics

Zhang, Zhen	Dalian University of Technology
Yu, Wenyu	Dongbei University of Finance and Economics
Li, Ke	Sichuan University
Keywords: Decision Support Systems Abstract: Traditional group decision-making problems focus on obtaining a complete ranking of all alternatives from best to worst. However, in many real-life scenarios, there are instances where it is necessary to assign only two rank levels to alternatives, creating a ranking where one subset of alternatives is prioritized above another subset. These scenarios are referred to as 2-rank group decision-making problems. Linguistic preference relations serve as an effective tool for expressing decision-makers’ preferences, as they allow comparisons between two alternatives at a time using linguistic terms. Nonetheless, in 2-rank group decision-making problems with linguistic preference relations, it is common for the same linguistic term to hold different meanings for different decision-makers, a phenomenon known as personalized individual semantics (PISs). Addressing how to model PISs in 2-rank group decision-making problems presents a significant challenge. In this paper, we develop a consensus-reaching model for 2-rank linguistic group decision-making problems, incorporating PISs and consistency control for decision-makers. Specifically, we first employ consistency-driven models to evaluate and improve the consistency of each decision-maker’s linguistic preference relations. Based on this foundation, we determine the 2-rank preference vectors for both individuals and the group. Subsequently, we propose a 2-rank consensus measurement method and design a 2-rank consensus-reaching process to help decision-makers enhance their consensus level. This involves the development of a PIS-based consensus level maximization model and a PIS-based minimum adjustment model. Furthermore, we introduce an algorithm to implement the consensus-reaching framework. Ultimately, numerical experiments and simulation results are provided to demonstrate the effectiveness of the proposed method.
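
To make the notion of a 2-rank outcome concrete, the sketch below splits alternatives into a prioritized subset and a remainder, then measures how often an individual's split agrees with the group's. This is an illustration only: the scoring input, the top-`k` rule, the tie-breaking by index, and the simple agreement ratio are all assumptions introduced here, and they do not reproduce the PIS-based optimization models described in the abstract.

```python
def two_rank_vector(scores, k):
    """Assign rank level 1 to the k highest-scoring alternatives, 0 to the rest.

    Ties are broken in favour of the lower index (an arbitrary choice
    made for this sketch).
    """
    if not 0 <= k <= len(scores):
        raise ValueError("k must lie between 0 and the number of alternatives")
    # sorted() is stable, so equal scores keep their original index order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    top = set(order[:k])
    return [1 if i in top else 0 for i in range(len(scores))]


def agreement_level(individual, group):
    """Fraction of alternatives placed in the same rank level by both vectors.

    A deliberately simple stand-in for a consensus measure.
    """
    if len(individual) != len(group) or not group:
        raise ValueError("vectors must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(individual, group) if a == b)
    return matches / len(group)


if __name__ == "__main__":
    group_vector = two_rank_vector([0.2, 0.9, 0.5, 0.7], k=2)
    expert_vector = [1, 1, 0, 0]  # hypothetical individual 2-rank vector
    print(group_vector, agreement_level(expert_vector, group_vector))
```

A consensus-reaching process of the kind the abstract describes would iterate on preferences until such an agreement measure passes a threshold; that loop is omitted here.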

17:50-18:10, Paper WeDT13.2
Design Process for Concept Development of Human-Robot Workstations for People with Disabilities

Hüsing, Elodie Elisabeth Corinna	RWTH Aachen University
Weidemann, Carlo Benedikt	RWTH Aachen University
Corves, Burkhard	RWTH Aachen University
Hüsing, Mathias	RWTH Aachen University
Keywords: Design Methods, Human-Collaborative Robotics, Assistive Technology Abstract: The deployment of robotic systems in collaborative applications has the potential to significantly increase the inclusion of people with disabilities (PwD) on the primary labor market, particularly in light of the growing labor shortage due to demographic change. To ensure adequate assistance to PwD, it is essential to adapt the human-robot workstation to their capabilities and individual needs, as well as the underlying task. Therefore, we designed human-robot workstations that support PwD in industrial applications and the necessary methods to consider individual capabilities during concept development. These workstations enabled PwD to perform tasks that were previously unattainable. In this work, we propose a preliminary design process for the concept development of human-robot workstations for PwD. The process is based on the results and findings of our research and adheres to the guidelines established in VDI 2221.

18:10-18:30, Paper WeDT13.3
A Survey Forest Diagram: Gain a Divergent Insight View on a Specific Research Topic

Li, Jinghong	Japan Advanced Institute of Science and Technology
Gu, Wen	Center for Innovative Distance Education and Research, Japan Adv
Koich, Ota	Japan Advanced Institute of Science
Hasegawa, Shinobu	Japan Advanced Institute of Science and Technology
Keywords: Knowledge Acquisition, Expert and Knowledge-Based Systems, Big Data Computing Abstract: With the exponential growth in the number of papers and the trend of AI research, the use of Generative AI for information retrieval and question-answering has become popular for conducting research surveys. However, novice researchers unfamiliar with a particular field may not significantly improve their interaction efficiency with Generative AI because they have not developed divergent thinking in that field. This study aims to develop an in-depth Survey Forest Diagram that guides novice researchers in divergent thinking about the research topic by indicating the citation clues among multiple papers to help expand the survey perspective for novice researchers.

Technical Program for Wednesday October 9, 2024