| |
Last updated on September 30, 2024. This conference program is tentative and subject to change
Technical Program for Thursday October 10, 2024
|
ThAT1 |
MR01 |
Image Processing and Pattern Recognition 8 |
Regular Papers - Cybernetics |
Chair: Yan, Haimin | Ritsumeikan University |
|
08:30-08:50, Paper ThAT1.1 | |
Improvement in the Transferability of Target Adversarial Examples Based on Data Augmentation |
|
Wang, Kun | Qilu University of Technology (Shandong Academy of Sciences) |
Jin, Xing | Shandong Provincial Computing Center (National Supercomputing Ji |
Wang, Fuqiang | Qilu University of Technology, Shandong Computer Science Center |
|
|
08:50-09:10, Paper ThAT1.2 | |
YOLO-UTS: Lightweight YOLOv5 for UAV Traffic Monitoring and Surveillance |
|
Yan, Haimin | Ritsumeikan University |
Wang, Juncheng | China United Network Communications Corporation |
Kong, Xiangbo | Toyama Prefectural University |
Tomiyama, Hiroyuki | Ritsumeikan University |
Keywords: Image Processing and Pattern Recognition, Machine Learning, Machine Vision
Abstract: In the application of drone-based target detection, both the lightweighting of the model and its accuracy are crucial. Therefore, this paper aims to achieve model lightweighting and accuracy improvement through structural improvements to the YOLO model. To meet this objective, this work optimizes the structure of the existing model to address the challenge of operational limitations in small drones, which have restricted computational capacities. By integrating lightweight modules as the backbone of the model, it enhances the feasibility of deployment on devices with limited processing power. Furthermore, the introduction of a self-attention mechanism, which is placed in the Neck of the model, improves the ability to prioritize critical regions within the image. This enhancement is crucial for accurately detecting overlapping or blurred objects encountered by the moving drone. Additionally, the modification of the Intersection over Union (IoU) metric, which now considers the aspect ratio and shape of the bounding boxes, further refines the target detection capabilities, ensuring more precise and reliable object localization. Since drones do not remain at a constant height or position, the angles and distances of the vehicles which they capture are bound to change. This adjustment, which modifies how the IoU quantifies overlap, allows the IoU to more accurately quantify the degree of overlap of vehicle detection bounding boxes at different angles and distances, thereby providing more reliable target detection performance. Experimental results indicate that compared to existing YOLO models, our method achieves a 30% reduction in model size and a 1% improvement in accuracy.
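For reference, the plain Intersection over Union that the modified metric builds on can be sketched as follows. This is the standard IoU only, not the aspect-ratio- and shape-aware variant described in the abstract, whose formula is not given here; the function name `iou` and the (x1, y1, x2, y2) box layout are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Standard IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Plain IoU depends only on the overlap area, so two predictions with the same overlap but very different shapes score the same; that is the kind of gap a shape-aware variant is meant to close.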
|
|
09:10-09:30, Paper ThAT1.3 | |
YOLO-ELD: Efficient and Lightweight Detection for UAV Aerial Imagery |
|
Yan, Haimin | Ritsumeikan University |
Wang, Juncheng | China United Network Communications Corporation |
Kong, Xiangbo | Toyama Prefectural University |
Tomiyama, Hiroyuki | Ritsumeikan University |
Keywords: Image Processing and Pattern Recognition, Machine Learning, Machine Vision
Abstract: Object detection in UAV imagery has become a hot topic in recent years. However, deploying real-time detection models on UAV platforms is highly challenging due to limited computational power and memory. Moreover, the large size of UAV-captured images, the small size of objects, and their dense distribution all impact detection efficiency. Many researchers have made a series of improvements to address these issues, but they have not maintained a good balance among model size, inference speed, and accuracy. To address the above difficulties, this paper proposes an efficient and lightweight model that maintains moderate detection accuracy while achieving model lightweighting and reduced inference time. Concretely, considering the higher resolution of drone-captured images, we have designed a backbone with more lightweight downsampling modules, enhancing deployment efficiency on devices with limited resources. Additionally, this work incorporates a self-attention mechanism in the feature extraction component of the model, which significantly improves the ability to process critical areas in the image, crucial for detecting small-scale and densely distributed targets. Moreover, this work designs an IoU tailored for drone aerial images, which calculates losses by focusing on the shape and scale of bounding boxes, thereby enhancing the accuracy of bounding box regression. Additionally, it uses a ratio of scale factors to control the generation of auxiliary bounding boxes, which aids in loss calculation and accelerates convergence. Experimental results on the VisDrone 2019 dataset show that compared to existing detection methods used for drones, our model is more lightweight and efficient, while also achieving medium accuracy. Also, compared to our baseline method, YOLO-ELD reduces the number of model parameters by about 40%, increases the inference speed of the model by 10%, and improves the precision of the model by 3%.
|
|
09:30-09:50, Paper ThAT1.4 | |
Unsupervised Low Light Image Enhancement Via SNR-Aware Swin Transformer |
|
Luo, Zhijian | Jiaying University |
Tang, Jiahui | Jiaying University |
Zhou, Kaihua | Jiaying University |
Huang, Zihan | Jiaying University |
Zhang, Jiao | Jiaying University |
Hou, Yueen | Jiaying University |
Keywords: Image Processing and Pattern Recognition, Machine Vision, Deep Learning
Abstract: Images captured under low-light conditions present unpleasant artifacts, which debilitate feature extraction performance in upstream visual tasks. Low-light image enhancement (LLIE) aims to improve brightness and contrast while reducing the noise that corrupts visual quality. Recently, many image restoration methods based on Swin Transformer have been proposed, and impressive performance has been achieved. However, on one hand, trivially employing Swin Transformer for LLIE would expose several artifacts, including over-exposure, brightness imbalance, noise corruption, etc. On the other hand, it is impractical to capture image pairs of low-light images and corresponding ground truth, i.e., well-exposed images in the same visual scene, for model training. In this paper, we propose a dual-branch network based on Swin Transformer, guided by a signal-to-noise ratio prior map which provides the spatially varying information for LLIE. Moreover, we leverage unsupervised learning to construct the optimization objective based on the Retinex model, to guide the training of the proposed network. Experimental results demonstrate that the proposed model is competitive with the baseline models.
|
|
09:50-10:10, Paper ThAT1.5 | |
High-Performance Video Retrieval by Combining Contrastive Language Image Pre-Training and Cross-Modal Attention |
|
Ou, Jun-Jie | National Taipei University of Technology |
Yang, Shih-Hsuan | National Taipei University of Technology |
Keywords: Image Processing and Pattern Recognition, Application of Artificial Intelligence, Deep Learning
Abstract: Video retrieval is a task that locates the video segments that best match the query text. Although joint language-image models have recently achieved great success, training a video retrieval model typically requires a large data set and leads to long training time. This paper aims for a high-performance video retrieval method by combining pre-trained encoders and cross-modal attention. We first leverage the pre-trained CLIP (Contrastive Language-Image Pre-Training) ViT-B/16 encoders to extract the feature vectors for videos and texts. Next, X-Pool, a recent cross-modal language-video attention model, is incorporated into the system, where three layers of X-Pool are used to enhance retrieval accuracy. We also suggest using sparse sampling on video frames to further reduce training time. The effectiveness of the proposed method is evaluated using the MSRVTT and MSVD benchmark datasets. The proposed method achieves the best retrieval performance (recall@k, median rank, and mean rank) among the state-of-the-art methods with comparable computational complexity.
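The three retrieval metrics named in the abstract can be computed from a text-to-video similarity matrix roughly as sketched below. This is a generic illustration, not the authors' evaluation code; the function name `retrieval_metrics` and the convention that video i is the correct match for query i are assumptions.

```python
from statistics import mean, median


def retrieval_metrics(sim, ks=(1, 5, 10)):
    """sim[i][j] is the similarity of text query i to video j.

    Assumes video i is the ground-truth match for query i.
    """
    ranks = []
    for i, row in enumerate(sim):
        # Rank of the correct video: 1 + number of videos scored strictly higher.
        ranks.append(1 + sum(1 for s in row if s > row[i]))
    metrics = {f"recall@{k}": sum(r <= k for r in ranks) / len(ranks) for k in ks}
    metrics["median_rank"] = median(ranks)
    metrics["mean_rank"] = mean(ranks)
    return metrics
```

Higher recall@k is better, while lower median and mean ranks are better, so the three numbers are usually reported together.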
|
|
10:10-10:30, Paper ThAT1.6 | |
Malware Classification Method Based on Dynamic Features with Sensitive Behaviors |
|
Xie, Yamin | Institute of Information Engineering, University of Chinese Acad |
Li, Siyuan | Institute of Information Engineering Chinese Academy of Science |
Chen, ZhengCai | The Institute of Information Engineering, University of Chinese |
Du, HaiChao | Institute of Information Engineering, University of Chinese Acad |
Jia, Xiaoqi | Institute of Information Engineering, University of Chinese Acad |
Tang, Jing | Institute of Information Engineering, University of Chinese Acad |
Du, Yuejin | Future Security Institute, Qihoo 360 Technology Co Ltd |
Keywords: Deep Learning, Hybrid Models of Neural Networks, Fuzzy Systems, and Evolutionary Computing, Neural Networks and their Applications
Abstract: Traditional malware classification methods often just scratch the surface by analyzing the sequence of system commands (API calls) used by malware during its operation. These approaches miss out on deeper, complex behaviors that could significantly enhance accuracy in identifying different malware types. To address this, we introduce SenBeMC, a method that delves deeper into the behaviors exhibited by malware. SenBeMC combines API call information vectors with behavioral information to enhance the deep semantic information of input features, enriching the hierarchical structure of feature representation. SenBeMC stands out by employing soft thresholding and attention mechanisms to sift through the noise (extraneous information that can mask the malware's true nature) and a BiLSTM model that excels in understanding the sequence and timing of actions, crucial for spotting sophisticated threats. Experimental evaluations on real-world datasets affirm that SenBeMC effectively improves feature representation and accuracy of malware classification when compared to other contemporary state-of-the-art models.
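The soft thresholding mentioned in the abstract is, in its standard form, a shrinkage operator that zeroes small-magnitude features and pulls the rest toward zero. A minimal sketch follows; how SenBeMC chooses the threshold is not stated in the abstract, so the fixed `tau` argument and the function name are assumptions.

```python
def soft_threshold(values, tau):
    """Standard soft thresholding: sign(x) * max(|x| - tau, 0) for each x."""
    out = []
    for x in values:
        magnitude = max(abs(x) - tau, 0.0)
        # Restore the sign of the input after shrinking its magnitude.
        out.append(magnitude if x >= 0 else -magnitude)
    return out
```

Entries with magnitude at or below `tau` are treated as noise and removed, while larger entries are kept with reduced magnitude.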
|
|
ThAT2 |
MR02 |
Computational Intelligence and Soft Computing 1 |
Regular Papers - Cybernetics |
Chair: Xinggui, Ye | University of Electronic Technology and Science of China |
|
08:30-08:50, Paper ThAT2.1 | |
A Method for Robotaxi Dispatch with Recommendation of Boarding Time and Pick-up/Drop-Off Points |
|
Qi, Liang | Shandong University of Science and Technology |
Li, Mengqi | Shandong University of Science and Technology |
Luan, Wenjing | Shandong University of Science and Technology |
Zhang, Rongyan | Shandong University of Science and Technology |
Talukder, Qurra | Shandong University of Science and Technology |
Liu, Kun | Shandong University of Science and Technology |
Guo, Xiwang | Liaoning Petrochemical University |
Keywords: Computational Intelligence, Evolutionary Computation, Optimization and Self-Organization Approaches
Abstract: Given the rapid advancements in autonomous driving and communication technologies, robotaxis may emerge as a crucial component in the future of transportation. Optimization of robotaxi dispatch considering ride-sharing can enrich the travel modes of residents and improve the network capacity of transportation systems. This work proposes a multi-objective optimization model for robotaxi dispatch. Unlike previous approaches, this is the first attempt to simultaneously adjust unreasonable boarding times (BT) and pick-up/drop-off (UO) points for passengers in robotaxi dispatch. It encourages passengers to walk to the recommended UO points or to be picked up slightly earlier or later, with the aim of maximizing the profit per kilometer of robotaxis and minimizing the total travel expense of passengers. Subsequently, a nondominated sorting genetic algorithm with mass center (NSGA-MC) is proposed to solve the model. Experimental results show that the proposed algorithm outperforms nondominated sorting genetic algorithm II (NSGA-II) and multi-objective evolutionary algorithm based on decomposition (MOEA/D) across several metrics. Additionally, an ablation study compares the model with and without UO-point and boarding-time recommendation to illustrate the advantage of the proposed model in developing intelligent public transportation systems.
|
|
08:50-09:10, Paper ThAT2.2 | |
A Safe Economic Dispatch for Microgrid with Ladder-Type Carbon Trading |
|
Liu, Weirong | Central South University |
Xie, Qifeng | Central South University |
Rong, Jieqi | Central South University |
Ren, Wanwan | Central South University |
Jiang, Fu | Central South University |
Keywords: Soft Computing, Socio-Economic Cybernetics
Abstract: The energy scheduling strategy for microgrids based on reinforcement learning plays a crucial role in realizing low-carbon and economic energy utilization. Considering that traditional reinforcement learning has difficulty meeting the operational constraints of microgrids, this paper proposes a safety reinforcement learning method based on proximal policy optimization. Firstly, ladder-type carbon trading is introduced to strictly constrain the system carbon emission. Then, a safety proximal policy optimization method is proposed that incorporates a safety network to decouple the economic and safety factors in the traditional reward composition. Finally, experiments are conducted on real-world datasets to verify the effectiveness of the proposed algorithm. The experimental results show that, compared to existing methods, the proposed safety reinforcement learning method is able to minimize the economic cost and carbon emission of microgrid scheduling while strictly guaranteeing safety.
|
|
09:10-09:30, Paper ThAT2.3 | |
Multi-Scale Quantum Harmonic Oscillator Behaved Algorithm with Three-Stage Perturbation for High-Dimensional Expensive Problems |
|
Xinggui, Ye | University of Electronic Technology and Science of China |
Jianping, Li | University of Electronic Science and Technology of China |
Wang, Peng | Southwest Minzu University |
Keywords: Computational Intelligence, Metaheuristic Algorithms, Swarm Intelligence
Abstract: Quantum perturbation plays an important role in quantum movement. This paper proposes a three-stage perturbation (TSP) framework to enhance the performance of the multi-scale quantum harmonic oscillator algorithm (MQHOA). Three perturbations are adopted: in the population initialization process with opposition-based learning (OBL), in the quantum harmonic oscillator (QHO) process with an ensemble of three differential evolution (DE) strategies, and in the multi-scale (M) process with a rollback mechanism, to enhance the diversity of the particles and prevent them from falling into local optima. The proposed approach has been evaluated on several high-dimensional expensive problems from 50-D to 500-D. The empirical results are compared with recent MQHOA variants and some state-of-the-art similar metaheuristic algorithms (MAs). The experimental data reveal the superiority or competitiveness of the proposed approach.
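Opposition-based learning in population initialization is commonly implemented by pairing each random point x with its opposite, lower + upper - x, and keeping the fitter candidates. The sketch below shows that common form for a minimization problem; it is not taken from the paper, and the function name `obl_initialize` and its parameters are assumptions.

```python
import random


def obl_initialize(fitness, lower, upper, pop_size, rng=None):
    """Opposition-based initialization for minimization (generic sketch)."""
    if rng is None:
        rng = random.Random()
    # Random initial population inside the box [lower, upper].
    population = [
        [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        for _ in range(pop_size)
    ]
    # Opposite of each point, reflected through the center of the box.
    opposites = [
        [lo + hi - x for x, lo, hi in zip(ind, lower, upper)]
        for ind in population
    ]
    # Keep the pop_size fittest among the original and opposite points.
    candidates = sorted(population + opposites, key=fitness)
    return candidates[:pop_size]
```

Evaluating both a point and its opposite doubles the fitness evaluations at initialization, which matters for expensive problems, so the trade-off is better coverage of the search space against a higher initial evaluation cost.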
|
|
09:30-09:50, Paper ThAT2.4 | |
Neural Model Embedded Heuristics for Robust Traveling Salesman Problem with Interval Uncertainty |
|
Xiao, Pei | Sun Yat-Sen University |
Zhang, Zizhen | Sun Yat-Sen University |
Chen, Jinbiao | Sun Yat-Sen University |
Wang, Jiahai | Sun Yat-Sen University |
Keywords: Hybrid Models of Computational Intelligence, Metaheuristic Algorithms, Neural Networks and their Applications
Abstract: We study the robust traveling salesman problem (RTSP) with an interval uncertainty set under the min-max regret criterion, which represents an extended and robust version of the classic traveling salesman problem (TSP). The goal is to obtain a conservative solution that minimizes the maximum deviation from the optimal routing time in the worst-case scenario. Given the significant advancements and widespread application of neural techniques in recent years, we propose integrating neural models into heuristic approaches for tackling the RTSP. Specifically, by embedding a pre-trained TSP model into the tabu search framework as a key component for its evaluation function, our approach efficiently guides the improvement process of solutions. Our experimental results demonstrate the effectiveness of the proposed approach, showcasing its capability to efficiently handle various scales of robust traveling salesman problems within a shorter time compared to other traditional heuristic methods.
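To make the min-max regret criterion concrete, the sketch below evaluates the maximum regret of one tour by brute force on a tiny symmetric instance. It relies on the standard property of interval min-max regret problems that a tour's worst case sets its own edges to their upper bounds and all other edges to their lower bounds. The exact inner TSP solve here is the step that the abstract replaces with a pre-trained neural model; the function names and matrix layout are assumptions.

```python
from itertools import permutations


def tour_edges(tour):
    """Undirected edges of a closed tour."""
    n = len(tour)
    return {frozenset((tour[i], tour[(i + 1) % n])) for i in range(n)}


def max_regret(tour, lower, upper):
    """Maximum regret of `tour` for symmetric interval costs [lower, upper].

    Nodes are 0..n-1. Brute force, so only suitable for very small n.
    """
    n = len(tour)
    own = tour_edges(tour)

    def cost(i, j):
        # Worst-case scenario for this tour: its edges high, all others low.
        return upper[i][j] if frozenset((i, j)) in own else lower[i][j]

    def length(t):
        return sum(cost(t[k], t[(k + 1) % n]) for k in range(n))

    # Optimal tour under the worst-case scenario, with node 0 fixed as start.
    best = min(length((0,) + p) for p in permutations(range(1, n)))
    return length(tour) - best
```

Every candidate tour evaluated during a search needs its own inner TSP solve, which is why a fast approximate evaluator is attractive inside a heuristic such as tabu search.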
|
|
09:50-10:10, Paper ThAT2.5 | |
Learning a Shogi Evaluation Function Using Genetic Algorithms |
|
O'Connor, Jim | Connecticut College |
Kosovsky, Russell | Connecticut College |
Brandenburger, Brooke | Connecticut College |
Keywords: Evolutionary Computation, Computational Intelligence, Application of Artificial Intelligence
Abstract: This paper introduces a novel approach to evolving a shogi evaluation function using genetic algorithms. This study explores an alternative to the commonly used mentor-assisted learning methods for shogi, also known as Japanese chess. Instead of relying on established concepts for learning evaluation functions, such as mentor-assisted learning, we employ a genetic algorithm utilizing the winning player as the sole learning input. Our novel dataset, compiled from 1 million board states scraped from professional games, served as the foundation for training. Our approach yielded a 70% classification accuracy in determining the winner of shogi games from previously unseen board states when tested on a validation set. The results highlight the effectiveness of using genetic algorithms to evolve a shogi evaluation function and provide a further understanding of enhancing shogi AI using game outcomes as primary training data. Our method's computational efficiency also stands as an advantage over other techniques commonly employed in this domain. This work offers a fresh perspective in the realm of computer shogi with implications for future research and development.
|
|
ThAT3 |
MR03 |
Machine Vision, Human-Machine and BMIs |
|
Chair: Wang, Mingyi | Hong Kong University of Science and Technology |
|
08:30-08:50, Paper ThAT3.1 | |
Towards Robust Blockchain Price Oracle: A Study on Human-Centric Node Selection Strategy and Incentive Mechanism |
|
Xian, Youquan | Guangxi Normal University |
Zeng, Xueying | Guangxi Normal University |
Wu, Hao | Guangxi Normal University |
Yang, DanPing | Guangxi Normal University |
Wang, Peng | Guangxi Normal University |
Liu, Peng | Guangxi Normal University |
Keywords: Human Performance Modeling, Human Enhancements, Systems Safety and Security
Abstract: As a trusted middleware connecting the blockchain and the real world, the blockchain oracle can obtain trusted real-time price information for financial applications such as payment and settlement, and asset valuation on the blockchain. However, the current oracle schemes face the dilemma of security and service quality in the process of node selection, and the implicit interest relationship in financial applications leads to a significant conflict of interest between the task publisher and the executor, which reduces the participation enthusiasm of both parties and system security. Therefore, this paper proposes an anonymous node selection scheme that anonymously selects nodes with high reputations to participate in tasks to ensure the security and service quality of nodes. Then, this paper also details the interest requirements and behavioral motives of all parties in the payment settlement and asset valuation scenarios. Under the hypothesis of rational man, an incentive mechanism based on the Stackelberg game is proposed. It can achieve equilibrium under the pursuit of the revenue of task publishers and executors, thereby ensuring the revenue of all types of users and improving the enthusiasm for participation. Finally, we verify the security of the proposed scheme through security analysis. The experimental results show that the proposed scheme can reduce the variance of obtaining price data by about 55% while ensuring security, and meeting the revenue of all parties.
|
|
08:50-09:10, Paper ThAT3.2 | |
Modularized Brain Network for Eliminating Volume Conduction Effects |
|
Das Chakladar, Debashis | Luleå University of Technology |
Simistira Liwicki, Foteini | Luleå University of Technology |
Keywords: Brain-based Information Communications, Human-Machine Interaction, Brain-Computer Interfaces
Abstract: Understanding brain dynamics through connectivity networks is a growing topic of neuroscience. The volume conduction (VC) effect can be approximated as a linear mixing of the electrical fields of the brain regions, leading to spurious connectivity results. The proposed modularized brain connectivity network consists of three methods: Surface Laplacian (SL), partial correlation, and phase lag index (PLI) to eliminate VC effects from the brain connectivity network. SL is initially applied to the raw Electroencephalography (EEG) signal, and Event-related potential peak-wise modules for each EEG event are identified. Next, the optimum EEG channels are selected using the partial correlation method, and the source channels of each module are identified. Finally, the resultant brain connectivity network is constructed by adding the edges (i.e., PLI value) between the source channels of two modules. The experiment is performed on an EEG-based driving dataset. The performance of the proposed brain network for each driving event is evaluated based on graph measures such as mean local efficiency (MLE) and global efficiency (GE). After eliminating the VC effects, the modularized brain connectivity network significantly improves information processing rates (in terms of graph measures) across the brain region. We achieved maximum average GE (AGE) and average MLE (AMLE) values of 0.742 and 0.825 with the proposed brain network.
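The phase lag index used as the edge weight above has a compact standard definition: the absolute value of the mean sign of the sine of the phase difference between two channels. A minimal sketch follows, assuming the instantaneous phases have already been extracted (for example with a Hilbert transform, which the abstract does not specify); the function name is an assumption.

```python
import math


def phase_lag_index(phase_a, phase_b):
    """PLI = |mean(sign(sin(phase_a - phase_b)))| over paired samples (radians)."""
    signs = []
    for pa, pb in zip(phase_a, phase_b):
        s = math.sin(pa - pb)
        # sign() is 0 for a zero phase difference, so zero-lag coupling is ignored.
        signs.append((s > 0) - (s < 0))
    return abs(sum(signs)) / len(signs)
```

Because volume conduction mixes sources with essentially zero lag, samples with zero phase difference contribute nothing to the index, which is why PLI is a common choice for reducing spurious connectivity.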
|
|
09:10-09:30, Paper ThAT3.3 | |
Augmenting the Perceptual Experience of Being Faced Using Gaze Modeling During Online Video Viewing |
|
Nakajima, Yuto | University of Tsukuba |
Hirokawa, Masakazu | NEC Corporation |
Hassan, Modar | University of Tsukuba |
Suzuki, Kenji | University of Tsukuba |
Keywords: Human Perception in Multimedia, Entertainment Engineering, Human Factors
Abstract: Live performances have been conducted online in recent years due to the influence of the COVID-19 pandemic. This generated a novel problem of losing eye contact with the performers, which deteriorates the viewing experience in the online setting. This study proposes a feedback system based on gaze modeling to augment the perceptual experience of being faced by the performer while viewing an online live performance. This system aims to improve the experience of viewing online live performances by recreating a perceptual experience of interaction between the performer and the audience. The proposed system consists of a gaze recognition unit that judges whether the performer is looking at the camera based on gaze modeling in the video, and a stimulus presentation unit that presents the judgement result to the audience through vibration stimulus. This paper describes the gaze recognition model, experiments to determine the parameters to be used in the model, and the evaluation of live performance viewing experiments using the proposed system.
|
|
09:30-09:50, Paper ThAT3.4 | |
Extracted Audio-Induced Reward Expectation Information from Local Field Potential in the Medial Prefrontal Cortex |
|
Wang, Mingyi | Hong Kong University of Science and Technology |
Tan, Jieyuan | Hong Kong University of Science and Technology |
Huang, Yifan | Hong Kong University of Science and Technology |
Wu, Shenghui | The Hong Kong University of Science and Technology |
Song, Zhiwei | The Hong Kong University of Science and Technology |
Wang, Yiwen | Hong Kong University of Science and Technology |
Keywords: Passive BMIs, Active BMIs, BMI Emerging Applications
Abstract: Brain-machine interface (BMI) technology has witnessed notable advancements, enabling individuals with motor disabilities to effectively operate prosthetic limbs. Reinforcement learning (RL) has been employed within BMIs to train decoders that can interpret neural activity and translate it into movement intentions using reward information. Internal rewards, reflected in the neural response to sensory feedback in the medial prefrontal cortex (mPFC), can be used for autonomous updates in RL-BMIs. Studies have shown that designed audio feedback improves subjects' learning abilities, while neural activity in the mPFC induced by audio feedback indicates future reward information. These findings highlight the possibility that audio-induced neural modulation in the mPFC can serve as intermediate guidance for decoder updates. However, the reliance on single-neuron spike signals from the mPFC has limitations, and such signals may become unavailable, especially in long-term BMI implants. Local field potentials (LFPs) provide neural ensemble information and have been proposed as an alternative long-term data source to overcome these limitations. This paper proposes to extract LFP neuromodulations and relate them to audio-induced reward expectation information from mPFC neural activity by implementing a data-driven marked point process (MPP) methodology. We correlate synchronized spike activity to the transient events in the LFP broad high frequency (bhf) band (200-400 Hz) in the mPFC of rats performing the two-lever press discrimination task. Compared with extracting LFP features from the binned spectrogram power, our approach improves the peak signal-to-noise ratio (PSNR) by 24.63% on average across subjects over our data segments. This study indicates that LFPs in the mPFC contain information that can provide sensory-induced reward expectation information in long-term use, and it advances the development of autonomous guidance for BMI decoders.
|
|
09:50-10:10, Paper ThAT3.5 | |
Fuzzy Direct Torque Control Application-Specific Integrated Circuit with Neural Network and Fuzzy Hysteresis Controller for Induction Motor (I) |
|
Sung, Guo-Ming | National Taipei University of Technology |
Huang, Bo-Rui | National Taipei University of Technology |
Lin, Ze-Kai | National Taipei University of Technology |
Lee, Ching-Yin | Tungnan University |
Chen, Chao-Rong | National Taipei University of Technology |
Yu, Chih-Ping | National Taipei University of Technology |
Keywords: Supervisory Control, Human-Machine Cooperation and Systems, Design Methods
Abstract: This study proposes a direct torque control (DTC) application-specific integrated circuit (ASIC) equipped with a neural network and a fuzzy hysteresis controller to achieve seamless control of a three-phase induction motor. In the proposed DTC system, feedback currents and voltages measured at the stator are fed into the hysteresis controller and a switching table. Subsequently, six-arm voltages are generated based on the voltage vector selector table to drive the RM5G inverter. However, severe switching noise is present in the power transistors of the inverter. These problems lead to numerous large ripples, instability, and delayed torque and flux responses at the stator. To address the aforementioned challenges, this study proposes a fuzzy controller to enhance flux signals. This controller incorporates a fuzzifier, a fuzzy rule base, and a defuzzifier. Additionally, a backpropagation neural network control is employed to improve torque signals. The multilayer neural network is utilized not only to calculate torque rapidly but also to enhance calculation accuracy. The proposed control method effectively reduces flux and torque errors, facilitating smooth control of the three-phase induction motor. After functional verification on an FPGA board, the proposed design is implemented on an ASIC fabricated using the TSMC 0.18-μm CMOS process. The results indicate a chip area of approximately 0.959 × 0.9584 mm² and a power consumption of 2.2524 mW at a supply voltage of 1.8 V and an operating frequency of 10 MHz.
|
|
10:10-10:30, Paper ThAT3.6 | |
Classification and Deployment of Animal Images Based on Convolutional Neural Networks and Transferability Estimation (I) |
|
Jiang, Biyi | Nanjing University of Science and Technology |
Wang, Jun | Nanjing University of Science and Technology |
Zhu, Peng | Nanjing University of Science and Technology |
Yang, Huanyu | Nanjing University of Science and Technology |
Keywords: Human Perception in Multimedia, Cognitive Computing
Abstract: Although current technology has achieved good results in classifying a large number of animal images, it is more practical to choose the right model and transfer it to a mobile applet application, so that users can quickly identify the classification of animals through the phone. Therefore, to address this problem, this study is divided into three parts: model training and selection; model portability assessment; and model optimization and deployment on the WeChat applet side. First, an optimal model is selected by comprehensively evaluating the accuracy, parameter counts, precision, F1-Score value and loss value of the model. Then the portability of the model is assessed using the LogME method. Finally, the introduction of ECA (Efficient Channel Attention) to the selected model resulted in a 0.51% increase in model accuracy and a 0.46% increase in precision. The large model is then applied on the WeChat applet side to realize fast recognition of animal classes, which provides a good application case for future research.
|
|
ThAT5 |
MR05 |
Technology Assessment 1 |
|
Chair: Zhang, Rui | Changsha University |
|
08:30-08:50, Paper ThAT5.1 | |
Deep Arc Detection for High-Speed Railways: An Improved YOLOv5 Approach with SEC3 Attention and BiFPN Fusion |
|
Ma, Yixuan | Beijing Jiaotong University |
Xu, Shuai | Beijing Jiaotong University |
Jia, Huiyan | Beijing Jiaotong University |
Keywords: Intelligent Transportation Systems
Abstract: Reliable and efficient arc detection is critical for ensuring the safe operation of high-speed railways. However, existing methods often suffer from low accuracy and high computational complexity, hindering their practical application. To tackle these challenges, we propose a novel deep learning-based method that significantly improves the YOLOv5 object detection network for real-time arc detection in pantograph-catenary systems. Our approach introduces two key innovations: (1) a Squeeze-and-Excitation-based C3 (SEC3) attention module to adaptively prioritize informative features, and (2) a Bi-directional Feature Pyramid Network (BiFPN) for enhanced multi-scale feature fusion. We conduct comprehensive experiments on a diverse dataset collected from real-world scenarios, covering various challenging conditions. The results show that our improved YOLOv5 network achieves superior performance compared to state-of-the-art methods in both accuracy and efficiency. Moreover, ablation studies confirm the merits of the proposed SEC3 attention and BiFPN fusion modules in boosting performance. Our approach offers a promising solution for automatic arc detection, thereby contributing to the enhanced safety and reliability of high-speed railway systems.
|
|
08:50-09:10, Paper ThAT5.2 | |
Optimal Re-Sequencing of Electric Vehicle Platoons Based on Deep Reinforcement Learning |
|
Liu, Miao | Nanjing Tech University |
Peng, Chu | Nanjing Tech University |
Guo, Shaopan | Nanjing Tech University |
Xiao, Long | Nanjing Tech University |
Shi, Benyun | Nanjing Tech University |
Peng, Yue | Nanjing Tech University |
Keywords: Intelligent Transportation Systems, Electric Vehicles and Electric Vehicle Supply Equipment
Abstract: This study addresses the issue of uneven energy consumption in electric vehicle (EV) platoons, arising from the static ordering of vehicles within the fleet. Such an imbalance can negatively impact both the efficiency of individual vehicles and the overall dynamics of the platoon. Our approach proposes dynamically altering the formation of the fleet during transit to balance energy use. The core challenge is to identify the most efficient vehicle sequence at predetermined re-sequencing points during the journey. To address this, we introduce three innovative methods based on deep reinforcement learning, chosen for their ability to handle complex, dynamic optimization problems. Our experimental studies, conducted on actual transportation networks, demonstrate these methods significantly enhance energy management and distribution efficiency in EV platoons, highlighting their potential for practical applications in intelligent transportation systems.
|
|
09:10-09:30, Paper ThAT5.3 | |
Cooperative Adaptive Fault-Tolerant Braking Control for Urban Rail Trains with Prescribed Performance |
|
Zhang, Rui | Changsha University |
Chen, Bin | Changsha University of Science and Technology |
Li, Heng | Central South University |
Zhu, Peidong | Changsha University |
Kong, Lingshuang | Changsha University |
Keywords: Intelligent Transportation Systems, Cooperative Systems and Control, System Modeling and Control
Abstract: Faults in braking actuators can compromise the safety and stability of urban rail train operations. Existing fault-tolerant control methods for trains struggle to guarantee both transient and steady-state braking performance quantitatively. In this paper, we propose a cooperative fault-tolerant braking control with prescribed performance for urban rail trains. A coupled multi-agent braking model is first developed, where each vehicle is treated as an independent and controllable agent subject to various uncertainties, input saturation and different levels of actuator faults. Further, incorporating a prescribed tracking performance function, a distributive adaptive terminal sliding mode controller is developed to ensure safe and reliable train braking control. The control input saturation nonlinearity is addressed by employing a smooth hyperbolic tangent function for approximation. Adaptation laws are introduced to mitigate the effects of parameter uncertainties and external disturbances. The efficacy of the proposed control scheme is validated through comprehensive numerical simulations.
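As a minimal sketch of the smooth saturation approximation mentioned in the abstract above: a hard input limit can be replaced by a scaled hyperbolic tangent so that the control law stays differentiable. The function names and the scaling u_max * tanh(u / u_max) are assumptions chosen for illustration; the paper's exact form is not given here.

```python
import math


def hard_saturation(u: float, u_max: float) -> float:
    """Non-smooth actuator limit: clip u to the interval [-u_max, u_max]."""
    return max(-u_max, min(u_max, u))


def smooth_saturation(u: float, u_max: float) -> float:
    """Smooth stand-in for the clip (illustrative assumption): u_max * tanh(u / u_max).

    It matches the slope of the hard limit at u = 0 and approaches +/- u_max
    for large |u|, while remaining differentiable everywhere.
    """
    return u_max * math.tanh(u / u_max)


def max_approximation_gap(u_max: float) -> float:
    """Largest difference between the two functions, reached at |u| = u_max."""
    return u_max * (1.0 - math.tanh(1.0))
```

The gap between the two functions is largest at the corner |u| = u_max, which is one reason such schemes usually treat the residual as a bounded disturbance.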
|
|
09:30-09:50, Paper ThAT5.4 | |
Robust Decentralised Control for Modular Aerial Parcel Delivery Using Persistently Excited Physics-Informed Neural Networks |
|
Kamath, Archit Krishna | Nanyang Technological University Singapore |
Nahavandi, Saeid | Swinburne University of Technology |
Anavatti, Sreenatha | University of New South Wales |
Feroskhan, Mir | Nanyang Technological University Singapore |
Keywords: Modeling of Autonomous Systems, Intelligent Transportation Systems, Autonomous Vehicle
Abstract: This paper presents a robust decentralised control approach for modular aerial parcel delivery using persistently excited physics-informed neural networks (PE-PINNs). The proposed method enables each propeller module to independently generate control efforts based solely on its local state information and that of its 1-hop neighbors, without requiring global system knowledge. The PE-PINN is trained to approximate the optimal centralized control policy by incorporating the nominal system dynamics and accounting for modeling uncertainties. Key innovations include estimating the Lipschitz constant to ensure persistent excitation during training, and a decentralised control formulation that minimizes the difference between the learned and optimal control efforts. Experimental results on a modular aerial testbed demonstrate the PE-PINN's ability to achieve high-accuracy fixed-point hover and trajectory tracking performance, outperforming a prior decentralised control approach by 8.57% and 24.17% respectively. The proposed framework enables scalable and robust control of modular aerial systems for parcel delivery applications.
|
|
09:50-10:10, Paper ThAT5.5 | |
Eye Tracking Based Data Annotation Behavior Analysis on Consecutive Images (I) |
|
Chen, Zhenqin | Guangdong University of Technology |
Luo, Chaoquan | Guangdong University of Technology |
Li, Yuxi | Guangdong University of Technology |
Yang, Zhuo | Guangdong University of Technology |
Li, Ming | The Hong Kong Polytechnic University |
Keywords: Technology Assessment, Quality and Reliability Engineering, Consumer and Industrial Applications
Abstract: Data annotation is pivotal in training artificial intelligence models, with consecutive-image annotation emerging as a key focus in this domain. Unlike single images or videos, consecutive images provide enhanced temporal resolution and continuity. Moreover, the study of eye movements offers valuable insights into attentional shifts, opening avenues for understanding user attention patterns. In this context, by integrating eye movement data with data annotation, we propose an innovative eye-tracking-assisted consecutive-image annotation system called GazeLabeler. GazeLabeler aims to quantitatively assess annotation quality by analyzing the eye movements of annotators while they evaluate a series of consecutive industrial images. Key metrics under examination include annotators' first gaze duration, regression count, gaze-saccade ratio, intersection over union (IoU) score, and Consecutive Images Gaze Synchronization (CIGS). Furthermore, our system offers personalized or group visualization analyses of eye movement data, empowering users to better comprehend annotators' behaviors and annotation quality.
|
|
10:10-10:30, Paper ThAT5.6 | |
A Deep Reinforcement Learning Approach to Optimize Closing down a Single-Arm Cluster Tool (I) |
|
Liang, WeiXin | Guangdong University of Technology |
Zhu, QingHua | Guangdong University of Technology |
Li, ZongRu | Guangdong University of Technology |
Zhou, JianTie | Guangdong University of Technology |
Hou, Yan | Guangdong University of Technology |
Keywords: Manufacturing Automation and Systems, Intelligent Green Production Systems
Abstract: Cluster tools are widely adopted in semiconductor manufacturing. When a processing module fails, a cluster tool must undergo a close-down process, transiting to an idle state. To increase the throughput of a wafer fab, minimizing the makespan of a close-down process is economically significant. However, the optimization of a close-down process is challenging due to wafer residence time constraints. Existing linear program models must be constructed when specific processing time and robot activity time parameters are given. To address the issue, we propose a scheduling method based on deep reinforcement learning. For this reinforcement learning algorithm, the specific states, actions, and reward functions are designed for the close-down process to follow the Markov property, and a deep Q-network is adopted to find the optimal scheduling policy. Experimental results demonstrate that, in contrast to traditional scheduling methods, the proposed method achieves an optimal policy and also excels in generalization, enabling adaptive real-time production scheduling.
|
|
ThAT6 |
MR06 |
Discrete Event and Distributed Systems 2 |
Regular Papers - SSE |
Chair: Li, Bin | Fujian University of Technology |
|
08:30-08:50, Paper ThAT6.1 | |
Computational Logistics: Definition Evolution, Conceptual Architecture, Practical Philosophy and Typical Application |
|
Li, Bin | Fujian University of Technology |
Keywords: Decision Support Systems, Discrete Event Systems, Intelligent Transportation Systems
Abstract: With the rapid development of global supply chains, existing methodologies and solutions struggle to cope with the operation of complex logistics systems (CLS). Accordingly, computational logistics is proposed systematically from the perspectives of definition evolution, conceptual architecture, problem-oriented exploration, and practical philosophy. Together these provide an elementary sketch and synopsis of CLS-oriented scheduling and decision methodology. Subsequently, a typical application of computational logistics to the container terminal handling system (CTHS) is discussed. After the abstraction and automation of container terminal-oriented logistics generalized computation, the memory access model and computing principles in computer science are transferred and integrated to propose the yard facility-block accessing and switching model (YFB-ASM). The processor affinity, spatial locality and cooperative computing of CTHS are discussed according to the practical data from a large-scale CTHS. The YFB-ASM is intended to establish a solid foundation for rolling plan, task scheduling, resource allocation and performance analysis of CTHS. It preliminarily illustrates the feasibility, availability, credibility and practicability of computational logistics.
|
|
08:50-09:10, Paper ThAT6.2 | |
Analysis of Bitcoin Fork by Colored Petri Nets |
|
Zhou, Zeyu | Xidian University |
Liu, Ding | Xidian University |
Shmeleva, Tatiana | Max Planck Institute for Software Systems, Kaiserslautern and Sa |
Zaitsev, Dmitry | The University of Derby |
Keywords: Discrete Event Systems, System Modeling and Control
Abstract: Bitcoin is under the threat of forks since it operates with a distributed ledger. Predicting the fork probability in advance is beneficial for taking early action to avoid malicious attacks. In this study, we compose a colored Petri net model of Bitcoin. Our model consists of a given number of nodes, and each node has five subpages representing the node structure: proof of work, broadcast blocks, verify blocks, and the process of adding blocks to the blockchain. Simulation results of fork probability can be easily obtained and analyzed by observing the data in the measuring components of subpages. The results show that our model correctly simulates the fork probability: on recent Bitcoin data, compared with the results of the widely known SimBlock simulator, a difference of some 4.3% has been obtained. Thus, taking into account its vivid graphical representation, our model has certain advantages for developing attack-avoidance techniques.
|
|
09:10-09:30, Paper ThAT6.3 | |
Fed-BRMC: Byzantine-Robust Federated Learning Via Dual-Model Contrastive Detection |
|
Gan, Xiaoyun | Guangxi Normal University |
Lu, Peng | Guangxi Normal University |
Gan, Shanyu | Guangxi Normal University |
Xian, Youquan | Guangxi Normal University |
Peng, Kaichen | Guangxi Normal University |
Liu, Peng | Guangxi Normal University |
Li, Dongcheng | Guangxi Normal University |
|
|
09:30-09:50, Paper ThAT6.4 | |
DFedCL: Decentralized Federated Collaborative Learning with Privacy Protection |
|
Hu, Jifei | Lishui University |
Li, Yanli | Nantong University |
Liu, Lifa | Lishui University |
Lou, Hua | Lishui University |
Keywords: Distributed Intelligent Systems, Quality and Reliability Engineering, System Architecture
Abstract: Federated Learning (FL) is a new machine learning paradigm that allows multiple clients to jointly train a global model without sharing the raw data. To achieve privacy protection, existing FL methods incorporate artificial noise into client model updates to ensure differential privacy (DP). However, these DP-FL approaches are vulnerable to single-point failures at the server node and suffer from significant performance declines under non-independent and identically distributed (non-IID) scenarios. To address this gap, we propose Decentralized Federated Collaborative Learning (DFedCL). The DFedCL framework is designed as a fully decentralized system where clients can directly share updated models without a server node. Within this framework, each client maintains a private model locally and uses a sharing model for information exchange. These two models are collaboratively trained and updated in each learning iteration based on the noise-added gradients generated. To improve the model generalization ability and enhance model learning performance in non-IID scenarios, we further propose the SAM-based update correction, applied by the sharing model exchange. We evaluate our proposed DFedCL on the MNIST and CIFAR-10 datasets at different degrees of non-IID; the experimental results show that DFedCL achieves state-of-the-art performance.
|
|
09:50-10:10, Paper ThAT6.5 | |
Finite-Time Multi-Parameter Smoothing Distributed Algorithm for Delayed Multi-Agent Systems Subject to Switching Topology |
|
Leng, Jiahao | University of Electronic Science and Technology of China |
Zhong, Qishui | University of Electronic Science and Technology of China |
Han, Sheng | University of Electronic Science and Technology of China |
Li, Guoyi | Southwest Minzu University |
Zhang, Gangming | University of Electronic Science and Technology of China |
Keywords: Distributed Intelligent Systems, System Modeling and Control, Communications
Abstract: This brief tackles the finite-time consensus and distributed optimization issues in multi-agent systems (MASs) affected by time-varying delays (TDs). It centers on scenarios where the agents’ communication network topology is characterized by a switching undirected graph. Distributed time-varying optimization involves a collective effort by multiple agents to collaboratively minimize the aggregate of local objective functions that change over time. This process is subject to time-dependent equality constraints and relies solely on information available at the local level and from neighboring agents. First, a finite-time multi-parameter smooth distributed algorithm with input TDs is proposed to solve the optimization problem of MASs with arbitrary switching topologies. Second, using Lyapunov stability theory, it is proved that the states of each agent can achieve consensus and asymptotically track the optimal solution trajectory within a finite time. Finally, an illustrative simulation example of target tracking with MASs is given to verify the effectiveness and applicability of the theoretical results.
|
|
10:10-10:30, Paper ThAT6.6 | |
Adaptive Observer Design for Actuator Fault Detection in Hybrid Systems |
|
Zarei, Jafar | Shiraz University of Technology |
Rastgoo, Hasan | Shiraz University of Technology |
Saif, Mehrdad | University of Windsor |
Keywords: Fault Monitoring and Diagnosis, Discrete Event Systems, Adaptive Systems
Abstract: This paper deals with the problem of fault detection for hybrid systems, focusing on the Mixed Logical Dynamic (MLD) modeling approach. An innovative nonlinear observer called the efficient adaptive observer for actuator fault detection is introduced. Despite the promising aspects of MLD, its application in fault diagnosis remains relatively unexplored. This research addresses this gap by investigating fault detection, isolation, and estimation, within the MLD framework. Leveraging the inherent advantages of MLD modeling, such as integrated system representation and simplified control task implementation, the study aims to develop an effective state estimation method tailored for hybrid systems. Additionally, it proposes novel fault detection methods leveraging residual space structures. The findings contribute to advancing fault diagnosis techniques for hybrid systems, offering practical implications for diverse engineering applications.
|
|
ThAT7 |
MR07 |
Online - Deep Learning and AI Applications 1 |
|
Chair: Wu, Jiawei | Sichuan Normal University |
|
08:30-08:50, Paper ThAT7.1 | |
Cross-Angle Facial Individual Verification of Giant Panda |
|
Wu, Jiawei | Sichuan Normal University |
Su, Han | Sichuan Normal University |
Min, Peng | Sichuan Normal University |
He, Mengnan | Chengdu Research Base of Giant Panda Breeding |
Wu, Pengcheng | Chengdu Research Base of Giant Panda Breeding |
Luo, Gai | Chengdu Research Base of Giant Panda Breeding |
Hou, Rong | Chengdu Research Base of Giant Panda Breeding |
Chen, Peng | Chengdu Research Base of Giant Panda Breeding |
Keywords: Biometric Systems and Bioinformatics, Image Processing and Pattern Recognition, Deep Learning
Abstract: Enhancing the protection of giant pandas necessitates a precise means of verifying their individual information. With the development of deep learning technology, some individual recognition methods for giant pandas have emerged. However, these methods often neglect cross-angle facial recognition, a common occurrence in natural settings, and predominantly focus on identification rather than verification, rendering them unsuitable for wild populations of unknown individuals. Cross-angle facial verification poses novel challenges, notably feature misalignment and geometric deformation induced by varying angles, particularly in the verification process reliant on comparing feature discrepancies. To address this issue, we developed the CrossPandaFace model. This model employs a Pixel Drift Unit (PDU) to adjust feature pixels and utilizes template features generated by the Template Generation Module (TGM) as a reference to align giant pandas from different angles onto a unified template. Furthermore, Multi-scale Feature Supplement (MSFS) compensates for the potential risk of losing local features of aligned features. Experimental results show state-of-the-art performance, thus affirming the efficacy of our model for cross-angle giant panda verification.
|
|
08:50-09:10, Paper ThAT7.2 | |
Efficient CNN-Transformer Aggregation Network for Remote Sensing Image Change Detection |
|
Kong, Bingjie | Qilu University of Technology (Shandong Academy of Sciences) |
Meng, Qinglong | Qilu University of Technology (Shandong Academy of Sciences) |
Hao, Fengqi | Key Laboratory of Computing Power Network and Information Secur |
Bai, Jinqiang | Qilu University of Technology (Shandong Academy of Sciences) |
Keywords: Deep Learning, Machine Vision, Neural Networks and their Applications
Abstract: Change detection is a crucial task in the field of remote sensing, focusing on identifying areas where significant changes occur between two remote sensing images captured at different time points. However, existing change detection methods based on CNNs or Transformers face two main challenges. Firstly, CNNs often lack the ability to effectively model long-range dependencies, resulting in diminished recognition performance for targets sharing the same semantics but having different features. Secondly, Transformers leverage self-attention mechanisms to capture long-range dependencies adeptly but may struggle to capture local details and multi-scale features effectively. In response to these challenges, we propose a CNN and Transformer Aggregation Network (CTA-Net) for change detection in remote sensing images. Specifically, we devise two encoders based on Transformers and CNNs, respectively, facilitating the generation of complementary global and local features during the encoding phase. In the decoding phase, we design a pyramid-structured complementary decoder to aggregate multi-level complementary features from the CNN and Transformer branches. Furthermore, we propose a novel skip connection strategy to exploit the relationship between decoded features and multi-level complementary features, thereby enhancing the model's multi-scale invariance. Extensive experiments conducted on multiple benchmark datasets validate the effectiveness of our proposed CTA-Net model.
|
|
09:10-09:30, Paper ThAT7.3 | |
Knowledge-Based Transformer: Enhancing Neural Machine Translation through Knowledge Base Integration |
|
Vo, Phong | University of Science, Ho Chi Minh City |
Nguyen, Long | University of Science, Ho Chi Minh City, Vietnam |
Keywords: Application of Artificial Intelligence, Computational Intelligence, Neural Networks and their Applications
Abstract: The Transformer architecture, introduced in 2017, has demonstrated its power in machine translation tasks. However, significant challenges persist, notably in accurately understanding and converting meaning between languages. Out-of-vocabulary words, especially rare entities and terminological expressions in datasets, are the main cause of this inefficiency. We propose two methods integrating knowledge bases into the Transformer model to generate more accurate translations for entities and terms, thus improving overall translation quality. Specifically, these methods involve linking representation vectors for entities in the knowledge base to internal vectors generated within the Transformer architecture or directly replacing internal vectors with those of corresponding entities in the knowledge base. Our methods have shown promising results on language pairs from the IWSLT dataset, evaluated using four metrics: BLEU, TER, METEOR, and GLEU. Our source code is available on GitHub at https://github.com/VTaPo/KnowledgeBaseTransformer.
|
|
09:30-09:50, Paper ThAT7.4 | |
Grassland Mouse Hole Recognition Model Based on UAV Remote Sensing and Improved YOLOv7 |
|
Han, Yu | Inner Mongolia University |
Bai, Xiangyu | Inner Mongolia University |
Gao, Xiaochen | Inner Mongolia University |
Cheng, Haoran | Inner Mongolia University |
Zhou, Kexin | Inner Mongolia University |
Keywords: Application of Artificial Intelligence, AI and Applications, Machine Learning
Abstract: As the most widely distributed vegetation type on Earth, grasslands are the second largest ecosystem after forests, and are known as the "skin of the earth". However, the frequency of rodent infestation has been increasing year by year due to climate change, loose soil and other factors. Traditional monitoring methods are time-consuming and laborious, whereas deep learning methods offer low cost, high efficiency and high accuracy. Therefore, this paper proposes combining UAV remote sensing with deep learning to monitor grassland mouse holes. First, because there is no publicly available grassland mouse hole dataset, this study used a UAV to capture images to create a grassland mouse hole dataset. Second, an improved YOLOv7 model, YOLOAPM, is proposed with the goal of improving the accuracy of small target recognition. Ablation and comparison experiments verify the validity of the improved model and show that the proposed method improves on other models. Finally, in order to facilitate the monitoring task of the researchers, a monitoring system is established and the proposed mouse hole identification method is embedded into the system.
|
|
09:50-10:10, Paper ThAT7.5 | |
SCMA Codebooks Design for Three Optimization Algorithms Based on Eisenstein Integer Unit Circle |
|
Wan, Teng | Xinjiang University |
Ge, Wenping | Xinjiang University |
Ubul, Kurban | Xinjiang University |
Liao, Yifan | Xinjiang University |
Keywords: Communications
Abstract: Codebook design plays a crucial role in non-orthogonal sparse code multiple access technology. In this paper, a mother constellation constructed by the Eisenstein integer unit circle in the complex plane is proposed, and the power imbalance and dimensionality reduction are introduced into the codebook design. Three optimization algorithms are used to maximize the minimum Euclidean distance (MED) of superimposed codewords on the resource element as the objective function, and the optimal solution of the rotation angle is finally obtained, so as to obtain the three expected codebooks. Simulation results demonstrate that the proposed codebooks have smaller bit error rate (BER) and better performance than the four benchmark codebooks provided under the condition of a Gaussian channel.
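To make the MED objective above concrete, the following sketch computes the minimum Euclidean distance of a small complex constellation. It uses the six Eisenstein integer units (the sixth roots of unity) as a stand-in constellation; this choice, and all function names, are illustrative assumptions. It is not the paper's mother constellation, and it does not compute the superimposed-codeword MED that the paper optimizes.

```python
import cmath
import math
from itertools import combinations


def eisenstein_units():
    """The six units of the Eisenstein integers: exp(i*pi*k/3) for k = 0..5."""
    return [cmath.exp(1j * math.pi * k / 3) for k in range(6)]


def rotate(points, angle):
    """Rotate every constellation point by `angle` radians about the origin."""
    phasor = cmath.exp(1j * angle)
    return [p * phasor for p in points]


def min_euclidean_distance(points):
    """Smallest pairwise distance between distinct constellation points."""
    return min(abs(a - b) for a, b in combinations(points, 2))
```

Rotating a single constellation leaves its own MED unchanged; a rotation angle only becomes a useful optimization variable once several users' rotated constellations are superimposed, as in the setting the abstract describes.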
|
|
10:10-10:30, Paper ThAT7.6 | |
A Novel and Efficient Web Service Discovery Method Based on Threaded Index |
|
Liu, Jiamei | University of Science and Technology Beijing |
Keywords: Service Systems and Organizations, Infrastructure Systems and Services
Abstract: As the number of web services proliferates, service discovery has emerged as a burgeoning area of research. Conventional service discovery methods hinge on index creation to enhance the efficiency of service identification. Owing to the multilevel indexing mechanism, determining the hash range for distinct services is crucial. Additionally, heterogeneity between service compositions results in differing indexing structures. In this study, we present a novel thread-based indexing method explicitly devised for service discovery in web service environments. Our approach commences with employing indexing to distinguish partitioning intervals of web service repositories. Subsequently, we assign a corresponding index hash value to each web service. Each index is then threaded and registered within a thread-based index pool. By allocating threads to a central service discovery server, the pertinent search thread undertakes the service retrieval task initiated by the service consumer, enabling asynchronous processing and result delivery to the client. To corroborate our method, we performed comprehensive experiments in real-world web service discovery scenarios. The outcomes underscore the effectiveness of our approach, emphasizing its potential to augment service discovery in web service environments.
|
|
ThAT9 |
MR09 |
Deep Learning and Neural Networks 9 |
Regular Papers - Cybernetics |
Chair: Wang, Ran | Shenzhen University |
|
08:30-08:50, Paper ThAT9.1 | |
On the Adversarial Robustness of Hierarchical Classification |
|
Wang, Ran | Shenzhen University |
Simeng, Zeng | Shenzhen University |
Wenhui, Wu | Shenzhen University |
Yuheng, Jia | Southeast University |
Ng, Wing Yin | South China University of Technology |
Wang, Xizhao | Shenzhen University |
Keywords: Machine Learning, Deep Learning, Neural Networks and their Applications
Abstract: Deep neural networks (DNNs) have demonstrated remarkable success on various learning problems, but they face a formidable challenge in the form of adversarial attacks. In particular, when dealing with complex classification tasks for numerous classes with a hierarchical structure, the adversarial robustness of a DNN model may drop seriously. In this paper, we investigate the adversarial robustness of DNN models on such complex classification tasks. In response, we propose a two-stage hierarchical classification framework, which is composed of a coarse-grained classifier and a series of fine-grained classifiers. A data correction sampling module is designed between the two stages, in order to mitigate the influence of misclassification caused by the coarse-grained classifier; and a discriminative filter learning module is employed in the fine-grained classification, in order to better distinguish among fine-grained categories. Experiments on the well-known dataset CIFAR-100 and a newly-constructed hierarchical dataset mini-ImageNet76 demonstrate that employing a hierarchical framework can effectively improve the model robustness on such complex classification tasks.
|
|
08:50-09:10, Paper ThAT9.2 | |
WASPCN-Net: Automatic Detection of Obstructive Sleep Apnea Using Smoothed Wavelet Spectrograms of Single-Lead ECG Signals |
|
Bhongade, Amit | Indian Institute of Technology Delhi |
Gandhi, Tapan Kumar | Indian Institute of Technology Delhi |
Keywords: Deep Learning, Machine Learning, Neural Networks and their Applications
Abstract: Obstructive sleep apnea (OSA) is an extremely severe condition. At present, the conventional polysomnography (PSG) test is utilised to diagnose OSA using several physiological signals, such as an electroencephalogram (EEG), an electrocardiogram (ECG), and the blood oxygen level. During the PSG test, the patient is required to wear many sensors while sleeping. This method is unquestionably complex, expensive, and can cause discomfort to patients. In addition, it is more appropriate to use single-lead ECG signals for wearable mobile devices due to their compatibility with noninvasive requirements and hardware limitations. In this research, a deep learning model (DLM) using smoothed wavelet spectrograms (SWS) of ECG signals is proposed for the automatic classification of OSA. The SWSs are smoothed using the Savitzky-Golay (S-G) filter. Then, these SWSs are provided as input to the designed DLM called WAvelet SPectrogram-based Convolutional Neural Network (WASPCN-Net) and a pre-trained ResNet-50 model. The WASPCN-Net model obtained an accuracy of 87.25%, sensitivity of 78.97%, and specificity of 92.35% with SWS using a 10-fold cross-validation approach, which is superior to many state-of-the-art techniques. Further, we also evaluated the pre-trained ResNet-50 model and obtained an accuracy of 86.92%, sensitivity of 81.74%, and specificity of 90.12%. The proposed WASPCN-Net is more accurate, simple, fast, and robust than the ResNet-50 model because it requires relatively few tunable learning parameters.
|
|
09:10-09:30, Paper ThAT9.3 | |
APS-LSTM: Exploiting Multi-Periodicity and Diverse Spatial Dependencies for Flood Forecasting |
|
Feng, Jun | Hohai University |
Liu, Xueyi | Hohai University |
Lu, Jiamin | Hohai University |
Shao, Pingping | Hohai University |
Keywords: Deep Learning, Neural Networks and their Applications
Abstract: Accurate flood prediction is crucial for disaster prevention and mitigation. Hydrological data exhibit highly nonlinear temporal patterns and encompass complex spatial relationships between rainfall and flow. Existing flood prediction models struggle to capture these intricate temporal features and spatial dependencies. This paper presents an adaptive periodic and spatial self-attention method based on LSTM (APS-LSTM) to address these challenges. The APS-LSTM learns temporal features from a multi-periodicity perspective and captures diverse spatial dependencies from different period divisions. The APS-LSTM consists of three main stages: (i) Multi-Period Division, which utilizes the Fast Fourier Transform (FFT) to divide various periodic patterns; (ii) Spatio-Temporal Information Extraction, which performs periodic and spatial self-attention focusing on intra- and inter-periodic temporal patterns and spatial dependencies; (iii) Adaptive Aggregation, which relies on amplitude strength to aggregate the computational results from each periodic division. Extensive experiments on two real-world datasets demonstrate the superiority of APS-LSTM. The code is available at https://github.com/oopcmd/APS-LSTM.
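The first and third stages named above (period division from a Fourier spectrum, then amplitude-based aggregation) can be sketched roughly as follows. This is not the authors' implementation, which is at the linked repository. The function names, the naive DFT used in place of an FFT library, the top-k selection rule, and the softmax weighting are all assumptions made for illustration.

```python
import cmath
import math


def dft_amplitudes(series):
    """Amplitude of each non-negative frequency bin (naive O(n^2) DFT)."""
    n = len(series)
    amplitudes = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(series))
        amplitudes.append(abs(coeff) / n)
    return amplitudes


def dominant_periods(series, top_k=2):
    """Return (period_length, amplitude) for the top_k strongest frequencies."""
    amplitudes = dft_amplitudes(series)
    # Skip bin 0 (the mean), then rank the remaining bins by amplitude.
    ranked = sorted(range(1, len(amplitudes)),
                    key=lambda k: amplitudes[k], reverse=True)[:top_k]
    return [(len(series) // k, amplitudes[k]) for k in ranked]


def aggregation_weights(amplitudes):
    """Softmax over amplitudes, so stronger periods receive larger weights."""
    exps = [math.exp(a) for a in amplitudes]
    total = sum(exps)
    return [e / total for e in exps]
```

For a 48-step series built from a period-24 component and a weaker period-8 component, `dominant_periods` recovers those two period lengths in order of strength, and `aggregation_weights` would then weight the outputs computed under each period division.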
|
|
09:30-09:50, Paper ThAT9.4 | |
KANS: Knowledge Discovery Graph Attention Network for Soft Sensing in Multivariate Industrial Processes |
|
Tew, Hwa Hui | Monash University Malaysia |
Li, Gaoxuan | Monash University |
Ding, Fan | Monash University |
Luo, Xuewen | Monash University |
Loo, Junn Yong | Monash University Malaysia |
Ting, Chee-Ming | Monash University Malaysia, School of Information Technology |
Ding, Ze Yang | Monash University Malaysia |
Tan, Chee Pin | Monash University |
Keywords: Deep Learning, Neural Networks and their Applications, Knowledge Acquisition
Abstract: Soft sensing of hard-to-measure variables is often crucial in industrial processes. Current practices rely heavily on conventional modeling techniques that show success in improving accuracy. However, they overlook the non-linear nature, dynamic characteristics, and non-Euclidean dependencies between complex process variables. To tackle these challenges, we present a framework known as a Knowledge discovery graph Attention Network for effective Soft sensing (KANS). Unlike the existing deep learning soft sensor models, KANS can discover the intrinsic correlations and irregular relationships between the multivariate industrial processes without a predefined topology. First, an unsupervised graph structure learning method is introduced, incorporating the cosine similarity between different sensor embeddings to capture the correlations between sensors. Next, we present a graph attention-based representation learning method that can process the multivariate data in parallel to enhance the model in learning complex sensor nodes and edges. To fully explore KANS, knowledge discovery analysis has also been conducted to demonstrate the interpretability of the model. Experimental results demonstrate that KANS significantly outperforms all the baselines and state-of-the-art methods in soft sensing performance. Furthermore, the analysis shows that KANS can find sensors closely related to different process variables without domain knowledge, significantly improving soft sensing accuracy.
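The graph structure learning step described above can be illustrated with a small sketch: given one embedding vector per sensor, link each sensor to the sensors whose embeddings are most similar in the cosine sense. The top-k neighbor rule, the function names, and the toy embeddings are assumptions for illustration only; in the paper the embeddings are learned, and its exact edge-selection rule is not stated in the abstract.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_neighbors(embeddings, k):
    """Map each sensor index to the k other sensors with the highest similarity."""
    graph = {}
    for i, emb_i in enumerate(embeddings):
        scored = [(cosine_similarity(emb_i, emb_j), j)
                  for j, emb_j in enumerate(embeddings) if j != i]
        scored.sort(reverse=True)  # highest similarity first
        graph[i] = [j for _, j in scored[:k]]
    return graph


# Toy embeddings (hypothetical): sensors 0 and 1 point one way, 2 and 3 another.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_graph = top_k_neighbors(toy_embeddings, k=1)
```

With these toy vectors, each sensor is linked to the one sensor pointing in nearly the same direction, so no predefined topology is needed to recover the two groups.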
|
|
09:50-10:10, Paper ThAT9.5 | |
VNet: A GAN-Based Multi-Tier Discriminator Network for Speech Synthesis Vocoders |
|
Cao, Yubing | Xinjiang University |
Li, Yongming | Xinjiang University |
Wang, Liejun | Xinjiang University |
Yu, Yinfeng | Xinjiang University |
Keywords: Deep Learning, AI and Applications, Image Processing and Pattern Recognition
Abstract: Since the introduction of Generative Adversarial Networks (GANs) in speech synthesis, remarkable achievements have been attained. In a thorough exploration of vocoders, it has been discovered that audio waveforms can be generated at speeds exceeding real-time while maintaining high fidelity, achieved through the utilization of GAN-based models. Typically, the inputs to the vocoder consist of band-limited spectral information, which inevitably sacrifices high-frequency details. To address this, we adopt the full-band Mel spectrogram information as input, aiming to provide the vocoder with the most comprehensive information possible. However, previous studies have revealed that the use of full-band spectral information as input can result in the issue of over-smoothing, compromising the naturalness of the synthesized speech. To tackle this challenge, we propose VNet, a GAN-based neural vocoder network that incorporates full-band spectral information and introduces a Multi-Tier Discriminator (MTD) comprising multiple sub-discriminators to generate high-resolution signals. Additionally, we introduce an asymptotically constrained method that modifies the adversarial loss of the generator and discriminator, enhancing the stability of the training process. Through rigorous experiments, we demonstrate that the VNet model is capable of generating high-fidelity speech and significantly improving the performance of the vocoder.
|
|
10:10-10:30, Paper ThAT9.6 | |
TRFP: A Trip Recommendation Approach for a Query with Fixed Intermediate POI |
|
Luan, Wenjing | Shandong University of Science and Technology |
Jiang, Guodong | Shandong University of Science and Technology |
Qi, Liang | Shandong University of Science and Technology |
Liu, Kun | Shandong University of Science and Technology |
Guo, Xiwang | Liaoning Petrochemical University |
Keywords: Neural Networks and their Applications, Representation Learning, Deep Learning
Abstract: Trip recommendation, as a service based on location-based social networks, aims to provide users with a sequence of points of interest (POIs) according to their preferences and requirements when exploring unfamiliar cities. In contrast to prior research on trip recommendation, our research addresses the following problem: if a user is scheduled to attend an academic conference at 2:30 PM, how can they visit the city’s attractions while still managing to attend the conference? In this paper, we refer to it as trip recommendation with a single fixed intermediate POI (FP). To address this problem, a trip recommendation method based on mixed graph representation learning is proposed. First, a mixed graph is used to describe the spatial, temporal, and transition knowledge in users’ check-in data, where directed edges represent transition relations among POIs. Then, we employ a graph convolutional network to integrate knowledge matrices extracted from the mixed graph, aiming to obtain POI and time embeddings. Finally, a trip inference module, incorporating a dual decoder, POI popularity knowledge, and positional encoding, is designed to generate a trip for a given query. Experiments are conducted on five real-world trip datasets. The results demonstrate that the proposed method outperforms several widely used baselines when recommending a trip with an FP.
|
|
ThAT10 |
MR10 |
Deep Learning and Neural Networks 12 |
Regular Papers - Cybernetics |
Chair: Sedov, Vivian | Royal Holloway, University of London |
|
08:30-08:50, Paper ThAT10.1 | |
Auxiliary Generative Adversarial Networks with Illustration2Vec and Q-Learning Based Hyperparameter Optimisation for Anime Image Synthesis |
|
Sedov, Vivian | Royal Holloway, University of London |
Zhang, Li | Royal Holloway, University of London |
Keywords: Computational Intelligence, Machine Learning, Neural Networks and their Applications
Abstract: Harnessing the power of Generative Adversarial Networks (GANs) for the specialised task of anime face generation, this study introduces enhanced models of Auxiliary Classifier GAN (AC-GAN) and Wasserstein Auxiliary Classifier GAN (WAC-GAN) with modified network architectures and reinforcement learning-based hyperparameter optimisation. These models are uniquely adapted to handle the distinct nuances of anime-style imagery, a domain where conventional GANs often stumble due to complex stylistic variations and a heightened risk of mode collapse. Novel elements of our approach include (1) modification of the existing generator and discriminator architectures of both AC-GAN and WAC-GAN, (2) Q-learning based optimal hyperparameter selection, and (3) Illustration2Vec (I2V)-based automated attribute label extraction. Specifically, the Q-learning method is employed for hyperparameter search, effectively exploring the search space of key network configurations by fulfilling the principles of Bellman optimality. In addition, the deep learning-based I2V method is utilised to generate attribute class labels and latent vectors to inform the generation process. Furthermore, we augment AC-GAN and WAC-GAN with additional layers to enhance their feature learning and generative capabilities. The insertion of these additional layers is calibrated based on the optimised network learning settings as well as the class labels derived from I2V, to fine-tune model scalability and diversity. Our experimental studies indicate that the conjunction of these techniques has led to a significant improvement in generating high-fidelity anime faces, adeptly handling the diverse and complex attributes inherent in anime-style imagery. The proposed strategies also showcase the potential of our customised AC-GAN and WAC-GAN models to master the nuanced art of anime face generation.
|
|
08:50-09:10, Paper ThAT10.2 | |
CSLP: Collaborative Solution to Long-Tail Problem and Popularity Bias in Sequential Recommendation |
|
Huang, Yan | Qilu University of Technology (Shandong Academy of Sciences) |
Yang, Zhenyu | Qilu University of Technology (Shandong Academy of Sciences) |
Hu, Wenyue | Qilu University of Technology (Shandong Academy of Sciences) |
Xu, Baojie | Qilu University of Technology (Shandong Academy of Sciences) |
Zhang, Zhibo | Qilu University of Technology |
Keywords: Neural Networks and their Applications, Representation Learning, Deep Learning
Abstract: Sequential Recommender Systems (SRS), leveraging the temporal information from users' behaviors, have noticeably improved user experience over traditional systems. However, these behaviors often follow a long-tail distribution, making the systems biased towards popular items (i.e., popularity bias). Moreover, popularity bias amplifies the neglect of long-tail recommendations, thereby sharpening the long-tail problem. Previous studies usually address these challenges independently, focusing on either reducing the over-recommendation of popular items or enhancing the representation quality of tail items. Indeed, it is possible to incorporate their merits to achieve the best of both worlds. Thus, we propose a novel and unified framework, named Collaborative Solution to Long-Tail Problem and Popularity Bias (CSLP), to tackle both the long-tail problem and popularity bias simultaneously. To achieve this, we first introduce a representation enhancement module featuring dual generators to enhance user and item representations, particularly for those in the tail. In addition, a debiasing module incorporating an Inverse Propensity Score (IPS) with a clipping strategy is introduced to further alleviate the popularity bias. Specifically, this clipping strategy clearly decreases the variance of the original IPS method, effectively improving recommendation stability and accuracy. Experiments on three widely used datasets show CSLP's effectiveness in solving both issues. CSLP surpasses all baselines (traditional, popularity bias, and long-tail problem) in overall performance, significantly enhancing recommendation accuracy for both tail users and items, and achieving a more balanced ratio of recommendations between popular and tail items. Code is available at https://github.com/Echohuangyan/CSLP.
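To make the variance argument for clipped IPS concrete, here is a minimal sketch. It assumes that an item's propensity is its share of all interactions and that clipping places a lower bound `clip_min` on that propensity. Both choices, and every name in the code, are illustrative assumptions and are not taken from the CSLP code base.

```python
def propensities(interaction_counts):
    """Assumed propensity model: each item's share of all observed interactions."""
    total = sum(interaction_counts.values())
    return {item: count / total for item, count in interaction_counts.items()}


def clipped_ips_weights(props, clip_min):
    """Inverse propensity weights with the propensity floored at clip_min.

    Without the floor, very rare items get very large weights, which inflates
    the variance of any IPS-weighted loss.
    """
    return {item: 1.0 / max(p, clip_min) for item, p in props.items()}


def weighted_mean_loss(per_interaction_losses, weights):
    """Average of (item, loss) pairs, each loss scaled by its item's weight."""
    total_weight = sum(weights[item] for item, _ in per_interaction_losses)
    weighted = sum(weights[item] * loss for item, loss in per_interaction_losses)
    return weighted / total_weight


if __name__ == "__main__":
    counts = {"popular": 90, "mid": 9, "tail": 1}  # toy interaction counts
    props = propensities(counts)
    print(clipped_ips_weights(props, clip_min=0.05))
```

With these toy counts the unclipped weight of the tail item would be 1 / 0.01 = 100, while the clipped weight is 1 / 0.05 = 20, so tail items are still up-weighted but no single interaction dominates the loss.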
|
|
09:10-09:30, Paper ThAT10.3 | |
RFSD-YOLO: An Enhanced X-Ray Object Detection Model for Prohibited Items |
|
Kong, Xiaotong | Qilu University of Technology (Shandong Academy of Sciences) |
Li, Aimin | Qilu University of Technology |
Li, Wenqiang | Qilu University of Technology (Shandong Academy of Sciences) |
Li, Zhiyao | Qilu University of Technology (Shandong Academy of Sciences) |
Zhang, Yuechen | Qilu University of Technology (Shandong Academy of Sciences) |
Keywords: Application of Artificial Intelligence, Neural Networks and their Applications, Deep Learning
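Abstract: X-ray image inspection is crucial for ensuring public safety, but traditional methods rely heavily on manual analysis and are relatively inefficient. To address this problem, this paper proposes a new object detection model called RFSD-YOLO, which is based on the YOLOv8 model and is designed specifically for detecting prohibited items in X-ray images. The model adopts the RFCAConv structure in place of traditional convolution operations. This allows independent parameterization between convolution kernels, enhancing the model's ability to capture and express the features of potential prohibited items. The neck of the model incorporates the GSConv and VoVGSCSP designs, which aim to balance complexity and parameter size while maintaining detection performance and reducing the computational burden. In addition, we introduce a dynamic head structure, DyHead, which replaces the traditional detection head design and improves detection accuracy without increasing computational cost. Experimental results demonstrate that our enhanced model surpasses current state-of-the-art object detection models in detection &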
|
|
09:30-09:50, Paper ThAT10.4 | |
Simple Structure Enhanced Contrastive Graph Clustering |
|
Wang, Haojin | Shanghaitech University |
Wang, Kexin | Shanghaitech University |
Keywords: Deep Learning, Machine Learning, Representation Learning
Abstract: As an important sub-field of clustering analysis, deep graph clustering is receiving more and more attention from academia and industry. The goal of deep graph clustering is to learn effective embeddings for all nodes in the graph. Such node embeddings can effectively support the clustering task and thus be extended to various real-world application scenarios. However, existing graph clustering methods usually focus on technical-level improvements and ignore data-level information augmentation. In fact, data augmentation on graph data can effectively improve the receptive field and feature richness of a model, and its effectiveness has been verified in many deep learning fields. Based on this motivation, we propose a simple structure enhanced method for graph clustering, called SEGC. This method only needs a simple deep clustering network at the technical level and achieves better performance by performing data augmentation at the data level. Specifically, we leverage the underexplored potential of node activeness to perform edge-increasing and edge-decreasing operations on the original graph data, thereby generating different views to enhance the model's learning ability and receptive field. Extensive experiments on multiple real-world datasets demonstrate the effectiveness of our method.
|
|
09:50-10:10, Paper ThAT10.5 | |
IAMS-Net: An Illumination-Adaptive Multi-Scale Lesion Segmentation Network |
|
Zheng, Yisen | Guangdong University of Technology |
Huang, Guoheng | Guangdong University of Technology, School of Computer Science A |
Zhang, Feng | Department of Otorhinolaryngology, the First Affiliated Hospital |
Cheng, Lianglun | School of Computer Science and Technology, Guangdong University |
Yuan, Xiaochen | Macao Polytechnic University |
Zhong, Guo | Guangdong University of Foreign Studies, School of Information S |
Luo, Shenghong | University of Macau |
Keywords: Deep Learning, Machine Vision, Image Processing and Pattern Recognition
Abstract: In recent years, many lesion segmentation (LS) models based on UNet have been proposed. However, existing research rarely considers how illumination changes give rise to weak boundary areas. For lesions such as melanomas and polyps, demarcating the boundary between the diseased area and the surrounding tissue remains particularly challenging. To overcome these challenges, we propose an Illumination-Adaptive Multi-scale Lesion Segmentation Network (IAMS-Net). In IAMS-Net, we integrate Illumination-Adaptive Multi-Stream Attention (IAMA) and a Contour Perception Module (CPM). In the decoding stage, the IAMA is used as a bridge between the encoder and the decoder to mitigate the adverse effects of illumination changes on the segmentation of weak-boundary lesions. To further enhance the boundary features lost due to illumination change in low-contrast lesion areas, we introduce the CPM to improve the perception of the integrity of the lesion area. We then performed comparison and ablation experiments using the publicly available ISIC2018 dataset and the individually collected dataset BoreIllumination (BI).
|
|
10:10-10:30, Paper ThAT10.6 | |
Enhancing Event Tagger with Automatic Speech Recognizer for Audio Multi-Task Scenarios by Distillation with Pre-Trained Large Models |
|
Cheng, Jianfeng | Lizhi Inc |
Liu, Ye | Sun Yat-Sen University |
Yin, Jian | Sun Yat-Sen University |
Wang, Liangdao | Sun Yat-Sen University |
Pan, Yan | Sun Yat-Sen University |
Keywords: Hybrid Models of Neural Networks, Fuzzy Systems, and Evolutionary Computing, Deep Learning, AI and Applications
Abstract: With the continuous expansion of robotics and digital humans in practical applications, demands on the auditory system are growing, typically requiring more efficient speech recognition frameworks that handle multiple tasks while using fewer resources. In previous audio processing frameworks, each type of audio processing task typically requires constructing a standalone deep network model for training, which results in more training data and longer training time when constructing models for multi-task audio scenarios simultaneously. Recent improvements in transformer-based audio models have brought about methods that can handle multiple audio tasks concurrently. However, these methods still require retraining on multi-task targets with a considerable amount of data, and after training they achieve only mediocre results in multi-task scenarios compared with a simple combination of standalone methods processed separately. To better build a model that can handle multiple audio tasks, we propose a novel framework of distillation through pre-trained large models for enhancing an event tagger with an automatic speech recognizer. Through multiple rounds of experiments on several audio datasets, it has been verified that the proposed framework achieves better results than the baseline for multitasking, and comparable results with fewer parameters than the baselines for single-task scenarios.
|
|
ThAT11 |
MR11 |
Optimization and Self-Organization 2 |
Regular Papers - Cybernetics |
Chair: Yeom, Chanho | Yonsei University |
|
08:30-08:50, Paper ThAT11.1 | |
Conjugate Momentum Quadratic Penalty Alternating Minimization for Total Variation Image Restoration |
|
Ong, Yin Ren | Universiti Teknologi Malaysia |
bin Adam, Tarmizi | Universiti Teknologi Malaysia |
Mohamed, Nur Syarafina | Department of Mathematical Sciences, Faculty of Science, Univers |
Hassan, Mohd Fikree | Monash University Malaysia |
Pang, Yee Yong | Universiti Teknologi Malaysia |
Keywords: Optimization and Self-Organization Approaches, Image Processing and Pattern Recognition, Computational Intelligence
Abstract: Optimization algorithms are a core tool in image restoration. The Quadratic Penalty Alternating Minimization (QPAM) is an algorithm used to tackle image restoration challenges. However, the persistent challenge of slow convergence speed remains. Efforts have been made to enhance convergence speed, including extending the algorithm with Nesterov's momentum method. Yet, the algorithm displays oscillatory patterns during the minimization process, which may result in slow convergence. To address this issue, we propose a conjugate gradient style momentum to accelerate the QPAM for image restoration. The iterative scheme of the proposed method consists of a proximal linearization that is reformulated for the conjugate momentum acceleration. Experiments on both Gaussian and Poisson noise image restoration show that our proposed Conjugate Momentum QPAM is on par with or better than the original QPAM and its Nesterov accelerated version in terms of CPU time.
|
|
08:50-09:10, Paper ThAT11.2 | |
An Adaptive Parallel Gaussian Homotopy Approach for Zeroth-Order Optimization |
|
Zhou, Bingyi | Huazhong University of Science and Technology |
Wang, Guoyou | Huazhong University of Science and Technology |
Keywords: Optimization and Self-Organization Approaches, Neural Networks and their Applications, Machine Learning
Abstract: The Gaussian homotopy method is a classical optimization approach employed for solving nonconvex problems. It applies Gaussian smoothing to a given problem, using varying smoothing factors to generate a series of proxy problems that are easier to solve, subsequently addressing them gradually from simpler to more complex ones. Traditional Gaussian homotopy methods typically utilize a predefined sequence or parameter to update a single smoothing factor serially. However, this sequential updating process is inefficient and unsuitable for parallel homotopy pipelines. Moreover, the utilization of predefined sequences or parameters leads to a lack of adaptability. To address the abovementioned challenges, we propose an adaptive parallel Gaussian homotopy optimization method. Initially, we introduce an adaptive model called Scaling. This model is formulated as a scaler that concurrently adjusts multiple smoothing factors, with each factor aligning to a homotopy level. It can be integrated into serial and parallel homotopy pipelines easily. Furthermore, we establish a collaborative training regimen to jointly train the Scaling model and a parallel homotopy model, named continuation path learning (CPL) [1] model. Throughout the training process, the Scaling model furnishes CPL with multiple scaled smoothing factors and updates CPL implicitly. Extensive experiments demonstrate that the proposed Gaussian homotopy approach performs competitively.
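The traditional serial scheme described in the abstract above (smooth the objective with a Gaussian, then solve from heavily smoothed to lightly smoothed problems) can be sketched as follows. This shows the predefined-sequence baseline with a standard two-point zeroth-order gradient estimate, not the adaptive Scaling or CPL models proposed in the paper. The function names, step size, sample count, and schedule of smoothing factors are all illustrative assumptions.

```python
import random


def smoothed_grad(f, x, sigma, num_samples, rng):
    """Two-point zeroth-order estimate of the gradient of the Gaussian-smoothed f.

    Uses only function evaluations: for Gaussian directions u, averages
    (f(x + sigma*u) - f(x - sigma*u)) / (2*sigma) * u.
    """
    dim = len(x)
    grad = [0.0] * dim
    for _ in range(num_samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x_plus = [xi + sigma * ui for xi, ui in zip(x, u)]
        x_minus = [xi - sigma * ui for xi, ui in zip(x, u)]
        scale = (f(x_plus) - f(x_minus)) / (2.0 * sigma)
        for d in range(dim):
            grad[d] += scale * u[d] / num_samples
    return grad


def gaussian_homotopy(f, x0, sigmas, steps_per_level, lr, num_samples, seed=0):
    """Serial homotopy: run gradient steps at each smoothing factor in turn.

    `sigmas` is a predefined decreasing sequence (the non-adaptive setting).
    """
    rng = random.Random(seed)
    x = list(x0)
    for sigma in sigmas:
        for _ in range(steps_per_level):
            g = smoothed_grad(f, x, sigma, num_samples, rng)
            x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


if __name__ == "__main__":
    # Toy objective with its minimum at x = 3 (chosen only for illustration).
    objective = lambda x: (x[0] - 3.0) ** 2
    print(gaussian_homotopy(objective, [-5.0], [1.0, 0.5, 0.1], 100, 0.05, 20))
```

Each smoothing level reuses the solution of the previous one as its starting point, which is the sequential dependency the abstract identifies as a bottleneck for parallel pipelines.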
|
|
09:10-09:30, Paper ThAT11.3 | |
Towards Workload-Specific Configuration Tuning Via Meta-Learning for RocksDB |
|
Yeom, Chanho | Yonsei University |
Lee, Jieun | Yonsei University |
Seo, Sangmin | Yonsei University |
Park, Sanghyun | Yonsei University |
Keywords: Application of Artificial Intelligence, Optimization and Self-Organization Approaches
Abstract: RocksDB, a persistent key-value store, is adaptable to various workloads and provides fast, low-latency storage for devices that are utilized by numerous applications. RocksDB exposes numerous configuration options for customization and performance optimization. Unfortunately, determining an optimal configuration for each given workload remains challenging due to the overwhelming number of options. This complexity is compounded by different types of workloads, thereby requiring efficient configuration tuning. Recent studies have pursued automatic tuning to solve this problem, either by applying reinforcement learning or by transferring prior knowledge to predictive models in order to tune unobserved target workloads. However, the former method is time-consuming, and the latter yields unstable tuning results that depend on the accuracy of the predictive models. Models trained with prior knowledge estimate the RocksDB performance of given configurations on the target workload, and workload mismatches degrade tuning performance. To address these challenges, we propose MetaTune, which uses a meta-learner to train a workload-specific predictive model. MetaTune effectively transfers prior knowledge and efficiently fine-tunes the model for new workloads. We conducted a comparative analysis of MetaTune with the state-of-the-art baselines across a heterogeneous set of workloads. MetaTune achieved 3.78% to 53.25% improvement in tuning performance compared to the most recent baseline.
|
|
09:30-09:50, Paper ThAT11.4 | |
Incentive Mechanism of Selfish Nodes Based on Energy Optimization and Game Theory |
|
Dong, XiangJia | Inner Mongolia University |
Cheng, Yuan | Inner Mongolia University |
Seah, Winston | Victoria University of Wellington |
Zhang, Feng | Shanxi University |
Xu, Gang | Inner Mongolia University |
Keywords: Optimization and Self-Organization Approaches, Cybernetics for Informatics, Cloud, IoT, and Robotics Integration
Abstract: With the proliferation of mobile intelligent terminals, opportunistic networks have attracted widespread attention as a complementary technology to multi-network convergence. Unlike traditional wireless networks, message delivery in opportunistic networks does not rely on a fixed infrastructure, but rather on storing messages in a cache and utilizing the movement and encounters of nodes to relay messages. However, in practical application scenarios, nodes have limited storage space and energy and will easily exhibit selfishness. An increase in the number of selfish nodes will drastically degrade the performance of the network. To solve the problem of significant network performance degradation when the number of selfish nodes is high, this paper proposes an Incentive mechanism of Selfish nodes based on Energy optimization and Game Theory (ISEGT). The mechanism abstracts the process of forwarding messages by nodes into a bargaining game process, and selectively forwards messages based on nodes' remaining energy and other circumstances. The experimental results show that the ISEGT mechanism can motivate selfish nodes to actively participate in message forwarding, which improves the success rate of message delivery and the survival rate of nodes, and optimizes the overall performance of the network.
|
|
09:50-10:10, Paper ThAT11.5 | |
Heterogeneous Robot Swarms with an Attention Mechanism for Dynamic Target Tracking |
|
Zhou, Ziqing | Fudan University |
Chen, Bo | Fudan University |
Ouyang, Chun | Fudan University |
Dong, Xinyang | Fudan University |
Liu, Siao | Fudan University |
Hu, Linqiang | Fudan University |
Xie, Yi | Fudan University |
Zhao, Zhile | Fudan University |
Gan, Zhongxue | Fudan University |
Keywords: Swarm Intelligence, Optimization and Self-Organization Approaches
Abstract: Multirobot collaboration offers significant potential for diverse applications, including tracking and surveillance. In this paper, we introduce an attention mechanism tailored for heterogeneous robot swarms characterized by varied sensing ranges. This mechanism effectively utilizes the swarm's intrinsic characteristics, enabling rapid information transmission and ensuring consistent collective responses to external stimuli. Additionally, we introduce a pigeon-inspired navigation strategy that effectively replaces the traditional obstacle repulsion term by preventing the swarm from becoming trapped in local minima and reducing oscillatory behaviors. To validate the efficacy of our algorithm, we have developed an autonomously designed PlusBot swarm platform, which consists of agile vibration-driven miniature robots. Each of them is equipped with its own computing and communication system and is capable of precise closed-loop motion control. This setup meets the requirements for conducting heterogeneous swarm movement experiments in indoor environments. Through comprehensive numerical simulations and real-world experiments, our method has demonstrated exceptional precision and adaptability in tracking dynamic targets. The comparative analysis underscores the superiority of our approach, particularly in minimizing swarm collisions and ensuring safe navigation in dynamic target-tracking scenarios involving obstacles.
|
|
10:10-10:30, Paper ThAT11.6 | |
Research on Task Collaboration Over Heterogeneous Networks Based on Evolutionary Game Theory |
|
Wu, Hongqian | National University of Defense Technology |
Deng, Hongzhong | National University of Defense Technology |
Li, Jichao | National University of Defense Technology |
Luo, Hankang | National University of Defense Technology |
Keywords: Complex Network, Optimization and Self-Organization Approaches, Swarm Intelligence
Abstract: With the increasing level of machine intelligence, the problem of collaboration among intelligent individuals in heterogeneous networks has become a focal point of research. Taking the task allocation of heterogeneous Unmanned Aerial Vehicle (UAV) swarms as an example, we study the game behavior of heterogeneous networks based on task traction. To align with the autonomous and collaborative decision-making process of unmanned swarms, we construct a multi-party multi-strategy evolutionary game framework on heterogeneous networks, define local and global game payoff functions, and propose an Enhanced Moran Rule that integrates "Pairwise updating" and "Virtual Game" (the PVG-EMR algorithm) to improve the task collaboration of UAV swarms. Experiments on various underlying communication network models show that the PVG-EMR algorithm proposed in our study can effectively plan the collaboration partners of UAVs and ensure the appropriate assignment of tasks in the heterogeneous network, optimizing both local and global payoffs. Moreover, the algorithm exhibits robust performance.
|
|
ThAT12 |
MR12 |
Smart Systems and Intelligent Production |
|
Chair: Nagpal, Malika | Indian Institute of Technology Mandi, India, 175005 |
|
08:30-08:50, Paper ThAT12.1 | |
Infrared Small Target Detection Based on DETR Architecture and Super-Resolution Technique (I) |
|
Yang, Huanyu | Nanjing University of Science and Technology |
Yang, Lijun | Nanjing University of Science and Technology |
Wang, Jun | Nanjing University of Science and Technology |
Bo, Yuming | Nanjing University of Science and Technology |
Wang, Jiacun | Monmouth University |
Keywords: Visual Analytics/Communication, Environmental Sensing, Cognitive Computing
Abstract: Infrared small target detection (ISTD) holds significant importance in domains such as maritime search and rescue, and autonomous driving. To enhance the detection capabilities of infrared small targets against complex backgrounds, a novel detection algorithm based on an improved Detection Transformer (DETR) is proposed. This algorithm leverages the DETR detection framework and the EDSR network, utilizing super-resolution reconstructed images as inputs. It incorporates the Enhanced Multi-Scale Attention (EMA) module and an improved backbone structure. Moreover, it employs a micro-target detection encoder head with a new feature layer S2 to elevate the quality of minute feature extraction. The proposed method achieved a mAP@50 of 96% and mAP@(50:95) of 54.6% on a public dataset. Compared to current state-of-the-art methods for infrared small target detection, it demonstrates superior capabilities in reducing false positives and misses while maintaining commendable real-time performance.
|
|
08:50-09:10, Paper ThAT12.2 | |
End-To-End On-Orbit Objects Detection with ConvNets (I) |
|
Ru, Bo | Dalian University of Technology |
Hou, Pengrong | Dalian University of Technology |
Li, Xiang | Dalian University of Technology |
Chu, Qinghao | Dalian University of Technology |
Zeng, Zikang | Dalian University of Technology |
Zhang, Chenming | Dalian University of Technology |
Wang, Zhelong | Dalian University of Technology |
Keywords: Environmental Sensing,
Abstract: As space activities expand, the quantity of space debris also increases, posing significant risks to spacecraft and infrastructure. Space situational awareness (SSA) is essential for avoiding collisions and limiting the generation of extra debris. Accurate and efficient detection of space objects plays a critical role in achieving this goal. Our research focuses on the development of detection algorithms that are both precise and quick, taking into account the real-time and safety requirements of spacecraft operations in orbit. For the first time, we use fully convolutional neural networks (ConvNets) to run query-based end-to-end object detection for SSA. We further compare its performance with the newest YOLOv9 algorithm. This is an innovative attempt at SSA. First, it does not require predefined anchor boxes or complex post-processing strategies such as Non-Maximum Suppression (NMS), and can directly achieve end-to-end target detection. Second, fully convolutional networks are selected as the basic framework, which not only retains the advantages of the self-attention mechanism but also greatly improves computing efficiency. These methods show outstanding performance on the challenging SPARK dataset. The fully convolutional approach achieves end-to-end detection by utilizing the query attention mechanism, eliminating the need for the complicated post-processing of traditional object detection methods and achieving higher efficiency. YOLOv9 involves an enhanced feature pyramid fusion and a more powerful detection head, potentially resulting in higher precision. Following that, we will thoroughly assess the speed, accuracy, and trade-offs of the two algorithms using actual datasets in order to deliver an efficient and dependable solution for detecting targets in aerospace sensing missions.
|
|
09:10-09:30, Paper ThAT12.3 | |
Skeleton-Based Continuous Gesture Recognition Using Gesture Detection and Classification (I) |
|
Bi, Weishan | Sun Yat-Sen University |
Gao, Qing | Sun Yat-Sen University |
Keywords: Human-Computer Interaction, Human Perception in Multimedia, Human Performance Modeling
Abstract: Gesture is one of the common ways of communication in people’s daily life. Combined with human-computer interaction, it can bring us more convenience. However, continuous gesture recognition is challenging, since action-to-action continuity introduces ambiguous features into the recognition task. In this paper, we propose a method for continuous gesture recognition using the hand skeleton. We design two lightweight 1D-CNNs to detect and classify independent gestures. On this basis, we use the concepts of single activation and data filtering to build a system for continuous recognition. The proposed method is tested on the IPN dataset and achieves an accuracy of 81.28% and an inference speed of 4.75 ms for independent gesture recognition, and an accuracy of 54.63% and an inference speed of 20.8 ms for continuous gesture recognition. The experimental results demonstrate the lightweight and high accuracy properties of the method.
|
|
09:30-09:50, Paper ThAT12.4 | |
Ubi-Care: An Elderly Life Support Healthcare Framework Based on Ubiquitous Personal Online Data Stores (I) |
|
Chen, Hong | Daiichi Institute of Technology |
Zhou, Tao | Daiichi Institute of Technology |
Wu, Bo | Tokyo University of Technology |
Keywords: Environmental Sensing, Information Systems for Design
Abstract: In recent years, the global aging population has intensified, leading to a sharp increase in social security benefits and caregiving costs. The elderly face greater health risks, and their behaviors often indicate signs of crises and illnesses. The rapid development of IoT technology offers new solutions for detecting abnormal signs, thereby promoting healthy aging, independent living, and social participation for the elderly. However, the diversity of IoT devices has led to the phenomenon of personal data silos. When the elderly leave their usual IoT environment, the issue of continuity in healthcare services becomes more pronounced. To address this issue, this study proposes a decentralized Ubi-Care framework. The framework aims to achieve complete separation of data and applications. It is based on the ActivityPub protocol and integrates various data from wearable devices, smart home sensors, and social networks, storing this data in a ubiquitous personal online data store (UPOD) and assigning different roles based on data type. UPOD applications support bidirectional following, allowing users to access data associated with relevant roles, addressing issues related to data categorization, sharing, privacy, and security. Additionally, this study proposes a method for utilizing complete UPOD data to perform anomaly detection based on the hidden Markov model (HMM). By effectively integrating IoT data into UPOD and enhancing data interoperability, the data-sharing model proposed in this study also helps elderly individuals and their family members share relevant data as needed, while ensuring privacy protection. The methods proposed in this study will help prevent accidental injuries and enable early diagnosis of diseases, providing strong technical support for elderly healthcare.
|
|
09:50-10:10, Paper ThAT12.5 | |
Could Human-Robot Interaction Enhance English Comprehension Skills Compared to Traditional Text Reading? A Behavioral-Thermographic Analysis (I) |
|
Nagpal, Malika | Indian Institute of Technology Mandi, India, 175005 |
Chauhan, Sakshi | Indian Institute of Technology Mandi, India, 175005 |
Choudhary, Gitanshu | Indian Institute of Technology Mandi, India, 175005 |
Saini, Shivanshi | Jawaharlal Nehru Government Engineering College |
Dutt, Varun | Indian Institute of Technology Mandi |
Keywords: Human-Computer Interaction
Abstract: Social robots enhance human-robot interaction, potentially improving English comprehension skills. Despite their promise, their effectiveness in this area is less known. This study addresses this gap by comparing the effectiveness of the social robot Ohbot with traditional text reading for training English comprehension. Participants were randomly assigned to three groups: Ohbot interaction (N = 20), text reading (N = 20), and a control group (N = 20) with no specific intervention. Both Ohbot and reading groups answered multiple-choice questions based on a poem. Emotional arousal was measured using a thermal camera. Results showed that the Ohbot group experienced an average facial temperature decrease of 0.78°C, indicating reduced stress or increased relaxation, while the reading group had a temperature increase of 1.13°C, suggesting higher cognitive or emotional effort. Despite these physiological differences, quiz performance was similar between the Ohbot and reading groups. Therefore, the Ohbot group, which relied on auditory processing, proved as effective as the reading group, which depended on visual processing, in learning English comprehension. These findings indicate that social robots could effectively complement traditional English education methods like reading.
|
|
ThAT13 |
Room T13 |
2P - Haptic and Human-Computer Interaction |
2-Page Abstracts |
Chair: Bordatchev, Evgueni | National Research Council of Canada |
|
08:30-08:50, Paper ThAT13.1 | |
Preliminary Insights on Using Image Entropy to Analyze the Dynamics, Self-Organization, and Stability of Laser Remelting |
|
Bordatchev, Evgueni | National Research Council of Canada |
Cvijanovic, Srdjan | Western University |
Tutunea-Fatan, Remus O. | Western University |
Keywords: Image Processing and Pattern Recognition, Machine Vision, Optimization and Self-Organization Approaches
Abstract: Surface polishing by laser remelting (LRM) is a recently developed advanced manufacturing technology used for surface finishing. This technique involves melting and redistributing a thin layer of molten material to smooth the surfaces of functional parts, improving their surface quality without degrading their overall geometrical form. This paper novelly explores the applicability of image entropy as a measure of chaos and information for analyzing the dynamics, self-organization, and stability of the LRM process. The experimental investigation focused on analyzing the evolution of image entropy along a single LRM track and an entire LRM area. The findings of the study show that dynamic complexity and chaoticity effectively describe the state of LRM self-organization and its dependence on the initial process topography. These preliminary insights provide a roadmap for the future development of closed-loop control systems and thermodynamic modeling of chaotic processes during LRM.
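Note on image entropy: the abstract does not give the authors' exact definition, so the following is only a minimal sketch, assuming the common Shannon entropy of the gray-level histogram, H = -sum(p_i * log2(p_i)). The function names and the tiny sample images are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def image_entropy(pixels):
    """Shannon entropy (in bits) of the gray-level histogram of a 2D image.

    `pixels` is a list of rows, each row a list of integer gray levels.
    """
    flat = [value for row in pixels for value in row]
    if not flat:
        raise ValueError("image must contain at least one pixel")
    counts = Counter(flat)
    total = len(flat)
    # H = -sum(p * log2(p)) over the gray levels that actually occur.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_along_track(frames):
    """Entropy of each frame in a sequence (hypothetical helper for a track)."""
    return [image_entropy(frame) for frame in frames]


# Hypothetical 2x2 test images: one gray level, two levels, four levels.
uniform = [[128, 128], [128, 128]]
two_level = [[0, 255], [0, 255]]
four_level = [[0, 85], [170, 255]]

track = entropy_along_track([uniform, two_level, four_level])
```

Under this definition a perfectly uniform image has zero entropy, and entropy grows as gray levels become more evenly spread, which is one way a change in surface texture along a track could show up as a change in the signal.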
|
|
08:50-09:10, Paper ThAT13.2 | |
Boosting Cognitive Focus Via Binary Search of Attention Types Using Brain-Computer Interfaces |
|
Gheorghică Istrate, David | Informatics Association for The Future |
Balica, Darius-Cristian | Informatics Association for The Future |
Bucur, Andrei | Informatics Association for The Future |
Beu, Mihai-Robert | Informatics Association for the Future |
Durduman - Burtescu, Tudor | Informatics Association for the Future |
Ripiciuc, Amalia-Ioana | Informatics Association for the Future |
Keywords: Cognitive Computing, Human-Machine Cooperation and Systems, Brain-Computer Interfaces
Abstract: Electroencephalograms (EEGs) are widely used for analyzing brain signals, with one of their applications being the detection of attention and concentration levels. This study evaluates the potential of Brain-Computer Interfaces (BCIs) to improve cognitive focus in children and teenagers aged 8-17 years by detecting their attention state (effective or ineffective), which indicates whether a person is concentrating on learning or another productive task, or is focusing on distractions. Data was acquired anonymously to ensure participant privacy. In our system, effective and ineffective attention states are classified from EEG signals with 94% accuracy, obtained through preprocessing, feature extraction with CSP, and classification with KNN. An attention enhancement system then aims to help the patient recover their concentration by running a continuous binary search algorithm over five dimensions, which are external factors: temperature, lighting, smell, noise, and font size.
|
|
09:10-09:30, Paper ThAT13.3 | |
Effect of Presenting Dynamic Characteristic Information of Teleoperated Hydraulic Excavators on Work Efficiency When Switching to a Different Class |
|
Nagai, Masaki | Hiroshima University |
Masunaga, Junya | Toshiba Infrastructure Systems & Solutions |
Ito, Masaru | Kobelco Construction Machinery Co., Ltd |
Saiki, Seiji | Kobelco Construction Machinery Co., Ltd |
Yamazaki, Yoichiro | Kobelco Construction Machinery Co., Ltd |
Kurita, Yuichi | Hiroshima University |
Keywords: Human-Machine Interface, User Interface Design, Assistive Technology
Abstract: We propose a method by which to assist an operator in switching an internal model by visually presenting information about the dynamic characteristics of the destination in accordance with changes in dynamic characteristics when switching teleoperated hydraulic excavators. Experimental results show that when switching without presentation, the operation time significantly increased by 8.8%, compared with that observed when not switching. However, when using the proposed method, there was no increase in work time owing to switching. Therefore, the proposed method makes it possible to immediately switch the internal model of the operator.
|
|
ThBT1 |
MR01 |
Image Processing and Pattern Recognition 9 |
Regular Papers - Cybernetics |
Chair: Li, Yubo | Chongqing Normal University |
|
11:00-11:20, Paper ThBT1.1 | |
GNF-Net: An Adaptive Mesh Denoising Method with GCN-Based Guided Normal Filtering |
|
Yan, Jie | South China University of Technology |
Liang, Lingyu | South China University of Technology |
Yang, Yutian | South China University of Technology |
Huang, Shuangping | South China University of Technology |
Keywords: Neural Networks and their Applications, Image Processing and Pattern Recognition, Application of Artificial Intelligence
Abstract: Mesh denoising has become a popular area of research, and many traditional and learning-based methods have been proposed to remove noise from meshes. However, most approaches only focus on denoising meshes with low levels of noise. When the noise is high-frequency, it can be difficult to recover the original shape. In this paper, we present a mesh denoising approach that utilizes graph convolution representations to enhance the understanding of the mesh characteristics. It integrates precisely designed graphs to explore the inherent compositional structure of the mesh. When analyzing meshes under the influence of various noises, we extract information about the original features of the mesh by capturing spatial geometric features through graph convolution calculations. Our method builds on Guided Normal Filtering (GNF) and comprises a Graphical Representation Module (GRM) and a GCN-Based Normal Prediction Module (NPM). It can adaptively obtain the optimal guided normal vectors for noisy meshes. Comparison and analysis against various methods show that our approach produces state-of-the-art results.
|
|
11:20-11:50, Paper ThBT1.2 | |
PGNet: A GNN-Based Method with Mutual Information Minimization for Fine-Grained Remote Sensing Change Detection |
|
Daobo, Sun | Harbin Engineering University |
Sun, Kang | Beijing Institute of Spacecraft System Engineering |
Qin, Shuo | Harbin Engineering University |
Tang, Bin | Harbin Engineering University |
Meng, Xiangxu | Harbin Engineering University |
Keywords: Machine Learning, Image Processing and Pattern Recognition, Deep Learning
Abstract: Remote sensing image change detection (RSCD) is an active research area in remote sensing. Existing convolutional neural networks (CNN) and Transformer methods operate on images in Euclidean space, limiting flexible perception and interaction among different positions. Recent advancements in various fields have focused on transferring feature processing to non-Euclidean space, allowing for more flexible feature learning. Mapping different locations of high-resolution remote sensing images into non-Euclidean space enables more flexible perception of individual pixels. Motivated by this, we have developed a method solely based on GNN, termed Pure GNN-based Method with the Mutual Information Sensing (PGNet), specifically for detecting changes in remote sensing. In the PGNet framework, Graph Convolutional Networks (GCNs) are employed to derive graph-level characteristics. By aggregating information from neighboring nodes in the graph, we capture richer and more detailed features. Subsequently, by fusing the shallow and deep distributions of local and global features of bi-temporal remote sensing images, we achieve an effective combination of fine-grained and macro-level information. Finally, in order to make explicit semantic perception of bi-temporal images, we propose a Mutual Information Sensing module, which encourages bi-temporal feature pairs to enhance their understanding of each other's semantic information. To confirm the efficacy of PGNet, we carried out a series of tests with advanced techniques across multiple datasets. The outcomes show that our model surpasses existing CNN and Transformer-based methods in precision.
|
|
11:50-12:00, Paper ThBT1.3 | |
ETKD: A Semi-Supervised Learning-Based Knowledge Distillation Model for Encrypted Traffic Classification |
|
Pan, Quanbo | University of Chinese Academy of Sciences |
Yu, Yang | Qufu Normal University |
Yan, Hanbing | University of Chinese Academy of Sciences |
Wang, Maoli | Qufu Normal University |
Qi, Bingzhi | Shandong Jianzhu University |
Keywords: AIoT, Image Processing and Pattern Recognition, Machine Learning
Abstract: Encrypted traffic classification is a challenging task involving precisely categorizing various types of network traffic and applications, a challenge exacerbated by the continuous emergence of new applications. Traditional methods predominantly rely on feature extraction from datasets and the utilization of complex deep learning models for classification, necessitating larger datasets and more intricate computational models. The challenge lies in balancing leveraging unlabeled traffic to enhance classification datasets and controlling model complexity. In response to these challenges, this paper introduces ETKD, an encrypted traffic classification framework based on knowledge distillation. ETKD is designed to address the trade-off between classification accuracy and model complexity. The framework capitalizes on image recognition techniques to extract encrypted traffic features from extensive unlabeled traffic data. Subsequently, these features are fine-tuned using a teacher-student model, achieving a harmonious equilibrium between reducing model complexity and maintaining high classification accuracy. The experimental results presented in this paper provide compelling evidence of the framework's effectiveness. Compared to state-of-the-art models, the minimal model employed in our approach achieves notably higher accuracy and F1-score, underscoring the practicality and efficiency of the proposed ETKD method.
|
|
12:00-12:20, Paper ThBT1.4 | |
Adaptive Frequency Enhancement Network for Single Image Deraining |
|
Yan, Fei | Xiamen University |
He, Yuhong | Northeastern University |
Chen, Keyu | Xiamen University |
Cheng, En | Xiamen University |
Ma, Jikang | Xiamen University |
Keywords: Image Processing and Pattern Recognition, Neural Networks and their Applications, Deep Learning
Abstract: Image deraining aims to improve the visibility of images damaged by rainy conditions, targeting the removal of degradation elements such as rain streaks, raindrops, and rain accumulation. While numerous single image deraining methods have shown promising results in image enhancement within the spatial domain, real-world rain degradation often causes uneven damage across an image's entire frequency spectrum, posing challenges for these methods in enhancing different frequency components. In this paper, we introduce a novel end-to-end Adaptive Frequency Enhancement Network (AFENet) specifically for single image deraining that adaptively enhances images across various frequencies. We employ convolutions of different scales to adaptively decompose image frequency bands, introduce a feature enhancement module to boost the features of different frequency components and present a novel interaction module for interchanging and merging information from various frequency branches. Simultaneously, we propose a feature aggregation module that efficiently and adaptively fuses features from different frequency bands, facilitating enhancements across the entire frequency spectrum. This approach empowers the deraining network to eliminate diverse and complex rainy patterns and to reconstruct image details accurately. Extensive experiments on both real and synthetic scenes demonstrate that our method not only achieves visually appealing enhancement results but also surpasses existing methods in performance.
|
|
12:20-12:40, Paper ThBT1.5 | |
MAML MOT: Multiple Object Tracking Based on Meta-Learning |
|
Chen, Jiayi | Wuhan University of Science and Technology |
Deng, Chunhua | Wuhan University of Science and Technology |
Keywords: Image Processing and Pattern Recognition, Machine Vision, Application of Artificial Intelligence
Abstract: With the advancement of video analysis technology, the multi-object tracking (MOT) problem in complex scenes involving pedestrians is gaining increasing importance. This challenge primarily involves two key tasks: pedestrian detection and re-identification. While significant progress has been achieved in pedestrian detection tasks in recent years, enhancing the effectiveness of re-identification tasks remains a persistent challenge. This difficulty arises from the large total number of pedestrian samples in multi-object tracking datasets and the scarcity of individual instance samples. Motivated by recent rapid advancements in meta-learning techniques, we introduce MAML MOT, a meta-learning-based training approach for multi-object tracking. This approach leverages the rapid learning capability of meta-learning to tackle the issue of sample scarcity in pedestrian re-identification tasks, aiming to improve the model's generalization performance and robustness. Experimental results demonstrate that the proposed method achieves high accuracy on mainstream datasets in the MOT Challenge. This offers new perspectives and solutions for research in the field of pedestrian multi-object tracking.
|
|
12:40-13:00, Paper ThBT1.6 | |
Teeth Segmentation from Bite-Wing X-Ray Images by Integrating Nested Dual UNet with Swin Transformers |
|
Li, Yubo | Chongqing Normal University |
Tian, Yibin | Shenzhen University |
Zhang, Zhiyuan | Singapore Management University |
Zhang, Xueyang | Southern Medical University |
Du, Bingran | Southern Medical University |
Zeng, Zhi | Chongqing Normal University |
Keywords: Neural Networks and their Applications, Image Processing and Pattern Recognition, Deep Learning
Abstract: In medical practice, the precision of image segmentation is crucial for diagnosis and treatment evaluations. Specifically, in dentistry, accurate teeth segmentation from bite-wing images is important for automatic and objective evaluations of root canal treatments. This study introduces Swin-U2Net, a model merging the nested dual UNet with residual U-block and Swin Transformers. It combines the local feature extraction capability of the former and the global attention and context understanding of the latter. It has been evaluated for tooth root segmentation using 500 bite-wing dental x-ray images obtained from a root canal treatment clinic. It achieved the best segmentation outcome in terms of Intersection over Union (IOU) and the third best result in terms of Dice Similarity Coefficient (DSC) with the second least amount of network parameters among six UNet-like models, thus it is effective and efficient.
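Note on the two metrics: the abstract ranks models by Intersection over Union (IoU) and Dice Similarity Coefficient (DSC). The following is a minimal sketch of both on binary masks, using the standard definitions IoU = |A ∩ B| / |A ∪ B| and DSC = 2|A ∩ B| / (|A| + |B|). The function name, the convention for two empty masks, and the sample masks are assumptions for illustration, not the authors' evaluation code.

```python
def iou_and_dice(pred, truth):
    """Return (IoU, Dice) for two binary masks given as equal-sized 2D lists."""
    inter = union = pred_area = truth_area = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            p, t = bool(p), bool(t)
            inter += p and t       # pixel is foreground in both masks
            union += p or t        # pixel is foreground in at least one mask
            pred_area += p
            truth_area += t
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0, 1.0
    return inter / union, 2 * inter / (pred_area + truth_area)


# Hypothetical 2x3 masks: 2 pixels overlap, 4 pixels in the union.
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]

iou, dice = iou_and_dice(pred_mask, true_mask)
```

For a single pair of masks the two scores are tied together by DSC = 2·IoU / (1 + IoU), so they always rank that pair the same way. Averaged over a test set, however, the rankings of models can differ, which is consistent with a model placing first on one metric and third on the other.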
|
|
ThBT2 |
MR02 |
Complex, Cooperative and Bioinformatics Systems |
|
Chair: Yin, Tong | Beijing Information Science & Technology University |
|
11:00-11:20, Paper ThBT2.1 | |
Utility-Based Task Offloading and Resource Allocation for Digital Twin-Assisted Edge Networks |
|
Yin, Tong | Beijing Information Science & Technology University |
Chen, Xin | Beijing Information Science and Technology University |
Jiao, Libo | Beijing Information Science and Technology University |
Cao, Aobo | Beijing Information Science and Technology University |
Dai, Xin | Beijing Information Science & Technology University |
Keywords: Complex Network, Cloud, IoT, and Robotics Integration, Optimization and Self-Organization Approaches
Abstract: With the emergence of the Internet of Things (IoT), mobile edge computing (MEC) effectively reduces task delay of multiple applications. The limitations of computing and storage capabilities, as well as the complex dynamic network environments, make efficient edge service computing stressful. To address this challenge, digital twin (DT) technology is a promising solution that bridges the virtual and physical worlds by creating digital representations of physical objects. DT technology can model the behaviors of physical entities through virtual mirroring and assist physical networks in making optimal network strategies. In this paper, we combine DT technology with MEC networks to develop a utility-based task offloading and resource allocation scheme, aiming to maximize the quality of experience (QoE) of user equipments (UEs) and the utility of base stations (BSs). Specifically, we construct a hierarchical Stackelberg game model, study the interaction between BSs and UEs, and prove that the UE layer is an exact potential game with Nash equilibrium (NE). To achieve Stackelberg equilibrium (SE), we propose a game-based hierarchical interaction algorithm (GHIA) and analyze it. The experimental results show that GHIA converges well and that the utility performance of participants is better than that of other algorithms.
|
|
11:20-11:50, Paper ThBT2.2 | |
Dynamic Resource Scheduling Based Quality of Service Optimisation in Multi-UAV-Assisted City Edge Network Systems |
|
Cao, Aobo | Beijing Information Science and Technology University |
Chen, Xin | Beijing Information Science and Technology University |
Jiao, Libo | Beijing Information Science and Technology University |
Yin, Tong | Beijing Information Science & Technology University |
Wei, Jiyuan | Beijing Information Science & Technology University |
Keywords: Complex Network, Machine Learning, Cloud, IoT, and Robotics Integration
Abstract: The paradigm of unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) has emerged as an effective scheme for handling intensive tasks in heterogeneous networks. In this work, we consider a user-equipment-rich city network scenario. Due to the limited resources of user equipments (UEs) and the limited base station (BS) coverage, we utilise multiple UAV-assisted UEs and partial offloading to handle the tasks. Meanwhile, considering the impact of task diversity on the quality of service (QoS) of the system, we design an integrated scheme that combines improved clustering techniques and deep reinforcement learning (DRL) for dynamic resource scheduling. Firstly, a random forest-based clustering algorithm (RFCA) is used to cluster UEs according to the service requirements (SR) of tasks, as a way to reduce the complexity of task processing and user association. Then a DRL-based computational offloading and bandwidth allocation algorithm (DCOBA) is used to improve the QoS by jointly optimising UAV-user associations, offloading ratios, and bandwidth allocation to reduce the system latency and energy consumption. Finally, experimental simulation data show that our scheme optimises the QoS better than traditional schemes.
|
|
11:50-12:00, Paper ThBT2.3 | |
Stabilization of Walking Motion with Light Touch Using a Mobile Robot (I) |
|
Kiguchi, Kazuo | Kyushu University |
Nawama, Ryosuke | Kyushu University |
Tokunaga, Daigo | Kyushu University |
Nishikawa, Satoshi | Kyushu University |
Keywords: Biometric Systems and Bioinformatics, Cyborgs, Computational Life Science
Abstract: Stabilizing walking motion is important for preventing unexpected falls among physically weak persons such as the elderly. It is known that light touch makes human posture stable. This paper presents a way to realize the effect of light touch to stabilize human walking using an omnidirectional mobile robot. The mobile robot has been developed to maintain light touch contact with the lower back of the user during walking. In this study, the proper amount of light touch force on the lower back and its contact location are investigated first. The mobile robot then applies this light touch while the user is walking. The effectiveness of the proposed robotic light touch method was evaluated experimentally.
|
|
ThBT3 |
MR03 |
Human-Centered and Human-Computer Interaction Systems |
|
Chair: Zheng, Haoran | Harbin Institute of Technology, Shenzhen |
|
11:00-11:20, Paper ThBT3.1 | |
Hype or Revolution: How GPT4, Claude3 Opus, and GPT3.5 Exaggerate Their Influence on HRM Occupations Exposed in Practice |
|
Xie, Yuanhan | College of Systems Engineering, National University of Defense T |
Wang, Tao | National University of Defense Technology |
Sun, Jiayuan | National University of Defense Technology |
Shen, Dayong | National University of Defense Technology |
Zhang, Zhongshan | National University of Defense Technology |
Yao, Feng | National University of Defense Technology |
Keywords: Human Factors, Human-Machine Interaction, Cognitive Computing
Abstract: This study investigates the actual impact of ChatGPT on human resource management (HRM) occupations and compares it with assessments from large language models (LLMs). Using data from the Occupational Information Network (O*NET), we developed a novel methodology combining real-world evidence from Google search results with evaluations from GPT-4, GPT-3.5, and Claude 3 Opus. Our findings reveal significant variations in the degree of exposure among HRM occupations. Training and Development Managers were the most exposed, with 86.52% of their tasks potentially automated by ChatGPT, while Labor Relations Specialists were the least exposed at 11.08%. Notably, LLM assessments consistently overestimated the exposure levels by an average of 30% compared to real-world evidence. This discrepancy highlights the importance of critically evaluating LLM capabilities in task automation. Our research contributes to understanding AI's impact on HRM practices and provides valuable insights for professionals adapting to the AI era. It also underscores the need for caution when relying solely on LLM assessments for workforce planning and development.
|
|
11:20-11:50, Paper ThBT3.2 | |
Contrastive Masked Autoencoders for Character-Level Open-Set Writer Identification |
|
Jiang, Xiaowei | University of Technology Sydney |
Ma, Wenhao | University of Technology Sydney |
Duan, Yiqun | University of Technology Sydney |
Do, Thomas | University of Technology Sydney |
Lin, Chin-Teng | University of Technology Sydney |
Keywords: Biometrics and Applications, Human-Machine Interaction
Abstract: In the realm of digital forensics and document authentication, writer identification plays a crucial role in determining the authors of documents based on handwriting styles. The primary challenge in writer-id is the "open-set scenario", where the goal is accurately recognizing writers unseen during the model training. To overcome this challenge, representation learning is the key. This method can capture unique handwriting features, enabling it to recognize styles not previously encountered during training. Building on this concept, this paper introduces the Contrastive Masked Auto-Encoders (CMAE) for Character-level Open-Set Writer Identification. We merge Masked Auto-Encoders (MAE) with Contrastive Learning (CL) to simultaneously and respectively capture sequential information and distinguish diverse handwriting styles. Demonstrating its effectiveness, our model achieves state-of-the-art (SOTA) results on the CASIA online handwriting dataset, reaching an impressive precision rate of 89.7%. Our study advances universal writer-id with a sophisticated representation learning approach, contributing substantially to the ever-evolving landscape of digital handwriting analysis, and catering to the demands of an increasingly interconnected world.
|
|
11:50-12:00, Paper ThBT3.3 | |
Error-Tolerant Code Segmentation for Supporting Semantic Conflict Prevention in Real-Time Collaborative Programming |
|
Jiang, Jinfeng | Tongji University |
Fu, Qirui | Tongji University |
Wang, Mingjie | Tongji University |
Liu, Zhonghao | University of Wisconsin-Madison |
Lyu, Junxiao | Tongji University |
Fan, Hongfei | Tongji University |
Keywords: Human-Computer Interaction, Multi-User Interaction
Abstract: Real-time collaborative programming is a novel approach that enables programmers to edit shared source code simultaneously, and it has been applied in a variety of software development scenarios. To achieve semantic conflict prevention in real-time collaboration, a dependency-based automatic locking (DAL) approach was proposed in prior work, which prevents programmers' concurrent editing on selected source code regions. However, DAL's reliance on source code analysis techniques may lead to failures when syntax errors exist in the source code. To overcome this limitation, we propose an error-tolerant code segmentation (ECS) approach, as well as supporting algorithms, to improve semantic conflict prevention. Technically, the ECS approach continuously identifies source code regions and maintains their range information to ensure stable source code segmentation and code region tracking during the collaboration process, without the need to deal with complex syntax issues. The proposed approach and algorithms have been implemented in a prototype, and experimental evaluations have demonstrated their effectiveness and efficiency.
|
|
12:00-12:20, Paper ThBT3.4 | |
Dog's 3D Skeleton Reconstruction Using a Moving Trainer for Analysis of Guide Dog Training |
|
Wang, Ansheng | The University of Tokyo |
Huang, Rongjin | The University of Tokyo |
Fujii, Keisuke | Nagoya University |
Tanaka, Shinji | Japan Guide Dog Association |
Matsunami, Yoshiro | Japan Guide Dog Association |
Makino, Yasutoshi | The University of Tokyo |
Shinoda, Hiroyuki | The University of Tokyo |
Keywords: Visual Analytics/Communication, Human-Machine Interaction, Human-Computer Interaction
Abstract: This study aims to enhance the training efficiency of guide dogs by employing computer vision to collect training data and analyze the movements of both trainers and dogs. This task is challenging, owing to the constant movement of cameras and unstable reference points for camera calibration, which stem from the complexities of the guide dog training process and the surrounding environment. In addition, trainers and dogs walk side by side, making it difficult for 2D videos to capture complete interactions without obstacles. We present a comprehensive system that starts with 2D video footage from multicamera setups, proceeds to extrinsic camera calibration from the moving trainer's joints, and reconstructs the 3D poses of guide dogs and trainers. This process includes human 2D/3D pose estimation, camera calibration, and dog 2D/3D pose estimation. A novel aspect of the proposed approach involves modifying the existing calibration method for multiple cameras. This modification is designed to achieve extrinsic camera calibration and accommodate complex real-world camera settings, including fixed and moving multicamera setups without calibration objects. We can create a 3D representation of the training sessions by detecting the trainer's 2D and 3D skeletons and using calibrated cameras to triangulate the dog's 3D pose. This allows for a detailed analysis and adjustment of guide dog training methods based on the 3D pose data of trainers and guide dogs, thereby improving the overall training process.
|
|
12:20-12:40, Paper ThBT3.5 | |
Can Pupillometry Be Used to Detect Driver Hazard Awareness? |
|
Tamura, Kimimasa | Toyota Research Institute |
Gideon, John | Toyota Research Institute |
Stent, Simon | Toyota Research Institute |
Rosman, Guy | Toyota Research Institute |
Keywords: Human Performance Modeling, Human-Centered Transportation, Biometrics and Applications
Abstract: Modern Advanced Driver-Assistance Systems (ADAS) increasingly rely on interactions between vehicle and human driver. To inform these interactions, it is helpful for a vehicle system to have a good understanding of a driver's situational awareness. In this work we explore a relatively under-exploited, passively measurable signal which might provide insight into a driver's awareness: the constriction and dilation of their pupils over time, or pupillometry. We ask whether pupillometry might be practically useful to detect if and when a driver becomes aware of a road hazard. Using a dataset of driver responses to both hazardous and routine scenarios during simulated semi-automated driving, we compare models trained on pupillometric data to a model trained on facial responses, and demonstrate how their performances differ in terms of accuracy and latency. While a driver's facial expressions are, as expected, a useful cue to determine awareness (0.82 AUC on held-out test stimuli), we find that pupillometric data alone can provide an even more meaningful signal (0.93 AUC). In addition, we find that the pupillometric model performance degrades more gracefully than the face model when tested on unseen subjects, while fusing models yields further accuracy and latency improvements given sufficient training data. We characterize the shape of the performance vs. latency curve for all models and make our code available for reproducibility.
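Note on the AUC figures: the abstract compares models by area under the ROC curve (0.82 vs. 0.93). The following is a minimal sketch of that metric, assuming the standard pairwise-ranking interpretation: AUC is the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The function name, scores, and labels are hypothetical and are not data from the paper.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    `labels` holds 1 for positive examples (e.g. "driver is aware of the
    hazard") and 0 for negative examples.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    # Count pairs ranked correctly; a tie contributes half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical model outputs: three of the four positive/negative pairs
# are ordered correctly.
example_scores = [0.9, 0.8, 0.4, 0.3]
example_labels = [1, 0, 1, 0]

auc = roc_auc(example_scores, example_labels)
```

Read this way, an AUC of 0.93 means that a randomly chosen aware/unaware pair is ordered correctly about 93% of the time. The quadratic pairwise loop is chosen for clarity; a rank-based computation gives the same value more efficiently on large datasets.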
|
|
12:40-13:00, Paper ThBT3.6 | |
Dual Transducers Co-Focusing Method for Ultrasound Stimulation in Calf Peripheral Nervous System: A Feasibility Study (I) |
|
Zheng, Haoran | Harbin Institute of Technology, Shenzhen |
Wang, Xiaoxin | Harbin Institute of Technology, Shenzhen |
Zhang, Hongwei | Harbin Institute of Technology, Shenzhen |
Liu, Honghai | Shanghai Jiao Tong University |
Keywords: Human Enhancements, Human-Machine Interaction, Haptic Systems
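Abstract: As a neuromodulation technique, ultrasound stimulation has attracted attention owing to its non-invasive and targeted nature. Low-intensity focused ultrasound (LIFU) can precisely target the peripheral nervous system and can evoke sensations in the human body, including touch, cold, heat, and pain. Current research focuses mainly on ultrasound stimulation of the upper limbs, particularly the fingers. However, there has been little investigation of peripheral nervous system stimulation in the lower limbs of the human body. In addition, the use of a single focused transducer for stimulation introduces anisotropy, making it challenging to improve axial resolution. This study aims to explore an innovative method that uses dual transducers with co-focused ultrasound to achieve high-precision stimulation and reduce the difference between axial and lateral resolution. The main objective is to investigate the feasibility of this method for deep muscle stimulation of the human calf muscles, thereby advancing research in lower-limb neuromodulation.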
|
|
ThBT4 |
MR04 |
Augmented, Virtual Reality and Big Data |
Workshops |
Chair: Kumar, Ajoy | Indian Institute of Technology Mandi |
|
11:00-11:20, Paper ThBT4.1 | |
HFVR: Free-Viewpoint Reconstruction of Moving Humans from Monocular Video (I) |
|
Xiao, Jincheng | Hangzhou Dianzi University |
Gao, Zhigang | China Jiliang University |
Huang, Rui | Coupang, Inc |
Feng, Jianwen | Hangzhou Dianzi University |
Li, Xionglong | Hangzhou Dianzi University |
Xia, Zilin | Hangzhou Dianzi University |
Keywords: Human Factors, Virtual/Augmented/Mixed Reality, Visual Analytics/Communication
Abstract: In AR/VR and other application scenarios, the quality of human free-view video reconstruction significantly impacts user experience. Current reconstruction methods often rely on images captured by multi-view cameras and struggle to handle deformations caused by human movement. To address this issue, we propose HFVR (Human Free-Viewpoint Reconstruction), which aims to reconstruct the free view of moving humans from monocular videos. The core concept of HFVR is to utilize the motion deformation field to connect the observation space and the canonical space to perform a 3D representation of humans in the canonical space for rendering from any perspective. We first introduce the motion deformation field to transform the human body in the observation space into a standard space, then model the human body in that standard space. Second, we design a 3D global perception module to use global structural information to address deformation accuracy issues. Finally, we construct a rendering-guided module based on a reference frame that effectively overcomes limitations due to a lack of multi-view information by combining 2D observation. Extensive experiments demonstrate that our method can render realistic free-viewpoint motion humans using monocular videos. Our method improves the LPIPS (Learned Perceptual Image Patch Similarity) metric by 9.89% compared to that of HumanNeRF, as well as by 45.28% compared to that of Neural Body.
|
|
11:20-11:50, Paper ThBT4.2 | |
Detecting Small Objects Using Multi-Scale Feature Fusion Mechanism with Convolutional Block Attention (I) |
|
Li, Xionglong | Hangzhou Dianzi University |
Yang, Kun | Hangzhou Dianzi University |
Huang, Rui | Coupang, Inc |
Zhou, Bingqin | Hangzhou Dianzi University |
Xiao, Jincheng | Hangzhou Dianzi University |
Gao, Zhigang | China Jiliang University |
Keywords: Visual Analytics/Communication, Information Visualization, Assistive Technology
Abstract: Small objects are difficult to detect because of their low resolution and the irrelevant background information that surrounds them. To address the loss of small-object information during multi-scale feature fusion and the influence of background information, this paper proposes a multi-scale feature fusion mechanism (MFFM) with convolutional block attention modules (CBAM). The proposed approach efficiently utilizes the three lowest feature layers (P1-P3) output by the backbone network, together with an improved feature fusion technique, to enhance the representational ability of single-layer features through the attention mechanism. This mechanism enables each single-layer feature to carry information from all three layers, thereby improving fusion among the feature layers. The experimental results demonstrate that MFFM improves the overall accuracy (mAP) on the VisDrone2019-DET validation set by 1.4% and the accuracy on small objects (APs) by 1.5% compared with the baseline model YOLOX-X. This approach effectively improves the performance of small object detection.
|
|
11:50-12:00, Paper ThBT4.3 | |
DAS-COD: Depth-Aware Camouflaged Object Detection Via Swin Transformer (I) |
|
Lu, Chenye | Hangzhou Dianzi University |
Tan, Min | Hangzhou Dianzi University |
Gao, Zhigang | China Jiliang University |
Mao, Xiaoyang | University of Yamanashi |
Xia, Zilin | Hangzhou Dianzi University |
Keywords: Visual Analytics/Communication, Human Perception in Multimedia, Multimedia Systems
Abstract: The successful integration of depth information into salient object detection tasks has catalyzed research interest in depth-enhanced camouflaged object detection (COD) tasks. However, the challenges associated with acquiring depth information pose significant hurdles to this task, especially given the lack of RGB-D datasets tailored specifically for COD. Consequently, employing depth estimation techniques to generate pseudo-depth information emerges as a viable solution for depth-enhanced COD tasks. In this study, we propose an architecture, DAS-COD (Depth-Aware Swin Transformer COD), that integrates the Swin Transformer model with depth estimation techniques for camouflaged object detection. In particular, we use a dual-stream Swin Transformer backbone to extract feature maps from different modalities. These maps are then enhanced with a multi-modal feature enhancement module. Additionally, to address the inherent discrepancies between pseudo-depth maps and actual depth information, we incorporate an edge-aware module to significantly improve the accuracy of boundary delineation in the predicted outcomes. We tested our proposed method on three different COD datasets. Our results show that the model achieves state-of-the-art performance across these camouflaged object detection datasets.
|
|
12:00-12:20, Paper ThBT4.4 | |
VRZM: Exploring the Effect of Zen Meditation on EEG Patterns in Immersive Environments (I) |
|
Kumar, Ajoy | Indian Institute of Technology Mandi |
Sankhyan, Sahil | Indian Institute of Technology Mandi |
Tripathi, Kirti | Indian Institute of Technology Mandi |
Thakur, Sakshi | Jawaharlal Nehru Government Engineering College, Sundernagar, Man |
Bhavsar, Arnav | Indian Institute of Technology Mandi (IIT Mandi) |
Dutt, Varun | Indian Institute of Technology Mandi |
Keywords: Virtual and Augmented Reality Systems, Human-Computer Interaction
Abstract: There is growing interest in developing virtual reality (VR) applications for mental health therapies. However, the effectiveness of meditation in VR environments for mental health issues like stress remains mostly unexplored. This study seeks to fill this knowledge gap by investigating the influence of VR-guided Zen meditation (VRZM) on stress levels. Forty individuals were randomly divided into two between-subjects groups: one engaged in VRZM (N = 20), while the other received just a VR immersive environment without the Zen meditation’s audio (VR; N = 20). The study explored the impact of VRZM on stress via EEG patterns and the Depression Anxiety Stress Scale - 21 (DASS - 21). The results indicated significantly reduced depression, anxiety, and stress levels in the VRZM group but not in the VR group. Moreover, VRZM induced a pronounced increase in the frontal alpha-to-temporal theta ratio, indicating enhanced relaxation, contrasting with no significant change in the VR group. The results suggested the effectiveness of VRZM in promoting calmness and its potential efficacy in mental health interventions. We highlight the implications of VRZM for alleviating mental health problems like stress.
|
|
ThBT5 |
MR05 |
AI Applications 10 |
Regular Papers - Cybernetics |
Chair: Liu, Xiaocheng | Qingdao University |
|
11:00-11:20, Paper ThBT5.1 | |
Simulation of UAV Path Planning Based on Reinforcement Learning with Attention and Fuzzy Control |
|
Liu, Xiaocheng | Qingdao University |
Gao, Shiwei | China State Shipbuilding Corporation Qingdao Beihai Shipbuilding |
Yang, Xianyang | China Offshore Oil Engineering Co., Ltd |
Liu, Dejun | China Academy of Railway Sciences |
Dong, Youqiang | Qingdao Haily Measuring Technologies Co., Ltd |
Park, Bongrae | Wapa System (South Korea) |
Wan, Zhibo | Qingdao University |
Lyu, Zhihan | Uppsala University |
Keywords: Application of Artificial Intelligence, AI and Applications, Fuzzy Systems and their applications
Abstract: With the development of the Internet of Things, information & communication technology, and sensing technology, as well as the improvement of UAV performance and endurance, the application of UAVs is becoming increasingly widespread. Meanwhile, the development of computer vision enables UAVs to work in more complex environments. By real-time modeling and rendering of the environment, UAVs can obtain more comprehensive scene information. Based on various sensor data, UAVs can make more accurate path planning and interaction decisions. Therefore, path planning and control algorithms for UAVs have been a hot topic for researchers globally. This article proposes an unmanned aerial vehicle (UAV) path planning algorithm based on reinforcement learning and builds a digital twin simulation environment using the Unity3D engine. The approach uses ray detection for data collection, applies a transformer architecture as the training network, and adds an attention mechanism to reinforcement learning. Experimental comparison shows that the proposed model outperforms currently popular reinforcement learning algorithms. We also run several kinds of experiments to compare the impact of different visual inputs on the training process. Finally, we apply a fuzzy control system to the UAV, making its motion trajectory smoother.
|
|
11:20-11:50, Paper ThBT5.2 | |
Dynamic NFT Classification and Detection on Ethereum Via Smart Contract |
|
Yin, Keting | Zhejiang University |
Zhu, Zheng | Zhejiang University |
Ren, Xiaoxue | Zhejiang University |
Wang, Xing | Zhejiang University |
Keywords: Application of Artificial Intelligence, Machine Learning, AI and Applications
Abstract: In recent years, Non-Fungible Tokens (NFTs) have gradually become a key application of blockchain technology. Static NFTs are the most common type. Once a static NFT is minted on the blockchain, its additional metadata is immutable. However, some NFTs that represent real assets, games, sports, and other types need to update their metadata dynamically. Therefore, a dynamic NFT with changeable features is needed. The emergence of dynamic NFTs has greatly expanded the scope for application innovation and promoted the rapid development of the community ecosystem, but it has also brought new problems and challenges for anti-fraud and supervision. This paper aims to realize the classification and detection of dynamic NFTs. First, we define and classify dynamic NFTs from both dynamic and static perspectives. Second, we construct, for the first time, a complete dataset of dynamic NFT smart contract code on Ethereum and analyze it from multiple perspectives. Third, we propose a smart contract feature model of dynamic NFTs and use machine learning methods for recognition and classification. Experimental verification shows that the proposed method can be effectively used to detect and identify dynamic NFTs, helping NFT holders avoid risks.
|
|
11:50-12:00, Paper ThBT5.3 | |
NAEP: Neighborhood Multihead Attention with Evidential Conditional Neural Processes for Few-Shot Knowledge Graph Completion |
|
Cao, Yukun | Shanghai University of Electric Power |
Li, Jingjing | Shanghai University of Electric Power |
Chen, Ming | Shanghai University of Electric Power |
Huang, Luobin | Shanghai University of Electric Power |
Liu, Yuanmin | Shanghai University of Electric Power |
Wang, Tianhao | Shanghai University of Electric Power |
|
|
12:00-12:20, Paper ThBT5.4 | |
MUPO: Unlocking the Potential of Unpaired Data for Single Stage Preference Alignment in Language Models |
|
Dang, Truong-Duy | Faculty of Information Technology, University of Science, Vietna |
Quang, The-Cuong | University of Science, Vietnam National University, Ho Chi Minh |
Nguyen, Long | University of Science, Ho Chi Minh City, Vietnam |
Dinh, Dien | University of Science, Ho Chi Minh City |
Keywords: AI and Applications, Deep Learning, Machine Learning
Abstract: Preference alignment has recently been at the forefront of aligning language model (LM) behavior with human preferences, with chatbots as the most relevant application. Following the success of Direct Preference Optimization (DPO), many derivative works in the same paradigm have improved different aspects, in both ease of training and performance. Odds Ratio Preference Optimization (ORPO) takes the next step in the development of preference alignment by combining the training pipeline into a single monolithic preference optimization process. In addition, ORPO eliminates the need for a reference model, further improving training efficiency without negatively affecting performance. However, ORPO, like many of its predecessors, requires datasets with pre-defined preference pairs of examples, which could severely limit the amount of compatible training data and lower the LM's capability compared to what could potentially be achieved. Motivated by this limitation, we devise a new approach, Monolithic Unpaired Preference Optimization (MUPO), that can make use of the abundantly available data without paired preferences while combining the training process into a single monolithic stage for further efficiency and accessibility. In our experiments, MUPO's test performance on StableLM and Phi-2 stayed competitive with previous methods such as SFT, SFT+DPO, and ORPO.
|
|
12:20-12:40, Paper ThBT5.5 | |
Consistency and Complementary Structure Diffusion for Deep Multi-View Subspace Clustering |
|
Zhou, Yuanzhu | Beijing University of Posts and Telecommunications |
Keywords: Application of Artificial Intelligence, Deep Learning, Machine Learning
Abstract: Real-world data often exhibit multiple heterogeneous features, known as multi-view data, which provide rich information for practical applications. Multi-view clustering (MVC) harnesses consistency and complementary information from these multiple views to unveil intrinsic relationships among instances and improve clustering performance. Despite the promising performance of various deep multi-view clustering methods, most of them focus solely on learning shared feature representations in latent spaces, overlooking the structural information within affinity matrices. Consequently, such oversight can distort both consistency and view-specific structural details, while also limiting the effective utilization of complementary information during the integration of views. To tackle these challenges, we propose a collaborative deep multi-view subspace clustering algorithm. Unlike existing methods focused on shared feature representation, our approach exploits comprehensive information across all views within the learned subspace structure. Specifically, we leverage consistency and complementary information within subspace structures through a cross-view structure diffusion mechanism. This mechanism enables each view to collaboratively enhance its subspace structure by unveiling the underlying manifold geometry across all views, thereby enabling us to effectively leverage consistency and complementary information among multiple subspaces from different views. As a result, we obtain a more accurate consensus graph for final clustering. Experimental results across several representative datasets demonstrate the superior effectiveness of our method compared to other state-of-the-art methods.
|
|
12:40-13:00, Paper ThBT5.6 | |
Autoencoder-Based Drift Detection Method for Dynamic Analysis of EEG Data: A Comprehensive Study |
|
Khadimallah, Rihab | Faculty of Economics and Management of Sfax |
Kallel, Ilhem | REGIM-Lab., ENIS, University of Sfax |
Sanchez-Medina, Javier Jesús | Universidad De Las Palmas De Gran Canaria |
Drira, Fadoua | ENIS |
Keywords: AI and Applications
Abstract: In distributed data environments, the properties and probabilities of the data may evolve over time creating an occurrence called Concept Drift. Indeed, ensuring the effectiveness and dependability of machine learning models is greatly dependent on concept drift detection, especially in areas where the distribution of data is affected by periodic changes. Hence, in dynamic environments, such as EEG datasets, the concept drift poses an important challenge. This study presents the autoencoder approach for identifying concept drift in EEG datasets made up of children with normal development and ADHD (Attention Deficit Hyperactivity Disorder). After conducting many tests with different window sizes, we determined the number of batches in the two datasets (Normal and ADHD). The experimental results show how well the suggested technique works to identify shifts in the distributions of EEG data, which helps to maintain precise prediction models in dynamic settings.
|
|
ThBT6 |
MR06 |
Discrete Event and Distributed Systems 3 |
|
Chair: Yang, Tianshuo | Nipissing University |
|
11:00-11:20, Paper ThBT6.1 | |
Ontology Mapping-Based Semantic Reasoning with OPC UA for Heterogeneous Industrial Devices |
|
Bi, Jing | Beijing University of Technology |
Wu, Rina | Beijing University of Technology |
Yuan, Haitao | Beihang University |
Wang, Ziqi | Beijing University of Technology |
Zhang, Jia | Southern Methodist University |
Zhou, Mengchu | New Jersey Institute of Technology |
Keywords: Manufacturing Automation and Systems, Distributed Intelligent Systems, System Architecture
Abstract: The advent of smart manufacturing in Industry 4.0 signifies the arrival of the era of connections. As an excellent communication protocol, Object linking and embedding for Process Control Unified Architecture (OPC UA) can address most semantic heterogeneity issues. However, its semantics are not formally defined at the application layer. To address the information silo problem caused by semantic heterogeneity, a method named Querying of Ontology Mapping-based OPC UA (QOMOU) is proposed. It extracts the information models of OPC UA servers into resource description framework triples, utilizes web ontology language for semantic enrichment and inference, and employs a semantic similarity model for event ontology mapping to improve query efficiency. The method’s effectiveness is validated through functional queries using the SPARQL protocol in Apache Jena. The query efficiency is 5% higher on average compared to both structured query and extensible markup languages. Moreover, by employing a keyword-matching algorithm, the query accuracy of the existing heterogeneous data integration scheme is improved by 4% on average. This enhancement can boost the operational efficiency of Internet of Things systems based on the OPC UA architecture.
|
|
11:20-11:50, Paper ThBT6.2 | |
A Faster Heuristic Scheduling Search for Robotic Cellular Manufacturing Systems with Generalized and Timed Petri Nets |
|
Xiao, Yuanzheng | Nanjing University of Science and Technology |
Wu, Haoran | Nanjing University of Science and Technology |
Gao, Yangqing | Nanjing University of Science and Technology |
Huang, Bo | Nanjing University of Science and Technology |
Keywords: Discrete Event Systems, Manufacturing Automation and Systems, System Modeling and Control
Abstract: The design of the heuristic function in Petri-net-based A* search significantly impacts the efficiency and result quality of scheduling robotic cellular manufacturing (RCM) systems. Previous work in such designs lacks consideration for some key features such as token remaining time, alternative routes, weighted arcs, multiple resource copies, and batch-processing ability. This paper proposes a novel admissible heuristic function tailored to address these challenges. It is designed from the perspective of maximal time usage of each part token. In addition, it is admissible, which guarantees that its obtained schedules are optimal. Most importantly, it is highly informative, enabling efficient scheduling of generalized PNs of RCM systems. Experimental simulations on benchmark PN models demonstrate the efficacy and efficiency of our method. Codes are available at https://github.com/PNOptimizer/NewHDesign.
|
|
11:50-12:00, Paper ThBT6.3 | |
Scalable Co-Clustering for Large-Scale Data through Dynamic Partitioning and Hierarchical Merging |
|
Wu, Zihan | City University of Hong Kong |
Huang, Zhaoke | City University of Hong Kong |
Yan, Hong | City University of Hong Kong |
Keywords: Large-Scale System of Systems, Distributed Intelligent Systems, Decision Support Systems
Abstract: Co-clustering simultaneously clusters rows and columns, revealing more fine-grained groups. However, existing co-clustering methods suffer from poor scalability and cannot handle large-scale data. This paper presents a novel and scalable co-clustering method designed to uncover intricate patterns in high-dimensional, large-scale datasets. Specifically, we first propose a large matrix partitioning algorithm that partitions a large matrix into smaller submatrices, enabling parallel co-clustering. This method employs a probabilistic model to optimize the configuration of submatrices, balancing the computational efficiency and depth of analysis. Additionally, we propose a hierarchical co-cluster merging algorithm that efficiently identifies and merges co-clusters from these submatrices, enhancing the robustness and reliability of the process. Extensive evaluations validate the effectiveness and efficiency of our method. Experimental results demonstrate a significant reduction in computation time, with an approximate 83% decrease for dense matrices and up to 30% for sparse matrices.
|
|
12:00-12:20, Paper ThBT6.4 | |
Courier Delivery Optimization in Supply Chain Via Group Multirole Assignment (I) |
|
Lin, Xuewei | School of Computer Science and Technology Guangdong University O |
Ke, Xintong | School of Computer Science and Technology Guangdong University O |
Zhu, Haibin | Nipissing University |
Liu, Dongning | Guangdong University of Technology |
Keywords: Distributed Intelligent Systems, Adaptive Systems, Decision Support Systems
Abstract: Courier delivery is the end of the supply chain and affects the final delivery of products. Due to the promulgation of new courier delivery regulations, home delivery services have become the main choice for consumers, which has resulted in a greater workload. How to Reasonably Allocate Couriers for LEan Management (RACLEM) in supply chain optimization poses a challenge to traditional logistics. By extending the Group Multirole Assignment with Efficiency Degradation (GMRAED), this paper formalizes the problem. Moreover, we propose a quantitative calculation method of efficiency degradation based on Amdahl's law, taking region similarity and the number of tasks as important criteria, and compare the performance of different personnel arrangements. Additionally, as an alternative to the brute-force algorithm, we combine GMRAED with a genetic algorithm (GA) to create a practical solution. Large-scale simulation experiments demonstrate that the genetic algorithm significantly shortens the solution time, and the lowest experimental accuracy is 99.407%. With this method, decision makers can help companies optimize personnel resource allocation, minimize personnel waste, and improve the performance of courier delivery within a shorter timeframe.
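The abstract does not spell out how efficiency degradation is derived from Amdahl's law. For orientation only, the classical law and the per-worker efficiency it implies can be sketched as follows. Reading `p` as the shareable fraction of a workload and `n` as the number of couriers is an assumption made here for illustration, not the paper's formulation, and the function names are hypothetical.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Classical Amdahl's law: speedup when a fraction p of the work is
    spread across n workers and the remaining (1 - p) stays serial."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


def per_worker_efficiency(p: float, n: int) -> float:
    """Speedup divided by the number of workers. This falls below 1 as n
    grows whenever p < 1, which is one way to read 'efficiency degradation'."""
    return amdahl_speedup(p, n) / n
```

With `p = 0.5`, for instance, doubling the workers from one to two yields a speedup of only about 1.33, so each worker operates at roughly two thirds of single-worker efficiency.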
|
|
12:20-12:40, Paper ThBT6.5 | |
Adaptive Group Multi-Role Assignment in Garbage Management System (I) |
|
Yang, Tianshuo | Nipissing University |
Zhu, Haibin | Nipissing University |
Wang, Chun | Concordia University |
Yang, Phil Xing | Telitek Wireless Inc |
Keywords: Distributed Intelligent Systems, Decision Support Systems, Adaptive Systems
Abstract: In addition to prompt garbage collection and strategic routing, effective waste management also involves optimizing resource allocation and ensuring the sustainability of disposal practices. The E-CARGO (Environments – Classes, Agents, Roles, Groups, and Objects) model, coupled with the Role-Based Collaboration (RBC) methodology, offers a sophisticated approach to address these multifaceted challenges. By simulating the scheduling of garbage trucks for the transportation of bins within urban environments, the model provides valuable insights into the dynamics of waste collection operations. Furthermore, by integrating RBC, which dynamically assigns roles to agents involved in the waste management process, the model promotes coordination and cooperation among various entities, including municipal authorities, waste collection agencies, and residents. Through this adaptive framework, the article proposes a holistic approach to urban waste management that not only enhances operational efficiency but also fosters environmental sustainability and community engagement.
|
|
12:40-13:00, Paper ThBT6.6 | |
Distributed Charging Scheduling and Pricing Strategy for Plug-In Electric Vehicles Based on Stackelberg-Nash and Multi-Cluster Aggregative Games (I) |
|
Jing, Yuhao | University of New South Wales |
Chen, Jianguo | Chinese Academy of Sciences, University of Chinese Academy of Sc |
Qiao, Li | University of New South Wales |
Mo, Huadong | University of New South Wales |
Dong, Daoyi | Australian National University |
Keywords: Distributed Intelligent Systems, Decision Support Systems
Abstract: In this paper, we propose a distributed and interactive Plug-in Electric Vehicle (PEV) charging scheduling approach, which is also combined with an optimal pricing strategy. This method tackles challenges such as fluctuations in charging currents, potential supply congestion, and uneven demand distribution that arise as PEV penetration increases. The objective is to improve the robust stability of the charging system while also reducing the costs for PEV users. This study designs a multi-cluster aggregative game mechanism to handle the competitive dynamics among operational clusters and the collective behavior of individual PEVs. Additionally, a strategic pricing method, based on Stackelberg game theory, is designed to refine the determination of basic electricity prices. We further introduce a distributed update method that efficiently seeks the Nash Equilibrium (NE) of the hierarchical game described. The effectiveness of the proposed architecture and solution methodology is validated through experimental studies.
|
|
ThBT7 |
MR07 |
Online - AI Applications 8 |
|
Chair: Chen, Liang | Sichuan Normal University |
|
11:00-11:20, Paper ThBT7.1 | |
FedAMKD: Adaptive Mutual Knowledge Distillation Federated Learning Approach for Data Quantity-Skewed Heterogeneity |
|
Ge, Shujie | Beijing Wuzi University |
Liu, Detian | Beijing University of Technology |
Yang, Yongli | Beijing Wuzi University |
He, Jianyu | Beijing Wuzi University |
Zhang, Shiqiang | Beijing Wuzi University |
Cao, Yang | Beijing Wuzi University |
Keywords: Deep Learning, Machine Learning
Abstract: Federated learning enables collaborative training across various clients without data exposure. However, data heterogeneity among clients may degrade system performance. The divergent training goals of servers and clients lead to performance degradation: servers aim for a global model with improved generalization across all data, whereas clients seek to develop private models tailored to their specific local data distributions. This paper introduces a novel federated learning framework named FedAMKD. FedAMKD divides federated learning into two independent entities, a local model tailored to each client's data and a global model for data aggregation and knowledge sharing. A unique aspect of FedAMKD is its adaptive mutual knowledge distillation at the local level, customized for the skewed degree of the client’s data quantity. This method achieves the goal of enhancing both local and global model performance, reducing the adverse effects of data quantity-skewed heterogeneity in federated learning. Extensive experiments across diverse datasets validate FedAMKD's success in addressing challenges related to data quantity imbalances in federated learning.
|
|
11:20-11:50, Paper ThBT7.2 | |
Enhancing CNN-Based Network Robustness Predictors through Representation Recovery against Information Noise |
|
Chen, Liang | Sichuan Normal University |
Huang, Wenli | Sichuan Normal University |
Wu, Chengpei | Sichuan Normal University |
Li, Junli | Sichuan Normal University |
Keywords: Complex Network, Machine Learning
Abstract: Connectivity robustness and controllability robustness play a crucial role in maintaining the stability of complex network systems. Recently, complex network systems have faced an increasing number of malicious attacks and random failures, emphasizing the vital importance of assessing their performance. The CNN-based predictor serves as a powerful tool for evaluating the robustness of complex networks. However, the excellent performance of CNN-based predictors requires complete network data, which is often not available in real-world networks. In this paper, we investigate the recovery of lost information in networks. The main contributions can be summarized as follows: 1) We explore the impact of information loss in complex networks on CNN-based robustness prediction models. 2) We propose three recovery algorithms that effectively mitigate the loss of network information. Extensive experiments demonstrate that, in the presence of information loss in complex networks, CNN-based predictors exhibit higher prediction errors. However, applying the recovery algorithms to restore the lost information significantly reduces prediction errors.
|
|
11:50-12:00, Paper ThBT7.3 | |
Adaptive Graph-Based Uncertain Trajectory Data Augmentation Network for Next POI Recommendation |
|
Wang, Tianci | Institute of Information Engineering, Chinese Academy of Science |
Lai, Yantong | Institute of Information Engineering, Chinese Academy of Science |
Wang, Yiyuan | Harbin Institute of Technology |
Xiang, Ji | Institute of Information Engineering, Chinese Academy of Science |
Keywords: Neural Networks and their Applications, Representation Learning, Deep Learning
Abstract: Next point-of-interest (POI) recommendation has shown effectiveness in mining complex user preferences and transition patterns from sparse check-in data. Existing methods generally leverage auxiliary information like spatial-temporal context, POI categories, and social relationships to alleviate the problem of data sparsity. However, most of them overlook the fact that the check-in records collected from users are often incomplete, leading to missing information and inadequate modeling. To this end, we propose a novel method, Adaptive Graph-Based Trajectory Data Augmentation (AG-TDA), for next POI recommendation, which perceives potential relations among POIs and auxiliary information based on adaptive graph structure learning. Specifically, we design an adaptive graph-based trajectory data augmentation module from a global view, which automatically explores implicit uncertain relations among POIs, categories, and regions via similarity learning with fine-grained node embeddings to get more expressive representations. In the local view, we extend the self-attention mechanism with the learned fine-grained representations and personalized spatial-temporal information to capture users' dynamic preferences and intentions. Extensive experiments on several real-world datasets demonstrate the effectiveness of our AG-TDA.
|
|
12:00-12:20, Paper ThBT7.4 | |
Pareto Front Shape-Agnostic Pareto Set Learning in Multi-Objective Optimization |
|
Ye, Rongguang | Southern University of Science and Technology |
Chen, Longcan | Southern University of Science and Technology |
Kou, Wei-Bin | The University of Hong Kong |
Zhang, Jinyuan | Southern University of Science and Technology |
Ishibuchi, Hisao | Southern University of Science and Technology |
Keywords: Deep Learning, Machine Learning, Evolutionary Computation
Abstract: Pareto set learning (PSL) is an emerging approach for acquiring the complete Pareto set of a multi-objective optimization problem. Existing methods primarily rely on the mapping of preference vectors in the objective space to Pareto optimal solutions in the decision space. However, the sampling of preference vectors theoretically requires prior knowledge of the Pareto front shape to ensure high performance of the PSL methods. Designing a sampling strategy of preference vectors is difficult since the Pareto front shape cannot be known in advance. To make Pareto set learning work effectively in any Pareto front shape, we propose a Pareto front shape-aGnostic Pareto Set Learning (GPSL) that does not require the prior information about the Pareto front. The fundamental concept behind GPSL is to treat the learning of the Pareto set as a distribution transformation problem. Specifically, GPSL can transform an arbitrary distribution into the Pareto set distribution. We demonstrate that training a neural network by maximizing hypervolume enables the process of distribution transformation. Our proposed method can handle any shape of the Pareto front and learn the Pareto set without requiring prior knowledge. Experimental results show the high performance of our proposed method on diverse test problems compared with recent Pareto set learning algorithms.
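The abstract states that the network is trained by maximizing hypervolume. As a minimal sketch of what that indicator measures, the function below computes the exact hypervolume for two minimization objectives with respect to a reference point. It is not the GPSL training procedure, which would need a form of the indicator suitable for gradient-based optimization. The function name, the points, and the reference point are hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def hypervolume_2d(points: Iterable[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded by `ref`, for two objectives
    that are both minimized. Points that do not strictly dominate `ref`
    contribute nothing and are ignored."""
    inside = [p for p in points if p[0] < ref[0] and p[1] < ref[1]]
    area = 0.0
    best_f2 = ref[1]  # lowest second objective seen so far in the sweep
    # Sweep in increasing f1; only points that improve f2 add a new strip.
    for f1, f2 in sorted(inside):
        if f2 < best_f2:
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


# Hypothetical front with one dominated point, (3.0, 3.0), which adds no area.
front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (3.0, 3.0)]
hv = hypervolume_2d(front, ref=(4.0, 4.0))
```

A larger hypervolume means the solution set covers more of the objective space between the front and the reference point, which is why the indicator can serve as a training signal without assuming a particular front shape.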
|
|
12:20-12:40, Paper ThBT7.5 | |
Efficient Channel Search Algorithm for Convolutional Neural Networks Based on Value Density |
|
Yang, Sheng | Chinese Academy of Sciences |
Chen, Lipeng | Institute of Software, Chinese Academy of Sciences |
Yuan, Jiaguo | Institute of Software, Chinese Academy of Sciences |
Jia, Daixi | Institute of Software, Chinese Academy of Sciences |
Wu, Fengge | Institute of Software, Chinese Academy of Sciences |
Zhao, Junsuo | Institute of Software, Chinese Academy of Sciences |
Keywords: Machine Learning, Deep Learning
Abstract: Convolutional Neural Networks (CNNs) have exhibited remarkable success in various vision tasks. The allocation of channels within each layer significantly influences CNN performance. Despite the acknowledged importance of channel configurations, achieving an optimal distribution that balances computational efficiency and model accuracy remains challenging. In this paper, we initially transform the CNN training problem into an analogous knapsack optimization problem, incorporating the loss sensitivity criteria. Subsequently, we introduce the concept of value density, employing greedy algorithms, to accurately quantify the improvement in accuracy achievable by increasing the unit FLOPs on every layer. This fosters a proficient exploration and optimization of channel configurations within convolutional layers. Additionally, the efficacy of the proposed methodology has been substantiated through both theoretical derivations and experimental validations by the research team, outperforming popular pruning methods under equal-scale FLOPs conditions. We hope that the integration of value density into channel search algorithms will contribute to the development of more powerful CNNs.
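The knapsack framing can be pictured with the classic greedy heuristic that ranks items by value per unit cost. The sketch below is only that textbook heuristic, not the authors' algorithm: the item names, the accuracy-gain and FLOPs numbers, and the function `greedy_by_value_density` are all hypothetical.

```python
def greedy_by_value_density(items, budget):
    """Pick items in descending value/cost order while the budget allows.

    items: list of (name, value, cost) tuples with cost > 0.
    Returns (chosen_names, total_cost_spent).
    """
    chosen, spent = [], 0.0
    ranked = sorted(items, key=lambda it: it[1] / it[2], reverse=True)
    for name, value, cost in ranked:
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen, spent


# Hypothetical candidates: (layer, estimated accuracy gain, extra FLOPs).
candidates = [("conv1", 3.0, 1.0), ("conv2", 4.0, 2.0), ("conv3", 1.0, 2.0)]
print(greedy_by_value_density(candidates, budget=3.0))
```

Ranking by the ratio rather than by raw gain is what makes the FLOPs budget go to the layers where each extra unit of computation is estimated to help most.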
|
|
12:40-13:00, Paper ThBT7.6 | |
Time Slot Bidding Optimization Strategy Based on TD3 in Real-Time Bidding |
|
Qiu, Hongkun | Shenyang Aerospace University |
Feng, Yuan | Shenyang Aerospace University |
Yang, Guogang | Shenyang Aerospace University |
Fan, Chunlong | Shenyang Aerospace University |
Zhu, Haojie | Shenyang Aerospace University |
Keywords: Consumer and Industrial Applications, Decision Support Systems, Discrete Event Systems
Abstract: Real-time bidding (RTB) is a key component in digital display advertising. During the auction process, advertisers strive to maximize the total value of their winning impressions within a limited budget. However, due to the complexity and volatility of the bidding environment, it is difficult to obtain an optimal bidding strategy. To address this challenge, the advertising campaign cycle is divided into different time slots to better control and manage advertising delivery. Subsequently, the twin delayed deep deterministic policy gradient (TD3) algorithm is employed to learn the optimal bidding factor for each time slot. This approach allows for the dynamic adjustment of bidding strategies across time slots, adapting to market fluctuations and target audience behavioral patterns. The analysis shows that direct rewards from the bidding environment are misleading, so a new reward function based on the generalized second-price (GSP) mechanism is designed to learn the optimal policy effectively. Finally, experiments on an RTB dataset show that the algorithm achieves higher rewards in different bidding environments.
|
|
ThBT8 |
MR08 |
Online - Deep Learning and Neural Networks 2 |
|
Chair: Waqas, Muhammad | University of Science and Technology of China |
|
11:00-11:20, Paper ThBT8.1 | |
Messenger RNA Subcellular Localization Prediction Via Large Language Models and Attention Mechanisms |
|
Kong, Ge | Inner Mongolia University |
Fan, Yuanhao | Department of Computer Science, Inner Mongolia University |
Jianing, Wang | Department of Computer Science, Inner Mongolia University |
Yang, Zhao | Department of Computer Science, Inner Mongolia University |
Keywords: Biometric Systems and Bioinformatics, Application of Artificial Intelligence, Computational Life Science
Abstract: Messenger RNA (mRNA) exerts its specific functions at distinct subcellular locations, making the localization of mRNA within cells critically important for studying gene expression and cell migration, and also giving it significant medical value. Current mRNA subcellular localization prediction tasks rely on time-consuming and labor-intensive wet lab experiments. Traditional machine learning approaches suffer from weak feature representation capabilities, leading to poor predictive performance. To address this, we propose LM-mRNALoc, an mRNA subcellular localization prediction model based on large language models and attention mechanisms. This represents the first application of large language models in this field, harnessing their strong representational capabilities to overcome the limitations of previous machine learning approaches. Experiments show that LM-mRNALoc achieves an F1-score that exceeds the best existing state-of-the-art predictors by 12.5% on independent validation sets. Additionally, we developed an online prediction tool to facilitate access for researchers. The datasets and codes for LM-mRNALoc are available at https://anonymous.4open.science/r/LM_mRNALoc-28F0.
|
|
11:20-11:50, Paper ThBT8.2 | |
Advancements in Bug Traceability: A Systematic Mapping Study |
|
Wang, Bangchao | School of Computer Science and Artificial Intelligence, Wuhan Te |
Hu, Shouya | Wuhan Textile University |
Ye, Luyao | Wuhan Textile University |
Wan, Hongyan | School of Computer Science and Artificial Intelligence, Wuhan Te |
Zou, Zhiyuan | Wuhan Textile University |
Li, Xiaoxiao | School of Computer Science and Artificial Intelligence, Wuhan Te |
Zhu, Jiaxu | School of Computer Science and Artificial Intelligence, Wuhan Te |
Keywords: Machine Learning, Deep Learning
Abstract: Traceability refers to the potential for traces to be established (i.e., created and maintained) and used. Bug Traceability (BT) is critical for enhancing software quality, reducing maintenance costs, and boosting team efficiency. To explore the trends and advancements of BT, we conduct a systematic mapping study (SMS). We initially retrieve 4674 citations from 7 databases spanning 2014 to 2023, of which 24 primary studies meet the rigorous selection criteria. Our study identifies 6 types of bug trace links, 8 traceability strategies, and 47 bug traceability recovery (BTR) techniques. These 47 BTR techniques can be further classified into 6 categories. We also compile statistics on the 113 datasets and 16 evaluation metrics used to assess the performance of the BTR techniques proposed in the primary studies. In evaluating the overall quality of the primary studies, 8 dimensions are utilized to support technology transfer, categorizing the overall quality into 4 levels: poor, middle, good, and excellent, with 79% of primary studies evaluated at a good level. This study not only furnishes a clear definition of BT for scholarly reference, but also highlights that information retrieval (IR), machine learning (ML), and deep learning (DL) techniques are the mainstream techniques used for BTR.
|
|
11:50-12:00, Paper ThBT8.3 | |
An Entity Relation Extraction Framework Based on Large Language Model and Multi-Tasks Iterative Prompt Engineering |
|
Geng, Haibin | Qilu University of Technology (Shandong Academy of Sciences) |
Shi, Chenglong | Qilu University of Technology (Shandong Academy of Sciences) |
Jiang, Xuesong | Qilu University of Technology (Shandong Academy of Sciences) |
Kong, Zan | Qilu University of Technology (Shandong Academy of Sciences) |
Liu, Song | Qilu University of Technology (Shandong Academy of Sciences) |
Keywords: Deep Learning, Neural Networks and their Applications, Knowledge Acquisition
Abstract: Document-level entity relation extraction is an important task in natural language processing and plays an important role in semantic understanding and knowledge graph construction. However, existing deep neural network and graph neural network models are limited by their performance and number of parameters; they cannot capture global semantics and generalize poorly. Furthermore, existing methods that employ large language models for entity relation extraction do not establish good relationships among the subtasks of entity relation extraction, so information cannot be effectively shared and transmitted between tasks. In addition, previous approaches cannot effectively eliminate false entities and relationships. To solve these problems, we propose an entity relation extraction framework based on a large language model and multi-task iterative prompt engineering. In our model, we design an iterative prompt engineering procedure that better establishes the relationships among the tasks and ensures that every task obtains optimal results. Moreover, we design semantic merging, group disambiguation, and self-verification modules to eliminate false entity relations and noise nodes. Additionally, we design summary prompts to provide sufficient global semantics for better text segmentation. Finally, we evaluate our model on the wikiann, wikineural, ACE2005, CoNLL2003, CoNLL2004, and SciERC datasets and compare it with other baseline models.
|
|
12:00-12:20, Paper ThBT8.4 | |
Enhancing MI-BCI Classification with Subject-Specific Spatial Evolutionary Optimization and Transfer Learning |
|
Petrov, Marios | University of Colorado, Colorado Springs |
Atyabi, Adham | University of Colorado at Colorado Springs |
Keywords: Brain-Computer Interfaces, Cognitive Computing, Assistive Technology
Abstract: Motor imagery BCI systems have demonstrated success in single-subject laboratory settings, where a classifier is trained using data from a single BCI user. Typically, multiple training sessions are needed to enhance the user's performance. To address this, the BCI community has developed Subject Transfer techniques, which reduce training time by leveraging data from other subjects, primarily as pretraining samples. This study introduces a novel subject transfer method that employs Wavelet Packet Decomposition (WPD) followed by Common Spatial Patterns (CSP) for feature extraction. Once the spatial features are extracted, binary particle swarm optimization (BPSO) is applied for feature selection. In this approach, 40% of the target subject's data is used to derive a BPSO filter, which is then applied to the data from all subjects before training and testing. The binary vector produced by BPSO acts as a filter, optimizing model performance by focusing on the most class-representative features. Classification is performed using linear support vector machines (SVMs) trained via Stochastic Gradient Descent (SGD), enabling the hyperplane to be pre-trained and allowing for the effective use of data collected outside of the user session in an interpretable way. The proposed method was evaluated using three benchmark BCI Competition datasets: III-IVa, IV-I, and IV-IIa. It outperformed the single-trial MI-EEG classification state-of-the-art by 3.4% on the BCI Competition III dataset IVa and by 8.4% on the BCI Competition IV dataset IIa. Additionally, it surpassed the subject-transfer MI-EEG classification state-of-the-art by 4.1% on the BCI Competition III-IVa dataset.
|
|
12:20-12:40, Paper ThBT8.5 | |
On the Modification of Ring Consistent Hash Uniformity: A Case Study |
|
Waqas, Muhammad | University of Science and Technology of China |
Lin, Sian-Jheng | University of Science and Technology of China |
Liu, Bin | University of Science and Technology of China |
Fazil, Adnan | Air University |
Keywords: Cyber-physical systems, Distributed Intelligent Systems, Large-Scale System of Systems
Abstract: Distributed systems' applications employ distributed hashing for minimal dispersal, load balancing, and efficient lookups. One such scheme is Ring Consistent Hashing (RCH), which was initially proposed to tackle the issue of web hotspots. Later, it gained popularity in numerous practical applications due to its structured node placement and O(log(N)) asymptotic complexity. Despite its popularity, RCH's uniformity is usually worse than that of other hashing schemes, such as Highest Random Weight (HRW) and Anchor Hash (AH). For this reason, virtual nodes are introduced in RCH, which increases the memory footprint of the original scheme and restricts its scalability. In this paper, we propose a modification to the existing mapping rule between nodes and keys in the RCH scheme. We develop theoretical arguments and conduct several simulations on the conventional RCH algorithm using the modified rule. The results show that the modified rule needs only half as many virtual nodes as the original RCH for the same performance, resulting in significantly improved uniformity. Moreover, the memory footprint of the modified RCH is nearly halved compared to the original RCH, owing to the reduced number of virtual nodes. Additionally, the simulation results demonstrate that the modified scheme surpasses HRW and AH in lookup rate and memory usage, making it a superior choice for large-scale distributed systems.
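For readers unfamiliar with the baseline, the sketch below shows conventional ring consistent hashing with virtual nodes, where a key is served by the first virtual node clockwise from its hash. It does not implement the modified mapping rule proposed in the paper, which the abstract does not specify. The class name `HashRing`, the `vnodes` default, and the `node#i` labelling of virtual nodes are assumptions for illustration, and the ring is assumed to hold at least one node.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # 64-bit position on the ring, derived from SHA-256.
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each physical node is placed on the ring `vnodes` times.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [h for h, _ in self._ring]

    def lookup(self, key: str) -> str:
        # Binary search for the first ring point after the key's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.lookup("user:42"))
```

The binary search over the sorted ring positions gives the logarithmic lookup cost mentioned in the abstract, and the sorted list of virtual-node positions is the memory overhead that grows with the number of virtual nodes.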
|
|
12:40-13:00, Paper ThBT8.6 | |
Photovoltaic Power Forecasting with Missing Values Using VMD, GLTA-Unit and Multi-Scale Temporal Graph Convolution |
|
Wang, Yingxiang | Sichuan Normal University |
Guo, Rongzuo | Sichuan Normal University |
Min, Peng | Sichuan Normal University |
Keywords: Intelligent Power Grid
Abstract: Photovoltaic (PV) power generation forecasting is an important method to address the inherent volatility and intermittency of solar energy. However, traditional research methods often assume that the data is complete or well pre-processed, which does not match the reality of data collection scenarios where data is missing. To this end, we propose M-VGTG, a novel framework that integrates Variational Mode Decomposition (VMD), a Global-Local Temporal Attention Unit (GLTA-Unit), and Multi-Scale Temporal Graph Convolution to tackle the complex issue of missing values in PV power forecasting. First, to address the problem of modal aliasing, a missing data processing strategy based on VMD is designed. Second, a carefully designed GLTA-Unit provides a global-local attention mechanism to capture long-term dependencies and local fluctuations, improving the forecasting performance. To address the problem of different positive and negative correlation relationships between sequences at different time scales, a multi-scale method is adopted. The GLTA-Unit and Partial Convolution are introduced into the backbone model, a Temporal Convolutional Network (TCN), to handle temporal features and missing mode updates. Capturing spatial structure is also important. In order to adaptively handle missing modes, this paper uses an Adaptive Graph Convolution Network (AGCN) to process spatial features and introduces the latest BiasedGCN module to handle the perception of missing values in the information propagation process. Experiments on real PV power generation datasets verify the effectiveness of the proposed method and demonstrate the potential of this architecture in addressing the unique challenges of PV power generation forecasting.
|
|
ThBT9 |
MR09 |
Deep Learning and Neural Networks 10 |
Regular Papers - Cybernetics |
Chair: Chi, Chengdao | Tongji University |
|
11:00-11:20, Paper ThBT9.1 | |
BD-YOLO: Optimising Insect Imbalance Data Detection Models |
|
Wang, Jianyi | Xinjiang University |
Jia, Zhenhong | Xinjiang University |
Zhou, Gang | Xinjiang University |
Wang, Jiajia | Xinjiang University |
Keywords: Deep Learning, Machine Vision, Image Processing and Pattern Recognition
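Abstract: Vision-based insect detection is an important task in smart agriculture. Insect image datasets collected in the field usually exhibit severe class imbalance. Previous models offer insufficient accuracy and low real-time performance when detecting imbalanced insect data. To address these problems, an improved balanced detection framework is proposed in this paper. We design the Focaler-CPIoU loss to increase the weight of minority-class samples and thereby optimize detection on imbalanced pest data. We introduce the lightweight MBConv convolution to improve recognition accuracy while reducing the number of model parameters. To verify the effectiveness of the proposed method, we build an insect dataset containing six types of insects, in which the most numerous class, thrips, has 431.5 times as many samples as the least numerous, chrysopaperla. Experimental results show that, compared with YOLOv8n, mAP50 reaches 61.7%, an improvement of 5.4%, while the number of model parameters is reduced by 73.1% and FPS increases by 769. This paper offers a feasible approach to optimizing imbalanced-data detection.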
|
|
11:20-11:50, Paper ThBT9.2 | |
Behavior Recognition in Mice Using RGB-D Videos Captured from Below |
|
Oikawa, Haruki | Tokyo University of Science |
Tsuruda, Yoshito | Tokyo University of Science |
Sano, Yoshitake | Tokyo University of Science |
Furuichi, Teiichi | Tokyo University of Science |
Yamamoto, Masataka | Tokyo University of Science |
Takemura, Hiroshi | Tokyo University of Science |
Keywords: AI and Applications, Deep Learning, Transfer Learning
Abstract: Changes in mouse behavior are useful for both basic and applied research. However, visual inspection by humans is subjective and time-consuming. With the advancement of deep learning, systems have been developed that are capable of automatically and quantitatively classifying mouse behavior from videos. As camera angles are typically from above or the side, consistently capturing keypoints related to limb movement can be challenging. In this study, a mouse was placed on a transparent acrylic plate and its movements were recorded from below using an RGB-D camera, successfully capturing its limbs in 3D at all times. Furthermore, by using DeepLabCut, the 3D coordinates of the mouse's keypoints were obtained. By applying deep learning to the time-series data of these keypoint coordinates and the corresponding behavioral labels, we created a model that classifies mouse behaviors from videos. This method achieved a total accuracy of 96.7% and a walking classification accuracy of 94.5%, demonstrating higher precision compared to previous studies.
|
|
11:50-12:00, Paper ThBT9.3 | |
Depth Camera-LiDAR Fusion Based UAV Autonomous Navigation for Parcel Delivery in Complex Urban Environment |
|
Chi, Chengdao | Tongji University |
Li, Bing | Tongji University |
Deng, Hao | Tongji University |
Zhao, Shengjie | Tongji University |
Keywords: Deep Learning, Application of Artificial Intelligence, Machine Vision
Abstract: Unmanned aerial vehicle (UAV) delivery in urban environments demonstrates significant growth potential due to its efficiency and eco-friendliness. The challenges of autonomous navigation and obstacle avoidance in complex environments are crucial study aspects of UAV delivery. This paper proposes a depth camera-LiDAR fusion based UAV autonomous navigation protocol, enabling autonomous navigation and obstacle avoidance for UAV parcel delivery in complex urban environments. Experiments are conducted in a highly realistic simulation environment to validate performance of the proposed method. The results indicate that the proposed method can efficiently and accurately achieve autonomous navigation and obstacle avoidance with high robustness.
|
|
12:00-12:20, Paper ThBT9.4 | |
DSESL: A Deep Stacking Ensemble Model for Synthetic Lethality Prediction |
|
Li, Zhuang | Tongji University |
Wang, Xiaowen | Tongji University |
Li, Yulong | Tongji University |
Zhu, Hongming | Tongji University |
Liu, Qin | Tongji University |
Keywords: Biometric Systems and Bioinformatics, Deep Learning
Abstract: Synthetic lethality (SL) refers to the phenomenon that simultaneous mutation of two genes is lethal to cells, while mutation of either gene alone is not lethal. Exploiting this genetic interaction holds immense clinical potential for selectively killing cancer cells without harming normal cells. Given the vast genomic combinatorial space, relying solely on wet lab experiments for screening synthetic lethal gene pairs is impractical, leading to the emergence of various computational methods. Existing computational methods often rely on single-feature extraction methods or single data sources for prediction, resulting in poor predictive accuracy when faced with unseen genes. In this work, we propose a novel deep ensemble model for synthetic lethality prediction based on a stacking strategy (DSESL). First, leveraging the publicly available SynLethKG knowledge graph, we learn gene embeddings at three different focus levels: single-entity single-relation, single-entity multi-relation, and multi-entity multi-relation, constructing three sub-models for prediction from the knowledge graph. Additionally, we incorporate signaling pathway data and utilize graph neural network-based methods to construct a pathway sub-model. Finally, we adopt a stacking strategy-based ensemble approach to effectively integrate the prediction results from different sub-models. Based on experimental results, our proposed DSESL model outperforms existing state-of-the-art SL prediction methods in all three prediction scenarios. The source code of DSESL is available at https://github.com/TOJSSE-iData/DSESL/.
|
|
12:20-12:40, Paper ThBT9.5 | |
Multi-Agent Feedback Reinforcement Learning for Resource Allocation in Vehicular Networks |
|
Li, Shulin | Shandong University of Science and Technology |
Zhang, Fuxin | Shandong University of Science and Technology |
Keywords: AIoT, Machine Learning
Abstract: In vehicular networks, the local topology varies rapidly, and channel status fluctuates unpredictably. Such dynamic and uncertain factors can cause network states to deviate from ideal ones, seriously degrading network communication performance. To overcome the above problem, this paper proposes a novel multi-agent feedback reinforcement learning framework for resource allocation, which includes an attention module and a feedback reinforcement learning model. The former extracts information about vehicle-to-vehicle (V2V) links with interference relationships to input into the feedback reinforcement learning model and help each V2V link coordinate to make decisions. In the feedback reinforcement learning model, each agent evaluates the error between the desired V2V link state and the current actual V2V link state and learns the real-time resource allocation strategies that are implemented to guide each V2V link to the desired state.
|
|
12:40-13:00, Paper ThBT9.6 | |
Improved Proximal Policy Optimization Algorithm for Controller Design in Hybrid UAVs |
|
Qi, Mingyu | Nanjing University of Aeronautics and Astronautics |
Zheng, Hongyuan | Nanjing University of Aeronautics and Astronautics |
Zhai, Xiangping | Nanjing University of Aeronautics and Astronautics |
Zhu, Qi | Nanjing University of Aeronautics and Astronautics |
Keywords: Machine Learning, Neural Networks and their Applications
Abstract: Drones have become an indispensable tool in our daily lives. While fixed-wing and rotary-wing drones are common types, each comes with its own set of advantages and disadvantages. Hybrid drones, however, combine the strengths of both types, enabling vertical takeoff and landing, hovering, and remote flight. Nonetheless, the aerodynamics of hybrid drones are exceedingly intricate, leading to a sluggish pace in their development. In this article, we propose a neural network controller design, employing the reinforcement learning Proximal Policy Optimization (PPO) algorithm to train the controller. Additionally, we integrate an attention mechanism into the network's input section to emphasize the speed variable, thereby enhancing the data processing and improving the model performance and efficiency. Experimental results demonstrate that our approach yields a controller with superior stability and optimal value.
|
|
ThBT10 |
MR10 |
Expert Systems and Decision Support |
|
|
11:00-11:20, Paper ThBT10.1 | |
Long-Term Water Quality Prediction with Patch Savitsky-Golay Filtering and Transformer |
|
Lin, Yongze | Beijing University of Technology |
Qiao, Junfei | Beijing University of Technology |
Bi, Jing | Beijing University of Technology |
Yuan, Haitao | Beihang University |
Zhai, Jiahui | Beijing University of Technology |
Zhou, Mengchu | New Jersey Institute of Technology |
Keywords: Application of Artificial Intelligence, Expert and Knowledge-Based Systems, Neural Networks and their Applications
Abstract: In many fields, time series prediction is gaining more and more attention, e.g., air pollution, geological hazards, and network traffic prediction. Water quality prediction uses historical data to predict future water quality. However, it is difficult to learn a representation map from a time series that captures the trends and fluctuations, effectively removes noise from time series data, and captures complex nonlinear relationships. To solve these problems, this work proposes a time series prediction model, called PSGT for short, which integrates Patch Savitsky-Golay filtering and Transformer. First, this work adopts a Patching method to embed sub-time series data and obtains the trends and semantic information of the time series. Second, it uses Savitsky-Golay filtering to effectively remove the noise data in the patch and improve the prediction accuracy. Third, it uses a Transformer mechanism to address the nonlinear problem of water quality time series and improve long-term prediction capability. Two real-world datasets are utilized to evaluate the proposed PSGT, and experiments show that PSGT performs better than other benchmark models by at least 6%.
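The patch-then-smooth step can be pictured with a minimal sketch: the series is cut into fixed-length patches and each patch is smoothed with a 5-point quadratic Savitsky-Golay filter, whose standard convolution weights are (-3, 12, 17, 12, -3)/35. The window length, the choice to leave the two points at each patch edge unchanged, and the function names `sg_smooth` and `smooth_patches` are assumptions for this example, not details from the paper.

```python
# Standard 5-point quadratic/cubic smoothing weights.
SG5 = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def sg_smooth(patch):
    """Smooth interior points of one patch; edge points are left unchanged."""
    out = list(patch)
    for i in range(2, len(patch) - 2):
        out[i] = sum(c * patch[i + k - 2] for k, c in enumerate(SG5))
    return out


def smooth_patches(series, patch_len):
    """Split `series` into consecutive patches and smooth each independently."""
    patches = [series[i:i + patch_len] for i in range(0, len(series), patch_len)]
    return [sg_smooth(p) for p in patches]


readings = [7.1, 7.3, 9.8, 7.2, 7.0, 6.9, 7.4, 7.2, 7.1, 7.3]
for patch in smooth_patches(readings, patch_len=5):
    print([round(v, 3) for v in patch])
```

Because the weights come from a local least-squares polynomial fit, a patch that already follows a quadratic trend passes through unchanged, while isolated spikes are damped.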
|
|
11:20-11:50, Paper ThBT10.2 | |
Genetic Algorithm with Reinforcement Learning Based Parameter Optimisation |
|
Woodcock, Alexander | Royal Holloway, University of London |
Zhang, Li | Royal Holloway, University of London |
Keywords: Computational Intelligence, Expert and Knowledge-Based Systems, Heuristic Algorithms
Abstract: Optimisation of parameters in genetic algorithms (GAs) can improve the speed and accuracy of the solution produced, but well-optimised parameters are dependent on the problem being solved, and the substantial additional cost of pre-computing good parameters can offset the benefit. This research investigates the use of reinforcement learning algorithms to optimise the parameters of the GA during its runtime. Specifically, we propose a variant of the GA method which embeds the Q-learning algorithm to select an optimal mutation rate at each iteration. Evaluated on a set of benchmark functions, the proposed GA model with Q-learning shows promising performance, with lower mean scores than those of the original GA for most test functions. In particular, the Q-learning algorithm shows a promising emergent behaviour, i.e. selecting a high mutation rate when the population variance is low to increase swarm and search diversity. Evaluated using diverse unimodal and multimodal numerical optimisation problems, the proposed model outperforms several baseline GAs with statistical significance.
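One way such an embedding could look is sketched below: a tabular Q-learning agent observes whether the population's fitness variance is low or high and picks a mutation rate for the next generation, rewarded by the improvement in best fitness. Everything specific here is an assumption for illustration (the candidate rates, the variance threshold, the reward definition, the sphere benchmark, and the GA operators); the paper's actual state, action, and reward design is not given in the abstract.

```python
import random
import statistics

RATES = (0.01, 0.1, 0.5)  # candidate mutation rates (assumed)


def sphere(x):
    return sum(v * v for v in x)


def run_ga(fitness=sphere, dim=5, pop_size=20, generations=50, seed=0,
           alpha=0.1, gamma=0.9, epsilon=0.2):
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    q = {(s, a): 0.0 for s in (0, 1) for a in range(len(RATES))}

    def state(p):
        # 0 = low fitness variance, 1 = high (threshold is an assumption).
        return int(statistics.pvariance([fitness(ind) for ind in p]) > 1.0)

    best = min(fitness(ind) for ind in pop)
    history = [best]
    s = state(pop)
    for _ in range(generations):
        # Epsilon-greedy choice of the mutation rate for this generation.
        if rng.random() < epsilon:
            a = rng.randrange(len(RATES))
        else:
            a = max(range(len(RATES)), key=lambda i: q[(s, i)])
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]
        children = [pop[0][:]]  # elitism: carry the best individual over
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, dim)  # one-point crossover
            child = [g + rng.gauss(0, 1) if rng.random() < RATES[a] else g
                     for g in p1[:cut] + p2[cut:]]
            children.append(child)
        pop = children
        new_best = min(fitness(ind) for ind in pop)
        s2 = state(pop)
        # Q-learning update; reward is the improvement in best fitness.
        target = (best - new_best) + gamma * max(q[(s2, i)] for i in range(len(RATES)))
        q[(s, a)] += alpha * (target - q[(s, a)])
        best, s = new_best, s2
        history.append(best)
    return best, history


final_best, trace = run_ga()
print(final_best)
```

Tying the state to population variance mirrors the behaviour described in the abstract, where a low-variance population is the situation in which a higher mutation rate becomes attractive.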
|
|
11:50-12:00, Paper ThBT10.3 | |
Improving Knowledge Tracing through Learning Processes and Concept Similarity Map |
|
Lai, Yingxu | Beijing University of Technology |
Xu, Xinyu | Beijing University of Technology |
Zhang, Xiao | Beijing University of Technology |
Dong, Xinrui | Beijing University of Technology |
Zhuang, Junxi | Beijing University of Technology |
Liu, Jing | Beijing University of Technology |
Keywords: Expert and Knowledge-Based Systems, Big Data Computing
Abstract: Knowledge Tracing (KT) aims to track the evolving knowledge states of students based on their historical performance, playing a vital role in online intelligent education systems. While deep learning-based knowledge tracing achieves impressive predictive performance, existing methods suffer from two shortcomings. On one hand, the storage of massive historical information introduces irrelevant noise during the training process, and individual differences may prevent the model from accurately capturing students' comprehensive states. On the other hand, deep learning models lack interpretability, failing to provide precise descriptions of students' knowledge states. This paper proposes LCKT, a method that improves knowledge tracing through learning processes and a concept similarity map. We incorporate diverse features, including exercise, exercise difficulty, concept, response time, response, and interval time, to measure the diversity of exercise interactions. Additionally, we utilize a forgetting gate to simulate the decline of students' knowledge over time during the learning process. Furthermore, we introduce a concept similarity map as a constraint for model training, thereby clearly delineating students' mastery of different knowledge points. Extensive experiments on three real-world datasets demonstrate that LCKT outperforms state-of-the-art KT methods and exhibits interpretability to some extent.
|
|
12:00-12:20, Paper ThBT10.4 | |
Enhancing Investment Forecasting through the Second-Level Thinking Concept |
|
Dan, Zhao | Zhejiang University |
Zhenyi, Shen | Zhejiang University |
Chao, Wang | Swinburne University of Technology |
Keywords: Expert and Knowledge-Based Systems, AI and Applications
Abstract: Forecasting models that rely on machine learning often lack a transparent trading logic, which can erode trust when they are applied to real-world trading scenarios. To tackle this problem, a model grounded in an investment concept known as ‘second-level thinking’ is proposed. This approach enhances both the interpretability of the model and the accuracy of its predictions. The model’s first-level thinking process makes a broad prediction based on features related to the macroeconomic environment. These features are used to group samples into clusters, each representing a different state of the macroeconomic environment. For each cluster, an associative memory is constructed to improve interpretability and simplify the prediction process, mirroring the function of human memory. A theorem is developed to explore the relationship between the capacity of the associative memory and the threshold used in the clustering step. This helps establish an appropriate threshold for reliably memorizing patterns within each cluster. The second-level thinking process then refines these predictions by fitting the residuals from the first-level process using technical features, further enhancing prediction performance. Experiments conducted on two datasets of Chinese government bonds demonstrate that our proposed model outperforms others in terms of prediction accuracy and trading performance.
|
|
12:20-12:40, Paper ThBT10.5 | |
Enhanced Spatial-Temporal Analysis for EEG-Based Microsleep Detection: Integrating Kalman Filtering with Voronoi Tessellation and Adaptive Coverage Control |
|
Biró, Attila | Obuda University |
Cuesta-Vargas, Antonio Ignacio | University of Malaga |
Szilágyi, László | Obuda University |
Keywords: Application of Artificial Intelligence, Biometric Systems and Bioinformatics, Image Processing and Pattern Recognition
Abstract: Detecting microsleep in real time is crucial to facilitating the transition from semi-autonomous systems to completely autonomous driving technologies. Integrating sophisticated detection algorithms with vehicle control systems enables the provision of prompt corrective measures, such as driver alerts or temporary vehicle control. Minimizing the probability of accidents caused by driver fatigue not only improves the safety of users, but also promotes the overall security of the road network. By integrating Kalman filtering, Voronoi tessellation, and adaptive coverage control algorithms, the study seeks to find a feasible and sophisticated methodology to enhance the spatial and temporal resolution of EEG data analysis, leading to more robust and reliable detection of microsleep episodes. The paper presents a framework as a significant advancement in sleep technology, offering a new method to diagnose and understand microsleeps, characteristics, and patterns of brain activity and sleep disorders.
|
|
12:40-13:00, Paper ThBT10.6 | |
A Self-Tuning Version for the Fuzzy-Possibilistic Product Partition C-Means Algorithm |
|
Naghi, Mirtill Boglárka | Óbuda University, Sapientia University |
Kreinovich, Vladik | University of Texas at El Paso |
Kovacs, Levente | Obuda University |
Szilágyi, László | Obuda University |
Keywords: Fuzzy Systems and their applications, Computational Intelligence, Soft Computing, Socio-Economic Cybernetics
Abstract: The fuzzy-possibilistic product partition c-means (FPPPCM) algorithm was proposed as a robust solution to the c-means clustering problem, in which outlier data behave similarly to distant objects in gravity systems. Although FPPPCM reliably provides fine partitions when its parameters are well chosen, it can struggle when it is not initialized properly. To avoid such cases, this paper proposes a self-tuning version of the FPPPCM algorithm, which incorporates cluster-size controlling variables into the objective function that allow the so-called possibilistic penalty terms to be adjusted during the alternating optimization process. The proposed method was evaluated using four standard test datasets in three different scenarios: (1) no added noise; (2) a single outlier added; (3) multiple noisy items added. The partitions provided by the proposed algorithm were evaluated based on cluster purity, normalized mutual information and adjusted Rand index, and were compared with the outcome of previous clustering models. The proposed method performed better than, or at least at the same quality level as, previous ones, while reducing the number of parameters the user is responsible for.
|
|
12:40-13:00, Paper ThBT10.7 | |
A Hybrid Method for Dense Points Enclosing and Observation Planning |
|
Pan, Youmei | Institute of Software, Chinese Academy of Sciences |
Hui, Xinyao | Institute of Software, Chinese Academy of Sciences |
Li, Jinwen | Institute of Software, Chinese Academy of Sciences |
Zhu, Xiaobin | Beijing Institute of Tracking and Telecommunication Technology |
Xu, Fanjiang | Institute of Software, Chinese Academy of Sciences |
Wang, Peng | Chinese Academy of Sciences |
Keywords: Decision Support Systems
Abstract: As a new form of Earth-observation requirement, dense points usually contain a large number of points of interest (PoIs) which are distributed over a wide area and need to be planned by involving several satellites to perform the tasks collaboratively. This challenges the usage of satellites when the visible time window is calculated and decided separately for each PoI, causing frequent switching among many PoI captures. While traditional methods are more suitable for observing fewer PoIs, denser points require a different approach to fully utilize satellite resources and avoid inefficient tasking. Meanwhile, because PoIs are geographically clustered and dispersed in different regions, more satellites need to be analyzed in order to generate a collaborative planning result. This paper proposes a dense-point aggregation method that uses clustering to generate a hybrid task representation based on the geographic distribution of PoIs. A collaborative plan modeling and solving algorithm is proposed to provide an optimization compatible with this hybrid representation. Experiments demonstrate the efficacy and advantage of the proposed method.
|
|
12:40-13:00, Paper ThBT10.8 | |
Group Consensus Optimization Ranking Approach with Expert Reliability and Social Network Relationships |
|
Li, Guangxu | University of Electronic Science and Technology of China |
Zhao, Xin | University of Electronic Science and Technology of China |
Li, Yanhong | ChengDu University |
Keywords: Decision Support Systems, Conflict Resolution
Abstract: In group decision making, reaching consensus within the decision group is important for obtaining a reliable decision-making result. Moreover, expert reliability can significantly affect the reliability of the result: unreliable experts provide incorrect information and undermine the rationality and effectiveness of the decision result. A few studies have considered expert reliability and defined it as the similarity of each expert’s opinions before and after adjustment in the iterative consensus process. They have not considered how to eliminate overly unreliable experts before the consensus process to improve the reliability of the result, or how to define and measure expert reliability in a non-iterative optimization consensus process. To deal with these issues, this paper proposes a group consensus optimization ranking approach with expert reliability and social network relationships to avoid the negative impact of unreliable experts on decision making. The similarity between the original and consistent preferences is used to measure expert reliability, and the weights of experts are obtained from social network relationships. When the reliability of some experts does not meet the requirement, they are excluded before the consensus process. For the remaining experts, an improved consensus optimization model and a median ranking model are established to promote consensus and obtain the final ranking result based on a new ordinal consensus measurement. Finally, a numerical example about pollution control plan selection and a comparison analysis are given to illustrate the feasibility and advantage of the proposed approach.
|
|
12:40-13:00, Paper ThBT10.9 | |
Duopoly Competition in Blockchain Game with Interoperability |
|
Sun, Jinghan | The Chinese University of Hong Kong, Shenzhen |
Zhang, Hongbo | The Chinese University of Hong Kong, Shenzhen |
El Saddik, Abdulmotaleb | University of Ottawa |
Cai, Wei | The Chinese University of Hong Kong, Shenzhen |
Keywords: Distributed Intelligent Systems, Decision Support Systems, System Modeling and Control
Abstract: As a bridge connecting the Web3 financial ecosystem and digital games, smart-contract-empowered blockchain games have attracted significant attention from the Web3 community in recent years. By providing players ownership over assets and interoperable Non-Fungible Tokens (NFTs), blockchain games enable the reuse of in-game assets beyond the original games, thereby overturning the “walled garden” among traditional games. Nonetheless, blockchain games diminish the monopolistic edge previously held by traditional game providers, forcing them to compete with players by token distribution. Therefore, this paper explores the duopoly competition within the blockchain game market, emphasizing the role of interoperable NFTs together with NFT wear and tear. Specifically, we propose a three-stage game to formulate the interactions between game providers and players. In addition, we reveal the relationship between game providers' code disclosure strategies for NFT interoperability and their token retention strategies. Finally, the experimental results demonstrate how the token distribution, players' preferences, and the NFT wear level affect the profits of game providers.
|
|
12:40-13:00, Paper ThBT10.10 | |
Games in Technology Forecasting & Foresight: A Rapid Review |
|
Machado Andrade, Rafael | Universidade Federal Do Rio De Janeiro |
Lima de Souza, Aline | Universidade Federal Do Rio De Janeiro |
Barbosa, Carlos Eduardo | Universidade Federal Do Rio De Janeiro |
Lyra, Alan | UFRJ |
Salazar, Herbert | Universidade Federal Do Rio De Janeiro |
Lima, Yuri | UFRJ |
Argôlo, Matheus | Universidade Federal Do Rio De Janeiro |
Souza, Jano | Federal University of Rio De Janeiro |
Keywords: Technology Assessment, Decision Support Systems
Abstract: Future-related studies are key in identifying and predicting emerging technologies, fostering innovation, and driving scientific progress. Games have increasingly served as a tool to engage researchers in Technology Forecasting and Foresight activities, presenting a gamified approach to gather, analyze, and synthesize information to allow future foresight. In this work, we use the Rapid Review methodology to analyze the scientific literature and explore the use of games as tools for Technology Forecasting and Foresight. We analyzed and categorized the articles by theme and type of study, identifying the areas where games can be most beneficial to guide professionals performing Technology Forecasting and Foresight, helping companies and governments better understand the future. The empirical evidence found provides a comprehensive understanding of how games can enhance foresight activities and contribute to the success of Technology Forecasting and Foresight initiatives. Our analysis also provides insights for future research about gamified strategies in Foresight.
|
|
12:40-13:00, Paper ThBT10.11 | |
Adaptive Task and Motion Planning for Object-Invariant Stacking Operations |
|
Sadhu, Arup Kumar | Tata Consultancy Services |
Saha, Arindam | Tata Consultancy Services Limited |
Roy Choudhury, Anushko | TCS Research |
Dasgupta, Ranjan | TCS Research |
Keywords: Robotic Systems, Decision Support Systems, Manufacturing Automation and Systems
Abstract: This paper proposes a method for adapting an object-invariant action model for stacking operations. The proposed approach results in the generalization of stacking operations for multiple objects, in contrast to handcrafted action model-based task and motion planning (TAMP). Action model adaptation for new object stacking is done by drawing inferences using intuitions and learning a reward from visual clues based on the heuristically defined reward function. With this reward, the action model adaptation problem is formulated as an n-armed bandit problem. Additionally, an efficient action selection strategy is proposed for the n-armed bandit problem, which leads to fast convergence. The agent repetitively interacts with the environment in real-time to adapt the action model for the new object. After adaptation, the agent is capable of stacking objects with varying poses. The proposed approach is tested in simulation using the UR5 manipulator with a two-finger gripper. The proposed adaptive action model-based TAMP outperforms the traditional handcrafted action model-based TAMP by a significant margin in terms of success rate.
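The n-armed bandit formulation mentioned in this abstract can be illustrated with a minimal Python sketch. It uses the textbook epsilon-greedy rule, which is a stand-in and not the action selection strategy the paper proposes; the names `epsilon_greedy`, `simulated_stack` and the values in `SUCCESS_PROB` are hypothetical and only simulate candidate action parameters with different stacking success rates.

```python
import random


def epsilon_greedy(pull, n_arms, steps=2000, epsilon=0.1, seed=0):
    """Estimate the value of each arm with the epsilon-greedy rule.

    pull(arm, rng) must return a numeric reward for the chosen arm.
    """
    rng = random.Random(seed)
    counts = [0] * n_arms      # how often each arm was tried
    values = [0.0] * n_arms    # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                       # explore
        else:
            arm = max(range(n_arms), key=values.__getitem__)  # exploit
        reward = pull(arm, rng)
        counts[arm] += 1
        # Incremental update of the sample mean for the pulled arm.
        values[arm] += (reward - values[arm]) / counts[arm]
    return values, counts


# Hypothetical success probabilities of three candidate action parameters.
SUCCESS_PROB = [0.2, 0.5, 0.9]


def simulated_stack(arm, rng):
    """Simulated environment: reward 1.0 if the stacking attempt succeeds."""
    return 1.0 if rng.random() < SUCCESS_PROB[arm] else 0.0


if __name__ == "__main__":
    values, counts = epsilon_greedy(simulated_stack, n_arms=3)
    print("estimated values:", values)
    print("pull counts:", counts)
```

In this toy setting the agent ends up pulling the arm with the highest simulated success rate most often; a faster-converging selection rule, as the abstract describes, would replace the explore/exploit branch.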
|
|
12:40-13:00, Paper ThBT10.12 | |
Novel Creative Technologies Decision Support System for Small and Medium Enterprises: Exploring the Potential, Current Findings and Future Research Consideration |
|
Sosunova, Inna | Lappeenranta-Lahti University of Technology LUT |
Hasheela, Victoria | University of Namibia |
Happonen, Ari | LUT University |
Keywords: Decision Support Systems, Adaptive Systems, Technology Assessment
Abstract: Small and Medium Enterprises (SMEs) face numerous technological barriers when adopting new creative technologies (CTs), including artificial intelligence, generative algorithms, large language models and other innovative supportive tools. Still, adopting these technologies can enhance SMEs' value, productivity, efficiency and market visibility, boost their competitiveness, and cut costs and resource needs. To empower SMEs to make informed decisions between creative technology options (also known as Createch solution adoption), be educated on the solutions available to them and have a direction in which to proceed, a decision-support system is needed. Our work presents the Creative Technologies Decision Support System (CT DSS), a novel solution that helps SMEs find innovative approaches to their identified challenges, technology utilization problems and skill-building needs. The CT DSS puts the power of technology adoption in the hands of SMEs, enhancing their digital transformation journey, reducing digital blindness and minimizing the need to search the Internet aimlessly for things they do not know. The system uses two working modes, minimal and expert, with six main stages. Our work describes the developed concept of the CT DSS for SMEs and presents the Internet versions of the developed tool and the first public and operational field test results of the CT DSS.
|
|
ThBT11 |
MR11 |
Knowledge and Information Management |
Regular Papers - Cybernetics |
Chair: Xu, Junyan | University of Electronic Science and Technology of China |
|
11:00-11:20, Paper ThBT11.1 | |
Plant Species Classification Using Evolving Ensemble and Siamese Networks |
|
Arno, Jed | Royal Holloway, University of London |
Grace, Olwen | Royal Botanic Garden Edinburgh |
Larridon, Isabel | Royal Botanic Gardens, Kew |
Zhang, Li | Royal Holloway, University of London |
Keywords: Expert and Knowledge-Based Systems, Hybrid Models of Computational Intelligence, Knowledge Acquisition
Abstract: Image-based dried plant specimen identification poses a significant challenge due to the large number of possible classes and extreme scarcity of labelled training samples. To tackle these limitations and mitigate classification biases, this research proposes a Particle Swarm Optimisation (PSO)-based weighted evolving ensemble model as well as a Siamese network for plant species classification. Specifically, we first diversify the base classifier pool by employing three networks, i.e. ResNet50, Xception, and VGG19, fine-tuned using the specimen samples. Besides the adoption of a mean average ensemble model, a weighted ensemble scheme with PSO-based optimal weighting factor generation is also utilised to integrate the outputs of the three base networks for tackling classification variances. In addition, to further tackle species classification with extremely imbalanced data, a Siamese network with ResNet50 as the backbone is utilised. Evaluated using a challenging FGVC6 data set with Melastomataceae images, the PSO-based weighted ensemble model is able to assign more influence to the best performing base networks for ensemble prediction and outperforms the traditional mean average ensemble method. Moreover, the Siamese network also obtains competitive performance for solving imbalanced specimen classification by performing similarity comparison between images.
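The difference between the mean average ensemble and the weighted ensemble described in this abstract can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `ensemble_predict`, the probability values and the weights are hypothetical, and the weights are hand-picked here, whereas the paper obtains them through PSO.

```python
def ensemble_predict(prob_sets, weights=None):
    """Combine per-model class probabilities into one prediction.

    prob_sets: one list of class probabilities per base model.
    weights:   one non-negative weight per model; None means equal
               weights, which reduces to the mean average ensemble.
    Returns (predicted_class_index, combined_probabilities).
    """
    if weights is None:
        weights = [1.0] * len(prob_sets)
    total = sum(weights)
    n_classes = len(prob_sets[0])
    combined = [
        sum(w * probs[c] for w, probs in zip(weights, prob_sets)) / total
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=combined.__getitem__), combined


if __name__ == "__main__":
    # Hypothetical softmax outputs of three base networks for one image.
    outputs = [[0.7, 0.3], [0.6, 0.4], [0.1, 0.9]]
    print(ensemble_predict(outputs))                   # mean ensemble
    print(ensemble_predict(outputs, [0.5, 0.4, 0.1]))  # weighted ensemble
```

With equal weights the third model pulls the decision toward class 1; giving more weight to the first two models flips the decision to class 0, which is the kind of influence an optimizer such as PSO would tune automatically on validation data.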
|
|
11:20-11:50, Paper ThBT11.2 | |
Robust Kalman Filter Based Path-Tracking Control with Prescribed Performance for Autonomous Ground Vehicles under FDI Attack |
|
Wang, Yuxin | Shanghai Jiao Tong University |
Hu, Chuan | Shanghai Jiao Tong University |
Gao, Hong | Shanghai Jiao Tong University |
Keywords: Cybernetics for Informatics, Intelligent Internet Systems
Abstract: This paper studies the path-tracking control of autonomous ground vehicles (AGVs) with prescribed performance constraints when the measurement is subject to Gaussian noise and false data injection (FDI) attack. A robust Kalman filter with a credibility mechanism is utilized to filter out the attacked state data. The controller is designed by incorporating the prescribed performance control (PPC) and backstepping approaches, based on a combined brushed tire model. To achieve the control objective, this paper makes three contributions: 1) To eliminate the effect of noise and FDI attack, the robust Kalman filter considers the credibility of the measurement in three ways and obtains a more accurate estimate; 2) A modified prescribed performance function (PPF) is developed in the PPC design to stabilize the control outputs and constrain them within the PPF boundaries, eliminating the need for accurate initial conditions; 3) A modified backstepping control approach is presented to overcome the “explosive complexity” in backstepping and thus reduce the computational effort. Finally, high-fidelity simulations verify the effectiveness of the proposed approach, whose superiority is demonstrated by comparison with a traditional sliding mode control (SMC) law.
|
|
11:50-12:00, Paper ThBT11.3 | |
IXGB: An Incremental Learning Algorithm for XGBoost |
|
Ning, Tao | Beijing University of Posts and Telecommunications |
Song, Guo | Beijing University of Posts and Telecommunications |
Zhao, Fang | Beijing University of Posts and Telecommunications |
Luo, Haiyong | Institute of Computing Technology, Chinese Academy of Sciences |
Keywords: Machine Learning, Knowledge Acquisition
Abstract: Incremental learning aims to emulate the human knowledge acquisition process by enabling models to engage in a continuous learning process. While incremental learning has been notably successful within the realm of deep learning, traditional machine learning algorithms are prized for their robust interpretability, low computational overhead, and ability to operate efficiently with limited data. Our research endeavors to harness the strengths of traditional machine learning by integrating it with incremental learning. We introduce an innovative incremental learning algorithm for XGBoost, which restructures the model's knowledge architecture. This is achieved through a two-pronged approach: the expansion of decision trees and the recalibration of leaf node weights within these trees. Our algorithm adeptly addresses the challenge of catastrophic forgetting by simultaneously absorbing new information and reinforcing the integrity of pre-existing knowledge. We comprehensively evaluate our algorithm on the public data set and the private data set, and the experimental results show that our algorithm's performance is comparable to the tested deep learning models and surpasses that of traditional machine learning algorithms.
|
|
12:00-12:20, Paper ThBT11.4 | |
GraDiNet: Implicit Self-Distillation of Graph Structural Knowledge |
|
Xu, Junyan | University of Electronic Science and Technology of China |
Liao, JianXing | Shenzhen Institute for Advanced Study, University of Electronic |
Rucong, Xu | University of Electronic Science and Technology of China |
Li, Yun | Shenzhen Institute for Advanced Study, University of Electronic |
Keywords: Knowledge Acquisition, Computational Intelligence in Information, Deep Learning
Abstract: Graph Knowledge Distillation (GKD) in artificial intelligence typically employs a teacher-student model, which faces challenges such as rigidity, time consumption, and the need to train a teacher. To address these challenges, this paper develops a Graph self-Distillation Network (GraDiNet), a framework that operates without the need for a teacher model or graph neural network (GNN) during the training and inference phases. GraDiNet uniquely utilizes multi-layer perceptrons (MLPs) to harness both the structural knowledge of graphs and the semantic information of nodes, thus facilitating hierarchical self-distillation between a target node and its neighbors. Additionally, the GraDiNet approach incorporates a novel similarity-based difference enhancement technique and a penalty factor within the training loss to further delineate the distinction between positive and negative samples. This allows GraDiNet not only to bypass the necessity for a GNN teacher in learning graph structure knowledge but also to predict node classification efficiently. Extensive evaluations show that standard MLPs can significantly boost their performance through this implicit hierarchical self-distillation and the similarity difference enhancement. GraDiNet thus achieves an average improvement of 15% over conventional MLPs and outperforms leading state-of-the-art GKD methods across three real-world datasets.
|
|
12:20-12:40, Paper ThBT11.5 | |
Fish-Bone Diagram of Research Issue: Gain a Bird's-Eye View on a Specific Research Topic |
|
Li, Jinghong | Japan Advanced Institute of Science and Technology |
Phan, Huy | FPT University, HCMC, Vietnam |
Gu, Wen | Center for Innovative Distance Education and Research, Japan Adv |
Koich, Ota | Japan Advanced Institute of Science |
Hasegawa, Shinobu | Japan Advanced Institute of Science and Technology |
Keywords: Knowledge Acquisition, Machine Learning, AI and Applications
Abstract: Novice researchers often face difficulties in understanding a multitude of academic papers and grasping the fundamentals of a new research field. To solve such problems, the knowledge graph supporting research survey is gradually being developed. Existing keyword-based knowledge graphs make it difficult for researchers to deeply understand abstract concepts. Meanwhile, novice researchers may find it difficult to use ChatGPT effectively for research surveys due to their limited understanding of the research field. Without the ability to ask proficient questions that align with key concepts, obtaining desired and accurate answers from this large language model (LLM) could be inefficient. This study aims to help novice researchers by providing a fish-bone diagram that includes causal relationships, offering an overview of the research topic. The diagram is constructed using the issue ontology from academic papers, and it offers a broad, highly generalized perspective of the research field, based on relevance and logical factors. Furthermore, we evaluate the strengths and improvable points of the fish-bone diagram derived from this study's development pattern, emphasizing its potential as a viable tool for supporting research survey.
|
|
12:40-13:00, Paper ThBT11.6 | |
Parallel-SWSA: Automated Extraction for Feature Sequences from Remote Access Trojan Attack Packets |
|
Li, Xiangyu | Beihang University |
Yan, Hanbing | National Internet Emergency Center |
Lang, Bo | Beihang University |
Zhang, Yanzhe | University of Chinese Academy of Sciences |
Keywords: Information Assurance and Intelligence, Intelligent Internet Systems, Computational Intelligence
Abstract: Remote Access Trojans (RATs) are malware that allow attackers to remotely control infected systems and steal sensitive user data via the internet. Although current detection methods based on network traffic rules are widely adopted due to their efficiency and accuracy, writing these rules relies heavily on expert knowledge and is costly. Extracting rules for obfuscated Trojans is especially challenging. To achieve effective and efficient RAT detection, this paper proposes a Parallel Sliding Window Sequence Alignment (Parallel-SWSA) algorithm. This method uses a sliding window to segment the data, then uses the Jaccard similarity coefficient to calculate the similarity between sequences within the window, selects the most similar sequence pairs for alignment one by one, and extracts the common feature sequences. By introducing parallel processing techniques, the extraction efficiency of the feature sequences is significantly improved. Additionally, we developed an automatic conversion tool for Snort rules to address the issue of labor-intensive rule writing. This tool can automatically transform extracted feature sequences into Snort-compatible rules, suitable for intrusion detection systems that support Snort rules. The Parallel-SWSA algorithm was validated on five datasets and achieved high detection rates. The results indicate that the method can effectively extract malicious feature sequences generated during RAT communication and possesses strong noise resistance.
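The window, similarity and alignment steps named in this abstract can be sketched in Python as follows. This is a simplified, single-threaded illustration under stated assumptions, not the authors' Parallel-SWSA: byte bigrams are assumed as the set elements for the Jaccard coefficient, the alignment step is replaced by the longest common contiguous run from the standard library's `difflib`, and all function names and the sample payloads are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def jaccard(a: bytes, b: bytes, n: int = 2) -> float:
    """Jaccard similarity between the sets of byte n-grams of two payloads."""
    grams_a = {a[i:i + n] for i in range(len(a) - n + 1)}
    grams_b = {b[i:i + n] for i in range(len(b) - n + 1)}
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


def longest_common_run(a: bytes, b: bytes) -> bytes:
    """Longest contiguous byte run shared by both payloads
    (a simple stand-in for a full sequence alignment)."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size]


def window_features(payloads, window=3, min_len=4):
    """Slide a window over the payloads, pick the most similar pair in
    each window, and keep the byte run that the pair has in common."""
    features = []
    for start in range(len(payloads) - window + 1):
        chunk = payloads[start:start + window]
        best_pair = max(combinations(chunk, 2), key=lambda p: jaccard(*p))
        common = longest_common_run(*best_pair)
        if len(common) >= min_len and common not in features:
            features.append(common)
    return features


if __name__ == "__main__":
    sample = [b"\x01\x02HELLO-RAT\x09", b"zzHELLO-RATqq", b"random noise!"]
    print(window_features(sample))
```

Because each window is processed independently, the loop in `window_features` is the natural place to distribute work across processes, which is where a parallel variant would gain its speed-up.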
|
|
ThBT12 |
MR12 |
Medical Informatics |
|
Chair: Wang, Fanxin | Xi'an Jiaotong-Liverpool University |
|
11:00-11:20, Paper ThBT12.1 | |
Transformer Based Tissue Classification in Robotic Needle Biopsy |
|
Wang, Fanxin | Xi'an Jiaotong-Liverpool University |
Cheng, Yikun | UIUC |
Mukherjee, Sudipta | University of Illinois Urbana-Champaign |
Bhargava, Rohit | University of Illinois Urbana-Champaign |
Kesavadas, T. | Division of Research and Economic Development, University at Alb |
Keywords: Medical Informatics, Human-Machine Cooperation and Systems, Haptic Systems
Abstract: Image-guided minimally invasive robotic surgery is commonly employed for tasks such as needle biopsies or localized therapies. However, the nonlinear deformation of various tissue types presents difficulties for surgeons in achieving precise needle tip placement, particularly when relying on low-fidelity biopsy imaging systems. In this paper, we introduce a method to classify needle biopsy interventions and identify tissue types based on a comprehensive needle-tissue contact model that incorporates both position and force parameters. We trained a transformer model using a comprehensive dataset collected from a formerly developed robotics platform, which consists of synthetic and porcine tissue from various locations (liver, kidney, heart, belly, hock) marked with interaction phases (pre-puncture, puncture, post-puncture, neutral). This model achieves a significant classification accuracy of 0.93. Our demonstrated method can assist surgeons in identifying transitions to different tissues, aiding surgeons with tissue awareness.
|
|
11:20-11:50, Paper ThBT12.2 | |
A Multi-Scale Grid Attention Based Network for Blood Vessel Extraction from Retinal Fundus Images |
|
Wang, Xiaoyan | Zhejiang University of Technology |
Peng, Meifang | Zhejiang University of Technology |
Zheng, Yuanhao | Zhejiang University of Technology |
Yu, Jianhao | School of Computer Science, Zhejiang University of Technology |
Zhu, Yating | Zhejiang University of Technology |
Xia, Ming | Zhejiang University of Technology |
Keywords: Medical Informatics
Abstract: Accurate blood vessel extraction in medical images is a key step in the diagnosis of vascular-related diseases and can also provide important guiding information for surgery. However, blood vessels are characterized by elongated, tortuous and irregular shapes, making the task of vessel segmentation challenging. Multi-scale contextual features can help to better understand the structure and morphology of an image, and attention mechanisms play an essential role in the abstraction of contextual features. In this paper, we propose a multi-scale dual-branching blood vessel image segmentation method based on the grid attention mechanism. In the encoder, a dual combination of CNN and grid attention is used for multi-scale contextual information extraction, and then channel attention is utilized to fuse the extracted feature representations in a weighted manner, which adapts better to complicated vascular structures. In the decoder, each feature map of the encoder layer is up-sampled by sub-pixel convolution, and subsequently, the high-resolution feature maps obtained are concatenated to generate the final segmentation mask by further convolution operations. This design can fully utilize the multi-scale feature information extracted by the encoder while preserving the spatial structure and details of the image. We conducted experiments on two public retinal fundus image datasets and the results demonstrate that our approach outperforms CNN-based, attention-based and state-of-the-art CNN-Attention combined models.
|
|
11:50-12:00, Paper ThBT12.3 | |
Autoencoder-XGBoost Classifier (AeXGB) for Predicting Severity Level of Parkinson’s Disease from Spontaneous Speech |
|
Wai, Thiri | National Taiwan University |
Liao, Yu-Shan | National Taiwan University |
Liao, Ting-Yun | National Taiwan University |
Lin, Chin-Hsien | Department of Neurology, National Taiwan University Hospital Bei |
Hung, Chi-Sheng | National Taiwan University Hospital |
Fu, Li-Chen | National Taiwan University |
Keywords: Medical Informatics
Abstract: Parkinson’s disease (PD) is a movement disorder caused by the gradual decline of the nerve cells that control movement, and it is the most common such disease among the elderly in the US after Alzheimer’s disease. There are several studies on detecting Parkinson’s disease from speech using machine learning techniques; however, most of them focus on classifying healthy controls (HC) against patients with Parkinson’s disease (PD). This paper focuses on developing a screening system that can distinguish healthy (HC) vs. mild Parkinson’s (MP) vs. severe Parkinson’s (SP) from spontaneous speech. Four acoustic feature sets were compared for the screening system. Our proposed AeXGB was also compared with six different classification approaches. The results show that extracting the phonation features and using AeXGB achieves an accuracy of 92% for classifying HC vs. MP vs. SP, which outperforms the traditional machine learning approaches for three-class classification of Parkinson’s severity level from spontaneous speech.
|
|
12:00-12:20, Paper ThBT12.4 | |
Interactive Health Care Robot to Measure Vital Signs |
|
Saegusa, Ryo | Kanagawa Institute of Technology |
Ohno, Kensuke | Kanagawa Institute of Technology |
Okonogi, Rikuto | Kanagawa Institute of Technology |
Keywords: Medical Informatics, Human-Machine Interface, Human-Machine Interaction
Abstract: In order to maintain the health of aged people living in care facilities, care staff need to check the vital conditions of the residents frequently. For the facility staff, however, taking the vital signals of residents is a time-consuming task, and there is a strong expectation that modern robot technologies can support the operation. In this study, we propose an interactive health care robot that measures the vital signals of residents. In this interactive vital measurement, the health care robot gently approaches a resident and operates its arm to reach the resident’s hand. The robot then interactively requests the resident to grab the haptic vital handle. The novelty of our method is in the design of the human-robot interaction. On the surface of the haptic vital handle, vital sensors and pressure sensors are arrayed in order to detect physical gripping. The vital signals of blood oxygen saturation and heart rate are recorded only when the robot detects pressure patterns and is confident that the hand is gripping. This mechanism allows open-contact measurement without clipping the resident's fingers. We performed experiments to measure vital and pressure signals from healthy adults and evaluated the reliability of the measurement in different body postures, denoted as stance, on-chair, on-wheelchair, and on-bed.
|
|
12:40-13:00, Paper ThBT12.6 | |
EGLN: A Event Graph Learning Network for Patient Similarity with Chinese Electronic Medical Records (I) |
|
Zhu, Zhichao | Beijing University of Technology |
Li, Jianqiang | Beijing University of Technology |
Xu, Chun | Xinjiang University of Finance and Economics |
Zhao, Qing | Beijing University of Technology |
Keywords: Intelligence Interaction, Medical Informatics, Assistive Technology
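Abstract: Combined models of graph neural networks (GNNs) and recurrent neural networks (RNNs) are widely used for patient similarity computation. However, these studies mainly use medical concepts to organize patient graphs, while many concepts in electronic medical records (EMRs) are paratactic, and learning temporal information from concept sequences may introduce noise into the similarity computation. To address this problem, we propose an Event Graph Learning Network (EGLN) to learn patient similarity. Specifically, we first use a trained event extraction (EE) model to obtain event elements. Then, the paratactic concepts of each medical event are aggregated to construct event graphs for patients, avoiding the negative impact of temporal information that does not exist between paratactic concepts. Finally, the spatial-temporal semantic information between event nodes is aggregated for similarity computation. We evaluated EGLN on a real-world dataset, and the experimental results show that our proposed EGLN model ...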
|
|
12:40-13:00, Paper ThBT12.7 | |
Forecasting Slow-Wave Sleep Deficiency through Stress-Related Markers in Forehead EEG (I) |
|
Su, Cheng-Hua | National Yang Ming Chiao Tung University |
Ko, Li-Wei | National Chiao-Tung University |
Jung, Tzyy-Ping | University of California San Diego |
Onton, Julie | University of California San Diego |
Juang, Jia-Chi | Kaohsiung Medical University Hospital |
Hsu, Chung-Yao | Kaohsiung Medical University Hospital |
Keywords: Medical Informatics, Brain-Computer Interfaces, Wearable Computing
Abstract: Sleep quality is critical for human well-being. Lack of sleep and poor sleep quality impair daily cognitive functions and health. While stress has been recognized as a detrimental factor on sleep quality, the relationship between pre-sleep stress level, resting EEG and subsequent sleep structure remains to be explored. This study presents a novel approach that evaluates pre-sleep stress levels using a 2-channel EEG to predict slow-wave sleep (SWS) deficiency. We recorded forehead EEG immediately before sleep onset, then utilized power spectra and entropy analysis to extract stress-related neurological features, including beta/delta correlation, alpha asymmetry, fuzzy entropy (FuzzEn), and spectral entropy (SpEn). We found that individuals with SWS deficiency exhibited signs of stress, such as a robust beta/delta correlation, higher alpha asymmetry, and increased FuzzEn. Conversely, individuals with ample SWS displayed weak beta/delta correlation and reduced FuzzEn in EEG recordings. Finally, we tested the robustness of the selected neuro markers with two supervised learning models and found that the selected markers predict SWS deficiency with an accuracy above 70%. Our study demonstrated that stress-related neurological markers derived from pre-sleep EEG can effectively predict SWS deficiency. The proposed method can be integrated with a portable EEG device and sleep-improving interventions to develop a personalized sleep-improvement solution.
|
|
ThBT13 |
Room T13 |
IT - AI Applications, Deep Learning and Neural Networks |
|
Chair: Yoosaf, Salih | EQUIPO |
|
11:00-11:20, Paper ThBT13.1 | |
Development of Machine Learning Models to Predict Strip Breakage During the Aluminium Cold Rolling Process |
|
Ramos, Abreu, Tiago | University of Pernambuco |
Maciel, Alexandre | University of Pernambuco |
Keywords: Consumer and Industrial Applications, Manufacturing Automation and Systems, Decision Support Systems
Abstract: The aluminium processing industry relies on cold rolling as a crucial method for producing materials essential to everyday life. Strip breakage, one of the main degraders in the aluminium rolling process, arises as a consequence of thickness reduction. In recent related studies, data were mainly chosen by experts, which can sometimes hide crucial causal factors. Therefore, this work develops machine learning models to predict strip breakage by investigating the relationship between features and the breakage event, using attribute selection techniques to reduce the dimensionality of the problem, and comparing tree-based classification methods such as Decision Trees (DT), Random Forest (RF) and Extra Trees (ET). The methodology is as follows: first, the models were evaluated on the complete data set with default initial parameters; then the models were re-evaluated with optimized parameters, still at high data dimensionality; finally, the models were evaluated with the best-selected attributes. The model that performed best was the DT with 21 attributes, with a recall or TPR (True Positive Rate) of 0.863 and an AUC (area under the ROC curve) of 0.926. The models also indicated that no single cause characterizes the breaks in the aluminium cold mill; rather, several factors contribute, such as the effect of the rolling mill, coil history domain, oil mill and lamination cylinders.
|
|
11:20-11:50, Paper ThBT13.2 | |
Equipping Large Language Models with Memories: A GraphRAG Based Approach |
|
Li, Tie | University of Electronic Science and Technology of China |
Keywords: Human-Machine Interaction, Cognitive Computing, Intelligence Interaction
Abstract: Large language models (LLMs) have demonstrated strong capabilities in natural language understanding and generation, but they lack mechanisms to effectively store and retrieve information from past interactions. This problem hinders their potential for building truly conversational applications. To address this problem, we propose an approach to integrating memory into LLMs using GraphRAG, which is a framework that leverages Knowledge Graph and Retrieval Augmented Generation techniques for retrieving historical interactions. By representing the key knowledge contained in the dialogue history as a knowledge graph, we can capture complex relationships between entities and concepts mentioned in previous turns. We also introduce a mechanism for effectively accessing relevant nodes with the current query, allowing for more focused and efficient recall of past interactions. We evaluate our approach on benchmark datasets for question answering, text summarization, and dialogue systems, demonstrating significant improvements in performance compared to baseline LLMs. Our findings highlight the potential of GraphRAG as a powerful tool for equipping LLMs with robust memory capabilities, paving the way for more sophisticated and context-aware AI applications.
|
|
11:50-12:00, Paper ThBT13.3 | |
MediVision - Advanced Medical Imaging and Visualization through AR/VR |
|
Yoosaf, Salih | EQUIPO |
Ziyad, Naeema | MACE |
Keywords: Virtual and Augmented Reality Systems, Virtual/Augmented/Mixed Reality, Medical Informatics
Abstract: "MediVision - Advanced Medical Imaging and Visualization through AR/VR Integration" represents a pioneering leap in healthcare technology by seamlessly integrating augmented reality (AR) and virtual reality (VR) into medical imaging. This innovative approach transforms conventional 2D medical images into immersive 3D models, enhancing diagnostic precision and enabling in-depth anatomical exploration. By leveraging Unity 3D and advanced image processing algorithms, the project creates dynamic representations that allow medical professionals to interact with anatomical structures in unprecedented ways. The key innovation lies in MediVision's ability to convert DICOM and other medical image formats into interactive 3D models. This transformation not only enhances the visualization of complex anatomical features but also facilitates comprehensive understanding and analysis of medical conditions. Through intuitive gestures and interactive labeling within AR and VR environments, clinicians can manipulate and explore these models in real-time, improving diagnostic accuracy and treatment planning. Beyond clinical applications, MediVision supports medical education by providing realistic simulations of surgical procedures and disease progression. This educational component is crucial for training healthcare professionals and fostering continuous learning in the medical field. From a humanitarian perspective, MediVision contributes to sustainable development goals by advancing healthcare accessibility and quality. By enhancing medical imaging capabilities, it aims to reduce diagnostic errors, optimize treatment outcomes, and ultimately improve patient care globally. The project also promotes environmental sustainability by reducing the need for physical models and enabling remote consultations, thereby minimizing travel and associated carbon footprints. In conclusion, MediVision exemplifies how cutting-edge technology can transform healthcare delivery. By bridging the gap between traditional diagnostics and immersive visualization, it sets a new standard for medical imaging that prioritizes accuracy, accessibility, and educational advancement in the pursuit of better health outcomes worldwide.
|
|
12:00-12:20, Paper ThBT13.4 | |
MAFEN: Multiattention Feature Extraction Network for Remote Sensing Scene Classification |
|
Cao, Feng | Shanxi University |
Yang, Shuiyuan | Shanxi University |
Li, Deyu | Shanxi University |
Chongben, Tao | Suzhou University of Science and Technology |
Keywords: Deep Learning, Machine Learning, Neural Networks and their Applications
Abstract: The scene classification of high-resolution remote sensing images (HRSIs) provides crucial information for the high-level semantic understanding of the complicated earth’s surface. Some existing methods concatenate local features with global features to complete the classification task. However, simple feature concatenation cannot fully exploit the advantages of the above two features. This study proposes a novel multiattention feature extraction network (MAFEN) for the scene classification of HRSIs. First, a dual-branch network is proposed to extract local and global features from the scene images. Subsequently, an interattention fusion module is utilized to merge the local and global features. In addition, the multilevel feature fusion module helps avoid the loss of shallow and middle-layer features. Finally, an aggregated loss is employed to enhance the feature fusion of the dual-stream feature extractor by considering the contributions of different features. Experimental evaluations are conducted on three widely used public datasets for scene classification. Results confirm the potential of MAFEN for remote sensing scene classification tasks when compared with currently advanced methods in the field.
|
|
12:20-12:40, Paper ThBT13.5 | |
An Empirical Fractal Measure to Predict the Generalization of Deep Neural Networks |
|
Florindo, Joao Batista | University of Campinas |
Misturini, Davi | Universidade Estadual De Campinas |
Keywords: Machine Learning, Deep Learning, Neural Networks and their Applications
Abstract: The prediction of the generalization error in deep neural networks is a fundamental task with important implications, both in theoretical and practical terms. Recently, inspired by the intuitive conjecture that deep neural networks possess some low-dimensional embedding, which is mainly responsible for the learning progress, fractal dimension has been presented as a promising measure to describe the intrinsic dimension of this embedding and, as an immediate consequence, to provide a more reliable prediction of generalization. Nevertheless, the investigated approaches rely on the indirect modeling of the training dynamics and involve an analysis of the entire trajectory of the parameter space, which is prohibitively expensive in computational terms. Based on this, here we propose a straightforward and efficient numerical method to calculate the fractal dimension of the parameter distribution and use it as a predictor of the generalization error. The method takes inspiration from the variogram method to estimate the fractal dimension of a function profile and relies on the empirical power-law scaling between a small perturbation added to the trained parameters and the corresponding displacement in the loss function. The proposed measure is compared with other classical and state-of-the-art generalization measures in the literature using different hyperparameter configurations, and the results suggest its promising potential, achieving competitive performance in all the analyzed datasets. Our study also proposes an adapted loss function that accounts for the fractal dimension and a theoretical generalization bound.
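The power-law scaling mentioned above can be illustrated with a small sketch that fits the exponent of the relation between perturbation size and loss displacement by least squares in log-log space. This is not the authors' method: the toy loss functions, the function name `scaling_exponent`, and the choice of random unit directions are all hypothetical, and the mapping from the fitted exponent to a fractal dimension is left to the paper.

```python
import math
import random


def scaling_exponent(loss, params, epsilons, n_directions=20, seed=0):
    """Fit the slope of log(mean |loss displacement|) against log(perturbation size)."""
    rng = random.Random(seed)
    # Draw the unit directions once so every perturbation size reuses the same set.
    directions = []
    for _ in range(n_directions):
        d = [rng.gauss(0.0, 1.0) for _ in params]
        norm = math.sqrt(sum(v * v for v in d))
        directions.append([v / norm for v in d])

    base = loss(params)
    xs, ys = [], []
    for eps in epsilons:
        mean_disp = sum(
            abs(loss([p + eps * v for p, v in zip(params, d)]) - base)
            for d in directions
        ) / n_directions
        xs.append(math.log(eps))
        ys.append(math.log(mean_disp))

    # Ordinary least-squares slope of ys against xs.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )


# Hypothetical toy losses evaluated at their minimum (the origin).
smooth = lambda w: sum(v * v for v in w)          # smooth bowl
rough = lambda w: sum(abs(v) ** 0.5 for v in w)   # cusp-like, rougher profile
print(scaling_exponent(smooth, [0.0, 0.0, 0.0], [1e-3, 1e-2, 1e-1]))
print(scaling_exponent(rough, [0.0, 0.0, 0.0], [1e-3, 1e-2, 1e-1]))
```

In these toy cases the fitted exponent is smaller for the rougher loss profile, which is the kind of signal a variogram-style estimate relies on.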
|
|
ThCT1 |
MR01 |
Complex and Cooperative Systems 2 |
|
Chair: Tian, Yufeng | Chongqing University |
|
14:00-14:20, Paper ThCT1.1 | |
Optimizing Few-Shot Learning with Relational Synthesis and Data Augmentation Techniques |
|
Dong, HaoBo | Harbin University of Commerce |
Cai, Zi hao | Central South University |
Hou, Yao Hui | Harbin University of Commerce |
Wu, Xin | Harbin University of Commerce |
Jiang, Ji hong | Harbin University of Commerce |
Liu, Dewang | Harbin University of Commerce |
Liu, Kewen | Harbin University of Commerce |
Keywords: Deep Learning
Abstract: This study tackles significant challenges in few-shot learning (FSL), which is particularly important for machine learning applications in data-scarce environments. We address the issue of inconsistent data augmentation methods across studies, which complicates fair model performance comparisons, by implementing a standardized suite of automated data augmentation techniques aimed at enhancing model generalization and reducing computational demands. Furthermore, we introduce the Relational Synthesis Network (RSNet), featuring two innovative modules designed to improve the model's efficacy in sparse data settings. The first module, Auto-Associative Embedding (AAE), boosts feature representations by amplifying self-correlation among image features, significantly aiding in the recognition of patterns in minimally sampled data. The second module, Mutual Correlation Attention (MCA), employs advanced spatial attention mechanisms to selectively enhance the most informative features, improving the model's accuracy and adaptability to new categories. Comprehensive validations on benchmark datasets like CIFAR-FC and miniImageNet show that this integrated strategy not only elevates overall model performance but also markedly enhances its generalization capabilities to unseen categories, setting a new standard in FSL.
|
|
14:20-14:40, Paper ThCT1.2 | |
Event-Triggered Unified Performance State Estimation for Neural Networks with Time-Varying Delays (I) |
|
Tian, Yufeng | Chongqing University |
Su, Xiaojie | Chongqing University |
Shi, Peng | University of Adelaide, Adelaide |
Galambos, Peter | Obuda University |
Shen, Chao | Xi'an Jiaotong University |
Zhang, Linsong | Jianghuai Advance Technology Center |
Keywords: Complex Network, Neural Networks and their Applications
Abstract: This paper tackles the problem of event-triggered unified performance state estimation in neural networks with time-varying delays. A novel event-triggered methodology is introduced, aiming to balance the performance of the state estimator and the network’s communication bandwidth. The proposed method leverages a triggered-parameter-dependent integral inequality with matrices that consider the event-triggered mechanism, capturing the interplay between the time-varying delay and system states. This innovative approach guarantees the asymptotic stability of the estimation error system, thereby meeting the H∞ performance criterion. The efficacy of the proposed condition is demonstrated by a numerical example.
|
|
14:40-15:00, Paper ThCT1.3 | |
Boosting a Non-Negative Matrix Factorization-Based Community Detector Via Graph Convolution Regularization (I) |
|
Liu, Zhigang | Dongguan University of Technology |
Li, Weiling | Dongguan University of Technology |
Zhong, Yurong | Dongguan University of Technology |
Keywords: Complex Network, Representation Learning, Knowledge Acquisition
Abstract: Community detection sheds light on various graph mining tasks such as social recommendation, and it has become a long-standing issue in the realm of complex network analysis. Non-negative matrix factorization (NMF) is frequently used to tackle this task. However, it is built on the principle of linear representation and has difficulty capturing non-linear features from irregular non-Euclidean data. To address this issue, this study boosts an NMF-based community detector by combining it with a graph convolution module, and a novel Graph Convolution and Graph-Laplacian bi-regularized, Symmetric non-negative matrix factorization (GCGS) model is proposed relying on two main ideas: a) taking a graph convolution network (GCN) as a non-linear constraint module on the feature matrix to ensure its non-linearity; and b) adopting graph regularization to preserve the local geometric features of the network topology. A non-negative and multiplicative update (NMU) algorithm is then derived to solve the unified objective function. Extensive experimental results on six real networks indicate that GCGS achieves higher precision in community detection than its peers.
|
|
15:00-15:20, Paper ThCT1.4 | |
Reconstruction and Prediction of Spatio-Temporal Temperature and Salinity Fields in Controlled River Using Data-Driven Deep Fusion (I) |
|
Jia, Lei | University of Aizu |
Pei, Yan | University of Aizu |
Yen, Neil | University of Aizu |
Keywords: Deep Learning, Complex Network
Abstract: Accurately estimating the temperature and salinity structure of lakes or reservoirs is crucial for understanding terrestrial hydrological processes and pollutant transport pathways. However, key parameters for solute transport models in hydrodynamic systems are difficult to obtain directly and often require inversion simulations involving multi-source solute parameters. This study addresses the challenges of multi-source heterogeneous data assimilation and the optimization of training data distribution for alternative models. We develop a hybrid algorithm that solves data assimilation issues for multi-source heterogeneous data in reactive solute transport inversion simulations. By combining posterior inference of states with the prior distribution of parameters, we propose a novel coupled paradigm and explore the impact of identifying characteristic parameters across multiple scenarios and the performance advantages of surrogate models.
|
|
ThCT2 |
MR02 |
Evolutionary and Heuristic Computation 1 |
|
Chair: Qi, Xiaowen | University of Maryland |
|
14:00-14:20, Paper ThCT2.1 | |
An Adaptive Multi-Stage Evolution Algorithm for High-Dimensional Expensive Problems |
|
Zhang, Boyuan | Beihang University |
Lai, Rui | Beihang University |
Gong, Guanghong | Beihang University |
Yuan, Haitao | Beihang University |
Yang, Jinhong | CSSC Systems Engineering Research Institute |
Zhang, Jia | Southern Methodist University |
Keywords: Evolutionary Computation, Metaheuristic Algorithms, AI and Applications
Abstract: Recently, many studies have used evolutionary algorithms (EAs) to optimize complex problems across various fields, including mechanical structure design, robotics, and cloud computing. EAs simulate the process of evolution to improve solutions to a given problem iteratively. However, EAs encounter significant challenges when dealing with high-dimensional expensive problems (HEPs). The large solution space and high computing cost of fitness evaluations (FEs) make optimization with limited FEs particularly difficult. To tackle this problem, an Adaptive Multi-stage Evolution Algorithm named AMEA is proposed. In AMEA, an adaptively enhanced teaching-learning-based optimization algorithm is adopted to explore the search space and find potential areas quickly. Then, in the next stage, the Gaussian process surrogate model and a genetic learning particle swarm optimization algorithm are adopted for further exploitation. Besides, this work proposes an adaptive stage switching criterion and an individual screening mechanism to enhance the optimization ability. AMEA demonstrates strong optimization performance when applied to HEPs. We compare AMEA with several state-of-the-art HEP optimization algorithms through seven benchmark functions, and the results show that it performs competitively with other algorithms. Finally, we validate AMEA’s effectiveness with a real-world computation offloading problem.
|
|
14:20-14:40, Paper ThCT2.2 | |
Large-Scale Evolutionary Multiobjective Optimization: An Experimental Study |
|
Tseng, Tser-Ru | National Taiwan Normal University |
Chiang, Tsung-Che | National Taiwan Normal University |
Keywords: Evolutionary Computation, Metaheuristic Algorithms
Abstract: Evolutionary multiobjective optimization (EMO) has been a subject of intensive study in the past two decades, owing to its research challenges and practical value. With the progress and development of multiobjective evolutionary algorithms (MOEAs), recent research efforts have shifted to addressing large-scale EMO, which refers to applying evolutionary algorithms to solve multiobjective optimization problems with 100 or more decision variables. In this study, we delve into the design of eight large-scale MOEAs and evaluate their performance under different problem scales and computational resources. Based on the experimental results, we identify suitable algorithms in different scenarios. We also present observations and findings on the relationships between algorithm design concepts and performance.
|
|
14:40-15:00, Paper ThCT2.3 | |
On the Use of the Total Constraint Violation As an Additional Objective in Evolutionary Multi-Objective Optimization |
|
Nan, Yang | Southern University of Science and Technology |
Ishibuchi, Hisao | Southern University of Science and Technology |
Shu, Tianye | Southern University of Science and Technology |
Gong, Cheng | City University of Hong Kong |
Keywords: Evolutionary Computation
Abstract: In real-world applications, multi-objective optimization problems (MOPs) usually have multiple constraints. To solve constrained MOPs (CMOPs), various constraint handling techniques (CHTs) were proposed in the field of evolutionary multi-objective optimization (EMO). A simple CHT with high applicability is to use the total constraint violation as an additional objective. The total constraint violation-based CHT transforms a constrained (m-1)-objective MOP into an unconstrained m-objective MOP. This CHT was also used to create a real-world unconstrained multi-objective test suite called RE from real-world constrained problems. Recently, the RE test suite has been frequently used for evaluating EMO algorithms. Solutions are feasible in the original constrained MOP only when the additional objective value is zero (i.e., only when the total constraint violation is zero). This means that feasible solutions of the original constrained MOP are located on the boundary of the Pareto front of the formulated unconstrained MOP. As a result, the final population of an EMO algorithm on the formulated unconstrained MOP includes many infeasible solutions of the original constrained MOP. This means that good solution sets for the unconstrained MOP are not always good solution sets for the original constrained MOP. In this paper, we propose an improved total constraint violation-based CHT. The core idea is to use not only positive constraint violations but also negative constraint violations. We apply the proposed CHT to real-world constrained MOPs. Experimental results show that the proposed modification improves the quality of feasible solutions obtained by the total constraint violation-based CHT.
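The baseline transformation described above (not the improved CHT, whose details are not given in the abstract) can be sketched as follows. The sketch assumes the common convention that a constraint is written as g(x) <= 0, so only positive values of g count as violation; the toy problem and all function names are hypothetical.

```python
def total_constraint_violation(x, constraints):
    """Sum of positive parts of g_j(x), assuming each constraint reads g_j(x) <= 0."""
    return sum(max(0.0, g(x)) for g in constraints)


def augment_objectives(x, objectives, constraints):
    """Map a constrained (m-1)-objective point to an unconstrained m-objective vector."""
    values = [f(x) for f in objectives]
    values.append(total_constraint_violation(x, constraints))
    return values


# Hypothetical toy problem: minimize x0 and x1 subject to x0 + x1 >= 1.
objectives = [lambda x: x[0], lambda x: x[1]]
constraints = [lambda x: 1.0 - x[0] - x[1]]  # g(x) <= 0  <=>  x0 + x1 >= 1

print(augment_objectives([0.25, 0.25], objectives, constraints))  # infeasible point
print(augment_objectives([0.75, 0.5], objectives, constraints))   # feasible point
```

Only vectors whose last component is zero correspond to feasible solutions of the original problem, which is why a well-spread solution set for the augmented problem can still contain many infeasible points.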
|
|
15:00-15:20, Paper ThCT2.4 | |
Balancing Fairness and Accuracy for Predictive Models in Criminal Justice Applications Using Multi-Objective Optimization Methods |
|
Qi, Xiaowen | University of Maryland |
Ma, Yujunrong | University of Maryland |
Nakamura, Kiminori | University of Maryland |
Bhattacharyya, Shuvra | University of Maryland, College Park |
Keywords: Heuristic Algorithms, Evolutionary Computation, Application of Artificial Intelligence
Abstract: In the field of predictive modeling for criminal justice applications, the dual challenges of ensuring fairness and maintaining interpretability are crucial. This paper addresses these challenges by introducing a new approach to optimizing decision trees using evolutionary algorithms (EAs). Our approach focuses on refining decision trees to achieve a balance between accuracy and algorithmic fairness, a task complicated by potential bias present in historical data. By leveraging the principles of multi-objective optimization, our model systematically trades off prediction accuracy and fairness. The evolutionary process, characterized by selection, crossover, and mutation, is tailored to fit the decision tree structure, ensuring that model development is not only accurate but also promotes measurable fairness. Experimental results demonstrate the effectiveness of our approach in providing interpretable and fair predictive models that can be considered for high-stakes applications in criminal justice. More broadly, this research makes a significant contribution to the field of explainable machine learning, providing a powerful framework for engineering systems that are transparent, fair, and adaptable to different data environments.
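The accuracy and fairness trade-off described above rests on Pareto dominance between candidate models. The following minimal sketch filters a set of candidates down to its non-dominated subset; the two objective values per candidate (prediction error and an unfairness score, both minimized) and the numbers themselves are hypothetical and are not results from the paper.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (error, unfairness) scores for four candidate decision trees.
candidates = [(0.10, 0.30), (0.15, 0.10), (0.20, 0.20), (0.12, 0.30)]
print(pareto_front(candidates))
```

Each remaining point is a different compromise between the two objectives; choosing among them is a policy decision rather than an optimization step.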
|
|
15:20-15:40, Paper ThCT2.5 | |
A Brain Tumor Segmentation Approach with Adaptive Threshold Optimization Numerical Spiking Neural P Systems |
|
Dong, Jianping | Chengdu University of Information Technology |
Zhang, Gexiang | Chengdu University of Information Technology |
Rong, Haina | Southwest Jiaotong University |
Fortino, Giancarlo | University of Calabria |
Chen, Min | Huazhong University of Science and Technology |
Keywords: AI and Applications, Evolutionary Computation, Metaheuristic Algorithms
Abstract: High-resolution magnetic resonance imaging (MRI) is widely used in computer-aided diagnostic technology to provide doctors with diagnostic advice, especially in brain tumor segmentation. In addition, MRI multi-sequence images of brain tumors also provide better image data support for studying brain tumor segmentation. In this paper, an adaptive threshold segmentation numerical optimization spiking neural P system (ATONSNPS or ATONSN P system) is designed to dynamically adjust the threshold quantity. In addition, the ATONSN P system and connectivity algorithm are combined to finish multi-sequence brain tumor segmentation. Experimental results on BraTS2019 show that the multi-sequence brain tumor segmentation approach can achieve more effective segmentation of brain tumor images compared with several benchmark algorithms.
|
|
15:40-16:00, Paper ThCT2.6 | |
Exploring the Potential of Discrete Chaotic Evolution Algorithm for Combinatorial Optimization (I) |
|
Ding, Yi | University of Aizu |
Meng, Xiang | University of Aizu |
Pei, Yan | University of Aizu |
Li, Jianqiang | Beijing University of Technology |
Keywords: Evolutionary Computation
Abstract: We propose an extension of the chaotic evolution algorithm into the discrete domain to address combinatorial optimization problems. In this study, we leverage the discrete chaotic evolution algorithm to tackle the Traveling Salesman Problem (TSP) for assessment purposes. The chaotic evolution algorithm exploits the ergodicity of chaos to facilitate the search process within the optimization algorithm. It incorporates a mathematical mechanism into the iterative evolution process, simulating ergodic motion within a search space based on a simple principle. To manage the discrete mutation operation within the chaotic evolution algorithm, we introduce a specifically designed chaotic operation. This operation is tailored for its application in solving combinatorial optimization problems. The chaotic sequence plays a crucial role in determining the mutation location. Our evaluation involves the comparison of our proposed discrete chaotic evolution algorithm with the outcomes of the simulated annealing algorithm and the tabu search algorithm. The assessment serves to demonstrate and validate that the discrete chaotic evolution algorithm yields superior optimization performance within the discrete domain.
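To make the idea of a chaotic sequence choosing the mutation location concrete, here is a minimal sketch, not the authors' operator. It assumes the logistic map as the chaotic source, a segment reversal (2-opt style) as the discrete mutation, and greedy acceptance of shorter tours; every name, the seed value `x0`, and the city coordinates are hypothetical.

```python
import math


def logistic_map(x):
    """One step of the logistic map with r = 4, a standard chaotic sequence on (0, 1)."""
    return 4.0 * x * (1.0 - x)


def tour_length(tour, coords):
    """Length of the closed tour visiting coords in the given order."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def chaotic_search(coords, iterations=2000, x0=0.3141):
    """Greedy search in which chaotic values pick the two ends of a reversed segment."""
    n = len(coords)
    tour = list(range(n))
    best = tour_length(tour, coords)
    x = x0
    for _ in range(iterations):
        x = logistic_map(x)
        i = min(int(x * n), n - 1)  # first mutation location from the chaotic value
        x = logistic_map(x)
        j = min(int(x * n), n - 1)  # second mutation location
        if i == j:
            continue
        i, j = min(i, j), max(i, j)
        candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
        length = tour_length(candidate, coords)
        if length < best:  # keep the mutation only if the tour gets shorter
            tour, best = candidate, length
    return tour, best


# Hypothetical instance: eight cities listed in a deliberately poor visiting order.
cities = [(0, 0), (3, 3), (0, 3), (3, 0), (1, 2), (2, 1), (1, 1), (2, 2)]
print(chaotic_search(cities))
```

Because only improving mutations are accepted, the returned tour is never longer than the initial one; comparisons against simulated annealing or tabu search, as in the abstract, would need a full experimental setup.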
|
|
ThCT5 |
MR05 |
Discrete Event and Distributed Systems 1 |
|
Chair: Kawakami, Hiroshi | Kyoto University of Advanced Science |
|
14:00-14:20, Paper ThCT5.1 | |
GN: Guided Noise Eliminating Backdoors in Federated Learning |
|
Huang, Siquan | South China University of Technology |
Gao, Ying | South China University of Technology |
Chen, Chong | South China University of Technology |
Shi, Leyu | South China University of Technology |
Keywords: Distributed Intelligent Systems
Abstract: Federated learning (FL) trains a model collaboratively but is susceptible to backdoor attacks because of its privacy-preserving nature. Existing defenses against backdoor attacks in FL always make specific assumptions on data distributions among clients and are ineffective against sophisticated attacks. Although adding noise mitigates backdoors injected in the model, it also degrades performance on the main task. To address the aforementioned issues, we propose a novel defense mechanism, Guided Noise (GN), that eliminates backdoors without compromising the model's main performance. GN achieves this by utilizing conductance to evaluate the importance of neurons and subsequently adding guided noise to suspected backdoor neurons selected by voting, which only disturbs the backdoor task. Extensive experimental evaluations of GN show its significant superiority over traditional noising-based defenses, making it a valuable replacement for existing noising to enhance the robustness of existing defenses against backdoor attacks in FL.
|
|
14:20-14:40, Paper ThCT5.2 | |
Designing Antifragile Physical Systems |
|
Kawakami, Hiroshi | Kyoto University of Advanced Science |
Mori, Kazuyuki | Mitsubishi Electric Corporation |
Iwawaki, Tomoyuki | Mitsubishi Electric |
Keywords: System Architecture, Distributed Intelligent Systems, Quality and Reliability Engineering
Abstract: This paper examines how to make physical systems antifragile. Regarding "fragile" as the property of going from zero to negative easily, "robust" as that of not going negative easily, and "resilient" as that of going from negative to zero again, "antifragile" can be regarded as that of going from negative to positive through zero. Antifragility can usually be observed in nature and is now being examined in artificial systems. Among such examinations, this paper allows such systems to include physical modules and aims to "synthesize" new systems rather than "analyze" existing systems. Starting from organizing definitions and examples of antifragility, we analyzed and organized the terms used to describe antifragility in previous studies. Based on the analysis, this paper shows strategies to make physical systems antifragile.
|
|
14:40-15:00, Paper ThCT5.3 | |
Hierarchical Policy Optimization for Cooperative Multi-Agent Reinforcement Learning |
|
He, ShunFan | ZheJiang University |
Zheng, Ronghao | Zhejiang University |
Zhang, Senlin | Zhejiang University |
Liu, Meiqin | Xi'an Jiaotong University |
Keywords: Cooperative Systems and Control, Distributed Intelligent Systems, Large-Scale System of Systems
Abstract: To handle the non-stationarity of the environment and the curse of dimensionality in multi-agent reinforcement learning, gathering information through communication is critical. Existing frameworks have proposed centralized or distributed structures to deal with the problem. However, they suffer from either poor robustness or high communication costs. This paper adopts a hierarchical zeroth-order policy optimization (HZOPO) algorithm for cooperative multi-agent reinforcement learning (MARL) problems. The agents are divided into different groups with high- and low-level entities. A hierarchical communication structure is implemented to reach global consensus. It is shown that the HZOPO algorithm can balance both convergence and communication efficiency in cooperative MARL environments. The convergence of the algorithm is also proved.
|
|
15:00-15:20, Paper ThCT5.4 | |
Expedited Block Transmission in Blockchain Network by Using Clusters |
|
Hua, Xiaoqi | Nanjing University of Information Science and Technology |
Chen, Jian | Nanjing University of Information Science and Technology |
Zhang, Peiyun | Nanjing University of Information Science and Technology |
Fu, Zhangjie | Nanjing University of Information Science and Technology |
Zhu, Haibin | Nipissing University |
Lu, Kezhong | Chizhou University |
Ren, Jigang | Nanjing University of Information Science and Technology |
Keywords: Distributed Intelligent Systems
Abstract: Blockchain technology has garnered increasing attention from researchers. Because blockchain systems may contain malicious or spatially limited nodes that may delay block verification and reduce block transmission rate, this work proposes a block transmission model by designing and using special clusters. This work proposes the cluster formation and selection mechanisms. Nodes are grouped into clusters in a blockchain, and clusters with high fitness values are chosen to transmit blocks by calculating their trust values and block transmission rates. The proposed method is compared with the peers: Layer-Chain, BlockP2P-EP and RNS. According to experimental findings, the proposed method is superior to its peers regarding the time needed for block synchronization and transmission, block occupation storage ratio, transaction throughput, and block transmission success ratio.
|
|
15:20-15:40, Paper ThCT5.5 | |
Privacy Preservation Distributed Tracking Consensus with Secure Event-Based Control for Nonlinear Multiagent System |
|
He, Yejie | University of Electronic Science and Technology of China |
Chen, Yong | University of Electronic Science and Technology of China |
Keywords: Distributed Intelligent Systems, Discrete Event Systems, System Modeling and Control
Abstract: In this paper, the distributed tracking consensus control for a nonlinear multiagent system with the event trigger mechanism and quantization is investigated and the privacy preservation for each agent is researched. Firstly, a nonlinear multiagent system with quantized states and control signals is considered. The stability of the system influenced by quantization and triggering mechanism with time-varying thresholds is proved. Moreover, to protect the privacy concerning the states, control signals, and trigger thresholds of agents, a privacy-preserving diagram is proposed based on additive homomorphic encryption and a secure comparison algorithm. The security of the proposed diagram and its impact on stability are discussed. Finally, a comparison experiment demonstrates that the proposed privacy preservation control strategy achieves better tracking accuracy, reduces the communication burden further, and guarantees the privacy of the system.
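The additive homomorphic property mentioned above can be illustrated with a toy Paillier cryptosystem. The abstract does not name the encryption scheme, so the choice of Paillier is an assumption, and the tiny primes below are for illustration only and offer no security. The sketch shows that multiplying two ciphertexts yields an encryption of the sum of the plaintexts, which is what lets agents combine values without revealing them.

```python
import math
import random


def keygen(p, q):
    """Toy Paillier key generation with generator g = n + 1 (p, q are small primes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; needs Python 3.8+
    return n, (lam, mu)


def encrypt(n, m, rng):
    """Encrypt m in [0, n) as (n + 1)^m * r^n mod n^2 with a random r coprime to n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    """Recover m as L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    return (c1 * c2) % (n * n)


rng = random.Random(1)
n, private_key = keygen(1009, 1013)  # hypothetical toy primes, far too small for real use
c_a, c_b = encrypt(n, 123, rng), encrypt(n, 456, rng)
print(decrypt(n, private_key, add_encrypted(n, c_a, c_b)))
```

A secure comparison protocol, which the abstract also mentions, would be built on top of such a primitive and is not sketched here.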
|
|
15:40-16:00, Paper ThCT5.6 | |
DecTest: A Decentralised Testing Architecture for Improving Data Accuracy of Blockchain Oracle |
|
Zeng, Xueying | Guangxi Normal University |
Xian, Youquan | Guangxi Normal University |
Li, Chunpei | Guangxi Normal University |
Hu, Zhengdong | Guangxi Normal University |
Zhou, Aoxiang | Guangxi Normal University |
Liu, Peng | Guangxi Normal University |
Keywords: Trust in Autonomous Systems, Distributed Intelligent Systems, Quality and Reliability Engineering
Abstract: Blockchain technology ensures secure and trustworthy data flow between multiple participants on the chain, but the interoperability of on-chain and off-chain data has long been a difficult problem. Oracles were introduced because blockchain systems cannot access off-chain data directly. However, existing research focuses mainly on the consistency and integrity of data and ignores the possibility that oracle nodes may be attacked externally or provide false data for selfish motives, which leaves the problem of data accuracy unresolved. In this paper, we introduce a new Decentralized Testing architecture (DecTest) that aims to improve data accuracy. A blockchain oracle random secret testing mechanism is first proposed to enhance the monitoring and verification of nodes by introducing a dynamic anonymized question-verification committee. Based on this, a comprehensive evaluation incentive mechanism is designed to incentivize honest work performance by evaluating nodes based on their reputation scores. The simulation results show that we successfully reduced the discrete entropy value of the acquired data and the real value of the data by 61.4%.
|
|
15:40-16:00, Paper ThCT5.7 | |
Temporal Pattern-Aware QoS Prediction with Privacy-Preserving Via Federated Learning Based on Latent Factorization of Tensors (I) |
|
Zhong, Shuai | Southwest University |
Tang, Zetong | Southwest University |
Wu, Di | Southwest University |
Keywords: Distributed Intelligent Systems
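Abstract: In applications related to big data and service computing, dynamic connections are frequently encountered, especially the dynamic data of user-perspective quality of service (QoS) in Web services. These data are transformed into high-dimensional and incomplete (HDI) tensors, which contain rich temporal pattern information. Latent factorization of tensors (LFT) is a highly effective and typical approach to extracting such patterns from an HDI tensor. However, current LFT models require the QoS data to be maintained in a central place (e.g., a central server), which is impossible for increasingly privacy-sensitive users. To address this issue, this paper creatively designs a federated learning approach based on latent factorization of tensors (FL-LFT). It builds a data-density-oriented federated learning model that enables isolated users to collaboratively train a global LFT model while protecting user privacy. Extensive experiments on a QoS dataset collected from the real world verify that, compared with state-of-the-art ...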
|
|
ThCT6 |
MR06 |
Manufacturing Automation and Systems |
|
Chair: Chen, Ting You | Graduate Degree Program of Robotics, National Yang Ming Chiao Tung University |
|
14:00-14:20, Paper ThCT6.1 | |
Learning-Based Adaptive Spatiotemporal Modeling of Industrial Distributed Processes |
|
Wang, Tianyue | City University of Hong Kong |
Li, Han-Xiong | City University of Hong Kong |
Keywords: Manufacturing Automation and Systems, Modeling of Autonomous Systems, Intelligent Green Production Systems
Abstract: This paper proposes a learning-enabled approach for adaptive spatiotemporal modeling of industrial distributed processes. Within the framework of Karhunen-Loève (KL) separation, the spatial basis functions (SBF) are updated online in a forgetting learning mode to capture spatial dynamics. Then, the temporal model is also updated iteratively in a forgetting mode to adjust temporal dynamics. Finally, the predicted spatiotemporal state is obtained via time/space synthesis. This dual forgetting mechanism embedded in the model can adaptively track spatiotemporal dynamic changes, thus achieving better modeling performance. Experimental validation on distributed thermal processes in battery operation demonstrates the modeling efficacy of the designed learning approach.
|
|
14:20-14:40, Paper ThCT6.2 | |
Production-Logistics Collaborative Scheduling in Dynamic Flexible Job Shops Via Multi-Objective Deep Reinforcement Learning |
|
Shi, Jiaxuan | Tongji University |
Qiao, Fei | Tongji University |
Ma, Yumin | Tongji University |
Liu, Juan | Tongji University |
Wang, Junkai | Tongji University |
Keywords: System Modeling and Control, Manufacturing Automation and Systems
Abstract: Production scheduling and logistics scheduling are vital means for organizing manufacturing activities in flexible job shops. Given the intricate coupling between them, their collaborative scheduling is both urgently needed and challenging. Meanwhile, the actual manufacturing process is inevitably affected by disturbances, necessitating the consideration of dynamic environments. To this end, this study investigates a new production-logistics collaborative scheduling problem in dynamic flexible job shops (PLCSP-DFJS). The high-frequency disturbance of new job arrivals is incorporated into the PLCSP-DFJS, and two objectives, namely makespan and total logistics cost, are optimized. A multi-objective deep reinforcement learning (MODRL) method is presented to solve PLCSP-DFJS. In MODRL, a weight-decomposition and neighborhood-inheritance training mechanism is devised to obtain the near-optimal Pareto front, and a dual-level channel-driven framework capable of achieving decentralized decision-making of production and logistics is designed. The performance of MODRL is verified through experiments conducted in an aviation component production shop.
|
|
14:40-15:00, Paper ThCT6.3 | |
LumChromNet: A Dual-Path Neural Network Framework for Optical Measurement in VR Systems |
|
Leng, Zhaoqing | University of Science and Technology of China |
Zhao, Zhengang | University of Science and Technology of China |
Yu, Yanwei | University of Science and Technology of China |
Zhou, Xiaoyu | University of Science and Technology of China |
Dong, Xu | University of Science and Technology of China |
Zhang, Yican | University of Science and Technology of China |
Keywords: Manufacturing Automation and Systems, Adaptive Systems, Technology Assessment
Abstract: Traditional optical instruments for VR glasses are costly, bulky, and complex, while current automated solutions lack sufficient measurement range and accuracy. To optimize VR glasses' optical characteristics, we introduce LumChromNet, an innovative dual-path neural network framework for automated production solutions. LumChromNet comprises two key components: First, for luminance calibration, it utilizes a stacked neural network combining Gaussian Process Regression (GPR) and Multilayer Perceptron (MLP) to capture luminance features under low illumination and to process nonlinear characteristics across different light levels. By integrating the multiscale hybrid features of GPR with the perceptual capabilities of MLP and fusing through Random Forest, it achieves high precision calibration and a wide range under limited data samples. Second, for chromaticity calibration, it employs the Adaptive Multidimensional Color Space Transformation Network (AMCS-TransNet), expanding the dimensions of the color space and incorporating a linear transformation layer with a dynamic learning matrix to enhance the stability of color space conversion. This ensures accurate conversion from RGB to CIE XYZ space, adaptable to white points across different dimensions. LumChromNet extends the luminance grayscale range to 0-255, with a fitting error below 1.87%, and enhances color space R² to 0.998 with an average error within 1.5%. It expands the luminance measurement range to 0-250cd/m² and maintains optical property errors under ±5%, providing a cost-effective, flexible, and automated strategy for VR glasses and other devices.
|
|
15:00-15:20, Paper ThCT6.4 | |
Channel-Time Attention Based Patch-Attribute Alignment for Zero-Shot Fault Diagnosis |
|
Zuo, Liangqing | Tongji University |
Wang, Han | Tongji University |
Zhang, Xiaohan | Tongji University |
Liu, Qing | Tongji University |
Xu, Gaowei | Tongji University |
Liu, Min | Tongji University |
Keywords: Fault Monitoring and Diagnosis, Manufacturing Automation and Systems, Quality and Reliability Engineering
Abstract: Recently, zero-shot fault diagnosis has gradually attracted the attention of researchers because some types of data cannot be obtained in advance in practical production scenarios. However, existing zero-shot methods hardly utilize multi-channel data information or learn the potential correlation between signal patches and attributes for unseen fault classes, resulting in unsatisfactory diagnostic accuracy. To address these issues, we propose a patch-attribute alignment method based on channel-time attention for zero-shot fault diagnosis. First, a feature extraction module with channel attention is introduced to obtain the multi-channel information of signal patches. Then the feature map and the corresponding attribute vector are processed interactively by an embedding-reconstruction structure. Finally, a patch-attribute alignment module with a time-attention CNN is utilized to predict the attribute vector and consequently obtain the corresponding fault diagnosis label. Extensive experiments show that the proposed method outperforms baseline zero-shot fault diagnosis methods, and ablation experiments demonstrate the effectiveness of each module.
|
|
15:20-15:40, Paper ThCT6.5 | |
Should a Platform Provide Per-Use Rental Services? |
|
Chen, Jinyi | Northwestern Polytechnical University |
Liu, Chenguang | Northwestern Polytechnical University |
Dong, Yongjie | Northwestern Polytechnical University |
Keywords: Consumer and Industrial Applications, Service Systems and Organizations, Decision Support Systems
Abstract: Driven by the growing societal emphasis on sustainable consumption, platforms are integrating per-use rental channels alongside their traditional standalone resale channels. Motivated by this business practice, this study aims to answer the question: should a platform operating a retail channel introduce a per-use rental channel? To this end, two analytical models are developed: one in which the platform only resells the manufacturer's products (the S model) and another in which the platform incorporates a per-use rental channel alongside the resale channel (the SR model). We study the platform's choice of channel structure between the S and SR models. Some important findings are obtained. First, the platform does not always benefit from adding a per-use rental channel; it benefits only if the utility that consumers gain from each usage is high. Second, the platform's retail price increases with the manufacturer's wholesale price, but the rental price is not affected. Last, parameters such as the psychological cost, utility per usage, and maximum usage frequency of customers exert different influences on manufacturer and platform pricing and on the profits of both entities.
|
|
15:40-16:00, Paper ThCT6.6 | |
A Modular and Coordinated Multi-Agent Framework for Flexible Job-Shop Scheduling Problems with Various Constraints |
|
Zhou, Qi | Harbin Institute of Technology (ShenZhen) |
Cheng, Zhengtao | Harbin Institute of Technology (ShenZhen) |
Wang, Hongpeng | Harbin Institute of Technology (ShenZhen) |
Keywords: Manufacturing Automation and Systems, System Modeling and Control, System Architecture
Abstract: The Flexible Job-shop Scheduling Problem (FJSP) is essential in today's industrial manufacturing, as it can greatly enhance efficiency in production via real-time data processing. In FJSP, it is important to consider various constraints due to the complexity of real-world production environments. Traditional Meta-heuristic methods encounter challenges in accommodating intricate problem constraints and suffer from high computational complexity, while rule-based methods perform poorly. Single-agent deep reinforcement learning frameworks are not equipped to handle complex real-world production issues. On the other hand, although existing multi-agent deep reinforcement learning frameworks designed for FJSP can solve FJSP with various constraints through minor adjustments, they often lack coordination among agents, leading to inefficient performance and unstable training. In this paper, we design a modular and coordinated multi-agent Deep Reinforcement Learning (DRL) framework that can solve FJSP problems with various constraints by adding agents tailored to specific constraints. We introduce a novel multi-agent coordinated proximal policy optimization algorithm (MACPPO), which promotes cooperation among agents by achieving dynamic credit allocation. We conducted experiments on multiple-sized instances under various constraints including equipment calendars and transportation, to verify the superiority of our framework. Experimental results show the effectiveness of the proposed novel method in addressing both the original FJSP problem and the FJSP problem with transportation and equipment calendar constraints. This efficiency is achieved compared to other well-known scheduling approaches, demonstrating the flexibility and efficiency of the architecture we proposed under diverse constraints.
|
|
15:40-16:00, Paper ThCT6.7 | |
Development of a Mixed-Reality Dual-User Training System for Robot Laparoscopic Surgery (I) |
|
Chen, Ting You | Graduate Degree Program of Robotics, National Yang Ming Chiao Tu |
Young, Kuu-Young | National Yang Ming Chiao Tung University |
Keywords: Robotic Systems, Cyber-physical systems, Cooperative Systems and Control
Abstract: As robot surgery becomes more popular, the demand for training systems grows accordingly. Currently, most systems are intended for individual training, although training may be more effective if an experienced expert is also involved. This motivates us to propose a dual-user robotic surgical training system that closely links the mentor and the trainee. The system is developed in a mixed-reality environment that integrates information from both the virtual and real worlds. Haptic feedback and virtual fixtures are employed for assistance during skill transfer. The proposed system can assess the proficiency level of the trainee so that the mentor can provide proper assistance during training. The experimental results demonstrate that the proposed system is capable of enhancing the trainee’s ability for surgery. In particular, comparative studies show effectiveness similar to that of the direct hands-on guidance frequently adopted for skill transfer in sport training.
|
|
ThCT7 |
MR07 |
Online - AI Applications 9 |
|
Chair: Hu, Shasha | Sichuan Normal University |
|
14:00-14:20, Paper ThCT7.1 | |
Semantic Relation-Based Cross Attention Network for Image-Text Retrieval |
|
Zhou, Huanxiao | Qilu University of Technology (Shandong Academy of Sciences) |
Geng, Yushui | Qilu University of Technology (Shandong Academy of Sciences) |
Zhao, Jing | Qilu University of Technology (Shandong Academy of Sciences) |
Ma, Xishan | Qilu University of Technology (Shandong Academy of Sciences) |
Keywords: Machine Learning, Deep Learning, Neural Networks and their Applications
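Abstract: Image-text retrieval refers to querying text or images of the other modality given an image or a text, and its key lies in the ability to accurately measure the similarity between text and images. Most existing retrieval methods use only the intra-modal relationships of each modality or the inter-modal relationships between the two modalities to perform the retrieval task. To address this problem, this paper integrates intra-modal and inter-modal relationships and proposes a Semantic Relation-Based Cross Attention Network (SRCAN). In the proposed method, we first mine the possible intra-modal associations between regions in an image and between words in a text to obtain features with semantic relations, and then capture the fine-grained associations between fragments through a cross-attention mechanism. Finally, the intra-modal and inter-modal relationships are balanced to improve the performance of the model. Experiments were conducted on two datasets, Flickr30K and MS-COCO, and the results show that the method achieves excellent ...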
|
|
14:20-14:40, Paper ThCT7.2 | |
Residual Diffusion Model for Joint Stochastic Trajectory Prediction in Roadside Surveillance Environments |
|
Li, Haoxuan | University of Chinese Academy of Sciences |
He, Wei | Shanghai Institute of Microsystem and Information Technology |
Wang, Tao | Shanghai Institute of Microsystem and Information Technology |
Wang, Nan | ShanghaiTech University |
|
|
14:40-15:00, Paper ThCT7.3 | |
Disease Detection Module for SBCE Images Using Modified YOLOv8 |
|
Ye, Shiren | Changzhou University |
Zhang, Shuo | Changzhou University |
Meng, Qi | Changzhou University |
Wang, Hui | Changzhou University |
Zhu, Jiaqun | Changzhou University |
Keywords: Neural Networks and their Applications, Application of Artificial Intelligence, Image Processing and Pattern Recognition
Abstract: Small Bowel Capsule Endoscopy (SBCE) technology provides an intuitive and efficient means for detecting intestinal lesions. However, the generated videos and images, which can last for several hours, require a considerable amount of time for physicians to review. Utilizing deep learning and computer vision to automate lesion detection in SBCE is an effective approach to replace human experts. We propose a YOLO-SBCE model, based on the YOLOv8 framework, to address the specific characteristics of intestinal images, such as abnormal lesion morphology and fuzzy boundaries. The model introduces Channel-Spatial Omni-Dimensional Dynamic Convolution (CS-ODConv) to learn complementary attention along multiple dimensions in the kernel space, enabling the model to focus more on crucial features of the input. Additionally, an Alpha-CIoU loss function is employed to flexibly control the bounding box regression accuracy by adjusting the α value. To enhance the model's generalization capability and detection performance, the model incorporates a Multi-Head Context Aggregation (MHCA) network module. Finally, a lightweight Ghost module is adopted to reduce parameters and computational complexity, thereby improving inference speed. Experimental results demonstrate that YOLO-SBCE outperforms existing classical visual models on the SEE-AI and PASCAL VOC datasets, showcasing its superior performance.
|
|
15:00-15:20, Paper ThCT7.4 | |
Few-Shot Classification Based on Feature Enhancement Network |
|
Hu, Shasha | Sichuan Normal University |
Su, Han | Sichuan Normal University |
Gao, Ruixuan | Sichuan Normal University |
Keywords: Representation Learning, Deep Learning, Image Processing and Pattern Recognition
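Abstract: Few-shot image classification is a critical task in the field of computer vision. However, obtaining accurate class prototypes from limited annotated samples is a challenging problem. In recent years, many methods based on prototypical networks have shown excellent performance. However, existing methods overlook discriminative features because of sample scarcity, as well as the hidden category information in the query set, and fail to solve the problem of unreliable prototypes generated from limited annotated samples. In this paper, we propose a feature enhancement network for few-shot classification. To improve the accuracy and robustness of the few-shot classification model, we first enhance the support set by learning a weight matrix, and then align the enhanced support-set prototypes using textual semantics. To avoid being affected by the introduced prior noise, we fuse the semantically aligned prototype with the mean prototype, and finally use the query prototypes to perform ...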
|
|
15:20-15:40, Paper ThCT7.5 | |
User Engagement Correlates Better with Behavioral Than Physiological Measures in a Virtual Reality Robotic Rehabilitation System |
|
Zhang, Yawen | Peng Cheng Laboratory |
Wang, Haofei | Peng Cheng Laboratory |
Shi, Bertram E. | Hong Kong University of Science and Technology |
Keywords: Human-Computer Interaction, Virtual and Augmented Reality Systems, Assistive Technology
Abstract: Robotic systems to assist with movement rehabilitation are transitioning from providing fixed pre-programmed assistance towards adaptive challenge-oriented strategies that present patients with tasks that are demanding yet achievable. This promotes active engagement, which is crucial for stimulating neural plasticity and promoting recovery. While it has been well established that varying the challenge level can affect user engagement, measuring engagement during task performance has received less attention. To investigate this issue, we developed a virtual reality (VR) robotic system for upper limb rehabilitation using a line-tracing task that measures physiological and behavioral signals. Challenge level can be modulated by introducing force noise disturbance. We conducted a preliminary study on 12 participants, measuring user engagement and physiological/behavioral signals at different noise (challenge) levels. Our findings align with the predictions of flow channel theory. Engagement peaks at an intermediate challenge level. While past work considered only physiological measures, our results reveal that behavioral measures are better correlated with user engagement. Physiological measures correlate better with arousal. This work takes a step toward systems that dynamically adapt task parameters to optimize user engagement.
|
|
15:40-16:00, Paper ThCT7.6 | |
A Lightweight Convolutional Transformer Architecture Approach for Crack Segmentation in Safety Assessment |
|
Usman, Muhammad | Shanghai Jiao Tong University |
Chen, Haopeng | Shanghai Jiao Tong University |
Raza, Muhammad | Shanghai Jiao Tong University |
Keywords: Decision Support Systems, Discrete Event Systems, Distributed Intelligent Systems
Abstract: Crack segmentation is a pivotal task in assessing structural integrity across diverse domains, ranging from civil infrastructure such as bridges and buildings to the fabrication of heavy vehicles, and is crucial for ensuring the longevity and safety of materials. Despite its critical importance, the domain remains relatively underexplored in academia, particularly with respect to crack segmentation on resource-constrained devices. This challenge arises from the inherent demand for deeper and broader network structures to achieve optimal performance, which results in heavier computational and storage overhead. Thus, deploying crack segmentation models on practical platforms poses a formidable challenge. This paper presents a novel, lightweight hybrid framework that combines a robust Attention UNet architecture, which assimilates comprehensive contextual information, with the efficient MobileViT block, which extracts and integrates global contextual information using the self-attention mechanism. Extensive experimental results illustrate that the proposed method outperforms existing state-of-the-art methods on public benchmark datasets despite employing a reduced parameter space. The dataset and code can be accessed at https://github.com/REINS-SJTU/TransAUnet
|
|
15:40-16:00, Paper ThCT7.7 | |
Multi-Objective Optimization and Discrete Action Set Based Rolling Optimal Lookahead Schedule for Coordinating Electric Vehicles |
|
Lei, Zhipeng | Xiangtan University |
Xiao, Chixin | Xiangtan University |
Jiang, Dechen | Xiangtan University |
Ao, Zhi | Xiangtan University |
Keywords: Cooperative Systems and Control, Intelligent Power Grid, Decision Support Systems
Abstract: To address the arbitrage revenue of electric vehicles (EVs) and the mitigation of grid load fluctuations, this paper proposes a rolling optimal lookahead schedule that coordinates EV charging (or discharging) by using a multi-objective optimization (MOO) algorithm equipped with EV aggregation based on a discrete action (or behavior) set and an associated price-driven strategy. The resulting mechanism has three parts. First, for more judicious trade-off decisions between the maximum EV arbitrage revenue and the minimum grid load fluctuation, a typical MOO algorithm, the multi-objective evolutionary algorithm based on decomposition (MOEA/D), is adopted for evaluation over the entire schedule period. Second, to reduce excessive computational and communication costs, a finite discrete set of charging and discharging behaviors is proposed. Third, to organize these EV behaviors, a price-driven auction algorithm is developed to respond to time-of-use (TOU) tariffs in each time slice. The simulation section uses data from the state of the art to validate the proposed approach, and the results show that its performance is promising.
|
|
ThCT8 |
MR08 |
Online - Artificial Intelligence and Social Systems |
|
Chair: Jalote-Parmar, Ashis | Norwegian University of Science and Technology, NTNU |
|
14:00-14:20, Paper ThCT8.1 | |
MRCAN: Multi-Scale Region Correlation-Driven Adaptive Normalization for Image Harmonization |
|
Duan, Luwen | Zhejiang University |
Min, Wu | Zhejiang Dahua Technology Co., Ltd |
Lou, Hongliang | Zhejiang Provincial Public Security Department |
Yin, Jun | Zhejiang Dahua Technology Co., Ltd |
Li, Xi | Zhejiang University |
Keywords: Image Processing and Pattern Recognition, Machine Vision, Machine Learning
Abstract: Image composition serves as a vital data augmentation technique commonly utilized for intelligent model training. In order to facilitate the optimization of image composition and improve the authenticity and efficacy of the composite images, this paper delves into the composite image harmonization technique to adjust the appearance of the foreground to be harmonious with the background. Current methods usually overlook the interrelation between the foreground and background content, typically transferring the style directly from the whole background to the foreground. In addition, conventional normalization methods were prone to a degradation in image quality of the foreground region after harmonization. To address these issues, we propose an innovative Adaptive Normalization method, which modulates the mean and standard deviation of foreground region specifically and pointedly, to harmonize the appearance while simultaneously preserving the texture structures of original foreground by considering the local information from the deviation maps. Moreover, we incorporate a Multi-scale Region Correlation-driven strategy to explore the correlation between the foreground and the background content, enabling the foreground object to fit the background semantics more seamlessly. Both ablation and comparison experiments on the iHarmony4 dataset demonstrate the effectiveness of our proposed method as well as the superiority of our model over other state-of-the-art image harmonization methods.
|
|
14:20-14:40, Paper ThCT8.2 | |
BCAMLI: A Bidirectional Cross-Attention Mechanism-Based Prediction Model for Plant miRNA-lncRNA Interaction |
|
Ruan, Tiantian | Wuhan University of Technology |
Peng, Jing | Wuhan University of Technology |
Keywords: Biometric Systems and Bioinformatics, Deep Learning, Computational Life Science
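Abstract: The prediction of plant miRNA-lncRNA interactions is essential for understanding the mechanisms of miRNA-lncRNA interactions and further exploring their genomic functions. Currently, many computational techniques have been used to predict plant miRNA-lncRNA interactions. However, most existing methods focus only on miRNA and lncRNA feature extraction and neglect fusion learning of miRNA-lncRNA interactions, which may lead to insufficient feature representation. In this paper, we propose a bidirectional cross-attention mechanism-based prediction model for plant miRNA-lncRNA interaction (BCAMLI). We use a pre-trained model (Doc2vec) to embed RNA, which can fully combine the contextual information of k-mer words with the RNA sequence, enabling us to capture sequence features with more semantic information. We develop a novel bidirectional cross-attention mechanism to enhance interaction feature learning between miRNA and lncRNA, allowing us to better understand the underlying biological mechanisms and improve MLI prediction. Finally, on the benchmark ...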
|
|
14:40-15:00, Paper ThCT8.3 | |
Dual-Branch Complementary Multimodal Interaction Mechanism for Joint Video Moment Retrieval and Highlight Detection |
|
Liang, Xuehui | Sun Yat-Sen University |
Wang, Ruomei | Sun Yat-Sen University |
Lin, Ge | Sun Yat-Sen University |
Feng, Jiawei | Sun Yat-Sen University |
Luo, Yuanmao | Sun Yat-Sen University |
Keywords: Application of Artificial Intelligence, Deep Learning, Machine Learning
Abstract: Joint video moment retrieval and highlight detection is a video understanding task that requires the model to construct multimodal interaction between heterogeneous features. Recent Transformer-based models mainly focus on promoting global interaction between features. However, local interaction and temporal asynchronism modeling are not deeply considered. To solve this problem, this paper proposes a dual-branch complementary multimodal interaction mechanism (DCMI), which consists of a global difference feature activation module (GDFA) and a local information dynamic aggregation module (LIDA). GDFA measures the difference between the target element and the global features, thus activating important information. LIDA designs a multimodal heterogeneous graph and constructs asynchronous interaction between heterogeneous features to dynamically aggregate local information. DCMI adaptively fuses the complementary dual branches to improve the model's cognitive and decision-making abilities of global and local information. Comprehensive comparisons with existing methods on public datasets verify the superiority of the proposed model. Extensive ablation experiments and qualitative analysis show the effectiveness and rationality of DCMI, which can promote the interaction between multimodal features.
|
|
15:00-15:20, Paper ThCT8.4 | |
DHCT-DT: Dual-Branch Hybrid CNN-Transformer Combined with Dual-Teacher Semi-Supervised Network for Kidney Ultrasound Image Segmentation |
|
Luo, Yang | Wuhan University of Science and Technology |
Liu, Jun | Wuhan University of Science and Technology |
Ding, Mengqian | Wuhan University of Science and Technology |
Keywords: Deep Learning, Image Processing and Pattern Recognition
Abstract: Accurate segmentation of kidney ultrasound (KUS) images is of great significance for the diagnosis of kidney diseases. However, KUS images pose challenges such as low contrast, large size variations, and acoustic shadow interference. In addition, obtaining labeled data is a major challenge in the field of medical image segmentation. This paper proposes a dual-branch hybrid CNN-Transformer combined with a dual-teacher semi-supervised segmentation network (DHCT-DT) based on the MT model. The reliability of guidance can be improved between two teachers by constructing consistency regularization constraints. The U-Net architecture is employed for the main teacher and student models. The auxiliary teacher model adopts a dual-branch hybrid CNN-Transformer architecture. The multi-scale channel fusion attention mechanism (MSCF) is introduced to capture the feature correlation information of KUS images at different scales and channels. Two different training methods can combine their advantages to fully extract local and global information from images. To demonstrate the applicability of the proposed method, experiments were conducted on two datasets: a KUS dataset and a breast ultrasound (BUS) dataset. The method achieved the most competitive segmentation performance on both datasets. For example, in a semi-supervised experiment using only 10% labeled data in the BUS dataset, the Dice coefficient, Jaccard coefficient, HD, and ASSD reached 82.28%, 76.36%, 13.36 mm, and 3.71 mm, respectively.
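For readers unfamiliar with the overlap metrics reported in this abstract, the following minimal sketch computes the Dice and Jaccard coefficients for one pair of binary masks. It uses the standard textbook definitions; the function name, the flat-list mask format, and the empty-mask convention are assumptions for illustration and are not taken from the paper.

```python
def dice_and_jaccard(pred, truth):
    """Return (Dice, Jaccard) for two equal-length sequences of 0/1 labels."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, t in zip(pred, truth) if p and t)  # |A and B|
    pred_size = sum(1 for p in pred if p)                    # |A|
    truth_size = sum(1 for t in truth if t)                  # |B|
    union = pred_size + truth_size - inter                   # |A or B|
    if union == 0:
        # Both masks are empty; treating this as a perfect match is a convention.
        return 1.0, 1.0
    dice = 2 * inter / (pred_size + truth_size)
    jaccard = inter / union
    return dice, jaccard

# Flattened 2x3 masks: predicted foreground vs. annotated foreground.
pred = [1, 1, 1, 0, 0, 0]
truth = [0, 1, 1, 1, 0, 0]
print(dice_and_jaccard(pred, truth))
```

For a single mask pair the two scores are tied by Dice = 2J / (1 + J); averages over a dataset, such as those quoted above, need not satisfy that identity exactly.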
|
|
15:20-15:40, Paper ThCT8.5 | |
Learning Circuit Complexity of Boolean Functions |
|
Loscos Barroso, Daniel | Universidad Complutense De Madrid |
Martí-Oliet, Narciso | Universidad Complutense De Madrid |
Rodríguez, Ismael | Universidad Complutense De Madrid |
Villarrubia, Jorge | Universidad Complutense De Madrid |
Keywords: Neural Networks and their Applications, Machine Learning, Image Processing and Pattern Recognition
Abstract: Computational Complexity has grappled for decades with understanding which Boolean functions can be computed with small circuits. This knowledge would unlock new ways to approach paramount open problems in Computer Science, such as P vs NP. We present an empirical approach to the problem in which we classify 5-to-1-bit Boolean functions. We built a dataset of functions with small encoding circuits and tested different classifiers to isolate them from the vast complementary set of functions with big circuits. Simple multilayer perceptrons achieved 97% and higher accuracies. We introduce r-weights, a heuristic on neuron weights, to explain how and why this approach was so successful, and we present the theoretical conclusions we extracted from them to face the circuit complexity problem.
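To give a sense of the search space mentioned in this abstract, the sketch below encodes a 5-to-1-bit Boolean function as a 32-bit truth table and counts how many such functions exist. The integer encoding and the helper names are assumptions for illustration; the abstract does not describe how the authors represent functions in their dataset.

```python
N_INPUTS = 5
N_ROWS = 2 ** N_INPUTS      # 32 input combinations
N_FUNCTIONS = 2 ** N_ROWS   # one output bit per row: 2**32 distinct functions

def table_of(func):
    """Pack a 5-argument 0/1 function into an integer truth table."""
    table = 0
    for index in range(N_ROWS):
        # Bit i of the row index is the value of input i.
        bits = tuple((index >> i) & 1 for i in range(N_INPUTS))
        if func(*bits):
            table |= 1 << index
    return table

def evaluate(table, bits):
    """Look up the output of a packed truth table for the given input bits."""
    index = sum(b << i for i, b in enumerate(bits))
    return (table >> index) & 1

parity = table_of(lambda a, b, c, d, e: a ^ b ^ c ^ d ^ e)
conjunction = table_of(lambda a, b, c, d, e: a & b & c & d & e)

print(N_FUNCTIONS)                         # 4294967296
print(evaluate(parity, (1, 0, 0, 0, 0)))   # 1
```

A 32-bit vector like this is one natural input format for a classifier such as the multilayer perceptrons the abstract mentions, though that pairing is only a guess.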
|
|
15:40-16:00, Paper ThCT8.6 | |
Adaptive Anytime Multi-Agent Path Finding with Density and Delay in Large Neighborhood Search |
|
Wu, ZiXian | Sun Yat-Sen University |
Chen, Zitong | Sun Yat-Sen University |
Keywords: Swarm Intelligence, Agent-Based Modeling, AI and Applications
Abstract: The Multi-Agent Path Finding (MAPF) problem is to find a set of collision-free paths for multiple agents within a shared environment while minimizing the total time spent. In MAPF, finding an optimal solution is NP-hard, and in some cases finding suboptimal solutions is also NP-hard, with severe limitations imposed by the scale of the problem. Anytime MAPF is a promising method for path optimization in large-scale multi-agent systems. Large Neighborhood Search (LNS) is the foundation of state-of-the-art anytime MAPF and performs well even on large-scale instances. It first finds a solution quickly and then iteratively replans paths for subsets of agents, selected by destroy heuristics, to gradually improve the quality of the solution until it approaches optimality. In this paper, we propose a subset selection algorithm based on maximum delay and density, which improves upon the subset selection rule of the existing state-of-the-art solver, MAPF-LNS. We conducted tens of thousands of experiments on multiple maps of the MAPF benchmark set with larger numbers of agents. Our experimental results indicate that our solver, MAPF-DDLNS, significantly outperforms MAPF-LNS in both the quality of the final solution and the speed of resolution.
|
|
15:40-16:00, Paper ThCT8.7 | |
Development of an Innovative User Centered Design Driven mHealth App for Female Athletes “The Coral App” * |
|
Jalote-Parmar, Ashis | Norwegian University of Science and Technology, NTNU |
Topranin, Virginia | Norwegian University of Science and Technology (NTNU) |
Taylor, Madison | Norwegian University of Science and Technology (NTNU) |
Singh Parmar, Vikram | Norwegian University of Science and Technology (NTNU) |
Sandbakk, Oyvind | Norwegian University of Science and Technology (NTNU) |
Keywords: Design Methods, Human-Computer Interaction, User Interface Design
Abstract: Decision support tools aimed at female athletes and coaches, which could aid in enhancing understanding of the relationship between the Menstrual Cycle (MC) and its impact on training and performance, are noticeably absent in the literature. This absence may be attributed to the global under-discussion of the MC topic, which has often been perceived as taboo within the sports community. This article delineates a User-Centered Design (UCD) guided research and innovation process for an mHealth App, namely the "Coral App." This personalized decision support tool is tailored for female athletes and coaches to augment their comprehension of the MC and its impact on training and performance. The fully operational Coral App was meticulously designed and field-tested through collaboration with a multidisciplinary team comprising expert coaches, athletes, sport and health researchers, and designers. The design process of the Coral App comprised three stages: (i) User studies involving focus group and semi-structured interviews with coaches and female athletes, (ii) Iterative design of the Coral App with experts, and (iii) Longitudinal field testing of the fully functional Coral App with 26 recreational athletes and one coach over the course of a year. The System Usability Scale (SUS) and User Experience Questionnaire (UEQ) were employed to gauge user feedback on usability and experience. The study results concluded that the perceived ease of use and usefulness are indicators of higher user acceptance intention of the Coral App among female athletes. This is particularly significant for establishing tracking, communication, and education regarding the MC and comprehending its impact on performance. This article contributes to the much-needed literature on female-focused innovation of decision support-related mHealth services within the sports domain. Furthermore, it illustrates the value of UCD in furnishing a framework for collaboration with relevant stakeholders to develop research-driven innovative solutions, especially concerning the MC, a topic often shrouded in stigma and deemed taboo internationally.
|
|
15:40-16:00, Paper ThCT8.8 | |
MSFSAN: A Novel Multi-Scale Spatio-Temporal Feature Screening Attention Network for Urban Carbon Emission Prediction |
|
Ben, Wang | Xinjiang University |
Qin, Xizhong | Xinjiang University |
Qin, Jiwei | Xinjiang University |
Zhang, Xiaoyu | Xinjiang University |
Ma, Haodong | Xinjiang University |
|
|
15:40-16:00, Paper ThCT8.9 | |
Data-Driven Parameter Estimation for Autonomous Underwater Vehicle Depth Subsystem |
|
Mishra, Sourav | Indian Institute of Science, Bangalore |
Makam, Rajini | Indian Institute of Science |
Sundaram, Suresh | Indian Institute of Science |
Keywords: Autonomous Vehicle, Modeling of Autonomous Systems, System Architecture
Abstract: Autonomous Underwater Vehicles (AUVs) play a key role in modern marine exploration. The effective deployment of AUVs relies on accurately determining their dynamics, which can be modeled using system identification approaches. Data-driven system identification of AUV subsystems is an open research area. In this paper, we present a data-driven approach for system identification of the depth subsystem of AUVs using Physics-Informed Neural Networks (PINNs) and Sparse Identification of Non-Linear Dynamics (SINDy). The AUV depth subsystem is excited with two different input profiles and the resulting states are used for system identification. The robustness of coefficient estimation using PINNs and SINDy is also assessed by adding Gaussian noise to the simulated states. Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) quantify the closeness between the actual states and those obtained from the system using the estimated coefficient values. In addition, the percentage error in the identified coefficient values is reported. The results highlight that PINNs perform best with multi-step inputs, whereas SINDy excels with sine inputs. For multi-step input, PINN yields MSE and RMSE values of the order of 10^{-2} for states, whereas SINDy achieves values of the order of 10^{-3} for sinusoidal input.
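The sparse-regression step behind SINDy, as named in the abstract above, can be sketched with sequentially thresholded least squares. This is a minimal illustration under stated assumptions: the scalar system `dx/dt = -2x + 3u`, the candidate library `[1, x, u, x^2]`, the threshold, and the function name `stlsq` are all hypothetical and are not the AUV depth model or the authors' implementation.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, iterations=10):
    """Sequentially thresholded least squares (the usual SINDy regression).

    theta: (samples, candidates) library of candidate terms.
    dxdt:  (samples,) measured state derivative.
    Returns one coefficient per candidate term, with small ones set to zero.
    """
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(iterations):
        small = np.abs(xi) < threshold
        xi[small] = 0.0          # prune weak candidate terms
        big = ~small
        if not big.any():
            break
        # Refit using only the surviving terms.
        xi[big] = np.linalg.lstsq(theta[:, big], dxdt, rcond=None)[0]
    return xi

# Hypothetical toy system (assumption): dx/dt = -2*x + 3*u, noise-free.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 50)
u = rng.uniform(-1.0, 1.0, 50)
dxdt = -2.0 * x + 3.0 * u

# Candidate library: constant, x, u, x^2.
theta = np.column_stack([np.ones_like(x), x, u, x**2])
coeffs = stlsq(theta, dxdt)
```

On this noise-free toy data the regression keeps only the `x` and `u` terms; with noisy states, as studied in the paper, the threshold would need tuning.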
|
|
15:40-16:00, Paper ThCT8.10 | |
Integrating Agent Transparency Adaptation Mechanisms into Human Machine Cooperation (I) |
|
Simon, Loïck | Université Bretagne Sud, Lab-Sticc, Fhoox (UMR 6285) |
Guérin, Clément | Lab-STICC, Université Bretagne Sud |
Pacaux-Lemoine, Marie-Pierre | Lamih - Cnrs Umr 8201 |
Rauffet, Philippe | Université Bretagne Sud |
Keywords: Human-Machine Cooperation and Systems, Human-Machine Interaction, Cooperative Work in Design
Abstract: This paper proposes to model the dynamic management of dialogue in human-machine cooperation. It seeks to articulate a generic model of human-machine cooperation, based on two complementary abilities of agents (the Know-How-to-Operate and the Know-How-to-Cooperate), with two prominent models of agent transparency (the SAT and HRT models). Modeling is applied to a specific Human-Machine Cooperation, a cyber-physical system assisting a human in planning maintenance operations. Dialogue management mechanisms filter the information displayed on the interface, from the agent's analysis of the supervised process to the team's performance and cooperation activities. This dynamic dialogue management contributes to the calibration of trust and optimisation of cooperation.
|
|
ThCT9 |
MR09 |
Deep Learning and Neural Networks 11 |
Regular Papers - Cybernetics |
Chair: Perween, Tarannum | TCS Research |
|
14:00-14:20, Paper ThCT9.1 | |
Top-K Pooling with Patch Contrastive Learning for Weakly-Supervised Semantic Segmentation |
|
Wu, Wangyu | Xi’an Jiaotong-Liverpool University |
Dai, Tianhong | University of Aberdeen |
Huang, Xiaowei | University of Liverpool |
Fei, Ma | Xi’an Jiaotong-Liverpool University |
Jimin, Xiao | Xi'an Jiaotong-Liverpool University |
Keywords: Deep Learning, Machine Learning
Abstract: Weakly Supervised Semantic Segmentation (WSSS) using only image-level labels has gained significant attention due to its cost-effectiveness. Recently, Vision Transformer (ViT) based methods without class activation map (CAM) have shown greater capability in generating reliable pseudo labels than previous methods using CAM. However, the current ViT-based methods utilize max pooling to select the patch with the highest prediction score to map the patch-level classification to the image-level one, which may affect the quality of pseudo labels due to the inaccurate classification of the patches. In this paper, we introduce a novel ViT-based WSSS method named top-K pooling with patch contrastive learning (TKP-PCL), which employs a top-K pooling layer to alleviate the limitations of previous max pooling selection. A patch contrastive error (PCE) is also proposed to enhance the patch embeddings to further improve the final results. The experimental results show that our approach is very efficient and outperforms other state-of-the-art WSSS methods on the PASCAL VOC 2012 and MS COCO 2014 datasets.
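The contrast between max pooling and top-K pooling described in the abstract above can be illustrated with a small sketch. It assumes, which the abstract does not state, that top-K pooling averages the K highest patch scores per class; the function names and toy scores are hypothetical.

```python
import numpy as np

def max_pool_scores(patch_scores):
    """Image-level class score = highest patch score per class."""
    return patch_scores.max(axis=0)

def top_k_pool_scores(patch_scores, k):
    """Image-level class score = mean of the k highest patch scores per class.

    Assumption: the top-k patches are aggregated by averaging.
    """
    k = min(k, patch_scores.shape[0])
    # Sort each class column in ascending order and keep the last k rows.
    top_k = np.sort(patch_scores, axis=0)[-k:]
    return top_k.mean(axis=0)

# Toy example: 4 patches (rows) scored for 2 classes (columns).
scores = np.array([
    [0.9, 0.1],
    [0.2, 0.3],
    [0.1, 0.2],
    [0.4, 0.6],
])
image_max = max_pool_scores(scores)          # driven by a single patch per class
image_top2 = top_k_pool_scores(scores, k=2)  # averages the two best patches
```

With K = 2, a single over-confident patch (0.9 for the first class) no longer determines the image-level score on its own, which is the kind of effect the abstract attributes to replacing max pooling.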
|
|
14:20-14:40, Paper ThCT9.2 | |
DiffusionVTON: An Image-Based Virtual Try-On Framework Using Denoising Diffusion Models |
|
Liang, Xiubo | Zhejiang University |
Zhi, Xin | Zhejiang University |
Wei, Mingyuan | Zhejiang University |
Wang, Hongzhi | Zhejiang University |
Li, Mengjian | Zhejiang Lab |
Keywords: Deep Learning, Machine Vision, Multimedia Computation
Abstract: The utilization of virtual try-on has gained popularity in the fashion and e-commerce industries as it enables customers to try on clothing virtually before making online purchases. However, existing virtual try-on techniques encounter difficulties in handling complex poses and distortions, which often result in visible misalignments or defects. To overcome these challenges, we propose DiffusionVTON, a virtual try-on framework that employs denoising diffusion models and an Enhanced Garment Guide decoder. Our approach relies on pose keypoints, target models, and clothing images, reducing additional input requirements and mitigating the effects of potentially inaccurate intermediate predictions. The Enhanced Garment Guide decoder enhances the virtual try-on results by incorporating additional garment information into each layer of the decoder, improving image quality and preserving clothing details. Experimental results on VITON and MPV datasets demonstrate that our approach outperforms existing methods in terms of image quality and fidelity, providing users with realistic and accurate virtual try-on experiences.
|
|
14:40-15:00, Paper ThCT9.3 | |
Resolving Spurious Temporal Location Dependency for Video Corpus Moment Retrieval |
|
Yishuo, Zhang | Beijing University of Posts and Telecommunications |
Lanshan, Zhang | Beijing University of Posts and Telecommunications |
Zhizhen, Zhang | Tsinghua University |
Ziyi, Wang | Beijing University of Posts and Telecommunications |
Xie, Xiaohui | Tsinghua University |
Wendong, Wang | Beijing University of Posts and Telecommunications |
Keywords: Multimedia Computation, Neural Networks and their Applications, Deep Learning
Abstract: Video Corpus Moment Retrieval aims to retrieve the relevant video from a large corpus and localize the corresponding moment within the target video based on a specific query. Existing methods have achieved promising accuracy and efficiency through dedicated model designs. However, we suggest these methods overly exploit dataset biases instead of semantics. This leads to exaggerated performance on biased datasets and implies a significant deficiency in generalizability, an important metric not yet considered in existing studies. In this paper, we observe the degradation caused by a spurious dependency and design a model to mitigate this harm. Specifically, we generate an Out-Of-Distribution (OOD) test set from a widely used TV Retrieval dataset, revealing the existing models' erroneous dependency on the temporal locations of target moments. We then utilize a theoretical Structural Causal Model (SCM) to dig into the roots of this dependency by constructing causal paths for the models. Furthermore, we propose a concrete Clip Location Deconfounding Model (CLDM) to disentangle the confounded video features into the content part and the location confounder part, then produce results with causal intervention. Experiments show that CLDM significantly alleviates the impact of dataset biases, thus providing better generalizability than existing works.
|
|
15:00-15:20, Paper ThCT9.4 | |
AAFM-Net: An Ensemble CNN with Auxiliary Attention Filtering Module for Intrusion Detection |
|
Sun, Yunpeng | Nanjing University of Science and Technology |
Zhang, Shuangquan | School of Cyber Science and Engineering |
Lian, Zhichao | Nanjing University of Science and Technology |
Keywords: Neural Networks and their Applications, Cloud, IoT, and Robotics Integration, Deep Learning
Abstract: In recent years, electric vehicles (EVs) have developed greatly and have begun to gradually replace traditional cars. With this development, the potential security threats faced by electric vehicle network systems are also increasing. To cope with these threats in the EV network, in this paper we propose an auxiliary attention filtering module (AAFM) that cooperates with an ensemble convolutional neural network (CNN). AAFM combines the attention of features from different dimensions, filters irrelevant features, and compensates for the output of the model. The latest CICEV2023 dataset is used for training and evaluation. Experiments demonstrate that AAFM can effectively improve model performance and deal with attacks in extreme situations compared to 8 classic and effective intrusion detection models. AAFM-Net achieves over 90% accuracy.
|
|
15:20-15:40, Paper ThCT9.5 | |
Class Incremental Learning Via Feature Knowledge Prompts |
|
Ba, Zhibiao | Tongji University |
Ma, Jun | Tongji University |
Fan, Chaoyu | Tongji University |
Shi, Lihua | Tongji University |
Kang, Qi | Tongji University |
Keywords: Deep Learning, Machine Learning, Image Processing and Pattern Recognition
Abstract: Incremental learning uses previous knowledge and experience to deal with new problems. Traditional class incremental learning methods usually need to ensure the current model's sensitivity to old tasks by storing, retrieving and using a large amount of historical data, which suffers from difficulties such as high storage costs and high dependence on data availability. This work proposes a class incremental learning method based on feature knowledge prompts to help models better select and utilize old knowledge elements to assist model training. The design contains prompt matching sample feature maps related to specific fields. A feature splitting method is used to isolate the common and special features of a sample to indicate the potential feature space direction of the sample to be tested. The method has been verified to have outstanding performance on multiple data sets and multiple tasks.
|
|
15:40-16:00, Paper ThCT9.6 | |
Spatial-Temporal Traffic Forecasting Based on Bottom-Up Representation Learning |
|
Chi, Pengnan | KTH Royal Institute of Technology |
Ma, Xiaoliang | KTH Royal Inst of Tech |
Keywords: Representation Learning, Neural Networks and their Applications, Image Processing and Pattern Recognition
Abstract: To implement proactive traffic management, traffic forecasting becomes an essential function of modern intelligent transport systems (ITS). Traffic flows on motorways exhibit substantial variability, making it necessary to capture high-frequency patterns in the spatiotemporal model. To address these challenges, a representation learning approach is leveraged in this paper to extract high-level features that facilitate traffic forecasting on motorways. A bottom-up learning structure is proposed to sequentially extract information from the local to the global level. Computational experiments show that simple models with informative representation may achieve satisfactory performance for traffic prediction.
|
|
15:40-16:00, Paper ThCT9.7 | |
Inferring Reward Functions from State Transitions: A Deep Learning Approach |
|
Perween, Tarannum | TCS Research |
Roy, Shuvra Neel | TCS Research |
Sadhu, Arup Kumar | Tata Consultancy Services |
Dasgupta, Ranjan | TCS Research |
Keywords: Neural Networks and their Applications, Application of Artificial Intelligence, Computational Intelligence
Abstract: In the field of reinforcement learning, reward functions are essential for training agents aiming to execute desired actions. However, designing reward functions manually is difficult, time-consuming, and can lead to inaccuracies. This research introduces a novel framework called the Scale Invariant Reward Framework (SIRF) for creating dense rewards using a neural network (NN) classifier. SIRF employs neural networks to mix complex state functions and generate dense bounded reward signals. SIRF's effectiveness is evaluated using the DDPG (Deep Deterministic Policy Gradient) algorithm in an OpenAI Gym environment, a popular open-source toolkit for reinforcement learning research. The results demonstrate that SIRF achieves comparable performance to existing reward functions from the OpenAI Gym environment in terms of key metrics such as average reward, network loss, velocity, and energy consumption for continuous environment agents. This suggests that SIRF can effectively guide the agent's learning process while maintaining consistency in the reward scale, simplifying the overall reinforcement learning setup.
|
|
ThCT10 |
MR10 |
Optimization and Self-Organization |
|
Chair: Ji, Jing-Yu | Lingnan University |
|
14:00-14:20, Paper ThCT10.1 | |
Optimization and Research on Army Vehicle Deployment in Emergency Situations |
|
Haorao, He | Nanjing University of Information Science and Technology |
Zhang, Xiaoxiong | National University of Defense Technology |
Yang, Jun | National University of Defense Technology |
Zhou, Xiaolei | National University of Defense Technology |
Yan, Hao | National University of Defense Technology |
Keywords: Optimization and Self-Organization Approaches, Evolutionary Computation, Application of Artificial Intelligence
Abstract: In the event of disasters, emergencies, or special circumstances, the military needs to deploy vehicle resources to meet the needs of emergency response and rescue work. This problem involves the reasonable allocation and utilization of limited vehicle resources to complete emergency tasks quickly and efficiently while reducing casualties and property losses. However, there are still some difficulties regarding the military vehicle deployment problem, including uneven resource deployment, low deployment efficiency, and insufficient logistics protection. Herein, a mixed-integer linear programming model is constructed to optimize the deployment problem of military vehicles to minimize the total transportation time and cost. Considering the complexity and uncertainty in military operations, this paper introduces a variety of constraints, particularly in terms of vehicle number, geographical limitation, and strategic vehicle. With the designed model, instances of problems of different scales are solved, and comparisons are drawn with the existing deployment methods. The results show that the proposed model can effectively reduce the time and cost required for vehicle deployment, while significantly improving the flexibility and adaptability of the deployment scheme. This study can not only provide a new optimization tool for army vehicle deployment but also be utilized for other similar types of emergency logistics deployment.
|
|
14:20-14:40, Paper ThCT10.2 | |
Optimization of Neural Network Models Based on Symbol Interval Propagation |
|
Li, Xuejian | Anhui University |
Li, Zihan | Anhui University |
|
|
14:40-15:00, Paper ThCT10.3 | |
Surrogate-Assisted Differential Evolution for Expensive Equality Constrained Optimization |
|
Ji, Jing-Yu | Lingnan University |
Yu, Wei-Jie | Sun Yat-Sen University |
Wong, Man-Leung | Lingnan University |
Kwong, Sam Tak Wu | Lingnan University |
Keywords: Evolutionary Computation, Optimization and Self-Organization Approaches, Computational Intelligence
Abstract: In recent years, surrogate-assisted evolutionary algorithms have gained considerable success in addressing expensive constrained optimization problems. While significant focus has been directed toward optimization challenges with inequality constraints, the domain of expensive equality-constrained optimization also necessitates attention, as equality constraints are frequently encountered in traditional constrained optimization problems. Recognizing this gap, this study introduces an innovative approach that integrates a multilayer perceptron regression-based surrogate with a gradient descent-based repair method and differential evolution to address these challenges effectively. Our contributions are threefold: 1) We develop a multilayer perceptron-based surrogate model that concurrently approximates the objective function and equality constraints, 2) We employ a gradient descent-based repair method to adeptly manage the challenging equality constraints, and 3) We propose a hybrid local search scheme that enhances the solution refinement process. The combined use of the multilayer perceptron-based surrogate and gradient descent-based local search works in concert with differential evolution to guide the population toward the feasible region. This approach enables the evolutionary search, supported by the surrogate model, to extensively explore potential feasible regions. Our experimental results underscore the potential and efficacy of the proposed surrogate-assisted evolutionary algorithm in solving such complex optimization problems.
|
|
15:00-15:20, Paper ThCT10.4 | |
DNAS: Depth-First Neural Architecture Search |
|
Zhou, Jianjun | South China University of Technology |
Chen, Junying | South China University of Technology |
Cai, Yi | South China University of Technology |
Keywords: Optimization and Self-Organization Approaches, Neural Networks and their Applications, Deep Learning
Abstract: The key challenge of neural architecture search (NAS) methods lies in efficiently exploring search spaces. To solve this problem, the Breadth-First Search (BFS) method uses two trees to represent a search space, and performs bi-level BFS on these trees to find the optimal network architecture. However, the BFS method did not discover the best network in NAS-Bench-201, and its search efficiency is still not high enough. In this work, we propose a one-shot method called Depth-first Neural Architecture Search (DNAS) to efficiently explore the best architecture. Given a search space with N candidate operations, we represent it as a single N-ary tree and employ depth-first search on this tree to explore high-performance network architectures while significantly reducing the resource consumption during the exploration. While the BFS method explores multiple networks simultaneously in each exploration, DNAS efficiently explores along only one direction at a time, eliminating the need to synchronize the search processes of multiple networks. The proposed DNAS, performed on a single RTX 2080Ti GPU, finds the optimal architectures for CIFAR-10 and ImageNet16-120 in 0.3 hours on NAS-Bench-201, and in 1.0 hour on CIFAR-100. While finding the best network, the search time of the DNAS method has been reduced by around 2 to 8 GPU hours as compared to other well-performing methods, especially reducing the search time by 91.67% when compared to the BFS method. The best searched network was further transferred to medical image classification tasks and achieved high classification accuracy across multiple datasets. In addition, the results of ablation experiments substantiate the effectiveness and efficiency of the proposed method. Our source code is available at: https://github.com/Bob5090/DNAS.
|
|
15:20-15:40, Paper ThCT10.5 | |
GHVC-Net: Hypervolume Contribution Approximation Based on Graph Neural Network |
|
Wu, Guotong | Southern University of Science and Technology |
Nan, Yang | Southern University of Science and Technology |
Shang, Ke | Southern University of Science and Technology |
Ishibuchi, Hisao | Southern University of Science and Technology |
Keywords: AI and Applications, Optimization and Self-Organization Approaches
Abstract: This paper proposes a framework called GHVC-Net that uses the graph neural network (GNN) model to approximate each solution's hypervolume contribution (HVC). GHVC-Net is permutation invariant and can handle solution sets of arbitrary size, similar to the properties of GNN. Compared to HVC-Net (i.e., a machine learning model for HVC approximation), GHVC-Net achieves better accuracy with less training time. GHVC-Net is also compared with traditional approximation methods, such as line-based and point-based methods, to demonstrate its ability to identify the solution with the smallest (largest) HVC.
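For readers unfamiliar with the quantity that GHVC-Net approximates, the hypervolume contribution (HVC) of a solution is the hypervolume lost when that solution is removed from the set. The sketch below computes it exactly for two minimization objectives; it only illustrates the definition and is not the GNN-based approximation from the paper. The function names, the sample points, and the reference point are hypothetical.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` (minimization) and bounded by `ref`."""
    # Keep only points that are strictly better than the reference point.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_f2 = ref[1]
    for f1, f2 in pts:  # sweep in increasing order of the first objective
        if f2 < best_f2:  # point adds a new strip of dominated area
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area

def hv_contributions(points, ref):
    """HVC of each point = total hypervolume minus hypervolume without it."""
    total = hypervolume_2d(points, ref)
    return [
        total - hypervolume_2d(points[:i] + points[i + 1:], ref)
        for i in range(len(points))
    ]

# Hypothetical non-dominated set and reference point.
solutions = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
reference = (5.0, 5.0)
total_hv = hypervolume_2d(solutions, reference)
contributions = hv_contributions(solutions, reference)
```

Here the middle solution has the largest contribution and the two extreme solutions tie for the smallest. Exact computation of this kind becomes expensive as the number of objectives grows, which is the usual motivation for approximation methods such as those compared in the abstract.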
|
|
15:40-16:00, Paper ThCT10.6 | |
A Dynamic Operational Optimization Method for Robotic Mobile Fulfillment Systems with Inventory Discrepancy Events (I) |
|
Ma, Huai | Northeastern University, Shenyang, China |
Zhao, Ziyan | Northeastern University |
Liang, Jiaqi | Polytechnique Montréal |
Li, Xingyang | Northeastern University |
Liu, Shixin | Northeastern University |
Keywords: Optimization and Self-Organization Approaches
Abstract: A Robotic Mobile Fulfillment System (RMFS) is an emerging "cargos-to-person" picking system that relies on the boom of Internet of Things (IoT) technology. It aims to offer significant enhancements in order picking efficiency. However, dynamic disturbances, such as inventory discrepancies arising from errors in receiving, shipping, and handling of goods, often disrupt its operations, leading to degraded service and increased operational costs, thereby affecting overall system performance. Traditional optimization solutions may necessitate adaptations or overhauls in response to such disturbances. This paper introduces a proactive multi-pathway response algorithm tailored to mitigating dynamic disturbances in RMFS, particularly concerning inventory discrepancies. We extend an open-source simulation framework to evaluate the performance of the proposed algorithm and conduct a comparative analysis of dynamic systems. Experimental results indicate that our proposed algorithm can effectively improve the processing efficiency of abnormal orders with minimal system-wide impact, highlighting its potential to address dynamic and abnormal events in smart warehouses.
|
|
ThCT11 |
MR11 |
Resilience Engineering 1 |
Regular Papers - Cybernetics |
Chair: Kobayashi, Manabu | Waseda University |
|
14:00-14:20, Paper ThCT11.1 | |
AOCN: Appendix Object Correction Network Utilizing Relationships across CT Slices |
|
Ng, Wing Yin | South China University of Technology |
Xu, Jing | South China University of Technology |
Liang, Yinhao | South China University of Technology |
Wang, Ting | South China University of Technology |
Zhang, Jianjun | South China University of Technology |
Hui, Zhou | The Sixth Affiliated Hospital of Guangzhou Medical University, Q |
Dan, Liang | Guangzhou First People’s Hospital/The Second Affiliated Hospital |
Li, GuangMing | The Sixth Affiliated Hospital of Guangzhou Medical University, Q |
Wei, Xinhua | Department of Radiology, Guangzhou First People's Hospital, Sout |
Keywords: Computational Intelligence, Machine Learning
Abstract: When analyzing CT images of patients with suspected appendicitis, radiologists need to observe and examine consecutive 2D CT slices. Computer-assisted detection of the appendix in 2D CT slices can significantly improve the diagnostic efficiency of radiologists. However, existing 2D medical image object detection methods primarily focus on spatial features within a single CT slice, overlooking spatial relationships between consecutive slices. We propose an Appendix Object Correction Network (AOCN) to refine predictions of universal object detectors. Although AOCN is a 2D network, it effectively leverages spatial relationships across consecutive CT slices. AOCN requires only a few training epochs to improve the accuracy of bounding boxes significantly, which offers advantages such as high scalability, low cost, and reduced training time. It consists of a global case feature learning module for extracting a global feature map from the CT case and an object feature relation module for modeling the relationships between objects across slices. Experimental results demonstrate the effectiveness and efficiency of AOCN in correcting the output bounding boxes of several mainstream object detection networks, with a 6% to 14% improvement in Recall while requiring only a few training epochs.
|
|
14:20-14:40, Paper ThCT11.2 | |
MCD: Defense against Query-Based Black-Box Surrogate Attacks |
|
Zou, Yiwen | South China University of Technology |
Ng, Wing Yin | South China University of Technology |
Xueli, Zhang | South China University of Technology Guangzhou |
Loo, Brick | South China University of Technology |
Yan, Xingfu | South China Normal University |
Wang, Ran | Shenzhen University |
Keywords: Computational Intelligence, Machine Learning
Abstract: Deep neural networks (DNNs) are susceptible to surrogate attacks, where adversaries use surrogate data and corresponding outputs from the target model to build their own stolen model. Model stealing attacks jeopardize model privacy and model owners’ commercial benefits. To address this issue, this paper proposes a hybrid protection approach – Maximize the confidence differences between benign samples and adversarial samples (MCD), to protect models from theft. Firstly, the LogitNorm approach is used to overcome the overconfidence problem in adversary query classification. Then, samples are divided into four groups according to ES and RS. Different groups are poisoned to different degrees. In addition to enhancing defensive performance and accounting for model integrity, the MCD uses a trigger to confirm the cloned model’s owner. Experimental results show that the MCD defends well against a variety of original models and attack techniques. Against KnockoffNets and DFME attacks, the MCD yields an average defense performance of 54.58% on five datasets, which is a great improvement over other defenses. Compared to other poisoning techniques, the Strong Poisoning (SP) module reduces the adversary’s accuracy by 48.23% on average. Additionally, the MCD overcomes the issue of OOD overconfidence while safeguarding the model accuracy in OOD detection and reduces the misclassification rate of ID samples for multiple OOD datasets.
|
|
14:40-15:00, Paper ThCT11.3 | |
Multi-Agent Pruning and Quantization in Mixed-Precision Models |
|
Hsieh, Mong-Yung | National Chung Cheng University |
Liu, Alan | National Chung Cheng University |
Chen, Zih-Jyun | National Chung Cheng University |
Keywords: Machine Learning, Transfer Learning, Deep Learning
Abstract: To better reduce the size of deep learning models for deployment on edge devices, this study employs the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) method, combined with the Preserve Ratio for each layer, to formulate different compression strategies without manual parameter tuning. Additionally, it evaluates the model compression effectiveness using metrics such as Top-1 accuracy, model size, parameter count, FLOPs, and latency. The results show that compressing MobileNet-V2 at FP32 precision on the target hardware reduces inference latency by 45.8%. Moreover, the model size decreases by 79.23%, the parameter count by 60%, and the model computational load by 90.35%.
|
|
15:00-15:20, Paper ThCT11.4 | |
Average Performance Analysis of Multi-Class Classification Based on Error-Correcting Output Codes |
|
Kobayashi, Manabu | Waseda University |
Kumoi, Gendo | Nagaoka University of Technology |
Yagi, Hideki | University of Electro-Communications |
Hirasawa, Shigeichi | Waseda University |
Keywords: Machine Learning, Deep Learning
Abstract: In machine learning, one of the methods to solve multi-class classification problems is a framework called Error-Correcting Output Codes (ECOC), which constructs a multi-class classifier by combining many binary classifiers. ECOC assigns a binary codeword to each category, and the multi-class classification performance varies depending on the code. In this study, we treat each element of the codeword as a random variable and evaluate the average performance of ECOC. As a result, for M-class classification, if the number of binary classifiers is O(log M), then the average error probability of various codes approaches that of MAP estimation. We show that the important points are the ratio between the number of binary classifiers and log M and the difference between the maximum posterior probability and the second highest posterior probability for the categories.
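The ECOC framework summarized above can be made concrete with a small decoding sketch: each class is assigned a binary codeword, each column of the code matrix corresponds to one binary classifier, and the predicted class is the one whose codeword is nearest to the classifiers' outputs. Hamming-distance decoding, the fixed 4-class code matrix, and the function name are assumptions for illustration; the paper analyzes randomly drawn codewords rather than a fixed code.

```python
import numpy as np

def ecoc_decode(code_matrix, binary_outputs):
    """Return the index of the class whose codeword is closest in Hamming distance."""
    distances = (code_matrix != binary_outputs).sum(axis=1)
    return int(np.argmin(distances))

# Hypothetical code: 4 classes (rows), 5 binary classifiers (columns).
# Any two codewords differ in at least 3 positions, so one wrong
# binary classifier output can still be corrected.
code_matrix = np.array([
    [0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 1, 1],
    [1, 1, 1, 0, 1],
])

# The true class is 1 (codeword 1 0 1 1 0), but the third classifier errs.
noisy_outputs = np.array([1, 0, 0, 1, 0])
predicted = ecoc_decode(code_matrix, noisy_outputs)
```

Longer codewords (more binary classifiers relative to log M) leave more room for this kind of error correction, which is the ratio the abstract identifies as important.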
|
|
15:20-15:40, Paper ThCT11.5 | |
MMPGCN: Multi-Hop Message Passing Graph Convolutional Network for Knowledge Graph Completion |
|
Wang, Jian | Shanghai Ocean University |
Zhang, Zizhao | Shanghai Ocean University |
He, Qi | Shanghai Ocean University |
Zhou, You | Shanghai Ocean University |
Pan, Qi | Shanghai Ocean University |
Keywords: Machine Learning, Representation Learning
Abstract: Graph Convolutional Networks (GCNs) are extensively used in Knowledge Graph Completion (KGC), which aims to predict the absent entities or relationships within a Knowledge Graph (KG). The majority of traditional GCN-based models pass messages layer by layer to capture the characteristics of distant neighbors. However, this approach fails to effectively integrate the semantic feature information of multi-hop neighbors due to the significant complexity of relations when attempting to learn continuous vectors for entities. To overcome this limitation, this study introduces the Multi-Hop Message Passing Graph Convolutional Network (MMPGCN), a novel GCN framework that effectively integrates feature information from indirect neighbors. Furthermore, a graph attention mechanism is used in the proposed model to differentiate the weights of various indirect neighbors. A robust and expressive balancing gate mechanism is designed to integrate the information from both direct and indirect neighbors to produce the final representation. The approach is assessed on the FB15k237 and WN18RR datasets, and it demonstrates superior performance in comparison to state-of-the-art methods for the KGC task.
|
|
15:40-16:00, Paper ThCT11.6 | |
Action Robust Reinforcement Learning with Highly Expressive Policy |
|
Kim, SeongIn | University of Tsukuba |
Shibuya, Takeshi | University of Tsukuba |
Keywords: Machine Learning, Deep Learning
Abstract: In traditional reinforcement learning, the control performance of a policy can degrade when the environmental parameters differ between the training and application phases. A policy that minimizes this degradation is referred to as a robust policy. A framework called Noisy action Robust MDP (NR-MDP) was proposed for training robust policies, and the Action Robust DDPG (AR-DDPG) algorithm was introduced as a method for solving NR-MDP. The optimal policy in NR-MDP includes policies following various probability distributions, whereas AR-DDPG is restricted to deterministic policies. We propose a new robust reinforcement learning method called Action Robust Q Learning (AR-QL) that enables the training of optimal policies in NR-MDP by leveraging various sampling techniques to extend the representational capacity of policies, targeting an improvement in policy robustness. To validate this, we confirmed that AR-QL can acquire the optimal policy for a simple NR-MDP problem for which AR-DDPG fails to obtain the optimal policy. Furthermore, we confirmed that the robust performance of the policy trained by AR-QL in OpenAI's InvertedPendulum environment surpasses that of the policy trained by AR-DDPG.
|
|
15:40-16:00, Paper ThCT11.7 | |
Explainable Reinforcement Learning Via Causal Model Considering Agent’s Intention |
|
Kim, SeongIn | University of Tsukuba |
Shibuya, Takeshi | University of Tsukuba |
Keywords: Machine Learning, Deep Learning
Abstract: Explaining agents’ decisions can offer valuable insights for designers and end-users. One proposed method for describing an agent's decision-making involves representing the control target as a causal model and providing explanations for the decisions made. However, traditional causal models often face structural limitations, restricting the range of representable control problems. Additionally, accurately providing explanations becomes challenging in environments with various types of rewards. In this study, we introduce a causal model capable of representing a broader range of control problems and a method to provide accurate explanations in environments with various types of reward structures. By redefining the relationships between nodes in the causal model, we enable a broader representation of control problems. Also, by incorporating the intentions of agents into the explanation, we provide a more precise description. To validate the effectiveness of the proposed method, we conducted experiments using OpenAI's LunarLander environment. Using the proposed causal model, we plotted the causal model of LunarLander, which could not be represented by conventional causal models. Furthermore, incorporating the intentions of the agent into the explanation made previously inaccessible interpretations feasible.
|
|
ThCT13 |
Room T13 |
Intelligence and Decision Making and Wearable Computing |
Workshops |
Chair: Wang, Qirun | Tokyo University of Technology |
|
14:00-14:20, Paper ThCT13.1 | |
Analyzing the Influence of Driving Experience on Difference Reverse Parking Behaviors through Eye-Tracking Data Analysis (I) |
|
Wang, Qirun | Tokyo University of Technology |
Huang, Xuan | WASEDA University |
Wu, Bo | Tokyo University of Technology |
Keywords: Human Factors, Intelligence Interaction, Visual Analytics/Communication
Abstract: In daily driving, reverse parking into a garage often leads to collision accidents. However, current studies mostly focus on analyzing drivers' eye movements during one specific parking style and lack comparative research on different parking behaviors. In this study, 200 experiments were conducted with 20 participants of varying driving experience to collect their eye movement data during different types of reverse parking into the garage. Based on the collected eye-tracking data, we analyze how driving experience impacts reverse parking behaviors. The findings reveal significant differences in gaze behavior and fixation positions between novice and experienced drivers. Novice drivers exhibit more erratic gaze patterns, focusing more on the right door mirror and frequently shifting their gaze between areas of interest. Specifically, in situation A (entering the garage from the right side), novice drivers feel insecure due to their inability to visually assess road conditions directly, prompting them to rely more on the right door mirror. On the other hand, in situation B (entering the garage from the left side), reliance on interior mirrors is reduced for both novice and experienced drivers. Insights from these findings could enhance overall driving safety and efficiency.
|
|
14:20-14:40, Paper ThCT13.2 | |
Multimodal Federated Learning Via Local-Global Fusion (I) |
|
Xia, Zilin | Hangzhou Dianzi University |
Tan, Min | Hangzhou Dianzi University |
Gao, Zhigang | China Jiliang University |
Chu, Lingqiang | Hangzhou Dianzi University |
Han, Tingting | Hangzhou Dianzi University |
Keywords: Systems Safety and Security, Cooperative Work in Design, Assistive Technology
Abstract: The proliferation of Internet of Things (IoT) devices across diverse domains in modern life has made them significant sources for collecting and analyzing multimodal data. However, concerns about ownership and data privacy associated with IoT devices make data sharing among multiple devices impractical. Recently, multimodal federated learning has emerged as an innovative solution where each device client can collectively train a satisfactory local model without exchanging local data. Nevertheless, most existing multimodal federated learning approaches prioritize training a powerful global server model while neglecting the performance of local client models. In this context, this paper introduces FedAF, a multimodal Federated learning approach via feature Fusion with adversarial representation learning, aimed at enhancing local representations and thereby improving local client models. Specifically, using the trained global model, FedAF integrates the global representation of each local client's data into its local feature obtained from the corresponding client model. Furthermore, domain adversarial learning is employed to align global and local representations by minimizing the discrepancy between local and global encoders, compelling the global encoder to adapt to local tasks. Comprehensive experiments on two unimodal classification datasets and one multimodal retrieval dataset demonstrate that FedAF achieves state-of-the-art performance compared to other federated learning methods and significantly improves local client models while maintaining the satisfactory performance of the global server model.
|
|
14:40-15:00, Paper ThCT13.3 | |
Assessing the Impact of Immersive Augmented and Virtual Reality Based Joint Attention Training Platform on Autistic Children Via Behavioral and Physiological Measures (I) |
|
Samantaray, Ashirbad | Indian Institute of Technology Delhi |
Kaur, Taranjit | Indian Institute of Technology Jodhpur |
Majumder, Chayan | Indian Institute of Technology Delhi |
Gulati, Sheffali | All India Institute of Medical Sciences Delhi |
Gandhi, Tapan Kumar | Indian Institute of Technology Delhi |
Keywords: Virtual/Augmented/Mixed Reality, Interactive and Digital Media, Assistive Technology
Abstract: Autism Spectrum Disorder (ASD) is a neurodevelopmental condition that is usually diagnosed between the ages of one and three. It is characterized by developmental issues and repetitive behaviours. One important social skill, Joint Attention (JA), involves developing a shared focus of attention with another person. Children with ASD often lack JA skills, which can impede their ability to develop social communication skills later in life. This makes early intervention critical. Previous research has explored various techniques for JA skill training, but few studies have used immersive Augmented Reality (AR) and Virtual Reality (VR) based devices for this purpose. Additionally, there is limited work in the literature exploring the physiological effect of JA training using immersive AR and VR devices. This paper addresses these gaps by introducing a novel JA training platform that uses immersive AR and VR devices for JA skill training. This platform enables participants to interact with it using their eye gaze. To validate the acceptance of the developed platforms, we conducted experiments on ASD (5) and Neurotypical (NT) (10) participants. To quantify the participants’ task performance while interacting with these platforms, we used behavioural (time duration to register a response) and physiological (beats per minute (BPM)) parameters. The ASD group took longer to register a response than the NT group on both AR and VR platforms (mean duration in sec, for ASD (AR/VR): 34.5/12.5; for NT (AR/VR): 8.8/4.22). The physiological parameter BPM showed a similar trend and was higher in ASD than in NT for both platforms (BPM, for ASD (AR/VR): 100.49/90.27; for NT (AR/VR): 87.39/86.60).
The increase in cardiac activity, as quantified by BPM values for ASD, gives us an impression of the sensory sensitivities in the autistic group that lead to physiological arousal and thereby interfere with their focusing capability, resulting in delayed response. This study emphasizes the importance of monitoring physiological responses of participants during JA training. It also highlights the difficulties faced by ASD participants during these trainings in immersive AR and VR environments.
|
|
15:00-15:20, Paper ThCT13.4 | |
VR-Based Mantra Meditation for Mental Wellness (I) |
|
Garg, Ankita | Indian Institute of Technology Mandi |
Kumar, Ajoy | Indian Institute of Technology Mandi |
Garg, Shubham | University School of Information, Communication & Technology |
Behera, Laxmidhar | IIT Kanpur |
Dutt, Varun | Indian Institute of Technology Mandi |
Keywords: Virtual and Augmented Reality Systems, Human-Computer Interaction
Abstract: Emotional stability, awareness, and attention may be enhanced by meditation and related techniques. Since meditation practitioners may need focus and engagement, virtual reality (VR) may be helpful. Although there has been some research on the usefulness of VR for meditation, very few studies have looked at the effectiveness of VR for audible mantram repetition (AuMR). Our research addresses this limitation by investigating the efficacy of AuMR, which is intended to promote better cognitive health and overall brain well-being, in VR. Forty-one individuals were randomly divided into two groups, test and control. The test group engaged in a ten-minute VR-based AuMR session, while the control group did nothing in the same virtual reality setting for ten minutes. Both groups completed self-reported questionnaires before and after the intervention, as well as electroencephalography (EEG) and heart rate variability (HRV) measurements. We evaluated EEG band power ratios such as the alpha-to-beta (AB) ratio and the frontal-alpha-to-temporal-theta (FATT) ratio to find the effects of VR-aided meditation. The ANOVA test showed a substantial decrease in the self-reported stress, anxiety, and depression parameters. Furthermore, comparing the test group to the control group revealed a significant increase in the FATT ratio and a significant decrease in the AB ratio. We also observed significant changes in the HRV values of the test group. The study offers sufficient evidence to suggest the feasibility of AuMR in VR for cognitive wellness.
|
| |