| |
Last updated on October 3, 2023. This conference program is tentative and subject to change.
Technical Program for Monday October 2, 2023
|
Mo-P1P Late-Breaking Session, Lanai |
BMI Workshop Abstracts |
|
|
|
11:00-12:00, Paper Mo-P1P.1 |
Ethics Education Needed for “Responsible Research and Innovation” of BMI/Neurotechnology |
|
Fukushi, Tamami | Tokyo Online University |
Keywords: BMI Emerging Applications, Other Neurotechnology and Brain-Related Topics
Abstract: Neurotechnology has increased its contact with various fields of engineering, and the market for neurotechnology is expanding due to the fusion of big data and AI technologies, as well as lightweight and sophisticated devices for measuring brain activity. Thus, the social implementation of neurotechnology has become a reality, and the development of an international code of ethics and international standards for conducting "responsible research and innovation (RRI)" that contributes to the ethical development, use, and diffusion of neurotechnology is now actively discussed from the perspective of neuroethics. Human resources who understand the elements of neurotechnology and have knowledge of neuroethics and RRI need to be developed. However, it is not clear what kind of engagement and educational opportunities should be provided in undergraduate and graduate education. In this presentation, the author investigated the current situation of undergraduate and graduate education on neuroethics in Japan and the United States. The author also conducted a web-based survey of Japanese students, graduate students, and faculty regarding desirable educational topics and learning formats. A series of studies suggests that the introduction of neuroethics and RRI into engineering education requires appropriate awareness-raising and educational opportunities based on more advanced technological development trends and the formulation of international rules, which might differ from the conventional structure of educational courses in philosophy and general education.
|
|
11:00-12:00, Paper Mo-P1P.2 |
Toward EEG-Based Objective Assessment of Emotion Intensity |
|
Ho, Pin-Han | National Yang Ming Chiao Tung University |
Chen, Yong-Sheng | National Yang Ming Chiao Tung University |
Wei, Chun-Shu | National Yang Ming Chiao Tung University |
|
11:00-12:00, Paper Mo-P1P.3 |
A Transformer Model with Spatiotemporal Input Embedding for fNIRS Data-Driven Neural Decoding |
|
Lee, Hyunmin | Daegu Gyeongbuk Institute of Science and Technology |
Kim, Taehun | DGIST |
An, Jinung | DGIST |
Keywords: BMI Emerging Applications, Other Neurotechnology and Brain-Related Topics, Passive BMIs
Abstract: In fNIRS-based neural decoding, fNIRS measures local cortical activity, so the input space must be configured to encompass all spatiotemporal features. We propose a neural decoding model whose input embedding reflects both the spatial and temporal traits of fNIRS data. Our model embeds fNIRS channels corresponding to 3x3 and 5x5 grids based on fNIRS optodes into new channels using depth-wise convolution layers. The embedded data is passed to the classifier through a 2D convolution layer and a transformer encoder. The transformer-based model is better suited to evaluating correlated features between non-adjacent and distant channels because it inherently includes a multi-head self-attention layer. The model was trained on an open dataset (mental arithmetic, BNCI Horizon 2020 [1]) to verify its performance. For preprocessing [2], frequencies of 0.09 Hz and above were removed with a 4th-order low-pass Butterworth filter, the data were segmented, and z-score standardization was applied. The average accuracy of leave-one-subject-out (LOSO) cross-validation (CV) was 90.19%, and the average accuracy of k-fold CV was 81.78%; for the latter, a five-fold CV was performed five times and the 25 results were averaged. Typically, the accuracy of k-fold CV is expected to be higher than that of LOSO CV, yet the proposed model yielded a lower k-fold accuracy than LOSO accuracy. We checked the training loss graph, which revealed an overfitting problem. One reason could be that the model has too many parameters relative to the complexity of the problem to be solved in the open dataset (i.e., binary classification between rest state and task state). Another could be that the trained feature patterns are not very different because the fNIRS data were measured only in the forebrain area, a functionally and structurally identical cortical region. Therefore, the proposed model should be further applied to, and updated on, other datasets [3, 4] measured over the whole brain or in different cortical areas.
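The two evaluation schemes compared in this abstract differ in how trials are assigned to training and test sets. The following is a minimal sketch of that difference, assuming a hypothetical layout in which every trial carries a subject ID; the function names `loso_splits` and `kfold_splits` and the toy data are illustrative and do not come from the paper.

```python
import random

def loso_splits(subject_ids):
    """Yield (train_idx, test_idx) pairs, holding out one subject at a time."""
    for held_out in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield train, test

def kfold_splits(n_trials, k, seed=0):
    """Yield (train_idx, test_idx) pairs from a shuffled k-fold partition."""
    idx = list(range(n_trials))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        test = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, test

# Toy layout (hypothetical): 4 subjects with 5 trials each.
subject_ids = [s for s in range(4) for _ in range(5)]

loso = list(loso_splits(subject_ids))
kfold = list(kfold_splits(len(subject_ids), k=5))
```

Under LOSO the held-out subject never appears in training, whereas a shuffled k-fold split usually places trials from every subject on both sides of the split, which is why k-fold accuracy is typically expected to be the higher of the two.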
|
|
11:00-12:00, Paper Mo-P1P.4 |
Target Spectral Band Canonical Correlation Analysis Enhancing Target Frequency Feature Extraction in SSVEP-Based BCI |
|
Ju, Young-Gi | Hallym University |
Park, Dogeun | Hallym University |
Won, Dong-Ok | Hallym University |
Keywords: Passive BMIs
Abstract: Due to its high performance, filter-bank canonical correlation analysis (FBCCA) is widely used in steady-state visual evoked potential (SSVEP)-based brain-computer interfaces (BCIs). FBCCA uses broad subbands to detect harmonic components. Nevertheless, the SSVEP signal responds most strongly at the fundamental frequency. This study proposes a target spectral band canonical correlation analysis (TSB-CCA) that focuses on the target frequency (which is assumed to have a high fundamental frequency component). Using existing low- and high-frequency SSVEP datasets, we assessed the accuracy of canonical correlation analysis (CCA), FBCCA, and the proposed TSB-CCA. FBCCA performed better on the low-frequency dataset, while TSB-CCA performed better on the high-frequency dataset.
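The idea of scoring each candidate stimulus frequency against sine and cosine references at its fundamental can be sketched as below. This is a single-channel simplification, not the paper's TSB-CCA and not full multichannel CCA: it scores a candidate by the fraction of signal energy explained by the two references, which for a single channel and orthogonal references coincides with the squared canonical correlation. The function names, the sampling rate, and the synthetic signal are all assumptions made for illustration.

```python
import math

def reference_score(signal, freq, fs):
    """Fraction of signal energy explained by sine/cosine references at `freq`."""
    n = len(signal)
    mean = sum(signal) / n
    x = [v - mean for v in signal]
    energy = sum(v * v for v in x)
    score = 0.0
    for trig in (math.sin, math.cos):
        ref = [trig(2 * math.pi * freq * t / fs) for t in range(n)]
        score += sum(a * b for a, b in zip(x, ref)) ** 2 / sum(b * b for b in ref)
    return score / energy

def detect_target(signal, candidates, fs):
    """Return the candidate frequency whose references fit the signal best."""
    return max(candidates, key=lambda f: reference_score(signal, f, fs))

# Synthetic 1 s recording (hypothetical): 12 Hz response plus 50 Hz interference.
fs = 250
signal = [
    math.sin(2 * math.pi * 12 * t / fs + 0.7) + 0.5 * math.sin(2 * math.pi * 50 * t / fs)
    for t in range(fs)
]
detected = detect_target(signal, candidates=[8, 10, 12, 15], fs=fs)
```

The sketch uses only the fundamental; a filter-bank method would additionally score harmonics in several subbands, which is the contrast the abstract draws.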
|
|
11:00-12:00, Paper Mo-P1P.5 |
Stable Cognitive Attention through Ventilation Pattern Manipulation in ERP-Based Brain-Computer Interfaces |
|
Park, Dogeun | Hallym University |
Ju, Young-Gi | Hallym University |
Won, Dong-Ok | Hallym University |
Keywords: Passive BMIs
Abstract: The event-related potential (ERP)-based paradigm is a representative approach among the various brain-computer interface (BCI) systems. The primary objective of this study is to investigate the effect of ventilation pattern manipulation on character recognition accuracy in the ERP-based speller paradigm. To check the feasibility of the proposed method, five subjects participated in the experiment, and each subject completed two sessions. The results indicate that the proposed ventilation method is, on average, more effective than a normal method, which supports our assumptions.
|
|
11:00-12:00, Paper Mo-P1P.6 |
Non-Invasive Brain-Machine Interface for MindPong Game Using Electroencephalogram and Reservoir Computing Decoder |
|
Kim, Hoon-Hee | Pukyong National University |
Keywords: Brain-Computer Interfaces
Abstract: In the field of multimedia, various electroencephalogram (EEG) studies have been reported in the context of brain-machine interface research. Recently, Neuralink has demonstrated a prototype wherein they invasively measured the brainwaves of a primate and showed that it was possible to control the bar in the Pong Game merely by thought. However, this method, which involves the surgical installation of electrodes inside the skull for measuring brainwaves, is difficult to apply to the general public. Therefore, in this study, we designed a non-invasive EEG-based brain-machine interface and applied it to the Pong Game. In particular, we demonstrated that it is possible to decode the intention of manipulation directly, not through motor imagery, by measuring signals solely from the prefrontal cortex, unlike traditional brainwave decoders that use sensorimotor signals.
|
|
Mo-P3P Late-Breaking Session, Lanai |
Cybernetics and Image Processing |
|
|
|
13:45-14:45, Paper Mo-P3P.1 |
Nuclear-Integrated Energy Units: Advancing Cybersecurity for Resilient Energy Systems |
|
Bhowmik, Palash Kumar | Idaho National Laboratory |
Alam, Syed | Missouri University of Science and Technology |
Talukder, Sajedul | University of Alabama at Birmingham |
Sabharwall, Piyush | Idaho National Laboratory |
Keywords: Large-Scale System of Systems, Cyber-physical systems, Decision Support Systems
Abstract: Rapidly increasing usage of nuclear-integrated energy units has created new challenges in terms of cybersecurity. This paper discusses the potential cyberthreat challenges and cyber risks associated with the widespread adoption of these units, and the role of artificial intelligence (AI) and machine learning (ML) techniques in enhancing the security and resilience of these systems.
|
|
13:45-14:45, Paper Mo-P3P.3 |
Analysis of Globally Stable Periodic Orbits in Permutation Elementary Cellular Automata |
|
Okano, Taiji | HOSEI University |
Onuki, Mikito | Hosei University |
Saito, Toshimichi | HOSEI University |
Keywords: Neural Networks and their Applications, Soft Computing, Socio-Economic Cybernetics
Abstract: We consider a simple three-layer dynamical system related to recurrent neural networks. The input-to-hidden layers form an elementary cellular automaton, and the hidden-to-output layers are connected one-to-one as described by a permutation. Depending on the permutation connections, the network can generate various periodic orbits of binary vectors. In particular, we have discovered globally stable periodic orbits such that almost all initial points fall into the orbits. Based on numerical analysis, we present an important conjecture on the properties of globally stable periodic orbits. This is a first step toward considering various periodic orbits and their engineering applications.
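The system described here can be sketched as an elementary cellular automaton update followed by a permutation of the outputs. The sketch below is an interpretation under stated assumptions: a ring of N binary cells, Wolfram rule numbering, and a permutation given as a list of indices; the function names, the example rule 110, and the example permutation are chosen for illustration and are not taken from the paper.

```python
from itertools import product

def eca_step(state, rule):
    """Apply an elementary cellular automaton rule (Wolfram number) on a ring."""
    n = len(state)
    return tuple(
        (rule >> (4 * state[i - 1] + 2 * state[i] + state[(i + 1) % n])) & 1
        for i in range(n)
    )

def peca_step(state, rule, perm):
    """ECA update followed by a one-to-one permutation of the outputs."""
    hidden = eca_step(state, rule)
    return tuple(hidden[perm[i]] for i in range(len(state)))

def find_orbit(state, rule, perm):
    """Iterate until a state repeats; return the periodic orbit as a list."""
    seen = {}
    trajectory = []
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = peca_step(state, rule, perm)
    return trajectory[seen[state]:]

def basin_fractions(n, rule, perm):
    """Map each periodic orbit to the fraction of initial states reaching it."""
    counts = {}
    for init in product((0, 1), repeat=n):
        orbit = frozenset(find_orbit(init, rule, perm))
        counts[orbit] = counts.get(orbit, 0) + 1
    return {orbit: c / 2 ** n for orbit, c in counts.items()}

# Hypothetical example: rule 110 on 5 cells with a cyclic-shift permutation.
fractions = basin_fractions(5, 110, [1, 2, 3, 4, 0])
```

An orbit whose fraction is close to 1 attracts almost all initial points, which is the sense in which the abstract calls a periodic orbit globally stable.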
|
|
13:45-14:45, Paper Mo-P3P.4 |
Societal Immunizing System: A New Approach to “Never Again” Function of Fatal Situations in Everyday Lives |
|
Nishida, Yoshifumi | Tokyo Institute of Technology |
Keywords: Application of Artificial Intelligence, Cybernetics for Informatics, Image Processing and Pattern Recognition
Abstract: The biological immune systems of organisms provide a “never again” function against diseases: they respond to new sources of risk that arise in everyday life by dealing with both the diversity and specificity of diseases. In this paper, the author describes the concept of a societal immunization system in which the function of a biological immune system that protects against pathogens is reproduced informatically and robotically and applied to protect against the risk of fatal accidents in the living environment. To advance the system concept empirically, the author focuses on unintentional childhood accidents, the second most frequent cause of child fatalities. This paper reports a system under development that demonstrates proof of concept by implementing the societal immune system end to end, from the creation of knowledge on fatal situations to integration with fatal-situation recognition technology. Fundamental evaluations were conducted with a dummy child in a living laboratory environment, and the feasibility of the developed system was evaluated through tests with children aged two and five years in an actual home.
|
|
13:45-14:45, Paper Mo-P3P.5 |
Speech Imagery Decoding for Emotional Expression by Exploring Visual Perception from EEG Signals |
|
Park, Hyeong-Yeong | Chungbuk National University |
Yu, Seong-Hyun | Chungbuk National University Cheongju, Republic of Korea |
Lee, Seo-jin | Chungbuk National University |
Jeong, Ji-Hoon | Chungbuk National University |
Keywords: BMI Emerging Applications, Other Neurotechnology and Brain-Related Topics, Active BMIs
Abstract: This study investigates the impact of visual perception, specifically emojis, on speech imagery decoding in brain-computer interfaces for emotional representation. Nine participants took part in two sessions: a speech imagery session and a speech imagery with visual perception session. The results show that deep learning, specifically EEGNet, achieved the highest accuracy in the speech imagery session. Interestingly, the inclusion of visual perception was found to reduce the user’s focus on expressing emotions through speech imagery. These findings support the notion that speech imagery decoding provides a more intuitive approach to natural emotional expression and user intention communication.
|
|
13:45-14:45, Paper Mo-P3P.6 |
A Vision Transformer Model with Compressive Sensing for Crowd Density Level Classification |
|
Sun, Dan | Soochow University |
Zhang, Jin | Soochow University |
Wu, Cheng | Soochow University |
Keywords: Intelligent Transportation Systems, Smart Buildings, Smart Cities and Infrastructures
Abstract: In this paper, a Vision Transformer model with compressive sensing for crowd density level classification is proposed. Crowd density level classification is an important crowd monitoring task that is widely used in public places. However, the performance of existing methods degrades in heavily occluded scenes because they have difficulty extracting complete and accurate crowd features. To solve this problem, we apply compressive sensing to selected image blocks after removing occlusions, and then feed the results into the transformer backbone network, which outputs classification results from the density classification task head. We conducted experiments in a typical occlusion scenario of subway cars, and the results show that our approach achieves relatively good results.
|
|
Mo-P4P Late-Breaking Session, Lanai |
Deep Learning |
|
|
|
15:00-16:00, Paper Mo-P4P.1 |
Deepfake Detection for Palmprint Authentication |
|
Tsai, Min-Jen | National Chiao Tung University |
Chang, Cheng-Tao | Chunghwa Telecom Laboratories |
Keywords: Cybernetics for Informatics, Deep Learning, Image Processing and Pattern Recognition
Abstract: The focus of this paper is palmprint recognition. Using the database of Hong Kong Polytechnic University, a method is proposed to generate synthetic palmprint images with different types of GANs. These images are evaluated by a MesoNet algorithm, which achieves an AUC score of 0.66. The results demonstrate the algorithm’s accuracy in detecting deepfake palmprints.
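To help read the reported AUC, the metric can be computed as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below is generic: the labels and detector scores are made-up toy values, not data from the paper, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Count pairwise wins; ties count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy detector scores (hypothetical): 1 = synthetic image, 0 = genuine image.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
auc = roc_auc(labels, scores)
```

On this scale 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two classes.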
|
|
15:00-16:00, Paper Mo-P4P.2 |
Distributed Nuclear Norm Minimization Algorithm for Low-Rank Matrix Completion and Its Application to Low-Rank Tensor Completion |
|
Konishi, Katsumi | Hosei University |
Sasaki, Ryohei | Tokyo University of Technology |
Furukawa, Toshihiro | Tokyo University of Science |
Keywords: Machine Learning, Big Data Computing, Deep Learning
Abstract: This paper deals with low-rank matrix completion, which is the problem of estimating missing entries in a given low-rank matrix. Several singular value decomposition (SVD)-based matrix shrinkage algorithms have been proposed and achieve good performance. While SVD-based algorithms incur a high computational cost from computing the SVD, they can be sped up by GPU computing. However, GPU computing is not available for very large matrices due to the limited memory of GPUs. In order to utilize GPU computing, this paper provides a distributed singular value shrinkage algorithm. This paper also deals with low-rank tensor completion and applies the proposed distributed algorithm to it. Numerical examples show that the proposed distributed algorithm can recover a low-rank tensor well.
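For reference, the textbook nuclear norm formulation of matrix completion and the singular value shrinkage operator used by SVD-based algorithms can be written as follows. This is the standard form and is given here as an assumption about the general setting; the abstract does not state the paper's exact formulation or its distributed variant, and the symbols $M$ (the partially observed matrix), $\Omega$ (the set of observed entries), and $\tau$ (the shrinkage threshold) are introduced only for illustration.

```latex
\min_{X} \ \|X\|_{*}
\quad \text{subject to} \quad X_{ij} = M_{ij}, \ (i,j) \in \Omega,
\qquad
\mathcal{D}_{\tau}(X) = U \, \mathrm{diag}\bigl(\max(\sigma_i - \tau, 0)\bigr) \, V^{\top},
\quad \text{where } X = U \, \mathrm{diag}(\sigma_i) \, V^{\top}.
```

Each application of $\mathcal{D}_{\tau}$ requires an SVD of the current estimate, which is the cost the abstract attributes to SVD-based shrinkage algorithms.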
|
|
15:00-16:00, Paper Mo-P4P.3 |
Exemplifying the Automated Decision Making Process of Medical Image Annotation Systems through Explainable AI Tools |
|
Hasan, Md. Rakibul | Morgan State University |
Rahman, Md | Morgan State University |
Keywords: Image Processing and Pattern Recognition, AI and Applications, Deep Learning
Abstract: The field of medical imaging analysis has undergone a revolution thanks to the incorporation of automated methods for medical image annotation that are powered by artificial intelligence (AI). The lack of interpretability and transparency in these systems, however, raises questions about how they make decisions and restricts their use in therapeutic settings. In this study, we propose to address this difficulty by using explainable AI (XAI) tools to illustrate the decision-making procedure of medical image annotation systems. We investigate numerous XAI methods, including Grad-CAM, LRP, and attention mechanisms, to shed light on the key elements and justifications for the system's annotations. We seek to improve the interpretability, reliability, and therapeutic utility of these automated systems by visualizing and explaining the decision-making process.
|
|
15:00-16:00, Paper Mo-P4P.5 |
How to Make a Neural Network Learn from a Small Number of Examples -- and Learn Fast: An Idea |
|
Baral, Chitta | Arizona State University |
Kreinovich, Vladik | University of Texas at El Paso |
Keywords: Neural Networks and their Applications, Deep Learning, Computational Intelligence
Abstract: Current deep learning techniques have led to spectacular results, but they still have limitations. One of them is that, in contrast to humans who can learn from a few examples and learn fast, modern deep learning techniques require a large amount of data to learn, and they take a long time to train. In this paper, we show that neural networks do have a potential to learn from a small number of examples -- and learn fast. We speculate that the corresponding idea may already be implicitly implemented in Large Language Models -- which may partially explain their (somewhat mysterious) success.
|
|
Mo-P5P Late-Breaking Session, Lanai |
General I |
|
|
|
16:00-17:00, Paper Mo-P5P.1 |
CATKoDR: Hybrid Context-Awareness Model Architecture for Natural Language Processing |
|
Matta, Nour | University of Technology of Troyes |
Matta, Nada | University of Technology of Troyes |
Marcante, Agata | Namkin |
Declercq, Nicolas | Namkin |
Keywords: Cognitive Computing
Abstract: Context plays a crucial role in understanding and representing the meaning of words. In previous work [1], we proposed a context awareness approach to enhance knowledge discovery from textual data. In this paper, we will present the system architecture to enable a hybrid approach to extract knowledge and its context from text.
|
|
16:00-17:00, Paper Mo-P5P.2 |
H∞ Control of a Bidirectional Converter Based on Novel Interval Type-2 T-S Fuzzy Model |
|
Yu, Gwo-Ruey | National Chung Cheng University |
Chen, Z.-Y. | National Chung Cheng University |
Keywords: System Modeling and Control
Abstract: This paper proposes novel interval type-2 (IT-2) T-S fuzzy control systems and designs H-infinity controllers for a DC/DC converter. The proposed IT-2 T-S fuzzy control systems divide the nonlinear converter into two subsystems. Therefore, regardless of the number of premise parameters in the state-space equations, they need only two control rules. This not only reduces the number of control rules but also reduces the calculation time of the microprocessor and yields more relaxed stability conditions. The H-infinity performance index is used to suppress the external interference and the forward diode bias in buck mode and boost mode. The experimental results show that the novel IT-2 T-S fuzzy control systems perform better than the existing IT-2 T-S fuzzy control systems.
|
|
16:00-17:00, Paper Mo-P5P.3 |
Optimization of Charging Management in Electric Scooter Battery Swapping Stations Based on Level 3 Digital Twin System |
|
Choi, Seon Han | Ewha Womans University |
Kim, Taehoon | Korea Institute of Industrial Technology |
Park, Kyoung-young | Korea Institute of Industrial Technology |
Keywords: Digital Twin, Intelligent Transportation Systems, Discrete Event Systems
Abstract: Battery swapping stations have been attracting attention as a way to avoid the long battery charging times of electric scooters. For quality service, effective charging management is necessary so that charged batteries can be provided on request. This study proposes a level 3 digital twin system to optimize the charging management. The proposed system synchronizes an agent-based simulation model, which describes the battery exchange process between drivers and stations, with the collected actual data via a preprocessor. Then, an optimal charging schedule is derived from the synchronized model through an efficient simulation-based optimization tool and applied to the physical system.
|
|
16:00-17:00, Paper Mo-P5P.4 |
Study Channel Hopping Sequences in BT Mesh Networks to Improve Packet Forwarding Efficiency |
|
Chuang, Yue-Ru | Fu Jen Catholic University |
Chieh-Yu, Chung | Fu Jen Catholic University |
Sheu, Shiann-Tsong | National Central University |
Keywords: Communications, Smart Buildings, Smart Cities and Infrastructures, Smart Sensor Networks
Abstract: BT mesh networks adopt channel hopping and an advertising method to transmit data packets. In the early specification, three advertising channels and a fixed hopping sequence (Ch37, Ch38, Ch39) are designed for BT mesh networks. However, when a large number of relay nodes are deployed in the network and a large number of data packets need to be transmitted, the advertising method and fixed hopping sequence may cause serious packet collisions and reduce packet forwarding efficiency. Hence, this paper studies the influence of channel hopping sequences on packet collision in an advertising environment. Based on a randomly distributed mesh network, this paper simulates and compares the performance of the fixed hopping sequence and three proposed random hopping sequences. The results show that the proposed random hopping sequences can effectively reduce the probability of packet collision and further improve packet forwarding efficiency in BT mesh networks.
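Why a fixed hopping sequence can concentrate collisions may be illustrated with a toy slotted simulation. Everything in the sketch is an assumption made for illustration: time is divided into synchronized slots, every node sends one packet as three consecutive copies (one per advertising channel), all nodes are within range of each other, and a packet counts as lost only if all three copies collide. A uniformly random channel order stands in for the paper's three proposed random sequences, which the abstract does not specify.

```python
import random

CHANNELS = (37, 38, 39)

def loss_rate(n_nodes, window, randomize, trials=2000, seed=1):
    """Estimate how often one node's packet collides on all three channels."""
    rng = random.Random(seed)
    lost = 0
    for _ in range(trials):
        busy = {}   # (slot, channel) -> number of transmissions
        plans = []
        for _ in range(n_nodes):
            start = rng.randrange(window)
            seq = list(CHANNELS)
            if randomize:
                rng.shuffle(seq)   # hypothetical random hopping order
            plan = [(start + k, ch) for k, ch in enumerate(seq)]
            plans.append(plan)
            for key in plan:
                busy[key] = busy.get(key, 0) + 1
        # Track node 0: its packet is lost only if every copy collides.
        if all(busy[key] > 1 for key in plans[0]):
            lost += 1
    return lost / trials

fixed = loss_rate(n_nodes=20, window=50, randomize=False)
shuffled = loss_rate(n_nodes=20, window=50, randomize=True)
```

In this toy model, two nodes that start in the same slot with the fixed order collide on all three channels, while with a random order they collide on all three only if they happen to draw the same permutation, so `shuffled` comes out lower than `fixed`.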
|
|
16:00-17:00, Paper Mo-P5P.5 |
Adaptive Type-2 Fuzzy Neural Network for Lateral Vehicle Control Design in Automated Driving |
|
Huang, Mei-Lin | National Yang Ming Chiao Tung University |
Zhang, Jing-Xiang | China Motor Corporation |
Chiang, Hsin-Han | National Taipei University of Technology |
Li, Hsiao-Chi | National Taipei University of Technology |
Keywords: Autonomous Vehicle, Cooperative Systems and Control, Intelligent Transportation Systems
Abstract: The automated driving control system plays an important role in autonomous vehicles. So far, control parameters in such systems mainly rely on modeling the interaction between the subsystems to derive the proper adaptive law. In this study, we propose an adaptive steering compensation scheme based on the type-2 fuzzy neural network (T2FNN) with the sliding mode learning algorithm to yield effective compensation control by learning online interactions from the measured feedback error. Simulations under several driving conditions will verify the adaptability and robustness of the proposed approach.
|
|
16:00-17:00, Paper Mo-P5P.6 |
Best Dispatching Rule Analysis for Dynamic Scheduling Problem with Periodical Demand |
|
Hirotani, Daisuke | Prefectural University of Hiroshima |
Hayashida, Tomohiro | Hiroshima University |
Nishizaki, Ichiro | Hiroshima University |
Sekizaki, Shinya | Hiroshima University |
Maeda, Ibuki | Hiroshima University |
Keywords: Manufacturing Automation and Systems, System Modeling and Control
Abstract: Dynamic scheduling can be used in scenarios in which jobs may arrive irregularly, and therefore, the schedule of jobs may need to be changed. In actuality, due to various reasons, demand for a service can suddenly change. In previous papers, various methods using genetic programming (GP) have been proposed. Using GP, an appropriate dispatching rule that determines the job sequence can be derived in a short amount of time, and derived rules are better than existing dispatching rules. However, the derived dispatching rules are not analyzed in previous papers. In this paper, dispatching rules that use GP are analyzed; the characteristics of the rules are described, and they are compared with previous dispatching rules.
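As a small illustration of how a dispatching rule determines the job sequence, the sketch below runs a single machine with jobs arriving over time and compares two classic hand-written rules, first-in-first-out (FIFO) and shortest processing time (SPT). These are not the GP-derived rules analyzed in the paper; the job data, the single-machine setting, and the mean flow time criterion are assumptions chosen for illustration.

```python
def mean_flow_time(jobs, rule):
    """Run a single machine; `rule` scores waiting jobs and the lowest score runs next."""
    pending = sorted(jobs, key=lambda j: j["arrival"])
    waiting, clock, total = [], 0, 0
    while pending or waiting:
        # If nothing is waiting, idle until the next arrival.
        if not waiting and pending[0]["arrival"] > clock:
            clock = pending[0]["arrival"]
        # Move every job that has arrived by now into the queue.
        while pending and pending[0]["arrival"] <= clock:
            waiting.append(pending.pop(0))
        job = min(waiting, key=rule)
        waiting.remove(job)
        clock += job["time"]
        total += clock - job["arrival"]   # flow time = completion - arrival
    return total / len(jobs)

def fifo(job):
    return job["arrival"]

def spt(job):
    return job["time"]

# Hypothetical jobs: arrival time and processing time in arbitrary units.
jobs = [
    {"arrival": 0, "time": 5},
    {"arrival": 0, "time": 2},
    {"arrival": 1, "time": 8},
    {"arrival": 2, "time": 1},
]
fifo_result = mean_flow_time(jobs, fifo)
spt_result = mean_flow_time(jobs, spt)
```

A rule derived by genetic programming would take the place of `fifo` or `spt` as a more elaborate scoring function over job attributes.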
|
|
Mo-PS20-T1 Regular Session, Hawaii 1 |
Machine Learning I |
|
|
|
10:45-11:00, Paper Mo-PS20-T1.1 |
A Data Analysis Method Using Orthogonal Transformation in a Reproducing Kernel Hilbert Space |
|
Qu, Lingxiao | University of Aizu |
Pei, Yan | University of Aizu |
Jianqiang, Li | Beijing University of Technology |
Keywords: Machine Learning, AI and Applications, Machine Vision
Abstract: We propose a data analysis method that combines the objectives of nonlinear principal component analysis and nonlinear discriminant analysis with the kernel method in a reproducing kernel Hilbert space. This method addresses nonlinear data analysis problems in high-dimensional spaces, specifically the reproducing kernel Hilbert space, through the use of the kernel trick. Our proposed method can be considered as a semi-supervised data analysis approach. We evaluate our proposed method using various kernel functions and datasets, both visually and quantitatively. The evaluation results demonstrate that our proposal outperforms kernel principal component analysis and generalized discriminant analysis in terms of classification performance. This indicates the advantages and originality of our proposed method. Furthermore, we analyze and discuss our findings based on the evaluation results, and highlight potential areas for further research and future work related to our proposal.
|
|
11:00-11:15, Paper Mo-PS20-T1.2 |
Using SMOTE-Based Data Augmentation for Social Media Time Series Prediction |
|
Mubang, Fred | University of South Florida |
Hall, Lawrence | University of South Florida |
Keywords: Machine Learning, AI and Applications
Abstract: In the context of predicting activity on a social network, data for any individual will be limited. Also, low levels of activity for a topic of interest may make it difficult to build a strong predictive model. This work examines how augmentation by oversampling all activity data used to build a predictive model of user activity on Twitter can be used to improve the fidelity of predictions. Our features are counts of activity by de-identified users at the granularity of hours. It is shown that for some topics oversampling the data by creating new synthetic examples provides an effective way to increase the accuracy of predictions of future activity.
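The core step of SMOTE-style oversampling, creating a synthetic example by interpolating between a sample and one of its nearest neighbors, can be sketched as follows. The feature vectors here are made-up hourly activity counts, and the helper name `smote_sample` and its parameters are hypothetical; the abstract does not specify which SMOTE variant or settings the authors used.

```python
import random

def smote_sample(minority, k=2, rng=None):
    """Create one synthetic point between a sample and one of its k nearest neighbors."""
    rng = rng or random.Random(0)
    base = rng.choice(minority)
    others = [p for p in minority if p is not base]
    # Sort the remaining samples by squared Euclidean distance to the base sample.
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    neighbor = rng.choice(others[:k])
    gap = rng.random()   # interpolation factor in [0, 1)
    return [a + gap * (b - a) for a, b in zip(base, neighbor)]

# Hypothetical low-activity examples: counts of user actions in three hourly bins.
minority = [[0, 1, 3], [1, 1, 4], [0, 2, 5], [2, 0, 3]]
synthetic = smote_sample(minority)
```

Because the synthetic point lies on the segment between two real examples, its values are real-valued; for count features one might round them, which is a design choice the sketch leaves open.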
|
|
11:15-11:30, Paper Mo-PS20-T1.3 |
Training Knowledge Inheritance through Deep Q-Net |
|
Zhang, Enzhi | Hokkaido University |
Dong, Bochen | Western University |
Wahib, Mohamed | RIKEN Center for Computational Science |
Zhong, Rui | Hokkaido University |
Munetomo, Masaharu | Hokkaido University |
Keywords: Machine Learning, Deep Learning, Image Processing and Pattern Recognition
Abstract: When training neural networks, the weights of the model are updated at each optimization step, and the older weights are discarded. In this paper, we propose a method called Training Knowledge Inheritance (TKI), which uses knowledge about the progression of weight and loss data to reduce overfitting and improve generalization in the later stages of training. We reformulate the traditional gradient optimization problem as a reinforcement learning task. In particular, by treating the trainable weight space as the environment, the learning rate as the action, and the validation accuracies as the rewards, we train a Q-network (controller) to learn the discounted future validation accuracy and guide the later training of another network (worker). We conduct experiments on the MNIST, CIFAR-10, and CIFAR-100 datasets with naive dense neural networks and ResNet-56. Our results show that TKI could rediscover learning rate schedule rules similar to those in previous works, including increasing, decaying, and cyclically repeating schedules.
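The quantities named in this abstract correspond to the standard Q-learning setup, which can be written as below. This is the generic textbook form, offered as an assumption about the setting rather than the paper's exact objective: $s_t$ stands for the controller's view of the worker's training state, $a_t$ for the chosen learning rate, $r_{t+1}$ for the resulting validation accuracy, and $\gamma \in [0, 1)$ for the discount factor.

```latex
G_t = \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k+1},
\qquad
Q(s_t, a_t) \approx \mathbb{E}\left[ G_t \mid s_t, a_t \right],
\qquad
y_t = r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a').
```

Here $G_t$ is the discounted future validation accuracy that the Q-network learns to predict, and $y_t$ is the usual bootstrapped regression target for $Q(s_t, a_t)$.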
|
|
11:30-11:45, Paper Mo-PS20-T1.4 |
Mutual Learning for Pattern Recognition |
|
Chowdhury, Sabrina Tarin | Indiana University Purdue University Indianapolis |
Mukhopadhyay, Snehasis | IUPUI |
Narendra, Kumpati | Yale Univ |
Keywords: Machine Learning, Image Processing and Pattern Recognition, Neural Networks and their Applications
Abstract: Mutual learning algorithms can be an efficient mechanism for improving machine learning and neural network efficiency in a multi-agent system. Specifically, in many cases where the system cannot be trained using a big training dataset, the data exchange in the teacher-student network system can lead to efficient learning. Usually, in mutual learning algorithms, a big network plays the role of a static teacher and passes the data to smaller networks, known as student networks, to improve the efficiency of the latter. In this paper, we show that two small networks can dynamically alternate between the roles of teacher and student to share their knowledge, so that the efficiency of both networks improves simultaneously. We demonstrate the concept and the proposed mutual learning algorithm using convolutional neural networks (CNNs) to recognize the benchmark Modified National Institute of Standards and Technology (MNIST) handwriting dataset.
|
|
11:45-12:00, Paper Mo-PS20-T1.5 |
Training Minimal Complexity Support Vector Machines with Multiple Kernels |
|
Abe, Shigeo | Kobe University |
Keywords: Machine Learning
Abstract: The minimal complexity support vector machine (ML1 SVM), which is a fusion of the standard support vector machine (L1 SVM) and the minimal complexity machine (MCM), works to improve the generalization ability over the L1 SVM. In this paper, we introduce a three-kernel structure into the ML1 SVM. We call it ML1 SVM (MK). We speed up model selection by three-stage cross-validation: in the first stage, we determine the hyperparameters for the L1 SVM (MK) while fixing the hyperparameters for multiple kernels. In the second stage, we determine the hyperparameters for the multiple kernels, and in the third stage, we determine the hyperparameter for the ML1 SVM (MK) for controlling the maximum margin. We also discuss a method for accelerating the second stage of model selection. In computer experiments using two-class and multiclass problems, the ML1 SVM (MK) performed statistically comparably to, or better than, the L1 SVM (MK), ML1 SVM, and L1 SVM.
|
|
Mo-PS20-T2 Regular Session, Kona |
Machine Vision |
|
|
|
10:45-11:00, Paper Mo-PS20-T2.1 |
Continually-Adapted Margin and Multi-Anchor Distillation for Class-Incremental Learning |
|
Chen, Yi-Hsin | National Yang Ming Chiao Tung University |
Chen, Dian-Shan | National Yang Ming Chiao Tung University |
Weng, Ying-Chieh | National Yang Ming Chiao Tung University |
Peng, Wen-Hsiao | National Yang Ming Chiao Tung University |
Chiu, Wei-Chen | National Yang Ming Chiao Tung University |
Keywords: Machine Vision, Deep Learning, Representation Learning
Abstract: This paper addresses the problem of class-incremental learning. The model is trained to recognize the classes added incrementally. It thus suffers from the challenging issue of catastrophic forgetting. Stemming from the knowledge distillation idea of attempting to retain the model's knowledge on seen classes while learning the newly-added ones, we advance to further alleviate the catastrophic forgetting via our proposed multi-anchor distillation objective, which is realized by constraining the spatial relationship between the input data and the multiple class embeddings of each seen class in the feature space while training the model. Moreover, since the knowledge distillation for incremental learning generally relies on keeping a replay buffer to store the samples of seen classes, the buffer of limited size brings another issue of class imbalance: the number of samples from each seen class decreases gradually, thus being much smaller than the number of samples from each new class. We therefore propose to introduce the continually-adapted margin into the classification objective for tackling the prediction bias towards new classes caused by the class imbalance. Experiments are conducted on various datasets and settings to demonstrate the effectiveness and superior performance of our proposed techniques in comparison to several state-of-the-art baselines.
|
|
11:00-11:15, Paper Mo-PS20-T2.2 |
Visible-Infrared Features Fusion Based Object Detection |
|
Yang, Fan | University of Alberta |
Cheng, Irene | University of Alberta |
Keywords: Machine Vision, Image Processing and Pattern Recognition, Deep Learning
Abstract: Fusion techniques are frequently utilized in the realm of multimodal object detection tasks. While many current studies showcase their proficiency in generating visually pleasing fused images, only a limited number of them focused on the object detection performance. This study addresses the issue by presenting an end-to-end framework for object detection through the fusion of visible and infrared features (VIFF). Specifically, our approach involves the use of two distinct processing units that independently extract features from visible and infrared images, followed by the fusion of these features using a novel fusion strategy. While the visible feature processing unit preserves the direction of the gradient of visible images, the infrared feature processing unit focuses on extracting the contrast and semantic features of infrared images. Both features are aggregated by attention mechanisms and then fed into the backbone of the object detection networks. Our fusion network achieved superior object detection accuracy compared to existing state-of-the-art approaches on various datasets. We have also demonstrated that the proposed visible feature and infrared feature processing units are capable of enhancing the performance of various object detection models.
|
|
11:15-11:30, Paper Mo-PS20-T2.3 |
Mitigating Forgetting in Continual Learning Via Contrasting Semantically Distinct Augmentations |
|
Yu, Sheng-Feng | Macronix International Co., Ltd |
Chiu, Wei-Chen | National Yang Ming Chiao Tung University |
Keywords: Machine Vision, Representation Learning, Deep Learning
Abstract: Online continual learning (OCL) aims to enable model learning from a non-stationary data stream to continuously acquire new knowledge as well as retain the learnt one, under the constraints of having limited system size and computational cost, in which the main challenge comes from the "catastrophic forgetting" issue: the inability to retain the learnt knowledge while learning new knowledge. With the specific focus on the class-incremental OCL scenario, i.e. OCL for classification, the recent advance incorporates the contrastive learning technique for learning more generalised feature representation to achieve the state-of-the-art performance but is still unable to fully resolve the catastrophic forgetting. In this paper, we follow the strategy of adopting contrastive learning but further introduce the semantically distinct augmentation technique, which leverages strong augmentation to generate more data samples, and we show that considering these samples semantically different from their original classes (thus being related to the out-of-distribution samples) in the contrastive learning mechanism helps alleviate forgetting and facilitates model stability. Moreover, in addition to contrastive learning, the typical classification mechanism and objective (i.e. softmax classifier and cross-entropy loss) are included in our model design for utilising the label information, but particularly equipped with a sampling strategy to tackle the tendency of favouring the new classes (i.e. model bias towards the recently learnt classes). Upon conducting extensive experiments on CIFAR-10, CIFAR-100, and Mini-Imagenet datasets, our proposed method is shown to achieve superior performance against various baselines.
|
|
Mo-PS20-T3 Regular Session, Hawaii 5 |
Human Perception in Multimedia |
|
|
|
10:45-11:00, Paper Mo-PS20-T3.1 |
Generation of Height-Map Images with Desired Haptic Sensation Based on Image Features |
|
Kurita, Yuichi | Hiroshima University |
Kanemoto, Takuma | Hiroshima University |
Keywords: Haptic Systems, Kansei (sense/emotion) Engineering, Human Perception in Multimedia
Abstract: In this paper, we propose a method for generating height-map images that can achieve surface texture with the haptic sensation desired by the user. A texture plate was generated based on the height-map image, and a haptic data set was generated through questionnaire evaluation when subjects freely touched the plate. Next, we constructed a haptic estimation model using features based on image decomposition methods and confirmed that haptic sensation can be estimated from image features. Then, we set up a method to calculate evaluation values for haptic and visual impressions and set up an objective function using these values. Finally, through verification with a test set, we confirmed that minimizing the objective function generates height-map images with the desired haptic sensation.
|
|
11:00-11:15, Paper Mo-PS20-T3.2 |
Human Eyeblink Detection in the Field Using Wearable Eye-Trackers |
|
Nishizono, Ryota | NTT Communication Science Laboratories |
Saijo, Naoki | NTT Communication Science Laboratories |
Kashino, Makio | NTT Communication Science Laboratories |
Keywords: Biometrics and Applications, Human Factors, Human Perception in Multimedia
Abstract: Eyeblink dynamics, including eyeblink rate, duration, and timings, are widely recognized to reflect internal states such as cognitive and psychological states. Gaze movement strategies have been widely investigated beyond laboratory settings to infer the internal states; however, eyeblink behavior has not been well investigated. Consequently, the relationship between eyeblink dynamics and internal states under natural behavior is still unclear. Eyeblink detection algorithms for head-worn eye trackers are susceptible to extreme eye angles and illumination changes; this limits the study of eyeblink in natural behavior. This issue is addressed by a supervised-machine-learning-aided data analysis pipeline that offers a reliable baseline for eyeblink detection, even when using datasets acquired under harsh conditions. Our pipeline consists of training a deep convolutional neural network model by a transfer learning approach, prediction using the model, and efficient performance monitoring incorporating eyeblink characteristics. In the case study, we analyze eye videos recorded during formula car driving using our proposed method and the default method of an off-the-shelf eye tracker. Our method outperforms the default software in terms of estimated precision and the proportion of predicted eyeblinks whose duration falls within the range appropriate for spontaneous eyeblinks. The results suggest that our approach could serve as a valuable tool for cognitive, psychological, and human factors research by enabling high-quality eyeblink dynamics data acquisition from various eye-tracking data.
|
|
11:15-11:30, Paper Mo-PS20-T3.3 |
Multi-Sensory Visual-Auditory Fusion of Wearable Navigation Assistance for People with Impaired Vision |
|
Li, Guoxin | The Institute of Artificial Intelligence, Hefei Comprehensive Na |
Li, Zhijun | South China Univ. of Tech |
Xia, Haisheng | University of Science and Technology of China |
Feng, Ying | South China University of Technology |
Keywords: Human-Computer Interaction, Augmented Cognition, Wearable Computing
Abstract: Navigating independently is a challenge for visually impaired people due to the demands of avoiding obstacles, recognizing desired objects, and wayfinding in complicated environments. In this paper, we present augmented wearable E-Glasses with a set of sensors, where an object detection neural network based on a visual-auditory fusion method is employed to search for desired targets, thus addressing navigation challenges and improving the mobility and independence of the visually impaired. We demonstrate advanced navigation capabilities: indoor wayfinding, recognizing and steering the users to desired goals, and a sequence of indoor challenges. The fusion network adopts a feature-level fusion strategy, which is capable of aligning two modalities automatically and effectively integrating visual features and audio features. Across all experiments, the developed fusion algorithm has a 94.67% success rate. The wearable E-Glasses supply a platform that helps to improve the mobility and quality of life of people with impaired vision.
|
|
11:30-11:45, Paper Mo-PS20-T3.4 |
Audibility and Preference of Musical Instrument by People with Hearing Loss |
|
Hiraga, Rumi | Tsukuba University of Technology |
Shiraishi, Yuhki | Tsukuba University of Technology |
Yasu, Keiichi | Tsukuba University of Technology |
Keywords: Assistive Technology, Human Factors, Kansei (sense/emotion) Engineering
Abstract: Some deaf and hard-of-hearing (DHH) individuals like to "listen to" music. For them to access a wider range of music, both the preference for and the audibility of music are important. As a starting point for understanding music accessibility, we investigated the audibility and preference for 19 musical instruments using five pitches totaling 95 timbres. To examine how people listen to music daily, owing to the difficulty of having participants come to the laboratory due to the spread of COVID-19, we conducted the experiment online. Participants listened to short clips of 95 sounds and subjectively answered questions on their audibility and preferences. An analysis of their subjective answers revealed the audibility and preference differences among instruments, specifically that the recorder lacks the same audibility and preference as some of the other instruments at certain pitches. However, there was little correlation between audibility and timbre preferences. The study findings will be used in music classes in which DHH students participate and indicate that timbre audibility is necessary for building music recommendation systems for DHH individuals. We also discuss the limitations and possibilities of online experiments using sound.
|
|
Mo-PS20-T4 Regular Session, Honolulu |
Data-Driven Approaches and Machine Learning I |
|
|
|
10:45-11:00, Paper Mo-PS20-T4.1 |
An Offline Profile-Guided Optimization Strategy for Function Reordering on Relational Databases |
|
Chen, Weibin | The Chinese University of HongKong (ShenZhen) |
Chung, Yeh-Ching | The Chinese University of HongKong (ShenZhen) |
Keywords: Enterprise Information Systems, Consumer and Industrial Applications, Infrastructure Systems and Services
Abstract: Profile-guided optimization (PGO) is an advanced technique used to improve the performance of Relational Databases (RDBs). However, the most common strategy is to perform profiling on the production environment, which can lead to instability and performance loss for the Database System. To address this issue, we propose an offline profiling strategy that uses a query reduction strategy to obtain a reduced query set from the production environment's log file. We then run this sample on an offline test environment to collect profile data and use it to conduct function reorder optimization. To evaluate our approach, we compared the performance improvements achieved by running a reduced query set and a full query set on a MySQL database. We generated function call graphs for both query sets and observed that the performance improvement achieved by the reduced query set was slightly lower than that of the full query set. These results demonstrate the effectiveness of our approach and highlight the potential benefits of offline profiling strategies for improving the performance of RDBs in production environments, while also avoiding performance losses and increasing stability.
|
|
11:15-11:30, Paper Mo-PS20-T4.3 |
Cook-Gen: Robust Generative Modeling of Cooking Actions from Recipes |
|
Venkataramanan, Revathy | University of South Carolina |
Roy, Kaushik | University of South Carolina |
Raj, Kanak | University of South Carolina |
Prasad, Renjith | University of South Carolina |
Zi, Yuxin | University of South Carolina |
Narayanan, Vignesh | University of South Carolina |
Sheth, Amit | University of South Carolina |
Keywords: Decision Support Systems, Consumer and Industrial Applications
Abstract: As people become more aware of their food choices, food computation models have become increasingly popular in assisting people in maintaining healthy eating habits. For example, food recommendation systems analyze recipe instructions to assess nutritional contents and provide recipe recommendations. The recent and remarkable successes of generative AI methods, such as auto-regressive large language models, can lead to robust methods for a more comprehensive understanding of recipes for healthy food recommendations beyond surface-level nutrition content assessments. In this study, we explore the use of generative AI methods to extend current food computation models, primarily involving the analysis of nutrition and ingredients, to also incorporate cooking actions (e.g., add salt, fry the meat, boil the vegetables, etc.). Cooking actions are notoriously hard to model using statistical learning methods due to irregular data patterns - significantly varying natural language descriptions for the same action (e.g., marinate the meat vs. marinate the meat and leave overnight) and infrequently occurring patterns (e.g., add salt occurs far more frequently than marinating the meat). The prototypical approach to handling irregular data patterns is to increase the volume of data that the model ingests by orders of magnitude. Unfortunately, in the cooking domain, these problems are further compounded with larger data volumes presenting a unique challenge that is not easily handled by simply scaling up. In this work, we propose novel aggregation-based generative AI methods, Cook-Gen, that reliably generate cooking actions from recipes, despite difficulties with irregular data patterns, while also outperforming Large Language Models and other strong baselines.
|
|
Mo-PS20-T5 Regular Session, Kahuku |
Vehicles/Objects Manipulations |
|
|
|
10:45-11:00, Paper Mo-PS20-T5.1 |
Towards Intelligent Training Systems for Customer Service |
|
Song, Shuangyong | China Telecom Corporation Ltd |
Liu, Shixuan | Australian National University |
Keywords: Human-Computer Interaction, Human-centered Learning
Abstract: Customer service is very important in many industrial fields, and service quality is essential. However, customer service practitioners have a high turnover rate, and it usually takes months for a new customer service employee to become experienced. If the training of new employees is conducted by other experienced employees, there will be a high resource consumption. Therefore, intelligent training systems for customer service can be designed to replace the manual training. In this paper, we define the task of intelligent training for customer service and propose an architecture of intelligent training systems. Dialogue scripts are prepared offline, and a dialogue simulation module and a service evaluation module are separately intended for the online service training and the service quality evaluation. We evaluate state-of-the-art models with respect to the ability to provide service training, and the experimental results show that our proposed system is effective on this task.
|
|
11:00-11:15, Paper Mo-PS20-T5.2 |
Comparison of Communication Modalities: Safe and Efficient Interaction between an Automated Vehicle and a Pedestrian |
|
Hübner, Maximilian | Technical University of Munich |
Mühlbauer, Michael | Technical University of Munich |
Rettenmaier, Michael | Technical University of Munich |
Feierle, Alexander | Technical University of Munich |
Bengler, Klaus | Chair of Ergonomics, Technical University of Munich |
Keywords: Human-Machine Interface, Human Factors, Virtual and Augmented Reality Systems
Abstract: As the communication between automated vehicles and human road users becomes increasingly important, this study aimed to investigate the effect of different communication modalities on traffic safety and efficiency. A VR study was conducted wherein 37 participants interacted as pedestrians with an automated vehicle to evaluate implicit communication via lateral offset, explicit communication via an external human-machine interface, and a combination of both. The results indicate that the combined communication approach was ranked the highest in subjective evaluations, while the external human-machine interface achieved the best efficiency results. In terms of traffic safety, no differences were found between the different communication modalities. The study concludes that a combined communication approach, consisting of both implicit and explicit signals, can enable safe and efficient cooperation between traffic participants in future interactions between automated vehicles and pedestrians. However, in situations involving further agents, such as manual vehicles, clearer communication may be necessary. Overall, this study contributes to a better understanding of communication strategies for automated vehicles and provides insights for future research.
|
|
11:30-11:45, Paper Mo-PS20-T5.4 |
Quantification of Motion Gracefulness Focused on Knees and Hips |
|
Yoneda, Ryo | Osaka Institute of Technology |
Ueda, Etsuko | National Institute of Technology, Kagoshima College |
Keywords: Human Performance Modeling, Human Factors
Abstract: One of the adjectives used to describe human behavior is "Gracefulness". Graceful motion can make a good impression on people. We have proposed an evaluation formula for "Gracefulness" by focusing on "Graceful motion" defined by William Hogarth and analyzing the trajectory of the arm. However, this evaluation formula is difficult to apply to whole-body movements. Therefore, in this report, in addition to the evaluation of the trajectory of the hand, we analyzed the periodicity and frequency components of movements focusing on the parts of the body other than the arm. The results of these analyses were compared with the impressions of gracefulness obtained from a questionnaire survey.
|
|
Mo-PS20-T6 Regular Session, Oahu |
Human Collaborative Robotics |
|
|
|
10:45-11:00, Paper Mo-PS20-T6.1 |
Evaluation of On-Robot Depth Sensors for Industrial Robotics |
|
Adamides, Odysseus | Rochester Institute of Technology |
Avery, Alexander | Rochester Institute of Technology |
Subramanian, Karthik | Rochester Institute of Technology |
Sahin, Ferat | Rochester Institute of Technology |
Keywords: Human-Collaborative Robotics, Systems Safety and Security
Abstract: This work evaluates a Continuous Wave (CW) Time-of-Flight (ToF) camera, a Stereoscopic camera, and a LiDAR to determine if they are potential candidates for point-rich on-robot sensing in speed and separation monitoring (SSM) applications. These experiments characterize the static and dynamic behaviors of the sensors while mounted on-robot. From these tests, it was found that ToF and Stereo cameras exhibit better performance than their more expensive LiDAR counterpart. Specifically, it was observed that the ToF camera demonstrated better depth accuracy while the Stereo camera generated better 3D reconstruction accuracy. Overall, ToF and Stereo cameras demonstrate that with continued innovation and integration, these sensors could become the building blocks of point-rich on-robot SSM.
|
|
11:00-11:15, Paper Mo-PS20-T6.2 |
A Mixed Reality System for Interaction with Heterogeneous Robotic Systems |
|
Villani, Valeria | University of Modena and Reggio Emilia |
Capelli, Beatrice | University of Modena and Reggio Emilia |
Sabattini, Lorenzo | University of Modena and Reggio Emilia |
Keywords: User Interface Design, Virtual and Augmented Reality Systems, Human-Collaborative Robotics
Abstract: The growing spread of robots for service and industrial purposes calls for versatile, intuitive and portable interaction approaches. In particular, in industrial environments, operators should be able to interact with robots in a fast, effective, and possibly effortless manner. To this end, reality enhancement techniques have been used to achieve efficient management and simplify interactions, in particular in manufacturing and logistics processes. Building upon this, in this paper we propose a system based on mixed reality that allows a ubiquitous interface for heterogeneous robotic systems in dynamic scenarios, where users are involved in different tasks and need to interact with different robots. By means of mixed reality, users can interact with a robot through manipulation of its virtual replica, which is always colocated with the user and is extracted when interaction is needed. The system has been tested in a simulated intralogistics setting, where different robots are present and require sporadic intervention by human operators, who are involved in other tasks. In our setting we consider the presence of drones and AGVs with different levels of autonomy, calling for different user interventions. The proposed approach has been validated in virtual reality, considering quantitative and qualitative assessment of performance and user's feedback.
|
|
11:15-11:30, Paper Mo-PS20-T6.3 |
Co-Operation of a Dual-Arm Robotic Avatar through Body Integration of Multi-Person |
|
Sato, Tsugumi | Nagoya Institute of Technology |
Yukawa, Hikari | Nagoya Institute of Technology |
Minamizawa, Kouta | Keio University Graduate School of Media Design |
Tanaka, Yoshihiro | Nagoya Institute of Technology |
Keywords: Human-Collaborative Robotics, Human-Machine Cooperation and Systems, Human-Machine Interaction
Abstract: Humans have spatiotemporal limitations. Robotic avatar technology enables the expansion of an individual’s innate physical characteristics, physical capabilities, and perceptual and cognitive abilities. Our research group has focused on scalability and co-creativity through co-operation and collaboration between humans, and has developed a system in which two operators are physically integrated into a single robotic arm. In particular, to enable more human-like tasks, such as receiving and delivering, we extended the system to a dual-arm robotic avatar and built a system in which multiple persons can integrate their bodies, with the expectation of further task efficiency and human extension. Specifically, we investigated the workability and operator cognition of the dual-arm robotic avatar when it was operated by one, two, or three individuals. The results showed that stability improved with the three-person operation. Furthermore, it was found that co-operation among multiple persons can overcome the original individual abilities for operation. In the three-person condition, all participants obtained a higher feeling of control than the actual amount of manipulation, which has potential applications in skill transfer. The three-person condition tended to be better, validating the usefulness of multi-person body integration.
|
|
11:45-12:00, Paper Mo-PS20-T6.5 |
Human-Robot Interaction and Collaboration Utilizing Voluntary Bimanual Coordination |
|
Huang, Shouren | Tokyo University of Science |
Cao, Yongpeng | The University of Tokyo |
Murakami, Kenichi | The University of Tokyo |
Ishikawa, Masatoshi | Tokyo University of Science |
Yamakawa, Yuji | The University of Tokyo |
Keywords: Human-Machine Interaction, Human-Collaborative Robotics, Human-Machine Interface
Abstract: In daily life, we realize various complex tasks with our two upper limbs based on the so-called bimanual coordination phenomenon. A fundamental feature of bimanual coordination is the natural tendency to synchronize the motion of two upper limbs, resulting in some preferred patterns of interlimb coordination. In this study, based on the coarse-to-fine human-robot collaboration framework, we investigate the possibility of realizing human-robot interaction and collaboration for accurate manipulation utilizing voluntary bimanual coordination. The practical motivation for utilizing voluntary bimanual coordination, which can be perceived as an indirect way of implementing interlimb transmission of force feedback information, is to avoid the adverse effect that the counterforce would have on force feedback presentation if the human-robot collaboration were realized in a unimanual manner. For experimental studies, we first evaluated the synchronous protocols in terms of in-phase and anti-phase between two arms. Based on the studied protocol, moving target tracking was then demonstrated as an application scenario of the proposed human-robot collaboration utilizing voluntary bimanual coordination.
|
|
Mo-PS20-T7 Special Session, Hawaii 2 |
Quantum Control and Its Applications |
|
|
Co-Chair: Dong, Daoyi | University of New South Wales |
Organizer: Pan, Yu | Zhejiang University |
Organizer: Dong, Daoyi | Australian National University |
Organizer: Chen, Chunlin | Nanjing University |
Organizer: Cui, Wei | South China University of Technology |
|
10:45-11:00, Paper Mo-PS20-T7.1 |
Guided Reward Design in Continuous Reinforcement Learning for Quantum Control (I) |
|
Zhou, Shumin | University of Science and Technology of China |
Ma, Hailan | University of New South Wales |
Kuang, Sen | University of Science and Technology of China |
Dong, Daoyi | University of New South Wales |
Keywords: Quantum Cybernetics, Deep Learning
Abstract: Reinforcement learning has been intensively applied to tackle complex quantum control problems owing to its adaptability in dynamic environments. However, commonly used reinforcement learning algorithms are restricted to selecting actions from a discrete action space, which may result in inaccurate control. In order to surpass this constraint, we propose an improved continuous reinforcement learning algorithm that can generate control policies in a continuous action space, enabling precise control of quantum systems. Moreover, a guided reward function design method is proposed to guide the learning process toward higher fidelity. Numerical results demonstrate that the proposed continuous reinforcement learning algorithm with a guided reward function can capably prepare states on one-qubit and two-qubit systems.
|
|
11:15-11:30, Paper Mo-PS20-T7.3 |
Learning Quantum Distributions Based on Normalizing Flow (I) |
|
Li, Li | Tongji University |
Wang, Yong | Tongji University |
Cheng, Shuming | Tongji University |
Liu, Lijun | Shanxi Normal University |
Keywords: Quantum Cybernetics, Quantum Machine Learning
Abstract: Learning many-body quantum systems is of fundamental importance in quantum information processing; however, it is a challenging task which typically requires estimating quantum distributions whose dimensionality scales exponentially with the system size. As generative models have shown great scalability in learning high-dimensional distributions and found wide applications in the domain of image and text, they can be a powerful tool to help accomplish these challenging quantum tasks. In this work, we propose using normalizing flow (NF) models with fast sampling to learn discrete quantum distributions for quantum state tomography. Particularly, three NF models, including denoising flow, argmax flow, and tree flow, are first adapted to the task of explicit quantum probability density estimation. We then perform extensive experiments on a large scale of quantum systems, and our numerical results demonstrate that these discrete NFs admit an excellent sampling efficiency in the sense that they are insensitive to the system size to learn the high-dimensional quantum distributions, without compromising the learning performance. Finally, in comparison to other generative models, such as autoregressive models, the NFs avoid the problem of slow sequential sampling.
|
|
Mo-PS20-T8 Special Session, Hawaii 3 |
Theory and Applications of Distributed Intelligent Methods |
|
|
Chair: Senkerik, Roman | Tomas Bata University in Zlin |
Organizer: Kromer, Pavel | VSB-Technical University of Ostrava |
Organizer: Senkerik, Roman | Tomas Bata University in Zlin |
|
10:45-11:00, Paper Mo-PS20-T8.1 |
Investigating the Potential of AI-Driven Innovations for Enhancing Differential Evolution in Optimization Tasks (I) |
|
Pluhacek, Michal | Tomas Bata University in Zlin |
Kazikova, Anezka | Tomas Bata University in Zlin |
Viktorin, Adam | Tomas Bata University in Zlin |
Kadavy, Tomas | Tomas Bata University in Zlin |
Senkerik, Roman | Tomas Bata University in Zlin |
Keywords: Evolutionary Computation, Application of Artificial Intelligence, Computational Intelligence
Abstract: In recent years, artificial intelligence (AI) and machine learning have demonstrated remarkable potential in various application domains, including optimization. This study investigates the process of leveraging AI, particularly large language models (LLMs), to enhance the performance of metaheuristics, with a focus on the well-established Differential Evolution (DE) algorithm. We employ GPT, a state-of-the-art LLM, to propose an improved mutation strategy based on a dynamic switching mechanism, which is then integrated into the DE algorithm. Throughout the investigation, we also observe and analyze any errors or limitations the LLM might exhibit. We conduct extensive experiments on a comprehensive set of 30 benchmark functions, comparing the performance of the proposed AI-inspired strategy with the standard DE algorithm. The results suggest that the AI-driven dynamic switching mutation strategy provides a competitive edge in terms of solution quality, showcasing the potential of using AI to guide the development of improved optimization algorithms. This work not only highlights the effectiveness of the proposed strategy but also contributes to the understanding of the process of using LLMs for enhancing metaheuristics and the challenges involved therein.
|
|
11:00-11:15, Paper Mo-PS20-T8.2 |
Designing and Predicting the Performance of Agent-Based Models for Solving Best-Of-N (I) |
|
Jain, Puneet | Brigham Young University, Provo |
Goodrich, Michael | Brigham Young University |
Keywords: Agent-Based Modeling, Swarm Intelligence, Optimization and Self-Organization Approaches
Abstract: Biological inspiration from honeybees, insects, and other animals has been used to create interesting implementations of multi-robot swarms. When the robots in a swarm are completely distributed, that is they lack any form of centralized control, the swarm acts as an agent-based model (ABM) wherein each agent implements its own controller and collective behavior emerges from the interactions between agents. Differential equation and graph-based models of some types of swarms have been used to guarantee collective behavior, but guaranteeing or predicting outcomes for hub-based agent colonies with finite numbers of robots remains an open problem. This paper presents a case study of designing an agent-based, hub-based swarm that solves the best-of-N problem with predictable success rates and completion times. The key innovation is modifying a tripartite graph formulation (TGF) from previous work so that it acts as a graph schema which abstracts an ABM into a simplified four state model, which in turn leads to a large discrete time Markov chain (DTMC) that describes how the collective state evolves over time. The DTMC can be used to compute success rates and completion times, which act as predictions for the ABM. Deviations between observed ABM outcomes and DTMC predictions lead to modifications in the ABM so that the swarm becomes more predictable.
|
|
11:15-11:30, Paper Mo-PS20-T8.3 |
Efficient Time-Delay System Optimization with Auto-Configured Metaheuristics (I) |
|
Senkerik, Roman | Tomas Bata University in Zlin |
Kadavy, Tomas | Tomas Bata University in Zlin |
Viktorin, Adam | Tomas Bata University in Zlin |
Janku, Peter | Tomas Bata University in Zlin |
Pluhacek, Michal | Tomas Bata University in Zlin |
Kominkova Oplatkova, Zuzana | Tomas Bata University in Zlin, Faculty of Applied Informatics |
Guzowski, Hubert | AGH University of Science and Technology |
Smolka, Maciej | AGH University of Science and Technology |
Byrski, Aleksander | AGH University of Science and Technology |
Pekar, Libor | Tomas Bata University in Zlin |
Matusu, Radek | Tomas Bata University in Zlin |
Keywords: Evolutionary Computation, Metaheuristic Algorithms, Application of Artificial Intelligence
Abstract: This paper presents an experimental study that compares the performance of four selected metaheuristic algorithms for optimizing a time-delay system model. Time-delay system models are complex and challenging to optimize due to their inherent characteristics, such as non-linearity, multi-modality, and constraints. The study includes an explanation of the choice and core functionality of the selected algorithms: both the baseline and a state-of-the-art variant of the self-organizing migrating algorithm (SOMA), a state-of-the-art variant from the Success-History-based Adaptive Differential Evolution family of algorithms with emphasis on diverse search (the DISH algorithm), and the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) algorithm. The hyperparameters of the metaheuristic algorithms were set using the iRace automatic algorithm configuration framework. The paper emphasizes the importance of metaheuristic algorithms in control engineering for time-delay systems to develop more effective and efficient control strategies and precise model identifications. The experimental results highlight the effectiveness of the state-of-the-art algorithms with specific adaptive mechanisms, such as the population organization process, diverse search, and adaptation mechanisms ensuring a gradual transition from exploration to exploitation. Overall, this study contributes to understanding the challenges and advantages of using metaheuristic algorithms in control engineering for time-delay systems. The results provide valuable insights into the performance of modern metaheuristic algorithms and can help guide the selection of appropriate adaptive mechanisms of metaheuristics.
|
|
11:30-11:45, Paper Mo-PS20-T8.4 |
Pneumatic Assist Suit to Facilitate Lower-Body Twisting for the Training of Forehand in Table Tennis (I) |
|
Kashiwagi, Akihiko | Kyushu University |
Kiguchi, Kazuo | Kyushu University |
Nishikawa, Satoshi | Kyushu University |
Keywords: Cyborgs
Abstract: Engineering support for sports has the potential to improve training performance. Most research on sports training support has focused on support in areas that directly affect the point of force action. On the other hand, even in sports with hand-held tools, the importance of lower body motor function has been indicated. This suggests that supporting areas away from the point of force action can improve the performance of novice players. Therefore, in this study, we developed an assist suit that facilitates lower body twisting for table tennis beginners during the forehand swing. Using this assist suit, experiments were conducted for the experimental group under the following four conditions: (1) "No wear (before)," (2) "Without assist," (3) "With assist," and (4) "No wear (after)," to clarify the effects of wearing, assist, and training. In addition, a control group in which participants hit the ball without the assist suit was set up to examine the training effect of repetition. The results showed that both the amount of waist rotation and racket velocity increased significantly in the experimental group compared to the control group, which indicates the effect of the assist suit on training.
|
|
Mo-PS20-T12 Workshop Session, Hawaii 4 |
Workshop 4.1 - Workshop on Neuroergonomics |
|
|
Organizer: Barresi, Giacinto | Istituto Italiano Di Tecnologia |
Organizer: Fortino, Giancarlo | University of Calabria |
|
10:45-11:00, Paper Mo-PS20-T12.1 |
Patient’s Data and Models in Neuroergonomics for Healthcare |
|
Barresi, Giacinto | Istituto Italiano di Tecnologia |
Fortino, Giancarlo | University of Calabria |
|
11:00-11:15, Paper Mo-PS20-T12.2 |
Piezoelectric Skin Compliant Transducers for Health Monitoring (I) |
|
Mastronardi, Vincenzo | Università del Salento |
de Marzo, Gaia | Istituto Italiano di Tecnologia |
Fachechi, Luca | Istituto Italiano di Tecnologia |
Rizzi, Francesco | Istituto Italiano di Tecnologia |
Demir, Suleyman Mahircan | Istituto Italiano di Tecnologia |
Shumba, Angela Tafadzawa | Università del Salento |
Antonaci, Valentina | Università del Salento |
Stomeo, Tiziana | Istituto Italiano di Tecnologia |
Visconti, Paolo | Università del Salento |
De Vittorio, Massimo | Istituto Italiano di Tecnologia |
|
11:15-11:30, Paper Mo-PS20-T12.3 |
Implantable CMOS Neuroelectronic Probes for Brain Machine Interfaces (I) |
|
Ribeiro, Joao Felipe | Istituto Italiano di Tecnologia |
Vincenzi, Matteo | Istituto Italiano di Tecnologia |
Perna, Alberto | Istituto Italiano di Tecnologia |
Orban, Gabor | Istituto Italiano di Tecnologia |
Stubbendorff, Christine | Istituto Italiano di Tecnologia |
Angotzi, Gian Nicola | Istituto Italiano di Tecnologia |
Berdondini, Luca | Fondazione Istituto Italiano di Tecnologia (IIT) |
|
11:30-11:45, Paper Mo-PS20-T12.4 |
Haptics in Surgical Robotics (I) |
|
Berkelman, Peter | University of Hawaii |
Keywords: Haptic Systems, Human-Machine Interaction
Abstract: We experience haptic feedback during countless daily manual tasks which involve grasping, cutting, and manipulation of all kinds of objects and materials, either through direct contact with the hand and fingers, or mediated through handheld tools and instruments. Due to our experience and skill in these haptic tasks, it would be reasonable to surmise that teleoperated robotic minimally invasive surgery (RMIS) systems such as the well-known da Vinci from Intuitive would benefit from inclusion of haptic force feedback to the user. Yet despite extensive research and development in the areas of tool-tissue interaction force sensing and haptic feedback to teleoperators, the adoption of haptic feedback in RMIS systems remains limited. This lack of haptic feedback in most RMIS procedures may be due to many different reasons, which will be presented in detail. To add haptic feedback to a teleoperated robotic system, the control console must incorporate an actuated haptic interface device to present instrument and tissue interaction forces to the surgeon operator. The robotic instruments must include a means of sensing interaction forces, such as a multi-axis strain gauge sensor. In the case of RMIS, it is a technical challenge to provide useful, accurate, reliable and biocompatible force sensing at the end of surgical instruments. Furthermore, experienced surgeons are accustomed to the lack of haptic feedback in manual MIS when passing laparoscopic instruments through minimally invasive incisions and trocars, and may find added force feedback to be unnatural or a distraction.
|
|
11:45-12:00, Paper Mo-PS20-T12.5 |
Supporting Human Situational Awareness in Swarm Tasks (I) |
|
Hussein, Aya | University of New South Wales-Canberra |
|
Mo-PS30WS1 Workshop Session, Puna |
Active BCIs and Applications |
|
|
Co-Chair: Floreani, Erica Danielle | University of Toronto |
|
13:00-13:15, Paper Mo-PS30WS1.1 |
Development and Validation of a BCI-Enabled Boccia Ramp for Sport Participation |
|
Comaduran Marquez, Daniel | University of Calgary |
Kerr McNutt, Morgan | University of Calgary |
Lillywhite, Brielle | University of Calgary |
Robu, Ion | Alberta Children's Hospital |
Irvine, Brian | University of Calgary |
Zewdie, Ephrem | University of Calgary |
Kirton, Adam | University of Calgary |
Kinney-Lang, Eli | University of Calgary |
Keywords: Brain-Computer Interfaces, Assistive Technology
Abstract: We present a brain-computer interface (BCI) system designed to enable individuals with severe motor disabilities to play Boccia, a Paralympic sport. Boccia is a precision sport in which the objective is to get a ball as close as possible to a target. In its most adapted form, Boccia allows for the use of a ramp to assist the user. The proposed system consists of a BCI-enabled ramp that can be controlled by the user's brain signals using a visual control paradigm (i.e., P300, or SSVEP). We developed a software interface using custom tools in Unity and Python for the front-end and back-end, respectively. To validate the software, we tested the system with five subjects who performed six pipelines (three with P300 and three with SSVEP) to simulate real-world use. Each pipeline consisted of 10 guided selections in the software. The classifiers used Riemannian geometry and shrinkage linear discriminant analysis (sLDA) for P300 and canonical correlation analysis (CCA) for SSVEP. The results showed that the P300 (93 +/- 3%, mean +/- SEM) paradigm had higher classification accuracy than the SSVEP (27 +/- 0.02%, mean +/- SEM) paradigm. Additionally, we designed and built a 3D CAD model and a hardware prototype of the ramp. The hardware prototype uses linear actuators to change the incline of the ramp and the height of the ball. Stepper motors allow for the rotation of the ramp and the release mechanism of the ball. Recommendations on improvements to the hardware and software components are made for future prototypes. The presented system opens new possibilities for sports applications that can improve the quality of life of people with severe motor disabilities.
|
|
13:15-13:30, Paper Mo-PS30WS1.2 |
Investigating the Influence of Background Music on the Performance of a cVEP-Based BCI |
|
Henke, Lisa | Rhine-Waal University of Applied Sciences |
Rulffs, Paul | Rhine-Waal University of Applied Sciences |
Adepoju, Foluke | Rhine-Waal University of Applied Sciences, Faculty of Technology |
Stawicki, Piotr | Rhine-Waal University of Applied Sciences |
Cantürk, Atilla | Rhine-Waal University of Applied Sciences |
Volosyak, Ivan | Rhine-Waal University of Applied Sciences |
Keywords: Active BMIs, Passive BMIs, Other Neurotechnology and Brain-Related Topics
Abstract: Brain-computer interfaces (BCIs), such as the different types of EEG-based BCI spellers (to date the most common BCI applications), allow new methods of control and interaction with computers and machines. The impact of background music and noise on the user’s performance (information transfer rate, ITR, and accuracy) has already been investigated for different types of spellers, and usually led to a distracting, decreasing effect on the BCI performance, though there is always room for improving the subjective user experience. In this study, 11 participants used a cVEP-based BCI to perform two spelling tasks, “BCI AND MUSIC” and “CONTRARY”, while listening via headphones to a standard instrumental song, a self-chosen instrumental song (“own music”), or to noise conditions: “no music” and “white noise”. Objective factors such as blood pressure and heart rate were measured after each spelling task, and questionnaires were answered. The BCI accuracy was close to 100% in most cases, while the lowest accuracy was reached while listening to the individually selected, self-chosen music (“own music”). Similarly, the ITR was highest in the case of “no music” and lowest in the case of “own music” (which, on the other hand, resulted in a better mood and higher excitement of the participants). Thus, while the general condition of “no music” could be recommended for further BCI experiments, additional research regarding individual factors such as musicality or listening habits of BCI participants should be considered.
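For readers unfamiliar with the ITR measure mentioned above, the sketch below computes the commonly used Wolpaw information transfer rate. The abstract does not state which ITR definition the study used, so this formula is an assumption, and the number of targets and selection time in the example are hypothetical.

```python
import math


def wolpaw_itr(n_targets, accuracy, seconds_per_selection):
    """Wolpaw information transfer rate in bits per minute.

    n_targets: number of selectable targets (N >= 2)
    accuracy: classification accuracy P in [0, 1]
    seconds_per_selection: average time needed for one selection
    """
    if n_targets < 2:
        raise ValueError("n_targets must be at least 2")
    p = accuracy
    bits = math.log2(n_targets)
    if 0.0 < p < 1.0:
        bits += p * math.log2(p) + (1.0 - p) * math.log2((1.0 - p) / (n_targets - 1))
    elif p == 0.0:
        # Limit of the formula as P -> 0 (the P*log2(P) term vanishes).
        bits += math.log2(1.0 / (n_targets - 1))
    # For P == 1 both correction terms vanish, leaving log2(N).
    return bits * 60.0 / seconds_per_selection


# Hypothetical speller: 32 targets, 95% accuracy, 2 s per selection.
print(round(wolpaw_itr(32, 0.95, 2.0), 1), "bits/min")
```

The formula shows why lower accuracy reduces ITR even when the selection time is unchanged.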
|
|
13:30-13:45, Paper Mo-PS30WS1.3 |
Comparing the Effect of Different Electrode Subsets on P300 Speller Performance (I) |
|
Noble, Sandra-Carina | Maynooth University |
Ward, Tomas | Dublin City University |
Ringwood, John | Maynooth University |
Keywords: Active BMIs, Passive BMIs, Other Neurotechnology and Brain-Related Topics
Abstract: The P300 speller is a widely used application in brain-computer interface research. It has been demonstrated that the P300 speller can serve as a neurofeedback training tool for attention enhancement by gradually increasing the difficulty of the spelling task. This adaptive approach makes it harder for users to spell words correctly, encouraging them to improve their attention to counteract the increasing difficulty. Therefore, the adaptive P300 speller has the potential to serve as a treatment option for children with ADHD and elderly patients with dementia, and as a cognitive enhancement tool for healthy adults. However, the training session, including setup time, needs to be short to ensure user acceptability. This study investigates the effect of different electrode subsets on P300 speller performance, with and without the use of the xDAWN spatial filter. Results indicate that the xDAWN spatial filter can improve performance with many electrodes but can degrade it with fewer than eight electrodes. For scenarios where near-perfect performance is crucial and many electrodes are available, a set of 16 electrodes with the xDAWN spatial filter is recommended. For situations where cost and setup time are a concern and lower performance is acceptable, using six electrodes without the spatial filter was found to be sufficient.
|
|
13:45-14:00, Paper Mo-PS30WS1.4 |
Channel Selection Improves Accuracy for Pediatric Users of Motor Imagery Brain-Computer Interfaces |
|
Irvine, Brian | University of Calgary |
Kinney-Lang, Eli | University of Calgary |
Maalouf, Elissa | University of Calgary |
Dowlatabadibazaz, Maziyar | University of British Columbia |
Kelly, Dion | University of Calgary |
Keough, Joanna | University of Calgary |
Kirton, Adam | University of Calgary |
Abou-Zeid, Hatem | University of Calgary |
Keywords: Active BMIs, Passive BMIs
Abstract: Children are an under-served population in the field of brain-computer interface (BCI) development. The high prevalence of lifelong disability coupled with the diversity and plasticity of children's brains make them ideal candidates for personalized BCI systems. Channel selection methods provide a tool for the in-session personalization of BCI systems. To evaluate the efficacy of channel selection for pediatric users, we tested four wrapper-based channel selection algorithms, sequential forward selection (SFS), sequential backward selection (SBS), sequential forward floating selection (SFFS), and sequential backward floating selection (SBFS) on offline motor imagery BCI data from three datasets involving typically developing children. The purpose was to assess the performance benefits and computational costs of each algorithm. All algorithms provided classification accuracy gains of 10-15 % with their optimal subsets. The time required to reach the optimal subsets varied between algorithms, but all took less than 80 s with mean completion times of 9.5 s and 35.8 s for the fastest (SFS) and slowest (SFFS), respectively. Adjusting the stopping criterion of the algorithm enables users to further reduce computation time with a disproportionately small effect on classification accuracy. All methods demonstrated an ability to prioritize expected physiological regions of interest and leave out channels detrimental to the classifier. Channel selection offers personalization of the BCI system for a specific user and a specific classifier. These findings emphasize the value of using personalized channel selection algorithms to improve motor imagery BCI systems for pediatric users.
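The simplest of the four wrapper methods named above, sequential forward selection, can be sketched as follows. The `score` callback stands in for a cross-validated classifier accuracy, and the `min_gain` stopping criterion, the channel names, and the toy scoring function are hypothetical illustrations rather than the authors' implementation.

```python
def sequential_forward_selection(channels, score, min_gain=0.0):
    """Greedy wrapper-based channel selection.

    Starting from an empty set, repeatedly add the channel that gives the
    highest score, and stop when the improvement is at most `min_gain`.
    """
    selected = []
    best_score = float("-inf")
    remaining = list(channels)
    while remaining:
        # Evaluate every candidate subset formed by adding one more channel.
        candidates = [(score(selected + [ch]), ch) for ch in remaining]
        top_score, top_channel = max(candidates, key=lambda pair: pair[0])
        if top_score - best_score <= min_gain:
            break  # stopping criterion: no sufficient improvement
        selected.append(top_channel)
        remaining.remove(top_channel)
        best_score = top_score
    return selected, best_score


# Hypothetical scoring function: sensorimotor channels help, others hurt.
USEFUL = {"C3", "Cz", "C4"}


def toy_score(subset):
    helpful = sum(1 for ch in subset if ch in USEFUL)
    harmful = len(subset) - helpful
    return 0.5 + 0.1 * helpful - 0.05 * harmful


chosen, accuracy = sequential_forward_selection(["Fz", "C3", "Cz", "C4", "Pz"], toy_score)
print(chosen, accuracy)
```

Raising `min_gain` makes the search stop earlier, trading some accuracy for less computation. This is the kind of trade-off the abstract attributes to adjusting the stopping criterion.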
|
|
14:00-14:15, Paper Mo-PS30WS1.5 |
Analysis of Different Stimulus for Evoking the ErrP Potential in a MI-BMI for Starting the Gait with a Lower-Limb Exoskeleton |
|
Soriano-Segura, Paula | Miguel Hernandez University of Elche |
Ferrero, Laura | Miguel Hernandez University of Elche |
Gracia, Desirée I. | Miguel Hernandez University of Elche |
Ortiz, Mario | Universidad Miguel Hernández |
Iáñez, Eduardo | Miguel Hernández University of Elche |
Azorin, Jose M. | Universidad Miguel Hernandez De Elche |
Keywords: Brain-Computer Interfaces
Abstract: A new approach that includes the detection of Error Related Potentials (ErrP) for self-tuning wrong commands in MI-BMI, with the aim of improving the accuracy of a lower limb exoskeleton gait initiation system, is currently in its early stages of development. Due to the requirement of warning the subject before the command is executed, a different type of stimulus must be used to evoke the ErrP in cases where the exoskeleton is about to move against the subject's will. Therefore, it is essential to research a feedback type that better differentiates the ErrP from the correct cases, in order to achieve efficient performance of the BMI. As such, we have analyzed both Tactile (T) and VisuoTactile (VT) feedbacks to not only verify the realism of the designed protocol, but also to examine their effectiveness in eliciting ErrP.
|
|
Mo-PS30-T1 Regular Session, Hawaii 1 |
Machine Learning II |
|
|
|
13:00-13:15, Paper Mo-PS30-T1.1 |
Tree-Structured Gaussian Mixture Models and Their Variational Inference |
|
Nakahara, Yuta | Waseda University |
Keywords: Machine Learning
Abstract: In this paper, Gaussian mixture models with tree structure and their variational inference methods are proposed for non-parametric Bayesian clustering. In this model, the tree structure is included as an unobservable random variable. The number of leaf nodes corresponds to the number of mixture components. This model is expected to capture not only the number of clusters but also tree structure from data.
|
|
13:15-13:30, Paper Mo-PS30-T1.2 |
Causal Deep Operator Networks for Data-Driven Modeling of Dynamical Systems |
|
Nghiem, Truong X. | Northern Arizona University |
Nguyen, Thang | Texas A&M University - Corpus Christi |
Nguyen, Binh | Texas A&M University - Corpus Christi |
Nguyen, Linh | Federation University Australia |
Keywords: Machine Learning, Deep Learning, Neural Networks and their Applications
Abstract: The deep operator network (DeepONet) architecture is a promising approach for learning functional operators that can represent dynamical systems described by ordinary or partial differential equations. However, it has two major limitations, namely its failure to account for initial conditions and its failure to guarantee temporal causality, a fundamental property of dynamical systems. This paper proposes a novel causal deep operator network (Causal-DeepONet) architecture for incorporating both the initial condition and the temporal causality into data-driven learning of dynamical systems, overcoming the limitations of the original DeepONet approach. This is achieved by adding an independent root network for the initial condition and independent branch networks conditioned, or switched on/off, by time-shifted step functions or sigmoid functions for expressing the temporal causality. The proposed architecture was evaluated and compared with two baseline deep neural network methods and the original DeepONet method on learning the thermal dynamics of a room in a building using real data. It was shown not only to achieve the best overall prediction accuracy but also to substantially improve the consistency of accuracy in multistep predictions, which is crucial for predictive control.
|
|
13:30-13:45, Paper Mo-PS30-T1.3 |
A Post-Selection Algorithm for Improving Dynamic Ensemble Selection Methods |
|
Cordeiro, Paulo Roger Gomes | Instituto Federal De Pernambuco |
Cavalcanti, George | Universidade Federal De Pernambuco |
Menelau Oliveira e Cruz, Rafael | École |
Keywords: Machine Learning, Computational Intelligence
Abstract: Dynamic Ensemble Selection (DES) is a Multiple Classifier Systems (MCS) approach that aims to select an ensemble for each query sample during the selection phase. Even with the proposal of several DES approaches, no particular DES technique is the best choice for different problems. Thus, we hypothesize that selecting the best DES approach per query instance can lead to better accuracy. To evaluate this idea, we introduce the Post-Selection Dynamic Ensemble Selection (PS-DES) approach, a post-selection scheme that evaluates ensembles selected by several DES techniques using different metrics. Experimental results show that using accuracy as a metric to select the ensembles, PS-DES performs better than individual DES techniques.
|
|
13:45-14:00, Paper Mo-PS30-T1.4 |
Preference-Based Multi-Objective Optimization with Gaussian Process |
|
Huang, Tian | University of Electronic Science and Technology of China |
Li, Ke | University of Exeter |
Keywords: Machine Learning, Evolutionary Computation
Abstract: Traditional evolutionary multi-objective optimization (EMO) algorithms aim to generate a set of non-dominated solutions on the Pareto front (PF). However, this technique falls short of delivering the desired outcomes for multi-objective optimization problems (MOPs) that involve user preferences. In this paper, we present a novel EMO algorithm that incorporates user preferences via a decision maker (DM). Our approach comprises three modules: consultation, preference elicitation and optimization. The DM undertakes the consultation and preference elicitation using a Gaussian process (GP) to provide preference information. We employ a decomposition-based EMO algorithm (i.e., MOEA/D) for optimization. The experiment comprises two sessions. Firstly, we simulate the decision maker module with the GP. Secondly, we simulate our proposed method and compare its performance with existing interactive optimization algorithms. Our research proposes a new preference-based EMO algorithm that addresses the shortcomings of traditional techniques and unlocks new possibilities for multi-objective optimization.
|
|
Mo-PS30-T2 Regular Session, Kona |
Evolutionary Computation I |
|
|
|
13:00-13:15, Paper Mo-PS30-T2.1 |
Two-Stage Lazy Greedy Inclusion Hypervolume Subset Selection for Large-Scale Problem |
|
Nan, Yang | Southern University of Science and Technology |
Shu, Tianye | Southern University of Science and Technology |
Ishibuchi, Hisao | Southern University of Science and Technology |
Keywords: Evolutionary Computation
Abstract: Hypervolume subset selection (HSS) is a hot topic in the evolutionary multi-objective optimization (EMO) community since hypervolume is the most widely-used performance indicator. In the literature, most HSS algorithms were designed for small-scale HSS (e.g., environmental selection: select N solutions from 2N solutions where N is the population size). Few researchers focus on large-scale HSS as a post-processing procedure in an unbounded external archive framework (i.e., subset selection from all examined solutions). In this paper, we propose a two-stage lazy greedy inclusion HSS (TGI-HSS) algorithm for large-scale HSS. In the first stage of TGI-HSS, a small solution set is selected from a large-scale candidate set using an efficient subset selection method (which is not based on exact hypervolume calculation). In the second stage, the final subset is selected from the small solution set using an existing efficient HSS algorithm. Experimental results show that the computational time can be significantly reduced by the proposed algorithm in comparison with other state-of-the-art HSS algorithms at the cost of only a small deterioration of the selected subset quality.
|
|
13:15-13:30, Paper Mo-PS30-T2.2 |
Feature Selection Using Evolutionary Techniques |
|
Gholamigazafrudy, Mandana | University of Regina |
Mouhoub, Malek | University of Regina |
Sadaoui, Samira | University of Regina |
Keywords: Evolutionary Computation, Metaheuristic Algorithms, Machine Learning
Abstract: Data clustering has many applications in machine learning, data mining and image processing. K-means is the most popular clustering algorithm due to its efficiency and simplicity of implementation. However, K-means has limitations, such as large feature spaces, which may affect its effectiveness. To improve K-means accuracy, we adopt the Biogeography-Based Optimization (BBO) evolutionary technique to select the most relevant features of datasets. We conducted several experiments to compare our approach with other methods, such as PCA and Particle Swarm Optimization (PSO). The results demonstrate the effectiveness of BBO for feature selection.
|
|
13:30-13:45, Paper Mo-PS30-T2.3 |
Effects of Initialization Methods on the Performance of Multi-Objective Evolutionary Algorithms |
|
Gong, Cheng | Southern University of Science and Technology |
Pang, Lie Meng | Southern University of Science and Technology |
Nan, Yang | Southern University of Science and Technology |
Ishibuchi, Hisao | Southern University of Science and Technology |
Zhang, Qingfu | City University of Hong Kong |
Keywords: Evolutionary Computation
Abstract: Population initialization is always needed in evolutionary multi-objective optimization (EMO) algorithms. Intuitively, a well-designed initialization method can help facilitate the evolutionary process and improve the performance of EMO algorithms. However, very few studies have investigated the effects of initialization methods on the performance of EMO algorithms. Many existing EMO algorithms randomly generate an initial population to start the evolutionary process. To fill this research gap and attract more attention from EMO researchers to this important yet under-explored issue, in this paper, we examine the effects of various initialization methods that may become promising alternatives to the commonly-used random initialization method. Each initialization method is evaluated through computational experiments on test problems of various sizes with 5-1000 decision variables. Experimental results clearly demonstrate the advantage of well-designed initialization methods over the random initialization method. This study provides useful insights into EMO algorithm design and motivates further research on population initialization.
|
|
14:00-14:15, Paper Mo-PS30-T2.5 |
Comparative Study on Different Types of Surrogate-Assisted Evolutionary Algorithms for High-Dimensional Expensive Problems |
|
Qiao, Zhuo-Yin | Nanjing University of Information Science & Technology |
Yang, Qiang | Nanjing University of Information Science and Technology |
Gao, Xu-Dong | Nanjing University of Information Science and Technology |
Xu, Peilan | Nanjing University of Information Science and Technology |
Lu, Zhen-Yu | Nanjing University of Information Science and Technology |
Keywords: Evolutionary Computation, Computational Intelligence, Swarm Intelligence
Abstract: Expensive optimization problems (EOPs) are becoming more and more ubiquitous nowadays. To effectively solve such problems, surrogate-assisted evolutionary algorithms (SAEAs) have been developed. Specifically, a SAEA usually maintains a surrogate model to simulate the real objective function of an EOP. Such a surrogate model is trained based on real-evaluated solutions. Then, it is utilized to evaluate the fitness of individuals in the EA instead of the real expensive fitness evaluation. Though many SAEAs have been designed, they mainly concentrate on dealing with low-dimensional EOPs with fewer than 300 dimensions. Their performance on large-scale EOPs with more than 300 dimensions is unknown. To fill this gap, this paper conducts a comparative study on two types of state-of-the-art SAEAs with a total of four algorithms on four classical EOPs. To make comprehensive comparisons, we range the dimension size from 50 to 1000. As far as we know, this is the first time to assess SAEAs on EOPs with such a wide range of dimension sizes and such high dimensionality. The comparison results show that the optimization performance of the compared four SAEAs on high-dimensional EOPs with more than 500 dimensions is not as satisfactory as their performance on low-dimensional EOPs because of their slow convergence. Therefore, research on large-scale SAEAs for high-dimensional EOPs still deserves intensive attention.
|
|
14:15-14:30, Paper Mo-PS30-T2.6 |
How to Find a Large Solution Set to Cover the Entire Pareto Front in Evolutionary Multi-Objective Optimization |
|
Pang, Lie Meng | Southern University of Science and Technology |
Nan, Yang | Southern University of Science and Technology |
Ishibuchi, Hisao | Southern University of Science and Technology |
Keywords: Evolutionary Computation
Abstract: Recently, it has been pointed out in many studies that the performance of evolutionary multi-objective optimization (EMO) algorithms can be improved by selecting solutions from all examined solutions stored in an unbounded external archive. This is because in general the final population is not the best subset of the examined solutions. To obtain a good final solution set in such a solution selection framework, subset selection from a large candidate set (i.e., all examined solutions) has been studied. However, since good subsets cannot be obtained from poor candidate sets, a more important issue is how to find a good candidate set, which is the focus of this paper. In this paper, we first visually demonstrate that the entire Pareto front is not covered by the examined solutions through computational experiments using MOEA/D, NSGA-III and SMS-EMOA on DTLZ test problems. That is, the examined solution set stored in the unbounded archive has some large holes (i.e., some uncovered area of the Pareto front). Next, to evaluate the quality of the examined solution set (i.e., to measure the size of the largest hole), we propose the use of a variant of the inverted generational distance (IGD) indicator. Then, we propose a simple modification of EMO algorithms to improve the quality of the examined solution set. Finally, we demonstrate the effectiveness of the proposed modification through computational experiments.
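To make the idea of measuring holes concrete, the sketch below computes the standard inverted generational distance (IGD) alongside a max-based variant. The abstract does not specify the exact variant the authors propose, so `largest_hole` is only an assumed illustration, and the reference front and solution set are hypothetical.

```python
import math


def nearest_distance(point, solutions):
    """Euclidean distance from a reference point to its nearest solution."""
    return min(math.dist(point, s) for s in solutions)


def igd(reference_front, solutions):
    """Standard IGD: mean nearest-solution distance over the reference points."""
    total = sum(nearest_distance(r, solutions) for r in reference_front)
    return total / len(reference_front)


def largest_hole(reference_front, solutions):
    """Assumed max-based variant: the worst-covered reference point."""
    return max(nearest_distance(r, solutions) for r in reference_front)


# Hypothetical bi-objective example on the linear front f1 + f2 = 1.
reference_front = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
solutions = [(0.0, 1.0), (1.0, 0.0)]  # the middle of the front is uncovered

print(igd(reference_front, solutions), largest_hole(reference_front, solutions))
```

Averaging can hide a single large uncovered region when most reference points are well covered, whereas the maximum exposes it directly. That is one plausible motivation for a hole-oriented indicator.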
|
|
Mo-PS30-T3 Regular Session, Hawaii 5 |
Networking and Decision-Making |
|
|
|
13:00-13:15, Paper Mo-PS30-T3.1 |
Effects of Self and Other's Intentions on Moving Behavior in Crossing Interactions |
|
Matsubayashi, Shota | Nagoya University |
Miwa, Kazuhisa | Nagoya University |
Terai, Hitoshi | Kindai University |
Ninomiya, Yuki | Nagoya University |
Keywords: Human Factors, Multi-User Interaction
Abstract: This study examines the effects of self and other's intentions on their moving performance in crossing interactions with multiple agents. An experimental task paradigm was developed to verify self and other's effects simultaneously. Different intentions were assigned to self and other in the 2-person crossing situation and to the minority and majority in the 4-person crossing situation. Further, performance was assessed based on completion time, amount of operations, and interruptions. The results show that the effects of self-intention on self-performance were generally found, but the minority's intention does not affect its interruption. For completion time and operation, the effect of other's intention decreased in the order of the self in the 2-person crossing situation, minority, and majority in the 4-person crossing situation. Notably, for the interruption, the other's intention affected the majority's interruption, but it did not affect the minority's interruption. These findings emphasize the significance of simultaneously considering self and other's intentions when analyzing crossing interactions in shared space.
|
|
13:15-13:30, Paper Mo-PS30-T3.2 | Add to My Program |
ISAR: In-Sample Advantage-Regulated Offline Reinforcement Learning |
|
Yang, Deyu | Xi'an Jiaotong University |
Ma, Chengzhong | Xi'an Jiaotong University |
Liu, Zeyang | Xi'an Jiaotong University |
Lan, Xuguang | Xi'an Jiaotong University |
Keywords: Cognitive Computing, Networking and Decision-Making
Abstract: Offline reinforcement learning (RL) enables learning policies from fixed datasets, avoiding the potential safety risks and cost issues of online interaction with the environment. By collecting data from the real environment, offline reinforcement learning can also alleviate the cost of policy transfer in online learning, thus achieving higher learning efficiency and practicality. However, current offline reinforcement learning algorithms that use value functions to improve policies suffer from the problem of distributional shift, which makes it difficult to accurately evaluate state-action pairs within the dataset, and they pay little attention to the balance between reinforcement learning and imitation learning when improving policies. In this paper, we propose a novel offline learning algorithm, ISAR, that makes use of in-sample value function learning and advantage-regulated policy improvement. By learning the in-sample state value function, we avoid out-of-distribution action evaluation. We also introduce an advantage-weighted behavior cloning term during policy improvement to balance reinforcement learning and behavior cloning. The experimental results show that the ISAR algorithm achieves results comparable to current state-of-the-art algorithms in various robot tasks without complex parameter tuning.
|
|
13:30-13:45, Paper Mo-PS30-T3.3 | Add to My Program |
Mask R-CNN Transfer Learning Variants for Multi-Organ Medical Image Segmentation |
|
Lem, Hongjian | Royal Holloway, University of London |
Zhang, Li | Royal Holloway, University of London |
Keywords: Medical Informatics, Human-Computer Interaction, Human-Machine Interface
Abstract: Medical abdomen image segmentation is a challenging task owing to discernible characteristics of the tumour against other organs. As an effective image segmenter, Mask R-CNN has been employed in many medical imaging applications, e.g. for segmenting nucleus from cytoplasm for leukaemia diagnosis and skin lesion segmentation. Motivated by such existing studies, this research takes advantage of the strengths of Mask R-CNN in leveraging pre-trained CNN architectures such as ResNet and proposes three variants of the Mask R-CNN transfer learning model for multi-organ medical image segmentation, each with a set of configurations modified from the one preceding. Specifically, the three variants are (1) traditional transfer learning with customized loss functions that place comparatively more weight on the segmentation performance, (2) transfer learning based on Mask R-CNN with deepened re-trained layers instead of only the last two/three layers as in traditional transfer learning, and (3) the fine-tuning of Mask R-CNN with expansion of the Region of Interest (RoI) pooling sizes. Evaluated on the Beyond-the-Cranial-Vault (BTCV) abdominal dataset, a well-established benchmark for multi-organ medical image segmentation, the three proposed variants of Mask R-CNN obtain promising performances. In particular, the empirical results indicate the effectiveness of the proposed adapted loss functions, the deepened transfer learning process, as well as the expansion of the RoI pooling sizes. Such variations account for the great efficiency of the proposed transfer learning variant schemes for undertaking multi-organ image segmentation tasks.
|
|
13:45-14:00, Paper Mo-PS30-T3.4 | Add to My Program |
TCFP: A Novel Privacy-Aware Edge Vehicular Trajectory Compression Scheme Using Fuzzy Markovian Prediction |
|
Li, Yinglong | Zhejiang University of Technology |
Huang, Zhiwei | Zhejiang University of Technology |
Chen, Tieming | Zhejiang University of Technology |
Xu, Xinchen | Zhejiang University of Technology |
Liu, Weiru | University of Bristol, UK |
Lv, Mingqi | Zhejiang University of Technology |
Keywords: Networking and Decision-Making
Abstract: Vehicular trajectory data can be widely used in applications such as traffic prediction and congestion control. However, vehicular trajectory data is voluminous and requires significant storage and processing resources, which conflicts with the resource constraints of vehicular networks. Existing compression methods suffer from either low compression effects or privacy leakage. A privacy-aware Trajectory Compression scheme based on Fuzzy markovian Prediction (TCFP) is proposed in this paper, which consists of two steps of fuzzy compression. The first compression step is achieved by converting the raw trajectory data into fuzzy information on the edge vehicle side. Further compression is performed at edge RSUs through fuzzy multi-order Markovian prediction combined with newly devised fuzzy deviation filtering rules. Extensive experimental evaluation based on real-world data sets demonstrates that the proposed TCFP scheme achieves the desired QoS performance in terms of compression rate, compression time, and information loss.
|
|
Mo-PS30-T4 Regular Session, Honolulu |
Add to My Program |
Data-Driven Approaches and Machine Learning II |
|
|
|
13:15-13:30, Paper Mo-PS30-T4.2 | Add to My Program |
Prediction of Electrical Characteristics of A-IGZO TFT Based on Transfer Learning-Based Variational Autoencoder |
|
Bea, Khean Thye | National Taipei University of Technology |
Hu, Shih-Shin | National Taipei University of Technology |
Lin, Wei-Hsuan | National Taipei University of Technology |
Lin, Da-Zheng | National Taipei University of Technology |
Chen, Xiu-Zhi | National Taipei University of Technology |
Lin, Ting Ru | National Taipei University of Technology |
Hu, Hsin-Hui | National Taipei University of Technology |
Chen, Yen-Lin | National Taipei University of Technology |
Chen, Kun-Ming | National Nano Device Laboratories |
Cheng, Wai Khuen | Universiti Tunku Abdul Rahman |
Keywords: Manufacturing Automation and Systems
Abstract: In this study, we propose a transfer-learning-based variational autoencoder model for predicting the electrical characteristics in the parameter tuning process of a-IGZO TFT structure design. The results achieve a high R2 score of 0.9704 with a hardware-friendly method that requires little computing power and significantly reduces time consumption compared to prior approaches. The findings have practical implications for mitigating the time-consuming nature of TCAD simulations, and the method can be extended to various types of input data while ensuring high performance and generalization. We demonstrate significant improvement in generalization and accuracy through k-fold validation.
|
|
13:30-13:45, Paper Mo-PS30-T4.3 | Add to My Program |
User Experience Meets GPS Trajectory Search |
|
Cavojsky, Maros | Slovak University of Technology |
Drozda, Martin | Slovak University of Technology |
Keywords: Smart Buildings, Smart Cities and Infrastructures
Abstract: We investigate a user-centered approach to trajectory search in large data sets. We assume that a user draws a query trajectory and expects to obtain a number of similar trajectories. Unlike the research results reported elsewhere, we try to answer the following simple question: What if we only focus on the user experience of trajectory search, and give less or no importance to computational efficiency; how would we design such a user-centered trajectory search? Our answer is an approach based on gates through which a trajectory has to pass in order to be considered similar to the query trajectory. The user thus does not have to draw an accurate query trajectory, which can require a considerable amount of time, effort and attention to detail, and can lead to user frustration. We compare our approach to two existing approaches, Geodabs and Geohash. For experimental evaluation we apply a dense synthetic data set. Our approach finds all trajectories that pass through any required gates, whereas Geodabs and Geohash only find a subset of similar trajectories with respect to a similarity measure.
|
|
13:45-14:00, Paper Mo-PS30-T4.4 | Add to My Program |
Vector Representation and Machine Learning for Short-Term Photovoltaic Power Prediction |
|
Costa, Renan | Universidade Federal De Pernambuco - UFPE |
Costa, Alexandre | Universidade Federal De Pernambuco - UFPE |
Vilela, Olga | Universidade Federal De Pernambuco - UFPE |
Tsang, Ing Ren | Universidade Federal De Pernambuco |
Keywords: Intelligent Power Grid, Decision Support Systems, Infrastructure Systems and Services
Abstract: Short-term photovoltaic (PV) energy production forecasting is critical for managing grid-connected systems and energy trading. Machine learning models are widely used for accurate prediction, and this study proposes using Time2Vec as an embedding for a transformer-based neural network architecture. Experiments on two PV power plants in India showed significant improvements comparing our proposed architecture to MLP, LSTM, and the persistence model, which is a standard baseline prediction in this type of forecasting, with over 20% improvements in some horizons. These findings demonstrate the effectiveness of the proposed approach for short-term PV forecasting using machine learning models.
|
|
14:00-14:15, Paper Mo-PS30-T4.5 | Add to My Program |
Serendipity-Oriented Recommender System with Dynamic Unexpectedness Prediction |
|
Tokutake, Yu | The University of Electro-Communications |
Okamoto, Kazushi | The University of Electro-Communications |
Keywords: Decision Support Systems
Abstract: With unexpectedness as a component of serendipity, many previous studies on serendipity-oriented recommender systems have quantified the degree of unexpectedness of items for users as a score. A user's browsing and rating history is necessary for the score calculation. These studies either treated all histories as equal or used only the most recent histories. However, these calculation methods cannot cope with unexpectedness that changes constantly owing to fluctuations in user preferences and public popularity. In this study, we propose a serendipity-oriented recommender system that sequentially calculates the unexpectedness score and predicts the score at recommendation time. The proposed system consists of a reranking algorithm that reranks a recommendation list generated by accuracy-oriented recommender systems. It also introduces a parameter to adjust users' ability to accept unexpectedness and aims for serendipitous recommendations that are in line with user preferences. The experiment on two benchmark datasets showed two results. First, the proposed system improved the serendipity metric by 0.061 points compared with accuracy-oriented systems before reranking and by 0.045 points compared with the proposed system without the acceptance parameter. Second, the error between the predicted and ground-truth unexpectedness scores was smaller for a large dataset.
|
|
Mo-PS30-T5 Special Session, Kahuku |
Add to My Program |
Distributed Adaptive Systems |
|
|
Organizer: Zhu, Haibin | Nipissing University |
Organizer: Shen, Weiming | National Research Council Canada |
Organizer: Fortino, Giancarlo | University of Calabria |
Organizer: Xiong, Naixue | Northeastern State University |
|
13:00-13:15, Paper Mo-PS30-T5.1 | Add to My Program |
3D Map Extraction and Reconstruction Based on Point Cloud Data (I) |
|
Tang, Jianyin | Changchun University of Science and Technology |
Yu, Zhenglin | Changchun University of Science and Technology |
Shao, Changshun | Changchun University of Science and Technology |
Din, Kaifang | Changchun University of Science and Technology |
Li, Dianming | Changchun University of Science and Technology |
Xu, Shida | Changchun University of Science and Technology |
Keywords: Robotic Systems, System Modeling and Control
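Abstract: One of the main problems in the field of agricultural picking is achieving all-terrain map construction and autonomous navigation. Many picking scenarios are characterized by large potholes, numerous slopes, and a complex real environment. In the actual automated picking process, the picking robot has a certain ability to cross obstacles and can pass through some potholes, whereas a traditional two-dimensional map cannot effectively identify and judge obstacles, which makes good optimal path planning difficult to achieve. To address this problem, we propose a 3D map extraction and reconstruction method based on 3D point cloud data. In the target area, a UAV equipped with LiDAR first collects 3D point cloud data. The collected point cloud data is then denoised, ground segmentation and extraction and point cloud simplification are performed, and a 3D point cloud map is reconstructed to provide a 3D map for the autonomous navigation of the mobile picking robot.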
|
|
13:30-13:45, Paper Mo-PS30-T5.3 | Add to My Program |
Optimal Procurement in Consideration of Carbon Emissions (I) |
|
Peng, Chengyu | Laurentian University |
Zhu, Haibin | Nipissing University |
Liu, Linyuan | Nanjing Audit University |
Grewal, Ratvinder | Laurentian University |
Keywords: Consumer and Industrial Applications, Intelligent Green Production Systems
Abstract: With the rise in human activities, the trend of increasing carbon emissions is becoming more apparent. Enterprises are also facing serious challenges in the trend of low carbon and environmental protection, and the procurement process has a significant impact on carbon emissions. In this paper, we propose a procurement solution that integrates many aspects of procurement factors while focusing on reducing carbon emissions. Specifically, we first assess the cost of procurement, the carbon emissions involved, and the quality of the items. We then aggregate the evaluated data and combine them using the analytic hierarchy process to calculate the total qualification value. Also, we specified various constraints and used the E-CARGO model to formalize this problem. Based on the qualification values, we can use the IBM ILOG CPLEX Optimization (CPLEX) package to find the best sourcing solution according to the working hour allocation. In our experiments, our proposed solution can effectively derive the optimal procurement solution based on the purchaser's needs while focusing on reducing the carbon emissions involved in the procurement.
|
|
13:45-14:00, Paper Mo-PS30-T5.4 | Add to My Program |
Self-Adaptive Facial Expression Recognition Based on Local Feature Augmentation and Global Information Correlation (I) |
|
Yan, Lingyu | Hubei University of Technology |
Xia, Jinyao | Hubei University of Technology |
Wang, Chunzhi | Hubei University of Technology |
Keywords: Control of Uncertain Systems, Decision Support Systems, Discrete Event Systems
Abstract: Facial expression recognition (FER) is an important research topic in computer vision and has been widely applied in human-computer interaction, education, healthcare, transportation, etc. However, the wide application of facial expression recognition technology also brings new challenges, where occlusion and pose variation are two of the worst factors that disturb facial expression recognition in the wild. We propose a facial expression recognition method based on local feature augmentation and multi-scale global correlation, which can adaptively extract robust local features and global features at the feature level to suppress the disturbances of occlusion and pose variation on facial expression recognition. The experimental results show that our method performs well on the RAF-DB dataset and has stronger robustness compared with other algorithms.
|
|
14:00-14:15, Paper Mo-PS30-T5.5 | Add to My Program |
Stable Cloud Provider Selection Via Group Role Assignment with KB4 Logic Extended (I) |
|
Cai, Yuelin | Guangdong University of Technology |
Zhu, Haibin | Nipissing University |
Liu, Dongning | Guangdong University of Technology |
Keywords: Adaptive Systems
Abstract: Although cloud manufacturing offers greater flexibility and diversity than traditional manufacturing, it also presents greater uncertainty and variability. How to select stable cloud providers for material procurement (SSCPFMP) has become a critical supply chain optimization issue in cloud manufacturing. By extending the Group Role Assignment (GRA) model, this paper formalizes the problem. Moreover, we propose a new method for evaluating cloud providers that incorporates stability as an important criterion. Additionally, in order to complete the stability assessment as quickly as possible, we propose using the KB4 logic instead of the commonly used KB5 to mine potential cooperative relationships between cloud providers. We prove by deduction that KB4 and KB5 are equivalent. Large-scale simulation experiments indicate that the KB4 logic performs significantly better than the KB5 logic, with improvements of up to 43.78%. By using this method, decision makers are able to find more stable cloud providers for material procurement within a shorter timeframe.
|
|
14:15-14:30, Paper Mo-PS30-T5.6 | Add to My Program |
Trust Establishment for the Role-Based Collaborative Multi-Robot Systems (I) |
|
Akbari, Behzad | Nipissing University |
Zhu, Haibin | Nipissing University |
Pan, Ya-Jun | Dalhousie University |
Keywords: Trust in Autonomous Systems, Cooperative Systems and Control, Robotic Systems
Abstract: Trust evaluation and trust establishment play crucial roles in the management of trust within a multi-agent system. When it comes to collaboration systems, trust becomes directly linked to the specific roles performed by agents. The Role-Based Collaboration (RBC) methodology serves as a framework for assigning roles that facilitate agent collaboration. Within this context, the behavior of an agent with respect to a role is referred to as a process role. This research paper introduces a role engine that incorporates a trust establishment algorithm aimed at identifying optimal and reliable process roles. In our study, we define trust as a continuous value ranging from 0 to 1. To optimize trustworthy process roles, we have developed a consensus-based Gaussian Process Factor Graph (GPFG) tool. Our simulations and experiments validate the feasibility and efficiency of our proposed approach with autonomous robots in unsignalized intersections and narrow hallways.
|
|
Mo-PS30-T6 Regular Session, Oahu |
Add to My Program |
Cyber-Physical Systems I |
|
|
|
13:00-13:15, Paper Mo-PS30-T6.1 | Add to My Program |
Metaverse-Driven Drone Edge Intelligence in B5G: A Conceptual Framework for Empowering CPSS (I) |
|
Hamood Alsamhi, Saeed | Insight Centre for Data Analytics, NUIG, Galway, Ireland |
Hawbani, Ammar | University of Science and Technology of China |
Santosh, Kumar | International Institute of Information Technology, Naya Raipur |
Gravina, Raffaele | University of Calabria |
Fortino, Giancarlo | University of Calabria |
Curry, Edward | National University of Ireland, Galway |
Keywords: Virtual/Augmented/Mixed Reality, Virtual and Augmented Reality Systems, Systems Safety and Security
Abstract: The Metaverse is an emerging concept that aims to integrate the physical and virtual worlds, creating a shared 3D virtual world where users can interact and immerse themselves in new experiences. With the rise of Metaverse-driven Cyber-Physical-Social Systems (CPSSs), integrating drones as a critical technology in the Metaverse has become increasingly important. CPSSs are proliferating and have become integral to our daily lives. This paper proposes a conceptual framework for Metaverse-driven drone edge intelligence, which integrates drone-enabled sensing, communication, and computation to enable real-time decision-making in CPSSs. We present a detailed analysis of the challenges and opportunities for integrating drones in the Metaverse and discuss the potential impact of our framework on various application domains. Our work contributes to advancing the Metaverse and CPSSs by providing a novel approach for empowering real-time decision-making and enabling new user experiences through integrating drones and the Metaverse. The proposed framework has the potential to revolutionize the way we approach data-driven decision-making in various industries and applications, including precision agriculture, transportation, emergency response, smart cities, healthcare, manufacturing, and energy.
|
|
13:15-13:30, Paper Mo-PS30-T6.2 | Add to My Program |
Event-Based Fractional Order MIMO Control for Hemodynamic Stabilization During General Anesthesia |
|
Birs, Isabela | Ghent University, FWO |
Muresan, Cristina Ioana | Technical University of Cluj-Napoca |
Ghita, Mihaela | Ghent University |
Ghita, Maria | Ghent University |
Nascu, Ioana | Technical University of Cluj Napoca |
Ionescu, Clara Mihaela | Ghent University |
Keywords: Discrete Event Systems, System Modeling and Control, Cyber-physical systems
Abstract: General anesthesia is used to induce a reversible loss of consciousness in a patient to ensure they are completely unaware and pain-free during a surgical procedure. This is achieved through the administration of a combination of drugs that depress the central nervous system and can include intravenous injections, inhaled gases, or a combination of both. During the anesthesia maintenance phase, the events in surgery are not continuous, but obey a profile similar to that endorsed by event-based control. The present study considers event-based control as a natural solution to the control challenge of general anesthesia. Two fractional order Proportional Integral (PI) controllers are tuned and implemented using a decentralized approach to control a part of general anesthesia based on hemodynamic variables. The results show that this approach can be successfully used to keep Cardiac Output and Mean Arterial Pressure stable, with a special focus on patient safety, through fast detection of surgical stimulus and robustness to inter/intra patient variability.
|
|
13:30-13:45, Paper Mo-PS30-T6.3 | Add to My Program |
Formally Verifying the Security and Privacy of an Adopted Standard for Software-Update in Cars: Verifying Uptane 2.0 |
|
Boureanu, Ioana | University of Surrey |
Keywords: System Modeling and Control
Abstract: In this paper, we formally analyse the security of Uptane 2.0 -- the latest framework for over-the-air, i.e., online, delivery of software to cars. We do so by using the threat model and security requirements found in the standard document that accompanies Uptane 2.0, as well as a modulation of this threat model and requirements added by ourselves, for a deeper analysis. To undertake this verification, we use the well-known formal protocol-verifier and theorem prover called Tamarin. We discuss our responsible disclosure to and work with the Uptane Alliance.
|
|
13:45-14:00, Paper Mo-PS30-T6.4 | Add to My Program |
Principles for the Effective Application of Systems Engineering: A Systematic Literature Review and Application Use Case |
|
Mundt, Enrik Georg | Fraunhofer Institute for Mechatronic System Design |
Wilke, Daria | Fraunhofer Institute for Mechatronic System Design |
Anacker, Harald | Fraunhofer Institute for Mechatronic System Design |
Dumitrescu, Roman | Fraunhofer Institute for Mechatronic System Design |
Keywords: Cyber-physical systems, Technology Assessment, Service Systems and Organizations
Abstract: The implementation of Systems Engineering (SE) offers a solution for organizations facing challenges from digitalization and increasing product complexity and interconnectivity. However, the implementation of SE company-wide requires a holistic approach that considers human and organizational factors beyond just technology and technical processes. This paper presents a systematic literature review (SLR) that aimed to determine the key principles for the effective application of SE in industrial practice. Through open, axial, and selective coding, 12 key principles were identified. Finally, an application example is given to show how the principles can be used to improve the process of the offer phase in special purpose machinery.
|
|
Mo-PS30-T7 Special Session, Hawaii 2 |
Add to My Program |
Computational and Medical Cybernetics I |
|
|
Co-Chair: Szilágyi, László | Obuda University |
Organizer: Rudas, Imre | Obuda University |
Organizer: Kovacs, Levente | Obuda University |
Organizer: Eigner, György | Obuda University |
Organizer: Szilágyi, László | Obuda University |
Organizer: Kubota, Naoyuki | Tokyo Metropolitan University |
Organizer: Kozma, Robert | University of Memphis, TN |
|
13:00-13:15, Paper Mo-PS30-T7.1 | Add to My Program |
Model Predictive Control with Dynamic Positive Input Extension for Artificial Pancreas Applications (I) |
|
Novák, Kamilla | Obuda University |
Siket, Máté | Obuda University |
Kovacs, Levente | Obuda University |
Drexler, Dániel András | Óbuda University |
Rudas, Imre | Obuda University |
Eigner, György | Obuda University |
Keywords: Expert and Knowledge-Based Systems
Abstract: Like many physiological systems, the various mathematical models describing the glucose-insulin system can only have non-negative inputs. Thus, the control method, often model predictive control in artificial pancreas systems, must provide a non-negative control signal. Existing solutions include saturation and constrained optimization. In this paper, we propose a dynamic extension of the patient model as a way of ensuring the positivity of the control signal, a method previously applied in tumor growth control. We evaluate the controller in a closed-loop simulation and compare the results with a controller using saturation. Our simulations show that control performance with the extended model can reach or exceed the performance achieved with saturation.
|
|
13:15-13:30, Paper Mo-PS30-T7.2 | Add to My Program |
Impulsive Model Predictive Control in Type 1 Diabetes Mellitus Applications (I) |
|
Novák, Kamilla | Obuda University |
Siket, Máté | Obuda University |
Kovacs, Levente | Obuda University |
Drexler, Dániel András | Óbuda University |
Eigner, György | Obuda University |
Keywords: Expert and Knowledge-Based Systems
Abstract: Despite the increasing availability and reliability of artificial pancreas devices, many people with type 1 diabetes mellitus are still on multiple daily injections therapy consisting of a daily basal insulin injection and mealtime boluses. Use of an insulin bolus advisor may improve glycaemic control as well as reduce the burden of the disease on these patients. This paper investigates the application of an impulsive model predictive controller in a bolus advisor system. The bolus calculator is assessed in closed-loop simulations using different basal insulin scenarios. The simulations show that a model predictive controller can achieve good glycaemic control and may be suitable to give bolus recommendations to people with diabetes. Higher basal insulin levels can result in better times in target range, but also carry a higher risk of hypoglycaemia.
|
|
13:30-13:45, Paper Mo-PS30-T7.3 | Add to My Program |
Brain Tumor Segmentation from Multi-Spectral MRI Records Using a U-Net Cascade Architecture (I) |
|
Gyorfi, Agnes | Sapientia - Hungarian University of Transylvania |
Kovacs, Levente | Obuda University |
Szilágyi, László | Obuda University |
Keywords: Image Processing and Pattern Recognition, Application of Artificial Intelligence, Neural Networks and their Applications
Abstract: Brain tumor segmentation has been a widely researched topic for decades, and it intensified ten years ago as a consequence of the Brain Tumor Segmentation Challenges (BraTS), which provided and yearly updated a standard multi-spectral brain tumor MRI data set and a unified evaluation framework to the research community. This paper proposes a procedure for brain tumor segmentation, which uses a spatial histogram enhancement method to preprocess the data, and two identical cascaded U-net networks that work with 3D convolution. The first U-net accomplishes an intermediary segmentation of the brain volume, while the second one reevaluates the labels given to pixels based on the labels of neighbor pixels. The outputs of both U-nets are evaluated using statistical accuracy benchmarks. The proposed procedure achieved an average Dice score of 88.8% on the high-grade glioma records of the BraTS 2019 training data set. Post-processing increased the average Dice score by 1.1%, but in the case of typical small high-grade tumor lesions it can achieve an improvement of up to 5%.
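As background for the Dice scores quoted in this abstract, a minimal sketch of the Dice coefficient on flattened binary masks follows. It is a generic illustration, not the authors' evaluation code; the function name, the handling of two empty masks, and the toy masks are assumptions.

```python
def dice_score(pred, truth):
    # pred and truth are equal-length sequences of 0/1 voxel labels.
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if size_sum == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    # Dice = 2 * |A intersect B| / (|A| + |B|)
    return 2.0 * overlap / size_sum

predicted = [1, 1, 0, 0]
reference = [1, 0, 1, 0]
print(dice_score(predicted, reference))
```

In this toy case the masks share one voxel out of two labeled voxels each, so the score is 0.5; a reported score of 88.8% corresponds to 0.888 on this scale.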
|
|
13:45-14:00, Paper Mo-PS30-T7.4 | Add to My Program |
Effect of Hyperparameters of Reinforcement Learning in Blood Glucose Control (I) |
|
Dénes-Fazakas, Lehel | Óbuda University |
Siket, Máté | Obuda University |
Szilágyi, László | Obuda University |
Eigner, György | Obuda University |
Kovacs, Levente | Obuda University |
Keywords: Application of Artificial Intelligence
Abstract: Reinforcement learning (RL) has shown promise in controlling blood glucose levels in a personalized way in type 1 diabetic patients. In this study, we investigate the impact of different activation functions and layer numbers on RL performance in blood glucose control. We train RL agents with various combinations of activation functions and layer numbers on a virtual patient model. The RL agents are evaluated based on their ability to maintain blood glucose levels within a target range while minimizing the frequency and magnitude of hypoglycemia and hyperglycemia events. Our results show that the choice of activation function and layer number significantly affects the RL performance. Specifically, the agents with ReLU activation functions and two or three hidden layers outperform the other agents, achieving a higher percentage of time in the target range and fewer hypoglycemia and hyperglycemia events. These findings provide valuable insights for the development of RL-based blood glucose control systems in type 1 diabetic patients.
|
|
14:00-14:15, Paper Mo-PS30-T7.5 | Add to My Program |
Brain Dynamics in Engaged and Relaxed Psychophysiological States Reflecting the Creation of Knowledge and Meaning (I) |
|
Davis, Joshua | University of Auckland |
Schubeler, Florian | Embassy of Peace, Whitianga |
Kozma, Robert | University of Memphis, TN |
Keywords: Computational Life Science, Biometric Systems and Bioinformatics, Cybernetics for Informatics
Abstract: Scalp electroencephalography (EEG) provides a practical tool for the identification and characterization of various brain states, including healthy and diseased conditions. We measure brain dynamics on the scalp via a HydroCel Geodesic Sensor Net, a 128-electrode dense-array EEG. We compute the pragmatic information index (PI) by Hilbert analysis of the EEG signals of 20 healthy participants. We compare 6 task modalities, combining different audio-visual stimuli, leading to various mental states, predominantly relaxed versus engaged. We analyze PI values to classify different brain states. We show significant differences between the measured neural signatures depending on the task modalities, based on both qualitative and quantitative analysis. The results can help to develop tools for medical diagnostics of stress-related mental conditions. Keywords: EEG, Cognition, Intentional Action, Pragmatic Information, Meditation, Awareness, Meaning, Knowledge.
|
|
Mo-PS30-T9 Workshop Session, Hilo |
Add to My Program |
Paper Talks 1 |
|
|
Organizer: Stoica, Adrian | NASA Jet Propulsion Laboratory |
|
13:00-13:15, Paper Mo-PS30-T9.1 | Add to My Program |
Classification of Haptic Handshake Data for the Control of Human-Telerobot Social Contact Interactions (I) |
|
Brunken, Tomma | Southern Illinois University Edwardsville |
Gorlewicz, Jenna L. | Saint Louis University |
Butts-Wilmsmeyer, Carolyn | Southern Illinois University Edwardsville |
Weinberg, Jerry B. | Southern Illinois University Edwardsville |
Keywords: Telepresence, Shared Control, Human-Computer Interaction
Abstract: Mobile Remote Presence (MRP) robots have emerged out of the need for telepresence in various settings such as the workplace and hospitals. As with face-to-face experiences, these robot-mediated encounters have social aspects that current commercially available MRP robots lack the capabilities to incorporate. In previous work, we integrated a manipulator onto a commercial telerobotic platform to enable expressive gestures and demonstrated that the gesturing capabilities enhanced the social connection between remote and local users. However, we also found that controlling the robot for complex interactions, such as a handshake, diminishes the remote user’s social experience. This paper presents the discovery of models for handshakes in different social contexts, which can be used in a shared-control architecture to reduce the effort required of the remote user. Using a haptic measurement glove, force and inertia data were collected for human-human handshakes in various social contexts. By applying a k-nearest neighbor algorithm in combination with dynamic time warping and a support vector machine algorithm, two classification models are derived that predict the social context and can be used in an intelligent shared-control robot architecture.
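To illustrate the first technique named in this abstract, here is a minimal sketch of k-nearest-neighbor classification with a dynamic time warping (DTW) distance on one-dimensional sequences. It is not the authors' model: the function names, the absolute-difference local cost, the toy force traces, and the context labels "formal" and "casual" are all hypothetical.

```python
from collections import Counter

def dtw_distance(a, b):
    # Classic O(len(a) * len(b)) dynamic program with absolute-difference cost.
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def knn_classify(query, labelled, k=1):
    # labelled is a list of (sequence, label) pairs; majority vote over
    # the k training sequences closest to the query under DTW.
    ranked = sorted(labelled, key=lambda item: dtw_distance(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical grip-force traces of different lengths.
training = [([0.1, 0.9, 0.2], "formal"), ([0.5, 0.5, 0.5], "casual")]
print(knn_classify([0.1, 0.8, 0.9, 0.2], training))
```

DTW is a natural fit here because handshakes differ in duration, so sequences of unequal length can still be compared without resampling.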
|
|
13:15-13:30, Paper Mo-PS30-T9.2 | Add to My Program |
Towards Immersive Bilateral Teleoperation Using Encountered-Type Haptic Interface (I) |
|
Kim, Yaesol | Istituto Italiano Di Tecnologia |
Castillo Silva, Myrna Citlali | Istituto Italiano Di Tecnologia |
Anastasi, Sara | Istituto Nazionale Per l'Assicurazione Contro Gli Infortuni Sul |
Deshpande, Nikhil | Istituto Italiano Di Tecnologia |
Keywords: Haptic Systems, Telepresence, Team Performance and Training Systems
Abstract: Encountered-type haptics (ETH) is an emerging research field that enables unencumbered physical haptic interaction in virtual reality (VR). In this paper, we propose ETH as an interaction medium for immersive remote teleoperation, facilitating intuitive bare-hand interaction with visuo-haptic feedback. Our system allows a human operator to control a remote robot immersively through the visual rendering of the VR environment, while interacting with a haptic robot at the user site. The ETH feedback rendering and the teleoperation at the remote site are both implemented using a 7-degree-of-freedom (DoF) Franka Emika Panda robot under Cartesian-impedance control. The Cartesian goal poses for each robot are determined based on the pose of the operator's proxy hand in the VR environment and the operator's interaction intention, estimated through hand gestures and gaze direction. The impedances of both robots are updated at runtime to provide the operator with bilateral haptic interaction forces. Our system was evaluated through a user study involving a door-opening task under three teleoperation conditions: (1) without haptic feedback, (2) ETH feedback with constant impedance, and (3) ETH feedback with variable impedance. The results highlight the advantages of using ETH, including the ability to regulate forces at both the user and robot sites. ETH conditions demonstrate lower peak forces and reduced force jittering at the remote site. Furthermore, the variable impedance condition within ETH shows improved task execution times and reduced exerted force. This paper demonstrates that ETH is an effective medium for immersive bare-hand bilateral teleoperation.
|
|
13:30-13:45, Paper Mo-PS30-T9.3 | Add to My Program |
SSC3OD: Sparsely Supervised Collaborative 3D Object Detection from LiDAR Point Clouds (I) |
|
Han, Yushan | Beijing Jiaotong University |
Zhang, Hui | Beijing Jiaotong University |
Zhang, Honglei | Beijing Jiaotong University |
Li, Yidong | Beijing Jiaotong University |
Keywords: Visual Analytics/Communication
Abstract: Collaborative 3D object detection, with its improved interaction advantage among multiple agents, has been widely explored in autonomous driving. However, existing collaborative 3D object detectors in a fully supervised paradigm heavily rely on large-scale annotated 3D bounding boxes, which are labor-intensive and time-consuming to produce. To tackle this issue, we propose a sparsely supervised collaborative 3D object detection framework SSC3OD, which only requires each agent to randomly label one object in the scene. Specifically, this model consists of two novel components, i.e., the pillar-based masked autoencoder (Pillar-MAE) and the instance mining module. The Pillar-MAE module aims to reason over high-level semantics in a self-supervised manner, and the instance mining module generates high-quality pseudo labels for collaborative detectors online. By introducing these simple yet effective mechanisms, the proposed SSC3OD can alleviate the adverse impacts of incomplete annotations. We generate sparse labels based on collaborative perception datasets to evaluate our method. Extensive experiments on three large-scale datasets reveal that our proposed SSC3OD can effectively improve the performance of sparsely supervised collaborative 3D object detectors.
|
|
13:45-14:00, Paper Mo-PS30-T9.4 | Add to My Program |
BlindSpotEliminator: Collaborative Point Cloud Perception in Cellular-V2X Networks (I) |
|
Chen, Ziyue | Beijing University of Posts and Telecommunications |
Luo, Guiyang | Beijing University of Posts and Telecommunications |
Shao, Congzhang | Beijing University of Posts and Telecommunications |
Yuan, Quan | Beijing University of Posts and Telecommunications |
Li, Jinglin | Beijing University of Posts and Telecommunications |
Keywords: Visual Analytics/Communication
Abstract: Multi-agent collaborative perception depends on sharing sensory information to improve perception accuracy and robustness, as well as to extend coverage. However, most collaborative perception methods ignore the limitations of communication networks, such as limited bandwidth and the possibility of wireless conflicts. To fill this gap, this paper proposes BlindSpotEliminator, a conflict-free scheduler over cellular-V2X networks for supporting practical collaborative point cloud perception to eliminate blind spots. BlindSpotEliminator first identifies the blind spots for each vehicle, then lists the corresponding conflict relationships based on the distribution of the blind spots and communication conflicts, and finally designs an optimized point cloud data transmission strategy to eliminate the blind spots of each vehicle. Extensive experiments show that, compared with greedy and random methods, BlindSpotEliminator achieves better efficiency, i.e., transmitting 20% more point cloud data.
|
|
14:00-14:15, Paper Mo-PS30-T9.5 | Add to My Program |
MS-Transformer: Masked and Sparse Transformer for Point Cloud Registration (I) |
|
Jia, Qingyuan | Beijing University of Posts and Telecommunications |
Luo, Guiyang | Beijing University of Posts and Telecommunications |
Yuan, Quan | Beijing University of Posts and Telecommunications |
Li, Jinglin | Beijing University of Posts and Telecommunications |
Shao, Congzhang | Beijing University of Posts and Telecommunications |
Chen, Ziyue | Beijing University of Posts and Telecommunications |
Keywords: Visual Analytics/Communication
Abstract: In this paper, we propose a masked and sparse transformer to address the problem of point cloud registration with low overlap. The mask mechanism reduces the overall data, increasing the corresponding point ratio in the overlap region, while also reducing the computational cost to accelerate the algorithm’s execution speed. Moreover, we combine spatial position encoding and sparse self-attention to establish relationships within the source point cloud, as well as the relationships and attention scores between the source and target point clouds. This approach is specifically designed for the task of point cloud registration. Finally, we search for the maximum overlap area by matching the spatial consistency between points and calculate the 3D transformation matrix to complete the registration process. Our method achieves an improvement in the inlier ratio and performs well on the 3DMatch and 3DLoMatch datasets, demonstrating high registration efficiency.
|
|
14:30-14:45, Paper Mo-PS30-T9.9 | Add to My Program |
Cyber-Physical Humans at the Intersection of Digital Twins, Immersive Internet and Telepresence (I) |
|
van Erp, Jan | University of Twente |
Keywords: Telepresence, Virtual/Augmented/Mixed Reality, Ethics of AI and Pervasive Systems
Abstract: The digital transformation is ongoing and affects all aspects of our professional and personal life. This paper sketches three transformations and introduces the concept of Cyber-Physical Humans as an inevitable result of their merger. Cyber-Physical Humans connect the cyberspace (back) with the physical space and derive their value from their capability to interact directly, or indirectly, with the physical world. This means that we will encounter representations of people present in cyberspace in the real world, for instance as a holographic projection or as a robotic avatar. Emerging technologies allow us to be present in the real world, in a digital world, in a remote world, or in multiple worlds at the same time. It is no longer hypothetical to be in the real world and talk with one colleague who is present through the metaverse and a holographic projection and another who is present through a robotic avatar operated in telepresence. These digital transformations will blur the lines between the real, the virtual, and the remote world with potentially huge consequences for the way we live, work and communicate. This predictably comes with great opportunities and risks, among others health risks (isolation, problematic use, cyber sickness) and security & privacy risks (deep-fakes, alternate presentations, spoofing, identity theft). It is important to address these risks to ensure that the benefits of Cyber-Physical Humans are maximized while minimizing the negative consequences.
|
|
14:30-14:45, Paper Mo-PS30-T9.10 | Add to My Program |
Audio-Based Roughness Sensing and Tactile Feedback for Haptic Perception in Telepresence (I) |
|
Pätzold, Bastian | University of Bonn |
Rochow, Andre | University of Bonn |
Schreiber, Michael | University of Bonn |
Memmesheimer, Raphael | University of Bonn |
Lenz, Chris | University of Bonn |
Schwarz, Max | University of Bonn |
Behnke, Sven | University of Bonn |
Keywords: Haptic Systems, Telepresence
Abstract: Haptic perception is highly important for immersive teleoperation of robots, especially for accomplishing manipulation tasks. We propose a low-cost haptic sensing and rendering system, which is capable of detecting and displaying surface roughness. As the robot fingertip moves across a surface of interest, two microphones capture sound coupled directly through the fingertip and through the air, respectively. A learning-based detector system analyzes the data in real time and gives roughness estimates with both high temporal resolution and low latency. Finally, an audio-based vibrational actuator displays the result to the human operator. We demonstrate the effectiveness of our system through lab experiments and our winning entry in the ANA Avatar XPRIZE competition finals, where briefly trained judges solved a roughness-based selection task even without additional vision feedback. We publish our dataset used for training and evaluation together with our trained models to enable reproducibility of results.
|
|
Mo-PS30-T12 Workshop Session, Hawaii 4 |
Add to My Program |
Workshop 4.2 - Workshop on Neuroergonomics |
|
|
Organizer: Barresi, Giacinto | Istituto Italiano Di Tecnologia |
Organizer: Fortino, Giancarlo | University of Calabria |
|
13:00-13:15, Paper Mo-PS30-T12.1 | Add to My Program |
Dynamic Performance in Virtual Spaces (I) |
|
Mayr, Riley | Kinetic Vision, Inc |
Kuznetsov, Nikita | University of Cincinnati |
Lorenz, Tamara | University of Cincinnati |
Keywords: Human Performance Modeling, Virtual/Augmented/Mixed Reality, Human-Collaborative Robotics
Abstract: Virtual, Augmented, and Mixed Reality (VAMR) applications and trainings have gained traction across business sectors, including healthcare, all around the globe. Yet, while technology has made huge jumps towards displaying photo-realistic environments and applications, non-visual feedback (auditory, haptic, olfactory) is still evolving. In physical reality (PR), humans process their environment multi-modally and base their actions on those multi-modal perceptions. Thus, it remains unclear if human action dynamics in the information-reduced, vision-heavy VAMR of today resemble those in PR. We therefore present a pilot study in a simple human-robot co-action paradigm investigating dynamic performance differences between VAMR and PR during task performance. Results suggest that some aspects of task performance in VR may be the same as in the physical setting; however, users may utilize different strategies in these modalities because the consequences for failure are different, which may have consequences for VAMR-related applications and trainings.
|
|
13:15-13:30, Paper Mo-PS30-T12.2 | Add to My Program |
Co-Creative Rehabilitation Platform for Cognitive Modeling in a Perceiving-Acting Cycle System (I) |
|
Obo, Takenori | Tokyo Metropolitan University |
Sekiguchi, Takuro | Tokyo Metropolitan University |
Kubota, Naoyuki | Tokyo Metropolitan University |
Keywords: Virtual and Augmented Reality Systems
Abstract: In this presentation, we introduce the concept of the rehabilitation platform and computational approaches for cognitive modeling. Firstly, we discuss immersive VR/AR systems and assessment programs designed for higher brain dysfunction. Additionally, we explore the significance of the co-creative platform as a novel approach to rehabilitation spaces. We then explain the structured learning process, where each learning system serves as an interdependent subsystem, for modeling the perceiving-acting cycle in rehabilitation tasks. The architecture comprises four subsystems: the perceptual system, action system, attention system, and anticipation system. Topological mappings and recurrent neural networks are employed for spatiotemporal pattern modeling. Furthermore, we show several experimental examples conducted in clinical settings. Finally, we discuss the future directions of research concerning the co-creative rehabilitation platform.
|
|
Mo-PS50-T1 Regular Session, Hawaii 1 |
Add to My Program |
Machine Learning III |
|
|
|
16:00-16:15, Paper Mo-PS50-T1.1 | Add to My Program |
Robust Fault Diagnosis for Gas Turbine Rotor Via Transfer Reinforcement Learning |
|
Zhang, Yufei | Beijing University of Technology |
Keywords: Transfer Learning, Machine Learning, Deep Learning
Abstract: As the central component of a significant power machine, the gas turbine’s fault diagnosis accuracy is critical to the equipment’s safety in service. However, the fault detection of the gas engine rotor system still faces challenges due to the difficulty of acquiring sensitive features and the lack of labeled data. To address these issues, we propose an improved DQN-based Transfer Reinforcement Learning method (Transfer-DQN) for robust gas turbine rotor fault diagnosis. The proposed method takes the collected one-dimensional raw vibration signal as input, with the fault sample set and fault category serving as the model environment and action. It uses a multi-scale one-dimensional wide convolutional neural network (M-WDCNN) with an ϵ-greedy strategy for Q-network fitting and decision making. Additionally, to consider the computational efficiency and differences between fault classes, Transfer-DQN uses multiple fault sample data as the source domain and a single fault class as the target domain, while performing the source-to-target domain transfer learning based on a generative adversarial approach. Extensive experiments on the bearing dataset of Case Western Reserve University and our gas turbine test bench demonstrate the superiority of Transfer-DQN, achieving accuracies of 98.95% and 96.91%, respectively. Compared with baseline approaches, our method breaks through the previous upper limit of 95% to meet the need for robust and efficient fault diagnosis.
|
|
16:15-16:30, Paper Mo-PS50-T1.2 | Add to My Program |
Genetic Programming Lifelong Multitasking Evolution: LLGP-Tasking |
|
Kattan, Ahmed | Ministry of Municipal, Rural Affairs, and Housing, Saudi Arabia |
Doctor, Faiyaz | University of Essex, Essex, UK |
Keywords: Transfer Learning, Computational Intelligence, Evolutionary Computation
Abstract: We present a Lifelong Multi-Tasking learning algorithm based on Genetic Programming referred to as "LLGP-Tasking". This paper extends previously published work on "GP-Tasking", an evolutionary drive optimisation approach for evolving a population of GP trees using a multifaceted strategy. In GP-Tasking, each individual is trained with different training sets and evaluated with multiple fitness functions (where each function represents one task). Empirical evidence demonstrated that the quality of evolved solutions is comparable to standard GP while achieving significantly faster computational time and maintaining smaller evolved population sizes. In this work, we improved GP-Tasking and introduced a new crossover mechanism to transfer useful knowledge across different tasks. Further, we introduced a new population initialisation approach to accumulate knowledge across different domains. The new LLGP-Tasking can solve multiple problems simultaneously and receive new batches of problems sequentially. Experimental results demonstrate that the solutions evolved by LLGP-Tasking are superior to those of standard GP, while it maintains the same search speed as its predecessor (i.e., GP-Tasking).
|
|
16:30-16:45, Paper Mo-PS50-T1.3 | Add to My Program |
Reference Governors Based on Offline Training of Regression Neural Networks |
|
Lim, Chuan Yuan | University of Michigan |
Ossareh, Hamid | University of Vermont |
Kolmanovsky, Ilya V. | University of Michigan |
Keywords: Neural Networks and their Applications, Machine Learning
Abstract: This paper presents two machine learning-based constraint management approaches based on Reference Governors (RGs). The first approach, termed NN-DTC, uses regression neural networks to approximate the distance to constraints. The second, termed NN-NL-RG, uses regression neural networks to approximate the input-output map of a nonlinear RG. Both approaches are shown to enforce constraints for a nonlinear second-order system. For well-trained neural networks, NN-NL-RG requires a smaller dataset than NN-DTC. For systems with multiple constraints, NN-NL-RG is also more computationally efficient than NN-DTC. Finally, promising simulation results are reported for both approaches on a more complex spacecraft proximity maneuvering and docking application.
|
|
16:45-17:00, Paper Mo-PS50-T1.4 | Add to My Program |
Time-Efficient Weapon-Target Assignment by Actor-Critic Reinforcement |
|
Byun, Muhyun | Korea Advanced Institute of Science and Technology (KAIST) |
Na, Hyungho | Korea Advanced Institute of Science and Technology (KAIST) |
Moon, Il-Chul | KAIST |
Keywords: Neural Networks and their Applications, Machine Learning, Application of Artificial Intelligence
Abstract: This paper proposes a time-efficient model for solving the weapon-target assignment (WTA) problem with actor-critic reinforcement learning. While typical heuristic algorithms and recently studied artificial neural network methodologies have shown good performance results, previous approaches have not been time-efficient in large-scale WTA problems. This paper utilizes the actor-critic framework to resolve the WTA problem, and this framework enables retrieving solutions 23 times faster than the previous deep Q-network approach. Additionally, we incorporate a recurrent neural network model of gated recurrent units (GRU) to allow agents to learn the latent state-space of the WTA problem. Our experiments demonstrate the solution quality and the time efficiency compared to traditional heuristic methods as well as recent DQN-based RL models.
|
|
17:00-17:15, Paper Mo-PS50-T1.5 | Add to My Program |
Gait Measurement System Toward Robotic Rehabilitation of Locomotive Functions |
|
Saegusa, Ryo | Kanagawa Institute of Technology |
Keywords: Machine Learning
Abstract: The extension of health expectancy is becoming an increasingly important subject for developed countries with a higher proportion of elderly people. In order to sustain their societies, aged people are encouraged to participate in social activities, while this social participation requires them to keep their locomotive function high enough to accomplish their missions. For this reason, easy and frequent assessment of locomotive function throughout people’s daily lives is strongly expected. One of the promising methods to evaluate locomotive function is the Timed Up and Go test (TUG test), which is widely used in the field of physical therapy for locomotive rehabilitation in medical facilities. Throughout our research activities, we have been investigating robotic approaches to assess cognitive and motor functions of aged people for the purpose of extending their health expectancy. In this article, we propose a novel gait measurement system that will be incorporated into previously developed health care mobile robots. We conducted experiments to measure the walking patterns of 23 elderly subjects in order to evaluate their locomotive functions using the proposed gait measurement system.
|
|
17:15-17:30, Paper Mo-PS50-T1.6 | Add to My Program |
Reinforcement Learning with Experience Sharing for Intelligent Educational Systems (I) |
|
Hare, Ryan | Rowan University |
Tang, Ying | Rowan University |
Keywords: Adaptive Systems, Cyber-physical systems, System Architecture
Abstract: With higher education pushing toward larger class sizes, a large portion of current methodology focuses on one-size-fits-all approaches that can effectively educate a large class. However, when these approaches fail, students can be left behind and fail classes due to simple misunderstandings. Inspired by these issues, this paper proposes a modular reinforcement learning system that can be used in intelligent educational systems to inform personalized student support. Based on a similar method detailed in prior work, this paper proposes experience sharing with tutor agents as a computationally light approach to improve reinforcement learning training speed on the task of student support. We also provide preliminary results obtained from student simulations to demonstrate the effectiveness of the proposed method on reinforcement learning agent performance.
|
|
Mo-PS50-T2 Regular Session, Kona |
Add to My Program |
Evolutionary Computation II |
|
|
|
16:00-16:15, Paper Mo-PS50-T2.1 | Add to My Program |
Comparative Study on Different Encoding Strategies for Multiple Traveling Salesmen Problem |
|
Dou, Xin-Ai | School of Artificial Intelligence, Nanjing University of Informa |
Yang, Qiang | Nanjing University of Information Science and Technology |
Xu, Peilan | Nanjing University of Information Science and Technology |
Gao, Xu-Dong | Nanjing University of Information Science and Technology |
Lu, Zhen-Yu | Nanjing University of Information Science and Technology |
Keywords: Evolutionary Computation, Computational Intelligence, AI and Applications
Abstract: The multiple traveling salesmen problem (MTSP) is an extension of the traditional traveling salesman problem (TSP). It involves both the city assignment optimization and the route optimization of each salesman. Genetic algorithms (GAs) have been widely used to solve MTSP thanks to their ease of implementation and good global search ability. To help GAs effectively solve MTSP, researchers have developed various encoding schemes. However, there is no systematic and comparative study on the effectiveness of these encoding strategies. To fill this gap, this paper conducts investigations to compare four popular encoding strategies for MTSP, namely the one-chromosome encoding, the two-chromosome encoding, the two-part-chromosome encoding and the multi-chromosome encoding. Experimental results on different MTSP instances with different numbers of cities and salesmen show that the multi-chromosome encoding is far better than the other encoding strategies.
|
|
16:15-16:30, Paper Mo-PS50-T2.2 | Add to My Program |
Unsupervised Learning-Based Methodology for Detection of Postural Anomalies in Wheelchair Users (I) |
|
Vermander, Patrick | University of the Basque Country (UPV/EHU) |
Mancisidor, Aitziber | University of the Basque Country (UPV/EHU) |
Fortino, Giancarlo | University of Calabria |
Cabanes, Itziar | University of the Basque Country |
Gravina, Raffaele | University of Calabria |
Keywords: Assistive Technology, Medical Informatics
Abstract: Postural monitoring in wheelchair users is a topic of growing interest. The detection of changes in the sitting patterns of these patients may serve to detect changes in their functional status and allow rehabilitation to be adapted early. For this reason, this paper presents a methodology for the detection of specific postural anomalies that, unlike previous works, adopts unsupervised learning. The proposed methodology involves data dimensionality reduction using Principal Component Analysis, and the application of K-means clustering to group different normal posture states. The anomalies are detected using a threshold approach, where data points that fall outside a certain threshold are considered anomalies. The results show that the methodology is effective in identifying anomalies with a high degree of accuracy (around 90%).
|
|
16:30-16:45, Paper Mo-PS50-T2.3 | Add to My Program |
Binomial Distribution Assisted Individual Selection for Differential Evolution |
|
Ji, Jiawei | Nanjing University of Information Science and Technology |
Yang, Qiang | Nanjing University of Information Science and Technology |
Gao, Xu-Dong | Nanjing University of Information Science and Technology |
Xu, Peilan | Nanjing University of Information Science and Technology |
Lu, Zhen-Yu | Nanjing University of Information Science and Technology |
Keywords: Evolutionary Computation, Metaheuristic Algorithms, Computational Intelligence
Abstract: Mutation plays a crucial role in assisting differential evolution (DE) to effectively solve optimization problems. The key to mutation lies in the selection of parent individuals participating in the mutation. To this end, this paper devises a binomial distribution-assisted individual selection strategy for DE. Specifically, this paper takes advantage of the probability distribution function of the binomial distribution to assign weights to individuals based on their fitness rankings. In this way, the selection of individuals focuses more on moderately good individuals instead of the very best ones. Therefore, high mutation diversity can be preserved, and thus falling into local regions is likely to be effectively avoided. Embedding this selection strategy into DE, a novel DE variant called binomial distribution assisted DE (BDDE) is developed. Experiments conducted on the CEC2017 benchmark suite have verified the effectiveness of BDDE in solving optimization problems. In particular, BDDE achieves much better performance than well-known and representative mutation strategies.
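One possible reading of the rank-weighting idea in the abstract above is sketched below. It is an assumption-laden illustration, not the BDDE algorithm itself: the mapping from rank to weight (the Binomial(n-1, p) probability mass at rank r), the parameter p, and both function names are hypothetical choices made for this example.

```python
import math
import random


def binomial_rank_weights(n, p=0.3):
    """Selection weights for n ranked individuals (rank 0 = best).

    The weight of rank r is the Binomial(n - 1, p) probability mass at r,
    so the weights sum to 1 and peak near rank p * (n - 1) rather than at
    the very best individual. The value of p is an assumed parameter.
    """
    return [math.comb(n - 1, r) * p**r * (1 - p) ** (n - 1 - r)
            for r in range(n)]


def select_parents(population, fitness, count, p=0.3, rng=random):
    """Sample count parents (with replacement) using binomial rank weights.

    Minimization is assumed: a lower fitness value means a better rank.
    """
    ranked = sorted(population, key=fitness)
    weights = binomial_rank_weights(len(ranked), p)
    return rng.choices(ranked, weights=weights, k=count)
```

With n = 11 and p = 0.3, the heaviest weight falls on rank 3 rather than rank 0, which matches the abstract's stated aim of favoring good-but-not-best individuals to preserve mutation diversity.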
|
|
16:45-17:00, Paper Mo-PS50-T2.4 | Add to My Program |
Random Pairwise Competition Based Ant Selection for Pheromone Updating in Ant Colony Optimization |
|
Cao, Hao | Nanjing University of Information Science and Technology |
Yang, Qiang | Nanjing University of Information Science and Technology |
Gao, Xu-Dong | Nanjing University of Information Science and Technology |
Xu, Peilan | Nanjing University of Information Science and Technology |
Lu, Zhen-Yu | Nanjing University of Information Science and Technology |
Zhang, Jun | Hanyang University |
Keywords: Evolutionary Computation, Metaheuristic Algorithms, Computational Intelligence
Abstract: Ant Colony Optimization (ACO) has shown very promising performance in solving the Traveling Salesman Problem (TSP). However, most existing ACO algorithms utilize either the absolutely best ants or all ants to update the pheromone matrix. This leads to either serious diversity loss or slow convergence. To alleviate these predicaments, this paper designs a random pairwise competition based ant selection for pheromone updating. Specifically, a number of ants are randomly selected from the ant colony and then randomly paired together. Subsequently, the better one in each pair is selected to update the pheromone matrix. In this way, a good balance between search diversity and search convergence is potentially maintained. Integrating this selection strategy along with a local search scheme into the ACO framework, a new ACO algorithm called random pairwise competition based ACO (RPCACO) is developed. Experiments conducted on 8 TSP instances from the TSPLIB benchmark set demonstrate that RPCACO is more effective and efficient than five classical ACO algorithms in solving TSP.
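The selection step described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the RPCACO implementation: the function names and the parameter q are hypothetical, the deposit rule is a generic Ant System style update on a symmetric TSP, and evaporation and the local search scheme are omitted.

```python
import random


def pairwise_competition_select(tour_lengths, num_selected, rng=random):
    """Return indices of the ants that win random pairwise competitions.

    tour_lengths[i] is the tour length of ant i (shorter is better).
    num_selected ants are drawn at random without replacement; because
    the draw comes back in random order, consecutive entries form random
    pairs, and the shorter tour in each pair wins.
    """
    if num_selected % 2 != 0 or num_selected > len(tour_lengths):
        raise ValueError("num_selected must be even and at most the colony size")
    chosen = rng.sample(range(len(tour_lengths)), num_selected)
    winners = []
    for a, b in zip(chosen[0::2], chosen[1::2]):
        winners.append(a if tour_lengths[a] <= tour_lengths[b] else b)
    return winners


def deposit_pheromone(pheromone, tours, tour_lengths, winners, q=1.0):
    """Add q / length to every edge of each winning ant's closed tour."""
    for ant in winners:
        tour = tours[ant]
        amount = q / tour_lengths[ant]
        # Pair each city with its successor, wrapping around to close the tour.
        for u, v in zip(tour, tour[1:] + tour[:1]):
            pheromone[u][v] += amount
            pheromone[v][u] += amount
```

Only half of the sampled ants deposit pheromone, and a mediocre ant can still win if it happens to be paired with a worse one, which is how the scheme sits between best-only and all-ants updating.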
|
|
Mo-PS50-T3 Regular Session, Hawaii 5 |
Add to My Program |
Shared Control |
|
|
|
16:00-16:15, Paper Mo-PS50-T3.1 | Add to My Program |
Shared Control for Giving Ordinary Drivers Expert Level Drifting Skills |
|
Karino, Izumi | Toyota Research Institute |
Dallas, James | Toyota Research Institute |
Goh, Jonathan | Toyota Research Institute |
Keywords: Human-Machine Cooperation and Systems, Human-Computer Interaction
Abstract: Expanding the shared control paradigm for autonomous vehicles to include the open-loop unstable region can improve safety in extreme vehicle conditions, such as tire saturation from encountering low friction or from emergency lane changes. Whereas the state-of-the-art sacrifices agility for stability by restricting the vehicle domain to operate in the open-loop stable regime, this paper expands existing shared control approaches to ensure safety outside the open-loop stable region. A Nonlinear Model Predictive Control framework balances a cost to follow the driver's commands and costs for safety, such that it does not disturb the driver in safe states and intervenes seamlessly, only when necessary, in dangerous states. Specifically, a novel cost function is formulated to incorporate driver intent, while ensuring safety through costs on the maximum phase recovery envelope and track bounds. Sufficient model fidelity for drifting is achieved through a nonlinear bicycle model that incorporates a nonlinear tire model and wheelspeed dynamics. Circular drifting experiments with a full-scale vehicle demonstrate the ability of the controller to follow driver commands in safe states, while augmenting the driver commands to avoid track bound violations and spin-out when drifting a circle.
|
|
16:15-16:30, Paper Mo-PS50-T3.2 | Add to My Program |
Rule Renew Based on Learning Classifier System and Its Application to UAVs Swarm Adversarial Strategy Design |
|
Li, Xuanlu | Southeast University |
Zhang, Ya | Southeast University |
Keywords: Team Performance and Training Systems, Systems Safety and Security, Shared Control
Abstract: This paper studies how to renew and improve expert rules and applies the rule update mechanism to optimize a UAV swarm adversarial strategy. A rule update approach is proposed, which uses the classifier subsystem to construct a training model based on expert experience, and further trains the model through rule evaluation mechanisms and rule discovery subsystems to improve and enhance the rule base. A UAV swarm confrontation strategy model is further proposed based on the learning classifier system (LCS). Under a simulated aerial engagement environment of island capture between the red and blue sides, simulation experiments show that the model has robust combat effectiveness and offers significant practical utility for agent decision-making.
|
|
16:30-16:45, Paper Mo-PS50-T3.3 | Add to My Program |
Illusory Control with Instant Virtual World Environment |
|
Aoki, Junki | Ricoh Co., Ltd./Kyushu University |
Sasaki, Fumihiro | Ricoh Company, LTD |
Yamashina, Ryota | Ricoh Company, Ltd |
Kurazume, Ryo | Kyushu University |
Keywords: Human-Computer Interaction, Shared Control, Virtual and Augmented Reality Systems
Abstract: We previously proposed a teleoperation method, illusory control (IC), that provides a comfortable operation experience using a seamless transition between real and pre-prepared virtual environments. Because the virtual environment must be prepared in advance, a mobile robot with IC can function solely in familiar environments. This study proposes a novel method, instant IC, that eliminates the requirement for a pre-prepared virtual environment. The proposed robot system can instantly generate a virtual environment using actual 360° images of the robot in motion, utilizing instant neural graphics primitives and neural radiance fields. The 360° images allow the entire surrounding environment to be virtualized without requiring specific camera orientations. In addition, by optimizing the density of neural radiance fields using depth estimation results beforehand, the reconstruction accuracy at unknown poses can be guaranteed. Furthermore, we propose a depth scaling method based on the actual measurements obtained by LiDAR to increase the consistency of virtual and real environments. With this instant virtual environment, the proposed system enables teleoperation in unknown environments via the seamless transition between real and virtual environments. The experimental results exhibit consistent and smooth back-and-forth transitions between virtual and real space in mobile robot teleoperation.
|
|
16:45-17:00, Paper Mo-PS50-T3.4 | Add to My Program |
Follow My Lead: Designing an ADAS That Shares Decision Making and Control with the Driver |
|
Weiss, Elliot | Stanford University |
Gerdes, Chris | Stanford University |
Keywords: Human-Centered Transportation, Shared Control, Virtual/Augmented/Mixed Reality
Abstract: Driving through traffic often involves sequences of distinct maneuvers, for example changing lanes and overtaking slower vehicles. To assist drivers in these situations, a system for shared decision making and control (SDMC) is developed, taking inspiration from the lead-follow relationship in partner dancing. The SDMC system plans maneuvers through parallel nonlinear optimizations, infers the driver's intended maneuver, and shares control over steering, throttle, and braking actuators to jointly execute the maneuver. Experimental results in overtaking and lane changing scenarios demonstrate the driver's ability to guide the system through sequences of maneuvers and the system's support of the driver via shared lateral and longitudinal control.
|
|
17:00-17:15, Paper Mo-PS50-T3.5 | Add to My Program |
Automatic Adjustment Method of an Operational Assist Rate for a Hydraulic Excavator by Shared Management |
|
Hiraoka, Kei | Hiroshima University |
Yamamoto, Toru | Hiroshima University |
Kozui, Masatoshi | KOBELCO Construction Machinery Co., Ltd |
Koiwai, Kazushige | Kobelco Construction Machinery Co., LTD |
Yamashita, Koji | Kobelco Construction Machinery Co., Ltd |
Keywords: Human-Machine Cooperation and Systems, Shared Control, Design Methods
Abstract: In recent years, sustainable development goals (SDGs) have attracted attention. In Japan, “Society 5.0” has been proposed and promoted by various organizations as a vision for a future society linked to the achievement of SDGs. In particular, “i-Construction” is being promoted in the construction industry. For example, hydraulic excavators are being automated (or semiautomated). Nevertheless, it is important for operators to have a sense of accomplishment and motivation for their work through active operations. In this study, a control system that automatically adjusts an operational assist rate according to the person is proposed. It is difficult to model a person because they exhibit time-varying and non-linear characteristics. Therefore, the control system uses work data to evaluate differences from the desired characteristics, and adaptively adjusts the operational assist rate. The proposed method is implemented on a hydraulic excavator and its effectiveness is verified.
|
|
17:15-17:30, Paper Mo-PS50-T3.6 | Add to My Program |
Safe Bilateral Teleoperation for a UAV Using Control Barrier Functions and Passivity |
|
Liu, Kai-Yuan | National Cheng Kung University |
Ibuki, Tatsuya | Meiji University |
Liu, Yen-Chen | National Cheng Kung University |
Keywords: Haptic Systems, Shared Control
Abstract: In this work, an optimization-based control scheme for a bilaterally teleoperated unmanned aerial vehicle (UAV) is proposed using control barrier functions (CBF) and passivity. We consider a human operator moving the position of an end effector of a haptic device as the velocity command input to the UAV. The CBF is applied as a constraint to maintain the collision-free motion of the UAV for safety. To preserve the stability of the systems, we additionally impose passivity as a constraint, but this leads to undesired behavior of the optimal solution for haptic feedback. To deal with this issue, an energy tank strategy for passivity is included to replace the condition of strict output passivity. These designs are formalized as a quadratically constrained quadratic program (QCQP) that is solved numerically. Through numerical examples, we comprehensively discuss the features of the proposed controller and its improved safety. The experimental results verify the effectiveness of our method in practice.
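The CBF constraint mentioned in the abstract can be illustrated with a much simpler case than the paper's QCQP. The sketch below assumes a one-dimensional single-integrator model (position x, commanded velocity u) and a single wall at x_max; the function name, the gain alpha, and the model are illustrative assumptions, and passivity and the energy tank are not modeled. For the barrier h(x) = x_max - x, the condition dh/dt + alpha*h >= 0 reduces to u <= alpha*(x_max - x), so the closest safe command to the operator's request is a simple clamp.

```python
def safe_velocity(u_desired, x, x_max, alpha=1.0):
    """Minimal CBF safety filter for the 1-D model x' = u (illustrative only).

    Barrier: h(x) = x_max - x, required to stay non-negative.
    CBF condition: dh/dt + alpha * h >= 0  <=>  u <= alpha * (x_max - x).
    Minimizing (u - u_desired)^2 under that single bound is a clamp.
    """
    upper_bound = alpha * (x_max - x)
    return min(u_desired, upper_bound)


def simulate(u_desired, x0, x_max, alpha=1.0, dt=0.1, steps=100):
    """Forward-Euler rollout with the filter applied at every step."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x = x + dt * safe_velocity(u_desired, x, x_max, alpha)
        trajectory.append(x)
    return trajectory
```

With dt * alpha <= 1, each Euler step moves the state at most the remaining distance to the wall, so the rollout approaches x_max without crossing it.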
|
|
Mo-PS50-T4 Regular Session, Honolulu |
Add to My Program |
Intelligent Transportation Systems |
|
|
|
16:00-16:15, Paper Mo-PS50-T4.1 | Add to My Program |
Highway Condition Analysis and Traffic Safety Monitoring System through Analysis of Time-Series Data from LiDAR-Based Probe Vehicle |
|
Kim, Dohun | Electronics and Telecommunications Research Institute |
Kim, Hongjin | Korea Expressway Corporation |
Han, Sangjin | Seoul National University |
Kim, Wonjong | ETRI |
Keywords: Intelligent Transportation Systems, System Architecture, Smart Buildings, Smart Cities and Infrastructures
Abstract: Recently, various approaches have been developed for traffic analysis systems, ranging from Vehicle Detection Systems (VDS) to mobile detectors. However, stationary or mobile detectors have clear limitations. For example, in actual driving situations such as intersections or exits, each lane can face completely different traffic conditions, revealing the limitations of traffic information provided by conventional stationary detectors. In this paper, we propose a methodology to efficiently monitor the traffic condition and safety on the road by utilizing LiDAR sensors installed on vehicles to collect continuous traffic information about surrounding vehicles. To implement the Mobile Detection System (MDS), we collect point cloud data from LiDAR and detect the position and size of vehicles using a deep learning-based voxel-RCNN. We then convert the data into traffic information for analysis. Furthermore, we propose an efficient method for analyzing road hazards by introducing the MTTC method based on the TTC for hazard assessment. To evaluate the performance of the proposed method, we compare its reliability with that of conventional VDS and perform road hazards analysis using LiDAR-based probe vehicles with data collected directly from highways in Korea.
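As a minimal sketch of the hazard measure this abstract builds on, the standard time-to-collision (TTC) for two vehicles in the same lane is the gap divided by the closing speed. The function and parameter names below are illustrative assumptions; the abstract does not define the MTTC variant, so it is not shown here.

```python
def time_to_collision(gap_m, follower_speed_mps, leader_speed_mps):
    """Return the TTC in seconds for a follower approaching a leader.

    gap_m: bumper-to-bumper distance in meters.
    Returns infinity when the follower is not closing the gap.
    """
    closing_speed = follower_speed_mps - leader_speed_mps
    if closing_speed <= 0:
        return float("inf")
    return gap_m / closing_speed
```

For example, a 30 m gap with the follower at 25 m/s and the leader at 20 m/s closes at 5 m/s, which corresponds to a TTC of 6 s.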
|
|
16:15-16:30, Paper Mo-PS50-T4.2 | Add to My Program |
Key Requirements for Autonomous Micromobility Vehicle Simulators |
|
Luttkus, Lennart | University of Augsburg |
Mikelsons, Lars | University of Augsburg |
Keywords: Intelligent Transportation Systems, Autonomous Vehicle, Trust in Autonomous Systems
Abstract: With the growing demand for autonomous micromobility vehicles, developing robust and effective simulators for them becomes increasingly important. This research paper examines the essential requirements of a simulator for autonomous micromobility vehicles, focusing on aspects such as accurate sensor modeling, realistic pedestrian behavior, customizability, scenario and vehicle library, scalability, and user-friendliness. By analyzing these key features, we provide a comprehensive understanding of the necessary components for an effective simulation environment, aiming to enable researchers, developers, and other stakeholders to design, test, and evaluate autonomous micromobility vehicles in a safe and controlled manner. By addressing these requirements, simulators can significantly contribute to the advancement of autonomous micromobility technology, leading to safer and more efficient urban transportation systems in the future.
|
|
16:45-17:00, Paper Mo-PS50-T4.4 | Add to My Program |
Vehicle Map Mapping and Parking Occupancy Estimation System with No Ambiguity |
|
Oh, Cheonin | Electronics and Telecommunications Research Institute (ETRI) |
Shin, Sungwoong | Electronics and Telecommunications Research Institute(ETRI) |
Yoon, Daesub | ETRI |
Choi, Sunglok | SEOULTECH |
Keywords: Intelligent Transportation Systems, Infrastructure Systems and Services, Autonomous Vehicle
Abstract: This paper presents a novel approach for vehicle map mapping and estimating parking occupancy in parking lots using a single camera without parking lines. The system is capable of detecting around 10 vehicles even with just one camera, and is able to track vehicles with the same ID as they enter and pass through the parking lot. Furthermore, it proposes a method to map the locations of vehicles on real-world maps, allowing for accurate estimation of parking occupancy. The system achieved high accuracy in vehicle detection and tracking, with a 98.4% recall rate and 100% tracking accuracy. The accuracy of parking occupancy estimation was calculated using root mean square error, with an average error of 0.24 m and a maximum error of 0.36 m. The results demonstrate the feasibility of detecting and tracking vehicles, as well as estimating parking occupancy, in parking lots without parking lines using only one camera. Further testing on multiple sites, including nighttime and adverse weather conditions, is needed to increase the reliability of the system.
|
|
17:00-17:15, Paper Mo-PS50-T4.5 | Add to My Program |
Vehicle Occurrence-Based Parking Space Detection |
|
Lisboa de Almeida, Paulo Ricardo | Universidade Federal Do Paraná |
Honório Alves, Jeovane | Federal University of Paraná |
Oliveira, Luiz S. | UFPR |
Hochuli, Andre Gustavo | Pontifícia Universidade Católica Do Paraná (PPGIA/PUCPR) |
Fröhlich, João Vitor | Universidade Do Estado De Santa Catarina |
Krauel, Rodrigo Augusto | Universidade Do Estado De Santa Catarina |
Keywords: Intelligent Transportation Systems, Smart Buildings, Smart Cities and Infrastructures, Infrastructure Systems and Services
Abstract: Smart-parking solutions use sensors, cameras, and data analysis to improve parking efficiency and reduce traffic congestion. Computer vision-based methods have been used extensively in recent years to tackle the problem of parking lot management, but most of the works assume that the parking spots are manually labeled, impacting the cost and feasibility of deployment. To fill this gap, this work presents an automatic parking space detection method, which receives a sequence of images of a parking lot and returns a list of coordinates identifying the detected parking spaces. The proposed method employs instance segmentation to identify cars and, using vehicle occurrence, generate a heat map of parking spaces. The results using twelve different subsets from the PKLot and CNRPark-EXT parking lot datasets show that the method achieved an AP25 score up to 95.60% and AP50 score up to 79.90%.
|
|
17:15-17:30, Paper Mo-PS50-T4.6 | Add to My Program |
Vehicle Routing Problem with Fair Profits and Time Windows (VRP-FPTW) |
|
Lopez Sanchez, Aitor | University Rey Juan Carlos, CIF: Q2803011B |
Lujak, Marin | University Rey Juan Carlos |
Semet, Frederic | Centrale Lille, Univ. Lille, CNRS, Inria |
Billhardt, Holger | University Rey Juan Carlos |
Keywords: Intelligent Transportation Systems, Cooperative Systems and Control, Autonomous Vehicle
Abstract: In crowdsourced delivery organizations, where individual vehicles with shared common goals may have conflicting individual interests, the preference is for collaboration over competition, provided it is less costly. However, achieving a balance between the efficiency of individual vehicles and the overall fleet poses a challenge. This paper introduces a novel Vehicle Routing Problem with Fair Profits and Time Windows (VRP-FPTW), which aims to meet customer demand and stringent time windows while maximizing the profit of the worst-off vehicle in the fleet. We propose centralized and distributed vehicle routing models for this problem, both with solution-quality guarantees. The distributed approach is tailored for multi-agent systems relying on a coordination mechanism where each vehicle, modeled as an individually rational agent, finds its route autonomously in coordination with a fleet coordinator agent, without sharing its private information. The objective of a vehicle agent is to maximize its own profit while following the fleet's norms and regulations based on shared values. Simulation experiments provide compelling evidence of the robustness and scalability of the proposed distributed approach, showcasing significant enhancements in both solution quality and computational efficiency, particularly when dealing with larger vehicle fleets.
|
|
Mo-PS50-T5 Regular Session, Kahuku |
Add to My Program |
Virtual, Augmented, Mixed Reality |
|
|
|
16:00-16:15, Paper Mo-PS50-T5.1 | Add to My Program |
Deep3DSketch+\+: High-Fidelity 3D Modeling from Single Free-Hand Sketches |
|
Zang, Ying | Huzhou University |
Ding, Chaotao | Huzhou University |
Chen, Tianrun | Zhejiang University |
Mao, Papa | KOKONI, Moxin (Huzhou) Technology Co., LTD |
Wenjun, Hu | Huzhou University School of Information Engineering |
Keywords: Multimedia Systems, Design Methods, Virtual/Augmented/Mixed Reality
Abstract: The rise of AR/VR has led to an increased demand for 3D content. However, the traditional method of creating 3D content using Computer-Aided Design (CAD) is a labor-intensive and skill-demanding process, making it difficult to use for novice users. Sketch-based 3D modeling provides a promising solution by leveraging the intuitive nature of human-computer interaction. However, generating high-quality content that accurately reflects the creator's ideas can be challenging due to the sparsity and ambiguity of sketches. Furthermore, novice users often find it challenging to create accurate drawings from multiple perspectives or follow step-by-step instructions in existing methods. To address this, we introduce a groundbreaking end-to-end approach in our work, enabling 3D modeling from a single free-hand sketch, Deep3DSketch+\+. The issue of sparsity and ambiguity when using a single sketch is resolved in our approach by leveraging the symmetry prior and structural-aware shape discriminator. We conducted comprehensive experiments on diverse datasets, including both synthetic and real data, to validate the efficacy of our approach and demonstrate its state-of-the-art (SOTA) performance. Users are also more satisfied with results generated by our approach according to our user study. We believe our approach has the potential to revolutionize the process of 3D modeling by offering an intuitive and easy-to-use solution for novice users.
|
|
16:15-16:30, Paper Mo-PS50-T5.2 | Add to My Program |
Capturing Quantitative Data from UI Prototypes for AR and VR Using Online Remote User Testing |
|
Garcia, Sarah | University of South Florida |
Andujar, Marvin | University of South Florida |
Keywords: User Interface Design, Human-Computer Interaction, Virtual/Augmented/Mixed Reality
Abstract: As the development of augmented reality (AR) and virtual reality (VR) applications is still limited to those with substantial amounts of technical knowledge, the prototyping and testing of user interface (UI) designs for AR and VR applications remotely proves difficult. Recent tools proposed for prototyping AR/VR applications focus on working toward increased fidelity of prototyping methods, but provide limited ability to easily collect objective quantitative data from user interactions with prototypes, especially in remote settings. In this paper, we present a remote usability study using Adobe XD rapid prototyping software integrated with Maze User Testing Software, which collects data for UI designs for an existing cross-platform software. The results were obtained by collecting task completion time, misclick rate, and user click data using heatmaps. We discuss the results of our study, and show that objective quantitative data can be collected for AR/VR prototypes in remote testing settings to provide insightful usability feedback in early stages of interface design.
|
|
16:30-16:45, Paper Mo-PS50-T5.3 | Add to My Program |
An Augmented Cooperative Setting for Training the Embodiment of an Artificial Lower Limb |
|
Mariani, Giulia | Istituto Italiano Di Tecnologia (IIT) |
Tessari, Federico | Massachusetts Institute of Technology |
Ferraresi, Carlo | Politecnico Di Torino |
Lucania, Elena | Politecnico Di Torino |
Lo Tauro, Rebecca | Politecnico Di Torino |
Freddolini, Marco | Istituto Italiano Di Tecnologia |
Traverso, Simone | Istituto Italiano Di Tecnologia (IIT) |
Cherubini, Andrea | Istituto Italiano Di Tecnologia (IIT) |
Gruppioni, Emanuele | INAIL - Centro Protesi |
Laffranchi, Matteo | Istituto Italiano Di Tecnologia |
De Michieli, Lorenzo | Istituto Italiano Di Tecnologia |
Barresi, Giacinto | Istituto Italiano Di Tecnologia |
Keywords: Human-Machine Interaction, Human-Computer Interaction, Virtual/Augmented/Mixed Reality
Abstract: Literature highlights how virtual and augmented settings offer engaging solutions to improve one’s feeling of an artificial limb embodiment. In this paper, we explored the potential of a setting for Spatial Augmented Reality (SAR, where a display augments a surface without making the user wear any visor) in two conditions of a lower limb ownership training involving subjects without disabilities. In the first condition, the subject must contract the quadriceps of a leg for commanding (through electromyography, EMG) a virtual leg (a 3D model of the Hybrid Knee prosthesis) to kick a virtual wall: each collision corresponds to a vibratory feedback on the thigh (a position defined for upcoming tests with transfemoral amputees). The second condition adds a social context to engage the user: the subject is asked to cooperate with another (fictional) player to kick on the same virtual wall. Subjective (through questionnaires) and objective (according to the number of kicks as a performance index, and the proprioceptive drift as an embodiment index) assessments have been performed before a rubber leg illusion test. Overall, we observed how the cooperative task can engage the subject to be more active. However, this condition can reduce the impact of the training on the embodiment itself, probably because the social task generates a distraction. Nevertheless, such findings suggest the possibility to alternate these two tasks in the same session to increase the duration of a prosthetic embodiment training.
|
|
16:45-17:00, Paper Mo-PS50-T5.4 | Add to My Program |
A Multimodal Virtual Keyboard in Fully Immersive Virtual Reality |
|
Smrkovsky, Eric | California State University, Fresno |
Wong, Ren Hao | California State University, Fresno |
Tran, Thuy Uyen My | California State University, Fresno |
Cecotti, Hubert | California State University, Fresno |
Keywords: Human-Computer Interaction, Virtual/Augmented/Mixed Reality, Virtual and Augmented Reality Systems
Abstract: Virtual reality (VR) technology can be used for multiple purposes, for simulating or enhancing real-life situations. VR-based interfaces can provide new operational advantages and reduce hardware requirements, i.e., weight, by offering various enhanced input modalities that otherwise could not be used in a physical cockpit. This study presents a comparative analysis of three input modalities for a virtual keyboard in a fully immersive VR environment. Each input modality has two steps: 1) pointing to the desired button, and 2) the selection of the item. We compare the hand laser pointer with trigger selection, the head laser pointer with trigger selection, and the head laser pointer with the dwell time selection (no need to use the hands). In addition, we propose new types of visual feedback: for all the input modalities, we display in the field of view of the user the last 5 letters. For the modality with the dwell time, we add a circular progress bar around the field of view of the user to provide a constant awareness of the possible button selection. The proposed virtual keyboard input modalities were assessed by 32 participants. The results support the conclusion that the hand pointer with the trigger control provides the best speed compared to other modalities.
|
|
17:00-17:15, Paper Mo-PS50-T5.5 | Add to My Program |
A Biofeedback-Enhanced Virtual Exergame for Upper Limb Repetitive Motor Tasks |
|
Galletti, Chiara | Istituto Italiano Di Tecnologia (IIT) |
Parente, Chiara | Politecnico Di Torino |
Bottino, Andrea | Politecnico Di Torino |
Lamberti, Fabrizio | Politecnico Di Torino |
Salatino, Laura | Istituto Italiano Di Tecnologia |
de Zambotti, Massimiliano | SRI International |
Podda, Jessica | Associazione Italiana Sclerosi Multipla (AISM) |
Tacchino, Andrea | Associazione Italiana Sclerosi Multipla (AISM) |
Brichetto, Giampaolo | Associazione Italiana Sclerosi Multipla (AISM) |
De Michieli, Lorenzo | Istituto Italiano Di Tecnologia |
Barresi, Giacinto | Istituto Italiano Di Tecnologia |
Keywords: Virtual/Augmented/Mixed Reality, Human-Computer Interaction, Biometrics and Applications
Abstract: Upper Limb (UL) Rehabilitation in Multiple Sclerosis (MS) is an open research field due to the complex interplay between cognitive and physical dysfunctions. Virtual Reality (VR) can address such an issue by enriching physical training with engaging features, including biofeedback strategies to self-regulate autonomic functions according to the visualisation of indices like heart rate variability (HRV). In the present work, HRV biofeedback is introduced in a VR-based exergame (a game designed to promote exercising), tailored to rehabilitation of the dominant upper limb in Persons with MS (PwMS). The exergame is based on a dual-task paradigm, integrating a UL motor rehabilitative task with a breathing task. The aim is to investigate how the design developed for the HRV biofeedback affects engagement and performance during the exergame session. As a preliminary study, sixteen able-bodied subjects are tested in a within-subjects design, to assess the quality of the game features and design, before approaching MS patients. Two conditions are presented, with and without biofeedback. The proposed HRV biofeedback has two possible levels, depending on whether or not the desired respiratory rate of six breaths/min is successfully maintained. It is used to control game elements and change the difficulty of the session. The main finding of this study is that biofeedback improves both user performance and experience in healthy subjects. These results underline the great potential of this technique to promote engagement. Thus, they point to fostering the rehabilitative effectiveness of repetitive motor tasks and encouraging adherence to long-term training. Future studies will encompass fine tuning of the experimental setup and include PwMS to further adjust the game to patients’ needs and observe the setup's compliance with rehabilitation settings.
|
|
17:15-17:30, Paper Mo-PS50-T5.6 | Add to My Program |
Cognitive Workload and Usability of Virtual Reality Simulation for Prosthesis Training |
|
Park, Junho | Texas A&M University |
Music, Austin | University of Florida |
Delgado, Daniel | University of Florida |
Berman, Joseph | North Carolina State University |
Dodson, Albert | North Carolina State University and University of North Carolina |
Liu, Yunmei | University of Florida |
Ruiz, Jaime | University of Florida |
Huang, He (Helen) | North Carolina State University/ University of North Carolina At |
Kaber, David | University of Florida |
Zahabi, Maryam | Texas A&M University |
Keywords: Assistive Technology, Virtual/Augmented/Mixed Reality, Human-Machine Interaction
Abstract: Amputees use prosthetic devices to perform activities of daily living. However, some users reject their devices due to the lack of usability or high cognitive workload. Although virtual reality has been studied in this domain for training purposes, there has not been any investigation on usability and cognitive workload of using virtual reality simulations for training of prosthetic devices. The objective of this study was to compare cognitive workload and usability of using virtual reality-based simulation of electromyography-based prosthetic devices and physical devices. The findings suggested that using virtual reality simulations was helpful in reducing cognitive workload and increasing perceived usability of prosthetic devices.
|
|
17:30-17:45, Paper Mo-PS50-T5.7 | Add to My Program |
A Multi-Level HXRI-Based Approach for XR-Based Surgical Training |
|
Gupta, Avinash | University of Illinois Urbana-Champaign |
Cecil, J. | Oklahoma State University |
Pirela-Cruz, Miguel | Texas Tech University Health Sciences Center |
Keywords: Virtual/Augmented/Mixed Reality, Human-Computer Interaction, Virtual and Augmented Reality Systems
Abstract: In the past decade, Extended Reality (XR), through rapid technological advancement, has managed to make modest changes in surgical training approaches. Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR)-based simulators have been adopted for training in various surgical domains such as brain, eye, laparoscopic, and orthopedic surgery among others. However, there is a lack of effort in exploring Human Extended Reality Interaction (HXRI)-based concepts to design effective training environments that corroborate the actual training procedure. In this paper, an HXRI-based multi-level approach has been proposed for the creation of XR-based training environments. The multi-level approach takes advantage of both Virtual and Mixed Reality to create a holistic training environment while eliminating the disadvantage arising from using stand-alone VR and MR-based training environments. The approach was based on the results of an exhaustive study in which medical personnel interacted with VR and MR environments to train in an orthopedic surgical procedure. The approach, design, and development of the XR environments, the study, and the results from the study which underscore the importance of such a multi-level approach are presented in the paper.
|
|
Mo-PS50-T6 Regular Session, Oahu |
Add to My Program |
Navigation and Positioning Systems |
|
|
|
16:30-16:45, Paper Mo-PS50-T6.3 | Add to My Program |
Composition Optimization of Moving Objects Using Recursive Gaussian Process for Photography Drone |
|
Yokomatsu, Taisei | Meijo University |
Sekiyama, Kosuke | Meijo University |
Keywords: Modeling of Autonomous Systems, Robotic Systems
Abstract: In this study, we developed a process of heuristically searching and selecting the optimal viewpoint for capturing a good composition of a group of moving subjects with an autonomous indoor drone camera. The subjects on the drone’s camera screen are represented using a Gaussian mixture model. The Kullback–Leibler divergence between the Gaussian mixture model and a user-defined reference composition is evaluated and defined as the composition evaluation value. The drone searches for a viewpoint in a 3D space to optimize this value using particle swarm optimization. To facilitate the search, a recursive Gaussian process is employed to update the prediction of the observation result. Through the proposed method, a sufficient optimal viewpoint can be obtained, even for moving subjects.
|
|
16:45-17:00, Paper Mo-PS50-T6.4 | Add to My Program |
Improving Accuracy of Stereo Matching of Aerial Images by Extending the Baseline Length Based on RTK-GNSS and Application to Depth Measurement of Earthquake Cracks on the Ground Surface |
|
Tanabe, Ryota | University of Tsukuba |
Shoji, Gaku | University of Tsukuba |
Nobuhara, Hajime | University of Tsukuba |
Keywords: Infrastructure Systems and Services, Smart Metering, Smart Buildings, Smart Cities and Infrastructures
Abstract: This study measured the depth of cracks on the ground surface by stereo matching aerial images of the cracks captured using a drone equipped with a real-time kinematic global navigation satellite system (RTK-GNSS) and a high-resolution compact digital camera. Existing crack depth measurement methods are time-consuming, expensive, and cannot measure a wide area. In comparison, the proposed method can measure the depth of cracks in a short time and over a wide area by stereo matching only two aerial images. In addition, the drone's movement extends the baseline length, which increases the resolution in the height direction, resulting in precise depth measurements. To demonstrate the effectiveness of the proposed method, we took aerial images of cracks made of Styrofoam using shooting equipment built from an aluminum frame and measured their depth. We realized stable millimeter-order crack depth measurements with a height resolution of 1.404 mm, and achieved errors of 0.14% and 0.13% for 5 cm and 15 cm deep cracks, respectively.
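The claim that a longer baseline improves height resolution follows from the standard pinhole stereo relation Z = f * B / d. The sketch below uses that textbook relation only; the function names and all numeric values are illustrative assumptions and are not taken from the paper's setup.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth Z (meters) from the stereo relation Z = f * B / d."""
    return focal_px * baseline_m / disparity_px


def depth_resolution(depth_m, focal_px, baseline_m, disparity_step_px=1.0):
    """Approximate depth change caused by one disparity step.

    Differentiating Z = f * B / d gives |dZ| ~= Z**2 / (f * B) * |dd|,
    so the resolvable depth step shrinks as the baseline B grows.
    """
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_step_px
```

Under these assumptions, at a depth of 10 m with a 4000-pixel focal length, a 1 m baseline resolves about 0.025 m per pixel of disparity, and doubling the baseline to 2 m halves that to about 0.0125 m.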
|
|
17:00-17:15, Paper Mo-PS50-T6.5 | Add to My Program |
Maritime Path Planning Using Heuristic Evaluation Functions for Weather Parameters |
|
Bienkowski, Adam | University of Connecticut |
Pattipati, Krishna | University of Connecticut |
Sidoti, David | US Naval Research Laboratory |
Keywords: Intelligent Transportation Systems, Decision Support Systems
Abstract: Planning ship routes that take into account meteorological and oceanographic conditions is a salient problem for both commercial and Naval applications. The A* algorithm is a very common method for finding the minimum cost path in a graph. Finding appropriate heuristics for a cost function is critical to the computational efficiency and memory requirements of the A* algorithm. We propose heuristic evaluation functions (HEFs) based on the minimum, mean, median, and mode of the cost function values over the reachable nodes given time constraints. Only the HEF based on the minimum is admissible, but we show that the other heuristics are able to find near-optimal solutions in substantially shorter times than Dijkstra's algorithm, which does not use a heuristic. We evaluate these heuristics over many scenarios and show that overall the HEF based on the mean value performs the best. This HEF can be used in time-critical applications where an occasional loss of optimality is sacrificed for faster run time, such as real-time planning and control, or as part of a multi-objective shortest path algorithm for planning and execution.
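The difference between an admissible minimum-based heuristic and a non-admissible mean-based one can be sketched on a toy grid. The code below is an illustration under stated assumptions, not the paper's method: the grid, the 4-connected moves, and the rule "heuristic = per-step estimate times Manhattan distance" are all hypothetical, and the per-step estimate is taken over every cell rather than over the nodes reachable within time constraints.

```python
import heapq


def a_star(cost, start, goal, per_step):
    """A* on a 4-connected grid where entering a cell costs cost[r][c].

    The heuristic is per_step * Manhattan distance to the goal.
    per_step = 0 gives Dijkstra's algorithm; per_step = minimum cell cost
    is admissible; a larger value (e.g. the mean) may lose optimality.
    Returns (path cost, number of expanded nodes).
    """
    rows, cols = len(cost), len(cost[0])

    def h(cell):
        return per_step * (abs(cell[0] - goal[0]) + abs(cell[1] - goal[1]))

    frontier = [(h(start), 0, start)]
    best = {start: 0}
    expanded = 0
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if g > best.get(cell, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        expanded += 1
        if cell == goal:
            return g, expanded
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                ng = g + cost[nr][nc]
                if ng < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = ng
                    heapq.heappush(frontier, (ng + h((nr, nc)), ng, (nr, nc)))
    return float("inf"), expanded
```

Calling `a_star` with `per_step` set to 0, to the minimum cell cost, and to the mean cell cost lets the three variants be compared on the same grid. The first two always return the same optimal cost, while the mean-based variant returns the cost of some valid path that is never cheaper than the optimum.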
|
|
17:15-17:30, Paper Mo-PS50-T6.6 | Add to My Program |
Ship Control of Approach Maneuvering under Wind Disturbance Using a Deep Neural Network |
|
Kashiwagi, Hideto | Tokyo University of Marine Science and Technology |
Okazaki, Tadatsugi | Tokyo University of Marine Science and Technology |
Keywords: Autonomous Vehicle, Intelligent Transportation Systems
Abstract: The effect of wind disturbance is one of the major factors that complicate heading control in ship control at low speeds. Approach maneuvering while decreasing speed and approaching the pier requires a control adapted to changing maneuverability and wind conditions. In this study, a control scheme using a deep neural network (DNN) was developed to output command values corresponding to wind conditions by training the DNN on maneuvering patterns of approach maneuvering under various wind conditions. The developed control scheme was validated through experiments using an actual ship.
|
|
Mo-PS50-T7 Regular Session, Hawaii 2 |
Add to My Program |
Cyber Modern Technology on Medicine, Health Care and Human Assist |
|
|
Chair: Yagi, Naomi | University of Hyogo |
|
16:00-16:15, Paper Mo-PS50-T7.1 | Add to My Program |
Automation Data Acquisition and Shortening Training Time for Surgical Instrument Detection System in Total Knee Arthroplasty (I) |
|
Kasai, Ryusei | University of Fukui |
Nagamune, Kouki | University of Fukui |
Keywords: Machine Learning, Artificial Social Intelligence, Neural Networks and their Applications
Abstract: In total knee arthroplasty, nearly 100 types of surgical instruments are prepared in addition to the usual surgical instruments. Since there are many types of surgical instruments used and their shapes and sizes are similar, accidents due to incorrect selection of implants have occurred. In addition, there is concern that the shortage of nurses will accelerate around the world in the future. For these reasons, we developed a surgical instrument detection system for total knee arthroplasty to reduce the incidence of accidents and the burden on scrub nurses. The surgical instrument detection system in the previous study required a lot of labor and time to create training data. Also, when increasing the number of surgical instruments to be detected, it was necessary to retrain on all the surgical instruments. To solve these problems, we developed an automatic annotation system using a turntable in this study. In addition, we developed learning and inference methods using standard objects. In the experiment, we verified the accuracy of the learning data created by the automatic annotation system and verified the object detection accuracy for five types of surgical instruments. As a result, all experiments showed the effectiveness of this system.
|
|
16:15-16:30, Paper Mo-PS50-T7.2 | Add to My Program |
Semi-Automatic Placenta Segmentation Based on Time-Series Superpixel Propagation for Fetal Growth Restriction Estimation (I) |
|
Nishida, Kentaro | Graduate School of Engineering, Mie University |
Morita, Kento | Mie University |
Magawa, Shoichi | Mie University |
Nii, Masafumi | Mie University |
Ikeda, Tomoaki | Mie University |
Wakabayashi, Tetsushi | Mie University |
Keywords: Image Processing and Pattern Recognition, Machine Learning, Deep Learning
Abstract: Fetal Growth Restriction (FGR) slows or stops fetal growth during pregnancy, resulting in low fetal weight for gestational age. Currently, FGR is diagnosed based on ultrasound weight estimation and blood tests, but the estimated weight has an error and blood tests cannot correctly diagnose FGR caused by placental factors, so prenatal diagnosis of FGR is difficult. Therefore, the use of blood oxygenation level-dependent magnetic resonance imaging (BOLD MRI) has been investigated as a method for prenatal diagnosis of FGR. This paper proposes an FGR estimation method using the placenta region in time-series BOLD MRI. The proposed method extracts the placenta region semi-automatically using Simple Linear Iterative Clustering (SLIC) on the time-series BOLD MRI, and inputs time-series radiomics features of the placental region into a Long Short-Term Memory (LSTM) network to estimate the FGR/non-FGR status of the patient. As a result, the semi-automatic extraction of the placental region achieved the highest Dice coefficient of 0.774 at the final frame, where the extraction accuracy would be degraded by propagation. In a 4-fold cross-validation test, FGR/non-FGR status was predicted with F-measures of 0.773, 0.745, 0.794, and 0.800, respectively.
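For readers unfamiliar with the Dice coefficient used to score the placenta segmentation, the following is a minimal sketch of the standard definition, Dice = 2|A∩B| / (|A| + |B|), for two binary masks. The function name and the toy masks are illustrative assumptions and are not taken from the paper.

```python
def dice_coefficient(pred_mask, true_mask):
    """Standard Dice overlap between two binary masks given as nested lists."""
    pred = [bool(v) for row in pred_mask for v in row]
    true = [bool(v) for row in true_mask for v in row]
    if len(pred) != len(true):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p and t for p, t in zip(pred, true))
    total = sum(pred) + sum(true)
    # Two empty masks are treated as a perfect match by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example (hypothetical masks): 2 overlapping pixels, 3 pixels in each mask.
example_pred = [[1, 1, 0], [0, 1, 0]]
example_true = [[1, 0, 0], [0, 1, 1]]
example_dice = dice_coefficient(example_pred, example_true)
```

A value of 1.0 means the extracted region and the reference region coincide exactly; 0.0 means they do not overlap.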
|
|
16:30-16:45, Paper Mo-PS50-T7.3 | Add to My Program |
Development of System for Quantifying the Lachman Test Using Inertial and Force Sensors (I) |
|
Kato, Hiroki | University of Fukui |
Nagamune, Kouki | University of Fukui |
Keywords: Computational Life Science, Computational Intelligence, Cybernetics for Informatics
Abstract: Manual testing is one of the most common diagnostic methods for anterior cruciate ligament (ACL) injuries in the knee joint. In particular, the Lachman test is superior to others. In this study, a 9-axis inertial sensor and a force sensor are used to measure the examiner's manual technique. We conducted one experiment on the accuracy of the inertial sensor and one experiment to measure the Lachman test. The results of Experiment 1 were within ± 1.7 degrees of the theoretical values for each axis throughout the experiment. The results of Experiment 2 showed that the endpoint could be estimated from the acceleration component of the subject's tibia, the acceleration component of the examiner's hand, and the force of each finger, and that the motion of the knee joint and the examiner's hand technique could be measured. The development of this system is expected to have training effects for unskilled examiners.
|
|
16:45-17:00, Paper Mo-PS50-T7.4 | Add to My Program |
Development of a Quantitative Evaluation System for Bradykinesia Using MediaPipe Hands (I) |
|
Ishizuka, Ryoga | University of Fukui |
Nagamune, Kouki | University of Fukui |
Keywords: Machine Learning, Image Processing and Pattern Recognition, Machine Vision
Abstract: Parkinson’s disease is a neurodegenerative disease characterized by slow progression and cannot be cured. Its symptoms include motor impairments such as bradykinesia, muscle stiffness, tremor at rest, and postural dysreflexia, as well as non-motor impairments such as depression and autonomic neuropathy. One measure to evaluate Parkinson’s disease is the Unified Parkinson’s Disease Rating Scale. One of the items in this scale that evaluates bradykinesia, one of the motor symptoms, is the finger tapping test. However, many of the expressions used there are subjective, and the evaluation criteria are ambiguous. Therefore, the purpose of this study was to develop a quantitative hand movement evaluation system using MediaPipe Hands, to analyze the finger tapping test, and to evaluate the effectiveness of the system. With the developed system, hand movements can be analyzed using a webcam. Hand motion was analyzed by calculating the length between hand joints, the speed of joint movement, and the angle between joints. To evaluate the accuracy of the system, we first measured the length between the joints of the fingers as the hand was moved away from and closer to the camera, and then examined the error. Then, the finger tapping test was conducted using the developed system to evaluate bradykinesia. The accuracy evaluation showed that MediaPipe Hands can detect hand movements with high accuracy. The finger tapping test also revealed that there was reproducibility in the magnitude of each finger movement, but not in the speed of the movement.
|
|
17:00-17:15, Paper Mo-PS50-T7.5 | Add to My Program |
Generation of Hand Reaching Motion Illusion Using Vibration Stimulation (I) |
|
Abe, Misaki | Kyushu University |
Nishikawa, Satoshi | Kyushu University |
Kiguchi, Kazuo | Kyushu University |
Keywords: Biometric Systems and Bioinformatics, Cyborgs
Abstract: Artificial motion sensation can be used for virtual reality, rehabilitation, etc. It is known that a joint motion illusion is induced by applying certain vibration stimulation to human muscles even though the joint is not actually moved. In this paper, we propose a method to generate the motion illusion of arbitrary hand reaching motion, which is one of the most important movements of daily living, by applying multiple vibration stimulations to shoulder and elbow muscles. In the proposed method, the frequency of vibration stimulation is controlled on each muscle simultaneously, taking the spatial hand movement into account, in order to generate the hand reaching motion illusion. An experiment on generating the hand reaching motion illusion was carried out to evaluate the effectiveness of the proposed method.
|
|
17:15-17:30, Paper Mo-PS50-T7.6 | Add to My Program |
ROC-Score-Based Ensemble Training for Multiple Deep Learning Modules in Classification between Polyps and Non-Polyps in CT Colonography (I) |
|
Suzuki, Kenji | Tokyo Institute of Technology |
Keywords: Deep Learning, Machine Learning, Neural Networks and their Applications
Abstract: We developed an automatic ensemble training method for fusing massive-training artificial neural network (MTANN) deep-learning modules in classification between polyps and non-polyps in CT colonography. We started from an initial MTANN module that had been trained with an initial set of polyps and non-polyps. We applied the trained initial module to polyps and non-polyps to analyze the weakness of the initial module. We arranged the output scores of the initial module to form a score scale in receiver-operating-characteristic (ROC) space, representing the “degree of difficulty” in distinction between polyps and non-polyps by the initial module. Based on the score-space, several sets of training polyps and non-polyps with different degrees of difficulties were determined. We trained several MTANN modules with the several sets of training samples so that each module became an expert at a certain level of difficulty. We then combined expert modules with a mixing module to form a “mixture of expert” MTANNs. Our database consisted of CT colonography datasets acquired from 100 patients, including 26 polyps. The mixture of expert MTANNs with the ensemble training method distinguished all polyps correctly from more than 50% of the non-polyps. We compared the effectiveness of the ensemble training with that of training with manually selected cases. The performance of the mixture of expert MTANNs with ensemble training method was superior to that of the “reference-standard” MTANNs trained with manually selected cases, which could reduce the cost of the manual selection.
|
|
17:30-17:45, Paper Mo-PS50-T7.7 | Add to My Program |
Prediction of Bed-Leaving Behaviors Using Edge AI to Prevent Medical Accidents (I) |
|
Kondo, Noriyasu | University of Hyogo |
Fujita, Daisuke | University of Hyogo |
Kobashi, Syoji | University of Hyogo |
Fujita, Takayuki | University of Hyogo |
Keywords: Machine Learning
Abstract: The incidence of falls in hospital facilities is high and can lead to a decrease in patients' quality of life and an increase in medical expenses. Therefore, the development of a system that can predict getting up from a bed is necessary. This study proposes a get-up detecting sensor using a 6-axis inertial sensor. This system can detect getting up from a bed in real time using machine learning with Edge AI. To evaluate the basic performance of the proposed system, a protocol was applied to four subjects, and data were collected. Alert accuracy rate and false alert rate were used as evaluation metrics. A model was built using data from three of the subjects and evaluated with the remaining subject, and this was repeated for all four subjects. For high-risk bed-leaving behavior, medium-risk pre-bed-leaving behavior, and low-risk get-up behavior, the alert accuracy rate (i.e., Recall) was 89.4%, 97.5%, and 86.3%, respectively, and the false alert rate (1-Precision) was 6.3%, 10.2%, and 0.0%, respectively. This confirmed the possibility of predicting rising behavior with high accuracy. Furthermore, as a proof of concept, a real-time get-up detection system was developed, and its practicality was demonstrated. Future challenges include reevaluating feature extraction and evaluating the performance of the proposed system with a diverse range of subjects of different ages, genders, and health statuses.
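The evaluation protocol described in this abstract (alert accuracy rate as recall, false alert rate as 1 - precision, and one held-out subject per fold) can be sketched as follows. This is an illustrative sketch only: the function names, behavior labels, and subject identifiers are hypothetical, and the classifier itself is not shown.

```python
def alert_metrics(true_labels, predicted_labels, target):
    """Return (recall, false alert rate = 1 - precision) for one behavior class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == target and p == target for t, p in pairs)
    fn = sum(t == target and p != target for t, p in pairs)
    fp = sum(t != target and p == target for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    false_alert_rate = fp / (tp + fp) if tp + fp else 0.0
    return recall, false_alert_rate


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Toy labels (hypothetical): "leave" is the behavior that should raise an alert.
y_true = ["leave", "leave", "rise", "leave", "rise"]
y_pred = ["leave", "rise", "leave", "leave", "rise"]
recall, false_alert = alert_metrics(y_true, y_pred, "leave")

# Four subjects give four folds, each holding out all samples of one subject.
folds = list(leave_one_subject_out(["A", "A", "B", "C", "D", "D"]))
```

Splitting by subject rather than by sample keeps data from the evaluated person out of training, which is the point of the three-versus-one scheme in the abstract.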
|
|
Mo-PS50-T12 Workshop Session, Hawaii 4 |
Workshop 3 - Workshop on AI and (Cyber)Security |
|
|
Organizer: Falk, Tiago H. | INRS-EMT |
Organizer: Avila, Anderson | INRS |
Organizer: Lameiras Koerich, Alessandro | Ecole De Technologie Superieure (ETS) |
|
16:00-16:15, Paper Mo-PS50-T12.1 | Add to My Program |
Enhancing Cyber Defense: Using Machine Learning Algorithms for Detection of Network Anomalies (I) |
|
Li, Zhida | New York Institute of Technology |
Trajkovic, Ljiljana | Simon Fraser University |
Keywords: Communications
Abstract: Developing advanced cyber defense techniques is essential for effectively detecting network anomalies that are becoming more challenging to identify. In this paper, we generate machine learning models based on real-time Internet and historical data and evaluate their classification performance. We introduce a network anomaly detection tool CyberDefense that integrates various stages of the anomaly detection process. It facilitates performance evaluation of machine learning algorithms and generation of new machine learning models. Its modular and scalable design enables incorporating new datasets and machine learning algorithms. The tool has been utilized to generate models and evaluate their classification performance using datasets collected during reported power outage and ransomware attacks.
|
|
16:15-16:30, Paper Mo-PS50-T12.2 | Add to My Program |
BGP Features and Classification of Internet Worms and Ransomware Attacks (I) |
|
Takhar, Hardeep | Simon Fraser University |
Trajkovic, Ljiljana | Simon Fraser University |
Keywords: Communications
Abstract: Machine learning approaches for detecting anomalies in communication networks heavily depend on the properties of training data. We analyze the impact of data probability distributions on performance of machine learning models developed based on Border Gateway Protocol datasets collected during the worm and ransomware attacks. Feature selection is performed to determine the most important features and identify their best fitting distributions. Experimental results indicate that certain features follow heavy-tailed distributions. Traffic anomalies are then classified based on selected features using the gradient boosting decision tree models suitable for designing real-time and scalable intrusion detection systems.
|
|
16:30-16:45, Paper Mo-PS50-T12.3 | Add to My Program |
Intelligent Tutoring System for Cyber Security with a Trust Management System Component (I) |
|
Arrabito, Robert | Defence Research and Development Canada |
Hou, Ming | Department of National Defence, Canada |
Fischmeister, Sebastian | University of Waterloo |
Falk, Tiago H. | INRS-EMT |
Willoughby, Hannah | C3 Human Factors |
Cameron, Madison | C3 Human Factors |
Foley, Liam | C3 Human Factors |
Normandin, Sarah | C3 Human Factors |
Banbury, Simon | C3 Human Factors |
Keywords: Decision Support Systems, Adaptive Systems, Trust in Autonomous Systems
Abstract: The Royal Canadian Navy (RCN) strategic vision Cyber Strategy 2020-2025 requires that the RCN workforce is trained, educated and aware of cyber risks and their role in cyber security and defence [1]. To support the RCN vision, Defence Research and Development Canada (DRDC) – Toronto Research Centre recently commenced an investigation to research advanced training strategies for maintaining operational resilience against cyber attacks onboard His Majesty’s Canadian (HMC) platforms based on the identification of a corpus of human-noticeable aspects of cyber attacks. This paper presents training strategies that are based on emerging artificial intelligence (AI) technologies for teaching cyber security to RCN operators to support cyber damage control on future HMC platforms. We argue that AI systems can achieve state-of-the-art results for cyber security training. We began our investigation by interviewing six Department of National Defence/Canadian Armed Forces subject matter experts (SMEs) with substantial experience in cyber damage control on how to identify potential cyber security risks, and on mitigation strategies to detect, respond to, and recover from cyber security incidents onboard HMC platforms [2]. Cyber awareness training for all RCN ranks was identified by the SMEs as the biggest mitigator to improve operator awareness of compromised platform systems such as the Integrated Platform Management System or the Combat Management System. One means of future instruction for the RCN is the design and development of an Intelligent Tutoring System (ITS) [3, 4]. Traditional ITS uses static instructional training in cyber security education, which is not able to meet the evolving landscape of cyber threats across a wide student population. New ITS will need to employ adaptive learning for individuals with different backgrounds to improve the student learning experience [5, 6].
Combating cyber threats also requires trust in Information Technology (IT) as it relates to cyber damage control. As a potential additional component of an ITS, a Trust Management System (TMS) needs to be implemented to help build, maintain, and repair operator trust in IT. The TMS will identify when trust in the system is lost using psychoph
|
|
16:45-17:00, Paper Mo-PS50-T12.4 | Add to My Program |
Assessing the Vulnerability of Self-Supervised Speech Representations for Keyword Spotting under White-Box Adversarial Attacks (I) |
|
Guimaraes, Heitor | Institut National De La Recherche Scientifique |
Zhu, Yi | Institut National De La Recherche Scientifique |
Mengara, Orson | Institut National De La Recherche Scientifique |
Avila, Anderson | INRS |
Falk, Tiago H. | INRS-EMT |
Keywords: Trust in Autonomous Systems, Communications
Abstract: Self-supervised speech pre-training has emerged as a useful tool to extract representations from speech that can be used across different tasks. While these models are starting to appear in commercial systems, their robustness to so-called adversarial attacks has yet to be fully characterized. This paper evaluates the vulnerability of three self-supervised speech representations (wav2vec 2.0, HuBERT and WavLM) to three white-box adversarial attacks under different signal-to-noise ratios (SNR). The study uses keyword spotting as a downstream task and shows that the models are very vulnerable to attacks, even at high SNRs. The paper also investigates the transferability of attacks between models and analyses the generated noise patterns in order to develop more effective defence mechanisms. The modulation spectrum is shown to be a potential tool for detecting adversarial attacks on speech systems.
|
|
Mo-PS6-T1 Regular Session, Hawaii 1 |
Machine Learning IV |
|
|
|
18:15-18:30, Paper Mo-PS6-T1.2 | Add to My Program |
AgreementLoss: A Data Pruning Metric for Stable Performance Over All Pruning Ratio |
|
Higashi, Ryota | Wakayama University |
Wada, Toshikazu | Wakayama University |
Keywords: Deep Learning, Machine Learning, Neural Networks and their Applications
Abstract: Data pruning is a method of selecting a subset from an entire training dataset while maintaining the performance after training. Most pruning metrics measure the difficulty of each training sample for the target task and select the hardest samples first. This “hardest-first sampling” works well when the pruning ratio is low (the pruned dataset is large). However, as the pruning ratio increases, the performance decreases significantly and becomes worse than that of random pruning. This is because the hardest samples are the special samples of the class, and a model trained on a few of the hardest samples specializes in them and loses performance. To solve this problem, we propose a new metric: AgreementLoss. It is computed from two classifier models trained on two datasets obtained by a disjoint decomposition of the original dataset. Because of the dataset decomposition, the predictions of the two models differ more for a given sample, since only one of them is trained on each sample in the original dataset. The metric is defined as the positive or negative loss value of one of the trained models, depending on whether the two predictions agree or disagree. This metric represents the difficulty of each sample, while extremely difficult “noisy” samples get lower scores because of the negative sign. By selecting samples in descending order of the AgreementLoss scores, we can reduce the priority of the “noisy” samples. Experiments on various datasets show that our method achieves stable performance over all pruning ratios and outperforms other metrics, especially when the pruning ratio is high.
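The scoring rule in this abstract can be illustrated with a rough sketch: each sample gets the loss of one trained model, with a positive sign when the two models' predictions agree and a negative sign when they disagree, and samples are then kept in descending score order. The abstract does not say which model's loss is used, so this sketch assumes it is the model whose training half contained the sample. The function names and toy probabilities are likewise hypothetical and not the authors' implementation.

```python
import math


def cross_entropy(probs, label):
    """Negative log-probability of the true label (clipped for stability)."""
    return -math.log(max(probs[label], 1e-12))


def agreement_loss_scores(probs_a, probs_b, labels, trained_by_a):
    """Signed loss per sample: +loss if both models agree, -loss otherwise.

    Assumption: the loss comes from the model that saw the sample in training.
    """
    scores = []
    for pa, pb, y, in_a in zip(probs_a, probs_b, labels, trained_by_a):
        pred_a = max(range(len(pa)), key=pa.__getitem__)
        pred_b = max(range(len(pb)), key=pb.__getitem__)
        loss = cross_entropy(pa if in_a else pb, y)
        scores.append(loss if pred_a == pred_b else -loss)
    return scores


def select_samples(scores, keep):
    """Indices of the `keep` samples with the highest scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:keep]


# Toy softmax outputs from two models for three samples, all with label 0.
probs_a = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
probs_b = [[0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
scores = agreement_loss_scores(probs_a, probs_b, [0, 0, 0], [True, True, False])
kept = select_samples(scores, keep=2)
```

In the toy data the two models disagree on the second sample, so it receives a negative score and is the first to be pruned.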
|
|
18:45-19:00, Paper Mo-PS6-T1.4 | Add to My Program |
Enlarged Large Margin Loss for Imbalanced Classification |
|
Kato, Sota | Meijo University |
Hotta, Kazuhiro | Meijo University |
Keywords: Deep Learning, Machine Learning, Neural Networks and their Applications
Abstract: We propose a novel loss function for imbalanced classification. LDAM loss, which minimizes a margin-based generalization bound, is widely utilized for class-imbalanced image classification. Although LDAM loss makes it possible to obtain large margins for the minority classes and small margins for the majority classes, its relationship to the large margin that is included in the original softmax cross entropy loss has not yet been clarified. In this study, we reformulate LDAM loss using the concept of the large margin softmax cross entropy loss based on the softplus function and confirm that LDAM loss includes a wider large margin than softmax cross entropy loss. Furthermore, we propose a novel Enlarged Large Margin (ELM) loss, which can further widen the large margin of LDAM loss. ELM loss utilizes the large margin for the maximum logit of the incorrect class in addition to the basic margin used in LDAM loss. Through experiments conducted on imbalanced CIFAR datasets and large-scale datasets with long-tailed distribution, we confirmed that classification accuracy was much improved compared with LDAM loss and conventional losses for imbalanced classification.
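As background for the softplus view mentioned in this abstract, softmax cross entropy and LDAM loss can be written side by side as follows. This is a sketch of the standard identities only; the notation is assumed here rather than taken from the paper, with logits $z_j$, true class $y$, class sample count $n_y$, and a constant $C$. The paper's own reformulation and the exact form of ELM loss are not reproduced.

```latex
% Softmax cross entropy written with softplus(x) = log(1 + e^x):
\mathcal{L}_{\mathrm{CE}}
  = -\log \frac{e^{z_y}}{\sum_j e^{z_j}}
  = \log\Bigl(1 + \sum_{j \neq y} e^{z_j - z_y}\Bigr)
  = \operatorname{softplus}\Bigl(\log \sum_{j \neq y} e^{z_j} - z_y\Bigr)

% LDAM loss subtracts a class-dependent margin from the true-class logit:
\mathcal{L}_{\mathrm{LDAM}}
  = -\log \frac{e^{z_y - \Delta_y}}{e^{z_y - \Delta_y} + \sum_{j \neq y} e^{z_j}}
  = \operatorname{softplus}\Bigl(\log \sum_{j \neq y} e^{z_j} - z_y + \Delta_y\Bigr),
\qquad \Delta_y = \frac{C}{n_y^{1/4}}
```

In this form the margin $\Delta_y$ appears as a shift of the softplus argument, and it is larger for classes with fewer samples.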
|
|
19:00-19:15, Paper Mo-PS6-T1.5 | Add to My Program |
Comparative Study on Semi-Supervised Learning Applied for Anomaly Detection in Hydraulic Condition Monitoring System |
|
Dong, Yongqi | Delft University of Technology |
Chen, Kejia | Zhejiang University |
Ma, Zhiyuan | Shanghai Normal University |
Keywords: Deep Learning
Abstract: Condition-based maintenance is becoming increasingly important in hydraulic systems. However, anomaly detection for these systems remains challenging, especially since anomalous data are scarce and labeling such data is tedious and even dangerous. Therefore, it is advisable to make use of unsupervised or semi-supervised methods, especially semi-supervised learning, which uses unsupervised learning as a feature extraction mechanism to aid the supervised part when only a small number of labels are available. This study systematically compares semi-supervised learning methods applied to anomaly detection in hydraulic condition monitoring systems. First, thorough data analysis and feature learning were carried out to understand the open-sourced hydraulic condition monitoring dataset. Then, various methods were implemented and evaluated, including traditional stand-alone semi-supervised learning models (e.g., one-class SVM, Robust Covariance), ensemble models (e.g., Isolation Forest), and deep neural network based models (e.g., autoencoder, Hierarchical Extreme Learning Machine (HELM)). In particular, this study customized and implemented an extreme learning machine based semi-supervised HELM model and verified its superiority over other semi-supervised methods. Extensive experiments show that the customized HELM model obtained state-of-the-art performance with the highest accuracy (99.5%), the lowest false positive rate (0.015), and the best F1-score (0.985), beating other semi-supervised methods.
|
|
Mo-PS6-T3 Regular Session, Hawaii 5 |
Cybernetics General I |
|
|
|
18:00-18:15, Paper Mo-PS6-T3.1 | Add to My Program |
Extremely Weak Feedback Method for Controlling Chaotic Resonance |
|
Iinuma, Takahiro | Chiba Institute of Technology |
Ebato, Yudai | Chiba Institute of Technology |
Nobukawa, Sou | Chiba Institute of Technology |
Tran, Anh Tu | Chiba Institute of Technology |
Wagatsuma, Nobuhiko | Toho University |
Inagaki, Keiichiro | Chubu University |
Doho, Hirotaka | Kochi University |
Yamanishi, Teruya | Osaka Seikei University |
Nishimura, Haruhiko | Yamato University |
Keywords: Cybernetics for Informatics, Computational Intelligence
Abstract: Chaotic resonance, resembling stochastic resonance, is induced by internal fluctuations (i.e., chaos). This phenomenon has been observed in numerous systems. The most representative form of chaotic resonance is synchronization with weak applied signals under chaos-chaos intermittency, where a chaotic orbit moves among multiple attractors. Chaotic resonance exhibits higher sensitivity than stochastic resonance, but its engineering applications are limited because inducing chaotic resonance requires adjusting the state of chaos through internal system parameters rather than through the external noise strength. Achieving this adjustment is challenging, especially in biological systems. Therefore, to address this limitation, we propose a novel double-Gaussian-filtered reduced region of orbit (RRO) method (called the DG-RRO method) that yields perturbed feedback signals weaker than those of the conventional RRO method. The DG-RRO feedback signal is determined by the inverse sign of the map function and double-Gaussian filters around the local maximum/minimum values of the map. Because of its fine local specification, the DG-RRO feedback signal induces chaotic resonance with one-third of the feedback strength of the conventional signal. This method may pave the way for using chaotic resonance in engineering applications.
|
|
18:15-18:30, Paper Mo-PS6-T3.2 | Add to My Program |
Human-Robot Imitation Learning of Movement for Embodiment Gap |
|
Tanaka, Masaki | Meijo University |
Sekiyama, Kosuke | Meijo University |
Keywords: Expert and Knowledge-Based Systems, Deep Learning, Computational Intelligence
Abstract: One of the problems in imitation between humans and robots is the embodiment gap. Therefore, in this paper, we propose a system that realizes movement-level imitation between humans and robots across the embodiment gap. We target the embodiment gap arising from link length and number of joints, and estimate it by solving not only the inverse kinematics but also the forward kinematics using the Levenberg-Marquardt method. By doing this, human movement is separated into end-effector-oriented movement and joint angle-oriented movement according to the robot’s own embodiment. Then, based on the separated movement, we realize the imitation of movement using a Behavior Cloning based Energy-Based Model. We applied the proposed method to a small humanoid robot and verified its effectiveness. As a result, we confirmed that the proposed method separates human movement depending on the robot’s embodiment. In addition, we realized imitation learning of movement corresponding to the embodiment gap estimated from the separated human movement.
|
|
18:45-19:00, Paper Mo-PS6-T3.4 | Add to My Program |
EdgeCPS-AI Knowledge Sharing Model for Supporting Computing Partition |
|
Kim, Young-Joo | Electronics and Telecommunications Research Institute |
Kang, Sungjoo | Electronics and Telecommunications Research Institute |
Chun, In-geol | Electronics and Telecommunications Research Institute |
Keywords: Computational Intelligence in Information, AI and Applications, Expert and Knowledge-Based Systems
Abstract: The rapid development of the AI industry has led to the widespread utilization of numerous edge devices in the real world. However, AI services still heavily rely on high-performance computing systems, which means that these devices are not being effectively utilized. To address these issues, EdgeCPS technology has emerged. EdgeCPS is a technology that supports the seamless execution of various services, including AI applications, through the interconnection of edge devices and edge servers, as well as resource and function augmentation. In order to appropriately utilize edge devices in an EdgeCPS environment, the paper proposes an EdgeCPS-AI knowledge sharing model for supporting computing partition services. The proposed EdgeCPS-AI knowledge is represented as a graph, which systematically structures AI service-related information that arises on virtual and physical edge devices. This knowledge not only provides partitioned AI weighted models in order to support the characteristics of EdgeCPS, but also enables flexible provision of various information required for AI services. Thus, this approach can facilitate the reconstruction of AI services according to service requirements and maximize the utilization of edge devices. To achieve this, a microservice-based computing partition AI service is devised and experimentally validated by constructing a Kubernetes system using 8 heterogeneous edge devices and a graph DB system consisting of 2,762 nodes. Experimental results show that performing computing partition services on edge devices is significantly more efficient in terms of overall resource consumption, with up to 85.57% improvement, compared to performing the same tasks on high-performance computing systems.
|
|
19:00-19:15, Paper Mo-PS6-T3.5 | Add to My Program |
TDID: Transparent and Efficient Decentralized Identity Management with Blockchain |
|
Hao, Jiakun | Peking University |
Gao, Jianbo | Peking University |
Xiang, Peng | Peking University |
Zhang, Jiashuo | Peking University |
Chen, Ziming | Peking University |
Hu, Hao | Nanjing University |
Chen, Zhong | Peking University |
Keywords: Cybernetics for Informatics, Big Data Computing, Cloud, IoT, and Robotics Integration
Abstract: Decentralized identity (DID) is an identity management framework aiming to return the ownership of an identity to its corresponding user. Recent studies propose to store the identifiers of DID issuers and implement identity management systems based on blockchain. However, existing systems cannot avoid identity tampering and verifiable credential abuse of decentralized identities, which makes the identity management opaque. In this paper, we propose TDID, a Transparent and efficient Decentralized IDentity management system with blockchain. The key insight behind TDID is to manage the registration and authentication of DIDs via smart contracts, and design Structured Merkle Patricia Tree (SMPT) as an underlying data structure to store identity data on blockchain. The smart contract based processes can improve transparency of decentralized identity management, while the SMPT data structure can realize efficient storage of DID data. We implement and evaluate TDID on different identity management operations, and the experimental results show that TDID can achieve about 3.1 times for write operation and 6.3 times for read operation while improving the transparency of DID management.
|
|
19:15-19:30, Paper Mo-PS6-T3.6 | Add to My Program |
A Nested Edge Addition Strategy for Network Controllability Robustness Enhancement |
|
Wu, Chengpei | Sichuan Normal University |
Li, Junli | Sichuan Normal University |
Xu, Siyi | Sichuan Normal University |
Yu, Zhuoran | Sichuan Normal University |
Keywords: Heuristic Algorithms, Complex Network, Optimization and Self-Organization Approaches
Abstract: Edge rectification is a widely used method to enhance network robustness. However, in some networked systems, edge rectification may be challenging or even infeasible to implement. An edge addition strategy is proposed as an alternative optimization method in this paper. The Nested Ring Structure (NRS), whereby each node's edges connect its nearest neighbors along the backbone direction, has exhibited robust controllability against random attacks. Therefore, the Nested Edge Addition (NEA) strategy is proposed, which enhances network controllability by building NRS through edge addition to a given initial network. With a small number of added edges, NEA can rapidly enhance network controllability, allowing the network to be controlled using just one driver node. The more nested edges are added, the stronger the NRS in a network, and thus the better its controllability robustness. The effectiveness of NEA is verified by simulations on both synthetic and real-world networks. Extensive experimental results demonstrate that NEA is an efficient strategy for designing network topology and optimizing real-world networks.
|
|
Mo-PS6-T4 Regular Session, Honolulu |
Advanced Robotics and Autonomous Systems |
|
|
|
18:00-18:15, Paper Mo-PS6-T4.1 | Add to My Program |
Design of Anthropomimetic Robotic Wrist Joint and Forearm |
|
Obata, Yoshinobu | The University of Electro-Communications |
Jiang, Yinlai | The University of Electro-Communications |
Yokoi, Hiroshi | The University of Electro-Communications |
Togo, Shunta | Univ. Electro-Communications |
Keywords: Soft Robotics, Mechatronics, Robotic Systems
Abstract: In this study, we propose a design methodology for anthropomimetic robotic wrists and forearms. Conventional robotic wrists and forearms have few examples of human mimicry, and they have been simplified. The ligaments and tendons of the robotic forearm proposed in this study were replicated using chain-stitched wires and arranged in a manner similar to the human anatomy. Motion capture and goniometer measurements were used to measure the range of motion (ROM) of the wrist and forearm, driven by 16 servomotors. The results of the experiment were compared with the human ROM reported in previous studies, and it was found that the developed robotic forearm could achieve a ROM similar to that of humans. The use of this robotic forearm in functional verification experiments will enhance our understanding of the human structure. The integration of mechanical mechanisms with structures created through evolution is expected to improve the functionality of future robots.
|
|
18:15-18:30, Paper Mo-PS6-T4.2 | Add to My Program |
Redundant Voronoi Roadmap Graph Using Imaginary Obstacles for Multi-Robot Path Planning |
|
Aryadi, Hanif | Tohoku University |
Bezerra, Ranulfo | Tohoku University |
Ohno, Kazunori | Tohoku University |
Gunji, Kenta | Tohoku University |
Kojima, Shotaro | Tohoku University |
Kuwahara, Masao | Tohoku University |
Okada, Yoshito | Tohoku University |
Konyo, Masashi | Tohoku University |
Tadokoro, Satoshi | Tohoku University |
Keywords: Robotic Systems, Conflict Resolution
Abstract: Roadmap-based path planning is a well-established method for multi-robot systems, where the free space of the environment is represented as a roadmap graph. The Voronoi diagram is known for its efficiency in creating roadmaps for single-robot systems due to its ability to generate paths with high clearance from obstacles. However, the design of the Voronoi diagram does not allow for redundant paths, as only one path is obtained between each pair of obstacles. In contrast, multi-robot path planning requires multiple path options for efficient solutions. Therefore, we propose a redundant Voronoi roadmap graph, which incorporates multiple paths computed from the Voronoi diagram. In this approach, we introduce imaginary obstacles to modify the costmap and obtain a roadmap with more paths. Our proposed roadmap retains the high-clearance feature of the Voronoi diagram and is suitable for multi-robot systems due to the availability of alternative route options. We demonstrate that our method can generate roadmaps in various simulated environments with different levels of redundancy. Additionally, we verify the efficiency of our proposed roadmap graph through graph analysis and multi-robot path planning experiments. Comparative analysis shows that the use of the proposed roadmap increases the success rate and solution quality compared to the roadmap obtained directly from the conventional Voronoi diagram.
|
|
18:30-18:45, Paper Mo-PS6-T4.3 |
Robust Cooperative Control of a Team of UAVs Carrying a Slung Payload |
|
Samarasinghe, Sudarshan Mark | Deakin University |
Abu Alqumsan, Ahmad | Deakin University |
Arogbonlo, Adetokunbo | Deakin University |
Pappu, Mohammad Rokonuzzaman | Deakin University |
Nahavandi, Saeid | Swinburne University of Technology |
Keywords: Cooperative Systems and Control, System Modeling and Control, Autonomous Vehicle
Abstract: The increased use of commercial Unmanned Aerial Vehicles (UAVs) has generated great interest in their potential for transporting loads and other equipment. However, because the attraction of a UAV is its versatility and cost effectiveness, constraints are placed on the size and capability of a single UAV. Using multiple UAVs in flight formation has therefore become an elegant solution to these limitations. The formation control of a team of load-bearing UAVs is far from trivial. The UAV itself is an underactuated nonlinear system posing significant control challenges. Recent research on this topic has shown promising results, and interesting modelling methods such as the Udwadia-Kalaba method have been proposed to model the loaded system. This research explores the problem using this method while bringing in robust control techniques for low-level UAV stabilization by designing a sliding mode control system. The proposed low-level controller will be combined with the formation controller and the stability demonstrated through simulations.
|
|
18:45-19:00, Paper Mo-PS6-T4.4 |
2D Ultrasound-Guided Visual Servoing for In-Plane Needle Tracking in Robot-Assisted Percutaneous Nephrolithotomy |
|
Mazdarani, Hoorieh | Carleton University |
Cotton, Alec | Carleton University |
Rossa, Carlos | Carleton University |
Keywords: Robotic Systems, System Modeling and Control, Mechatronics
Abstract: Ultrasound (US)-guided percutaneous nephrolithotomy is a surgical procedure for large kidney stone removal through an incision in the patient’s back. To gain kidney access, the surgeon steers a needle towards the kidney while simultaneously controlling the position and orientation of a US probe to keep the needle in the image plane. To successfully reach the kidney while avoiding delicate structures, a significant level of skill and precision is required. To alleviate the surgeon’s cognitive workload, robot-assisted needle tracking can be implemented to autonomously track the needle in the US images and adjust the US probe’s position and orientation such that the same portion of the needle is visible in the images. This paper presents a US-guided visual servoing (VS) algorithm to track the translation and rotation of a needle in a plane. Image features representing the desired pose of the needle in the image are defined, through which an interaction matrix is devised to relate the rate of the change of the image features in US images to the required position and orientation of the US probe connected to a robotic manipulator. Experimental results in 4 experimental scenarios in a water tank demonstrate the capability of the proposed method in tracking the needle in real-time with an accuracy of 2.6 mm with a control rate of 20 Hz. Although VS has been used to track surgical targets in the past, this paper proposes the first implementation of VS for needle tracking in longitudinal US images subjected to 3-DOF motion in a plane without any prior knowledge of needle trajectory or additional position sensors.
|
|
Mo-PS6-T5 Regular Session, Kahuku |
Biometric Systems, Bioinformatics, and Complex Networks |
|
|
|
18:00-18:15, Paper Mo-PS6-T5.1 |
Group Lasso with Checkpoints Selection for Biological Data Regression |
|
Zhan, Huixin | Texas Tech University |
Yifan, Wang | Texas Tech University |
Keywords: Biometric Systems and Bioinformatics, Machine Learning, AI and Applications
Abstract: Some unique characteristics of biological data are (1) that they are always High-Dimension and Low-Sample-Size (HDLSS) and (2) there are changes in the data distribution, such as an imbalance in classes, distribution and covariate shifts, etc. In this paper, we propose a Group Lasso with Checkpoints SElection (GL CSE) algorithm to tackle both issues. To address the first issue, we utilize a group Lasso regression model tailored for HDLSS data to perform feature selection on predefined groups of features, alleviating overfitting and being invariant under group-wise orthogonal reparameterizations. To address the second issue, we propose the checkpoint selection method to extract important model checkpoints while training on group Lasso via two proposed metrics, i.e., the average KL-divergence between training and validation features and the Frobenius error of the covariance matrices between training and validation features. Both metrics aim to select model checkpoints with minimal drifts between the training and validation features. The results of our experiments indicate that our proposed GL CSE algorithm achieves better performance compared to other baseline methods in terms of the MSE and R2 measurements. Specifically, on the biological age dataset, our GL CSE method achieves 0.8799 and 0.9883 for the MSE and R2 measurements, respectively. Additionally, we also show that our proposed checkpoint selection method performs better than regular K-fold cross-validation. Specifically, on the biological age dataset, GL CSE (Q2) achieves 0.9045 MSE and 0.9880 R2, respectively, which outperforms the regular K-fold cross-validation results, i.e., 1.0612 MSE and 0.9871 R2, respectively.
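The second checkpoint-selection metric in the abstract above (the Frobenius error between training and validation feature covariances) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `covariance`, `frobenius_drift`, and `select_checkpoint`, the checkpoint dictionary layout, and the use of the sample covariance are all assumptions made for illustration.

```python
import math


def covariance(rows):
    """Sample covariance matrix of a list of feature vectors (one row per sample)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]


def frobenius_drift(train_features, val_features):
    """Frobenius norm of the difference between the two covariance matrices."""
    ct, cv = covariance(train_features), covariance(val_features)
    return math.sqrt(sum((a - b) ** 2
                         for row_t, row_v in zip(ct, cv)
                         for a, b in zip(row_t, row_v)))


def select_checkpoint(checkpoints):
    """Return the name of the checkpoint with the smallest covariance drift.

    `checkpoints` is a hypothetical mapping: name -> (train_features, val_features).
    """
    return min(checkpoints, key=lambda name: frobenius_drift(*checkpoints[name]))


# Toy usage: checkpoint "b" has identical train/validation features, so zero drift.
train = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0]]
shifted = [[1.0, 8.0], [2.0, -3.0], [3.0, 9.0]]
best = select_checkpoint({"a": (train, shifted), "b": (train, train)})
```

A smaller drift value indicates that the features seen in training and validation have a more similar second-order structure, which is the selection criterion the abstract describes.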
|
|
18:15-18:30, Paper Mo-PS6-T5.2 |
How Do Low-Level Image Features Affect CNN-Based Face Detector Accuracy? |
|
Roy Choudhury, Ahana | Valdosta State University |
K. S., Krishnapriya | Valdosta State University |
Vaughan, Chase E | Valdosta State University |
Mihail, Radu Paul | Valdosta State University |
Keywords: Biometric Systems and Bioinformatics, Deep Learning, Image Processing and Pattern Recognition
Abstract: Face detectors are a subset of object detectors that output, at a minimum, a set of locations in an image if and where human faces are present. Face detection is challenging, in part, due to low variance in the structural content of frontal-view faces (i.e., most faces have two eyes, a nose and a mouth) and high variance in visual appearance. This aspect of the domain skews detectors to higher false positive rates as a consequence of many patches of imagery containing features spatially consistent with frontal-view faces. In this study we evaluate the performance of three state-of-the-art face detectors (BlazeFace, MTCNN, and SCRFD) on frontal-view face imagery in a novel human-labeled dataset of 64,104 images with reliable ground truth. We show evidence that modern CNN-based models rely heavily on low-level image features, in spite of their powerful capability to learn complex, discriminatory visual features and concepts. We do this by altering the spectral and color content of frontal-view face images. To gain a better understanding of detector failures, we apply the Deep Dream technique to enhance image features that lead models to false positives.
|
|
18:30-18:45, Paper Mo-PS6-T5.3 |
Fingerprint Digital Twin for Secure and Privacy Preserving Biometric Authentication |
|
Shukla, Rishabh | Indian Institute of Technology Jammu |
Kaur, Harkeerat | Indian Institute of Technology Jammu |
Echizen, Isao | National Institute of Informatics Tokyo |
Khanna, Pritee | Indian Institute of Information Technology, Design & Manufacturi |
Keywords: Biometric Systems and Bioinformatics, Deep Learning, Image Processing and Pattern Recognition
Abstract: This work proposes a novel application of digital twins in the field of biometric security. Biometric systems have become widespread but their use carries risks of privacy invasion attacks due to the sensitive nature of biometric data. To address these concerns, we propose creating biometric clones for digital access and authentication systems. A user’s fingerprint can act as a virtual representation or cyberproxy, allowing users to exist in the digital world with a unique, changeable, and privacy-preserving identity. The digital twin or clone fingerprint is generated using deep neural networks combined with a user-specific token/key. This approach allows third parties to process and store the proxy biometrics without putting the user’s personal information at risk. We suggest that this approach could provide a safer and more secure alternative to traditional biometric security systems.
|
|
18:45-19:00, Paper Mo-PS6-T5.4 |
An Adaptive Community-Based Influence Maximization Algorithm in Social Networks |
|
Kun, Pan | South China University of Technology |
Qiu, Wen-Jin | South China University of Technology |
Chen, Wei-Neng | South China University of Technology |
Keywords: Complex Network, Computational Intelligence, Evolutionary Computation
Abstract: Influence maximization (IM) is the problem of selecting the most influential vertices with a limited budget under a given propagation model. A significant challenge faced by many existing algorithms is their inability to reconcile the competing goals of solution quality and computational efficiency, rendering them unsuitable for large-scale social networks. In this paper, we propose an adaptive community-based influence maximization algorithm, named AComA, to solve the IM problem effectively and efficiently. First, we introduce a community detection method to divide a large-scale network into several communities. An adaptive indicator is then defined to identify vertices with high propagation value in the divided community networks. Based on community detection and the adaptive influence indicator, the number of candidate vertices is reduced, which significantly reduces the search space of the problem. Second, to select the final seed set from these candidate vertices, a genetic algorithm (GA) is introduced. Both the crossover and mutation operations are specifically modified to adapt to the IM problem. Taking into account both the local neighborhood information and the global community structure information, AComA manages to achieve a more accurate measurement of the influence spread for each vertex. The method proposed in this paper is tested on several real-world datasets. The experimental results show that AComA is promising.
|
|
19:00-19:15, Paper Mo-PS6-T5.5 |
Predicting Robustness Performance with Noises in Network Representation |
|
Wu, Chengpei | Sichuan Normal University |
Li, Junli | Sichuan Normal University |
Xu, Siyi | Sichuan Normal University |
Keywords: Complex Network, Deep Learning, Neural Networks and their Applications
Abstract: The connectivity and controllability of complex networks play an important role in ensuring the proper functioning of network systems. Robustness of connectivity and controllability is the ability of a network to maintain its basic functions against various malicious attacks. Convolutional neural network (CNN)-based approaches provide an efficient framework to approximate the network robustness, which significantly reduces computation time compared to attack simulations. In this paper, the performance of CNN-based prediction for connectivity and controllability robustness is investigated when there is noise in the predicted network representation. Two CNN-based predictors are compared: 1) the convolutional neural network-based robustness predictor (CNN-RP), and 2) the spatial pyramid pooling-based convolutional neural network (CNN-SPP). Two kinds of network information noise are considered and investigated: 1) random node information noise (RNIN), and 2) random edge information noise (REIN). The following main conclusions are obtained from extensive experimental studies on synthetic networks: 1) CNN-RP is more tolerant than CNN-SPP to network noise; 2) the characteristics of small-world and scale-free networks give them a favorable anti-noise ability; and 3) RNIN has less impact on CNN-based prediction performance than REIN, while RNIN and REIN show opposite effects on prediction performance when the size of the predicted network is outside the training network size range.
|
|
Mo-PS6-T6 Regular Session, Oahu |
Applications of Artificial Intelligence |
|
|
|
18:00-18:15, Paper Mo-PS6-T6.1 |
CoRL: A Cost-Responsive Learning Optimizer for Neural Networks |
|
Kar, Reshma | Saint Peter's University |
Voddi, Vijay Kumar Reddy | Saint Peter's University |
Patra, Braja Gopal | Weill Cornell Medicine |
Pathak, Jyotishman | Weill Cornell Medicine |
Keywords: Application of Artificial Intelligence, Computational Intelligence, Machine Learning
Abstract: Selection of the optimal learning rate for training neural networks has often been a matter of concern for the machine learning community. The existing learning rates are dependent on multiple scaling factors. This paper proposes Cost-Responsive Learning (CoRL), which does not require manual hyper-parameter tuning. It maintains a linear relationship with the prediction error of the neural network. This is expected to offer the lowest learning rate at the global minimum, and higher learning rates elsewhere. Hence, a number proportional to the prediction error is used as a learning rate, subject to the constraint that the number is within an acceptable range (here [0,1]). The derivation of an optimal learning rate from a given cost function is illustrated with the popular binary/categorical cross-entropy cost function(s). Experiments performed under multiple settings demonstrate that the CoRL optimizer needs no parameter tuning to obtain state-of-the-art results, with significantly lower training time for equivalent performance.
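The core idea stated in the abstract above, a learning rate proportional to the prediction error and constrained to [0,1], can be sketched as follows. This is only an illustration under stated assumptions, not the CoRL optimizer itself: the names `error_proportional_rate` and `fit_scalar`, the proportionality constant `scale`, and the squared-error toy problem (rather than the cross-entropy derivation mentioned in the abstract) are all hypothetical.

```python
def error_proportional_rate(prediction_error, scale=1.0, lower=0.0, upper=1.0):
    """Learning rate proportional to |error|, clipped to the range [lower, upper]."""
    return max(lower, min(upper, scale * abs(prediction_error)))


def fit_scalar(x, y, w=0.0, steps=50, scale=0.1):
    """Fit y ~ w * x by gradient descent on squared error with the rate above.

    The step size shrinks automatically as the prediction error approaches zero.
    """
    for _ in range(steps):
        error = w * x - y
        gradient = 2.0 * error * x  # d/dw of (w * x - y) ** 2
        w -= error_proportional_rate(error, scale=scale) * gradient
    return w


# Toy usage: the true weight is 2.0.
w_fit = fit_scalar(x=1.0, y=2.0)
```

Because the rate falls to zero with the error, the updates become smaller near the minimum without a hand-tuned decay schedule; the choice of `scale` here is an assumption of the sketch and still affects stability.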
|
|
18:15-18:30, Paper Mo-PS6-T6.2 |
Feature Extraction and Selection from Impedance Measurements for Bladder Tumor Differentiation |
|
Veil, Carina | University of Stuttgart |
Krauß, Franziska | University of Stuttgart |
Walz, Simon | University Hospital Tübingen |
Schüle, Johannes | University of Stuttgart |
Stenzl, Arnulf | University Hospital Tübingen |
Sawodny, Oliver | University of Stuttgart |
Keywords: Application of Artificial Intelligence, Machine Learning, Neural Networks and their Applications
Abstract: Correctly classifying tumorous tissue and the resection margins proves to be a difficult task in endoscopic bladder cancer surgeries. As a tumor shows altered electrical properties, intraoperative impedance measurements can support the surgeon in classifying the tissue. An important step in this process is the decision on how to pass the impedance information to the classification algorithm. This can be either in the form of a raw measurement vector or based on parameters extracted from the measurement. These parameters arise from curve characteristics of the Nyquist diagram, electrical tissue models, or indices defined as ratios between specific measurement points. In this work, different features proposed in the literature are reviewed and extracted from impedance measurements taken on bladder tissue. The most influential measurement frequencies are determined via a principal component analysis, and the most promising parameters are selected based on an analysis of variance. The different feature extraction approaches are compared based on the classification accuracy on unseen test data. The accuracy is very high for all different ways to define features, and it increases for every approach when only considering the selected features. Especially for the Nyquist parameters and indices, as well as the reduced measurement vector, the number of necessary frequencies is significantly reduced. This shortens the measurement, which is favorable for an intraoperative application.
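Selecting parameters "based on an analysis of variance", as the abstract above puts it, is commonly done by ranking each feature by its one-way ANOVA F-statistic across classes. The sketch below shows that generic technique only; the function name `anova_f`, the feature names, and the toy values are hypothetical and are not taken from the paper.

```python
import math
from statistics import fmean


def anova_f(groups):
    """One-way ANOVA F-statistic for one feature, given its values per class."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0.0:
        # Classes are perfectly separated (or all values are identical).
        return math.inf if ss_between > 0.0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy usage: each feature maps to (values in class 0, values in class 1).
features = {
    "feature_a": ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]),  # classes differ clearly
    "feature_b": ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),  # classes are identical
}
ranking = sorted(features, key=lambda name: anova_f(features[name]), reverse=True)
```

Features with a larger F-statistic separate the classes more strongly relative to their within-class spread, so they would be kept first under this kind of selection.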
|
|
18:30-18:45, Paper Mo-PS6-T6.3 |
A Small Claims Court for the NLP: Judging Legal Text Classification Strategies with Small Datasets |
|
Noguti, Mariana Yukari | UFPR |
Vellasques, Eduardo | SAP |
Oliveira, Luiz S. | UFPR |
Keywords: Application of Artificial Intelligence, Computational Intelligence, Transfer Learning
Abstract: Recent advances in language modelling have significantly decreased the need for labelled data in text classification tasks. Transformer-based models, pre-trained on unlabeled data, can outmatch the performance of models trained from scratch for each task. However, the amount of labelled data needed to fine-tune this type of model is still considerably high for domains requiring expert-level annotators, like the legal domain. This paper investigates the best strategies for optimizing the use of a small labeled dataset and large amounts of unlabeled data to perform a classification task in the legal area with 50 predefined topics. More specifically, we use the records of demands to a Brazilian Public Prosecutor's Office, aiming to assign the descriptions to one of the subjects, which currently demands deep legal knowledge for manual filling. The task of optimizing the performance of classifiers in this scenario is especially challenging, given the low amount of resources available for the Portuguese language, especially in the legal domain. Our results demonstrate that classic supervised models such as logistic regression and SVM, and the ensembles random forest and gradient boosting, achieve better performance with embeddings extracted with word2vec than with the BERT language model. The latter demonstrates superior performance in association with the architecture of the model itself as a classifier, having surpassed all previous models in that regard. The best result was obtained with Unsupervised Data Augmentation (UDA), which jointly uses BERT, data augmentation, and strategies of semi-supervised learning, with an accuracy of 80.7% in the aforementioned task.
|
|
18:45-19:00, Paper Mo-PS6-T6.4 |
Toward Intention Discovery for Early Malice Detection in Cryptocurrency |
|
Cheng, Ling | Singapore Management University |
Zhu, Feida | Singapore Management University |
Wang, Yong | Singapore Management University |
Ruicheng, Liang | Hefei University of Technology |
Liu, Huiwen | Singapore Management University |
Keywords: Application of Artificial Intelligence, Deep Learning, Machine Learning
Abstract: Cryptocurrency is particularly vulnerable to malicious activities due to its pseudo-anonymous nature. However, existing solutions that heavily rely on deep learning lack interpretability and are only available for retrospective analysis of specific malice types. To address these challenges, we propose Intention-Monitor for early malice detection in bitcoin. Our model utilizes a Decision-Tree based feature Selection and Complement (DT-SC) to build different feature sets for different malice types. Additionally, the Status Proposal Module (SPM) and a hierarchical self-attention predictor propose the global status and predict the address label in real-time. A survival module decides when to stop prediction and proposes the status sequence, namely intention. Our model can detect various malicious activities with strong interpretability, outperforming state-of-the-art methods according to extensive experiments on three real-world datasets. Additionally, our model can explain existing malicious patterns and find new suspicious characters through additional case studies.
|
|
19:15-19:30, Paper Mo-PS6-T6.6 |
Video Prediction Based on Multi-Resolution Echo State Networks for a Jar Test |
|
Sato, Ryoga | Nagaoka University of Technology |
Harakawa, Ryosuke | Nagaoka University of Technology |
Iwahashi, Masahiro | Nagaoka University of Technology |
Keywords: Application of Artificial Intelligence, Image Processing and Pattern Recognition, Machine Learning
Abstract: For a jar test in water purification, it is necessary to develop a video prediction method to assist in coagulant dosage decisions. However, such a method has not been established; video prediction methods in other fields require a large amount of training videos. This paper proposes a video prediction method based on multi-resolution echo state networks (ESN) for a jar test. Because the proposed method is constructed based on ESN, we succeed in long-term video prediction even if the amount of training videos is small. Specifically, we develop a new model that performs ESN while observing the input video at different resolutions and effectively fuses the prediction results. The multi-resolution approach enables us to acquire both rough shape changes and fine pattern features, resulting in accurate prediction. In fusion of the multiple prediction results, our method can calculate optimal weights by the least squares method. Experimental results for a jar test confirm that our method outperformed existing video prediction methods based on deep neural networks. In addition, the results on the sky timelapse dataset show the versatility of our method.
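The fusion step mentioned in the abstract above (optimal weights computed by the least squares method) can be illustrated for the simplest case of two prediction streams. This is a hedged sketch, not the authors' model: the function name `fusion_weights`, the restriction to two resolutions, and the flattened toy "frames" are assumptions. It solves the 2x2 normal equations in closed form.

```python
def fusion_weights(pred_coarse, pred_fine, target):
    """Least-squares weights (w_coarse, w_fine) minimizing
    sum((w_coarse * c + w_fine * f - t) ** 2) over all samples."""
    s_cc = sum(c * c for c in pred_coarse)
    s_ff = sum(f * f for f in pred_fine)
    s_cf = sum(c * f for c, f in zip(pred_coarse, pred_fine))
    s_ct = sum(c * t for c, t in zip(pred_coarse, target))
    s_ft = sum(f * t for f, t in zip(pred_fine, target))
    det = s_cc * s_ff - s_cf * s_cf
    if det == 0.0:
        raise ValueError("predictions are collinear; weights are not unique")
    w_coarse = (s_ct * s_ff - s_ft * s_cf) / det
    w_fine = (s_ft * s_cc - s_ct * s_cf) / det
    return w_coarse, w_fine


# Toy usage: the target is exactly 0.25 * coarse + 0.75 * fine.
coarse = [1.0, 0.0, 1.0]
fine = [0.0, 1.0, 1.0]
truth = [0.25, 0.75, 1.0]
w_c, w_f = fusion_weights(coarse, fine, truth)
fused = [w_c * c + w_f * f for c, f in zip(coarse, fine)]
```

With more than two resolutions the same normal equations apply, but a general linear solver would be needed in place of the closed-form 2x2 inverse used here.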
|
|
19:30-19:45, Paper Mo-PS6-T6.7 |
Deep Modular Fuzzy Inference Model |
|
Nagai, Haruya | Osaka University |
Seki, Hirosato | Osaka University |
Keywords: AI and Applications, Fuzzy Systems and their applications
Abstract: In recent years, explainable AI (XAI) has been emphasized in various fields such as healthcare and finance. Since the fuzzy rules in fuzzy inference models are structured as If-Then rules, the inference process of fuzzy inference models can be easily understood by humans. Among them, a deep TSK fuzzy inference model has fuzzy rules with low complexity because it randomly selects features and randomly maps them to fuzzy partitions. However, this randomness may lead to situations where important features, or their fuzzy partitions, that have a significant impact on the results are not used in the inference. Therefore, this paper proposes a deep modular fuzzy inference model with high accuracy and stability that uses a small number of highly interpretable fuzzy rules.
|
|
Mo-PS6-T7 Regular Session, Hawaii 2 |
Cognitive Computing |
|
|
|
18:00-18:15, Paper Mo-PS6-T7.1 |
A Transferable Multi-Agent Reinforcement Learning Method for Distribution Service Restoration |
|
Si, Ruiqi | Wuhan University |
Qiao, Ji | China Electric Power Research Institute |
Wang, Xiaohui | China Electric Power Research Institute |
Ji, Kaixuan | China Electric Power Research Institute |
Wang, Zibo | China Electric Power Research Institute |
Zhang, Jun | Wuhan University |
Pan, Xuanying | Wuhan University |
Zhang, Zhengyan | Wuhan University |
Keywords: Cognitive Computing, Multi-User Interaction
Abstract: The occurrence of extreme events, which has increased the risk of major outages in the grid, makes the quick and efficient recovery of load in the distribution network a key issue. The data-driven deep reinforcement learning method has great potential in providing fast decision-making. However, a large number of agents leads to the curse of dimensionality, making it inefficient to obtain effective control strategies. When tackling similar tasks in different power grids, retraining multiple agents incurs a great cost. To solve this problem, we propose a transferable multi-agent reinforcement learning framework that employs model reload and buffer reuse methods to transfer control strategies from small-scale simple scenes to large-scale complex scenes. It also utilizes attention mechanisms to aggregate observation features and handle the problem of variable observation dimensions. Finally, the distribution service restoration problem is modeled as a Markov decision process and solved using the QMIX algorithm. The performance of the proposed method has been verified in IEEE 34-node and IEEE 123-node distribution systems.
|
|
18:30-18:45, Paper Mo-PS6-T7.3 |
Soft Robotic Mastication System Inducing Misperception of Food Texture During Human Oral Processing |
|
Hirose, Kosuke | Yamagata University |
Ogawa, Jun | Yamagata University |
Watanabe, Yosuke | Yamagata University |
Shiblee, MD Nahin Islam | Yamagata University |
Furukawa, Hidemitsu | Yamagata University |
Keywords: Cognitive Computing, Human-Collaborative Robotics, Human-Machine Interface
Abstract: The perception of food texture in humans heavily relies on the way of biting and the condition and arrangement of teeth. Consequently, changes in dentures or biting speed can create an illusion of different textures for the same food. Likewise, in mastication robots, variations in control parameters and probe geometry can lead to differences in acquired data, resulting in texture misidentification. This study aims to bridge the gap in texture perception between humans and machines, and make their misperception of texture more similar. To achieve this, the possibility of replicating human texture perception by machines was investigated using a highly accurate texture discriminator called "Gel Biter", which comprises multiple soft materials. The findings indicate that misperception due to variations in mastication speed leads to misdiscrimination with a tendency similar to that in humans. Additionally, it was discovered that different contact between the tooth-shaped probes can cause a completely different texture perception. The potential contributions to food development using human dental techniques and food texture illusion are also discussed.
|
|
18:45-19:00, Paper Mo-PS6-T7.4 |
Personality Perception Using Scenario Based Stimulation and Physiological Signals |
|
Biswas, Amrijit | North South University |
Li, Jingjie | Australian National University |
Shubho, Fahimul Hoque | North South University |
Gedeon, Tom | Curtin University |
Rahman, Shafin | North South University |
Hossain, Md Zakir | Curtin University |
Keywords: Affective Computing, Cognitive Computing
Abstract: Previous studies on automatic personality perception have primarily focused on a limited number of personality traits. However, in real-world situations, humans exhibit a wide range of personality traits. To overcome this limitation, a new methodology for automatic personality perception is proposed in this paper. This revised approach can predict various personality traits (17 traits) with satisfactory performance by utilizing physiological signals. The underlying concept is to stimulate participants with different emotional stimuli to elicit physiological responses in a specific scenario. Biomarkers such as Electroencephalogram (EEG), Skin Conductance, Blood Volume Pulse, and Pupil Dilation reflect an individual's personality traits. Two experiments are conducted with different scenarios, including Image/Video Stimulation and Driving Simulation, to support this study. Based on the collection of data and validation of supervised learning models, Naive Bayes outperforms other classifiers explored in this research. EEG is the most effective signal for predicting personality, although combining other signals may produce similar results. Our method accurately predicts the 17 personality traits, demonstrating significant potential for clinical research.
|
|
19:00-19:15, Paper Mo-PS6-T7.5 |
Autism Spectrum Disorder Classification Via Local and Global Feature Representation of Facial Image |
|
Mahamood, Md. Nadim | Begum Rokeya University, Rangpur |
Uddin, Md. Zasim | Begum Rokeya University, Rangpur |
Shahriar, Md. Arif | Begum Rokeya University, Rangpur |
Alnajjar, Fady | United Arab Emirates University |
Ahad, Md Atiqur Rahman | University of East London, UK |
Keywords: Affective Computing, Cognitive Computing, Human-Machine Interaction
Abstract: Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder that affects social communication and interaction. Early diagnosis of ASD can mitigate the severity and help with ideal treatment direction. Computer vision-based methods with traditional machine learning and deep learning are employed in the literature for automatic diagnosis. Recently, facial image-based ASD classification with deep learning has gained interest because facial images are easy to collect and non-invasive. We observed that the existing approaches utilized either local or global features of facial images to diagnose ASD. However, it is important to consider both local and global features to obtain fine-grained details and larger contextual information for accurate detection and classification. This paper proposes a sequencer-based patch-wise Local Feature Extractor along with a Global Feature Extractor. Finally, the features from these modules are aggregated to obtain the final feature for the classification of ASD. Experiments on a publicly available Autism Facial Image Dataset demonstrate that our proposed framework achieves state-of-the-art performance. We achieved accuracy, precision, recall, and F1-score of 94.7%, 94.0%, 95.3%, and 94.6%, respectively.
|
|
19:15-19:30, Paper Mo-PS6-T7.6 |
Decoding Illusion Perception: A Comparative Analysis of Deep Neural Networks in the Müller-Lyer Illusion |
|
Zhang, Hongtao | Kochi University of Technology |
Yoshida, Shinichi | Kochi University of Technology |
Li, Zhen | Shenzhen MSU-BIT University |
Keywords: Human Performance Modeling, Cognitive Computing
Abstract: Visual illusions, as powerful tools for revealing the strategies and limitations of brain visual processing, have garnered attention in neural network research. This study aims to investigate the performance of various deep neural networks (DNNs) in handling the Müller-Lyer illusion task, thereby providing insights into the mechanisms of the brain's visual system in processing visual illusions. We adopted two types of visual stimuli and recruited 12 participants for a length-matching task. Based on the participants' perceptual data, we employed multiple classical neural network models to test their responses to each type of stimuli. To analyze the differences in model performance, we employed representational dissimilarity matrices (RDM) analysis and utilized Grad-CAM techniques to visualize neural network activation maps. The results revealed that some neural networks could simulate human visual illusion phenomena but exhibit differences in representational similarity and feature preferences. These findings provide a preliminary understanding of the computational principles and mechanisms of the brain's visual system and offer insights for future research in building more robust and efficient neural network models.
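The representational dissimilarity matrix (RDM) analysis named in the abstract above is often computed as one minus the Pearson correlation between the activation vectors evoked by each pair of stimuli. The sketch below shows that common formulation only; the abstract does not state which distance the authors used, and the function names and toy activations are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)


def rdm(activations):
    """RDM with entries 1 - r, where row i holds the activations for stimulus i."""
    return [[1.0 - pearson(a, b) for b in activations] for a in activations]


# Toy usage: stimuli 0 and 1 evoke proportional patterns, stimulus 2 the reverse.
layer_activations = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [3.0, 2.0, 1.0],
]
matrix = rdm(layer_activations)
```

Comparing the RDM of a network layer with an RDM built from human responses (for example by rank correlation of their off-diagonal entries) is one way to quantify representational similarity.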
|
|
19:30-19:45, Paper Mo-PS6-T7.7 | Add to My Program |
BrainLM: Estimation of Brain Activity Evoked Linguistic Stimuli Utilizing Large Language |
|
Luo, Ying | Ochanomizu University |
Kobayashi, Ichiro | Ochanomizu University |
Keywords: Cognitive Computing, Brain-based Information Communications
Abstract: With the remarkable recent development of large-scale language models in natural language processing research, an increasing number of studies have employed such models to investigate the information processing mechanisms of encoding and decoding in the brain. In this study, we developed a new pre-trained language model, BrainLM, which incorporates paired data of text stimuli and the brain activity they induce, and verified the accuracy of estimating brain states from natural language in multiple NLP tasks. Our research makes several contributions. Firstly, we successfully developed a multimodal model that incorporates both brain and text. Subsequently, we conducted bi-directional experiments to validate the model and ensure the reliability of both brain encoding and decoding processes. Furthermore, we performed comparative experiments, wherein we introduced 20 state-of-the-art (SOTA) language models as a control group. Our findings reveal that our proposed model exhibits superior brain encoding ability compared to the control group. Lastly, we designed a discrete Autoencoder module that extracts brain features. This module can be utilized independently for the purpose of extracting brain features in a wider range of brain decoding studies beyond fMRI.
|
|
Mo-PS6-T8 Regular Session, Hawaii 3 |
Computational Intelligence |
|
|
|
18:00-18:15, Paper Mo-PS6-T8.1 | Add to My Program |
Integrating Bidirectional Long Short-Term Memory with Subword Embedding for Authorship Attribution |
|
Modupe, Abiodun | University of Pretoria |
Celik, Turgay | University of the Witwatersrand |
Marivate, Vukosi | University of Pretoria |
Olugbara, Oludayo | MICT SETA Center of Excellence in 4IR, Durban University of Tech |
Keywords: Computational Intelligence, Deep Learning, Machine Learning
Abstract: The problem of unveiling the author of a given text document from multiple candidate authors is called authorship attribution. Manifold word-based stylistic markers have been successfully used in deep learning methods to deal with the intrinsic problem of authorship attribution. Unfortunately, the performance of word-based authorship attribution systems is limited by the vocabulary of the training corpus. The literature has recommended character-based stylistic markers as an alternative to overcome the hidden word problem. However, character-based methods often fail to capture the sequential relationship of words in texts, which leaves a gap for further improvement. The question addressed in this paper is whether it is possible to address the ambiguity of hidden words in text documents while preserving the sequential context of words. Consequently, a method based on bidirectional long short-term memory (BLSTM) with a 2-dimensional convolutional neural network (CNN) is proposed to capture sequential writing styles for authorship attribution. The BLSTM was used to obtain the sequential relationship among characteristics using subword information. The 2-dimensional CNN was applied to understand the local syntactical position of the style from unlabeled input text. The proposed method was experimentally evaluated against numerous state-of-the-art methods across the public corpora CCAT50, IMDb62, Blog50, and Twitter50. Experimental results indicate accuracy improvements of 1.07% and 0.96% on CCAT50 and Twitter50, respectively, and comparable results on the remaining datasets.
|
|
18:15-18:30, Paper Mo-PS6-T8.2 | Add to My Program |
Learning a Policy for Pursuit-Evasion Games Using Spiking Neural Networks and the STDP Algorithm |
|
Tayefe Ramezanlou, Mohammad | Carleton University |
Schwartz, Howard | Carleton University |
Lambadaris, Ioannis | Carleton University |
Barbeau, Michel | Carleton University |
Naqvi, Syed Hassan Raza | Carleton University |
Keywords: Computational Intelligence, Neural Networks and their Applications, AI and Applications
Abstract: Pursuit-Evasion (PE) games are regarded as a major platform for game theory. In this kind of game, an agent called an evader tries to escape from another agent called a pursuer. Active Target Defense (ATD) is a derivative of PE games that has recently attracted attention. In an ATD game, the evader, often called an invader, strives to capture a moving target. The pursuer, called a defender, tries to intercept the invader. This paper implements the Spike-Timing-Dependent Plasticity (STDP) algorithm to train two Spiking Neural Networks (SNNs) to find a suitable solution for the ATD problem in decentralized situations. One of the SNNs is used to control the invader, while the other controls the defender. The performance is compared with the analytical solution for the pedestrian model. The results showed that an SNN can learn the optimal capture point using only relative velocities and line of sight.
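For readers unfamiliar with STDP, the sketch below shows the textbook pair-based form of the rule, in which a synapse is strengthened when the presynaptic spike precedes the postsynaptic spike and weakened otherwise. The abstract does not state which STDP variant or parameters the authors use, so the function name, amplitudes and time constants here are illustrative assumptions only.

```python
import math


def stdp_weight_change(dt_ms, a_plus=0.01, a_minus=0.012,
                       tau_plus=20.0, tau_minus=20.0):
    """Textbook pair-based STDP update (all parameters are assumed values).

    dt_ms is t_post - t_pre in milliseconds.
    Pre-before-post (dt_ms > 0) potentiates; post-before-pre depresses.
    """
    if dt_ms > 0:
        return a_plus * math.exp(-dt_ms / tau_plus)
    if dt_ms < 0:
        return -a_minus * math.exp(dt_ms / tau_minus)
    return 0.0


# The magnitude of the change decays as the spikes move further apart in time.
for dt in (-40.0, -10.0, 10.0, 40.0):
    print(dt, stdp_weight_change(dt))
```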
|
|
18:30-18:45, Paper Mo-PS6-T8.3 | Add to My Program |
Secure and Efficient Group Decision-Making with Blockchain-Based Consensus and Trust Management |
|
Hassani, Hossein | University of Windsor |
Razavi-Far, Roozbeh | University of New Brunswick |
Saif, Mehrdad | University of Windsor |
Herrera Viedma, Enrique | University of Granada (Spain) |
Keywords: Computational Intelligence, Expert and Knowledge-Based Systems, Application of Artificial Intelligence
Abstract: Trust-building is of paramount importance for managing and improving consensus in group decision-making (GDM). This mechanism usually involves a trust propagation process for estimating the level of trust among decision-makers (DMs). However, this process is computationally expensive and slows consensus reaching. To address this issue, this work proposes a novel trust-building mechanism that does not rely on the trust propagation process to quantify DMs' level of trust. Instead, it makes use of Blockchain technology to facilitate communication between the moderator and the group of DMs. Avoiding trust propagation makes the mechanism computationally efficient for building trust among DMs, while the Blockchain-based protocol provides secure and efficient communication that accelerates the consensus-reaching process. The proposed GDM model is illustrated through an example, and the sensitivity of the model to various assumptions is analyzed, demonstrating the practical applicability of this approach.
|
|
18:45-19:00, Paper Mo-PS6-T8.4 | Add to My Program |
Medical Image Classification Using Transfer Learning and Network Pruning Algorithms |
|
Saleh, Luca | Royal Holloway, University of London |
Zhang, Li | Royal Holloway, University of London |
Keywords: Computational Intelligence, Expert and Knowledge-Based Systems, Big Data Computing
Abstract: Deep neural networks have shown great advancement in recent decades in classifying medical images (such as CT-scans) with high precision to aid disease diagnosis. However, the training of deep neural networks requires significant sample sizes for learning enriched discriminative spatial features. Building a high-quality dataset large enough to satisfy model training requirements is a challenging task due to limited disease sample cases and various data privacy constraints. Therefore, in this research, we perform medical image classification using transfer learning based on several well-known deep networks, i.e. GoogLeNet, ResNet and EfficientNet. To tackle data sparsity issues, a Wasserstein Generative Adversarial Network (WGAN) is used to generate new medical image samples to increase the number of training instances of the minority classes. The transfer learning process itself also allows the building of strong classifiers by transferring knowledge from the pre-trained image domain to a new medical domain using a small sample size. Moreover, the lottery ticket hypothesis is also used to prune each transfer learning network trained using the new target image data sets. Specifically, the L1 norm unstructured pruning technique is used for network reduction. Hyper-parameter fine-tuning is also performed to identify optimal settings of key network hyper-parameters such as learning rate, batch size and weight decay. A total of 20 trials are used for optimal hyper-parameter selection. Evaluated using multi-class lung X-ray images for pneumonia conditions and brain tumour CT-scans, the fine-tuned EfficientNet model obtains the best brain tumour classification accuracy rate of 96% and a fine-tuned GoogLeNet model with pruning has the highest pneumonia classification accuracy rate of 81.5%.
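The L1 norm unstructured pruning mentioned in the abstract above removes individual weights with the smallest absolute values, regardless of where they sit in the network. The following is a minimal plain-Python sketch of that idea on a single weight matrix; the function name and example weights are hypothetical, and the authors presumably rely on a deep learning framework rather than code of this kind.

```python
def l1_unstructured_prune(weights, amount):
    """Zero out the given fraction of entries with the smallest |w|.

    weights: rectangular list of lists of floats (one layer's weight matrix).
    amount:  fraction of entries to prune, between 0 and 1.
    Returns a new matrix; the input is left unchanged.
    """
    cols = len(weights[0])
    flat = [w for row in weights for w in row]
    n_prune = int(round(amount * len(flat)))
    # Indices of the entries ordered by absolute value, smallest first.
    order = sorted(range(len(flat)), key=lambda i: abs(flat[i]))
    pruned = set(order[:n_prune])
    return [
        [0.0 if (r * cols + c) in pruned else w for c, w in enumerate(row)]
        for r, row in enumerate(weights)
    ]


# Illustrative weights only; half of the six entries are pruned.
layer = [[0.5, -0.1, 0.03], [-0.8, 0.2, -0.02]]
sparse_layer = l1_unstructured_prune(layer, amount=0.5)
print(sparse_layer)
```

In the lottery-ticket setting the surviving weights would then be retrained, a step this sketch does not cover.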
|
|
Mo-S4T1 Virtual Session, Room T1 |
Neural Networks and Their Applications I |
|
|
|
16:00-17:00, Paper Mo-S4T1.1 | Add to My Program |
Enhanced Facial Expression Recognition Based on Facial Action Unit Intensity and Region |
|
Chen, Weiyang | Qufu Normal University |
Wang, Anrui | Qilu University of Technology |
Keywords: Neural Networks and their Applications, Image Processing and Pattern Recognition, Deep Learning
Abstract: Facial expression recognition (FER) is an attractive research area with important applications, such as in human-computer interaction, where FER can facilitate better collaboration between intelligent machines and humans. Facial Action Units (AUs) can describe facial changes in more detail, and Facial Action Unit Intensities (AUIs) reflect the intensity level of facial behavioral expression; together they can help improve recognition performance on ambiguous expressions. In this paper, we propose a FER method based on AUs and use a simple and effective network called Action Unit-Enhanced Expressions (QUEEN). Specifically, first we define the center position of the region of interest (RoI) of each AU through face landmarks. Second, we attach AUIs to facial expressions in the form of Gaussian heatmaps and crop AU RoIs based on each AU center point. Finally, we fuse the obtained AUI-enhanced expression features with the deep features of the AU RoIs via a convolutional neural network (CNN) for further expression recognition. Our proposed method is evaluated on the widely available FER databases RaFD and Oulu-CASIA. Experimental results and comparisons show that our model achieves better performance than several other popular methods.
|
|
16:00-17:00, Paper Mo-S4T1.2 | Add to My Program |
Accelerating Column Generation Algorithm Using Machine Learning Based Column Elimination |
|
Fang, Lichang | Tsinghua University |
Yuan, Haofeng | Tsinghua University |
Zhang, Yuli | Beijing Institute of Technology |
Song, Shiji | Tsinghua University |
Keywords: Neural Networks and their Applications, Deep Learning, Heuristic Algorithms
Abstract: The column generation (CG) algorithm is widely used in large-scale optimization problems. However, a large number of columns in the restricted master problem (RMP) makes the computing process very time-consuming. This paper proposes a machine learning based column elimination strategy to accelerate the CG algorithm. Our approach represents the RMP by a bipartite graph and applies a learned Graph Neural Network model to predict redundant columns to be eliminated from the RMP, so as to reduce the time cost of solving the RMP and the iterations required for convergence. Our approach is tested on cutting stock problem instances. Compared with the vanilla CG algorithm, the iterations and time required for convergence are reduced by up to 31% and 48%, respectively. Furthermore, our approach generalizes well to cutting stock problem instances of different sizes.
|
|
16:00-17:00, Paper Mo-S4T1.3 | Add to My Program |
OpenCADP: Open-Set Intrusion Detection with a Cluster Anomaly Detection Plugin |
|
Ping, Guolou | Tsinghua University |
Keywords: Neural Networks and their Applications, AI and Applications, Deep Learning
Abstract: Open-set recognition has gained significant attention in intrusion detection due to its ability to classify known attacks and identify novel attacks. However, current approaches based solely on discriminative features may fail to identify new data with non-discriminative feature differences and are prone to adversarial attacks. To overcome these limitations, this paper proposes an open-set intrusion detection framework that incorporates a cluster anomaly detection plugin to identify non-discriminative unknown classes by introducing primitive features and detecting adversarial examples through cluster design. Specifically, 1) we use the anomaly detection plugin trained with primitive features from the raw data to discover unseen data. Samples rejected by either deep discriminative classifiers or detection plugins are considered unknown. 2) We design the plugin as a cluster of one-class anomaly detectors. This approach effectively isolates adversarial examples carefully crafted to deceive the deep classifier. 3) We provide theoretical evidence of the proposed framework's ability to detect non-discriminative unknown classes and adversarial examples. Extensive experiments demonstrate the effectiveness of our approach when applied to open-set intrusion detection.
|
|
16:00-17:00, Paper Mo-S4T1.4 | Add to My Program |
NCDFSA: Neural Cognitive Diagnostic Focusing on Students' Attention to Knowledge Concepts |
|
Wei, Guoxiong | Jinan University, Guangdong |
He, Zhenyu | Jinan University |
Quanlong, Guan | Jinan University, Guangzhou |
Fang, Liangda | Jinan University, Guangzhou |
Luo, Weiqi | Jinan University, Guangzhou |
Chen, Guanliang | Monash University |
Keywords: Neural Networks and their Applications, Application of Artificial Intelligence, Knowledge Acquisition
Abstract: The primary aim of cognitive diagnosis is to predict students' performance and knowledge structures by analyzing their learning behavior and answering results, thereby enabling educators to provide personalized instruction. Scholars have proposed many cognitive diagnostic models. However, most of the models do not fully extract and utilize the relevant data and parameters of cognitive diagnosis. Moreover, some models rely only on artificially designed simple functions to analyze the cognitive process of students, which cannot fully capture the complex relationship between students and the exercises. To address these limitations, this paper proposes a neural cognitive diagnostic model named NCDFSA, which focuses on students' attention to knowledge concepts. The model utilizes neural networks to diagnose students' knowledge and considers students' implicit relationships with knowledge concepts. We introduce the concept of an attention matrix (AM) and define the importance of knowledge concepts by their frequency of usage to improve the prediction effect. This paper compares NCDFSA with existing classical models on four real datasets and finds that the model has higher accuracy and rationality in predicting student performance.
|
|
16:00-17:00, Paper Mo-S4T1.5 | Add to My Program |
Combining Self-Organizing Map with Reinforcement Learning for Multivariate Time Series Anomaly Detection |
|
Su, Peng | KTH Royal Institute of Technology |
Lu, Zhonghai | KTH Royal Institute of Technology |
Chen, Dejiu | KTH Royal Institute of Technology |
Keywords: Neural Networks and their Applications, AI and Applications, Deep Learning
Abstract: Anomaly detection plays a critical role in condition monitoring to support the trustworthiness of Cyber-Physical Systems (CPS). Detecting multivariate anomalous data in such systems is challenging due to the lack of a complete comprehension of anomalous behaviors and features. This paper proposes a framework to address multivariate time series anomaly detection problems by combining the Self-Organizing Map (SOM) with Deep Reinforcement Learning (DRL). By clustering the multivariate data, SOM creates an environment that enables the DRL agents to interact with the collected system operational data in the form of a tabular dataset. In this environment, Markov chains reveal the likely anomalous features to support the DRL agent in exploring and exploiting the state-action space to maximize anomaly detection performance. We use a time series dataset, Skoltech Anomaly Benchmark (SKAB), to evaluate our framework. Compared with the best results by some currently applied methods, our framework improves the F1 score by 9% from 0.67 to 0.73.
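The 9% figure in the abstract above is a relative improvement, not a gain of nine percentage points. A short sketch of the arithmetic, using only the two F1 scores quoted in the abstract (the function name is hypothetical):

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Relative change of `new` with respect to `baseline`."""
    return (new - baseline) / baseline


# F1 scores quoted in the abstract: 0.67 (best prior result) and 0.73 (proposed).
gain = relative_improvement(0.67, 0.73)
print(f"{gain:.1%}")  # about 9%, while the absolute gain is 0.06
```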
|
|
16:00-17:00, Paper Mo-S4T1.6 | Add to My Program |
Multi-Granularity Interest Learning for Click-Through Rate Prediction |
|
Niu, Chang | Tsinghua University |
Huang, Wei | Tsinghua University |
Yang, Yujiu | Tsinghua University |
Keywords: Neural Networks and their Applications, AI and Applications, Deep Learning
Abstract: The Embedding&MLP paradigm based on deep learning has been widely used in click-through rate prediction tasks. Such approaches compress user features into a fixed-length vector, which creates a bottleneck for interest learning and makes it difficult to express the diverse interests of users. To improve the accuracy and diversity of recommendation, more comprehensive user interest learning is required. We present Multi-Granularity Interest Learning (MGIL), which combines the memorization of users’ historical preferences with the generalization of users’ interests. Our fine-grained interest module learns interests at the granularity of items, preserving the memory of users’ historical preferences as much as possible. The coarse-grained interest module, on the other hand, extracts multiple high-level interests from user behavior sequences to characterize multiple aspects of user interest, which are more generalized and abstract. The two modules complement each other at different granularities. We conduct extensive experiments on three real-world datasets, MovieLens, Taobao and Amazon. Experimental results demonstrate that our method significantly improves the accuracy and diversity of recommendation compared to state-of-the-art models.
|
|
16:00-17:00, Paper Mo-S4T1.7 | Add to My Program |
A Text Classification Model Based on Virtual Adversarial Training and Bilateral Contrastive Learning |
|
Dou, Ximeng | Qilu University of Technology (Shandong Academy of Sciences) |
Li, Ming | Shandong University of Traditional Chinese Medicine |
Zhao, Jing | Qilu University of Technology (Shandong Academy of Sciences) |
Gao, Shuai | Qilu University of Technology (Shandong Academy of Sciences) |
Keywords: Neural Networks and their Applications, Deep Learning, Representation Learning
Abstract: In unsupervised text classification, contrastive learning can effectively learn discriminative representations of text features, thereby improving the performance of the model. However, in the context of supervised text classification, contrastive learning methods cannot fully utilize the label information of samples in the text classification model, nor can they learn the relationship between labels and features among samples well. Therefore, how to improve the generalization and robustness of supervised text classification models remains an open problem. In this paper, we propose a framework called VABCL (Virtual Adversarial and Bidirectional Contrastive Learning) based on virtual adversarial training and bidirectional contrastive learning. This framework combines virtual adversarial training with contrastive learning to train the model to establish better relationships between adversarial and real samples, thus improving the robustness and generalization performance of the classification model. Furthermore, we generate augmented samples using the label information of the samples, and use bidirectional contrastive learning to learn the relationship between sample labels and features, further improving the accuracy of the classification model. We experimentally verified the VABCL framework on three benchmark text classification datasets, and the results confirmed the improvement in the feature extraction ability and classification accuracy of VABCL, indicating that VABCL has good performance in the field of supervised text classification.
|
|
Mo-S4T2 Virtual Session, Room T2 |
Distributed Intelligent Systems |
|
|
Chair: Cao, Di | Zhejiang University of Technology |
|
16:00-17:00, Paper Mo-S4T2.1 | Add to My Program |
DNFOMP: Dynamic Neural Field Optimal Motion Planner for Navigation of Autonomous Robots in Cluttered Environment |
|
Katerishich, Maksim | Skolkovo Institute of Science and Technology |
Kurenkov, Mikhail | Skolkovo Institute of Science and Technology |
Karaf, Sausar | Skoltech Institute of Science and Technology |
Nenashev, Artem | Skolkovo Institute of Science and Technology (Skoltech) |
Tsetserukou, Dzmitry | Skoltech |
Keywords: Autonomous Vehicle, Intelligent Transportation Systems, System Modeling and Control
Abstract: Motion planning in dynamically changing environments is one of the most complex challenges in autonomous driving. Safety is a crucial requirement, along with driving comfort and speed limits. While classical sampling-based, lattice-based, and optimization-based planning methods can generate smooth and short paths, they often do not consider the dynamics of the environment. Some techniques do consider it, but they rely on updating the environment on-the-go rather than explicitly accounting for the dynamics, which is not suitable for self-driving. To address this, we propose a novel method based on the Neural Field Optimal Motion Planner (NFOMP), which outperforms state-of-the-art approaches in terms of normalized curvature and the number of cusps. Our approach embeds previously known moving obstacles into the neural field collision model to account for the dynamics of the environment. We also introduce time profiling of the trajectory and non-linear velocity constraints by adding Lagrange multipliers to the trajectory loss function. We applied our method to solve the optimal motion planning problem in an urban environment using the BeamNG.tech driving simulator. An autonomous car drove the generated trajectories in three city scenarios while sharing the road with the obstacle vehicle. Our evaluation shows that the maximum acceleration the passenger can experience instantly is -7.5 m/s² and that 89.6% of the driving time is devoted to normal driving with accelerations below 3.5 m/s². The driving style is characterized by 46.0% and 31.4% of the driving time being devoted to the light rail transit style and the moderate driving style, respectively.
|
|
16:00-17:00, Paper Mo-S4T2.2 | Add to My Program |
An Edge-Based Aquaculture Monitoring System for Fish Behavior Detection |
|
Cao, Di | Zhejiang University of Technology |
Chen, Zhenghao | Zhejiang University of Technology |
Zhang, Lechao | Zhejiang University of Technology |
Zhang, Yu | Zhejiang University of Technology |
Lu, Jian | Zhejiang University of Technology |
Lei, Yanjing | Zhejiang University of Technology |
Keywords: Cyber-physical systems, Distributed Intelligent Systems, Decision Support Systems
Abstract: Artificial intelligence, the main enabler of intelligent aquaculture monitoring systems, helps to increase the efficiency of fish farming, ensure robustness and reduce maintenance costs. With the rapid development of AI, many researchers focus on understanding the statuses of the fish by using water quality sensors and fish motion detection. However, existing methods focus on the movement patterns of the fish, without exploring their health statuses in a real environment. Meanwhile, the computational requirements of the model and the computing hardware also need to be taken into consideration. Thus, an edge-based aquaculture monitoring system is proposed, offering health status recognition based on an AI model and real-time communication using edge computing devices. SlowFast, a deep learning algorithm, is applied for the first time to detect the different statuses of fish, with an accuracy of 97.06%, which outperforms the other existing methods. The proposed monitoring system has also been successfully deployed to monitor intensive American shad farming in the city of Suzhou, China.
|
|
16:00-17:00, Paper Mo-S4T2.3 | Add to My Program |
POA: Passable Obstacles Aware Path-Planning Algorithm for Navigation of a Two-Wheeled Robot in Highly Cluttered Environments |
|
Petrovsky, Alexander | Skoltech |
Youssef, Yomna | Valeo |
Myasoedov, Kirill | Skoltech |
Timoshenko, Artem | Skolkovo Institute of Science and Technology |
Vladimir.Guneavoy@skoltech.Ru, Vladimir | Skoltech |
Kalinov, Ivan Alexeevich | Skolkovo Institute of Science and Technology |
Tsetserukou, Dzmitry | Skoltech |
Keywords: Robotic Systems, Autonomous Vehicle, Cooperative Systems and Control
Abstract: This paper focuses on the Passable Obstacles Aware (POA) planner, a novel navigation method for two-wheeled robots in a highly cluttered environment. The navigation algorithm detects and classifies objects to distinguish two types of obstacles: passable and unpassable. Our algorithm allows two-wheeled robots to find a path through passable obstacles. Such a solution helps the robot work in areas inaccessible to standard path planners and find optimal trajectories in scenarios with a high number of objects in the robot’s vicinity. The POA planner can be embedded into other planning algorithms and enables them to build a path through obstacles. Our method decreases path length and the total travel time to the final destination by up to 43% and 39%, respectively, compared to standard path planners such as GVD, A*, and RRT*.
|
|
16:00-17:00, Paper Mo-S4T2.4 | Add to My Program |
Intelligent Data Management Pipeline to Facilitate Human Robot Interaction in Large-Scale Real-Time Environments |
|
Paul, Nicholas | SoarTech |
Sullivan, Zachary | SoarTech |
Marinier, Bob | SoarTech |
Moshkina, Lilia | SoarTech |
Keywords: Distributed Intelligent Systems, Autonomous Vehicle, Decision Support Systems
Abstract: Human-facing robotics systems are becoming more prevalent in our daily lives. Armies of food delivery robots roam the sidewalks of universities while security robots patrol malls and casinos. While the outlook of these types of systems is exciting, real-world interactions with these systems are often underwhelming or ineffective due to the many complexities involved in Human-Robot Interaction (HRI). Here we present the Spatial Reasoner (SR), a distributed ROS-based data management and processing pipeline capable of supporting coordinated human/robot interaction in large, urban environments in real time. The SR processes information from sensors mounted on ground and air platforms or in static positions and aggregates it into discrete, meaningful human/robot, human/human, and human/environment encounters. Encounters are explainable and traceable and used by other system components to generate evidence and recommendations to support the decision-making of a human operator. Uncertainty is propagated through its layered architecture making it highly robust and informative in unknown environments. The SR has been deployed and tested in hundreds of hours of live experimentation. It has efficiently observed and processed hundreds of thousands of interactions in dense, crowded environments and has been shown to effectively provide information for a human-directed and controlled AI system.
|
|
16:00-17:00, Paper Mo-S4T2.5 | Add to My Program |
An Adaptive Detection and Recognition Method for Traffic Sign Based on Multi-Scale Attention (I) |
|
Wang, Chunzhi | Hubei University of Technology |
Bao, Shuo | Hubei University of Technology |
Wu, Dade | Hubei Mechanical and Electrical Research and Design Institute Co |
Yan, Lingyu | Hubei University of Technology |
Keywords: Distributed Intelligent Systems, Intelligent Transportation Systems
Abstract: Traffic sign detection aims to locate and classify traffic signs accurately and in real time. However, because of their small size and complex backgrounds, some smaller traffic signs are harder to detect than larger ones. In addition, false detections often occur due to the influence of light changes and bad weather. Therefore, in order to solve the problems of missed detection and false detection, this paper proposes an adaptive multi-scale spatial-channel attention fusion center network (MSCA-Center Net). Firstly, a residual space-channel attention module combined with multi-channel information is proposed. The module divides the channels in the feature map into several groups, generates separate spatial and channel attention for each group, and combines the spatial and channel information of the multiple channels. Then, a multi-scale attention fusion module is proposed to integrate the extracted high-level and low-level features, which can improve the detection and classification accuracy. In addition, a coordinate attention module is introduced to obtain the final feature code, so that the model can locate and identify the target area more accurately. Finally, the optimal detection box is generated adaptively according to the feature coding. The proposed model was tested and evaluated on the CCTSDB dataset. Compared with the existing methods, the proposed method can detect traffic signs adaptively in real time under complex backgrounds, which verifies its effectiveness.
|
|
16:00-17:00, Paper Mo-S4T2.6 | Add to My Program |
Enhancing Efficiency of Quadrupedal Locomotion Over Challenging Terrains with Extensible Feet |
|
Kumar, Lokesh | TCS Research |
Sortee, Sarvesh | TCS Research |
Bera, Titas | TCS Research |
Dasgupta, Ranjan | TCS Research |
Keywords: Robotic Systems, Control of Uncertain Systems
Abstract: Recent advancements in legged locomotion research have made legged robots a preferred choice for navigating challenging terrains when compared to their wheeled counterparts. This paper presents a novel locomotion policy, trained using Deep Reinforcement Learning, for a quadrupedal robot equipped with an additional prismatic joint between the knee and foot of each leg. The training is conducted in the NVIDIA Isaac Gym simulation environment. Our study investigates the impact of these joints on maintaining the quadruped's desired height and commanded velocities while traversing challenging terrains. We provide comparison results, based on a Cost of Transport (CoT) metric, between quadrupeds with and without prismatic joints. The learned policy is evaluated on a set of challenging terrains using the CoT metric in simulation. Our results demonstrate that the added degrees of actuation offer the locomotion policy more flexibility to use the extra joints to traverse terrains that would be deemed infeasible or prohibitively expensive for the conventional quadrupedal design, resulting in significantly improved efficiency.
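For context, the Cost of Transport is commonly defined as the energy spent per unit weight per unit distance travelled. The abstract does not give the exact form the authors use, so the expression below is the widely used dimensionless definition, stated here as an assumption, with $E$ the energy consumed, $P$ the average power, $m$ the robot mass, $g$ the gravitational acceleration, $d$ the distance travelled and $v$ the average forward speed:

```latex
\mathrm{CoT} \;=\; \frac{E}{m\,g\,d} \;=\; \frac{P}{m\,g\,v}
```

A lower CoT means the robot covers the same distance with less energy, which is the sense in which the abstract speaks of improved efficiency.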
|
|
16:00-17:00, Paper Mo-S4T2.7 | Add to My Program |
Distributed Second-Order Method with Diffusion Strategy |
|
Qu, Zhihai | Tongji University |
Li, Xiuxian | Tongji University |
Li, Li | Tongji University |
Hong, Yiguang | Tongji University |
Keywords: Optimization and Self-Organization Approaches
Abstract: Within the realm of distributed optimization, each node in the network possesses computational capabilities. Nodes perform local calculations on their own data, and by communicating local information (e.g., local gradients) with neighboring nodes, agents collectively achieve a globally optimal solution. In recent years, distributed optimization has garnered interest across diverse disciplines, particularly in situations where communication abilities are limited or data are private. In this paper, a distributed second-order algorithm, based on an augmented Lagrangian function, is proposed with an enhanced diffusion communication strategy. An R-linear convergence rate is established under a relaxed locally restricted strong convexity assumption, along with a widely employed L-smoothness assumption. Finally, the superiority of the algorithm is showcased by a distributed logistic regression example utilizing a synthetic dataset with various parameter settings.
|
|
Mo-S4T3 Virtual Session, Room T3 |
Systems Safety and Security I |
|
|
|
16:00-17:00, Paper Mo-S4T3.1 |
Robust Malicious Domain Detection against Adversarial Attacks on Heterogeneous Graph |
|
Gao, Yi | University of Chinese Academy of Sciences |
Li, Zhiping | University of Chinese Academy of Sciences |
Fangfang, Yuan | Institute of Information Engineering, Chinese Academy of Sciences |
Zhang, Xiaoliang | Institute of Information Engineering, Chinese Academy of Sciences |
Wang, Dakui | Institute of Information Engineering, Chinese Academy of Sciences |
Cao, Cong | Institute of Information Engineering, Chinese Academy of Sciences |
Liu, Yanbing | Institute of Information Engineering, Chinese Academy of Sciences |
Keywords: Systems Safety and Security
Abstract: Domain Name System (DNS) is a crucial infrastructure of the Internet, yet it is also a primary medium for disseminating illicit information. Researchers have proposed numerous methods to detect malicious domains, among which heterogeneous graph (HG) based models have demonstrated good performance. However, their success may also motivate attackers to defeat HG based models in order to evade detection. In this paper, we propose a novel malicious domain detection model named RoDom, which is robust against adversarial attacks on HG. Firstly, we introduce different perturbations to construct multiple attacked graphs, which are designed to simulate different types of adversarial attacks on the HG. Secondly, we design a discriminator to perform robust representation learning on the HG by discriminating the original graph from attacked graphs. Finally, we introduce a classification selector to further improve the model's robustness by automatically combining domain representations of multiple HGs for domain classification. The experimental results show that RoDom outperforms other state-of-the-art methods and exhibits stronger robustness against adversarial attacks on the HG.
|
|
16:00-17:00, Paper Mo-S4T3.2 |
AirTouch: Towards Safe Human-Robot Interaction Using Air Pressure Feedback and IR Mocap System |
|
Rakhmatulin, Viktor | Skolkovo Institute of Science and Technology (Skoltech) |
Grankin, Denis | Skolkovo Institute of Science and Technology (Skoltech) |
Konenkov, Mikhail | Skolkovo Institute of Science and Technology |
Davidenko, Sergei | Skolkovo Institute of Science and Technology (Skoltech) |
Trinitatova, Daria | Skolkovo Institute of Science and Technology |
Sautenkov, Oleg | Skolkovo Institute of Science and Technology |
Tsetserukou, Dzmitry | Skoltech |
Keywords: Systems Safety and Security, Haptic Systems, Human Factors
Abstract: The growing use of robots in urban environments has raised concerns about potential safety hazards, especially in public spaces where humans and robots may interact. In this paper, we present a system for safe human-robot interaction that combines an infrared (IR) camera with a wearable marker and airflow potential field. IR cameras enable real-time detection and tracking of humans in challenging environments, while controlled airflow creates a physical barrier that guides humans away from dangerous proximity to robots without the need for wearable devices. A preliminary experiment was conducted to measure the accuracy of the perception of safety barriers rendered by controlled air pressure. In a second experiment, we evaluated our approach in an imitation scenario of an interaction between an inattentive person and an autonomous robotic system. Experimental results show that the proposed system significantly improves a participant's ability to maintain a safe distance from the operating robot compared to trials without the system.
|
|
16:00-17:00, Paper Mo-S4T3.3 |
HLRS: A Deep Reinforcement Learning-Based Hero Recommendation System for MOBA Games |
|
Bian, Huicong | Qilu University of Technology (Shandong Academy of Sciences) |
Lu, Qin | Qilu University of Technology |
Keywords: Entertainment Engineering, Information Systems for Design and Marketing, Design Methods
Abstract: Multiplayer Online Battle Arena (MOBA) games have gained immense popularity in recent years, with games like Dota2, League of Legends (LoL), and Honor of Kings (HoK) being the most popular. In a typical MOBA game, two teams consisting of 10 players battle against each other in two phases: hero selection and hero battle phase. Choosing a suitable virtual game character from all the available heroes during the hero selection phase is a daunting task due to the large number of current heroes, their complex relationships, and player interests. Therefore, we propose an innovative hero recommendation system called HLRS, which acts as a coach to guide players in selecting appropriate heroes during the selection phase. To verify the effectiveness of our model, we used HoK as a test platform and trained the model using its game data. We conducted numerous experiments, and the results showed that our model was more effective than other recommendation schemes. It analyzed various aspects to recommend the most suitable hero characters for players.
|
|
16:00-17:00, Paper Mo-S4T3.4 |
Video Self-Supervised Cross-Pathway Training Based on Slow and Fast Pathways |
|
Li, Jie | Xi'an Jiaotong University |
Yang, Jing | Xi'an Jiaotong University |
Jiang, Zhou | Xi'an Jiaotong University |
Chen, Yuehai | Xi'an Jiaotong University |
Du, Shaoyi | Xi'an Jiaotong University |
Keywords: Human Performance Modeling, Visual Analytics/Communication, Human-centered Learning
Abstract: In the field of video self-supervised learning, contrastive instance learning methods suffer from a lack of semantic information, resulting in inadequate generalization in downstream tasks. Although optical flow can provide some semantic information, it requires significant computational cost prior to training. To address this, we propose a Video self-supervised Cross-pathway training model based on Slow and Fast pathways (VCSF). This model separately extracts temporal and spatial features from pure RGB video frames, and uses the complementary representations of the two pathways to conduct cross-pathway training. Additionally, we propose a motion perception module in the low-frame-rate space to enhance the network's ability to perceive rapidly changing human motion. We conducted extensive experiments on the downstream tasks of motion recognition and nearest neighbor retrieval on UCF101 and HMDB51, and obtained state-of-the-art results among models that use the UCF101 dataset for self-supervised pre-training.
|
|
16:00-17:00, Paper Mo-S4T3.5 |
A Practical, Robust, Accurate Gaze-Based Intention Inference Method for Everyday Human-Robot Interaction |
|
Xu, Haoyang | Peking University |
Wang, Tao | Peking University |
Chen, Yingjie | Peking University |
Shi, Tianze | Peking University |
Keywords: Affective Computing, Human-Machine Interaction
Abstract: Gaze estimation is a crucial component of human-robot interaction (HRI). While previous gaze estimation methods have been widely applied in advertising and gaming with head-mounted devices or complicated camera systems, little research has been conducted in everyday HRI scenarios. During interactions, robots have the potential to infer human intention through static gaze directions and dynamic eye movements, enabling them to behave in a more intelligent and friendly manner. This paper combines appearance-based gaze estimation methods with eye movement analysis methods to infer human intentions, particularly in human-robot interaction scenarios. Real interactions were conducted to test the accuracy and robustness of the methods developed. The experiments demonstrate that our methods deliver practical, robust, and accurate results.
|
|
16:00-17:00, Paper Mo-S4T3.6 |
Spatio-Temporal Attention Based Graph Convolutional Networks for Human Action Reconstruction and Analysis |
|
He, Tianjia | Kyushu University |
Konomi, Shin'ichi | Kyushu University |
Yang, Tianyuan | Kyushu University |
Keywords: Human Performance Modeling, Cognitive Computing, Human-Machine Interface
Abstract: In recent years, there has been a growing interest in the application of Graph Convolutional Networks (GCNs) for classifying or generating human skeleton-based action sequences. Despite the progress in this field, there exists a relative dearth of research on the underlying mechanisms of how these network structures learn and represent the information features of the human skeleton. This paper proposes a novel GCN-based reconstruction network, ST-ATGCN, that utilizes spatial and temporal attention mechanisms for analyzing the extraction and reconstruction patterns of human action sequences. This versatile network can be effectively employed in a wide array of applications, including data compression, noise reduction, and interpolation. Experimental results on a public dataset demonstrate that the ST-ATGCN network outperforms most of the currently prevailing GCN-based methods. This indicates the efficacy of the proposed network architecture in accurately extracting and reconstructing human skeleton information. Moreover, the reconstruction network exhibits proficiency in effectively restoring noisy action sequences.
|
|
16:00-17:00, Paper Mo-S4T3.7 |
Model-Mediated Delay Compensation with Goal Prediction for Robot Teleoperation Over Internet |
|
Lima, Rolif | Tata Consultancy Services |
Vakharia, Vismay | Tata Consultancy Services |
Rai, Utsav | Tata Consultancy Services, Research |
Mehta, Hardik | Tata Consultancy Services Ltd |
Vatsal, Vighnesh | TCS Research, Tata Consultancy Services Ltd |
Das, Kaushik | TCS Research |
Keywords: Shared Control, Human-Collaborative Robotics, Virtual/Augmented/Mixed Reality
Abstract: Teleoperated robots have enabled humans to manipulate objects in remote environments without requiring physical presence. In this paper, we focus on teleoperation of a robotic arm with shared control between the robot and the operator. A model-mediated approach is used to compensate for delays in the communication channel. Position information of the operator's arm is captured and processed to compute the states of a motion prediction model before transmission over a network to be used on the robot's side, allowing for compensation of transmission delays. Model Predictive Control (MPC) and a novel goal prediction algorithm are used to follow the operator's intended motion while reducing the cognitive loads arising from collision avoidance and fine manipulation in the remote environment. We evaluate the proposed method against a baseline pure teleoperation condition with an inverse kinematic controller and observe that the proposed approach improves the overall teleoperation performance in terms of task completion time.
|
|
Mo-S4T4 Virtual Session, Room T4 |
Data Analysis and Artificial Intelligence |
|
|
|
16:00-17:00, Paper Mo-S4T4.1 |
Federated Learning with Common Representation Learning Criterion and Personalized Predictor |
|
Wang, Wenzhong | Hohai University |
Xie, Zaipeng | Hohai University |
Yu, Bingzhe | Hohai University |
Qu, Zhihao | Hohai University |
Zhang, Yufeng | Hohai University |
Cao, Hongli | Southeast University |
Keywords: Distributed Intelligent Systems, System Modeling and Control, Adaptive Systems
Abstract: Federated learning (FL) enables model training on decentralized devices while preserving data privacy. However, data heterogeneity poses a significant challenge to FL, and various approaches have been proposed to address it. Existing research has mainly focused on either enhancing global models or customizing personalized models for clients. This paper proposes a novel approach, FedCRC, that decouples the machine learning model into a representation extractor and predictor. This enables us to enhance both generalization and personalization, thereby addressing the challenge of data heterogeneity in FL. The approach employs a stable global predictor to unify the representation learning criterion during the training of the representation extractor. Additionally, a personalized predictor is trained for each client to achieve a personalized model tailored to the local data distribution. Our FedCRC algorithm was evaluated on multiple benchmark datasets with varying distributions, covering diverse settings. Extensive experimental results demonstrate the effectiveness of our method.
|
|
16:00-17:00, Paper Mo-S4T4.2 |
FedGSync: Jointly Optimized Weak Synchronization and Gradient Transmission for Fast Distributed Machine Learning in Heterogeneous WAN |
|
Huaman, Zhou | University of Electronic Science and Technology of China |
He, Yihong | University of Electronic Science and Technology of China |
Zhang, Zhihao | University of Electronic Science and Technology of China |
Luo, Long | University of Electronic Science and Technology of China |
Yu, Hongfang | University of Electronic Science and Technology of China |
Sun, Gang | University of Electronic Science and Technology of China |
Keywords: Distributed Intelligent Systems, Service Systems and Organizations, Quality and Reliability Engineering
Abstract: For privacy and cost reasons, distributed machine learning in Wide-Area Networks (DML-WAN) is emerging as a popular collaborative learning paradigm. However, heterogeneity in computing power and data distribution among workers in different locations has a dramatic impact on training performance, including convergence speed and learning accuracy. Most existing works on distributed training mechanisms focus on either computing heterogeneity or data heterogeneity, and none of them handles both well. In this paper, we propose FedGSync, a novel distributed training mechanism to improve the training performance for DML-WAN, where computing heterogeneity and data heterogeneity usually coexist. To speed up training and improve model accuracy, FedGSync clusters workers into groups according to the similarity of their data distributions and introduces group-based weak synchronization to minimize the synchronization delays incurred while waiting for slow workers and to reduce the accuracy loss by balancing the contributions of all data distributions. To preserve data privacy and improve efficiency, FedGSync groups workers based only on the principal components of gradients and uses an approximate grouping mechanism based on K-means. To further reduce synchronization time, FedGSync prioritizes packets and uses differential transmission for gradient packets between groups. Evaluation results demonstrate that FedGSync improves convergence speed and learning accuracy under the coexistence of computing heterogeneity and data heterogeneity compared with state-of-the-art distributed training mechanisms.
|
|
16:00-17:00, Paper Mo-S4T4.3 |
An Empirical Exploration of Working Memory, Selective Attention and Reasoning During the Comprehension of Process Models |
|
Winter, Michael | University of Würzburg |
Pryss, Rüdiger | University of Würzburg |
Keywords: Enterprise Information Systems, Decision Support Systems, Technology Assessment
Abstract: Moving toward a digital environment poses complex challenges for organizations. Digital blueprints, or process models, are crucial for facilitating digital transformation. However, it is vital to ensure that all stakeholders correctly understand these models to reap their benefits. Despite extensive research on process model comprehension, there is still a lack of in-depth knowledge about the role of cognitive aspects. To address this gap, this paper presents the findings of an empirical study that evaluated the predictive value of cognitive skills, namely working memory, selective attention, and reasoning, on process model comprehension. The study involved 50 participants, and the analysis revealed that higher reasoning abilities were associated with better comprehension, emphasizing the critical role of this cognitive skill in understanding process models.
|
|
16:00-17:00, Paper Mo-S4T4.4 |
A Study of Chinese Medicine Entity Recognition Method by Fusing Multi-Features and Pointer Networks |
|
Lv, Zihao | China West Normal University |
He, Chunlin | China West Normal University |
Xu, Liming | China West Normal University |
Keywords: Service Systems and Organizations, Large-Scale System of Systems, Intelligent Transportation Systems
Abstract: Named entity recognition in Traditional Chinese Medicine (TCM) is a difficult task in the field of medical information extraction: TCM texts contain a large number of domain nouns and specialized terms with high semantic complexity, entity boundaries are often unclear, and some entities have multiple meanings. To address these difficulties, as well as the underuse of semantic information in entity recognition tasks, an entity recognition method incorporating Chinese character multi-features and a SPAN pointer network is proposed. The method obtains character feature vectors using the representational power of the pre-trained model BERT, combines the character vectors with lexical and radical feature embeddings, captures long-range textual context through a BiGRU and an attention layer, and finally uses the SPAN pointer network to determine the start boundary of each entity and complete the entity extraction. In addition, adversarial training and focal loss are added to reduce the risk of overfitting and enhance the generalization ability and robustness of the model. Experiments show that this method achieves superior performance on the named entity recognition problem of TCM.
|
|
16:00-17:00, Paper Mo-S4T4.5 |
Detection of Driver Cognitive Distraction Using Driver Performance Measures, Eye-Tracking Data and a D-FFNN Model (I) |
|
Shajari, Arian | Deakin University |
Asadi, Houshyar | Deakin University |
Alsanwy, Shehab | Deakin University |
Nahavandi, Saeid | Swinburne University of Technology |
Keywords: Intelligent Transportation Systems
Abstract: The issue of cognitive distraction during driving has been identified as a major cause of road accidents. Detecting cognitive distraction in real-time can be a valuable strategy for preventing accidents. In this study, a novel approach is presented for the purpose of detecting cognitive distraction in real-time using artificial intelligence while taking into account eye-tracking and head movement data, combined with driving performance measures. This methodology involved collecting data from participants in a driving simulator, on a motion platform, while they performed a cognitive task as well as a control driving scenario. The data collected included eye-tracking data, head movement data, driving performance measures, and subjective ratings of distraction. To develop an accurate model for detecting cognitive distraction, a Deep Feedforward Neural Network (D-FFNN) model was employed while considering binocular gaze direction, pupil diameter, orientation of each eye, head rotational velocities, and head acceleration. The developed model was trained using the collected data and achieved an accuracy of 96.09% in detecting cognitive distraction. The results of our study demonstrate the effectiveness of the proposed method in identifying cognitive distraction in real-time. Also, the accuracy of this model was compared with other AI based classification algorithms. The proposed method has significant implications for preventing vehicle accidents caused by cognitive distraction. The proposed method can be integrated into existing driver-assistance systems to alert drivers and assist them in returning their focus to the road.
|
|
16:00-17:00, Paper Mo-S4T4.6 |
Haptically-Enabled Robotic Teleoperation for Transcranial Magnetic Stimulation (TeleTMS) |
|
Mohsenzadeh Kebria, Parham | Deakin University |
Nahavandi, Saeid | Swinburne University of Technology |
Enticott, Peter | Deakin University |
Bello, Fernando | Imperial College |
Keywords: Human-Collaborative Robotics, Telepresence, Haptic Systems
Abstract: Transcranial Magnetic Stimulation (TMS) is a non-invasive and painless technique used in both clinical trials and research on cortical activity and brain networks. TMS involves the use of an electromagnetic coil, which can generate powerful but brief magnetic pulses. When the coil is held against the scalp, it can induce electrical activity in underlying brain tissue. For effective results, the TMS coil should be in appropriate contact with the patient's scalp and positioned for consistent stimulation. In many cases, this requires researchers and clinicians not only to hold and position the coil on the subject's head, but also to ensure appropriate and consistent contact between the TMS coil and the subject's scalp. This task is noticeably tiresome for the operators, considering the weight of the coil and the dense cable attached to it. Meanwhile, the patient or participant has to sit motionless; otherwise, the contact will be lost and the stimulation will have a reduced impact. In this paper, we propose and develop a haptically-enabled teleoperated robotic platform that removes these limitations and burdens from both TMS operators and patients/participants. The operator, through a haptic interface, remotely controls a robotic arm holding the coil. This system provides the operator with the sense of touch to feel the contact force between the coil and the patient's/participant's head. Therefore, operators and patients/participants do not need to be in the same location while conducting TMS, including the “motor thresholding” procedure. This will offer a huge benefit to healthcare services in rural areas. Experimental evaluations were carried out to demonstrate the effectiveness of the proposed robotic system.
|
|
16:00-17:00, Paper Mo-S4T4.7 |
FISMI-DRL: A Framework for Interactive Segmentation of Medical Image Based on Deep Reinforcement Learning |
|
Qian, Chen | Donghua University |
Yang, Hao | Donghua University |
Li, Jiyun | Donghua University |
Keywords: Human-Machine Interaction, Biometrics and Applications, Networking and Decision-Making
Abstract: At present, deep learning-based medical image segmentation algorithms have achieved fast and accurate semantic segmentation. However, it remains challenging for their segmentation accuracy to reach the clinical use standard, so further refinement by medical experts is required. Therefore, some researchers have turned their attention to interactive segmentation methods, which introduce human interaction to obtain information gain. Such methods model the dynamics of the image annotation process state and can effectively improve the segmentation accuracy under the interaction of medical experts. In this paper, we put forward a novel framework for the interactive segmentation of medical images based on deep reinforcement learning, namely FISMI-DRL. The experimental results demonstrate that our model achieves high segmentation accuracy and interaction efficiency.
|
|
Mo-S4T5 Virtual Session, Room T5 |
Affective Computing |
|
|
|
16:00-17:00, Paper Mo-S4T5.1 |
Dialogue-Clues: Dual-Channel Dialogue Clues Embedding Context Perception Network for Emotion Recognition in Conversations |
|
Cao, Yukun | Shanghai University of Electric Power |
He, Zhenyi | Shanghai University of Electric Power |
Tang, Yijia | Shanghai University of Electric Power |
Yan, Jialuo | Shanghai University of Electric Power |
Cheng, Yu | Shanghai University of Electric Power |
Kangle, Xu | Shanghai University of Electric Power |
Keywords: Affective Computing, Human-Machine Interaction
Abstract: In recent years, emotion recognition in conversations has gained widespread attention due to its extensive applications. Many recent studies have focused on perceiving conversational context from the perspective of capturing dialogue clues. However, these studies often use the entire conversation sequence to represent the dialogue clues, which may result in insufficient representation of the speaker's emotional dynamics in multi-turn conversation. To address these issues, we propose a novel dual-channel dialogue clues embedding context perception network (Dialogue-Clues) that integrates dialogue clues information into global conversation context modeling. We also introduce a dual-channel dialogue clues perception architecture that captures and reinforces both static and dynamic dialogue clues in conversations, and bi-directionally reinforces both types of dialogue clues information. To better represent dynamic dialogue clues, we construct a novel speaker-interaction isomorphic graph structure for the dual-channel dialogue clues perception architecture. Through extensive comparisons with ten existing methods on four public datasets, we confirm the effectiveness of the proposed method. The results demonstrate that integrating Dialogue-Clues information can improve the ability of context modeling.
|
|
16:00-17:00, Paper Mo-S4T5.2 |
Two-Stage Aspect Sentiment Quadruple Prediction Based on MRC and Text Generation |
|
Li, Zhijun | Qilu University of Technology (Shandong Academy of Sciences) |
Yang, Zhenyu | Qilu University of Technology (Shandong Academy of Sciences) |
Li, Xiaoyang | Qilu University of Technology (Shandong Academy of Sciences) |
Li, Yiwen | Qilu University of Technology (Shandong Academy of Sciences) |
Keywords: Affective Computing
Abstract: In recent years, aspect sentiment quadruple prediction (ASQP) has become popular in aspect-based sentiment analysis (ABSA). Its purpose is to decode a given sentence into aspect sentiment quadruples (aspect category, aspect term, opinion term, and sentiment polarity). Efforts to extract aspect sentiment quadruples efficiently often encounter two problems. First, the intrinsic relationships between aspect terms and opinion terms are usually ignored, so the correlations needed to establish aspect-opinion pairs are not captured and the mutual interference between different sentiment quadruples is overlooked. Second, the semantic information contained in the sentiment elements of comment utterances is often underutilized, which increases the risk of inaccurate predictions. We propose a two-stage framework to address these issues by enhancing the correlations between aspects and opinions and fully utilizing the semantic information of sentiment elements. Specifically, in the first stage, we treat the extraction task as a machine reading comprehension (MRC) problem, employ a span-based labeling scheme, and construct a question-and-answer-based MRC task to efficiently extract aspect-opinion pairs. In the second stage, we view the classification of aspect categories and sentiment polarities as a text generation task, where the semantics of sentiment elements can be leveraged by learning to generate them in natural language form. Finally, the two stages are combined with our proposed template generator, and the aspect sentiment quadruples can be decoded. We conducted experiments on two datasets, and the experimental results were superior to those of the comparison approaches, with our model demonstrating excellent performance in processing complex sentences containing multiple quadruples. Additionally, in subtask experiments, our model achieved good results, further proving its effectiveness.
|
|
16:00-17:00, Paper Mo-S4T5.3 |
Dual-Cell Recurrent Network for Target-Oriented Opinion Word Extraction on Global Fields |
|
Huang, Jiaming | Xihua University, School of Computer and Software Engineering |
Li, Xianyong | Xihua University, School of Computer and Software Engineering |
Du, Yajun | Xihua University, School of Computer and Software Engineering |
Xie, Chunzhi | Xihua University, School of Computer and Software Engineering |
Chen, Xiaoliang | Xihua University, School of Computer and Software Engineering |
Fan, Yongquan | Xihua University, School of Computer and Software Engineering |
Keywords: Affective Computing, Kansei (sense/emotion) Engineering, Design Methods
Abstract: Target-oriented opinion word extraction (TOWE) is critical in aspect-based sentiment analysis. It aims at extracting opinion words that are related to aspect terms. Existing TOWE approaches have primarily focused on either explicit or implicit target aspects, and few methods deal with both simultaneously. To address this limitation, this study proposes a dual-cell recurrent network (DCRN) that combines aspect term extraction (ATE) and target-oriented opinion word extraction. The DCRN model is trained and evaluated on global fields, including explicit and implicit target aspects. Empirical results demonstrate that the proposed DCRN model outperforms existing methods by an average of 4.90% on the SemEval14–16 datasets. Furthermore, the DCRN model achieves higher Macro-F1 values than the IOG model on the Restaurant 14–16 datasets by 8.97%, 7.90%, and 8.70%, respectively. These results indicate that the DCRN model significantly improves the performance of TOWE and exhibits robust generalization capabilities. Index Terms--target-oriented opinion words extraction, aspect-term extraction, aspect-based sentiment analysis, global fields
|
|
16:00-17:00, Paper Mo-S4T5.4 |
Semantic Interaction Fusion Framework for Multimodal Sentiment Recognition |
|
Yang, Shanliang | Shandong University of Technology |
Cui, Lichao | School of Computer Science and Technology, Shandong Univer |
Tao, Wang | Shandong University of Technology |
Keywords: Affective Computing, Cognitive Computing, Intelligence Interaction
Abstract: Multimodal sentiment recognition has gained considerable attention for its relevance to various applications. To improve performance, it is critical to extract semantic information and fuse multimodal features. However, most current methods either emphasize single-modal semantic extraction and representation or lack semantic integration at a deep level. In this paper, we propose a Semantic Interaction Fusion Framework (SIFF) that extracts the semantic information that evokes the specific sentiment from multiple modalities and integrates multimodal semantic information using a gate attention fusion module. The gate attention fusion module fuses multimodal semantic information adaptively, eliminating the influence of conflicting information and strengthening the interaction of emotional cues between multiple modalities. We perform experiments on two benchmark datasets, CMU-MOSI and CMU-MOSEI. Our proposed method achieves accuracies of 87.2% and 86.5% on these datasets, respectively, which are absolute improvements of 1.1% and 1.8% over the current state of the art.
|
|
16:00-17:00, Paper Mo-S4T5.5 |
Dynamic Facial Expression Recognition Based on Vision Transformer with Deformable Module |
|
Wang, Rui | University of Science and Technology of China |
Sun, Xiao | Hefei University of Technology |
Keywords: Affective Computing, Ethics of AI and Pervasive Systems
Abstract: Facial expressions convey a great deal of information during human emotional interaction. However, due to the potential for various types of in-the-wild interference, such as occlusions and varying head poses, dynamic facial expression recognition (DFER) remains an extremely complicated task. Previous methods focus on applying more robust models to extract the spatial-temporal features but ignore the impact of the key features of the regions of interest (ROIs). This inhibits further improvement of recognition accuracy. In this paper, we propose a 3D vision Transformer with a deformable module, termed 3D-DSwin Transformer, to guide our model to capture more discriminative features. The deformable module can gradually shift the deformable points to guide our model to pay more attention to the ROIs. A simple yet effective video augmentation method is proposed to expand the number of training samples and avoid overfitting. Visualizations and extensive experimental results demonstrate that our proposed 3D-DSwin Transformer is able to obtain the key feature maps and outperforms the previous state-of-the-art methods on both the FERV39k and DFEW benchmarks.
|
|
16:00-17:00, Paper Mo-S4T5.6 | Add to My Program |
Audio-Visual Emotion Recognition Based on Multi-Scale Channel Attention and Global Interactive Fusion |
|
Zhang, Peng | Qilu University of Technology, Shandong Computer Science Center |
Zhao, Hui | Qilu University of Technology |
Li, Meijuan | Qilu University of Technology, Shandong Computer Science Center |
Chen, Yida | Qilu University of Technology, Shandong Computer Science Center |
Zhang, Jianqiang | Qilu University of Technology, Shandong Computer Science Center |
Wang, Fuqiang | Qilu University of Technology, Shandong Computer Science Center |
Wu, Xiaoming | Qilu University of Technology, Shandong Computer Science Center |
Keywords: Affective Computing, Human-Machine Interaction, Human-Computer Interaction
Abstract: Facial expressions and speech are the most natural and common ways for humans to express their emotions. Automatic audio-visual emotion recognition has attracted a lot of attention in recent years. However, previous methods cannot effectively exploit the complementarity between modalities when performing feature fusion, and the resulting fused feature representations may contain redundant information from different modalities. This paper proposes an audio-visual emotion recognition model based on multi-scale channel attention (MCA) and global interactive fusion (GIF). The MCA module is designed to extract modal key emotional features at multiple contextual scales to express human emotions. Then, feature fusion is performed by using the GIF module, which is implemented in two steps. In the first step, features are fused through a global interactive attention layer, which considers the global interactive information of both intra and inter modalities, and reduces the redundancy of fused feature representations by fusing only the attention scores. In the second step, multiple convolutional neural networks with different kernel sizes are used to further learn the multi-scale emotional information in the fused features that is meaningful for both modalities. The proposed model is verified on two multimodal emotion datasets, the RAVDESS and the SAVEE, and achieves accuracies of 84.12% and 98.71%, respectively, with only 2M parameters.
|
|
16:00-17:00, Paper Mo-S4T5.7 | Add to My Program |
CMM: Code-Switching with Manifold Mixup for Cross-Lingual Spoken Language Understanding |
|
Mao, Tianjun | Fudan University |
Zhang, Chenghong | Fudan University |
Keywords: Human-Machine Interaction, Human Perception in Multimedia, Human-Computer Interaction
Abstract: Spoken language understanding (SLU) is a task that typically involves intent detection and slot filling. Although it has achieved great success in high-resource languages, it remains challenging in low-resource languages due to the lack of labeled training data. Consequently, there is growing interest in code-switching methods for zero-shot cross-lingual SLU to tackle this challenge. However, despite the success of existing code-switching models, most of them do not address the difficulty of learning from code-switched utterances. To tackle this issue, we propose a framework called Code-Switching with Manifold Mixup (CMM) for zero-shot cross-lingual spoken language understanding, which simplifies the learning task for the model. Specifically, we apply both mixup and curriculum learning to dynamically combine information from pure utterances and code-switched utterances. Our experimental results show that the proposed framework significantly improves performance compared to strong baselines and achieves state-of-the-art performance on the MultiATIS++ dataset, with a relative improvement of 3.0% in overall accuracy over the previous best model.
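The combination of mixup and curriculum learning mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the mixing rule or the schedule, so the code below uses the standard mixup interpolation of two hidden vectors and a hypothetical linear schedule; the names `mixup` and `curriculum_lambda` are illustrative assumptions, not the authors' implementation.

```python
def mixup(h_pure, h_switched, lam):
    """Standard mixup: interpolate two equal-length hidden vectors.

    lam = 1.0 returns the pure-utterance representation,
    lam = 0.0 returns the code-switched representation.
    """
    if len(h_pure) != len(h_switched):
        raise ValueError("hidden vectors must have the same length")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * a + (1.0 - lam) * b for a, b in zip(h_pure, h_switched)]


def curriculum_lambda(step, total_steps):
    """Hypothetical linear curriculum (an assumption, not from the paper):
    start from pure utterances (lam = 1) and move toward code-switched
    utterances (lam = 0) as training progresses."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return 1.0 - frac


# Usage: mix two toy hidden states halfway through training.
lam = curriculum_lambda(step=5, total_steps=10)
mixed = mixup([1.0, 2.0], [3.0, 4.0], lam)
```

A schedule of this kind presents easier (pure) inputs first and harder (code-switched) inputs later; the paper's actual schedule may differ.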
|
|
Mo-S4T6 Virtual Session, Room T6 |
Add to My Program |
Image Processing and Pattern Recognition I |
|
|
|
16:00-17:00, Paper Mo-S4T6.1 | Add to My Program |
Decoupled Adversarial Network and Self-Training with Weighted Pseudo-Labels for Domain Adaptive Semantic Segmentation |
|
Zheng, Yilin | China Jiliang University |
He, Lingmin | China Jiliang University |
Li, Jianbao | China Jiliang University |
Keywords: Image Processing and Pattern Recognition, Machine Vision, Deep Learning
Abstract: Unsupervised domain adaptation (UDA) in semantic segmentation reduces the dependence on pixel-level labeling. It uses labeled source-domain datasets and unlabeled target-domain images to train the segmentation network. This article proposes a domain-adaptive framework that combines a decoupled adversarial network with self-training. The framework addresses overfitting to the source domain during domain adaptation and the network's inability to focus on the segmentation task. To account for the long-tailed distribution of the data, a Rare Class Sampling (RCS) module is introduced. To make full use of pseudo-labels, we design a UDA scheme that uses self-training with weighted pseudo-labels. At the same time, the RCS module improves the quality of pseudo-labels by reducing the recognition bias of self-training toward common classes. LoveDA is a recent domain-adaptive dataset for land cover mapping. In its urban-to-rural and rural-to-urban scenarios, our proposed UDA method has significant advantages.
|
|
16:00-17:00, Paper Mo-S4T6.2 | Add to My Program |
Oriented CenterNet: Rotated Object Detection in Remote Sensing Images |
|
Cheng, Mengfan | Qilu University of Technology(Shandong Academy of Sciences) |
Li, Aimin | Qilu University of Technology |
Liu, Deqi | Qilu University of Technology(Shandong Academy of Sciences) |
Yao, Dexu | Qilu University of Technology(Shandong Academy of Sciences) |
Liu, Xiaohan | Qilu University of Technology(Shandong Academy of Sciences) |
Keywords: Image Processing and Pattern Recognition
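Abstract: Objects in remote sensing images are usually small and densely packed against complex backgrounds. Most importantly, they are non-axis-aligned and arbitrarily oriented. Therefore, the HBB (horizontal bounding box) is not suitable for representing objects in remote sensing images. We propose a detection framework named Oriented CenterNet, which can effectively detect arbitrarily oriented objects. We adopt a lightweight backbone called Swin-Tiny Transformer. Compared with CNN-based backbones, it can obtain a global receptive field and establish connections with other pixels. Then, we propose a new and simple six-parameter representation, named "corner offset representation", to represent rotated objects, which can easily convert an HBB into an RBB (rotated bounding box). For small objects in remote sensing images, the network adopts effective feature fusion and sampling methods and integrates center pooling into the prediction module to enhance the features of small objects and weaken background noise. In addition, by dividing the regression loss into horizontal and rotated …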
|
|
16:00-17:00, Paper Mo-S4T6.3 | Add to My Program |
A Practical YOLOV5 Face Detector with Decoupled Swin Head |
|
Yuan, Shuozhi | Beijing University of Posts and Telecommunications |
Guo, Wenming | Beijing University of Posts and Telecommunications |
Feng, Yang | Xinjiang Institute of Engineering |
Keywords: Image Processing and Pattern Recognition, Deep Learning, AI and Applications
Abstract: Face detection is a fundamental and practical problem in computer vision, which aims to precisely locate faces in unconstrained environments. However, unlike generic object detection tasks, a large number of face samples suffer from unconstrained poses, occlusion, extreme lighting, or other adverse conditions. YOLOV5 is a milestone in object detection, but it is still not powerful enough for these challenging samples. To ease these difficulties, we customize DSH-YOLOV5, a practical face detector. Specifically, we integrate a decoupled head with Swin Transformer layers, whose self-attention mechanism has great potential to explore subtle interconnection details. Additionally, we use two context modules (CBAM and SSH) to enhance the feature representation. Furthermore, we design a novel copy-paste data augmentation to fit the above challenging scenes. Extensive experiments demonstrate that we achieve SOTA performance on WIDER FACE, FDDB, and PASCAL FACE with competitive computational costs. Moreover, in response to COVID-19, we add breathing-mask and gender classification branches based on ROI Align to produce more practical face information.
|
|
16:00-17:00, Paper Mo-S4T6.4 | Add to My Program |
Run Away from the Original Example and towards Transferability |
|
Yang, Rongbo | Nanjing University of Science and Technology |
Li, Qianmu | Nanjing University of Science and Technology |
Meng, Shunmei | Nanjing University of Science and Technology |
Keywords: Image Processing and Pattern Recognition, Machine Vision, Neural Networks and their Applications
Abstract: Transfer-based attacks against black-box neural network models have received increasing attention because they are more realistic scenarios, but how to produce highly transferable adversarial examples on the surrogate model becomes critical. In this work, we find that if the attack direction of the original example is controlled from the beginning, the produced adversarial examples will be more transferable. Specifically, we propose the Output Direction Controller (ODC) to initialize the example direction so that the example starts off with a deviation from the true direction or toward the target direction. ODC is a simple and extensible component that can be combined with various transfer-based attack methods and significantly improve the transferability of the adversarial examples. On the ImageNet dataset, we optimize the baseline method by ODC to improve the success rate of untargeted attacks by an average of 11.79% and targeted attacks by an average of 3.38%. Code is available at https://github.com/yangrongbo/ODC.
|
|
16:00-17:00, Paper Mo-S4T6.5 | Add to My Program |
AFPN: Asymptotic Feature Pyramid Network for Object Detection |
|
Yang, Guoyu | Zhejiang University of Technology |
Lei, Jie | Zhejiang University of Technology |
Zhu, Zhikuan | Zhejiang University of Technology |
Cheng, Siyu | Zhejiang University of Technology |
Zunlei, Feng | Zhejiang University |
Liang, Ronghua | Zhejiang University of Technology |
Keywords: Image Processing and Pattern Recognition, Machine Vision, Neural Networks and their Applications
Abstract: Multi-scale features are of great importance in encoding objects with scale variance in object detection tasks. A common strategy for multi-scale feature extraction is adopting the classic top-down and bottom-up feature pyramid networks. However, these approaches suffer from the loss or degradation of feature information, impairing the fusion effect of non-adjacent levels. This paper proposes an asymptotic feature pyramid network (AFPN) to support direct interaction at non-adjacent levels. AFPN is initiated by fusing two adjacent low-level features and asymptotically incorporates higher-level features into the fusion process. In this way, the larger semantic gap between non-adjacent levels can be avoided. Given the potential for multi-object information conflicts to arise during feature fusion at each spatial location, adaptive spatial fusion operation is further utilized to mitigate these inconsistencies. We incorporate the proposed AFPN into both two-stage and one-stage object detection frameworks and evaluate with the MS-COCO 2017 validation and test datasets. Experimental evaluation shows that our method achieves more competitive results than other state-of-the-art feature pyramid networks. The code is available at https://github.com/gyyang23/AFPN.
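The adaptive spatial fusion operation mentioned in this abstract can be sketched as a per-location weighted sum over pyramid levels. This is a minimal illustration under stated assumptions: the features are already resized to a common resolution and flattened to one value per location, and the weight logits are passed in directly, whereas in a real network they would be learned. The function names are hypothetical and this is not the AFPN code.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_spatial_fusion(features, logits):
    """Fuse several feature levels location by location.

    features[l][i]: feature value of level l at spatial location i
    logits[l][i]:   unnormalized weight of level l at location i
                    (assumed given here; learned in practice)
    Returns one fused value per location, where the weights of all
    levels at a location sum to 1.
    """
    n_levels = len(features)
    n_locations = len(features[0])
    fused = []
    for i in range(n_locations):
        weights = softmax([logits[l][i] for l in range(n_levels)])
        fused.append(sum(weights[l] * features[l][i] for l in range(n_levels)))
    return fused


# Usage: equal logits reduce the fusion to a plain average of the levels.
fused = adaptive_spatial_fusion([[1.0, 2.0], [3.0, 6.0]], [[0.0, 0.0], [0.0, 0.0]])
```

Because the weights vary by location, a level that carries conflicting information at one position can be down-weighted there without being suppressed everywhere.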
|
|
16:00-17:00, Paper Mo-S4T6.6 | Add to My Program |
A New Method for Single Image Rain Removal with Directional Gradient Constraints |
|
Chen, Yarui | Beijing University of Technology |
Wang, Liang | Beijing University of Technology |
Keywords: Image Processing and Pattern Recognition
Abstract: Due to the presence of rain, the visibility of images captured outdoors on rainy days will be severely degraded. Rain removal using image processing technology can reduce the influence of rain to estimate rain-free images. However, existing traditional rain removal methods are very time-consuming, and newly emerging deep learning-based methods require a large amount of data and computational resources, resulting in long time consumption and poor visual effect. To solve the problems of existing methods, a new method is proposed to remove oblique rain streaks in windy conditions better. Firstly, the directional gradient constraint is proposed to locate oblique rain streaks in the rain layer effectively. Then, a sparse prior for oblique rain streaks is presented to enhance oblique rain streaks removal. After that, an optimization problem combining the directional gradient, sparse priors, and non-negativity constraints is presented. Finally, the alternating direction method of multipliers is exploited to solve the optimization problem effectively. Experiment results show that our method outperforms other methods in removing oblique rain streaks and requires less time.
|
|
16:00-17:00, Paper Mo-S4T6.7 | Add to My Program |
Temporal Aggregation with Context Focusing for Few-Shot Video Object Detection |
|
Han, Wentao | Zhejiang University of Technology |
Lei, Jie | Zhejiang University of Technology |
Wang, Fahong | Zhejiang University of Technology |
Zunlei, Feng | Zhejiang University |
Liang, Ronghua | Zhejiang University of Technology |
Keywords: Image Processing and Pattern Recognition, Representation Learning, Neural Networks and their Applications
Abstract: Few-shot video object detection focuses on finding all the objects in a given query video that belong to the same class, given only a few support images of the target object in an unseen class. Unfortunately, due to the object blur or occlusion in video frames, using single-frame object detection directly will greatly limit the accuracy. The issue is significantly worse in few-shot settings due to insufficient support and time-domain information. In this paper, we propose a temporal aggregation with context focusing framework (TACF) for few-shot video object detection, which aims to fully use the information between support images and adjacent video frames. The context focusing module effectively encodes the target object in adjacent frames according to the support images. Afterward, the temporal aggregation module implicitly extracts the most similar ROI features from these adjacent frames to obtain the target proposals. In the end, the matching network determines the category and bounding box by calculating the distance with the support images. Extensive experimental evaluations on FSVOD and FSYTV databases show that our method achieves more competitive results than image-based methods, naive video-based extensions, and the state-of-the-art few-shot video object detection method.
|
|
Mo-S4T7 Virtual Session, Room T7 |
Add to My Program |
Resource Management and Optimization |
|
|
|
16:00-17:00, Paper Mo-S4T7.1 | Add to My Program |
Understanding Stakeholder Game Relationships and Behaviors to Facilitate Recycled Resource Management: A Systematic Framework |
|
Zhou, Jia-He | Northwestern Polytechnical University; Renmin University of Chin |
Zhu, Yuming | Northwestern Polytechnical University |
Zhou, Qing-Qing | Northwestern Polytechnical University |
Keywords: Conflict Resolution, Consumer and Industrial Applications, System Modeling and Control
Abstract: Understanding the game relationships and behaviors of stakeholders is the key to effectively promoting recycled resource management. The insights from past research are relatively limited and lack a holistic and comprehensive understanding to guide recycled resource management practice in China. To this end, this study systematically reviews and summarizes the stakeholder game relationships and behaviors in recycled resource management, based on which a systematic framework for analyzing stakeholder game relationships and behaviors in recycled resource management is proposed. To further explain this framework, cross-sectional and longitudinal analyses of the game relationships among and within stakeholder groups are conducted. The proposed systematic framework is a unification and deepening of existing research perspectives on stakeholder games in recycled resource management, and has more practical guidance for promoting recycled resource management in China.
|
|
16:00-17:00, Paper Mo-S4T7.2 | Add to My Program |
Truncated Quantile Critics Algorithm for Cryptocurrency Portfolio Optimization |
|
Xiao, Leibing | Northwestern Polytechnical University |
Wei, Xinchao | Northwestern Polytechnical University |
Xu, Yuelei | Northwestern Polytechnical University |
Xu, Xin | Northwestern Polytechnical University |
Kun, Gong | Northwestern Polytechnical University |
Li, Huafneg | Northwestern Polytechnical University |
Zhang, Fan | Northwestern Polytechnical University |
Keywords: Decision Support Systems
Abstract: This paper investigates a portfolio management algorithm for the cryptocurrency market based on the TQC (Truncated Quantile Critics) algorithm. The study is based on the daily prices of cryptocurrencies. TQC is a deep reinforcement learning algorithm with an actor-critic architecture that alleviates the overestimation problem of traditional value-learning algorithms. The cryptocurrency data are first processed as input to the networks. The inputs include not only the closing prices of cryptocurrencies but also the relative strength index, the moving average, and the moving average convergence divergence. Various metrics measuring algorithm returns and stability are used as evaluation criteria, and common deep reinforcement learning algorithms are compared. The experimental results show that the TQC algorithm achieves the highest return, 33.9%, during the test period, which is 3%, 3%, and 15.6% higher than A2C, PPO, and DDPG, respectively. The TQC algorithm also has the highest stability of return, an important evaluation metric for portfolio management algorithms. Despite the high volatility of the cryptocurrency market, the performance of the TQC algorithm remained relatively stable. This illustrates the benefits of the TQC algorithm.
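The technical indicators named in this abstract (relative strength index, moving average, and moving average convergence divergence) can be computed from closing prices as in the sketch below. The abstract does not state the window lengths or the RSI variant, so the defaults (14, 12, 26) and the simple-average RSI are assumptions chosen as common conventions, not the authors' settings.

```python
def sma(prices, window):
    """Simple moving average of the last `window` prices."""
    return sum(prices[-window:]) / window


def ema(prices, span):
    """Exponential moving average with smoothing factor 2 / (span + 1),
    seeded with the first price."""
    alpha = 2.0 / (span + 1)
    value = prices[0]
    for p in prices[1:]:
        value = alpha * p + (1.0 - alpha) * value
    return value


def macd(prices, fast=12, slow=26):
    """MACD line: fast EMA minus slow EMA (spans are assumed defaults)."""
    return ema(prices, fast) - ema(prices, slow)


def rsi(prices, period=14):
    """Relative strength index over the last `period` price changes,
    using plain sums of gains and losses (not Wilder smoothing)."""
    changes = [b - a for a, b in zip(prices[:-1], prices[1:])][-period:]
    gains = sum(c for c in changes if c > 0)
    losses = sum(-c for c in changes if c < 0)
    if gains == 0 and losses == 0:
        return 50.0  # flat series: neither overbought nor oversold
    if losses == 0:
        return 100.0
    rs = gains / losses
    return 100.0 - 100.0 / (1.0 + rs)


# Usage: build one feature vector from a toy closing-price series.
closes = [10.0, 10.5, 10.2, 10.8, 11.0, 10.9, 11.3]
features = [closes[-1], rsi(closes, period=5), sma(closes, 3), macd(closes, 3, 5)]
```

Feeding such indicators alongside raw closing prices gives the networks a compact summary of momentum and trend.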
|
|
16:00-17:00, Paper Mo-S4T7.3 | Add to My Program |
Hybrid Micro-Energy Harvesting System Based on Combined MPPT Method |
|
Cao, Di | Zhejiang University of Technology |
Zhou, Junfeng | Zhejiang University of Technology |
Lei, Yanjing | Zhejiang University of Technology |
Lee, William | Zhejiang University of Technology |
Keywords: Smart Sensor Networks, Adaptive Systems, Control of Uncertain Systems
Abstract: Energy harvesting is one of the promising techniques for providing sustainable energy to self-powered sensor nodes by converting environmental energy into electricity. Energy harvested from a single micro-energy source suffers from low power density and is vulnerable to environmental changes, while a hybrid energy harvesting system can supply energy sustainably. However, accumulating power from various energy sources leads to the challenge of low energy conversion efficiency. In this paper, a Combined Maximum Power Point Tracking (Com-MPPT) method for hybrid energy harvesting is designed to improve the overall harvesting efficiency. By analyzing output characteristic curves under different environmental states, a mathematical model of the hybrid energy harvesting system is proposed. It is found that, as environmental factors change, multiple power peaks exist at varying load voltages, so the traditional MPPT method for a single energy source is no longer applicable. Therefore, the proposed Com-MPPT algorithm, based on search skip and linear extrapolation, can quickly find the global maximum power peak when multiple power peaks occur in hybrid energy harvesting. Simulation and experimental results show that the proposed algorithm can track the global maximum power point rapidly under different environmental conditions, and the tracking time is more than 50% shorter than that of the improved particle swarm optimization and flower pollination algorithms.
|
|
16:00-17:00, Paper Mo-S4T7.4 | Add to My Program |
SRockDB: A Range-Query Optimized Database Based on RocksDB |
|
Gai, Shun | National University of Defense Technology |
Xie, Xinjia | National University of Defense Technology |
Keywords: System Architecture, System Modeling and Control
Abstract: Data stores based on the Log-Structured Merge Tree (LSM-Tree) are widely used in data centres, Artificial Learning and Machine Learning. As the core data structure, the LSM-Tree is efficient for write operations due to its out-of-place updates and leveled design. However, it suffers from write, read, and space amplification, which degrade performance, especially for range queries. Range query is one of the most important operations in data processing. To address this problem, we propose SRockDB, which is specifically designed to improve range-query performance with an in-memory cache. Compared to the block cache in RocksDB, SRockDB is more effective and coalesces adjacent keys to cache more data in limited memory. In experiments, SRockDB achieves 2.76× Queries Per Second (QPS) and up to 41.74% lower tail latency, without degrading write or point-read performance.
|
|
16:00-17:00, Paper Mo-S4T7.5 | Add to My Program |
A Low-Cost and Pages-Interrelation-Aware Attention Model for Hybrid Memory Scheduling |
|
Zhen, Yanjie | Tsinghua University |
Chen, Yu | Tsinghua University |
Keywords: System Architecture
Abstract: Hybrid memory architecture has become an important solution to address the increasing demand for the main memory capacity of big data applications. Due to the varying properties of different components in hybrid memory, accurately predicting the hotness of pages and timely scheduling hot pages to fast memory becomes crucial for optimal performance. However, existing hybrid memory schedulers using non-intelligent policy exhibit low performance. Although schedulers employing neural models can improve performance, they suffer limitations such as long inference time and loss of interrelation between pages. This paper presents PI-Attention, a low-cost and pages-interrelation-aware attention model for hybrid memory scheduling. It addresses the limitations above by utilizing two attention modules in the page and time sequence dimensions. Our experiments show that PI-Attention brings 11.14% performance improvement and a 3.75x reduction in inference time.
|
|
16:00-17:00, Paper Mo-S4T7.6 | Add to My Program |
Optimizing Investment Strategies with Lazy Factor and Probability Weighting: A Price Portfolio Forecasting and Mean-Variance Model with Transaction Costs Approach |
|
Han, Shuo | Southwest University |
Chen, Yinan | Southwest University |
Liu, Jiacheng | Southwest University |
Keywords: AI and Applications, Big Data Computing, Machine Learning
Abstract: Market traders often engage in the frequent transaction of volatile assets to optimize their total return. In this study, we introduce a novel investment strategy model, anchored on the 'lazy factor.' Our approach bifurcates into a Price Portfolio Forecasting Model and a Mean-Variance Model with Transaction Costs, utilizing probability weights as the coefficients of laziness factors. The Price Portfolio Forecasting Model, leveraging the EXPMA Mean Method, plots the long-term price trend line and forecasts future price movements, incorporating the tangent slope and rate of change. For short-term investments, we apply the ARIMA Model to predict ensuing prices. The Mean-Variance Model with Transaction Costs employs the Monte Carlo Method to formulate the feasible region. To strike an optimal balance between risk and return, equal probability weights are incorporated as coefficients of the laziness factor. To assess the efficacy of this combined strategy, we executed extensive experiments on a specified dataset. Our findings underscore the model's adaptability and generalizability, indicating its potential to transform investment strategies.
|
|
16:00-17:00, Paper Mo-S4T7.7 | Add to My Program |
Building Extensible Model for Healthcare Resource Allocation Using Model-Driven Approach |
|
Parveen, Rizwan | Birla Institute of Technology and Science Pilani |
Laurel R, Ashin | Birla Institute of Technology and Science Pilani |
Thalanki, Mihir | Birla Institute of Technology and Science Pilani |
Goveas, Neena | Birla Institute of Technology and Science Pilani |
Gawali, Shubhangi | Birla Institute of Technology and Science Pilani |
Keywords: Cyber-physical systems, System Modeling and Control, Service Systems and Organizations
Abstract: In the healthcare industry, proper allocation of critical resources is essential to prevent dire consequences. The availability of resources is limited and needs to be handled in an organized and orderly manner. Model-driven development is a methodology using which complex scenarios can be built with the abstraction of pre-built components. In this paper, a model for automated resource allocation is constructed using various actors of the Ptolemy-II tool. This model is built in a modular fashion such that most of the key functional actors are custom-built user libraries that can be utilized in future scaling and design. The proposed model is capable of allocating resources and scheduling patient treatment in a range of scenarios with the flexibility to change certain parameters in every case. This paper illustrates how the proposed model can evaluate, visualize, and identify scaling issues in a hospital by running a few different scenarios with varying number of caregivers. These scenarios also show the variability of metrics like average waiting patients, the efficiency of the hospital and average doctor load with respect to the change in the number of caregivers.
|
|
Mo-S4T8 Virtual Session, Room T8 |
Add to My Program |
Deep Learning V-III |
|
|
|
16:00-17:00, Paper Mo-S4T8.1 | Add to My Program |
Cascade Cost Volume Multi-View Stereo Network with Transformer and Pseudo 3D |
|
Qu, Jiacheng | Qilu University of Technology |
Zhao, Shengrong | Qilu University of Technology |
Liang, Hu | Qilu University of Technology |
Zhang, Qingmeng | Department of Orthopaedics Qilu Hospital of Shandong University |
Li, Tingshuai | Qilu University of Technology(Shandong Academy of Sciences) |
Liu, Bing | Qilu University of Technology(Shandong Academy of Sciences) |
Keywords: Deep Learning, Machine Learning, Machine Vision
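Abstract: Learning-based multi-view stereo (MVS) and stereo matching methods are usually based on the camera frustum of the reference view. The cost volume is regularized and regressed to obtain the depth map. However, the resolution of the output depth map is limited by computational cost, and during feature extraction the local perception of convolution makes it impossible to capture global context information. In this paper, we propose CTPMVSNet, which uses a global feature-aware transformer (GFT) to aggregate global context information within and between images. To make better use of the GFT, we use a deformable convolution module (DCM) to ensure a smooth transition in the range of the extracted features. In addition, in the cost volume regularization stage, to improve efficiency and generation accuracy, we design a lightweight regularization network that integrates pseudo-3D convolution. Our experiments achieve promising results on multiple datasets.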
|
|
16:00-17:00, Paper Mo-S4T8.2 | Add to My Program |
Public Crisis Events Tweet Classification Based on Multimodal Cycle-GAN |
|
Zhou, Jinyan | Qilu University of Technology |
Wang, Xingang | Qilu University of Technology(Shandong Academy of Sciences) |
Lv, Jian dong | Shandong University of Science and Technology |
Liu, Ning | China University of Mining and Technology |
Zhang, Hong | Qilu University of Technology |
Cao, Rui | Qilu University of Technology |
Liu, Xiaoyu | Qilu University of Technology(Shandong Academy of Sciences) |
Li, Xiaomin | Qilu University of Technology (Shandong Academy of Sciences) |
Keywords: Deep Learning, Machine Learning, Machine Vision
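Abstract: Public crisis events are unexpected catastrophic events that endanger the lives of the general public. Most public crisis events are unpredictable and sudden, including accidents, natural disasters, social unrest, and public health emergencies. When a public crisis event occurs, users usually post many tweets on social media platforms such as Facebook and Twitter to share the real-time situation and seek help. These tweets, if effectively selected and utilized, can help humanitarian organizations assess the situation and plan relief operations. Previous studies mainly used text data for tweet classification but ignored the complementarity between multimodal data. Although some works combine multimodal data from tweets for tweet classification, their fusion methods are not comprehensive enough and ignore the heterogeneity gap between multimodal data. Therefore, the MMC-GAN (Multimodal Cycle-GAN with hybrid fusion strategy) model is introduced for public crisis events. The MMC-GAN model consists of …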
|
|
16:00-17:00, Paper Mo-S4T8.3 | Add to My Program |
Real-Time Defect Detection Network Based on Hybrid Attention Mechanism for Small-Size Printed Circuit Boards |
|
Wang, Congcong | Qilu University of Technology |
Wei, Xiumei | Qilu University of Technology |
Wu, Xiaoming | Qilu University of Technology |
Jiang, Xuesong | Qilu University of Technology |
Keywords: Deep Learning, Machine Vision, Machine Learning
Abstract: Defect detection for printed circuit boards (PCBs) is challenging due to complex image backgrounds, the variety of defect types, and the small size of defects. This paper develops and evaluates the SC-YOLOv5 network for accurately detecting PCB defects. First, we combine the spatial attention mechanism of SA with the channel attention module of the Efficient Channel Attention (ECA) mechanism to construct a hybrid attention module (SCA). SCA has a stronger ability to express defect features and does not need dimensionality reduction. Second, we analyze the feature pyramid structure of YOLOv5 and construct a multi-direction dilated convolution module (MD) for the last feature layer. MD has a rich receptive field, so it can retain more defect information during feature pyramid downsampling. We perform experimental evaluations on the PCB Dataset and DeepPCB datasets. Experiments show that SC-YOLOv5 improves detection performance on both datasets, and the detection speed reaches about 120 FPS. Compared with mainstream defect detection algorithms, SC-YOLOv5 significantly improves accuracy.
|
|
16:00-17:00, Paper Mo-S4T8.4 | Add to My Program |
HEROCA: Multimodal Sentiment Analysis Based on HEterogeneous Representation Optimization and Cross-Modal Attention |
|
Zhao, Tong | Shanghai University |
Peng, Junjie | Shanghai University |
Wang, Lan | Shanghai University |
Zheng, Cangzhi | Shanghai University |
Keywords: Deep Learning, Machine Learning, Neural Networks and their Applications
Abstract: Multimodal sentiment analysis aims to determine human sentiment by extracting and integrating valuable information from multiple modalities. It has a wide range of application scenarios and can provide reliable references for opinion analysis, disease analysis, financial prediction, and other fields. Many researchers in this field have proposed effective strategies from different perspectives. However, existing studies often fuse textual and non-textual modalities at the same layer while neglecting the difference in information quality between them. The large gap in representation quality between modalities hinders the model from thoroughly learning the interactions between modalities and degrades its performance. For these reasons, we propose a multimodal sentiment analysis model based on heterogeneous representation optimization and cross-modal attention. The model designs a bimodal gated filtering mechanism that enhances the quality of the semantic representations of non-text modalities by extracting and optimizing their information. In addition, to fully exploit the useful inter-modal information and improve the quality of modality fusion, the model uses a cross-modal attention mechanism to model the complex dependencies between modalities and introduces a holistic view for semantic complementation in the prediction stage. To verify the effectiveness of our proposed method, we conduct experiments on two public datasets, CMU-MOSI and CMU-MOSEI. The experimental results show that our method can efficiently decrease the semantic gap between text and non-text modalities, significantly outperforming existing methods with superior generalization ability and strong competitiveness.
|
|
16:00-17:00, Paper Mo-S4T8.5 | Add to My Program |
FPA-WAN: Feature Pyramid Attention Based Watermarking Attack Network |
|
Wang, Chunpeng | Qilu University of Technology(Shandong Academy of Sciences) |
Li, Xinying | Qilu University of Technology(Shandong Academy of Sciences) |
Xia, Zhiqiu | Qilu University of Technology(Shandong Academy of Sciences) |
Li, Qi | Qilu University of Technology(Shandong Academy of Sciences) |
Wei, Ziqi | Institute of Automation, Chinese Academy of Sciences |
Ma, Bin | Qilu University of Technology(Shandong Academy of Sciences) |
Keywords: Deep Learning, Neural Networks and their Applications, Image Processing and Pattern Recognition
Abstract: Digital watermarking technology is a method of embedding specific information in digital images and videos, often used for copyright protection and identity verification. However, unscrupulous users may also use this technique to falsify and tamper with data. Therefore, a reliable watermark attack network is needed to detect and remove watermarks embedded by unscrupulous users. In this paper, we propose a watermarking attack network based on feature pyramid attention, which can effectively remove watermarks embedded in digital images and greatly guarantee the image quality of carrier images. The network consists of two main modules: the feature extraction module and the watermark attack module. In the feature extraction module, we use a convolutional neural network and a residual block to extract the features of the image. Then, in the watermarking attack module, we use the pyramid attention mechanism to focus on the important regions in the feature map and apply the attention weights to the watermarking attack operation. To validate the effectiveness of this network, we conducted experiments using a variety of standard data sets. Experimental results show that the network can effectively attack the watermark information embedded in digital images while guaranteeing the quality of the images after the attack. Overall, the watermarking attack network based on feature pyramid attention proposed in this paper is an effective attack with high imperceptibility that can be applied in the field of digital media protection and security in practical scenarios.
|
|
16:00-17:00, Paper Mo-S4T8.6 |
Fast Crop Pest Detection Using Lightweight Feature Extraction and Knowledge Distillation |
|
Yang, Ze | Ningbo University |
Xianliang, Jiang | Ningbo University |
Jin, Guang | Ningbo University |
Huang, JunKai | Ningbo University |
Bai, Jie | Ningbo University |
Yu, Dingxin | Ningbo University |
Keywords: Deep Learning, Machine Vision, AI and Applications
Abstract: Pest detection is critical for achieving effective pest control. However, the current deep learning-based pest detection algorithm is unsuitable for deployment on resource-limited edge devices due to its extensive computation and long inference time. Although lightweight models have been widely used for practical detection, their insufficient feature extraction capability leads to a decline in detection accuracy. This paper proposes a fast algorithm for crop pest detection based on lightweight feature extraction and knowledge distillation. Firstly, we introduce partial convolution and propose a lightweight feature extraction module, C3Faster, which reduces the model’s computation and speeds up model inference while ensuring effective feature extraction. Secondly, we use knowledge distillation to improve the model’s detection accuracy by using teacher networks to assist in training. Finally, we created a dataset, CropPest6, consisting of six crop pest categories and conducted experiments. The experimental results demonstrate that our method reduces the detection time, number of parameters, and computation by 17%, 38%, and 44%, respectively, compared to the baseline model. Furthermore, our method achieves 93.9% Precision, 93.6% Recall, and 97.5% mean Average Precision (mAP), demonstrating its practical suitability for fast crop pest detection.
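The teacher-assisted training mentioned in the abstract can be illustrated with a generic soft-target distillation loss. This is a minimal sketch under stated assumptions, not the authors' method: the abstract does not specify the loss, so the temperature-scaled KL divergence below is the classic classification formulation, and the function names and the default temperature are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: generic soft-target distillation; a detection model would
    typically combine a term like this with its usual detection losses.
    """
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# A student that matches the teacher incurs (near) zero loss;
# a student that disagrees incurs a positive loss.
loss_same = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
loss_diff = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

A higher temperature flattens both distributions, so the student also learns from the teacher's relative scores for non-target classes.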
|
|
16:00-17:00, Paper Mo-S4T8.7 |
Enhancing Code Search with Token-Level Information Flow Graphs Generated from Aligned Structural and Textual Features |
|
Liu, Sirui | Wuhan University |
Guo, Weirong | Wuhan University |
Yu, Yaoxiang | Wuhan University |
Cai, Bo | Wuhan University |
Keywords: Deep Learning, Representation Learning, AI and Applications
Abstract: Large-scale code search is a crucial task in software engineering, yet existing deep-learning-based models often embed Abstract Syntax Trees (ASTs) and code sequences separately, limiting their ability to learn the correlation between structural and textual features. To address this limitation, we propose a novel code search model that automatically generates Token-Level Information Flow Graphs (TL-IFGs) from aligned AST nodes and source code tokens. Our model includes an aligner that establishes a one-to-one correspondence between AST leaves and code tokens, which we make publicly available, along with a processed dataset to facilitate further research. The model automatically generates a TL-IFG for each code snippet from the aligned data by predicting the information flow at the token level, which ensures that structural and textual features are highly correlated during the embedding process. We also generate TL-IFGs for descriptions and embed them using a similar process. Experimental results demonstrate that our model outperforms state-of-the-art code search models, indicating the effectiveness of our approach. Furthermore, an ablation study shows that the generated TL-IFGs for both code and description positively impact model performance.
|
|
Mo-S4T9 Virtual Session, Room T9 |
New Session for Latest Online Requests II |
|
|
|
16:00-17:00, Paper Mo-S4T9.1 |
Energy-Optimized with Multi-Population Differential Annealed Optimization in Mobile Edge Computing |
|
Wu, Ruixuan | Beijing University of Technology |
Shi, Yuliang | Beijing University of Technology |
Yuan, Haitao | Beihang University |
Bi, Jing | Beijing University of Technology |
Zhang, Jia | Southern Methodist University |
Keywords: Evolutionary Computation, Metaheuristic Algorithms, Intelligent Internet Systems
Abstract: Mobile devices (MDs) cannot fully run all computation/delay-sensitive tasks due to their limited computing resources. Mobile edge computing (MEC) meets the demand by providing massive resources for MDs and offloading task partitions to MEC servers. However, task offloading also brings communication delay and energy consumption. Therefore, it is challenging to associate resource-constrained MDs with appropriate MEC servers to minimize power consumption. To address this problem, a constrained mixed integer nonlinear program is formulated to optimize the total energy consumption of the system including MDs and MEC servers. To solve this problem, this work designs an improved meta-heuristic optimization algorithm called Self-adaptive and Multi-population Differential Annealed Optimization (SMDAO). SMDAO jointly optimizes the transmission power of MDs used to upload task partitions to MEC servers, that of MDs used to upload the execution results of local task partitions to MEC servers, the offloading ratio of tasks, CPU running frequencies of MDs used to execute their task partitions locally, computation speeds allocated by MEC servers for each task partition, and association relations among MDs and MEC servers. Experimental results demonstrate that compared with its two state-of-the-art peers, the proposed SMDAO yields the best solution with the smallest total energy consumption in the least time.
|
|
16:00-17:00, Paper Mo-S4T9.2 |
Focusing on Needs: A Chatbot-Based Emotion Regulation Tool for Adolescents |
|
Ni, Yeming | Tsinghua University |
Ding, Ruyi | Sun Yat-Sen University |
Chen, Yuqing | Tsinghua University |
Hou, Hanchao | Tsinghua Shenzhen International Graduate School |
Ni, Shiguang | Tsinghua University |
Keywords: Affective Computing, Kansei (sense/emotion) Engineering, Human-Machine Interface
Abstract: Adolescents face considerable psychological stress in the current social environment, and effective emotion regulation is crucial to their mental health. This article introduces a paradigm of product-oriented psychological dialogue: first study the psychological problem, determine user needs and the most effective course of action, and then develop tools on that basis. We use this paradigm to build an artificial intelligence-based conversational bot for adolescent emotion regulation. Specifically, to explore adolescents' emotion regulation needs, this study collected data (n=317, 5,543 questionnaires) through the intensive tracking method, which revealed the mechanism linking user needs and emotion regulation as well as a weighting mechanism for emotion regulation strategies. Using the collected raw data and an existing emotional support dialogue dataset (ESConv), a Chinese adolescent emotion regulation dialogue dataset was constructed. This paper then fine-tunes an existing dialogue model (GPT-2 chitchat). Through these improvements, the dialogue model's performance improved substantially, and it can provide more personalized and effective emotion regulation support according to the actual needs of adolescents. In summary, this study provides new ideas and methods for mental health support and promotes the research and development of emotion regulation support for adolescents.
|
|
16:00-17:00, Paper Mo-S4T9.3 |
Bioelectronic Zeitgebers: Targeted Neuromodulation to Re-Establish Circadian Rhythms |
|
Deli, Alceste | University of Oxford |
Zamora, Mayela | University of Oxford |
Fleming, John E. | University of Oxford |
Divanbeighi Zand, Amir | Department of Surgical Science, University of Oxford |
Benjaber, Moaad | MRC Brain Network Dynamics Unit, University of Oxford |
Green, Alexander Laurence | University of Oxford |
Denison, Timothy | University of Oxford |
Keywords: Brain-Computer Interfaces, Human-Machine Interface, Medical Informatics
Abstract: Existing neurostimulation systems implanted for the treatment of neurodegenerative disorders generally deliver invariable therapy parameters, regardless of phase of the sleep/wake cycle. However, there is considerable evidence that brain activity in these conditions varies according to this cycle, with discrete patterns of dysfunction linked to loss of circadian rhythmicity, worse clinical outcomes and impaired patient quality of life. We present a targeted concept of circadian neuromodulation using a novel device platform. This system utilises stimulation of circuits important in sleep and wake regulation, delivering bioelectronic cues (Zeitgebers) aimed at entraining rhythms to more physiological patterns in a personalised and fully configurable manner. Preliminary evidence from its first use in a clinical trial setting, with brainstem arousal circuits as a surgical target, further supports its promising impact on sleep/wake pathology. Data included in this paper highlight its versatility and effectiveness on two different patient phenotypes. In addition to exploring acute and long-term electrophysiological and behavioural effects, we also discuss current caveats and future feature improvements of our proposed system, as well as its potential applicability in modifying disease progression in future therapies.
|
|
16:00-17:00, Paper Mo-S4T9.4 |
Real-Time Disease and COVID-19 Detection Pipeline from Voice for Performance Sports (I) |
|
Biró, Attila | George Emil Palade University of Medicine, Pharmacy, Science, An |
Cuesta-Vargas, Antonio Ignacio | University of Malaga |
Szilágyi, Sándor Miklós | UMFST Tirgu Mures |
Keywords: Biometric Systems and Bioinformatics, Machine Learning, Application of Artificial Intelligence
Abstract: Voice-based disease detection with Artificial Intelligence has the potential to revolutionize healthcare, offering cost-effective, non-invasive, and accessible diagnostic methods for a wide range of diseases. The development of voice-based disease detection systems requires collaboration between multiple fields, including data science, linguistics, machine learning, and medical research. This interdisciplinary approach has led to the creation of innovative solutions that advance both healthcare and technology. By incorporating individual patient data, AI-driven voice diagnostics can provide personalized insights into a person's health, enabling tailored treatment plans that better address individual needs. The goal of the study was to find a feasible, machine learning-supported pipeline, by combining the feature extraction methods to support professional sports staff to predict disease from voice samples and prevent cross-contamination in joint events.
|
|
16:00-17:00, Paper Mo-S4T9.5 |
MorpheusNet: Resource Efficient Sleep Stage Classifier for Embedded On-Line Systems |
|
Kavoosi, Ali | University of Oxford |
Denison, Timothy | University of Oxford |
Fleming, John E. | University of Oxford |
Mitchell, Morgan P. | University of Oxford |
Cagnan, Hayriye | University of Oxford |
Johansen-Berg, Heidi | University of Oxford |
Kariyawasam, Raveen | University of Oxford |
Lewis, Penny | University of Cardiff |
Keywords: Wearable Computing, Human-Machine Interface, Brain-based Information Communications
Abstract: Sleep Stage Classification (SSC) is a labor-intensive task, requiring experts to examine hours of electrophysiological recordings for manual classification. This is a limiting factor when it comes to leveraging sleep stages for therapeutic purposes. With increasing affordability and expansion of wearable devices, automating SSC may enable deployment of sleep-based therapies at scale. Deep Learning has gained increasing attention as a potential method to automate this process. Previous research has shown accuracy comparable to manual expert scores. However, previous approaches require a sizable amount of memory and computational resources. This constrains the ability to classify in real time and deploy models on the edge. To address this gap, we aim to provide a model capable of predicting sleep stages in real time, without requiring access to external computational sources (e.g., mobile phone, cloud). The algorithm is power efficient to enable use on embedded battery-powered systems. Our compact sleep stage classifier can be deployed on most off-the-shelf microcontrollers (MCU) with constrained hardware settings. This is due to the memory footprint of our approach requiring significantly fewer operations. The model was tested on three publicly available databases and achieved performance comparable to the state of the art, whilst reducing model complexity by orders of magnitude (up to 280 times smaller compared to state of the art). We further optimized the model with quantization of parameters to 8 bits with only an average drop of 0.95% in accuracy. When implemented in firmware, the quantized model achieves a latency of 1.6 seconds on an Arm Cortex-M4 processor, allowing its use for on-line SSC-based therapies.
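The 8-bit parameter quantization mentioned in the abstract can be illustrated with a small sketch. The abstract does not state which quantization scheme was used, so the symmetric per-tensor scheme, the function names, and the sample weights below are all assumptions chosen for illustration.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127].

    Assumption: one scale for the whole tensor; real toolchains may use
    per-channel scales or zero points instead.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map the stored integers back to approximate float weights."""
    return [v * scale for v in q]

# Hypothetical weights; each restored value differs from the original
# by at most half a quantization step (scale / 2).
weights = [0.42, -0.17, 0.05, -0.33, 0.0, 0.21]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Storing one byte per parameter instead of a 32-bit float cuts parameter memory by roughly a factor of four, which matters on microcontroller-class hardware.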
|
|
16:00-17:00, Paper Mo-S4T9.6 |
Target Tracking Control of Space-Manipulators on Lie Groups |
|
Monazzah Moghaddam, Borna | Carleton University |
Chhabra, Robin | Carleton University |
Keywords: System Modeling and Control, Robotic Systems
Abstract: We present a full-pose end-effector control approach on Lie groups for free-floating space manipulators with non-zero momentum while tracking a moving target during proximity operations. We model space-manipulators as open-chain multi-body systems with 1-degree-of-freedom joints where the configuration space of the spacecraft is isomorphic to the Special Euclidean group SE(3). We formulate the dynamics of the spacecraft-manipulator via the Lagrange-Poincaré equations and the dynamics of its target via Euler-Poincaré equations to avoid kinematic singularities associated with parametrization of their poses. This model reduces the phase space of the space-manipulator by exploiting its inherent independence of the spacecraft’s pose. We consider the full pose of the end-effector relative to the target as the system output, which transforms the output-tracking control problem into an output-regulation problem. To avoid parametrization singularities of this output, we perform feedback linearization on the matrix Lie group SE(3) in the reduced phase space of the space-manipulator. We then propose a feedback/feedforward proportional-integral-derivative workspace controller, based on coordinate-free pose and velocity error functions defined on the matrix Lie group associated with the target’s relative pose. We provide analytical proof of the almost-global stability of the presented controller when regulating the end-effector’s pose relative to the target towards identity.
|
|
16:00-17:00, Paper Mo-S4T9.7 |
Event-Triggered Near Optimal Output Feedback Control for Constrained Discrete-Time Systems Via an Iterative Adaptive Algorithm (I) |
|
Hou, Jiaxu | Beijing Normal University |
Zhu, Liao | Beijing Normal University at Zhuhai |
Guo, Ping | Beijing Normal University |
Keywords: System Modeling and Control, Adaptive Systems, Discrete Event Systems
Abstract: For event-triggered optimal control problems of general nonlinear systems, it is very difficult to obtain the optimal analytical solution. In this paper, an adaptive near-optimal output feedback control method is presented for discrete-time (DT) nonlinear systems. The event-triggered mechanism is introduced to significantly reduce the execution costs through aperiodic control updating intervals without affecting system responses. Furthermore, an integral term is introduced in the performance index function to achieve optimal constrained control. An iterative dual heuristic dynamic programming (DHP) algorithm is then adopted to learn the optimal control law and the costate function. Finally, two examples are provided to illustrate the effectiveness of the proposed approach.
|
|
Mo-S5T1 Virtual Session, Room T1 |
Neural Networks and Their Applications II |
|
|
|
17:00-18:00, Paper Mo-S5T1.1 |
Spatial-Temporal Hierarchical Graph Convolutional Networks for Traffic Forecasting |
|
Shao, Qingyuan | Nanjing University of Science and Technology |
Yan, Hui | Nanjing University of Science and Technology |
Chen, Yuxin | Nanjing University of Science and Technology |
Keywords: Neural Networks and their Applications, Application of Artificial Intelligence, Deep Learning
Abstract: Traffic forecasting is a critical task in transportation planning and management, which requires modeling the complex spatial and temporal dependencies in traffic data. Most current methods employ Graph Convolutional Networks (GCN) to model spatial dependencies, and Recurrent Neural Networks (RNN) or Temporal Convolutional Networks (TCN) to model temporal dependencies. However, the representation ability of such methods is limited for two reasons: 1) conventional temporal extraction models, such as RNN and TCN, suffer from limited flexibility: RNN can only capture temporal dependencies sequentially, while TCN is constrained by its multi-layer dilated convolution structure; 2) spatial and temporal dependencies are intricately intertwined in the real world, but most methods fail to capture this spatial-temporal correlation, resulting in sub-optimal performance. To this end, we propose the Spatial-Temporal Hierarchical Graph Convolutional Networks (STHGCN), in which we design a Spatial-Temporal Hierarchical Graph (STHG) to simultaneously model spatial and temporal dependencies. Specifically, to model temporal dependencies more flexibly, we introduce two crucial components: the Local Temporal Transmission Matrix (LTTM) and the Multi-hop Temporal Similarity Matrix (MTSM). The LTTM captures adjacent temporal dependencies, while the MTSM captures multi-hop temporal dependencies. We further propose a Temporal Neighbor Fusion model that combines the LTTM and MTSM to obtain the adjacency matrix of the STHG. Additionally, to account for spatial-temporal correlation, we use the spatial GCN results as the STHG nodes, which allows us to learn spatial and temporal dependencies simultaneously via the temporal GCN. Our experiments on four real-world datasets demonstrate that STHGCN outperforms the state-of-the-art methods for traffic forecasting. The code is available at https://github.com/sqy123qwer/STHGCN
|
|
17:00-18:00, Paper Mo-S5T1.2 |
Topology-Aware Debiased Self-Supervised Graph Learning for Recommendation |
|
Han, Lei | Nanjing University of Science and Technology |
Yan, Hui | Nanjing University of Science and Technology |
Qiao, Zhicheng | Nanjing University of Science and Technology |
Keywords: Neural Networks and their Applications, Application of Artificial Intelligence, Deep Learning
Abstract: In recommendation, graph-based Collaborative Filtering (CF) methods mitigate the data sparsity by introducing Graph Contrastive Learning (GCL). However, the random negative sampling strategy in these GCL-based CF models neglects the semantic structure of users (items), which not only introduces false negatives (negatives that are similar to anchor user (item)) but also ignores the potential positive samples. To tackle the above issues, we propose Topology-aware Debiased Self-supervised Graph Learning (TDSGL) for recommendation, which constructs contrastive pairs according to the semantic similarity between users (items). Specifically, since the original user-item interaction data commendably reflects the purchasing intent of users and certain characteristics of items, we calculate the semantic similarity between users (items) on interaction data. Then, given a user (item), we construct its negative pairs by selecting users (items) which embed different semantic structures to ensure the semantic difference between the given user (item) and its negatives. Moreover, for a user (item), we design a feature extraction module that converts other semantically similar users (items) into an auxiliary positive sample to acquire a more informative representation. Experimental results show that the proposed model outperforms the state-of-the-art models significantly on three public datasets. Our model implementation codes are available at https://github.com/malajikuai/TDSGL.
|
|
17:00-18:00, Paper Mo-S5T1.3 |
Time-Graph Adjustive Graph Convolutional Recurrent Network for Traffic Forecasting |
|
Shao, Qingyuan | Nanjing University of Science and Technology |
Yan, Hui | Nanjing University of Science and Technology |
Chen, Yuxin | Nanjing University of Science and Technology |
Keywords: Neural Networks and their Applications, Application of Artificial Intelligence, Deep Learning
Abstract: Traffic forecasting is a crucial undertaking in the transportation domain. Current practices rely heavily on Recurrent Neural Networks (RNNs) and Temporal Convolutional Networks (TCNs) to model temporal dependencies in traffic forecasting. However, these approaches tend to overlook the interdependence of multi-hop time steps, impeding their ability to capture long-term dependencies and ultimately limiting their effectiveness in long-term forecasting. To address this issue, we present a novel method, the Time-Graph Adjustive Graph Convolutional Recurrent Network (TAGRN), for traffic forecasting. Our approach employs a Time-Graph model on the temporal domain, treating each time step of the traffic series as a graph node. We incorporate the similarity between time steps as weighted edges, enabling the Time-Graph to capture correlations between multi-hop time steps and model long-term dependencies through graph convolution. We utilize the Simple Graph Convolution (SGC) technique for information propagation due to its simplicity, linearity, and efficiency. Additionally, we introduce an adaptive semantic graph to enhance the capture of spatial information on the spatial domain. Experimental results on various real traffic datasets demonstrate the effectiveness of our proposed method. Compared to existing approaches, TAGRN achieves superior performance in long-term traffic forecasting, highlighting its potential for practical applications. The code is available at https://github.com/sqy123qwer/TAGRN.
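The Time-Graph idea in the abstract (time steps as nodes, similarities as weighted edges, propagation by Simple Graph Convolution) can be sketched as follows. This is an illustrative simplification rather than the released TAGRN code: it shows only the parameter-free propagation step of SGC, repeated multiplication by the normalized adjacency D^{-1/2}(A + I)D^{-1/2}, and the similarity weights and traffic readings are made-up values.

```python
import math

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a non-negative weighted adjacency."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]  # always >= 1 thanks to the self-loops
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def sgc_propagate(adj, features, hops):
    """Apply the normalized adjacency `hops` times with no nonlinearity."""
    s = normalize_adjacency(adj)
    out = features
    for _ in range(hops):
        out = matmul(s, out)
    return out

# Hypothetical time graph: 4 time steps, edge weights are similarity scores.
time_graph = [
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.2, 0.1],
    [0.1, 0.2, 0.0, 0.8],
    [0.0, 0.1, 0.8, 0.0],
]
readings = [[10.0], [12.0], [30.0], [32.0]]  # one feature per time step
smoothed = sgc_propagate(time_graph, readings, hops=2)
```

In full SGC a single learned linear layer follows the propagation; it is omitted here because the propagation itself has no trainable parameters.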
|
|
17:00-18:00, Paper Mo-S5T1.4 |
A Social Recommendation Model Based on Mining Timing Information and Enhancing Item Neighborhood Relationships |
|
Dai, Zhiqiang | Qilu University of Technology(Shandong Academy of Sciences) |
Gao, Qian | Qilu University of Technology(Shandong Academy of Sciences) |
Fan, Jun | Business-Intelligence of Oriental Nations Corporation Ltd |
Keywords: Neural Networks and their Applications, Complex Network, Deep Learning
Abstract: With the advancement of the Internet, recommendation systems based on Graph Neural Networks have become a topic of great interest in the research community. However, current recommendation systems still have the following problems. First, they focus on modeling users but ignore missing values in item rating vectors and non-corresponding positions between item rating vectors when computing associated items; second, they focus on the association relationships between users but pay less attention to those between items; third, there is insufficient research on the short-term attractiveness of items and users' temporary preferences. To address these problems, this study proposes the following solutions to better construct the item social graph and extract the short-term interests of users and the short-term attractiveness of items. First, for problem one, this study reconstructs the item rating vector based on whether users have interacted with the items. Second, for problem two, this study uses Pearson similarity to calculate the association relationships between items so as to construct the item social graph. Third, for problem three, this paper investigates temporal information features and extracts short-term user preferences and item attractiveness; to achieve this, an attention network that focuses on temporal information features is constructed by combining channel attention and bidirectional long short-term memory networks. Finally, a multilayer perceptron with a residual connection structure combines user and item factors, leading to more accurate predictions. Two publicly available datasets, Epinions and Ciao, were used in a comparative experiment. The model outperformed the baseline models, reducing MAE by 1.42% and 1.24% and RMSE by 1.47% and 1.38%, respectively. These findings suggest that incorporating short-term preferences and reconstructing item social graphs can enhance the precision of social recommendations.
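The use of Pearson similarity to link items, as described in the abstract, can be sketched as follows. This is a minimal illustration under stated assumptions: similarity is computed only over users who rated both items (the paper's rating-vector reconstruction is not reproduced), and the function names, the sample ratings, and the 0.5 edge threshold are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length rating lists (0.0 if undefined)."""
    n = len(x)
    if n < 2:
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def build_item_graph(ratings, threshold):
    """Connect item pairs whose similarity over co-rating users meets the threshold."""
    items = sorted(ratings)
    edges = {}
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            common = sorted(set(ratings[a]) & set(ratings[b]))
            sim = pearson([ratings[a][u] for u in common],
                          [ratings[b][u] for u in common])
            if sim >= threshold:
                edges[(a, b)] = sim
    return edges

# Hypothetical ratings: item -> {user: rating}.
ratings = {
    "item_a": {"u1": 5, "u2": 3, "u3": 1},
    "item_b": {"u1": 4, "u2": 3, "u3": 2},
    "item_c": {"u1": 1, "u2": 3, "u3": 5},
}
edges = build_item_graph(ratings, threshold=0.5)
```

Here item_a and item_b are rated in the same order by all three users and become neighbors, while item_c is rated in the opposite order and stays unconnected.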
|
|
17:00-18:00, Paper Mo-S5T1.5 |
Cascaded Learning Generation Framework for Quadrotor UAV Maneuvering Simulation Models |
|
Zeng, Shaoxiong | Academy of Military Science |
Weilong, Yang | Academy of Military Science |
Zhou, Dongao | Academy of Military Science |
Xu, Xinhai | Academy of Military Sciences |
Keywords: Neural Networks and their Applications, Metaheuristic Algorithms, Deep Learning
Abstract: The quadrotor unmanned aerial vehicle (UAV) is widely used due to its low maintenance cost, high maneuverability, and strong hovering capability. Modeling quadrotor UAV maneuvers and simulating their performance can effectively support the training of airborne intelligent algorithms such as mission planning and scheduling. Traditional quadrotor UAV maneuver modeling methods construct high-order mathematical models based on physical analysis, which requires significant expertise and is difficult to generalize. In this paper, we analyze the quadrotor UAV maneuvering process and propose a cascaded framework for generating quadrotor UAV maneuvering models based on deep neural networks. We use long short-term memory (LSTM) networks to model each part of the maneuvering process individually and flexibly combine the networks of each part to obtain models of varying granularity. A variable-dimensional particle swarm optimization (PSO) algorithm based on a detour foraging strategy is proposed to simultaneously determine the number of hidden layers of the LSTM network and the number of neurons in each hidden layer. We validate the effectiveness of the maneuvering model generation framework and the improved PSO algorithm through comparative experiments.
|
|
17:00-18:00, Paper Mo-S5T1.6 |
Towards Interpretable, Attention-Based Crime Forecasting |
|
Ma, Yujunrong | University of Maryland |
Qi, Xiaowen | University of Maryland |
Nakamura, Kiminori | University of Maryland |
Bhattacharyya, Shuvra | University of Maryland, College Park |
Keywords: Neural Networks and their Applications, Artificial Social Intelligence, Expert and Knowledge-Based Systems
Abstract: While the use of machine learning techniques in high-stakes fields, such as medical diagnosis and criminal justice, has been increasing in recent years, concerns have been raised regarding the lack of transparency and interpretability of the algorithms used. In this paper, we propose the use of interpretable attention-based ConvLSTM models for crime forecasting applications. This approach combines the power of ConvLSTM models in capturing spatio-temporal patterns with the interpretability of attention mechanisms, allowing for the identification of key geographic areas in the input data that contribute to the prediction. We demonstrate the effectiveness of this approach through experiments on real-world crime data, showing that our model achieves high accuracy in crime predictions while providing insightful visualizations that enhance the interpretability of prediction results.
|
|
17:00-18:00, Paper Mo-S5T1.7 |
IFF-Net: I-Frame Fusion Network for Compressed Video Action Recognition |
|
Li, Shaojie | Inner Mongolia University |
Guo, Jinxin | Inner Mongolia University |
Zhang, Jiaqiang | Inner Mongolia University |
Guo, Xu | Inner Mongolia University |
Ma, Ming | Inner Mongolia University |
Keywords: Neural Networks and their Applications, Deep Learning, Representation Learning
Abstract: Compressed video action recognition has received significant attention due to its potential for reducing storage and computational costs. However, the current methods typically only capture a few RGBs and compressed motion cues (e.g., motion vectors and residuals), which are insufficient for modeling actions at their full temporal extent. To address this issue, we propose a Time Domain Fusion (TDF) Module that can extract both low-frequency and high-frequency components from the video and integrate them seamlessly, resulting in the effective integration of abundant motion information into a single frame. More importantly, by using the TDF module, we introduced a new network called I-Frame Fusion Network (IFF-Net). The IFF-Net interacts with the original network (I-frame, motion vector, and residual) in two ways: explicit and implicit. Explicit interaction involves extracting the new representation and the original compressed representation information separately and then performing a later fusion. In contrast, implicit interaction uses the distillation approach, with the IFF-Net acting as the teacher to guide the I-frame network to learn full temporal expressions. Our approach performs better than state-of-the-art methods on the UCF-101 and HMDB-51 datasets for compressed video action recognition.
|
|
Mo-S5T2 Virtual Session, Room T2 |
Security and Risk Management |
|
|
|
17:00-18:00, Paper Mo-S5T2.1 |
Blockchain-Based Multi-Cloud Data Storage System Disaster Recovery |
|
Wang, Feiyu | Inner Mongolia University |
Zhou, Jiantao | Inner Mongolia University |
Keywords: System Architecture, Distributed Intelligent Systems, Large-Scale System of Systems
Abstract: Cloud storage services are used by most businesses and individual users. However, data loss, service interruptions, and cyber attacks often prevent cloud storage services from being provided properly, and these incidents have caused financial losses to users. Moreover, traditional and single-cloud disaster recovery services are no longer suitable for today's complex cloud storage systems. Therefore, a scheme that provides disaster recovery for cloud storage services in a multi-cloud storage environment is needed in real production. In this paper, we propose a disaster recovery scheme based on blockchain technology that aims to address data availability in cloud storage. The scheme achieves this by dividing data into hot and cold categories, verifying the integrity of copy data via blockchain technology, and utilizing blockchain networks to manage multi-cloud storage systems. Experimental findings demonstrate that the proposed scheme yields superior results in terms of computation and time overheads.
|
|
17:00-18:00, Paper Mo-S5T2.2 |
The Potential of RISC-V Platform in Financial Computing on Option Pricing and Energy Efficiency |
|
Guo, Guoxiang | Faculty of Science and Technology, University of Macau |
Qi, Yuanyuan | Faculty of Science and Technology, University of Macau |
Zhu, Minhao | Faculty of Science and Technology, University of Macau |
Wang, Yang | Shenzhen Institutes of Advanced Technology |
Yen, Jerome | Faculty of Science and Technology, University of Macau |
Keywords: Consumer and Industrial Applications, Infrastructure Systems and Services, Service Systems and Organizations
Abstract: The fifth version of the Reduced Instruction Set Computer (RISC-V) is a popular instruction set architecture (ISA) noted for low energy consumption. Currently, a growing number of industrial applications are based on RISC-V platforms, especially Internet of Things (IoT) devices. Those applications pursue low power and just-sufficient computing capacity. However, low power consumption should not be equated with weak computation. Recent advances in RISC-V show the potential for building a computing platform capable of handling tasks requiring considerable computing power. Traditional financial computing platforms are generally based on Complex Instruction Set Computers (CISC), like x86 platforms. As green computing gains momentum, it is meaningful to handle financial computing tasks with less energy consumption. To explore the potential of RISC-V in financial computing, we set up a typical financial computing task, American option implied volatility calculation, and examine the performance and power consumption of x86 and RISC-V platforms. The result shows that the RISC-V CPU is sufficient for some financial computing scenarios considering actual requirements. A heterogeneous computing system composed of x86 and RISC-V platforms could significantly improve energy efficiency.
|
|
17:00-18:00, Paper Mo-S5T2.3 |
Redistillation of Radio Frequency Knowledge for RFF Imbalanced Sample Recognition |
|
Fan, Xiaolin | Institute of Artificial Intelligence Xiamen University Xiamen, C |
Zhao, Caidan | Dept. of Informatics Xiamen University, Xiamen, China |
Xiao, Liang | Xiamen University |
Lei, Yang | Xiamen University |
Keywords: Communications
Abstract: Radio Frequency Fingerprint (RFF) technology is an effective means to defend against cheating and counterfeiting attacks in wireless communication. However, to move from a theoretical algorithm to a practical application, the challenges of imbalanced data samples and environmental noise must be addressed for Radio Frequency (RF) identification technology. Although noise reduction can restore the signal to some extent, the recognition performance of RFF technology is affected when the dataset is imbalanced. While many RF identification algorithms focus on identification performance under a low Signal-to-Noise Ratio (SNR), performance degradation caused by data imbalance is a pressing problem that requires attention. Directly applying re-sampling algorithms in imbalanced dataset processing can lead to data overlap and neural network over-fitting. To address these issues, this paper proposes a "Redistillation of Radio Frequency Knowledge" (RRFK) algorithm combined with knowledge distillation (KD). The experimental results show that the proposed algorithm can achieve good recognition performance in both stepped and long-tail imbalanced data sets.
|
|
17:00-18:00, Paper Mo-S5T2.4 |
Strategies of Repair Sequence for the Metro in Bus Feeder Scenario from the Resilience Perspective |
|
Du, Mijie | Northwestern Polytechnical University |
Guo, Wenxuan | Taiyuan University of Technology |
Zhao, Jing | Northwestern Polytechnical University |
Xie, Xinzhe | Northwest Polytechnic University |
Wu, Yanfang | Northwestern Polytechnical University |
Wang, Ying | Northwestern Polytechnical University |
Keywords: Conflict Resolution, Infrastructure Systems and Services, System Modeling and Control
Abstract: The metro system has significantly reduced the pressure on urban traffic with its high capacity and speed. However, urban metro system failures and accidents are common, and how to better repair the metro system after a disturbance has attracted extensive attention from researchers and government agencies. Existing repair strategies take little account of passenger factors and differences in station failure levels. To address this issue, this paper proposes a repair model with the objectives of maximum resilience and speed under differences in station failure levels in a bus feeder scenario, taking into account both passenger travel time and passenger flow characteristics in network performance, which can provide the best repair sequence strategy for multiple engineering teams. The proposed model is further applied to the metro network in downtown Shanghai. The simulation results indicate that the resilience-first strategy provides the highest level of resilience, but the total repair time is prolonged. Moreover, the speed-first repair strategy achieves optimal resilience and ensures the shortest repair time.
|
|
17:00-18:00, Paper Mo-S5T2.5 |
Vulnerability Analysis of Interdependent Infrastructures Considering the Sensitivity of Components to Different Risks |
|
Wu, Yanfang | Northwestern Polytechnical University |
Guo, Peng | Northwestern Polytechnical University |
Wang, Ying | Northwestern Polytechnical University |
Du, Mijie | Northwestern Polytechnical University |
Wang, Xiaonan | Northwestern Polytechnical University |
Zhang, Dingning | Northwestern Polytechnical University |
Keywords: Conflict Resolution, Infrastructure Systems and Services, System Modeling and Control
Abstract: Infrastructure systems have become increasingly vulnerable to various risks due to their growing interdependencies. The vulnerability analysis of Interdependent Infrastructure Systems (IISs) is crucial for enhancing system resilience against threats. This paper proposes a global vulnerability assessment model by considering the interplay between the attack strength of different risks and the preparedness of infrastructure systems. Three load metrics, degree-based load, betweenness-based load, and flow-based load, are adopted to evaluate the vulnerability of IISs. The vulnerability assessment model incorporates both intra- and inter-network cascading failures to examine the impact of tolerance parameters on global vulnerability. Moreover, critical components are identified by analyzing N-1 scenarios. The results indicate that the vulnerability of IISs assessed by the flow-based load metric can reflect the real-world performance of service-providing IISs. Adjusting the tolerance parameter proves to be effective in mitigating the spread of failures. The vulnerability distribution of IISs can assist stakeholders in identifying critical components to protect for safeguarding IISs against attacks.
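The role of the tolerance parameter and of N-1 scenarios can be sketched on a single small network. This is an illustrative load-capacity cascade under stated assumptions: load equals node degree, capacity is (1 + alpha) times the initial load, and a failed node's load is shared equally among its surviving neighbours. The redistribution rule and all names are assumptions, not the paper's model, which also covers inter-network failures.

```python
def cascade(adj, seed, alpha):
    """Return the set of failed nodes after removing `seed`.

    adj   : dict mapping node -> list of neighbours (undirected graph)
    alpha : tolerance parameter; capacity = (1 + alpha) * initial load
    """
    load = {v: float(len(nbrs)) for v, nbrs in adj.items()}  # degree-based load
    capacity = {v: (1.0 + alpha) * load[v] for v in adj}
    failed = {seed}
    queue = [seed]
    while queue:
        v = queue.pop(0)
        alive = [u for u in adj[v] if u not in failed]
        if not alive:
            continue
        share = load[v] / len(alive)  # equal redistribution (assumption)
        for u in alive:
            load[u] += share
        for u in alive:
            if load[u] > capacity[u]:  # overload triggers a further failure
                failed.add(u)
                queue.append(u)
    return failed


def n_minus_1(adj, alpha):
    """Fraction of nodes lost when each node in turn is removed."""
    return {v: len(cascade(adj, v, alpha)) / len(adj) for v in adj}
```

On a star network, removing one leaf collapses everything at a low tolerance but stays contained at a higher one, and the N-1 scan singles out the hub as the critical component.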
|
|
17:00-18:00, Paper Mo-S5T2.6 |
Global-Aware Attention Network for Multi-Modal Sarcasm Detection |
|
Song, Liujing | University of Chinese Academy of Sciences |
Zhao, Zefang | Computer Network Information Center, Chinese Academy of Sciences |
Ma, Yuxiang | Henan University |
Liu, Yuyang | Chinese Academy of Medical Sciences and Peking Union Medical Col |
Li, Jun | Computer Network Information Center |
Keywords: Affective Computing, Human-Computer Interaction
Abstract: Sarcasm detection is crucial for natural language processing in various applications, such as affective computing and opinion mining. Multi-modal sarcasm detection, which combines information from different modalities, has attracted increasing attention in recent years. However, many current methods concatenate image and text features directly without considering the contextual information between the cross-modal alignment and single-modal features simultaneously. Inspired by this observation, we propose a novel Global-Aware Attention Network (GAAN) for multi-modal sarcasm detection. Specifically, we investigate a cross-modal multi-granularity alignment module that captures alignment context features through coarse-grained and fine-grained attention. More importantly, considering the complementary effects of single-modal contextual information in sarcasm detection, we fuse textual context features, visual context features, and alignment context features to obtain the global context features. We conducted extensive experiments on public datasets, and the results compared to the baselines illustrate that our proposed model obtains state-of-the-art performance in multi-modal sarcasm detection.
|
|
17:00-18:00, Paper Mo-S5T2.7 |
An Error Correction Mid-Term Electricity Load Forecasting Model Based on Seasonal Decomposition |
|
Zhang, Liping | Chongqing University of Posts and Telecommunications |
Wu, Di | Southwest University |
Luo, Xin | Chinese Academy of Sciences |
Keywords: Intelligent Power Grid, Consumer and Industrial Applications
Abstract: Mid-term electricity load forecasting (LF) plays a critical role in power system planning and operation. To address the issue of error accumulation and transfer during the operation of existing mid-term LF models, a novel model called error correction based LF (ECLF) is proposed in this paper, which is designed to provide more accurate and stable LF. Firstly, time series analysis and feature engineering act on the original data to decompose load data into three components and extract relevant features. Then, based on the idea of stacking ensemble, long short-term memory is employed as an error correction module to forecast the components separately, and the forecast results are treated as new features to be fed into extreme gradient boosting for the second-step forecasting. Finally, the component sub-series forecast results are reconstructed to obtain the final LF results. The proposed model is evaluated on real-world electricity load data from two cities in China, and the experimental results demonstrate its superior performance compared to the other benchmark models.
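The decomposition step can be sketched as follows. The abstract does not name the three components, so this example assumes a classical additive split into trend, seasonal, and residual parts, using a simple moving average; the function name and the smoothing choice are illustrative only.

```python
def decompose(series, period):
    """Additive split of a series into trend, seasonal and residual lists.

    Assumes len(series) >= period. The trend is a centred moving average
    whose window shrinks at the edges.
    """
    n = len(series)
    half = period // 2
    trend = []
    for i in range(n):
        window = series[max(0, i - half): min(n, i + half + 1)]
        trend.append(sum(window) / len(window))
    detrended = [y - t for y, t in zip(series, trend)]
    # Average the detrended values that share the same position in the cycle.
    phase_mean = []
    for p in range(period):
        vals = detrended[p::period]
        phase_mean.append(sum(vals) / len(vals))
    offset = sum(phase_mean) / period  # centre the seasonal pattern on zero
    seasonal = [phase_mean[i % period] - offset for i in range(n)]
    residual = [y - t - s for y, t, s in zip(series, trend, seasonal)]
    return trend, seasonal, residual
```

Because the split is additive, forecasts made separately for each component can be summed to reconstruct a forecast of the original load series, which mirrors the final reconstruction step described in the abstract.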
|
|
Mo-S5T3 Virtual Session, Room T3 |
Systems Safety and Security II |
|
|
|
17:00-18:00, Paper Mo-S5T3.1 |
Personalized Educational Video Evaluation Combining Student's Cognitive and Teaching Style |
|
Weng, Jinta | University of Chinese Academy of Sciences |
Dong, Haoyu | School of Cyber Security, University of Chinese Academy of Scien |
Deng, Yifan | School of Cyber Security, University of Chinese Academy of Scien |
Wu, Hao | Guangzhou University |
Hu, Yue | School of Cyber Security, University of Chinese Academy of Scien |
Huang, Heyan | School of Computer Science and Technology, Beijing Institute Of |
|
|
17:00-18:00, Paper Mo-S5T3.2 |
CNN-Based Visible Ingredients Recognition in a Food Image Using Decision Making Schemes |
|
Fu, Kun | Iwate Prefectural University |
Dai, Ying | Iwate Pref. University |
Zhu, Ziyi | Iwate Prefectural University |
Keywords: Information Systems for Design, Ethics of AI and Pervasive Systems, Multimedia Systems
Abstract: Food image recognition is different from general object recognition tasks. Many types of ingredients in food images do not have a unique spatial layout, and the shape of ingredients may change with different cooking and cutting methods. This poses a significant challenge for food recognition, especially in recognizing the ingredients in food images. In this paper, we focus on recognizing the ingredients segmented from food images and propose a method for it. Specifically, we localize the candidate regions of the ingredients for recognition using the methods of locating and sliding windows. Then, these regions are assigned to ingredient classes using a CNN (convolutional neural network)-based single ingredient classification model trained on a dataset of single ingredient images. Finally, the ingredients are determined from these candidate results using a decision-making scheme. The effectiveness of the proposed method is evaluated through experimental results.
|
|
17:00-18:00, Paper Mo-S5T3.3 |
Creation of Delicious Mixed Juices for Multiple Users Based on Distributed Interactive Genetic Algorithm |
|
Fukumoto, Makoto | Fukuoka Institute of Technology |
Hanada, Yoshiko | Kansai University |
Keywords: Kansei (sense/emotion) Engineering, Multimedia Systems, Interactive Design Science and Engineering
Abstract: Creating media content suited to many users’ feelings is essential in a situation with multiple users’ participation. Interactive Evolutionary Computation (IEC) is a search method that finds good solutions; however, it was originally designed for a single user. Previous studies proposed an IEC that creates mixed juices for two users. Allowing many users’ participation contributes to finding good solutions suited to many users and will support developing good products. This study aims to propose an IEC in which a trio of users can participate. The elite Genetic Algorithm individuals were exchanged between the users in this method. A concrete system based on the proposed method was constructed, and the experiment was conducted with the system. Fifteen students participated in the experiment. Changes in the fitness values and search space were investigated. Furthermore, the exchanged individuals were compared with the individuals created by the examinee’s population by their fitness values. The statistical analyses showed that the proposed method could find good mixed juices for the trio of examinees, and the exchange of the elite individuals had the potential for an efficient search.
|
|
17:00-18:00, Paper Mo-S5T3.4 |
DialCL-Bias: A Semantic and Contextual Framework to Identify Social Bias in Dialogue |
|
Cai, Ziyan | Tsinghua University |
Wu, Dingjun | Tsinghua University |
Li, Ping | Tsinghua University |
Keywords: Ethics of AI and Pervasive Systems, Human-centered Learning, Information Systems for Design and Marketing
Abstract: The content generated by dialogue systems has been found to contain social bias due to the social bias in the corpus and the algorithm design. We propose a Dialogue-based Contrastive Learning for identifying social Bias (DialCL-Bias) framework in dialogue based on two dimensions: relevance and fine-grain. For the identification of social bias relevance, we propose a Dialogue-based Semantic Contrastive Learning (DialSCL) method. DialSCL converts simple dichotomy into triple input and decomposes the classification target based on data annotation rules to achieve semantic-level classification. We use a Key Words and Cosine Similarity (KWCS) method to construct difficult samples and achieve improvement in DialCCL. For the identification of fine-grained social bias, we propose a Dialogue-based Context Contrastive Learning (DialCCL) method. DialCCL simultaneously learns the context-level features of samples and the parameters of classifiers in the same space to naturally generate a classifier. Finally, we measure the degree of bias in the open-domain dialogue corpus and the responses of ChatGPT by using the social bias identification classifiers.
|
|
17:00-18:00, Paper Mo-S5T3.5 |
How Secure Is Code Generated by ChatGPT? (I) |
|
Khoury, Raphael | Université Du Québec En Outaouais |
Anderson, Avila | Institut National De Recherche Scientifique |
Brunelle, Jacob | Université Du Québec En Outaouais |
Camara, Baba Mamadou | Université Du Québec En Outaouais |
Keywords: Homeland Security, Quality and Reliability Engineering
Abstract: In recent years, large language models have been responsible for great advances in the field of artificial intelligence (AI). ChatGPT in particular, an AI chatbot developed and recently released by OpenAI, has taken the field to the next level. The conversational model is able not only to process human-like text, but also to translate natural language into code. However, the safety of programs generated by ChatGPT should not be overlooked. In this paper, we perform an experiment to address this issue. Specifically, we ask ChatGPT to generate a number of programs and evaluate the security of the resulting source code. We further investigate whether ChatGPT can be prodded to improve the security by appropriate prompts, and discuss the ethical aspects of using AI to generate code. Results suggest that ChatGPT is aware of potential vulnerabilities, but nonetheless often generates source code that is not robust to certain attacks.
|
|
17:00-18:00, Paper Mo-S5T3.6 |
Comparing the Effectiveness of Static, Dynamic and Hybrid Malware Detection on a Common Dataset (I) |
|
Razgallah, Asma | Université Du Québec En Outaouais |
Khoury, Raphael | Université Du Québec En Outaouais |
Khanmohammadi, Kobra | Université Du Québec En Outaouais |
Pere, Christophe | Université Laval |
Keywords: System Modeling and Control, Technology Assessment, Trust in Autonomous Systems
Abstract: The detection of malicious Android applications is a major security challenge. A number of machine learning-based techniques have been put forth, and some of them have attained great accuracy. However, the diversity of apps and the frequency at which new malware families are found mean that the issue remains unresolved. In this paper, we use static, dynamic and hybrid analysis to automatically classify Android apps as benign or infected. We compare all three approaches on a common dataset, the TwinDroid dataset, which contains over 15,000 system call traces from over 9,000 benign and infected apps and thus allows comparison on equal footing. We make further contributions on the topic of feature selection and trace abstraction.
|
|
17:00-18:00, Paper Mo-S5T3.7 |
A Multimodal Approach for Bridge Inspection |
|
Ma, Hongyao | Shandong Jiaotong University |
Wang, Zhixue | Shandong Jiaotong University |
Shen, Zhen | Institute of Automation, Chinese Academy of Sciences |
Zhang, Hong | Shandong Hi-Speed Qingdao Development Co., Ltd |
Li, Chuanfu | Shandong Hi-Speed Qingdao Development Co., Ltd |
Wang, Fei-Yue | Institute of Automation, Chinese Academy of Sciences |
Keywords: Systems Safety and Security
Abstract: With the exacerbation of bridge aging issues, the demand for efficient and cost-effective bridge inspection solutions becomes increasingly urgent. Currently, most methods for surface damage detection on bridges employ single-modal, image-based object detection models. Despite their overall effectiveness on specific datasets, such methods frequently encounter error detection issues during actual inspection processes. This study proposes a multimodal (image and text) model for surface damage detection in bridge structures by combining the CLIP model with the YOLOv8 model. By conducting tests on the composite concrete bridge of the Jiaozhou Bay Bridge, Qingdao, the effectiveness of bridge detection using drones is validated, and the common issue of false positive detections in traditional object detection models is successfully addressed.
|
|
Mo-S5T4 Virtual Session, Room T4 |
Technology and Systems in Real-World Scenarios |
|
|
|
17:00-18:00, Paper Mo-S5T4.1 |
Driving Mechanism of Urban-Rural Integrated Development: Population-Land-Industry Perspective |
|
He, Lei | Northwestern Polytechnical University |
Zhu, Yuming | Northwestern Polytechnical University |
Zhou, Jia-He | Northwestern Polytechnical University; Renmin University of Chin |
Zheng, Xin | Northwestern Polytechnical University |
Mu, Bingxu | Northwestern Polytechnical University |
Keywords: System Modeling and Control, Conflict Resolution, Decision Support Systems
Abstract: Urban-rural integration (URI) is a complex system engineering problem involving the interaction of multiple aspects of population-land-industry. Identifying the driving mechanisms behind urban-rural integration development is crucial for predicting its future trends and identifying the key promotion paths. In this study, 27 representative factors were identified using literature research and expert interviews. Then, interpretive structural modeling (ISM) is combined with the Impact Matrix Cross-Reference Multiplication Applied to a Classification (MICMAC), which can be used to analyze and model the complex relationships and interactions in URI systems, to establish an integrated ISM-MICMAC model. The model was applied to quantify the causal driving relationships among factors and to develop a multilevel recursive structural model for understanding the underlying mechanisms. The analysis revealed that all 27 drivers can be classified into seven layers, comprising four source factors such as land use planning and policies, three outcome factors such as urban and rural residents' income, and 20 process factors such as population size. Corresponding suggestions are made for the government accordingly. This study contributes to the advancement of theoretical and model development in understanding the driving mechanism of urban-rural integration. Moreover, it provides valuable insights for policymakers in formulating evidence-based development strategies.
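The general ISM-MICMAC procedure can be sketched in a few lines. This example follows the textbook steps (reachability matrix by transitive closure, level partition, driving power and dependence as row and column sums) on a hypothetical three-factor chain; it does not reproduce the paper's 27 factors or its specific results.

```python
def reachability(adjacency):
    """Transitive closure (Warshall) of a 0/1 influence matrix, with self-loops."""
    n = len(adjacency)
    reach = [[bool(adjacency[i][j]) or i == j for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


def ism_levels(reach):
    """Level partition: top level first (outcomes), bottom level last (sources)."""
    remaining = set(range(len(reach)))
    levels = []
    while remaining:
        level = []
        for i in remaining:
            reach_set = {j for j in remaining if reach[i][j]}
            ante_set = {j for j in remaining if reach[j][i]}
            if reach_set <= ante_set:  # everything i reaches also reaches i
                level.append(i)
        levels.append(sorted(level))
        remaining -= set(level)
    return levels


def micmac(reach):
    """Driving power (row sums) and dependence (column sums)."""
    n = len(reach)
    driving = [sum(reach[i]) for i in range(n)]
    dependence = [sum(reach[j][i] for j in range(n)) for i in range(n)]
    return driving, dependence
```

For a chain in which factor 0 influences factor 1 and factor 1 influences factor 2, factor 0 ends up in the bottom layer with the highest driving power (a source factor), while factor 2 sits in the top layer with the highest dependence (an outcome factor).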
|
|
17:00-18:00, Paper Mo-S5T4.2 |
Pricing Strategy in Dual-Channel Supply Chain with Manufacturer's Risk Attitude under Asymmetry Information on Add-On Services |
|
Han, Wenting | Northwestern Polytechnical University |
Cai, Jianfeng | Northwestern Polytechnical University |
Chen, Nan | Northwestern Polytechnical University |
|
|
17:00-18:00, Paper Mo-S5T4.3 |
A Tool for Transforming SysML State Machine into Uppaal Automatically |
|
Wang, Shaopeng | East China Normal University |
Shi, Jianqi | East China Normal University |
Huang, Yanhong | East China Normal University |
Yang, Yang | East China Normal University |
Keywords: Modeling of Autonomous Systems, Quality and Reliability Engineering, Trust in Autonomous Systems
Abstract: SysML state machine (SysML-STM) is a modeling tool used in the Systems Modeling Language (SysML) to describe the behavior of a system. It is widely used in model-driven development (MDD). Formal methods are mathematical techniques to ensure the correctness, reliability and safety of software systems and hardware designs. In this paper, we introduce formal methods into MDD by transforming a SysML-STM model into a Uppaal timed automaton. By formally verifying the system at an early stage of the development life-cycle, we aim to enhance the system's robustness. We design the mapping rules between the two models and have developed a tool, STMTU, to transform them directly. Our tool effectively leverages the benefits of formal verification techniques to ensure the correctness and reliability of the system. The direct transformation of these models not only reduces the learning cost for developers but also helps to promote the wider adoption of formal methods.
|
|
17:00-18:00, Paper Mo-S5T4.4 |
A Power Electronic Converters-Inspired Approach for Modeling PWM Switched-Based Nonlinear Hydraulic Servo Actuators |
|
Bozza, Augusto | Polytechnic of Bari |
Cavone, Graziana | University Roma Tre |
Carli, Raffaele | Politecnico Di Bari |
Dotoli, Mariagrazia | Politecnico Di Bari |
Keywords: System Modeling and Control, Mechatronics, Modeling of Autonomous Systems
Abstract: This paper investigates a novel approach for properly modeling Hydraulic Servo Actuators (HSAs) based on ON-OFF switching valves. HSAs represent very high efficiency and small size-to-power ratio hydraulic actuators. Their functioning is guaranteed by their control system that ensures the desired flow-rate, and consequently, the proper pressure, to be provided to the actuator’s chambers. Nevertheless, achieving a good model of such HSAs for control purposes is non-trivial, due to their hybrid nature inherited from the switching between the different operating modes produced by valves. To overcome this limit, we propose an average equivalent discrete-time model of the chambers' pressure dynamics related to a single control input for the digital valves. The proposed model takes inspiration from the analogy existing between hydraulic systems and power electronic converters, and guarantees the same performance as the traditional model, with the advantage of greatly simplifying the control of the servo actuator. Finally, the consistency of the proposed model with respect to its nonlinear hybrid version is proved via numerical examples.
|
|
17:00-18:00, Paper Mo-S5T4.5 |
A Federated Mining Framework for Complete Frequent Itemsets (I) |
|
Hong, Tzung-Pei | National University of Kaohsiung |
Hsu, Ya-Ping | National Sun Yat-Sen University |
Chen, Chun-Hao | National Taipei University of Technology |
Wu, Jimmy Ming-Tai | Shandong University of Science and Technology |
Keywords: Big Data Computing, Soft Computing, Socio-Economic Cybernetics, Machine Learning
Abstract: In this paper, we address the common features of horizontal federated learning in data mining and propose a federated mining framework, which adopts a client-server model that cooperates with multiple data-source clients. The proposed algorithm handles client-side mining and server-side aggregation. For client-side mining, the algorithm uses prelarge itemsets to collect additional information for the server to integrate the clients’ local mining results. For server-side aggregation, the algorithm considers the characteristics of large and prelarge itemsets sent from the clients and uses a boundary strategy for integration. Experiments show that our method acquires the complete mined results while protecting data.
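The idea of sending prelarge itemsets so that the server can bound what it cannot see may be sketched as below. This is a loose illustration only: the two thresholds (`lower`, `upper`), the restriction to itemsets of at most two items, and the simple bound used on the server are assumptions made for this example and are not the paper's boundary strategy.

```python
from collections import Counter
from itertools import combinations


def client_mine(transactions, lower, max_size=2):
    """Count itemsets locally; report those at or above the lower threshold.

    Reported itemsets are the locally large and prelarge ones. Raw
    transactions never leave the client.
    """
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    n = len(transactions)
    kept = {s: c for s, c in counts.items() if c >= lower * n}
    return kept, n


def server_aggregate(reports, upper, lower):
    """Combine client reports into globally large or undecided itemsets."""
    total = sum(n for _, n in reports)
    candidates = set().union(*(kept.keys() for kept, _ in reports))
    result = {}
    for s in candidates:
        known = sum(kept.get(s, 0) for kept, _ in reports)
        # A client that did not report s holds fewer than lower * n copies.
        slack = sum(lower * n for kept, n in reports if s not in kept)
        if known >= upper * total:
            result[s] = "large"
        elif known + slack >= upper * total:
            result[s] = "undecided"  # would need a follow-up query
    return result
```

Lowering the reporting threshold on the clients shrinks the unknown slack on the server, which is the intuition behind collecting prelarge itemsets in addition to large ones.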
|
|
17:00-18:00, Paper Mo-S5T4.6 |
Modeling and Virtual Simulation Environment Design for Falcon-Like Flapping-Wing Aircraft |
|
Zhao, Xuena | University of Science and Technology Beijing |
Zhijie, Liu | University of Science and Technology Beijing |
Li, Guang | Queen Mary University of London |
He, Wei | University of Science and Technology Beijing |
Keywords: Autonomous Vehicle, Robotic Systems, Modeling of Autonomous Systems
Abstract: Bionic flapping-wing aircraft is a strongly coupled and underactuated system, and its dynamic modeling and intelligent control are still a major challenge. In this paper, we develop a 3-dimensional dynamic model for the flapping-wing aircraft designed by our team. The aerodynamic performance of the wing is analysed by the blade element method and a theoretical calculation model is obtained. Based on wind tunnel experiments, an aerodynamic model is identified for the V-Tail, the attitude control ruddervators. Further, we build a virtual simulation environment based on gym, which is verified by the outdoor flight data. This work provides the basis for intelligent control of flapping-wing aircraft.
|
|
17:00-18:00, Paper Mo-S5T4.7 |
Wing Analysis of Bionic Flapping-Wing Flying Robot |
|
Liang, Zhang | University of Science and Technology Beijing |
Xiuyu, He | University of Science and Technology Beijing |
Haisheng, Song | University of Science and Technology Beijing |
Li, Guang | Queen Mary University of London |
He, Wei | University of Science and Technology Beijing |
Keywords: Robotic Systems, Technology Assessment, Autonomous Vehicle
Abstract: Flapping-wing flying robots, as a newly emerging research hotspot, have attracted more and more researchers' attention. Compared with traditional aircraft, flapping-wing flying robots have the characteristics of high flight efficiency, good concealment, and so on, and have a wide range of application prospects. The wing is an important power mechanism of the aircraft, so research on the wing is very important. We design a wing structure that can realize the active bending of wings, which can well imitate the bending pattern of wings of birds in the natural flight process. At the same time, a wind tunnel test was carried out to measure the lift resistance of the single wing and the folded wing under the same power. The results show that the folded wing has higher flight efficiency under the same power.
|
|
Mo-S5T5 Virtual Session, Room T5 |
UAVs Monitoring and Navigation Systems |
|
|
|
17:00-18:00, Paper Mo-S5T5.1 |
Development of a Plant Monitoring System Using a Drone-Equipped Agricultural Robot |
|
Fujinaga, Takuya | Fukuoka University |
Keywords: Robotic Systems, Mechatronics, Cooperative Systems and Control
Abstract: This paper proposes a plant monitoring system to automate agricultural work by robots in plant factories. This system uses a drone-equipped agricultural robot to monitor plants. This robot has a camera-equipped drone tethered with a power supply cable. Therefore, the drone can fly without worrying about its battery capacity. As a case study, this system targets a strawberry greenhouse. YOLO (You Only Look Once) v4 is used to detect strawberry plants. The detection models created were evaluated by four-fold cross-validation; the average precisions of flowers, immature fruits, and mature fruits were 0.826, 0.841 and 0.920, respectively. To assess the performance of this system, the numbers of strawberry plants were counted based on the detection results, and the counting results were evaluated by comparing them with the actual numbers. The accuracies of flowers, immature fruits, and mature fruits were 0.963, 0.961, and 0.972, respectively. Even if strawberry plants cannot be detected from one viewpoint, the detection task is still achieved if they can be detected from another viewpoint. The results of this experiment demonstrated the effectiveness of a plant monitoring system using a drone-equipped agricultural robot.
|
|
17:00-18:00, Paper Mo-S5T5.2 |
MorphoLander: Reinforcement Learning Based Landing of a Group of Drones on the Adaptive Morphogenetic UAV |
|
Karaf, Sausar | Skoltech Institute of Science and Technology |
Fedoseev, Aleksey | Skolkovo Institute of Science and Technology |
Martynov, Mikhail | Skolkovo Institute of Science and Technology |
Darush, Zhanibek | Skoltech |
Shcherbak, Aleksei | Skolkovo Institute of Science and Technology |
Tsetserukou, Dzmitry | Skoltech |
Keywords: Robotic Systems, Cooperative Systems and Control, Distributed Intelligent Systems
Abstract: This paper focuses on a novel robotic system, MorphoLander, representing a heterogeneous swarm of drones for exploring rough terrain environments. The morphogenetic leader drone is capable of landing on uneven terrain, traversing it, and maintaining horizontal position to deploy smaller drones for extensive area exploration. After completing their tasks, these drones return and land back on the landing pads of MorphoGear. The reinforcement learning algorithm was developed for a precise landing of drones on the leader robot that either remains static during their mission or relocates to a new position. Several experiments were conducted to evaluate the performance of the developed landing algorithm under both even and uneven terrain conditions. The experiments revealed that the proposed system achieves a high landing accuracy of 0.5 cm when landing on the leader drone under even terrain conditions and 2.35 cm under uneven terrain conditions. MorphoLander has the potential to significantly enhance the efficiency of industrial inspections, seismic surveys, and rescue missions in highly cluttered and unstructured environments.
|
|
17:00-18:00, Paper Mo-S5T5.3 |
A Decentralized Importance-Based Multi-UAV Path Planning Algorithm for Wildfire Monitoring |
|
Islam, S M Towhidul | Georgia State University |
Hu, Xiaolin | Georgia State University |
Keywords: Adaptive Systems, Autonomous Vehicle
Abstract: Wildfire monitoring is a highly important task in preventing the potential damages caused by this disastrous event. In wildfire monitoring, Unmanned Aerial Vehicles (UAVs) have gained significant attention from researchers and practitioners worldwide in recent years. While different UAV-based wildfire monitoring systems have been proposed in the past, few of them consider decentralized computation or take into account the uneven spatiotemporal propagation nature of wildfires in their proposed solutions. This paper presents a decentralized and importance-based multi-UAV path planning algorithm for wildfire monitoring. We describe the algorithm design and implementation, and use simulation to evaluate its performance. Experiment results show the effectiveness of the proposed decentralized algorithm for importance-based wildfire monitoring.
|
|
17:00-18:00, Paper Mo-S5T5.4 |
Joint UAV Trajectory Scheduling and Network Routing for FANETs: A Reinforcement Learning Approach |
|
Gan, Junyi | Tongji University |
Li, Bing | Tongji University |
Zhao, Shengjie | Tongji University |
Keywords: Communications, System Modeling and Control
Abstract: Flying Ad-hoc Networks (FANETs) consisting of multiple flexible unmanned aerial vehicles (UAVs) have gained ever-increasing attention due to their advantages in flexible deployment and enhanced data link connectivity. However, existing FANET schemes do not consider the mutual influence between data routing and UAV trajectory scheduling, resulting in limited network performance, especially when the number of UAVs in the FANET is large. This paper proposes a joint optimization framework for routing and UAV trajectory scheduling in FANETs to enhance communication reliability and efficiency. Unlike previous studies that consider routing and trajectory scheduling separately, this framework formulates the routing problem as a Markov Decision Process (MDP) and employs trajectory scheduling to guide the state transition probability, addressing the limitation of routing algorithms that passively perceive network topology. In addition, we exploit reinforcement learning to iteratively determine the optimal routing and trajectory scheduling strategy with the support of network topology prediction. Simulation results demonstrate that our proposed approach can significantly reduce latency, communication overhead, and packet loss in FANETs.
|
|
17:00-18:00, Paper Mo-S5T5.5 | Add to My Program |
NeuroSwarm: Multi-Agent Neural 3D Scene Reconstruction and Segmentation with UAV for Optimal Navigation of Quadruped Robot |
|
Zhura, Iana | Skolkovo Institute of Science and Technology |
Davletshin, Denis | Skolkovo Institute of Science and Technology |
Weerakkodi Mudalige, Nipun Dhananjaya | Skolkovo Institute of Science and Technology |
Fedoseev, Aleksey | Skolkovo Institute of Science and Technology |
Peter, Robinroy | Skolkovo Institute of Science and Technology |
Tsetserukou, Dzmitry | Skoltech |
Keywords: Robotic Systems, Distributed Intelligent Systems, Adaptive Systems
Abstract: Quadruped robots have the distinct ability to adapt their body and step height to navigate through cluttered environments. Nonetheless, for these robots to utilize their full potential in real-world scenarios, they require awareness of their environment and obstacle geometry. We propose a novel multi-agent robotic system that incorporates cutting-edge technologies. The proposed solution features a 3D neural reconstruction algorithm that enables navigation of a quadruped robot in both static and semi-static environments. The prior areas of the environment are also segmented according to the quadruped robots’ abilities to pass them. Moreover, we have developed an adaptive neural field optimal motion planner (ANFOMP) that considers both collision probability and obstacle height in 2D space. Our new navigation and mapping approach enables quadruped robots to adjust their height and behavior to navigate under arches and push through obstacles with smaller dimensions. The multi-agent mapping operation has proven to be highly accurate, with an obstacle reconstruction precision of 82%. Moreover, the quadruped robot can navigate with 3D obstacle information and the ANFOMP system, resulting in a 33.3% reduction in path length and a 70% reduction in navigation time.
|
|
17:00-18:00, Paper Mo-S5T5.6 | Add to My Program |
Adaptive Containment Control of Heterogeneous Multi-Agent Systems with Unknown Leaders and Communication Delays |
|
Bi, Cong | Nankai University |
Xu, Xiang | Southern University of Science and Technology |
Keywords: Adaptive Systems, Communications, Cooperative Systems and Control
Abstract: This paper investigates the containment control problem of heterogeneous linear multi-agent systems with nonuniform distributed communication delays under an assumption that only the neighboring agents of multiple leaders have access to the information on both the system matrices and the states of multiple leaders. Adaptive distributed observers under nonuniform distributed communication delays are proposed to estimate both the system matrices and the states of multiple leaders without the prior knowledge of communication delays. Then a novel distributed output feedback control strategy is proposed based on the adaptive distributed observer. It is shown that under the proposed control strategy, the output of each follower converges to the convex hull spanned by the outputs of the leaders.
|
|
17:00-18:00, Paper Mo-S5T5.7 | Add to My Program |
Formal Verification of Ethical Choices in Industrial Cyber-Physical Systems |
|
Liu, Yinling | University of Lorraine |
Hind, Bril El-Haouzi | University of Lorraine |
Keywords: Trust in Autonomous Systems, Cyber-physical systems, Modeling of Autonomous Systems
Abstract: This paper addresses the issue of formal verification of ethical choices in Industrial Cyber-Physical Systems. An innovative approach based on Beliefs, Desires, and Intentions (BDI) agents and model checking is proposed. To do so, we first give a formal definition of ethical rules. Based on this definition, an algorithm is then designed to implement ethical reasoning. Finally, we apply this approach to the TRACILOGIS Platform to illustrate its feasibility. Four properties are designed and checked. The verification results show that the agent with ethics can always reason out the least unethical actions to take.
|
|
Mo-S5T6 Virtual Session, Room T6 |
Add to My Program |
Image Processing and Pattern Recognition II |
|
|
|
17:00-18:00, Paper Mo-S5T6.1 | Add to My Program |
GRDet: Rotated Object Detection in Remote Sensing Images Based on Gaussian Distribution |
|
Cheng, Mengfan | Qilu University of Technology (Shandong Academy of Sciences) |
Li, Aimin | Qilu University of Technology |
Liu, Deqi | Qilu University of Technology (Shandong Academy of Sciences) |
Yao, Dexu | Qilu University of Technology (Shandong Academy of Sciences) |
Liu, Xiaohan | Qilu University of Technology (Shandong Academy of Sciences) |
Keywords: Image Processing and Pattern Recognition
Abstract: In recent years, many rotated object detection (ROD) methods have been proposed and have attracted wide attention in many fields. Most of them use anchor-based or Gaussian heatmaps for label assignment (LA), which cannot capture the shape and orientation characteristics of the rotated object and introduce a large number of hyperparameters. At the same time, most methods only add angle regression or use enclosing rectangles to realize ROD, which cannot express the object well. In this paper, we propose a new method for ROD, named GRDet, which is a keypoint-based anchor-free algorithm. GRDet can adaptively learn and represent an object with point sets, discarding the limitation of anchors on the size and orientation of the object. Specifically, we introduce a conversion function that is able to transform the point set into a rotated bounding box (RBB) for precise localization and classification. In addition, we propose a Gaussian-based dynamic label assignment (GDLA) strategy to realize the assignment of positive and negative (P&N) samples, which can adaptively learn according to the size and orientation characteristics of any rotated object. Moreover, we define an intersection over union (IoU) suitable for ROD, called Gaussian-IoU, which simulates the calculation of IoU by Gaussian distribution and solves the case that some points are not differentiable. Furthermore, we design a dynamic spatial quality constraint (DSQC) for RBBs, which can dynamically evaluate the quality of the predicted RBBs and adaptively select high-quality RBBs. We use KFIoU loss and introduce Gaussian center loss to supervise the training of the network. Extensive experiments on the DOTA dataset demonstrate the effectiveness of our proposed method.
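As an illustration of how a rotated bounding box can be modeled by a Gaussian distribution, the sketch below uses one common convention from the rotated-detection literature: the box center becomes the mean, and the covariance is the rotation of diag(w²/4, h²/4) by the box angle. This is an assumption for illustration only; the abstract does not state which parameterization GRDet uses, and the function name `rbox_to_gaussian` is hypothetical.

```python
import math

def rbox_to_gaussian(cx, cy, w, h, theta):
    """Map a rotated box (center, width, height, angle in radians)
    to a 2D Gaussian: mean (cx, cy) and covariance R * diag(w^2/4, h^2/4) * R^T.
    Assumed convention, not necessarily the one used in the paper."""
    c, s = math.cos(theta), math.sin(theta)
    a = (w / 2.0) ** 2  # variance along the box's width axis
    b = (h / 2.0) ** 2  # variance along the box's height axis
    sxx = a * c * c + b * s * s
    sxy = (a - b) * c * s
    syy = a * s * s + b * c * c
    return (cx, cy), ((sxx, sxy), (sxy, syy))

# An axis-aligned 4 x 2 box gives a diagonal covariance.
mean, cov = rbox_to_gaussian(10.0, 5.0, 4.0, 2.0, 0.0)
```

Because the covariance varies smoothly with the angle, a representation of this kind avoids the discontinuity that a raw angle value has at its range boundary.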
|
|
17:00-18:00, Paper Mo-S5T6.2 | Add to My Program |
MTN: A Multi-Scale Transformer Network for Different Resolution Remote Sensing Images Change Detection |
|
Zhu, Hongming | Tongji University |
Wu, Guodong | Tongji University |
Wang, Zeju | Tongji University |
Xu, Manxin | Tongji University |
Liu, Qin | Tongji University |
Liu, Sicong | Tongji University |
Du, Bowen | Tongji University |
Keywords: Image Processing and Pattern Recognition, Deep Learning
Abstract: In the field of change detection, detecting changes in images with different resolutions is crucial for both long-term interval scenes and scenarios that require rapid detection. However, existing methods face two main issues. Firstly, they require more stringent prior knowledge. Methods based on sub-pixel mapping (SPM) require pixel-level class labeling of high-resolution (HR) images, while the approaches based on image super-resolution require HR images corresponding to low-resolution (LR) images. Such prior knowledge is either costly for labeling or difficult to meet in realistic scenarios. Secondly, redundant error accumulation affects detection accuracy. Whether in traditional SPM methods or deep learning methods based on image super-resolution, the final detection results are obtained after generating HR images from LR images. The redundant error produced in this step accumulates and affects the final detection results. Although unsupervised methods do not have this problem, their detection accuracy is not as good as that of supervised methods. To address these issues, we propose a multi-scale Transformer network (MTN). This model first uses a multi-scale feature extractor (MFE) to extract multi-scale features and perform scale matching at the feature level. Then, the Transformer is used to extract long-range relationships of ground objects on the multi-scale features to enhance the features. Finally, the multi-scale features are fused, and a classifier composed of a convolutional network is used to obtain binary change detection results. In addition, we consider that the edges of objects may be affected during the scale matching process, and introduce a CEBoundary Loss to better detect object edges. The results on the LEVIR and Google datasets demonstrate the effectiveness of our proposed method. The source code of MTN is available at https://github.com/Gavin-debug/MultiResolutionCD.
|
|
17:00-18:00, Paper Mo-S5T6.3 | Add to My Program |
DualYOLO: Remote Small Target Detection with Dual Detection Heads Based on Multi-Scale Feature Fusion |
|
Zhang, Zhenqiang | Qilu University of Technology (Shandong Academy of Sciences) |
Li, Chuantao | Qilu University of Technology (Shandong Academy of Sciences), Sh |
Wang, Chunxiao | Qilu University of Technology (Shandong Academy of Sciences), Sh |
Lv, Jialiang | Qilu University of Technology (Shandong Academy of Sciences) |
Keywords: Image Processing and Pattern Recognition, Machine Vision, Machine Learning
Abstract: In recent years, accurate and real-time long-range small target detection has become a popular and challenging task, particularly in time-sensitive scenarios such as unmanned aerial vehicle (UAV) scene analysis and military reconnaissance. Most existing solutions rely on deep CNNs to learn strong feature representations of objects isolated from the background to detect small objects with minimal visual features in images. However, this approach incurs significant computational overhead. In this paper, we propose DualYOLO, a fast and accurate long-range small object detection method that combines multi-level multi-scale feature fusion (MLMFF) and concat channel attention (CatCA). Specifically, in order to prevent small targets from becoming more and more blurred after multilayer convolution operations, DualYOLO fuses the features of different layers in the backbone network to obtain small target features with strong semantics and high detail. Furthermore, we use a new loss function to address the sensitivity of IoU to small object position deviations, thereby improving detection accuracy. In terms of data preprocessing, we utilize an image slicing strategy to process the dataset. The experimental results show that DualYOLO achieves 82.2% accuracy (in terms of mAP@.5) on the VEDAI dataset processed using slices, with a performance more than 2% higher than that of large models (e.g., YOLOv5x, YOLOR, and YOLOv7).
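The image slicing step mentioned in the abstract can be pictured as cutting each large image into fixed-size tiles, optionally overlapping, so that small targets occupy more of each network input. The sketch below is a generic tiling routine under assumed parameters; the tile size, the overlap, and the name `slice_windows` are hypothetical and are not taken from the paper.

```python
def slice_windows(width, height, tile=512, overlap=0):
    """Return (x0, y0, x1, y1) crop windows covering a width x height image.
    Tiles are `tile` pixels wide and tall and overlap by `overlap` pixels;
    the last tile in each direction is shifted back to stay inside the image."""
    stride = tile - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than tile")

    def starts(length):
        # Start offsets along one axis.
        if length <= tile:
            return [0]
        offsets = list(range(0, length - tile, stride))
        offsets.append(length - tile)  # final tile flush with the border
        return offsets

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]

windows = slice_windows(1024, 1024, tile=512, overlap=0)  # four 512 x 512 tiles
```

Annotations would need to be clipped and shifted into each window's coordinate frame in the same pass; that bookkeeping is omitted here.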
|
|
17:00-18:00, Paper Mo-S5T6.4 | Add to My Program |
Object Semantics Give Us the Depth We Need: Multi-Task Approach to Aerial Depth Completion |
|
Hatami Gazani, Sara | University of Victoria |
Dadboud, Fardad | University of Ottawa |
Bolic, Miodrag | University of Ottawa |
Mantegh, Iraj | National Research Council Canada |
Najjaran, Homayoun | University of British Columbia |
Keywords: Image Processing and Pattern Recognition, Deep Learning, Neural Networks and their Applications
Abstract: Depth completion and object detection are two crucial tasks often used for aerial 3D mapping, path planning, and collision avoidance of Uncrewed Aerial Vehicles (UAVs). Common solutions include using measurements from a LiDAR sensor; however, the generated point cloud is often sparse and irregular and limits the system's capabilities in 3D rendering and safety-critical decision-making. To mitigate this challenge, information from other sensors on the UAV (viz., a camera used for object detection) is utilized to help the depth completion process generate denser 3D models. Performing both aerial depth completion and object detection tasks while fusing the data from the two sensors poses a challenge to resource efficiency. We address this challenge by proposing a novel approach to jointly execute the two tasks in a single pass. The proposed method is based on an encoder-focused multi-task learning model that exposes the two tasks to jointly learned features. We demonstrate how semantic expectations of the objects in the scene learned by the object detection pathway can boost the performance of the depth completion pathway while placing the missing depth values. Experimental results show that the proposed multi-task network outperforms its single-task counterpart, particularly when exposed to defective inputs.
|
|
17:00-18:00, Paper Mo-S5T6.5 | Add to My Program |
CAME: Convolution and Attention Construct Multi-Scale Neural Network Efficiently for Medical Image Classification |
|
Li, Jiuqiang | Southwest Jiaotong University |
Keywords: Image Processing and Pattern Recognition, Deep Learning, Machine Vision
Abstract: The study of medical image classification is of great importance to assist doctors in diagnosing conditions and intelligently identifying types of illnesses. However, unlike general image classification, this task is still very challenging because medical images have more complex and variable structural patterns. Two directions are currently explored, transfer learning and Transformer-based approaches, but both have certain drawbacks. For example, transfer learning requires a large amount of annotated data and its training process is tedious and complicated; the Transformer-based approach has high computational time complexity, which also makes it very time-consuming. Moreover, the representations that both approaches produce for classification need to be enhanced. To address these issues, our proposed CAME consists of three feature extraction modules at different scales, covering local feature information, global semantic representation, and external attention, together with a multi-scale feature aggregation (MSFA) module. The MSFA module enhances the semantics of each of the three scales of representations through space, through the attention mechanism, and then aggregates the three enhanced representations to obtain the final representation. In the experiments, the proposed CAME performs best among the baselines on the benchmarks under the three criteria and achieves end-to-end medical image classification.
|
|
17:00-18:00, Paper Mo-S5T6.6 | Add to My Program |
A Novel Multimodal Prototype Network for Interpretable Medical Image Classification |
|
Wang, Guangchen | Qilu University of Technology (Shandong Academy of Sciences) |
Li, Jinbao | Qilu University of Technology (Shandong Academy of Sciences) |
Tian, Cheng | Qilu University of Technology (Shandong Academy of Sciences) |
Ma, Xitong | Qilu University of Technology |
Liu, Song | Qilu University of Technology (Shandong Academy of Sciences) |
Keywords: Image Processing and Pattern Recognition, Deep Learning, Neural Networks and their Applications
Abstract: Medical image classification is a main task in the field of medical diagnosis. Some black-box models have achieved expert-level accuracy on medical datasets, but these models are less adopted in clinical practice due to their lack of interpretability. In the past few years, designing interpretable models has been one of the major challenges in the medical field. The existing interpretable prototype networks only use medical images for training, ignoring the role of medical reports, even though these reports can assist in prototype training and activation. Furthermore, existing prototype network methods neglect the position information in medical images, which is helpful for disease diagnosis. To address these shortcomings, we propose an interpretable medical image classification framework (MProtoNet) that improves the accuracy and interpretability of disease predictions. In MProtoNet, we design a multimodal attention module and use a prototype activation restriction loss to provide evidence for prototype training and activation. In addition, we design a position embedding module and a multi-factor similarity calculation method to effectively utilize the position information in the image. We conducted experiments on the chest datasets MIMIC-CXR and open-I to test the model and compare it with other baseline models. Experimental results show that MProtoNet improves accuracy while preserving the interpretability of the model.
|
|
17:00-18:00, Paper Mo-S5T6.7 | Add to My Program |
Automatic Deep Active Contour Model Based on Local Image Characteristics |
|
Haijun, You | South China University of Technology |
Kai, Li | The Third Affiliated Hospital of Sun Yat-Sen University |
Chen, Junying | South China University of Technology |
Keywords: Image Processing and Pattern Recognition, Hybrid Models of Computational Intelligence, Application of Artificial Intelligence
Abstract: In recent years, convolutional neural networks (CNNs) have made remarkable achievements in medical image segmentation. Unclear margins and fuzzy boundary regions are formed surrounding the thyroid gland area under the influence of artifacts and speckles, which increases the difficulty of thyroid ultrasound image segmentation. However, building an effective CNN model usually requires massive computing resources, long training time, and large amounts of carefully labeled data. In this work, we develop a fast, compact, accurate, and robust image segmentation model for automatic thyroid segmentation in medical ultrasound images. We propose an automatic deep active contour (DAC) model based on local image characteristics, which utilizes a deep neural network model to obtain the initial contour and constructs external energy terms based on local image characteristics of contour points. Compared with the deep neural network model, the DAC model requires less training data, and the segmentation performance of the DAC model on a small dataset can reach that of the deep neural network model on a larger dataset. The proposed DAC model can be used for automatic thyroid segmentation in medical ultrasound images without requiring massive computing resources, long training time, or large amounts of carefully labeled data.
|
|
Mo-S5T7 Virtual Session, Room T7 |
Add to My Program |
New Session for Latest Online Requests I |
|
|
|
17:00-18:00, Paper Mo-S5T7.1 | Add to My Program |
Adaptive Multitask Evolutionary Optimization for Tasks with Non-Uniform Evaluation Cost |
|
Wang, Wenhui | Sun Yat-Sen University |
Chen, Zefeng | Sun Yat-Sen University |
Zhou, Yuren | Sun Yat-Sen University |
Keywords: Evolutionary Computation
Abstract: In recent years, there has been a great deal of research on evolutionary multitasking. Most existing evolutionary multitask algorithms treat all optimization tasks equally, assuming that all tasks can be evaluated in the same amount of time. However, in practice, the function evaluation time of different tasks varies greatly due to their distinct properties. This paper proposes to formalize this special scenario of multitask optimization as multitask optimization with non-uniform evaluation cost, and concentrates on how to solve it effectively and efficiently. For this type of problem, we propose an adaptive evolutionary multitask optimization algorithm that uses a multi-population evolutionary framework to solve multiple tasks in parallel, allocating different computational resources to different tasks. It performs effective positive knowledge transfer between different tasks, and dynamically adjusts the amount of knowledge transfer according to the survival rate of offspring and transferred individuals. To verify the performance of the proposed algorithm, we conduct experiments on artificially constructed benchmark problems fitting our proposed scenario of multitask optimization with non-uniform evaluation cost. The experimental results show the superiority of our proposed algorithm over other state-of-the-art algorithms. For the multitask optimization problem with non-uniform evaluation cost, the proposed algorithm can solve tasks efficiently within a limited budget of computational resources.
|
|
17:00-18:00, Paper Mo-S5T7.2 | Add to My Program |
Data-Efficient Elitist Evolutionary Algorithm for Training Neural Networks |
|
Yang, Yurui | Sun Yat-Sen University |
Chen, Zefeng | Sun Yat-Sen University |
Keywords: Evolutionary Computation, Heuristic Algorithms, Machine Learning
Abstract: Currently, optimizers based on gradients dominate the training of neural networks. In contrast to gradient-based optimizers, evolutionary algorithms are population-based optimization algorithms that possess several notable advantages, such as strong parallel capability and the ability to escape from local optima. However, they have gradually been marginalized due to their low efficiency. In this paper, we propose an approach called the data-efficient elitist evolutionary algorithm (DeiEA) to address the low-efficiency issue of evolutionary-based optimizers. DeiEA utilizes a mini-batch of data, replacing the entire training dataset, to evaluate individuals in the population. Additionally, by carefully designing the procedure for producing offspring using three effective strategies, the proposed DeiEA can achieve faster convergence while maintaining population diversity.
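The core idea of scoring individuals on a mini-batch instead of the full training set can be sketched with a toy elitist evolutionary loop. The example below fits a two-parameter linear model rather than a neural network, uses plain Gaussian mutation, and keeps the best of parents plus offspring; all names and settings (`evolve`, `sigma`, population sizes) are hypothetical, and the three offspring strategies of DeiEA are not reproduced.

```python
import random

def mse(params, batch):
    """Mean squared error of the linear model y = w * x + b on a batch."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in batch) / len(batch)

def evolve(data, generations=200, pop_size=8, n_children=16,
           batch_size=16, sigma=0.1, seed=0):
    """Toy elitist EA: each generation is scored on one random mini-batch."""
    rng = random.Random(seed)
    pop = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(pop_size)]
    for _ in range(generations):
        batch = rng.sample(data, batch_size)  # mini-batch replaces full dataset
        children = []
        for _ in range(n_children):
            w, b = rng.choice(pop)
            children.append((w + rng.gauss(0, sigma), b + rng.gauss(0, sigma)))
        # Elitism: parents compete with offspring on the same mini-batch.
        pop = sorted(pop + children, key=lambda p: mse(p, batch))[:pop_size]
    return min(pop, key=lambda p: mse(p, data))

# Noise-free samples of y = 2x + 1 on [-1, 1].
data = [(i / 50 - 1, 2 * (i / 50 - 1) + 1) for i in range(101)]
best = evolve(data)
```

Each generation costs `batch_size` model evaluations per individual instead of `len(data)`, which is where the data efficiency comes from; the trade-off is a noisier fitness signal when the data itself is noisy.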
|
|
17:00-18:00, Paper Mo-S5T7.3 | Add to My Program |
Online Product Pricing Research Considering Price Anchoring and Online Reviews (I) |
|
Liu, Xuwang | Henan University |
Zhang, Qiannan | Henan University |
Qi, Wei | Henan University |
Guo, Xiwang | Liaoning Petrochemical University |
Wang, Jiacun | Monmouth University |
Tang, Ying | Rowan University |
Keywords: Cybernetics for Informatics, Computational Intelligence in Information, Big Data Computing
Abstract: The adjustment effect of the anchoring effect and online reviews on consumer cognition has grown to be a significant element influencing business pricing. This study explores the effects of online reviews and price anchoring on company pricing and profits by building an online product pricing model from the perspective of consumer purchasing psychology. The findings show that firms must take consumer anchoring psychology into account when making decisions if they want to increase revenue. Different pricing strategies are used depending on the variables associated with the quality of online reviews. The higher the sensitivity coefficient of reviews, in particular when the quality of the reviews is higher than a specified value, the bigger the profit. In the anchor point, the optimal price is rising. When a company chooses a higher price policy, the optimal price steadily decreases with the degree of anchoring.
|
|
17:00-18:00, Paper Mo-S5T7.4 | Add to My Program |
Bundle Pricing of Product Line and Value-Added Services Considering Reference Price Effect (I) |
|
Qi, Wei | Henan University |
Li, Nan | Henan University |
Liu, Xuwang | Henan University |
Guo, Xiwang | Liaoning Petrochemical University |
Wang, Jiacun | Monmouth University |
Tang, Ying | Rowan University |
Keywords: Soft Computing, Socio-Economic Cybernetics, Cybernetics for Informatics, Computational Intelligence in Information
Abstract: Price is an important index of consumers' purchase choice, and the price comparison behavior of consumers in the decision-making process also affects the profit and loss of their own purchase utility to varying degrees. Based on the multinomial logit (MNL) model, the reference price is incorporated into product line development and design, and the pricing decision for a bundle of product line and value-added services is studied. The influence mechanism of the reference price effect on optimal product pricing and maximum profit is analyzed, and the deviation of strategic decision-making caused by not considering the reference price effect is discussed. The results show that the reference price effect has a positive impact on the lowest-priced products in the product line, but a negative impact on the high-priced products in the product line, the total market share of the firm, and the total profit. When the reference price effect is ignored, the pricing of different quality products and services in the product line will be higher or lower, and the total market share and total profit will be higher. The results can provide theoretical support for product line design and pricing decisions.
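To make the role of a reference price in a multinomial logit model concrete, the sketch below computes choice shares with a no-purchase option of utility zero. The utility form, quality minus price plus `gamma` times the gap between the reference price and the product price, is an assumption chosen for illustration and is not taken from the paper; `mnl_shares` and its parameters are hypothetical.

```python
import math

def mnl_shares(qualities, prices, reference_price, gamma):
    """MNL choice probabilities with an outside (no-purchase) option.
    Assumed utility: u_i = q_i - p_i + gamma * (reference_price - p_i)."""
    utilities = [
        q - p + gamma * (reference_price - p)
        for q, p in zip(qualities, prices)
    ]
    # The outside option has utility 0 and contributes exp(0) = 1.
    denom = 1.0 + sum(math.exp(u) for u in utilities)
    return [math.exp(u) / denom for u in utilities]

# Two products: one priced below the reference price, one above it.
no_effect = mnl_shares([2.0, 3.0], [1.0, 2.5], reference_price=1.5, gamma=0.0)
with_effect = mnl_shares([2.0, 3.0], [1.0, 2.5], reference_price=1.5, gamma=1.0)
```

In this toy setting, raising `gamma` shifts share toward the product priced below the reference and away from the product priced above it, which matches the direction of the per-product effects described in the abstract.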
|
|
17:00-18:00, Paper Mo-S5T7.5 | Add to My Program |
Image-Guided Point Cloud Completion with Multi-Modal Fusion Transformers |
|
Li, ZhaoWen | Sun Yat-Sen University |
Lin, Shujin | Sun Yat-Sen University |
Zhou, Fan | Sun Yat-Sen University |
Keywords: Machine Vision, Deep Learning
Abstract: The task of image-guided point cloud completion aims to leverage information from images to address uncertainty issues in the completion inference of point clouds. The key challenge in this setting lies in how to effectively combine features extracted from both modalities. Due to the large domain discrepancy between images and point clouds, existing methods that use cross-modal attention to directly fuse features place increased attention on redundant information and noise from the different modalities, resulting in poor feature fusion performance. Hence, by introducing multi-modal fusion transformers that use bottleneck tokens, we enable point cloud features to learn from image features through information bridges, leading to improved point cloud completion performance. Our method can benefit not only from RGB images, but also from sketches, which carry less feature information but place more emphasis on edge information. Extensive experiments demonstrate that our proposed method enhances the quality of point cloud completion and outperforms other state-of-the-art methods.
|
|
17:00-18:00, Paper Mo-S5T7.6 | Add to My Program |
Are Deep Point Cloud Classifiers Suffer from Out-Of-Distribution Overconfidence Issue? |
|
He, Xu | Guangzhou University |
Tang, Keke | Guangzhou University |
Shi, Yawen | Guangzhou University |
Li, Yin | Fudan University |
Peng, Weilong | Guangzhou University |
Zhu, Peican | Northwestern Polytechnical University |
Keywords: Deep Learning, AI and Applications, Machine Vision
Abstract: 3D point cloud perception using deep neural networks (DNNs) has been a trend for various application scenarios. However, the black-box nature of DNNs will bring many hidden risks, as in the 2D image field. In this paper, we present a preliminary evaluation of the out-of-distribution (OOD) overconfidence issue of deep point cloud classifiers, which has been proven to exist in deep 2D image classifiers, i.e., OOD inputs will lead to overconfident predictions on predefined categories. We also investigate whether a simple thresholding baseline and two modern OOD detection solutions can handle the issue by detecting OOD samples. Extensive experiments with four representative deep point cloud classifiers trained and evaluated on different in-distribution and out-of-distribution point clouds validate the severity and knottiness of the OOD overconfidence issue. Our investigation will provide the groundwork for future studies on handling the OOD overconfidence issue of DNN classifiers for 3D point clouds.
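A thresholding baseline for OOD detection is often implemented by flagging an input when the classifier's maximum softmax probability falls below a cutoff. The sketch below assumes that form; the abstract does not specify which score or threshold the authors use, and `is_ood` with its default of 0.5 is hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def is_ood(logits, threshold=0.5):
    """Flag an input as OOD when the top class probability is below threshold."""
    return max(softmax(logits)) < threshold

confident = is_ood([10.0, 0.0, 0.0])  # one dominant class
uncertain = is_ood([0.1, 0.0, 0.0])   # nearly uniform prediction
```

The overconfidence issue discussed in the abstract is exactly the case where this baseline fails: an OOD input that produces a sharply peaked softmax passes the check as if it were in-distribution.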
|
|
17:00-18:00, Paper Mo-S5T7.7 | Add to My Program |
Optimal PD Control for Robots Using GAN and LSTM |
|
Hernandez, Ivan | CINVESTAV-IPN |
Yu, Wen | CINVESTAV-IPN |
Keywords: Deep Learning
Abstract: PD (proportional-derivative) control is a widely used model-free method for controlling robots. However, it does not guarantee optimal performance. Model-based optimal control methods, such as the linear quadratic regulator (LQR), can achieve the desired control performance, but they are only suitable for linear systems that are well understood. In this paper, we propose a novel approach to design an optimal PD control for unknown robot systems using Conditional Generative Adversarial Networks (C-GAN) and long short-term memory (LSTM) to approximate LQR PD control. This new control mechanism ensures both stability and optimal performance. We apply this method to control lower limb prostheses, and our results demonstrate that the optimal PD control using GAN and LSTM outperforms classical controllers.
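For readers unfamiliar with the baseline, a PD law computes the control input as a proportional gain times the position error plus a derivative gain times the error rate. The sketch below simulates it on a unit-mass point moving along one axis, which is an assumed toy plant; the gains, time step, and the name `simulate_pd` are hypothetical, and the GAN and LSTM components of the paper are not represented.

```python
def simulate_pd(target, kp=25.0, kd=10.0, dt=0.01, steps=1000):
    """Drive a unit-mass point from rest at 0 toward `target` with PD control.
    Returns the position after `steps` integration steps."""
    pos, vel = 0.0, 0.0
    for _ in range(steps):
        error = target - pos
        # For a constant target, the error derivative is -vel.
        force = kp * error - kd * vel
        vel += force * dt  # unit mass: acceleration equals force
        pos += vel * dt    # semi-implicit Euler integration
    return pos

final_position = simulate_pd(1.0)
```

With these gains the closed loop is critically damped, but nothing in the PD law itself makes that choice optimal for a given cost, which is the gap the abstract describes.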
|
|
17:00-18:00, Paper Mo-S5T7.8 | Add to My Program |
Application Analysis and Exploration of Hybrid-Augmented Intelligence in Power System |
|
Fan, Shixiong | China Electric Power Research Institute |
Zhao, Zening | China Electric Power Research Institute |
Ma, Shicong | China Electric Power Research Institute |
Guo, Jianbo | China Electric Power Research Institute |
Wang, Guozheng | China Electric Power Research Institute |
Xu, Haotian | China Electric Power Research Institute |
Keywords: Cognitive Computing, Systems Safety and Security, Supervisory Control
Abstract: The new generation of artificial intelligence (AI) technology will play an important role in promoting the digitalization, informatization and intelligence of the future power grid due to its high-dimensional state intelligent perception and rapid decision-making capabilities. However, its inherent shortcomings such as poor interpretability and fragility also limit the further application of AI technology in power systems. This paper first introduces hybrid-augmented intelligence (HAI) technology and its application development in the fields of autonomous driving and industrial robots. Combining the characteristics of the power system and AI technology, the requirements of the power system for HAI are analyzed and summarized. Secondly, the key technologies involved in human-machine collaborative HAI are analyzed in terms of data processing, model training and model application. On this basis, the application of HAI technology in typical scenarios such as power flow section regulation is designed and analyzed, which provides reference for subsequent engineering applications. Finally, the challenges faced by the application of HAI in power systems are analyzed and prospected, aiming to promote and enrich the development of basic theories and key technologies of hybrid intelligence in power systems.
|
|
Mo-S5T8 Virtual Session, Room T8 |
Add to My Program |
Deep Learning V-IV |
|
|
|
17:00-18:00, Paper Mo-S5T8.1 | Add to My Program |
Social Occlusion Inference with Vectorized Representation for Autonomous Driving |
|
Huang, Bochao | Tongji University |
Sun, Ping | Tongji University |
Keywords: Deep Learning, Agent-Based Modeling, Application of Artificial Intelligence
Abstract: Autonomous vehicles must be capable of handling the occlusion of the environment to ensure safe and efficient driving. The social occlusion inference task focuses on inferring occupancy from agent behaviors as a remedy for perceptual deficiencies. We identify visible trajectories, road context, and occlusion information as the three key environmental elements represented by vectors in our method. Therefore, this paper introduces a novel social occlusion inference approach that learns a mapping from three types of vectors to an occupancy grid map (OGM) representing the view of the ego vehicle. Specifically, vectorized inputs are encoded through the polyline encoder to aggregate vector-level features into polyline-level features. Since vehicles are constrained by the road and affected by other agents and occlusion areas, we exploit a transformer module that models the high-order social interactions of three types of polylines. Importantly, to address the inconsistency between input and output modalities and introduce prior knowledge of occlusion, occlusion queries are proposed to fuse polyline features and generate the OGM without the input of visual modality. We evaluate our approach on the INTERACTION dataset, which achieves on-par or better performance than the baselines. The ablation study demonstrates that three key elements of input can enhance the performance of our network.
|
|
17:00-18:00, Paper Mo-S5T8.2 | Add to My Program |
MA-Net: Multi-Scale Adaptive Network for Oriented Object Detection in Remote Sensing Images |
|
Pan, Jiahao | Nanjing University of Science and Technology |
Zhang, Chongyang | Nanjing University of Science and Technology |
Keywords: Deep Learning, Application of Artificial Intelligence
Abstract: The remote sensing image detection task faces significant challenges, e.g., complex backgrounds, dense distribution, widely varied target scales, and diverse target directions. This paper proposes an oriented object detection network for remote sensing images (MA-Net) that integrates parallel multi-scale feature extraction and adaptive dynamic label assignment. The network is based on the FCOS object detection algorithm and introduces a parallel multi-scale feature extraction module to enhance the model's understanding of multi-scale objects. The adaptive dynamic label assignment algorithm is introduced to enable the model to select samples that are more worthy of learning. According to the number of positive samples at different levels, hierarchical supervision reweighting is used to help the model focus on more important levels and improve detection accuracy. Experiments are conducted on two public datasets, DOTA and HRSC2016. Compared with the original FCOS algorithm, the detection accuracy of our algorithm is improved by 2.45% and 4.44%, respectively, and its performance is comparable to that of state-of-the-art detection methods.
|
|
17:00-18:00, Paper Mo-S5T8.3 | Add to My Program |
DiscSumm: A Discourse Based Saliency Model to Generate Abstract Summaries for Conversation |
|
Srivastava, Siddharth | Natwest Group |
Tripathi, Anurag | IIT Delhi |
Keywords: Deep Learning, Machine Learning
Abstract: Summarization of a large corpus of text is an important and hard problem. The amount of text information is increasing at an exponential rate, which makes its consumption and organization difficult. This data also includes information obtained from transcription and chat, along with traditional long-form content such as books, periodicals, web pages, etc. In this paper, we focus on long conversations obtained via transcription of multi-speaker audio calls. Most traditional work on text summarization focuses on structured text; we instead focus on multi-turn conversation summarization. Although many recent works perform model fine-tuning, which improves abstractive summarization many-fold, it can still be improved further by incorporating text into the encoder framework. Therefore, in this work we study a discourse-based saliency model that identifies the important segments within a text conversation and combines them adaptively to generate a summary of the conversation. We demonstrate that the proposed method achieves state-of-the-art results on public benchmarks.
|
|
17:00-18:00, Paper Mo-S5T8.4 | Add to My Program |
MOBTAG: Multi-Objective Optimization Based Textual Adversarial Example Generation |
|
Qiao, Yuanxin | Beijing Information Science and Technology University |
Xie, Ruilin | Beijing Information Science and Technology University |
Li, Li | Beijing Information Science and Technology University |
He, Qifan | Beijing Information Science and Technology University |
Cui, Zhanqi | Beijing Information Science and Technology University |
Keywords: Deep Learning, Neural Networks and their Applications, Machine Learning
Abstract: Natural language processing (NLP) models are vulnerable to adversarial examples. Generating high-quality adversarial examples, which expose the vulnerability of NLP models and can be used to evaluate and improve their robustness, deserves further research. Existing techniques for generating adversarial examples in the NLP field are typically based on greedy synonym replacements, which may result in out-of-context and unnatural perturbations that are easily identifiable by humans. In this paper, we present MOBTAG, a Multi-objective Optimization based Textual Adversarial Example Generation method, which includes three types of perturbations and utilizes pre-trained models such as BERT and RoBERTa to generate high-quality adversarial examples. MOBTAG generates fluent and grammatical output through a mask-then-infill procedure, and introduces multi-objective optimization and a genetic algorithm to pursue a high attack success rate while maintaining a high level of similarity and readability. Experimental results show that, compared with methods such as TextFooler, BERTAttack, and CLARE, MOBTAG improves the attack success rate and the textual similarity by at least 11.8% and 0.09 on average, respectively.
|
|
17:00-18:00, Paper Mo-S5T8.5 | Add to My Program |
DA-MTAD: Capturing Intra and Inter-Metric Dependencies for Multivariate Time Series Anomaly Detection |
|
Meng, Zhaoyang | Beijing University of Posts and Telecommunications |
Zhu, Xinning | Beijing University of Posts and Telecommunications |
Pan, Feng | Beijing University of Posts and Telecommunications |
Hu, Zheng | Beijing University of Posts and Telecommunications |
Keywords: Deep Learning, Machine Learning
Abstract: Multivariate time series anomaly detection is a challenging task due to the intricate intra- and inter-metric dependencies present in the data. Previous methods have mainly focused on capturing intra-metric dependency, while neglecting the dependency information across metrics. More recent approaches have considered inter-metric dependency, but have not been able to capture the complex inter-metric dynamics accurately and adaptively. To address these challenges, we propose a novel dual-attentional multivariate time series anomaly detection framework. Our approach improves on the original Dot-Product Attention to robustly capture intra-metric dependency. Additionally, we introduce automatic graph structure learning and a graph attention mechanism to adaptively capture inter-metric dependency and utilize it effectively. Experiments on five public datasets from different domains demonstrate that comprehensive and accurate modeling of both dependencies enables our proposed method to outperform baseline methods in accurately detecting anomalies.
|
|
17:00-18:00, Paper Mo-S5T8.6 | Add to My Program |
MC-PDARTS: Multi-Cell Progressive Differentiable Architecture Search |
|
Liu, Qun | South China University of Technology |
Cai, Yi | South China University of Technology |
Chen, Junying | South China University of Technology |
Keywords: Deep Learning, Machine Learning, Neural Networks and their Applications
Abstract: Recently, network architecture search has been applied to more and more tasks, but its high cost makes it difficult for small research teams or individual researchers to follow. The differentiable architecture search (DARTS) algorithm provides a more accessible solution for searching effective network architectures. Although the DARTS algorithm has greatly reduced the search cost compared with other evolution-based or reinforcement-learning-based methods, there is still a long way to go before it is widely used, since gradient-based methods consume a lot of GPU memory and their search space lacks diversity. Encouraged by the idea of progressive differentiable architecture search (P-DARTS), we propose a multi-cell progressive differentiable architecture search (MC-PDARTS) algorithm, which allows cells at different levels to learn different structures, and progressively searches for the best structure for the corresponding level in each search stage. The proposed algorithm greatly enlarges the search space of the search network while reducing the memory consumption and search time of the search process, and alleviates the problem of performance collapse caused by too many skip connections, which often occurs in previous gradient-based methods. Compared with previous algorithms, the proposed MC-PDARTS algorithm achieves state-of-the-art performance on the CIFAR-10 and CIFAR-100 datasets with a search of around 4 hours on a single Nvidia GTX1080Ti GPU. In order to reduce the performance loss in the transfer process, MC-PDARTS searches directly on a large target dataset (ImageNet) without proxy tasks, achieving state-of-the-art performance.
|
|
17:00-18:00, Paper Mo-S5T8.7 | Add to My Program |
OpenNet: Incremental Learning for Autonomous Driving Object Detection with Balanced Loss |
|
Wang, Zezhou | East China Normal University |
Cao, Guitao | East China Normal University |
Xi, Xidong | East China Normal University |
Wang, Jiangtao | East China Normal University |
Keywords: Deep Learning
Abstract: Automated driving object detection has always been a challenging task in computer vision due to environmental uncertainties. These uncertainties include significant differences in object sizes and encounters with unseen classes. Traditional object detection models may perform poorly when directly applied to automated driving detection, because they usually presume fixed categories of common traffic participants, such as pedestrians and cars. Worse, the huge class imbalance between common and novel classes further exacerbates performance degradation. To address these issues, we propose OpenNet. To moderate the class imbalance, we use a Balanced Loss based on cross-entropy loss. In addition, we adopt an inductive layer based on gradient reshaping to quickly learn new classes with limited samples during incremental learning. To counter catastrophic forgetting, we employ normalized feature distillation. We also improve multi-scale detection robustness and unknown class recognition through FPN and energy-based detection, respectively. Experimental results on the CODA dataset show that the proposed method obtains better performance than existing methods.
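Editorial illustration (not from the paper): the abstract does not define its Balanced Loss, so the sketch below only shows one common way to rebalance cross-entropy, namely weighting each sample by the inverse frequency of its true class. The function names and the weighting rule are assumptions made for illustration.

```python
import math
from collections import Counter


def class_weights(labels, num_classes):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumed weighting rule (not taken from the paper). When every class
    appears, the weights average to 1 over the samples.
    """
    counts = Counter(labels)
    total = len(labels)
    weights = []
    for c in range(num_classes):
        n = counts.get(c, 0)
        # Classes with no samples get weight 0; they never contribute anyway.
        weights.append(total / (num_classes * n) if n > 0 else 0.0)
    return weights


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def balanced_cross_entropy(batch_logits, labels, weights):
    """Weighted mean of per-sample cross-entropy, weighted by true class."""
    total_loss = 0.0
    total_weight = 0.0
    for logits, y in zip(batch_logits, labels):
        p = softmax(logits)
        total_loss += -weights[y] * math.log(p[y])
        total_weight += weights[y]
    return total_loss / total_weight
```

With three samples of a common class and one of a rare class, the rare sample receives three times the weight of each common one, so errors on it are not drowned out by the majority class.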
|
|
Mo-S5T9 Virtual Session, Room T9 |
Add to My Program |
New Session for Latest Online Requests III |
|
|
|
17:00-18:00, Paper Mo-S5T9.1 | Add to My Program |
Workload-Aware Cache Replacement Policy Based on Bayesian Inference |
|
He, Yandong | National University of Defense Technology |
Wan, Zhong | University of Defence Tech |
Chen, Renzhi | Defense Innovation Institute |
Keywords: AI and Applications, Metaheuristic Algorithms
Abstract: The replacement policy contributes to enhancing the cache hit ratio, indirectly affecting the performance of the processor. Previously proposed static replacement policies are limited to certain classes of workload types, failing to achieve high hit rates across various benchmarks. In this work, we propose a self-adaptive replacement algorithm. The algorithm detects the drop points of the hit ratio online based on Bayesian inference and, at these change points, selects a new policy from a pool of multiple policies according to its weight. In addition, the algorithm updates the weights based on the performance of the selected policy. Compared to static replacement policies, our algorithm applies to more access patterns and achieves a high hit ratio. We choose 15 benchmarks from DPC3 and concatenate them to generate a total of 13 composite benchmarks. We run our algorithm using a 2MB last-level cache (LLC) and show that our algorithm improves the hit rate by 2.6% over LRU and 10.6% over Random.
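Editorial illustration (not from the paper): a minimal sketch of weight-based selection from a policy pool. The abstract does not give the update rule or the Bayesian detector, so the exponential weight update, the class and function names, and the simple windowed-mean drop test standing in for Bayesian change-point detection are all assumptions.

```python
import math
import random


class PolicySelector:
    """Keeps one weight per replacement policy and samples by weight."""

    def __init__(self, policy_names, learning_rate=0.5, seed=0):
        self.weights = {name: 1.0 for name in policy_names}
        self.learning_rate = learning_rate  # hypothetical tuning parameter
        self.rng = random.Random(seed)
        self.current = policy_names[0]

    def probabilities(self):
        """Selection probability of each policy, proportional to its weight."""
        total = sum(self.weights.values())
        return {name: w / total for name, w in self.weights.items()}

    def update(self, hit_ratio):
        """Reward the active policy according to its observed hit ratio."""
        self.weights[self.current] *= math.exp(self.learning_rate * hit_ratio)

    def reselect(self):
        """Draw a new active policy; meant to be called at a change point."""
        names = list(self.weights)
        picked = self.rng.choices(names, weights=[self.weights[n] for n in names])
        self.current = picked[0]
        return self.current


def hit_ratio_dropped(history, window=3, threshold=0.1):
    """Placeholder detector (NOT Bayesian): compares two adjacent window means."""
    if len(history) < 2 * window:
        return False
    earlier = sum(history[-2 * window:-window]) / window
    recent = sum(history[-window:]) / window
    return earlier - recent > threshold
```

In use, a simulator would call `update` after each measurement interval and call `reselect` only when `hit_ratio_dropped` (or a proper change-point detector) fires.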
|
|
17:00-18:00, Paper Mo-S5T9.2 | Add to My Program |
It Is about Weather: Explainable Machine Learning for Traffic Accident Understanding (I) |
|
Salamat, Syabil Soedirman | Singapore Institute of Technology |
Liu, Fang | Singapore University of Social Sciences |
Fan, Zengyan | Singapore University of Social Sciences |
Zhang, Wei | Singapore Institute of Technology |
Keywords: Intelligent Transportation Systems, Smart Buildings, Smart Cities and Infrastructures, Cyber-physical systems
Abstract: Road traffic accidents cause injuries, claim lives, and disrupt economic activities. They are among the key problems for intelligent transportation and smart cities. Cities, especially megacities, must strive to reduce accidents for public safety and sustainable growth, and the first task is to understand accidents. In this paper, we build such understanding with emerging explainable machine learning (ML) techniques. We prepare a huge dataset with over two million accident records and use it to deliver ML models for accident modelling. Given high-fidelity ML models that map accident features and conditions to accident severity, we apply several explainable ML techniques to explain the models and understand accidents. We first consider coarse granularity to capture the overall feature importance. Then we consider fine-granularity methods, including partial dependence plots and Shapley additive explanations. The former shows that feature impact varies at different feature values and uses a plot to reflect the impact changes. The latter focuses on an individual accident and quantifies the feature impact for that specific accident. Our core observation is that traffic accidents are often about weather, followed by location and road type. An extensive experimental study is performed to support our discussion and justify our conclusion. The deliverable of this paper offers an advanced way of understanding traffic accidents accurately in a quantitative manner and has great potential to be used for intelligent transportation and smart city applications.
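Editorial illustration (not from the paper): Shapley additive explanations attribute one prediction to individual features. The sketch below computes exact Shapley values by enumerating feature orderings, with absent features replaced by a baseline value. This is feasible only for a handful of features, and practical tools approximate it. The toy `severity_score` model and its three inputs are hypothetical.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley values of `model` at `instance` relative to `baseline`.

    Each feature's value is its marginal contribution, averaged over all
    orderings in which features are switched from baseline to instance.
    """
    n = len(instance)
    contributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = model(current)
        for i in order:
            current[i] = instance[i]  # reveal feature i
            output = model(current)
            contributions[i] += output - previous_output
            previous_output = output
    return [c / len(orderings) for c in contributions]


def severity_score(x):
    """Hypothetical toy model: x = [weather, location, road_type]."""
    return 3.0 * x[0] + 2.0 * x[1] - 1.0 * x[2]


phi = shapley_values(severity_score, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```

For a linear model such as this one, each value reduces to the coefficient times the feature's deviation from the baseline, and the values always sum to the difference between the prediction and the baseline prediction.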
|
|
17:00-18:00, Paper Mo-S5T9.3 | Add to My Program | |