 
Last updated on October 1, 2019. This conference program is tentative and subject to change
Technical Program for Wednesday September 25, 2019

WeA1 Invited Session, Library 
Add to My Program 
Learning and Planning in Adversarial Environments 


Chair: Ornik, Melkior  University of Illinois at UrbanaChampaign 
CoChair: Tran, Huy  University of Illinois at UrbanaChampaign 
Organizer: Ornik, Melkior  University of Illinois at UrbanaChampaign 
Organizer: Tran, Huy  University of Illinois at UrbanaChampaign 

08:3008:50, Paper WeA1.1  Add to My Program 
NonEquilibrium Learning and CyberPhysical Security (I) 
Vamvoudakis, Kyriakos  Georgia Inst. of Tech 
Kanellopoulos, Aris  Georgia Inst. of Tech 
Keywords: Machine Learning and Learning Theory, Dynamic Games
Abstract: This paper introduces a framework for nonequilibrium behavior analysis in cyberphysical systems for security purposes. To categorize the player, we employ the principles of reinforcement learning in order to derive an iterative method of optimal responses that determine the policy of an agent with levelk intelligence in a general nonzerosum, nonlinear environment. For the special case of zerosum, linear quadratic games we derive appropriate nonequilibrium game Riccati equations. To obviate the need for complete knowledge of the system dynamics, we employ a Qlearning algorithm as a best response solver. We then design an estimator that determines the distribution of intelligence levels in the adversarial environment of the system. Finally, simulation results showcase the efficacy of our approach.


08:5009:10, Paper WeA1.2  Add to My Program 
Resilience for ConsensusBased Distributed Algorithms in Hostile Environment (I) 
Wang, Xuan  Purdue 
Mou, Shaoshuai  Purdue University 
Sundaram, Shreyas  Purdue University 
Keywords: Distributed and LargeScale Systems, Reliability, Security and Trust, Complex Networked Systems
Abstract: This is a summary of an invited talk in “Learning and Planning in Adversarial Environment” at the 57th Annual Allerton Conference on Communication, Control and Computing on September 25, 2019. In this talk we mainly present an approach to achieved automated resilience without identification of malicious agents for consensusbased distributed algorithms.


09:1009:30, Paper WeA1.3  Add to My Program 
Reinforcement Learning in Adversarial Environments with Temporal Logic Specifications (I) 
Wen, Min  University of Pennsylvania 
Topcu, Ufuk  The University of Texas at Austin 

09:3009:50, Paper WeA1.4  Add to My Program 
Opportunistic Synthesis of Reactive Systems with Information Asymmetry (I) 
Fu, Jie  Worcester Polytechnic Institute 
Keywords: Learning and Inference
Abstract: Reactive systems are systems that interact with dynamic environments to satisfy temporal logical specifications. Synthesis of reactive systems hinges on gametheoretic analysisthat is, the system plays against the environment, and its winning strategy guarantees correct behavior for the specification. However, in many cyberphysical systems with strategic interactions, information asymmetry is omnipresent. We study a class of reactive systems in which part of the system's specification is intentionally hidden from the adversarial environment. We extend the formalism of hypergame to represent games with asymmetric information and boolean payoffs specified in temporal logic formulas. Building on the solution concepts of hypergame, we develop a method of opportunistic synthesis that enables a system to leverage its information for a strategic advantage against the adversary environment. Mainly, when the adversary's strategy is ineffective against the system's hidden goals due to the lack of information, probabilistic planning is introduced to maximize the probability of achieving the preferred temporal logical objectives.


09:5010:10, Paper WeA1.5  Add to My Program 
Training Classifiers for Control Using Unreliable Data (I) 
Poonawala, Hasan A.  University of Kentucky 

WeA2 Regular Session, Solarium 
Add to My Program 
Games and Algorithms 


Chair: BehrouziFar, Amir  Rutgers University 

08:3008:50, Paper WeA2.1  Add to My Program 
Redundancy Scheduling in Systems with BiModal Job Service Time Distribution 
BehrouziFar, Amir  Rutgers University 
Soljanin, Emina  Rutgers University 
Keywords: Performance Analysis
Abstract: Queuing systems with redundant requests have drawn great attention because of their promise to reduce the job completion time and its variability. Despite a large body of work on this topic, we are still far from fully understanding the benefits of redundancy in practice. We here take one step towards practical systems by studying queuing systems with bimodal job service time distribution. Such distributions have been observed in practical systems, as can be seen in Google cluster traces. We develop an analogy to a classical urns and balls problem, and use it to study the queuing time performance of two nonadaptive classical scheduling policies: random and roundrobin. In particular, we introduce new performance metrics in the analogous model, and argue that they are good indicators of the queuing time performance of nonadaptive scheduling policies. We then propose a new nonadaptive scheduling policy that is based on combinatorial designs, and show that it improves our performance metrics. We observe in simulation that the proposed scheduling policy, as the performance metrics indicate, reduces the queuing times up to 25% compared to random scheduling and up to 100% compared roundrobin scheduling.


08:5009:10, Paper WeA2.2  Add to My Program 
On Analysis of the Bitcoin and Prism Backbone Protocols in Synchronous Networks 
Li, Jing  NORTHWESTERN UNIVERSITY 
Guo, Dongning  Northwestern University 
Keywords: Network Games and Algorithms, Reliability, Security and Trust, Performance Analysis
Abstract: Bitcoin is a peertopeer payment system proposed by Nakamoto in 2008. The bitcoin backbone protocol has been analyzed in some depth: the blockchain growth property quantifies the number of blocks added to the blockchain during any time intervals; the blockchain quality property ensures the honest miners always contribute at least a certain fraction of the blockchain; the common prefix property ensures if a block is deep enough, it must be adopted by all honest miners with high probability. The Prism protocol was recently proposed to dramatically improve the blockchain throughput while maintaining the same level of security. Prior analyses of the bitcoin and Prism backbone protocols assume the lifespan of blockchain is finite. This paper presents a streamlined and strengthened analysis in synchronous networks without the finite lifespan assumption. Specifically, the results include a blockchain growth property, a blockchain quality property, and a common prefix property of the bitcoin backbone protocol, as well as the liveness and persistence of the Prism backbone protocol regardless of whether the blockchains have finite lifespan. The properties take the form of explicit expressions in lieu of order optimal results.


09:1009:30, Paper WeA2.3  Add to My Program 
Optimizing InterDatacenter Tail Flow Completion Times Using Best WorstCase Routing 
Noormohammadpour, Max  University of Southern California 
Srivastava, Ajitesh  University of Southern California 
Raghavendra, Cauligi  University of Southern California 
Keywords: Network Games and Algorithms, Performance Analysis, Pricing and Congestion Control
Abstract: Flow routing over interdatacenter networks is a wellknown problem where the network assigns a path to a newly arriving flow potentially according to the network conditions and the properties of the new flow. An essential systemwide performance metric for a routing algorithm is the flow completion times, which affect the performance of applications running across multiple datacenters. Current static and dynamic routing approaches do not take advantage of flow size information in routing, which is practical in a controlled environment such as interdatacenter networks that are managed by the datacenter operators. In this paper, we discuss Best Worstcase Routing (BWR), which aims at optimizing the tail completion times of longrunning flows over interdatacenter networks with nonuniform link capacities. Since finding the path with the best worstcase completion time for a new flow is NPHard, we investigate two heuristics of BWRH and BWRHF which use two different upper bounds on the worstcase completion times for routing. We evaluate BWRH and BWRHF against several real WAN topologies and multiple traffic patterns. Although BWRH better models the BWR problem, BWRH and BWRHF show negligible difference across various systemwide performance metrics, while BWRHF being significantly faster. Furthermore, we show that compared to other popular routing heuristics, BWRHF can reduce the mean and tail flow completion times by over 1.5x and 2x, respectively.


09:3009:50, Paper WeA2.4  Add to My Program 
ContinuousTime Markov Decision Processes with Controlled Observations 
Huang, Yunhan  New York University 
Voleti, Veeraruna Kavitha  IIT Bombay 
Zhu, Quanyan  New York University 
Keywords: Optimization, Dynamic Games, Machine Learning and Learning Theory
Abstract: In this paper, we study a continuoustime discounted jump Markov decision process with both controlled actions and observations. The observation is only available for a discrete set of time instances. At each time of observation, one has to select an optimal timing for the next observation and a control trajectory for the time interval between two observation points. We provide a theoretical framework that the decision maker can utilize to find the optimal observation epochs and the optimal actions jointly. Two cases are investigated. One is gated queueing systems in which we explicitly characterize the optimal action and the optimal observation where the optimal observation is shown to be independent of the state. Another is the inventory control problem with Poisson arrival process in which we obtain numerically the optimal action and observation. The results show that it is optimal to observe more frequently at a region of states where the optimal action adapts constantly.


WeA3 Regular Session, Butternut 
Add to My Program 
Information Limits 


Chair: Wei, Shuangqing  Louisiana State University 

08:3008:50, Paper WeA3.1  Add to My Program 
Information Bottleneck Problem Revisited 
Bayat, Farhang  Louisiana State University 
Wei, Shuangqing  Louisiana State University 
Keywords: Network Information Theory, Optimization, Information Theory
Abstract: In this paper, we revisit the information bottleneck problem whose formulation and solution are of great importance in both information theory and statistical learning applications. We go into details as to why the problem was first introduced and how the algorithm proposed using Lagrangian method to solve such problems fell short of an exact solution. We then revisit the limitations of such Lagrangian methods, and propose to adopt a more systematic method, namely, Alternate Direction Method of Multipliers (ADMM) to develop a more efficient ADMM algorithm with randomized permutation orders to solve such problems. More importantly, we mathematically demonstrate how our suggested method outperforms the original Information Bottleneck (IB) method. At the end, we provide numerical results to demonstrate the notable advantages our algorithm attains as compared with the wellknown IB approach in terms of both attained objective function values and the resulting constraints. We further inspect the concepts of accuracy and convergence and the tradeoff between them in our method.


08:5009:10, Paper WeA3.2  Add to My Program 
Adaptive Coded Modulation for Stabilization of Wireless Networked Control Systems Over Binary Erasure Channels 
Royyan, Muhammad  Aalto University 
Vehkapera, Mikko  Aalto University 
Charalambous, Themistoklis  Aalto University 
Wichman, Risto  Aalto University 
Keywords: Network Information Theory, Optimization, Adaptive Control and Automation
Abstract: This paper proposes adaptive coded modulation for stabilization of wireless networked control systems (WNCSs). We combine a wellknown data rate theorem with adaptive coded modulation to guarantee stability and optimize the spectral efficiency in WNCSs. We believe that this is the first work in adaptive coded modulation for stabilization. In addition, we propose three schemes to optimize various objectives with given constraints. Our proposed schemes are as follows: maximizing throughput with energy Constraint (MaxTEC), minimizing energy with throughput constraint (MinETC), and minimizing delay with energy constraint (MinDEC). The numerical results show that each scheme is able to select the optimal modulation to optimize objectives at given various channel conditions and constraints.


09:1009:30, Paper WeA3.3  Add to My Program 
Fundamental Limits of QuantumSecure Covert Communication Over Bosonic Channels 
Bullock, Michael S.  University of Arizona 
Gagatsos, Christos N.  University of Arizona 
Guha, Saikat  University of Arizona 
Bash, Boulat A.  University of Arizona 
Keywords: Information Theory, Detection and Estimation
Abstract: We investigate the fundamental limit of quantumsecure covert communication over the lossy thermal noise bosonic channel, the quantummechanical mode underlying many practical channels. We assume that the adversary has unlimited quantum information processing capabilities as well as access to all transmitted photons that do not reach the legitimate receiver. Given existence of noise that is uncontrolled by the adversary, the square root law (SRL) governs covert communication: up to c*sqrt{n} covert bits can be transmitted reliably in n channel uses. Attempting to surpass this limit results in detection with unity probability as n approaches infinity. Here we present the expression for c, characterizing the SRL for the bosonic channel. We also prove that discretevalued coherent state quadrature phase shift keying (QPSK) constellation achieves the optimal c, which is the same as that achieved by a circularlysymmetric complexvalued Gaussian prior on coherent state amplitude. Finally, while binary phase shift keying (BPSK) achieves the Holevo capacity for noncovert bosonic channels in the low received signaltonoise ratio regime, we show that it is strictly suboptimal for covert communication.


09:3009:50, Paper WeA3.4  Add to My Program 
Flow Decomposition 
Ponniah, Jonathan  University of Illinois at UrbanaChampaign 
Xie, LiangLiang  University of Waterloo 
Keywords: Network Information Theory, Complex Networked Systems
Abstract: The framework of flow decomposition is proposed to unify regular encoding/decoding schemes in multisource multirelay multicast channels; flows describe encoding schemes and layered partitions describe decoding schemes. Flow decomposition reveals a fundamental duality between compressforward and decodeforward schemes with broader implications for combinatorial optimization over submodular functions. The main result proves that regular decoding schemes collectively achieve the regional cutset extension of the onerelay decodeforward rate. The proof mimics interiorpoint methods in convex optimization. A shifting algorithm is used to construct a sequence of layered partitions that converges to a target ratevector, where the number of shifts is linear in the network size. Flow decomposition inherits the benefits of regular decoding without the drawbacks of backward decoding: linear (as opposed to exponential) encoding delays and channels with unrestricted (as opposed to strictly hierarchical) flow.


09:5010:10, Paper WeA3.5  Add to My Program 
On CompressForward Schemes for Relay Networks 
Ponniah, Jonathan  University of Illinois at UrbanaChampaign 
Keywords: Network Information Theory, Complex Networked Systems
Abstract: The compressforward scheme proposed by Yassaee and Aref is revisited, as part of a unified framework for regular encoding/decoding based on layered partitions. The framework reveals a fundamental duality between compressforward and decodeforward schemes with broader implications for combinatorial optimization over submodular functions. A more general proof is presented of the key result, that regular decoding schemes collectively achieve the cutset extension of the onerelaychannel compressforward rate. The proof uses a shifting algorithm that mimics the interiorpoint methods of convex optimization and achieves linear convergence over factorial spaces. Layeredpartitionbased schemes inherit the benefits of regular decoding without the drawbacks of backward decoding: encoding delays that are linear (as opposed to exponential) in the size of the network and unrestricted (as opposed to strictly hierarchical) information flow.


WeA4 Regular Session, Pine 
Add to My Program 
Optimization and Deep Learning 


Chair: Roulet, Vincent  University of Washington 

08:3008:50, Paper WeA4.1  Add to My Program 
Convex Optimization for Shallow Neural Networks 
Ergen, Tolga  Stanford University 
Pilanci, Mert  Stanford University 
Keywords: Optimization, Machine Learning and Learning Theory
Abstract: We consider nonconvex training of shallow neural networks and introduce a convex relaxation approach with theoretical guarantees. For the single neuron case, we prove that the relaxation preserves the location of the global minimum under a planted model assumption. Therefore, a globally optimal solution can be efficiently found via a gradient method. We show that gradient descent applied on the relaxation always outperforms gradient descent on the original nonconvex loss with no additional computational cost. We then characterize this relaxation as a regularizer and further introduce extensions to multineuron single hidden layer networks.


08:5009:10, Paper WeA4.2  Add to My Program 
Elementary Convergence Guarantees for GradientBased Optimization of Deep Networks 
Roulet, Vincent  University of Washington 
Harchaoui, Zaid  University of Washington 

09:1009:30, Paper WeA4.3  Add to My Program 
Gradient Descent Finds Global Minima for Generalizable Deep Neural Networks of Practical Sizes 
Kawaguchi, Kenji  Massachusetts Institute of Technology 
Huang, Jiaoyang  Harvard University 
Keywords: Machine Learning and Learning Theory, Optimization
Abstract: In this paper, we theoretically prove that gradient descent can find a global minimum for nonlinear deep neural networks of sizes commonly encountered in practice. The theory developed in this paper only requires the practical degrees of overparameterization unlike previous theories. Our theory only requires the number of trainable parameters to increase linearly as the number of training samples increases. This allows the size of the deep neural networks to be consistent with practice and to be several orders of magnitude smaller than that required by the previous theories. Moreover, we prove that the linear increase of the size of the network is the optimal rate and that it cannot be improved, except by a logarithmic factor. Furthermore, deep neural networks with the trainability guarantee are shown to generalize well to unseen test samples with a natural dataset but not a random dataset.


09:3009:50, Paper WeA4.4  Add to My Program 
PSConv: A PreDefined Sparse Kernel Based Convolution for Deep CNNs 
Kundu, Souvik  University of Southern California 
Prakash, Saurav  University of Southern California 
Akrami, Haleh  University of Southern California 
Beerel, Peter  University of Southern California 
Chugg, Keith M  University of Southern California 
Keywords: Learning and Inference, Machine Learning and Learning Theory, Computational Models and Representations
Abstract: The high demand for computational and storage resources severely impedes the deployment of deep convolutional neural networks (CNNs) in limited resource devices. Recent CNN architectures have proposed reduced complexity versions (e.g,.SuffleNet and MobileNet) but at the cost of modest drops in accuracy. This paper proposes pSConv, a predefined sparse 2D kernel based convolution, which promises significant improvements in the tradeoff between complexity and accuracy for both CNN training and inference. To explore the potential of this approach, we have experimented with two widely accepted datasets, CIFAR10 and Tiny Imagenet, in sparse variants of both ResNet18 and VGG16 architectures. Our approach shows a parameter count reduction of up to 4.24× with modest degradation in classification accuracy relative to that of standard CNNs. Our approach outperforms a popular variant of ShuffleNet using a variant of ResNet18 with pSConv having 3×3 kernels with only four of nine elements not fixed at zero. In particular, the parameter count is reduced by 1.7× for CIFAR10 and 2.29× for Tiny ImageNet with an increased accuracy of∼4%.


09:5010:10, Paper WeA4.5  Add to My Program 
Learning and Recovery in the ReLU Model 
Mazumdar, Arya  University of Massachusetts Amherst 
Rawat, Ankit Singh  Google 
Keywords: Learning and Inference, Signal Acquisition, Coding, and Retrieval, Statistical Signal Processing
Abstract: Rectified linear units, or ReLUs, have become a preferred activation function for artificial neural networks. In this paper we consider two basic learning problems assuming that the underlying data follow a generative model based on a simple network with ReLU activations. The first problem we study corresponds to learning a generative model in the presence of nonlinearity (modeled by the ReLU functions). Given a set of signal vectors mathbf{y}^i in mathbb{R}^d, i =1, 2, dots , n, we aim to learn the network parameters, i.e., the dtimes k matrix A, under the model mathbf{y}^i = mathrm{ReLU}(Amathbf{c}^i +mathbf{b}), where mathbf{b}in mathbb{R}^d is a random bias vector. We show that it is possible to recover the column space of A within an error of O(d) (in Frobenius norm) under certain conditions on the distribution of mathbf{b}. The second problem we consider is that of robust recovery of the signal in the presence of outliers. In this setting, we are interested in recovering the latent vector mathbf{c} from its noisy nonlinear images of the form mathbf{v} = mathrm{ReLU}(Amathbf{c}) + mathbf{e}+mathbf{w}, where mathbf{e} in mathbb{R}^d denotes the outliers with sparsity s and mathbf{w} in mathbb{R}^d denote the dense but small noise. We show that the LASSO algorithm recovers mathbf{c} in mathbb{R}^k within an ell_2error of Obig(sqrt{{((k+s)log d})/{d}}big) when A is a random Gaussian matrix.


WeA5 Regular Session, Lower Level 
Add to My Program 
Topics in Coding Theory 


Chair: Chandak, Shubham  Stanford University 

08:3008:50, Paper WeA5.1  Add to My Program 
Explicit LowComplexity Codes for Multiple Access Channel Resolvability 
Sultana, Rumia  Wichita State University 
Chou, Remi  WSU 
Keywords: Network Information Theory
Abstract: We design an explicit lowcomplexity coding scheme that achieves the multiple access channel resolvability region for an arbitrary discrete memoryless multiple access channel whose input alphabets have prime cardinalities. Unlike previous works, we do not assume channel symmetry and rely on ratesplitting not to use time sharing when it is known to be unnecessary. The idea of our construction is to reduce the problem to a combination of several source resolvability problems. Specifically, our coding scheme relies on polar codes for source coding to implement source resolvability, and a block Markov coding scheme that performs randomness recycling in the different encoding blocks.


08:5009:10, Paper WeA5.2  Add to My Program 
Asymmetric LOCO Codes: Constrained Codes for Flash Memories 
Hareedy, Ahmed  Duke University 
Calderbank, A. Robert  Duke University 
Keywords: Coding Theory, Data Storage
Abstract: In data storage and data transmission, certain patterns are more likely to be subject to error when written (transmitted) onto the media. In magnetic recording systems with binary data and bipolar nonreturntozero signaling, patterns that have insufficient separation between consecutive transitions exacerbate intersymbol interference. Constrained codes are used to eliminate such errorprone patterns. A recent example is a new family of capacityachieving constrained codes, named lexicographicallyordered constrained codes (LOCO codes). LOCO codes are symmetric, that is, the set of forbidden patterns is closed under taking pattern complements. LOCO codes are suboptimal in terms of rate when used in Flash devices where block erasure is employed since the complement of an errorprone pattern is not detrimental in these devices. This paper introduces asymmetric LOCO codes (ALOCO codes), which are lexicographicallyordered constrained codes that forbid only those patterns that are detrimental for Flash performance. ALOCO codes are also capacityachieving, and at finitelengths, they offer higher rates than the available stateoftheart constrained codes designed for the same goal. The mappingdemapping between the index and the codeword in ALOCO codes allows lowcomplexity encoding and decoding algorithms that are simpler than their LOCO counterparts.


09:1009:30, Paper WeA5.3  Add to My Program 
SlepianWolf Polar Coding with Unknown Correlation 
Tunuguntula, Karthik Nagarjuna  University of California, San Diego 
Siegel, Paul H.  UCSD 
Keywords: Coding Theory, Information Theory, Network Information Theory
Abstract: We consider the source coding problem of a binary discrete memoryless source with correlated side information available only at the receiver whose conditional distribution given the source is unknown to the encoder. We propose two methods based on polar codes to attain the achievable rates under this setting. The first method incorporates a staircase scheme, which has been used for universal polar coding for a compound channel. The second method is based on the technique of universalization using bitchannel combining. We also give a list of pros and cons for the two proposed methods.


09:3009:50, Paper WeA5.4  Add to My Program 
ErrorCorrecting Codes for Noisy Duplication Channels 
Tang, Yuanyuan  University of Virginia 
Farnoud (Hassanzadeh), Farzad  University of Virginia 
Keywords: Coding Theory, Data Storage, Coding Techniques and Applications
Abstract: Because of its high data density and longevity, DNA is emerging as a promising candidate for satisfying increasing data storage needs. Compared to conventional storage media, however, data stored in DNA is subject to wider range of errors resulting from various processes involved in the data storage pipeline. In this paper, we consider correcting duplication errors for both exact and noisy tandem duplications of a given length k. Specifically, we design codes that can correct any number of exact duplication and one noisy duplication errors, where in the noisy duplication case the copy is at Hamming distance 1 from the original. Our constructions rely upon recovering the duplication root of the stored codeword. We characterize the ways in which duplication errors manifest in the root of affected sequences and design efficient codes for correcting these error patterns. We show that the proposed construction is asymptotically optimal.


09:5010:10, Paper WeA5.5  Add to My Program 
Improved Read/write Cost Tradeoff in DNABased Data Storage Using LDPC Codes 
Chandak, Shubham  Stanford University 
Tatwawadi, Kedar  Stanford University 
Lau, Billy  Stanford University 
Kubit, Matt  Berkeley Lights, Inc 
Mardia, Jay  Stanford University 
Neu, Joachim  Stanford University 
Griffin, Peter  Stanford University 
Wootters, Mary  Stanford University 
Weissman, Tsachy  Stanford 
Ji, Hanlee  Stanford University 
Keywords: Biological Information Systems, Coding Techniques and Applications, Data Storage
Abstract: With the amount of data being stored increasing rapidly, there is significant interest in exploring alternative storage technologies. In this context, DNAbased storage systems can offer significantly higher storage densities and durability than current technologies. Recent advances in DNA sequencing and synthesis pipelines have made DNAbased storage a promising candidate for the storage technology of the future. Recently, there have been multiple efforts in this direction, focusing on aspects such as error correction for synthesis/sequencing errors and erasure correction for handling missing sequences. The typical approach is to use separate codes for handling errors and erasures, but there is limited understanding of the efficiency of this framework. In this work, we study the tradeoff between the writing and reading costs involved in DNAbased storage and propose a practical scheme to achieve an improved tradeoff between these quantities. Our scheme breaks with the traditional separation framework and instead uses a single large blocklength LDPC code for both erasure and error correction. We also introduce novel techniques to handle insertion and deletion errors introduced by the synthesis process. For a range of writing costs, the proposed scheme achieves 3040% lower reading costs than stateoftheart techniques on experimental data. The code, data and supplementary material is available at https://github.com/shubhamchandak94/LDPC_DNA_storage.


WeB1 Invited Session, Library 
Add to My Program 
Electric Power Systems I 


Chair: Bose, Subhonmesh  University of Illinois at Urbana Champaign 
CoChair: DominguezGarcia, Alejandro  University of Illinois at UrbanaChampaign 
Organizer: Bose, Subhonmesh  University of Illinois at Urbana Champaign 
Organizer: DominguezGarcia, Alejandro  University of Illinois at UrbanaChampaign 

10:3010:50, Paper WeB1.1  Add to My Program 
ReliabilityAware RealTime Pricing of Electricity Based on Bandit Heuristics (I) 
Alizadeh, Mahnoosh  University of California Santa Barbara 

10:5011:10, Paper WeB1.2  Add to My Program 
Renewable Generation Investment with Downstream Competition (I) 
Badia, Brendan  Northwestern University 
Berry, Randall  Northwestern University 
Wei, Ermin  Northwestern University 
Keywords: Pricing and Congestion Control, Network Games and Algorithms
Abstract: With the cost of renewable electricity generation like solar and wind decreasing, the share of total generation provided by such sources is growing. This leads to questions of how utilities should deal with integrating renewables into the grid. An interesting and not commonly studied case of this is when firms consume significant amounts of electricity in order to serve their customers (e.g., electric vehicle charging stations) and can get it either from the grid or through renewable generation they install themselves. Furthermore, if they have more renewable generation than needed to serve their customers they can sell it back to the grid. Therefore, the cost of electricity and value of excess generation affects firm's interest in renewable generation, and has impacts on both the grid (by affecting how much renewable generation firms will provide) as well as the downstream market (by affecting the cost structure at both firms and how they compete). We study this though a two stage oligopolistic model. In the first stage, two differentiated firms choose a level of investment in renewable generation. In the second stage, they compete over prices to serve customers. We show equilibria always exist, and the difference between price of electricity and the feedin tariff has potentially large implications for the number and type of equilibria as well as aggregate outcomes.


11:1011:30, Paper WeB1.3  Add to My Program 
On the Distribution of RealValued Solutions to the Power Flow Equations (I) 
Lesieutre, Bernard  University of Wisconsin 
Lindberg, Julia  University of WisconsinMadison 
Zachariah, Alisha  University of Wisconsin  Madison 
Boston, Nigel  University of Wisconsin  Madison 
Keywords: Complex Networked Systems, Distributed and LargeScale Systems
Abstract: The number and nature of realvalued solutions to the nonlinear power flow equations are not completely understood and there are no proven simple methods for computing all solutions. It has been noted that in practice the equations tend to admit many fewer realvalued solutions than what might be estimated using traditional bounds. In this paper we examine a means to calculate the distribution of number of solutions as a function of multiple parameters. This method relies on a reduction technique that resolves to an equation in one variable for which the calculation of realvalued solutions is straightforward. Importantly we examine the fundamental question of whether a realvalued solution in one variable necessarily implies all variables will be simultaneously realvalued. We present topologies for which this favorable property is retained and topologies for which it fails


11:3011:50, Paper WeB1.4  Add to My Program 
A Control Architecture for Power and Voltage Regulation from a Collection of GridForming or GridFollowing Inverters in Distribution Networks (I) 
Purba, Victor  University of Minnesota  Twin Cities 
AlDigs, Abdullah  University of British Columbia 
Dhople, Sairaj  University of Minnesota 

11:5012:10, Paper WeB1.5  Add to My Program 
Risk Tunable Market Clearing with Massive End User Flexibilities (I) 
Xie, Le  Texas A&M University 

12:1012:30, Paper WeB1.6  Add to My Program 
Learning the Unobservable: HighResolution State Estimation via Deep Learning 
Mestav, Kursat Rasim  Cornell University 
Tong, Lang  Cornell University 
Keywords: Machine Learning and Learning Theory, Learning and Inference, Statistical Signal Processing
Abstract: The problem of achieving the fasttimescale state estimation for a power system with limited deployment of phasor measurement units is considered. A deep neuralnetwork architecture that integrates baddata detection, data cleansing, and the minimum mean squared error state estimation is developed. It includes a universal baddata detection and a Bayesian state estimation subnetworks. A novel universal baddata detection technique is proposed that requires no knowledge about data distributions under regular and irregular operating conditions. The subnetwork for universal baddata detection consists of an inverse generative model and a coincidence test. It is implemented through the training of a generative adversary network and an autoencoder using slowtimescale historical data. The Bayesian state estimation subnetwork is trained through a generative adversary network with embedded physical models of the power system. Comparing with the conventional weighted least squares approach to state estimation, the proposed minimum meansquared error state estimator does not require observability. Simulations demonstrate orders of magnitude improvement in estimation accuracy and online computation costs of/ver the stateoftheart solutions.


WeB2 Invited Session, Solarium 
Add to My Program 
Deep Learning for Communications I 


Chair: Kim, Hyeji  Samsung AI Research Cambridge 
Organizer: Kim, Hyeji  Samsung AI Research Cambridge 
Organizer: ten Brink, Stephan  University of Stuttgart 
Organizer: Viswanath, Pramod  University of Illinois 

10:3010:50, Paper WeB2.1  Add to My Program 
Turbo Autoencoder: Channel Codes Via Deep Learning (I) 
Jiang, Yihan  University of Washington 
Kim, Hyeji  Samsung AI Research Cambridge 
Asnani, Himanshu  Stanford University 
Kannan, Sreeram  University of Washington 
Oh, Sewoong  University of Washington 

10:5011:10, Paper WeB2.2  Add to My Program 
Deep LearningBased Polar Code Design (I) 
Ebada, Moustafa  University of Stuttgart 
Cammerer, Sebastian  Institute of Telecommunications, Universität Stuttgart 
Elkelesh, Ahmed  University of Stuttgart 
ten Brink, Stephan  University of Stuttgart 
Keywords: Network Coding
Abstract: In this work, we introduce a deep learningbased polar code construction algorithm. The core idea is to represent the information/frozen bit indices of a polar code as a binary vector which can be interpreted as trainable weights of a neural network (NN). For this, we demonstrate how this binary vector can be relaxed to a softvalued vector, facilitating the learning process through gradient descent and enabling an efficient code construction. We further show how different polar code design constraints (e.g., code rate) can be taken into account by means of careful binarytosoft and softtobinary conversions, along with rateadjustment after each learning iteration. Besides its conceptual simplicity, this approach benefits from having the “decoderintheloop”, i.e., the nature of the decoder is inherently taken into consideration while learning (designing) the polar code. We show results for belief propagation (BP) decoding over both AWGN and Rayleigh fading channels with considerable performance gains over stateoftheart construction schemes.


11:1011:30, Paper WeB2.3  Add to My Program 
Image Information Embedding Via Adversarial Neural Networks (I) 
Jiang, Yihan  University of Washington 
Mangalam, Saket  University of Washington 
Asnani, Himanshu  Stanford University 
Kannan, Sreeram  University of Washington 

11:3011:50, Paper WeB2.4  Add to My Program 
Learning to Communicate with Limited CoDesign (I) 
Sahai, Anant  UC Berkeley 
Sanz, Joshua  UC Berkeley 
Subramanian, Vignesh  UC Berkeley 
Tran, Caryn  UC Berkeley 
Vodrahalli, Kailas  UC Berkeley 
Keywords: Wireless Communication Systems
Abstract: In this work we examine the problem of learning to cooperate in the context of wireless communication. We consider the two agent setting where agents must learn modulation and demodulation schemes that enable them to communicate with each other in the presence of a powerconstrained additive white Gaussian noise (AWGN) channel. We investigate whether learning is possible under different levels of information sharing between distributed agents, that are not necessarily codesigned. We make use of the ``echo'' protocol, a learning protocol where an agent hears, understands, and repeats (echoes) back the message received from another agent. Each agent uses what it sends and receives to train itself to communicate. To capture the idea of cooperation between agents that are "not necessarily codesigned," we use two different populations of function approximators  neural networks and polynomialbased. In addition to diverse learning agents, we include nonlearning agents that already used fixed standardized modulationprotocols such as QPSK and QAM16. This is used to verify that the echo approach to learning to communicate works independent of the details of the inner working of the agents, and that learning agents can not only learn to match the communication expectations of others, but can also collaboratively invent a successful communication approach from independent random initializations. In addition to simulationbased experiments, we implement the echo protocol in physi


11:5012:10, Paper WeB2.5  Add to My Program 
Deep Learning for Communication Over Dispersive Nonlinear Channels: Performance and Comparison with Classical Digital Signal Processing (I) 
Karanov, Boris  University College London 
Liga, Gabriele  TU Eindhoven 
Aref, Vahid  Nokia Bell Labs 
Lavery, Domanic  University College London 
Bayvel, Polina  University College London 
Schmalen, Laurent  Karlsruhe Institute of Technology 
Keywords: Statistical Signal Processing
Abstract: In this paper, we apply deep learning for communication over dispersive channels with power detection, as encountered in lowcost optical intensity modulation/direct detection (IM/DD) links. We consider an autoencoder based on the recently proposed sliding window bidirectional recurrent neural network (SBRNN) design to realize the transceiver for optical IM/DD communication. We show that its performance can be improved by introducing a weighted sequence estimation scheme at the receiver. Moreover, we perform bittosymbol mapping optimization to reduce the biterror rate (BER) of the system. Furthermore, we carry out a detailed comparison with classical schemes based on pulseamplitude modulation and maximum likelihood sequence detection (MLSD). Our investigation shows that for a reference 42 Gb/s transmission, the SBRNN autoencoder achieves a BER performance comparable to MLSD, when both systems account for the same amount of memory. In contrast to MLSD, the SBRNN performance is achieved without incurring a computational complexity exponentially growing with the processed memory.


12:1012:30, Paper WeB2.6  Add to My Program 
Siamese Neural Networks for Wireless Positioning and Channel Charting (I) 
Lei, Eric  Cornell University 
Castañeda, Oscar  Cornell University 
Tirkkonen, Olav  Aalto University 
Goldstein, Tom  University of Maryland 
Studer, Christoph  Cornell University 
Keywords: Information Theoretic Approaches in Wireless Communications
Abstract: Neural networks have been proposed recently for positioning and channel charting of user equipments (UEs) in wireless systems. Both of these approaches process channel state information (CSI) that is acquired at a multiantenna basestation in order to learn a function that maps CSI to location information. CSIbased positioning using deep neural networks requires a dataset that contains both CSI and associated location information. Channel charting (CC) only requires CSI information to extract relative position information. Since CC builds on dimensionality reduction, it can be implemented using autoencoders. In this paper, we propose a unified architecture based on Siamese networks that can be used for supervised UE positioning and unsupervised channel charting. In addition, our framework enables semisupervised positioning, where only a small set of location information is available during training. We use simulations to demonstrate that Siamese networks achieve similar or better performance than existing positioning and CC approaches with a single, unified neural network architecture.


WeB3 Regular Session, Butternut 
Add to My Program 
Distributed and Large Systems 


Chair: Mustafa, Aquib  Michigan State University 

10:3010:50, Paper WeB3.1  Add to My Program 
Spectral Analysis of the Adjacency Matrix of Random Geometric Graphs 
Hamidouche, Mounia  EURECOM 
Avrachenkov, Konstantin E.  INRIA Sophia Antipolis 
Cottatellucci, Laura  FriedrichAlexanderUniversität ErlangenNürnberg 
Keywords: Distributed and LargeScale Systems, Statistical Signal Processing, Complex Networked Systems
Abstract: We analyze the limiting eigenvalue distribution (LED) of random geometric graphs (RGGs) in two different regimes. The RGG is constructed by uniformly distributing n nodes on the ddimensional torus mathbb{T}^d equiv [0, 1]^d, and connecting two nodes if their ell_{p}distance, p in [1, infty] is at most r_{n}. We study the LED of the adjacency matrix for RGGs in a large class of the connectivity regime in which the average vertex degree scales as logleft( nright) or faster, i.e., Omega left(log(n) right). Additionally, we analyze the LED of normalized adjacency matrices in the thermodynamic regime in which the average vertex degree tends to a constant. In the connectivity regime, under some conditions on r_{n}, we show that the LED of the adjacency matrix of RGGs converges to the LED of the adjacency matrix of a deterministic geometric graph with nodes in a grid (DGG) as n goes to infinity. In the thermodynamic regime, we propose an approximation for the LED of the adjacency matrix normalized by the squared average degree. Then, we upper bound the Levy distance between this approximation and the actual distribution. Based on the structure of the DGG, we provide an explicit expression for the eigenvalues in both the connectivity and the thermodynamic regimes.


10:5011:10, Paper WeB3.2  Add to My Program 
Scalable Panel Fusion Using Distributed Min Cost Flow 
Shinde, Swapnil  Comscore 
Ranta, Jukka  Comscore 
Deitrick, Paul  Comscore 
Malloy, Matthew  University of Wisconsin 
Keywords: Optimization, Distributed and LargeScale Systems, Data Analytics
Abstract: Modern audience measurement requires combining observations from disparate panel datasets. Connecting and relating such panel datasets is a process termed panel fusion. This paper formalizes the panel fusion problem and presents a novel approach to solve it. We cast the panel fusion as a network flow problem, allowing the application of a rich body of research. In the context of digital audience measurement, where panel sizes can grow into the tens of millions, we propose an efficient algorithm to partition the network into subproblems. While the algorithm solves a relaxed version of the original problem, we provide conditions under which it guarantees optimality. We demonstrate our approach by fusing two realworld panel datasets in a distributed computing environment.


11:1011:30, Paper WeB3.3  Add to My Program 
GraphTheoretic Analyses of MIMO Channels in Diffusive Networks 
Koorehdavoudi, Kasra  Washington State University 
Roy, Sandip  Washington State University 
Keywords: Distributed and LargeScale Systems, Decentralized Control Systems
Abstract: In this work, we characterize the finitezero and infinitezero structure of a multiinput multioutput channel in a standard model for network synchronization. To do so, we first develop an algebraic analysis of the zeros based on a inputtooutput transformation known as the special coordinate basis. This decomposition then allows us to develop topological results on the zeros, i.e. characterizations in terms of the network graph and the input/output locations relative to the graph. Specifically, our results show how the relative locations and interactions among multiple inputoutput pairs in a network influence the locations of the finite and invariant zeros. As a whole, the study contributes to the analysis of dynamical networks from an inputoutput perspective, rather than only in terms of internal or emergent behaviors.


11:3011:50, Paper WeB3.4  Add to My Program 
Attack Analysis for DiscreteTime Distributed MultiAgent Systems 
Mustafa, Aquib  Michigan State University 
Modares, Hamidreza  Michigan State University 
Keywords: Distributed and LargeScale Systems, Reliability, Security and Trust, Intrusion/Anomaly Detection and Diagnosis
Abstract: The recent growth of cyberphysical systems provides new opportunities to the attackers to undermine the system performance or even destabilize it. This work presents a system theoretic approach for the analysis of the adverse effects of attacks in discretetime distributed multiagent systems. Analyzing cyberphysical attacks from the attacker’s perspective improves the system awareness against attacks and reinforces the necessity of developing novel resilient control protocols. To this end, we first show that an attack on a compromised agent can adversely affect intact agents that are reachable from it. Then, we show that how an attack on a single root node can snowball into a networkwide attack and even destabilize the entire system. Finally, we show that the local neighborhood tracking error for the intact agents goes to zero, despite attack. This makes existing robust control approaches that aim at attenuation of disturbance or attack based on local neighborhood tracking ineffective, and demands designing novel resilient control approaches.


11:5012:10, Paper WeB3.5  Add to My Program 
PrivacyPreserving Average Consensus Over Digraphs in the Presence of TimeDelays 
Charalambous, Themistoklis  Aalto University 
Manitara, Nikolas  University of Cyprus 
Hadjicostis, Christoforos N.  University of Cyprus 
Keywords: Distributed and LargeScale Systems, Reliability, Security and Trust, Complex Networked Systems
Abstract: In this paper, we propose a privacypreserving discretetime asymptotic average consensus mechanism that allows components of a multicomponent system to calculate the exact average of their initial values without revealing to other components their specific value. We assume that components (nodes) interact with other components via possibly directed communication links (edges), forming a generally directed communication topology (digraph). The proposed distributed protocol can be followed by any component that wants to maintain its privacy (i.e., not reveal the initial value it contributes to the average) to possibly multiple curious but not malicious nodes (curious nodes try to identify the initial values of other nodes, and can exchange information with other curious nodes, but do not interfere in the computation in any other way). We devise a distributed mechanism, based on ratio consensus, where each node updates its information state by combining the available information received by its inneighbors using constant positive weights and by adding an offset (only at one of the two states communicated during the execution of the algorithm). We establish that this privacypreserving version of ratio consensus, henceforth called the privacypreserving ratio consensus algorithm, converges to the exact average of the nodes' initial values, even in the presence of time varying delays. Illustrative examples demonstrate the validity and performance of our proposed algorithm.


WeB4 Regular Session, Pine 
Add to My Program 
Codes for Computing and Caching 


Chair: AlHassoun, Yousef  The Ohio State University 

10:3010:50, Paper WeB4.1  Add to My Program 
Distributed BlackBox Optimization via Error Correcting Codes 
Bartan, Burak  Stanford University 
Pilanci, Mert  Stanford University 
Keywords: Coding Techniques and Applications, Distributed and LargeScale Systems, Optimization
Abstract: We introduce a novel distributed derivativefree optimization framework that is resilient to stragglers. The proposed method employs coded search directions at which the objective function is evaluated, and a decoding step to find the next iterate. Our framework can be seen as an extension of evolution strategies and structured exploration methods where structured search directions were utilized. As an application, we consider blackbox adversarial attacks on deep convolutional neural networks. Our numerical experiments demonstrate a significant improvement in the computation times.


10:5011:10, Paper WeB4.2  Add to My Program 
Random KhatriRaoProduct Codes for NumericallyStable Distributed Matrix Multiplication 
Muthuveeru Subramaniam, Adarsh  Texas A&M University 
Heidarzadeh, Anoosheh  Texas A&M University 
Narayanan, Krishna  Texas A&M University 
Keywords: Coding Theory, Coding Techniques and Applications
Abstract: We propose a class of codes called random KhatriRaoProduct (RKRP) codes for distributed matrix multiplication in the presence of stragglers. The main advantage of the proposed codes is that decoding of RKRP codes is highly numerically stable in comparison to decoding of Polynomial codes and decoding of recently proposed OrthoPoly codes. We show that RKRP codes are maximum distance separable with probability 1. The communication cost and encoding complexity for RKRP codes are identical to that of OrthoPoly codes and Polynomial codes and the decoding complexity of RKRP codes is lower than that of OrthoPoly codes. Numerical results show that the average relative L2norm of the reconstruction error for RKRP codes is substantially better than that of OrthoPoly codes.


11:1011:30, Paper WeB4.3  Add to My Program 
From Alignment to Acyclic Chains: Lexicographic Performance Bounds for Index Coding 
Liu, Yucheng  The Australian National University 
Sadeghi, Parastoo  The Australian National University 
Keywords: Information Theory
Abstract: In this work, we study the informationtheoretic performance bounds for the index coding problem. Recently, weighted alignment chain models were proposed by generalizing the alignment chain model of Maleki et al. This led to derivation of singleletter performance bounds that are strictly tighter than both the wellknown maximum acyclic induced subgraph (MAIS) bound and the internal conflict bound. Here, we propose a more general chain model, namely the acyclic chain. Unlike weighted alignment chains that are constructed by individual messages, acyclic chains can deal with set of messages as their components. This allows a recursive development of new performance bounds. The new acyclic chain bounds subsume the weighted alignment chain bounds and can be strictly tighter. Moreover, a key drawback of the weighted alignment chain bounds is that their improvement over the MAIS bound is limited to a fixed constant value that does not scale with the problem size. In contrast, we show that the new acyclic chain bounds have a desired multiplicative property under the lexicographic product of the index coding side information graphs. As such, the gap between these new bounds and the MAIS bound is not fixed, but can be magnified to a multiplicative factor which is polynomial in the problem size.


11:3011:50, Paper WeB4.4  Add to My Program 
Efficient Coded Caching with Limited Memory 
AlHassoun, Yousef  The Ohio State University 
Alotaibi, Faisal  The Ohio State University 
El Gamal, Aly  Purdue University 
El Gamal, Hesham  Ohio State University 
Keywords: Network Information Theory, Network Coding, Coding Theory
Abstract: Recently, coded caching techniques have received tremendous attention due to its significant gain in reducing the cost of delivery rate. However, this gain was only considered with the assumption of free placement phase. Motivated by our recent result of coded caching, we focus here on minimizing the overall rate of the caching network by capturing the transmission cost of the placement and delivery phases undertextit{ limited storage memory} at the end user. We model the dynamic nature of the network through a cost structure that allows for varying the network architecture and cost per transmission across the two phases of caching. The optimal caching decision for the worst case scenario with memory constraint is provided. Moreover, analysis of the delivery phase is proposed where tradeoffs between system parameters, memory, and delivery rate are considered. Interestingly, we show that there are regions where the uncoded caching scheme outperforms the coded caching scheme. Finally, we provide numerical results to support and demonstrate our findings.


11:5012:10, Paper WeB4.5  Add to My Program 
Straggler Resilient Serverless Computing Based on Polar Codes 
Bartan, Burak  Stanford University 
Pilanci, Mert  Stanford University 
Keywords: Coding Techniques and Applications, Distributed and LargeScale Systems, Coding Theory
Abstract: We propose a serverless computing mechanism for distributed computation based on polar codes. Serverless computing is an emerging cloud based computation model that lets users run their functions on the cloud without provisioning or managing servers. Our proposed approach is a hybrid computing framework that carries out computationally expensive tasks such as linear algebraic operations involving largescale data using serverless computing and does the rest of the processing locally. We address the limitations and reliability issues of serverless platforms such as straggling workers using coding theory, drawing ideas from recent literature on coded computation. The proposed mechanism uses polar codes to ensure stragglerresilience in a computationally effective manner. We provide extensive evidence showing polar codes outperform other coding methods. We have designed a sequential decoder specifically for polar codes in erasure channels with fullprecision input and outputs. In addition, we have extended the proposed method to the matrix multiplication case where both matrices being multiplied are coded. The proposed coded computation scheme is implemented for AWS Lambda. Experiment results are presented where the performance of the proposed coded computation technique is tested in optimization via gradient descent. Finally, we introduce the idea of partial polarization which reduces the computational burden of encoding and decoding at the expense of stragglerresilience.


12:1012:30, Paper WeB4.6  Add to My Program 
Lagrange Coded Computing with Sparsity Constraints 
Fahim, Mohammad  Pennsylvania State University 
Cadambe, Viveck  Pennsylvania State University 
Keywords: Coding Techniques and Applications, Coding Theory, Distributed and LargeScale Systems
Abstract: In this paper, we propose a distributed coding scheme that allows for lower computation cost per computing node than the standard Lagrange Coded Computing scheme. The proposed coding scheme is useful for cases where the elements of the input data set are of large dimensions and the computing nodes have limited computation power. This coding scheme provides a tradeoff between the computation cost per worker and the recovery threshold in a distributed coded computing framework. The proposed scheme is also extended to provide data privacy against at most t colluding worker nodes in the system.


WeB5 Invited Session, Lower Level 
Add to My Program 
Sequential Methods I 


Chair: Fellouris, Georgios  University of Illinois 
Organizer: Fellouris, Georgios  University of Illinois 
Organizer: Veeravalli, Venu  University of Illinois 

10:3010:50, Paper WeB5.1  Add to My Program 
A Sequential Detection Theory for Statistically Periodic Random Processes (I) 
Banerjee, Taposh  University of Texas at San Antonio 
Gurram, Prudhvi  U.S. Army Research Lab and Booz Allen Hamilton 
Whipps, Gene  U.S. Army Research Lab 
Keywords: Detection and Estimation, Statistical Signal Processing
Abstract: Periodic statistical behavior of data is observed in many practical problems encountered in cyberphysical systems and biology. A new class of stochastic processes called independent and periodically identically distributed (i.p.i.d.) processes is defined to model such data. An optimal stopping theory is developed to solve sequential detection problems for i.p.i.d. processes. The developed theory is then applied to detect a change in the distribution of an i.p.i.d. process. It is shown that the optimal change detection algorithm is a stopping rule based on a periodic sequence of thresholds. Numerical results are provided to demonstrate that a singlethreshold policy is not strictly optimal.


10:5011:10, Paper WeB5.2  Add to My Program 
Optimum Quickest Detection in Statistically Periodic Processes (I) 
Moustakides, George  University of Patras, Greece and Rutgers University, USA 

11:1011:30, Paper WeB5.3  Add to My Program 
Rapid Detection of HotSpot by Tensor Decomposition with Application to Weekly Gonorrhea Data (I) 
Zhao, Yujie  Georgia Institute of Technology 
Yan, Hao  Arizona State University 
Holte, Sarah  Fred Hutchinson Cancer Research Center 
Kerani, Roxanne P.  University of Washington 
Mei, Yajun  Georgia Institute of Technology 

11:3011:50, Paper WeB5.4  Add to My Program 
A Sequential and Scalable Approach to Community Detection in Dynamic Graphs (I) 
Beckus, Andre  University of Central Florida 
Atia, George  University of Central Florida 
Keywords: Learning and Inference, Data Analytics, Detection and Estimation
Abstract: We study a sequential sketchbased approach for the clustering of timeevolving graphs. We present a dynamic extension to the static Stochastic Block Model, which accommodates growing and shrinking graphs, as well as the movement of nodes between clusters. We then propose an online algorithm which constructs and maintains a small sketch consisting of nodes sampled from the full graph. The sketch is clustered and a retrieval algorithm is used to infer cluster membership of nodes in each successive graph snap. We demonstrate that the use of a small sketch not only improves computational complexity, but also improves the success rate when sketches are properly proportioned. We present a sampling method which chooses nodes according to edge density, whereby very small clusters can be successfully tracked.


11:5012:10, Paper WeB5.5  Add to My Program 
Compound Sequential Change Point Detection in Multiple Data Streams (I) 
Li, Xiaoou  University of Minnesota 
Chen, Yunxiao  Emory University 

12:1012:30, Paper WeB5.6  Add to My Program 
A Sequential GradientBased Multiple Access for Distributed Learning Over Fading Channels (I) 
Sery, Tomer  BenGurion University of the Negev 
Cohen, Kobi  BenGurion University of the Negev 
Keywords: Detection and Estimation
Abstract: A distributed learning problem over multiple access channel (MAC) using a large wireless network is considered. The objective function is a sum of the nodes’ local loss functions. The inference decision is made by the network edge and is based on received data from distributed nodes which transmit over a noisy fading MAC. We develop a novel GradientBased Multiple Access (GBMA) algorithm to solve the distributed learning problem over MAC. Specifically, the nodes transmit an analog function of the local gradient using common shaping waveforms. The network edge receives a superposition of the analog transmitted signals which represents a noisy distorted gradient used for updating the estimate. We analyze the performance of GBMA theoretically, and prove that it can approach the convergence rate of the centralized gradient descent (GD) algorithm in large networks under both convex and strongly convex loss functions with Lipschitz gradient.


WeC1 Invited Session, Library 
Add to My Program 
Electric Power Systems II 


Chair: Bose, Subhonmesh  University of Illinois at Urbana Champaign 
CoChair: DominguezGarcia, Alejandro  University of Illinois at UrbanaChampaign 
Organizer: Bose, Subhonmesh  University of Illinois at Urbana Champaign 
Organizer: DominguezGarcia, Alejandro  University of Illinois at UrbanaChampaign 

13:3013:50, Paper WeC1.1  Add to My Program 
Implied Constraint Satisfaction in Power System Optimization: The Impacts of Load Variations (I) 
Roald, Line  University of Wisconsin  Madison 
Molzahn, Daniel  Georgia Institute of Technology 
Keywords: Optimization, Complex Networked Systems, Machine Learning and Learning Theory
Abstract: In many power system optimization problems, we observe that only a small fraction of the line flow constraints ever become active at the optimal solution, despite variations in the load profile and generation costs. This observation has farreaching implications not only for power system optimization, but also for practical applications such as longterm planning, operation, and control of the system. This paper presents a constraint screening approach to identify constraints whose satisfaction is implied by other constraints in the problem, and can therefore be safely disregarded. The approach is targeted at problems which involve DC power flow constraints, and involves two steps. The first step uses simple analytical relationships to remove redundant limits on parallel lines. The second step uses optimization to consider interactions among all of the problem constraints. In essence, we solve a (relaxed) optimization problem for each constraint to identify whether it is redundant. Different from existing methods that focus on constraint screening for a given daily load profile, we consider ranges of load that are wide enough to represent yearly variations in loading. The constraint screening results are thus valid for long periods of time, justifying the computational overhead required for the screening method.


13:5014:10, Paper WeC1.2  Add to My Program 
Accurate ReducedOrder Models for Coherent Synchronous Generators (I) 
Min, Hancheng  Johns Hopkins University 
Paganini, Fernando  Universidad ORT Uruguay 
Mallada, Enrique  Johns Hopkins University 
Keywords: Complex Networked Systems, Distributed and LargeScale Systems
Abstract: Accurately modeling generator frequency response to power disturbances is essential for assessing frequency control performance in power grids. A widely used technique is to aggregate the frequency response of coherent generators into a single effective machine. Previous work has demonstrated that the best choice of inertial and damping coefficients for the effective generator is obtained by adding among all the corresponding generator parameters. However, in the presence of turbine dynamics, the proper choice of turbine time constants is challenging. We introduce a novel framework to approximate the aggregate frequency dynamics of coherent synchronous generators. By leveraging recent results on dynamics concentration of tightlyconnected networks, we show that the aggregate dynamics are highorder, therefore cannot be accurately represented by a single machine. Instead, we develop a hierarchy of reduced order models based on balance truncation that accurately approximate the aggregate system response. Our results outperform existing aggregation techniques and can be shown to monotonically improve the approximation as the hierarchy order increases.


14:1014:30, Paper WeC1.3  Add to My Program 
Optimal Grid – Distributed Energy Resource Coordination: Distribution Locational Marginal Costs and Hierarchical Decomposition (I) 
Andrianesis, Panagiotis  Boston University 
Caramanis, Michael C.  Boston University 
Keywords: Optimization, Pricing and Congestion Control, Complex Networked Systems
Abstract: We consider radial distribution networks hosting Distributed Energy Resources (DERs), including Solar Photovoltaic (PV) and storagelike loads, such as Electric Vehicles (EVs). We employ shortrun dynamic Distribution Locational Marginal Costs (DLMCs) of real and reactive power to cooptimize distribution network and DER schedules. Striking a balance between centralized control and distributed selfdispatch, we present a novel hierarchical decomposition approach that is based on centralized AC Optimal Power Flow (OPF) capability interacting with DER selfdispatch that adapts to real and reactive power DLMCs. The proposed approach is designed to be highly scalable allowing for massive DER participation with high model fidelity that captures precise estimation and cost inclusiveness of DLMCs. We illustrate that the discovery of spatiotemporally varying DLMCs, using an enhanced AC OPF model, which apart from network congestion also reflects asset (transformer) degradation, is key to optimal Grid  DER coordination. We employ actual distribution feeders to exemplify the use of DLMCs as financial incentives that convey sufficient information to optimize Distribution Network, and DER (PV and EV) operation, and we discuss the applicability and tractability of the proposed approach, while modeling the full complexity of spatiotemporal DER capabilities and preferences.


14:3014:50, Paper WeC1.4  Add to My Program 
Projected GridForming Control for CurrentLimiting of Power Converters (I) 
Gross, Dominic  ETH Zurich 
Dorfler, Florian  ETH Zurich 
Keywords: Complex Networked Systems, Decentralized Control Systems
Abstract: Sustainability and resilience concerns have motivated an unprecedented transformation of the electric power systems towards massive integration of renewable generation interfaced by power electronics. In this context, voltage source converters using gridforming control are envisioned to provide services that so far have been provided by synchronous machines. In contrast to synchronous machines, voltage source converters are subject to stringent overcurrent limits. By exploiting the inherent timescale separation between the inner control loops, the gridforming reference dynamics (i.e., droop control, virtual oscillator control, etc.), and the transmission network, we highlight the wellknown fact that the gridforming reference dynamics need to be restricted to limit the converter output current without compromising stability. Next, we propose to formalize the problem of limiting the output current of a gridforming converter by projecting the gridforming reference dynamics onto a constraint on the output current. The main contribution is a projected droop controller that can be implemented using only local measurements. Moreover, we link the results to current limiting approaches using virtual impedance. Finally, we use a highfidelity simulation to show that projected droop control outperforms virtual impedance current limiting.


14:5015:10, Paper WeC1.5  Add to My Program 
Optimal Operation of Sanitary Pump Stations with Energy Storage During Power Outages (I) 
Kothyari, Ashish  University of British Columbia 
Tordoff, Stephen  Energy Canvas Ltd 
David, Christopher  Lulu Island Energy Company Ltd 
Nagamune, Ryozo  University of British Columbia 
Chen, Christine  University of British Columbia 
Keywords: Optimization, Robust and Nonlinear Control
Abstract: This paper aims to determine the optimal strategy to operate the pump in a sanitary pump station (SPS) to minimize its energy consumption over a specified time window. An SPS collects sanitary waste from nearby neighbourhoods in a city and then pumps the waste to a treatment plant for processing. Sustained operation of SPSs, even during power outages, is critical for the health and safety of urban dwellers. Energy storage can be used to operate the SPS when power from the grid is unavailable. Major challenges in minimizing energy consumption include (i) the nonlinear power characteristic of the pump, and (ii) the uncertainty in predicting the waste deposited into the SPS. We formulate a chanceconstrained nonlinear optimal control problem and further devise a tractable solution strategy. We illustrate the energy savings obtained with the proposed control over the existing one.


WeC2 Invited Session, Solarium 
Add to My Program 
Distributed Control, Optimization, and Learning I 


Chair: Etesami, Rasoul  UIUC 
Organizer: Beck, Carolyn  University of Illinois 
Organizer: Etesami, Rasoul  UIUC 
Organizer: Hu, Bin  University of Illinois at UrbanaChampaign 
Organizer: Srikant, R  University of Illinois 

13:3013:50, Paper WeC2.1  Add to My Program 
Systematic Design of Decentralized Algorithms for Consensus Optimization (I) 
Han, Shuo  University of Illinois at Chicago 

13:5014:10, Paper WeC2.2  Add to My Program 
A New Approach to Distributed Hypothesis Testing and NonBayesian Learning: Improved Learning Rate and ByzantineResilience (I) 
Mitra, Aritra  Purdue University 
Sundaram, Shreyas  Purdue University 

14:1014:30, Paper WeC2.3  Add to My Program 
Nested Distributed Gradient Methods with Stochastic Computation Errors (I) 
Iakovidou, Charikleia  Northwestern University 
Wei, Ermin  Northwestern University 
Keywords: Optimization, Machine Learning and Learning Theory, Distributed and LargeScale Systems
Abstract: In this work, we consider the problem of a network of agents collectively minimizing a sum of convex functions. The agents in our setting can only access their local objective functions and exchange information with their immediate neighbors. Motivated by applications where computation is imperfect, including, but not limited to, empirical risk minimization (ERM) and online learning, we assume that only noisy estimates of the local gradients are available. To tackle this problem, we adapt a class of Nested Distributed Gradient methods (NEARDGD) to the stochastic gradient setting. These methods have minimal storage requirements, are communication aware and perform well in settings where gradient computation is costly, while communication is relatively inexpensive. We investigate the convergence properties of our method under standard assumptions for stochastic gradients, i.e. unbiasedness and bounded variance. Our analysis indicates that our method converges to a neighborhood of the optimal solution with a linear rate for local strongly convex functions and appropriate constant steplengths. We also show that distributed optimization with stochastic gradients achieves a noise reduction effect similar to minibatching, which scales favorably with network size. Finally, we present numerical results to demonstrate the effectiveness of our method.


14:3014:50, Paper WeC2.4  Add to My Program 
An Extension of Decentralized Bayesian Learning for Federated Learning (I) 
Javidi, Tara  University of California, San Diego 
Lalitha, Anusha  University of California San Diego 

14:5015:10, Paper WeC2.5  Add to My Program 
OffPolicy ActorCritic with TimeVarying Behavior Policies for Distributed Reinforcement Learning in Continuous State and Action Spaces (I) 
Suttle, Wesley  Stony Brook University 
Liu, Ji  Stony Brook University 

WeC3 Regular Session, Butternut 
Add to My Program 
Machine Learning with Applications 


Chair: Cohen, Kobi  BenGurion University of the Negev 

13:3013:50, Paper WeC3.1  Add to My Program 
A Distributed Stable Strategy Learning Algorithm for MultiUser Dynamic Spectrum Access 
Gafni, Tomer  BenGurion University of the Negev 
Cohen, Kobi  BenGurion University of the Negev 
Keywords: Machine Learning and Learning Theory, Wireless Communication Systems, Learning and Inference
Abstract: We consider the problem of multiuser dynamic spectrum access (DSA) in cognitive radio networks. The shared bandwidth is divided into K orthogonal channels, and M (secondary) users aim at accessing the spectrum, where K>=M. Each user is allowed to choose a single channel for transmission at each time slot. The state of each channel is modeled by a restless unknown Markovian process. By contrast to existing studies that analyzed a special case of this setting, in which each channel yields the same expected rate for all users, in this paper we consider the more general model, where each channel yields a different expected rate for each user. This general model adds a significant challenge of how to efficiently learn a channel allocation in a distributed manner so as to yield a global system wide objective. We adopt the stable matching utility as the system objective, which is known to yield strong performance in multichannel wireless networks, and develop a novel Distributed Stable Strategy Learning (DSSL) algorithm to achieve the objective. We prove theoretically that the DSSL algorithm converges to the stable matching allocation, and the regret, defined as the loss in total rate with respect to the stable matching solution, has a logarithmic order with time. Finally, we present numerical examples that support the theoretical results and demonstrate strong performance of the DSSL algorithm.


13:5014:10, Paper WeC3.2  Add to My Program 
Learning a DomainInvariant Embedding for Unsupervised Domain Adaptation Using ClassConditioned Distribution Alignment 
Alex, Gabourie  Stanford University 
Rostami, Mohammad  University of Pennsylvania 
Pope, Philip  HRL Labs 
Kolouri, Soheil  HRL Lab 
Kim, Kyungnam  HRL Labs 
Keywords: Machine Learning and Learning Theory
Abstract: We address the problem of unsupervised domain adaptation (UDA) by learning a crossdomain agnostic embedding space, where the distance between the probability distributions of the two source and target visual domains is minimized. We use the output space of a shared crossdomain deep encoder to model the embedding space anduse the SlicedWasserstein Distance (SWD) to measure and minimize the distance between the embedded distributions of two source and target domains to enforce the embedding to be domainagnostic.Additionally, we use the source domain labeled data to train a deep classifier from the embedding space to the label space to enforce the embedding space to be discriminative.As a result of this training scheme, we provide an effective solution to train the deep classification network on the source domain such that it will generalize well on the target domain, where only unlabeled training data is accessible. To mitigate the challenge of class matching, we also align corresponding classes in the embedding space by using high confidence pseudolabels for the target domain, i.e. assigning the class for which the source classifier has a high prediction probability. We provide experimental results on UDA benchmark tasks to demonstrate that our method is effective and leads to stateoftheart performance.


14:1014:30, Paper WeC3.3  Add to My Program 
GLIMPS: A Greedy MixedInteger Approach for Super Robust Matched Subspace Detection 
Rahman, Md Mahfuzur  Georgia State University 
PimentelAlarcon, Daniel Leonardo  University of WisconsinMadison 
Keywords: Machine Learning and Learning Theory, Detection and Estimation, Optimization
Abstract: Due to diverse nature of data acquisition and modern applications, many contemporary problems involve high dimensional datum x ∈ R d whose entries often lie in a union of subspaces and the goal is to ﬁnd out which entries of x match with a particular subspace U, classically called matched subspace detection. Consequently, entries that match with one subspace are considered as inliers w.r.t the subspace while all other entries are considered as outliers. Proportion of outliers relative to each subspace varies based on the degree of coordinates from subspaces. This problem is a combinatorial NPhard in nature and has been immensely studied in recent years. Existing approaches can solve the problem when outliers are sparse. However, if outliers are abundant or in other words if x contains coordinates from a fair amount of subspaces, this problem can’t be solved with acceptable accuracy or within a reasonable amount of time. This paper proposes a twostage approach called Greedy Linear Integer Mixed Programmed Selector (GLIMPS) for this abundantoutliers setting, which combines a greedy algorithm and mixed integer formulation and can tolerate over 80% outliers, outperforming the stateoftheart.


14:3014:50, Paper WeC3.4  Add to My Program 
Matching Observations to Distributions: Efficient Estimation via Sparsified Hungarian Algorithm 
Chewi, Sinho  University of California, Berkeley 
Yang, Forest  University of California, Berkeley 
Ghosh, Avishek  University of California, Berkeley 
Ramchandran, Kannan  University of California, Berkeley 
Parekh, Abhay  University of California, Berkeley 
Keywords: Learning and Inference, Detection and Estimation, Machine Learning and Learning Theory
Abstract: Suppose we are given observations, where each observation is drawn independently from one of k known distributions. The goal is to match each observation to the distribution from which it was drawn. We observe that the maximum likelihood estimator (MLE) for this problem can be computed using weighted bipartite matching, even when n, the number of observations per distribution, exceeds one. This is achieved by instantiating n duplicates of each distribution node. However, in the regime where the number of observations per distribution is much larger than the number of distributions, the Hungarian matching algorithm for computing the weighted bipartite matching requires O(n^3) time. We introduce a novel randomized matching algorithm that reduces the runtime to O(n^2) by sparsifying the original graph, returning the exact MLE with high probability. Next, we give statistical justification for using the MLE by bounding the excess risk of the MLE, where the loss is defined as the negative loglikelihood. We test these bounds for the case of isotropic Gaussians with equal covariances and whose means are separated by a distance eta, and find (1) that log k separation suffices to drive the proportion of mismatches of the MLE to 0, and (2) that the expected fraction of mismatched observations goes to zero at rate O((log k)^2/eta^2).


14:5015:10, Paper WeC3.5  Add to My Program 
A Theory of Uncertainty Variables for State Estimation and Inference 
Talak, Rajat  MIT 
Karaman, Sertac  Massachusetts Institute of Technology 
Modiano, Eytan  MIT 
Keywords: Learning and Inference, Machine Learning and Learning Theory, Detection and Estimation
Abstract: Probability theory forms an overarching framework for modeling uncertainty, and by extension, also in designing state estimation and inference algorithms. While it provides a good foundation to system modeling, analysis, and an understanding of the real world, its application to algorithm design suffers from computational intractability. In this work, we develop a new framework of uncertainty variables to model uncertainty. A simple uncertainty variable is characterized by an uncertainty set, in which its realization is bound to lie, while the conditional uncertainty is characterized by a set map, from a given realization of a variable to a set of possible realizations of another variable. We prove Bayes' law and the law of total probability equivalents for uncertainty variables. We define a notion of independence, conditional independence, and pairwise independence for a collection of uncertainty variables, and show that this new notion of independence preserves the properties of independence defined over random variables. We then develop a graphical model, namely Bayesian uncertainty network, a Bayesian network equivalent defined over a collection of uncertainty variables, and show that all the natural conditional independence properties, expected out of a Bayesian network, hold for the Bayesian uncertainty network. We also define the notion of point estimate, and show its relation with the maximum a posteriori estimate.


WeC4 Regular Session, Pine 
Add to My Program 
Optimization I 


Chair: Doan, Thinh  Georgia Institute of Technology 

13:3013:50, Paper WeC4.1  Add to My Program 
Convergence Rate Analysis for Distributed Optimization with Localization 
Kao, Hsu  University of Michigan 
Subramanian, Vijay  University of Michigan 
Keywords: Optimization, Distributed and LargeScale Systems, Decentralized Control Systems
Abstract: We study the effect of the localization scheme on the convergence rate in the setting of distributed gradient descent for smooth optimization. Localization is a technique that exploits the partial dependency structure in the objective functions to reduce the memory storage and communication involved in distributed optimization algorithms. We find that the localization scheme could reduce the convergence rate by a constant factor, which depends on the sizes of the local networks and the spectral radii of the associated optimal doubly stochastic matrices for averaging. The choice of local networks directly affects the memory cost, the communication cost, and the convergence speed. In order to characterize the relation, we explore how the optimal spectral radius depends on the topology of a graph, i.e., local network in the context. We provide a numerical example that shows the saving in terms of memory, communication, as well as iterations required could be tremendous with the localization schemes.


13:5014:10, Paper WeC4.2  Add to My Program 
Robust Convergence Analysis of ThreeOperator Splitting 
Wang, Han  University of Pennsylvania 
Fazlyab, Mahyar  University of Pennsylvania 
Chen, Shaoru  University of Pennsylvania 
Preciado, Victor M.  University of Pennsylvania 
Keywords: Optimization, Robust and Nonlinear Control, Machine Learning and Learning Theory
Abstract: Operator splitting methods solve composite optimization problems by breaking them into smaller subproblems that can be solved sequentially or in parallel. In this paper, we propose a unified framework for certifying both linear and sublinear convergence rates for threeoperator splitting (TOS) method under a variety of assumptions about the objective function. By viewing the algorithm as a dynamical system with feedback uncertainty (the oracle model), we leverage robust control theory to analyze the worstcase performance of the algorithm using matrix inequalities. We then show how these matrix inequalities can be used to guide the search for selecting the parameters of the algorithm (both symbolically and numerically) for optimal worstcase performance. Our framework yields tighter bounds relative to competing bounds in the literature. We illustrate our results numerically by solving an inputconstrained optimal control problem.


14:1014:30, Paper WeC4.3  Add to My Program 
Linear TwoTimeScale Stochastic Approximation: A FiniteTime Analysis 
Doan, Thinh  Georgia Institute of Technology 
Romberg, Justin  Georgia Tech 
Keywords: Optimization, Performance Analysis, Machine Learning and Learning Theory
Abstract: We consider twotimescale stochastic approximation for finding the solution of a linear system of two equations. Such methods have found broad applications in many areas, especially in machine learning and reinforcement learning. A critical question in this area is to analyze the convergence rates (or sample complexity) of this method, which has not been fully addressed in the existing literature. Our contribution in this paper is, therefore, to provide a new analysis for the finitetime performance of the twotimescale stochastic approximation. Our key idea is to leverage the common techniques from optimization, in particular, we utilize a residual function to capture the coupling between the two iterates. This will allow us to explicit design the two step sizes used by the two iterations as well as to provide a finitetime error bound on the convergence of the two iterates. Our analysis in this paper provides another aspect to the existing techniques in the literature of twotimescale stochastic approximation, which we believe is more elegant and can be more applicable to many scenarios.


14:3014:50, Paper WeC4.4  Add to My Program 
Cubic Regularized ADMM with Convergence to a Local Minimum in NonConvex Optimization 
Shi, Zai  The Ohio State University 
Eryilmaz, Atilla  Ohio State University 
Keywords: Optimization, Machine Learning and Learning Theory
Abstract: How to escape saddle points is a critical issue in nonconvex optimization. Previous methods on this issue mainly assume that the objective function is HessianLipschitz, which leave a gap for applications using nonHessianLipschitz functions. In this paper, we propose Cubic Regularized Alternating Direction Method of Multipliers (CRADMM) to escape saddle points of separable nonconvex functions containing a nonHessianLipschitz component. By carefully choosing a parameter, we prove that CRADMM converges to a local minimum of the original function with a rate of O(1/T^(1/3)) in time horizon T, which is faster than gradientbased methods. We also show that when one or more steps of CRADMM are not solved exactly, CRADMM can converge to a neighborhood of the local minimum. Through the experiments of matrix factorization problems, CRADMM is shown to have a faster rate and a lower optimality gap compared with other gradientbased methods. Our approach can also find applications in other scenarios where regularized nonconvex cost minimization is performed, such as parameter optimization of deep neural networks.


14:5015:10, Paper WeC4.5  Add to My Program 
Byzantine FaultTolerant Parallelized Stochastic Gradient Descent for Linear Regression 
Gupta, Nirupam  Georgetown University 
Vaidya, Nitin  Georgetown University 
Keywords: Reliability, Security and Trust, Data Analytics, Detection and Estimation
Abstract: This paper addresses the problem of Byzantine faulttolerance in parallelized stochastic gradient descent (SGD) method solving for a linear regression problem. We consider a synchronous system comprising of a master and multiple workers, where up to a (known) constant number of workers are Byzantine faulty. Byzantine faulty workers may send incorrect information to the master during an execution of the parallelized SGD method. To mitigate the detrimental impact of Byzantine faulty workers, we replace the averaging of gradients in the traditional parallelized SGD method by a provably more robust gradient aggregation rule. The crux of the proposed gradient aggregation rule is a gradientfilter, named comparative gradient clipping (CGC) filter. We show that the resultant parallelized SGD method obtains a good estimate of the regression parameter even in presence of bounded fraction of Byzantine faulty workers. The upper bound derived for the asymptotic estimation error only grows linearly with the fraction of Byzantine faulty workers.


WeC5 Invited Session, Lower Level 
Add to My Program 
Sequential Methods II 


Chair: Veeravalli, Venu  University of Illinois 
Organizer: Fellouris, Georgios  University of Illinois 
Organizer: Veeravalli, Venu  University of Illinois 

13:3013:50, Paper WeC5.1  Add to My Program 
Sequential ModelFree Anomaly Detection for Big Data Streams (I) 
Kurt, Mehmet Necip  Columbia University 
Yilmaz, Yasin  University of South Florida 
Wang, Xiaodong  Columbia University 
Keywords: Detection and Estimation, Learning and Inference, Statistical Signal Processing
Abstract: We study sequential anomaly detection for big data streams where the nominal and anomalous highdimensional probabilistic data models are unknown. We propose a modelfree solution approach in that we firstly compute a set of univariate summary statistics from a nominal dataset in an offline phase where the summary statistics are useful to distinguish anomalous data from nominal data. We then evaluate whether the online summary statistics deviate from the nominal case via a cumulative sumlike detector. Our experiments with realworld data illustrate the advantages of the proposed detector in early and reliable anomaly detection in big data settings compared to the existing alternatives.


13:5014:10, Paper WeC5.2  Add to My Program 
Adaptive Online Monitoring of the Ising Model (I) 
Suh, Namjoon  Georgia Institute of Technology 
Zhang, Ruizhi  University of Nebraska, Lincoln 
Mei, Yajun  Georgia Institute of Technology 
Keywords: Sensor Networks, Adaptive Control and Automation, Detection and Estimation
Abstract: Ising model is a general framework for capturing the dependency structure among random variables. It has many interesting realworld applications in the fields of medical imaging, genetics, disease surveillance, etc. Nonetheless, literature on the online changepoint detection of the interaction parameter in the model is rather limited. This might be attributed to following two challenges: 1) the exact evaluation of the likelihood function with the given data is computationally infeasible due to the presence of partition function and 2) the postchange parameter usually is unknown. In this paper, we overcome these two challenges via our proposed adaptive pseudoCUSUM procedure, which incorporates the notion of pseudolikelihood function under the CUSUM framework.


14:1014:30, Paper WeC5.3  Add to My Program 
Controlled Sensing for Hypothesis Testing: NonStationary Markov Process (I) 
Tajer, Ali  Rensselaer Polytechnic Institute 

14:3014:50, Paper WeC5.4  Add to My Program 
Robust Jeffery's Divergence for Sequential ChangePoint Detection (I) 
Ahad, Nauman  Georgia Institute of Technology 
Xie, Yao  Georgia Institute of Technology 
Davenport, Mark  Georgia Institute of Technology 

14:5015:10, Paper WeC5.5  Add to My Program 
Stochastic Gradient Descent on a Tree: An Adaptive and Robust Approach to Stochastic Convex Optimization (I) 
Vakili, Sattar  Cornell Univeristy 
Salgia, Sudeep  Cornell University 
Zhao, Qing  Cornell University 
Keywords: Machine Learning and Learning Theory, Optimization
Abstract: Online minimization of an unknown convex function is considered under firstorder stochastic bandit feedback, which returns a random realization of the gradient of the function at each query point. Without knowing the distribution of the random gradients, a learning algorithm sequentially chooses query points with the objective of minimizing regret defined as the expected cumulative loss of the function values at the query points in excess to the minimum value of the function. An approach based on devising a biased random walk on an infinitedepth binary tree constructed through successive partitioning of the domain of the function is developed. Each move of the random walk is guided by a sequential test based on confidence bounds on the empirical mean constructed using the law of the iterated logarithm. With no tuning parameters, this learning algorithm is robust to heavytailed noise with infinite variance and adaptive to unknown function characteristics (specifically, convex, strongly convex, and nonsmooth). It offers better or matching regret orders as the classical stochastic gradient descent approach which requires the knowledge of the function characteristics for tuning the sequence of stepsizes.


WeD1 Invited Session, Library 
Add to My Program 
Bioinformatics and Computational Genomics 


Chair: Ochoa, Idoia  University of Illinois at UrbanaChampaign 
CoChair: Hernaez, Mikel  University of Illinois at UrbanaChampaign 
Organizer: Hernaez, Mikel  UIUC 
Organizer: Ochoa, Idoia  University of Illinois at UrbanaChampaign 
Organizer: Shomorony, Ilan  University of Illinois at UrbanaChampaign 
Organizer: ElKebir, Mohammed  University of Illinois at UrbanaChampaign 

15:3015:50, Paper WeD1.1  Add to My Program 
Casting Analytics to Capture Individual Variations in Complex Diseases (I) 
Kalari, Krishna  Mayo Clinic 

15:5016:10, Paper WeD1.2  Add to My Program 
Models, Methods, and Statistical Techniques for Single Cell RNASeq Experiments (I) 
Grama, Ananth  Purdue University 
Mohammadi, Shahin  MIT 

16:1016:30, Paper WeD1.3  Add to My Program 
Alerton Condensation Abstract Submission (I) 
Krishnaswamy, Smita  Yale University 

16:3016:50, Paper WeD1.4  Add to My Program 
Learning Spatial Structures through Hierarchical Models (I) 
Banerjee, Anjishnu  Medical College of Wisconsin 
LaViolette, Peter  Medical College of Wisconsin 

16:5017:10, Paper WeD1.5  Add to My Program 
Reconstruction of Tumor Population from HighThroughput Sequencing Data: A Mixed Integer Linear Programming Framework (I) 
Shorya, Consul  UT Austin 
Vikalo, Haris  The University of Texas at Austin 

WeD2 Invited Session, Solarium 
Add to My Program 
Distributed Control, Optimization, and Learning II 


Chair: Sun, Ruoyu  UIUC 
Organizer: Beck, Carolyn  University of Illinois 
Organizer: Etesami, Rasoul  UIUC 
Organizer: Hu, Bin  University of Illinois at UrbanaChampaign 
Organizer: Srikant, R  University of Illinois 

15:3015:50, Paper WeD2.1  Add to My Program 
Convergence Properties of Social HegselmannKrause Dynamics (I) 
Parasnis, Rohit  University of California, San Diego 
Franceschetti, Massimo  University of California at San Diego 
Touri, Behrouz  University of California, San Diego 

15:5016:10, Paper WeD2.2  Add to My Program 
Learning Transition Statistics in Networks of Interacting Agents (I) 
Fiscko, Carmel  Carnegie Mellon University 
Kar, Soummya  Carnegie Mellon University 
Sinopoli, Bruno  Washington University in St. Louis 
Keywords: Decentralized Control Systems
Abstract: Studying the decisionmaking of agents can reveal group behavior and internal lines of influence. We work with systems of interacting agents, where the decisionmaking of each agent is affected by their neighbors within some graph structure. As agents make choices, the stochastic transitions between chosen group actions can be learned, and thus the group behavior can be characterized and predicted. We express each element of the transition matrix P as the product p_{ij}=prod_{text{agents}} P(text{agent's action}text{actions of agent's neighbors}), showing that the graph structure leads to a separable estimator for the unknown p_{ij} of interest. This enables us to find a maximum likelihood estimator (MLE) for each factor and thus effectively estimate each p_{ij} with reduced complexity. We derive analytical concentration bounds for the error rates of this approach and demonstrate it on data sets.


16:1016:30, Paper WeD2.3  Add to My Program 
Simple, Private, and Accurate Distributed Averaging (I) 
Ridgley, Israel  Northwestern University 
Freeman, Randy  Northwestern University 
Lynch, Kevin  Northwestern University 
Keywords: Optimization, Sensor Networks, Complex Networked Systems
Abstract: Some distributed optimization applications require privacy, meaning that the values of certain parameters local to a node should not be revealed to other nodes in the network during the joint optimization process. A special case is the problem of private distributed averaging, in which a network of nodes computes the global average of individual node reference parameters in a distributed manner while preserving the privacy of each reference. We present simple iterative methods that guarantee high accuracy (i.e. the exact asymptotic computation of the global average) and high privacy (i.e. no node can estimate another node's reference value to any meaningful degree). To achieve this, we assume that the digraph modeling the communication between nodes satisfies certain topological conditions. Other related methods in the literature also achieve high accuracy and privacy, but under topological conditions more restrictive than ours. Moreover, our method is simpler because it does not require any initial scrambling phase, it does not inject any noise or other masking signals into the distributed computation, it does not require any random switching of edge weights, and it does not rely on homomorphic encryption.


16:3016:50, Paper WeD2.4  Add to My Program 
Robust Reinforcement Learning with Unmodeled Dynamics (I) 
Seiler, Peter  University of Minnesota 
Venkataraman, Harish Kumaar  University of Minnesota 

16:5017:10, Paper WeD2.5  Add to My Program 
Analysis and Design of FirstOrder Distributed Optimization Algorithms Over TimeVarying Graphs (I) 
Sundararajan, Akhil  University of WisconsinMadison 
Van Scoy, Bryan  University of WisconsinMadison 
Lessard, Laurent  University of WisconsinMadison 

WeD3 Regular Session, Butternut 
Add to My Program 
Reliability, Security and Trust 


Chair: Peng, Pei  Rutgers, the State University of New Jersey 

15:3015:50, Paper WeD3.1  Add to My Program 
Straggling for Covert Message Passing on Complete Graphs 
Peng, Pei  Rutgers, the State University of New Jersey 
Melissaris, Nikolas  Rutgers University 
Soljanin, Emina  Rutgers University 
Keywords: Reliability, Security and Trust, Wireless Communication Systems, Performance Analysis
Abstract: We introduce a model for mobile, multiagent information transfer that increases the communication covertness through a protocol which also increases the information transfer delay. Covertness is achieved in the presence of a warden who has the ability to patrol the communication channels. Furthermore we show how two forms of redundancy can be used as an effective tool to control the tradeoff between the covertness and the delay.


15:5016:10, Paper WeD3.2  Add to My Program 
Collaborative Privacy for Web Applications 
Hu, Yihao  Boston University 
Trachtenberg, Ari  Boston University 
Ishwar, Prakash  Boston University 
Keywords: Reliability, Security and Trust, Coding Techniques and Applications, Information Theory
Abstract: Realtime, onlineediting web apps provide free and convenient services for collaboratively editing, sharing and storing files. The benefits of these web applications do not come for free: not only do service providers have full access to the users' files, but they also control access, transmission, and storage mechanisms for them. As a result, user data may be at risk of data mining, thirdparty interception, or even manipulation. To combat this, we propose a new system for helping to preserve the privacy of user data within collaborative environments. There are several distinct challenges in producing such a system, including developing an encryption mechanism that does not interfere with the backend (and often proprietary) control mechanisms utilized by the service, and identifying transparent code hooks through which to obfuscate user data. Toward the first challenge, we develop a characterlevel encryption scheme that is more resilient to the types of attacks that plague classical substitution ciphers. For the second challenge, we design a browser extension that robustly demonstrates the feasibility of our approach, and show a concrete implementation for Google Chrome and the widelyused Google Docs platform. Our example tangibly demonstrates how several users with a shared key can ollaboratively and transparently edit a Google Docs document without revealing the plaintext directly to Google.


16:1016:30, Paper WeD3.3  Add to My Program 
A CyberNuclear Deterrence Game 
Soper, Braden C.  Lawrence Livermore National Laboratory 
Keywords: Reliability, Security and Trust, Complex Networked Systems
Abstract: The reliability of nuclear command, control and communications has long been identified as a critical component of the strategic stability among nuclear states. Advances in offensive cyber weaponry have the potential to negatively impact this reliability, threatening strategic stability. In this paper we present a game theoretic model of preemptive cyber attacks against nuclear command, control and communications. The model is a modification of the classic twoplayer game of Chicken, a standard game theoretic model for nuclear brinksmanship. We fully characterize equilibria in both the complete information game and two distinct twosided incomplete information games. We show that when both players have advanced cyber capabilities conflict is more likely in equilibrium, regardless of information structure. On the other hand, when at most one player has advanced cyber capabilities, strategic stability depends on the information structure. Under complete information, asymmetric cyber capabilities have a stabilizing effect in which the player with strong cyber has the resolve to stand firm in equilibrium. Under incomplete information, asymmetric cyber capabilities can have both stabilizing and destabilizing effects depending on prior beliefs over opponent cyber capabilities.


16:3016:50, Paper WeD3.4  Add to My Program 
Local Voting Games for Misbehavior Detection in VANETs in Presence of Uncertainty 
Behfarnia, Ali  Wichita State University 
Eslami, Ali  Wichita State University 
Keywords: Intrusion/Anomaly Detection and Diagnosis, Sensor Networks
Abstract: Cooperation between neighboring vehicles is an effective solution to the problem of malicious node identification in vehicular ad hoc networks (VANETs). However, the outcome is subject to nodes' beliefs and reactions in the collaboration. In this paper, a plain gametheoretic approach that captures the uncertainty of nodes about their monitoring systems, the type of their neighboring nodes, and the outcome of the cooperation is proposed. In particular, one stage of a local votingbased mechanism (game) for identifying a target node is developed using a Bayesian game. In this context, incentives are offered in expected utilities of nodes in order to promote cooperation in the network. The proposed model is then analyzed to obtain equilibrium points, ensuring that no node can improve its utility by changing its strategy. Finally, the behavior of malicious and benign nodes is studied by extensive simulation results. Specifically, it is shown how the existing uncertainties and the designed incentives impact the strategies of the players and, consequently, the correct targetnode identification.


16:5017:10, Paper WeD3.5  Add to My Program 
Improving Privacy in Graphs through Node Addition 
Takbiri, Nazanin  University of Massachusetts Amherst 
Shao, Xiaozhe  University of Massachusetts Amherst 
Gao, Lixin  University of Massachusetts Amherst 
PishroNik, Hossein  University of Massachusetts Amherst 
Keywords: Reliability, Security and Trust, Information Theory and Statistics, Network Information Theory
Abstract: The rapid growth of computer systems which generate graph data necessitates employing privacypreserving mechanisms to protect users' identity. Since structurebased deanonymization attacks can reveal users' identity's even when the graph is simply anonymized by employing naive ID removal, recently, kanonymity is proposed to secure users' privacy against the structurebased attack. Most of the work ensured graph privacy using fake edges, however, in some applications, edge addition or deletion might cause a significant change to the key property of the graph. Motivated by this fact, in this paper, we introduce a novel method which ensures privacy by adding fake nodes to the graph. First, we present a novel model which provides kanonymity against one of the strongest attacks: seedbased attack. In this attack, the adversary knows the partial mapping between the main graph and the graph which is generated using the privacypreserving mechanisms. We show that even if the adversary knows the mapping of all of the nodes except one, the last node can still have kanonymity privacy. Then, we turn our attention to the privacy of the graphs generated by interdomain routing against degree attacks in which the degree sequence of the graph is known to the adversary. To ensure the privacy of networks against this attack, we propose a novel method which tries to add fake nodes in a way that the degree of all nodes have the same expected value.


17:1017:30, Paper WeD3.6  Add to My Program 
PrivacyPreserving Adversarial Networks 
Tripathy, Ardhendu Shekhar  University of WisconsinMadison 
Wang, Ye  Mitsubishi Electric Research Laboratories 
Ishwar, Prakash  Boston University 
Keywords: Reliability, Security and Trust, Machine Learning and Learning Theory, Learning and Inference
Abstract: We propose a datadriven framework for optimizing privacypreserving data release mechanisms to attain the informationtheoretically optimal tradeoff between minimizing distortion of useful data and concealing specific sensitive information. Our approach employs adversariallytrained neural networks to implement randomized mechanisms and to perform a variational approximation of mutual information privacy. We validate our PrivacyPreserving Adversarial Networks (PPAN) framework via proofofconcept experiments on discrete and continuous synthetic data, as well as the MNIST handwritten digits dataset. For synthetic data, our modelagnostic PPAN approach achieves tradeoff points very close to the optimal tradeoffs that are analyticallyderived from model knowledge. In experiments with the MNIST data, we visually demonstrate a learned tradeoff between minimizing the pixellevel distortion versus concealing the written digit.


WeD4 Invited Session, Pine 
Add to My Program 
Computational Imaging and Inverse Problems 


Chair: Do, Minh  University of Illinois 
CoChair: Zhao, Zhizhen  University of Illinois at UrbanaChampaign 
Organizer: Dokmanic, Ivan  University of Illinois at UrbanaChampaign 
Organizer: Zhao, Zhizhen  University of Illinois at UrbanaChampaign 
Organizer: Do, Minh  University of Illinois 

15:3015:50, Paper WeD4.1  Add to My Program 
FarSubwavelength Imaging Based on Motion in Structured Illumination (I) 
Webb, Kevin John  Purdue University 
Luo, Qiaoen  Purdue University 
Hastings, Ryan  Purdue University 
Raghuram, Vivek  Purdue University 
Patel, Justin A.  Purdue University 

15:5016:10, Paper WeD4.2  Add to My Program 
Early Stopped Gradient Descent on Convolutional Generators Provably Regularizes (I) 
Heckel, Reinhard  TUM 
Soltalolkotabi, Mahdi  USC 

16:1016:30, Paper WeD4.3  Add to My Program 
Neumann Networks for Inverse Problems in Imaging (I) 
Ongie, Greg  University of Chicago 
Gilton, Davis  University of Wisconsin 
Willett, Rebecca  University of Chicago 

16:3016:50, Paper WeD4.4  Add to My Program 
Surface and Function Recovery in HighDimensional Spaces (I) 
Jacob, Mathews  University of Iowa 

16:5017:10, Paper WeD4.5  Add to My Program 
PlugAndPlay and Unrolled AMP for Accelerated MRI (I) 
Sarkar, Subrata  Ohio State 
Reehorst, Edward  Ohio State University 
Ahmad, Rizwan  Ohio State 
Schniter, Philip  The Ohio State University 

WeD5 Invited Session, Lower Level 
Add to My Program 
Reinforcement Learning and Multiarmed Bandits I 


Chair: Sirignano, Justin  University of Illinois at UrbanaChampaign 
Organizer: Hajek, Bruce  University of Illinois 
Organizer: Srikant, R  University of Illinois 

15:3015:50, Paper WeD5.1  Add to My Program 
An InformationTheoretic Analysis of Posterior Sampling for Stochastic Graphical Bandits (I) 
Liu, Fang  The Ohio State University 
Shroff, Ness  The Ohio State University 

15:5016:10, Paper WeD5.2  Add to My Program 
Practical Algorithms for Multiplayer Bandits (I) 
Kaufmann, Emilie  CNRS 

16:1016:30, Paper WeD5.3  Add to My Program 
On the Throughput vs Accuracy TradeOff for Streaming Classification (I) 
Basu, Soumya  UT Austin 
Gutstein, Steven  US ARL 
Lance, Brent  ARLHRED 
Shakkottai, Sanjay  The University of Texas at Austin 

16:3016:50, Paper WeD5.4  Add to My Program 
Revisiting Sample Complexity in Reinforcement Learning (I) 
Proutiere, Alexandre  KTH 
Ok, Jungseul  UIUC 

16:5017:10, Paper WeD5.5  Add to My Program 
On the Sample Complexity of Distributed Linear Optimal Controllers (I) 
Fattahi, Salar  University of California, Berkeley 
Matni, Nikolai  University of Pennsylvania 
Sojoudi, Somayeh  UC Berkeley 

17:1017:30, Paper WeD5.6  Add to My Program 
Stopping, Matching and Learning (I) 
Zeevi, Assaf  Columbia University 
 