Keywords:Optimization, Data Analytics Abstract: We present a randomized algorithm for reconstructing directed rooted trees of n nodes and node degree at most d, by asking at most O(dn log^2 n) path queries. Each path query takes as input an origin node and a target node, and answers whether there is a directed path from the origin to the target. Regarding lower bounds, we show that any randomized algorithm requires at least Omega(n log n) queries, while any deterministic algorithm requires at least Omega(dn) queries. Additionally, we present a O(dn log^3 n) randomized algorithm for noisy queries, and a O(dn log^2 n) randomized algorithm for additive queries on weighted trees.
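As a toy illustration of the query model only (not the paper's O(dn log^2 n) randomized algorithm), the sketch below builds a path-query oracle for a known tree and recovers each node's parent with a naive deterministic strategy that spends far more queries; the function names are hypothetical.

```python
def make_path_oracle(parent):
    """Build a path-query oracle for a rooted tree given as {child: parent}.
    path(u, v) answers: is there a directed path from u down to v?"""
    def path(u, v):
        while v is not None:
            if v == u:
                return True
            v = parent.get(v)  # walk upward from v toward the root
        return False
    return path

def reconstruct(nodes, path):
    """Naive reconstruction from path queries alone: the parent of v is
    its closest ancestor, i.e. the ancestor u of v that is itself a
    descendant of every other ancestor of v.  Uses many more queries
    than the paper's algorithm, but only the oracle."""
    recovered = {}
    for v in nodes:
        ancestors = [u for u in nodes if u != v and path(u, v)]
        parent_v = None
        for u in ancestors:
            if all(path(w, u) for w in ancestors if w != u):
                parent_v = u  # u lies below all other ancestors of v
        recovered[v] = parent_v
    return recovered
```

For the root, `reconstruct` returns `None`, since it has no ancestors at all.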

Keywords:Coding Techniques and Applications, Coding Theory Abstract: In this paper, we give a new scheme for joint source-channel coding (JSCC) of Gaussian sources over AWGN channels. We present a novel VAE architecture that exploits the manifold nature of joint source-channel coding. Since the projection onto a complex manifold is a highly discontinuous function, we split the encoder into two arms and a selector. This design, combined with the power of data-driven deep neural networks, allows us to efficiently train an encoder-decoder system end-to-end. The scheme achieves state-of-the-art performance for five of the seven configurations tested, and matches the state of the art for one of the remaining configurations under almost all channel conditions considered. Further, without providing any expert prior on the optimum JSCC encoder, the proposed method learns an encoding function similar to the ones proposed by researchers over the past three decades via heuristic approaches or complex variational analysis for simple configurations. It is also able to find excellent encoders and decoders for source-channel dimensions where past works have been unable to find any good solutions. Finally, we show that the system is also robust to changes in channel conditions.

Keywords:Wireless Communication Systems, Multiuser Detection and Estimation, Information Theoretic Approaches in Wireless Communications Abstract: Cell-free massive MIMO (CF-M-MIMO) systems represent an evolution of the classical cellular architecture that has dominated the mobile landscape for decades. In CF-M-MIMO, a central processing unit (CPU) controls a multitude of access points (APs) that are irregularly scattered throughout the coverage area, effectively becoming a fully distributed implementation of the M-MIMO technology. As such, it inherits many of the key properties that have made M-MIMO one of the physical layer pillars of 5G systems, while opening the door to new features not available in M-MIMO. Among the latter is the possibility of performing the precoding at the CPU (centralized) or at the APs (distributed), with the former known to offer much better performance at the cost of having to collect all the relevant channel state information (CSI) at the CPU. Realistic deployments of cell-free systems are likely to require more than one CPU when the area to be covered is large; thus, a critical issue that needs to be solved is how these multiple CPUs should be interconnected. This paper analyzes and proposes designs for different degrees of interconnectivity among the CPUs for the specific case of centralized zero-forcing precoding. Results show that a modest form of CPU interconnection can very significantly boost the max-min rate performance so prevalent in CF-M-MIMO architectures.

Keywords:Sensor Networks in Communications, Sensor Networks, Performance Analysis Abstract: The notion of timely status updating is investigated in the context of cloud computing. Measurements of a time-varying process of interest are acquired by a sensor node, and uploaded to a cloud server to undergo some required computations. These computations consume random amounts of service time that are independent and identically distributed across different uploads. After the computations are done, the results are delivered to a monitor, constituting an update. The goal is to keep the monitor continuously fed with fresh updates over time, which is assessed by an age-of-information (AoI) metric. A scheduler is employed to optimize the measurement acquisition times. Following an update, an idle waiting period may be imposed by the scheduler before acquiring a new measurement. The scheduler also has the capability to preempt a measurement in progress if its service time grows above a certain cutoff time, and upload a fresher measurement in its place. Focusing on stationary deterministic policies, in which waiting times are deterministic functions of the instantaneous AoI and the cutoff time is fixed for all uploads, it is shown that the optimal waiting policy that minimizes the long term average AoI has a threshold structure, in which a new measurement is uploaded following an update only if the AoI grows above a certain threshold that is a function of the service time distribution and the cutoff time. The optimal cutoff is then found for standard and shifted exponential service time distributions.
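A Monte Carlo sketch of this model can make the threshold/cutoff interplay concrete (the parameterization below is an illustrative assumption; the paper characterizes the optimal threshold and cutoff analytically). AoI grows linearly between updates, an upload exceeding the cutoff is preempted and replaced by a fresh measurement, and a new measurement is acquired only once the AoI reaches the threshold.

```python
import random

def avg_aoi(threshold, cutoff, rate=1.0, n_cycles=100000, seed=0):
    """Estimate the long-term average AoI under a threshold waiting
    policy with a service-time cutoff.  An update delivered after
    service time s resets the AoI to s (the measurement's age)."""
    rng = random.Random(seed)
    total_time = total_area = 0.0
    age = 0.0  # AoI just after the last delivered update
    for _ in range(n_cycles):
        # idle-wait until the AoI reaches the threshold
        wait = max(0.0, threshold - age)
        total_area += age * wait + wait * wait / 2
        total_time += wait
        age += wait
        # upload; preempt with a fresh measurement if service exceeds cutoff
        while True:
            s = rng.expovariate(rate)
            dt = min(s, cutoff)
            total_area += age * dt + dt * dt / 2
            total_time += dt
            age += dt
            if s <= cutoff:
                break
        age = s  # delivered measurement was acquired s time units ago
    return total_area / total_time
```

For exponential service times, forcing a large waiting threshold only inflates the average AoI relative to zero waiting, consistent with threshold-type policies degenerating to zero wait in the light-tailed case.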

Keywords:Coding Theory, Wireless Communication Systems, Coding Techniques and Applications Abstract: Visible light communication (VLC) uses run length limited (RLL) codes to avoid flicker and support various dimming ranges. In this paper, we propose a low-complexity split phase code as the RLL code in serial concatenation with a convolutional code as a forward error correction (FEC) code for VLC. We use the extrinsic information transfer (EXIT) chart to explain the iterative decoding behavior of the proposed serially concatenated scheme. We also use puncturing and compensation symbols to support various dimming ranges in VLC. Thereafter, we derive an expression for an upper bound on the average probability of bit error for the proposed VLC system under maximum likelihood decoding. Furthermore, we propose a method of code mixing in the inner RLL code to improve the bit error rate performance in the low signal-to-noise ratio regime. The EXIT chart is used to analyze the effect of RLL code mixing on the convergence threshold of the iterative concatenated coding scheme.

Keywords:Coding Techniques and Applications, Coding Theory, Information Theory Abstract: Consider cache-aided broadcast channels with user cooperation, where a server connects with K users, each equipped with a cache memory, while the users can communicate with each other through a cooperation network. We introduce a new definition of transmission delay, together with a decentralized coded caching scheme with a novel randomized delivery strategy. By exploiting parallel transmission between the server and users, and optimally allocating communication loads, the transmission delay is greatly reduced, creating a cooperation gain and a parallel gain. It is shown that if the cache size of each user is larger than a small threshold that tends to zero as K goes to infinity, the scheme approaches the information-theoretic lower bound within a constant multiplicative factor.

Keywords:Coding Theory, Coding Techniques and Applications, Information Theory Abstract: Cryptographic protocols are often implemented at upper layers of communication networks, while error-correcting codes are employed at the physical layer. In this paper, we consider utilizing readily-available physical layer functions, such as encoders and decoders, together with shared keys to provide a threshold-type security scheme. To this end, the effect of physical layer communication is abstracted out and the channels between the legitimate parties, Alice and Bob, and the eavesdropper Eve are assumed to be noiseless. We introduce a model for threshold-secure coding, where Alice and Bob communicate using a shared key in such a way that Eve does not get any information, in an information-theoretic sense, about the key as well as about any subset of the input symbols of size up to a certain threshold. Then, a framework is provided for constructing threshold-secure codes from linear block codes while characterizing the requirements to satisfy the reliability and security conditions. Moreover, we propose a threshold-secure coding scheme, based on Reed-Muller (RM) codes, that meets the security and reliability conditions. Furthermore, it is shown that the encoder and the decoder of the scheme can be implemented efficiently with quasi-linear time complexity. In particular, a low-complexity successive cancellation decoder is shown for the RM-based scheme. Also, the scheme is flexible and can be adapted to any key length.

Keywords:Network Coding, Performance Analysis, Coding Theory Abstract: Numerous applications require the sharing of data from each node on a network with every other node. In the case of Connected and Autonomous Vehicles (CAVs), it will be necessary for vehicles to update each other with their positions, manoeuvring intentions, and other telemetry data, despite shadowing caused by other vehicles. These applications require scalable, reliable, low latency communications over challenging broadcast channels. In this article, we consider the allcast problem of achieving multiple simultaneous network broadcasts over a broadcast medium. We model slow fading using random graphs, and show that an allcast method based on sparse random linear network coding can achieve reliable allcast in a constant number of transmission rounds. We compare this with an uncoded baseline, which we show requires O(log(n)) transmission rounds. We justify and compare our analysis with extensive simulations.
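The rank argument underlying random linear network coding can be illustrated in a few lines (a toy GF(2) decoder, not the paper's allcast protocol): a receiver can decode k source packets as soon as the random coefficient vectors it has collected span GF(2)^k, which typically requires only a small constant number of combinations beyond k.

```python
import random

def rounds_to_decode(k, seed=0):
    """Count random GF(2) combinations of k source packets collected
    until the coefficient vectors (stored as bitmask integers) reach
    full rank, i.e. until Gaussian elimination recovers every packet."""
    rng = random.Random(seed)
    basis = {}  # leading-bit position -> reduced coefficient vector
    count = 0
    while len(basis) < k:
        v = rng.randrange(1, 1 << k)  # a random nonzero combination
        count += 1
        while v:
            lead = v.bit_length() - 1
            if lead not in basis:
                basis[lead] = v       # new pivot: rank grows by one
                break
            v ^= basis[lead]          # reduce against the current basis
    return count
```

The expected overhead beyond k is a small constant (about 1.6 combinations) regardless of k, a standard property of random GF(2) matrices.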

Keywords:Optimization, Distributed and Large-Scale Systems, Decentralized Control Systems Abstract: In this paper we introduce a new class of local linear operators on graphs, the sheaf Laplacians, which provide drop-in replacements for graph Laplacians in distributed algorithms. These operators can enforce more general constraints on data distributed in a network than those given by the graph Laplacian. The constraints for such optimization problems can be framed in the context of sheaf cohomology, leading to a description of this framework as distributed optimization with homological constraints. We formulate a representative problem, elucidate its solution with sheaf Laplacians, and give illustrative examples of potential applications.

Keywords:Optimization Abstract: We study continuous time job scheduling and server speed scaling problems with a focus on minimizing service and job waiting costs. Jobs arrive over time and are to be served by a server. Each job is characterized by its arrival time, service requirement and deadline. The processor incurs service cost, the service cost rate being an increasing convex function of the service rate. Further, the jobs also incur a linear waiting cost. We first consider the offline version of the problem where arrival times, requirements and deadlines of all the jobs are known a priori. We propose job scheduling algorithms that minimize the total accrued cost. We then consider the online version of the problem and propose two heuristics. Our problem can be seen as an extension of the well studied speed scaling problem with a delay cost also included.

Keywords:Optimization Abstract: There has been much recent interest in hierarchies of progressively stronger convexifications of polynomial optimisation problems (POP). These often converge to the global optimum of the POP, at least asymptotically, but prove challenging to solve beyond the first level in the hierarchy for modest instances. We present a finer-grained variant of the Lasserre hierarchy, together with first-order methods for solving the convexifications, which allow for efficient warm-starting with solutions from lower levels in the hierarchy.

Keywords:Optimization, Distributed and Large-Scale Systems Abstract: In this paper, an asynchronous random projection algorithm is introduced to solve a distributed constrained convex optimization problem over a time-varying multi-agent network. In this asynchronous case, each agent computes its estimate by exchanging information with its neighbors within a bounded delay lapse. For diminishing uncoordinated stepsizes and some standard conditions on the gradient errors, we provide a convergence analysis of Distributed Asynchronous Random Projection Algorithm (DARPA) to the same optimal point under an arbitrary uniformly bounded delay.

Keywords:Optimization, Distributed and Large-Scale Systems Abstract: The paper studies continuous-time distributed gradient descent (DGD) and considers the problem of showing that in nonconvex optimization problems, DGD typically converges to local minima rather than saddle points. In centralized settings, the problem of demonstrating nonconvergence to saddle points is typically handled by way of the stable-manifold theorem from classical dynamical systems theory. However, the classical stable-manifold theorem is not applicable in the distributed setting. The paper develops an appropriate stable-manifold theorem for DGD. This shows that convergence to saddle points may only occur from a low-dimensional stable manifold. Under appropriate assumptions (e.g., coercivity), the result implies that DGD almost always converges to local minima.

Keywords:Information Theory and Statistics Abstract: We study the problem of community detection when there is covariate information about the node labels and one observes multiple correlated networks. We provide an asymptotic upper bound on the per-node mutual information as well as a heuristic analysis of a multivariate performance measure called the MMSE matrix. These results show that the combined effects of seemingly very different types of information can be characterized explicitly in terms of formulas involving low-dimensional estimation problems in additive Gaussian noise. Our analysis is supported by numerical simulations.

Keywords:Coding Techniques and Applications, Information Theory Abstract: This paper considers the problem of Quantitative Group Testing (QGT) where there are some defective items among a large population of N items. We consider the scenario in which each item is defective with probability K/N, independently from the other items. In the QGT problem, the goal is to identify all or a sufficiently large fraction of the defective items by testing groups of items, with the minimum possible number of tests. In particular, the outcome of each test is a non-negative integer which indicates the number of defective items in the tested group. In this work, we propose a non-adaptive QGT scheme for the underlying randomized model for defective items, which utilizes sparse graph codes over irregular bipartite graphs with optimized degree profiles on the left nodes of the graph as well as binary t-error-correcting BCH codes. We show that in the sub-linear regime, i.e., when the ratio K/N vanishes as N grows unbounded, the proposed scheme with m = c(t,d) K (t log(ell N / (c(t,d) K) + 1) + 1) tests can identify all the defective items with probability approaching 1, where d and ell are the maximum and average left degree, respectively, and c(t,d) depends only on t and d (and not on K and N). For any t <= 4, the testing and recovery algorithms of the proposed scheme have computational complexity O(N log(N/K)) and O(K log(N/K)), respectively.
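A minimal peeling-style decoder conveys the flavor of sparse-graph-based QGT (a toy sketch only; the actual scheme additionally uses optimized degree profiles and t-error-correcting BCH codes): a test with outcome 0 clears all of its items, and a test whose residual outcome equals its number of unresolved items declares them all defective.

```python
def qgt_peel(tests, outcomes):
    """Peeling decoder for quantitative group testing: tests[i] is a
    set of item indices, outcomes[i] is the number of defectives in it.
    Returns a dict mapping each resolvable item to 0 or 1."""
    tests = [set(t) for t in tests]
    outcomes = list(outcomes)
    status = {}
    progress = True
    while progress:
        progress = False
        for i, t in enumerate(tests):
            unresolved = t - set(status)
            if not unresolved:
                continue
            if outcomes[i] == 0:                   # everyone here is clean
                for j in unresolved:
                    status[j] = 0
                progress = True
            elif outcomes[i] == len(unresolved):   # all unresolved defective
                for j in unresolved:
                    status[j] = 1
                progress = True
        for i, t in enumerate(tests):              # peel resolved defectives
            for j in [j for j in t if status.get(j) == 1]:
                t.discard(j)
                outcomes[i] -= 1
    return status
```

On a tiny instance with defectives {1, 3} and tests {0,1}, {1,2,3}, {3,4}, {0,2,4}, the outcome-0 test clears items 0, 2, 4 and the remaining tests then resolve 1 and 3 as defective.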

Keywords:Information Theory Abstract: In this paper we develop the elements of the theory of algorithmic randomness in continuous-time Markov chains (CTMCs). Our main contribution is a rigorous, useful notion of what it means for an individual trajectory of a CTMC to be random. CTMCs have discrete state spaces and operate in continuous time. This, together with the fact that trajectories may or may not halt, presents challenges not encountered in more conventional developments of algorithmic randomness.

Although we formulate algorithmic randomness in the general context of CTMCs, we are primarily interested in the computational power of stochastic chemical reaction networks, which are special cases of CTMCs. This leads us to embrace situations in which the long-term behavior of a network depends essentially on its initial state and hence to eschew assumptions that are frequently made in Markov chain theory to avoid such dependencies.

After defining the randomness of trajectories in terms of a new kind of martingale (algorithmic betting strategy), we prove equivalent characterizations in terms of constructive measure theory and Kolmogorov complexity.

Keywords:Information Theory, Network Information Theory Abstract: We consider the problem of reconciling similar, but remote, strings with minimum communication complexity. This "string reconciliation" problem is a fundamental building block for a variety of networking applications, including those that maintain large-scale distributed networks and perform remote file synchronization. We present the novel Recursive Content-Dependent Shingling (RCDS) protocol that is computationally practical for large strings and scales linearly with the edit distance between the remote strings. We provide comparisons to the performance of rsync, one of the most popular file synchronization tools in active use. Our experiments show that, with minimal engineering, RCDS outperforms the heavily optimized rsync in reconciling release revisions for about 51% of the 5000 top starred git repositories on GitHub. The improvement is particularly evident for repositories that see frequent, but small, updates.

Keywords:Detection and Estimation, Information Theory, Signal Acquisition, Coding, and Retrieval Abstract: This paper analyzes the impact of 2-level (1-bit) quantization on the correlation of Gaussian sources and of binary sources in the presence of additive noise, with a focus on the low-SNR regime. In this regime, in which the correlation vanishes as the SNR tends to zero, 1-bit symmetric-threshold quantization of Gaussian sources and of equiprobable, antipodal binary sources degrades the correlation by 2/pi (-1.96 dB), a result first established by Van Vleck for correlated Gaussian sources. Symmetric quantization of these sources also reduces the channel capacity per unit-energy by the same factor. However, asymmetric-threshold quantization of a class of binary source distributions known as flash signaling can avoid the 2/pi penalty in channel capacity, as shown by Koch and Lapidoth. In this paper, we show that flash signaling with asymmetric quantization can not only avoid the 2/pi penalty in correlation but, as the SNR tends to zero, allow the quantized correlation to approach the noise-free source correlation.
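Van Vleck's arcsine law, E[sgn(X) sgn(Y)] = (2/pi) arcsin(rho) for jointly Gaussian X, Y with correlation rho, is easy to check numerically; for small rho the right-hand side is approximately (2/pi) rho, which is exactly the 2/pi (-1.96 dB) degradation quoted above.

```python
import math
import random

def quantized_corr(rho, n=200000, seed=0):
    """Monte Carlo estimate of the correlation of 1-bit symmetric
    quantizations of a standard bivariate Gaussian pair with
    correlation rho."""
    rng = random.Random(seed)
    acc = 0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # construct y with the desired correlation to x
        y = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        acc += (1 if x >= 0 else -1) * (1 if y >= 0 else -1)
    return acc / n

# compare the estimate against (2/pi) * arcsin(rho)
# print(quantized_corr(0.5), (2 / math.pi) * math.asin(0.5))
```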

Keywords:Dynamic Games, Optimization Abstract: We consider an example of stochastic games with partial, asymmetric and non-classical information. We obtain the relevant equilibrium policies using a new approach that allows managing the belief updates in a structured manner. Agents have access only to partial information updates, and our approach is to consider optimal open-loop control until the next information update. The agents continuously control the rates of their Poisson search clocks to acquire locks; the first agent to acquire all the locks receives a reward of one. However, the agents have no information about the acquisition status of the others, and incur a cost proportional to their rate process. We solve the problem for the case with two agents and two locks, and conjecture the results for N agents. We show that a pair of (partial) state-dependent time-threshold policies forms a Nash equilibrium.

Keywords:Dynamic Games, Learning and Inference, Decentralized Control Systems Abstract: We study informational cascades in a scenario where a finite number of players need to decide whether or not to buy a product, which is either good or bad. The true value of the product is not known to the players, but each player has her own private information on it. Each player observes the previous actions of other players and forms a belief on the quality of the product. In this work, players get more than one opportunity to act, although a player can only buy the product once. This is in contrast to the existing literature on informational cascades, where each player only acts once. We consider an exogenous random process for choosing the players to act in each turn. We provide a characterization of structured perfect Bayesian equilibria (sPBE) with forward-looking strategies through a fixed-point equation of dimensionality that grows only quadratically with the number of players. We show the existence of sPBE and prove that bad informational cascades can be avoided entirely for infinitely patient players when the product is bad. Furthermore, we show that for sufficiently patient players, bad informational cascades happen only when at least half of the players have revealed their private information.

Keywords:Network Games and Algorithms, Performance Analysis, Machine Learning and Learning Theory Abstract: With the rapid advance of information technology, network systems have become increasingly complex and hence the underlying system dynamics are typically unknown or difficult to characterize. Finding a good network control policy is of significant importance to achieving desirable network performance (e.g., high throughput or low average job delay). Online/sequential learning algorithms are well-suited to learning the optimal control policy from observed data for systems without the information of underlying dynamics. In this work, we consider using model-based reinforcement learning (RL) to learn the optimal control policy of queueing networks so that the average job delay (or equivalently the average queue backlog) is minimized. Existing RL techniques, however, cannot handle the unbounded state spaces of the network control problem. To overcome this difficulty, we propose a new algorithm, called Piecewise Decaying Epsilon-Greedy Reinforcement Learning (PDGRL), which applies model-based RL methods over a finite subset of the state space. We establish that the average queue backlog under PDGRL with an appropriately constructed subset can be arbitrarily close to the optimal result. We evaluate PDGRL in dynamic server allocation and routing problems. Simulations show that PDGRL minimizes the average queue backlog effectively.

Keywords:Network Games and Algorithms, Pricing and Congestion Control Abstract: Traffic navigation services have gained widespread adoption in recent years. The route recommendations generated by these services often lead to severe congestion on urban streets, raising concerns from neighboring residents and city authorities. This paper is motivated by the question: How can a transportation authority design an information structure to induce a preferred equilibrium traffic flow pattern in uncertain network conditions? We approach this question from a Bayesian persuasion viewpoint. We consider a basic routing game with two parallel routes and an uncertain state that affects the travel cost on one of the routes. The authority sends a noisy signal of the state to a given fraction of travelers. The information structure (i.e., the distribution of signals in each state) chosen by the authority creates a heterogeneous information environment for the routing game. The solution concept governing the travelers' route choices is Bayesian Wardrop Equilibrium. We design an information structure to minimize the average traffic spillover -- the amount of equilibrium route flow exceeding a preferred limit -- on one of the routes. We provide an analytical characterization of the optimal information structure for any fraction of travelers receiving the signal. We find that the optimal information structure can achieve the minimum spillover so long as the fraction of travelers receiving the signal is larger than a threshold (smaller than 1).

Keywords:Network Games and Algorithms, Detection and Estimation, Optimization Abstract: We study a model where a data collector obtains data from users through a payment mechanism to learn the underlying state from the elicited data. The private signal of each user represents her individual knowledge about the state. Through social interactions, each user can also learn noisy versions of her friends' signals, which are called group signals. Based on both her private signal and group signals, each user makes strategic decisions to report a privacy-preserved version of her data to the data collector. We develop a Bayesian game theoretic framework to study the impact of social learning on users' data reporting strategies and devise the payment mechanism for the data collector accordingly. Our findings reveal that the Bayesian-Nash equilibrium can be in the form of either a symmetric randomized response (SR) strategy or an informative non-disclosive (ND) strategy. A generalized majority voting rule is applied by each user to her noisy group signals to determine which strategy to follow. When a user plays the ND strategy, she reports privacy-preserving data completely based on her group signals, independent of her private signal, which indicates that her privacy cost is zero. Both the data collector and the users can benefit from social learning, which drives down the privacy costs and helps to improve the state estimation at a given payment budget. We derive bounds on the minimum total payment required to achieve a given level of state estimation accuracy.

Keywords:Performance Analysis, Wireless Communication Systems, Optimization Abstract: The random access erasure collision channel captures, in an abstracted manner, several important features of a wireless environment shared by uncoordinated radios. The radios employ random access and, when contending, transmit over independent heterogeneous erasure channels with the common access point. The access point is capable of only receiving a single message at a time, and so any colliding messages are lost. The combined effects of the channel heterogeneity and the collision rule give rise to a natural question: how does the expected sum throughput vary with the subset of radios that are active? The subset of radios achieving the optimal throughput is found by a simple greedy packing procedure --- add the radios, sorted by nonerasure probability, until a target offered load is exceeded.
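Under one plausible instantiation of this model (every detail below is an assumption for illustration: each active radio transmits in a slot with probability q, and a transmission succeeds only if no other active radio transmits and its own channel does not erase it), the greedy packing rule reads:

```python
def greedy_active_set(p, q, target_load=1.0):
    """Greedy packing sketch: sort radios by non-erasure probability
    p_i (descending) and keep adding them until the offered load
    |S| * q exceeds target_load.  Hypothetical throughput model:
    radio i delivers a message iff it is the sole transmitter in the
    slot and its channel does not erase it, so the expected sum
    throughput is sum_{i in S} p_i * q * (1 - q)^(|S| - 1)."""
    order = sorted(p, reverse=True)
    active = []
    while len(active) < len(order) and len(active) * q <= target_load:
        active.append(order[len(active)])
    throughput = sum(active) * q * (1.0 - q) ** (len(active) - 1)
    return active, throughput
```

For example, with p = [0.9, 0.8, 0.5, 0.2], q = 0.5 and a target load of 1, the greedy rule activates the three best radios and stops once the offered load reaches 1.5.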

Keywords:Performance Analysis, Distributed and Large-Scale Systems, Network Games and Algorithms Abstract: The balls-in-bins model, in which n balls are sequentially placed into n bins according to some dispatching policy, is an important model with a wide range of applications despite its simplicity. The power-of-d choices (Pod) policy, in which each ball samples d independent uniform random bins and joins the one with the least load (where ties are broken arbitrarily), can yield a maximum load of loglog n / log d + Theta(1) with high probability whenever d >= 2. Vöcking later proposed a variant of the power-of-d scheme in which the bins are divided into d groups and one bin is sampled from each group. One important feature of this scheme is that ties are broken asymmetrically based on groups. Compared with Pod, this scheme reduces the maximum load to loglog n / (d log phi_d) + Theta(1), where 1 < phi_d < 2. Our recent work shows that one can replace independent uniform sampling with random-walk-based sampling while matching the performance of Pod in terms of the maximum load over all bins. In this work, we propose multiple derandomized variants of Vöcking's asymmetric scheme and show that they yield the same performance as the original scheme, i.e., the maximum load is bounded by loglog n / (d log phi_d) + Theta(1).
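The d >= 2 gain is easy to reproduce empirically. The sketch below simulates the plain Pod policy with independent uniform sampling (not the random-walk or asymmetric variants discussed above):

```python
import random

def max_load_pod(n, d, seed=0):
    """Throw n balls into n bins; each ball samples d uniform random
    bins and joins the least loaded one (ties broken by min's
    first-wins rule).  Returns the maximum final load."""
    rng = random.Random(seed)
    bins = [0] * n
    for _ in range(n):
        candidates = [rng.randrange(n) for _ in range(d)]
        bins[min(candidates, key=lambda i: bins[i])] += 1
    return max(bins)

# d = 1 gives max load ~ log n / loglog n; d = 2 drops it to ~ loglog n
# print(max_load_pod(10000, 1), max_load_pod(10000, 2))
```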

Keywords:Reliability, Security and Trust, Information Theory and Statistics Abstract: A physically unclonable function (PUF) is an electronic circuit that produces an intrinsic identifier in response to a challenge. These identifiers depend on uncontrollable variations of the manufacturing process, which make them hard to predict or to replicate. Various security protocols leverage such intrinsic randomness for authentication, cryptographic key generation, anti-counterfeiting, etc. Evaluating the entropy of PUFs (over all possible challenges) allows one to assess the security properties of such protocols. In this paper, we estimate the probability distribution of certain kinds of PUFs composed of n delay elements. This is used to evaluate relevant Rényi entropies and determine how they increase with n. Such a problem was known to have extremely high complexity (on the order of 2^(2^n)), and previous entropy estimations were carried out only up to n=7. Making the link with the theory of Boolean threshold functions, we leverage the representation by Chow parameters to estimate probability distributions up to n=10. The resulting Shannon entropy of the PUF is close to the max-entropy, which is asymptotically quadratic in n.

Keywords:Intrusion/Anomaly Detection and Diagnosis, Complex Networked Systems Abstract: This paper analyzes the effect of replay attacks on constrained cyber-physical systems which are subject to linear probabilistic constraints. In order to inject an exogenous control input without being detected, the attacker will hijack the sensors, observe and record their readings for a certain amount of time, and replay them afterwards while carrying out his attack. The conditions under which the attacker can induce a perturbation in the control loop without being detected are studied. Then, in order to make the system resilient to the replay attack, a random signal (serving as the authentication signal) is added to the control input. Since this signal can hamper the performance of the system, finally, the optimization of the authentication signals is proposed to maximize the detection rate while keeping the process deterioration bounded. The effectiveness of the proposed scheme is demonstrated on a simulated case study.

National Institute of Advanced Industrial Science and Technology

Keywords:Reliability, Security and Trust Abstract: We introduce a general model for the local obfuscation of probability distributions by probabilistic perturbation, e.g., by adding differentially private noise, and investigate its theoretical properties. Specifically, we relax a notion of distribution privacy (DistP) by generalizing it to divergence, and propose local obfuscation mechanisms that provide divergence distribution privacy. To provide f-divergence distribution privacy, we prove that probabilistic perturbation noise should be added proportionally to the Earth mover's distance between the probability distributions that we want to make indistinguishable. Furthermore, we introduce a local obfuscation mechanism, which we call a coupling mechanism, that provides divergence distribution privacy while optimizing the utility of obfuscated data by using exact/approximate auxiliary information on the input distributions we want to protect.

Keywords:Adaptive Control and Automation, Machine Learning and Learning Theory Abstract: We study online reinforcement learning for finite-horizon deterministic control systems with arbitrary state and action spaces. Suppose the transition dynamics and reward function are unknown, but the state and action space is endowed with a metric that characterizes the proximity between different states and actions. We provide a surprisingly simple upper-confidence reinforcement learning algorithm that uses a function approximation oracle to estimate optimistic Q functions from experiences. We show that the regret of the algorithm after K episodes is O((DLK)^(d/(d+1)) H), where D is the diameter of the state-action space, L is a smoothness parameter, and d is the doubling dimension of the state-action space with respect to the given metric. We also establish a near-matching regret lower bound. The proposed method can be adapted to work for more structured transition systems, including the finite-state case and the case where value functions are linear combinations of features, where the method also achieves the optimal regret.

Keywords:Machine Learning and Learning Theory, Adaptive Control and Automation, Learning and Inference Abstract: It has long been a challenging problem to design algorithms for Markov decision processes (MDPs) with continuous states and actions that are provably approximately optimal and can provide arbitrarily good approximation for any MDP. In this paper, we propose an empirical value learning algorithm for average MDPs with continuous states and actions that combines empirical value iteration with function-parametric approximation and kernel density estimation of the transition probability distribution. We view each iteration as the application of a random operator and argue convergence using the probabilistic contraction analysis method that the authors (along with others) have recently developed.

Keywords:Optimization, Machine Learning and Learning Theory Abstract: In this paper, we propose multi-timescale, sequential algorithms for deterministic optimization that find high-quality solutions. The algorithms fundamentally track the well-known derivative-free model-based search methods, but in a more efficient and resourceful manner. Our approaches exhibit competitive performance on a selection of global optimization benchmark problems.

Keywords:Machine Learning and Learning Theory, Optimization, Adaptive Control and Automation Abstract: Stochastic approximation (SA) algorithms are recursive techniques used to obtain the roots of functions that can be expressed as expectations of a noisy parameterized family of functions. In this paper, two new SA algorithms are introduced: 1) PolSA, an extension of Polyak's momentum technique with a specially designed matrix momentum, and 2) NeSA, which can either be regarded as a variant of Nesterov's acceleration method, or a simplification of PolSA.

The rates of convergence of SA algorithms are well understood. Under special conditions, the mean square error of the parameter estimates is bounded by sigma^2/n + o(1/n), where sigma^2 >= 0 is an identifiable constant. If these conditions fail, the rate is typically sub-linear.

There are two well-known SA algorithms that ensure a linear rate with the minimal value of the variance sigma^2: the Ruppert-Polyak averaging technique and the stochastic Newton-Raphson (SNR) algorithm. It is demonstrated here that, under mild technical assumptions, the PolSA algorithm also achieves this optimality criterion. This result is established via novel coupling arguments: it is shown that the parameter estimates obtained from the PolSA algorithm couple with those of the optimal-variance (but computationally more expensive) SNR algorithm at a rate O(1/n^2). The newly proposed algorithms are extended to a reinforcement learning setting to obtain new Q-learning algorithms, and numerical results confirm the coupling of PolSA and SNR.
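As a point of reference for the averaging technique mentioned above, the Ruppert-Polyak idea can be sketched in a few lines: run a stochastic approximation recursion with a slowly decaying step size and report the running average of the iterates. The scalar root-finding problem, step-size exponent, and noise model below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def polyak_ruppert_sa(root=1.0, n_iters=20000, seed=0):
    """Stochastic approximation with Ruppert-Polyak iterate averaging.

    Seeks the root of f(theta) = theta - root from noisy evaluations
    f(theta) + N(0, 1).  Returns the last iterate and the average.
    """
    rng = np.random.default_rng(seed)
    theta = 0.0
    running_sum = 0.0
    for n in range(1, n_iters + 1):
        noisy_f = (theta - root) + rng.standard_normal()
        theta -= n ** -0.7 * noisy_f   # slowly decaying step size a_n = n^{-0.7}
        running_sum += theta
    theta_bar = running_sum / n_iters  # averaged iterate
    return theta, theta_bar
```

The averaged iterate attains the optimal O(sigma^2/n) mean-square error without needing the matrix gain that SNR computes.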

Keywords:Detection and Estimation, Information Theory and Statistics, Signal Acquisition, Coding, and Retrieval Abstract: Nonlinear function estimation is core to modern machine learning applications. In this paper, to perform nonlinear function estimation, we reduce a nonlinear inverse problem to a linear one using a polynomial kernel expansion. These kernels enlarge the feature set and may result in poorly conditioned matrices. Nonetheless, we show several examples where the matrix in our linear inverse problem contains only mild linear correlations among columns. The coefficient vector is modeled within a Bayesian setting for which approximate message passing (AMP), an algorithmic framework for signal reconstruction, offers Bayes-optimal signal reconstruction quality. While the Bayesian setting limits the scope of our work, it is a first step toward the estimation of real-world nonlinear functions. The coefficient vector is estimated using two AMP-based approaches, one Bayesian and one empirical Bayes. Numerical results confirm that our AMP-based approaches learn the function better than LASSO, offering markedly lower error in predicting test data.
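The reduction from nonlinear estimation to a linear inverse problem can be sketched as follows; the target function, polynomial degree, and noise level are illustrative assumptions, and plain least squares stands in for the AMP-based coefficient estimation studied in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=200)
y = np.sin(2 * x) + 0.05 * rng.standard_normal(200)  # unknown nonlinear map + noise

# Polynomial feature expansion: the nonlinear problem y ~ g(x) becomes
# the *linear* inverse problem y ~ A c in the coefficient vector c.
degree = 5
A = np.vander(x, degree + 1, increasing=True)        # columns 1, x, ..., x^5
c, *_ = np.linalg.lstsq(A, y, rcond=None)

x_test = np.linspace(-1, 1, 50)
y_hat = np.vander(x_test, degree + 1, increasing=True) @ c
```

The columns of A (powers of x) are correlated, which is exactly the conditioning concern the abstract raises; a higher degree or wider domain makes the problem markedly worse conditioned.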

Keywords:Detection and Estimation, Statistical Signal Processing, Optimization Abstract: We study the performance of a wide class of convex optimization-based estimators for recovering a signal from corrupted one-bit measurements in high dimensions. Our general result sharply predicts the performance of such estimators in the linear asymptotic regime when the measurement vectors have i.i.d. Gaussian entries. This includes, as a special case, the previously studied least-squares estimator, and yields various novel results for other popular estimators such as the least-absolute-deviations, hinge-loss, and logistic-loss estimators. Importantly, the sharp nature of our results allows for accurate comparisons between these different estimators. Numerical simulations corroborate our theoretical findings and suggest they are accurate even for relatively small problem dimensions.

Keywords:Optimization, Performance Analysis, Distributed and Large-Scale Systems Abstract: We consider resource allocation questions for computing infrastructures with multiple server instances; in particular, the joint optimization of active service capacity, load balancing between clusters of servers, and task scheduling at each cluster, under data-locality conditions that imply different service rates at different cluster locations. Building on previous work, we formulate a convex optimization problem and use Lagrange duality to decompose it between the different decision variables. We include regularization terms from proximal methods to obtain continuous control laws for load balancing and scheduling, and optimize the remaining variables through primal-dual gradient dynamics. We prove convergence of the resulting control laws to the desired optimal points and demonstrate their behavior through simulations.

Keywords:Complex Networked Systems, Distributed and Large-Scale Systems, Robust and Nonlinear Control Abstract: Load models have a great impact on voltage behavior as well as on power system transient dynamics. Extensive work has been done on this topic, proposing appropriate load models and better capturing load behavior during transients. This paper presents a comprehensive study of the geometric and topological changes induced by different load models in the traditional power system differential-algebraic equations (DAEs). Specifically, we attempt to reveal the deformation of equilibria, stability regions, and algebraic manifolds during a continuous evolution of the load model. Several findings are presented, some of which counter traditional understanding and intuition. A major discovery is that a load model with a large proportion of constant impedance and a small proportion of constant power exhibits much more complex features than the load model with the reversed proportions. This increase in complexity is thoroughly recorded and investigated through the changes in geometric properties and the mutations of topological invariants of the equilibria, stability regions, and algebraic manifolds of the DAE system. However, most of the changes occur on unstable components of the algebraic manifolds or near the singular boundary surfaces, suggesting a limited variation of the dynamical behavior on the stable component.

Keywords:Complex Networked Systems, Distributed and Large-Scale Systems, Adaptive Control and Automation Abstract: This paper first formulates the efficient and feasible participation of distributed energy resources (DERs) in complex electricity services as a centralized nonlinear optimization problem. This problem is then restated in a novel energy/power transformed state space, in which the closed-loop DER dynamics can be made linear. The decision making by the DERs then becomes a distributed model predictive control problem, and it forms the basis for deriving physically implementable convex market bids. A multi-layered interactive optimization for clearing the distributed bids by higher-layer decision makers, such as market aggregators, is posed and shown to lead to near-optimal system-level performance at slower market clearing rates. A proof-of-concept example is presented involving close to one hundred heterogeneous controllable DERs, with real consumption data from a distribution feeder in Texas, contributing to automatic generation control (AGC).

Keywords:Complex Networked Systems, Learning and Inference Abstract: This paper studies topology inference, from agent states, of a directed cyber-social network whose opinion-spreading dynamics model explicitly takes confirmation bias into account. The cyber-social network comprises a partially connected directed network of agents at the social layer and a set of information sources at the cyber layer. Necessary and sufficient conditions for the existence of an exact inference solution are characterized. A method for exact inference, when it is possible, of the entire network topology as well as the confirmation bias model parameters is proposed for the case where the bias follows a piecewise-linear model. The particular case of no confirmation bias is analyzed in detail. Numerical simulations demonstrate the effectiveness of the proposed method.

Keywords:Complex Networked Systems, Optimization Abstract: We compare three formulations of the stationary equations of the Kuramoto model as systems of polynomial equations. In the comparison, we present bounds on the number of real equilibria based on the work of Bernstein, Kushnirenko, and Khovanskii, and evaluate the performance of methods for optimisation over the set of equilibria based on the work of Lasserre, both of which could be of independent interest.
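The polynomial reformulation being compared can be illustrated directly: substituting s_i = sin(theta_i) and c_i = cos(theta_i) into the stationary Kuramoto equations, together with the constraints s_i^2 + c_i^2 = 1, yields a system of polynomial equations in (s, c). A minimal numerical sketch (the network size, coupling strength K, and zero natural frequencies are illustrative assumptions):

```python
import numpy as np

# Stationary Kuramoto equations for n all-to-all coupled oscillators:
#   omega_i = (K/n) * sum_j sin(theta_j - theta_i)
# Expanding sin(theta_j - theta_i) = s_j c_i - c_j s_i with
# s_i = sin(theta_i), c_i = cos(theta_i) gives polynomials in (s, c),
# coupled with the constraints s_i^2 + c_i^2 = 1.

def stationary_residual(s, c, omega, K):
    n = len(s)
    res = [omega[i] - (K / n) * sum(s[j] * c[i] - c[j] * s[i] for j in range(n))
           for i in range(n)]
    constraints = [s[i] ** 2 + c[i] ** 2 - 1 for i in range(n)]
    return np.array(res + constraints)

# Sanity check: the synchronized state theta_i = const is an equilibrium
# when all natural frequencies are zero.
theta = np.full(3, 0.7)
r = stationary_residual(np.sin(theta), np.cos(theta), omega=np.zeros(3), K=2.0)
```

Casting the trigonometric system in this polynomial form is what makes root-count bounds (Bernstein/Kushnirenko/Khovanskii) and polynomial-optimization hierarchies (Lasserre) applicable.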

Keywords:Pricing and Congestion Control, Network Games and Algorithms, Dynamic Games Abstract: We investigate the effect of information/navigation platforms on transportation networks. Specifically, we analyze the outcome when these platforms are owned by for-profit strategic companies such as Google and Apple. We consider two business models, one that makes a profit through advertisements and the collection of user information, and one that generates revenue from its users by charging a subscription fee. We show that social welfare in an environment with a single platform can be higher than when multiple platforms compete with one another. This contrasts with the standard result for classical goods, where competition always improves social welfare. Most importantly, we show that in a competitive environment with multiple platforms, each platform finds it optimal to disclose its information about the current condition of the network perfectly and for free. Consequently, in a competitive market (almost) all information platforms must adopt an ad-based business model and reveal perfect information about the transportation network. Our results provide a purely economic justification for why, in practice, no navigation application discloses partial information to reduce congestion, as suggested previously in the literature.

Keywords:Machine Learning and Learning Theory, Adaptive Control and Automation, Learning and Inference Abstract: We propose a data-driven framework to enable the modeling and optimization of human-machine interaction processes, e.g., systems aimed at assisting humans in decision-making or learning, work-load allocation, and interactive advertising. This is a challenging problem for several reasons. First, human behavior is hard to model or infer, as it may reflect biases, long-term memory, and sensitivity to sequencing, i.e., transience and exponential complexity in the length of the interaction. Second, due to the interactive nature of such processes, the machine policy used to engage with humans may bias possible data-driven inferences. Finally, in choosing machine policies that optimize interaction rewards, one must, on the one hand, avoid being overly sensitive to error/variability in the estimated human model and, on the other, avoid being overly deterministic/predictable, which may result in poor human 'engagement' in the interaction. To meet these challenges, we propose a robust approach, based on the maximum entropy principle, which iteratively estimates human behavior and optimizes the machine policy: the Alternating Entropy-Reward Ascent (AREA) algorithm. We characterize AREA in terms of its space and time complexity and convergence. We also provide an initial validation based on synthetic data generated by an established noisy nonlinear model of human decision-making.

Keywords:Robust and Nonlinear Control, Optimization, Wireless Communication Systems Abstract: In this paper, we consider the trajectory planning of an autonomous vehicle at an intersection by a central coordinator. We consider two real-world complications: an unreliable communication channel and stochastic noise in the process and observations. A robust trajectory is derived within a predetermined time slot using model predictive control. The intersection crossing is modeled as a chance-constrained problem, and the evolution of the stochastic noise is restricted by a terminal constraint. The communication impairments are modeled as packet-drop probabilities, and Kalman estimation techniques are used to predict the states in the presence of process and observation noise. A robust sub-optimal solution is obtained, which establishes that the vehicle crosses the intersection with a very low chance of failure.

Keywords:Decentralized Control Systems, Reliability, Security and Trust, Distributed and Large-Scale Systems Abstract: Consensus is one of the most important primitives in large-scale distributed systems due to its wide range of applications. Consensus has been studied and applied extensively in diverse areas including control systems, optimization, and distributed computing, and in emerging areas such as robotics, blockchain, and the Internet-of-Things. However, one common issue in practical systems is the misuse of consensus formulations that are impossible to achieve in asynchronous systems with failures. We propose an alternative formulation that is achievable in such a setting and easier to understand than other relaxed consensus formulations.

Concretely, inspired by asymptotic (approximate) consensus in control systems and by applications in storage systems, we introduce a new consensus problem: Eventual Consensus (or Asymptotic Exact Consensus). Its two distinctive features are “exact agreement” and “eventual agreement” on the state values of fault-free nodes. The eventual-agreement property allows us to solve the problem in asynchronous systems with crash or even Byzantine failures. The exact-agreement property makes it useful for applications like storage systems, which require replicas (or servers) to eventually reach exactly the same state.

Keywords:Decentralized Control Systems Abstract: We address the problem of how to optimally schedule data packets over an unreliable channel in order to minimize the estimation error of a simple-to-implement remote linear estimator that uses a constant "Kalman" gain to track the state of a Gauss-Markov process. The remote estimator receives time-stamped data packets containing noisy observations of the process. Additionally, each packet contains information about the "quality" of the sensor/source, i.e., the variance of the observation noise under which the packet was generated. In order to minimize the estimation error, the scheduler needs to use both pieces of information when prioritizing packet transmissions. It is shown that a simple index rule, which calculates the value of information (VoI) of each packet and then schedules the packet with the largest current VoI, is optimal. The VoI of a packet decreases with its age and increases with the precision of the source. Thus, we conclude that, for constant filter gains, a policy that minimizes the age of information does not necessarily maximize estimator performance.
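The index rule described above can be sketched as follows; the exponential-decay form of the VoI below is a hypothetical stand-in that merely has the stated qualitative properties (decreasing in age, increasing in source precision), not the paper's actual index:

```python
from dataclasses import dataclass

@dataclass
class Packet:
    timestamp: float     # when the observation was generated
    obs_variance: float  # variance of the source's observation noise

def voi(pkt, now, decay=0.9):
    """Hypothetical value-of-information index: decays with the packet's
    age and grows with the source's precision (1 / variance)."""
    age = now - pkt.timestamp
    return decay ** age / pkt.obs_variance

def schedule(packets, now):
    """Index rule: transmit the packet with the largest current VoI."""
    return max(packets, key=lambda p: voi(p, now))
```

Under such an index, a stale packet from a precise sensor can outrank a fresh packet from a noisy one, which is exactly why minimizing age of information alone need not minimize estimation error.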

Keywords:Adaptive Control and Automation, Distributed and Large-Scale Systems, Optimization Abstract: We propose job scheduling algorithms that minimize job migration and server running costs in cloud computing platforms offering Infrastructure as a Service. We first consider algorithms that assume knowledge of job sizes upon arrival. We characterize the optimal cost subject to system stability, and develop a drift-plus-penalty-based algorithm that can approach the optimal cost arbitrarily closely; specifically, this algorithm yields a trade-off between delay and cost. We then relax the job-size knowledge assumption and give an algorithm that instead uses the amount of service already rendered to each job. We show that this algorithm incurs order-wise the same cost as the job-size-based algorithm. We illustrate the performance of the proposed algorithms and compare them to existing algorithms via simulations.

Keywords:Machine Learning and Learning Theory, Distributed and Large-Scale Systems Abstract: The performance of fully synchronized distributed systems has hit a bottleneck under the big-data trend, and asynchronous distributed systems are gaining major popularity due to their superior scalability. In this paper, we study the generalization performance of stochastic gradient descent (SGD) on a distributed asynchronous system. The system consists of multiple worker machines that compute stochastic gradients, which are sent to and aggregated on a common parameter server to update the variables; the communication in the system may suffer from delays. Under the algorithmic stability framework, we prove that distributed asynchronous SGD generalizes well given enough data samples during training. In particular, our results suggest reducing the learning rate as more asynchrony is allowed in the distributed system. Such an adaptive learning-rate strategy improves the stability of the distributed algorithm and reduces the corresponding generalization error. We then confirm our theoretical findings via numerical experiments.

Keywords:Machine Learning and Learning Theory, Learning and Inference Abstract: Overparameterized machine learning models are often fit perfectly to training data, yet remarkably generalize well to new data. However, learning good models can require an enormous amount of labeled training data. This challenge motivates the study of active learning algorithms that sequentially and adaptively request labels for "informative" examples from a large pool of unlabeled data. A maximin criterion was recently proposed for active learning specifically in the overparameterized and interpolating regime. Roughly speaking, the maximin criterion selects the example that is most difficult to interpolate, as measured by an appropriate norm on the interpolating function. Data-dependent norms perform best empirically, exhibiting intriguing adaptivity to cluster structure within the data. The main contribution of this paper is to mathematically characterize this behavior. Our main results show that the maximin criterion based on data-dependent norms provably discovers clusters and automatically generates labeled coverings of the dataset.

Keywords:Information Theory, Machine Learning and Learning Theory, Learning and Inference Abstract: The Hirschfeld-Gebelein-Rényi (HGR) maximal correlation has proven useful in many machine learning applications, where the alternating conditional expectations (ACE) algorithm is widely adopted to estimate the HGR maximal correlation functions from data samples. In this paper, we consider the asymptotic sample complexity of estimating the HGR maximal correlation functions in semi-supervised learning, where both labeled and unlabeled data samples are used for the estimation. First, we propose a generalized ACE algorithm that handles unlabeled data samples. Then, we develop a mathematical framework to characterize the learning errors between the maximal correlation functions computed from the true distribution and the functions estimated by the generalized ACE algorithm. We establish analytical expressions for the error exponents of the learning errors, which indicate the number of training samples required to estimate the HGR maximal correlation functions with the generalized ACE algorithm. Moreover, using our theoretical results, we investigate the sampling strategy for different types of samples in semi-supervised learning under a total sampling budget constraint, and develop an optimal sampling strategy that maximizes the error exponent of the learning error. Finally, numerical simulations are presented to support our theoretical results.
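For context, the classical ACE estimate of the HGR maximal correlation for a discrete pair can be sketched as a power iteration that alternates the two conditional expectations. The binary joint distribution used in the check below is an illustrative assumption; for binary pairs the maximal correlation equals the absolute Pearson correlation, which gives an easy sanity check:

```python
import numpy as np

def ace_maximal_correlation(pxy, iters=200):
    """Estimate the HGR maximal correlation of a discrete pair (X, Y)
    with joint pmf matrix pxy by alternating conditional expectations."""
    px = pxy.sum(axis=1)
    py = pxy.sum(axis=0)
    f = np.random.default_rng(0).standard_normal(len(px))
    for _ in range(iters):
        # g(y) = E[f(X) | Y = y], then center and normalize under P_Y
        g = (pxy * f[:, None]).sum(axis=0) / py
        g -= py @ g
        g /= np.sqrt(py @ g**2)
        # f(x) = E[g(Y) | X = x], then center and normalize under P_X
        f = (pxy * g[None, :]).sum(axis=1) / px
        f -= px @ f
        f /= np.sqrt(px @ f**2)
    # At the fixed point, the maximal correlation is E[f(X) g(Y)]
    return float(f @ pxy @ g)
```

With a symmetric binary channel P(X = Y) = 0.9, the Pearson correlation is 0.8, and the iteration recovers exactly that value.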

Keywords:Machine Learning and Learning Theory, Learning and Inference, Information Theory and Statistics Abstract: The k-vectors algorithm for learning regression functions proposed here is akin to the well-known k-means algorithm. Both algorithms partition the space of "features", but in contrast to the k-means algorithm, the k-vectors algorithm aims to reconstruct the regression function of the features (the "response"), rather than the features themselves. The partitioning rule of the algorithm is based on maximizing the correlation (inner product) of the feature vector x with a set of k vectors, and generates polyhedral cells similar to those generated by the nearest-neighbor rule of the k-means algorithm. Similarly to k-means, the learning algorithm alternates between two types of steps. In the first type, k labels are determined via a centroid-type rule (in the response space); in the second type, the k vectors that determine the partition are updated according to a multiclass classification rule, in the spirit of support vector machines. It is proved that both steps of the algorithm only require solving convex optimization problems, and that the algorithm is empirically consistent: as the length of the training sequence increases, fixed points of the empirical algorithm tend to fixed points of the population algorithm.

Keywords:Learning and Inference, Information Theory Abstract: Generative Adversarial Networks (GANs) have become a powerful framework for learning generative models across a wide variety of domains. While there has been a recent surge in the development of GAN architectures with distinct optimization metrics, we still lack an understanding of how far such GANs are from optimality. In this paper, we make progress toward a theoretical understanding of GANs in a simple linear-generator Gaussian-data setting, where the optimal maximum-likelihood generator is known to perform principal component analysis (PCA). We find that the original GAN of Goodfellow et al. fails to recover the optimal PCA solution. On the other hand, we show that the Wasserstein GAN can approach the PCA solution in the large-sample limit, and hence it may serve as a basis for an optimal GAN architecture that yields the optimal generator for a wide range of data settings.
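The PCA benchmark in this linear-generator Gaussian setting can be made concrete: for Gaussian data, the maximum-likelihood rank-k linear generator spans the top-k principal subspace of the data covariance (the probabilistic-PCA result). A minimal numpy check of that benchmark, with illustrative dimensions and noise level:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 10, 2, 5000

# Ground truth: data from a rank-k linear generator plus isotropic noise,
#   x = W z + sigma * eps,  z ~ N(0, I_k),  eps ~ N(0, I_d)
W = rng.standard_normal((d, k)) * np.array([3.0, 2.0])   # two strong directions
X = rng.standard_normal((n, k)) @ W.T + 0.1 * rng.standard_normal((n, d))

# PCA benchmark: top-k eigenvectors of the sample covariance
cov = X.T @ X / n
_, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
U_pca = eigvecs[:, -k:]

# Principal angles between the PCA subspace and the generator's span:
# singular values of U_true^T U_pca near 1 mean the subspaces agree.
U_true = np.linalg.qr(W)[0]
cosines = np.linalg.svd(U_true.T @ U_pca, compute_uv=False)
```

A GAN with a linear generator trained on X is optimal, in the maximum-likelihood sense, exactly when its column span matches U_pca; this is the yardstick against which the original GAN fails and the Wasserstein GAN succeeds.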