Study-unit SIGNAL PROCESSING AND OPTIMIZATION FOR BIG-DATA

Course name	Computer engineering and robotics
Study-unit Code	A001256
Curriculum	Data science and data engineering
Lecturer	Paolo Banelli
Lecturers	Paolo Banelli
Hours	72 hours - Paolo Banelli
CFU	9
Course Regulation	Cohort 2023
Supplied	2024/25
Learning activities	Related/supplementary
Area	Related or supplementary learning activities
Sector	ING-INF/03
Type of study-unit	Required
Type of learning activities	Single-discipline learning activity
Language of instruction	Italian
Contents	- REVIEW of STATISTICAL SIGNAL PROCESSING BASICS - FUNDAMENTALS of CONVEX OPTIMIZATION - BIG-DATA REDUCTION and SAMPLING - GRAPH-BASED SIGNAL/DATA PROCESSING - DISTRIBUTED OPTIMIZATION and SIGNAL PROCESSING for LEARNING over NETWORKS
Reference texts	Most of the class content is based on selected chapters and sections of the following books: - S. Kay, Fundamentals of Statistical Signal Processing, Vol. I & II, Prentice Hall, 1993-1998; - S. Theodoridis, Machine Learning: A Bayesian and Optimization Perspective; - T. Hastie et al., The Elements of Statistical Learning: Data Mining, Inference, and Prediction; - M. E. J. Newman, Networks: An Introduction; - S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004; - S. Boyd et al., Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, Foundations and Trends in Machine Learning, 3(1):1–122, 2011. In addition, some lecture notes by the teacher will be made available.
Educational objectives	Understand and apply the basics of statistical inference and convex optimization to (big-)data analytics. Understand the concept of data reduction/sampling and the conditions under which statistical inference and information reconstruction are not significantly degraded by reduction/sampling. Extend the knowledge of classical signal processing to signals defined over graphs, a natural representation for big data that reflects their distribution over a network, their statistical similarity, or both. Understand the methodological tools for distributing complex statistical inference across parallel and distributed agents (computers, etc.) as a way to scale statistical inference to big data, possibly geographically or logically distributed over a network. Learn from observed data the topological structure that characterizes their generation and evolution.
Prerequisites	Mandatory: Calculus, Linear Algebra, Random Variables and Stochastic Processes, Fourier Analysis, Digital Signal Processing. Suggested: Machine Learning and Data Mining. Useful: Estimation and Detection Theory (Statistical Inference).
Teaching methods	The class will be taught face-to-face by the lecturer with the aid of slides. In addition, some of the algorithms will be implemented in PC-based simulations, interactively with the students.
Other information
Learning verification modality	1) Short thesis on a topic related to the class content, including computer-aided simulations, to be submitted one week before the oral exam. 2) Oral exam: discussion of the thesis plus, typically, two questions.
Extended program	- Part I: REVIEW of BASICS OF STATISTICAL INFERENCE AND LEARNING (6 hours) Review of frequentist and Bayesian estimators, performance indicators, and common estimators (MVUE, MLE, MMSE, LS, etc.). Review of binary hypothesis testing: likelihood ratio test (LRT), Neyman-Pearson and Bayesian perspectives (minimum error probability, MAP, Bayes risk). Statistical learning and its relationship with machine learning: linear regression, K-means, etc.
- Part II: FUNDAMENTALS OF (DISTRIBUTED) CONVEX OPTIMIZATION (15 hours) Basics of convex optimization: convex sets, convex functions, convex optimization problems. Duality theory: Lagrange dual problem, Slater's constraint qualification, KKT conditions. Optimization algorithms: primal methods (steepest descent, gradient projection, Newton's method), primal-dual methods (dual ascent, alternating direction method of multipliers). Examples of applications: approximation and fitting, statistical estimation and detection, adaptive filtering, supervised and unsupervised learning from data. Distributed optimization: consensus and sharing; primal and primal-dual methods.
- Part III: BIG-DATA REDUCTION (9 hours) Compressed sampling/sensing and reconstruction. Statistical inference by sparse sensing. Classification by Principal Component Analysis, Canonical Correlation Analysis, and Information Bottleneck.
- Part IV: GRAPH-BASED SIGNAL PROCESSING (15 hours) Signals on graphs: motivating examples. Algebraic graph theory, graph features. Signal processing on graphs: Fourier transform, smoothing, sampling, and data compression on graphs.
- Part V: DISTRIBUTED OPTIMIZATION, SIGNAL PROCESSING, and LEARNING over NETWORKS (27 hours) Average consensus: theory and algorithms. Distributed signal processing: estimation and detection; LMS, RLS, and Kalman filtering on graphs. Distributed supervised learning (LASSO, SVM, logistic regression). Distributed unsupervised learning: dictionary learning and data clustering; learning of eigenvectors and eigenvalues of Laplacian matrices. Graph learning: Gaussian Markov random fields and graphical LASSO, smoothness and total variation approaches, Gaussian processes for directed causal inference. Matrix completion algorithms.