## Course homepage for Optimisation algorithms in Statistics I (Ph.D. course 2020)

Summary

Optimisation (the computation of minima or maxima) is frequently needed in statistics. Maximum likelihood estimation, optimal experimental design, and risk minimisation in decision-theoretic models are examples where the solution of the optimisation problem usually has no closed form and must be computed numerically with an algorithm. Moreover, the field of machine learning depends on optimisation and places new demands on algorithms for computing minima and maxima.

In this course, we will start by discussing properties of gradient-based algorithms such as the Newton method and the gradient descent method. We will then look at developments triggered especially by machine learning and discuss stochastic gradient-based methods. Recent work has recognised the value of gradient-free algorithms, and we will consider so-called metaheuristic algorithms, e.g. particle swarm optimisation. The last topic of the course deals with handling restrictions during optimisation, such as equality and inequality restrictions.
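
To make the contrast between the two gradient-based methods concrete, here is a minimal sketch, written in Python for illustration rather than in the R used in the course. The test function f(x) = exp(x) - 2x, the step size, the tolerances and all function names are assumptions chosen for this example and are not taken from the course material.

```python
import math

def gradient_descent(grad, x0, step=0.1, tol=1e-10, max_iter=10_000):
    """Minimise a 1-D function by repeatedly stepping against its derivative."""
    x = x0
    for k in range(1, max_iter + 1):
        x_new = x - step * grad(x)          # fixed step size (an assumption)
        if abs(x_new - x) < tol:            # stop when the update is tiny
            return x_new, k
        x = x_new
    return x, max_iter

def newton(grad, hess, x0, tol=1e-10, max_iter=100):
    """Minimise a 1-D function with Newton steps x - f'(x) / f''(x)."""
    x = x0
    for k in range(1, max_iter + 1):
        x_new = x - grad(x) / hess(x)       # uses curvature to scale the step
        if abs(x_new - x) < tol:
            return x_new, k
        x = x_new
    return x, max_iter

# Illustrative convex function f(x) = exp(x) - 2x with minimiser x = log(2).
def f_grad(x):
    return math.exp(x) - 2.0

def f_hess(x):
    return math.exp(x)

x_gd, n_gd = gradient_descent(f_grad, x0=0.0)
x_nt, n_nt = newton(f_grad, f_hess, x0=0.0)
print(f"gradient descent: x = {x_gd:.8f} after {n_gd} iterations")
print(f"Newton method:    x = {x_nt:.8f} after {n_nt} iterations")
```

On this smooth, convex example the Newton method needs far fewer iterations, at the price of requiring the second derivative; gradient descent only needs the first derivative but depends on a sensible step size.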

We will implement these algorithms in R. Examples from machine learning and optimal design will illustrate the methods.

A warm welcome to the course!
Frank Miller, Department of Statistics, Stockholm University
frank.miller@stat.su.se

Topic 1: Gradient based algorithms

Lectures: October 2; Time 10-12, 13-15 (registered participants have received a link to all lectures).

• Givens GH, Hoeting JA (2013). Computational Statistics, 2nd edition. John Wiley & Sons, Inc., Hoboken, New Jersey. Chapter 2 up to 2.2.3 (and Chapter 1.1-1.4 if needed).
• Goodfellow I, Bengio Y, Courville A (2016). Deep Learning. MIT Press, http://www.deeplearningbook.org. Chapter 4.3 (and parts of Chapter 2 and Chapter 4.2 if needed).
• Sun S, Cao Z, Zhu H, Zhao J (2019). A survey of optimization methods from a machine learning perspective, https://arxiv.org/pdf/1906.06821.pdf. Sections I, II, IIIA1, IIIB1-2.
• AlphaOpt (2017). Introduction To Optimization: Gradient Based Algorithms, YouTube video (very elementary introduction to the concepts).
• About analytical optimisation. (Frank Miller, September 2020)
• Understanding gradient descent. (Eli Bendersky, August 2016)
• Bisection method. (Frank Miller, March 2020; 4min video)
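
The bisection method listed above can be sketched in a few lines. This is an illustrative Python version (the course itself works in R), applied to the derivative of an assumed example function f(x) = exp(x) - 2x; the function name `bisection`, the bracket [0, 2] and the tolerance are hypothetical choices for this sketch.

```python
import math

def bisection(h, a, b, tol=1e-10, max_iter=200):
    """Find a root of h in [a, b], assuming h(a) and h(b) have opposite signs."""
    ha, hb = h(a), h(b)
    if ha == 0.0:
        return a
    if hb == 0.0:
        return b
    if ha * hb > 0:
        raise ValueError("h(a) and h(b) must have opposite signs")
    for _ in range(max_iter):
        mid = 0.5 * (a + b)
        hm = h(mid)
        # Stop at an exact root or when the bracket is narrow enough.
        if hm == 0.0 or 0.5 * (b - a) < tol:
            return mid
        if ha * hm < 0:
            b = mid                 # sign change in the left half
        else:
            a, ha = mid, hm         # sign change in the right half
    return 0.5 * (a + b)

# A stationary point of f(x) = exp(x) - 2x is a root of f'(x) = exp(x) - 2.
root = bisection(lambda x: math.exp(x) - 2.0, 0.0, 2.0)
print(f"stationary point at x = {root:.8f}")
```

Bisection halves the bracket in every iteration, so it converges slowly but reliably once a sign change of the derivative has been bracketed.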

Example code: steepestascent.r

Topic 2: Stochastic gradient based algorithms

Lecture: October 13; Time 9-12 (Zoom).

About the solution of Problem 1.3.

Dataset logist.txt.

Topic 3: Gradient free algorithms

Lecture: October 23; Time 9-12 (Zoom).

• Givens GH, Hoeting JA (2013). Computational Statistics, 2nd edition. John Wiley & Sons, Inc., Hoboken, New Jersey. Chapter 2.2.4 and 3.1 to 3.4.
• AlphaOpt (2017). Introduction To Optimization: Gradient Free Algorithms (1/2) - Genetic - Particle Swarm, YouTube video (elementary introduction to the concepts).
• AlphaOpt (2017). Introduction To Optimization: Gradient Free Algorithms (2/2) Simulated Annealing, Nelder-Mead, YouTube video (elementary introduction to the concepts).
• Clerc M (2012). Standard Particle Swarm Optimisation. 15 pages. https://hal.archives-ouvertes.fr/hal-00764996 (some background on implementation details, including the choice of parameter values).
• Wang D, Tan D, Liu L (2018). Particle swarm optimization algorithm: an overview. Soft Comput 22:387–408. (Broad overview of research on PSO since its invention).
• Goodfellow I, Bengio Y, Courville A (2016). Deep Learning. MIT Press, http://www.deeplearningbook.org. Chapter 5.2 and 7.1 (about regularisation).
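
As a rough illustration of the particle swarm idea covered in the references above, the following is a minimal global-best PSO sketch in Python (not the R used in the course). The inertia weight `w`, the acceleration coefficients `c1` and `c2`, the swarm size, the position clamping and the quadratic test function are all assumptions made for this example; it is not the standard PSO described by Clerc and not the course's implementation.

```python
import random

def pso(f, bounds, n_particles=20, n_iter=200, w=0.72, c1=1.2, c2=1.2, seed=1):
    """Minimise f over a box with a basic global-best particle swarm."""
    rng = random.Random(seed)               # fixed seed for reproducibility
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]             # best position seen by each particle
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # best position seen by the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards own best + pull towards swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Keep the particle inside the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Illustrative test function with its minimum at (1, -2).
def shifted_sphere(x):
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

best_x, best_val = pso(shifted_sphere, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
print("best point:", best_x, "value:", best_val)
```

No derivatives are evaluated anywhere, which is what makes methods of this kind usable when the objective is non-smooth or only available as a black box.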

Function g in Problem 3.1: bimodal.r.

Dataset cressdata.txt.

Topic 4: Optimisation with restrictions

Lecture: November 6; Time 9-12 (Zoom).