Main education program СВ.5000.2018 «Mathematics»


Type of work: Bachelor's thesis
Subject: Mathematics
Length: 27 pages
Year of submission: 2022

Contents
1. Introduction
2. Problem setup and assumptions
3. Lookahead and Polynomial Dynamic Modification
4. Unidimensional quadratic problem
4.1. PDM analysis
4.2. Lookahead analysis
4.3. Summary
5. Strongly convex case
5.1. Hessian reformulation
5.2. PDM analysis
5.3. Lower bounds
6. Experiments
7. Conclusion
8. Acknowledgements

We focus on the optimization problem
\min_{x \in \mathbb{R}^n} f(x). \qquad (1)
Problems of this kind arise in various fields of applied science. Deterministic algorithms for solving problem (1) have by now been investigated extensively (Nesterov, 2018). Meanwhile, with the emergence of new fields of interest (Vapnik, 1999), one often has to deal with the stochastic formulation of problem (1):
f(x) = \mathbb{E}_{\xi \sim \mathcal{D}} \left[ f_{\xi}(x) \right], \qquad (2)
where ξ is a random variable. Stochastic formulations arise when we do not have access to (or do not want to compute, due to the high cost) the deterministic value of f(x), for example in statistical supervised learning theory (Shalev-Shwartz and Ben-David, 2014). In this case, x is a machine learning model with n weights/parameters (it can be a regression model or a deep neural network), \mathcal{D} represents an unknown data distribution, f_ξ is the loss of model x on the datapoint ξ, and f is the generalization error. We assume that we do not know the distribution \mathcal{D} from which the data come, but only have samples ξ from \mathcal{D}; nevertheless, we want a model that fits the whole distribution \mathcal{D} well. Deterministic methods are no longer suitable, and we need stochastic modifications of the classical methods. One of the most popular such methods is stochastic gradient descent (SGD) (Robbins and Monro, 1951; Nemirovski et al., 2009; Vaswani et al., 2019; Gorbunov et al., 2020).
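As a concrete illustration (not taken from the thesis itself), the following sketch runs fixed-step SGD on a synthetic instance of problem (2): a stochastic quadratic with additive gradient noise. The dimension, step size, and noise scale are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10                                        # problem dimension (illustrative)
A = np.diag(rng.uniform(1.0, 5.0, size=n))    # fixed positive-definite Hessian
x_star = rng.normal(size=n)                   # minimizer of the expected loss
sigma = 0.5                                   # gradient noise scale (assumed)

def stochastic_grad(x):
    """Unbiased estimate of the true gradient A (x - x_star), plus additive noise."""
    return A @ (x - x_star) + sigma * rng.standard_normal(n)

def sgd(x0, gamma, num_steps):
    """Fixed-step SGD: x_{k+1} = x_k - gamma * g_k, where g_k is a stochastic gradient."""
    x = x0.copy()
    for _ in range(num_steps):
        x = x - gamma * stochastic_grad(x)
    return x

x_sgd = sgd(np.zeros(n), gamma=0.05, num_steps=2000)
print("SGD: squared distance to x*:", float(np.sum((x_sgd - x_star) ** 2)))
```

With a fixed step size the iterates contract toward the solution at first and then hover at a noise-dependent distance, which is exactly the behaviour discussed next.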
In this work we compare different stochastic gradient methods in the regime of a fixed step size. In contrast to deterministic methods, stochastic methods in this setting do not converge to the exact solution of the problem; instead, having reached a certain accuracy, they begin to oscillate around the solution. One therefore faces a tradeoff between the rate of convergence and the limiting noise level near the solution. We propose to compare stochastic gradient methods by their rate of convergence with the desired limiting noise level fixed. We use SGD, an efficient method of convex optimization, as a reasonable baseline. We also introduce a new algorithm, called Polynomial Lookahead or, alternatively, Polynomial Dynamic Modification (PDM), which is inspired by the Lookahead algorithm first introduced by Zhang et al. (2019), where it was analysed in the stochastic quadratic case; Lookahead was also studied in the non-convex regime by Wang et al. (2020). These techniques allow one to approach the solution more efficiently and to reduce the limiting radius of oscillation around the solution.
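For reference, the Lookahead scheme of Zhang et al. (2019), on which PDM builds, can be sketched as follows. The sketch reuses the synthetic problem and `stochastic_grad` from the SGD example above; the values of k, alpha, and the step size are illustrative assumptions, and PDM itself is not reproduced here, since its precise update is given only in the thesis.

```python
def lookahead_sgd(x0, gamma, alpha, k, num_outer_steps):
    """Lookahead (Zhang et al., 2019): k fast SGD steps, then the slow weights
    move a fraction alpha toward the fast weights ("k steps forward, 1 step back")."""
    phi = x0.copy()                            # slow weights
    for _ in range(num_outer_steps):
        theta = phi.copy()                     # fast weights start at the slow weights
        for _ in range(k):                     # k steps forward
            theta = theta - gamma * stochastic_grad(theta)
        phi = phi + alpha * (theta - phi)      # 1 step back: interpolate slow weights
    return phi

x_la = lookahead_sgd(np.zeros(n), gamma=0.05, alpha=0.5, k=5, num_outer_steps=400)
print("Lookahead: squared distance to x*:", float(np.sum((x_la - x_star) ** 2)))
```

The interpolation step averages out part of the gradient noise accumulated over the k inner steps, which is what allows such schemes to shrink the radius of oscillation around the solution.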


We introduced a new algorithm, called Polynomial Dynamic Modification, and showed that it achieves a faster convergence rate on the stochastic quadratic problem than SGD and Lookahead when the desired noise level near the solution is fixed. Moreover, we prove linear convergence of PDM to some limiting noise level on the class of strongly convex, twice differentiable functions.
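For orientation only (this is not the thesis's theorem, whose exact constants and assumptions are not reproduced here), a linear-convergence-to-a-limit-noise guarantee for a fixed-step stochastic gradient method typically has the shape

\mathbb{E}\,\|x_k - x^*\|^2 \;\le\; (1 - \rho)^k \,\|x_0 - x^*\|^2 \;+\; R, \qquad 0 < \rho < 1,

where the contraction factor \rho depends on the step size and the strong convexity constant, and R is the limiting noise level around the solution. Comparing methods by \rho with R held fixed is exactly the regime adopted above.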


1. C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3):1-27, 2011.
2. E. Gorbunov, F. Hanzely, and P. Richtárik. A unified theory of SGD: variance reduction, sampling, quantization and coordinate descent. In International Conference on Artificial Intelligence and Statistics, pages 680-690. PMLR, 2020.
3. J. Wang, V. Tantia, N. Ballas, and M. Rabbat. Lookahead converges to stationary points of smooth non-convex functions. 2020.
4. M. R. Zhang, J. Lucas, G. Hinton, and J. Ba. Lookahead optimizer: k steps forward, 1 step back. 2019.
5. A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19:1574-1609, 2009. doi: 10.1137/070704277.
6. Y. Nesterov. Lectures on Convex Optimization, volume 137. Springer, 2018.
7. H. Robbins and S. Monro. A stochastic approximation method. The Annals of Mathematical Statistics, 22(3):400-407, 1951. doi: 10.1214/aoms/1177729586. URL https://doi.org/10.1214/aoms/1177729586.
8. S. Shalev-Shwartz and S. Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
9. V. Vapnik. The Nature of Statistical Learning Theory. Springer Science & Business Media, 1999.
10. S. Vaswani, F. Bach, and M. Schmidt. Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 1195-1204. PMLR, 2019.

