LaTeX Theme Editor

Export Theme

Copy this preamble into your .tex file, or download it as mytheme.sty and load it with \usepackage{mytheme}. Note that a proper .sty file should use \RequirePackage rather than \usepackage internally and declare itself with \ProvidesPackage{mytheme}; a minimal sketch follows the preamble below.

% =========================================
% Generated by LaTeX Theme Editor
% Preset: Academic Clean
% =========================================

% --- Page Layout ---
\usepackage[
  a4paper,
  top=2.5cm, bottom=2.5cm,
  left=3cm, right=3cm
]{geometry}

% --- Font ---
% (mathpazo supersedes the obsolete palatino package and also sets matching maths fonts)
\usepackage{mathpazo}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}

% --- Font Size ---
% Use \documentclass[11pt]{article} (or report/book)

% --- Line Spacing ---
\usepackage{setspace}
\setstretch{1.5}

% --- Colours ---
\usepackage[dvipsnames]{xcolor}
\definecolor{accentcolour}{HTML}{2E4057}
\definecolor{linkcolour}{HTML}{1A6B9A}

% --- Headings ---
\usepackage{titlesec}
\titleformat{\section}
  {\Large\bfseries\color{accentcolour}}
  {\thesection}{1em}{}
\titlespacing*{\section}{0pt}{18pt}{8pt}

\titleformat{\subsection}
  {\large\bfseries\color{accentcolour}}
  {\thesubsection}{1em}{}
\titlespacing*{\subsection}{0pt}{12pt}{4pt}

\titleformat{\subsubsection}
  {\normalsize\bfseries\color{accentcolour}}
  {\thesubsubsection}{1em}{}
\titlespacing*{\subsubsection}{0pt}{10pt}{3pt}

% --- Header & Footer ---
\usepackage{fancyhdr}
\pagestyle{fancy}
% raise \headheight slightly to avoid the fancyhdr warning at 11pt
\setlength{\headheight}{14pt}
\fancyhf{}
\fancyhead[R]{\thepage}
\renewcommand{\headrulewidth}{0.4pt}
\renewcommand{\footrulewidth}{0pt}

% --- Hyperlinks ---
\usepackage[
  colorlinks=true,
  linkcolor=linkcolour,
  urlcolor=linkcolour,
  citecolor=linkcolour
]{hyperref}
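
If you prefer the package route, the sketch below shows one way to wrap the generated preamble; the file name mytheme.sty, the sample main.tex, and their contents are illustrative assumptions, not part of the generated output.

% mytheme.sty -- hypothetical wrapper around the preamble above
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{mytheme}
% paste the generated preamble here, changing each \usepackage to \RequirePackage, e.g.:
\RequirePackage[a4paper, top=2.5cm, bottom=2.5cm, left=3cm, right=3cm]{geometry}
\RequirePackage{mathpazo}
% ... remaining packages and settings from the preamble ...
\endinput

% main.tex -- minimal document using the theme
\documentclass[11pt]{article}
\usepackage{mytheme}
\begin{document}
\section{Example Section}
Body text set in the Academic Clean theme.
\end{document}
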
Gradient Descent Convergence in Non-Convex Optimisation
Abstract. We analyse the convergence behaviour of stochastic gradient descent (SGD) and its adaptive variants in non-convex loss landscapes typical of deep neural networks. Using a unified framework based on Lyapunov stability theory, we derive tighter upper bounds on the expected gradient norm after T iterations and identify conditions under which saddle-point escape is guaranteed with high probability. Experiments on standard benchmarks confirm a 35–42% reduction in wall-clock time to reach a target validation loss compared to tuned baselines.
1  Introduction

Optimising high-dimensional non-convex objectives is the central computational challenge in modern machine learning. Despite the empirical success of first-order methods, theoretical guarantees remain elusive outside restricted problem classes. A core difficulty is that standard convergence proofs assume Lipschitz-smooth gradients and bounded stochastic noise, assumptions routinely violated in practice by batch normalisation layers and heavy-tailed gradient distributions.

1.1  Background

Early convergence results for SGD in non-convex settings established O(1/√T) rates for the minimum gradient norm, matching known lower bounds up to logarithmic factors. Subsequent work extended these guarantees to variance-reduced estimators, showing that finite-sum structure can be exploited to achieve faster rates without full-batch gradients. See Theorem 2.1 for a precise statement.
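
For orientation, one common form of this rate is sketched below, under assumptions of our own choosing (L-smoothness, a gradient oracle with variance at most σ², and a constant step size η ≤ 1/L); it is not the precise statement referenced above.

\[
  \frac{1}{T}\sum_{t=1}^{T}\mathbb{E}\bigl[\|\nabla f(x_t)\|^2\bigr]
  \;\le\; \frac{2\bigl(f(x_1) - f^{*}\bigr)}{\eta T} + L\eta\sigma^2 ,
\]

which is \(O(1/\sqrt{T})\) when \(\eta = \Theta(1/\sqrt{T})\), and in particular bounds the minimum gradient norm over the first \(T\) iterates.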

1.1.1  Adaptive Methods

Adam and its variants introduce per-parameter learning rates derived from second-moment estimates, dramatically accelerating early training. However, their convergence in non-convex settings requires careful choice of the ε stabilisation term; values that are too small reintroduce instability near saddle points, while large values recover the behaviour of standard SGD.
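
For reference, a sketch of the update in which ε appears, namely the standard bias-corrected Adam step, with notation assumed rather than taken from this text:

\[
  x_{t+1} = x_t - \frac{\eta\,\hat{m}_t}{\sqrt{\hat{v}_t} + \varepsilon},
  \qquad
  \hat{m}_t = \frac{m_t}{1 - \beta_1^{t}},
  \qquad
  \hat{v}_t = \frac{v_t}{1 - \beta_2^{t}}.
\]

A small \(\varepsilon\) lets the effective step \(\eta/(\sqrt{\hat{v}_t} + \varepsilon)\) grow very large where \(\hat{v}_t \approx 0\), whereas a large \(\varepsilon\) dominates \(\sqrt{\hat{v}_t}\) and the update degenerates towards plain SGD with step size \(\eta/\varepsilon\).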

2  Theoretical Analysis

Let f : ℝ^d → ℝ be an L-smooth function bounded below. Under mild assumptions on the stochastic gradient oracle, we show that the iterates {x_t} produced by SGD with step size η_t = η/√t satisfy 𝔼[‖∇f(x_τ)‖²] = O(log T/√T), where τ is an iteration index drawn at random from the first T iterates, improving the constant in prior bounds by a factor of 2Lη.
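
One way such a bound is typically formalised is sketched below; the variance bound σ², the weighting of the output index τ by η_t, and the displayed constants are our assumptions, not the exact theorem:

\[
  \mathbb{E}\bigl[\|\nabla f(x_\tau)\|^2\bigr]
  \;\le\; \frac{2\bigl(f(x_1) - f^{*}\bigr) + L\eta^2\sigma^2\,(1 + \log T)}{\eta\sqrt{T}}
  \;=\; O\!\left(\frac{\log T}{\sqrt{T}}\right),
\]

where \(\tau \in \{1,\dots,T\}\) is drawn with probability proportional to \(\eta_t\) and \(f^{*}\) denotes the infimum of \(f\).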