Calculating the $L$-smoothness constant for logistic regression
I am trying to find the $L$-smoothness constant of the following function (logistic regression cost function) in order to run gradient descent with an appropriate stepsize.
The function is given as $f(x) = -\frac{1}{m} \sum_{i=1}^m \left( y_i \log\left(s\left(a_i^{\top} x\right)\right) + \left(1 - y_i\right) \log\left(1 - s\left(a_i^{\top} x\right)\right) \right) + \frac{\gamma}{2}\|x\|^2$, where $a_i \in \mathbb{R}^n$, $y_i \in \{0,1\}$, and $s(z) = \frac{1}{1+\exp(-z)}$ is the sigmoid function.
The gradient is given as
$\nabla f(x) = \frac{1}{m} \sum_{i=1}^m a_i\left(s\left(a_i^{\top} x\right) - y_i\right) + \gamma x$.
My idea was that the smoothness constant $L$ has to be at least as large as every eigenvalue of the Hessian of the given function. This follows from the fact that if $f$ is $L$-smooth, then $g(x) = \frac{L}{2} x^{\top} x - f(x)$ is convex, and therefore its Hessian has to be positive semi-definite.
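Spelled out, this means $\nabla^2 g(x) = L I_n - \nabla^2 f(x) \succeq 0$ for all $x$, i.e. $L \ge \lambda_{\max}\left(\nabla^2 f(x)\right)$ at every point.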
The second-order partial derivatives of $f$ are given as
$\frac{\partial^2 f(x)}{\partial x_k \partial x_j} = \frac{1}{m} \sum_{i=1}^m s\left(a_i^{\top} x\right)\left(1 - s\left(a_i^{\top} x\right)\right) [a_i]_k [a_i]_j + \gamma\, \delta_{kj}$
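Collecting these entries, and writing $A \in \mathbb{R}^{m \times n}$ for the matrix whose $i$-th row is $a_i^{\top}$ (this notation is my assumption, matching the notebook), the Hessian can be written compactly as
$$\nabla^2 f(x) = \frac{1}{m} A^{\top} D(x)\, A + \gamma I_n, \qquad D(x) = \operatorname{diag}\left( s\left(a_i^{\top} x\right)\left(1 - s\left(a_i^{\top} x\right)\right) \right)_{i=1}^m.$$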
From the following GitHub notebook (https://github.com/ymalitsky/adaptive_GD/blob/master/logistic_regression.ipynb) I know that $L = \frac{1}{4} \lambda_{\max}\left(A^{\top} A\right) + \gamma$, where $\lambda_{\max}$ denotes the largest eigenvalue. This seems plausible, since I figured out that $s\left(a_i^{\top} x\right)\left(1 - s\left(a_i^{\top} x\right)\right) \le \frac{1}{4}$ for all $x$.
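As a numerical sanity check (a minimal NumPy sketch; the data $A$, $\gamma$ and all names are synthetic choices of mine), I compared $\lambda_{\max}$ of the exact Hessian at random points against a candidate constant. Note that with the $\frac{1}{m}$-averaged loss as defined above I would expect the tighter constant $L = \frac{1}{4m} \lambda_{\max}\left(A^{\top} A\right) + \gamma$; the formula without the $\frac{1}{m}$ would then also be valid, just looser:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, gamma = 200, 5, 0.1
A = rng.standard_normal((m, n))   # i-th row is a_i^T

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Candidate smoothness constant (with the 1/m from the averaged loss).
L = np.linalg.eigvalsh(A.T @ A)[-1] / (4 * m) + gamma

def hessian(x):
    # Exact Hessian: (1/m) A^T D(x) A + gamma * I, where D(x) is
    # diagonal with entries s(a_i^T x) * (1 - s(a_i^T x)).
    d = sigmoid(A @ x)
    d = d * (1.0 - d)
    return (A.T * d) @ A / m + gamma * np.eye(n)

# lambda_max of the Hessian should never exceed L, whatever x is.
for _ in range(1000):
    x = 10.0 * rng.standard_normal(n)
    assert np.linalg.eigvalsh(hessian(x))[-1] <= L + 1e-10
```

If the bound is correct, the assertion should never fire.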
But I am not able to fit everything together into a proof. I would appreciate any help.