Cook's Local Influence in Generalized Linear Models via the Shape Operator

In this paper we develop an algorithm for assessing the effect of small perturbations of the data on the validity of a postulated generalized linear model. The procedure is based on the geometric notion of shape operator, a single mathematical object that gathers together all the normal curvatures of a given influence graph or hypersurface. In addition to introducing relevant theoretical notions and explaining the foundations of the local influence assessment method in a generalized linear model with a canonical link, we provide a detailed example of application together with the explicit R code implementing the algorithm.


Introduction
The adjustment of a statistical model should be followed by a stability analysis that examines the effect of (small) data perturbations on the postulated model. A comprehensive account of the general perturbation schemes that evolved from the case-deletion and case-weights paradigms can be found in the classical work of Cook (1986). In that paper, the author develops a far-reaching method for assessing local influence in a linear model, which has had an enormous effect on research in the field of statistical influence. Even though Cook (1986) considers the extension of his results beyond normal linear models, he gives neither a systematic nor a detailed approach to generalized linear models of the kind one would expect, i.e., one analogous to the way the Newton-Raphson-Fisher algorithm determines the model coefficients for any distribution in the exponential family. Furthermore, the normal curvatures (associated with the normal sections of the influence graph at the observed point) can be brought together with the help of the mathematical notion of shape operator. For the plentiful recent developments in the field, the reader can see Ortega et al. (2006) and Ortega et al. (2008).
In this paper we carry out a twofold programme. On the one hand, we give a working definition of generalized linear model with a perturbation scheme (GLMP), together with useful notions of likelihood displacement, influence hypersurface (or graph) and local influence. These concepts furnish the right framework for developing an algorithm to compute the directions where the local influence achieves its maximum and its minimum. On the other hand, we calculate local influence in terms of the shape operator of the influence hypersurface at the observed point. The procedure is thus automated with the help of the well-known Spectral Theorem of linear algebra. An initial inquiry into this method was attempted by Muñoz & Segura (2006).
Section 2 is devoted to the above-mentioned theoretical background. In Section 3 we present a full application example concerning a dose vs. response experiment. Section 4 contains the details of the R computer programme used to manage the binomial distribution in the example. Finally, we draw some conclusions from the proposed method and its implementation.

Local Influence in Generalized Linear Models and Differential Geometry
In the next paragraphs we will introduce some concepts needed to develop our proposal in terms of geometry and generalized linear models.

Preliminary notions
Generalized linear models were proposed by ? as a flexible alternative to the normal model that allows response distributions other than the normal. To begin with, we add an extra element to the following notion of Generalized Linear Model (GLM).
Definition 1 (GLM). A GLM is a triplet (Y, η, g), consisting of:
A random vector Y whose entries follow a distribution in the exponential family. The probability density function of any entry y_i of Y is
$$f(y_i; \theta_i, \phi) = \exp\left\{ \frac{y_i\theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i, \phi) \right\},$$
where θ_i is named the canonical parameter, φ is the dispersion parameter, and a_i(·), b(·) and c(·) are known functions. The mean of Y will be denoted by µ.
A systematic component or linear predictor η = Xβ, where X is the model matrix and β is the vector of parameters.
An increasing continuously differentiable canonical link g relating the entries of η with the corresponding entries of µ, that is,
$$g(\mu_i) = \eta_i, \qquad 1 \le i \le n.$$
We assume the dimensions of Y, η, X, β are n×1, n×1, n×p and p×1, respectively.
The adjustment of a GLM is a procedure to estimate the vector β from a set of independent observations y_i, x_ij, 1 ≤ i ≤ n, 1 ≤ j ≤ p. This is usually accomplished by means of the Newton-Raphson-Fisher algorithm. Since the link is canonical, the log-likelihood is concave in β; hence the first-order condition is sufficient and the resulting estimate of β is a global maximum point.
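As a minimal sketch of the Newton-Raphson-Fisher iteration for a canonical link, the following R code fits a binomial model with the logit link and compares the result with R's built-in glm(). The data vectors d, m and y and the helper name fisher_scoring are hypothetical illustrations; they are not the data or the code of the paper.

```r
# Hypothetical toy dose-response data (NOT the Rotenone data of Table 1).
d <- c(1, 2, 3, 4, 5)           # doses
m <- c(50, 50, 50, 50, 50)      # insects exposed at each dose
y <- c(5, 14, 26, 38, 46)       # dead insects at each dose
X <- cbind(1, d)                # n x p model matrix, here p = 2

# Newton-Raphson-Fisher iteration for the binomial model with logit link.
# (hypothetical helper, written only for this illustration)
fisher_scoring <- function(X, y, m, tol = 1e-10, max_iter = 50) {
  beta <- rep(0, ncol(X))
  for (it in seq_len(max_iter)) {
    eta <- drop(X %*% beta)
    p   <- plogis(eta)                    # e^eta / (1 + e^eta)
    U   <- crossprod(X, y - m * p)        # score vector
    A   <- crossprod(X, m * p * (1 - p) * X)  # information matrix X' W X
    step <- drop(solve(A, U))             # Newton step
    beta <- beta + step
    if (max(abs(step)) < tol) break
  }
  unname(beta)
}

beta_hat <- fisher_scoring(X, y, m)

# Reference fit with R's own iteratively reweighted least squares.
fit <- glm(cbind(y, m - y) ~ d, family = binomial)
print(cbind(sketch = beta_hat, glm = unname(coef(fit))))
```

Because the link is canonical, the observed and expected information coincide, so this Newton step is also the Fisher scoring step.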
In order to study influence, we must perturb the model. The practical way to realize the perturbation consists in including a p × 1 vector ω in the log-likelihood l(β; y) to get a new function l(ω; β; y). The vector ω can be used to produce a desired effect on the case observations, the outputs or the explanatory variables, among other choices. We suppose ω ∈ Ω, where Ω is an open neighborhood of 0 ∈ R^p. We also demand that l(0; β; y) = l(β; y), that is, ω = 0 means no perturbation.
The function l(ω; β; y) is continuously differentiable and, when ω = 0, it attains its maximum at β = β(0). Therefore, if Ω is small enough and ω ∈ Ω is fixed, the function is maximized at some β(ω). This fact leads us to the following important notion.
In what follows, we shall always postulate the existence of the following object.
For us, the likelihood displacement of a GLMP is the function D : Ω → R given by
$$D(\omega) = 2\left[\, l(\hat\beta; y) - l(\beta(\omega); y) \,\right],$$
where \hat\beta = β(0) is the estimate obtained without perturbation. Clearly, D is nonnegative and achieves a local minimum at ω = 0. Cook (1986) contemplates alternative likelihood displacements. However, we shall only consider the one defined by the function D. The motivation for D comes mainly from the case of normal linear models, and D can be interpreted in terms of the large-sample confidence region for β.
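To make the likelihood displacement concrete, here is a minimal R sketch that evaluates it numerically for a binomial logit model. It rests on two assumptions that are labeled in the code: Cook's (1986) form D(ω) = 2[l(β(0); y) − l(β(ω); y)] and a case-weight perturbation with weights 1 + ω_i. The data and the helper names loglik, fit_weighted and D are hypothetical.

```r
# Hypothetical toy data (not taken from the paper).
d <- c(1, 2, 3, 4, 5)
m <- rep(50, 5)
y <- c(5, 14, 26, 38, 46)
X <- cbind(1, d)
n <- nrow(X)

# Unperturbed binomial log-likelihood; the term c(y, phi) is omitted
# because it does not depend on beta and cancels in D.
loglik <- function(beta) {
  eta <- drop(X %*% beta)
  sum(y * eta - m * log1p(exp(eta)))
}

# beta(omega): maximizer of the case-weighted log-likelihood.
# ASSUMPTION: observation i receives weight 1 + omega_i.
fit_weighted <- function(omega) {
  beta <- c(0, 0)
  for (it in 1:100) {
    p <- plogis(drop(X %*% beta))
    U <- crossprod(X, (1 + omega) * (y - m * p))
    A <- crossprod(X, (1 + omega) * m * p * (1 - p) * X)
    step <- drop(solve(A, U))
    beta <- beta + step
    if (max(abs(step)) < 1e-12) break
  }
  unname(beta)
}

beta0 <- fit_weighted(rep(0, n))

# ASSUMPTION: likelihood displacement in Cook's (1986) form.
D <- function(omega) 2 * (loglik(beta0) - loglik(fit_weighted(omega)))

D0 <- D(rep(0, n))             # no perturbation
D1 <- D(c(0.1, 0, 0, 0, 0))    # small change in the first case weight
print(c(D0 = D0, D1 = D1))
```

Since β(0) maximizes the unperturbed log-likelihood, the computed values are nonnegative up to rounding, in line with the minimum of D at ω = 0.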
The influence hypersurface of a GLMP is the graph of D, that is,
$$S_D = \{ (\omega, D(\omega)) : \omega \in \Omega \} \subset \mathbb{R}^{p+1}.$$
Once S_D is furnished with its natural structure of differentiable manifold, classical differential geometry supplies the tools for computing its two fundamental forms.
With them, the shape operator dG_0 of S_D at ω = 0 is determined. The reader is referred to Auslander (1967) for a thorough account of these results. They allow us to state with precision the crucial notion of this paper.
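Definition 4. The local influence of a GLMP (Y, η, g, β) is the function I : S^{p−1} → R that assigns to each unit direction the corresponding normal curvature of S_D at ω = 0, as determined by the shape operator. Here, S^{p−1} denotes the unit (p − 1)-sphere in R^p and dG_0 is the shape operator of S_D evaluated at ω = 0.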
Since the shape operator is a symmetric linear endomorphism of the tangent space to S_D at ω = 0 (i.e., R^p), the methods of numerical linear algebra yield the unit eigenvectors pointing in the directions of maximal and minimal influence. According to the Spectral Theorem, they are associated with the eigenvalues of dG_0 of maximal and minimal absolute value.
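The role of the Spectral Theorem can be illustrated with a short R sketch. The 3 × 3 symmetric matrix H below is a hypothetical stand-in for the matrix of a shape operator, not a quantity from the paper; the code orders its eigenpairs by absolute eigenvalue and checks numerically that no unit direction yields a quadratic form larger in absolute value than the leading eigenvalue.

```r
# Hypothetical symmetric matrix standing in for a shape operator.
H <- matrix(c( 2, -1,  0,
              -1,  3,  1,
               0,  1, -4), nrow = 3, byrow = TRUE)

# Spectral decomposition; symmetric = TRUE guarantees real eigenvalues
# and orthonormal eigenvectors.
eig <- eigen(H, symmetric = TRUE)

# Order the eigenpairs by decreasing absolute eigenvalue.
ord     <- order(abs(eig$values), decreasing = TRUE)
values  <- eig$values[ord]
vectors <- eig$vectors[, ord]

# Quadratic form v' H v for a direction v.
quad <- function(v) drop(crossprod(v, H %*% v))

# Random unit directions: none should exceed the leading eigenvalue
# in absolute value.
set.seed(1)
V <- matrix(rnorm(3 * 1000), nrow = 3)
V <- sweep(V, 2, sqrt(colSums(V^2)), "/")
q <- apply(V, 2, quad)

print(values)
print(max(abs(q)))
```

The first column of vectors is the unit direction where the quadratic form is largest in absolute value; the last column gives the direction of smallest absolute value among the eigenvectors.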

Assessing local influence
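The log-likelihood of a GLM can be written (McCullagh & Nelder 1997) as
$$l(\beta; y) = \sum_{i=1}^{n} \left\{ \frac{y_i\theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i, \phi) \right\},$$
where the functions a, b and c are determined by the particular distribution. Here we shall only study the perturbation scheme defined implicitly by assigning to each vector ω ∈ Ω the optimal set of parameters β(ω) maximizing
$$l(\omega; \beta; y) = \sum_{i=1}^{n} (1 + \omega_i) \left\{ \frac{y_i\theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i, \phi) \right\}.$$
In other words, we use case-weights 1 + ω_i, one for each observation.

The computation of the first fundamental form depends on the first partial derivatives of the map α(ω) = (ω, D(ω)), where D(ω) = 2[l(\hat\beta; y) − l(β(ω); y)] and \hat\beta = β(0). By virtue of the Chain Rule,
$$\frac{\partial\alpha}{\partial\omega_i} = \left( 0, \ldots, 1, \ldots, 0, \frac{\partial D}{\partial\omega_i} \right), \qquad \frac{\partial D}{\partial\omega_i} = -2 \sum_{k=1}^{p} \frac{\partial l}{\partial\beta_k}\big(\beta(\omega); y\big)\, \frac{\partial\beta_k}{\partial\omega_i},$$
where "1" occurs in the i-th position. The gradient or score is given by
$$U = (U_1, \ldots, U_p), \qquad U_k = \frac{\partial l}{\partial\beta_k},$$
and it vanishes at β = \hat\beta, so that ∂D/∂ω_i = 0 at ω = 0. Hence, the coefficients of the first fundamental form are
$$\left\langle \frac{\partial\alpha}{\partial\omega_i}(0), \frac{\partial\alpha}{\partial\omega_j}(0) \right\rangle = \delta_{ij}.$$
That is, this form is simply the n × n identity matrix. It will be denoted by F_0. Also, the normal unit vector or Gauss map at ω = 0 is
$$N(0) = (0, \ldots, 0, 1).$$
The calculation of the second fundamental form relies on the second partial derivatives
$$\frac{\partial^2\alpha}{\partial\omega_i\,\partial\omega_j} = \left( 0, \ldots, 0, \frac{\partial^2 D}{\partial\omega_i\,\partial\omega_j} \right).$$
Since we are assuming canonical links, the score entries of the log-likelihood take, in absence of perturbation, the simpler form (Garcia 2002)
$$U_k = \frac{\partial l}{\partial\beta_k} = \sum_{l=1}^{n} \frac{1}{a_l(\phi)} \left[ y_l - g^{-1}(\eta_l) \right] x_{lk}, \qquad 1 \le k \le p,$$
where g^{-1} denotes the inverse of the link function g. Evaluated along β = β(ω), U_k is not identically equal to zero because l is not perturbed. Now, since U_k(\hat\beta) = 0,
$$\frac{\partial^2 D}{\partial\omega_i\,\partial\omega_j}(0) = -2 \sum_{k=1}^{p} \sum_{r=1}^{p} \frac{\partial U_k}{\partial\beta_r}(\hat\beta)\, \frac{\partial\beta_k}{\partial\omega_i}(0)\, \frac{\partial\beta_r}{\partial\omega_j}(0).$$

Comunicaciones en Estadística, junio 2015, Vol. 8, No. 1

Consequently, we need to find the values of ∂β_r/∂ω_i. They can be obtained by differentiating the score entries of the perturbed log-likelihood l(ω; β; y), which vanish at β = β(ω). A quick computation reveals that
$$\frac{1}{a_i(\phi)} \left[ y_i - g^{-1}(\eta_i) \right] x_{ik} - \sum_{r=1}^{p} \left[ \sum_{l=1}^{n} \frac{1 + \omega_l}{a_l(\phi)}\, (g^{-1})'(\eta_l)\, x_{lk} x_{lr} \right] \frac{\partial\beta_r}{\partial\omega_i} = 0, \qquad 1 \le k \le p.$$
These expressions at ω = 0 form the linear systems
$$\sum_{r=1}^{p} A_{kr}\, \frac{\partial\beta_r}{\partial\omega_i}(0) = \frac{\partial U_k}{\partial\omega_i}, \qquad A_{kr} = \sum_{l=1}^{n} \frac{1}{a_l(\phi)}\, (g^{-1})'(\hat\eta_l)\, x_{lk} x_{lr} = -\frac{\partial U_k}{\partial\beta_r}(\hat\beta),$$
where \hat\eta = X\hat\beta and ∂U_k/∂ω_i = [y_i − g^{-1}(\hat\eta_i)] x_{ik} / a_i(φ) denotes the derivative of the k-th perturbed score entry with respect to ω_i at fixed β. The solutions ∂β_r/∂ω_i (ω = 0) determine the derivatives
$$\frac{\partial^2 D}{\partial\omega_i\,\partial\omega_j}(0) = 2 \sum_{k=1}^{p} \frac{\partial U_k}{\partial\omega_i}\, \frac{\partial\beta_k}{\partial\omega_j}(0).$$
In this way, the coefficients of the second fundamental form H_0 are
$$h_{ij} = \left\langle \frac{\partial^2\alpha}{\partial\omega_i\,\partial\omega_j}(0), N(0) \right\rangle = 2 \sum_{k=1}^{p} \frac{\partial U_k}{\partial\omega_i}\, \frac{\partial\beta_k}{\partial\omega_j}(0).$$
The positive factor 2 rescales the eigenvalues but leaves the eigenvectors unchanged. Lastly, the shape operator at ω = 0 is given by (Solanilla 2008)
$$dG_0 = F_0^{-1} H_0 = H_0.$$
The eigenvectors of dG_0 associated with the eigenvalues of maximum and minimum absolute value point respectively towards the directions of maximum and minimum local influence.

The random vector entries follow a binomial distribution. The model matrix X has p = 2 columns.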
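For the binomial distribution, a(φ) = 1, the canonical link is logistic and so,
$$\eta_i = g(\mu_i) = \log\frac{\mu_i}{m_i - \mu_i}, \qquad \mu_i = g^{-1}(\eta_i) = \frac{m_i e^{\eta_i}}{1 + e^{\eta_i}}.$$
Our procedure requires the calculation of the partial derivatives
$$\frac{\partial U_k}{\partial\omega_i} \quad \text{and} \quad \frac{\partial\beta_r}{\partial\omega_i}, \qquad 1 \le i \le n, \quad 1 \le k, r \le 2,$$
evaluated at ω = 0, where U_k denotes the k-th score entry of the perturbed log-likelihood. The first subset of derivatives is computed by the following expressions, in which \hat\eta_i is the fitted linear predictor:
$$\frac{\partial U_k}{\partial\omega_i} = \left[ y_i - \frac{m_i e^{\hat\eta_i}}{1 + e^{\hat\eta_i}} \right] x_{ik}.$$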
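We present the values of the derivatives in Table 2. The derivatives of the perturbation scheme are the solutions of the 2 × 2 linear systems
$$A\, \frac{\partial\beta}{\partial\omega_i} = \frac{\partial U}{\partial\omega_i}, \qquad 1 \le i \le n,$$
where β^t = (β_1, β_2), U^t = (U_1, U_2) and A is the matrix with components
$$A_{kr} = \sum_{l=1}^{n} \frac{m_l e^{\eta_l}}{(1 + e^{\eta_l})^2}\, x_{lk} x_{lr}.$$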
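The linear systems above can be sketched in R as follows. This is an illustration under stated assumptions, not the paper's programme: the data are hypothetical, the perturbation is assumed to use case weights 1 + ω_i, and the names fit_weighted, dU, A and dB are chosen here for convenience. A central finite difference of β(ω) serves as an independent check on one column of the solution.

```r
# Hypothetical toy data (not the Rotenone data of Table 1).
d <- c(1, 2, 3, 4, 5)
m <- rep(50, 5)
y <- c(5, 14, 26, 38, 46)
X <- cbind(1, d)
n <- nrow(X)

# beta(omega) under the ASSUMED case weights 1 + omega_i.
fit_weighted <- function(omega) {
  beta <- c(0, 0)
  for (it in 1:100) {
    p <- plogis(drop(X %*% beta))
    U <- crossprod(X, (1 + omega) * (y - m * p))
    A <- crossprod(X, (1 + omega) * m * p * (1 - p) * X)
    step <- drop(solve(A, U))
    beta <- beta + step
    if (max(abs(step)) < 1e-12) break
  }
  unname(beta)
}

beta_hat <- fit_weighted(rep(0, n))
p_hat <- plogis(drop(X %*% beta_hat))

# dU[k, i] = (y_i - m_i p_i) x_ik: derivative of the k-th perturbed
# score entry with respect to omega_i (2 x n matrix).
dU <- t(X * (y - m * p_hat))

# A[k, r] = sum_l m_l e^eta_l / (1 + e^eta_l)^2 x_lk x_lr (2 x 2 matrix).
A <- crossprod(X, m * p_hat * (1 - p_hat) * X)

# Column i of dB solves the 2 x 2 system A %*% dB[, i] = dU[, i].
dB <- solve(A, dU)

# Independent check: central finite difference for the first weight.
h  <- 1e-6
e1 <- replace(rep(0, n), 1, h)
fd <- (fit_weighted(e1) - fit_weighted(-e1)) / (2 * h)
print(rbind(linear_system = unname(dB[, 1]), finite_difference = fd))
```

Solving all n systems at once with solve(A, dU) avoids an explicit loop over the observations, since the coefficient matrix A is the same for every i.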
Table 3 shows matrix A. The derivatives ∂β_r/∂ω_i are given in Table 4. We finally get the second fundamental form, its eigenvalues and eigenvectors by

    S <- t(dU) %*% dB   # dU, dB: matrices of the derivatives in Tables 2 and 4
    G <- eigen(S)       # eigenvalues and unit eigenvectors of S

R gives at once the unit eigenvectors. It remains to arrange them according to the order adopted for their corresponding absolute eigenvalues.
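The remaining ordering step might look like the sketch below, which runs the whole chain on hypothetical data (not the Rotenone data of Table 1). The fit comes from R's glm(); the explicit symmetrization of S and the argument symmetric = TRUE are additions made here to guard against rounding error and are not part of the two-line computation quoted above.

```r
# Hypothetical toy data (not the Rotenone data of Table 1).
d <- c(1, 2, 3, 4, 5)
m <- rep(50, 5)
y <- c(5, 14, 26, 38, 46)

# Binomial fit with the canonical (logit) link.
fit   <- glm(cbind(y, m - y) ~ d, family = binomial)
X     <- model.matrix(fit)
p_hat <- unname(fitted(fit))          # fitted probabilities

# Derivatives of the score and of the coefficients in the case weights.
dU <- t(X * (y - m * p_hat))                     # 2 x n
A  <- crossprod(X, m * p_hat * (1 - p_hat) * X)  # 2 x 2
dB <- solve(A, dU)                               # 2 x n

# Matrix of the second fundamental form, as in the text.
S <- t(dU) %*% dB
S <- (S + t(S)) / 2                   # remove rounding asymmetry
G <- eigen(S, symmetric = TRUE)

# Arrange the eigenpairs by decreasing absolute eigenvalue.
ord     <- order(abs(G$values), decreasing = TRUE)
values  <- G$values[ord]
vectors <- G$vectors[, ord, drop = FALSE]

l_max <- vectors[, 1]                 # direction of maximum local influence
print(round(values, 6))
print(round(l_max, 4))
```

In this setting S equals t(dU) %*% solve(A) %*% dU, so its rank is at most 2 and at most two eigenvalues are nonzero; the large entries of l_max flag the observations whose weights matter most.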

Concluding remarks
Cook's differential-geometric method to assess local influence is systematically carried out via the shape operator of the influence graph. Even if we have only dealt with canonical links, the process can be straightforwardly attempted for arbitrary suitable links. The use of the shape operator simplifies the procedure and makes it possible to develop a simple algorithm based on mere elementary linear operations.

Table 1: Number of dead insects y_i out of m_i insects receiving a dose d_i of Rotenone. Source: Garcia (2002).
Set of experimental data about Rotenone's toxicity (Garcia 2002). The dose d_i given to m_i insects is the explanatory variable. The outcome is the number y_i of dead insects. The corresponding ratios p_i = y_i/m_i are also presented in the table.