Title: | Decomposition of (Income) Inequality |
---|---|
Description: | Decomposition of (income) inequality by population subgroups. For a decomposition on a single variable, the mean log deviation can be used (see Mookherjee and Shorrocks (1982) <DOI:10.2307/2232673>). For a decomposition on multiple variables, a regression-based technique can be used (see Fields (2003) <DOI:10.1016/s0147-9121(03)22001-x>). Recentered influence function regression can be used to estimate marginal effects on the (income or wealth) distribution (see Firpo et al. (2009) <DOI:10.3982/ECTA6822>). The package also includes extensions of inequality functions to handle weights and/or missing values. |
Authors: | René Schulenberg |
Maintainer: | René Schulenberg |
License: | GPL-3 |
Version: | 0.1.0 |
Built: | 2024-11-06 03:26:13 UTC |
Source: | https://github.com/reneschulenberg/dineq |
Decomposition of the change in (income) inequality into multiple characteristics, split into a price and a quantity effect.
dineq_change_rb(formula1, weights1 = NULL, data1, formula2, weights2 = NULL, data2)
formula1 |
an object of class "formula" (or one that can be coerced to that class) for the first year/dataset: a symbolic description of the model to be fitted in the ordinary least squares regression. |
weights1 |
an optional vector of weights to be used in the fitting process. Should be NULL or a numeric variable inside data1, referred to by its name between quotation marks (for example "factor"). |
data1 |
a data frame containing the variables for the first year/dataset in the model. |
formula2 |
an object of class "formula" (or one that can be coerced to that class) for the second year/dataset: a symbolic description of the model to be fitted in the ordinary least squares regression. |
weights2 |
an optional vector of weights to be used in the fitting process. Should be NULL or a numeric variable inside data2, referred to by its name between quotation marks (for example "factor"). |
data2 |
a data frame containing the variables for the second year/dataset in the model. |
This function uses a multivariate regression-based decomposition method. Multiple characteristics can be added to the function in order to calculate the contribution of each individual variable (including the residual) to the change in inequality. For instance, socio-economic, demographic and geographic characteristics (such as age, household composition, gender, region or education) of the household or the individual can be added.
The change decomposition is divided into a price and a quantity effect for each characteristic. The quantity effect is caused by changes in the relative size of subgroups (for instance: a higher percentage of elderly households). The price effect is caused by a change in the influence of the characteristic on the dependent variable (for instance a higher income for the elderly households).
It uses a logarithmic transformation of the values of the dependent variable. Therefore it cannot handle negative or zero values. Those are excluded from the computation in this function.
The decomposition can only be used on the variance of log income.
The main difference from the decomposition of the change in the mean log deviation (mld_change) is that multiple characteristics can be analyzed at the same time, whereas that function analyzes only one characteristic at a time.
The function uses two datasets, one for each of the years to compare. The characteristics must be the same in both formulas (although they can be named differently) and must be given in the same order.
a list with the results of the decomposition and the parts used for the decomposition, containing the following components:
attention |
optional note on the difference in the input. |
variance_logincome |
the values of the variance of log income of both years/datasets and difference between both. |
decomposition_inequality |
the (relative) decomposition of the inequality of both years/datasets into the different variables. See function dineq_rb. |
decomposition_change_absolute |
decomposition of the change in the variance of log income into the different variables and residual split into price and quantity effects. Adds up to the absolute change in variance of log income. |
decomposition_change_relative |
decomposition of the change in the variance of log income into the different variables and residual split into price and quantity effects. Adds up to 100 percent. |
notes |
number of zero or negative observations in both datasets/years. The function uses a logarithmic transformation of the dependent variable as input for the regression, so these observations are deleted from the analysis. |
Yun, M.-S. (2006) Earnings Inequality in USA, 1969–99: Comparing Inequality Using Earnings Equations, Review of Income and Wealth, 52 (1), p. 127–144.
Fields, G. (2003) Accounting for income inequality and its change: a new method, with application to the distribution of earnings in the United States, Research in Labor Economics, 22, p. 1–38.
Brewer, M. and L. Wren-Lewis (2016) Accounting for Changes in Income Inequality: Decomposition Analyses for the UK, 1978–2008, Oxford Bulletin of Economics and Statistics, 78 (3), p. 289–322.
# Decomposition of the change in income inequality into 4 variables using the Mexican Income data sets
data(mex_inc_2008)
data(mex_inc_2016)
inequality_change <- dineq_change_rb(formula1 = income ~ hh_structure + education + domicile_size + age_cat,
                                     weights1 = "factor", data1 = mex_inc_2008,
                                     formula2 = income ~ hh_structure + education + domicile_size + age_cat,
                                     weights2 = "factor", data2 = mex_inc_2016)
# Selection of output: change in variance of log income decomposed into variables,
# split into price and quantity effects, and residual
inequality_change["decomposition_change_absolute"]
# Selection of output: relative change in variance of log income decomposed into variables,
# split into price and quantity effects, and residual. Because the change in variance of log
# income is negative, the negative contribution of education (quantity) becomes a positive number.
inequality_change["decomposition_change_relative"]
Decomposition of (income) inequality into multiple characteristics. A regression-based decomposition method is used.
dineq_rb(formula, weights = NULL, data)
formula |
an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted in the ordinary least squares regression. |
weights |
an optional vector of weights to be used in the fitting process. Should be NULL or a numeric variable inside data, referred to by its name between quotation marks (for example "factor"). |
data |
a data frame containing the variables in the model. |
This function uses a multivariate regression-based decomposition method. Multiple variables can be added to the function in order to calculate the contribution of each individual variable (including the residual) to the inequality. For instance, socio-economic, demographic and geographic characteristics (such as age, household composition, gender, region or education) of the household or the individual can be added.
This decomposition can be used on a broad range of inequality measures, such as the Gini coefficient, Theil index, mean log deviation, Atkinson index and variance of log income.
It uses a logarithmic transformation of the values of the dependent variable. Therefore it cannot handle negative or zero values. Those are excluded from the computation in this function.
The main difference from the decomposition of the mean log deviation or the Gini coefficient is that multiple characteristics can be analyzed at the same time, whereas the other decomposition functions analyze only one characteristic at a time.
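A common way to compute Fields (2003) shares is to give each variable j the share cov(b_j x_j, log y) / var(log y), summing over all dummy columns of a factor, and to give the residual e the share cov(e, log y) / var(log y); by construction the shares add up to 1. The base-R sketch below illustrates this under stated assumptions: the function name `fields_sketch` is hypothetical, it ignores weights, it assumes no missing values in the explanatory variables, and it is not the package's implementation.

```r
# Hypothetical, unweighted sketch of a Fields (2003) regression-based decomposition.
# Share of variable j: cov(b_j * x_j, log y) / var(log y); shares incl. residual sum to 1.
fields_sketch <- function(y, X) {
  # The log transformation requires strictly positive outcomes
  keep <- !is.na(y) & y > 0
  ly <- log(y[keep])
  X <- X[keep, , drop = FALSE]
  fit <- lm(ly ~ ., data = cbind(ly = ly, X))
  mm <- model.matrix(fit)
  b <- coef(fit)
  # "assign" maps every model-matrix column to its variable (0 = intercept)
  assign <- attr(mm, "assign")
  vars <- attr(terms(fit), "term.labels")
  shares <- sapply(seq_along(vars), function(j) {
    cols <- which(assign == j)
    cov(drop(mm[, cols, drop = FALSE] %*% b[cols]), ly) / var(ly)
  })
  names(shares) <- vars
  c(shares, residual = cov(residuals(fit), ly) / var(ly))
}

# Example with simulated data (not the Mexican Income data)
set.seed(1)
n <- 200
X <- data.frame(age = rnorm(n),
                region = factor(sample(c("north", "south", "east"), n, replace = TRUE)))
y <- exp(0.5 * X$age + 0.3 * as.numeric(X$region) + rnorm(n, sd = 0.5))
s <- fields_sketch(y, X)
round(s, 3)
```

Grouping the dummy columns of a factor through the `assign` attribute gives one share per characteristic, which matches the per-variable output described for decomposition_inequality.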
a list with the results of the decomposition, containing the following components:
inequality_measures |
the values of 4 inequality measures: Gini coefficient, mean log deviation, Theil index and variance of log income |
decomposition_inequality |
the (relative) decomposition of the inequality into the different variables |
regression_results |
results of the OLS regression that is used for the decomposition of inequality |
note |
number of zero or negative observations. The function uses a logarithmic transformation of the dependent variable as input for the regression, so these observations are deleted from the analysis. |
Fields, G. S. (2003) Accounting for income inequality and its change: a new method, with application to the distribution of earnings in the United States, Research in Labor Economics, 22, p. 1–38.
Brewer, M. and L. Wren-Lewis (2016) Accounting for Changes in Income Inequality: Decomposition Analyses for the UK, 1978–2008, Oxford Bulletin of Economics and Statistics, 78 (3), p. 289–322.
# Decomposition of the income inequality into 4 variables using the Mexican Income data set
data(mex_inc_2008)
inequality_decomp <- dineq_rb(income ~ hh_structure + education + domicile_size + age_cat,
                              weights = "factor", data = mex_inc_2008)
# Selection of the output: decomposition of the inequality into the contribution of the
# different variables and residual (adds up to 100 percent)
inequality_decomp["decomposition_inequality"]
Decomposes the Gini coefficient into population subgroups. The decomposition distinguishes between-group inequality, within-group inequality and an overlap (interaction) term.
gini_decomp(x, z, weights = NULL)
x |
a numeric vector containing at least non-negative elements. |
z |
a factor containing the population subgroups. |
weights |
an optional vector of weights of x to be used in the computation of the decomposition. Should be NULL or a numeric vector. |
The Gini coefficient is decomposed into between-group and within-group inequality. In most cases the distributions of the subgroups overlap. As a consequence, between-group and within-group inequality do not add up to the total Gini coefficient; the remainder is the overlap term, also referred to as the interaction effect.
Within-group inequality is calculated from the Gini coefficient of each subgroup. Between-group inequality is calculated as the Gini coefficient of the subgroup averages.
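The between/within/overlap split can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's code: the function names `gini_md` and `gini_decomp_sketch` are hypothetical, the within-group term is assumed to weight each subgroup Gini by its population share times its income share (a common convention), and the Gini is computed from the weighted mean absolute difference, which is simple but needs O(n^2) memory.

```r
# Weighted Gini from the mean absolute difference (handles ties, O(n^2) memory):
# G = sum_i sum_j w_i w_j |x_i - x_j| / (2 * W^2 * mean)
gini_md <- function(x, w = rep(1, length(x))) {
  W <- sum(w)
  mu <- sum(w * x) / W
  sum(outer(w, w) * abs(outer(x, x, "-"))) / (2 * W^2 * mu)
}

# Hypothetical sketch: total = within + between + overlap
gini_decomp_sketch <- function(x, z, w = rep(1, length(x))) {
  z <- droplevels(as.factor(z))
  W <- sum(w)
  mu <- sum(w * x) / W
  groups <- levels(z)
  # Population share, income share and Gini coefficient of each subgroup
  p   <- sapply(groups, function(g) sum(w[z == g]) / W)
  s   <- sapply(groups, function(g) sum(w[z == g] * x[z == g]) / (W * mu))
  g_k <- sapply(groups, function(g) gini_md(x[z == g], w[z == g]))
  total   <- gini_md(x, w)
  within  <- sum(p * s * g_k)        # assumed weighting: population share x income share
  between <- gini_md(s * mu / p, p)  # Gini of the subgroup means, weighted by group size
  list(total = total, within = within, between = between,
       overlap = total - within - between)
}

# Two groups whose incomes overlap, so the overlap term is positive
gini_decomp_sketch(x = c(1, 5, 9, 3, 7, 11), z = c("a", "a", "a", "b", "b", "b"))
```

When the subgroups do not overlap (every income in one group lies below every income in the other), the overlap term in this sketch is zero up to rounding.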
a list with the results of the decomposition and the parts used for the decomposition, containing the following components:
gini_decomp |
a list containing the decomposition: gini_total (value of the gini coefficient of x), gini_within (value of within-group inequality), gini_between (value of between-group inequality) and gini_overlap (value of overlap in inequality) |
gini_group |
a list containing gini_group (the Gini coefficients of the different subgroups) and gini_group_contribution (the contribution of the subgroups to the total within-group inequality: adds up to gini_within) |
gini_decomp |
a list containing the means of x: mean_total (value of the mean of x of all subgroups combined) and mean_group (value of the mean of x of the individual subgroups) |
share_groups |
the distribution of the subgroups z |
share_income_groups |
the distribution of vector x by subgroups z |
number_cases |
a list containing the number of cases in total and by subgroup (weighted and unweighted): n_unweighted (total number of unweighted x), n_weighted (total number of weighted x), n_group_unweighted (number of unweighted x by subgroup z), n_group_weighted (number of weighted x by subgroup z) |
Mookherjee, D. and A. Shorrocks (1982) A decomposition analysis of the trend in UK income inequality, Economic Journal, 92 (368), p. 886-902.
Cowell F. (2000) Measurement of Inequality. In Atkinson A. and Bourguignon F. (eds.) Handbook of Income Distribution. Amsterdam: Elsevier, p. 87-166.
# Decomposition of the Gini coefficient by level of education using the Mexican Income data set
data(mex_inc_2008)
education_decomp <- gini_decomp(x = mex_inc_2008$income, z = mex_inc_2008$education,
                                weights = mex_inc_2008$factor)
# Complete output
education_decomp
# Selected output: decomposition into between- and within-group inequality and overlap (interaction)
education_decomp["gini_decomp"]
Returns the (optionally weighted) Gini coefficient for a vector.
gini.wtd(x, weights = NULL)
x |
a numeric vector containing at least non-negative elements. |
weights |
an optional vector of weights of x to be used in the computation of the Gini coefficient. Should be NULL or a numeric vector. |
The Gini coefficient is a measure of inequality among the values of a distribution and is the most widely used single measure of income inequality. The coefficient theoretically ranges between 0 and 1, with 1 being the highest possible inequality (for instance, one person in a society has all income and the others have none). However, coefficients that are negative or greater than 1 are possible when the distribution contains negative values. Compared to other measures of inequality, the Gini coefficient is especially sensitive to changes in the middle of the distribution.
Extension of the gini function in the reldist package to handle missing values.
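A weighted Gini coefficient with missing-value handling can be sketched from the Lorenz curve, using Gini = 1 minus twice the area under the curve (trapezoid rule). The function name `gini_wtd_sketch` is hypothetical, the sketch assumes positive weights, and it is not necessarily how gini.wtd computes the coefficient.

```r
# Hypothetical sketch of an (optionally weighted) Gini coefficient that drops missing values
gini_wtd_sketch <- function(x, weights = NULL) {
  if (is.null(weights)) weights <- rep(1, length(x))
  # Drop observations with a missing value or a missing weight
  keep <- !is.na(x) & !is.na(weights)
  x <- x[keep]
  weights <- weights[keep]
  # Sort by income and build the Lorenz curve
  o <- order(x)
  x <- x[o]
  w <- weights[o] / sum(weights)      # population shares
  L <- cumsum(w * x) / sum(w * x)     # cumulative income shares
  L_prev <- c(0, L[-length(L)])
  # Gini = 1 - 2 * area under the Lorenz curve (trapezoid rule)
  1 - sum(w * (L + L_prev))
}

gini_wtd_sketch(c(0, 0, 0, 1))          # 0.75: one of four people holds all income
gini_wtd_sketch(c(1, 2, NA), weights = c(1, 3, 1))
```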
The value of the Gini coefficient.
Handcock, M. (2016), Relative Distribution Methods. Version 1.6-6. Project home page at http://www.stat.ucla.edu/~handcock/RelDist.
Haughton, J. and S. Khandker. (2009) Handbook on poverty and inequality, Washington, DC: World Bank.
Cowell F. (2000) Measurement of Inequality. In Atkinson A. and Bourguignon F. (eds.) Handbook of Income Distribution. Amsterdam: Elsevier, p. 87-166.
# Calculate the Gini coefficient using the Mexican Income data set
data(mex_inc_2008)
# Unweighted Gini coefficient:
gini.wtd(mex_inc_2008$income)
# Weighted Gini coefficient:
gini.wtd(x = mex_inc_2008$income, weights = mex_inc_2008$factor)
Selection of Mexican income (survey) data and household characteristics for 2008, extracted from ENIGH (Household Income and Expenditure Survey).
data(mex_inc_2008)
A data frame containing 5000 observations and 8 variables (a selection from the original).
Household ID.
Population inflating weights.
Household income.
Household structure, factor with levels unipersonal, nuclear, ampliado, compuesto and coresidente.
Highest achieved education of the head of the household, factor with levels Sin instruccion, Preescolar, Primaria incompleta, Primaria completa, Secundaria incompleta, Secundaria completa, Preparatoria incompleta, Preparatoria completa, Profesional incompleta, Profesional completa, Posgrado.
Population of domicile, factor with levels <2500, 2500-15000, 15000-100000, >100000.
Age (integer) of the head of the household.
Age (categorical) of the head of the household, factor with levels <25, 25-34, 35-44, 45-54, 55-64, 65-74, >=75.
This data set is a selection from the original data set of the National Institute of Statistics and Geography in Mexico (INEGI). The original contains 29468 observations and 129 variables with information on income and household characteristics in Mexico. This selection is only meant to be used as a calculation example for the functions in this package; results do not represent the actual situation in Mexico.
The complete data set can be obtained at http://en.www.inegi.org.mx/proyectos/enchogares/regulares/enigh/nc/2008/default.html.
INEGI (2009), ENIGH 2008 Nueva construcción. Ingresos y gastos de los hogares, Aguascalientes: INEGI.
Selection of Mexican income (survey) data and household characteristics for 2016, extracted from ENIGH (Household Income and Expenditure Survey).
data(mex_inc_2016)
A data frame containing 5000 observations and 8 variables (a selection from the original).
Household ID.
Population inflating weights.
Household income.
Household structure, factor with levels unipersonal, nuclear, ampliado, compuesto and coresidente.
Highest achieved education of the head of the household, factor with levels Sin instruccion, Preescolar, Primaria incompleta, Primaria completa, Secundaria incompleta, Secundaria completa, Preparatoria incompleta, Preparatoria completa, Profesional incompleta, Profesional completa, Posgrado.
Population of domicile, factor with levels <2500, 2500-15000, 15000-100000, >100000.
Age (integer) of the head of the household.
Age (categorical) of the head of the household, factor with levels <25, 25-34, 35-44, 45-54, 55-64, 65-74, >=75.
This data set is a selection from the original data set of the National Institute of Statistics and Geography in Mexico (INEGI). The original contains 70311 observations and 127 variables with information on income and household characteristics in Mexico. This selection is only meant to be used as a calculation example for the functions in this package; results do not represent the actual situation in Mexico.
The complete data set can be obtained at http://en.www.inegi.org.mx/proyectos/enchogares/regulares/enigh/nc/2016/default.html.
INEGI (2017), Encuesta Nacional de Ingresos y Gastos de los Hogares 2016. ENIGH. Nueva serie. Temas, categorías y variables, Aguascalientes: INEGI.
Decomposes the change of the mean log deviation between two years/data sets into population subgroups.
mld_change(x1, z1, weights1 = NULL, x2, z2, weights2 = NULL)
x1 |
a numeric vector for the first year/dataset containing at least non-negative elements. |
z1 |
a factor for the first year/dataset containing the population subgroups. |
weights1 |
an optional vector of weights of x for the first year/dataset to be used in the computation of the decomposition. Should be NULL or a numeric vector. |
x2 |
a numeric vector for the second year/dataset containing at least non-negative elements. |
z2 |
a factor for the second year/dataset containing the population subgroups. |
weights2 |
an optional vector of weights of x for the second year/dataset to be used in the computation of the decomposition. Should be NULL or a numeric vector. |
The change in the mean log deviation can be decomposed into three components: changes in inequality within groups, changes in inequality between groups, and changes in the relative sizes of the groups. The change in between-group inequality is measured by the change in the relative incomes of the subgroups. The change in within-group inequality is obtained by adding up the changes in the mean log deviation within the subgroups. Changes in relative population size affect both the within-group and the between-group components; for the relative contributions, those two effects are added together.
This method was introduced by Mookherjee and Shorrocks (1982). It is an accurate approximation of the exact decomposition. It uses a logarithmic transformation of the values of the distribution, so it cannot handle negative or zero values; those are excluded from the computation in this function.
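The components described above correspond to the four terms of the Mookherjee and Shorrocks (1982) approximation. With v_k the population share, λ_k = μ_k / μ the relative mean, θ_k = v_k λ_k the income share and I_k the mean log deviation of subgroup k, and bars denoting averages over the two periods, the change is approximately Σ v̄_k ΔI_k + Σ Ī_k Δv_k + Σ (λ̄_k − log λ̄_k) Δv_k + Σ (θ̄_k − v̄_k) Δlog μ_k. The base-R sketch below computes these terms; the function names are hypothetical, it assumes both periods contain the same subgroups, and it is not the package's implementation.

```r
# Hypothetical sketch of the Mookherjee-Shorrocks approximate decomposition of the
# change in the mean log deviation (MLD) between two periods.
mld_w <- function(x, w) {
  mu <- sum(w * x) / sum(w)
  sum(w * log(mu / x)) / sum(w)
}

period_stats <- function(x, z, w, groups) {
  keep <- !is.na(x) & x > 0             # the MLD needs strictly positive values
  x <- x[keep]; z <- as.character(z[keep]); w <- w[keep]
  mu <- sum(w * x) / sum(w)
  v    <- sapply(groups, function(g) sum(w[z == g]) / sum(w))                       # population shares
  mu_k <- sapply(groups, function(g) sum(w[z == g] * x[z == g]) / sum(w[z == g]))   # group means
  I_k  <- sapply(groups, function(g) mld_w(x[z == g], w[z == g]))                   # within-group MLD
  list(total = mld_w(x, w), v = v, mu_k = mu_k, lambda = mu_k / mu, I_k = I_k)
}

mld_change_sketch <- function(x1, z1, x2, z2,
                              w1 = rep(1, length(x1)), w2 = rep(1, length(x2))) {
  # Assumes both periods contain the same subgroups
  groups <- sort(unique(as.character(z1)))
  stopifnot(setequal(groups, unique(as.character(z2))))
  p1 <- period_stats(x1, z1, w1, groups)
  p2 <- period_stats(x2, z2, w2, groups)
  # Averages over the two periods ("bar" terms) and changes in population shares
  v_bar      <- (p1$v + p2$v) / 2
  I_bar      <- (p1$I_k + p2$I_k) / 2
  lambda_bar <- (p1$lambda + p2$lambda) / 2
  theta_bar  <- (p1$v * p1$lambda + p2$v * p2$lambda) / 2   # average income shares
  dv <- p2$v - p1$v
  c(total_change        = p2$total - p1$total,
    within_change       = sum(v_bar * (p2$I_k - p1$I_k)),
    size_effect_within  = sum(I_bar * dv),
    size_effect_between = sum((lambda_bar - log(lambda_bar)) * dv),
    between_change      = sum((theta_bar - v_bar) * (log(p2$mu_k) - log(p1$mu_k))))
}

z <- rep(c("a", "b"), each = 4)
mld_change_sketch(x1 = c(1, 2, 3, 4, 5, 6, 7, 8), z1 = z,
                  x2 = c(1, 2, 3, 5, 6, 8, 9, 12), z2 = z)
```

The last four components approximately add up to total_change; the gap is the approximation error of the method.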
a list with the results of the decomposition and the parts used for the decomposition, containing the following components:
mld_data1 |
the value of the mean log deviation index of x for the first year/dataset, and the decomposition into within-group and between-group inequality |
mld_data2 |
the value of the mean log deviation index of x for the second year/dataset, and the decomposition into within-group and between-group inequality |
mld_difference |
the difference between the mean log deviation and the decomposition between the second and first year/dataset |
absolute_contributions_difference |
decomposition of the absolute change in inequality into: within group changes, group size changes (split into the effect of within and between group components) and between group changes. |
relative_contributions_difference |
decomposition of the change in inequality into relative contributions of: within-group changes, group size changes and between-group changes. Adds up to 100 percent (or -100 percent for a negative change) |
note |
number of zero or negative observations in both datasets. The mean log deviation uses a logarithmic transformation of x, so these observations are deleted from the analysis. |
Mookherjee, D. and A. Shorrocks (1982) A decomposition analysis of the trend in UK income inequality, Economic Journal, 92 (368), p. 886-902.
Brewer, M. and L. Wren-Lewis (2016) Accounting for Changes in Income Inequality: Decomposition Analyses for the UK, 1978–2008, Oxford Bulletin of Economics and Statistics, 78 (3), p. 289–322.
# Decomposition of the change in mean log deviation by level of education using the
# Mexican Income data sets
data(mex_inc_2008)
data(mex_inc_2016)
change_education <- mld_change(x1 = mex_inc_2008$income, z1 = mex_inc_2008$education,
                               weights1 = mex_inc_2008$factor,
                               x2 = mex_inc_2016$income, z2 = mex_inc_2016$education,
                               weights2 = mex_inc_2016$factor)
# Selection of the output: decomposition of the change into within- and between-group
# contributions and change in the size of groups (adds up to 100 percent)
change_education["relative_contributions_difference"]
Decomposes the mean log deviation into non-overlapping population subgroups. The decomposition distinguishes between-group inequality from within-group inequality.
mld_decomp(x, z, weights = NULL)
x |
a numeric vector containing at least non-negative elements. |
z |
a factor containing the population subgroups. |
weights |
an optional vector of weights of x to be used in the computation of the decomposition. Should be NULL or a numeric vector. |
The mean log deviation is decomposed into between-group and within-group inequality. Within-group inequality is calculated from the mean log deviation of each subgroup. Between-group inequality is calculated as the mean log deviation of the subgroup averages.
It uses a logarithmic transformation of the values of the distribution. Therefore it cannot handle negative or zero values. Those are excluded from the computation in this function.
Based on the calcGEI function in the IC2 package, extended to handle missing values.
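Because the mean log deviation is additively decomposable, within-group and between-group inequality add up exactly to the total, without an overlap term. The base-R sketch below illustrates this; the name `mld_decomp_sketch` is hypothetical, and the computation follows the textbook formula rather than the IC2 or dineq source.

```r
# Hypothetical sketch: MLD total = within + between (exact for the MLD)
mld_decomp_sketch <- function(x, z, weights = NULL) {
  if (is.null(weights)) weights <- rep(1, length(x))
  # Keep complete cases with strictly positive income (log transformation)
  keep <- !is.na(x) & !is.na(z) & !is.na(weights) & x > 0
  x <- x[keep]
  z <- droplevels(as.factor(z[keep]))
  w <- weights[keep] / sum(weights[keep])   # normalised weights
  mld <- function(x, w) {
    mu <- sum(w * x) / sum(w)
    sum(w * log(mu / x)) / sum(w)
  }
  mu <- sum(w * x)
  groups <- levels(z)
  v     <- sapply(groups, function(g) sum(w[z == g]))                      # population shares
  mu_k  <- sapply(groups, function(g) sum(w[z == g] * x[z == g]) / v[[g]]) # subgroup means
  mld_k <- sapply(groups, function(g) mld(x[z == g], w[z == g]))           # MLD per subgroup
  contribution <- v * mld_k                                                # adds up to 'within'
  list(total = mld(x, w),
       within = sum(contribution),
       between = sum(v * log(mu / mu_k)),   # MLD of the subgroup means
       group_contribution = contribution)
}

res <- mld_decomp_sketch(x = c(1, 2, 3, 10, 20, 30), z = rep(c("low", "high"), each = 3))
res$total - (res$within + res$between)   # zero up to rounding
```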
a list with the results of the decomposition and the parts used for the decomposition, containing the following components:
mld_decomp |
a list containing the decomposition: mld_total (value of the mean log deviation index of x) mld_within (value of within-group inequality) and mld_between (value of between-group inequality) |
mld_group |
a list containing mld_group (the mean log deviations of the different subgroups) and mld_group_contribution (the contribution of the subgroups to the total within-group inequality: adds up to mld_within) |
mld_decomp |
a list containing the means of x: mean_total (value of the mean of x of all subgroups combined) and mean_group (value of the mean of x of the individual subgroups) |
share_groups |
the distribution of the subgroups z |
share_income_groups |
the distribution of vector x by subgroups z |
number_cases |
a list containing the number of cases in total and by subgroup (weighted and unweighted): n_unweighted (total number of unweighted x), n_weighted (total number of weighted x), n_group_unweighted (number of unweighted x by subgroup z), n_group_weighted (number of weighted x by subgroup z) |
note |
number of zero or negative observations. The mean log deviation uses a logarithmic transformation of x, so these observations are deleted from the analysis. |
Plat, D. (2012). IC2: Inequality and Concentration Indices and Curves. R package version 1.0-1. https://CRAN.R-project.org/package=IC2
Mookherjee, D. and A. Shorrocks (1982) A decomposition analysis of the trend in UK income inequality, Economic Journal, 92 (368), p. 886-902.
Brewer, M. and L. Wren-Lewis (2016) Accounting for Changes in Income Inequality: Decomposition Analyses for the UK, 1978–2008, Oxford Bulletin of Economics and Statistics, 78 (3), p. 289–322.
Haughton, J. and S. Khandker. (2009) Handbook on poverty and inequality, Washington, DC: World Bank.
# Decomposition of the mean log deviation by level of education using the Mexican Income data set
data(mex_inc_2008)
education_decomp <- mld_decomp(x = mex_inc_2008$income, z = mex_inc_2008$education,
                               weights = mex_inc_2008$factor)
# Complete output
education_decomp
# Selected output: decomposition into between- and within-group inequality
education_decomp["mld_decomp"]
Returns the (optionally weighted) mean log deviation for a vector.
mld.wtd(x, weights = NULL)
x |
a numeric vector containing at least non-negative elements. |
weights |
an optional vector of weights of x to be used in the computation of the mean log deviation. Should be NULL or a numeric vector. |
The mean log deviation is a measure of inequality among the values of a distribution. It is a member of the class of generalized entropy measures and is also referred to as GE(0). A value of zero indicates the lowest possible inequality (all values equal); the measure has no upper bound. It uses a logarithmic transformation of the values of the distribution, so it cannot handle negative or zero values; those are excluded from the computation in this function. The mean log deviation is more sensitive to changes in the lower tail of the distribution.
Extension of the calcGEI function in the IC2 package to handle missing values.
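The definition above, GE(0) = (1/W) Σ w_i log(μ / x_i), can be sketched in a few lines of base R. The name `mld_wtd_sketch` is hypothetical; like the function described above, the sketch drops missing values and non-positive incomes before computing the index.

```r
# Hypothetical sketch of the (optionally weighted) mean log deviation, GE(0)
mld_wtd_sketch <- function(x, weights = NULL) {
  if (is.null(weights)) weights <- rep(1, length(x))
  # Drop missing values and non-positive incomes (log transformation)
  keep <- !is.na(x) & !is.na(weights) & x > 0
  x <- x[keep]
  w <- weights[keep] / sum(weights[keep])
  mu <- sum(w * x)
  # GE(0) = weighted mean of log(mu / x_i)
  sum(w * log(mu / x))
}

mld_wtd_sketch(c(1, 2, NA))                 # the NA is ignored
mld_wtd_sketch(c(1, 2), weights = c(1, 3))  # same as mld_wtd_sketch(c(1, 2, 2, 2))
```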
the value of the mean log deviation index.
Plat, D. (2012). IC2: Inequality and Concentration Indices and Curves. R package version 1.0-1. https://CRAN.R-project.org/package=IC2
Haughton, J. and S. Khandker. (2009) Handbook on poverty and inequality, Washington, DC: World Bank.
Cowell F. (2000) Measurement of Inequality. In Atkinson A. and Bourguignon F. (eds.) Handbook of Income Distribution. Amsterdam: Elsevier, p. 87-166.
# Calculate the mean log deviation using the Mexican Income data set
data(mex_inc_2008)
# Unweighted mean log deviation:
mld.wtd(mex_inc_2008$income)
# Weighted mean log deviation:
mld.wtd(x = mex_inc_2008$income, weights = mex_inc_2008$factor)
Breaks the input vector into n groups. Returns the (optionally weighted) tile of each observation in vector x.
ntiles.wtd(x, n, weights = NULL)
x |
a numeric vector for which the quantiles are computed. Missing values are left as missing. |
n |
the number of desired subgroups to break vector x into. |
weights |
an optional vector of weights of x to be used in the computation of the tiles. Should be NULL or a numeric vector. |
Breaks vector x into n subgroups. The main difference from other tile functions (for instance ntile from dplyr) is that those functions break vector x into subgroups of exactly equal size, so observations with the same value can end up in different tiles. In this function, observations with the same value always end up in the same tile; therefore subgroups may differ in size, especially when the weights argument is used. For a weighted tile function with equal group sizes, see for instance weighted_ntile from the grattan package.
When using a short vector (compared to the number of tiles) or weights with high variance, the output may differ from what is anticipated.
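One way to obtain tiles that keep tied values together is to assign each observation to tile ceiling(n × F(x)), where F(x) is the weighted share of observations with a value less than or equal to x. The sketch below uses that rule; it is an assumption for illustration (the name `ntiles_wtd_sketch` is hypothetical) and may not reproduce ntiles.wtd exactly.

```r
# Hypothetical sketch: tile = ceiling(n * F(x)), with F the weighted empirical CDF.
# Tied values share the same F(x) and therefore always land in the same tile.
ntiles_wtd_sketch <- function(x, n, weights = NULL) {
  if (is.null(weights)) weights <- rep(1, length(x))
  out <- rep(NA_integer_, length(x))      # missing values stay missing
  ok <- !is.na(x) & !is.na(weights)
  xo <- x[ok]
  wo <- weights[ok]
  # Cumulative weight of all observations with a value <= each x (O(n^2), for clarity)
  cum_w <- vapply(xo, function(v) sum(wo[xo <= v]), numeric(1))
  out[ok] <- pmax(1L, as.integer(ceiling(n * cum_w / sum(wo))))
  out
}

ntiles_wtd_sketch(c(3, 1, 2, 2, 5, NA), n = 2)
ntiles_wtd_sketch(c(3, 1, 2, 2, 5), n = 2, weights = c(1, 4, 1, 1, 1))
```

The second call shows how a heavily weighted observation can pull the tile boundaries, which is why group sizes may differ from n equal parts.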
A vector of integers corresponding to the quantiles of vector x.
# Break up the income variable in the Mexican Income data set into 10 groups (tiles)
data(mex_inc_2008)
# Unweighted tiles:
q <- ntiles.wtd(x = mex_inc_2008$income, n = 10)
# Weighted tiles:
qw <- ntiles.wtd(x = mex_inc_2008$income, n = 10, weights = mex_inc_2008$factor)
Returns the (possibly weighted) polarization index for a vector. The Wolfson index of bipolarization is used.
A bipolarized (income) distribution has fewer observations in the middle and more in the lower and/or upper part of the distribution. Regular measures of inequality (like the Gini coefficient) do not give information about the polarization of the distribution. This polarization index measures the level of bipolarization of the distribution. The concept is closely related to the Lorenz curve, and the scalar measure is therefore also related to the Gini coefficient. A lower value means a lower level of polarization.
Extension of the polar.aff function in the affluenceIndex package, with an added option to weight the index.
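Wolfson's (1994) index can be written as P = 2 (1 − 2 L(0.5) − G) μ / m, where L(0.5) is the income share of the poorer half of the population, G the Gini coefficient, μ the mean and m the median. The sketch below implements that formula in base R; the function name is hypothetical, and the choices of a lower weighted median and linear interpolation of the Lorenz curve at 0.5 are assumptions that may differ from polar.wtd.

```r
# Hypothetical sketch of the (optionally weighted) Wolfson bipolarization index:
# P = 2 * (1 - 2 * L(0.5) - G) * mean / median
wolfson_sketch <- function(x, weights = NULL) {
  if (is.null(weights)) weights <- rep(1, length(x))
  keep <- !is.na(x) & !is.na(weights)
  x <- x[keep]
  weights <- weights[keep]
  o <- order(x)
  x <- x[o]
  w <- weights[o] / sum(weights)          # normalised, assumed strictly positive
  mu <- sum(w * x)
  # Lorenz curve: cumulative population share p and cumulative income share L
  p <- c(0, cumsum(w))
  L <- c(0, cumsum(w * x)) / mu
  gini <- 1 - sum(diff(p) * (L[-1] + L[-length(L)]))
  # Income share of the poorer half, by linear interpolation on the Lorenz curve
  L_half <- stats::approx(p, L, xout = 0.5)$y
  # Lower weighted median: first x whose cumulative weight reaches 0.5
  med <- x[which(cumsum(w) >= 0.5)[1]]
  2 * (1 - 2 * L_half - gini) * mu / med
}

wolfson_sketch(c(1, 2, 3, 4))   # 0.375 under the conventions above
wolfson_sketch(rep(5, 4))       # equal incomes: no polarization
```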
polar.wtd(x, weights = NULL)
x |
a numeric vector. |
weights |
an optional vector of weights of x to be used in the computation of the Polarization index. Should be NULL or a numeric vector. |
The value of the Wolfson polarization index.
Wolny-Dominiak, A. and A. Saczewska-Piotrowska (2017). affluenceIndex: Affluence Indices. R package version 1.0. https://CRAN.R-project.org/package=affluenceIndex
Wolfson M. (1994) When inequalities diverge, The American Economic Review, 84, p. 353-358.
Schmidt, A. (2002) Statistical Measurement of Income Polarization. A Cross-National, Berlin 10th International conference on panel data.
# Calculate the polarization index using the Mexican Income data set
data(mex_inc_2008)
# Unweighted polarization index:
polar.wtd(mex_inc_2008$income)
# Weighted polarization index:
polar.wtd(x = mex_inc_2008$income, weights = mex_inc_2008$factor)
Returns the (optionally weighted) recentered influence function of a distributional statistic.
rif(x, weights = NULL, method = "quantile", quantile = 0.5, kernel = "gaussian")
x |
a numeric vector for which the recentered influence function is computed. |
weights |
an optional vector of weights of x to be used in the computation of the recentered influence function. Should be NULL or a numeric vector. |
method |
the distribution statistic for which the recentered influence function is estimated. Options are "quantile", "gini" and "variance". Default is "quantile". |
quantile |
quantile to be used when method "quantile" is selected. Must be a numeric between 0 and 1. Default is 0.5 (median). Only a single quantile can be selected. |
kernel |
a character giving the smoothing kernel to be used in method "quantile". Options are "gaussian", "rectangular", "triangular", "epanechnikov", "biweight", "cosine" or "optcosine". Default is "gaussian". |
The RIF can be used as input for a RIF regression approach. RIF regressions are mostly used to estimate the marginal effect of covariates on distributional statistics of income or wealth.
The RIF is obtained by adding the distributional statistic (quantile, Gini coefficient or variance) to its influence function. The result is a numeric vector in which each element measures the corresponding individual's influence on the distributional statistic.
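For the quantile and the variance the RIF has a simple closed form (Firpo et al., 2009): RIF(y; q_tau) = q_tau + (tau - 1{y <= q_tau}) / f_Y(q_tau), where f_Y is the density of y, and RIF(y; sigma^2) = (y - mu)^2. The unweighted base-R sketch below implements both. The helper names are hypothetical, and estimating f_Y(q_tau) with `density()` plus linear interpolation is an assumption about how the kernel option is used, not necessarily what `rif()` does internally.

```r
# Hypothetical, unweighted sketches of two recentered influence functions.

# RIF of the tau-th quantile: q + (tau - 1{y <= q}) / f(q)
rif_quantile_sketch <- function(y, tau = 0.5, kernel = "gaussian") {
  q <- unname(quantile(y, probs = tau))
  dens <- density(y, kernel = kernel)        # kernel density estimate of y
  f_q <- approx(dens$x, dens$y, xout = q)$y  # density interpolated at q
  q + (tau - as.numeric(y <= q)) / f_q
}

# RIF of the (population) variance: (y - mu)^2
rif_variance_sketch <- function(y) {
  (y - mean(y))^2
}

y <- c(2, 4, 4, 4, 5, 5, 7, 9)
mean(rif_variance_sketch(y))                 # 4, the population variance of y
rif_q50 <- rif_quantile_sketch(1:100, tau = 0.5)
mean(rif_q50)                                # averages back to the median, 50.5
```

The mean of each RIF equals the statistic itself, which is what makes the RIF usable as the dependent variable of an OLS regression.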
A numeric vector of the recentered influence function of the selected distributional statistic.
Firpo, S., N. Fortin and T. Lemieux (2009) Unconditional quantile regressions. Econometrica, 77(3), p. 953-973.
Heckley, G., U.-G. Gerdtham and G. Kjellsson (2016) A general method for decomposing the causes of socioeconomic inequality in health. Journal of Health Economics, 48, p. 89–106.
Pereira, J. and A. Galego (2016) The drivers of wage inequality across Europe, a recentered influence function regression approach, 10th Annual Meeting of the Portuguese Economic Journal, University of Evora.
library(dineq)
data(mex_inc_2008)
#Recentered influence function of the 20th percentile
rif_q20 <- rif(x=mex_inc_2008$income, weights=mex_inc_2008$factor, method="quantile", quantile=0.2)
#Recentered influence function of the Gini coefficient
rif_gini <- rif(x=mex_inc_2008$income, weights=mex_inc_2008$factor, method="gini")
Recentered influence function regression of a distributional statistic.
rifr(formula, data, weights = NULL, method = "quantile", quantile = 0.5, kernel = "gaussian")
formula |
an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted in the RIF regression. |
data |
a data frame containing the variables and weights of the model. |
weights |
an optional weights variable to be used in the computation of the recentered influence function. Should be NULL or the name of a numeric column in data, given between quotation marks (e.g. weights="factor"). |
method |
the distribution statistic for which the recentered influence function is estimated. Options are "quantile", "gini" and "variance". Default is "quantile". |
quantile |
quantile to be used when method "quantile" is selected. Must be a numeric between 0 and 1. Default is 0.5 (median). Multiple quantiles can be used. |
kernel |
a character giving the smoothing kernel to be used in method "quantile". Options are "gaussian", "rectangular", "triangular", "epanechnikov", "biweight", "cosine" or "optcosine". Default is "gaussian". |
RIF regressions can be used to estimate the marginal effects of covariates on distributional statistics (such as quantiles, the Gini coefficient and the variance), most often of income or wealth. The method is based on the recentered influence function of a statistic: the RIF is computed for every observation and then used as the dependent variable in an ordinary least squares regression.
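A RIF regression therefore amounts to two steps: compute the RIF of the chosen statistic for every observation, then regress it on the covariates by OLS. The base-R sketch below walks through these steps for the 20th percentile on simulated data. The data-generating process and the variable names (`educ`, `income`) are invented for illustration, the RIF follows the closed form of Firpo et al. (2009), and this is not the `rifr()` source code.

```r
set.seed(1)
n <- 500
educ <- sample(0:3, n, replace = TRUE)              # hypothetical covariate
income <- exp(8 + 0.3 * educ + rnorm(n, sd = 0.5))  # simulated incomes

# Step 1: RIF of the 20th percentile (unweighted, gaussian kernel)
tau <- 0.2
q <- unname(quantile(income, probs = tau))
dens <- density(income, kernel = "gaussian")
f_q <- approx(dens$x, dens$y, xout = q)$y
rif_q20 <- q + (tau - as.numeric(income <= q)) / f_q

# Step 2: OLS of the RIF on the covariates; the slope estimates the
# marginal effect of educ on the unconditional 20th percentile
fit <- lm(rif_q20 ~ educ)
coef(fit)
```

With survey weights the same two steps would use a weighted quantile and density and pass the weights to `lm()` through its `weights` argument.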
A list containing the results of the RIF regression.
coefficients |
the coefficient estimates. |
SE |
the coefficient standard error. |
t |
the coefficient t-value. |
p |
the coefficient p-value. |
adjusted_r2 |
the adjusted R-squared. |
Firpo, S., N. Fortin and T. Lemieux (2009) Unconditional quantile regressions. Econometrica, 77(3), p. 953-973.
Heckley, G., U.-G. Gerdtham and G. Kjellsson (2016) A general method for decomposing the causes of socioeconomic inequality in health. Journal of Health Economics, 48, p. 89–106.
Pereira, J. and A. Galego (2016) The drivers of wage inequality across Europe, a recentered influence function regression approach, 10th Annual Meeting of the Portuguese Economic Journal, University of Evora.
library(dineq)
data(mex_inc_2008)
#RIF regression for each decile
rifr_q <- rifr(income~hh_structure+education, data=mex_inc_2008, weights="factor", method="quantile", quantile=seq(0.1,0.9,0.1), kernel="gaussian")
#RIF regression for the Gini coefficient
rifr_gini <- rifr(income~hh_structure+education, data=mex_inc_2008, weights="factor", method="gini")
Bootstrap inference for a RIF regression.
rifrSE(formula, data, weights = NULL, method = "quantile", quantile = 0.5, kernel = "gaussian", Nboot = 100, confidence = 0.95)
formula |
an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted in the RIF regression. |
data |
a data frame containing the variables and weights of the model. |
weights |
an optional weights variable to be used in the computation of the recentered influence function. Should be NULL or the name of a numeric column in data, given between quotation marks (e.g. weights="factor"). |
method |
the distribution statistic for which the recentered influence function is estimated. Options are "quantile", "gini" and "variance". Default is "quantile". |
quantile |
quantile to be used when method "quantile" is selected. Must be a numeric between 0 and 1. Default is 0.5 (median). Only a single quantile can be used. |
kernel |
a character giving the smoothing kernel to be used in method "quantile". Options are "gaussian", "rectangular", "triangular", "epanechnikov", "biweight", "cosine" or "optcosine". Default is "gaussian". |
Nboot |
the number of bootstrap replicates. Default is 100. |
confidence |
confidence level of the bootstrap confidence intervals of the estimated coefficients. Default is 0.95. |
RIF regressions can be used to estimate the marginal effects of covariates on distributional statistics (such as quantiles, the Gini coefficient and the variance), most often of income or wealth. The method is based on the recentered influence function of a statistic: the RIF is computed for every observation and then used as the dependent variable in an ordinary least squares regression.
The standard errors, confidence intervals, and Z and P values are computed with a standard bootstrap (from the boot package).
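The bootstrap logic can be sketched in base R: resample observations with replacement, recompute the RIF and the OLS fit in each replicate, and summarize the replicate coefficients. The sketch below uses simulated data with hypothetical variable names. Taking the standard deviation of the replicates as the standard error, a normal approximation for the Z and P values, and percentile bounds for the interval are assumptions; `rifrSE()` itself relies on the boot package, whose interval type may differ.

```r
set.seed(2)
n <- 400
educ <- sample(0:3, n, replace = TRUE)              # hypothetical covariate
income <- exp(8 + 0.3 * educ + rnorm(n, sd = 0.5))  # simulated incomes
dat <- data.frame(income, educ)

# Coefficients of an unweighted RIF regression for the tau-th quantile
rif_coef <- function(d, tau = 0.2) {
  q <- unname(quantile(d$income, probs = tau))
  dens <- density(d$income, kernel = "gaussian")
  f_q <- approx(dens$x, dens$y, xout = q)$y
  d$rif <- q + (tau - as.numeric(d$income <= q)) / f_q
  coef(lm(rif ~ educ, data = d))
}

Nboot <- 100
confidence <- 0.95
est <- rif_coef(dat)  # original (non-bootstrapped) fit
# Each replicate refits on a resampled data set; rows = replicates
boot_coefs <- t(replicate(Nboot, rif_coef(dat[sample(n, replace = TRUE), ])))

alpha <- 1 - confidence
se <- apply(boot_coefs, 2, sd)
z <- est / se
res <- data.frame(
  Coef  = est,
  lower = apply(boot_coefs, 2, quantile, probs = alpha / 2),
  upper = apply(boot_coefs, 2, quantile, probs = 1 - alpha / 2),
  SE    = se,
  Z     = z,
  P     = 2 * pnorm(-abs(z))
)
res
```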
A data frame containing the results of the RIF regression.
Coef |
estimated coefficients of the original (non-bootstrapped) RIF regression |
lower |
lower bound of confidence interval of estimated coefficient |
upper |
upper bound of confidence interval of estimated coefficient |
SE |
standard error |
Z Value |
Z value |
P Value |
P value |
Signif |
Significance codes of P: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 |
Firpo, S., N. Fortin and T. Lemieux (2009) Unconditional quantile regressions. Econometrica, 77(3), p. 953-973.
Heckley, G., U.-G. Gerdtham and G. Kjellsson (2016) A general method for decomposing the causes of socioeconomic inequality in health. Journal of Health Economics, 48, p. 89–106.
Pereira, J. and A. Galego (2016) The drivers of wage inequality across Europe, a recentered influence function regression approach, 10th Annual Meeting of the Portuguese Economic Journal, University of Evora.
library(dineq)
data(mex_inc_2008)
#Bootstrap inference for a RIF regression of the 20th percentile
rifr_q <- rifrSE(income~hh_structure+education, data=mex_inc_2008, weights="factor", method="quantile", quantile=0.2, kernel="gaussian", Nboot=100, confidence=0.95)
#Bootstrap inference for a RIF regression of the Gini coefficient
rifr_gini <- rifrSE(income~hh_structure+education, data=mex_inc_2008, weights="factor", method="gini", Nboot=100, confidence=0.95)
Returns the (optionally weighted) Theil index for a vector.
theil.wtd(x, weights = NULL)
x |
a numeric vector. Zero and negative values are excluded from the computation (see Details). |
weights |
an optional vector of weights of x to be used in the computation of the Theil index. Should be NULL or a numeric vector. |
The Theil index is a measure of inequality among the values of a distribution. It is a member of the family of generalized entropy measures, where it corresponds to GE(1). The index ranges from 0 to ln N (the natural logarithm of the number of values), with 0 indicating the lowest possible inequality. Because it applies a logarithmic transformation to the values of the distribution, it cannot handle negative or zero values; this function excludes them from the computation. The Theil index is more sensitive to changes in the upper tail of the distribution.
Extension of the calcGEI function in the IC2 package that can handle missing values.
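In the weighted case the Theil index is T = sum(w_i (x_i / mu) ln(x_i / mu)) / sum(w_i), with mu the weighted mean. The base-R sketch below implements that formula with the handling described above (missing values dropped, zero and negative values excluded). The function name `theil_sketch` is hypothetical, and this is not the package's source code.

```r
# Hypothetical sketch of a (weighted) Theil index, GE(1):
#   T = sum(w * (x / mu) * log(x / mu)) / sum(w), mu = weighted mean
theil_sketch <- function(x, weights = NULL) {
  if (is.null(weights)) weights <- rep(1, length(x))
  # drop missings and non-positive values (log is undefined for x <= 0)
  keep <- !is.na(x) & !is.na(weights) & x > 0
  x <- x[keep]
  w <- weights[keep]
  mu <- sum(w * x) / sum(w)
  s <- x / mu
  sum(w * s * log(s)) / sum(w)
}

theil_sketch(c(3, 3, 3))                  # returns 0: perfect equality
theil_sketch(c(1, 2), weights = c(2, 1))  # same as theil_sketch(c(1, 1, 2))
theil_sketch(c(1, 2, NA, 0))              # NA and 0 are dropped
```

Integer weights behave like repeated observations, which is a convenient check that the weighting is applied consistently.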
The value of the Theil index.
Plat, D. (2012). IC2: Inequality and Concentration Indices and Curves. R package version 1.0-1. https://CRAN.R-project.org/package=IC2
Haughton, J. and S. Khandker. (2009) Handbook on poverty and inequality, Washington, DC: World Bank.
Cowell F. (2000) Measurement of Inequality. In Atkinson A. and Bourguignon F. (eds.) Handbook of Income Distribution. Amsterdam: Elsevier, p. 87-166.
library(dineq)
#calculate Theil Index using Mexican Income data set
data(mex_inc_2008)
#unweighted Theil Index:
theil.wtd(mex_inc_2008$income)
#weighted Theil Index:
theil.wtd(x=mex_inc_2008$income, weights=mex_inc_2008$factor)