User manual

1. [SAS output, method: raking ratio. First summary table of the algorithm: value of the stopping criterion and number of negative weights after each iteration. Second summary table of the algorithm: coefficients of the vector lambda of Lagrange multipliers after each iteration, columns LAMBDA1 to LAMBDA5, with rows indexed by VARIABLE and MODALITE (CS90, SEUL90, STRATE90, AGE, SEXE). Cell values are not legible.]
2. correction and generalized calibration at the same time.

The results in section 3.3 can be used to calculate the precision of the estimators calibrated by this method. The approximate variance $AV(\hat{Y}_w)$ uses the residuals of the instrumental variable regression in the population, $E_k = y_k - x_k' B$. The variance estimator $\hat{V}(\hat{Y}_w)$ uses the residuals of the instrumental variable regression in the respondent sample weighted by the $d_k / p_k = d_k H_k(\beta_0)$: $e_k = y_k - x_k' \hat{B}$, where

$$\hat{B} = \Big( \sum_{k \in r} d_k H_k(\beta_0)\, z_k x_k' \Big)^{-1} \sum_{k \in r} d_k H_k(\beta_0)\, z_k y_k$$

is the estimator of $B$ which would be calculated if the response probabilities $p_k = 1 / H_k(\beta_0)$ were known. These probabilities are unknown because of $\beta_0$; they are estimated by replacing $\beta_0$ with $\hat{\beta}$. The residuals become $e_k = y_k - x_k' \hat{B}_w$, where

$$\hat{B}_w = \Big( \sum_{k \in r} w_k\, z_k x_k' \Big)^{-1} \sum_{k \in r} w_k\, z_k y_k ,$$

which is an instrumental variable regression in sample $r$ weighted by the calibration weights $w_k = d_k H_k(\hat{\beta})$.

Note: The estimated variance $\hat{V}(\hat{Y}_w)$ is written in the form $Q_1(e) + Q_2(e)$, where the quadratic form $Q_1(e)$ denotes the phase 1 (selection of sample $s$) variance estimate and $Q_2(e)$ denotes the phase 2 (selection of sample $r$) variance estimate.

Case of a generalized linear model

In practice, the functions $H_k(\beta)$ are of the form $H(z_k'\beta)$, where $z_k$ is a vector of n
3. [SAS output, second summary table of the algorithm (continued): coefficients of the vector lambda of Lagrange multipliers, columns LAMBDA5 to LAMBDA7. Cell values are not legible.]

Method: raking ratio
Comparison between the final margins in the sample (with the final weights) and the margins in the population (calibration margins)

VARIABLE   CATEGORY   SAMPLE MARGIN   POPULATION MARGIN   SAMPLE %   POPULATION %
CS90       2                    457                 457       8.95           8.95
CS90       3                    470                 470       9.21           9.21
CS90       4                    537                 537      10.52          10.52
CS90       5                    435                 435       8.52           8.52
CS90       6                   1254                1254      24.56          24.56
CS90       7                   1952                1952      38.24          38.24
SEUL90     0                   3933                3933      77.04          77.04
SEUL90     1                   1172                1172      22.96          22.96
STRATE90   0                   1314                1314      25.74          25.74
STRATE90   1                    833                 833      16.32          16.32
STRATE90   2                    704                 704      13.79          13.79
STRATE90   3                   1477                1477      28.93          28.93
STRATE90   4                    777                 777      15.22          15.22
AGE        00-14 an            2514                2514      19.51          19.51
AGE        15-24 an            1799                1799      13.96          13.96
AGE        25-59 an            5984                5984      46.45          46.45
AGE        60 an               2586                2586      20.07          20.07
SEXE       1                   6255                6255      48.55          48.55
SEXE       2                   6628                6628      51.45          51.45

Method: raking ratio
Statistics on the weight ratios (= final weights /
4. initial weights) and on the final weights

The UNIVARIATE Procedure
Variable: _F_ (RAPPORT DE POIDS)

Basic Statistical Measures
Location                 Variability
Mean     1.000153        Std Deviation          0.63381
Median   0.841701        Variance               0.40172
Mode     0.718970        Range                  4.84255
                         Interquartile Range    0.64433

Quantiles (Definition 5)
Quantile      Estimate
100% Max      4.987209
99%           3.442740
95%           2.055925
90%           1.680763
75% Q3        1.292517
50% Median    0.841701
25% Q1        0.648192
10%           0.427769
5%            0.373577
1%            0.176809
0% Min        0.144655

Extreme Observations
------------ Lowest ------------     ------------ Highest -----------
Value       IDENT        Obs         Value      IDENT        Obs
0.144655    9363006020   413         3.44274    2163019030    85
0.160953    7269012040   311         4.49041    2369009020   118
0.160953    5369013020   280         4.49041    8269014980   366
0.160953    2169020050   101         4.53997    5363003760   254
0.176809    9363022260   420         4.98721    9363033000   425

Method: raking ratio
Statistics on the weight ratios (= final weights / initial weights) and on the final weights

The UNIVARIATE Procedure
Variable: _F_ (RAPPORT DE POIDS)

[Histogram and boxplot of _F_ (weight ratio), vertical axis from 0.1 to 4.9.]
5. 2.1 The problem

In some surveys, data are collected at different levels:

• INSEE's continuing survey of household living conditions includes questions about the household (type of dwelling, number of persons, occupation of the head of the household, etc.), each member of the household (sex, age, occupation, etc.), and usually a specific set of questions for an individual selected at random from the eligible members of the household (often those aged 15 and over), referred to as the "Kish individual";
• the French industry ministry's annual business survey contains questions on each firm's overall activities and a section on each of its establishments.

When the survey data are adjusted, either independent calibrations can be performed for the various levels, or simultaneous (combined) calibrations can be carried out. Simultaneous calibration produces the same weights for all members of a household (provided they were all surveyed) and ensures consistency in the statistics obtained from the various data files. For example, when independent calibrations are performed on the sample of households and on the sample of household members, the number of one-person households estimated from the former sample cannot be expected to match the number of persons belonging to one-person households estimated from the latter sample.

2.2 The method

More generally, the situations described above relate to surveys that involve cluster samp
6. [Screen capture residue: navigation prompts of the last interactive screen.]

This program has produced the following output:

MACRO PARAMETERS

INPUT TABLE(S)
  Level 1 data table                        DATAMEN   = BASE.ECHANT_MEN2
  Level 1 identifier                        IDENT     = IDENT
  Level 2 data table                        DATAIND   = BASE.ECHANT_INDIV2
  Level 2 identifier                        IDENT2    = ID
  Table of Kish individuals                 DATAKISH  =
  Initial weight                            POIDS     = POIDS1
  Scale factor                              ECHELLE   = 1
  Weight QK                                 PONDQK    = UN
  Kish weight                               POIDKISH  =
MARGIN TABLE(S)
  Level 1                                   MARMEN    = BASE MARGE GEN MEN
  Level 2                                   MARIND    = GEN IND
  Kish level                                MARKISH   =
  Margins in percentages                    PCT       = NON
  Population size, level 1 elements         POPMEN    =
  Population size, level 2 elements         POPIND    =
  Population size, Kish elements            POPKISH   =
NON-RESPONSE ADJUSTMENT REQUESTED           NONREP    = OUI
METHOD USED                                 M         = 2
  Lower bound                               LO        =
  Upper bound                               UP        =
  Coefficient of the hyperbolic sine        ALPHA     =
  Stopping threshold                        SEUIL     = 0.0001
  Maximum number of iterations              MAXITER   = 15
  Treatment of collinearities               COLIN     = NON
TABLE(S) CONTAINING THE FINAL WEIGHT
  Level 1                                   DATAPOI   = POIDSGEN
  Level 2                                   DATAPOI2  = POIDSGEN_INDIV
  Kish level                                DATAPOI3  =
  Update of table(s) DATAPOI(2)(3)          MISAJOUR  =
7. $\hat{B}_s = \big( \sum_{k \in s} d_k x_k x_k' \big)^{-1} \sum_{k \in s} d_k x_k y_k$;
• the exponential method: where all the calibration variables are qualitative, this is the raking ratio method (Deming and Stephan, 1940);
• the logit method: this method provides lower limits L and upper limits U on the weight ratios $w_k / d_k$;
• the truncated linear method: very similar to the logit method.

The last two methods are used to control the range of the distribution of weight ratios. The logit method is used more often because it avoids excessively large weights, which can compromise the robustness of the estimates, and excessively small or even negative weights, which can be produced by the linear method.

(Footnote 2) Quantitative variables, or indicators associated with the response categories of qualitative variables.

Precision

All of the calibrated estimators $\hat{Y}_w$ have the same precision (asymptotically), regardless of the method used: the approximate variance of $\hat{Y}_w$ is therefore equal to that of the regression estimator $\hat{Y}_{reg}$:

$$AV(\hat{Y}_w) = \sum_{k \in U} \sum_{l \in U} \Delta_{kl}\, (d_k E_k)(d_l E_l)$$

where $\Delta_{kl} = \pi_{kl} - \pi_k \pi_l$, $E_k = y_k - x_k' B$ with $B = \big( \sum_{k \in U} x_k x_k' \big)^{-1} \sum_{k \in U} x_k y_k$, and $E_k$ is the residual of the regression of Y on the $X_j$ in the population U. This variance is especially small if the variable of interest Y and the calibration variables $X_1, \dots, X_j, \dots, X_J$ are strongly correlated.

A variance estimator is given by

$$\hat{V}(\hat{Y}_w) = \sum_{k \in s} \sum_{l \in s} \frac{\Delta_{kl}}{\pi_{kl}}\, (d_k e_k)(d_l e_l)$$

where $e_k = y_k - x_k' \hat{B}_s$ with $\hat{B}_s = \big( \sum_{k \in s} d_k x_k x_k' \big)^{-1} \sum_{k \in s} d_k x_k y_k$, and $e_k$ is the residual of the
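The four calibration functions F named above can be sketched as follows. This is a minimal illustration using the closed forms commonly given in the calibration literature (Deville and Särndal, 1992), not code taken from Calmar; the function names and the bounds `low` and `up` (standing for L and U) are assumptions made for the example.

```python
import math


def f_linear(u):
    """Linear method: F(u) = 1 + u. The weight ratio can become negative."""
    return 1.0 + u


def f_raking(u):
    """Exponential (raking ratio) method: F(u) = exp(u), always positive."""
    return math.exp(u)


def f_logit(u, low, up):
    """Logit method: the weight ratio stays strictly between low and up.

    Assumes low < 1 < up, so that F(0) = 1.
    """
    a = (up - low) / ((1.0 - low) * (up - 1.0))
    e = math.exp(a * u)
    return (low * (up - 1.0) + up * (1.0 - low) * e) / ((up - 1.0) + (1.0 - low) * e)


def f_truncated_linear(u, low, up):
    """Truncated linear method: 1 + u, clipped to the interval [low, up]."""
    return min(max(1.0 + u, low), up)
```

All four functions return 1 at u = 0, so a zero vector of Lagrange multipliers leaves the initial weights unchanged. Only the last two keep the ratio $w_k / d_k$ inside chosen bounds.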
8. OUI
  Final weight of level 1 and 2 units       POIDSFIN     = WGEN
  Label of the final weight                 LABELPOI     = POIDS CALAGE GENERALISE
  Final weight of Kish units                POIDSKISHFIN =
  Label of the Kish weight                  LABELPOIKISH =
  Contents of table(s) DATAPOI(2)(3)        CONTPOI      = OUI
PRINTING OF RESULTS                         EDITION      = 3
  Printing of weights                       EDITPOI      = NON
  Statistics on weights                     STAT         = OUI
  Checks                                    CONT         = OUI
  Table containing eliminated obs.          OBSELI       = NON
  SAS notes                                 NOTES        = NON

Comparison between the margins drawn from the sample (with the initial weights) and the margins in the population (calibration margins)

VARIABLE   CATEGORY   SAMPLE MARGIN   POPULATION MARGIN   SAMPLE %   POPULATION %
CS90       2                 474.87                 457       9.30           8.95
CS90       3                 548.05                 470      10.73           9.21
CS90       4                 452.79                 537       8.87          10.52
CS90       5                 408.56                 435       8.00           8.52
CS90       6                1335.78                1254      26.16          24.56
CS90       7                1886.68                1952      36.94          38.24
SEUL90     0                3994.25                3933      78.22          77.04
SEUL90     1                1112.48                1172      21.78          22.96
STRATE90   0                1336.76                1314      26.18          25.74
STRATE90   1                 850.08                 833      16.65          16.32
STRATE90   2                 668.15                 704      13.08          13.79
STRATE90   3                1452.54                1477      28.44          28.93
STRATE90   4                 799.20                 777      15.65          15.22
AGE        00-14 an         2714.64                2514      20.44          19.51
AGE        15-24 an         1859.50                1799      14.00          13.96
AGE        25-59 an         5853.04                5984      44.06          46.45
AGE        60 an            2856.77                2586      21.51          20.07
SEXE       1                6610.65                6255      49.76          48.55
SEXE       2                6673.30                6628      50.24          51.45

[Start of the next output table: VARIABLE column listing CS90 (6 rows), SEUL90 (2 rows), STRATE90 (5 rows).]
9. value of the variables

VARIABLE   CATEGORY    NUMBER OF LEVEL 2 OBSERVATIONS   WEIGHT RATIO
AGE        00-14 an                               230        0.91079
AGE        15-24 an                               155        0.96502
AGE        25-59 an                               506        1.02352
AGE        60 an                                  239        0.91740
SEXE       1                                      561        0.94519
SEXE       2                                      569        0.99468
ENSEMBLE (all)                                   1130        0.97011

Method: raking ratio
Contents of the table POIDSGEN containing the new weight WGEN

The CONTENTS Procedure
Data Set Name: WORK.POIDSGEN               Observations:          439
Member Type:   DATA                        Variables:             2
Engine:        V8                          Indexes:               0
Created:       18:10 Thursday, August 25, 2005     Observation Length:    24
Last Modified: 18:10 Thursday, August 25, 2005     Deleted Observations:
Protection:                                Compressed:            NO
Data Set Type:                             Sorted:                NO
Label:

#   Variable   Type   Len   Pos   Label
1   IDENT      Char    10     8
2   WGEN       Num      8     2   POIDS CALAGE GENERALISE

Method: raking ratio
Contents of the table POIDSGEN_INDIV containing the new weight WGEN

The CONTENTS Procedure
Data Set Name: WORK.POIDSGEN_INDIV         Observations:          1130
Member Type:   DATA                        Variables:             3
Engine:        V8                          Indexes:               0
Created:       18:10 Thursday, August 25, 2005     Observation Length:    32
Last Modified: 18:10 Thursday, August 25, 2005     Deleted Observations:
Protection:                                Compressed:            NO
Data Set Type:                             Sorted:                NO
Label:

#   Variable   Type   Len   Pos   Label
1   IDENT      Char    10     8
3   WGEN       Num      8     0   POIDS CALAGE GENERALI
10. a non-response correction using a homogeneous response group model and then a post-stratification, where the groups and the post-strata are identical. This is equivalent to performing a direct ("formal") post-stratification on the respondent sample.

The advantage of direct calibration is that it does not require explicit modelling of the response mechanism. Lundström and Särndal (1999) also studied the properties of direct calibration, and in particular they proposed variance estimators that take sampling variance and non-response variance into account.

4.4 Direct generalized calibration

Let's start with a system of calibration equations on the respondent sample of the form

$$\sum_{k \in r} d_k H_k(\beta)\, x_k = X .$$

These equations can be interpreted as indicated below. Let there be a response model of the form $p_k = 1 / H_k(\beta_0)$, where $\beta_0$ denotes the actual value of the model's parameter. The calibration equations can be rewritten as follows, where $\hat{\beta}$ denotes the solution to the system:

$$\sum_{k \in r} \frac{d_k}{p_k}\, F_k(\lambda)\, x_k = X, \quad \text{with } \hat{\beta} = \beta_0 + \lambda \text{ and } F_k(\lambda) = \frac{H_k(\beta_0 + \lambda)}{H_k(\beta_0)} .$$

Hence these equations take the form of generalized calibration equations, where the initial weights are the $d_k / p_k$, i.e., the sampling weights corrected for non-response, and the functions $F_k$, which verify $F_k(0) = 1$, are the calibration functions. The instruments are $z_k = \operatorname{grad} F_k(0) = \operatorname{grad} H_k(\beta_0) / H_k(\beta_0)$. Solving this system is equivalent to performing
11. from a sample of a contingency table of two or more qualitative variables to the known population margins. However, the program is more general than mere calibration on margins, since it also calibrates on the totals of quantitative variables.

Calmar was developed in 1990 at France's Institut National de la Statistique et des Etudes Economiques (INSEE), where it is used regularly to adjust survey data. It is also used by many other statistics agencies in France and other countries. The new version, Calmar 2, developed in 2003, offers the user new resources for performing calibrations and implements the generalized calibration method of handling non-response proposed by Deville (1998). Calmar can be downloaded from INSEE's Web site (www.insee.fr), and Calmar 2 will also be available on the site sometime in 2006.

1.2 Calmar's calibration methods

It is worth restating the principle underlying the calibration methods implemented by Calmar (see also Deville et al., 1993). Consider a population U of individuals from which a probabilistic sample s has been selected. Let Y be a variable of interest for which we want to estimate the total in the population, $Y = \sum_{k \in U} y_k$.

(Author footnote) Olivier Sautory, Cepe, Insee, 3 avenue Pierre Larousse, 92245 Malakoff Cedex, France, sautory@ensae.fr. This text was first published in the Statistics Canada International Symposium Series Proceedings, 2003.

The usual estimator of Y is the Horvitz-Thompso
12. [Interactive screens (continued). Each screen ends with the same prompts: press ENTER to continue; type V to validate the screen without re-entering values; type R to return to the previous screen; type F to quit the application.]

[Screen "Méthode de calage (fin)" (calibration method, end). Field shown: MAXITER.]

If a first calibration run fails because of hidden collinearities among the calibration variables, you may set the COLIN parameter to OUI, which makes the program use generalized inverse matrices.

[Screen: storage of the calibration weights associated with the level 1 units (DATAPOI) and with the level 2 units (DATAPOI2). Other fields shown: POIDSFIN, LABELPOI, MISAJOUR.]

[Screen "Edition des résultats du calage" (printing of the calibration results). Fields shown: OBSELI, EDITION, EDITPOI, CONT, NOTES, CONTPOI.]
13. 08119    8369002830   386         84.2340    9363033000   425
[Last row of the Extreme Observations table: lowest values on the left, highest values on the right.]

Method: raking ratio
Statistics on the weight ratios (= final weights / initial weights) and on the final weights

The UNIVARIATE Procedure
Variable: MEIN (PONDÉRATION FINALE)

[Histogram and boxplot of the final weight, vertical axis from 2.5 to 82.5; * may represent up to 4 counts.]
[Normal probability plot of the final weight.]

Method: raking ratio
Mean weight ratios (= final weights / initial weights) for each value of the variables

VARIABLE   CATEGORY   NUMBER OF LEVEL 1 OBSERVATIONS   WEIGHT RATIO
CS90       2                                      38        0.94738
CS90       3                                      51        0.88555
CS90       4                                      39        1.12938
CS90       5                                      39        1.07181
CS90       6                                     113        0.91558
CS90       7                                     159        1.06035
SEUL90     0                                     341        0.98348
SEUL90     1                                      98        1.05817
STRATE90   0                                      92        0.98297
STRATE90   1                                      88        0.97991
STRATE90   2                                      83        1.05366
STRATE90   3                                      86        1.01684
STRATE90   4                                      90        0.97222
ENSEMBLE (all)                                   439        1.00015

Method: raking ratio
Mean weight ratios (= final weights / initial weights) for each
14. CALMAR 2: A NEW VERSION OF THE CALMAR CALIBRATION ADJUSTMENT PROGRAM

Olivier Sautory

ABSTRACT

Calmar 2 is the new version of the Calmar calibration adjustment program. It contains two major developments. When survey data are collected at different levels (e.g., households and individuals), simultaneous calibration of the samples helps maintain consistency in the statistics produced from the samples. Where there is total non-response, generalized calibration makes it possible to rewrite the calibration equations with two sets of variables: the actual calibration variables and the non-response explanatory variables. This corrects for non-response even when the variables that explain it are unknown for the sample non-respondents.

KEYWORDS: Calibration; Generalized Calibration; Non-Response; Simultaneous Calibration.

1. THE CALMAR MACROS

1.1 Background

Calmar is a SAS macro program that implements the calibration methods developed by Deville and Särndal (1992). The program adjusts samples, through reweighting of individuals, using auxiliary information available from a number of variables referred to as calibration variables. The weights produced by this method are used to calibrate the sample on known population totals in the case of quantitative variables, and on known category frequencies in the case of qualitative variables.

Calmar is an acronym for CALibration on MARgins, an adjustment technique which adjusts the margins (estimated
15. [Interactive screens FENETRE2 to FENETRE7. Each screen ends with the same prompts: press ENTER to continue; type V to validate the screen without re-entering values; type R to return to the previous screen; type F to quit the application.]

[Screen FENETRE4: level 1 data. Fields shown: DATAMEN, PONDQK, MARMEN.]

[Screen FENETRE5: level 2 data (example: individual level). Fields shown: DATAIND, IDENT2, MARIND.]

[Screen: field shown NONREP.]

Using the parameter NONREP = OUI is the way to perform a generalized calibration.

[Screen FENETRE7: calibration method. Field shown: ECHELLE.]
16. SE
2   id         Char    12    18

SUMMARY (BILAN)
DATE: 25 AOUT 2005     TIME: 17:48

INPUT TABLE: BASE.ECHANT_MEN2
  Number of observations in the input table:   439
  Number of observations eliminated:           0
  Number of observations kept:                 439
  Weight variable:                             POIDS1
  Number of categorical variables:             3
  Categorical variables and their numbers of categories:
    cs90 (6)   seul90 (2)   strate90 (5)
  Sum of initial weights:                      5107
  Population size:                             5105
  Non-response variables
    Number of categorical variables:           3
    Categorical variables and their numbers of categories:
      cs96 (6)   seul96 (2)   strate96 (5)

INPUT TABLE: BASE.ECHANT_INDIV2
  Number of observations in the input table:   1130
  Number of observations eliminated:
  Number of observations kept:                 1130
  Number of categorical variables:             2
  Categorical variables and their numbers of categories:
    age (4)   sexe (2)
  Sum of initial weights (SOMME DES POIDS INITIAU
17. X):                                        13284
  Population size:                             12883
  Non-response variables
    Number of categorical variables:           2
    Categorical variables and their numbers of categories:
      age_bis (4)   sexe_bis (2)

METHOD USED: RAKING RATIO
THE CALIBRATION WAS COMPLETED IN 7 ITERATIONS
THE WEIGHTS HAVE BEEN STORED IN THE VARIABLE WGEN OF THE TABLE POIDSGEN AND OF THE TABLE POIDSGEN_INDIV
18. [Histogram legend: * may represent up to 3 counts.]

Method: raking ratio
Statistics on the weight ratios (= final weights / initial weights) and on the final weights

The UNIVARIATE Procedure
Variable: _F_ (RAPPORT DE POIDS)

[Normal probability plot of _F_ (weight ratio).]

Method: raking ratio
Statistics on the weight ratios (= final weights / initial weights) and on the final weights

The UNIVARIATE Procedure
Variable: MEIN (PONDÉRATION FINALE)

Basic Statistical Measures
Location                 Variability
Mean     11.62870        Std Deviation           8.61949
Median    9.78269        Variance               74.29559
Mode      6.94525        Range                  82.93829
                         Interquartile Range     7.63255

Quantiles (Definition 5)
Quantile      Estimate
100% Max      84.23396
99%           44.74633
95%           25.32619
90%           21.86779
75% Q3        14.02605
50% Median     9.78269
25% Q1         6.39351
10%            4.43868
5%             3.62805
1%             2.08119
0% Min         1.29567

Extreme Observations
------------ Lowest ------------     ------------ Highest -----------
Value      IDENT        Obs          Value      IDENT        Obs
1.29567    7269012040   311          44.7463    3169012010   187
1.29567    5369013020   280          48.5008    2163000830    77
1.29567    2169020050   101          65.2456    2369009020   118
2.03368    1163023120    16          65.2456    8269014980   366
2
19. calibration equations are $\sum_{k \in r} d_k F(x_k'\lambda)\, x_k = X$.

If one of the calibration variables is the constant variable equal to 1 (or at least a qualitative variable), the $d_k$ can be multiplied by a constant with no effect on the $w_k$. Consequently, the calibration equations can be rewritten $\sum_{k \in r} \frac{n}{n_r}\, d_k F(x_k'\lambda)\, x_k = X$, which shows that this strategy is equivalent to the previous one with a non-response correction using a uniform response model.

Dupont (1996) compared the two strategies on the basis of theoretical considerations and simulations. The study led to the following findings. If the non-response correction is performed by a generalized linear model where the H function is one of the usual calibration functions F, and if the calibration variables X contain the non-response explanatory variables Z, then the two strategies produce very similar results.

Furthermore, if the calibration variables X are identical with the non-response explanatory variables Z, the following two strategies are equivalent:
• performing a non-response correction using a generalized linear model with the exponential function as the H function, then performing a calibration using the corrected weights, with the exponential function as the calibration function F;
• performing a direct calibration using the initial weights, with the exponential function as the calibration function F.

The same is true if we perform
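The remark that multiplying the $d_k$ by a constant has no effect on the $w_k$ can be checked on the simplest case, a calibration on a single qualitative variable (post-stratification). The sketch below is illustrative only: the data, the function name `post_stratify` and the factor 8/5 (standing for $n / n_r$) are made up for the example.

```python
def post_stratify(d, strata, stratum_totals):
    """Calibrate on one qualitative variable: within each stratum, the
    weights are rescaled so that they sum to the known stratum total."""
    sums = {}
    for dk, h in zip(d, strata):
        sums[h] = sums.get(h, 0.0) + dk
    return [dk * stratum_totals[h] / sums[h] for dk, h in zip(d, strata)]


# Hypothetical respondent sample: 5 respondents out of 8 sampled units.
d = [10.0, 10.0, 20.0, 20.0, 20.0]
strata = ["a", "a", "b", "b", "b"]
totals = {"a": 50.0, "b": 90.0}

# Direct calibration on the initial weights.
w_direct = post_stratify(d, strata, totals)

# Uniform response model first (weights multiplied by n / n_r = 8 / 5),
# then the same calibration.
d_corrected = [dk * 8.0 / 5.0 for dk in d]
w_two_steps = post_stratify(d_corrected, strata, totals)
```

Both routes give the same final weights (25 in stratum "a" and 30 in stratum "b" here), because the constant cancels in the ratio of the stratum total to the sum of the weights.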
20. emaitre, G. and Dufour, J. (1987). An integrated method for weighting persons and families. Survey Methodology, 13, pp. 199-207.

Le Guennec, J. et Sautory, O. (2003). La macro Calmar2 : manuel d'utilisation. Document interne, INSEE.

Le Guennec, J. (2004). Correction de la non-réponse par calage généralisé : une expérimentation. Actes des journées de méthodologie statistique, 16 et 17 décembre 2002, INSEE Méthodes, à paraître.

Lundström, S. and Särndal, C.-E. (1999). Calibration as a standard method for treatment of nonresponse. Journal of Official Statistics, 15, pp. 305-327.

Roy, G. et Vanheuverzwyn, A. (2001). Redressement par la macro CALMAR : applications et pistes d'amélioration, in Traitements des fichiers d'enquête, Presses Universitaires de Grenoble, pp. 31-46.

Sautory, O. (1996). Calage sur des échantillons de ménages, d'individus, d'individus-Kish issus d'une même enquête. Communication invitée aux Journées de Statistique de l'ASU, Québec, Canada.

ANNEX: AN EXAMPLE OF SIMULTANEOUS GENERALIZED CALIBRATION

The survey

A sample was drawn to investigate the population's way of life: work, level of income, cultural consumption, social integration. The survey was carried out in 1996. About 1,100 individuals were selected through a cluster sampling design. The first-stage sample includes 439 households. It is stratified according to agglomeration size and drawn by simple random sa
21. imum likelihood method produces the estimates $\hat{p}_h = n_{rh} / n_h$, where $n_h$ (resp. $n_{rh}$) is the number of individuals in group h who are in sample s (resp. sample r); $\hat{p}_h$ is therefore the observed response rate in group h.

Generalized linear model

The probability of response is a function of a vector $z_k$ of non-response explanatory variables Z and an unknown parameter $\beta_0$: $p_k = 1 / H(z_k'\beta_0)$, where H is a function defined on $\mathbb{R}$ with values in $[1, +\infty[$ (in principle). To estimate $\beta_0$, and therefore the $p_k$, the Z variables must be known for both respondents and non-respondents.

It is possible to use an even more general model of the form $p_k = 1 / H_k(\beta)$, where $\beta$ is a vector of J adjustment parameters and $H_k$ is a function dependent on individual k.

We will now examine various calibration strategies for cases where there is total non-response.

4.2 Calibration after correcting for non-response

Suppose we have corrected for total non-response, for example with one of the methods described above. Thus we can perform a conventional calibration, starting with the weights corrected for non-response, $d_k / \hat{p}_k$. The calibration equations are written $\sum_{k \in r} \frac{d_k}{\hat{p}_k} F(x_k'\lambda)\, x_k = X$, where F is one of the usual calibration functions.

4.3 Direct conventional calibration

Another strategy is to perform a calibration directly, without prior correction for non-response. The
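As an illustration of the homogeneous response groups model and of the corrected weights used as the starting point in section 4.2, here is a minimal sketch. The function names and the toy data are assumptions made for the example; the rates are the unweighted observed response rates $n_{rh} / n_h$ described above.

```python
def response_rates(groups, responded):
    """Observed response rate n_rh / n_h in each response group h."""
    n, n_r = {}, {}
    for g, r in zip(groups, responded):
        n[g] = n.get(g, 0) + 1
        n_r[g] = n_r.get(g, 0) + (1 if r else 0)
    return {g: n_r[g] / n[g] for g in n}


def corrected_weights(d, groups, responded):
    """Weights d_k / p_hat_h, for respondents only.

    A group without any respondent has an estimated rate of 0 but
    contributes no weight, so no division by zero occurs here; in
    practice such a group would have to be merged with another one.
    """
    rates = response_rates(groups, responded)
    return [dk / rates[g] for dk, g, r in zip(d, groups, responded) if r]


# Hypothetical sample s of 6 units, 4 of which responded.
d = [10.0] * 6
groups = ["a", "a", "a", "a", "b", "b"]
responded = [True, True, False, False, True, True]

rates = response_rates(groups, responded)
weights = corrected_weights(d, groups, responded)
```

With these data the rates are 0.5 in group "a" and 1.0 in group "b", and the respondents' weights become 20, 20, 10, 10. A uniform response model is the special case with a single group.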
22. les AGE_BIS and SEX_BIS (equal to AGE and SEX) in the individuals sample data set as pseudo instrumental variables, in order to complete the $z_k$ vector dimension.

The population totals data sets have the following form:

• Primary units

var        n   r   mar1   mar2   mar3   mar4   mar5   mar6
strate90   5   0   1314    833    704   1477    777
seul90     2   0   3933   1172
cs90       6   0    457    470    537    435   1254   1952
strate96   5   1
seul96     2   1
cs96       6   1

• Secondary units

var        n   r   mar1   mar2   mar3   mar4
sexe       2   0   6255   6628
age        4   0   2514   1799   5984   2586
sexe_bis   2   1
age_bis    4   1

The variable R distinguishes the calibration variables (R = 0) from the instrumental variables (R = 1). For the latter, no population total has to be entered. In both cases, we must specify the number of levels of categorical variables (variable N).

The %CALMAR2_GUIDE interface

We may specify the macro parameters through the %CALMAR2_GUIDE program. The various data files referred to in the calibration adjustment must have been allocated first. Entering the %CALMAR2_GUIDE statement in the SAS Editor window brings up the following interactive screens, which allow the user to specify the parameter values. In this example, we choose a generalized two-level simultaneous calibration.

[Interactive screens FENETRE1 and FENETRE1B; date shown: Jeudi 25 août 2005. Prompts: press ENTER to continue; type F to quit the application.]
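The layout of the population totals data sets can be made concrete with a small consistency check. The sketch below copies the values shown above into plain Python tuples; the function `check_margins` and its three rules are assumptions made for illustration and do not describe the checks Calmar 2 itself performs.

```python
# (var, n, r, margins): values copied from the two data sets of the example.
MARGINS_LEVEL1 = [
    ("strate90", 5, 0, [1314, 833, 704, 1477, 777]),
    ("seul90", 2, 0, [3933, 1172]),
    ("cs90", 6, 0, [457, 470, 537, 435, 1254, 1952]),
    ("strate96", 5, 1, []),
    ("seul96", 2, 1, []),
    ("cs96", 6, 1, []),
]
MARGINS_LEVEL2 = [
    ("sexe", 2, 0, [6255, 6628]),
    ("age", 4, 0, [2514, 1799, 5984, 2586]),
    ("sexe_bis", 2, 1, []),
    ("age_bis", 4, 1, []),
]


def check_margins(rows):
    """Return a list of problems found in a margins data set (empty if none)."""
    problems = []
    for var, n, r, margins in rows:
        if r == 0 and len(margins) != n:
            problems.append(f"{var}: expected {n} margins, got {len(margins)}")
        if r == 1 and margins:
            problems.append(f"{var}: an instrumental variable takes no margin")
    # Generalized calibration needs as many Z indicators as X indicators.
    n_x = sum(n for _, n, r, _ in rows if r == 0)
    n_z = sum(n for _, n, r, _ in rows if r == 1)
    if n_x != n_z:
        problems.append(f"{n_x} calibration categories versus {n_z} instrument categories")
    # Margins of qualitative variables at one level share the same grand total.
    grand_totals = {sum(m) for _, _, r, m in rows if r == 0}
    if len(grand_totals) > 1:
        problems.append(f"inconsistent population sizes: {sorted(grand_totals)}")
    return problems
```

Both data sets of the example pass these checks: the level 1 margins each sum to 5105 households and the level 2 margins to 12883 individuals, which are the population sizes reported in the program's summary.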
23. ling or multi-stage sampling, where there is auxiliary information about the clusters (or primary units, PUs) and the secondary units (SUs), and where the survey's variables of interest concern both the clusters (or PUs) and the SUs.

The simultaneous calibration method was proposed by Sautory (1996). It is more general than the method proposed by Lemaitre and Dufour (1987). It consists in performing a single calibration at the PU level. Estimates of the totals for the calibration variables defined at the SU level are computed and then used in the PU calibration, which includes both PU and SU variables. Thus, if X is a calibration variable for the SUs, the estimate $\hat{x}_m = \sum_{k \in m} x_k / \pi_{k|m}$ (sum over the SUs sampled in PU m) is calculated for each PU m, where $\pi_{k|m}$ denotes the probability of inclusion of SU k when PU m has been selected. Hence the calibration equation for variable X can be written $\sum_{m \in s_{PU}} w_m \hat{x}_m = X$, where $s_{PU}$ denotes the PU sample.

2.3 An example

Suppose we have a survey in which a sample of households $s_M$ was selected and some data on the sample households were collected. All members of the selected households were surveyed, forming a sample $s_I$. In addition, an individual $k_m$, referred to as the Kish individual, was chosen in each selected household m by simple random sampling without replacement among the $e_m$ "eligible" members of the household (e.g., individuals aged 15 and over) and surveyed with a specific questionnaire.

Note that:
• $x_m$ is the vector
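The aggregation step of the simultaneous method can be sketched as follows for the case where every household member is surveyed ($\pi_{k|m} = 1$). The household identifiers, weights and categories are invented for the example; the point is only that, once each member carries its household's weight, the household file and the individual file give the same estimates.

```python
from collections import defaultdict


def household_totals(individuals):
    """Total of each individual-level indicator within each household.

    individuals: list of (household_id, category). With pi_{k|m} = 1,
    the household total is a simple count of members per category.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for hh, cat in individuals:
        totals[hh][cat] += 1.0
    return totals


# Hypothetical data: three households with 1, 2 and 3 members.
individuals = [("A", "F"), ("B", "F"), ("B", "M"), ("C", "M"), ("C", "M"), ("C", "F")]
w = {"A": 100.0, "B": 150.0, "C": 80.0}  # one weight per household

totals = household_totals(individuals)

# Estimated number of women, from the household file ...
women_from_households = sum(w[hh] * totals[hh]["F"] for hh in w)
# ... and from the individual file, each member carrying the household weight.
women_from_individuals = sum(w[hh] for hh, cat in individuals if cat == "F")

# One-person households versus persons living in one-person households.
size = {hh: sum(totals[hh].values()) for hh in w}
one_person_households = sum(w[hh] for hh in w if size[hh] == 1)
persons_living_alone = sum(w[hh] for hh, _ in individuals if size[hh] == 1)
```

The household-level vectors returned by `household_totals` are what would be appended to the household's own calibration variables before running a single calibration at the household level.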
24. mpling in the stratum, from the previous population census, which took place in 1990. All household members are included in the final sample. Although the target population is composed of the individuals, that is the secondary units, the survey includes questions about the dwelling and about the whole family, that is the primary units, such as the number of persons in the household, the head of household's profession, and the household's total income.

The calibration model

We want to calibrate the estimates on the population distribution by sex and age, and on the distribution of households by size and by professional group. Household size and head of household's profession are correlated both with the variables of interest and with response behaviour. Population totals in those domains are only known in the sampling frame, that is at the last census date. For that reason, the survey estimators are adjusted by calibration on the totals of those four variables:
• household size (single person, 2 persons, 6 persons or more);
• head of household professional group;
• individual age group;
• individual sex.

We also add the strata to the calibration variables, in order to preserve the equality between the sum of weights and the stratum population.

The updated household size and head of household profession, collected in the survey, are assumed to be better explanatory variables for non-response than the sampling frame values. They are observed only on respondents and
25. n estimator, $\hat{Y}_{HT} = \sum_{k \in s} \frac{1}{\pi_k}\, y_k = \sum_{k \in s} d_k y_k$.

Assume that we know the population totals for J auxiliary variables $X_1, \dots, X_j, \dots, X_J$ available in the sample: $X_j = \sum_{k \in U} x_{jk}$. We will look for new "calibration" weights $w_k$ that are as close as possible (as determined by a certain distance function) to the initial weights $d_k$ (these are usually the sampling weights, equal to the inverses of the probabilities of inclusion $\pi_k$). These $w_k$ are calibrated on the totals of the $X_j$ variables; in other words, they verify the calibration equations:

$$\forall j = 1, \dots, J: \quad \sum_{k \in s} w_k x_{jk} = X_j \qquad (1)$$

The solution to this problem is given by $w_k = d_k F(x_k'\lambda)$, where $x_k' = (x_{1k}, \dots, x_{jk}, \dots, x_{Jk})$, $\lambda$ is a vector of J Lagrange multipliers associated with the constraints (1), and F is a function, the calibration function, whose terms depend on the distance function that is used. Vector $\lambda$ is determined by the solution to the non-linear system of J equations in J unknowns resulting from the calibration equations:

$$\sum_{k \in s} d_k F(x_k'\lambda)\, x_k = X$$

The estimator of the total for a variable of interest will be the calibrated estimator $\hat{Y}_w = \sum_{k \in s} w_k y_k$.

The original version of Calmar offered four calibration methods, corresponding to four different distance functions. These methods are characterized by the form of function F:
• the linear method: the calibrated estimator is the generalized regression estimator $\hat{Y}_{reg} = \hat{Y}_{HT} + (X - \hat{X}_{HT})' \hat{B}_s$, where $\hat{X}_{HT} = \sum_{k \in s} d_k x_k$ and
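For two qualitative calibration variables, the calibration equations can be solved by the classical raking ratio iterations (iterative proportional fitting), which alternately rescale the weights to each set of margins. The sketch below illustrates that special case only; it is not Calmar's algorithm, which iterates on the Lagrange multipliers. The defaults for `tol` and `max_iter` merely echo the SEUIL and MAXITER values of the annex, the stopping rule is an assumption, and the data are invented.

```python
def rake(d, sex, age, sex_margins, age_margins, tol=1e-4, max_iter=15):
    """Raking ratio on two qualitative variables.

    Returns the calibrated weights and the number of iterations used.
    Assumed stopping rule: largest relative gap on the sex margins.
    """
    w = list(d)
    for iteration in range(1, max_iter + 1):
        for cats, margins in ((sex, sex_margins), (age, age_margins)):
            totals = {}
            for wk, c in zip(w, cats):
                totals[c] = totals.get(c, 0.0) + wk
            w = [wk * margins[c] / totals[c] for wk, c in zip(w, cats)]
        # The age margins now hold exactly; measure what is left on sex.
        totals = {}
        for wk, c in zip(w, sex):
            totals[c] = totals.get(c, 0.0) + wk
        gap = max(abs(totals[c] - sex_margins[c]) / sex_margins[c] for c in sex_margins)
        if gap < tol:
            break
    return w, iteration


# Hypothetical sample of 8 units with equal initial weights.
d = [10.0] * 8
sex = [1, 1, 1, 1, 2, 2, 2, 2]
age = ["young", "young", "old", "old", "young", "old", "old", "old"]
sex_margins = {1: 48.0, 2: 52.0}
age_margins = {"young": 40.0, "old": 60.0}

weights, n_iter = rake(d, sex, age, sex_margins, age_margins, tol=1e-10, max_iter=200)
```

Each pass multiplies the weights by positive factors, so the final weights stay positive, which is the property of the exponential method noted in the text. The two sets of margins must add up to the same grand total for the iterations to settle.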
26. ng the respondent units. Reweighting techniques are based on models of the response mechanism. This mechanism is similar to random selection of a sample r of size $n_r$ from sample s. This selection can be viewed as a supplementary phase added to the original sample design, defined by a pseudo-sample design denoted $q(r|s)$. Associated with this design are the individual response probabilities $p_k = P(k \in r \mid k \in s)$. If these probabilities were known, the total Y for a variable of interest would be estimated without bias by $\hat{Y}_{exp} = \sum_{k \in r} \frac{d_k}{p_k}\, y_k$, known as the expansion estimator.

In fact, though, the design $q(r|s)$, and therefore the probabilities $p_k$, are unknown. They must therefore be estimated, substituting a model for the response mechanism and using an estimation method (maximum likelihood, moments, etc.). A logical choice is the Poisson model, $q(r|s) = \prod_{k \in r} p_k \prod_{k \in s \setminus r} (1 - p_k)$. To fully specify this model, we must provide the form of the probabilities $p_k$. Three conventional models of the non-response mechanism are described below.

Uniform response model

We assume that each individual has the same probability of response: $p_k = p$, $\forall k \in U$. The maximum likelihood method produces the estimate $\hat{p} = n_r / n$, the observed response rate.

Homogeneous response groups

Population U is split into H groups that are assumed homogeneous with respect to non-response. All individuals in group h have the same probability of response, denoted $p_h$. The max
27. of known auxiliary variables for each household $m$ in household sample $s_M$; $X = \sum_{m \in U_M} x_m$ is the vector of the totals for these variables, which are known for the population of households $U_M$. $z_{m,i}$ is the vector of known auxiliary variables for each individual $i$ in household $m$; $Z = \sum_{i \in U_I} z_i$ is the vector of the totals for these variables, which are known for the population of individuals $U_I$. $v_{k_m}$ is the vector of known auxiliary variables for each Kish individual $k_m$ in household $m$, and $V = \sum_{i \in U_I^e} v_i$ is the vector of the totals for these variables, which are known for the population of eligible individuals $U_I^e$. The probabilities of inclusion of households $m$ are denoted $\pi_m$, and we let $d_m = 1/\pi_m$. The probabilities of inclusion of individuals $(m, i)$ when household $m$ has been selected are 1. The probability of inclusion of Kish individual $k_m$ when household $m$ has been selected is $\bar{\pi}_{k_m}$. The method involves performing a single calibration at the household level, calculating for each household the totals of the calibration variables for individuals, $z_m = \sum_{i \in m} z_{m,i}$, and the estimated totals of the calibration variables for Kish individuals, $\hat{v}_m = v_{k_m} / \bar{\pi}_{k_m}$. The calibration variables vector for household $m$ becomes $(x_m, z_m, \hat{v}_m)$ and the totals vector $(X, Z, V)$. The calibration equations are written as follows: $\sum_{m \in s_M} d_m F(x_m' \lambda + z_m' \mu + \hat{v}_m' \gamma) \, (x_m, z_m, \hat{v}_m) = (X, Z, V)$, where $\lambda$, $\mu$, $\gamma$ denote components of the Lagrange m
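The construction of the household-level calibration vector can be sketched as below. The function name `household_vector`, its argument layout and the toy household are hypothetical; the Kish selection probability is simply passed in rather than derived from the design.

```python
def household_vector(x_m, z_individuals, v_kish, pi_kish):
    """Build the household-level calibration vector (x_m, z_m, v_hat_m).

    x_m           : household-level calibration variables
    z_individuals : one list of calibration variables per household member
    v_kish        : calibration variables of the Kish individual
    pi_kish       : probability of selecting that Kish individual within
                    the household (assumed known)
    """
    # z_m: totals of the individual-level variables over the household
    z_m = [sum(column) for column in zip(*z_individuals)]
    # v_hat_m: estimated household total of the Kish-level variables
    v_hat_m = [v / pi_kish for v in v_kish]
    return list(x_m) + z_m + v_hat_m


# Hypothetical household: three members (sex indicators), two of them
# eligible, so the Kish individual is drawn with probability 0.5.
vector = household_vector(
    x_m=[1.0],
    z_individuals=[[1, 0], [0, 1], [0, 1]],
    v_kish=[1.0],
    pi_kish=0.5,
)
```

Once every household carries such a vector, a single calibration at the household level against the stacked totals $(X, Z, V)$ handles all three levels at once.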
28. on response explanatory variables $Z_j$. The calibration equations are $\sum_{k \in r} d_k H(z_k' \beta) x_k = X$. The instruments are equal to $\frac{H'(z_k' \beta_0)}{H(z_k' \beta_0)} z_k$. These are estimated by $\frac{H'(z_k' \hat{\beta})}{H(z_k' \hat{\beta})} z_k$, that is, by the $z_k$ when $H$ is the exponential function.
Properties of the method: The dissociation, in a system of calibration equations, between the $Z_j$ non-response explanatory variables and the $X_j$ calibration variables results in a lower non-response bias (courtesy of the $Z_j$) and a smaller variance (thanks to the $X_j$). The method requires that the number of $Z_j$ variables (quantitative variables and indicators of qualitative variable response categories) be equal to the number of $X_j$ calibration variables. In addition, the method is effective only if the correlations between the $Z_j$ and the $X_j$ are sufficiently strong. Unlike the standard non-response adjustment methods, this method works even when the variables that cause the non-response are known only for respondents. In particular, it handles situations where the non-response factors are variables of interest (non-ignorable response mechanism). Calmar 2 makes it possible to use this method, with the $H$ functions being the usual calibration functions. Le Guennec (2004) provides an example of how the method can be applied to survey data.
REFERENCES
Caron N. and Sautory O. (2004). Calages simul
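29. re generally of the form $F_k(\lambda) = F(z_k' \lambda)$, where $z_k$ is a vector of $J$ variables $Z_j$ known for sample $s$, and $F$ is a function from $\mathbb{R}$ to $\mathbb{R}$ such that $F(0) = 1$ and $F'(0) = 1$, and hence $\operatorname{grad} F_k(0) = z_k$. The calibration equations are $\sum_{k \in s} d_k F(z_k' \lambda) x_k = X$. When $F$ is a linear function, $F_k(\lambda) = 1 + z_k' \lambda$, the calibrated estimator is the instrumental variable regression estimator $\hat{Y}_{regi}$, since we then have $w_k = d_k (1 + z_k' \lambda)$.
3.3 Precision
Through proofs similar to the ones used in conventional calibration, we obtain the following results. The approximate variance of the calibrated estimator can be written $AV(\hat{Y}_w) = \sum_{k \in U} \sum_{l \in U} \Delta_{kl} (d_k E_k)(d_l E_l)$, where $\Delta_{kl} = \pi_{kl} - \pi_k \pi_l$ and $E_k = y_k - x_k' B_{zx}$, with $B_{zx} = (\sum_{k \in U} z_k x_k')^{-1} \sum_{k \in U} z_k y_k$ (3), is the residual of the regression of $Y$ on the $X_j$ in $U$ with instrumental variables $Z_j$. A variance estimator is given by $\hat{v}(\hat{Y}_w) = \sum_{k \in s} \sum_{l \in s} \frac{\Delta_{kl}}{\pi_{kl}} (d_k e_k)(d_l e_l)$, where $e_k = y_k - x_k' \hat{B}_{szx}$, with $\hat{B}_{szx} = (\sum_{k \in s} d_k z_k x_k')^{-1} \sum_{k \in s} d_k z_k y_k$, is the residual of the regression, weighted by the $d_k$, of $Y$ on the $X_j$ in sample $s$ with instrumental variables $Z_j$.
4 CALIBRATION IN THE CASE OF TOTAL NON-RESPONSE
4.1 Standard methods of correcting for total non-response
Total non-response is usually accounted for by reweighti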
30. regression weighted by the $d_k$ of $Y$ on the $X_j$ in sample $s$.
1.3 What's new in Calmar 2
In addition to the four calibration methods mentioned above, Calmar 2 (Le Guennec and Sautory, 2003) offers the following:
• simultaneous calibration for different levels in a survey;
• adjustment for total non-response using generalized calibration.
These two features will be described in detail below. Calmar 2 offers a solution to the problem of collinearity between calibration variables: it uses generalized inverse matrices to compute the calibration weights. The original version of Calmar produced an error message in such cases. Calmar 2 also offers a new distance function, the generalized hyperbolic sine function, which depends on a parameter a. Like the exponential method, this method always yields positive weights, but the distribution of weights at the high end is narrower. In addition, the method reduces the range of the weight distribution, as do the logit and truncated linear methods, but it does so with only one parameter (Roy et al., 2001). Finally, the program is more user-friendly, especially in two respects:
• users can enter qualitative calibration variables without prior recoding to obtain sequential response categories;
• users have the option of entering parameters interactively, using data entry screens that guide them in their choices.
2 SIMULTANEOUS CALIBRATIONS
31. reviously, denotes the vector of the $J$ calibration variables. Solving this system for $\lambda$ yields the new weights $w_k = d_k F_k(\lambda)$.
Basic result: Let $z_k = \operatorname{grad} F_k(0)$, vectors that will be referred to as instruments (see below). We can show that calibrated estimators based on the same instruments and the same calibration variables are all asymptotically equivalent. We can rewrite the calibration equations: $X = \sum_{k \in s} d_k (1 + z_k' \lambda) x_k + o(\|\lambda\|)$ (2), or $X = \hat{X}_\pi + (\sum_{k \in s} d_k x_k z_k') \lambda + o(\|\lambda\|)$. This yields $\lambda = (\hat{T}_{zx}')^{-1} (X - \hat{X}_\pi) + o(\|\lambda\|)$ if we let $\hat{T}_{zx} = \sum_{k \in s} d_k z_k x_k'$, which is assumed to be of full rank. A calibrated estimator $\hat{Y}_w = \sum_{k \in s} d_k F_k(\lambda) y_k$ is therefore asymptotically equivalent to $\hat{Y}_\pi + (X - \hat{X}_\pi)' \hat{T}_{zx}^{-1} \sum_{k \in s} d_k z_k y_k = \hat{Y}_\pi + (X - \hat{X}_\pi)' \hat{B}_{szx} = \hat{Y}_{regi}$. $\hat{B}_{szx}$ verifies $\sum_{k \in s} d_k z_k y_k = (\sum_{k \in s} d_k z_k x_k') \hat{B}_{szx}$: it is the vector of the coefficients of the instrumental variable regression, weighted by the $d_k$, of $Y$ on the $X_1, \dots, X_J$ variables; the variables that make up the $z_k$ vectors are the instruments (for example, see Fuller, 1987). By analogy with the generalized regression estimator, the estimator $\hat{Y}_{regi}$ is referred to as the instrumental variable regression estimator.
3.2 Standard form of the calibration functions
In practice, calibration functions $F_k$ a
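The equivalence between generalized calibration and instrumental variable regression can be checked numerically in the one case where it is exact, the linear calibration function $F_k(\lambda) = 1 + z_k' \lambda$. The sketch below is only an illustration under that assumption; the function names and the toy data are hypothetical.

```python
def solve(a, b):
    """Solve the square linear system a * sol = b by Gaussian elimination."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    sol = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * sol[c] for c in range(i + 1, n))
        sol[i] = (m[i][n] - tail) / m[i][i]
    return sol


def weighted_cross(d, a, b):
    """Return the matrix sum_k d_k a_k b_k'."""
    J = len(a[0])
    return [[sum(dk * ak[i] * bk[j] for dk, ak, bk in zip(d, a, b))
             for j in range(J)] for i in range(J)]


# Hypothetical toy sample: x = calibration variables, z = instruments.
d = [10.0, 10.0, 10.0, 10.0]
x = [[1, 2.0], [1, 3.0], [1, 1.0], [1, 4.0]]
z = [[1, 1.5], [1, 3.5], [1, 0.5], [1, 4.2]]
y = [3.0, 5.0, 2.0, 7.0]
totals = [42.0, 110.0]

ht_x = [sum(dk * xk[j] for dk, xk in zip(d, x)) for j in range(2)]
gap = [t - h for t, h in zip(totals, ht_x)]

# Calibration route: (sum_k d_k x_k z_k') lambda = X - X_hat, w_k = d_k (1 + z_k' lambda)
lam = solve(weighted_cross(d, x, z), gap)
w = [dk * (1 + sum(zj * lj for zj, lj in zip(zk, lam))) for dk, zk in zip(d, z)]
y_calibrated = sum(wk * yk for wk, yk in zip(w, y))

# Regression route: (sum_k d_k z_k x_k') B = sum_k d_k z_k y_k
rhs = [sum(dk * zk[i] * yk for dk, zk, yk in zip(d, z, y)) for i in range(2)]
B = solve(weighted_cross(d, z, x), rhs)
y_regi = sum(dk * yk for dk, yk in zip(d, y)) + sum(g * b for g, b in zip(gap, B))
```

Both routes give the same estimate here up to rounding error; with a non-linear $F$ the two would agree only asymptotically, as stated in the text.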
32. tanés pour différentes unités d'une même enquête. Document de travail Méthodologie statistique n° 0403, INSEE.
Deming W. E. and Stephan F. F. (1940). On a least squares adjustment of a sampled frequency table when the exact totals are known. Annals of Mathematical Statistics, 11, pp. 427-444.
Deville J.-C. and Särndal C.-E. (1992). Calibration estimation in survey sampling. Journal of the American Statistical Association, 87, n° 418, pp. 375-382.
Deville J.-C., Särndal C.-E. and Sautory O. (1993). Generalized raking procedures in survey sampling. Journal of the American Statistical Association, 88, n° 423, pp. 1013-1020.
Deville J.-C. (1998). La correction de la non-réponse par calage ou par échantillonnage équilibré. Actes du colloque de la Société Statistique du Canada, Sherbrooke, Canada.
Deville J.-C. (2004). La correction de la non-réponse par calage généralisé. Actes des journées de méthodologie statistique, 16 et 17 décembre 2002, INSEE Méthodes (forthcoming).
Dupont F. (1996). Calage et redressement de la non-réponse totale. Actes des journées de méthodologie statistique, 15 et 16 décembre 1993, INSEE Méthodes n° 56-57-58.
Estevao V. and Särndal C.-E. (2003). Calibration estimation in sample surveys: an overview and recent developments. Paper presented at the ASA Joint Statistical Meetings, San Francisco.
Fuller W. (1987). Measurement Error Models. New York: Wiley.
L
33. their totals in the population are unknown. That is the reason why they are introduced as instrumental variables into the calibration adjustment. Both calibration variables and instrumental variables vectors, $x_k$ and $z_k$, must have the same dimension. As we have only two "real" instrumental variables (updated household size and head of household professional group) while the $x_k$ vector is composed of 5 variables, we simply add to the $z_k$ vector some of the calibration variables. This leads to the following model:
• primary units level: $x_k$ = (strata90, household size in 1990, head of household profession in 1990); $z_k$ = (strata96 = strata90, household size in 1996, head of household profession in 1996)
• secondary units level: $x_k$ = (sex in 1990, age in 1990); $z_k$ = (sex_bis = sex, age_bis = age)
The data sets structure: The households sample data set includes the two instrumental variables, named SEUL96 (household size in the survey) and CS96 (head of household professional group in the survey), and the three calibration variables, named STRATA90 (strata number), SEUL90 (household size in the sampling frame) and CS90 (head of household professional group at the census date). The individuals sample data set includes the two calibration variables, named AGE and SEX, coming from the sampling frame. We create the variable STRATA96 (equal to STRATA90) in the households sample data set and the variab
34. ultipliers vector. The solutions $w_m = d_m F(x_m' \lambda + z_m' \mu + \hat{v}_m' \gamma)$ of these equations are the new household weights. Thus, the weight $w_{m,i}$ assigned to individual $i$ of household $m$ in the sample of individuals is equal to the weight $w_m$ of household $m$. The weight $w_{k_m}$ assigned to the Kish individual of household $m$ is equal to $w_m / \bar{\pi}_{k_m}$. It can be verified that with these weights the various samples are correctly calibrated on totals $X$, $Z$ and $V$: $\sum_{(m,i) \in s_I} w_{m,i} z_{m,i} = \sum_{m \in s_M} w_m \sum_{i \in m} z_{m,i} = \sum_{m \in s_M} w_m z_m = Z$ and $\sum_{k_m \in s_K} w_{k_m} v_{k_m} = \sum_{k_m \in s_K} \frac{w_m}{\bar{\pi}_{k_m}} v_{k_m} = \sum_{m \in s_M} w_m \hat{v}_m = V$. This method could be used with Calmar (see Caron and Sautory, 2004), but some SAS programming would be required. Calmar 2 performs all the operations necessary to reduce the process to a single calibration. The user must provide the entry tables for the various levels and the totals for the calibration variables. Estevao and Särndal (2003) compare several calibration methods for two-stage sample designs, including the method described here.
3 GENERALIZED CALIBRATION
3.1 The underlying principle
While calibration is usually presented using functions of distance between weights, Deville (2002), for example, states the calibration equations directly, with calibration functions defined in a very general form: $F_k: \mathbb{R}^J \to \mathbb{R}, \ \lambda \mapsto F_k(\lambda)$, with $F_k(0) = 1$, where $\lambda$ is a vector of $J$ adjustment parameters. The generalized calibration equations are written $X = \sum_{k \in s} d_k F_k(\lambda) x_k$, where $x_k$, as p
