Frequently Asked Questions
- Can I link IHDS-II households and individuals with IHDS-I files?
- How do I obtain IHDS-II data?
- How do I solve my programming problem?
- May I have the codes to research questions at the village level of data?
- Can I have access to the HDPI 1993-1994 data?
- What is the recommended citation for Human Development in India, IHDS I, IHDS II, and HDPI data?
- How do I link IHDS I and HDPI data?
- Why are there so many missing values in the short-term morbidity (SM variables), long-term morbidity (MB variables), activities of daily living (AD variables), and smoking, chewing tobacco and drinking variables in both surveys?
- What are different weight variables?
- How do I read the Birth History variables?
- Are IHDS-I and IHDS-II questions strictly comparable?
- How were schools and medical facilities selected in IHDS I and IHDS II?
- Is it possible to draw an inference about a particular state using IHDS data?
- In comparing the monthly per capita consumption expenditure variable (COPC) between the two rounds, why is there such a big change?
- How were the new households in IHDS II selected?
- When there is a difference in demographic indicator data that should not change between IHDS I and IHDS II, which one should I use?
- How are the income quintiles created?
- In the data, there are different kinds of missing data: blanks, valid blanks, and various numeric indicators (99, 88, 18). What do these differences mean?
- Which weights should be used with which data?
- How is the DEFLATOR variable constructed?
Three files from IHDS-II (2011-12) have been released: (1) the Household file, with basic information about income, consumption and standard of living; (2) the Individual file, which contains data on employment, morbidity, and education; and (3) the Ever-Married Woman's file, which contains information on gender relations, marital history, number of children ever borne and maternity care. These may be obtained from the Data Sharing for Demographic Research archive at ICPSR. Downloading requires registration but is free of charge.
The India Human Development Surveys are placed in the public domain as a public resource, and most users find that they need little support in using these data. This is fortunate because, due to resource limitations, we are unable to provide any support to users. Hence we try to maintain an FAQ section on www.ihds.info for questions of general utility. For other questions, particularly programming-related questions, we are unable to provide any support. We hope you will understand our constraints.
Unfortunately, under the terms of our ethical clearance guidelines we are not allowed to provide any geographic information below the level of the district. We seek to create a unique public resource of panel data for researchers interested in India. However, we must balance research needs with protecting the privacy of the respondents. Thus we had a choice between limiting individual information, such as caste/religion and other background data that make individuals identifiable within a small village/neighborhood, and withholding village/tehsil names and locations from our public release files. On advice from an eminent panel of academics and policy makers we opted for the latter strategy. Thus, our ethical clearance guidelines clearly specify that we cannot release any identifying information below the district. We recognize you have a genuinely interesting research problem that can benefit from geo-spatial linkages, but we must regretfully decline to release this information.
Please go to the IHDS website at /1993-94-panel-data for information on the HDPI data. On that page there is a link to the 'HDPI panel description and data use agreement'. Once you complete the agreement and send it to email@example.com, we will, after approval, send you the link for downloading the HDPI data sets.
Human Development in India citation:
Desai, Sonalde, Amaresh Dubey, B.L. Joshi, Mitali Sen, Abusaleh Shariff and Reeve Vanneman. 2010. Human Development in India: Challenges for a Society in Transition. New Delhi: Oxford University Press. Pp. 234.
IHDS I Citation:
Desai, Sonalde, Reeve Vanneman, and National Council of Applied Economic Research, New Delhi. India Human Development Survey (IHDS), 2005. ICPSR22626-v8. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2010-06-29. http://doi.org/10.3886/ICPSR22626.v8
IHDS II Citation:
Desai, Sonalde, Reeve Vanneman, and National Council of Applied Economic Research, New Delhi. India Human Development Survey-II (IHDS-II), 2011-12. ICPSR36151-v2. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2015-07-31. http://doi.org/10.3886/ICPSR36151.v2
HDPI Citation:
National Council of Applied Economic Research. 1994. Human Development Profile of India (HDPI). New Delhi.
The data set 'panelcrosswork.dta' gives the link between the interview ids of the two surveys, 1992-93 and IHDS-I (2004-05). It contains the following variables:
- v2: state code (1992-93)
- v3: hhid in a state (1992-93)
- stateid: state id (2004-05)
- distid: dist id (2004-05)
- psuid: psuid (2004-05)
- hhid: hhid (2004-05)
- hhsplitid: HH split id (2004-05)
- idhh: composite id of 2004-05
Using this link table you will be able to link the two surveys.
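As a rough illustration of how such a link table can be used, here is a minimal Python sketch. It assumes that the 1992-93 records carry v2 and v3 and that the IHDS-I records carry idhh; the sample rows and the `income` field are invented for the example and are not real data. A 1992-93 household may link to more than one 2004-05 household when the household has split, so the lookup keeps a list.

```python
from collections import defaultdict

def build_link(crosswalk_rows):
    """Map each 1992-93 key (v2, v3) to the list of 2004-05 idhh values."""
    link = defaultdict(list)
    for row in crosswalk_rows:
        link[(row["v2"], row["v3"])].append(row["idhh"])
    return dict(link)

def link_households(old_rows, ihds1_rows, crosswalk_rows):
    """Return (old_record, ihds1_record) pairs joined through the crosswalk."""
    link = build_link(crosswalk_rows)
    ihds1_by_id = {row["idhh"]: row for row in ihds1_rows}
    pairs = []
    for old in old_rows:
        for idhh in link.get((old["v2"], old["v3"]), []):
            new = ihds1_by_id.get(idhh)
            if new is not None:  # skip links whose IHDS-I record is absent
                pairs.append((old, new))
    return pairs

# Invented example rows (not real IHDS or HDPI values).
crosswalk = [
    {"v2": 1, "v3": 101, "idhh": "A1"},
    {"v2": 1, "v3": 101, "idhh": "A2"},  # split household
    {"v2": 2, "v3": 55, "idhh": "B1"},
]
old_survey = [{"v2": 1, "v3": 101, "income": 12000}]
ihds1 = [{"idhh": "A1", "income": 30000}, {"idhh": "A2", "income": 18000}]

linked = link_households(old_survey, ihds1, crosswalk)
print(len(linked))
```

In Stata the same idea would normally be expressed as a merge on these key variables; the sketch only shows the logic of the join.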
Questionnaires were designed to reduce respondent burden. Hence, at the start of the section, interviewers record the names of the respondents who engage in the activity or have the illness. Those who do not are coded as missing so as to excuse them from further questions. To clarify, here is an example concerning illness. When asking about fever, cough and diarrhea, there was an introduction: “We would like to learn about the health of the various family members in this household, including very young children over the last month. We are interested in three specific illnesses: fever, cough and diarrhea. Has anybody been ill with any of these illnesses in the last month?” Then the interviewer asked, “Can you tell me names of all those that had this illness?” This allowed the interviewer to list all members suffering from illness or activity limitations. Hence it can be assumed that those not listed in this section did not have the illness or limitation. Similarly, when the question is about behavior, those who do not engage in the behavior are coded as missing. PLEASE RECODE THE MISSING VALUES FOR THESE VARIABLES TO 0.
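The recode itself is simple. A minimal Python sketch follows, assuming each person is held as a dictionary and that a missing value has been read in as None; the variable names SM_FEVER and SM_DAYS are placeholders, not actual IHDS variable names.

```python
def recode_missing_to_zero(person, variables):
    """Return a copy of one person's record with missing entries set to 0.

    Only the listed variables are touched, because a missing value means
    "did not have the illness / does not do the activity" only for the
    morbidity, daily-living and tobacco/alcohol items described above.
    """
    recoded = dict(person)
    for name in variables:
        if recoded.get(name) is None:
            recoded[name] = 0
    return recoded

# Placeholder variable names and invented values.
person = {"PERSONID": 3, "SM_FEVER": None, "SM_DAYS": None, "AGE": None}
cleaned = recode_missing_to_zero(person, ["SM_FEVER", "SM_DAYS"])
print(cleaned)
```

Note that AGE is left as missing in this example: the recode should be limited to the variables this answer covers.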
WT: Sample weight for the household. This is the most useful weight and is used in almost all analyses.
FWT: Integer weight (truncated from WT) for STATA routines that require integer weights.
INDWT: WT * NPERSONS. This represents the number of individuals in the household, for analyses that require individual-specific weights (e.g. Head Count Ratio for Poverty) when using the household-level file.
INDFWT: Integer value of INDWT.
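The relationship between these weights can be sketched in a few lines of Python. This is an illustration only: it assumes INDFWT is obtained by truncation in the same way as FWT, and POOR is a hypothetical 0/1 household poverty flag, not a variable named in this FAQ.

```python
import math

def derive_weights(wt, npersons):
    """Rebuild FWT, INDWT and INDFWT from WT and NPERSONS."""
    indwt = wt * npersons
    return {
        "WT": wt,
        "FWT": math.trunc(wt),        # integer weight, truncated from WT
        "INDWT": indwt,               # household weight times household size
        "INDFWT": math.trunc(indwt),  # assumed: truncated, like FWT
    }

def head_count_ratio(households):
    """Share of individuals living in poor households, weighted by INDWT."""
    total = sum(hh["WT"] * hh["NPERSONS"] for hh in households)
    poor = sum(hh["WT"] * hh["NPERSONS"] for hh in households if hh["POOR"])
    return poor / total

# Invented households: POOR is a hypothetical poverty indicator.
sample = [
    {"WT": 100.0, "NPERSONS": 4, "POOR": 1},
    {"WT": 300.0, "NPERSONS": 2, "POOR": 0},
]
print(derive_weights(1200.5, 5))
print(head_count_ratio(sample))
```

With WT alone, the poor household in this invented sample would count for 100 of 400 weighted households; with INDWT it counts for 400 of 1000 weighted individuals, which is why person-level indicators need the individual weight.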
If you look at the questionnaire you will see that birth date is collected in 4 columns: the first two are the month and the last two are the year. So 497 would be month 04 and year 97.
Please pay attention to the questionnaires for each variable of interest. You may find that in other cases (e.g. current age of child) the question asked for age in years and months. So 1606 would be 16 years and 6 months. Leading 0s are dropped; trailing 0s are not.
Also, 88 is the missing value code for months and 18 is the missing value code for calendar years.
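The decoding rule can be sketched in Python as follows. This is a minimal illustration: the function names are made up, the missing codes (88 for month, 18 for year) are taken directly from the text above, and you should check the questionnaire for each variable before applying either layout.

```python
def split_four_digits(value):
    """Split a 4-column value stored without leading zeros into two 2-digit parts."""
    number = int(value)
    if number < 0 or number > 9999:
        raise ValueError("expected a value that fits in 4 columns")
    text = str(number).zfill(4)  # restore dropped leading zeros: 497 -> "0497"
    return int(text[:2]), int(text[2:])

def decode_birth_date(value):
    """Return (month, two_digit_year); a part is None if it holds the missing code."""
    month, year = split_four_digits(value)
    return (None if month == 88 else month,
            None if year == 18 else year)

def decode_age(value):
    """Return (years, months) for variables recorded as years then months."""
    return split_four_digits(value)

print(decode_birth_date(497))
print(decode_age(1606))
```

For the two examples in the text, the first call gives month 4 and year 97, and the second gives 16 years and 6 months.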
No. While IHDS-II follows the same general pattern as IHDS-I and many of the questions are identical, some questions were changed based on our experience with IHDS-I. Moreover, question numbers and variable names have also changed. Users are urged to consult the two sets of questionnaires and compare the question wording before trying to interpret the results.
The schools selected were those most commonly used by village residents, as identified by a village focus group. In many villages, there was only one school.
*Cautious* inferences can be made at the state level for large states (or state groups as in stateid2), but not at the district level. The issue is not so much weighting as sample size and selection. The urban sample is representative only at the state level; the rural sample might be considered more representative at the district (1991 district) level, but sample sizes are small, so drawing conclusions about any one district would be mistaken. Sample sizes at the state level are also sometimes small, so *cautious* inferences are necessary. More information is available at http://ihds.umd.edu/IHDS_files/AppHDinIndia.pdf
Three things are at work here:
1. The big difference is that IHDS-I is monthly per capita and IHDS-II is annual. The IHDS-II variable will be changed back to monthly during the next data update.
2. Price changes: There is a variable, DEFLATOR, in the public IHDS-II file; its mean is .5453441.
3. Economic growth.
For example, take the mean above for IHDS-I, 955.09, multiply it by 12 months and divide by the average IHDS-II deflator, .5453441; the result is 21,016. The difference between 21,016 for IHDS-I and 27,155 for IHDS-II is a measure of economic growth.
You will need to do this separately for each household. Please note that there is no household “inflator” in the IHDS-I file. So, you can instead deflate the IHDS-II households by dividing by 12 months and multiplying by DEFLATOR. This will give IHDS-II totals in IHDS-I prices.
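The arithmetic above can be checked with a short Python sketch. The function names are invented for the illustration; the numbers are the ones quoted in this answer, and in practice each household's own DEFLATOR value should be used rather than the mean.

```python
MEAN_DEFLATOR = 0.5453441          # mean of DEFLATOR quoted above
IHDS1_MEAN_COPC_MONTHLY = 955.09   # IHDS-I mean, monthly, 2004-05 prices
IHDS2_MEAN_COPC_ANNUAL = 27155     # IHDS-II mean, annual, 2011-12 prices

def ihds1_to_ihds2_terms(copc_monthly, deflator):
    """Express an IHDS-I monthly value as an annual value in IHDS-II prices."""
    return copc_monthly * 12 / deflator

def ihds2_to_ihds1_terms(copc_annual, deflator):
    """Express an IHDS-II annual value as a monthly value in IHDS-I prices."""
    return copc_annual / 12 * deflator

inflated = ihds1_to_ihds2_terms(IHDS1_MEAN_COPC_MONTHLY, MEAN_DEFLATOR)
deflated = ihds2_to_ihds1_terms(IHDS2_MEAN_COPC_ANNUAL, MEAN_DEFLATOR)
print(round(inflated))     # 21016, matching the figure in the text
print(round(deflated, 2))  # IHDS-II mean as monthly COPC in IHDS-I prices
```

The second function is the per-household route recommended above, since DEFLATOR is only available in the IHDS-II file.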
Selection information: In urban blocks and rural areas of northeastern states where 5 or more IHDS households were lost to attrition, the interviewers were asked to notify NCAER monitors of this loss. Once the loss was verified via physical check, a replacement household was randomly selected in the same neighborhood to refresh the sample. This has led to 2134 new households being included in the IHDS-II sample.
Use IHDS II, which has been more recently cleaned and updated. In addition, we believe our second round of data benefits from our experience with the first in terms of procedures and supervision, so IHDS II data are preferred.
The quintiles were created using a weighted STATA command. Please note: since income is a household characteristic, not an individual one, the income quintiles are weighted quintiles of households, not of individuals. In addition, there is a zero category in the quintiles that includes negative incomes and incomes below Rs. 1000.
The top quintiles have more individuals because high income households tend to have more individuals.
For households with income of Rs. 1000 or greater, if you sort the individual file by household id and select only the first individual in each household, you will get almost exactly the same number of individuals/households in each quintile.
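To make the idea of weighted household quintiles concrete, here is a minimal Python sketch. It is not the STATA command that was actually used: the cut-point rule (cumulative weight share of households ranked by income), the field names `income` and `wt`, and the sample values are all assumptions made for the illustration.

```python
import math

def weighted_income_quintiles(households, floor=1000):
    """Return a quintile code per household, in input order.

    Code 0 is the zero category (income below `floor`, including negative
    incomes). Codes 1 to 5 are quintiles of the remaining households, cut
    on cumulative household weight. Weights are assumed to be positive.
    """
    codes = [0] * len(households)
    eligible = sorted(
        (i for i, hh in enumerate(households) if hh["income"] >= floor),
        key=lambda i: households[i]["income"],
    )
    total = sum(households[i]["wt"] for i in eligible)
    cumulative = 0.0
    for i in eligible:
        cumulative += households[i]["wt"]
        share = cumulative / total
        # A small tolerance keeps shares such as exactly 0.2 in the lower bin.
        codes[i] = min(5, max(1, math.ceil(share * 5 - 1e-9)))
    return codes

# Invented households with equal weights.
sample = [
    {"income": -100, "wt": 1.0},
    {"income": 500, "wt": 1.0},
    {"income": 2000, "wt": 1.0},
    {"income": 3000, "wt": 1.0},
    {"income": 4000, "wt": 1.0},
    {"income": 5000, "wt": 1.0},
    {"income": 6000, "wt": 1.0},
]
print(weighted_income_quintiles(sample))
```

Because the cuts are made on households, counting individuals within each code will generally give unequal group sizes, as noted above.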
For now, please ignore the different missing value codes and treat all as missing data. Our goal is to distinguish different kinds of missing data in the future, but the current data do not reliably do this.
Weights are a complex issue. Our recommendation:
If doing individual cross-sectional analyses, use the appropriate individual survey weight (WT for 2012 and SWEIGHT for 2005).
If doing a panel analysis, the best approximation is to use the weights for 2005 rather than 2012.
DEFLATOR is a variable used to adjust for price changes over time in different states. Deflators for rural areas are based on the CPI for Agricultural Wage Labour (CPI-AL); deflators for urban areas are based on the CPI for Industrial Workers (CPI-IW). For interviews that took place before July 2012, it is the ratio of CPI-AL and CPI-IW 2011-12 to those for 2004-05; for interviews that took place after July 2012, it is the ratio of July-December CPI-AL and CPI-IW to June-May 2004-05 CPI-AL and CPI-IW respectively.
To convert 2004-05 prices to 2011-12 prices, divide by DEFLATOR.