How do I obtain IHDS data?

All data files may be obtained from the Data Sharing for Demographic Research (DSDR) archive at ICPSR. Downloading requires registration but is free of charge.

How do I solve my programming problem?

The India Human Development Surveys have been placed in the public domain as a public resource, and most users find that they need little support in using these data. This is fortunate because, due to resource limitations, we are unable to provide support to users. Hence we have created an FAQ section on www.ihds.info for questions of general utility. For other questions, particularly programming-related ones, we are unable to provide any support. We hope you will understand our constraints.

Can I link IHDS 2 households and individuals with IHDS 1 files?

Is IHDS a panel survey? If so, can I link IHDS 1 files with IHDS 2 files? What is required for linking?

Yes. IHDS 1 and 2 form a panel survey: IHDS 2 reinterviewed about 83% of the IHDS 1 households, plus any split households that resided in the same community. Linking information is available at both the household and the individual level. To link the two rounds of data, you will need the linking files, which can be downloaded from this website after registering. The files are in STATA 11/12 format. If you have an older version of STATA or are using another program, you will need a conversion program to convert them.

Update: ICPSR has a merged version of the two panel rounds available at:

https://www.icpsr.umich.edu/web/DSDR/studies/37382
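Conceptually, linking the rounds is a join through the linking file. The minimal sketch below assumes hypothetical column names (`IHDS1_HHID`, `IHDS2_HHID`) for the linking table and records already loaded into Python dicts; the real variable names are documented with the linking files. It illustrates that one IHDS 1 household can map to several IHDS 2 households when it split.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def link_waves(
    link_rows: Iterable[Record],
    ihds1: Dict[str, Record],
    ihds2: Dict[str, Record],
) -> Dict[str, List[Tuple[Record, Record]]]:
    """Pair IHDS 1 and IHDS 2 household records through a linking table.

    link_rows: dicts with HYPOTHETICAL keys "IHDS1_HHID" and "IHDS2_HHID".
    ihds1, ihds2: map a household ID to that household's record.
    A split household produces several pairs under the same IHDS 1 ID.
    """
    linked: Dict[str, List[Tuple[Record, Record]]] = defaultdict(list)
    for row in link_rows:
        id1, id2 = row["IHDS1_HHID"], row["IHDS2_HHID"]
        # Keep only links whose households are present in both extracts.
        if id1 in ihds1 and id2 in ihds2:
            linked[id1].append((ihds1[id1], ihds2[id2]))
    return dict(linked)
```

If you work from the merged ICPSR release instead, this step may already be done for you.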

May I have the village codes to research questions at the village level?

Unfortunately, under the terms of our ethical clearance guidelines, we are not allowed to provide any geographic information below the level of the district. We seek to create a unique public resource of panel data for researchers interested in India. However, we must balance research needs with protecting the privacy of the respondents. We therefore had a choice between limiting individual information, such as caste, religion and other background data that could make individuals identifiable within a small village or neighborhood, and removing village/tehsil names and locations from our public release files.

On advice from an eminent panel of academics and policy makers, we opted for the latter strategy. Thus, our ethical clearance guidelines clearly specify that we cannot release any identifying information below the district level. We recognize that you may have a genuinely interesting research problem that could benefit from geo-spatial linkages, but we must regretfully decline to release this information.

Can I have access to the HDPI 1993-1994 data?

Please go to the HDPI page on the IHDS website. On that page there is a link titled 'HDPI panel description and data use agreement'. Complete the agreement and send it to ihdsinfo@gmail.com; once it is approved, we will send you the link for downloading the HDPI data sets. This process can take up to two weeks.

What is the recommended citation for Human Development in India, IHDS 1, IHDS 2, and HDPI data?

Human Development in India citation:

Desai, Sonalde, Amaresh Dubey, B.L. Joshi, Mitali Sen, Abusaleh Shariff and Reeve Vanneman. 2010. Human Development in India: Challenges for a Society in Transition. New Delhi: Oxford University Press. Pp. 234.

IHDS I Citation:

Desai, Sonalde, Reeve Vanneman, and National Council of Applied Economic Research, New Delhi. India Human Development Survey (IHDS), 2005. ICPSR22626-v8. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2010-06-29. http://doi.org/10.3886/ICPSR22626.v8

IHDS II Citation:

Desai, Sonalde, Reeve Vanneman and National Council of Applied Economic Research. India Human Development Survey-II (IHDS-II), 2011-12. Inter-university Consortium for Political and Social Research [distributor], 2018-08-08. https://doi.org/10.3886/ICPSR36151.v6

HDPI Citation:

National Council of Applied Economic Research. 1994. Human Development Profile of India (HDPI). New Delhi.

Why are there so many missing values in the short-term morbidity (SM variables), long-term morbidity (MB variables) and activities of daily living variables?

(This also applies to the smoking, tobacco-chewing and drinking variables in both surveys.)

Questionnaires were designed to reduce respondent burden. Hence, at the start of the section, interviewers recorded the names of the household members who engage in the activity or have the illness. Those who do not are coded as missing so as to excuse them from further questions.

To clarify, here is an example concerning illness. When asking about fever, cough and diarrhea, there was an introduction: “We would like to learn about the health of the various family members in this household, including very young children over the last month. We are interested in three specific illnesses: fever, cough and diarrhea. Has anybody been ill with any of these illnesses in the last month?”

Then the interviewer asked, “Can you tell me names of all those that had this illness?” This allowed the interviewer to list all members suffering from the illness or activity limitation. Hence it can be assumed that those not listed in this section did not have the illness or limitation. Similarly, when the question is about a behavior, those who do not engage in the behavior are coded as missing. PLEASE RECODE THE MISSING VALUES FOR THESE VARIABLES TO ZERO (0).
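The recode asked for above can be sketched as follows. This is a minimal illustration, assuming each person's record is a Python dict where a missing value is `None` (or the key is absent); the column names in the usage are hypothetical, so look up the actual SM, MB and tobacco/drinking variable names in the codebook. Apply it only to these screened variables, not to the data set as a whole.

```python
from typing import Any, Dict, Iterable, List


def recode_unlisted_to_zero(
    rows: Iterable[Dict[str, Any]], columns: List[str]
) -> List[Dict[str, Any]]:
    """Return copies of rows in which a missing value in `columns` becomes 0.

    A value counts as missing if it is None or the key is absent. For the
    screened variables, missing means the person was not listed, i.e. did
    not have the illness or does not engage in the behavior.
    """
    recoded = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's data stay untouched
        for col in columns:
            if new_row.get(col) is None:
                new_row[col] = 0
        recoded.append(new_row)
    return recoded
```

For example, `recode_unlisted_to_zero([{"SM_FEVER": None}], ["SM_FEVER"])` (hypothetical column name) returns `[{'SM_FEVER': 0}]`. With pandas, `df[cols] = df[cols].fillna(0)` does the same job.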

What are the different weight variables?

WT: Sample weight for the household; the most useful weight, used in almost all analyses

FWT: Integer weight (truncated from WT) for STATA routines that require integer weights

INDWT: WT * NPERSONS; this represents the number of individuals in the household, for analyses that require individual-specific weights (e.g. the Head Count Ratio for poverty) when using the household-level file

INDFWT: integer value of INDWT
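To make the relationships among these weights concrete, here is a small sketch that rebuilds FWT, INDWT and INDFWT from WT and NPERSONS and computes a person-weighted head count ratio from household-level records. The `poor` flag is hypothetical, and INDFWT is assumed to be truncated the same way as FWT; check the codebook before relying on either.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Household:
    wt: float      # WT: household sample weight
    npersons: int  # NPERSONS: number of household members
    poor: bool     # HYPOTHETICAL poverty flag, for illustration only


def derived_weights(hh: Household) -> Dict[str, float]:
    """Rebuild FWT, INDWT and INDFWT from WT and NPERSONS."""
    indwt = hh.wt * hh.npersons
    return {
        "FWT": math.trunc(hh.wt),      # integer weight truncated from WT
        "INDWT": indwt,                # WT * NPERSONS
        "INDFWT": math.trunc(indwt),   # assumed truncated, like FWT
    }


def headcount_ratio(households: List[Household]) -> float:
    """Share of persons (not households) who are poor, weighting by INDWT."""
    total = sum(h.wt * h.npersons for h in households)
    poor = sum(h.wt * h.npersons for h in households if h.poor)
    return poor / total
```

Weighting by WT instead of INDWT in `headcount_ratio` would give the share of poor households rather than poor people, which is why individual-level measures such as the head count ratio call for INDWT.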

How do I read the Birth History variables?

If you look at the questionnaire, you will see that birth date is recorded in 4 columns: the first two for the month and the last two for the year. So 497 would be month 04 and year 97.

Please check the questionnaire for each variable of interest. In other cases (e.g. the current age of a child), the question asked for age in years and months, so 1606 would be 16 years and 6 months. Leading 0s are dropped; trailing 0s are not.

Also, 88 is the missing-value code for months and 18 is the missing-value code for calendar years.
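Because leading zeros are dropped, the two parts of such a code can be recovered with integer division and remainder by 100. The sketch below assumes the codes are stored as integers and applies the missing-value codes given above (88 for months, 18 for years); the same split works for years-and-months age codes such as 1606.

```python
from typing import Optional, Tuple

MISSING_MONTH = 88  # missing-value code for months
MISSING_YEAR = 18   # missing-value code for calendar years


def split_two_part_code(code: int) -> Tuple[int, int]:
    """Split a 4-digit code whose leading zero was dropped: 497 -> (4, 97)."""
    return code // 100, code % 100


def decode_birth_date(code: int) -> Tuple[Optional[int], Optional[int]]:
    """Return (month, two-digit year), with None for the missing-value codes."""
    month, year = split_two_part_code(code)
    return (
        None if month == MISSING_MONTH else month,
        None if year == MISSING_YEAR else year,
    )
```

For example, `decode_birth_date(497)` gives `(4, 97)`, and `split_two_part_code(1606)` gives `(16, 6)`, i.e. 16 years and 6 months.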

Are IHDS 1 and IHDS 2 questions strictly comparable?

No. While IHDS 2 follows the same general pattern as IHDS 1 and many of the questions are identical, some questions were changed based on our experience with IHDS 1. Moreover, question numbers and variable names have changed. Users are urged to consult the two sets of questionnaires and compare the question wording before trying to interpret the results.

In comparing the monthly per capita consumption expenditure variable (COPC) between IHDS 1 & 2, why is there such a big change?

Three things are at work here:

1. The biggest difference is that the IHDS 1 variable is monthly per capita consumption, while the IHDS 2 variable is annual. The IHDS 2 variable will be changed back to monthly during the next data update.

2. Price changes: the public IHDS 2 file contains a variable, DEFLATOR, with mean = 0.5453441.

3. Economic growth.

For example, take the IHDS 1 mean of COPC, 955.09, multiply it by 12 months, and divide by the average IHDS 2 deflator, 0.5453441: 955.09 × 12 / 0.5453441 ≈ 21,016. The difference between 21,016 for IHDS 1 and 27,155 for IHDS 2 (both annual, in IHDS 2 prices) is a measure of economic growth.

You will need to do this separately for each household. Note that there is no household-level "inflator" in the IHDS 1 file. Instead, you can deflate each IHDS 2 household by dividing its value by 12 months and multiplying by DEFLATOR. This gives IHDS 2 monthly values in IHDS 1 prices.
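The two conversions described above can be written as a pair of small functions. This is a sketch under the assumption that you pass in one household's DEFLATOR value (or, as in the worked example, the mean IHDS 2 deflator when converting IHDS 1 figures); the function names are illustrative, not part of the IHDS files.

```python
def ihds1_to_ihds2_annual(copc_monthly_ihds1: float, deflator: float) -> float:
    """Monthly IHDS 1 per capita consumption -> annual figure in IHDS 2 prices."""
    return copc_monthly_ihds1 * 12 / deflator


def ihds2_to_ihds1_monthly(copc_annual_ihds2: float, deflator: float) -> float:
    """Annual IHDS 2 per capita consumption -> monthly figure in IHDS 1 prices."""
    return copc_annual_ihds2 / 12 * deflator
```

`ihds1_to_ihds2_annual(955.09, 0.5453441)` returns about 21,016, matching the worked example, and the two functions undo each other when given the same deflator.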

Is it possible to draw an inference about a particular state using IHDS data?

*Cautious* inferences can be made at the state level for large states (or state groups, as in stateid2), but not at the district level. The issue is not so much weighting as sample size and selection. The urban sample is representative only at the state level; the rural sample might be considered more representative at the district (1991 district) level, but sample sizes are small, so drawing conclusions about any one district would be mistaken. Sample sizes at the state level are sometimes small as well, so *cautious* inferences are necessary.

How were the new households in IHDS 2 selected?

Selection information: In urban blocks and rural areas of northeastern states where 5 or more IHDS households were lost to attrition, the interviewers were asked to notify NCAER monitors of this loss.

Once the loss was verified via physical check, a replacement household was randomly selected in the same neighborhood to refresh the sample. This has led to 2134 new households being included in the IHDS 2 sample.

When there is a difference between IHDS 1 and IHDS 2 in demographic indicator data that should not change, which one should I use?

For example, a respondent's age, sex, religion or caste.

Use IHDS 2, which has been cleaned and updated more recently. We believe our second round of data benefits from our experience with the first in terms of procedures and supervision, so IHDS 2 data are preferred.

How were schools and medical facilities selected in IHDS 1 and IHDS 2?

Schools were selected as those most commonly used by village residents, as identified by a village focus group. In many villages, there was only one school.

Why are there different kinds of missing data (blanks, valid blanks, and various numeric codes such as 99, 88 and 18) in the data?

For now, please ignore the different missing value codes and treat all as missing data. Our goal is to distinguish different kinds of missing data in the future, but the current data do not reliably do this.

How are the income quintiles created?

If there are five equal categories, why do they appear unequal?

The quintiles were created using a weighted STATA command. Please note: since income is a household, not an individual, characteristic, the income quintiles are weighted quintiles of households, not of individuals. In addition, there is a zero category that includes negative incomes and incomes below Rs 1,000.
The top quintiles have more individuals because high-income households tend to have more members.
For households with income of Rs 1,000 or greater, if you sort the individual file by household ID and select only the first individual in each household, you will get almost exactly the same number of households in each quintile.
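The idea of weighted household quintiles with a separate zero category can be illustrated with a short sketch. It is not the STATA command used for IHDS (which is not named here); it simply sorts households at or above Rs 1,000 by income and cuts them into five groups holding roughly equal shares of total household weight. Households with tied incomes may fall into different groups in this simple version.

```python
from typing import List, Sequence


def weighted_quintiles(
    incomes: Sequence[float], weights: Sequence[float], floor: float = 1000.0
) -> List[int]:
    """Assign each household a weighted income quintile (1-5), or 0 below `floor`.

    Households at or above `floor` are sorted by income and split into five
    groups with approximately equal shares of the total household weight.
    """
    groups = [0] * len(incomes)  # 0 = negative or below-floor income
    eligible = sorted(
        (i for i, inc in enumerate(incomes) if inc >= floor),
        key=lambda i: incomes[i],
    )
    total = sum(weights[i] for i in eligible)
    cum_before = 0.0
    for i in eligible:
        # Weight share of poorer households decides this household's group.
        groups[i] = min(5, int(5 * cum_before / total) + 1)
        cum_before += weights[i]
    return groups
```

Counting one record per household in each group gives near-equal weighted totals; summing household size per group instead tilts the upper groups upward, because larger households are concentrated there.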

Which weights should be used with which data?

Weights are a complex issue. Our recommendation:

If doing individual cross-sectional analyses, use the appropriate individual survey weight (WT for 2012 and SWEIGHT for 2005).
If doing a panel analysis, the best approximation is to use the 2005 weights rather than the 2012 weights.

How is the DEFLATOR variable constructed?

DEFLATOR is a variable used to adjust for price changes over time in different states. Deflators for rural areas are based on the CPI for Agricultural Wage Labour (CPI-AL); deflators for urban areas are based on the CPI for Industrial Workers (CPI-IW). For interviews that took place before July 2012, it is the ratio of the 2004-05 CPI-AL or CPI-IW to the corresponding 2011-12 index; for interviews that took place after July 2012, it is the ratio of the June-May 2004-05 CPI-AL or CPI-IW to the July-December CPI-AL or CPI-IW, respectively. A ratio in this direction is below 1, consistent with the IHDS 2 mean of about 0.545.

To convert 2004-05 prices to 2011-12 prices, divide by DEFLATOR; to express 2011-12 values in 2004-05 prices, multiply by DEFLATOR.

Can I link medical facility and school data across the two waves to create a facility panel?

Did IHDS survey the same schools and medical facilities in 2004-5 and 2011-12, allowing us to create a facilities panel?

No. There was no intention of surveying the same facilities. While the same schools and clinics may have been surveyed in both rounds in some cases, this was not planned, and no linkages are possible.

How do I link IHDS 1 and HDPI data?

The data set 'panelcrosswork.dta' gives the link between the interview IDs of the two surveys, 1992-93 and IHDS 1 (2004-05). Its variables are:

v2: state code (1992-93)
v3: hhid within a state (1992-93)
stateid: state ID (2004-05)
distid: district ID (2004-05)
psuid: PSU ID (2004-05)
hhid: household ID (2004-05)
hhsplitid: household split ID (2004-05)
idhh: composite ID (2004-05)

Using this link table, you will be able to link the two surveys.
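A minimal sketch of using the link table: build a lookup from each 1992-93 household, keyed by state code `v2` plus within-state household ID `v3`, to its IHDS 1 composite ID(s) `idhh`. It assumes the crosswalk rows are available as Python dicts, for example via `pandas.read_stata("panelcrosswork.dta").to_dict("records")`. Because the table carries `hhsplitid`, an earlier household can presumably map to more than one IHDS 1 household, so each key holds a list.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def build_hdpi_to_ihds1(
    crosswalk: Iterable[Dict[str, Any]]
) -> Dict[Tuple[Any, Any], List[Any]]:
    """Map each 1992-93 household (v2, v3) to its IHDS 1 idhh value(s).

    v3 identifies a household only within a state, so the key pairs it
    with the state code v2.
    """
    mapping: Dict[Tuple[Any, Any], List[Any]] = defaultdict(list)
    for row in crosswalk:
        mapping[(row["v2"], row["v3"])].append(row["idhh"])
    return dict(mapping)
```

With this lookup, each 1992-93 household record can be joined to its IHDS 1 record(s) by `idhh`.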