Case and Geographic Identification


IHDS households are uniquely identified by the combination of stateid + distid + psuid + hhid + hhsplitid (where “+” signifies concatenation, not addition). In the individual file, persons are uniquely identified by the combination of those variables + personid. Several other identification variables are available to assist in sorting and merging files and for identifying geographic areas. See the merging page for help in merging files.
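As a minimal sketch of how the composite key might be used, the Python snippet below builds a tuple key from the five household identifiers and checks that no key repeats. The rows and the helper names (`household_key`, `find_duplicate_keys`) are invented for illustration; only the variable names come from IHDS.

```python
from collections import Counter

# Identifier variables that jointly identify a household (from the text above).
HH_KEY_VARS = ("stateid", "distid", "psuid", "hhid", "hhsplitid")


def household_key(record):
    """Return the composite household key as a tuple of the five id variables."""
    return tuple(record[name] for name in HH_KEY_VARS)


def find_duplicate_keys(records):
    """Return the composite keys that occur more than once in `records`."""
    counts = Counter(household_key(r) for r in records)
    return [key for key, n in counts.items() if n > 1]


# Made-up example rows (not real IHDS data).
households = [
    {"stateid": 1, "distid": 2, "psuid": 1, "hhid": 1, "hhsplitid": 0},
    {"stateid": 1, "distid": 2, "psuid": 1, "hhid": 1, "hhsplitid": 1},
    {"stateid": 1, "distid": 2, "psuid": 1, "hhid": 2, "hhsplitid": 0},
]

duplicates = find_duplicate_keys(households)
print(duplicates)  # an empty list means every household key is unique
```

For the individual file, the same approach would apply with personid appended to the tuple of key variables.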

Variable	Obs	Unique	Mean	Min	Max	Label
stateid	41554	33	18.78	1	34	State code
distid	41554	61	14.69	0	68	District code
psuid	41554	39	5.76	1	39	PSU: village/neighborhood code
hhid	41554	52	9.22	1	52	Household ID
hhsplitid	41554	8	0.41	0	7	Split household ID
caseid	41554	41554	NA	NA	NA	HH id: 11 byte string
idhh	41554	41554	181680422	10201010	340006150	HH id 9-digit unique
idpsu	41554	2474	189288	10201	340006	PSU id 6-digit unique
stateid2	41554	22	483.18	101	733	State codes, collapsed
distname	41554	373	1892.8	102	3400	District codes with names
dist01	41554	61	14.67	0	68	H1sp: District ID Census 2001
urban	41554	2	0.36	0	1	Census: 2001 village/town
metro6	4133	6	2.97	1	6	Largest 6 metro areas 1-6
sweight	41554	1526	4623.48	220	308216.4	Design weights

caseid: Both the hh and ind files contain a string variable named caseid, but the two are different variables. In the hh file, caseid uniquely identifies each household (i.e., = stateid + distid + psuid + hhid + hhsplitid), while in the ind file caseid uniquely identifies each person (i.e., = stateid + distid + psuid + hhid + hhsplitid + personid).

idhh: idhh is a long integer variable that uniquely identifies each household. idhh is calculated as stateid*10000000 + distid*100000 + psuid*1000 + hhid*10 + hhsplitid.
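The formula above can be sketched in Python as follows. The function names `make_idhh` and `split_idhh` are hypothetical, and the decoding step assumes that distid, psuid, and hhid are each below 100 and hhsplitid is below 10, which matches the maxima in the table.

```python
def make_idhh(stateid, distid, psuid, hhid, hhsplitid):
    """Combine the five household identifiers into one integer id."""
    return (stateid * 10_000_000 + distid * 100_000
            + psuid * 1_000 + hhid * 10 + hhsplitid)


def split_idhh(idhh):
    """Recover (stateid, distid, psuid, hhid, hhsplitid) from idhh.

    Assumes distid, psuid, hhid < 100 and hhsplitid < 10.
    """
    rest, hhsplitid = divmod(idhh, 10)   # last digit
    rest, hhid = divmod(rest, 100)       # next two digits
    rest, psuid = divmod(rest, 100)      # next two digits
    stateid, distid = divmod(rest, 100)  # remaining digits
    return stateid, distid, psuid, hhid, hhsplitid


# Component values chosen so that the result equals the table minimum for idhh.
example_id = make_idhh(stateid=1, distid=2, psuid=1, hhid=1, hhsplitid=0)
print(example_id)
print(split_idhh(example_id))
```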

idpsu: idpsu is a long integer variable that uniquely identifies each primary sampling unit (PSU). It is useful for identifying survey clusters in some statistical analyses. idpsu is calculated as stateid*10000 + distid*100 + psuid.
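A short Python sketch of the idpsu formula follows. Function names are hypothetical. Because idhh is built from the same leading components, the two stated formulas imply that idpsu equals idhh with its last three digits dropped; the snippet checks this on one made-up combination of identifiers.

```python
def make_idpsu(stateid, distid, psuid):
    """Combine state, district, and PSU codes into one integer PSU id."""
    return stateid * 10_000 + distid * 100 + psuid


def make_idhh(stateid, distid, psuid, hhid, hhsplitid):
    """Household id, following the idhh formula given earlier in this section."""
    return (stateid * 10_000_000 + distid * 100_000
            + psuid * 1_000 + hhid * 10 + hhsplitid)


# Illustrative identifier values, not taken from a real record.
idpsu = make_idpsu(stateid=1, distid=2, psuid=1)
idhh = make_idhh(stateid=1, distid=2, psuid=1, hhid=1, hhsplitid=0)

# Dropping hhid and hhsplitid (the last three digits) leaves the PSU id.
print(idpsu, idhh // 1000)
```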

stateid2: stateid2 is a slightly collapsed version of stateid that groups the 33 states in IHDS into 22 states and state groups. It also sorts the states into a slightly different regional order. Chandigarh is collapsed into Punjab; Daman and Diu and Dadra and Nagar Haveli are both collapsed into Gujarat; Goa is collapsed into Maharashtra; and Pondicherry is collapsed into Tamil Nadu. All Northeast states and Sikkim are treated as a single group.

dist01: District identifiers in IHDS (distid) are generally the census 2001 district identification number. In a small number of cases (317 households), distid does not record the correct census code. The correct 2001 census code is always given in dist01. However, dist01 should not be used for sorting or identifying households since PSU and household ids are not unique within dist01. stateid is always the 2001 census state code.

distname: distname is a 4-digit integer code combining the 2001 census state and district codes. Value labels provide the name of each district.
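The exact layout of distname is not spelled out here. The sketch below assumes the code is the state code times 100 plus the district code, an assumption that is consistent with the minimum (102) and maximum (3400) in the table but is not confirmed by this page. The function name `split_distname` is hypothetical.

```python
def split_distname(distname):
    """Split a combined code into (state code, district code).

    Assumed layout: distname = state * 100 + district.
    """
    return divmod(distname, 100)


print(split_distname(102))
print(split_distname(3400))
```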

urban: This dichotomy identifies every primary sampling unit that was in an urban area as identified by the 2001 census. It differs slightly from the code recorded in the urban/rural variable, id9, on the household survey, which was created from the sampling design. Nineteen PSUs that were rural in the sampling design were changed to urban areas in the 2001 census.

sweight: The IHDS sample is a complex combination of rural and urban samples (see the section on samples). To calculate population estimates for India or for individual states, sweight is needed as a design weight in all analyses.
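To illustrate why the design weight matters, the Python sketch below compares an unweighted mean with a mean weighted by sweight. The values and weights are made up, and `weighted_mean` is a hypothetical helper rather than an IHDS routine. It produces a point estimate only and does not address standard errors.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Made-up rows: an urban indicator (0/1) and a design weight per household.
urban = [1, 0, 0, 1]
sweight = [500.0, 4000.0, 3000.0, 500.0]

unweighted_share = sum(urban) / len(urban)
weighted_share = weighted_mean(urban, sweight)
print(unweighted_share, weighted_share)
```

In this invented example the urban households carry small weights, so the weighted urban share is well below the unweighted one.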

metro6: The six largest metropolitan areas (Mumbai, Delhi, Kolkata, Chennai, Bangalore, and Hyderabad) are identified as codes 1-6 in metro6. Households in all other areas are coded as missing. Metropolitan areas were defined as any district included in the census definition of “urban conglomerates” for each of these six areas. These districts often include both urban and rural areas; all parts are included in the metro6 categories. Delhi is a slight exception to this definition. The Census of India does not allow urban conglomerates to cross state boundaries, although parts of U.P. and Haryana are clearly part of the larger Delhi metropolitan area. To correct for this, Gurgaon district in Haryana and Ghaziabad and Gautam Buddha Nagar districts in U.P. are included as part of the Delhi metropolitan area.
