Case and Geographic Identification
IHDS households are uniquely identified by the combination of stateid + distid + psuid + hhid + hhsplitid (where “+” signifies concatenation, not addition). In the individual file, persons are uniquely identified by the combination of those variables + personid. Several other identification variables are available to assist in sorting and merging files and for identifying geographic areas. See the merging page for help in merging files.
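As a minimal sketch of what this uniqueness means in practice, the Python snippet below treats the five identifiers as a tuple key and reports any key that occurs more than once. The records and the function name `duplicate_keys` are hypothetical and only illustrate the check; they are not IHDS data.

```python
from collections import Counter

# The five variables that together identify a household.
HH_KEY = ("stateid", "distid", "psuid", "hhid", "hhsplitid")


def duplicate_keys(records, key_fields=HH_KEY):
    """Return the composite keys that appear more than once in records."""
    counts = Counter(tuple(rec[f] for f in key_fields) for rec in records)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical household records (not real IHDS data).
households = [
    {"stateid": 1, "distid": 2, "psuid": 1, "hhid": 1, "hhsplitid": 0},
    {"stateid": 1, "distid": 2, "psuid": 1, "hhid": 1, "hhsplitid": 1},
    {"stateid": 1, "distid": 2, "psuid": 1, "hhid": 2, "hhsplitid": 0},
]

# The full five-variable key has no duplicates.
print(duplicate_keys(households))
# Dropping hhsplitid makes the first two records collide.
print(duplicate_keys(households, ("stateid", "distid", "psuid", "hhid")))
```

The second call shows why all five variables are needed: without hhsplitid, split households share a key. For the individual file, the same check would apply with personid appended to the key.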
Variable | Obs | Unique | Mean | Min | Max | Label |
stateid | 41554 | 33 | 18.78 | 1 | 34 | State code |
distid | 41554 | 61 | 14.69 | 0 | 68 | District code |
psuid | 41554 | 39 | 5.76 | 1 | 39 | PSU: village/neighborhood code |
hhid | 41554 | 52 | 9.22 | 1 | 52 | Household ID |
hhsplitid | 41554 | 8 | 0.41 | 0 | 7 | Split household ID |
caseid | 41554 | 41554 | NA | NA | NA | HH id: 11 byte string |
idhh | 41554 | 41554 | 181680422 | 10201010 | 340006150 | HH id 9-digit unique |
idpsu | 41554 | 2474 | 189288 | 10201 | 340006 | PSU id 6-digit unique |
stateid2 | 41554 | 22 | 483.18 | 101 | 733 | State codes, collapsed |
distname | 41554 | 373 | 1892.8 | 102 | 3400 | District codes with names |
dist01 | 41554 | 61 | 14.67 | 0 | 68 | H1sp: District ID Census 2001 |
urban | 41554 | 2 | 0.36 | 0 | 1 | Census: 2001 village/town |
metro6 | 4133 | 6 | 2.97 | 1 | 6 | Largest 6 metro areas 1-6 |
sweight | 41554 | 1526 | 4623.48 | 220 | 308216.4 | Design weights |
caseid: Both the hh and ind files contain a string variable named caseid, but the two are different variables. In the hh file, caseid uniquely identifies each household (i.e., = stateid + distid + psuid + hhid + hhsplitid), while in the ind file caseid uniquely identifies each person (i.e., = stateid + distid + psuid + hhid + hhsplitid + personid).
idhh: idhh is a long integer variable that uniquely identifies each household. idhh is calculated as stateid*10000000 + distid*100000 + psuid*1000 + hhid*10 + hhsplitid.
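The formula above can be sketched in Python as follows. The function names `make_idhh` and `split_idhh` are hypothetical, and the component values in the example are illustrative, chosen so that the results match the minimum and maximum idhh in the table. The decoding step assumes distid, psuid, and hhid stay below 100 and hhsplitid below 10, which holds for the ranges listed in the table.

```python
def make_idhh(stateid, distid, psuid, hhid, hhsplitid):
    """Combine the five household identifiers into the 9-digit idhh."""
    return (stateid * 10000000 + distid * 100000
            + psuid * 1000 + hhid * 10 + hhsplitid)


def split_idhh(idhh):
    """Recover the five components from idhh.

    Assumes distid, psuid, hhid < 100 and hhsplitid < 10.
    """
    rest, hhsplitid = divmod(idhh, 10)
    rest, hhid = divmod(rest, 100)
    rest, psuid = divmod(rest, 100)
    stateid, distid = divmod(rest, 100)
    return stateid, distid, psuid, hhid, hhsplitid


print(make_idhh(1, 2, 1, 1, 0))    # 10201010
print(make_idhh(34, 0, 6, 15, 0))  # 340006150
print(split_idhh(340006150))       # (34, 0, 6, 15, 0)
```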
idpsu: idpsu is a long integer variable that uniquely identifies each primary sampling unit (PSU). It is useful for identifying survey clusters in some statistical analyses. idpsu is calculated as stateid*10000 + distid*100 + psuid.
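A short Python sketch of this calculation is given below. The function names are hypothetical. The second function relies on an observation rather than a documented rule: because hhid*10 + hhsplitid stays below 1000 for the ranges in the table, integer division of idhh by 1000 should give the same value as the idpsu formula.

```python
def make_idpsu(stateid, distid, psuid):
    """Combine state, district, and PSU codes into the 6-digit idpsu."""
    return stateid * 10000 + distid * 100 + psuid


def idpsu_from_idhh(idhh):
    """Derive idpsu from idhh by dropping the last three digits.

    Assumes hhid * 10 + hhsplitid < 1000 (true for the table's ranges).
    """
    return idhh // 1000


print(make_idpsu(1, 2, 1))         # 10201
print(make_idpsu(34, 0, 6))        # 340006
print(idpsu_from_idhh(340006150))  # 340006
```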
stateid2: stateid2 is a slightly collapsed version of stateid that creates 22 states and state groups from the 33 states in IHDS. stateid2 also sorts the states into a slightly different regional order. Chandigarh is collapsed into Punjab; Daman and Diu and Dadra and Nagar Haveli are both collapsed into Gujarat; Goa is collapsed into Maharashtra; and Pondicherry is collapsed into Tamil Nadu. All Northeast states and Sikkim are treated as a single group.
dist01: District identifiers in IHDS (distid) are generally the 2001 census district identification numbers. In a small number of cases (317 households), distid does not record the correct census code. The correct 2001 census code is always given in dist01. However, dist01 should not be used for sorting or identifying households, since PSU and household ids are not unique within dist01. stateid is always the 2001 census state code.
distname: distname is a 4 digit integer code combining the 2001 census state and district codes. Value labels provide the name of each district.
urban: This dichotomy identifies every primary sampling unit that was in an urban area as identified by the 2001 census. It differs slightly from the urban/rural variable id9 in the household survey, which was created from the sampling design: 19 PSUs that were rural in the sampling design were changed to urban areas in the 2001 census.
sweight: The IHDS sample is a complex combination of rural and urban samples (see the section on samples). To calculate population estimates for India or for individual states, sweight is needed as a design weight in all analyses.
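To illustrate how a design weight changes an estimate, here is a minimal Python sketch of a weighted mean. The function name `weighted_mean`, the indicator `owns_item`, and all data values are hypothetical and not taken from IHDS. The sketch covers point estimates only; it does not address standard errors, which would also need to account for the clustered design.

```python
def weighted_mean(values, weights):
    """Return the mean of values, weighting each by its design weight."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical data: a 0/1 indicator and a design weight per household.
owns_item = [1, 0, 0, 1]
sweight = [220.0, 4600.0, 4600.0, 580.0]

unweighted = sum(owns_item) / len(owns_item)
weighted = weighted_mean(owns_item, sweight)
print(unweighted, weighted)
```

In this made-up example the households with the indicator set carry small weights, so the weighted share is much lower than the unweighted share.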
metro6: The six largest metropolitan areas (Mumbai, Delhi, Kolkata, Chennai, Bangalore, and Hyderabad) are identified as codes 1-6 in metro6. Households in all other areas are coded as missing. Metropolitan areas were defined as any district included in the census definition of “urban conglomerates” for each of these six areas. These districts often include both urban and rural areas; all parts are included in the metro6 categories. Delhi is a slight exception to this definition. The Census of India does not allow urban conglomerates to cross state boundaries, although parts of U.P. and Haryana are clearly part of the larger Delhi metropolitan area. To correct for this, Gurgaon district in Haryana and Ghaziabad and Gautam Buddha Nagar districts in U.P. are included as part of the Delhi metropolitan area.