PSID Geospatial

The Geospatial Data include the identification codes necessary to link address data from the PSID in each year to contextual data from secondary sources such as the Census. As described in the PSID Geocode Match File Data Codebook, geocoding has been conducted using six different Censuses. Addresses from 1968-1985 were geocoded to 1970 and 1980 census identifiers; those from 1968-1999 were geocoded to the 1990 census identifiers; those from 1968-2009 were geocoded to the 2000 census identifiers; those from 1968-2019 were geocoded to the 2010 census identifiers; and starting with the 2021 wave, address information has been geocoded to the 2020 census identifiers. Address data obtained from respondents were geocoded using the SAS proc geocode and proc ginside processes. In 2019, a new geocoding procedure was initiated to prioritize the use of the physical address of the residence. During the past several waves, we found that for a small subset of cases, the U.S. postal mail address provided by the respondent for the mailing of the interview incentive payment was used to determine the geocode, even if a different physical address was provided. The 2019-2023 geocoding process uses the postal mail address only when there is no physical address.
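The coverage rules in the paragraph above can be sketched as a small lookup. This is only an illustration transcribed from the year ranges stated in the text; the names `GEOCODE_COVERAGE` and `census_vintages_for` are hypothetical and do not come from any PSID file or codebook.

```python
from __future__ import annotations

# (first address year, last address year, census vintage), transcribed from
# the paragraph above. The 2020 entry has no end year because the text only
# says "starting with the 2021 wave". Names here are illustrative assumptions.
GEOCODE_COVERAGE = [
    (1968, 1985, 1970),
    (1968, 1985, 1980),
    (1968, 1999, 1990),
    (1968, 2009, 2000),
    (1968, 2019, 2010),
    (2021, None, 2020),
]


def census_vintages_for(address_year: int) -> list[int]:
    """Return the census vintages to which an address from this year was geocoded."""
    return [
        vintage
        for first, last, vintage in GEOCODE_COVERAGE
        if address_year >= first and (last is None or address_year <= last)
    ]


if __name__ == "__main__":
    for year in (1975, 1990, 2005, 2015, 2021):
        print(year, census_vintages_for(year))
```

A lookup like this can help confirm, before requesting data, that the waves in an analysis share at least one common census vintage.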

The Between-Wave Moves File has also been geocoded to the 2020 census identifiers, and is available for 2003-2023. The primary objective of this geocoding project has been to characterize the "neighborhoods" in which respondents live. Census tract is used as the approximation of neighborhood in tracted areas, Block Numbering Areas (BNAs) in blocked but untracted areas (i.e., small cities), and Enumeration Districts (EDs) in areas, usually rural, with neither tracts nor blocks.

Geocode Match File and EHC Between-Wave Moves File
The PSID Geocode Match File contains one record per family unit (location at time of interview), while the EHC Between-Wave Moves File has up to six records for each family (reflecting the moves between waves). Additionally, while the Geocode Match File includes a single address for the family unit, the records in the EHC Between-Wave Moves File are for the Reference Person. That is, no matter who the Respondent is, the questions about current and past residences are asked about the Reference Person only. Both the Geocode Match Files and the EHC Between-Wave Moves Files are available at the Tract, Block-Group, and Block level. For more information on the EHC Between-Wave Moves Files, please see the EHC Between-Wave Moves Documentation.

Employer Geocode Match Files
The 2013-2017 PSID Employer Geocode Match Files contain location information for the employer of Reference Persons and Spouses/Partners who were currently employed at the time of interview. Address data obtained from respondents were geocoded using the SAS proc geocode and proc ginside processes. A more detailed variable-by-variable description of the data is available in the 2013-2017 Employer Geocode Match Files Codebook, as well as in the File Documentation. The data files are released at the Census Tract, Block-Group, and Block Level.

County Born/Grew Up: Head/Reference Person, Wife/Spouse or Partner, and Parents
In addition, county-level data about where PSID individuals and their parents grew up is provided. The County Born/Grew Up Geospatial Codebook describes these variables.

Census Tract, Block-Group, and Block
Three levels of PSID geospatial data are available that users may request with evidence of an appropriate research and data security protection plan. In order of increasing precision, these are Census Tract, Block-Group, and Block. As described below, Census Block is the smallest geographic unit made available by PSID. Several Blocks make up a Block-Group, and several Block-Groups make up a Tract. Because Blocks are so small, the U.S. Census makes available relatively limited data at the Census Block level. Thus, while Block is the smallest and most precise geospatial indicator, its usefulness may be limited.
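The nesting of Blocks within Block-Groups within Tracts is visible in the standard Census Bureau identifier layout for 2000-and-later tabulation blocks: 2 digits for state, 3 for county, 6 for tract, and 4 for block. The sketch below assumes that 15-digit layout; how the PSID files actually store these codes (one string or separate variables, and under what names) is not stated here, so the function names and the sample identifier are hypothetical.

```python
from typing import NamedTuple


class BlockId(NamedTuple):
    state: str   # 2-digit state FIPS code
    county: str  # 3-digit county FIPS code
    tract: str   # 6-digit census tract code
    block: str   # 4-digit block number


def parse_block_geoid(geoid: str) -> BlockId:
    """Split a 15-digit block identifier (assumed layout) into its components."""
    if len(geoid) != 15 or not (geoid.isascii() and geoid.isdigit()):
        raise ValueError(f"expected a 15-digit identifier, got {geoid!r}")
    return BlockId(geoid[0:2], geoid[2:5], geoid[5:11], geoid[11:15])


def tract_geoid(b: BlockId) -> str:
    """11-digit tract identifier: state + county + tract."""
    return b.state + b.county + b.tract


def block_group_geoid(b: BlockId) -> str:
    """12-digit Block-Group identifier: tract identifier + first block digit."""
    return tract_geoid(b) + b.block[0]


if __name__ == "__main__":
    sample = parse_block_geoid("261614001003019")  # made-up example value
    print(sample, tract_geoid(sample), block_group_geoid(sample))
```

Because each coarser identifier is a prefix of the finer one, Block-level data can always be aggregated up to Block-Group or Tract, but not the reverse.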

Researchers who request these data must provide a research plan describing the analytic use of these data. Because Block-Group and Census Block identify very small areas, researchers who request either of these variables must additionally provide an explicit and detailed justification for exactly how and why the research will benefit from having the PSID *Block* identifiers OR *Block-Group* identifiers above and beyond *Tract* identifiers. For example, if the goal of the study is to examine contextual effects on individual outcomes by merging contextual data from the Census to the PSID, the contextual data must themselves be measured at the Block or Block-Group level, and must be specified. If they are not measured at the Block or Block-Group level, then access to the PSID Block or Block-Group data, above and beyond Tract data, may not be warranted. In your statement, please describe the data to which the PSID Block or Block-Group will be merged.
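The requirement that contextual data be measured at the same geographic level as the PSID identifier can be illustrated with a minimal merge sketch. Everything below is hypothetical: the record layout, the field names (`family_id`, `tract`, `poverty_rate`), and the values are made up for illustration and are not PSID or Census data.

```python
# Hypothetical PSID-side records: one row per family unit with a tract identifier.
families = [
    {"family_id": 1, "tract": "26161400100"},
    {"family_id": 2, "tract": "26161400200"},
    {"family_id": 3, "tract": "26163500000"},
]

# Hypothetical contextual table keyed at the SAME level (tract). Values are invented.
tract_context = {
    "26161400100": {"poverty_rate": 0.12},
    "26161400200": {"poverty_rate": 0.08},
}


def attach_context(records, context, key="tract"):
    """Left-join contextual measures onto family records by a shared geographic key."""
    merged = []
    for rec in records:
        row = dict(rec)                 # copy so the input is not modified
        ctx = context.get(rec[key])     # None when no contextual row exists
        row["matched"] = ctx is not None
        if ctx is not None:
            row.update(ctx)
        merged.append(row)
    return merged


if __name__ == "__main__":
    for row in attach_context(families, tract_context):
        print(row)
```

If the contextual table is keyed by tract, a tract identifier on the PSID side is sufficient for this join; a Block or Block-Group identifier adds nothing unless the contextual measures themselves vary at that level.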

The descriptions below provide information about the specific data available from the Census at the level of Block, Block-Group and Census Tract.

Census Block: A Census Block is the smallest geographic unit used by the United States Census Bureau for tabulation of 100-percent data (data collected from all houses, rather than a sample of houses). Several blocks make up Block-Groups, which in turn make up Census Tracts. There are on average about 39 blocks per Block-Group, though the number varies. Blocks typically have a four-digit number whose first digit indicates which Block-Group the Block is in; for example, Block 3019 would be in Block-Group 3. The number of blocks in the United States including Puerto Rico is about 11,770,000.
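The numbering rule above (the first digit of a four-digit block number gives the Block-Group) can be sketched as follows. The helper names are hypothetical, and the sketch assumes four-digit numeric block numbers as described in the text.

```python
from collections import defaultdict


def block_group_of(block_number: str) -> str:
    """Return the Block-Group digit for a four-digit block number."""
    if len(block_number) != 4 or not (block_number.isascii() and block_number.isdigit()):
        raise ValueError(f"expected a four-digit block number, got {block_number!r}")
    return block_number[0]


def group_blocks(block_numbers):
    """Group block numbers within one tract by their Block-Group digit."""
    groups = defaultdict(list)
    for number in block_numbers:
        groups[block_group_of(number)].append(number)
    return dict(groups)


if __name__ == "__main__":
    print(block_group_of("3019"))
    print(group_blocks(["3019", "3000", "1002"]))
```

Block numbers should be kept as strings rather than integers so that any leading zeros are preserved.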

Blocks are typically bounded by streets, roads or creeks. In cities a Census Block may correspond to a city block, but in rural areas where roads are fewer, blocks may be limited by other features. Census Blocks covering the entire country were introduced with the 1990 Census. Prior to this, back to the 1940 Census, only select areas were divided into blocks.

Because particular Census Blocks may consist of small populations, the Census makes available relatively few variables to which Census Block can be linked. Only population and housing characteristic estimates at the Census Block level from the Census Short Form can be used in conjunction with Census Block.

A private vendor, Geolytics, has made the Census Short Form data available at the Block level and provides a list here.

Block-Group: A higher level of geography than Census Block is the statistical subdivision of a Census Tract (or, prior to Census 2000, a block numbering area) called Block-Group. A Block-Group consists of all tabulation blocks whose numbers begin with the same digit in a Census Tract. For example, for Census 2000, Block-Group 3 within a Census Tract includes all blocks numbered from 3000 to 3999. (A few Block-Groups consist of a single block.) Block-Groups generally contain between 300 and 3,000 people, with an optimum size of 1,500 people. Although it is a coarser unit than the Census Block, the Block-Group is the lowest-level geographic entity for which the U.S. Census Bureau tabulates sample data from a decennial census.

Block-Group can be linked to Block-Group-level estimates from the Census Long Form (2000). These estimates are available from private vendors such as Geolytics, which can provide a dataset containing 5,500 variables estimated at the Block-Group level, covering income, housing, employment, language spoken, ancestry, education, poverty, rent, mortgage, commute to work, etc.

Individuals who are applying for Census Block and Census Block-Group data must describe the exact links they plan to make to Census data. This information is described here: Special Information for Requests for Census Block and Block-Group Data.

Census Tract: A small, relatively permanent statistical subdivision of a county or statistically equivalent entity, delineated for data presentation purposes by a local group of census data users or the geographic staff of a regional census center in accordance with U.S. Census Bureau guidelines. Designed to be relatively homogeneous units with respect to population characteristics, economic status, and living conditions at the time they are established, Census Tracts generally contain between 1,000 and 8,000 people, with an optimum size of 4,000 people. Census Tract boundaries are delineated with the intention of being stable over many decades, so they generally follow relatively permanent visible features. However, they may follow governmental unit boundaries and other invisible features in some instances; the boundary of a state or county (or statistically equivalent entity) is always a Census Tract boundary. When data are provided for American Indian entities, the boundary of a federally recognized American Indian reservation and off-reservation trust land is always the boundary of a tribal Census Tract. See block numbering area, tribal Census Tract.

More than 10,000 estimates at the Census Tract level are available from the Census Long Form. Geolytics provides a description here.

The U.S. Census Bureau adjusts some tract boundaries after each decennial census, which may affect the consistent characterization of neighborhood conditions of PSID respondents and make it harder to distinguish true residential moves from changes in tract definitions. The Neighborhood Change Database, constructed through a collaboration of GeoLytics Corporation and the Urban Institute, may mitigate some of these problems by providing tract-level data from the 1970, 1980, 1990, and 2000 censuses based on a consistent set of tract definitions.
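One way to avoid mistaking a boundary redefinition for a move is to compare tract identifiers only when both addresses are coded to the same census vintage. The sketch below assumes each wave's record is a mapping from census vintage to tract identifier; that layout, the function name `tract_changed`, and the sample values are hypothetical illustrations, not the structure of the PSID files.

```python
from __future__ import annotations


def tract_changed(earlier: dict, later: dict, vintage: int) -> bool | None:
    """Compare two waves' tract identifiers under one census vintage.

    Returns True or False when both waves are coded to `vintage`, and None
    when either wave lacks that vintage (no like-for-like comparison exists).
    """
    a = earlier.get(vintage)
    b = later.get(vintage)
    if a is None or b is None:
        return None
    return a != b


# Made-up example: the same address coded to two vintages in one wave,
# and to a single vintage in the next wave.
wave_2009 = {2000: "26161400100", 2010: "26161400101"}
wave_2011 = {2010: "26161400101"}

if __name__ == "__main__":
    print(tract_changed(wave_2009, wave_2011, 2010))
    print(tract_changed(wave_2009, wave_2011, 2000))
```

A comparison like this only removes spurious changes caused by redefined tracts. It cannot detect moves within a tract, and a differing identifier may still reflect geocoding error rather than a move.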

Note: Researchers who obtain these data will also obtain the higher levels of geocode data available in the PSID, including county, zip code, and MSA as well as other variables.