NYC Interactive Health Data (EpiQuery)
What is EpiQuery?
EpiQuery is a web-based, user-friendly system designed to provide users with health data from a variety of sources. EpiQuery is separated into different modules that are based on health datasets with varying topics and indicators for different NYC populations. Users select the data source and topic of interest and the system runs real-time analyses. EpiQuery offers prevalence estimates with confidence intervals, rates over time, bar charts, neighborhood maps, and much more. Results can be downloaded to a spreadsheet or another statistical program. The EpiQuery Data/Feature Summary provides an overview of the data available by module.
Tips for using NYC Interactive Health Data (EpiQuery)
Step 1: Formulate the question. What health condition or behavior is of interest? During what time period? Among what population? For example:
- Community Health Survey (CHS) data: You might want to know the number and ages of NYC adults who are current smokers and are overweight.
- Youth Risk Behavior Survey (YRBS) data: You might want to know the percentage of public high school boys and girls in different age groups who attempted suicide in 2003.
- Vital Statistics data: You might want to know the rate of live births in 2007 among mothers residing in Manhattan.
- Communicable Disease Surveillance System (CDSS) data: You might want to know the number of reported cases of West Nile virus over the past 10 years.
Consult the EpiQuery Data/Features Summary to figure out the module that will best fit your needs.
Step 2: Review existing data tabulations. Before performing any analyses, analysts should review existing Health Department data reports on the topic of interest.
These reports answer many questions and often suggest strategies for further research and application of data. Even without additional analyses, many individuals have been able to use results from these standard reports for implementing health promotion activities and establishing future priorities.
Step 3: Develop a plan. If additional analyses are proposed, analysts should first develop a plan that:
- Specifies the purpose of the analysis.
- Indicates the specific data needed for analysis (e.g., Community Health Survey data or Communicable Disease Surveillance System data).
- Projects outcomes (formulates hypotheses) and suggests how the results will be used in specific activities, such as intervention or screening programs.
Step 4: Conduct the analysis. Use the EpiQuery system to conduct the analysis, selecting a data source that has data on the topic of interest for the population of interest during the time period of interest. Step-by-step instructions for using EpiQuery are given below.
Step 5: Use the data for decision-making. Once the data have been analyzed, you can consider how the results will be applied and disseminated.
Can I see an example of EpiQuery with Community Health Survey data?
Suppose you wanted to know the proportion of Hispanic females with diabetes compared with Hispanic males in New York City.
Step 1: Formulate the question: What is the prevalence of diabetes among Hispanic women compared with Hispanic men in NYC?
Step 2: See if there have been any previous publications that have the data in which you are interested. You first check the publications on the Dept. of Health website and find that there are several on diabetes. After you review them, you realize that they do not give the prevalence of diabetes for Hispanic women.
Step 3: Develop a plan to specify the desired analysis. You would like to use a data source that has New York City population data about diabetes and includes gender and ethnicity data. You want the most recently available data. Further, you specify that you want to compare adult Hispanic females and males because you think one group might have more diabetes than the other and you want to target your education program at the needier group.
Step 4: Go to the EpiQuery website (www.nyc.gov/health/epiquery). Select a data source that has diabetes information (i.e., the Community Health Survey) and the most recent year of the survey that includes diabetes. Then select "Diabetes ever" from the list of chronic conditions and click "Submit" at the bottom of the page.
This brings up a table with the survey question and the overall prevalence of self-reported diabetes in NYC for the year you selected (see above). Since you want to get the prevalence by sex and race/ethnicity, choose the "select up to two demographic subgroups" button from the blue part of the screen, and then choose "sex" from one box and "race/ethnicity" from the other. Click "Submit".
RESULTS BY SEX AND RACE/ETHNICITY:
At the bottom of the table you can find the prevalence of self-reported diabetes among Hispanic women.
You can then return to the main diabetes results page by pushing the back button. You can submit a new query to show the distribution of self-reported diabetes prevalence by neighborhood or to show results by another health indicator (for example, prevalence of ever having diabetes by high cholesterol).
Step 5: An example decision from the data. You see that the prevalence of diabetes is similar between Hispanic females and males (13.9% and 14%, respectively, in 2011), so you decide your diabetes education efforts should target both equally.
How can I see trends over time?
Several data sources available in EpiQuery can provide trend data for select measures, including the Community Health Survey (CHS), the Youth Risk Behavior Survey (YRBS), the Communicable Disease Surveillance System (CDSS), STD Surveillance data, and Vital Statistics data. Select "trends" from each module to see all years of available data for a particular measure. There must be at least three years of data available in order to display a trend.
What population is used to calculate rates in EpiQuery?
Several of the modules on EpiQuery provide rates based on administrative or surveillance data, such as the Communicable Disease Surveillance Data or Vital Statistics death data. In each of these cases, the population denominators used to calculate rates are documented in the "information" provided for each module. For more information, go to the EpiQuery homepage and click on the "information" link for any of the modules of interest.
Can I refine Community Health Survey EpiQuery results to include demographic subgroups within neighborhoods?
Currently it is not possible to do this for Community Health Survey data on EpiQuery; however, such data are available. First, if you are interested in data for a specific neighborhood or community, check existing data publications to see if any contain the information you need. If you do not find what you need there, submit a query to the Epi Services Team (firstname.lastname@example.org) or download a public use CHS dataset and perform the query using a statistical software package. Because the sample size is smaller at the neighborhood level, it may be necessary to combine years of data. Contact email@example.com to discuss further.
How do I get EpiQuery output into a word processor or spreadsheet on my computer?
It is possible to select, copy and paste the output from any EpiQuery into a word processor or spreadsheet program, such as MS Word or Excel.
In addition, many EpiQuery result tables can be saved in comma-separated values (CSV) format. To do this, click the "Download Results (CSV)" button on the EpiQuery results page. Once saved, the file can be opened in most spreadsheet programs, such as Excel, as well as statistical software packages like SAS, SPSS or Stata.
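A downloaded CSV file can also be read with a general-purpose language. The following is a minimal Python sketch, not an official tool. The column names (group, prevalence_pct) and the file name in the usage note are hypothetical; check the header row of your own download, since the actual layout may differ. The two sample values are the 2011 figures quoted in the diabetes example above.

```python
import csv
import io

def read_results(handle):
    """Read result rows from an open CSV file and convert prevalence to a number.

    Assumes hypothetical columns named 'group' and 'prevalence_pct'.
    """
    rows = []
    for row in csv.DictReader(handle):
        row["prevalence_pct"] = float(row["prevalence_pct"])
        rows.append(row)
    return rows

# Stand-in for a downloaded file, using the percentages quoted in the example.
SAMPLE = (
    "group,prevalence_pct\n"
    "Hispanic women,13.9\n"
    "Hispanic men,14.0\n"
)

rows = read_results(io.StringIO(SAMPLE))
for row in rows:
    print(f"{row['group']}: {row['prevalence_pct']}%")
```

To read a saved file instead of the sample text, open it first, for example with open("epiquery_results.csv", newline="") as handle, and pass the handle to read_results.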
What exactly are age-adjusted estimates? Why are they useful? When are they not useful?
Within EpiQuery, all age-adjusted estimates have been standardized to the Year 2000 U.S. Standard Population. Most health outcomes and behaviors are related to age. Epidemiologists use age adjustment to compare the attributes of two or more groups whose age distributions may be different. Take the following examples:
- The prevalence of diabetes and high blood pressure, and the overall death rate all increase with age. In addition, females and males have different age distributions - e.g., there is a larger proportion of older adults among women than among men because women live longer on average. It may therefore be important to adjust for age when comparing the prevalence of high blood pressure between males and females. Age adjustment ensures that any differences in high blood pressure found between the two groups are not because there are more older women than older men, but because there is a true difference between the sexes in high blood pressure. If we did not adjust and found a higher prevalence of high blood pressure among women compared with men, then this difference may be attributed to the women (on average) being older than the men.
- To compare the death rate between two NYC neighborhoods, an epidemiologist would want to account for the fact that the population of one neighborhood might be older or younger than the other. Age-adjusted estimates are created by calculating the death rate for each age group in the neighborhood, multiplying each of those rates by the proportion of that age group in a standard population, and then adding the "weighted" age-specific rates together to obtain an overall rate (a weighted average) for the neighborhood that has been adjusted to the age distribution of the standard population. Thus, each neighborhood's age-adjusted death rate is based on the same age distribution - the distribution in the standard population. The Year 2000 U.S. Standard Population is used at the Health Department, as recommended by the Department of Health and Human Services. Removing the effect of age in this manner when comparing the age-adjusted death rates allows the conclusion that any difference in death rate between the two neighborhoods is not likely to be due to age differences, but to some other factor. Conversely, when a difference observed in the unadjusted, or crude, death rate between two neighborhoods is no longer evident after adjusting for age, it is likely that the difference in the unadjusted rates is due to age differences.
When not to age adjust: There are some instances when age adjustment is not desired or appropriate. Specifically, if a researcher is only interested in outcome differences between populations, but not the reasons for those differences (for example, when performing needs assessment for service delivery), a comparison and evaluation of the unadjusted rates may be more appropriate. Also, if a researcher is interested in examining the effect of age on a health-related outcome, it may be important to adjust for factors other than age to assess the true impact of age on the outcome of interest.
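The age adjustment calculation described in the neighborhood example can be sketched in a few lines of Python. This is an illustration only, not the Health Department's code. The two neighborhoods, their death counts and populations, and the three age-group weights are all made up for the example; the weights are not the actual Year 2000 U.S. Standard Population proportions.

```python
def crude_rate(deaths, population, per=100_000):
    """Unadjusted rate: total deaths divided by total population."""
    return sum(deaths.values()) / sum(population.values()) * per

def age_adjusted_rate(deaths, population, standard_weights, per=100_000):
    """Weighted sum of age-specific rates, using standard population weights."""
    total_weight = sum(standard_weights.values())
    adjusted = 0.0
    for age_group, weight in standard_weights.items():
        age_specific_rate = deaths[age_group] / population[age_group]
        adjusted += age_specific_rate * (weight / total_weight)
    return adjusted * per

# Hypothetical standard population proportions (illustrative only).
WEIGHTS = {"0-44": 0.65, "45-64": 0.22, "65+": 0.13}

# Two made-up neighborhoods with identical age-specific death rates
# but different age distributions: A is younger, B is older.
a_pop = {"0-44": 8000, "45-64": 1500, "65+": 500}
a_deaths = {"0-44": 8, "45-64": 9, "65+": 20}
b_pop = {"0-44": 5000, "45-64": 3000, "65+": 2000}
b_deaths = {"0-44": 5, "45-64": 18, "65+": 80}

for name, deaths, pop in [("A", a_deaths, a_pop), ("B", b_deaths, b_pop)]:
    crude = crude_rate(deaths, pop)
    adjusted = age_adjusted_rate(deaths, pop, WEIGHTS)
    print(f"Neighborhood {name}: crude {crude:.0f}, age-adjusted {adjusted:.0f} per 100,000")
```

In this made-up case the crude rates differ widely (370 versus 1,030 per 100,000) only because neighborhood B is older. Once both are weighted to the same standard age distribution, the age-adjusted rates are equal (about 717 per 100,000), which is the situation described in the last sentence of the neighborhood example.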
For a more detailed discussion of prevalence estimates and age adjustment, take the CDC training module on age standardization.
What is a Prevalence Rate? A 95% Confidence Interval?
Prevalence of a disease or health outcome is defined as the percent of persons with the disease or outcome in a population. Prevalence refers to the current disease/outcome status in the population. For example, the estimated prevalence of self-reported diabetes among persons aged 18 and older from CHS 2011 was 10.4% (unadjusted for age).
Because the prevalence is estimated from a sample of roughly 9,000 persons, there is possible error in the estimate. The 95% confidence interval gives a sense of the range of error in the estimate. For example, the estimated prevalence of self-reported diabetes above was 10.4%, with a 95% confidence interval of 9.4% to 11.5%. The 95% confidence interval tells us that if we were to repeat the sampling process many times, about 95% of the resulting confidence intervals would contain the true value. When 95% confidence intervals are very wide, we have less confidence in our estimates than when they are narrow. 95% confidence intervals tend to be wider when the sample size is small and narrower when the sample size is large.
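The relationship between sample size and interval width can be illustrated with a short Python sketch. It uses the textbook normal-approximation formula for a simple random sample, which is an assumption made here for illustration and not necessarily how EpiQuery computes its intervals. The function name and the smaller sample size of 900 are hypothetical.

```python
import math

def simple_interval(prevalence, sample_size, z=1.96):
    """Approximate 95% confidence interval for a proportion.

    Assumes a simple random sample; survey weighting is ignored.
    """
    standard_error = math.sqrt(prevalence * (1 - prevalence) / sample_size)
    margin = z * standard_error
    return prevalence - margin, prevalence + margin

# Prevalence of 10.4% with the full sample and with a hypothetical smaller one.
for n in (9000, 900):
    low, high = simple_interval(0.104, n)
    print(f"n={n}: {low * 100:.1f}% to {high * 100:.1f}%")
```

With 9,000 respondents this simplified formula gives an interval of roughly 9.8% to 11.0%, somewhat narrower than the reported 9.4% to 11.5%. A likely explanation is that reported survey estimates account for weighting and sample design, which this sketch does not. With only 900 respondents the interval widens considerably.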
Where do I find data that are not available through EpiQuery? How can I do more complex analyses?
The Health Department provides numerous online data resources, including datasets, data tables and data-focused publications. Links to all of these resources can be found at the Data and Statistics page on the Health Department's website. If you are unable to find the data you are looking for in these resources or on EpiQuery, there are two courses of action for conducting more analysis:
1. Download or request a public use dataset:
2. For vital statistics data (births and deaths) make a special data request to the Office of Vital Statistics; for other data (e.g., Community Health Survey) submit an Epi Services data request by e-mailing firstname.lastname@example.org. Data requests can provide an Excel table of frequencies and prevalences pertaining to a specific health topic for which these data are available. In the data request, please include your name, organization, phone number, e-mail address, and the following information regarding your request: the data/dataset you would like (e.g., Community Health Survey); the year(s) in which you are interested; the variable(s) of interest; any restrictions (sex, age, etc.) you would like to apply to the data; and any cross-tabulations that you would like to see. Data requests are processed according to volume and priority; standard turnaround time is 2-3 weeks.