Environmental & Occupational Disease Epidemiology : NYC DOHMH

Epidemiology Services

Frequently Asked Questions

NYC INTERACTIVE HEALTH DATA (EPIQUERY)

 

TIPS FOR USING NYC INTERACTIVE HEALTH DATA (EPIQUERY)
Step 1: Formulate the question. Analysts must first decide on the question to be answered with the data to be used. For example:

  • CHS data: You might want to know the number and ages of residents who are current smokers and are overweight.
  • SPARCS data: You might want to know the number and ages of residents who were hospitalized with HIV/AIDS-related conditions.
  • YRBS data: You might want to know the percentage of boys and girls in different age groups who attempted suicide in 2003.

Step 2: Review existing data tabulations. Before performing any analyses, analysts should review the standard reports on the data.

These reports answer many questions and often suggest strategies for further research and application of data. Even without additional analyses, many individuals have been able to use results from these standard reports for implementing health promotion activities and establishing future priorities.

Step 3: Develop a plan. If additional analyses are proposed, analysts should first develop a plan that:

  • Specifies the purpose of the analysis.
  • Indicates the specific data needed for analysis (e.g., CHS questions, ICD-9 SPARCS codes, etc.).
  • Projects outcomes (formulates hypotheses) and suggests how the results will be used in specific activities, such as intervention or screening programs.

Step 4: Conduct the analysis. Use the EpiQuery system to conduct the analysis. Analysts interested in more complex questions than supported by EpiQuery should consult with a statistician or the Epi Services Team about the software needed for the analysis.

  • CHS data require a statistical package capable of analyzing weighted data (e.g., Epi Info v.6+ or SUDAAN). Calculating confidence intervals around estimates requires use of SUDAAN, Epi Info, or another software package that can account for the complex survey design of the CHS. Make an Epi Services Data Request by contacting the Bureau at survey@health.nyc.gov.
  • For YRBS, if you require a more sophisticated analysis than the EpiQuery system can provide, you should make an Epi Services Data Request by contacting the Epi Services Team directly (survey@health.nyc.gov).
  • For Vital Statistics, World Trade Center Health Registry or Communicable disease data, if you require a more sophisticated analysis than the EpiQuery system can provide, you should consult with a statistician and/or the EpiQuery Team (EpiQuery@health.nyc.gov).
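
As a rough illustration of what "analyzing weighted data" means in Step 4, the sketch below computes a weighted prevalence estimate in Python. The records, the field names (`weight`, `smoker`), and the weights are invented for illustration and are not CHS data. The sketch gives a point estimate only; it does not produce confidence intervals that account for a complex survey design.

```python
# Hypothetical survey records: each respondent carries a sampling weight
# (roughly, the number of residents that respondent represents) and a
# yes/no outcome. None of these values come from the CHS.
records = [
    {"weight": 1200.0, "smoker": True},
    {"weight": 800.0, "smoker": False},
    {"weight": 1500.0, "smoker": False},
    {"weight": 500.0, "smoker": True},
]


def weighted_prevalence(rows, outcome):
    """Return the weighted share of respondents who have the outcome."""
    total = sum(r["weight"] for r in rows)
    cases = sum(r["weight"] for r in rows if r[outcome])
    return cases / total


# Unweighted share treats every respondent equally.
unweighted = sum(r["smoker"] for r in records) / len(records)
# Weighted share lets each respondent count in proportion to the weight.
weighted = weighted_prevalence(records, "smoker")

print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

The two figures differ because the respondents with the outcome carry less total weight than those without it, which is why unweighted tabulations of weighted survey data can mislead.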

Step 5: Use the data for decision-making. Once the data have been analyzed, you can consider how the results will be applied and disseminated.

BACK TO EPIQUERY FAQ

CAN I SEE AN EXAMPLE OF EPIQUERY WITH CHS DATA?
Suppose you wanted to know how common diabetes is among Hispanic women in NYC.

Step 1: Formulate the question: What is the prevalence of diabetes among Hispanic women in NYC?

Step 2: Review existing data tabulations. See whether any previous CHS publication already contains the data you are interested in. You first check the Vital Sign Reports publications and find that there is one written on diabetes. After reviewing it, you realize that it does not give the prevalence of diabetes for Hispanic women. So you go to the CHS EpiQuery website.


Choose "Diabetes" from the middle column and then click "submit".


This brings up a table with the survey question and the overall prevalence of self-reported diabetes in NYC (9.0%). Since you want to get the prevalence by sex and race/ethnicity, choose the "select up to two demographic subgroups" button from the blue part of the screen, and then choose "sex" from one box and "race/ethnicity" from the other. Click "Submit".


At the bottom of the table you can find the prevalence of self-reported diabetes among Hispanic women to be 12%.

You can then return to the main diabetes screen by choosing "Same topic, new refinement" and submit another query, for example to show the distribution of self-reported diabetes prevalence by neighborhood or to compare these 2003 survey results for diabetes with those for 2002.

WHAT POPULATION IS USED TO CALCULATE RATES IN EPIQUERY?
All rates are calculated using the U.S. Census 2000 data for New York City.

CAN I REFINE CHS EPIQUERY RESULTS TO INCLUDE DEMOGRAPHIC SUBGROUPS WITHIN NEIGHBORHOODS?
Currently it is not possible to do this with EpiQuery; however, such data are available. First, if you are interested in data for a specific neighborhood or community, check the Community Health Profile for that neighborhood or community (NYC Community Health Profiles) to see if it contains the information you need. If you do not find what you need there, submit a query to the Epi Services Team (epidatarequest@health.nyc.gov) or download a public use CHS dataset and perform the query using a statistical software package.
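
For readers who take the public use dataset route, the following is a minimal sketch of a weighted prevalence calculation broken down by neighborhood and a demographic subgroup. The rows, field names, and weights are hypothetical and do not reflect the actual layout of the CHS public use files. Estimates for small subgroups can be unstable, and this sketch does not compute standard errors.

```python
from collections import defaultdict

# Hypothetical microdata rows; field names and values are illustrative only.
rows = [
    {"neighborhood": "A", "sex": "F", "weight": 1000.0, "diabetes": True},
    {"neighborhood": "A", "sex": "F", "weight": 3000.0, "diabetes": False},
    {"neighborhood": "A", "sex": "M", "weight": 2000.0, "diabetes": False},
    {"neighborhood": "B", "sex": "F", "weight": 1500.0, "diabetes": True},
    {"neighborhood": "B", "sex": "M", "weight": 500.0, "diabetes": True},
    {"neighborhood": "B", "sex": "M", "weight": 1500.0, "diabetes": False},
]


def prevalence_by_group(data, keys, outcome):
    """Weighted prevalence of `outcome` for each combination of `keys`."""
    totals = defaultdict(float)  # sum of weights per group
    cases = defaultdict(float)   # sum of weights of cases per group
    for r in data:
        group = tuple(r[k] for k in keys)
        totals[group] += r["weight"]
        if r[outcome]:
            cases[group] += r["weight"]
    return {g: cases[g] / totals[g] for g in totals}


result = prevalence_by_group(rows, ["neighborhood", "sex"], "diabetes")
for group, prev in sorted(result.items()):
    print(group, f"{prev:.1%}")
```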

HOW DO I GET EPIQUERY OUTPUT INTO A WORD PROCESSOR OR SPREADSHEET ON MY COMPUTER?
It is possible to select, copy and paste the output from any EpiQuery into a word processor or spreadsheet program, such as MS Word or Excel.

In addition, CHS, YRBS and Census EpiQuery result tables can be saved in comma-separated values (CSV) format. To do this, click the "download results to CSV" button on the EpiQuery results page. Once saved to disk, the file can be opened in most spreadsheet programs, such as Excel, as well as in statistical software packages like SAS, SPSS or STATA.
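
A saved CSV file can also be read programmatically. The sketch below uses Python's standard `csv` module on an in-memory stand-in for a downloaded file. The column names (`group`, `percent`, `lower_ci`, `upper_ci`) and the values are placeholders made up for this example; the actual EpiQuery export layout may differ.

```python
import csv
import io

# Stand-in for a file saved from the results page. The column names and
# values are assumptions for illustration, not the real export format.
sample_csv = io.StringIO(
    "group,percent,lower_ci,upper_ci\n"
    "Male,8.5,7.5,9.5\n"
    "Female,9.4,8.5,10.3\n"
)


def load_results(handle):
    """Read result rows, converting every column except `group` to float."""
    reader = csv.DictReader(handle)
    return [
        {k: (v if k == "group" else float(v)) for k, v in row.items()}
        for row in reader
    ]


table = load_results(sample_csv)
for row in table:
    print(row["group"], row["percent"])
```

With a real file on disk, the same function could be called as `load_results(open("results.csv", newline=""))`, where `results.csv` is a hypothetical file name.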

HOW DO I COMPLETE A DATA REQUEST FROM THE CHS DATASET THAT IS MORE COMPLEX THAN THE EPIQUERY SYSTEM CAN PROVIDE?
The best way to do this is to submit an Epi Services Data Request by emailing epidatarequest@health.nyc.gov, or to download a CHS public use dataset.

WHAT EXACTLY ARE AGE-ADJUSTED ESTIMATES? WHY ARE THEY USEFUL? WHEN ARE THEY NOT USEFUL?
Within EpiQuery, all age-adjusted estimates have been standardized to the Year 2000 U.S. Standard Population.  Most health outcomes and behaviors are related to age. Epidemiologists use age adjustment to compare the attributes of two or more groups whose age distributions may be different. Take the following examples:

  1. The prevalence of smoking, individuals' cholesterol levels, and death rate all increase with age. In addition, persons in various groups sampled in CHS may have different age distributions than others. For example, females interviewed in the CHS are slightly older than males. So if you wanted to compare the smoking rates of men and women in NYC, you would have to take into account the different age distributions.

  2. For example, to compare the death rate between two NYC neighborhoods, an epidemiologist would want to account for the fact that the population of one neighborhood might be older or younger than the other. Age-adjusted estimates would be calculated by applying the death rate for each age group within the two populations to a standard population (such as the whole NYC population). Thus, overall death rate is computed twice for the standard population using the death rates derived from each of the two comparison neighborhoods; these two death rates are then used for comparison between the two original populations and are the age-adjusted rates. Removing the effect of age in this manner and comparing the age-adjusted death rates allows the conclusion that any difference in death rate between the two neighborhoods is not likely to be due to age differences, but to some other factor. When a difference observed in the unadjusted, or crude, death rate between two neighborhoods is no longer evident after adjusting for age, it is likely that the difference in the unadjusted rates is due to age differences. The age-adjustment technique is not limited to neighborhoods. For example, it may be important to adjust for age when comparing the rates of elevated cholesterol between males and females.

  3. Some groups of persons hospitalized in New York City may have different age distributions than others (e.g., males vs. females, or persons in Manhattan vs. Brooklyn). Thus, if you want to compare the rate of hospital discharges due to colon cancer between two NYC neighborhoods, you would want to account for the fact that the population of one neighborhood might be older or younger than the other. Age-adjusted estimates are calculated by applying the hospital discharge rate for each age group in two populations to a standard population (such as the whole NYC population). The overall rate of discharges for colon cancer is computed two times in the standard population using the hospital discharge rates derived from each of the two comparison neighborhoods. These rates are the age-adjusted rates. Removing the effect of age in this manner and comparing the age-adjusted hospital discharge rates for colon cancer allows us to say that any differences in rates between the two neighborhoods are likely not due to age differences, but to some other factor. When a difference observed in the unadjusted (crude) discharge rate between two neighborhoods goes away after adjusting for age, it is likely that the difference in the unadjusted rates between the two populations is due to age differences. The age-adjustment technique is not limited to neighborhoods. For example, it may be important to adjust for age when comparing the rates of discharges due to hip fracture between male and female New Yorkers.

There are some instances when age adjustment is not desired or appropriate. Specifically, if a researcher is only interested in outcome differences between populations, but not the reasons for those differences (for example, when performing needs assessment for service delivery), a comparison and evaluation of the unadjusted rates may be more appropriate. Also, if a researcher is interested in examining the effect of age on a health-related outcome, it may be important to adjust for factors other than age to assess the true impact of age on the outcome of interest.
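
The age-adjustment procedure described in the examples above can be sketched in Python as follows. All numbers are made up for illustration: the age-specific death rates, the neighborhood populations, and the three-group "standard" population are hypothetical, and this sketch does not use the Year 2000 U.S. Standard Population. The two neighborhoods are given identical age-specific rates so that the effect of age structure alone is visible.

```python
# Made-up age-specific death rates (deaths per 1,000 persons per year)
# and population counts for two neighborhoods; not real NYC data.
rates_a = {"0-44": 1.0, "45-64": 8.0, "65+": 50.0}
rates_b = {"0-44": 1.0, "45-64": 8.0, "65+": 50.0}  # same rates as A
pop_a = {"0-44": 7000, "45-64": 2000, "65+": 1000}  # younger neighborhood
pop_b = {"0-44": 4000, "45-64": 3000, "65+": 3000}  # older neighborhood

# Hypothetical standard population shared by both comparisons.
standard = {"0-44": 6000, "45-64": 2500, "65+": 1500}


def overall_rate(rates, population):
    """Weight each age-specific rate by that age group's population share."""
    total = sum(population.values())
    return sum(rates[g] * population[g] for g in population) / total


# Crude rates use each neighborhood's own age distribution.
crude_a = overall_rate(rates_a, pop_a)
crude_b = overall_rate(rates_b, pop_b)

# Age-adjusted rates apply each neighborhood's rates to the standard population.
adjusted_a = overall_rate(rates_a, standard)
adjusted_b = overall_rate(rates_b, standard)

print(f"crude:    A={crude_a:.1f}  B={crude_b:.1f}")
print(f"adjusted: A={adjusted_a:.1f}  B={adjusted_b:.1f}")
```

In this constructed case the crude rates differ although the age-specific rates are identical, and the difference disappears after adjustment, which is the pattern the text attributes to age differences between the populations.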

For a more detailed discussion of prevalence estimates and age adjustment, take the CDC training module on prevalence and surveys.

HOW DO I DO AN ANALYSIS THAT COMBINES MORE THAN ONE QUESTION OR SURVEY MEASURE?
Currently, researchers in the Bureau of Epidemiology Services are working on adding this capability to Online EpiQuery. However, since this is not yet available, there are two courses of action for conducting more complicated analyses: (1) for CHS analyses, download the public use dataset; or (2) submit an Epi Services Data Request by emailing survey@health.nyc.gov.

Persons interested in analyzing the CHS data should consult statisticians and epidemiologists who have expertise in the analysis of complex survey data. A useful reference for using the CHS data and other chronic disease data is Using Chronic Disease Data: A Handbook for Public Health Practitioners (Centers for Disease Control [CDC], 1992). Consultations on CHS data are available with staff epidemiologists in the Division of Epidemiology Services on request. Questions about a specific software package can be answered by the manufacturer's support technicians.

The basic requirement for an acceptable statistical analysis software package is its ability to produce frequencies and allow for weighting of data. Commonly used PC-based software programs include, but are not limited to, SAS, SPSS, Epi Info, and Epistat; however, most statistical software packages assume that survey data are obtained using simple random sampling. For surveys using any design other than simple random sampling, this assumption produces standard errors for prevalence estimates and 95% confidence limits for odds ratios that are incorrect (too small). To calculate standard errors and 95% confidence limits correctly, use a software package that accounts for the complex BRFSS sample design. PC-oriented software packages that meet this requirement include Epi Info (version 6.0+), SUDAAN, and Stata.
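
To give a feel for why standard errors computed under a simple random sampling assumption come out too small, the sketch below uses Kish's approximate design effect for unequal weighting. This is a swapped-in textbook approximation, not the method used by the packages named above: it reflects only the variation in weights and ignores stratification and clustering, so it is no substitute for design-based software. The weights and the estimate of 9% are hypothetical.

```python
import math


def kish_design_effect(weights):
    """Kish's approximate design effect from unequal weighting alone."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


def srs_standard_error(p, n):
    """Standard error of a proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)


# Hypothetical weights: half the sample counts three times as much.
weights = [1.0] * 50 + [3.0] * 50
p_hat = 0.09  # illustrative prevalence estimate
n = len(weights)

deff = kish_design_effect(weights)
se_srs = srs_standard_error(p_hat, n)
# Inflate the naive standard error by the square root of the design effect.
se_adjusted = se_srs * math.sqrt(deff)

print(f"design effect: {deff:.2f}")
print(f"SE assuming simple random sampling: {se_srs:.4f}")
print(f"SE inflated for unequal weights:    {se_adjusted:.4f}")
```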

WHAT IS A PREVALENCE RATE? A 95% CONFIDENCE INTERVAL?
Prevalence of a disease or health outcome is defined as the number of persons with the disease or outcome in a population. For example, the estimated prevalence of self-reported diabetes among persons aged 18 and older from CHS 2001 was 530,000. The estimated prevalence rate of self-reported diabetes is the prevalence (530,000) divided by the number of persons aged 18 and over (6,048,000), or about 9%.
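
The arithmetic in this example can be checked with a few lines of Python. The function name `prevalence_rate` is chosen here for illustration; the two figures are the ones quoted above.

```python
def prevalence_rate(cases, population):
    """Prevalence rate: persons with the outcome divided by the population."""
    return cases / population


cases = 530_000         # estimated adults with self-reported diabetes
population = 6_048_000  # estimated persons aged 18 and over

rate = prevalence_rate(cases, population)
print(f"{rate:.1%}")  # a little under 9%, which rounds to 9%
```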

Because the prevalence and prevalence rate are estimated from a sample of roughly 10,000 persons, the estimates are subject to sampling error. The 95% confidence interval gives a sense of the range of error in the estimate. For example, the prevalence rate of self-reported diabetes estimated above was 9%, with a 95% confidence interval of 8.3% to 9.7%. Strictly speaking, this means that if we were to repeat the survey many times and calculate a 95% confidence interval each time, about 95% of those intervals would contain the true prevalence rate. The 95% confidence interval is often interpreted as the interval where the true prevalence most likely lies. When 95% confidence intervals are very wide, we have less confidence in our estimates than when they are narrow. Confidence intervals tend to be wider when the sample size is small and narrower when the sample size is large.
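
The behavior of a 95% confidence interval under repeated sampling can be illustrated with a small simulation. The sketch below assumes simple random sampling and uses the normal-approximation (Wald) interval, which is a simplification: it does not account for the complex survey design, so it is not expected to reproduce the interval quoted above. The function names, the simulated sample size, and the number of repetitions are choices made for this example.

```python
import math
import random


def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation 95% interval for a proportion (assumes SRS)."""
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def coverage(true_p, n, repeats, seed=0):
    """Share of simulated surveys whose interval contains the true value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        # Draw one simulated survey of n respondents.
        cases = sum(rng.random() < true_p for _ in range(n))
        low, high = wald_interval(cases / n, n)
        hits += low <= true_p <= high
    return hits / repeats


# Interval for an estimate of 9% from 10,000 respondents under SRS.
low, high = wald_interval(0.09, 10_000)
print(f"SRS interval: {low:.1%} to {high:.1%}")

# Repeat a smaller simulated survey many times and count how often
# the interval captures the true prevalence.
share = coverage(true_p=0.09, n=1_000, repeats=1_000)
print(f"intervals containing the true value: {share:.1%}")
```

Under these assumptions the simple-random-sampling interval is somewhat narrower than 8.3% to 9.7%, which would be consistent with the survey's design widening the interval, although the sketch does not attempt to model that design.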

For a more detailed overview on surveys and their technical aspects, take the CDC Survey Training Module.

WHERE DO I FIND DATA THAT ARE NOT AVAILABLE THROUGH EPIQUERY?
The Health Department provides numerous online data resources, including datasets, data tables and data-focused publications. Links to all of these resources can be found at My Community's Health: Data and Statistics. If you are unable to find the data you are looking for in these resources or on EpiQuery, you can make a data request. Data requests can provide Excel tables of frequencies and prevalences pertaining to a specific health topic for which these data are available. In the data request, please include your name, organization, phone number, e-mail address, and the following information regarding your request: the data/dataset you would like (e.g., Community Health Survey, Communicable Disease Surveillance System, or Vital Statistics Deaths); the year(s) you are interested in; the variable(s) of interest; any restrictions (sex, age, etc.) you would like to apply to the data; and any cross-tabulations that you would like to see. Data requests are processed according to volume and priority; standard turnaround time is 2-3 weeks. To make a data request, please contact us at epidatarequest@health.nyc.gov and your request will be directed appropriately.

Page last updated April 2009.

 
Copyright 2012 The City of New York