Stata | FAQ: How do I obtain percentiles for survey data?



Title		Calculate percentiles with survey data
Author		Nini Zang, StataCorp

Even when we have survey data, we can use pctile or _pctile to get percentiles. Survey design characteristics other than pweights (strata, PSUs, and finite population corrections) affect only the variance estimation, not the point estimates. Therefore, the point estimate of a percentile for survey data can be obtained with pctile or _pctile by specifying pweights.

I will start with an example of how _pctile works with survey data.

 . sysuse auto
 (1978 Automobile Data)

 . rename mpg psu

 . rename length strata

 . keep price psu strata weight

 . keep in 1/4
 (70 observations deleted)

 . svyset psu [pweight=weight], strata(strata)

       pweight: weight
           VCE: linearized
   Single unit: missing
      Strata 1: strata
          SU 1: psu
         FPC 1: <zero>

 . _pctile price [pweight=weight], p(10)

 . return list

 scalars:
                  r(r1) =  3799

As we already know, a percentile is the value of a variable below which a certain percentage of observations fall; the 10th percentile, for example, is the value below which 10% of the observations lie. Although the data have a survey structure (strata, PSUs, and pweights), the percentiles are affected only by the pweights. Let's look at the formula that pctile and _pctile use in Stata.

Let $x_{(j)}$ refer to the values of $x$ sorted in ascending order, for $j = 1, 2, \ldots, n$. Let $w_{(j)}$ refer to the weight corresponding to $x_{(j)}$; if there are no weights, $w_{(j)} = 1$. Let $N = \sum_{j=1}^{n} w_{(j)}$. To obtain the $p$th percentile, which we will denote as $x_{[p]}$, we need to find the first index $i$ such that $W_{(i)} > P$, where $P = Np/100$ and $W_{(i)} = \sum_{j=1}^{i} w_{(j)}$.

The $p$th percentile is then

$$
x_{[p]} = \begin{cases}
\dfrac{x_{(i-1)} + x_{(i)}}{2} & \text{if } W_{(i-1)} = P \\[4pt]
x_{(i)} & \text{otherwise}
\end{cases}
$$
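To make the rule concrete, here is a minimal Python sketch of the formula above. It is an illustration written for this FAQ, not Stata's implementation; the function name `weighted_pctile` is hypothetical, and the tie test $W_{(i-1)} = P$ uses exact equality, which assumes weights (such as integers) for which that comparison is exact.

```python
def weighted_pctile(x, w, p):
    """Return the pth percentile of x with weights w, following the
    sorted cumulative-weight rule above (illustrative sketch)."""
    pairs = sorted(zip(x, w))          # sort observations by value
    xs = [v for v, _ in pairs]         # x_(1), ..., x_(n)
    ws = [wt for _, wt in pairs]       # matching weights w_(j)
    P = sum(ws) * p / 100              # P = N * p / 100
    W = 0                              # cumulative weight W_(i)
    for i, wt in enumerate(ws):
        W_prev = W                     # W_(i-1)
        W += wt
        if W > P:                      # first index with W_(i) > P
            if i > 0 and W_prev == P:  # tie: average x_(i-1) and x_(i)
                return (xs[i - 1] + xs[i]) / 2
            return xs[i]
    return xs[-1]                      # reached only when p >= 100


# Unweighted case (all w_(j) = 1): N = 4 and P = 2 = W_(2), so the tie rule applies
print(weighted_pctile([1, 2, 3, 4], [1, 1, 1, 1], 50))  # 2.5
```

In the unweighted example, $W_{(3)} = 3$ is the first cumulative weight exceeding $P = 2$, and because $W_{(2)} = P$, the sketch averages $x_{(2)} = 2$ and $x_{(3)} = 3$.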

From the formula, we can see that a percentile depends only on the observed values and their weights; strata and PSUs do not enter the calculation.

Let’s manually calculate the percentile obtained above with _pctile. We first sort the data:

 . sort price

 . list

      +-------------------------------+
      | price   psu   weight   strata |
      |-------------------------------|
   1. | 3,799    22    2,640      168 |
   2. | 4,099    22    2,930      186 |
   3. | 4,749    17    3,350      173 |
   4. | 4,816    20    3,250      196 |
      +-------------------------------+

Let $\text{price}_{(j)}$ denote the variable price in ascending order, for $j = 1, 2, 3, 4$; let $\text{weight}_{(j)}$ denote the corresponding weights; and let $\text{price}_{[10]}$ denote the 10th percentile of price.

We generate the variable w, the cumulative sum of weight:

 . generate w=sum(weight)

 . list

      +---------------------------------------+
      | price   psu   weight   strata       w |
      |---------------------------------------|
   1. | 3,799    22    2,640      168    2640 |
   2. | 4,099    22    2,930      186    5570 |
   3. | 4,749    17    3,350      173    8920 |
   4. | 4,816    20    3,250      196   12170 |
      +---------------------------------------+

Then $N = \sum_{j=1}^{4} \text{weight}_{(j)} = 2640 + 2930 + 3350 + 3250 = 12170$ and $P = Np/100 = (12170 \times 10)/100 = 1217$. To obtain the 10th percentile, we must find the first index $i$ such that $W_{(i)} > 1217$. Already at $i = 1$, $W_{(1)} = 2640$, which is greater than 1217. Since $W_{(0)} = 0 \neq P$, the averaging case does not apply, so the 10th percentile is $\text{price}_{[10]} = \text{price}_{(1)} = 3799$.
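The hand calculation can be replayed in a few lines of Python. This is a sketch that simply copies the sorted price and weight values from the listing; the running total `W` plays the role of the Stata variable w, and the variable names are chosen for illustration only.

```python
# Sorted data from the listing above
price = [3799, 4099, 4749, 4816]
weight = [2640, 2930, 3350, 3250]
p = 10

N = sum(weight)   # total weight, 12170
P = N * p / 100   # target cumulative weight, 1217.0

# Walk through the cumulative weights W_(i) until one exceeds P
W = 0
for i, wt in enumerate(weight, start=1):  # i is 1-based, as in the text
    W_prev = W    # W_(i-1)
    W += wt       # W_(i)
    if W > P:
        break

# Average price_(i-1) and price_(i) only when W_(i-1) equals P
if i > 1 and W_prev == P:
    result = (price[i - 2] + price[i - 1]) / 2
else:
    result = price[i - 1]

print(N, P, i, result)  # 12170 1217.0 1 3799
```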

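We can also obtain the median and other percentiles from survey data by using summarize, detail with aweights. summarize does not allow pweights, but this does not affect the point estimates: rescaling all weights by a constant rescales $N$, $P$, and every $W_{(i)}$ by the same factor, so the same index $i$ is selected.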

 . sysuse auto, clear
 (1978 Automobile Data)

 . rename mpg psu

 . rename length strata

 . keep price psu strata weight

 . keep in 1/4
 (70 observations deleted)

 . svyset psu [pweight=weight], strata(strata)

       pweight: weight
           VCE: linearized
   Single unit: missing
      Strata 1: strata
          SU 1: psu
         FPC 1: <zero>

 . summarize price [aweight=weight], detail

                             Price
 -------------------------------------------------------------
       Percentiles      Smallest
  1%         3799           3799
  5%         3799           4099
 10%         3799           4749       Obs                   4
 25%         4099           4816       Sum of Wgt.       12170
 
 50%         4749                      Mean            4404.32
                         Largest       Std. Dev.      489.7492
 75%         4816           3799
 90%         4816           4099       Variance       239854.3
 95%         4816           4749       Skewness      -.3284718
 99%         4816           4816       Kurtosis       1.321737

From the output, we can see that the median of price is 4749 and the 10th percentile is 3799, the same result that we obtained with _pctile and pweights.
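The agreement between aweights and pweights can be checked with a short sketch. Stata rescales aweights so that they sum to the number of observations; the sketch mimics that by multiplying every weight by $n/N$ and then applies the percentile rule to both the raw and the rescaled weights. The helper `weighted_pctile` is hypothetical and assumes its inputs are already sorted by value.

```python
def weighted_pctile(xs, ws, p):
    """pth percentile of xs (already sorted ascending) with weights ws,
    using the cumulative-weight rule from the formula (illustrative sketch)."""
    P = sum(ws) * p / 100
    W = 0
    for i, wt in enumerate(ws):
        W_prev, W = W, W + wt
        if W > P:
            if i > 0 and W_prev == P:
                return (xs[i - 1] + xs[i]) / 2
            return xs[i]
    return xs[-1]


price = [3799, 4099, 4749, 4816]     # sorted by price
weight = [2640, 2930, 3350, 3250]
n = len(price)

# Rescale so the weights sum to n, as Stata does for aweights
aw = [wt * n / sum(weight) for wt in weight]

pcts = (1, 5, 10, 25, 50, 75, 90, 95, 99)
raw = [weighted_pctile(price, weight, p) for p in pcts]
scaled = [weighted_pctile(price, aw, p) for p in pcts]

print(raw)            # [3799, 3799, 3799, 4099, 4749, 4816, 4816, 4816, 4816]
print(raw == scaled)  # True
```

The printed values match the Percentiles column of the summarize output. With floating-point weights, an exact tie $W_{(i-1)} = P$ could in principle be missed after rescaling; no such tie occurs with these data.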
