This is a modelling question rather than a Stata question, though I
hope there is a Stata command that will provide a solution.
I have a dataset which contains reported totals and subtotals from
approximately 5,000 respondents. For example, 5,000 firms are asked about
their revenues from various sources, which are then summed to create
total revenues. (Specifically, it is an electronic form that forces
respondents to provide subtotals that sum to their total.) However, some
respondents only report their totals, and for these I am trying to
estimate the subtotals.
A basic approach would be to calculate the ratio for each subcategory
(e.g., sales revenue/total revenue) from the complete responses and
apply it to the reported totals to estimate the missing values. However,
because I have a lot more information on the respondents - for instance,
number of employees, assets, etc. - I think I can model the subtotals
more accurately using these variables. The problem is that the subtotals
estimated this way do not necessarily sum to the total.
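
To make the two approaches concrete, here is a minimal sketch in Python rather than Stata, using made-up numbers for three subcategories. The function names, the toy data, and the "separately modelled" predictions are all hypothetical. The final proportional rescaling is only one naive way of forcing the predictions to add up, included as an assumption and not as a recommended method.

```python
# Toy illustration with made-up numbers (not real survey data).
# Each tuple holds three reported subtotals from a complete respondent.
complete = [
    (60.0, 30.0, 10.0),
    (50.0, 25.0, 25.0),
    (70.0, 20.0, 10.0),
]


def mean_shares(rows):
    """Average share of each subcategory in the total, across complete rows."""
    k = len(rows[0])
    shares = [0.0] * k
    for row in rows:
        total = sum(row)
        for j, x in enumerate(row):
            shares[j] += x / total / len(rows)
    return shares


def impute_by_share(total, shares):
    """Ratio approach: split a reported total using the average shares."""
    return [s * total for s in shares]


def rescale_to_total(predictions, total):
    """Naive fix (an assumption): rescale predictions proportionally
    so that they sum to the reported total."""
    scale = total / sum(predictions)
    return [p * scale for p in predictions]


shares = mean_shares(complete)

# A respondent who reported only a total of 200.
ratio_based = impute_by_share(200.0, shares)

# Hypothetical predictions from three separate models for the same respondent.
# They use more information, but nothing forces them to add up to 200.
separate_predictions = [130.0, 55.0, 30.0]
rescaled = rescale_to_total(separate_predictions, 200.0)

print("ratio-based:", ratio_based, "sum =", sum(ratio_based))
print("separate:   ", separate_predictions, "sum =", sum(separate_predictions))
print("rescaled:   ", rescaled, "sum =", sum(rescaled))
```

The ratio-based estimates sum to the total by construction, because the average shares sum to one. The separately modelled ones do not, which is the problem described above.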
Concretely, let's say there are three variables -x1-, -x2-, -x3-, and that
their total t = x1 + x2 + x3. What I'd like is something like