
From: ncdcta00@uniroma2.it
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Re: Unix stata big dataset
Date: Thu, 29 Nov 2007 23:36:18 +0100

The memory that I can allocate is 15 GB, and there are 18 million observations in total.

I have duplicate observations, but I can't drop them because they are the work spells for each person, and I need these observations.

The two datasets share the same id variable, so I need to match them on id, but id is not unique in either dataset.

So, one dataset is:

id  x1  x3    x4 ...
1   0   1991  1998
1   1   1991  1998
1   2   1999  1999
2

and the second is:

id  y1  y2  y3
1   34  2   35
1   34  2   67
1   34  1   68
2

The idea is to keep everyone whose id appears in both datasets, to obtain this dataset:

id x1 x2 x3 y1 y2 y3
1............
1.................
2...........
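To make the target concrete: when id is not unique in either file, a join on id pairs every x row with every y row that has the same id, which is what -joinby- forms. The sketch below is plain Python, not Stata, and the function name `joinby` is only an illustrative stand-in for the Stata command. It uses the id 1 rows shown above; the id 2 rows are elided in the post and are left out here.

```python
from collections import defaultdict

def joinby(left, right, key):
    """Pair every left row with every right row that shares the same key value."""
    right_by_key = defaultdict(list)
    for row in right:
        right_by_key[row[key]].append(row)
    joined = []
    for lrow in left:
        # .get() avoids creating empty entries for keys with no match
        for rrow in right_by_key.get(lrow[key], []):
            joined.append({**lrow, **rrow})
    return joined

# Rows for id 1 copied from the post above.
x_rows = [
    {"id": 1, "x1": 0, "x3": 1991, "x4": 1998},
    {"id": 1, "x1": 1, "x3": 1991, "x4": 1998},
    {"id": 1, "x1": 2, "x3": 1999, "x4": 1999},
]
y_rows = [
    {"id": 1, "y1": 34, "y2": 2, "y3": 35},
    {"id": 1, "y1": 34, "y2": 2, "y3": 67},
    {"id": 1, "y1": 34, "y2": 1, "y3": 68},
]

result = joinby(x_rows, y_rows, "id")
print(len(result))  # 3 x rows times 3 y rows for id 1
```

Because each id contributes (x rows) times (y rows) to the result, the output can be much larger than either input, which is worth keeping in mind with 18 million observations.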

Sorry, I don't understand the last part of your email: how should I do the merge?

Thanks a lot for your help.
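As one reading of the last part of the quoted reply, the approach has three steps: take the unique ids of the smaller file, use them to keep only the matching observations of the large file, then do the many-to-many join on that reduced file. The sketch below shows this logic in plain Python rather than Stata (in Stata the steps would use -merge- and -joinby-). The function name and the toy rows are hypothetical and are not taken from the actual data.

```python
def subset_then_join(small, large, key):
    """Filter the large table to ids present in the small one, then pair rows within id."""
    # Step 1: unique identifiers of the smaller table.
    wanted_ids = {row[key] for row in small}
    # Step 2: keep only the rows of the large table with a wanted id.
    large_subset = [row for row in large if row[key] in wanted_ids]
    # Step 3: many-to-many join on the reduced table.
    rows_by_id = {}
    for row in large_subset:
        rows_by_id.setdefault(row[key], []).append(row)
    joined = [{**s, **l} for s in small for l in rows_by_id.get(s[key], [])]
    return large_subset, joined

# Hypothetical toy data: ids 7 and 8 exist only in the large table.
small = [{"id": 1, "y1": 34}, {"id": 1, "y1": 35}]
large = [
    {"id": 1, "x1": 0},
    {"id": 1, "x1": 1},
    {"id": 7, "x1": 0},
    {"id": 8, "x1": 5},
]

large_subset, joined = subset_then_join(small, large, "id")
print(len(large_subset), len(joined))
```

The point of step 2 is that the expensive pairwise step only ever sees rows that can match, so observations in the large file with no counterpart never enter the join.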

Quoting Michael Blasnik <michael.blasnik@verizon.net>:

... I can't really comment on the cpu and memory usage report but I would guess that you could save a large fraction of the time for this operation if you told us more about the joinby you want to do:

1) How many observations are in each of the two files?
2) What type of merge do you need: one-to-one, one-to-many, many-to-one, or many-to-many? Only the last type needs -joinby-.
3) What proportion of the observations in each file do you expect to match? Does the large table contain lots of observations you don't need?
4) Are there any variables you don't need in either file that could be dropped first?

I think the biggest question is -- Are you sure that you need -joinby- rather than -merge-? Even if you need joinby, you may be able to do this much more quickly by first subsetting unique identifiers of the smaller file, then -merge- with the nokeep option to grab the useful observations in the large file and then go back to the smaller file to do a joinby on this subset file.

Also, do you have enough physical memory and an operating system that can allocate 2GB+ to Stata for loading the large dataset? If you are using virtual memory things can be very slow.

If you describe more about the data, there may be other approaches that reduce the memory requirements and speed the process.

Michael Blasnik

----- Original Message -----
From: <ncdcta00@uniroma2.it>
To: <statalist@hsphsun2.harvard.edu>
Sent: Thursday, November 29, 2007 4:36 PM
Subject: st: Unix stata big dataset

Dear Statalist,

I have a problem using joinby on 2 datasets in Unix. I have one dataset of about 1.8 GB and another of about 30 MB. I want to join these two datasets, but the process is very slow in Unix: after 4 days I still didn't have a final dataset (two months ago I joined two datasets of more or less the same size in only 1 day). I use a do-file where I write my joinby command.

I looked at the processes on the local machine with the command top, and my process is in the sleep state. I use batch mode.

11258 franz 1 20 0 0K 0K cpu/0 35.8H 24.41% stata

15566 ncd 1 20 0 0K 0K cpu/1 11:52 23.41% xstata

16084 ncd 1 60 0 0K 0K sleep 27:00 0.22% stata

So, if my process is in the sleep state, does that mean it is not running? There is another user connected; can he affect my process?

thanks in advance for your help

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Catia Nicodemo

