
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: converting table into matrix


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: converting table into matrix
Date   Sun, 30 Mar 2014 12:25:22 +0100

Thanks for this. Often questions arise out of confusion about what to
do, but it's helpful to all concerned to be as clear as possible about
what the goal is. For example, simple examples can be excellent, but
they don't rule out sketching the real problem too.
Nick
[email protected]


On 30 March 2014 03:53, R Zhang <[email protected]> wrote:
> Nick,
>
> My sincere apologies for not making my problem clear from the very
> beginning. I also did not mean that your program does not work. All
> your programs have worked in my case.
>
> I posted a 3*3 example to make it easy for readers to see what my data
> structure is. In the future, I will learn from my mistakes and try
> not to mislead you and other Statalisters.
>
> At this point, I may look for ways to define coarser industries that
> will reduce the dimensions of my matrix.
>
> Regards,
>
> Rochelle
>
> On Sat, Mar 29, 2014 at 9:34 PM, Nick Cox <[email protected]> wrote:
>> Joe's guess is correct. I do other things too.
>>
>> Rochelle: You really did not do a good job of explaining your real
>> problem here. The thread has morphed from a toy problem with a 3 x 3
>> matrix to your explaining that (you think) the real problem may be
>> 70,000 x 70,000. More than once, you have reported that my code will
>> not work for a problem you did not explain.
>>
>> I think you need to think about whether you really have or can find or
>> can input the 4.9 billion numbers needed to specify such a matrix.
>> That is more immediate than whether you can compute the eigenvalues
>> for such a matrix. Very likely, most of them will be zero, but you
>> need to think about how you could possibly read them into Stata
>> variables (answer: you can't) or directly into Mata (I haven't thought
>> about it, but don't assume it will be trivial).
>>
>> I haven't seen an explanation of the report that you need to work with
>> 400 x 450, i.e. an oblong matrix.  I suspect now that was just a
>> careless slip.
>>
>> Nick
>> [email protected]
>>
>>
>> On 29 March 2014 23:02, R Zhang <[email protected]> wrote:
>>> Thank you Joe.
>>>
>>> My sincere apologies to anyone who read the post and tried to help. I
>>> mentioned 400*400 initially because a research paper I use for my
>>> project says that the data they use are about 470*470. After
>>> processing my data, I realized I may have to either (1) collapse the
>>> industries to reduce the dimensions to the hundreds or (2) keep the
>>> refined industry codes, which gives a 70,000x70,000 matrix.
>>>
>>> I will think carefully about your input.
>>>
>>> Thank you!
>>>
>>> -Rochelle
>>>
>>> On Sat, Mar 29, 2014 at 5:52 PM, Joe Canner <[email protected]> wrote:
>>>> Rochelle,
>>>>
>>>> I suspect the reason that you haven't heard back from Nick is that he put a lot of effort into providing a solution, only to find that you have mis-specified the problem. (Of course, he may also have a real life outside of Statalist, who knows?)
>>>>
>>>> You started out with a 3x3 example, then Nick answered on that basis, only to be informed that the data was "larger than 400x400".  However, you did not specify which version of Stata you are using, nor how big your data actually was, so he had no way of knowing that Stata could not accommodate your problem. (I guess he assumed that you had already determined that your data would fit in your version of Stata.) After he provided a solution, you informed him that your data was 70,000x70,000, which is far too large for a Stata matrix (regardless of which version you are running).
>>>>
>>>> There is a reason why it is important to completely specify the problem and what version of Stata you are running.
>>>>
>>>> All that said, you would have to use Mata to work with matrices bigger than what Stata can provide.  I don't have the time at the moment nor the expertise to suggest a solution, but keep in mind that Mata matrices are limited by your system memory.  A 70,000x70,000 matrix will take somewhere on the order of 20 GB of memory assuming 4 bytes per cell, or about 39 GB at 8 bytes per cell.
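>>>>
>>>> As a rough check on the storage requirement, the cell count and memory footprint can be computed directly in Stata. This is only a sketch: the 4- and 8-byte figures are assumed cell sizes (Mata stores real values as 8-byte doubles), and it ignores any working copies that an eigenvalue routine would need.
>>>>
>>>> ```stata
>>>> * number of cells in a 70,000 x 70,000 matrix
>>>> display %15.0fc 70000^2
>>>> * storage in GB (10^9 bytes) at assumed cell sizes of 4 and 8 bytes
>>>> display %5.1f 70000^2*4/1e9 " GB at 4 bytes per cell"
>>>> display %5.1f 70000^2*8/1e9 " GB at 8 bytes per cell"
>>>> ```
>>>>
>>>> These lines display 4,900,000,000 cells, 19.6 GB and 39.2 GB respectively.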
>>>>
>>>> Another hint: if you do use Mata, check out the -makesymmetric()- function.
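>>>>
>>>> A minimal Mata sketch of that hint, using the 3 x 3 example from this thread rather than the real data. makesymmetric() reflects the elements below the diagonal into the upper triangle, so any values already stored above the diagonal are overwritten; whether that is the right way to symmetrize these data is an assumption. symeigensystem() then returns the eigenvectors and eigenvalues.
>>>>
>>>> ```stata
>>>> mata:
>>>> // 3 x 3 example matrix from this thread
>>>> A = (0, 0, 0 \ 64, 1, 1 \ 7, 29, 41)
>>>> // copy the lower triangle into the upper triangle
>>>> S = makesymmetric(A)
>>>> S
>>>> // X receives the eigenvectors (as columns), L the eigenvalues
>>>> symeigensystem(S, X=., L=.)
>>>> L
>>>> X
>>>> end
>>>> ```
>>>>
>>>> Note that element [2,3] of the example (1) is replaced by element [3,2] (29) in S.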
>>>>
>>>> Regards,
>>>> Joe Canner
>>>> ________________________________________
>>>> From: [email protected] [[email protected]] on behalf of R Zhang [[email protected]]
>>>> Sent: Saturday, March 29, 2014 4:23 PM
>>>> To: [email protected]
>>>> Subject: Re: st: converting table into matrix
>>>>
>>>> Hi, Nick and other Statalisters,
>>>>
>>>> After creating the matrix, I will compute its eigenvectors.
>>>>
>>>> symeigen computes eigenvectors for a symmetric matrix, which means I
>>>> need to fill in some values of my matrix to make it symmetric.
>>>>
>>>> My original matrix (this is the 3*3 sample; the real data are 70,000*70,000):
>>>>
>>>> ** non-symmetric**
>>>> A[3,3]
>>>>               Forestrysu~t  Forestrynu~y       logging
>>>> Forestrysu~t             0             0             0
>>>> Forestrynu~y            64             1             1
>>>>      logging             7            29            41
>>>>
>>>>
>>>> If I make it symmetric, it should look like this:
>>>>
>>>>               Forestrysu~t  Forestrynu~y       logging
>>>> Forestrysu~t             0            64             7
>>>> Forestrynu~y            64             1             1
>>>>      logging             7            29            41
>>>>
>>>>
>>>> My question is: how should I edit my original Stata dataset in order
>>>> to create a symmetric matrix?
>>>>
>>>> *** data ***
>>>> clear all
>>>> input str20 C_industry str20 S_industry int x
>>>> Forestrysupport Forestrysupport 0
>>>> Forestrysupport Forestrynursery 0
>>>> Forestrysupport logging 0
>>>> Forestrynursery Forestrysupport 64
>>>> Forestrynursery Forestrynursery 1
>>>> Forestrynursery logging 1
>>>> logging Forestrysupport 7
>>>> logging Forestrynursery 29
>>>> logging logging 41
>>>> end
>>>>
>>>>
>>>> *** Nick's code - it works (but I need help with high-dimensional data,
>>>> *** 70,000*70,000) ***
>>>>
>>>> qui tab C_industry
>>>> local nvals = r(r)
>>>>
>>>> egen i = seq(), block(`nvals')
>>>> egen j = seq(), to(`nvals')
>>>>
>>>> matrix A=J(`nvals',`nvals',.)
>>>>
>>>> forval n = 1/`=_N' {
>>>>   matrix A[`=i[`n']', `=j[`n']'] = x[`n']
>>>>   if C_industry[`n'] != C_industry[`=`n'-1'] {
>>>>           local rownames `rownames' `=C_industry[`n']'
>>>>   }
>>>> }
>>>> matrix rownames A = `rownames'
>>>> matrix colnames A = `rownames'
>>>>
>>>> matrix list A
>>>>
>>>> ***  A is nonsymmetric ***
>>>>
>>>> Thanks!
>>>>
>>>> Rochelle
>>>>
>>>>
>>>>
>>>>
>>>> On Sat, Mar 29, 2014 at 3:48 PM, R Zhang <[email protected]> wrote:
>>>>> Nick,
>>>>> You are correct about the Stata help concerning seq(). Thank you!
>>>>>
>>>>> My data has about 70,000 observations, i.e., 70,000 pairs of
>>>>> C_industry and S_industry. For my square matrix, 70,000*70,000 would
>>>>> exceed the maximum allowable dimensions in Stata. Is that correct?
>>>>>
>>>>> I ran your program and got "option block() incorrectly specified". My
>>>>> guess is that this is the maximum dimension problem.
>>>>>
>>>>> In this case, can I increase the dimension in Stata?
>>>>>
>>>>> Best,
>>>>>
>>>>> Rochelle
>>>>>
>>>>> On Sat, Mar 29, 2014 at 12:13 PM, Nick Cox <[email protected]> wrote:
>>>>>> I don't know why you are Googling this. That is like going to the
>>>>>> library to look for a book you already have. Stata itself gives you
>>>>>> ways of finding out what you need to know.
>>>>>>
>>>>>> -help egen- and looking at the results shows that the function -seq()-
>>>>>> creates indexes 1, 2, 3, ... for the rows and columns of the matrix.
>>>>>> It does not calculate the dimensions of the matrix, which are
>>>>>> calculated from the number of distinct values of your input string
>>>>>> variables.
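>>>>>>
>>>>>> To see what -seq()- produces, here is a small sketch on nine empty observations, a hypothetical stand-in for a complete 3 x 3 table sorted by row and then by column.
>>>>>>
>>>>>> ```stata
>>>>>> clear
>>>>>> set obs 9
>>>>>> * row index: 1 1 1 2 2 2 3 3 3
>>>>>> egen i = seq(), block(3)
>>>>>> * column index: 1 2 3 1 2 3 1 2 3
>>>>>> egen j = seq(), to(3)
>>>>>> list i j, noobs
>>>>>> ```
>>>>>>
>>>>>> The indexes are only meaningful if every row-column pair is present and the data are in that sort order.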
>>>>>>
>>>>>> My code assumes a square matrix with the same number of rows and columns.
>>>>>> I understood from this thread and another (including a mention of
>>>>>> eigenvalue calculation) that you are dealing with square matrices.
>>>>>> Indeed, if you look at the code again, you should see that the number
>>>>>> of rows and columns is identical and the row and column names are
>>>>>> identical. So, that code cannot be used for oblong matrices (often
>>>>>> loosely called rectangular).
>>>>>>
>>>>>> For arbitrary matrices, you would need something more like this:
>>>>>>
>>>>>> * !!! code not tested
>>>>>>
>>>>>> qui tab C_industry
>>>>>> local nrows = r(r)
>>>>>> qui tab S_industry
>>>>>> local ncols = r(r)
>>>>>>
>>>>>> egen i = seq(), block(`ncols')
>>>>>> egen j = seq(), to(`ncols')
>>>>>>
>>>>>> matrix A=J(`nrows',`ncols',.)
>>>>>>
>>>>>> forval n = 1/`=_N' {
>>>>>>    matrix A[`=i[`n']', `=j[`n']'] = x[`n']
>>>>>>    if C_industry[`n'] != C_industry[`=`n'-1'] {
>>>>>>           local rownames `rownames' `=C_industry[`n']'
>>>>>>    }
>>>>>>    if `n' <= `ncols' {
>>>>>>           local colnames `colnames' `=S_industry[`n']'
>>>>>>    }
>>>>>> }
>>>>>>
>>>>>> matrix rownames A = `rownames'
>>>>>> matrix colnames A = `colnames'
>>>>>> matrix list A
>>>>>>
>>>>>> Nick
>>>>>> [email protected]
>>>>>>
>>>>>>
>>>>>> On 29 March 2014 15:46, R Zhang <[email protected]> wrote:
>>>>>>> Thanks, Nick !  You are always so generous in helping others.
>>>>>>>
>>>>>>> concerning:
>>>>>>>
>>>>>>> egen i = seq(), block(`nvals')
>>>>>>> egen j = seq(), to(`nvals')
>>>>>>>
>>>>>>> I did a Google search and read one of your earlier postings
>>>>>>> (Generating block randomation schedule using Stata).
>>>>>>>
>>>>>>> Would it be correct to say that you use egen to generate the
>>>>>>> dimensions for the rows and columns of the matrix? If my matrix is
>>>>>>> 400*450, would I need to change your program?
>>>>>>>
>>>>>>> Best,
>>>>>>> Rochelle
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Mar 29, 2014 at 5:39 AM, Nick Cox <[email protected]> wrote:
>>>>>>>> This can be corrected and simplified as follows, illustrating the 7th
>>>>>>>> Law of Stata programming, that a shorter program needs more time. I
>>>>>>>> don't repeat Rochelle's code setting up a data example.
>>>>>>>>
>>>>>>>> qui tab C_industry
>>>>>>>> local nvals = r(r)
>>>>>>>>
>>>>>>>> egen i = seq(), block(`nvals')
>>>>>>>> egen j = seq(), to(`nvals')
>>>>>>>>
>>>>>>>> matrix A=J(`nvals',`nvals',.)
>>>>>>>>
>>>>>>>> forval n = 1/`=_N' {
>>>>>>>>   matrix A[`=i[`n']', `=j[`n']'] = x[`n']
>>>>>>>>   if C_industry[`n'] != C_industry[`=`n'-1'] {
>>>>>>>>           local rownames `rownames' `=C_industry[`n']'
>>>>>>>>   }
>>>>>>>> }
>>>>>>>> matrix rownames A = `rownames'
>>>>>>>> matrix colnames A = `rownames'
>>>>>>>> matrix list A
>>>>>>>>
>>>>>>>> Nick
>>>>>>>> [email protected]
>>>>>>>>
>>>>>>>>
>>>>>>>> On 29 March 2014 01:36, Nick Cox <[email protected]> wrote:
>>>>>>>>> Your main error is to overlook the fact that -encode- by default
>>>>>>>>> encodes in alphanumeric order. See for example the thread started by
>>>>>>>>> Michael McCulloch recently at
>>>>>>>>> http://www.stata.com/statalist/archive/2014-03/msg00346.html which
>>>>>>>>> underlined this point.
>>>>>>>>>
>>>>>>>>> There are various ways round this. One is just not to -encode-. If you
>>>>>>>>> map your string values to value labels, you then have to read them
>>>>>>>>> back.
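>>>>>>>>>
>>>>>>>>> As an illustration of the ordering issue, and of one possible workaround that is not used in the code below, a value label can be defined in the desired order before -encode- is called with its -label()- option. The names c_default, c_ordered and myorder are hypothetical.
>>>>>>>>>
>>>>>>>>> ```stata
>>>>>>>>> clear
>>>>>>>>> input str20 C_industry
>>>>>>>>> Forestrysupport
>>>>>>>>> Forestrynursery
>>>>>>>>> logging
>>>>>>>>> end
>>>>>>>>> * default: integer codes follow the sorted order of the strings
>>>>>>>>> encode C_industry, gen(c_default)
>>>>>>>>> * define the label first, in order of appearance; encode then reuses it
>>>>>>>>> label define myorder 1 "Forestrysupport" 2 "Forestrynursery" 3 "logging"
>>>>>>>>> encode C_industry, gen(c_ordered) label(myorder)
>>>>>>>>> list, nolabel
>>>>>>>>> ```
>>>>>>>>>
>>>>>>>>> With these data, c_default comes out as 2, 1, 3 and c_ordered as 1, 2, 3.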
>>>>>>>>>
>>>>>>>>> This code goes further than yours in supplying row and column names
>>>>>>>>> for the matrix. The assumption is that the string variables contain
>>>>>>>>> values all suitable as matrix row and column labels.
>>>>>>>>>
>>>>>>>>> clear all
>>>>>>>>> input str20 C_industry str20 S_industry int x
>>>>>>>>> Forestrysupport Forestrysupport 0
>>>>>>>>> Forestrysupport Forestrynursery 0
>>>>>>>>> Forestrysupport logging 0
>>>>>>>>> Forestrynursery Forestrysupport 64
>>>>>>>>> Forestrynursery Forestrynursery 1
>>>>>>>>> Forestrynursery logging 1
>>>>>>>>> logging Forestrysupport 7
>>>>>>>>> logging Forestrynursery 29
>>>>>>>>> logging logging 41
>>>>>>>>> end
>>>>>>>>>
>>>>>>>>> qui tab C_industry
>>>>>>>>> local nvals = r(r)
>>>>>>>>>
>>>>>>>>> egen i = seq(), block(`nvals')
>>>>>>>>> egen j = seq(), to(`nvals')
>>>>>>>>>
>>>>>>>>> matrix A=J(`nvals',`nvals',.)
>>>>>>>>> matrix list A
>>>>>>>>>
>>>>>>>>> forval n = 1/`=_N' {
>>>>>>>>>   matrix A[`=i[`n']', `=j[`n']'] = x[`n']
>>>>>>>>>   if C_industry[`n'] != C_industry[`=`n'-1'] {
>>>>>>>>>           local rownames `rownames' `=C_industry[`n']'
>>>>>>>>>   }
>>>>>>>>>   if `n' < `nvals' {
>>>>>>>>>           local colnames `colnames' `=S_industry[`n']'
>>>>>>>>>   }
>>>>>>>>> }
>>>>>>>>> matrix rownames A = `rownames'
>>>>>>>>> matrix colnames A = `colnames'
>>>>>>>>> matrix list A
>>>>>>>>>
>>>>>>>>> Nick
>>>>>>>>> [email protected]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 28 March 2014 23:05, R Zhang <[email protected]> wrote:
>>>>>>>>>> Nick,
>>>>>>>>>> I forgot to post the code. Sorry! My real data has over 400*400
>>>>>>>>>> dimensions and is in Stata data format. That is why I can't use a
>>>>>>>>>> simple matrix command to input the data as a matrix.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ***** my hypothetical data
>>>>>>>>>> clear all
>>>>>>>>>> input str20 C_industry str20 S_industry int x
>>>>>>>>>> Forestrysupport Forestrysupport 0
>>>>>>>>>> Forestrysupport Forestrynursery 0
>>>>>>>>>> Forestrysupport logging 0
>>>>>>>>>> Forestrynursery Forestrysupport 64
>>>>>>>>>> Forestrynursery Forestrynursery 1
>>>>>>>>>> Forestrynursery logging 1
>>>>>>>>>> logging Forestrysupport 7
>>>>>>>>>> logging Forestrynursery 29
>>>>>>>>>> logging logging 41
>>>>>>>>>> end
>>>>>>>>>>
>>>>>>>>>> list
>>>>>>>>>>
>>>>>>>>>> encode C_industry, gen(c)
>>>>>>>>>> encode S_industry, gen(s)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> drop C_ S_
>>>>>>>>>> list
>>>>>>>>>>
>>>>>>>>>> levelsof c, local(levs)
>>>>>>>>>> local rows : word count `levs'
>>>>>>>>>> matrix A=J(`rows',`rows',.)
>>>>>>>>>> matrix list A
>>>>>>>>>>
>>>>>>>>>> forval i=1/`=_N' {
>>>>>>>>>>   local r=c[`i']
>>>>>>>>>>   local c=s[`i']
>>>>>>>>>>   matrix A[`r',`c']=x[`i']
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> matrix list A
>>>>>>>>>>
>>>>>>>>>> *******************************************
>>>>>>>>>>
>>>>>>>>>> My guess is that the best approach is to use a loop to input the
>>>>>>>>>> data into a matrix.
>>>>>>>>>>
>>>>>>>>>> As my original post indicates, the code did not produce the matrix I
>>>>>>>>>> wanted. Could you please critique it?
>>>>>>>>>>
>>>>>>>>>> Thanks a lot,
>>>>>>>>>>
>>>>>>>>>> Rochelle
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Mar 28, 2014 at 3:49 PM, Nick Cox <[email protected]> wrote:
>>>>>>>>>>> I don't see that your code produces a matrix at all.
>>>>>>>>>>>
>>>>>>>>>>> Seems that you would be better off just typing it in directly.
>>>>>>>>>>>
>>>>>>>>>>> matrix want = (0,0,0\64,1,1\7,29,41)
>>>>>>>>>>> matrix rownames want = Forestrysupport Forestrynursery logging
>>>>>>>>>>> matrix colnames want = Forestrysupport Forestrynursery logging
>>>>>>>>>>>
>>>>>>>>>>> Nick
>>>>>>>>>>> [email protected]
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 28 March 2014 19:37, R Zhang <[email protected]> wrote:
>>>>>>>>>>>> Dear all,
>>>>>>>>>>>>
>>>>>>>>>>>> I have sample code that takes data from Stata (see datahave
>>>>>>>>>>>> below) and produces output in matrix form. After that I will
>>>>>>>>>>>> compute the eigenvalues of this matrix.
>>>>>>>>>>>>
>>>>>>>>>>>> The code runs, but the output matrix has some elements misplaced. I
>>>>>>>>>>>> wonder if someone could help correct it.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>> ++++++++++++
>>>>>>>>>>>> datahave
>>>>>>>>>>>> clear all
>>>>>>>>>>>> input str20 C_industry str20 S_industry int x
>>>>>>>>>>>> Forestrysupport Forestrysupport 0
>>>>>>>>>>>> Forestrysupport Forestrynursery 0
>>>>>>>>>>>> Forestrysupport logging 0
>>>>>>>>>>>> Forestrynursery Forestrysupport 64
>>>>>>>>>>>> Forestrynursery Forestrynursery 1
>>>>>>>>>>>> Forestrynursery logging 1
>>>>>>>>>>>> logging Forestrysupport 7
>>>>>>>>>>>> logging Forestrynursery 29
>>>>>>>>>>>> logging logging 41
>>>>>>>>>>>> end
>>>>>>>>>>>> ++++++++++++
>>>>>>>>>>>>
>>>>>>>>>>>> ++++++++++++
>>>>>>>>>>>> matrix want
>>>>>>>>>>>>      c1  c2  c3
>>>>>>>>>>>> r1    0   0   0
>>>>>>>>>>>> r2   64   1   1
>>>>>>>>>>>> r3    7  29  41
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I would like to replace c1,c2,c3 with variable names Forestrysupport
>>>>>>>>>>>> Forestrynursery  logging
>>>>>>>>>>>>
>>>>>>>>>>>> -Rochelle
>>>>>>>>>>>> *
>>>>>>>>>>>> *   For searches and help try:
>>>>>>>>>>>> *   http://www.stata.com/help.cgi?search
>>>>>>>>>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>>>>>>>>> *   http://www.ats.ucla.edu/stat/stata/

