Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


RE: st: converting table into matrix


From   Joe Canner <[email protected]>
To   "[email protected]" <[email protected]>
Subject   RE: st: converting table into matrix
Date   Sat, 29 Mar 2014 21:52:25 +0000

Rochelle,

I suspect the reason that you haven't heard back from Nick is that he put a lot of effort into providing a solution, only to find that you have mis-specified the problem. (Of course, he may also have a real life outside of Statalist, who knows?)

You started out with a 3x3 example, then Nick answered on that basis, only to be informed that the data was "larger than 400x400".  However, you did not specify which version of Stata you are using, nor how big your data actually was, so he had no way of knowing that Stata could not accommodate your problem. (I guess he assumed that you had already determined that your data would fit in your version of Stata.) After he provided a solution, you informed him that your data was 70,000x70,000, which is far too large for a Stata matrix (regardless of which version you are running).

There is a reason why it is important to completely specify the problem and what version of Stata you are running.

All that said, you would have to use Mata to work with matrices bigger than what Stata's own matrix language can provide.  I don't have the time at the moment nor the expertise to suggest a solution, but keep in mind that Mata matrices are limited by your system memory.  A 70,000x70,000 matrix has 4.9 billion cells, so it will take on the order of 20 GB of memory at 4 bytes per cell, or about 39 GB at the 8 bytes Mata uses for each real value.
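
The memory requirement can be checked with a little arithmetic in Stata itself. This is only a back-of-envelope sketch: it counts the cells alone, ignores any overhead, and takes a gigabyte to mean 10^9 bytes.

```stata
* Cells in a 70,000 x 70,000 matrix
display %15.0fc 70000^2

* Gigabytes at 4 bytes per cell
display 70000^2 * 4 / 1e9

* Gigabytes at 8 bytes per cell (Mata stores reals as doubles)
display 70000^2 * 8 / 1e9
```

The three lines display 4,900,000,000, then 19.6, then 39.2.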

Another hint: if you do use Mata, check out the -makesymmetric()- function.
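
A minimal sketch of that hint, using the 3x3 example from this thread rather than the real data. It assumes that copying the elements below the diagonal into the upper triangle is the symmetrization wanted, which is what -makesymmetric()- does; the names A, S, X and L are chosen only for illustration.

```stata
mata:
// the 3 x 3 example from this thread
A = (0, 0, 0 \ 64, 1, 1 \ 7, 29, 41)

// reflect the elements below the diagonal into the upper triangle
S = makesymmetric(A)
S
issymmetric(S)      // 1 if S is symmetric

// eigenvectors go into the columns of X, eigenvalues into the row vector L
X = .
L = .
symeigensystem(S, X, L)
L
X
end
```

Whether this is feasible for the full-size problem depends entirely on available memory, as noted above.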

Regards,
Joe Canner
________________________________________
From: [email protected] [[email protected]] on behalf of R Zhang [[email protected]]
Sent: Saturday, March 29, 2014 4:23 PM
To: [email protected]
Subject: Re: st: converting table into matrix

Hi Nick and other Statalisters,

After creating the matrix, I will compute its eigenvectors.

symeigen computes eigenvectors for a symmetric matrix, which means I
need to fill in some values of my matrix to make it symmetric.

My original matrix (a 3*3 sample; the real data is 70,000*70,000):

** non-symmetric**
A[3,3]
              Forestrysu~t  Forestrynu~y       logging
Forestrysu~t             0             0             0
Forestrynu~y            64             1             1
     logging             7            29            41


If made symmetric (lower triangle copied to the upper triangle), it should look like

              Forestrysu~t  Forestrynu~y       logging
Forestrysu~t             0            64             7
Forestrynu~y            64             1            29
     logging             7            29            41


My question is: how should I edit my original Stata dataset in order
to create a symmetric matrix?

*** data ***
clear all
input str20 C_industry str20 S_industry int x
Forestrysupport Forestrysupport 0
Forestrysupport Forestrynursery 0
Forestrysupport logging 0
Forestrynursery Forestrysupport 64
Forestrynursery Forestrynursery 1
Forestrynursery logging 1
logging Forestrysupport 7
logging Forestrynursery 29
logging logging 41
end


*** Nick's code - it works (but I need help with high dimensional data
70,000*70,000) **

qui tab C_industry
local nvals = r(r)

egen i = seq(), block(`nvals')
egen j = seq(), to(`nvals')

matrix A=J(`nvals',`nvals',.)

forval n = 1/`=_N' {
  matrix A[`=i[`n']', `=j[`n']'] = x[`n']
  if C_industry[`n'] != C_industry[`=`n'-1'] {
          local rownames `rownames' `=C_industry[`n']'
  }
}
matrix rownames A = `rownames'
matrix colnames A = `rownames'

matrix list A

***  A is nonsymmetric ***
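
For larger problems, one possible way to avoid the observation-by-observation loop is to read the values into Mata in one step. This is a sketch only, under the same assumptions as the code above: the example dataset is in memory, it is sorted in the order entered (all columns for the first row, then the second row, and so on), and every pair of C_industry and S_industry appears exactly once. The names n, x and A are illustrative, and Mata matrices carry no row or column names.

```stata
* number of distinct industries = matrix dimension
qui tab C_industry
local nvals = r(r)

mata:
n = strtoreal(st_local("nvals"))
x = st_data(., "x")      // N x 1 column vector of cell values
A = colshape(x, n)       // fill row by row into a matrix with n columns
A
end
```

The resulting A is still nonsymmetric; it holds the same values as the Stata matrix listed above.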

thanks !

Rochelle




On Sat, Mar 29, 2014 at 3:48 PM, R Zhang <[email protected]> wrote:
> Nick,
> you are correct about stata help concerning seq().  Thank you !
>
> My data has about 70,000 observations, i.e., 70,000 pairs of
> C_industry and S_industry. For my square matrix, 70,000*70,000 would
> exceed the maximum allowable dimensions in Stata, is that correct?
>
> I ran your program and got "option block() incorrectly specified"; my
> guess is that this is the maximum dimension problem.
>
> In this case, can I increase the dimension in Stata?
>
> Best,
>
> Rochelle
>
> On Sat, Mar 29, 2014 at 12:13 PM, Nick Cox <[email protected]> wrote:
>> I don't know why you are Googling this. That is like going to the
>> library to look for a book you already have. Stata itself gives you
>> ways of finding out what you need to know.
>>
>> -help egen- and looking at the results shows that the function -seq()-
>> creates indexes 1, 2, 3, ... for the rows and columns of the matrix.
>> It does not calculate the dimensions of the matrix, which are
>> calculated from the number of distinct values of your input string
>> variables.
>>
>> My code assumes a square matrix with the same number of rows and columns.
>> I understood from this thread and another (including a mention of
>> eigenvalue calculation) that you are dealing with square matrices.
>> Indeed, if you look at the code again, you should see that the number
>> of rows and columns is identical and the row and column names are
>> identical. So, that code cannot be used for oblong matrices (often
>> loosely called rectangular).
>>
>> For arbitrary matrices, you would need something more like this:
>>
>> * !!! code not tested
>>
>> qui tab C_industry
>> local nrows = r(r)
>> qui tab S_industry
>> local ncols = r(r)
>>
>> egen i = seq(), block(`ncols')
>> egen j = seq(), to(`ncols')
>>
>> matrix A=J(`nrows',`ncols',.)
>>
>> forval n = 1/`=_N' {
>>    matrix A[`=i[`n']', `=j[`n']'] = x[`n']
>>    if C_industry[`n'] != C_industry[`=`n'-1'] {
>>           local rownames `rownames' `=C_industry[`n']'
>>    }
>>    if `n' <= `ncols' {
>>           local colnames `colnames' `=S_industry[`n']'
>>   }
>> }
>>
>> matrix rownames A = `rownames'
>> matrix colnames A = `colnames'
>> matrix list A
>>
>> Nick
>> [email protected]
>>
>>
>> On 29 March 2014 15:46, R Zhang <[email protected]> wrote:
>>> Thanks, Nick !  You are always so generous in helping others.
>>>
>>> concerning:
>>>
>>> egen i = seq(), block(`nvals')
>>> egen j = seq(), to(`nvals')
>>>
>>> I did some Google searching and read one of your earlier postings
>>> (Generating block randomization schedule using Stata).
>>>
>>> Would it be correct to say that you use egen to generate the dimensions
>>> for the rows and columns of the matrix? If my matrix is 400*450, would I
>>> need to change your program?
>>>
>>> Best,
>>> Rochelle
>>>
>>>
>>> On Sat, Mar 29, 2014 at 5:39 AM, Nick Cox <[email protected]> wrote:
>>>> This can be corrected and simplified as follows, illustrating the 7th
>>>> Law of Stata programming, that a shorter program needs more time. I
>>>> don't repeat Rochelle's code setting up a data example.
>>>>
>>>> qui tab C_industry
>>>> local nvals = r(r)
>>>>
>>>> egen i = seq(), block(`nvals')
>>>> egen j = seq(), to(`nvals')
>>>>
>>>> matrix A=J(`nvals',`nvals',.)
>>>>
>>>> forval n = 1/`=_N' {
>>>>   matrix A[`=i[`n']', `=j[`n']'] = x[`n']
>>>>   if C_industry[`n'] != C_industry[`=`n'-1'] {
>>>>           local rownames `rownames' `=C_industry[`n']'
>>>>   }
>>>> }
>>>> matrix rownames A = `rownames'
>>>> matrix colnames A = `rownames'
>>>> matrix list A
>>>>
>>>> Nick
>>>> [email protected]
>>>>
>>>>
>>>> On 29 March 2014 01:36, Nick Cox <[email protected]> wrote:
>>>>> Your main error is to overlook the fact that -encode- by default
>>>>> encodes in alphanumeric order. See for example the thread started by
>>>>> Michael McCulloch recently at
>>>>> http://www.stata.com/statalist/archive/2014-03/msg00346.html which
>>>>> underlined this point.
>>>>>
>>>>> There are various ways round this. One is just not to -encode-. If you
>>>>> map your string values to value labels, you then have to read them
>>>>> back.
>>>>>
>>>>> This code goes further than yours in supplying row and column names
>>>>> for the matrix. The assumption is that the string variables contain
>>>>> values all suitable as matrix row and column labels.
>>>>>
>>>>> clear all
>>>>> input str20 C_industry str20 S_industry int x
>>>>> Forestrysupport Forestrysupport 0
>>>>> Forestrysupport Forestrynursery 0
>>>>> Forestrysupport logging 0
>>>>> Forestrynursery Forestrysupport 64
>>>>> Forestrynursery Forestrynursery 1
>>>>> Forestrynursery logging 1
>>>>> logging Forestrysupport 7
>>>>> logging Forestrynursery 29
>>>>> logging logging 41
>>>>> end
>>>>>
>>>>> qui tab C_industry
>>>>> local nvals = r(r)
>>>>>
>>>>> egen i = seq(), block(`nvals')
>>>>> egen j = seq(), to(`nvals')
>>>>>
>>>>> matrix A=J(`nvals',`nvals',.)
>>>>> matrix list A
>>>>>
>>>>> forval n = 1/`=_N' {
>>>>>   matrix A[`=i[`n']', `=j[`n']'] = x[`n']
>>>>>   if C_industry[`n'] != C_industry[`=`n'-1'] {
>>>>>           local rownames `rownames' `=C_industry[`n']'
>>>>>   }
>>>>>   if `n' < `nvals' {
>>>>>           local colnames `colnames' `=S_industry[`n']'
>>>>>   }
>>>>> }
>>>>> matrix rownames A = `rownames'
>>>>> matrix colnames A = `colnames'
>>>>> matrix list A
>>>>>
>>>>> Nick
>>>>> [email protected]
>>>>>
>>>>>
>>>>> On 28 March 2014 23:05, R Zhang <[email protected]> wrote:
>>>>>> Nick,
>>>>>> I forgot to post the code. Sorry! My real data has over 400*400
>>>>>> dimensions in a Stata data format. That is why I can't use a simple
>>>>>> matrix command to input the data as a matrix.
>>>>>>
>>>>>>
>>>>>> ***** my hypothetical data
>>>>>> clear all
>>>>>> input str20 C_industry str20 S_industry int x
>>>>>> Forestrysupport Forestrysupport 0
>>>>>> Forestrysupport Forestrynursery 0
>>>>>> Forestrysupport logging 0
>>>>>> Forestrynursery Forestrysupport 64
>>>>>> Forestrynursery Forestrynursery 1
>>>>>> Forestrynursery logging 1
>>>>>> logging Forestrysupport 7
>>>>>> logging Forestrynursery 29
>>>>>> logging logging 41
>>>>>> end
>>>>>>
>>>>>> list
>>>>>>
>>>>>> encode C_industry, gen(c)
>>>>>> encode S_industry, gen(s)
>>>>>>
>>>>>>
>>>>>>
>>>>>> drop C_ S_
>>>>>> list
>>>>>>
>>>>>> levelsof c, local(levs)
>>>>>> local rows : word count `levs'
>>>>>> matrix A=J(`rows',`rows',.)
>>>>>> matrix list A
>>>>>>
>>>>>> forval i=1/`=_N' {
>>>>>>   local r=c[`i']
>>>>>>   local c=s[`i']
>>>>>>   matrix A[`r',`c']=x[`i']
>>>>>> }
>>>>>>
>>>>>> matrix list A
>>>>>>
>>>>>> *******************************************
>>>>>>
>>>>>> my guess is that the best approach is to use a loop to input data into matrix.
>>>>>>
>>>>>> my original post indicates the code did not produce the matrix I
>>>>>> wanted. could you please critique?
>>>>>>
>>>>>> thanks a lot,
>>>>>>
>>>>>> Rochelle
>>>>>>
>>>>>>
>>>>>> On Fri, Mar 28, 2014 at 3:49 PM, Nick Cox <[email protected]> wrote:
>>>>>>> I don't see that your code produces a matrix at all.
>>>>>>>
>>>>>>> Seems that you would be better off just typing it in directly.
>>>>>>>
>>>>>>> matrix want = (0,0,0\64,1,1\7,29,41)
>>>>>>> matrix rownames want = Forestrysupport Forestrynursery logging
>>>>>>> matrix colnames want = Forestrysupport Forestrynursery logging
>>>>>>>
>>>>>>> Nick
>>>>>>> [email protected]
>>>>>>>
>>>>>>>
>>>>>>> On 28 March 2014 19:37, R Zhang <[email protected]> wrote:
>>>>>>>> Dear all,
>>>>>>>>
>>>>>>>> I have the following sample code to input data from Stata (see
>>>>>>>> datahave below) and get an output in matrix form. After that I will
>>>>>>>> compute the eigenvalues of this matrix.
>>>>>>>>
>>>>>>>> the code runs, but the output matrix has some elements misplaced. I
>>>>>>>> wonder if someone could help correct it.
>>>>>>>>
>>>>>>>> thanks!
>>>>>>>> ++++++++++++
>>>>>>>> datahave
>>>>>>>> clear all
>>>>>>>> input str20 C_industry str20 S_industry int x
>>>>>>>> Forestrysupport Forestrysupport 0
>>>>>>>> Forestrysupport Forestrynursery 0
>>>>>>>> Forestrysupport logging 0
>>>>>>>> Forestrynursery Forestrysupport 64
>>>>>>>> Forestrynursery Forestrynursery 1
>>>>>>>> Forestrynursery logging 1
>>>>>>>> logging Forestrysupport 7
>>>>>>>> logging Forestrynursery 29
>>>>>>>> logging logging 41
>>>>>>>> end
>>>>>>>> ++++++++++++
>>>>>>>>
>>>>>>>> ++++++++++++
>>>>>>>> matrix want
>>>>>>>>           c1   c2   c3
>>>>>>>> r1         0    0    0
>>>>>>>> r2        64    1    1
>>>>>>>> r3         7   29   41
>>>>>>>>
>>>>>>>>
>>>>>>>> I would like to replace c1,c2,c3 with variable names Forestrysupport
>>>>>>>> Forestrynursery  logging
>>>>>>>>
>>>>>>>> -Rochelle
>>>>>>>> *
>>>>>>>> *   For searches and help try:
>>>>>>>> *   http://www.stata.com/help.cgi?search
>>>>>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>>>>> *   http://www.ats.ucla.edu/stat/stata/