Stata 11 help for xmluse

help xmlsave, help xmluse dialogs: xmlsave xmluse -------------------------------------------------------------------------------

Title

[D] xmlsave -- Save and use datasets in XML format

Syntax

Save data in memory to XML-format dataset

xmlsave filename [if] [in] [, xmlsave_options]

Save subset of data in memory to XML-format dataset

xmlsave varlist using filename [if] [in] [, xmlsave_options]

Use XML-format dataset

xmluse filename [, xmluse_options]

xmlsave_options description ------------------------------------------------------------------------- Main doctype(dta) save XML file by using Stata's .dta format doctype(excel) save XML file by using Excel XML format dtd include Stata DTD in XML file legible format XML to be more legible replace overwrite existing filename -------------------------------------------------------------------------

xmluse_options description ------------------------------------------------------------------------- doctype(dta) load XML file by using Stata's .dta format doctype(excel) load XML file by using Excel XML format sheet("sheetname") Excel worksheet to load cells(upper-left:lower-right) Excel cell range to load datestring import Excel dates as strings allstring import all Excel data as strings firstrow treat first row of Excel data as variable names missing treat inconsistent Excel types as missing nocompress do not compress Excel data clear replace data in memory -------------------------------------------------------------------------

Menu

xmlsave

File > Export > XML data

xmluse

File > Import > XML data

Description

xmlsave and xmluse allow datasets to be saved or used in XML file formats for Stata's .dta and Microsoft Excel's SpreadsheetML format. XML files are advantageous because they are structured text files that are highly portable between applications that understand XML.

xmlsave saves the data in memory in the dta XML format by default. To save the data, type

. xmlsave filename

although sometimes you will want to explicitly specify which document type definition (DTD) to use by typing

. xmlsave filename, doctype(dta)

xmluse can read either an Excel-format XML or a Stata-format XML file into Stata. You type

. xmluse filename

Stata will read into memory the XML file filename.xml, containing the data after determining whether the file is of document type dta or excel. As with the xmlsave command, the document type can also be explicitly specified with the doctype() option.

. xmluse filename, doctype(dta)

It never hurts to specify the document type; it is actually recommended because there is no guarantee that Stata will be able to determine the document type from the content of the XML file. Whenever the doctype() option is omitted, a note will be displayed that identifies the document type Stata used to load the dataset.

If filename is specified without an extension, .xml is assumed.

Options for xmlsave

+------+ ----+ Main +-------------------------------------------------------------

doctype(dta|excel) specifies the document type definition (DTD) to use when saving the dataset.

doctype(dta), the default, specifies that an XML file will be saved using Stata's .dta format (see [P] file formats .dta). This is analogous to Stata's binary dta format for datasets. All data that can normally be represented in a normal dta file will be represented by this doctype.

doctype(excel) specifies that an XML file will be saved using Microsoft's SpreadsheetML document type definition. SpreadsheetML is the term given by Microsoft to the Excel XML format. Specifying this document type produces a generic spreadsheet with variable names as the first row, followed by data. It can be imported by any version of Microsoft Excel that supports Microsoft's SpreadsheetML format.

dtd when combined with doctype(dta) embeds the necessary document type definition into the XML file so that a validating parser of another application can verify the dta XML format. This option is rarely used, however, because it increases file size with information that is purely optional.

legible adds indents and other optional formatting to the XML file making it more legible for a person to read. This extra formatting, however, is unnecessary and in larger datasets can significantly increase the file size.

replace permits xmlsave to overwrite existing filename.xml.

Options for xmluse

doctype(dta|excel) specifies the document type definition (DTD) to use when loading data from filename.xml. Although optional, use of doctype() is encouraged. If this option is omitted with xmluse, the document type of filename.xml will be determined automatically. When this occurs, a note will display the document type used to translate filename.xml. This automatic determination of document type is not guaranteed, and the use of this option is encouraged to prevent ambiguity between various XML formats. Specifying the document type explicitly also improves speed, as the data are only passed over once to load, instead of twice to determine the document type. In larger datasets, this advantage can be noticeable.

doctype(dta) specifies that an XML file will be loaded using Stata's dta format. This document type follows closely Stata's binary .dta format (see [P] file formats .dta).

doctype(excel), specifies that an XML file will be loaded using Microsoft's SpreadsheetML document type definition. SpreadsheetML is the term given by Microsoft to the Excel XML format.

sheet("sheetname") imports the worksheet named sheetname. Excel files can contain multiple worksheets within one document, so using the sheet() option specifies which of these to load. The default is to import the first worksheet to occur within filename.xml.

cells(upper-left:lower-right) specifies a cell range within an Excel worksheet to load. The default range is the entire range of the worksheet, even if portions are empty. Often times the use of cells() is necessary because data are offset within a spreadsheet, or only some of the data need to be loaded. Cell range notation follows the letter-for-column and number-for-row convention that is popular within all spreadsheet applications. The following are valid examples:

. xmluse filename, doctype(excel) cells(A1:D100)

. xmluse filename, doctype(excel) cells(C23:AA100)

datestring forces all Excel SpreadsheetML date formats to be imported as strings to retain time information that would otherwise be lost if automatically converted to Stata's date format. With this option, time information can be parsed from the string after loading it.

allstring forces Stata to import all Excel SpreadsheetML data as string data. Although data type information is dictated by SpreadsheetML, there are no constraints to keep types consistent within columns. When such inconsistent use of data types occurs in SpreadsheetML, the only way to resolve inconsistencies is to import data as string data.

firstrow specifies that the first row of data in an Excel worksheet consist of variable names. The default behavior is to generate generic names. If any name is not a valid Stata variable name, a generic name will be substituted in its place.

missing forces any inconsistent data types within SpreadsheetML columns to be imported as missing data. This can be necessary for various reasons but often will occur when a formula for a particular cell results in an error, thus inserting a cell of type ERROR into a column that was predominantly of a NUMERIC type.

nocompress specifies that data not be compressed after loading from an Excel SpreadsheetML file. Because data type information in SpreadsheetML can be ambiguous, Stata initially imports with broad data types and, after all data are loaded, performs a compress to reduce data types to a more appropriate size. The following table shows the data type conversion used before compression and the data types that would result from using nocompress:

SpreadsheetML type Initial Stata type ------------------------------------------- String str244 Number double Boolean double DateTime double Error str244 -------------------------------------------

clear clears data in memory before loading from filename.xml.

Examples saving XML files

To save the current Stata dataset to a file, auto.xml type

. xmlsave auto

To overwrite an existing XML dataset with a new file containing the variables make, mpg, and weight, type

. xmlsave make mpg weight using auto, replace

To save the dataset to an XML file for use with Microsoft Excel, type

. xmlsave auto, doctype(excel) replace

Examples using XML files

Assuming that we have a file named auto.xml that was saved using the doctype(dta) option of xmlsave, we can read in this dataset with the command

. xmluse auto, doctype(dta) clear

If the file was saved from Microsoft Excel to a file called auto.xml that contained the worksheet Rollover Data, with the first row representing column headers (or variable names), we could import the worksheet by typing

. xmluse auto, doctype(excel) sheet("Rollover Data") firstrow clear

Continuing with the previous example, if we wanted just the first column of data in that worksheet, and we knew there were only 75 rows, including one for the variable name, we could have typed

. xmluse auto, doc(excel) sheet("Rollover Data") cells(A1:A75) first clear

Also see

Manual: [D] xmlsave

Help: [D] compress, [D] fdasave, [D] infiling, [D] odbc, [D] outfile, [D] outsheet, [D] save


© Copyright 1996–2009 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index