Reply To: Using SAS with NAACCR XML


Fabian Depry

I think we are done looking at SAS for now.

For reading NAACCR XML in SAS, it looks like an acceptable solution will involve an XMLMap file that tells SAS how to construct the data sets from the different levels of data in the XML: one data set for NaaccrData, one for Patient and one for Tumor (all three can be defined in a single XMLMap file). The XMLMap will define SAS variables based on their NAACCR ID attribute (using XPath); it will also define a few “ORDINAL” variables, which serve as identifiers for every row of the data sets (they are called ORDINAL because they are counters incremented each time a specific tag is found in the data file). A SAS program can then merge the data sets back together using the ORDINAL variables as linkage variables; the end result is a single data set where the NaaccrData data is repeated for every Patient and Tumor, and the Patient data is repeated for every Tumor (which is the same behavior as reading flat files).

There is one caveat to this solution: SAS will read and process every variable defined in the XMLMap, so a mapping file that defines all variables won’t be practical for large data files (the processing will be too slow). Instead, a smaller mapping file should be used, containing just the variables the program needs. Hopefully it will be possible to create those specialized XMLMap files using open-source software.

I am attaching an example that uses a mapping file with only a few variables:
– naaccr-xml-v16-data-sample.xml: a very simple NAACCR XML sample file
– an XMLMap file containing the definition of one variable at each XML level (plus the ordinal variables)
– a simple SAS program that merges Patient and Tumor data from the sample file and prints frequencies of the defined variables
– readin.level2.output: the results of running the SAS program (I only copied the relevant frequencies)
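
To make the ORDINAL idea more concrete, here is a small sketch of the same split-then-merge logic. It is written in Python rather than SAS, and it is not the attached SAS program or XMLMap; it only illustrates how counters assigned per tag can link the three levels back into one row per Tumor. The variable names data_ord, patient_ord and tumor_ord, the helper functions and the sample values are all made up for this illustration, and the sample XML omits the namespace and root attributes that a real NAACCR XML file carries.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for a NAACCR XML file (namespace and root attributes omitted).
SAMPLE = """\
<NaaccrData>
  <Item naaccrId="registryId">0000000001</Item>
  <Patient>
    <Item naaccrId="patientIdNumber">00000001</Item>
    <Tumor><Item naaccrId="primarySite">C509</Item></Tumor>
    <Tumor><Item naaccrId="primarySite">C619</Item></Tumor>
  </Patient>
  <Patient>
    <Item naaccrId="patientIdNumber">00000002</Item>
    <Tumor><Item naaccrId="primarySite">C341</Item></Tumor>
  </Patient>
</NaaccrData>
"""


def items(element):
    """Return {naaccrId: value} for the Item children directly under element."""
    return {i.get("naaccrId"): (i.text or "") for i in element.findall("Item")}


def read_levels(xml_text):
    """Build one table per XML level, with counters acting as ordinal keys."""
    root = ET.fromstring(xml_text)
    naaccr_data = [{"data_ord": 1, **items(root)}]
    patients, tumors = [], []
    for patient in root.findall("Patient"):
        patient_ord = len(patients) + 1  # incremented on every Patient tag
        patients.append({"data_ord": 1, "patient_ord": patient_ord, **items(patient)})
        for tumor in patient.findall("Tumor"):
            tumor_ord = len(tumors) + 1  # incremented on every Tumor tag
            tumors.append({"patient_ord": patient_ord, "tumor_ord": tumor_ord, **items(tumor)})
    return naaccr_data, patients, tumors


def merge_levels(naaccr_data, patients, tumors):
    """Join the three tables on the ordinal keys: one output row per Tumor."""
    data_by_ord = {row["data_ord"]: row for row in naaccr_data}
    patient_by_ord = {row["patient_ord"]: row for row in patients}
    rows = []
    for tumor in tumors:
        patient = patient_by_ord[tumor["patient_ord"]]
        rows.append({**data_by_ord[patient["data_ord"]], **patient, **tumor})
    return rows


rows = merge_levels(*read_levels(SAMPLE))
for row in rows:
    print(row["registryId"], row["patientIdNumber"], row["primarySite"])
```

Each printed line corresponds to one Tumor, with the NaaccrData and Patient values repeated on it, which is the flat-file-like layout described above.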

For writing NAACCR XML from SAS, the conclusion is “don’t do it”. We found no satisfactory way of using an XMLMap to re-create a valid NAACCR XML file. There are other solutions that don’t use an XMLMap, but they are very involved and require a kind of coding that most people wouldn’t be willing to do. Other tools and software that can recode variables will probably be updated to support NAACCR XML; the best approach for recoding XML files would be to switch to those tools.

*** Update: it looks like I can’t upload the files in this post; all the files have been uploaded to the Java NAACCR XML project on GitHub:

