Isaac Hands

Following up on my last post to this thread…
I wonder whether a CSV file would be a good intermediary between SAS and NAACCR XML. CSV avoids many of the limitations of the fixed-width file, such as needing to know the position and length of every variable beforehand, so translating between XML and CSV would not require maintaining Volume II metadata for every NAACCR Item. CSV is still limited for conveying multi-tier data, such as Patient/Tumor/etc., but SAS does not understand multi-tier data models anyway, so maybe that's OK for this use case.
I have been playing around with some Java code running inside SAS that generates CSV from NAACCR XML, which is then loaded as a SAS dataset. So far it looks promising: loading a 6GB NAACCR XML file into a SAS dataset takes about 4.5 minutes with this method on a pretty basic Windows 10 desktop computer. I'm not sure whether that will be acceptable, but it might make a nice proof of concept.
Here is what the SAS code looks like:

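filename xmlfile 'C:\Users\isaac\Documents\ky9515v16.xml';
filename csvfile 'C:\Users\isaac\Documents\ky9515v16.csv';

/* Run the Java converter; assumes its constructor takes the two physical paths as strings */
data _null_;
    declare JavaObj j1 ("edu/uky/kcr/naaccrxml/csv/ConvertXmlToCsv",
                        "%sysfunc(pathname(xmlfile))", "%sysfunc(pathname(csvfile))");
    j1.callVoidMethod ("convert");
    j1.delete();
run;

/* Read the generated CSV into a dataset; work.naaccr is a placeholder name */
proc import datafile=csvfile out=work.naaccr dbms=csv replace;
run;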

The Java code behind this uses the Java NAACCR XML library from IMS.
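To give a rough idea of the flattening step, here is a minimal sketch of how Patient/Tumor tiers can be collapsed into one CSV row per Tumor. It is not the ConvertXmlToCsv class and it does not use the IMS library; it swaps in the JDK's built-in StAX parser so it can run on its own. The class name NaaccrCsvSketch and its methods are made up for illustration. It also assumes everything fits in memory and skips patients with no tumors, so a multi-gigabyte file would need a streaming or two-pass variant.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical stand-in: plain StAX, not the IMS NAACCR XML library.
class NaaccrCsvSketch {

    // Flattens NaaccrData/Patient/Tumor items into one CSV row per Tumor.
    static String toCsv(Reader xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader r = factory.createXMLStreamReader(xml);

        Set<String> columns = new LinkedHashSet<>();        // column order = first appearance
        List<Map<String, String>> rows = new ArrayList<>();
        Map<String, String> root = new LinkedHashMap<>();    // items directly under NaaccrData
        Map<String, String> patient = new LinkedHashMap<>(); // items of the current Patient
        Map<String, String> tumor = new LinkedHashMap<>();   // items of the current Tumor
        Map<String, String> current = root;

        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = r.getLocalName();
                if (name.equals("Patient")) {
                    patient.clear();
                    current = patient;
                } else if (name.equals("Tumor")) {
                    tumor.clear();
                    current = tumor;
                } else if (name.equals("Item")) {
                    // Read the attribute before getElementText() moves the cursor.
                    String id = r.getAttributeValue(null, "naaccrId");
                    columns.add(id);
                    current.put(id, r.getElementText());
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                String name = r.getLocalName();
                if (name.equals("Tumor")) {
                    // Repeat the root and patient values on every tumor row.
                    Map<String, String> row = new LinkedHashMap<>(root);
                    row.putAll(patient);
                    row.putAll(tumor);
                    rows.add(row);
                    current = patient;
                } else if (name.equals("Patient")) {
                    current = root;
                }
            }
        }

        StringBuilder sb = new StringBuilder();
        List<String> header = new ArrayList<>(columns);
        appendLine(sb, header);
        for (Map<String, String> row : rows) {
            List<String> cells = new ArrayList<>();
            for (String column : header) {
                cells.add(row.getOrDefault(column, ""));
            }
            appendLine(sb, cells);
        }
        return sb.toString();
    }

    // Writes one comma-separated line terminated by "\n".
    static void appendLine(StringBuilder sb, List<String> cells) {
        for (int i = 0; i < cells.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(escape(cells.get(i)));
        }
        sb.append('\n');
    }

    // Quotes a value if it contains a comma, quote, or line break.
    static String escape(String v) {
        if (v.contains(",") || v.contains("\"") || v.contains("\n") || v.contains("\r")) {
            return "\"" + v.replace("\"", "\"\"") + "\"";
        }
        return v;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<NaaccrData><Patient>"
                + "<Item naaccrId=\"patientIdNumber\">001</Item>"
                + "<Tumor><Item naaccrId=\"primarySite\">C509</Item></Tumor>"
                + "</Patient></NaaccrData>";
        System.out.print(toCsv(new StringReader(xml)));
    }
}
```

Because the column list is built from whatever naaccrId values show up in the file, nothing like the fixed-width position/length metadata is needed, at the cost of repeating the patient-level values on every tumor row.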

