Using SAS with NAACCR XML


Viewing 15 posts - 16 through 30 (of 37 total)
    Fabian Depry

    Hi Isaac,

    I wanted to try your Java solution, but I ran into an issue. SAS uses a private JRE that they maintain, and they are way behind: their latest version (SAS 9.4) requires Java 7 (which has been end-of-life for 3 years!). The NAACCR XML Java library is compiled under Java 8, so it’s not compatible with SAS 9.4.

    I got that information from this link:

    How did you make your example run with the Java 8 NAACCR XML library?

    Isaac Hands

    For SAS 9.4, this is the magic parameter you need to set in the JREOPTIONS of C:\Program Files\SASHome\SASFoundation\9.4\nls\en\sasv9.cfg:
    -Dsas.jre.libjvm=C:\Program Files\Java\jdk1.8.0_161\jre\bin\server\jvm.dll

    The instructions online about setting sas.jre.home are wrong; here is the complete JREOPTIONS setting from my sasv9.cfg file:

    /*  Options used when SAS is accessing a JVM for JNI processing  */
    	-Dsas.jre.libjvm=C:\Program Files\Java\jdk1.8.0_161\jre\bin\server\jvm.dll

    You probably know this, but don’t forget to set your environment variable CLASSPATH to point to your jar file.

    Fabian Depry

    I see. But this is a global setting you change on your local machine. The SAS instance I use is a company-wide instance running on a remote Linux server.

    I guess I could ask our IT to change the SAS JRE globally for the company, but I am not sure they will accept that…

    I still think it’s an interesting solution, but I was hoping to be able to set the JRE when calling SAS (or that the default JRE would support Java 8 which has been out for 5 or 6 years now). That’s a bit disappointing.

    Thanks for the info though.

    Fabian Depry

    I put together a solution for reading using an XML Mapper and for writing using a tagset template. It seems to work fine for small data files but it doesn’t scale well and those solutions are not really usable for big files.

    I am currently working on a solution that involves calling a Java Archive (JAR) through SAS; the Java creates a tmp CSV file based on the XML and SAS can then easily read that CSV. The logic for calling Java is embedded in a SAS macro that can easily be distributed. This solution is still slower than dealing with flat files, but it’s much more reasonable for big files than the XML Mapper and/or tagsets.
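To make the XML-to-CSV idea above concrete, here is a minimal Java sketch. It is not the code in the JAR: the names `XmlToCsvSketch` and `toCsv` are made up for illustration, it uses the JDK's built-in StAX reader for brevity, it assumes the usual NaaccrData > Patient > Tumor nesting of `Item` elements, and it simply drops root-level items. It produces one CSV row per Tumor, restricted to a requested list of item IDs.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: flattens NAACCR XML into CSV text, one row per Tumor.
class XmlToCsvSketch {

    // Always quote fields and double embedded quotes, so commas in text are safe.
    static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    static String toCsv(String xml, List<String> fields) {
        StringBuilder csv = new StringBuilder();
        // Header line: the requested item IDs, in order.
        for (int i = 0; i < fields.size(); i++) {
            csv.append(i == 0 ? "" : ",").append(fields.get(i));
        }
        csv.append('\n');

        Map<String, String> patientItems = new HashMap<String, String>();
        Map<String, String> tumorItems = new HashMap<String, String>();
        boolean inTumor = false;
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if ("Patient".equals(name)) {
                        patientItems.clear(); // also discards any root-level items
                    } else if ("Tumor".equals(name)) {
                        tumorItems.clear();
                        inTumor = true;
                    } else if ("Item".equals(name)) {
                        String id = reader.getAttributeValue(null, "naaccrId");
                        String text = reader.getElementText(); // also reads CDATA
                        (inTumor ? tumorItems : patientItems).put(id, text);
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "Tumor".equals(reader.getLocalName())) {
                    // End of a Tumor: emit one row, tumor values first, then patient values.
                    for (int i = 0; i < fields.size(); i++) {
                        String f = fields.get(i);
                        String v = tumorItems.containsKey(f) ? tumorItems.get(f) : patientItems.get(f);
                        csv.append(i == 0 ? "" : ",").append(quote(v == null ? "" : v));
                    }
                    csv.append('\n');
                    inTumor = false;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Invalid XML", e);
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        String xml = "<NaaccrData><Patient><Item naaccrId=\"patientIdNumber\">001</Item>"
                + "<Tumor><Item naaccrId=\"primarySite\">C509</Item></Tumor></Patient></NaaccrData>";
        System.out.print(toCsv(xml, Arrays.asList("patientIdNumber", "primarySite")));
    }
}
```

A real implementation would stream the rows to a temporary file rather than build the whole CSV in memory, which matters for the large files discussed here.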

    I posted all my code and experiments in the Java NAACCR XML GitHub project: (there is a NAACCR XML and SAS section at the bottom).

    Please feel free to download those examples and try them yourself and provide feedback in this forum!

    Valerie Yoder

    I tried the examples with Fabian’s JAR and I think it’s the best solution for SAS so far. It’s more straightforward for a user than the XML Mapper (I did not try tagsets). No special configuration was necessary, which is great. I like the ability to specify a short list of variables to read; this is easier than what we currently have to do for flat files!

    I found the speed very good: it took slightly less time to read one of our annual submission files in XML than it did to read the flat file (both v16), and about twice as long to write it back out. I read abstracts in SAS far more often than I write them, so the writing being slower doesn’t bother me. When I read a smaller full abstract file from a hospital (v16, converted), the speed difference was negligible.

    - Add a ‘replace’ option to the tmp CSV import step.
    - There are some truncation problems on import. With the submission file, this short list of variables was imported as $1. when they should be $2. to $4. (and therefore the truncated value was written out): tumorSizeSummary, tnmEditionNumber, tnmPathT, tnmPathN, tnmPathM, tnmPathStageGroup, tnmClinT, tnmClinN, tnmClinM, tnmClinStageGroup, radRegionalRxModality.
      When I read full abstracts, 65 variables were truncated; for example, addrCurrentCity was imported as $14. instead of $50.

    Fabian Depry

    That’s great, thanks for testing that solution!

    I will add the replace option, that makes sense.

    I assume that the truncation happens because SAS only uses a subset of the rows to determine the max length of a given column when reading CSV. If that’s the issue then I think I have a solution. I will try it soon and post new files.
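If the cause is indeed row sampling, one possible workaround (an assumption, not necessarily what the updated JAR does) is to have the Java side scan every value and report the longest one per column, so the SAS import step can declare explicit lengths instead of guessing. In the sketch below, `ColumnLengthSketch`, `maxLengths`, and `lengthStatement` are hypothetical names. Note that `String.length()` counts UTF-16 code units while SAS lengths are in bytes, so non-ASCII text would need extra care.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: find the longest value in every CSV column.
class ColumnLengthSketch {

    static Map<String, Integer> maxLengths(List<String> header, List<List<String>> rows) {
        // LinkedHashMap keeps the columns in header order.
        Map<String, Integer> lengths = new LinkedHashMap<String, Integer>();
        for (String column : header) {
            lengths.put(column, 1); // a SAS character variable is at least 1 byte long
        }
        // Look at every row, not just a sample of the first few.
        for (List<String> row : rows) {
            for (int i = 0; i < header.size() && i < row.size(); i++) {
                String value = row.get(i);
                int len = value == null ? 0 : value.length();
                if (len > lengths.get(header.get(i))) {
                    lengths.put(header.get(i), len);
                }
            }
        }
        return lengths;
    }

    // Builds a SAS LENGTH statement such as "length tnmPathT $3;".
    static String lengthStatement(Map<String, Integer> lengths) {
        StringBuilder sb = new StringBuilder("length");
        for (Map.Entry<String, Integer> e : lengths.entrySet()) {
            sb.append(' ').append(e.getKey()).append(" $").append(e.getValue());
        }
        return sb.append(';').toString();
    }

    public static void main(String[] args) {
        List<String> header = Arrays.asList("tnmPathT", "addrCurrentCity");
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("p1", "Bethesda"),
                Arrays.asList("p1A", "San Francisco"));
        System.out.println(lengthStatement(maxLengths(header, rows)));
    }
}
```

The generated statement could then be placed in the data step ahead of the input logic, so the declared lengths take precedence over anything inferred from the first rows.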

    Fabian Depry

    Hi Valerie, I implemented the changes we talked about; do you mind re-trying your example when you get a chance?

    Note that I removed the version from the macro filenames; I figured it will be easier for people to just replace the files.

    I also re-created the SAS JAR file with a fix for the length issue and re-posted it under the same name (naaccr-xml-4.9-sas.jar) on the release page of the GitHub project (eventually the version will be increased, but I figured this can still be considered the “first” version).


    Valerie Yoder

    The truncation issues seem to be fixed! I agree the macro files don’t need versions.

    There’s a problem reading text fields that contain CDATA: they are not imported.
    <Item naaccrId="rxTextRadiation" naaccrNum="2620"><![CDATA[1/1/18 HOSPITAL – DR X. O’EXAMPLE: SOMETEXT – MORETEXT & SOMEMORETEXT]]></Item>

    Fabian Depry

    I will take another look at some point.

    To make this work with SAS, which still requires Java 7 or earlier (which has been end-of-life for several years now), I had to implement my own (simple) parsing logic. So there are things that are not going to be properly parsed. Hopefully I can address them as they are found.

    Fabian Depry

    Hi Valerie, I fixed the issue you reported with the CDATA sections. I re-created the JAR file (with the same 4.9 version still) and re-posted it on GitHub. It would be great if you could confirm the fix is working as expected.

    Thank you!

    Valerie Yoder

    Yes, overall the fix is working for CDATA sections! Just one minor additional issue: when there are pairs of [] within the text, the second ] onward is consistently not read. The text is cut off in both the temp CSV and the SAS dataset.
    Example XML:
    <Item naaccrId="rxTextChemo" naaccrNum="2640"><![CDATA[1/10/2016 DrugB (Part1, Part2, & Part3) @ Facility w/ Dr. Name. [DrugA started in 1/2015, but DrugB regimen planned] 1/15/2017 Drugc @ Facility w/ Dr Name2]]></Item>

    The resulting variable only contains:
    1/10/2016 DrugB (Part1, Part2, & Part3) @ Facility w/ Dr. Name. [DrugA started in 1/2015, but DrugB regimen planned

    I think this is likely because CDATA uses []; I didn’t have problems with any other special characters in the data I tested.
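For what it's worth, a CDATA section may contain single `]` characters freely; it ends only at the three-character sequence `]]>`. A minimal sketch of extracting element text on that basis follows. It is not the JAR's parsing code: `CdataSketch` and `extractText` are hypothetical names, and outside CDATA only the five predefined XML entities are unescaped (numeric character references are left alone).

```java
// Hypothetical sketch: read the text content of an element that may mix
// plain character data and CDATA sections.
class CdataSketch {

    private static final String START = "<![CDATA[";
    private static final String END = "]]>";

    static String extractText(String content) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < content.length()) {
            int start = content.indexOf(START, pos);
            if (start < 0) {
                // No more CDATA sections: the rest is ordinary escaped text.
                out.append(unescape(content.substring(pos)));
                break;
            }
            out.append(unescape(content.substring(pos, start)));
            // Search for the full "]]>" terminator, not just the first ']'.
            int end = content.indexOf(END, start + START.length());
            if (end < 0) {
                throw new IllegalArgumentException("Unterminated CDATA section");
            }
            // CDATA content is taken verbatim, brackets and ampersands included.
            out.append(content, start + START.length(), end);
            pos = end + END.length();
        }
        return out.toString();
    }

    // Handles only the five predefined entities; &amp; goes last so that
    // "&amp;lt;" becomes "&lt;" rather than "<".
    static String unescape(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">")
                .replace("&quot;", "\"").replace("&apos;", "'")
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        System.out.println(extractText("<![CDATA[a [b] c]]>")); // prints: a [b] c
    }
}
```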

    Fabian Depry

    Thanks for testing again! It’s really difficult to properly cover all those corner cases! I really should use a standard Java XML parser, but none of them is still compatible with Java 7, which is required by SAS. They really need to move along and update their Java version!

    I will take a look at this one soon.

    Fabian Depry

    Hi Valerie, the issue you reported should be fixed now, and I actually released a new version of the library (version 4.10). If you happen to re-test this, please make sure to use that new version of the SAS JAR file and not the previous 4.9. Thanks!

    Valerie Yoder

    Looks good to me, I don’t see any other problems at this time!

    Fabian Depry

    Awesome! Thanks for testing!


Copyright © 2018 NAACCR, Inc. All Rights Reserved