Forum Replies Created
Bruce Riddle
Spectator
Using Northcon 210 (1.0.0.17), NAACCR XML Utility 7.5, and the NAACCR-XML Java Library on GitHub, I have successfully re-written our SAS programs to read and write NAACCR 21 XML files for case processing. GenEdits 5.1.064 and the NAACCR 21A metafile were used to examine the input and output XML files. All the software appeared to work well. I customized the read and write macros from GitHub to meet our needs for NAACCR 21 and the "A" record type. Two minor problems remain. One is finding a way to suppress some of the output produced by the macro; a sequence of six or more reads or writes makes the SAS log much less useful for debugging. The other is that if you create errors that involve the Java library, the only recovery is to stop and restart SAS.
Many thanks to everyone who developed these tools. Getting all of this to work was a major relief.
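For the log problem, the best I have come up with so far is not a feature of the macro at all, just standard SAS system options wrapped around each call; it only quiets NOTE and source lines, so any %put messages from the macro still print:

/* Quiet the log around repeated macro calls (standard SAS options, not part of the NAACCR macro). */
options nonotes nosource nosource2;

%readNaaccrXml(
    libpath="J:\XML\SAS_XML\naaccr-xml-utility-7.5\naaccr-xml-utility-7.5\sas",
    sourcefile="J:\XML\Oct2020_test\120170_16Sep2020_V21.xml",
    naaccrversion="210",
    recordtype="A",
    dataset=stg1 ) ;

/* Restore normal logging for the rest of the program. */
options notes source source2;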
Bruce Riddle
Spectator
I am trying to work with NAACCR 210. How do I start to debug this?
1    * test of reading XML file using SAS ;
2    filename txml "J:\XML\SAS_XML\naaccr-xml-utility-7.5\naaccr-xml-utility-7.5\sas" ;
3    %include txml (read_naaccr_xml_macro.sas) ;

86   %readNaaccrXml(
87       libpath="J:\XML\SAS_XML\naaccr-xml-utility-7.5\naaccr-xml-utility-7.5\sas",
88       sourcefile="J:\XML\Oct2020_test\120170_16Sep2020_V21.xml",
89       naaccrversion="210",
90       recordtype="A",
91       dataset=stg1 ) ;

ERROR: Could not find class com/imsweb/naaccrxml/sas/SasXmlToCsv at line 1 column 111. Please ensure that the
       CLASSPATH is correct.
ERROR: DATA STEP Component Object failure. Aborted during the EXECUTION phase.
java.lang.ClassNotFoundException: com.imsweb.naaccrxml.sas.SasXmlToCsv
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
      real time           0.45 seconds
      cpu time            0.04 seconds
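In case it helps anyone hitting the same error: my guess (unconfirmed) is that the Java classpath SAS starts up with does not include the naaccr-xml jar, so the Java step cannot load com.imsweb.naaccrxml.sas.SasXmlToCsv. Something like the following is what I am checking; the jar file name below is only a placeholder for whatever jar actually ships with the utility:

/* Show what CLASSPATH the SAS session currently sees (may warn if it is not set at all). */
%put CLASSPATH=%sysget(CLASSPATH);

/* Point CLASSPATH at the naaccr-xml jar. The jar name below is a placeholder -    */
/* use the jar that ships with the utility. This has to be in effect before the    */
/* first Java call of the session; if Java has already started, restart SAS first. */
options set=CLASSPATH "J:\XML\SAS_XML\naaccr-xml-utility-7.5\naaccr-xml-utility-7.5\sas\naaccr-xml.jar";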
Bruce Riddle
Spectator
I have no issues with SAS macros; I use them. This is still early in the game and I have a limited amount of test data. Fabian pointed out today that a NAACCR 18 flat file I received from a national vendor does not exactly meet the "specifications" to be used in SEER DataViewer. I have no power to get the vendor to "fix" the files, and I assume the XML files I will receive will also be imperfect. I am trying to test out case-processing scenarios that address the issues I outlined above and allow me to manage and track the inflows and characteristics of data received. I am only one voice.
The current structure of the SAS macro is awkward and limited for processing up to 40 files in a batch when the data are imperfect. Some of this is my own idiosyncratic way of working with data; I never use long, complex variable names in large data sets. Who wants to learn 791 variable names? I strongly prefer the NAACCR Item numbers. Right now the tools being developed seem to focus on one file at a time. I do not want to touch a file unless it is so bad it needs work. I want to work on files in batches, with tools that allow me to add supplemental information to track the who and when. Before anyone writes yet another program, more thought is required, and the tools in development need to be released for testing and evaluation.
Bruce Riddle
Spectator
Issac,
I tried the XML macro and it does not scale well for monthly production. It is very awkward in a multi-file production run.
Bruce
Bruce Riddle
Spectator
Joe,
I think it is a wonderful idea. I have used XML Exchange Plus in my experiments and I like the tool.
Bruce
Bruce Riddle
Spectator
We expect we will start to receive XML files in January or February. Although the flat file option will exist, I expect some IT people will make the choice for the registrars. Almost immediately I will need tools to go into a file and make changes, such as filling in missing hospital numbers or missing dates. I will also need to figure out a way to separate out rapid reports (within 45 days of diagnosis) and definitive reports (within 180 days of diagnosis). Can anyone suggest an XML text editor? I assume, given the size of the files, they will arrive zipped, so it would be nice if the editor read and saved ZIP files.
Thanks.
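P.S. Not an editor, but for quickly looking inside a zipped XML file without unzipping it first, the ZIP filename engine in SAS 9.4 may be an option. A sketch; the file and member names below are just examples:

/* Read a member straight out of a zip archive (SAS 9.4+); names are examples only. */
filename inzip zip "J:\XML\incoming\120170_16Sep2020_V21.zip"
                   member="120170_16Sep2020_V21.xml";

/* Peek at the first few lines of the XML to confirm what arrived. */
data _null_;
    infile inzip truncover;
    input line $char200.;
    if _n_ > 5 then stop;
    put line=;
run;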
Bruce Riddle
Spectator
Issac,
I was on the last call and I listened to the discussions.
I work with about 150 variables from the NAACCR dataset in SAS. It is
much easier to type the NAACCR Item number than some random short name. The
lookup is much faster. I am not asking for another name. The NAACCR numbers
are in place.
On a related topic, I am trying to work with Windows PowerShell to manipulate
HL7 ePath records. PowerShell provides a very useful way to do that with only
a few commands. In my work, I discovered that PowerShell can also manipulate
XML files. That should be very helpful.
Bruce Riddle
Spectator
I want to make the case in writing that a different approach is needed to get data out of a central registry database into analytical tools like SAS, GenEdits, InterRecordEdits, SEER*PATH, Match*PRO, etc.
NAACCR Volume 2 has the title 'Data Standards and Data Dictionary.' The next piece is called the 'XML Data Exchange Standard.' The primary goal of the data exchange standard is to ensure seamless transmission between registries, be it a hospital registry or a central registry. Nowhere is it written that the XML data exchange record has to be read by any of the analytical tools, and almost all of our analytical tools do not, or cannot, read XML documents very well.
Plan B: A secondary standard is needed that allows a pipe-delimited, formatted ASCII file to be exported from a central registry database and input into an analytical tool. The two models for this are SEER*STAT and MATCH*PRO.
The primary assumption I am making is that right now a pipe (‘|’) is not contained in any names, addresses, or coding schemes collected by a registry or imported into a registry software system. If that assumption is violated, then we need to find another delimiter.
I would like to see analytical file formats developed that consist of selected data items needed for normal work. For instance, prior to calls for data, a list of data items, along with their order, would be developed that could be brought into GenEdits and InterRecordEdits. That file format would be installed in the registry vendor software to output the necessary subsets of cases.
In New Hampshire, because we are so small, we would seek all cases 1995-2017 that meet the required criteria. The output file would contain approximately 136 reportable data items along with a few confidential data items to facilitate editing of cases. The resulting file would be smaller than an XML file, faster to output and faster to read in the analytical software.
I would strongly prefer that the header use NAACCR Item numbers as the variable names (N18_20, N18_390, N18_400, etc.) to make manipulation easier.
Rarely does a central registry need to output the entire NAACCR record. It would be necessary for inter-state data exchange, for archive purposes, and for transmission to some authorities.
Other pipe-delimited file formats could be used for submission to the NAACCR Geocoder, Match*PRO, etc. SAS PROC IMPORT can easily read a pipe-delimited file with a header.
The use of analytical pipe-delimited files does not diminish the value of the XML Standard for Data Exchange. At a certain size, a delimited file becomes unwieldy and cumbersome.
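To make the SAS side concrete, this is roughly all it would take, assuming the export tool wrote a pipe-delimited file whose header row is the NAACCR Item numbers; the file name below is made up for illustration:

/* Read a pipe-delimited analytical file with a header row of NAACCR Item numbers  */
/* (N18_20, N18_390, N18_400, ...). The file name is illustrative only.            */
proc import datafile="J:\XML\analytic\nh_1995_2017.txt"
            out=work.analytic
            dbms=dlm
            replace;
    delimiter='|';
    getnames=yes;
    guessingrows=max;   /* scan every row before deciding each variable's type */
run;

proc contents data=work.analytic;
run;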
Bruce Riddle
Spectator
I agree that the NAACCR Number is very important to operations. Almost all my code uses the NAACCR number to refer to variables. The number gives me an exact name. I cannot imagine writing code day to day with the longer names in the XML specification. It is just a great deal of typing and more chances to make errors.
Bruce
Bruce Riddle
Spectator
A very insightful comment. The conversion will not be simple.
Bruce Riddle
Spectator
More research, and I have figured out Part 1 of my question. Part 2 is harder. Like the SAS conversion issue, the challenge remains how to create an accurate analytic record that contains the correct patient and correct tumor info.
Bruce Riddle
Spectator
We use RMCDS as the registry database. NH Rules and Regs require reporters to send us a rapid report within 45 days of diagnosis. Almost all reporter transmissions contain a mix of rapid and definitive (complete) reports. I use SAS to separate out rapids from definitives; in that step, I can also correct for missing or incomplete data.
RMCDS only lets us load NAACCR records, so I use SAS to take reports from non-hospital reporters (pathology cases, death-clearance-only records, clinic records) and create a NAACCR record to load into RMCDS.
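For what it is worth, the split itself is a short data step once the dates are usable; this is only a sketch, and the dataset and reporting-date variable names are mine, not anything standard:

/* Split incoming reports into rapid (within 45 days of diagnosis) and definitive   */
/* (within 180 days). NAACCR dates are 8-character YYYYMMDD text; anything that     */
/* cannot be converted falls into the "review" dataset. Names are illustrative.     */
data rapid definitive review;
    set work.incoming;

    dx_date  = input(dateOfDiagnosis, ?? yymmdd8.);
    rpt_date = input(reportDate, ?? yymmdd8.);   /* whatever date marks the report */

    if nmiss(dx_date, rpt_date)       then output review;
    else if rpt_date - dx_date <= 45  then output rapid;
    else if rpt_date - dx_date <= 180 then output definitive;
    else output review;

    format dx_date rpt_date yymmdd10.;
run;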
Bruce
Bruce Riddle
Spectator
My experiments with SAS and XML have not been very successful. Losing SAS would eliminate a very powerful tool, both for basic file processing prior to loading data into the registry database and for working with the data on export from the registry database. I have little hope that SAS will invest in a more advanced XML tool.
Here is one idea for a solution, at least for creating analytical files. SAS PROC IMPORT will read delimited files with a header, which provides an option for two applications. The first application would export selected variables from the main database in a pipe-delimited format with a header. To make this more user friendly, the application needs a configuration page where you can check off the variables you need and save that list as a file for future use; some users will only need to set the configuration once. SAS PROC IMPORT can then read the delimited file and create the SAS data set.
The second application would read an XML file and perform the same task as above.
In both instances, there is one line per patient/tumor. Very few exercises require the entire set of NAACCR variables, so these analytic data sets should be fairly small.
The major advantage of this method is that you do not need any input or format statement. The significant disadvantage is that PROC IMPORT selects the input format, so sometimes you get numeric when you want character, etc.
Another version of the above is to write out two separate files: one file of pipe-delimited data and a second file containing the input statement. The input statement could easily be dragged into a SAS program, and the configuration page could allow for the selection of formats. For example, I read all dates as character since NAACCR allows dates with blanks; in SAS, I can fill in the blanks before creating a SAS date that can be manipulated.
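As an illustration of the two-file version, the generated "input" piece could be nothing more than a data step like this, which keeps every date as character so the blanks can be patched before a SAS date is built; the path and the three items shown are examples only:

/* Explicit read of the pipe-delimited file so every variable gets the type I      */
/* choose. Path and the handful of items shown are examples only.                  */
data work.analytic;
    infile "J:\XML\analytic\nh_1995_2017.txt" dlm='|' dsd missover firstobs=2;
    input N18_20   : $8.      /* Patient ID Number                               */
          N18_390  : $8.      /* Date of Diagnosis, YYYYMMDD, may contain blanks */
          N18_400  : $4.      /* Primary Site                                    */
          ;

    /* Default a blank month or day to '01' before building a SAS date. */
    if substr(N18_390, 5, 2) = '  ' then substr(N18_390, 5, 2) = '01';
    if substr(N18_390, 7, 2) = '  ' then substr(N18_390, 7, 2) = '01';
    dx_date = input(N18_390, ?? yymmdd8.);
    format dx_date yymmdd10.;
run;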
The XML file for a standard time period, 1995 to 2018, will be very large. Few registries will have the storage capacity to keep a reasonable number of these files around. The ability to easily create analytic files is very important. Finding a convenient way to unzip, run a tool or GenEdits, and re-zip will also be important.
Bruce Riddle
Spectator
As I said in the beginning, "a number of powerful and robust tools." Many smaller registries do not have an IT staff, so they have not really thought about XML and the impact it will have on registry operations. If many tools are not present, the move to XML will be very difficult.
Bruce Riddle
Spectator
I tried out the sample code Fabian posted on XML files I created using various tools and our data for one year. The good news is that I got identical results from the XML files exported by the tools, although they differed in size: for 8,000 cases, one file was 188,725 KB and the other was 148,833 KB. The bad news is that it is very slow. SAS is provided under license to NPCR registries and many take advantage of the opportunity. Few registries I know have any staff who know Java, Python, or C++. If I knew C++ or Java well enough to write code to manipulate XML, I would get a much better paying job.
One suggestion here was to use the XML tools built into MS SQL. We will explore that idea.
B