Forum Replies Created
September 23, 2020 at 11:00 am in reply to: using SAS with NAACCR XML #13207
I found the issue and fixed the write macro. You can find the new version (7.3) on the release page of the Java NAACCR XML project (https://github.com/imsweb/naaccr-xml/releases/).
I also updated/improved the NAACCR XML and SAS wiki page (https://github.com/imsweb/naaccr-xml/wiki/7:-NAACCR-XML-and-SAS).
Let me know if this is still not working as you expect.
September 23, 2020 at 8:27 am in reply to: using SAS with NAACCR XML #13206
There is definitely a problem in the write macro, and I think I know what it is.
I am going to take a closer look today and release an updated version soon.
September 18, 2020 at 6:47 am in reply to: user dictionaries #13137
There is also some good information on this wiki page:
In particular, the page contains links to three free applications that can be used to create dictionaries. Those applications provide a nice GUI; they validate the content and write it out nicely formatted. Trying to create a dictionary by hand is not recommended.
February 21, 2020 at 12:31 pm in reply to: Trimming the flatfile fat #12348
I totally share your excitement about the switch; it might be a painful transition for some, but in the end it will allow many improvements to the process of transmitting NAACCR data, improvements that wouldn’t be possible with a limited flat-line data model!
Your point about missing information in the documentation is well taken, and we are going to see if we can make that clear.
Technically, the trimming and padding rules were only necessary to stay compatible with the flat-file format (although that’s not completely true for the zero-padding rules). But since some software currently relies on that information, it is unlikely those rules will be completely removed in the near future. They might be phased out over a longer period of time.
December 17, 2019 at 10:55 am in reply to: Discussion: The case for preserving naaccrNum #12152
To the best of my knowledge, there is no plan (short term or long term) to eliminate the NAACCR numbers from the NAACCR XML specifications.
It is true that the numbers are optional in the data files, but they are required in the dictionaries, meaning it is not possible to define a data item without defining a unique number for it.
The reasoning for introducing unique NAACCR XML IDs was that they are human readable; they introduce fewer potential conflicts (if an organization adds a proper prefix to its own data items, it basically eliminates any possible conflict with other data item IDs); and they are valid variable names in most programming languages, allowing developers to use them as-is in their programs (it wouldn’t make sense to call a variable 400, but it can be called primarySite).
The reasoning for making the numbers optional in the data files was that the IDs uniquely identify the data items, so technically software shouldn’t need the numbers to consume a data file (if software prefers to deal with the numbers, the dictionary provides a one-to-one mapping between the IDs and the numbers).
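To illustrate that one-to-one mapping, here is a rough Java sketch that reads a dictionary with the JDK’s DOM parser and builds an ID-to-number map, rejecting an item with a missing or reused number. It assumes the dictionary declares its items as `ItemDef` elements carrying `naaccrId` and `naaccrNum` attributes; the `DictionaryMapping` class is hypothetical and is not part of the Java NAACCR XML library.

```java
import java.io.InputStream;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper that turns a NAACCR XML dictionary into an ID-to-number map. */
final class DictionaryMapping {

    private DictionaryMapping() {}

    /**
     * Returns naaccrId -> naaccrNum for every ItemDef in the dictionary.
     * Rejects a missing or reused number, since each item needs its own unique number.
     */
    static Map<String, Integer> idToNumber(InputStream dictionary) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(dictionary);
        // "*" matches ItemDef in any namespace, including the dictionary's default namespace.
        NodeList defs = doc.getElementsByTagNameNS("*", "ItemDef");

        Map<String, Integer> idToNum = new LinkedHashMap<>();
        Map<Integer, String> numToId = new HashMap<>();
        for (int i = 0; i < defs.getLength(); i++) {
            Element def = (Element) defs.item(i);
            String id = def.getAttribute("naaccrId");
            String num = def.getAttribute("naaccrNum"); // empty string when the attribute is absent
            if (num.isEmpty()) {
                throw new IllegalArgumentException("No naaccrNum defined for " + id);
            }
            int number = Integer.parseInt(num.trim());
            String other = numToId.putIfAbsent(number, id);
            if (other != null) {
                throw new IllegalArgumentException(
                        "naaccrNum " + number + " is used by both " + other + " and " + id);
            }
            idToNum.put(id, number);
        }
        return idToNum;
    }
}
```

With the standard dictionary (for example a stream over naaccr-dictionary-180.xml), the resulting map lets a program that prefers numbers translate each `naaccrId` it reads.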
In the end, some people/organizations prefer to deal with numbers the way they always have, while others have embraced the new IDs and stopped using the numbers completely. Both approaches are perfectly fine.
December 17, 2019 at 10:29 am in reply to: NAACCR Fixed-Width data exchange format going away in 2020 #12149
Note that the timeline was pushed from 2020 to 2021 to give more time to vendors and registries.
NAACCR recently surveyed registries on their readiness and shared the results in a ListServ email; here is the link to the archive (the survey was about 2018 readiness, but there were a few XML questions at the end):
There was also another ListServ email sent recently to remind everyone of the available resources for the transition:
https://share.naaccr.org/community-home/digestviewer/viewthread?GroupId=25&MessageKey=91affbfe-5613-4a32-aec3-1f66acbd2522&CommunityKey=b29e2bb2-73c8-49d4-ad8e-7eb9abcda56d&tab=digestviewer&ReturnUrl=%2fcommunity-home%2fdigestviewer%3fCommunityKey%3db29e2bb2-73c8-49d4-ad8e-7eb9abcda56d
December 17, 2019 at 10:21 am in reply to: Example parsing of a patient record #12148
There is no javadoc available for the Java NAACCR XML library, but most of the public methods in that project do have good comments.
The best example of how to use the PatientXmlReader is its unit test:
The main class (NaaccrXmlUtils) also has methods that translate flat files to XML and XML to flat files; those are also good examples of using the PatientXmlReader to read all the available fields in a data file (see NaaccrXmlUtils.xmlToFlat() for example):
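If it helps to see the general idea outside the library, here is a rough sketch that streams a data file one Patient at a time using only the JDK’s StAX API. This is not the PatientXmlReader API: the `SimplePatient` and `SimplePatientStreamer` classes are hypothetical, and the sketch assumes the usual NaaccrData/Patient/Tumor nesting with values stored in `Item` elements keyed by `naaccrId`.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical minimal patient: patient-level items plus one item map per tumor. */
final class SimplePatient {
    final Map<String, String> items = new LinkedHashMap<>();
    final List<Map<String, String>> tumors = new ArrayList<>();
}

/** Hypothetical StAX-based reader that hands each Patient to a callback, one at a time. */
final class SimplePatientStreamer {

    private SimplePatientStreamer() {}

    static void forEachPatient(Reader xml, Consumer<SimplePatient> handler) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // common hardening; no DTD expected
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        SimplePatient patient = null;
        Map<String, String> tumor = null;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if ("Patient".equals(name)) {
                        patient = new SimplePatient();
                    } else if ("Tumor".equals(name)) {
                        tumor = new LinkedHashMap<>();
                    } else if ("Item".equals(name)) {
                        String id = reader.getAttributeValue(null, "naaccrId");
                        String value = reader.getElementText(); // leaves the cursor on </Item>
                        if (tumor != null) {
                            tumor.put(id, value);
                        } else if (patient != null) {
                            patient.items.put(id, value);
                        }
                        // Items directly under NaaccrData are skipped in this sketch.
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    String name = reader.getLocalName();
                    if ("Tumor".equals(name)) {
                        if (patient != null) {
                            patient.tumors.add(tumor);
                        }
                        tumor = null;
                    } else if ("Patient".equals(name)) {
                        handler.accept(patient);
                        patient = null;
                    }
                }
            }
        } finally {
            reader.close();
        }
    }
}
```

Since only one patient is held in memory at a time, this pattern scales to large files; a real reader would also need to handle root-level items and user-defined dictionaries, which this sketch ignores.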
I hope that helps.
December 17, 2019 at 10:16 am in reply to: NAACCR Record Version and XML #12147
We are talking about two different versions here:
1. The NAACCR XML Specifications version
2. The NAACCR Layout version
The specifications version (currently 1.4) is tied to the syntax of the XML (when a tag or attribute is removed/added, or when the specifications change, like the maximum number of characters allowed for the NAACCR XML IDs).
The layout version (currently 180) is tied to the data items (that’s the one the community is used to).
Each of those versions can be updated independently of the other.
In 2021, NAACCR will release a new layout (describing only XML, since fixed-column files will be retired), and that layout will be version 210 (or v21). There will be a new base dictionary associated with that version.
At that point, the specifications version will either still be 1.4 or will have been upgraded to address minor changes to the specifications.
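As a rough illustration that the two versions travel separately, the sketch below reads both from the root element of a data file. It assumes the root `NaaccrData` element has a `specificationVersion` attribute and a `baseDictionaryUri` attribute whose file name ends in `naaccr-dictionary-<layout>.xml` (for example `naaccr-dictionary-180.xml`); the `FileVersions` class is hypothetical.

```java
import java.io.Reader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical holder for the two independent versions a data file carries. */
final class FileVersions {

    // Assumes base dictionary URIs end with "naaccr-dictionary-<layout>.xml".
    private static final Pattern LAYOUT = Pattern.compile("naaccr-dictionary-(\\d+)\\.xml$");

    final String specificationVersion; // syntax of the XML, e.g. "1.4"
    final String layoutVersion;        // set of data items, e.g. "180"

    private FileVersions(String specificationVersion, String layoutVersion) {
        this.specificationVersion = specificationVersion;
        this.layoutVersion = layoutVersion;
    }

    /** Reads only the root element's attributes; either value is null when it can't be found. */
    static FileVersions read(Reader xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        try {
            reader.nextTag(); // skips whitespace, comments and processing instructions up to the root tag
            String spec = reader.getAttributeValue(null, "specificationVersion");
            String uri = reader.getAttributeValue(null, "baseDictionaryUri");
            String layout = null;
            if (uri != null) {
                Matcher m = LAYOUT.matcher(uri);
                if (m.find()) {
                    layout = m.group(1);
                }
            }
            return new FileVersions(spec, layout);
        } finally {
            reader.close();
        }
    }
}
```

A file using the v21 layout under the same specifications would change only the layout value, which is the independence described above.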
I hope this answers your question.
April 3, 2019 at 8:07 am in reply to: Dates in XML #10642
That’s correct: the “date” type is the same one currently used in the fixed-column files (without the trailing spaces). It’s not related to the more complex “XML” date types.
April 3, 2019 at 7:24 am in reply to: Sample XML Files #10639
The samples will be updated to NAACCR 18, but as you said, they are rudimentary; their purpose is to provide a set of “valid” and “invalid” files that can be used to test software.
There are tools that can create more complete “fake” data files (I know the SEER Data Viewer (https://seer.cancer.gov/tools/dataviewer/) does that, and there might be others out there; maybe someone else will comment on this), but as far as I know, those tools only set a small subset of values.
I think a “full” sample file (meaning all variables filled in) would probably need to be crafted by hand. I will bring this topic to the NAACCR XML workgroup during our next meeting.
April 3, 2019 at 7:15 am in reply to: Dates in XML #10638
The “date” format is defined in the XML Implementation Guide that is posted on the NAACCR website; here is what those specifications say about that format:
“A NAACCR-style full or partial date (yyyy, yyyymm or yyyymmdd).”
And here is the regular expression the specifications define for the type:
And so the following are valid dates (no trailing spaces):
The format doesn’t allow a known day with an unknown year/month, or a known day/month with an unknown year.
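Here is a small Java sketch of that format. The regular expression is my own illustration written from the description above (a four-digit year, optionally a month, and a day only when the month is present); it is not the expression published in the Implementation Guide, and the `NaaccrDateFormat` class is hypothetical. The second method shows the kind of calendar check that a pattern alone cannot express.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.regex.Pattern;

/** Hypothetical checks for the NAACCR "date" type (yyyy, yyyymm or yyyymmdd). */
final class NaaccrDateFormat {

    private NaaccrDateFormat() {}

    // Illustrative pattern, not the published one: a 4-digit year, an optional
    // month 01-12, and a day 01-31 allowed only when a month is present.
    private static final Pattern DATE =
            Pattern.compile("\\d{4}((0[1-9]|1[0-2])(0[1-9]|[12]\\d|3[01])?)?");

    /** Syntax-only check; trailing spaces are rejected. */
    static boolean isWellFormed(String value) {
        return value != null && DATE.matcher(value).matches();
    }

    /** Edit-style check: a full yyyymmdd date must also exist on the calendar. */
    static boolean isExistingFullDate(String value) {
        if (!isWellFormed(value) || value.length() != 8) {
            return false;
        }
        try {
            LocalDate.of(Integer.parseInt(value.substring(0, 4)),
                    Integer.parseInt(value.substring(4, 6)),
                    Integer.parseInt(value.substring(6, 8)));
            return true;
        } catch (DateTimeException e) {
            return false; // e.g. February 31
        }
    }
}
```

For instance, `20190231` passes `isWellFormed` but fails `isExistingFullDate`.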
Many invalid dates (future dates, or dates with a day that is too high for the given month) will be deemed “valid” by this definition; I think the idea was to use a simple regular expression to define what is and isn’t acceptable and to let edits deal with the corner cases.
July 23, 2018 at 1:39 pm in reply to: Need to build an interface for XML to SQL Datadata #7484
Yup, the “naaccr-dictionary-180.xml” file is the best source for mapping NAACCR numbers to NAACCR IDs, at least for standard items. For non-standard items, that mapping should be provided in a “user-defined dictionary” by whoever created the XML data files (the data-generator doesn’t support user-defined dictionaries; I wanted to keep things very simple for now).
July 23, 2018 at 1:08 pm in reply to: Need to build an interface for XML to SQL Datadata #7481
The standard itself doesn’t require the “naaccrNum” attribute because technically the “naaccrId” is all that is needed to uniquely identify an item. But a lot of software still deals with the numbers, so it’s convenient to have them.
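If a framework follows the strict standard, one option is to treat `naaccrId` as authoritative and use `naaccrNum` only as a cross-check when it is present. Here is a minimal sketch of that idea; the `ItemNumberResolver` class is hypothetical and assumes you already have an ID-to-number map built from the base dictionary (plus any user-defined dictionary).

```java
import java.util.Map;
import java.util.OptionalInt;

/** Hypothetical resolver for the optional naaccrNum attribute on data items. */
final class ItemNumberResolver {

    private final Map<String, Integer> idToNumber;

    ItemNumberResolver(Map<String, Integer> idToNumber) {
        this.idToNumber = Map.copyOf(idToNumber); // defensive, immutable copy
    }

    /**
     * Returns the NAACCR number for an item. The naaccrId is authoritative; the naaccrNum
     * attribute (null or blank when absent) is only cross-checked against the dictionary.
     */
    OptionalInt resolve(String naaccrId, String naaccrNumAttribute) {
        Integer fromDictionary = idToNumber.get(naaccrId);
        if (naaccrNumAttribute == null || naaccrNumAttribute.isBlank()) {
            return fromDictionary == null ? OptionalInt.empty() : OptionalInt.of(fromDictionary);
        }
        int fromFile = Integer.parseInt(naaccrNumAttribute.trim());
        if (fromDictionary != null && fromDictionary != fromFile) {
            throw new IllegalArgumentException("naaccrNum " + fromFile
                    + " does not match the dictionary (" + fromDictionary + ") for " + naaccrId);
        }
        return OptionalInt.of(fromFile);
    }
}
```

An empty result means the item is unknown to the dictionaries you loaded, which is a signal to ask the data provider for their user-defined dictionary.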
I will update the data generator to allow an option to add the numbers to the created file (https://github.com/imsweb/data-generator/issues/30).
Ultimately, it’s up to you whether your new framework requires the numbers on incoming data (which is probably more convenient for you) or follows the strict standard and deals with not always having those numbers.
July 23, 2018 at 10:28 am in reply to: Tip #1: Running EDITS efficiently on NAACCR XML structures #7477
In my opinion, this makes total sense!
In theory, the same concept can be applied to “NaaccrData” edits vs “Patient” edits, where an edit on the registry ID item (for example) would fail only once for an entire data file. But I completely understand why supporting that would be much more difficult, and there wouldn’t be much gain anyway (there are so few data items at that root level).
July 22, 2018 at 1:51 pm in reply to: Using SAS with NAACCR XML #7473
I looked more into the issue you described, but I can’t reproduce it.
I used the following file:
I tried to create a file that represents the data you described.
Could you please try that file yourself when you have some time and confirm that it also works for you? If it does, could you compare it with your own file and try to figure out the difference?