Forum Replies Created
Fabian Depry (Moderator)
As long as those are tears of joy! 🙂
Good luck with your testing.
Fabian Depry (Moderator)
Hi Jeff, you can use the SEER Data Generator to create large NAACCR XML files. Only core data items will have a value (about 40 or 50 of them), so it’s not perfect (for example, it would be nice to have text for Abstracts). But it’s better than nothing 🙂
You can download it from this release page: https://github.com/imsweb/data-generator/releases
Look for the “data-generator-X.X-all.jar” where “X.X” is the latest release.
This is a standalone JAR, so you should be able to just download it and double-click it (assuming you have a Java environment installed on your machine).
You can find information about the data generator itself and what variables it computes on the project’s home page: https://github.com/imsweb/data-generator
Fabian Depry (Moderator)
Darn 🙂
I will take another look at some point.
Fabian Depry (Moderator)
Awesome! Thanks for testing!
Fabian Depry (Moderator)
Hi Valerie, the issue you reported should be fixed now, and I actually released a new version of the library (version 4.10). If you happen to re-test this, please make sure to use that new version of the SAS JAR file and not the previous 4.9. Thanks!
Fabian Depry (Moderator)
Unfortunately, I agree 🙂
Things are going to be rough for the next few years, but I am convinced that, in the long run, these changes will help the community move forward in the right direction.
Fabian Depry (Moderator)
To tie your second comment to mine, I would say that your analytic record should be based on the persistence data model (the consolidated view) and not the transmission data model. If the new NAACCR XML data model works for your persistence model, that’s great; if it’s not exactly what you need, then you might need to tweak it and use some type of conversion from one to the other.
Fabian Depry (Moderator)
Bruce,
I think your question is related to transmission versus persistence. The purpose of transmission is just to move data around (either from hospitals/labs to the Central Registry, or from the Central Registry to the Standard Setters, etc…). The purpose of persistence is to keep a consolidated view of the data at the Central Registry; I would assume that most, if not all, Registries use some kind of database for that. Both mechanisms have a data model, but those models do not have to be the same; in many registries, the persistence data model is more complex than the transmission data model because the Registry wants to keep track of more data than can be transmitted via NAACCR data files.
The primary purpose of the NAACCR XML Exchange standard is to transmit data, and its data model was designed with that in mind. A Registry is welcome to use the same data model for persistence if that works for them; if not, they probably need to move to a more complex persistence data model.
Taking the address as an example: the standard defines it as the “current address” and so the transmission data model only allows one (the address at the time the abstract was created). If you want to keep track of the addresses over time in your database, you would have to use a more complex data model where you consolidate all the incoming current addresses into a list of addresses.
I understand that having different models is not as convenient as using the same model for transmission and persistence, because it requires some type of conversion, but in general those conversions should be fairly simple.
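To make the address example concrete, here is a minimal Java sketch of that kind of conversion. `TransmittedAbstract`, `PatientHistory`, and the city/state/postal-code fields are hypothetical names chosen for illustration; they are not part of the NAACCR XML standard or of any existing library, and a real persistence model would likely also record a date or source for each address.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class AddressConsolidation {

    // Hypothetical transmission-side view: one abstract carries exactly one "current" address.
    static final class TransmittedAbstract {
        final String city;
        final String state;
        final String postalCode;

        TransmittedAbstract(String city, String state, String postalCode) {
            this.city = city;
            this.state = state;
            this.postalCode = postalCode;
        }

        // Simple key used to detect identical addresses.
        String addressKey() {
            return city + "|" + state + "|" + postalCode;
        }
    }

    // Hypothetical persistence-side view: the registry keeps every distinct address seen over time.
    static final class PatientHistory {
        private final List<String> addresses = new ArrayList<String>();

        // Merges the single current address of an incoming abstract into the address list,
        // skipping it when an identical address is already stored.
        void consolidate(TransmittedAbstract incoming) {
            String key = incoming.addressKey();
            if (!addresses.contains(key)) {
                addresses.add(key);
            }
        }

        List<String> getAddresses() {
            return Collections.unmodifiableList(addresses);
        }
    }

    public static void main(String[] args) {
        PatientHistory history = new PatientHistory();
        history.consolidate(new TransmittedAbstract("Bethesda", "MD", "20892"));
        history.consolidate(new TransmittedAbstract("Bethesda", "MD", "20892"));
        history.consolidate(new TransmittedAbstract("Denver", "CO", "80202"));
        System.out.println(history.getAddresses());
    }
}
```

Running `main` prints `[Bethesda|MD|20892, Denver|CO|80202]`: the repeated current address from the second abstract is stored only once. Whether repeats should be dropped or kept with a date is a registry design choice.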
Fabian Depry (Moderator)
Thanks for testing again! It’s really difficult to properly cover all those corner cases! I really should use a standard Java XML parser, but none of them are still compatible with Java 7, which SAS requires. They really need to move along and update their Java version!!!
I will take a look at this one soon.
July 9, 2018 at 12:32 pm in reply to: NAACCR Fixed-Width data exchange format going away in 2020 #7310
Fabian Depry (Moderator)
Just to clarify, the main side effect of the flat file format going away will be that NAACCR will stop defining start column positions for any data items.
Another side effect will be that the State/Requestor Items field will go away (States and Registries will need to use regular variables defined in a user-defined dictionary instead of plugging those variables into one long text field).
There will probably be other side effects, but I think those are the biggest ones.
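To illustrate what start column positions are used for in the flat format, here is a minimal Java sketch that pulls data items out of a fixed-width line by their 1-based start column and length. The column positions below are hypothetical placeholders, not the official NAACCR layout, and `ItemLayout` is an illustrative helper class; in NAACCR XML the same values are instead identified by the `naaccrId` attribute of each `Item` element, so no positions are needed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FixedWidthExtractor {

    // Hypothetical layout entry: where an item starts (1-based column) and how many characters it spans.
    static final class ItemLayout {
        final int startColumn;
        final int length;

        ItemLayout(int startColumn, int length) {
            this.startColumn = startColumn;
            this.length = length;
        }
    }

    // Extracts each item from a fixed-width line using its start column and length,
    // trimming space padding; items that start past the end of a short line come back empty.
    static Map<String, String> extract(String line, Map<String, ItemLayout> layout) {
        Map<String, String> values = new LinkedHashMap<String, String>();
        for (Map.Entry<String, ItemLayout> entry : layout.entrySet()) {
            ItemLayout item = entry.getValue();
            int begin = item.startColumn - 1; // convert 1-based column to 0-based index
            int end = begin + item.length;
            String raw = begin >= line.length()
                    ? ""
                    : line.substring(begin, Math.min(end, line.length()));
            values.put(entry.getKey(), raw.trim());
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical start columns for illustration only; not the real NAACCR record layout.
        Map<String, ItemLayout> layout = new LinkedHashMap<String, ItemLayout>();
        layout.put("recordType", new ItemLayout(1, 1));
        layout.put("registryId", new ItemLayout(2, 10));
        layout.put("primarySite", new ItemLayout(12, 4));

        String line = "A0000012345C509";
        System.out.println(extract(line, layout));
    }
}
```

With the sample line, this prints `{recordType=A, registryId=0000012345, primarySite=C509}`. Every consumer of a flat file has to agree on such a layout table, which is exactly what stops being maintained once the fixed-width format is retired.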
Fabian Depry (Moderator)
Hi Valerie, I fixed the issue you reported with the CDATA sections. I re-created the JAR file (with the same 4.9 version still) and re-posted it on GitHub. It would be great if you could confirm the fix is working as expected.
Thank you!
Fabian Depry (Moderator)
Hi Jeff,
Another resource you might find useful is the Java implementation of the NAACCR XML specifications: https://github.com/imsweb/naaccr-xml
You probably won’t be able to use any of the code (Java doesn’t play well with C# and vice versa) but you can certainly see many examples and get some ideas.
The project home page contains links to the specifications and to a wiki page. I strongly encourage you to take a look at the wiki; it has all kinds of information about NAACCR XML, including a (Java-based) standalone tool that lets you go from XML to flat and from flat to XML, as well as a set of valid and invalid XML files that can be used to test your implementation.
I hope this helps!
Fabian Depry (Moderator)
I will take another look at some point.
To make this work with SAS, which still requires Java 7 or earlier (end-of-life for several years now), I had to implement my own (simple) parsing logic, so some things are not going to be parsed properly. Hopefully I can address them as they are found.
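As an illustration of the corner cases a hand-written parser has to cover, here is a minimal, Java 7-compatible sketch that decodes the raw text of a single element, honoring CDATA sections and the five predefined XML entities. It is not the actual code used in the naaccr-xml SAS JAR; `SimpleXmlText` and `decode` are hypothetical names, and the sketch deliberately skips numeric character references, comments, and nested elements, which is exactly the kind of gap that lets unusual files slip through.

```java
public class SimpleXmlText {

    // Decodes the raw text found between an element's start and end tags.
    // Handles CDATA sections (copied verbatim) and the five predefined XML entities.
    // Not handled, to keep the sketch small: numeric character references (&#...;),
    // comments, processing instructions, and nested elements.
    static String decode(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        int i = 0;
        while (i < raw.length()) {
            if (raw.startsWith("<![CDATA[", i)) {
                int close = raw.indexOf("]]>", i + 9);
                if (close < 0) {
                    throw new IllegalArgumentException("Unterminated CDATA section");
                }
                out.append(raw, i + 9, close); // CDATA content is literal: no entity decoding
                i = close + 3;
            } else if (raw.charAt(i) == '&') {
                int semi = raw.indexOf(';', i);
                if (semi < 0) {
                    throw new IllegalArgumentException("Unterminated entity reference");
                }
                out.append(resolveEntity(raw.substring(i + 1, semi)));
                i = semi + 1;
            } else {
                out.append(raw.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    // Maps one of the five predefined entity names to its character (switch on String needs Java 7).
    private static char resolveEntity(String name) {
        switch (name) {
            case "lt": return '<';
            case "gt": return '>';
            case "amp": return '&';
            case "quot": return '"';
            case "apos": return '\'';
            default: throw new IllegalArgumentException("Unsupported entity: &" + name + ";");
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("Tumor &lt; 2cm &amp; <![CDATA[a <b> & c]]> end"));
    }
}
```

`main` prints `Tumor < 2cm & a <b> & c end`; the `&` inside the CDATA section is kept as-is rather than treated as the start of an entity.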
Fabian Depry (Moderator)
Hi Valerie, I implemented the changes we talked about; do you mind re-trying your example when you get a chance?
Note that I removed the version from the macro filenames; I figured it would be easier for people to just replace the files.
I also re-created the SAS JAR file with a fix for the length issue and re-posted it under the same name (naaccr-xml-4.9-sas.jar) on the release page of the GitHub project (eventually the version will be increased, but I figured this can still be considered the “first” version).
Thanks!
Fabian Depry (Moderator)
That’s great, thanks for testing that solution!
I will add the replace option, that makes sense.
I assume the truncation happens because SAS only uses a subset of the rows to determine the max length of a given column when reading CSV. If that’s the issue, then I think I have a solution. I will try it soon and post new files.
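If that hypothesis holds (PROC IMPORT, for instance, only scans the number of rows given by its GUESSINGROWS setting), one possible workaround is to scan every row once, compute the true maximum width of each column, and declare those lengths explicitly, for example in a DATA step LENGTH statement, instead of letting SAS guess. Below is a minimal Java sketch of the scanning step only; it assumes a simple CSV with a header row and no quoted fields containing commas or line breaks, `CsvColumnWidths` is a hypothetical name, and it is not necessarily the fix that was eventually posted.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CsvColumnWidths {

    // Returns the maximum width of each column across ALL data rows (the header row is skipped).
    // Widths are measured in UTF-8 bytes, since SAS character lengths are byte counts.
    // Assumption: plain CSV without quoted fields, so splitting on commas is safe.
    static List<Integer> maxWidths(BufferedReader reader) throws IOException {
        List<Integer> widths = new ArrayList<Integer>();
        String header = reader.readLine();
        if (header == null) {
            return widths; // empty input: no columns
        }
        String line;
        while ((line = reader.readLine()) != null) {
            String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
            for (int col = 0; col < fields.length; col++) {
                if (col >= widths.size()) {
                    widths.add(0);
                }
                int bytes = fields[col].getBytes(StandardCharsets.UTF_8).length;
                widths.set(col, Math.max(widths.get(col), bytes));
            }
        }
        return widths;
    }

    public static void main(String[] args) throws IOException {
        String csv = "primarySite,text\nC509,short\nC619,a much longer value\n";
        System.out.println(maxWidths(new BufferedReader(new StringReader(csv))));
    }
}
```

With the inline sample, `main` prints `[4, 19]`. Because every row is read, a long value near the end of the file can no longer be missed, at the cost of one extra pass over the data.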