Jeff Reed

Forum Replies Created

Viewing 15 posts - 1 through 15 (of 17 total)
  • in reply to: Need to build an interface for XML to SQL Datadata #9699
    AnonymousJeff Reed
    Spectator

    Don’t know what I would do without you, KB. You zeroed in on the problem. C#’s garbage collection (PC version: recycled resources) did clean out memory, but not the large file content I was building in memory. I changed it to write to the file after every tumor instead, and the 5,000-case test file built in 3.3 seconds, down from 14 minutes.
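A minimal sketch of the write-per-record approach described above, not the actual application code. The class and method names are hypothetical, and each record is assumed to already be assembled as one delimited line. The point is only that each line goes to disk as soon as it is produced, so nothing large accumulates in memory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: names are illustrative, not from the original program.
public static class StreamingExport
{
    // Writes each record to disk as soon as it is produced, so memory use
    // does not grow with the number of records in the input file.
    public static int WriteRecords(IEnumerable<string> records, string path)
    {
        int count = 0;
        using (var writer = new StreamWriter(path, false, Encoding.UTF8))
        {
            foreach (string record in records)
            {
                writer.WriteLine(record); // one tumor row per line
                count++;
            }
        }
        return count;
    }
}
```

If the earlier version built the file content with repeated string concatenation, that alone could explain the slowdown: .NET strings are immutable, so every append copies the whole string built so far.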

    Next step: integrate with a .bat file load integrator (Oracle and SQL Server).

    in reply to: Need to build an interface for XML to SQL Datadata #9685
    AnonymousJeff Reed
    Spectator

    Sorry to report I probably will not be able to attend tomorrow’s meeting.

    Attached are some screenshots of the current progress using the XMLPlus.DLL. 100 records .4 milliseconds, 250 records 2.9 seconds, 500 records ~12 seconds.

    There is a problem with memory not being cleaned out, as the application slows down significantly over 500 records. The memory usage builds up until it takes over 14 minutes for 5,000 records or just fails with memory unavailable. (The box has 16 GB of memory, 8 GB free.) Open to ideas from our seasoned XMLPlus users on where to look for clearing memory …

    in reply to: Need to build an interface for XML to SQL Datadata #9679
    AnonymousJeff Reed
    Spectator

    Kathleen thank you for your patience,

    Took a break to lure a largemouth bass, but I can report I got the pointer working for the XMLPlus_GetItemDataByNaaccrId function. Somewhere in the code I added to support both the naaccrId and naaccrNum return values, I over-engineered the pointer assignment. Nothing like a full rewrite to flush out the bugs. The Edit50.dll worked like a charm when fed a proper pointer.

    Steps to success:
    Added a pointer:
    XMLPlus_Callbacks.Callback_ReadXmlData funcPtr1 =
        new XMLPlus_Callbacks.Callback_ReadXmlData(ReadItemByNaaccrNum);

    XMLPlus_Callbacks.Callback_ReadProgress funcPtr2 =
        new XMLPlus_Callbacks.Callback_ReadProgress(ProgressFunc);

    XMLPlus_Callbacks.Callback_ReadXmlData funcPtr3 =
        new XMLPlus_Callbacks.Callback_ReadXmlData(ReadItemByNaaccrId);

    Used a switch to set the pointer:
    if (cb_m1_byname.IsChecked == true)
    { readitem_callback = Marshal.GetFunctionPointerForDelegate(funcPtr3); }
    else
    { readitem_callback = Marshal.GetFunctionPointerForDelegate(funcPtr1); }

    Called the function with a switch:
    if (cb_m1_byname.IsChecked == true)
    { iRtn = XMLPlus_Interop.XMLPlus_GetItemDataByNaaccrId(xmlId, t_id); }
    else
    { iRtn = XMLPlus_Interop.XMLPlus_GetItemDataByNaaccrNum(xmlId, t_num); }

    I did end up using a global variable to pass the return value but will change that to have the functions call a generic function to build the output.

    private static void ReadItemByNaaccrId(
    System.IntPtr owner,
    int patient_ordinal,
    int tumor_ordinal,
    [InAttribute()][MarshalAsAttribute(UnmanagedType.LPStr)] string NaaccrId,
    int NaaccrNum,
    [InAttribute()][MarshalAsAttribute(UnmanagedType.LPStr)] string value)
    {
    globalVariable.sValue = value;
    }
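One possible shape for the "generic function to build the output" mentioned above, offered as a sketch rather than the actual design. The delegate type `ReadXmlDataCallback` and the class `RowCollector` are hypothetical stand-ins; the delegate only mirrors the parameter list shown in the post, and the calling convention the DLL expects is not addressed here. Holding the delegate in a field also keeps it alive for as long as native code may call it, which matters for any pointer obtained from `Marshal.GetFunctionPointerForDelegate`.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical delegate mirroring the callback parameters shown in the post.
public delegate void ReadXmlDataCallback(
    IntPtr owner, int patientOrdinal, int tumorOrdinal,
    [MarshalAs(UnmanagedType.LPStr)] string naaccrId,
    int naaccrNum,
    [MarshalAs(UnmanagedType.LPStr)] string value);

// Hypothetical collector: gathers values into a row instead of a global variable.
public sealed class RowCollector
{
    private readonly StringBuilder _row = new StringBuilder();

    // Stored in a field so the delegate is not garbage collected
    // while unmanaged code still holds its function pointer.
    private readonly ReadXmlDataCallback _callback;

    public RowCollector() { _callback = OnItem; }

    public ReadXmlDataCallback Callback => _callback;

    public IntPtr CallbackPointer => Marshal.GetFunctionPointerForDelegate(_callback);

    // Both the by-id and by-number lookups can share this one handler.
    private void OnItem(IntPtr owner, int patientOrdinal, int tumorOrdinal,
                        string naaccrId, int naaccrNum, string value)
    {
        _row.Append(value).Append('\u00BB'); // » field delimiter
    }

    // Returns the row built so far and resets the buffer for the next one.
    public string TakeRow()
    {
        string row = _row.ToString();
        _row.Clear();
        return row;
    }
}
```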

    The test used a file with 5,000 tumor records generated by the GitHub test file generator. The program loops through each patient with 7 field value calls to the get-value function, then loops through the tumors with 127 calls to the function, to build a delimited flat file of tumors with redundant patient info. It took ~11 minutes to process the file with debugging on.

    in reply to: Need to build an interface for XML to SQL Datadata #9664
    AnonymousJeff Reed
    Spectator

    This C# code I use works:
    iRtn = XMLPlus_Interop.XMLPlus_GetItemDataByNaaccrNum(xmlId, t_num);
    sPPatient = sPPatient + readitem_callback.ToString() + "²";

    This C# code passes the string and returns the pointer value, which I need to use to build the string value:
    iRtn = XMLPlus_Interop.XMLPlus_GetItemDataByNaaccrId(xmlId, t_id);
    sPPatient = sPPatient + readitem_callback.ToString();
    // need to fill string value from returned pointer, function requires void def. not char for pointer;

    I’ll keep looking. The code works for num, so I may have to go back to that and ask Fabian if he could add naaccrNum to the test file.

    in reply to: Need to build an interface for XML to SQL Datadata #9662
    AnonymousJeff Reed
    Spectator

    Sorry, copied the wrong ones. I am using these:
    [DllImportAttribute("XMLPlus.dll", EntryPoint = "XMLPlus_GetItemDataByNaaccrNum")]
    public static extern int XMLPlus_GetItemDataByNaaccrNum(int XmlId,
    int naaccrNum);

    [DllImportAttribute("XMLPlus.dll", EntryPoint = "XMLPlus_GetItemDataByNaaccrId")]
    public static extern int XMLPlus_GetItemDataByNaaccrId(int XmlId,
    [InAttribute()] [MarshalAsAttribute(UnmanagedType.LPStr)] string naaccrId);

    When I was using the ByNaaccrNum call I could reference the string in the callback:
    iRtn = XMLPlus_Interop.XMLPlus_GetItemDataByNaaccrNum(xmlId, t_num);
    sPPatient = sPPatient + readitem_callback.ToString() + "²";

    The ByNaaccrId call is different in that I need to use the pointer that is returned. I was hoping there were some examples of building a char array and a StringBuilder routine to build the value, since I am having trouble getting the pointer to the value working.
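For the general case of turning a native `char*` into a C# string, a char array and StringBuilder loop are usually unnecessary. The sketch below assumes the value arrives as an `IntPtr` to a null-terminated ANSI string; whether XMLPlus actually hands the value back that way is not confirmed here. `NativeStringDemo` is a hypothetical name, and the round trip through `Marshal.StringToHGlobalAnsi` only stands in for the DLL so the example runs on its own.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration only.
public static class NativeStringDemo
{
    // Copies a null-terminated ANSI (char*) string from unmanaged memory
    // into a managed string; returns "" for a null pointer.
    public static string ReadAnsi(IntPtr p)
    {
        if (p == IntPtr.Zero) return string.Empty;
        return Marshal.PtrToStringAnsi(p) ?? string.Empty;
    }

    // Stand-in for the native side: allocates an unmanaged copy of the text,
    // reads it back through the pointer, then frees the unmanaged memory.
    public static string RoundTrip(string text)
    {
        IntPtr p = Marshal.StringToHGlobalAnsi(text);
        try { return ReadAnsi(p); }
        finally { Marshal.FreeHGlobal(p); }
    }
}
```

When the string is a callback parameter rather than a raw pointer, declaring it as `[MarshalAs(UnmanagedType.LPStr)] string` lets the marshaller do the same conversion automatically.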

    in reply to: Need to build an interface for XML to SQL Datadata #9658
    AnonymousJeff Reed
    Spectator

    Thank you Kathleen,

    Your pseudocode is what I have in development, so no problem testing. It looks like my problem is using a pointer return variable to access the actual string. The bynumber() function I successfully used returns the value, so I have no problem there. The byNaaccrId() function returns a pointer, and apparently I am rusty on using pointers to access the string value, because I can’t seem to get it working and I am not finding a good example.

    [DllImportAttribute("XMLPlus.dll", EntryPoint = "XMLPlus_GetItemDefByNumber")]
    public static extern int XMLPlus_GetItemDefByNumber(int XmlId,
    int naaccrNum, System.IntPtr owner, System.IntPtr callback_func);

    [DllImportAttribute("XMLPlus.dll", EntryPoint = "XMLPlus_GetItemDefByNaaccrId")]
    public static extern int XMLPlus_GetItemDefByNaaccrId(int XmlId,
    [InAttribute()] [MarshalAsAttribute(UnmanagedType.LPStr)] string naaccrId,
    System.IntPtr owner, System.IntPtr callback_func);

    in reply to: Need to build an interface for XML to SQL Datadata #7563
    AnonymousJeff Reed
    Spectator

    Can’t seem to get the XMLPlus DLL function XMLPlus_GetItemDataByNaaccrId(const int XmlId, const char* naaccrId) working.

    I was able to get the function XMLPlus_GetItemDataByNaaccrNum(const int XmlId, const int naaccrNum) working.

    The …ByNaaccrId function doesn’t seem to like the pointer I try to set. The function I got working does not use a pointer and is just passed the numeric ID value. Is it reasonable to require the NAACCR number in the data? That would be my preferred solution rather than using the short name ID.

    in reply to: Need to build an interface for XML to SQL Datadata #7482
    AnonymousJeff Reed
    Spectator

    Thank you Fabian. I guessed wrong and will switch to the required field <Item naaccrId=>. I was hoping naaccrNum would be required so I didn’t have to worry about the full name match.

    Is the best source to use for mapping the version_id to the valid xml name the “naaccr-dictionary-180.xml” file?

    AnonymousJeff Reed
    Spectator

    My view is similar: each tumor instance needs a check, so if the header patient info fails, the edit should skip the tumor data and move on to the next patient. Is that something that is going to be in EDITS50?

    in reply to: Need to build an interface for XML to SQL Datadata #7479
    AnonymousJeff Reed
    Spectator

    Just noticed the test files from the GitHub sample file generator did not include the naaccrNum attribute on the <Item> tag, whereas the example on the NAACCR sample file did. Is the naaccrNum attribute going to be required, or do I have to use the full name in the naaccrId attribute to field map?

    <Item naaccrId="patientIdNumber" naaccrNum="20">01200001</Item>
    vs
    <Item naaccrId="patientIdNumber">00000001</Item>

    in reply to: Need to build an interface for XML to SQL Datadata #7475
    AnonymousJeff Reed
    Spectator

    Thank you Rich,

    The data generation tool Fabian pointed me to in the github community looks like it did a good job of generating test files, (haven’t used the files I generated yet). The tool populates 40-50 field variables so it is a good source to test basic load functionality but shy of being able to use for timing on a fully populated record load test.

    Though my needs are only for processing ‘I’ records, the tool will be able to handle all record types. I am currently working on incorporating a user customized XML reference file to define the fields to extract into the load file.

    I am handling the data as one flat file with each row representing a case/tumor. Patient/header data will be repeated on each row. Much of the validation and breakdown of the data will be done in the DB. I still see the database needed to check for duplicates and generate aggregate totals.

    Building multiple files based on a relational data model is not envisioned at this point but I could see where that would be useful if there is a global data model of the data stores to match the XML. Generic SQL to generate schemas could be a component of that effort. Relationally we consider a unique case/tumor key on the facility_id, Accession_nbr and sequence_nbr. Identification of the patient is a different story which I am sure there is a healthy thread out there somewhere…

    A correlated side project will be to use the EDIT50.DLL from the CDC to apply edit checks/scoring for each tumor record. This will be incorporated into the NAACCR XML file parser I am building (looking for a name for this beast). This process would also create KB delimited file(s) for use in bulk loading.

    Thank you for all your feedback

    J

    in reply to: Need to build an interface for XML to SQL Datadata #7470
    AnonymousJeff Reed
    Spectator

    Brought a tear to my eye to be able to generate 5,000-record sample files for V16 and V18, fixed and XML, that include my test facility_id and home state of IL. Now for some real testing …

    in reply to: Need to build an interface for XML to SQL Datadata #7466
    AnonymousJeff Reed
    Spectator

    Are there any large sample record set files for V18, fixed and XML, out there? Also looking for recommendations/links for tools to help generate sample files (SEER?).

    in reply to: Need to build an interface for XML to SQL Datadata #7446
    AnonymousJeff Reed
    Spectator

    Status Update:

    My first task was to test Oracle’s capabilities to store and process XML. I used their bulk Data Loader program to load the sample XML NAACCR version 140 file into a CLOB (Character Large Object) field that could then be parsed with Oracle’s XML functions. The field size for a CLOB can be up to 128 terabytes. It didn’t take me long to say forget this Oracle XML code, given the CDC DLL’s capabilities to parse a NAACCR XML file.

    Next task was to create a XAML application (screen pic in attachment) with a C# back end for parsing an XML file using the CDC’s XMLPlus.DLL. I got the application to read the XML file and assemble patient/tumor records in a record/field delimited file suitable for bulk loading into our DB. I am adopting what I am calling the KB delimiter in honor of Kathleen B.: &#187; (»)
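As a rough illustration of the record layout described in this thread (one row per tumor, patient fields repeated on each row, » as the field delimiter), a row could be assembled as follows. `FlatRowBuilder` and the sample field values in the usage note are hypothetical; this is a sketch, not the application's actual code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only.
public static class FlatRowBuilder
{
    // U+00BB (»), the "KB delimiter" from the post.
    public const string Delimiter = "\u00BB";

    // Joins the patient fields followed by the tumor fields into one
    // delimited row; callers pass the same patient fields for every tumor.
    public static string BuildRow(IEnumerable<string> patientFields,
                                  IEnumerable<string> tumorFields)
    {
        var fields = new List<string>(patientFields);
        fields.AddRange(tumorFields);
        return string.Join(Delimiter, fields);
    }
}
```

For example, `FlatRowBuilder.BuildRow(new[] { "01200001", "IL" }, new[] { "C509" })` yields `01200001»IL»C509`. A delimiter this unusual is unlikely to appear inside field values, which avoids quoting rules in the bulk loader.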

    Next Queued tasks
    • Integration of the CDC’s EDIT50.dll for scoring and reporting.
    • Identify/create better testing data. The current XML sample file is a NAACCR version 140 ‘A’ record type. I need to test versions 160 and 180 for record type I.
    • Continue refining/testing load to add error handling and reporting.

    in reply to: Need to build an interface for XML to SQL Datadata #7286
    AnonymousJeff Reed
    Spectator

    Thank you Kathleen,

    What a nice dissection you gave of navigating a NAACCR XML record from a patient record point of view. It makes a lot of sense to adopt XML translations to break down the records as part of this loading process. I could see breaking out the different record types into separate XML load modules to support relational data models; that would be useful. We only get the ‘Incident’ record type, so our focus may be a little different, as it centers on tumors with the classic struggle to uniquely identify a patient to weed out duplicates. (Pls use a separate thread for the duplicate topic 😉)

    My background is heavy on database architecture tied to ETL performance so I do have a classic bias of “I can do it faster” that I am working on. That strength/weakness makes me still lean towards producing files for bulk inserting rather than individual insert statements as we regularly need to process over 100 thousand files a day. As we already have a lot of business rules and validation tables in use in the database we are looking to leverage that in the end solution. That said, I will continue to try and keep a focus on a universal solution that will support a broader audience.

    If nothing else, my flat earth file mentality may keep “encouraging” you to add light to this sea of data ….


Copyright © 2018 NAACCR, Inc. All Rights Reserved