NAACCR Geocoder


Viewing 5 posts - 1 through 5 (of 5 total)
    #5070
    Recinda Sherman
    Spectator

    We are working on improving the NAACCR Geocoder, currently focusing on the underlying street file data.

    Please use this forum to discuss and report any potential issues with the NAACCR geocoder.

    #5092
    Francis Boscoe
    Spectator

    Here is the example that was discussed on today’s call (the house numbers are real, but have been changed so that they do not match any patient’s address):

    For the non-existent address 182 Washington Avenue, Albany, NY 12203

    AGGIE currently gives

    182 Washington Avenue Extension, Albany, NY 12203 (in the NAACCR version)
    182 Washington Avenue Avenue, Albany, NY 12203 (in the SEER*DMS version – I suspect a small bug here)

    with a match score of 100 (*update – based on the call, this result will be penalized in the future to have a score less than 100).

    However, in this case, the correct address is 182 Washington Avenue, Albany, NY 12210

    Other viable candidates would have been:

    282 Washington Avenue, Albany, NY 12203
    782 Washington Avenue, Albany, NY 12203
    982 Washington Avenue, Albany, NY 12203
    1082 Washington Avenue, Albany, NY 12203

    Here are my own Bayesian prior probabilities for each of these possibilities:

    182 Washington Avenue Extension, Albany, NY 12203 (0.35)
    182 Washington Avenue, Albany, NY 12210 (0.35)
    282 Washington Avenue, Albany, NY 12203 (0.05)
    782 Washington Avenue, Albany, NY 12203 (0.05)
    982 Washington Avenue, Albany, NY 12203 (0.05)
    1082 Washington Avenue, Albany, NY 12203 (0.14)
    None of the above (0.01)

    So AGGIE is picking the most likely choice here (or at least tying for it), and I think this would be the case most of the time, but under these priors that choice would still be incorrect 65% of the time.
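The arithmetic behind that 65% figure can be checked with a minimal sketch. The probabilities are the subjective priors listed above; the function name `best_guess_error` is made up for illustration and is not part of AGGIE.

```python
# Subjective prior probabilities copied from the list above.
priors = {
    "182 Washington Avenue Extension, Albany, NY 12203": 0.35,
    "182 Washington Avenue, Albany, NY 12210": 0.35,
    "282 Washington Avenue, Albany, NY 12203": 0.05,
    "782 Washington Avenue, Albany, NY 12203": 0.05,
    "982 Washington Avenue, Albany, NY 12203": 0.05,
    "1082 Washington Avenue, Albany, NY 12203": 0.14,
    "None of the above": 0.01,
}


def best_guess_error(priors):
    """Return the most probable candidate and the chance that it is wrong.

    Hypothetical helper for illustration only. Ties are broken by
    insertion order, so the first of the tied candidates is returned.
    """
    total = sum(priors.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"priors sum to {total}, expected 1.0")
    best = max(priors, key=priors.get)
    return best, 1.0 - priors[best]


best, error = best_guess_error(priors)
print(f"Best guess: {best}")
print(f"Probability the best guess is wrong: {error:.2f}")
```

Even a geocoder that always returns the single most probable candidate is wrong with probability one minus that candidate's prior, which is why no amount of weight tuning helps when the top candidate sits at 0.35.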

    I think this is a typical example, in that there will usually be a handful of possible alternatives for every typo. No matter how much we tweak the weights and penalties, I don’t see how AGGIE could ever guess correctly more than half the time in cases like this. Certain kinds of analyses can tolerate having a few percent of the records geocoded to the wrong place. In New York, because we are legally mandated to publish small-area case counts, and because we do many small-area cancer investigations, we can’t. Hence our requirement of a match score of 100.

    #5215
    Francis Boscoe
    Spectator

    Another example. Ocean Avenue is not the same as Ocean Parkway. There are tens of thousands more like this.

    #5216
    Francis Boscoe
    Spectator

    I am unable to edit the above post – I just get taken to a blank screen. Anyhow, it was a screen shot showing how the two streets are miles apart. 512 KB is a tiny file size limit, you might want to rethink that.

    #6076
    Francis Boscoe
    Spectator

    After many rounds of improvements on AGGIE’s part, here is my assessment of how it compares with the previously existing data in the New York State Cancer Registry. It’s not as detailed as what New Jersey did, but should suffice.

    County level

    94.9% – AGGIE returns same county
    4.9% – AGGIE replaces known with unknown – this reflects our conservative approach (requiring high match score) and is similar to what we had in our old system. We can handle manual review of this many cases.
    0.2% – AGGIE replaces unknown with known – I spot-checked a few and AGGIE looked good
    0.02% – AGGIE replaces known with known – I spot-checked a few and AGGIE looked good

    Latitude/longitude (restricting to where it is known on both databases)
    96.1% – AGGIE is within 100 meters of existing registry value
    3.7% – AGGIE within 100 m – 1 km
    0.2% – AGGIE more than 1 km different
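For anyone wanting to run the same latitude/longitude comparison on their own registry, here is a rough sketch of how the distance bands could be computed. It assumes great-circle (haversine) distance on a spherical Earth; the post does not say which distance measure was used. The function names and the sample coordinates are made up for illustration.

```python
import math
from collections import Counter

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def distance_band(meters):
    """Assign a distance to one of the three bands used in the summary above."""
    if meters <= 100:
        return "within 100 m"
    if meters <= 1000:
        return "100 m to 1 km"
    return "more than 1 km"


def band_shares(pairs):
    """Percentage of (registry, geocoder) coordinate pairs falling in each band."""
    if not pairs:
        return {}
    counts = Counter(distance_band(haversine_m(*reg, *geo)) for reg, geo in pairs)
    return {band: 100 * n / len(pairs) for band, n in counts.items()}


# Made-up coordinate pairs near Albany, NY: (registry lat/lon, geocoder lat/lon).
sample_pairs = [
    ((42.6500, -73.7500), (42.6500, -73.7500)),  # identical points
    ((42.6500, -73.7500), (42.6550, -73.7500)),  # roughly 550 m apart
    ((42.6500, -73.7500), (42.7000, -73.7500)),  # roughly 5.5 km apart
]
print(band_shares(sample_pairs))
```

Records with unknown coordinates in either database would need to be dropped before calling `band_shares`, matching the restriction stated above.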

    Most of the differences in the 1-3 km range seem to arise from choosing different points on the same road. Of the ones I’ve spot-checked, AGGIE is correct in 63%, the registry’s original value in 27%, and neither in 12%.

    In the 20-30 km range, AGGIE was correct 3 times, registry 6 times (only 9 examples total). Two of the AGGIE errors were where it replaced COUNTY ROUTE 2 with COUNTY ROUTE and placed a point on a seemingly random county route (matched to county parcel layer with a score of 100). Maybe this can be fixed, but obviously it is an infrequent occurrence. 3 of the errors were on Route 12 in Watertown, but the errors were inconsistent – registry was right twice and AGGIE once.
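One possible post-hoc check for the COUNTY ROUTE 2 problem, sketched under assumptions: it takes the street-name portion of the input and matched addresses (house number already stripped) and flags any match where a number present in the input is missing from the result. This is not how AGGIE works internally, and the function names are hypothetical.

```python
import re


def numeric_tokens(street):
    """Return the runs of digits found in a street name, as strings."""
    return re.findall(r"\d+", street)


def dropped_number(input_street, matched_street):
    """True if a number in the input street name is absent from the matched one.

    Hypothetical helper: both arguments are street names only, with the
    house number already removed.
    """
    matched = numeric_tokens(matched_street)
    return any(tok not in matched for tok in numeric_tokens(input_street))


print(dropped_number("COUNTY ROUTE 2", "COUNTY ROUTE"))    # flagged for review
print(dropped_number("COUNTY ROUTE 2", "COUNTY ROUTE 2"))  # not flagged
```

A flagged record could be sent to manual review regardless of its match score, since in the cases described above the score was 100.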

    All 18 differences of more than 50 km were typos by the registry.

    On balance, AGGIE wins. Time to turn it back on for NY.


Copyright © 2018 NAACCR, Inc. All Rights Reserved