Finding Related accounts

Oct 18, 2009
361
17
No code here, we do this using Excel VBA, but the same approach could be done in TCC.

Scenario: Hospital records are downloaded as pipe-delimited single-line text records that contain patient and "guarantor" contact info, account number, date of admission and discharge, amount of charges, etc. These are for people involved in MVAs (Motor Vehicle Accident). A typical download might be 90 accounts.

In hospital-industry jargon "guarantor" means "anyone other than an insurer who is financially responsible for the bill". So the patient can also be the guarantor, if the patient is a minor the parent could be a guarantor, etc.

Problem: Often there are "related accounts". These could be other people in the same accident, and/or more than one bill, e.g., from a follow-up visit. Those all need to be sent to a single insurer, such as the liability insurer of the person who caused the accident.

Consider 3 people in the same MVA -- maybe with different names, e.g., Jane Maggers (driver) , Mary Garcia (Jane's sister, passenger), Bill Jones (Mary's live-in boyfriend, and a passenger).

So, obviously, "same name" won't work. Even with one person name variations are common, e.g., Mary Lou Garcia in one record and Mary Garcia in another--and maybe they don't have correct birthdate and SSN info.

We tried looking for close account numbers but ran into this: Jane's account is 005 and Harry Fritz's is 007. Yes, probably related. But people in one car usually don't remember the names of folks in other vehicles. So we stopped using that.

First, we sequence records by Guarantor's Street Address. You can't do a direct "If A EQ B". The problem is there are too many variations, e.g., 123 North Place Apt 345 and 123 North Place #345. Solution: Compare everything up to the first space plus the next two characters. That catches over 95% of "same address".

Yes, it would consider 123 North Place and 123 Nontum Court as "equal". But that winds up extremely rare.

Caveat: 123 E 67TH ST evaluates as 123^67. ("What? How can an address be "out of range" ??") So I had to make sure it considered it as STRINGS.

For the remainders, next look at Guarantor Surname. (e.g., if Jane were the Guarantor for two children all 3 records would show the same Guarantor, even if they lived at different addresses.)

Together these pull over 98% of "relateds". Just to be extra careful, compare Patient Surname to spot any remaining "stragglers".

Finally (because this is coded), for the accounts that have been flagged, sort by Account Number and check if the Account Numbers are within 6 (arbitrary number). If so, include them in a set separated by single CR/LF's. End each group with a blank line. If not "close", include them in the list anyway as a one-line "group". Particularly with return visits, the first and second numbers will not be close--but addresses will match, etc.
 
Oct 18, 2009
361
17
Obviously, there are some limits to this approach. It wouldn't be realistic running 10k-25k records in an area with a lot of apartment buildings, and a lot of people in the area with common ethnic names, e.g., lots of Garcia, Gomez in a largely Hispanic area. Pretty much everyone in the area would be going to the same hospital--I once lived in an apartment complex with 847 units three blocks from a major hospital, and with more than 50 other large complexes within a 5-mile radius ...
 
Oct 18, 2009
361
17
It's not about parsing and cleaning up data, it's about certain non-obvious relationships. "Patient" and "Guarantor" are pretty much universal in the U.S. healthcare industry.

The reason I posted this is the quirks I ran into such as "Well, obviously, if two people live at the same address and the account numbers are close, the accounts are related" but "If Address A equals Address B" doesn't work because of minor variations. I then tried "If street number + space + next character match." But that pulled up too many "similars". (We're in the Dallas-Fort Worth area, with a population of 7.5 million.) "space plus next two" works for more than 98%.

Also, a string being "out of range" really threw me at first. (I think it was actually set as "variant".)
 
Oct 18, 2009
361
17
You might also be interested in phonetic-coding algorithms like Soundex and Metaphone.

We're limited because a really big download will be 120 records and I can't check against the database, only the download. In a more realistic scenario we would be checking against the past 3 weeks in the database, which would catch most follow-up visits and family members whose accounts come in different downloads.

We found that the first check needs to be guarantor addresses. (More than 99% of the time Guarantor and Patient have the same address, e.g., because Patient is a child.) The problem with anything like Soundex is it uses only a few consonants and it doesn't handle numbers. I start by having it sort by street number, which by itself catches about 90%. Plus, a quick glance through the entire list finds things like obvious spelling errors, "NO ADDRESS", and similar issues.

But, basically, it got annoying having to unflag "matches" that just had the same street number (keep in mind--50+ large apartment complexes in a 5-mile radius), and the "number + space + next two" catches almost all.

--------------------------------------
Just a side light: one interesting thing is it spots school bus accidents -- there are a bunch of house numbers, almost sequential, on the same street, with account numbers grouped together, and similar ages. And then there's one more close account number with a completely different address and the patient is much older. That's the bus driver.