How to Get NPI Data Into Your Systems

Millions of records with helpful information on all types of health-care providers are available from the U.S. government's Centers for Medicare & Medicaid Services in the National Provider Identifier (NPI) data dissemination files, if you know how to access them. This article outlines what it takes to download the NPI data and put it to work.

Your first option is to go to the NPI Web site and search for one record at a time. Doing it that way isn't very efficient, but it requires no technical skills beyond navigating a Web site and copying and pasting text. It's the best option when you only need to look up a handful of providers.

Your second option is to download the entire file. The federal government has made it available on the same Web site where you can search for providers one at a time. It's a zipped file of nearly 300MB; unzipped, it measures about 2GB. The good news is that about 50% of the file is "air." That's right: blank fields in double quotes. So if your data-cleansing ETL scripts are working properly, you'll only need to make room for a 1GB database. The not-so-good news is that even if you only want a portion of the data, such as a specific geographic area or a certain specialty, you'll have to download and unzip the entire data dissemination file.
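Even though the whole file has to come down, you don't have to load all of it. Here is a minimal Python sketch of a streaming filter: it reads the unzipped CSV one row at a time, keeps only the rows for one state, and drops the blank "air" fields. The state column name below is an assumption; check it against the header row of the file you actually download.

```python
import csv
from collections.abc import Iterator
from typing import TextIO

# Assumed header name for the practice-location state; verify it against
# the first row of the file you download.
STATE_COLUMN = "Provider Business Practice Location Address State Name"


def filter_by_state(source: TextIO, state: str,
                    state_column: str = STATE_COLUMN) -> Iterator[dict[str, str]]:
    """Yield the rows for one state, one at a time, without blank fields."""
    wanted = state.strip().upper()
    for row in csv.DictReader(source):
        value = row.get(state_column) or ""
        if value.strip().upper() != wanted:
            continue
        # Drop the quoted blanks ("") that make up much of the file, plus any
        # overflow (key None) or missing (value None) fields from ragged rows.
        yield {k: v for k, v in row.items() if k is not None and v and v.strip()}
```

To use it, open the unzipped file with `newline=""` (as the `csv` module recommends) and iterate over `filter_by_state(f, "OH")`. Because it is a generator, only one row is held in memory at a time, so the 2GB file never has to fit in RAM.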

Doing that probably means working with your IT department. They will want to know what format you want the data in, where you want to put it and how you want to access it. The extracted file may be too big for off-the-shelf desktop database applications such as Microsoft Access, which caps a database file at 2GB, but once the blank fields are stripped out you may be able to squeeze it in. Queries will likely be limited and slow, but once it is set up you won't need other resources to use it until the monthly update arrives. Many companies instead download the file to a server and load it into a SQL database where SQL-savvy analysts can tap into it. Your IT department can also help you fix the data hiccups that keep the raw government file from loading quickly and cleanly.
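For the server route, here is a minimal sketch of a batched load. SQLite from Python's standard library stands in for whatever SQL database your IT department actually runs, and the table name `npi` is hypothetical. Every column is stored as text, and blank fields become NULL.

```python
import csv
import sqlite3
from typing import TextIO


def load_npi_csv(source: TextIO, conn: sqlite3.Connection,
                 table: str = "npi", batch_size: int = 10_000) -> int:
    """Load a CSV file into a SQLite table in batches; return the row count."""
    reader = csv.reader(source)
    header = next(reader)
    # One TEXT column per header field; names are quoted because they contain spaces.
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    insert_sql = f'INSERT INTO "{table}" VALUES ({placeholders})'

    total = 0
    batch = []
    for row in reader:
        # Treat quoted blanks as NULL instead of storing empty strings.
        batch.append([value if value.strip() else None for value in row])
        if len(batch) >= batch_size:
            conn.executemany(insert_sql, batch)
            total += len(batch)
            batch.clear()
    if batch:  # flush the final partial batch
        conn.executemany(insert_sql, batch)
        total += len(batch)
    conn.commit()
    return total
```

Rows with the wrong number of fields make `executemany` raise an error, which is exactly the kind of hiccup your IT department may need to fix first. Adding an index on the columns you filter by most (state, for example) will also help with slow queries.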

Once the data is somewhere you can access it and you've cleaned up the bad addresses, you'll be ready to start running queries.
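The article doesn't spell out what cleaning up bad addresses involves. As one hypothetical first pass, the helpers below trim, collapse whitespace in and uppercase address lines, and reduce ZIP or ZIP+4 values to five digits, returning an empty string for anything that doesn't look like a U.S. ZIP code. Real address cleanup, such as validation against postal data, goes well beyond this.

```python
import re


def normalize_address_line(line: str) -> str:
    """Collapse runs of whitespace, trim the ends and uppercase an address line."""
    return re.sub(r"\s+", " ", line).strip().upper()


def normalize_zip(raw: str) -> str:
    """Reduce a ZIP or ZIP+4 value to 5 digits; return '' if it isn't 5 or 9 digits."""
    digits = re.sub(r"\D", "", raw)  # drop hyphens, spaces and other non-digits
    if len(digits) in (5, 9):
        return digits[:5]
    return ""
```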
