CoG Church Database
April 30, 2008
I recently completed a fun little technical project, and it got me thinking about a true “digital divide” in our present, local, modern culture. I’m a licensed pastor with the Church of God, Anderson Indiana. Every year, the central office in Anderson publishes a Yearbook, which is essentially a directory of all the churches and pastors that are part of the Church of God.
This last year, I ordered both a printed Yearbook, as is my custom, and, for the first time, the Yearbook on CD. I expected to receive on the CD a set of data files in a variety of formats and styles. I didn’t expect the data to be in a PostgreSQL or MySQL dump, but perhaps an Excel file, tab-separated values file, or some other common data format. Instead what I found on the CD, much to my disappointment, was the Yearbook in PDF.
It somewhat shocked me, in part for technical reasons. Why send out a CD with only a small PDF attached? Why not allow folks to just download the PDF? Part of the reason may be economic. I had to pay for the CD. But honestly, is the point here to make some money or to perform a service? Certainly, printing the Yearbook is expensive, and there should be some reasonable monetary compensation for that. But sending electrons across the Internet is cheap. I’d happily donate hosting to provide the data to those who chose not to purchase a printed copy.
That argument aside, there’s the question of why the content is on CD at all if it’s only in PDF. What can you do with a PDF except print it, copy it to share with others, or search within the text? Of course, I can flip through the printed Yearbook with about as much efficiency as I can search for a name in the PDF. So it seemed somewhat pointless to me to have a PDF version at all, given that it’s sold on CD and not available online.
I wanted to try to demonstrate that with a very short amount of effort, the data (which is public, by the way) could be used to power something interesting and ultimately useful. So I wrote a program that would read through the PDF and extract the data, clean and organize it, and load it into a simple MySQL database.
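As a rough illustration of that extract, clean, and load step, here is a minimal Python sketch. It is not the original program: it assumes the PDF text has already been dumped to plain lines, it invents a hypothetical "Name | City, ST | Phone" layout (the real Yearbook layout is not described here), and it uses SQLite from the standard library as a stand-in for MySQL so that it runs without a server.

```python
import re
import sqlite3

# Hypothetical line layout: "Name | City, ST | Phone".
# The real Yearbook format is not described in the post.
LINE_RE = re.compile(
    r"^\s*(?P<name>.+?)\s*\|\s*(?P<city>.+?),\s*(?P<state>[A-Z]{2})"
    r"\s*\|\s*(?P<phone>[\d().\s-]+?)\s*$"
)


def parse_line(line):
    """Return a cleaned record dict, or None if the line is not an entry."""
    match = LINE_RE.match(line)
    if match is None:
        return None  # page headers, blank lines, and other noise
    digits = re.sub(r"\D", "", match.group("phone"))
    if len(digits) != 10:
        return None  # assumption: only 10-digit US numbers are kept
    return {
        # Collapse runs of whitespace left over from the PDF layout.
        "name": " ".join(match.group("name").split()),
        "city": " ".join(match.group("city").split()),
        "state": match.group("state"),
        "phone": f"{digits[:3]}-{digits[3:6]}-{digits[6:]}",
    }


def load(lines, conn):
    """Parse lines, insert the valid records, and return how many were loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS churches "
        "(name TEXT, city TEXT, state TEXT, phone TEXT)"
    )
    records = [r for r in map(parse_line, lines) if r is not None]
    conn.executemany(
        "INSERT INTO churches VALUES (:name, :city, :state, :phone)", records
    )
    conn.commit()
    return len(records)
```

Calling `load(lines, sqlite3.connect(":memory:"))` on a list of extracted lines fills the table and silently skips anything that does not look like an entry. On a real PDF, most of the effort would go into that pattern and its exceptions.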
That was the hard part, unfortunately. From there, I spent a couple of hours one afternoon putting together a simple set of three or four page templates to query the database. I also tacked on an exporter to build a KML file for Google Earth. Anybody who has done any basic CRUD web application work would consider what I did pretty elementary. And they’d be right. I wouldn’t want to show off the code, because it was thrown together pretty rapidly, but it works.
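A KML exporter of that kind can be quite small. The sketch below is an illustration rather than the code behind the site: the `build_kml` function and the `name`, `lat`, and `lon` keys are hypothetical, and it assumes each church has already been geocoded to a latitude and longitude, which the Yearbook data may not include.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def _tag(name):
    """Qualify a tag name with the KML namespace."""
    return f"{{{KML_NS}}}{name}"


def build_kml(churches):
    """Build a KML document string with one Placemark per church.

    `churches` is an iterable of dicts with hypothetical keys
    "name", "lat", and "lon".
    """
    # Serialize KML as the default namespace instead of an "ns0:" prefix.
    ET.register_namespace("", KML_NS)
    kml = ET.Element(_tag("kml"))
    doc = ET.SubElement(kml, _tag("Document"))
    for church in churches:
        placemark = ET.SubElement(doc, _tag("Placemark"))
        ET.SubElement(placemark, _tag("name")).text = church["name"]
        point = ET.SubElement(placemark, _tag("Point"))
        # KML orders coordinates as longitude,latitude.
        coords = f'{church["lon"]},{church["lat"]}'
        ET.SubElement(point, _tag("coordinates")).text = coords
    return ET.tostring(kml, encoding="unicode")
```

Writing the returned string to a `.kml` file gives something Google Earth can open. Building the XML through ElementTree rather than string concatenation means characters such as `&` in a church name are escaped automatically.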
Take a look here for the online directory: CoG Church Yearbook
And this is what started me thinking: As a technologist, when I heard of the option to receive the Yearbook on CD, I immediately thought about how the data could be used, how it could be built into some sort of application for query and reporting. But I suspect this never occurred to the publishers of the data. Else why would they not provide the data in a more easily usable format? The data is public domain, and there’s nothing confidential about it. In fact, we in the pastorate should want to tell anyone interested where our churches are located, how large each might be, and what phone numbers folks could call to find out more.
I can’t believe it’s because the publishers want to try to more effectively monetize the data. They’re not in the business of profits above service. Instead, I think it’s because they just didn’t consider the fact that in our informational world, data is more important than presentation.
Real, raw data is the true source from which powerful applications come. Building a search engine on top of the Church of God Yearbook data was easy. It’s a simple application that almost anyone competent in web application development could have put together quickly. As such, it’s not worth all that much on its own. But combine the simple application with the data, and you have something that’s extremely useful.
I think there’s an important lesson here: Look at data as the source from which a plethora of applications can flow, rather than the other way around.
Comments (1)
Lloyd wrote: Great job extracting the data!
I have your database bookmarked!
I too would like to see the data readily available, but I can explain why they don't. They used to, but they found that people ended up using the database for direct marketing and e-mail campaigns to churches and pastors. The PDF is a pain, but it can keep people from abusing it, unless they go to the trouble you have gone to.
The reality is that this can also be extracted from the church finder application that appears on the Church of God website.
