Gillum.org

Jan 27

Lobbying databases

The guv’ment doesn’t make it so easy to search for lobbying data, I’ve found, but a little work can make it so:

http://senate.gov/legislative/Public_Disclosure/database_download.htm

The quarterly files come zipped as XML files, which can be converted into database files. It’s much faster than going through the Web interface on the Senate’s site.
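For a rough idea of what that conversion can look like, here is a minimal Python sketch that copies filings from XML into a SQLite table. The tag and attribute names (`Filing`, `ID`, `Year`, `Amount`, `RegistrantName`, `ClientName`) and the sample records are assumptions for illustration only; check them against the files you actually download.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample; the real files may use different tags and attributes.
SAMPLE_XML = """<PublicFilings>
  <Filing ID="a1" Year="2009" Amount="20000">
    <Registrant RegistrantName="Example Lobbying LLC"/>
    <Client ClientName="Example Client Inc"/>
  </Filing>
  <Filing ID="b2" Year="2009" Amount="5000">
    <Registrant RegistrantName="Another Firm"/>
    <Client ClientName="Example Client Inc"/>
  </Filing>
</PublicFilings>"""


def load_filings(xml_text, conn):
    """Copy one row per <Filing> element into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS filings "
        "(id TEXT PRIMARY KEY, year INTEGER, amount INTEGER, "
        "registrant TEXT, client TEXT)"
    )
    root = ET.fromstring(xml_text)
    for filing in root.iter("Filing"):
        registrant = filing.find("Registrant")
        client = filing.find("Client")
        conn.execute(
            "INSERT OR REPLACE INTO filings VALUES (?, ?, ?, ?, ?)",
            (
                filing.get("ID"),
                int(filing.get("Year")) if filing.get("Year") else None,
                int(filing.get("Amount")) if filing.get("Amount") else None,
                registrant.get("RegistrantName") if registrant is not None else None,
                client.get("ClientName") if client is not None else None,
            ),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_filings(SAMPLE_XML, conn)

# Once the data is in a table, questions become one-line queries.
totals = conn.execute(
    "SELECT client, SUM(amount) FROM filings GROUP BY client"
).fetchall()
print(totals)
```

Pointing `sqlite3.connect` at a file name instead of `:memory:` would keep the table around between sessions.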

Dec 09

Linux on Windows

I’m sure everyone knows this already, but Cygwin provides a great way to use Linux commands on Windows. It was very helpful recently when I had to string lots of files together and use the “tail” command. Any other Linux tips for Windows are appreciated!

Sep 21

Google map overlays and the mortgage crisis

The meltdown on Wall Street, which has become issue No. 1 in the race for the White House, was fueled in part by lots of risky lending in Arizona.

I helped a colleague of mine examine millions of loan applications, data which came from the Federal Financial Institutions Examination Council. We took a look at three years’ worth of data, and found that many Census tracts in the Tucson and Phoenix metro areas had high rates of subprime mortgages.

One way I thought we could explain this to readers was to show them how widespread lending was in their neighborhood or different parts of the Tucson area. Using Google Maps, I made a searchable tool for people to do just that.

The process was fairly straightforward. Here’s how it came together:

       var gx = new GGeoXml("http://site-url/subprime.kml");

All that was left was to plug in Tucson’s latitude and longitude to center the map on the city.
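The overlay file that line loads is KML, which is plain XML. As a hedged illustration of what such a file contains, this Python sketch writes one shaded polygon per census tract. The tract names, rates, coordinates and the color cutoff are all made up, and this is not necessarily how the original file was produced.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

# Hypothetical tracts: name, subprime rate, and a closed ring of (lon, lat).
TRACTS = [
    ("Tract 1", 0.35, [(-110.98, 32.20), (-110.96, 32.20),
                       (-110.96, 32.22), (-110.98, 32.22), (-110.98, 32.20)]),
    ("Tract 2", 0.10, [(-110.96, 32.20), (-110.94, 32.20),
                       (-110.94, 32.22), (-110.96, 32.22), (-110.96, 32.20)]),
]


def shade(rate):
    """Pick a fill color in KML's aabbggrr hex order (cutoff is made up)."""
    return "cc0000ff" if rate >= 0.25 else "cc9999ff"


def build_kml(tracts):
    """Return a KML string with one shaded polygon per tract."""
    kml = ET.Element("kml", xmlns=KML_NS)
    doc = ET.SubElement(kml, "Document")
    for name, rate, ring in tracts:
        placemark = ET.SubElement(doc, "Placemark")
        ET.SubElement(placemark, "name").text = f"{name}: {rate:.0%} subprime"
        style = ET.SubElement(placemark, "Style")
        poly_style = ET.SubElement(style, "PolyStyle")
        ET.SubElement(poly_style, "color").text = shade(rate)
        polygon = ET.SubElement(placemark, "Polygon")
        boundary = ET.SubElement(polygon, "outerBoundaryIs")
        linear_ring = ET.SubElement(boundary, "LinearRing")
        # KML coordinates are "lon,lat,altitude" triples separated by spaces.
        ET.SubElement(linear_ring, "coordinates").text = " ".join(
            f"{lon},{lat},0" for lon, lat in ring
        )
    return ET.tostring(kml, encoding="unicode")


kml_text = build_kml(TRACTS)
print(kml_text[:60])
```

Writing `kml_text` to a file such as `subprime.kml` and hosting it at a public URL gives the map something to load.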

The map eventually went with a larger story on lending. And those shapefiles created for the Google map also were used to create a graphic for the paper.

Aug 13

Fun with Chumby

So, I decided to break down and buy a Chumby, an Internet appliance of the squishy type.

It does lots of neat things, such as displaying news tickers, photos and your e-mail messages. It even has an alarm clock.

With all this fuss about news tickers, then, I made my own to display breaking news headlines from the Star, as well as hourly business updates.

So far, my coworkers aren’t really sure if it’s neat or not. Or even what the Chumby is.

May 19

Social promotion in Tucson

After 10 months of intensive public-records negotiating, programming, and collaborating with editors and reporters, I’m finally done with a project that examined social promotion in Tucson-area schools.

The project was unique in the sense that few (if any) have done this before. Social promotion, in a nutshell, is the phenomenon of pushing students to the next grade even though they don’t pass their subjects. Our team, acting on a source’s tip, examined how prevalent this is in all but one of the metro area’s public-school districts.

Basically, what made this project hard (among many things) was the lack of comprehensive grade data among school districts. There was almost no uniformity in data variables, and in many cases, we had to figure out what constituted core classes (English, math, etc.). I had to do a lot of cleaning up in Perl to get the data sets to talk to one another.

The other hard part was writing queries to parse out — accurately — how many students failed one, two or three (or more) classes in each school for each year. We then compared that with the school’s official promotion rate for each grade. Wide differences indicate that students are being moved along more often than their academic progress merits. 
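The counting logic can be sketched in a few lines of Python. The record layout, the sample rows, the 98 percent promotion rate and the way the gap is computed are all assumptions for illustration; the project's actual queries are not shown here.

```python
from collections import Counter, defaultdict

# Hypothetical records: (school, year, student_id, course, passed)
GRADES = [
    ("Example Middle", 2007, "s1", "English", False),
    ("Example Middle", 2007, "s1", "Math", False),
    ("Example Middle", 2007, "s2", "English", True),
    ("Example Middle", 2007, "s2", "Math", False),
    ("Example Middle", 2007, "s3", "English", True),
    ("Example Middle", 2007, "s3", "Math", True),
    ("Example Middle", 2007, "s4", "English", True),
    ("Example Middle", 2007, "s4", "Math", True),
]


def failure_buckets(records):
    """Count students per (school, year) who failed 0, 1, 2 or 3+ classes."""
    fails = defaultdict(int)
    for school, year, student, _course, passed in records:
        fails[(school, year, student)] += 0 if passed else 1
    buckets = defaultdict(Counter)
    for (school, year, _student), n in fails.items():
        buckets[(school, year)][min(n, 3)] += 1  # key 3 means "three or more"
    return buckets


def promotion_gap(bucket, official_promotion_rate):
    """Official promotion rate minus the share of students who failed nothing."""
    total = sum(bucket.values())
    return official_promotion_rate - bucket[0] / total


buckets = failure_buckets(GRADES)
bucket = buckets[("Example Middle", 2007)]
gap = promotion_gap(bucket, 0.98)  # 0.98 is a made-up official rate
print(dict(bucket), round(gap, 2))
```

Counting each student once, rather than each failed class, is the part that is easy to get wrong when the source data has one row per class.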

The other main component was looking at the discrepancies between scores on the state test, Arizona’s Instrument to Measure Standards, and how often students failed English and math classes. Double-digit gaps, our experts and data show, suggest grade inflation is also occurring.

Mar 08

Converting those pesky PDFs to TXTs

It really irritates me when I ask public agencies for Excel or tab-delimited files, and I’m greeted with a PDF in my e-mail inbox.

Alas, if the coding is right, that issue can be fixed using PDF2TXT (if you have a PC). This basically allows you to strip out the plain text, with the appropriate tabs (if it’s a spreadsheet) to then copy into Excel.

Mac users, you may have another option. You can open the PDF inside Preview, a lightweight alternative to Adobe Acrobat. Then, you copy the contents into TextEdit, and save that as a text file to open inside Excel.

There’s also Adobe’s free service, pdf2txt@adobe.com. Simply e-mail a PDF attachment to that address, and you’ll get a text file back.

Feb 04

A new set of (graphics) eyes

We in the newspaper business can be stumped with ways to present complex information. We try our best with breakout boxes, charts and other morsels of digestible journalism, but it doesn’t always work well.

IBM has come up with a neat way to visualize complex pieces of data, such as Michigan’s most violent cities or state-by-state gasoline taxes. I’d be interested to see what Flint Expatriates has to say about its namesake city having a sizable spot on the first map.

Here’s the IBM link to ManyEyes, as well as INSNA.

Nov 25

Mapping foreclosures

ArcView is a very handy program for mapping information, particularly when you can visually find relationships between disparate sets of data.

Recently, I worked with another Star reporter to analyze foreclosure trends in Tucson, and compared those with the incidence of high-risk loans. We used two sets of data: The first, from NICAR, contained information on every mortgage application in the United States for 2005 and 2006 (when risky lending was generally considered to be the most widespread). The second set was a list of most 2007 foreclosures that my colleague, Christie Smythe, obtained from RealtyTrac.

If you’re ever interested in figuring out trends via mapping, here’s a step-by-step list that could come in handy. Some of it can be dense and technical, but the general ideas alone could help. (Here’s the map, which Kori Rumore made visually spectacular, if you want an idea of the finished product.)

Step 1: Preparing the data

Between the two, we were able to calculate the rate of high-risk (and possibly subprime) loans to total approved loans per census tract.
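The arithmetic behind that rate is simple enough to sketch in Python. The record layout, the tract labels and the high-risk flag are hypothetical; how a loan gets flagged as high-risk in the real data is not covered here.

```python
from collections import defaultdict

# Hypothetical loan records: (census_tract, approved, high_risk)
LOANS = [
    ("tract-A", True, True),
    ("tract-A", True, False),
    ("tract-A", False, True),   # denied, so excluded from the rate
    ("tract-B", True, False),
    ("tract-B", True, False),
    ("tract-B", True, True),
    ("tract-B", True, False),
]


def high_risk_rates(loans):
    """Return high-risk approved loans / all approved loans for each tract."""
    approved = defaultdict(int)
    risky = defaultdict(int)
    for tract, is_approved, is_high_risk in loans:
        if not is_approved:
            continue
        approved[tract] += 1
        if is_high_risk:
            risky[tract] += 1
    return {tract: risky[tract] / approved[tract] for tract in approved}


rates = high_risk_rates(LOANS)
for tract, rate in sorted(rates.items()):
    print(f"{tract}: {rate:.0%}")
```

Dividing by approved loans rather than all applications keeps denied applications from diluting the rate.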

Step 2: Mapping the data

Step 3: Making sense of the data

Now, back to our geocoded addresses. ArcView has a function that allows us to plot points based on X and Y coordinates. That created a new “layer” with all of the foreclosures plotted.

The results: The foreclosure points were spread out all over Tucson, with more in the darker-shaded census tracts. In other words, areas with greater concentrations of high-risk loans had more foreclosed homes the following year.

Here comes the magic. Census tracts are useful for journalists, but not necessarily for ordinary people. What if we were able to tell our readers which neighborhoods had the highest number of foreclosures? To do that:
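In general terms, this comes down to a point-in-polygon count: test each foreclosure point against each neighborhood boundary and tally the hits. Here is a minimal Python sketch of that idea using a ray-casting test. The neighborhood shapes and points are made up, and this stands in for, rather than reproduces, what ArcView does internally.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) falls inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray running right from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical neighborhood boundaries and foreclosure points (x, y).
NEIGHBORHOODS = {
    "Neighborhood A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "Neighborhood B": [(4, 0), (8, 0), (8, 4), (4, 4)],
}
FORECLOSURES = [(1, 1), (2, 3), (3, 2), (5, 1), (9, 9)]


def count_by_neighborhood(points, neighborhoods):
    """Tally how many points fall inside each neighborhood polygon."""
    counts = Counter()
    for x, y in points:
        for name, polygon in neighborhoods.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break  # a point belongs to at most one neighborhood
    return counts


counts = count_by_neighborhood(FORECLOSURES, NEIGHBORHOODS)
for name, total in counts.most_common():
    print(name, total)
```

Points that land outside every boundary, like the last one in the sample, are simply left out of the tally.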

From here, the rest was straightforward. We opened the table in Excel and sorted it by the greatest number of foreclosures. Christie then called up the heads of some neighborhood associations for some thoughts:

Midvale Park Neighborhood Association President Joe Miller said the reason might be that Midvale Park was seen as a more desirable place to live than some of the surrounding neighborhoods. Some buyers may have stretched their finances to get in, he said.

“They were probably overly optimistic,” he said.

This entry was posted on Sunday, November 25th, 2007 at 3:23 pm and is filed under Computer-assisted reporting.

Oct 19

Whoa. What just happened in Phoenix?

I’ve been following this arrest of two Phoenix New Times executives with much interest. Sadly, it hasn’t surprised me much, considering Sheriff Joe’s tough law-enforcement approach in the past, although the quick turnaround by Maricopa County Attorney Andrew Thomas did.

The New Times folks wrote about what they called a “breathtakingly unconstitutional” request for information regarding their Web site users, and were later sent to the slammer.

From a press conference today with Thomas regarding the arrest, via the New Times:

“There’s a big difference between that and putting his name and address on the front cover,” as the New Times did late in 2006. This reporter had to point out to Thomas that the law in question did not apply to print publication of such addresses, only Internet publication of same.

Thomas mumbled a response, to which I shot back: “So the law doesn’t matter to you?”

“That’s not what I said,” he frowned.

This ordeal happened around the same time as another alt-weekly arrest, this one in Orlando, where authorities say the paper aided and abetted prostitution.

I try not to get too worked up about this stuff, but considering it happened 90 miles up the road from my paper, it does make me a little squeamish.