Linux on Windows

I’m sure everyone knows this already, but Cygwin provides a great way to use Linux commands on Windows. It was very helpful recently when I had to string lots of files together and use the “tail” command. Any other Linux tips for Windows are appreciated!

Google map overlays and the mortgage crisis

The meltdown on Wall Street, which has become issue No. 1 in the race for the White House, was fueled in part by lots of risky lending in Arizona.

I helped a colleague of mine examine millions of loan applications, data which came from the Federal Financial Institutions Examination Council. We took a look at three years’ worth of data, and found that many Census tracts in the Tucson and Phoenix metro areas had high rates of subprime mortgages.

One way I thought we could explain this to readers was to show them how widespread lending was in their neighborhood or different parts of the Tucson area. Using Google Maps, I made a searchable tool for people to do just that.

The process was fairly straightforward. Here’s how it came together:

  • In the mortgage data, the smallest unit of geographic specificity is the Census tract. By doing GROUP BY and sum() queries in SQL Server, I was able to pull out the dollar value of risky loans — those with a rate spread greater than 3 or 5 percent, depending on the loan — and total loans in each tract. 
  • I exported that data as DBF and then imported it into ArcView. Since I had a shapefile of Arizona’s Census tracts, I joined the DBF to those tracts. I also calculated the percent of risky to total loan value.
  • I exported that layer’s shapefile to a folder on my desktop and imported it into Shp2kml, a free program that converts shapefiles to KML files, which can be read by Google Earth and Google Maps.
  • I uploaded the newly created KML file to a Web server and set up a map with the Google Maps API in a new, blank Web page. I then told the map to reference that KML file. Here’s an excerpt of the code:
       var gx = new GGeoXml("http://site-url/subprime.kml");

All that was left was to put the latitude/longitude of Tucson in to center the map on the city. 
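The aggregation in the first step can be sketched in a few lines. This is only an illustration, not the SQL Server queries used for the story: it runs the same kind of GROUP BY and SUM() logic in Python's built-in sqlite3 module. The table name, column names, sample rows, and the rule that ties the 3 or 5 percent threshold to a "lien" column are all assumptions made for the example.

```python
import sqlite3

# In-memory database with a hypothetical, simplified loans table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE loans (tract TEXT, lien TEXT, rate_spread REAL, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO loans VALUES (?, ?, ?, ?)",
    [
        ("0001.00", "first", 3.5, 200),   # risky: first lien, spread above 3
        ("0001.00", "first", None, 300),  # no reported spread: not risky
        ("0001.00", "second", 4.0, 50),   # second lien, spread below 5: not risky
        ("0002.00", "second", 6.0, 100),  # risky: second lien, spread above 5
        ("0002.00", "first", 1.0, 100),   # not risky
    ],
)

# One pass per tract: dollar value of risky loans and of all loans.
# Assumption: the threshold is 3 for first liens and 5 for second liens.
QUERY = """
SELECT tract,
       SUM(CASE WHEN (lien = 'first'  AND rate_spread > 3)
                  OR (lien = 'second' AND rate_spread > 5)
                THEN amount ELSE 0 END) AS risky_amount,
       SUM(amount) AS total_amount
FROM loans
GROUP BY tract
ORDER BY tract
"""


def risky_share_by_tract(connection):
    """Return {tract: percent of loan dollars that were risky}."""
    rows = connection.execute(QUERY).fetchall()
    return {tract: round(100.0 * risky / total, 1) for tract, risky, total in rows}


shares = risky_share_by_tract(conn)
for tract, pct in shares.items():
    print(tract, pct)
```

The percentage here is computed in the same step as the totals; in the workflow described above, that calculation happened later, inside ArcView.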

The map eventually went with a larger story on lending. And those shapefiles created for the Google map also were used to create a graphic for the paper.

Fun with Chumby

So, I decided to break down and buy a Chumby, an Internet appliance of the squishy type.

It does lots of neat things, such as displaying news tickers, photos and your e-mail messages. It even has an alarm clock.

With all this fuss about news tickers, then, I made my own to display breaking news headlines from the Star, as well as hourly business updates.

So far, my coworkers aren’t really sure if it’s neat or not. Or even what the Chumby is.

Social promotion in Tucson

After 10 months of intensive public-records negotiating, programming, and collaborating with editors and reporters, I’m finally done with a project that examined social promotion in Tucson-area schools.

The project was unique in the sense that few (if any) have done this before. Social promotion, in a nutshell, is the phenomenon of pushing students to the next grade even though they don’t pass their subjects. Our team, based on a source’s tip, examined the prevalence of this in all but one of the public-school districts in the metro area.

Basically, what made this project hard (among many things) was the lack of comprehensive grade data among school districts. There was almost no uniformity in data variables, and in many cases, we had to figure out what constituted core classes (English, math, etc.). I had to do a lot of cleaning up in Perl to get the data to talk to one another.

The other hard part was writing queries to parse out — accurately — how many students failed one, two or three (or more) classes in each school for each year. We then compared that with the school’s official promotion rate for each grade. Wide differences indicate that students are being moved along more often than their academic progress merits. 
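To give a sense of what such a query involves, here is a minimal sketch using Python's built-in sqlite3 module. It is not the project's actual code: the table layout, the column names, the sample rows, and the choice to treat a mark of "F" as a failure are assumptions for illustration. The inner query counts failed classes per student; the outer query counts students in each bucket (1, 2, or 3 or more) for every school and year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE grades "
    "(school TEXT, year INTEGER, student_id INTEGER, course TEXT, mark TEXT)"
)
# Hypothetical sample rows: one row per student per core class.
conn.executemany(
    "INSERT INTO grades VALUES (?, ?, ?, ?, ?)",
    [
        ("Adams", 2007, 1, "English", "F"),
        ("Adams", 2007, 1, "Math", "F"),
        ("Adams", 2007, 1, "Science", "C"),
        ("Adams", 2007, 2, "English", "F"),
        ("Adams", 2007, 2, "Math", "B"),
        ("Adams", 2007, 3, "English", "A"),
        ("Adams", 2007, 4, "English", "F"),
        ("Adams", 2007, 4, "Math", "F"),
        ("Adams", 2007, 4, "Science", "F"),
        ("Adams", 2007, 4, "History", "F"),
    ],
)

QUERY = """
SELECT school, year,
       CASE WHEN failed >= 3 THEN '3+' ELSE CAST(failed AS TEXT) END
           AS failed_classes,
       COUNT(*) AS students
FROM (
    -- Step 1: number of failed classes for each student
    SELECT school, year, student_id,
           SUM(CASE WHEN mark = 'F' THEN 1 ELSE 0 END) AS failed
    FROM grades
    GROUP BY school, year, student_id
) AS per_student
WHERE failed > 0
-- Step 2: number of students in each failure bucket
GROUP BY school, year, failed_classes
ORDER BY school, year, failed_classes
"""

failure_counts = conn.execute(QUERY).fetchall()
for row in failure_counts:
    print(row)
```

Counting per student first, then per bucket, is what keeps a student who failed four classes from being tallied four times.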

The other main component was looking at the discrepancies between scores on the state test, Arizona’s Instrument to Measure Standards, and how often students failed English and math classes. Double-digit gaps, our experts and data show, suggest grade inflation is also occurring.

Converting those pesky PDFs to TXTs

It really irritates me when I ask public agencies for Excel or tab-delimited files, and I’m greeted with a PDF in my e-mail inbox.

Luckily, if the coding is right, that issue can be fixed using PDF2TXT (if you have a PC). This basically allows you to strip out the plain text, with the appropriate tabs if it’s a spreadsheet, which you can then copy into Excel.

Mac users, you may have another option. You can open the PDF inside Preview, a lightweight alternative to Adobe Acrobat. Then, you copy the contents into TextEdit, and save that as a text file to open inside Excel.

There’s also Adobe’s free service, pdf2txt@adobe.com. Simply e-mail a PDF attachment to that address, and you’ll get back a text file in return.

Tags: CAR PDF

A new set of (graphics) eyes

We in the newspaper business can be stumped with ways to present complex information. We try our best with breakout boxes, charts and other morsels of digestible journalism, but it doesn’t always work well.

IBM has come up with a neat way to visualize complex pieces of data, such as Michigan’s most violent cities or state-by-state gasoline taxes. I’d be interested to see what Flint Expatriates has to say about its namesake city having a sizable spot on the first map.

Here’s the IBM link to ManyEyes, as well as INSNA.

Tags: CAR graphics

Mapping foreclosures

ArcView is a very handy program for mapping information, particularly when you can visually find relationships between disparate sets of data.

Recently, I worked with another Star reporter to analyze foreclosure trends in Tucson, and compared those with the incidence of high-risk loans. We used two sets of data: The first, from NICAR, contained information on every mortgage application in the United States for 2005 and 2006 (when risky lending was generally considered to be the most widespread). The second set was a list of most 2007 foreclosures that my colleague, Christie Smythe, obtained from RealtyTrac.

If you’re ever interested in figuring out trends via mapping, here’s a step-by-step list that could come in handy. Some of it can be dense and technical, but the general ideas alone could help. (Here’s the map, which Kori Rumore made visually spectacular, if you want an idea of the finished product.)

Step 1: Preparing the data

  • The RealtyTrac addresses, in Excel form, needed “geocoding” — the process that adds X and Y (latitude and longitude) coordinates to most addresses. I did that via batchgeocode.com.
  • Next, I needed to calculate lending trends by census tract. The Star obtained two years’ worth of data from NICAR, which I pulled into Microsoft Access. From there, I ran two SELECT queries with GROUP BY clauses:
    • one that counted the total number of risky mortgages per census tract with a rate spread greater than 3 percent, and
    • the second, which counted the total number of approved loans.

Between the two, we were able to calculate the rate of high-risk (and possibly subprime) loans to total approved loans per census tract.
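The two queries and the rate calculation might look something like the sketch below. It uses Python's built-in sqlite3 module rather than Access, and the table name, column names, and sample rows are hypothetical. It also assumes that risky loans are counted only among approved loans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE applications (tract TEXT, approved INTEGER, rate_spread REAL)"
)
# Hypothetical sample rows; approved is 1 for approved loans, 0 otherwise.
conn.executemany(
    "INSERT INTO applications VALUES (?, ?, ?)",
    [
        ("0021.00", 1, 4.2),
        ("0021.00", 1, None),
        ("0021.00", 1, 3.1),
        ("0021.00", 0, None),  # denied application, ignored by both queries
        ("0035.02", 1, 1.5),
        ("0035.02", 1, 5.0),
        ("0035.02", 1, None),
        ("0035.02", 1, None),
    ],
)

# Query 1: approved loans with a rate spread greater than 3, per tract.
risky = dict(conn.execute(
    "SELECT tract, COUNT(*) FROM applications "
    "WHERE approved = 1 AND rate_spread > 3 GROUP BY tract"
))

# Query 2: all approved loans, per tract.
approved = dict(conn.execute(
    "SELECT tract, COUNT(*) FROM applications "
    "WHERE approved = 1 GROUP BY tract"
))

# Rate of high-risk loans to total approved loans, as a percentage.
rates = {
    tract: round(100.0 * risky.get(tract, 0) / total, 1)
    for tract, total in approved.items()
}
print(rates)
```

Using `risky.get(tract, 0)` matters: a tract with no high-risk loans never appears in the first query's results, but it should still show up with a rate of zero.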

Step 2: Mapping the data

  • Armed with a spreadsheet with columns that show 1) census tract name and 2) percentage of high-risk loans, I was able to import that into ArcView. I already had a shapefile (a map overlay, if you will) of census tracts; the next step was to do a join on the field the two data sets had in common (the census tract number).
  • ArcView then allowed me to change the “symbology” to shade the different tracts based on percentage. We chose three different shades of color: one for 20 to 30 percent high-risk mortgages per tract, a second for 30 to 40 percent, and a third for more than 40 percent.
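The shading rule itself is simple enough to write down. The function below is a hypothetical stand-in for what ArcView's symbology settings do, with two assumptions: tracts below 20 percent get no shade, and a value sitting exactly on a boundary (30 or 40) falls into the bucket that includes it as an upper or lower bound as noted in the comments.

```python
def shade_for(pct_high_risk):
    """Return a shade name for a tract's percentage of high-risk loans.

    Assumed buckets: 20 to under 30 is "light", 30 through 40 is "medium",
    above 40 is "dark", and anything under 20 is left unshaded (None).
    """
    if pct_high_risk > 40:
        return "dark"
    if pct_high_risk >= 30:
        return "medium"
    if pct_high_risk >= 20:
        return "light"
    return None


# Hypothetical tract percentages, for illustration only.
sample = {"0021.00": 66.7, "0035.02": 25.0, "0040.11": 12.0}
shades = {tract: shade_for(pct) for tract, pct in sample.items()}
print(shades)
```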

Step 3: Making sense of the data

Now, back to our geocoded addresses. ArcView has a function that allows us to plot points based on X and Y coordinates. That created a new “layer” with all of the foreclosures plotted.

The results: The foreclosure points were spread out all over Tucson, with more in the darker-shaded census tracts. In other words, areas with greater concentrations of high-risk loans had more foreclosed homes the following year.

Here comes the magic. Census tracts are useful for journalists, but not necessarily for ordinary people. What if we were able to tell our readers which neighborhoods had the highest number of foreclosures? To do that:

  • We first selected the layer with our foreclosure points. Right-clicking it gave us an option to do a “join.”
  • We then told the program to join the data to another layer (the neighborhoods layer), a function known as a “spatial” join.
    • These layers are available from Pima County via its very own GIS repository. The best part is, they’re free.
  • We then executed the spatial join. When it was done, we were left with a table that contained not only each neighborhood, but also the number of points (foreclosure locations) that fell inside those neighborhood boundaries. At last, we had a list of neighborhoods and how many foreclosures were inside their borders for most of 2007.
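For anyone curious about what a spatial join does under the hood, here is a rough sketch in plain Python. It is not how ArcView implements it: this version uses a basic ray-casting test to decide whether a point falls inside a polygon, then tallies points per polygon. The neighborhood names, boundary coordinates, and points are made up for the example.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices in order. A horizontal ray is cast
    to the right of the point; an odd number of edge crossings means inside.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's y value can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_by_area(points, areas):
    """Count how many points fall inside each named polygon."""
    counts = Counter()
    for x, y in points:
        for name, polygon in areas.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break  # assumes areas do not overlap
    return counts


# Hypothetical neighborhood boundaries (two adjacent squares).
areas = {
    "Neighborhood A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "Neighborhood B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
# Hypothetical geocoded foreclosure points as (x, y) pairs.
points = [(1, 1), (0.5, 1.5), (3, 1), (5, 5)]

counts = count_points_by_area(points, areas)
# Sort by the greatest number of points, as in the Excel step.
ranked = counts.most_common()
print(ranked)
```

Real boundaries use longitude and latitude and can have holes or multiple parts, which is why a GIS program is the better tool for the actual job.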

From here, the rest was straightforward. We then opened the table in Excel and sorted the table by the greatest number of foreclosures. Christie then called up the heads of some neighborhood associations for some thoughts:

Midvale Park Neighborhood Association President Joe Miller said the reason might be that Midvale Park was seen as a more desirable place to live than some of the surrounding neighborhoods. Some buyers may have stretched their finances to get in, he said.

“They were probably overly optimistic,” he said.

Whoa. What just happened in Phoenix?

I’ve been following this arrest of two Phoenix New Times executives with much interest. Sadly, it hasn’t surprised me much, considering Sheriff Joe’s tough law-enforcement approach in the past, although the quick turnaround by Maricopa County Attorney Andrew Thomas did.

The New Times folks wrote about what they called a “breathtakingly unconstitutional” request for information regarding their Web site users. They wrote about it, and were later sent to the slammer.

From a press conference today with Thomas regarding the arrest, via the New Times:

“There’s a big difference between that and putting his name and address on the front cover,” as the New Times did late in 2006. This reporter had to point out to Thomas that the law in question did not apply to print publication of such addresses, only Internet publication of same.

Thomas mumbled a response, to which I shot back: “So the law doesn’t matter to you?”

“That’s not what I said,” he frowned.

This ordeal happened around the same time as another alt-weekly arrest, this one in Orlando, where authorities say the paper aided and abetted prostitution.

I try not to get too worked up about this stuff, but considering it happened 90 miles up the road from my paper, it does make me a little squeamish.

Tags: the press