Steven Cramer

Power Outages in SF Bay Area (Part 1)

I never thought much about power reliability—six years in downtown Chicago, and I can’t recall an outage that impacted my life. But in the Berkeley Hills, nestled in the wealthy, high-tech San Francisco Bay region, we lose power a few afternoons a year. It’s baffling that homeowners in $2M houses tolerate sudden WiFi, fridge, and light failures. During our third power outage in as many months, I wondered: How reliable is my electricity? Is it better a few blocks away?

Averages Don't Tell It All

Power outages don’t follow city or county lines: when the power goes out at my house, we walk 10 minutes to a café and keep working.

I imagined a map showing power reliability across my area—something like this:
Author’s recording of power outages within 10km of Berkeley between November 22, 2024 and December 10, 2024. Dark orange is a lot of time without power; grey is a bit of time without power; no color is sweet 100% reliable electricity.

Surely someone must publish this map. So I went looking for it online...

Currently Available Outage Information

Real-Time Outage Information from PG&E

In my neighborhood, power outages tend to be highly localized. My intuition is backed up by actual data.

PG&E, our flawed utility provider, shares the exact location of houses without power. For example, as I type this, about 1 customer in this area of approximately 15 houses is without power due to an unplanned outage. Their neighbors have power.


The homes in the green-colored area have had their power out for the last 5 hours. I’m sure someone writing a book on PG&E’s organizational culture can write a nice chapter on why this area is colored green.

Historical Outage Information from PG&E

While PG&E publishes this high-resolution outage data in real time, historical performance is reported at something closer to the county level:
https://www.pge.com/assets/pge/docs/about/pge-systems/CPUC-2023-Annual-Electric-Reliability-Report.pdf


This is cute; it clearly doesn’t tell a good story, but it also doesn’t tell the full story.

PG&E also has a data request program, which allows academic researchers and governments to request data. However, the log of past requests and the description of the available datasets suggest that polygon-level outage information is not on offer.

Outage information from Poweroutage.us

The private market has come to the rescue, offering outage information for the entire country.


https://poweroutage.us/area/state/california

Their product page boasts: “Need historical data? We've got it! Every data point collected is also archived! Outage data can be retrieved at the utility, state, county, and city levels. Data can also be summarized in any way that meets your needs, raw data, hourly, daily, outage events, etc”.

Inquiring about purchasing polygon-level data reveals that they can’t provide anything more granular than the city/county level.

Outage information on GitHub

User Simon Willison scrapes PG&E’s outage map and posts a history of it on GitHub!

It seems promising, but a look through the readme shows that Simon’s data doesn’t have any polygon information. Per his comments:

This repository only archives outages that are reported for a single location.

The outage map also includes polygon data, which is much more interesting... but has not proven practical to archive here, for reasons explained in this issue comment. Short version: I'd have to constantly archive 100MB of data per snapshot because the polygons are so large!

It also suggests that people aren’t capturing this information because it’s too much data. Hmmm…

Outage information from California Office of Emergency Services (CalOES)

The State of California essentially re-publishes the outage maps from PG&E and several other large California utilities on ESRI’s ArcGIS Hub.

This is the juicy data I’m after! From a state agency! Surely they’re capturing historical data!

Alas, an email reveals they are not capturing historical data. However, there’s both a “download data” button and an explicitly labeled Public Use license.

Perhaps I can make a contribution here…

Putting the pieces together: scraping CalOES’s data

Spread across three files, CalOES shares about 11 MB of data (approximately 2 iPod songs of data), updated every 10-20 minutes.

Indeed, this is a lot of data: storing the raw GeoJSON would take something like 1 GB per day – enough that two weeks would fill a Gmail inbox on the free tier.
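A quick back-of-the-envelope check, assuming one snapshot roughly every 15 minutes (the middle of that 10-20 minute range):

    # ~96 snapshots per day at a 15-minute cadence, ~11 MB each
    echo $(( 11 * 24 * 60 / 15 ))   # prints 1056 (MB per day), i.e. about 1 GB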

As I designed my data collection and analysis workflow, I had a few goals, which led to a few decisions.

Because I wanted to use basic Unix tools, I needed to think a lot about files. For simplicity, the filename does a lot of the heavy lifting, so it’s the first thing to design:

2024-12-02/layer0-2024_12_02T05_20_02_utc-0dff5…136f.json
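Breaking that name into its pieces (one example snapshot; the hash is shown truncated):

    2024-12-02/                   one directory per calendar day (UTC)
    layer0                        which of the three CalOES files this snapshot came from
    2024_12_02T05_20_02_utc       UTC timestamp of when the download finished
    0dff5…136f                    hash of the file contents
    .json                         the payload itself is JSON/GeoJSON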

With the key pieces written down, we can now write our bash function to download a file:

  1. Change to the proper directory, creating it if it doesn’t exist
  2. Use curl to download the file to a temp file
  3. Record the time the download finished
  4. Compute the hash of the downloaded file
  5. Write a file to disk using the naming scheme above. If we’ve never seen the file, record the entire file. If it’s a duplicate, create an empty file to record that the data has not changed.
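Here is a minimal sketch of that function. The archive location, the function name (fetch_layer), the short hash prefix, and the way duplicates are detected (searching earlier filenames for the same hash) are illustrative choices, not the exact script:

    #!/usr/bin/env bash
    # Sketch of the per-file download step described above.
    set -euo pipefail

    ARCHIVE_ROOT="$HOME/outage-archive"   # assumption: where snapshots are kept

    fetch_layer() {
        local url="$1"      # one of the CalOES "download data" URLs
        local label="$2"    # e.g. "layer0"

        # 1. Change to the proper directory, creating it if it doesn't exist
        local day_dir="$ARCHIVE_ROOT/$(date -u +%F)"
        mkdir -p "$day_dir"
        cd "$day_dir"

        # 2. Use curl to download the file to a temp file
        local tmp
        tmp="$(mktemp)"
        curl --silent --fail --output "$tmp" "$url"

        # 3. Record the time the download finished
        local stamp
        stamp="$(date -u +%Y_%m_%dT%H_%M_%S)_utc"

        # 4. Compute the hash of the downloaded file (truncated for the filename)
        local hash
        hash="$(sha256sum "$tmp" | cut -c1-12)"

        # 5. Write to disk using the naming scheme above: keep the full payload
        #    the first time we see this hash, otherwise leave an empty marker.
        local out="${label}-${stamp}-${hash}.json"
        if ls "$ARCHIVE_ROOT"/*/"${label}"-*-"${hash}".json >/dev/null 2>&1; then
            touch "$out"        # duplicate: record that the data has not changed
        else
            mv "$tmp" "$out"    # new data: archive the whole file
        fi
        rm -f "$tmp"
    }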

Then, we loop that across all of the URLs we want to download, tell Cron to run it every 2 minutes, and we’re off to the races!
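Sketched out, with placeholder URLs standing in for the three real CalOES endpoints:

    # fetch_outages.sh -- assumes the fetch_layer function above is defined in
    # this same script; the three URLs below are placeholders.
    LAYER_URLS=(
        "https://example.com/caloes/layer0.geojson"
        "https://example.com/caloes/layer1.geojson"
        "https://example.com/caloes/layer2.geojson"
    )

    for i in "${!LAYER_URLS[@]}"; do
        fetch_layer "${LAYER_URLS[$i]}" "layer${i}"
    done

    # Install with `crontab -e`; run every 2 minutes (the path is a placeholder):
    # */2 * * * * /path/to/fetch_outages.sh >> /path/to/fetch_outages.log 2>&1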

Check back for updates about analyzing this data!