Several years of email statistics - code included

9 years ago

Out of boredom, I decided to write a couple of Python scripts to download the dates of all the emails I've ever sent through Gmail and compute some statistics on them.

Here they are: the number of emails sent per weekday, per month, and per hour.
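All three charts come down to counting timestamps per bucket. Here is a minimal Python 3 sketch of that counting step. It assumes the dates have already been parsed into `datetime` objects. `count_buckets` and `WEEKDAY_NAMES` are hypothetical names, and the sample timestamps are made up to show the shape of the result.

```python
from collections import Counter
from datetime import datetime

# Same numbering as datetime.weekday(): 0 = Monday ... 6 = Sunday
WEEKDAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def count_buckets(timestamps):
    """Count datetimes per weekday name, per month (1-12) and per hour (0-23).

    `timestamps` must be a list (or other re-iterable), since it is walked three times.
    """
    per_weekday = Counter(WEEKDAY_NAMES[ts.weekday()] for ts in timestamps)
    per_month = Counter(ts.month for ts in timestamps)
    per_hour = Counter(ts.hour for ts in timestamps)
    return per_weekday, per_month, per_hour


# Three made-up timestamps, just to show the shape of the result
sample = [
    datetime(2010, 10, 4, 10, 15),   # Monday, October, 10am
    datetime(2010, 10, 5, 14, 2),    # Tuesday, October, 2pm
    datetime(2010, 11, 1, 10, 40),   # Monday, November, 10am
]
weekdays, months, hours = count_buckets(sample)
print(weekdays)  # Counter({'Mon': 2, 'Tue': 1})
```

Each `Counter` behaves like the plain dicts the original script fills by hand, so it can be fed straight into a chart.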

To be honest, I didn't know what to expect, but at first sight there's nothing out of the ordinary. The per-weekday chart shows, unsurprisingly, that I send less email on weekends (though I send more on Sundays than on Saturdays, go figure). Friday is also not as busy as the rest of the week (kind of expected :)).

The per-month chart shows that my most active month for sending email is October. No clue why that is, but it might somehow be related to the start of autumn weather :)

Finally, the per-hour chart shows that I'm not much of a late-night email kind of guy. Most of my emails are sent during the day, with the usual lunch break around 13:00 (1pm). This actually showed me how late I have lunch, hehe. My most active hours for sending email seem to be 10am and 2pm (mid-morning and right after lunch). The interesting part is how smoothly the volume decays after each of these two peaks. I wasn't expecting that.
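You can spot an hourly pattern like this (peaks, lunch dip, gradual decay) without a charting library. Here is a minimal Python 3 sketch. It assumes a dict that maps hour (0-23) to email count, like the `hours` dict that generate_chart.py builds. `text_histogram` and `busiest_hours` are hypothetical helper names, and the example counts are illustrative only, not real data.

```python
def text_histogram(hour_counts, width=40):
    """Return 24 text lines, one per hour, with bars scaled so the busiest hour gets `width` marks."""
    peak = max(hour_counts.values(), default=0)
    lines = []
    for hour in range(24):
        count = hour_counts.get(hour, 0)
        # Scale relative to the busiest hour; an empty dict gives empty bars
        bar = "#" * (round(count * width / peak) if peak else 0)
        lines.append(f"{hour:02d}h {bar} {count}")
    return lines


def busiest_hours(hour_counts, n=2):
    """Return the n hours with the most emails, busiest first (ties go to the earlier hour)."""
    ranked = sorted(hour_counts.items(), key=lambda item: (-item[1], item[0]))
    return [hour for hour, _ in ranked[:n]]


# Illustrative counts only, not real data
example = {8: 5, 9: 12, 16: 20, 17: 7}
print("\n".join(text_histogram(example)))
print(busiest_hours(example))  # [16, 9]
```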

And there you go, another time waster. Here are the scripts, in case you want to use them. Just replace the 'username' and 'password' parts in the first one to retrieve the sent email from your Gmail account.
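The scripts are written for Python 2 (note the `print` statement), and the first one downloads every sent message in full just to read one header. Here is a minimal Python 3 sketch of the same step that asks the server only for the Date header, using IMAP's `BODY.PEEK[HEADER.FIELDS (DATE)]` fetch item. It makes two assumptions. First, Gmail nowadays generally expects an app password (with 2-Step Verification enabled) instead of your regular password for IMAP. Second, the `[Gmail]/Sent Mail` folder has a different name on accounts set to another language. `fetch_sent_dates` and `extract_date` are hypothetical names.

```python
import email
import imaplib
from email.utils import parsedate_to_datetime


def extract_date(header_bytes):
    """Parse the Date: header out of a raw header block; return None if absent or malformed."""
    msg = email.message_from_bytes(header_bytes)
    value = msg.get("Date")
    if value is None:
        return None
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError, newer ValueError
        return None


def fetch_sent_dates(user, app_password, mailbox='"[Gmail]/Sent Mail"'):
    """Yield the Date of every message in `mailbox`, fetching only that header.

    Assumption: `app_password` is a Gmail app password, and the folder name
    matches the account's language. Python 3's imaplib sends the mailbox name
    as-is, so the quotes around a name with spaces are part of the default.
    """
    conn = imaplib.IMAP4_SSL("imap.gmail.com", 993)
    try:
        conn.login(user, app_password)
        conn.select(mailbox, readonly=True)  # read-only: nothing gets flagged
        _, data = conn.search(None, "ALL")
        for num in data[0].split():
            _, parts = conn.fetch(num, "(BODY.PEEK[HEADER.FIELDS (DATE)])")
            # parts[0] is a (response line, header bytes) pair
            dt = extract_date(parts[0][1])
            if dt is not None:
                yield dt
    finally:
        conn.logout()
```

Usage would look like `dates = list(fetch_sent_dates("username@gmail.com", "your-app-password"))`. Each datetime keeps the UTC offset from its header, so `.hour` is the hour on the sender's own clock, just like in the original script. Fetching one message per round trip keeps the sketch simple but is slow on large folders.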

Cheers.

get_email.py (Python 2)

```python
import imaplib

# Connect to Gmail over IMAP/SSL and open the "Sent Mail" folder
M = imaplib.IMAP4_SSL('imap.gmail.com', 993)
M.login('username@gmail.com', 'password')
M.select('[Gmail]/Sent Mail')

# Ask for the ids of every message in the folder
typ, data = M.search(None, 'ALL')
fout = open("dates.txt", "w")

for num in data[0].split():
    # Download the full message; msg_data[0] is a (response line, raw message) pair
    typ, msg_data = M.fetch(num, '(RFC822)')
    # Splitting on '\r' leaves every line after the first starting with '\n'
    split_data = msg_data[0][1].split('\r')
    for line in split_data:
        if line.startswith('\nDate:'):
            print msg_data[0][0]  # progress output: the server's response line
            # Drop the leading '\nDate: ' (7 characters) and save the date
            fout.write('%s\r\n' % line[7:])
            # Stop at the first match so forwarded or quoted messages
            # in the body don't add extra dates
            break

M.close()
M.logout()

fout.close()
```

generate_chart.py (Python 2)

```python
from datetime import datetime

import cairo
import pycha.bar


def parse(data, format):
    """Return a datetime, or None if `data` doesn't match `format`."""
    try:
        return datetime.strptime(data, format)
    except ValueError:
        return None


months = {}
weekdays = {}
hours = {}


def increment_item(counts, item):
    counts[item] = counts.get(item, 0) + 1


def store(dt):
    increment_item(weekdays, dt.weekday())  # 0 = Monday ... 6 = Sunday
    increment_item(hours, dt.hour)
    increment_item(months, dt.month)


fin = open('dates.txt', 'r')
for line in fin:
    # The first 24 characters hold the weekday, date and time (the seconds may
    # be cut to one digit); the time zone is ignored
    parsed = parse(line[:24], '%a, %d %b %Y %H:%M:%S')
    if parsed:
        store(parsed)
fin.close()


def create_data(data):
    # pycha expects a sequence of (x, y) points
    return [(item, data[item]) for item in sorted(data)]


# Uncomment the dataset you want to plot (and change the PNG name below)
datasets = (
    #('Months', create_data(months)),
    ('Weekdays', create_data(weekdays)),
    #('Hours', create_data(hours)),
)

width, height = (640, 480)
surface = cairo.ImageSurface(cairo.FORMAT_ARGB32, width, height)
options = {
    'legend': {'hide': False},
    'background': {'color': '#f0f0f0'},
}

chart = pycha.bar.VerticalBarChart(surface, options)
chart.addDataset(datasets)
chart.render()
surface.write_to_png('weekdays.png')
```
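generate_chart.py parses only the first 24 characters of each line with a fixed `strptime` format. Any date that doesn't start with a weekday name (RFC 2822 makes the weekday optional) is silently skipped. Below is a Python 3 sketch of the parsing and counting step. It assumes dates.txt holds one RFC 2822 date per line, as written by get_email.py. The sketch swaps in the standard library's `email.utils.parsedate_to_datetime` for the `strptime` call and `collections.Counter` for `increment_item`, and it also reports how many lines it could not parse. The function names are hypothetical.

```python
from collections import Counter
from email.utils import parsedate_to_datetime


def parse_date_line(line):
    """Parse one line of dates.txt (an RFC 2822 date); return None if it can't be parsed."""
    try:
        return parsedate_to_datetime(line.strip())
    except (TypeError, ValueError):  # older Pythons raise TypeError, newer ValueError
        return None


def count_from_lines(lines):
    """Return per-weekday, per-month and per-hour Counters plus the number of skipped lines."""
    per_weekday, per_month, per_hour = Counter(), Counter(), Counter()
    skipped = 0
    for line in lines:
        dt = parse_date_line(line)
        if dt is None:
            skipped += 1
            continue
        per_weekday[dt.weekday()] += 1  # 0 = Monday, as in the original script
        per_month[dt.month] += 1
        per_hour[dt.hour] += 1          # hour on the sender's clock, as written in the header
    return per_weekday, per_month, per_hour, skipped
```

Usage would be `with open("dates.txt") as fin: weekdays, months, hours, skipped = count_from_lines(fin)`. A non-zero `skipped` tells you how many dates were lost, which the original script hides.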