


This is a subsample of the email data set.


  • spam - Indicator for whether the email was spam.
  • to_multiple - Indicator for whether the email was addressed to more than one recipient.
  • from - Whether the message was listed as from anyone (this is usually set by default for regular outgoing email).
  • cc - Indicator for whether anyone was CCed.
  • sent_email - Indicator for whether the sender had been sent an email in the last 30 days.
  • time - Time at which email was sent.
  • image - The number of images attached.
  • attach - The number of attached files.
  • dollar - The number of times a dollar sign or the word "dollar" appeared in the email.
  • winner - Indicates whether "winner" appeared in the email.
  • inherit - The number of times "inherit" (or an extension, such as "inheritance") appeared in the email.
  • viagra - The number of times "viagra" appeared in the email.
  • password - The number of times "password" appeared in the email.
  • num_char - The number of characters in the email, in thousands.
  • line_breaks - The number of line breaks in the email (does not count text wrapping).
  • format - Indicates whether the email was written using HTML (e.g. may have included bolding or active links).
  • re_subj - Whether the subject started with "Re:", "RE:", "re:", or "rE:".
  • exclaim_subj - Whether there was an exclamation point in the subject.
  • urgent_subj - Whether the word "urgent" was in the email subject.
  • exclaim_mess - The number of exclamation points in the email message.
  • period_mess - The number of periods in the message.
  • signoff - Whether a sign-off of "Cheers", "Regards", or "Best" (also, "Best Regards") was used.
  • number - Factor variable indicating whether the email contained no number, a small number (under 1 million), or a big number.
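
As a rough sketch of how these variables might be used once the data are loaded, the helper below computes the share of spam within each level of an indicator such as to_multiple. The name spam_rate_by is hypothetical, the 0/1 coding of spam is an assumption that the list above implies but does not state, and the toy rows are made up for illustration rather than taken from the data set.

```r
# Hypothetical helper: proportion of spam within each level of a grouping variable.
# Assumes `spam` is coded 0/1 (an assumption, not stated in the variable list).
spam_rate_by <- function(df, group) {
  tapply(df$spam, df[[group]], mean)
}

# Toy rows for illustration only; these are NOT taken from the real data set.
toy <- data.frame(
  spam        = c(0, 0, 1, 0, 1, 0),
  to_multiple = c(0, 1, 0, 0, 0, 1)
)

# Named vector with one spam proportion per level of to_multiple.
rates <- spam_rate_by(toy, "to_multiple")
print(rates)
```

With the real data, the same call would take the data frame read from the CSV in place of toy, and any other indicator (cc, winner, re_subj, and so on) in place of "to_multiple".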

Source: David Diez's Gmail account, early months of 2012. All personally identifiable information has been removed.

R Dataset Upload:

Use the following R code to directly access this dataset in R.

d <- read.csv("https://www.key2stats.com/Sample_of_50_emails_1400_85.csv")
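
After loading, it may be worth checking that the columns match the variable list above. The following is a minimal sketch, not part of the original page: documented_vars, missing_vars, and load_emails are hypothetical names, and the exact layout of the CSV (for example, whether it carries an extra row-index column) is not confirmed here.

```r
# Variable names as documented in the list above. The CSV may also carry extra
# columns (for example a row-index column), which this check ignores.
documented_vars <- c(
  "spam", "to_multiple", "from", "cc", "sent_email", "time", "image",
  "attach", "dollar", "winner", "inherit", "viagra", "password",
  "num_char", "line_breaks", "format", "re_subj", "exclaim_subj",
  "urgent_subj", "exclaim_mess", "period_mess", "signoff", "number"
)

# Hypothetical helper: documented variables that are absent from a data frame.
missing_vars <- function(df, expected = documented_vars) {
  setdiff(expected, names(df))
}

# Hypothetical wrapper around the read.csv() call above. tryCatch() returns
# NULL instead of stopping if the download fails.
load_emails <- function(url = "https://www.key2stats.com/Sample_of_50_emails_1400_85.csv") {
  tryCatch(read.csv(url), error = function(e) NULL)
}

# Only attempt the download in an interactive session.
if (interactive()) {
  d <- load_emails()
  if (!is.null(d)) {
    str(d)                   # column types and first few values
    print(missing_vars(d))   # character(0) if every documented variable is present
    print(table(d$spam))     # counts of each spam value
  }
}
```

Returning NULL on failure is a design choice for this sketch; a script that must not continue without the data could let read.csv() raise its error instead.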
