This page documents the spam7 data set (Spam E-mail Data), which is part of the DAAG R package. To load it, run library(DAAG) followed by data("spam7") at the R console; this creates a data frame called spam7. If R reports that the package or the data set cannot be found, install the package with install.packages("DAAG") and then load the data again. If you need R itself, you can download it from the R project website. A CSV (comma-separated values) version of the spam7 data set is also available for download; the file is about 101,669 bytes.
Spam E-mail Data
The data consist of 4601 email items, of which 1813 items were identified as spam.
This data frame contains the following columns:
crl.tot: total length of words in capitals
dollar: number of occurrences of the $ symbol
bang: number of occurrences of the ! symbol
money: number of occurrences of the word 'money'
n000: number of occurrences of the string '000'
make: number of occurrences of the word 'make'
yesno: outcome variable, a factor with levels n (not spam) and y (spam)
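As a quick check of the description above, the following sketch loads the data and compares its dimensions and class counts with the figures quoted on this page. It assumes the DAAG package is already installed; the name n_spam is introduced here for illustration only.

```r
# Load spam7 from the DAAG package (assumes DAAG is installed)
data("spam7", package = "DAAG")

# Show the columns: six predictors plus the outcome factor yesno
str(spam7)

# Total number of email items (the description above says 4601)
print(nrow(spam7))

# Number of items labelled as spam (the description above says 1813)
n_spam <- sum(spam7$yesno == "y")
print(n_spam)

# Proportion of spam, roughly 0.39 if the counts match the description
print(n_spam / nrow(spam7))
```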
Source: George Forman, Hewlett-Packard Laboratories
These data are available from the University of California at Irvine Repository of Machine Learning Databases and Domain Theories. The address is: http://www.ics.uci.edu/~Here
library(rpart)                   # provides rpart()
data("spam7", package = "DAAG")  # loads the spam7 data frame
spam.rpart <- rpart(formula = yesno ~ crl.tot + dollar + bang +
                    money + n000 + make, data = spam7)
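One way to inspect a tree like the one above is to look at its complexity table and a confusion matrix. The sketch below refits the same model so that it runs on its own; it assumes the rpart and DAAG packages are installed, and the names pred and conf_mat are illustrative rather than part of the original example.

```r
library(rpart)
data("spam7", package = "DAAG")

# Refit the tree from the example; yesno is a factor, so this is a
# classification tree (method = "class" makes that explicit)
spam.rpart <- rpart(yesno ~ crl.tot + dollar + bang + money + n000 + make,
                    data = spam7, method = "class")

# Complexity table with cross-validated error for each candidate tree size
printcp(spam.rpart)

# Predicted class for every email in the training data
pred <- predict(spam.rpart, type = "class")

# Confusion matrix: rows are observed labels, columns are predicted labels
conf_mat <- table(observed = spam7$yesno, predicted = pred)
print(conf_mat)

# Proportion of training emails classified correctly
print(sum(diag(conf_mat)) / sum(conf_mat))
```

Because the predictions are made on the same data used to fit the tree, the accuracy printed at the end is optimistic; the cross-validated error reported by printcp() is usually a better guide to performance on new email.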
Dataset imported from https://www.r-project.org.