We will explore the diamonds dataset and its history, and use EDA to build a quantitative analysis.
Welcome
- Salomon, a data analyst at Facebook, will use EDA to explore the diamonds data.
- By the end, we will be able to tell whether a given diamond is a good deal or not.
- We will also be able to predict the price of a given diamond.
Scatterplot Review
library(ggplot2)
data(diamonds)
names(diamonds)
## [1] "carat" "cut" "color" "clarity" "depth" "table" "price"
## [8] "x" "y" "z"
ggplot(aes(x = carat, y = price), data = diamonds) +
  geom_point() +
  # zoom in on the bottom 99% of carat and price (no rows are dropped)
  coord_cartesian(xlim = c(0, quantile(diamonds$carat, 0.99)),
                  ylim = c(0, quantile(diamonds$price, 0.99))) +
  stat_smooth(method = "lm")

Price and Carat Relationship
- There is a clear relationship between carat and price.
- Diamonds of the same carat can differ in price, depending on the other variables.
- The heavier the diamond (more carats), the higher the price; price does not drop as carat grows.
- The price appears to increase roughly exponentially with carat.
- The dispersion (variance) of price grows as carat and price increase.
A linear model would therefore predict price poorly (too much bias).
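The bias and the growing dispersion noted above can be checked numerically. The following is a minimal sketch, not part of the original lesson: the carat bin edges and the names `fit`, `carat_bin`, `bias_by_bin`, and `spread_by_bin` are illustrative assumptions.

```r
library(ggplot2)  # provides the diamonds data set
data(diamonds)

# Straight-line fit of price on carat, the same kind of line that
# stat_smooth(method = "lm") draws
fit <- lm(price ~ carat, data = diamonds)

# Group diamonds into carat bins (the bin edges are an arbitrary choice);
# base::cut is spelled out because diamonds also has a column named "cut"
carat_bin <- base::cut(diamonds$carat, breaks = c(0, 0.5, 1, 1.5, 2, Inf))

# Mean residual per bin: values far from zero suggest systematic bias there
bias_by_bin <- tapply(residuals(fit), carat_bin, mean)

# Standard deviation of price per bin: shows how the spread changes with carat
spread_by_bin <- tapply(diamonds$price, carat_bin, sd)

print(round(bias_by_bin))
print(round(spread_by_bin))
```

If the straight line were adequate, the mean residual would sit near zero in every bin; bins where it does not are where the line misses the center of the data.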
Frances Gerety
- We can't just feed in the diamond data and have a price pop out.
- Diamond prices have a background story behind them.
- Major diamond deposits were discovered in South Africa.
- Before that, diamonds were found only in India and Brazil, and their price was set purely by supply.
- De Beers then formed the biggest diamond cartel, took control of the diamond market, and advertised diamonds in many different ways.
A Diamond Is Forever
- Diamonds used to be only for the rich, but the slogan "A diamond is forever," written by Frances Gerety, tied diamonds to engagement: an engagement ring should be a diamond ring.
The Rise of Diamonds
- The slogan itself is powerful. It created an intense desire for diamonds.
- The company could do this because, as noted earlier, it had formed a cartel and monopolized the diamonds in South Africa.
- It then gave diamonds to movie stars and publicized the prices of the diamonds that celebrities gave each other.
- It even got the British royal family to use diamonds in their crown over other gems.
- It established the idea that an engagement ring should hold a diamond, and advertised the price of a diamond as a measure of what a man has achieved in life.
- The engagement symbol on Facebook features a diamond.
- Most movie engagements feature a diamond.
ggpairs Function
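- ggpairs plots each variable against every other variable.
- Lower triangle: grouped histograms for qualitative-qualitative pairs, scatterplots for quantitative-quantitative pairs.
- Upper triangle: grouped histograms for qualitative-qualitative pairs, this time grouped by x.
- Boxplots for qualitative-quantitative pairs.
- Correlations for quantitative-quantitative pairs.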
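Since the panel type depends on whether each variable in a pair is qualitative or quantitative, it can help to classify the columns first. This is a small sketch, assuming that "quantitative" means a numeric column and "qualitative" means anything else (the factors); the names `is_quant`, `quant_vars`, and `qual_vars` are illustrative.

```r
library(ggplot2)  # provides the diamonds data set
data(diamonds)

# TRUE for numeric (quantitative) columns, FALSE for factors (qualitative)
is_quant <- sapply(diamonds, is.numeric)

quant_vars <- names(diamonds)[is_quant]
qual_vars  <- names(diamonds)[!is_quant]

# Number of variable pairs of each kind in the plot matrix
n_quan_quan <- choose(length(quant_vars), 2)
n_qual_qual <- choose(length(qual_vars), 2)
n_qual_quan <- length(quant_vars) * length(qual_vars)

print(quant_vars)
print(qual_vars)
print(c(quan_quan = n_quan_quan, qual_qual = n_qual_qual, qual_quan = n_qual_quan))
```

The three counts add up to the total number of distinct variable pairs, so every off-diagonal panel falls into exactly one of the three cases in the list above.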
# install these if necessary
# install.packages('GGally')
# install.packages('scales')
# install.packages('memisc')
# install.packages('lattice')
# install.packages('MASS')
# install.packages('car')
# install.packages('reshape')
# install.packages('plyr')
# load the ggplot graphics package and the others
library(ggplot2)
library(GGally)
library(scales)
library(memisc)
## Loading required package: lattice
## Loading required package: MASS
##
## Attaching package: 'memisc'
##
## The following object is masked from 'package:scales':
##
## percent
##
## The following objects are masked from 'package:stats':
##
## contr.sum, contr.treatment, contrasts
##
## The following object is masked from 'package:base':
##
## as.array
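# sample 10,000 diamonds from the data set
set.seed(20022012)
diamond_samp <- diamonds[sample(nrow(diamonds), 10000), ]
# Recent GGally releases deprecate the `params` argument; wrap() passes the
# same point and outlier shapes to the individual panels instead.
ggpairs(diamond_samp,
        lower = list(continuous = wrap("points", shape = I('.'))),
        upper = list(combo = wrap("box", outlier.shape = I('.'))))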
