
MauroCerbai

Software Engineer


I just received this email.
Congratulations!
Dear Mauro,
We are excited to offer you a Google Developer Challenge Scholarship to the Android Developer track. We received applications from many talented and motivated candidates, and yours truly stood out.
I'm very happy to announce that I've been selected for this scholarship, which involves the famous Google product and the amazing learning platform Udacity. Thank you!
CLUSTERING
Clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).


It is unsupervised learning because the answer is not provided: there is no label associated with the data indicating the correct answer.



K-Means algorithm:
Clustering can be achieved by various algorithms. The most basic and most used is k-means clustering.
Given an initial set of k cluster centers chosen at random, the algorithm proceeds by alternating between two steps (sketched in code after the list):
  • Assignment step: assign each data point to the "nearest" cluster center (the one with the least squared Euclidean distance).
  • Update step: for each cluster center, calculate the new position that minimizes the total distance between itself and its assigned data points (minimize the total squared distances).
The algorithm has converged when the assignments no longer change. There is no guarantee that the global optimum is found; it does, however, find a local optimum, and it is commonly run multiple times with different random initializations, choosing the best of the runs.
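A minimal sketch of the two alternating steps in NumPy (my own code, not from the post; it assumes X is an (n, d) array and that no cluster ever becomes empty):

import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # initial set of k cluster centers chosen at random among the points
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assignment step: nearest center by squared Euclidean distance
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # update step: move each center to the mean of its assigned points
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break  # converged: centers (and hence assignments) no longer change
        centers = new_centers
    return centers, labels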


Sklearn:
from sklearn.cluster import KMeans
clf = KMeans(n_cluster=8, [ number of cluster]
n_init=10, [repeat the algorithm with different cluster center on initial step]
max_iter=300) [maximum number of iteration]
clf.fit(data)
kmeans.predict(test)

kmeans.cluster_centers_
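To run the snippet end to end, data and test could be filled with hypothetical values, e.g. two blobs in the plane (with n_clusters=2 the two groups would be recovered as the clusters):

import numpy as np

# hypothetical data: two groups of points around (0, 0) and (5, 5)
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
test = [[0.2, -0.1], [4.9, 5.3]]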

OUTLIERS
An outlier is an observation point that is distant from other observations.


Detection:
You simply follow this flow (sketched in code below):
  • Train the algorithm.
  • Remove the ~10% of data points with the largest residual error.
  • Train again and evaluate the accuracy on the test set.
  • Repeat.
The residual of an observed value is the difference between the observed value and the estimated value of the quantity of interest.
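A minimal sketch of one pass of this flow, assuming a linear regression model and hypothetical names of my own (remove_outliers, keep):

import numpy as np
from sklearn.linear_model import LinearRegression

def remove_outliers(X, y, keep=0.9):
    # train the algorithm on all points
    clf = LinearRegression().fit(X, y)
    # residual error = |actual value - predicted value|
    residuals = np.abs(y - clf.predict(X))
    # keep the ~90% of points with the smallest residuals
    mask = residuals <= np.quantile(residuals, keep)
    return X[mask], y[mask]

Each pass drops the worst ~10%; retrain on the cleaned data, check the test accuracy, and repeat while it keeps improving.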
LINEAR REGRESSION
In linear regression, the relationships are modeled using linear predictor functions whose unknown model parameters are estimated from the data. Basically, a regression output is not discrete but a continuous function like:


y = ax + b

where y is the target, a is the slope, and b is the intercept.


Error metrics:
The error is calculated as
error = actual data − predicted data
The best linear regression is the one that minimizes the sum of the squared errors over all points: Σ(actual − predicted)².
Algorithms: ordinary least squares (OLS) and gradient descent.
But this sum is not a perfect metric: it is high when you have many data points and lower with fewer, so it does not compare well across datasets of different sizes.


Instead, R²:

R squared measures the fraction of the variance of the dependent variable explained by the regression. In simple linear regression it is simply the square of the correlation coefficient. It is independent of the number of data points.

(not good) 0 ≤ R² ≤ 1 (good)
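For reference, the standard definition behind this (not spelled out in the post) is:

R² = 1 − Σ(actual − predicted)² / Σ(actual − mean of actual)²

The numerator is the error the regression leaves and the denominator is the total variance of the target, so the ratio, unlike the raw sum of squared errors, does not grow with the number of points.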

Sklearn:
from sklearn.linear_model import LinearRegression

clf = LinearRegression()
clf.fit(feature_training, label_training)
prediction = clf.predict(feature_test)
accuracy = clf.score(feature_test, label_test)  # returns the R² error metric
slope = clf.coef_
intercept = clf.intercept_
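To see it run end to end, here is a self-contained sketch with hypothetical data scattered around the line y = 2x + 1:

import numpy as np
from sklearn.linear_model import LinearRegression

# hypothetical data scattered around y = 2x + 1
rng = np.random.default_rng(0)
feature = rng.uniform(0, 10, (100, 1))
label = 2 * feature.ravel() + 1 + rng.normal(0, 0.5, 100)

clf = LinearRegression()
clf.fit(feature[:80], label[:80])           # train on the first 80 points
print(clf.score(feature[80:], label[80:]))  # R² close to 1 on nearly linear data
print(clf.coef_, clf.intercept_)            # close to slope 2 and intercept 1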

