All technology notes.
Deep learning
Neural Networks
Deep Neural Networks
Deep Belief Networks
With the Contrastive Divergence algorithm, a layer of features is learned from the visible units.
Recurrent Neural Networks
Feed Forward Neural Network
Perceptrons are organized into layers.
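The layered organization can be sketched as a tiny feed-forward pass. This is an illustrative sketch only: the network shape, the hand-picked weights, and the sigmoid activation are assumptions, not anything fixed by the notes.

```python
import math

# A tiny feed-forward pass: each layer is a list of weight rows plus a
# bias vector; the signal flows forward through the layers only.
# Weights here are arbitrary fixed values for illustration.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    # One layer of perceptrons: weighted sum of the inputs, then activation.
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def forward(x, layers):
    for weights, biases in layers:
        x = layer(x, weights, biases)
    return x

# 2 inputs -> hidden layer of 2 perceptrons -> 1 output perceptron
net = [
    ([[0.5, -0.4], [0.3, 0.8]], [0.0, -0.1]),  # hidden layer
    ([[1.2, -0.7]],             [0.2]),        # output layer
]
print(forward([1.0, 0.0], net))  # one value in (0, 1)
```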
There are no back-loops; the backpropagation algorithm can be used to update the weight values to minimize the prediction error.
Recurrent Neural Network
Neurons in the hidden layers receive an input with a specific delay in time.
Convolutional Neural Network
Data that we can learn from.
A model of how to transform the data.
An objective function that quantifies how well (or badly) the model is doing.
An algorithm to adjust the model's parameters to optimize the objective function.
features / covariates / inputs
label / target
Squared error (a common loss function for regression).
Minimize error rate (a common objective for classification).
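The two objectives above (squared error and error rate) can be computed as, for example (a minimal sketch with made-up predictions and labels):

```python
# Squared error: mean of squared differences between predictions and
# targets. Error rate: fraction of predictions that miss the label.

def squared_error(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

def error_rate(preds, targets):
    return sum(p != t for p, t in zip(preds, targets)) / len(preds)

print(squared_error([2.5, 0.0], [3.0, -0.5]))            # 0.25
print(error_rate(["cat", "dog", "dog"],
                 ["cat", "cat", "dog"]))                 # 1 miss out of 3
```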
Gradient Descent
Gradient descent (GD) is an iterative first-order optimization algorithm used to find a local minimum/maximum of a given function.
Gradient
Gradient Descent Algorithm

p_{n+1} = p_n - lr * f'(p_n)
Parameter η: the learning rate (lr).
The smaller the learning rate, the longer GD takes to converge, and it may reach the maximum number of iterations before reaching the optimum point.
If the learning rate is too big, the algorithm may not converge to the optimal point (it jumps around) or may even diverge completely.
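The update rule and both learning-rate failure modes above can be demonstrated on the toy function f(p) = p^2, whose derivative is 2p (a minimal sketch; the starting point and step budget are arbitrary choices):

```python
# Gradient descent on f(p) = p**2, minimum at p = 0.
# Update rule: p_{n+1} = p_n - lr * f'(p_n), with f'(p) = 2*p.

def gd(lr, p=5.0, steps=50):
    for _ in range(steps):
        p = p - lr * 2 * p  # derivative of p**2 is 2*p
    return p

print(gd(lr=0.1))    # converges close to the minimum at 0
print(gd(lr=0.001))  # same step budget, still far from 0: too slow
print(gd(lr=1.1))    # overshoots more each step: diverges completely
```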
TensorFlow (TF)
Keras
PyTorch
It is optimized for GPUs (actually, so is TensorFlow).
Scikit-learn