Machine Learning Notes (Washington University) - Regression Specialization - Week Five
Published: 2019-06-22


1. Feature selection

Sometimes we need to decrease the number of features.

Efficiency:

With fewer features, predictions can be computed quickly.

Interpretability:

We can see which features are actually relevant for the prediction task.

 

2.  All subsets algorithm

We consider every possible combination of features we might include in the model,

and we evaluate each candidate subset using a validation set or cross validation. The drawback is complexity:

if the number of features D is large, the number of subsets to check grows as O(2^D). A sketch follows.
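Below is a minimal Python sketch (not from the course) of all-subsets selection, assuming hypothetical arrays X_train, y_train, X_valid, y_valid and a plain least-squares fit; each subset is scored by its validation RSS.

from itertools import combinations
import numpy as np

def validation_rss(features, X_train, y_train, X_valid, y_valid):
    # fit least squares on the chosen columns, score on the validation set
    w, *_ = np.linalg.lstsq(X_train[:, features], y_train, rcond=None)
    resid = y_valid - X_valid[:, features] @ w
    return resid @ resid

def all_subsets(X_train, y_train, X_valid, y_valid):
    D = X_train.shape[1]
    best_subset, best_rss = None, np.inf
    for k in range(1, D + 1):
        for subset in map(list, combinations(range(D), k)):  # every size-k subset
            rss = validation_rss(subset, X_train, y_train, X_valid, y_valid)
            if rss < best_rss:
                best_subset, best_rss = subset, rss
    return best_subset, best_rss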

 

3. Greedy algorithm

We fit a model using the current feature set (initially empty), then repeatedly add the next best feature,

the one that gives the lowest training error. This reduces the complexity to O(D^2) fits. A sketch follows.
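A minimal Python sketch (not from the course) of greedy forward selection, reusing the hypothetical X_train, y_train from above; at each step it adds the single feature whose inclusion gives the lowest training RSS.

import numpy as np

def training_rss(features, X, y):
    # least-squares fit on the chosen columns, scored on the training data itself
    A = X[:, features]
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ w
    return resid @ resid

def forward_selection(X, y, max_features):
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        # the candidate whose addition yields the lowest training error
        best_j = min(remaining, key=lambda j: training_rss(selected + [j], X, y))
        selected.append(best_j)
        remaining.remove(best_j)
    return selected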

 

4. Lasso

For ridge, the weights are shrunk, but they are not reduced exactly to zero. Now we want to drive some coefficients

exactly to zero. We cannot simply set the small ridge coefficients to zero, because correlated features

all get small weights even though they may be relevant to our predictions. Lasso adds bias to reduce variance, and within a

group of highly correlated features, lasso tends to select one of them arbitrarily.

 

Lasso regression (L1 regularized regression)

Lambda is used to balance the fit of the model against sparsity.

As lambda goes from 0 to infinity, the lasso solution W(lasso) moves from W(least squares) down to 0.

The gradient of |w| does not exist at Wj = 0, and there is no closed-form solution for lasso,

so we use subgradients instead of gradients.
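A small illustration (not from the course) of the sparsity effect, using scikit-learn's Lasso and Ridge on synthetic data: L1 regularization sets most of the irrelevant coefficients exactly to zero, while L2 only shrinks them.

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_w = np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0, 0, 0])  # only two relevant features
y = X @ true_w + rng.normal(scale=0.5, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

print("ridge nonzero coefficients:", np.sum(ridge.coef_ != 0))  # typically all 10
print("lasso nonzero coefficients:", np.sum(lasso.coef_ != 0))  # typically just a few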

 

5. Coordinate descent

At each iteration, we only update only one coordinate instead of all coordinates, so we get this axis-aligned moves.

And we do not need to choose stepsize.

 

6. Normalize Features

We need to normalize both the training set and the test set, and the test set must be scaled with the same column norms computed from the training set (see the sketch below).
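A minimal sketch of this step, assuming the course's convention of dividing each feature column by its 2-norm; the norms come from the training data only and are reused for the test data.

import numpy as np

def normalize_features(X_train, X_test):
    norms = np.linalg.norm(X_train, axis=0)  # per-column 2-norms from the training set
    return X_train / norms, X_test / norms, norms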

 

7. Coordinate descent for least squares regression

Suppose the features are normalized. Taking the partial derivative with respect to wj gives the coordinate update for least squares:

while not converged:
    for j in [0, 1, 2, ..., D]:
        compute ρj
        set wj = ρj

In the case of lasso, lambda is our tuning parameter for the model, and the update becomes a soft-thresholding step:

while not converged:
    for j in [0, 1, 2, ..., D]:
        compute ρj
        set wj = ρj + λ/2   if ρj < -λ/2
                 0          if -λ/2 ≤ ρj ≤ λ/2
                 ρj - λ/2   if ρj > λ/2

In coordinate descent, convergence is detected when, over an entire sweep of all the coordinates, the

maximum step taken is smaller than the tolerance.
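A minimal Python sketch (not from the course materials) of lasso coordinate descent implementing the update above. Here ρj is computed as the correlation of feature j with the residual that excludes feature j's own contribution; the intercept is not treated separately, and the feature matrix is assumed to be normalized as in section 6.

import numpy as np

def lasso_coordinate_descent(X, y, lam, tol=1e-6, max_sweeps=1000):
    # X is assumed normalized so that each column has unit 2-norm
    n, D = X.shape
    w = np.zeros(D)
    for _ in range(max_sweeps):
        max_step = 0.0
        for j in range(D):
            # residual ignoring feature j's current contribution
            resid_without_j = y - X @ w + X[:, j] * w[j]
            rho_j = X[:, j] @ resid_without_j
            # soft thresholding
            if rho_j < -lam / 2:
                new_wj = rho_j + lam / 2
            elif rho_j > lam / 2:
                new_wj = rho_j - lam / 2
            else:
                new_wj = 0.0
            max_step = max(max_step, abs(new_wj - w[j]))
            w[j] = new_wj
        if max_step < tol:  # converged over an entire sweep
            break
    return w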

 

Reposted from: https://www.cnblogs.com/climberclimb/p/6819709.html
