Online machine learning

From Wikipedia, the free encyclopedia

In computer science, online machine learning is a method of machine learning in which data becomes available in a sequential order and is used to update the best predictor for future data at each step, as opposed to batch learning techniques which generate the best predictor by learning on the entire training data set at once. Online learning is a common technique used in areas of machine learning where it is computationally infeasible to train over the entire dataset, requiring the need of out-of-core algorithms. It is also used in situations where it is necessary for the algorithm to dynamically adapt to new patterns in the data, or when the data itself is generated as a function of time, e.g., prediction of prices in the financial international markets. Online learning algorithms may be prone to catastrophic interference, a problem that can be addressed by incremental learning approaches.

Introduction

In the setting of supervised learning, a function $f : X \to Y$ is to be learned, where $X$ is thought of as a space of inputs and $Y$ as a space of outputs, that predicts well on instances that are drawn from a joint probability distribution $p(x, y)$ on $X \times Y$. In reality, the learner never knows the true distribution $p(x, y)$ over instances. Instead, the learner usually has access to a training set of examples $(x_1, y_1), \ldots, (x_n, y_n)$. In this setting, the loss function is given as $V : Y \times Y \to \mathbb{R}$, such that $V(f(x), y)$ measures the difference between the predicted value $f(x)$ and the true value $y$. The ideal goal is to select a function $f \in \mathcal{H}$, where $\mathcal{H}$ is a space of functions called a hypothesis space, so that some notion of total loss is minimized. Depending on the type of model (statistical or adversarial), one can devise different notions of loss, which lead to different learning algorithms.

Statistical view of online learning

In statistical learning models, the training samples $(x_i, y_i)$ are assumed to have been drawn from the true distribution $p(x, y)$ and the objective is to minimize the expected "risk" $I[f] = \mathbb{E}[V(f(x), y)] = \int V(f(x), y)\,dp(x, y)$. A common paradigm in this situation is to estimate a function $\hat{f}$ through empirical risk minimization or regularized empirical risk minimization (usually Tikhonov regularization). The choice of loss function here gives rise to several well-known learning algorithms such as regularized least squares and support vector machines. A purely online model in this category would learn based on just the new input $(x_{t+1}, y_{t+1})$, the current best predictor $f_t$ and some extra stored information (which is usually expected to have storage requirements independent of training data size). For many formulations, for example nonlinear kernel methods, true online learning is not possible, though a form of hybrid online learning with recursive algorithms can be used, where $f_{t+1}$ is permitted to depend on $f_t$ and all previous data points $x_1, \ldots, x_t$. In this case, the space requirements are no longer guaranteed to be constant since they require storing all previous data points, but the solution may take less time to compute with the addition of a new data point, as compared to batch learning techniques.

A common strategy to overcome the above issues is to learn using mini-batches, which process a small batch of $b \ge 1$ data points at a time. This can be considered as pseudo-online learning for $b$ much smaller than the total number of training points. Mini-batch techniques are used with repeated passes over the training data to obtain optimized out-of-core versions of machine learning algorithms, for example, stochastic gradient descent. When combined with backpropagation, this is currently the de facto training method for training artificial neural networks.
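
As a concrete illustration, the following is a minimal NumPy sketch of mini-batch stochastic gradient descent for the squared loss; the batch size, step size, and number of epochs are illustrative choices, not values prescribed by any particular method or library.

```python
import numpy as np

def minibatch_sgd(X, y, b=32, step=0.01, epochs=5):
    """Mini-batch SGD for the squared loss; returns the learned weight vector."""
    n, d = X.shape
    w = np.zeros(d)
    rng = np.random.default_rng(0)
    for _ in range(epochs):                                  # repeated passes over the data
        for idx in np.array_split(rng.permutation(n), max(1, n // b)):
            Xb, yb = X[idx], y[idx]
            grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)       # gradient of the batch loss
            w -= step * grad
    return w
```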

Example: linear least squares

The simple example of linear least squares is used to explain a variety of ideas in online learning. The ideas are general enough to be applied to other settings, for example, with other convex loss functions.

Batch learning

Consider the setting of supervised learning with $f$ being a linear function to be learned: $f(x_j) = \langle w, x_j \rangle = w \cdot x_j$, where $x_j \in \mathbb{R}^d$ is a vector of inputs (data points) and $w \in \mathbb{R}^d$ is a linear filter vector. The goal is to compute the filter vector $w$. To this end, a square loss function $V(f(x_j), y_j) = (f(x_j) - y_j)^2 = (\langle w, x_j \rangle - y_j)^2$ is used to compute the vector $w$ that minimizes the empirical loss $I_n[w] = \sum_{j=1}^{n} V(\langle w, x_j \rangle, y_j) = \sum_{j=1}^{n} (x_j^{\mathsf{T}} w - y_j)^2$, where $y_j \in \mathbb{R}$.

Let $X$ be the $i \times d$ data matrix and $y \in \mathbb{R}^i$ the column vector of target values after the arrival of the first $i$ data points. Assuming that the covariance matrix $\Sigma_i = X^{\mathsf{T}} X$ is invertible (otherwise it is preferable to proceed in a similar fashion with Tikhonov regularization), the best solution $f^*(x) = \langle w^*, x \rangle$ to the linear least squares problem is given by $w^* = (X^{\mathsf{T}} X)^{-1} X^{\mathsf{T}} y = \Sigma_i^{-1} \sum_{j=1}^{i} x_j y_j$.

Now, calculating the covariance matrix $\Sigma_i = \sum_{j=1}^{i} x_j x_j^{\mathsf{T}}$ takes time $O(id^2)$, inverting the $d \times d$ matrix takes time $O(d^3)$, and the rest of the multiplication takes time $O(d^2)$, giving a total time of $O(id^2 + d^3)$. When there are $n$ total points in the dataset and the solution must be recomputed after the arrival of every datapoint $i = 1, \ldots, n$, the naive approach has a total complexity of $O(n^2 d^2 + n d^3)$. Note that when storing the matrix $\Sigma_i$, updating it at each step needs only adding $x_{i+1} x_{i+1}^{\mathsf{T}}$, which takes $O(d^2)$ time, reducing the total time to $O(n d^2 + n d^3) = O(n d^3)$, but with an additional storage space of $O(d^2)$ to store $\Sigma_i$.[1]
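
The complexity argument can be made concrete with a short NumPy sketch that keeps the running covariance matrix and the running vector $\sum_j x_j y_j$, updating both in $O(d^2)$ per point but still paying an $O(d^3)$ solve for every recomputation; the small ridge term added before solving is an illustrative safeguard against a singular covariance.

```python
import numpy as np

def batch_ls_stream(X, y):
    """Recompute the least squares solution after each arriving point,
    reusing the running covariance matrix and the running vector X^T y."""
    n, d = X.shape
    Sigma = np.zeros((d, d))       # running covariance  sum_j x_j x_j^T
    b = np.zeros(d)                # running vector      sum_j x_j y_j
    solutions = []
    for i in range(n):
        Sigma += np.outer(X[i], X[i])                      # O(d^2) rank-one update
        b += X[i] * y[i]
        w = np.linalg.solve(Sigma + 1e-8 * np.eye(d), b)   # O(d^3) solve at every step
        solutions.append(w)
    return solutions
```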

Online learning: recursive least squares

The recursive least squares (RLS) algorithm considers an online approach to the least squares problem. It can be shown that by initialising $w_0 = 0 \in \mathbb{R}^d$ and $\Gamma_0 = I \in \mathbb{R}^{d \times d}$, the solution of the linear least squares problem given in the previous section can be computed by the following iteration: $\Gamma_i = \Gamma_{i-1} - \dfrac{\Gamma_{i-1} x_i x_i^{\mathsf{T}} \Gamma_{i-1}}{1 + x_i^{\mathsf{T}} \Gamma_{i-1} x_i}$, $\quad w_i = w_{i-1} - \Gamma_i x_i \left( x_i^{\mathsf{T}} w_{i-1} - y_i \right)$. The above iteration algorithm can be proved using induction on $i$.[2] The proof also shows that $\Gamma_i = \Sigma_i^{-1}$. One can look at RLS also in the context of adaptive filters (see RLS).

The complexity for $n$ steps of this algorithm is $O(nd^2)$, which is an order of magnitude faster than the corresponding batch learning complexity. The storage requirement at every step $i$ is dominated by the matrix $\Gamma_i$, which is constant at $O(d^2)$. For the case when $\Sigma_i$ is not invertible, consider the regularised version of the problem with loss function $\sum_{j=1}^{n} (x_j^{\mathsf{T}} w - y_j)^2 + \lambda \|w\|_2^2$. Then, it is easy to show that the same algorithm works with $\Gamma_0 = (\lambda I)^{-1}$, and the iterations proceed to give $\Gamma_i = (\Sigma_i + \lambda I)^{-1}$.[1]
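
A minimal NumPy sketch of the recursive least squares iteration above, in its regularised form; the variable names mirror the text ($\Gamma$, $w$) and the regularisation parameter lam is an illustrative choice.

```python
import numpy as np

def recursive_least_squares(X, y, lam=1e-3):
    """RLS: maintain Gamma_i = (Sigma_i + lam*I)^{-1} and the weight vector w_i,
    updating both in O(d^2) per arriving point."""
    n, d = X.shape
    Gamma = np.eye(d) / lam            # Gamma_0 = (lam * I)^{-1}
    w = np.zeros(d)
    for i in range(n):
        x = X[i]
        Gx = Gamma @ x
        Gamma -= np.outer(Gx, Gx) / (1.0 + x @ Gx)   # Sherman-Morrison rank-one update
        w -= Gamma @ x * (x @ w - y[i])              # correct the prediction error on (x_i, y_i)
    return w
```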

Stochastic gradient descent

When the update step $w_i = w_{i-1} - \Gamma_i x_i \left( x_i^{\mathsf{T}} w_{i-1} - y_i \right)$ is replaced by $w_i = w_{i-1} - \gamma_i x_i \left( x_i^{\mathsf{T}} w_{i-1} - y_i \right) = w_{i-1} - \gamma_i \nabla V(\langle w_{i-1}, x_i \rangle, y_i)$, i.e. the matrix $\Gamma_i \in \mathbb{R}^{d \times d}$ is replaced by a scalar step size $\gamma_i \in \mathbb{R}$, this becomes the stochastic gradient descent algorithm. In this case, the complexity for $n$ steps of this algorithm reduces to $O(nd)$. The storage requirements at every step $i$ are constant at $O(d)$.

However, the stepsize $\gamma_i$ needs to be chosen carefully to solve the expected risk minimization problem, as detailed above. By choosing a decaying step size $\gamma_i \approx \frac{1}{\sqrt{i}}$, one can prove the convergence of the average iterate $\overline{w}_n = \frac{1}{n} \sum_{i=1}^{n} w_i$. This setting is a special case of stochastic optimization, a well-known problem in optimization.[1]
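
The following sketch runs one pass of stochastic gradient descent over a stream of $(x, y)$ pairs with the decaying step size $\gamma_i \approx c/\sqrt{i}$ discussed above and returns the averaged iterate; the constant c is illustrative.

```python
import numpy as np

def sgd_least_squares(stream, d, c=0.1):
    """One pass of SGD over a stream of (x, y) pairs with step size c / sqrt(i).
    Returns the running average of the iterates."""
    w = np.zeros(d)
    w_avg = np.zeros(d)
    for i, (x, y) in enumerate(stream, start=1):
        step = c / np.sqrt(i)                      # decaying step size gamma_i
        w -= step * 2 * x * (x @ w - y)            # gradient of the squared loss at (x, y)
        w_avg += (w - w_avg) / i                   # running average of the iterates
    return w_avg
```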

Incremental stochastic gradient descent

In practice, one can perform multiple stochastic gradient passes (also called cycles or epochs) over the data. The algorithm thus obtained is called the incremental gradient method and corresponds to the iteration $w_i = w_{i-1} - \gamma_i \nabla V\left( \langle w_{i-1}, x_{t_i} \rangle, y_{t_i} \right)$. The main difference with the stochastic gradient method is that here a sequence $t_i$ is chosen to decide which training point is visited in the $i$-th step. Such a sequence can be stochastic or deterministic. The number of iterations is thus decoupled from the number of points (each point can be considered more than once). The incremental gradient method can be shown to provide a minimizer to the empirical risk.[3] Incremental techniques can be advantageous when considering objective functions made up of a sum of many terms, e.g. an empirical error corresponding to a very large dataset.[1]
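
A sketch of the incremental variant on a finite training set, where the visiting sequence $t_i$ is either a random shuffle (stochastic) or the natural cyclic order (deterministic); the step size and number of epochs are illustrative.

```python
import numpy as np

def incremental_gradient(X, y, epochs=10, step=0.01, shuffle=True):
    """Multiple passes over a finite dataset; each epoch visits every point once,
    in either a stochastic (shuffled) or deterministic (cyclic) order."""
    n, d = X.shape
    w = np.zeros(d)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        order = rng.permutation(n) if shuffle else np.arange(n)   # the sequence t_i
        for t in order:
            w -= step * 2 * X[t] * (X[t] @ w - y[t])              # squared-loss gradient at point t
    return w
```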

Kernel methods

Kernels can be used to extend the above algorithms to non-parametric models (or models where the parameters form an infinite dimensional space). The corresponding procedure will no longer be truly online and instead involve storing all the data points, but is still faster than the brute force method. This discussion is restricted to the case of the square loss, though it can be extended to any convex loss. It can be shown by an easy induction[1] that if $X_i$ is the data matrix and $w_i$ is the output after $i$ steps of the SGD algorithm, then $w_i = X_i^{\mathsf{T}} c_i$, where $c_i = ((c_i)_1, (c_i)_2, \ldots, (c_i)_i) \in \mathbb{R}^i$ and the sequence $c_i$ satisfies the recursion: $c_0 = 0$, $(c_i)_j = (c_{i-1})_j$ for $j = 1, \ldots, i-1$, and $(c_i)_i = \gamma_i \left( y_i - \sum_{j=1}^{i-1} (c_{i-1})_j \langle x_j, x_i \rangle \right)$. Notice that here $\langle x_j, x_i \rangle$ is just the standard kernel on $\mathbb{R}^d$, and the predictor is of the form $f_i(x) = \langle w_{i-1}, x \rangle = \sum_{j=1}^{i-1} (c_{i-1})_j \langle x_j, x \rangle$.


Now, if a general kernel $K$ is introduced instead and the predictor is taken to be $f_i(x) = \sum_{j=1}^{i-1} (c_{i-1})_j K(x_j, x)$, then the same proof will also show that the predictor minimising the least squares loss is obtained by changing the above recursion to $(c_i)_i = \gamma_i \left( y_i - \sum_{j=1}^{i-1} (c_{i-1})_j K(x_j, x_i) \right)$. The above expression requires storing all the data for updating $c_i$. The total time complexity for the recursion over $n$ data points is $O(n^2 k)$, where $k$ is the cost of evaluating the kernel on a single pair of points.[1] Thus, the use of the kernel has allowed the movement from a finite dimensional parameter space $w \in \mathbb{R}^d$ to a possibly infinite dimensional feature space represented by the kernel $K$, by instead performing the recursion on the space of coefficients $c_i$, whose dimension is the same as the size of the training dataset. In general, this is a consequence of the representer theorem.[1]
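
A sketch of the kernelised recursion under a constant, illustrative step size: only the newest coefficient changes at each step, but all previously seen points must be stored to evaluate the predictor; the kernel function is supplied by the caller.

```python
import numpy as np

def kernel_online_least_squares(X, y, kernel, gamma=0.1):
    """Online kernel least squares via the coefficient recursion described above:
    c keeps one coefficient per seen point, so storage grows with the data."""
    n = X.shape[0]
    c = np.zeros(n)                    # coefficients (c_i)_j, one per data point
    for i in range(n):
        # current prediction f_i(x_i) = sum_{j < i} c_j K(x_j, x_i)
        pred = sum(c[j] * kernel(X[j], X[i]) for j in range(i))
        c[i] = gamma * (y[i] - pred)   # only the newest coefficient changes
    def predictor(x):
        return sum(c[j] * kernel(X[j], x) for j in range(n))
    return predictor

# Example usage with a Gaussian (RBF) kernel:
# rbf = lambda a, b: np.exp(-np.linalg.norm(a - b) ** 2)
# f = kernel_online_least_squares(X_train, y_train, rbf)
```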

Online convex optimization

Online convex optimization (OCO) [4] is a general framework for decision making which leverages convex optimization to allow for efficient algorithms. The framework is that of repeated game playing as follows:

For $t = 1, 2, \ldots, T$:

  • Learner receives input $x_t$
  • Learner outputs $w_t$ from a fixed convex set $S$
  • Nature sends back a convex loss function $v_t : S \to \mathbb{R}$.
  • Learner suffers loss $v_t(w_t)$ and updates its model

The goal is to minimize regret, or the difference between the cumulative loss and the loss of the best fixed point in hindsight. As an example, consider the case of online least squares linear regression. Here, the weight vectors come from the convex set $S = \mathbb{R}^d$, and nature sends back the convex loss function $v_t(w) = (\langle w, x_t \rangle - y_t)^2$. Note here that $y_t$ is implicitly sent with $v_t$.
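
The repeated game above can be written as a generic driver loop. This is only a sketch of the protocol: the learner and environment objects, and their predict/update/reveal methods, are hypothetical interfaces rather than any standard API.

```python
def online_protocol(learner, environment, rounds):
    """Generic OCO loop: the learner commits to w_t, nature reveals a convex loss v_t,
    and the cumulative loss (to be compared against the best fixed decision) is recorded."""
    losses = []
    for t in range(rounds):
        x_t = environment.reveal_input(t)        # hypothetical: learner receives input x_t
        w_t = learner.predict(x_t)               # learner outputs w_t from a convex set S
        v_t = environment.reveal_loss(t)         # nature sends back a convex loss function v_t
        losses.append(v_t(w_t))                  # learner suffers loss v_t(w_t)
        learner.update(x_t, v_t)                 # ... and updates its model
    return sum(losses)
```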

Some online prediction problems, however, cannot fit in the framework of OCO. For example, in online classification, the prediction domain and the loss functions are not convex. In such scenarios, two simple techniques for convexification are used: randomisation and surrogate loss functions.[citation needed]

Some simple online convex optimisation algorithms are:

Follow the leader (FTL)

The simplest learning rule to try is to select (at the current step) the hypothesis that has the least loss over all past rounds. This algorithm is called Follow the Leader, and in round $t$ it is simply given by $w_t = \operatorname{arg\,min}_{w \in S} \sum_{i=1}^{t-1} v_i(w)$. This method can thus be looked at as a greedy algorithm. For the case of online quadratic optimization (where the loss function is $v_t(w) = \|w - x_t\|_2^2$), one can show a regret bound that grows as $\log(T)$. However, similar bounds cannot be obtained for the FTL algorithm for other important families of models like online linear optimization. To do so, one modifies FTL by adding regularisation.
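
For the online quadratic case mentioned above, where $v_t(w) = \|w - x_t\|_2^2$, the leader has a closed form: the minimiser of the cumulative past loss is simply the mean of the points seen so far. A minimal sketch:

```python
import numpy as np

def follow_the_leader(points):
    """FTL for quadratic losses v_t(w) = ||w - x_t||^2: the minimiser of the
    cumulative past loss is the mean of the points seen so far."""
    d = len(points[0])
    cumulative = np.zeros(d)
    decisions = []
    for t, x in enumerate(points, start=1):
        w_t = cumulative / max(t - 1, 1)   # leader over rounds 1..t-1 (with w_1 = 0 by convention)
        decisions.append(w_t)
        cumulative += x
    return decisions
```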

Follow the regularised leader (FTRL)

This is a natural modification of FTL that is used to stabilise the FTL solutions and obtain better regret bounds. A regularisation function $R : S \to \mathbb{R}$ is chosen and learning is performed in round $t$ as follows: $w_t = \operatorname{arg\,min}_{w \in S} \left( \sum_{i=1}^{t-1} v_i(w) + R(w) \right)$. As a special example, consider the case of online linear optimisation, i.e. where nature sends back loss functions of the form $v_t(w) = \langle w, z_t \rangle$. Also, let $S = \mathbb{R}^d$. Suppose the regularisation function $R(w) = \frac{1}{2\eta} \|w\|_2^2$ is chosen for some positive number $\eta$. Then, one can show that the regret-minimising iteration becomes $w_{t+1} = -\eta \sum_{i=1}^{t} z_i = w_t - \eta z_t$. Note that this can be rewritten as $w_{t+1} = w_t - \eta \nabla v_t(w_t)$, which looks exactly like online gradient descent.

If $S$ is instead some convex subspace of $\mathbb{R}^d$, $S$ would need to be projected onto, leading to the modified update rule $w_{t+1} = \Pi_S\left( -\eta \sum_{i=1}^{t} z_i \right) = \Pi_S(\eta \theta_{t+1})$, where $\theta_{t+1} = \theta_t - z_t$. This algorithm is known as lazy projection, as the vector $\theta_{t+1}$ accumulates the gradients. It is also known as Nesterov's dual averaging algorithm. In this scenario of linear loss functions and quadratic regularisation, the regret is bounded by $O(\sqrt{T})$, and thus the average regret goes to 0 as desired.
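
A sketch of quadratically regularised FTRL on linear losses $v_t(w) = \langle w, z_t \rangle$, with an optional projection step implementing lazy projection; the projection function, if any, is supplied by the caller and is an assumption of this sketch.

```python
import numpy as np

def ftrl_quadratic(gradients, eta, project=None):
    """FTRL with R(w) = ||w||^2 / (2*eta) on linear losses: w_{t+1} = -eta * (sum of past
    gradients), optionally projected onto the convex set S ("lazy projection")."""
    theta = np.zeros_like(gradients[0], dtype=float)
    decisions = []
    for z_t in gradients:
        theta -= z_t                                    # accumulate negative gradients
        w_next = eta * theta                            # unconstrained FTRL iterate w_{t+1}
        decisions.append(project(w_next) if project else w_next)
    return decisions
```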

Online subgradient descent (OSD)

The above proved a regret bound for linear loss functions $v_t(w) = \langle w, z_t \rangle$. To generalise the algorithm to any convex loss function, the subgradient $\partial v_t(w_t)$ of $v_t$ is used as a linear approximation to $v_t$ near $w_t$, leading to the online subgradient descent algorithm:

Initialise parameter $\eta$, $w_1 = 0$

For $t = 1, 2, \ldots, T$:

  • Predict using $w_t$; receive $v_t$ from nature.
  • Choose $z_t \in \partial v_t(w_t)$
  • If $S = \mathbb{R}^d$, update as $w_{t+1} = w_t - \eta z_t$
  • If $S \subset \mathbb{R}^d$, project cumulative gradients onto $S$, i.e. $w_{t+1} = \Pi_S(\eta \theta_{t+1})$, where $\theta_{t+1} = \theta_t - z_t$

One can use the OSD algorithm to derive $O(\sqrt{T})$ regret bounds for the online version of SVMs for classification, which use the hinge loss $v_t(w) = \max\{0, 1 - y_t \langle w, x_t \rangle\}$.
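
A sketch of online subgradient descent with the hinge loss, i.e. an online SVM update in the lazy-projection style described above; the step size and the optional $\ell_2$-ball radius used for the projection are illustrative.

```python
import numpy as np

def online_svm(stream, d, eta=0.1, radius=None):
    """Online subgradient descent on the hinge loss v_t(w) = max(0, 1 - y_t <w, x_t>):
    accumulate subgradients in theta and (optionally) project eta * theta onto an l2 ball."""
    theta = np.zeros(d)
    w = np.zeros(d)
    for x, y in stream:                       # labels y in {-1, +1}
        yield w                               # predict with the current w_t
        if y * (w @ x) < 1:                   # margin violated: a subgradient is z_t = -y * x
            theta += y * x                    # theta_{t+1} = theta_t - z_t
        w = eta * theta                       # unconstrained update w_{t+1} = eta * theta_{t+1}
        if radius is not None:                # projection onto the l2 ball of the given radius
            norm = np.linalg.norm(w)
            if norm > radius:
                w *= radius / norm
```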

Other algorithms

Quadratically regularised FTRL algorithms lead to lazily projected gradient algorithms as described above. To use the above for arbitrary convex functions and regularisers, one uses online mirror descent. The optimal regularization in hindsight can be derived for linear loss functions; this leads to the AdaGrad algorithm. For the Euclidean regularisation, one can show a regret bound of $O(\sqrt{T})$, which can be improved further to $O(\log T)$ for strongly convex and exp-concave loss functions.
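
As an illustration of per-coordinate adaptive regularisation, the following is a minimal diagonal AdaGrad-style sketch on linear losses (where the gradient $z_t$ does not depend on $w_t$); the step size and epsilon constant are illustrative and projection is omitted.

```python
import numpy as np

def adagrad(gradients, d, eta=0.1, eps=1e-8):
    """Diagonal AdaGrad: each coordinate is scaled by the inverse square root
    of the sum of squared past gradients for that coordinate."""
    w = np.zeros(d)
    g_sq = np.zeros(d)                         # running sum of squared gradients per coordinate
    for z_t in gradients:                      # z_t is a (sub)gradient of the loss at round t
        g_sq += z_t ** 2
        w -= eta * z_t / (np.sqrt(g_sq) + eps)
    return w
```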

Continual learning

Continual learning means constantly improving the learned model by processing continuous streams of information.[5] Continual learning capabilities are essential for software systems and autonomous agents interacting in an ever changing real world. However, continual learning is a challenge for machine learning and neural network models since the continual acquisition of incrementally available information from non-stationary data distributions generally leads to catastrophic forgetting.

Interpretations of online learning

The paradigm of online learning has different interpretations depending on the choice of the learning model, each of which has distinct implications about the predictive quality of the sequence of functions $f_1, f_2, \ldots, f_n$. The prototypical stochastic gradient descent algorithm is used for this discussion. As noted above, its recursion is given by $w_t = w_{t-1} - \gamma_t \nabla V\left( \langle w_{t-1}, x_t \rangle, y_t \right)$.

The first interpretation considers the stochastic gradient descent method as applied to the problem of minimizing the expected risk $I[w]$ defined above.[6] Indeed, in the case of an infinite stream of data, since the examples $(x_1, y_1), (x_2, y_2), \ldots$ are assumed to be drawn i.i.d. from the distribution $p(x, y)$, the sequence of gradients of $V(\cdot, \cdot)$ in the above iteration is an i.i.d. sample of stochastic estimates of the gradient of the expected risk $I[w]$, and therefore one can apply complexity results for the stochastic gradient descent method to bound the deviation $I[w_t] - I[w^*]$, where $w^*$ is the minimizer of $I[w]$.[7] This interpretation is also valid in the case of a finite training set; although with multiple passes through the data the gradients are no longer independent, complexity results can still be obtained in special cases.

The second interpretation applies to the case of a finite training set and considers the SGD algorithm as an instance of the incremental gradient descent method.[3] In this case, one instead looks at the empirical risk: $I_n[w] = \frac{1}{n} \sum_{i=1}^{n} V\left( \langle w, x_i \rangle, y_i \right)$. Since the gradients of $V(\cdot, \cdot)$ in the incremental gradient descent iterations are also stochastic estimates of the gradient of $I_n[w]$, this interpretation is also related to the stochastic gradient descent method, but applied to minimize the empirical risk as opposed to the expected risk. Since this interpretation concerns the empirical risk and not the expected risk, multiple passes through the data are readily allowed and actually lead to tighter bounds on the deviations $I_n[w_t] - I_n[w_n^*]$, where $w_n^*$ is the minimizer of $I_n[w]$.

Implementations

See also

Learning paradigms

General algorithms

Learning models

References

  1. ^ a b c d e f g L. Rosasco, T. Poggio, Machine Learning: a Regularization Approach, MIT 9.520 Lecture Notes, Manuscript, Dec. 2015. Chapter 7 - Online Learning
  2. ^ Kushner, Harold J.; Yin, G. George (2003). Stochastic Approximation and Recursive Algorithms with Applications (Second ed.). New York: Springer. pp. 8–12. ISBN 978-0-387-21769-7.
  3. ^ a b Bertsekas, D. P. (2011). Incremental gradient, subgradient, and proximal methods for convex optimization: a survey. Optimization for Machine Learning, 85.
  4. ^ Hazan, Elad (2015). Introduction to Online Convex Optimization (PDF). Foundations and Trends in Optimization.
  5. ^ Parisi, German I.; Kemker, Ronald; Part, Jose L.; Kanan, Christopher; Wermter, Stefan (2019). "Continual lifelong learning with neural networks: A review". Neural Networks. 113: 54–71. arXiv:1802.07569. doi:10.1016/j.neunet.2019.01.012. ISSN 0893-6080. PMID 30780045.
  6. ^ Bottou, Léon (1998). "Online Algorithms and Stochastic Approximations". Online Learning and Neural Networks. Cambridge University Press. ISBN 978-0-521-65263-6.
  7. ^ Stochastic Approximation Algorithms and Applications, Harold J. Kushner and G. George Yin, New York: Springer-Verlag, 1997. ISBN 0-387-94916-X; 2nd ed., titled Stochastic Approximation and Recursive Algorithms and Applications, 2003, ISBN 0-387-00894-2.