{"id":12379,"date":"2021-12-08T09:54:40","date_gmt":"2021-12-08T14:54:40","guid":{"rendered":"https:\/\/www.predictiveanalyticsworld.com\/machinelearningtimes\/?p=12379"},"modified":"2021-12-08T09:54:40","modified_gmt":"2021-12-08T14:54:40","slug":"improving-vision-transformer-efficiency-and-accuracy-by-learning-to-tokenize","status":"publish","type":"post","link":"https:\/\/www.predictiveanalyticsworld.com\/machinelearningtimes\/improving-vision-transformer-efficiency-and-accuracy-by-learning-to-tokenize\/12379\/","title":{"rendered":"Improving Vision Transformer Efficiency and Accuracy by Learning to Tokenize"},"content":{"rendered":"Originally published in Google AI Blog, Dec 7, 2021. Transformer models\u00a0consistently obtain state-of-the-art results in computer vision tasks, including\u00a0object detection\u00a0and\u00a0video classification. In contrast to standard\u00a0convolutional approaches\u00a0that process images pixel-by-pixel, the\u00a0Vision Transformers\u00a0(ViT) treat an image as a sequence of patch tokens (i.e., a smaller part, or \u201cpatch\u201d, of an image made up of multiple pixels). This means that at every layer, a ViT model recombines and processes patch tokens based on relations between each pair of tokens, using\u00a0multi-head self-attention. In doing so, ViT models have the capability to construct a global representation of the entire image. At the input-level, <a href=\"https:\/\/www.predictiveanalyticsworld.com\/machinelearningtimes\/improving-vision-transformer-efficiency-and-accuracy-by-learning-to-tokenize\/12379\/\" class=\"more-link\">(more&hellip;)<\/a>","protected":false},"excerpt":{"rendered":"<p>Originally published in Google AI Blog, Dec 7, 2021. Transformer models\u00a0consistently obtain state-of-the-art results in computer vision tasks, including\u00a0object detection\u00a0and\u00a0video classification. In contrast to standard\u00a0convolutional approaches\u00a0that process images pixel-by-pixel, the\u00a0Vision Transformers\u00a0(ViT) treat an image as a sequence of patch tokens (i.e., a smaller part, or \u201cpatch\u201d, of an image made up of multiple pixels). This [&hellip;]<\/p>\n","protected":false},"author":72,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","_links_to":"","_links_to_target":""},"categories":[11,48],"tags":[879,368,166,243,8],"class_list":["post-12379","post","type-post","status-publish","format-standard","hentry","category-industry-news","category-left-hand","tag-ai","tag-artificial-intelligence","tag-data-science","tag-machine-learning","tag-predictive-analytics"],"_links":{"self":[{"href":"https:\/\/www.predictiveanalyticsworld.com\/machinelearningtimes\/wp-json\/wp\/v2\/posts\/12379","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.predictiveanalyticsworld.com\/machinelearningtimes\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.predictiveanalyticsworld.com\/machinelearningtimes\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.predictiveanalyticsworld.com\/machinelearningtimes\/wp-json\/wp\/v2\/users\/72"}],"replies":[{"embeddable":true,"href":"https:\/\/www.predictiveanalyticsworld.com\/machinelearningtimes\/wp-json\/wp\/v2\/comments?post=12379"}],"version-history":[{"count":2,"href":"https:\/\/www.predictiveanalyticsworld.com\/machinelearningtimes\/wp-json\/wp\/v2\/posts\/12379\/revisions"}],"predecessor-version":[{"id":12381,"href":"https:\/\/www.predictiveanalyticsworld.com\/machinelearningtimes\/wp-json\/wp\/v2\/posts\/12379\/revisions\/12381"}],"wp:attachment":[{"href":"https:\/\/www.predictiveanalyticsworld.com\/machinelearningtimes\/wp-json\/wp\/v2\/media?parent=12379"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.predictiveanalyticsworld.com\/machinelearningtimes\/wp-json\/wp\/v2\/categories?post=12379"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.predictiveanalyticsworld.com\/machinelearningtimes\/wp-json\/wp\/v2\/tags?post=12379"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}