
Developments Occurring Within Computer Vision | by Richmond Alake | Jun, 2022



An overview of the field of computer vision and how advances in technological infrastructure support its growth and scalability

Visual representation of a computer vision engineer engaging in their day-to-day work tasks, such as working with deep learning algorithms.
Photo by Nubelson Fernandes on Unsplash

Artificial intelligence (AI) practitioners and developers who work in computer vision (CV) implement and integrate solutions to problems involving vision within computers and computer systems. Image classification, face detection, pose estimation, and optical flow are typical examples of CV tasks.

Deep learning algorithms lend themselves well to solving computer vision problems. The architectural characteristics of convolutional neural networks enable the detection and extraction of spatial patterns and features present in image data. In other words, machines can identify and classify objects and even react to them.

This is why Computer Vision Engineers often refer to themselves as Deep Learning Engineers or simply Machine Learning Engineers.

Computer vision is a rapidly growing field comprising research, business, and commercial applications. Advanced research into computer vision is now more directly and immediately applicable to the commercial world.

The field of computer vision is moving forward at a rapid pace, which necessitates CV experts to stay up-to-date on the latest discoveries and advancements.

Key Takeaways

  • Cloud computing services that help scale deep learning solutions
  • Automated machine learning (AutoML) solutions that reduce the amount of repetitive work required in a standard machine learning pipeline
  • Researchers’ efforts to enable the use of Transformer architectures for computer vision tasks

Cloud computing provides computing resources such as data storage, application servers, networks, and compute infrastructure to individuals or businesses via the internet. In contrast to using local resources to execute computations, cloud computing solutions offer a quick and cost-effective answer to compute resource availability and scaling.

Storage and processing power are required to implement machine learning solutions. The data-focused early phases of a machine learning project (data aggregation, cleaning, and wrangling) rely on cloud computing resources for data storage and for access to data platforms such as BigQuery, Hadoop, and BigTable.
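
As a minimal sketch of that data-aggregation step, assuming a Google Cloud project with the BigQuery Python client installed and a hypothetical image-label table, the query below pulls training metadata into a DataFrame for cleaning and wrangling:

    from google.cloud import bigquery

    # Assumes application-default credentials are configured for a Google Cloud project.
    client = bigquery.Client()

    # Hypothetical table holding image URIs and labels gathered for a CV project.
    query = """
        SELECT image_uri, label
        FROM `my_project.cv_dataset.image_labels`
        WHERE label IS NOT NULL
    """

    # Pull the aggregated results into a pandas DataFrame.
    df = client.query(query).to_dataframe()
    print(df.head())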

Interconnected data center representing the need for cloud computing and cloud services
Photo by Taylor Vick on Unsplash

Recently, there has been a notable increase in devices and systems enabled with computer vision capabilities, such as pose estimation for gait analysis, face recognition for mobile phones, lane detection in autonomous vehicles, etc.

The demand for cloud storage is increasing, and it is projected that this industry will be valued at $390.33 billion — five times the market’s current value in 2021.

The computer vision market, in both size and range of applications, is projected to grow tremendously, leading to an increase in the volume of inbound data used to train machine learning models. An increase in the data samples required to develop and train ML models directly translates to larger data storage requirements and more powerful compute resources.

The increased availability of GPUs has accelerated computer vision solutions. However, when servicing thousands or even millions of consumers, GPUs alone aren’t always enough to provide the scalability and uptime required by these applications. The obvious answer to this problem is cloud computing.

Cloud computing platforms, including Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure, provide solutions to core components of the machine learning and data science project pipeline, including data aggregation, model implementation, deployment, and monitoring.

Gaining an increased awareness of cloud computing services relevant to computer vision and general machine learning makes any CV Engineer more valuable to a business. An in-depth cost-benefit analysis helps identify which cloud computing services are worth adopting.

A good rule of thumb is to ensure that, as a CV Engineer, you have an awareness of or some form of exposure to at least one of the major cloud service providers and their solutions, including their advantages and disadvantages.

To highlight the types of cloud computing services suited to a CV Engineer, the following are examples of NVIDIA services that support typical computer vision operations.

Leveraging NVIDIA’s extensive NVIDIA GPU Cloud (NGC) Catalog of pretrained deep learning models abstracts away the complexity of deep learning model implementation and training. Deep learning scripts provide CV Engineers with ready-made pipelines that are customizable to meet unique requirements, and the accompanying model deployment solution automates the delivery of models to end users.

Additionally, NVIDIA Triton Inference Server enables the deployment of models from frameworks such as TensorFlow and PyTorch on any GPU- or CPU-based infrastructure. Triton Inference Server also scales models across various platforms, including cloud, edge, and embedded devices.
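
As a hedged sketch of the client side, assuming a Triton server is already running on localhost:8000 and serving a model named resnet50 with an input tensor INPUT__0 and an output tensor OUTPUT__0 (the names depend on the model's config.pbtxt), an inference request with the tritonclient package might look like this:

    import numpy as np
    import tritonclient.http as httpclient

    # Connect to a Triton server assumed to be running locally on the default HTTP port.
    client = httpclient.InferenceServerClient(url="localhost:8000")

    # Build the request; tensor names, shapes, and dtypes must match the model configuration.
    batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
    infer_input = httpclient.InferInput("INPUT__0", list(batch.shape), "FP32")
    infer_input.set_data_from_numpy(batch)
    requested_output = httpclient.InferRequestedOutput("OUTPUT__0")

    response = client.infer(model_name="resnet50",
                            inputs=[infer_input],
                            outputs=[requested_output])
    print(response.as_numpy("OUTPUT__0").shape)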

Furthermore, NVIDIA’s partnerships with cloud service providers such as AWS make it possible to deploy CV-based assets through either NGC or AWS with minimal consideration given to infrastructure and compute resources, by leveraging packaged solutions put together by NVIDIA’s experts. This means CV Engineers can focus more on model performance and optimization.

Businesses are incentivized to reduce costs and optimize strategies where feasible. Cloud computing and cloud service providers fulfil this requirement by providing solutions that are billed based on usage and scaled based on service demand.

Machine learning model development involves a number of tasks that benefit from automation and a reduction in manual effort through the creation of automated operation pipelines.

Take, for example, feature engineering and model selection. Feature engineering involves detecting and selecting the relevant information and attributes in the data that best describe the dataset or improve the performance of the machine-learning-based solution.
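
As a simplified, illustrative sketch of automated feature selection on a generic tabular dataset (CV pipelines typically rely on learned features, so this only stands in for the manual effort involved):

    from sklearn.datasets import load_breast_cancer
    from sklearn.feature_selection import SelectKBest, f_classif

    X, y = load_breast_cancer(return_X_y=True)

    # Keep the 10 features most associated with the target according to an ANOVA F-test.
    selector = SelectKBest(score_func=f_classif, k=10)
    X_selected = selector.fit_transform(X, y)
    print(X.shape, "->", X_selected.shape)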

Model selection involves evaluating the performance of a group of machine learning classifiers, algorithms, or solutions on a given problem. These activities take ML Engineers and Data Scientists considerable time to complete and frequently require practitioners to revisit earlier steps to enhance model performance or accuracy.
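
A minimal sketch of the model selection step, comparing a few scikit-learn classifiers with cross-validation on the same illustrative dataset:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)

    candidates = {
        "logistic_regression": LogisticRegression(max_iter=5000),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
        "svm": SVC(),
    }

    # Evaluate each candidate with 5-fold cross-validation and report mean accuracy.
    for name, model in candidates.items():
        scores = cross_val_score(model, X, y, cv=5)
        print(f"{name}: {scores.mean():.3f}")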

The field of artificial intelligence (AI) devoted to automating a number of manual and repetitive operations in the machine learning process is called automated machine learning or AutoML.

AutoML enables the automation of repetitive tasks such as numeric calculations.
Photo by Stephen Dawson on Unsplash

There are several large ongoing projects to simplify the intricacies of a machine learning project pipeline. AutoML is an effort that goes beyond abstraction, and it focuses on automation and augmentation of ML workflows and procedures to make ML easy and accessible for non-ML experts.

Taking a second to examine the market value of the AutoML industry, projections expect the AutoML market to reach $14 billion by 2030, nearly 42 times its current value.

Computer vision projects involve a series of repeated tasks to achieve the desired goals. CV Engineers involved in model implementation know all too well the amount of repeated work that goes into finding the hyperparameters that allow a model’s training to converge to an optimal loss and achieve the desired accuracy, a process known as hyperparameter optimization or tuning.
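
A minimal sketch of that trial-and-error loop, written as a generic random search; train_one_config is a hypothetical function that trains a model with the given hyperparameters and returns a validation score:

    import random

    def random_search(train_one_config, n_trials=20):
        # Try n_trials random hyperparameter configurations and keep the best one.
        best_score, best_config = float("-inf"), None
        for _ in range(n_trials):
            config = {
                "learning_rate": 10 ** random.uniform(-5, -1),
                "batch_size": random.choice([16, 32, 64, 128]),
            }
            score = train_one_config(**config)
            if score > best_score:
                best_score, best_config = score, config
        return best_config, best_score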

Model selection and feature engineering are time-consuming and repetitive processes. AutoML is the effort to automate repetitive processes within machine learning pipelines.

This particular application of machine learning and automation is gaining traction. CV Engineers need to be aware of the benefits and limitations of AutoML.

AutoML in practice

AutoML is still a new technology focused on automating standard machine learning procedures. However, in the long term, the advantages gained are significant.

An obvious benefit of AutoML to CV and ML Engineers is the time-saving factor. Data aggregation, data preparation, and hyperparameter optimization are time-consuming processes that arguably do not use the core skills and capabilities of ML Engineers.

Hyperparameter tuning involves a trial and error process with educated guesses. While data preparation and aggregation are necessary processes, they involve repetitive tasks and depend on locating appropriate data sources. AutoML capabilities that prove successful in automating these processes allow CV Engineers to dedicate more time and effort to more demanding and fulfilling tasks.

That said, parts of the pipeline, notably data sourcing, still determine data quality and, ultimately, model performance. Attaining quality data specific to the problem domain isn’t yet ripe for automation and requires expert human observation and oversight.

For those interested in exploring GPU-powered AutoML, the widely used Tree-based Pipeline Optimization Tool (TPOT) is an automated machine learning library that optimizes machine learning processes and pipelines through genetic programming. RAPIDS cuML provides TPOT functionality accelerated with GPU compute resources. This article provides more information on TPOT and RAPIDS cuML.
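
A short sketch of the idea, assuming TPOT is installed (the config_dict="TPOT cuML" setting assumes a RAPIDS cuML installation for GPU acceleration; omit it to fall back to the default CPU configuration):

    from tpot import TPOTClassifier
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # Genetic programming searches over combinations of preprocessing steps and models.
    tpot = TPOTClassifier(generations=5, population_size=20,
                          config_dict="TPOT cuML", verbosity=2, random_state=42)
    tpot.fit(X_train, y_train)
    print(tpot.score(X_test, y_test))

    # Export the best discovered pipeline as a standalone Python script.
    tpot.export("best_pipeline.py")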

Machine learning libraries and frameworks are essential in any CV Engineer’s toolkit. The development and advancement of ML libraries and frameworks are gradual and continuous. Major deep learning libraries such as TensorFlow, PyTorch, Keras, and MXNet received constant updates and fixes in 2021, and there’s no reason to assume this won’t continue into 2022.

More recently, there have been exciting advances going on in mobile-focused deep learning libraries and packages that optimize commonly used DL libraries.

MediaPipe extended its pose estimation capabilities in 2021 to provide 3D pose estimation through the BlazePose model, and this solution is available both in the browser and in mobile environments. In 2022, expect to see more pose estimation applied to use cases that involve dynamic movement and require robust solutions, such as motion analysis in dance and virtual character motion simulation.
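
As a brief sketch, assuming MediaPipe and OpenCV are installed and a hypothetical image file person.jpg exists, the Python Pose solution exposes BlazePose’s 33 landmarks, including approximate 3D (world) coordinates:

    import cv2
    import mediapipe as mp

    mp_pose = mp.solutions.pose

    # Hypothetical input image; BlazePose returns 33 body landmarks.
    image = cv2.imread("person.jpg")

    with mp_pose.Pose(static_image_mode=True, model_complexity=2) as pose:
        results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

    if results.pose_world_landmarks:
        # World landmarks are expressed in metres relative to the midpoint of the hips.
        for landmark in results.pose_world_landmarks.landmark:
            print(landmark.x, landmark.y, landmark.z)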

PyTorch Lightning is becoming increasingly popular among researchers and professional machine learning practitioners due to its simplicity, its abstraction of complex neural network implementation details, and its handling of hardware considerations.
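
A minimal sketch of that abstraction: the LightningModule below defines only the model, the training step, and the optimizer, while the Trainer handles device placement and the training loop (the dataloader is left out and would need to be supplied):

    import torch
    from torch import nn
    import pytorch_lightning as pl

    class LitClassifier(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

        def training_step(self, batch, batch_idx):
            x, y = batch
            loss = nn.functional.cross_entropy(self.model(x), y)
            self.log("train_loss", loss)
            return loss

        def configure_optimizers(self):
            return torch.optim.Adam(self.parameters(), lr=1e-3)

    # The Trainer picks the available hardware, so the module stays device-agnostic.
    # trainer = pl.Trainer(max_epochs=1, accelerator="auto")
    # trainer.fit(LitClassifier(), train_dataloaders=my_dataloader)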

Deep learning methods have long been used to tackle computer vision challenges. Neural network architectures used to conduct face detection, lane detection, and pose estimation all use deep consecutive layers of convolutional neural networks.

CV Engineers are very familiar with CNNs, but they will need to stay more attuned to the field’s research developments, especially the application of the Transformer to computer vision tasks. The Transformer is a deep learning architecture introduced in the 2017 paper “Attention Is All You Need.”

The paper presented a new method for creating a computational representation of data by utilizing the attention mechanism to derive the significance of one part of the input relative to the other input segments. The Transformer architecture does not rely on the conventions of convolutional neural networks, yet research has demonstrated its application to vision-related tasks.
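
At its core, the mechanism is scaled dot-product attention; a minimal PyTorch sketch, with a toy "sequence" of image patches standing in for real inputs, looks like this:

    import math
    import torch

    def scaled_dot_product_attention(query, key, value):
        # Attention(Q, K, V) = softmax(Q @ K^T / sqrt(d_k)) @ V
        d_k = query.size(-1)
        scores = query @ key.transpose(-2, -1) / math.sqrt(d_k)
        weights = torch.softmax(scores, dim=-1)
        return weights @ value, weights

    # Toy example: a batch of one "sequence" of 4 image patches, each embedded in 8 dimensions.
    x = torch.randn(1, 4, 8)
    output, weights = scaled_dot_product_attention(x, x, x)
    print(weights.shape)  # torch.Size([1, 4, 4]): each patch's importance relative to the others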

The NGC Catalog includes a Transformer model entry with details of the architecture and a working PyTorch implementation that you can explore.

Transformers have made a considerable impact within the NLP domain; just consider the accomplishments of GPT (Generative Pretrained Transformer) and BERT (Bidirectional Encoder Representations from Transformers).

This paper, published in the last quarter of 2021, provides a high-level overview of the application of the Transformer network architecture to computer vision.

For CV Engineers interested in applied ML but not familiar with reading research papers, this post presents a systematic method for reading and understanding research papers.

Edge devices are becoming increasingly powerful, and on-device inference capabilities are a must-have feature for mobile applications used by customers who expect quick service delivery and AI features.

Mobile devices are a direct commercial application of computer vision features.
Photo by Homescreenify on Unsplash

Incorporating computer vision features directly into mobile devices provides benefits such as:

  • Reduced latency when model inference results are obtained. Running inference on-device rather than depending on cloud servers decreases the waiting time for inference outcomes.
  • By design, on-device inference capabilities limit the transfer of data from devices to cloud servers. This enhances the privacy and security of data, as there are few or no data transfer requirements.
  • Removing the dependency on cloud GPU/CPU servers for inference provides additional financial benefits.

Many businesses are exploring mobile offerings of their products and services, which includes exploring how existing AI functionalities can be replicated on mobile devices. CV Engineers should be aware of the platforms, tools, and frameworks available for implementing mobile-first AI solutions.
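
One common route, sketched here under the assumption that a trained TensorFlow model has been exported as a SavedModel at a hypothetical path, is converting it to TensorFlow Lite for on-device inference:

    import tensorflow as tf

    # Hypothetical path to a trained model exported in the SavedModel format.
    converter = tf.lite.TFLiteConverter.from_saved_model("exported_cv_model")
    converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional post-training quantization
    tflite_model = converter.convert()

    # Write the converted model; the .tflite file is bundled with the mobile app.
    with open("cv_model.tflite", "wb") as f:
        f.write(tflite_model)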

Computer vision technologies will increase in usage as AI becomes more integrated into our daily lives. As it becomes increasingly widespread in our society, the demand for specialists with knowledge in computer vision systems will rise.

CV engineers must stay on top of the newest developments and trends in the sector to stay ahead of the curve and capitalize on recent advancements. In 2022, you should be aware of the growing popularity of PyTorch Lightning, mobile-focused deep learning libraries, and the use of Transformers in computer vision applications.

Additionally, edge devices are becoming more powerful, and businesses are exploring mobile offerings of their products and services. Mobile-focused deep learning libraries and packages are worth keeping an eye on, as they are likely to see increased usage in the coming year.

In 2022, expect AutoML capabilities to become more widely used and ML libraries and frameworks to continue growing. Increased development in augmented reality (AR) and virtual reality (VR) applications will allow CV Engineers to extend their skills into new domains, such as developing intuitive and efficient methods of replicating real objects in 3D space. Computer vision applications will continue to shape the future, and further development will be observed in the technological infrastructure supporting computer vision systems.

A version of this article first appeared on the Nvidia Developer Blog


