Academic Content
Modern computer vision (CV) driven by deep learning (DL), increasingly known as visual intelligence (VI), allows machines to interpret and understand visual data. This technology is already crucial in fields such as autonomous driving and medical image computing, and it is expected to change many industries by enabling more accurate and efficient visual analysis. The course covers the mathematical and computational foundations of deep learning-based CV, the key neural architectures, and their training mechanisms, including supervised, self-supervised, unsupervised, and reinforcement-based learning. It addresses the central computer vision tasks and highlights influential and state-of-the-art models for each. It also examines the principal frameworks and tools in the field and explores the application domains that are driving advances in computer vision.

More details about the course content:

DL fundamentals: From neurons/units to neural networks (NNs). Ground truth (GT) data, parameters (weights and biases), activation functions, and loss functions. Computational graphs, update rule, gradients, and supervised learning. Forward and backward pass in shallow NNs, matrix notation. Normalization (data/batch) and initialization (parameters). Hyperparameter tuning and gradient descent optimization (from simple to SOTA optimizers). Generalization and regularization.

Architectures: Fully Connected (Dense) NNs (FCNNs). Convolutional NNs (CNNs) and different types of convolutions (incl. Residual NNs and Capsule Nets). Recurrent NNs (RNNs, LSTMs, GRUs) for CV (e.g., sequences of frames in a video). Transformers and the self-attention mechanism. Vision Transformers. Graph NNs (GNNs) for CV. Retentive Networks (RetNets).

CV tasks:
- Supervised: Image classification, object detection, segmentation (semantic, instance, panoptic), depth estimation, pose estimation, etc. Object tracking (e.g., keeping the same ID for an object across a video sequence).
- Self-Supervised Learning (SSL): Large Vision Models and multi-modal (incl. images and video) Foundation Models.
- Unsupervised Learning: Autoencoders (AE) and Variational Autoencoders (VAE). Generative Adversarial Networks (GANs). Normalizing flows. Diffusion models.
- Reinforcement learning in the context of CV: Value-based methods, policy gradient methods, and actor-critic methods.
Learning Outcomes
Knowledge:
- Understand the fundamental concepts and mathematical principles behind deep learning algorithms and their application to modern computer vision.
- Recognize the structure and functionality of various neural network architectures (FCNNs, CNNs, Vision Transformers etc.), as well as their roles in addressing specific computer vision tasks.
- Comprehend the theoretical aspects of learning mechanisms such as supervised, self-supervised, unsupervised, and reinforcement learning, and how they contribute to the field of visual intelligence.

Skills:
- Apply knowledge of deep learning to construct and train neural networks for a range of computer vision tasks, such as image classification, object detection, segmentation, depth estimation, pose estimation and generative AI for vision tasks.
- Employ state-of-the-art optimization techniques, normalization processes, and regularization methods to enhance the generalization of neural network models.
- Utilize principal frameworks and tools established in the field to implement and evaluate computer vision models.

General competences:
- Analyze and critically assess different neural network models and architectures, and select the most appropriate one for a given visual intelligence task.
- Integrate advanced computer vision solutions in various application domains, such as autonomous driving and medical image computing, to improve accuracy and efficiency.
- Exhibit problem-solving abilities by tuning hyperparameters and adjusting network architectures to optimize performance for computer vision tasks.