AI – The Big Picture

THE BIG PICTURE

AI encompasses a wide range of technologies, including:

  • machine learning,
    Machine learning, the most commonly used AI technology, involves training algorithms to make predictions or decisions based on data.
  • natural language processing,
    Natural language processing focuses on enabling machines to understand and interpret human language.
  • computer vision,
    Computer vision involves enabling machines to analyse and understand visual information, such as images and videos.
  • robotics,
    Robotics involves developing machines that can perform tasks typically done by humans, such as assembling products, handling materials, and more.
  • and more.

MACHINE LEARNING CATEGORIES

Machine learning can be divided into three main categories:

  • Supervised learning
    involves training a model on labelled data.
    Deep learning, which uses artificial neural networks to process and learn from data, is most often applied in supervised settings. Deep learning algorithms are capable of automatically extracting features from data, which makes them particularly useful for tasks such as
    • image classification,
    • speech recognition, and
    • natural language processing.
  • Unsupervised learning
    involves training a model on unlabelled data to identify patterns and relationships in the data.
    Unsupervised learning can be used for
    • pre-processing (sorting and grouping of data),
    • feature extraction,
    • anomaly detection,
    • recommendation systems, and
    • generative modelling, where you generate new data based on existing data.
  • Reinforcement learning (RL)
    involves training a model to make decisions based on feedback from its environment.
    In reinforcement learning, behaviour cloning is used to train an agent to learn a policy from expert demonstrations. The agent observes the expert’s states and the actions the expert took in them, and then tries to learn a policy that reproduces similar actions (a minimal sketch follows this list).
    Behaviour cloning is a simple and efficient way to train an agent, but it can be brittle. The agent only learns to imitate the expert’s behaviour, not to adapt to changes in the environment. Additionally, behaviour cloning can be biased if the expert’s demonstrations are not representative of the real-world environment.
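
A minimal behaviour-cloning sketch in Python (illustrative only; the synthetic data, the expert's rule, and the use of scikit-learn are assumptions, not taken from any particular system). It treats imitation as ordinary supervised learning on (state, action) pairs:

  import numpy as np
  from sklearn.linear_model import LogisticRegression

  # Hypothetical expert demonstrations: observed states and the
  # discrete actions the expert took in those states.
  rng = np.random.default_rng(0)
  expert_states = rng.normal(size=(500, 4))
  expert_actions = (expert_states[:, 0] > 0).astype(int)  # the expert's hidden rule

  # Behaviour cloning = fit a classifier that maps states to actions.
  policy = LogisticRegression().fit(expert_states, expert_actions)

  # The cloned policy acts on new states, but it only imitates what it
  # has seen; far from the training distribution it is effectively guessing.
  new_state = rng.normal(size=(1, 4))
  print("cloned action:", policy.predict(new_state)[0])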

MACHINE LEARNING ALGORITHMS

There are various forms of machine learning algorithms.

  • Artificial neural networks (ANNs) are a type of machine learning algorithm that are inspired by the human brain. They are made up of interconnected nodes, called artificial neurons, which can learn to perform tasks by being trained on data. ANNs have been used to achieve state-of-the-art results in a variety of tasks, including:
    • image recognition,
    • natural language processing, and
    • speech recognition.
  • Decision trees: A type of machine learning algorithm used for classification and regression tasks. They work by recursively partitioning the data into subsets based on the value of different features, and then assigning a label or prediction to each subset. Decision trees have been used successfully in a wide range of applications, including:
    • fraud detection,
    • medical diagnosis, and
    • credit scoring.
  • Support vector machines (SVMs): A type of machine learning algorithm that can be used for classification and regression tasks. They work by finding the hyperplane that maximally separates the data into different classes, or that best predicts the value of a continuous target variable. SVMs have been used successfully in applications such as
    • image classification,
    • text classification, and
    • spam filtering.
  • Random forests: An ensemble learning technique that combines multiple decision trees to improve performance. They work by training multiple decision trees on random subsets of the data, and then aggregating the predictions of the individual trees to make a final prediction. Random forests have been used successfully in a wide range of applications, including:
    • credit scoring,
    • remote sensing, and
    • object recognition.
  • K-nearest neighbours: A simple machine learning algorithm that can be used for classification and regression tasks. It works by finding the K data points in the training set that are closest to a given test data point, and then assigning a label or prediction based on the majority vote or average of the labels or predictions of those K data points (a minimal sketch follows this list). KNN has been used successfully in applications such as
    • handwritten digit recognition and
    • gene expression analysis.
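
A minimal from-scratch K-nearest-neighbours sketch in Python (illustrative; the tiny dataset is made up). It classifies a test point by majority vote among the K closest training points:

  import numpy as np
  from collections import Counter

  def knn_predict(X_train, y_train, x_test, k=3):
      # Euclidean distance from the test point to every training point.
      distances = np.linalg.norm(X_train - x_test, axis=1)
      # Indices of the K nearest neighbours.
      nearest = np.argsort(distances)[:k]
      # Majority vote among their labels.
      return Counter(y_train[nearest]).most_common(1)[0][0]

  X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
  y_train = np.array([0, 0, 1, 1])
  print(knn_predict(X_train, y_train, np.array([0.2, 0.1])))  # -> 0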

USE CASES

AI is being used across a wide range of industries, from healthcare to finance to manufacturing. In healthcare, AI is being used to develop more accurate diagnostic tools and personalized treatment plans. In finance, AI is being used to detect fraud, assess creditworthiness, and automate trading decisions. In manufacturing, AI is being used to improve efficiency, optimize supply chain management, and reduce waste.

CHATBOTS

Large language models (LLMs), which power conversational AIs or chatbots, are trained on massive amounts of text data. They are able to communicate and generate human-like text in response to a wide range of prompts and questions; for example, they can provide summaries of factual topics or create stories.

Currently, LLMs also have some weaknesses, such as hallucinations. LLMs have a pattern-completion behaviour and will sometimes make things up to generate complete answers, because the model:

  • does not understand that it is allowed to express uncertainty (stating this in the prompt may reduce the problem),
  • is reluctant to challenge a premise stated by the user,
  • once caught in a lie, continues to produce lies to keep its response coherent, or
  • is simply guessing wrong.

Behaviour cloning on data not in the training set effectively teaches the model to guess. Likewise, if you train the model to sometimes answer «I don’t know», you may teach it to withhold some of the information it actually has.

An LLM knows its own uncertainty for short-form Q&A, because it assigns weights to next-token predictions. Reinforcement learning could be used to learn behaviour boundaries and adjust the output distribution, so that the model is allowed to express uncertainty, challenge a premise, and admit errors.

Long-form Q&A is difficult because answers are rarely completely wrong. They may be correct in general but nevertheless contain some false or misleading information, and their quality (informativeness, correctness, etc.) is difficult to measure. OpenAI uses a human reference answer for factuality checks and then uses GPT-4 to compare the LLM’s own answers against it to rate their quality.

Retrieval in an LLM context is where you allow the model to reference external sources. This allows for updates on current events and for more detailed information not available in the model, and it also opens up for source references. WebGPT was a GPT-3-based model that made use of such retrieval with source references; this functionality has since been passed on to GPT-4.
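
A minimal retrieval sketch in Python (illustrative; the document store is made up and llm() is a stub standing in for a real LLM API). The idea is to find the most relevant source for a query and prepend it to the prompt, so the model can ground its answer and cite the source:

  # Toy document store.
  docs = {
      "doc1": "WebGPT was a GPT-3 based model that could browse the web.",
      "doc2": "AlphaFold predicts protein structures from amino acid sequences.",
  }

  def retrieve(query: str) -> str:
      # Score each document by word overlap with the query; a stand-in
      # for real embedding-based similarity search.
      words = set(query.lower().split())
      return max(docs, key=lambda d: len(words & set(docs[d].lower().split())))

  def llm(prompt: str) -> str:
      # Stub for an actual LLM call.
      return "(model answer grounded in the supplied source)"

  def answer(query: str) -> str:
      source = retrieve(query)
      prompt = f"Source [{source}]: {docs[source]}\n\nQuestion: {query}"
      return llm(prompt)

  print(answer("What was WebGPT based on?"))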

Some modern LLMs are distributed transformer-based language models: they are trained across multiple devices, which allows them to be trained on more data, for a longer time, and at a larger scale than monolithic transformer-based models trained on a single system. It is hard to be sure how the largest models are actually built, though: both MT-NLG and Wu Dao 2.0, for example, claim to be monolithic.

ChatGPT by OpenAI

ChatGPT is built on the GPT (Generative Pre-trained Transformer) architecture, a type of deep learning neural network that is pre-trained on large amounts of text data to generate human-like responses to natural language inputs. It uses a combination of techniques such as

  • attention mechanisms,
    assigning a weight or importance score to each element of the input, based on how relevant it is to the current task.
  • multi-head self-attention, and
    Self-attention is a technique that allows a model to attend to different parts of the input when computing a representation for a given word or token in a sentence. For example, when computing a representation for the word «apple» in the sentence «I ate an apple for breakfast», the model might attend to the words «ate» and «breakfast» to better understand the context of the sentence. Multi-head self-attention takes this idea a step further by allowing the model to attend to different parts of the input using multiple «heads», or sub-attention mechanisms. Each head learns a different representation of the input, allowing the model to capture multiple aspects of the context and make more accurate predictions (a minimal sketch of the underlying attention computation follows this list).
  • transformer architecture to generate responses to user inputs. Transformer models are a type of neural network that are particularly good at learning long-range dependencies in text. This makes them well-suited for tasks such as natural language understanding and generation. BERT, the most famous transformer model, is a standalone encoder. At the time of release, it beat the state of the art in many classification tasks, question answering tasks and masked language modelling. Encoders are very powerful at extracting vectors that carry meaningful information about a sequence.
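
A minimal scaled dot-product attention sketch in Python (illustrative; plain numpy, with no learned projections or multiple heads, which real transformers add on top). It shows how each token's output becomes a relevance-weighted average of the other tokens:

  import numpy as np

  def attention(Q, K, V):
      # Relevance score of every key for every query.
      scores = Q @ K.T / np.sqrt(K.shape[-1])
      # Softmax turns scores into importance weights that sum to 1.
      weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
      weights /= weights.sum(axis=-1, keepdims=True)
      # Each output vector is a weighted average of the value vectors.
      return weights @ V

  seq_len, d = 4, 8  # 4 tokens, 8-dimensional embeddings
  x = np.random.default_rng(0).normal(size=(seq_len, d))
  print(attention(x, x, x).shape)  # self-attention: (4, 8)

Multi-head self-attention simply runs several such attention computations in parallel on learned projections of the input and concatenates the results.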

To better understand and respond to user inputs, it also uses other natural language processing (NLP) techniques such as

  • named entity recognition,
  • part-of-speech tagging, and
  • sentiment analysis
    automated identification and extraction of subjective information from text data, such as opinions, emotions, attitudes, and feelings expressed by the author. The goal of sentiment analysis is to determine the overall sentiment or tone of a given piece of text, such as a product review, social media post, or news article (a small sketch follows this list).
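
A minimal lexicon-based sentiment sketch in Python (illustrative; the word lists are made up, and production systems use learned models rather than word counting). It maps a text to a sentiment label by counting positive and negative words:

  POSITIVE = {"good", "great", "excellent", "love", "happy"}
  NEGATIVE = {"bad", "poor", "terrible", "hate", "sad"}

  def sentiment(text: str) -> str:
      words = text.lower().split()
      score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
      return "positive" if score > 0 else "negative" if score < 0 else "neutral"

  print(sentiment("I love this excellent product"))  # -> positive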

John Schulman is the chief architect of ChatGPT.

Bard (Google)

Based on LaMDA, which is known for its ability to answer questions in an informative way. LaMDA is a Transformer-based neural language model trained on a dataset of text and code: the text includes books, articles, and other forms of written text, while the code comes from open-source repositories such as GitHub.

Megatron-Turing NLG (NVIDIA / Microsoft)

Megatron-Turing NLG is a transformer-based language model with 530 billion parameters. It was trained on NVIDIA’s Selene supercomputer (thousands of A100 GPUs) using Megatron-DeepSpeed, a parallel training framework that combines:

  • pipeline parallelism,
    It works by breaking the model into a sequence of stages, where each stage can be executed on a different device. The stages are executed in parallel, and the output of one stage is passed to the next stage. GPT-3 (175 billion parameters), for example, was trained across thousands of GPUs using such parallelism techniques.
  • data parallelism, and
    Data parallelism works by splitting the data into multiple batches, and then distributing each batch to a different device. The devices then train the model on their respective batches in parallel, and the gradients from each device are aggregated and used to update the model weights. This is a simpler technique than pipeline parallelism, but less suited for models with many parameters, since every device must hold a full copy of the model.
  • tensor slicing.
    In machine learning, tensors are multidimensional arrays: a scalar is a rank-0 tensor, a vector a rank-1 tensor, and a matrix a rank-2 tensor. Tensor slicing is a technique for extracting a subtensor (a subset of another tensor): a specific element, a range of elements, or a multidimensional block. In Megatron-style training, the model’s weight tensors are sliced across devices so that each GPU holds and computes on only one slice.

Tensor slicing is a fundamental operation in deep learning: it can be used to extract specific features from a tensor, to perform operations on part of a tensor, or to create new tensors from existing ones, and it appears across tasks such as image classification, natural language processing, and machine translation.
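
A short tensor-slicing sketch in Python (illustrative, using numpy; the device-splitting comment describes Megatron-style tensor parallelism in spirit, not an actual multi-GPU setup):

  import numpy as np

  W = np.arange(24).reshape(4, 6)    # a rank-2 tensor (a 4x6 matrix)
  element = W[1, 2]                  # a single element (rank-0 slice)
  row = W[1, :]                      # one row (rank-1 slice)
  block = W[0:2, 0:3]                # a 2x3 subtensor
  halves = np.split(W, 2, axis=1)    # column-wise split into two 4x3 slices;
                                     # in tensor parallelism, each device
                                     # would hold one such slice of a weight matrix
  print(element, row.shape, block.shape, [h.shape for h in halves])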

This allows MT-NLG to be trained on a massive dataset of text and code more efficiently than other LLMs. Megatron-Turing NLG is a combination of two different technologies: Megatron and Turing NLG.

  • Megatron is a deep learning library developed by NVIDIA that is designed to train large language models efficiently.
  • Turing NLG is a natural language generation model developed by Microsoft that is designed to generate text that is both accurate and creative.

By combining these two technologies, Megatron-Turing NLG is able to generate text that is both accurate and creative, while the Megatron-DeepSpeed framework allows it to be trained efficiently at a very large scale.

Claude (Anthropic)

One of the fastest models as of April 2023, using a 100K-token input window (compared to 32K for GPT-4). Claude is reportedly trained on a dataset of 1.5 trillion tokens of text that has been filtered to remove offensive or dangerous content, using the Transformer architecture and the AdamW optimizer.

LLaMA (Meta)

An LLM released in February 2023. A variety of model sizes were trained, ranging from 7 billion to 65 billion parameters. The 13-billion-parameter model’s performance on most NLP benchmarks exceeded that of the much larger GPT-3 (with 175 billion parameters).

PanGu-Σ (Huawei)

Developed by Huawei. This LLM has 1.085 trillion parameters and was trained on a dataset of 329 billion tokens containing text and code, which took about 100 days. PanGu-Σ is reported to outperform previous language models on a range of Chinese NLP tasks.

Wu Dao 2.0 (Beijing Academy of Artificial Intelligence)

A transformer-based language model with 1.75 trillion parameters. Wu Dao 2.0 is trained using the AdamW optimizer and a mixture-of-experts technique called FastMoE, which allows it to learn from both text and images. This multimodality sets Wu Dao 2.0 apart from most other LLMs, which are typically trained only on text.

Chinchilla (DeepMind)

A 70-billion-parameter LLM trained on a dataset of 1.4 trillion tokens of text and code, using the Transformer architecture and the AdamW optimizer. Chinchilla’s key result is compute-optimal scaling: for a fixed compute budget, a smaller model trained on more data can outperform much larger models. It has been shown to be capable of generating creative text formats, like poems, code, scripts, musical pieces, email, letters, etc.

TEXT TO IMAGE

Text-to-image systems use deep learning algorithms, such as

  • generative adversarial networks (GANs),
    GANs are a type of neural network that can be used to create realistic images from scratch by pitting a generator against a discriminator.
  • diffusion models, and
    Diffusion models are a type of neural network that generate images by starting from random noise and gradually removing it, having been trained to reverse a process that adds noise to images.
  • variational autoencoders (VAEs).
    In machine learning, VAEs have been used to learn latent representations of images, text, and audio. These latent representations can then be used to perform tasks such as classification, regression, and clustering.

In computer vision, VAEs have been used to generate realistic images. These images can be used to create new forms of art or to train other machine learning models.

These algorithms learn to generate images that match a given textual description by training on large datasets of image-text pairs.
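
A minimal GAN training-loop sketch in Python (illustrative; PyTorch, a 1-D toy distribution instead of images, and no text conditioning, so every component here is an assumption made for brevity). A generator learns to mimic the data distribution while a discriminator learns to tell real from generated samples; text-to-image GANs follow the same adversarial recipe, conditioned on a text embedding:

  import torch
  import torch.nn as nn

  G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))  # noise -> sample
  D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))  # sample -> logit
  opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
  opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
  bce = nn.BCEWithLogitsLoss()

  for step in range(1000):
      real = torch.randn(64, 1) * 0.5 + 2.0   # "real" data: N(2, 0.5)
      fake = G(torch.randn(64, 8))
      # Discriminator step: label real samples 1 and fake samples 0.
      loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
      opt_d.zero_grad(); loss_d.backward(); opt_d.step()
      # Generator step: try to fool the discriminator into predicting 1.
      loss_g = bce(D(fake), torch.ones(64, 1))
      opt_g.zero_grad(); loss_g.backward(); opt_g.step()

  print(G(torch.randn(256, 8)).mean().item())  # should drift towards 2.0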

DALL-E 2: Developed by OpenAI, it can create a wide range of images, including animals, objects, and even surreal scenes. Not public, expensive, and with a higher user threshold, but the best quality.

Midjourney is developed by an independent research lab of the same name. It generates high-quality images from textual descriptions and can render subjects in various poses, colors, materials, and environments, such as on a beach or in a city street. Publicly available, easy to use, and relatively cheap, but slow and sometimes of lesser quality than DALL-E.

GPT-3: Although GPT-3 is primarily known for its natural language processing capabilities, the same architecture has been adapted to image generation: the original DALL-E was a 12-billion-parameter version of GPT-3 trained to generate images from textual descriptions.

CLIP: Also developed by OpenAI, CLIP is a neural network that associates textual descriptions with images. It does not generate images itself, but it can be used to guide image-generation models and to search for images that match a given textual description.

Text2Scene: Text2Scene is a text-to-image application developed by researchers at Stanford University. It can generate 3D scenes from textual descriptions, enabling users to create immersive environments for a variety of applications.

AttnGAN: AttnGAN is a text-to-image application that uses attention mechanisms to generate high-quality images from textual descriptions. It can create a wide range of images, including animals, objects, and even natural scenes.

AUTONOMOUS DRIVING

One of the most significant applications of AI is in the field of autonomous vehicles. Companies such as Tesla, Google, and Uber are developing self-driving cars that use AI technologies to navigate and make decisions on the road. Another area of application is in the field of chatbots and virtual assistants, which use natural language processing to understand and respond to human inquiries.

PUBLIC DEBATE

The public debate surrounding AI has focused on two main areas: its potential benefits and its potential risks. On the one hand, proponents of AI argue that it has the potential to revolutionize industries, improve efficiency, and solve complex problems. They also argue that AI could help address global challenges such as climate change, food security, and healthcare.

On the other hand, critics of AI argue that it poses significant risks, including the potential for job displacement, algorithmic bias, and loss of privacy. There are concerns that AI could lead to widespread unemployment, particularly in industries that rely heavily on manual labor. There are also concerns about algorithmic bias, which can occur when AI systems are trained on biased data and produce discriminatory results. Additionally, there are concerns about the potential for AI to be used for surveillance and other nefarious purposes.

Individuals Contributing to the Public Debate

Several individuals have contributed to the public debate surrounding AI, including scientists, policymakers, and entrepreneurs. Elon Musk, the founder of SpaceX and Tesla, has been a vocal critic of AI, warning that it poses an existential threat to humanity. He has called for increased regulation of AI and warned that it could lead to a future where humans are no longer in control.

On the other hand, Andrew Ng, the founder of deeplearning.ai, has been a strong proponent of AI, arguing that it has the potential to transform industries and solve complex problems. He has called for increased investment in AI research and development and for policymakers to create a regulatory environment that fosters innovation. Deeplearning.ai has developed course specializations such as Deep Learning, Natural Language Processing, and TensorFlow: Data and Deployment.

Another notable individual contributing to the public debate is Fei-Fei Li, the co-director of the Stanford Institute for Human-Centered Artificial Intelligence. Li has been a vocal advocate for the responsible development of AI, arguing that it is essential to address issues such as algorithmic bias and transparency. She has also called for greater diversity in the development of AI technologies, highlighting the need for diverse perspectives to ensure that AI systems are fair and equitable.

MEDICAL AND SCIENCE APPLICATIONS

Mind reading? (artisana.ai, 01.05.2023)

Published in Nature Neuroscience, researchers from the University of Texas at Austin used fMRI to gather 16 hours of brain recordings from three human subjects as they listened to narrative stories.

The GPT model generated intelligible word sequences from perceived speech, imagined speech, and even silent videos with remarkable (?) accuracy:

  • Perceived speech (subjects listened to a recording): 72–82% decoding accuracy.
  • Imagined speech (subjects mentally narrated a one-minute story): 41–74% accuracy.
  • Silent movies (subjects viewed soundless Pixar movie clips): 21–45% accuracy in decoding the subject’s interpretation of the movie.

The research team conducted an additional study in which decoders trained on data from other subjects were used to decode the thoughts of new subjects. The researchers found that «decoders trained on cross-subject data performed barely above chance».

Protein folding

AlphaFold, developed by DeepMind, uses deep neural networks to learn patterns in the vast amounts of data available on protein sequences and structures, and to make predictions about the structure of a protein based on its amino acid sequence. Additional techniques include:

Convolutional neural networks: Analysing the spatial relationships between different parts of the protein sequence and identifying the most probable folding configurations.

Policy gradient descent is a type of reinforcement learning algorithm used to optimize a policy, which is a function that maps states to actions in a given environment. The goal is to learn the optimal policy that maximizes the expected cumulative reward over a sequence of actions. In policy gradient descent, the policy is represented as a neural network, with the input being the current state of the environment and the output being a probability distribution over the available actions. The algorithm learns to adjust the parameters of the neural network using gradient descent to maximize the expected cumulative reward, which is defined as the sum of the rewards obtained by taking a sequence of actions in the environment.

The gradient of the expected cumulative reward with respect to the parameters of the policy network is computed using the policy gradient theorem, which states that the gradient is proportional to the product of the reward and the log probability of the action taken by the policy. By adjusting the parameters of the policy network in the direction of the gradient, the algorithm learns to improve the policy and maximize the expected cumulative reward.

Policy gradient descent has been successfully applied to a wide range of applications in robotics, game playing, and control systems. Its ability to optimize policies directly makes it particularly useful for problems where the optimal policy is complex and difficult to specify manually. However, policy gradient descent can be sensitive to the choice of hyperparameters and may suffer from high variance due to the stochastic nature of the algorithm.
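
A minimal policy-gradient (REINFORCE-style) sketch in Python (illustrative; the two-armed bandit environment and all numbers are made up, and a linear softmax policy stands in for the neural network described above). The update steps the policy parameters in the direction of reward times the gradient of the log-probability of the chosen action, as the policy gradient theorem suggests:

  import numpy as np

  rng = np.random.default_rng(0)
  theta = np.zeros(2)   # one preference per action
  lr = 0.1

  def softmax(x):
      e = np.exp(x - x.max())
      return e / e.sum()

  for step in range(2000):
      probs = softmax(theta)
      action = rng.choice(2, p=probs)
      # Hypothetical environment: arm 1 pays more on average than arm 0.
      reward = rng.normal(1.0 if action == 1 else 0.2)
      # Gradient of log pi(action) for a softmax policy: one-hot minus probs.
      grad_log_pi = -probs
      grad_log_pi[action] += 1.0
      theta += lr * reward * grad_log_pi

  print(softmax(theta))  # probability mass should concentrate on arm 1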

Monte Carlo Tree Search: used to explore the vast space of possible protein structures and select the most probable ones. MCTS is a decision-making algorithm used to solve complex problems, such as game playing, resource allocation, and optimization. It uses a combination of random simulation and strategic selection to efficiently explore a large search space of possible solutions and identify the best one.
MCTS builds a tree structure that represents the possible moves and outcomes of a given problem and uses it to guide the search for the best solution. At each step of the algorithm, the tree is traversed using a selection strategy that balances exploration of unexplored regions of the tree with exploitation of regions that have shown promise in the past.

Once a leaf node is reached, the algorithm uses random simulation to generate a set of possible outcomes from that point, and then evaluates each outcome using a scoring function. The scores are then propagated up the tree to update the statistics of each node, and the process repeats until a solution is found or a time limit is reached.

MCTS has been successfully applied in many domains, including game playing (e.g., AlphaGo and AlphaZero), optimization (e.g., vehicle routing), and decision-making under uncertainty (e.g., robot planning). Its ability to effectively balance exploration and exploitation makes it a powerful tool for solving problems with large search spaces, where other methods may struggle to find the optimal solution.
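
A compact MCTS sketch in Python (illustrative; the toy problem, where each «move» appends a bit and the reward is the fraction of 1-bits in a length-8 string, is made up to keep the four phases visible):

  import math, random

  class Node:
      def __init__(self, state, parent=None):
          self.state, self.parent = state, parent
          self.children, self.visits, self.value = {}, 0, 0.0

  def uct(child, parent, c=1.4):
      # Selection score: exploitation term plus exploration bonus.
      return (child.value / child.visits
              + c * math.sqrt(math.log(parent.visits) / child.visits))

  def mcts(root, iterations=500):
      for _ in range(iterations):
          node = root
          # 1. Selection: descend while the node is fully expanded.
          while len(node.state) < 8 and len(node.children) == 2:
              node = max(node.children.values(), key=lambda ch: uct(ch, node))
          # 2. Expansion: add one untried child move.
          if len(node.state) < 8:
              move = random.choice([b for b in "01" if b not in node.children])
              node.children[move] = Node(node.state + move, node)
              node = node.children[move]
          # 3. Simulation: random rollout to a terminal state, then score it.
          rollout = node.state + "".join(random.choice("01")
                                         for _ in range(8 - len(node.state)))
          reward = rollout.count("1") / 8
          # 4. Backpropagation: update statistics up to the root.
          while node:
              node.visits += 1
              node.value += reward
              node = node.parent
      return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

  print(mcts(Node("")))  # the most-visited first move should be "1"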

WHO ARE THEY?

DeepMind Technologies

Founded in 2010 in the UK by Demis Hassabis, Shane Legg, and Mustafa Suleyman; acquired by Google in 2014. In 2016 they became known for AlphaGo, a neural network built to learn games. AlphaZero is a more general version, which among other things plays chess.

In 2020 they launched AlphaFold, which can compute protein folding. As of 2022 they are working on Flamingo, a Visual Language Model that can describe images. Other well-known projects are Gato, a generalist program that can learn several things at once; AlphaCode, which can write code; the language model Chinchilla; the chatbot Sparrow; and AlphaStar, which plays StarCraft II.

The Battleground

Those against
Eliezer Yudkowsky (autodidact)
Geoffrey Hinton (psychologist, formerly Google Brain)
Sam Altman (OpenAI)
Demis Hassabis (DeepMind)
Stuart Russell (prof. CS at UC Berkeley)

Those in the middle
Gary Marcus (author)

Those in favour
Yann LeCun (chief AI scientist, Meta)
Pedro Domingos (prof CS at UW)

John Schulman, working at Berkeley on the TruthGPT project, which uses RL to make AI more truthful.