
Llama.cpp

llama.cpp is an open-source software library, mostly written in C++, that performs inference on various large language models such as Llama. [1]

Table of Contents

  1. 40 relations: Advanced Vector Extensions, Android (operating system), Apple silicon, AVX-512, Bfloat16 floating-point format, BLOOM (language model), C (programming language), C++, Central processing unit, Command-line interface, DBRX, Fabrice Bellard, Fine-tuning (deep learning), Gemini (language model), GitHub, GPT-2, Graphics processing unit, Grok (chatbot), Half-precision floating-point format, Inference engine, Justine Tunney, Large language model, Library (computing), Llama (language model), Machine learning, Mamba (deep learning architecture), MIT License, Mozilla, Open source, OpenAI, PyTorch, Quantization (signal processing), Single-precision floating-point format, SYCL, Tensor (machine learning), Tensor algebra, Vulkan, Web server, Whisper (speech recognition system), X86-64.

  2. Large language models
  3. Open-source artificial intelligence

Advanced Vector Extensions

Advanced Vector Extensions (AVX, also known as Gesher New Instructions and then Sandy Bridge New Instructions) are SIMD extensions to the x86 instruction set architecture for microprocessors from Intel and Advanced Micro Devices (AMD).

See Llama.cpp and Advanced Vector Extensions

Android (operating system)

Android is a mobile operating system based on a modified version of the Linux kernel and other open-source software, designed primarily for touchscreen mobile devices such as smartphones and tablets.

See Llama.cpp and Android (operating system)

Apple silicon

Apple silicon refers to a series of system on a chip (SoC) and system in a package (SiP) processors designed by Apple Inc., mainly using the ARM architecture.

See Llama.cpp and Apple silicon

AVX-512

AVX-512 are 512-bit extensions to the 256-bit Advanced Vector Extensions SIMD instructions for the x86 instruction set architecture (ISA), proposed by Intel in July 2013, first implemented in the 2016 Intel Xeon Phi x200 (Knights Landing), and later in a number of AMD and other Intel CPUs.

See Llama.cpp and AVX-512

Bfloat16 floating-point format

The bfloat16 (brain floating point) floating-point format is a computer number format occupying 16 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point.

See Llama.cpp and Bfloat16 floating-point format
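
The layout described above can be illustrated with a short Python sketch. It assumes the common convention that a bfloat16 value is the upper 16 bits of an IEEE 754 single-precision value (1 sign bit, 8 exponent bits, 7 fraction bits) and uses plain truncation rather than round-to-nearest. The function names are hypothetical and are not taken from llama.cpp.

```python
import struct


def float32_to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x by truncating a float32."""
    # Reinterpret the float32 encoding of x as an unsigned 32-bit integer.
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    # Keep sign, exponent and the top 7 fraction bits; drop the low 16 bits.
    return bits32 >> 16


def bfloat16_bits_to_float(bits16: int) -> float:
    """Expand a 16-bit bfloat16 pattern back to a Python float."""
    # Pad the dropped fraction bits with zeros and reinterpret as float32.
    (value,) = struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))
    return value


def roundtrip(x: float) -> float:
    """Value of x after being stored as bfloat16 (truncating conversion)."""
    return bfloat16_bits_to_float(float32_to_bfloat16_bits(x))
```

Because the exponent field is as wide as in single precision, a large value such as 1e38 survives the round trip with a small relative error, while only about two to three decimal digits of precision remain.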

BLOOM (language model)

BigScience Large Open-science Open-access Multilingual Language Model (BLOOM) is a 176-billion-parameter transformer-based autoregressive large language model (LLM). Both Llama.cpp and BLOOM (language model) are categorized under large language models.

See Llama.cpp and BLOOM (language model)

C (programming language)

C (pronounced like the letter c) is a general-purpose programming language.

See Llama.cpp and C (programming language)

C++

C++ (pronounced "C plus plus" and sometimes abbreviated as CPP) is a high-level, general-purpose programming language created by Danish computer scientist Bjarne Stroustrup.

See Llama.cpp and C++

Central processing unit

A central processing unit (CPU), also called a central processor, main processor, or just processor, is the most important processor in a given computer.

See Llama.cpp and Central processing unit

Command-line interface

A command-line interface (CLI) is a means of interacting with a computer program by inputting lines of text called command lines.

See Llama.cpp and Command-line interface

DBRX

DBRX is an open-source large language model (LLM) developed by the Mosaic ML team at Databricks, released on March 27, 2024. Both Llama.cpp and DBRX are categorized under large language models.

See Llama.cpp and DBRX

Fabrice Bellard

Fabrice Bellard (born 1972) is a French computer programmer known for writing FFmpeg, QEMU, and the Tiny C Compiler.

See Llama.cpp and Fabrice Bellard

Fine-tuning (deep learning)

In deep learning, fine-tuning is an approach to transfer learning in which the parameters of a pre-trained model are trained on new data.

See Llama.cpp and Fine-tuning (deep learning)

Gemini (language model)

Google Gemini is a family of multimodal large language models developed by Google DeepMind, serving as the successor to LaMDA and PaLM 2. Both Llama.cpp and Gemini (language model) are categorized under large language models.

See Llama.cpp and Gemini (language model)

GitHub

GitHub is a developer platform that allows developers to create, store, manage and share their code.

See Llama.cpp and GitHub

GPT-2

Generative Pre-trained Transformer 2 (GPT-2) is a large language model by OpenAI and the second in their foundational series of GPT models. Both Llama.cpp and GPT-2 are categorized under large language models.

See Llama.cpp and GPT-2

Graphics processing unit

A graphics processing unit (GPU) is a specialized electronic circuit initially designed for digital image processing and to accelerate computer graphics, being present either as a discrete video card or embedded on motherboards, mobile phones, personal computers, workstations, and game consoles.

See Llama.cpp and Graphics processing unit

Grok (chatbot)

Grok is a generative artificial intelligence chatbot developed by xAI.

See Llama.cpp and Grok (chatbot)

Half-precision floating-point format

In computing, half precision (sometimes called FP16 or float16) is a binary floating-point computer number format that occupies 16 bits (two bytes in modern computers) in computer memory.

See Llama.cpp and Half-precision floating-point format
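
As a small illustration of the 16-bit format described above, the following Python sketch uses the standard library's `struct` module, whose `"e"` format code packs IEEE 754 binary16 values. The helper names are hypothetical and serve only this example.

```python
import struct


def float_to_half_bits(x: float) -> int:
    """Return the 16-bit IEEE 754 half-precision pattern for x."""
    # "e" packs a binary16 value; "H" reads the same two bytes as an integer.
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    return bits


def half_roundtrip(x: float) -> float:
    """Value of x after being stored in two bytes as half precision."""
    (value,) = struct.unpack("<e", struct.pack("<e", x))
    return value


def fits_in_half(x: float) -> bool:
    """Report whether x can be packed as a finite half-precision value."""
    try:
        struct.pack("<e", x)
        return True
    except OverflowError:
        # struct raises OverflowError when x is too large for binary16.
        return False
```

Storing 0.1 this way yields a nearby value rather than 0.1 itself, and values beyond the largest finite half-precision number (65504) cannot be packed at all, which shows the precision and range that are traded for the two-byte size.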

Inference engine

In the field of artificial intelligence, an inference engine is a software component of an intelligent system that applies logical rules to the knowledge base to deduce new information.

See Llama.cpp and Inference engine

Justine Tunney

Justine Alexandra Roberts Tunney (born 1984) is an American software developer and a former activist for Occupy Wall Street.

See Llama.cpp and Justine Tunney

Large language model

A large language model (LLM) is a computational model notable for its ability to achieve general-purpose language generation and other natural language processing tasks such as classification. Llama.cpp is categorized under large language models.

See Llama.cpp and Large language model

Library (computing)

In computer science, a library is a collection of read-only resources that is leveraged during software development to implement a computer program.

See Llama.cpp and Library (computing)

Llama (language model)

Llama (acronym for Large Language Model Meta AI, and formerly stylized as LLaMA) is a family of autoregressive large language models (LLMs) released by Meta AI starting in February 2023. Both Llama.cpp and Llama (language model) are categorized under large language models.

See Llama.cpp and Llama (language model)

Machine learning

Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can learn from data and generalize to unseen data and thus perform tasks without explicit instructions.

See Llama.cpp and Machine learning

Mamba (deep learning architecture)

Mamba is a deep learning architecture focused on sequence modeling.

See Llama.cpp and Mamba (deep learning architecture)

MIT License

The MIT License is a permissive software license originating at the Massachusetts Institute of Technology (MIT) in the late 1980s.

See Llama.cpp and MIT License

Mozilla

Mozilla (stylized as moz://a) is a free software community founded in 1998 by members of Netscape.

See Llama.cpp and Mozilla

Open source

Open source is source code that is made freely available for possible modification and redistribution. Both Llama.cpp and Open source are categorized under free and open-source software.

See Llama.cpp and Open source

OpenAI

OpenAI is an American artificial intelligence (AI) research organization founded in December 2015 and headquartered in San Francisco, California.

See Llama.cpp and OpenAI

PyTorch

PyTorch is a machine learning library based on the Torch library, used for applications such as computer vision and natural language processing, originally developed by Meta AI and now part of the Linux Foundation umbrella. Both Llama.cpp and PyTorch are categorized under open-source artificial intelligence.

See Llama.cpp and PyTorch

Quantization (signal processing)

Quantization, in mathematics and digital signal processing, is the process of mapping input values from a large set (often a continuous set) to output values in a (countable) smaller set, often with a finite number of elements.

See Llama.cpp and Quantization (signal processing)
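
The mapping from a large set of values to a small finite set can be sketched as follows. This is a minimal, generic example of symmetric quantization with one scale per block of numbers; it is an assumption made for illustration and does not reproduce any specific quantization format used by llama.cpp. All names are hypothetical.

```python
def quantize_block(values, bits=8):
    """Map floats to signed integers in [-qmax, qmax] sharing one scale."""
    qmax = 2 ** (bits - 1) - 1          # 127 for 8 bits
    amax = max(abs(v) for v in values)  # largest magnitude in the block
    scale = amax / qmax if amax > 0 else 1.0
    # Round each value to the nearest step and clamp to the integer range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return scale, q


def dequantize_block(scale, q):
    """Recover approximate floats from the integers and the shared scale."""
    return [scale * x for x in q]
```

With this scheme each reconstructed value differs from the original by at most half a quantization step (`scale / 2`), so fewer bits mean a coarser step and a larger worst-case error.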

Single-precision floating-point format

Single-precision floating-point format (sometimes called FP32 or float32) is a computer number format, usually occupying 32 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point.

See Llama.cpp and Single-precision floating-point format
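
To make the 32-bit layout concrete, the sketch below splits a value into the three fields of the IEEE 754 binary32 encoding (1 sign bit, 8 exponent bits, 23 fraction bits) using Python's standard `struct` module. The function names are hypothetical.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (double precision) to the nearest float32."""
    (value,) = struct.unpack("<f", struct.pack("<f", x))
    return value


def float32_fields(x: float):
    """Return (sign, biased exponent, fraction) of x encoded as float32."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, stored with a bias of 127
    fraction = bits & 0x7FFFFF        # 23 bits
    return sign, exponent, fraction
```

For example, -2.5 equals -1.25 × 2^1, so its fields are sign 1, biased exponent 128 (1 + 127) and a fraction encoding 0.25.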

SYCL

SYCL (pronounced "sickle") is a higher-level programming model to improve programming productivity on various hardware accelerators.

See Llama.cpp and SYCL

Tensor (machine learning)

Tensor informally refers in machine learning to two different concepts that organize and represent data.

See Llama.cpp and Tensor (machine learning)

Tensor algebra

In mathematics, the tensor algebra of a vector space V, denoted T(V) or T•(V), is the algebra of tensors on V (of any rank) with multiplication being the tensor product.

See Llama.cpp and Tensor algebra

Vulkan

Vulkan is a low-level, low-overhead cross-platform API and open standard for 3D graphics and computing.

See Llama.cpp and Vulkan

Web server

A web server is computer software and underlying hardware that accepts requests via HTTP (the network protocol created to distribute web content) or its secure variant HTTPS.

See Llama.cpp and Web server

Whisper (speech recognition system)

Whisper is a machine learning model for speech recognition and transcription, created by OpenAI and first released as open-source software in September 2022.

See Llama.cpp and Whisper (speech recognition system)

X86-64

x86-64 (also known as x64, x86_64, AMD64, and Intel 64) is a 64-bit version of the x86 instruction set, first announced in 1999.

See Llama.cpp and X86-64

See also

Large language models

Open-source artificial intelligence

References

[1] https://en.wikipedia.org/wiki/Llama.cpp

Also known as GGUF.