githubEdit

LLM


Large language models: A survey

Here's a breakdown of each section of the survey paper on LLMs, starting with the Introduction and proceeding through to Challenges and Future Directions:

1. Introduction

  • Historical Context: The introduction covers the evolution of language models from the early statistical approaches, such as n-grams, to neural networks that introduced embeddings and contextual language understanding. These early developments laid the groundwork for today's large language models (LLMs), which can process and generate human-like language.

  • Significance of Transformers: The advent of transformer architecture, particularly with the self-attention mechanism, allowed LLMs to handle long-term dependencies in text more effectively. This section emphasizes that transformers enabled parallel processing, which sped up model training on large datasets, making modern LLMs possible.

  • General-Purpose AI Potential: The introduction concludes with a vision for LLMs as foundational models for artificial general intelligence (AGI), given their abilities in language generation, instruction-following, and multi-step reasoning across diverse tasks.

2. Large Language Models Overview

  • LLM Families: This section highlights three main families of LLMs: GPT, LLaMA, and PaLM, each with unique characteristics and performance improvements:

    • GPT Family: Initially developed by OpenAI, the GPT series has progressed through GPT-1, GPT-2, GPT-3, and GPT-4, showcasing advanced natural language processing (NLP) abilities like in-context learning and few-shot adaptation. GPT-3’s emergence abilities (e.g., general-purpose reasoning) and GPT-4's multi-modal capabilities have set new benchmarks.

    • LLaMA Family: Meta’s LLaMA models are open-source, offering a research-friendly alternative to GPT. LLaMA models use efficient architectures with fewer parameters and various optimizations, like rotary positional embeddings, for efficient training. LLaMA-2 includes advanced dialogue capabilities with fine-tuning on human feedback.

    • PaLM Family: Developed by Google, the PaLM models focus on massive scaling and efficient training with the Pathways system. PaLM-2 introduced further efficiency, with an emphasis on multilingual capabilities and reasoning tasks. Specialized versions like Med-PaLM target domain-specific tasks, such as healthcare.

  • Model Types and Architectures: The models fall into categories like encoder-only (for understanding tasks), decoder-only (for generation tasks), and encoder-decoder (for sequence-to-sequence tasks). Different architectures are optimized for specific types of NLP tasks, from text classification to language generation and machine translation.

3. Building LLMs

  • Data Preprocessing: LLMs require massive amounts of high-quality text data. This section explains the importance of cleaning (filtering noise, deduplication, and handling imbalances) to improve performance. Deduplication, in particular, prevents overfitting by ensuring models don’t overlearn patterns from repeated data.

  • Tokenization: Tokenization divides text into manageable chunks. Popular techniques include Byte-Pair Encoding (BPE), WordPiece, and SentencePiece, which handle out-of-vocabulary words and work well across multiple languages.

  • Pre-training and Fine-tuning: LLMs are first pre-trained on vast datasets in a self-supervised fashion using techniques like autoregressive (next-token prediction) or masked language modeling (filling in missing words). Fine-tuning, including Instruction Tuning, adapts the model for specific tasks or domains. For instance, reinforcement learning from human feedback (RLHF) enables models like InstructGPT and ChatGPT to align with user expectations.

  • Model Efficiency: To address the computational demands of LLMs, this section covers advancements like knowledge distillation (to create smaller, efficient versions of LLMs), quantization (reducing model precision), and sparse training methods like Mixture of Experts (MoE), which activate only relevant parts of the model per task, saving computational resources.

4. Using and Augmenting LLMs

  • Prompt Engineering: Effective use of LLMs often starts with careful prompt design. This section explores prompt tuning, in-context learning (providing examples within the prompt), and few-shot learning, where the model adapts to new tasks with minimal training data.

  • Retrieval-Augmented Generation (RAG): LLMs can be combined with external knowledge sources to enhance accuracy. RAG uses a retrieval system to fetch relevant documents, which the LLM then references during generation. This approach is popular for question answering and information retrieval tasks.

  • Tool Integration: LLMs are increasingly integrated with external tools like calculators or databases to perform actions beyond text generation. This includes API Calling and Function Calling, where the model can execute external code or access real-time data for interactive tasks.

  • LLM-Based Agents: Advanced applications use LLMs as AI agents that interact with dynamic environments, making decisions and taking actions. By continually learning from interactions, these agents can improve over time, often using reinforcement learning techniques.

5. Datasets and Benchmarks

  • Dataset Categories: Datasets are grouped based on the tasks they support:

    • Basic Tasks: Datasets for foundational tasks like language modeling, text understanding, and text generation.

    • Emergent Abilities: Datasets that challenge LLMs on in-context learning, reasoning, and instruction following, designed to assess higher-order capabilities.

    • Augmented Tasks: Datasets that incorporate tools or external knowledge sources, testing how well LLMs can access and use external information.

  • Benchmarks: This section also covers popular benchmarks used to evaluate LLM performance, such as GLUE, SQuAD, and BIG-Bench. Each benchmark measures a different set of skills, from text classification to comprehension and reasoning, often across multiple languages or domains.

6. LLM Performance

  • Comparison of Models: Performance comparisons among various LLMs on benchmarks highlight which models excel in specific tasks. For example, GPT-4 is noted for its multi-modal abilities, while LLaMA-2 and PaLM-2 excel in multilingual and dialogue tasks.

  • Evaluation Metrics: Common metrics include BLEU (for language translation), F1 score (for classification tasks), and perplexity (for language generation quality). Human evaluation is also essential, particularly for tasks requiring nuance, like dialogue generation and ethical content moderation.

7. Challenges and Future Directions

  • Efficiency and Scalability: While LLMs are highly capable, they demand immense computational resources. Future research focuses on creating more compute-efficient models, such as those using quantization, MoE, or knowledge distillation.

  • Ethics and Safety: The rapid deployment of LLMs raises concerns about bias, misinformation, and ethical considerations. The paper stresses the importance of responsible AI practices, such as fairness in training data and transparency about model limitations.

  • Multi-modal and Cross-Modal Models: Expanding LLMs to handle other data types (e.g., images, audio, and video) will enable richer AI applications. Multi-modal LLMs, like KOSMOS and Gemini, are early steps in this direction.

  • Enhanced Augmentation Techniques: Improving how LLMs use external knowledge and tools is a priority. Future LLMs could be more autonomous, effectively utilizing online databases or APIs to provide real-time information and even perform complex multi-step tasks.

  • Security: Ensuring secure use of LLMs involves safeguarding against adversarial attacks and unauthorized use, along with monitoring models for unintended harmful outputs.

This survey provides a comprehensive look at the current landscape, challenges, and potential future directions for LLM research. Let me know if there’s a specific section where you’d like more detail!


StockGPT

Abstract

  • Objective: Introduction of StockGPT, an AI model trained on U.S. stock returns to predict future stock performance.

  • Performance: StockGPT demonstrates significant returns through portfolios based on its predictions, outperforming traditional stock factors.

  • Implication: Shows generative AI’s potential in surpassing human-designed strategies in complex financial decisions.


1. Introduction

  • GenAI Impact: Generative AI (e.g., ChatGPT) is transforming sectors, including finance, with advanced predictive capabilities.

  • StockGPT Concept: Trained directly on numeric stock data, StockGPT differs from prior models trained on financial text data.

  • Advantages of StockGPT:

    • Learns directly from price data, bypassing reliance on news.

    • Provides predictions at each time point for each stock.

    • Offers a probability distribution of future returns, not just a single estimate.


2. Model Architecture

2.1 Overview

  • Structure: StockGPT uses a decoder-only transformer (similar to GPT) to predict the next value in a sequence of returns.

  • Training Objective: Model learns by minimizing the difference between its predictions and actual values.

2.2 Details

  • Embeddings: Uses token and positional embeddings to give context and structure to sequences.

  • Attention Mechanism: Enables the model to focus on relevant past data points.

  • Multi-Head Attention: StockGPT combines multiple attention heads to understand patterns in sequences.

2.3 StockGPT Specifics

  • Data Discretization: Converts continuous returns into discrete tokens for model processing.

  • Model Parameters:

    • Vocabulary size: 402 tokens, sequence length: 256, embedding size: 128.

    • 4 attention blocks, each with 4 self-attention heads.

  • Training and Testing:

    • Trained on stock data from 1926-2000; tested on data from 2001-2023.

    • Daily Forecast: Uses the previous 256 days to predict the next-day return.


3. Data

  • Source: U.S. stock return data from the Center for Research in Security Prices (CRSP).

  • Training and Testing Split:

    • Training data (1926-2000).

    • Testing data (2001-2023), includes stocks from NYSE, AMEX, and NASDAQ.


4. Results: Daily Prediction

4.1 Fama–MacBeth Regression

  • Objective: Evaluate accuracy of StockGPT predictions.

  • Results: Predictions closely align with realized returns, outperforming previous language model-based predictions.

4.2 Portfolio Sorting

  • Method: Long-short portfolios based on StockGPT predictions.

  • Performance:

    • Equal-weighted Portfolio: Annualized returns of 119% with a Sharpe ratio of 6.5.

    • Value-weighted Portfolio: Returns of 27%, indicating better predictability for small-cap stocks.

4.3 Spanning Test

  • Comparison: Assesses overlap with known price-based strategies.

  • Outcome: StockGPT-based portfolios span traditional strategies (e.g., momentum, reversals) and stock market factors.


5. Results: Monthly Prediction

5.1 Fama–MacBeth Regression

  • Evaluation: Assesses monthly prediction accuracy.

  • Result: Significant prediction accuracy for monthly returns, outperforming some standard stock factors.

5.2 Portfolio Sorting

  • Method: Monthly rebalanced long-short portfolios.

  • Performance:

    • Equal-weighted Portfolio: 13% annual return with a Sharpe ratio of 1.

    • Comparison: Outperformed by factors like short-term reversal, indicating potential for practical monthly strategies.

5.3 Spanning Test

  • Significance: The model’s monthly portfolio has a substantial alpha against standard factors, indicating unique predictive power.


6. Conclusion

  • StockGPT Potential: StockGPT performs well even on out-of-sample data, validating AI’s efficacy in stock prediction.

  • Future Enhancements:

    • Frequent retraining.

    • Expansion with higher data granularity and more complex architecture.

    • Potential with high-frequency data.


This summary should provide a clear, point-wise overview of each section. Let me know if you'd like a more detailed breakdown of any part.


A Survey of Time Series Foundation Models: Generalizing Time Series Representation with Large Language Model

1. Introduction

The paper begins by emphasizing the importance of time series data across multiple fields such as finance, healthcare, and IoT, where traditional models have shown limitations in flexibility and transferability. It introduces two main strategies to generalize time series representations: building models from scratch or adapting large language models (LLMs) for time series. To guide this review, the paper presents a 3E framework (Effectiveness, Efficiency, and Explainability) and introduces a domain taxonomy that helps readers understand advancements across specific fields.

This section contrasts past reviews on foundation models and time series analysis. It highlights that prior studies explored the application of LLMs in time series but lacked a detailed look into both large foundational time series models and explainability. The current survey addresses this gap by providing a comprehensive comparison across Effectiveness, Efficiency, Explainability, and Domain taxonomy.

3. Preliminary Concepts

  • Foundation Models: Defined as models pre-trained on large datasets and fine-tuned for downstream tasks, demonstrating generalization and adaptability. The paper notes their successful application in fields like computer vision and NLP.

  • Time Series Analysis: Discusses multivariate and univariate time series, along with popular tasks like classification, forecasting, imputation, and anomaly detection.

  • Time Series Properties: Key properties include temporal dependency, spatial dependency (especially relevant for multivariate series), and semantics diversity (different representations based on context).

4. Pre-training Foundation Models for Time Series

This section examines key steps for creating foundational time series models from scratch, covering:

  • Data Processing: Discusses methods for dataset alignment (like Z-score normalization), augmentation, and data quality control.

  • Architecture Design: Explores backbone models, emphasizing transformers as effective choices for processing long-range dependencies in time series data. The paper also discusses transformer modes and channel settings (channel-independence vs. channel-mixing).

5. Adapting Large Language Models for Time Series

The paper classifies time series adaptations into two categories:

  • Embedding-Visible LLM Adaption: Time series are transformed into vector sequences. Critical issues discussed include aligning time series embeddings with LLM semantic space and addressing time series-specific properties like temporal patterns, multivariate dependencies, and timestamp information.

  • Text-Visible LLM Adaption: Time series tasks are reformulated in natural language, allowing for the use of zero-shot and few-shot learning capabilities of LLMs. Prompting is essential here, often requiring design techniques like chain-of-thought (CoT) or in-context learning.

6. Efficiency of Model Fine-Tuning and Inference

This section assesses techniques to improve the efficiency of time series models, from fully fine-tuning to parameter-efficient methods. Highlighted strategies include tuning-free paradigms and various fine-tuning options, balancing computation demands with performance.

7. Explainability of Foundation Models

Explainability is explored through both local and global explanations. Techniques for enhancing model transparency include visualization, explanations based on temporal dependencies, and CoT, which articulates a step-by-step reasoning process.

8. Advancements in Application Domains

A taxonomy categorizes advancements by field—general domains, healthcare, finance, traffic, and others. This section also introduces open-source tools, datasets, and libraries for accelerating research.

9. Resources

Finally, this section compiles essential resources like codebases, benchmarks, and time series libraries, supporting continued research in time series analysis and model training.


This detailed summary covers each section's key points and insights into time series foundation models and their development through pre-training and LLM adaptation. Let me know if you'd like a deeper dive into any specific part!

Last updated