Infermatic (https://infermatic.ai/)

Guide to quant FP8
https://infermatic.ai/guide-to-quant-fp8/ (Tue, 27 Aug 2024)

Simple Guide to Convert an FP16 Model to FP8

Overview

This simple guide to quant models walks you through converting a model from FP16 to FP8, an 8-bit data format that significantly improves model inference efficiency without sacrificing output quality. FP8 is ideal for quantizing large language models (LLMs), ensuring faster and more cost-effective deployments.

Requirements for quantization

  • VM with GPUs: Ensure your VM has enough GPUs (and GPU memory) to load the FP16 model during conversion.
  • Supported GPU Architectures: The conversion process requires GPUs with NVIDIA Ada Lovelace or Hopper architectures, such as the L4 or H100 GPUs.

Step 1: Setup the Environment

  1. Access your VM or GPU environment and open a terminal.
  2. Install Python and Pip:
    sudo apt install python3-pip
  3. Install the required Python packages:
    pip install transformers
    pip install -U "huggingface_hub[cli]"
  4. Clone the AutoFP8 repository:
    git clone https://github.com/neuralmagic/AutoFP8.git
  5. Navigate to the AutoFP8 directory:
    cd AutoFP8
  6. Install AutoFP8:
    pip install -e .

Step 2: Download the FP16 Model

In a new terminal, use the Hugging Face CLI to download the FP16 model:

huggingface-cli download [modelName]

Step 3: Quantize the Model to FP8

  1. Open the quantize_model.py script in a text editor:
    nano quantize_model.py
  2. Modify the script to reference the downloaded model name:
    from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig
    
    pretrained_model_dir = "meta-llama/Meta-Llama-3-8B-Instruct"
    quantized_model_dir = "Meta-Llama-3-8B-Instruct-FP8-Dynamic"
    
    # Define quantization config with dynamic activation scales
    quantize_config = BaseQuantizeConfig(quant_method="fp8", activation_scheme="dynamic")
    # For dynamic activation scales, there is no need for calibration examples
    examples = []
    
    # Load the model, quantize, and save checkpoint
    model = AutoFP8ForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)
    model.quantize(examples)
    model.save_quantized(quantized_model_dir)
  3. Run the quantization script:
    python3 quantize_model.py
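
Note: the script above uses dynamic activation scales, so it needs no calibration data. If you prefer static activation scales (often slightly faster at inference, since the scales are fixed ahead of time), AutoFP8 can calibrate on a handful of examples instead. The sketch below is based on AutoFP8's published examples; the calibration sentence and tokenizer arguments are illustrative assumptions, not required values.

from transformers import AutoTokenizer
from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig

pretrained_model_dir = "meta-llama/Meta-Llama-3-8B-Instruct"
quantized_model_dir = "Meta-Llama-3-8B-Instruct-FP8"

# Static scheme: activation scales are calibrated once, up front
quantize_config = BaseQuantizeConfig(quant_method="fp8", activation_scheme="static")

# A few representative prompts are enough for calibration (illustrative)
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir)
examples = tokenizer(
    ["The quick brown fox jumps over the lazy dog."],
    return_tensors="pt",
).to("cuda")

model = AutoFP8ForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)
model.quantize(examples)
model.save_quantized(quantized_model_dir)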

Step 4: Upload the Quantized FP8 Model

  1. Log in to Hugging Face:
    huggingface-cli login
  2. Paste your Hugging Face token when prompted.
  3. Navigate to the model’s weight directory:
    cd [path_to_model_weights]
  4. Upload the FP8 model:
    huggingface-cli upload [modelName]

Conclusion

You have successfully converted your FP16 model to FP8 and uploaded it to Hugging Face. This conversion enables faster and more efficient inference, especially for large language models.

Check out our FP8 models.

Understanding FP8 Quantization

TL;DR: FP8 is an 8-bit data format that offers an alternative to INT8 for quantizing LLMs. Thanks to its higher dynamic range, FP8 is suitable for quantizing more of an LLM’s components, most notably its activations, making inference faster and more efficient. FP8 quantization is also safer for smaller models, like 7B parameter LLMs, than INT8 quantization, offering better performance improvements with less degradation of output quality.

An Introduction to Floating Point Numbers

Floating point number formats were a revelation in the math that underpins computer science, and their history stretches back over 100 years. Today, floating point number formats are codified in the IEEE 754-2019 spec, which sets international standards for how floating point numbers are expressed.

A floating point number has 3 parts:

  • Sign: A single bit indicating if the number is positive or negative.
  • Range (Exponent): The exponent, which scales the number and determines how large or small it can be.
  • Precision (Mantissa): The significant digits of the number.

In contrast, an integer representation is mostly significant digits (precision). It may or may not have a sign bit depending on the format, but no exponent.

FP8 vs INT8 Data Formats

FP8 and INT8 are both 8-bit values, but the way they use those bits determines their utility as data formats for model inference. Here’s a comparison of the dynamic range of each format:

  • INT8 dynamic range: 2^8
  • E4M3 FP8 dynamic range: 2^18
  • E5M2 FP8 dynamic range: 2^32

This higher dynamic range means that when FP16 values are mapped to FP8, more of them remain distinguishable, so more of the information encoded in the model parameters is retained. This is what makes FP8 quantization more reliable than INT8 for smaller models.
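
These figures follow from each format's largest and smallest representable positive values. As a rough sanity check (not part of the conversion workflow), the sketch below recomputes them, assuming the standard FP8 limits: E4M3 maxes out at 448 with smallest subnormal 2^-9, E5M2 at 57344 with smallest subnormal 2^-16, and INT8 has 256 distinct levels:

import math

# Dynamic range as (largest magnitude) / (smallest positive value)
e4m3 = 448 / 2**-9        # ~2^17.8, i.e. roughly 2^18
e5m2 = 57344 / 2**-16     # ~2^31.8, i.e. roughly 2^32
int8_levels = 256         # 2^8 distinct representable values

print(f"E4M3: ~2^{math.log2(e4m3):.1f}")
print(f"E5M2: ~2^{math.log2(e5m2):.1f}")
print(f"INT8: 2^{math.log2(int8_levels):.0f} levels")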

Applying FP8 in Production

In practice, FP8 enables quantizing not just an LLM’s weights but also the activations and KV cache, avoiding expensive calculations in FP16 during model inference. FP8 is supported on latest-generation GPUs such as the NVIDIA H100 GPU, where alongside other optimizations, it can deliver remarkable performance with minimal quality degradation.

Alternative with vLLM: Quick Start with Online Dynamic Quantization

Dynamic quantization of an original precision BF16/FP16 model to FP8 can be achieved with vLLM without any calibration data required. You can enable the feature by specifying --quantization="fp8" in the command line or setting quantization="fp8" in the LLM constructor.

In this mode, all Linear modules (except for the final lm_head) have their weights quantized down to FP8_E4M3 precision with a per-tensor scale. Activations have their minimum and maximum values calculated during each forward pass to provide a dynamic per-tensor scale for high accuracy. As a result, latency improvements are limited in this mode.
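
As a minimal sketch, enabling this mode from Python looks like the following. The model name is only an example; any FP16/BF16 checkpoint on a supported GPU works:

from vllm import LLM, SamplingParams

# Weights are quantized to FP8_E4M3 at load time; activation scales are computed on the fly
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct", quantization="fp8")

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Explain FP8 quantization in one sentence."], params)
print(outputs[0].outputs[0].text)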

vLLM Quantization Documentation

L3-70B-Euryale-v2.1
https://infermatic.ai/l3-70b-euryale-v2-1/ (Tue, 20 Aug 2024)

Meet L3 70B Euryale v2.1: Your New Creative Companion

What is L3 70B Euryale v2.1?

L3 70B Euryale v2.1 is a text generation model, currently ranked as one of the best RP/story-writing models. As its creator Sao10K describes it, it is the big sister of L3 Stheno v3 8B: a stronger, smarter, and more aware model.

What Can L3 70B Euryale v2.1 Do?

Euryale is all about creativity and interaction. Here are some cool things she can help you with:

  • Storytelling and Writing: Need help with writing a story or creating a world for your next big novel? Euryale can spin tales, build worlds, and even help with dialogue that feels natural and engaging.
  • Role-Playing Adventures: Whether you’re into role-playing games or just want to dive into a character-driven story, Euryale can play along, adapting to different characters and scenarios with ease.
  • Virtual Assistants: Imagine having a chatbot that isn’t just smart but also feels like a real conversation partner. Euryale can power advanced virtual assistants that are great for customer service, education, or just having fun chats.
  • Creative Collaboration: Working on a creative project? Euryale can brainstorm ideas, suggest plot twists, or even generate poetry to inspire your next masterpiece.

Why is this model special?

  • Prompt Adherence: She’s great at sticking to the instructions you give her, making sure the output is just what you wanted.
  • Awareness: Euryale understands space and context better than previous models. For example, if you’re describing a scene, she’ll keep track of where things are and how they interact.
  • Creative Freedom: Unlike some AI models that can be a bit stiff, Euryale is super creative and not afraid to take some artistic liberties. This makes her perfect for generating unique and interesting content.
  • Interactive Scenarios: Euryale excels in role-play and interactive scenarios, adapting to different character roles and maintaining coherence throughout the interaction.

How to Use?

You don’t need to be a tech wizard to use Euryale. Here are some fun ways to try her out:

  • Get Creative with Templates: Euryale shines when you give her a specific format to follow. For example, you can try out a preset called “Llama-3-Instruct” to see how she adapts to different writing styles.
  • Experiment with Settings: Adjusting things like temperature (how creative she gets) or repetition penalty (to avoid repeating the same phrases) can make a big difference in the kind of responses you get; see the request sketch after this list.
  • Recommended Settings: Don’t know much about settings? We’ve got you covered with a set of recommended values to start from.
  • Integrate into Your Workflow: Whether you’re writing a novel, crafting a marketing campaign, or just need a creative boost, Euryale can seamlessly fit into your existing workflow, providing inspiration and ideas when you need them most.
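
To make the temperature and repetition-penalty knobs concrete, here is a minimal request sketch against our API. It assumes the endpoint from our SillyTavern guide (https://api.totalgpt.ai) and an OpenAI-style completions route, which the vLLM/Aphrodite options in that guide suggest; the model ID and sampler values are illustrative, not recommendations:

import requests  # pip install requests

API_KEY = "YOUR_INFERMATIC_API_KEY"  # placeholder

payload = {
    "model": "Sao10K/L3-70B-Euryale-v2.1",  # illustrative model ID; use one listed by the API
    "prompt": "Write the opening paragraph of a gothic mystery.",
    "max_tokens": 200,
    "temperature": 1.1,          # higher = more creative
    "repetition_penalty": 1.1,   # values above 1 discourage repeated phrases
}
resp = requests.post(
    "https://api.totalgpt.ai/v1/completions",  # assumed OpenAI-style route
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
)
print(resp.json()["choices"][0]["text"])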

Who Can Use Euryale?

Anyone with a bit of imagination! Whether you’re an author, a gamer, or just someone who loves creative projects, Euryale is a fantastic tool to have in your digital toolbox. She’s designed to be easy to interact with, so you don’t need to know all the technical details to get started.

Why Should You Try Euryale?

Euryale isn’t just another AI model; she’s a creative partner who can help you unlock new possibilities. With her advanced capabilities, you can push the boundaries of what’s possible in writing, role-playing, and even virtual assistance. Plus, with her adaptability and creativity, she’s always ready to help you bring your ideas to life.

Ready to explore the possibilities? Dive in and let Euryale help you create something amazing today! 

-> TRY IT NOW <-

Using Infermatic.ai API with SillyTavern
https://infermatic.ai/using-infermatic-ai-api-with-sillytavern/ (Fri, 21 Jun 2024)

SillyTavern is one of the most popular interfaces to interact with LLMs. We have been working on developing an API and one of the first interfaces we wanted to integrate with was SillyTavern. We have done just that.

Requirements: Infermatic.ai Plus Tier subscription ($15/month)

Steps to integrate:

  1. After you subscribe to Infermatic.ai, you can generate an API key. You will see a new option on the left-side menu bar called API Keys; select it and a modal will open.

  2. In the modal, generate a new key or copy an existing key.

  3. Run SillyTavern locally. To connect to our API, select the power socket icon and match each setting:

  • API: Text Completion
  • API Type: Infermatic
  • Custom Endpoint (if vLLM or Aphrodite is selected): https://api.totalgpt.ai
  • Custom API Key: the key you copied in step 2 above.

  4. Hit ‘Connect’ to connect to our API. The available-models dropdown will populate with the models our API supports. Select one and use the same name listed in the Enter a Model ID section.

  5. And that’s all; it’s ready for you to enjoy!
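
Optionally, you can sanity-check your key outside SillyTavern by querying the endpoint directly. A minimal sketch, assuming the API exposes an OpenAI-style /v1/models route (which is what the vLLM and Aphrodite options above suggest):

import requests  # pip install requests

headers = {"Authorization": "Bearer YOUR_INFERMATIC_API_KEY"}  # placeholder key
resp = requests.get("https://api.totalgpt.ai/v1/models", headers=headers)

# Prints the same model IDs that populate SillyTavern's dropdown
print([m["id"] for m in resp.json()["data"]])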

We are still learning, so we would love feedback on this integration. Feel free to join us on Discord to share your thoughts.

Harnessing the Power of Tailored AI: Beyond LLMs to Specialized APIs
https://infermatic.ai/harnessing-the-power-of-tailored-ai-beyond-llms-to-specialized-apis/ (Mon, 11 Dec 2023)

Hey tech enthusiasts! If you enjoyed our dive into the world of specific Large Language Models (LLMs), hold onto your hats because we’re about to explore another facet of personalized AI: the world of specialized APIs (Application Programming Interfaces).

APIs: The Unsung Heroes of Customized Tech

APIs are like the diligent postal workers of the digital world, delivering requests and responses between applications. They’re the backbone of much of the interactivity you see online. But not all APIs are created equal.

Why Specialization Matters in APIs

Imagine trying to order a pizza using a food delivery app that only knows about Chinese food. It’s a bit of a mismatch, right? That’s why specialized APIs matter. They’re designed to understand and handle specific types of requests with a level of precision and efficiency that general APIs can’t match.

The Dynamic Duo: Specialized LLMs and APIs

Pairing a specialized LLM with a specific API is like Batman and Robin in the tech world. For instance, an LLM trained in medical literature combined with an API designed for healthcare data can revolutionize digital health services.

Real-World Impact: Examples to Excite

  • Fintech: APIs that process financial transactions paired with LLMs trained on market trends can offer personalized investment advice.
  • E-Commerce: Imagine an API that handles customer service inquiries, powered by an LLM that understands your shopping history and preferences.
  • Education: An LLM that can understand and generate educational content, working with an API that delivers tailored learning experiences.

Where to Find These Specialized Tools?

Remember our mention of Infermatic.AI in our last post? Alongside their range of specific LLMs, platforms like this often host a variety of specialized APIs, offering a one-stop shop for all your tailored AI needs.

Conclusion: The Future is Tailored

As we continue to march into the future, the significance of tailored solutions in AI becomes increasingly apparent. Whether it’s through specialized LLMs or APIs, the power of these technologies lies in their ability to cater to specific needs, offering precision, efficiency, and a touch of personalization.

So, whether you’re a developer, a business owner, or just a tech enthusiast, remember: the future isn’t just about AI, it’s about the right AI for the right job.

Embracing the Future: Discover New AI Tools for Content Automation
https://infermatic.ai/embracing-the-future-discover-new-ai-tools-for-content-automation/ (Thu, 07 Dec 2023)

Welcome to the future of technology and resource management! In today’s fast-paced digital era, artificial intelligence (AI) is not just a buzzword; it’s a game-changer in automating tasks, enhancing productivity, and managing resources efficiently. Let’s dive into some of the most innovative AI tools currently making waves in the market.

Meet HeyGen: Your Personal Avatar Creator

Ever thought of having a digital twin or a unique avatar for your videos? Say hello to HeyGen! This remarkable AI creates avatars based on your likeness or any character you fancy. Imagine having a virtual you delivering presentations, hosting webinars, or even starring in your own digital content, simply by giving it a script. The possibilities are endless, and the personalization is next-level. HeyGen isn’t just about creating avatars; it’s about creating a new dimension of digital interaction.

InfermaticAI: A Trifecta of AI Excellence

Next on our AI tour is InfermaticAI, a platform that’s a goldmine for anyone interested in AI-driven content creation. This website offers not one, not two, but three distinct AI models that cater to a variety of needs:

  1. Text Generation: Whether you’re drafting articles, scripts, blogs, or reports, InfermaticAI’s Airoboros L2 70B GPT4 2.0 model can produce high-quality content in a snap. Say goodbye to writer’s block and hello to seamless, efficient writing!
  2. Code Writing: For developers and tech enthusiasts, Airoboros is also a remarkably accurate coding model. It’s like having a virtual coding assistant that can help you build, troubleshoot, and optimize your programming projects.
  3. Story Writing and Character Conversation: This is where it gets really fun, especially for fans of interactive storytelling and anime. InfermaticAI, with the MPT 7B StoryWriter and MythoMax L2 13B models, can craft engaging narratives and character dialogues, making it a fantastic tool for writers, game developers, and anyone looking to add a creative twist to their projects.

Why Join Our Community?

In our community, we’re not just about discussing these AIs; we’re about experiencing them. By joining us, you’ll get firsthand insights, tips, and tricks on how to make the most of these tools. Whether you’re a business owner looking to automate processes, a content creator in search of innovative tools, or a tech enthusiast eager to explore the latest in AI, our community is the perfect place to connect, learn, and grow.

The Bottom Line

The integration of AI like HeyGen and InfermaticAI into our daily workflows represents a significant leap forward in how we manage resources and create content. These tools are more than just convenient; they’re revolutionary, empowering us to do more with less and opening up new realms of creativity and efficiency.

So, why wait? Join our community today and be part of the AI revolution. Discover, engage, and transform the way you work and create with the most advanced AI tools on the market. Welcome aboard!

You can find us on Discord and Reddit.

Hugging Face’s new Zephyr Model
https://infermatic.ai/zephyr/ (Fri, 03 Nov 2023)

In the ever-evolving landscape of Natural Language Processing (NLP), Hugging Face has been at the forefront of innovation, consistently pushing the boundaries of what’s possible with language models. With a track record of delivering state-of-the-art solutions for language understanding and generation, Hugging Face has introduced a new addition to its arsenal: the Zephyr model.

The Zephyr model is the latest milestone in the journey of NLP, representing a significant leap forward in the field. In this blog post, we will dive deep into the world of Zephyr, exploring its architecture, capabilities, and the exciting possibilities it offers to researchers, developers, and NLP enthusiasts.

From its inception, Hugging Face has been committed to democratizing access to powerful language models, making them accessible to the wider community. With Zephyr, this mission continues, offering another groundbreaking tool that promises to revolutionize how we interact with and understand human language.

Whether you’re a seasoned practitioner or just starting your journey, Zephyr is a model you’ll want to get acquainted with if you’re looking to stay on the cutting edge of NLP. In this post, we will provide an in-depth look at Zephyr’s features, use cases, limitations, and more, to equip you with the knowledge and tools needed to harness the power of this remarkable model. So, without further ado, let’s embark on our journey to uncover the immense potential of Zephyr.

What is Zephyr?

At the heart of the NLP revolution lies Hugging Face’s Zephyr, a model that has taken the field by storm. Zephyr is a highly advanced language model designed to understand, generate, and manipulate human language with unparalleled precision and flexibility.

Unlike its predecessors, Zephyr boasts a remarkable blend of architecture, size, and pre-training techniques. This has made it a standout option for a wide range of NLP tasks and challenges, setting it apart from other well-known models like GPT-3 and BERT.

Zephyr and its Capabilities

Zephyr is not your run-of-the-mill language model. It has been meticulously crafted to tackle complex NLP tasks with ease. Its capabilities include:

  1. Language Understanding: Zephyr excels in comprehending the nuances of language, making it a valuable asset for tasks such as sentiment analysis, text classification, and named entity recognition.
  2. Language Generation: Zephyr has a knack for generating human-like text, making it a fantastic tool for chatbots, content generation, and automated writing.
  3. Contextual Reasoning: Zephyr understands context and can reason within it. This enables it to provide coherent and contextually relevant responses, making it a powerful tool for conversational AI.
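
To get a feel for these capabilities yourself, here is a minimal sketch of running Zephyr with the transformers library. It assumes the HuggingFaceH4/zephyr-7b-beta checkpoint on the Hugging Face Hub and a GPU with enough memory for the model in bfloat16:

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/zephyr-7b-beta",  # assumed checkpoint ID
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a friendly, helpful assistant."},
    {"role": "user", "content": "Explain why context matters in a conversation."},
]
# Zephyr is a chat model, so format the conversation with its chat template first
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = pipe(prompt, max_new_tokens=128, do_sample=True, temperature=0.7)
print(outputs[0]["generated_text"])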

Comparison with Other Models

To truly appreciate Zephyr’s significance, it’s important to compare it with other notable models:

  1. GPT-3: While GPT-3 is known for its remarkable text generation capabilities, Zephyr stands out for its flexibility in understanding and manipulating text, making it more versatile for a variety of NLP tasks.
  2. BERT: Zephyr’s architecture and size differ from BERT’s, allowing it to excel in tasks that require contextual understanding and generation, whereas BERT is an encoder-only model geared toward understanding tasks rather than generating text.

 

[Figure: Zephyr’s strong performance against larger models]

 

Key Features of Zephyr

Zephyr comes packed with a set of distinctive features that set it apart from its predecessors and make it a promising choice for various NLP tasks. Let’s dive into some of the key features that define Zephyr’s capabilities.

Language Understanding and Generation Capabilities

  • Contextual Understanding: Zephyr possesses the ability to grasp the nuances of language and the context in which it is used. This means it can understand the subtleties of conversational context, helping it generate contextually relevant responses in chatbots, virtual assistants, and other dialogue systems.
  • Multi-Lingual Support: Zephyr is designed to handle multiple languages, making it versatile for global applications. Its ability to work with different languages broadens its applicability in various regions and industries.
  • Named Entity Recognition (NER): Zephyr can identify and categorize named entities in text, such as names of people, places, organizations, and more. This makes it valuable for applications like information extraction and document analysis.

Model Architecture and Size

Zephyr boasts a state-of-the-art model architecture that contributes to its exceptional performance. Some notable aspects include:

  • Transformer Architecture: Like its predecessors, Zephyr is built on the transformer architecture, which has become the backbone of modern NLP models. This architecture enables the model to process and generate text efficiently.
  • Optimized Size: Zephyr is designed to strike a balance between model size and performance. It offers impressive capabilities without the excessive computational requirements of larger models, making it more accessible to a broader range of users.

Pre-Training and Fine-Tuning Processes

  • Pre-Training Data: Zephyr has been pre-trained on vast corpora of text from the internet, allowing it to learn from a wide range of sources. This extensive pre-training enhances its general language understanding.
  • Fine-Tuning Flexibility: Zephyr can be fine-tuned on specific tasks, making it a highly adaptable model. This fine-tuning process allows developers and researchers to customize the model for their unique needs, such as sentiment analysis, question-answering, and more.

Use Cases

Zephyr’s versatility and proficiency in handling natural language make it a valuable tool for a wide array of NLP tasks. Below, we explore some of the prominent use cases where Zephyr can shine:

  1. Chatbots and Virtual Assistants

Zephyr is well-suited for building chatbots and virtual assistants. Its contextual understanding and language generation capabilities enable it to engage in meaningful and natural-sounding conversations. Whether you’re developing a customer support chatbot or a virtual assistant for daily tasks, Zephyr’s flexibility can make your application more user-friendly.

  2. Content Generation

Content generation is another domain where Zephyr can be a game-changer. Blog posts, news articles, marketing copy, and creative writing are all areas where Zephyr can help automate the content creation process. By providing prompts and instructions, you can leverage Zephyr to generate high-quality text tailored to your needs.

  3. Sentiment Analysis

Zephyr can be fine-tuned for sentiment analysis, a crucial task in understanding the emotional tone of text. It can assist in monitoring social media sentiment, customer reviews, and news articles to gain insights into public opinion and brand reputation.

  4. Language Translation

Zephyr’s multi-lingual support makes it a valuable tool for language translation tasks. You can fine-tune the model to perform translation between multiple languages, helping break down language barriers and facilitate communication on a global scale.

  5. Question-Answering Systems

Building question-answering systems is simplified with Zephyr. Fine-tuning the model for specific domains or knowledge bases can enable it to provide accurate and contextually relevant answers to user queries.

  6. Text Summarization

Zephyr can also be utilized for automatic text summarization, helping users quickly extract the key points and insights from lengthy documents or articles. This is particularly useful in content curation and research applications.

  7. Named Entity Recognition (NER)

In applications like document analysis and information extraction, Zephyr’s ability to recognize and categorize named entities (e.g., names of people, places, organizations) can enhance the efficiency and accuracy of data processing.

  8. Language Tutoring and Learning

Zephyr can assist in language tutoring and learning applications by providing explanations, answering questions, and generating example sentences. This can be invaluable for language learners looking to improve their proficiency.

These are just a few examples of the many possible applications of Zephyr. Its adaptability, multi-lingual support, and pre-training capabilities make it a powerful ally for a wide range of NLP tasks. Whether you’re looking to enhance user experiences, automate content creation, or gain insights from large volumes of text data, Zephyr offers a promising solution that can be tailored to your specific needs.

Limitations and Challenges

While Zephyr offers a remarkable set of capabilities, it’s important to be aware of its limitations and challenges. Understanding these aspects can help you make informed decisions when working with the model.

  1. Computational Demands: Zephyr, like many advanced language models, requires significant computational resources for both training and inference. Fine-tuning the model can be resource-intensive, limiting its accessibility for users with limited computing power.
  2. Latency: Real-time applications, such as chatbots and virtual assistants, may experience latency when using Zephyr due to the time required for model inference. This could affect the user experience, particularly in highly interactive applications.
  3. Model Size: While Zephyr is optimized for a balance between model size and performance, it may not be as efficient as smaller models for certain applications. Smaller models may be preferable for use cases with tight resource constraints.
  4. Data Biases: Zephyr’s pre-training data comes from the internet, which can introduce biases in the model’s understanding of language. Care must be taken to address and mitigate biases, especially in applications where fairness and inclusivity are paramount.
  5. Fine-Tuning Challenges: Fine-tuning Zephyr for specific tasks can be challenging. Careful consideration and extensive hyperparameter tuning may be required to prevent overfitting and ensure the model generalizes well to new data.
  6. Context Window: Zephyr, like other transformer models, has a finite context window. It may struggle with tasks that require understanding very long documents or sequences, as it may lose important contextual information.
  7. Low-Resource Languages: While Zephyr supports multiple languages, it may not perform as well in low-resource languages, as its training data is predominantly in major languages.
  8. Ethical Use: As with any powerful language model, ethical considerations are crucial. Ensuring responsible use of Zephyr, avoiding misuse, and addressing potential issues related to misinformation and harmful content are essential responsibilities when working with the model.

By being mindful of these limitations and challenges, you can make informed decisions about whether Zephyr is the right fit for your specific NLP tasks. Addressing these concerns and actively working to mitigate them can lead to more responsible and effective use of the model in various applications.

Conclusion

In the ever-evolving landscape of Natural Language Processing, Hugging Face’s Zephyr model stands as a shining example of progress and innovation. With its remarkable capabilities and versatile applications, Zephyr has redefined the possibilities of what can be achieved with language models.

Throughout this blog post, we’ve taken a deep dive into the world of Zephyr, exploring its architecture, key features, use cases, and limitations. As we conclude, let’s reflect on the journey we’ve taken and the opportunities that Zephyr presents.

Zephyr’s ability to understand and generate human language with contextual accuracy has the potential to transform a wide range of industries and applications. Whether you’re looking to enhance user experiences, automate content generation, or gain valuable insights from textual data, Zephyr offers a powerful solution that can be tailored to your specific needs.

However, it’s crucial to recognize that Zephyr is not without its challenges. Computation, data biases, and ethical considerations are important aspects to address when working with this model. Responsible and thoughtful use is paramount to ensure the positive impact of Zephyr in the NLP community.

As the field of NLP continues to evolve, models like Zephyr represent the cutting edge, and the possibilities for innovation are limitless. Researchers, developers, and enthusiasts alike can harness the power of Zephyr to create intelligent chatbots, automate content creation, analyze sentiment, and much more.

The journey with Zephyr is just beginning, and it’s an exciting time to be a part of the NLP community. We encourage you to explore, experiment, and share your experiences with this remarkable model. As the NLP landscape continues to advance, Zephyr promises to be at the forefront, offering new opportunities for understanding and interacting with human language.

So, whether you’re a seasoned practitioner or a newcomer to the world of NLP, Zephyr is a model worth exploring. Embrace its potential, engage with its capabilities, and contribute to the ever-evolving story of NLP innovation. The journey is just beginning, and the future is full of promise.

Try Zephyr on Infermatic Today!

References

For a deeper understanding of the Zephyr model and the concepts discussed in this blog post, you may find the following references and resources useful:

  1. Zephyr model on the Hugging Face Model Hub
  2. Hugging Face Model Hub: https://huggingface.co/models
  3. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems, 30.
  4. Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., … & Amodei, D. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
  5. Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., … & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
  6. Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
