2024 Q2 Most Exciting LLM-powered Projects

Following up on the list of 2023’s and Q1 2024’s most exciting projects, here is the list of the most useful, innovative and exciting LLM-powered projects I found in 2024 Q2.

AIOS: LLM-Powered OS 🖥️

AIOS, hosted on GitHub, is an operating system for LLM-based agents: it embeds a large language model into the operating system as its “brain” and serves the agents running on top of it. Instead of each agent managing model access on its own, AIOS provides OS-style services such as scheduling of agent requests, context switching between agents, memory and storage management, tool services, and access control, so that multiple agents can run concurrently on shared resources. For developers, this offers a common foundation for building, deploying, and managing AI agent applications. 🚀🤖

This type of OS had been anticipated for some time: Andrej Karpathy sketched the idea of an “LLM OS” in his talk “Intro to LLMs”.
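To make the “OS for agents” idea a little more concrete, here is a minimal sketch of one service such a system can provide: round-robin scheduling of LLM work coming from several agents. The AgentRequest class, the two-step time slice, and the agent names are all hypothetical. This is an illustration of the general scheduling idea, not AIOS's actual scheduler or API.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class AgentRequest:
    """A hypothetical agent job that needs `steps` units of LLM work."""
    agent: str
    steps: int


def round_robin(requests: list[AgentRequest], quantum: int = 2) -> list[str]:
    """Interleave agents' LLM work in fixed time slices.

    Returns the execution trace: one entry (agent name) per unit of work.
    """
    # Copy the requests so the caller's objects are not mutated
    queue = deque(AgentRequest(r.agent, r.steps) for r in requests)
    trace: list[str] = []
    while queue:
        req = queue.popleft()
        run = min(quantum, req.steps)
        trace.extend([req.agent] * run)  # the LLM works on this agent for `run` steps
        req.steps -= run
        if req.steps > 0:
            queue.append(req)  # unfinished: the agent waits for its next slice
    return trace


trace = round_robin([AgentRequest("travel", 3), AgentRequest("math", 1), AgentRequest("rec", 4)])
print(trace)  # ['travel', 'travel', 'math', 'rec', 'rec', 'travel', 'rec', 'rec']
```

Round-robin keeps one long-running agent from monopolizing the model. A simpler FIFO policy would run each agent to completion before starting the next.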

Open-source autonomous software engineers 🦾

Have you seen Devin, billed as the world’s first fully autonomous AI software engineer? There are now several open-source alternatives available.

  1. SWE-Agent:
    • Description: SWE-agent turns LMs (e.g. GPT-4) into software engineering agents that can fix bugs and issues in real GitHub repositories.
    • Repository: https://swe-agent.com/
  2. OpenDevin:
    • Description: OpenDevin is an open-source project that replicates Devin, an autonomous AI software engineer capable of executing complex engineering tasks and actively collaborating with users on software projects, and aims to improve on it through open-source contributions.
    • Key Features: Autonomy, web browsing, code writing, and collaboration.
    • Repository: OpenDevin Repository
  3. Devika:
    • Description: Devika is an advanced AI software engineer that understands high-level human instructions, breaks them down into steps, researches relevant information, and writes code to achieve objectives. It aims to be an open-source alternative to Devin.
    • Key Features: Planning, reasoning, web browsing, and code generation.
    • Repository: Devika Repository
  4. Codel:
    • Description: Codel is a fully autonomous AI agent capable of performing complex tasks using the terminal, browser, and editor. It runs in a sandboxed Docker environment, automatically detects steps, and provides seamless web browsing and code writing.
    • Key Features: Security, autonomy, browser integration, and code execution.
    • Repository: Codel Repository
  5. AutoCodeRover:
    • Description: AutoCodeRover is a fully automated approach to resolving GitHub issues (bug fixing and feature addition) that combines LLMs with code analysis and debugging capabilities to prioritize patch locations, ultimately producing a patch.
    • Key Features: Searches for relevant code based on context and utilizes test suites.
    • Repository: AutoCodeRover Repository
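AutoCodeRover's use of test suites to prioritize patch locations is in the spirit of spectrum-based fault localization (SBFL). The sketch below illustrates one common SBFL formula, Ochiai, which ranks code locations by how strongly their execution correlates with failing tests:

score = failed(e) / sqrt(total_failed × (failed(e) + passed(e)))

Here failed(e) and passed(e) count the failing and passing tests that execute location e. The coverage data and method names are hypothetical. This is a generic illustration, not AutoCodeRover's own code.

```python
import math


def ochiai_ranking(coverage: dict[str, set[str]], failing: set[str]) -> list[tuple[str, float]]:
    """Rank code locations by Ochiai suspiciousness.

    coverage maps each test name to the set of locations it executes;
    failing is the set of test names that fail.
    """
    total_failed = len(failing)
    locations = set().union(*coverage.values())
    scores = {}
    for loc in locations:
        ef = sum(1 for t, covered in coverage.items() if loc in covered and t in failing)
        ep = sum(1 for t, covered in coverage.items() if loc in covered and t not in failing)
        denom = math.sqrt(total_failed * (ef + ep))
        scores[loc] = ef / denom if denom else 0.0
    # Highest suspiciousness first; ties broken by name for a stable order
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


coverage = {
    "test_parse_ok": {"Parser.parse", "Parser.tokenize"},
    "test_parse_empty": {"Parser.parse", "Parser.handle_empty"},
    "test_tokenize": {"Parser.tokenize"},
}
ranking = ochiai_ranking(coverage, failing={"test_parse_empty"})
print(ranking[0])  # ('Parser.handle_empty', 1.0)
```

Parser.handle_empty is executed only by the failing test, so it ranks first and is the natural place for an agent to look for a patch.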

VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech 🎙️

VoiceCraft is a cutting-edge token infilling neural codec language model that excels in zero-shot speech editing and text-to-speech (TTS) tasks with in-the-wild data, such as audiobooks, internet videos, and podcasts. It requires only a few seconds of a reference voice to clone or edit an unseen voice, showcasing its efficiency and effectiveness. The repository includes a comprehensive README with instructions for running inference in various ways, including Google Colab, Docker, and local environment setup, as well as links to pre-trained model weights.
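“Token infilling” refers to how the codec tokens are rearranged so that a left-to-right language model can fill in a span in the middle of an utterance. In a simplified form of the causal masking used in VoiceCraft, the span to edit is replaced by a mask token and moved to the end of the sequence. The model therefore generates the span only after it has seen both the left and the right context. The sketch below shows that rearrangement on a single token stream. The token values and the mask and end-of-generation symbols are hypothetical. The real model also handles multiple codebooks and can mask several spans.

```python
MASK = "<M0>"  # hypothetical mask-token label
EOG = "<EOG>"  # hypothetical end-of-generation marker


def causal_mask(tokens: list[int], start: int, end: int) -> tuple[list, list]:
    """Rearrange tokens so that tokens[start:end] can be infilled left-to-right.

    Returns (model_input, target): the model conditions on the left context,
    a mask placeholder, the right context and the mask again, then has to
    generate the missing span followed by EOG.
    """
    if not 0 <= start < end <= len(tokens):
        raise ValueError("span must be non-empty and inside the sequence")
    left, span, right = tokens[:start], tokens[start:end], tokens[end:]
    model_input = left + [MASK] + right + [MASK]
    target = span + [EOG]
    return model_input, target


inp, tgt = causal_mask([11, 12, 13, 14, 15, 16], start=2, end=4)
print(inp)  # [11, 12, '<M0>', 15, 16, '<M0>']
print(tgt)  # [13, 14, '<EOG>']
```

For speech editing, the infilled tokens replace the masked span. For TTS, the masked span can be thought of as the continuation after the reference audio.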

AI/Claude researcher: Automated online research 🔍

This project by Matt Shumer is an AI agent that leverages Claude 3 and SERPAPI to conduct extensive research on specified topics. It systematically breaks down the research process into subtopics, creates detailed reports for each, and compiles them into a final comprehensive report. The tool is designed to generate a checklist of subtopics, perform iterative searches and analyses, and incorporate feedback to refine the reports, which are ultimately saved as a text file. It’s an experimental tool meant for informational purposes, and users are encouraged to verify the information with reliable sources.
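The workflow described above can be sketched as a plain loop: generate a checklist of subtopics, search, draft, get feedback, refine, and compile. This is a minimal illustration, not the project's code. The `llm` and `search` arguments are hypothetical stand-ins that you would back with Claude 3 and SERPAPI. The fixed number of refinement rounds is also an assumption.

```python
from typing import Callable


def research(
    topic: str,
    llm: Callable[[str], str],
    search: Callable[[str], list[str]],
    rounds: int = 2,
) -> str:
    """Hypothetical outline of a subtopic-based research agent."""
    # 1. Ask the model for a checklist of subtopics, one per line
    subtopics = [s.strip() for s in llm(f"List subtopics of: {topic}").splitlines() if s.strip()]
    sections = []
    for sub in subtopics:
        report = ""
        for _ in range(rounds):
            # 2. Search, critique the current draft (if any), then rewrite it
            results = search(sub)
            feedback = llm(f"Critique this report on {sub}:\n{report}") if report else ""
            report = llm(f"Write a report on {sub} using {results}. Feedback: {feedback}")
        sections.append(f"## {sub}\n{report}")
    # 3. Compile the per-subtopic reports into one final document
    return f"# {topic}\n\n" + "\n\n".join(sections)


def fake_llm(prompt: str) -> str:
    """Canned responses so the sketch runs without any API."""
    if prompt.startswith("List subtopics"):
        return "History\nApplications\n"
    if prompt.startswith("Critique"):
        return "Add more detail."
    return "Draft text."


final = research("Quantum computing", fake_llm, lambda q: [f"result for {q}"])
print(final)  # the real tool saves this report to a text file
```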

You can also find the following fork variations for the same project:

  • Claude3 AI-Researcher (Google API)
  • Low-cost Haiku-Researcher


RAGFlow: Cutting-edge open-source RAG architecture 🗃️

RAGFlow, hosted on GitHub by InfiniFlow, is a cutting-edge open-source RAG (Retrieval-Augmented Generation) engine that leverages deep document understanding to enhance question-answering systems. It’s designed to streamline the RAG workflow for businesses of any size, integrating Large Language Models (LLMs) to provide accurate, citation-backed answers from a variety of complex data formats. The key features that make RAGFlow an attractive option include its ability to extract knowledge from unstructured data, support for multiple data sources, and a template-based chunking system that reduces hallucinations in answers. With recent updates like local LLM deployment and new layout recognition models, RAGFlow is continually evolving to meet the needs of modern businesses seeking reliable AI-driven insights.
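Here is a toy version of the chunk, retrieve, and cite idea. It splits a document into paragraph chunks with IDs, scores each chunk by word overlap with the question, and passes the best chunks with their IDs into the prompt so that the answer can cite them. This is not RAGFlow's implementation, which uses deep document understanding and layout-aware templates. Blank-line splitting stands in for template-based chunking, and word-overlap scoring stands in for embedding search. Both are deliberate simplifications.

```python
import re


def chunk(doc_id: str, text: str) -> list[tuple[str, str]]:
    """Split on blank lines (a stand-in for template-based chunking)."""
    paras = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [(f"{doc_id}#{i}", p) for i, p in enumerate(paras)]


def words(s: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def retrieve(question: str, chunks: list[tuple[str, str]], k: int = 2) -> list[tuple[str, str]]:
    """Return up to k chunks sharing the most words with the question."""
    q = words(question)
    scored = sorted(chunks, key=lambda c: len(q & words(c[1])), reverse=True)
    return [c for c in scored[:k] if q & words(c[1])]


def build_prompt(question: str, hits: list[tuple[str, str]]) -> str:
    """Put the retrieved chunks, tagged with their IDs, into the LLM prompt."""
    context = "\n".join(f"[{cid}] {text}" for cid, text in hits)
    return f"Answer using only the sources below and cite their IDs.\n{context}\n\nQ: {question}"


doc = "Refunds are issued within 14 days.\n\nShipping takes 3-5 business days.\n\nContact support by email."
hits = retrieve("How many days do refunds take?", chunk("policy", doc), k=1)
print(hits)  # [('policy#0', 'Refunds are issued within 14 days.')]
print(build_prompt("How many days do refunds take?", hits))
```

Keeping the chunk ID attached all the way into the prompt is what allows the final answer to point back to its source.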

AI-generated text detector 👁️

The GitHub repository “Detect-AI-text-Easily” by Fareed Khan is a cutting-edge tool designed to identify AI-generated text. This application stands out for its user-friendly interface, which lets users either upload a document or enter text directly for analysis. It’s particularly useful for quickly distinguishing between human- and AI-written content, helping to check the authenticity and originality of written work. The repository also includes a list of words that frequently appear in AI-generated text and can serve as a hint that a piece of text was written by AI!

With the rise of AI in content creation, this free tool provides a valuable layer of verification for editors, publishers, and anyone concerned with the integrity of text-based communication.
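The word-list idea can be illustrated with a very simple heuristic: measure what fraction of a text's words come from a list of terms that LLMs tend to overuse. The word list and the 5% threshold below are hypothetical examples, not the repository's list or its detection method. A heuristic like this is easy to fool, so treat its output as a hint rather than a verdict.

```python
import re

# Hypothetical sample of words often overused in LLM output; not the repo's list
FLAG_WORDS = {"delve", "tapestry", "testament", "realm", "multifaceted", "seamless", "pivotal"}


def flag_ratio(text: str) -> float:
    """Fraction of words in `text` that appear in FLAG_WORDS."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in FLAG_WORDS for t in tokens) / len(tokens)


def looks_ai_written(text: str, threshold: float = 0.05) -> bool:
    # The 5% threshold is an arbitrary illustrative choice
    return flag_ratio(text) >= threshold


sample = "Let us delve into the multifaceted tapestry of this pivotal realm."
print(round(flag_ratio(sample), 2), looks_ai_written(sample))  # 0.45 True
```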

AutoCrawler: Web crawling agent 🌎

AutoCrawler is an innovative project that leverages large language models (LLMs) to automate the generation of web crawlers, as detailed in the paper “AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation”. This tool is designed to simplify the process of extracting structured data from websites without the need for manual coding. It’s particularly useful for developers and researchers who require large datasets for training machine learning models or for data analysis. AutoCrawler stands out for its ability to progressively understand and adapt to different web page structures, making it a versatile tool for anyone who wants to harness web data efficiently.

ScrapeGraphAI: Automated web scraping with Python 🕷️

ScrapeGraphAI is a cutting-edge Python library that revolutionizes the way developers approach web scraping by leveraging the power of large language models (LLMs) and direct graph logic. This innovative tool simplifies the extraction of information from websites, documents, and XML files: developers specify what data they need, and the library handles the rest. It’s an essential asset for developers looking to streamline their data acquisition process, reduce the complexity of scraping tasks, and harness the capabilities of AI to create efficient scraping pipelines. With ScrapeGraphAI, developers can quickly install the library, set up their scraping parameters, and let the AI do the heavy lifting, making it a valuable addition to any tech stack. To experience the simplicity and power of ScrapeGraphAI, developers can try it out in a Google Colab environment, see it in action through a Streamlit demo, or dive into the documentation to explore its full potential.

LLM AutoQuizzer: Convert pages to quizzes ❔

A Hugging Face tool that generates quizzes from URLs and evaluates LLMs’ web and “closed book” performance.

AI Comic Factory 🎨

Create comic books in any style you like just by defining the main plot using an LLM prompt!

Insanely Fast Whisper: Fast audio transcription 🎙️

Insanely Fast Whisper can transcribe 150 minutes (2.5 hours) of audio in under 100 seconds! It works on Macs and Nvidia GPUs. The project builds on OpenAI’s Whisper Large v3 model and greatly increases transcription speed by using 🤗 Transformers, Optimum and flash-attn.

Just install the library: pip install insanely-fast-whisper

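and start transcribing audio (this example targets an Apple Silicon Mac via --device-id mps; on an Nvidia GPU, pass the GPU's device number instead, and the Hugging Face token is only needed for speaker diarization): insanely-fast-whisper --file-name <FILE NAME or URL> --batch-size 2 --device-id mps --hf_token <HF TOKEN>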

Khoj: Your agentic second brain 🧠

Khoj is an innovative open-source application that empowers users by creating personalized AI agents tailored to their needs. By sharing their notes, documents, and repositories with Khoj, users can extend their cognitive abilities and harness the power of AI to enhance their productivity and decision-making processes. With access to real-time information from the internet, these AI agents can provide users with up-to-date insights and accurate semantic search capabilities across their digital assets. Khoj’s versatility extends across multiple platforms, including Desktop, Emacs, Obsidian, Web, and WhatsApp, making it a highly accessible and convenient solution. Furthermore, its self-hosting capabilities ensure privacy and data ownership, making Khoj an attractive choice for individuals and organizations seeking a secure and customizable AI solution. Go to https://app.khoj.dev to see Khoj live.
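Semantic search of the kind Khoj offers typically works by turning each note and the query into vectors and ranking notes by cosine similarity. In the sketch below, a bag-of-words count vector stands in for a real embedding model so that the example runs with only the standard library. That substitution is an assumption, and a real embedding would also match synonyms and paraphrases. The note names are hypothetical. This is not Khoj's code.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in 'embedding': word counts. A real system would use a neural model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, notes: dict[str, str], k: int = 3) -> list[tuple[str, float]]:
    """Rank notes by cosine similarity between the query and note vectors."""
    q = embed(query)
    scored = [(name, cosine(q, embed(body))) for name, body in notes.items()]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:k]


notes = {
    "groceries.md": "buy milk eggs and bread",
    "trip.org": "book train tickets to athens and a hotel",
    "work.md": "prepare quarterly analytics report",
}
print(search("train to athens", notes, k=1))
```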

GA4 BigQuery SQL composer (Custom GPT) 💬

Himanshu Sharma (optimizeSmart.com) has introduced GA4 BigQuery Composer (Custom GPT), powered by GPT-4o and available to all ChatGPT users. This tool allows users to create efficient and precise SQL queries for GA4 BigQuery without needing to understand a single line of code. The Composer translates plain English instructions into SQL code, leveraging its specialized training on GA4 BigQuery datasets for accurate results. With the GA4 BigQuery Composer, users can generate complex SQL queries within seconds, troubleshoot errors by simply copying and pasting error messages, and modify generated queries with ease. This tool not only saves time by automating the query-building process but also reduces the chances of making errors in query syntax or structure.

LaVague: Automated web interactions 🌊

LaVague is revolutionizing web automation with its open-source Large Action Model (LAM) framework, enabling users to convert natural language instructions into executable Selenium code effortlessly. This innovative solution simplifies the automation of web-based workflows, allowing for seamless orchestration and deployment of browser-executed tasks. With features like natural language processing, Selenium integration, and local models for enhanced privacy and control, LaVague stands out as a powerful tool for developers and non-technical users alike. It’s particularly beneficial for automating repetitive admin tasks, RPA, and web testing, offering a significant boost in efficiency and productivity.

Tarsier: Automate web interactions with LLMs 🌐

Tarsier, developed by Reworkd, is a cutting-edge vision utility designed to enhance web interaction agents. It addresses common challenges of using large language models (LLMs) for web automation, such as how to feed webpages to an LLM and how to map its responses back to web elements. Tarsier visually tags interactable elements on a page, giving the LLM a clear mapping for executing actions. It also lets even text-only LLMs understand a page’s visual structure through an OCR algorithm that converts screenshots into a structured string representation. Reworkd reports a 10-20% performance boost over traditional methods, making Tarsier a valuable tool for creating more intelligent and responsive web agents.
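The core mapping idea can be sketched in a few steps. Give each interactable element a numeric tag, show the LLM the page text with tags like [0] inline, and translate an answer such as “CLICK [1]” back to the tagged element. The Element class, tag format, and action syntax here are hypothetical. Tarsier itself works from screenshots with its own tagging and OCR pipeline, and its API is not shown here.

```python
import re
from dataclasses import dataclass


@dataclass
class Element:
    """A hypothetical interactable element with a CSS selector."""
    kind: str
    text: str
    selector: str


def tag_page(elements: list[Element]) -> tuple[str, dict[int, Element]]:
    """Build the text the LLM sees plus a tag-id -> element lookup table."""
    tag_map = dict(enumerate(elements))
    lines = [f"[{i}] {el.kind}: {el.text}" for i, el in tag_map.items()]
    return "\n".join(lines), tag_map


def resolve_action(llm_reply: str, tag_map: dict[int, Element]) -> tuple[str, str]:
    """Turn a reply like 'CLICK [1]' into (action, selector)."""
    m = re.search(r"\b(CLICK|TYPE)\s*\[(\d+)\]", llm_reply)
    if not m or int(m.group(2)) not in tag_map:
        raise ValueError(f"could not map reply to an element: {llm_reply!r}")
    return m.group(1), tag_map[int(m.group(2))].selector


page_text, tags = tag_page([
    Element("link", "Pricing", "a#pricing"),
    Element("button", "Sign up", "button.signup"),
])
print(page_text)
print(resolve_action("I will CLICK [1] to register.", tags))  # ('CLICK', 'button.signup')
```

Because the LLM only ever refers to short numeric tags, its answer can be mapped back to a concrete element without asking it to reproduce selectors or coordinates.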

Jan: Open source ChatGPT 🛠️

Jan is an innovative open-source project that offers a compelling alternative to ChatGPT, designed for those who value privacy and customization. It allows users to run a powerful conversational AI locally, providing full control over models, configurations, and functionalities. This autonomy ensures that sensitive data remains private, and the flexibility to tweak the system to one’s liking enhances the user experience. With “Jan,” tech enthusiasts and developers have a versatile tool at their disposal, enabling them to explore the realms of conversational AI without compromising on transparency or personalization. Whether you’re a hobbyist looking to experiment or a professional in need of a reliable chatbot framework, “Jan” stands out as a go-to solution that aligns with the ethos of the open-source community.

GPT Computer Assistant: GPT-4o for Windows, macOS and Ubuntu 👩‍💻

GPT Computer Assistant is a game-changer: now available for Windows, macOS, and Ubuntu, it is a versatile Python library that promises to streamline your digital life. Whether it’s managing your calendar, taking meeting notes, or even writing code, GPT Computer Assistant is designed to be your go-to solution. The project is committed to providing native install scripts for an even smoother user experience, and the assistant is powered by Upsonic Tiger, an efficient function hub for LLM agents. It’s not just an alternative; it’s a fresh and stable approach to enhancing productivity and simplifying complex tasks. So why choose GPT Computer Assistant? It’s simple: it’s about making your digital interactions more intuitive, more efficient, and ultimately, more human.

Mora: Open-source Sora-like video generation 🎥

Mora is a multi-agent framework designed for generalist video generation, which stands out as a compelling open-source alternative to Sora. It leverages a collaborative approach with multiple visual AI agents, each specializing in different aspects of the video generation process, to produce high-quality outcomes across a broad spectrum of tasks. Whether it’s text-to-video generation, text-conditional image-to-video generation, or video-to-video editing, Mora covers an extensive range of applications. Its open-source nature encourages community collaboration, fostering innovation and continuous improvement. With reported performance approaching that of OpenAI’s Sora on several tasks, Mora is an invaluable tool for developers and creatives looking to push the boundaries of video generation technology.

Whisper WebGPU: Real-time in-browser speech recognition 🎙️

The Real-time Whisper WebGPU project is an open-source project that uses OpenAI’s Whisper model to deliver real-time, in-browser speech recognition. This innovative tool represents a significant leap forward in the way users interact with AI-driven web applications, offering private and powerful AI that runs locally in the browser. The technology not only enhances the user experience by providing real-time transcription but also underscores the potential of WebGPU and Transformers.js to expand what browsers can do for complex tasks like speech processing.

Awesome notebooks: Production-ready Python notebooks 📓

Not exactly an LLM-powered project, but this could come in handy when working on AI projects. The awesome-notebooks repository is a powerful resource for data scientists and AI enthusiasts, offering a long catalog of Jupyter Notebook templates organized by app. These templates cover a wide range of functionalities, including prompts, plugins, models, workflow automation, analytics, and code snippets. By following the IMO (Input, Model, Output) framework, the repository keeps these notebooks easily searchable and reusable in any context, making it an invaluable tool for streamlining data and AI projects. You can find source code snippets for Gmail, BigQuery, Google Analytics, Google Calendar, OpenAI, Zapier and almost every commonly used SaaS product out there.

Token cost: LLM prompt estimation for 400+ LLMs 🧮

The AgentOps-AI/tokencost repository is an essential tool for developers working with Large Language Models (LLMs). It provides easy-to-use functions for estimating the USD cost of using over 400 LLMs by calculating the cost of prompts and completions. This is particularly important for managing and optimizing the expenses associated with deploying AI applications. The repository also tracks the latest price changes from major LLM providers, ensuring that users have up-to-date information for accurate budgeting and cost management.
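Under the hood, this kind of estimate is simple arithmetic: cost = prompt tokens × input price + completion tokens × output price, with prices usually quoted per million (or per thousand) tokens. The sketch below shows that calculation with hypothetical model names and prices. It does not use tokencost's API or its price table. In practice, the token counts would come from the model's tokenizer.

```python
from decimal import Decimal

# Hypothetical USD prices per 1M tokens: (input, output). Not real, current prices.
PRICES_PER_MTOK = {
    "model-small": (Decimal("0.50"), Decimal("1.50")),
    "model-large": (Decimal("5.00"), Decimal("15.00")),
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """USD cost of one call: input and output tokens, each at its own price."""
    price_in, price_out = PRICES_PER_MTOK[model]
    per_million = Decimal(1_000_000)
    return (prompt_tokens * price_in + completion_tokens * price_out) / per_million


cost = estimate_cost("model-large", prompt_tokens=1_200, completion_tokens=300)
print(cost)  # 0.0105  (1,200 × $5/M + 300 × $15/M)
```

Decimal avoids the rounding drift that floats can introduce when many small per-call costs are summed into a budget.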

Written by Panagiotis

Panagiotis (pronounced Panayotis) is a passionate G(r)eek with experience in digital analytics projects and website implementation. Fan of clear and effective processes, automation of tasks and problem-solving technical hacks. Hands-on experience with projects ranging from small to enterprise-level companies, starting from the communication with the customers and ending with the transformation of business requirements to the final deliverable.