2024 Q4 Most Exciting LLM-powered Projects
Following up on my lists of the most exciting projects of 2023, Q1 2024, Q2 2024, and Q3 2024, here are the most useful, innovative, and exciting LLM-powered projects I found in Q4 2024.
Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper 🕷️
Crawl4AI (11K stars) is a powerful, open-source Python asynchronous web crawling and data extraction library designed for large language models (LLMs) and AI applications. It offers blazing-fast performance, outperforming many paid services, while providing LLM-friendly output formats like JSON, cleaned HTML, and markdown.
Key features include:
- multi-URL crawling
- media tag extraction
- custom hooks for authentication and page modifications
- user-agent customization
- screenshot capture
- various chunking and extraction strategies
Crawl4AI excels in complex scenarios like session management and dynamic content crawling, making it ideal for tasks such as analyzing GitHub commits across multiple pages.
The library is particularly useful for developers working on AI-driven web scraping projects, offering a free alternative to paid services with comparable or better performance.
Users can try out Crawl4AI using the provided Colab notebook or explore its capabilities through the comprehensive documentation.
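To make the chunking idea from the feature list concrete, here is a minimal sketch of one common strategy: a sliding window over words, with some overlap between neighbouring chunks. It is written in JavaScript to match the snippet later in this article and is not Crawl4AI's API (the library itself is Python). `chunkByWords` and its parameters are hypothetical names chosen for illustration.

```javascript
// Illustrative only: a generic sliding-window chunker, not Crawl4AI's implementation.
// Splits text into chunks of at most `size` words. Consecutive chunks share
// `overlap` words so that context is not lost at chunk boundaries.
function chunkByWords(text, size = 200, overlap = 20) {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError("expected size > 0 and 0 <= overlap < size");
  }
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  const step = size - overlap; // how far the window advances each time
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // this window already reached the end
  }
  return chunks;
}

// Usage: chunk some crawled text before sending it to an LLM.
const demoChunks = chunkByWords("one two three four five six seven", 3, 1);
console.log(demoChunks);
```

The overlap is a trade-off: larger values keep more context across boundaries but send more duplicate tokens to the model.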
Firecrawl: Turns websites into LLM-ready markdown 🔥
Firecrawl is a powerful web crawling and data extraction API service designed for AI applications. It offers advanced scraping, crawling, and structured data extraction capabilities, converting web content into clean markdown or structured data formats ideal for large language models (LLMs).
Key features include multi-URL crawling, LLM-ready output formats, proxy support, anti-bot handling, and customizable extraction options. Firecrawl excels in reliability and customizability, offering features like custom headers for authentication, PDF and image parsing, and interactive page actions. It’s particularly useful for developers building AI-powered web scraping projects, offering both a cloud-based API service and open-source options.
Users can easily try Firecrawl through its playground or explore its capabilities via the comprehensive documentation. The project also provides SDKs for multiple programming languages and integrations with popular LLM frameworks, making it a versatile tool for various web data extraction needs.
AI Hawk: Auto jobs applier
Auto_Jobs_Applier_AIHawk is an AI-powered job search assistant, currently in beta, that automates the job application process. It offers features like intelligent job search automation, rapid application submission, AI-powered personalization for resumes and cover letters, and bulk application capabilities with quality control measures.
The tool is designed to streamline the job hunting process by automatically searching for relevant positions, filling out application forms, and even generating tailored resumes for each application. It supports various LLMs, including OpenAI’s GPT models, Ollama, Claude, and Gemini, allowing users to customize their experience. The project includes detailed configuration options for job search parameters, resume information, and LLM settings. It is particularly useful for job seekers looking to efficiently apply to multiple positions while maintaining personalization in their applications.
Users can try out the tool by following the installation instructions and configuring their job search preferences in the provided YAML files. The project’s GitHub repository offers comprehensive documentation, troubleshooting guides, and community support for users getting started with this automated job application tool. It’s also an interesting tool to study if you are working on agentic solutions.
Trigger.dev: Long-running background jobs without ⌛
Trigger.dev is an open-source platform and SDK for creating long-running background jobs without timeouts. It allows developers to write normal async code in JavaScript or TypeScript, deploy it, and never hit a timeout.
Key features include reliability by default, no infrastructure management, and compatibility with existing tech stacks. Trigger.dev integrates directly into your codebase, allowing for version control, local development, testing, and code review using familiar processes. The platform supports multiple environments (Development, Staging, Production) and provides full visibility of every job run through a detailed trace view. Trigger.dev offers both cloud-based and self-hosted options, making it flexible for various deployment needs.
Gateway: Blazing Fast AI Gateway 🤝🏼
Portkey AI Gateway is an open-source platform that provides a unified API for routing requests to over 200 language, vision, audio, and image models from various providers. It offers production-ready features such as caching, fallbacks, retries, timeouts, and load balancing, with the ability to be edge-deployed for minimal latency.
Key features include blazing fast performance (the project advertises a 9.9x speedup), a tiny footprint (about 100 kB build), load balancing across multiple models and providers, automatic retries, configurable timeouts, and support for multimodal AI tasks. The gateway is compatible with the OpenAI API and SDKs, making it easy to integrate into existing projects. It supports multiple deployment options, including a hosted version, a self-hosted open-source version, and an enterprise version with additional security and management features.
The project is actively maintained, has processed over 480 billion tokens, and is used by companies like Postman, Haptik, and Turing. Users can get started quickly with the hosted API, or deploy the open-source version using npm or other deployment methods. The AI Gateway also integrates well with popular agent frameworks and offers extensive documentation and community support for developers.
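The fallback, retry, and timeout features are easier to picture with a small example. The sketch below shows the general pattern in plain JavaScript. It is not Portkey's API or configuration format: `callWithFallback`, `withTimeout`, and the provider objects are hypothetical names used only for illustration.

```javascript
// Illustrative only: the general retry + timeout + fallback pattern,
// not Portkey's actual implementation or config schema.

// Rejects if `promise` does not settle within `ms` milliseconds.
// Note: this only stops waiting; it does not cancel the underlying work.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Tries each provider in order. Each provider gets up to `retries + 1`
// attempts before the next provider in the list is tried.
async function callWithFallback(providers, prompt, { retries = 1, timeoutMs = 5000 } = {}) {
  const errors = [];
  for (const provider of providers) {
    for (let attempt = 0; attempt <= retries; attempt++) {
      try {
        const output = await withTimeout(provider.call(prompt), timeoutMs);
        return { provider: provider.name, output };
      } catch (err) {
        errors.push(`${provider.name} (attempt ${attempt + 1}): ${err.message}`);
      }
    }
  }
  throw new Error(`all providers failed:\n${errors.join("\n")}`);
}

// Usage with stub providers (a real setup would call model APIs here).
const flaky = { name: "primary", call: async () => { throw new Error("503"); } };
const stable = { name: "backup", call: async (p) => `echo: ${p}` };

callWithFallback([flaky, stable], "hello").then((result) => console.log(result));
```

A gateway moves this logic out of application code, so every service behind it gets the same behaviour without reimplementing it.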
Awesome ChatGPT Prompts: Prompt like a Pro! 💬
“Awesome ChatGPT Prompts” is a vibrant, community-driven repository that serves as a comprehensive collection of creative and practical prompts for ChatGPT interactions. Beyond just being a simple prompt library, it has evolved into a dynamic ecosystem featuring a curated store of custom GPTs and specialized models. The repository includes everything from technical prompts for Ethereum development to SEO content creation guidelines, making it an invaluable resource for developers, content creators, and AI enthusiasts looking to maximize their ChatGPT experience. What sets this repository apart is its dual focus on both practical application and community engagement, allowing users to not only consume but also contribute to an ever-growing collection of prompts that push the boundaries of what’s possible with conversational AI.
OmniParser: Screen parsing tool 🖥️
OmniParser is an innovative screen parsing tool designed to enhance the capabilities of Large Language Models (LLMs) in interacting with graphical user interfaces. The system comprises two main components: a detection model that identifies clickable elements in user interfaces, and a caption model that describes the functionality of these elements. Built on a dataset of 67,000 unique screenshots from popular websites and 7,000 icon-description pairs, OmniParser converts unstructured UI screenshots into structured formats that LLMs can better understand and act upon.
The tool has demonstrated superior performance compared to GPT-4V baselines on several benchmarks including ScreenSpot, Mind2Web, and AITW, notably achieving these results using only screenshot inputs rather than requiring additional contextual information. Its plugin-ready architecture also allows integration with other vision language models like Phi-3.5-V and Llama-3.2-V, making it a versatile solution for vision-based GUI interaction.
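To illustrate what a structured, LLM-friendly view of a screenshot might look like, here is a minimal sketch that turns a list of detected elements into a numbered text listing for a prompt. The element shape (`bbox`, `caption`, `interactable`) and the function name `elementsToPrompt` are assumptions made for this example; OmniParser's real output format may differ.

```javascript
// Illustrative only: a hypothetical element format, not OmniParser's actual output schema.
// Each element has a bounding box [x1, y1, x2, y2] in pixels and a caption
// describing what the element does.
function elementsToPrompt(elements) {
  return elements
    .map((el, i) => {
      const [x1, y1, x2, y2] = el.bbox;
      // The center of the box is a natural point for an agent to click.
      const cx = Math.round((x1 + x2) / 2);
      const cy = Math.round((y1 + y2) / 2);
      const kind = el.interactable ? "clickable" : "static";
      return `[${i}] ${kind} "${el.caption}" at (${cx}, ${cy})`;
    })
    .join("\n");
}

// Usage with two made-up detections.
const parsed = [
  { bbox: [10, 10, 110, 40], caption: "Search button", interactable: true },
  { bbox: [0, 60, 300, 80], caption: "Page title", interactable: false },
];
console.log(elementsToPrompt(parsed));
```

With a listing like this, the LLM can answer with an element index instead of guessing pixel coordinates from the raw image.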
Zerox OCR: Visual Document Processing Made Easy 🔍
I had been struggling to extract text from documents with complex layouts until I found Zerox OCR. What makes it unique is its approach of treating documents as what they really are: visual representations. Instead of wrestling with traditional OCR methods, Zerox converts your documents (PDFs, images, etc.) into a series of images and lets AI vision models interpret them naturally, handling the tricky tables, charts, and unusual layouts that usually break other tools.
I love how it offers both Python and Node SDKs, making it super flexible for different projects. You can try it out with a variety of vision models from OpenAI, Azure, Anthropic, and others, and the best part is how dead simple it is to use – just feed it a file and get back clean Markdown. For me, what sealed the deal was seeing how it handled my multi-page documents with complex tables spanning across pages while maintaining formatting consistency. Whether you’re batch processing documents or need a reliable OCR solution for your AI pipeline, this tool just makes sense. If you don’t believe me, give it a try using the hosted demo website.
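The page-by-page flow described above can be sketched as follows. This is a simplified illustration, not Zerox's API: `documentToMarkdown` is a hypothetical name, the pages are assumed to be already rendered to images, and `transcribePage` stands in for a call to whichever vision model you use.

```javascript
// Illustrative only: the general "pages -> vision model -> Markdown" flow,
// not Zerox's actual implementation.
// `pageImages` is an array of already-rendered page images (any representation).
// `transcribePage` is an async function (image, pageNumber) => Markdown string.
async function documentToMarkdown(pageImages, transcribePage) {
  const pages = [];
  // Pages are processed one at a time, in order, so the output keeps page order.
  for (let i = 0; i < pageImages.length; i++) {
    const markdown = await transcribePage(pageImages[i], i + 1);
    pages.push(markdown.trim());
  }
  // A blank line between pages keeps the combined Markdown readable.
  return pages.join("\n\n");
}

// Usage with a stub in place of a real vision-model call.
const fakeModel = async (image, pageNumber) => `## Page ${pageNumber}\n\nText from ${image}`;
documentToMarkdown(["page-1.png", "page-2.png"], fakeModel).then((md) => console.log(md));
```

Processing pages sequentially is the simplest choice; a real pipeline would likely run several pages concurrently and handle per-page failures.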
Skyvern: Browser Automation Reinvented 🐉
Anyone working with web automation scripts has struggled to keep up with website changes that can break them. That was my experience until I discovered Skyvern, a complete game-changer in how we automate browser-based workflows. Instead of wrestling with fragile XPath selectors that break whenever a website updates its layout, Skyvern uses a brilliant combination of LLMs and computer vision to understand what’s on the screen and interact with it naturally, just like a human would.
What really blew my mind is how it can tackle workflows it’s never seen before, from filling out complex insurance forms to downloading invoices across different platforms, all while being highly resistant to website changes. The coolest part? It can take a single workflow and apply it across multiple websites: imagine automating job applications or procurement processes across dozens of sites without writing custom code for each one. Whether you’re looking to automate repetitive tasks or build scalable web automation solutions, Skyvern’s approach of using a swarm of specialized agents (for navigation, data extraction, password management, and more) makes it a much more reliable and adaptable solution than traditional automation tools.
500 LLM prompts for inspiration 💬
The article “500+ Best Prompts for ChatGPT (Ultimate List for 2024)” on God of Prompt provides a comprehensive guide to crafting effective and engaging prompts for ChatGPT. It emphasizes the importance of specificity and context in prompts to elicit detailed and useful responses. The guide includes a variety of templates tailored to different industries and use cases, so users can find a suitable prompt for their needs. Whether you’re a blogger, content creator, or just looking to improve your interactions with ChatGPT, this list offers practical and versatile prompts to enhance productivity and creativity.
Screenshot-to-code: Create HTML source code from mockups 🎨
Screenshot-to-code is a game-changing tool that transforms design mockups into working code in seconds! As a developer who frequently works with designers, I find it incredible how it can take any screenshot, mockup, or Figma design and convert it into clean, functional code using state-of-the-art AI models like Claude 3.5 Sonnet and GPT-4o.
What really sets it apart is its versatility – whether you need HTML with Tailwind, React components, Vue templates, or even Bootstrap layouts, it’s got you covered. The coolest part? I can now even turn screen recordings into working prototypes! With both free self-hosted and paid hosted options available, plus enterprise support for larger teams, it’s become an essential part of my development workflow. It’s like having an AI pair programmer that handles all the tedious frontend implementation while I focus on the core functionality.
Llama OCR – Free Vision-Powered Text Extraction 🦙
As a developer who is always on the lookout for cost-effective OCR solutions, I was thrilled to discover Llama OCR. This npm package is a game-changer for anyone needing to extract text from images without breaking the bank, as it uses Llama 3.2 Vision through Together AI to extract text for free.
I love how straightforward it is to use – just provide an image path and an API key, and you get back clean markdown text.
import { ocr } from "llama-ocr";

const markdown = await ocr({
  filePath: "./trader-joes-receipt.jpg", // path to your image (soon PDF!)
  apiKey: process.env.TOGETHER_API_KEY, // Together AI API key
});
Whether you’re processing receipts (like in their Trader Joe’s example) or any other image-based documents, the simplicity is unmatched. Plus, with their roadmap including PDF support and JSON output options, it’s clear this tool is evolving in the right direction. For those who need more processing power, they even offer paid endpoints with their 11B and 90B models – but the free tier is perfect for getting started. Try it out at LlamaOCR.com or you can just copy the prompt and use it with other LLMs.
More updates coming soon
Revisit this article for more updates and let me know in the comments if you think I missed anything interesting.