The Web Scraping Club by Pierluigi Vinciguerra

The Web Scraping Club is a Substack publication curated by Pierluigi Vinciguerra, a seasoned expert in the field of web scraping. This blog serves as a comprehensive resource for individuals interested in the technical and practical aspects of web scraping. It covers a wide range of topics, from basic tutorials for beginners to advanced techniques for experienced scrapers, ensuring there’s something valuable for readers at every skill level. Vinciguerra’s blog is particularly known for its ‘The Lab’ series, which delves into complex challenges faced in web scraping. Recent entries have explored bypassing Cloudflare with anti-detect browsers and evaluating commercial web unblockers. These posts are not only informative...

What’s next for AI agentic workflows by Andrew Ng (AI Ascent by Sequoia Capital)

I came across a fascinating video from Andrew Ng discussing the future of AI with a focus on agentic workflows. Unlike current non-agentic workflows, where we spoon-feed instructions to AI models, agentic workflows let these models act more independently and tackle tasks on their own. Ng offers a great analogy: a non-agentic workflow is like writing an essay without being able to use backspace. LLMs do this remarkably well, but they do even better when they can reflect on their results and revise them, which is what agentic workflows allow. The...
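To make the "backspace" analogy concrete, here is a minimal sketch of a draft, critique, revise loop. It is not taken from the video: the function names are hypothetical, and the three toy callables stand in for calls to a real language model.

```python
from typing import Callable


def run_with_reflection(
    task: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], str],
    revise: Callable[[str, str, str], str],
    max_rounds: int = 3,
) -> str:
    """Draft once, then repeatedly critique and revise the draft."""
    draft = generate(task)  # the one-shot, "no backspace" answer
    for _ in range(max_rounds):
        feedback = critique(task, draft)
        if not feedback:  # empty feedback means nothing left to fix
            break
        draft = revise(task, draft, feedback)
    return draft


# Toy stand-ins (assumptions): a real workflow would call an LLM here.
def toy_generate(task: str) -> str:
    return "Agentic workflows let teh model revise its work."


def toy_critique(task: str, draft: str) -> str:
    return "Fix the typo 'teh'." if "teh" in draft else ""


def toy_revise(task: str, draft: str, feedback: str) -> str:
    return draft.replace("teh", "the")


final = run_with_reflection(
    "Write one sentence about agentic workflows.",
    toy_generate,
    toy_critique,
    toy_revise,
)
print(final)
```

The cap on rounds is a design choice in this sketch: it keeps the loop from running forever if the critique step never comes back empty.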

Airbyte serverless to load data to your warehouse in 10 lines of Python source code

AirbyteServerless is a straightforward tool designed to manage Airbyte connectors. It offers the flexibility to run these connectors either locally or in serverless mode. If you’re dealing with data pipelines, ETL, data warehousing, or data engineering, AirbyteServerless is a must-have in your tech stack. It simplifies moving data from various sources to your data warehouse, and it works with almost any data source out there. The repository is available on GitHub. You don’t need a database, a UI, or an Airbyte server. Plus, serverless compute deployment is supported, meaning it can work on GitHub...

Tokenization by Andrej Karpathy

In a recent YouTube tutorial, Andrej Karpathy, who previously led AI at Tesla and was a founding member of OpenAI, unveiled the secrets of tokenization. Buckle up, tech-savvy professionals, because this isn’t your run-of-the-mill theoretical lecture. It’s a hands-on journey into the heart of language models. So, what’s the deal with tokenization? Imagine it as the backstage choreographer for Large Language Models (LLMs). It translates between human-readable strings and the cryptic tokens that LLMs munch on. In this tutorial, we’re not just peeking behind the curtain; we’re building our own tokenizer from scratch. So grab your code editor, channel your inner Karpathy, and let’s build...
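To give a feel for what "building a tokenizer from scratch" can look like, here is a minimal sketch of byte-level byte pair encoding (BPE). It is an illustration under assumptions, not the code from the tutorial: the function names and the sample text are made up for this example.

```python
from collections import Counter


def pair_counts(ids):
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))


def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train(text, num_merges):
    """Learn up to `num_merges` merges, starting from raw UTF-8 bytes."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (id, id) -> new id, kept in the order the merges were learned
    for step in range(num_merges):
        counts = pair_counts(ids)
        if not counts:
            break
        pair = max(counts, key=counts.get)  # most frequent adjacent pair
        new_id = 256 + step  # ids 0..255 are reserved for single bytes
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges


def encode(text, merges):
    """String -> token ids, replaying merges in the order they were learned."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Token ids -> string, by expanding each id back into its bytes."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")


sample = "low lower lowest low low"
merges = train(sample, num_merges=5)
tokens = encode(sample, merges)
print(len(sample.encode("utf-8")), "bytes ->", len(tokens), "tokens")
print(decode(tokens, merges))
```

Starting from bytes rather than characters means any string can be encoded, and the round trip through `decode` returns the original text.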

150x Faster Pandas using NVIDIA

Believe it or not, NVIDIA is making Pandas up to 150x faster without source code changes. What do you need to do? Their cuDF library, part of RAPIDS, automatically detects whether you’re running on GPU or CPU and speeds up your processing accordingly. You can try it here: https://colab.research.google.com/drive/12tCzP94zFG2BRduACucn5Q_OcX1TUKY3 GitHub repo (7K stars): https://github.com/rapidsai/cudf
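As a rough sketch of what "without source code changes" means in practice: cuDF ships a `cudf.pandas` accelerator mode that is switched on before pandas is imported, and the pandas code itself stays the same. The example below assumes cuDF may or may not be installed and falls back to plain pandas if it is not; the column names and numbers are made up for illustration.

```python
# Turn on the accelerator if cuDF is available; otherwise use plain pandas.
try:
    import cudf.pandas

    cudf.pandas.install()  # must run before pandas is imported
    accelerated = True
except ImportError:
    accelerated = False

import pandas as pd

# Ordinary pandas code, unchanged whether or not the accelerator is active.
df = pd.DataFrame(
    {
        "store": ["a", "b", "a", "b"],
        "sales": [10, 20, 30, 40],
    }
)
totals = df.groupby("store")["sales"].sum()

print("cudf.pandas active:", accelerated)
print(totals)
```

In a notebook, the same mode can be enabled with `%load_ext cudf.pandas` in the first cell, and an existing script can be run unmodified with `python -m cudf.pandas script.py`.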

GTM Mastery: ChatGPT’s Top Tips for Speeding Up Your Workflow!

If you’re tired of grappling with Google Tag Manager (GTM) and longing for expert advice to make your work more efficient, you’re in the right place. In this article, we’re unleashing ChatGPT’s top five tips to supercharge your GTM workflow. You won’t want to miss these game-changing hacks, from automating repetitive tasks to getting insights that will make your analytics sing. Rewrite incompatible source code: GTM only supports JavaScript compatible with ES5. For instance, you cannot declare variables using const or let. But if you have an ES6 source code snippet you want to use in GTM, just ask...

Creativity will set you apart in the AI era

Buckle up, fellow tech-savvy adventurers, because the age of Artificial Intelligence is turning us all into creative wizards! Picture this: with just a few taps on your keyboard, ChatGPT whips up a shiny new website, while the Code Interpreter crunches numbers like a pro. And let’s not forget Roblox, the ultimate playground for budding game-makers! What’s the magical twist, you ask? Well, it’s all about the unique flavors of creativity bubbling within us tech-heads. The real deal isn’t just the tools we wield; it’s the enchanting experiences we choose to conjure! So, here’s the spellbinding takeaway: In this era of...