Panagiotis Tzamtzis - Digital analytics consultant & Web developer

The Web Scraping Club by Pierluigi Vinciguerra

The Web Scraping Club is a Substack publication curated by Pierluigi Vinciguerra, a seasoned expert in the field of web scraping. This blog serves as a comprehensive resource for individuals interested in the technical and practical aspects of web scraping. It covers a wide range of topics, from basic tutorials for beginners to advanced techniques for experienced scrapers, ensuring there’s something valuable for readers at every skill level. Vinciguerra’s blog is particularly known for its ‘The Lab’ series, which delves into complex challenges faced in web scraping. Recent entries have explored bypassing Cloudflare with anti-detect browsers and evaluating commercial web unblockers. These posts are not only informative...

What’s next for AI agentic workflows by Andrew Ng (AI Ascent by Sequoia Capital)

I came across a fascinating video from Andrew Ng discussing the future of AI with a focus on agentic workflows. Unlike the current non-agentic workflows, where we spoon-feed instructions to AI models, agentic workflows allow these models to act more independently and tackle tasks on their own. A great analogy from Andrew Ng is that a non-agentic workflow is like writing an essay without the option to use backspace (which LLMs handle remarkably well). What works even better is giving them the chance to reflect on their results and revise them (agentic workflows). The...
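To make the "backspace" analogy concrete, here is a minimal sketch of a draft, critique, and revise loop. It is only an illustration under my own assumptions and not code from the video: `agentic_write`, the prompt wording, and the `model` callable (any function that takes a prompt string and returns text) are all hypothetical.

```python
from typing import Callable


def agentic_write(task: str, model: Callable[[str], str], max_rounds: int = 3) -> str:
    """Draft once, then let the model critique and revise its own output.

    `model` is a hypothetical stand-in for any LLM call (prompt in, text out).
    """
    # Non-agentic behaviour would stop after this single pass.
    draft = model(f"Write a first draft for this task:\n{task}")

    for _ in range(max_rounds):
        # Ask the model to review its own draft.
        feedback = model(
            "Critique this draft. Reply with exactly 'OK' if it needs no changes.\n\n"
            f"Task: {task}\n\nDraft:\n{draft}"
        )
        if feedback.strip() == "OK":
            break  # the model is satisfied, so stop revising

        # Feed the critique back in and produce an improved draft.
        draft = model(
            "Revise the draft using the feedback.\n\n"
            f"Task: {task}\n\nDraft:\n{draft}\n\nFeedback:\n{feedback}"
        )
    return draft
```

The `max_rounds` cap is a design choice of this sketch: it keeps the loop from running forever if the critique step never returns 'OK'.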

The rise of the AI engineer by Shawn Wang (Latent.space)

I love Shawn’s articles and thinking style. Both his blog and the latent.space Substack have interesting, forward-thinking articles on AI engineering. One of the articles that caught my attention was “The rise of the AI engineer”. In the rapidly evolving landscape of artificial intelligence, a new breed of technologists is emerging: the AI Engineer. As detailed in the Latent Space article, this role goes beyond the traditional Prompt Engineer to embrace the full potential of software development within the AI realm. The AI Engineer is at the forefront of a generational shift, harnessing the power of Foundation Models and open-source...

Airbyte serverless to load data to your warehouse in 10 lines of Python source code

AirbyteServerless is a straightforward tool designed to manage Airbyte connectors. It offers the flexibility to run these connectors either locally or in serverless mode. If you’re dealing with data pipelines, ETL, data warehousing, or data engineering, AirbyteServerless is worth having in your tech stack. It simplifies the process of moving data from various sources to your data warehouse. The repository is available on GitHub. You can use it to load data to your data warehouse from almost any data source out there, and you don’t need a DB, a UI or an Airbyte server. Plus, serverless compute deployment is supported, meaning it can work on GitHub...

Tokenization by Andrej Karpathy

In a recent YouTube tutorial, Andrej Karpathy—the wizard behind Tesla’s Autopilot and OpenAI’s GPT—unveiled the secrets of tokenization. Buckle up, tech-savvy professionals, because this isn’t your run-of-the-mill theoretical lecture. It’s a hands-on journey into the heart of language models. So, what’s the deal with tokenization? Imagine it as the backstage choreographer for Large Language Models (LLMs). It translates between human-readable strings and the cryptic tokens that LLMs munch on. In this tutorial, we’re not just peeking behind the curtain; we’re building our own tokenizer from scratch. Here’s the lowdown: So grab your code editor, channel your inner Karpathy, and let’s build...
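As a taste of what building a tokenizer from scratch involves, here is a minimal sketch of byte-pair encoding (BPE) over UTF-8 bytes. The function names and structure are my own assumptions for illustration and are not copied from the tutorial; a real tokenizer also needs pre-splitting rules, special tokens, and an encoder for new text.

```python
from collections import Counter


def get_pair_counts(ids):
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))


def merge(ids, pair, new_id):
    """Replace each non-overlapping occurrence of `pair` with `new_id`, left to right."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges, starting from the raw UTF-8 bytes (ids 0..255)."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (id, id) -> new id, kept in the order the merges were learned
    for step in range(num_merges):
        counts = get_pair_counts(ids)
        if not counts:
            break  # fewer than two tokens left, nothing to merge
        pair = max(counts, key=counts.get)  # most frequent adjacent pair
        new_id = 256 + step
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges


def decode(ids, merges):
    """Turn token ids back into a string by expanding every merge into its bytes."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():  # dicts preserve insertion (merge) order
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")
```

Calling `train_bpe("aaabdaaabac", 3)` compresses the 11 input bytes into a shorter id sequence, and `decode` on that sequence returns the original string.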

GA4 and ChatGPT tricks for power users

Are you tired of sifting through mountains of data, drowning in analytics, and feeling like you’re always one step behind your digital marketing goals? Welcome to the era of GA4 Mastery, where the power of Google Analytics 4 (GA4) meets the limitless potential of ChatGPT. In this blog, we’re about to unlock the secrets that will forever change how you navigate the complex world of web analytics. Whether you’re a seasoned data wizard or just dipping your toes into the data-driven universe, these top five tips are your golden ticket to turbocharging your workflow, driving better insights, and achieving digital...

150x Faster Pandas using NVIDIA

Believe it or not, NVIDIA is making Pandas 150x faster without source code changes. What do you need to do? Their RAPIDS library will automatically know if you’re running on GPU or CPU and speed up your processing. You can try it here: https://colab.research.google.com/drive/12tCzP94zFG2BRduACucn5Q_OcX1TUKY3 GitHub repo (7K stars): https://github.com/rapidsai/cudf
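For a rough idea of what "without source code changes" looks like, here is a small sketch that assumes the `cudf.pandas` accelerator mode from the cuDF repo linked above. The sample data is made up, and the `try`/`except` fallback is my own addition so the script still runs on a machine without RAPIDS installed.

```python
# Try to switch on cuDF's pandas accelerator; fall back to plain pandas
# when RAPIDS is not installed (this fallback is an assumption of the sketch).
try:
    import cudf.pandas

    cudf.pandas.install()  # has to run before pandas is imported
    accelerated = True
except ImportError:
    accelerated = False

import pandas as pd

# From here on it is ordinary pandas code, unchanged either way.
df = pd.DataFrame(
    {
        "city": ["Athens", "Rome", "Athens", "Rome"],  # made-up sample data
        "sales": [10, 20, 30, 40],
    }
)
totals = df.groupby("city")["sales"].sum()

print("cudf.pandas active:", accelerated)
print(totals)
```

In a notebook the same switch is usually made with `%load_ext cudf.pandas` in the first cell, and an existing script can be run unmodified with `python -m cudf.pandas script.py`.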

Intro to Large Language Models from Andrej Karpathy

Andrej Karpathy (former Sr Director of AI @ Tesla & research scientist @ OpenAI) recorded a great one-hour introduction to the mechanics of Large Language Models. I really liked the easy-to-understand examples that even non-experienced listeners can relate to, the awesome summary of the most important events and breakthroughs that led the field to where we are today, and the LLM OS analogy.