• sealneaward@lemmy.ml
    1 year ago

    Creating a web scraper and actually maintaining one that is effective and keeps working are two different things. It’s very easy to fight web scraping if you know what you are doing.
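    One common way sites fight scraping is per-client rate limiting. Below is a minimal sketch of a token bucket keyed only on the source IP. The class name, the rate/capacity values and the injectable clock are illustrative assumptions, not any particular site's implementation.

```python
import time


class TokenBucketLimiter:
    """Per-IP token bucket (hypothetical helper): each IP may burst up to
    `capacity` requests, then regains `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        # ip -> (tokens remaining, time of last update)
        self.buckets = {}

    def allow(self, ip):
        now = self.clock()
        tokens, last = self.buckets.get(ip, (self.capacity, now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[ip] = (tokens - 1, now)
            return True
        self.buckets[ip] = (tokens, now)
        return False


if __name__ == "__main__":
    fake_now = [0.0]  # frozen clock so the example is deterministic
    limiter = TokenBucketLimiter(rate=1.0, capacity=3, clock=lambda: fake_now[0])
    print([limiter.allow("203.0.113.7") for _ in range(5)])
    # → [True, True, True, False, False]
```

    Because the bucket is keyed on the address alone, a second IP gets its own fresh budget. That is exactly the weakness the replies below discuss.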

    • argv_minus_one@beehaw.org
      1 year ago

      Right, but these are big companies with lots of talented programmers on hand. If anyone can overcome such an obstacle, it’s them.

      Also, Google and Microsoft already have a search index full of Reddit content to scrape.

      • sealneaward@lemmy.ml
        1 year ago

        You are right. You would, though, need a team of skilled scrapers and network engineers who know how to get around rate limiters with some kind of external load balancer or something along those lines.

        • MrPoopyButthole@lemmy.world
          1 year ago

          Rate limiters work on the source IP. This is easily bypassed with a rotating proxy, and there are even SaaS providers that offer this. The trick is not to use large subnets, which can easily be blocked as a whole. To be effective, you have to use a lot of random individual (/32) IPs.
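          The point about large subnets is easiest to see from the defender's side: a limiter can count requests per enclosing network prefix instead of per exact address, so many addresses from one block share a single budget. Below is a minimal sketch using Python's standard `ipaddress` module. The /24 prefix length and the `prefix_key`/`requests_per_network` helpers are illustrative assumptions, not something the comment specifies.

```python
import ipaddress
from collections import Counter


def prefix_key(ip, prefix_len=24):
    """Map an IPv4 address to its enclosing network (hypothetical helper),
    e.g. 203.0.113.7 -> 203.0.113.0/24."""
    return ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)


def requests_per_network(ips, prefix_len=24):
    """Count requests per network prefix rather than per single address."""
    return Counter(prefix_key(ip, prefix_len) for ip in ips)


if __name__ == "__main__":
    # 50 requests from neighbouring addresses in one /24, plus two outliers.
    same_block = [f"203.0.113.{i}" for i in range(1, 51)]
    scattered = ["198.51.100.4", "192.0.2.9"]
    counts = requests_per_network(same_block + scattered)
    print(counts.most_common(1))
    # → [(IPv4Network('203.0.113.0/24'), 50)]
```

          Each address alone sent only one request, but grouped by /24 the block stands out immediately. Keying on /32 collapses back to plain per-IP counting, which is why many scattered single addresses are harder to catch than a block of neighbours.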