• argv_minus_one@beehaw.org · 0 points · 1 year ago

    Right, but these are big companies with lots of talented programmers on hand. If anyone can overcome such an obstacle, it’s them.

    Also, Google and Microsoft already have a search index full of Reddit content to scrape.

    • sealneaward@lemmy.ml · 0 points · 1 year ago

      You're right, though you would need a team of skilled scrapers and network engineers who know how to get around rate limiters, with some kind of external load balancer or something along those lines.

      • MrPoopyButthole@lemmy.world · 1 point · 1 year ago

        Rate limiters key on the source IP. This is easily bypassed with a rotating proxy; there are even SaaS products that offer this. The trick is not to draw from a few large subnets that can easily be blocked as a unit. You have to use a lot of scattered /32 IPs to be effective.
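To make the "keyed on source IP" point concrete, here is a minimal sketch of a fixed-window limiter, assuming IPv4 only and assuming the service counts requests per network prefix. `FixedWindowLimiter`, `simulate`, and the choice of /24 as the aggregation prefix are hypothetical illustrations, not how any particular site actually does it. With a /32 key, every address gets its own budget; with a /24 key, addresses from the same subnet share one budget, which is why many addresses from one block are easy to throttle or block together.

```python
import ipaddress
from collections import defaultdict


class FixedWindowLimiter:
    """Hypothetical limiter: at most `limit` requests per key per window.

    The key is the source address collapsed to a network prefix
    (IPv4 assumed). prefix_len=32 means one bucket per address;
    prefix_len=24 means one bucket shared by a whole /24.
    """

    def __init__(self, limit, prefix_len=32):
        self.limit = limit
        self.prefix_len = prefix_len
        self.counts = defaultdict(int)

    def key(self, ip):
        # e.g. "203.0.113.5" with prefix_len=24 -> "203.0.113.0/24"
        net = ipaddress.ip_network(f"{ip}/{self.prefix_len}", strict=False)
        return str(net)

    def allow(self, ip):
        k = self.key(ip)
        if self.counts[k] >= self.limit:
            return False  # this bucket is exhausted for the current window
        self.counts[k] += 1
        return True

    def reset_window(self):
        # Stands in for the time window expiring; no clock is modeled here.
        self.counts.clear()


def simulate(ips, limit, prefix_len):
    """Return how many requests in `ips` a fresh limiter would allow."""
    limiter = FixedWindowLimiter(limit, prefix_len)
    return sum(limiter.allow(ip) for ip in ips)


if __name__ == "__main__":
    same_ip = ["203.0.113.5"] * 10
    same_subnet = [f"203.0.113.{i}" for i in range(1, 11)]
    scattered = [f"10.0.{i}.1" for i in range(10)]  # ten different /24s

    print(simulate(same_ip, 3, 32))      # 3
    print(simulate(same_subnet, 3, 32))  # 10
    print(simulate(same_subnet, 3, 24))  # 3
    print(simulate(scattered, 3, 24))    # 10
```

The fixed window is the simplest scheme to show; real services may use sliding windows, token buckets, or signals other than the address.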