Ask HN: Weird archive.today behavior?

archive.today has recently (I noticed this, like, 3 days ago) started automatically making requests to someone's personal blog on their CAPTCHA page. Here's a screenshot of what I'm talking about: https://files.catbox.moe/20jsle.png

The relevant JS is:

   setInterval(function() {
     fetch("https://gyrovague.com/?s=" + Math.round(new Date().getTime() % 10000000), {
       referrerPolicy: "no-referrer",
       mode: "no-cors"
     });
   }, 300);

Looking at this blog, there seems to be exactly one article mentioning archive.today - "archive.today: On the trail of the mysterious guerrilla archivist of the Internet" (https://gyrovague.com/2023/08/05/archive-today-on-the-trail-of-the-mysterious-guerrilla-archivist-of-the-internet/), where the person running the blog digs up some information about archive's owner.

So perhaps this is some kind of revenge/DoS attack attempt/deliberate waste of their bandwidth in response to this article? Maybe an attempt to silence them and force them to delete their article? But if it is, then I have so many questions. Like, why would the owner of the archive do that 2.5 years after the article was published? Or why would they even do that in the first place? Do they not know about the Streisand effect?

I'm confused.
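
For a rough sense of scale, the `setInterval` snippet quoted above can be unpacked with some arithmetic. This is only a sketch: the two constants are copied from the snippet, while the names (`INTERVAL_MS`, `MODULUS`, `requestsPer`) are made up here, and it assumes a tab that stays open with the timer firing on schedule (browsers may throttle timers in background tabs, so real numbers could be lower).

```javascript
// Constants copied from the quoted snippet.
const INTERVAL_MS = 300;     // delay passed to setInterval
const MODULUS = 10000000;    // from getTime() % 10000000

// Hypothetical helper: how many requests one open tab would issue over a
// given duration, assuming the timer is never throttled.
function requestsPer(durationMs) {
  return Math.floor(durationMs / INTERVAL_MS);
}

const perMinute = requestsPer(60 * 1000);
const perHour = requestsPer(60 * 60 * 1000);

// The ?s= value is the clock in milliseconds modulo MODULUS, so it wraps
// around every MODULUS / 1000 seconds.
const wrapSeconds = MODULUS / 1000;

console.log({ perMinute, perHour, wrapSeconds });
```

So a single visitor sitting on the CAPTCHA page would account for a few requests per second at most; any real load would have to come from many visitors at once.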

63 points | by rabinovich 7 hours ago

10 comments

  • rafram 1 hour ago
    Remember when Archive.is/today used to send Cloudflare DNS users into an endless captcha loop because the creator had some kind of philosophical disagreement with Cloudflare? Not the first time they’ve done something petty like this.
  • dunder_cat 1 hour ago
    Hmm. If it is an attempt at DDoS attacks, it's probably not very fruitful:

      >$ resolvectl query gyrovague.com
    
      gyrovague.com: 192.0.78.25                     -- link: eno1
                     192.0.78.24                     -- link: eno1
    
    Viewing the first IP address on https://bgp.he.net/ip/192.0.78.25 shows AS2635 (https://bgp.he.net/AS2635) is announcing 192.0.78.0/24. AS2635 is owned by https://automattic.com aka wordpress.com. I assume that for a managed environment at their scale, this is just another Wednesday for them.
    • arcfour 53 minutes ago
      I believe they're probably trying to get the blog suspended (automatically?) hence the cache busting; chewing through higher than normal resources all of a sudden might do the trick even if it doesn't actually take it offline.
    • dunder_cat 1 hour ago
      It occurred to me while reading the article that I could also just have checked the TLS cert. The cert I was given presents "Common Name tls.automattic.com". However, maybe someone will discover bgp.he.net via this :-)
      • catlifeonmars 59 minutes ago
        > maybe someone will discover bgp.he.net via this

        I did, thank you!

    • mike_d 56 minutes ago
      It is using the ?s= parameter which causes WordPress to initiate a search for a random string. This can result in high CPU usage, which I believe is one of the DoS vectors that works on hosted WordPress.
  • ideasphere 31 minutes ago
    https://news.ycombinator.com/item?id=45922875

    “Behind the complaints: Our investigation into the suspicious pressure on Archive.today”

  • eli 39 minutes ago
    Well that is a very silly way to punish the author of an article you don’t want people to know about.
  • sbdaman 1 hour ago
    Given it's set to generate random pages on the site, is there even any possible explanation for this that isn't sketchy?
    • mediumdeviation 1 hour ago
      It's not random. Setting the query string to a new value on every fetch is a cache-busting technique: it's trying to prevent the browser from caching the page, presumably to increase bandwidth usage.
  • nativeit 1 hour ago
    I just tried in my browser (Firefox on Ubuntu) and got the same result. Deeply curious.
  • internetter 59 minutes ago
    There's really no interpretation of this which isn't malicious, although, not to defend this behaviour whatsoever, I'm not entirely surprised by it. The only real value of archive.is is its paywall bypassing abilities and, presumably, large swaths of residential proxies that allow it to archive sites that archive.org can't. Only somebody with some degree of lawlessness would operate such a project.
  • mediumdeviation 1 hour ago
    Pretty sure that blog is hosted on WordPress.com infrastructure, so it's not like the blog owner would even notice unless it generates so much traffic that WP itself notices.

    That said, I don't think there are many non-malicious explanations for this. I would suggest writing to HN to see about blocking submissions from the domain [email protected]

    • heraldgeezer 1 hour ago
      Please no.

      It is the only non-cucked archive site in the world.

  • Barbing 44 minutes ago
    Worth blocking the URL for users of that Archive site then, avoid extra burden?
  • ventegus 6 hours ago
    They might need to tweak a single word. Streisand readers won’t have a clue which.

    Save the page now and compare a week later.
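
To illustrate the cache-busting point raised in the comments above, here is a minimal sketch of how the query string changes from one timer tick to the next. The helper `searchUrl` and the start timestamp are made up for illustration, and `example.com` is a placeholder host rather than the real blog. The code only builds strings and sends nothing.

```javascript
// Hypothetical helper mirroring the URL expression in the quoted snippet,
// with a placeholder host. It only builds the string; nothing is requested.
function searchUrl(nowMs) {
  return "https://example.com/?s=" + Math.round(nowMs % 10000000);
}

// Three consecutive timer ticks, 300 ms apart, from an arbitrary start time.
const start = 1700000123456;
const urls = [0, 300, 600].map((offset) => searchUrl(start + offset));

// Each tick yields a different query string, so a cache keyed on the full
// URL would treat every request as a page it has not seen before.
const allDistinct = new Set(urls).size === urls.length;

console.log(urls, allDistinct);
```

Because the value is derived from the clock rather than from a random number generator, both descriptions in the thread fit: the strings are not random, but they are different on every request.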