archive.today is a security risk

Koopa con Carne

Lidl K. Rool
MarioWiki
This wiki, being an informational resource, relies actively and extensively on digital archives to cite info. The Wayback Machine may be the most widely used and prestigious option, both here and overall, but it has its limitations (it cannot parse JavaScript too well and excludes certain sites), which leads people to turn to a notable alternative: archive.today.

The problem with this site? Setting aside its concealed ownership and the fact that it is much less responsive to takedown requests than its bigger cousin, Wayback, it has been the perpetrator of a recent DDoS attack against a blog called gyrovague.com, essentially in response to a critical article, or doxxing attempt, back in 2023. Here is this blog making the claim and detailing the situation:


Whether or not the owner of archive.today was in the right to retaliate is a different discussion. This concerns us because archive.today is, apparently, using its visitors as proxies in this DDoS. This isn't just an interpersonal feud. It basically means that whenever a Mario Wiki reader clicks an archive.today link on one of our articles, their traffic is instantly and unknowingly leveraged as part of the attack. Since it happens without their knowledge, I doubt these users would take steps to block the related network requests.

What specifically made me confident enough to bring this up is the fact that Wikipedia itself is having a very lengthy debate at this moment on deprecating/blacklisting this archival service. While some users argue that it's the ethical thing to do, others are less eager to oblige, arguing that it remains a salient alternative to Wayback and that having a trustworthy encyclopedia should take priority over firewalling their readers.

What do we do? Do we straight up blacklist archive.today? Do we limit its use to citations where the original link is dead (and no other archival option is available)? Do we distance ourselves from it in any way?
 
MY GOD THE USE OF ARCHIVE TODAY HAS DISPANDED!!!
 
Would it be worth duping the contents of these archive pages somewhere else as an alternative? Or embedding the pages without JavaScript somehow? Now that the vulnerability is known, it would be irresponsible of the wiki to let people be pawns in a cyberattack.

Though who the heck isn't using uBlock Origin in [currentyear]?
 
Considering we're only a video game wiki and not a service people use daily for important topics like Wikipedia, I'm personally willing to take the quality hit by blacklisting archive.today until they stop running the DDoS.

This stance is in part because keeping the links but telling readers to not use the links isn't practical: most people would either not find or not remember the instruction.
 
Man, this is a tough one. The ethical implications of using archive.today would be a lot easier to handwave if using the service at all didn't serve as a proxy to DDoS some guy's blog, and mirroring the content on the Wayback Machine or recreating it on some other resource is not really viable either.

I don't have clear thoughts on this topic. Maybe I will once I've read the full Wikipedia discussion.
 
In that Wikipedia talk, someone suggested using megalodon.jp instead. What's particularly useful about it is that you can use it to archive archives from archive.today. I just did using a snapshot cited on our wiki! Look!

https://megalodon.jp/2026-0214-0427-13/https://archive.ph:443/uSC0a

Then you can run this page through Wayback Machine to archive the archive of the archive of archive.today. (You normally can't do that without the megalodon intermediary.)

https://web.archive.org/web/2026021...026-0214-0427-13/https://archive.ph:443/uSC0a
 
Addendum: if you use Megalodon to archive an AT snapshot of a Xitter post, then click on any photo in the captured post, you will be redirected to X.com through archive.md, which is an archive.today domain. Now, from my understanding, your traffic is used in that DDoS only if you encounter the captcha screen at archive.today, which doesn't occur in the aforementioned situation... but if you'd rather be safe than sorry, I would recommend straight up forgoing archive.today and only using Megalodon (followed by Wayback) if you want to capture an X post. Or, for that matter, anything else that isn't directly capturable by Wayback.
 
I've read a good chunk of the debate page on this subject. I feel the best course of action is to approach content relying on archive.today cautiously, but I don't think it's helpful to go back and remove every link, especially if there isn't a replacement available; archive.today is popular for a reason. I don't think we should leave the issue unaddressed either, because someone from archive.today is exploiting this service to carry on a feud against some blog writer.

Wikipedia's discussion page is helpful for processing what is going on, but Wikipedia is a far larger site than ours, so all issues related to users interacting with the CAPTCHA from an archive.today link are going to be magnified in Wikipedia's case: it enjoys far higher traffic and has far stricter sourcing standards that necessitate archived links than our wiki does. Whatever harm comes from keeping the links on MarioWiki is not going to be as pronounced as on Wikipedia.

Regardless, the harm being done still exists, as people are arguing, and the real question is: how much content are we willing to risk to minimize the harm? Given the above, I do not think we should be bending over backward. Some people in the discussion have already pointed out that many sites exploit viewers with tracking, cookies, etc., as is typical of most sites with advertisements anyway, so it's debatable why potentially contributing to a DDoS attack on a blogger should take higher priority for viewer security than the typical ad-infested pages our sources can link to.

Our decision should hinge on how many archive.today links we have. Someone on Discord answered my question: there are around 500 hits of archive.today being used, but I'd like to know more details, such as where they're used and what kind of content it typically is.
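
For the "where they're used" part, here is a rough sketch of how one could tally archive.today links per page from a dump of wikitext. This is only an illustration under assumptions: the function names are made up, the input is assumed to be a plain mapping of page title to wikitext, and the domain list beyond archive.today, archive.ph and archive.md (the ones named in this thread) is my own guess at the service's mirrors.

```python
import re

# Assumption: only archive.today, archive.ph and archive.md are named in this
# thread; the remaining entries are mirror domains the service is believed to use.
ARCHIVE_TODAY_DOMAINS = (
    "archive.today", "archive.ph", "archive.md",
    "archive.is", "archive.li", "archive.fo", "archive.vn",
)

# (?<!/) skips URLs nested inside another archive's URL (for example a
# Megalodon capture of an archive.ph snapshot), since those are not direct links.
ARCHIVE_TODAY_LINK = re.compile(
    r'(?<!/)https?://(?:www\.)?(?:'
    + "|".join(re.escape(domain) for domain in ARCHIVE_TODAY_DOMAINS)
    + r')(?::\d+)?/[^\s|\]}<>"]*',
    re.IGNORECASE,
)


def find_archive_today_links(wikitext):
    """Return every direct archive.today-family URL found in the wikitext."""
    return ARCHIVE_TODAY_LINK.findall(wikitext)


def summarize(pages):
    """Map each page title to its number of links, omitting pages with none.

    `pages` is a hypothetical dict of page title -> wikitext.
    """
    counts = {}
    for title, wikitext in pages.items():
        hits = len(find_archive_today_links(wikitext))
        if hits:
            counts[title] = hits
    return counts
```

Feeding this the text of every article would give a per-page count, which could then be sorted to see which pages lean on the service most. It only matches URLs with a path after the domain, so a bare link to the homepage would be missed.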

archive.today's owners, or whoever is managing that site, are engaging in a legally dubious activity. At the end of the day, it may not be up to us to be vigilant with the links; it may instead be a matter for law enforcement. Which leads to the next point: archive.today operates in a really gray legal zone due to its tendency to make paywalled articles accessible, and that seems to be a very common use for it in Wikipedia's articles. Even if this issue is resolved and the owners aren't using a botnet, we need to proceed cautiously in relying on this site to prevent link rot. The site could go down the next hour, tomorrow, next week, next month, next year. Who knows. We need to pretend the site will be gone the next day.

And even if that isn't the case, let's say there aren't legal troubles concerning the questionable copyright status of the site, I do think archive.today has proven itself to not be a trustworthy service if it's willing to leverage its status as a service for a petty dispute. edit: Even if this dispute has apparently arisen from the blog trying to dox the archive.today owners, that is a moot point to me compared to the terrible optics and decisions archive.today has made. If it's capable of this kind of stunt today, who knows what other dirty, underhanded tactics it'll try? I don't think any of us want to find out the hard way.

So, I'm leaning toward keeping links to archive.today, mostly because we have many instances of it, BUT with caveats on including new archive.today links. I don't think we should ban new links to archive.today; instead, any edit that adds one should show a tag, like the tags we already have for disambiguation links, redirects, mobile edits, and so on. We could possibly add a warning at the top of the editing field explaining the issue with archive.today links, just to inform people about the situation, but that might be overkill. What informs my approach is the knowledge that MarioWiki is nowhere near the size of Wikipedia and doesn't have the traffic to really affect the status of a blog, so breaking archive.today links just to have a probably negligible impact on this blog's condition is overcorrecting, overstepping, overdoing, etc. BUT I acknowledge that leveraging traffic to attack some blog is pretty gross, so there should probably be some inconvenience and awareness of the issue, and using archive.today should be discouraged except where no alternatives exist.
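
To make the tagging idea concrete, here is a minimal sketch of the check such a tag would need: compare the text before and after an edit and flag it only if it introduces a new archive.today link. Everything here is hypothetical (the tag name, the function names, and the choice to match only the three domains named in this thread). On the wiki itself this would presumably live in whatever edit-filtering tool is available, such as MediaWiki's AbuseFilter extension, rather than in Python.

```python
import re
from collections import Counter

# Assumption: matches only the three domains named in this thread.
ARCHIVE_TODAY_LINK = re.compile(
    r'(?<!/)https?://(?:www\.)?archive\.(?:today|ph|md)(?::\d+)?/[^\s|\]}<>"]*',
    re.IGNORECASE,
)

# Hypothetical tag name, in the spirit of the existing edit tags.
TAG = "archive.today link added"


def added_links(old_text, new_text):
    """Return links that occur more often after the edit than before it."""
    before = Counter(ARCHIVE_TODAY_LINK.findall(old_text))
    after = Counter(ARCHIVE_TODAY_LINK.findall(new_text))
    # Counter subtraction keeps only positive differences, so links that were
    # merely kept or removed do not show up.
    return sorted((after - before).elements())


def tags_for_edit(old_text, new_text):
    """Return the tags this sketch would attach to the edit."""
    return [TAG] if added_links(old_text, new_text) else []
```

Comparing counts instead of just searching the new text means that edits to a page which already contains such links are not flagged unless they add another one.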

I lean toward the equivalent of option B in the Wikipedia post, but I wouldn't go as far as banning future links.
 
Personally, we'd just prefer to sidestep the issues with a service willing to weaponize its own userbase (no matter how much self-defense it involved) by just... not using it whenever possible. Our patience for that sort of thing is nonexistent when it's a smaller website; that a large archive is pulling this is kind of absurd!

Now, we don't think it should be outright prohibited, just as we don't doubt that there are at least a few pages where it's the sole archive available, but we should definitely disincentivize it by expressly replacing its links with alternatives whenever possible. Really, it should just be a last resort.
 

If this is to be believed, it appears that the archive site tampered with snapshots to include the blogger's name and whatnot. This makes me more firmly believe we cannot trust archive.today for content on MarioWiki if it's willing to contaminate its content like this as such petty payback against this blog. I'm as surprised as the rest of you that the owners of archive.today have such poor judgement and tarnished their reputation over something most people otherwise wouldn't care about.

If Wikipedia found (according to an update on the Wikipedia discussion) that its uses can be replaced, then certainly MarioWiki can follow suit.
 
So Megalodon cannot fetch archive.today snapshots anymore, and some of the current archive.today captures haven't been backed up anyhow. This includes the character page from the European Mario Tennis Open site, which is/should be cited on "Profiles and statistics" pages.

How do we get around that? In the Tennis Open site's case, we can't simply take screenshots since the cited webpage uses a carousel UI to present its info, which is not functional in the archive.today capture. (However, the relevant info is still available in its source code.) This is kind of a crisis as the site hasn't been archived anywhere else at all, as seems to be the fate of all European Nintendo sites...
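
Since the info survives in the page source, one possible stopgap is to pull the text out of a saved copy of the HTML so it can be transcribed somewhere we control. Below is a minimal sketch using only Python's standard library. The class and function names are made up, and it assumes the carousel text sits in ordinary markup (not inside a script), which I haven't verified for the Tennis Open capture.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text chunks, ignoring script and style contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_text(html):
    """Return the non-empty text chunks of an HTML document, in source order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Because this reads the markup and does not render the page, slides that the broken carousel would never display still come through, as long as their text is present in the source.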
 