A tiny mouse, a hacker.

  • 0 Posts
  • 10 Comments
Joined 7 months ago
Cake day: December 24th, 2023

  • It’s not. It just doesn’t get enough hits for that 86 KB to matter. Fun fact: most AI crawlers hit /robots.txt first, get served the Bee Movie script, fail to interpret it, and leave without crawling further. If I let them crawl the entire site, that would result in about two megabytes of traffic. By serving an 86 KB file that doesn’t pass as robots.txt and has no links, I actually save bandwidth: not on a single request, but by preventing a hundred others.
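The trade-off described above can be put into numbers with a minimal sketch. It uses only the two figures from the comment (an 86 KB decoy and about two megabytes for a full crawl); the function name and the choice of decimal units are assumptions made for illustration.

```python
# Figures taken from the comment above; decimal units are an assumption.
DECOY_BYTES = 86_000          # the 86 KB file served in place of robots.txt
FULL_CRAWL_BYTES = 2_000_000  # "about two megabytes" for crawling the entire site


def bytes_saved_per_visit(decoy: int = DECOY_BYTES,
                          full_crawl: int = FULL_CRAWL_BYTES) -> int:
    """Traffic avoided when a crawler gives up after fetching the decoy."""
    return full_crawl - decoy


if __name__ == "__main__":
    saved = bytes_saved_per_visit()
    print(f"saved per crawler visit: {saved / 1000:.0f} KB")
```

The decoy costs more than a real robots.txt would on that one request, but under these figures it pays for itself as soon as a crawler skips the rest of the site.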



  • In theory, that would result in those fediverse servers requesting 333333 * 114 MB = ~38 GB/s.

    On the other hand, if the linked site did not serve garbage and fit in about 1 MB, like a normal site, then this would be only ~325 MB/s, and while that’s still high, it’s not the end of the world. If it’s a site that actually puts effort into being optimized, and a request fits in ~300 KB (still a lot, in my book, for what is essentially a preview, with only tiny parts of the actual content loaded), then we’re looking at 95 MB/s.

    If said site puts effort into making its previews reasonable, and serves ~30 KB, then that’s 9 MB/s. It’s 3190 in the Year of Our Lady Discord. A potato can serve that.
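The estimates above all have the same shape: request rate times payload size. Below is a rough sketch of that arithmetic. The rate of about 333.333 requests per second is an assumption read back from the quoted results (the thread the 333333 figure comes from is not quoted here), and decimal units are assumed too, so the output lands close to the quoted numbers without matching them exactly.

```python
# Assumption: ~333.333 requests per second, inferred from the quoted results.
ASSUMED_REQUESTS_PER_S = 333.333

# Payload sizes mentioned in the comment, in bytes (decimal units assumed).
PAYLOADS = {
    "garbage page": 114_000_000,
    "normal site": 1_000_000,
    "optimized site": 300_000,
    "reasonable preview": 30_000,
}


def bandwidth_mb_per_s(requests_per_s: float, payload_bytes: int) -> float:
    """Aggregate transfer rate in MB/s for a request rate and payload size."""
    return requests_per_s * payload_bytes / 1_000_000


if __name__ == "__main__":
    for label, size in PAYLOADS.items():
        rate = bandwidth_mb_per_s(ASSUMED_REQUESTS_PER_S, size)
        print(f"{label:>20}: {rate:,.1f} MB/s")
```

Whatever the exact rate, the ratio between the cases stays the same: the 114 MB response costs 3800 times more than a 30 KB preview.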


  • I only serve bloat to AI crawlers.

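    # http context: flag requests whose User-Agent matches a listed AI crawler
    map $http_user_agent $badagent {
      default     0;
      # list of AI crawler user agents in "~crawler 1" format
    }
    
    # server block: send every request from a flagged agent to /gpt
    if ($badagent) {
       rewrite ^ /gpt;
    }
    
    # /gpt proxies the Bee Movie script instead of any real content
    location /gpt {
      proxy_pass https://courses.cs.washington.edu/courses/cse163/20wi/files/lectures/L04/bee-movie.txt;
    }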
    

    …is a wonderful thing to put in my nginx config. (you can try curl -Is -H "User-Agent: GPTBot" https://chronicles.mad-scientist.club/robots.txt | grep content-length: to see it in action ;))
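For readers who don't speak nginx, here is a minimal Python sketch of the decision the config makes. It only illustrates the logic and says nothing about how nginx evaluates it internally. The pattern list and function names are hypothetical; "GPTBot" is included only because the curl example above uses it.

```python
import re

# Hypothetical pattern list; "GPTBot" is the only agent named in the comment.
BAD_AGENT_PATTERNS = [re.compile(r"GPTBot")]


def is_bad_agent(user_agent: str) -> bool:
    """Mirror of the map block: True when any pattern matches the User-Agent."""
    return any(p.search(user_agent) for p in BAD_AGENT_PATTERNS)


def route(user_agent: str, path: str) -> str:
    """Mirror of the if/rewrite: flagged agents get /gpt, whatever they asked for."""
    return "/gpt" if is_bad_agent(user_agent) else path


if __name__ == "__main__":
    print(route("GPTBot", "/robots.txt"))
    print(route("Mozilla/5.0 (X11; Linux x86_64)", "/robots.txt"))
```

Because the rewrite applies to every path, a flagged crawler asking for /robots.txt gets the same response as one asking for any other page.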



  • algernon@lemmy.ml to Linux@lemmy.ml: NixOS forked
    2 months ago

    There’s plenty, but I do not wish to hijack this thread, so… have a look at the Forgejo 7.0 release notes and the PRs they link to, alongside the notable features (and a boatload of bugfixes, many of which aren’t in Gitea). Then compare when (and if) similar features or fixes were implemented in Gitea.

    Governance aside, the major technical difference between Gitea and Forgejo is that Forgejo cherry-picks from Gitea weekly (being a hard fork doesn’t mean all ties are severed; it means that development happens independently). Gitea does not cherry-pick from Forgejo. They could: the license permits it, and it even permits sublicensing, so it’s not an obstacle for Gitea Cloud or Gitea EE, either. They just don’t.




  • There’s a very easy solution that lets you rest easy that your instance is how you want it to be: don’t do open registration. Vet the people you invite, and job done. If you want to be even safer, don’t post publicly - followers only. If you require follower approval, you can do some basic checks to see that whoever sends a follow request is someone you’re okay interacting with. This works on the microblogging side of the Fediverse quite well, today.

    What I’m trying to say is that requiring admin approval for registrations gets you 99% of the way there, without needing anything more complex than that.