

Maybe not with just if statements. But with a heuristic system I bet any site that runs a tar pit will be caught out very quickly.
When I worked in the U.S. I was well above $160k.
When you look at leaks you can see $500k or more for principal engineers. Look at the information from Valve's lawsuit. https://www.theverge.com/2024/7/13/24197477/valve-employs-few-hundred-people-payroll-redacted
Meta is paying $400k BASE for AI Research engineers, with stock options on top, which in my experience add another 300% to 600%, vesting over 2 to 4 years. This is for H1B workers, who are traditionally paid less.
Once you get to principal and staff level engineering positions compensation opens up a lot.
https://h1bdata.info/index.php?em=meta+platforms+inc&job=&city=&year=all+years
ROI does not matter when companies are telling investors that they might be first to AGI. Investors go crazy over this. At least they will until the AI bubble pops.
I support people resisting if they want by setting up tar pits. But it’s a hobby and isn’t really doing much.
The sheer amount of resources going into this is beyond what people think.
That, and a competent engineer can probably write something on the BEAM VM that can handle a crap ton of parallel connections. Six figures, maybe? Being slow-walked means low CPU use, which means more green threads.
I see your point, but I think you underestimate the skill of coders. You make sure your timeout is inclusive of JavaScript run times. Maybe set a memory limit too. Like, imagine you wanted to scrape the internet. You could solve all these tarpits. Any capable coder could. Now imagine a team of 20 of the best coders money can buy, each paid 500.000€. They can certainly do the same.
Like I see the appeal of running a tar pit. But like I don’t see how they can “trap” anyone but script kiddies.
Fair. But I haven’t seen any anti-ai-scraper tarpits that do that. The ones I’ve seen mostly just pipe 10MB of /dev/urandom out there.
Also, I assume that the programmers working at AI companies are not literally mentally deficient. They would certainly add .timeout(10) or whatever to their scrapers. They probably have something more dynamic than that.
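To make the timeout point concrete, here is a rough sketch of the kind of guard I mean. It is not taken from any real scraper; the function name read_capped and its parameters are made up for illustration. It caps both total time and total bytes while reading a response body in chunks. The deadline is only checked when a chunk arrives, so a real client would also want a per-read socket timeout.

```python
import time
from typing import Callable, Iterable


def read_capped(
    chunks: Iterable[bytes],
    max_bytes: int,
    deadline_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> bytes:
    """Read a body chunk by chunk, giving up if it is too slow or too big.

    Hypothetical helper: `chunks` is any iterable of byte strings, for
    example the chunk iterator of an HTTP response.
    """
    start = clock()
    buf = bytearray()
    for chunk in chunks:
        # A slow-walking tarpit trips the total time budget.
        if clock() - start > deadline_s:
            raise TimeoutError("total time budget exceeded")
        buf.extend(chunk)
        # An endless stream of junk trips the size cap.
        if len(buf) > max_bytes:
            raise ValueError("response larger than cap")
    return bytes(buf)
```

With something like this in place, a tarpit costs the scraper at most deadline_s seconds and max_bytes of bandwidth per URL.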
They want to reduce the bandwidth usage. Not increase it!
I mean they’re synced super fast to every file system. It works really well. Wayyy wayyy faster than nextcloud too. You can access them on that file system. If you want to “directly” access them you can always use the fuse driver. This being said there isn’t really a need to because all the files just are synced to your file system.
Yah, that term isn’t an official term. I just meant it in the sense of an IPv6 prefix. Without knowing more about how your router firewall works / is set up, I can’t be too specific.
But in general the way things work with IP addresses is that your ISP provides you with a block of IPv6 addresses. This block is the prefix/first part of any given IPv6 address on your network. Each host takes that prefix and generates a suffix that it appends to it in order to form a full globally routable IPv6 address.
By default most hosts use the IPv6 privacy extension to generate random suffixes and cycle through them. This is good for privacy but bad for hosting a public service. You need to turn off the privacy extension so that the second half of the IPv6 address stays static.
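As a small illustration of the prefix plus suffix idea, here is a sketch using Python's standard ipaddress module. The function name combine is made up, and the prefix and suffix values are just documentation-range examples, not anything from a real network.

```python
import ipaddress


def combine(prefix: str, suffix: str) -> ipaddress.IPv6Address:
    """Join an ISP-delegated prefix with a host-chosen suffix.

    Hypothetical helper for illustration only.
    """
    net = ipaddress.IPv6Network(prefix)
    iid = int(ipaddress.IPv6Address(suffix))
    # Keep only the host bits of the suffix, then OR them onto the prefix.
    host_bits = iid & int(net.hostmask)
    return ipaddress.IPv6Address(int(net.network_address) | host_bits)


# Example values (assumed): a /64 prefix and a static suffix.
print(combine("2001:db8:1234:5600::/64", "::1a2b:3c4d:5e6f:7081"))
```

If the ISP later hands out a different prefix, only the first argument changes; the suffix part of the address stays the same.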
Next up you need to write a firewall rule to allow traffic to that globally routable IPv6 address. In an IPv6 system the router does not intercept or rewrite the packets like it does with IPv4. So all a router does is act as a firewall saying “Yup outside hosts can or can’t make inbound connections to certain hosts/ports”
The trick with a consumer IPv6 address space is that just like IPv4 addresses given to your router, the IPv6 prefix can change randomly.
It would be annoying to have to update the firewall rule every time this happened. That’s why the idea of masking matters. You tell the firewall “ignore the prefix of this firewall rule. Just allow or deny based on the static suffix.”
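To show what “ignore the prefix” means at the bit level, here is a sketch in Python using the standard ipaddress module. The names SUFFIX_MASK and matches_suffix are made up for illustration, and I am assuming a /64 prefix so the suffix is the lower 64 bits. A firewall that supports this kind of masking does roughly the same comparison, though the rule syntax depends on the firewall.

```python
import ipaddress

# Mask selecting only the lower 64 bits (the suffix), assuming a /64 prefix.
SUFFIX_MASK = int(ipaddress.IPv6Address("::ffff:ffff:ffff:ffff"))


def matches_suffix(addr: str, suffix: str, mask: int = SUFFIX_MASK) -> bool:
    """Return True if addr has the given suffix, whatever its prefix is."""
    a = int(ipaddress.IPv6Address(addr))
    s = int(ipaddress.IPv6Address(suffix))
    return (a & mask) == (s & mask)


# Same host before and after the ISP changed the prefix (example values).
print(matches_suffix("2001:db8:aaaa:1:1a2b:3c4d:5e6f:7081", "::1a2b:3c4d:5e6f:7081"))
print(matches_suffix("2001:db8:bbbb:2:1a2b:3c4d:5e6f:7081", "::1a2b:3c4d:5e6f:7081"))
```

Both addresses match because only the static suffix is compared, so the rule survives a prefix change.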
The way to write such rules is different on different firewalls. Most consumer devices don’t have a way to configure such things. Even professional networking equipment mostly makes you use the cli to manage such things.
I hope this helps.
I’m glad you got it working with IPv4. For the record though, the way to do such a thing in the future is to think in IPv6. In IPv6 there is no NAT or port forwarding. Even if you have host exposure, you need to set an appropriate rule in your router firewall.
On the host itself you need to use public IPv6 addresses. Then on the router firewall you set a firewall rule with an appropriate delegation mask allowing traffic to the specified port.
It’s different than IPv4 but once you learn IPv6 it’s easy.
That’s insane. I would consider an IPv4 -> IPv6 cloud-hosted haproxy-style setup if this was my only option.
I would just ask for an IPv4 address. I asked Vodafone for one and they just gave it to me for free.
It’s government reporting data. If you find a better source I say go for it. But I used that data for salary negotiations in the past successfully.
I’m not talking about take home. I’m talking about total annual compensation including things like RSU payouts etc.
Even if we throw out the ones you doubt there are many 300k to 400k entries with the AI researcher title. If we add annualized RSU payouts we easily hit over €500k.
At this point, though, you are free to doubt me.