@chicken

chicken@lemmy.dbzer0.com · 9 days ago

that is not the … available outcome.

It demonstrably is already though. Paste a document in, then ask questions about its contents; the answer will typically take what’s written there into account. Ask about something you know is in a Wikipedia article that would have been part of its training data, same deal. If you think it can’t do this sort of thing, you can just try it yourself.

Obviously it can handle simple sums, this is an illustrative example

I am well aware that LLMs can struggle especially with reasoning tasks, and have a bad habit of making up answers in some situations. That’s not the same as being unable to correlate and recall information, which is the relevant task here. Search engines also use machine learning technology and have been able to do that to some extent for years. But with a search engine, even if it’s smart enough to figure out what you wanted and give you the correct link, that’s useless if the content behind the link is only available to institutions that pay thousands a year for the privilege.

Think about these three things in terms of what information they contain and their capacity to convey it:

A search engine
A dataset of pirated content from behind academic paywalls
An LLM model file that has been trained on said pirated data

The latter two each have their pros and cons and would likely work better in combination with each other, but they both have an advantage over the search engine: they can tell you about the locked-up data, and they can be used to combine the locked-up data in novel ways.

chicken@lemmy.dbzer0.com · edit-2 9 days ago

Ok, but I would say that these concerns are all small potatoes compared to the potential for the general public gaining the ability to query a system with synthesized expert knowledge obtained from scraping all academically relevant documents. If you’re wondering about something and don’t know what you don’t know, or have no idea where to start looking to learn what you want to know, an LLM is an incredible resource even with caveats and limitations.

Of course, it would be better if it could also directly reference and provide the copyrighted/paywalled sources it draws its information from at runtime, in the interest of verifiably accurate information. Fortunately, local models are becoming increasingly powerful and have a lower barrier to entry, so the legal barriers to such a thing existing might not be able to stop it for long in practice.

chicken@lemmy.dbzer0.com · 10 days ago

The OP tweet seems to be leaning pretty hard on the “AI bad” sentiment. If LLMs make academic knowledge more accessible to people, that’s a good thing for the same reason what Aaron Swartz was doing was a good thing.

chicken@lemmy.dbzer0.com · edit-2 17 days ago

thepiratebay still exists but is regarded as untrustworthy and infested with malware. I’d say knowing you’re getting something from a trustworthy source is harder than it used to be.

chicken@lemmy.dbzer0.com · 17 days ago

I guess there are probably a lot of people trading that stuff who are dumb enough to be networking on Facebook and Instagram with their real identities

chicken@lemmy.dbzer0.com · 18 days ago

The listing notes that special operations troops “will use this capability to gather information from public online forums,” with no further explanation of how these artificial internet users will be used.

Any chance that’s the real reason and not just a flimsy excuse? What kind of information would you even need a fake identity to gather from a public forum?

chicken@lemmy.dbzer0.com · edit-2 18 days ago

In 2022, industry front groups co-signed a letter to Congress arguing that “[a] growing patchwork of state laws are emerging which threaten innovation and create consumer and business confusion.” In 2024, they were at it again this Congress, using the term four times in five paragraphs.

Big Tobacco did the same thing.

Is this really a fair comparison though? A variety of local laws about smoking in restaurants makes sense because restaurants are inherently tied to their physical location. A restaurant would only have to know and follow the rules of its town, state and country, and the town can take the time to ensure that its laws are compatible with the state and country laws.

A website is global. Every local law that can be enforced must be followed, and the burden isn’t on legislators to make sure their rules are compatible with all the other rules. Having to serve a subtly different version of a website to every state and country to fully comply with each one’s rules, and having lawyers check every version, would make building and maintaining a website or other online service prohibitively difficult and expensive. That seems like a legitimate reason to want unified standards.

To be fair, there are plenty of privacy regulations that this wouldn’t apply to, like the example the article gives of San Francisco banning the use of facial recognition tech by police. But the industry complaint linked in the article references laws like https://www.oag.ca.gov/privacy/ccpa and https://leg.colorado.gov/bills/sb21-190 that obligate websites to fulfill particular demands made by residents of those respective states. Subtle differences in those sorts of laws seem like something that could cause actual problems, unlike differences in smoking laws.

chicken@lemmy.dbzer0.com · 22 days ago

Seems like a good thing, 3 chances one of them will get it right

chicken@lemmy.dbzer0.com · 22 days ago

That doesn’t sound like something you get arrested for though

chicken@lemmy.dbzer0.com · 23 days ago

I’d be skeptical that’s even real, outside of a select few countries with especially strict copyright enforcement

chicken@lemmy.dbzer0.com · 23 days ago

people being arrested for using pirate streaming services

What circumstances does that even happen in? Like a bar that plays a pirated sports stream?

chicken@lemmy.dbzer0.com · 1 month ago

It’s not actually clear that it only affects huge companies. Much of open source AI today is done by working with models that have been released for free by large companies, and the concern was that the requirements in the bill would deter them from continuing to do this. The “kill switch” requirement especially made it seem like the people behind the bill were either oblivious to this state of affairs or intentionally trying to force companies to stop releasing model weights and only offer centralized services like OpenAI does.

chicken@lemmy.dbzer0.com · 1 month ago

Another reason right to repair is needed

chicken@lemmy.dbzer0.com · 1 month ago

If you are at the point where you are having to worry about government or corporate entities setting traps at the local library? You… kind of already lost.

What about just a blackmailer who assumes anyone booting an OS on a public computer has something to hide? Then they have write access and there’s no defense. It doesn’t have to be everywhere either, because people seeking privacy this way will have to pick new locations each time. An attack like that wouldn’t have to be targeted at a particular person.

chicken@lemmy.dbzer0.com · 1 month ago

Isn’t it risky plugging USB drives into untrusted machines?

chicken@lemmy.dbzer0.com · 1 month ago

I bet it was something like the hardware ID instead, but she misspoke

chicken@lemmy.dbzer0.com · 2 months ago

I know that’s how it works in the US, but the lawsuit is in Japan, which you always hear about having stricter copyright laws. Not really sure how this one will play out though.

chicken@lemmy.dbzer0.com · 2 months ago

I wonder if part of the reason for supporting this is that they like the secondary effect that all this information is now also available to governments

chicken@lemmy.dbzer0.com · 2 months ago

Can’t track mouse movements on mobile though

chicken@lemmy.dbzer0.com · 3 months ago

Obligatory: LLMs see tokens, not letters
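
A rough sketch of what that means in practice, not how any particular model's tokenizer works. The vocabulary and IDs here are made up for illustration (real tokenizers learn theirs from data), and `TOY_VOCAB` and `tokenize` are hypothetical names. The point is just that the model gets a short list of integer IDs, while counting letters needs the characters.

```python
# Toy greedy longest-match tokenizer. The vocabulary and IDs below are
# invented for illustration; real LLM tokenizers learn theirs from data.
TOY_VOCAB = {
    "straw": 0, "berry": 1,
    "s": 2, "t": 3, "r": 4, "a": 5, "w": 6, "b": 7, "e": 8, "y": 9,
}


def tokenize(text, vocab=TOY_VOCAB):
    """Split text into the longest matching vocabulary pieces, left to right."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest candidate piece first, then shrink it.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            # No vocabulary entry covers the character at position i.
            raise ValueError(f"no token covers {text[i]!r}")
    return ids


word = "strawberry"
print(tokenize(word))    # [0, 1]  <- what the model is given
print(word.count("r"))   # 3       <- needs the letters, which the IDs hide
```

With this toy vocabulary the whole word collapses into two IDs, so nothing in the input directly says how many times "r" appears.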