Lemmy's List: Downloadable AI, Databases - Critical Knowledge Backup

Elias Griffin@lemmy.world · edit-2 1 year ago


sir_reginald@lemmy.world · edit-2 1 year ago

This is too much catastrophism for my taste, but if I wanted to start archiving, I’d start by downloading Wikipedia, Library Genesis and Project Gutenberg.

Videos are too heavy to archive with ease, and they are probably of much less value as actual knowledge.

𝒍𝒆𝒎𝒂𝒏𝒏@lemmy.one · 1 year ago

Haven’t heard about Project Gutenberg before, seems pretty neat!

I’d probably add repair.wiki to a list of things I’d archive, although some of that content is picture heavy so not as easily compressible as Wikipedia

There was a project that lets you download Wikipedia and some other online resources into an easy to search & navigate UI, think it was called Kiwi something but can’t remember. It was targeted at regions with poor internet coverage

JackGreenEarth@lemm.ee · 1 year ago

Yup Kiwix, an app available for Android, iOS, Linux and possibly other OSs too.

SeaJ@lemm.ee · 1 year ago

Project Gutenberg has been a thing for a couple decades. I think they are also starting to create free audiobooks from books they have in their collection. There is a TTS AI service that I checked out a week ago (play.ht) that does very realistic voicing from the text I gave it, and I might spend $40 for a month of that service and build some audiobooks. The paid version gives access to more voices and will do 1 million characters of text a year.

Or if anyone knows a good open source online alternative, I’m all ears. I’d prefer to go that route but did not find anything that was a very good solution.

fubo@lemmy.world · 1 year ago

Humanity has been using writing for millennia. It’s a proven technology. Photographs and video don’t tend to last longer than the one institution or family that cares about them.

fiat_lux@kbin.social · 1 year ago

Mostly due to previous physical constraints, I would argue. Thankfully there are fewer chances your hard drive is going to decompose into vinegar while sitting in your cupboard, and even if it does, it’s likely not the only copy.

They’re also more limited for current data because they’re harder to parse and convert into other usable formats, but thankfully that will get better over time too.

I still prefer text-first data for various reasons, but let’s not dismiss the leagues of potential video has for communication and archival value, both intentional and unintentional.

Taleya · 1 year ago

Plus writing dgaf if you get hit with a Carrington event

fiat_lux@kbin.social · 1 year ago

Perhaps think of it more as knowledge decentralization as a form of resiliency for unplanned network outages. Sometimes the Library of Alexandria just happens to catch fire, and it might be nobody’s fault at all.

Besides, plenty of people grew up in families with a basic encyclopaedia or dictionary or a repair manual. This is essentially the same thing, just with less paper.

Elias Griffin@lemmy.world · 1 year ago

I’m particularly looking for anyone that already has a collection of arXiv and Sci-Hub papers. Please curate your collection and make it available here!

We also need a brief, catchy hashtag/topic/keyword for this project that we can also use for a GitHub search, etc. Anyone?

Nix@merv.news · 1 year ago

Is it possible to download an archive of Sci-Hub?

PeachMan@lemmy.world · 1 year ago

Sci-Hub is ENORMOUS, about 100TB. If you want to help preserve it, you can torrent and seed one of their many 100GB chunks.
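For a rough sense of scale, here is a back-of-the-envelope sketch using only the approximate figures above (about 100 TB total, about 100 GB per chunk). The function names and the `spare_tb` parameter are made up for illustration, and decimal units (1 TB = 1000 GB) are an assumption.

```python
# Approximate figures from the comment above; not exact sizes.
TOTAL_TB = 100      # rough size of the whole archive
CHUNK_GB = 100      # rough size of one torrent chunk
GB_PER_TB = 1000    # assumption: decimal units, as on drive labels


def chunk_count(total_tb=TOTAL_TB, chunk_gb=CHUNK_GB):
    """Roughly how many chunks make up the whole archive."""
    return int(total_tb * GB_PER_TB // chunk_gb)


def chunks_that_fit(spare_tb, chunk_gb=CHUNK_GB):
    """How many whole chunks fit in a given amount of spare disk space."""
    return int(spare_tb * GB_PER_TB // chunk_gb)


if __name__ == "__main__":
    print("chunks in the archive:", chunk_count())
    print("chunks that fit in 2 TB:", chunks_that_fit(2))
```

So on these numbers, someone with a terabyte or two to spare could seed on the order of ten to twenty chunks out of roughly a thousand.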

BolexForSoup@kbin.social · 1 year ago

Super cool, never knew about this. I’ve probably got 1-2 TB I can spare for the effort.

Elias Griffin@lemmy.world · 1 year ago

What a fantastic resource, this is exactly what is needed. I also found out about The Standard Template Construct Library:

“Learn about how to access large corpus of high-quality scholarly texts using Python and use them in AI apps”

Siddhartha-Aurelius@kbin.social · 1 year ago

Does anyone know if an LLM has been trained on something like Sci-Hub?

