Idiot-Proof Methods of Datahoarding
Dec. 22nd, 2024 03:47 pm
A lot of datahoarding resources assume that you're intimately familiar with command prompts, webcrawlers, scripts, and more. But what if... you're stupid? What if you're extremely unfamiliar with those sorts of things, looking to have your hand held through the entire process? What if your name is Azure and you have an account called bedes on Dreamwidth.org? (Wait, that's getting too specific.)
Well, these resources are for you! Idiot-proof resources I've gathered for archiving stuff online! As tested and approved by an actual idiot!
(Quick disclaimer: I call myself "stupid" and "an idiot" lightheartedly, and as a morally neutral descriptor. Being stupid about certain things isn't bad! Which is why I'm making this post!)
Cobalt.tools: this is a great all-in-one resource for saving videos, audio, photos and gifs, with a large list of websites it can rip from. This list includes, but is not limited to, Facebook, Instagram, YouTube, Tumblr, Dailymotion, Pinterest, Reddit, and Twitch. It's open-source, easy to use, privacy-focused, ad-free, fast, and, frankly, It Just Works. No installation is required to use this tool.
Spotdownloader: a tool for ripping from Spotify with the highest possible quality! Very easy to use, and supports downloading a single song, a playlist, or an album, with all the important metadata about the song(s) intact. (If downloading multiple songs, it puts them all in a convenient ZIP file.) Ad-free, and even has a userscript alternative. No installations are needed.
DiscordChatExporter: used to export any Discord message history to a file! It can export to HTML (dark/light), TXT, CSV and JSON, and supports Discord's form of markdown, embeds, attachments, and emojis. You have to install it via GitHub, and it has an excellent optional user interface that makes the process easy to follow!
Archive.org: known as the quintessential internet archive. I find it difficult to navigate what's already there sometimes, but, in terms of uploading your stuff, it's pretty easy! You do need an account to archive files, but archive.org is definitely the most useful for archiving webpages, which can be done without an account. There is also an Internet Archive extension, which you can use to save webpages quickly and easily, without needing to leave the page or open a new tab. You can also easily archive webpages via GhostArchive and Archive.Today, since backups are always necessary! No installations or downloads are required for any of these (except the Internet Archive extension, which requires you to add the extension).
Imgbrd-Grabber: a customizable tool for bulk-downloading images from imageboards, including (but not limited to) danbooru, safebooru, ArtStation, DeviantArt, Newgrounds and Pixiv, as well as many boorus specializing in pornographic content. This tool is idiot-proof mainly thanks to the extremely thorough instructions provided for the installation and usage process.
Hydrus Network: you're gonna need a way to sort through all those new images you just downloaded, huh? Hydrus Network allows you to do just that -- making a locally-hosted booru of all of your art, with tags! It also supports bulk-downloading, like Imgbrd-Grabber, and shares a lot of the same supported sources, but has the notable unique features of being able to grab from Tumblr, and the ability to 'subscribe' to any gallery, repeating it every few days to keep up with new results. Also like Imgbrd-Grabber, it is an install which is mainly here thanks to its very thorough instructions, which walk the reader through everything.
AO3 Downloader: a life-saver for any person who has thought, "God, I wish I could download all of my bookmarks, but that would take sooo long to do individually." Another GitHub download which is saved by its thorough instructions!
tumblr-utils: a fantastic method for backing up your Tumblr account. It's quite a pain to download and set up, but (say it with me, everyone!) it's saved by a Google Doc of extremely thorough instructions (skip to the "Tumblr-Utils" section... or read all of it! it's a great doc that goes over all sorts of different options for backing up your blog -- this is just the one I prefer). Once you get through setting it all up the first time, though, it works like a dream, with very simple commands, which are explained in the doc in layman's terms. These commands can save video and audio locally, fall back to the Internet Archive, save your likes, make an index of tags, and support ✨ incremental backups ✨! (Meaning that it continues from the last backup, instead of downloading the entire blog from scratch each time.)
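For the curious, the idea behind incremental backups can be sketched in a few lines of Python. This is only an illustration of the concept and not how tumblr-utils actually does it: the `incremental_backup` function, the post dictionaries, and the `saved` set are all made up for this example.

```python
def incremental_backup(posts, saved_ids):
    """Return only the posts that are not in the backup yet.

    posts: list of dicts with an "id" key (a hypothetical stand-in for blog posts)
    saved_ids: set of post ids that earlier backups already saved
    """
    new_posts = [post for post in posts if post["id"] not in saved_ids]
    # Remember the new ids so the next run skips them too.
    saved_ids.update(post["id"] for post in new_posts)
    return new_posts


saved = set()
blog = [{"id": 1, "text": "first post"}, {"id": 2, "text": "second post"}]

# Nothing has been saved yet, so the first run picks up both posts.
first_run = incremental_backup(blog, saved)

# A new post appears; the second run only picks up that one.
blog.append({"id": 3, "text": "third post"})
second_run = incremental_backup(blog, saved)
```

The second run does far less work than the first, which is the whole appeal: the bigger the blog, the more time you save on every backup after the first one.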
If you know of any data-hoarding / archival resources that weren't mentioned here, and you think even a total Python-illiterate doofus could get working, link them in the comments below! (Also, please include if it involves or requires any downloads, just because I think that's useful info.)
no subject
Date: 2024-12-22 09:54 pm (UTC)
no subject
Date: 2024-12-22 10:17 pm (UTC)
Haha, I'm in that same boat! I'm working up to more and more complicated stuff slowly, but for now, the simple stuff is working just fine for my hoard.
no subject
Date: 2024-12-23 01:27 am (UTC)
no subject
Date: 2024-12-23 01:52 am (UTC)
I'm super glad you found it helpful! I was also guilty of putting off downloading many of these for many months lol
Also, fun coincidence seeing you here! I use one of your Dreamwidth themes for my journal :>
no subject
Date: 2024-12-23 01:58 am (UTC)
OMG slayyy thank you! love your customizations to it!
Thank you!
Date: 2024-12-23 05:14 am (UTC)
no subject
Date: 2024-12-23 07:26 am (UTC)
For music, there's also lucida.to, which is a bit slow, but you can get music from a lot of different platforms, including Spotify.
no subject
Date: 2024-12-23 02:27 pm (UTC)
no subject
Date: 2024-12-23 10:24 pm (UTC)
Ofc!! The reason I first sought it out was for similar reasons (my husband and I’s early relationship is largely documented over Discord)
no subject
Date: 2024-12-24 12:57 pm (UTC)
I'm a fan footage and photos hoarder. Archive.org will often have deleted YouTube videos, although I have to use Web Inspector to grab them.
My fave command-line tool is yt-dlp, which saves video from a variety of websites including YouTube, Tumblr, and Instagram Stories. In its base form it's pretty idiot-proof, but if you want to get fancy you can also specify things like quality or file format, which is often very necessary for ancient YouTube videos.
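To give a rough idea of what "base form" versus "fancy" looks like, here is a small Python sketch that only builds the yt-dlp command lines and prints them (it doesn't download anything). The `build_ytdlp_command` helper, the 480p limit, and the example.com URL are placeholders made up for illustration; `-f` and `--merge-output-format` are yt-dlp's own options for picking formats and the merged file type, but check the yt-dlp README for what suits your video.

```python
import shlex


def build_ytdlp_command(url, max_height=None, container=None):
    """Build a yt-dlp command line as a list of arguments (nothing is run)."""
    command = ["yt-dlp"]
    if max_height is not None:
        # -f selects formats: best video at or below max_height plus best audio,
        # falling back to the best single file at or below max_height.
        command += ["-f", f"bestvideo[height<={max_height}]+bestaudio/best[height<={max_height}]"]
    if container is not None:
        # --merge-output-format sets the container used when video and audio are merged.
        command += ["--merge-output-format", container]
    command.append(url)
    return command


# Placeholder URL: swap in the page of the video you actually want.
url = "https://example.com/watch?v=VIDEO_ID"

basic = build_ytdlp_command(url)
fancy = build_ytdlp_command(url, max_height=480, container="mp4")

# shlex.join quotes the pieces so the result can be pasted into a terminal.
print(shlex.join(basic))
print(shlex.join(fancy))
```

The basic version is just the program name plus a link, which is why it counts as idiot-proof; the fancy version only adds a couple of extra options in front of the link.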
no subject
Date: 2024-12-24 06:59 pm (UTC)
-G
no subject
Date: 2024-12-24 02:28 pm (UTC)
no subject
Date: 2024-12-24 03:56 pm (UTC)
(The GitHub link is just a list of extension-store links for Firefox/Chrome/Safari/etc: you usually don't actually install it through GitHub.)
Install the browser extension (officially recommended by Firefox!), pin it to your browser bar, go to the webpage you want to save, and press the button. Ta-da, an HTML file *with* all of the page's images and fonts and such included, so you don't have to deal with the proliferation of _files folders of regular saving-to-HTML! And *so* much faster and easier than [trying to copy-paste into a Word document without losing too much of the formatting].
The right-click menu also offers more complicated options for saving only parts of a page or multiple pages at once.
no subject
Date: 2024-12-24 07:24 pm (UTC)
Oh my goodness, this is fantastic!! Thank you for sharing!
no subject
Date: 2025-03-08 05:21 pm (UTC)
Web to Epub browser extension: "Convert Web Novels (and other web pages) into an EPUB for offline reading. Works with many sites, including: 1. ArchiveOfOurOwn.org 2. FanFiction.net. 3. royalroadl.com"
(also on github)
no subject
Date: 2025-04-24 01:07 am (UTC)
Oooh, this is great, thank you!