Storing the contents of the Cassiopaean Forum on your computer

Unfortunately, the signal extracted from the many articles published on SOTT appears to trend toward the slow demise of our freedoms on an international scale. Extrapolating from that, I fear that the freedom to engage in discussions of the kind currently taking place within this forum will eventually meet its end. So when/if this lighthouse is extinguished by the slow and tedious chess-like maneuvering of the PTB, what is to become of the wealth of information contained within this forum and its related sites? I suppose my question is: Have those in charge of this site (and related sites) considered such a scenario? Or is it not that plausible, and my pessimism is making a "mountain out of a molehill"? Might it be possible to store copies of the forum on our own computers? I apologize in advance for my lack of knowledge of copyright law, if what I am proposing amounts to breaking the law. The scenario has been on my mind, and I was curious whether others have thought about it and what solutions might have come up in the process.
 
This is one scenario that I'm also wary of. This place is about the only site/forum on the net that I frequent, and that is for a good reason. In light of past attacks on Sott, I think it is a very good idea to store the information somehow.

Maybe not every thread should be saved (that would be too much work), but at least every stickied and otherwise important discussion. Also, the content of Cassiopaea.org would be most helpful to have as a downloadable package (I made a thread about that, but had some problems saving individual articles from cassiopaea.org, so I might have to collect them by hand).

I understand that this would be a huge task for anyone, and that the Sott/Cass staff is busy, so maybe it could be done as a sort of community project among forum members who volunteer?
 
One thing the site admin(s) could do - if legally viable (depends on how the copyright on members' posts submitted here is dealt with) - that would take relatively little work would be to release a copy of the forum database with member accounts removed (thus removing sensitive information) for download. This combined with the forum software or other software (something that could perhaps later be made) to read it would then make it browseable.
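
To make the idea concrete, here is a minimal sketch of such an export. Everything in it is an assumption for illustration: the `members` and `posts` tables and their columns are hypothetical, and SQLite merely stands in for whatever database the forum really uses. It copies posts into a fresh database and carries over only each author's display name, so e-mail addresses and password hashes never leave the original.

```python
import sqlite3


def export_public_copy(src, dst):
    """Copy posts from src into dst, keeping only the author's display name.

    Assumes a hypothetical schema:
      members(id, display_name, email, password_hash)
      posts(topic, member_id, posted_at, body)
    """
    dst.execute(
        "CREATE TABLE posts (topic TEXT, author TEXT, posted_at TEXT, body TEXT)"
    )
    rows = src.execute(
        "SELECT p.topic, m.display_name, p.posted_at, p.body "
        "FROM posts p JOIN members m ON m.id = p.member_id "
        "ORDER BY p.posted_at"
    )
    # The members table itself is never copied, so no account data is exported.
    dst.executemany("INSERT INTO posts VALUES (?, ?, ?, ?)", rows)
    dst.commit()
```

Both arguments are open `sqlite3` connections. A reader program would then only need to understand the single `posts` table in the exported file.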

As for the main site, several people on the forum have previously mentioned making their own backups, be they complete or not, of its content. I also have a copy of some of the content saved. But I wouldn't be wholly comfortable releasing it without permission. So, in any case, it would be good to know the stance of the copyright owners (as in Laura, Ark) on this, and if they would grant permission to distribute copies of the material at whatever point the site comes to an end.
 
What you could do is simply make use of the print button for the stickied topics (or for other topics you think are important to you and would read again later on), and then print them, or save them as HTML or, if possible, as a PDF file.

My two cents.
 
abcdefghiJoerg said:
What you could do is simply make use of the print button for the stickied topics (or for other topics you think are important to you and would read again later on), and then print them, or save them as HTML or, if possible, as a PDF file.

My two cents.

If you've ever tried that, for the thousands of pages of data that have been compiled here on the forum, it's a gargantuan task, and you wouldn't have even begun to organize it in a usable way . . . osit
 
It would be kewl if the forum site changes were archived into static HTML once a week as a zip or tarball that we could then download. Or if it had RSS functionality, so we could just download the changes without putting a heavier load on the site.
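
As a rough sketch of the weekly tarball idea, assuming the forum pages have already been rendered to a directory of static `.html` files (the directory layout, the function name and the `forum-YYYY-MM-DD.tar.gz` naming are made up for illustration), something like this could run once a week on the server side:

```python
import tarfile
from datetime import date
from pathlib import Path


def archive_static_html(html_dir, out_dir, day=None):
    """Bundle every .html file under html_dir into a dated .tar.gz in out_dir."""
    day = day or date.today()
    html_dir = Path(html_dir)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    archive_path = out_dir / f"forum-{day.isoformat()}.tar.gz"
    with tarfile.open(archive_path, "w:gz") as tar:
        for page in sorted(html_dir.rglob("*.html")):
            # Store paths relative to html_dir so the archive unpacks cleanly.
            tar.add(page, arcname=page.relative_to(html_dir).as_posix())
    return archive_path
```

Members would then download one file per week instead of each crawling the live site.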
 
Yossarian said:
abcdefghiJoerg said:
What you could do is simply make use of the print button for the stickied topics (or for other topics you think are important to you and would read again later on), and then print them, or save them as HTML or, if possible, as a PDF file.

My two cents.

If you've ever tried that, for the thousands of pages of data that have been compiled here on the forum, it's a gargantuan task, and you wouldn't have even begun to organize it in a usable way . . . osit


To clarify my statement: I did not mean backing up the whole forum with the "print" button, which would, as you wrote, be an absolutely gargantuan task, but more or less just the -stickied- topics (for example, the first three threads in "What's on your mind", etc.).

In the end, you would have about a hundred of these -stickied- topics from the whole forum in digital form (e.g. PDFs).
 
rs said:
You could always use one of the available web site mirror packages such as:

http://www.httrack.com/

A simple Google search shows that there are a number of solutions.

http://www.google.com/search?q=website+copy+software&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official

The problem with those approaches is they put an artificially heavy load on the site while spidering. If every one of us started doing that it could bring the site to a crawl or cause the owners to have to invest in more hardware and/or bandwidth to keep up with the load. IOW, extremely inconsiderate without their permission.

In my opinion, the best approach would be to work with QFG to come up with an efficient solution if they don't have a problem with us archiving the contents on our computers.
 
FWIW I have had similar thoughts.

It would be nice to have the forum discussions on local disk *strictly for personal use*. For example to be able to review topics offline, and use one's own offline search tools.

As pointed out, web site mirroring is probably not advisable if it puts a strain on the site. Just to add to that: when I have tried those programs on forum sites, I noticed that they can download literally gigabytes of binary files in addition to the HTML text files that contain the threads. Definitely not good for the site or for the recipient of the data, IMO.
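
One way to avoid pulling in those binaries is to filter URLs before fetching them. This is only a sketch of the idea, not how any particular mirroring program does it; the extension list and the function name are my own assumptions:

```python
from urllib.parse import urlparse

# Hypothetical list of file types to skip; adjust to taste.
SKIP_EXTENSIONS = (
    ".zip", ".pdf", ".mp3", ".mp4", ".avi", ".exe", ".iso",
    ".jpg", ".jpeg", ".png", ".gif",
)


def should_fetch(url, allowed_host):
    """Return True only for http(s) pages on allowed_host that are not binaries."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https"):
        return False
    if parts.hostname != allowed_host:
        return False  # do not follow links that leave the site
    return not parts.path.lower().endswith(SKIP_EXTENSIONS)
```

For example, `should_fetch("https://forum.example.org/index.php?topic=123", "forum.example.org")` is true, while a link to a `.zip` attachment or to another host is rejected.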
 
httrack

I'm just creating this thread, but I will probably post more if I get to exploring the program.

httrack is a command-line program which runs on Mac / Linux computers. It crawls/trawls (lol) the internet, certain websites, etcetera, and downloads the pages into browser-viewable format(s). I figure if parts of the internet go down in the future, this might be a useful tool. The main motivation I had for downloading the program and learning how to use it, which should be fairly simple, is that I had the idea to download the entire cassiopaea.org website along with the forums here. This raises some ethical questions, however. The program does allow bandwidth limits, which is a good thing. Anyway, that's all; I'll probably get back with some more information later.
 
Re: httrack

Interesting post. I was thinking the same: creating my own replica of the Cassiopaea content and keeping it in sync.

It is also very interesting to use this strategy for downloading content that is not hosted on the site but linked from it, just to grab it all (including images, PDFs, etc.).

I was also considering pushing all the content into an Alfresco document management server. It is very nice to be able to search across all available content regardless of content type (html, pdf, doc, odt, etc.).

If I finally turn some of these ideas into a real implementation, I will let you know.

Yours,
Jordi
 
Re: httrack

wetroof said:
I'm just creating this thread, but I will probably post more if I get to exploring the program.

httrack is a command-line program which runs on Mac / Linux computers. It crawls/trawls (lol) the internet, certain websites, etcetera, and downloads the pages into browser-viewable format(s). I figure if parts of the internet go down in the future, this might be a useful tool. The main motivation I had for downloading the program and learning how to use it, which should be fairly simple, is that I had the idea to download the entire cassiopaea.org website along with the forums here. This raises some ethical questions, however. The program does allow bandwidth limits, which is a good thing. Anyway, that's all; I'll probably get back with some more information later.

About two months ago I used WinHTTrack to download the forum, the Cass site and the Cass glossary for the cases where I won't have internet access. I don't know if the Mac/Linux version has the same settings, but it is advisable to set bandwidth limits, both as a courtesy and because you don't know if it will eat all the bandwidth or overload the server. It will probably also depend on how many people are copying the site at the same time. I even had a scare at one point when the forum was inaccessible due to a server problem and I thought that perhaps it was my fault. :scared: But apparently it was not. :halo: But this taught me that it is best to always ask first, especially if there are several people doing the download.

If time is not an issue for you, you can keep the program gradually downloading in the background with the limits set to the minimum. Perhaps I was overcautious and it would work just as well with more connections, but what I did was click on preferences and mirror options, click on "limits", and set max transfer rate to 5000 B/s and Max connections / seconds to 1. Then I clicked on "flow control" and set the number of persistent connections per second to 1 too. I didn't touch the size or time limits because I wanted all of it downloaded fully.
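
To get a feel for what "time is not an issue" means at 5000 B/s, here is a small back-of-the-envelope calculation. The 1 GB mirror size is purely an assumed figure for illustration; the real size will differ.

```python
SECONDS_PER_DAY = 86_400


def mirror_duration_days(total_bytes, rate_bytes_per_sec=5000):
    """Days needed to transfer total_bytes at a steady rate (ignores pauses and overhead)."""
    return total_bytes / rate_bytes_per_sec / SECONDS_PER_DAY


# Assumed size of 1 GB (10**9 bytes); the real mirror size is not known.
print(f"{mirror_duration_days(10**9):.1f} days")
```

So each gigabyte takes a bit over two days at that rate, which is why this only suits a download left running in the background.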

One thing to remember is that if you don't set the mirroring depth, it will download the whole site, including images and pages that the site links to. That is cool if you are downloading the forum and want all the images to be displayed, or want to be able to see the content of the linked pages, but it will increase the overall download size considerably (by gigabytes). I personally didn't mind, so it depends on your own preferences.
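
For the command-line version, the same limits and a mirroring depth can be passed as options. The sketch below only builds the command; it does not run it. The long option names (`--max-rate`, `--sockets`, `--connection-per-second`, `--depth`) are as I understand them from the httrack manual, so please check them against `httrack --help` for your version. The URL, the output folder and the helper function are placeholders of my own.

```python
import shlex


def build_httrack_command(url, out_dir, max_rate=5000, sockets=1,
                          conn_per_sec=1, depth=None):
    """Build an argument list for a throttled httrack run (does not execute it)."""
    cmd = [
        "httrack", url,
        "-O", out_dir,                               # where the mirror is written
        f"--max-rate={max_rate}",                    # bytes per second
        f"--sockets={sockets}",                      # simultaneous connections
        f"--connection-per-second={conn_per_sec}",   # new connections per second
    ]
    if depth is not None:
        cmd.append(f"--depth={depth}")               # limit how far links are followed
    return cmd


# Placeholder URL and folder, for illustration only.
cmd = build_httrack_command("https://example.org/forum/", "./mirror", depth=3)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd)` once you have permission from the site owners and are happy with the limits.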

Also, note that if you download the Cass site, it will download the forum too.

And here is a link to the list that explains what not to do. On the HTTrack site there are also detailed descriptions, with screenshots, of how to set everything. Hope it helps.

_http://www.httrack.com/html/abuse.html
Advice & what not to do
Please follow these common sense rules to avoid any network abuse

* Do not overload the websites!

Downloading a site can overload it, if you have a fast pipe, or if you capture too many simultaneous cgi (dynamically generated pages).
   o Do not download too large websites: use filters
   o Do not use too many simultaneous connections
   o Use bandwidth limits
   o Use connection limits
   o Use size limits
   o Use time limits
   o Only disable robots.txt rules with great care
   o Try not to download during working hours
   o Check your mirror transfer rate/size
   o For large mirrors, first ask the webmaster of the site

* Ensure that you can copy the website
   o Are the pages copyrighted?
   o Can you copy them only for private purpose?
   o Do not make online mirrors unless you are authorized to do so

* Do not overload your network
   o Is your (corporate, private..) network connected through dialup ISP?
   o Is your network bandwidth limited (and expensive)?
   o Are you slowing down the traffic?

* Do not steal private information
   o Do not grab emails
   o Do not grab private information
 
Re: httrack

Well, I'm not going to say, "don't download the entire Cass site", because there may be valid reasons to do so, and search engines more or less do exactly that all the time. I will say that if everyone and their dog starts grabbing all the content on the site at high-bandwidth settings, that would be bad, and your IP would be blocked. Technically, that would be a denial of service attack. So, if you must do it, do it slowly.

I guess the big question to ask yourself is WHY you are doing it. Do you NEED to, or do you just WANT to?

If you don't need to, then probably doing something else to help out would be a better idea! ;)
 
Re: httrack

For me, one reason for thinking about keeping a backup is that I'm a little bit afraid. Afraid of losing valuable resources. Some time ago I went back to check a URL and found that the resource was no longer available.

As an example, right now (23:22 in Spain) "sott.net" is unavailable to me. Only this site. What can I do? Go and meditate, of course.

Salute,
Jordi
 