Cassiopaean timeline

Vulcan59 · Oct 15, 2009

Burma Jones said:
.......An admin is just about ready so that folks can start entering data, if anyone would like to volunteer.

Okay, I've got some time off next week, so I'll volunteer. :)

henry · Oct 15, 2009

Looks very nice!

Will you be able to read in files with multiple entries, or will entries have to be entered one-by-one?

Laura · Oct 15, 2009

anart said:
jeep said:

Yes, it certainly does give weight to the whole conspiracy agenda to think that FDR actually said this, but since he really didn't, I don't believe it's right to keep inferring that he did.


Apologies for the grammatical nitpicking, but in case it helps if you use the words elsewhere - the word inferring is incorrectly used here. You mean to say 'to keep implying that he did'. The reader or listener infers, the writer or speaker implies - just a fwiw...

Actually, some people are inferring that he MUST have said it, and then implying to others that he did. In this case, pitching and catching are taking place almost simultaneously.

allen · Oct 16, 2009

Galahad said:
Looks very nice!

Will you be able to read in files with multiple entries, or will entries have to be entered one-by-one?

Well, the way the admin is being made at the moment, it would be one entry at a time. Could do bulk upload, but where would we get the source files? One way or another, those would have to be put into some sort of machine readable form one by one. Can you give an example of the sort of file you are talking about?

henry · Oct 16, 2009

I have a number of timeline files in different formats depending upon the source. There are probably many more available on the web that could be downloaded or gotten through screen scraping. If there was a file format for reading files into the db, then it might be possible to write some small text manipulation programmes that could put the files into the correct format. We would just need to know what the fields in the db are and their order.

Some examples:

Sunspots said:
YEAR MON SSN DEV
1749 1 58.0 24.1
1749 2 62.6 25.1
1749 3 70.0 26.6
1749 4 55.7 23.6
1749 5 85.0 29.4
1749 6 83.5 29.2
1749 7 94.8 31.1
1749 8 66.3 25.9
1749 9 75.9 27.7
1749 10 75.5 27.7
1749 11 158.6 40.6
1749 12 85.2 29.5
1750 1 73.3 27.3
1750 2 75.9 27.7
1750 3 89.2 30.2
1750 4 88.3 30.0
1750 5 90.0 30.3
1750 6 100.0 32.0
1750 7 85.4 29.5
1750 8 103.0 32.5
1750 9 91.2 30.5
1750 10 65.7 25.7
1750 11 63.3 25.3
1750 12 75.4 27.7
etc

This goes up to the present.
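As a rough sketch of the "small text manipulation programmes" idea, the Python below reads a whitespace-separated table like the sunspot sample into records and maps each row onto a title/start pair. The `title`/`start` keys and the title wording are hypothetical; the real target fields would be whatever the admin ends up accepting.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[int, float]]


def parse_sunspot_table(text: str) -> List[Row]:
    """Parse a whitespace-separated 'YEAR MON SSN DEV' table into dicts.

    The header row, blank lines and trailing markers such as 'etc' are skipped
    because they do not have four fields starting with a numeric year.
    """
    rows: List[Row] = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4 or not parts[0].isdigit():
            continue
        year, month, ssn, dev = parts
        rows.append({
            "year": int(year),
            "month": int(month),
            "ssn": float(ssn),  # SSN column as given in the file
            "dev": float(dev),  # DEV column as given in the file
        })
    return rows


def to_timeline_entry(row: Row) -> Dict[str, str]:
    """Map a row onto a title/start pair (hypothetical field names)."""
    return {
        "title": f"Sunspot number {row['ssn']}",
        "start": f"{int(row['year']):04d}-{int(row['month']):02d}",
    }


sample = """YEAR MON SSN DEV
1749 1 58.0 24.1
1749 2 62.6 25.1
etc"""

for entry in map(to_timeline_entry, parse_sunspot_table(sample)):
    print(entry)
```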

Wars said:
(BC)
First Dynasty Wars 2925 – 2776,
Mesopotamian Wars of the Early Dynastic Period 2900 – 2334,
Second Dynasty Wars 2775 – 2650,
Sargon's Conquest of Mesopotamia 2334 –2279,
Nomadic Invasions of Akkad 2217 -2193,
Ninth Dynasty Wars 2130 -2080,
Uruk-Gutian War 2117 -2110,
Sumerian Campaigns of Ur-Nammu 2112 -2095,
Twin Dynasty Wars 2064 -1986,
Ur-Amorite Wars 2034 -2004,
Elamite Destruction of Ur 2004,

Again, this one goes to the present.
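The wars list is messier: a mix of en dashes and hyphens, inconsistent spacing, and some single-year entries. A minimal sketch, assuming one entry per line and a `(BC)` marker line that flips the era, might look like the following. The `(AD)` marker it also handles is an assumption about the part of the file not shown. Negative numbers simply mean BC here; this is not astronomical year numbering.

```python
import re
from typing import List, Optional, Tuple

# "<title> <start> [– <end>]," with an en dash or hyphen and loose spacing.
ENTRY = re.compile(r"^(?P<title>.+?)\s+(?P<start>\d+)\s*(?:[–-]\s*(?P<end>\d+))?\s*,?\s*$")

Span = Tuple[str, int, Optional[int]]


def parse_war_list(text: str) -> List[Span]:
    """Parse 'Name start – end,' lines; years after a '(BC)' line become negative."""
    era_sign = 1
    spans: List[Span] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.upper() == "(BC)":
            era_sign = -1
            continue
        if line.upper() == "(AD)":  # assumed marker for the later part of the file
            era_sign = 1
            continue
        m = ENTRY.match(line)
        if m is None:
            continue  # skip lines that do not fit the pattern instead of guessing
        start = era_sign * int(m.group("start"))
        end = era_sign * int(m.group("end")) if m.group("end") else None
        spans.append((m.group("title"), start, end))
    return spans
```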

JEEP · Oct 16, 2009

Laura said:
anart on October 09 said:

jeep said:

Yes, it certainly does give weight to the whole conspiracy agenda to think that FDR actually said this, but since he really didn't, I don't believe it's right to keep inferring that he did.


Apologies for the grammatical nitpicking, but in case it helps if you use the words elsewhere - the word inferring is incorrectly used here. You mean to say 'to keep implying that he did'. The reader or listener infers, the writer or speaker implies - just a fwiw...


Actually, some people are inferring that he MUST have said it, and then implying to others that he did. In this case, pitching and catching are taking place almost simultaneously.

Umm. . .Who's on First? :P

Wikipedia said:
Abbott and Costello performed "Who's on First?" numerous times in their careers, rarely performing it the same way twice. Once, they did the routine at President Roosevelt's request.

Gotta love it!

foofighter · Oct 16, 2009

Burma Jones said:
Galahad said:

Looks very nice!

Will you be able to read in files with multiple entries, or will entries have to be entered one-by-one?


Well, the way the admin is being made at the moment, it would be one entry at a time. Could do bulk upload, but where would we get the source files? One way or another, those would have to be put into some sort of machine readable form one by one. Can you give an example of the sort of file you are talking about?

One by one is probably not going to be scalable at all, and doesn't work very well when the data is coming from another source that can be scraped. One way that I have done this in the past in other projects is to allow OpenOffice documents to be uploaded (OpenOffice since the internal format is XML, which is easy to work with). The column headers would be well-defined, like "Description", "Date", "Source", etc., and then each line below would have the actual data. A submitted document can then be parsed and the data extracted and put into the database. Not only does this allow lots of data to be input at once, but it also helps scraping scenarios (e.g. scrape a webpage and generate the OpenOffice document), and it also makes it possible to do the data entering "offline", which can then be uploaded in bulk.
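For the OpenOffice route: an `.ods` file is a zip archive whose `content.xml` holds each sheet as `table:table-row` / `table:table-cell` elements. The sketch below reads the first sheet using only the Python standard library and treats the first row as the column headers (e.g. "Description", "Date", "Source"). It is simplified. It expands `table:number-columns-repeated` but ignores merged cells, additional sheets and typed cell values.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import IO, Dict, List, Union

NS = {
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}
REPEAT = "{%s}number-columns-repeated" % NS["table"]


def rows_from_content_xml(xml_bytes: bytes) -> List[List[str]]:
    """Return the first sheet of an ODS content.xml as lists of cell strings."""
    root = ET.fromstring(xml_bytes)
    sheet = root.find(".//table:table", NS)
    if sheet is None:
        return []
    rows: List[List[str]] = []
    for row in sheet.findall(".//table:table-row", NS):
        cells: List[str] = []
        for cell in row.findall("table:table-cell", NS):
            # A cell may hold several paragraphs; join them with newlines.
            text = "\n".join("".join(p.itertext()) for p in cell.findall("text:p", NS))
            cells.extend([text] * int(cell.get(REPEAT, "1")))
        # Spreadsheets often pad rows with repeated empty cells; drop them.
        while cells and cells[-1] == "":
            cells.pop()
        if cells:
            rows.append(cells)
    return rows


def records_from_ods(source: Union[str, IO[bytes]]) -> List[Dict[str, str]]:
    """Read an .ods file (path or binary file object); row 1 holds the headers."""
    with zipfile.ZipFile(source) as zf:
        rows = rows_from_content_xml(zf.read("content.xml"))
    if not rows:
        return []
    header, *data = rows
    # Pad short rows so every record has every header key.
    return [dict(zip(header, r + [""] * (len(header) - len(r)))) for r in data]
```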

allen · Oct 16, 2009

foofighter said:
Not only does this allow lots of data to be input at once, but it also helps scraping scenarios (e.g. scrape a webpage and generate the OpenOffice document), and it also makes it possible to do the data entering "offline", which can then be uploaded in bulk.

Excellent points. So, the minimum data required for each point is a title (the text that appears on the timeline) and a start date/time (at least the year, but will take all date/time info including day-of-week).

Optional information:
description (can be html, will default to the title if not included)
end date/time (if the event is a date span like "Crimean War")
icon
event type (Cs, historical document, archeological evidence, mythology, religious document) - probably the best way to handle this with bulk upload is to create separate XML docs for each type
links

What we have in the db schema that isn't implemented on the timeline yet but will be very soon:
tags (for filtering)
people involved (this will give an option of either getting more info on the person or filtering the timeline to show events involving this person)
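One way to pin down the field list above is a small record type that enforces the two required fields and the description default. This is a hypothetical Python sketch; the field names, the string-typed dates and the exact event-type labels are assumptions, not the actual db schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Event types as listed above; the exact stored labels are an assumption.
EVENT_TYPES = {"Cs", "historical document", "archeological evidence",
               "mythology", "religious document"}


@dataclass
class TimelineEntry:
    title: str                          # required: text shown on the timeline
    start: str                          # required: at least the year
    description: Optional[str] = None   # may contain HTML; defaults to title
    end: Optional[str] = None           # only for spans such as "Crimean War"
    icon: Optional[str] = None
    event_type: Optional[str] = None
    links: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)    # planned: filtering
    people: List[str] = field(default_factory=list)  # planned: people involved

    def __post_init__(self) -> None:
        if not self.title.strip():
            raise ValueError("title is required")
        if not self.start.strip():
            raise ValueError("start date is required")
        if self.event_type is not None and self.event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.event_type!r}")
        if self.description is None:
            self.description = self.title  # fall back to the title
```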

So, it would be cool if some of you could start collecting the above information and putting it in any machine-readable format, meaning that there is a pattern to where each data point is so that I can easily parse it. Can be XML, spreadsheet, comma-delimited file, whatever. While you do that, I'll get tags and people implemented along with the several thousand other things I've got to get done. :o)

foofighter · Oct 16, 2009

Burma Jones said:
Excellent points. So, the minimum data required for each point is a title (the text that appears on the timeline) and a start date/time (at least the year, but will take all date/time info including day-of-week).

Optional information:
description (can be html, will default to the title if not included)
end date/time (if the event is a date span like "Crimean War")
icon
event type (Cs, historical document, archeological evidence, mythology, religious document) - probably the best way to handle this with bulk upload is to create separate XML docs for each type
links

What we have in the db schema that isn't implemented on the timeline yet but will be very soon:
tags (for filtering)
people involved (this will give an option of either getting more info on the person or filtering the timeline to show events involving this person)

And you could add source (e.g. book, article), location (Country/City/Region), author (who put it in, without which you're going to have security issues), when it was put in, etc. etc. One problem with all this, and with using a relational database in the first place, is that we are talking about a sparse data model, where each node could have one or a hundred possible attributes. Relational databases are really really bad at handling those kinds of datasets, and are also really lousy at being able to do interesting things with them once you have the data. It's simply not what they were made for (they rock at keeping bank accounts though). If it was me I'd opt for using either an RDF database instead (which the Simile widgets prefer also, and then you can do faceted browsing much easier), or a graph database (like Neo4j, which would allow you to do some really cool analysis algorithms on the data). Anything would be better than a relational database, really, for this type of data.
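To illustrate the sparse-data point: in a triple-based model every attribute is just another (subject, predicate, object) statement, so one event can have two properties and another a hundred without any schema change. The toy store below only shows that shape. It is not RDF proper (no URIs, no SPARQL) and is not how an RDF database or Neo4j works internally; the `event:` identifiers are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """Toy in-memory store: every attribute is simply another triple."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def properties(self, s: str) -> Dict[str, List[str]]:
        """All predicate -> values for one subject, however many there are."""
        out: Dict[str, List[str]] = defaultdict(list)
        for subj, pred, obj in sorted(self.triples):
            if subj == s:
                out[pred].append(obj)
        return dict(out)

    def subjects_with(self, p: str, o: str) -> List[str]:
        """Facet-style lookup: every subject having predicate p with value o."""
        return sorted(s for (s, pred, obj) in self.triples if pred == p and obj == o)
```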

For instance, if you work with importing datasets from Excel spreadsheets, how do you handle updates of the data? Let's say I enter 100 rows first and upload it. Then I add 100 more, and change 10 of the old ones, and remove 3 that I found were wrong, and then want to upload that new version. How do you deal with that? How do you figure out what has changed? In an RDF database that's trivial. With a relational database it's a bit more tricky.
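Whatever the storage, working out what changed between two uploads is straightforward if every row carries a stable identifier. The sketch assumes a hypothetical `id` column in the spreadsheet; without one, an edited row cannot be told apart from a deleted row plus a new one.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def diff_uploads(old: List[Record], new: List[Record], key: str = "id"
                 ) -> Tuple[List[str], List[str], List[str]]:
    """Compare two uploads of the same sheet by a stable per-row key.

    Returns sorted lists of keys: (added, changed, removed).
    """
    old_by_key = {r[key]: r for r in old}
    new_by_key = {r[key]: r for r in new}
    added = sorted(k for k in new_by_key if k not in old_by_key)
    removed = sorted(k for k in old_by_key if k not in new_by_key)
    # A row counts as changed if any field differs between the two versions.
    changed = sorted(k for k in new_by_key
                     if k in old_by_key and new_by_key[k] != old_by_key[k])
    return added, changed, removed
```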

I know you are eager to rush forward with this (I would be too), but IMHO you should stop and consider what your options are first, and what the consequences are with going different routes, both for the data management, input handling, and visualization.

So, it would be cool if some of you could start collecting the above information and putting it in any machine-readable format, meaning that there is a pattern to where each data point is so that I can easily parse it. Can be XML, spreadsheet, comma-delimited file, whatever. While you do that, I'll get tags and people implemented along with the several thousand other things I've got to get done. :o)

If you want people to start working on this Right Now, I would strongly suggest that you make an example spreadsheet with predefined column headers, as that is going to make it a heck of a lot easier to parse it later on. Don't forget to specify date formats and any other kind of formats for fields as well. I would write today's date as "2009-10-16", but other people might not. Having a consistent format from the beginning helps a lot.
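If the template does mandate ISO 8601 dates such as `2009-10-16`, bad entries can be flagged before upload. A minimal check using Python's `date.fromisoformat`. Note that `datetime.date` only covers years 1 to 9999, so BC dates would need separate handling, and on Python 3.11+ `fromisoformat` also accepts some other ISO 8601 forms.

```python
from datetime import date
from typing import Iterable, List


def invalid_iso_dates(values: Iterable[str]) -> List[str]:
    """Return the values that are not valid YYYY-MM-DD calendar dates."""
    bad: List[str] = []
    for value in values:
        try:
            date.fromisoformat(value.strip())
        except ValueError:
            bad.append(value)  # wrong layout or impossible date (e.g. Feb 30)
    return bad
```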

foofighter · Oct 16, 2009

foofighter said:
If it was me I'd opt for using either an RDF database instead (which the Simile widgets prefer also, and then you can do faceted browsing much easier), or a graph database (like Neo4j, which would allow you to do some really cool analysis algorithms on the data). Anything would be better than a relational database, really, for this type of data.

If you want to look at a technology that really would fit quite well with the intent of this project, look at freebase.com. They have an open database where anyone can add more data, and then there are tons of tools to import/export/visualize it and use it on your own website. Read their About page for more info on why a relational database just doesn't work all that well for this kind of semi-structured data.

In a perfect world, you'd simply use that instead of making your own tool for this. The problem, I think, is that the terms of use state that it is not allowed to upload material that "is illegal, harmful, threatening, abusive, harassing, defamatory, obscene, offensive, invades another’s privacy, or promotes bigotry, racism, or hatred for or harm against any individual or group". Pointing out historical errors could probably be at least "harmful", "threatening", "offensive" and sometimes even "illegal", if seen through pathological lenses. It's a pity there's no way to run your own "freebase", because then it'd be easy to just use that software as-is.

allen · Oct 16, 2009

foofighter said:
And you could add source (e.g. book, article), location (Country/City/Region), author (who put it in, without which you're going to have security issues), when it was put in, etc. etc.

Already have source (didn't list that, sorry) and location (lat and long). Probably should also store the location as text. Author, for security reasons, is always a good idea, yeah.

foofighter said:
One problem with all this, and with using a relational database in the first place, is that we are talking about a sparse data model, where each node could have one or a hundred possible attributes. Relational databases are really really bad at handling those kinds of datasets, and are also really lousy at being able to do interesting things with them once you have the data. It's simply not what they were made for (they rock at keeping bank accounts though). If it was me I'd opt for using either an RDF database instead (which the Simile widgets prefer also, and then you can do faceted browsing much easier), or a graph database (like Neo4j, which would allow you to do some really cool analysis algorithms on the data). Anything would be better than a relational database, really, for this type of data.

This isn't rocket science. We have a timeline from open source code with a predefined set of attributes. We also have a database that already can hold said data, deliver it to the timeline and display it. I don't see how simile timeline prefers RDF. What I feed it is a json encoded array of objects, which it likes just fine. Creating that array from my relational database is simple as pie.
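A sketch of the relational-to-JSON step described here, assuming SQLite and a hypothetical `events` table with `title`, `start_date`, `end_date` and `description` columns (the real schema isn't shown in the thread). The output keys (`dateTimeFormat`, `events`, `start`, `end`, `title`, `description`, `durationEvent`) follow my understanding of Simile Timeline's JSON event format and should be checked against its documentation.

```python
import json
import sqlite3


def events_json(conn: sqlite3.Connection) -> str:
    """Serialize rows of a hypothetical `events` table into a JSON event list."""
    rows = conn.execute(
        "SELECT title, start_date, end_date, description FROM events ORDER BY start_date"
    ).fetchall()
    events = []
    for title, start_date, end_date, description in rows:
        event = {
            "title": title,
            "start": start_date,
            "description": description or title,  # default to the title
            "durationEvent": bool(end_date),       # spans vs. single points
        }
        if end_date:
            event["end"] = end_date
        events.append(event)
    return json.dumps({"dateTimeFormat": "iso8601", "events": events})
```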

As for your comments on relational databases pretty much only being good for keeping bank accounts....uhm, what?!

foofighter said:
I know you are eager to rush forward with this (I would be too), but IMHO you should stop and consider what your options are first, and what the consequences are with going different routes, both for the data management, input handling, and visualization.

Actually, you don't know anything of the sort. You have no idea how busy I am right now. For the past few nights I haven't been able to get to bed until about 3am, working solid through the day and night. What I am eager to do is get some sleep! However, I know from a whole lot of real world experience that a project as simple as this doesn't require a lot of design (see above comment on rocket science). What it does require is a continual movement of energy so that it doesn't stall.

The most difficult part of it is the data collection. Once we have the data, we can do anything we like with it. I really don't care what form it comes in or what headings are used. The only thing I care about is that it is complete and in some sort of structured format that can be parsed. As for date formats, if someone is going to collect 1,000 data points from various sources that use various date formats, they would have to go through all of those to make sure the date formatting is correct. At that point, they might as well just enter the data into a web form. I'd rather just write a parser that can handle any date format and make things as easy as possible on those collecting the data.
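A minimal version of such a lenient parser, assuming the sources mostly use ISO dates, spelled-out English month names, bare years, or years with a BC/AD suffix. It returns a partial date (year, month, day) with negative years for BC, and returns `None` rather than guessing. Numeric forms like `03/04/2009` are deliberately left out because day/month order cannot be inferred, and month names rely on `strptime`'s default (C/English) locale.

```python
import re
from datetime import datetime
from typing import Optional, Tuple

# (year, month, day); month/day are None when the source gives only a year.
# Negative years mean BC (a simple convention, not astronomical numbering).
PartialDate = Tuple[int, Optional[int], Optional[int]]

# Unambiguous full-date layouts only.
FULL_FORMATS = ("%Y-%m-%d", "%d %B %Y", "%B %d, %Y", "%d %b %Y", "%b %d, %Y")

YEAR_ONLY = re.compile(r"^(\d{1,4})\s*(BC|BCE|AD|CE)?$", re.IGNORECASE)


def parse_loose_date(text: str) -> Optional[PartialDate]:
    s = " ".join(text.split())  # normalize internal whitespace
    m = YEAR_ONLY.match(s)
    if m:
        year = int(m.group(1))
        era = (m.group(2) or "").upper()
        return (-year if era in ("BC", "BCE") else year, None, None)
    for fmt in FULL_FORMATS:
        try:
            d = datetime.strptime(s, fmt)
        except ValueError:
            continue
        return (d.year, d.month, d.day)
    return None  # unrecognized: leave it for a human to fix
```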

dant · Oct 16, 2009

The "upside" to web hosting (co-hosting) is
low or no maintenance (with the exception of
co-hosting), a high-bandwidth connection,
redundancy, and perhaps with a few "bells and
whistles" thrown in such as "software: website,
database, etc...", but with less control?

The "downside" of private hosting is costly
software (Linux is "free"), server maintenance,
no redundancy, and bandwidth, but with a higher
degree of control?

As for policies, is this what it breaks down to:

(1) (Integrated) Web hosted site: Policy
[which also includes 2(a)(b)(c) below]

(2) Private, but public site:
(a) Domain name provider: Policy
(b) ISP provider: Policy
(c) Phone company (wireless/landline): Policy

FWIW,
Dan

allen · Oct 16, 2009

dant said:
The "upside" to Webhosting (co-hosting) is
low or no maintenance (with exception to
co-hosting), and high bandwidth connection,
redundancy, and perhaps with a few "bells and
whistles" thrown in such as "software: website,
database, etc...", but with less control?

I'm not sure I follow. We already have plenty of server space available with everything in place.

dant · Oct 16, 2009

Burma Jones said:
dant said:

The "upside" to Webhosting (co-hosting) is
low or no maintenance (with exception to
co-hosting), and high bandwidth connection,
redundancy, and perhaps with a few "bells and
whistles" thrown in such as "software: website,
database, etc...", but with less control?


I'm not sure I follow. We already have plenty of server
space available with everything in place.

It is not so much server/space availability
as the accessibility and preservation of
the data itself.

The question is: is the site you are working on
trustworthy, so that the work will not be co-opted?

The data/work ought to be backed up off-site,
often, in order to preserve and restore the data to
another site if the need arises?
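For the regular-backup suggestion, a minimal sketch assuming SQLite purely for illustration (the thread doesn't say which database the site uses). `sqlite3.Connection.backup` (Python 3.7+) produces a consistent copy even while the source is open; copying the resulting file off-site is a separate step not shown here.

```python
import sqlite3
import time
from pathlib import Path


def snapshot(conn: sqlite3.Connection, backup_dir: Path) -> Path:
    """Write a timestamped, consistent copy of a SQLite database to backup_dir."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / time.strftime("timeline-%Y%m%d-%H%M%S.sqlite3")
    dest = sqlite3.connect(target)
    try:
        conn.backup(dest)  # SQLite online backup API
    finally:
        dest.close()
    return target
```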

FWIW,
Dan

allen · Oct 16, 2009

dant said:
The question is, is the site you are working on,
trustable, and the work will not be co-opted?

Yeah, I guess that should be clarified. It will be as safe as SOTT or this forum since it will be running on one or both of those servers.
