Comment

Pamela Geller Edits Post to Conceal Violent Rhetoric in 'Email from Norway'

214
Fozzie Bear 7/29/2011 2:28:50 pm PDT

re: #189 Obdicut

A git-like difference evaluator would only note that a reference to an image was added. It wouldn’t try to extract any meaning from it. Similarly, if a line was deleted, it would only note the deletion of X number of lines starting at line number Y.

I wasn’t implying that you could extract meaning from it directly, just that it would provide a compact and efficient way to track changes over time without losing any of the “original” information (i.e., the state of the document at the point in time you started tracking it.)

Look at the way the *nix utility diff works. Git's diff does essentially the same thing, but recursively over a directory tree, rather than on a single file.
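To make that concrete, here's a rough sketch using Python's standard difflib module, which produces the same kind of line-oriented output as diff. The page contents, the "page.html" label, and the diff_lines helper are all made up for illustration. Note that the result only says a line was removed and an image reference was added; it doesn't interpret either one.

```python
import difflib


def diff_lines(old_text, new_text, name="page.html"):
    """Return unified-diff lines between two versions of one document.

    `name` is only a label for the diff header (hypothetical file name).
    """
    old = old_text.splitlines()
    new = new_text.splitlines()
    return list(
        difflib.unified_diff(
            old,
            new,
            fromfile=name + " (old)",
            tofile=name + " (new)",
            lineterm="",
        )
    )


# Two invented versions of the same page: one paragraph is swapped for an image.
old_page = (
    "<p>First paragraph.</p>\n"
    "<p>Second paragraph.</p>\n"
    "<p>Third paragraph.</p>"
)
new_page = (
    "<p>First paragraph.</p>\n"
    '<img src="photo.jpg">\n'
    "<p>Third paragraph.</p>"
)

result = diff_lines(old_page, new_page)
for line in result:
    print(line)
```

Lines starting with "-" were deleted and lines starting with "+" were added; the "@@" header gives the line numbers where the change starts.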

There’s no reason you couldn’t apply that concept to a serialized representation of a website.

That’s the idea I was going for. I don’t see any fundamental problem with it apart from one: it would have to assume that each “page” retains a static naming convention, or else it wouldn’t work. So, it would work on some websites (fully RESTful ones) but not others. The issue of the loss of semantic meaning isn’t something such a utility would even attempt to deal with. It would just track changes.
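Here's a minimal sketch of what I mean, again in Python with difflib. Everything in it (the PageHistory class, the record method, the example URLs) is hypothetical, and for simplicity it keeps the latest full copy of each page alongside the diffs, which a real tool might not. It keeps the original capture of each page, keyed by URL, and stores only line-level changes after that.

```python
import difflib


class PageHistory:
    """Track line-level changes to pages, keyed by URL (illustrative only)."""

    def __init__(self):
        self.baseline = {}  # url -> lines at the first capture (never altered)
        self.latest = {}    # url -> lines at the most recent capture
        self.changes = {}   # url -> list of diffs, one per detected change

    def record(self, url, text):
        """Store a capture of `url` and return the diff against the last one."""
        lines = text.splitlines()
        if url not in self.baseline:
            # First time this URL is seen: it becomes the "original" state.
            self.baseline[url] = lines
            self.latest[url] = lines
            self.changes[url] = []
            return []
        delta = list(difflib.unified_diff(self.latest[url], lines, lineterm=""))
        if delta:
            self.changes[url].append(delta)
            self.latest[url] = lines
        return delta


history = PageHistory()
history.record("/posts/42", "Hello\nWorld")
delta = history.record("/posts/42", "Hello\nEveryone")

# Same content under a different URL: the tracker cannot tell it is the
# same page, so it starts a fresh history instead of reporting a change.
renamed = history.record("/posts/42-renamed", "Hello\nEveryone")
```

The last call shows the limitation: if the URL changes, the tool treats the page as brand new and the edit history for the old name just stops.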

Am I making sense? (I mean, it makes sense to me, but am I conveying the idea?)