The journal is like a computer program. It is a sequence of instructions that constructs a result. As such, it would be fruitful to consider journal compression as a variation of peephole optimization. wikipedia
At a minimum, a modified journal should construct the same story. A journal that fails this test would be defective. Sadly, there are pages in the federation with defective journals. A common culprit is careless construction of page json by import libraries, including some I have written. Bugs in the early codebase, and some rare failure cases that persist today, are also at fault.
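As a sketch of what "constructs the same story" could mean in code, here is a small replayer built around the familiar journal action types (create, add, edit, remove, move, fork). The function names and the handling of malformed actions are my own assumptions, not the core's revision code; a compressed journal passes when `sameStory(original, compressed)` holds.

```javascript
// Replay a journal into the story it constructs.
function replay(journal) {
  let story = [];
  for (const action of journal) {
    const at = story.findIndex(item => item && item.id === action.id);
    switch (action.type) {
      case 'create':
        story = (action.item && action.item.story) ? [...action.item.story] : [];
        break;
      case 'add': {
        // insert after the named item; at the front if none is named
        const after = story.findIndex(item => item && item.id === action.after);
        story.splice(after + 1, 0, action.item);
        break;
      }
      case 'edit':
        if (at >= 0) story[at] = action.item;    // tolerate defective edits
        break;
      case 'remove':
        if (at >= 0) story.splice(at, 1);
        break;
      case 'move':
        story = action.order.map(id => story.find(item => item && item.id === id));
        break;
      // 'fork' records provenance and leaves the story unchanged
    }
  }
  return story;
}

// The minimum test any journal optimization must pass.
function sameStory(journal, compressed) {
  return JSON.stringify(replay(journal)) === JSON.stringify(replay(compressed));
}
```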
We rely on the journal to construct important points in the past history of a story. Any adjustment of the journal is tampering with this history, but we can probably agree that tampering with unimportant history could be useful. One pattern that used to show up a lot was adding a blank paragraph and then immediately deleting it. No historian will miss this. This, by the way, was a quirk of the text editor that has since been removed.
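Here is a minimal sketch of a peephole rule for that pattern, assuming actions carry the usual type, id, and item fields; the function name and the exact matching logic are my own.

```javascript
// Drop an 'add' followed immediately by a 'remove' of the same item.
function dropAddRemovePairs(journal) {
  const out = [];
  for (const action of journal) {
    const prev = out[out.length - 1];
    const addedId = prev && prev.type === 'add' &&
      (prev.id || (prev.item && prev.item.id));
    if (action.type === 'remove' && addedId === action.id) {
      out.pop();     // the blank paragraph never mattered to the story
      continue;      // so neither does its removal
    }
    out.push(action);
  }
  return out;
}
```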
We commonly see strings of edits to one paragraph. A peephole rule might say that these can be compressed with no important loss of history. We might agree to only apply this rule when the edits are contiguous and within a short period of time, perhaps the same day. Edits to the caption of an image are a variation of this string of edits where there is much to be gained by collapsing them, making it appear as if the image had been added with the caption already in place.
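A minimal sketch of this rule, in the same style as above; the one-day window is just one reading of "perhaps same day", and the caption case falls out as an 'add' immediately followed by same-day edits.

```javascript
// True when two timestamps fall on the same calendar day.
const sameDay = (a, b) =>
  new Date(a).toDateString() === new Date(b).toDateString();

// Collapse contiguous same-day edits to one item into a single action.
function collapseEditRuns(journal) {
  const out = [];
  for (const action of journal) {
    const prev = out[out.length - 1];
    if (action.type === 'edit' && prev && prev.id === action.id &&
        sameDay(prev.date, action.date)) {
      if (prev.type === 'edit') {
        out[out.length - 1] = action;              // keep only the latest edit
        continue;
      }
      if (prev.type === 'add') {                   // the caption case:
        out[out.length - 1] = { ...prev, item: action.item };
        continue;                                  // image arrives captioned
      }
    }
    out.push(action);
  }
  return out;
}
```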
Aside: Images and captions present numerous inefficiencies and compromises which will require more than journal compression before they meet all expectations.
An important principle for all optimizations might be that they not cross possession boundaries. When a page has changed hands we should accurately represent the story at the point of exchange. And we should preserve the fork actions that mark these transitions unless they are truly redundant.
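One way to honor that principle in code, continuing the sketches above: cut the journal at fork actions and let the rules loose only inside each segment. The segmentation strategy is my own reading of the principle.

```javascript
// Apply peephole rules within possession boundaries only, e.g.
// optimizeWithinPossession(journal, [dropAddRemovePairs, collapseEditRuns]).
function optimizeWithinPossession(journal, rules) {
  const segments = [[]];
  for (const action of journal) {
    if (action.type === 'fork') {
      segments.push([action], []);   // a fork closes one possession, opens another
    } else {
      segments[segments.length - 1].push(action);
    }
  }
  return segments.flatMap(seg =>
    seg[0] && seg[0].type === 'fork'
      ? seg                                        // fork actions pass through untouched
      : rules.reduce((j, rule) => rule(j), seg));  // apply each rule in turn
}
```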
I'm interested in how we might arrive at a set of useful and justified peephole optimizations that we might build into the wiki core javascript. Maybe these only run when a journal grows too large to save without them. Or maybe the most agreeable optimizations are applied on fork, with the belief that the gory details of authoring will be available at the source.
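A possible driver for the "only when too large to save" policy, composing the sketches above; the threshold is an arbitrary placeholder, not an actual wiki limit.

```javascript
// Leave the journal alone until it no longer fits, then compress.
function maybeCompress(page, rules) {
  const tooBig = JSON.stringify(page.journal).length > 256 * 1024;
  return tooBig
    ? optimizeWithinPossession(page.journal, rules)
    : page.journal;
}
```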
I'm also interested in how defective journals might be repaired. The simplest fix would be to discard the existing journal and replace it with one create action that includes the complete story. The required attributions could be tacked onto this so as to meet the CC requirement, even if the result does not tell an accurate history, which has mostly been lost in this case anyway.
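A sketch of that simplest repair, keeping the same page shape as above; representing the attributions as fork actions is only a guess at how the CC requirement might be tacked on.

```javascript
// Replace a defective journal with one create action carrying the
// complete story, plus fork actions crediting the named sites.
function replaceJournal(page, attributions = []) {
  return {
    ...page,
    journal: [
      {
        type: 'create',
        item: { title: page.title, story: page.story },
        date: Date.now()
      },
      ...attributions.map(site => ({ type: 'fork', site, date: Date.now() }))
    ]
  };
}
```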
The federation is filled with test cases. A sweet little node application could spider the federation, assessing the health and fitness of the journals it finds. This could be a testbed for evaluating peephole optimizations, though any result would depend on agreement as to what is important to preserve.
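Such a spider might look something like this, assuming the usual federation endpoints for the sitemap and page json, reusing the replay sketch above, and running on a node version where fetch is global.

```javascript
// Walk one site's sitemap and check whether each journal still
// constructs the story the page claims to have.
async function assess(site) {
  const sitemap = await fetch(`http://${site}/system/sitemap.json`)
    .then(res => res.json());
  const report = [];
  for (const { slug } of sitemap) {
    const page = await fetch(`http://${site}/${slug}.json`)
      .then(res => res.json());
    const rebuilt = replay(page.journal || []);
    const healthy =
      JSON.stringify(rebuilt) === JSON.stringify(page.story);
    report.push({ slug, healthy });
  }
  return report;
}
```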
This thread first appeared in the matrix. riot
# Counterpoint
David Bovill reminds us that what the journal stores and how it is presented are two different things. The direction of conversation shifts.
The journal serves the reader who is confronted with divergent copies of an important work. Here interpreting the "population" is more important than the precise history of any individual. We also have the potential to construct "chimeric" pages by merging divergent journals.
There is code in the core, now three years old, that will handle dragging one journal on top of another. github
The merge logic was not robust enough to handle every journal it encountered. Further, the merged journals had properties unexpected of journals constructed sequentially.
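To see why, here is a naive merge in the spirit of that drag-and-drop code, interleaving by date and dropping exact duplicates from the shared ancestry; this is my own reconstruction, not the core's logic. Even this simple version can yield two create actions, or edits to items that were never added in the merged order.

```javascript
// Interleave two journals by date, discarding exact duplicates.
function mergeJournals(a, b) {
  const merged = [...a, ...b]
    .sort((x, y) => (x.date || 0) - (y.date || 0));
  const seen = new Set();
  return merged.filter(action => {
    const key = JSON.stringify(action);
    if (seen.has(key)) return false;   // shared ancestry appears once
    seen.add(key);
    return true;
  });
}
```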
We could consider the journal more like a genome than a program. We could say that whatever pleases the author of a mashup is correct. If we more completely model the competition of thoughts in the creative mind, we might be doing more service for the future than just keeping accurate records.
# Merge
Here is an example of a confusing journal from a recent merge: Social Library
- A few weeks ago I had forked this page and made a few additions.
- Today I see that there is a newer version; a paragraph has been added.
- I drag the journal with the new item to the left and drop it on my page's journal. They merge.
- The new paragraph is added and the journal shows it.
- I see this first as a ghost page, a fabrication that doesn't exist anywhere.
- I like it, so I fork it, saving the mashup as my own.
There are problems.
The journal doesn't show any contributions as my own. It shows two forks, both from the remote site, but none from my own site, since the remote author never forked my first improvements.
The two pages share the same last edit, so they show as 'same' in twins and recent changes. This makes more sense than saying one is newer than the other, but 'same' implies more similarity than actually exists.