Canonical RDF, Linked Data Signatures #116
Comments of Digital Bazaar on the JSON-LD draft charter (which has explicitly declared these issues out of scope):
Separate discussions with DB agreed that one way forward is to provide an independent (peer-reviewed) mathematical proof for the proposed (and also deployed) algorithm for Canonical RDF. A W3C WG may indeed be ill equipped to provide that. If such a proof were publicly available and considered accepted by the community, turning it into practice could indeed be done in a Working Group...
There was a discussion at the LDOW Workshop at the Web Conference WWW'2018 in Lyon, April 2018, which emphasized the importance of such a standard for the future of Linked Data.
Two academic references from Aidan Hogan, who has published a canonicalization algorithm:
- Aidan Hogan. "Skolemising Blank Nodes while Preserving Isomorphism". In Proceedings of the 24th International World Wide Web Conference (WWW), Florence, Italy, May 18–22, 2015.
- Aidan Hogan. "Canonical Forms for Isomorphic and Equivalent RDF Graphs: Algorithms for Leaning and Labelling Blank Nodes". In ACM Transactions on the Web, 2017.
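The labelling side of such algorithms can be pictured as iterative hash refinement over blank nodes. Below is a simplified, hypothetical Python sketch (the function name and fixed round count are invented for illustration; real algorithms like Hogan's also break hash ties between structurally identical nodes, which this does not):

```python
import hashlib

def canonical_labels(triples, rounds=3):
    """Toy colour-refinement labelling for blank nodes ("_:x" terms).
    Each round re-hashes a node's incident edges together with its
    neighbours' current colours; after a few rounds, structurally
    distinct nodes usually carry distinct hashes.  Hash ties between
    distinct nodes are NOT resolved here, unlike in real algorithms."""
    bnodes = {t for s, p, o in triples for t in (s, o) if t.startswith("_:")}
    color = {b: "init" for b in bnodes}
    for _ in range(rounds):
        new = {}
        for b in bnodes:
            parts = []
            for s, p, o in triples:
                if s == b:
                    parts.append(("out", p, color.get(o, o)))
                if o == b:
                    parts.append(("in", p, color.get(s, s)))
            new[b] = hashlib.sha256(repr(sorted(parts)).encode()).hexdigest()
        color = new
    # Rank nodes by their final hash to obtain canonical labels.
    ranked = sorted((h, b) for b, h in color.items())
    return {b: f"_:c14n{i}" for i, (h, b) in enumerate(ranked)}
```

Because the hashes depend only on graph structure, two isomorphic graphs receive the same canonical labels regardless of their original blank node identifiers.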
Incidentally, the issue came up in another paper at another workshop (Distributed Ledgers and Linked Data), which does have references to digest algorithms and reports on implementations: M. Sopek, P. Gradzki, W. Kosowski, D. Kuziński, R. Trójczak, and R. Trypuz, "GraphChain: A Distributed Database with Explicit Semantics and Chained RDF Graphs", in Companion Proceedings of The Web Conference 2018, Republic and Canton of Geneva, Switzerland, 2018, pp. 1171–1178, https://dl.acm.org/authorize?N655224 The presentation slides are at https://ml.ms/TheGraphChainLyon (though those go beyond the specific issue of canonicalization).
For completeness, the CG draft on canonicalization: http://json-ld.github.io/normalization/spec/index.html, with some issues listed at:
I did an implementation of Aidan's algorithm (see #116 (comment)): https://github.com/iherman/canonical_rdf This means that Aidan's paper may also be considered a solid basis for a formal specification. At this point, the question is whether there is sufficient interest in the community to go ahead with a WG.
Notes of a discussion held at TPAC on a way forward: https://lists.w3.org/Archives/Public/www-archive/2018Oct/0011.html
Latest discussions made it clear that the original deadlines (see the 5th bullet point in the TPAC discussion) cannot be met: lack of time and money. Investigating the way forward further by email.
There has been a TPAC Breakout session that reinforced the current strategy steps:
(1) is actually on a good track now: there is a not-yet-public draft of the Digital Bazaar algorithm, which starts the review process.
Please let us know when/if we can expect this to be finalized. We would very much like to include signing in WoT TDs, but we also want to support discovery via directories that support SPARQL and JSON-LD "round tripping". Our fallback plan is to require that directories maintain the original JSON serialization exactly and provide it upon request, and to use JCS (the IETF JSON Canonicalization Scheme RFC, which defines canonical JSON for signing), but this is somewhat inelegant.
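For readers unfamiliar with that fallback, the core of a JCS-style canonical JSON form can be roughly approximated in Python. This is only a sketch, not a conformant RFC 8785 implementation: the RFC additionally prescribes exact number serialization and sorts keys by UTF-16 code units, neither of which `json.dumps` fully reproduces.

```python
import json

def canonical_json_sketch(obj):
    # Sorted keys and no insignificant whitespace: the essential idea
    # behind JCS-style canonicalization.  NOT conformant to RFC 8785;
    # number formatting and non-ASCII key ordering can differ.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
```

Two semantically equal JSON documents then serialize to the same byte string, which is what makes signing over JSON workable at all.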
Just to report in on progress here -- we now have (as of mid-2020) multiple independent reviews of the mathematical proofs for the RDF Canonicalization algorithm from "real mathematicians". We need to ask the reviewers to make those documents public so they can be taken as input documents into the work. Many of the people using the Linked Data Security work (building implementations, using it in pilot/production systems) are currently embroiled in getting the Decentralized Identifier specification to CR/PR -- it is the very next thing many of us want to do after DID Core goes to CR/PR, and we expect that to happen in early to mid 2021. So, doing a "charter in progress" announcement for the "Linked Data Security Working Group" seems like a good idea around March 2021 or so. The final thing we were waiting for is for multiple WGs to note that they want to see this work done, so we get the W3C Member votes needed. We expect that the VCWG, DIDWG, CCG, and now hopefully the WoTWG members would support the work. It feels like all of this is converging, and we should be able to make progress on this WG this year, since it seems to be becoming a critical-path item for multiple WGs.
@msporny, was the canonicalization algorithm upgraded (based on Aidan Hogan's work) to address the diff use case in addition to the digital signatures use case? Addressing the diff use case means that a small change to the source graph is likely to yield a small change in its canonicalization. I ask because the diff use case was not addressed in the original JSON-LD canonicalization algorithm.
As far as I know, there have been no production-level implementations of Aidan's work wrt. diff'ing, so it's been difficult to test the "likely" part of your statement. We'd need more work in that area to see if it's a workable solution. A large number of the organizations using the Canonical RDF work just want us to start with standardizing the 2015 normalization algorithm (URDNA2015), which is already out in the field. Once that's done, we can work on improving it in future c14n algorithms. At least, that would be our preference. With respect to diff'ing -- the 2015 algorithm can be used to do that (in fact, we do that when our digital signatures fail, to see what is different between the graph that's being digitally signed and the graph that is being verified). It's usually fairly easy to see where things went wrong. Is that the use case you're wondering about, or were you hoping for something else/more?
@msporny, I would need some technical explanation on this. As @dbooth-boston said, my feeling about the canonicalization algorithm (whether yours or Aidan's) is that even a small change to the graph will lead to possibly big changes, because the blank node identification may radically change, resulting in very different canonical forms. Or do you use some form of similarity measure when relabeling the blank nodes and calculating the final hash of the graph? Diffing may be a major use case for this (I just had some discussion about this with @danbri), so I would be happy to be proven wrong.
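To make this concern concrete, here is a toy Python model (not URDNA2015 or any real algorithm; all names are invented). It labels nodes by their rank in a global sorted order, much as real algorithms rank by hashes, and shows how changing one node's data can shift the labels of untouched nodes:

```python
def rank_labels(node_data):
    """Toy model of canonical labelling: each blank node gets a label
    from its rank in a globally sorted order.  Real c14n algorithms
    rank by hashes, but the positional effect is the same."""
    order = sorted(node_data, key=lambda n: node_data[n])
    return {n: f"_:c14n{i}" for i, n in enumerate(order)}

g1 = {"a": "alpha", "b": "beta", "c": "gamma"}
g2 = {"a": "alpha", "b": "zeta", "c": "gamma"}  # only node b changed

l1, l2 = rank_labels(g1), rank_labels(g2)
# Node c's own data is untouched, yet its canonical label shifts,
# because b moved past it in the global ordering.
```

In a rank-based scheme, one local edit can therefore cascade into a relabelling of much of the graph, which is exactly why a naive line-level diff of two canonical forms may overstate the change.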
To debug a diff of a graph, we take the c14n'd NQuads output of the signing software. Then we take the c14n'd NQuads output of the verifying software, and we literally run `diff` on the two files.
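That workflow can be approximated with Python's standard library, assuming the two canonical N-Quads outputs are already in hand (the sample quads below are invented for illustration):

```python
import difflib

# Hypothetical canonical N-Quads from the signing and verifying sides.
signed = [
    '_:c14n0 <http://ex.org/p> "a" .',
    '_:c14n1 <http://ex.org/q> "b" .',
]
verified = [
    '_:c14n0 <http://ex.org/p> "a" .',
    '_:c14n1 <http://ex.org/q> "c" .',
]

# Because both sides are canonicalized, a plain line-level diff
# pinpoints exactly which quads differ.
diff = list(difflib.unified_diff(signed, verified,
                                 fromfile="signed.nq",
                                 tofile="verified.nq",
                                 lineterm=""))
print("\n".join(diff))
```

The key point is that canonicalization makes the two serializations line-comparable at all; without it, blank node labels and quad ordering would differ arbitrarily between the two producers.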
This is not done.
Yes, but we haven't heard of this creating a problem for people that have deployed URDNA2015 for years. We'd need to evaluate the use case to see why one needs to minimize differences between canonicalizations of successive versions of the same graph.
My concern is that there are a growing number of companies that are counting on URDNA2015 being standardized somewhere, people have built software around it, there are proofs on it, and it's been deployed in systems. Even hinting that the WG is going to change how URDNA2015 works (unless there is a critical flaw) is going to create a non-trivial number of upset companies... to the point where I think there will be objections on the charter itself. Now, to be clear -- I think it's just a matter of priorities. If folks want to innovate on what's already there by changing the canonical form, that's fine, even if they do it in the WG (call it URDNA2021)... but URDNA2015 is a high priority for a number of companies... there is a lot of software already written around it, and I expect that the chartering will fail if we don't specifically call out that a focus will be URDNA2015. We can do both... URDNA2015 and URDNA2021... I know there is support for URDNA2015... I don't know the use case for URDNA2021, there are no implementations, and I don't know the status of the formal proofs.
I do not really understand what the role of c14n is in this respect. What c14n does is re-label, in a canonical manner, the blank nodes, but the re-labeling is different if the two graphs are different. I.e., if the only thing you do is follow the same procedure for converting your graph into NQuads, without c14n involved, the diff would work the same way... But we are digressing, I guess...
Let us discuss this when we get to the specificities of the WG charter...
Official strategy review started #262
Closing this issue in favor of #262 |
This issue has been lingering in the RDF community for a long time; the problem was that there was no known algorithm to define a canonical RDF graph representation, a necessary step for signing RDF graphs.
Things changed a few years ago when, in conjunction with the work on JSON-LD and its applications, an algorithm for the creation of a canonical RDF graph was published. However, this has never advanced to a WG.
Any usage of RDF (primarily in the form of JSON-LD) in the area of authentication, verifiable claims, etc., would rely on something like that.