# Stateless Ethereum Call #5 transcript, March 25th, 2020

So we're looking at this roadmap and, like I said, we need this witness spec. I believe we're going to merge that soon. It's not meant to be the final spec, it's just meant to be the merge starting point from which we can all start contributing back to it. The general gist is that we're going to be refining the witness spec over the next, you know, months.

We realized during the summit that sync is likely a prerequisite to the binary tree transition, so once we have a witness spec we will very likely be using that in whatever the next iteration of sync is -- and by sync I mean how clients sync the state. We realized that if we transition to a binary tree before that, we will break sync, and even if we do some things to make getNodeData and whatnot keep working under the binary paradigm, the amount of intermediate tree data a binary tree has overall would cause problems with sync as well. So we landed on sync as something that we need to get done before the binary tree transition.

There's database layout on here too, which is generally just meant to point at the fact that the naive approach to database layouts stores all of the intermediate tree data, as opposed to the turbo-geth layout (maybe we can start using the term "flat layout" to be, you know, less client-specific). Clients are going to need to be moving towards this flat layout, because without it the likelihood of being able to keep up with the efficiency that clients are going to need is low.

Once we've got those things in place, we're in a position to transition to the binary tree. There are likely a number of other things that get lumped into this as well, since this ends up being an actual consensus, protocol-level change: we've got to migrate to the binary tree, and since that touches every piece of the tree it's often been lumped in with any other things we need to do that touch the rest of the tree. So things like code merkleization would likely get lumped into that fork, things like that. And then after that -- and some of this could get started before all of this -- we need to make some EVM changes to account for the sizes of witnesses via gas accounting.

02:46 And once we've got all of that in place we can actually move towards making witnesses compulsory, which generally means putting some sort of hash reference to the witness in the header. There are a lot of other things around the periphery of this, and it isn't to say those aren't important, but in terms of the critical path of things that need to be done, this is a good high-level starting point.

03:20 Alright, there's a link to that in the chat if anybody wants to, you know, look at that diagram more. You're welcome to get it from there. It's also linked on the meeting agenda. So, anyways, we had two really good days of discussions and I think it was really fruitful. One of the things that Alexey pointed out was that disagreeing in person is way more efficient and effective.

03:47 So we had two good days of being able to disagree with each other about this stuff. One of the main things for me that showed up during the summit was removing the concept of a state network from the overall critical path.
It isn't to say that it isn't useful or valuable, but somebody made a good argument to me that even without the ability to do dynamic, on-demand state retrieval -- which means clients aren't necessarily able to support eth_call and things like that --

04:27 there is still a lot of value delivered to the network by supporting stateless clients in a first-class way. And so by dropping that from the official roadmap we, you know, simplify our overall goals. It doesn't mean that we don't do it, and it doesn't mean it can't happen concurrently. It just means that it's not really part of the critical path. We tried to focus a lot on what it is that we absolutely have to do.

Would a pruned node be part of that realm of things?

What do you mean when you say a pruned node?

Well, like forgetting part of the state.

Ah, when you say that, do you mean dropping chain history, like dropping old ancient blocks and things like that, or just operating with only a partial set of the state?

05:18 More the dropping of information, so that nodes don't have to hold everything, yeah.

So yeah, dropping old blocks or receipts or headers is also kind of off the critical path... it isn't that it's something we don't want to support, it's just that it doesn't absolutely have to happen as part of this, and so it gets lumped into that side-quest category of things that make stateless clients better, and there's a lot of motivation for us to do those things. It's just that we can trim the scope down to the absolute minimum needed to support statelessness, and then have less to do overall. I think that's...

06:06 Well, I would say it's not just about having less to do. To me, it's about demonstrating that what we actually do can produce some results. Because there's always a danger that if you accumulate scope along the way you never actually deliver anything. And that's where I was going to basically disagree with Piper when he was describing the summary: when we want to migrate to binary trees -- and first of all, you know, there is no certain decision that we want to, because there are still arguments against it -- but if we wanted to, for example, I would not like it to be lumped in with any other changes, like code merkleization. Because that has a danger of never achieving anything: if you lump too many things together, even two things instead of one, there is a danger we're never actually going to finish, because every additional thing brings more with it. So we want to just try to do at least one thing, and prove to ourselves that we actually can do this thing, rather than perpetually being in a state of "okay, well, can we do this other thing together as well?"

07:30 Sorry, I wasn't intending to say that they depend on each other. I guess a better way to say what I was trying to say is: if those two things end up ready at the same time, we will probably do them together, but I wouldn't be planning to block one if the other is ready. So I'm very much on board with: as soon as we're ready to do the first thing, let's do it.

08:01 Your mic was a little quiet, Martin, when you said something.

Oh sorry, is that better? Okay. Yeah, I just said it was funny, because exactly what Alexey commented is what I wrote about on the ETH1.x discord.
I don't see any gain at all in doing code merkleization at the same time as converting the trie -- rather the opposite.

08:25 I am curious to hear: for something like code merkleization, for something like the binary tree, for something like account versioning if we end up having to do that as part of this -- assuming they're ready on the same timeline, are you saying that you don't think we should do them as part of the same fork?

Yeah, I also agree with that.

08:57 I mean, I think that switching the trie will not be done in a single block; it will take some processing time. If it were the case that, while we're operating on the data anyway, we could also get another little fix in, then of course we could do it at the same time. But for code merkleization in particular we only have the code hash, so there's no gain there: we need to load the actual code and do the actual processing step anyway, because it's not something we can fix on the fly.

09:47 I think I understand, it just doesn't exactly make sense to me, and we don't have to spend a lot of time on this. I've just been operating under the assumption that we'd need to walk the entire tree and do something to each entry, right?

Yeah, so I'm arguing from the position that -- I guess as of yesterday or something -- we merged this snapshot, the flat database. So that will be working in the short-term future, so I'm assuming that all geth nodes will actually have a flat database. So we won't walk the trie in that sense, we will walk the flat database.

Yeah, and I guess I meant just in general: if any of the things that we do need to enumerate all the leaves, all of the values in the database, then it makes sense to put things together. But we don't have to make a call today; I'm not pushing for this, it's more just something that I thought was a logical choice.

10:51 But I mean, I'm talking about the separation of these: even if they are ready at the same time, I would still suggest doing them separately. And this is from the point of view -- my default position is basically imposter syndrome -- that until we actually have done something,

11:12 I always presume that we're going to fail. So yes, that's right. Is Coinbase on this call? Doesn't matter, but that's my default position, so that's why I want to simplify as much as possible whatever we do in the first step, and then whatever we do in the second step, because the complexity will be big enough to carry this out even in very small pieces.

11:47 Yeah, it's going to be very complex. I mean, both changes are very complex; you want to be able to take them one step at a time, it's too risky otherwise.

Cool, I'm on board with this. I have no problems with us doing things granularly, and unless anybody wants to argue this point, I think we can all take that as a philosophy, and as soon as we have our first thing ready, we'll be ready.

From the hard fork coordination side: having forks closer together totally works as long as we're planning them concurrently. Saying "in a certain amount of time we're going to have forks that are closer together" works, but "suddenly we're going to have a fork sooner" doesn't work as well.
Yeah, I don't think anybody's proposing that, and frankly, since we have representation in this group from all of the client teams... hello Parity... it should be easier to coordinate on readiness and on implementations being ready in all of the clients. I see that as being an easier process within this group since we have representation.

Actually Parity is also represented, by the way.

Oh yeah, are you guys here, or is somebody here? Wei Tang! Okay, thank you, that's sweet. Very glad to have you all here. I wasn't intending anything other than general comedic value.

All right, so does anybody have any questions about the summit, or anything that you'd like me or somebody else to clarify, before we move into a quick high-level walkthrough of the various topics to get some updates from the people working on them? Then we'll drop into some deeper discussions.

14:00 Okay, so I was hoping that we could go kind of round-robin -- not through everybody, but for anybody who's working directly on one of these pieces -- if anybody has updates they want to share. This doesn't have to be things that are necessarily on the critical path; it can be a bit broader, but we're going for shorter, quicker updates on what's going on in specific topics.

14:29 I'll start off with sync, just with a quick update there. I made a post to the forums -- I can dig up a link later, or somebody can do it now -- for what I'm starting to call merry-go-round sync. The idea is kind of an evolution of a couple of things that have been talked about. Basically, the current sync mechanisms tend to be call-based: you, as a client who wants to sync, reach out and get the data that you need or want. The idea here is to invert that: you join the group of people that are syncing, and the data that's being synced from the tree flows to you from all of those peers. Exactly what the mechanisms are to make this work are up for debate, but the idea is that there's some sort of rotating hotspot through the tree that everybody is syncing from, which eventually covers the entire state tree. You jump on the merry-go-round for one revolution of the state sync, and by the end of it you have all of the state and you are done; you can hop off if you want, or you can stay on and feed everybody else data.

The goals here are, one, to remove support for dynamic state access from the network primitives -- so remove the ability for clients to request specific pieces of state on demand. The goal there is to remove the ability to build clients that expose stateless functionality that the network can't actually support overall. Things like eth_call, fetching that data dynamically, are not something the network can support, and so by not allowing on-demand dynamic retrieval of state we can dampen the ability for that kind of stuff to have an effect on the network. The other major benefit is BitTorrent-style swarming behavior: if everybody is syncing the same part of the tree, then people who are partially synced can serve other people who are partially synced.
18:03 I think that's the main gist there. That discussion is starting; I've tried to write down the ideas, and you can see the general ideas in the merry-go-round sync post on the ethresearch forums.

Piper, it sounds a lot like warp sync to me.

18:23 I think one of the main things that came out of the sync discussion was that everybody's ideas about what sync should look like have a lot of overlap -- most of us have thought about this topic a decent amount, and there's a lot of overlap between our different designs, so...

I would like to explain what, in my view, is the difference between warp sync and this merry-go-round sync. In my head, first of all, with warp sync, although it chunks up the entire state, it's still up to you as the requester to start it: you initiate the process. You essentially say, "okay, from now on I'm going to retrieve the entire state," and you will receive the chunks, maybe in a certain order, but it's up to you when to start. So you dictate when the sync is going to start and when you're going to sync certain data, and the server has to accommodate you; they cannot choose.

19:22 Essentially it's you, the leecher, who's actually choosing what data will be sent. We reverse this in merry-go-round: it's actually the seeder who chooses when to seed the data, and the leechers basically have to adjust. They do not choose when they get certain data, but the seeders can pre-calculate the syncing schedule, because they have the data, let's say. And what the leechers can do is join, passively receive the data, feed it to other people, and once they've done enough of that they will have a synced state. Because they don't actually control how the state is synced, it's much harder for them to overwhelm the seeders, because the seeders will essentially just be seeding whatever they know. The schedule will be known well in advance -- what data has to be seeded at what time -- which could enable much more efficient caching strategies, and stuff like this completely separates this behavior from your usual processing. So essentially, participating in this network as a seeder will be much, much cheaper in terms of resources than anything like warp sync or fast sync, and that's why I would expect a lot of seeders to actually stay rather than just go away, because it would be very easy for them to do so.

20:52 Another point I wanted to make: when Piper was saying that we kind of want to remove the ability to dynamically request pieces of state, what I was actually proposing is that this ability should exist, but it should exist through the right protocol, like LES, because that is what it is designed for -- being able to request specific pieces of state on demand. My suspicion now is that one of the reasons it hasn't been used a lot is that if there is a way to do it for free using getNodeData, why would you do it for money? So in our model we offer this syncing network, which is going to be the merry-go-round, where you can sync for free but you potentially have to wait for some time to get your pieces of state; or you can get the data that you want immediately through LES, but you have to pay for it eventually. That's basically your trade-off.
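A minimal sketch of the seeder-driven scheduling idea described above. Everything here -- the constants, names, and the schedule function itself -- is hypothetical, since the actual design was still an open research question at the time of this call; the point is only that both seeders and leechers can derive the currently "hot" slice of the key space locally, with no coordination.

```python
# Hypothetical sketch of a merry-go-round sync schedule.

REVOLUTION_BLOCKS = 480   # assumed length of one full revolution (~2 hours of blocks)
KEY_SPACE = 2 ** 256      # accounts and storage are keyed by 256-bit hashes

def active_slice(block_number: int) -> tuple:
    """Return the [start, end) range of hashed keys being streamed right now.

    Seeders and leechers both compute this from the block number alone, so
    nobody requests anything: seeders stream the slice, leechers collect it,
    and after one full revolution a leecher has seen every key.
    """
    phase = block_number % REVOLUTION_BLOCKS
    slice_width = KEY_SPACE // REVOLUTION_BLOCKS
    start = phase * slice_width
    end = KEY_SPACE if phase == REVOLUTION_BLOCKS - 1 else start + slice_width
    return start, end

# A seeder would stream the leaves in active_slice(head.number) together with
# proofs (and, in the model sketched above, block witnesses that keep already
# downloaded slices fresh); a leecher just verifies and stores what arrives.
```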
Yeah, my statement about removing the ability to fetch it on demand was really purely focused on whatever the network primitives are that we use for synchronization.

On the merry-go-round sync, in comparison to warp sync, does it not have the same downside of having to get all the data before you can check it's correct against the root, or are partial proofs provided along with the chunks on the merry-go-round?

So yes, this is in the details, of course, but the model that I'm currently trying to develop is that all the chunks contain proofs. And not only that -- again, this is my current model -- the chunks are also combined with the block witnesses, and the reason for that is that, as you sync, in order to prevent your database from going stale, when you combine them with block witnesses you kind of patch up the old data. So by the time you've done the entire round, you know that your data is totally fresh and totally consistent with the current state.

Yeah, that is an important point to call out: witness availability is a key part of the merry-go-round sync idea -- maybe not absolutely critical, but it's probably somewhat critical to being able to have a fully up-to-date picture of the state. So, cool. I'm going to open the floor up now to any of the other areas that anybody wants to provide a brief update on, and we can have some ad hoc discussions. I'm going to try not to get into anything too deep or long right now, because I'd like to get through the topics and then dive into any of them that people really want to discuss.

Just a quick question: I remember Felix mentioning getNodeData had to be removed. Was that somewhere in this path?

Yes, the general gist is that we agree on a new syncing mechanism, we get that in place, and then we get rid of getNodeData.

Okay, thanks.

With the merry-go-round, would you join and have to wait for the beginning, or do you just join and then start from where you are?

I think it's basically: if it takes an hour to go around the merry-go-round, then you get on at any point and you get back off when it brings you back to where you started. I'm using an arbitrary amount of time, but however long it is, it takes that long.

There was a proposal by Vitalik that has multiple horses on the merry-go-round, concurrently, interspaced. So if you join all of them you can stay on for just one nth of a rotation.

Ah, that's an interesting idea. I guess the idea there is -- there is a complexity, which is that not all clients can ingest or receive the data at the same rate, and so allowing people to parallelize their participation on the merry-go-round could allow...

Yeah, actually, I think that when we started to talk about this -- if it works, assuming that this is going to work, because at the moment my ideas are still flawed and I'm still figuring out how to un-flaw them -- but if this works, then I also think it's actually quite an interesting thing for the sharding design. One of the issues there is that when you rotate the validators and the validators get assigned to a new shard, they actually have to sync the state of that shard, and it could be interesting if each shard had its own merry-go-round where people come and can suck up the state from that shard quite efficiently.
I've just had a thought for a while.

26:18 I don't know if it fits here, but we have started implementing this snap protocol, which is a new sync protocol. Do you want a short TL;DR of what it looks like in geth?

26:42 Peter started implementing it at the protocol level, so we're adding a new snap protocol alongside what exists already, and eventually hopefully rolling it out on eth-something, but we want to be able to experiment with it first. It's based on these in-memory layers, where we keep at least 128 layers of changes in the format of the flat leaves. So basically you can ask another node, "hey, please give me ten thousand leaves ranging from this address, or from hash zero, to whatever," and the node will then iterate through everything that's in memory and on the persistent disk layer and serve those ten thousand items.

27:35 And at the beginning and at the end we can also provide the trie proofs so the range can be validated. So it's kind of like warp sync, but instead of the whole thing it's split up. We can only do that for the 100-something most recent blocks; if it's older than that we start pruning the trie and we won't be able to provide the proofs. So I don't think geth could be a seeder for the merry-go-round, just like we wouldn't be able to produce a full warp-sync-style snapshot, because we need to be able to move on.

I think the merry-go-round concept is not defined enough, let's say. I've not imagined it working from a single snapshot of the state over that amount of time; I've imagined that the witnesses are how you keep the data up to date, and that you're just always working off of a slightly old version of it, probably still within that 128-block range.

So just to conclude: the idea is that you get 10,000 from this guy, 10,000 from that guy -- or actually 10,000 for this state root, 10,000 for that state root -- and in the end you put together a trie, or a database, made up from maybe a hundred different state roots. It is a contiguous range of leaves, and each separate slice of the trie is individually matched against a state root, but you'll obviously not have a complete final state root, so in the end you need to polish it off with a bit of good old getNodeData plus fast sync on top, and then you'll have the complete new state. So that runs a little bit counter to the idea of getting rid of getNodeData, but we haven't really been able to figure out a good way to get rid of getNodeData in this context.

I think what you're doing is actually great, because the thing I was talking about at the summit -- my suggestion about how we go about this syncing protocol -- is to do some sort of hackathon, not in terms of getting together in

30:12 one place, but an internet hackathon on sync protocols. Everybody does whatever they can to improve stuff, we keep comparing notes, and eventually we'll, you know, come up with something that works, and then just keep comparing the actual implementations and designs.

30:34 We don't have to agree on what the final thing should look like right now. I think it's better to just do it in this manner of people trying to improve things, because when you do that, you figure out all the little details that matter, and you can't do that without implementation.

30:51 Yeah, exactly. I just wanted to give you a bit of an update. Thank you, Martin.
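A rough sketch of the kind of range query Martin describes above. The message names and fields here are invented for illustration and are not the actual wire format: the requester asks for a contiguous run of flat (hash-ordered) leaves starting at some hash, and the response carries boundary proofs so the slice can be checked against a recent state root.

```python
# Illustrative only -- not the real snap/eth wire format.

from dataclasses import dataclass

@dataclass
class GetAccountRange:
    root: bytes          # recent state root (must be within the ~128-block window)
    origin: bytes        # first account hash to return, e.g. b"\x00" * 32
    max_accounts: int    # e.g. 10_000

@dataclass
class AccountRange:
    accounts: list       # [(account_hash, rlp_encoded_account), ...] in hash order
    edge_proof: list     # trie nodes proving the first and last returned hashes

def basic_sanity_check(req: GetAccountRange, resp: AccountRange) -> bool:
    """Cheap local checks a requester can do before the real proof verification
    (which would rebuild the boundary sub-trie from edge_proof against root)."""
    hashes = [h for h, _ in resp.accounts]
    return (
        len(hashes) <= req.max_accounts
        and hashes == sorted(hashes)
        and (not hashes or hashes[0] >= req.origin)
    )
```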
Let's see, does anybody else want to give an update? So Paul, I saw you posted on the forums a couple of things about stateless mining. Do you want to give just a brief update on any of your work there?

31:14 Stateless mining isn't critical, but it's interesting for many reasons, and people are going to do it anyway. They can statelessly mine, no matter what, you know, 21,000-gas balance transfers. But can we do more? It's a very interesting question which touches on the limits of computability and decidability. I made a list of stateless mining strategies, and one interesting thing is that once we have statelessness, certain transactions will be mineable by stateless miners and certain transactions won't be, and there might be different costs associated. So it's something interesting. I don't want to go through the proposals here.

I had another post on ethresearch, on strategies for the block witness. I think the biggest topic of the whole stateless effort is witness size. This determines the throughput of the system, and the big question everyone is asking, I guess, is: will we have the same throughput, or will we be stuck at, you know, 100-kilobyte or 200-kilobyte blocks because miners are too afraid of getting uncled, or whatever the case may be?

32:31 Something's going to happen: we're going to roll the block size up to some point and then we're going to be stuck for some reason or another -- maybe technical, maybe economic, whatever. So there is -- I think there should be -- a big effort to reduce block witness size. We've talked about hexary to binary; there are also some other things, and I just made a survey of existing ideas and proposals, and I'm curious if there are more. This is, I think, one of the most important topics in this whole effort because it deals with throughput. That's it.

33:07 To add to that, the current two leading ways we are looking at for reducing block witnesses are the binary tree transition and the merkleization of contract code. We have good numbers on the binary tree; we have promising preliminary numbers on code merkleization, but we need to do a bit more quantitative research across a larger sample set, using the merkleization method that

33:41 we actually intend to use, to validate that yes, we will see the size reductions we expect. But like Paul said and posted, there are some other ways that we can get even bigger gains; they require deeper research, and so we're focusing on these two objective ones that we really know about. The door is very open for finding better ways to further reduce that size, because Paul is absolutely correct that the overall witness size is going to be a key factor in making this viable.

34:28 Is there anything else anybody wants to chime in with updates on?

Basically, I want to give a couple of updates for Igor -- I think he just posted in the chat, and I want to expand on what he said -- and add something for myself. It's about the witness specification. The witness specification itself hasn't actually changed on GitHub, but what we have done is essentially try to make our implementation fit the specification more, because it happened that the specification has leap-frogged the implementation a bit, so we're trying to bring it in line.
Igor has done some work on this: essentially what he's done is extend the trie to include a code node, which can have the bytecode hanging off it, and that matches a bit more what the specification is saying. At the moment in the implementation, because of what we inherited, the code of the smart contracts is stored separately from the trie, in a separate sort of hash map where a code hash is mapped to the contract code. So he essentially reattached it to the tree, and the tree now is basically the same as in the spec and as in the witness.

35:52 Another thing I want to say is that, as we're preparing turbo-geth for release, we are also making sure that the tools for running things like code merkleization analysis will be ready. We're debugging them at the moment, and I think they're pretty much working.

36:11 I just have to do more tests, and we'll give them to you when they're ready for these analyses, and hopefully somebody can start this analysis soon. We are obviously hoping to open up these tools as well -- I mean, they're all open source, but by "opening up" I mean writing some documentation and letting people use them.

36:31 Yeah, that's my update.

36:36 On the topic of the witness spec, there's an effort as well around formalization. The problem is, if there are ambiguities in the spec it would be good to know, and it's not obvious at first glance, for me at least, because there are guards and it's a little bit complicated. So I propose that we use some standard textbook theorem proofs as a baseline. One thing is unique readability of the syntax of the witnesses, so there can't be ambiguity: you can't read one witness in two ways, and you have to be able to read each witness in at least one way. Another thing is the execution -- what we call execution semantics -- and it would be good to have something called a soundness property. I have some textbooks with standard definitions we can build on; I think we will have to adjust everything a little bit, but I think this should be formalized. The other formalization effort -- I think, Alexey, you said you wanted to prove whether some functions are pure, you wanted to prove that in Z3?

I don't know exactly what I'm going to be proving, but I wanted to make the semantics executable and then, using Z3, actually see what we can prove. I don't have a lot of intuition about what we can prove, because I'm not experienced in this topic, but I just wanted to gradually get into that, yeah.

I think it's going to be a lot of fun.

Yes, of course, yeah, as always :)

Anybody want to talk more about witnesses, or any other updates?

Just a quick update: I made a writeup of the conversation I had with people about the hexary-to-binary tree transition. I created a post on ethresearch and I also started a quick prototype. So if you have time for feedback, I'm always interested to hear it. I'm copy-pasting the link to it into the chat.

38:56 I have one question, actually.

Go for it.
Okay, so I don't know if there's any way to exploit this, really, but with the current spec, is it possible that after you, you know, cross the boundary from the account trie to the storage trie through the account node, you can also have infinite nesting there? So you could make account nodes within account nodes after you go through branch nodes and such. Would it be sort of correct in terms of the semantics?

Yes, it would be correct. At the moment, basically the way to fix it would be to start distinguishing between, let's say, account branch nodes and storage branch nodes, and disallow accounts hanging off storage branch nodes, and so forth.

39:18 However, I don't know if such a restriction is going to be very useful. To me it's semantics; it doesn't serve the purpose of being a security mechanism. It serves the purpose of informing the implementers, in an unambiguous way, about how this thing should work.

39:52 No doubt. I didn't even know how one would potentially exploit it, or whether it's necessary to add semantics for that, but I just wanted to call that out and make sure I understood it correctly.

Yeah, so you understood it correctly. It is possible to do stuff like that, right? Because it's semantics, it also doesn't tell you how deep the tree could be -- it could be, like, a thousand levels deep. It doesn't have any restrictions on that.

I would highly recommend opening issues in the specs repo for things like this, just so that they're tracked somewhere and public discussion can take place under them, to give us a spot to decide whether or not it's a problem and actually discuss publicly how we might solve that kind of stuff. And I think one of the things Igor is planning to do is to open up at least one issue, maybe multiple issues, to document the current unresolved discussions about the spec. There's a lot of stuff, things like Paul mentioned earlier -- all of those are great things for us to make sure we get issues to track those concepts and ideas.

41:07 All right. Any other updates, comments, or questions about any of these topics so far? Piper, do we have plenty of time?

I would say that we have 40 minutes left of the 90, and, you know, once you go over 90 minutes everybody kind of dies for the last 30.

I want to do another 2 minutes, but if we're tight on time I won't do it.

No, go for it. I don't think we are tight on time today.

Right. The one interesting thing is testing and tooling. I had a new idea and I spoke with some people about it already, but not everyone; I'm just looking for feedback, I haven't posted anything publicly yet. It's about test case generation for witnesses and for witness size estimation. The problem is we test things on historical blocks, and in historical blocks there are only, whatever, ninety million or a hundred million accounts, so there's a limited depth to the tree, and also statelessness will have different behaviors. So I want to propose a test case generator and witness modeler.

42:18 We would validate this against the old blocks. It would be a parameterized model of how a witness looks, at a statistical level: we would run it and accumulate statistics
on how it generates witnesses, and we hope to mimic the statistics of historic Eth1 blocks. Once we do this, we want to also predict how stateless blocks would look, because under statelessness we might behave differently. Merkle paths, you know -- if someone's sending two transactions they'll send them together so that they're in the same block, so that the Merkle paths have as much overlap as possible. So I think this is interesting, and also for testing: we have to test the algorithm, and the big idea for the test case generator is to have a fixed random number generator. Everything is based on a fixed seed, because, for example, we're going to have a billion accounts and we can't really transfer that between computers, but we can transfer the algorithm that generates the random numbers. So I'm thinking a Mersenne Twister with, you know, fixed seeds, and then it's unbounded, so we can just run this algorithm until, you know, the sun supernovas,

43:29 and generate more and more tests. And also for comparing: if we want to change the witness spec in certain ways, we can say, okay, based on this model we'll have some 1% or 2% size savings -- is it worth it? So I want to add this to the tooling section: testing the witnesses.

43:48 Actually, you know, Paul, I think -- I might not have phrased it the same way, but I have probably added it to the tooling section. As we speak I just created the pull request, because I realized I hadn't done it yet. It's incomplete; it's in the spec repo, pull request number six. I'm just going to post it, and I'm going to merge it soon so that we can have more contributions to it. Essentially, if you look at it, there's a section number three in the tooling roadmap. (I can't even post anything -- sorry, for some reason I can't type anything in this thing.) So anyway, it's in the spec repo, and there's a tooling roadmap, and it's got number three, which I called "emulators of large state, re-orgs and various contract activity". There I sort of hinted at these things, but of course your description is much more informative, so maybe we could expand that, because I think this is really good. In the end I added, for example, that these would be able to cause very large witnesses to be generated, or cause deep re-orgs, because the relationship between the witnesses and the re-orgs also needs to be studied. I think it might be very easy to resolve, but it's still an interesting question.

45:30 The thing that I would encourage in this direction is for us to look at how we can bind these things into a continuous integration setup in the specs repo. Right now we've got the witness spec; ideally we can build a kind of minimal implementation of it that lets us test that implementation against these kinds of test vectors in CI, which would even allow us to run this sort of generative random test case generation for even just a time-boxed amount of time in a CI environment. So, making sure that we're using those tools ourselves, and then that lets us also use them more broadly -- you know, if somebody makes a nice tool for test vector witness generation, it would be great to have something like that that's well maintained and that all of us can use in our various individual implementations.
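A minimal sketch of the seed-driven test generation Paul describes above. The account fields and distributions here are made up; the point is only that, with a fixed seed, two machines can regenerate the same enormous test state by sharing the algorithm and the seed rather than the data itself.

```python
# Hypothetical sketch of a seed-driven test state generator.

import random
from hashlib import sha256

def generate_accounts(seed: int, count: int):
    """Yield (address_hash, balance, nonce, has_code) tuples deterministically."""
    rng = random.Random(seed)  # Python's Random is a Mersenne Twister
    for _ in range(count):
        address_hash = sha256(rng.getrandbits(256).to_bytes(32, "big")).digest()
        balance = rng.randrange(0, 10**21)   # made-up distribution
        nonce = rng.randrange(0, 1_000)
        has_code = rng.random() < 0.1        # ~10% contracts, made up
        yield address_hash, balance, nonce, has_code

# Two machines running generate_accounts(seed=42, count=1_000_000_000) produce
# identical states, so witness-size experiments can be reproduced and compared
# without ever shipping the state around.
```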
46:32 So as these tools are thought about and built, it might do us well to think about how we can make sure they are used from the get-go, to make sure they're well maintained and that they work for people. Otherwise we're all going to build our own individual tooling to do these things, and I think that's less ideal.

Piper, I think we should have a spec implementation. Of course a spec implementation in Python is going to be slow, and I think people are going to implement it themselves anyway in their own, you know, native whatever, but I think it would be useful to have a spec implementation within the spec text, or whatever, in the spec repo. But again, I think people are going to hand-write their own just to get that.

Agreed, I fully expect everybody to write their own implementation, but I have historically liked some of the things that Eth2 has done there, having this kind of "this isn't the blessed canonical implementation, it's just a spec implementation that shows, in a very minimal, slow way, how it works."

47:56 Cool. All right, time for more updates, but if nobody has anything I'd like to transition. We've got about 35 minutes left, and my suggestion for topics -- these are topics I'm interested in discussing, but I'd like to open the floor to understand what other people want to discuss. I want to talk about getting witnesses rolled out as soon as possible in an experimental way, and I'd like to talk about this idea of new sync protocols and what the way forward is that all of the various client teams are interested in and willing to participate in. Are there other topics that people would like to get into in some depth while we're on this call?

I'd like to talk about scheduling a stateless call, if there's interest, but that can be at the end.

You mean a sync call? I'm trying to think about the right way to ask the room, because when you ask the wrong question, you don't get the answer you want. I have suggested that we have a separate call to talk about sync on a somewhat recurring basis. That call would be focused more towards the client development teams that would be implementing this stuff -- that doesn't mean other people aren't welcome or invited, but it would be much more focused on client-specific stuff, and on having the individual client developers talk to each other so that we can come to an agreement together, hopefully, or maybe even two agreements if we can't agree on one thing, and then proceed with implementation together and discuss how things are going. I'm going to just say: either way, does anybody oppose or support that? Does having an independent call for this sound problematic, or like too much structure, or does that sound like a beneficial thing to do a monthly call on, on this topic of new sync?

50:14 I think it's okay. This is probably what I would call comparing notes on our sort of sync hackathon, as I would call it.

50:37 Calling out individual teams: so Thomas, I saw you were on the call -- is that something you guys would be willing to jump on, to work on sync? Similarly, Martin, I know that it might not be you, but do you think somebody from y'all's team would be willing to get on and chat about that stuff? I don't see this as being a big drawn-out ordeal or anything; it's much more a time for us to be able to coordinate.
At the moment, my plan is to follow up on whatever geth suggests on the flat DB and the new snapshot sync, and separately we're working on beam sync. Any other solution, I think we won't be discussing for now, because it's just not ready -- it's at the research stage -- unless this merry-go-round sync progresses and becomes more detailed. But yeah, I'll be willing to jump on the calls with the geth team and with other implementers to see how it's going. I think Parity will be a bit busy implementing fast sync, unless they want to skip it for the sake of other things for the future, but this may come a bit too slow for them.

Yeah, I mean, I'll be willing to join, but I think it might just not need to be that big, because if the geth team delivers the spec for the snapshot thing, there is not much else we'll be working on, I think, within the next few months.

Okay. Wei Tang, if you want to weigh in; similarly somebody from PegaSys, just to check and see. It's fine to say "this is another meeting, it's too much and we don't need it."

I think, given that things are crystallizing a bit more, it's fine. I think it's a good idea to have a meeting where we can get more into the weeds on just the sync stuff.

52:48 Oh -- we're from PegaSys. I suggest doing one and then going from there, not setting a schedule -- like our networking call on the Eth2 side, which we do as needed, when things come to a head and we have enough to talk about, and that's worked well for that kind of sub-call.

Great. Okay, then that means James will be getting that call scheduled here sometime soon, for us to have an initial kickoff meeting for this idea of a new sync approach.

Great. So the next thing I want to talk about is rolling witnesses out sooner rather than later.

53:35 The idea is that we can start experimenting with them sooner. I absolutely have some personal motivation here: witnesses can accelerate things like beam sync. But I think they also just give us a really interesting view into what witnesses look like now and over time, and let us start experimenting with... well, it gives us a platform to experiment with them in a way that isn't network-critical infrastructure. And that would basically involve us coming to some general loose agreement -- if we can come to a loose agreement -- on some sub-protocol that we can start just broadcasting witnesses over, that is, you know, an optional thing that clients can opt into and join if they want to. That lets us start broadcasting witnesses around, or even doing the fast sync behind beam sync in a much more efficient way, where you don't have to do the patch-up at the end -- everything you sync via fast sync never has to be patched up later, assuming that you can consume witnesses. So I think that's one thing that I'll be looking to roll out soon. I believe Nethermind's beam sync implementation would benefit from it; Trinity would benefit from it. And assuming that rolling them out actually works, and that we can broadcast them around,

55:20 we can also start to get a sense for what the network is able to handle, and how bad it is to have giant witnesses and small witnesses, various sizes, and things like that. Anybody have any comments on that?
Yes, I have a comment on this. I think it's an interesting idea, and the one thing I would like to experiment with, which will help us later on, is... essentially what we want now is an experimental subnetwork within the Ethereum network. There are probably multiple ways of doing it. If you read this roadmap document, I had this point about three networks, and this is an example of what I was thinking: when you're trying to join the Ethereum network, you discover your peers. At the moment there's one single network, but there might be, say, three networks, or two networks. Let's say two networks for now: one network is the usual thing, and the second network is going to be the one where the witnesses are propagated. So somehow every node could decide whether they're going to be on both or just on one. Most nodes in the beginning will only want to be on the single one, which is the usual Ethereum network, but some might decide to jump on both.

And the interesting question here is: how do we discover the second network? I think if we solve that in a really good way, we can extend the solution to three networks. So if we had sync implemented in the third network, then we could actually use that knowledge and experience to roll that out sooner rather than later, because these kinds of networks should also be optional, at least in the beginning. So yeah, that's what I would like to figure out: in terms of discovery, how is this possible?

57:27 I'm pretty sure discovery v5 is the planned -- and, I think, almost ready -- fix for this. Discovery v5 is supposed to have pretty robust topic discovery for finding who is interested in the things you're interested in. So somebody correct me if I'm wrong, but...

Yeah, there is a topic advertisement capability in discovery v5 that is deemed, as is, currently insecure and not ready for production. There's a wave of other people looking at it, and an academic team, who have things they can harden up and get ready for production, but as of today it is not ready for production. Actually, on the Eth2 side we're having to use some workarounds because we can't necessarily do the topic discovery as we intended.

58:36 If discovery DHT stuff and things like that are up your alley, Felix is constantly looking for more people to chime in on this problem.

Okay, that's good to know. Is there any way you can point us at any of the discussions that explain where the problems are?

Yeah, I'll drop some links. There are a couple of issues in devp2p.

In general, Alexey, I think that it's okay for us not to have a robust discovery mechanism for this at the initial onset. We can do really ad-hoc stuff, like shoving things into ENR records, and our clients are starting to build out the infrastructure to be able to query a, you know, historical record of all of the ENRs that we've seen.
Well, it's not just the discovery in the sense of ENRs or discovery v5; there's also a decision on each node's side about how many peers to take from the first network and how many from the second network. If you want to be present in both, you might want to allocate "out of 25 peers I want 20 on network number one and 5 on network number two," to make sure you can be on both and do some interlinking there.

Yeah, that's fair. All right, so I'm not sure I had a broader question as part of this, but I was just going to point out that we are going to be looking at rolling out witnesses soon.

1:00:18 I think we're already looking at rolling out just access lists, because they're the dumb way to do it, and we'll probably be iterating pretty quickly on that. However, I think that if we can come to some loose agreement -- assuming there are other people who are interested in using witnesses for anything experimental in their clients -- we can come up with a loose agreement on how we do that and, you know, share a new named protocol or something like that.

I proposed that we talk a little bit about sync... we did that earlier; it came up as we talked about updates. I believe we could get into a deeper discussion here in the 20 minutes we have remaining in the call. However, I would like to open the floor up if anybody thinks there's something better we could be using those last 20 minutes for, because these kinds of discussions can be useful on calls, but we can also do them, slightly less quickly and nimbly, in forum posts and things like that.

1:01:37 I'd like to take the last five minutes talking about just the call schedule and things like that moving forward.

That sounds good. Please help me stay on track there and interrupt us in about 13 minutes if we are still deep into something. Okay, so we will get into this in more detail on the sync call, but I think it's worth everybody doing a little bit of homework before that, so that we can all start from a position of having some base knowledge.

1:02:14 So for those of you that will join the sync call: between now and then I would encourage you to go read the post on merry-go-round sync and ask some questions. It poses one opinionated way of going about merry-go-round sync, but one of the things I want to maybe do is agree on the high-level goals. I think I listed the two that I have specifically. The first was BitTorrent-style swarming behavior, so that a very large number of nodes can sync even off of a very small number of full nodes seeding the data. The second was making sure that you can't do eth_call-style on-demand data retrieval, because that, I believe, is something we need to make sure you can't do effectively on the network -- because otherwise you can build a more functional stateless client that has a better UX than a responsible one... I mean, heck, we're doing it right now in Trinity, and I think Nethermind's doing it right now as well. It's a very compelling feature to have, but it abuses the network in a way the network just overall can't handle, and so I think we need to limit that functionality. And then I think the third one was that it would be really nice if we all had an agreed-upon sync mechanism that we were all quite happy with.
That doesn't mean it's perfect for everybody, and it doesn't mean that everybody has to use it by any means, but it would be really nice if we were able to come to an agreement on a single protocol for syncing that worked well for everybody.

There's actually another requirement I forgot to mention -- not a requirement, but one of the reasons we thought sync is a prerequisite to the binary tree migration -- which was the observation that this could be the way for new nodes to join very quickly without recomputing their state. I expect that there will be some implementations, like turbo-geth, that will be able to maintain the binary and hexary tries simultaneously with pretty much the same database, and they are going to be able to be seeders in that sync network and effectively provide everybody else with the binary state whenever they want it. So essentially that means you would have a choice: either recompute your hexary database into the binary database, or just sync it from that network. That's why I thought it was important.

Yeah, agreed. Allowing clients to essentially leapfrog the complexity of implementing both, and effectively piggyback off of another client that is able to maintain both, and just jump straight ahead into the binary tree rather than having to compute the transition yourself, is a pretty nice feature there.

1:05:48 Some of the unsolved problems in this sync approach are how to effectively chunk the data in a way that allows the merry-go-round to automatically scale. I'll highlight some of this: you could say one revolution of the merry-go-round is two hours, right? And if we set it at two hours, then for, say, a test chain that's very small, all of a sudden you've got this two-hour window where there's almost no traffic going over the network, because it's such a small amount of data that it's, you know, not optimal. Then switch to a massive state size with tons of data, and now two hours is too short to actually transfer that data around. So ideally, whatever mechanism we settle on here needs to automatically handle these different sizes of state, even the imbalance of state in different parts of the tree, because otherwise I think we end up with these sort of magic-number parameters that aren't necessarily ideal and may cause problems in the sync protocol as the state grows, and things like that. This is an area where I don't think we have elegant solutions. I think we have a starting point, but I would love to have people thinking about this problem and how we chunk the state up.

I actually had another insight recently. Ideally, of course, apart from having the entire state, we would like to have zero coordination among the seeders to decide what they're going to stream at each point in time. This, however, might be mathematically impossible if you also want to satisfy the property that it scales with the state size, so we might actually have to add some kind of coordination -- not straight away; in the beginning we can start with a magic number which sort of fits the network -- but what we could do afterwards is coordinate through the miners, as we normally do.
We normally do coordinate through the miners, so we can make it the job of a miner to broadcast the correct number -- some sort of currently useful number, which should somehow be computed from the state. The thing is that this could become an extra coordination point which would solve the problem. I haven't figured out exactly how, but this could potentially be done.

01:09:06 And yeah, that actually brings up the other piece, which is that it should be a zero-coordination thing: the people seeding the state should just be able to know what they should be seeding, and the people receiving the state should just be able to know what they should expect to be receiving.

No, actually, I don't agree with this one. I think it will be very difficult or impossible to get the second property, because if you don't have any state, how are you supposed to know at which time anything will be synced? You have essentially no information, so you cannot predict when things are going to be synced, because you're assuming the seeders will use the state information to compute the schedule.

1:09:55 I think I was talking more specifically about right now, but I agree that it isn't a hard requirement and it may not actually be beneficial at all. So it's really more about the first one: seeders need to be able to know, without coordination, what data they should be seeding, and other seeders who are following the rules should end up seeding the same data.

Yes, and then essentially, if you are a leecher-slash-seeder who doesn't have the entire state, then your job is simply to pass things around. You will of course be able to verify that these things are still correct, but yeah, you're just passing them along, right?

1:10:53 So, I don't think this exactly solves the coordination issue, but the issue you were talking about, Piper -- you know, how do you know how long to make the epoch so that it doesn't end up too short or too long -- potentially we could look into having subnetworks for different parts of the state trie, so that you actually move yourself between different swarms. So there are multiple -- everyone's on their own merry-go-round, kind of.

You mean like some of them will basically start failing and nobody will join them anymore?

Yes, that's the coordination issue.

I think that line of thinking is still very much worth us looking at, because I really like the idea of, you know, different-speed horses on the merry-go-round, right?

Yes, this approach is very interesting. I remember Vitalik was proposing it on Telegram. Of course that would require having multiple subnetworks, and that goes back to my question about how you distribute your peers between the networks. The second thing is that if this approach sort of works for a while, we can then solidify it by again introducing miner coordination, because for a while we could have some sort of centralized beacon which tells everybody, "okay, here's the good network to sync on," and then, out of all the possible networks, these are the good ones everybody's using.
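A back-of-the-envelope illustration of the "magic number" problem raised above: a fixed revolution time only suits one particular state size. All the figures here are made up.

```python
# Toy arithmetic for the fixed-revolution-time problem.

def required_throughput(state_bytes: float, revolution_seconds: float) -> float:
    """Bytes per second each leecher must sustain to finish in one revolution."""
    return state_bytes / revolution_seconds

TWO_HOURS = 2 * 60 * 60

# Tiny test chain: ~10 MB of state -> ~1.4 KB/s, the network sits mostly idle.
print(required_throughput(10e6, TWO_HOURS))

# Large mainnet-scale state: ~50 GB -> ~7 MB/s sustained, likely too fast for
# many peers, so either the revolution stretches or peers fall off the ride.
print(required_throughput(50e9, TWO_HOURS))

# One direction floated on the call: derive the schedule parameter from the
# state itself (for example, coordinated via miners) instead of hard-coding it,
# so the revolution length scales with state size and peer capacity.
```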
1:12:55 Well, this made me think of one other thing which is related to the witness spec, and I really need to go do my homework to understand the witness spec better, because this is an area where I keep finding myself thinking about it and not able to answer these questions because I don't know it well. There's one property that I think would be really powerful for us to figure out how to add into the witness spec, for the way that we hash witnesses or whatever it is. You can imagine having a very large witness, something that's too big to transmit, and needing to chunk it up. One of the things that I think would be really powerful is the ability to prove that a chunk of that witness is part of the overall witness. I don't know what the specific mechanism would be, but essentially if we have this big witness and we've got an identifier to reference it, can we figure out a way to break it into pieces such that each piece is provably part of the big one?

Oh yeah, definitely we can do that. I mean, it's not currently described exactly how to do this in the spec, but I think it will come in a supplementary document. Initially the idea is…

And this is even if you've got things out of order? Because I know that you can do it if you transmit things in sort of like...

You will still have to maintain some partial order. If you imagine the witness as a triangle, right, with the state root at the top and getting wider towards the bottom, then splitting it into pieces would be like taking the little triangle from the top; this triangle has all the sub-roots on its bottom edge, and from each sub-root you have another little triangle, and so forth. We started to call them tiles or something like this. The idea is that you split the witness into these tiles, and each tile is verifiable as long as the root of the tile has already been verified. So you don't have to stream it in one specific order, but you have to ensure that there's a partial order: if a tile is streamed, its predecessor has to have been streamed already, but otherwise, within one level, tiles can be streamed in any order.

1:15:18 What I'm actually getting at is that whatever mechanism we use to hash it, to say "this is the hash of the overall witness," ideally you could provide any part of that and prove that it is a piece of it.

You mean the hash in the header? I'm still not sold on this idea, to be honest, because it means rehashing data that's already hashed; that's the aspect I don't like.

Sure, and I think that is part of a broader conversation about compulsory witnesses and the idea of a canonical witness and whether we need it.
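Here is a minimal sketch of the partial-order rule described above, assuming a simplified tile structure of my own rather than anything from the actual witness spec: a tile is only accepted once the root it hangs from has been verified, and accepting it exposes its sub-roots as anchors for tiles further down.

```python
# Hypothetical sketch of out-of-order tile verification under a partial order.
# Tile is a stand-in for however the spec ends up serializing a chunk of
# witness; the integrity check is a placeholder for recomputing the tile's
# internal hashes up to its claimed root.

import hashlib
from dataclasses import dataclass
from typing import List

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

@dataclass
class Tile:
    root: bytes             # the already-known hash this tile hangs from
    body: bytes             # serialized nodes of the little triangle
    sub_roots: List[bytes]  # hashes along the tile's bottom edge

class TileVerifier:
    def __init__(self, state_root: bytes):
        # Only the state root is trusted at the start.
        self.verified = {state_root}

    def accept(self, tile: Tile) -> bool:
        # Partial order: the parent root must already be verified.
        if tile.root not in self.verified:
            return False
        # Placeholder check; a real implementation would re-hash the tile's
        # nodes and confirm they reproduce tile.root.
        if h(tile.body) != tile.root:
            return False
        # The bottom edge becomes the anchor set for deeper tiles, so
        # siblings at the same level can arrive in any order.
        self.verified.update(tile.sub_roots)
        return True
```

Under this kind of scheme, the chain of tiles from the state root down to a given chunk would also serve as the proof that the chunk belongs to the overall witness, which seems to be roughly the property being asked for above.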
All right, James, I yield the floor to you.

Right. None of this is meant to be prescriptive, first of all. There are a lot of different things to work on, and I think it would be useful to start slowly corralling the groups that are interested in working on specific parts; the biggest one is the sync work we've talked about, and possibly having a call around that, so having kind of loosely formed working groups around these different topics. I'm gonna send out a questionnaire that points at the roadmap and asks "what one, two, three things are you interested in participating in?" just to get a map of the general interest of the group. It isn't to say who is gonna work on what or not; it's just to get a heat map of interest for everyone to see. Then, for scheduling, given that most of us are now homebound for some unknown amount of time, it made sense to at least propose that we have calls sooner, like every two weeks, with an on week of stateless ethereum calls and an off week of side calls scheduled as needed, so we have something a little more consistent.

Well, I actually... you probably make the assumption that when we're homebound we have more time, which is the reverse for me. I have less time now because I'm homebound, because my children are also at home. I guess we're not out at events… I'm not really going to events anyway, mostly. So basically what I'm saying is that I personally don't want to increase the number of calls, but people may have a different idea.

So do you mean every two weeks would be too frequent, just so I understand? Probably, yes. I think I'm gonna say that would be too frequent. The other calls on more specific topics can meet as needed, I think, and those should not in any way feel compulsory for a broad group to attend.

All right, I just wanted to bring it up as a possibility. So then we'll plan on having another meeting in the next three or four weeks, and I want to make sure that the weeks when we have the other calls aren't on the same week as this call; that's pretty much the only thing I'm making sure doesn't happen. And keeping to Tuesday, generally.

Um, yeah, I'm supportive of getting our next one scheduled right away for something like four weeks out from now, and then, like James said, figuring out what everybody's working on is I think one of the most important things to do right now. I think there are groups who are just figuring out what they're working on, but it's important for us to identify where people's focuses are so that we -- as in me and some of the others who are trying to do the coordination side of this -- can find where the gaps are. Right now I don't think we have anybody directly working on the gas accounting for witnesses, and I think there are other areas of our roadmap that nobody is directly working on. I know the side quests that, while not on the critical path, are often quite valuable things for us to do, and I think there are many of those that don't have teams working on them at all. So there is a post on the Eth research forum where I ask everybody to chime in and give a very brief "this is what I'm focused on." I would appreciate it if we could get a broader picture of where people's work focuses, so that we can have this holistic high-level view and start both seeking out people to work on the areas where the gaps are and making it publicly visible where the gaps are: if you or your team are really interested in a subject, that's a high value target. James, is that sort of in line with what you were…? Yep, yeah, yeah.
All right, does anybody want to bring up anything else or comment? We've got three minutes and then I'm gonna cut us off.

I want to comment that something I'm gonna be helping drive, and that other people have already approached me about, is researching, speccing and prototyping ETH1-ETH2 merge stuff, so if you are interested in that, let James know in his survey or hit me up. I'm gonna try to get together some more notes on that soon. I'll add that as an option on the thing. Thank you.

All right, anybody else, anything you want to address to the group? If there are other things that you'd want added to the list of things to go out, just PM me.

I have a quick question, actually: are there any write-ups or literature on estimating the network overhead of stateless proofs in ETH1.x? If so, could somebody link me to it, if it exists?

Yes, there's been research into what they look like today and how they change with binary trees, which I think the turbo-geth team has decent data and charts on. Sina produced some information about how code merkelization changes those proof sizes, and lastly Brian (lithper) on my team has done some experimentation into block propagation times, trying to find ways to estimate what our upper bounds are on block size before we start seeing uncle rate changes. Everything I just said there is roughly ordered from the things we know the most about to the things we know the least about: Brian's work was speculative, heuristic-based and indirect; the code merkelization work was on a small sample size; but I think the binary tree data is broadly and objectively well understood.

Well, I'd be interested in taking a look at the block propagation work you just mentioned, if you could send me a link. If you reach out to me on either Telegram or Discord or whatever, I will connect you directly with Brian. Cool, great, thanks.

All right everybody, my clock says 11:30, I don't know what time it is for you, but I'm gonna call this meeting done. Thank you all for your time. See you all in a month!