RIPE 88.
Main hall.
24 May 2024 ‑ 9.30am.
Plenary.


CLARA WADE: Good morning everyone and welcome to the final day of the RIPE meeting.

I hope you enjoyed the extra 30 minutes of sleep today.

(APPLAUSE.)

I know I did.

So today we have one plenary session as well as three lightning talks, then we are going to go over the GM results and then we'll have the final plenary session.

So first up we have Khwaja from the Max Planck Institute.



KHWAJA ZILBAIR SEDIQI: Good morning. As Clara said, it's the last day. I am a student at the Max Planck Institute, and last year we started a project together with our wonderful colleagues Romain and Amreesh, where we tried to identify what is taking time in the RPKI synchronisation process and whether there's any room for optimisation. We started it as an Internet Society project and have continued it to near completion by now. So today I am going to present some of the results. But let's begin with the very basics, or a refresher for the morning: what is actually RPKI? Usually on the internet, an autonomous system, it could be AS X, just a random number, announces a prefix. But there could be some malicious attempt or some incident: AS Y might also announce the same prefix, even though it is not the actual owner of the prefix. This causes an issue for the legitimate owner, and people thought: let's fix it, let's set up a system that helps everyone know who owns the prefix.

That's why they came up with the idea of the RPKI, where every network operator or every AS can put their prefix information, and also to which AS it belongs, sign it and publish it as a ROA object, and the rest of the networks on the internet can use this information in their BGP filters to filter out the routes that are invalid, that is, the prefixes that are not announced by the legitimate owner.
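(To make that filtering step concrete, here is a minimal sketch of RFC 6811-style route origin validation as a relying party's output would be applied; the VRPs below are invented examples on documentation prefixes, not real RPKI data.)

```python
from ipaddress import ip_network

# Validated ROA Payloads: (prefix, max_length, origin_asn).
# Invented examples using documentation prefixes, not real RPKI data.
vrps = [
    (ip_network("192.0.2.0/24"), 24, 64500),
    (ip_network("198.51.100.0/22"), 24, 64501),
]

def origin_validation(prefix, origin_asn):
    """Classify a BGP route per RFC 6811: valid / invalid / not-found."""
    prefix = ip_network(prefix)
    covered = False
    for vrp_prefix, max_len, vrp_asn in vrps:
        if prefix.version != vrp_prefix.version:
            continue
        if prefix.subnet_of(vrp_prefix):
            covered = True  # at least one VRP covers this prefix
            if vrp_asn == origin_asn and prefix.prefixlen <= max_len:
                return "valid"
    # Covered by some VRP but never matched -> the announcement is invalid.
    return "invalid" if covered else "not-found"

print(origin_validation("192.0.2.0/24", 64500))    # valid
print(origin_validation("192.0.2.0/24", 64666))    # invalid: wrong origin AS
print(origin_validation("203.0.113.0/24", 64500))  # not-found: no covering VRP
```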

So today we have half of the routing table, more than 550,000 routes roughly, as of last Friday's numbers, covered by a ROA object, or protected by ROAs, and out of them there are 542K unique prefixes.

If you look at the distribution of these objects across the different RIRs, you can see that the red one, which is RIPE, has the largest contribution, nearly 50%, for elements such as IPv4 and IPv6 VRPs and the AS numbers. However, in the first part, the green part here, you can see that ARIN has the largest number of ROAs compared to any other RIR; we'll come back to this point and discuss why that happens. APNIC is almost second for many of the elements in the RPKI, and AfriNIC has the smallest contribution so far.

But what we are interested in in this talk today is the time factor, or the delay factor.

Okay, it might be a bit complicated, but simply: network operators can create their ROA information using one of these hosted repositories, where the RIRs provide the services, and publish their resource information into those repositories; we call this hosted mode. It's also possible that some organisations say: we want to have our own repository and our own set-up. We refer to that as delegated mode, where one of the TAs approves it and provides this facility, and these delegated CAs can either publish to their own repository or publish using one of these publication services.

At RIPE 86 there was an interesting talk on a paper called 'RPKI Time of Flight' by Romain, where he discussed the entire ecosystem, emphasising the time between the creation of a ROA and the moment it appears in the control plane of the router. That paper mentioned that the time it takes for the relying party to fetch the data from these repositories, compute the validation cycle and produce a set of VRPs is one of the key factors, highlighted here in red, and in this study we tried to understand this part, the synchronisation delay in the RPKI. I am going to talk in more detail about what is happening in the cycle where the relying party fetches the data from the repositories and produces a set of VRPs, what the possible delays could be, and how we can optimise them. Obviously the last stage is where, using one of the protocols, for example the RPKI-to-router (RTR) protocol, the information gained from the validation process is given to routers to be applied in BGP filters, and the routes are filtered based on that information.

So our focus is once more on this part, which we call the synchronisation time: the time we need to fetch the data from the repositories.

Why is it important at all, why do we care about it? If RPKI and BGP go out of sync, traffic can be dropped, and it could have a negative impact on services.

That's why it's important. On the other side, RPKI is growing in numbers: more and more people are deploying it and creating ROAs, but with the deployment there are also more and more repositories, the delegated ones, and there are new thoughts about adding new objects such as ASPA. If you are expanding it, you are putting more information and data into it, and fast, reliable operation becomes more important for network operators. That's why it's important to look at the delays that are happening and what we can do about them.

Our study has two parts. In part one, we tried to analyse the relying party synchronisation delay. For this part, we set up a VM located in Japan, and we used more than one relying party software, Routinator and rpki-client, to do the RPKI synchronisation, or the validation process.

We tried to do it in different modes. Online mode, where there's no cache in the relying party: we fetch everything from the network. Offline mode, where we have a cache but we do not involve the network at all: we just validate the cache that we have.

And cached mode, where we look for updates, fetch only the updated files from the network, and try to see how that affects things.

We also do it for each of the TAs, across the five roots, and we do a deeper dive by looking at the certificate chain and the delegated CAs, to see whether those delegated CAs have an impact on the validation process.

We ran our experiment for ten consecutive days, every two hours, and it generated more than 3,000 validation rounds; I am going to present some of the results.
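(As an illustration, one such validation round could be timed roughly as follows; this is a sketch that assumes Routinator is installed and uses its `vrps` subcommand, though the exact setup in the study may have differed.)

```python
import subprocess
import time

def timed_validation_round():
    """Time one synchronisation/validation round and count produced VRPs."""
    start = time.monotonic()
    # "routinator vrps" refreshes the local cache from the repositories,
    # runs validation and prints the resulting VRPs (CSV: one per line).
    result = subprocess.run(
        ["routinator", "vrps", "--format", "csv"],
        capture_output=True, text=True, check=True,
    )
    elapsed = time.monotonic() - start
    vrp_count = result.stdout.count("\n") - 1  # minus the CSV header line
    return elapsed, vrp_count

elapsed, vrps = timed_validation_round()
print(f"round: {elapsed:.1f}s, {vrps} VRPs, {vrps / elapsed:.0f} VRPs/s")
```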

Generally, what does it look like? What are the numbers we are talking about when we talk about RPKI validation time?

So in this plot, what we can see here is the total time it takes for different relying parties in different modes on the bottom. We can clearly see that the online mode takes the highest time and it makes sense because we need to fetch everything from the network and then do the validation cycle and produce VRPs.

That's followed by the cached mode, and the last one is the offline mode, where there's no network involvement.

We also took a short look at how the different relying parties behave, so we tried to check the user time, the time the relying party spends in user space, and also the kernel space, the system time and user time here. Typically both of the relying parties have a similar pattern, with rpki-client being slightly faster in the online mode.

Another interesting point in this plot is the different behaviour of the two softwares. If I just point out this offline mode, you can see Routinator is taking a very short amount of time and rpki-client is taking a high amount of time. What does that mean? This is offline, there's no network. It means Routinator performs faster: if there's no network involvement, it does a quicker validation of ROAs.

However, as soon as the network is involved, in the online mode, you can see the result is about the same or very similar, and rpki-client is even a little bit better. What does this hint at? It means rpki-client handles the network operations, fetching and processing the data, much better than Routinator. I think there's stuff these two softwares can learn from each other's methods, but we also understand both softwares have totally different architectures, are deployed in different manners, and each one has its own strengths and maybe weaknesses. However, our focus is not to compare softwares but rather to look at the bottlenecks in RPKI validation.

Next we tried to check: does it take the same time if I validate the same number of ROAs from each of the TAs, or are there any discrepancies? For example, if I validate the same number of ROAs from RIPE and compare it with ARIN, will they take the same time? That's why in this plot we tried to show how long it takes for each root using both softwares; but since the numbers of objects are different, we tried to normalise it, and we check the number of VRPs they produce per second, the rate each one can produce.

Let's focus on one of these; the message is the same. If we use the numbers here, you can see the red mark, which corresponds to APNIC, and it tells us that it could validate 10,000 VRPs per second, which is roughly three or four times more than the other three: AfriNIC, LACNIC and ARIN.

And it's even higher for the RIPE case, which is shown in blue.

Just to remind you, this is offline mode; it has nothing to do with the network. It's totally the same set of data. We are just trying to understand how long it takes for each of the TAs.

The question is what the possible reason could be, and it's easy: look at the X axis. There we tried to show the average number of VRPs per ROA, how many VRPs on average they put in one ROA object when they publish it. As you can see here, for both APNIC and RIPE we have on average 4 to more than 5 VRPs per ROA, and what does that tell us? It means if you consolidate or bundle several prefixes of an AS in a single ROA and publish it, it takes less time in the validation process, and it's up to three times faster.

But one might ask, let's remember, we can only bundle or consolidate the prefixes of the same AS; we cannot put someone else's prefix into a different ROA.

Next we tried to check whether there is potential, whether we can do it. For that purpose, we tried to examine the prefix-per-AS distribution in the current RPKI data, and as you can see in this plot, it tells us on the X axis that for nearly 60% or 65% of the ASes in the current data in the RPKI repositories, there is more than one prefix.

It means there is potential: we could bundle to improve the validation time for all of these prefixes, to reduce the validation time basically.
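(A sketch of how that potential could be estimated from a VRP dump; it assumes a CSV in Routinator's layout of ASN, IP Prefix, Max Length, Trust Anchor, in a hypothetical file vrps.csv.)

```python
import csv
from collections import Counter

# Count how many VRP prefixes each origin AS has; ASes with more than one
# prefix could, in principle, bundle them into fewer ROA objects.
prefixes_per_asn = Counter()
with open("vrps.csv") as f:            # hypothetical VRP dump
    reader = csv.reader(f)
    next(reader)                       # skip the "ASN,IP Prefix,..." header
    for row in reader:
        asn = row[0]
        prefixes_per_asn[asn] += 1

multi = sum(1 for n in prefixes_per_asn.values() if n > 1)
share = 100 * multi / len(prefixes_per_asn)
print(f"{share:.0f}% of ASes have more than one prefix in the RPKI")
```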

Next, we take a deeper look at the depth of the CA chain: if you go down to the delegated CAs, how much does the certificate chain contribute to, or add to, the validation cycle? These may look like complex numbers, but it's really simple: we started with a depth of 2 and then went down and checked, at every depth for every TA, how long it takes in the online mode. We did the same in the offline mode, but there the depth of the certificate chain does not have any effect. For the online mode, however, if you focus for example on this particular area, you can see that for ARIN, at a depth of 3, which is ARIN itself, it takes nearly 100, plus or minus, in this range, but as soon as we move to depths of 4 and 5, it contributes even 200 or more.

And we checked, bearing in mind how many ROAs we get from that depth, how many VRPs we get at that depth: it's 10%. The message is very clear: for ARIN, when you go deeper in the certificate chain and you examine the impact of the delegation, it seems that a small number of ROAs creates up to 50% extra time in the validation process.

Now I am going to discuss part 2 of the study. One could say: but it depends on which network you are in, so if your network is faster, you might have a better or faster RPKI validation. We tried to measure the round trip time, or RTT, from around the globe towards RPKI publication points, or repositories; we use the terms interchangeably. For this purpose, we used 700 RIPE Atlas anchors located in 91 countries. Now, why did we select only anchors? We thought RPKI validation is done mostly by network operators, so it makes sense to use the anchors and ignore the probes located at end users, because end users might not need to do RPKI validation.

Some network operators might handle ICMP-based probes differently than TCP, so we went with TCP-based traceroute, and we conducted traceroutes to all the publication point or repository host names, and for rsync on port 9981.

We did the experiment using both IPv4 and IPv6 from the two networks, to see if using one or the other has an impact. We also ran the experiment for two weeks, every five hours, and we selected five hours carefully because it shifts the measurement every iteration: it provides information about every hour of the day, and it also gives us an idea of what the delay looks like on different days of the week, because there might be network congestion at different times or on different days of the week.
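(For reference, a measurement in that style could be created with the ripe.atlas.cousteau library roughly as follows; the API key, target host, probe tag and source country are placeholders, and the exact parameters of the study may have differed.)

```python
from ripe.atlas.cousteau import AtlasCreateRequest, AtlasSource, Traceroute

# TCP traceroute to one RPKI publication point (RRDP runs over HTTPS/443).
traceroute = Traceroute(
    af=4,
    target="rrdp.ripe.net",            # one publication point host name
    protocol="TCP",
    port=443,
    description="TCP traceroute to an RPKI publication point",
)

# Select anchors only, matching the study's choice of vantage points.
source = AtlasSource(
    type="country", value="NL", requested=10,
    tags={"include": ["system-anchor"]},
)

request = AtlasCreateRequest(
    key="YOUR_ATLAS_API_KEY",          # placeholder
    measurements=[traceroute],
    sources=[source],
    is_oneoff=True,
)
created, response = request.create()
print(created, response)
```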

In this plot, we tried to explain what the delay looks like globally. And I need a bit of water.

Great. So let's look what the data tells us.

We have two maps, where the colour scheme, the different colours, shows the round trip time, the delay: the greener it is, the lower the RTT in milliseconds. The left plot is for the repositories with rsync and the right one is for repositories with RRDP. Looking generally, let's focus on one of these maps: you can see the regions in south Asia and also Australia, Africa and Latin America are more reddish or dark red. It means it takes more time for them to contact or access the repositories; these regions suffer in accessing RPKI content.

If you compare the two sides of the plot with each other, rsync with RRDP, you can see the RRDP one is darker green in Europe and also the US region. It means the RRDP host names are faster than the rsync ones, given the way they are deployed and where people are fetching from.

Okay, so what is next? Not all repositories are the same, so we tried to see whether we can identify repositories or publication points that are causing higher delay. For that we basically examined the top five slowest publication points, on the left plot, with green for rsync and blue for RRDP, to understand which repositories are actually slow. And remember, these are the slowest based on data from many countries, so it's not a particular location: we measured whether they are in the top five for many countries.

We also did it on the right side, where we tried to measure the fastest repositories for many countries. The number on top of each bar shows in how many countries the publication point was slow, the Y axis shows the percentage out of the total, and the X axis is the host name of that repository.

It turns out that four delegated and four hosted repositories, two of them from AfriNIC, were at the bottom, and also one from RIPE and one from LACNIC are among the slowest ones.

Examining the fastest ones, we can see the RIPE repositories, which have different host names and are also hosted, are the fastest, along with the APNIC one, and there are also five delegated repositories that are fast.

Now, one could simply say: but yeah, it's in milliseconds, how much does it actually affect RPKI validation? Can you measure it? That's why we tried to actually run a measurement from the same location, where we used the RIPE Atlas anchor and the validation software. Because the slow publication points might be different from network to network, a publication point that's slow for me might be faster for you. So we tried to capture here the top five slowest ones; the top five slowest repositories are shown in red, and we tried to see, on the X axis, on average how long it takes to download the snapshot, the repository data, from these slow publication points.

The green one indicates, on the X axis, how long it takes to fetch the data from the fastest publication points, and the blue one shows the rest of the data.

You can see that red is more to the right and it takes on average the highest time; by the way, the Y axis is the number of validations that we did. We measured how long it takes to download the data from a slow repository and a fast repository, and you can see this one is 15 and this one is 4. So slow repositories are causing almost three times higher delay in the RPKI, at least in the fetching or downloading process.

But might this be because these repositories have larger amounts of content? No, that's not true: these repositories have one fifth of the fast repositories' data. So their data is small, but they take an even longer time in the validation process.

And finally, we tried to examine why the fastest repositories are actually fast.

As a result, it appears they are using CDN services: all of the top five fastest ones are hosted on CDN service providers. Then we tried to quantify how fast they are compared to the others. For this comparison, we used the RIPE repository, which was among the top fastest, against the ARIN one, and we thought it was a good comparison because ARIN is also a TA and an RIR, so we did not compare it with a delegated one. You can see that the regions from which people are trying to access the RIPE RRDP are quite green, dark green: it means it's more or less under 30 milliseconds.

In contrast, the regions from which people are trying to access the ARIN one are more red, and it indicates the round trip time is longer, nearly 300 milliseconds, to fetch the data.

So one of the solutions could be using services like CDN. But we understand it might come with a cost.

Great. So what did we look at, what do we take from the presentation, if we try to summarise it quickly?

We checked the different RIRs and their data, and we understood that bundling several prefixes of an AS into a single ROA might improve validation time. Next we checked the depth of the CA chain, and we found that a few ROAs hosted at delegated repositories might impact the validation time by 50%; this was particularly the case for ARIN.

And next we took a look at the disparities. Not all regions have equitable access to RPKI content, and we think it's helpful to identify these areas so that repository maintainers can improve their services: once they know in which regions they are slow, they can improve there. We also identified the top five slowest and fastest repositories currently in the RPKI system.

And finally, we discussed that CDN usage is one of the factors that makes repositories faster, sometimes even nearly ten times faster than the non‑CDN ones. With this, I am done with the presentation; thank you very much for your attention, and I am happy to take your questions and comments.

(APPLAUSE.)

ANTONIO PRADO: Thank you. Are there any questions? One.

TIM BRUIJNZEELS: Just a comment regarding putting more VRPs on a ROA object, combining them, a bit of background. There's actually advice in the IETF not to do this, because potentially, if you lose one of the prefixes, the complete ROA would become invalid. However, for the RIPE NCC system this is not a problem, because in that system we have both the parent and the child CA together, so we know when resources change and we can combine them.

Because, you know, when resources are reduced we can just reissue the ROAs proactively, and that would not be a visible problem to the world, but for delegated CAs this is less of an option. Just a slight clarification: the combination is not an option for everyone, but it is an option for some.

KHWAJA ZILBAIR SEDIQI: Thank you for the clarification. Yes, I think the main message is that currently at the RIPE NCC it's doable and it's safe to do bundling, but it might not be a proper solution for every delegated CA. I appreciate the input and insight, thank you.

ANTONIO PRADO: Thank you.

(APPLAUSE.)

So, not bad, because Friday morning people are the best. Besides, we also survived weather alerts. So... the next speaker is Andrei; we are going to deep dive into a MANRS update and some news, I guess.

ANDREI ROBACHEVSKY: Good morning everyone. It's not a deep dive, it's a lightning talk, a short update, and for those of you who know what MANRS is, I have to say MANRS remains the same, there are no substantial changes. For those who do not know what MANRS is, let me very quickly introduce it to you. Mutually Agreed Norms for Routing Security is an industry-led initiative which is aimed at stimulating the adoption of good practices in routing security and allowing people that adhere to those practices to demonstrate their social responsibility and their adherence.

The four main facets of the effort are, well, the common baseline, so that's a reference line; we have four programmes for different types of network operators and one programme for network equipment vendors, and each of them has a slightly different baseline, which reflects the fact that those types of network operators can contribute slightly differently to network security or routing security.

Commitment, well, that's one of the very important things: you have to commit and you have to demonstrate your commitment in order to become a MANRS participant. I have heard from people, even now that it's almost ten years old, that MANRS is just a self-declared web page. That's not true. We do validate applications, and we check, to the best of our abilities, that whatever is claimed by MANRS participants is in fact implemented.

And we also do it in a transparent manner; for this we created a tool called the MANRS observatory, where you can see how networks perform. I will talk about this a little bit later. And it's community building; I mean, all those actions were created by the community, each of the programmes was launched by the community and driven by the community.

So this is a quick timeline of how MANRS grew and how new programmes were introduced, starting with network operators in 2014, so yeah, that's 10 years, we can celebrate in November. And finally, in 2021, we started working together with network equipment vendors, because of course you cannot implement best practices if your equipment doesn't support the necessary controls.

As I said, the programmes are slightly different, but they are united by the basic types of actions. I have to mention that those are not best practices; many networks that participate, and even those that do not participate in MANRS, implement more than what is required by MANRS. MANRS is a common denominator, and it was selected on the basis that it can't be too low, so that it doesn't make a difference, and it can't be too high, so that only a few can join and demonstrate; that sort of balance is what MANRS is looking for.

One of the most important is filtering, and that's preventing incorrect routing announcements from propagating or originating in your network. There are also important things like co‑ordination, which might be a simple thing, but that's how the internet operates: the ability to reach out and co‑ordinate with other networks is one of the essential features here.

I mentioned commitment and transparency and for this we created MANRS observatory, that's what you see on the left side.

Now, the MANRS observatory, when we did this: it's available to the public, so you can go and play with it, but you cannot see the scores for individual networks. That was a design decision; we didn't want to create a name-and-shame page, but rather provide a tool for MANRS participants and other operators that are interested, to look at their own performance and how it looks from the outside.

But through the years, we made efforts together as network operators to improve on transparency, and where before the MANRS participants table would show just check marks, now we are publishing some of the scores. We are not publishing all the scores, because we are also working on the quality of the data, reducing the amount of false positives, but you can already see how different networks, MANRS participants, score in the table. That's to improve transparency.

As I mentioned already several times, it's ten years old; we have good uptake, we have more than a thousand participants, we grew from one programme to four, various training resources and other useful content was created along the way, and I think there is significant awareness of the effort also in policy circles; MANRS is being referenced in policy documents in various countries, which is a good thing.

And we are working at improving and adding new features to the MANRS observatory.

Now, this project was launched with the help of the Internet Society ten years ago, and after ten years of providing support, a decision was made to establish a partnership with another organisation, called the Global Cyber Alliance, and transfer the MANRS secretariat function from the Internet Society to GCA. That happened in January 2024, so right now the MANRS secretariat is located at and supported by the Global Cyber Alliance.

Now, who is the Global Cyber Alliance? It's a not-for-profit with a mission of building a trustworthy internet for all. It was established in 2015, and it has three offices, so it has a sort of global footprint. GCA is aimed at building programmes that are based on collaborations; those programmes are very similar to MANRS.

There are two big programmes. One is aimed more at end users and building cyber capacity; when I say end users, that also includes enterprises, and it includes election systems and user capacity to combat cyber crime. The other part is called the internet integrity programme, and that is focused on infrastructure: in the domain infrastructure, primarily on domain abuse; it focuses on unwanted traffic, and it also focuses on routing now, with MANRS.

Now, where do we go from here? As I said, there is continuity: MANRS remains the same, the spirit of MANRS, the community-driven nature doesn't change.

We are looking for scalable growth, because we can't just grow linearly; we need to engage with local communities, so we are trying to engage with them so that they can take this initiative and spread it within the communities themselves. Conformance and level of assurance: as I said, we check how well self-declared conformance works out in reality, but the measurement framework requires improvement; you can measure only so much from the outside, so we are working on that and on improving data quality.

We are working on increasing the value proposition: in the Connect session we presented an initiative or effort called MANRS+, and it's aimed at participants getting more reward from participating in MANRS, so that we can also expect more from them in terms of co‑operation maybe, a level of assurance, how we can measure their performance.

Tracking and supporting regulatory work around the globe, that is very important. As you probably know, many governments, important governments, are taking efforts to regulate operators, and routing in particular, so I think pointing them to those efforts that make a difference is very important.

And inside the Global Cyber Alliance, there's synergy between the three sub-programmes, trying to apply the same approach that we used in MANRS to address some problems and challenges in the domain space and in traffic.

Right. So, setting up the next decade, we are very eager to hear from you, what you think about MANRS, and your ideas. This is a very short survey; if you are interested, we would very much appreciate it if you could share your thoughts. And with this I think I'm done, thank you.



(APPLAUSE.)



CLARA WADE: Thank you. We have one minute for questions. Anything online? Okay. Yeah.

AUDIENCE SPEAKER: Thank you very much. I am representing researchers, basically. I recently heard about MANRS and had a chance to get access through the API, and what I found out, I don't know if you can comment on this, is that it's not very easy to download historical data using the API: you need to crawl every month for every ASN if you want to get all of it, and this is not very convenient. Is it possible to somehow improve this?

ANDREI ROBACHEVSKY: For this feedback, I would kindly ask you to write to us at contact@manrs.org, so we can maybe see what improvements we can bring.

AUDIENCE SPEAKER: Thank you very much.

AUDIENCE SPEAKER: Thank you very much for this; just a quick one. It would be super helpful if there was a little bit more co‑ordination between the various sort of independent organisations that are measuring RPKI compliance, as they don't all say the same thing; the message is kind of wildly different actually. Of course they use different methodologies, but theoretically they are looking at the same data, mostly RIPE RIS and 6477 stuff, and it would be great if there was a little bit more co‑ordination there so they can all be synced up.

ANDREI ROBACHEVSKY: We are using RIPE data, and if you look at the MANRS framework, it shows the data sources we use; we don't generate our own, we don't have our own measurement infrastructure. But I take your point.

AUDIENCE SPEAKER: Thank you.

CLARA WADE: Thank you, Andrei.



(APPLAUSE.)



All right. Next up is Maria, she has a lightning talk on simulating networks in your laptop.

MARIA MATEJKA: Hello, I am Maria, good morning everybody. A bit of context for this: there were some talks here in previous sessions, and even elsewhere, where people were presenting about simulating networks, and they always need a big cloud platform or something like that.

Or you need to spin up a container for every single node, and I am asking: why? Because I am developing software; I am Maria from CZ‑NIC, if you don't remember, and I am developing BIRD, so I need to run repeatable tests. I need to automate all the tests so that I can run them again and again while I chase a bug. I need to make it fast, because if it's not fast, I spend two minutes fixing a bug and 15 minutes testing. I want to flip this ratio to the other side: I want to spend several minutes fixing the bug and then have the test show almost immediately whether I am okay or not. And I need it to be portable, at least physically.

Which means, when I sit on a train this afternoon, I will run my tests and continue coding even if there is no wifi, and there was no wifi in the whole Polish part of my journey here on Sunday. So I am expecting that my laptop will have to run this for at least several hours.

Yeah. So what we need for testing routing software is: we have to create different network contexts, we have to keep different TCP/IP stacks, we need to connect those with different kinds of links, and we have to run some software in the context of those networks, of those TCP/IP stacks. And we want to have it easily accessible, which is by the way one of the problems with containers, because they are too isolated; I want to see inside the nodes.

That's all what I need.

And what is useless for me is a full-blown simulation: I don't need a kernel for every single node I am speaking OSPF on, I don't need a distribution image, I don't even need an Alpine, I don't need a systemd, I don't need anything, I just need my laptop. Also, there is some legacy: we needed this ten years ago. There was no Kubernetes, there was no cloud. These were quite crazy times. When I was testing this in my first days at CZ‑NIC, we were experimenting with Open vSwitch and stumbled on problems sending packets inside Open vSwitch. Then I went to a party with my friends; one of them was an operator, and I didn't know this was a zero day, and I mentioned it, and he said: what have you just said? It's embargoed, you should not talk about things that are under embargo. I was like: what? I have just found a bug which is under embargo, I didn't know. Nice.

So we created something we called netlab. There are several more netlabs around, but this is a term from 2014. It's basically by Andre, one of my colleagues who has been on the project longer than me. It uses Linux kernel namespaces, and we minimised start-up times in such a way that I can spin up a test in several seconds and tear it down in several seconds. It doesn't have much overhead: I can just run the nodes with just the memory needed for the BIRDs I am running; I actually run this on my laptop.
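(For readers unfamiliar with the mechanism: netlab itself is CZ.NIC's internal tooling, but the namespace-plus-veth technique it builds on can be sketched in a few iproute2 calls, run as root; the node names and addresses below are illustrative.)

```python
import subprocess

def sh(cmd):
    # Run one iproute2 command, raising if it fails. Needs root.
    subprocess.run(cmd.split(), check=True)

# Two lightweight "nodes": network namespaces, no containers or VMs.
sh("ip netns add r1")
sh("ip netns add r2")

# A virtual point-to-point link, one end moved into each namespace.
sh("ip link add veth-r1 type veth peer name veth-r2")
sh("ip link set veth-r1 netns r1")
sh("ip link set veth-r2 netns r2")

# Address and bring up both ends.
sh("ip -n r1 addr add 10.0.0.1/30 dev veth-r1")
sh("ip -n r2 addr add 10.0.0.2/30 dev veth-r2")
sh("ip -n r1 link set veth-r1 up")
sh("ip -n r2 link set veth-r2 up")

# Each namespace now has its own TCP/IP stack; a routing daemon such as
# BIRD could be started per namespace with `ip netns exec r1 bird ...`.
sh("ip netns exec r1 ping -c 1 10.0.0.2")
```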

There are some downsides. It's our internal project; it's still not really ready to be properly published. There is not much documentation; if somebody is willing to help us, we would love to accept some contributions to the documentation. It needs local root, which is going to be history soon because Andre is working on fixing this. There is no other isolation; you don't want to run this in production, it's a testing facility. It's CLI only: you just type the topology into a configuration, and it expects BIRD to run everywhere. It's kind of hairy, but it can be polished quite easily.

This is how a basic four-node loop looks: you need this and four configurations of BIRD, and that's all; this says it's a loop of four nodes. And this is a one-thousand-node loop, which I spun up on Tuesday morning on my laptop. I had to kill my Firefox because the one thousand BIRDs ate five gigs of RAM, but that was all. It took something like three and a half minutes to spin it up and seven minutes to take down, because when you tear down the first node, all the other nodes start to recompute the OSPF, so it took some time. But it worked.

Yeah. If you want to try it, you can ditch your containers for this thing. There is a link; we may move it to the main repository. If you want to contribute, you are welcome, you are very welcome, but please write us an email; it's a project under heavy development.

Yeah. And that's it. Here are my contacts.

Thank you.

(APPLAUSE.)

ANTONIO PRADO: Thank you Maria, it was fast indeed. Any questions, yes please.

AUDIENCE SPEAKER (Costas): Hello Maria. Very interesting, the specifications of course and all the software. The main question is: what type of nodes do you simulate? From what I understood, BIRD-only nodes.

MARIA MATEJKA: We are simulating the routing itself, so the types of nodes are just something which runs OSPF or BGP or Babel. We are expecting that we are only testing the routing itself, nothing more.

AUDIENCE SPEAKER: Yes but can you check again the routing implementation of the vendors, that's the question.

MARIA MATEJKA: It's theoretically possible to connect some external boxes to this, and then you just have an external box and some nodes in this. It should work.

AUDIENCE SPEAKER: Thank you.

MARIA MATEJKA: It's in the road map.

CLARA WADE: We have an online question from Nick: did you consider more abstract event-based simulation? It wouldn't be as good at finding bugs at the interface between BIRD and the OS, but from experience it's excellent at finding bugs in protocol implementations, as it can more exhaustively search event orderings, etc.

MARIA MATEJKA: I am quite confused by this event-based thing. There is indeed an overlay on this which is doing some more things, manipulating the BIRD inside, so there is probably something resembling what is in the question, but it's not in the format of this talk.

ANTONIO PRADO: Thank you. Last question please?

AUDIENCE SPEAKER: Thanks a lot for this. For several of my vendors, the top-of-mind stuff in their marketing these days is what they call a network digital twin, which is basically what this is, right? If you want to promote this a little more, make it current-buzzword compliant and get some more traction for it, that's what the cool kids are calling it these days. But it's super, super helpful, and I have been messing with this kind of stuff for a really long time, so thank you.

ANTONIO PRADO: Thank you, thank you Maria.

MARIA MATEJKA: It looks like we are done. Thank you.

(APPLAUSE.)


ANTONIO PRADO: Okay, now it's the turn of one of the most experienced colleagues around here: Jeff, about the DNS root server system. Thank you, Jeff.

JEFF OSBORN: Hello, good morning. My name is Jeff Osborn, I am with ISC, the operators of the F-root, and I am chair of the root server system advisory committee at ICANN. So ISC, like RIPE, is an operator of a root server; you have got the k-root and we the f-root. One of the issues that we found is really pretty widespread: the engineering free rein we have had for the last few decades in how to run the internet is sort of ending, and the legislators and regulators and government controllers are becoming more and more important.

So something we are finding is that laws are being put in place, and what we need to do is figure out how to talk to the law makers before bad laws get made. In order for somebody to understand what they need to do and how to legislate around the DNS and the root server system, they have got to understand what the DNS is and what the root server system is. We sat around and talked about this and realised that most of us have been in this industry for a very long time; I have been involved in the commercial internet since 1983, and my mom doesn't know what I do for a living, my wife doesn't know, and I have a lot of smart friends with master's degrees and doctorates, brilliant in their fields, who have no idea what I do for a living. So we realised we were going to be talking to very intelligent, very well educated people about something they know nothing about, and having to explain it in the short timeframes they will give you, in a way that's both meaningful and actionable, and this is hard.

I have got a lightning talk timeframe, so rather than give you the full presentation that we are providing, I wanted to give a meta presentation. I am trying to establish here, in about five slides: this is the problem set, this is what we think we are trying to do, and if you are interested and, as a RIPE community member, you want to help out, the last 21 slides in the deck are the thing we are actually giving to them. So I am telling you in the prep part what our problem is, and then, if you are interested, you can go from slide 7 to 28 and see: ah‑ha, this is how they are going to present it, if that makes sense. So again, the problem set is that most policy makers don't understand the root server system, and some of them are going to have to understand it, not just theoretically but operationally.

The usual way you explain DNS makes sense from an engineering point of view but unfortunately translates poorly into politics. If you think about it, you have seen it: first you ask the root, then the TLD, and finally you get to the domain name. Well, that makes sense logically, but it's not really operationally the way things work; nobody walks into a cold, dead internet and starts it up from scratch. There are caches everywhere that already work. This makes it look as if the root server is the beginning step, vitally important for every action that happens, and as if it introduces all kinds of latency. We are giving the idea that there are 86 million milliseconds in a day; latency is not really an issue, everybody is going to know this stuff when the caches are warm.

It also understates the operational significance of the public resolvers: quad 1, quad 8, quad 9, all these things.

So what's the problem with that? The problem is this all creates the false impression that the root servers are the gatekeeper to the internet. The American term is 'on ramp', the English term is 'slip road', whatever you want to say; the idea is it looks as if this is a controlling factor in anybody being able to use the internet. The term hierarchy in engineering makes a whole lot of sense when you are looking at this, but a politician hears hierarchy, and every one of them wants to be at the top of the hierarchy, so it sort of fails to correctly describe the way this works.

The idea of being a gatekeeper means you are the first place where errors can happen, and we really need to explain: this is a system that hasn't had an operational failure in 40 years.

The root server system is comprised of over 1,700 instances with anycast; the operators are 12 different organisations, running different hardware platforms, different operating systems and different DNS implementations, so it's hard to imagine an institutional or technical single point of failure.

So the solution is, we are trying to explain the way this looks in the real world, where there are caches and they are all full. The first thing that really happens is not that you go and figure out what the hints file looks like; what really happens is you show up, there's a cache, and 90% of the time the resolver knows your answer.

Then, in a less likely case, you have got to figure out the domain name; in a less likely case still, you have got to figure out the TLD; and in what we estimate to be about one in 5,000 to 10,000 queries, you go to the root.

So this is literally 0.02% of queries; it's a very unusual thing. This is the t‑shirt component of this presentation, in my last minute. I don't yet have a t‑shirt with it on, but I want it: in the millisecond world of resolvers, queries to the root system are rare. That's the take-away we want to leave people with.
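(The back-of-the-envelope numbers behind that slide check out; a couple of lines of arithmetic, using the talk's own estimates, reproduce them.)

```python
# "86 million milliseconds in a day": 24h x 60m x 60s x 1000ms.
ms_per_day = 24 * 60 * 60 * 1000
print(ms_per_day)                # 86,400,000

# One query in 5,000 to 10,000 reaches the root, per the talk's estimate.
for n in (5_000, 10_000):
    print(f"1 in {n:>6} -> {100 / n:.3f}% of queries hit the root")
# -> 0.020% and 0.010%, the "0.02%" figure on the slide
```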

How are we going to present this message? We have a document we have been putting together, about 15 pages, that most of you would find interesting, I think; it tries to make something complicated relatively digestible. And then deliverable 2 is what I'm trying to do to take that and make it into an introductory-level thing: something you could show a family member who isn't technical and they would get a grasp of what you do for a living, so finally your spouse or partner or parent might know something; and then when they say go dig into it deeper, we have got the first deliverable. We are doing it on behalf of all the root server operators, and if you as part of the RIPE community would find it interesting, we would really appreciate you taking a further look at it, and I will show you, here are the 21 slides.

Summary. Thank you very much.

(APPLAUSE.)



CLARA WADE: Thank you, Jeff, for the time keeping.

AUDIENCE SPEAKER: Jim Reid. I am speaking as an individual of course. Jeff, it's good that you are presenting this kind of stuff here but I have got a couple of points I'd like to make.

You mentioned the fact that the policy makers don't really know how the DNS works and how the root server system works; I think it also goes in the other direction: the people involved in the root server system really don't understand the policy makers' concerns, and we have to somehow bridge that gap, and I think that's something we both recognise needs to be done. Both these groups are eyeing each other warily; we know there's some gap between us, but we haven't figured out how to deal with it, and I think it's important to get on top of that. You have the situation where the policy makers say: we have got things like the NIS2 directive, because the policy people and the politicians think we need to have some kind of notional control over this thing, so let's legislate and put a club in place so we can beat these people about the head if things ever break. The other point I want to make: just answering that the root server system has never failed in 40 years is a reasonable enough answer, and you can say yes, that's true, it hasn't ever broken, but from a policy point of view, people are going to think: well, what if it does, then what?



JEFF OSBORN: Well put, couldn't agree more. I think I am cutting into the coffee break, but for that first half of what you said, I am going to tag you in at an upcoming conference and say: Jim can explain better than I can how the other half of this works.

AUDIENCE SPEAKER: Thank you.

CLARA WADE: Thank you. All right. Niall, I think you can ask your question maybe during the coffee break.

NIALL O'REILLY: No, this is for everybody's ears, and it's important that it's in plenary. The point that Jim has made is very important, and it's just one aspect, because it's about more than the DNS: the complementarity between different professions which now have to cooperate together in how the internet works, the legislators, the regulators, and the engineers who have been doing so well for so long. Psychologically it's a huge challenge, organisationally it's a huge challenge, and Jim is one of the better people to explain it.

CLARA WADE: Thank you, Jeff. All right, we are going into the break; Andre, chairman of the board, will also announce the GM results.


LIVE CAPTIONING BY TINA KEADY, RPR, CRR, CBC,
DUBLIN, IRELAND.