RIPE 88

Plenary session
Main hall
21 May 2024
2 p.m.


MOIN RAHMAN: Hello, good afternoon everyone. I hope you had a nice lunch. So, Jan and I will be chairing this session. As usual, when you are going to ask questions, please say your name and affiliation, and please follow the Code of Conduct rules.

We have one and a half hours of time. If someone wants to nominate themselves for the PC, please nominate yourself or someone else. And I will start with our first speaker.

RADU ANGHEL: Hello everyone. I am Radu, and I will talk to you about numbers, mostly AS numbers, numbers of AS numbers, and other types of numbers.

This is the RIPE conference, so I probably don't need to explain what AS numbers are. I will go into it directly.

We have a lot of them. We have 4 billion, but only 116,000 are assigned by the RIRs. We can see them in the global routing table from the APNIC stats: around 75,000 are visible in v4. But we don't see 40,000 of them. Where are they?
We know around 7,000 are IPv6-only, but we don't know anything about the rest. The 20% that are not routed at all refers only to the RIPE region, so it's not 20% of the 41,000; it's 20% of what's assigned to the RIPE region.

Then, there are some valid reasons why an ASN could be uniquely assigned to somebody but still not be visible in the global routing table. It could be Internet Exchange points, it could be route collectors; those wouldn't be visible. It could be private networks requiring unique numbering, or it could be abandoned. We don't know, so we are missing a lot of ASNs from the global routing table.

Then we have the assignment numbers. The Internet Assigned Numbers Authority publishes these statistics of average monthly assignments per RIR. These numbers are from May 11; they change a bit, but not dramatically. The RIPE region wins this with almost double the number of assignments of ARIN.

APNIC is in second place, so APNIC also has a lot of assignments.
That's why I will focus only on RIPE and APNIC for now.
We also have this nice document, RFC 1930. It's 28 years old, and it tries to describe the guidelines for receiving an AS number assignment. It is mentioned in documents of all five RIRs, but with different degrees of importance. For example, in APNIC, requests are validated based on it. In the RIPE region it's just a small reference: see RFC 1930. It's not clear if a request can be rejected based on it or not.

Also, RIPE has some harder requirements I will discuss later.
So, now focusing on this wonderful RFC. It has three main points that allow you to get an AS number. The first one is that you need to exchange routing information externally, that is, BGP, because you want to be on the Internet, so this is very easy to check. Having multiple prefixes is also mentioned, but it's not a hard requirement; it's not like "if you don't have multiple prefixes, go away". The other important thing is to have a unique routing policy. This is the most important, in my opinion.

Here, it is very debatable what a unique routing policy is. For example, in the RIPE region, multihoming is a requirement, not very much enforced, but it is a requirement. Multihoming is a unique routing policy because you can't say no to a network that is multihomed. But there are other unique routing policies that might pass this check. So, I don't have a strong opinion on whether multihoming should or should not be required, but I think the unique routing policy is very important.

Now, let's assume you checked all the boxes and you actually need an AS number. Where do you get it from? You usually get it from the Regional Internet Registry where your network operates. But your network may be very large, or you can find an RIR that is open to out-of-region assignments, that is easy to deal with, and maybe cheap. So, there are a lot of out-of-region assignments.

This brings us to the RIPE NCC, which is, amazingly, our region. This issue was discussed last year. A maintenance fee was proposed, but people didn't like it for some reason, so it was rejected at the General Meeting. It is now proposed again in a different form. We don't know what will happen.

So what's the problem with these things? There are a lot of requests coming, an increasing number from end users. End users have no contract, no relationship with the RIPE NCC; they have to go through an LIR. These end users can be natural persons, or they can be companies. But from a presentation last year, we know that over 50% of these requests come from natural persons. So, this kind of makes people unhappy because there are a lot of requests.

Also, the number of unsuccessful requests is growing. These are requests where people don't provide adequate documents, or they silently drop out when the NCC asks them for more documents, and the NCC can't close the request until some time passes, so they have to wait to see whether the end user will provide documents or not. This consumes a lot of time, because you have to receive the documents and check them, and some of these checks have actual costs. And guess what? These costs are borne by the RIPE members, the LIRs that have the contractual relationship, not by the end users.

Now back to the global routing table. Sorry, it's only the v4 one. Now we have different numbers. We have RIPE again on top with a lot of ASNs, and in brackets is the number of ASNs that announce only one prefix. Somehow, this is double the number in ARIN, so I don't know what ARIN is doing so that networks over there have multiple prefixes, but they are doing something.

Now, what are ASNs? This is wrong. They are no longer 16-bit numbers; in the RIPE region, since last year I think, no difference is made between 16-bit and 32-bit ones. But they are definitely not personal identification numbers.

Now, the thing: What are personal ASNs? This is a more recent term. It refers to ASNs assigned to natural persons, and there are very different opinions on them. Some people are upset because they are not doing any actual research: research is conducted in academic settings and has a goal, and having BGP at home is not really research. Other people consider this a way to get more young people interested in networking. These are the main different views; maybe there is something in between too. Last year there was also a presentation that said there were ASN requests from a one-year-old person and also an 80-year-old person. So we're very inclusive: we accept requests from everybody, we can't just drop them, we have to check.

Now, the world population is 8 billion people, but there are only 4 billion ASNs. Let's assume we are not excluding the reserved ones, that we have all 4 billion and can assign them at any time. This still means only half of the potential network engineers can get one, so maybe this could again become a scarce resource.

Now, in general, the answer to this question is no, you don't need an ASN. If you like computer networks, you don't need an ASN. Doctors have licences that follow them if they change jobs, so you can't run a medical practice with a janitor: you need a doctor with a licence. But you can have an ISP with its own ASN, and it won't change if your network engineer leaves.

So network engineers, not all of them need AS numbers.

If you want to learn BGP, you also don't need an AS number. You can take courses, you can read books, you can play with private AS numbers. There are some private networks that try to replicate the Internet over tunnels. If having a full routing table is a requirement, I am sure somebody can be found to send them a full routing table. If you are actually doing research, you can submit a request with what you are actually trying to research, and you will probably get a temporary academic assignment for research. But that has an end, so you have to actually research something, not have your PlayStation on a separate AS number from your TV.

Once you get an AS number registered, you suddenly get access to a lot of databases that are otherwise unavailable to you. This, you can't replicate with private ASNs, but I don't think it's a huge problem. But I think it's important to keep these databases clean and useful for the community because the community is not only the research networks, there are also some actual networks trying to do Internet things.

I will now give some examples of how this can break things. There is this wonderful site, PeeringDB, where ASN contacts register and declare their network type, and provide information about the data centres they are present in, Internet Exchanges, a lot of things. But most of it is user-generated content, and in general, there are issues with user-generated content.

These holding networks, personal ASNs, however you want to call them, mostly pick the education/research category here. There was a feature request to add a network type "personal", but for some reason it was rejected. So now you kind of don't know: is this educational/research network a university, is it a national research network, or is it somebody's living room network?

Also, there is the recommended prefix number, which is used in some cases by tools to automate network deployment, where you set the prefix limits on the session based on it.

Now, let's assume that I am a serious person and I run an Internet Exchange. I am also certified for privacy, like other Internet Exchanges. I also accept these educational networks because I am open to letting people learn, but suddenly I get a very high prefix limit from one of these networks. Okay, the prefix limit is just one measure I take to protect my network from a leak, but let's assume my RPKI fails and my other filters fail, so I am left with the prefix limit from PeeringDB, which is very high. There are actually educational networks that have more prefixes declared in PeeringDB than Hurricane Electric.
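
The prefix-limit failure mode described here can be sketched in a few lines. The field name `info_prefixes4` mirrors the real PeeringDB `net` object, but the clamping policy and the cap value are invented for illustration; this is not how any particular IXP does it.

```python
# Sketch: never trust a user-generated "recommended prefix count" beyond
# a locally chosen cap. info_prefixes4 is the PeeringDB net-object field;
# the hard_cap value is an invented local policy, not operational advice.

def effective_prefix_limit(peeringdb_net: dict, hard_cap: int = 10000) -> int:
    """Return the max-prefix value to configure on a BGP session."""
    declared = peeringdb_net.get("info_prefixes4", 0) or 0
    # A "living room" network declaring more prefixes than a large transit
    # provider should not be able to raise our session limits arbitrarily.
    return min(declared, hard_cap)

edu_net = {"name": "Example Edu/Research AS", "info_prefixes4": 250000}
print(effective_prefix_limit(edu_net))  # 10000: clamped, not 250000
```

The point of the clamp is exactly the talk's scenario: when RPKI and other filters fail, the PeeringDB-sourced limit is the last line of defence, so it should never be taken at face value.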

And I have a second example. We have this wonderful RFC about geofeeds. This is pretty recent. In section 3.3, they explain how to verify the veracity of this information: somebody needs to check manually if there are errors. The RIPE DB supports it; you can add it as remarks, or you can add it as a geofeed attribute.

Again, I am a serious person, I am always a serious person, and I work for a bank, and I start to see penguins accessing my services. But I don't have penguin customers, I have no customers in that region, so it must be a problem. I will contact my security people that take care of updating these feeds, and they will notice the feeds are wrong, because somebody put something very creative over there. So this RFC is almost four years old, but you can't really trust it, because people are playing with it.
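
The geofeed format in question (RFC 8805) is plain CSV: prefix, country, region, city, postal code. A minimal parser can catch syntactic nonsense, but, as the anecdote shows, it cannot catch creative-but-valid entries such as prefixes "located" in Antarctica; that still needs a human. A sketch:

```python
import csv
import io
import ipaddress

# Minimal RFC 8805 geofeed parser. It validates syntax only; veracity
# (is this prefix really there?) needs out-of-band checks, as the talk says.

def parse_geofeed(text: str):
    entries = []
    for row in csv.reader(io.StringIO(text)):
        if not row or row[0].lstrip().startswith("#"):
            continue  # skip blank lines and comments
        prefix = ipaddress.ip_network(row[0].strip())
        country = row[1].strip().upper() if len(row) > 1 else ""
        if country and len(country) != 2:
            raise ValueError(f"not an ISO 3166-1 alpha-2 code: {country!r}")
        entries.append((prefix, country))
    return entries

feed = "# example feed\n192.0.2.0/24,NL,NL-NH,Amsterdam,\n2001:db8::/32,AQ,,,\n"
for prefix, country in parse_geofeed(feed):
    print(prefix, country)  # the AQ (Antarctica) line parses fine
```

Note that the `AQ` entry is syntactically perfect, which is exactly why the penguin scenario can happen.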

I call this Internet graffiti, because they are putting things over there that are really not operational. For example, you can see things like test signatures; this is not really operational info.

The alternative to this is becoming more restrictive in what you accept in the DB, but I'm not sure this is what we want. So, I don't think we want to become like LACNIC or ARIN and only have certain fields.

Now, there is another trend with these ASNs. People have rights because of GDPR, because they are natural persons, so maybe RIPE shouldn't see their birthday or other information. I have another idea: there are phone numbers. Now you can operate your own mobile network with software-defined radio. I think it would be wonderful to have public phone numbers assigned to it, and I'm curious how it would work: going to the national agency that's in charge of these numbers, showing them a censored ID, and telling them please give me a range of phone numbers. I am curious how that would work out.

Now, there is this website run by Ben over there, where he manually maintains a list of personal ASNs, which get some bonus things if they are registered there. What happens is that most of these ASNs are in the RIPE region, for some reason.
And the numbers are increasing pretty fast. They now represent almost 2% of the global routing table. I don't know if 2% is a lot or not; it depends. This is public information: these are the top five sponsoring organisations for them.

I will go a bit faster because my time is almost over.

So, there is a business model for this. These small companies can take money for ASN requests, but the RIPE NCC doesn't take any money for ASN requests. RIPE just handles the paperwork, and they don't have to worry about the database accuracy. And guess what: if you have an AS number, you also need additional things. You need IP addresses. IPv4 is expensive, so what do you do? You deaggregate some IPv6 from the LIR helping you out. There is also a wonderful service that exports your route to RIS for a very small amount of money. I don't really understand how this helps, except by polluting RIS and faking AS paths; I don't know why this is a must-have for these networks. But these services need to be cheap and affordable so people don't complain a lot.

How does this affect us? "Us" can mean a lot of things. It affects LIRs by increasing the membership fees. It affects the RIPE Database's accuracy. And it can create operational issues for everybody.

And for me personally, it affects the data quality for actual research, because I find a lot of weird shit over there. So how can we fix this? We can enforce the existing policies, we can ask the ASN contacts to clean up some of the non-operational information they attach to their AS numbers, and maybe find some other ways for young network engineers to get interested in networking.

So, thank you. I am over time.

(Applause)

JAN ZORZ: Thank you very much.

AUDIENCE SPEAKER: Hi, Ben, bgp.tools. This is going to be a comment, I'm really sorry. I'm not sure I have ever seen you in person, but I don't really enjoy that you seem to have a view that universities are the only ones that conduct research. If you look around this room, that is very clearly not the case. And also, personal networks do operate serious businesses; there are things called sole traders. So not all of these things are what you think they are. And I don't really understand the call to action for this talk. What is it that you want to change?

RADU ANGHEL: Well, I would like to have some more thorough checks on the unique routing policy, because these networks are just building overlay networks. If you like to have an overlay network, use Tor or something.

AUDIENCE SPEAKER: But ‑‑

RADU ANGHEL: I don't understand how it helps, having a lot of tunnels doing the same thing.

AUDIENCE SPEAKER: The thing I'd mention here, and this will be my last point, is: what is the purpose of the Internet? Because I'm not sure the Internet has a defined purpose. The purpose of the Internet is what it does.

RADU ANGHEL: Yes, but currently it's mostly business and 2% research, and the 2% is increasing. So maybe it will become 20%, 50%, I don't know. When will it stop?

JAN ZORZ: Okay. We will go to Leo and then we have online questions.

AUDIENCE SPEAKER: Hi, Leo Vegoda, and I am wearing my PeeringDB product manager hat. To answer your question about why PeeringDB rejected the request for a personal network type label: that was because PeeringDB discussed the whole network types label last year, in the product committee, and they decided that it's not actually a useful thing in making a decision about whether you are going to peer with someone. You know whether you are going to peer with someone based on how much traffic you exchange with the network; whether they are a content network or a personal network or whatever isn't really a deciding factor, and cluttering up PeeringDB with lots of additional information isn't useful. Of course, if you disagree with that, the PeeringDB product committee can rediscuss it. You can create an issue in GitHub and explain why they were wrong, and they can discuss it again. But I just wanted to answer your question and explain the situation.

RADU ANGHEL: I actually agree with that because I don't think personal ASNs exist, they are just numbers that represent networks. So I agree with that.

AUDIENCE SPEAKER: Tobias. First, a disclaimer: I am currently holding three ASNs personally registered to me, but I understand I'm not in that data. Anyway, first, on the point about the phone numbers: that is a splendid idea, I can only encourage it, and I recommend you talk to Havel Felton, who is pretty much into that. And concerning the tunnel networks, I think those without a tunnelled backbone, meaning MPLS or anything else that packs packets under packets under packets, may be thrown out in the first round.

MOIN RAHMAN: "None of the issues are limited to personal ASes. Such ASes are mindful of ... many non-personal have ... either accidentally or maliciously. This is an intrinsic issue with self reporting. One can't expect the data to be a hundred percent accurate and up to date. Do you have any quantitative data that such ASes are a significant part of that problem? It seems ridiculous to blame them for a possible something without that."

RADU ANGHEL: I didn't really understand the question.

MOIN RAHMAN: "Do you have any quantitative data that such personal ASes are a significant part of that problem?"

RADU ANGHEL: Well, there are the numbers showing that the requests are increasing, so they are part of the problem. The AS numbers are increasing, so...

AUDIENCE SPEAKER: Marco Schmidt, RIPE NCC. Thanks for the presentation. I just want to clarify: Registration Services is not unhappy, we are very happy people. But it's true that the workload is having an impact, and actually I plan to elaborate more on that tomorrow in the Working Group session. If, after the social today, you have the energy to be there, I can give a bit more information on what you mentioned. So thank you.

AUDIENCE SPEAKER: Will: I have got two comments. First of all, I was totally with Ben and all the comments he made, so plus one on that, thank you, Ben. I think you have not been using the database, the Whois, for so long, because you would know that it's been a mess for a long time. I don't think that those personal ASNs will increase the mess that we have got there. And my last comment would be that, now that we don't have any IPv4 space available in any pool anywhere, we will not have so many requests anyway, because there is no interest in having an ASN without IPv4.

RADU ANGHEL: But there is. There is a market where they deaggregate PA v6, and yeah...

JAN ZORZ: The queue is closed. Thank you very much.

(Applause)

MOIN RAHMAN: Our next presentation is L4S, by Werner from Nokia.

WERNER COOMANS: Hello, excited to be here, it's my first RIPE meeting. It's also the first time I am being sub‑titled which is actually a good thing because I tend to talk fast.

STENOGRAPHER: Slow down so!

WERNER COOMANS: I am here to talk about L4S, which is a new technology enabling, I changed the title a bit, simplification and scalability in how to support low latency services. It's being standardised at the IETF. Why are we talking about L4S? We did a lot of the pioneering research behind it and are driving the standardisation in the IETF. We are on a bit of a mission to get the word out, because L4S has a lot of potential, but there is very little awareness in the industry about L4S at this point. And we think it can have a significant impact on the future of the Internet and on how low latency can be supported in a scalable way across the Internet.

L4S, what does it stand for? Low latency, low loss, and scalable throughput. If you think about latency, it's not a single number. Latency is actually a distribution, a statistical distribution. It has a minimum value determined by the propagation delay; that's being solved currently by bringing the compute closer to the end user. There is an interface delay determined by the feedback and physical layer implementations of technologies like Ethernet, like PON, like 5G, and this is continuously being decreased by introducing new technology standards. Lastly, there is the queueing delay, also sometimes called the tail latency. This is what L4S is tackling. So it's about enabling a low latency, but also a consistently low latency, so a really low jitter.

And this is actually the biggest source of latency variation in the network. At a high level, the goal of L4S is to reduce what we call working latency. Working latency is the latency that you experience when the network is not idle: when you are, for example, doing video conferencing while your neighbour is downloading the latest PlayStation game, or while your son is doing that in the same household, creating congestion on the network. An easy test you can do for yourself at home is to run a speed test in parallel with a ping. This is not perfect, ping is not prioritised in the same way, but the results will not change significantly. If you look at the ping latency over time during the speed test, first you will get a nice low value; 20 milliseconds is a reasonable idle network latency. Once you start congesting the network, and this was actually measured on a commercial 4G network, you see latencies easily exceeding 1 second popping up.
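
The speed-test-plus-ping experiment reduces to comparing ping samples from the idle and the loaded network. A small sketch of the bookkeeping, with made-up sample values that mirror the numbers in the talk (20 ms idle, spikes past one second on a loaded 4G link):

```python
import statistics

# Compare idle-network ping samples with samples taken during a speed test.
# All latency values below are invented to mirror the talk's 4G example.

def working_latency_report(idle_ms, loaded_ms):
    loaded_sorted = sorted(loaded_ms)
    return {
        "idle_median": statistics.median(idle_ms),
        "loaded_median": statistics.median(loaded_ms),
        # crude p99: index into the sorted samples
        "loaded_p99": loaded_sorted[int(0.99 * (len(loaded_sorted) - 1))],
    }

idle = [19, 20, 21, 20, 22]                                    # quiet link: ~20 ms
loaded = [180, 450, 900, 1200, 300, 700, 1100, 250, 950, 400]  # during speed test
report = working_latency_report(idle, loaded)
print(report)  # working latency is more than an order of magnitude above idle
```

The gap between `idle_median` and `loaded_p99` is exactly the working latency that L4S targets.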

So, L4S is about reducing that latency to the idle network latency, no matter what the load is on the network.

So, latency, okay, it's a single number. Let's translate it to a figure of merit that you can relate to. This was done by Apple recently. Apple is quite a front runner here. They actually support L4S on all iOS and macOS devices; it can be enabled by a software switch. You see here an illustration in terms of video rendering for a FaceTime application, so video conferencing. You see on the left the video stall percentage with legacy technology versus L4S technology, and you see that L4S can completely eliminate video stalls in video conferencing applications.

Another figure of merit is received frames per second, which now sticks to the maximum value of 30 frames per second when activating L4S support.

A lot of companies are active on the application and operating system side: Netflix, Google, GeForce Now cloud gaming, for example, just to name a few of the bigger names.

So, as you have seen, the names I have mentioned are application or OS vendors, and the reason is that L4S is a two-part solution. One part is within the application, which uses a new rate adaptation algorithm, a new congestion control algorithm, that allows fine-grained rate control from values as low as 100 kilobits per second up to infinity, even for very, very low round trip times.

The second part is within the network. There you have to implement an immediate marking based rate control, meaning it doesn't leverage any packet drops; it uses ECN marking to signal congestion, and that signalling can be done instantly. Unlike with classic active queue management, there is no smoothing in the network; the smoothing actually happens in the application.

So there is no need to build up a queue in the network before marking a packet.

Secondly, it includes mechanisms for coexistence and compatibility with non-L4S traffic, so you don't starve any non-L4S traffic on the network, and you do that using the marking capability to control the data rate of L4S. An example of that is the dual queue active queue management mechanism based on PI squared, which is described in RFC 9332.

There are a couple of dual queue mechanisms.
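
The coupling idea behind the DualQ Coupled AQM of RFC 9332 can be sketched numerically: one base probability p' drives both queues, the classic queue dropping with p' squared (which fits the 1/sqrt(p) rate law of Reno/CUBIC) and the L4S queue ECN-marking with k times p' (which fits the 1/p law of scalable congestion controls; k defaults to 2). This illustrates the coupling law only; it is not the RFC's reference pseudocode:

```python
# DualQ coupling sketch (after RFC 9332): the squared classic probability
# and the linear L4S probability make flows in the two queues converge to
# comparable throughput despite their different congestion-control laws.

K = 2.0  # coupling factor, the RFC's default

def dualq_probabilities(p_prime: float):
    p_classic_drop = p_prime ** 2          # classic queue: drop with p'^2
    p_l4s_mark = min(K * p_prime, 1.0)     # L4S queue: ECN-mark with k*p'
    return p_classic_drop, p_l4s_mark

for p in (0.01, 0.05, 0.10):
    drop, mark = dualq_probabilities(p)
    print(f"p'={p:.2f}  classic drop={drop:.4f}  L4S mark={mark:.2f}")
```

At small p' the L4S queue is marked far more often than the classic queue is made to drop, which is what allows frequent, fine-grained congestion signalling without hurting classic flows.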

So, if you think about low latency, the first solution you think of is: let's implement an active queue management mechanism. There are a few issues with that. If you represent the source by a water tap controlling the data rate, and the bottleneck by a funnel, you get this picture. You can use congestion controls like CUBIC or BBR, and they react to signals: packet loss, or delay measurements in BBR's case. If you don't have active queue management, your reaction will only come when the buffer is overflowing.

When you implement active queue management, what do you do? You introduce a controlled loss: you set a lower delay target, so you drain your queue before it overflows. However, even with active queue management and a classic congestion control mechanism, you still need a significant queue, for different reasons. One reason is that you need it to cover for data rate variations; you can see that here. The second reason is that even delay based congestion controls still need a queue to have valid measurements of the delay. These measurements don't work if there is no queue in the network.

And then, thirdly, a bigger queue also has a big benefit in improving RTT fairness. If you have a single bottleneck shared by different flows with different round trip times, you tend to get very unfair capacity distributions. The bigger the queue, the fairer your capacity sharing is going to become.

Then, last but not least, you also have a fundamental trade-off. If you lower the delay target even more, you enter a regime where you are imposing so much loss on the application flows that you have a big impact on the overall loss rate.

So this is why classic AQMs struggle today with enabling a low latency. Of course, a solution you might think of for the loss is to use classic ECN. As you probably know, that didn't really work out. The reason is that the implementation of classic ECN is still a single queue mechanism where ECN traffic is mixed with classic loss based traffic: the reaction to a loss or a mark was made identical, and the rate of marking and the rate of dropping packets were made identical, losing any latency benefits.

A better evolution in terms of leveraging ECN was data centre TCP. Data centre TCP is a very powerful mechanism that works within data centres, using a new congestion control and combining it with what we call an immediate active queue management: an active queue management with a very, very low marking threshold. That enables a very fast and frequent rate of congestion signalling, allowing you to control the data rate of your flows very accurately, and it enables a combination of very smooth, high throughput flows that have a continuously low latency.

However, the problem is, if you take data centre TCP and put it out on the Internet, it doesn't work, for several reasons. One is that it will starve out any non data centre TCP traffic, not being compatible with loss based congestion control. Also, data centre TCP works in a data centre and uses a queue threshold that is larger than the base round-trip time; base round-trip times in data centres are very, very small. Base round trip times on the Internet are much bigger, and on the Internet we'd like to use a threshold that is lower than that Internet base round-trip time. So there were reasons to adapt the data centre TCP congestion control as well. That's what L4S brought to the fore. One part is coexistence and compatibility mechanisms between non-L4S and L4S traffic, such as the dual PI squared coupled AQM. The other part is a lot of changes to the congestion control, which is called Prague, after the location of the IETF meeting where it was agreed. It introduced changes such as source pacing of the traffic flow rather than ACK based pacing, and limits on burst size. Simply because, when you use a queue threshold that is larger than the base round-trip time, the impact of those is not visible in the marking; but once you deploy on the Internet and want that threshold lower than the base RTT, you need to tackle those to make it work.

Thirdly, and this is not an exhaustive list, Prague also introduced RTT independence, meaning that your flow throughput, when sharing a common bottleneck, is made more equal, more fair. And it also increases the inertia of low latency flows using L4S. If you want to have low latency, it's actually beneficial to have a slightly larger inertia, especially in dynamic link conditions or with a varying number of flows.

So, the Prague congestion control: this is a picture of how it works. It couples the data rate to the marking probability in a one to one relationship, with no dependence on the round trip time if it's lower than 25 milliseconds, and moves along this formula. This is a log-log plot, by the way. If you have, for example, 1% of packets being marked on your flow, it means your flow will have a data rate of 100 megabits per second. It's a one to one relationship.

If you look at these lines here, what it means is that if you move along the diagonal line, you have a constant rate of feedback of 2 ECN marks every 25 milliseconds. It's constant, up to date information. At very low rates, changes were also made to reduce the packet size, to keep that pace of 2 marks every 25 milliseconds up.

So the end result is that you have a data rate that can be very finely controlled, from 100 kilobits all the way up to infinity.
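
The slide's relationship can be reproduced with a couple of lines. The scalable-congestion-control steady state is rate = k/p; the constant k = 1 Mbit/s and the 1500-byte packet size below are assumptions chosen to match the two numbers quoted in the talk (1% marking at 100 Mbit/s, and a feedback pace of roughly 2 marks per 25 ms), not Prague implementation constants:

```python
# Scalable-CC steady state as shown on the slide: rate inversely
# proportional to marking probability. k and the packet size are
# illustrative assumptions.

PACKET_BITS = 1500 * 8  # assume full-size 1500-byte packets

def rate_mbps(mark_probability: float) -> float:
    return 1.0 / mark_probability  # k = 1 Mbit/s, so 1% marks -> 100 Mbit/s

def marks_per_25ms(mark_probability: float) -> float:
    packets_per_sec = rate_mbps(mark_probability) * 1e6 / PACKET_BITS
    return packets_per_sec * mark_probability * 0.025

print(rate_mbps(0.01))        # 100 Mbit/s at 1% marking
print(marks_per_25ms(0.01))   # ~2 marks every 25 ms...
print(marks_per_25ms(0.001))  # ...and the same pace at 1 Gbit/s
```

The feedback pace comes out independent of the rate, which is the constant-feedback diagonal on the log-log plot.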

And L4S is an IP based mechanism, so it leverages the ECN bits within the IP header. This is an IPv4 header; perhaps I'd better use an IPv6 header for this crowd. Two of the four values are used for L4S. The first one is ECT(1), the 01 code point. If you set that as an application, you are telling the network: hey, I'm an L4S packet, if you have the capability to do so, please treat me as an L4S packet. Then you have the second code point, the congestion experienced code point, which flips the most significant bit. It allows the network to mark the L4S packet as having experienced congestion, which will make the application throttle its data rate accordingly.
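
The two code points live in the low two bits of the IPv4 TOS / IPv6 traffic class byte. A tiny sketch of the bit values (names per RFC 3168 and RFC 9331):

```python
# ECN field values in the low 2 bits of the TOS / traffic class byte.
NOT_ECT = 0b00  # not ECN-capable
ECT_0   = 0b10  # classic ECN-capable transport
ECT_1   = 0b01  # "treat me as L4S" (RFC 9331)
CE      = 0b11  # congestion experienced, set by the network

def is_l4s(tos_byte: int) -> bool:
    """An L4S packet carries ECT(1), or CE once it has been marked."""
    return tos_byte & 0b11 in (ECT_1, CE)

def mark_ce(tos_byte: int) -> int:
    """What an L4S AQM does instead of dropping: set the ECN bits to CE."""
    return tos_byte | CE

tos = 0x01                 # DSCP 0, ECT(1)
print(is_l4s(tos))         # True
print(hex(mark_ce(tos)))   # 0x3: same DSCP, ECN now CE
```

Marking ECT(1) to CE is exactly the "flip the most significant bit" step the talk describes: 01 becomes 11.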

Because it's an IP based mechanism, it's also a technology agnostic mechanism, meaning it's not coupled to a specific technology like 5G, not ITU specific; it can work over any technology. So we have a uniform mechanism for applications and their rate adaptivity across all communication technologies.

What does it require to support L4S in a network node? You need a couple of elements. The first element is being able to classify, or distinguish, L4S traffic from non-L4S. You can do that by directly looking at the IP ECN bits, or by using a proxy Layer 2 identifier to which the traffic has been remapped by another node in the network, for example a VLAN. Whatever you use, you need to be able to distinguish L4S from non-L4S. Why? Because you then need to put the L4S packets in a different queue. By putting them in a different queue you have isolation, and you avoid the latency build-up of the non-L4S applications having an impact on the queueing delay of the L4S applications. Thirdly, in the scheduler, you prioritise, to allow L4S packets to skip ahead of the non-L4S packets. All together, this is just simple prioritisation of L4S over non-L4S traffic.

The impact on non-L4S traffic is controlled by ECN marking: on the L4S queue, ECN marking is applied, which controls the rates and feeds that back, through the acknowledgments, to the application.
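
The node-side steps just listed (classify on the ECN bits, isolate in a separate queue, give conditional priority, and mark instead of drop) can be put together in a toy model. The queue depth threshold and the dict "packets" are invented; a real implementation would be something like the DualPI2 qdisc, not this:

```python
from collections import deque

# Toy dual-queue node: classify, isolate, prioritise, mark-don't-drop.
# The threshold of 3 packets is an invented stand-in for an immediate AQM.

ECT_1, CE = 0b01, 0b11
MARK_THRESHOLD = 3  # immediate AQM: mark as soon as a tiny queue builds

l4s_q, classic_q = deque(), deque()

def enqueue(packet: dict):
    if packet["ecn"] in (ECT_1, CE):        # 1. classify on the ECN bits
        if len(l4s_q) >= MARK_THRESHOLD:    # mark instead of dropping
            packet["ecn"] = CE
        l4s_q.append(packet)                # 2. isolate in a separate queue
    else:
        classic_q.append(packet)

def dequeue():
    if l4s_q:                               # 3. L4S may skip ahead
        return l4s_q.popleft()
    return classic_q.popleft() if classic_q else None

for i in range(5):
    enqueue({"id": i, "ecn": ECT_1 if i % 2 else 0})
print([dequeue()["id"] for _ in range(2)])  # [1, 3]: L4S packets go first
```

Note that the L4S queue never drops: congestion is signalled purely by rewriting ECT(1) to CE, which is what keeps both the loss and the queueing delay low.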

So, it uses scalable congestion control. It also allows scalability in how you offer low latency services. I have to admit this is actually a slide I copied from Deutsche Telekom; it's a nice illustration of the difference between managing low latency with L4S or with, for example, guaranteed bit rate services on radio networks. If you have a good channel, you use a fair proportion of the cell resources, and everything is okay. Now if you move to the cell edge, or you go indoors, your spectral efficiency goes down, and if you have a guaranteed bit rate your resource usage within the cell explodes, limiting the number of low latency users within that cell and also impacting the other users.

L4S takes a different approach. It also sustains the low latency, but it manages the throughput, enabling much more statistical multiplexing gain in your network.

Similarly, when you have a single bottleneck shared by multiple flows, they will share the bottleneck: with two flows they will each get half. With four flows, each flow converges to double the amount of marks and half of the throughput it had previously. So it's flow-fair capacity sharing within a single bottleneck.
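
The equilibrium described here can be illustrated with a little arithmetic, assuming a scalable congestion control whose steady-state marking probability is inversely proportional to the per-flow rate; the constant `k` below is purely illustrative.

```python
def per_flow_rate(capacity_mbps: float, n_flows: int) -> float:
    # flow-fair sharing of a single bottleneck
    return capacity_mbps / n_flows

def marking_fraction(rate_mbps: float, k: float = 2.0) -> float:
    # For a scalable congestion control the steady-state marking
    # probability goes as p = k / r; k is an illustrative constant.
    return k / rate_mbps

two_flows  = marking_fraction(per_flow_rate(100.0, 2))
four_flows = marking_fraction(per_flow_rate(100.0, 4))
# doubling the flows halves each flow's rate and doubles the marks
assert four_flows == 2 * two_flows
```

This is the equilibrium relation only; the dynamics that converge to it are what the congestion control algorithm provides.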

Supporting L4S in a network, then. Well, the first thing you need to make sure of is that ECN bits are not being bleached. Right. That's a big no-no. Although, if there is bleaching happening on the network, L4S will detect it and will fall back to traditional congestion control. If you want to benefit from low latency, ECN bleaching is something you want to remove from the network. Then, the biggest gains come from supporting L4S in the bottleneck nodes. Where are the biggest bottlenecks? The in-home network, Wi-Fi, or the access network, typically. That's where L4S support gains the most.

At the edge, it's also interesting to introduce per-user rate limits, to have a nice introduction mechanism and limit the amount of L4S traffic in your network. In the other parts, L4S support is a bit optional, and I have a plot to illustrate it. You have to look at your transport network and how it's behaving. Look at the jitter distribution: if you have an insignificant amount of jitter, well, you don't need to worry, L4S will go through without receiving any additional jitter; you don't need to do anything. If your jitter is significant but transient, not very consistent, you can get by with just prioritising L4S; that will solve all of the jitter issues. Only when it's significant and consistent, such as in the in-home network and the access network typically, do you have to both prioritise and support ECN marking.
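
The decision rule just described can be summarised as a small sketch; the labels are illustrative, and deciding what counts as "significant" or "consistent" jitter is left to the operator.

```python
def l4s_measures(jitter_significant: bool, jitter_consistent: bool) -> str:
    """Which L4S support a network segment needs, following the
    talk's decision rule.  Return values are illustrative labels."""
    if not jitter_significant:
        return "none"             # L4S passes through untouched
    if not jitter_consistent:
        return "prioritise"       # strict priority for L4S is enough
    return "prioritise+mark"      # e.g. home Wi-Fi or the access network
```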

Okay, what about if you have a Layer 2 network that is congested, should that be the case? You can still support L4S by doing a remapping between the L4S IP bits and the Layer 2 headers. The example here is for MPLS: you map the ECT(1) combination to a particular MPLS Traffic Class. That would suffice for prioritisation. If you also need to support marking, because it's very congested, you have to have a second Layer 2 value, which acts as a proxy for the Congestion Experienced marker in the ECN field.
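
A minimal sketch of such a remapping follows. The MPLS Traffic Class values used here are placeholders, since the talk does not name specific values.

```python
ECT_1, CE = 0b01, 0b11

# Placeholder Traffic Class values -- not from the talk.
L4S_TC, L4S_CE_TC, DEFAULT_TC = 5, 4, 0

def mpls_tc_for_ecn(ecn: int) -> int:
    """Map the IP ECN field onto an MPLS Traffic Class so a Layer 2
    node can prioritise (and, via the second proxy value, mark) L4S
    traffic without parsing the IP header."""
    if ecn == ECT_1:
        return L4S_TC      # L4S identifier -> low-latency TC
    if ecn == CE:
        return L4S_CE_TC   # proxy for Congestion Experienced
    return DEFAULT_TC      # classic traffic
```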

Okay, then we end with a couple of experimental results. This is actually an example showing how L4S behaves on a Wi-Fi access point, which was the first networking device to actually support it. In orange, on the top plot, you see the latency of a non-L4S flow which fills the buffer and then continues doing that. L4S keeps the latency consistently low. In the data rate you see a small impact there. In this case it's related to the interaction between the Wi-Fi MAC scheduler, which was a blocking scheduler, and the L4S marking. But there is a bit of a fundamental trade-off as well: if you think about having no queue in the network, that means that when the capacity is dynamic and it increases, you cannot fill it up immediately, because there are no queued packets that you can immediately send out. So in very dynamic conditions you do lose a bit of link utilisation. Unavoidably; it's a fundamental trade-off. But you can fill it up with best-effort traffic without any issue.

If you look at the CDF, this is really the proper distribution: the 99th percentile goes down from hundreds of milliseconds in this case to 12.5, which is limited by the Wi-Fi MAC layer, not by the queue delay. Another demonstration we have recently done is on an end-to-end fixed network, all of it connected and operating L4S. I'm showing here the cumulative distribution function. Without L4S, you see the overflow of buffers; we are fully congesting the network in all of these measurements. When you activate L4S, the L4S latency goes down to 10.4 milliseconds. That's a P99.99 percentile value.

Then, zooming in on the left there: this is actually still lower-bounded by the Wi-Fi MAC layer, which we were also trying to solve. We can now improve the Wi-Fi MAC and get a benefit there as well. If you take the LAN connection, you have an end-to-end one-way latency of less than 1 millisecond in a fully congested situation.

Let me end here by saying that, with L4S, an application now has the capability to choose between the classic way, maximising throughput and buffering in the network, which is very nice in dynamic conditions, or the L4S way, where you actually prefer lower latency over higher throughput if you need to choose between the two.

And it allows the two worlds to perfectly co-exist. An application can even mix the different traffic types within the application according to its needs, and they will fairly share common bottlenecks in terms of throughput.

With that, I'd like to end this talk. Thank you.

(Applause)

JAN ZORZ: Thank you very much. Who will be first?

AUDIENCE SPEAKER: Kemal, Cisco ThousandEyes. Thanks for your talk. As part of the talk you mentioned that in order to enable L4S, you would need to have an ECN proxy in your network, right? So the question becomes whether it's actually worthwhile adding additional infrastructure just to enable something that's not fully implemented yet, versus just enabling BBR, which already exists and is doing pretty much what you just described? Thank you very much.

WERNER COOMANS: There is only a need for a proxy if you have a congested Layer 2 network; then it would be beneficial. I consider that to be rare. And it's not doing the same thing as BBR. BBR cannot solve any self-inflicted latency, and it cannot behave well on wireless connections. We have experiments on Wi-Fi and wireless radio connections: because it's model based, and the model is actually a really bad fit for wireless connections, it cannot keep latency low. It goes to 100 milliseconds and above in congested situations. The only way to solve that is to revisit both the congestion control and the marking behaviour in the network, which is what L4S is doing.

AUDIENCE SPEAKER: Alexander Azimov. Thank you for your report. I find it very interesting as a technology, and I have a few questions. The first question: you said that with a growing number of flows there'll be a growing amount of marking. Does it mean that we need a device which measures the number of flows coming through it?

WERNER COOMANS: No, it happens autonomously. It means that when you have more flows trying to get a piece of the cake, you converge to a higher amount of marks. And basically, if you look at the curve, a higher amount of marks simply means a lower throughput per flow. The flows will share the throughput evenly, so you need to get to a higher amount of marks to have a lower throughput. That's the equilibrium situation, not the dynamics that lead to it.

AUDIENCE SPEAKER: Another question. As far as I remember, in the previous standardisation of ECN, both markings, 01 and 10, were just marking ECN-capable traffic. And there might be network devices in the world that will be flipping the bit because they do not support L4S; they are just legacy devices, and for some reason ECN is enabled on the queue that processes the traffic. Won't that affect this technology?

WERNER COOMANS: This was one of the reasons why it took so much time for the RFC to actually become an RFC, because of that issue. But it was deemed that there is very, very little deployment of actual classic ECN that would be affected by this. So, in the end, it was decided to reuse one of those two code points for L4S. If there were such issues, they would pop up in the near future; this is still experimental at this point.

AUDIENCE SPEAKER: Thank you.

AUDIENCE SPEAKER: Ben, BGP tools. As an end user, how would I get involved in L4S? How do I use it? Apologies if you explained this; I didn't fully understand the talk, I guess. Is there a TCP congestion control algorithm today that takes the L4S bits into account, or is this something that you would implement in, say, a QUIC stack or something like that?

WERNER COOMANS: There are multiple options. Either it is fully within the application, on top of a UDP socket, where you can implement your own, or you put it in the OS protocol stack, like Apple have done. For Linux you have TCP Prague, which also supports L4S. Basically you can choose; there are multiple options there.

JAN ZORZ: If there are no more questions...

AUDIENCE SPEAKER: Matthias. We are in a position where we are actually implementing such a thing, being a CPE vendor, and my question would be: when do you see adoption of this technology by the big hardware network processor vendors?

WERNER COOMANS: It's a good question. You need both application support as well as network support. Application support is getting there; we have Apple, NVIDIA, all supporting it. The second step will be when operators start implementing it. We are aware of some operators that actually have thoughts about commercialising L4S and monetising an L4S service. It's hard to look into the future, but the applications are there. Now it's the turn of the network operators to actually introduce support in a gradual way.

AUDIENCE SPEAKER: Maybe one more. Going back to your proof of concept: did you use your own Wi-Fi access point, a Nokia Wi-Fi access point? Can you share which hardware you used?

WERNER COOMANS: I can share that, yeah.

AUDIENCE SPEAKER: Can you share now?

WERNER COOMANS: Nokia wi‑fi beacons.

AUDIENCE SPEAKER: But there was no hardware offloading, I suppose, because the hardware offloading on those chips would probably cancel it out, we see now ‑‑

WERNER COOMANS: I can't go into the implementation details.

AUDIENCE SPEAKER: I figured. Thanks.

JAN ZORZ: We need to wrap it up. Thank you very much.

(Applause)
So, before we go into the lightning talks, let me check. Oh, the numbers are visible. For the lightning talks there will be ten minutes. We will count ten minutes, and please stick to the numbers, whoever it is.

LUCA SANI: Hello everyone. I am Luca. I work for Catchpoint, mostly on BGP and network measurement stuff.

So, this talk is a follow-up on a tool that we presented at the last RIPE meeting, which is Pietrasanta traceroute. Pietrasanta traceroute is an open source tool that we keep maintaining at Catchpoint. It is based on Dmitry Butskoy's Linux traceroute, on top of which we added several enhancements, mainly to gain speed, but also to introduce new features such as, for example, the possibility to run QUIC traceroute, traceroute within a TCP session, and so forth.

So, in this talk, I'll focus on a feature called ECN bleaching detection. I think from the previous talk it is pretty clear what ECN is, so I will be very quick about that. When the ECN mechanism is in place, a packet that experiences congestion, instead of being dropped, is marked and still delivered to its destination. The destination then signals back to the source that congestion happened, so that the source can adjust its rate accordingly.

Typically, though not necessarily, the feedback is implemented at transport level. For example, in TCP a dedicated flag called ECN Echo was introduced, by which the destination can communicate to the source that a packet was received with the Congestion Experienced mark.

As we know from the previous talk, ECN got renewed attention due to the L4S technology. In particular, the version of ECN feedback that is required is not the classic one, but a more accurate one. The accuracy is, again, implemented at transport level, and it means that the destination will be able to communicate to the source not just that a packet experienced congestion but, more precisely, the number of packets that were received with each value of ECN at IP level: 01, 10 or 11. In TCP there is a draft called Accurate ECN, so not yet a standard, in which a new flag on top of ECN Echo is added to the TCP header, the Accurate ECN flag; this flag, along with dedicated TCP options, can be used to report the ECN counters to the source for the feedback.

In QUIC, this mechanism should be supported by default, because QUIC ACK frames contain ECN counts, which are dedicated fields to report the number of packets received with each ECN value to the source.
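
A simplified model of such per-value counters is shown below; this is not a QUIC implementation, and the names are illustrative.

```python
from collections import Counter

ECT_1, ECT_0, CE = 0b01, 0b10, 0b11

def ecn_counts(received_ecn_fields):
    """Aggregate per-value ECN counters like the ones QUIC ACK frames
    carry back to the sender (a simplified model)."""
    c = Counter(received_ecn_fields)
    return {"ect1": c[ECT_1], "ect0": c[ECT_0], "ce": c[CE]}

# e.g. three L4S packets, one of which was marked CE in transit
ecn_counts([ECT_1, ECT_1, CE])  # {'ect1': 2, 'ect0': 0, 'ce': 1}
```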

So, as we know, one of the main problems in using ECN is that the ECN value of packets between source and destination may be altered, for reasons ranging from bugs to traffic engineering decisions. If the ECN value is getting altered, the mechanism cannot be used properly, and, for example, L4S cannot be used and deployed effectively.

So, we introduced the possibility to check the value of ECN at IP level, hop by hop, from the source to the destination. In particular, we check the value of ECN at the moment the probe expired on the hop. We can then compare the value of ECN that we sent with the value of ECN observed when the probe expired, so we can detect bleaching or any other kind of alteration, including, for example, congestion marking.

We also introduced the possibility to detect whether the destination transport layer supports the more accurate version of ECN feedback over the classic one. This is important because not all TCP stacks support the new, more accurate TCP feedback, and also because we observed that not all QUIC implementations actually support the QUIC counters correctly.

So, in order to report the ECN hop by hop, as I anticipated, we look into the IP header of the probe that was encapsulated into the ICMP Time Exceeded. On the left, you see the probe that was sent. This is a TCP traceroute probe, in which you can see that we set ECN equal to 1, and the TTL of the probe is 2, so after two hops it is expected to expire. The ICMP Time Exceeded is reported on the right. If we look at the header of the probe as it expired on the hop, we see that the ECN value is still 1. This means that from the source up to the hop where the probe expired, the ECN value was not altered.
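
The comparison described here can be sketched as follows. The policy of treating a CE mark on an ECN-capable probe as congestion rather than alteration is an assumption for illustration, as are the names.

```python
ECT_1, ECT_0, CE = 0b01, 0b10, 0b11

def altered(sent_ecn: int, quoted_ecn: int) -> bool:
    """Compare the ECN field set on the probe with the field quoted
    back inside the ICMP Time Exceeded.  A CE mark on an ECN-capable
    probe is congestion, not bleaching, so it is not flagged here."""
    if quoted_ecn == sent_ecn:
        return False
    if sent_ecn in (ECT_0, ECT_1) and quoted_ecn == CE:
        return False  # legitimate congestion marking
    return True  # bleached or otherwise rewritten
```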

These are two examples of traceroutes running from a source to a destination; both are TCP traceroutes. On the left you have an example of a traceroute where ECN bleaching is not happening. You can see that each hop from the source to the destination reports the value of ECN that we decided to use as input. Moreover, on the last hop you can see a combination of TCP flags that allows us to conclude that the destination supports the more accurate feedback of ECN, i.e. the Accurate ECN standard draft.

On the right, you see another TCP traceroute where ECN starts getting bleached from hop 13. As you can see, there is one hop, hop 13, after which they are all 0s. And the destination reports a simple SYN-ACK. So in this scenario ECN could not be used effectively, because there is bleaching between the source and the destination.
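
Locating the first bleaching hop, as in this example, amounts to a scan over the per-hop quoted ECN values; a sketch with hypothetical names follows.

```python
def first_bleached_hop(sent_ecn, quoted_per_hop):
    """Return the 1-based index of the first hop whose quoted ECN
    differs from what we sent.  None entries stand for hops that did
    not reply.  Illustrative: treats any change as bleaching."""
    for hop, quoted in enumerate(quoted_per_hop, start=1):
        if quoted is not None and quoted != sent_ecn:
            return hop
    return None

# like the right-hand traceroute: ECN 1 until hop 12, zeroes after
path = [1] * 12 + [0, 0, 0]
first_bleached_hop(1, path)  # -> 13
```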

So, leveraging this feature that we introduced in Pietrasanta traceroute, we decided to run a measurement campaign from our vantage points deployed all over the world, to understand how frequent and how widespread bleaching is on the Internet. Besides sheer curiosity, this can be useful to understand, for example, how prepared the network is to accommodate L4S. As we know, ECN is an essential requirement for L4S, so it is a necessary but not sufficient condition.

This is not a rigorous research work. For example, the results that I'm going to present are for sure biased by the vantage points that we selected in our network. We tried to be as fair and distributed as possible in choosing the vantage points, for example paying attention not to select vantage points with the same upstream, this kind of stuff, but still the measurement platform will be biased.

One thing I want to say: I ask your forgiveness, because the results for the moment are only related to IPv4 traceroutes. An IPv6 campaign is running, so stay tuned for more results.

So, this is the overall view of the bleaching. Of the total traceroutes that we ran, around 12 percent experienced bleaching at some point along the path. As you can see, the bleaching happens more or less all over the world. One thing I didn't say is that all the traceroutes we ran have source and destination in the same continent. We decided to do that because, if you use L4S for streaming or gaming, the content is likely to be provided locally.

The percentage that you see is the percentage of traceroutes starting from that continent that experienced bleaching, and the absolute number is the absolute number of them.

Since we are at the RIPE meeting, we focused a little bit on Europe. In this graph, on the left, you can see the percentage of traceroutes that bleached, starting from each country. So if you take a country, the percentage means that that share of traceroutes starting from that country, going to another country in Europe, experienced bleaching.

An interesting thing is that, among all the bleaching that we observed in Europe, the vast majority was due to a single autonomous system that introduced a lot of alterations in the ECN value. On the right, you can see the distribution of the number of bleaching events ‑‑ I am out of time. I just want to say that if you find this interesting, the tool is open source, so it can be deployed on other platforms, maybe RIPE Atlas, I don't know. Feel free to check out the code and reach out to me to set up maybe another campaign or a collaboration. Thank you.

(Applause)

JAN ZORZ: No time for questions. You can talk to him directly. We need to keep to the schedule because we have Ondrej now. Look at the numbers here, you have ten minutes.

ONDREJ FILIP: Hello everyone, my name is Ondrej Filip. I come from the Czech Republic, a southern neighbour of this beautiful country, and I think I have, I would say, an interesting input on the problematics of the IPv4/IPv6 transition from my government.

So, my government, the government of the Czech Republic, started to deal with IPv6 a long time ago, roughly 15 years ago. At that time they passed the first resolution which explicitly mentioned IPv6, and it said that all the equipment has to be compatible with IPv6, and that all governmental websites have to be reachable via IPv6. Honestly, at the time this was only so-so fulfilled. Anyway, because it was slowly forgotten, some five years later they passed a very similar resolution which extended this obligation beyond just IPv6 support to DNSSEC as well. So they said that all governmental domains have to be signed with DNSSEC, and also that if you have a public tender for hardware or services, it has to mention IPv6. So basically, with that, we should be ready, right? The only problem was that at that time they didn't include any monitoring or enforcement, nothing like that. So in reality, quite often the IPv6 part was ignored, and many tenders just forgot to include this clause. So, that was not a very satisfactory status.

The current government has a really strong focus on digitalisation, and they are really working hard to renew all governmental services on the Internet. So they passed another resolution that should improve the situation, especially in the area of DNS, and they said that they would like to unify all the services under the gov.cz domain. There were several reasons for that. There were DDoS attacks that specifically targeted the DNS of some of the governmental services, and the lesson they took from that was that infrastructure multiplied many times over is not sufficient. And there were also phishing attacks, and the problem was that citizens of the Czech Republic couldn't recognise which site is governmental and which is not. They wanted to clean up this mess and put all the e-mails and all the websites under gov.cz.

Also, the IPv6 and DNSSEC obligations were again reiterated.

So again an interesting move, but the last resolution came roughly four months ago, and it's probably the strongest and most interesting one. First of all, it addresses the retirement of IPv4 in favour of IPv6, because they figured out, mainly based on the RIPE NCC Internet Country Report for Central Europe, that the deployment of IPv6 is not optimal in the Czech Republic, and they wanted to change that. So they reminded everyone of all the previous resolutions: they said they are still valid but not really fulfilled, and we really want them to be fulfilled. But they also put a monitoring clause in, so the quality and status of the deployment will be monitored annually, starting next year. That's a novelty: it will be monitored and must be published in a report, so people will see how this resolution is fulfilled.

And the most important part of it: they also set a date when the governmental services will cease to be provided over IPv4. That's something I would say is revolutionary: my government set a date. It's June 6, 2032. It was chosen because it's exactly 20 years after the first IPv6 day, so 20 years after the start of IPv6; they believe that 20 years is enough to adopt a new technology. And of course we created a website and T-shirts and stuff like that, so you can check end of IPv4 dot CZ, where there is a countdown, and of course you can also download the resolution. But basically, that's the main news. We have a final date for IPv4 in my country, so I'm really looking forward to it, and we will see what happens. That's all. Thank you very much.

(Applause)

JAN ZORZ: See. Thank you Ondrej, now we have time for questions.

ONDREJ FILIP: And there are none.

AUDIENCE SPEAKER: Hello. Michelle. Question: did any commercial entities follow suit here? Because this is only government, so that's only part of it. Obviously, you would need your whole infrastructure to support IPv6, like all ISPs and so on. And then you could also see enterprises following suit and saying that we will also stop. Not right now, because we are not discussing stopping today; it's far in the future. For now it's more the push that everything that is not yet on IPv6 should get IPv6 now.

ONDREJ FILIP: There are two things, right. Commercial services, or the providers of websites and services like that, are not touched by this, because they can continue to operate as they do normally. This was mainly a message for ISPs, because not all ISPs in the country provide IPv6 support. And I must say, a lot of things are changing these days; especially the mobile operators are starting to offer IPv6 and so on. So we can see some change, but, you know, it's planned well ahead, so I don't expect anything rash. The main message for the ISPs is: do implement IPv6 as quickly as you can, yeah.

JAN ZORZ: Okay. Anyone else? Is there anything online?

MOIN RAHMAN: No.

JAN ZORZ: Apparently everything is clear. Now we have the v4 end date. Thank you.

Who is next?

MOIN RAHMAN: Libor Peltan.

LIBOR PELTAN: Caution, this will be about DNS, so, please don't hesitate to leave the room if you suddenly feel uncomfortable.

I am Libor Peltan from CZ.NIC, same as our boss Ondrej, and I am a developer of the open source DNS server called Knot DNS.

So, nowadays, zones are mostly signed by dedicated servers. It works the way that the unsigned version of the zone comes, via an incremental zone transfer, to the signer; the signer signs the zone, and the signed version goes again via incremental zone transfers to the public-facing secondaries. You might want to use multiple signers, not just one. There are two possibilities: either you run the multiple signers inside your own infrastructure, or you use separate providers for the DNSSEC signing. If you want to transition between those providers, you need to shortly become multi-provider, multisigner. This can be done automatically, but that is the scope of the MUSIC project, which is large and ambitious; I wish them all well, but they are still in development and have their own problems. This talk is about multisigner within your own company.

When configuring multisigner, you will mostly face things that can give you a headache, so this talk will give you some hints and warnings about what to do and what not to do when configuring a multisigner environment.

Some of those are common to all the set-ups that are possible, and some of those warnings are specific to particular types of setup.

One of the considerations is that when you configure multiple signers, the public-facing secondaries need to be configured to be able to obtain the incremental zone transfers from all the signers, because if one of your signers goes offline or stops processing updates, the secondaries will smoothly transition to a working one. But in the case of incremental zone transfers you really need to take care, because the versions of the signed zone will be different on each signer. The resource record signatures will differ between the version of the zone produced by one signer and another, and yet they can have the same SOA serial. So if a secondary downloads an incremental zone transfer from one signer and then from another and mixes them up, it will lead to a big mess.

So, there are multiple ways to mitigate and avoid this situation. Some of them can be configured on the side of the signers, some on the side of the secondaries, and the best approach is to combine them.
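
One possible secondary-side rule can be sketched as follows; this is an illustration of the idea, not the behaviour of Knot DNS or any other implementation.

```python
def transfer_decision(last_signer, current_serial,
                      offered_signer, offered_serial):
    """Never apply an incremental transfer coming from a different
    signer than the last one: its RRSIGs will not match the zone we
    hold, even when the SOA serial does.  Fall back to a full zone
    transfer instead."""
    if offered_serial <= current_serial:
        return "none"   # nothing newer on offer
    if offered_signer != last_signer:
        return "axfr"   # full transfer when switching signers
    return "ixfr"       # safe to apply the incremental diff
```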

I would like to also present to you some of the set‑ups that are possible within multisigner in your operation.

It mostly depends on how you want to manage your keys. There are several possibilities. The first is that the signing keys are shared among the signers. This means that all the signers have the same private keys. You can manage them on one of the signers and share them with the others, or manage them elsewhere and then import all the keys into all the signers. In any case, you need tools that are able to export, transport and import the private keys into all the signers. You need to repeat this process every time there is a key rollover, and you need to consider that this export and import takes some time, so the delay has to be accounted for in your key timings.

Another possibility is that all the signers share the key signing key, and each signer has its own zone signing key. This mode of operation really resembles the offline KSK mode, where you have one party that signs the zone with the zone signing key and another party that only signs the DNSKEY set with the key signing key. This works very similarly, or at least almost the same way, so the same tools can be utilised for this mode of operation. So if your software is able to operate with an offline KSK, it should be possible to configure this setup.

And the third possibility is that the keys are different: each signer uses its own keys. But in order to make DNSSEC work, you need to share the public keys between the signers, because the DNSKEY RRset must be the same on each signer and must contain all the keys that are used.

So, in this mode of operation, the signers need to share the public keys between them, which can be synchronised in various ways, but it has to be done first when you configure the setup and then upon every key rollover that happens at any of the signers.
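
The required synchronisation can be illustrated with a small sketch that computes the combined DNSKEY RRset and what each signer is still missing; the key names are placeholders.

```python
def combined_dnskey_rrset(per_signer_keys):
    """The published DNSKEY RRset must be identical on every signer
    and contain every signer's public keys.  Returns that union and,
    per signer, the keys it still has to import."""
    union = set()
    for keys in per_signer_keys.values():
        union |= set(keys)
    missing = {signer: sorted(union - set(keys))
               for signer, keys in per_signer_keys.items()}
    return sorted(union), missing

rrset, missing = combined_dnskey_rrset({
    "signer-a": ["KSK-a", "ZSK-a"],
    "signer-b": ["KSK-b", "ZSK-b"],
})
# rrset -> ['KSK-a', 'KSK-b', 'ZSK-a', 'ZSK-b']
# signer-a still needs KSK-b and ZSK-b, and vice versa
```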

Various ways can be used to transfer the public keys between the signers, but for DNS the native way is to use dynamic DNS. It can be configured so that when a signer updates a key, it sends a dynamic DNS update to all the other signers to inform them about the change. Or it sends the dynamic DNS update to the common primary, into the unsigned zone, so that as a result the unsigned version of the zone also contains the DNSKEY set. Either way has some downsides.

So, it may seem that multisigner is painful to operate, but I hope it's not, and it's possible to operate it smoothly. But you need to align your tools and not do things manually, because manual steps can break things. You should automate your processes and keep monitoring that everything is okay. Your tools have to be able to react to events, meaning events during rollovers. And rollover timings need to reflect the delay imposed by the synchronisation of either the private keys or just the public keys.

We have written some guidelines, which should help you configure this setup with Knot DNS, but I hope those guidelines are also applicable to whatever software you use to do this.

So, we have two and a half minutes for questions.

(Applause)

JAN ZORZ: Thank you very much. We have two and a half minutes for questions.

AUDIENCE SPEAKER: Hello, Matthias speaking, ISC. Thanks for bringing this to the attention of a larger audience. I think my main takeaway, as I was working on the multisigner BIND code, is that we have to change our assumptions about DNSSEC signing, because basically we have been thinking about a single signer, which means we own the key material; with multisigner, suddenly there is key material from other providers. At least in our code we previously made a lot of assumptions: this is our stuff, so we can do smart things, and suddenly there is a key from someone else that breaks all those assumptions. So it would be good, and I should be reading those guidelines, to see if we can pin down the difference in those assumptions between single signer and multisigner. It's not a question but a comment. Thanks for the presentation.

LIBOR PELTAN: I'll be happy to discuss further.

JAN ZORZ: Thank you. Any other questions, comments? Anything online? No. Do we have anyone else?

MOIN RAHMAN: The nomination for the PC has been closed, and the voting starts from 1600 hours today. We also have a BoF at 6 o'clock in this same room. And we also have the social event; the first bus for the event will leave at 8:30 from the entrance of the hotel. Thank you everyone, and enjoy your coffee.

JAN ZORZ: Please rate the talks. We at the Programme Committee would love to see good or bad ratings and the explanation of why something was well received or not. Thank you very much. There is coffee outside.

(Applause)

LIVE CAPTIONING BY
MARY McKEON, RMR, CRR, CBC
DUBLIN, IRELAND.