Plenary session
21 May 2024
11 a.m.

FRANZISKA LICHTBLAU: Welcome everyone. Welcome to the second part of the morning Plenary today. I am Franziska and this is my co‑host, Antonio, from the Programme Committee, and we will chair this session for you.

General remarks:
Please do us a favour and rate the talks, give us a comment; that way we know what you liked and what you disliked. You can still put yourself up for the Programme Committee; we have two seats open for election. You can put yourself up until 3:30 today, just send us a mail at pc [at] ripe [dot] net. And without further ado, I would like to remind you that if you go to the microphone, please state your name and affiliation. If you are unsure how to behave at the microphone, we have a Code of Conduct that will explain it to you. And with that, we will start with our first speaker.

Our first speaker is Joshua from the University of York, and he will show us a new tool to create metadata-rich Internet topology graphs to understand the Internet in a geopolitical world.

JOSHUA: I am here to speak a little bit about the architecture of routing in a geopolitical world. My supervisor is also here if you want to ask questions.

So, without any further ado. We have got a few different ways of looking at what the Internet looks like; I am sure some of you have seen one of these different versions of it. One is looking at the AS cores and how autonomous systems interact on the Internet. Another is looking at what part of the Internet is hosting how much content. And there are sorts of connectivity maps through time as well.

Some of these have some advantages; others have others. But importantly, why should we care? Here is an example: we all have our own priorities in what we want an Internet to look like. From a user perspective, that might be really low latency. From an operator perspective, that might be network routing and traffic patterns, so you know where content is going and how you can steer it. And also free peering is an economic incentive, so everyone is able to provide transit.

Arguably this might be, to some people, a kind of border-free, settlement-free and uncensored Internet, which of course we all know is not the reality.

So this is kind of touching on that, looking at the geopolitical element, within the context of Internet sovereignty, where people are trying to exercise what they would call a right to govern their networks to serve national interests. In the UK, it's the Online Safety Act; in other countries there are other implementations. There are other governance ideas being shared as well: a week ago, one such forum published its own outcomes and vision for how the Internet should be governed.

So, alongside that, there is also increased regulatory intervention. At each one of these content, platform and network levels, different countries are introducing different types of legislation. Across the EU there are some examples; here in Poland there is some legislation that goes to counteract some of the EU examples. In the UK we have got the new Online Safety Act. Everyone is kind of starting to build even more of a regulatory base for the Internet. And of course, that can't be implemented in a kind of borderless Internet world.

There are three questions I want to look at here. First of all, does the topology differ between countries? Do different countries' Internets look slightly different?
How does it differ? And then why might it differ? So, looking at whether that legislation might actually be the impactful element.

So, our work is building a tool to collate Internet topology data from a number of different sources and compile it into an Internet topology graph that we can analyse and do other things with. I have got a link on here that will take you to our lab website, and next week I'll release the source code for this; I need to clean it up before I do that.

So first of all, looking at the approach that we take, I'm going to talk you through what it does. First of all, it's taking Internet routing data from a variety of different sources: RouteViews from the University of Oregon, the RIPE NCC's RIS service, along with RIPEstat and a couple of other tools like Atlas. And then Packet Clearing House, which provides dump data from a number of routers at various Internet exchanges around the world; another quite rich source to expand what we can see.
And this is how the system works. We take in Internet routing data from these route collectors, and we take in metadata: things like registered country, registered location, who owns the resource and whether they own any other resources, and we process that in. We also look at geolocation data, so we can see where a given router is in the world, and that way we can start to detect ASes that span multiple places.

And then we process the data into a labelled Internet topology graph, and then we use this to produce output.
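The fusion step just described, combining observed AS paths and per-AS metadata into a labelled graph, can be sketched roughly as below. This is a minimal pure-Python sketch; the function and field names are illustrative and not the tool's actual API.

```python
def build_topology(as_paths, metadata):
    """Fuse observed AS paths into a labelled topology graph (sketch).

    as_paths: iterable of AS paths (lists of ASNs) from route collectors.
    metadata: dict mapping ASN -> attributes (registered country, owner, ...).
    Returns (nodes, edges): nodes maps ASN -> metadata; edges is a set of
    undirected AS-level adjacencies.
    """
    nodes, edges = {}, set()
    for path in as_paths:
        # Collapse AS-path prepending, e.g. [3320, 3320, 1299] -> [3320, 1299]
        dedup = [asn for i, asn in enumerate(path) if i == 0 or asn != path[i - 1]]
        for asn in dedup:
            nodes[asn] = metadata.get(asn, {})
        # Each consecutive pair on a path is an observed adjacency
        for a, b in zip(dedup, dedup[1:]):
            edges.add((min(a, b), max(a, b)))
    return nodes, edges
```

A real pipeline would of course parse MRT dumps and merge many collectors, but the node/edge model is the same.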

So, when it comes to capturing location, there are challenges. One of them is that the different registries capture that data under slightly different fields. And when it comes to networks spanning multiple locations, the different AS tags that come associated with that are also quite interesting.

Another part of this is looking at prefix geolocation. In our case we are using MaxMind, because it's a free source for our research. But it also comes with some cyclical dependencies: if they don't know where something is, they use the registry information anyway, which doesn't give us any additional richness.

Adding additional metadata to this: we take in the registered owner, but then we have to try and work out if that owner exists in another autonomous system record. Say, for instance, a company has merged with another one; there is an inconsistency in the data despite the company being the same, so we have to look at that. We are using things like PeeringDB to see if we can correlate the same owner across multiple registries.

Also, looking at the registered location: this is using standard country codes, and we are going to keep that consistent throughout. Whenever you see a country code, that's the ISO standard code. We are also looking at open source data to see if the State owns 50% or more of some of these companies that we detect within the data.
And then ultimately we get this as an output: a node being an autonomous system with a number and associated metadata, and an edge being an adjacency between ASes that we detect from things like AS paths and AS data from RIPE Atlas probes.

Here is a basic example: just a general node-to-node peering, so an AS to an AS.

So if we look at 2024, this is coming from the 1st May, and my timestamp start point was at midnight, so the start of the 1st May. We see 84,266 autonomous systems, and to give you a bit of a comparison with alternative data sources for the same time period, we have got how many ASes were observed within RIPE RIS, and we're higher than that because we have combined a number of different sources. And then, to try and give you a comparison of what kind of adjacencies we're deriving from that data, I put our data for 2024, which is a little bit higher, on the right-hand side here with the blue graph line, and then ours versus CAIDA in 2020, because that has the richest set of data. We have got a little bit of progress on where we were before; our topology is seeing a little bit more, and part of that is to do with the richness of the data sources we are combining to build a better dataset. In particular, we know that our data isn't quite as accurate for Africa, because there is little in the way of route collector probes in Africa, so if anyone can advocate for that, that would be helpful.
While we're seeing 84,266 ASes, there are 116,977 ASN assignments; that's the maximum number that could potentially be in the dataset, and these are only the ones we're seeing as active. Clearly we're not seeing everything.

And that's kind of an overview of how rich our data is.

And this is what we see for it. In this case we have got a CCDF plot of the node degree, with the maximum degree running along the bottom, and then vertically we have got the probability, on a log scale, of seeing that, shown across four different periods: 2010, 2015, 2020 and 2024. What's particularly interesting, though, if I look at this next slide, is the decrease in the medium-degree nodes between 2023 and 2024, and then seeing a higher number of high-degree nodes; this is talking about nodes with a degree of more than 10^3. And that's an interesting trend that we are starting to see: this medium number decreasing and the highest-degree nodes increasing.
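A degree CCDF like the one described can be computed directly from the per-node degrees. Below is a minimal sketch (no plotting, purely the empirical P(degree >= d); in practice one would plot it on log-log axes).

```python
def degree_ccdf(degrees):
    """Empirical CCDF: for each distinct degree d, the fraction of
    nodes whose degree is >= d."""
    n = len(degrees)
    s = sorted(degrees)
    ccdf = {}
    for i, d in enumerate(s):
        if d not in ccdf:
            # (n - i) nodes at or after position i have degree >= d
            ccdf[d] = (n - i) / n
    return ccdf
```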

Also, if we look at RPKI deployment on the same scale: we're taking RPKI deployment as meaning we can see at least one ROA object for the AS, which I appreciate is possibly not the best measure of it, but it's one way of looking at it. On here we have got all ASNs, that's the purple line going through, and we can see the number of medium-degree nodes where they have deployed RPKI is much higher, but at high-degree nodes there is possibly a little bit less on that front. And where we have seen no RPKI, we are seeing that mostly existing still within that medium-degree node space. So if you encounter a non-RPKI-deployed node, it is most likely to have a degree between 10^1 and 10^3, so 10 and 1,000.

If we look at the average neighbour degree, so this is taking the average degree of all of an AS's locally connected peers that we can see, this is another interesting plot, which shows that at the higher end, where the average neighbour node degree is right above 10^3, the number there has been increasing over the past four time periods, each five years apart.
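The average neighbour degree metric used here is simple to compute from an edge list; a minimal sketch:

```python
from collections import defaultdict

def average_neighbour_degree(edges):
    """For each node, the mean degree of its neighbours."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    # For every node, average the degree of each of its neighbours
    return {n: sum(len(adj[m]) for m in nbrs) / len(nbrs)
            for n, nbrs in adj.items()}
```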

So another interesting kind of key trend.

And then, this is a kind of crude way of looking at it, but the most influential nodes: where we talk about eigenvector centrality, how central to a network graph a node is, when it comes to autonomous systems this is the top collection of autonomous system providers that we can see based on our collected data. We know our data is slightly biased, because a lot of the RouteViews and RIPE RIS data is collected from route collectors peered with Hurricane Electric in particular; so where we see them being the most central in this graph, it's possibly not true. But what we can see is that the major providers that we would expect to see are appearing in that data as the most central.
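A centrality ranking like this can be approximated with, for example, eigenvector centrality via power iteration. Below is a minimal pure-Python sketch (a real analysis would typically use a graph library; the shift by the identity avoids oscillation on bipartite graphs):

```python
def eigenvector_centrality(edges, iters=200):
    """Approximate eigenvector centrality by shifted power iteration."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    x = {n: 1.0 for n in adj}
    for _ in range(iters):
        # (A + I) x: each node keeps its own score plus its neighbours'
        x_new = {n: x[n] + sum(x[m] for m in adj[n]) for n in adj}
        norm = max(x_new.values())
        x = {n: v / norm for n, v in x_new.items()}
    return x
```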

Then we talk about autonomous systems and peerings. The number of autonomous systems registered or observed has increased significantly since the start of our data collection period in 2005. The number of public inter-connections has also increased. But what is really quite interesting here is that the average path length has increased as well. Where a lot of academic literature talks about an average path length of 3.7, in this latest collection of data from 2024, and also similarly in 2023, we're seeing that path length being 12. There are a couple of possible explanations for that. One of them is that there is a problem with our data; I don't think that's true, because we're capturing the previous measurements up until about 2015 using our data as well. So there is something going on there, and I think it would be interesting to find out quite what it was: whether it was, as I'm going to suggest, down to censorship approaches, or whether it's just an increase in private peering and maybe we're not detecting this. But we're trying to mitigate that with traceroute entries as well; hopefully we will be picking up much of that.

So, the "does" element here: does topology differ between countries? I have got a force-directed plot of the Internet looking at May 2024, and I have allocated colours to a number of countries in this graph. You can see, over the top here where that green patch is emerging, that's the United States; they have got a strong domestic interconnected base. On the right-hand side, coming out, the blue, that would be the connectivity to Russia. Their approach to censorship is quite interesting: what we see is a high degree of interconnectivity with the rest of the world with a lot of nodes, but we still see a separation coming out where they have got a domestic network that can exist almost on its own. Then coming off on the left-hand side, you have got this kind of weird spring shape; that's Iran, and we can pick that up clearly on the graph. And China is actually so far off this graph we can't see it; it goes out on the bottom edge. It's an interesting way of seeing the interconnectivity of some countries. The United Kingdom, in particular, is right in the middle of this graph, so you can't even distinguish it, while some of these other countries are quite prominent.

Then the how. To look at this, we want to look at what a foreign neighbour is, so this is calculating it here. We have the registered location of these autonomous systems. So on here we have, say, "Not A Company LLC" within the UK; it relies on its upstream provider, "Its Net Provider Limited", which is in the UK; then, following that, it is connected to a foreign neighbour, in this case Hurricane Electric, which is in the US. That's how we are defining a foreign neighbour: Hurricane Electric would be the foreign neighbour of Its Net Provider Limited.
We look at unique upstream neighbours, which is the number of neighbours that are not from the domestic country and that we haven't yet seen as we move out at each step. We start from that first base, where we're in the UK, and move to its foreign neighbour; that's the first unique upstream connection. Beyond that, it's the set of neighbours connected to that which we haven't already seen. And this is where we get this graph from. I have got a number of different countries running along the bottom: Norway, China, Russia, Great Britain, India and the United States. And you can see there is a trend here with some of the countries where you'd expect higher censorship: in the first few steps we get a decrease in the number of unique neighbours we discover. Interestingly, we also get the same thing when we look at the United States, although it's not quite as pronounced. It's an interesting topological trend that the number of unique neighbours is decreasing after the first one of these steps.
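This unique-upstream-neighbour count amounts to a breadth-first walk outward from a country's domestic ASes, counting only newly seen foreign ASes at each step. A minimal sketch (names and data shapes are illustrative):

```python
def unique_foreign_per_step(adj, country_of, domestic_asns, home, max_steps=4):
    """Count newly seen foreign neighbours at each step outward
    from a country's domestic ASes."""
    seen = set(domestic_asns)
    frontier = set(domestic_asns)
    counts = []
    for _ in range(max_steps):
        nxt = set()
        for asn in frontier:
            for nbr in adj.get(asn, ()):
                # Only count neighbours we haven't seen that are foreign
                if nbr not in seen and country_of.get(nbr) != home:
                    nxt.add(nbr)
        seen |= nxt
        counts.append(len(nxt))
        frontier = nxt
    return counts
```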

And then visualising connectivity. I'm using a graph format here where, effectively, in the centre are the most important nodes, and as we move further out they become of lesser importance; they are not as critical.

When we're close to the centre, we are expecting those to be tier 1, tier 2 providers. So let's look at this for a collection of countries. Here we have got Iran. What's interesting here is that, looking at the topology of Iran and its first-degree foreign neighbours, we can see that where they are connected is almost on the edge, of lesser importance than the domestic connectivity of the country. I want you to keep that picture in mind while I show you a different one. This is Norway, and we see that in Norway a lot of their domestic connectivity is delivered on the outer tier, the nodes of lesser importance, and a lot of the connectivity they have to the rest of the Internet is delivered almost centrally in this graph, by the tier 1, tier 2 providers.

An interesting trend. And we can see this reflected through a number of countries. This is the UK here: we are seeing again that dense mass of a lot of different colours, a lot of different countries, being the most central part of the network, and on the outside we're looking at the domestic connectivity of the country, which is dependent on these international providers. Interesting.

So, looking at why this is a thing. I am going to suggest that potentially censorship is one of the items there: domestic regulation in each of these countries, and a way of implementing that is to have control of the connectivity going externally. In this case we're using the OONI data, the Open Observatory of Network Interference, and there are two different ways this detects censorship. One is explicit censorship: we are seeing block pages, or we can see that the DNS resolution is just not correct. And then we have got non-explicit ways, where we're looking at anomalies in the data that we're collecting; we need to mitigate a bit for that, so we look at the ratio of anomalies to normal, or good, traffic.

And then we're talking about relative upstream change. That is again that same phenomenon where we look at the number of neighbours going outwards; in this case we're not looking at unique neighbours, just the volume of ASes at each one of those steps, and we're going to look at the change, measured from the border: from the domestic nodes layer to the first degree of foreign nodes, and then the second degree.

If we do that, we have relative upstream change going up on the left-hand side here, and then the ratio of censored and anomalous to normal, or okay, traffic along the bottom. We can start to see a relationship as we move along, degree 1 being the first step and 2 the next step; we see that trend increase. I don't want to put too many dots on this, but you can start to see the trends emerging.

When we accentuate that, we look at cumulative downstream burden, which is the number of external ASes that the country is dependent on for its connectivity; dependent on, that is, for its connectivity to the outside world. If you are a first-degree node, for instance, you are going to carry a greater dependency if there are fewer others at that tier.

And here we're starting to see a correlation between the two: as the ratio of interference, or censored and anomalous traffic, to okay traffic increases, this connectivity burden is also increasing. So there is a kind of funneling effect happening, where nodes further away from the country's core are being depended upon.

And then this is the same metric, but in this case removing the duplication: if one company owns five of these nodes, then we're going to collapse that to being one company, because it's potentially got one rule set behind it. And we see a much clearer distinction. So what we're seeing here is that a higher level of censorship equates to a higher level of what we're going to call the funneling effect, where traffic going out from a country is being bottlenecked. What's also interesting is that where we're seeing high levels of funneling, you can see there are geographical restrictions as well. For instance, South Korea, where we have got a higher level of funneling, has neighbours that you can't connect to, like North Korea and China; their level of funneling is much higher because they are dependent on a small amount of connectivity to the outside world.
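The deduplication step described here, collapsing ASes under one registered owner into a single logical node because one company likely applies one rule set, can be sketched like this (owner names are hypothetical):

```python
from collections import defaultdict

def collapse_by_owner(asns, owner_of):
    """Group ASNs by registered owner; ASNs with no known owner
    stay as their own group."""
    groups = defaultdict(set)
    for asn in asns:
        groups[owner_of.get(asn, asn)].add(asn)
    return groups
```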

So, another interesting trend. It is also geopolitical: the previous one was the political angle, and this would be the geographic part.

In summary, what I have shown here is the basics of this Internet topology fusion tool that we're going to make open source in the next week. The volume of the highest-degree ASes is increasing over time, and the volume of the lowest-degree ASes is also growing, but in that mid-range we saw that between 2023 and 2024 the volume of ASes has actually decreased.

And also, that geopolitics is having a statistically significant impact on the shape of the topology, and this is a trend I would expect to continue growing as censorship in different countries is increasing. So we can see that that relationship exists.

And if you have any questions, I'd welcome those. And if you want to get in touch you have got my contact details and my supervisor's contact details as well. Thank you very much.


FRANZISKA LICHTBLAU: Do we have any questions in the room? I don't see anything. Do we have something in the chat? Oh, Tobias.

TOBIAS: So you have these interconnectivity graphs which showed Iran to be less connected. I was thinking, I would assume a similar picture if, for example, there was no Liberty Global in Iran, there was no Deutsche Telekom in Iran, and no Telefónica, no Orange, which in Europe usually have pretty state-boundary-crossing presences; like, you can get a subsidiary of Liberty Global in Germany, or Vodafone. Did you account for that?

JOSHUA: Yes, quite a bit. A lot of the data here is biased by the major peering connections with the route collectors; if I expanded the graph you would start to see these ones appearing. Telefónica in particular I have seen quite a lot in the data as being a kind of major interconnection provider. There is more work to be done, but hopefully if you can use this tool afterwards, then you'll also be able to detect the same phenomena you have described. At the moment we're looking at registered ASes, but the dataset itself does contain the countries that we see networks appearing in.

FRANZISKA LICHTBLAU: Thank you. We have a comment from the chat.

ANTONIO PRADO: Not from the chat, from myself; a question for you. I would like to know, about censorship, whether you counted HTTPS common-name errors in your statistics when a page is redirected to a landing page from the authority, for example.

JOSHUA: Yes, that would be explicit censorship, where the traffic is redirected, so it wouldn't be the page that would be used in the control test. In the OONI approach, they have a dataset of normal, expected pages, and if the result differs we can see that that's either going to be anomalous or an explicit block; in this case it would be explicit.


DESIREE: Thank you for presenting that tool here with us. A point of clarification: in your presentation, you talked about the length of a typical path going from 3.7 to 12. Could you just explore a little bit why you see that, and in which parts of the world the path length is actually getting longer?

JOSHUA: As to which parts of the world the paths are lengthy in: this is taking just a general average, it's not accounting for bias in certain areas. In particular, places like China, where we have a really strong domestic connectivity that is not dependent on the outside world at all, are skewing the data slightly, and the same with Iran, where we saw that jetting off to the side. Countries like that, with a really high degree of censorship, have topologies that are fundamentally quite different from those whose topologies are amalgamated with the rest of the world, and that is skewing the data slightly as well. So that's the main problem with it.

AUDIENCE SPEAKER: It's within the national boundaries, I see.


AUDIENCE SPEAKER: I have a question: have you counted a sudden change of upstreams? Like we observed in Ukraine, in regions that were occupied by Russia, the local providers were forced to change to Russian upstreams, so that the only traffic that can go out goes by Russian upstreams and gets censored.

JOSHUA: This does pick it up. I haven't looked at it, but it is something we have seen. There is a paper by Louis Pétiniaud that looks at that in much more detail, particularly in the context of Russia and Ukraine.


AUDIENCE SPEAKER: Peter Hessler. I have a partial, or potential, answer for why the AS path could have extended. It has to do with the BGP selection algorithm: first you check if it's valid, then you check the local preference, which is an operator-specific value, and then you check the AS path length. It's very common for a lot of medium and small networks, and even for large networks, to have a better local preference for peers versus for transit. And so that's a very easy way to see massive explosions of AS path length. One potential explanation for that.
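The selection order just described can be illustrated with a toy version of the decision process: validity, then local preference, then AS-path length. This is heavily simplified (real BGP has many more tie-breakers) and the field names are illustrative:

```python
def best_path(routes):
    """Pick the best route under a simplified BGP decision process:
    valid routes first, then highest local-pref, then shortest AS path."""
    return min(routes, key=lambda r: (
        0 if r["valid"] else 1,   # prefer valid routes
        -r["local_pref"],         # higher local preference wins
        len(r["as_path"]),        # shorter AS path wins last
    ))
```

With a higher local preference on the peer, the longer path still wins, which is exactly how preferring peers over transit can lengthen observed AS paths.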

JOSHUA: Yeah. I think I have seen a little bit of that in the data. Of course, it's very difficult to work out explicitly, from a no-knowledge perspective, what's happening within certain networks. I think it is notable that we have seen it only in very recent years, and I don't think that's necessarily the most recent trend, so I think there are possibly other factors as well. But that might be a contributing one too.

AUDIENCE SPEAKER: Sebastian. I am asking: did you consider looking at the EU as maybe one country, also to look at how it changes? Because it's a huge, economically interconnected block, and comparing it to the US would probably be fairer if you didn't consider all intra-EU peerings as foreign.

JOSHUA: If you remember the kind of massive splat graph early on: if we remove the EU from that as a whole, the splat graph doesn't change much, really. So, yes, we have looked at it. The EU as an interconnected region is arguably closer to that kind of utopian idea, where we see a high degree of international connectivity, but fundamentally the shape of the world's Internet isn't dependent on that.

FRANZISKA LICHTBLAU: Thank you very much.


FRANZISKA LICHTBLAU: Our next speaker is Tobias from the Max Planck Institute for Informatics, and here we will take a step back: he will talk to us about the state of general Internet measurement projects in academia and what can potentially be done to help them.

TOBIAS FIEBIG: Welcome to everyone. So, this is a bit in the context of a project called measurement dot network which is supported by the RIPE community fund and a couple of other members of the community for which I am really, really grateful.

And I think we should first quickly talk about what network measurements are. It's basically the thing we do, well, as academics. It's also kind of what you do as an operator; it's called monitoring, and you do it to figure out what your network does. We as academics don't have a network of our own, so we do that for the whole Internet. It's important for us because we get papers out of that, and for practitioners: if you run your own network, you want to know what it does, and if you are, for example, doing protocol development, you want to improve protocols.

You can do them active or passive. Passive is the easy part: you just listen in to things. The active part is where you take a stick and poke at things, and sometimes things are not happy for that to happen. That makes them more difficult.

Academia, another term used: that is basically the place where we all go to study, learn and research; academics, people like me, are the ones that are there to do the studying, teaching and learning.

You have different levels of education; in the European region it's usually bachelor, master, Ph.D. A bachelor's is where you show that you can do stuff when you are told what to do. A master's is that you can do stuff without being told what exactly to do. And then the Ph.D. is the pinnacle, because you demonstrate that you can actually come up with stuff yourself and do it; people usually do that after they finish their master's. The purpose of academia is that we want to find new knowledge and new technology, make the world a better place, and educate people to go out into the world and make the world a better place. And one other thing: we do have our own currency, which is research output: papers, renown and fame.

So why can I talk about academia? I have spent a lot of time there. I got a bachelor of science degree, a master of science degree in network engineering, and then actually did a Ph.D.; I used to be like a teacher, and then switched over to the MPI for Informatics.

And I also know a little bit about ops. A lot of people here have probably known me for somewhat longer than I have been known in academia. I basically always have been pushing packets around, setting up systems, running systems, infrastructure. My little hobby AS is, well, equipped with a larger setup than some professional ASes, and I think we will hear about personal ASNs later today.

And I kind of do both.

Disclaimer before we now go into the content of this talk: my work is not perfect, and I don't claim it to be. My work actually usually has a lot of bugs, and in practice we all tend to be just cooking with water. So these are mistakes everyone, including me, tends to make.

The examples in those slides are from my work, right; except like one or two, but those I think are fair, simply because I want to make the point that these things happen. This is a general, systemic thing. So also, please don't take it as "oh, those academics, they don't know what they are doing". Don't hate the people, hate the game.

Let me do what I can do best, which is stupid stuff.

Stupid stuff in this case looks like this. This is my mail setup being a bit more enthusiastic. You might notice it says SMTP is not a thing, and my MySQL is doing 400 queries per second; the mail server is a little bit busy. What I did is I put strace, or rather the OpenBSD equivalent, on it and figured out it's complaining because somebody is trying to log into my mail server with, well, characters that are not in the allowed character set. So the query fails, and the server is like: oh, query failed, maybe I should retry it, up to 400 times a second.

Now, this was most likely Internet background noise, but in a way it was somebody doing an active measurement against my system; my system was in a little way special, and then it toppled over. Let's do something else; let's do e-mail security. I mean, e-mail is kind of the thing which BGP people look at and say: well, at least I'm not doing something as complex as that. Granted. But it has some interesting things, and one of those is the Sender Policy Framework, where you can say which IP addresses are allowed to originate e-mails for your domain. One interesting feature it has is include. And if you can include something, you can often do recursion; and if you can do recursion, that quite quickly smells like denial of service. The RFC actually says that if you include things in SPF records, you should do at most ten lookups. And we all know that if the RFC says you should do things, all implementations out there will behave according to that. Yeah!
Anyway, you get really excited as a researcher, because I smell a paper; that's the thing that keeps you going. You want to see: is this really a problem? Because what sells a paper is large-scale impact. So what you do is you go to Stack Overflow, search "DNS server, Python" because you know Python, and copy-paste together a quick Python DNS server. You run a measurement by sending people into an eternal SPF tree, and it turns out their internal mail servers are actually recursing, and while doing that they don't really do that mail processing stuff any more, which might make users a bit unhappy.
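The lookup limit a compliant resolver should enforce (RFC 7208 caps SPF at 10 DNS-querying mechanisms) can be sketched like this; the servers that recursed forever were effectively missing the budget check. The DNS is stood in for by a plain dict here, and the names are illustrative:

```python
MAX_DNS_LOOKUPS = 10  # limit from RFC 7208

def count_spf_lookups(domain, spf_includes, budget=MAX_DNS_LOOKUPS):
    """Walk an SPF include chain, enforcing the RFC 7208 lookup limit.

    spf_includes: stand-in for DNS, mapping domain -> included domains.
    Returns the number of lookups used, or raises once the budget
    is exceeded (SPF 'permerror').
    """
    used = 0
    stack = list(spf_includes.get(domain, ()))
    while stack:
        used += 1
        if used > budget:
            raise RuntimeError("permerror: too many DNS lookups")
        inc = stack.pop()
        stack.extend(spf_includes.get(inc, ()))
    return used
```

Without the `used > budget` check, a self-including record such as `evil.example include: evil.example` loops forever, which is exactly the eternal-tree behaviour the measurement triggered.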

In addition, you recently moved institutions, and you are rerunning your experiments because your previous and current institution, this is from a collaborator, and I was involved in that work, both kind of outsource their IT. The delegations for your DNS kind of fell victim to that, so people don't know it's you doing those measurements. So you had a couple of mail servers falling over, and people couldn't reach you and didn't know who was kicking over the mail servers, or how.

So, after you have gathered your data, you are happy because you have a nice finding, and you approach an operator mailing list, and they are, well, not necessarily happy with what you did. And you are a bit confused, maybe also a bit hurt, as you actually get one of the stronger winds of operator opinion after toppling their systems.

You can also then think about running something more reliable instead. You try to get a mail server that won't have those bugs. You opt for one of the ready‑made Docker mail setups, and you just try to use that instead of your self‑written Python server. So you clone the git repositories, you run your docker‑compose, and you see this. You have a mail server running in a Docker container, you have a recursive resolver that does things, and you have your authoritative DNS. The thing is, DNS is not really the only thing. So all of a sudden your measurements might have this little bug, especially when packets get too big and the... gets attached to your RRset. Which is kind of not that good for the quality of your results, but you maybe just don't notice. Also, maybe your DNS resolver just stops resolving because your Docker stops. There is no realtime monitoring. Maybe the DNS didn't get delegated, or your recursive resolver is just a stub and forwards to one of the quad resolvers, which is kind of, well... I would now say everyone, but I think the better word would be "far too many people" are doing in general. So you don't even know, when you see something hitting your authoritative, whether it might have been an odd effect of your network measurement setup at all.

If you want to do security stuff, it gets more fun. We wanted to do some SSH‑related scanning. And the poor Ph.D. student had a very nasty colleague who happens to have his own personal AS ‑‑ I don't know who that was ‑‑ and the Ph.D. student was forced to test the measurement setup, and for some reason there were some hosts in that AS not being found. All packets are going out, like the SYN packets. The SYN ACK comes back, leaves the AS that is being scanned, but they never arrive at the scan box ‑‑ no clue why.

The reason was basically this. The scan server had eth0, which had 192.0.2.3/24 for management purposes, and then a bond with 192.0.2.42/24, and the scan traffic should go out there. To make that happen, there were rules installed that said: from 192.0.2.3/24 look up table 101, with a corresponding table sending it out eth0 because it's the management traffic, and the same for the .42. Does somebody see the issue?

Well, in the rule it says /24 as well. And because Linux, in this part of setting networking rules, does not require networks to actually be network addresses, the first rule matched the whole /24. And that led to a very weird interaction with the Cisco switch on the other side, which made half of the packets arrive at eth0 even though they should have come in via the bond, so we only saw half of the packets. Anyway, security, I said, and this is what you would see on the other side of our scans.
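The rule setup from the story can be sketched like this. The addresses, interface names and table numbers are illustrative (documentation prefixes), not the ones from the actual server.

```shell
# Reconstruction of the policy-routing mishap. Management on eth0,
# scan traffic on bond0, both numbered out of the same /24:
ip rule add from 192.0.2.3/24 lookup 101    # intended: management only
ip rule add from 192.0.2.42/24 lookup 102   # intended: scan traffic

# Linux does not require the prefix to be a network address, so the
# first rule is equivalent to "from 192.0.2.0/24" -- it also matches
# the scan source 192.0.2.42, and table 102 is never consulted.

# The fix: make host rules host rules.
ip rule add from 192.0.2.3/32 lookup 101
ip rule add from 192.0.2.42/32 lookup 102
```

`ip rule` silently masks the host bits, so the broken version gives no warning at configuration time.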

We are not actually attempting authentication. We are doing something that looks like authentication, but I assure you it's not. But it's still a great opportunity to make many non‑friends via the partially anonymous abuse reports, and we are actually reading those. So that is one thing ‑‑ I won't go into the detail. It's an even greater way to get the organisation's whole IPv4 prefix block‑listed. It's an even more amazing way because you are sending SYNs to all IP addresses within the Internet to find all the hidden middleboxes in the organisation's network. So usually, large scale active network measurements are a great way to get into contact with your IT department. They are usually not that happy about that, but it works.

In addition to that, the Internet just tends to be very, very complex and lore driven. And understanding how the network works is difficult. Just ask yourself: with whom do you peer, and with whom are you actually peering? And then ask the same question to your contracts department. And I am pretty sure most of you really don't want to ask that question to the contracts department.

Another thing is, like I said, I have my own little AS and I do have some form of AS cone under it, and if you analyse the data from network measurements, I for some reason provide upstream to two ISPs, one insurance or finance company and one media publishing company. I can assure you it's all personal ASNs, which we will be hearing about later.

At the same time, if you talk to researchers at conferences you often hear things like: peering agreements are generally established between the legal departments of two corporations and then communicated to IT for implementation.

And I mean that one wasn't even a joke, right. It's like, it's reasoning about a world you have never experienced.

Network protocols, on the other hand, are even more complex. So whoever put the S for simple into SMTP and SNMP also put the L for lightweight into LDAP. Whenever I figure out who that person is, I will have a strong chat with them. I asked whether it would be feasible for a person to write DNS software, after roughly one year of working on or with DNS, that can be safely run against the Internet without breaking anything. And the reply was: well, I wouldn't even trust myself to write new DNS software that does not break anything when thrown against the whole Internet. I mean, it's DNS. Well, at least it's not mail.

If you want to write reliable measurement software, you have to be aware of the Internet being full of corner cases and all the unwritten and undocumented lore‑based rules of the protocols of your choice, and you have to build something reliable, ideally reusing as much tested and working software as possible. To be able to do that, you have to be an experienced programmer that is able to just write really good software. I mean, many people are looking for that kind of person. We need version control, tests, a proper development process.

If you want to run these measurements ‑‑ writing the software is only one part ‑‑ you have to operate it, you have to run the measurements, you have to be an experienced system operator that can debug things like this little TCP issue from before. You have to know about all the tools involved in doing that, you have to monitor your stack. And you need historic monitoring to figure out: well, do I run into a bottleneck? Is it maybe just my disk performance I am measuring when I do these RPKI measurements? You also need realtime monitoring to know whether it still works. You really need an end‑to‑end understanding of the stack you are using to measure as well as of what you are measuring. And the whole thing needs to be self‑contained. So no outsourcing, no "yeah, we put that into AWS", because whenever you do that you introduce random unknown variables that might influence your measurements and might make for a very, very exciting thing you are analysing.

And all of that needs to be ethical. Right. There is one very fun thing about the Internet being made from duct tape and bubble gum: I think there are many things we know are issues, so we just don't talk about them. If researchers find them they are really, really excited, because that is a paper for sure.

You need to get ethics approval from ethics departments that often don't even know what the Internet is ‑‑ I'm exaggerating here a bit. You need to have proper attribution going, you need to be able to set reverse DNS ‑‑ negotiating that with an IT department can be a bit challenging. Ideally you want to adjust the Whois, you want to run a web server, you want to have 24/7 follow‑ups and you want to maintain a block list.

The Ph.D. student we need is all of that combined.

It's not a Ph.D. student. It's basically a whole IT department. And that collides with the reality of a Ph.D. That means four to eight years you are working on a topic, during which you should churn out four to six top tier papers. It must be new research advancing the field. It needs to be embedded in related work, meaning you have to read that academic related work. And you of course also have to do all these ‑‑ basically whole‑IT‑department ‑‑ things. And you do that after a bachelor's degree in the US or after a master's degree here. So in terms of industry experience, that's like juniors. Right? People joining the organisation ‑‑ you show them the ropes, you mentor them, teach them how the stuff works, you take them to the RIPE meeting at some stage, and they learn it. With the difference that there is a clear timeout attached here and that there are no seniors there, and ideally, if you look at four papers divided by four years, the first paper should already be under submission after roughly one year. And then the most important rule of all: people under pressure will do people things. You just take shortcuts when you are under pressure, when you feel like you are drowning in work, when you feel like you are trying to scale a mountain that is just too big for you.

And that ties in with something called the LPU, the least publishable unit. It's usually a lot easier to publish what is novel or flashy and has impact. I mean, we all probably had this "oh God, not again" moment when we saw yet another vulnerability coming along with its own logo.

The least publishable unit of "we spent six months on engineering" ‑‑ often used as a funny term in reviews ‑‑ "which was really, really challenging and we learned a lot, but it did not impact or change our results apart from making our results more reliable": the LPU of that is zero. You can actually get reviews which say: this was a lot of engineering effort, but where is the science?
In addition to that, academia is very, very competitive. You are constantly measured, and not in a nice way: you are measured on the number of papers, on the number of supervised students, the number of acquired grants, on the committee tasks you are doing, on the courses you are holding, on the quality of student evaluations. There are just endless metrics. And the truth is, any sufficiently evaluated metric is, for those being evaluated, indistinguishable from a game. And at the same time, there is always somebody who is better at everything at the same time than you.

So what you need is more papers, more diversification of your research interests, less going into depth on one thing ‑‑ just taking the shotgun approach and trying to get those numbers up. And sadly, some things you need to do to get that going are things you kind of only learn by doing them. Like running networks. Some experiences you simply cannot tell a person about; they need to have felt them. I mean, just the shock of the CLI freezing after you issued a command, and you are like: was that commit or commit confirm? And then you go to your car. I mean, those moments ‑‑ you can't tell people about them. They have to feel it.

And that is basically why measurement.network was born as an idea. The idea would be to change the game a little. The core idea: you have some infrastructure where active measurements can be run from, which is kind of good for general equity, because there are a lot of organisations, like universities, where especially junior people do not have access to that kind of resources ‑‑ because if you are a senior person who can go to the IT department and say "we need this for our research", that's easier than when you just started.

Ideally this should not be tied to any organisation with a publishing incentive. This is also why this is not a main funding stream of the MPI. It should be well known and blockable, so people can just discard it without being scared that they are currently blocking EC2 prefixes which then, two weeks later, might host a learning management system. It should take some basic ops tasks off the plate of researchers, like monitoring, base infrastructure, DNS, ensuring unfiltered BGP can be received. It should make things more accessible. It's easy for researchers to get an ASN and temporary prefixes. What's not easy for them is knowing that this is possible and knowing how to find and contact an LIR. Which sounds like: well, if you want to find an LIR, you just chat to one at the RIPE meeting. Like, no, no ‑‑ that world is somewhat removed for them. They just don't know.

And in addition, it should also support the scientific process, for example reviews, and it should loop in people who didn't know about things before some unhappy mailing list threads have to be started.

Currently the progress is that there is an ASN. And also another ASN, which is now also part of this project, which we will be talking about in the v6 Working Group. There are some IP addresses which you can block. Thankfully, some people upstream me, but it is heavily prepended, because the main upstreams are these ‑‑ and we heard about this yesterday.

There are a couple of Juniper routers being deployed, a couple of servers. Things are being set up. Some first measurements were run as test cases. And I'm now approaching the point where I do have a review system patched and configured, so most likely some of you will get annoying e‑mails from me trying to recruit you to adopt and mentor a couple of academics. They are really nice.

So, key takeaways:

Academics try their best to do good work. The realities of academia tend to stand in the way. There is far too little interaction between academia and people running systems. And measurement.network is a fun project aimed at making research more reliable, accessible and tied to practice. So consider reviewing, mentoring and contributing.

With that, one last note: the problem with engineers is that they need to be artisans and scientists at the same time, and universities just make scientists and the other education paths just make artisans. We need both!

FRANZISKA LICHTBLAU: I will start using my microphone privilege to jump in on one important thing. I have been through all of this and I made my fair share of dodgy measurement campaigns. So, one question for you: I strongly think everyone in this room should try to support your project, but do you have any hints for this community on how to actually interact with, support, and help researchers that come into our community and want to take part and want to improve their work? What should we do?

TOBIAS FIEBIG: What I believe would help is if researchers, when they start their Ph.D. in network measurement, had the time to spend one or two years first, as an intern or junior, in the industry, and see it. Because that is the most valuable thing the community could do. The problem is that academia is not leaving room for that.


BRIAN NISBET: So, this is fascinating. I love it. And obviously I can't speak for DFN, but I'm hoping this is something you can speak to NRENs about, and maybe speak to GEANT as well, in relation to getting the academic side ‑‑ because TNC, which is the NREN conference, is in, what, three weeks' time, and I would have loved to have seen this as at least a lightning talk at that event to bring awareness there, and I'm hoping you might look into that for next year. But have you had any conversations with NRENs or GEANT about any of this? Because it seems like a really good place to talk to people about it.

TOBIAS FIEBIG: You might have noticed the v4 prefix kind of looks like legacy space ‑‑ because it is ‑‑ so that came from the DFN, actually. I talked a bit to them, but I think one very important aspect of this project is to keep it a little bit away from the traditional paths of academia, and I want to hand it over to some non‑publication‑incentivised, non‑academia‑related governance function or body, like the MAT Working Group here or maybe in the IETF. Simply to protect it: A, so it's not my choice alone ‑‑ I don't want it to be my toy ‑‑ and B, to ensure that it doesn't ultimately fall into the same pathways we have had for the past 20, 30 years in academia, and that it remains something that independently helps, instead of just being another one of those we already have a lot of.

AUDIENCE SPEAKER: Peter Hessler, speaking as a private individual. You hinted at this in your talk, but I was curious: as you know, there are many standards to choose from. Some of them are truthful, some of them are blatant lies. Have you collected any resources to help academics understand which standards they can actually believe, which ones to take with a grain of salt, and which ones to ignore because here is the real information?

TOBIAS FIEBIG: You are asking for standards?

PETER HESSLER: Well, like, for example, in the IETF, the RFCs for BGP are relatively reliable, relatively well followed. The RFCs for HTTP and HTML less so, especially where most of that is done in another body. And then the standards for e‑mail, for example, are ‑‑ yeah, you know. Kind of helping researchers understand where those boundaries and restrictions are.

TOBIAS FIEBIG: I do a lot of work on e‑mail and would like to disagree, but can't. The point being: it depends, right. It's like what binds us to our colleagues in the legal department. You ask a lawyer a simple legal question and they say: well, it depends. And you ask an engineer an even simpler technical question and they are like: it depends ‑‑ and you get a 90‑minute lightning talk. So, I don't think that you can do this with a list. You need experience, you need the ability to evaluate these documents, assess how the industry lore has developed, the role of different bodies, the sensitivities of different players, why certain projects are moving to other bodies ‑‑ and that is something that, fresh out of school, you simply don't have. You need to acquire that skill.

ANTONIO PRADO: We have a comment from Sergei. I think it's related to the first part of your presentation.

"Running a UDP‑only DNS server, even for measurements, sounds kind of weird or too far away from real life."

TOBIAS FIEBIG: Yeah, it was a bad example, but I can assure you it's on the level of things you sometimes find.

FRANZISKA LICHTBLAU: We have all been there.

Okay. I don't see anyone at the microphone any more. Thank you.

Next up is Jan from Akamai, and he will talk to us about who actually owns all of this v4 space we keep talking about.

JAN SCHAUMANN: Who here speaks Polish?

STENOGRAPHER: I speak about 400 languages.

JAN SCHAUMANN: On my flight over here I asked what's important to know, and I came up with... the boy is eating an apple. The boy is eating bread and the man is eating an apple. You can tell we are already in Monty Python Hungarian phrasebook territory over here. But okay. So...
I was going to talk about centralisation in the ownership of IP blocks. And I am fully aware that I am the last thing standing between you and lunch, so bear with me.

I am from the Internet, I am here to help, and my idea of being helpful is generally to complain about things. I mean, to point out interesting findings, such as that RIPE NCC actually is a redundant acronym, since RIPE translates to "European IP networks", so we have the European IP Networks Network Coordination Centre.

Or, that this particular map that you have seen before is a lie. Which kind of gets me to the topic that I'm going to talk about, but before I do that, there is one more thing that you really need to know about me, which is that I have no idea what I'm doing.

STENOGRAPHER: Join the club!

JAN SCHAUMANN: I work at Akamai as an information security architect. I always wanted to pretend to be an architect. But that is information security; that is not networking. So my colleagues from Akamai ask: what is Jan doing up there? He doesn't do networking. That is oftentimes what I try to do: I try to throw myself into the uncomfortable situation and talk to people I don't normally talk to. So you get a little bit of an outsider's view here at RIPE, where you all have a lot more IPv4 expertise than me. Keep that in mind when you are asking questions and say "why didn't he just..." ‑‑ and it turns out the answer is: I don't know what I'm doing.

Let's get started. The year is 2035. The year of the Linux desktop. This is going to be it. IPv6 is next. We're going to get there as well, I promise you. And of course Amazon has just bought the last available /8, 255.0.0.0/8, and now owns all IPv4 space.

Now, how did we get here?
Well, I'm going to suggest that 2035 is perhaps even too far in the future. If you have looked around at Amazon, you'll find that they are already using the 240 network, class E, internally for routing. So that's going to be great fun when at some point we decide: are we actually going to be using this network? And Amazon says: you are going to break all of EC2 if you start announcing this somewhere else. So it will be fun.

I actually tend to think that what the future really wants is probably more IPv6, not IPv4, but as I said, maybe we'll get there by 2034.

So, how did we get here? We started out being told that IANA assigns IP space to the regional Internet registries, right. But of course, some people early on got a gift from Jon Postel, the original IANA: they got their own /8 at the beginning, right. And some organisations also got additional IP space. So the distribution was not completely fair, really.

If we're looking at IP space purely by IP count, we can see the division of space that we have here. The reserved part includes multicast and class E etc., as well as the other reserved networks. If we take those out and just look at available IP space, then we roughly see how IPv4 is allocated to the RIRs, like so. Which is not quite even.

Just for kicks, compare the IPv6 space, it's a little bit more even. IPv6 is a lot more boring in this regard. I'm just throwing it up here as an example.

IPv4 space ‑‑ we started running out, and then blocks were being traded and reallocated. So blocks were given from one RIR to the next, reassigned, reallocated, etc., so things are moving around. And people started buying IP space, right. So you can actually buy IP space, which comes in really handy if you happen to have a spare /9 lying around and want to make some money. You can sell that IP space.

And to my friends from Amazon here in the audience: nothing personal, there are other companies that do the same thing, right. Google also bought plenty of IP space, and other companies do as well.

So you see that IP space is being traded, sold, etc. And of course, really cool thing: AWS at some point decided these are worth some money, we can charge people for these.

And I started this whole talk basically by stumbling across the AWS IP ranges and going: wait a second! That's a lot of IPs for just one organisation to have, right. That's a fair bit of the number space that is available. And that's just AWS saying these are the ones that we're using here.

Which, by the way, at their charge of half a cent per hour per IP, adds up to a net value of 6.4 billion dollars. Not too shabby.
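As a back‑of‑envelope check of that figure: the $0.005/hour charge is AWS's public rate, but the address count below is my own assumption for illustration, reverse‑engineered to land near the quoted number.

```python
# Back-of-envelope check of the "$6.4 billion" figure. The address
# count (~146 million IPv4 addresses attributed to AWS) is an
# assumption for illustration; only the hourly charge is public.
ips = 146_000_000
hourly_charge = 0.005                      # USD per address per hour
yearly = ips * hourly_charge * 24 * 365    # revenue if all were billed
print(f"${yearly / 1e9:.1f}B per year")    # → $6.4B per year
```

Change the assumed address count and the headline number moves linearly, which is exactly why the size of the AWS ranges file is the interesting input.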

Anyway, again, IP space is being traded and traded and traded. And I was thinking to myself: how do we actually know who currently owns what IP space? Now, in theory, we know that we can find out this information. We have the means for this. But how can we look at this comprehensively? Can we find out, on a larger scale, which organisations own which particular parts of the IP space?
And so I went: I know, I am going to use Whois ‑‑ and you can tell that I already told you I don't know what I'm doing, right. Because Whois is great. Whois is neat because you can just look at this and, as a human, go: oh, I can make sense of this. This is great. But Whois has a different format depending on which server you end up on; it's all free text. After a while of processing this at larger scale you think to yourself: wait a second, this is not going to work out so well. So maybe Whois is not the right approach. Then you say: wait a second, I know, I am going to use the RIR statistics that they are publishing so nicely on the different FTP sites ‑‑ or HTTPS sites nowadays. This is what that looks like, and that is helpful: you have this file that tells you which IP a block starts with and then the count of IPs given for that particular assignment, including some geolocation information. And that is really useful. And it's great. But it doesn't tell me who the IP block is assigned to, right. So all I know is that there are these allocations, but I don't know who they are assigned to.
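Those statistics files are pipe‑separated lines of the form registry|cc|type|start|value|date|status; for IPv4 rows, "value" is the number of addresses in the block. A minimal per‑country counter might look like this (the sample rows are invented):

```python
# Minimal parser for the RIR "delegated" statistics format.
# Real files also carry a version header and summary lines, which
# this skips by checking field count and the type column.
from collections import Counter

def count_ipv4_by_country(lines):
    """Sum IPv4 address counts per country code, skipping non-data rows."""
    totals = Counter()
    for line in lines:
        fields = line.strip().split("|")
        if len(fields) < 7 or fields[2] != "ipv4":
            continue  # header, summary, asn or ipv6 row
        cc, value = fields[1], fields[4]
        totals[cc] += int(value)
    return totals

sample = [
    "ripencc|*|ipv4|*|12345|summary",                    # summary line
    "ripencc|DE|ipv4|192.0.2.0|256|19930901|allocated",  # invented rows
    "ripencc|DE|ipv4|198.51.100.0|256|19950101|assigned",
    "ripencc|NL|ipv4|203.0.113.0|256|20000101|allocated",
]
print(count_ipv4_by_country(sample))  # → Counter({'DE': 512, 'NL': 256})
```

This gives exactly the per‑country view used later in the talk ‑‑ but, as noted, the country code is all you get: no organisation names.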

So, like: ah, I know, I am going to use RDAP, because RDAP is defined in the RFCs. So many RFCs. And it uses JSON, so I can process this a lot better, right. And this is kind of what that looks like. And that is really useful. It has good information. You have got nicely structured data there with some of the information that we might be looking for. And so I went ahead and did the obvious thing: I thought, you know what, I'm going to start at 0.0.0.0 and go through all the IP space and hit RDAP for that. I'm not going to hit every IP address, but I can hit the first one, see what the CIDR is, jump to the next one and move on from there.
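The walk he describes can be sketched like this. The lookup function is injected so a toy in‑memory registry can stand in for real queries; the RDAP bootstrap URL named in the comment is an assumption about how a real version might fetch.

```python
# Sketch of "walk the address space via RDAP": query an IP, read the
# covering network from the answer, jump to the first address after it.
# A real lookup might GET https://rdap.org/ip/{ip}; here it is injected.
import ipaddress

def walk(lookup, stop=2**32):
    """Yield (cidr, name) for consecutive network objects from `lookup`."""
    addr = 0
    while addr < stop:
        net = lookup(str(ipaddress.ip_address(addr)))  # one RDAP query
        cidr = ipaddress.ip_network(net["cidr"])
        yield net["cidr"], net.get("name", "?")
        addr = int(cidr.broadcast_address) + 1  # first IP after the block

# Toy registry covering the first two /8s for demonstration:
fake = {"0.0.0.0": {"cidr": "0.0.0.0/8", "name": "IANA-LOCAL"},
        "1.0.0.0": {"cidr": "1.0.0.0/8", "name": "APNIC"}}
for cidr, name in walk(fake.__getitem__, stop=2 * 2**24):
    print(cidr, name)
```

One query per allocation instead of one per address is what makes the full sweep feasible ‑‑ roughly 300,000 queries for the allocations counted later, not four billion.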

And so the problem is, if you are dealing with multiple ways of looking at information, and if you are trying to look at sources of truth ‑‑ and I'm sure all of you know this if you have ever looked at your inventory database or the assets you have ‑‑ you know that there is the set of things that you have in the database. And then there is the set of things that you kind of know should be there. And then there is the set of things that you are actually seeing out there. And those three things at most intersect; they are not congruent. And that is a problem I have seen many, many times. So it basically means that, yeah, we have a three‑body problem right here. So, back to RDAP.

Yeah, there is a thing about RDAP. RDAP is great, but you have to deal with a couple of things. Sometimes you get a redirect to another RDAP server. Sometimes there is data in there, sometimes there isn't. Okay, you can deal with that. Redirects of course mean that sometimes you have redirect loops between the different RDAP servers. That is an issue. RDAP doesn't include the AS information that I might be looking for, but Whois has that information, so I have to, you know, mingle those together again. AFRINIC has API limits, but they won't tell you what they are. I asked them; they said: we're not going to tell you. I said: Mmm, okay. So I hit them, I get a 429 back, which is good, but there is no Retry‑After. So I have to guess.

The Brazilian RDAP server has an API limit, and they tell you what it is: one request per 20 seconds. After that, you get a 403, and you are on that list for a long time. So you can't do this, you know, as conveniently, so you have to rotate source IPs and get the information that way, which isn't in their spirit either. Then you have all sorts of other issues that come after that, right. So, RDAP is wonderful, has a lot of useful information, but at the end of the day it's the same kind of garbage in and out as Whois, only structured in JSON. Which, you know, is progress. It's getting a little bit better.
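When a server rate‑limits you with a 429 but no Retry‑After header, the usual guess is exponential backoff with jitter. A minimal sketch ‑‑ the fetch callable and status/body shape are stand‑ins for whatever HTTP client you actually use:

```python
# Exponential backoff with jitter for an API that returns 429 without
# a Retry-After header. `fetch(url)` is assumed to return (status, body).
import random
import time

def get_with_backoff(fetch, url, tries=6, base=1.0, cap=60.0,
                     sleep=time.sleep):
    """Retry on 429, doubling the maximum wait each attempt."""
    for attempt in range(tries):
        status, body = fetch(url)
        if status != 429:
            return status, body
        # No Retry-After to honour, so guess: up to 1s, 2s, 4s, ... capped.
        sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    raise RuntimeError(f"still rate-limited after {tries} tries: {url}")

# Simulated server that rejects the first two requests:
calls = iter([(429, ""), (429, ""), (200, "{}")])
status, body = get_with_backoff(lambda u: next(calls), "/ip/203.0.113.0",
                                sleep=lambda s: None)
print(status)  # → 200
```

The jitter matters when several scan workers share one limit: without it they all come back at the same instant and get throttled again together.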

Anyway ‑‑ and yes, this actually now says "my hovercraft is full of eels", because this is how I feel when I'm either looking at this data or I'm in a meeting like this, which I normally tend to avoid. So now I have this data and I'm starting to analyse it and look at it, and I said: okay, let's see ‑‑ whose CIDR is it anyway?
So I collected all this information and looked at how many CIDRs are allocated for the given countries. I came up with 238 countries and roughly 300,000 different allocations. "Countries", of course, is relative. The UN counts something like 196. We have over 300 ccTLDs. But all right: 238 countries ‑‑ two‑letter codes, anyway.

Just by counting how many CIDRs are allocated, these are the top ten. Russia leads this, Germany is second, Brazil is third, and so on.

But that is just how many CIDRs are allocated. And each could be a /32, could be a /23, could be a /8. So it's not really meaningful information, right. It just tells you how many there are.

I thought: okay, let's go by IPs, counting who actually has the most IPs. You find that the US and China are the first two, with quite a few of them, and then the rest of the top ten show up approximately like this. But notice that the US over here, with 1.6 billion IPs, covers about 43 percent of all available IP space. The top ten account for 75% of all IP space, and the top two for over 50%. So that's, you know, again not evenly distributed across the different markets.

I broke this down by regional registry, but first IPv6 ‑‑ a lot less interesting because we have so many of those addresses, right. The only interesting thing here is I found that there are some countries that don't have any IPv6 allocation. The Central African Republic over here, or North Korea, and then we have Kosovo, Western Sahara, the Falkland Islands, and the French Southern and Antarctic territories. Okay, it is not so surprising that there are some countries that don't have any IPv6 space, apparently.

Anyway, so AFRINIC looks like this. And now think back to the original map that I showed you, saying it is a lie: AFRINIC apparently hands out IPs, or at least has IP allocations, for 127 observed countries. The first three are South Africa, Egypt and Morocco; the remaining top ten still reside in Africa. I am going through this fairly quickly ‑‑ the slides will be available, I'll put them up, you can look at them later yourself. Again you can see centralisation there, where allocations are concentrated in certain markets. For AFRINIC, this does not even completely overlap with the expected population charts or even economy sizes, which, as we will see, align a little bit more in some other markets. Like APNIC, which not surprisingly covers a lot of Asia. China will be one of the top players ‑‑ the first one ‑‑ and then Japan, and here are the other ones as the top ten.

There, again, the top three account for 75% of all the IP space that APNIC has.

ARIN obviously concentrates in the US. Interesting thing there: the US covers 95.4% of all of ARIN's space, which is 36% of all IP space. LACNIC is a little bit more concentrated in its assigned markets, so to speak. But there again, the top three account for the majority.

RIPE takes the first place with 164 countries. Which is quite a lot ‑‑ more than just Europe, as you might expect. But at least the top ten stay within the region, approximately.

All right. I then looked at: what types of IP space are these actually? We have different allocation types. And of course each RIR uses a different kind of string for this, because the RFC says type is a string. Okay. Cool. Now anybody can just pick their string. So AFRINIC uses seven different types, ARIN four, APNIC five, LACNIC ‑‑ I couldn't find documentation, but I observed five ‑‑ and RIPE has 11. So you have this kind of diverse set of data.

Breaking it down, it looks like so. And this over here is from APNIC, to give the distinction between what these different things might mean. But different things mean different things to different RIRs. So it's difficult to view this and say "this is this and this is that", because it might differ depending on the type.

So, for AFRINIC, we see these over here. And what I really care about is not so much the allocations to local registries, but more the allocations to the end user, right; this is kind of what I care about. And for AFRINIC, those are assigned PA and assigned PI, so we can focus on those for this particular aspect.

APNIC has these distributions, and officially uses "assigned non‑portable" for end users. And if you are wondering why there is such a large number of unknowns, it's because for JPNIC etc. it is null. So that's not helpful.

ARIN uses "assignment" for end users, although of course sometimes a direct allocation is also an end user, because now we have to decide what actually is an ISP, what is a local Internet registry ‑‑ is Amazon one of those, perhaps? Right, because you could argue Amazon, via EC2 and AWS ‑‑ there are other people there ‑‑ reallocates things. Is the bank that uses IP space for its ATM network something that reallocates things? From my perspective, they are really still just that same organisation, so I care about them as well.

LACNIC uses assigned and reassigned, and is in the position of having unallocated blocks in this case. And RIPE NCC uses these types.

But we also have legacy. Legacy, of course, is again another type that may or may not be end users.

I also looked at the prefix sizes. So, what is the most common allocation size that we find? /24, not surprisingly, takes the lead over there. People tell me that pie charts are stupid, so I should use a different kind of chart. /24 is the big one. Something that's interesting is that there is a large number of /32 assignments, but of course we also have a bunch of /8s still.

LACNIC is the only outlier, with /22 being the most common assignment, which I thought was interesting. And by comparison, IPv6 looks like so. And hey, /24 is also the most popular allocation size for IPv6. Of course that's like a gazillion IPs versus 256, but they share that in common.
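For the IPv4 tally above, the "value" field in the delegated stats is an address count, so the prefix length falls out as 32 minus log2 of that count. A rough sketch with illustrative counts (this assumes power-of-two block sizes, which legacy space does not always satisfy):

```python
import math
from collections import Counter

# Illustrative address counts: two /24s, a /22, a /32, and an /8.
# Real delegated-stats values are not always powers of two, so a
# production version would need to split odd-sized blocks first.
address_counts = [256, 256, 1024, 1, 16777216]
prefixes = Counter(32 - int(math.log2(n)) for n in address_counts)

print(prefixes.most_common(1))  # [(24, 2)]
```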

Then I went to net names. I said let's try to find out who they actually are, these different networks in different allocations; they look like so. I went, okay, now I can count them by IP address instead of frequency. I have to somehow map these strings to the actual organisations, which isn't necessarily easy to do, because the data is or is not in RDAP; that's the sort of data that we have. You can correlate to some degree. You can also identify AS numbers correlated to the different networks that you have, and again you'll notice that we have a many‑to‑many mapping here: multiple ASes, multiple net names and multiple identities. And of course different organisations show up multiple times because they are the same entity under different identifiers. If you want to correlate that, you have to know that Level 3 and CenturyLink are the same company nowadays; companies purchase each other and take over IP space, so it's not easy to look at this.
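The merging step could be sketched as collapsing net-name aliases onto one canonical organisation before counting IPs. The alias table below is hand-curated and hypothetical; in practice, as noted later, much of this was done manually:

```python
from collections import Counter

# Hypothetical hand-curated alias table mapping net-name variants onto
# one canonical organisation (e.g. Level 3 and CenturyLink are the same
# company under different identifiers).
ALIASES = {
    "LEVEL3": "CenturyLink",
    "CENTURYLINK": "CenturyLink",
}

def canonical(netname):
    """Map a raw net name to a canonical organisation name."""
    return ALIASES.get(netname.strip().upper(), netname)

# Illustrative (net name, IP count) records.
records = [("Level3", 1024), ("CenturyLink", 512), ("ExampleNet", 256)]

ip_counts = Counter()
for name, ips in records:
    ip_counts[canonical(name)] += ips

print(ip_counts["CenturyLink"])  # 1536
```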

I found about 63 distinct ASes; if you go by AS count, it looks like so. Again, these are assigned to these different companies. Something to note there is actually Stark Industries Solutions, like Stark Industries, like Iron Man's company; there is apparently a British VPS provider that uses that name.

Top ASes by IP count: again, it looks like so. You break that down and then you merge all of these things. You take the net names, the top IP counts, the AS dataset, and eventually you have to do it manually for the most part. And so what I then came up with, basically the top organisations and ASes combined by IP count, looks like so. The DOD takes the cake. Then we have Amazon, China Telecom, AT&T, Verizon, etc.
As chunks of all IP space, it's the DOD with 8.9% and so on.

What's interesting to me here is that there are two companies that are different from the others. All the other ones, except for the DOD, are telecoms, they are ISP/telecom providers; Microsoft and Amazon are the only ones that are not.

Rushing through this. Whose CIDR is it? It's really hard to tell. The distinction between an end user and what becomes an ISP or an LIR is really difficult to make. There are a lot of inconsistencies in the definitions across the RIRs, which again makes it difficult to get a global view of this data. And there is inconsistent data in RDAP when looking up information: sometimes you get back the first name of the person who is the contact, sometimes you get the company name, and so it's difficult to then correlate that.
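A small sketch of why the RDAP correlation is painful: registrant names live in jCard "fn" properties inside entity objects (RFC 9083), and depending on the registry you may get a person's name or an organisation name back. The response below is a hand-made, hypothetical example, not real registry output:

```python
# Hypothetical minimal extraction of a registrant name from an RDAP
# response. Entities carry names as jCard properties in "vcardArray";
# whether "fn" holds a person or a company varies by registry.
def registrant_name(rdap_obj):
    """Return the first 'fn' value found in any entity's vcardArray."""
    for entity in rdap_obj.get("entities", []):
        vcard = entity.get("vcardArray", ["vcard", []])
        for prop in vcard[1]:
            if prop[0] == "fn":
                return prop[3]
    return rdap_obj.get("name", "unknown")

# Hand-made sample response for illustration.
resp = {
    "name": "EXAMPLE-NET",
    "entities": [
        {"vcardArray": ["vcard", [["fn", {}, "text", "Jane Doe"]]]}
    ],
}
print(registrant_name(resp))  # Jane Doe
```

With no entities at all, the sketch falls back to the network's net name, which is exactly the kind of mixed result that makes correlation hard.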

The RIRs seem to be a lot less regional than you might initially expect, based on the trading of IP blocks. About 30% of all IPs appear to be managed by just 13 organisations, which for a global Internet is, you know, a different thing. It's less distributed than we might like.
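The concentration figure is just a top-N share over the per-organisation totals. A sketch with made-up numbers (not the actual counts from the talk):

```python
# Illustrative per-organisation IPv4 totals and a hypothetical pool
# size; the real analysis used the merged org/AS counts.
org_totals = {
    "OrgA": 200_000_000,
    "OrgB": 150_000_000,
    "OrgC": 50_000_000,
}
total_ipv4 = 1_000_000_000  # hypothetical size of the whole pool

# Share held by the top 2 organisations.
top_n = sorted(org_totals.values(), reverse=True)[:2]
share = sum(top_n) / total_ipv4

print(f"top 2 hold {share:.0%} of the space")  # top 2 hold 35% of the space
```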

The DOD still controls over 8% of all IPs, which is great fun, because you may remember that a few years ago they all of a sudden started to announce IP space that they hadn't previously announced, like the 11/8, and half the Internet lost its shit: crap, you are breaking me, because I was hogging your IP space internally. As I mentioned earlier, not a good idea to do that. Of course that space is still being announced nowadays. Amazon and Microsoft take the two top spots there as non‑ISPs. IPv6 is a lot more boring, and I think I like boring; we should do a lot more of that.

Anyway, that's my talk. Thank you. I know I rushed through this, but all this data will be made available, so you can dig into it. Before you go to lunch, we can answer questions if there is time. Thank you.


AUDIENCE SPEAKER: Jim Reid, just a random punter off the street. Interesting work, and I appreciate the problems you have because of these inconsistencies in how data is stored at the various RIRs and all the repositories. But I was wondering, when you were looking at it, if you had considered using RIPEstat as an initial starting point for your data mining exercises?

JAN SCHAUMANN: I did consider that. What I wanted to do was get an idea of what is allocated, not just what is being announced, and that was the main reason why I excluded that dataset; so as not to get another part of the elephant that I'm touching blindly there, right. There is a lot of IP space that is assigned but not being announced. I wanted to look at who actually holds that, and that is where the DOD and the other space come in. It's definitely worth looking at.
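The allocated-but-not-announced question can be sketched as a prefix subtraction, using the standard library's `address_exclude`. The prefixes below are illustrative, not the actual DOD announcements:

```python
import ipaddress

# Illustrative: one allocated /8 and one announced /16 inside it.
allocated = ipaddress.ip_network("11.0.0.0/8")
announced = [ipaddress.ip_network("11.0.0.0/16")]

# Subtract each announced prefix from the remaining "dark" blocks.
dark = [allocated]
for ann in announced:
    remaining = []
    for block in dark:
        if ann.subnet_of(block) and ann != block:
            # Carve the announcement out of the larger block.
            remaining.extend(block.address_exclude(ann))
        elif block.subnet_of(ann):
            pass  # block is fully announced, drop it
        else:
            remaining.append(block)
    dark = remaining

unannounced = sum(n.num_addresses for n in dark)
print(unannounced)  # 16711680  (a /8 minus a /16)
```

A real version would take the announced set from BGP table dumps and the allocated set from the delegated stats; the subtraction logic stays the same.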

JIM REID: One other point I would like to stress at this stage: although you are assigning addresses to countries based on their allocations, and most of those on registrations in the various RIR databases, it doesn't necessarily follow that those IP addresses are used in that location. You have to think very carefully about that; for example, multinational companies operate globally, so just seeing that a huge chunk of address space goes to country X doesn't mean that that space is being used there, or even is actually active there.

One other thing, sorry: please try not to talk about ownership of IP addresses; that can be a very difficult conversation.

AUDIENCE SPEAKER: So, really interesting talk, first of all. I am also very annoyed at the inconsistencies in RDAP, and I would encourage you, if you want to help try to fix this, to join the Database Working Group, at least for RIPE. Thank you.

AUDIENCE SPEAKER: Hello. You mentioned a list of two‑letter ISO codes that do not have IPv6, and you mentioned Kosovo among them, but Kosovo does not have an ISO code, so how is that?

JAN SCHAUMANN: Those IPv6 allocations were based on the RIR stats only, because I did not want to iterate through all of IPv6 and test it individually, right. So what the RIRs published did not include those country codes, and that's possibly for allocation reasons: they are grouping those into other locations, saying look, Kosovo does not get its own allocation because it's included in that region. I don't want to get into the geopolitics of that, but that might be something that comes into play there.

FRANZISKA LICHTBLAU: Thank you. Do we have anything from the remote participants? No. Okay. I can assure you, the pain of trying to parse RDAP will pass after a couple of years; you are going to be okay.

Everyone who is interested in assignment types and things like that: if I'm correctly informed, there will be a presentation in the Address Policy Working Group on that. And with that, thank you very much. Before everyone rushes off, I would like to remind you that we still have open lightning talk slots for Friday, so if you would like to submit something, submissions are open; we are happy to receive them. You can still nominate yourself for the Programme Committee until 3:30, and please remember to rate the talks and enjoy your lunch.