System Design for Google Maps
This could be a complete mismatch with what actually goes on inside Google Maps; it only shows how one can logically reason about, and roughly estimate, the scale of a big cloud service.
Google Maps shows imagery of the Earth and allows zooming in and out of these areas.
So a good starting point is to estimate how much area Google Maps covers and how much storage would be needed to cover all of it.
Earth has a surface area of around 500 million km².
About 70% of that is water; removing it along with other uninteresting areas, suppose only 10% actually needs to be mapped in detail: ~50 million km² = 5×10⁷ km².
Let's say we have one image for each 10 m × 10 m block, and each such image is 1 MB in size.
Assuming 2 road names and 1 building name per image, contextual metadata adds roughly 300 bytes per image.
We can ignore this contextual information compared with the 1 MB size of the image itself.
We will have 100 × 100 = 10,000 such blocks per km², so 10,000 MB (10 GB) of images per km².
For 5×10⁷ km², we will need 5×10¹¹ MB of space, which we round up to ~10¹² MB = 1000 PB.
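The arithmetic above can be sketched in a few lines; all inputs are this article's assumptions, not real Google figures:

```python
# Back-of-envelope sketch of the raw-imagery storage estimate.
EARTH_AREA_KM2 = 500e6     # total surface area of the Earth
MAPPED_FRACTION = 0.10     # assume only ~10% needs detailed mapping
BLOCK_SIDE_M = 10          # one image per 10 m x 10 m block
IMAGE_SIZE_MB = 1          # assumed size of each image

mapped_km2 = EARTH_AREA_KM2 * MAPPED_FRACTION           # 5e7 km^2
blocks_per_km2 = (1000 // BLOCK_SIDE_M) ** 2            # 100 x 100 = 10,000
total_mb = mapped_km2 * blocks_per_km2 * IMAGE_SIZE_MB  # 5e11 MB
total_pb = total_mb / 1e9                               # 1 PB = 1e9 MB
print(f"{total_pb:.0f} PB")                             # 500 PB, rounded up in the text to ~1000 PB
```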
We also need to store imagery at multiple zoom levels.
Assume a linear scale factor of 3.3 between adjacent levels: each coarser level then covers 3.3² ≈ 11 times the area per image, and so needs only about 1/11 of the image data of the level below it.
Summing this rapidly shrinking series over all the levels, and leaving some slack, we get an upper bound of 1500 PB.
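The original list of per-level scales is not reproduced here, so the following is a sketch under the stated assumption only: a geometric ladder where each coarser level needs 1/3.3² of the previous level's data.

```python
# Zoom-level overhead as a geometric series (assumption: linear factor 3.3
# between adjacent levels, hence an area/data factor of 3.3^2 ≈ 10.9).
BASE_PB = 1000         # storage at the most detailed level (from above)
ZOOM_FACTOR = 3.3      # assumed linear scale factor between levels

total = 0.0
level_pb = float(BASE_PB)
for _ in range(10):            # terms vanish quickly; 10 levels is plenty
    total += level_pb
    level_pb /= ZOOM_FACTOR ** 2
print(f"{total:.0f} PB")       # ~1101 PB, comfortably under the 1500 PB bound
```

The series converges to roughly 1100 PB, so 1500 PB is indeed a loose upper bound.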
If you think this is crazy and that there must be a mistake in the calculations, you may be right, but hundreds of petabytes of data is not uncommon these days.
For example, Yahoo! revealed in 2014 that it stores around 500 PB of data in over 40,000 servers!
Also read this interesting article about how the startup Backblaze bought hard drives when its customers were consuming 50 TB each day.
So we assume that 1500 PB of data is not a bizarre number and we can go ahead with our analysis.
Number of storage nodes required
Assuming a single machine can manage around 15 TB of storage, we would need 1500 PB / 15 TB = 100,000 database shards.
We would need replicas of each shard to handle read traffic and to add fault tolerance.
Assuming 1 master and 2 replicas per shard, we would need 300,000 database nodes, each holding 15 TB.
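The shard and node counts can be checked with a couple of lines (using the article's own figures):

```python
# Shard and node counts from the article's assumptions.
TOTAL_PB = 1500        # total storage estimate from above
PER_NODE_TB = 15       # assumed capacity of a single machine
REPLICATION = 3        # 1 master + 2 replicas per shard

shards = TOTAL_PB * 1000 // PER_NODE_TB   # 1500 PB = 1,500,000 TB
nodes = shards * REPLICATION
print(shards, nodes)                      # 100000 300000
```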
BTW, 15 TB of space per machine may not be too costly these days.
Also see how Backblaze got some really cheap storage.
Managing 300,000 database nodes could become a nightmare.
So instead of just relying on horizontal scaling, we could use some vertical scaling as well.
Let us see theoretically how much memory is addressable by a 64-bit machine.
2³² bytes = 4 GB ≈ 4×10⁹ bytes, therefore
2⁶⁴ = (2³²)² ≈ (4×10⁹)² = 16×10¹⁸ bytes = 16×10⁶ TB = 16,000 PB = 16 EB (exabytes)
Thus a 64-bit system can theoretically address 16 EB of space, so the entire 1500 PB of data could in theory fit within a single machine's address space!
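The arithmetic above rounds 2³² down to 4×10⁹; a quick check with exact values (the conclusion is unchanged):

```python
# Exact 64-bit address-space arithmetic.
addressable = 2 ** 64            # bytes a 64-bit pointer can address
print(addressable)               # 18446744073709551616
print(addressable / 10 ** 18)    # ~18.4 decimal exabytes (EB)
print(addressable // 2 ** 60)    # 16 binary exabytes (EiB) exactly
# Either way, orders of magnitude above the 1500 PB we need.
```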
But in practice, systems with such huge memory become exorbitantly expensive.
In practice we don't see a way to avoid this nightmare of 300,000 servers, so to continue our analysis we will assume that this many servers are actually in use.
Traffic analysis for Google Maps
Maps are from satellites, but where does real-time traffic come from?
Governmental transportation departments install solar-powered traffic sensors on major roadways to gather planning statistics, improve accident response times and increase traffic flow. Google can partner with these departments to share the costs of sensors while getting a share of the traffic information.
For smaller roads and remote areas, devices running the Maps application report the speed at which they are actually traveling.
More at this link
Also see: Who's got the most web servers