Originally known as the Leibniz formula for calculating Pi, it was later discovered that the Indian mathematician and astronomer Madhava of Sangamagrama (or perhaps his followers) had formalized this method of calculating Pi in the 14th-15th century. It's now known as the Madhava–Leibniz method of calculating Pi, a naming improvement since Leibniz already has too many formulas named after him.
It's beautifully simple. 1 - 1/3 + 1/5 - 1/7 + 1/9... continued to infinity gets you 1/4th of Pi.
And it's very easy to write in code. Here it is using Python:
quarterPi = 0
for i in range(0, 1000000):
    numerator = 1 if i % 2 == 0 else -1
    denominator = i * 2 + 1
    quarterPi += numerator / denominator
print(quarterPi * 4)
If we run this code, we get 3.1415916535897743, which is surprisingly close to Pi, off by only about a millionth.
Infinite series can't really be calculated to completion using a computer, but the for i in range(0, 1000000): line controls how many iterations you calculate (currently 1 million).
If we crank it up to 100 million iterations we can really make our CPUs work:
3.1415926525880504
100 million iterations took 180 seconds on my computer, and it's now accurate to the 8th decimal. Now, what if we only loop 100 times?
3.1315929035585537
Not very accurate. It took my computer less than half of a millisecond to compute 100 iterations.
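If you want to check the timings yourself, here's one way to do it with time.perf_counter from Python's standard library (a quick sketch; exact numbers will of course vary by machine):

import time

def leibniz_pi(iterations):
    quarter_pi = 0
    for i in range(iterations):
        quarter_pi += (1 if i % 2 == 0 else -1) / (i * 2 + 1)
    return quarter_pi * 4

start = time.perf_counter()
result = leibniz_pi(100)
elapsed_ms = (time.perf_counter() - start) * 1000
print(result, elapsed_ms)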
It's incredible to think that mathematicians were able to come up with this formula at a time when checking the results meant long, tedious computation by hand.
User /u/Another_moose came up with this one-line version of the algorithm:
sum(-(i*8%16-4)/(i*2+1) for i in range(10**6))
Note that -(i*8%16-4) alternates between 4 and -4, so this sums straight to Pi rather than a quarter of Pi.
Mathematics can really illustrate how powerful programming can be, often with very little effort. Code representing math often ends up being some of the most concise, maybe because these two fields of study share some deep parallels.
Imagine you're a king or queen ruling over a vast realm.
Key to your power is knowledge. You've maintained a network of horse riders and stables for your messengers to use as they bring news from far parts of the kingdom to you.
One day, you have an incredible proclamation, a wedding invitation for the grandest ceremony and party.
You pay several local villagers to ride their horses to cities across the realm, each carrying a "save the date" letter. Since the date is far off, you tell the messengers that their job is done after delivering the message.
This would be analogous to sending UDP packets. Messages are sent without confirmation of delivery. After giving these instructions to the riders you realize you don't know when or if the letters arrived at their destination.
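In Python, the fire-and-forget nature of UDP takes only a few lines. A rough sketch (the address and port here are made up for illustration):

import socket

# Hand the datagram to the network and move on. There is no built-in
# confirmation that it ever arrived.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(b"Save the date: the royal wedding!", ("192.0.2.10", 9999))
sock.close()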
After sending the letters you get some unfortunate news. One of the most important guests cannot attend that date, and so the wedding date will have to be changed. You repeat the process, sending out messengers with the new date.
Some of the letters with the new date arrive sooner than the letters with the old date. You end up getting confused envoys asking which date is the real date. This is known in computer networking as out-of-order delivery.
Determined not to create any more chaos, you hire the wisest thinkers to devise a system that can deliver messages reliably and in order. Here is the system they came up with. They call it TCP.
You use a trusted courier that will deliver the message, then return with confirmation that the message has been delivered. Easy enough. Your thinkers also tell you to write down a number on each message corresponding to the number of letters sent to that destination. That way the recipient knows if a message comes out of order.
TCP is like this trusted courier. TCP is a protocol that provides reliable, ordered delivery of data.
Once the courier returns, they get sent out again, this time to acknowledge (ACK packet) that the confirmation was received by the original sender. Although it's a lot of travel, missing messages can be noticed when the acknowledgement has gaps in the numbering.
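For comparison, here's roughly what the same errand looks like over TCP in Python. The operating system's TCP stack plays the courier: it numbers the bytes it sends, waits for acknowledgements, and retransmits anything that was never acknowledged. (The address and port are again invented, and something must be listening there for this to work.)

import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    sock.connect(("192.0.2.10", 9999))   # establish the connection first
    sock.sendall(b"New date! The wedding has moved.")
    confirmation = sock.recv(1024)       # block until the recipient replies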
Some computer terminology: the time it takes for the messenger to travel from the capital to its destination is called latency. Instead of messengers, you would call them "packets". Ping is how long a full round-trip takes. In this analogy some of the messages may be time-sensitive (especially as the wedding date gets closer). If a message doesn't arrive at its destination for any reason (bandits? treacherous cliffsides?), the loss might not be noticed until the next message is received, and even then it would still require a messenger to be sent back to the capital to request a copy of the missing letters. This increased number of round trips and the additional latency is one of the big reasons TCP is not a great system for fast communication (like in fast-paced games).
In real life before trains, latency would be measured in days, weeks, or even months for very far-off destinations. But in computer networks, latency ranges anywhere from a few milliseconds to... well, actually there is no upper limit.
Light could theoretically circumnavigate the Earth in about 133 milliseconds, which puts one-way latency to somebody on the opposite side of the world at roughly 66ms at best. Even that is optimistic: light in fiber optic cable travels at only about two-thirds of its vacuum speed, and there's always slowdown from hops along different routers to and along the internet backbone, and from wifi or satellite links. There can also be lots of internet "traffic" that bottlenecks some routes around the internet, causing lag spikes.
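For the curious, the arithmetic behind those numbers is easy to check:

EARTH_CIRCUMFERENCE_KM = 40_075
SPEED_OF_LIGHT_KM_PER_S = 299_792  # in a vacuum

around_the_world_ms = EARTH_CIRCUMFERENCE_KM / SPEED_OF_LIGHT_KM_PER_S * 1000
print(round(around_the_world_ms, 1))      # ~133.7 ms for a full lap
print(round(around_the_world_ms / 2, 1))  # ~66.8 ms one way to the opposite side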
When you navigate to a website or play an online game, you can imagine signals traveling back and forth between your computer and the server that hosts that website or game. Every time you see the page load or the contents update, some data must have been exchanged. Imagine if a message is lost because a line went dead or one of the many routers on the route had a bug or reset. In UDP those messages would be lost forever. With TCP, there is a way for the sender and receiver to request the missing messages, but with a big latency penalty equal to the round-trip time. The basic principles of message passing are still followed in the same way they were when word was spread by horse and carriage. Many complex networking-related topics, like the CAP theorem, can be explained by the horse and carriage analogy, because at its core networking boils down to simple message passing.
I use my laptop often and deal with neck strain, so I thought I'd build myself a lapdesk capable of raising my laptop closer to eye level.
The design is pretty simple: a stand and base connected by a basic door hinge, and some legs that can lock into slots along the base.
I had originally tried using a CNC router to cut out the piece but found it took too long to cut. This time I used a laser cutting machine.
Using the LightBurn software, I mocked up the designs. LightBurn is somewhat cumbersome; I would recommend using different software like Illustrator and exporting to SVG.
This is the laser cutter. Shout out to Seattle Makers.
The cut was pretty quick, less than a half hour. It's also quite precise. The edges are burnt uniformly, giving it a pretty unique appearance. The plywood also smells nice and woody while lasering.
Once cut, I glued the pieces together; it's essential to leave gaps for the legs to slot into.
You can see the design taking shape. I sourced regular old door hinges and assembled the final product.
Here's a side profile of an earlier design (the CNC routed attempt).
The completed product
I made this using a laser cutter, the LightBurn software, and an anodized aluminum card.
If you chose an answer to this question at random,
what is the probability you will be correct?
(a) 25%
(b) 50%
(c) 50%
(d) 100%
What do you think the answer is?
And, just as crucially, why do you think the other answers are NOT correct? Take your time, this one is a head-scratcher.
---
---
---
---
---
Ready for the answer?
---
---
---
Well, they all seem kind of correct!
(a) 25% can be argued as correct because there's a 1 in 4 chance of landing on (a).
Both (b) and (c) look correct at 50%, since they together account for half of the options.
And finally if all of the options we've seen have been correct, then wouldn't (d) 100% also be true?
In this scenario, we have a circumstance where there are no wrong answers. How can this be?
Reflect for a moment whether you disagree or agree.
This problem is so fascinating because it's self-referential. The answer affects the question; it's asking about itself. A cool property of self-referential questions like this is that, unlike normal questions, the options available change the answer to the question.
Let's change the question around a little bit to get an idea of how it works.
If you chose an answer to this question at random,
what is the probability you will be correct?
(a) 0%
(b) 25%
(c) 98%
(d) 100%
There's one obviously correct answer above: (b) 25%. There's a one in four chance of landing on (b), and no other answer seems correct.
It might, at first glance, look like (a) 0% could be a correct answer, but if it were correct, wouldn't the probability of picking it be greater than 0%?
In this question, 0% is a funny answer, because it can never be right. It's like saying "This statement is false." The question is asking what the probability is of being correct in choosing 0%. If the correct answer is 0%, you couldn't possibly be correct!
It's a contradiction, and can really be seen in this example:
If you chose an answer to this question at random,
what is the probability you will be correct?
(a) 0%
(b) 0%
(c) 0%
(d) 0%
It can be hard to keep track of what the question is even saying. 0% probability implies we cannot possibly get the answer right. But since it's the only option, we're guaranteed to land on it 100% of the time.
0% seems wrong because you're guaranteed to land on it, which would make your chance of being correct 100%, not 0%. But if 0% is wrong, then your chance of answering correctly really is 0%, which is exactly what the option claims. Does that make it right, wrong, or somehow both?
Reductio ad absurdum
If you chose an answer to this question at random,
what is the probability you will be correct?
(a) 3%
(b) 50%
(c) 50%
(d) 99%
If you chose an answer to this question at random,
what is the probability you will be correct?
(a) 1%
(b) 75%
(c) 75%
(d) 75%
If you chose an answer to this question at random,
what is the probability you will be correct?
(a) 100%
(b) 100%
(c) 100%
(d) 100%
In the last example, all options are true! So convenient.
If you chose an answer to this question at random,
what is the probability you will be correct?
(a) 0%
(b) 25%
(c) 25%
(d) 100%
Sadly, none of these options make much sense. 25% appears twice, so you're 50% likely to land on a 25%, which sounds suspiciously like a contradiction.
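If you want to play with these variants yourself, here's a tiny Python helper of my own. It treats an option as "consistent" when the chance of landing on its value matches the value itself:

def consistent_options(options):
    # An option is consistent if landing on its value (picking uniformly at
    # random) happens exactly as often as the value claims
    return sorted(x for x in set(options) if options.count(x) / len(options) == x)

print(consistent_options([0.25, 0.5, 0.5, 1.0]))   # [0.25, 0.5]
print(consistent_options([0.0, 0.25, 0.98, 1.0]))  # [0.25]
print(consistent_options([1.0, 1.0, 1.0, 1.0]))    # [1.0]
print(consistent_options([0.0, 0.25, 0.25, 1.0]))  # []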
I originally found a variant of this question on Imgur.
In this original question, it appears that there are no correct answers. Little appears to be known about the original author. The question is so interesting because it's such a rich and fascinating insight into mathematics, and how we grapple with probabilities.
Both Bayesian and Frequentist statisticians would find fault with the question itself.
A frequentist could take samples of the question - but we don't actually know how to verify whether a chosen option is correct!
Take our original example with the options 25%, 50%, 50%, 100%. We would indeed land on 25% with 25% probability, and 50% with 50% probability, but 100% with only 25% probability. So 100% could be argued to be the least correct of all of these.
I wrote a script using Python 3's built-in random library to sample the questions and compare the answers. Be warned, this code doesn't answer the question directly but instead its counterfactual: assuming the option chosen is true, what is the chance of landing on it? Here's a sample output of the script.
0.25 is within 1% correct: 0.250050
0.5 is within 1% correct: 0.498430
1 is incorrect: 0.251520
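A minimal version of that kind of sampling could look something like this (a reconstruction; the variable names and the 1% tolerance are my own choices):

import random

options = [0.25, 0.5, 0.5, 1.0]
trials = 1_000_000

# Tally how often a uniformly random pick lands on each distinct value
counts = {}
for _ in range(trials):
    pick = random.choice(options)
    counts[pick] = counts.get(pick, 0) + 1

# Counterfactual check: is the frequency of landing on an option within 1%
# of the probability that option claims?
for value in sorted(counts):
    frequency = counts[value] / trials
    if abs(frequency - value) < 0.01:
        print(f"{value} is within 1% correct: {frequency:f}")
    else:
        print(f"{value} is incorrect: {frequency:f}")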
My good friend mgsloan argues,
'If (a) (b) (c) are all "consistent" and that is the same as "correct", then it could be argued that the correct answer should be 75%, and so no answer is correct.'
He brings up a convincing point that only deepens the mystery of this question. What if (d) were instead 75%? Would it be more true? Would that then make the true answer 100%, or would it still be 75%?
Bayesians fare no better, even though it almost sounds like a Bayesian question in the form: Given I chose this option randomly, what's the chance it's correct?
Formulating it into a mathematical equation we get:
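Informally, in my own notation: let A be the true probability of being correct, and let X be the option you land on when choosing uniformly at random from the four choices. Then

A = P(you are correct) = P(the value printed on X equals A)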
The above statements are recursive, which makes sense given the problem. Bayes' Theorem states that we can rearrange the probability like such:
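In the same informal notation:

P(correct | chose X) = P(chose X | correct) * P(correct) / P(chose X)

where P(correct) on the right-hand side is exactly the quantity we were trying to pin down in the first place.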
The last statement only confirms our suspicions that this problem is inherently self-referential, yet it yields no secrets about itself, beyond suggesting that the question never was Bayesian to begin with.
If you were looser with the mathematics, you might argue that you could just average the options supplied: 25% + 50% + 50% + 100% = 225%, which averaged gives us 56.25%. That makes no sense either, since we're adding together probabilities from options that could be incorrect and therefore shouldn't contribute to our total.
I've heard some argue that the set of solutions has cardinality 3 and looks like {25%, 50%, 100%}, therefore the true answer is 1/3, or 33.333...%. I have no comment on whether this approach is more or less accurate than thinking in terms of 4 options.
How can such a simple question be so impenetrable to analysis?
In my opinion, the reasons we come up with for our answers to questions like these are the fascinating part.
Kind of like time travel paradoxes, self-referential reasoning can lead to all sorts of strange results, like Gödel's incompleteness theorems. These problems sit at the edges of reasoning because of their abstractness; they yield not a correct answer but, in their self-reflective nature, turn the mirror onto us and lay bare the rationalizations we construct to justify our answers.
One of the more challenging bugs I've encountered is widespread, practically industry-standard, and looks like normal working code, but its consequences are serious enough to slow well-provisioned big-enterprise systems to a crawl.
With caching, a common pattern I see is to check whether the cache contains a value for a key. If it exists, get it from the cache; otherwise fetch the data from the database and then save it into the cache.
In Node.js, using Redis's node library and a SQL database such as PostgreSQL, this pattern looks like this:
// assume this snippet lives inside a new Promise((resolve, reject) => { ... }) executor
redisClient.exists(key, (err, exists) => {
  if (exists) {
    redisClient.get(key, (err, value) => {
      resolve(JSON.parse(value));
    })
  } else {
    postgres.query({text: "SELECT * FROM accounts"}).then((results) => {
      redisClient.set(key, JSON.stringify(results.rows), 'EX', '3600');
      resolve(results.rows);
    })
  }
})
The 'EX', '3600' in our redisClient.set() tells Redis our key has a time to live (TTL) of 1 hour. A better way to write this would be:
redisClient.get(key, (err, value) => {
  if (value !== null) {
    resolve(JSON.parse(value));
  } else {
    postgres.query({text: "SELECT * FROM accounts"}).then((results) => {
      redisClient.set(key, JSON.stringify(results.rows), 'EX', '3600');
      resolve(results.rows);
    })
  }
})
We've removed the if (exists), which minimizes calls to Redis, but also eliminates the possibility that our key expires in the short time between our redisClient.exists() call and our redisClient.get() call.
This is usually a good enough solution – I would recommend stopping here unless you've profiled and found your app needs even more performance.
Let's say the above code exists in a function called getOrSetCache(). Get it if it's in the cache, set it if it's not, resolve with the value. This getOrSetCache() implementation does have one surprisingly common edge case where our caching falls short and we query the database more often than we need to.
for (let i = 0; i < 5000; i++) {
  getOrSetCache(key);
}
It's a cache, right? You couldn't fault somebody for assuming (wrongly) that these 5000 calls should only result in 1 database query.
The issue is in these lines:
redisClient.get(key, (err, value) => {
  ...
  postgres.query({text: "SELECT * FROM accounts"}).then((results) => {
    ...
    redisClient.set(key, JSON.stringify(results.rows), 'EX', '3600');
Each one of these callback functions (err, value) => {} is delayed as we make the network requests to Redis and SQL. That means that in the short window between when we call get(), query the db, and then set(), other processes calling the same code will execute their get() call, find the cache empty, and trigger more database calls.
Node.js' event loop will execute the get() part of our function 5000 times, each of them sending a network request to Redis, which diligently finds nothing. Then SQL would be queried, 5000 times. This sort of bug can bottleneck the database, especially on repeated complex queries.
This sort of performance bug isn't unique to Node.js. By using a cache that interfaces via network/API calls, a delay is introduced. Even if we used a synchronous in-memory cache, there's still a delay while we wait on our database query, and any thread checking the cache during that window will not see it populated.
Sometimes in an app, especially one that handles data sources from multiple databases, a column from one database might reference a UUID for rows stored in a completely separate database. In these cases, it's easy to end up writing code that loops over the values in this column and makes duplicated database calls to the other database.
User input can also lead to cascading database calls. One of the more interesting performance bugs I’ve seen in my career was while I was contracting with Microsoft, which I'll get to in a moment.
First, how do we solve this caching issue? We need to add batching to our cache. Thankfully, this isn't hard to do with promises.
const batched = {}
function batch(key, promiseFunction) {
  return new Promise(async (resolve, reject) => {
    if (batched[key] === undefined) {
      // First caller for this key: register ourselves and do the actual work
      batched[key] = [{resolve: resolve, reject: reject}];
      try {
        const value = await promiseFunction();
        // Settle everyone who piled on while we were waiting
        batched[key].forEach((promise) => {
          promise.resolve(value);
        })
        delete batched[key];
      } catch (e) {
        batched[key].forEach((promise) => {
          promise.reject(e);
        })
        delete batched[key];
      }
    } else {
      // A call for this key is already in flight: just wait for its result
      batched[key].push({resolve: resolve, reject: reject});
    }
  })
}
The first call to batch(key) will call promiseFunction and wait for it to resolve with a value. Any calls to batch(key) with the same key in the meantime will also wait, and will resolve when that first promiseFunction resolves.
It might be tempting to add batching to the code that queries the database, but the proper place is as part of the cache like so:
async function getOrSetCache(key, dbQueryPromise) {
  return await batch(key, () => {
    return new Promise((resolve, reject) => {
      redisClient.get(key, async (err, value) => {
        if (err) {
          reject(err);
          return;
        }
        if (value !== null) {
          resolve(JSON.parse(value));
        } else {
          try {
            const results = await dbQueryPromise();
            redisClient.set(key, JSON.stringify(results.rows), "EX", "3600", (err) => {
              if (err) {
                reject(err);
              }
            });
            resolve(results.rows);
          } catch (e) {
            reject(e);
          }
        }
      })
    })
  })
}
The bulk of the code inside should look familiar. It's called like this:
const rows = await getOrSetCache('key', async () => {
  return await postgres.query({text: "SELECT * FROM accounts"});
})
You can even have the key be the query string to the database itself.
const queryString = "SELECT * FROM accounts";
const rows = await getOrSetCache(queryString, async () => {
  return postgres.query({text: queryString});
})
Redis supports arbitrary strings as keys, so your queries will be cached with the query string as the key.
For PostgreSQL users, it can be helpful to have Node.js log all of your database calls to determine how many calls your app makes.
It's also a bad idea to over-use batching. Generally you should only pair it with a cache. Why? I'll illustrate using an extreme example: What if you batch all database calls?
It sounds very appealing. Wouldn't you save in performance if duplicate queries only triggered one database call?
On reflection it's obvious why it's bad. Repeated INSERT statements will disappear. Transactions may not reflect updated states of the database.
It only takes a few lines of modern JavaScript to monkeypatch every query we make so that it's batched:
// DO NOT use this code, your queries will lose ordering
// postgres.oldQuery = postgres.query;
// postgres.query = async (...args) => {
//   return await batch(args[0].text, async () => {
//     return await postgres.oldQuery(...args);
//   })
// }
postgres.query({text: "SELECT username FROM accounts;"}).then();
postgres.query({text: "INSERT INTO accounts (username) VALUES ('Jimmy Page')"}).then();
postgres.query({text: "SELECT username FROM accounts;"}).then();
The 2nd SELECT here could get batched, returning the same values as the 1st SELECT. Potentially we may not see user Jimmy Page in our 2nd SELECT. Although our database may have levels of isolation, our queries do not.
However there is one place in our applications we already make the assumption that our data may be out-of-date. The cache. Use batching on the cache, where it belongs.
At Microsoft, there is a dashboard used by hundreds of employees on a daily basis. Users complained about slowness so I was brought in to help.
These dashboards would query various databases whose results were cached in Redis with a default 24 hour time to live. Many of these queries were large, written by data scientists and researchers, and could take upwards of 15 minutes to complete. Even worse, each page would often issue duplicate queries because each page contained up to several dozen charts, each one customized to trigger a database call on load.
The 24 hour TTL on the cache keys would naturally synchronize with people's work schedules, so keys would start expiring around 7-8am, causing cache misses... and database calls. Lots of them.
The psychological effect of this was disastrous. The first person to load the dashboard in the morning could find it took a few minutes for the first chart to load. But the code, even with a cache, was showing loading spinners on other widgets that all make the exact same database call. So what does the user do? They hit refresh, because they've been trained by the UI to expect a refresh to populate these empty charts.
These refreshes would trigger an even bigger cascade of database calls for the remaining cache misses. Imagine being the 4th person that morning to load the page and see cache misses. The first 3 users' queries are still in progress. You are waiting at the end of an enormous queue of queries, and your charts won't pull cached data until you've hit refresh enough times that you no longer see a cache miss.
In the meanwhile, the poor databases were chugging along, fulfilling large, duplicated requests that could have been batched away into a single call.
Using a combination of server-side batching and optional cache auto-refreshing, our team eliminated the bottleneck, to the delight of the research teams.
Caching is an optimization technique. It's best done once you have metrics on performance. Consider batching with your cache as a solution to large volumes of the same database call.
Sometimes code can't be cached, because your app needs to be up-to-date with what's in the database. While you may still optimize your code to remove redundant queries, know that in these cases, these successive reads are a good thing. Your app is maintaining its state against the database, doing what it's supposed to be doing.