The long-trusted CAPTCHA mechanism, used to stop bots from opening fake accounts or casting votes, is no longer as foolproof as it was meant to be.
A team of researchers at a California-based AI firm claims that its algorithm can recognise the letters and numbers in website CAPTCHAs.
Websites have used CAPTCHA codes for over two decades, and they are generally considered hard for AI bots to crack. This is because they add clutter and crowd letters and numbers together ‘to create a chicken-and-egg problem for character classifiers’, say the researchers.
However, researchers at the artificial intelligence firm Vicarious have now shown that CAPTCHAs are no longer impregnable to advanced algorithms that can perform a range of visual cognition tasks, recognise contours and surfaces, and imagine objects with unusual appearances.
‘We introduce a hierarchical model called the Recursive Cortical Network (RCN) that incorporates these neuroscience insights in a structured probabilistic generative model framework,’ they wrote.
‘RCN integrates and builds upon various ideas from compositional models — hierarchical composition, gradual building of invariances, lateral connections for selectivity, contour-surface factorization and joint-explanation based parsing — into a structured probabilistic graphical model such that Belief Propagation can be used as the primary approximate inference engine,’ they added.
Put simply, the algorithm is designed to mimic how the human brain responds to objects with unusual appearances, such as the distorted letters and numbers in CAPTCHA codes.
This isn’t the first time Vicarious has broken CAPTCHA codes. In 2013, the firm announced that its researchers had decoded, with high accuracy, the CAPTCHA text used by leading firms such as Google, PayPal and Yahoo.
Although CAPTCHA tests have since become more complex, the researchers have shown that they can still decode those used by the likes of PayPal and Yahoo with close to 60% accuracy.
‘A CAPTCHA is considered broken if it can be automatically solved at a rate above 1%. RCN was effective in breaking a wide variety of text-based CAPTCHAs with very little training data, and without using CAPTCHA-specific heuristics,’ the researchers said.
‘It was able to solve reCAPTCHAs at an accuracy rate of 66.6% (character level accuracy of 94.3%), BotDetect at 64.4%, Yahoo at 57.4% and PayPal at 57.1%, significantly above the 1% rate at which CAPTCHAs are considered ineffective.’
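The gap between the two reCAPTCHA figures (94.3% per character, 66.6% per CAPTCHA) reflects compounding: a CAPTCHA only counts as solved if every character in it is right. The short Python sketch below is a back-of-the-envelope illustration, not the researchers' method. It assumes each character is recognised independently and tries a few hypothetical string lengths; neither assumption comes from the study. It also applies the 1% "broken" threshold the researchers quote.

```python
# Back-of-the-envelope illustration, NOT the RCN method:
# assumes each character is recognised independently, and the
# string lengths tried below are hypothetical, not from the study.

BROKEN_THRESHOLD = 0.01  # researchers' criterion: broken if solved above 1%


def whole_captcha_accuracy(char_accuracy: float, length: int) -> float:
    """Chance that all `length` characters are correct, if errors are independent."""
    return char_accuracy ** length


def is_broken(solve_rate: float) -> bool:
    """Apply the 1% rule quoted by the researchers."""
    return solve_rate > BROKEN_THRESHOLD


if __name__ == "__main__":
    char_acc = 0.943  # reported reCAPTCHA character-level accuracy
    for length in range(5, 9):  # hypothetical CAPTCHA lengths
        rate = whole_captcha_accuracy(char_acc, length)
        print(f"{length} chars: {rate:.1%} solved, broken={is_broken(rate)}")
```

Under these assumptions a seven-character string works out to about 66.3%, in the same range as the reported 66.6%. Real recognition errors are unlikely to be fully independent, so this is only a rough consistency check, but it shows why even a modest per-character error rate still leaves the solve rate far above 1%.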
That researchers have found a way to beat CAPTCHA also means cyber criminals could use similar techniques in the future to create fake accounts on websites, bypass website security and influence voting results. Websites and internet giants will therefore need new techniques to prevent the malicious use of bots.
‘We’re not seeing attacks on Captcha at the moment, but within three or four months, whatever the researchers have developed will become mainstream, so Captcha’s days are numbered,’ Simon Edwards, a cyber-security architect at Trend Micro, told the BBC.
‘The very nature of big data analysis and machine learning is that if you give it enough data to play with, it will eventually work out most things,’ he added.