Managing Information: Data and Analysis Integration

The challenge of our time, in computer science at least, is the world’s data, the world’s bytes. The ratio of the data that is available through online search to all of the world’s data will approach one. The deep web, the hidden web, the data that is stored behind forms and has not been crawled by search engines, will soon be searchable. Personal data will be shared with those with whom we wish to share it. Database systems are becoming fully self-managing; the high cost of ownership, the installation difficulty, and the tuning problems for database-dependent applications are all being overcome. The need to alleviate these high costs is enabling businesses to emerge in third-world countries, where data will be served more cheaply than ever.

As the ratio of searchable data to the world’s data tends towards one, the gap between the amount (and diversity) of data and the amount of data that we can actually use will continue to grow. Not only that, but the kinds of data we are beginning to store are huge and growing at an unimaginable rate: images, sound recordings, video, maps of the human genome.

Debunking the Popular Myth about Computing

“Computers are becoming so sophisticated, soon I’ll just have one computer to do everything for me.” This is almost an exact quote from a presentation I recently heard from one of the smartest people I know, and it caused an interesting debate in my class. In my opinion, what matters is not that computers are becoming more sophisticated but that they are becoming more complex, and complexity means integration. Integration means many computers, perhaps thousands per person. The idea that it all boils down to a person having an iPad or similar as their only device is contrary to every trend before us. We have more computers per person on average than ever, from our iPods, BlackBerrys, and laptops to our heart-rate-monitor watches and smart fridges. The integration between these devices is the outcome we should be striving towards as computer scientists: constant communication between, say, my temperature-sensing watch and my air-conditioning system. The term for this is ubiquitous computing.

Where Does Our Data Come From?

Take a look at your cellphone. How many buttons does it have? My iPhone has four that I can count, and a keyboard on the screen. The BlackBerry next to me has a couple too, but not many more than the first-ever computer had. These ways of taking in data are almost wasteful compared with the computing power of the devices. The new paradigm is about taking in data not from the internet or from text, but from the real world. It’s about inputting images, videos, and information about your body’s state, using microphones that are always on, always aware, and taking commands from the world around them. Human genomes, brain activity: there is so much data that is about to be accessible to computers. How are we going to sort it, transfer it, and store it? How are we going to analyze it?

Bring Back A.I.

Computer vision and machine learning are things that need to be brought back into A.I.; they have been lost over the previous decade. I know of computer vision conferences being organized at this moment, and it’s very exciting, but overall there is simply less progress on the matter than there should be in our time, despite encouraging results. Cars can drive themselves for hundreds of miles without any human intervention, yet tens of thousands of fatal car accidents still occur every year. People spend one to two hours of their day in traffic. Traffic? The nation’s highways carry three times as much traffic as was predicted for our time. How can we improve the quality of people’s lives by increasing the efficiency of our transport systems? How can we do this using A.I.? Robots can identify objects and open doors; now how can we get them reasoning about many tasks at the same time? How can we bring A.I. into robotics, and bring that into everyday life?

Biology and Computer Science

Our time is a time of biology. The discipline that had the greatest impact in the last 50 years was computer science, but in the next 50 years it will be biology, and that is precisely because of computer science. Computer-readable data is arriving in ever-increasing amounts from astronomy, biology, and neuroscience, in amounts far exceeding the human ability to read and comprehend. Computer science helps by taking this data, storing it, analyzing it, and constructing simulations that can help us interpret the data as knowledge. We can have systems produce a hypothesis that we can test in the lab. Within days, biologists can get the full picture of the human genome and understand the condition and activities of a population of cells. They can figure out what effect certain mutations will have on the organism. Biology really tries to understand the most complex system in the world. Computer science is the tool that biologists can use to create an understandable model of these things, and even to make predictions and run experiments that verify theories without ever having to take place in reality. If we go and mutate a gene, what will be the effect of that? We are even beginning to understand the function of the brain, at a neurological level, based on simulations. Is there a more interesting field of research at the moment?

Security

The number of spammers will continue to grow, and they will continue to defeat programmers.

Computer security is often an afterthought to all of the great trends. We talk about where we’ve been and where we’re going, and only then about security. New technologies like mobile devices simply lead to the next generation of security development. What attacks will be made on mobile platforms? How do we pre-empt these attacks and prevent them?

The number of malicious URLs is growing all the time; phishing is the closest thing to the perfect crime in our age. But the battleground is shifting: a lot of money is to be made by attacking users through the internet, but what is more exciting for hackers today is attacking mobile devices. Inevitably, this opens up markets for mobile device protection, innovative firewalls, anti-virus software, and so forth. As cars become more computerized, they also become more vulnerable; the Lexus with a GPS, the home entertainment system with an internet connection, and so on are the new targets for these hackers. How about the shift in the U.S. towards managing political elections electronically? What is the state of security for voting systems, and have attacks on digital election systems already occurred? Then there is human-computer interaction: how does the average person know which applications to allow to dial out? If my mother’s computer says that “Application ‘xyz’ wants to access the internet,” how is she supposed to know what to do?

We’re all connecting on platforms like Twitter, personal blogs, etc., and this is great, but it puts user privacy at risk. Email addresses, telephone numbers, and GPS locations are quickly becoming more available on sites like Facebook. In a generation that takes such security for granted, and that is comfortable with making personal data open, who knows how much more vulnerable we will become?