All three of the principal challenges faced by the DataLab Project in its early years were related, though data storage, retrieval and manipulation were also considered somewhat independent challenges to the overall goals of commercial, academic and government utility. What all three had in common was that utility demanded ever greater speed before new uses could be developed. They were also equally constrained only by the computer hardware technologies then available, even at the cutting edge.
This was the beginning of the putative intelligent machine stage in DataLab Project development.
The DL Main could not, of course, think. No true artificial intelligence had yet been developed, at least not in the Hollywood meaning of the term, but rapid advances in both emerging hardware and software technology combined to make the machine work so fast, so efficiently and so accurately that software programmers sometimes felt like the “damn thing could almost think”. In that sense, the DL Main was an intelligent machine.
The most innovative toys and gadgets developed through these expensive new technologies could be afforded only by governments, which were all remarkably keen to learn about all the bells and whistles the DL Main could bring to bear in solving their problems. What could the DL Main do now that we can utilise? What else can the DL Main do? What else might it be able to do eventually? How can new goals be moved along, and who will pay for it?
As academics and business research organizations lined up to add their research data to the DL Main, they also sought time and new software and processes to assess and evaluate their own data for their own purposes. That the data always entered but never exited the DL Main was of little concern to almost everyone.
Until, of course, one day it was.
By then the world of computing discovered it was too late. No longer were they even sure if there was such a thing as an off switch for their own computers already connected to the damn thing. Assurances of fair play and equitable access, as well as system safety and security, had always been freely given by the DataLab Project, but some no longer trusted whoever was in charge, or believed that any assurance given was worth the paper it was written on. Nobody was entirely clear on just who ran the DataLab Project anymore, or whom you would sue, or where, if a breach of the standard agreement was discovered. An amorphous reference to “an agency of the United States government” seemed about the only pertinent information provided. The contracts were with federal organizations nobody had ever heard of and whose physical locations were not known.
Analysis of the fine print in DataLab’s standard contract became a cottage industry among lawyers, but knowing exactly how you had just been legally screwed by Washington was of little consolation. Unfortunately, broad permission for data usage and licensing had been granted in the DataLab documentation to the government via its ownership of the DataLab Project. An address for notifications under the contract was provided; it was, however, nothing more than a PO Box in Washington, DC. Most negotiations between the Project and a user took place via email, telephone or online.
New applications, based on the proprietary and seminal software developed by the Institute, had now been incorporated into a wide range of industrial, educational and government software programs. The precise nature of these algorithms, how they interacted with the primary programming and what they did was a trade secret and confidential IP owned by the Institute. Some of it was, at times, covered by government secrecy legislation. Permission to modify any software utilized by a licensee to render it compatible with the DL Main operating system was built into the standard agreement. This could mean an unwelcome and potentially intrusive modification to, or capture of, third party software, but only if the user was unaware of the issue in the documentation. The only way to avoid problems was to keep the systems separate, which required an alternative and expensive arrangement for a pristine backup system.
As a “take it or leave it” proposition was the only one on the table, and the full menu of DataLab Project tricks was not completely known or even disclosed, most users resigned themselves to the idea that the government could access their data one way or another anyway. If government intrusion was illegal and affected them, they would deal with that in the court system. If it didn’t affect them directly, or harm them in any way, then why should they care?
Any organization with truly sensitive data would “air gap” their systems anyway. Many did, and these measures had nothing to do with their DL Main usage or licensing. It was simply prudent to remove from harm’s way the very tasty treats that corporate spies would like to obtain with a virtual five finger discount.
Adam had used many of these advances in jobs he had undertaken for the Institute and began applying the technology to his father’s profession, forensic archeology. The results had been very useful in building criminal cases for the FBI and DOJ and for foreign governments similarly seeking his father’s forensic and investigatory expertise.
The data in the Library Adam had collected in Tucson had not originally even been considered for uploading to the DL Main, at least not right away. If this was part of a potential criminal investigation, as he had been led to believe, he would need to be very careful about who saw or handled the data. Chain of custody was an important consideration in handling evidence to avoid potential loss or tampering. Too many fingers should not be allowed to touch this information; nonetheless, basic verification of what had been collected was essential.
Dealing with it as evidence at trial would be addressed after a later and thorough assessment of what they had. A cursory inventory was first undertaken, almost by accident.
Just in case.
Chapter 9
Edward St. James was an archeologist of some repute at UCLA when he and two close friends were recruited by the Institute following the tragic death of Edward’s wife, Anna. Edward, his son Adam, and colleagues Maria and Agustin Suarez, together with their young son Rodrigo, made the long trek to Barrows Bay, the home town of the Victoria Institute. They eventually became Canadian citizens, and, except for Edward and the boys, rarely left Barrows Bay or Vancouver Island.
Maria was a gifted engineer whose specialty was micro engineering and her husband Agustin was a highly regarded research chemist and toxicologist with a specialty in biochemistry. They settled into small town Canadian life and were very happy.
As time went by, Edward began taking assignments for museums and governments in cases of archeological pillaging and theft. By understanding the techniques the robbers used to loot sites, and the methods of getting the items out of the country and eventually into the hands of dealers and buyers around the world, Edward could construct likely scenarios, identify potential suspects and even hypothesize who and where the buyers might be. Edward was among the first archeologists in the world to utilize modern, cutting edge crime scene investigation techniques to quantify and prioritize likely suspects and participants in the chain of criminal enterprise.
His most famous case, memorialized in the non-fiction novel The Curse of the Minotaur, was a hugely successful best seller, bringing temporary notoriety to its reclusive and normally anonymous author.
Edward’s proprietary database of information was extensive, and data was often provided by museums, governments, police departments, insurance companies and international organizations. He also catalogued information from local sources, irrespective of whether any suspects were convicted of a crime.
As his information grew, so did his need for digital data capture, cataloging and retrieval. Patterns could sometimes be detected based on crime scene evidence which could then yield valuable clues as to the identity of the site robbers as well as the likely onward conspirators in the upstream value chain of criminality. New software in behavioral analytics and pattern recognition, developed by Dr. Bitsie Tolan and Dr. Adam St. James, became an essential work tool when criminal investigations were begun.
Edward could now look back on cases with 20/20 hindsight and pick out the salient details which he could have discovered sooner had more extensive data, data retrieval and analytical software programs been available back in the day.
Adam was presently on the top secret, government dominated internal advisory board of the DataLab Project, was the Project’s Chief Technologist, and was extremely active on the development and planning committees. His activities on these committees began at first as the representative of the Victoria Institute and later as the designated representative of the Institute’s Endowment. Neither was a position anybody really wanted. But as his role as the developer of unique solutions for difficult DataLab Project problems became common knowledge among his colleagues and peers, his advancement through the operations committees progressed steadily as well.
When Adam began his serious software work in his early teens, he was sometimes not taken seriously by older programmers of a certain age and mentality, and occasionally mistaken for a child of a programmer at a bring-your-child-to-work day. Adam rolled with the inevitable slights, jokes and disparaging remarks, since he knew most of his detractors were far less talented and insightful than he was. They had plateaued long ago and were coasting on past accomplishments and published articles, while he was already their equal and just getting started. He was just a kid and could solve more problems in a day than they could in weeks. His solutions were nimbler, and, most importantly, they worked. Adam wondered what it felt like to be forty and feel that threatened.
Adam didn’t take offense; he just felt sorry for them.
Adam had been directly involved with the DataLab Project, and its many iterations, over the past ten years, beginning at around age sixteen. Before that, he had been aware of the Project while it was still evolving, but he only wrote code on a few of its projects, and even that was largely done anonymously.
As he became more familiar with the structure and broader outlines of the DataLab Project, and the software that had been developed to enhance the massive DL Main, he began to see applications to various projects his father had undertaken. Then, almost a decade ago, Adam and his colleagues at the Institute developed a program for his father and began using the DL Main to test results.
As was the case for the young programmer in Alberta, the results were somewhat unexpected and truly spectacular. By linking his father’s proprietary database of information to the DL Main, and generating appropriate queries, this new program could make connections and provide possibilities and scenarios that were real world useful. While not every connection was accurate, in the vast majority of cases it was, at least in part. Edward could now provide robust leads in perplexing cases for the authorities to pursue.
Of particular note was a novel software program developed at the Institute that was extremely useful in tracking individuals and tracing activity backwards in time. This, in turn, led to a wealth of new connections to other individuals and events which had heretofore been difficult to piece together. Then, just about six years ago, with the addition of software developed by Dr. Bitsie Tolan, the proprietary database of Dr. Edward St. James reached its zenith of utility for law enforcement.
It wasn’t proof of any criminality but might lead to arrests in the future when similar patterns emerged. At a minimum, it was a good place to start.
Using his new resources, Adam began to consult with a few of his father’s former colleagues, as well as academics pursuing important but not life critical or commercially viable projects. These were largely information gathering and research projects, often in arcane fields where funding was scarce, data gathering difficult and access to foreign resources quite limited.
If a researcher wished, for example, to track primary source documents regarding the Templars, those documents and related information were likely strewn across many countries in Europe and the Middle East. Researching the topic in one library after another in original and ancient languages could take years, yield no tangible result and be extraordinarily costly.
The DL Main offered a more comprehensive and inexpensive solution to that problem. With full access to public, private and government data resources, not only was information currently in the system available but additional data resources could be requested through a ‘special order’ protocol.
For that information to be useful, however, extensive knowledge was required to design the software, devise appropriate queries and analyze results. The difference was that a toehold of accurate information might be all that was required to secure the desired result. Then the algorithm-fuelled program would twist and turn its way through the narrow openings of relevant data in the system and find the critical information that might help solve a mystery.
Or a researcher might discover that there was simply no readily available data in the system to find. But as data continued to be added, with more foreign governments, private collections and commercial enterprises participating, the raw data continued to grow exponentially. Many foreign governments rightfully concluded that participation, even with only limited access, was better than simply waiting for the NSA or some other organization to access their systems and take the data anyway. Cooperation was not coerced, but most thought their options were astonishingly limited.
One such project, for a historian at UC Berkeley, sought to uncover information about a wide swath of secret societies in Europe dating back to Greek and Roman times and continuing to the present day. The project was designed to identify these societies, what they were about and who their past and present members may have been. Through church and government records, as well as journals, diaries and other primary source documentation, much surprising conjecture was confirmed and leads were developed for new research. Cross matching often indicated that many individuals who were ostensibly leading advocates for one idea publicly held contrary views privately. Similarly, many individuals were publicly associated with certain organizations that they privately opposed, or were members of various clandestine or secret societies.
The DL Main could make extraordinary connections and associations based on seemingly disparate and unconnected facts. It could find people, trace families and uncover secrets never meant to see the light of day.
It was a powerful tool. It threatened powerful groups. It made enemies.
Adam loved every minute of it.
Chapter 10
The process of uploading data to the DL Main was never as straightforward as one might imagine. It had largely to do with the evolution of the presumed purpose of the DL Main, which, of course, had changed dramatically over the years. As the ability to capture data and ‘manipulate it’ grew, as one of the founders stated in his last interview ever on the subject, it would soon be possible to redefine our understanding of the world in ways we could not presently imagine.
Imagine, he said, what we could understand about the universe if we could detect and know the relative position of every particle, everywhere, over time. Physicists have long held that with that knowledge they could predict the future and likely explain everything that has ever happened in the past. Not excluded from any discussion on the topic of prediction was, of course, the reputed chaos of human behavior.
A slippery slope to some; an impossibility to others. Mostly it was viewed by academics as an interesting topic to consider but since the physics of the underlying requirement wasn’t considered even remotely possible, now or ever in the future, what would be the point of consideration beyond interesting cocktail conversation? Conclusion: highly theoretical with no practical application or value.
A young grad student heard the story and took a different path of analysis. While, he posited, knowing the existence and relative position of every particle in the universe would be an impossibly daunting task, he wondered what might happen if you confined your inquiry to just one small topic. What if the universe he wanted to examine, or perhaps metaphorically just a good sized solar system, was a small one, with lots of available data? An entire universe with very little data was a dead end, but some solar systems of information might be viable for study. You didn’t need everything, necessarily; you just needed some good and relevant data.
Further, he conjectured, what might one learn if one amassed every known fact about that topic, from every source imaginable and had sufficient computing capacity and the right software to analyze it? What could you learn then?
When this revelation came to him there was neither the computing power nor the analytical tools necessary to even begin to test his hypothesis. So, he did what every scientist would do: absent the tools, he theorized how such an endeavor might be undertaken and what might be needed to make this concept a reality. In his seminal paper on the subject, he postulated what the hardware, software and logistical requirements would have to be to support such an endeavor.
While his conjecture was correct, his ‘math’ was not even close. The requirements and sophistication of the tools necessary to achieve even small steps forward were well beyond what he had imagined. Not to worry, though: others who read his paper and believed it had merit understood exactly what problems they faced, and understood that the endeavor needed long term funding and a starting point.
They found both, first for an innocuous little academic library project funded by a Canadian endowment, and later from the deepest of all deep global pockets. Proponents of the initiative took the long view that “we can’t do it now, but we will someday”.
Then they waited.
Chapter 11
In early 2004, a researcher at a small boutique computer lab in Calgary, Alberta, was attempting to adapt specific artificial intelligence software to a data extraction program so it could run against known and verified oil and gas drilling results. If the new set of programs could use existing data to confirm results that were already known, new opportunities might be extrapolated from other data already in the database but not otherwise readily accessible to oil and gas industry researchers.