
Jim Baen’s Universe


Edited by Eric Flint


  If the MyLifeBits project is any indicator, the answer is, quite a ways.

  MyLifeBits is an ongoing project involving Gordon Bell and others at Microsoft Research. Inspired in part by Vannevar Bush’s 1945 article, “As We May Think,” which described an electronic memory extender that Bush called the memex, the MyLifeBits team set out to put as much of Bell’s life in digital form as possible. They focused initially on the simple task of digitizing his legacy materials, such as his past and current writings, photos, and CD collection.

  Bell, for those unfamiliar with him, is one of the grand older men of computing. He joined the then-new and now vanished Digital Equipment Corporation (DEC) in 1960, worked on many projects there, including the early multiprocessor PDP-6 system, and was the father of the very influential and highly successful VAX minicomputer architecture for DEC. He worked at multiple multiprocessor-computer companies, co-founded The Computer Museum, and ultimately joined Microsoft Research, where he still works today. (Disclosure: One company that Bell co-founded, Encore Computer Corp., purchased a company I co-owned. I met Bell but never really worked with him; my loss.)

  When Bell and the MyLifeBits team began the project somewhere around 2000, they estimated that a terabyte of storage would be enough to hold the readings and writings of a typical 80-year human life. At the time, a terabyte was an imposing quantity of storage, though disk price and capacity trends were making it an increasingly more approachable figure even for those without IT departments.

  Even now, if you don’t follow disk storage, a terabyte may sound like an expensive and large amount of storage, but it’s not; many of us could afford it, and companies routinely fill many times that much online space. You can buy 300GB disks for a little over a hundred bucks, so hitting a raw terabyte of capacity will set you back less than $500. With that much storage, you’d almost certainly want some integrated redundancy, but the chassis and controllers necessary to provide those features aren’t expensive; vendors sell pre-packaged terabyte storage devices in the $1.2K to $2K range. So, though it would obviously be a luxury, the storage itself is not an obstacle. (The PC sitting at the end of my desk, one I built and plan to make my primary system, has a terabyte of useful space in a disk array with built-in redundancy.)
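
  The arithmetic behind that "less than $500" figure is easy to check. The sketch below is only an illustration: the 300GB capacity comes from the text, while the $110 price per disk is an assumed stand-in for "a little over a hundred bucks," and a terabyte is counted as 1,000GB, the way disk vendors count it.

```python
import math

DISK_CAPACITY_GB = 300        # from the text
TARGET_GB = 1000              # one terabyte, counted the way disk vendors do
ASSUMED_PRICE_PER_DISK = 110  # assumption: "a little over a hundred bucks"


def disks_needed(target_gb, disk_gb):
    """Smallest whole number of disks whose combined raw capacity covers the target."""
    return math.ceil(target_gb / disk_gb)


def raw_cost(target_gb, disk_gb, price_per_disk):
    """Cost of the disks alone, with no chassis, controller, or redundancy."""
    return disks_needed(target_gb, disk_gb) * price_per_disk


disks = disks_needed(TARGET_GB, DISK_CAPACITY_GB)
cost = raw_cost(TARGET_GB, DISK_CAPACITY_GB, ASSUMED_PRICE_PER_DISK)
print(f"{disks} disks, about ${cost} for {disks * DISK_CAPACITY_GB}GB raw")
```

  Four disks clear the terabyte mark with room to spare, and any per-disk price under $125 keeps the total below $500.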

  The future of storage, of course, looks even more promising, and the improvements are, like those in all the key technologies I cited, arriving with exponentially increasing speed. In the short gap between the first and second complete drafts of this column, Seagate announced a 500GB disk drive that uses perpendicular recording, a technique that’s been heading toward mass commercialization for some time. A Seagate spokesperson said that we can expect that capacity to grow by a factor of five over the next three to five years.
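
  To put the spokesperson's projection in yearly terms, here is a rough calculation. It assumes steady compound growth, which is my simplification and not something Seagate stated; the 500GB starting point and the factor of five are from the text.

```python
def annual_growth_rate(total_factor, years):
    """Constant yearly growth rate that compounds to total_factor over the given years."""
    return total_factor ** (1.0 / years) - 1.0


CURRENT_GB = 500   # from the text
GROWTH_FACTOR = 5  # from the text
projected_gb = CURRENT_GB * GROWTH_FACTOR

for years in (3, 5):
    rate = annual_growth_rate(GROWTH_FACTOR, years)
    print(f"{projected_gb}GB in {years} years implies about {rate:.0%} growth per year")
```

  Under that assumption, a fivefold increase works out to somewhere between roughly 38 and 71 percent growth per year, depending on whether it takes five years or three.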

  Meanwhile, memory vendors are exploring a variety of new technologies in their quest for ever faster, larger, and less volatile non-moving storage. Nantero, a start-up company targeting the potentially huge market for non-volatile RAM (NRAM), announced recently that it had created the basis of a 10-gigabit NRAM storage array. This development is nowhere near a product, and there’s no guarantee that their particular approach will work, but it’s a good bet that some company will be creating multigigabyte NRAM modules within a decade.

  The MyLifeBits team was definitely right to conclude that storage will not be a problem.

  The challenge became filling that storage. Scanning all the paper materials and capturing all the already digital content, such as songs on CDs and existing digital photos, was a matter of labor and software. Bell had a decided advantage over most of us in these efforts, because he was working with a technical team that provided the labor, but any of us with either the time and expertise to do it ourselves or the money to pay someone else to handle it could take the same steps.

  While putting existing content online, they also started capturing the content Bell was using and generating in his daily life. Photos were easy; Bell used digital cameras. Saving instant messages is no harder than turning on some straightforward software logging tools. You can record phone calls with digital recorders or, even better, by using VoIP (Voice over Internet Protocol) phone software on your PC and capturing the calls directly.

  Bell and his team did it, and many of us could do it, too, were we so inclined.

  Put all that together, and you end up with a large surrogate memory, one that Bell has said he’s come to depend on rather heavily.

  As you might expect, the MyLifeBits team quickly realized that they could reasonably and inexpensively assemble a great deal more than a terabyte of storage, that demanding more storage, even a great deal more, was no problem, and that they had many more ideas for filling that electronic space. Thus, the MyLifeBits project evolved its mission to become one in which they would capture everything possible, not just existing paper, photo, and video content.

  Today, they’re dealing with an ever-expanding realm of digital materials. Bell is wearing a small camera under his hat and recording conversations and meetings as they happen. He’s using a GPS that’s constantly tracking and recording his location. A BodyBugg armband full of sensors is monitoring and capturing data about his body, including the number of calories he’s burning throughout the day. Logging software working with the graphical user interface on his system is recording everything he’s doing on the computers he uses.

  Most of this is manageable for the rest of us. Even recording video full-time is possible, albeit with huge storage requirements, because a decent digital video camera costs less than a grand. Want to log the TV shows you watch? Set up a media center PC and record them. Movies you see at theaters are harder to copy, but by waiting until their DVD versions hit the stores or they appear on cable TV you can get the data, admittedly with a time delay (and possibly in violation of copyright laws via software the entertainment industry would prefer you not use).

  The constantly accelerating technology trends make the data-capture ever simpler, of course, with cameras shrinking in size and improving in quality, more and more content becoming instantly available digitally, and so on.

  The more data you store, of course, the larger an information organization problem you face. The MyLifeBits team ran into this issue big-time.

  They began by using a simple PC file system and some file naming conventions, but over time this approach proved, as you might imagine, not to be up to the task. They simply had too much data.


  Today, they use a software system built around a SQL Server database. They store both the raw content (text, pictures, email, whatever) and some associated attributes and comments (“metadata” in geek-speak). They also store links among the items in the database. For example, a photo of a meeting might link to the transcript of that meeting and to entries for all the participants. The combination of the database’s searching power, decent metadata, and links between data items makes the stored information quite powerful, though still not as easy to access as they’d like. The more metadata and the more associations between items they can get Bell to make, the more useful the data becomes.

  The power of the links, by the way, is much greater than might be initially apparent. For one thing, the links bring the database closer to the way our brains seem to operate than it would be without them. Trying to remember the name of that guy you met briefly in a meeting last Tuesday? Look up either the entries for the day or for the meeting, whichever works best for you. Searching for an article you read on a Web site while on a business trip in London? Start with the GPS location of the hotel you were in and follow the position-based links. A strong set of links goes a long way toward mimicking the multi-way associational memory store our brains provide.
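
  For readers who like to see such things concretely, here is a toy version of the items-plus-links idea. It is emphatically not the MyLifeBits schema, which I haven't seen; the table names, column names, and helper functions are all invented for illustration, and it uses Python's built-in SQLite rather than SQL Server so that it runs anywhere.

```python
import sqlite3


def open_store():
    """Create an in-memory database with hypothetical items and links tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE items (
            id      INTEGER PRIMARY KEY,
            kind    TEXT NOT NULL,            -- e.g. photo, transcript, person
            content TEXT NOT NULL,            -- stand-in for the raw content
            comment TEXT NOT NULL DEFAULT ''  -- free-form metadata
        );
        CREATE TABLE links (
            from_id INTEGER NOT NULL REFERENCES items(id),
            to_id   INTEGER NOT NULL REFERENCES items(id)
        );
    """)
    return conn


def add_item(conn, kind, content, comment=""):
    """Store one item and return its id."""
    cur = conn.execute(
        "INSERT INTO items (kind, content, comment) VALUES (?, ?, ?)",
        (kind, content, comment),
    )
    return cur.lastrowid


def add_link(conn, a, b):
    """Record an association between two items."""
    conn.execute("INSERT INTO links (from_id, to_id) VALUES (?, ?)", (a, b))


def linked_items(conn, item_id):
    """Follow links in either direction and return (kind, content) pairs, sorted."""
    rows = conn.execute(
        """
        SELECT i.kind, i.content FROM links l
            JOIN items i ON i.id = l.to_id WHERE l.from_id = ?
        UNION
        SELECT i.kind, i.content FROM links l
            JOIN items i ON i.id = l.from_id WHERE l.to_id = ?
        """,
        (item_id, item_id),
    )
    return sorted(rows.fetchall())


conn = open_store()
photo = add_item(conn, "photo", "meeting.jpg", "weekly project meeting")
transcript = add_item(conn, "transcript", "text of the meeting")
person = add_item(conn, "person", "A. Participant")
add_link(conn, photo, transcript)
add_link(conn, photo, person)

print(linked_items(conn, photo))   # everything associated with the photo
print(linked_items(conn, person))  # start from the person, arrive at the photo
```

  Because the lookup follows links in both directions, you can start from whichever item you happen to remember, the photo or the person, and reach the other, which is the associative behavior described above.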

  Of course, the more Bell has to add metadata and create links, the more time the system is demanding from him, turning it from a servant to a master. The MyLifeBits team works constantly to find ways to generate links automatically and to make comments and other metadata as easy as possible for Bell to supply.

  If you follow the news at all, the problem of finding associations in vast quantities of data will sound very familiar; it’s certainly high on the NSA’s to-do list. Once again, technology developments, this time in data mining, will help our cause. They also have the potential to hurt us, of course, as the life-web of data we weave becomes a commodity anyone can search; there’s a dark side to everything.

  But is it valuable?

  A database of this magnitude and highly personal nature is interesting (and raises some scary issues; more on that below), but you have to wonder if it’s useful. In an article on this project in Communications of the ACM, Bell and his co-authors commented on how valuable the information store had become:

  “Having a surrogate memory creates a freeing, uplifting, and secure feeling, similar to having an assistant with a perfect memory.”

  The value of this stored information has proven to be so high, in fact, that Bell has changed the way he works and lives. If he has a spare moment, for example, he might briefly visit a Web page on the off chance that he might later want to have its contents in his “memory.”

  The more valuable something is, of course, the more we feel its loss when it goes missing. A hard drive crash resulted in the loss of four months of captured Web pages, something that Bell has commented he felt as an emotional blow.

  As interesting, successful, and wide-reaching as the MyLifeBits data has been, its team has noted that the list of things they’d like to do is still growing. From content, such as paper books Bell reads, that they’ve chosen not to capture for copyright reasons, to limitations of the current software, the system’s flaws and areas of potential improvement are many. For example, if they’re not already doing it, they could indulge in speculative recording, in which software and/or human agents analyze Bell’s existing stored information and add more data they think he might find interesting. They’ve commented that they always end up regretting not the data they capture, but rather the information they don’t.

  They’re also acutely aware that they’re only touching the edges of the possible value of the information. They’ve noted, for example, that the body statistics might be very useful over time in spotting health trends and issues and the reasons underlying both.

  One of the difficulties they face in figuring out what to record is a phenomenon well known to many of us, and certainly to writers: you frequently can’t know the value of a piece of information until well after you’ve obtained it. Traffic data for the north side of town is not interesting or useful when you live and work on the south side, until, of course, you have to run an errand in the other direction. Tidbits of all sorts of apparently useless readings and experiences turn up in my fiction all the time. It’s the nature of the way our brains work, so it’s only reasonable that the same should be true of our digital memory extenders.

  The MyLifeBits team also understands that the amount of value Bell gets from the stored information depends a great deal on how easy it is for him to search that information quickly. Making it easy to search by any of the available types of paths and links is obviously key, but that’s just a beginning. They’re grappling with data visualization alternatives as they try to find the best ways to present different types of information. They’ve found, for example, that a screen saver that throws up semi-random selections of photos and short video clips has proven useful both as a way to refresh Bell’s (physical) memory and as a means for making it easy and even fun for Bell (and other viewers) to add more metadata.

  These folks are by no means alone in these efforts, of course; researchers all over the world are grappling with the challenge of making the ever-growing store of digital information more useful and easier to use. I already mentioned the NSA, but they’re far from alone in being obsessed with data mining; every large company that’s ever gotten a taste of your credit cards would like to know more about you.

  Transactional data is also but one of the types of information researchers are working on mining. At an Intel Developer Forum last year, for example, I saw a demonstration of technology from the Diamond project, a collaboration between Intel Research Pittsburgh and Carnegie Mellon University. The Diamond project focuses on ways to make it easier for users to search large databases of images, such as photos or medical images. In the demo I watched, the user wanted to find a photo of a particular speaker from the previous year’s conference, but none of the photos had any metadata or even labels or dates associated with them. Because he was seeking a person, the user first told the software to search for images with faces in them. Because the speaker always wore a blue shirt when presenting, the user next instructed the software to search for blue. In only a few steps, he found the photo he wanted. Sure, the demo was carefully orchestrated, but the underlying algorithms and software at work were quite impressive nonetheless.
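
  The shape of that search, one filter narrowing the pool and the next narrowing it again, can be sketched in a few lines. This is a toy, not Diamond's software or interface: the real system worked on unlabeled image data, whereas the hypothetical `has_face` and `dominant_color` fields below simply stand in for whatever its face and color detectors would report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    name: str
    has_face: bool       # assumption: stands in for a face detector's verdict
    dominant_color: str  # assumption: stands in for a color analysis result


def narrow(images, predicate):
    """Keep only the images that satisfy the given filter."""
    return [img for img in images if predicate(img)]


# Made-up sample data for illustration only.
photos = [
    Image("keynote_01.jpg", True, "blue"),
    Image("hallway_02.jpg", True, "gray"),
    Image("slides_03.jpg", False, "blue"),
    Image("booth_04.jpg", False, "red"),
]

# Step 1: the user is seeking a person, so keep images with faces.
with_faces = narrow(photos, lambda img: img.has_face)

# Step 2: the speaker wore a blue shirt, so keep the blue ones.
blue_with_faces = narrow(with_faces, lambda img: img.dominant_color == "blue")

print([img.name for img in blue_with_faces])
```

  Neither filter alone finds the photo, since two images have faces and two are blue, but applying them in sequence leaves a single candidate.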

  The more advances we make in search technology, the more useful the information becomes.

  Issues abound

  Of course, the issues this type of stored information raises are both many and profound.

  Privacy is an obvious and unavoidable concern with any such effort. Store your life online, and you’d better either tightly control who can see the resulting database or abandon any hope of privacy. You would also quite reasonably want fine-grain control, so that different people could see only different portions of the information.

  What happens, though, when the government wants to subpoena your memory?

  Speculative recording is a cool notion and one you might find very useful, but from the moment it started, the database (your online memory) would contain information you’d never seen. Would that data be part of what others should consider to be your memory? Could you be reasonably blamed for forgetting it (e.g., your friend’s first recital)? Subpoenaed for witnessing it (think porn)?

  The problem gets even tougher if you allow others to add information they believe you should know. I don’t even want to think about the domestic arguments that capability could cause.

  If such extended memories were to become common, we’d run smack into major issues regarding the online memory rights of people under the legal control of others, notably children but also older people living with caregivers. If you hated it when one of your parents poked through your stuff, how would you feel if they’d been running their search algorithms over your stored phone calls, instant messages, music selections, and so on?

  Everything I’ve described is possible today, and a lot of it is going on right now in the MyLifeBits project. Even within the limits of today’s technologies, this effort is blurring the line between our biological selves and what we might reasonably think of as our intelligences and memories. What Bush described in 1945 as a vision of the far future is now, like so many speculations, happening, at least on a small scale, in this project. More importantly, none of it is beyond the reach of a moderately wealthy person. Even the wealth is only necessary if you want a support team to do the work for you; the hardware and software would set you back less than the price of a low-end car. As the rate of improvement of the supporting technologies continues to increase, the cost will only lessen as the capabilities grow.

 
