Today, personal data is being used to make decisions at every level of modern life. Electoral politics, tax policy, health care, finance, and education have all been radically changed by the capacity to collect and store personal data on an unprecedented scale. Private companies collect even more data than the government, trying not only to figure out what book you will buy next, but also how many calories you are willing to consume in a single cookie. We, as individuals, also have some access to personal data, which we use when we peruse online hotel reviews or check house sales in our neighborhood. Decisions are being made for us, and occasionally by us, based on enormous collections of personal data.
The free flow of information has always been a central principle of democracy. Yet we as a society are not truly scrutinizing whether all this data is being collected in a manner consistent with our democratic principles. Information is quickly becoming the most valuable resource in our modern economy, but even as scandals arise and fears of privacy invasion grow, we are not asking ourselves the truly difficult questions: Who has access to my personal information? What is it being used for? Is it being used for me or against me?
Perhaps the most important question of all is, "How could this valuable resource benefit all of us and not just the government agencies, institutions, and private corporations who hold it now?"
The Common Datatrust Foundation seeks to answer these questions and propose a new way of thinking about personal data that could revolutionize the way we see ourselves and our society.
1. Who has access to my personal data? How did they get it?
Personal information is being collected from us every day. Although science fiction writers have always warned us to watch out for Big Brother, the U.S. government is not the only holder of personal data. The data from the census and tax returns is dwarfed by the quantity and quality of data that private corporations hold, buy, and sell to each other. Not only do companies store our names, addresses, and credit card information, they also track what we buy and even what we type into search engines.
Despite the sensitive nature of this information, it is collected mostly as a byproduct of transactions. Insurance companies, pharmaceutical companies, banks, and ordinary retailers all collect information, but those who provide online services have a particular ability to capture every action of their customers for aggregation and analysis. They bait individuals with a free service and depend on general apathy to get them to agree to pages of fine print in privacy policy statements and "End User License Agreements." These agreements are essentially meaningless because no one, not even the service provider, understands the implications of agreeing to their terms.
These documents include vague clauses that refer to collecting data about the customer's use of the service, and sometimes cast even broader nets, collecting data about general usage of the user's computer or the information they access over the internet. Most include clauses that state the agreement may change at any time and without any notice. Every effort is made to minimize specifics about what exactly is being collected, both to minimize liability around inaccuracies and because over-specificity might garner undesired attention and scrutiny.
What do we get in return for our personal data? Right now, nothing. We may be able to seek some penalties for misuse, but only after misuse has occurred and only after we prove there were damages. This is because we don't own our personal information; it is not considered our property. Yet our data is valuable enough to companies that buying and selling it is big business.
2. What is my data used for? What's wrong with the status quo?
Karl Rove made headlines for using data on wine versus beer preferences in electoral campaigns, but personal data is used by the government to shape less-sexy policies that nevertheless affect all of us. "The proposed tax cut will help the middle class"; "the proposed fare hike will not affect many transit riders"; "this community needs x amount of federal dollars" - all these claims are made based on personal data, whether collected from the U.S. Census, surveys, or other datasets. On an even larger scale, private companies use personal data every day. Private companies mainly use it to try to sell us more things, but that can mean something obvious, like Amazon making book recommendations, or something less obvious, like using Nielsen ratings to determine how much advertising should cost.
None of these uses is necessarily nefarious. However, there are several significant problems with the status quo. First, there is a great deal of information out there that is personal, valuable, and yet completely inaccessible to the people who created it. Access to information is a crucial condition for democracy, which is why we zealously guard our freedom of speech, press, and association. That some of the largest collection and analysis of information in the world is being done without our explicit participation has serious implications beyond the relatively simple crime of identity theft. The potential harm to our democracy is much graver than any digital divide that is occurring because of socioeconomic disparities in the availability of laptops.
Second, many of the decisions that impact our lives are being made based on flawed or limited data. Private corporations are limited by law from gathering certain kinds of data, and current methods of surveying people are unreliable, yet millions of dollars, both private and public, are spent every day based on inaccurate data.
Third, as bad as the situation is, it could very easily get worse. The status quo cannot last. Consumers are increasingly aware of how much of their information is being collected, especially when a government laptop is stolen, or when Yahoo helps the Chinese government identify an online dissident. Even private corporations are unhappy with the status quo. They are limited in their ability to create high-quality, accurate, and detailed personal data sets, and they know the government is likely to squash their ability to collect even the data they currently collect.
If this happens, we as a society will never find out what we might have achieved with this valuable resource.
3. How could personal data benefit all of us?
For the first time in history, we have the technological ability to view and analyze an infinity of data points, to look beyond anecdotal studies and speculative models, and instead, accurately introspect about ourselves and our society.
An individual with access to an anonymized aggregate of data from similar individuals would have a tool much sharper than a Google search to determine what kind of medical treatment is best, what kind of investment strategies make the most sense, or what a house is really worth. That individual would not have to rely only on the advice of doctors being paid by pharmaceutical companies, financial advisers with interest in the companies whose stocks they sell, or shady real estate developers.
On a larger scale, a broad database of anonymized, aggregated data would allow a public health official to determine how best to use limited resources to fight disease. A researcher could analyze, rather than theorize, whether government tax policies have the effects legislators claim. Cutting-edge technology for anonymization and encryption could allow government agencies to be transparent, as mandated by law, without jeopardizing individuals' privacy. We could actually create programs that work and eliminate those that don't.
We are at the brink of something huge and utterly unprecedented. In the same way Gutenberg couldn't fully anticipate how society would be revolutionized by his printing press, we cannot anticipate how the world could change through smart, thoughtful collection, sharing, and analysis of personal information. The technology has raced ahead of our ability to understand its potential. But private companies have already realized that incredible potential exists, and it's time for the public to realize this, too.
We are at a crucial moment in time. Consumers are concerned about invasions of privacy and identity theft. Corporations are at the limit of what they can collect, and fear data collection will be shut down altogether. Private and public sector interests are aligned in that we all know the status quo cannot continue. We thus have the opportunity to work together to make change now, by shifting the discussion from "privacy versus information" to "privacy and information".
The Common Datatrust Foundation thus proposes the creation of a new kind of institution, a nonprofit "datatrust," that would provide two main services. Wholly transparent and trustworthy, the datatrust would act as the central administrator for data transactions between individuals, corporations, researchers, and government agencies, and maintain a secure store for the data using cutting-edge anonymization, encryption, and watermark technologies.
Data would no longer be collected through vague user agreements or inaccurate surveys. Rather, individuals themselves would actively participate by providing personal data to the nonprofit institution in return for access to the anonymized, communal data store, as well as other data management services. Personal data would become quantifiable personal property of the individual providing it, with each "datagram" deposited the way people currently deposit money in the bank, and with all the protections our law provides to personal property. Individuals would choose among a variety of settings on the extent to which their anonymized data would be made available to researchers, government agencies, and corporations. The terms of each transaction would be clear, explicit, and enforceable under contract law. Individuals would be able to query the data store in a variety of contexts, such as personal financial planning or healthcare. Individuals would have an incentive to provide more accurate, detailed data than is currently available anywhere because 1) they would be confident that it would remain safe and anonymous, and 2) the more information they provide, the more information they would be able to access.
On the other end, researchers, government agencies, and private companies would be able to access detailed, accurate personal data for service fees that would be used to maintain the datatrust. Corporations would no longer be able to simply gather data as a byproduct of other transactions, but they would gain access to accurate, higher quality data. Researchers would gain access to information that could result in revolutionary changes in public health, tax policy, and numerous other areas. Government agencies and other entities could even choose to deposit their databases of personal information with the institution, as many of their existing systems are inadequate and insecure. Eventually, the datatrust would make "grants" of data to researchers or other nonprofit organizations that are unable to afford the service fees.
Clearly, such an institution would not immediately hold a large store of personal data from individuals, but it could quickly provide a secure store for existing datasets, giving the public, including researchers, agencies, and corporations, greater access to data that is detailed, accurate, and secure. More importantly, it would set forth a new paradigm, a new way of thinking about personal data and privacy that openly acknowledges its immense value to individuals and all of society, and not just the corporations and other entities that currently hold it.
Such an institution would have to be absolutely trustworthy, both in terms of the technology it uses and the policies by which it governs itself. There are clearly outstanding issues that need to be resolved - our legal understanding of privacy may have to change, technology will have to be licensed or developed. However, the Common Datatrust Foundation is confident that a trustworthy "datatrust" could be created through transparent governance policies and technological innovation. (For more information on technologies currently being considered, see "An Open Technology Platform," below. For more information on CDTF's proposed governance policies and likely organizational structure, see "Our Values and Our Governance," below.)
Most importantly, however, the Common Datatrust Foundation seeks to foster awareness and dialogue around privacy issues and to promote a solution that returns ownership and control of personal data to individuals without shutting down the full potential of information sharing for society as a whole. The creation of a nonprofit institution to administrate data transactions and store personal data securely is one possible solution, but the Common Datatrust Foundation actively seeks any solution that would effectively solve the data and privacy problems facing us today. The Common Datatrust Foundation therefore pledges to be transparent and open in the policies it promotes and invites participation from individuals, researchers, agencies, and private companies.
5. An open technology platform.
The CDTF technology platform is a secure, structured data storage system that differs from traditional data storage systems in several ways. To mark that difference, we have termed this storage system a "datatrust."
Data is submitted to the database in units called "datagrams," which are one or many database records attributed to the originator of the data, i.e., the individual or institution providing the personal data. Each "datagram" has a set of rules around how it may be disclosed, as designated by the originator. For example, an originator could decide to barter a datagram of personal demographic information for aggregate information about other individuals under the condition that it is disclosed only in aggregate form through the CDTF anonymizing noise filter. When data is queried from the platform, the disclosure requirements for the set of datagrams queried are evaluated, and a series of mechanisms are applied to preserve privacy to the designated level. In contrast, a traditional database does not allow for queries that both respect different privacy settings and maintain the informational value of the data.
Three aspects are key to the success of the platform: security, anonymization, and watermarking, the last of which is meant to keep a recipient of data from using it outside the agreed-upon terms. Luckily, there are promising new technological developments in each area.
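The way disclosure rules might be evaluated at query time can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the `allow_aggregate` and `min_group_size` fields, and the rule that a datagram is dropped when the group is smaller than its originator accepts are all hypothetical, not a published CDTF schema.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class DisclosureRule:
    """Originator-chosen terms (hypothetical fields, not a CDTF schema)."""
    allow_aggregate: bool = True   # may the data feed aggregate results?
    min_group_size: int = 1        # smallest group the originator accepts


@dataclass(frozen=True)
class Datagram:
    originator: str
    records: tuple                 # one or more records, each a dict
    rule: DisclosureRule = DisclosureRule()


def eligible(datagrams):
    """Keep only datagrams whose rules are satisfied by the final group."""
    pool = [d for d in datagrams if d.rule.allow_aggregate]
    while True:
        kept = [d for d in pool if d.rule.min_group_size <= len(pool)]
        if len(kept) == len(pool):
            return kept
        pool = kept  # dropping one datagram can invalidate others, so repeat


def aggregate_mean(datagrams, field):
    """Mean of `field` over eligible datagrams, or None if none qualify."""
    values = [rec[field]
              for d in eligible(datagrams)
              for rec in d.records
              if field in rec]
    return mean(values) if values else None
```

A query over a set of datagrams then answers only from the records whose originators' terms are met, and returns nothing at all when the qualifying group is empty.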
Security
As should be expected, the platform's infrastructure will be secured against unauthorized access using appropriate industry-leading technologies. From a secure end-user client application to the datacenter, and from the datacenter to the delivery of data to institutions, the handling of data will be transparent, open to industry scrutiny, and, in the end, deferential to the control of the data owner, the individual who provided the data. CDTF will continue to investigate, pursue, and invest in security that will keep the datatrust at the cutting edge of computer and network security.
One technology we are examining that takes data security to a new level is called Crypto-Secure Computation. This technology allows the statistical combination of encrypted datagrams into an aggregate encrypted datagram. In other words, each datagram would be individually encrypted with cutting-edge security, so an attacker seeking full disclosure would have to break every datagram separately. The cost of brute-force decryption of every individual datagram submitted would be astronomical, creating an immense (though theoretically surmountable) barrier to full disclosure. At the same time, Crypto-Secure Computation would allow the aggregate datagram to be decrypted at approximately the cost of decrypting a single individual's data, maintaining the flexibility and utility necessary for analysis.
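To make the idea of combining encrypted values concrete, here is a minimal sketch using Paillier encryption, an additively homomorphic scheme. It is a stand-in chosen for illustration: the text above does not say which scheme Crypto-Secure Computation uses, the function names are hypothetical, and the tiny primes offer no real security.

```python
import math
import secrets


def keygen(p, q):
    """Build a toy Paillier key pair from two distinct primes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse, needs Python 3.8+
    return n, (n, lam, mu)  # (public key, private key)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n, with g = n + 1."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random blinding factor in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2


def combine(n, ciphertexts):
    """Multiply ciphertexts; the result encrypts the sum of the plaintexts."""
    total = 1
    for c in ciphertexts:
        total = total * c % (n * n)
    return total


def decrypt(private_key, c):
    """Recover the plaintext with a single decryption."""
    n, lam, mu = private_key
    return (pow(c, lam, n * n) - 1) // n * mu % n
```

With `public, private = keygen(1009, 1013)`, three values encrypted separately can be merged with `combine` and the total recovered with one call to `decrypt`, without decrypting any individual value. The sum must stay below `n` for the result to be exact.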
Anonymization
Core to the value proposition of CDTF is the ability to effectively "anonymize" sensitive data. Today, most privacy discussions revolve around Personally Identifiable Information, which involves the careful handling of information such as people's names, addresses, social security numbers, and credit card numbers. While CDTF will certainly remain at the forefront of best practices around handling such data, it is more concerned with a less intuitive privacy risk. By combining two different sets of seemingly innocuous personal attributes, it's possible to uniquely identify individuals in a dataset. Attention is rarely paid to protecting against this variety of identification disclosure.
One of the most promising technologies CDTF is examining to allow for graduated control over true anonymization of data is called Interactive Query Sanitization. This process adds random noise to query results, which introduces uncertainty, and thus privacy, into the data it secures. Unlike current technologies that "scrub" data, Interactive Query Sanitization allows aggregate query results to remain accurate. At the same time, removing any single individual's data (even from quite small groups) has no perceptible effect on the summary data, which means that even highly targeted querying cannot be used to single out individuals. There is a sliding scale of noise-to-identity that can be adjusted depending on the level of privacy required.
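A small sketch shows how this kind of identification can happen. The records below are made up for illustration, and the function names are hypothetical; the point is only that attributes which each describe several people can, in combination, describe exactly one.

```python
from collections import Counter

# Made-up toy records: no names, only "innocuous" attributes.
TOY_RECORDS = [
    {"zip": "10001", "birth_year": 1970, "sex": "F"},
    {"zip": "10001", "birth_year": 1970, "sex": "M"},
    {"zip": "10001", "birth_year": 1982, "sex": "F"},
    {"zip": "10002", "birth_year": 1982, "sex": "F"},
    {"zip": "10002", "birth_year": 1970, "sex": "M"},
]


def group_sizes(records, attributes):
    """Count how many records share each combination of attribute values."""
    return Counter(tuple(r[a] for a in attributes) for r in records)


def smallest_group(records, attributes):
    """Size of the smallest group; 1 means someone is uniquely described."""
    return min(group_sizes(records, attributes).values())


def singled_out(records, attributes):
    """Records whose attribute combination matches no one else."""
    sizes = group_sizes(records, attributes)
    return [r for r in records
            if sizes[tuple(r[a] for a in attributes)] == 1]
```

In the toy data, `smallest_group(TOY_RECORDS, ["zip"])` and the same check on birth year or sex alone never drop below two people, yet `singled_out(TOY_RECORDS, ["zip", "birth_year", "sex"])` returns every record.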
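The effect of adding noise to a query result can be sketched as follows. This assumes Laplace-distributed noise scaled by a privacy parameter `epsilon`, a common construction for noisy counting queries; the text above does not specify the distribution used, and the function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, epsilon, rng=None):
    """Count matching records, then add noise before releasing the answer.

    Adding or removing one person changes a count by at most 1, so the
    noise scale is 1 / epsilon. Smaller epsilon means more noise and
    more privacy; larger epsilon means a more accurate answer.
    """
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

Here `epsilon` plays the role of the sliding scale: a count over thousands of records stays useful because the noise is small relative to the total, while the presence or absence of one person is hidden inside it. A real system would also need to limit repeated queries, since averaging many noisy answers to the same question would wear the protection away.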
Watermarking
Because CDTF will be releasing data frequently to parties for explicit uses by individuals, organizations, and other entities, data-watermarking technology is of great interest. Watermarks can retroactively indicate the responsible party in the event of an unauthorized transfer of data. Traditional data watermarks, however, have had trouble proving which of the numerous parties who handle data during a transaction is responsible for the disclosure. Crypto-secured Watermarking allows for a mathematical proof that watermarked data could only have been unpackaged (and then transferred without authorization) by the receiving party. So, CDTF would encrypt data (probably with a public-key/private-key type mechanism) such that only the intended receiving party could have opened the watermarked data. This could be used as a mechanism for data buyers to prove their proper handling of data and, in the event of an unauthorized disclosure, help prove who should be held liable for damages. Additionally, the existence of such technologies would have a deterrent effect on those tempted to misuse CDTF data.
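The marking half of this idea can be illustrated with a simple per-recipient fingerprint. The sketch below is an assumption-laden simplification: it hides a recipient-specific pattern in the lowest bit of each integer value using an HMAC, it does not implement the public-key packaging step described above, and the function names are hypothetical.

```python
import hashlib
import hmac


def mark_bit(secret, recipient, index):
    """Pseudo-random bit tied to one recipient and one record position."""
    msg = f"{recipient}:{index}".encode()
    return hmac.new(secret, msg, hashlib.sha256).digest()[0] & 1


def watermark(values, secret, recipient):
    """Return a copy of integer values with the recipient's mark embedded.

    Only the lowest bit of each value is overwritten, so every value
    changes by at most 1.
    """
    return [(v & ~1) | mark_bit(secret, recipient, i)
            for i, v in enumerate(values)]


def match_score(values, secret, recipient):
    """Fraction of values whose lowest bit matches the recipient's mark."""
    hits = sum((v & 1) == mark_bit(secret, recipient, i)
               for i, v in enumerate(values))
    return hits / len(values)


def identify(values, secret, candidates):
    """Pick the candidate whose mark best matches a leaked copy."""
    return max(candidates, key=lambda r: match_score(values, secret, r))
```

Each recipient gets a slightly different copy of the same dataset. If a copy surfaces where it should not, the holder of `secret` can score it against every recipient: the true recipient matches on every value, while others match on roughly half.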
As CDTF works with researchers and others involved in the development of these and other technologies, CDTF will continue to remain open about how the technologies are being developed and any potential weaknesses that arise.
6. Our values and our governance.
The Common Datatrust Foundation believes that public access to information is crucial for a democratic society. Correspondingly, it believes that as an organization, it can only become trustworthy if its operations are transparent and open for public scrutiny. CDTF therefore seeks to create not only a datatrust that is technologically secure, but also an organizational structure that forces CDTF's directors and staff to operate in a wholly transparent manner. CDTF will work with experts in security technology, information, and nonprofit law to develop policies for operating externally and governing internally that are in keeping with the following values.
CDTF will be motivated by its long-term mission rather than any short-term financial concerns.
CDTF's decision to incorporate as a 501(c)(3) not-for-profit organization was carefully considered. Because CDTF would be a "datatrust" and holder of a very valuable asset, information, it is crucial that CDTF's directors and staff be motivated by goals that are broader and richer than short-term financial gain. The directors of for-profit businesses have a fiduciary responsibility to seek profit; in contrast, the directors of CDTF will have a fiduciary responsibility to its nonprofit mission.
Currently, CDTF is funded by donations from a private donor. CDTF will soon seek funding from established foundations with similar goals and values. In the long run, CDTF will charge service fees to institutions, agencies, and businesses that seek to access and analyze the anonymized information in the datatrust, with eventual "grants" of data being made to worthy applicants. All fees will go towards maintaining and operating the datatrust; no "profits" will inure to CDTF's directors or employees. CDTF will avoid any debt financing, as its primary asset should never be used as collateral. CDTF will maintain a "shut-down fund," separate from all other funds available for its operation, that will be used to destroy all the data in the datatrust should the organization dissolve for any reason.
CDTF will create clear-cut, explicit standards for use by information-providers and information-collectors that allow individuals to specify precisely how their information may or may not be used; these terms will not be subject to change.
Many "privacy policies" do not actually promise to keep personal information private, as most state that their terms are subject to change at any time. CDTF pledges to do the opposite: to broker exchanges of data between individuals and agencies, institutions, and businesses based on clear-cut, explicit standards for anonymity and secondary use that are never subject to change. If necessary, CDTF will support legislation or litigation that supports an individual's right to control his or her personal information. Although CDTF will establish its datatrust with the goal of promoting information-sharing and greater access to knowledge, it will never transfer the information in the datatrust in any manner that violates these standards, including transfers of assets upon the event of CDTF's dissolution.
CDTF will minimize to every extent possible conflicts of interest between the organization and its directors and staff.
CDTF recognizes that all organizations, for-profit or not-for-profit, run the risk of misconduct by their directors and/or employees. Given CDTF's mission and the sensitive and valuable nature of its primary asset, any misconduct by directors or employees would be particularly dangerous. As a nonprofit, CDTF will both cooperate fully with every regulatory requirement and go beyond these requirements to minimize to every extent possible any misconduct by its directors and employees. CDTF will aim to set a standard for transparency within the nonprofit sector.
CDTF will work to create an environment, a healthy ecosystem, in which the organization can be open, creative, and flexible in the pursuit of its mission.
CDTF's mission is broader than the establishment of a datatrust in any particular shape or form. In order to achieve its mission, CDTF pledges to create an environment in which creative solutions to advance these goals are given due consideration. Although there are many outstanding issues to be resolved, CDTF will address these issues in an open manner with participation from a diverse and interested community of advocates for privacy and democratic information-sharing.
© 2008 The Common Datatrust Foundation. All Rights Reserved.