A Gentle Introduction to Robotics

Information Security

Introduction

Computer or information security is a very large field; it is a specialty area of practice. Security is difficult because of three absolutely unique aspects:

This section is not going to turn you into a computer security consultant. It is not even going to give you recipes for success. It will, however, give you some background concepts and information, which can serve as a base for your further learning if you choose to pursue it.

Three Pillars of Information Security

There are three overall factors generally considered to be key in information (and to an extent computer) security. These are conveniently summarized by the acronym CIA, an acronym which has some connotations in this area:

Confidentiality: information is disclosed only to legitimate users.
Integrity: information is created or altered only by legitimate users.
Availability: information is accessible to legitimate users whenever they need it.

Successful attacks against any of these three factors can be devastating: the attacker can subvert, and possibly control, the behavior of the entities relying on the information.

I also want to point out that phrase, “legitimate users.” Legitimacy is not something a machine can judge before performing an operation. Legitimacy is something only a human can judge and typically only after the fact. Someone exceeding their authority or accessing data for a malign purpose is not a legitimate user, but may very well be an authorized one. (You occasionally read in the news of a police officer running a criminal records check on their daughter’s new boyfriend, or some such. That is an example of the difference.) Technologically, we can only limit access to authorized users; the organization itself has to make the distinction, typically after the fact, between authorized use and legitimate use, and respond in some way when they get out of sync with one another. And they will: humans are going to human, and managers will continue to demand their workers pull rabbits out of hats even when the security policy forbids hats and all varieties of lagomorphs. But this means, at a minimum, logs of operations, associating the operations with the users on whose behalf they were performed, and someone reviewing the logs.

Identification, Authentication, and Authorization

The question of “legitimate users” is typically addressed through three steps:

Identification: the user claims an identity, typically by producing a user name.
Authentication: the user proves the claimed identity really is theirs.
Authorization: the system checks that the identified user is permitted to perform the requested operation.

Of these, the easiest is identification, the most challenging is authentication, and the most tedious is authorization. Identification, in most systems, is the production of a user name: a string that uniquely identifies a user of the system. This is typically assigned by the system, or may be something taken from a namespace believed to be unique, such as email addresses. Authorization, in most systems, is a long list of available operations and a yes-or-no permission for each identified user to perform the operation. For convenience, users are typically combined into groups which have sets of permissions: so a user can perform operation x if the user belongs to a group that can perform operation x or has an independent user-specific permission to perform operation x. Each operation, when it is performed, must check to see whether the user requesting the operation is permitted to perform the operation — the user is authorized to perform the operation — and refuse to proceed if not. Group permissions, incidentally, should be based on job function, not organizational division. I have been a software developer in a manufacturing group that for some bizarre administrative reason reported through the finance department. Despite being “in finance,” I strongly suspect the company didn’t actually want me to have access to the company’s general ledger or accounts payable.
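
To make the group-or-individual permission check concrete, here is a minimal sketch in Python. The operations, groups, and users are invented for illustration; a real system would keep these tables in a database and log every check.

    # Hypothetical permission tables: which groups may perform which operations,
    # plus any user-specific grants.
    GROUP_PERMISSIONS = {
        "post_invoice": {"accounts_payable"},
        "run_payroll": {"payroll"},
    }
    USER_GROUPS = {"alice": {"accounts_payable"}, "bob": {"software_dev"}}
    USER_PERMISSIONS = {"bob": {"deploy_build"}}

    def is_authorized(user: str, operation: str) -> bool:
        # Allowed if one of the user's groups permits the operation,
        # or the user holds an explicit user-specific permission.
        via_group = bool(USER_GROUPS.get(user, set()) & GROUP_PERMISSIONS.get(operation, set()))
        via_grant = operation in USER_PERMISSIONS.get(user, set())
        return via_group or via_grant

    def post_invoice(user: str, invoice) -> None:
        # Every operation checks authorization first and refuses to proceed if it fails.
        if not is_authorized(user, "post_invoice"):
            raise PermissionError(f"{user} is not authorized to post invoices")
        # ... perform the operation, and log it with the user's identity ...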

Now let’s talk about authentication. Typically authentication relies on one of three things: something the user is, something the user knows, or something the user has. Examples of each are biometrics (measurements of the physical being presenting as the user: fingerprint or retinal scanners, for example), passwords (a string typed in), and dongles (small devices, typically USB, which produce unique results to challenges; or timed random number generators producing a unique but known time-dependent sequence).

Almost every system uses the simple password as an authentication factor. The password is supposed to be a secret known only to the user and the system, and the ability to produce the password is taken as authentication of the claim to identity. A common improvement on this arrangement is for the system to know not the password itself but a cryptographic hash of the password: the result of a complex, information-losing, irreversible computation on the password itself. The password supplied is then subjected to the same cryptographic hash, and if the hashes match, the password supplied is valid. This is an improvement because if the system’s password database is captured, the passwords themselves remain secure: the hashes cannot be reversed into the actual passwords. (This is more or less true with certain precautions, including what is called “salt” and using a high-quality hash such as SHA-256.)
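
As a concrete sketch, using only the Python standard library: the stored record is a random salt plus the hash, and checking a login repeats the same computation. PBKDF2 over SHA-256 is used here as one common way of applying a hash like SHA-256 to passwords; the iteration count is an illustrative choice, not a recommendation.

    import hashlib, hmac, os

    def hash_password(password: str, salt: bytes | None = None) -> tuple[bytes, bytes]:
        # A fresh random salt per user defeats precomputed ("rainbow table") attacks.
        salt = salt or os.urandom(16)
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
        return salt, digest  # store both; neither reveals the password

    def check_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
        _, candidate = hash_password(password, salt)
        # Constant-time comparison avoids leaking information through timing.
        return hmac.compare_digest(candidate, stored_digest)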

A challenge-response protocol can ensure the password itself or its hash (which functions as a password) is never sent over the connection between the user and the system. The system generates random data (the challenge) and presents it to the user; the user modifies the random data in a way that depends on the password and sends the modified data back to the system. The system then performs the same computation on the original random data using its own knowledge of the password and checks whether the two sets of modified data are the same; if so, the user has demonstrated knowledge of the password.
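
One common way to do the “modify the random data in a way that depends on the password” step is an HMAC keyed with the shared secret. This is a sketch of the idea only, not any particular standardized protocol:

    import hashlib, hmac, os

    def make_challenge() -> bytes:
        # Server: a fresh random challenge for every login attempt.
        return os.urandom(32)

    def respond(challenge: bytes, secret: bytes) -> bytes:
        # Client: mix the challenge with the secret; the secret itself never travels.
        return hmac.new(secret, challenge, hashlib.sha256).digest()

    def verify(challenge: bytes, secret: bytes, response: bytes) -> bool:
        # Server: recompute with its own copy of the secret and compare.
        expected = hmac.new(secret, challenge, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)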

The password — the something the user knows factor — has been found to be unreliable in practice. Users forget passwords; users write down passwords and put them on a sticky note on their monitors; users tell others their passwords in exchange for offers of “oh, I’ll do that for you, what’s your password?”; and so on. Multi-factor authentication depends on two or more of the authentication factors (being, knowing, having).

There is some debate over what counts as multi-factor authentication. The typical “security question” obviously does not: it is a second “knowing” factor, not an independent factor: it is just a longer password (and should be treated as such: the street of my childhood home is not in fact Rumplestiltskin, and my mother’s maiden name is not in fact Wendywindywattywoo, but there is a web service out there that thinks so.) A code sent as a text to a designated mobile phone counts as multi-factor authentication and is easy to implement, but it turns out the telephone network is notably easy to subvert, so a code sent as a text to a designated mobile phone is not a particularly good second authentication factor. Fingerprint scanners and security dongles are the current favorites: fingerprint scanners address the “what the user is” factor, while security dongles address the “what the user has” factor. Software support for these, however, is notably a great deal more complex than the simple comparisons required for passwords, verification questions, and texted security codes.
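
The timed-code style of “something the user has” (authenticator apps and some hardware tokens) is typically a time-based one-time password: both sides hold a shared secret, and the code is a truncated HMAC of the current 30-second interval. A minimal sketch following the usual TOTP construction:

    import base64, hashlib, hmac, struct, time

    def totp(secret_b32: str, interval: int = 30, digits: int = 6) -> str:
        # The shared secret is usually distributed base32-encoded (the QR code you scan).
        key = base64.b32decode(secret_b32, casefold=True)
        counter = int(time.time()) // interval            # whole intervals since the epoch
        digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
        offset = digest[-1] & 0x0F                        # dynamic truncation
        code = (struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF) % (10 ** digits)
        return str(code).zfill(digits)

    # The server computes the same value (allowing a step or two of clock skew)
    # and compares it with what the user typed in.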

Misfortunes and Disasters

Attacks can be directed (by an adversary), or random (by circumstance). We call attacks by circumstance “misfortune,” although thinking of them as attacks on vulnerabilities is actually more fruitful. A random “attack” might be loss of power, for example. Loss of power affects availability: if the computers are unpowered, they aren’t going to be responding to requests. Loss of power can be a random misfortune: a car crashed into a power pole, a power pylon fell over in a high wind, a tunnel going under a river developed a leak. Responding to random misfortune, or failure preparedness — what happens when things go wrong — is an aspect of information security. Responding to widespread random misfortune, or disaster preparedness — what happens when a lot of things go very wrong over a wide area — is also an aspect of information security. A typical disaster preparedness plan for a large organization involves a “hot” (or at least warm) physically remote duplicate information center, an information feed to keep the remote site up to date, and failover arrangements which are periodically tested; backup generators, fuel stores, food and water stores, and portapotties; an excess of trained and available personnel; on-site spares; transportation and supply arrangements to deal with infrastructure destruction; and so forth.

Now, the type of disaster preparedness that is worthwhile for you or your organization is a different question. In general, if keeping track of the orders you received for live chicks is going to be the least of your worries when the dam breaks and the pieces of your house and hatchery float down the river, then no, a remote data center is probably not a worthwhile preparation. Your disaster preparedness plan probably should be just a detailed inventory of assets for the insurance claim, and maybe a CD kept by your brother-in-law in Iowa holding your customer list for when you rebuild. On the other hand, if you are Bank of America, you probably want the sudden disappearance of your data center in the World Trade Center in New York City — possibly associated with the sudden disappearance of the World Trade Center, or, if the balloon well and truly goes up, of New York City itself — not to completely halt ongoing operations, and a remote data center (and not “just across the river”) and a slate of remote “just in case” corporate officers is probably a worthwhile investment.

The incidence of disasters is also related to reliability measurements. You will sometimes see references to “five nines” availability — where the availability of a system is “guaranteed” to be 99.999%. Think about that for a moment: it means a total of one day of unavailability in 274 years. Now consider: can you identify an area of inhabited Earth surface that has been undisturbed by fire, flood, earthquake, tsunami, hurricane, tornado, war, or other disaster in the last 274 years? Such places do indeed exist — but your data center isn’t at one of those places, is it? Some vendors even claim “six nines.” One day of outage in 2738 years? Don’t make me laugh: 2700 years ago, Rome and Sparta were being founded and the first Mayan cities were being developed. The Antikythera mechanism isn’t that old. Entire civilizations have risen, and fallen, and risen again in that time. Geography doesn’t have six nines availability. The Great Pyramid of Giza has six nines availability; a device — any device, backed by any maintenance plan — does not.
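
The arithmetic behind those numbers is easy to check; a quick sketch:

    # Allowed downtime implied by an availability guarantee.
    for label, availability in [("five nines", 0.99999), ("six nines", 0.999999)]:
        minutes_per_year = (1 - availability) * 365.25 * 24 * 60
        years_per_day_of_outage = 1 / ((1 - availability) * 365.25)
        print(f"{label}: about {minutes_per_year:.1f} minutes of downtime per year,"
              f" or one full day every {years_per_day_of_outage:,.0f} years")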

This question of what preparedness and protection is, or is not, worthwhile actually applies to all information security concerns. At the high end, an information security compromise might result in the destruction of the organization and possibly consequences for those charged with protecting that information. While doubtless unfortunate and unpleasant for all concerned, this is not in fact the end of the world. (Unless it actually is the end of the world, in which case, well, bye.) But typically a breach of information security has surprisingly few and mild negative consequences. Storing usernames and plain text passwords in your database is a poor security practice: if someone copies the database, you can no longer tell who is a legitimate user and who is an impostor. Is that an actual problem? For the users who reuse passwords, sure; but for the organization? Maybe, maybe not. Having to send a mass email saying “hey everybody, change your passwords!” seems to be about the limit of organizational consequences. Storing usernames and unencrypted email addresses in your database is a poor security practice: if someone copies the database, most of your users can be identified in real life. Will disclosure destroy the organization? If you’re Ashley Madison, one would have thought so; but apparently not. Some bad press, a few resignations, some lawsuits, and life goes on. If you’re a national government, though, it’s apparently just one of those things citizens need to put up with. Some bad press, a few resignations, some lawsuits that get quashed with sovereign immunity, and life goes on. China gets the specifications and blueprints for the F-35? Panic and recriminations, but overall actually meh: some bad press, a few resignations, and life goes on. All your data gets encrypted by ransomware? Some bad press, a few resignations, your business insurance carrier pays and then drops you, you get new insurance, and life goes on. Your major competitor gets a copy of your five year plan? Great! They’ll be distracted while you do something else instead; it’s not like your five year plan was realistic, anyway.

In the long term, defending against misfortune and attack is typically cheaper than not defending. But in the short term, simply ignoring risk and threat is a high-payoff strategy — at least until it suddenly isn’t. Your organization’s appetite for risk, and (unfortunately) intelligence in assessing risk, will guide its response. Most organizations refuse to believe in risk until risk has turned into threat, a great many refuse to believe in threat until threat has turned into danger, and a few refuse to believe in danger until danger has turned into harm; and somewhere around risk becoming threat it’s typically too late to do much about it. But this is typical; after all, this is business we’re talking about: “preparation and hard work pay off in the future, but willful ignorance and procrastination pay off now!”

The Adversarial Landscape

In security, you face not just an indifferent environment but adversaries. And there are adversaries, and then there are adversaries.

The scope of possible attacks depends a great deal on who the attackers are. A script kiddie on the other side of the world is going to present as ill-intentioned packets arriving on the network interface, and maybe a misspelled phishing email or two. A competitor is going to present as attractive employment offers to knowledgeable key personnel, or plain old bribes to plain old employees, or the purchase of your office cleaning contractor as a way to get physical access to your computing equipment. A ransomware criminal organization is going to present as spear phishing emails, or possibly a subverted web site somewhere. A nation state, on the other hand, is going to present (on the subtle end) as physical intrusion, targeted extortion of key personnel, assassination, strikingly beautiful persons suddenly attracted to your corporate officers, network and environmental subversion, pre-installed hardware back doors in computing equipment, intelligent and actively malevolent custom one-off manufactured USB flash drives substituted for your Amazon order, or (on the not-so-subtle end) as air strikes, shelling, and invasion by infantry and possibly armored divisions; and an incredible range of other possibilities in between. Recall that law enforcement in the United States routinely involves subversion of targets’ computing platforms, has involved both the destruction of an entire domestic city block (Philadelphia) and the invasion and occupation of an entire sovereign country (Panama), and addressing “national security” concerns has even less restraint. Be careful of the attackers you attract.

In general, if subverting your security is a line item in anyone’s budget, your security practices must be orders of magnitude more extensive than would be regarded as typical — or possibly even sane. The range of attack options open to a powerful adversary is unbelievable; and as we have seen in the early 2020s, attracting the attention of a powerful adversary is easier than it seems: being successful is enough.

Which is why information security is not absolute; it is relative to the threats you expect to face. That said, absolute — or at least very resistant — information security is available. It involves hollowed out mountains, blast doors, isolated networks, exhaustive personnel screening and surveillance, armed guards, manual data transfer among compartmentalized computing elements by security officers, computing equipment suspended mid-air where all connections can be observed, building computing equipment yourself from chips you yourself have examined under electron microscopes, powering computer equipment through motor-generator pairs, equipment rooms mounted on springs with a few feet of rattle space all around, operator chairs bolted to the floor with retention harnesses for the personnel, Faraday cages, capture denial devices probably in the five to ten megaton yield range, and doubtless a thousand other preparations neither of us is paranoid enough to ever think of. Your organization isn’t going to do that — it in fact cannot do that unless it is a major nation state; and even major nation states will do that very rarely. You’re going to be lucky if your organization has an IT department that knows how to configure a network firewall, and true organizational dedication to security means someone reviews the firewall logs from time to time.

Most organizations don’t have to deal with nation state adversaries. (I have worked at places that did. It is, shall we say, an interesting experience.) In the absence of nation state adversaries, typical “best practice” defenses are enough:

a network firewall between your internal systems and the Internet;
prompt installation of security updates for operating systems and applications;
passwords (or better, multi-factor authentication) on every account, including the WiFi;
least-privilege authorization, assigned by job function;
logs of operations, with someone actually reviewing them;
regular, tested backups kept off-site;

and so on. You can find many simple security precaution checklists on the web; all are pretty good. These are simple, sensible precautions that do not get in the way of actually doing work, but do get in the way of misfortune and unsophisticated attacks.

Oh, and test your backups. Make sure you can in fact restore from the backups: not you should be able to do so, not you think you can do so, but you have done so within the last quarter and therefore know you can. It’s important. Out of all the information security preparations you might need, the one you are absolutely, 100% guaranteed to need is backups.
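
What “you have done so” can look like in practice: restore the backup to a scratch location and compare checksums against the live data. A minimal sketch; the directory names are placeholders.

    import hashlib
    from pathlib import Path

    def sha256(path: Path) -> str:
        # Hash in chunks so large files need not fit in memory.
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def verify_restore(original_dir: str, restored_dir: str) -> bool:
        # Every file in the original tree must exist in the restore and match exactly.
        ok = True
        for orig in Path(original_dir).rglob("*"):
            if orig.is_file():
                restored = Path(restored_dir) / orig.relative_to(original_dir)
                if not restored.is_file() or sha256(orig) != sha256(restored):
                    print(f"MISMATCH: {orig}")
                    ok = False
        return ok

    # verify_restore("/data/live", "/restore-test/data")   # placeholder paths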

The Role of Physical Security

Complicating information security is the observation that all security depends on physical security: without physical control of the physical device at all times, there simply is no hope of information security.

Computation is not something people do. Computation is something people manage. Other things — mechanisms — actually do the computation. It is instructive to ponder, for a moment, the physical attack points on a typical desktop computer system. What components, if compromised, could reveal, or change, or inject information? Start with the room itself: the sound of the keyboard being used can reveal what is being typed. The keyboard itself obviously knows what is being typed. The keyboard itself contains a microcontroller, which can either be reprogrammed or replaced and so subverted. The cable between the computer and the keyboard knows what is going on. So does a whole raft of software on the computer — some of which you can introduce, all of which can be subverted with physical access, some of which can be subverted with only network access. The computer itself: the bus, all bus participants, the hard drive, the CPU, the memory subsystem, the DMA controller, the keyboard controller, and a whole lot of other things all know, or can find out, exactly what is going on in the system — and change it. The cable to the display, the display itself, the light emitted by the display, the van Eck radiation emitted by the display as a side effect, the current induced in the AC line by the display as a side effect, they all can reveal what is going on.

Now, can you do something about all that? Yes you can, but you won’t. (After all, what is your construction budget for information security? You’ll need a bunch of sheet copper for the Faraday cage, a motor-generator pair for the supply current, anechoic tiles for the walls and ceilings, a steel door, a badge reader lock, and a bunch of other things — for each workstation. Otherwise, you’re doing security theater, not security.) But then, you don’t need to do something about all that, either. With the exception of your significant other putting a key logger on the family computer to catch you having an affair, these are all in the realm of things that need a “sophisticated” adversary to exploit, and you don’t have a “sophisticated” adversary. (Unless you’re Bank of America or the NSA, in which case you do have sophisticated adversaries; but if you are Bank of America or the NSA, you also have a construction budget for information security.) But even major Wall Street financial institutions seem to have trouble putting passwords on their WiFi networks; you can do at least that, right?

As I mentioned before, if subverting your security is a line item in someone’s budget, things get a lot more interesting and it’s time to call in the professionals. But at least do the best practices, which are the computer equivalent of locking your doors when you’re not at home. Prudence is not paranoia.

Where the physical control basis of security actually shows up in typical software development is client-server applications: the server is here, inside the security perimeter, and the client is out there, somewhere in the world. Hint: it’s not your client your server is talking to. Your server is talking to something that just speaks the same language as the client you wrote, and the language is typically standardized. If the client touches something, produces something, or tells you something, verify it, don’t just trust it. Use https with both client and server certificates; use challenge-response or personal certificates to authenticate the user; check the server’s record of the user’s authorization to access functionality, don’t rely on the client saying it’s authorized. Trust is for inside the security perimeter — unless you have an even stronger policy in place such as a zero trust architecture. Network-borne attacks don’t need a nation state behind them: a network-borne attack is within the reach of a single motivated, curious, or even bored individual, and well within the reach of a small group. In client-server, verify, don’t trust.
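
For the “https with both client and server certificates” part, most languages expose this through their TLS library. A sketch of the server side using Python’s standard ssl module; the certificate file names are placeholders, and the authorization check against the server’s own records still has to happen after the handshake.

    import socket, ssl

    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile="server.pem", keyfile="server.key")  # server's own certificate
    ctx.load_verify_locations(cafile="client-ca.pem")  # CA that signs acceptable client certificates
    ctx.verify_mode = ssl.CERT_REQUIRED                # refuse clients that cannot present one

    with socket.create_server(("0.0.0.0", 8443)) as listener:
        with ctx.wrap_socket(listener, server_side=True) as tls_listener:
            conn, addr = tls_listener.accept()         # handshake validates the client certificate
            print("client presented:", conn.getpeercert())
            # ... look up this identity in the server's authorization records before doing anything ...
            conn.close()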

Data in Motion

These days, data does not stay in the data center. Data gets shipped around: to customers, to vendors, to banks, to business partners, to remote offices, to cloud providers. Data in motion is particularly vulnerable to observation and corruption. Once the data goes out of the building, it is in someone else’s hands, and your information security depends on that someone else’s information security. Since you probably can’t buy Verizon or Lumen Technologies and bring them inside your security perimeter, you have to adjust. Typically, adjustment means end-to-end encryption.

Encryption is a reversible mathematical operation on data which turns the data into an uninterpretable mess. However, reversing the operation requires knowledge of a secret: without the secret, you cannot reverse the operation. Which implies you are left with an uninterpretable mess, of which you can make neither heads nor tails. Which in turn means an observer can observe that communication is happening (which leads to an attack called “traffic analysis,” which can reveal that something is up and these particular entities are involved in the something), but not the content of the communication (the particular something that is up).

End-to-end encryption means the two endpoints of the communication — the origin and the destination — are performing cryptography with the same “cyphertext” — what one receives is exactly what the other sent. This is different than mediated encryption, where two clients communicate with one another through a server, and the client-server communication is encrypted but the server has to decrypt from one client to get “cleartext” in the middle before reëncrypting for the other client. (This, incidentally, is how corporate firewalls typically guard against data exfiltration and network borne threats, and how the Great Firewall of China works: they “man in the middle” the connection, decrypt it, examine the data, and reëncrypt the data to continue its travels. At least one of the parties to communication has to permit this to happen; typically it is the party inside the corporation or China, and this happens through a “trusted certificate.” With the certificate, the connection is man in the middle’d; without the certificate, the connection is simply blocked.)

Encryption comes in two forms: shared key and public key. In shared key encryption, a small quantity of data — typically 16 bytes of random data — is known to both parties, and this small quantity of data is used as “the secret” for the encryption. This type of encryption is also called symmetric encryption. In public key encryption, through mathematical wizardry, a very large secret (typically 256 or 512 bytes) is split into two pieces: either piece can be used without revealing the secret, but both together reveal the secret. One of these pieces is called the public key and can be handed out freely. The public key lets data be encrypted, but not decrypted; decryption depends on the other piece, called the private key. The mathematical operations to use the public key are cumbersome and take quite a long time, while symmetric or shared key encryption is typically quite rapid; so the public key is used primarily to encrypt a random session key, which is used to encrypt the actual data with symmetric encryption.
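
A sketch of that hybrid arrangement, using the third-party Python cryptography package (an assumption; any mainstream crypto library offers equivalents): the bulk data goes under a fast symmetric cipher with a random session key, and only the small session key goes through the slow public-key operation.

    import os
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)

    # Receiver's key pair: the public half can be handed out freely.
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    public_key = private_key.public_key()

    # Sender: symmetric encryption of the data under a random session key...
    session_key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)
    ciphertext = AESGCM(session_key).encrypt(nonce, b"the actual data", None)
    # ...and public-key encryption of just the session key.
    wrapped_key = public_key.encrypt(session_key, oaep)

    # Receiver: unwrap the session key with the private key, then decrypt the data.
    recovered = AESGCM(private_key.decrypt(wrapped_key, oaep)).decrypt(nonce, ciphertext, None)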

And incidentally, random means random. Neither you nor I can produce random bits: actual randomness is not something organic tissue can do. Nor is it something software can do, because software is deterministic. A major function of a “hardware security module” is the generation of actual random bit streams from thermal noise, radioactive decay, microsecond timings of excited electron orbital decay, and other very high quality sources of randomness. Most modern CPUs have a true random bit stream generator buried inside, which uses manifestations of quantum randomness (typically thermal noise) to generate truly random bits; most modern operating systems have facilities which combine truly random events (microsecond keyboard timings, microsecond hard disk drive response times, the CPU’s random bits) into truly random bit streams. These true random bit streams are then expanded into much longer pseudo-random bit streams through — what else? — encryption; these pseudo-random bit streams are “sufficient” to serve as keys to protect data. In Linux, /dev/random provides actual random bits, while /dev/urandom provides pseudo-random bits; the purpose of /dev/random is to seed /dev/urandom. /dev/random can run out of randomness (called, in this context, entropy); /dev/urandom cannot. In .NET, the System.Security.Cryptography.RNGCryptoServiceProvider class provides a platform-independent way to access these “cryptographically random” pseudo-random bit streams. In microcontroller systems, where there is typically no convenient source of actual randomness, we build our own, typically using resistor thermal noise or reverse biased diode breakdown noise. We will cover this when we get to electronics and microcontroller systems.
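
In practice, application code should simply ask the operating system or the language runtime for cryptographic randomness rather than rolling its own. In Python, for example:

    import os, secrets

    key = os.urandom(16)                          # bytes from the OS cryptographic random source
    token = secrets.token_bytes(32)               # same source, via the higher-level secrets module
    reset_link_part = secrets.token_urlsafe(16)   # e.g. for session IDs or password-reset links

    # random.random() and friends are NOT suitable for keys: they are deterministic
    # pseudo-random generators designed for statistical quality, not unpredictability.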

All of which means, if you don’t have shared secrets and aren’t using certificates (the implementation of public and private keys), you aren’t doing encryption, and your data is open to view while it is in motion. Which you would think was obvious, but I have seen developers who think they are doing “encryption” by wrapping data in XML; because, after all, who could possibly interpret XML? (Answer: any human being, or any XML parser. Ease of interpretation is, in fact, why we use XML in the first place.)

Data at Rest

When data leaves the building, or the organization, it becomes vulnerable to disclosure and substitution. But even data not intended to leave the building is subject to exfiltration — surreptitious transport out of the building by ill-intentioned humans or software — or capture — change in physical control of the building. (Capture? No one is going to come in with guns and capture the building, you say. That’s true, no one except the IRS will come in with guns, unless you have been very naughty indeed. Instead they will buy your landlord, and lock you out over a “rent dispute” which will get cleared up as a “mistake” immediately after they have rummaged through your stuff. If they sell on the landlord afterwards, this won’t even cost them anything — capital is cheap. They might even make a profit.) So it is wise to prepare for potential loss of control of data, even data held internally. We do this, too, through encryption.

Data storage encryption comes in various forms. One popular form is the encrypted disk volume: the data on the disk is encrypted, and hence incomprehensible without the key. The key is provided to the hardware or the system software when the disk is mounted. The data on the disk is decrypted, transparently, when the data is fetched from the disk; and data going to the disk is encrypted, again transparently, when it is stored. The physical disk never has cleartext data; there is no massive “decrypt the disk” or “reëncrypt the disk” operation involved. Instead, just the bits and pieces transferred from and to the disk are decrypted and encrypted. An encrypted disk might be an actual physical disk drive or partition on a disk (whole disk encryption) or a logical disk drive (virtual disk image encryption). Databases typically provide independent encryption facilities, too, either for the database as a whole or for specific columns in specific tables. These are examples of file-based encryption, rather than whole disk encryption: most data on the disk is cleartext, but some is protected. Backups — copies of data kept as a protection against loss of the primary copy of the data — are typically not treated as sensitive, but they contain everything the primary data does, and so should always be encrypted. Encrypting the backup save set is also an example of file-based encryption.

Encrypting operational data has a risk: what do you do if the encryption key is lost or becomes unavailable? If the key becomes unavailable, the data itself becomes unavailable. Encrypting data at rest is a choice, one which ranks loss of availability as less severe than loss of confidentiality. Joe over in finance kept his disk encrypted, just like policy says, but Joe is currently hospitalized with a stroke and payables are due out by the end of the week. Which is why all data is encrypted to two public keys: one belonging to the intended user, the other the organizational data recovery key. The organizational data recovery private key is kept by a bonded organizational officer in a safe and is never pulled out unless really needed or during the yearly audit to make sure it’s still readable.
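
Mechanically, “encrypted to two public keys” just means the random session key is wrapped twice, once for each recipient. A sketch, reusing the hybrid approach from the Data in Motion section (the cryptography package again, with the two public keys assumed to be loaded elsewhere):

    import os
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import padding
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    def encrypt_for_user_and_recovery(data: bytes, user_public_key, recovery_public_key) -> dict:
        oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                            algorithm=hashes.SHA256(), label=None)
        session_key = AESGCM.generate_key(bit_length=256)
        nonce = os.urandom(12)
        return {
            "nonce": nonce,
            "ciphertext": AESGCM(session_key).encrypt(nonce, data, None),
            # Either corresponding private key can later unwrap its copy of the session key.
            "key_for_user": user_public_key.encrypt(session_key, oaep),
            "key_for_recovery": recovery_public_key.encrypt(session_key, oaep),
        }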

X.509, PKI, and Trust

Public key encryption is a rather general idea. One implementation of public key encryption, known as X.509, is quite popular (and is the basis for common network security protocols such as https). X.509 includes, among other things, the use of certificates which contain the public key of an entity, along with attestations by others that the public key is in fact associated with the named entity. These attestations are called signatures and can be validated with the public key of the attestor. These attestors’ public keys are in turn signed by other attestors, which are in turn signed by other attestors, and so on; until the last attestor in a chain of attestors attests to itself. These various attestors are called certificate authorities (CAs); they are generally organizations that have validated themselves to other certificate authorities as “trustworthy to attest to keys,” whatever that means to the other certificate authorities attesting to their keys’ validity. Further, you are trusting the validity of the last attestor in the chain, the one that says “I pinky swear that I am myself.” These “root keys” are typically included in your operating system by the manufacturer. This entire arrangement of public keys attesting to other public keys, rooted in the self-signed keys from major CAs, is known as the PKI: Public Key Infrastructure.
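
You can watch the PKI doing its job from ordinary code: a stock TLS connection fails unless the server’s certificate chain leads back to one of the roots preloaded on your machine. A sketch using Python’s standard ssl module; example.com is just a placeholder host.

    import socket, ssl

    ctx = ssl.create_default_context()   # loads the platform's preinstalled trusted root certificates
    with socket.create_connection(("example.com", 443)) as sock:
        with ctx.wrap_socket(sock, server_hostname="example.com") as tls:
            cert = tls.getpeercert()     # the handshake already verified the chain and the hostname
            print("subject:", cert["subject"])
            print("issuer: ", cert["issuer"])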

So in using a public key, you are trusting all the attestors in the chain. But this is not trust in their finances, or overall moral tone, or their ecological sensitivity, or their avoidance of conflict minerals in their supply chain; you are trusting only their ability to attest identity: to have signed attestations only for keys they have validated as actually belonging to the entity they claim to belong to; and to do that reasonably well: to have retained control of their private keys so their attestations cannot be forged, in all of the past, the present, and the future. The legal landscape of the CA’s country may affect this: in the US or China, for example, signing keys may need to be produced if the local government demands them. You are trusting not only the CA to do its thing, but the CA’s government to let the CA do its thing.

That’s a lot of trust. Is this trust warranted? In general, no. Even major certificate authorities have been hoodwinked into signing fraudulent keys. Perfectly legitimate manufacturers’ keys have been stolen and used to sign malware. There are certificate authorities that have been subverted, secretly or quite publicly, typically by nation states, and have signed fraudulent keys. There are certificate authorities run by nation states that knowingly make fraudulent signatures (“We’re the government! Trust us!”) when their national security organizations need a fraudulent certificate. All of these are trusted “out of the box” by the so-called “root certificates” your operating system manufacturer pre-loads into your computer. Only when a certificate authority is quite publicly caught with its pants down does it get removed from the manufacturers’ trusted root certificate lists. It is a Big Deal™ when that happens, with articles in the trade press. It doesn’t happen frequently.

An alternative to the hierarchical trust model embodied in X.509 certificates is the “web of trust” model used, for example, in GPG keys. In the web of trust model, you assign trust to signatures, based on your trust of the signature keys. Note that here “trust” does not mean you trust things signed by a key to be innocent, or truthful, or valid in any way; it means you trust the owner of the signing key to only sign other keys after validating those keys’ ownership. There are various ways, some even completely automatic, that this can be done. Key-signing parties are a popular way for groups of individuals to attest to one another’s personal keys. It’s even possible to set up a key-signing email service: mail it a public key, and it signs the email address identity of the public key with the service’s key, encrypts the signature to the public key, and mails it back to the email address. The encrypted signature can be obtained only by someone controlling the email address, and can be decrypted (and then added to the public key’s attestations) only by someone controlling the corresponding private key. Voilà: a moderately trustworthy attestation that “this public key is in some way associated with this email address.”

An implementation, of sorts, of the web of trust model in the X.509 space is to have your own certificate authority, and trust only keys that are signed by that certificate authority. This uses the widespread support for the X.509 model, but without depending on the PKI. Many organizations do this for internal keys: I can have a “BrianHetrick.com CA” local certificate authority that my programs trust, possibly exclusively if the programs need to communicate only with other programs in my organization. But this means you are running a certificate authority, which is somewhat complex and has substantial requirements as to organizational governance: your organizational data security is dependent on the flaky vice president assigned to safeguard the ultimate signing key, who doesn’t understand this whole “IT” thing and whose idea of “secure storage” is “in the credenza with the sherry.” If the organization does this with a key-signing key trusted outside the organization (typically by a rather expensive and invasive business arrangement with a widely trusted external certificate authority that actually keeps your signing key themselves and requires a lot of paperwork to sign a key because they are putting their reputation on the line for your actions), these keys will also be recognized as valid outside the signing organization.

In general, there still isn’t a good answer to the “key distribution” problem, as this issue is known. With secret key encryption, getting the secret key to the two parties is the issue; with public key encryption, ensuring a putative public key in fact corresponds to the other party is the issue. The public key infrastructure just kicks the problem down the road a bit.

Destroying Data

All data has a “best by” date, and after that date it becomes toxic. Organizations typically have data retention policies, and data should be destroyed in accordance with that policy. (Except a court can issue a data retention order, typically in conjunction with discovery in a civil or criminal case. You must not destroy data under a data retention order; doing so is “spoliation of evidence” and typically involves quite severe repercussions.) However, deleting a file from a disk, for example, does not actually delete the file: it just deletes the index information for the file and marks the medium area used by the data itself as available for reuse. Until the medium area is actually reused, the data is still there, and can be recovered. Data can be truly deleted by writing random bits over it several times; the US Department of Defense has a standard for “file shredding” and there are freely available tools that shred files according to that standard. DBAN (Darik’s Boot and Nuke) is a popular choice for personal use data destruction of entire disks; there are also commercial alternatives applicable to wider sets of circumstances such as RAID sets.
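
The principle of overwriting in place looks like this; a sketch only, not the DoD procedure, and (as the next paragraph explains) not sufficient on flash media or remapped sectors.

    import os

    def shred(path: str, passes: int = 3) -> None:
        # Overwrite the file's contents with random bytes several times, then delete it.
        size = os.path.getsize(path)
        with open(path, "r+b") as f:
            for _ in range(passes):
                f.seek(0)
                remaining = size
                while remaining > 0:
                    chunk = min(remaining, 1 << 20)
                    f.write(os.urandom(chunk))
                    remaining -= chunk
                f.flush()
                os.fsync(f.fileno())   # push each pass out of the OS cache to the device
        os.remove(path)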

But note that flash media cannot do data destruction by simple overwriting. Flash media, such as USB flash storage drives and solid state disks, do wear leveling, where what physical flash cells are used for a given logical sector changes over time: thus the part of the medium you shred might not be the part of the medium where the data was actually recorded. This also happens with some intelligent hard drives that do error sector remapping. Also, depending on the resources of whoever is attempting to recover the overwritten data, up to half a dozen or so overwrites can be “seen through” and the original data recovered. The only sure way to destroy data recorded on a storage medium is physical destruction of the medium itself: physical shredding followed by incineration, and if anything remains after that, grinding the remaining solids into powder and melting the powder into a puddle.

Now, do you want to go through the effort of physical destruction of your old media? Shredding, incinerating, grinding, and melting hunks of metal doesn’t come cheap. I can’t answer that for you, but you can: which is more valuable to you, keeping your data your data, or the salvage value you get selling your old hard drives on eBay? Most organizations, and a large number of individuals, have a really quick answer to that one.