Everyone hated the GDPR-era privacy policy email armageddon, businesses included. Not because some of those emails went straight to spam, but because a lot of businesses had to take a second look at their security measures. Revamping security measures can suck, but losing a ton of personal information is the absolute worst.
Businesses can reduce their risk of having information stolen by implementing the right data protection practices, and some are pretty simple to get behind. One method we're pretty fond of is pseudonymization, which is the fancy way of saying sensitive data camouflage. Pseudonymization replaces identifying information in a data record with fake identifiers (pseudonyms), which makes it difficult to trace any given data point back to a person. If you think about it, it's kind of like that fake MySpace profile you had in 2007.
The great thing about pseudonymization is that it's the same data, just under an assumed name, or in this instance, a very long string of characters. No single protection is enough on its own, but combining pseudonymization with other practices like encryption, hashing, or tokenization helps reduce the risk of re-identification. Applying pseudonymization to your data is relatively simple, and there is more than one way to accomplish it. In this example, we'll look at the Logstash Fingerprint filter plugin, but if that doesn't work out for you, you can also try a script file run through the Logstash Ruby filter plugin. Both methods mask the username and IP fields, so keep that in mind!
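To make the idea concrete, here is a minimal Python sketch of keyed-hash pseudonymization. It assumes HMAC-SHA256 with a hypothetical secret key (SECRET_KEY) and uses hypothetical function names. It replaces the username and ip fields with 64-character hex pseudonyms and records each pseudonym-to-original pair in a lookup table. This illustrates the general technique; it is not the code behind the Fingerprint plugin or the repository's Ruby script.

```python
import hashlib
import hmac

# Hypothetical secret key: in practice, keep it out of source code (e.g. in a keystore).
SECRET_KEY = b"replace-with-a-real-secret"

# The two fields this walkthrough pseudonymizes.
SENSITIVE_FIELDS = ("username", "ip")


def pseudonym(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a deterministic 64-character hex pseudonym (HMAC-SHA256) for value."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize(record: dict, lookup: dict, key: bytes = SECRET_KEY) -> dict:
    """Return a copy of record with the sensitive fields replaced by pseudonyms.

    Each pseudonym -> original pair is stored in lookup, so whoever controls
    that table can re-identify a value later.
    """
    masked = dict(record)
    for field in SENSITIVE_FIELDS:
        if field in masked:
            original = str(masked[field])
            token = pseudonym(original, key)
            masked[field] = token
            lookup[token] = original
    return masked
```

Usage: pseudonymize({"username": "jdoe", "ip": "174.145.248.21"}, identities) returns the masked record and fills the identities dict. A keyed hash is used rather than a bare SHA-256 because unkeyed hashes of small value spaces, such as IPv4 addresses, can be reversed by hashing every possible value; with HMAC, an attacker also needs the key.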
Before we get started, grab some Mountain Dew, because nothing makes you feel more like a computer mastermind than questionable soda choices. Once you've cracked open that cold one, download the files from the repository to a local directory. These commands fetch each file individually from the elastic/examples repository on GitHub:
```shell
curl -O https://raw.githubusercontent.com/elastic/examples/master/Miscellaneous/gdpr/pseudonymization/docker-compose.yml
curl -O https://raw.githubusercontent.com/elastic/examples/master/Miscellaneous/gdpr/pseudonymization/logstash_fingerprint.conf
curl -O https://raw.githubusercontent.com/elastic/examples/master/Miscellaneous/gdpr/pseudonymization/logstash_script_fingerprint.conf
curl -O https://raw.githubusercontent.com/elastic/examples/master/Miscellaneous/gdpr/pseudonymization/pipelines.yml
curl -O https://raw.githubusercontent.com/elastic/examples/master/Miscellaneous/gdpr/pseudonymization/pseudonymise.rb
curl -O https://raw.githubusercontent.com/elastic/examples/master/Miscellaneous/gdpr/pseudonymization/Dockerfile
curl -O https://raw.githubusercontent.com/elastic/examples/master/Miscellaneous/gdpr/pseudonymization/sample_docs
```
Make sure the directory containing the downloaded files is shared with Docker. Then change into that directory and run the following command:
```shell
ELASTIC_PASSWORD=changeme TAG=6.2.2 docker-compose up
```
Watch for the log line below. It tells you Logstash has started and can now accept data.
```text
logstash_1 | [2018-03-20T12:40:33,638][INFO ][logstash.agent ] Pipelines running {:count=>2, :pipelines=>["fingerprint_filter", "ruby_filter"]}
```
Now take a sip, babes. We’re almost there.
To push the sample documents through the Fingerprint filter plugin, run the following command:
```shell
cat sample_docs | nc localhost 5000
```
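If nc isn't available on your machine, a small Python sketch can do the same job: open a TCP connection to localhost on port 5000 (the port used in the command above) and stream the file's bytes unchanged, just as the cat-to-nc pipe does. The function name send_file is hypothetical.

```python
import socket


def send_file(path: str, host: str = "localhost", port: int = 5000) -> int:
    """Stream a file's raw bytes to a TCP listener, like `cat path | nc host port`.

    Returns the number of bytes sent.
    """
    with open(path, "rb") as fh:
        data = fh.read()
    # Closing the socket on exit signals end-of-stream to the listener.
    with socket.create_connection((host, port)) as sock:
        sock.sendall(data)
    return len(data)
```

Usage: send_file("sample_docs") from the directory holding the downloaded files.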
Ta-da! You can now inspect and use the data! The pseudonymized documents are indexed into an events index, which you can query with:
```shell
curl "http://localhost:9200/events/_search?pretty" -u elastic:changeme
```
It should look a little something like this:
```json
{
  "_index": "events",
  "_type": "doc",
  "_id": "tQOjQ2IBED8Jv9YVVDxs",
  "_score": 1,
  "_source": {
    "host": "gateway",
    "user_agent": "Mozilla/5.0 (Macintosh; PPC Mac OS X 10_6_7) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.790.0 Safari/535.1",
    "job_title": "Electrical Engineer",
    "username": "95b88d8d477e18a8acca833e7bcbd2c5d5f646b29e2d1c9604a1d930e2f63313",
    "@timestamp": "2018-03-20T13:39:59.799Z",
    "ip": "e85022a9801b356dd8c3ed6b2e02f0061a3aeea5bbad15a9ff4aed35b5bb3a42",
    "source": "ruby_pipeline",
    "city": "Komsomol’skiy",
    "title": "Mr",
    "country_code": "UZ",
    "@version": "1",
    "gender": "Female",
    "country": "Uzbekistan",
    "port": 41126
  }
}
```
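If you'd rather query from code than from curl, here is a minimal Python sketch using only the standard library. It assumes the localhost:9200 endpoint and the demo elastic/changeme credentials from the commands above; the function names are hypothetical. Basic auth is added as an Authorization header, which is what curl's -u flag does.

```python
import base64
import json
import urllib.request

ES_URL = "http://localhost:9200"      # endpoint used in the curl commands
USER, PASSWORD = "elastic", "changeme"  # demo credentials from the docker-compose command


def build_search_request(index: str, user: str = USER,
                         password: str = PASSWORD) -> urllib.request.Request:
    """Build GET /<index>/_search with HTTP basic auth (the equivalent of curl -u)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    req = urllib.request.Request(f"{ES_URL}/{index}/_search?pretty")
    req.add_header("Authorization", f"Basic {token}")
    return req


def search(index: str) -> list:
    """Run the search and return the hits (each carries _id and _source)."""
    with urllib.request.urlopen(build_search_request(index)) as resp:
        body = json.load(resp)
    return body["hits"]["hits"]
```

Usage: for hit in search("events"): print(hit["_source"]["username"]) prints the pseudonyms, not the original usernames.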
Cool, right? But wait, there's more! The key-value lookups (pseudonym to original value) are stored in an identities index, which you can query with:
```shell
curl "http://localhost:9200/identities/_search?pretty" -u elastic:changeme
```
If you’re wondering what that looks like, take a peek below:
```json
{
  "_index": "identities",
  "_type": "doc",
  "_id": "1924d02bd98a46c795cb2a925b98a22ae59c563e0de49f4ba4aa49e6cab072ad",
  "_score": 1,
  "_source": {
    "key": "1924d02bd98a46c795cb2a925b98a22ae59c563e0de49f4ba4aa49e6cab072ad",
    "value": "174.145.248.21",
    "tags": [
      "identities"
    ],
    "@timestamp": "2018-03-20T13:39:59.957Z",
    "@version": "1",
    "source": "ruby_pipeline"
  }
}
```
No matter how many times you index the data, the identities index should always contain 200 documents: one for each unique value, which in this case means each unique username and IP address. Need to re-identify a value? Look it up by ID in the identities index. ICYMI, this is what a pseudonymized value looks like:
6efda88d5338599ef1cc29df5dad8da681984580dc1f7f495dcf17ebcf7191f8
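Why does the count stay fixed? Each identity document's ID is the pseudonym itself (note that _id and key match in the identities sample above), so indexing the same value again overwrites the existing document instead of adding a new one. Here is a minimal sketch of that upsert behavior, with a plain Python dict standing in for the identities index; the key, function names, and sample records are hypothetical.

```python
import hashlib
import hmac

KEY = b"hypothetical-secret"  # assumed key; not the pipeline's real configuration


def pseudonym(value: str) -> str:
    return hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def index_identities(records, identities: dict) -> None:
    """Upsert one identity document per unique username/ip value.

    The document ID is the pseudonym, so re-indexing a value overwrites
    the existing entry rather than creating a duplicate.
    """
    for record in records:
        for field in ("username", "ip"):
            value = record[field]
            doc_id = pseudonym(value)
            identities[doc_id] = {"key": doc_id, "value": value}


records = [
    {"username": "alice", "ip": "10.0.0.1"},
    {"username": "bob", "ip": "10.0.0.2"},
    {"username": "alice", "ip": "10.0.0.1"},  # repeat: adds no new identities
]

identities = {}
index_identities(records, identities)
first_count = len(identities)
index_identities(records, identities)  # index the same data a second time
print(first_count, len(identities))  # 4 4
```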
If you need the original value, you can get it with this command:
```shell
curl "http://localhost:9200/identities/doc/6efda88d5338599ef1cc29df5dad8da681984580dc1f7f495dcf17ebcf7191f8?pretty" -u elastic:changeme
```
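The same lookup can be scripted. This Python sketch fetches an identity document by its ID and returns the original value from _source.value. It assumes the endpoint, index path, and demo credentials from the curl command above; the function names are hypothetical. For an unknown ID, Elasticsearch answers with HTTP 404 and "found": false, which the sketch maps to None.

```python
import base64
import json
import urllib.error
import urllib.request


def value_from_get_response(doc: dict):
    """Extract the original value from a GET /identities/doc/<id> response body.

    Returns None when the document was not found.
    """
    if not doc.get("found", False):
        return None
    return doc["_source"]["value"]


def reidentify(pseudonym: str, base_url: str = "http://localhost:9200",
               user: str = "elastic", password: str = "changeme"):
    """Look up the original value for a pseudonym in the identities index."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    req = urllib.request.Request(f"{base_url}/identities/doc/{pseudonym}")
    req.add_header("Authorization", f"Basic {token}")
    try:
        with urllib.request.urlopen(req) as resp:
            return value_from_get_response(json.load(resp))
    except urllib.error.HTTPError as err:
        if err.code == 404:  # unknown ID: Elasticsearch replies 404 with "found": false
            return None
        raise
```

Because this lookup undoes the pseudonymization, access to the identities index is the part to lock down most tightly.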
BAM! Pseudonymization! It's like the witness protection program for data, and we're big fans. Pseudonymized data is hard for bad actors to exploit, even skilled ones, as long as the lookup table stays out of their reach. Pair it with a solid data retention policy and your risk of theft drops drastically, and who doesn't love that? If you're curious about Mailgun's data processing, check out our website! We get real technical with email, real fast.
Last updated on August 28, 2020