GlueCon2018: AWS Security for DevOps by Chris Henson

Gratitude is what comes to mind when reflecting back on my speaking opportunity at GlueCon2018. Back in January this year, I came up with the topic of ‘AWS Security for DevOps’ as a way to introduce the concept of an IAM role, show some basic policies, and explain why the Principle of Least Privilege is needed when running applications inside AWS. I built the slide deck and prepared the talk – without knowing if I was going to be selected to speak.

GlueCon was amazing! I did get selected to speak in the main hall used for the keynotes (YAY! Thank you, Eric Norlin), and although the presentation was not filmed, you can download the deck from the ‘about me’ page (at the bottom) and use the included MD5 hash to ensure you are getting the same file I put up.
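
If you'd rather check the hash in Python than on the command line, here's a quick sketch; the filename is just a placeholder for whatever you saved the deck as:

import hashlib

# Compute the MD5 of the downloaded deck and compare it to the hash published on the 'about me' page.
# 'gluecon2018-aws-security-for-devops.pdf' is a placeholder filename.
with open('gluecon2018-aws-security-for-devops.pdf', 'rb') as f:
    digest = hashlib.md5(f.read()).hexdigest()
print(digest)  # should match the MD5 posted alongside the download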

Posted in AWS, Gluecon2018

Gluecon2018 Keynote w/ Adrian Cockcroft + AWESOME!

Cool things happen when a Security person gets to attend a developer conference! In all seriousness, last January I planned to attend GlueCon this year because I feel development is a critical part of Security, and I want to understand development concepts in more depth so I can add value there.

Below are my notes on the Gluecon2018 keynote w/ Adrian Cockcroft. Adrian works for Amazon Web Services as a Solutions Architect; he was previously at Netflix as a key part of their team. The keynote was solid! … and one of the best presentations on cloud architecture I have seen.

Adrian opened with the fact that Architects must ask their customers the “awkward” questions: ‘What should your system do when it fails?’ [because it will fail] ‘If a permissions lookup fails, what should you do?’ ‘Do you have a real DR?’ ‘How do you know your system works?’ [what do the metrics tell you?] ‘How often do you fail the entire data center all at once?’ ‘How exactly does your system return to normal after a DR?’ Most customers don’t want to talk about failure scenarios, and it is an Architect’s job to bring them up and address them in the design. Adrian pointed out that some companies have “Availability Theatre,” where true failure scenarios are not part of the overall DR testing process, yet DR is touted as functional.

Next, Adrian moved on to talk about avoidable failure scenarios. He pointed out a SaaS company that forgot to renew their domain name and, due to the expiry, everything failed. He also pointed out SSL certificate expiry as a failure scenario. Aside from the obvious controls to avoid these types of failures, Adrian pointed out that you could program an alternative DNS name to which your API could fail over, so that DNS-dependent services could still function in the event of a domain name expiry. He mentioned that DNS is one of the weakest points of a large system and needs to be taken into consideration when we architect for chaos.

Transition to Chaos Architecture . . . 4 layers:

  1. Infrastructure Layer – No Single Point of Failure
  2. Switching and Interconnecting Layer – Data replication / traffic routing
  3. Application Layer – application failures / error handling
  4. Users / People Layer – Operator confusion, users not interpreting data properly and making changes based off of what they see vs. what is actually happening

To mitigate problems at the Users / People layer: there is not enough emphasis on people training and fire drills when it comes to reacting and responding to system failures. Implement training to help users and operators behave in a consistent way when certain failures occur. Also, implement ‘Game Days’ where failures are purposely introduced as a way of training.

To mitigate problems in the Application Layer – leverage Simian Army toolsets that introduce specific problems into various components of the application so you understand how your system reacts to these failures and can address them. Adrian mentioned ChAP (Netflix’s Chaos Automation Platform) to automate this.
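
As a rough illustration of the Simian Army idea (this is not Netflix's tooling or ChAP, just a minimal Chaos-Monkey-style sketch in Python with boto3), you could terminate one random instance that has explicitly opted in via a hypothetical ChaosOptIn=true tag:

import random
import boto3

# Minimal Chaos-Monkey-style sketch: terminate one random, explicitly opted-in instance.
# The 'ChaosOptIn' tag and the region are assumptions for this example.
ec2 = boto3.client('ec2', region_name='us-east-1')

reservations = ec2.describe_instances(
    Filters=[
        {'Name': 'tag:ChaosOptIn', 'Values': ['true']},
        {'Name': 'instance-state-name', 'Values': ['running']},
    ]
)['Reservations']

instances = [i['InstanceId'] for r in reservations for i in r['Instances']]

if instances:
    victim = random.choice(instances)
    print('Terminating', victim)
    ec2.terminate_instances(InstanceIds=[victim])
else:
    print('No opted-in instances found; nothing to break today.')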

To mitigate problems on the Switching / Interconnection and Infrastructure layers, use Gremlin to run specific failure experiments against your infrastructure and understand how it behaves in those scenarios. Adrian pointed out that we must not assume “the network is reliable” and must architect for failures in the network domain when we design our systems and applications.

Adrian talked about how a ‘Chaos Engineering Team’ is like a ‘Security Red Team’: where a Security Red Team identifies weaknesses in Security, a Chaos Engineering Team identifies weaknesses in Availability.

Other critical points Adrian touched on:

The ‘Red Queen theory’, wherein as we evolve, the people and environments around us evolve as well. I took this as a warning not to “architect in a bubble.”

Read ‘The Safety Anarchist’ by Sidney Dekker.

Amazon is beginning to implement chaos tests for customer use via Aurora DB Cluster Fault Injection.

He also recommended getting involved in the Chaos Engineering Working Group.

Posted in AWS, Gluecon2018, Uncategorized

OSSEC / Auto-OSSEC Automation in AWS Linux – More GLUE!

OSSEC is a tricky devil to automate. What I mean by automate is: install the ossec-hids-server, install the ossec-hids-agent, register the agent, and have the server recognize that registration without human prompts. If you’ve done this before, you know there are lots of manual steps. The smart folks over at BinaryDefense have added some automation to that process with their auto-ossec tool.

They really took a lot of work out of all of the manual steps needed to connect the client to the server, generate the key and exchange the key…

but… the process was still not as automated as I needed it to be. In AWS you don’t know what the OSSEC server IP will be, and that IP needs to be passed to auto-ossec as an argument and placed in the ossec-hids-agent config file. Not to mention all of the repo adds and tweaks to the OSSEC config files that must happen for OSSEC even to start properly.

I have written two scripts, located in my git repository, that automate the installation of the remaining pieces that auto-ossec does not cover, tailored for AWS Linux.

The LinuxOssecServer script installs ossec-hids-server and the BinaryDefense auto-ossec listener on the AWS Linux EC2 instance that will be in the role of OSSEC server.

We leverage S3 as a storehouse for needed files:
The atomic installer file/script that you run to install the OSSEC repositories, https://updates.atomicorp.com/installers/atomic, would go in s3://yourbucketname.

Also, a clone of the BinaryDefense repo https://github.com/binarydefense/auto-ossec would go in s3://yourbucketname.

You need to allow your EC2 instance access to S3 and the ability to query other instances, so an EC2 instance role with access to S3 and EC2 is required.

The LinuxOssecClient script installs ossec-hids-agent and the BinaryDefense auto-ossec client, then automatically locates the OSSEC server’s IP (via a pre-set tag on the AWS EC2 instance), registers the agent, and starts services on AWS Linux. Same instance role requirements as above.

The line with ‘aws ec2 describe-instances’ must have the correct region, so put your region in there. For the public version of the code, the OSSEC server must have the AWS tag queried as Name=tag:Role,Values=OssecMaster (i.e., a Role tag with the value OssecMaster) for the script to locate the IP address of the EC2 instance that is the OSSEC server, so when you start your OssecServer instance, be sure to add that tag.
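
My scripts do this lookup with the AWS CLI, but if you prefer Python, here is a rough boto3 equivalent of that describe-instances call; the region and the Role=OssecMaster tag are the same assumptions described above:

import boto3

# Look up the private IP of the EC2 instance tagged Role=OssecMaster.
# The region is hard-coded here for illustration; use your own.
ec2 = boto3.client('ec2', region_name='us-east-1')

resp = ec2.describe_instances(
    Filters=[
        {'Name': 'tag:Role', 'Values': ['OssecMaster']},
        {'Name': 'instance-state-name', 'Values': ['running']},
    ]
)

for reservation in resp['Reservations']:
    for instance in reservation['Instances']:
        print(instance['PrivateIpAddress'])  # pass this to auto-ossec and the agent config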

You’ll notice some sleep commands I’ve put in the scripts. OSSEC initialization is a little buggy [see ref links 1 and 2 below], meaning you have to restart the ossec-hids-server process on the server after the first agent attempts to register; once that is done, all the subsequent agents register with no problem. I don’t know why this is, and this behavior is lame – I hated having to code around it. I need to come up with a better way than just sleeping the script during the first agent registration and then running a restart after x minutes. Or maybe the next version of OSSEC will fix this so the first agent will register without a restart.

Ref 1: Issue where you have to restart OSSEC after the first agent registers

Ref 2: Issue where you have to restart OSSEC after the first agent registers

Also, don’t forget to configure your Security Groups correctly.

You’ll need TCP 9654 open on the OSSEC server for the auto-ossec listener.

You’ll need UDP 1514 open on the OSSEC server to accept agent keep-alive messages.
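
If you script your security groups, here is a hedged boto3 sketch that opens those two ports on the OSSEC server's group; the group ID is a placeholder, and the source CIDR should be your own agent subnet, not 0.0.0.0/0:

import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')

# 'sg-0123456789abcdef0' is a placeholder; restrict the CIDR to your agent subnet.
ec2.authorize_security_group_ingress(
    GroupId='sg-0123456789abcdef0',
    IpPermissions=[
        # TCP 9654 for the auto-ossec listener
        {'IpProtocol': 'tcp', 'FromPort': 9654, 'ToPort': 9654,
         'IpRanges': [{'CidrIp': '10.0.0.0/16'}]},
        # UDP 1514 for agent keep-alive traffic
        {'IpProtocol': 'udp', 'FromPort': 1514, 'ToPort': 1514,
         'IpRanges': [{'CidrIp': '10.0.0.0/16'}]},
    ]
)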

Posted in Cloud Security, Cyber Security, Linux Security

Path to AWS Architect Professional – Storage Anti-Patterns

This post is a summary of my notes from reading the storage design anti-patterns addressed in this AWS whitepaper.

“An anti-pattern is a common response to a recurring problem that is usually ineffective and risks being highly counterproductive”

S3 Anti-Patterns: 

Amazon S3 doesn’t suit all storage situations. The following list presents some storage needs for which you should consider other AWS storage options.

Storage Need: File System. S3 uses a flat namespace and is not meant to be a POSIX-compliant file system. Instead, consider Amazon EFS for a file system.

Storage Need: Structured Data with Query. S3 does not offer query capabilities for specific objects; when you use S3, you need to know the bucket name and key for the files you want to retrieve. Instead, use or pair S3 with Amazon DynamoDB, Amazon RDS, or Amazon CloudSearch (a rough sketch of the S3 + DynamoDB pairing follows this list).

Storage Need: Rapidly Changing Data. Use solutions that take read and write latencies into account, such as Amazon EFS, Amazon DynamoDB, Amazon RDS, or Amazon EBS.

Storage Need: Archival Data. Data that requires infrequent read access, encrypted archival storage, and a long RTO is ideal for Amazon Glacier.

Storage Need: Dynamic Website Hosting. Although S3 is ideal for hosting static content, dynamic websites that depend on server-side scripting or database interaction are a better fit for Amazon EC2 or Amazon EFS.
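
To make the "pair S3 with DynamoDB" idea from the list above concrete, here is a minimal sketch: write each object to S3 and record its metadata in a DynamoDB table so you can query for it later. The bucket and table names are placeholders, and the table is assumed to already exist with doc_type as its partition key and s3_key as its sort key:

import boto3
from boto3.dynamodb.conditions import Key

s3 = boto3.client('s3')
table = boto3.resource('dynamodb').Table('s3-object-index')  # hypothetical index table

# Store the object itself in S3...
s3.put_object(Bucket='yourbucketname', Key='reports/2018/05/report.pdf', Body=b'report bytes here')

# ...and the queryable metadata in DynamoDB, pointing back at the S3 key.
table.put_item(Item={
    'doc_type': 'report',                      # partition key (assumed schema)
    's3_key': 'reports/2018/05/report.pdf',    # sort key (assumed schema)
    'bucket': 'yourbucketname',
    'year': 2018,
})

# Later, query DynamoDB to find which S3 keys to fetch.
hits = table.query(KeyConditionExpression=Key('doc_type').eq('report'))['Items']
print(hits)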

Glacier Anti-Patterns: 

Amazon Glacier doesn’t suit all storage situations. The following list presents some storage needs for which you should consider other AWS storage options.

Storage Need: Rapidly Changing Data. Look for a storage solution with lower read and write latencies, such as Amazon RDS, Amazon EFS, Amazon DynamoDB, or databases running on Amazon EC2.

Storage Need: Immediate Access. Data stored in Glacier is not available immediately; retrieval typically takes 3–5 hours, so if you need to access your data immediately, Amazon S3 is a better choice.

Amazon EFS Anti-Patterns:

Amazon EFS doesn’t suit all storage situations. The following list presents some storage needs for which you should consider other AWS storage options.

Storage Need: Archival Data. Data that requires infrequent read access, encrypted archival storage, and a long RTO is ideal for Amazon Glacier.

Storage Need: Relational Database Storage. In most cases, relational databases require storage that is mounted, accessed, and locked by a single node (an EC2 instance, etc.). Instead use Amazon DynamoDB or Amazon RDS.

Storage Need: Temporary Storage. Consider using local instance store for items like buffers, caches, and queues.

Amazon EBS Anti-Patterns:

Amazon EBS doesn’t suit all storage situations. The following list presents some storage needs for which you should consider other AWS storage options.

Storage Need: Temporary Storage. Consider using local instance store for items like buffers, caches, and queues.

Storage Need: Multi-Instance Storage. EBS volumes can only be attached to one EC2 instance at a time. If you need multiple instances attached to a single data store, consider using Amazon EFS.

Storage Need: Highly Durable Storage. Instead use Amazon S3 or Amazon EFS. Amazon S3 Standard storage is designed for 99.999999999 percent (11 nines) annual durability per object. You can take a snapshot of an EBS volume, and that snapshot gets saved to S3, thereby providing the durability of S3 (a small snapshot sketch follows this list). Alternatively, Amazon EFS is designed for high durability and high availability, with data stored in multiple Availability Zones within an AWS Region.

Storage Need: Static Data or Web Content. If data is more static, Amazon S3 might represent a more cost-effective and scalable solution for storing fixed information. Web content served out of Amazon EBS requires a web server running on Amazon EC2; in contrast, you can deliver web content directly out of Amazon S3 or from multiple EC2 instances using Amazon EFS.
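
As a quick illustration of the snapshot point above, this is roughly all it takes with boto3 to push an EBS volume's point-in-time copy into S3-backed snapshot storage; the volume ID is a placeholder:

import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')

# 'vol-0123456789abcdef0' is a placeholder volume ID.
snapshot = ec2.create_snapshot(
    VolumeId='vol-0123456789abcdef0',
    Description='Durability copy - stored in S3 by the EBS snapshot service'
)
print(snapshot['SnapshotId'], snapshot['State'])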

Amazon EC2 Instance Store Anti-Patterns:

Amazon EC2 instance store doesn’t suit all storage situations. The following list presents some storage needs for which you should consider other AWS storage options.

Storage Need: Persistent Storage. If you need storage that is similar to a disk drive and must persist beyond the life of the instance, EBS volumes, EFS file systems, or S3 are more appropriate.

Storage Need: Relational Database Storage. In most cases, relational databases require storage that is mounted, accessed, and locked by a single node (an EC2 instance, etc.). Instead use Amazon DynamoDB or Amazon RDS.

Storage Need: Shared Storage. Instance store can only be attached to one EC2 instance at a time. If you need multiple instances attached to a single data store, consider using Amazon EFS. If you need storage that can be detached from one instance and attached to a different instance, or if you need the ability to share data easily, Amazon EFS, Amazon S3, or Amazon EBS are better choices.

Storage Need: Snapshots. If you need long-term durability, availability, and the ability to share point-in-time disk snapshots, EBS volumes with snapshots stored in S3 are a better choice.

Posted in AWS, AWS Certified Solutions Architect, Uncategorized

Path to AWS Architect Professional – Which DB to use? re:Invent Notes

AWS re:Invent videos on YouTube are a goldmine for knowledge seekers. What follows are my notes from the AWS re:Invent 2017 ‘Which Database to Use When’ presentation. I am using these videos to study for my AWS Architect Professional exam coming up at the end of May. I hope my notes help you, too.

Amazon database philosophy: purpose-build databases to satisfy particular workloads, for the best price, programmability, and performance for customers.

Self-managed database: You have full responsibility for upgrades and backups. You have full responsibility for security. Full control over parameters of the server and DB. Replication is expensive and requires significant engineering.

          VS

AWS-managed database: AWS provides upgrades, backup, and fail-over as a service. AWS provides infrastructure security, certification, and tools for security. The DB is a managed appliance, so you can easily automate. AWS provides fail-over as a packaged service. You can leverage API calls to the DB / S3. “Everything is at the end of an API call.”

Generalities – what are you doing with your DB?

Operational [transactional, system of record, content management]

Usually a good fit for caching. Small compute sizes; few rows, items, or documents per request. High throughput, high concurrency. Mission critical; HA, DR, and data protection.

Things to consider: Size at limit – bounded or unbounded? Rows, key-values, or documents? Need relational capabilities? Push down compute to the DB? Change velocity (insert-only workload vs. update workload)? Ingestion requirements?

Relational stores are really good if you need referential integrity, strong consistency, transactions, and hardened scale, plus complex query support with SQL.

Key-value stores: low-latency GET and PUT. High throughput, partitionable, fast ingestion of data. Simple query methods with filters.

Document stores: indexing and storing any document, with support for querying any property. Simple queries with filters, projections, and aggregates.

Graph stores: creating and navigating relations between data easily and quickly. Easily express queries in terms of relations.

RDS

RDS is a great general-purpose DB: start very small and grow with the business. When you are building an application with common frameworks like Ruby on Rails, Django, etc., you can choose a particular engine based on the skills on your team (Python skills and PostgreSQL, say) and app requirements. Bringing apps into the cloud, people start with RDS – SQL Server for bringing in IIS or .NET – to get out of the business of DB management and focus on the application. Aurora vs. EBS.

Aurora is a fantastic storage environment because it’s always Multi-AZ. For massive-scale relational workloads, choose Aurora, then choose features based on the application. RDS is bounded at a limit: you provision storage and grow with it, but it ultimately has limits. Aurora offers encryption at rest and in transit, 5x better performance than MySQL, and is cost efficient.

DynamoDB

Break out of the limitations of relational DBs: DynamoDB gives the ability to operate efficiently at much greater scale. You can mix and match DynamoDB with RDS. A shopping cart that needs high availability and high throughput – put that on DynamoDB. If you have data with huge amounts of “push down” computation, you may put that in RDS. The Database Migration Service moves data between sources. DynamoDB Streams listens for changes and updates data. Partitioned.
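
To make the shopping-cart example concrete, here is a minimal boto3 sketch of that kind of key-value access pattern; the Carts table, its cart_id partition key, and the item shape are all assumptions for illustration:

import boto3

# Hypothetical 'Carts' table with 'cart_id' as its partition key.
carts = boto3.resource('dynamodb').Table('Carts')

# Low-latency write of the whole cart as a single item...
carts.put_item(Item={
    'cart_id': 'user-1234',
    'items': [{'sku': 'BAT-001', 'qty': 2}],
    'updated_at': '2018-05-20T12:00:00Z',
})

# ...and a single-key read to get it back.
cart = carts.get_item(Key={'cart_id': 'user-1234'}).get('Item')
print(cart)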

DAX for caching gives unbounded low latency: the application is implemented on DynamoDB with DAX in front, and you don’t have to manage the cache yourself. DAX is a “write-through cache” – massive acceleration in performance.

ElastiCache: you add a cache on top of an operational store. The application takes responsibility for keeping the cache and the data consistent. Memcached and Redis interfaces, so you have an open approach.

If your data is bigger than the cache size and you are missing hits on the cache, then the cache does no good. Not everything is cacheable.

Neptune – graph DB. Store billions of relations and query with millisecond latency. Six replicas of data across three AZs. Build queries with Gremlin or SPARQL. Relationships are persisted in the graph store. Push-down compute.

Analytics [retrospective, streaming, predictive]

Analytic workloads – columnar format. Data in columns tends to repeat, so it is very compressible. Analytic workloads are large and usually partitioned. Large compute size. Heavy compute push-down. Little to no updates; need lots of memory and in-memory compute.

Analytic workloads – primary decisions to consider: streaming or not, latency requirements, ETL or no ETL, serverless or dedicated compute, always active or occasionally active. What is your data format?

Amazon Athena – interactive analysis product. Treat data at rest [structured or unstructured] like a DB and query it. Zero setup cost: point it at S3 and start querying. Pay per query. ANSI SQL interface. Zero administration. Serverless. Good for doing retrospective analysis and developing trends.
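
A hedged sketch of what "point it at S3 and start querying" looks like from boto3; the database, table, and results bucket are placeholders, and the table is assumed to have already been defined over your S3 data:

import boto3

athena = boto3.client('athena', region_name='us-east-1')

# Database, table, and results bucket are placeholders for this example.
resp = athena.start_query_execution(
    QueryString='SELECT status, count(*) FROM access_logs GROUP BY status',
    QueryExecutionContext={'Database': 'weblogs'},
    ResultConfiguration={'OutputLocation': 's3://yourbucketname/athena-results/'}
)
print(resp['QueryExecutionId'])  # poll get_query_execution / get_query_results with this ID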

Redshift – fast, powerful data warehouse for when the schema and access patterns are understood; extremely fast queries at scale. Resize the cluster up and down. Data encrypted at rest and in transit. Manage your own keys with KMS. Inexpensive. Good for doing retrospective analysis and developing trends.

Kinesis – for doing real-time analytics: process real-time data with SQL. An example would be a customer support case where you want to react in real time, or brake sensors on a train. You don’t have time to index that into a data warehouse; you need it now.

Elasticsearch – good for log analysis, full-text search, and application monitoring, in more flexible and natural ways. It has Kibana bundled in.

Posted in AWS, AWS Certified Solutions Architect, Uncategorized

Lambda Access Key Age Checker using Python

How old are those Keys, anyway? 

The story goes like this: I needed to automate a way of knowing how old every API key is, and the user to which it belongs, in all the AWS accounts I work with.

I went looking for a Lambda script on the interwebs that would do some checks on access key age for me. It seems like a basic thing you would think would be out there… but no – no one had done this in this way, at least not that I could find. I did find some code pieces that others had written for pulling out just the key age, but I ended up putting most of this together myself and learning a lot!

So what does this script do? Specifically, leveraging the boto3 libraries, it calls out to IAM and returns all usernames into a list; then the next loop iterates through that list of users, doing the following for each: get the keys for the user and compare the creation date against the current date; if a key is older than 90 days, append that user and key age to a new list; then convert that new list into a string (so it complies with the SNS message type) and pass the string to SNS – and SNS then emails it out to those who have subscribed to the topic.
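
My actual Lambda code is in the GitHub repo linked just below; as a rough, simplified sketch of the flow just described (written here in Python 3 style rather than the repo's Python 2.7, with the SNS topic ARN as a placeholder and pagination ignored for brevity), it looks something like this:

import boto3
from datetime import datetime, timezone

iam = boto3.client('iam')
sns = boto3.client('sns')

# Placeholder topic ARN; subscribers to this topic get the email.
TOPIC_ARN = 'arn:aws:sns:us-east-1:111111111111:key-age-alerts'
MAX_AGE_DAYS = 90

def lambda_handler(event, context):
    old_keys = []
    for user in iam.list_users()['Users']:
        for key in iam.list_access_keys(UserName=user['UserName'])['AccessKeyMetadata']:
            age = (datetime.now(timezone.utc) - key['CreateDate']).days
            if age > MAX_AGE_DAYS:
                old_keys.append('%s: key %s is %d days old' %
                                (user['UserName'], key['AccessKeyId'], age))
    if old_keys:
        # SNS wants a plain string, so join the list before publishing.
        sns.publish(TopicArn=TOPIC_ARN,
                    Subject='Access keys older than 90 days',
                    Message='\n'.join(old_keys))
    return old_keys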


Here is the code in my GitHub. The Python 2.7 code does exactly what is stated above. Also, here is the JSON policy that gives Lambda access to IAM, SNS, and CloudWatch Logs, which you will need to attach to a Lambda execution role created for this function to run.

Lambda has a default function timeout of 3 seconds. In one of the regions in which I implemented this, the code took 5 seconds to run, so I had to increase the default timeout to 5 seconds, FYI.

I set up a CloudWatch Events rule to trigger this Lambda using AWS cron expressions.

I don’t claim to be a Developer, but I do love automating things with code, and this script works, and works well, repeatedly. I hope it works well for you, too!

Oh, one last thing! This script could also easily be modified to disable keys that are older than 90 days, which is the next logical step after creating user awareness of key age and implementing and communicating a policy on key age; see the sketch below.
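
For that next step, the change is small: instead of only reporting, the loop could flip old keys to inactive. A one-call sketch (the user name and key ID are placeholders):

import boto3

iam = boto3.client('iam')

# Setting the key to 'Inactive' disables it without deleting it, so it can be re-enabled if needed.
iam.update_access_key(UserName='some-user',
                      AccessKeyId='AKIAEXAMPLEKEYID',
                      Status='Inactive')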

Posted in AWS, Lambda

AWS S3 Bucket Policy Batman Style

As easy as bucket policies are supposed to be, I fought for some hard-won knowledge on how to bolt together what I thought was going to be a simple policy. After banging my head, I called AWS support for help – so I wanted to share the win.

My goal: apply an S3 bucket policy that limits access to only the IAM users and an IAM role in the same account as the bucket.

AWS advocates that bucket policies are designed for cross-account access, and to do what I wanted they recommended IAM policies. That is all good and well, but I believe in tightening security closest to the source when I can, and bucket policies are great for that. I am using condition statements to deny access except for the entities named. So here is what I thought would work:

{
    "Version": "2012-10-17",
    "Id": "BatCavePolicy",
    "Statement": [
        {
            "Sid": "Deny access except NotPrincipal list",
            "Effect": "Deny",
            "NotPrincipal": {
                "AWS": [
                    "arn:aws:iam::111111111111:user/batman",
                    "arn:aws:iam::111111111111:user/robin",
                    "arn:aws:iam::111111111111:user/catwoman",
                    "arn:aws:iam::111111111111:role/batcomputercontrollerrole"
                ]
            },
            "Action": "s3:*",
            "Resource": "arn:aws:s3:::batcomputerbucket"
        }
    ]
}

Nope. I locked myself out of the bucket. Here’s why:

Instead of the complete principal ARNs, AWS needs the RoleId and UserId of the IAM resources for a bucket policy of this type. We can get these IDs by running the following AWS CLI commands:

$ aws iam get-role --role-name batcomputercontrollerrole

"RoleId": "AROZZZZZZZZZZ1"

Since the EC2 instance (when assuming an IAM role) is using temporary credentials, its aws:userid is the role ID followed by the session name, so we have to use the RoleId with a wildcard – "AROZZZZZZZZZZ1:*" – in the bucket policy. The ‘*’ matches the session portion of the temporary credentials.

$ aws iam get-user --user-name batman

"UserId": "AIDYYYYYYYYYYY1"

$ aws iam get-user --user-name robin

"UserId": "AIDWWWWWWWWWWW1"

$ aws iam get-user --user-name catwoman

"UserId": "AIDXXXXXXXXXXX1"

And then our correct Bucket Policy looks like this:

{
    "Id": "BatCavePolicy",
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "BatComputerBucket",
            "Action": "s3:*",
            "Effect": "Deny",
            "Resource": [
                "arn:aws:s3:::batcomputerbucket",
                "arn:aws:s3:::batcomputerbucket/*"
            ],
            "Condition": {
                "StringNotLike": {
                    "aws:userid": [
                        "AROZZZZZZZZZZ1:*",
                        "AIDYYYYYYYYYYY1",
                        "AIDWWWWWWWWWWW1",
                        "AIDXXXXXXXXXXX1"
                    ]
                }
            },
            "Principal": "*"
        }
    ]
}

AWS does have this documented, but that link only references doing it this way with the role; having to get the UserId element for the users threw me off. I hope this helps.

Posted in AWS, AWS Certified Solutions Architect, Bucket Policy