AWS Security Automation
Security automation is the automatic handling, by a script or a machine-based security application, of a task that a cybersecurity professional would otherwise perform manually. AWS Security Automation applies this to AWS security tasks such as scanning and enumeration, saving security professionals time and effort. Without it, error rates tend to rise and efficiency suffers.
Why the need for Security Automation?
Reliability: Automation reduces the risk of human mistakes. It also lets us avoid giving humans more access than they need. Tasks run faster and in a more reliable and consistent manner when they are automated.
Efficiency: Testing manually across multiple accounts means keeping credentials around and logging in to each account, which is tedious and error-prone. Automation lets us manage multiple accounts without that overhead.
Scalability: We need to scale with our business units, scaling up when more resources are required and scaling down when demand drops. Doing this manually becomes very difficult, hence the need for automation.
Apart from these three major requirements, automation can also cover Detection, Alerting, Remediation, Countermeasures, and Forensics. None of these is compulsory; it is up to the user to decide which tasks should be automated and which can be done manually.
How to test or verify?
First, we need to generate the issue manually. Using production accounts might not be a good idea, so it is recommended to test security automation in test accounts. Try different sources and account types, since the information looks different depending on where it comes from. It is not always practical to generate the issue; in that case the sample events provided for CloudWatch Events can be used instead.
A Lambda test event can also be used, in which case you will have to create your own test data based on the issue.
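A hand-built test event might look like the following minimal sketch. It follows the general shape of an EC2 instance state-change event; the account ID, instance ID, timestamp, and the `handler` function are hypothetical placeholders, not values from a real account.

```python
# Hypothetical test event shaped like an EC2 instance state-change event.
# The account ID, instance ID, and timestamp are made-up placeholders.
TEST_EVENT = {
    "version": "0",
    "id": "00000000-0000-0000-0000-000000000000",
    "detail-type": "EC2 Instance State-change Notification",
    "source": "aws.ec2",
    "account": "111122223333",
    "time": "2018-01-01T00:00:00Z",
    "region": "us-east-1",
    "resources": [
        "arn:aws:ec2:us-east-1:111122223333:instance/i-0123456789abcdef0"
    ],
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "pending"},
}


def handler(event, context):
    """Illustrative Lambda handler: report which instance changed state."""
    detail = event.get("detail", {})
    return {
        "instance": detail.get("instance-id"),
        "state": detail.get("state"),
        "account": event.get("account"),
    }
```

Calling `handler(TEST_EVENT, None)` locally exercises the same code path the function would take in Lambda, without touching a real account.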
Let’s say someone has compromised your server and your automation isolates that server from your network. This can be done by restricting the server’s security groups, removing the user, etc. The problem occurs when there is something wrong in the script, or when the attacker starts jumping to all your servers and the automation takes them out one by one. Implementing guardrails is therefore recommended. For example, guardrails can ensure that you never take out more than a certain percentage of your servers, so that you can still handle the load, or that when you start disabling IAM roles you don’t disable yourself.
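The guardrail idea can be sketched as two small checks that run before any isolating action. This is only an illustration: the function names, the 20% default threshold, and the notion of a protected role list are assumptions, not part of any AWS API.

```python
def may_isolate(instance_id, already_isolated, fleet_size, max_fraction=0.2):
    """Return True if isolating this instance stays within the guardrail.

    already_isolated: set of instance IDs the automation has already taken out.
    max_fraction: assumed upper bound on the share of the fleet that may be
    isolated at once (0.2 is an arbitrary example value).
    """
    if instance_id in already_isolated:
        return True  # already isolated; nothing new is taken out
    if fleet_size <= 0:
        return False
    projected = len(already_isolated) + 1
    return projected / fleet_size <= max_fraction


def may_disable_role(role_arn, protected_roles):
    """Never disable a role the automation (or the responder) depends on."""
    return role_arn not in protected_roles
```

When a check returns False, the automation would typically stop and alert a human rather than continue on its own.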
Beginning with the initiation of the scripts the various steps involved are as follows:
React: How do we react to events, whether they come from a CloudWatch Event, Config rules, log parsing, or something that you have written yourself?
Trigger: How do you trigger an event? What is it that you want to trigger?
Learn: How do you learn from what happened? If you’ve run a security automation or something similar, how do you take the information you got from it and feed it back into the chain, so that when it happens again you already know which measures to take?
Moving to the execution of the scripts, there are multiple steps involved which include:
- Priority Action: What is the most important, or the first, thing we need to do when something happens? For example, restarting a server or deleting a user.
- Forensics: Discover who, where, and when, and whether the same event has happened before, maliciously or otherwise.
- Countermeasure: Disable access keys, isolate an instance, etc.
- Alert: How do we alert our users or our team about an incident? Webhooks, email integration, or a Lambda function that sends alerts to an HTTP endpoint can all be used to alert the people concerned.
- Logging: Store core information in a database. We don’t need to store everything, only the important information, for example which users have done this before and when they did it.
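The execution steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions: `run_playbook` and its parameters are hypothetical names, and each step is passed in as a plain callable so that the real action (restarting a server, posting to a webhook, writing to a database) stays out of the sketch.

```python
def run_playbook(event, priority_action, forensics, countermeasure, alert, log):
    """Run the execution steps in order and return a summary record.

    Each step is a callable that takes the event. A failing step is recorded
    and the remaining steps still run, so alerting and logging are not lost.
    """
    record = {"event_id": event.get("id"), "steps": []}
    steps = (
        ("priority_action", priority_action),
        ("forensics", forensics),
        ("countermeasure", countermeasure),
        ("alert", alert),
    )
    for name, step in steps:
        try:
            result = step(event)
            record["steps"].append({"step": name, "ok": True, "result": result})
        except Exception as exc:  # keep going so the team is still alerted
            record["steps"].append({"step": name, "ok": False, "error": str(exc)})
    log(record)  # store only the summary, not the full event
    return record
```

Continuing after a failed step is a design choice made for this sketch; a stricter playbook might stop at the first failure instead.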
Amazon CloudWatch Events
Amazon CloudWatch is a monitoring service for AWS cloud resources and the applications you run on AWS. It can be used to collect and track metrics, collect and monitor log files, set alarms, and automatically react to changes in your AWS resources. It can monitor AWS resources such as Amazon EC2 instances, Amazon DynamoDB tables, and Amazon RDS DB instances, as well as the log files they generate. You can use Amazon CloudWatch to gain system-wide visibility into resource utilization, application performance, and operational health.
CloudWatch Events is driven by API activity and captures only write (mutating) API calls. It doesn’t capture read-only calls such as List, Get, or Describe. In other words, it mainly captures events that change something and can therefore pose a significant danger. It supports a wide range of services. In fact, anything that is supported by CloudTrail is supported by CloudWatch Events.
- An event pattern matches only if the event contains every field listed in the pattern. If a pattern lists several fields, all of them have to match the real event. Field names and values are case sensitive and must be written exactly as they appear in the event.
- Character-by-character matching is maintained.
- The values being matched follow JSON rules: strings enclosed in quotes, numbers, and the unquoted keywords true, false, and null.
- Number matching is done at the string representation level.
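The matching rules above can be illustrated with a simplified matcher. This is a sketch of the rules as described here, not the service’s actual implementation; `matches` and `_same_value` are hypothetical helper names, and only exact-value patterns are handled.

```python
import json


def _same_value(a, b):
    # Compare the JSON text so that 300 and 300.0, or 1 and true, do not match,
    # mirroring character-by-character, string-level comparison.
    return json.dumps(a) == json.dumps(b)


def matches(pattern, event):
    """Return True if the event satisfies every field listed in the pattern."""
    for key, expected in pattern.items():
        if key not in event:
            return False  # every field in the pattern must be present
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        else:
            allowed = expected if isinstance(expected, list) else [expected]
            candidates = actual if isinstance(actual, list) else [actual]
            if not any(_same_value(a, c) for a in allowed for c in candidates):
                return False
    return True
```

Fields that appear in the event but not in the pattern are ignored, which is why a short pattern can match a large event.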
Find what you need:
- It is easier to manage multiple resources, so that if anyone does something to them, you can be notified.
- It allows us to write custom rules for what we want to capture. For example, to capture any state change in EC2, a rule can trigger an alert any time an instance goes into the pending state.
- A user can also write rules for particular cases such as API calls. For example, any time an API call is made by a specific user, an event is triggered.
- A rule can also have multiple targets, so it can invoke one Lambda function or several. For example, if you’re testing something out, you can add a dump target that prints the full event, so you can see what else it contains, while also running your actual script on it.
- You can choose what you want to send. You can send the whole event that comes in or you can extract certain information that you require.
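As an illustration of the points above, the sketch below shows a pattern for EC2 instances entering the pending state, plus a simplified stand-in for sending only selected fields to the target. The `extract` helper and the `INPUT_PATHS` names are assumptions made for this example; the real service performs this selection in the rule’s target configuration.

```python
# Pattern that matches EC2 instances entering the "pending" state.
EC2_PENDING_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["pending"]},
}

# Hypothetical mapping from output names to simple "$.a.b" paths in the event.
INPUT_PATHS = {
    "instance": "$.detail.instance-id",
    "state": "$.detail.state",
    "when": "$.time",
}


def extract(event, input_paths):
    """Pick selected fields out of an event; missing paths yield None."""
    selected = {}
    for name, path in input_paths.items():
        node = event
        for part in path.lstrip("$").strip(".").split("."):
            node = node.get(part) if isinstance(node, dict) else None
        selected[name] = node
    return selected
```

Sending only the extracted fields keeps the target’s input small, while a second dump target can still receive the whole event during testing.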
Use Case: Live user activity tracker for IR scenarios.
When we have either a red team event or a real IR scenario, we need to know what a suspected user is doing in near real time. This lets us follow our incident response playbook on when to disable the user’s access with minimal risk to the security and availability of our services, and without alerting the attacker.
- How do we track what the user is doing as close to real time as possible?
- Possible ways to integrate with our existing tools for team collaboration when working with security incidents.
- Automate the process to start based on other risk-based solutions.
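A tracker for this use case could turn each API-call event into a one-line message for a team chat tool. The sketch below assumes the event carries a CloudTrail record under `detail`; `format_activity` and the `watched_arns` set are hypothetical names, and actually posting the message to a collaboration tool is left to the caller.

```python
def format_activity(event, watched_arns):
    """Return a one-line summary if the caller is a watched user, else None."""
    detail = event.get("detail", {})
    arn = detail.get("userIdentity", {}).get("arn")
    if arn not in watched_arns:
        return None  # not a user under investigation; stay quiet
    return "{time} {arn} called {source}:{name} from {ip} in {region}".format(
        time=detail.get("eventTime"),
        arn=arn,
        source=detail.get("eventSource"),
        name=detail.get("eventName"),
        ip=detail.get("sourceIPAddress"),
        region=detail.get("awsRegion"),
    )
```

Keeping the watch list as an input means the tracking can be started by another risk-based system simply by adding an ARN to the list.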
Use Case: Exposed keys remediation using Trusted Advisor
Exposed keys pose a great security risk from both an availability and a financial perspective.
- Detecting and handling exposed keys to make sure they are not being used for malicious activity.
- Shortening the time between detection and reaction.
- Ensuring the right team gets notified.
- Ways to prevent interference with our CI/CD pipelines.
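One way to approach these goals is a Lambda-style handler that deactivates the exposed key, notifies the team, and leaves keys owned by pipeline users for manual rotation. This is a sketch under assumptions: the event field names (`check-item-detail`, `Access Key ID`, `User Name (IAM or Root)`) reflect how Trusted Advisor events are commonly shaped but should be verified against a real event, and `handle_exposed_key`, `notify`, and `pipeline_users` are hypothetical names. The IAM client is passed in so that a boto3 client can be used in Lambda and a stub in tests.

```python
def handle_exposed_key(event, iam_client, notify, pipeline_users=frozenset()):
    """Deactivate an exposed access key unless it belongs to a pipeline user.

    iam_client: object with update_access_key(UserName, AccessKeyId, Status),
    for example a boto3 IAM client.
    notify: callable that delivers a message to the responsible team.
    """
    # Assumed event layout; confirm the field names against a real event.
    item = event["detail"]["check-item-detail"]
    user = item["User Name (IAM or Root)"]
    key_id = item["Access Key ID"]

    if user in pipeline_users:
        # Do not break the CI/CD pipeline automatically; ask for rotation.
        notify(f"Exposed key {key_id} belongs to pipeline user {user}: rotate manually")
        return "notified_only"

    iam_client.update_access_key(UserName=user, AccessKeyId=key_id, Status="Inactive")
    notify(f"Exposed key {key_id} for user {user} was deactivated")
    return "deactivated"
```

Deactivating rather than deleting the key keeps it available for forensics and allows it to be re-enabled if the detection turns out to be a false positive.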
Use Case: Auto remediate world accessible S3 buckets using Amazon Macie
Consider a case where a user with high enough permissions accidentally opens an S3 bucket to public access. A publicly open bucket is not wrong in itself, but a publicly open bucket containing Personally Identifiable Information (PII) is. Admins with high enough AWS Identity and Access Management (IAM) permissions can change the ACL of S3 buckets that contain sensitive or regulated data to be world readable or writable.
- Allowing public buckets for nonsensitive data.
- Automatically remediate overly open permissions for sensitive or regulated data.
- Notifying the right team during an event occurrence.
Macie can automatically find buckets containing PII that are open to the world. A user can set up these queries to kick off an alert when such a bucket is discovered and send that alert directly to a Lambda function, or send it to CloudWatch Events, which in turn triggers a Lambda function with the full information from the Macie event, including risk levels, the individual files found, the type of data, etc. Alternatively, Config rules can be used to run through all your buckets and look for these events.
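The remediation logic itself can be sketched independently of how the finding arrives. In the example below, `contains_sensitive_data` is a hypothetical flag that the caller would derive from the Macie finding (the Macie event format is not modeled here), and `remediate_bucket`, `public_permissions`, and `notify` are assumed names. The S3 client is passed in so that a boto3 client can be used in Lambda and a stub in tests.

```python
# Grantee URIs that make an ACL grant apply to everyone.
PUBLIC_GROUPS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}


def public_permissions(acl):
    """Return the sorted permissions an ACL grants to public groups."""
    return sorted({
        grant.get("Permission", "UNKNOWN")
        for grant in acl.get("Grants", [])
        if grant.get("Grantee", {}).get("URI") in PUBLIC_GROUPS
    })


def remediate_bucket(bucket, contains_sensitive_data, s3_client, notify):
    """Make a bucket private only if it is public AND holds sensitive data."""
    exposed = public_permissions(s3_client.get_bucket_acl(Bucket=bucket))
    if not exposed:
        return "not_public"
    if not contains_sensitive_data:
        return "public_allowed"  # public buckets with nonsensitive data are fine
    s3_client.put_bucket_acl(Bucket=bucket, ACL="private")
    notify(f"Bucket {bucket} had public {', '.join(exposed)} access and was made private")
    return "remediated"
```

This sketch only looks at the bucket ACL; bucket policies can also expose data and would need a separate check.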
Thank you for reading! – Setu Parimi, Steve George & Indranil Roy