Qualoom was the Spanish AWS partner selected by Fleurop Interflora España S.A. to migrate its IT systems to the cloud and to support the production environment once fully operational. Interflora is the Spanish leader in the flower and plant order management business, and one of the most important business units of the Fleurop Inc. global network.
In early 2014, Interflora decided to migrate its infrastructure, then hosted by a traditional server hosting provider, to a cloud computing solution. The aim was to ease the many problems involved in managing servers, especially during peak business periods, when activity multiplies by five. The main goals were:
- An infrastructure able to absorb the peak period load without having to maintain a fleet of oversized servers.
- Minimize the operating costs of the server infrastructure.
- Optimize the deployment of business applications.
- Improve platform infrastructure monitoring.
- Ensure secure access to the various enterprise applications and minimize the application surface exposed to external traffic.
- Optimize backup and disaster recovery.
In order to address these requirements, it was decided to migrate the entire server infrastructure to Amazon Web Services, adapting each part to maximize the benefits offered by AWS managed services.
E-Commerce Platform
Interflora’s e-commerce platform is the main entry point of the business and the part most exposed to the Internet, so the following requirements are the most critical:
- Maximize service availability, even under unexpected or scheduled peak load.
- Minimize response time for end customers.
- Minimize exposure to the Internet.
To achieve these goals, we chose to implement several AWS managed services:
- EC2 + Autoscaling: Autoscaling allows Interflora to adapt the size of the infrastructure to the real needs of the business at all times, and provides an automatic recovery mechanism in case one of the nodes fails.
- Elastic Load Balancing (ELB): The AWS load balancing service minimizes the attack surface exposed to the Internet and provides a robust, scalable load balancing mechanism that can be deployed in a few minutes.
- Relational Database Service (RDS): Relational database with automatic backup management, replication, failover and vertical scaling.
- ElastiCache: Cache and session store shared by all application nodes.
- CloudFront: Pay-per-use content delivery network that optimizes the download of static content and reduces the need to provision permanent servers for this purpose.
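Because business activity multiplies by five at peak, the benefit of Autoscaling is largely a matter of arithmetic. The sketch below is a simplified sizing rule, not the scaling policy actually configured for Interflora: it assumes a hypothetical per-node capacity and hypothetical minimum and maximum group sizes, computes how many instances a given load needs (clamped to the group bounds), and compares the instance-hours of an autoscaled group with those of a fixed fleet sized for the peak.

```python
import math
from typing import Iterable

# Hypothetical figures for illustration only; not Interflora's real numbers.
REQUESTS_PER_INSTANCE = 200  # requests/s one web node can sustain (assumed)
MIN_SIZE = 2                 # never fewer than two nodes, e.g. one per AZ (assumed)
MAX_SIZE = 20                # upper bound of the Autoscaling group (assumed)


def desired_capacity(load_rps: float,
                     per_instance: float = REQUESTS_PER_INSTANCE,
                     min_size: int = MIN_SIZE,
                     max_size: int = MAX_SIZE) -> int:
    """Instances needed to serve load_rps, clamped to the group's min/max size."""
    needed = math.ceil(load_rps / per_instance)
    return max(min_size, min(max_size, needed))


def instance_hours(hourly_load: Iterable[float]) -> int:
    """Instance-hours consumed when capacity follows the load hour by hour."""
    return sum(desired_capacity(load) for load in hourly_load)


if __name__ == "__main__":
    normal = 600.0       # assumed average load, requests/s
    peak = normal * 5    # activity multiplies by five at peak
    print(desired_capacity(normal), desired_capacity(peak))  # 3 15

    # One week with a single peak day: autoscaled group vs. a fleet sized for peak.
    week = [normal] * (24 * 6) + [peak] * 24
    fixed_fleet = desired_capacity(peak) * len(week)
    print(fixed_fleet, instance_hours(week))  # 2520 792
```

With these assumed numbers, a week with one peak day consumes 792 instance-hours when capacity follows demand, versus 2,520 for a fleet permanently sized for the peak.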
Business Applications
Interflora uses several business applications to manage business processes beyond the simple receipt of orders handled by the e-commerce platform:
- Purchase order management (ERP)
- Customer relationships (CRM)
- Financial management
- Internal administration
- Business Intelligence
All these activities run on commercial business management applications on a Windows infrastructure that relies on various Microsoft technologies such as Active Directory and MS SQL Server. SQL Server is one of the components most critical to system performance because of its high memory and storage demands. To address this, we used EBS-optimized EC2 instances (EBS being SAN-like network block storage) with SSD volumes, whose predictable performance and integrated monitoring tools allowed us to strike the right balance between the desired performance and infrastructure cost.
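The source does not say which SSD volume type backs the SQL Server instances. Assuming General Purpose SSD (gp2) volumes, the predictable performance follows a simple documented rule: baseline IOPS grow at 3 IOPS per GiB, with a floor of 100 IOPS and a per-volume ceiling. The ceiling has been raised over time (16,000 IOPS today), so this sketch takes it as a parameter; it also ignores burst credits, and the volume sizes in the usage are hypothetical.

```python
import math

GP2_IOPS_PER_GIB = 3  # gp2 baseline: 3 IOPS per provisioned GiB
GP2_MIN_IOPS = 100    # floor applied to small volumes


def gp2_baseline_iops(size_gib: int, ceiling: int = 16_000) -> int:
    """Baseline (non-burst) IOPS of a gp2 volume of size_gib GiB."""
    if size_gib < 1:
        raise ValueError("volume size must be at least 1 GiB")
    return min(ceiling, max(GP2_MIN_IOPS, GP2_IOPS_PER_GIB * size_gib))


def gp2_size_for_iops(target_iops: int, ceiling: int = 16_000) -> int:
    """Smallest gp2 volume (GiB) whose baseline meets target_iops."""
    if target_iops > ceiling:
        raise ValueError("target above the gp2 ceiling; consider Provisioned IOPS")
    if target_iops <= GP2_MIN_IOPS:
        return 1
    return math.ceil(target_iops / GP2_IOPS_PER_GIB)


if __name__ == "__main__":
    # Hypothetical SQL Server data volume sizes, for illustration only.
    for size in (50, 500, 1000):
        print(size, "GiB ->", gp2_baseline_iops(size), "IOPS")
    print(gp2_size_for_iops(4500), "GiB needed for 4500 IOPS")
```

Because baseline IOPS depend only on volume size, capacity planning for the database reduces to choosing a size, and the CloudWatch volume metrics can then confirm whether that baseline is actually being reached.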
This application infrastructure requires additional servers for business intelligence and reporting, and provides a web services layer for integrating the e-commerce platform with the ERP system, load-balanced across several application servers with the ELB service. The different services are summarized in the following figure.
To ensure optimal access to the ERP, and because the client tool is highly sensitive to latency to the application servers (latency that is unavoidable given the distance between Spain and the closest AWS data center), it was decided to implement a Remote Desktop infrastructure that gives users in Spain an optimal access mechanism.
Hybrid Cloud Infrastructure
One of Interflora’s key requirements when migrating all services to the Amazon cloud was to ensure secure and efficient communication with the business applications and control panels, while still allowing proper synchronization between certain services hosted at Interflora’s Spanish facilities (Active Directory, VoIP, etc.) and the rest of the infrastructure on AWS. To achieve this with the strongest guarantees, we chose AWS DirectConnect in collaboration with Colt, one of the Spanish partners that provide access to the service. DirectConnect gives customers a dedicated port with physical network connectivity to all Amazon services, whether hosted in the AWS public cloud or inside the private network service, VPC.
Configuration Management: Puppet
To provide a central, documented configuration mechanism, especially for the Linux instances of the e-commerce environment, the entire system configuration is implemented as Puppet manifests and modules. This centralizes the whole platform configuration, models it as version-controlled code, and greatly simplifies the deployment of configuration changes, since no one needs to access each node individually. In addition, MCollective was implemented as a task orchestration tool, to run tasks in parallel and gather information from groups of instances. To make this possible with minimal system administrator interaction, we implemented a custom bootstrap process that automatically registers every instance in these management services and assigns it a set of properties according to its role in the infrastructure, so that each node or group of nodes can be reached with simple MCollective filters.
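The custom bootstrap essentially attaches role-based facts to each node so that groups can be addressed with filters. The sketch below models that idea in plain Python and does not talk to MCollective itself: the inventory, the hostnames and the fact names `role`, `env` and `az` are hypothetical, and the selection mimics what MCollective’s fact filter (`-F`/`--with-fact`) does when several `key=value` filters are combined, namely keeping only nodes that match all of them.

```python
from typing import Dict, List

# Hypothetical inventory: facts a bootstrap script might register for each node.
NODES: Dict[str, Dict[str, str]] = {
    "web-a1": {"role": "web", "env": "prod", "az": "a"},
    "web-b1": {"role": "web", "env": "prod", "az": "b"},
    "worker-a1": {"role": "worker", "env": "prod", "az": "a"},
    "web-staging": {"role": "web", "env": "staging", "az": "a"},
}


def parse_filters(expressions: List[str]) -> Dict[str, str]:
    """Turn ["role=web", "az=a"] into {"role": "web", "az": "a"}."""
    filters: Dict[str, str] = {}
    for expr in expressions:
        key, sep, value = expr.partition("=")
        if not sep or not key:
            raise ValueError(f"invalid fact filter: {expr!r}")
        filters[key] = value
    return filters


def select_nodes(nodes: Dict[str, Dict[str, str]], expressions: List[str]) -> List[str]:
    """Sorted names of the nodes whose facts match every filter (logical AND)."""
    wanted = parse_filters(expressions)
    return sorted(
        name for name, facts in nodes.items()
        if all(facts.get(key) == value for key, value in wanted.items())
    )


if __name__ == "__main__":
    print(select_nodes(NODES, ["role=web", "env=prod"]))  # ['web-a1', 'web-b1']
    print(select_nodes(NODES, ["az=a"]))  # ['web-a1', 'web-staging', 'worker-a1']
```

The design point is that operators never address hostnames directly: as long as the bootstrap assigns consistent facts, a new node launched by Autoscaling becomes reachable by the same filters as soon as it registers.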
Monitoring and Alerts
- Amazon CloudWatch: All AWS services integrate natively with CloudWatch, which simplifies gathering metrics from the AWS-hosted services (ELB, RDS, ElastiCache, CloudFront ...). To monitor key aspects of the infrastructure not covered by the standard AWS metrics, multiple custom Windows and Linux infrastructure metrics were implemented.
- Amazon ELB + S3: All public ELBs, the main entry point for new customers, export their access logs to an S3 bucket for later analysis.
- ELK: To provide a detailed view of the events occurring anywhere in the platform, a log collection service was implemented with Elastic Inc. tools (Elasticsearch, Logstash and Kibana) to analyze syslog, Windows events, public ELB access logs and log files from various internal applications.
- Ganglia: Although CloudWatch provides all the tools needed for infrastructure performance monitoring, a Ganglia collector with agents deployed on all nodes allows metrics to be aggregated by business concept, keeps a better historical record of metrics and can report metrics at a granularity of seconds.
- Nagios: Nagios, the standard for this kind of monitoring, runs custom health checks that report the status of the various services hosted on the EC2 instances.
- Amazon SNS: Because of its native integration with other Amazon services, the simplicity of its API for integrating third-party applications and its subscription-based notification model, SNS is used for notifications: the alerts generated by CloudWatch, those from the Nagios health checks and those from several internal business processes.
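The ELB access logs exported to S3 are space-separated text lines with a few quoted fields, which is what makes them easy to feed into ELK. As a minimal sketch, assuming the Classic Load Balancer access-log format documented by AWS (the sample line is illustrative, not taken from Interflora’s logs), the function below splits one entry into named fields with `shlex`, so the quoted request and user agent stay intact. In an ELK setup this parsing is normally done by Logstash; the Python version only makes the field layout explicit.

```python
import shlex
from typing import Dict

# Field order of a Classic Load Balancer access-log entry. Older entries may
# end after "request"; zip() below simply leaves missing trailing fields out.
FIELDS = [
    "timestamp", "elb", "client", "backend",
    "request_processing_time", "backend_processing_time",
    "response_processing_time", "elb_status_code", "backend_status_code",
    "received_bytes", "sent_bytes", "request", "user_agent",
    "ssl_cipher", "ssl_protocol",
]


def parse_elb_line(line: str) -> Dict[str, str]:
    """Split one access-log line into named fields, keeping quoted fields whole."""
    tokens = shlex.split(line)
    if len(tokens) < 12:
        raise ValueError("not a Classic ELB access-log line")
    return dict(zip(FIELDS, tokens))


def total_latency(entry: Dict[str, str]) -> float:
    """Seconds spent in the load balancer and the backend for one request."""
    return sum(float(entry[name]) for name in (
        "request_processing_time",
        "backend_processing_time",
        "response_processing_time",
    ))


# Illustrative entry in the documented Classic ELB format.
SAMPLE = ('2015-05-13T23:39:43.945958Z my-loadbalancer 192.168.131.39:2817 '
          '10.0.0.1:80 0.000073 0.001048 0.000057 200 200 0 29 '
          '"GET http://www.example.com:80/ HTTP/1.1" "curl/7.38.0" - -')

if __name__ == "__main__":
    entry = parse_elb_line(SAMPLE)
    print(entry["elb_status_code"], entry["request"])
    print(round(total_latency(entry), 6))
```

Once entries are split into named fields, indicators such as error rates per status code or latency per backend node can be aggregated directly, which is the kind of analysis the ELK stack performs on these logs.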
Network and Security
One of Interflora’s biggest concerns when moving its servers to the cloud was to minimize the risk of security breaches when accessing the various services, both in the public part (e-commerce) and in the internal applications. To this end, we made use of a wide range of options provided by the AWS networking and security services:
- VPC: All EC2 instances run inside a VPC, an AWS service that provisions a section of EC2 completely isolated from the Internet.
- Subnets: Several subnets were provisioned inside the VPC, each intended to host certain types of services according to their nature: public or private access, Interflora EC2 instances, AWS managed services, Availability Zone, etc.
- Security Groups: Instead of implementing complex firewall and network ACL mechanisms, we opted for Security Groups, an EC2 networking feature that allows network restrictions to be configured based on the group membership of each instance and/or AWS managed service.
- Administrative NAT/VPN: As mentioned above, the VPC is a completely isolated section of EC2. To provide access to the Internet and to other AWS services hosted outside the internal network (S3, SNS ...), a NAT instance was deployed in each Availability Zone. These instances also serve a VPN for administrative access to the VPC from remote locations.
- Virtual Private Gateway and DirectConnect: To communicate with Interflora’s offices in Spain securely and with optimal latency and bandwidth, DirectConnect has been used together with a Virtual Private Gateway. The former allows direct communication from the Spanish offices over dedicated hardware infrastructure, while the latter provides a fully managed mechanism to link the VPC internal network to the DirectConnect dedicated hardware.
- ELB: To minimize the attack surface exposed to the Internet, the number of EC2 instances in publicly accessible subnets was reduced as far as possible. In fact, only the NAT and administrative VPN instances (one per Availability Zone) have public IPs anywhere in the infrastructure. All incoming e-commerce traffic passes through the AWS load balancers, which minimizes the attack surface and centralizes the management of SSL certificates and encryption algorithms.
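The subnet layout described above (separate public and private subnets per Availability Zone, with only the NAT/VPN instances in public subnets) can be planned programmatically. A minimal sketch using Python’s standard `ipaddress` module, assuming a hypothetical 10.0.0.0/16 VPC, two Availability Zones labeled `a` and `b`, hypothetical tier names and /24 subnets (none of these values come from Interflora’s setup). It also accounts for the five addresses AWS reserves in every subnet.

```python
import ipaddress
from typing import Dict, List

AWS_RESERVED_PER_SUBNET = 5  # AWS reserves the first four and the last address


def plan_subnets(vpc_cidr: str, tiers: List[str], zones: List[str],
                 new_prefix: int = 24) -> Dict[str, str]:
    """Carve one non-overlapping subnet per (tier, zone) out of the VPC range."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    plan: Dict[str, str] = {}
    for tier in tiers:
        for zone in zones:
            block = next(blocks, None)
            if block is None:
                raise ValueError("VPC range too small for the requested subnets")
            plan[f"{tier}-{zone}"] = str(block)
    return plan


def subnet_of(address: str, plan: Dict[str, str]) -> str:
    """Name of the planned subnet that contains address."""
    ip = ipaddress.ip_address(address)
    for name, cidr in plan.items():
        if ip in ipaddress.ip_network(cidr):
            return name
    raise LookupError(f"{address} is outside every planned subnet")


def usable_hosts(cidr: str) -> int:
    """Addresses left for instances once AWS takes its reserved ones."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


if __name__ == "__main__":
    # Hypothetical layout: public subnets would hold only the NAT/VPN instances.
    plan = plan_subnets("10.0.0.0/16", ["public", "private", "managed"], ["a", "b"])
    for name, cidr in plan.items():
        print(f"{name:10} {cidr:14} {usable_hosts(cidr)} usable")
    print(subnet_of("10.0.2.15", plan))  # private-a
```

Keeping the plan as data makes it easy to check, for example, that an address assigned to an application server never falls inside a public subnet.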
Backup plans and disaster recovery
To simplify the backup and recovery process, extensive use has been made of the services offered by Amazon Web Services:
- EC2 and EBS
  - An AMI of every instance, ready to restore the service in case of unrecoverable errors in production instances.
  - Regular snapshots of data volumes.
- RDS and ElastiCache
  - Integrated backup based on snapshots of the storage volumes.
  - Automatic instance recovery in case of serious errors that prevent the service from being restored.
  - Data replication built into the service.
- S3 and Glacier
  - File backups stored via scheduled tasks running on the EC2 instances dedicated to data storage.
  - Lifecycle rules that automatically archive backups to low-cost storage and delete old items.
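The S3 and Glacier lifecycle policy can be expressed as an S3 lifecycle configuration. The sketch below builds a rule in the structure accepted by the S3 `PutBucketLifecycleConfiguration` API, assuming a hypothetical `backups/` prefix, a 30-day transition to Glacier and a 365-day expiration; the retention periods actually used for Interflora are not stated. A small helper then shows what the rule implies for a backup of a given age. The code only builds and evaluates the configuration; it does not call AWS.

```python
from typing import Any, Dict


def backup_lifecycle(prefix: str = "backups/", glacier_after: int = 30,
                     expire_after: int = 365) -> Dict[str, Any]:
    """Lifecycle configuration that archives backups to Glacier, then deletes them."""
    # Sanity check of this sketch only, not an S3 validation rule.
    if not 0 < glacier_after < expire_after:
        raise ValueError("expected 0 < glacier_after < expire_after")
    return {
        "Rules": [{
            "ID": "archive-then-expire-backups",
            "Filter": {"Prefix": prefix},
            "Status": "Enabled",
            "Transitions": [{"Days": glacier_after, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": expire_after},
        }]
    }


def storage_state(key: str, age_days: int, config: Dict[str, Any]) -> str:
    """Simplified view of where an object of a given age ends up under config.

    S3 applies lifecycle rules asynchronously, so real transitions can lag;
    this models only the intended end state.
    """
    state = "STANDARD"
    for rule in config["Rules"]:
        prefix = rule.get("Filter", {}).get("Prefix", "")
        if rule["Status"] != "Enabled" or not key.startswith(prefix):
            continue
        expire = rule.get("Expiration", {}).get("Days")
        if expire is not None and age_days >= expire:
            return "EXPIRED"
        for transition in sorted(rule.get("Transitions", []), key=lambda t: t["Days"]):
            if age_days >= transition["Days"]:
                state = transition["StorageClass"]
    return state


if __name__ == "__main__":
    config = backup_lifecycle()
    for age in (1, 45, 400):
        print(age, storage_state("backups/sqlserver/full.bak", age, config))
```

With boto3, the dictionary returned by `backup_lifecycle()` could be passed as the `LifecycleConfiguration` argument of the S3 client’s `put_bucket_lifecycle_configuration(Bucket=...)`; the bucket name would be specific to the deployment.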