News
Catch Up with What’s New for AI in AWS in 2025
Things slow down in the tech industry over the holiday season, so here’s a catch-up of AI news announced by Amazon Web Services (AWS) since Nov. 20, 2024, a Monday when many people started their seasonal vacations, ranging from updates to the Amazon Q dev tool to support for latency optimized models.
Amazon Q Business Is Now SOC Compliant
Amazon Q Business, AWS’s generative AI-powered assistant, has achieved SOC (System and Organization Controls) compliance as of Dec. 20, 2024. This certification covers SOC 1, 2, and 3, enabling customers to use Amazon Q Business for applications that require SOC compliance.
Key points of this announcement include:
- Amazon Q Business can now be used for SOC-compliant tasks within enterprise systems
- The certification provides insight into AWS’s security processes and controls for protecting customer data
- AWS maintains SOC compliance through rigorous third-party audits
- The compliance applies to all AWS Regions where Amazon Q Business is available
This certification enhances Amazon Q Business’s capability to handle sensitive enterprise data while maintaining high security and compliance standards. More info here.
Amazon Bedrock Agents, Flows, and Knowledge Bases Now Support Latency Optimized Models
These components of Amazon’s generative AI platform that enable developers to build sophisticated AI applications now support latency-optimized models through the SDK, as announced on Dec. 23, 2024. This update enhances AI applications built with Amazon Bedrock Tooling by providing faster response times and improved responsiveness.
Key features of this update include:
- Support for latency-optimized versions of Anthropic’s Claude 3.5 Haiku model and Meta’s Llama 3.1 405B and 70B models
- Reduced latency without compromising accuracy compared to standard models
- Utilization of purpose-built AI chips like AWS Trainium2 and advanced software optimizations
- Immediate integration into existing applications without additional setup or model fine-tuning
This enhancement is particularly beneficial for latency-sensitive applications such as real-time customer service chatbots and interactive coding assistants. The latency-optimized inference support is available in the US East (Ohio) Region via cross-region inference and can be accessed through the Amazon Bedrock SDK using a runtime configuration. More info here.
AWS Neuron Introduces Support for Trainium2 and NxD Inference
AWS released Neuron 2.21, introducing several significant updates to its AI infrastructure. The AWS Neuron SDK now supports model training and deployment across Trn1, Trn2, and Inf2 instances, available in various AWS Regions and instance types.
Highlights include:
- Support for AWS Trainium2 chips and Amazon EC2 Trn2 instances, including the trn2.48xlarge instance type and Trn2 UltraServer
- Introduction of NxD Inference, a PyTorch-based library integrated with vLLM for simplified deployment of large language and multi-modality models
- Launch of Neuron Profiler 2.0 (beta) with enhanced capabilities and support for distributed workloads
- Support for PyTorch 2.5
- Llama 3.1 405B model inference support on a single trn2.48xlarge instance using NxD Inference
- Updates to Deep Learning Containers and AMIs, with support for new model architectures like Llama 3.2, Llama 3.3, and Mixture-of-Experts (MoE) models
- New inference features including FP8 weight quantization and flash decoding for speculative decoding in Transformers NeuronX
- Additional training examples and features, such as support for HuggingFace Llama 3/3.1 70B on Trn2 instances and DPO support for post-training model alignment
More info here.
Llama 3.3 70B Now Available on AWS via Amazon SageMaker JumpStart
AWS made Meta’s Llama 3.3 70B model available through Amazon SageMaker JumpStart as of Dec. 26, 2024. This large language model offers a balance of high performance and computational efficiency, making it suitable for cost-effective AI deployments.
Key features of Llama 3.3 70B include:
- Enhanced attention mechanism for reduced inference costs
- Training on approximately 15 trillion tokens
- Extensive supervised fine-tuning and Reinforcement Learning from Human Feedback (RLHF)
- Comparable output quality to larger Llama versions with fewer resources
- Nearly five times more cost-effective inference operations, according to Meta
Customers can deploy Llama 3.3 70B using either the SageMaker JumpStart user interface or programmatically via the SageMaker Python SDK. SageMaker AI’s advanced inference capabilities optimize both performance and cost efficiency for deployments.
The model is available in all AWS Regions where Amazon SageMaker AI is supported. More information is here and in a separate blog post.
Amazon Q Developer Is Now Available in Amazon SageMaker Code Editor IDE
The general availability of Amazon Q Developer in Amazon SageMaker Studio Code Editor was the first AWS AI announcement of 2025, being posted yesterday.”SageMaker Studio customers now get generative AI assistance powered by Q Developer right within their Code Editor (Visual Studio Code – Open Source) IDE,” AWS said. “With Q Developer, data scientists and ML engineers can access expert guidance on SageMaker features, code generation, and troubleshooting. This allows for more productivity by eliminating the need for tedious online searches and documentation review, and ensuring more time delivering differentiated business value.”
Key features and benefits of Amazon Q Developer in SageMaker Studio Code Editor include:
- Expert guidance on SageMaker features
- Code generation tailored to user needs
- In-line code suggestions and conversational assistance
- Step-by-step troubleshooting guidance
- Chat capability for discovering and learning SageMaker features
This integration aims to enhance productivity for data scientists and ML engineers by:
- Eliminating the need for extensive documentation review
- Accelerating the model development lifecycle
- Streamlining code editing, explanation, and documentation processes
- Providing efficient error resolution
Amazon Q Developer is now available in all commercial AWS regions where SageMaker Studio is supported. This feature is accessible to both Amazon Q Developer Free Tier and Pro Tier users, with pricing potentially varying depending on individual service models.
The addition of Amazon Q Developer to SageMaker Studio Code Editor represents a significant step in AWS’s efforts to integrate generative AI capabilities into its machine learning development environment, potentially transforming the workflow for data scientists and ML engineers, the company said.