Monday, August 24, 2020

TRAINING HADOOP DEVELOPER WITH APACHE SPARK


Purnama Academy 0838-0838-0001 training syllabus:


BIG DATA HADOOP DEVELOPER WITH APACHE SPARK

Duration: 5 Days



Description:
Industry today uses Hadoop extensively to analyze its data sets. The reason is that the Hadoop framework is built on a simple programming model (MapReduce) and enables computing solutions that are scalable, flexible, fault-tolerant, and cost-effective. The main concern is keeping large datasets fast to process, both in the waiting time between queries and in the waiting time to run a program.
Spark, now a project of the Apache Software Foundation, was introduced to speed up Hadoop's computational processing.
Contrary to a common belief, Spark is not a modified version of Hadoop, and it does not actually depend on Hadoop, because it has its own cluster management. Hadoop is just one of the ways to implement Spark.
Spark can use Hadoop in two ways: for storage and for processing. Because Spark has its own cluster management for computation, it uses Hadoop mainly for storage.
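
The MapReduce programming model mentioned in the description can be sketched in plain Python. This is only a single-machine illustration of the three phases (map, shuffle, reduce) using a word count; it is not Hadoop's or Spark's actual API, and the function names map_phase, shuffle_phase, reduce_phase, and word_count are hypothetical names chosen for this example.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in order on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["spark uses hadoop for storage", "hadoop uses mapreduce"]
    print(word_count(sample))
```

On a real cluster each phase runs in parallel across many machines. Spark expresses the same kind of pipeline through its own operations and tries to keep intermediate data in memory, which is where much of its speed advantage comes from.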



Target Participants:
- Hadoop Developer
- Big Data Analyst
- IT Developer
- DBA



Training Material:
1. Introduction to Data Analysis with Spark
- What Is Apache Spark?
- A Unified Stack
- Spark Core
- Spark SQL
- Spark Streaming
- MLlib
- GraphX
- Cluster Managers
- Who Uses Spark, and for What?
- Data Science Tasks
- Data Processing Applications
- A Brief History of Spark
- Spark Versions and Releases
- Storage Layers for Spark
2. Downloading Spark and Getting Started
- Downloading Spark
- Introduction to Spark’s Python and Scala Shells
- Introduction to Core Spark Concepts
- Standalone Applications
- Initializing a SparkContext
- Building Standalone Applications
- Conclusion
3. Programming with RDDs
- RDD Basics
- Creating RDDs
- RDD Operations
- Transformations
- Actions
- Lazy Evaluation
- Passing Functions to Spark
- Python
- Scala
- Java
- Common Transformations and Actions
- Basic RDDs
- Converting Between RDD Types
- Persistence (Caching)
- Conclusion
4. Working with Key/Value Pairs
- Motivation
- Creating Pair RDDs
- Transformations on Pair RDDs
- Aggregations
- Grouping Data
- Joins
- Sorting Data
- Actions Available on Pair RDDs
- Data Partitioning (Advanced)
- Determining an RDD’s Partitioner
- Operations That Benefit from Partitioning
- Operations That Affect Partitioning
- Example: PageRank
- Custom Partitioners
- Conclusion
5. Loading and Saving Your Data
- Motivation
- File Formats
- Text Files
- JSON
- Comma-Separated Values and Tab-Separated Values
- SequenceFiles
- Object Files
- Hadoop Input and Output Formats
- File Compression
- Filesystems
- Local/“Regular” FS
- Amazon S3
- HDFS
- Structured Data with Spark SQL
- Apache Hive
- JSON
- Databases
- Java Database Connectivity
- Cassandra
- HBase
- Elasticsearch
- Conclusion
6. Advanced Spark Programming
- Introduction
- Accumulators
- Accumulators and Fault Tolerance
- Custom Accumulators
- Broadcast Variables
- Optimizing Broadcasts
- Working on a Per-Partition Basis
- Piping to External Programs
- Numeric RDD Operations
- Conclusion
7. Spark SQL
- Linking with Spark SQL
- Using Spark SQL in Applications
- Initializing Spark SQL
- Basic Query Example
- SchemaRDDs
- Caching
- Loading and Saving Data
- Apache Hive
- Parquet
- JSON
- From RDDs
- JDBC/ODBC Server
- Working with Beeline
- Long-Lived Tables and Queries
- User-Defined Functions
- Spark SQL UDFs
- Hive UDFs
- Spark SQL Performance
- Performance Tuning Options
- Conclusion
8. Spark Streaming
- A Simple Example
- Architecture and Abstraction
- Transformations
- Stateless Transformations
- Stateful Transformations
- Output Operations
- Input Sources
- Core Sources
- Additional Sources
- Multiple Sources and Cluster Sizing
- 24/7 Operation
- Checkpointing
- Driver Fault Tolerance
- Worker Fault Tolerance
- Receiver Fault Tolerance
- Processing Guarantees
- Streaming UI with Flume and Kafka
- Performance Considerations
- Batch and Window Sizes
- Level of Parallelism
- Garbage Collection and Memory Usage



Recommended Follow-Up Training:
MLLIB SPARK
















Thank you for visiting our website. If you have any questions about the information above, please fill in the comment box below; our team will respond to your comment or question within 2 x 24 hours at the latest.

For a faster response, please contact 0838-0838-0001 (Call/WhatsApp)

Regards,

Management,
www.purnamaacademy.com
