Sunday, August 23, 2020

HADOOP DEVELOPER TRAINING WITH APACHE SPARK


Purnama Academy training syllabus (0838-0838-0001):


BIG DATA HADOOP DEVELOPER WITH APACHE SPARK

Duration: 5 days



Description:
Today, industry uses Hadoop extensively to analyze its datasets. The reason is that the Hadoop framework is built on a simple programming model (MapReduce) and enables computing solutions that are scalable, flexible, fault-tolerant, and cost-effective. The main concern here is keeping large-dataset processing fast, both in the wait time between queries and in the wait time to run a program.
Apache Spark, now an Apache Software Foundation project, was introduced to speed up Hadoop's computational processing.
Contrary to what many people assume, Spark is not a modified version of Hadoop, and it does not actually depend on Hadoop, because it has its own cluster management. Hadoop is just one of the ways to deploy Spark.
Spark can use Hadoop in two ways: for storage and for processing. Because Spark has its own cluster management, it mostly uses Hadoop for storage only.
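
The description mentions Hadoop's MapReduce programming model. As an illustration only, here is a minimal single-machine sketch in plain Python of its three phases (map, shuffle, reduce) applied to word counting. It does not use Hadoop or Spark; the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical, chosen for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the list of values for each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases, as a MapReduce job would."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    docs = ["Spark runs on Hadoop", "Hadoop stores data", "Spark processes data"]
    print(word_count(docs))
```

On a real cluster, each phase runs in parallel across many machines and Hadoop writes intermediate results to disk between jobs; that disk I/O is one of the costs Spark aims to reduce by keeping data in memory.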



Target Participants:
- Hadoop Developer
- Big Data Analyst
- IT Developer
- DBA



Training Materials:
1. Introduction to Data Analysis with Spark
- What Is Apache Spark?
- A Unified Stack
- Spark Core
- Spark SQL
- Spark Streaming
- MLlib
- GraphX
- Cluster Managers
- Who Uses Spark, and for What?
- Data Science Tasks
- Data Processing Applications
- A Brief History of Spark
- Spark Versions and Releases
- Storage Layers for Spark
2. Downloading Spark and Getting Started
- Downloading Spark
- Introduction to Spark’s Python and Scala Shells
- Introduction to Core Spark Concepts
- Standalone Applications
- Initializing a SparkContext
- Building Standalone Applications
- Conclusion
3. Programming with RDDs
- RDD Basics
- Creating RDDs
- RDD Operations
- Transformations
- Actions
- Lazy Evaluation
- Passing Functions to Spark
- Python
- Scala
- Java
- Common Transformations and Actions
- Basic RDDs
- Converting Between RDD Types
- Persistence (Caching)
- Conclusion
4. Working with Key/Value Pairs
- Motivation
- Creating Pair RDDs
- Transformations on Pair RDDs
- Aggregations
- Grouping Data
- Joins
- Sorting Data
- Actions Available on Pair RDDs
- Data Partitioning (Advanced)
- Determining an RDD’s Partitioner
- Operations That Benefit from Partitioning
- Operations That Affect Partitioning
- Example: PageRank
- Custom Partitioners
- Conclusion
5. Loading and Saving Your Data
- Motivation
- File Formats
- Text Files
- JSON
- Comma-Separated Values and Tab-Separated Values
- SequenceFiles
- Object Files
- Hadoop Input and Output Formats
- File Compression
- Filesystems
- Local/“Regular” FS
- Amazon S3
- HDFS
- Structured Data with Spark SQL
- Apache Hive
- JSON
- Databases
- Java Database Connectivity
- Cassandra
- HBase
- Elasticsearch
- Conclusion
6. Advanced Spark Programming
- Introduction
- Accumulators
- Accumulators and Fault Tolerance
- Custom Accumulators
- Broadcast Variables
- Optimizing Broadcasts
- Working on a Per-Partition Basis
- Piping to External Programs
- Numeric RDD Operations
- Conclusion
7. Spark SQL
- Linking with Spark SQL
- Using Spark SQL in Applications
- Initializing Spark SQL
- Basic Query Example
- SchemaRDDs
- Caching
- Loading and Saving Data
- Apache Hive
- Parquet
- JSON
- From RDDs
- JDBC/ODBC Server
- Working with Beeline
- Long-Lived Tables and Queries
- User-Defined Functions
- Spark SQL UDFs
- Hive UDFs
- Spark SQL Performance
- Performance Tuning Options
- Conclusion
8. Spark Streaming
- A Simple Example
- Architecture and Abstraction
- Transformations
- Stateless Transformations
- Stateful Transformations
- Output Operations
- Input Sources
- Core Sources
- Additional Sources
- Multiple Sources and Cluster Sizing
- 24/7 Operation
- Checkpointing
- Driver Fault Tolerance
- Worker Fault Tolerance
- Receiver Fault Tolerance
- Processing Guarantees
- Streaming UI with Flume and Kafka
- Performance Considerations
- Batch and Window Sizes
- Level of Parallelism
- Garbage Collection and Memory Usage
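
Several topics in chapter 3 (Transformations, Actions, Lazy Evaluation, Persistence) rest on one idea: transformations on an RDD only record how to compute a result, and the work happens when an action runs. The sketch below is a toy pure-Python model of that idea, not Spark itself. The class `LazyDataset` is hypothetical; its method names only mirror PySpark's `parallelize`, `map`, `filter`, `collect`, and `count` for readability, and nothing about Spark's internals is assumed.

```python
from typing import Callable, Generic, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class LazyDataset(Generic[T]):
    """Toy model of RDD-style lazy evaluation (hypothetical, NOT the Spark API).

    Transformations (map, filter) only wrap the parent's recipe in a new one;
    nothing is computed until an action (collect, count) is called.
    """

    def __init__(self, source: Callable[[], Iterator[T]]) -> None:
        # `source` rebuilds the data from scratch on every call, similar to
        # recomputing an uncached RDD from its lineage.
        self._source = source

    @staticmethod
    def parallelize(data: Iterable[T]) -> "LazyDataset[T]":
        items = list(data)
        return LazyDataset(lambda: iter(items))

    # Transformations: return a new dataset and compute nothing yet.
    def map(self, f: Callable[[T], U]) -> "LazyDataset[U]":
        parent = self._source
        return LazyDataset(lambda: (f(x) for x in parent()))

    def filter(self, pred: Callable[[T], bool]) -> "LazyDataset[T]":
        parent = self._source
        return LazyDataset(lambda: (x for x in parent() if pred(x)))

    # Actions: force evaluation of the whole chain.
    def collect(self) -> List[T]:
        return list(self._source())

    def count(self) -> int:
        return sum(1 for _ in self._source())


if __name__ == "__main__":
    calls = []

    def square(x: int) -> int:
        calls.append(x)  # record every time the map function actually runs
        return x * x

    odd_squares = LazyDataset.parallelize(range(1, 6)).map(square).filter(lambda x: x % 2 == 1)
    print(len(calls))             # 0: no action yet, so square() has not run
    print(odd_squares.collect())  # [1, 9, 25]
    print(len(calls))             # 5
    print(odd_squares.count())    # 3
    print(len(calls))             # 10: the chain was recomputed for the second action
```

The second action reruns `square` because this toy keeps no results between actions. In real Spark, calling `cache()` or `persist()` on an RDD asks Spark to keep the computed partitions so later actions can reuse them, which is the subject of "Persistence (Caching)".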



Recommended Follow-up Training:
Spark MLlib


Thank you for visiting our website. If you have any questions about the information above, please fill in the comment box below; our team will respond to your comment or question within 2 x 24 hours.

For a quick response, please contact 0838-0838-0001 (Call/WhatsApp).

Regards,

Management,
www.purnamaacademy.com