I recently had a requirement to process a large volume of data supplied by the user in an Excel sheet and update it in the HR infotypes. Unfortunately, the function module that updates infotypes, ‘HR_INFOTYPE_OPERATION’, updates only one infotype at a time, so I had to call it in a loop, which sometimes ended in a timeout error when the data volume was high. While looking for ways to avoid the timeout, I came across parallel processing in ABAP, which runs function modules asynchronously. Parallel processing can improve performance considerably, but it has limitations and should be used with care.
When a function module is called in a loop, the dialog work process running the loop waits until the function module finishes and only then moves on to the next iteration (synchronous mode). With parallel processing, the function module is handed to a different work process, so the dialog work process running the loop does not have to wait; it carries on with the next set of data (asynchronous mode).
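As a rough illustration of the difference, the two call styles can be put side by side. This is only a sketch: it assumes the RFC-enabled function module ZCSK_PARALLEL_TABLE_UPDATE and the table ZZCSK_EMP from the example later in this post, while the report name, the task name 'TASK1', the counter lv_done and the form name on_task_end are placeholders made up for the illustration.

```abap
report zcsk_call_styles_sketch.   " hypothetical report name

data: it_data   type standard table of zzcsk_emp,
      lv_group  type rzllitab-classname value 'parallel_generators',
      lv_result type flag,
      lv_done   type i.           " no. of asynchronous calls that have returned

start-of-selection.

* Synchronous: the work process running this program waits here
* until the function module has returned.
  call function 'ZCSK_PARALLEL_TABLE_UPDATE'
    importing
      lv_result = lv_result
    tables
      data      = it_data.

* Asynchronous: the call is handed to a work process of the RFC group
* and this program continues with the next statement straight away.
  call function 'ZCSK_PARALLEL_TABLE_UPDATE'
    starting new task 'TASK1'
    destination in group lv_group
    performing on_task_end on end of task
    tables
      data = it_data
    exceptions
      communication_failure = 1
      system_failure        = 2
      resource_failure      = 3.

  if sy-subrc = 0.
*   The result arrives later, in the callback routine below.
    wait until lv_done >= 1.
  endif.

* Callback routine: runs when the asynchronous task has finished.
form on_task_end using p_task type clike.
  receive results from function 'ZCSK_PARALLEL_TABLE_UPDATE'
    importing
      lv_result = lv_result
    exceptions
      communication_failure = 1
      system_failure        = 2.
  lv_done = lv_done + 1.
endform.
```

In the synchronous call the result is available on the next line; in the asynchronous call it is only available once the callback routine has run, which is why the program has to wait for it explicitly.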
The reference links below explain parallel processing in more detail.
Reference links:
http://help.sap.com/saphelp_nw70/helpdata/en/fa/096e92543b11d1898e0000e8322d00/content.htm
http://help.sap.com/saphelp_nwpi71/helpdata/en/22/0425c6488911d189490000e829fbbd/content.htm
http://sapignite.com/learn-parallel-processing-in-abap/
Here are a few things that I learnt:
1. Parallel processing is not suitable for data that must be processed sequentially. Only logically independent units of work can be processed this way.
2. Any RFC-enabled function module can be called in this mode.
3. When an RFC-enabled function module is called in parallel, you specify the RFC group in which it is to be executed. RFC groups are maintained in transaction RZ12. A group contains application server instances, and each instance can have any number of dialog work processes, as configured by the Basis team. I noticed a group named ‘parallel_generators’ on my system, and I assume it is available on all servers for parallel processing.
4. Before starting the tasks, check how many dialog work processes are free and split the work accordingly. Use only part of the free work processes so that jobs running in other applications are not starved.
Below is a simple example that demonstrates the difference between the two modes of processing.
Step 1: I created a table with the following structure.
Step 2: I created a function module that updates the table one record at a time.
To compare the performance of parallel processing with the normal procedure, I created two programs.
Program 1: Without parallel processing.
Code:
data : wa_data type zzcsk_emp.
data : it_data type standard table of zzcsk_emp.
data : lv_result type flag.

start-of-selection.
* Populate the dummy data
  do 1000 times.
    wa_data-mandt = sy-mandt.
    wa_data-field1 = sy-index.
    append wa_data to it_data.
    clear wa_data.
  enddo.
* Call the function module to update the data.
  call function 'ZCSK_PARALLEL_TABLE_UPDATE'
    importing
      lv_result = lv_result
    tables
      data      = it_data.
  if lv_result = abap_true.
    write : / 'Success'.
  else.
    write : / 'Failed'.
  endif.
Runtime analysis of program without parallel processing:
Program 2: With Parallel processing
Code:
data : wa_data      type zzcsk_emp,
       it_data      type standard table of zzcsk_emp, " Internal table with the full content
       it_data_temp type standard table of zzcsk_emp. " Internal table with the package to be processed
* RFC server group. The groups and the application server instances assigned
* to them are stored in table RZLLITAB and can be viewed in transaction RZ12.
data : lv_appsvr type rzllitab-classname value 'parallel_generators'.
data : lv_total     type i, " Total no. of dialog work processes in the server group
       lv_available type i, " No. of dialog work processes that are free
       lv_occupied  type i, " No. of dialog work processes that are occupied
       lv_diff      type i, " Percentage of work processes that are free
       lv_split     type i. " No. of packages the data is split into
data : lv_lines     type i, " No. of records in the internal table
       lv_lines_tab type i, " No. of records per package
       lv_start     type i, " Start point for processing
       lv_end       type i. " End point for processing
data : lv_task   type string, " Name of the task to be created
       lv_index  type string, " Variable for the loop index
       lv_sent   type i,      " No. of packages sent
       lv_comp   type i,      " No. of packages completed
       lv_result type flag.   " Variable to collect the result
data : lv_result_string type string. " Result string
start-of-selection.
* Call SPBT_INITIALIZE to get the total and the free no. of work processes in the server group
  call function 'SPBT_INITIALIZE'
    exporting
      group_name                     = lv_appsvr
    importing
      max_pbt_wps                    = lv_total
      free_pbt_wps                   = lv_available
    exceptions
      invalid_group_name             = 1
      internal_error                 = 2
      pbt_env_already_initialized    = 3
      currently_no_resources_avail   = 4
      no_pbt_resources_found         = 5
      cant_init_different_pbt_groups = 6
      others                         = 7.
  if sy-subrc = 0.
* Decide how many work processes to use for the split.
    lv_occupied = lv_total - lv_available.
* Calculate the percentage of work processes that are free
    lv_diff = ( lv_available * 100 ) / lv_total.
* Based on the no. of free work processes, decide the no. of packages
    if lv_diff <= 25.
      lv_split = lv_available div 2.
    elseif lv_diff between 25 and 50.
      lv_split = lv_available * 2 div 3.
    elseif lv_diff >= 50.
      lv_split = lv_available * 3 div 4.
    endif.
  else.
* Without resource information the data cannot be split sensibly, so stop here.
    write : / 'Parallel processing could not be initialized, sy-subrc =', sy-subrc.
    return.
  endif.
* Dummy data generation
  do 1000 times.
    wa_data-mandt = sy-mandt.
    wa_data-field1 = sy-index.
    append wa_data to it_data.
  enddo.
* Split the internal table accordingly.
  lv_lines = lines( it_data ).
* Guard against a division by zero when too few work processes are free
  if lv_split < 1.
    lv_split = 1.
  endif.
  lv_lines_tab = lv_lines / lv_split.
  do lv_split times.
    lv_index = sy-index.
    concatenate 'task' lv_index into lv_task.
* Rows up to lv_start are cut off the front, rows from lv_end onwards off the back
    lv_start = lv_start + lv_lines_tab.
    lv_end = lv_lines_tab + 1.
    if lv_index = 1.
      lv_start = 0.
    endif.
* The last package takes all the remaining rows
    if lv_split = lv_index.
      lv_end = 0.
    endif.
    it_data_temp[] = it_data[].
    if lv_start is not initial.
      delete it_data_temp to lv_start.
    endif.
    if lv_end is not initial.
      delete it_data_temp from lv_end.
    endif.
* Process the record set
* Call the function module to update the data.
* Each call is executed in a free dialog work process of the server group.
* The exceptions are listed so that sy-subrc reports whether the task was started.
    call function 'ZCSK_PARALLEL_TABLE_UPDATE'
      starting new task lv_task
      destination in group lv_appsvr
      performing update_status on end of task
      tables
        data = it_data_temp
      exceptions
        communication_failure = 1
        system_failure        = 2
        resource_failure      = 3.
* Only packages that were actually started are counted; failed ones are not retried here
    if sy-subrc = 0.
      lv_sent = lv_sent + 1.
    endif.
  enddo.
WAIT UNTIL lv_comp >= lv_sent.
write : / 'The no of packets sent' , lv_sent,
'The no of packets completed', lv_comp.
*&---------------------------------------------------------------------*
*& Form UPDATE_STATUS
*&---------------------------------------------------------------------*
* Callback routine, executed each time a parallel task has finished
*----------------------------------------------------------------------*
* -->LV_TASK Name of the task that has finished
*----------------------------------------------------------------------*
form update_status using lv_task.
  lv_comp = lv_comp + 1.
  receive results from function 'ZCSK_PARALLEL_TABLE_UPDATE'
    importing
      lv_result = lv_result
    exceptions
      communication_failure = 1
      system_failure        = 2.
* Same convention as in Program 1: abap_true means the update succeeded
  if sy-subrc = 0 and lv_result = abap_true.
    lv_result_string = 'Success'.
  else.
    lv_result_string = 'Failure'.
  endif.
  concatenate 'The data passed via task' lv_task 'update result is' lv_result_string into lv_result_string separated by space.
  write : / lv_result_string.
endform. " UPDATE_STATUS
Runtime analysis of program with parallel processing:
Hi Arun Krishnamoorthy,
Firstly, thanks for the post. Very informative and self-explanatory.
But I have one doubt: instead of "DO lv_split TIMES", if we use a plain "DO ... ENDDO" when calling the parallel FM, how many times will it execute?
I ask because it is not mandatory to use the FM SPBT_INITIALIZE. If we don't use it, then I think we need to use only DO and ENDDO.
** In your example, if I replace DO lv_split TIMES with DO, I get a resource failure error. Please suggest why this is happening.
*** Can we replace DO with LOOP?
Thanks in advance,
Regards,
Narsireddy.
The reason I call SPBT_INITIALIZE is to find out how many work processes are available in the system at the time of execution. Knowing that, I use a maximum of one third of the processes myself and leave the rest for processes executed by others.
In each pass of the DO loop I assign a set of records to be processed by a particular work process. So I split and process the data depending on the number of work processes available. For example, say I have 3 work processes to use and 90 records to update. I split the records into 3 data sets of 30 records each and assign each data set to one work process.
In your case, when you use a bare DO statement, the following can happen. Say there are 9 work processes left in the system and each pass of the loop assigns a task to one of them. On the 10th pass there is no work process left to carry out the job and you end up with an error, such as a resource failure.
So please make sure that a work process is available before you assign a task to it.
Best Regards,
Arun Krishnamoorthy
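One common way to deal with the resource failure described in the reply above is to catch the RESOURCE_FAILURE exception and try the same package again once a running task has returned. The sketch below is only an illustration of that idea, not the code used in the post: the report name, the limit of 10 attempts, the 5-second wait and the variable lv_comp_before are assumptions, while the function module, the table and the group name are taken from the example above.

```abap
report zcsk_resource_retry_sketch.   " hypothetical report name

data: it_data_temp   type standard table of zzcsk_emp,
      lv_group       type rzllitab-classname value 'parallel_generators',
      lv_task        type string value 'TASK1',
      lv_result      type flag,
      lv_sent        type i,         " no. of packages started
      lv_comp        type i,         " no. of packages returned
      lv_comp_before type i.

start-of-selection.

* Try to start one package; the limit of 10 attempts is an assumption.
  do 10 times.
    call function 'ZCSK_PARALLEL_TABLE_UPDATE'
      starting new task lv_task
      destination in group lv_group
      performing update_status on end of task
      tables
        data = it_data_temp
      exceptions
        communication_failure = 1
        system_failure        = 2
        resource_failure      = 3.

    case sy-subrc.
      when 0.
*       Task started: count it and stop retrying.
        lv_sent = lv_sent + 1.
        exit.
      when 3.
*       No free work process: wait until another task has returned
*       (or 5 seconds have passed), then try the same package again.
        lv_comp_before = lv_comp.
        wait until lv_comp > lv_comp_before up to 5 seconds.
      when others.
*       Communication or system failure: give up on this package.
        write : / 'Package could not be started:', lv_task.
        exit.
    endcase.
  enddo.

* Wait for every package that was started.
  wait until lv_comp >= lv_sent.

form update_status using p_task type clike.
  receive results from function 'ZCSK_PARALLEL_TABLE_UPDATE'
    importing
      lv_result = lv_result
    exceptions
      communication_failure = 1
      system_failure        = 2.
  lv_comp = lv_comp + 1.
endform.
```

In a real program the retry block would sit inside the loop that hands out the packages, so that a package which cannot be started straight away is held back until a work process is free again.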
Dear Arun Krishnamoorthy,
Thanks for your reply.
I have considered the exceptions. For a system failure or communication failure, help.sap.com says to "Log the data that has been processed when an RFC Task is started and when it Returns, so that the Job can be restarted with unprocessed data." (http://help.sap.com/saphelp_nw70/helpdata/en/fa/096e92543b11d1898e0000e8322d00/content.htm)
Can you help me with how to log the data that has been processed, so that the job can be restarted with the unprocessed data if an error occurs?
Thanks & regards,
Cheruku
Hello,
Thank you very much for sharing all this information.
I have one doubt about parallel processing with multiple jobs: is it possible without causing any problems?
Thank you in advance.
Miguel Silva
There is different code for every company, SAP and other stuff.
I want the C++ code for the parallel job to create a destination and delete it.
Superb explanation, Arun.
Your student,
Muthukumar
Hi Arun Krishnamoorthy,
What is the role of the variable lv_occupied? It is set with lv_occupied = lv_total - lv_available but is never used.