The issue: Firming a production order sometimes causes all the AOSs in the cluster to crash and restart.
The setup: 2-AOS cluster with 1 batch AOS (3 AOSs in total)
The research: Firming the production order did not crash the AOSs every time, so I checked the Windows event log each time it did and found the following error:
Object Server 01: An error has occurred in the services framework. Method: AifMessageInspector::AfterReceiveRequest. Error: System.ServiceModel.FaultException: Failed to logon to Microsoft Dynamics AX.
at Microsoft.Dynamics.Ax.Services.AxServiceOperationContext.InitializeSession()
at Microsoft.Dynamics.Ax.Services.AxServiceOperationContext.InitializeContext()
at Microsoft.Dynamics.Ax.Services.AxServiceOperationContext.Attach(OperationContext owner)
at System.ServiceModel.ExtensionCollection`1.InsertItem(Int32 index, IExtension`1 item)
at System.Collections.Generic.SynchronizedCollection`1.Add(T item)
at Microsoft.Dynamics.Ax.Services.AifMessageInspector.AfterReceiveRequest(Message& request, IClientChannel channel, InstanceContext instanceContext)
The message begins with "Failed to logon to Microsoft Dynamics AX", which points to a credentials problem. We have one account for the AOS service, domain\AX_AOS_Service, and one for the .NET Business Connector, domain\AX_BusCon, so I knew the problem had to involve one of these two accounts.
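Sorting event log entries by the phrase they contain is easy to script when the AOSs crash only intermittently and there are many entries to read. Below is a minimal Python sketch, assuming the event messages have already been exported from Event Viewer as plain strings. The labels and the signature table are hypothetical; only the matched phrases come from the messages quoted in this post (the logon failure above and the Event ID 180 text that appeared later).

```python
import re
from typing import Optional

# Hypothetical label -> pattern table, checked in insertion order.
# Only the matched phrases are taken from the AX messages in this post.
SIGNATURES = {
    # The services-framework error: a credentials problem.
    "credentials": re.compile(r"Failed to logon to Microsoft Dynamics AX", re.IGNORECASE),
    # The later Event ID 180 text: Win32 RPC error 1722 is "The RPC server is unavailable".
    "rpc-unavailable": re.compile(r"RPC exception 1722\b", re.IGNORECASE),
}


def classify(message: str) -> Optional[str]:
    """Return the label of the first signature found in the message, or None."""
    for label, pattern in SIGNATURES.items():
        if pattern.search(message):
            return label
    return None


def summarize(messages: list[str]) -> dict[str, int]:
    """Count messages per label; messages matching no signature count as 'unknown'."""
    counts: dict[str, int] = {}
    for msg in messages:
        label = classify(msg) or "unknown"
        counts[label] = counts.get(label, 0) + 1
    return counts
```

Counting by label rather than printing every match makes it easy to see whether a change shifts the crashes from one kind of error to another, as happened here after the first round of fixes.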
Action/Solution (here is what I did to fix the issue):
- Make sure every AOS, including the batch server, has an SSRS report server configuration marked as the default
- Add the AOS service account (domain\AX_AOS_Service) as a user in AX and add it to the groups "System User" and "System Administrator" (do not associate it with an employee)
- Change the report server configurations in AX to point to the SQL Server cluster instead of a single SQL Server node
- Change the batch groups (System administration > Setup > Batch group) so that only batch servers are listed in any of them, and remove all servers from the empty batch group. Previously, non-batch servers were listed there as batch servers
- Restart the AOS service on all 3 servers (via services.msc)
- After restarting the AOSs, they still crashed, but less often, and the event log now showed the following message instead of the original one:
- The description for Event ID 180 from source Microsoft Dynamics AX cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer. If the event originated on another computer, the display information had to be saved with the event. The following information was included with the event: Microsoft Dynamics AX Business Connector Session 9.RPC exception 1722 in Ping occurred in session 19 ,process: Ax32.exe ,thread: 8456
- This message still indicates a problem, but a generic one rather than a specific one (RPC error 1722 means "The RPC server is unavailable"). Messages of this type are usually fixed by a full system reboot.
- Restart all 3 AOS machines themselves (a full Windows restart), not just the AOS service
Conclusion: After this, all seems to be well in the world. I think the second step (adding the AOS service account as an AX user) had the most to do with the fix, while the other changes corrected bad configurations that no one had spotted.
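The configuration changes in the list above (a default SSRS configuration on every AOS, the AOS service account as an AX user in both groups, and batch groups that list only batch servers) can be written down as a quick sanity check to rerun after future changes. Below is a minimal Python sketch, assuming you have transcribed the relevant settings from the AX forms into plain Python data by hand. The `AxConfig` structure, the server names in the usage, and `find_problems` are hypothetical, and nothing here connects to AX itself.

```python
from dataclasses import dataclass, field

# Groups the AOS service account should belong to, per the second change above.
REQUIRED_GROUPS = {"System User", "System Administrator"}


@dataclass
class AxConfig:
    """Hypothetical, hand-transcribed snapshot of the settings changed above."""
    aos_instances: set[str]               # every AOS, including the batch AOS
    batch_servers: set[str]               # AOSs that are meant to run batch jobs
    ssrs_default_config: set[str]         # AOSs with a default SSRS report server configuration
    batch_groups: dict[str, set[str]]     # batch group name ("" = empty group) -> servers
    ax_user_groups: dict[str, set[str]] = field(default_factory=dict)  # AX user -> groups
    aos_service_account: str = r"domain\AX_AOS_Service"


def find_problems(cfg: AxConfig) -> list[str]:
    """Return a human-readable list of misconfigurations (empty if none)."""
    problems: list[str] = []
    # Every AOS, including the batch server, needs a default SSRS configuration.
    for aos in sorted(cfg.aos_instances - cfg.ssrs_default_config):
        problems.append(f"no default SSRS configuration for {aos}")
    # The AOS service account must be an AX user in both required groups.
    groups = cfg.ax_user_groups.get(cfg.aos_service_account)
    if groups is None:
        problems.append(f"{cfg.aos_service_account} is not an AX user")
    else:
        for group in sorted(REQUIRED_GROUPS - groups):
            problems.append(f"{cfg.aos_service_account} is missing the {group} group")
    # Batch groups may list batch servers only; the empty group should list none.
    for name, servers in sorted(cfg.batch_groups.items()):
        if name == "" and servers:
            problems.append("the empty batch group should have no servers")
        for server in sorted(servers - cfg.batch_servers):
            problems.append(f"non-batch server {server} listed in batch group '{name}'")
    return problems
```

For example, a snapshot resembling the original state of this cluster (no SSRS default on the batch AOS, the service account missing a group, and a non-batch AOS in the empty batch group) yields four problems, while the corrected setup yields an empty list.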