4 minute read

In this blog post, we’ll explore how to automate the process of splitting large reports into smaller, more manageable files using C#. The code we’ll cover works with both CSV and Excel files (.csv and .xlsx), leveraging the Microsoft Office Interop for Excel and standard file I/O operations. We’ll build a Windows Forms application that allows users to select a report file, specify chunk size, and generate smaller, organized output files.

Running project

The complete source code for this project is available on GitHub. Feel free to clone it, experiment with the code, or contribute to the project.

Key Features of the Application:

  1. File Selection: Users can select a CSV or Excel file using a file browser.
  2. Chunk Size Definition: Specify the number of rows per file chunk.
  3. Automatic File Splitting: The application splits the report into smaller CSV files.
  4. Responsive UI: The background worker ensures the UI remains responsive while processing large files.

How It Works

1. File Selection

Users can select either a CSV or Excel file. Here, the OpenFileDialog is configured to filter and browse both file types:

private void buttonBrowse_Click(object sender, EventArgs e)
{
    var openFileDialog1 = new OpenFileDialog
    {
        Title = @"Browse Report File",
        CheckFileExists = true,
        CheckPathExists = true,
        DefaultExt = "csv",
        Filter = @"csv Files (*.csv)|*.csv|Excel Files (*.xlsx)|*.xlsx",
        FilterIndex = 1,
        RestoreDirectory = true,
        ReadOnlyChecked = true,
        ShowReadOnly = true
    };

    if (openFileDialog1.ShowDialog() == DialogResult.OK)
    {
        textBoxFilePath.Text = openFileDialog1.FileName;
    }
}

This dialog prompts the user to select a file, which is then displayed in a TextBox.

2. Handling File Splitting

The core logic of the application lies in the ProcessFile method, which determines the type of file selected (CSV or Excel) and processes it accordingly.

private static void ProcessFile(string filePath, int chunkSize, string fileNamingConvention)
{
    if (Path.GetExtension(filePath).Equals(".xlsx", StringComparison.OrdinalIgnoreCase))
    {
        ProcessExcelFile(filePath, chunkSize, fileNamingConvention);
    }
    else if (Path.GetExtension(filePath).Equals(".csv", StringComparison.OrdinalIgnoreCase))
    {
        ProcessCsvFile(filePath, chunkSize, fileNamingConvention);
    }
    else
    {
        throw new NotSupportedException("File type not supported. Please select a CSV or Excel file.");
    }
}

The file extension determines whether to use Excel Interop for Excel files or StreamReader for CSV files. Let’s explore each method.

3. Splitting Excel Files

When processing an Excel file, Excel Interop is used to read through rows. Data from the first column of each row is collected and split based on the chunk size. Once the chunk size is reached, a new CSV file is created.

private static void ProcessExcelFile(string excelFilePath, int chunkSize, string fileNamingConvention)
{
    Excel.Application xlApp = null;
    Excel.Workbook xlWorkbook = null;
    Excel.Worksheet xlWorksheet = null;
    Excel.Range xlRange = null;

    try
    {
        var idFromFile = new List<string>();
        xlApp = new Excel.Application();
        xlWorkbook = xlApp.Workbooks.Open(excelFilePath);
        xlWorksheet = xlWorkbook.Sheets[1];
        xlRange = xlWorksheet.UsedRange;

        var rowCount = xlRange.Rows.Count;
        var fileIncremental = 0;

        for (var i = 1; i <= rowCount; i++)
        {
            var s = xlRange.Cells[i, 1]?.Value2?.ToString();
            if (string.IsNullOrEmpty(s)) continue;

            idFromFile.Add(s);
            if (idFromFile.Count == chunkSize)
            {
                fileIncremental++;
                CreateCsvFile(idFromFile, fileNamingConvention, fileIncremental, chunkSize);
                idFromFile.Clear();
            }
        }

        // Write remaining data if any
        if (idFromFile.Count > 0)
        {
            fileIncremental++;
            CreateCsvFile(idFromFile, fileNamingConvention, fileIncremental, chunkSize);
        }
    }
    finally
    {
        CleanupExcelResources(xlApp, xlWorkbook, xlWorksheet, xlRange);
    }
}

After processing, the code ensures proper cleanup of COM objects to prevent memory leaks.

4. Splitting CSV Files

When dealing with CSV files, the logic is simpler, using StreamReader to read and process each line.

private static void ProcessCsvFile(string csvFilePath, int chunkSize, string fileNamingConvention)
{
    var idFromFile = new List<string>();
    var fileIncremental = 0;

    using (var reader = new StreamReader(csvFilePath))
    {
        while (!reader.EndOfStream)
        {
            var line = reader.ReadLine();
            var values = line?.Split(',');
            if (values == null || values.Length == 0) continue;

            idFromFile.Add(values[0]);
            if (idFromFile.Count == chunkSize)
            {
                fileIncremental++;
                CreateCsvFile(idFromFile, fileNamingConvention, fileIncremental, chunkSize);
                idFromFile.Clear();
            }
        }
    }

    // Write remaining data if any
    if (idFromFile.Count > 0)
    {
        fileIncremental++;
        CreateCsvFile(idFromFile, fileNamingConvention, fileIncremental, chunkSize);
    }
}
5. Creating CSV Files

Both the Excel and CSV methods call CreateCsvFile to write chunks of data into new CSV files. Each file is named based on the chunk size and a custom naming convention.

private static void CreateCsvFile(IEnumerable<string> data, string fileNamingConvention, int fileIncremental, int chunkSize)
{
    var fileName = $"{fileNamingConvention}_{fileIncremental}_{chunkSize}.csv";

    try
    {
        using (var sw = new StreamWriter(fileName))
        {
            sw.WriteLine(string.Join(Environment.NewLine, data));
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show($@"Error creating CSV file: {ex.Message}");
    }
}

This method ensures each new file has a unique name and writes the chunk of data accordingly.

6. Memory Management

When working with Excel Interop, releasing COM objects is crucial to avoid memory issues. The CleanupExcelResources method handles this:

private static void CleanupExcelResources(Excel._Application xlApp, Excel._Workbook xlWorkbook, Excel.Worksheet xlWorksheet, Excel.Range xlRange)
{
    if (xlRange != null) Marshal.ReleaseComObject(xlRange);
    if (xlWorksheet != null) Marshal.ReleaseComObject(xlWorksheet);
    if (xlWorkbook != null)
    {
        xlWorkbook.Close(false);
        Marshal.ReleaseComObject(xlWorkbook);
    }
    if (xlApp != null)
    {
        xlApp.Quit();
        Marshal.ReleaseComObject(xlApp);
    }

    GC.Collect();
    GC.WaitForPendingFinalizers();
}

Conclusion

By combining Microsoft Office Interop for Excel and standard file operations for CSV, this application provides a robust solution for splitting large reports into smaller, manageable files. The responsive UI and ability to process both file formats make it versatile for various reporting and data-processing needs.

Feel free to extend this application by adding more error handling, additional file formats, or improving the UI. Happy coding!

Leave a comment