Extract Email IDs from a File Using Java

 Posted On  | Yashwant Chavan 

Sometimes we need to extract email IDs from arbitrary text. This tutorial demonstrates how to extract email IDs from a text file.

The code below is fairly simple. The input file contains email IDs mixed with junk text, and the program separates the email IDs from the rest of the text.

To extract email IDs from a line of the file, you need to define a regular expression. The expression below, written with Java string escaping (each backslash doubled), is the one used in this tutorial.

([\\w\\-]([\\.\\w])+[\\w]+@([\\w\\-]+\\.)+[A-Za-z]{2,4})
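As a quick check of what this expression accepts, here is a minimal sketch that runs it against a short string. The class name `RegexCheck`, the helper `findAll`, and the sample text are made up for illustration and are not part of the tutorial's program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexCheck {
    // Same expression as above, compiled once.
    static final Pattern RE_MAIL = Pattern.compile(
            "([\\w\\-]([\\.\\w])+[\\w]+@([\\w\\-]+\\.)+[A-Za-z]{2,4})");

    // Hypothetical helper: returns every match of RE_MAIL in the text, in order.
    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = RE_MAIL.matcher(text);
        while (m.find()) {
            found.add(m.group(1)); // group 1 is the whole address
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [xyz@gmail.com, test.test@yahoo.com]
        System.out.println(findAll("mail xyz@gmail.com or test.test@yahoo.com, not ab@in.com"));
    }
}
```

Note that `ab@in.com` is skipped: the part before the `@` needs at least three characters to satisfy this expression. Top-level domains longer than four letters are also cut short by the `{2,4}` quantifier, so the expression suits simple inputs like the sample file rather than full address validation.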

A regular expression, specified as a string, must first be compiled into an instance of the Pattern class. The resulting pattern can then be used to create a Matcher object that can match arbitrary character sequences against the regular expression.

All of the state involved in performing a match resides in the matcher, so many matchers can share the same pattern.
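To illustrate that point, the following sketch creates two matchers from one compiled pattern and advances them independently. The class name `SharedPatternDemo`, the method `demo`, and the simple `\w+` pattern are assumptions chosen only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SharedPatternDemo {
    // One compiled Pattern, shared by every matcher below.
    static final Pattern WORD = Pattern.compile("\\w+");

    // Advances two matchers separately and reports where each one ended up.
    static String demo() {
        Matcher a = WORD.matcher("alpha beta");
        Matcher b = WORD.matcher("gamma delta");
        a.find(); // a is positioned on "alpha"
        b.find(); // b is positioned on "gamma"; a is unaffected
        a.find(); // only a moves on, to "beta"
        return a.group() + " " + b.group();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "beta gamma"
    }
}
```

Because the match position lives in each Matcher, the Pattern itself can be compiled once and reused for every line of a file.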

Input File

Lorem ipsum dolor sit amet, consectetur xyz@gmail.com adipisicing elit. Quasi, sapiente, nam sunt rem beatae architecto cupiditate numquam 
consectetur dolorum aliquam suscipit adipisci expedita vel quaerat illum aperiam facere inventore officia abc@gmail.com

Consequuntur test.test@yahoo.com sed ipsum eius minima eum velit test@yahoo.com soluta accusamus omnis test122@yahoo.com maiores modi quae mollitia consectetur adipisci.
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas metus nulla, 

Commodo a sodales sed, some_test@yahoo.com dignissim pretium nunc. some@in.com Nam et lacus neque. Ut enim massa, sodales tempor convallis et. Lorem ipsum dolor sit amet, consectetur adipisicing elit. 
Ipsa, alias, nihil molestias libero corporis perferendis a quasi at.

Eos, illum, odit nulla provident abs@pqr.com sint atque quasi necessitatibus dolores voluptatibus perspiciatis aliquid tempora possimus laudantium. 
Blanditiis, deleniti!

EmailExtract.java

package com.technicalkeeda.app;

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailExtract {
    // Compile the expression once; the same Pattern is reused for every line.
    private static final Pattern RE_MAIL = Pattern.compile(
            "([\\w\\-]([\\.\\w])+[\\w]+@([\\w\\-]+\\.)+[A-Za-z]{2,4})");

    public static void main(String[] args) {
        List<String> emailList = new ArrayList<String>();

        // try-with-resources closes the reader even when an exception is thrown,
        // and avoids calling close() on a reader that was never opened.
        try (BufferedReader br = new BufferedReader(new FileReader("C:\\email.txt"))) {
            String line;
            while ((line = br.readLine()) != null) {
                Matcher m = RE_MAIL.matcher(line);

                // A line may contain several addresses, so keep searching.
                while (m.find()) {
                    String email = m.group(1);

                    // Skip addresses that were already collected.
                    if (!emailList.contains(email)) {
                        emailList.add(email);
                    }
                }
            }

            for (String email : emailList) {
                System.out.println(email);
            }
        } catch (FileNotFoundException e) {
            System.out.println("FileNotFoundException:- " + e.getMessage());
        } catch (IOException e) {
            System.out.println("IOException:- " + e.getMessage());
        }
    }
}

Output

xyz@gmail.com
abc@gmail.com
test.test@yahoo.com
test@yahoo.com
test122@yahoo.com
some_test@yahoo.com
some@in.com
abs@pqr.com
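
The `emailList.contains(...)` check in the program rescans the list for every match. As an alternative sketch, a `LinkedHashSet` drops duplicates on insertion while keeping the order in which addresses were first seen. The class name `EmailCollector`, the method `collect`, and the sample lines are hypothetical; only the regular expression comes from the tutorial.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailCollector {
    static final Pattern RE_MAIL = Pattern.compile(
            "([\\w\\-]([\\.\\w])+[\\w]+@([\\w\\-]+\\.)+[A-Za-z]{2,4})");

    // Collects the distinct addresses found in the given lines, in first-seen order.
    static Set<String> collect(List<String> lines) {
        Set<String> emails = new LinkedHashSet<>();
        for (String line : lines) {
            Matcher m = RE_MAIL.matcher(line);
            while (m.find()) {
                emails.add(m.group(1)); // add() ignores an address that is already present
            }
        }
        return emails;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "abc@gmail.com and xyz@gmail.com",
                "again abc@gmail.com");
        System.out.println(collect(lines)); // prints [abc@gmail.com, xyz@gmail.com]
    }
}
```

For a small input file like the one above, either approach gives the same output; the set mainly helps when the file contains many repeated addresses.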


© technicalkeeda.com 2017
