How do I insert punctuation in unpunctuated text?

Hi all,
I am currently working on a very fun project, but unfortunately came across a problem I haven't been able to solve for some time. What I am trying to do is punctuate text that contains no punctuation. At the time of writing I have a lot of text files that contain proper punctuation, and matching text files without punctuation.
Initially, I thought that MATLAB would have a neural network that I could train with the input and output files I have, but unfortunately it does not.
Therefore I am reaching out in the hope that someone can help me punctuate unpunctuated text.

8 comments

Jan on 22 Jan 2018
Please give us a small example.
You would not remove the punctuation in your case -- but you could convert them into tokens.
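A minimal sketch of the token idea, written in Python for brevity rather than MATLAB: each word is paired with the mark that follows it, or "O" when none follows, so the task becomes predicting one label per word. The function name `to_labeled_tokens`, the label set, and the choice to lowercase words are assumptions made for illustration; nothing here comes from a toolbox.

```python
import re

# Marks treated as labels. This set is an assumption; dashes and quotes are ignored.
PUNCT = {",", ".", "?", "!", ";", ":", "'"}

def to_labeled_tokens(text):
    """Return (word, label) pairs; label is the mark right after the word, or "O"."""
    # Split into runs of letters/digits and single punctuation marks.
    pieces = re.findall(r"[A-Za-z0-9]+|[,.?!;:']", text)
    pairs = []
    for piece in pieces:
        if piece in PUNCT:
            # Attach the mark to the preceding word, keeping only the first mark.
            if pairs and pairs[-1][1] == "O":
                pairs[-1] = (pairs[-1][0], piece)
        else:
            pairs.append((piece.lower(), "O"))
    return pairs

for word, label in to_labeled_tokens("What's happened to me? he thought."):
    print(word, label)
```

Splitting at the apostrophe gives "what" the label "'" and leaves "s" as its own word, which matches files where "What's" appears as "What s". Lowercasing discards capitalization, so restoring capitals would need a second label per word.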
r r on 22 Jan 2018

@Jan Simon:

The input is a txt file without punctuation:

What s happened to me he thought it wasn t a dream his room a proper human room although a little too small lay peacefully between its four familiar walls a collection of textile samples lay spread out on the table samsa was a travelling salesman and above it there hung a picture that he had recently cut out of an illustrated magazine and housed in a nice gilded frame it showed a lady fitted out with a fur hat and fur boa who sat upright raising a heavy fur muff that covered the whole of her lower arm towards the viewer

The output has to be the input with punctuation:

What's happened to me? he thought. It wasn't a dream. His room, a proper human room although a little too small, lay peacefully between its four familiar walls. A collection of textile samples lay spread out on the table - Samsa was a travelling salesman - and above it there hung a picture that he had recently cut out of an illustrated magazine and housed in a nice, gilded frame. It showed a lady fitted out with a fur hat and fur boa who sat upright, raising a heavy fur muff that covered the whole of her lower arm towards the viewer.
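If more input/output pairs are ever needed, the unpunctuated side can be derived from the punctuated side. The sketch below is in Python and assumes plain ASCII text; the name `strip_punctuation` is hypothetical. It replaces every mark with a space, so "What's" becomes "what s" as in the sample input, and it lowercases everything.

```python
import re

def strip_punctuation(text):
    """Replace each non-alphanumeric, non-space character with a space, then tidy up."""
    spaced = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    # Collapse runs of whitespace and lowercase the result.
    return " ".join(spaced.split()).lower()

print(strip_punctuation("What's happened to me? he thought. It wasn't a dream."))
# what s happened to me he thought it wasn t a dream
```

Accented letters would also be replaced by spaces under this ASCII-only assumption, so the character class would need widening for non-English books.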

r r on 22 Jan 2018
@Walter Roberson:
I will take a look at the link provided.
Guillaume on 22 Jan 2018
I have no idea how to solve this. It seems a very difficult problem, particularly as there seem to be many valid ways of applying punctuation to the given sample, e.g.:
"What's happened to me?" he thought. It wasn't a dream.
What's happened to me? He thought it wasn't a dream.
r r on 22 Jan 2018
@Guillaume,
I want to train a Neural Network to punctuate texts. With enough training it should be able to do this, and I have tons of ebooks to train the NN with.
Guillaume on 22 Jan 2018
No neural network is going to be able to say which of these is more correct:
What's happened to me? He thought. It wasn't a dream.
"What's happened to me?", he thought. It wasn't a dream.
What's happened to me? He thought it wasn't a dream.
Without a ton of context no human can do that either. And even with context, it can still be ambiguous.
The traditional example:
Eats shoots and leaves. (Panda)
Eats, shoots, and leaves. (Gunman)
r r on 22 Jan 2018
I don't need it to say whether it's correct. All it needs to do is put some punctuation in, and I will make a table containing all those entries. I will then compare that table with one I've made from the original text.
I've heard that this problem is better suited to a deep LSTM, but I have no idea how to implement that in MATLAB. Do you know how to implement this?
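The table comparison described above could be sketched as follows, in Python and assuming both tables hold one label per word ("O" for no mark) in the same word order. The function name `punctuation_scores` and the choice of per-mark precision and recall are assumptions for illustration, not something the thread specifies.

```python
from collections import Counter

def punctuation_scores(reference, predicted):
    """Per-mark (precision, recall) from two equal-length label lists."""
    if len(reference) != len(predicted):
        raise ValueError("label sequences must have the same length")
    hits, ref_total, pred_total = Counter(), Counter(), Counter()
    for ref, pred in zip(reference, predicted):
        if ref != "O":
            ref_total[ref] += 1
        if pred != "O":
            pred_total[pred] += 1
        if ref == pred and ref != "O":
            hits[ref] += 1
    scores = {}
    for mark in sorted(set(ref_total) | set(pred_total)):
        precision = hits[mark] / pred_total[mark] if pred_total[mark] else 0.0
        recall = hits[mark] / ref_total[mark] if ref_total[mark] else 0.0
        scores[mark] = (precision, recall)
    return scores

# Labels for: what s happened to me he thought it wasn t a dream
original = ["'", "O", "O", "O", "?", "O", ".", "O", "'", "O", "O", "."]
network  = ["'", "O", "O", "O", "?", "O", "O", "O", "'", "O", "O", "."]
print(punctuation_scores(original, network))
```

Here the second list omits the period after "thought", so the period scores precision 1.0 and recall 0.5 while the other marks score 1.0 on both. A reading that differs from the original is counted as an error even when it is also valid punctuation.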


Answers (0)

Asked: r r on 22 Jan 2018

Commented: r r on 22 Jan 2018
