Monday, September 3, 2018

Getting Started with spaCy

spaCy claims to be the fastest natural language processing (NLP) library in the world. It also claims to be the best way to prepare text for deep learning:

spaCy is the best way to prepare text for deep learning. It interoperates seamlessly with TensorFlow, PyTorch, scikit-learn, Gensim and the rest of Python's awesome AI ecosystem. With spaCy, you can easily construct linguistically sophisticated statistical models for a variety of NLP problems.

spaCy's creators differentiate spaCy from NLTK and Stanford's CoreNLP by characterizing spaCy as a production library, not a research library. For that reason, it is "opinionated," as that term is used in the tech world, and does not let you switch between similar models for a task. For my work, this is just fine.

The Lightning Tour shows how to load and use spaCy's most salient features, including its powerful browser-based visualizers. Text processing produces some complicated vectors, yet speech itself is intuitive to most of us. These visualization tools nicely bridge the gap between our intuitive understanding of a text and the vectors produced when analyzing that same text.
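To see what the entity visualizer produces without downloading a trained model, displaCy can render hand-made annotations. This is a sketch assuming spaCy v3; the entity spans and labels below were added by hand for illustration and are not the output of any model. With a trained pipeline you would pass a `Doc` instead and drop `manual=True`.

```python
from spacy import displacy

# Hand-labeled example (an assumption for illustration, not model output).
# "start" and "end" are character offsets into "text".
example = {
    "text": "In November cps made a visit to my house",
    "ents": [
        {"start": 3, "end": 11, "label": "DATE"},  # "November"
        {"start": 12, "end": 15, "label": "ORG"},  # "cps"
    ],
    "title": None,
}

# manual=True tells displaCy to render the dict directly instead of a Doc.
# jupyter=False makes render() return the HTML markup as a string
# rather than trying to display it inline in a notebook.
html = displacy.render(example, style="ent", manual=True, jupyter=False)

# displacy.serve(example, style="ent", manual=True) would instead start a
# local web server so the same markup can be viewed in a browser.
```

Returning a string makes it easy to drop the highlighted text into any web page, which suits a browser-based workflow.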

The developers provide some simple spaCy demo programs written in Python.
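A minimal program in the same spirit (a sketch, not one of the official demos) needs only spaCy itself: `spacy.blank("en")` creates an English pipeline that does nothing but tokenize. A trained pipeline such as `en_core_web_sm`, installed with `python -m spacy download en_core_web_sm` and loaded with `spacy.load`, would also fill in part-of-speech tags, dependencies, and named entities.

```python
import spacy

# A blank English pipeline: rule-based tokenizer only, no trained model.
nlp = spacy.blank("en")

doc = nlp("Okay in November last year cps made visit to my house")

# Each Token keeps its text and its character offset (idx) in the input.
tokens = [token.text for token in doc]
offsets = [(token.text, token.idx) for token in doc]

print(tokens)
```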

My projects tend toward analyzing the speech of people who are excitedly blabbering into a text area in a browser. A typical text might run several hundred words without punctuation. It appears to contain multiple sentences, but those sentences are not marked by the usual cues: punctuation, capitalization, or line breaks.
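Sentence splitting shows why this kind of text is hard. The sketch below, assuming spaCy v3, uses the rule-based `sentencizer` component, which starts a new sentence only after punctuation such as ".", "!" or "?". The sample sentences are paraphrased for illustration. Trained pipelines set sentence boundaries from the dependency parse instead; this sketch shows only the rule-based approach.

```python
import spacy

# Assumes spaCy v3, where built-in components are added by name.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

punctuated = nlp("CPS visited my house. They set up a service plan. I followed it.")
unpunctuated = nlp(
    "cps made visit to my house an found no dangers so they set up "
    "a service plan an I followed everything"
)

print(len(list(punctuated.sents)))    # 3
print(len(list(unpunctuated.sents)))  # 1
```

With no punctuation to go on, the rule-based component treats the entire run as a single sentence.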

All NLP libraries utterly FAIL on most of the text I need to process. I did not make this next example up; this is exactly the question that was posed to me:

Okay in November last year cps made visit to my house an found no immediate dangers of my child so they set up a service plan an I followed everything except going up to do a drug assessment because i was had claimed to be smoking weed an that me an my boyfriend that lives with me at the time fight an that i shot at him well investagitor came out an assests the home an found no bullet holes in the home then i recieved a letter in the mail that states unable to determine but they was still going to offer family based safety service plan well i did not comply so i had stop responding to them an they then sent me a letter stating I was ordered to court an had to supply a drug test an that is when I tested positive for methaphemines an they removes her from the home cause my boyfriend at the time refused to even take one so they removed her an places her with one my daughter friend an the girls father well now as it is reported by cps my daughter has become even more depressed an has been smoking weed that was provided by the foster dad's niece in the home an my daughter has been sneaking out the house at night an having sex with men she's 18 to 20 an then I was notified that the foster dad niece had shot up 40units of heroin in front of my daughter an almost over dosed in front of my daughter an as after my daughter had told them she's going to kill herself she was removed from the home an placed in an behavioral center an then was removed from the home an is now still in the behavioral center due to they have no where to place my daughter so the hospital is holding her there until they can find placement well since then I did get clean from drugs an now have a job doing home healthcare as I did in previous years but now they are taking me back to court to have my parental rights terminated can I file indigent an get an attorney an prove that they submitted her to much more dangerous situations an gain coustidy back of her with stipulations as in I would submit a weekly drug test an random as well as my daughter an show that she was not in as much danger as to what she has experienced since she has been in the coustidy of cps

(Yes, the author of that question is probably still using "methaphemines".)

I ran that jewel through every model and SaaS I could find and threw it at a few humans, some of whom were attorneys and some of whom were not. The software didn't even come close to discerning the true call of the question, and no human made it through the text without screaming and saying something to the effect of, "this person is crazy."

Crazy or not, the person had a question and I gave her a 100% accurate one-word answer:

Question 1: If I meet the county's indigence requirements, can I get a court-appointed and court-paid-for attorney to represent me in a suit where the state wants to terminate my parental rights?

Answer 1: Yes.

Of course, she had the same follow-up question that all litigants have, i.e., "Will I win?"

Question 2: If I can prove the foster care system exposed my daughter to more danger than I ever did, will I win?

Answer 2: I don't know.

CONCLUSION: spaCy appears to be a powerful library for NLP. The current state of the art, however, requires that we prompt humans to break their text into smaller chunks, each responsive to a narrower prompt than "What happened?"

Thomas James Daley